Complex event processing (CEP) is an efficient and scalable paradigm for processing event streams. CEP systems are used in many applications to detect interesting patterns in input event streams that may originate from sensors, social media, stock markets, transportation systems, etc. In most CEP applications, events must be detected in near real-time (within a given latency bound), or the detected events are rendered useless. Moreover, input event streams in CEP usually carry a high volume of data that may exceed the processing capacity of the system, especially when resources are limited (due to a monetary budget, hosting on fog and edge nodes, etc.). In such situations, load shedding is a way to reduce the system load in order to maintain the given latency bound. However, shedding load, of course, degrades the result quality of the application. The overall goal of this project is therefore to develop load shedding techniques that ensure a given latency bound for a CEP application while maximizing the perceived quality of results.
In the first part of the project, we intend to propose several load shedding approaches that drop either input events or a portion of the internal state of a single-operator CEP application. Dropping reduces the load on the operator and increases its service rate (throughput), which helps the operator maintain the given latency bound. The load shedding approaches we intend to explore in this project follow either the black-box or the white-box paradigm, with the goal of meeting a user-defined latency bound of the single-operator application with minimal degradation of the quality of results.
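To make the single-operator setting concrete, the following is a minimal sketch of a black-box input-event shedder: it estimates queuing latency from the oldest buffered event and probabilistically drops arriving events while the latency bound is violated. The class name, the latency estimate, and the fixed drop ratio are illustrative assumptions, not the project's actual design.

```python
import collections
import random


class InputLoadShedder:
    """Sketch of a black-box input-event load shedder (illustrative only)."""

    def __init__(self, latency_bound, drop_ratio=0.5):
        self.latency_bound = latency_bound  # latency bound in seconds
        self.drop_ratio = drop_ratio        # fraction of events shed under overload
        self.queue = collections.deque()    # buffered (event, arrival_time) pairs

    def offer(self, event, arrival_time, now):
        """Admit or shed an arriving event; returns True if admitted."""
        # Estimate current queuing latency from the oldest buffered event.
        queue_latency = (now - self.queue[0][1]) if self.queue else 0.0
        if queue_latency > self.latency_bound and random.random() < self.drop_ratio:
            return False  # shed: drop the event to protect the latency bound
        self.queue.append((event, arrival_time))
        return True
```

A white-box variant would instead inspect the operator's internal state (e.g., partial matches) and shed the portions least likely to contribute to a pattern match.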
In the second part of the project, we explore how to control load shedding in a graph of connected operators. Such multi-operator graphs arise particularly in large applications that aggregate multiple streams stepwise toward a sink, and in geo-distributed infrastructures such as clusters of fog or edge nodes. The goal is again to keep an end-to-end latency bound from the sources to the sink at maximal quality; here, the end-to-end latency bound covers processing at multiple operators along a path in the operator graph. The foremost question we answer in this part is, therefore, at which position in the graph which kind and intensity of load shedding is most effective, i.e., reduces latency with the smallest loss of quality. We investigate different query types and include concept drifts in the input streams. We look for a solution that assigns the optimal shedding strategy and intensity to each stream while simultaneously considering interdependencies between streams.
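The placement question above can be sketched as a simple greedy planner: given per-operator latencies and (assumed) per-operator estimates of how much latency a unit of shedding saves and how much quality it costs, shed first where the quality cost per unit of latency gain is lowest, until the end-to-end bound is met. All function names and the linear cost model are illustrative assumptions.

```python
def plan_shedding(operators, e2e_bound):
    """Greedy sketch: decide where along an operator path to shed, and how much.

    `operators` maps an operator name to a triple
    (current_latency, latency_gain_at_full_shedding, quality_loss_at_full_shedding).
    Returns a map from operator name to shedding intensity in [0, 1].
    """
    total_latency = sum(lat for lat, _, _ in operators.values())
    excess = total_latency - e2e_bound  # latency to remove along the path
    plan = {}
    # Rank operators by quality cost per unit of latency gained (cheapest first).
    ranked = sorted(operators.items(), key=lambda kv: kv[1][2] / kv[1][1])
    for name, (_, gain, _) in ranked:
        if excess <= 0:
            break
        intensity = min(excess / gain, 1.0)  # cap shedding intensity at 100%
        plan[name] = intensity
        excess -= intensity * gain
    return plan
```

A full solution would additionally have to model stream interdependencies (shedding upstream changes the load seen downstream) and adapt the estimates under concept drift, which this static sketch ignores.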