Geo-distributed cloud & edge computing
The growing ubiquity of mobile and sensor devices paves the way towards the Internet of Things (IoT), enabling large-scale applications in areas such as smart homes, smart cities, environmental monitoring, healthcare, and social networks. On the one hand, these IoT applications produce large and fluctuating volumes of data; on the other hand, they require fast, highly available, and resource-efficient data processing that ensures low response times for user queries. For example, semi-autonomous car applications help drivers reach their destinations safely by analyzing traffic situations and driving patterns online. Likewise, surveillance applications can help police officers identify persons acting suspiciously in the vicinity by processing camera streams online.
Aggregating and processing data on a centralized cloud platform cannot meet the requirements of these IoT applications. Consequently, a paradigm shift is extending the utility computing model offered by cloud platforms towards the edge of the network: network resources are leveraged for both computation and storage, so that application logic can execute on geographically distributed resources throughout the network, including routers, edge compute clusters, and backend data centers. To this end, large-scale cloud providers such as Microsoft and Google are deploying data centers and edge clusters globally to give their users low-latency access to computational resources and cloud services.
In our research, we address challenges in processing and managing data in such geo-distributed settings to meet the latency and availability requirements of IoT applications, while ensuring efficient utilization and orchestration of heterogeneous compute and storage resources at the different levels of the network hierarchy.
Modern IoT applications need to react to situations occurring in the surrounding world. Thus, a growing number of sensor streams must be processed to detect situations the application or user is interested in, e.g., the traffic situation in a smart city or the presence of a person in a video surveillance application. Complex Event Processing (CEP) is a well-established paradigm for detecting such situations from sensor streams, bridging the gap between sensors and consumers, i.e., the applications or users interested in those situations. In CEP, domain experts specify the events to be detected following the well-known operator graph model. Each operator defines how to detect event patterns on its incoming event streams and specifies the events to produce whenever a pattern is detected. The operators form a network that enables stepwise analysis, from low-level sensor data up to the events of interest delivered to the consumers attached to the CEP system. In our research, we tackle essential challenges in applying CEP to IoT applications.
- As the number of data sources and the volume of data they produce increase, parallelizing event detection becomes crucial to limit the time events must be buffered before CEP operators can process them. To this end, we investigate methods for scalable, dynamic adaptation of the parallelization degree of CEP operators, so that the operator network meets latency requirements at minimal cost while producing complex event streams that are consistent with sequential processing.
- Fault tolerance is critical for many IoT applications involving CEP systems. In particular, the event streams delivered to the consumers of a CEP system should be indistinguishable from those of a failure-free execution, even if the hosts of some operators fail or become unreachable during a temporary network partition. Since CEP operators are usually stateful, it is challenging to recover efficiently from failures while keeping the produced event streams consistent (i.e., no false positives, false negatives, or duplicates). To this end, we investigate models and mechanisms that ensure fault-tolerant and correct processing in CEP systems in the face of diverse failures.
- To support mobile applications and event sources, we investigate methods for placing and migrating CEP operators according to the consumers' locations to minimize latency and network utilization. Further research includes on-demand adaptation algorithms that dynamically assign cloud and edge resources to the operators of a CEP system to minimize processing overhead.
- Respecting privacy is essential for user acceptance of IoT applications, especially since data captured by sensors is often highly privacy-sensitive. To this end, we investigate access control methods that protect private information in CEP while minimizing the degradation of the data quality that IoT applications depend on.
- Parallel complex event processing to meet probabilistic latency bounds
- Privacy in Stream Processing
- CEP in the Large
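To make the operator graph model from the CEP overview concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the system we build: all operators run on one host, events arrive in timestamp order, every operator implements a single hypothetical pattern ("A followed by B within a time window"), and the event types `person_detected`, `door_open`, `alarm_armed`, `entry`, and `intrusion` are made up for the example. Derived events are fed back into the graph, so the second operator consumes the output of the first.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    type: str
    ts: float                                  # occurrence time in seconds
    payload: dict = field(default_factory=dict)

class SequenceOperator:
    """Detects `first` followed by `second` within `window` seconds and emits a
    complex event of type `out` per match (each `first` event is consumed once)."""

    def __init__(self, first, second, window, out):
        self.first, self.second, self.window, self.out = first, second, window, out
        self.pending = []                      # operator state: timestamps of unmatched `first` events

    def inputs(self):
        return {self.first, self.second}

    def process(self, event):
        # Discard partial matches that can no longer complete within the window.
        self.pending = [t for t in self.pending if event.ts - t <= self.window]
        if event.type == self.first:
            self.pending.append(event.ts)
        elif event.type == self.second and self.pending:
            start = self.pending.pop(0)
            return [Event(self.out, event.ts, {"start": start})]
        return []

def run_graph(operators, stream, consumer_types):
    """Route every event (raw or derived) to all operators subscribed to its type;
    deliver events whose type the consumer is interested in."""
    delivered = []
    for raw in stream:
        queue = [raw]
        while queue:
            event = queue.pop(0)
            if event.type in consumer_types:
                delivered.append(event)
            for op in operators:
                if event.type in op.inputs():
                    queue.extend(op.process(event))
    return delivered

# Two-level graph: sensor events -> "entry" -> "intrusion" (delivered to the consumer).
graph = [
    SequenceOperator("person_detected", "door_open", window=5.0, out="entry"),
    SequenceOperator("alarm_armed", "entry", window=60.0, out="intrusion"),
]
stream = [
    Event("alarm_armed", 0.0),
    Event("person_detected", 10.0),
    Event("door_open", 12.0),        # completes "entry", which completes "intrusion"
    Event("person_detected", 100.0),
    Event("door_open", 120.0),       # too late: no "entry"
]
delivered = run_graph(graph, stream, consumer_types={"intrusion"})
print(delivered)  # [Event(type='intrusion', ts=12.0, payload={'start': 0.0})]
```

Real CEP languages support far richer patterns (negation, iteration, aggregation) and distribute operators over many hosts; the sketch only shows how operators compose stepwise from raw sensor events to the event the consumer asks for.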
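As a back-of-the-envelope illustration of choosing a parallelization degree under a probabilistic latency bound, the sketch below sizes the number of operator instances k so that P(latency > d) <= epsilon. It rests on strong assumptions that are ours, not a description of our method: Poisson arrivals at rate lambda, a uniformly random split of events across instances, and each instance modeled as an M/M/1 queue with service rate mu, whose FCFS sojourn time is exponential with rate mu - lambda/k. Then exp(-(mu - lambda/k) d) <= epsilon holds exactly when k >= lambda / (mu - ln(1/epsilon)/d). The function names are hypothetical.

```python
import math

def violation_probability(arrival_rate, service_rate, deadline, k):
    """P(latency > deadline) when `arrival_rate` events/s are split uniformly at
    random over k instances, each an M/M/1 queue serving `service_rate` events/s
    (assumed model: FCFS sojourn time ~ Exponential(mu - lambda/k))."""
    load = arrival_rate / k
    if load >= service_rate:
        return 1.0                             # unstable: queues grow without bound
    return math.exp(-(service_rate - load) * deadline)

def min_parallelism(arrival_rate, service_rate, deadline, epsilon):
    """Smallest k with violation_probability(...) <= epsilon, or None if no k works."""
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must be in (0, 1)")
    slack = math.log(1 / epsilon) / deadline   # required value of mu - lambda/k
    if service_rate <= slack:
        return None                            # unattainable even with unlimited instances
    return max(1, math.ceil(arrival_rate / (service_rate - slack)))

k = min_parallelism(arrival_rate=900, service_rate=100, deadline=0.05, epsilon=0.01)
print(k)  # 114
```

The sketch deliberately ignores the harder part of the research question: splitting a stateful pattern across instances without changing the detected complex events (for example, by assigning whole windows to instances), and re-evaluating k as the input rate fluctuates.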
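One classic way to avoid false negatives and duplicates after an operator failure is to combine periodic checkpoints, upstream backup (the upstream retains its events and replays those after the last checkpoint), and duplicate suppression via sequence numbers at the consumer. The sketch below illustrates this combination for a single operator. It assumes the operator is deterministic and that the upstream log survives the failure; `BurstOperator`, `Consumer`, and `run` are hypothetical names, and this is one well-known technique rather than necessarily the mechanism we propose.

```python
import copy
from dataclasses import dataclass

@dataclass
class BurstOperator:
    """Deterministic, stateful operator: emits one complex event per `threshold`
    input events. Outputs carry consecutive sequence numbers."""
    threshold: int
    count: int = 0
    next_out_seq: int = 0
    last_in_seq: int = -1                      # last input processed (saved in checkpoints)

    def process(self, in_seq, value):
        self.last_in_seq = in_seq
        self.count += 1
        if self.count < self.threshold:
            return []
        self.count = 0
        out = (self.next_out_seq, f"burst@{value}")
        self.next_out_seq += 1
        return [out]

class Consumer:
    """Suppresses duplicates: delivers an output only if its sequence number is new."""
    def __init__(self):
        self.last_seq = -1
        self.received = []

    def deliver(self, outputs):
        for seq, payload in outputs:
            if seq > self.last_seq:
                self.received.append(payload)
                self.last_seq = seq

def run(inputs, threshold, checkpoint_at=None, crash_at=None):
    """Process `inputs` (retained by the upstream). Optionally checkpoint after
    input `checkpoint_at` and let the host crash after input `crash_at`."""
    consumer, op, snapshot = Consumer(), BurstOperator(threshold), None
    for seq, value in enumerate(inputs):
        consumer.deliver(op.process(seq, value))
        if seq == checkpoint_at:
            snapshot = copy.deepcopy(op)
        if seq == crash_at:
            # Host fails: in-memory state is lost. Recover from the checkpoint
            # (or from scratch) and replay every input after it.
            op = copy.deepcopy(snapshot) if snapshot is not None else BurstOperator(threshold)
            for replay_seq in range(op.last_in_seq + 1, len(inputs)):
                consumer.deliver(op.process(replay_seq, inputs[replay_seq]))
            break
    return consumer.received

inputs = list(range(10))
print(run(inputs, threshold=3))                                # ['burst@2', 'burst@5', 'burst@8']
print(run(inputs, threshold=3, checkpoint_at=3, crash_at=7))   # ['burst@2', 'burst@5', 'burst@8']
```

Determinism is what makes the replayed outputs carry the same sequence numbers as the lost ones, so the consumer can drop them safely; operators whose behavior depends on, e.g., the interleaving of several input streams need additional mechanisms.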
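For operator placement and migration, a minimal sketch of a latency- and rate-aware rule: pick the candidate node that minimizes the rate-weighted latency from the event sources plus the rate-weighted latency to the consumer, and migrate only if the gain exceeds an assumed migration penalty. The cost function, node names, latencies, and rates are all hypothetical; rate times latency serves here as a simple proxy for network utilization.

```python
def link(a, b):
    """Undirected link key, so that latency(a, b) == latency(b, a)."""
    return frozenset((a, b))

# Hypothetical one-way latencies in milliseconds.
LATENCY_MS = {
    link("cam_1", "edge_a"): 2, link("cam_1", "edge_b"): 20, link("cam_1", "cloud"): 40,
    link("cam_2", "edge_a"): 20, link("cam_2", "edge_b"): 2, link("cam_2", "cloud"): 40,
    link("phone@a", "edge_a"): 3, link("phone@a", "edge_b"): 25, link("phone@a", "cloud"): 45,
    link("phone@b", "edge_a"): 25, link("phone@b", "edge_b"): 3, link("phone@b", "cloud"): 45,
}

def placement_cost(node, sources, consumer, out_rate):
    """Rate-weighted latency: sum over inbound streams plus the outbound stream."""
    inbound = sum(rate * LATENCY_MS[link(src, node)] for src, rate in sources.items())
    return inbound + out_rate * LATENCY_MS[link(node, consumer)]

def place_operator(candidates, sources, consumer, out_rate):
    return min(candidates, key=lambda n: placement_cost(n, sources, consumer, out_rate))

def maybe_migrate(current, candidates, sources, consumer, out_rate, min_gain):
    """Move to the best node only if the cost reduction exceeds `min_gain`."""
    best = place_operator(candidates, sources, consumer, out_rate)
    gain = (placement_cost(current, sources, consumer, out_rate)
            - placement_cost(best, sources, consumer, out_rate))
    return best if gain > min_gain else current

NODES = ["cloud", "edge_a", "edge_b"]
SOURCES = {"cam_1": 10.0, "cam_2": 10.0}   # events per second
print(place_operator(NODES, SOURCES, "phone@a", out_rate=1.0))                 # edge_a
print(maybe_migrate("edge_a", NODES, SOURCES, "phone@b", 1.0, min_gain=10.0))  # edge_b
print(maybe_migrate("edge_a", NODES, SOURCES, "phone@b", 1.0, min_gain=50.0))  # edge_a
```

When the consumer moves from `phone@a` to `phone@b`, the best node changes from `edge_a` to `edge_b`; whether the operator actually moves depends on `min_gain`, which stands in for the cost of transferring operator state.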
Today, myriad sensors integrated into modern devices collect vast amounts of raw observations, e.g., GPS positions. To benefit from this data, there is a recent trend of using machine learning algorithms to derive knowledge from the gathered observations. For example, a smartphone can learn the best time to leave home depending on the weather conditions and the travel time to work.
Every smartphone, application, and company, i.e., every connected entity, gathers and analyzes such observations. Clearly, greater benefit can arise if this knowledge is not only used locally but also made accessible to others. This calls for data management mechanisms that allow the observations and knowledge gathered by different entities to be maintained in a distributed manner. In particular, we investigate effective and efficient methods for indexing, updating, and retrieving large and changing quantities of data in a large-scale distributed system.
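As one illustration of distributed indexing and updating, the sketch below partitions GPS observations by grid cell and assigns cells to storage nodes with consistent hashing, a standard technique chosen here for illustration rather than the method we propose. When a node joins, only the cells it takes over are moved, and every observation stays retrievable. The cell size, node names, and class names are assumptions.

```python
import bisect
import hashlib
import math

def stable_hash(key):
    """64-bit hash that is stable across processes (unlike Python's built-in hash)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

def cell_key(lat, lon, cell_deg=0.01):
    """Grid cell of a GPS position; the cell size in degrees is an assumption."""
    return f"{math.floor(lat / cell_deg)}:{math.floor(lon / cell_deg)}"

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []                         # sorted (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):           # virtual nodes smooth the load
            bisect.insort(self.ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First virtual node clockwise from the key's hash (wrapping around).
        idx = bisect.bisect_left(self.ring, (stable_hash(key),))
        return self.ring[idx % len(self.ring)][1]

class ObservationStore:
    """Observations grouped by cell and spread over nodes: node -> {cell -> [obs]}."""
    def __init__(self, nodes):
        self.ring = ConsistentHashRing(nodes)
        self.storage = {node: {} for node in nodes}

    def put(self, lat, lon, observation):
        key = cell_key(lat, lon)
        self.storage[self.ring.node_for(key)].setdefault(key, []).append(observation)

    def get(self, lat, lon):
        key = cell_key(lat, lon)
        return self.storage[self.ring.node_for(key)].get(key, [])

    def add_node(self, node):
        """Join a node; only cells whose owner changes are moved (to the new node)."""
        self.ring.add_node(node)
        self.storage[node] = {}
        for old, cells in self.storage.items():
            for key in list(cells):
                owner = self.ring.node_for(key)
                if owner != old:
                    self.storage[owner][key] = cells.pop(key)

store = ObservationStore(["node_1", "node_2", "node_3"])
positions = [(48.70 + 0.013 * i, 9.10 + 0.007 * i) for i in range(200)]
for i, (lat, lon) in enumerate(positions):
    store.put(lat, lon, i)
cells_before = {n: set(c) for n, c in store.storage.items()}
store.add_node("node_4")
```

Virtual nodes spread each physical node over many ring positions, which evens out the number of cells per node; replication and range queries over neighboring cells would be natural next steps but are outside this sketch.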
- Adaptable Pervasive Flow Ensembles