NSF SPX: Collaborative Research: NG4S: A Next-generation Geo-distributed Scalable Stateful Stream Processing System.
Today, large scale cloud organizations are deploying datacenters and “edge” clusters globally to provide their users low-latency access to their services. Running stream applications across these geo-distributed sites are emerging as a daily requirement, such as making business decisions from marketing streams, identifying spam campaigns from social network streams, and analyzing existing genomes in different labs and countries to track the sources of a potential epidemic. However, while the progress has been encouraging, the existing efforts have dominantly centered around stateless stream processing, leaving another urgent trend-stateful stream processing-much less explored. A driving need is that the next-generation stream applications need to store and update state along with their processing, and process live data streams in a timely fashion from massive and geo-distributed data sets. Unfortunately, existing systems are mainly designed for stateless stream processing in low-latency intra-datacenter settings and do not scale well for running stream applications that contain large distributed states, suffering a significantly centralized bottleneck and high latency. The goal of this project is to build a geo-distributed scalable stateful stream processing system.
This project will break three prevailing abstractions and redefine them to collectively deliver a next-generation geo-distributed scalable stateful stream processing system. First, we will break the prevailing centralized “one master/many workers” paradigm abstraction, introducing a decentralized “many masters/many slaves” paradigm abstraction that will revolutionarily improve the scalability of stream processing systems. Second, we will break the prevailing stateless operator abstraction, redefining it as new stateful operator abstraction that serves as the basic building blocks of the proposed system. The key innovation here is a new in-memory data structure for storing state and amortizing the memory overhead so as to handle the “big data” need. Third, we will break the prevailing replication-based or checkpointing-based lineage recovery abstraction, introducing a new shard-based parallel recovery abstraction that will handle failures and stragglers in a scalable way to ensure system reliability. Fourth, we will implement the proposed system on a real-life stream processing system (Apache Storm), and validate it by performing real-world experiments, in connection and consultation with the industry partners (two collaboration letters from Citrix and Akamai are included).
Figure 1: Motivating applications illustrate the need for stateful stream processing systems.
Figure 2: FP4S system design [Paper Download].
- Hailu Xu*, Pinchao Liu*, Susana Cruz-Diaz, Dilma Da Silva, and Liting Hu, (* co-first authors) “SR3: Customizable Recovery for Stateful Stream Processing Systems”, In Proceedings of IFIP/ACM Middleware 2020, December 2020.
- Pinchao Liu, Hailu Xu, Dilma Da Silva, Qingyang Wang, Sarker Tanzir Ahmed, and Liting Hu “FP4S: Fragment-based Parallel State Recovery for Stateful Stream Applications”, In Proceedings of 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2020), May 2020. [PDF] [PPT Slides]
- Hailu Xu, Liting Hu, Pinchao Liu, and Boyuan Guan, “Exploiting the Spam Correlations in Scalable Online Social Spam Detection”, In Proceedings of 2019 International Conference on Cloud Computing (CLOUD 2019), June 2019.（Best Student Paper Award）[PDF] [PPT Slides]
- Hailu Xu, Liting Hu, Pinchao Liu, Yao Xiao, Wentao Wang, Jai Dayal, Qingyang Wang, and Yuzhe Tang, “Oases: An Online Scalable Spam Detection System for Social Networks”, In Proceedings of the 11th International Conference on Cloud Computing (IEEE Cloud), July 2018. [PDF] [PPT Slides]
- Hailu Xu, Boyuan Guan, Pinchao Liu, William Escudero, and Liting Hu, “Harnessing the Nature of Spam in Scalable Online Social Spam Detection”, In The 2018 International Workshop on Big Social Media Data Management and Analysis, in conjunction with IEEE Big Data, December 2018. [PDF] [PPT Slides]
- Xin Chen, Ymir Vigfusson, Douglas M. Blough, Fang Zheng, Kun-Lung Wu, and Liting Hu, “GOVERNOR: Smoother Stream Processing Through Smarter Backpressure”, In Proceedings of the 14th IEEE International Conference on Autonomic Computing (ICAC), July 2017. [PDF]
- FP4S [Demo Video Link]
All codes are provided under GNU General Public License (GPL) or as a web-service, which guarantees your freedom to use the software for academic purposes. For more information, help or comments please contact Dr. Hu.
- FP4S: Fragment-based Parallel State Recovery for Stateful Stream Applications [Source Code Download]
FP4S, a novel fragment-based parallel state recovery mechanism that can handle many simultaneous failures for a large number of concurrently running stream applications. The novelty of FP4S is that we organize all the application’s operators into a distributed hash table (DHT) based consistent ring to associate each operator with a unique set of neighbors.
- Curriculum Development Activities
- Course 6913: Advanced Topics in Information Processing [Video Playlist Link]
- Stream Processing Systems Tutorials
This work is supported by the National Science Foundation (NSF) under NSF-SPX-1919126.