Synopses & Reviews
Traditional database management systems are built on the concept of persistent data sets that are stored reliably in stable storage and queried or updated several times throughout their lifetime. For several emerging application domains, however, data arrives and needs to be processed on a continuous (24x7) basis, and these domains need data-processing algorithms and systems that work over continuous data streams. This book is the first in-depth treatment of this important topic, covering basic data stream techniques, data stream synopses, mining data streams, advanced data stream computations, and systems and architectures for data stream management systems.
This volume focuses on the theory and practice of data stream management, and the novel challenges this emerging domain poses for data-management algorithms, systems, and applications. The collection of chapters, contributed by authorities in the field, offers a comprehensive introduction to both the algorithmic and theoretical foundations of data streams and the streaming systems and applications built in different domains.
A short introductory chapter provides a brief summary of some basic data streaming concepts and models, and discusses the key elements of a generic stream query processing architecture. Subsequently, Part I focuses on basic streaming algorithms for some key analytics functions (e.g., quantiles, norms, join aggregates, heavy hitters) over streaming data. Part II then examines important techniques for basic stream mining tasks (e.g., clustering, classification, frequent itemsets). Part III discusses a number of advanced topics on stream processing algorithms, and Part IV focuses on system and language aspects of data stream processing with surveys of influential system prototypes and language designs. Part V then presents some representative applications of streaming techniques in different domains (e.g., network management, financial analytics). Finally, the volume concludes with an overview of current data streaming products and new application domains (e.g., cloud computing, big data analytics, and complex event processing), and a discussion of future directions in this exciting field.
The book provides a comprehensive overview of core concepts and technological foundations, as well as various systems and applications, and is of particular interest to students, lecturers and researchers in the area of data stream management.
About the Author
Minos Garofalakis is a Member of Technical Staff at the Internet Management Research Department of Bell Labs, Lucent Technologies. He received his BSc in 1992 from the Computer Engineering and Informatics Dept. of the University of Patras (UOPCEID). He also spent the following year at UOPCEID as a post-graduate fellow. In the Fall of 1993, he joined the graduate program in Computer Sciences at the University of Wisconsin-Madison, where he received his MSc (1994) and PhD (1998). He joined Bell Labs in Murray Hill, NJ, in September 1998. Minos' current research interests lie in the areas of data streaming, approximate query processing, data mining, network management, and XML databases. His writings have appeared in a number of ACM and IEEE conferences and journals, and he has presented tutorials on data streaming and approximate query processing at leading international database and data-mining conferences. He is a member of ACM and IEEE, and has served as a program committee member for several conferences in the database area, including ACM SIGMOD, VLDB, ACM SIGKDD, and IEEE ICDE.
Johannes Gehrke is an Assistant Professor in the Department of Computer Science at Cornell University. He joined Cornell after completing his PhD at the University of Wisconsin-Madison in 1999. Johannes' research lies in the areas of data mining, database systems, and ubiquitous computing. The recipient of an Alfred P. Sloan Fellowship, a National Science Foundation Career Award, an IBM Faculty Award, and the Cornell College of Engineering James and Mary Tien Excellence in Teaching Award, Johannes is the author of numerous publications on data mining and database systems. Johannes is the co-author of the textbook "Database Management Systems" (currently in its third edition), published by McGraw-Hill in 2002, which is used at universities all over the world. Johannes has given tutorials on data mining and data stream processing at several international conferences and on Wall Street, and he has participated in federal data mining activities organized by the National Academies and the Office of Science and Technology Policy of the President of the United States.
Rajeev Rastogi is the Director of the Internet Management Research Department at Bell Labs, Lucent Technologies. He received the BTech degree in Computer Science from the Indian Institute of Technology, Bombay in 1988, and the MSc and PhD degrees in Computer Science from the University of Texas, Austin, in 1990 and 1993, respectively. He joined Bell Labs in Murray Hill, NJ, in 1993 and became a Distinguished Member of Technical Staff (DMTS) in 1998. Rajeev Rastogi is active in the field of databases and has served as a program committee member for several conferences in the area. He currently serves on the editorial board of IEEE Transactions on Knowledge and Data Engineering. His writings have appeared in a number of ACM and IEEE publications and other professional conferences and journals. His research interests include database systems, network management, storage systems and knowledge discovery. His most recent research has focused on the areas of network topology discovery, monitoring, configuration and provisioning, data mining, and high-performance transaction systems.
Table of Contents
Part I: Introduction.- Part II: Computation of Basic Stream Synopses.- Part III: Mining Data Streams.- Part IV: Advanced Topics.- Part V: Systems and Architectures.- Part VI: Applications.