PTCCS368 Stream Processing Syllabus – Anna University Part-Time Regulation 2023

COURSE OBJECTIVES:

• Introduce data processing terminology, definitions, and concepts
• Define the different types of data processing
• Explain the concepts of real-time data processing
• Select appropriate structures for designing and running real-time data services in a business environment
• Illustrate the benefits and drive the adoption of real-time data services to solve real-world problems

UNIT I FOUNDATIONS OF DATA SYSTEMS

Introduction to Data Processing, Stages of Data processing, Data Analytics, Batch Processing, Stream processing, Data Migration, Transactional Data processing, Data Mining, Data Management Strategy, Storage, Processing, Integration, Analytics, Benefits of Data as a Service, Challenges
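
For orientation, the short Python sketch below contrasts the batch and stream processing styles introduced in this unit; the transaction amounts and totals are illustrative assumptions, not prescribed course material.

# An illustrative contrast between batch and stream processing: the same
# records are totalled once at the end (batch) and incrementally as each
# record arrives (stream).
records = [120, 75, 300, 50]          # hypothetical transaction amounts

# Batch processing: the full, bounded dataset is available before computation.
batch_total = sum(records)
print("batch total:", batch_total)

# Stream processing: records arrive one at a time and the result is kept
# up to date after every arrival, without waiting for the dataset to end.
running_total = 0
for amount in records:                # stands in for an unbounded event stream
    running_total += amount
    print("running total so far:", running_total)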

UNIT II REAL-TIME DATA PROCESSING

Introduction to Big Data, Big Data infrastructure, Real-time Analytics, Near real-time solution, Lambda Architecture, Kappa Architecture, Stream Processing, Understanding Data Streams, Message Broker, Stream Processor, Batch & Real-time ETL tools, Streaming Data Storage
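
The minimal, library-free Python sketch below illustrates the message broker / stream processor pattern covered in this unit; the in-process queue standing in for a broker, the event fields, and the per-user counting are illustrative assumptions rather than any prescribed tooling.

# A toy "message broker -> stream processor" pipeline. The broker is
# simulated with an in-process queue; the processor keeps running state
# (a per-user event count) as records arrive one at a time.
from queue import Queue
from collections import defaultdict

broker = Queue()                      # stands in for a message broker topic
counts = defaultdict(int)             # stream processor state

# Producer side: publish a few click events to the "topic".
for event in [{"user": "u1"}, {"user": "u2"}, {"user": "u1"}, None]:
    broker.put(event)                 # None marks end-of-stream for the demo

# Consumer / stream processor side: read events and update state incrementally.
while True:
    event = broker.get()
    if event is None:                 # end-of-stream sentinel
        break
    counts[event["user"]] += 1        # stateful, record-at-a-time processing

print(dict(counts))                   # {'u1': 2, 'u2': 1}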

UNIT III DATA MODELS AND QUERY LANGUAGES

Relational Model, Document Model, Key-Value Pairs, NoSQL, Object-Relational Mismatch, Many-to-One and Many-to-Many Relationships, Network Data Models, Schema Flexibility, Structured Query Language, Data Locality for Queries, Declarative Queries, Graph Data Models, Cypher Query Language, Graph Queries in SQL, The Semantic Web, CODASYL, SPARQL
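
The standard-library Python sketch below contrasts the relational and document models discussed in this unit: a many-to-one relationship expressed once with a SQL join and once as a nested, self-contained document (the data locality trade-off). The schema and data are hypothetical.

# Relational model: a many-to-one relationship via a foreign key, queried in SQL.
import sqlite3, json

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER)")
db.execute("INSERT INTO regions VALUES (1, 'Chennai')")
db.execute("INSERT INTO users VALUES (10, 'Asha', 1)")
row = db.execute(
    "SELECT users.name, regions.name FROM users "
    "JOIN regions ON users.region_id = regions.id"
).fetchone()
print(row)                            # ('Asha', 'Chennai')

# Document model: the same information nested in one self-contained document,
# giving data locality for reads at the cost of duplicating region details.
document = {"id": 10, "name": "Asha", "region": {"id": 1, "name": "Chennai"}}
print(json.dumps(document))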

UNIT IV EVENT PROCESSING WITH APACHE KAFKA

Apache Kafka, Kafka as an Event Streaming Platform, Events, Producers, Consumers, Topics, Partitions, Brokers, Kafka APIs, Admin API, Producer API, Consumer API, Kafka Streams API, Kafka Connect API
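
The sketch below exercises the Producer and Consumer APIs listed in this unit. It assumes the third-party kafka-python client and a broker reachable at localhost:9092; the topic name is illustrative.

# Publish a few events to a topic and read them back, printing each record's
# partition and offset.
from kafka import KafkaProducer, KafkaConsumer

TOPIC = "clickstream"                               # hypothetical topic name

# Producer API: send byte payloads to the topic; the broker assigns partitions.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
for payload in [b"page_view:u1", b"page_view:u2"]:
    producer.send(TOPIC, payload)
producer.flush()                                    # wait until sends complete

# Consumer API: read the topic from the beginning and stop when idle.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,                       # end iteration after 5 s idle
)
for record in consumer:
    print(record.partition, record.offset, record.value)

With a local broker running, the topic can be created beforehand (for example with the kafka-topics.sh tool shipped with Kafka) or left to the broker's auto-creation setting.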

UNIT V REAL-TIME PROCESSING USING SPARK STREAMING

Structured Streaming, Basic Concepts, Handling Event-time and Late Data, Fault-tolerant Semantics, Exactly-once Semantics, Creating Streaming Datasets, Schema Inference, Partitioning of Streaming datasets, Operations on Streaming Data, Selection, Aggregation, Projection, Watermarking, Window operations, Types of Time windows, Join Operations, Deduplication
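
The PySpark Structured Streaming sketch below touches several topics from this unit: event time, watermarking, and a tumbling-window aggregation. It uses the built-in rate source so it runs without external input; the window and watermark durations are illustrative.

# Count synthetic events per 30-second event-time window, tolerating data
# up to 10 seconds late, and print running results to the console.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, count

spark = SparkSession.builder.appName("StructuredStreamingSketch").getOrCreate()

# Unbounded input: the rate source emits rows with (timestamp, value) columns.
events = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Watermark late data by 10 seconds, then aggregate per tumbling window.
windowed = (
    events
    .withWatermark("timestamp", "10 seconds")
    .groupBy(window("timestamp", "30 seconds"))
    .agg(count("value").alias("events"))
)

# Console sink in update mode: only windows whose counts changed are shown
# at each trigger.
query = (
    windowed.writeStream
    .outputMode("update")
    .format("console")
    .option("truncate", False)
    .start()
)
query.awaitTermination(60)            # run for about a minute for the demo
spark.stop()

Switching the sink to append mode would instead emit each window once, after its watermark has passed.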

30 PERIODS
PRACTICAL EXERCISES: 30 PERIODS

1. Install MongoDB
2. Design and implement a simple application using MongoDB
3. Query the designed application using MongoDB
4. Create an event stream with Apache Kafka
5. Create a real-time stream processing application using Spark Streaming
6. Build a micro-batch application
7. Implement real-time fraud and anomaly detection
8. Implement real-time personalization, marketing, and advertising

COURSE OUTCOMES:

CO1: Understand the applicability and utility of different streaming algorithms.
CO2: Describe and apply current research trends in data-stream processing.
CO3: Analyze the suitability of stream mining algorithms for data stream systems.
CO4: Program and build stream processing systems, services, and applications.
CO5: Solve problems in real-world applications that process data streams.

TOTAL: 60 PERIODS
TEXT BOOKS

1. Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing by Tyler Akidau, Slava Chernyak, and Reuven Lax, O’Reilly Media
2. Designing Data-Intensive Applications by Martin Kleppmann, O’Reilly Media
3. Practical Real-Time Data Processing and Analytics: Distributed Computing and Event Processing Using Apache Spark, Flink, Storm, and Kafka by Shilpi Saxena and Saurabh Gupta, Packt Publishing

REFERENCES

1. https://spark.apache.org/docs/latest/streaming-programming-guide.html
2. https://kafka.apache.org/documentation/