Demystifying Stream Processing with Apache Kafka

Coming Up Next

Date    Title                                                            Speaker
12/1    A Practical Guide To Selecting A Stream Processing Technology    Michael Noll
12/15   Streaming in Practice: Putting Apache Kafka in Production        Roger Hoover


Editor's Notes

  1. Unbounded, unordered, large-scale data sets are increasingly common in day-to-day business (web logs, mobile usage statistics, sensor networks, database changelogs… stream data is everywhere). Alongside, there is an increasing interest in faster processing of this stream data: the ability to make sense of your data faster than batch processing allows. That is where stream processing comes in, and this talk is about demystifying stream processing and introducing the stream processing capability in Kafka.
  2. We are going to talk a lot about stream processing today, but let's first attempt to define it. There are many ways of defining stream processing; I'm going to define it as a programming model that allows you to process unbounded datasets.
  3. So, let's look into programming paradigms for a bit. There are many ways of defining paradigms for programming, but for today I'm going to use a specific one: the way your application gets its input and produces output. Along the input/output dimension, there are three paradigms. The systems most of us work on and are familiar with are request/response systems. People are also familiar with batch systems: your Hadoop and DWH clusters. The third paradigm is stream processing. I'll talk about each of these briefly.
  4. Request/response systems are synchronous and tightly coupled, latency is really important and there is a higher risk of correlated or cascading failures. You send a request to a service and wait for a response. The way you scale that is by having several instances of the service.
  5. On the other end of the spectrum, there are batch systems. The big difference from request/response is the expectation around latency and the looser coupling. You send all your input and wait for the system to crunch all that data; once it is done, it sends all output at once, possibly hours or sometimes days later. Essentially, batch systems view data as being bounded in nature.
  6. So, coming to stream processing: this is somewhere in between request/response and batch. Here, you send some input and get some output. The definition of "some" is left to the program. The input could be one thing or everything. The output might be available at variable times too: maybe you get one output item for every input item, or one output item for every n input items. It is a generalization of the two extremes. The fundamental shift is that stream processing views data as unbounded streams. It accepts the fact that you never really know when your data is complete.
  7. A commonly misunderstood thing about stream processing systems is the notion that they are lossy or inaccurate. That really isn't true. A lot of stream processing systems have been that way, but the reason is a weakness of those systems, not an inherent property of the stream processing paradigm.
  8. The reason this misunderstanding exists is that there are various tricky tradeoffs involved. While processing unbounded data sets, the tradeoffs are correctness, cost and latency. Oftentimes, stream processing systems are designed to optimize for some tradeoff (in particular latency) instead of being designed to offer controls over those tradeoffs. Some applications might care about correctness, like billing, while some may not have to, like log processing. Some might need to be low latency, like alerts, and some are OK with higher latency, like ETL. Some might not be OK with the cost required to optimize along the other two dimensions. The claim I'm making is that you can absolutely make stream processing systems compute accurate results just like batch systems can, and do that efficiently.
  9. This talk is about stream processing at a company-wide scale. I’m interested in talking about how the entire company’s business logic and applications can be represented as streams of data. And that is a natural way of viewing applications. For every application, you have streams of input, some processing that maintains state and then streams of output. Through the rest of this talk, we are going to work through this example from the retail industry.
  10. At the top are inputs, which in the retail case are sales of things and shipments of things. Both of these inputs are never-ending, so you can represent them as a stream of continuous sales and a stream of continuous shipments. These streams are useful for a number of things downstream: feed DWH/Hadoop for analytics, index in a search system, or send to a monitoring system that flags fraudulent activity. Then you have two kinds of processing that need to happen continuously: price adjustments and inventory adjustments, both driven by demand data available through the sales and shipments streams. So if some item sells a lot, you want to do inventory adjustments to stock the stores with more of those items. Similarly, if the demand for an item increases, you'd like to do some price adjustments to take advantage of the increased demand. And the sooner the adjustments happen, the better it is for retail businesses.
  11. Looking at this picture a little differently, at the top are the things that fall in the “what happened” category, essentially those are events that are interesting to retail businesses.
  12. The rest of the things are merely reactions to those events where you do something – index in a search system or store in HDFS or DWH or store for query by monitoring systems.
  13. Stream processing is nothing but some functions applied on top of this “what happened” sort of stream event data.
  14. And since Kafka is the ubiquitous technology of choice for storing and moving stream data, stream processing broadly means writing functions on Kafka events.
  15. Kafka has been around for a while and has been widely adopted across thousands of companies over the last five years, since my colleagues and I first conceived the system at LinkedIn. Since then, a number of stream processing approaches have emerged that are built on top of Kafka.
  16. The first is do-it-yourself stream processing, where you use the basic Kafka libraries and do the processing part yourself. If you go down this path, there are a number of hard problems that you should know you'd have to deal with.
  17. Those hard problems are:
      - Ordering: ensuring data is processed in order, while also having the ability to horizontally scale by partitioning both the data and the processing.
      - Fault tolerance: ensuring that you have guarantees on your processing even as machines fail.
      - State management: operations that span records, like aggregates, need stream processing operators to maintain state, and that state also has to be fault-tolerant.
      - Reprocessing: if you fix a bug in your stream processing logic or upgrade your application, you need to reprocess past data to reflect the changed logic.
      - Last but not least, time: the treatment of time is crucial to correctness in stream processing.
      In the remainder of this talk, I will go through each of these and how Kafka and its Streams API address them.
  18. That was the DIY approach, where you just subscribe to Kafka data and do it yourself. The other approach is to use one of the many stream processing frameworks out there. Spark has a streaming module which is quite cool, there is Storm, there is Samza which originated at LinkedIn, and there is Flink. All of these work with and are deployed alongside Kafka; the goal of these frameworks is to let users express their stream processing operations through convenient APIs.
  19. These frameworks are pretty cool and there is a ton of innovation happening in this space. A common property underlying their design is making a faster MapReduce. For this reason, most of the stream processing frameworks, including the one we previously built at LinkedIn, called Samza, share a few traits:
      - A custom way to configure your processing code as the properties of a job.
      - A custom mechanism to package and deploy your code.
      - Resource management: placing processes on machines in an optimal manner.
      This approach to stream processing actually works pretty well for existing workloads that run on Hadoop. The advantage is that you can make your Hadoop workloads go much faster with minimal changes. These systems are a good fit for long-running queries or iterative processing on a central cluster, like what is required for machine learning or graph processing. However, a lot of stream processing happens as part of a company's core business logic and applications. In order to make stream processing accessible to application developers, you have to let developers use stream processing operators with the tools of their choice. That is what we learnt while developing one of these stream processing frameworks, Samza, and getting it adopted at LinkedIn. What developers wanted was to write stream processing applications; what we gave them was a specific way to write their code as a job, ask the Samza team to allocate resources, and package, monitor and deploy their code in a specific way on Samza. That didn't work so well.
  20. The reason was that developers wanted to use tools of their choice. For instance, there are several ops tools for deployment alone: Puppet, Chef, and the more recent ones, from Docker to Kubernetes. The ecosystem of deployment tools is thriving, and dictating one, the way the stream processing frameworks do, doesn't work for application developers.
  21. Since our goal is to enable stream processing as a general-purpose application development paradigm, a few things are important here: 1. It should have few moving parts and external dependencies; essentially, something that can be embedded in applications. 2. It should allow developers to use tools of their choice for configuring, packaging and deploying stream processing code.
  22. Essentially, designing it as a library achieves both those goals for application developers.
  23. No surprise, Kafka Streams is designed to be a library on top of Kafka and has no other external dependencies. The reason is that Kafka already provides the foundational primitives that are required for stream processing.
  24. There are two interfaces: a low-level callback API and a DSL.
  25. This is the code that you write to build one of these. There is a main method; you set config to tell it where to connect to Kafka; you express your computation using the DSL operators; then you call start. That is basically it. As a user, you don't have to worry about packaging or deploying this code a certain way, or about how it scales: the scale-out and partitioning are handled transparently by the Streams library. You can put this code in a Docker image and run it on Mesos, or run it as-is on bare metal. It doesn't matter.
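      A minimal sketch of such an application against a recent Streams API (the application id, topic names and the toy transformation are illustrative, not from the deck):

          import java.util.Properties;
          import org.apache.kafka.common.serialization.Serdes;
          import org.apache.kafka.streams.KafkaStreams;
          import org.apache.kafka.streams.StreamsBuilder;
          import org.apache.kafka.streams.StreamsConfig;
          import org.apache.kafka.streams.kstream.KStream;

          public class MyStreamApp {
              public static void main(String[] args) {
                  Properties props = new Properties();
                  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-stream-app");     // hypothetical id
                  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // where Kafka lives
                  props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
                  props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

                  StreamsBuilder builder = new StreamsBuilder();
                  KStream<String, String> input = builder.stream("input-topic");       // hypothetical topic
                  input.filter((key, value) -> value != null)
                       .mapValues(String::toUpperCase)
                       .to("output-topic");                                            // hypothetical topic

                  // No cluster, no job submission: partition assignment and scale-out
                  // happen inside the library when you run more instances of this main().
                  new KafkaStreams(builder.build(), props).start();
              }
          }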
  26. With that basic understanding of what Streams is, let’s get back to the stream processing problems I previously described to see how Kafka’s Streams API addresses them.
  27. Those who know Kafka will be familiar with the log abstraction. This is what the storage backend of Kafka is based on: a persistent, replicated, append-only log, where every record has a unique offset that identifies it. Writes are always appends. Readers can use the offset to start reading from any point in the log and scan ahead in order. The log abstraction offers the ordering property that is required for stream processing.
  28. Physically, if you want to scale out a log, you can shard it into multiple partitions.
  29. And if you did that, then that is exactly the backend of Kafka: logically, a log is a topic or category of data, and physically the log or topic lives in partitions on several machines, or brokers. As events occur, we continually append them to the partition's log. And we have a policy for maintaining a window of the log, either based on time (say, retain for a week or two) or based on size (retain 1 TB or less).
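      Both retention policies are per-topic settings. A sketch with the Java AdminClient (a client that postdates this 0.10-era deck; the topic name, partition counts and limits are made up):

          import java.util.List;
          import java.util.Map;
          import java.util.Properties;
          import org.apache.kafka.clients.admin.AdminClient;
          import org.apache.kafka.clients.admin.AdminClientConfig;
          import org.apache.kafka.clients.admin.NewTopic;
          import org.apache.kafka.common.config.TopicConfig;

          public class CreateSalesTopic {
              public static void main(String[] args) throws Exception {
                  Properties props = new Properties();
                  props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                  try (AdminClient admin = AdminClient.create(props)) {
                      // 12 partitions spread the log across brokers; keep two weeks or ~1 TB.
                      NewTopic sales = new NewTopic("sales", 12, (short) 3).configs(Map.of(
                              TopicConfig.RETENTION_MS_CONFIG, String.valueOf(14L * 24 * 60 * 60 * 1000),
                              TopicConfig.RETENTION_BYTES_CONFIG, String.valueOf(1_000_000_000_000L)));
                      admin.createTopics(List.of(sales)).all().get();
                  }
              }
          }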
  30. There are two parts to scalability and parallelism. The first is data scalability, which we just discussed; the second is processing scalability. Kafka has a generic group management facility that allows a group of processes (essentially your application) to subscribe to a partitioned resource, i.e. a topic. The powerful thing is that Kafka handles load balancing and distribution of partitions to the different application instances, so you can easily scale consumption out.
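      A sketch of that facility with the plain Java consumer (group id and topic are illustrative). Every instance started with the same group.id automatically gets its share of the topic's partitions:

          import java.time.Duration;
          import java.util.List;
          import java.util.Properties;
          import org.apache.kafka.clients.consumer.ConsumerConfig;
          import org.apache.kafka.clients.consumer.ConsumerRecord;
          import org.apache.kafka.clients.consumer.KafkaConsumer;
          import org.apache.kafka.common.serialization.StringDeserializer;

          public class SalesWorker {
              public static void main(String[] args) {
                  Properties props = new Properties();
                  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                  props.put(ConsumerConfig.GROUP_ID_CONFIG, "sales-workers"); // shared by all instances
                  props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
                  props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
                  try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                      consumer.subscribe(List.of("sales"));
                      while (true) {
                          // Kafka rebalances partitions across the group as instances come and go.
                          for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                              System.out.printf("partition %d, offset %d: %s%n",
                                      record.partition(), record.offset(), record.value());
                          }
                      }
                  }
              }
          }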
  31. This is as true for plain Kafka consumers as it is for processing topologies written using the Streams API. If you look here, the consumer is replaced by a topology.
  32. You might start with one instance of your application, in which case all partitions are assigned to that one instance.
  33. If you start more instances of your application, Kafka’s Streams API just replicates the topology instances and handles the load balancing by distributing the partitions evenly amongst the application instances that embed the Streams library.
  34. In addition to being easy to scale, processing is also fault tolerant.
  35. If you change your mind and downscale from three instances to one again, it automatically detects that there are fewer instances and balances partitions amongst them again. So it is operationally cheap to process large amounts of data and autoscale your application using the tool of your choice.
  36. Let's now look at state management, starting with why it comes up with respect to stream processing.
  37. Here are some common operations that are useful for processing streams of data. Operators like filter and map that process one record at a time may not need to maintain any state beyond that record; once the record is processed, it is done and forgotten. Such operators are stateless. Then there are operators like joins or windowed aggregations, where you have to hold onto data across several records in order to compute the intended operation. Such operators are stateful.
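      In DSL terms (a fragment assuming a builder like the one in the earlier sketch, with the topic, key layout and Long value serde as assumptions):

          // Stateless: each record is inspected on its own and then forgotten.
          KStream<String, Long> sales = builder.stream("sales-counts");
          KStream<String, Long> bigSales = sales.filter((itemId, count) -> count > 100);

          // Stateful: counting sale events per item must remember past records,
          // so the Streams API backs this operator with a fault-tolerant state store.
          KTable<String, Long> salesPerItem = sales.groupByKey().count();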
  38. So we need state, and there are two choices for where a stream processing application stores it. The most common is to leverage some kind of external K/V store or database. Here, for every record or group of records that needs that state, an external RPC has to be made to the K/V store, and this creates an impedance mismatch: Kafka can process hundreds of thousands of records per second, while an external database accessed this way can only handle a few thousand requests per second. There is also a lack of isolation: one fast processor can overwhelm the database shared by others.
  39. The other option is to chop the state up into partitions and push it inside the processor. This is what Kafka's Streams API offers. The state database is partitioned the same way as the input streams, so the data required for processing is available locally on an instance. There are several advantages to doing this. Accessing local state is an order of magnitude faster. It offers better isolation: one fast processor can't overload a K/V store that is also used by other live services. And it is incredibly flexible: you can embed a write-optimized or read-optimized data structure, as the state is fully pluggable. Kafka Streams currently ships with an in-memory store and a RocksDB one, but there is no reason you can't write your own data structure optimized for the processing pattern of your application.
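      Plugging a store in is a one-line change in the DSL. A sketch against a later-than-0.10 API (store names are illustrative):

          // Default: a RocksDB-backed local store, named so it can be queried later.
          KTable<String, Long> counts = sales.groupByKey()
                  .count(Materialized.as("sales-counts-store"));

          // Swapping in the bundled in-memory store instead:
          KTable<String, Long> countsInMem = sales.groupByKey()
                  .count(Materialized.<String, Long>as(Stores.inMemoryKeyValueStore("sales-counts-mem")));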
  40. Kafka Streams offers local state that is also fault-tolerant. If one instance fails, the local state shards it hosted are evenly redistributed amongst the other instances. You might wonder how the Streams API does that. It makes each local state partition highly available by transparently writing every update made to a processor's state store to a special, highly-available Kafka topic. Even as app instances come and go, the Kafka changelog topic remains available, so the app can recreate the state when it starts on a new machine by just reading from the changelog topic.
  41. Here's how this capability works under the covers. All of this is possible due to the log compaction feature in Kafka, which is what allows you to write every update made to a store to a Kafka topic without running out of space. Assume that every update to your state database turns into a message with a key and a value. If a row is frequently updated, there will be many messages with the same key (because each update turns into a message). On the other hand, if a row is never updated or deleted, it just stays unchanged in Kafka forever; it is never garbage-collected. Kafka's log compaction sorts this out and garbage-collects the old values so that we don't waste disk space, since you only care about the latest value. This means that with log compaction, every row that exists in the database also exists in Kafka; a row is only removed from Kafka after it is overwritten or deleted in the database. In other words, if your app instance fails, you can just go to the Kafka topic that contains a complete copy of the entire database and recreate your state.
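      Compaction is just a per-topic cleanup policy. Streams creates its changelog topics itself, so this sketch only illustrates the setting (the name is made up):

          // Keyed like the store; "compact" keeps the latest value per key forever.
          NewTopic changelog = new NewTopic("inventory-app-store-changelog", 12, (short) 3)
                  .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                  TopicConfig.CLEANUP_POLICY_COMPACT));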
  42. The log abstraction along with the log compaction feature also helps solve another hard problem for stream processing - which is reprocessing data.
  43. Depending on what you are trying to do, reprocessing might involve working on a particular window of data or on the entire history. Since the Kafka log is persistent, it is also replayable. Let's say you rolled out your app and found a bug a day later. You might have produced incorrect results for the last 24 hours, and to fix that you'd want to deploy a new version of your app and also reprocess the last 24 hours' worth of data in the Kafka log. This support is still evolving, but it is readily available through the Streams API. How do you do reprocessing with the log abstraction in Kafka? We know that for keyed data the log compaction capability lets you retain all events from the beginning of time in your Kafka log. If so, then reprocessing is simple: you just go back to the beginning of time, re-read and re-process. As the notion of time in Kafka is the offset, this translates into setting the position in the log to 0 and scanning ahead. And this can happen in parallel, because these topics are multi-subscriber. The next few slides walk through how this plays out in production; a config sketch follows them.
  44. Here is how this works in production. So you have your existing application that reads from the tail and maintains some state locally. This is where the reads are happening.
  45. But then you can start another application (with a separate application id) and let it consume from offset 0.
  46. As it consumes, the new instance of the state database it writes to starts filling up…
  47. Until it catches up to the end,
  48. which is when you flip the switch and have the reads happen from the new state; then you shut down your old state. This reprocessing capability ends up being really important in stream processing engines, since in its absence you end up having to depend on Hadoop to reprocess data, essentially ending up with a complex architecture.
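      The only changes the new deployment needs are configuration; a sketch (the ids are illustrative):

          Properties props = new Properties();
          // A fresh application.id gives the redeployed app its own consumer group,
          // offsets and state stores, so it can rebuild in parallel with the old one.
          props.put(StreamsConfig.APPLICATION_ID_CONFIG, "inventory-app-v2");
          // Start reading from offset 0 instead of the tail.
          props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");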
  49. Last, and probably the most nuanced of all the problems: the idea of time, and how it plays out in windowing, is worth looking into.
  50. We have paid close attention to modeling time correctly in Kafka Streams, and more improvements will be made in the future.
  51. Our work is influenced by this insight that the Dataflow team shared in one of their papers -- “Stream data is never complete and can always arrive out-of-order”
  52. When dealing with time in stream processing, there are two concepts worth paying attention to: event time (when an event was created) and processing time (when an event was processed). Due to delays or bottlenecks, the two can diverge and converge. The source of the loss of correctness in many stream processing systems is that they conflate these two things, leading to totally incorrect answers.
  53. Let's dive straight into windowing to understand the distinction between event time and processing time. Consider an application that builds a real-time analytics dashboard counting the number of visitors to your website, and assume the time interval, and hence the window size, you are interested in is 15 minutes. Say a mobile user visits the website, and before the event gets sent out, the user's phone loses network coverage, only regaining access 12 hours later. When that event makes it to the application's servers, the correct window to reflect that visit in is not the current 15-minute window (processing time) but the window from 12 hours ago (event time). What you are really windowing by and counting is the time the event occurred, not the time it was received and processed. Essentially, this translates into the stream processing library needing to be able to update past windows. That is the right thing to do, and it is exactly what Kafka's Streams API does; it assumes that windows aren't complete and that late data can arrive.
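      A sketch of such a windowed count in a recent Streams API (ofSizeAndGrace postdates this deck; the topic and sizes are illustrative). Event time comes from the record timestamps, and the grace period lets a record that arrives hours late still land in the past window it belongs to:

          KStream<String, String> pageViews = builder.stream("page-views");
          KTable<Windowed<String>, Long> visitors = pageViews
                  .groupByKey()
                  .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(15), Duration.ofHours(24)))
                  .count();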
  54. Up until now, we talked about how Kafka's Streams API solves some essential problems in stream processing. Now we are going to talk about a unique capability in Streams that models stream processing problems like joins and stateful processing effectively. I'll start by saying that tables and streams are dual; Kafka's Streams API fully integrates the concepts of tables and streams. All this will make a little more sense with an example.
  55. A Kafka message has a key and a value. Now assume that this is your Kafka topic with three messages: (key1, value1), (key2, value2), and a third message that updates the value for key1. Notice that this stream is a little different: it contains updates to previous values in the stream. Now, as a thought exercise, consider mapping each message to a row in a table.
  56. If you did that, here is what the table would look like after every message. It starts out with just one row; the second message adds another row; but the third message actually updates the first row. And so on…
  57. Now go one step further and try to imagine what the changelog for this table looks like; the changelog is a stream that has a message for every update made to the table. What you get back is, again, a stream. So a stream with keys can be converted to a table that in turn represents the same stream. Tables and streams are dual. So what?
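      The DSL exposes the duality directly (topic name illustrative, default serdes assumed):

          // The same topic, read two ways:
          KStream<String, String> asStream = builder.stream("user-updates"); // every message, old values included
          KTable<String, String> asTable = builder.table("user-updates");    // the latest value per key

          // And a table can always be turned back into its changelog stream:
          KStream<String, String> changelog = asTable.toStream();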
  58. First off, the stream-table duality is useful for modeling joins.
  59. In this example, we join two streams: a stream of sales with a stream of shipments. Each stream has messages of the same format (for simplicity): an item ID, the store code, and a count. For the sales stream, the count means the number of items sold in a store; for the shipments stream, it means the number of items stocked in the store. Notice that both streams are changelog streams, so each really represents a table. The joined view gives us a really useful table: basically a real-time view of the inventory on hand for the company.
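      A sketch of that join in the DSL (topics, key layout and serdes are assumptions; the keys here combine item ID and store code):

          // Both topics are changelogs keyed by item+store, so read them as tables.
          KTable<String, Long> sold = builder.table("sales-counts");
          KTable<String, Long> stocked = builder.table("shipment-counts");

          // Inventory on hand = stocked - sold, updated whenever either side changes.
          KTable<String, Long> inventoryOnHand =
                  stocked.join(sold, (stockedCount, soldCount) -> stockedCount - soldCount);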
  60. There is an emergent property of some of the features I just described that may not be obvious: the stream-table duality and local state together enable building stateful services with ease. Here's how.
  61. Back to our inventory-on-hand example application where we were able to compute this joined table.
  62. In reality, this is what the application looks like. Several instances of the application might be running, each assigned a subset of partitions of the sales and shipment stream. And hence, each instance hosts a partition of the inventory-on-hand table.
  63. Now, if we allowed the state table to be queryable, then this is nothing but an inventory state application exposing an API that returns the current count of an item in a particular store in real time, with that count kept updated by the stream join as sales take place and new shipments arrive. This is what the Streams API enables in the latest release, Apache Kafka 0.10.1: not only do you have local state, you have the ability to query it in place. This doesn't make sense in all domains; often you just want to produce your output to an external database you know and trust. But in cases where your service needs to access a lot of data per request, having this data in local memory or in a fast local RocksDB instance can be quite powerful.
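      Querying the local shard in place, with the 0.10.1-era store(name, type) signature (the store name assumes the join was materialized as "inventory-on-hand"; the key is a hypothetical item+store code):

          ReadOnlyKeyValueStore<String, Long> store =
                  streams.store("inventory-on-hand", QueryableStoreTypes.keyValueStore());
          Long onHand = store.get("item42:store7"); // serve this straight from your API handler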
  64. A lot of what stateful applications do is stream processing, and several such independent apps or microservices have to be composed to form a company's business logic. Enabling such loose coupling is exactly what Kafka and its Streams API address.
  65. Continuing to build on our retail example: back to our inventory-on-hand table that is embedded inside the inventory app. Its changelog stream is available in a Kafka topic. Recall that a changelog stream has one message per row that changed in the table.
  66. This changelog stream for the inventory-on-hand table enables two more applications for this retail company. The first needs to reorder inventory in real time: it subscribes to the changelog stream, looks at the latest count value for every item, and reorders a certain quantity if the current value drops below a threshold. The same changelog stream enables another app, the price adjustment app, which consumes the same changelog but computes price adjustments based on demand; for assessing the pricing model, it might have to consult local or remote state. Notice that not only have we built the most important applications for this retail company as stream processors, they are also loosely coupled, asynchronous and stateful services. If one fails, the others keep consuming from the underlying stream and remain unaffected.
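      A sketch of the reorder service as its own tiny Streams app (the threshold, quantity and topic names are made up):

          // An independent service: consumes the inventory changelog, emits reorders.
          KStream<String, Long> inventory = builder.stream("inventory-on-hand-changelog");
          inventory.filter((itemStore, onHand) -> onHand != null && onHand < 10)
                   .mapValues(onHand -> "reorder:100")
                   .to("reorder-requests", Produced.with(Serdes.String(), Serdes.String()));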
  67. There is an underlying theme in Kafka's Streams API, and that is the focus on simplicity.
  68. If stream processing systems don't provide built-in support for, say, local state or reprocessing, users end up with a complex architecture that looks like this. You have Kafka for capturing streams of data. You have to set up the stream processing framework, if you are using one, and then set up your code as a job on that framework. If you are trying to count or aggregate data, you probably have to use some external database to store your intermediate state, so now you have to deploy and operate that as well. You possibly have something else to store the output of your transformation, which might be another database. That is already a lot of moving parts. But then you think: oh shit, what if I change the code that transforms data? I have to go back and reprocess the past results so everything matches the changed logic. So you use Hadoop to pull past data and reprocess it with the same code, but this time in Hadoop or Spark, which populates an offline view. This is known as the Lambda architecture.
  69. The problem is that your downstream application is pretty complex. It has to query both these views, merge the results and then serve them. It works, but it is complicated. The number of distributed thingies that you have committed to is really large.
  70. This is another way to look at what you end up with to deploy your stream processing job.
  71. With Kafka Streams we’ve really tried to make that much simpler with fewer moving parts.
  72. We've taken some aggregates and moved them into the stream processor as local, durable state, so it comes out of the box. We've taken the idea of reprocessing and built primitives inside the stream processor, so you no longer need to depend on an external batch system to support reprocessing. And your local state is queryable, so your app can just read from the local state to display (in this case) a dashboard, but it could be anything else. The result is that you just have your Kafka cluster and your app, and that's it.
  73. OK, you love Kafka Streams. But Kafka Streams is designed to be a Kafka library, which means it can only process data that is already in Kafka, which in turn means you have to think about how to get data in and out of Kafka.
  74. The previous talk in this series was about large-scale, real-time data ingestion using Kafka where we talked about Kafka’s Connect API. It makes it possible for developers to write connectors from external systems to Kafka easily. The idea is to provide a common runtime that does the hard work and allow all types of connectors to share the same behavior and be monitored the same way.
  75. Over the last few months, several dozen of these open-source connectors have become available, so you can connect a pretty large set of sources and sinks to build streaming pipelines in this off-the-shelf way without having to write any code. Once the data is in Kafka, you can process it using the Streams API.
  76. Here is how all this ties into the big picture. This is what some of us put into practice at LinkedIn: the ability to operate this Kafka-based streaming platform as the central nervous system for the company. It is the central bus enabling the development of loosely coupled microservices that use it for messaging, the building block for stateful stream processing applications, and the central feed for all the data going into your data warehouse and Hadoop.
  77. This is our vision at Confluent to make this streaming platform a practical reality by offering a Kafka-based enterprise-ready streaming platform.
  78. This is an exciting and fast-moving space with a ton of innovation happening. To make sense of all the different stream processing layers, you might want to attend the next talk, where you will learn how Kafka's Streams API compares to other stream processing systems that require separate processing infrastructure, and how to go about picking one.