SlideShare ist ein Scribd-Unternehmen logo
1 von 56
Building a real-
time streaming
platform using
Kafka Connect +
Kafka Streams
Jeremy Custenborder, Systems Engineer, Confluent
• Everything in the company is a real-time stream
• > 1.2 trillion messages written per day
• > 3.4 trillion messages read per day
• ~ 1 PB of stream data
• Thousands of engineers
• Tens of thousands of producer processes
Resources
• Confluent
• Company website: http://www.confluent.io
• Blog: http://www.confluent.io/blog
• Free Ebook “Making Sense of Stream Processing”
http://www.confluent.io/making-sense-of-stream-processing-ebook
• Apache Kafka
• http://kafka.apache.org
• Kafka Connect
• http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-
low-latency-data-pipelines
• Kafka Streams
• http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-
made-simple
Thanks!
Jeremy Custenborder | jeremy@confluent.io |
Download Kafka
and Confluent Platform
www.confluent.io/download

Weitere ähnliche Inhalte

Was ist angesagt?

Building Kafka-powered Activity Stream
Building Kafka-powered Activity StreamBuilding Kafka-powered Activity Stream
Building Kafka-powered Activity Stream
Oleksiy Holubyev
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
confluent
 

Was ist angesagt? (20)

Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
 
Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafka
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
 
Event Driven Architectures with Apache Kafka on Heroku
Event Driven Architectures with Apache Kafka on HerokuEvent Driven Architectures with Apache Kafka on Heroku
Event Driven Architectures with Apache Kafka on Heroku
 
Design Patterns for working with Fast Data
Design Patterns for working with Fast DataDesign Patterns for working with Fast Data
Design Patterns for working with Fast Data
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Intro to AsyncAPI
Intro to AsyncAPIIntro to AsyncAPI
Intro to AsyncAPI
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
 
Building Kafka-powered Activity Stream
Building Kafka-powered Activity StreamBuilding Kafka-powered Activity Stream
Building Kafka-powered Activity Stream
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
 

Ähnlich wie Building a real-time streaming platform using Kafka Connect + Kafka Streams

introductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdfintroductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdf
TarekHamdi8
 

Ähnlich wie Building a real-time streaming platform using Kafka Connect + Kafka Streams (20)

Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
 
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
Apache Kafka at LinkedIn - How LinkedIn Customizes Kafka to Work at the Trill...
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
 
Kafka 탄생과 생태계
Kafka 탄생과 생태계Kafka 탄생과 생태계
Kafka 탄생과 생태계
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
 
introductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdfintroductiontoapachekafka-201102140206.pdf
introductiontoapachekafka-201102140206.pdf
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Sas 2015 event_driven
Sas 2015 event_drivenSas 2015 event_driven
Sas 2015 event_driven
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
 
Web Analytics using Kafka - August talk w/ Women Who Code
Web Analytics using Kafka - August talk w/ Women Who CodeWeb Analytics using Kafka - August talk w/ Women Who Code
Web Analytics using Kafka - August talk w/ Women Who Code
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
Data streaming-systems
Data streaming-systemsData streaming-systems
Data streaming-systems
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised Land
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelines
 

Mehr von confluent

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Building a real-time streaming platform using Kafka Connect + Kafka Streams

  • 1. Building a real- time streaming platform using Kafka Connect + Kafka Streams Jeremy Custenborder, Systems Engineer, Confluent
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26. • Everything in the company is a real-time stream • > 1.2 trillion messages written per day • > 3.4 trillion messages read per day • ~ 1 PB of stream data • Thousands of engineers • Tens of thousands of producer processes
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55. Resources • Confluent • Company website: http://www.confluent.io • Blog: http://www.confluent.io/blog • Free Ebook “Making Sense of Stream Processing” http://www.confluent.io/making-sense-of-stream-processing-ebook • Apache Kafka • http://kafka.apache.org • Kafka Connect • http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale- low-latency-data-pipelines • Kafka Streams • http://www.confluent.io/blog/introducing-kafka-streams-stream-processing- made-simple
  • 56. Thanks! Jeremy Custenborder | jeremy@confluent.io | Download Kafka and Confluent Platform www.confluent.io/download

Hinweis der Redaktion

  1. Hi, I’m Neha Narkhede… There is a big paradigm shift happening around the world where companies are moving rapidly towards leveraging data in real-time and fundamentally moving away from batch-oriented computing. But how do you do that? Well that is what today’s talk is about. I’m going to summarize 6 years of work in 15 mins, so let’s get started.
  2. Unordered, unbounded and large-scale datasets are increasingly common in day-to-day business. Stream data means different things for different businesses. For retail, it might mean streams of orders and shipments, for finance, it might mean streams of stock ticker data while for web companies, it might mean streams of user activity data. Stream data is everywhere. At the same time, there is a huge push towards getting faster results: doing instant credit card fraud detection, doing instant credit card payment processing vs only 5 times a day, being able to detect and alert on a problem that causes retail sales to dip in seconds vs a day later (you can only imagine what that would do to retail companies over black Friday)
  3. So the takeaway is that businesses operate in real-time not batch, if you go to a store to buy something, you don’t wait there for several hours to get it. So data processing required to make key business decisions and to operate a business effectively should also happen in real-time. Here are some examples to support that claim…
  4. Event = something that happened. Different for different businesses.
  5. Log files are also event streams. For instance, every line in a log file is an event that in this case tells you how the service is being used.
  6. There is an inherent duality in tables and streams; Traditional databases are all about tables full of state but are not designed to respond to streams of events that modify those tables.
  7. Tables have rows that store the latest value for a unique key. But…no notion of time
  8. If you look at how a table gets constructed over time, you will notice that…
  9. The operations are actually a stream of events where the event is just the operation that modifies the table. Every database does this internally and it is called a changelog
  10. So events are everywhere, what next? We need to fundamentally move to event-centric thinking. For a retail website, there are possibly various avenues that generate the “product view” event. A standard thing to do is to ensure that all product view data ends up in Hadoop so you can run analytics on user interest to power various business functions from marketing to product positioning and so on.
  11. Reality about 100x more complex. In some corner, you are using some messaging system for app-to-app communication. You might have a custom way of loading data from various databases into Hadoop. But then more destinations appear over time and now you have to feed the same data to a search system, various caches etc. This is a common reality and a simplified version. 300 services ~100 databases Multi-datacenter Trolling: load into Oracle, search, etc
  12. The core insight is that a data pipeline is also an event stream.
  13. What you need instead of that scary picture is a central streaming platform at the heart of a datacenter. A central nervous system that collects data from various sources and feeds all other systems and apps that need to consume and process data in real-time. Why does this make sense?
  14. Why is a streaming platform needed? Because data sources and destinations add up over time. Initially you might have just the web app that produces the product view event and maybe you’ve only thought about analyzing it in Hadoop.
  15. But over time, the mobile app shows up that also produces the same data and several more applications as destinations for search, recommendations, security etc. Event centric thinking involves building a forward-compatible architecture. You will never be able to foresee what future apps might show up that will need the same data. So capture it in a central, scalable streaming platform that asynchronously feeds downstream systems.
  16. So how do you build such a streaming platform?
  17. That journey starts with Apache Kafka.
  18. At a high-level, Kafka is a pub-sub messaging system that has producers that capture events. Events are sent to and stored locally on a central cluster of brokers. And consumers subscribe to topics or named categories of data. End-to-end, producers to consumer data flow is real-time.
  19. Magic of Kafka is in the implementation. It is not just a pub-sub messaging system, it is a modern distributed platform… How so?
  20. All that means, you can throw lots of data at Kafka and have it be made available throughout the company within milliseconds. At LinkedIn and several other companies, Kafka is deployed at a large scale…
  21. In the last 5 years since it was open-sourced, it has been widely adopted by 1000s of companies worldwide.
  22. So Kafka is the foundation of the central streaming platform.
  23. Infrastructure is really only as useful as the data it has. The next step moving to a streaming platform based data architecture is solving the ETL problem.
  24. 0.9
  25. REST Apis for management
  26. Core: Data pipeline Venture bet: Stream processing
  27. Most people think they know…
  28. Doesn’t mean you drop everything on the floor if anything slows down Streaming algorithms—online space Can compute median
  29. About how inputs are translated into outputs (very fundamental)
  30. HTTP/REST All databases Run all the time Each request totally independent—No real ordering Can fail individual requests if you want Very simple! About the future!
  31. “Ed, the MapReduce job never finishes if you watch it like that” Job kicks off at a certain time Cron! Processes all the input, produces all the input Data is usually static Hadoop! DWH, JCL Archaic but powerful. Can do analytics! Compex algorithms! Also can be really efficient! Inherently high latency
  32. Generalizes request/response and batch. Program takes some inputs and produces some outputs Could be all inputs Could be one at a time Runs continuously forever!
  33. For some time, stream processing was thought of as a faster map-reduce layer useful for faster analytics, requiring deployment of a central cluster much like Hadoop. But in my experience, I’ve learnt that the most compelling applications that do stream processing look much more like an event-driven microservice and less like a Hive query or Spark job.
  34. Companies == streams What a retail store do Streams Retail - Sales - Shipments and logistics - Pricing - Re-ordering - Analytics - Fraud and theft
  35. Let’s dive into the real-time analytics and apps area
  36. Only one thing you can do if you think the world needs to change, you live in Silicon Valley—quit your job and do it. Mission: Build a Streaming Platform Product: Confluent Platform
  37. Thank you slide. Add to the end of your presentation.