@jaykreps
@apachekafka
@confluentinc
http://kafka.apache.org
http://confluent.io/blog

Speaker notes

  1. Introduce self. KIP-28: stream processing in Kafka. But first…
  2. This talk is going to be about the intersection of two trendy topics: distributed stream processing and microservices. Most people would see these two things as mostly unrelated. Microservices are all about chopping up big applications to enable small, agile teams. Most people, insofar as they think about stream processing at all, think of it as a kind of low-latency version of MapReduce. I want to give a different vision for stream processing and show how it relates to microservices.
  3. My experience in this area came from LinkedIn, where I worked from 2007 to 2014. Over that time we went from a few dozen engineers to several thousand, and from a niche site to a global web property with several backends. How do you scale software engineering? We went with a microservices architecture. We also built Apache Kafka and a stream processing framework, and operated them as a service for the teams there.
  4. So how did the microservices journey go? Pro: it did scale engineering productivity. Con: it became very hard to reason about latency and availability as the number of services grew. (Slide image: Reactive Summit - 2.png)
  5. Whether you have big monolithic apps doing huge amounts of work per request or lots of little microservices, all that work is still synchronous. Non-blocking I/O gives you concurrency but not asynchronicity. Blocking I/O doesn’t work at all: the limit on blocking calls is very small, something like two. Non-blocking I/O helps, but it doesn’t change the fact that your availability and latency depend on the availability and latency of the entire call graph. Leslie Lamport: “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” Testing all these failure modes is a huge pain. Services aren’t free!
  6. Make async things truly async, i.e. let them happen later and take them out of the service call graph entirely.
  7. What do I mean by that? Obviously, if you are displaying a UI to a user, that UI needs some data, and you need to call a service to fetch that data, then you can’t make that fetch truly asynchronous, because you can’t put data in a UI until you have the data. You can definitely make it non-blocking (other work can happen while you are waiting), but you can’t not wait. So are there things that can be asynchronous? Hell yes.
  8. Let’s look at an example I like because most people understand the domain: retail. This could be e-commerce or a big-box retailer; it doesn’t really matter, since we’re going to keep this high level. You can think of the computation a retailer does as processing a sequence of sales and new product shipments: managing inventory, adjusting prices, handling logistics for fulfillment or stocking warehouses, dealing with fraud and analytics, etc. Which of the operations this business does are synchronous, and which are asynchronous? Clearly the sale is synchronous. You give me money and I give you your product (or a promise of delivery of your product). That is the definition of a synchronous action. But pretty much everything else on this slide is asynchronous. So how is that stuff implemented? I think one of three things happens: it gets made accidentally synchronous, either in a monolithic app or in a microservice; it gets run as a batch job once a day (super async); or you use a messaging system.
  9. Queues: good in theory, bad in practice. Theory: you need an intermediate store; “reliable broadcast”. Practice: the world’s worst data store (unreliable, inflexible, unscalable), just adding complexity and solving no problem. Every university in the world has a group in the CS department working on advancing the state of the art of databases. None has a group working on messaging. I don’t even think the companies that build messaging systems have people thinking about this. Typical solution: enterprise messaging systems. Not really a solution for microservices: no scalability, and they can’t be operated as elastic, always-on services.
  10. The streaming platform is the successor to messaging. Stream processing is how you build asynchronous services. That is going to be the key to solving my pipeline sprawl problem. Instead of having N^2 different pipelines, one for each pair of systems, I am going to have a central place that hosts all these event streams: the streaming platform. This is a central way for all these systems and applications to plug in and get the streams they need. So I can capture streams from databases and feed them into the DWH, Hadoop, and monitoring and analytics systems. The key advantage is that there is a single integration point for each thing that wants data. Obviously, to make this work I’m going to need to meet the reliability, scalability, and latency guarantees of each of these systems.
  11. Database data, log data. Lots of systems: databases, specialized systems like search, caches. Business units. N^2 connections. Tons of glue code to stitch it all together.
  12. This is what that architecture looks like when it relies on streaming. Two key uses: it acts as a data pipeline between data systems and apps, and as a backbone for streams of data for stream processing.
  13. I’ve talked about events and the case for asynchronous services. And I’ve mentioned stream processing a few times but haven’t really said what I mean by it or what it is good for. So I’ll explain what stream processing is and then I’ll talk about how you do stream processing with Kafka.
  14. There are lots of ways to categorize computer programs: functional vs. object-oriented, say, or distributed vs. centralized. One of the most central ways to categorize them is how the program gets its inputs and how those are translated into outputs. After all, this is what computer programs do, right? They translate inputs into outputs. There are three major categories. The first two everyone knows: request/response and batch. The third many people have never heard of, and those who have often misunderstand it.
  15. HTTP/REST. All databases. Runs all the time. Each request is totally independent; no real ordering. Can fail individual requests if you want. Very simple! About the future!
  16. “Ed, the MapReduce job never finishes when you watch it like that.” Job kicks off at a certain time (cron!). Processes all the input, produces all the output. Data is usually static. Hadoop! DWH, JCL. Archaic but powerful. Can do analytics! Complex algorithms! Also can be really efficient! Inherently high latency.
  17. Generalizes request/response and batch. The program takes some inputs and produces some outputs. Could be all inputs; could be one at a time. Runs continuously, forever!
  18. Doesn’t mean you drop everything on the floor if anything slows down Streaming algorithms—online space Can compute median
  19. Companies == streams of events. What does a retail store do? Streams. The processes they execute can often be thought of as stream processing.
  20. So what is Kafka? The second half of this talk will dive into what Kafka is.
  21. It’s a streaming platform. It lets you publish and subscribe to streams of data, stores them reliably, and lets you process them in real time. The second half of this talk will dive into Apache Kafka and talk about how it acts as a streaming platform and lets you build event-driven stream processing microservices.
  22. Event = record = message: a timestamp, an optional key, and a value. The key is used for partitioning. The timestamp is used for retention and processing.
  23. Not an Apache log. Different: a commit log, stolen from distributed database internals. Key abstraction for systems, real-time processing, and data integration. Formalization of a stream. The reader controls its own progress, which unifies batch and real-time.
  24. Relate to pub/sub
  25. Change to Logs Unify Batch and stream processing
  26. We talked about how a table can be represented as a stream of updates, and this is a common use of a log. Many change data capture solutions work this way: they capture a log of changes from a database and replicate it to destination databases. In fact, Kafka has special support for this type of log.
  27. The world is processes/threads: each has a total order, but there is no order between them.
  28. Four APIs to read and write streams of events. The first two are easy: the producer and consumer allow applications to write to and read from Kafka. The Connect API allows building connectors that integrate Kafka with existing systems or applications. The Streams API allows stream processing on top of Kafka. We’ll go through each of these briefly.
  29. Core: Data pipeline Venture bet: Stream processing
  30. So in effect a stream processing app is basically just some code that consumes input and produces output. So why not just use the producer and consumer APIs? Well, it turns out there are some hard parts to doing real-time stream processing.
  31. How do I partition up the processing and make it possible to dynamically scale my application up or down? How do I handle failures in my processing without losing messages? How do I do processing that spans multiple records? For example, I might want to join an input stream of events representing customer activity to a database of side information about my customers, which is also evolving. Or I might want to count the number of customer events that occur in a given window of time. Finally, if I update my code, how do I go back and rerun my program with the new logic? What does this process of code evolution look like?
  32. “Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?” – Brian Kernighan K in K&R
  33. Planned from the beginning. Early prototypes in 2010, Samza evolved out of that.
  34. Goal is to let you get back to this picture, but let you build really sophisticated apps that are transparently distributed, fault-tolerant, and do non-trivial things with data.
  35. TODO: Like the Streams library, Scala collections, or reactive thingies, BUT stateful, fault-tolerant, distributed.
  36. This is a simple Java main method. First, do some configuration to tell it which Kafka cluster to talk to. Next, tell it how to serialize this data. Then express my transformations. So if we zoom in on those transformations…
  37. This is a word count that computes a running count for each word. Since this is a streaming count the count updates as new values with new words appear. So the output is properly interpreted as a “count so far”.
  38. Could be wrong: don’t know shit about reactive. Kafka is reactive in the sense of the “Reactive Manifesto”. Similarity: declarative, observer pattern. Fundamental difference: sync vs. async services. Huge difference in practical problem domain. Doesn’t implement the Reactive Streams API; in fact that doesn’t even make sense, since there is no such thing as back pressure. Sync services can just fail and send back an error. Async services need to eventually process everything. Reprocessing. State.
  39. Simple library: takes input, lets the app do transformations, publishes back results. Gives you a convenient, declarative DSL for doing transformations on data. Gives you powerful windowing capabilities based on the timestamp in the event, so it handles out-of-order events well. Lets you reprocess data. Works with low latency. Allows powerful stateful processing for joins and aggregations.
  40. TODO: Summarize
  41. Change to “Logs make reprocessing easy”
  42. Time is hard. You need a model of time. Request/response ignores the issue: you just set an aggressive timeout. Batch usually solves the issue by just freezing all data for the day. Stream processing needs to actually address the issue.
  43. Curing the MapReduce hangover: Storm cluster in Mesos in AWS (Docker?). Decouple deployment, etc. Libraries are really simple. Config, packaging, deployment. Kafka Streams: manage the set of live processors and route data to them, using Kafka’s group management facility. External framework: start and restart processes, package processes, deploy code.
  44. Kafka Streams: manage the set of live processors and route data to them, using Kafka’s group management facility. External framework: start and restart processes, package processes, deploy code.
  45. DBs handle tables; stream processors handle streams.
  46. We talked about this retail example, where we have an input stream of sales that are occurring and an input stream of shipments of new products that are arriving. Computed off this stream of sales and shipments is a table: the inventory on hand right now in each location. This combination of sales with the inventory on hand is what drives the process of reordering, or of raising the price of products that are selling out. The ability to combine tables of stored state with streams of events is really core to stream processing in real-life examples, and it is one of the more powerful features of Kafka Streams.
  47. In fact, we talked about change logs for replicating updates to a mutable data store like a relational database. And Kafka Streams works really well with this type of stream.
  48. Instead of just taking the change stream out of a source database and putting it in a destination, it allows you to transform that stream on the fly. In effect, this lets you create a kind of materialized view computed off the input. And using the Connect API you can replicate that stream into any type of destination store.
  49. In fact, the Connect and Streams APIs work really well together. If you think about ETL, meaning extract/transform/load, like you would have for a data warehouse, Connect is doing the E and the L, the extracting and loading, except it is doing them in real time as a continuous stream. And Streams is doing the T, the transformation, and of course it is also streaming.
  50. It’s a streaming platform. It lets you publish and subscribe to streams of data, stores them reliably, and lets you process them in real time. The second half of this talk will dive into Apache Kafka and talk about how it acts as a streaming platform and lets you build event-driven stream processing microservices.
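Note 5’s point that availability depends on the entire call graph can be made concrete with a small worked example. The sketch below assumes failures are independent and that a request succeeds only if every service in a synchronous chain succeeds; the 99.9% figure and the ten-service chain are hypothetical numbers, not measurements from the talk.

```python
from math import prod


def chain_availability(availabilities):
    """Probability that a request succeeds when it must synchronously pass
    through every listed service, assuming independent failures."""
    return prod(availabilities)


# Hypothetical: ten services in the call graph, each available 99.9% of the time.
ten_services = [0.999] * 10
print(f"{chain_availability(ten_services):.4f}")
```

Ten individually “three nines” services give roughly 99.0% end to end, i.e. about ten times the failure rate of any single service. Taking asynchronous work out of the call graph (note 6) shortens the list being multiplied.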
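Notes 10 and 11 contrast N^2 point-to-point pipelines with a single integration point per system. A back-of-the-envelope sketch, assuming the worst case where every source eventually needs to feed every destination:

```python
def point_to_point_pipelines(num_sources: int, num_destinations: int) -> int:
    # One custom pipeline for every (source, destination) pair.
    return num_sources * num_destinations


def streaming_platform_connections(num_sources: int, num_destinations: int) -> int:
    # Each system integrates once with the central streaming platform.
    return num_sources + num_destinations


for n in (3, 10, 50):
    print(n, point_to_point_pipelines(n, n), streaming_platform_connections(n, n))
```

The product grows quadratically while the sum grows linearly, which is the “pipeline sprawl” the notes describe.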
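Note 18 says streaming algorithms can compute a median. One standard way to get an exact running median in a single pass is the two-heap technique sketched below (class name hypothetical). Be aware of the trade-off: it is online, with an answer available after every element, but it keeps every element, so memory grows with the stream; bounded-memory streaming algorithms generally only approximate quantiles.

```python
import heapq


class RunningMedian:
    """Exact running median over a stream using two heaps."""

    def __init__(self):
        self._low = []   # max-heap (values stored negated) for the smaller half
        self._high = []  # min-heap for the larger half

    def add(self, x):
        # Push into the low half, then move its largest element to the high half.
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the low half has the same size or exactly one more element.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("median of an empty stream")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2


rm = RunningMedian()
for value in [5, 1, 3, 8]:
    rm.add(value)
    print(value, rm.median())
```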
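Note 22 says the key is used for partitioning. The sketch below illustrates the idea: hash the key, take it modulo the partition count, and every record with the same key lands in the same partition, so per-key order is preserved. It uses `zlib.crc32` purely as a stable stand-in hash; Kafka’s Java producer uses its own hash (murmur2) for keyed records, so the partition numbers here will not match a real cluster. The function names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash (CRC32), NOT Kafka's default partitioner.
    return zlib.crc32(key) % num_partitions


def route(records, num_partitions):
    """Append each (key, value) record to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


events = [(b"alice", "login"), (b"bob", "login"), (b"alice", "buy"), (b"alice", "logout")]
parts = route(events, num_partitions=4)
for partition, records in sorted(parts.items()):
    print(partition, records)
```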
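Notes 23, 25 and 41 rest on one property of a commit log: records are appended with sequential offsets and are not removed when read, so each reader controls its own position. A real-time reader stays at the tail, while a batch job or a new version of the code can start again from offset 0. The in-memory `Log` and `Reader` classes below are hypothetical illustrations of that idea, not Kafka’s consumer API; they ignore retention, partitions and durability.

```python
class Log:
    """Minimal in-memory, append-only log with sequential offsets."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=100):
        """Return (records, next_offset); reading never removes anything."""
        batch = self._records[offset:offset + max_records]
        return batch, offset + len(batch)


class Reader:
    """Each reader tracks its own position in the log."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_records=100):
        batch, self.offset = self.log.read(self.offset, max_records)
        return batch


sales = Log()
for amount in [10, 25, 5]:
    sales.append(amount)

live = Reader(sales)      # keeps up with new records as they arrive
print(live.poll())        # the three existing records
sales.append(40)
print(live.poll())        # only the newly appended record

# Reprocessing: new logic (here, doubled amounts) replays the same log from offset 0.
replay = Reader(sales, offset=0)
print([2 * amount for amount in replay.poll()])
```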
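Notes 26, 45 and 47 describe a table as a stream of updates. The sketch below materializes a table by applying (key, value) updates in order, treating a value of `None` as a delete, mirroring Kafka’s convention that a null value is a delete marker (tombstone). The `compact` function models the idea behind log compaction, Kafka’s support for this kind of log: keep only the latest update per key, which still reproduces the same table. Both are simplified illustrations, not Kafka’s implementation; real compaction runs in the background and eventually also drops old tombstones.

```python
def apply_changelog(changes, table=None):
    """Materialize a table (dict) by applying (key, value) updates in order.

    A value of None deletes the key, like a tombstone record.
    """
    table = {} if table is None else table
    for key, value in changes:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def compact(changes):
    """Keep only the most recent update for each key, preserving log order."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(changes):
        latest_offset[key] = offset
    return [changes[offset] for offset in sorted(latest_offset.values())]


changelog = [("sku-1", 10), ("sku-2", 3), ("sku-1", 7), ("sku-2", None), ("sku-3", 1)]
print(apply_changelog(changelog))
print(compact(changelog))
```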
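Note 37 describes a word count whose output is a running “count so far” that updates as new words arrive. The plain-Python sketch below models only those semantics, emitting one updated count per word occurrence; it is not the Kafka Streams DSL, and it assumes a simple tokenization (lowercase, split on non-word characters).

```python
import re
from collections import Counter


def running_word_count(lines):
    """For each word occurrence, yield (word, count_so_far)."""
    counts = Counter()
    for line in lines:
        for word in re.findall(r"\w+", line.lower()):
            counts[word] += 1
            yield word, counts[word]


for update in running_word_count(["the cat sat", "The dog sat"]):
    print(update)
```

Read as a changelog, the latest update for each word is its current count, which ties back to the table/stream duality in notes 45 to 47.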
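Notes 39 and 42 stress that windowing uses the timestamp in the event, so out-of-order events still land in the right window. Below is a minimal tumbling-window count under that assumption (function name hypothetical). Window boundaries are aligned to multiples of the window size, and, unlike a real system, the sketch keeps every window open forever instead of deciding how long to wait for late data.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (key, window_start) using each event's own timestamp.

    events: iterable of (event_timestamp, key) in arrival order, which may
    differ from timestamp order.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        window_start = timestamp - timestamp % window_size
        counts[(key, window_start)] += 1
    return dict(counts)


# Timestamps in seconds; the event at t=30 arrives late, after the one at t=65.
arrivals = [(5, "page_view"), (65, "page_view"), (30, "page_view")]
print(tumbling_window_counts(arrivals, window_size=60))
```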
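Note 46’s retail example combines streams of sales and shipments with a table of inventory on hand per location. The sketch below is a hypothetical single-process model of that: it keeps the table keyed by (location, sku), updates it from each event, and emits a reorder signal when a sale takes stock below a threshold. The event format, the location names and the threshold rule are assumptions made for illustration.

```python
def process_retail_events(events, reorder_threshold):
    """Maintain inventory on hand and emit reorder signals.

    events: iterable of (kind, location, sku, quantity), kind is "shipment" or "sale".
    Returns (inventory table, list of (location, sku) needing a reorder).
    """
    inventory = {}
    reorders = []
    for kind, location, sku, quantity in events:
        key = (location, sku)
        before = inventory.get(key, 0)
        if kind == "shipment":
            inventory[key] = before + quantity
        elif kind == "sale":
            inventory[key] = before - quantity
            # Combine the sale event with the current table state:
            # signal only when this sale crosses the threshold.
            if before >= reorder_threshold > inventory[key]:
                reorders.append(key)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return inventory, reorders


events = [
    ("shipment", "berlin", "widget", 10),
    ("sale", "berlin", "widget", 4),
    ("sale", "berlin", "widget", 3),
    ("sale", "berlin", "widget", 1),
]
print(process_retail_events(events, reorder_threshold=5))
```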