If I’m talking about some console applications, stills, moonshine and police now,
that means prayer didn’t work.
Fuck.
Large scale, distributed and reliable
messaging with Kafka
Rafał Hryniewski
@r_hryniewski
fb.me/hryniewskinet
.NET Dev
Blogger
Speaker
Community leader
https://hryniewski.net
rafal@hryniewski.net
Agenda
• History
• Use cases
• Producers and consumers
• Topics, partitions and clusters
• Streams, AdminClient and Connectors
• Kafka in .NET and Cloud
• External stream processing systems (Spark/Storm/Flink/Apex)
History
• Developed at LinkedIn
• Open sourced in 2011
• Named after Franz Kafka because it’s optimized for writing
Kafka is basically:
• Open source
• Written in Scala and Java
• Message broker
• Stream processing platform
• High throughput & low latency
• Scalable
• Designed as a distributed transaction log
Messaging 101
Event driven architecture 101
Used by
Messaging
Event sourcing
Stream processing
Commit log
User activity tracking
Metrics
Log aggregation
Kafka APIs
• Producer API
• Consumer API
• Connector API
• Streams API
• AdminClient API
Producer
Producer API
• Publishes a stream of messages to one or more topics
• Asynchronous and thread safe (in the original Java implementation)
• Can deliver messages “at least once”, “at most once” or “exactly once”
• Can batch messages
• Can spread messages across partitions for load balancing
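To make the load-balancing bullet concrete, here is a minimal sketch of key-based partition selection in plain Python. It is a toy model and not Kafka's partitioner: `ToyPartitioner` and `publish` are hypothetical names, CRC32 stands in for the murmur2 hash used by the Java client, and keyless messages are simply rotated round-robin.

```python
import zlib
from itertools import count


class ToyPartitioner:
    """Toy partitioner: hash the key if present, otherwise rotate."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = count()  # counter used only for keyless messages

    def partition_for(self, key):
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(self._next) % self.num_partitions
        # Same key -> same hash -> same partition, which preserves
        # per-key ordering. CRC32 is an assumption made for this sketch.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


def publish(partitions, partitioner, key, value):
    """Append (key, value) to the chosen partition and return its index."""
    index = partitioner.partition_for(key)
    partitions[index].append((key, value))
    return index


if __name__ == "__main__":
    partitions = [[] for _ in range(3)]  # a topic with three partitions
    partitioner = ToyPartitioner(len(partitions))
    events = [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]
    for key, value in events:
        index = publish(partitions, partitioner, key, value)
        print(key, value, "-> partition", index)
```

Both `user-1` events land in the same partition, so a consumer of that partition sees them in the order they were published.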
Consumer API
• Subscribes to a topic and receives messages from it
• Messages are pulled from the topic, so each consumer can process them at its own pace
• Supports long polling, so a consumer does not spin in a tight loop when no messages are available
• Each consumer tracks its own position (offset)
• Has no per-message acknowledgements; a consumer commits offsets and can rewind to any retained offset
• Supports consumer groups
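The pull model and consumer-owned offsets can be sketched with an in-memory log. This is an illustration only: `PartitionLog` and `ToyConsumer` are hypothetical classes, not part of any Kafka client, and there is no network, persistence or consumer group involved.

```python
class PartitionLog:
    """Append-only list of messages; a message's index is its offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read(self, offset, max_messages):
        # Reading never removes anything; it just returns a slice.
        return self._messages[offset:offset + max_messages]


class ToyConsumer:
    """Pulls from the log and keeps its own position."""

    def __init__(self, log):
        self.log = log
        self.position = 0  # next offset to read, owned by the consumer

    def poll(self, max_messages=2):
        batch = self.log.read(self.position, max_messages)
        self.position += len(batch)
        return batch

    def seek(self, offset):
        # Rewinding is just moving the position back.
        self.position = offset


if __name__ == "__main__":
    log = PartitionLog()
    for message in ["a", "b", "c"]:
        log.append(message)

    fast, slow = ToyConsumer(log), ToyConsumer(log)
    print("fast:", fast.poll(3))  # reads everything at once
    print("slow:", slow.poll(1))  # reads at its own pace
    fast.seek(1)                  # rewind and re-read from offset 1
    print("fast after rewind:", fast.poll(3))
```

Because the log keeps the messages and each consumer keeps only a number, a slow consumer never blocks a fast one, and replaying history costs nothing more than a `seek`.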
Topics
• Each topic has a name, is partitioned and can have multiple subscribers
• Kafka persists every published message; the retention period is configurable
• Each consumer controls its own offset
• A partition must fit on a single server, but a topic can be spread across multiple nodes through partitioning
• Partitions are replicated across the cluster for fault tolerance; each partition has a leader replica
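Replication and leader replicas can be illustrated with a small placement sketch. This is a deliberately simplified model: the function names are hypothetical, replicas are placed round-robin, and the leader is simply the first live replica, whereas real Kafka tracks in-sync replicas and elects leaders through its controller.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (round-robin)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


def leader_for(replicas, live_brokers):
    """Return the first replica that is still alive, or None if all are down."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


if __name__ == "__main__":
    brokers = ["broker-0", "broker-1", "broker-2"]
    assignment = assign_replicas(3, brokers, replication_factor=2)
    for partition, replicas in assignment.items():
        print("partition", partition, "replicas", replicas)

    # Simulate broker-0 going down: partition 0 fails over to its other replica.
    live = {"broker-1", "broker-2"}
    print("partition 0 leader:", leader_for(assignment[0], live))
```

With a replication factor of 2, every partition survives the loss of any single broker, at the price of storing each message twice.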
Cluster
• Kafka runs as a cluster
• A cluster consists of multiple servers (nodes)
• A cluster can span multiple datacenters
• The cluster stores messages in partitioned topics
• ZooKeeper coordinates the servers in the cluster (newer Kafka versions can use the built-in KRaft mode instead)
Streams
• Acts as a stream processor
• Consumes input from one or more topics and writes processed output to one or more other topics
• Works in (almost) real time
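The consume-transform-produce idea behind the Streams API can be shown in a few lines. Kafka Streams itself is a Java library; the sketch below only mimics the shape of a topology in plain Python, with lists standing in for topics and a hypothetical `process` function that emits a running count per key.

```python
from collections import Counter


def process(input_topics, output_topic):
    """Read each message from the input topics and emit a running count per key."""
    seen = Counter()
    for topic in input_topics:
        for user, _action in topic:
            seen[user] += 1
            # Each input message produces one output message.
            output_topic.append((user, seen[user]))
    return output_topic


if __name__ == "__main__":
    clicks = [("ann", "click"), ("bob", "click")]  # input topic 1
    views = [("ann", "view")]                      # input topic 2
    counts = process([clicks, views], [])          # output topic
    for user, total in counts:
        print(user, total)
```

A real stream processor runs this loop continuously over unbounded input and keeps the counter in fault-tolerant state, but the data flow is the same.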
AdminClient
Connector API
• Build your own reusable consumers/producers
• Integrate Kafka with existing applications
Example Connectors
Kafka in .NET
• Main library is confluent-kafka-dotnet
• Supports Avro serialization/deserialization with Schema Registry
• Easy to learn, hard to master
Kafka in Azure
• Azure Event Hubs exposes a Kafka-compatible endpoint, so Kafka-enabled applications can use it after changing only the connection configuration
• You can set up a Kafka cluster in HDInsight (it’s not cheap)
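As a rough illustration of "changing only the connection configuration", the sketch below builds a client configuration for the Event Hubs Kafka endpoint. The property names follow the librdkafka style used by Confluent clients. The helper name, the namespace and the connection string are placeholders, and the values should be checked against the current Event Hubs documentation before use.

```python
def event_hubs_kafka_config(namespace, connection_string):
    """Build Kafka client settings that point at an Event Hubs namespace."""
    return {
        # The Kafka endpoint of the namespace listens on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The user name is the literal string "$ConnectionString".
        "sasl.username": "$ConnectionString",
        # The password is the Event Hubs connection string itself.
        "sasl.password": connection_string,
    }


if __name__ == "__main__":
    # Both arguments are placeholders for illustration.
    config = event_hubs_kafka_config("my-namespace", "<connection string>")
    for name, value in config.items():
        print(name, "=", value)
```

The rest of the producer or consumer code stays the same; only this dictionary differs from a configuration that targets a self-hosted cluster.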
Kafka in AWS
• Amazon Managed Streaming for Apache Kafka (Amazon MSK)
• Amazon Kinesis has somewhat similar capabilities
Kafka in GCP
• Only in VMs/Containers
Kafka in IBM Cloud
• IBM Event Streams is basically Kafka-as-a-service
External stream processing systems
Apache Apex
• Platform for developing stream- and batch-oriented applications
• Designed to process data in motion
• Performant
• Scalable
• Fault tolerant
• Lets you write processing functions without having to think about the distributed environment
Apache Flink
• Focused on parallel, pipelined processing of streams
• Runs Java, Scala, Python and SQL code
• Manages state
• Great for data analysis and event correlation
Apache Spark
• Analytics engine for big data processing
• Data processing framework
• Used for processing and transforming streams of data
• Also used for training machine learning models
• Great for ETL (extract, transform, load) processes
• Supports Java, Scala, Python and R
Apache Storm
• Distributed real-time computation system
• Great for real-time analytics systems (for example, fraud detection)
• Can handle MASSIVE amounts of data on the fly
• Works with ANY programming language
bit.ly/rh-kafka
Questions?
@r_hryniewski
fb.me/hryniewskinet

