Large scale, distributed and reliable messaging with Kafka

If I’m talking about some console applications, stills, moonshine and police now,
that means prayer didn’t work.
Fuck.


Rafał Hryniewski
@r_hryniewski
fb.me/hryniewskinet
.NET Dev
Blogger
Speaker
Community leader
https://hryniewski.net
rafal@hryniewski.net

Agenda
 History
 Use cases
 Producers and consumers
 Topics, partitions and clusters
 Streams, AdminClient and Connectors
 Kafka in .NET and Cloud
 External stream processing systems (Spark/Storm/Flink/Apex)

History
 Developed at LinkedIn
 Open sourced in 2011
 Named after Franz Kafka because it’s a system optimized for writing

Kafka is basically:
 Open source
 Written in Scala and Java
 A message broker
 A stream processing platform
 High throughput & low latency
 Scalable
 Designed as a distributed commit log

Kafka APIs
 Producer API
 Consumer API
 Connector API
 Streams API
 AdminClient API

Producer API
 Lets an application publish a stream of messages to one or more topics
 Asynchronous and thread safe (in the original Java client)
 Can deliver messages “at least once”, “at most once” or “exactly once”
 Can batch messages
 Can spread messages across partitions for load balancing
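
The load-balancing bullet can be sketched as key-based partitioning. This is a toy model, not the Kafka client: the function name `choose_partition` is hypothetical, and `zlib.crc32` is an assumption standing in for the hash a real partitioner uses (the Java client's default partitioner hashes keys with murmur2).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (toy stand-in for a partitioner)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 is an assumption for this sketch; it is stable across runs,
    # unlike Python's built-in hash() for bytes and str.
    return zlib.crc32(key) % num_partitions


# Messages with the same key always land in the same partition,
# so per-key ordering is preserved while different keys spread out.
for key in (b"user-1", b"user-2", b"user-1"):
    print(key, "->", choose_partition(key, 3))
```

The design point is that the producer, not the broker, decides the partition, which is why adding partitions later changes where existing keys are routed.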

Consumer API
 Allows subscribing to a topic and receiving messages from it
 Messages are pulled from the topic, so each consumer can process messages at its
own pace
 Supports long polling to avoid busy-waiting in a tight loop
 Each consumer tracks its own position (offset)
 No per-message acknowledgements; a consumer commits offsets and can rewind to any
retained offset
 Supports consumer groups
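
The pull model and consumer-owned position can be sketched with an in-memory list standing in for a single partition. `ToyConsumer` is a hypothetical class written for this illustration; it only borrows the method names `poll` and `seek` from the real consumer and does no networking, grouping or committing.

```python
class ToyConsumer:
    """Reads from a shared append-only list that stands in for one partition."""

    def __init__(self, log):
        self._log = log
        self.position = 0  # next offset to read; owned by the consumer, not the broker

    def poll(self, max_records=10):
        """Pull up to max_records starting at the current position."""
        records = self._log[self.position:self.position + max_records]
        self.position += len(records)
        return records

    def seek(self, offset):
        """Rewind (or skip ahead) to any offset that is still in the log."""
        if not 0 <= offset <= len(self._log):
            raise ValueError("offset out of range")
        self.position = offset


log = ["m0", "m1", "m2", "m3"]
fast, slow = ToyConsumer(log), ToyConsumer(log)
fast.poll(max_records=4)   # fast reader consumes everything
slow.poll(max_records=1)   # slow reader is still at offset 1
fast.seek(1)               # fast reader rewinds and can re-read m1..m3
```

Because the log itself never changes when someone reads it, the two consumers do not affect each other.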

Topics
 Each topic has a name, is partitioned and is multi-subscriber
 Kafka persists each published message. Retention period is configurable.
 Consumer controls its own offset
 A single partition must fit on one server, but a topic can be partitioned across
multiple nodes
 Partitions are replicated across the cluster to ensure fault tolerance; each partition
has a leader replica
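
How partitions, replicas and leaders relate can be sketched with a simplified round-robin placement. The function `assign_replicas` is hypothetical and only illustrates the idea; real Kafka placement also considers things like racks and a randomized starting broker, which this sketch ignores.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on distinct brokers, round-robin.

    The first replica in each list is treated as the leader (an assumption
    of this sketch that mirrors the idea of a preferred leader).
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        replicas = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
        assignment[partition] = {"leader": replicas[0], "replicas": replicas}
    return assignment


# Three partitions over brokers 1..3 with two copies of each partition.
layout = assign_replicas(num_partitions=3, brokers=[1, 2, 3], replication_factor=2)
for partition, info in layout.items():
    print(partition, info)
```

With this layout every broker leads one partition and follows another, so losing a single broker still leaves one copy of every partition.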

Cluster
 Kafka runs as a cluster
 A cluster has multiple servers/nodes (brokers)
 A cluster can span multiple datacenters
 A cluster stores messages in partitioned topics
 ZooKeeper coordinates the servers in the cluster

Streams
 Acts as a stream processor
 Consumes input from one or more topics and writes processed output to
another topic
 Works in (almost) real time
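
The consume, process, produce shape of a stream processor can be sketched with plain lists standing in for topics. This is not the Kafka Streams API (which is a Java library); `run_stream` and the sample records are hypothetical and exist only to show the data flow.

```python
def run_stream(input_topics, transform, output_topic):
    """Read every record from the input topics, transform it, write the result.

    A transform that returns None drops the record, which gives a simple filter.
    """
    for topic in input_topics:
        for record in topic:
            result = transform(record)
            if result is not None:
                output_topic.append(result)


# Two input "topics" and one output "topic", all plain lists in this sketch.
orders = [{"id": 1, "amount": 5}, {"id": 2, "amount": 120}]
refunds = [{"id": 3, "amount": 300}]
large_transactions = []

run_stream(
    [orders, refunds],
    lambda r: r["id"] if r["amount"] >= 100 else None,  # keep only large amounts
    large_transactions,
)
```

A real stream processor runs continuously over unbounded input; the sketch processes a fixed batch once to keep the example small.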

Connector API
 Build your own reusable consumers/producers
 Integrate Kafka with existing applications

Kafka in .NET
 Main library is confluent-kafka-dotnet
 Supports Avro serialization/deserialization with schema registry
 Easy to learn, hard to master

Kafka in Azure
 Azure Event Hubs exposes a Kafka-compatible endpoint, so most Kafka-enabled
applications can use it (you just need to change the connection configuration)
 You can set up a Kafka cluster in HDInsight (it’s not cheap)
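
The "change the connection configuration" point for Event Hubs can be sketched as a set of client properties. The helper `event_hubs_kafka_config` is hypothetical, the namespace and connection string are placeholders, and the property names follow the librdkafka style used by Confluent clients; treat it as a sketch to check against the current Azure documentation rather than a definitive setup.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka client properties pointing at an Event Hubs namespace."""
    return {
        # The Kafka endpoint of an Event Hubs namespace listens on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The user name is the literal string "$ConnectionString";
        # the password is the namespace connection string itself.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


# Placeholder values, not real credentials.
config = event_hubs_kafka_config("my-namespace", "Endpoint=sb://<placeholder>")
```

The rest of the producer and consumer code stays the same; only these properties differ from a configuration that targets a self-hosted cluster.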

Kafka in AWS
 Amazon Managed Streaming for Apache Kafka (Amazon MSK)
 Amazon Kinesis has somewhat similar capabilities

Kafka in GCP
 Only in VMs/Containers

Kafka in IBM Cloud
 IBM Event Streams is basically Kafka-as-a-service

External stream processing systems

Apache Apex
 Platform for developing stream- and batch-oriented applications
 Designed to process data in motion
 Performant
 Scalable
 Fault tolerant
 Lets you write processing functions without thinking about the distributed environment

Apache Flink
 Focused on parallel, pipelined processing of streams
 Runs Java, Scala, Python and SQL Code
 Manages state
 Great for data analysis and event correlation

Apache Spark
 Analytics engine for big data processing
 Data processing framework
 Used for processing and transforming streams of data
 Also used for training machine learning models
 Great for ETL (extract, transform, load) processes
 Supports Java, Scala, Python and R

Apache Storm
 Distributed real-time computation system
 Great for real-time analytics systems (for example, fraud detection)
 Can handle MASSIVE amounts of data on the fly
 Works with ANY programming language
