Viktor Gamov, Confluent, Developer Advocate
Apache Kafka is an open source distributed streaming platform that allows you to build applications and process events as they occur. Viktor Gamov (developer Advocate at Confluent) walks through how it works and important underlying concepts. As a real-time, scalable, and durable system, Kafka can be used for fault-tolerant storage as well as for other use cases, such as stream processing, centralized data management, metrics, log aggregation, event sourcing, and more.
This talk will explain what a streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use—including several examples of where it is solving real business problems.
https://www.meetup.com/Chennai-Kafka/events/269942117/
16. @gamussa | #confluentvug | @ConfluentINc
16
Messages will be produced
in a round robin fashion
Time
17. @gamussa | #confluentvug | @ConfluentINc
17
Consumers have a position all of their own
Old New
Robin
is here
Scan
Viktor
is here
Scan
Ricardo
is here
Scan
18. @gamussa | #confluentvug | @ConfluentINc
18
Old New
Robin
is here
Scan
Viktor
is here
Scan
Ricardo
is here
Scan
Consumers have a position all of their own
19. @gamussa | #confluentvug | @ConfluentINc
19
Old New
Robin
is here
Scan
Viktor
is here
Scan
Ricardo
is here
Scan
Consumers have a position all of their own
20. @gamussa | #confluentvug | @ConfluentINc
20
Only Sequential Access
Old New
Read to offset & scan
28. @gamussa | #confluentvug | @ConfluentINc
28
Linearly Scalable Architecture
Single topic:
- Many producers machines
- Many consumer machines
- Many Broker machines
No Bottleneck!!
Producers
Consumers
29. 29
From/to other systems: Kafka Connect
and more
Tip: Great option to gradually move workloads to Kafka while keeping production running!
30. 30
Kafka Connect
● Deployed standalone (development) or as a distributed cluster (production)
● Elastic service that works on bare-metal, VMs, containers, Kubernetes, ...
● The individual ‘Connector’ determines delivery guarantees, e.g., exactly-once
VM VM
31. 31
Single Message Transforms for real-time ETL
Ingress: modify an Event before storing
● Obfuscate sensitive information, e.g. PII
● Add origin of event for lineage tracking
● Remove unnecessary data fields
● … and more
Egress: modify an Event on its way out
● Route high-priority events to faster stores
● Direct events to different Elasticsearch indexes
● Cast data types to match destination
● … and more
{ user: ab123,
gender: female,
ip: 1.2.3.95 }
{ user: ab123,
ip: 1.2.3.XXX }
32. @gamussa | #confluentvug | @ConfluentINc
32
Replicate to get fault tolerance
replicate
msg
msg
leader
Machine A
Machine B
36. 36
Similar to a traditional messaging system
(ActiveMQ, Rabbit etc) but with:
(a) Far better scalability
(b) Built in fault tolerance / HA
(c) Storage
The log is a type of durable
messaging system
39. @gamussa | #confluentvug | @ConfluentINc
39
Streaming
is the toolset
for dealing with
events
as they move!
40. @gamussa | #confluentvug | @ConfluentINc
40
authorization_attempts possible_fraud
What exactly is Stream Processing?
41. @gamussa | #confluentvug | @ConfluentINc
41
CREATE STREAM possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
authorization_attempts possible_fraud
What exactly is Stream Processing?
42. @gamussa | #confluentvug | @ConfluentINc
42
CREATE STREAM possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
authorization_attempts possible_fraud
What exactly is Stream Processing?
43. @gamussa | #confluentvug | @ConfluentINc
43
CREATE STREAM possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
authorization_attempts possible_fraud
What exactly is Stream Processing?
44. @gamussa | #confluentvug | @ConfluentINc
44
CREATE STREAM possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
authorization_attempts possible_fraud
What exactly is Stream Processing?
45. @gamussa | #confluentvug | @ConfluentINc
45
CREATE STREAM possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
authorization_attempts possible_fraud
What exactly is Stream Processing?
46. @gamussa | #confluentvug | @ConfluentINc
46
CREATE STREAM possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
authorization_attempts possible_fraud
What exactly is Stream Processing?
47. @gamussa | #confluentvug | @ConfluentINc
47
Lower the bar to enter the world of streaming
User Population
CodingSophistication
Core developers who use Java/Scala
Core developers who don’t use Java/Scala, e.g. .NET, Go
Data engineers, architects, DevOps/SRE
BI analysts
streams
49. @gamussa | #confluentvug | @ConfluentINc
49
Interaction with Kafka
Kafka
(data)
KSQL
(processing)
Application
(processing) Java/KStreams, .NET
Does not run on
Kafka brokers
Does not run on
Kafka brokers
50. @gamussa | #confluentvug | @ConfluentINc
50
Find your local Meetup Group
https://cnfl.io/kafka-meetups
Join us in Slack
http://cnfl.io/slack
Grab Stream Processing books
https://cnfl.io/book-bundle