This document discusses the origins and value of Apache Kafka in modern data architectures. It describes how Kafka was created to handle continuous flows of data, addressing limitations in databases and messaging systems. Kafka provides a unified solution for messaging, data storage, and stream processing. It originated from the ideas of treating the log as a first-class citizen and combining messaging, durable storage, and stream processing capabilities into a streaming platform. The document demonstrates how Kafka can be used to build a game scoring application using streams and tables. It recommends ways to learn more about Kafka including trying Confluent Cloud, tutorials, books, and attending Kafka Summit.
Rediscovering the Value of Apache Kafka® in Modern Data Architecture
1. Rediscovering the Value of Apache Kafka® in Modern Data Architectures
@riferrei | #kafkameetup | @CONFLUENTINC
2. About me
• Ricardo Ferreira
• Works for Confluent
• Developer advocate
• ricardo@confluent.io
• https://riferrei.net
5. Origins of Apache Kafka
“There were lots of databases and other systems built to store data, but what was missing in our architecture was something that would help us to handle continuous flows of data.” – Jay Kreps
16. Are databases limited?
Yes, they are. Why do we have to move data from one DB to another just to do analytics?
17. Shared state = more DBs
[Diagram: Business line 1, Business line 2, Business line 3]
18. Third realization
[Diagram of the systems involved: user tracking, historical data, operational metrics, NoSQL database, graph database, SQL database, microservices, ..., Hadoop, Elasticsearch, Grafana, machine learning, rec. engine, search, security, email, social graph]
19. “The truth is the log. The database is a cache of a subset of the log.” – Pat Helland
Immutability Changes Everything
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
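One way to read the quote in code is sketched below. It is a minimal Python illustration, not Kafka's API: a plain list stands in for the log, a dict stands in for the database, and the names Event and materialize are hypothetical. The point it tries to show is that the table can always be rebuilt by replaying the log, while the log cannot be rebuilt from the table.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change event: (key, value). A value of None means "delete the key".
Event = Tuple[str, Optional[int]]


def materialize(log: List[Event], upto: Optional[int] = None) -> Dict[str, int]:
    """Replay the first `upto` events (all of them if None) into a key/value table."""
    table: Dict[str, int] = {}
    for key, value in log[:upto]:
        if value is None:
            table.pop(key, None)   # a delete removes the key from the "database"
        else:
            table[key] = value     # a later event overwrites the earlier value
    return table


# The log keeps every change, in order.
log: List[Event] = [
    ("alice", 10),
    ("bob", 5),
    ("alice", 25),
    ("bob", None),
]

current = materialize(log)          # the "database": latest state only
earlier = materialize(log, upto=2)  # state as of the first two events

print(current)
print(earlier)
```

The dict holds only the latest value per key, which is the sense in which it is a cache of a subset of the log; the intermediate history survives only in the log itself.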
20. Log as a first-class citizen
[Diagram: a database writes to a log with positions 0 through 8; Destination System A (time = 1) and Destination System B (time = 3) read from the log.]
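The idea on this slide can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not Kafka code: the Log and Reader classes and the record names are hypothetical. It shows one writer appending to an ordered log while two destination systems read from it, each keeping its own position.

```python
from typing import List


class Log:
    """Append-only sequence of records; a record's offset is its index."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset: int, max_records: int = 1) -> List[str]:
        # Reading never removes anything from the log.
        return self._records[offset:offset + max_records]


class Reader:
    """A destination system that tracks its own position in the log."""

    def __init__(self, name: str, log: Log) -> None:
        self.name = name
        self.log = log
        self.offset = 0

    def poll(self, max_records: int = 1) -> List[str]:
        records = self.log.read(self.offset, max_records)
        self.offset += len(records)  # advance only this reader's position
        return records


log = Log()
for i in range(9):                   # offsets 0 through 8, as on the slide
    log.append(f"event-{i}")

system_a = Reader("a", log)
system_b = Reader("b", log)
system_a.poll(1)                     # system A is now at position 1
system_b.poll(3)                     # system B is now at position 3
```

Because each reader owns its position, a slow destination does not hold back a fast one, and neither affects what the writer has stored.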
21. Solution: build a commit log
[Diagram: a commit log together with the same systems: user tracking, historical data, operational metrics, NoSQL database, graph database, SQL database, microservices, ..., Hadoop, Elasticsearch, Grafana, machine learning, rec. engine, search, security, email, social graph]
24. Origins of Apache Kafka
“We’ve come to think of Kafka as a streaming platform: a system that lets you publish and subscribe to streams of data, store them, and process them, and that is exactly what Apache Kafka is built to be.” – Jay Kreps
25. Origins of Apache Kafka
Databases: batch, expensive, time consuming
Messaging: difficult to scale, no persistence after consumption, no replay
Highly scalable, durable, persistent, ordered, fast (low latency)
26. Origins of Apache Kafka
Databases: batch, expensive, time consuming
Messaging: difficult to scale, no persistence after consumption, no replay
Distributed commit log: highly scalable, durable, persistent, ordered, fast (low latency)
27. Origins of Apache Kafka
Databases: batch, expensive, time consuming
Messaging: difficult to scale, no persistence after consumption, no replay
Distributed streaming platform: highly scalable, durable, persistent, ordered, fast (low latency), stream processing, continuous flows, scalable integration
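The "no persistence after consumption" and "no replay" limitations can be made concrete with a small Python sketch. It is a simplified in-memory comparison, not a model of any real broker: the Queue and RetainedLog classes and the sample messages are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class Queue:
    """Simplified queue semantics: a message is gone once it is consumed."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def publish(self, message: str) -> None:
        self._messages.append(message)

    def consume(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


class RetainedLog:
    """Simplified log semantics: records stay put and can be read again."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def publish(self, record: str) -> None:
        self._records.append(record)

    def read_from(self, offset: int) -> List[str]:
        return list(self._records[offset:])


queue, log = Queue(), RetainedLog()
for score in ["score:10", "score:20", "score:30"]:
    queue.publish(score)
    log.publish(score)

first_pass_queue = [queue.consume() for _ in range(3)]
second_pass_queue = queue.consume()   # nothing left to consume, so no replay

first_pass_log = log.read_from(0)
replay_log = log.read_from(0)         # same records again, starting from offset 0
```

A new consumer added later gets nothing from the drained queue but can still read the full history from the log, which is the property the slide attributes to the commit log.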
29. Origins of Apache Kafka
“The ability to combine these three areas – to bring all the streams of data together across all the use cases – is what makes the idea of a streaming platform so appealing to people.” – Jay Kreps