There was a time not long ago when we used relational databases for everything. Even if the data wasn’t particularly relational, we shoehorned it into relational tables, often because that was the only database we had. Thankfully those dark times are over, and we now have many different kinds of NoSQL databases: document, real-time, graph, column. But that does not solve the problem that the same data might be a graph from one perspective and a collection of documents from another.
It would be really nice if we could access that same data in many different ways, depending on what we want to achieve in our current task.
As software architects this is not easy to solve, but it is definitely possible. We can design an architecture around event sourcing: capture changes with Debezium, publish them to a Kafka topic, remodel the data the way we like with Kafka Streams, and write the results to various data stores, keeping those stores in sync. A rough sketch of such a pipeline follows.
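Purely as a rough sketch of that pipeline, here is what the Kafka Streams leg could look like in Java, assuming a Debezium-style change topic named dbserver.public.person and a sink topic person-documents (both names, and the pass-through remodeling, are placeholders rather than the actual topology from the talk):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class CdcPipelineSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdc-remodel");        // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Read the raw change events that Debezium publishes (topic name is an assumption).
        builder.stream("dbserver.public.person", Consumed.with(Serdes.String(), Serdes.String()))
               // Remodel each change into the shape a target store wants; here it is just passed
               // through, real code would map relational rows to documents, graph nodes, etc.
               .mapValues(value -> value)
               .to("person-documents", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}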
8. • Service provider for professional and amateur team sports in the Netherlands & Belgium
• 10+ years old
• Managing personal data, planning competitions, assigning officials, supplying data feeds
9. • 2M+ players
• 6K+ clubs
• 40K matches a week
• Spiky but predictable load
10. Technology stack
• Oracle database
• Cluster of Java-based application servers
• Diverse set of clients
13. Challenge
• Move to a player-centric model instead of a club-centric one
• A few orders of magnitude more users and load
• Moving away from Oracle is not feasible in the short term
• Scaling Oracle is just too expensive, if at all possible
17. Kafka
• Persistent pub/sub message bus
• High throughput
• Subscribers can consume at their own speed
• Subscribers can request a ‘rewind’ and re-consume a topic (see the consumer sketch after this slide)
• Has some tricks (such as log compaction) to keep the data volume down
• Having both fast and slow consumers is not a problem
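A minimal sketch of the ‘rewind’ idea using the plain Java consumer API, under the assumption of a topic called person and a throwaway consumer group: after the first poll assigns partitions, the consumer seeks back to the beginning and re-reads the topic at its own pace.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RewindConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "rewind-demo");              // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("person"));                             // hypothetical topic name
            consumer.poll(Duration.ofSeconds(1));                              // join the group, get partitions assigned
            consumer.seekToBeginning(consumer.assignment());                   // the ‘rewind’: start over from offset 0

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
                }
            }
        }
    }
}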
24. Postgres (>= 9.4)
“Logical decoding is the process of extracting all persistent changes to a database's tables into a coherent, easy to understand format which can be interpreted without detailed knowledge of the database's internal state. In PostgreSQL, logical decoding is implemented by decoding the contents of the write-ahead log, which describe changes on a storage level, into an application-specific form such as a stream of tuples or SQL statements.”
— PostgreSQL 9.4 documentation, §46.2.1 “Logical Decoding” (via Jeff Klukas)
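Just to make the quoted mechanism concrete, a hedged JDBC sketch that creates a replication slot with the built-in test_decoding output plugin and peeks at the decoded changes; connection details are placeholders, the server must run with wal_level=logical, and Debezium uses its own logical decoding plugins rather than ad-hoc SQL like this.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LogicalDecodingPeek {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; needs a role with replication privileges.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/sports", "postgres", "secret");
             Statement stmt = conn.createStatement()) {

            // Create a slot using the built-in test_decoding plugin
            // (this fails if a slot with the same name already exists).
            stmt.execute("SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding')");

            // Peek at the decoded changes without consuming them from the slot.
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT lsn, xid, data FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL)")) {
                while (rs.next()) {
                    // e.g. table public.person: UPDATE: id[integer]:123 name[text]:'Alfredo' ...
                    System.out.println(rs.getString("data"));
                }
            }
        }
    }
}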
25. Debezium
• Red Hat
• Standardizes change data capture
• Uses the Kafka Connect API
• Pretty young: version 0.7.4
• Based on the ‘Bottled Water’ research project
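Debezium publishes change events as envelopes with before, after and op fields. A hedged sketch of reading them as plain JSON strings, assuming the JSON converter and a hypothetical topic name; real deployments often use Avro and the Connect schema machinery instead:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DebeziumEnvelopeSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "cdc-reader");               // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        ObjectMapper mapper = new ObjectMapper();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("dbserver.public.person"));             // hypothetical Debezium topic
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    if (record.value() == null) continue;                      // tombstone record
                    JsonNode root = mapper.readTree(record.value());
                    // With the JSON converter the envelope may be wrapped in a "payload" field.
                    JsonNode envelope = root.has("payload") ? root.get("payload") : root;
                    String op = envelope.path("op").asText();                  // "c" create, "u" update, "d" delete
                    JsonNode after = envelope.path("after");                   // row state after the change
                    System.out.println(op + " -> " + after);
                }
            }
        }
    }
}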
32. Kafka Streams at Scale
• ± 500M rows of SQL data
• ± 50 joins (a join sketch follows this slide)
• 500 topics
• 400 GB of Kafka data
• 300 GB of RocksDB data
• Building a complete replica from scratch takes many hours
• After that, <100 ms latency for changes
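A hedged sketch of the kind of remodeling join this slide refers to, assuming two CDC-derived changelog topics called players and clubs with string values (all names are made up); the join state is exactly what ends up in the local RocksDB stores mentioned above.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class PlayerClubJoinSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "player-club-join");    // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Each table is materialized from a changelog topic; its state lives in a local RocksDB store.
        KTable<String, String> players =
            builder.table("players", Consumed.with(Serdes.String(), Serdes.String()));
        KTable<String, String> clubs =
            builder.table("clubs", Consumed.with(Serdes.String(), Serdes.String()));

        // Toy join that assumes both tables share the same key; a real topology re-keys first
        // and chains dozens of such joins to assemble the player-centric view.
        players.join(clubs, (player, club) -> player + " @ " + club)
               .toStream()
               .to("players-with-clubs", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}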
33. Development cycle
• Developing and testing is hard for stateful code (see the test-driver sketch after this slide)
• Starting a new ‘generation’ is costly
• Contaminated data might show up
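One way to ease testing of stateful topologies, at least in recent Kafka versions, is the TopologyTestDriver from kafka-streams-test-utils, which runs a topology and its state stores in process without a broker; a minimal sketch with made-up topic names:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class TopologyTestSketch {
    public static void main(String[] args) {
        // A trivial topology standing in for the real (much larger, stateful) one.
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("person", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(v -> v.toUpperCase())
               .to("person-upper", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topology-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");       // never contacted by the test driver

        // The driver runs the topology, including its state stores, entirely in memory.
        try (TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props)) {
            TestInputTopic<String, String> in =
                driver.createInputTopic("person", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out =
                driver.createOutputTopic("person-upper", new StringDeserializer(), new StringDeserializer());

            in.pipeInput("123", "alfredo");
            System.out.println(out.readKeyValue());                            // KeyValue(123, ALFREDO)
        }
    }
}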
34. Conclusions
• Went into production last June
• Generally behaves well (aside from some glitches)
• Kafka Streams is in a much better state than a year ago
41. Event Driven Microservices
• Services push events instead of using a request/response model
• Usually backed by a publish/subscribe bus
42. [Diagram: an Application Service (SQL database plus application code) publishes change events to an Event Bus on topic PERSON, e.g. { id: 123, name: “Alfredo”, dob: 1965-5-1 }; an Analytics Service (analytics database plus its own code) subscribes to the same PERSON topic and receives an identical copy of the event.]
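A hedged sketch of the producing half of that picture: after committing a change to its own SQL database, the application service publishes the person event to the PERSON topic, and any interested service (such as the analytics service) consumes it on its own schedule instead of being called via request/response. Topic name and payload follow the diagram; everything else is a placeholder.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PersonEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The event from the diagram, keyed by the person id.
        String key = "123";
        String value = "{ \"id\": 123, \"name\": \"Alfredo\", \"dob\": \"1965-5-1\" }";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish the fact "person 123 changed"; subscribers react whenever they are ready.
            producer.send(new ProducerRecord<>("PERSON", key, value));
            producer.flush();
        }
    }
}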
47. Firebase Realtime
• Real-time database
• ‘Backend As a Service’
• Essentially one big JSON document
• Very easy to use client libraries for web and mobile
• Safe to develop
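As a rough illustration of the ‘one big JSON document’ model, a sketch using the Firebase Admin SDK for Java; the credentials file, database URL and path are placeholders, and the web and mobile client libraries mentioned above are the more common way in.

import java.io.FileInputStream;
import java.util.Map;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.firebase.FirebaseApp;
import com.google.firebase.FirebaseOptions;
import com.google.firebase.database.DatabaseReference;
import com.google.firebase.database.FirebaseDatabase;

public class FirebaseSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder credentials and database URL.
        FirebaseOptions options = FirebaseOptions.builder()
            .setCredentials(GoogleCredentials.fromStream(new FileInputStream("service-account.json")))
            .setDatabaseUrl("https://example-project.firebaseio.com")
            .build();
        FirebaseApp.initializeApp(options);

        // The whole database is one JSON tree; writing to a path creates or updates that subtree.
        DatabaseReference person = FirebaseDatabase.getInstance().getReference("persons/123");
        person.setValueAsync(Map.of("name", "Alfredo", "dob", "1965-5-1"));
    }
}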
48. Caches
• We can use our streaming engine to update or invalidate caches (a sketch follows)
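A minimal sketch of that idea, assuming a Caffeine cache keyed by person id and a changelog topic called person (both assumptions): every change event flowing past evicts the corresponding cache entry, so the next read repopulates it with fresh data.

import java.util.Properties;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;

public class CacheInvalidationSketch {
    public static void main(String[] args) {
        Cache<String, String> personCache = Caffeine.newBuilder().maximumSize(100_000).build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cache-invalidator");   // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("person", Consumed.with(Serdes.String(), Serdes.String()))
               // Every change event for a person evicts the stale cache entry for that key.
               .foreach((personId, change) -> personCache.invalidate(personId));

        new KafkaStreams(builder.build(), props).start();
    }
}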