Capture the Streams of Database Changes
Randall Hauch
Founder of Debezium project
@rhauch
Apache Kafka™
Producers
Consumers
Apache Kafka Streams API
Apache Kafka Connect API
DB
Change Data Capture Connectors
See the list at https://www.confluent.io/product/connectors/
Apache Kafka™
Why capture streams of data changes?
DB
Application
Streaming data replication
DB
Apache Kafka™
DB2
Streaming analytics and machine learning
DB
…
Apache Kafka™
Streaming ETL
DB2
Extract Transform Load
DB
Apache Kafka™
Shared data in a microservice architecture
[Diagram: Service A, Service B, and Service C each own a database (DB A, DB B, DB C) inside a separate bounded context. Each database's changes flow into Apache Kafka™, and each service consumes the other services' data to build its own materialized views.]
Deconstructed applications
[Diagram: an application that updates its DB, cache, and indexes itself (dual writes!), next to an application whose DB changes flow through Apache Kafka™ to Kafka consumers that maintain the cache and indexes.]
How do we get a stream of data changes?
DB
Application
?
Apache Kafka™
Consumers
• Modify the app to write out events?
• What about the other apps (Application 2, Application 3) that change data?
• Dual writes?!
Or we can watch the database
• Need a Kafka Connect connector to do this
• Just install, configure and run it, and it will adapt
• No need to change our apps!
• Change data capture!
Databases 101
insert row 1
insert row 2
update row 1
insert row 3
delete row 2
insert row 4
update row 2
• Applications modify rows in transactions
• DBMS records the changes in a log, then updates the tables
• DBMS uses log for recovery, replication, …
- MySQL binlog
- MongoDB oplog
- PostgreSQL WAL
• We can (try to) use the log for CDC*
Application
*mileage may vary
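The log-then-apply behaviour above can be sketched as a toy model. This is not how any particular DBMS implements its log; `TinyDB`, its methods, and the record layout are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

# A log record: (operation, row id, new row state or None for a delete)
LogRecord = Tuple[str, int, Optional[Dict[str, Any]]]


class TinyDB:
    """Toy table that records every change in a log before applying it."""

    def __init__(self) -> None:
        self.log: List[LogRecord] = []
        self.table: Dict[int, Dict[str, Any]] = {}

    def _apply(self, record: LogRecord) -> None:
        op, row_id, value = record
        if op == "delete":
            self.table.pop(row_id, None)
        else:  # "insert" and "update" both set the row's current state
            self.table[row_id] = dict(value or {})

    def write(self, op: str, row_id: int,
              value: Optional[Dict[str, Any]] = None) -> None:
        record = (op, row_id, value)
        self.log.append(record)  # 1. record the change in the log
        self._apply(record)      # 2. then update the table

    @classmethod
    def recover(cls, log: List[LogRecord]) -> "TinyDB":
        """Rebuild a table by replaying a log, as in recovery or replication."""
        db = cls()
        for record in log:
            db.log.append(record)
            db._apply(record)
        return db


db = TinyDB()
db.write("insert", 1, {"name": "a"})
db.write("insert", 2, {"name": "b"})
db.write("update", 1, {"name": "a2"})
db.write("insert", 3, {"name": "c"})
db.write("delete", 2)
db.write("insert", 4, {"name": "d"})

# A replica built only from the log ends up with the same rows.
replica = TinyDB.recover(db.log)
```

Because the log alone is enough to rebuild the table, a CDC connector that reads the same log can reproduce the same sequence of changes elsewhere.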
Change Data Capture (CDC) at work
• Read the changes from the database
- Using the log or API
- This is the hardest part
• Write them in the same order
• Don’t miss any changes
- Okay, this is hard, too
Table Stream
Table Stream Table*
Stream-Table Duality
We can view a table as a stream
and
We can view a stream as a table
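A minimal sketch of the duality, assuming each change carries a key and the row's new value (`None` standing in for a delete). The function names are illustrative only.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Change = Tuple[str, Optional[str]]  # (key, new value); None marks a delete


def stream_to_table(changes: Iterable[Change]) -> Dict[str, str]:
    """Fold a change stream into a table holding the latest value per key."""
    table: Dict[str, str] = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def table_to_stream(table: Dict[str, str]) -> Iterator[Change]:
    """Emit one change per row; replaying them rebuilds the same table."""
    for key, value in table.items():
        yield (key, value)


changes = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k2", None)]
table = stream_to_table(changes)
round_trip = stream_to_table(table_to_stream(table))
```

Here `table` holds only the latest state of `k1`, and streaming the table back out and folding it again gives the same table.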
What does a change event look like?
• Primary/unique key of the row
• Kind of operation: insert, update, delete
• State of the row after the changes
• State of the row before the changes
• Source-specific provenance metadata
- location in the log
- database name, table name
- transaction ID, source timestamp, …
• Capture timestamp
• Key
- Primary/unique key of the row
• Value
- Operation
- State of the row after the changes
- State of the row before the changes (if available)
- Source-specific provenance metadata
- Capture timestamp
• Timestamp
This maps perfectly to a Kafka message!
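As an illustration, an update event might be laid out as below. The field names (`op`, `before`, `after`, `source`, `ts_ms`) are loosely modeled on Debezium's event envelope but should be treated as assumptions, and every value is made up; check the connector documentation for the exact structure in your version.

```python
import json
from typing import List, Tuple

# Key: the primary key of the changed row.
event_key = {"id": 1004}

# Value: operation, row state before and after, provenance, capture time.
event_value = {
    "op": "u",  # assumed codes: c = insert, u = update, d = delete
    "before": {"id": 1004, "email": "old@example.com"},
    "after": {"id": 1004, "email": "new@example.com"},
    "source": {
        "db": "inventory",           # database name
        "table": "customers",        # table name
        "file": "mysql-bin.000003",  # location in the log
        "pos": 484,
        "ts_sec": 1490635153,        # source timestamp
    },
    "ts_ms": 1490635153123,          # capture timestamp
}


def changed_fields(value: dict) -> List[str]:
    """List the columns whose state differs between 'before' and 'after'."""
    before = value.get("before") or {}
    after = value.get("after") or {}
    return sorted(k for k in set(before) | set(after)
                  if before.get(k) != after.get(k))


def to_kafka_message(key: dict, value: dict) -> Tuple[bytes, bytes]:
    """Serialize key and value as JSON bytes, one possible wire format."""
    return (json.dumps(key, sort_keys=True).encode("utf-8"),
            json.dumps(value, sort_keys=True).encode("utf-8"))


key_bytes, value_bytes = to_kafka_message(event_key, event_value)
```

Keying the message by the row's primary key means all changes to one row land in the same partition, which preserves their order.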
Single Message Transforms
• Simple transformations for a single message
• Defined as part of Kafka Connect
- Some useful transforms provided out of the box
- Easily implement your own
• Optionally deploy 1+ transforms with each connector
- Modify messages produced by source connector
- Modify messages sent to sink connectors
• Makes it much easier to mix and match connectors
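Conceptually, a single message transform is a function from one record to one record, and a connector applies its transforms in order. The sketch below models that idea in Python only; real transforms are Java classes plugged into Kafka Connect, and the record shape and function names here are hypothetical.

```python
import re
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]  # hypothetical record shape: topic, key, value
Transform = Callable[[Record], Record]


def route_topic(pattern: str, replacement: str) -> Transform:
    """Return a transform that rewrites the topic name with a regex."""
    compiled = re.compile(pattern)

    def apply(record: Record) -> Record:
        return {**record, "topic": compiled.sub(replacement, record["topic"])}

    return apply


def insert_field(name: str, value: Any) -> Transform:
    """Return a transform that adds a static field to the record value."""

    def apply(record: Record) -> Record:
        return {**record, "value": {**record["value"], name: value}}

    return apply


def apply_chain(record: Record, transforms: List[Transform]) -> Record:
    """Pass one record through each transform in order."""
    for transform in transforms:
        record = transform(record)
    return record


chain = [
    route_topic(r"^dbserver1\.inventory\.(.*)$", r"\1"),
    insert_field("origin", "cdc"),
]
record = {"topic": "dbserver1.inventory.customers",
          "key": {"id": 1},
          "value": {"id": 1}}
result = apply_chain(record, chain)
```

Each transform returns a new record rather than mutating its input, which keeps the chain easy to reorder or extend.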
Connectors started long after DBs were created
• Databases don’t keep all past changes
- The logs are not kept indefinitely
• So CDC connectors often start by taking an initial snapshot
- Capture initial state of every row at that time
- Then capture and apply changes committed after initial copy started
- Transition can be tricky, but is easier if changes are idempotent
- Must handle failure at any point
• Consumers are eventually consistent with upstream sources
- More sophisticated consumers might process source transactions
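The snapshot-then-stream transition can be sketched as follows, assuming each change carries the full new state of a row and a log position. Under that assumption, replaying changes the snapshot already reflects is harmless. All names and the data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[int, str, Optional[str]]  # (log position, key, new value or None)


def apply_change(table: Dict[str, str], change: Change) -> None:
    """Idempotent apply: set the row to its new state, or remove it."""
    _, key, value = change
    if value is None:
        table.pop(key, None)
    else:
        table[key] = value


def bootstrap(snapshot: Dict[str, str], log: List[Change],
              snapshot_start: int) -> Dict[str, str]:
    """Copy the snapshot, then replay every change from snapshot_start on."""
    table = dict(snapshot)
    for change in log:
        if change[0] >= snapshot_start:
            apply_change(table, change)
    return table


log = [(1, "a", "1"), (2, "b", "1"), (3, "a", "2"), (4, "b", None), (5, "c", "1")]

# The snapshot began at position 3 and already saw that change to "a",
# but it read "b" before the delete at position 4 was committed.
snapshot = {"a": "2", "b": "1"}

result = bootstrap(snapshot, log, snapshot_start=3)

# Replaying the same changes again leaves the table unchanged.
replayed = bootstrap(result, log, snapshot_start=3)
```

If events carried deltas (for example "increment by 1") instead of full row state, the overlap between snapshot and log would have to be tracked exactly.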
Debezium connectors
• MySQL connector
- Multiple MySQL topologies
- GTIDs, DDL and DML, table filters, events mirror table structures
• MongoDB connector
- Replica set or sharded cluster
- Only insert events have “after” state; others have patch operation
• PostgreSQL connector
- Provides server-side logical decoding plugin
- Table filters, events mirror table structures
• SQL Server and Oracle connectors coming next
Using Debezium + Kafka Connect
[Diagram: MySQL → MySQL connector running in Kafka Connect → Apache Kafka™ → consumers]
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Consume change events
• Pause, undeploy, or redeploy connector at any time
• Consumers will keep consuming or block until there are more events
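Deploying, pausing, and undeploying a connector are done through the Kafka Connect REST API. The sketch below only prepares the requests. The worker URL, connector name, and all configuration values are placeholders, and the Debezium property names shown are from older releases and differ in newer ones, so check the documentation for your version.

```python
import json
import urllib.request
from typing import Optional

CONNECT_URL = "http://localhost:8083"  # assumption: a local Kafka Connect worker


def mysql_connector_payload(name: str) -> dict:
    """Build a connector definition; every value here is a placeholder."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": "mysql",
            "database.port": "3306",
            "database.user": "debezium",
            "database.password": "change-me",
            "database.server.id": "184054",
            # Property names below vary between Debezium versions.
            "database.server.name": "dbserver1",
            "database.history.kafka.bootstrap.servers": "localhost:9092",
            "database.history.kafka.topic": "schema-changes.inventory",
        },
    }


def connect_request(method: str, path: str,
                    payload: Optional[dict] = None) -> urllib.request.Request:
    """Prepare (but do not send) a request to the Kafka Connect REST API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        CONNECT_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


deploy = connect_request("POST", "/connectors",
                         mysql_connector_payload("inventory-connector"))
pause = connect_request("PUT", "/connectors/inventory-connector/pause")
undeploy = connect_request("DELETE", "/connectors/inventory-connector")

# To send a request against a running worker:
#     with urllib.request.urlopen(deploy) as response:
#         print(response.status)
```

Consumers are unaffected by pausing or undeploying the connector; they simply stop seeing new events until it resumes.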
[Diagram: one Kafka Connect cluster running several MySQL connectors and a PostgreSQL connector, each capturing a separate database into Apache Kafka™ for many consumers.]
Create data pipelines for data you already have
[Diagram: DB1 → Kafka Connect source connector (Extract) → Kafka Streams (Transform) → Kafka Connect sink connector (Load) → DB2, with applications and frameworks also consuming the streams.]
Summary
• Just configure and deploy connectors - no custom code!
• Continuously captures changes with low latency and without batching
• Fault tolerant
- failures only cause a delay in processing
- still process events at least once
- avoid dual-write problems
• Use stream processing to combine/merge/join multiple low-level events
• CDC is more complex, but the cost is amortized across multiple systems
• Works (for now) with a limited set of DBMSes that have APIs for CDC
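At-least-once delivery means a consumer may see the same event again after a failure. One way to tolerate that, sketched here under the assumption that events arrive in order from a single partition and carry a monotonically increasing source position, is to skip anything at or before the last position already applied. The class and field names are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (source log position, key, value)


class DedupingConsumer:
    """Counts updates per key, ignoring events that were already applied."""

    def __init__(self) -> None:
        self.updates_per_key: Dict[str, int] = {}
        self.last_position = -1
        self.skipped = 0

    def handle(self, event: Event) -> None:
        position, key, _value = event
        if position <= self.last_position:
            self.skipped += 1  # redelivered after a restart: already counted
            return
        self.updates_per_key[key] = self.updates_per_key.get(key, 0) + 1
        self.last_position = position


# Positions 2 and 3 are delivered twice, as after a crash and restart.
delivered: List[Event] = [
    (1, "a", "x"), (2, "b", "y"), (3, "a", "z"),
    (2, "b", "y"), (3, "a", "z"), (4, "c", "w"),
]

consumer = DedupingConsumer()
for event in delivered:
    consumer.handle(event)
```

In a real consumer the last applied position would need to be stored durably alongside the results, otherwise it is lost in the same failure that causes the redelivery.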
Interested? Want to contribute?
debezium.io
@debezium
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterconfluent
 
Apache Kafka - Patterns anti-patterns
Apache Kafka - Patterns anti-patternsApache Kafka - Patterns anti-patterns
Apache Kafka - Patterns anti-patternsFlorent Ramiere
 
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...confluent
 
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKai Wähner
 
Microservices Architecture Part 2 Event Sourcing and Saga
Microservices Architecture Part 2 Event Sourcing and SagaMicroservices Architecture Part 2 Event Sourcing and Saga
Microservices Architecture Part 2 Event Sourcing and SagaAraf Karsh Hamid
 
Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Dan Harvey
 
HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsCloudera, Inc.
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceChin Huang
 
Continuous DB Changes Delivery With Liquibase
Continuous DB Changes Delivery With LiquibaseContinuous DB Changes Delivery With Liquibase
Continuous DB Changes Delivery With LiquibaseAidas Dragūnas
 
Event Driven Architecture
Event Driven ArchitectureEvent Driven Architecture
Event Driven ArchitectureStefan Norberg
 
Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...
Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...
Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...HostedbyConfluent
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectKaufman Ng
 
Implementing Domain Events with Kafka
Implementing Domain Events with KafkaImplementing Domain Events with Kafka
Implementing Domain Events with KafkaAndrei Rugina
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaGuido Schmutz
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaGuido Schmutz
 

Was ist angesagt? (20)

Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
 
Apache Kafka - Patterns anti-patterns
Apache Kafka - Patterns anti-patternsApache Kafka - Patterns anti-patterns
Apache Kafka - Patterns anti-patterns
 
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
From Postgres to Event-Driven: using docker-compose to build CDC pipelines in...
 
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
 
Microservices Architecture Part 2 Event Sourcing and Saga
Microservices Architecture Part 2 Event Sourcing and SagaMicroservices Architecture Part 2 Event Sourcing and Saga
Microservices Architecture Part 2 Event Sourcing and Saga
 
Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.Change data capture with MongoDB and Kafka.
Change data capture with MongoDB and Kafka.
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table Snapshots
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph Performance
 
Continuous DB Changes Delivery With Liquibase
Continuous DB Changes Delivery With LiquibaseContinuous DB Changes Delivery With Liquibase
Continuous DB Changes Delivery With Liquibase
 
Event Driven Architecture
Event Driven ArchitectureEvent Driven Architecture
Event Driven Architecture
 
Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...
Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...
Advanced Change Data Streaming Patterns in Distributed Systems | Gunnar Morli...
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
 
Implementing Domain Events with Kafka
Implementing Domain Events with KafkaImplementing Domain Events with Kafka
Implementing Domain Events with Kafka
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
 

Ähnlich wie Capture the Streams of Database Changes

Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward
 
Data Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCData Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCAbhijit Kumar
 
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)Serena Software
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseWill Gardella
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connectconfluent
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Oracle11g R2 - Edition Based Redefinition for On Line Application Upgrade
Oracle11g R2 - Edition Based Redefinition for On Line Application UpgradeOracle11g R2 - Edition Based Redefinition for On Line Application Upgrade
Oracle11g R2 - Edition Based Redefinition for On Line Application UpgradeLucas Jellema
 
Evolutionary database design
Evolutionary database designEvolutionary database design
Evolutionary database designSalehein Syed
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkFabian Hueske
 
Database Migrations with Gradle and Liquibase
Database Migrations with Gradle and LiquibaseDatabase Migrations with Gradle and Liquibase
Database Migrations with Gradle and LiquibaseDan Stine
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY StyleAthens Big Data
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Fieldconfluent
 
Editioning use in ebs
Editioning use in  ebsEditioning use in  ebs
Editioning use in ebspasalapudi123
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of stateYoni Farin
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineSunil Nagaraj
 

Ähnlich wie Capture the Streams of Database Changes (20)

Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Data Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCData Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDC
 
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)
Overview and Demonstration of Dimensions CM 14.2 (FUG presentation track 2)
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and Couchbase
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connect
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Oracle11g R2 - Edition Based Redefinition for On Line Application Upgrade
Oracle11g R2 - Edition Based Redefinition for On Line Application UpgradeOracle11g R2 - Edition Based Redefinition for On Line Application Upgrade
Oracle11g R2 - Edition Based Redefinition for On Line Application Upgrade
 
Evolutionary database design
Evolutionary database designEvolutionary database design
Evolutionary database design
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache Flink
 
Database Migrations with Gradle and Liquibase
Database Migrations with Gradle and LiquibaseDatabase Migrations with Gradle and Liquibase
Database Migrations with Gradle and Liquibase
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
 
Riding the Streaming Wave DIY style
Riding the Streaming Wave  DIY styleRiding the Streaming Wave  DIY style
Riding the Streaming Wave DIY style
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
 
Editioning use in ebs
Editioning use in  ebsEditioning use in  ebs
Editioning use in ebs
 
Editioning use in ebs
Editioning use in  ebsEditioning use in  ebs
Editioning use in ebs
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
 
SOA_BPM_12c_launch_event_SOA_track_deepdive_developerproductivityandperforman...
SOA_BPM_12c_launch_event_SOA_track_deepdive_developerproductivityandperforman...SOA_BPM_12c_launch_event_SOA_track_deepdive_developerproductivityandperforman...
SOA_BPM_12c_launch_event_SOA_track_deepdive_developerproductivityandperforman...
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture Pipeline
 

Mehr von confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Capture the Streams of Database Changes

  • 1. Capture the Streams of Database Changes. Randall Hauch, founder of the Debezium project (@rhauch)
  • 2. Apache Kafka™. Diagram labels: Producers, Consumers, Apache Kafka Streams API, Apache Kafka Connect API, DB
  • 3. Change Data Capture Connectors. See the list at https://www.confluent.io/product/connectors/
  • 4. Why capture streams of data changes? Diagram labels: Application, DB, Apache Kafka™
  • 6. Streaming analytics and machine learning. Diagram labels: DB, Apache Kafka™, …
  • 7. Streaming ETL: Extract, Transform, Load. Diagram labels: DB, Apache Kafka™, DB2
  • 8. Shared data in a microservice architecture. Diagram labels: three bounded contexts (Service A with DB A, Service B with DB B, Service C with DB C); each sends changes to Apache Kafka™ and receives other data into materialized views
  • 10. How do we get a stream of data changes? Diagram labels: Application, DB, ?, Kafka, Consumers
  • 11. How do we get a stream of data changes?
      • Modify the app to write out events?
      • What about the other apps that change data?
      • Dual writes?!
      Diagram labels: Application, Application 2, Application 3, DB, Apache Kafka™, Consumers
  • 12. How do we get a stream of data changes?
      • Or we can watch the database
      • Need a connector to do this
      • Just install, configure and run it, and it will adapt
      • No need to change our apps!
      • Change data capture!
      Diagram labels: Application, DB, Kafka Connect, Connector, Apache Kafka™, Consumers
  • 13. Databases 101
      • Applications modify rows in transactions
      • DBMS records the changes in a log, then updates the tables
      • DBMS uses log for recovery, replication, …
          - MySQL binlog
          - MongoDB oplog
          - PostgreSQL WAL
      • We can (try to) use the log for CDC* (*mileage may vary)
      Example log: insert row 1, insert row 2, update row 1, insert row 3, delete row 2, insert row 4, update row 2
  • 14. Change Data Capture (CDC) at work
      • Read the changes from the database
          - Using the log or API
          - This is the hardest part
      • Write them in the same order
      • Don’t miss any changes
          - Okay, this is hard, too
      Diagram labels: Table, Stream
  • 16. Change Data Capture (CDC) at work (same bullets as slide 14). Diagram labels: Table, Stream, Table*
  • 18. Stream-Table Duality: we can view a table as a stream, and we can view a stream as a table
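The duality on slide 18 can be sketched in a few lines. This is a minimal illustration, not code from the talk: the `(key, value)` event shape, the use of `None` to mark a delete, and the function names `materialize` and `changelog` are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape: (row key, row value); a value of None marks a delete.
Event = Tuple[int, Optional[dict]]


def materialize(stream: List[Event]) -> Dict[int, dict]:
    """View a stream as a table: replay events in order, keeping the latest row per key."""
    table: Dict[int, dict] = {}
    for key, value in stream:
        if value is None:
            table.pop(key, None)  # delete event removes the row if present
        else:
            table[key] = value    # insert and update are both upserts
    return table


def changelog(table: Dict[int, dict]) -> List[Event]:
    """View a table as a stream: emit one upsert event per current row."""
    return [(key, row) for key, row in sorted(table.items())]


stream = [
    (1, {"name": "a"}),   # insert row 1
    (2, {"name": "b"}),   # insert row 2
    (1, {"name": "a2"}),  # update row 1
    (3, {"name": "c"}),   # insert row 3
    (2, None),            # delete row 2
]
table = materialize(stream)
```

Replaying the stream yields the table, and turning the table back into a stream and replaying it yields the same table again, which is the sense in which the two views are interchangeable.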
  • 19. Change Data Capture (CDC) at work (same bullets as slide 14). Diagram labels: Table, Stream, Table*
  • 20. What does a change event look like?
      • Primary/unique key of the row
      • Kind of operation: insert, update, delete
      • State of the row after the changes
      • State of the row before the changes
      • Source-specific provenance metadata
          - location in the log
          - database name, table name
          - transaction ID, source timestamp, …
      • Capture timestamp
  • 21. What does a change event look like?
      • Key
          - Primary/unique key of the row
      • Value
          - Operation
          - State of the row after the changes
          - State of the row before the changes (if available)
          - Source-specific provenance metadata
          - Capture timestamp
      • Timestamp
      This maps perfectly to a Kafka message!
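The key/value split described on slides 20 and 21 could be modelled roughly as follows. This is a sketch under stated assumptions: the class name `ChangeEvent`, its field names, the sample values, and the JSON encoding are illustrative choices, not the exact envelope any particular connector emits.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class ChangeEvent:
    key: dict               # primary/unique key of the row
    op: str                 # kind of operation: "insert", "update", or "delete"
    after: Optional[dict]   # state of the row after the change (None for a delete)
    before: Optional[dict]  # state of the row before the change, if available
    source: dict            # provenance: log position, database, table, ...
    captured_at_ms: int     # capture timestamp (illustrative field name)


def to_message(event: ChangeEvent) -> Tuple[bytes, bytes]:
    """Split the event into a (key, value) pair of bytes, as a Kafka message would carry it."""
    value = asdict(event)
    key = value.pop("key")  # the row key becomes the message key
    return (
        json.dumps(key, sort_keys=True).encode("utf-8"),
        json.dumps(value, sort_keys=True).encode("utf-8"),
    )


event = ChangeEvent(
    key={"id": 1},
    op="update",
    after={"id": 1, "name": "a2"},
    before={"id": 1, "name": "a"},
    source={"db": "inventory", "table": "customers", "log_position": 154},
    captured_at_ms=1500000000000,
)
key_bytes, value_bytes = to_message(event)
```

Using the row key as the message key means every change to one row lands in the same partition, so the changes for that row keep their order.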
  • 22. Single Message Transforms
      • Simple transformations for a single message
      • Defined as part of Kafka Connect
          - Some useful transforms provided in-the-box
          - Easily implement your own
      • Optionally deploy 1+ transforms with each connector
          - Modify messages produced by source connector
          - Modify messages sent to sink connectors
      • Makes it much easier to mix and match connectors
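The idea of chaining single-message transforms can be shown with a conceptual model. Real transforms are Java classes plugged into Kafka Connect; the Python below only mimics the shape of the idea, and the record layout and the names `extract_after`, `route_to`, and `apply_chain` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
# A transform takes one record and returns a new record, or None to drop it.
Transform = Callable[[Record], Optional[Record]]


def extract_after(record: Record) -> Optional[Record]:
    """Replace the full change-event value with just the row state after the change."""
    return {**record, "value": record["value"].get("after")}


def route_to(prefix: str) -> Transform:
    """Build a transform that prepends a prefix to the record's topic name."""
    def transform(record: Record) -> Optional[Record]:
        return {**record, "topic": prefix + record["topic"]}
    return transform


def apply_chain(record: Record, transforms: List[Transform]) -> Optional[Record]:
    """Apply each transform in order; stop early if one of them drops the record."""
    current: Optional[Record] = record
    for transform in transforms:
        if current is None:
            return None
        current = transform(current)
    return current


record = {
    "topic": "customers",
    "key": {"id": 1},
    "value": {"op": "update", "before": {"id": 1, "name": "a"}, "after": {"id": 1, "name": "a2"}},
}
out = apply_chain(record, [extract_after, route_to("cdc.")])
```

Each transform sees exactly one message and has no state across messages, which is what keeps transforms simple and lets them be mixed freely with different source and sink connectors.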
  • 23. Connectors started long after DBs were created
      • Databases don’t keep all past changes
          - The logs are not kept indefinitely
      • So CDC connectors often start by taking an initial snapshot
          - Capture initial state of every row at that time
          - Then capture and apply changes committed after initial copy started
          - Transition can be tricky, but is easier if changes are idempotent
          - Must handle failure at any point
      • Consumers are eventually consistent with upstream sources
          - More sophisticated consumers might process source transactions
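Why idempotent changes make the snapshot-to-streaming transition easier can be seen in a small simulation. This is a sketch, not how any specific connector is implemented: the `(position, key, row)` log layout and the function name `snapshot_then_stream` are assumptions for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (log position, row key, row state); None row means delete.
LogEntry = Tuple[int, int, Optional[dict]]


def snapshot_then_stream(
    snapshot_rows: Dict[int, dict],
    log: List[LogEntry],
    start_position: int,
) -> Dict[int, dict]:
    """Start from a snapshot, then apply every change at or after the recorded log position."""
    replica = dict(snapshot_rows)
    for position, key, row in log:
        if position < start_position:
            continue  # committed before the snapshot began, so already in the snapshot
        if row is None:
            replica.pop(key, None)  # deleting a missing row is a no-op
        else:
            replica[key] = row      # upsert: applying the same change twice is harmless
    return replica


log = [
    (0, 1, {"v": "a"}),
    (1, 2, {"v": "b"}),
    (2, 1, {"v": "a2"}),
    (3, 2, None),
]
# The log position (2) was recorded when the snapshot started, but the rows were
# read slightly later, so the snapshot already contains the change at position 2.
snapshot_rows = {1: {"v": "a2"}, 2: {"v": "b"}}
replica = snapshot_then_stream(snapshot_rows, log, start_position=2)
```

Because upserts and deletes are idempotent, re-applying the change at position 2 does no harm, so the reader does not need to know exactly which changes the snapshot already saw.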
  • 24. Debezium connectors
      • MySQL connector
          - Multiple MySQL topologies
          - GTIDs, DDL and DML, table filters, events mirror table structures
      • MongoDB connector
          - Replica set or sharded cluster
          - Only insert events have “after” state; others have patch operation
      • PostgreSQL connector
          - Provides server-side logical decoding plugin
          - Table filters, events mirror table structures
      • SQL Server and Oracle connectors coming next
  • 25. Using Debezium + Kafka Connect. Diagram labels: MySQL
  • 26. Using Debezium + Kafka Connect
      • Use existing Kafka cluster
      Diagram labels: MySQL, Apache Kafka™
  • 27. Using Debezium + Kafka Connect
      • Use existing Kafka cluster
      • Start Kafka Connect cluster
      Diagram labels: MySQL, Kafka Connect, Apache Kafka™
  • 28. Using Debezium + Kafka Connect
      • Use existing Kafka cluster
      • Start Kafka Connect cluster
      • Deploy Debezium connector(s)
      Diagram labels: MySQL, MySQL Connector, Kafka Connect, Apache Kafka™
  • 29. Using Debezium + Kafka Connect
      • Use existing Kafka cluster
      • Start Kafka Connect cluster
      • Deploy Debezium connector(s), begin snapshot
      Diagram labels: MySQL, MySQL Connector, Kafka Connect, Apache Kafka™
  • 30. Using Debezium + Kafka Connect
      • Use existing Kafka cluster
      • Start Kafka Connect cluster
      • Deploy Debezium connector(s), begin snapshot, capture changes
      Diagram labels: MySQL, MySQL Connector, Kafka Connect, Apache Kafka™
  • 31. Using Debezium + Kafka Connect
      • Use existing Kafka cluster
      • Start Kafka Connect cluster
      • Deploy Debezium connector(s), begin snapshot, capture changes
      • Consume change events
      Diagram labels: MySQL, MySQL Connector, Kafka Connect, Apache Kafka™, Consumers
  • 32. Using Debezium + Kafka Connect
      • Use existing Kafka cluster
      • Start Kafka Connect cluster
      • Deploy Debezium connector(s), begin snapshot, capture changes
      • Pause, undeploy, or redeploy connector at any time
      Diagram labels: MySQL, MySQL Connector, Kafka Connect, Apache Kafka™, Consumers
  • 33. Using Debezium + Kafka Connect
      • Use existing Kafka cluster
      • Start Kafka Connect cluster
      • Deploy Debezium connector(s), begin snapshot, capture changes
      • Pause, undeploy, or redeploy connector at any time
      • Consumers will keep consuming or block until there are more events
      Diagram labels: MySQL, MySQL Connector, Kafka Connect, Apache Kafka™, Consumers
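The "deploy Debezium connector(s)" step is typically done by posting a JSON document to the Kafka Connect REST API. The sketch below only builds that request and does not send it. The host names, credentials, connector name, and topic names are placeholders, and the Debezium property names shown follow older connector releases; they differ between versions, so treat them as assumptions and check the documentation for the version in use.

```python
import json
import urllib.request


def build_connector_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request for a Kafka Connect worker."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values throughout; property names are version-dependent assumptions.
mysql_config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.internal",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.server.id": "184054",
    "database.server.name": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka.example.internal:9092",
    "database.history.kafka.topic": "schema-changes.inventory",
}

request = build_connector_request("http://localhost:8083/", "inventory-connector", mysql_config)
# To actually deploy against a running worker: urllib.request.urlopen(request)
```

Deploying is then a matter of configuration only; pausing, removing, or redeploying the connector uses the same REST interface and requires no change to the applications that write to the database.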
  • 34. Using Debezium + Kafka Connect. Diagram labels (several repeated): MySQL, MySQL Connector, PostgreSQL, PostgreSQL Connector, Kafka Connect, Apache Kafka™, Consumers
  • 35. Create data pipelines for data you already have. Diagram labels: DB1, Kafka Connect Source Connector (Extract), Kafka Streams (Transform), Kafka Connect Sink Connector (Load), DB2
  • 36. Create data pipelines for data you already have. Diagram labels: DB1, Kafka Connect Source Connector (Extract), Kafka Streams (Transform), Kafka Connect Sink Connector (Load), DB2; plus a second Kafka Streams, Kafka Connect Sink Connector, and DB2
  • 37. Create data pipelines for data you already have. Diagram labels: DB1, Kafka Connect Source Connector, Kafka Streams, Kafka Connect Sink Connector, DB2; a second Kafka Streams, Kafka Connect Sink Connector, and DB2; Applications & Frameworks
  • 38. Summary
      • Just configure and deploy connectors - no custom code!
      • Continuously captures changes with low latency and without batching
      • Fault tolerant
          - failures only cause a delay in processing
          - still process events at least once
          - avoid dual-write problems
      • Use stream processing to combine/merge/join multiple low-level events
      • CDC is more complex, but the cost is amortized across multiple systems
      • Works with a limited set of DBMSes (for now) that have APIs for CDC
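"At least once" in the summary means a consumer may see the same change event again after a failure and restart. One common way to cope is to remember the last applied log position and skip anything at or before it. The sketch below illustrates that idea only; the class name `IdempotentApplier` and the `(position, key, row)` event shape are assumptions for the example.

```python
from typing import Dict, Optional


class IdempotentApplier:
    """Applies change events to a local table, ignoring redelivered events."""

    def __init__(self) -> None:
        self.table: Dict[int, dict] = {}
        self.last_position = -1  # highest log position applied so far
        self.skipped = 0         # count of redelivered events that were ignored

    def apply(self, position: int, key: int, row: Optional[dict]) -> bool:
        """Apply one event; return False if it was a duplicate and was skipped."""
        if position <= self.last_position:
            self.skipped += 1
            return False
        if row is None:
            self.table.pop(key, None)
        else:
            self.table[key] = row
        self.last_position = position
        return True


applier = IdempotentApplier()
# Positions 1 and 2 are delivered twice, as could happen after a restart.
deliveries = [
    (0, 1, {"v": "a"}),
    (1, 2, {"v": "b"}),
    (2, 1, {"v": "a2"}),
    (1, 2, {"v": "b"}),
    (2, 1, {"v": "a2"}),
    (3, 2, None),
]
for position, key, row in deliveries:
    applier.apply(position, key, row)
```

This relies on events arriving in log order within the stream being consumed, which matches the "write them in the same order" requirement from the CDC slides.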
  • 39. Interested? Want to contribute? debezium.io, @debezium