
No More Silos: Integrating Databases into Apache Kafka (Robin Moffatt, Confluent) Kafka Summit NYC 2019



Companies new and old are all recognizing the importance of a low-latency, scalable, fault-tolerant data backbone, in the form of the Apache Kafka streaming platform. With Kafka, developers can integrate multiple sources and systems, which enables low-latency analytics, event-driven architectures and the population of multiple downstream systems.

In this talk, we’ll look at one of the most common integration requirements – connecting databases to Kafka. We’ll consider the concept that all data is a stream of events, including that residing within a database. We’ll look at why we’d want to stream data from a database, including driving applications in Kafka from events upstream. We’ll discuss the different methods for connecting databases to Kafka, and the pros and cons of each. Techniques including Change-Data-Capture (CDC) and Kafka Connect will be covered, as well as an exploration of the power of KSQL for performing transformations such as joins on the inbound data.
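As a rough illustration of such a join (slides 4 and 75 show the enrichment pattern), here is a minimal plain-Python sketch, not KSQL. It assumes a single interleaved sequence of records, where `("customer", row)` is a CDC change to a customer table and `("rating", event)` is an event to enrich. The record shapes, the field names and the `stream_table_join` helper are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record shapes: ("customer", {...}) is a CDC change for the
# customer table; ("rating", {...}) is an event on the ratings stream.
Record = Tuple[str, dict]


def stream_table_join(records: Iterable[Record]) -> Iterator[dict]:
    """Left-join each rating event to the latest known customer row."""
    customers: dict = {}  # table materialised from the CDC events
    for kind, payload in records:
        if kind == "customer":
            # Upsert: the newest change for a key replaces the previous row.
            customers[payload["id"]] = payload
        elif kind == "rating":
            customer = customers.get(payload["customer_id"], {})
            # Emit the event enriched with the customer's current name, if known.
            yield {**payload, "customer_name": customer.get("name")}


if __name__ == "__main__":
    records = [
        ("customer", {"id": 1, "name": "Alice"}),
        ("rating", {"customer_id": 1, "stars": 4}),
        ("customer", {"id": 1, "name": "Alice Smith"}),  # name updated in the DB
        ("rating", {"customer_id": 1, "stars": 5}),
        ("rating", {"customer_id": 2, "stars": 1}),  # no customer row yet
    ]
    for row in stream_table_join(records):
        print(row)
```

The customer table is updated as CDC events arrive. Each rating is therefore enriched with the customer details current at that point in the sequence. A rating for an unknown customer is still emitted, as in a left join.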

Attendees of this talk will learn:
* That all data is event streams; databases are just a materialised view of a stream of events.
* The best ways to integrate databases with Kafka.
* Anti-patterns of which to be aware.
* The power of KSQL for transforming streams of data in Kafka.
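The account example on slides 14 to 18 illustrates the first point. Folding the stream of amounts per account ID yields the balance table: €50, then €75, then €15. This is a minimal Python sketch. The `materialise` function name is hypothetical, and amounts are whole euros for simplicity.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def materialise(events: Iterable[Tuple[str, int]]) -> dict:
    """Fold a stream of (account_id, amount) events into a balance table."""
    balances = defaultdict(int)
    for account_id, amount in events:
        balances[account_id] += amount  # each event updates one row of the table
    return dict(balances)


# The stream from slide 17: +€50, +€25, -€60 for account 12345.
stream = [("12345", 50), ("12345", 25), ("12345", -60)]

# Replaying successively longer prefixes of the stream reproduces each table state.
for n in range(1, len(stream) + 1):
    print(materialise(stream[:n]))
```

The table is never the source of truth here. It can always be rebuilt by replaying the stream from the start.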



  1. Integrating Databases and Apache Kafka® #kafkasummit @rmoff
  2. No More Silos: Integrating Databases and Apache Kafka @rmoff #kafkasummit Photo by Emily Morter on Unsplash
  3. Analytics - Database Offload: RDBMS to HDFS / S3 / BigQuery etc.
  4. Real-time Event Stream Enrichment: order events, plus customer data captured from an RDBMS via CDC, are joined by Stream Processing into customer orders
  5. Evolve processing from old systems to new (diagram: Existing App, RDBMS, Stream Processing, New App)
  6. “But streaming… I've just got data in a database… right?”
  7. “Bold claim: all your data is event streams”
  8. A Customer Experience
  9. A Sale
  10. A Sensor Reading
  11. An Application Log Entry
  12. Databases
  13. Do you think that’s a table you are querying?
  14. The Stream Table Duality. Table (Account ID | Balance): 12345 | €50
  15. The Stream Table Duality. Stream over time (Account ID | Amount): 12345 | +€50. Table (Account ID | Balance): 12345 | €50
  16. The Stream Table Duality. Stream over time (Account ID | Amount): 12345 | +€50, 12345 | +€25. Table (Account ID | Balance): 12345 | €75
  17. The Stream Table Duality. Stream over time (Account ID | Amount): 12345 | +€50, 12345 | +€25, 12345 | -€60. Table (Account ID | Balance): 12345 | €15
  18. The Stream Table Duality. Stream over time (Account ID | Amount): 12345 | +€50, 12345 | +€25, 12345 | -€60. Table (Account ID | Balance): 12345 | €15
  19. The truth is the log. The database is a cache of a subset of the log. (Pat Helland, Immutability Changes Everything, http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf) Photo by Bobby Burch on Unsplash
  20. Photo by Vadim Sherbakov on Unsplash
  21. Streaming Integration with Kafka Connect: Sources (syslog, flat file, CSV, JSON, MQTT) → Kafka Connect (Workers, Tasks) → Kafka Brokers
  22. Streaming Integration with Kafka Connect: Kafka Brokers → Kafka Connect (Workers, Tasks) → Sinks (Amazon S3, MQTT)
  23. Streaming Integration with Kafka Connect: Sources (syslog, flat file, CSV, JSON, MQTT) → Kafka Connect (Workers, Tasks) → Kafka Brokers → Kafka Connect → Sinks (Amazon S3, MQTT)
  24. Kafka Connect basics: Source → Kafka Connect → Kafka
  25. Connectors: Source → Kafka Connect (Connector) → Kafka
  26. Easy to configure: { "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "connection.url": "jdbc:mysql://asgard:3306/demo", "table.whitelist": "sales,orders,customers" } https://docs.confluent.io/current/connect/
  27. Converters: Source → Kafka Connect (Connector → Converter) → Kafka https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained
  28. Serialisation & Schemas (Avro, Protobuf, JSON, CSV) -> Confluent Schema Registry https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf
  29. The Confluent Schema Registry: Source → Kafka Connect → Avro Message → Kafka Connect → Target, with the Avro Schema stored in the Schema Registry
  30. Single Message Transforms: Source → Kafka Connect (Connector → Transform(s) → Converter) → Kafka
  31. Extensible: Connector, Converter, Transform(s) (hub.confluent.io)
  32. Change-Data-Capture (CDC): Query-based, Log-based
  33. Query-based CDC: SELECT * FROM my_table WHERE ts_col > previous ts
  34. Query-based CDC: SELECT * FROM my_table WHERE ts_col > previous ts
  35. Query-based CDC: SELECT * FROM my_table WHERE ts_col > previous ts
  36. Query-based CDC: SELECT * FROM my_table WHERE ts_col > previous ts
  37. Query-based CDC: SELECT * FROM my_table WHERE ts_col > previous ts
  38. Log-based CDC: #051024 17:24:13 server id 1 end_log_pos 98 # Position Timestamp Type Master ID Size Master Pos Flags # 00000004 9d fc 5c 43 0f 01 00 00 00 5e 00 00 00 62 00 00 00 00 00 # 00000017 04 00 35 2e 30 2e 31 35 2d 64 65 62 75 67 2d 6c |..5.0.15.debug.l| # 00000027 6f 67 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |og..............| # 00000037 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| # 00000047 00 00 00 00 9d fc 5c 43 13 38 0d 00 08 00 12 00 |.......C.8......| # 00000057 04 04 04 04 12 00 00 4b 00 04 1a |.......K...|
  39. Log-based CDC: Transaction log
  40. Log-based CDC: Transaction log
  41. Log-based CDC: Transaction log
  42. Log-based CDC: Transaction log
  43. Demo Time!
  44. Try it yourself: https://github.com/confluentinc/demo-scene/tree/master/no-more-silos
  45. "Which one should I use?" Photo by Tyler Nix on Unsplash
  46. It Depends! Photo by Trevor Cole on Unsplash
  47. Query-based CDC (Photo by Matese Fields on Unsplash) ✅ Usually easier to set up ✅ Requires fewer permissions 🛑 Needs specific columns in the source schema to track changes 🛑 Impact of polling the DB (or, as a trade-off, higher latency) 🛑 Can't track deletes 🛑 Can't track multiple events between polling intervals. Read more: http://cnfl.io/kafka-cdc
  48. Query-based CDC: CREATE TABLE my_table ( ID INT, FOO VARCHAR, BAR VARCHAR, WIBBLE VARCHAR, TS_COL TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ); SELECT * FROM my_table WHERE ts_col > previous ts
  49. Query-based CDC (INSERT): SELECT * FROM my_table WHERE ts_col > previous ts
  50. Query-based CDC (INSERT): SELECT * FROM my_table WHERE ts_col > previous ts
  51. Query-based CDC (UPDATE)
  52. Query-based CDC (UPDATE): SELECT * FROM my_table WHERE ts_col > previous ts
  53. Query-based CDC (DELETE)
  54. Query-based CDC (DELETE): SELECT * FROM my_table WHERE ts_col > previous ts → Nope
  55. Query-based CDC. Table orders (orderID | status | address | updateTS). SELECT * FROM orders WHERE updateTS > previous ts
  56. Query-based CDC. orders: 42 | SHIPPED | 29 Acacia Road | 10:54:29. SELECT * FROM orders WHERE updateTS > previous ts → { "orderID": 42, "status": "SHIPPED", "address": "29 Acacia Road", "updateTS": "10:54:29" }
  57. Query-based CDC. INSERT INTO orders (orderID, status) VALUES (42, 'PENDING'); → orders: 42 | PENDING | -- | 10:54:00
  58. Query-based CDC. UPDATE orders SET address = '1640 Riverside Drive' WHERE orderID = 42; → orders: 42 | PENDING | 1640 Riverside Drive | 10:54:20
  59. Query-based CDC. UPDATE orders SET address = '29 Acacia Road' WHERE orderID = 42; → orders: 42 | PENDING | 29 Acacia Road | 10:54:25
  60. Query-based CDC. UPDATE orders SET status = 'SHIPPED' WHERE orderID = 42; → orders: 42 | SHIPPED | 29 Acacia Road | 10:54:29
  61. Query-based CDC. Current row (orderID | status | address | updateTS): 42 | SHIPPED | 29 Acacia Road | 10:54:29. Changes that produced it: 42 | PENDING | -- | 10:54:00; 42 | PENDING | 1640 Riverside Drive | 10:54:20; 42 | PENDING | 29 Acacia Road | 10:54:25; 42 | SHIPPED | 29 Acacia Road | 10:54:29
  62. Query-based CDC. orders: 42 | SHIPPED | 29 Acacia Road | 10:54:29. SELECT * FROM orders WHERE updateTS > previous ts → { "orderID": 42, "status": "SHIPPED", "address": "29 Acacia Road", "updateTS": "10:54:29" }
  63. Event-driven app | Log-based CDC | Query-based CDC
  64. Log-based CDC: Transaction log
  65. Log-based CDC: UPDATE, Transaction log
  66. Log-based CDC: UPDATE, Transaction log
  67. Log-based CDC: DELETE, Transaction log
  68. Log-based CDC: DELETE, Transaction log
  69. Log-based CDC: DELETE, Transaction log
  70. Log-based CDC: Immutable event log
  71. Log-based CDC (Photo by Sebastian Pociecha on Unsplash) ✅ Greater data fidelity ✅ Lower latency ✅ Lower impact on source 🛑 More setup steps 🛑 Higher system privileges required 🛑 For proprietary databases, usually $$$. Read more: http://cnfl.io/kafka-cdc
  72. Change-Data-Capture (CDC): Query-based, Log-based
  73. tl;dr: which tool do I use? • Query-based CDC: confluent.io/connector/kafka-connect-jdbc
  74. Which Log-Based CDC Tool? • Open source RDBMS, e.g. MySQL, PostgreSQL: Debezium (+ paid options) • Mainframe, e.g. VSAM, IMS: Attunity, SQData • Proprietary RDBMS, e.g. Oracle, MS SQL, DB2: Oracle GoldenGate, Debezium + XStream, Attunity, IBM InfoSphere Data Replication, SQData, HVR, tcVISION, etc. See also: https://rmoff.net/2018/12/12/streaming-data-from-oracle-into-kafka-december-2018/
  75. Real-time Event Stream Enrichment: ratings events, plus customer data captured from an RDBMS via CDC, are joined by Stream Processing into customer ratings
  76. Demo Time!
  77. Try it yourself: https://github.com/confluentinc/demo-scene/tree/master/no-more-silos
  78. Confluent Community Components: Apache Kafka with a bunch of cool stuff! For free! (diagram: Confluent Platform. Apache Kafka® Core | Connect API | Streams API; Data Compatibility: Schema Registry; Monitoring & Administration: Confluent Control Center | Security; Operations: Replicator | Auto Data Balancing; Development and Connectivity: Clients | Connectors | REST Proxy | CLI; SQL Stream Processing: KSQL. Deployment: customer self-managed (Datacenter, Public Cloud) or Confluent fully-managed (Confluent Cloud). Surrounding labels: Database Changes, Log Events, IoT Data, Web Events, CRM, Data Warehouse, Database, Hadoop, Data Integration, Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications.)
  79. http://cnfl.io/book-bundle
  80. @rmoff robin@confluent.io http://cnfl.io/slack https://www.confluent.io/download/ http://cnfl.io/kafka-cdc
  81. #EOF
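Slide 26 shows a minimal JDBC source connector configuration, and slide 30 introduces Single Message Transforms. The following is a minimal Python sketch of deploying such a configuration through the Kafka Connect REST API (`POST /connectors`). It makes several assumptions:

- a Connect worker listens on `http://localhost:8083`, the default REST port;
- the connector name, the credentials and the `topic.prefix` value are hypothetical;
- timestamp mode polls on the `TS_COL` column from slide 48;
- two transforms, `ValueToKey` followed by `ExtractField$Key`, copy the `ID` column into the message key.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083/connectors"  # assumption: local Connect worker


def jdbc_source_payload(name: str) -> dict:
    """Build the request body for POST /connectors (values are illustrative)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": "jdbc:mysql://asgard:3306/demo",
            "connection.user": "connect_user",  # hypothetical credentials
            "connection.password": "connect_password",
            "table.whitelist": "my_table",
            # Query-based CDC: poll for rows whose TS_COL is newer than the last seen.
            "mode": "timestamp",
            "timestamp.column.name": "TS_COL",
            "poll.interval.ms": "5000",
            "topic.prefix": "mysql-",  # hypothetical; topic becomes mysql-my_table
            # SMTs: copy ID from the value into the key, then unwrap it to a primitive.
            "transforms": "copyIdToKey,extractKey",
            "transforms.copyIdToKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
            "transforms.copyIdToKey.fields": "ID",
            "transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
            "transforms.extractKey.field": "ID",
        },
    }


def create_connector(payload: dict) -> int:
    """POST the connector config to Kafka Connect and return the HTTP status."""
    request = urllib.request.Request(
        CONNECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    print(json.dumps(jdbc_source_payload("jdbc-source-my-table"), indent=2))
    # Requires a running Kafka Connect worker:
    # print(create_connector(jdbc_source_payload("jdbc-source-my-table")))
```

Timestamp mode is the query-based CDC pattern from slides 33 to 37, so it inherits the limitations listed on slide 47. The `incrementing` and `timestamp+incrementing` modes are alternatives when the table has a suitable ID column.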

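The orders example on slides 55 to 62 can be replayed locally to show what query-based CDC misses. This sketch uses Python's standard `sqlite3` module, not the JDBC connector, and the `poll` helper is hypothetical. SQLite has no `ON UPDATE CURRENT_TIMESTAMP`, so each statement sets `updateTS` explicitly to the times shown on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (orderID INTEGER PRIMARY KEY, status TEXT, "
    "address TEXT, updateTS TEXT)"
)


def poll(conn: sqlite3.Connection, previous_ts: str):
    """Query-based CDC: fetch rows changed since the last poll.

    Returns the rows and the new high-water mark (latest updateTS seen).
    """
    rows = conn.execute(
        "SELECT orderID, status, address, updateTS FROM orders "
        "WHERE updateTS > ? ORDER BY updateTS",
        (previous_ts,),
    ).fetchall()
    new_ts = rows[-1][3] if rows else previous_ts
    return rows, new_ts


last_ts = "00:00:00"

# Four changes land between two polls (statements and times from slides 57-60).
conn.execute("INSERT INTO orders VALUES (42, 'PENDING', NULL, '10:54:00')")
conn.execute("UPDATE orders SET address = '1640 Riverside Drive', updateTS = '10:54:20' WHERE orderID = 42")
conn.execute("UPDATE orders SET address = '29 Acacia Road', updateTS = '10:54:25' WHERE orderID = 42")
conn.execute("UPDATE orders SET status = 'SHIPPED', updateTS = '10:54:29' WHERE orderID = 42")

captured, last_ts = poll(conn, last_ts)
print(captured)  # [(42, 'SHIPPED', '29 Acacia Road', '10:54:29')]

# A deleted row no longer exists, so the next poll has nothing to return.
conn.execute("DELETE FROM orders WHERE orderID = 42")
deleted, last_ts = poll(conn, last_ts)
print(deleted)  # []
```

Only the final `SHIPPED` row is emitted, so the intermediate address `1640 Riverside Drive` never reaches Kafka. The delete produces no row at all. These are the two "Can't track" limitations listed on slide 47.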
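Log-based CDC (slides 64 to 71) instead records every committed change as an immutable event, deletes included. This sketch models change events as Python dicts with an `op` code and `before`/`after` row images, loosely modelled on Debezium's change-event envelope. The exact shape and the `change` and `apply_changes` helpers are assumptions for illustration, not any connector's actual output format. Replaying the log rebuilds the table, as in the stream-table duality on slides 14 to 18, while the log itself keeps the full history.

```python
from typing import Iterable, Optional

Row = dict


def change(op: str, before: Optional[Row], after: Optional[Row]) -> dict:
    """Build a change event: op is 'c' (create), 'u' (update) or 'd' (delete)."""
    return {"op": op, "before": before, "after": after}


def apply_changes(events: Iterable[dict]) -> dict:
    """Replay change events in log order to materialise the current table."""
    table: dict = {}
    for event in events:
        if event["op"] in ("c", "u"):
            row = event["after"]
            table[row["orderID"]] = row
        elif event["op"] == "d":
            # The delete is an event in its own right, carrying the old row image.
            table.pop(event["before"]["orderID"], None)
    return table


# Successive versions of order 42, matching the changes on slides 57-60.
v1 = {"orderID": 42, "status": "PENDING", "address": None}
v2 = {**v1, "address": "1640 Riverside Drive"}
v3 = {**v2, "address": "29 Acacia Road"}
v4 = {**v3, "status": "SHIPPED"}

log = [
    change("c", None, v1),
    change("u", v1, v2),
    change("u", v2, v3),
    change("u", v3, v4),
    change("d", v4, None),
]

print([e["op"] for e in log])  # ['c', 'u', 'u', 'u', 'd']
print(apply_changes(log[:4]))  # table state just before the delete
print(apply_changes(log))  # {}: the row is gone, but the log still has its history
```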