This talk helps developers and architects understand the benefits, opportunities and challenges of moving from an application architecture built on traditional point-to-point integration to one built on event streaming. Apache Kafka and Spring provide a solid foundation for enterprises and large organizations to implement event streaming solutions. Examples and common patterns are covered towards the end.
Many thanks to James Watters and all the original content authors, editors and aggregators referenced in the slides.
2. Who am I
Now part of VMware
■ Global CTO Data and Architecture, MAPBU
■ Advising government agencies, financial services, aircraft manufacturers, other F2000 and a few startups
■ Joined Pivotal from the Xtreme Labs acquisition in 2013 - moved from Toronto to San Diego
■ Joined VMware this January as a part of the Pivotal acquisition
3. Agenda at a Glance
■ Event Streaming Architecture: Beyond the Hype
■ Kafka Foundation and Internals
■ Patterns and Enterprise Examples
7. James Watters, Kafka Summit 2019 Keynote
Video: https://youtu.be/9I3CDfHKfNY?t=582
Slides (with transcript): https://alltechevent.com/kafka-summit-2019-keynote/
8. “A business is a series of events and the reactions to those events.”
—Jay Kreps, CEO, Confluent
9. “The only reason we don’t think of events… is that so far the technology has trained us to think of data as a static store.”
—Neha Narkhede
11. Events are Fundamental to the Design
➔ Events are the language bridge to the business
➔ This method of identifying bounded contexts is a secret to decoupled architecture
➔ “Tell don’t ask!”
12. The Architecture Challenge
Netflix: https://www.slideshare.net/brucewong3/the-case-for-chaos
Twitter: https://twitter.com/adrianco/status/441883572618948608
Hailo: http://www.sudo.hailoapp.com/services/2015/03/09/journey-into-a-microservice-world-part-3/
Sources
450+ microservices 500+ microservices 500+ microservices
13. Motivations Behind Kafka
Point-to-point integration is a nightmare
Jay Kreps, LinkedIn before Kafka:
https://genmwill.wordpress.com/2015/09/28/apache-kafka-and-the-internet-of-things/
Consistency across a heterogeneous system is hard
Martin Kleppmann, Why dual writes are a bad idea:
https://www.confluent.io/blog/using-logs-to-build-a-solid-data-infrastructure-or-why-dual-writes-are-a-bad-idea/
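The dual-write problem cited above can be illustrated with a small in-memory sketch. It is only a toy model under stated assumptions: the `db` and `cache` dictionaries stand in for two independent systems, and the function names are hypothetical. The first function interleaves two clients' writes so the stores end up disagreeing; the second applies a single ordered log to both stores, which is the alternative the log-based approach argues for.

```python
def dual_write_race():
    """Two clients each write to both stores; their writes interleave."""
    db, cache = {}, {}
    db["x"] = "A"     # client A updates the database
    db["x"] = "B"     # client B updates the database
    cache["x"] = "B"  # client B updates the cache
    cache["x"] = "A"  # client A updates the cache last
    return db, cache  # the two stores now disagree permanently


def apply_from_log():
    """Both stores consume the same totally ordered log of writes."""
    log = [("x", "A"), ("x", "B")]  # one agreed order for all writes
    db, cache = {}, {}
    for store in (db, cache):
        for key, value in log:
            store[key] = value
    return db, cache  # both stores converge on the same value
```

No error is raised in the race case, which is why this class of inconsistency tends to go unnoticed until the stores are compared.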
14. There is a better way!
■ Global banking brand building greenfield core banking and payments with Spring [Cloud] Streams + Kafka…
■ “Kleppmann-like view of Kafka: ‘We count on Kafka for consistency, strict ordering, replay, durability and auditability.’”
■ Kafka mind share 1.0 vs. 2.0
James Watters, Kafka Summit 2019 Keynote
Video: https://youtu.be/9I3CDfHKfNY?t=786
Slides (with transcript): https://alltechevent.com/kafka-summit-2019-keynote/
15. Kafka Mindshare Evolution
■ KMS 1.0: A distributed event hub that decouples data consumers from data producers. This enables mass scalability and agility.
■ KMS 2.0: A data streaming platform that can itself act as a database with ACID-transaction-like capabilities.
17. Kafka Compared to Traditional Message Brokers
● Like traditional message brokers, it provides a pub/sub mechanism to producers and consumers via immutable messages
● Unlike conventional message brokers, it exposes events in a durable distributed commit log over partitions, not exchange and queue data structures
● Consumers read based on an offset with no ACK sent directly to the producer
● Ability to go back and forth in logical time is particularly useful for batch processing modernization and fault/exception recovery in transactional systems
Apache Kafka docs: https://kafka.apache.org/documentation/#introduction
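To make the offset-based reading model above concrete, here is a minimal in-memory sketch. It is not the Kafka client API: `Log` and `Consumer` are hypothetical classes that only model an append-only log and a consumer that tracks its own position, so that rewinding (replay) is just a matter of changing the offset.

```python
class Log:
    """Append-only sequence of immutable records, addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own offset; the producer never waits for it."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Moving back in logical time replays already-read records.
        self.offset = offset
```

Because reading does not remove anything from the log, calling `seek(0)` after a fault lets a consumer reprocess the full history, which is the property the last bullet relies on.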
18. No ACK is a Good ACK!
Decoupling Producers from Consumers Architecturally
Scalability
■ Reading from an offset means the producer doesn’t have to wait for a consumer ACK
■ Throughput is limited only by how fast the broker can write producer messages to disk and replicate them
Agility
■ Data producer teams do not have to couple their system to slow or rogue downstream consumers ;)
■ Schema Registry allows producer and consumer applications to evolve their data format independently
19. Kafka Topics and Compaction
● Pub/sub in Kafka is categorized by Topics
● Within Topics, Partitions act as the unit of load balancing
● Fan-out pattern is achieved via Consumer Groups
● With log compaction, Kafka retains at least the last known value for each message key
● Compaction is a practical way to restore state after crashes and other faults within a bounded storage space
● Compaction runs as a background task that does not block reads or writes
Apache Kafka docs: https://kafka.apache.org/documentation/#introduction
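The idea of keeping the last known value per key can be sketched as a pure function over a list of records. This is a simplified model, assuming records are `(key, value)` tuples and ignoring details of real Kafka compaction such as delete markers and the uncompacted head of the log; `compact` is a hypothetical name.

```python
def compact(log):
    """Return the log with only the latest record per key, in offset order."""
    last_offset = {}
    for offset, (key, _value) in enumerate(log):
        last_offset[key] = offset  # later records overwrite earlier ones
    return [
        record
        for offset, record in enumerate(log)
        if last_offset[record[0]] == offset
    ]
```

The compacted log grows with the number of distinct keys rather than the number of updates, which is what bounds the storage needed to rebuild state.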
20. Topic Partitions
■ Partitions are units of parallelism / load balancing - like queues but not queues!
■ They are strictly ordered for the consumer(s) according to the offset
■ Important for achieving ACID transaction processing. We will get back to this later!
Martin Kleppmann, Staying in Sync: From Transactions to Streams:
Slides
https://speakerdeck.com/ept/staying-in-sync-from-transactions-to-streams?slide=63
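A short sketch of how keyed records map to partitions, and why ordering holds per key. This is an illustration only: `zlib.crc32` is used as a stand-in hash (Kafka's default partitioner uses a different hash function), and `partition_for` and `produce` are hypothetical helpers over plain lists.

```python
import zlib


def partition_for(key, num_partitions):
    """Deterministically map a key to a partition (stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(partitions, key, value):
    """Append to the key's partition; return (partition, offset)."""
    p = partition_for(key, len(partitions))
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1
```

All records with the same key land in the same partition, so a consumer of that partition sees them in the order they were produced; there is no ordering guarantee across different partitions.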
21. Consumer Groups
■ Enable wiretap/fan-out pattern for different kinds of consumers on the same topic
■ Combined with partitions you get elasticity and extensibility
■ A hypothetical corp-HR-weary example of agility and scale with consumer groups and partitions: Realtime variable compensation for all!
Kafka docs: https://kafka.apache.org/documentation/#introduction
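The two properties above (fan-out across groups, scale-out within a group) can be modelled in a few lines. This is a toy sketch, not Kafka's group coordination protocol: `Topic`, `assign` and the group names are hypothetical, and assignment is a simple round-robin.

```python
class Topic:
    """Partitions plus one committed offset per (group, partition)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self.committed = {}  # (group, partition) -> next offset to read

    def poll(self, group, partition):
        start = self.committed.get((group, partition), 0)
        records = self.partitions[partition][start:]
        self.committed[(group, partition)] = start + len(records)
        return records


def assign(partition_ids, members):
    """Spread partitions round-robin over the members of one group."""
    assignment = {member: [] for member in members}
    for i, partition in enumerate(partition_ids):
        assignment[members[i % len(members)]].append(partition)
    return assignment
```

Each group keeps its own offsets, so a new group (for example an audit or analytics consumer) can be added without affecting existing ones, while adding members to a group spreads its partitions over more workers.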
22. Stateful Processing
■ Per-message filtering, enrichment and transformations are stateless
■ Stateless processing in simplified Extract Transform Load (ETL) flows or Enterprise Integration Patterns (EIP) is readily possible with message queues - e.g. AMQP
■ Stateful processing acting on durable windows and persistent values is more complex
■ It requires a State Store - Kafka Streams natively supports a RocksDB K/V store
Confluent Platform docs:
https://docs.confluent.io/current/streams/architecture.html
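The difference between stateless and stateful processing can be sketched as follows. This is a plain-Python illustration rather than the Kafka Streams API: the event shape (`key`, `ts`, `amount`), the `is_large` filter and the `WindowedCounter` class are all assumptions, and an in-memory dictionary stands in for a durable state store.

```python
from collections import defaultdict


def is_large(event):
    """Stateless: the decision depends only on the current event."""
    return event["amount"] >= 100


class WindowedCounter:
    """Stateful: counts events per key within fixed-size time windows."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.store = defaultdict(int)  # stand-in for a durable state store

    def process(self, event):
        # Align the event timestamp to the start of its window.
        window_start = event["ts"] - event["ts"] % self.window_ms
        state_key = (event["key"], window_start)
        self.store[state_key] += 1
        return state_key, self.store[state_key]
```

If the process holding `store` crashes, the counts are lost unless the store is persisted and restorable, which is the reason a stateful processor needs a state store in the first place.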
23. Duality of Streams and Tables
Streams as Changelogs for Tables
● Each record captures an Upsert
● Playing back the stream can recreate a Table with message keys and values as tuples
Tables as Snapshots of Streams
● Snapshot is a representation of the most up to date key/value pair at the point in time
Confluent Platform docs:
https://docs.confluent.io/current/streams/concepts.html
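The duality can be shown with a few lines of plain Python. This is a sketch under the assumption that the changelog is a list of `(key, value)` upserts; `table_from_stream` is a hypothetical helper, not a Kafka Streams call.

```python
def table_from_stream(changelog):
    """Replay a changelog of upserts into a key/value table."""
    table = {}
    for key, value in changelog:
        table[key] = value  # upsert: insert a new key or overwrite it
    return table


changelog = [("alice", 1), ("bob", 5), ("alice", 3)]

# Replaying the whole stream yields the current table.
current = table_from_stream(changelog)

# Replaying a prefix yields the table as of that point in time.
earlier = table_from_stream(changelog[:2])
```

Here `current` holds alice's latest value while `earlier` still holds her first one, which is the sense in which a table is a snapshot of the stream at a given offset.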
24. Streaming APIs in Kafka
Kafka Streams is a library for stream processing - natively supported in Spring Cloud Stream
■ KStream: provides capabilities similar to user-defined functions, triggers and stored procedures in an RDBMS
■ KTable: similar to materialized views (MV) and acts as a pre-computed cache without the invalidation challenges
■ KSQL: Declarative SQL-like way to query streams in Kafka
Martin Kleppmann, Turning the Database inside out
https://www.oreilly.com/learning/making-sense-of-stream-processing/page/5/turning-the-database-inside-out
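As a rough illustration of the materialized-view idea behind KTable, the sketch below maintains a pre-computed aggregate that is updated on every event instead of being recomputed or invalidated on read. It is plain Python, not the Kafka Streams API; `BalanceView` and the `(account, delta)` event shape are assumptions made for the example.

```python
class BalanceView:
    """Incrementally maintained view: account -> running balance."""

    def __init__(self):
        self.balances = {}

    def on_event(self, account, delta):
        # Each incoming event updates the view in place, so reads
        # never hit a stale cache entry that needs invalidating.
        self.balances[account] = self.balances.get(account, 0) + delta

    def get(self, account):
        return self.balances.get(account, 0)


view = BalanceView()
for account, delta in [("acct-1", 100), ("acct-2", 50), ("acct-1", -30)]:
    view.on_event(account, delta)
```

The view is derived entirely from the event stream, so it can be dropped and rebuilt by replaying the stream from the beginning.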
25. Kafka Streams Example
Rabobank alerting platform
Jeroen van Disseldorp, Real-time Financial Alerts at Rabobank with Apache Kafka’s Streams API
https://www.confluent.io/blog/real-time-financial-alerts-rabobank-apache-kafkas-streams-api/
26. Not Much Code!
Jeroen van Disseldorp, Real-time Financial Alerts at Rabobank with Apache Kafka’s Streams API
https://www.confluent.io/blog/real-time-financial-alerts-rabobank-apache-kafkas-streams-api/
27. Further Reading
What started all of this
■ Jay Kreps, The Log: What every software engineer should know about real-time data's unifying abstraction
Kafka Summit 2019 Keynotes
■ Neha Narkhede, Event Streaming: Our Cloud-Native Journey Lessons
■ Martin Kleppmann, Is Kafka a Database?
■ Jay Kreps, Events Everywhere
■ James Watters, Spring Boot+Kafka: The New Enterprise Platform
A master-level read for any software engineer working on distributed systems
31. Investment Bank
Asset Management Division
Jared Ruckle, Matt Stine, Ford Donald, and Guillermo Tantachuco - PCF Secure Hybrid Banking White Paper
32. Asset Management Reference Architecture on CQRS
Jared Ruckle, Matt Stine, Ford Donald, and Guillermo Tantachuco - PCF Secure Hybrid Banking White Paper
33. App and Data Integration via streaming ETL and event intermediation
34. Centene Corporation
Bryan Zelle, Building an Enterprise Eventing Framework
https://www.confluent.io/online-talks/building-an-enterprise-eventing-framework-on-demand
35. Centene Corp.
Decorating events with metadata for Taxonomy and Governance
Bryan Zelle, Building an Enterprise Eventing Framework
https://www.confluent.io/online-talks/building-an-enterprise-eventing-framework-on-demand
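One way to picture decorating events with metadata is an envelope that wraps the business payload. The sketch below is purely illustrative: every field name (`id`, `type`, `source`, `domain`, `occurred_at`) and the `decorate` helper are assumptions made for this example and are not taken from the framework described in the referenced talk.

```python
import uuid
from datetime import datetime, timezone


def decorate(payload, event_type, source, domain):
    """Wrap a payload in an envelope carrying governance metadata."""
    return {
        "metadata": {
            "id": str(uuid.uuid4()),  # unique event identifier
            "type": event_type,       # taxonomy: what happened
            "source": source,         # governance: producing system
            "domain": domain,         # taxonomy: owning business domain
            "occurred_at": datetime.now(timezone.utc).isoformat(),
        },
        "payload": payload,           # business data, left untouched
    }
```

Keeping metadata separate from the payload lets platform teams route, audit and catalogue events without parsing each producer's business schema.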
36. Change Data Capture onboards Legacy Systems of Record to the Streaming Platform
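The shape of a change event can be sketched with a small helper. This is a simplified model, not a real CDC connector: production CDC tools typically read the database's transaction log, whereas this hypothetical `change_event` function just classifies a change from before/after row images. The `cdc.<table>` topic naming and the `id` primary-key column are assumptions.

```python
def change_event(table, before, after):
    """Build a change event from the before/after images of one row."""
    if before is None and after is None:
        raise ValueError("a change needs a before or an after image")
    if before is None:
        op = "insert"
    elif after is None:
        op = "delete"
    else:
        op = "update"
    row = after if after is not None else before
    return {
        "topic": f"cdc.{table}",  # hypothetical naming convention
        "key": row["id"],         # assumes an 'id' primary-key column
        "op": op,
        "before": before,
        "after": after,
    }
```

Keying events by primary key keeps all changes to one row in order on the same partition, so downstream services can rebuild the row's current state from the stream.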
37. HCSC Member Profile - Migration from Legacy RDBMS
Anupama Pradhan and Jeff Cherng - Rethinking RDBMS Data Migration, Cloud Foundry Summit 2017
Also see: https://www.slideshare.net/AndyAshta/delivering-healthcare-value-through-transformation-to-big-data-streams
38. Let’s empower pharmacy microservices developers while evolving our legacy?
➔ Mainframe and monolithic RDBMS data teams are often the last to move to continuous delivery
➔ Emerging patterns such as CDC and event shunting allow streaming data platform teams to offer mainframe and legacy RDBMS events to microservices teams
➔ Each team can build appropriate persistence and achieve multi-DC replication with the streaming platform
39. Batch to Streaming is the natural evolution of reacting to business events in real time
40. Real Estate Data
Integrated Data and Analytics Platform - hybrid of batch and streaming
Protected by copyright. All rights Reserved by CoreLogic.
[Diagram labels: Integrated Data and Analytics; Data Management / Operations; Data Content; Data Governance; Data Supply Chain (Real-Time & Batch); Distribution Channels]
42. Just-in-time Data Manufacturing
[Diagram labels: Analytics Platform; Pivotal Cloud Foundry; Data Source; REST API; Streaming API; Parse; Transform; Enrich; Filter; Score; Store; Batch Process; Analyze; Train; Data Lake]
44. Revolutionize our shipping efficiency with streaming microservices
➔ Turning moving packages into streaming data with RFID, Kafka and Spring Streams event-based microservices
➔ Kafka, Kubernetes and Spring Boot in every shipping center
➔ Multiple business microservices teams can layer onto the streaming platform to bin-pack last-mile services
➔ Prepared for unanticipated use cases
45. [Diagram labels: Shipping Centers; PKS Managed Clusters; Messaging Middleware (Kafka); Binder; Spring Data Repository; Event Driven Microservices: LTL Quote Service, Scan RFID Services, RFID Triggered Automation Services]
46. Help secure a European country?
➔ 100,00+ container build-out of Spring Streams, Kafka, key-value store
➔ Durability and consistency are critical for potential legal actions
➔ Multi-phase stream processing with Spring Streams leading to real-time predictive microservices alerting analysts
➔ Cross-cloud replication based on Kafka
➔ Continuous delivery required for real-time apps to improve accuracy and functionality as the project expands
47. [Diagram labels: X.000 Channel Streams; Fault tolerant receiver pairs (Receiver App); buffer queue; staging and replication Apps; process queue; Stream Workers (Data Enrichment); process queue; Stream Workers (Data Classification); S3 RAW Store; RDBMS Store; NoSQL Store; Index Store; 3 DC Kafka Replication]