This document discusses the top 5 use cases and architectures for data in motion in 2022. It describes:
1) The Kappa architecture as an alternative to the Lambda architecture that uses a single stream to handle both real-time and batch data.
2) Hyper-personalized omnichannel experiences that integrate customer data from multiple sources in real time to provide personalized experiences across channels.
3) Multi-cloud deployments using Apache Kafka and data mesh architectures to share data across different cloud platforms.
4) Edge analytics that deploy stream processing and Kafka brokers at the edge to enable low-latency use cases and offline functionality.
5) Real-time cybersecurity applications that use streaming data to detect and respond to threats as they occur.
The Top 5 Apache Kafka Use Cases and Architectures in 2022
1. The Top 5 Use Cases and Architectures for Data in Motion in 2022
Kappa Architecture, Omnichannel, Multi-Cloud, Edge Analytics, and Real-time Cybersecurity
Kai Waehner
Field CTO
kai.waehner@confluent.io
linkedin.com/in/kaiwaehner
@KaiWaehner
confluent.io
kai-waehner.de
4. @KaiWaehner www.kai-waehner.de
Real-time Data in Motion beats Slow Data.
● Transportation: real-time sensor diagnostics, driver-rider match, ETA updates
● Banking: fraud detection; trading and risk systems; mobile applications / customer experience
● Retail: real-time inventory, real-time POS reporting, personalization
● Entertainment: real-time recommendations, personalized news feed, in-app purchases
5. @KaiWaehner www.kai-waehner.de
This is a fundamental paradigm shift...
● Cloud: infrastructure as code is the future of the datacenter.
● Event Streaming: data in motion as continuous streams of events is the future of data.
6. @KaiWaehner www.kai-waehner.de
Apache Kafka is the Platform for Data in Motion
● Producers (MES, ERP, sensors, mobile apps) publish streams of real-time events.
● Kafka provides the streams and storage of real-time events.
● Connectors integrate sources and sinks; stream processing apps correlate events such as supplier, alert, forecast, inventory, customer, and order data.
● Consumers include Customer 360, a real-time alerting system, and the data warehouse.
7. @KaiWaehner www.kai-waehner.de
The Top 5 Use Cases and Architectures for Data in Motion in
2022
1) The Kappa Architecture
2) Hyper-personalized Omnichannel
3) Multi-Cloud Deployments
4) Edge Analytics
5) Real-time Cybersecurity
9. @KaiWaehner www.kai-waehner.de
Lambda Architecture
Option 1: Unified serving layer
● A data source feeds two parallel layers: a real-time layer (data processing in motion, millisecond latency) and a batch layer (data processing at rest, minute/hour latency).
● Both layers write to a shared serving layer, which is consumed by real-time apps (data processing in motion) and batch apps (data processing at rest).
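The contrast with the Kappa alternative can be sketched in a few lines of plain Python: instead of two separate pipelines, a single replayable log serves both kinds of consumers, each reading from its own offset. This is an illustrative in-memory model, not Kafka's actual API:

```python
class EventLog:
    """Minimal in-memory stand-in for a Kafka topic with offsets."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def read(self, from_offset=0):
        # Any consumer can read from any offset -- the basis for serving
        # both real-time and "batch" reads from one log.
        return self.events[from_offset:]

log = EventLog()
for order in [{"id": 1, "total": 40}, {"id": 2, "total": 60}]:
    log.append(order)

# Real-time consumer: processes only new events from its committed offset.
new_events = log.read(from_offset=1)

# Batch-style consumer: replays the full history from offset 0.
full_history = log.read(from_offset=0)

print(len(new_events), len(full_history))   # 1 2
```

A Kappa architecture exploits exactly this property: the "batch" view is just a replay of the same stream from offset zero, so no second codebase is needed.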
15. @KaiWaehner www.kai-waehner.de
Kappa @ Shopify
Kappa Building Blocks
● The Log (Kafka): durability with topic compaction and Tiered Storage; consistency via exactly-once semantics (EOS); data integration via Kafka Connect; elasticity via dynamic Kafka clusters
● Streaming framework (Kafka Streams / Flink): reliability and scalability, fault tolerance, state management
● Sinks: update/upsert for a simplified design (RDBMS, NoSQL, compacted Kafka topics); append-only (regular Kafka topics, time series)
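The update/upsert sink style maps directly onto log compaction: a compacted topic keeps only the latest record per key, turning an append-only stream into a table of current state. A minimal sketch of the idea (the `compact` helper and `sku-*` keys are invented for illustration, not Kafka code):

```python
def compact(log):
    """Keep only the latest record per key, the way a compacted Kafka
    topic turns an append-only stream into a table of current state."""
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)   # null value = tombstone: delete the key
        else:
            latest[key] = value     # later records overwrite earlier ones
    return latest

# Upserts to a hypothetical inventory topic: (item, quantity)
log = [("sku-1", 10), ("sku-2", 5), ("sku-1", 7), ("sku-2", None)]
print(compact(log))   # {'sku-1': 7}
```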
17. @KaiWaehner www.kai-waehner.de
Benefits of the Kappa Architecture
The Kappa architecture leverages a single source of truth with a focus on simplicity in the enterprise architecture:
• Improve streaming to handle all the cases
• One codebase that is always in sync
• One set of infrastructure and technology
• The heart of the infrastructure is real-time, scalable, and reliable
• Improved data quality with guaranteed ordering and no mismatches
• No need to re-architect for new use cases, just connect new consumers (real-time, near real-time, batch, RPC)
• Kappa is NOT a free lunch – know the trade-offs and best practices
19. @KaiWaehner www.kai-waehner.de
Kappa Concerns Solved
• Data availability / retention: compacted topics, Tiered Storage
• Data consistency and fault tolerance: exactly-once semantics, Multi-Region Clusters, Cluster Linking
• Handling late-arriving data: state management in the streaming application, proper data sinks, replay with guaranteed ordering and timestamps
• Data reprocessing and backfill: dynamic clusters, stateful applications (Kafka Streams, ksqlDB, or an external stream processing framework like Apache Flink)
• Data integration: Kafka Connect for sources and sinks, clients for any language, REST Proxy (real-time, but also batch and RPC)
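Handling late-arriving data via state and replay can be illustrated with a toy event-time windowing loop. This is a simplified model of what Kafka Streams or Flink do with watermarks and grace periods; `WINDOW`, `allowed_lateness`, and `run` are made-up names for the sketch:

```python
from collections import defaultdict

WINDOW = 60  # seconds of event time per tumbling window

def run(events, allowed_lateness=30):
    """Tumbling event-time windows with a watermark: late events within
    the allowed lateness still update their window; older ones are dropped."""
    windows = defaultdict(int)
    watermark = 0
    dropped = []
    for ts, value in events:              # (event_time, value), in arrival order
        watermark = max(watermark, ts)    # watermark advances with max seen time
        start = ts - ts % WINDOW          # which window this event belongs to
        if start + WINDOW + allowed_lateness >= watermark:
            windows[start] += value       # on-time or tolerably late
        else:
            dropped.append((ts, value))   # too late: past the grace period
    return dict(windows), dropped

# Event at ts=50 arrives after ts=70 but is still counted; ts=10 arrives
# after the watermark has reached 200 and is dropped.
events = [(5, 1), (70, 1), (50, 1), (200, 1), (10, 1)]
wins, dropped = run(events)
print(wins, dropped)   # {0: 2, 60: 1, 180: 1} [(10, 1)]
```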
20. @KaiWaehner www.kai-waehner.de
2) Hyper-personalized Omnichannel
21. @KaiWaehner www.kai-waehner.de
The New Business Reality
Then: technology was a support function; innovation was required for growth; it was “good enough” to run on yesterday’s data.
Now: technology is the business; innovation is required for survival; yesterday’s data = failure.
Modern, real-time data infrastructure is required.
22. @KaiWaehner www.kai-waehner.de
Disruptive Trends in Retail: Customer Experience (CX), Operational Efficiencies, New Business Models
General trends:
● Highly competitive market, working to thin margins
● Moving from High Street (brick & mortar) to online (omnichannel)
● Personalized customer experience – optimal buyer journey
Examples:
● Real-time automation of customer interactions
● Improved shipping and delivery methods
● Customer-driven in-store experiences
● Hybrid shopping models
● Social influencers / virtual reality shopping: journey-focused innovation
● Warehouse logistics teams aligned with real-time, in-store demands
● Automating the supply chain and core business processes
● Data-driven business decisions and personalized promotions
23. @KaiWaehner www.kai-waehner.de
“Walmart is a $500 billion in revenue company, so every second is worth millions of dollars. Having Confluent as our partner has been invaluable. Kafka and Confluent are the backbone of our digital omnichannel transformation and success at Walmart.”
– VP of Walmart Cloud
24. @KaiWaehner www.kai-waehner.de
Real-Time Inventory System
● Investment in Kafka and Confluent has helped top-line company growth
● 8,500 nodes processing 11 billion events per day
● Deliver an omnichannel experience so every customer can shop the way they want to
https://www.confluent.io/blog/walmart-real-time-inventory-management-using-kafka/
https://www.confluent.io/kafka-summit-san-francisco-2019/when-kafka-meets-the-scaling-and-reliability-needs-of-worlds-largest-retailer-a-walmart-story/
25. @KaiWaehner www.kai-waehner.de
Context-specific Customer 360
AO (electrical retailer):
● Hyper-personalized online retail experience, turning each customer visit into a one-on-one marketing opportunity
● Correlation of historical customer data with real-time digital signals
● Maximize customer satisfaction and revenue growth; increased customer conversions
https://www.confluent.io/customers/ao/
26. @KaiWaehner www.kai-waehner.de
Dick’s Sporting Goods
America’s largest sporting goods retail company, focused on helping athletes achieve their personal best:
● Reshape the way athletes gain access to context-specific product information in real time for a more seamless purchasing experience online and in stores
● Handle pricing and promotions, marketing, and athlete services in real time to ensure a consistent omnichannel experience and positive athlete service interactions
● Fully managed multi-cloud strategy with Confluent Cloud for improved time-to-market and reduced operations cost
confluent.io/customers/dicks-sporting-goods
27. @KaiWaehner www.kai-waehner.de
Omnichannel Retail
Customer 360 (website, mobile app, on site in store, in-car), correlated along the customer’s timeline:
● Context-specific marketing campaign (90 and 60 days ago)
● Car configurator (10 and 8 days ago)
● Location-based customer action (right now)
● Sales talk on site in the car dealership (right now)
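At its core, a context-specific Customer 360 is a stream-table join: each real-time signal is enriched with the customer's history before it reaches marketing. A hypothetical sketch (the customer IDs, field names, and `enrich` helper are invented for illustration):

```python
# Historical customer data, e.g. materialized from a compacted customer topic.
profiles = {
    "c42": {"name": "Alex", "recent_campaign": "spring-sale",
            "configured_car": "Model X"},
}

def enrich(signal, profiles):
    """Join a real-time digital signal with the customer's history to
    produce one context-specific event for downstream marketing."""
    profile = profiles.get(signal["customer_id"], {})
    return {**profile, **signal}   # signal fields win on conflicts

# Real-time signal: the customer just walked into the dealership.
event = enrich({"customer_id": "c42", "action": "entered_store",
                "location": "dealership-7"}, profiles)
print(event["name"], event["action"])   # Alex entered_store
```

In Kafka Streams or ksqlDB, the same pattern is a stream-table join between a clickstream/location stream and a customer table.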
30. @KaiWaehner www.kai-waehner.de
3) Multi-Cloud Deployments
35. @KaiWaehner www.kai-waehner.de
With stream processing, the real-time applications are decentralized:
● Events are the interface to the mesh: each data product exposes and consumes event streams.
● With a stream processor such as ksqlDB, a query becomes the interface to the mesh.
37. @KaiWaehner www.kai-waehner.de
Data Mesh Example: Hybrid Multi-Cloud Architecture
Roles involved across the mesh: data engineers, data scientists, data architects, operators, architects, SMEs, data governance and shared services teams, application teams with generalist engineers, and specialized / legacy engineers.
38. @KaiWaehner www.kai-waehner.de
Kafka as a Service – Fully Managed?
A fully managed service covers infrastructure management (commodity), Kafka-specific management, and scaling:
● Upgrades (latest stable version of Kafka)
● Patching
● Maintenance
● Sizing (retention, latency, throughput, storage, etc.)
● Data balancing for optimal performance
● Performance tuning for real-time and latency requirements
● Fixing Kafka bugs
● Uptime monitoring and proactive remediation of issues
● Recovery support from data corruption
● Scaling the cluster as needed
● Data balancing the cluster as nodes are added
● Support for any Kafka issue with less than X minutes response time
From Infra-as-a-Service to Platform-as-a-Service: harness the full power of Kafka, evolve as you need, future-proof, mission-critical reliability.
Most Kafka-as-a-Service offerings are only partially managed. Kafka as a Service should be a serverless experience with consumption-based pricing!
39. @KaiWaehner www.kai-waehner.de
Data Governance: Tracking data lineage with streams in real time
• Lineage must work across domains and data products, and across systems, clouds, and data centers (on-premise and in the cloud).
• Event streaming is a foundational technology for this.
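One common way to follow the data is to carry provenance with each event as it crosses systems. A toy sketch, assuming lineage is stored as a list inside the event itself (in Kafka this metadata would more typically live in record headers; the step names are invented):

```python
def process(event, step):
    """Each processing step appends itself to a lineage field, so the
    final event carries its full path across domains and systems."""
    out = dict(event)   # copy so the input event stays unchanged
    out["lineage"] = event.get("lineage", []) + [step]
    return out

raw = {"order_id": 7, "lineage": ["orders-topic@on-prem"]}
e = process(raw, "enrich-stream@cloud-a")
e = process(e, "fraud-check@cloud-b")
print(e["lineage"])
# ['orders-topic@on-prem', 'enrich-stream@cloud-a', 'fraud-check@cloud-b']
```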
40. @KaiWaehner www.kai-waehner.de
4) Edge Analytics
41. @KaiWaehner www.kai-waehner.de
What is the “Edge” for Kafka?
• Edge is NOT a data center
• Kafka clients AND the Kafka broker(s)
• Offline business continuity
• Often 100+ locations
• Low-footprint and low-touch
• Hybrid integration
45. @KaiWaehner www.kai-waehner.de
Event Streaming at the Edge in the Smart Retail Store
Streams of real-time events connect the local applications: point of sale (POS), loyalty system, local inventory management, payment, and discounts, exchanging customer data, train schedules, payment data, loyalty information, and item availability, with replication to global inventory management.
46. @KaiWaehner www.kai-waehner.de
Disconnected Edge
Benefits: always on (even “offline”), replayability, reduced traffic cost, better latency.
Latency tiers:
● Context-specific advertisement and location-based customer actions: real-time (milliseconds)
● Payment processing: near real-time (seconds)
● Replication to the cloud: batch (depending on network bandwidth)
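Offline business continuity at the edge boils down to store-and-forward: the local broker keeps accepting events while the uplink is down and replicates the backlog once connectivity returns. A simplified model (the `EdgeBuffer` class is an illustration of the pattern, not a Kafka feature):

```python
class EdgeBuffer:
    """Buffers events locally while the uplink is down and flushes them,
    in order, once connectivity returns (store-and-forward)."""
    def __init__(self):
        self.pending = []      # events held at the edge
        self.replicated = []   # events that reached the cloud
        self.online = True

    def publish(self, event):
        self.pending.append(event)   # the local broker always accepts
        if self.online:
            self.flush()

    def flush(self):
        self.replicated.extend(self.pending)   # batch replication to cloud
        self.pending.clear()

edge = EdgeBuffer()
edge.publish("payment-1")
edge.online = False              # network outage: store locally
edge.publish("payment-2")
edge.publish("payment-3")
edge.online = True
edge.flush()                     # backfill once reconnected, order preserved
print(edge.replicated)   # ['payment-1', 'payment-2', 'payment-3']
```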
48. @KaiWaehner www.kai-waehner.de
Devon Energy Corporation
Oil & Gas industry: improve drilling and well completion operations
● Edge stream processing/analytics + closed-loop control ready
● Replication to the cloud in real time at scale
● Vendor agnostic (pumping, wireline, coil, offset wells, drilling operations, producing wells)
● Cloud agnostic (AWS, GCP, Azure)
49. @KaiWaehner www.kai-waehner.de
5) Real-time Cybersecurity
50. @KaiWaehner www.kai-waehner.de
What is Cybersecurity?
Protection of computer systems and networks from information disclosure and theft.
Threat actors: web scrapers, hackers, criminals, terrorists, state-sponsored and state-initiated actors.
51. @KaiWaehner www.kai-waehner.de
Supply Chain Attack
Targeting less-secure elements in the supply chain
https://www.nortonrosefulbright.com/en/knowledge/publications/dfa3603c/six-degrees-of-separation-cyber-risk-across-global-supply-chains
https://www.reuters.com/article/us-tmobile-dataprotection-idUSKCN0RV5PL20151002
52. @KaiWaehner www.kai-waehner.de
Real-time Data in Motion beats Slow Data.
● Security: access control and encryption, regulatory compliance, rules engines, security monitoring, surveillance
● Cybersecurity: risk classification, threat detection, intrusion detection, incident response, fraud detection
53. @KaiWaehner www.kai-waehner.de
Data in Motion: The Backbone for Cybersecurity
Sources across Industrial OT, Enterprise IT, Consumer IoT, and connected vehicles (logs, sensors, personal and security data) feed streams of real-time events, enabling continuous data correlation, monitoring, alerting, and proactive actions.
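Continuous correlation for threat detection is typically a stateful sliding-window computation over the event stream. A self-contained sketch of one such rule, brute-force login detection, with invented window and threshold values:

```python
from collections import defaultdict, deque

WINDOW = 60        # seconds
THRESHOLD = 3      # failed logins per window, per source

def detect(events):
    """Flag sources exceeding THRESHOLD failed logins within a sliding
    WINDOW. Events are time-ordered (timestamp, source_ip) pairs."""
    recent = defaultdict(deque)
    alerts = []
    for ts, source in events:
        q = recent[source]
        q.append(ts)
        while q and q[0] <= ts - WINDOW:   # evict events outside the window
            q.popleft()
        if len(q) >= THRESHOLD:
            alerts.append((ts, source))    # proactive action would fire here
    return alerts

events = [(0, "10.0.0.9"), (10, "10.0.0.9"), (20, "10.0.0.5"),
          (30, "10.0.0.9"), (90, "10.0.0.9")]
print(detect(events))   # [(30, '10.0.0.9')]
```

In production this state would live in a Kafka Streams state store or Flink keyed state rather than an in-process dict.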
54. @KaiWaehner www.kai-waehner.de
End-to-End Cybersecurity with the Kafka Ecosystem
A maritime example: on the vessel, a resilient Kafka deployment handles edge analytics, data integration, streaming analytics, and “machine doing”, fed by personnel data (crew, cargo), vessel telemetry (fuel consumption, speed, planned maintenance), and tracking data (position, course, weather, draft). Communications run over a drone or satellite relay, with bi-directional hybrid cloud replication to on-shore, on-prem systems for staging, filtering, and shore-side edge analytics.
55. @KaiWaehner www.kai-waehner.de
SIEM / SOAR
Situational Awareness
Operational Awareness
Intrusion Detection
Signals and Noise
Signature Detection
Incident Response
Threat Hunting & Intelligence
Vulnerability Management
Digital Forensics
…
Kafka was not built for cybersecurity!
56. @KaiWaehner www.kai-waehner.de
Integrate with all legacy and modern interfaces
Record, filter, curate a broad set of traffic streams
Let analytic sinks consume just the right amount of data
Drastically reduce the complexity of the enterprise architectures
Drastically reduce the cost of SIEM / SOAR deployments
Add new analytics engines
Add stream-speed detection and response at scale in real-time
Add mission-critical (non-) security-related applications
…
Kafka is the backbone for cybersecurity!
57. @KaiWaehner www.kai-waehner.de
Confluent Sigma
Zeek data (DNS, CONN, DHCP, HTTP, SSL, x509) is streamed into Kafka topics (e.g., a dns topic). A Sigma rule editor publishes rules to a sigma rules topic, which the Sigma stream processors load into a sigma rules cache. The processors apply rule parsing, filtering, aggregation, and windowing to the event streams and write results to detections topics (e.g., dns detections), which feed the Zeek data and detections viewer.
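The Sigma stream processors evaluate detection rules against parsed Zeek events. Real Sigma rules are YAML with a much richer condition language; this sketch only mimics the field-matching idea with a made-up dict-based rule syntax:

```python
# A simplified detection rule in the spirit of Sigma: field/value
# conditions matched against parsed network events (illustrative syntax).
rule = {
    "title": "Suspicious DNS query",
    "detection": {"event_type": "dns", "query|endswith": ".xyz"},
}

def matches(rule, event):
    """Return True if every condition in the rule matches the event."""
    for field, expected in rule["detection"].items():
        if field.endswith("|endswith"):
            value = event.get(field.split("|")[0], "")
            if not value.endswith(expected):
                return False
        elif event.get(field) != expected:
            return False
    return True

events = [
    {"event_type": "dns", "query": "cdn.example.com"},
    {"event_type": "dns", "query": "beacon.badhost.xyz"},
]
detections = [e for e in events if matches(rule, e)]
print(len(detections))   # 1
```

In Confluent Sigma, the equivalent matching runs continuously as a Kafka Streams topology, with rules hot-reloaded from the sigma rules topic.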
60. @KaiWaehner www.kai-waehner.de
The Rise of Data in Motion
● 2010: Apache Kafka created at LinkedIn by the Confluent founders
● 2014: Confluent founded
● 2020: 80% of Fortune 100 companies trust and use Apache Kafka
61. @KaiWaehner www.kai-waehner.de
Event Streaming Maturity Model (value grows with investment & time)
1) Initial awareness / pilot (1 Kafka cluster)
2) Start to build a pipeline / deliver 1 new outcome (1 Kafka cluster)
3) Mission-critical deployment (stretched, hybrid, multi-region)
4) Build contextual event-driven apps (stretched, hybrid, multi-region)
5) Central nervous system (global Kafka)
Supported along the journey by product, support, training, partners, technical account management...
I want to call out four major trends: (1) cloud, (2) AI and machine learning, (3) mobile devices and ubiquitous connectivity, (4) event streaming. Each of these trends changes the way we think.
1) The cloud has changed how we think about data centers and running technical infrastructure. Today, every company is moving to the cloud—your company is [quite likely] doing the same.
2) Machine learning changes how decisions are being made, and this happens increasingly in an automated manner, driven by software that talks to other software.
3) Mobile devices and Internet connectivity have dramatically changed the user experience of how customers want to interact with us, and raised the bar for their expectations. If you can rent the latest blockbuster movie with 1 click on an iPad, you will no longer accept that your bank can take hours or days to inform you of a payment.
4) Event streaming has changed how we think about and how we work with the data that underlies all the other trends. This is the subject of this talk, so let’s take a closer look!
The same is true for running a business. No matter the industry, real-time data beats slow data. Here are but a few examples, some of which you may recognize from your own use cases.
So Event Streaming is really a fundamental paradigm shift. Just like the Cloud is the future of the Data CENTER, where we now treat physical infrastructure as software code so we can spin up new servers in a matter of seconds, Event Streaming is the future of DATA itself. Here, we realize that, in the real world, data about our business is a continuous, never-ending stream of events, and customers expect us to understand and respond immediately to all this information. [NEXT SLIDE, “What is Event Streaming?”]
There is a new business reality. In the past, technology was a mere support function. We innovated when we needed to grow the business. And in this situation, it was “good enough” to run the business on yesterday’s data. But today, technology IS the business. And if you don’t innovate, you will lose to the competition. And in order to survive, we need modern, real-time data infrastructures.
Here is the story of Walmart, the largest retailer in the world. Walmart’s success is largely dependent on their digital capabilities. Let me share just a few numbers of what they need to integrate: 5000+ stores, 150+ distribution centers, 1000+ vendors, 53K+ trailers owned, 1M+ online transactions, 25M customers per week. Today, Kafka is used for Walmart’s real-time inventory systems, fulfillment, security, fraud prevention. It’s used all across Walmart.com: every single click is streamed into Kafka and made available to every application that needs to consume that data. Another example is Walmart’s grocery pick-up business, which has become more important than ever in the age of COVID. Event streaming enables this from the beginning to the end: when customers interact with their app, all the user behavioral data is streamed into Confluent. When orders are placed, all data flows into Confluent. When the customer enters the store to pick up their groceries, those events are streamed to Confluent. And so on. As we can see, event streaming and Kafka are at the heart of Walmart’s success and their digital transformation.
Small scale: data pipelines are constantly broken. Large scale: finance and risk have completely different numbers. Story of one path for books in an investment bank (Booz Allen Hamilton): 3 months of analysis, 2 hours to explain.
This allows the applications to connect around data in motion. It acts as a kind of central nervous system, letting something that happens in one part of the company trigger the right updates and responses everywhere else as it occurs.
...Event Streaming with Kafka. Here, data is provided to other data products through streams in Kafka. And any data product can consume via Kafka from the high-quality data streams of other data products. As we can see, this idea of a data mesh is very similar to the idea of a Central Nervous System, where data is continuously flowing, being processed, analyzed, acted upon. Now, we must remember that the data mesh shown here is a LOGICAL view, not a physical one. [OUTRO] If you know Kafka, you know that the reality looks a bit different and...a bit better.
ksqlDB turns the data mesh into something you can query, while still having all the benefits of being decentralized
A self-serve platform can have multiple planes that each serve a different profile of users. The following example lists three different data platform planes:
Data infrastructure provisioning plane: supports the provisioning of the underlying infrastructure, required to run the components of a data product and the mesh of products. This includes provisioning of a distributed file storage, storage accounts, access control management system, the orchestration to run data products internal code, provisioning of a distributed query engine on a graph of data products, etc. I would expect that either other data platform planes or only advanced data product developers use this interface directly. This is a fairly low level data infrastructure lifecycle management plane.
Data product developer experience plane: this is the main interface that a typical data product developer uses. This interface abstracts many of the complexities of what it entails to support the workflow of a data product developer. It provides a higher level of abstraction than the 'provisioning plane'. It uses simple declarative interfaces to manage the lifecycle of a data product. It automatically implements the cross-cutting concerns that are defined as a set of standards and global conventions, applied to all data products and their interfaces.
Data mesh supervision plane: there are a set of capabilities that are best provided at the mesh level - a graph of connected data products - globally. While the implementation of each of these interfaces might rely on individual data products capabilities, it’s more convenient to provide these capabilities at the level of the mesh. For example, ability to discover data products for a particular use case, is best provided by search or browsing the mesh of data products; or correlating multiple data products to create a higher order insight, is best provided through execution of a data semantic query that can operate across multiple data products on the mesh.
In this final example, we can see again that there are lots of data streams within a data mesh. These data streams may span across systems, data centers, clouds, and so on. For the purpose of tracking data lineage, we ideally want to cover the full mesh, so we must follow the data. Event streaming is again a key technology to implement this in practice, because it lets you track data-in-motion all the way from its origins to intermediate and to the final destinations.
The rise of Event Streaming can be traced back to 2010, when Apache Kafka was created by the future Confluent founders in Silicon Valley. From there, Kafka began spreading throughout Silicon Valley and across the US West coast. [CLICK] Then, in 2014, Confluent was created with the goal to turn Kafka into an enterprise-ready software stack and cloud offering, after which the adoption of Kafka started to really accelerate. [CLICK] Fast forward to 2020, tens of thousands of companies across the world and across all kinds of industries are using Kafka for event streaming.
What I am telling my family and friends is: You are a Kafka user, whether you know it or not. When you use a smartphone, shop online, make a payment, read the news, listen to music, drive a car, book a flight—it’s very likely that this is powered by Kafka behind the scenes. Kafka is applied even to use cases that I personally would have never predicted, such as scientists using it for astrophysics research, where Kafka automatically coordinates globally distributed, large telescopes to record interstellar phenomena!
Know 5 stages and talking point for each one.
There’s a common pattern of how organizations adopt this technology.
First, there is initial awareness or a pilot, where an organization is getting to know the technology.
This is followed by the initial development of a basic event pipeline, and the delivery of at least one new business outcome - maybe provisioning a single source of truth for microservices, or offloading data from a mainframe.
The third stage involves incorporating and leveraging stream processing. In this stage, an organization is not only collecting and transporting data in real-time, but also processing it for added value.
The fourth stage is when an organization starts to build business-transforming contextual event-driven applications. This is a new category of applications - unique to event streaming - where real-time events can be combined with context to deliver powerful, profitable outcomes.
The last stage is when event streaming is pervasive and becomes the central nervous system of the enterprise.
Examples of this in the consumer world are Netflix and LinkedIn… and in the enterprise world are organizations like Capital One.
Confluent accelerates the trajectory of customer journeys to event streaming through its products, support, training, our partner ecosystem and technical account management and services. Let’s talk about you - Where do you see your team on this journey today? How about your LOBs? Your company as a whole? Let’s talk for a few minutes about how we can get you where you need to go.
What we build is a full, enterprise-ready platform to complete open source Apache Kafka.
On top of Kafka, we build a set of features to unleash developer productivity, including the ability to leverage Kafka in languages other than Java, a rich pre-built ecosystem including 100+ connectors so developers don’t have to spend time building connectors themselves, and enabling stream processing with the ease and familiarity of SQL.
Kafka can sometimes be complex and difficult to operate at scale… we make that easy through GUI-based management and monitoring, DevOps automation including with Kubernetes Operator, and enabling dynamic performance and elasticity in deploying Kafka.
Also, we offer a set of features many organizations consider as pre-requisites when deploying mission-critical apps on Kafka. These include security features that control who has access to what, the ability to investigate potential security incidents via audit logs, schema validation to ensure that only ‘clean’ data, and no ‘dirty’ data, enters Kafka, and features around resilience, so for example if your data center goes down, your customer-facing applications stay running.
We offer all of this with freedom of choice, meaning you can choose self-managed software that you can deploy anywhere, including on-premises, public cloud, private cloud, containers, or Kubernetes. Or you can choose our fully managed cloud service, available on all 3 major cloud providers.
And, importantly, underpinning all this is our committer-led expertise. We at Confluent have over X hours of experience with Kafka. We offer support, professional services, training, and a full partner ecosystem. Simply put, there is no other organization in the world better suited to be an enterprise partner, and no organization in the world that is more capable of ensuring your success. This means everything to the organizations we work with.