Lesfurest.com invited me to talk about the Kappa Architecture style during a BBL.
Kappa architecture is a style for real-time processing of large volumes of data, combining stream processing, storage, and serving layers into a single pipeline. It differs from the Lambda architecture, which uses separate batch and stream processing pipelines.
2. About Quicksign
QuickSign is the European leader in digital onboarding
for financial services. We provide KYC, OCR and
electronic signatures.
Our offer: customized workflows & OCR-driven
back-office interfaces to remove operational burdens
for high-volume B2C contracts.
Currently moving from a monolithic product to a BPMN-
orchestrated, Kappa/CQRS-ES-based microservices
platform
3. • Quicksign CTO since January 2014
• Before that, Tech Lead then CTO at ProxiAD
since 2009
• Co-organized Eclipse Day Paris in 2010
and 2011
• Designing enterprise architectures since 2005
• Passionate about tech since I got my first
Macintosh SE in 1987
Architect of the new Quicksign Kappa platform
Who am I? - Cédric VIDAL
@cedricvidal
4. Kappa Architecture
Kappa Architecture is a software architecture pattern.
Rather than using a relational DB like SQL or a key-value
store like Cassandra, the canonical data store in a Kappa
Architecture system is an append-only immutable log. From
the log, data is streamed through a computational system
and fed into auxiliary stores for serving.
http://www.kappa-architecture.com
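The definition above can be sketched in a few lines of plain Python (a conceptual stand-in, not Kafka's actual API): an append-only immutable log as the canonical store, from which a serving store is derived entirely by streaming over the log.

```python
# Minimal sketch of a Kappa canonical store: an append-only immutable log.
# Consumers read from an offset and can replay from any point to rebuild
# auxiliary serving stores.

class AppendOnlyLog:
    def __init__(self):
        self._entries = []  # entries are never mutated or deleted

    def append(self, event):
        self._entries.append(event)
        return len(self._entries) - 1  # offset of the new entry

    def read_from(self, offset):
        # Streaming read: yields every event from the given offset onwards
        yield from self._entries[offset:]

log = AppendOnlyLog()
log.append({"type": "DocumentUploaded", "case": 42})
log.append({"type": "CasePurged", "case": 42})

# An auxiliary "serving" store is derived entirely from the log:
serving_store = {}
for event in log.read_from(0):
    serving_store.setdefault(event["case"], []).append(event["type"])
```

Because the log is the source of truth, the serving store can be dropped and rebuilt at any time by replaying from offset 0, which is exactly what replaces the Lambda batch layer.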
5. Compared to Lambda Architecture ?
http://www.kappa-architecture.com
Kappa Architecture is a simplification of Lambda Architecture. A Kappa
Architecture system is like a Lambda Architecture system with the batch
processing system removed. To replace batch processing, data is simply fed
through the streaming system quickly.
6. LOG DATA STORES
An append-only immutable log store is the canonical store in a Kappa Architecture (or Lambda
Architecture) system. Some log databases:
• Apache Kafka
• DistributedLog
STREAMING COMPUTATION SYSTEMS
In Kappa Architecture, data is fed from the log store into a streaming computation system. Some
distributed streaming systems:
• Apache Samza
• Apache Storm
• Apache Spark
• Amazon Kinesis
• Kafka Streams
• Kafka KSQL (SQL DSL over Kafka Streams)
• Apache Flink
• Onyx
• Hazelcast Jet
http://www.kappa-architecture.com
Which middlewares?
7. • Analytics (predictive or not)
• Reporting
• Big data
• IoT
Known users
• Advertising (Criteo)
• ERDF Linky? (not sure)
Working with numbers, statistics…
… what about business logic?
Typical use cases and users
8. General observation
• Works for extremely massive big data use cases
• Should easily handle our CQRS-ES based workloads…
Technical details of interest
• Sharding by design: with topic partitions, one file descriptor per partition per topic
• RPC-like low latency: direct socket-to-socket connection when
producer and consumer are connected at the same time
• Durability
• Replayability
• Topic replication
Why Kafka ?
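The "sharding by design" point can be illustrated with Kafka-style key partitioning. This is a simplified sketch: Kafka's default partitioner hashes the key with murmur2, whereas here `zlib.crc32` is used as a deterministic stand-in. The property that matters is the same: all events for a given key land in the same partition, so they are consumed in order.

```python
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash of the key modulo the partition count:
    # every event for the same key maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All events for aggregate "case-42" go to one partition, preserving order:
assert partition_for("case-42") == partition_for("case-42")
```

This per-key ordering guarantee is what later makes Kafka a plausible backbone for CQRS-ES, where all events of a given aggregation root must be handled sequentially.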
9. Technical details of interest
• No central topology orchestrator
• No need to submit specific stream processors
• a simple static void main which joins the Kafka Streams cluster
• Very natural with the stateless Kubernetes deployment model
• Embedded ephemeral disk-based KV store to persist intermediate
stream states and query stores (Facebook’s RocksDB)
Why Kafka Streams ?
10. A SQL DSL over Kafka Streams
Generates Kafka Streams topologies
Maintains intermediate state and topics automatically
Gotchas
Very new! First commit in Sept 2017
But a lot of potential !
A lot of activity in the repos
Already released versions
KSQL
Sept 2017
11. • Data ingestion
• CDC: Change Data Capture
Plus
• Lets you quickly bootstrap a data ingestion project
Gotchas
• Unlike Kafka Streams, requires submitting the Connect descriptor to the
orchestrator
• The data is RAW and WILL require some post-processing (using Kafka
Streams or KSQL) to make it usable
Kafka Connect
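For context, a Connect descriptor is a small JSON document submitted to the Connect worker's REST API. The example below is hypothetical (connector name, database URL and column names are made up for illustration); it sketches a JDBC source connector that captures rows into a raw topic, which would then need the post-processing mentioned above.

```json
{
  "name": "cases-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:postgresql://db:5432/cases",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "raw-"
  }
}
```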
12. Business logic requires
• ordering
• consistency
amongst other things
=> CQRS (ES)
What about business logic and µS architectures?
14. Command Query Responsibility Segregation (Event Sourcing)
• Concepts and examples
– The user sends a Command to the application -> (PurgeCaseCommand,
UpdateCaseEtatEnvoiFtpCommand, …)
– The Command is evaluated by a CommandHandler (permissions,
business rules, cohesion, …) which yields (or not) one or many data
mutation Events (CasePurgedEvent, DocumentUploadedEvent, …)
– Each Event is applied by an EventHandler to the AggregationRoot, which
is persisted into an auxiliary store for querying
CQRS (ES)
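The Command → CommandHandler → Events → EventHandler flow described above can be sketched in plain Python (no framework; the `PurgeCaseCommand` and `CasePurgedEvent` names come from the slides, the rest is illustrative):

```python
import dataclasses

@dataclasses.dataclass
class PurgeCaseCommand:
    case_id: int

@dataclasses.dataclass
class CasePurgedEvent:
    case_id: int

def command_handler(state, command):
    # Evaluates the command (permissions, business rules, ...) and yields
    # zero or more data-mutation events.
    if isinstance(command, PurgeCaseCommand):
        if state.get(command.case_id, {}).get("purged"):
            return []  # business rule: a purged case cannot be purged again
        return [CasePurgedEvent(command.case_id)]
    return []

def event_handler(state, event):
    # Applies the event to the aggregation root, persisted into an
    # auxiliary store for querying.
    if isinstance(event, CasePurgedEvent):
        state.setdefault(event.case_id, {})["purged"] = True
    return state

store = {}  # auxiliary query store (stand-in for a real database)
for event in command_handler(store, PurgeCaseCommand(case_id=7)):
    store = event_handler(store, event)
```

Note the separation: the command handler decides *whether* something happens, the event handler only records *that* it happened; replaying the events alone rebuilds the query store.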
16. The quest for CQRS-ES on Kafka
Quest for CQRS-ES on Kafka
• Jan 14 2016: Kafka a perfect fit for Axon? Me on the Axon Google group
http://bit.ly/axon-kafka-fit
• Sept 7 2016: Event sourcing, CQRS, stream processing and Apache Kafka:
What’s the connection? , Neha Narkhede, Confluent CTO
http://bit.ly/kafka-cqrs-es
• Devoxx 2017: Hands-on Kafka Streams, Hayssam Saleh, ebiznext Tech Lead
http://bit.ly/devoxx17-handson-kafka-streams
17. • all events concerning a given aggregation root must be handled
sequentially (actually a strength as it avoids all concurrency
headaches)
• requires a solid distributed event bus at the center of the
infrastructure
• must support guaranteed delivery and cluster-wide sequential
processing of messages of a given message group (i.e. the
aggregation root)
• at-most-once delivery or, better, idempotent consumers
http://bit.ly/axon-kafka-fit
CQRS-ES Challenges
18. How the F*** do I query my data?
Take 1: Model application state as an external datastore
The output from a Kafka Streams topology is written
to an external datastore like a relational
database.
In this view of the world, the event handler is
modelled as a Kafka Streams topology and the
application state is modelled as an external
datastore that the user trusts and operates.
This option for doing CQRS advocates the use of
Kafka Streams to model just the event handler,
leaving the application state to live in an external
data store that is the final output of the Kafka
Streams topology.
Event sourcing, CQRS, stream processing and Apache Kafka: What’s the connection? , Neha Narkhede, Confluent CTO
http://bit.ly/kafka-cqrs-es
19. Kafka Streams also provides an efficient way to
model the application state — it supports local,
partitioned and durable state out-of-the-box.
This local state can be a RocksDB store, or simply,
an in-memory hashmap.
The way this works is that every instance of an
application which embeds the Kafka Streams library
to do stateful stream processing, hosts a subset of
the application’s state, modeled as shards or
partitions of the state store.
The state store is partitioned the same way as the
application’s key space. As a result, all the data
required to serve the queries that arrive at a
particular application instance are available locally
in the state store shards.
How the F*** do I query my data?
Take 2: Model application state as local state in Kafka Streams
Fault tolerance for this local state store is provided by Kafka Streams by logging all updates
made to the state store, transparently, to a highly-available and durable Kafka topic. So if an
application instance dies and the local state store shards it hosted are lost, Kafka Streams can
recreate state store shards by simply reading from the highly-available Kafka topic and refilling
the data in the state store.
Event sourcing, CQRS, stream processing and Apache Kafka: What’s the connection? , Neha Narkhede, Confluent CTO
http://bit.ly/kafka-cqrs-es
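The changelog-based fault tolerance described above can be sketched in a few lines (a simplification: a list stands in for the durable Kafka changelog topic, a dict for the RocksDB store):

```python
# Kafka Streams-style fault tolerance for local state, simplified:
# every update to the local store is also appended to a changelog;
# after a crash, the store is rebuilt by replaying the changelog.

changelog = []     # stand-in for the durable, highly-available Kafka topic
local_store = {}   # stand-in for the RocksDB / in-memory state store

def put(key, value):
    changelog.append((key, value))  # logged for durability
    local_store[key] = value        # then applied locally

put("case-7", {"status": "uploaded"})
put("case-7", {"status": "purged"})

# Instance dies: the local state store shard is lost...
local_store = {}
# ...and is recreated by replaying the changelog from the beginning:
for key, value in changelog:
    local_store[key] = value
```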
20. How the F*** do I query my data?
Interactive Queries
● state in Kafka Streams is sharded
● we don’t control on which nodes the data is hosted
● Kafka Streams exposes the information
● Try a random node; if the data is available there, read it and return it
● otherwise, forward the request to the appropriate node and return its response
● The forwarding mechanism is up to the application
○ REST server side forwarding
○ REST client side forwarding (using some error code)
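The routing logic behind interactive queries can be sketched as follows. Assumptions: a static partition-to-node assignment (Kafka Streams exposes this metadata dynamically), crc32 as a stand-in hash, and plain function calls standing in for server-side REST forwarding.

```python
import zlib

NUM_PARTITIONS = 4
# Which node hosts which partition of the key space (static here for the sketch):
OWNERS = {0: "node-a", 1: "node-a", 2: "node-b", 3: "node-b"}
STORES = {"node-a": {}, "node-b": {}}  # each node's local state store shards

def partition_for(key):
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def put(key, value):
    node = OWNERS[partition_for(key)]
    STORES[node][key] = value

def query(local_node, key):
    owner = OWNERS[partition_for(key)]
    if owner == local_node:
        return STORES[local_node][key]  # served from the local shard
    return query(owner, key)            # server-side forwarding to the owner

put("case-7", "purged")
# Whichever node receives the request, the answer is the same:
assert query("node-a", "case-7") == query("node-b", "case-7") == "purged"
```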
21. Event Storming
How to design your domain with Kappa/CQRS-ES ?
Event storming is a workshop-based method to quickly find out
what is happening in the domain of a software program.
The business process is "stormed out" as a series of domain
events, which are denoted as orange sticky notes.
Invented by Alberto Brandolini in the context of domain-driven
design.
Event Storming can be used as a means for business process
modelling and requirements engineering.
The basic idea is to bring together software developers and
domain experts and learn from each other.
https://en.wikipedia.org/wiki/Event_storming