The Possibilities And Pitfalls of Writing
Your Own State Stores
Daan Gerits
Setting The Scene
KOR Financial
Regulatory Reporting
Event Driven Organization
Event Driven Foundation
Based on Kafka
Long retention (40 years!)
Rethink common practices
Common API approach
CQRS API Approach
Focus Of This Talk
What is a state store
Part of Kafka Streams
Embedded “cache”
Internal to the application
Local vs Global Statestores
Fault Tolerant through “changelog topics”
Queryable from outside through Interactive Query Service (IQ)
The Challenges
Challenge 1
Key Value Only
KEY
value
Challenge 1: Key Value Only
Fast!
GET, SCAN
RocksDB
In-Memory, overflow to disk
Tweak the default memory settings!
https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks
https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-and-filter-blocks
Be smart with keys!
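One way to read "be smart with keys": in an ordered key-value store, a composite key that starts with the attribute you want to look up turns a SCAN over a key prefix into a simple filter. The sketch below is only an illustration. `java.util.TreeMap` stands in for the ordered store (it is not RocksDB), and the `country|lastName|id` key layout and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixScanSketch {
    // Sorted map as a stand-in for an ordered key-value store (assumption, not RocksDB itself).
    private final TreeMap<String, String> store = new TreeMap<>();

    void put(String country, String lastName, String id, String value) {
        // Hypothetical composite key: the fields we want to scan by come first.
        store.put(country + "|" + lastName + "|" + id, value);
    }

    // SCAN: return all values whose key starts with the given prefix, in key order.
    List<String> scan(String prefix) {
        List<String> result = new ArrayList<>();
        // '\uffff' sorts after every character used in the keys, so it closes the prefix range.
        for (Map.Entry<String, String> e : store.subMap(prefix, prefix + '\uffff').entrySet()) {
            result.add(e.getValue());
        }
        return result;
    }
}
```

With this layout, `scan("BE|")` returns everything for one country and `scan("BE|Peeters|")` narrows it to one last name. A lookup by `id` alone would still need a full scan, which is the limitation the following slides address.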
Challenge 1: Key Value Only
But what if we had:
- Search capabilities
- Filter-like API
Custom Statestores!
Embedded databases:
- H2, HSQLDB, Derby
- Lucene
- NitriteDB -> https://github.com/nitrite/nitrite-java
Challenge 1: Key Value Only
NitriteDB
Document Store
Natural Filtering API
Supports Indexes
Cursor cursor = collection.find(
    // and clause
    and(
        // firstName == John
        eq("firstName", "John"),
        // elements of data array is less than 4
        elemMatch("data", lt("$", 4)),
        // elements of fruits list has one element matching orange
        elemMatch("fruits", regex("$", "orange")),
        // note field contains string 'quick'
        text("note", "quick")
    )
);
for (Document document : cursor) {
    // process the document
}
NitriteDB (NO2) keys are Integers and values are Documents
→ Use NO2 Documents as envelopes
→ No relation between NO2 keys and Kafka keys
→ But NO2 can search on value (doc.key == …)
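The envelope idea can be sketched without the Nitrite API. In this illustration a plain map from internal numeric ids to documents stands in for a document collection. The `key` and `value` field names and the class itself are assumptions, not the author's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EnvelopeSketch {
    // Stand-in for a document collection: internal id -> document (assumption, not the Nitrite API).
    private final Map<Long, Map<String, Object>> docs = new LinkedHashMap<>();
    private long nextId = 1;

    // Wrap the Kafka key and value in an envelope document.
    // The returned internal id has no relation to the Kafka key.
    long put(String kafkaKey, Object value) {
        // Upsert: drop any older envelope that carries the same Kafka key.
        docs.values().removeIf(d -> kafkaKey.equals(d.get("key")));
        Map<String, Object> envelope = new LinkedHashMap<>();
        envelope.put("key", kafkaKey);
        envelope.put("value", value);
        long id = nextId++;
        docs.put(id, envelope);
        return id;
    }

    // Look up by searching on the envelope's key field (doc.key == kafkaKey).
    Object get(String kafkaKey) {
        for (Map<String, Object> d : docs.values()) {
            if (kafkaKey.equals(d.get("key"))) {
                return d.get("value");
            }
        }
        return null;
    }
}
```

The linear search in `get` is where an index on the key field would matter in a real document store.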
What about fault-tolerance?
→ Changelog topic (compacted)
→ On application Start
Data on FS → Load from FS
Data not on FS → Restore from changelog
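The startup decision and the restore path can be sketched as follows. This is a simplified illustration: the changelog is modelled as an in-memory list, a `null` value is treated as a delete marker, and the class and method names are hypothetical. A real restore also has to track offsets, which is left out here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ChangelogRestoreSketch {
    // One changelog entry; a null value is treated as a delete marker (tombstone).
    static final class Entry {
        final String key;
        final String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Rebuild local state by replaying the changelog in order: the last write per key wins.
    static Map<String, String> restore(List<Entry> changelog) {
        Map<String, String> state = new HashMap<>();
        for (Entry e : changelog) {
            if (e.value == null) {
                state.remove(e.key);
            } else {
                state.put(e.key, e.value);
            }
        }
        return state;
    }

    // Startup decision: use the data on the file system if present, otherwise restore.
    static Map<String, String> open(Map<String, String> onDisk, List<Entry> changelog) {
        return (onDisk != null) ? onDisk : restore(changelog);
    }
}
```

Because only the latest value per key matters, a compacted topic is sufficient as the source for the replay.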
Integrate into Kafka Streams Topology
Topology topology = new Topology()
    // read commands from the movie-events topic
    .addSource("sourceProcessor",
        Serdes.String().deserializer(), eventSerde.deserializer(), "movie-events")
    // add a processor to manipulate the no2 store based on incoming events
    .addProcessor("commandHandler", MovieEventHandler::new, "sourceProcessor")
    // add the no2 statestore itself, using “code” as the key field,
    // and attach it to the processor registered above
    .addStateStore(
        DocumentStores.nitriteStore("movies", Serdes.String(), movieSerde, Movie.class, "code"),
        "commandHandler")
    // write out processing results back to the original movie-events topic
    .addSink("sinkProcessor", "movie-events",
        Serdes.String().serializer(), eventSerde.serializer(), "commandHandler");
Integrate into Kafka Streams Processors (processor API)
// get the statestore from the processor context
DocumentStore<String, Movie, ObjectFilter> store = context.getStateStore("movies");
// retrieve all movies which contain “Matrix” as part of the title
QueryCursor<Movie> movies = store.find(and(ObjectFilters.regex("title", ".*Matrix.*")));
Challenge 2
IQ Querying Limitations
Challenge 2: IQ Querying Limitations
Access data in statestores
Bring-your-own-api
IQv2
IQ Metadata API
Keeps track of which data is located at which instance
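What such metadata has to answer can be sketched as a two-step lookup: key to partition, then partition to the instance that currently hosts it. The sketch below is an illustration only. The class, its methods and the hash function are hypothetical placeholders and do not reproduce Kafka's partitioner or the IQ metadata API.

```java
import java.util.HashMap;
import java.util.Map;

class MetadataRoutingSketch {
    private final int partitions;
    // partition -> "host:port" of the instance currently holding that partition's store
    private final Map<Integer, String> owners = new HashMap<>();

    MetadataRoutingSketch(int partitions) {
        this.partitions = partitions;
    }

    void assign(int partition, String hostAndPort) {
        owners.put(partition, hostAndPort);
    }

    // Placeholder hash (assumption): it only needs to be identical on every instance.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Which instance should a key lookup be forwarded to?
    // Returns null when no owner is known, for example during a rebalance.
    String ownerOf(String key) {
        return owners.get(partitionFor(key));
    }
}
```

A key lookup can be routed to exactly one instance this way. A query on a non-key field such as `firstname` cannot, which is why it has to be sent to every instance.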
[Diagram sequence: a client sends GET people?firstname=john to instance A. A forwards the GET to the APIs of instances B … n, each of which returns its results ([...]); A then returns the combined results to the client.]
Challenge 2: IQ Querying Limitations
Skewed Data
Time Query(B) < Time Query(C)
A cannot return until all results are in
Shard Failures
What if B is down (or rebalancing)
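Both problems show up in a small scatter-gather sketch: the coordinator waits for every shard, so the slowest one determines the response time, and it needs a policy for shards that fail or do not answer in time. The code below is one possible illustration using `CompletableFuture`. The shard suppliers, the timeout and the decision to return partial results together with a list of failed shards are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class ScatterGatherSketch {
    // Outcome of a fan-out: merged rows plus the shards that failed or timed out.
    static final class Outcome {
        final List<String> rows = new ArrayList<>();
        final List<String> failedShards = new ArrayList<>();
    }

    static Outcome query(Map<String, Supplier<List<String>>> shards, long timeoutMillis) {
        // Scatter: start all shard queries concurrently, each with its own deadline.
        Map<String, CompletableFuture<List<String>>> pending = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<List<String>>> e : shards.entrySet()) {
            pending.put(e.getKey(),
                CompletableFuture.supplyAsync(e.getValue())
                    .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS));
        }
        // Gather: the total wait is bounded by the slowest shard (or by the timeout).
        Outcome outcome = new Outcome();
        for (Map.Entry<String, CompletableFuture<List<String>>> e : pending.entrySet()) {
            try {
                outcome.rows.addAll(e.getValue().join());
            } catch (RuntimeException ex) {
                // Shard is down, rebalancing, or too slow: record it instead of failing the request.
                outcome.failedShards.add(e.getKey());
            }
        }
        return outcome;
    }
}
```

Whether a partial result is acceptable is an application decision. Failing the whole request when `failedShards` is non-empty is the stricter alternative.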
[Diagram: instances B and C, each exposing its own API.]
[Diagram sequence: a client sends GET people?firstname=john&offset=50&size=25 to instance A, which queries instances B … n and then computes [A.results, B.results, …].sort().subset(50, 25).]
Challenge 2: IQ Querying Limitations
Paging
Get parts of the result
Requires sorting for consistent results
Does not scale well:
# records = (partitions * (offset + size))
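The formula follows from the fact that any partition could hold all of the first `offset + size` records in the global sort order, so each one has to return that many. With the request above (offset 50, size 25) and an assumed 3 partitions, the coordinator handles 3 * (50 + 25) = 225 records to return 25. A minimal sketch, with hypothetical class and method names and in-memory lists standing in for the partition results:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ScatterGatherPaging {
    // Each partition must return its first (offset + size) sorted records,
    // because any of them could end up in the requested page.
    static int recordsFetched(int partitions, int offset, int size) {
        return partitions * (offset + size);
    }

    // Merge the per-partition results (each already sorted), sort globally, then cut out the page.
    static List<String> page(List<List<String>> perPartition, int offset, int size) {
        List<String> merged = new ArrayList<>();
        for (List<String> part : perPartition) {
            merged.addAll(part.subList(0, Math.min(part.size(), offset + size)));
        }
        Collections.sort(merged);
        int from = Math.min(offset, merged.size());
        int to = Math.min(offset + size, merged.size());
        return new ArrayList<>(merged.subList(from, to));
    }
}
```

The cost grows with the offset, so deep pages get steadily more expensive even though the page size stays the same.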
Challenge 3
Cloud Native Issues
“Kubernetes is great for many things.
Storage is not one of them.”
Challenge 3: Cloud Native Issues
Rebalances will get you
→ require statestores to be built again from scratch
→ Persistent Volumes help, but are a pain
CICD flows trigger A LOT of rebalances
Conclusion
Are these challenges solvable?
YES
Did you solve all of them?
NO
Why?
Doing it right requires time and effort
Not our KOR (pun intended) business
Economical Trade-off
APIs can be hosted directly from statestores
→ Your Mileage May Vary
Alternatively
→ Use external storage (Elasticsearch, MongoDB, ArangoDB, …)
→ Prefer Connectors over writing directly to external storage
→ Beware of the additional external dependency
→ Flink StateFun (Stateful Functions)?
SHONO
daan@shono.io
Twitter: @daangerits
Github: @calmera

The Possibilities and Pitfalls of Writing Your Own State Stores with Daan Gerits

  • 1. The Possibilities And Pitfalls of Writing Your Own State Stores Daan Gerits
  • 2. Setting The Scene: KOR Financial; Regulatory Reporting; Event Driven Organization; Event Driven Foundation; Based on Kafka; Long retention (40 years!); Rethink common practices
  • 6. What is a state store: Part of Kafka Streams; Embedded “cache”; Internal to the application; Local vs Global Statestores; Fault Tolerant through “changelog topics”; Queryable from outside through Interactive Query Service (IQ)
  • 10. Challenge 1: Key Value Only. Fast! GET, SCAN; RocksDB; In-Memory, overflow to disk; Tweak the default memory settings! https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-and-filter-blocks Be smart with keys!
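One way to read the "Be smart with keys!" advice on slide 10: because a key-value store only offers GET and SCAN, the key layout decides which questions it can answer cheaply. The sketch below is a minimal illustration under stated assumptions. It uses a `TreeMap` as a stand-in for a sorted key space and is not the Kafka Streams or RocksDB API. The composite key layout `<owner>|<id>` and the class name `PrefixScanSketch` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stand-in for a sorted key-value store (assumption: NOT the Kafka Streams or RocksDB API).
final class PrefixScanSketch {
    private final NavigableMap<String, String> store = new TreeMap<>();

    void put(String key, String value) {
        store.put(key, value);
    }

    // GET: point lookup by the full key.
    String get(String key) {
        return store.get(key);
    }

    // SCAN: keys sharing a prefix are contiguous in sorted order,
    // so start at the prefix and stop at the first key that no longer matches.
    List<String> scanPrefix(String prefix) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> entry : store.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break;
            }
            values.add(entry.getValue());
        }
        return values;
    }
}
```

With keys such as `acme|t1` and `acme|t2`, `scanPrefix("acme|")` returns both values without touching unrelated keys. Anything that is not a prefix of the key still needs a full scan, which is the limitation the following slides address.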
  • 11. Challenge 1: Key Value Only. But what if we had: Search capabilities; Filter-like API
  • 12. Challenge 1: Key Value Only. But what if we had: Search capabilities; Filter-like API. Custom Statestores!
  • 13. Challenge 1: Key Value Only. But what if we had: Search capabilities; Filter-like API. Embedded databases: H2, HSQLDB, Derby; Lucene; NitriteDB -> https://github.com/nitrite/nitrite-java
  • 14. Challenge 1: Key Value Only. NitriteDB: Document Store; Natural Filtering API; Supports Indexes
        Cursor cursor = collection.find(
            // and clause
            and(
                // firstName == John
                eq("firstName", "John"),
                // elements of data array is less than 4
                elemMatch("data", lt("$", 4)),
                // elements of fruits list has one element matching orange
                elemMatch("fruits", regex("$", "orange")),
                // note field contains string 'quick'
                text("note", "quick")
            )
        );
        for (Document document : cursor) {
            // process the document
        }
  • 15. Challenge 1: Key Value Only. NO2 keys are Integers and values are Documents → Use NO2 Documents as envelopes → No relation between NO2 keys and Kafka keys → But NO2 can search on value (doc.key == …)
  • 16. Challenge 1: Key Value Only. What about fault-tolerance? → Changelog topic (compacted) → On application start: data on FS → load from FS; data not on FS → restore from changelog
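The startup decision on slide 16 can be sketched roughly as follows. This is a minimal illustration in plain Java, not the Kafka Streams restore mechanism: the class `RestoreSketch`, the `ChangelogRecord` type, and the in-memory maps standing in for "data on FS" and the changelog topic are all hypothetical. It assumes the usual compacted-topic convention that the latest record per key wins and a null value marks a delete.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "load from FS, else restore from changelog".
final class RestoreSketch {

    // One changelog entry; a null value is treated as a tombstone (delete).
    static final class ChangelogRecord {
        final String key;
        final String value;

        ChangelogRecord(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // localState stands in for data found on the file system (null = nothing there).
    static Map<String, String> open(Map<String, String> localState, List<ChangelogRecord> changelog) {
        if (localState != null) {
            // Data on FS: reuse it as-is.
            return new HashMap<>(localState);
        }
        // Data not on FS: rebuild by replaying the changelog in order.
        Map<String, String> restored = new HashMap<>();
        for (ChangelogRecord record : changelog) {
            if (record.value == null) {
                restored.remove(record.key);
            } else {
                restored.put(record.key, record.value);
            }
        }
        return restored;
    }
}
```

The replay path is the expensive one: its cost grows with the size of the changelog, which is why losing the local copy matters so much for large stores.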
  • 17. Challenge 1: Key Value Only. Integrate into Kafka Streams Topology
        Topology topology = new Topology()
            // read commands from the movie-events topic
            .addSource("sourceProcessor",
                Serdes.String().deserializer(), eventSerde.deserializer(), "movie-events")
            // add a processor to manipulate the no2 store based on incoming events
            .addProcessor("commandHandler", MovieEventHandler::new, "sourceProcessor")
            // add the no2 statestore itself, using "code" as the key field;
            // the processor name must match the one registered above
            .addStateStore(
                DocumentStores.nitriteStore("movies", Serdes.String(), movieSerde, Movie.class, "code"),
                "commandHandler")
            // write out processing results back to the original movie-events topic
            .addSink("sinkProcessor", "movie-events",
                Serdes.String().serializer(), eventSerde.serializer(), "commandHandler");
  • 18. Challenge 1: Key Value Only. Integrate into Kafka Streams Processors (processor API)
        // get the statestore from the processor context
        DocumentStore<String, Movie, ObjectFilter> store = context.getStateStore("movies");
        // retrieve all movies which contain "Matrix" as part of the title
        QueryCursor<Movie> movies = store.find(and(ObjectFilters.regex("title", ".*Matrix.*")));
  • 19.
  • 20. Challenge 2 IQ Querying Limitations
  • 21. Challenge 2: IQ Querying Limitations. Access data in statestores; Bring-your-own-api; IQv2; IQ Metadata API: keeps track of which data is located at which instance
  • 27. Challenge 2: IQ Querying Limitations. Skewed Data: Time Query(B) < Time Query(C); A cannot return until all results are in. Shard Failures: What if B is down (or rebalancing)?
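The two problems on slide 27 (a slow instance holding up the whole answer, and an instance that is down or rebalancing) can be illustrated with a scatter-gather sketch. This is one possible mitigation under stated assumptions, not something the talk prescribes: the coordinator waits a bounded time and reports which shards are missing rather than blocking. The class `ScatterGatherSketch`, its `Result` type, and the shard callables are hypothetical, and remote calls are replaced by local `Callable`s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical coordinator ("A") that fans a query out to shards ("B", "C", ...).
final class ScatterGatherSketch {

    static final class Result {
        final List<String> rows = new ArrayList<>();
        final List<String> missingShards = new ArrayList<>();
    }

    static Result query(Map<String, Callable<List<String>>> shards, long timeoutMillis) {
        List<String> names = new ArrayList<>(shards.keySet());
        List<Callable<List<String>>> calls = new ArrayList<>(shards.values());
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, calls.size()));
        Result result = new Result();
        try {
            // invokeAll returns once every call finished or the timeout hit;
            // calls still running at that point are cancelled.
            List<Future<List<String>>> futures =
                pool.invokeAll(calls, timeoutMillis, TimeUnit.MILLISECONDS);
            for (int i = 0; i < futures.size(); i++) {
                try {
                    result.rows.addAll(futures.get(i).get());
                } catch (CancellationException | ExecutionException e) {
                    // Too slow (skew) or failed (down, rebalancing): report instead of blocking.
                    result.missingShards.add(names.get(i));
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying shards", e);
        } finally {
            pool.shutdownNow();
        }
        return result;
    }
}
```

The trade-off is explicit: the caller gets an answer within the timeout, but has to decide what a partial result means for the API contract.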
  • 31. Challenge 2: IQ Querying Limitations. Paging: Get parts of the result; Requires sorting for consistent results; Does not scale well: # records = (partitions * (offset + size))
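The paging formula on slide 31 follows from the fact that no partition knows where its records fall in the global order, so each one has to return its first offset + size records. As a worked example with made-up numbers: 12 partitions, offset 1000 and page size 50 mean 12 * (1000 + 50) = 12,600 records are fetched to return 50. The sketch below illustrates that merge step; the class `PagingSketch` and its in-memory lists standing in for per-partition query results are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of paging across partitions that are each sorted locally.
final class PagingSketch {

    // Worst-case number of records pulled: partitions * (offset + size).
    static long recordsFetched(int partitions, int offset, int size) {
        return (long) partitions * ((long) offset + size);
    }

    static List<String> page(List<List<String>> sortedPartitions, int offset, int size) {
        List<String> candidates = new ArrayList<>();
        for (List<String> partition : sortedPartitions) {
            // Every partition contributes its first (offset + size) records,
            // because any of them could belong to the requested page.
            int limit = (int) Math.min((long) partition.size(), (long) offset + size);
            candidates.addAll(partition.subList(0, limit));
        }
        // A global sort gives consistent results, then skip and take.
        Collections.sort(candidates);
        int from = Math.min(offset, candidates.size());
        int to = Math.min(from + size, candidates.size());
        return new ArrayList<>(candidates.subList(from, to));
    }
}
```

The deeper the page, the larger the offset term, so the cost grows with both the partition count and how far the caller pages.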
  • 33. “Kubernetes is great for many things. Storage is not one of them.”
  • 34. Challenge 3: Cloud Native Issues. Rebalances will get you → require statestores to be built again from scratch → Persistent Volumes help, but are a pain
  • 35. Challenge 3: Cloud Native Issues. Rebalances will get you → require statestores to be built again from scratch → Persistent Volumes help, but are a pain. CICD flows trigger A LOT of rebalances
  • 39. Did you solve all of them?
  • 40. NO
  • 41. Conclusion. Why? Doing it right requires time and effort; Not our KOR (pun intended) business; Economical Trade-off
  • 42. Conclusion. APIs can be hosted directly from statestores → Your Mileage May Vary. Alternatively → Use external storage (Elasticsearch, MongoDB, ArangoDB, …) → Prefer Connectors over writing directly to external storage → Beware of the additional external dependency → Flink Statefun (stateful functions)?