There was a time not long ago when we used relational databases for everything. Even if the data wasn’t particularly relational, we shoehorned it into relational tables, often because that was the only database we had. Thankfully those dark times are over, and we now have many different kinds of NoSQL databases: document, real-time, graph, column. But that does not solve the problem that the same data might be a graph from one perspective and a collection of documents from another.
It would be really nice if we could access that same data in many different ways, depending on what we want to achieve in our current task.
As software architects this is not easy to solve, but it is definitely possible. We can design an architecture around event sourcing: capture changes with Debezium, publish them to a Kafka topic, remodel the data the way we like with Kafka Streams, and write the results to various data stores, keeping those stores in sync. A rough sketch of such a pipeline follows.
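Purely as a rough sketch of that pipeline, here is what the Kafka Streams leg could look like in Java, assuming a Debezium-style change topic named dbserver.public.person and a sink topic person-documents (both names, and the pass-through remodeling, are placeholders rather than the actual topology from the talk):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class CdcPipelineSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdc-remodel");        // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Read the raw change events that Debezium publishes (topic name is an assumption).
        builder.stream("dbserver.public.person", Consumed.with(Serdes.String(), Serdes.String()))
               // Remodel each change into the shape a target store wants; here it is just passed
               // through, real code would map relational rows to documents, graph nodes, etc.
               .mapValues(value -> value)
               .to("person-documents", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}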
8. • Service provider for professional and amateur team sports in the Netherlands & Belgium
• 10+ years old
• Managing personal data, planning competitions, assigning officials, supplying data feeds
9. • 2M+ players
• 6K+ clubs
• 40K matches a week
• Spiky but predictable load
10. Technology stack
• Oracle database
• Cluster of Java-based application servers
• Diverse set of clients
13. Challenge
• Move to a player-centric model instead of a club-centric one
• A few orders of magnitude more users and load
• Moving away from Oracle is not feasible in the short term
• Scaling Oracle is just too expensive, if at all possible
17. Kafka
• Persistent pub/sub message bus
• High throughput
• Subscribers can consume at their own speed
• Subscribers can request a ‘rewind’ and re-consume a topic (see the consumer sketch after this slide)
• Has some tricks (such as log compaction) to keep the data volume down
• Having both fast and slow consumers is not a problem
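A minimal sketch of the ‘rewind’ idea using the plain Java consumer API, under the assumption of a topic called person and a throwaway consumer group: after the first poll assigns partitions, the consumer seeks back to the beginning and re-reads the topic at its own pace.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RewindConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "rewind-demo");              // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("person"));                             // hypothetical topic name
            consumer.poll(Duration.ofSeconds(1));                              // join the group, get partitions assigned
            consumer.seekToBeginning(consumer.assignment());                   // the ‘rewind’: start over from offset 0

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
                }
            }
        }
    }
}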
24. Postgres (>= 9.4)
“Logical decoding is the process of extracting all persistent changes to a database's tables into a coherent, easy to understand format which can be interpreted without detailed knowledge of the database's internal state. In PostgreSQL, logical decoding is implemented by decoding the contents of the write-ahead log, which describe changes on a storage level, into an application-specific form such as a stream of tuples or SQL statements.”
— PostgreSQL 9.4 documentation, §46.2.1 “Logical Decoding” (via Jeff Klukas)
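Just to make the quoted mechanism concrete, a hedged JDBC sketch that creates a replication slot with the built-in test_decoding output plugin and peeks at the decoded changes; connection details are placeholders, the server must run with wal_level=logical, and Debezium uses its own logical decoding plugins rather than ad-hoc SQL like this.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LogicalDecodingPeek {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; needs a role with replication privileges.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/sports", "postgres", "secret");
             Statement stmt = conn.createStatement()) {

            // Create a slot using the built-in test_decoding plugin
            // (this fails if a slot with the same name already exists).
            stmt.execute("SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding')");

            // Peek at the decoded changes without consuming them from the slot.
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT lsn, xid, data FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL)")) {
                while (rs.next()) {
                    // e.g. table public.person: UPDATE: id[integer]:123 name[text]:'Alfredo' ...
                    System.out.println(rs.getString("data"));
                }
            }
        }
    }
}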
25. Debezium
• Red Hat
• Standardizes change data capture
• Uses the Kafka Connect API
• Pretty young: version 0.7.4
• Based on the ‘Bottled Water’ research project
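Debezium publishes change events as envelopes with before, after and op fields. A hedged sketch of reading them as plain JSON strings, assuming the JSON converter and a hypothetical topic name; real deployments often use Avro and the Connect schema machinery instead:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DebeziumEnvelopeSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "cdc-reader");               // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        ObjectMapper mapper = new ObjectMapper();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("dbserver.public.person"));             // hypothetical Debezium topic
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    if (record.value() == null) continue;                      // tombstone record
                    JsonNode root = mapper.readTree(record.value());
                    // With the JSON converter the envelope may be wrapped in a "payload" field.
                    JsonNode envelope = root.has("payload") ? root.get("payload") : root;
                    String op = envelope.path("op").asText();                  // "c" create, "u" update, "d" delete
                    JsonNode after = envelope.path("after");                   // row state after the change
                    System.out.println(op + " -> " + after);
                }
            }
        }
    }
}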
32. Kafka Streams at Scale
• ± 500M rows of SQL data
• ± 50 joins (a join sketch follows this slide)
• 500 topics
• 400 GB of Kafka data
• 300 GB of RocksDB data
• Building a complete replica from scratch takes many hours
• After that, <100 ms latency for changes
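A hedged sketch of the kind of remodeling join this slide refers to, assuming two CDC-derived changelog topics called players and clubs with string values (all names are made up); the join state is exactly what ends up in the local RocksDB stores mentioned above.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class PlayerClubJoinSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "player-club-join");    // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Each table is materialized from a changelog topic; its state lives in a local RocksDB store.
        KTable<String, String> players =
            builder.table("players", Consumed.with(Serdes.String(), Serdes.String()));
        KTable<String, String> clubs =
            builder.table("clubs", Consumed.with(Serdes.String(), Serdes.String()));

        // Toy join that assumes both tables share the same key; a real topology re-keys first
        // and chains dozens of such joins to assemble the player-centric view.
        players.join(clubs, (player, club) -> player + " @ " + club)
               .toStream()
               .to("players-with-clubs", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}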
33. Development cycle
• Developing and testing is hard for stateful code (see the test-driver sketch after this slide)
• Starting a new ‘generation’ is costly
• Contaminated data might show up
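One way to ease testing of stateful topologies, at least in recent Kafka versions, is the TopologyTestDriver from kafka-streams-test-utils, which runs a topology and its state stores in process without a broker; a minimal sketch with made-up topic names:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class TopologyTestSketch {
    public static void main(String[] args) {
        // A trivial topology standing in for the real (much larger, stateful) one.
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("person", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(v -> v.toUpperCase())
               .to("person-upper", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topology-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");       // never contacted by the test driver

        // The driver runs the topology, including its state stores, entirely in memory.
        try (TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props)) {
            TestInputTopic<String, String> in =
                driver.createInputTopic("person", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out =
                driver.createOutputTopic("person-upper", new StringDeserializer(), new StringDeserializer());

            in.pipeInput("123", "alfredo");
            System.out.println(out.readKeyValue());                            // KeyValue(123, ALFREDO)
        }
    }
}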
34. Conclusions
• Went into production last June
• Generally behaves well (aside from some glitches)
• Kafka Streams is in a much better state than a year ago
41. Event Driven Microservices
• Services push events instead of using a request/response model
• Usually backed by a publish/subscribe bus
42. [Diagram: an Application Service (SQL database plus application code) publishes change events to an Event Bus on topic PERSON, e.g. { id: 123, name: “Alfredo”, dob: 1965-5-1 }; an Analytics Service (analytics database plus its own code) subscribes to the same PERSON topic and receives an identical copy of the event.]
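A hedged sketch of the producing half of that picture: after committing a change to its own SQL database, the application service publishes the person event to the PERSON topic, and any interested service (such as the analytics service) consumes it on its own schedule instead of being called via request/response. Topic name and payload follow the diagram; everything else is a placeholder.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PersonEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The event from the diagram, keyed by the person id.
        String key = "123";
        String value = "{ \"id\": 123, \"name\": \"Alfredo\", \"dob\": \"1965-5-1\" }";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish the fact "person 123 changed"; subscribers react whenever they are ready.
            producer.send(new ProducerRecord<>("PERSON", key, value));
            producer.flush();
        }
    }
}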
47. Firebase Realtime
• Real-time database
• ‘Backend As a Service’
• Essentially one big JSON document
• Very easy to use client libraries for web and mobile
• Safe to develop
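As a rough illustration of the ‘one big JSON document’ model, a sketch using the Firebase Admin SDK for Java; the credentials file, database URL and path are placeholders, and the web and mobile client libraries mentioned above are the more common way in.

import java.io.FileInputStream;
import java.util.Map;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.firebase.FirebaseApp;
import com.google.firebase.FirebaseOptions;
import com.google.firebase.database.DatabaseReference;
import com.google.firebase.database.FirebaseDatabase;

public class FirebaseSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder credentials and database URL.
        FirebaseOptions options = FirebaseOptions.builder()
            .setCredentials(GoogleCredentials.fromStream(new FileInputStream("service-account.json")))
            .setDatabaseUrl("https://example-project.firebaseio.com")
            .build();
        FirebaseApp.initializeApp(options);

        // The whole database is one JSON tree; writing to a path creates or updates that subtree.
        DatabaseReference person = FirebaseDatabase.getInstance().getReference("persons/123");
        person.setValueAsync(Map.of("name", "Alfredo", "dob", "1965-5-1"));
    }
}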
48. Caches
• We can use our streaming engine to update or invalidate caches (a sketch follows)
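A minimal sketch of that idea, assuming a Caffeine cache keyed by person id and a changelog topic called person (both assumptions): every change event flowing past evicts the corresponding cache entry, so the next read repopulates it with fresh data.

import java.util.Properties;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;

public class CacheInvalidationSketch {
    public static void main(String[] args) {
        Cache<String, String> personCache = Caffeine.newBuilder().maximumSize(100_000).build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cache-invalidator");   // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("person", Consumed.with(Serdes.String(), Serdes.String()))
               // Every change event for a person evicts the stale cache entry for that key.
               .foreach((personId, change) -> personCache.invalidate(personId));

        new KafkaStreams(builder.build(), props).start();
    }
}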