#ApacheKafkaTLV: Building distributed, fault-tolerant processing apps with Kafka Streams - use case
The second part of the #2 meetup, delivered by Anatoly Tichonov (Mentory). Hosted by WeWork Sarona TLV.
2. Kafka Streams: Stream-table duality
● Almost every stream processing application needs a database
● A stream of key-value messages can be considered a changelog of a virtual
two-column table
● Conversely, the changelog of that table can reproduce the same
original stream
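The duality can be sketched in plain Java (no Kafka dependency; the class and method names are illustrative, not the Kafka Streams API): replaying the stream of key-value updates into a map materializes the table, and reading the table's entries back out yields a changelog representing its current state.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of stream-table duality (not the Kafka Streams API).
public class StreamTableDuality {

    // Materialize a changelog stream into a two-column (key, value) table:
    // each record overwrites the previous value for its key.
    public static Map<String, Integer> toTable(List<Map.Entry<String, Integer>> stream) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : stream) {
            table.put(record.getKey(), record.getValue());
        }
        return table;
    }

    // Read the table back out as a changelog stream of its current state.
    public static List<Map.Entry<String, Integer>> toChangelog(Map<String, Integer> table) {
        return List.copyOf(table.entrySet());
    }
}
```

Replaying ("alice", 1), ("bob", 2), ("alice", 3) produces a table where "alice" maps to 3, and its changelog holds one record per key.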
3. Kafka Streams Concepts: KTable and KStream
● In a KStream, each data record represents a self-contained datum in an
unbounded dataset
("alice", 1) --> ("alice", 3): a sum for key "alice" returns 4
● In a KTable, each data record represents an update in a changelog stream
("alice", 1) --> ("alice", 3): a sum for key "alice" returns 3, because the later record overwrites the earlier one
● A KTable provides the ability to look up the current value of a data record by key,
which is what join operations rely on.
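The two semantics above can be contrasted in a minimal plain-Java sketch (names are illustrative, not the Kafka Streams API): a stream aggregation adds every record, while a table aggregation keeps only the latest value per key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative comparison of KStream vs KTable aggregation semantics.
public class SumSemantics {

    // KStream semantics: every record is an independent event, so sum adds them all.
    public static int streamSum(List<Map.Entry<String, Integer>> records, String key) {
        int sum = 0;
        for (Map.Entry<String, Integer> r : records) {
            if (r.getKey().equals(key)) sum += r.getValue();
        }
        return sum;
    }

    // KTable semantics: every record is an update, so only the latest value counts.
    public static int tableSum(List<Map.Entry<String, Integer>> records, String key) {
        Map<String, Integer> latest = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            latest.put(r.getKey(), r.getValue());
        }
        return latest.getOrDefault(key, 0);
    }
}
```

With the records ("alice", 1) and ("alice", 3), streamSum returns 4 and tableSum returns 3, matching the slide.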
4. Kafka Streams: Stateless transformations
● Both KStream and KTable are created from Kafka topics
● The following standard stateless transformations are available:
- Branch (split)
- Filter
- Map
- FlatMap
- GroupBy
- Peek, etc.
● All transformation methods can be chained together to compose a complex
processor topology
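As a rough analogue using plain java.util.stream (no Kafka dependency; the pipeline below is illustrative, not Kafka Streams DSL code), the same chaining style applies: each transformation returns a new stream that the next call consumes.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java analogue of chaining stateless transformations
// (filter -> map -> peek), mirroring how Kafka Streams DSL calls compose.
public class ChainedTransforms {
    public static List<Map.Entry<String, Integer>> pipeline(List<Map.Entry<String, Integer>> input) {
        return input.stream()
                .filter(r -> r.getValue() > 0)                                    // Filter: drop non-positive values
                .map(r -> Map.entry(r.getKey().toUpperCase(), r.getValue() * 2))  // Map: re-key and transform the value
                .peek(r -> System.out.println("record: " + r))                    // Peek: side effect only, record unchanged
                .collect(Collectors.toList());
    }
}
```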
5. Kafka Streams: Time
Kafka Streams supports the following notions of time:
● Event time: the point in time when an event or data record occurred, assigned
at the source
● Processing time: the point in time when the event or data record happens to
be processed by the stream processing application
● Ingestion time: the point in time when an event or data record is stored in a
topic partition by a Kafka broker.
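A small plain-Java illustration of the three notions (the class and field names are illustrative, not a Kafka class): a record can carry an event time set at the source and an ingestion time set by the broker, while processing time is simply the clock reading when the application handles the record.

```java
// Illustrative record carrying the three timestamp notions (not a Kafka class).
public class TimedRecord {
    final long eventTimeMs;     // set at the source, when the event occurred
    final long ingestionTimeMs; // set by the broker, when the record was appended to the partition

    TimedRecord(long eventTimeMs, long ingestionTimeMs) {
        this.eventTimeMs = eventTimeMs;
        this.ingestionTimeMs = ingestionTimeMs;
    }

    // Event-time lag observed at processing time: how late the application sees the event.
    long lagMs(long processingTimeMs) {
        return processingTimeMs - eventTimeMs;
    }
}
```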
6. Kafka Streams: Aggregation and Join
● An aggregation operation takes one input stream or table and yields a new
table by combining multiple input records into a single output record. Examples
of aggregations are computing counts or sums.
● The aggregation API comes with standard implementations such as count and reduce.
● A join operation merges two input streams and/or tables based on the keys of
their data records and yields a new stream/table
● Records can be joined only by key.
● The logic depends on the type of the sources, so KStream-to-KStream, KStream-to-KTable
and KTable-to-KTable joins are slightly different operations.
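The key-based lookup behind a KStream-to-KTable join can be sketched in plain Java (illustrative names, not the Kafka Streams API): each stream record is enriched with the table's current value for the same key, and with inner-join semantics a record with no matching key produces no output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative KStream-to-KTable join: enrich each stream record with the
// table's current value for the same key (inner-join semantics).
public class StreamTableJoin {
    public static List<String> join(List<Map.Entry<String, Integer>> stream,
                                    Map<String, String> table) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> r : stream) {
            String current = table.get(r.getKey());  // lookup is by key only
            if (current != null) {                   // no match, no output record
                out.add(r.getKey() + ":" + r.getValue() + ":" + current);
            }
        }
        return out;
    }
}
```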
10. Price alert feature: Classic
● Save user requests for price alerts to a database
● Pay attention to flight offers' add/update dates
● Create complicated database queries that join user
requests with new offers by date, source
and destination
● Use a scheduler for publishing price alerts