Recorded at the London Microservices Meetup: https://www.meetup.com/London-Microservices/
- Date: 14th of October 2020
- Video: https://youtu.be/Arzr0T0hrCw
- Event page: https://www.meetup.com/London-Microservices/events/273266418/
Follow us on Twitter! https://twitter.com/LondonMicrosvc
---
Building Event-Driven Microservices using Kafka Streams
Stathis Souris, ThousandEyes
Streaming is all the rage these days, but can business systems be built using stream processing?
We'll explore this question by looking at Streaming Microservices using Kafka Streams.
We'll also discuss some of the patterns that we currently use in real-life production microservices at ThousandEyes (part of Cisco) and things to avoid.
Key takeaways:
- Basic Kafka concepts
- Kafka Streams
- Discuss various event-driven services built using Kafka Streams
Stathis spent several years in Athens, Greece, as a Software Engineer before moving to London and joining ThousandEyes (now part of Cisco).
He enjoys working with large distributed systems using technologies such as Kafka, Elasticsearch, Java and Kotlin.
Copyright ©2020 ThousandEyes, Inc. All Rights Reserved. @ThousandEyes
Why Kafka
• Distributed, resilient, fault-tolerant architecture
• Horizontal scalability
• High performance (latency under 10 ms), i.e. real time
• Used by well-known companies
  - LinkedIn, Netflix, Airbnb, etc.
Apache Kafka: Use cases
• Messaging system
• Activity tracking
• Gathering metrics from different locations
• Application logs
• Stream processing (e.g. Kafka Streams or Spark)
• Decoupling of systems
• Works with Spark, Flink, Hadoop, etc.
What is Kafka Streams?
• Easy data processing and transformation library that is part of Kafka
• Runs as a standard Java application
• No need to create a separate cluster
• Highly scalable, elastic and fault tolerant (inherited from Kafka)
• Exactly-once capabilities
• One-record-at-a-time processing (no batching)
• Works for any application size
Kafka Streams history
• The API/library was introduced in Kafka 0.10 (2016)
• A serious contender to other processing frameworks such as Spark, Flink and NiFi
About the Endpoint Agent
• Agents that run on users' laptops or desktops
• Collect metrics from the customer's browser interactions
• Perform network tests (e.g. ping, pathtrace) against various targets
• Check in every 10 minutes
• Alerts & reports
Why event-driven microservices?
• Operate at large scale (100K agents)
• Complex logic that needs to run at scale
• As close to real time as possible
• Asynchronous communication
Why Kafka Streams?
- Inherits Kafka's properties (scalability, fault tolerance)
- Simple DSL for:
  - Aggregations
  - Windowing
  - Streams & Tables
  - <Key, Value> records
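The streams-and-tables idea behind the DSL can be illustrated without Kafka at all. The sketch below is a simplified model in plain Java, not the Kafka Streams API: a table is what you get by replaying a stream of <Key, Value> records and keeping only the latest value per key, with a null value acting as a delete (tombstone). The class and record names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the stream/table duality (NOT the Kafka Streams API).
final class StreamTableDuality {

    // One <key, value> record in a stream, e.g. an agent check-in.
    static final class KeyValue {
        final String key;
        final String value;

        KeyValue(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Replaying the stream and overwriting per key yields the table.
    // A null value is treated as a tombstone and deletes the key.
    static Map<String, String> toTable(List<KeyValue> stream) {
        Map<String, String> table = new LinkedHashMap<>();
        for (KeyValue record : stream) {
            if (record.value == null) {
                table.remove(record.key);
            } else {
                table.put(record.key, record.value);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<KeyValue> checkIns = List.of(
                new KeyValue("agent-1", "online"),
                new KeyValue("agent-2", "online"),
                new KeyValue("agent-1", "offline"));
        // Three records in the stream collapse to two rows in the table.
        System.out.println(toTable(checkIns)); // {agent-1=offline, agent-2=online}
    }
}
```

The same relationship is what lets a KTable be rebuilt from its changelog topic after a restart.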
Use case
Synthetic tests at an interval
Schedule tests on agents dynamically
Powerful visualization and filtering capabilities
Batch Job approach
• Agents check in every 10 minutes
• A batch job runs every 15 minutes to assign tests
• Pull state from various DBs
• Run business logic
• Save assignments
After stress testing:
- Latency increased as we added more agents
- Could only scale vertically, which was not an option at that point
Event Driven approach
• Stream of check-ins
• Use that stream to power the Scheduler
• Assign tasks on each check-in event
Event Driven approach
- Application scales with the number of Kafka partitions
- Join with GlobalKTables
- Run the business logic
- Save assignments in a KTable
Facts:
• All state lives in Kafka
• At-least-once delivery
• Assignments are materialized in MongoDB for:
  - Historical queries
  - Timeline of assignments
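The per-event flow described above can be modelled in a few lines. This is a hypothetical sketch in plain Java: the maps stand in for the GlobalKTables (agents, tests) and for the assignments KTable, and the "assign every test of the account" rule is a placeholder, not ThousandEyes' actual business logic. In Kafka Streams the lookups would be KStream#join calls against GlobalKTables, which are fully replicated to every instance, so they work whichever partition a check-in arrives on.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the check-in driven scheduler; plain maps stand in
// for the GlobalKTables and the assignments KTable.
final class CheckInScheduler {

    // Stand-ins for GlobalKTables: reference data replicated on every instance.
    private final Map<String, String> agentToAccount;
    private final Map<String, List<String>> accountToTests;

    // Stand-in for the assignments KTable, keyed by agent id.
    private final Map<String, List<String>> assignments = new HashMap<>();

    CheckInScheduler(Map<String, String> agentToAccount,
                     Map<String, List<String>> accountToTests) {
        this.agentToAccount = agentToAccount;
        this.accountToTests = accountToTests;
    }

    // Runs once per check-in event instead of in a periodic batch job.
    List<String> onCheckIn(String agentId) {
        // Join #1: agent -> account.
        String account = agentToAccount.get(agentId);
        // Join #2: account -> configured tests (empty if anything is missing).
        List<String> tests = account == null
                ? List.of()
                : accountToTests.getOrDefault(account, List.of());
        // Placeholder business logic: assign every test of the account.
        // Overwriting by key keeps redelivered check-ins harmless.
        assignments.put(agentId, tests);
        return tests;
    }

    List<String> assignmentFor(String agentId) {
        return assignments.getOrDefault(agentId, List.of());
    }

    public static void main(String[] args) {
        CheckInScheduler scheduler = new CheckInScheduler(
                Map.of("agent-1", "acme"),
                Map.of("acme", List.of("ping:example.com", "pathtrace:example.com")));
        scheduler.onCheckIn("agent-1");
        System.out.println(scheduler.assignmentFor("agent-1")); // [ping:example.com, pathtrace:example.com]
    }
}
```

Because delivery is at least once, the handler is written so that processing the same check-in twice leaves the same assignment in place.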
Interactive Queries
• Query the in-memory KTable for assignments directly
• Expose it through a REST API
• Very fast
• When the state store is temporarily unavailable, fall back to a MongoDB query
  - Enables zero-downtime deployments
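A hypothetical sketch of the read path behind the REST API, assuming a local store lookup that throws while the store is unavailable (in Kafka Streams this surfaces as InvalidStateStoreException, for example during a rebalance or while state is being restored after a deploy). Plain Java functions stand in for the state-store query and the MongoDB query; the class and exception names are made up for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical read path: local state store first, MongoDB as the fallback.
final class AssignmentQueryService {

    // Stand-in for Kafka Streams' InvalidStateStoreException.
    static final class StoreUnavailableException extends RuntimeException {
        StoreUnavailableException(String message) {
            super(message);
        }
    }

    private final Function<String, Optional<List<String>>> stateStore;
    private final Function<String, Optional<List<String>>> mongo;

    AssignmentQueryService(Function<String, Optional<List<String>>> stateStore,
                           Function<String, Optional<List<String>>> mongo) {
        this.stateStore = stateStore;
        this.mongo = mongo;
    }

    List<String> assignments(String agentId) {
        try {
            // Fast path: query the materialized KTable directly.
            return stateStore.apply(agentId).orElse(List.of());
        } catch (StoreUnavailableException e) {
            // Store is migrating or restoring (e.g. mid-deploy): serve from MongoDB.
            return mongo.apply(agentId).orElse(List.of());
        }
    }
}
```

Keeping the fallback inside the same call lets instances restart one by one without the API returning errors, which is the zero-downtime property listed above.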
Problem (after 20K agents):
- Updating the KTable on every event
- Hot partitions that took too long to process
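Hot partitions come from how keys map to partitions: every record with the same key lands on the same partition, so one busy key (or a repartition onto a coarse key) concentrates load on a single task. The model below is simplified: Kafka's default partitioner hashes the serialized key with murmur2, while this sketch uses String.hashCode only to stay dependency-free. The key names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified key -> partition model (String.hashCode instead of murmur2).
final class PartitionSkew {

    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) for negative hashes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Number of records each partition receives for the given keys.
    static int[] loadPerPartition(List<String> keys, int numPartitions) {
        int[] load = new int[numPartitions];
        for (String key : keys) {
            load[partitionFor(key, numPartitions)]++;
        }
        return load;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        // 1,000 agents, one check-in each: spread over the partitions.
        for (int i = 0; i < 1_000; i++) {
            keys.add("agent-" + i);
        }
        // 1,000 records repartitioned onto one coarse key: a single hot partition.
        for (int i = 0; i < 1_000; i++) {
            keys.add("account-42");
        }
        System.out.println(Arrays.toString(loadPerPartition(keys, 8)));
    }
}
```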
Use the KTable cache and reduce the commit interval of the application (StreamsConfig.COMMIT_INTERVAL_MS_CONFIG).
This was a temporary solution.
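The cache helps because Kafka Streams buffers KTable updates in memory and forwards them downstream only when the cache is flushed on commit (commit.interval.ms, i.e. StreamsConfig.COMMIT_INTERVAL_MS_CONFIG) or earlier when the cache is full (its size is set by cache.max.bytes.buffering). Repeated updates to the same key in between collapse into one record, so a longer commit interval de-duplicates more while a shorter one forwards results sooner. The class below is a simplified plain-Java model of that behaviour, with an entry count standing in for the byte limit; it is not the real cache implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the record cache in front of a KTable: updates are
// collapsed per key and forwarded only when the cache is flushed.
final class DedupCache {

    private final int maxEntries;                                    // stand-in for the byte limit
    private final Map<String, String> dirty = new LinkedHashMap<>(); // latest value per key
    private final List<String> downstream = new ArrayList<>();       // what the next processor sees

    DedupCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    void put(String key, String value) {
        dirty.put(key, value); // overwrite: older values for this key are never sent
        if (dirty.size() > maxEntries) {
            flush(); // cache pressure forces an early flush
        }
    }

    // Called on every commit (commit.interval.ms).
    void flush() {
        for (Map.Entry<String, String> entry : dirty.entrySet()) {
            downstream.add(entry.getKey() + "=" + entry.getValue());
        }
        dirty.clear();
    }

    List<String> downstream() {
        return downstream;
    }

    public static void main(String[] args) {
        DedupCache cache = new DedupCache(1_000);
        // Ten check-ins from the same agent within one commit interval...
        for (int i = 0; i < 10; i++) {
            cache.put("agent-1", "checkin-" + i);
        }
        cache.flush(); // ...become a single downstream record on commit.
        System.out.println(cache.downstream()); // [agent-1=checkin-9]
    }
}
```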
Long-term fix
Removed the repartitioning step and stored active check-ins in Redis instead
Browser Session Metrics
- Real User Monitoring events coupled with network tests
- No set interval
- The Alerter needs binned data
- Aggregate into one-minute windows and emit the aggregated metrics
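A simplified model of the one-minute binning, in plain Java rather than the Kafka Streams DSL (where it would be a windowedBy(...) aggregation over TimeWindows). Windows are aligned to the epoch, so a record with timestamp t falls into the window starting at t - (t mod 60,000 ms). Like a windowed aggregation without suppression, the model emits an updated result for every incoming event, which is the behaviour the alerting use case could not work with. The key format and the counting aggregate are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified one-minute tumbling-window count that, like a windowed KTable
// without suppress, emits an updated aggregate for every incoming event.
final class TumblingWindowCount {

    static final long WINDOW_MS = 60_000L;

    // Epoch-aligned windows: [start, start + WINDOW_MS).
    static long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, WINDOW_MS);
    }

    private final Map<String, Long> counts = new HashMap<>();
    private final List<String> emitted = new ArrayList<>();

    void onEvent(String key, long timestampMs) {
        String windowedKey = key + "@" + windowStart(timestampMs);
        long updated = counts.merge(windowedKey, 1L, Long::sum);
        emitted.add(windowedKey + "=" + updated); // one downstream update per event
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        TumblingWindowCount counts = new TumblingWindowCount();
        counts.onEvent("session-1", 1_000L);
        counts.onEvent("session-1", 59_999L);
        counts.onEvent("session-1", 60_000L);
        // Two updates for window [0s, 60s), then a new window starts.
        System.out.println(counts.emitted()); // [session-1@0=1, session-1@0=2, session-1@60000=1]
    }
}
```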
Window operator
Problem:
The alerting use case needs the aggregated event to be emitted once, at the end of the window, not on every update.
Suppress operator
Problem:
Windowed aggregates took too long to reach the Alerter.
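suppress(Suppressed.untilWindowCloses(...)) holds results back and emits one final value per window once stream time passes the window end plus the grace period. The sketch below models that rule for a single key in plain Java (not the Kafka Streams implementation); stream time is simply the highest record timestamp seen so far, and the 5-second grace period in the example is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified single-key model of suppress(untilWindowCloses(...)) on
// one-minute tumbling windows with a grace period.
final class SuppressedWindowCount {

    static final long WINDOW_MS = 60_000L;

    private final long graceMs;
    // Stream time: the highest record timestamp observed so far.
    private long streamTime = Long.MIN_VALUE;
    // Open windows (start -> count), ordered so the oldest closes first.
    private final TreeMap<Long, Long> open = new TreeMap<>();
    private final List<String> emitted = new ArrayList<>();

    SuppressedWindowCount(long graceMs) {
        this.graceMs = graceMs;
    }

    void onEvent(long timestampMs) {
        streamTime = Math.max(streamTime, timestampMs);
        long start = timestampMs - Math.floorMod(timestampMs, WINDOW_MS);
        if (start + WINDOW_MS + graceMs > streamTime) {
            open.merge(start, 1L, Long::sum); // window still accepts records
        } // otherwise the record is too late and is dropped
        emitClosedWindows();
    }

    // A window closes once stream time >= window end + grace.
    private void emitClosedWindows() {
        Iterator<Map.Entry<Long, Long>> it = open.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> window = it.next();
            if (window.getKey() + WINDOW_MS + graceMs > streamTime) {
                break; // this and all later windows are still open
            }
            emitted.add(window.getKey() + "=" + window.getValue()); // the one final result
            it.remove();
        }
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        SuppressedWindowCount counts = new SuppressedWindowCount(5_000L);
        counts.onEvent(1_000L);
        counts.onEvent(30_000L);
        counts.onEvent(61_000L); // window [0s, 60s) is still inside its grace period
        System.out.println(counts.emitted()); // []
        counts.onEvent(65_000L); // stream time reaches 60s + 5s: final result emitted
        System.out.println(counts.emitted()); // [0=2]
    }
}
```

If the event at 65 s never arrives, stream time stays at 61 s and the first window's result is never emitted, however much wall-clock time passes.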
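Why was the aggregation delayed?
Closing a window is driven by events: only incoming records advance stream time, so a window stays open until a record with a later timestamp arrives.
Solution:
Created a cron job that generates an event at every window close time plus the grace period, forcing the window to close.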
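A hypothetical helper for the cron-job workaround. It computes the next moment a one-minute window plus grace closes and builds one tick per input partition, since Kafka Streams tracks stream time per task and a tick written to one partition does not advance the others. The Tick type and method names are made up; an actual job would wait until the close time and then send each tick with a Kafka producer (for example via the ProducerRecord(topic, partition, timestamp, key, value) constructor), and the aggregation would need to recognise and ignore tick records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a cron job that forces windows to close.
final class WindowCloseTicker {

    // A synthetic record that exists only to advance stream time.
    static final class Tick {
        final int partition;
        final long timestampMs;

        Tick(int partition, long timestampMs) {
            this.partition = partition;
            this.timestampMs = timestampMs;
        }
    }

    // Epoch-aligned windows end at multiples of windowMs, so they close at
    // k * windowMs + graceMs. Returns the first close time strictly after nowMs.
    static long nextCloseTime(long nowMs, long windowMs, long graceMs) {
        long k = Math.floorDiv(nowMs - graceMs, windowMs) + 1;
        return k * windowMs + graceMs;
    }

    // One tick per partition, stamped with the close time it should trigger.
    static List<Tick> ticksAt(long closeTimeMs, int numPartitions) {
        List<Tick> ticks = new ArrayList<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            ticks.add(new Tick(partition, closeTimeMs));
        }
        return ticks;
    }

    public static void main(String[] args) {
        long now = 62_000L;
        long closeAt = nextCloseTime(now, 60_000L, 5_000L);
        System.out.println(closeAt); // 65000
        System.out.println(ticksAt(closeAt, 3).size()); // 3
    }
}
```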
Production issues
• Compaction wasn't working in some cases
• Avoid repartitioning onto hot keys
• Interactive queries misbehaved:
  - Incorrect metadata
  - Created a loop between services
Key Takeaways
- Use the KTable cache to de-duplicate events before sending them downstream. Use "commit.interval.ms" to your advantage.
- Avoid hot partition keys if possible, especially when you are going big.
- Make sure compaction works for your topics.
- If you don't really use RocksDB, disable it.
- Use a binary format from the beginning if you are going big.
- Kafka as a DB is possible, but don't overdo it.
- Small latencies at the processor level can add up once you have lag (100 ms × 10,000 = 1,000 s ≈ 16.7 min).
Q&A
Twitter: @efsouris
Blogpost: https://medium.com/thousandeyes-engineering/kafka-streams-in-the-endpoint-agent-670a098ae7a4