Online Media Data Stream Processing with Kafka
1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
2. 18. September 2012
• What is Streaming Data?
• Why Kafka?
• Kafka Architecture
• Use Case: Prospective Search
Overview
3.
• Spin-off of MeMo News AG, the
leading provider for Social Media
Monitoring & Analytics in Switzerland
• Big Data expert, focused on Hadoop,
HBase and Solr
• Objective: Transforming data into
insights
About Sentric
5.
• Website Activity Data
• User activity
• Server activity
• Social Media Data
• News Data
• …
• How to Analyze in Real-Time?
What is Streaming Data?
Data Streams
6.
[Figure: timeline t with a "now" marker, contrasting Offline (Hadoop/MR) with Online (Kafka) processing]
What is Streaming Data?
Offline vs. Online
8.
• Message Queues (RabbitMQ, ActiveMQ)
• do not scale / have no persistence
• Flume / Scribe
• Log-Aggregation only, high throughput and
scalable, push model
• Focus on offline consumption
• Kafka
• High throughput and scalable, pull model
• Different consumption profiles
Why Kafka?
Streaming Systems
9.
Source: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
Why Kafka?
Consumer Performance
11.
• Messaging System
• Publish-Subscribe
• Persistent
• High-Throughput
Kafka Architecture
Key Concepts
12.
[Figure: several Producers push messages to a Broker, several Consumers pull messages from it; ZooKeeper shown alongside]
Kafka Architecture
Messaging
13.
[Figure: Topics (logs, …, page-views), each carrying messages (Msg) to its Consumers]
Kafka Architecture
Publish-Subscribe
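The publish-subscribe idea on this slide can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka's API: `MiniBroker`, `Subscriber`, and the sample messages are hypothetical, and positions are counted in messages rather than bytes to keep the example short. It shows producers pushing to named topics while each consumer pulls at its own pace and sees every message of the topics it reads.

```python
from collections import defaultdict


class MiniBroker:
    """Toy in-memory broker: one append-only list of messages per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        # Producers push: the message is appended to the end of the topic.
        self.topics[topic].append(message)

    def fetch(self, topic, position, max_messages=10):
        # Consumers pull: they name the position they want to read from.
        return self.topics[topic][position:position + max_messages]


class Subscriber:
    """Each subscriber remembers its own read position per topic."""

    def __init__(self, broker):
        self.broker = broker
        self.positions = defaultdict(int)

    def poll(self, topic):
        batch = self.broker.fetch(topic, self.positions[topic])
        self.positions[topic] += len(batch)
        return batch


broker = MiniBroker()
broker.publish("logs", "GET /index.html 200")
broker.publish("page-views", "user=42 page=/home")
broker.publish("page-views", "user=7 page=/news")

# Two independent subscribers; reading by one does not consume for the other.
alerts = Subscriber(broker)
stats = Subscriber(broker)

alerts_views = alerts.poll("page-views")
stats_views = stats.poll("page-views")
stats_logs = stats.poll("logs")
```

Because the read position lives in the subscriber, adding another consumer needs no change on the broker side.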
14.
• Persists messages to disk
• Topic is base abstraction
• Binary write ahead log
• No message ID
• Message offset ID (byte position)
• Messages are retained for a specific time
• Default is 7 days
Kafka Architecture
Persistent
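A minimal sketch of "offset as byte position", assuming a simple length-prefixed framing that is chosen here for illustration and is not Kafka's on-disk format. `ByteLog` and `is_expired` are hypothetical names. The point is that a message needs no separate ID: the position where it starts in the log identifies it, and reading it yields the position of the next message.

```python
import struct

RETENTION_SECONDS = 7 * 24 * 60 * 60  # the 7-day default named on the slide


class ByteLog:
    """Toy append-only log; a message's ID is its starting byte position."""

    HEADER = struct.Struct(">I")  # assumed framing: 4-byte big-endian length

    def __init__(self):
        self.data = bytearray()

    def append(self, payload: bytes) -> int:
        offset = len(self.data)  # byte position where this message starts
        self.data += self.HEADER.pack(len(payload)) + payload
        return offset

    def read(self, offset: int):
        """Return (payload, next_offset) for the message starting at offset."""
        (length,) = self.HEADER.unpack_from(self.data, offset)
        start = offset + self.HEADER.size
        return bytes(self.data[start:start + length]), start + length


def is_expired(written_at: float, now: float) -> bool:
    # Time-based retention: age decides deletion, not whether anyone read it.
    return now - written_at > RETENTION_SECONDS


log = ByteLog()
first = log.append(b"hello")   # starts at byte 0
second = log.append(b"kafka")  # starts at byte 9 (4-byte header + 5 bytes)
payload, next_offset = log.read(first)
```

Here `next_offset` equals `second`, so a reader can walk the log message by message without any index.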
15.
• API Simplicity
• Append message
• Fetch message from given byte position
• Batching
• Stateless Broker
• O(1) disk access (no seeks)
• Use of operating system features
Kafka Architecture
High-Throughput
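The two API calls and the stateless broker can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper names `append_batch`, `fetch`, and `decode`, the 4-byte length prefix, and the in-memory `bytearray` standing in for the log file. The broker-side `fetch` keeps no per-consumer state; the consumer passes in its byte position and advances it by the size of the chunk it received.

```python
import struct

HEADER = struct.Struct(">I")  # assumed framing: 4-byte length prefix per message


def append_batch(log: bytearray, payloads) -> None:
    # Batching on the write path: many messages, one append to the log.
    log.extend(b"".join(HEADER.pack(len(p)) + p for p in payloads))


def fetch(log: bytearray, offset: int, max_bytes: int) -> bytes:
    """Stateless fetch: whole messages starting at offset, up to max_bytes."""
    end = offset
    while end + HEADER.size <= len(log):
        (length,) = HEADER.unpack_from(log, end)
        next_end = end + HEADER.size + length
        if next_end - offset > max_bytes or next_end > len(log):
            break
        end = next_end
    return bytes(log[offset:end])


def decode(chunk: bytes):
    """Split a fetched chunk back into individual payloads."""
    messages, pos = [], 0
    while pos < len(chunk):
        (length,) = HEADER.unpack_from(chunk, pos)
        pos += HEADER.size
        messages.append(chunk[pos:pos + length])
        pos += length
    return messages


log = bytearray()
append_batch(log, [b"a", b"bb", b"ccc"])

offset = 0                           # kept by the consumer, not by the broker
chunk = fetch(log, offset, max_bytes=11)
batch = decode(chunk)                # first two messages fit into 11 bytes
offset += len(chunk)                 # the consumer advances its own position
rest = decode(fetch(log, offset, max_bytes=1024))
```

A consumer that wants to re-read data simply passes an older offset again; nothing on the broker side has to be reset.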
16. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
17.
[Figure: solution architecture with n News Agents, Kafka, REST, RT Alerts, Web-UI, HBase, MySQL, Solr]
Icons by http://dryicons.com
Prospective Search
Solution Architecture
18.
[Figure: Kafka Consumer, Pull (Batch), Processing, Prospective Search, RT Alerts]
Icons by http://dryicons.com
Prospective Search
Prospective Search with Kafka
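Prospective search inverts ordinary search: the queries are stored first and every incoming document is matched against them. The sketch below assumes the simplest possible matching rule (all query terms must occur in the document) and uses hypothetical names (`ProspectiveIndex`, `process_batch`) and made-up sample news items. It is not the matching engine used in the talk, only an illustration of how a batch pulled by a Kafka consumer could be turned into real-time alerts.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"\w+", text.lower()))


class ProspectiveIndex:
    """Stores queries up front; each incoming document is matched against them."""

    def __init__(self):
        self.queries = {}  # query id -> set of required terms

    def register(self, query_id, terms):
        self.queries[query_id] = {t.lower() for t in terms}

    def match(self, document):
        words = tokenize(document)
        # A query matches when all of its terms occur in the document.
        return sorted(qid for qid, terms in self.queries.items() if terms <= words)


def process_batch(index, batch):
    """Turn one pulled batch of news items into (query_id, document) alerts."""
    return [(qid, doc) for doc in batch for qid in index.match(doc)]


index = ProspectiveIndex()
index.register("q-kafka", ["kafka"])
index.register("q-swiss-data", ["swiss", "data"])

# Made-up sample batch standing in for messages pulled from a topic.
batch = [
    "New Kafka release announced",
    "Swiss Big Data User Group meets in Zurich",
    "Weather update",
]
alerts = process_batch(index, batch)
```

Each alert pairs a stored query with the document that triggered it; documents that match no query produce no alert.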
19.
• http://incubator.apache.org/kafka/
• http://sites.computer.org/debull/A12june/A12JUN-CD.pdf
Resources to get started
20.
Questions?
Christian Gügi, christian.guegi@sentric.ch
Swiss Big Data User Group
Thank you!