Unless you have seen where today’s messaging queues fall short, it can be hard to see why Apache Kafka is more than one. By adding functionality, true storage, and guarantees, Kafka opens opportunities to take full advantage of a publish/subscribe model.
Joined by Yelp’s Justin Cunningham, we’ll see how Yelp’s infrastructure has quickly evolved. Powered by Kafka, Yelp has made the leap to microservices and is seeing gains in efficiency and performance.
Speakers:
Justin Cunningham
Technical Lead, Software Engineer, Yelp
Gehrig Kunz
Technical Product Marketing Manager, Confluent
How Yelp Leapt to Microservices with More than a Message Queue
1. 1Confidential
Messaging done right: How Yelp Leapt to Microservices with More than a Message Queue
Justin Cunningham, Technical Lead Software Engineering, Yelp
Gehrig Kunz, Technical Product Marketing, Confluent
2.
Streaming in Action Series
August 10th: Why VR needed Stream Processing to Survive
August 16th: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
You are here!
3.
Today’s agenda
How a streaming platform is ‘messaging done right’
• Review what messaging queues are and why they exist
• Gaps you might run into
• Building our dream messaging queue
How Yelp uses Kafka to move to microservices
• Transition to microservices
• Using Kafka for their data pipeline
• Benefits realized
4.
What is a message queue?
From Wikipedia: [Message queues] use a queue for messaging – the passing of control or of content. Group communication systems provide similar kinds of functionality.
7.
Why use a messaging queue?
• Decouple producers and consumers of data
• Greater/more predictable performance
• More flexible architecture
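The first bullet, decoupling producers from consumers, can be sketched with Python's standard-library `queue.Queue`. This is only an in-process illustration, not a message broker: `run_pipeline` and the sentinel object are hypothetical names chosen for this sketch. The point is that neither side calls the other directly; both only talk to the queue.

```python
import queue
import threading


def run_pipeline(items):
    """Pass items from a producer thread to a consumer thread via a queue."""
    q = queue.Queue(maxsize=2)   # small buffer: the producer blocks when it is full
    received = []
    done = object()              # sentinel marking the end of the stream

    def producer():
        for item in items:
            q.put(item)          # the producer only knows about the queue
        q.put(done)

    def consumer():
        while True:
            item = q.get()       # the consumer only knows about the queue
            if item is done:
                break
            received.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because the buffer absorbs short bursts, the producer and consumer can run at different speeds, which is one source of the more predictable performance the slide mentions.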
16.
Building our dream messaging queue
Publish/Subscribe Model
I want to ____________ .
have everyone in the company use this.
connect whatever I need.
survive failure scenarios.
17.
Our dream messaging queue
Publish/Subscribe Model
+ Scalability
I want to ____________ .
use Oracle, MySQL, MongoDB, Cassandra.
add search.
recover an entire database.
send some test data to a ML library.
do something new.
Rewind and replay
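One way to picture "rewind and replay" is an append-only log in which reading never deletes a message and each consumer keeps its own offset. The sketch below is a toy in-memory model under that assumption; `Log`, `Consumer`, `poll`, and `seek` are illustrative names here, not Kafka's client API.

```python
class Log:
    """An append-only list of messages; reading never removes anything."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1   # offset of the new message

    def read(self, offset, max_count=10):
        return self._messages[offset:offset + max_count]


class Consumer:
    """Tracks its own position, so consumers do not affect each other."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset   # rewind (or skip ahead) to any retained offset
```

A new use case, such as adding search or feeding test data to an ML library, would then be a new `Consumer` starting at offset 0, and recovering a database would be a `seek(0)` followed by repeated `poll()` calls.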
18.
Our dream messaging queue
Publish/Subscribe Model
+ Scalability
+ True Storage
I want to ____________ .
quickly build apps that use data.
use real-time data.
use accurate, real-time data.
not manage additional things.
19.
Messaging Queue to Streaming Platform
Publish/Subscribe Model
+ Scalability
+ True Storage
+ Stream Processing
Streaming Platform
20.
What a streaming platform enables
Access (and process) what you need
Be flexible for the future
Simplify your infrastructure
21.
Messaging Queue to Streaming Platform
Netflix: Uses Kafka to power their data pipeline, supporting a trillion messages a day.
Line: Uses Kafka’s stream processing to perform streaming ETL on millions of messages daily.
The New York Times: Kafka is the ‘source of truth’, storing every article since 1851.
Yelp: Let’s talk to Justin.
28.
86 Million is a Magic Number
I want to process all reviews every day.
I want to make 1,000 requests per second to your service, every second, forever.
Reasonable Becomes Unreasonable
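One reading of the "magic number", offered here as an interpretation rather than something the slide spells out, is simple arithmetic: a day has 86,400 seconds, so a sustained 1,000 requests per second is about 86 million requests per day. The function names below are hypothetical helpers for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds in a day


def requests_per_day(requests_per_second):
    """Total requests made in one day at a constant rate."""
    return requests_per_second * SECONDS_PER_DAY


def required_rate(items_per_day):
    """Sustained requests per second needed to touch every item once a day."""
    return items_per_day / SECONDS_PER_DAY


print(requests_per_day(1_000))     # 86400000
print(required_rate(86_400_000))   # 1000.0
```

Read this way, "process everything once a day" sounds reasonable, while the equivalent "1,000 requests per second, forever" does not, even though they are the same load.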
29.
• What if we implement a raw bulk-data API? We could pass it arbitrary SQL to generalize it.
• What if we take DB snapshots and pass them around?
Flags Prefs Category
33939 533248 37
Potential Solutions?
42.
[Figure: architecture diagram. Labels recovered in extraction order: Datapipe Producer, Bunsen, Scribe, Replication Handler, MySQL, Other Data Stores, Yelp-main, Services, MONK, DP, JSON, Schematizer, Kafka; Paastorm (Python, Flatmap); Flink* (Java/Scala, Advanced Primitives & Stream SQL); Recursive; MySQL, Services, Yelp-main, Redshift, S3, Flink, Kafka Connect, Cassandra, ES.]
Overall Data Infra
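The diagram lists "Flatmap" under Paastorm. As a rough illustration of what a flatmap over a message stream means in general, here is a minimal generator-based sketch. It does not use Paastorm's actual API, which is not shown in the slides; `flat_map`, `split_review`, and the message fields are hypothetical.

```python
def flat_map(fn, stream):
    """Apply fn to each message and flatten the results into one stream."""
    for message in stream:
        yield from fn(message)


def split_review(message):
    # Hypothetical transform: emit one record per word.
    # A message with empty text emits zero records.
    for word in message["text"].split():
        yield {"review_id": message["review_id"], "word": word.lower()}
```

Unlike a plain map, each input message may produce zero, one, or many output messages, which makes the same primitive usable for filtering, splitting, and enrichment.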
43.
How it’s helped Yelp
● More than $10 million in direct savings
● Eliminated many duplicative systems
● Higher quality data, metrics and analytics
● Faster, Better Decision Making
44.
A streaming platform can be messaging done right
• Decouple and modernize your infrastructure
• Reach company-wide scale
• Build streaming applications and data pipelines (like Yelp’s) with real-time data
45.
Streaming in Action Series
Up next – August 10th: Why VR needed Stream Processing to Survive
August 16th: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines