With hundreds of millions of users, Twitter operates one of the world's largest real-time delivery systems, large enough and pervasive enough to exert noticeable "pressure" on the overall internet itself. At steady state, Twitter receives thousands of tweets a second that it needs to deliver to disks, in-memory timelines, email, and mobile devices. The name of the game for Twitter is "now", so those deliveries, which multiply according to the graph of who follows whom, need to occur in real-time. In this session, we will dive into both the "write path" and "read path" of Twitter to understand the architecture which supports those tweets, and also how Twitter serves them through one of the world's largest web sites.
animated slides available at http://www.youtube.com/watch?v=l55jFAGsgbs
7. what are the goals?
⇢ evolve from being solely a web stack
⇢ isolate responsibilities and concerns
⇢ site speed and reliability
⇢ developer innovation speed
9. pull vs. push
⇢ targeted, pull: twitter.com, home_timeline API
⇢ targeted, push: User / Site Streams, Mobile Push (SMS, etc.)
⇢ queried, pull: Search API
⇢ queried, push: Track / Follow Streams
13. [Architecture diagram. Components shown: Write API, Ingester, Fanout, Batch Compute, Timeline Cache, Push Compute, Search Cache, HTTP Push, Mobile Push, Timeline Service, Blender. Backing stores shown: Redis, Earlybird, Hadoop.]
14. insert
[Architecture diagram, with the Social Graph Service added next to Fanout]
⇢ keyed off “recipient”
⇢ pipelined 4k “destinations” at a time
⇢ replicated
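The batching described on this slide can be sketched as follows. This is a minimal illustration, not Twitter's implementation: the `timelines` dictionary is a hypothetical in-memory stand-in for the timeline cache, and `fan_out` and `batches` are names invented here. Only the batch size of 4k and the keying by recipient come from the slide.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

BATCH_SIZE = 4000  # "pipelined 4k destinations at a time", per the slide


def batches(items: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fan_out(tweet_id: int, follower_ids: Iterable[int],
            timelines: Dict[int, List[int]]) -> int:
    """Append tweet_id to every recipient's timeline, one batch at a time.

    `timelines` is a hypothetical stand-in for the cache, keyed by the
    recipient's user ID. Returns the number of batches issued.
    """
    issued = 0
    for chunk in batches(follower_ids, BATCH_SIZE):
        # In a real deployment each chunk would be one pipelined request.
        for recipient in chunk:
            timelines.setdefault(recipient, []).append(tweet_id)
        issued += 1
    return issued


if __name__ == "__main__":
    cache: Dict[int, List[int]] = {}
    print(fan_out(42, range(10_000), cache))  # 10,000 followers -> 3 batches
```

Batching bounds the size of each request, so one tweet from an account with many followers becomes a series of fixed-size writes rather than a single very large one.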
15. using redis
[Architecture diagram, Timeline Cache highlighted]
⇢ entry layout: Tweet ID (8 bytes), User ID (8 bytes), Bits (4 bytes)
⇢ native list structure
⇢ RPUSHX to only add to cached timelines
16. [Diagram: the Redis list for a timeline holds a sequence of (Tweet ID, User ID, Bits) entries]
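The 20-byte entry and the RPUSHX behavior can be sketched without a Redis server. The field widths come from the slide; the big-endian byte order, the key naming, and the helper names are assumptions made for illustration. The `rpushx` function below only mimics the documented semantics of the Redis command (append if the list exists, otherwise do nothing and return 0) on a plain dictionary.

```python
import struct
from typing import Dict, List, Tuple

# 8-byte tweet ID + 8-byte user ID + 4-byte bit field = 20 bytes per entry.
# Big-endian order is an assumption; the slide gives only the field sizes.
ENTRY = struct.Struct(">QQI")


def pack_entry(tweet_id: int, user_id: int, bits: int = 0) -> bytes:
    """Encode one timeline entry into its fixed-width form."""
    return ENTRY.pack(tweet_id, user_id, bits)


def unpack_entry(raw: bytes) -> Tuple[int, int, int]:
    """Decode a fixed-width entry back into (tweet_id, user_id, bits)."""
    return ENTRY.unpack(raw)


def rpushx(cache: Dict[str, List[bytes]], key: str, value: bytes) -> int:
    """Mimic Redis RPUSHX: append only if the list already exists.

    Returns the new list length, or 0 when the key is absent.
    """
    if key not in cache:
        return 0
    cache[key].append(value)
    return len(cache[key])


if __name__ == "__main__":
    cache = {"timeline:7": []}  # user 7's timeline is cached, user 8's is not
    entry = pack_entry(tweet_id=1001, user_id=55, bits=1)
    print(len(entry))                          # 20
    print(rpushx(cache, "timeline:7", entry))  # 1
    print(rpushx(cache, "timeline:8", entry))  # 0, nothing is created
```

Writing only to timelines that are already cached keeps memory for users who are actually reading; timelines that are not cached have to be rebuilt by some other path when next requested.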
19. [Architecture diagram repeated, with the search components now labeled Search Index and Earlybird]
20. blender
[Architecture diagram, Blender highlighted]
⇢ queries one replica of all indexes
⇢ merges & ranks results
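The scatter-gather pattern on this slide can be sketched as below. The data model is hypothetical: each index partition is a list of replicas, each replica is a list of `(score, tweet_id)` hits, and ranking is simply by score. How Blender and Earlybird actually score and select replicas is not stated in the slides.

```python
import heapq
import random
from itertools import islice
from typing import List, Sequence, Tuple

Hit = Tuple[float, int]  # (score, tweet_id); an assumed result shape


def query_replica(replica: Sequence[Hit], limit: int) -> List[Hit]:
    """Return one replica's top `limit` hits, best first."""
    return heapq.nlargest(limit, replica)


def blend(partitions: Sequence[Sequence[Sequence[Hit]]], limit: int) -> List[Hit]:
    """Query one replica of every partition, then merge and rank the results."""
    partials = [
        query_replica(random.choice(replicas), limit)  # any one replica will do
        for replicas in partitions
    ]
    # Each partial list is sorted best first, so a k-way merge keeps the order.
    merged = heapq.merge(*partials, reverse=True)
    return list(islice(merged, limit))


if __name__ == "__main__":
    partition_a = [[(0.9, 1), (0.2, 2)]] * 2   # two identical replicas
    partition_b = [[(0.7, 3), (0.95, 4)]] * 2
    print(blend([partition_a, partition_b], limit=3))
    # [(0.95, 4), (0.9, 1), (0.7, 3)]
```

Every partition must be consulted because a matching tweet can live in any of them, while a single replica per partition suffices because replicas hold the same data.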
24. http push / hosebird
⇢ maintains persistent connections with end clients
⇢ processes tweet & social graph events
⇢ event-based “router”
25. event propagation
[Diagram: Write API feeding Hosebird machines that serve Firehose, User Streams, and Track / Follow]
⇢ write API sends all events into hosebird; sees content creation events, social graph changes, etc.
⇢ different queues for public tweets, protected tweets, social events, etc.
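The per-type queues can be sketched as a small router. The event shape (a dictionary with `kind` and `protected` fields) and the queue names are assumptions for illustration; the slide states only that public tweets, protected tweets, and social events travel on different queues.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


def queue_name(event: dict) -> str:
    """Pick a queue from the event's kind and protection flag."""
    if event["kind"] == "tweet":
        return "protected_tweets" if event.get("protected") else "public_tweets"
    return "social_events"


class EventRouter:
    """Hypothetical in-memory stand-in for the queues fed by the write API."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[dict]] = defaultdict(deque)

    def publish(self, event: dict) -> str:
        """Enqueue the event and return the name of the queue it went to."""
        name = queue_name(event)
        self.queues[name].append(event)
        return name


if __name__ == "__main__":
    router = EventRouter()
    print(router.publish({"kind": "tweet", "id": 1}))                     # public_tweets
    print(router.publish({"kind": "tweet", "id": 2, "protected": True}))  # protected_tweets
    print(router.publish({"kind": "follow", "source": 7, "target": 9}))   # social_events
```

Separating the queues lets a consumer such as the firehose read only public tweets without filtering out everything else.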
26. event cascading
⇢ bandwidth management
⇢ simultaneous connection management (~1m long lived & open connections to this cluster)
27. firehose
⇢ edge machine simply outputs the public tweet queue
⇢ only allow a limited number of firehoses per hosebird box for bandwidth management
28. track / follow
⇢ simple query based on tweet content
⇢ keeps list of terms / users of interest
⇢ parses public tweets at the edge, and if term matches a token, or user is of interest, then route
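The matching rule above can be sketched as a predicate over a tweet's author and tokens. The tokenizer (a `\w+` regular expression with lowercasing) and the `Subscription` class are assumptions; the slides do not say how Hosebird tokenizes tweets.

```python
import re
from dataclasses import dataclass, field
from typing import Set

TOKEN = re.compile(r"\w+")  # assumed tokenizer: runs of word characters


@dataclass
class Subscription:
    """One client's list of tracked terms and followed user IDs."""
    terms: Set[str] = field(default_factory=set)      # stored lowercase
    user_ids: Set[int] = field(default_factory=set)

    def matches(self, author_id: int, text: str) -> bool:
        """True if the author is of interest or any token is a tracked term."""
        if author_id in self.user_ids:
            return True
        tokens = {t.lower() for t in TOKEN.findall(text)}
        return not tokens.isdisjoint(self.terms)


if __name__ == "__main__":
    sub = Subscription(terms={"redis"}, user_ids={12})
    print(sub.matches(99, "Timelines live in Redis lists"))  # True, term match
    print(sub.matches(12, "good morning"))                   # True, followed user
    print(sub.matches(99, "good morning"))                   # False, not routed
```

Matching on whole tokens rather than substrings means a tracked term such as "redis" does not fire on an unrelated longer word that happens to contain it.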
29. user streams
⇢ replicate home timeline experience
⇢ upon login, obtain “following” list
⇢ keep cached following list coherent by seeing social graph updates
⇢ route tweet if from a followed user
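The four steps above can be sketched as a per-connection object. The class name, the event names `follow` and `unfollow`, and the `delivered` list standing in for the open connection are all hypothetical.

```python
from typing import Iterable, List, Set


class UserStream:
    """Per-connection state for one logged-in user (illustrative only)."""

    def __init__(self, user_id: int, following: Iterable[int]) -> None:
        self.user_id = user_id
        self.following: Set[int] = set(following)  # obtained upon login
        self.delivered: List[int] = []             # stand-in for the socket

    def on_social_event(self, kind: str, source_id: int, target_id: int) -> None:
        """Keep the cached following list coherent with social graph updates."""
        if source_id != self.user_id:
            return  # someone else's follow or unfollow
        if kind == "follow":
            self.following.add(target_id)
        elif kind == "unfollow":
            self.following.discard(target_id)

    def on_tweet(self, author_id: int, tweet_id: int) -> bool:
        """Route the tweet to this connection if its author is followed."""
        if author_id in self.following:
            self.delivered.append(tweet_id)
            return True
        return False


if __name__ == "__main__":
    stream = UserStream(user_id=7, following=[1, 2])
    print(stream.on_tweet(author_id=3, tweet_id=100))  # False, 3 is not followed
    stream.on_social_event("follow", source_id=7, target_id=3)
    print(stream.on_tweet(author_id=3, tweet_id=101))  # True
    print(stream.delivered)                            # [101]
```

Updating the cached list from the event stream avoids asking the social graph for the full list on every tweet.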
33. [Architecture diagram repeated, now including the Social Graph Service]
39. [Architecture diagram annotated with three paths: Synchronous Path, Asynchronous Path, Query Path]
42. [Architecture diagram annotated with the Read Path and the Write Path]