With hundreds of millions of users, Twitter operates one of the world's largest real-time delivery systems, large and pervasive enough to exert noticeable "pressure" on the internet as a whole. At steady state, Twitter receives thousands of tweets a second that it needs to deliver to disks, in-memory timelines, email, and mobile devices. The name of the game for Twitter is "now", so those deliveries, which multiply according to the graph of who follows whom, need to occur in real time. In this session, we will dive into both the "write path" and the "read path" of Twitter to understand the architecture that supports those tweets, and how Twitter serves them through one of the world's largest websites.
Animated slides available at http://www.youtube.com/watch?v=l55jFAGsgbs
7. what are the goals?
• evolve from being solely a web stack
• isolate responsibilities and concerns
• site speed and reliability
• developer innovation speed
9.            Pull                             Push
   Targeted   twitter.com, home_timeline API   User / Site Streams, Mobile Push (SMS, etc.)
   Queried    Search API                       Track / Follow Streams
13. [Diagram: system architecture. Labeled components: Write API, Ingester, Fanout, Timeline Cache, Search Cache, Redis, Earlybird, Batch Compute, Hadoop, Push Compute, HTTP Push, Mobile Push, Timeline Service, Blender]
14. insert
[Diagram: the slide 13 architecture, with a Social Graph Service added beside Fanout]
• keyed off "recipient"
• pipelined 4k "destinations" at a time
• replicated
15. using redis
[Diagram: the slide 13 architecture, plus the layout of one list entry]
Tweet ID (8 bytes) | User ID (8 bytes) | Bits (4 bytes)
• native list structure
• RPUSHX to only add to cached timelines
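A minimal sketch of the entry layout and the RPUSHX behaviour on this slide. Only the 8/8/4-byte field sizes and the RPUSHX command come from the slide; the big-endian unsigned packing, the `timeline:<id>` keys, and the names `pack_entry`, `unpack_entry`, and `FakeTimelineCache` (a plain dict standing in for Redis) are assumptions made for illustration.

```python
import struct

# 8-byte tweet ID, 8-byte user ID, 4 bytes of bits (sizes from the slide).
# Big-endian unsigned fields are an assumption.
ENTRY = struct.Struct(">QQI")


def pack_entry(tweet_id: int, user_id: int, bits: int) -> bytes:
    """Pack one timeline entry into 20 bytes."""
    return ENTRY.pack(tweet_id, user_id, bits)


def unpack_entry(raw: bytes) -> tuple:
    """Return (tweet_id, user_id, bits) from a packed entry."""
    return ENTRY.unpack(raw)


class FakeTimelineCache:
    """In-memory stand-in for Redis lists (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.lists = {}

    def rpush(self, key: str, value: bytes) -> int:
        # Like Redis RPUSH: creates the list if it does not exist.
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])

    def rpushx(self, key: str, value: bytes) -> int:
        # Like Redis RPUSHX: append only if the list already exists,
        # so timelines that are not cached stay absent.
        if key not in self.lists:
            return 0
        self.lists[key].append(value)
        return len(self.lists[key])
```

With this stand-in, a timeline that was never loaded into the cache is left untouched by fanout, which is the effect the slide attributes to RPUSHX.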
16. using redis
[Diagram: the slide 13 architecture, with a cached timeline drawn as a list of (Tweet ID, User ID, Bits) entries]
• native list structure
• RPUSHX to only add to cached timelines
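The fanout step from slide 14 (inserts keyed off the recipient, pipelined 4k destinations at a time) might look roughly like the sketch below. It rests on several assumptions: "4k" is read as 4096, the `timeline:<id>` key scheme is invented, and `batched`, `fanout`, and the `send_batch` callback are hypothetical names rather than anything from the talk.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

BATCH_SIZE = 4096  # "4k destinations at a time"; the exact value is an assumption


def batched(items: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fanout(
    entry: bytes,
    recipient_ids: Iterable[int],
    send_batch: Callable[[List[Tuple[str, bytes]]], None],
) -> int:
    """Queue one insert per recipient, grouped into pipelined batches.

    Returns the number of batches handed to `send_batch`.
    """
    batches = 0
    for chunk in batched(recipient_ids, BATCH_SIZE):
        # Each command is keyed off the recipient's timeline.
        commands = [(f"timeline:{rid}", entry) for rid in chunk]
        send_batch(commands)
        batches += 1
    return batches
```

In a real deployment `send_batch` would presumably write the whole chunk to the cache in one pipelined round trip; here it is just a callback so the sketch stays self-contained.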
19. [Diagram: the slide 13 architecture, with Search Cache relabeled Search Index]
20. blender
[Diagram: the slide 19 architecture]
• queries one replica of all indexes
• merges & ranks results
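The blender behaviour on this slide (query one replica of every index, then merge and rank) can be sketched as a scatter-gather. The slide does not say how a replica is chosen or how results are ranked, so the random replica choice, the `score` field, and the names `Hit`, `Replica`, and `blend` are all assumptions.

```python
import heapq
import random
from typing import Callable, List, NamedTuple, Optional


class Hit(NamedTuple):
    score: float   # hypothetical relevance score
    tweet_id: int


# A replica is modelled as a function from a query to its matching hits.
Replica = Callable[[str], List[Hit]]


def blend(
    query: str,
    indexes: List[List[Replica]],
    limit: int = 10,
    rng: Optional[random.Random] = None,
) -> List[Hit]:
    """Query one replica of each index, then merge and rank the hits."""
    rng = rng or random.Random()
    partial: List[Hit] = []
    for replicas in indexes:
        replica = rng.choice(replicas)  # one replica per index
        partial.extend(replica(query))
    # Merge and rank: keep the highest-scoring hits across all indexes.
    return heapq.nlargest(limit, partial)
```

Asking only one replica per index keeps the query cost proportional to the number of indexes rather than the number of machines, while the remaining replicas absorb other queries.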
24. http push / hosebird
• maintains persistent connections with end clients
• processes tweet & social graph events
• event-based "router"
25. event propagation
[Diagram: Write API feeding Hosebird boxes that serve Firehose, User Streams, and Track / Follow]
• write API sends all events into hosebird; hosebird sees content creation events, social graph changes, etc.
• different queues for public tweets, protected tweets, social events, etc.
26. event cascading
• bandwidth management
• simultaneous connection management (~1M long-lived, open connections to this cluster)
27. firehose
• edge machine simply outputs the public tweet queue
• only allow a limited number of firehoses per hosebird box for bandwidth management
28. track / follow
• simple query based on tweet content
• keeps list of terms / users of interest
• parses public tweets at the edge; routes a tweet if a term matches a token or the user is of interest
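The track / follow matching described on this slide can be sketched as below. The tokenizer (lowercased runs of letters, digits, and underscores) and the names `TrackFollowSubscription`, `tokenize`, and `should_route` are assumptions; the slide only says that terms are matched against tokens and that users of interest are checked.

```python
import re
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TrackFollowSubscription:
    terms: Set[str] = field(default_factory=set)     # tracked terms, lowercase
    user_ids: Set[int] = field(default_factory=set)  # users of interest


def tokenize(text: str) -> Set[str]:
    """Split tweet text into lowercase tokens (a simplifying assumption)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def should_route(sub: TrackFollowSubscription, author_id: int, text: str) -> bool:
    """Route if the author is of interest or any tracked term matches a token."""
    if author_id in sub.user_ids:
        return True
    return not sub.terms.isdisjoint(tokenize(text))
```

For example, a subscription tracking the term `redis` and following user 42 would receive any public tweet containing the token "redis" and every public tweet by user 42.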
29. user streams
• replicate home timeline experience
• upon login, obtain "following" list
• keep cached following list coherent by seeing social graph updates
• route tweet if from a followed user
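The user streams behaviour from slide 29 (fetch the following list at login, keep it coherent from social graph events, route tweets from followed users) can be sketched as a small state machine. The event names `follow` and `unfollow`, the method names, and the `UserStream` class are hypothetical; the slides do not describe the actual event format.

```python
from typing import Iterable, List, Set


class UserStream:
    """Per-connection state for one logged-in user (illustrative only)."""

    def __init__(self, user_id: int, following: Iterable[int]) -> None:
        self.user_id = user_id
        self.following: Set[int] = set(following)  # snapshot obtained at login
        self.delivered: List[int] = []

    def on_social_event(self, kind: str, source_id: int, target_id: int) -> None:
        """Keep the cached following list coherent with social graph updates."""
        if source_id != self.user_id:
            return
        if kind == "follow":
            self.following.add(target_id)
        elif kind == "unfollow":
            self.following.discard(target_id)

    def on_tweet(self, tweet_id: int, author_id: int) -> bool:
        """Route the tweet to this connection if its author is followed."""
        if author_id in self.following:
            self.delivered.append(tweet_id)
            return True
        return False
```

Because the following list is cached on the connection, routing a tweet needs only a set lookup rather than a call back to the social graph.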
39. [Diagram: the slide 19 architecture, annotated with Synchronous Path, Asynchronous Path, and Query Path]
42. [Diagram: the slide 19 architecture, annotated with Read Path and Write Path]
46. [Diagram: Write API, Ingester, Fanout, Timeline Cache (Redis), Search Index (Earlybird)]
47. search index              fanout index
    • ["hello", "world"]      • [@danadanger, ...]
48. User Intent               Query Expansion
    "Hello, world"            "Hello" AND "world"
    @raffi's home timeline    home_timeline:raffi
49. User Intent               Query Expansion
    "Hello, world"            "Hello" AND "world"
    @raffi's home timeline    user_timeline:nelson OR user_timeline:danadanger
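The two expansions on this slide can be reproduced with a short sketch. It assumes, as the slide's example implies, that @raffi follows nelson and danadanger; the function names and the word-splitting rule are hypothetical and are not a description of how Twitter's query layer is implemented.

```python
import re
from typing import List


def expand_text_query(text: str) -> str:
    """Turn free text into a conjunction of quoted terms."""
    words = re.findall(r"\w+", text)
    return " AND ".join(f'"{word}"' for word in words)


def expand_home_timeline(following: List[str]) -> str:
    """Express a home timeline as a disjunction of user timelines."""
    return " OR ".join(f"user_timeline:{name}" for name in following)


print(expand_text_query("Hello, world"))
print(expand_home_timeline(["nelson", "danadanger"]))
```

Written this way, a home timeline becomes a query over the same index as a text search, instead of a lookup in a precomputed per-user list.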
52. User Intent               Query Expansion
    "Hello, world"            "Hello" AND "world"
    @raffi's home timeline    home_timeline:raffi OR user_timeline:taylorswift13
53. streaming compute
• continuous computation
• driven by the events that come into twitter
• generalizing the push mechanism