Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed

Building a Social Platform
Part 3:
Scaling the Data Feed

Socialite
• Reference Implementation
– Various Fanout Feed Models
– User Graph Implementation
– Content storage
• Configurable models and options
• REST API in Dropwizard (Yammer)
– https://dropwizard.github.io/dropwizard/
• Built-in benchmarking
https://github.com/10gen-labs/socialite

Architecture
GraphServiceProxy
ContentProxy

Feed Service
• Two main functions :
– Aggregating “followed” content for a user
– Forwarding user’s content to “followers”
• Common implementation models :
– Fanout on read
• Query content of all followed users on the fly
– Fanout on write
• Add to “cache” of each user’s timeline for every post
• Various storage models for the timeline

Fanout On Read
Pros
– Simple implementation
– No extra storage for timelines
Cons
– Timeline reads (typically) hit all shards
– Often involves reading more data than required
– May require additional indexing on Content

Fanout On Write
Pros
– Timeline can be a single document read
– Dormant users easily excluded
– Working set minimized
Cons
– Fanout for large follower lists can be expensive
– Additional storage for materialized timelines
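
A minimal sketch of fanout on write, assuming in-memory maps in place of the content and timeline collections. The `post` helper and its returned write count are hypothetical and exist only to make the per-post cost visible.

```javascript
// In-memory stand-ins (hypothetical data): who follows each author.
const followers = { djw: ["jsr", "ian"], ian: ["jsr"] };
const contentStore = [];
const timelines = {}; // user -> materialized timeline, oldest first

// Fanout on write: store the post once, then append it to the
// materialized timeline of every follower. Returns the number of writes.
function post(author, message) {
  const doc = { _id: contentStore.length + 1, _a: author, _m: message };
  contentStore.push(doc); // one content insert
  let writes = 1;
  for (const follower of followers[author] || []) {
    if (!timelines[follower]) timelines[follower] = [];
    timelines[follower].push(doc); // one timeline update per follower
    writes += 1;
  }
  return writes;
}

// Reading is a single lookup; no per-author scan is needed.
function readTimeline(user) {
  return timelines[user] || [];
}

const writesForDjw = post("djw", "message from daz");
const writesForIan = post("ian", "message from ian");
console.log(writesForDjw, writesForIan, readTimeline("jsr").length);
```

The write count grows with the follower list, while the read stays a single lookup.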

Fanout On Write
• Three different approaches
– Time buckets
– Size buckets
– Cache
• Each has different pros & cons

Timeline Buckets - Time
Upsert to time range buckets for each user
> db.timed_buckets.find().pretty()
{
"_id" : {"_u" : "jsr", "_t" : 516935},
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"},
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"}
]
}
{
"_id" : {"_u" : "ian", "_t" : 516935},
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}
]
}
{
"_id" : {"_u" : "jsr", "_t" : 516934 },
"_c" : [
{"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"}
]
}
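
The upsert into time-range buckets could be sketched as follows, using a Map in place of the `timed_buckets` collection. The bucket width `BUCKET_SECONDS` and the way `_t` is derived from a timestamp are assumptions for illustration; the slides do not say how `_t` is computed.

```javascript
const BUCKET_SECONDS = 3600; // hypothetical bucket width (one hour)
const timedBuckets = new Map(); // key "user|bucket" -> bucket document

// Assumed mapping from a timestamp to a bucket number.
function bucketOf(epochSeconds) {
  return Math.floor(epochSeconds / BUCKET_SECONDS);
}

// Mimics an upsert keyed on {_u, _t}: create the bucket document if it is
// missing, then append the post to its "_c" array.
function pushToTimeBucket(user, post, epochSeconds) {
  const t = bucketOf(epochSeconds);
  const key = `${user}|${t}`;
  if (!timedBuckets.has(key)) {
    timedBuckets.set(key, { _id: { _u: user, _t: t }, _c: [] });
  }
  timedBuckets.get(key)._c.push(post);
  return t;
}

const t1 = pushToTimeBucket("jsr", { _a: "ian", _m: "earlier from ian" }, 7200);
const t2 = pushToTimeBucket("jsr", { _a: "djw", _m: "message from daz" }, 10800);
const t3 = pushToTimeBucket("jsr", { _a: "ian", _m: "message from ian" }, 10900);
console.log(t1, t2, t3, timedBuckets.size);
```

Posts that arrive within the same time window land in the same document, so bucket sizes vary with how busy the followed users are.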

Timeline Buckets - Size
More complex, but more consistently sized
> db.sized_buckets.find().pretty()
{
"_id" : ObjectId("...122"),
"_c" : [
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"},
],
"_s" : 3,
"_u" : "jsr"
}
{
"_id" : ObjectId("...011"),
"_c" : [
],
"_s" : 1,
"_u" : "ian"
}
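
One way to picture size-based buckets is the sketch below, which treats `_s` as the number of entries in `_c` and starts a new bucket once a limit is reached. `MAX_BUCKET_SIZE`, the integer `_id` values, and the "newest bucket is last" ordering are assumptions for this example, not details taken from Socialite.

```javascript
const MAX_BUCKET_SIZE = 2; // hypothetical limit on entries per bucket
const sizedBuckets = []; // stand-in for the sized_buckets collection

// Append to the user's newest bucket; open a new bucket when it is full.
function pushToSizedBucket(user, post) {
  let bucket = null;
  for (let i = sizedBuckets.length - 1; i >= 0; i--) {
    if (sizedBuckets[i]._u === user) {
      bucket = sizedBuckets[i];
      break;
    }
  }
  if (bucket === null || bucket._s >= MAX_BUCKET_SIZE) {
    bucket = { _id: sizedBuckets.length + 1, _c: [], _s: 0, _u: user };
    sizedBuckets.push(bucket);
  }
  bucket._c.push(post);
  bucket._s += 1; // keep the size counter in step with the array
  return bucket._id;
}

const ids = [
  pushToSizedBucket("jsr", { _a: "ian", _m: "earlier from ian" }),
  pushToSizedBucket("jsr", { _a: "djw", _m: "message from daz" }),
  pushToSizedBucket("jsr", { _a: "ian", _m: "message from ian" }),
  pushToSizedBucket("ian", { _a: "djw", _m: "message from daz" }),
];
console.log(ids);
```

The extra bookkeeping (finding the newest bucket, tracking `_s`) is the added complexity mentioned above; in exchange, no bucket grows past the chosen limit.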

Timeline - Cache
Store a limited cache, fall back to fanout on read
– Create single cache doc on demand with upsert
– Limit size of cache with $slice
– Timeout docs with TTL for inactive users
> db.timeline_cache.find().pretty()
{
"_c" : [
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"},
],
"_u" : "jsr"
}
{
"_c" : [
],
"_u" : "ian"
}
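
The capped cache can be sketched in memory as below. The trimming step mirrors what a `$push` with `$each` and a negative `$slice` does in MongoDB; `CACHE_LIMIT` and the `pushToCache` helper are hypothetical, and TTL expiry for inactive users is not modeled here.

```javascript
const CACHE_LIMIT = 3; // hypothetical maximum number of cached posts per user
const timelineCache = new Map(); // stand-in for the timeline_cache collection

// Create the cache document on demand (upsert), append the post, and keep
// only the newest CACHE_LIMIT entries.
function pushToCache(user, post) {
  if (!timelineCache.has(user)) {
    timelineCache.set(user, { _u: user, _c: [] });
  }
  const doc = timelineCache.get(user);
  doc._c.push(post);
  if (doc._c.length > CACHE_LIMIT) {
    doc._c = doc._c.slice(-CACHE_LIMIT);
  }
  return doc._c.length;
}

for (let i = 1; i <= 5; i++) {
  pushToCache("jsr", { _id: i, _a: "ian", _m: `message ${i} from ian` });
}
const cached = timelineCache.get("jsr")._c;
console.log(cached.map((p) => p._id));
```

Because older entries are dropped, a reader who pages past the cached window has to fall back to fanout on read.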

Embedding vs Linking Content
Embedded content for direct access
– Great when it is small, predictable in size
Link to content, store only metadata
– Read only desired content on demand
– Further stabilizes cache document sizes
> db.timeline_cache.findOne({"_id" : "jsr"})
{
"_c" : [
{"_id" : ObjectId("...dc1")},
{"_id" : ObjectId("...dd2")},
{"_id" : ObjectId("...da7")}
],
"_id" : "jsr"
}
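
With linked content, reading a timeline becomes a two-step lookup: fetch the cache document, then fetch only the posts actually displayed. The sketch below assumes an in-memory content map and uses short placeholder strings ("dc1", "dd2", "da7") in place of the elided ObjectId values; `resolveTimeline` is a hypothetical helper.

```javascript
// Stand-in for the content collection, keyed by a placeholder id.
const contentById = new Map([
  ["dc1", { _id: "dc1", _a: "djw", _m: "message from daz" }],
  ["dd2", { _id: "dd2", _a: "ian", _m: "message from ian" }],
  ["da7", { _id: "da7", _a: "ian", _m: "earlier from ian" }],
]);

// Cache document that stores references only.
const cacheDoc = { _id: "jsr", _c: [{ _id: "dc1" }, { _id: "dd2" }, { _id: "da7" }] };

// Resolve just the first `limit` references, skipping any that no longer
// exist in the content store.
function resolveTimeline(doc, limit) {
  return doc._c
    .slice(0, limit)
    .map((ref) => contentById.get(ref._id))
    .filter((post) => post !== undefined);
}

const firstPage = resolveTimeline(cacheDoc, 2);
console.log(firstPage.map((p) => p._m));
```

The cache document stays small and uniform because each entry is only an id, at the cost of a second read for the content itself.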

Socialite Feed Service
• Implemented four models as plugins
– FanoutOnRead
– FanoutOnWrite – Buckets (size)
– FanoutOnWrite – Buckets (time)
– FanoutOnWrite – Cache
• Switchable by config
• Store content by reference or value
• Benchmark-able back to back

Benchmarking the Feed
• Biggest challenge: scaling the feed
• High cost of "fanout on write"
• A popular user posts => number of operations:
– Content collection insert: 1
– Timeline cache: on average, 130+ cache document updates
• SCATTER GATHER (slowest shard determines latency)
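
The cost figures above reduce to simple arithmetic, sketched here. Treating the 130 cache updates as one update per follower, and the shard latencies used in the example, are assumptions for illustration.

```javascript
// One content insert plus one timeline cache update per follower.
function writesPerPost(followerCount) {
  return 1 + followerCount;
}

// A scatter-gather operation completes only when the slowest shard responds.
function scatterGatherLatency(shardLatenciesMs) {
  return Math.max(...shardLatenciesMs);
}

const writes = writesPerPost(130);
const latency = scatterGatherLatency([4, 9, 35]); // hypothetical latencies in ms
console.log(writes, latency);
```

At a steady send rate, total write load is the send rate multiplied by `writesPerPost`, which is why the timeline cache dominates the write traffic.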

• Timeline is different from content!
– "It's a Cache"
IT CAN BE REBUILT!

• MongoDB as a cache

[Chart: effect of removing the cache, forcing a fall back to fanout on read and a rebuild of the cache]
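
Why the cache can be dropped safely may be easier to see in code: on a miss, the timeline is recomputed from the content store and written back. This is a minimal in-memory sketch; the data, the `ts` field, and the `getTimeline` helper are assumptions rather than Socialite's implementation.

```javascript
// Source of truth (hypothetical data): user graph and content.
const following = { jsr: ["djw", "ian"] };
const content = [
  { _id: 1, _a: "ian", _m: "earlier from ian", ts: 100 },
  { _id: 2, _a: "djw", _m: "message from daz", ts: 200 },
  { _id: 3, _a: "ian", _m: "message from ian", ts: 300 },
];

const cache = new Map(); // timeline cache, safe to lose at any time
let rebuilds = 0; // counts how often the fallback path runs

function getTimeline(user, limit) {
  let doc = cache.get(user);
  if (!doc) {
    // Cache miss: fall back to fanout on read, then repopulate the cache.
    rebuilds += 1;
    const authors = new Set(following[user] || []);
    const posts = content
      .filter((post) => authors.has(post._a))
      .sort((a, b) => b.ts - a.ts)
      .slice(0, limit);
    doc = { _u: user, _c: posts };
    cache.set(user, doc);
  }
  return doc._c.slice(0, limit);
}

const first = getTimeline("jsr", 2); // miss: rebuilt from content
const second = getTimeline("jsr", 2); // hit: served from the cache
cache.clear(); // simulate dropping the whole cache
const third = getTimeline("jsr", 2); // miss again: rebuilt
console.log(rebuilds, third.map((p) => p._id));
```

The timelines before and after the drop are identical; only the cost of the first read after the drop changes.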

• Results
– Ran over the last two weeks
– Ran load with one million users
– Ran load with ten million users (currently running)
– Average send rates of 1K/s and 2K/s; reads at 10K-20K/s
– 22 AWS c3.2xlarge servers (7.5GB RAM)
• 18 across six shards (3 content, 3 user graph)
• 4 mongos and app machines
– 2 c2x4xlarge servers (30GB RAM)
• timeline feed cache (six shards)

Socialite
• Real Working Implementation
– Implements All Components
– Configurable models and options
• Built-in benchmarking
• Questions?
– We will be at "Ask The Experts" this afternoon!

Thank You!
