Building a Social Platform
Part 3:
Scaling the Data Feed
Socialite
• Reference Implementation
– Various Fanout Feed Models
– User Graph Implementation
– Content storage
• Configurable models and options
• REST API in Dropwizard (Yammer)
– https://dropwizard.github.io/dropwizard/
• Built-in benchmarking
https://github.com/10gen-labs/socialite
Architecture
[Diagram; component labels: GraphServiceProxy, ContentProxy]
Feed Service
• Two main functions:
– Aggregating “followed” content for a user
– Forwarding a user’s content to “followers”
• Common implementation models:
– Fanout on read
• Query content of all followed users on the fly
– Fanout on write
• Add to a “cache” of each follower’s timeline for every post
• Various storage models for the timeline
Fanout On Read
Pros
– Simple implementation
– No extra storage for timelines
Cons
– Timeline reads (typically) hit all shards
– Often involves reading more data than required
– May require additional indexing on Content
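The read path described above can be sketched in a few lines. This is a minimal in-memory illustration, not Socialite's code: the `following` map, the `content` array, the `ts` field and `readTimeline` are all hypothetical stand-ins for the user graph and the sharded content collection.

```javascript
// Hypothetical in-memory stand-ins for the user graph and content collection.
const following = { jsr: ["djw", "ian"], ian: ["djw"] };
const content = [
  { _id: 1, _a: "ian", _m: "earlier from ian", ts: 100 },
  { _id: 2, _a: "djw", _m: "message from daz", ts: 200 },
  { _id: 3, _a: "ian", _m: "message from ian", ts: 300 },
  { _id: 4, _a: "jsr", _m: "message from jsr", ts: 400 },
];

// Fanout on read: build the timeline at query time by selecting the posts
// of every followed author, newest first, capped at `limit`.
function readTimeline(user, limit) {
  const authors = new Set(following[user] || []);
  return content
    .filter((post) => authors.has(post._a))
    .sort((a, b) => b.ts - a.ts)
    .slice(0, limit);
}

console.log(readTimeline("jsr", 2).map((post) => post._m));
```

Every read touches the content of all followed authors, which is why this model tends to hit all shards and may need an index on author and time.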
Fanout On Write
Pros
– Timeline can be a single document read
– Dormant users easily excluded
– Working set minimized
Cons
– Fanout for large follower lists can be expensive
– Additional storage for materialized timelines
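As a rough sketch of the write path, assuming a hypothetical `followers` map and an in-memory `timelines` store (neither is Socialite's API), each post costs one content write plus one timeline write per follower, and a read becomes a single lookup:

```javascript
// Hypothetical follower lists; a real service would ask the user graph.
const followers = { djw: ["jsr", "ian"], ian: ["jsr"] };
const timelines = new Map(); // user -> array of posts, oldest first

// Fanout on write: one content write plus one timeline write per follower.
// Returns the total number of writes performed for this post.
function publish(author, message, id) {
  const post = { _id: id, _a: author, _m: message };
  let writeCount = 1; // the content insert itself
  for (const follower of followers[author] || []) {
    if (!timelines.has(follower)) timelines.set(follower, []);
    timelines.get(follower).push(post);
    writeCount += 1;
  }
  return writeCount;
}

// Reading is a single lookup of the materialized timeline.
function readMaterialized(user) {
  return timelines.get(user) || [];
}

const writesForDaz = publish("djw", "message from daz", 1);
publish("ian", "message from ian", 2);
console.log(writesForDaz, readMaterialized("jsr").map((post) => post._m));
```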
Fanout On Write
• Three different approaches
– Time buckets
– Size buckets
– Cache
• Each has different pros & cons
Timeline Buckets - Time
Upsert into a time-range bucket for each user
> db.timed_buckets.find().pretty()
{
"_id" : {"_u" : "jsr", "_t" : 516935},
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"},
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"}
]
}
{
"_id" : {"_u" : "ian", "_t" : 516935},
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}
]
}
{
"_id" : {"_u" : "jsr", "_t" : 516934 },
"_c" : [
{"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"}
]
}
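One way to read the documents above: the bucket key pairs the user (`_u`) with a time slot (`_t`), and each post is upserted into the bucket for its slot. The sketch below emulates that in memory. The one-hour `BUCKET_SECONDS` width and the helper names are assumptions; the slides do not say how `_t` is derived.

```javascript
const BUCKET_SECONDS = 3600; // assumed bucket width, not taken from the slides
const timedBuckets = new Map(); // "user:slot" -> { _id: { _u, _t }, _c: [] }

// Map a timestamp (seconds since epoch) to its time slot.
function bucketFor(epochSeconds) {
  return Math.floor(epochSeconds / BUCKET_SECONDS);
}

// Emulates an upsert: create the bucket if it is missing, then append.
function pushTimed(user, post, epochSeconds) {
  const slot = bucketFor(epochSeconds);
  const key = `${user}:${slot}`;
  if (!timedBuckets.has(key)) {
    timedBuckets.set(key, { _id: { _u: user, _t: slot }, _c: [] });
  }
  const bucket = timedBuckets.get(key);
  bucket._c.push(post);
  return bucket;
}

pushTimed("jsr", { _a: "ian", _m: "earlier from ian" }, 7100);
pushTimed("jsr", { _a: "djw", _m: "message from daz" }, 7300);
pushTimed("jsr", { _a: "ian", _m: "message from ian" }, 7400);
console.log([...timedBuckets.values()].map((b) => [b._id._t, b._c.length]));
```

Bucket size then depends on how busy each time slot is, which is the inconsistency the size-based variant addresses.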
Timeline Buckets - Size
More complex, but more consistently sized
> db.sized_buckets.find().pretty()
{
"_id" : ObjectId("...122"),
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"},
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"},
{"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"}
],
"_s" : 3,
"_u" : "jsr"
}
{
"_id" : ObjectId("...011"),
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}
],
"_s" : 1,
"_u" : "ian"
}
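A minimal in-memory sketch of size-based bucketing follows, assuming a cap of three entries per bucket. `MAX_BUCKET_SIZE`, the integer bucket ids and `pushSized` are illustrative choices, not values from Socialite. The `_s` counter mirrors the field shown above: a post goes into the user's bucket that still has room, otherwise a new bucket is started.

```javascript
const MAX_BUCKET_SIZE = 3; // assumed cap per bucket
const sizedBuckets = []; // each entry: { _id, _u, _s, _c }
let nextBucketId = 1; // stand-in for a generated ObjectId

// Append to the user's bucket that still has room, or start a new one.
function pushSized(user, post) {
  // Older buckets are already full, so the first bucket with room is the newest.
  let bucket = sizedBuckets.find((b) => b._u === user && b._s < MAX_BUCKET_SIZE);
  if (!bucket) {
    bucket = { _id: nextBucketId++, _u: user, _s: 0, _c: [] };
    sizedBuckets.push(bucket);
  }
  bucket._c.push(post);
  bucket._s += 1; // keep the size counter in step with the array
  return bucket._id;
}

for (let i = 1; i <= 4; i++) {
  pushSized("jsr", { _a: "ian", _m: `message ${i}` });
}
pushSized("ian", { _a: "djw", _m: "message from daz" });
console.log(sizedBuckets.map((b) => [b._id, b._u, b._s]));
```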
Timeline - Cache
Store a limited cache, fall back to fanout on read
– Create single cache doc on demand with upsert
– Limit size of cache with $slice
– Expire docs with a TTL index for inactive users
> db.timeline_cache.find().pretty()
{
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"},
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"},
{"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"}
],
"_u" : "jsr"
}
{
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}
],
"_u" : "ian"
}
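The three bullets above can be emulated in memory as follows. In MongoDB the same effect would typically come from an upsert, a `$push` with `$each` and `$slice`, and a TTL index; the sketch only imitates those semantics. `CACHE_LIMIT`, `TTL_MS`, the `lastActive` field and the decision to create cache docs on read rather than on write are assumptions made for illustration.

```javascript
const CACHE_LIMIT = 3; // assumed maximum cached entries per user
const TTL_MS = 60000; // assumed inactivity window
const timelineCache = new Map(); // user -> { _u, _c, lastActive }

// Emulates the upsert on read: create the cache doc on demand, mark activity.
function cacheTouch(user, now) {
  const doc = timelineCache.get(user) || { _u: user, _c: [], lastActive: now };
  doc.lastActive = now;
  timelineCache.set(user, doc);
  return doc;
}

// Emulates $push with $slice: append, then keep only the newest entries.
// Users without a cache doc (dormant users) are skipped.
function cachePush(user, post) {
  const doc = timelineCache.get(user);
  if (!doc) return false;
  doc._c = [...doc._c, post].slice(-CACHE_LIMIT);
  return true;
}

// Emulates TTL expiry: drop cache docs of users inactive for too long.
function expireInactive(now) {
  for (const [user, doc] of timelineCache) {
    if (now - doc.lastActive > TTL_MS) timelineCache.delete(user);
  }
}

cacheTouch("jsr", 0);
for (let i = 1; i <= 4; i++) {
  cachePush("jsr", { _a: "ian", _m: `message ${i}` });
  cachePush("ian", { _a: "djw", _m: `message ${i}` }); // no cache doc, skipped
}
console.log(timelineCache.get("jsr")._c.map((post) => post._m));
```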
Embedding vs Linking Content
Embedded content for direct access
– Great when it is small, predictable in size
Link to content, store only metadata
– Read only desired content on demand
– Further stabilizes cache document sizes
> db.timeline_cache.findOne({"_id" : "jsr"})
{
"_c" : [
{"_id" : ObjectId("...dc1")},
{"_id" : ObjectId("...dd2")},
{"_id" : ObjectId("...da7")}
],
"_id" : "jsr"
}
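With linked content, a timeline read becomes two steps: fetch the references, then fetch only the content needed for the page being shown. The sketch below illustrates this in memory. The short string keys stand in for the elided ObjectIds, and `contentById`, `linkedTimeline` and `resolvePage` are hypothetical names.

```javascript
// Hypothetical content store keyed by short ids (placeholders for ObjectIds).
const contentById = new Map([
  ["dc1", { _a: "djw", _m: "message from daz" }],
  ["dd2", { _a: "ian", _m: "message from ian" }],
  ["da7", { _a: "ian", _m: "earlier from ian" }],
]);

// Timeline cache document that stores references only.
const linkedTimeline = {
  _id: "jsr",
  _c: [{ _id: "dc1" }, { _id: "dd2" }, { _id: "da7" }],
};

// Resolve one page of references into full posts, skipping missing content.
function resolvePage(timeline, offset, count) {
  return timeline._c
    .slice(offset, offset + count)
    .map((ref) => contentById.get(ref._id))
    .filter((post) => post !== undefined);
}

console.log(resolvePage(linkedTimeline, 0, 2).map((post) => post._m));
```

The cache document stays small and uniform regardless of message size, at the price of a second lookup per page.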
Socialite Feed Service
• Implemented four models as plugins
– FanoutOnRead
– FanoutOnWrite – Buckets (size)
– FanoutOnWrite – Buckets (time)
– FanoutOnWrite – Cache
• Switchable by config
• Store content by reference or value
• Benchmark-able back to back
Benchmark by feed type
Benchmarking the Feed
• Biggest challenge: scaling the feed
• High cost of "fanout on write"
• A popular user posts => number of operations:
– Content collection insert: 1
– Timeline Cache: on average, 130+ cache document updates
• SCATTER GATHER (slowest shard determines latency)
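The cost model above reduces to two small formulas, sketched here with hypothetical helper names and made-up shard latencies: a post costs one content insert plus one update per cached follower timeline, and a scatter-gather request completes only when its slowest shard has answered.

```javascript
// One content insert plus one cache update per cached follower timeline.
function writesPerPost(cachedFollowerTimelines) {
  return 1 + cachedFollowerTimelines;
}

// A scatter-gather request finishes when the slowest shard has answered.
function scatterGatherLatency(shardLatenciesMs) {
  return Math.max(...shardLatenciesMs);
}

console.log(writesPerPost(130)); // 131 operations for the average quoted above
console.log(scatterGatherLatency([4, 9, 35])); // hypothetical latencies in ms
```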
Benchmarking the Feed
• Timeline is different from content!
– "It's a Cache"
IT CAN BE REBUILT!
Benchmarking the Feed
• MongoDB as a cache
IT CAN BE REBUILT!
[Charts: effect of removing the cache, forcing a fallback to fanout on read and a rebuild of the cache]
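Because the timeline is only a cache, a miss can be served by falling back to fanout on read and storing the result. A minimal sketch of that fallback, using hypothetical in-memory data and names (`graph`, `posts`, `getTimeline`, `rebuildCount`):

```javascript
// Hypothetical source of truth: the user graph and the content collection.
const graph = { jsr: ["djw", "ian"] };
const posts = [
  { _a: "ian", _m: "earlier from ian", ts: 100 },
  { _a: "djw", _m: "message from daz", ts: 200 },
  { _a: "ian", _m: "message from ian", ts: 300 },
];

const cache = new Map(); // user -> cached timeline
let rebuildCount = 0; // how often the fallback path ran

// Fanout on read: compute the timeline from content, newest first.
function fanoutOnRead(user, limit) {
  const authors = new Set(graph[user] || []);
  return posts
    .filter((post) => authors.has(post._a))
    .sort((a, b) => b.ts - a.ts)
    .slice(0, limit);
}

// Serve from the cache; on a miss, fall back and rebuild the cache entry.
function getTimeline(user, limit) {
  if (!cache.has(user)) {
    cache.set(user, fanoutOnRead(user, limit));
    rebuildCount += 1;
  }
  return cache.get(user);
}

getTimeline("jsr", 10); // miss: rebuilt from content
getTimeline("jsr", 10); // hit: served from the cache
console.log(rebuildCount);
```

Dropping the whole cache costs a burst of fanout-on-read work but loses no data, since the content collection remains the source of truth.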
Benchmarking the Feed
• Results
– Over the last two weeks
– Ran load with one million users
– Ran load with ten million users (currently running)
– Average send rates of 1K/s and 2K/s; reads at 10K–20K/s
– 22 AWS c3.2xlarge servers (7.5GB RAM)
• 18 across six shards (3 content, 3 user graph)
• 4 mongos and app machines
– 2 c3.4xlarge servers (30GB RAM)
• Timeline feed cache (six shards)
Summary
Socialite
• Real Working Implementation
– Implements All Components
– Configurable models and options
• Built-in benchmarking
• Questions?
– We will be at "Ask The Experts" this afternoon!
https://github.com/10gen-labs/socialite
Thank You!

Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed

Editor's Notes

  1. For a social platform to store and deliver streaming timelines over long periods of time, careful attention must be paid to the way content is stored. We provide a detailed look into storing an infinite timeline of data while optimizing indexing and sharding configuration for access to the most recent window of data. We will also look at some overall performance metrics from Socialite as we scale from a single replica set to a large sharded environment.
  2. Image of the hat at https://dropwizard.github.io/dropwizard
  3. BRUTAL!!!
  4. Variants?
  5. Should you embed the messages/content into the "cache"/buckets/etc. or just store references?
  6. WHICH ONE DID WE IMPLEMENT IN SOCIALITE??? All work with Async Service (? or mention later). And we did benchmark them! -> Asya
  7. Examining latency of reading content by fanout type. Note two types of latency: for sender and for recipient. Scaling throughput... THIS WILL NOT SCALE LINEARLY(!) *RERUN WITH SEVERAL SHARDS* Replace with new screenshot.
  8. MongoDB as a cache. Storage amplification on a feed service: Justin Bieber makes a single post and we need to write it to 2 million timelines.... ??? Cache only for active users. Number of updates across all cache / number of documents updated.
  9. Some kind of wrap-up