SlideShare ist ein Scribd-Unternehmen logo
1 von 18
mongo @ ex.fm
 Lucas Hrabovsky
      CTO
   #MongoPGH
ex.fm turns websites into CD’s
browser extensions
_id and indexes
• Bad Ideas
  – ObjectId("4fb284…")
  – Big Compound Indexes
  – Long,VariableWidthStringsMissIndexes
• Good Ideas
  – Make _id mean something
  – Fixed Width Hashes
  – Use _id as a compound index
activity feeds: first attempt
{“_id”: “201109122304-lucas-dan-c7dede43…”,
"username”: “lucas”, "created”: 201109122304,
"actor”: “dan”, “verb”: “love”}


db.user.feed.find({„username‟: „lucas‟, „verb‟: „love‟})
.sort({„created‟: -1})



Working just fine for 4MM documents, but getting slow…
new version of activity feeds
{“_id”: “201109122304-lucas-dan-
c7dede43…”, ”uid”: “lucas-201109122304”, ”vid”:
lucas-love-201109122304, "actor”: “dan”}


db.user.feed.find({„vid‟: /^lucas-/})
.sort({„vid‟: -1})

Fast for all 3 use cases!
removing indexes pays off




Don‟t need to buy more/bigger machines!
sites! sites! sites!
padding factor
•   Variable document size
•   Allocate for the latest and fattest
•   Document moves
•   Can be very inefficient
•   More RAM!
•   Pre-allocate to prevent moves
unbounded embedded lists
•   Useful for followers, favorites
•   Good for a few things, bad for lots
•   Constantly bumping up padding factor
•   Lots of document moves
a metaphor
     • You run a coffee shop and can buy only
       one size of cup. Which size do you buy?
     • On average, each customer has only one
       cup
     • Heavy drinkers have hundreds of cups




credit: Macintex macintex.deviantart.com
bucketing!
•   Split list across multiple documents
•   Median number of items = bucket size
•   Pre-allocate
•   Easy seeking and traversal
•   Much faster
hey charts!
site.meta 1                         site.meta 2

site.songs 1                             site.songs 2




  Allocated and unused

  Allocated and full of data
same charts when using
                bucketing
site.meta 1                               site.meta 2

site.songs 1 - 1               site.songs 2 - 1    site.songs 2 - 2



site.songs 1 -2                site.songs 2 - 3    site.songs 2 - 4



                               site.songs 2 - 5    site.songs 2 -6




  Allocated and unused

  Allocated and full of data
doesn’t work for everything…
• Picking right bucket size
• Defragging
• Random insertion
  – Easy for things you don‟t much care about the
    order of
  – More difficult is you‟re going to insert and
    change the order later
micro documents
db.site.songs.find({_id:
/^bfc25de08d964a8a41226c6016dd7753-
/}).sort({_id:-1})

{ "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029114", ”s" :
18436532 }
{ "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029113", ”s" :
18804590 }
{ "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029112", ”s" :
18804591 }
paying it back
• Bent mongoengine to make this easy
• Follow github.com/exfm
• Also added tooling for
  – Trace all queries
  – Aggregate tracing by request middleware
  – Raise exceptions when queries miss an index
thanks!

  lucas@ex.fm
github.com/exfm

Weitere ähnliche Inhalte

Ähnlich wie mongodb + ex.fm

Modeling Data in MongoDB
Modeling Data in MongoDBModeling Data in MongoDB
Modeling Data in MongoDB
lehresman
 
A Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooA Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data Hullabaloo
Andrew Brust
 

Ähnlich wie mongodb + ex.fm (20)

MongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_Wilkins
MongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_WilkinsMongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_Wilkins
MongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_Wilkins
 
SQL vs NoSQL
SQL vs NoSQLSQL vs NoSQL
SQL vs NoSQL
 
Modeling Data in MongoDB
Modeling Data in MongoDBModeling Data in MongoDB
Modeling Data in MongoDB
 
Scalable web architecture
Scalable web architectureScalable web architecture
Scalable web architecture
 
A Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooA Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data Hullabaloo
 
NoSQL and The Big Data Hullabaloo
NoSQL and The Big Data HullabalooNoSQL and The Big Data Hullabaloo
NoSQL and The Big Data Hullabaloo
 
Real-time Location Based Social Discovery using MongoDB
Real-time Location Based Social Discovery using MongoDBReal-time Location Based Social Discovery using MongoDB
Real-time Location Based Social Discovery using MongoDB
 
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'tsThe Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
 
Postgres Vision 2018: Five Sharding Data Models
Postgres Vision 2018: Five Sharding Data ModelsPostgres Vision 2018: Five Sharding Data Models
Postgres Vision 2018: Five Sharding Data Models
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?
 
MongoDB .local Bengaluru 2019: A Complete Methodology to Data Modeling for Mo...
MongoDB .local Bengaluru 2019: A Complete Methodology to Data Modeling for Mo...MongoDB .local Bengaluru 2019: A Complete Methodology to Data Modeling for Mo...
MongoDB .local Bengaluru 2019: A Complete Methodology to Data Modeling for Mo...
 
Learn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDBLearn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDB
 
10 Ways to Scale Your Website Silicon Valley Code Camp 2019
10 Ways to Scale Your Website Silicon Valley Code Camp 201910 Ways to Scale Your Website Silicon Valley Code Camp 2019
10 Ways to Scale Your Website Silicon Valley Code Camp 2019
 
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDBMongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)
 
MongoDB .local Toronto 2019: A Complete Methodology of Data Modeling for MongoDB
MongoDB .local Toronto 2019: A Complete Methodology of Data Modeling for MongoDBMongoDB .local Toronto 2019: A Complete Methodology of Data Modeling for MongoDB
MongoDB .local Toronto 2019: A Complete Methodology of Data Modeling for MongoDB
 
Socialite, the Open Source Status Feed
Socialite, the Open Source Status FeedSocialite, the Open Source Status Feed
Socialite, the Open Source Status Feed
 
MongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDBMongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDB
MongoDB .local Chicago 2019: A Complete Methodology to Data Modeling for MongoDB
 
Using Aggregation for Analytics
Using Aggregation for Analytics Using Aggregation for Analytics
Using Aggregation for Analytics
 
Using Aggregation for analytics
Using Aggregation for analyticsUsing Aggregation for analytics
Using Aggregation for analytics
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

mongodb + ex.fm

  • 1. mongo @ ex.fm Lucas Hrabovsky CTO #MongoPGH
  • 2. ex.fm turns websites into CD’s
  • 4. _id and indexes • Bad Ideas – ObjectId("4fb284…") – Big Compound Indexes – Long,VariableWidthStringsMissIndexes • Good Ideas – Make _id mean something – Fixed Width Hashes – Use _id as a compound index
  • 5. activity feeds: first attempt {“_id”: “201109122304-lucas-dan-c7dede43…”, "username”: “lucas”, "created”: 201109122304, "actor”: “dan”, “verb”: “love”} db.user.feed.find({„username‟: „lucas‟, „verb‟: „love‟}) .sort({„created‟: -1}) Working just fine for 4MM documents, but getting slow…
  • 6. new version of activity feeds {“_id”: “201109122304-lucas-dan- c7dede43…”, ”uid”: “lucas-201109122304”, ”vid”: lucas-love-201109122304, "actor”: “dan”} db.user.feed.find({„vid‟: /^lucas-/}) .sort({„vid‟: -1}) Fast for all 3 use cases!
  • 7. removing indexes pays off Don‟t need to buy more/bigger machines!
  • 9. padding factor • Variable document size • Allocate for the latest and fattest • Document moves • Can be very inefficient • More RAM! • Pre-allocate to prevent moves
  • 10. unbounded embedded lists • Useful for followers, favorites • Good for a few things, bad for lots • Constantly bumping up padding factor • Lots of document moves
  • 11. a metaphor • You run a coffee shop and can buy only one size of cup. Which size do you buy? • On average, each customer has only one cup • Heavy drinkers have hundreds of cups credit: Macintex macintex.deviantart.com
  • 12. bucketing! • Split list across multiple documents • Median number of items = bucket size • Pre-allocate • Easy seeking and traversal • Much faster
  • 13. hey charts! site.meta 1 site.meta 2 site.songs 1 site.songs 2 Allocated and unused Allocated and full of data
  • 14. same charts when using bucketing site.meta 1 site.meta 2 site.songs 1 - 1 site.songs 2 - 1 site.songs 2 - 2 site.songs 1 -2 site.songs 2 - 3 site.songs 2 - 4 site.songs 2 - 5 site.songs 2 -6 Allocated and unused Allocated and full of data
  • 15. doesn’t work for everything… • Picking right bucket size • Defragging • Random insertion – Easy for things you don‟t much care about the order of – More difficult is you‟re going to insert and change the order later
  • 16. micro documents db.site.songs.find({_id: /^bfc25de08d964a8a41226c6016dd7753- /}).sort({_id:-1}) { "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029114", ”s" : 18436532 } { "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029113", ”s" : 18804590 } { "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029112", ”s" : 18804591 }
  • 17. paying it back • Bent mongoengine to make this easy • Follow github.com/exfm • Also added tooling for – Trace all queries – Aggregate tracing by request middleware – Raise exceptions when queries miss an index