Optimize MongoDB performance with micro documents and bucketing


This deck covers several techniques for optimizing MongoDB databases:
• Making _id values meaningful and using fixed-width hashes to improve indexing performance.
• Refactoring activity feed documents so related data lives in a single indexed field, enabling fast queries.
• Removing unnecessary indexes to improve performance without buying more hardware.
• Handling variable document sizes by pre-allocating space to prevent document moves within collections.


mongo @ ex.fm
Lucas Hrabovsky
CTO
#MongoPGH

_id and indexes
• Bad Ideas
– ObjectId("4fb284…")
– Big Compound Indexes
– Long, variable-width strings miss indexes
• Good Ideas
– Make _id mean something
– Fixed Width Hashes
– Use _id as a compound index
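
A minimal sketch of the "good ideas" above: hash a long, variable-width key down to a fixed width and append a timestamp, so the _id carries meaning and doubles as a compound key. The helper names fixedWidthHash and makeId are hypothetical, and MD5 is an assumption chosen only because it yields 32 hex characters; the slides do not say which hash ex.fm used.

```javascript
// Sketch only: build a fixed-width, meaningful _id of the form "<hash>-<timestamp>".
// Assumption: MD5 is used purely for its fixed 32-character hex output.
const crypto = require("crypto");

// Hash a variable-length key (for example a URL) down to 32 hex characters.
function fixedWidthHash(key) {
  return crypto.createHash("md5").update(key, "utf8").digest("hex");
}

// Combine the hash with a zero-padded Unix timestamp so every _id has the
// same length and sorts chronologically within one hash prefix.
function makeId(key, unixSeconds) {
  const ts = String(unixSeconds).padStart(10, "0");
  return fixedWidthHash(key) + "-" + ts;
}

// Hypothetical example key; any long string works.
const key = "http://example.com/a/very/long/path?with=query";
const a = makeId(key, 1337029112);
const b = makeId(key, 1337029114);
console.log(a.length, a < b); // 43 true
```

Because every _id has the same width, plain string comparison orders documents by time within one hash prefix, so the default _id index can serve both lookup and sort.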

activity feeds: first attempt
{"_id": "201109122304-lucas-dan-c7dede43…", "username": "lucas", "created": 201109122304, "actor": "dan", "verb": "love"}
db.user.feed.find({"username": "lucas", "verb": "love"}).sort({"created": -1})
Working just fine for 4MM documents, but getting slow…

new version of activity feeds
{"_id": "201109122304-lucas-dan-c7dede43…", "uid": "lucas-201109122304", "vid": "lucas-love-201109122304", "actor": "dan"}
db.user.feed.find({"vid": /^lucas-/}).sort({"vid": -1})
Fast for all 3 use cases!
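
The new version of the feed document can be sketched as below. The field names uid and vid come from the slide; buildFeedDoc, escapeRegex and prefixQuery are hypothetical helpers, and the hash suffix is passed in as-is because the slide truncates it. The query relies on an anchored, case-sensitive prefix regex, which MongoDB can answer from an index on the field.

```javascript
// Sketch only: pack owner, verb and time into single string fields.
function buildFeedDoc(owner, actor, verb, created, hash) {
  return {
    _id: `${created}-${owner}-${actor}-${hash}`,
    uid: `${owner}-${created}`,         // owner + time
    vid: `${owner}-${verb}-${created}`, // owner + verb + time
    actor: actor,
  };
}

// Escape regex metacharacters so a username cannot change the pattern.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored prefix filter such as { vid: /^lucas-love-/ }.
function prefixQuery(field, ...parts) {
  const prefix = parts.map(escapeRegex).join("-") + "-";
  return { [field]: new RegExp("^" + prefix) };
}

const doc = buildFeedDoc("lucas", "dan", "love", "201109122304", "c7dede43…");
const q = prefixQuery("vid", "lucas", "love");
console.log(doc.vid, q.vid.test(doc.vid)); // lucas-love-201109122304 true

// In the mongo shell this filter would be used as:
//   db.user.feed.find(q).sort({ vid: -1 })
```

Passing only the owner to prefixQuery gives the broader /^lucas-/ filter shown on the slide.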

removing indexes pays off

Don't need to buy more or bigger machines!

padding factor
• Variable document size
• Allocate for the latest and fattest
• Document moves
• Can be very inefficient
• More RAM!
• Pre-allocate to prevent moves

unbounded embedded lists
• Useful for followers, favorites
• Good for a few things, bad for lots
• Constantly bumping up padding factor
• Lots of document moves

a metaphor
• You run a coffee shop and can buy only one size of cup. Which size do you buy?
• On average, each customer has only one cup
• Heavy drinkers have hundreds of cups

credit: Macintex macintex.deviantart.com

bucketing!
• Split list across multiple documents
• Median number of items = bucket size
• Pre-allocate
• Easy seeking and traversal
• Much faster
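
The bullets above can be sketched as follows. BUCKET_SIZE, bucketId, toBuckets and locate are hypothetical names, the bucket size of 4 is an arbitrary stand-in for the median list length, and the sketch assumes items are integer ids where 0 is never a real id, so 0 can serve as a same-width placeholder for pre-allocated slots.

```javascript
// Sketch only: split one unbounded list into fixed-size, pre-allocated buckets.
const BUCKET_SIZE = 4; // assumption: in practice, the median number of items

function bucketId(ownerId, n) {
  return `${ownerId}-${n}`;
}

function toBuckets(ownerId, items, size = BUCKET_SIZE) {
  const buckets = [];
  for (let start = 0; start < items.length; start += size) {
    const chunk = items.slice(start, start + size);
    // Pre-allocate: pad with 0 so every bucket document has `size` slots
    // from the start and later writes fill slots instead of growing it.
    const slots = chunk.concat(new Array(size - chunk.length).fill(0));
    buckets.push({
      _id: bucketId(ownerId, buckets.length + 1),
      count: chunk.length, // number of slots holding real data
      items: slots,
    });
  }
  return buckets;
}

// Seeking: item i of the logical list lives in a computable bucket and slot.
function locate(ownerId, index, size = BUCKET_SIZE) {
  return {
    _id: bucketId(ownerId, Math.floor(index / size) + 1),
    slot: index % size,
  };
}

const buckets = toBuckets("site2", [11, 12, 13, 14, 15, 16]);
console.log(buckets.length, buckets[1].items, locate("site2", 5));
// 2 [ 15, 16, 0, 0 ] { _id: 'site2-2', slot: 1 }
```

Only the last bucket of each list carries unused slots, so the wasted allocation per list is bounded by one bucket rather than by the size of the largest list.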

hey charts!
[Chart: allocated space per document for site.meta 1, site.meta 2, site.songs 1 and site.songs 2. Legend: allocated and unused; allocated and full of data.]

same charts when using bucketing
[Chart: allocated space per document for site.meta 1 and site.meta 2, with site.songs 1 split into buckets 1-1 and 1-2, and site.songs 2 split into buckets 2-1 through 2-6. Legend: allocated and unused; allocated and full of data.]

doesn’t work for everything…
• Picking right bucket size
• Defragging
• Random insertion
– Easy for things whose order you don't much care about
– More difficult if you're going to insert and change the order later

micro documents
db.site.songs.find({_id: /^bfc25de08d964a8a41226c6016dd7753-/}).sort({_id: -1})
{ "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029114", "s" : 18436532 }
{ "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029113", "s" : 18804590 }
{ "_id" : "bfc25de08d964a8a41226c6016dd7753-1337029112", "s" : 18804591 }
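
A rough sketch of the micro-document idea, using the values from the slide: each list entry becomes its own tiny document whose _id is the site hash plus a timestamp. toMicroDocs and newestFirst are hypothetical helpers, and newestFirst is only an in-memory stand-in for the find and sort shown above, not a MongoDB call. It assumes all timestamps have the same number of digits, so string order equals time order.

```javascript
// Sketch only: one tiny document per list entry instead of one growing list.
function toMicroDocs(siteHash, entries) {
  return entries.map(({ ts, songId }) => ({
    _id: `${siteHash}-${ts}`, // site hash + timestamp, fixed width
    s: songId,                // short field name keeps each document small
  }));
}

// In-memory stand-in for: find({_id: /^<hash>-/}).sort({_id: -1})
function newestFirst(docs, siteHash) {
  const prefix = siteHash + "-";
  return docs
    .filter((d) => d._id.startsWith(prefix))
    .sort((x, y) => (x._id < y._id ? 1 : x._id > y._id ? -1 : 0));
}

const site = "bfc25de08d964a8a41226c6016dd7753";
const docs = toMicroDocs(site, [
  { ts: 1337029112, songId: 18804591 },
  { ts: 1337029114, songId: 18436532 },
  { ts: 1337029113, songId: 18804590 },
]);
console.log(newestFirst(docs, site).map((d) => d.s));
// [ 18436532, 18804590, 18804591 ]
```

Since every document is the same small size and is never grown after insert, there is no list to pad and nothing to move.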

paying it back
• Bent mongoengine to make this easy
• Follow github.com/exfm
• Also added tooling to:
– Trace all queries
– Aggregate tracing by request middleware
– Raise exceptions when queries miss an index
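
The last bullet, raising an exception when a query misses an index, might look roughly like this. It is not the ex.fm tooling, which was built on mongoengine; MissedIndexError, hasCollScan and assertUsesIndex are hypothetical names. The function inspects an explain() result that has already been fetched and handles only two output shapes, the legacy top-level cursor string ("BasicCursor" for a full scan) and the newer queryPlanner.winningPlan stage tree ("COLLSCAN"). Explain output differs across server versions, so treat this as a starting point.

```javascript
// Sketch only: fail loudly when an explain() result shows a collection scan.
class MissedIndexError extends Error {}

// Walk a plan tree and report whether any stage is a full collection scan.
function hasCollScan(stage) {
  if (!stage || typeof stage !== "object") return false;
  if (stage.stage === "COLLSCAN") return true;
  const children = [stage.inputStage, ...(stage.inputStages || [])];
  return children.some(hasCollScan);
}

function assertUsesIndex(explain) {
  // Legacy shape: { cursor: "BasicCursor" } means no index was used.
  const legacyScan = explain.cursor === "BasicCursor";
  // Newer shape: a tree of stages under queryPlanner.winningPlan.
  const plan = explain.queryPlanner && explain.queryPlanner.winningPlan;
  if (legacyScan || hasCollScan(plan)) {
    throw new MissedIndexError("query did not use an index");
  }
}

// Usage with hand-written explain results:
assertUsesIndex({ cursor: "BtreeCursor _id_" }); // passes silently
try {
  assertUsesIndex({ cursor: "BasicCursor" });
} catch (err) {
  console.log(err instanceof MissedIndexError); // true
}
```

Running a check like this in development or in a test suite surfaces unindexed queries before they reach production data sizes.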

2. ex.fm turns websites into CDs

3. browser extensions

8. sites! sites! sites!

18. thanks! lucas@ex.fm github.com/exfm