Video and slides synchronized, mp3 and slide download available at http://bit.ly/16I9vsI.
Tony Tam shares tips for modeling data with MongoDB for a fast and scalable system based on his experience migrating billions of records from MySQL to MongoDB. Filmed at qconsf.com.
Tony Tam is the CEO and technical co-founder of Wordnik, where he leads development efforts of the site's innovative word navigation system. He is a member of the MongoDB Masters group and has lead the Swagger API Framework open-source initiative. He was a founder and SVP of Engineering at Think Passenger, Lead Engineer at Composite Software and a software development lead at Galileo Labs.
2. InfoQ.com: News & Community Site
⢠750,000 unique visitors/month
⢠Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
⢠Post content from our QCon conferences
⢠News 15-20 / week
⢠Articles 3-4 / week
⢠Presentations (videos) 12-15 / week
⢠Interviews 2-3 / week
⢠Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/data-modeling-mongodb
3. Presented at QCon San Francisco
www.qconsf.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
5. Why Modeling Matters!
â˘âŻNoSQL => no joins!
â˘âŻWhat replaces joins?!
â˘âŻ Hierarchy!
â˘âŻ Duplication of data!
â˘âŻ Different models for querying, indexing!
â˘âŻYour optimal data model is (probably) very
different than with relational!
â˘âŻ Simpler!
â˘âŻ More like you develop!
10. Hierarchy with NoSQL!
â˘âŻJSON structure mapped to objects!
â˘âŻ Fetch json from MongoDB**!
â˘âŻ Unmarshall into objects/tuples!
â˘âŻ Use it!
Using JSON4S
12. Hierarchy with NoSQL!
â˘âŻWrite operations!
â˘âŻ Atomic upsert (create, update or fail)!
!
â˘âŻ Saves all levels of object atomically!
â˘âŻ Reduces need for transactions!
13. Hierarchy with NoSQL!
â˘âŻWrite operations!
â˘âŻ Atomic upsert (create, update or fail)!
!
â˘âŻ Saves all levels of object atomically!
â˘âŻ Reduces need for transactions!
All or
Convenienc
not magic
14. Unique IdentiďŹers in your Data!
â˘âŻRelational design => PK/FK!
â˘âŻ Often not âmeaningfulâ identiďŹers for data!
â˘âŻUser Data Model!
15. Unique IdentiďŹers in your Data!
â˘âŻRelational design => PK/FK!
â˘âŻ Often not âmeaningfulâ identiďŹers for data!
â˘âŻUser Data Model!
Unique by
username
17. Data Duplication !
â˘âŻWithout Joins, what about SQL lookup
tables?!
â˘âŻ Duplication of data in NoSQL is required!
â˘âŻTrade storage for speed!
18. Data Duplication !
â˘âŻWithout Joins, what about SQL lookup
tables?!
â˘âŻ Duplication of data in NoSQL is required!
â˘âŻTrade storage for speed!
âŚCan move
logic to app
19. Data Duplication!
â˘âŻMany ďŹelds donât change, ever!
â˘âŻBut⌠many do!
â˘âŻ New decisions for the developer!!
â˘âŻ Often background updates!
20. Data Duplication!
â˘âŻMany ďŹelds donât change, ever!
â˘âŻBut⌠many do!
â˘âŻ New decisions for the developer!!
â˘âŻ Often background updates!
How often
does this
change?
23. Inner Indexes!
â˘âŻConvenience at a cost!
â˘âŻ No index => table scan!
â˘âŻ No value? => table scan!
â˘âŻ No child value? => table scan!
â˘âŻTable scan with big collection?!
â˘âŻCanât index everything!!
96GB of
Indexes?
24. Inner Indexes!
â˘âŻThis will should drive your Data Model!
â˘âŻSparse Data test!
Even with only
2000 non-empty
values!
25. Adding & Modifying!
â˘âŻAppend in mongo is blazing fast!
â˘âŻ âtailâ of data is always in memory!
â˘âŻ Pre-allocated data ďŹles!
â˘âŻMain expense is âindex maintenanceâ!
â˘âŻ Some marshalling/unmarshalling cost**!
â˘âŻModifying? Object growth!
â˘âŻ Pre-allocation of space built in collection design!
26. Adding & Modifying!
â˘âŻEach object has allocated space!
â˘âŻ Exceed that space, need to relocate object!
â˘âŻ Leaves âholeâ in collection!
â˘âŻLarge increases to documents hurts your
overall performance!
â˘âŻYour data model should strive for equally-
sized objects as much as possible!
27. Retrieval!
â˘âŻMany same rules apply as relational!
â˘âŻIndexes !
â˘âŻ complex/inner or not!
â˘âŻ Indexes in RAM? Yes!
â˘âŻ Cardinality matters!
â˘âŻNew(ish) considerations!
â˘âŻ Complex hierarchy not free!
â˘âŻ Marshalling Ăłďł unmarshalling!
29. Marshalling & Unmarshalling!
â˘âŻAll you can eat from your Data Model?!
â˘âŻTechniques have tremendous impact!
â˘âŻ Development ease until it matters!
â˘âŻ 50% speed bump with manual mapping!
Only demand
what you can
consume!
30. Making the most of _id!
â˘âŻIndexes matter!
â˘âŻTailor your _id to be meaningful by access
pattern!
â˘âŻ Itâs your ďŹrst defense when auto-sharding!
â˘âŻDate-driven data?!
â˘âŻ Monotonically _id value!
â˘âŻ Ensures recent data is âhotâ!
31. Making the most of _id!
â˘âŻOther time-based data techniques!
â˘âŻFlexibility in querying!
32. Making the most of _id!
â˘âŻOther time-based data techniques!
â˘âŻFlexibility in querying!
Case-
sensitive
REGEX is
33. Making the most of _id!
â˘âŻHot indexes are happy indexes!
â˘âŻ Access should strive for right bias!
â˘âŻRandom access with large indexes hit disk
17
15
34. Your Data Model!
â˘âŻNoSQL gets you started faster!
â˘âŻMany relational pain points are gone!
â˘âŻNew considerations (easier?)!
â˘âŻMigration should be real effort!
â˘âŻDesigned by access patterns over object
structure!
â˘âŻDonât prematurely optimize, but know
where the knobs are!