MongoDB Schema Design:
Insights and Tradeoffs

Montse Medina, COO

Saturday, May 5, 12
Social content is useful in context

Social context is useful in context

Algorithms + Infrastructure

Technology Stack
    Apache Kafka

Outline
    I. Schema design
        ‣    Relational vs. Document-oriented

        ‣    Schema-less design

        ‣    Case study: Publishers & Subscribers

    II. Lessons learned for schema design
    III. Things to remember about MongoDB
Relational vs. Document-oriented

Relational (two tables):
    Users               Graph
    id    name          from   to
    1     Robert        1      5
    2     Monica        1      20
    3     Lucas         2      1
    ...   ...           2      5
                        ...    ...

vs.

Document-oriented (one collection):
    Users
    { id: 1, name: "Robert", from: [2], to: [5, 20] }
    { id: 2, name: "Monica", from: [23], to: [1, 5] }
    ...

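To make the contrast concrete, here is a minimal in-memory sketch (plain JavaScript, no database) that folds the relational Graph rows above into per-user documents with embedded `from`/`to` lists. The helper `toUserDocs` is hypothetical; it only illustrates the change of shape, not how MongoDB stores data.

```javascript
// Relational side: a Users table and a Graph table of (from, to) edge rows.
const users = [
  { id: 1, name: "Robert" },
  { id: 2, name: "Monica" },
  { id: 3, name: "Lucas" },
];
const graph = [
  { from: 1, to: 5 },
  { from: 1, to: 20 },
  { from: 2, to: 1 },
  { from: 2, to: 5 },
];

// Hypothetical helper: embed each user's edges as lists (the document-oriented shape).
function toUserDocs(users, graph) {
  return users.map((u) => ({
    id: u.id,
    name: u.name,
    // incoming edges: rows whose "to" is this user
    from: graph.filter((e) => e.to === u.id).map((e) => e.from),
    // outgoing edges: rows whose "from" is this user
    to: graph.filter((e) => e.from === u.id).map((e) => e.to),
  }));
}

const docs = toUserDocs(users, graph);
console.log(JSON.stringify(docs[0]));
```

Once each user's relations travel with the user record, reading them no longer needs a join against the Graph table.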
Find all the “to” edges for user 5

Relational: scan the Graph table, whose rows are spread across disk blocks
    from   to
    1      5
    1      20
    2      1
    2      5
    3      4
    3      23
    3      12
    4      5
    ...    ...
    Potentially as many disk seeks as “to” edges!

vs.

Document-oriented: read one Users document
    { id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }
    1 disk seek guaranteed!

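A rough way to see the disk-seek argument: assume (hypothetically) that edge rows are appended to disk in arrival order and packed into fixed-size blocks. Collecting one user's edges then touches one block per scattered row, while the embedded document is read as a single unit. The block size, the `blocksTouched` helper and the edge log are invented for illustration.

```javascript
// Hypothetical storage model: rows are appended in arrival order,
// ROWS_PER_BLOCK rows per disk block.
const ROWS_PER_BLOCK = 2;

// Edges arrive interleaved with other users' edges over time.
const edgeLog = [
  { from: 5, to: 1 }, { from: 1, to: 5 },
  { from: 2, to: 5 }, { from: 5, to: 20 },
  { from: 3, to: 4 }, { from: 3, to: 23 },
  { from: 5, to: 3 }, { from: 4, to: 5 },
  { from: 3, to: 12 }, { from: 5, to: 7 },
  { from: 1, to: 20 }, { from: 5, to: 2 },
];

// Count distinct blocks that hold at least one "to" edge of `uid`.
function blocksTouched(log, uid) {
  const blocks = new Set();
  log.forEach((edge, position) => {
    if (edge.from === uid) blocks.add(Math.floor(position / ROWS_PER_BLOCK));
  });
  return blocks.size;
}

// Document-oriented: the whole "to" list lives in one user document,
// so (for a document smaller than a block) one read fetches it all.
const user5 = { id: 5, from: [1, 2, 4], to: [1, 20, 3, 7, 2] };

console.log("relational blocks:", blocksTouched(edgeLog, 5));
console.log("document reads:", 1, "edges:", user5.to.length);
```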
Advantages of a document-oriented schema
    • Avoid joins
    • Disk locality when fetching relations (everything is stored within one document)

Considerations for schema design
    • N-to-many relations become lists (arrays) inside the document
    • Denormalization is more common

Schema-less design
    {id: 1, network: "Twitter", name: "Robert",
     from: [2], to: [5, 20], screenName: "robertE"}

    {id: 2, network: "Facebook", name: "Maria",
     from: [23], to: [1, 5], likes: ["biking", "hiking"]}
    ...

    Leverage the schema-less nature of Mongo, but enforce types in your code!

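One way to follow the "types in your code" advice in plain JavaScript is to check each raw document against a per-network shape when it is read. This is a minimal sketch: the field specs and the `parseUser` helper are hypothetical, derived only from the two example documents above.

```javascript
// Hypothetical field specs: fields every user has, plus per-network extras.
const COMMON = { id: "number", network: "string", name: "string" };
const EXTRA = {
  Twitter: { screenName: "string" },
  Facebook: { likes: "array" },
};

function typeOf(value) {
  return Array.isArray(value) ? "array" : typeof value;
}

// Check a raw, schema-less document; return it or throw a descriptive error.
function parseUser(doc) {
  const extra = EXTRA[doc.network];
  if (!extra) throw new Error(`unknown network: ${doc.network}`);
  const spec = { ...COMMON, from: "array", to: "array", ...extra };
  for (const [field, expected] of Object.entries(spec)) {
    if (typeOf(doc[field]) !== expected) {
      throw new Error(`${doc.network} user ${doc.id}: "${field}" should be ${expected}`);
    }
  }
  return doc;
}

const robert = parseUser({ id: 1, network: "Twitter", name: "Robert",
  from: [2], to: [5, 20], screenName: "robertE" });
const maria = parseUser({ id: 2, network: "Facebook", name: "Maria",
  from: [23], to: [1, 5], likes: ["biking", "hiking"] });
```

The collection stays free to hold both shapes; the application, not the database, decides what a valid Twitter or Facebook user looks like.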
Read-Friendly
    Case Study: Publishers & Subscribers

Read-Friendly Approach
    (Diagram: the post “Hi!” is stored once for each recipient.)

    Post:
    { _id: postId,
      owner: ownerId,
      recipient: recipientId,
      text: "message", ... }

Read-Friendly Approach
    db.posts.find({recipient: uid})
    Sharding key: recipient

    Pros: fast retrieval, easy sharding
    Cons: slow writes, enormous amount of storage

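The read-friendly trade-off can be sketched with an in-memory array standing in for `db.posts` (no MongoDB involved): every publish writes one document per recipient, which is where the write and storage cost comes from, and a feed read is a single equality filter on `recipient`. The `publish`/`feed` helpers and subscriber lists are hypothetical.

```javascript
// In-memory stand-in for the posts collection.
const posts = [];
let nextId = 1;

// Read-friendly publish: write one copy of the post per recipient.
function publish(ownerId, subscriberIds, text) {
  for (const recipient of subscriberIds) {
    posts.push({ _id: nextId++, owner: ownerId, recipient, text });
  }
}

// Read: the equivalent of db.posts.find({recipient: uid}),
// which a shard key on `recipient` could route to a single shard.
function feed(uid) {
  return posts.filter((p) => p.recipient === uid);
}

publish("u1", ["u2", "u3", "u5"], "Hi!");
publish("u2", ["u5"], "Hello");
console.log(posts.length, feed("u5").map((p) => p.text));
```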
Write-Friendly
    Case Study: Publishers & Subscribers

Write-Friendly Approach
    (Diagram: the post “Hi!” is stored only once.)

    Post:
    { _id: postId,
      owner: ownerId,
      text: "message", ... }

Write-Friendly Approach
    db.posts.find({owner: {$in: user.from}})
    Sharding key: ?

    Pros: fast writes, slim storage
    Cons: slow reads, harder queries

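The write-friendly variant, sketched the same way: a publish is a single insert, and the cost moves to the read, which must match posts from every user the reader follows. The `sources` map is a hypothetical stand-in for the list the slide reads from `user.from`.

```javascript
// In-memory stand-in for db.posts: one document per post, no recipient field.
const posts = [];
let nextId = 1;

// Hypothetical subscription lists: for each reader, the owners whose posts
// they receive (the role user.from plays in the slide's query).
const sources = { u5: ["u1", "u2"], u3: ["u1"] };

// Write-friendly publish: a single insert, regardless of audience size.
function publish(ownerId, text) {
  posts.push({ _id: nextId++, owner: ownerId, text });
}

// Read: the equivalent of db.posts.find({owner: {$in: user.from}}).
// Matches can belong to many different owners, so sharding on `owner`
// would still spread one feed query across many shards.
function feed(uid) {
  const owners = new Set(sources[uid] || []);
  return posts.filter((p) => owners.has(p.owner));
}

publish("u1", "Hi!");
publish("u2", "Hello");
publish("u4", "Not for u5");
```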
Hybrid Approach
    Case Study: Publishers & Subscribers

Hybrid Approach
    (Diagram: the post “Hi!” is stored once, together with its list of recipients.)

    Post:
    { _id: postId,
      owner: ownerId,
      recipients: [u1, u2, u3, u5],
      text: "message", ... }

Hybrid Approach
    db.posts.find({recipients: uId})
    Sharding key: random :)

    Pros: fast writes, slim storage, reasonable read speed

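A sketch of the hybrid shape under the same in-memory assumptions: one document per post, carrying its `recipients` array. In MongoDB an equality match such as `{recipients: uId}` matches any document whose array contains `uId`, which `includes` mimics here. The helper names are hypothetical.

```javascript
const posts = [];
let nextId = 1;

// Hybrid publish: one document per post, carrying the recipient list.
function publish(ownerId, subscriberIds, text) {
  posts.push({ _id: nextId++, owner: ownerId, recipients: [...subscriberIds], text });
}

// Read: the equivalent of db.posts.find({recipients: uId}); an equality
// match on an array field matches documents whose array contains uId.
function feed(uid) {
  return posts.filter((p) => p.recipients.includes(uid));
}

publish("u0", ["u1", "u2", "u3", "u5"], "Hi!");
publish("u2", ["u5"], "Hello");
```

Storage grows with the number of posts (plus one id per recipient) rather than with posts times recipients, and reads stay a single indexed equality match.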
Random sharding is not random!
    (Diagram of three shard layouts:)
    • Best: impossible for our data
    • Worse
    • Optimal solution

    Minimize the number of disk seeks per shard!

Outline
    I. Schema design
    II. Lessons learned for schema design
        ‣    Indexes

        ‣    Concurrency

        ‣    Reducing collection size

    III. Things to remember about MongoDB
Indexes
    Primary Key

    link: {
      _id: ObjectId(...),
      url: "www.jetlore.com",
      title: "Jetlore is a search platform for social content",
      description: "..."
    }

    link: {
      _id: "www.jetlore.com",
      title: "Jetlore is a search platform for social content",
      description: "..."
    }

    If your data has a natural PK, use it instead of the default ObjectId.

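One plausible reading of the natural-PK advice, sketched with in-memory maps (hypothetical helpers, no MongoDB API): with generated ids, looking a link up by URL needs a second index and the same URL can be stored twice; with the URL as `_id`, re-saving hits the same record and the primary-key index answers URL lookups on its own.

```javascript
// Generated keys: each save gets a fresh id, so the same URL can be stored twice,
// and finding a link by URL needs a separate (secondary) index.
let counter = 0;
const byGeneratedId = new Map();
const urlIndex = new Map(); // hypothetical secondary index: url -> list of ids
function saveWithGeneratedId(link) {
  const id = ++counter;
  byGeneratedId.set(id, link);
  urlIndex.set(link.url, [...(urlIndex.get(link.url) || []), id]);
}

// Natural key: the URL itself is the primary key, so saving again overwrites
// the same record, and the primary-key index answers lookups by URL.
const byUrl = new Map();
function saveWithNaturalKey(link) {
  byUrl.set(link.url, { _id: link.url, title: link.title, description: link.description });
}

const link = {
  url: "www.jetlore.com",
  title: "Jetlore is a search platform for social content",
  description: "...",
};
saveWithGeneratedId(link);
saveWithGeneratedId(link);
saveWithNaturalKey(link);
saveWithNaturalKey(link);
```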
Indexes
    Augment your schema to enable the most selective index

    Want: all posts that a user can view, sorted by the number of likes

    post: {
      _id: ObjectId(...),
      recipients: [...],
      likes: [...],
      likesCount: ...,
      ... }

    Add a new “likesCount” field!
    db.posts.ensureIndex({recipients: 1, likesCount: -1})

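A minimal in-memory sketch of the denormalized counter: `likesCount` is kept in step with the `likes` list so that "posts a user can view, most-liked first" becomes a filter plus a sort on a stored field. In MongoDB one common way to keep the two in step is a single update guarded by `{likes: {$ne: uid}}` that applies `$push` and `$inc` together; the JavaScript below only mimics that logic, and `like`/`mostLiked` are hypothetical.

```javascript
const posts = [
  { _id: 1, recipients: ["u1", "u2"], likes: [], likesCount: 0 },
  { _id: 2, recipients: ["u1"], likes: [], likesCount: 0 },
  { _id: 3, recipients: ["u2"], likes: [], likesCount: 0 },
];

// Like a post: add the user to `likes` once and keep `likesCount` in step.
function like(postId, uid) {
  const post = posts.find((p) => p._id === postId);
  if (!post.likes.includes(uid)) {
    post.likes.push(uid);
    post.likesCount += 1;
  }
}

// Posts a user can view, most-liked first: the query shape that an
// index on {recipients: 1, likesCount: -1} is meant to serve.
function mostLiked(uid) {
  return posts
    .filter((p) => p.recipients.includes(uid))
    .sort((a, b) => b.likesCount - a.likesCount)
    .map((p) => p._id);
}

like(2, "u1");
like(2, "u3");
like(1, "u2");
like(2, "u1"); // duplicate like is ignored
```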
Indexes
    Make sure to use the proper index

    db.posts.find({recipients: uId}).sort({date: -1})

    db.posts.ensureIndex({recipients: 1})
    db.posts.ensureIndex({date: 1})

    vs.

    db.posts.ensureIndex({recipients: 1, date: -1})

    Two separate single-field indexes cannot serve both the filter and the sort; the compound index can.
    Always test with explain()!

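Why the compound index helps, as a simplified model (not MongoDB's real B-tree): represent `{recipients: 1, date: -1}` as a sorted list with one entry per (recipient, post) pair. All entries for one recipient then form a contiguous run that is already in date-descending order, so the query needs no separate sort step. `buildIndex` and `feedFromIndex` are hypothetical.

```javascript
const posts = [
  { _id: 1, recipients: ["u1", "u2"], date: 10 },
  { _id: 2, recipients: ["u1"], date: 30 },
  { _id: 3, recipients: ["u2"], date: 20 },
  { _id: 4, recipients: ["u1"], date: 20 },
];

// Model of a multikey compound index {recipients: 1, date: -1}: one entry per
// (recipient, post) pair, ordered by recipient ascending, then date descending.
function buildIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    for (const r of doc.recipients) entries.push({ recipient: r, date: doc.date, id: doc._id });
  }
  return entries.sort((a, b) =>
    a.recipient < b.recipient ? -1 : a.recipient > b.recipient ? 1 : b.date - a.date);
}

// find({recipients: uid}).sort({date: -1}): locate the start of the run
// (a real index would use a tree lookup; findIndex stands in for it),
// then read entries in order until the recipient changes.
function feedFromIndex(index, uid) {
  const start = index.findIndex((e) => e.recipient === uid);
  if (start === -1) return [];
  const ids = [];
  for (let i = start; i < index.length && index[i].recipient === uid; i++) ids.push(index[i].id);
  return ids;
}

const index = buildIndex(posts);
```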
Concurrency
    Try to avoid save() in drivers

    thread1: { _id: u1, name: "Robert", from: [u2, u3] }
    thread2: { _id: u1, name: "Bob", from: [] }

    Each thread changed a different field, so update only that field:
    db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})
    db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})

    …but save() sends the whole document as an upsert, so the last writer overwrites the other thread's change:
    db.users.update({_id: u1}, {_id: u1, name: ..., from: ...}, true, false)

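The lost update can be reproduced with a simulated database (a `Map`, no driver). `save` and `setFields` are hypothetical stand-ins for a whole-document save and a `$set` update. Both threads read the same document, each changes a different field, and only the field-level version keeps both changes.

```javascript
// Simulated database: a Map from _id to a stored document.
const db = new Map();
const clone = (doc) => JSON.parse(JSON.stringify(doc));

// Whole-document save: replaces every field.
function save(doc) { db.set(doc._id, clone(doc)); }

// Field-level update: touches only the named fields, like {$set: {...}}.
function setFields(id, fields) { db.set(id, { ...db.get(id), ...clone(fields) }); }

// Two threads read the same document, then each changes a different field.
function race(write) {
  db.set("u1", { _id: "u1", name: "Robert", from: [] });
  const thread1 = clone(db.get("u1"));
  const thread2 = clone(db.get("u1"));
  thread1.from = ["u2", "u3"];
  thread2.name = "Bob";
  write(thread1, thread2);
  return db.get("u1");
}

// thread2's save writes back its stale `from: []`, erasing thread1's change.
const withSave = race((t1, t2) => { save(t1); save(t2); });
// Each thread sends only what it changed, so both changes survive.
const withSet = race((t1, t2) => {
  setFields(t1._id, { from: t1.from });
  setFields(t2._id, { name: t2.name });
});
```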
Concurrency
    Atomic commutative operators

    db.users.update({_id: u1}, {$pull: {to: u2}})
    db.posts.update({_id: pId}, {$inc: {likesCount: 1}})

    When updating lists and counters, rely on $inc, $addToSet and $pull instead of $set.

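Why commutative operators are safer, in plain JavaScript (a simulation, not driver code): with `$set`, each client writes a value computed from an earlier read, so concurrent likes overwrite each other; with `$inc`, each client sends a delta applied to whatever is stored, so no increment is lost. Likewise `$addToSet` and `$pull` on different elements give the same list in either order. The function names are hypothetical.

```javascript
// Two clients each "like" a post that currently has likesCount = 0.

// With $set, each client computes the new value from what it read earlier.
function withSet() {
  const doc = { likesCount: 0 };
  const readByA = doc.likesCount; // both clients read 0
  const readByB = doc.likesCount;
  doc.likesCount = readByA + 1;   // {$set: {likesCount: 1}} from A
  doc.likesCount = readByB + 1;   // {$set: {likesCount: 1}} from B: A's like is lost
  return doc.likesCount;
}

// With $inc, the server adds a delta to the value stored right now.
function withInc() {
  const doc = { likesCount: 0 };
  doc.likesCount += 1;            // {$inc: {likesCount: 1}} from A
  doc.likesCount += 1;            // {$inc: {likesCount: 1}} from B
  return doc.likesCount;
}

// $addToSet and $pull that touch different elements commute.
function applyListOps(ops) {
  let to = ["u2", "u3"];
  for (const [op, value] of ops) {
    if (op === "$addToSet" && !to.includes(value)) to.push(value);
    if (op === "$pull") to = to.filter((x) => x !== value);
  }
  return [...to].sort();
}

const forward = applyListOps([["$addToSet", "u4"], ["$pull", "u2"]]);
const reverse = applyListOps([["$pull", "u2"], ["$addToSet", "u4"]]);
```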
Concurrency
    No transactions

    user1: { _id: u1,
             to: [u2, u3],
             from: [...], ... }

    user2: { _id: u2,
             to: [...],
             from: [u1, ...], ... }

    User1 wants to unsubscribe from user2. Ideally we would update both users in one transaction.
    Implement it in your code!

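A minimal sketch of "implement it in your code", assuming the unsubscribe is done as two single-document updates: each step is an idempotent `$pull`-style change (safe to retry), and a hypothetical repair pass fixes the half-finished state a crash between the two writes can leave behind. The in-memory `users` object and all helper names are illustrative only.

```javascript
// Simulated users collection.
const users = {
  u1: { _id: "u1", to: ["u2", "u3"], from: [] },
  u2: { _id: "u2", to: [], from: ["u1"] },
};

// $pull-like helper: idempotent, so retrying after a failure is safe.
function pull(userId, field, value) {
  users[userId][field] = users[userId][field].filter((x) => x !== value);
}

// Unsubscribe without a transaction: two separate single-document updates.
// If the process dies between them, the edge is left half-removed.
function unsubscribe(follower, followee, { crashAfterFirst = false } = {}) {
  pull(follower, "to", followee);
  if (crashAfterFirst) return;
  pull(followee, "from", follower);
}

// Hypothetical repair pass: drop any "from" entry whose matching "to" entry is gone.
function repair() {
  for (const user of Object.values(users)) {
    user.from = user.from.filter((f) => users[f] && users[f].to.includes(user._id));
  }
}

unsubscribe("u1", "u2", { crashAfterFirst: true });
const halfDone = users.u2.from.includes("u1"); // inconsistent until repaired
repair();
```

Writing the follower's side first is a deliberate choice in this sketch: a crash then leaves only a stale `from` entry, which the repair pass can detect from the other side.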
Reducing collection size
    Name your fields with short names!

    post: {
      owner: ObjectId,
      messageText: "loving Jetlore",
      mediaUrl: "www.jetlore.com",
      mediaTitle: "Jetlore is a user analytics & search platform for social content"
    }

    vs.

    post: {
      o: ObjectId,
      t: "loving Jetlore",
      mu: "www.jetlore.com",
      mt: "Jetlore is a user analytics & search platform for social content"
    }

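Because MongoDB stores field names inside every BSON document (there is no shared schema to hold them), name length is paid once per document. A quick sketch of the per-document difference for the two post shapes above; the owner id string is a made-up placeholder.

```javascript
const ownerId = "4fa4c7e0a1b2c3d4e5f60718"; // placeholder 24-hex-char ObjectId string

const longNames = {
  owner: ownerId,
  messageText: "loving Jetlore",
  mediaUrl: "www.jetlore.com",
  mediaTitle: "Jetlore is a user analytics & search platform for social content",
};
const shortNames = {
  o: ownerId,
  t: "loving Jetlore",
  mu: "www.jetlore.com",
  mt: "Jetlore is a user analytics & search platform for social content",
};

// Sum of field-name lengths (ASCII, so characters = bytes): the overhead
// that shorter names remove from every stored document.
const nameBytes = (doc) => Object.keys(doc).reduce((sum, k) => sum + k.length, 0);

const saved = nameBytes(longNames) - nameBytes(shortNames);
console.log(`bytes saved per post: ${saved}`);
```

The saving is small per post but multiplies by the number of documents, and it also shrinks the working set that has to fit in memory.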
Outline
I. Schema design
II. Lessons learned for schema design
III. Things to remember about MongoDB
     ‣   Single lock

     ‣   ($or + sort) query doesn’t use indexes properly

     ‣   Indexes with 2 list fields

     ‣   Record iterators + update
$or & sort query doesn’t use the proper index
    db.posts.find({$or: [{recipients: uId}, {privacy: "Public"}]}).sort({date: -1})

    db.posts.ensureIndex({recipients: 1, date: -1})
    db.posts.ensureIndex({privacy: 1, date: -1})

Indexes with 2 list fields
    post: { _id: ObjectId(...),
            recipients: [...],
            links: [...],
            ... }

    db.posts.ensureIndex({recipients: 1, links: 1})

    A compound index can index at most one array field per document: building it over posts where both recipients and links are arrays fails with "cannot index parallel arrays".

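One common workaround for the `$or` + sort case (a sketch, not something the slide prescribes): run each branch as its own query, each sorted by `date: -1` and limited so it can use its own compound index, then merge the two sorted lists in the application and drop posts that match both branches. The arrays below stand in for the two query results; `mergeByDateDesc` is hypothetical.

```javascript
// Results of the two separate queries, each already sorted by date descending
// (one served by {recipients: 1, date: -1}, the other by {privacy: 1, date: -1}).
const toUser = [
  { _id: 7, date: 50 }, { _id: 4, date: 30 }, { _id: 1, date: 10 },
];
const publicPosts = [
  { _id: 8, date: 40 }, { _id: 4, date: 30 }, { _id: 2, date: 20 },
];

// Merge two date-descending lists, skipping posts already taken from the other list.
function mergeByDateDesc(a, b, limit) {
  const out = [];
  const seen = new Set();
  let i = 0;
  let j = 0;
  while (out.length < limit && (i < a.length || j < b.length)) {
    const takeA = j >= b.length || (i < a.length && a[i].date >= b[j].date);
    const next = takeA ? a[i++] : b[j++];
    if (!seen.has(next._id)) {
      seen.add(next._id);
      out.push(next);
    }
  }
  return out;
}

const page = mergeByDateDesc(toUser, publicPosts, 4);
```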
Record iterators + updating
    Updating documents while a cursor iterates over the same collection can make it skip or revisit documents:

      // n, t: paging offset and batch size; newText: the new value (placeholders)
      var posts = db.posts.find().skip(n).limit(t)
      while (posts.hasNext()) {
        var post = posts.next()
        db.posts.update({_id: post._id}, {$set: {text: newText}})
      }

    Sort by a field that will not change, or rename the old collection:

      var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)

      db.posts.renameCollection("oldPosts")
      var posts = db.oldPosts.find().skip(n).limit(t)
      while (posts.hasNext()) {
        var post = posts.next()
        post.text = newText
        // "posts" no longer exists after the rename, so write the modified copy into a fresh "posts"
        db.posts.insert(post)
      }

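A simplified model of the problem, assuming (as with the storage engine of that era) that a document which grows past its allocated space is moved after the others. A scan in storage order that updates as it goes then skips one document and visits another twice; fixing the work list up front, as reading from the renamed copy does, visits each document once. The storage array and helpers are hypothetical.

```javascript
const makeStorage = () => [
  { _id: 1, text: "a" },
  { _id: 2, text: "b" },
  { _id: 3, text: "c" },
];

// An update that makes a document bigger relocates it to the end.
function updateText(storage, id, newText) {
  const pos = storage.findIndex((d) => d._id === id);
  const doc = { ...storage[pos], text: newText };
  if (newText.length > storage[pos].text.length) {
    storage.splice(pos, 1); // grown document no longer fits in place
    storage.push(doc);      // so it is moved after the others
  } else {
    storage[pos] = doc;
  }
}

// Naive scan in storage order while updating: moved documents are seen again.
function naiveRewrite(storage) {
  const visited = [];
  for (let pos = 0; pos < storage.length; pos++) {
    const doc = storage[pos];
    visited.push(doc._id);
    updateText(storage, doc._id, doc.text + "!");
  }
  return visited;
}

// Safer: fix the work list first (like reading "oldPosts"), then update.
function snapshotRewrite(storage) {
  const ids = storage.map((d) => d._id);
  for (const id of ids) {
    const doc = storage.find((d) => d._id === id);
    updateText(storage, id, doc.text + "!");
  }
  return ids;
}

const naive = naiveRewrite(makeStorage());
const store = makeStorage();
const safe = snapshotRewrite(store);
```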
The takeaways
    I. What is more important?
        • Writes: optimize for easy inserts/updates
        • Reads: optimize for easy querying
    II. Denormalize to enable the most selective index
    III. Concurrency: design your schema to leverage commutative operators

Thank you!
    Try our tech
    powered by

ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)

  • 1. MongoDB Schema Design: Insights and Tradeoffs. Montse Medina, COO, Jetlore. Saturday, May 5, 12
  • 2. Social content is useful in context
  • 3. Social context is useful in context
  • 4. Algorithms + Infrastructure
  • 5. Technology Stack: Apache Kafka
  • 6. Outline
      I. Schema design
         ‣ Relational vs. Document-oriented
         ‣ Schema-less design
         ‣ Case study: Publishers & Subscribers
      II. Lessons learned for schema design
      III. Things to remember about MongoDB
  • 8. Relational vs. Document-oriented
      Relational:
        Users              Graph
        id  name           from  to
        1   Robert         1     5
        2   Monica         1     20
        3   Lucas          2     1
        ... ...            2     5
                           ...   ...
      vs. Document-oriented (Users collection):
        { id: 1, name: "Robert", from: [2], to: [5, 20] }
        { id: 2, name: "Monica", from: [23], to: [1, 5] }
        ...
  • 9. Find all the "to" edges for user 5
      Relational (Graph table, rows spread across disk blocks):
        from  to
        1     5
        1     20
        2     1
        2     5
        3     4
        3     23
        3     12
        4     5
        ...   ...
        Potentially as many disk seeks as "to" edges!
      vs. Document-oriented (Users collection):
        { id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }
        1 disk seek guaranteed!
  • 10. Advantages of a document-oriented schema
         ‣ Avoid joins
         ‣ Disk locality when fetching relations (everything is stored within one document)
      Considerations for schema design
         ‣ N-to-many relations == lists
         ‣ Denormalization is more common
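To make "N-to-many relations == lists" concrete, here is a hypothetical in-memory sketch in plain JavaScript (no MongoDB calls). It folds the Users and Graph rows shown on slide 8 into user documents, so each user's edges are stored with the user and no join is needed on read. The function name toDocuments is made up for this illustration; edges pointing at users not in the table (5, 20) only update the side that exists.

```javascript
// Relational layout from slide 8: a Users table and a Graph (edge) table.
const usersTable = [
  { id: 1, name: "Robert" },
  { id: 2, name: "Monica" },
  { id: 3, name: "Lucas" },
];
const graphTable = [
  { from: 1, to: 5 },
  { from: 1, to: 20 },
  { from: 2, to: 1 },
  { from: 2, to: 5 },
];

// Fold edge rows into per-user lists so each user document carries its relations.
function toDocuments(users, edges) {
  const docs = new Map(users.map(u => [u.id, { id: u.id, name: u.name, from: [], to: [] }]));
  for (const { from, to } of edges) {
    if (docs.has(from)) docs.get(from).to.push(to);   // outgoing edge
    if (docs.has(to)) docs.get(to).from.push(from);   // incoming edge
  }
  return [...docs.values()];
}

const userDocs = toDocuments(usersTable, graphTable);
console.log(JSON.stringify(userDocs[0])); // {"id":1,"name":"Robert","from":[2],"to":[5,20]}
```

The result for user 1 matches the document on slide 8; reading it back is one document fetch instead of a join over the edge table.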
  • 12. Schema-less design
      { id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE" }
      { id: 2, network: "Facebook", name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] }
      ...
      Leverage the schemaless nature of Mongo, but put in protection with types in your code!
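One way to read "put in protection with types in your code" is a small validator run before every write. The sketch below is hypothetical, not Jetlore's code: the per-network field specs, typeOf and validateUser are assumptions made for illustration, checked against the two example documents on the slide. Each network may add its own fields, but every field must have the expected type.

```javascript
// Hypothetical type specs: fields every user document must have, plus the
// extra fields each network is allowed to add, with the expected JS type.
const commonFields = { id: "number", network: "string", name: "string", from: "array", to: "array" };
const networkFields = new Map([
  ["Twitter", { screenName: "string" }],
  ["Facebook", { likes: "array" }],
]);

const typeOf = value => (Array.isArray(value) ? "array" : typeof value);

// Returns a list of problems; an empty list means the document passes.
function validateUser(doc) {
  const extra = networkFields.get(doc.network);
  if (!extra) return [`unknown network: ${doc.network}`];
  const spec = { ...commonFields, ...extra };
  const problems = [];
  for (const [field, expected] of Object.entries(spec)) {
    if (typeOf(doc[field]) !== expected) problems.push(`${field} should be ${expected}`);
  }
  const allowed = new Set(Object.keys(spec));
  for (const field of Object.keys(doc)) {
    if (!allowed.has(field)) problems.push(`unexpected field: ${field}`);
  }
  return problems;
}

const robert = { id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE" };
const maria = { id: 2, network: "Facebook", name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] };
console.log(validateUser(robert).length, validateUser(maria).length); // 0 0
console.log(validateUser({ ...maria, likes: "biking" })); // [ 'likes should be array' ]
```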
  • 14. Case Study: Publishers & Subscribers (Read-Friendly)
  • 15. Read-Friendly Approach
      (Diagram: the post "Hi!" is stored once per recipient: Hi! Hi! Hi!)
      Post: { _id: postId, owner: ownerId, recipient: recipientId, text: "message", ... }
  • 16. Read-Friendly Approach
      db.posts.find({recipient: uid})
      Sharding key: recipient
      Pros: fast retrieval, easy sharding
      Cons: slow writes, enormous amount of storage
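A minimal in-memory sketch of the read-friendly layout, assuming a plain array stands in for the posts collection: publishing fans the message out into one document per subscriber, and the feed query is a single equality match on recipient. The helper names publishReadFriendly and feedFor are hypothetical.

```javascript
// In-memory stand-in for the posts collection (hypothetical; no MongoDB involved).
const posts = [];
let nextId = 1;

// Read-friendly write path: one copy of the post per subscriber,
// each copy carrying a single `recipient`.
function publishReadFriendly(ownerId, subscriberIds, text) {
  for (const recipient of subscriberIds) {
    posts.push({ _id: nextId++, owner: ownerId, recipient, text });
  }
}

// Read path mirrors db.posts.find({recipient: uid}): one equality match on
// the same field the slide uses as the sharding key.
function feedFor(uid) {
  return posts.filter(p => p.recipient === uid);
}

publishReadFriendly("u1", ["u2", "u3", "u5"], "Hi!");
console.log(posts.length);         // 3 documents stored for one message
console.log(feedFor("u3").length); // 1
```

Storage grows with the number of messages times the number of subscribers, which is the "enormous amount of storage" cost listed on the slide.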
  • 17. Case Study: Publishers & Subscribers (Write-Friendly)
  • 18. Write-Friendly Approach
      (Diagram: the post "Hi!" is stored only once.)
      Post: { _id: postId, owner: oId, text: "message", ... }
  • 19. Write-Friendly Approach
      db.posts.find({owner: {$in: user.from}})
      Sharding key: ?
      Pros: fast writes, slim storage
      Cons: slow reads, harder queries
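The same kind of in-memory sketch for the write-friendly layout. It assumes, as the slide's query does, that user.from lists the publishers whose posts a user should see; the users map and the helper names are hypothetical.

```javascript
// Hypothetical in-memory users and posts collections.
// Assumption (following the slide's query): user.from lists the publishers
// whose posts this user should see.
const users = new Map([["u5", { _id: "u5", from: ["u1", "u2"] }]]);
const posts = [];
let nextId = 1;

// Write-friendly write path: exactly one document per message.
function publishWriteFriendly(ownerId, text) {
  posts.push({ _id: nextId++, owner: ownerId, text });
}

// Read path mirrors db.posts.find({owner: {$in: user.from}}): load the
// reader's list first, then match it against the owner of every post.
function feedFor(uid) {
  const publishers = new Set(users.get(uid).from);
  return posts.filter(p => publishers.has(p.owner));
}

publishWriteFriendly("u1", "Hi!");
publishWriteFriendly("u2", "Hello");
publishWriteFriendly("u9", "Not for u5");
console.log(posts.length);         // 3: one document per message
console.log(feedFor("u5").length); // 2
```

Each read needs the user's list first and then matches posts from many owners, which is presumably why the slide leaves the sharding key as "?": no single key keeps one reader's feed on one shard.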
  • 20. Case Study: Publishers & Subscribers (Hybrid)
  • 21. Hybrid Approach
      (Diagram: the post "Hi!" is stored once, with its list of recipients.)
      Post: { _id: postId, owner: ownerId, recipients: [u1, u2, u3, u5], text: "message", ... }
  • 22. Hybrid Approach
      db.posts.find({recipients: uId})
      Sharding key: random :)
      Pros: fast writes, slim storage, reasonable read speed
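A sketch of the hybrid layout under the same assumptions (in-memory array, hypothetical helper names): one document per message, with the subscriber list copied into recipients at write time. In MongoDB, a query such as {recipients: uId} matches when any array element equals uId, and an index on recipients is a multikey index with one entry per element.

```javascript
const posts = [];
let nextId = 1;

// Hybrid write path: one document per message, with the subscriber list
// copied into a `recipients` array at write time.
function publishHybrid(ownerId, subscriberIds, text) {
  posts.push({ _id: nextId++, owner: ownerId, recipients: [...subscriberIds], text });
}

// Read path mirrors db.posts.find({recipients: uId}): match if any array
// element equals uId.
function feedFor(uid) {
  return posts.filter(p => p.recipients.includes(uid));
}

publishHybrid("ownerA", ["u1", "u2", "u3", "u5"], "Hi!");
console.log(posts.length);         // 1
console.log(feedFor("u5").length); // 1
console.log(feedFor("u4").length); // 0
```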
  • 23. Random sharding is not random!
      (Diagram comparing how a user's posts spread over shards: "Best: impossible for our data", "Worse", "Optimal solution".)
      Minimize the number of disk seeks per shard!
  • 24. Outline
      I. Schema design
      II. Lessons learned for schema design
         ‣ Indexes
         ‣ Concurrency
         ‣ Reducing collection size
      III. Things to remember about MongoDB
  • 26. Indexes: Primary Key
      link: { _id: ObjectId(...), url: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
      If your data has a natural PK, use it instead of the default ObjectId:
      link: { _id: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
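To illustrate the natural-key advice, here is a hypothetical in-memory model in which a Map keyed by _id plays the role of the unique _id index; fakeObjectId is a stand-in for ObjectId(), and the Collection class is made up for this sketch. With a generated key the same URL can be stored twice, and lookups by URL would need a second index on url; with the URL as _id, one index does both jobs.

```javascript
// Hypothetical collection: a Map keyed by _id mimics the unique _id index.
class Collection {
  constructor() { this.byId = new Map(); }
  insert(doc) {
    if (this.byId.has(doc._id)) throw new Error(`duplicate key: ${doc._id}`);
    this.byId.set(doc._id, doc);
  }
  findById(id) { return this.byId.get(id); }
}

let counter = 0;
const fakeObjectId = () => `oid-${++counter}`; // stand-in for ObjectId()

// Generated _id: nothing stops the same URL from being stored twice.
const withObjectId = new Collection();
withObjectId.insert({ _id: fakeObjectId(), url: "www.jetlore.com" });
withObjectId.insert({ _id: fakeObjectId(), url: "www.jetlore.com" });
console.log(withObjectId.byId.size); // 2

// Natural key as _id: the primary index enforces uniqueness and serves lookups.
const withNaturalKey = new Collection();
withNaturalKey.insert({ _id: "www.jetlore.com", title: "Jetlore is a search platform for social content" });
try {
  withNaturalKey.insert({ _id: "www.jetlore.com", title: "duplicate" });
} catch (e) {
  console.log(e.message); // duplicate key: www.jetlore.com
}
console.log(withNaturalKey.findById("www.jetlore.com").title);
```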
  • 27. Indexes
      Augment your schema to enable the most selective index.
      Want: all posts that a user can view, sorted by the number of likes.
      post: { _id: ObjectId(...), recipients: [...], likes: [...], likesCount: ..., ... }
      Add a new "likesCount" field!
      db.posts.ensureIndex({recipients: 1, likesCount: -1})
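The extra field is needed because an index stores field values, not array lengths, so "sorted by the number of likes" requires a plain numeric field. The sketch below keeps likesCount in step with likes in memory; the data and the helpers like and mostLikedFor are hypothetical. In MongoDB itself, one way to do the same atomically is an update filtered on {likes: {$ne: uId}} that combines $push and $inc, so a repeated like is not counted twice.

```javascript
// Hypothetical in-memory posts with a denormalized likesCount next to likes.
const posts = [
  { _id: "p1", recipients: ["u1", "u2"], likes: [], likesCount: 0 },
  { _id: "p2", recipients: ["u1"], likes: [], likesCount: 0 },
  { _id: "p3", recipients: ["u2"], likes: [], likesCount: 0 },
];

// Keep list and counter in step: count a like only if it was actually added.
function like(postId, userId) {
  const post = posts.find(p => p._id === postId);
  if (!post.likes.includes(userId)) {
    post.likes.push(userId);
    post.likesCount += 1;
  }
}

// Same result as db.posts.find({recipients: uId}).sort({likesCount: -1}).
function mostLikedFor(uid) {
  return posts
    .filter(p => p.recipients.includes(uid))
    .sort((a, b) => b.likesCount - a.likesCount);
}

like("p2", "u7");
like("p2", "u8");
like("p1", "u7");
like("p1", "u7"); // repeated like is ignored
console.log(mostLikedFor("u1").map(p => p._id)); // [ 'p2', 'p1' ]
```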
  • 28. Indexes
      Make sure to use the proper index for:
        db.posts.find({recipients: uId}).sort({date: -1})
      Candidates:
        db.posts.ensureIndex({recipients: 1})
        db.posts.ensureIndex({date: 1})  vs  date: -1
        db.posts.ensureIndex({recipients: 1, date: 1})
      Always test with explain()!
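Why the compound index helps can be seen in a simplified model: a sorted array of [recipient, date, _id] entries standing in for the index {recipients: 1, date: 1}. This is an illustration only, not how MongoDB stores indexes, and the helper names are hypothetical. All entries for one recipient are contiguous and already ordered by date, so the query is a single range scan, and walking that range backwards yields the date: -1 order without a separate sort. explain() remains the way to check what the real query planner does.

```javascript
// Simplified model of a compound multikey index on {recipients: 1, date: 1}:
// one [recipient, date, _id] entry per array element, sorted by recipient, then date.
function buildIndex(posts) {
  const entries = [];
  for (const p of posts) {
    for (const r of p.recipients) entries.push([r, p.date, p._id]);
  }
  entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
  return entries;
}

// Binary search: first position whose recipient sorts after uid.
function upperBound(index, uid) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] <= uid) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// find({recipients: uid}).sort({date: -1}) from the index alone:
// walk the contiguous uid range backwards to get newest first.
function newestFirst(index, uid) {
  const ids = [];
  for (let i = upperBound(index, uid) - 1; i >= 0 && index[i][0] === uid; i--) {
    ids.push(index[i][2]);
  }
  return ids;
}

const posts = [
  { _id: "p1", recipients: ["u1", "u2"], date: 3 },
  { _id: "p2", recipients: ["u1"], date: 1 },
  { _id: "p3", recipients: ["u2", "u1"], date: 2 },
];
const index = buildIndex(posts);
console.log(newestFirst(index, "u1")); // [ 'p1', 'p3', 'p2' ]
```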
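  • 30. Concurrency
      Try to avoid "save()" in drivers: it writes the whole document back, so one thread can silently undo another thread's change.
      thread1: { _id: u1, name: "Robert", from: [u2, u3] }
      thread2: { _id: u1, name: "Bob", from: [] }
      Instead, $set only the fields each thread changed:
        db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})
        db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})
      …but! (arguments: query, update, upsert = true, multi = false)
        db.users.update({_id: u1}, {$set: {_id: u1, name: ..., }}, true, false)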
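The lost update that makes save() risky can be reproduced in memory. In this hypothetical sketch, save replaces the whole stored document and setFields changes only the named fields; the starting document { name: "Robert", from: [] } is an assumption chosen so that thread1 changes from and thread2 changes name, as on the slide.

```javascript
// Hypothetical in-memory users collection; each "thread" edits its own copy.
const copy = doc => JSON.parse(JSON.stringify(doc));
const makeStore = () => new Map([["u1", { _id: "u1", name: "Robert", from: [] }]]);

// save()-style write: replaces the whole stored document.
const save = (store, doc) => store.set(doc._id, copy(doc));
// $set-style write: changes only the listed fields.
const setFields = (store, id, fields) => Object.assign(store.get(id), copy(fields));

// Both threads read the same version, then change different fields.
let store = makeStore();
const thread1 = copy(store.get("u1"));
const thread2 = copy(store.get("u1"));
thread1.from = ["u2", "u3"];
thread2.name = "Bob";

save(store, thread1);
save(store, thread2); // replaces the document, dropping thread1's `from`
const afterSave = copy(store.get("u1"));

store = makeStore();
setFields(store, "u1", { from: thread1.from });
setFields(store, "u1", { name: thread2.name });
const afterSet = copy(store.get("u1"));

console.log(JSON.stringify(afterSave)); // {"_id":"u1","name":"Bob","from":[]}
console.log(JSON.stringify(afterSet));  // {"_id":"u1","name":"Bob","from":["u2","u3"]}
```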
  • 31. Concurrency: Atomic Commutative Operators
      db.users.update({_id: u1}, {$pull: {to: u2}})
      db.posts.update({_id: pId}, {$inc: {likesCount: 1}})
      When updating lists and counters, rely on $inc, $addToSet and $pull instead of $set.
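A small in-memory sketch of why these operators are safe under concurrency, assuming simplified stand-ins for $inc, $addToSet and $pull (not MongoDB's implementation; ops and applyAll are hypothetical names). Applying the same updates in either order gives the same counter and the same set members, while a read-then-$set of the counter loses an increment.

```javascript
// Simplified stand-ins for three update operators; each changes the stored
// value in one step instead of writing back a value computed elsewhere.
const ops = {
  inc: (doc, field, n) => { doc[field] = (doc[field] || 0) + n; },
  addToSet: (doc, field, v) => {
    if (!doc[field]) doc[field] = [];
    if (!doc[field].includes(v)) doc[field].push(v);
  },
  pull: (doc, field, v) => { doc[field] = (doc[field] || []).filter(x => x !== v); },
};

// Apply a list of [operator, field, value] updates in the given order.
function applyAll(updates) {
  const doc = { likesCount: 0, to: ["u2", "u3"] };
  for (const [op, field, value] of updates) ops[op](doc, field, value);
  return doc;
}

const updates = [
  ["inc", "likesCount", 1],
  ["inc", "likesCount", 1],
  ["pull", "to", "u2"],
  ["addToSet", "to", "u4"],
];
const forward = applyAll(updates);
const reversed = applyAll([...updates].reverse());
console.log(forward.likesCount, reversed.likesCount); // 2 2
console.log([...forward.to].sort(), [...reversed.to].sort()); // [ 'u3', 'u4' ] [ 'u3', 'u4' ]

// Contrast with read-then-$set: both clients read 0 and both write 0 + 1.
const counter = { likesCount: 0 };
const readA = counter.likesCount;
const readB = counter.likesCount;
counter.likesCount = readA + 1;
counter.likesCount = readB + 1;
console.log(counter.likesCount); // 1: one like was lost
```

Commutativity has limits: $addToSet and $pull of the same value do not commute, and $addToSet does not fix the order of elements, which is why the sketch compares sorted members.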
  • 32. Concurrency: No Transactions
      User1 wants to unsubscribe from user2. Ideally we would update both users in one transaction:
        user1: { _id: u1, to: [u2, u3], from: [...], ... }
        user2: { _id: u2, to: [...], from: [u1, ...], ... }
      Implement it in your code!
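The slide leaves "implement it in your code" open. One common approach, sketched here as an assumption rather than Jetlore's method, is to make each single-document step idempotent (a $pull of an already-removed value changes nothing) and to retry the whole operation after a failure. The in-memory users map, unsubscribe and the failAfterFirst flag are hypothetical; other readers can still observe the half-done state between the two steps.

```javascript
// Hypothetical in-memory users collection.
const users = new Map([
  ["u1", { _id: "u1", to: ["u2", "u3"], from: [] }],
  ["u2", { _id: "u2", to: [], from: ["u1"] }],
]);

// $pull-like step: removing a value that is already gone is a no-op.
function pull(userId, field, value) {
  const user = users.get(userId);
  user[field] = user[field].filter(v => v !== value);
}

// Two idempotent single-document updates instead of a transaction.
// `failAfterFirst` only simulates a crash between the two steps.
function unsubscribe(subscriber, publisher, failAfterFirst = false) {
  pull(subscriber, "to", publisher);
  if (failAfterFirst) throw new Error("crashed between the two updates");
  pull(publisher, "from", subscriber);
}

try {
  unsubscribe("u1", "u2", true);
} catch (e) {
  console.log(e.message); // crashed between the two updates
}
// Inconsistent window: u1 no longer lists u2, but u2 still lists u1.
unsubscribe("u1", "u2"); // retrying the whole operation is safe
console.log(users.get("u1").to, users.get("u2").from); // [ 'u3' ] []
```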
  • 34. Reducing collection size
      Name your fields with short names!
      post: { owner: ObjectId, messageText: "loving Jetlore", mediaUrl: "www.jetlore.com", mediaTitle: "Jetlore is a user analytics & search platform for social content" }
      vs
      post: { o: ObjectId, t: "loving Jetlore", mu: "www.jetlore.com", mt: "Jetlore is a user analytics & search platform for social content" }
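Short names need not leak into application code. A hypothetical sketch: keep readable names in code and translate at the storage boundary with a fixed mapping (the one from the slide); toShort, toLong and rename are made-up names. Field names are stored inside every BSON document, so the saving repeats per document; the JSON length used below is only a rough proxy for BSON size.

```javascript
// Mapping between readable names in code and short stored names (from the slide).
const toShort = new Map([["owner", "o"], ["messageText", "t"], ["mediaUrl", "mu"], ["mediaTitle", "mt"]]);
const toLong = new Map([...toShort].map(([long, short]) => [short, long]));

// Rename keys through a mapping; unmapped keys pass through unchanged.
const rename = (doc, names) =>
  Object.fromEntries(Object.entries(doc).map(([k, v]) => [names.get(k) ?? k, v]));

const post = {
  owner: "507f1f77bcf86cd799439011", // placeholder string standing in for an ObjectId
  messageText: "loving Jetlore",
  mediaUrl: "www.jetlore.com",
  mediaTitle: "Jetlore is a user analytics & search platform for social content",
};

const stored = rename(post, toShort);
const saved = JSON.stringify(post).length - JSON.stringify(stored).length;
console.log(saved); // 28 characters saved per document
console.log(JSON.stringify(rename(stored, toLong)) === JSON.stringify(post)); // true
```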
  • 35. Outline
      I. Schema design
      II. Lessons learned for schema design
      III. Things to remember about MongoDB
         ‣ Single lock
         ‣ ($or + sort) query doesn't use indexes properly
         ‣ Indexes with 2 list fields
         ‣ Record iterators + update
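  • 36. $or & sort query doesn't use the proper index
      Query:
        db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})
      Even with both of these indexes in place:
        db.posts.ensureIndex({recipients: 1, date: -1})
        db.posts.ensureIndex({privacy: 1, date: -1})
      Indexes with 2 list fields
        post: { _id: ObjectId(...), recipients: [...], links: [...], ... }
        db.posts.ensureIndex({recipients: 1, links: 1})
      MongoDB cannot index parallel arrays: a compound index may contain at most one array-valued field per document, so documents in which both recipients and links are arrays cannot be indexed this way.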
  • 37. Record iterators + updating
      Updating documents while iterating over them can visit a document twice (or miss one) when an update moves it:
        var posts = db.posts.find().skip(n).limit(t)
        while (posts.hasNext()) {
          var post = posts.next()
          db.posts.update({_id: post._id}, {$set: {text: NewText}})
        }
      Sort by a field that will not change, or rename the old collection:
        var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)

        db.posts.renameCollection("oldPosts")
        var posts = db.oldPosts.find().skip(n).limit(t)
        while (posts.hasNext()) {
          var post = posts.next()
          post.text = NewText
          db.posts.insert(post)  // write the updated copy into the new posts collection
        }
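The failure mode behind this slide can be shown with a deliberately simplified model: documents sit in natural order, and an update that makes a document larger moves it to the end, as MongoDB's original storage engine could when a record outgrew its allocated space. A positional scan then skips one post and visits another twice, while walking in order of an unchanging field (date, assumed unique) visits each post once. All names here are hypothetical, and the model is not MongoDB's actual cursor implementation.

```javascript
// Simplified model: natural order plus "grown documents move to the end".
let storage;
const reset = () => {
  storage = [
    { _id: 1, date: 1, text: "a" },
    { _id: 2, date: 2, text: "b" },
    { _id: 3, date: 3, text: "c" },
  ];
};

function updateText(id, newText) {
  const i = storage.findIndex(d => d._id === id);
  const updated = { ...storage[i], text: newText };
  if (newText.length > storage[i].text.length) {
    storage.splice(i, 1);
    storage.push(updated); // grew: relocated to the end
  } else {
    storage[i] = updated;
  }
}

// Naive: walk natural order by position while updating.
function naiveUpdateAll(newText) {
  const seen = [];
  for (let pos = 0; pos < storage.length; pos++) {
    seen.push(storage[pos]._id);
    updateText(storage[pos]._id, newText);
  }
  return seen;
}

// Safer: walk in order of a field the update never changes, like a cursor
// following a date index.
function updateAllByDate(newText) {
  const seen = [];
  let lastDate = -Infinity;
  for (;;) {
    const next = storage.filter(d => d.date > lastDate).sort((a, b) => a.date - b.date)[0];
    if (!next) break;
    seen.push(next._id);
    updateText(next._id, newText);
    lastDate = next.date;
  }
  return seen;
}

reset();
console.log(naiveUpdateAll("updated text"));  // [ 1, 3, 3 ]: post 2 skipped, post 3 twice
reset();
console.log(updateAllByDate("updated text")); // [ 1, 2, 3 ]
```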
  • 38. The takeaways
      I. What is more important?
         ‣ Writes: optimize for easy inserts/updates
         ‣ Reads: optimize for easy querying
      II. Denormalize to enable the most selective index
      III. Concurrency: design to leverage commutative operators
  • 39. Thank you! Try our tech powered by