Data Modeling for Performance

Mongo Boulder, January 21, 2010
Michael Dwan, Snapjoy
i’m michael dwan
 @michaeldwan on the twitter
the project
  Company X
• find business details (web + api)
• search by category/keyword + geo (web + api)
• update (api)



                                   application spec
• 15,000,000 businesses
• 100,000 geo areas
• 2,300 categories
• 100,000,000 tags
• 30,000 partners
• 2,000,000 requests daily
• 24,000,000 urls in sitemap
                          why is this interesting?
• infrequent changes
• monthly updates w/ 12M monthly changes
• “zero downtime”



                                           updates
the problem
 mo’ data, mo’ problems
complexity
(schema diagram of the relational tables: providers, mappings, phone_numbers, businesses_phone_numbers, zips, cities, states, neighborhoods, businesses_neighborhoods, assets, categorizations, categories, taggings, tags, users, businesses)
     architecture
read performance
downtime
solr
solr getting fussy
downtime
migrations
the solution
> gem install acts_as_web_scale
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz"
}




                                        a business...
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
      "5035550091",
      "8005555456"
    ]
}


            a business... has many phone numbers
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "tagline" : "Your trusty glass hole",
    "description" : "Glass repair...",
    "hours" : "Mon Fri 8 5",
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
       "5035550091",
       "8005555456"
    ],
    "coordinates" : [
       45.559294,
       -122.644053
    ]
}



                      a business... has coordinates
{
    ...
    "url" : "http://acmeglasshole.biz",
    "phone_numbers" : [
       "5035550091",
       "8005555456"
    ],
    "coordinates" : [
       45.559294,
       -122.644053
    ],
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ]
}



                        a business... has many tags
{
    ...
    "coordinates" : [
       45.559294,
       -122.644053
    ],
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ],
    "location" : {
       "street_address" : "2035 NE Alberta St"
    }
}




                         a business... has an address
belongs to?
{
    "_id" : ObjectId("4ce82937961552247900000f"),
    "name" : "Illinois",
    "slug" : "il",
    ...
}




                                             a state
{
    ...
    "tags" : [
       "glass",
       "mirrors",
       "flat glass"
    ],
    "location" : {
       "street_address" : "2035 NE Alberta St",
       "state" : {
         "_id" : ObjectId("4ce829379615522479000026"),
         "meta" : {
            "slug" : "or"
         },
         "display_name" : "Oregon"
       }
    }
}


                     a business... belongs to a state
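The "belongs to" slides copy a small slice of the referenced document into the business instead of storing a bare foreign key. A minimal sketch of that copy step follows. The helper name `embedRef` is hypothetical, the Oregon state document is assumed to have the same shape as the Illinois one shown earlier, and plain strings stand in for ObjectId values so the snippet runs outside the mongo shell.

```javascript
// Hypothetical helper: build the small embedded copy of a referenced document.
// Only the fields needed to render or link the reference are copied.
function embedRef(doc) {
  return {
    _id: doc._id,
    meta: { slug: doc.slug },
    display_name: doc.name
  };
}

// Assumed shape of a state document (string id instead of ObjectId).
const oregon = { _id: "4ce829379615522479000026", name: "Oregon", slug: "or" };

const business = {
  name: "Acme Glass Co",
  location: {
    street_address: "2035 NE Alberta St",
    state: embedRef(oregon)
  }
};

console.log(JSON.stringify(business.location.state, null, 2));
```

The trade-off is that a rename of the state has to be written to every business that embeds it, which suits data that changes infrequently.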
{
    ...
    "location" : {
        ...
        "state" : {
           "_id" : ObjectId("4ce829379615522479000026"),
           "meta" : {
              "slug" : "or"
           },
           "display_name" : "Oregon"
        },
        "city" : {
           "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
           "meta" : {
              "slug" : "portland"
           },
           "display_name" : "Portland, OR"
        }
    }
}

                          a business... belongs to a city
{
    ...
    "location" : {
        ...
        "city" : {
           "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
           "meta" : {
              "slug" : "portland"
           },
           "display_name" : "Portland, OR"
        },
        "zip" : {
           "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"),
           "display_name" : "97211"
        }
    }
}

                     a business... belongs to a zip code
many-to-many?
{
    "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
    "name" : "Auto Glass",
    "slug" : "3063-auto-glass",
    "tags" : [
       "windshields"
    ],
    ...
}




                                       a category
{
    ...
    "location" : {
        ...
    },
    "categories" : [
       {
          "_id" : ObjectId("4ce82e50d3dfaa16360004f2"),
          "meta" : {
             "slug" : "282-glass",
             "tags" : [ "windows" ]
          },
          "display_name" : "Glass"
       },
       {
          "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
          "meta" : {
             "slug" : "3063-auto-glass",
             "tags" : [ "windshields" ]
          },
          "display_name" : "Auto Glass"
       }
    ]
}

               a business... belongs to many categories
queries & indexes
    know what you want
#1 find a business
    I want *that* one
// single business
db.businesses.findOne({
   _id: ObjectId("4ce838ef4a882579960001b9")
})




                                 find a business
#2 find by location
  Businesses in San Francisco, CA
// find all within state
db.businesses.find({
   "location.state._id": ObjectId("4ce82937961552247900000f")
})

// find all within city
db.businesses.find({
   "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95")
})

// find all within zip
db.businesses.find({
   "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0")
})




                       find businesses by state/city/zip
// the indexes
db.businesses.ensureIndex({"location.city._id": 1})
db.businesses.ensureIndex({"location.zip._id": 1})



                         1.5GB each




    skip “location.state._id” -- only 51 possibilities


                                                 indexes
#3 find by category
 Businesses in the Auto Repair category
// find by category id
db.businesses.find({
   "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
})


// the index
db.businesses.ensureIndex({
   "categories._id":1
})




                               businesses by category
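The query on "categories._id" works because a dotted path that crosses an array matches when any element of the array matches. The sketch below imitates that behaviour in plain JavaScript for equality only. `valuesAtPath` and `matches` are hypothetical names, string ids stand in for ObjectId values, and this is a simplified model rather than MongoDB's actual matcher (no numeric array positions, no operators).

```javascript
// Resolve a dotted path against a document, fanning out over arrays,
// and collect every value found at the end of the path.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      if (item === null || typeof item !== "object") continue;
      const value = item[key];
      if (value === undefined) continue;
      if (Array.isArray(value)) next.push(...value);
      else next.push(value);
    }
    current = next;
  }
  return current;
}

// Equality match: true when any value reached by the path equals `expected`.
function matches(doc, path, expected) {
  return valuesAtPath(doc, path).includes(expected);
}

const acme = {
  name: "Acme Glass Co",
  tags: ["glass", "mirrors", "flat glass"],
  categories: [
    { _id: "4ce82e50d3dfaa16360004f2", display_name: "Glass" },
    { _id: "4ce82e64d3dfaa16360014eb", display_name: "Auto Glass" }
  ]
};

console.log(matches(acme, "categories._id", "4ce82e64d3dfaa16360014eb"));
console.log(matches(acme, "tags", "mirrors"));
```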
#4 - find by category + location
   Businesses in the Plumbing category in Chicago, IL
// find by city id and category id
db.businesses.find({
   "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
   "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
})




                         businesses by category + city
// city id
 {"location.city._id":1}


         ~ or ~

  // category id
  {"categories._id":1}




 answer: both suck
we need a compound index


         which index should we use?
db.businesses.ensureIndex({
    "location.city._id" : 1, "categories._id" : 1
 })

                     ~ or ~
 db.businesses.ensureIndex({
    "categories._id" : 1, "location.city._id" : 1
 })


      35,000 cities & 2,500 categories


   answer: cities → categories
create one for zip codes and categories too!

                                          which order?
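One way to read the numbers on this slide is as a rough selectivity estimate. The sketch below assumes businesses are spread evenly across cities and categories, which is certainly not true in practice, and uses the 15,000,000 businesses figure from the application spec. It is back-of-the-envelope arithmetic, not the author's stated method.

```javascript
// Figures taken from the slides.
const businessCount = 15000000;
const cityCount = 35000;
const categoryCount = 2500;

// Average number of businesses behind one index key, assuming an even spread.
const perCity = Math.round(businessCount / cityCount);
// A business can sit in several categories, so the real figure is at least this.
const perCategory = businessCount / categoryCount;

console.log({ perCity, perCategory });
```

Under that assumption a city id narrows the candidates to a few hundred documents, while a category id leaves several thousand, which fits the slide's answer of cities first.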
{"location.city._id" : 1}
 {"location.city._id" : 1, "categories._id" : 1}




                 answer: yes

db.businesses.dropIndex("location.city._id_1")




              don’t we have 2 indexes on city id?
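The single-field index can go because a compound index also serves queries on any leading prefix of its keys. The sketch below models only that prefix rule for equality filters. `usesIndexPrefix` is a hypothetical name and the function is a simplification, not the query planner.

```javascript
// True when the fields a query filters on are exactly the first N keys
// of the index, in any order within the query itself.
function usesIndexPrefix(indexKeys, queryFields) {
  if (queryFields.length === 0 || queryFields.length > indexKeys.length) return false;
  const prefix = indexKeys.slice(0, queryFields.length);
  return queryFields.every((field) => prefix.includes(field));
}

const cityThenCategory = ["location.city._id", "categories._id"];

// City-only query: covered by the compound index, so the old index is redundant.
console.log(usesIndexPrefix(cityThenCategory, ["location.city._id"]));
// Category-only query: not a prefix, so it still needs its own index.
console.log(usesIndexPrefix(cityThenCategory, ["categories._id"]));
```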
#5 - find by keyword
  “something awesome” in Boulder, CO
{
    "_id" : ObjectId("4ce838ef4a882579960001b9"),
    "name" : "Acme Glass Co",
    "keywords" : [
      "glass",
      "repair",
      "acme",
      ...
    ]
}



db.businesses.ensureIndex({
   "location.city._id":1,
   "keywords":1
})



db.businesses.find({
   "location.city._id":ObjectId("4ce82aa0d3dfaa10f8004a95"),
   "keywords":/glass/i
})




             find businesses in city by keyword
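The slides do not say how the "keywords" array is filled. A minimal sketch, assuming it is derived by lowercasing and splitting the business name and description (`extractKeywords` is a hypothetical helper):

```javascript
// Lowercase the inputs, split on anything that is not a letter or digit,
// and keep each word once in first-seen order.
function extractKeywords(...texts) {
  const words = texts
    .join(" ")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0);
  return [...new Set(words)];
}

const keywords = extractKeywords("Acme Glass Co", "Glass repair...");
console.log(keywords);
```

If keywords are stored lowercased like this, an exact match such as `"keywords": "glass"` or an anchored prefix such as `/^glass/` can use the index, whereas an unanchored, case-insensitive pattern like `/glass/i` generally cannot use it efficiently.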
me: we’re switching from postgres+solr to mongo
kyle: oh wow, you can replace solr with mongo?
me: with some creativity
kyle: seems like it’d still be hard to get just right
me: it works well
kyle: gotcha



                                chat with Kyle Banker
i was wrong, kyle was right
I




        I’ll never leave you again

...until MongoDB supports full text later this year
                      :)
aggregation
map/reduce to the rescue
sitemaps
big list of every url
• xml files containing each unique url ~ 24M
• 50,000 urls per file, about 500 files
• urls are generated from live data
• http://companyx.com/sitemaps/1.xml


                                              sitemaps
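The file count follows from the sitemap protocol's limit of 50,000 urls per file. A quick check of the arithmetic, using the figures from the slide:

```javascript
const totalUrls = 24000000;
const urlsPerFile = 50000;

// Fewest files that can hold every url if each file is filled completely.
const minFiles = Math.ceil(totalUrls / urlsPerFile);
// Average file size when the urls are spread over roughly 500 files instead.
const avgPerFile = totalUrls / 500;

console.log({ minFiles, avgPerFile });
```

Rounding up to about 500 files leaves some headroom per file, which matters when urls are assigned to files by hash and the files do not fill evenly.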
>> "hello!".hash % 6 #=> 5

>> "/ny/new-york/c/apartments".hash % 6 #=> 5




    returns an integer from 0 up to (but not including)
              the number specified




                   partition by consistent hash
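The same idea in JavaScript, as a sketch. The slide uses Ruby's String#hash, but in current Ruby versions that value is seeded per process, so a hash that is stable across runs is needed for the url-to-file mapping to stay put. FNV-1a is used here only because it is short and deterministic. It is an assumption, not the hash the author used, and `partitionFor` is a hypothetical name.

```javascript
// 32-bit FNV-1a over the string's character codes
// (identical to the byte-wise definition for plain ASCII urls).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a url to a partition number in the range 0 .. partitions - 1.
function partitionFor(url, partitions) {
  return fnv1a(url) % partitions;
}

console.log(partitionFor("/ny/new-york/c/apartments", 6));
```

The same url always lands in the same partition, so each sitemap file keeps a stable set of urls between rebuilds.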
1. map each url in the site to a partition
2. reduce all partitions to a single document containing
   all urls in that partition
3. save to a permanent collection




                                             map/reduce
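The three steps can be imitated in plain JavaScript to see the data flow. This is a sketch and not the job from the talk: `toyPartition` is a stand-in hash, an array plays the role of the permanent collection, and all function names are hypothetical.

```javascript
const PARTITIONS = 6;

// Stand-in partition function: sum of character codes modulo the partition count.
function toyPartition(url) {
  let sum = 0;
  for (let i = 0; i < url.length; i++) sum += url.charCodeAt(i);
  return sum % PARTITIONS;
}

// Step 1, map: emit (partition, { total: 1, urls: [url] }) for one url.
function mapUrl(url) {
  return { key: toyPartition(url), value: { total: 1, urls: [url] } };
}

// Step 2, reduce: merge partial values for one key. The result has the same
// shape as its inputs, so it can safely be reduced again.
function reduceValues(key, values) {
  const merged = { total: 0, urls: [] };
  for (const value of values) {
    merged.total += value.total;
    merged.urls = merged.urls.concat(value.urls);
  }
  return merged;
}

// Step 3: one document per partition, standing in for the saved collection.
function buildSitemaps(urls) {
  const grouped = new Map();
  for (const url of urls) {
    const { key, value } = mapUrl(url);
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  const collection = [];
  for (const [key, values] of grouped) {
    collection.push({ _id: key, value: reduceValues(key, values) });
  }
  return collection;
}

const sitemaps = buildSitemaps([
  "/il/chicago/c/pizza",
  "/ny/new-york/c/apartments",
  "/14076500-bayside-marina",
  "/12347500-allstate-auto-insurance"
]);
console.log(JSON.stringify(sitemaps, null, 2));
```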
url                                        partition
/il/chicago/c/pizza                        4
/ny/new-york/c/apartments                  1
/nd/rugby/c/apartments                     6
/14076500-bayside-marina                   2
/13401000-comtrak-logistics-inc            3
/12347500-allstate-auto-insurance          1
/il/downers-grove/c/computer-web-design    6
/1009500-heidelberg-lodges                 5
/mn/redwood-falls/c/food-service           4
/14077000-bank-of-america                  5
/mn/savage/c/audio-visual-equipment        1
...

partitions: 1 2 3 4 5 6


                                             map
{
    "total" : 2,
    "urls" : [
      "/12347500-allstate-auto-insurance",
      "/ny/new-york/c/apartments"
    ]
}

{
    "total" : 1,
    "urls" : [
      "/mn/savage/c/audio-visual-equipment"
    ]
}

{
    "_id" : 1,
    "value" : {
      "total" : 3,
      "urls" : [
        "/12347500-allstate-auto-insurance",
        "/mn/savage/c/audio-visual-equipment",
        "/ny/new-york/c/apartments"
      ]
    }
}

                                             reduce
db.sitemaps.findOne({_id:1}).value.urls




[
    "/12347500-allstate-auto-insurance",
    "/mn/savage/c/audio-visual-equipment",
    "/ny/new-york/c/apartments"
]




                                             usage
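Turning one partition's url list into a sitemap file might look like the sketch below. The `urlset` and `loc` elements and the namespace come from the sitemap protocol. The function names are hypothetical, and the base url is taken from the example address on the sitemaps slide.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render the paths stored for one partition as a sitemap document.
function renderSitemap(baseUrl, paths) {
  const entries = paths.map(
    (path) => `  <url><loc>${escapeXml(baseUrl + path)}</loc></url>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>"
  ].join("\n");
}

const xml = renderSitemap("http://companyx.com", [
  "/12347500-allstate-auto-insurance",
  "/mn/savage/c/audio-visual-equipment",
  "/ny/new-york/c/apartments"
]);
console.log(xml);
```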
wrap up
115ms average response times


                        2 months later
thank you
 @michaeldwan

 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 

Data Modeling for Performance

  • 1. Data Modeling for Performance Mongo Boulder Michael Dwan January 21, 2010 Snapjoy
  • 2. i’m michael dwan @michaeldwan on the twitter
  • 3. the project Company X
  • 4. • find business details (web + api) • search by category/keyword + geo (web + api) • update (api) application spec
  • 5. 100,000 30,000 100,000,000 geo areas tags partners 2,300 15,000,000 categories businesses 2,000,000 requests daily 24,000,000 urls in sitemap why is this interesting?
  • 6. • infrequent changes • monthly updates w/ 12M monthly changes • “zero downtime” updates
  • 7. the problem mo’ data, mo’ problems
  • 9. providers, mappings, phone_numbers, zips, assets, businesses_phone_numbers, cities, categorizations, businesses, states, categories, businesses_neighborhoods, taggings, users, tags, neighborhoods
  • 10. architecture
  • 12. downtime solr
  • 14. downtime migrations
  • 16. > gem install acts_as_web_scale
  • 19. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", } a business...
  • 20. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", } a business... has many phone numbers
  • 21. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ] } a business... has many phone numbers
  • 22. "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ] } a business... has coordinates
  • 23. "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "tagline" : "Your trusty glass hole", "description" : "Glass repair...", "hours" : "Mon Fri 8 5", "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ] } a business... has coordinates
  • 24. "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ] } a business... has many tags
  • 25. "url" : "http://acmeglasshole.biz", "phone_numbers" : [ "5035550091", "8005555456" ], "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ] } a business... has many tags
  • 26. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ] } a business... has an address
  • 27. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... has an address
  • 29. { "_id" : ObjectId("4ce82937961552247900000f"), "name" : "Illinois", "slug" : "il", ... } a state
  • 30. "coordinates" : [ 45.559294, -122.644053 ], "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... belongs to a state
  • 31. "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St" } } a business... belongs to a state
  • 32. "tags" : [ "glass", "mirrors", "flat glass" ], "location" : { "street_address" : "2035 NE Alberta St", "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } } } a business... belongs to a state
  • 33. "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" } } } a business... belongs to a city
  • 34. "state" : { "_id" : ObjectId("4ce829379615522479000026"), "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, } } a business... belongs to a city
  • 35. }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, } } a business... belongs to a zip code
  • 36. }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } } } a business... belongs to a zip code
  • 38. { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "name" : "Auto Glass", "slug" : "3063-auto-glass", "tags" : [ "windshields" ], ... } a category
  • 39. "meta" : { "slug" : "or" }, "display_name" : "Oregon" }, "city" : { "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"), "meta" : { "slug" : "portland", }, "display_name" : "Portland, OR" }, "zip" : { "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"), "display_name" : "97211" } } } a business... belongs to a zip code
  • 40. } } } a business... belongs to many categories
  • 41. } }, "categories" : [ { "_id" : ObjectId("4ce82e50d3dfaa16360004f2"), "meta" : { "slug" : "282-glass", "tags" : [ "windows" ], }, "display_name" : "Glass" }, { "_id" : ObjectId("4ce82e64d3dfaa16360014eb"), "meta" : { "slug" : "3063-auto-glass", "tags" : [ "windshields" ], }, "display_name" : "Auto Glass" } ] } a business... belongs to many categories
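
The slides above embed a trimmed copy of each state, city, zip, and category inside the business document rather than joining at read time. Here is a minimal sketch of building such an embedded copy in plain JavaScript. The helper name `toEmbedded` and the default of copying `name` into `display_name` are assumptions for illustration, not the author's code. The city on the slides displays as "Portland, OR", which suggests the real logic did more. Ids are plain strings so the sketch runs without a MongoDB driver.

```javascript
// Hypothetical helper: shrink a full reference document (state, city,
// category) to the small sub-document embedded in each business.
function toEmbedded(ref, displayName) {
  const embedded = {
    _id: ref._id,
    meta: { slug: ref.slug },
    display_name: displayName !== undefined ? displayName : ref.name,
  };
  // Categories on the slides also carry their tags under `meta`.
  if (Array.isArray(ref.tags)) {
    embedded.meta.tags = ref.tags.slice();
  }
  return embedded;
}

// Sample reference documents modeled on slides 29 and 38.
const oregon = { _id: "4ce829379615522479000026", name: "Oregon", slug: "or" };
const autoGlass = {
  _id: "4ce82e64d3dfaa16360014eb",
  name: "Auto Glass",
  slug: "3063-auto-glass",
  tags: ["windshields"],
};

const business = {
  name: "Acme Glass Co",
  location: { state: toEmbedded(oregon) },
  categories: [toEmbedded(autoGlass)],
};

console.log(JSON.stringify(business, null, 2));
```

The trade-off is that a read needs no joins, but renaming a state or category means rewriting every business that embeds it. That seems acceptable for the workload described on slide 6 (infrequent changes, monthly bulk updates).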
  • 42. queries & indexes know what you want
  • 43. #1 find a business I want *that* one
  • 44. // single business db.businesses.findOne({ _id: ObjectId("4ce838ef4a882579960001b9") }) find a business
  • 45. #2 find by location Businesses in San Francisco, CA
  • 46. // find all within state db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") }) find businesses by state/city/zip
  • 47. // find all within state db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") }) // find all within city db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") }) find businesses by state/city/zip
  • 48. // find all within state db.businesses.find({ "location.state._id": ObjectId("4ce82937961552247900000f") }) // find all within city db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95") }) // find all within zip db.businesses.find({ "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0") }) find businesses by state/city/zip
  • 49. // the indexes db.businesses.ensureIndex({"location.city._id": 1}) db.businesses.ensureIndex({"location.zip._id": 1}) 1.5GB each skip “location.state._id” -- only 51 possibilities indexes
  • 50. #3 find by category Businesses in the Auto Repair category
  • 51. // find by category id db.businesses.find({ "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") }) // the index db.businesses.ensureIndex({ "categories._id":1 }) businesses by category
  • 52. #4 - find by category + location Businesses in the Plumbing category in Chicago, IL
  • 53. // find by city id and category id db.businesses.find({ "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"), "categories._id": ObjectId("4ce82e50d3dfaa16360004f2") }) businesses by category + city
  • 54. // city id {"location.city._id":1} ~ or ~ // category id {"categories._id":1} answer: both suck we need a compound index which index should we use?
  • 55. db.businesses.ensureIndex({ "location.city._id" : 1, "categories._id" : 1 }) ~ or ~ db.businesses.ensureIndex({ "categories._id" : 1, "location.city._id" : 1 }) 35,000 cities & 2,500 categories answer: cities → categories create one for zip codes and categories too! which order?
  • 56. {"location.city._id" : 1} {"location.city._id" : 1, "categories._id" : 1} answer: yes db.businesses.dropIndex("location.city._id_1") don’t we have 2 indexes on city id?
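
Why the single-field city index becomes redundant: a compound index on (city, category) keeps its entries sorted by city first, so all entries for one city sit next to each other and a city-only query can use the same index. Below is a toy in-memory model of that ordering in plain JavaScript. It is not MongoDB's B-tree implementation, and the names `buildIndex`, `lowerBound`, `scanPrefix` and the sample data are made up for illustration.

```javascript
// Three-way comparison for strings.
function cmp(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Toy compound index: one entry per (city, category) pair, sorted by
// city first and category second.
function buildIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    for (const category of doc.categories) {
      entries.push({ city: doc.city, category, id: doc.id });
    }
  }
  entries.sort((a, b) => cmp(a.city, b.city) || cmp(a.category, b.category));
  return entries;
}

// Binary search: first position whose entry is not "less than" the target.
function lowerBound(entries, isLess) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (isLess(entries[mid])) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Answers both "city" and "city + category" from the same sorted entries.
function scanPrefix(entries, city, category) {
  const start = lowerBound(entries, (e) =>
    e.city < city ||
    (e.city === city && category !== undefined && e.category < category)
  );
  const ids = new Set();
  for (let i = start; i < entries.length; i++) {
    const e = entries[i];
    if (e.city !== city) break;
    if (category !== undefined && e.category !== category) break;
    ids.add(e.id); // a business with several categories appears several times
  }
  return [...ids];
}

const index = buildIndex([
  { id: "a", city: "chicago", categories: ["plumbing", "pizza"] },
  { id: "b", city: "chicago", categories: ["pizza"] },
  { id: "c", city: "boulder", categories: ["plumbing"] },
]);

console.log(scanPrefix(index, "chicago"));             // city only
console.log(scanPrefix(index, "chicago", "plumbing")); // city + category
```

A category-only lookup gets no help from this ordering, since matching entries are scattered across every city. That is presumably why the separate `categories._id` index from slide 51 stays in place.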
  • 57. #5 - find by keyword “something awesome” in Boulder, CO
  • 58. { "_id" : ObjectId("4ce838ef4a882579960001b9"), "name" : "Acme Glass Co", "keywords" : [ "glass", "repair", "acme", ... ] } db.businesses.ensureIndex({ "location.city._id":1, "keywords":1 }) db.businesses.find({ "location.city._id":ObjectId("4ce82aa0d3dfaa10f8004a95"), "keywords":/glass/i }) find businesses in city by keyword
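
The `keywords` array on this slide has to be derived from the business's text fields at write time. A minimal sketch of one way to do that in plain JavaScript follows. The function names, the three-character minimum, and the ASCII-only tokenizing are assumptions, not the author's implementation. The sketch also normalizes the search term so the query can use an exact match. As far as I know, a case-insensitive, unanchored pattern such as `/glass/i` cannot be narrowed to a tight index range, so every indexed keyword for the city has to be tested.

```javascript
// Hypothetical tokenizer: lowercase, split on anything that is not an
// ASCII letter or digit, drop short tokens and duplicates.
function extractKeywords(...texts) {
  const seen = new Set();
  for (const text of texts) {
    for (const token of text.toLowerCase().split(/[^a-z0-9]+/)) {
      if (token.length >= 3) seen.add(token);
    }
  }
  return [...seen];
}

// Build the query document with the term normalized the same way.
// In the mongo shell the city id would be an ObjectId, not a string.
function keywordQuery(cityId, term) {
  return { "location.city._id": cityId, keywords: term.trim().toLowerCase() };
}

const keywords = extractKeywords("Acme Glass Co", "Glass repair...");
console.log(keywords); // [ 'acme', 'glass', 'repair' ]
console.log(keywordQuery("4ce82aa0d3dfaa10f8004a95", " Glass "));
```

Exact matching gives up substring matches, stemming, and relevance ranking, so a sketch like this is at best a partial stand-in for a search engine.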
  • 59. me: we’re switching from postgres+solr to mongo kyle: oh wow, you can replace solr with mongo? me: with some creativity kyle: seems like it’d still be hard to get just right me: it works well kyle: gotcha chat with Kyle Banker
  • 60. i was wrong, kyle was right
  • 61. I’ll never leave you again ...until MongoDB supports full text later this year :)
  • 63. sitemaps big list of every url
  • 64. • xml files containing each unique url ~ 24M • 50,000 urls per file, about 500 files • urls are generated from live data • http://companyx.com/sitemaps/1.xml sitemaps
  • 65. >> "hello!".hash % 6 #=> 5 >> "/ny/new-york/c/apartments".hash % 6 #=> 5 returns an integer from 0 up to, but not including, the number specified partition by consistent hash
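
The slide's hash-then-modulo partitioning, sketched in JavaScript. One substitution to note: current Ruby versions seed `String#hash` per process, so its values are not stable across runs. This sketch swaps in 32-bit FNV-1a, a simple deterministic hash. The function names and the choice of FNV-1a are mine, not the author's.

```javascript
// 32-bit FNV-1a over UTF-16 code units (identical to byte-wise FNV-1a
// for ASCII input such as these url paths).
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit wraparound
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Map a url to a partition number in the range 0 .. partitions - 1.
function partitionFor(url, partitions) {
  if (!Number.isInteger(partitions) || partitions <= 0) {
    throw new RangeError("partitions must be a positive integer");
  }
  return fnv1a(url) % partitions;
}

// 24,000,000 urls at 50,000 per sitemap file needs at least 480 files.
const MIN_FILES = Math.ceil(24000000 / 50000);
console.log(MIN_FILES); // 480

// The same url always lands in the same partition.
console.log(partitionFor("/ny/new-york/c/apartments", 500));
```

Hashing spreads urls only approximately evenly. With exactly 480 partitions roughly half of them would exceed 50,000 urls, so rounding up to about 500 files, as on slide 64, leaves some headroom.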
  • 66. 1. map each url in the site to a partition 2. reduce all partitions to a single document containing all urls in that partition 3. save to a permanent collection map/reduce
  • 67. /il/chicago/c/pizza → 4, /ny/new-york/c/apartments → 1, /nd/rugby/c/apartments → 6, /14076500-bayside-marina → 2, /13401000-comtrak-logistics-inc → 3, /12347500-allstate-auto-insurance → 1, /il/downers-grove/c/computer-web-design → 6, /1009500-heidelberg-lodges → 5, /mn/redwood-falls/c/food-service → 4, /14077000-bank-of-america → 5, /mn/savage/c/audio-visual-equipment → 1, ... (each url mapped to one of partitions 1 to 6) map
  • 68. { "total" : 2, "urls" : [ "/12347500-allstate-auto-insurance", "/ny/new-york/c/apartments" ] } + { "total" : 1, "urls" : [ "/mn/savage/c/audio-visual-equipment" ] } → { "_id" : 1, "value" : { "total" : 3, "urls" : [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ] } } reduce
  • 69. db.sitemaps.findOne({_id:1}).value.urls [ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments" ] usage
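
The three steps from slide 66 (map each url to a partition, reduce each partition to one document, save the result) can be sketched outside the database in plain JavaScript. This mimics only the shape of the data. It is not MongoDB's `mapReduce` command, and the names `mapUrl`, `reduceValues`, `mapReduce` and the lookup-table partitioner are assumptions for illustration.

```javascript
// map: emit (partition, value) for one url. The value has the same shape
// that reduce returns, so reduce can be re-applied to its own output.
function mapUrl(url, partitionOf) {
  return { key: partitionOf(url), value: { total: 1, urls: [url] } };
}

// reduce: merge any number of values for one partition into a single value.
function reduceValues(values) {
  const merged = { total: 0, urls: [] };
  for (const v of values) {
    merged.total += v.total;
    for (const url of v.urls) merged.urls.push(url);
  }
  merged.urls.sort(); // deterministic output regardless of merge order
  return merged;
}

// Tiny driver standing in for the database: group by key, then reduce.
function mapReduce(urls, partitionOf) {
  const groups = new Map();
  for (const url of urls) {
    const { key, value } = mapUrl(url, partitionOf);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const out = [];
  for (const [key, values] of groups) {
    out.push({ _id: key, value: reduceValues(values) });
  }
  return out;
}

// Partition assignments copied from the map slide; a real run would
// compute them with a hash instead of a lookup table.
const table = {
  "/il/chicago/c/pizza": 4,
  "/ny/new-york/c/apartments": 1,
  "/12347500-allstate-auto-insurance": 1,
  "/mn/savage/c/audio-visual-equipment": 1,
};

const sitemaps = mapReduce(Object.keys(table), (url) => table[url]);
console.log(JSON.stringify(sitemaps, null, 2));
```

Keeping the emitted value and the reduced value in the same `{ total, urls }` shape matters because a map/reduce engine may call reduce more than once for a key, feeding earlier results back in.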
  • 71. 115ms average response times 2 months later