Real-time Location Based Social Discovery using MongoDB

Fredrik Björk
Director of Engineering
MongoSV, Dec 4th 2012
What is Banjo?
• The most powerful location based mobile
  technology that brings you the moments
  you would otherwise miss
• Aggregates geo tagged posts from
  Facebook, Twitter, Instagram and
  Foursquare in real-time
Stats
•   Launched June 2011
•   3 million users
•   Social graph of 400 million profiles
•   50 billion connections
•   ~200 geo posts created per second




Why MongoDB?
• Developer friendly
• Easy to maintain and scale
• Automatic failover
• Rapid prototyping of features
• Good fit for consuming, storing and
  presenting JSON data
• Geospatial features out of the box


Infrastructure
• ~160 EC2 instances (75% MongoDB, 25%
  Redis)
• SSD drives for low latency
• App servers (Sinatra & Rails) hosted on
  Heroku
• Mongos with authentication running on
  dedicated servers



Geo tagged posts
• Consumed as JSON from social network
  APIs - streaming, polling & real-time
  callbacks
• Exposed via REST APIs as JSON to the
  Banjo iOS and Android apps




Schema design




https://twitter.com/fbjork/status/262989592561606656




• _id is composed of provider (Facebook:
  1, Twitter: 2 etc.) and post id for
  uniqueness

https://twitter.com/fbjork/status/262989592561606656

> db.posts.find({ _id: '2:262989592561606656' })

{
    _id: "2:262989592561606656",
    username: "fbjork",
    text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
    ...
}
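
As a minimal sketch of how such a composite key could be built, the Ruby helper below joins a numeric provider code and the provider's post id with a colon. The `PROVIDER_CODES` table and the `composite_post_id` name are illustrative only; the slides give codes for Facebook and Twitter and nothing else, so no other providers are listed.

```ruby
# Provider codes taken from the slide. Codes for other providers are not
# given there, so they are deliberately left out of this sketch.
PROVIDER_CODES = { facebook: 1, twitter: 2 }.freeze

# Build an _id such as "2:262989592561606656" from a provider and a post id.
# (Hypothetical helper name; not part of any Banjo or MongoDB API.)
def composite_post_id(provider, post_id)
  code = PROVIDER_CODES.fetch(provider) { raise ArgumentError, "unknown provider: #{provider.inspect}" }
  "#{code}:#{post_id}"
end

puts composite_post_id(:twitter, 262989592561606656) # prints 2:262989592561606656
```

Because post ids are only unique within one network, prefixing the provider code keeps ids from different networks from colliding in the same collection.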
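• Coordinates are stored inside an array
  as [latitude, longitude]
• Note: MongoDB's documentation recommends
  [longitude, latitude] order; whichever is
  used, keep it consistent between documents
  and queries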


{
    _id: "2:262989592561606656",
    username: "fbjork",
    text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
    coordinates: [37.784234,-122.438212],
    ...
}




• Friends are stored inside an array



{
    _id: "2:262989592561606656",
    username: "fbjork",
    text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
    coordinates: [37.784234,-122.438212],
    friend_ids: [8816792, 10324882, 2006261, ...]
}




Geospatial Indexing
• Create the geo index:

> db.posts.ensureIndex( { coordinates: '2d' } )




Find nearby posts in Miami:

> db.posts.find( { coordinates: { $near: [25.792627,-80.226142] } } )

{ _id: "2:809438082", coordinates: [25.792610,-80.226100], username: "Rebecca_Boorsma", text: "I love Miami!", ... }

{ _id: "2:1234567", coordinates: [25.781324,-80.431423], username: "foo", text: "Another day, another dollar", ... }
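
To make the ordering of that result concrete: a `2d` index treats coordinates as points on a flat plane, and `$near` returns the closest points first. The sketch below reproduces that ordering for the two example posts with a brute-force sort in plain Ruby. It illustrates the semantics only and is not how MongoDB evaluates the query, since the server uses the index rather than scanning every document. The `flat_distance` and `nearest_first` names are made up for this example.

```ruby
# Straight-line (Euclidean) distance between two coordinate pairs, in degrees.
def flat_distance(a, b)
  Math.hypot(a[0] - b[0], a[1] - b[1])
end

# Sort posts by distance from the query point, closest first.
def nearest_first(posts, point)
  posts.sort_by { |post| flat_distance(post[:coordinates], point) }
end

posts = [
  { _id: "2:1234567",   coordinates: [25.781324, -80.431423] },
  { _id: "2:809438082", coordinates: [25.792610, -80.226100] }
]
miami = [25.792627, -80.226142]

# Prints 2:809438082 first, then 2:1234567
nearest_first(posts, miami).each { |post| puts post[:_id] }
```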




Find friend posts globally:

> db.posts.find({ friend_ids: { $in: [2006261] } })


{
    _id: “2:10248172”,
    username: “fbjork”,
    friend_ids: [8816792, 10324882, 2006261, ...],
    ...
}




Find friend posts in a location:

> db.posts.find({ coordinates: { $near: [25.792627,-80.226142] },
                  friend_ids: { $in: [2006261] } })


{
    _id: “2:10248172”,
    username: “fbjork”,
    friend_ids: [8816792, 10324882, 2006261, ...],
    ...
}



Compound geo indexes
• Create a compound index on coordinates
  and friend_ids:

> db.posts.ensureIndex( { coordinates: '2d', friend_ids: 1 } )




• Fails for compound indexes with large
  arrays
• Index keys have a size limit of about
  1000 bytes (1024 bytes in MongoDB 2.2)

> db.posts.ensureIndex( { coordinates: '2d', friend_ids: 1 } )

Error: Key too large to index




Geospatial query performance
• Do we need a compound index at all?
• Geospatial index is usually restrictive
  enough
• Problem: Array traversal (using $in) is
  CPU hungry for large arrays
• Solution: Pre-sharded array fields




Pre-sharded array fields
• For large arrays, e.g. @BarackObama's
  follower ids
• Partition fields using pre-sharding
• shard = Hash(key) MOD shard_count
• Keep array sizes in the low hundreds




# shard_example.rb
require 'zlib'

SHARDS = 3
friend_ids = [1000, 1001, 1002, 1003, 1004, 1005, 1006]
# Print the shard index for each id: CRC32 of the id, modulo the shard count
friend_ids.each { |f| puts Zlib.crc32(f.to_s) % SHARDS }
0
2
0
2
1
2
0


{
    friends_0: [1000, 1002, 1006],
    friends_1: [1004],
    friends_2: [1001, 1003, 1005]
}
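
The step from a flat id list to the `friends_N` fields can be sketched in a few lines of Ruby. This follows the CRC32-modulo scheme from `shard_example.rb`; the `shard_for` and `presharded_fields` helper names and the `SHARD_COUNT` constant are illustrative, not taken from the talk.

```ruby
require 'zlib'

SHARD_COUNT = 3

# shard = Hash(key) MOD shard_count, using CRC32 as the hash.
def shard_for(friend_id, shard_count = SHARD_COUNT)
  Zlib.crc32(friend_id.to_s) % shard_count
end

# Group ids by shard and name each group friends_<shard>.
# Shards that receive no ids are simply absent from the result.
def presharded_fields(friend_ids, shard_count = SHARD_COUNT)
  friend_ids
    .group_by { |id| shard_for(id, shard_count) }
    .map { |shard, ids| ["friends_#{shard}", ids] }
    .sort
    .to_h
end

p presharded_fields([1000, 1001, 1002, 1003, 1004, 1005, 1006])
```

Raising the shard count shortens each array, which is how array sizes can be kept in the low hundreds, but it also means every document has to be rewritten if the count changes later.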

Find friend posts using pre-sharding of the friend arrays:

> db.posts.find({ coordinates: { $near: [25.792627,-80.226142] },
                  friends_0: { $in: [1000] } })

{
    friends_0: [1000, 1002, 1006],
    friends_1: [1004],
    friends_2: [1001, 1003, 1005]
}
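
Since the field to query depends on the id being looked up, the filter has to be built per request. A minimal sketch in Ruby, assuming the same CRC32-modulo scheme and three shards as above (`friend_posts_near_filter` is a hypothetical helper; it only builds the filter hash and does not talk to a database):

```ruby
require 'zlib'

SHARD_COUNT = 3

# Build a find() filter for posts near `point` whose friend arrays contain
# `friend_id`, touching only the one friends_N array the id can live in.
def friend_posts_near_filter(point, friend_id, shard_count = SHARD_COUNT)
  shard = Zlib.crc32(friend_id.to_s) % shard_count
  {
    "coordinates" => { "$near" => point },
    "friends_#{shard}" => { "$in" => [friend_id] }
  }
end

p friend_posts_near_filter([25.792627, -80.226142], 1000)
```

The resulting hash has the same shape as the shell query above and could be handed to a driver's find call.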




Capped collections
• Good fit for storing a feed of posts for a
  period of time
• Eliminates need to expire old posts
• Documents can’t grow
• Documents can’t be deleted
• Resizing collections is painful
• Can’t be sharded
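
The behaviour behind the first two bullets can be pictured as a fixed-size buffer that overwrites its oldest entries. The Ruby class below is only an in-memory analogy: the `CappedFeed` name and its document-count limit are assumptions for illustration, whereas real capped collections are sized in bytes, optionally with a maximum document count.

```ruby
# In-memory analogy for a capped collection: insertion order is kept and the
# oldest entry is dropped once the capacity is reached.
class CappedFeed
  def initialize(max_docs)
    raise ArgumentError, "max_docs must be positive" unless max_docs.positive?
    @max_docs = max_docs
    @docs = []
  end

  def insert(doc)
    @docs.shift if @docs.size >= @max_docs # evict the oldest document
    @docs << doc
    self
  end

  def to_a
    @docs.dup
  end
end

feed = CappedFeed.new(3)
%w[a b c d e].each { |text| feed.insert({ text: text }) }
p feed.to_a.map { |doc| doc[:text] } # prints ["c", "d", "e"]
```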


TTL collections
• We switched to TTL collections with
  MongoDB 2.2
• Deleting and growing documents is now
  possible
• Easier to change expiration times
• Can be sharded (not by geo)
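
A TTL collection is an ordinary collection with a TTL index: an index on a date field created with the `expireAfterSeconds` option, after which a background task removes documents whose date is older than that many seconds. The Ruby sketch below mimics that rule in memory. The `created_at` field name and the one-hour lifetime are assumptions, not values from the talk, and `expired?` and `sweep` are hypothetical helpers.

```ruby
EXPIRE_AFTER_SECONDS = 3600 # assumed lifetime; the talk does not state one

# A document is due for removal once its date field is older than the TTL.
def expired?(doc, now, ttl = EXPIRE_AFTER_SECONDS)
  doc[:created_at] + ttl <= now
end

# Keep only the documents a TTL pass would leave behind.
def sweep(docs, now, ttl = EXPIRE_AFTER_SECONDS)
  docs.reject { |doc| expired?(doc, now, ttl) }
end

now = Time.utc(2012, 12, 4, 12, 0, 0)
docs = [
  { _id: "2:1", created_at: now - 7200 }, # two hours old
  { _id: "2:2", created_at: now - 60 }    # one minute old
]
p sweep(docs, now).map { |doc| doc[:_id] } # prints ["2:2"]
```

In this picture, changing the expiration time is a matter of changing one number, whereas a capped collection's retention is tied to its fixed size.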




Questions




Thank you!


Available: iPhone and Android
fredrik@teambanjo.com
@fbjork
