SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
Optimizing MongoDB:
Lessons Learned at Localytics

          Andrew Rollins
            June 2011
           MongoNYC
Me

•   Email: my first name @ localytics.com
•   twitter.com/andrew311
•   andrewrollins.com
•   Founder, Chief Software Architect at Localytics
Localytics

• Real time analytics for mobile applications
• Built on:
  –   Scala
  –   MongoDB
  –   Amazon Web Services
  –   Ruby on Rails
  –   and more…
Why I‟m here: brain dump!

• To share tips, tricks, and gotchas about:
  –   Documents
  –   Indexes
  –   Fragmentation
  –   Migrations
  –   Hardware
  –   MongoDB on AWS
• Basic to more advanced, a compliment to
  MongoDB Perf Tuning at MongoSF 2011
MongoDB at Localytics

• Use cases:
  – Anonymous loyalty information
  – De-duplication of incoming data
• Requirements:
  – High throughput
  – Add capacity without long down-time
• Scale today:
  – Over 1 billion events tracked in May
  – Thousands of MongoDB operations a second
Why MongoDB?

•   Stability
•   Community
•   Support
•   Drivers
•   Ease of use
•   Feature rich
•   Scale out
OPTIMIZE YOUR DATA
Documents and indexes
Shorten names

Bad:
{
    super_happy_fun_awesome_name: “yay!”
}

Good:
{
    s: “yay!”
}
Use BinData for UUIDs/hashes

Bad:
{
    u: “21EC2020-3AEA-1069-A2DD-08002B30309D”,
    // 36 bytes plus field overhead
}

 Good:
{
    u: BinData(0, “…”),
    // 16 bytes plus field overhead
}
Override _id

Turn this
{
    _id : ObjectId("47cc67093475061e3d95369d"),
    u: BinData(0, “…”) // <- this is uniquely indexed
 }
 into
{
    _id : BinData(0, “…”) // was the u field
}

Eliminated an extra index, but be careful about
locality... (more later, see Further Reading at end)
Pack „em in

• Look for cases where you can squish multiple
  “records” into a single document.
• Why?
  – Decreases number of index entries
  – Brings documents closer to the size of a page,
    alleviating potential fragmentation
• Example: comments for a blog post.
Prefix Indexes
Suppose you have an index on a large field, but that field doesn‟t have
many possible values. You can use a “prefix index” to greatly decrease
index size.

find({k: <kval>})
{
    k: BinData(0, “…”),   // 32 byte SHA256, indexed
 }
into find({p: <prefix>, k: <kval>})
{
    k: BinData(0, “…”),   // 28 byte SHA256 suffix, not indexed
    p: <32-bit integer>   // first 4 bytes of k packed in integer, indexed
}

Example: git commits
FRAGMENTATION AND MIGRATION
Hidden evils
Fragmentation

• Data on disk is memory mapped into RAM.
• Mapped in pages (4KB usually).
• Deletes/updates will cause memory
  fragmentation.


    Disk                        RAM
    doc1                        doc1
                 find(doc1)                 Page
   deleted                     deleted
     …                           …
New writes mingle with old data

                     Data
                     doc1
                                  Page
Write docX           docX
                     doc3
                     doc4         Page
                     doc5

find(docX) also pulls in old doc1, wasting RAM
Dealing with fragmentation

• “mongod --repair” on a secondary, swap with
  primary.
• 1.9 has in-place compaction, but this still holds a
  write-lock.
• MongoDB will auto-pad records.
• Pad records yourself by including and then
  removing extra bytes on first insert.
   – Alternative offered in SERVER-1810.
The Dark Side of Migrations

• Chunks are a logical construct, not physical.
• Shard keys have serious implications.
• What could go wrong?
  – Let‟s run through an example.
Suppose the following

     Chunk 1     • K is the shard key
     k: 1 to 5
                 • K is random
     Chunk 2
     k: 6 to 9


     Shard 1
    {k: 3, …}     1st write
    {k: 9, …}     2nd write
    {k: 1, …}     and so on
    {k: 7, …}
    {k: 2, …}
    {k: 8, …}
Migrate

     Chunk 1                 Chunk 1
     k: 1 to 5               k: 1 to 5

     Chunk 2
     k: 6 to 9


    Shard 1
                             Shard 2
    {k: 3, …}
                             {k: 3, …}
    {k: 9, …}    Random IO
                             {k: 1, …}
    {k: 1, …}
                             {k: 2, …}
    {k: 7, …}
    {k: 2, …}
    {k: 8, …}
Shard 1 is now heavily fragmented

     Chunk 1                  Chunk 1
     k: 1 to 5                k: 1 to 5

     Chunk 2
     k: 6 to 9


     Shard 1
                              Shard 2
     {k: 3, …}
                              {k: 3, …}
     {k: 9, …}
                              {k: 1, …}
     {k: 1, …}   Fragmented
                              {k: 2, …}
     {k: 7, …}
     {k: 2, …}
     {k: 8, …}
Why is this scenario bad?

• Random reads
• Massive fragmentation
• New writes mingle with old data
How can we avoid bad migrations?

• Pre-split, pre-chunk
• Better shard keys for better locality
   – Ideally where data in the same chunk tends to be in
     the same region of disk
Pre-split and move

• If you know your key distribution, then pre-create
  your chunks and assign them.
• See this:
  – http://blog.zawodny.com/2011/03/06/mongodb-pre-
    splitting-for-faster-data-loading-and-importing/
Better shard keys

• Usually means including a time prefix in your
  shard key (e.g., {day: 100, id: X})
• Beware of write hotspots
• How to Choose a Shard Key
  – http://www.snailinaturtleneck.com/blog/2011/01/04/ho
    w-to-choose-a-shard-key-the-card-game/
OPTIMIZING HARDWARE/CLOUD
Working Set in RAM
• EC2 m2.2xlarge, RAID0 setup with 16 EBS volumes.
• Workers hammering MongoDB with this loop, growing data:
   – Loop { insert 500 byte record; find random record }
• Thousands of ops per second when in RAM
• Much less throughput when working set (in this case, all data
  and index) grows beyond RAM.
                     Ops per second over time
                                                           In RAM



                                                           Not In RAM
Pre-fetch

• Updates hold a lock while they fetch the original
  from disk.
• Instead do a read to warm the doc in RAM under
  a shared read lock, then update.
Shard per core

• Instead of a shard per server, try a shard per
  core.
• Use this strategy to overcome write locks when
  writes per second matter.
• Why? Because MongoDB has one big write lock.
Amazon EC2

• High throughput / small working set
  – RAM matters, go with high memory instances.
• Low throughput / large working set
  –   Ephemeral storage might be OK.
  –   Remember that EBS IO goes over Ethernet.
  –   Pay attention to IO wait time (iostat).
  –   Your only shot at consistent perf: use the biggest
      instances in a family.
• Read this:
  – http://perfcap.blogspot.com/2011/03/understanding-
    and-using-amazon-ebs.html
Amazon EBS

• ~200 seeks per second per EBS on a good day
• EBS has *much* better random IO perf than
  ephemeral, but adds a dependency
• Use RAID0
• Check out this benchmark:
  – http://orion.heroku.com/past/2009/7/29/io_performanc
    e_on_ebs/
• To understand how to monitor EBS:
  – https://forums.aws.amazon.com/thread.jspa?messag
    eID=124044
Further Reading
•   MongoDB Performance Tuning
     – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
•   Monitoring Tips
     – http://blog.boxedice.com/mongodb-monitoring/
•   Markus‟ manual
     – http://www.markus-gattol.name/ws/mongodb.html
•   Helpful/interesting blog posts
     – http://nosql.mypopescu.com/tagged/mongodb/
•   MongoDB on EC2
     – http://www.slideshare.net/jrosoff/mongodb-on-ec2-and-ebs
•   EC2 and Ephemeral Storage
     – http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws-
       ec2.html
•   MongoDB Strategies for the Disk Averse
     – http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/
•   MongoDB Perf Tuning at MongoSF 2011
     – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
Thank you.

• Check out Localytics for mobile analytics!
• Reach me at:
  – Email: my first name @ localytics.com
  – twitter.com/andrew311
  – andrewrollins.com

Weitere ähnliche Inhalte

Was ist angesagt?

Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAACuneyt Goksu
 
OpenSPARC T1 Processor
OpenSPARC T1 ProcessorOpenSPARC T1 Processor
OpenSPARC T1 ProcessorDVClub
 
Looking towards an official cassandra sidecar netflix
Looking towards an official cassandra sidecar   netflixLooking towards an official cassandra sidecar   netflix
Looking towards an official cassandra sidecar netflixVinay Kumar Chella
 
Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)kakugawa
 
Exadata I/O Resource Manager (Exadata IORM)
Exadata I/O Resource Manager (Exadata IORM)Exadata I/O Resource Manager (Exadata IORM)
Exadata I/O Resource Manager (Exadata IORM)Monowar Mukul
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
 
LF_DPDK_Mellanox bifurcated driver model
LF_DPDK_Mellanox bifurcated driver modelLF_DPDK_Mellanox bifurcated driver model
LF_DPDK_Mellanox bifurcated driver modelLF_DPDK
 
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...Amazon Web Services
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMariaDB plc
 
Oracle rac资源管理算法与cache fusion实现浅析
Oracle rac资源管理算法与cache fusion实现浅析Oracle rac资源管理算法与cache fusion实现浅析
Oracle rac资源管理算法与cache fusion实现浅析frogd
 
PostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesSean Chittenden
 
Druid in Spot Instances
Druid in Spot InstancesDruid in Spot Instances
Druid in Spot InstancesImply
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdfRAHULRAHU8
 
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...Ludovico Caldara
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
SSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLSSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLYoshinori Matsunobu
 

Was ist angesagt? (20)

Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
 
OpenSPARC T1 Processor
OpenSPARC T1 ProcessorOpenSPARC T1 Processor
OpenSPARC T1 Processor
 
Looking towards an official cassandra sidecar netflix
Looking towards an official cassandra sidecar   netflixLooking towards an official cassandra sidecar   netflix
Looking towards an official cassandra sidecar netflix
 
Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)
 
Exadata I/O Resource Manager (Exadata IORM)
Exadata I/O Resource Manager (Exadata IORM)Exadata I/O Resource Manager (Exadata IORM)
Exadata I/O Resource Manager (Exadata IORM)
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Cost Based Oracle Fundamentals
Cost Based Oracle FundamentalsCost Based Oracle Fundamentals
Cost Based Oracle Fundamentals
 
FIREWALLD
FIREWALLDFIREWALLD
FIREWALLD
 
LF_DPDK_Mellanox bifurcated driver model
LF_DPDK_Mellanox bifurcated driver modelLF_DPDK_Mellanox bifurcated driver model
LF_DPDK_Mellanox bifurcated driver model
 
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
Oracle rac资源管理算法与cache fusion实现浅析
Oracle rac资源管理算法与cache fusion实现浅析Oracle rac资源管理算法与cache fusion实现浅析
Oracle rac资源管理算法与cache fusion实现浅析
 
Treinamento Oracle GoldenGate 19c
Treinamento Oracle GoldenGate 19cTreinamento Oracle GoldenGate 19c
Treinamento Oracle GoldenGate 19c
 
PostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practices
 
Druid in Spot Instances
Druid in Spot InstancesDruid in Spot Instances
Druid in Spot Instances
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdf
 
Data servers
Data serversData servers
Data servers
 
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
SSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLSSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQL
 

Andere mochten auch

PEO 101: What You Need To Know!
PEO 101: What You Need To Know!PEO 101: What You Need To Know!
PEO 101: What You Need To Know!ADP, LLC
 
The Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving OutcomesThe Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving OutcomesHealth Catalyst
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Legacy Typesafe (now Lightbend)
 
A Celebration Of Women In Marketing
A Celebration Of Women In MarketingA Celebration Of Women In Marketing
A Celebration Of Women In MarketingAdobe
 
Leading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in HealthcareLeading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in HealthcareHealth Catalyst
 
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...Health Catalyst
 
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...Health Catalyst
 
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to FailHealth Catalyst
 
Splunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat HuntingSplunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat HuntingSplunk
 
The 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management SystemThe 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management SystemHealth Catalyst
 
How to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical StepsHow to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical StepsHealth Catalyst
 
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesPatient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesHealth Catalyst
 
Database vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewDatabase vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewHealth Catalyst
 
Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?Health Catalyst
 

Andere mochten auch (14)

PEO 101: What You Need To Know!
PEO 101: What You Need To Know!PEO 101: What You Need To Know!
PEO 101: What You Need To Know!
 
The Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving OutcomesThe Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving Outcomes
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)
 
A Celebration Of Women In Marketing
A Celebration Of Women In MarketingA Celebration Of Women In Marketing
A Celebration Of Women In Marketing
 
Leading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in HealthcareLeading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in Healthcare
 
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
 
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
 
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
 
Splunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat HuntingSplunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
 
The 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management SystemThe 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management System
 
How to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical StepsHow to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical Steps
 
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesPatient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
 
Database vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewDatabase vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative Review
 
Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?
 

Ähnlich wie Optimizing MongoDB: Lessons Learned at Localytics

Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit
 
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...npinto
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010jbellis
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDBRick Copeland
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsServer Density
 
The Right Data for the Right Job
The Right Data for the Right JobThe Right Data for the Right Job
The Right Data for the Right JobEmily Curtin
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLHyderabad Scalability Meetup
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Burak TUNGUT
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalabilityjbellis
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...MongoDB
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS Chris Harris
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesTanel Poder
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsBenjamin Darfler
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsmarkgrover
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalVigyan Jain
 
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...Glenn K. Lockwood
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesRose Toomey
 

Ähnlich wie Optimizing MongoDB: Lessons Learned at Localytics (20)

Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
 
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
The Right Data for the Right Job
The Right Data for the Right JobThe Right Data for the Right Job
The Right Data for the Right Job
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQL
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localytics
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
 
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark Pipelines
 

Kürzlich hochgeladen

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Optimizing MongoDB: Lessons Learned at Localytics

  • 1. Optimizing MongoDB: Lessons Learned at Localytics Andrew Rollins June 2011 MongoNYC
  • 2. Me • Email: my first name @ localytics.com • twitter.com/andrew311 • andrewrollins.com • Founder, Chief Software Architect at Localytics
  • 3. Localytics • Real time analytics for mobile applications • Built on: – Scala – MongoDB – Amazon Web Services – Ruby on Rails – and more…
  • 4. Why I‟m here: brain dump! • To share tips, tricks, and gotchas about: – Documents – Indexes – Fragmentation – Migrations – Hardware – MongoDB on AWS • Basic to more advanced, a compliment to MongoDB Perf Tuning at MongoSF 2011
  • 5. MongoDB at Localytics • Use cases: – Anonymous loyalty information – De-duplication of incoming data • Requirements: – High throughput – Add capacity without long down-time • Scale today: – Over 1 billion events tracked in May – Thousands of MongoDB operations a second
  • 6. Why MongoDB? • Stability • Community • Support • Drivers • Ease of use • Feature rich • Scale out
  • 8. Shorten names Bad: { super_happy_fun_awesome_name: “yay!” } Good: { s: “yay!” }
  • 9. Use BinData for UUIDs/hashes Bad: { u: “21EC2020-3AEA-1069-A2DD-08002B30309D”, // 36 bytes plus field overhead } Good: { u: BinData(0, “…”), // 16 bytes plus field overhead }
  • 10. Override _id Turn this { _id : ObjectId("47cc67093475061e3d95369d"), u: BinData(0, “…”) // <- this is uniquely indexed } into { _id : BinData(0, “…”) // was the u field } Eliminated an extra index, but be careful about locality... (more later, see Further Reading at end)
  • 11. Pack „em in • Look for cases where you can squish multiple “records” into a single document. • Why? – Decreases number of index entries – Brings documents closer to the size of a page, alleviating potential fragmentation • Example: comments for a blog post.
  • 12. Prefix Indexes Suppose you have an index on a large field, but that field doesn‟t have many possible values. You can use a “prefix index” to greatly decrease index size. find({k: <kval>}) { k: BinData(0, “…”), // 32 byte SHA256, indexed } into find({p: <prefix>, k: <kval>}) { k: BinData(0, “…”), // 28 byte SHA256 suffix, not indexed p: <32-bit integer> // first 4 bytes of k packed in integer, indexed } Example: git commits
  • 14. Fragmentation • Data on disk is memory mapped into RAM. • Mapped in pages (4KB usually). • Deletes/updates will cause memory fragmentation. Disk RAM doc1 doc1 find(doc1) Page deleted deleted … …
  • 15. New writes mingle with old data Data doc1 Page Write docX docX doc3 doc4 Page doc5 find(docX) also pulls in old doc1, wasting RAM
  • 16. Dealing with fragmentation • “mongod --repair” on a secondary, swap with primary. • 1.9 has in-place compaction, but this still holds a write-lock. • MongoDB will auto-pad records. • Pad records yourself by including and then removing extra bytes on first insert. – Alternative offered in SERVER-1810.
  • 17. The Dark Side of Migrations • Chunks are a logical construct, not physical. • Shard keys have serious implications. • What could go wrong? – Let‟s run through an example.
  • 18. Suppose the following Chunk 1 • K is the shard key k: 1 to 5 • K is random Chunk 2 k: 6 to 9 Shard 1 {k: 3, …} 1st write {k: 9, …} 2nd write {k: 1, …} and so on {k: 7, …} {k: 2, …} {k: 8, …}
  • 19. Migrate Chunk 1 Chunk 1 k: 1 to 5 k: 1 to 5 Chunk 2 k: 6 to 9 Shard 1 Shard 2 {k: 3, …} {k: 3, …} {k: 9, …} Random IO {k: 1, …} {k: 1, …} {k: 2, …} {k: 7, …} {k: 2, …} {k: 8, …}
  • 20. Shard 1 is now heavily fragmented Chunk 1 Chunk 1 k: 1 to 5 k: 1 to 5 Chunk 2 k: 6 to 9 Shard 1 Shard 2 {k: 3, …} {k: 3, …} {k: 9, …} {k: 1, …} {k: 1, …} Fragmented {k: 2, …} {k: 7, …} {k: 2, …} {k: 8, …}
  • 21. Why is this scenario bad? • Random reads • Massive fragmentation • New writes mingle with old data
  • 22. How can we avoid bad migrations? • Pre-split, pre-chunk • Better shard keys for better locality – Ideally where data in the same chunk tends to be in the same region of disk
  • 23. Pre-split and move • If you know your key distribution, then pre-create your chunks and assign them. • See this: – http://blog.zawodny.com/2011/03/06/mongodb-pre- splitting-for-faster-data-loading-and-importing/
  • 24. Better shard keys • Usually means including a time prefix in your shard key (e.g., {day: 100, id: X}) • Beware of write hotspots • How to Choose a Shard Key – http://www.snailinaturtleneck.com/blog/2011/01/04/ho w-to-choose-a-shard-key-the-card-game/
  • 26. Working Set in RAM • EC2 m2.2xlarge, RAID0 setup with 16 EBS volumes. • Workers hammering MongoDB with this loop, growing data: – Loop { insert 500 byte record; find random record } • Thousands of ops per second when in RAM • Much less throughput when working set (in this case, all data and index) grows beyond RAM. Ops per second over time In RAM Not In RAM
  • 27. Pre-fetch • Updates hold a lock while they fetch the original from disk. • Instead do a read to warm the doc in RAM under a shared read lock, then update.
  • 28. Shard per core • Instead of a shard per server, try a shard per core. • Use this strategy to overcome write locks when writes per second matter. • Why? Because MongoDB has one big write lock.
  • 29. Amazon EC2 • High throughput / small working set – RAM matters, go with high memory instances. • Low throughput / large working set – Ephemeral storage might be OK. – Remember that EBS IO goes over Ethernet. – Pay attention to IO wait time (iostat). – Your only shot at consistent perf: use the biggest instances in a family. • Read this: – http://perfcap.blogspot.com/2011/03/understanding- and-using-amazon-ebs.html
  • 30. Amazon EBS • ~200 seeks per second per EBS on a good day • EBS has *much* better random IO perf than ephemeral, but adds a dependency • Use RAID0 • Check out this benchmark: – http://orion.heroku.com/past/2009/7/29/io_performanc e_on_ebs/ • To understand how to monitor EBS: – https://forums.aws.amazon.com/thread.jspa?messag eID=124044
  • 31. Further Reading • MongoDB Performance Tuning – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning • Monitoring Tips – http://blog.boxedice.com/mongodb-monitoring/ • Markus‟ manual – http://www.markus-gattol.name/ws/mongodb.html • Helpful/interesting blog posts – http://nosql.mypopescu.com/tagged/mongodb/ • MongoDB on EC2 – http://www.slideshare.net/jrosoff/mongodb-on-ec2-and-ebs • EC2 and Ephemeral Storage – http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws- ec2.html • MongoDB Strategies for the Disk Averse – http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/ • MongoDB Perf Tuning at MongoSF 2011 – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
  • 32. Thank you. • Check out Localytics for mobile analytics! • Reach me at: – Email: my first name @ localytics.com – twitter.com/andrew311 – andrewrollins.com