SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
Optimizing MongoDB:
Lessons Learned at Localytics

          Andrew Rollins
            June 2011
           MongoNYC
Me

•   Email: my first name @ localytics.com
•   twitter.com/andrew311
•   andrewrollins.com
•   Founder, Chief Software Architect at Localytics
Localytics

• Real time analytics for mobile applications
• Built on:
  –   Scala
  –   MongoDB
  –   Amazon Web Services
  –   Ruby on Rails
  –   and more…
Why I‟m here: brain dump!

• To share tips, tricks, and gotchas about:
  –   Documents
  –   Indexes
  –   Fragmentation
  –   Migrations
  –   Hardware
  –   MongoDB on AWS
• Basic to more advanced, a compliment to
  MongoDB Perf Tuning at MongoSF 2011
MongoDB at Localytics

• Use cases:
  – Anonymous loyalty information
  – De-duplication of incoming data
• Requirements:
  – High throughput
  – Add capacity without long down-time
• Scale today:
  – Over 1 billion events tracked in May
  – Thousands of MongoDB operations a second
Why MongoDB?

•   Stability
•   Community
•   Support
•   Drivers
•   Ease of use
•   Feature rich
•   Scale out
OPTIMIZE YOUR DATA
Documents and indexes
Shorten names

Bad:
{
    super_happy_fun_awesome_name: “yay!”
}

Good:
{
    s: “yay!”
}
Use BinData for UUIDs/hashes

Bad:
{
    u: “21EC2020-3AEA-1069-A2DD-08002B30309D”,
    // 36 bytes plus field overhead
}

 Good:
{
    u: BinData(0, “…”),
    // 16 bytes plus field overhead
}
Override _id

Turn this
{
    _id : ObjectId("47cc67093475061e3d95369d"),
    u: BinData(0, “…”) // <- this is uniquely indexed
 }
 into
{
    _id : BinData(0, “…”) // was the u field
}

Eliminated an extra index, but be careful about
locality... (more later, see Further Reading at end)
Pack „em in

• Look for cases where you can squish multiple
  “records” into a single document.
• Why?
  – Decreases number of index entries
  – Brings documents closer to the size of a page,
    alleviating potential fragmentation
• Example: comments for a blog post.
Prefix Indexes
Suppose you have an index on a large field, but that field doesn‟t have
many possible values. You can use a “prefix index” to greatly decrease
index size.

find({k: <kval>})
{
    k: BinData(0, “…”),   // 32 byte SHA256, indexed
 }
into find({p: <prefix>, k: <kval>})
{
    k: BinData(0, “…”),   // 28 byte SHA256 suffix, not indexed
    p: <32-bit integer>   // first 4 bytes of k packed in integer, indexed
}

Example: git commits
FRAGMENTATION AND MIGRATION
Hidden evils
Fragmentation

• Data on disk is memory mapped into RAM.
• Mapped in pages (4KB usually).
• Deletes/updates will cause memory
  fragmentation.


    Disk                        RAM
    doc1                        doc1
                 find(doc1)                 Page
   deleted                     deleted
     …                           …
New writes mingle with old data

                     Data
                     doc1
                                  Page
Write docX           docX
                     doc3
                     doc4         Page
                     doc5

find(docX) also pulls in old doc1, wasting RAM
Dealing with fragmentation

• “mongod --repair” on a secondary, swap with
  primary.
• 1.9 has in-place compaction, but this still holds a
  write-lock.
• MongoDB will auto-pad records.
• Pad records yourself by including and then
  removing extra bytes on first insert.
   – Alternative offered in SERVER-1810.
The Dark Side of Migrations

• Chunks are a logical construct, not physical.
• Shard keys have serious implications.
• What could go wrong?
  – Let‟s run through an example.
Suppose the following

     Chunk 1     • K is the shard key
     k: 1 to 5
                 • K is random
     Chunk 2
     k: 6 to 9


     Shard 1
    {k: 3, …}     1st write
    {k: 9, …}     2nd write
    {k: 1, …}     and so on
    {k: 7, …}
    {k: 2, …}
    {k: 8, …}
Migrate

     Chunk 1                 Chunk 1
     k: 1 to 5               k: 1 to 5

     Chunk 2
     k: 6 to 9


    Shard 1
                             Shard 2
    {k: 3, …}
                             {k: 3, …}
    {k: 9, …}    Random IO
                             {k: 1, …}
    {k: 1, …}
                             {k: 2, …}
    {k: 7, …}
    {k: 2, …}
    {k: 8, …}
Shard 1 is now heavily fragmented

     Chunk 1                  Chunk 1
     k: 1 to 5                k: 1 to 5

     Chunk 2
     k: 6 to 9


     Shard 1
                              Shard 2
     {k: 3, …}
                              {k: 3, …}
     {k: 9, …}
                              {k: 1, …}
     {k: 1, …}   Fragmented
                              {k: 2, …}
     {k: 7, …}
     {k: 2, …}
     {k: 8, …}
Why is this scenario bad?

• Random reads
• Massive fragmentation
• New writes mingle with old data
How can we avoid bad migrations?

• Pre-split, pre-chunk
• Better shard keys for better locality
   – Ideally where data in the same chunk tends to be in
     the same region of disk
Pre-split and move

• If you know your key distribution, then pre-create
  your chunks and assign them.
• See this:
  – http://blog.zawodny.com/2011/03/06/mongodb-pre-
    splitting-for-faster-data-loading-and-importing/
Better shard keys

• Usually means including a time prefix in your
  shard key (e.g., {day: 100, id: X})
• Beware of write hotspots
• How to Choose a Shard Key
  – http://www.snailinaturtleneck.com/blog/2011/01/04/ho
    w-to-choose-a-shard-key-the-card-game/
OPTIMIZING HARDWARE/CLOUD
Working Set in RAM
• EC2 m2.2xlarge, RAID0 setup with 16 EBS volumes.
• Workers hammering MongoDB with this loop, growing data:
   – Loop { insert 500 byte record; find random record }
• Thousands of ops per second when in RAM
• Much less throughput when working set (in this case, all data
  and index) grows beyond RAM.
                     Ops per second over time
                                                           In RAM



                                                           Not In RAM
Pre-fetch

• Updates hold a lock while they fetch the original
  from disk.
• Instead do a read to warm the doc in RAM under
  a shared read lock, then update.
Shard per core

• Instead of a shard per server, try a shard per
  core.
• Use this strategy to overcome write locks when
  writes per second matter.
• Why? Because MongoDB has one big write lock.
Amazon EC2

• High throughput / small working set
  – RAM matters, go with high memory instances.
• Low throughput / large working set
  –   Ephemeral storage might be OK.
  –   Remember that EBS IO goes over Ethernet.
  –   Pay attention to IO wait time (iostat).
  –   Your only shot at consistent perf: use the biggest
      instances in a family.
• Read this:
  – http://perfcap.blogspot.com/2011/03/understanding-
    and-using-amazon-ebs.html
Amazon EBS

• ~200 seeks per second per EBS on a good day
• EBS has *much* better random IO perf than
  ephemeral, but adds a dependency
• Use RAID0
• Check out this benchmark:
  – http://orion.heroku.com/past/2009/7/29/io_performanc
    e_on_ebs/
• To understand how to monitor EBS:
  – https://forums.aws.amazon.com/thread.jspa?messag
    eID=124044
Further Reading
•   MongoDB Performance Tuning
     – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
•   Monitoring Tips
     – http://blog.boxedice.com/mongodb-monitoring/
•   Markus‟ manual
     – http://www.markus-gattol.name/ws/mongodb.html
•   Helpful/interesting blog posts
     – http://nosql.mypopescu.com/tagged/mongodb/
•   MongoDB on EC2
     – http://www.slideshare.net/jrosoff/mongodb-on-ec2-and-ebs
•   EC2 and Ephemeral Storage
     – http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws-
       ec2.html
•   MongoDB Strategies for the Disk Averse
     – http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/
•   MongoDB Perf Tuning at MongoSF 2011
     – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
Thank you.

• Check out Localytics for mobile analytics!
• Reach me at:
  – Email: my first name @ localytics.com
  – twitter.com/andrew311
  – andrewrollins.com

Weitere ähnliche Inhalte

Was ist angesagt?

[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기NAVER D2
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerMongoDB
 
End-to-end Troubleshooting Checklist for Microsoft SQL Server
End-to-end Troubleshooting Checklist for Microsoft SQL ServerEnd-to-end Troubleshooting Checklist for Microsoft SQL Server
End-to-end Troubleshooting Checklist for Microsoft SQL ServerKevin Kline
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...DataStax
 
Mongodb - Scaling write performance
Mongodb - Scaling write performanceMongodb - Scaling write performance
Mongodb - Scaling write performanceDaum DNA
 
Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling
 Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling
Scylla Summit 2022: IO Scheduling & NVMe Disk ModellingScyllaDB
 
ProxySQL on Kubernetes
ProxySQL on KubernetesProxySQL on Kubernetes
ProxySQL on KubernetesRené Cannaò
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 
[오픈소스컨설팅]이기종 WAS 클러스터링 솔루션- Athena Dolly
[오픈소스컨설팅]이기종 WAS 클러스터링 솔루션- Athena Dolly[오픈소스컨설팅]이기종 WAS 클러스터링 솔루션- Athena Dolly
[오픈소스컨설팅]이기종 WAS 클러스터링 솔루션- Athena DollyJi-Woong Choi
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters MongoDB
 
Chapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsChapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsnehabsairam
 
Conhecendo Apache Cassandra @Movile
Conhecendo Apache Cassandra  @MovileConhecendo Apache Cassandra  @Movile
Conhecendo Apache Cassandra @MovileEiti Kimura
 
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDBHow to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDBInfluxData
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMariaDB plc
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)SANG WON PARK
 
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow FlightThe Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow FlightDatabricks
 

Was ist angesagt? (20)

[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
 
HBase
HBaseHBase
HBase
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 
End-to-end Troubleshooting Checklist for Microsoft SQL Server
End-to-end Troubleshooting Checklist for Microsoft SQL ServerEnd-to-end Troubleshooting Checklist for Microsoft SQL Server
End-to-end Troubleshooting Checklist for Microsoft SQL Server
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
 
Mongodb - Scaling write performance
Mongodb - Scaling write performanceMongodb - Scaling write performance
Mongodb - Scaling write performance
 
Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling
 Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling
Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling
 
ProxySQL on Kubernetes
ProxySQL on KubernetesProxySQL on Kubernetes
ProxySQL on Kubernetes
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
 
[오픈소스컨설팅]이기종 WAS 클러스터링 솔루션- Athena Dolly
[오픈소스컨설팅]이기종 WAS 클러스터링 솔루션- Athena Dolly[오픈소스컨설팅]이기종 WAS 클러스터링 솔루션- Athena Dolly
[오픈소스컨설팅]이기종 WAS 클러스터링 솔루션- Athena Dolly
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 
Chapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsChapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortals
 
Conhecendo Apache Cassandra @Movile
Conhecendo Apache Cassandra  @MovileConhecendo Apache Cassandra  @Movile
Conhecendo Apache Cassandra @Movile
 
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDBHow to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
 
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow FlightThe Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
 

Andere mochten auch

PEO 101: What You Need To Know!
PEO 101: What You Need To Know!PEO 101: What You Need To Know!
PEO 101: What You Need To Know!ADP, LLC
 
The Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving OutcomesThe Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving OutcomesHealth Catalyst
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Legacy Typesafe (now Lightbend)
 
A Celebration Of Women In Marketing
A Celebration Of Women In MarketingA Celebration Of Women In Marketing
A Celebration Of Women In MarketingAdobe
 
Leading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in HealthcareLeading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in HealthcareHealth Catalyst
 
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...Health Catalyst
 
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...Health Catalyst
 
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to FailHealth Catalyst
 
Splunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat HuntingSplunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat HuntingSplunk
 
The 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management SystemThe 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management SystemHealth Catalyst
 
How to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical StepsHow to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical StepsHealth Catalyst
 
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesPatient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesHealth Catalyst
 
Database vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewDatabase vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewHealth Catalyst
 
Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?Health Catalyst
 

Andere mochten auch (14)

PEO 101: What You Need To Know!
PEO 101: What You Need To Know!PEO 101: What You Need To Know!
PEO 101: What You Need To Know!
 
The Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving OutcomesThe Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving Outcomes
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)
 
A Celebration Of Women In Marketing
A Celebration Of Women In MarketingA Celebration Of Women In Marketing
A Celebration Of Women In Marketing
 
Leading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in HealthcareLeading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in Healthcare
 
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
 
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
 
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
 
Splunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat HuntingSplunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
 
The 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management SystemThe 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management System
 
How to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical StepsHow to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical Steps
 
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesPatient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
 
Database vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewDatabase vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative Review
 
Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?
 

Ähnlich wie Optimizing MongoDB: Lessons Learned at Localytics

Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit
 
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...npinto
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010jbellis
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDBRick Copeland
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsServer Density
 
The Right Data for the Right Job
The Right Data for the Right JobThe Right Data for the Right Job
The Right Data for the Right JobEmily Curtin
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLHyderabad Scalability Meetup
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Burak TUNGUT
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalabilityjbellis
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...MongoDB
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS Chris Harris
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesTanel Poder
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsBenjamin Darfler
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsmarkgrover
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalVigyan Jain
 
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...Glenn K. Lockwood
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesRose Toomey
 

Ähnlich wie Optimizing MongoDB: Lessons Learned at Localytics (20)

Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
 
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
The Right Data for the Right Job
The Right Data for the Right JobThe Right Data for the Right Job
The Right Data for the Right Job
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQL
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localytics
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
 
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark Pipelines
 

Kürzlich hochgeladen

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Kürzlich hochgeladen (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Optimizing MongoDB: Lessons Learned at Localytics

  • 1. Optimizing MongoDB: Lessons Learned at Localytics Andrew Rollins June 2011 MongoNYC
  • 2. Me • Email: my first name @ localytics.com • twitter.com/andrew311 • andrewrollins.com • Founder, Chief Software Architect at Localytics
  • 3. Localytics • Real time analytics for mobile applications • Built on: – Scala – MongoDB – Amazon Web Services – Ruby on Rails – and more…
  • 4. Why I‟m here: brain dump! • To share tips, tricks, and gotchas about: – Documents – Indexes – Fragmentation – Migrations – Hardware – MongoDB on AWS • Basic to more advanced, a compliment to MongoDB Perf Tuning at MongoSF 2011
  • 5. MongoDB at Localytics • Use cases: – Anonymous loyalty information – De-duplication of incoming data • Requirements: – High throughput – Add capacity without long down-time • Scale today: – Over 1 billion events tracked in May – Thousands of MongoDB operations a second
  • 6. Why MongoDB? • Stability • Community • Support • Drivers • Ease of use • Feature rich • Scale out
  • 8. Shorten names Bad: { super_happy_fun_awesome_name: “yay!” } Good: { s: “yay!” }
  • 9. Use BinData for UUIDs/hashes Bad: { u: “21EC2020-3AEA-1069-A2DD-08002B30309D”, // 36 bytes plus field overhead } Good: { u: BinData(0, “…”), // 16 bytes plus field overhead }
  • 10. Override _id Turn this { _id : ObjectId("47cc67093475061e3d95369d"), u: BinData(0, “…”) // <- this is uniquely indexed } into { _id : BinData(0, “…”) // was the u field } Eliminated an extra index, but be careful about locality... (more later, see Further Reading at end)
  • 11. Pack „em in • Look for cases where you can squish multiple “records” into a single document. • Why? – Decreases number of index entries – Brings documents closer to the size of a page, alleviating potential fragmentation • Example: comments for a blog post.
  • 12. Prefix Indexes Suppose you have an index on a large field, but that field doesn‟t have many possible values. You can use a “prefix index” to greatly decrease index size. find({k: <kval>}) { k: BinData(0, “…”), // 32 byte SHA256, indexed } into find({p: <prefix>, k: <kval>}) { k: BinData(0, “…”), // 28 byte SHA256 suffix, not indexed p: <32-bit integer> // first 4 bytes of k packed in integer, indexed } Example: git commits
  • 14. Fragmentation • Data on disk is memory mapped into RAM. • Mapped in pages (4KB usually). • Deletes/updates will cause memory fragmentation. Disk RAM doc1 doc1 find(doc1) Page deleted deleted … …
  • 15. New writes mingle with old data Data doc1 Page Write docX docX doc3 doc4 Page doc5 find(docX) also pulls in old doc1, wasting RAM
  • 16. Dealing with fragmentation • “mongod --repair” on a secondary, swap with primary. • 1.9 has in-place compaction, but this still holds a write-lock. • MongoDB will auto-pad records. • Pad records yourself by including and then removing extra bytes on first insert. – Alternative offered in SERVER-1810.
  • 17. The Dark Side of Migrations • Chunks are a logical construct, not physical. • Shard keys have serious implications. • What could go wrong? – Let‟s run through an example.
  • 18. Suppose the following Chunk 1 • K is the shard key k: 1 to 5 • K is random Chunk 2 k: 6 to 9 Shard 1 {k: 3, …} 1st write {k: 9, …} 2nd write {k: 1, …} and so on {k: 7, …} {k: 2, …} {k: 8, …}
  • 19. Migrate Chunk 1 Chunk 1 k: 1 to 5 k: 1 to 5 Chunk 2 k: 6 to 9 Shard 1 Shard 2 {k: 3, …} {k: 3, …} {k: 9, …} Random IO {k: 1, …} {k: 1, …} {k: 2, …} {k: 7, …} {k: 2, …} {k: 8, …}
  • 20. Shard 1 is now heavily fragmented Chunk 1 Chunk 1 k: 1 to 5 k: 1 to 5 Chunk 2 k: 6 to 9 Shard 1 Shard 2 {k: 3, …} {k: 3, …} {k: 9, …} {k: 1, …} {k: 1, …} Fragmented {k: 2, …} {k: 7, …} {k: 2, …} {k: 8, …}
  • 21. Why is this scenario bad? • Random reads • Massive fragmentation • New writes mingle with old data
  • 22. How can we avoid bad migrations? • Pre-split, pre-chunk • Better shard keys for better locality – Ideally where data in the same chunk tends to be in the same region of disk
  • 23. Pre-split and move • If you know your key distribution, then pre-create your chunks and assign them. • See this: – http://blog.zawodny.com/2011/03/06/mongodb-pre- splitting-for-faster-data-loading-and-importing/
  • 24. Better shard keys • Usually means including a time prefix in your shard key (e.g., {day: 100, id: X}) • Beware of write hotspots • How to Choose a Shard Key – http://www.snailinaturtleneck.com/blog/2011/01/04/ho w-to-choose-a-shard-key-the-card-game/
  • 26. Working Set in RAM • EC2 m2.2xlarge, RAID0 setup with 16 EBS volumes. • Workers hammering MongoDB with this loop, growing data: – Loop { insert 500 byte record; find random record } • Thousands of ops per second when in RAM • Much less throughput when working set (in this case, all data and index) grows beyond RAM. Ops per second over time In RAM Not In RAM
  • 27. Pre-fetch • Updates hold a lock while they fetch the original from disk. • Instead do a read to warm the doc in RAM under a shared read lock, then update.
  • 28. Shard per core • Instead of a shard per server, try a shard per core. • Use this strategy to overcome write locks when writes per second matter. • Why? Because MongoDB has one big write lock.
  • 29. Amazon EC2 • High throughput / small working set – RAM matters, go with high memory instances. • Low throughput / large working set – Ephemeral storage might be OK. – Remember that EBS IO goes over Ethernet. – Pay attention to IO wait time (iostat). – Your only shot at consistent perf: use the biggest instances in a family. • Read this: – http://perfcap.blogspot.com/2011/03/understanding- and-using-amazon-ebs.html
  • 30. Amazon EBS • ~200 seeks per second per EBS on a good day • EBS has *much* better random IO perf than ephemeral, but adds a dependency • Use RAID0 • Check out this benchmark: – http://orion.heroku.com/past/2009/7/29/io_performanc e_on_ebs/ • To understand how to monitor EBS: – https://forums.aws.amazon.com/thread.jspa?messag eID=124044
  • 31. Further Reading • MongoDB Performance Tuning – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning • Monitoring Tips – http://blog.boxedice.com/mongodb-monitoring/ • Markus‟ manual – http://www.markus-gattol.name/ws/mongodb.html • Helpful/interesting blog posts – http://nosql.mypopescu.com/tagged/mongodb/ • MongoDB on EC2 – http://www.slideshare.net/jrosoff/mongodb-on-ec2-and-ebs • EC2 and Ephemeral Storage – http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws- ec2.html • MongoDB Strategies for the Disk Averse – http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/ • MongoDB Perf Tuning at MongoSF 2011 – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
  • 32. Thank you. • Check out Localytics for mobile analytics! • Reach me at: – Email: my first name @ localytics.com – twitter.com/andrew311 – andrewrollins.com