Optimizing MongoDB:
Lessons Learned at Localytics

          Andrew Rollins
            June 2011
           MongoNYC
Me

•   Email: my first name @ localytics.com
•   twitter.com/andrew311
•   andrewrollins.com
•   Founder, Chief Software Architect at Localytics
Localytics

• Real time analytics for mobile applications
• Built on:
  –   Scala
  –   MongoDB
  –   Amazon Web Services
  –   Ruby on Rails
  –   and more…
Why I'm here: brain dump!

• To share tips, tricks, and gotchas about:
  –   Documents
  –   Indexes
  –   Fragmentation
  –   Migrations
  –   Hardware
  –   MongoDB on AWS
• Basic to more advanced, a complement to
  MongoDB Perf Tuning at MongoSF 2011
MongoDB at Localytics

• Use cases:
  – Anonymous loyalty information
  – De-duplication of incoming data
• Requirements:
  – High throughput
  – Add capacity without long down-time
• Scale today:
  – Over 1 billion events tracked in May
  – Thousands of MongoDB operations a second
Why MongoDB?

•   Stability
•   Community
•   Support
•   Drivers
•   Ease of use
•   Feature rich
•   Scale out
OPTIMIZE YOUR DATA
Documents and indexes
Shorten names

Bad:
{
    super_happy_fun_awesome_name: "yay!"
}

Good:
{
    s: "yay!"
}
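BSON stores every field name inside every document, so a long name costs its length in bytes on each record. A minimal sketch, assuming only string-valued fields: this hand-rolled encoder follows the BSON spec's layout for a document of string elements (it is not a driver API), so you can compare the two documents above byte for byte.

```python
import struct


def encode_string_doc(doc: dict[str, str]) -> bytes:
    """Encode a flat document of string fields as BSON bytes.

    Document: int32 total size, the elements, then a 0x00 terminator.
    String element: type byte 0x02, field name as a NUL-terminated C string,
    int32 length (UTF-8 bytes + 1), the bytes, then 0x00.
    """
    body = b""
    for name, value in doc.items():
        raw = value.encode("utf-8") + b"\x00"
        body += (
            b"\x02"
            + name.encode("utf-8") + b"\x00"
            + struct.pack("<i", len(raw))
            + raw
        )
    # +4 for the size prefix itself, +1 for the trailing terminator
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


long_name = encode_string_doc({"super_happy_fun_awesome_name": "yay!"})
short_name = encode_string_doc({"s": "yay!"})
print(len(long_name), len(short_name))  # 44 17
```

The 27-byte difference is exactly the extra characters in the field name, and it is paid again on every document in the collection.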
Use BinData for UUIDs/hashes

Bad:
{
    u: "21EC2020-3AEA-1069-A2DD-08002B30309D",
    // 36 bytes plus field overhead
}

Good:
{
    u: BinData(0, "…"),
    // 16 bytes plus field overhead
}
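A minimal sketch of the saving, using only the standard library: the UUID text is parsed into its 16 raw bytes, and the value sizes follow the BSON spec (a string carries a length prefix and a trailing NUL; binary carries a length prefix and a subtype byte). With pymongo, wrapping the raw bytes as `bson.binary.Binary(raw, 0)` should store them as BinData subtype 0; that driver call is not exercised here.

```python
import uuid


def string_value_size(s: str) -> int:
    """BSON string value: int32 length + UTF-8 bytes + trailing NUL."""
    return 4 + len(s.encode("utf-8")) + 1


def binary_value_size(b: bytes) -> int:
    """BSON binary value: int32 length + subtype byte + raw bytes."""
    return 4 + 1 + len(b)


text = "21EC2020-3AEA-1069-A2DD-08002B30309D"
raw = uuid.UUID(text).bytes  # the same UUID as 16 raw bytes
print(string_value_size(text), binary_value_size(raw))  # 41 21
```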
Override _id

Turn this
{
    _id : ObjectId("47cc67093475061e3d95369d"),
    u: BinData(0, "…") // <- this is uniquely indexed
}
into
{
    _id : BinData(0, "…") // was the u field
}

This eliminates an extra index, but be careful about
locality (more later; see Further Reading at the end).
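Why locality is at stake: a default ObjectId begins with a 4-byte big-endian timestamp (seconds since the Unix epoch), so new `_id` values arrive in roughly increasing order and land at one edge of the `_id` index. A random UUID or hash used as `_id` scatters inserts across the whole index instead. This sketch only decodes the timestamp prefix of the ObjectId shown above, using the standard library rather than a driver.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time encoded in an ObjectId's first 4 bytes."""
    if len(oid_hex) != 24:
        raise ValueError("ObjectId hex strings are 24 characters long")
    seconds = int(oid_hex[:8], 16)  # big-endian seconds since the epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_timestamp("47cc67093475061e3d95369d"))
```

ObjectIds created a second apart compare in creation order; two SHA-256 digests do not, which is the trade-off to weigh before overriding `_id`.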
Pack 'em in

• Look for cases where you can squish multiple
  "records" into a single document.
• Why?
  – Decreases number of index entries
  – Brings documents closer to the size of a page,
    alleviating potential fragmentation
• Example: comments for a blog post.
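A minimal sketch of the blog-comments example, assuming comments arrive as flat records with hypothetical `post_id`, `author` and `text` fields: grouping them by post yields one document per post, so the `_id` index holds one entry per post rather than one per comment.

```python
from collections import defaultdict


def pack_comments(comments: list[dict]) -> list[dict]:
    """Group flat comment records into one document per blog post."""
    by_post: dict[object, list[dict]] = defaultdict(list)
    for c in comments:
        # Short field names, per the "Shorten names" tip.
        by_post[c["post_id"]].append({"a": c["author"], "t": c["text"]})
    return [{"_id": post_id, "c": items} for post_id, items in by_post.items()]


flat = [
    {"post_id": 1, "author": "ann", "text": "First!"},
    {"post_id": 2, "author": "bob", "text": "Nice post"},
    {"post_id": 1, "author": "cat", "text": "Agreed"},
]
packed = pack_comments(flat)
print(len(flat), "records ->", len(packed), "documents")  # 3 records -> 2 documents
```

Adding a later comment would be an update with `$push` on the `c` array; a document that grows this way may need to move on disk, which is where the padding advice in the fragmentation section comes in.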
Prefix Indexes
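Suppose you have an index on a large field, such as a hash, but the collection
holds far fewer distinct values than the field can represent. A short "prefix
index" can then identify documents almost uniquely while greatly decreasing
index size.

Turn this
find({k: <kval>})
{
    k: BinData(0, "…"),   // 32 byte SHA256, indexed
}
into find({p: <prefix>, k: <suffix>})
{
    k: BinData(0, "…"),   // 28 byte SHA256 suffix, not indexed
    p: <32-bit integer>   // first 4 bytes of the hash packed in an integer, indexed
}

The index on p narrows the search to a few candidates; matching k resolves
any prefix collisions.

Example: git commits, which can be referenced by a short prefix of their hash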
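A minimal sketch of building such a document, assuming the field names `p` and `k` from the slide and SHA-256 over some hypothetical input bytes. Packing the prefix as a *signed* 32-bit integer is our choice, so the value stays within BSON's int32 range; only `p` would be indexed (in pymongo, something like `collection.create_index("p")`).

```python
import hashlib


def prefix_key(data: bytes) -> dict:
    """Split a SHA-256 digest into an indexed int32 prefix and a 28-byte suffix."""
    digest = hashlib.sha256(data).digest()  # 32 bytes
    # signed=True keeps the value inside BSON's int32 range
    prefix = int.from_bytes(digest[:4], "big", signed=True)
    return {"p": prefix, "k": digest[4:]}


doc = prefix_key(b"some commit contents")
# Query on both fields: p uses the small index, k disambiguates collisions.
query = {"p": doc["p"], "k": doc["k"]}
```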
FRAGMENTATION AND MIGRATION
Hidden evils
Fragmentation

• Data files on disk are memory-mapped into RAM.
• Mapped in pages (usually 4KB).
• Deletes, and updates that move documents, leave
  holes that fragment both the files and RAM.


    (Diagram: find(doc1) maps the page holding doc1 from disk into RAM, and the deleted space beside doc1 comes along with it.)
New writes mingle with old data

    (Diagram: "Write docX" places docX in free space between doc1 and doc3, so docX shares a page with older documents.)

find(docX) also pulls in old doc1, wasting RAM
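A toy model of the cost, assuming fixed-size documents and 4KB pages: count how many pages RAM must hold to serve a set of hot documents. The offsets are invented for illustration, not measured from MongoDB data files.

```python
PAGE_SIZE = 4096  # typical page size, per the slide


def pages_touched(offsets: list[int], doc_size: int) -> int:
    """Count distinct pages needed in RAM to read docs at these file offsets."""
    pages: set[int] = set()
    for off in offsets:
        first = off // PAGE_SIZE
        last = (off + doc_size - 1) // PAGE_SIZE
        pages.update(range(first, last + 1))
    return len(pages)


# Eight hot 512-byte docs written back to back share one page...
compact = [i * 512 for i in range(8)]
# ...but interleaved with old or deleted data, each pulls in its own page.
fragmented = [i * PAGE_SIZE for i in range(8)]
print(pages_touched(compact, 512), pages_touched(fragmented, 512))  # 1 8
```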
Dealing with fragmentation

• Run "mongod --repair" on a secondary, then swap it
  with the primary.
• 1.9 has in-place compaction, but it still holds the
  write lock.
• MongoDB will auto-pad records.
• Pad records yourself: include extra bytes on the
  first insert, then remove them.
   – An alternative is proposed in SERVER-1810.
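A minimal sketch of manual padding, assuming the storage behavior the slide relies on (a record keeps its allocated space after a field is removed). The filler field name `_pad` and the pad size are hypothetical. With pymongo you might run `coll.insert_one(padded(doc, 256))` followed by `coll.update_one({"_id": doc["_id"]}, UNSET_PAD)`.

```python
PAD_FIELD = "_pad"  # hypothetical filler field name


def padded(doc: dict, pad_bytes: int) -> dict:
    """Copy of doc with a filler string that reserves room for later growth."""
    out = dict(doc)  # leave the caller's document untouched
    out[PAD_FIELD] = "x" * pad_bytes
    return out


# Update spec that strips the filler right after the first insert.
UNSET_PAD = {"$unset": {PAD_FIELD: ""}}
```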
The Dark Side of Migrations

• Chunks are a logical construct (a range of shard key
  values), not a physical region of disk.
• Shard keys have serious implications.
• What could go wrong?
  – Let's run through an example.
Suppose the following

• k is the shard key
• k is random
• Chunk 1 covers k: 1 to 5; Chunk 2 covers k: 6 to 9
• Both chunks live on Shard 1, which stores documents
  in write order:

     Shard 1
    {k: 3, …}     1st write
    {k: 9, …}     2nd write
    {k: 1, …}     and so on
    {k: 7, …}
    {k: 2, …}
    {k: 8, …}
Migrate

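Chunk 1 (k: 1 to 5) moves from Shard 1 to Shard 2;
Chunk 2 (k: 6 to 9) stays on Shard 1.

     Shard 1            Shard 2
    {k: 3, …}   --->   {k: 3, …}
    {k: 9, …}
    {k: 1, …}   --->   {k: 1, …}
    {k: 7, …}
    {k: 2, …}   --->   {k: 2, …}
    {k: 8, …}

Reading Chunk 1 off Shard 1 is random IO: its documents
are scattered among Chunk 2's.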
Shard 1 is now heavily fragmented

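Chunk 1 (k: 1 to 5) now lives on Shard 2;
Chunk 2 (k: 6 to 9) remains on Shard 1.

     Shard 1                  Shard 2
    (deleted, was k: 3)      {k: 3, …}
    {k: 9, …}                {k: 1, …}
    (deleted, was k: 1)      {k: 2, …}
    {k: 7, …}
    (deleted, was k: 2)
    {k: 8, …}

After the migration, Shard 1 deletes the moved documents,
leaving holes between the documents it keeps.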
Why is this scenario bad?

• Random reads
• Massive fragmentation
• New writes mingle with old data
How can we avoid bad migrations?

• Pre-split, pre-chunk
• Better shard keys for better locality
   – Ideally where data in the same chunk tends to be in
     the same region of disk
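A rough way to reason about locality, assuming (as a simplification) that documents sit on disk in write order: count how many separate runs of disk hold one chunk's documents. One run means a migration reads one sequential region; more runs mean more seeks. The first call reproduces the example above; the second uses hypothetical `(day, id)` keys.

```python
from collections.abc import Callable, Iterable


def runs_for_chunk(write_order: Iterable, in_chunk: Callable[[object], bool]) -> int:
    """Count contiguous on-disk runs holding a chunk's documents."""
    runs = 0
    previous = False
    for key in write_order:
        current = in_chunk(key)
        if current and not previous:  # a new run of chunk members starts here
            runs += 1
        previous = current
    return runs


# The example above: random k, Chunk 1 = k 1 to 5.
print(runs_for_chunk([3, 9, 1, 7, 2, 8], lambda k: 1 <= k <= 5))  # 3

# Time-prefixed (day, id) keys: a chunk covering day 100 was written contiguously.
timed = [(100, "a"), (100, "b"), (100, "c"), (101, "d"), (101, "e")]
print(runs_for_chunk(timed, lambda k: k[0] == 100))  # 1
```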
Pre-split and move

• If you know your key distribution, then pre-create
  your chunks and assign them.
• See this:
  – http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
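A minimal sketch of pre-splitting, assuming a uniformly distributed integer shard key (such as the 32-bit prefix above) and that every chunk starts on a known home shard. It only builds MongoDB's `split` and `moveChunk` admin command documents; the namespace `db.events`, key `p`, and shard names are hypothetical. Each dict could then be sent with a driver's admin-command call, e.g. pymongo's `client.admin.command(cmd)`.

```python
def even_split_points(lo: int, hi: int, n_chunks: int) -> list[int]:
    """Split points dividing [lo, hi) into n_chunks roughly equal key ranges."""
    step = (hi - lo) / n_chunks
    return [lo + round(step * i) for i in range(1, n_chunks)]


def presplit_commands(ns: str, key: str, lo: int, points: list[int],
                      shards: list[str], home: str) -> list[dict]:
    """split at each point, then spread the chunks round-robin across shards."""
    cmds = [{"split": ns, "middle": {key: p}} for p in points]
    # Any key inside a chunk identifies it for moveChunk; use each lower bound.
    for i, lower in enumerate([lo] + points):
        target = shards[i % len(shards)]
        if target != home:  # chunks start on home; skip no-op moves
            cmds.append({"moveChunk": ns, "find": {key: lower}, "to": target})
    return cmds


points = even_split_points(0, 100, 4)
cmds = presplit_commands("db.events", "p", 0, points, ["shard0", "shard1"], home="shard0")
```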
Better shard keys

• Usually means including a time prefix in your
  shard key (e.g., {day: 100, id: X})
• Beware of write hotspots: all new writes land in the
  chunks covering the current time range
• How to Choose a Shard Key
  – http://www.snailinaturtleneck.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
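A minimal sketch of the `{day, id}` key from the slide, assuming `day` means whole days since the Unix epoch (the slide does not say) and `id` is any per-record identifier. Records from the same day share a key range, so they tend to share chunks and disk regions, while the `id` component still lets a busy day's range be split across chunks.

```python
SECONDS_PER_DAY = 86_400


def shard_key(epoch_seconds: int, record_id: str) -> dict:
    """Time-prefixed shard key: day number first, then the record id."""
    return {"day": epoch_seconds // SECONDS_PER_DAY, "id": record_id}


print(shard_key(100 * SECONDS_PER_DAY + 5, "X"))  # {'day': 100, 'id': 'X'}
```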
OPTIMIZING HARDWARE/CLOUD
Working Set in RAM
• EC2 m2.2xlarge, RAID0 setup with 16 EBS volumes.
• Workers hammering MongoDB with this loop, growing the data:
   – Loop { insert 500 byte record; find random record }
• Thousands of ops per second while everything fits in RAM
• Much lower throughput once the working set (in this case, all data
  and indexes) grows beyond RAM.
  (Chart: ops per second over time, high while the working set is "In RAM" and falling once it is "Not In RAM".)
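A minimal sketch of the benchmark loop, written against two callables so it runs without a server; the in-memory dict below is a stand-in, not MongoDB. Against a real deployment you might pass pymongo's `coll.insert_one` and `lambda i: coll.find_one({"_id": i})`, and sample the rate periodically to draw a curve like the one on the slide. The `_id`/`d` field layout is hypothetical.

```python
import os
import random
import time
from collections.abc import Callable


def benchmark(insert: Callable[[dict], object],
              find_by_id: Callable[[int], object],
              iterations: int,
              record_size: int = 500,
              seed: int = 0) -> float:
    """Run the slide's loop and return operations per second.

    Each iteration inserts one document with a record_size-byte payload,
    then reads a randomly chosen existing one, so the data set keeps growing.
    """
    rng = random.Random(seed)
    start = time.perf_counter()
    for i in range(iterations):
        insert({"_id": i, "d": os.urandom(record_size)})
        find_by_id(rng.randrange(i + 1))  # any id inserted so far
    elapsed = time.perf_counter() - start
    return 2 * iterations / elapsed if elapsed > 0 else float("inf")


# In-memory stand-in for a collection, so the sketch runs anywhere.
store: dict[int, dict] = {}
ops = benchmark(lambda doc: store.__setitem__(doc["_id"], doc), store.get, 1_000)
```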
Pre-fetch

• Updates hold the write lock while they fetch the
  original document from disk.
• Instead, first do a read to warm the document into RAM
  under a shared read lock, then update.
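A minimal sketch of the read-then-update pattern, taking the two operations as callables so it works with any driver; with pymongo you would pass `coll.find_one` and `coll.update_one`. The read only makes it likely, not certain, that the page is still resident when the update runs.

```python
from collections.abc import Callable


def prefetch_then_update(find_one: Callable[[dict], object],
                         update_one: Callable[[dict, dict], object],
                         query: dict, update: dict) -> object:
    """Warm the target document with a read, then apply the update.

    The read faults the document into RAM under a shared read lock, so the
    update that follows should find it in memory and hold the write lock
    for less time.
    """
    find_one(query)
    return update_one(query, update)
```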
Shard per core

• Instead of a shard per server, try a shard per
  core.
• Use this strategy to get around the write lock when
  writes per second matter.
• Why? Each mongod process has one big write lock, so
  several processes per server can take writes in parallel.
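A minimal sketch of laying out one `mongod` per core. `--shardsvr`, `--port` and `--dbpath` are real mongod options; the base port, the `/data/shardN` directory layout and the helper itself are hypothetical. Each process would still need to be registered with the cluster as its own shard (e.g. via `addShard`), which is not shown.

```python
import os
from typing import Optional


def mongod_per_core(cores: Optional[int] = None, base_port: int = 27018,
                    data_root: str = "/data") -> list[list[str]]:
    """One mongod shard process per core, each with its own port and dbpath."""
    n = cores or os.cpu_count() or 1
    return [
        ["mongod", "--shardsvr",
         "--port", str(base_port + i),
         "--dbpath", f"{data_root}/shard{i}"]
        for i in range(n)
    ]


for argv in mongod_per_core(cores=4):
    print(" ".join(argv))
```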
Amazon EC2

• High throughput / small working set
  – RAM matters, go with high memory instances.
• Low throughput / large working set
  –   Ephemeral storage might be OK.
  –   Remember that EBS IO goes over Ethernet.
  –   Pay attention to IO wait time (iostat).
  –   Your only shot at consistent perf: use the biggest
      instances in a family.
• Read this:
  – http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html
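iostat's `%iowait` is the share of CPU time spent idle while disk IO is outstanding. A minimal sketch, assuming a Linux host: the same figure can be computed from two samples of the aggregate `cpu` line in `/proc/stat`. The sample lines below are made up. Guest time is already included in user time, so only the first eight counters are summed.

```python
def iowait_percent(before: str, after: str) -> float:
    """Percent of CPU time in IO wait between two /proc/stat 'cpu' lines.

    Counters after the 'cpu' label: user nice system idle iowait irq
    softirq steal guest guest_nice.
    """
    b = [int(x) for x in before.split()[1:9]]
    a = [int(x) for x in after.split()[1:9]]
    deltas = [y - x for x, y in zip(b, a)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


print(iowait_percent("cpu  100 0 50 800 50 0 0 0 0 0",
                     "cpu  110 0 60 830 100 0 0 0 0 0"))  # 50.0
```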
Amazon EBS

• ~200 seeks per second per EBS volume on a good day
• EBS has *much* better random IO perf than
  ephemeral, but adds a dependency
• Use RAID0
• Check out this benchmark:
  – http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs/
• To understand how to monitor EBS:
  – https://forums.aws.amazon.com/thread.jspa?messageID=124044
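Back-of-envelope arithmetic, assuming the ~200 seeks/s figure holds for every volume and RAID0 spreads random reads perfectly (real arrays fall short of this ideal): the 16-volume setup from the working-set test gives about 3,200 random reads per second, so faulting 1 GiB of scattered 4KB pages into RAM would take roughly 82 seconds.

```python
PAGE_SIZE = 4096


def raid0_random_iops(volumes: int, seeks_per_volume: int = 200) -> int:
    """Ideal RAID0 upper bound: random reads spread evenly over every volume."""
    return volumes * seeks_per_volume


def seconds_to_fault_in(num_bytes: int, iops: float) -> float:
    """Time to pull num_bytes into RAM one random page read at a time."""
    pages = -(-num_bytes // PAGE_SIZE)  # ceiling division
    return pages / iops


iops = raid0_random_iops(16)
print(iops, seconds_to_fault_in(2**30, iops))  # 3200 81.92
```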
Further Reading
•   MongoDB Performance Tuning
     – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
•   Monitoring Tips
     – http://blog.boxedice.com/mongodb-monitoring/
•   Markus' manual
     – http://www.markus-gattol.name/ws/mongodb.html
•   Helpful/interesting blog posts
     – http://nosql.mypopescu.com/tagged/mongodb/
•   MongoDB on EC2
     – http://www.slideshare.net/jrosoff/mongodb-on-ec2-and-ebs
•   EC2 and Ephemeral Storage
     – http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws-ec2.html
•   MongoDB Strategies for the Disk Averse
     – http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/
•   MongoDB Perf Tuning at MongoSF 2011
     – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
Thank you.

• Check out Localytics for mobile analytics!
• Reach me at:
  – Email: my first name @ localytics.com
  – twitter.com/andrew311
  – andrewrollins.com

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 

Optimizing MongoDB: Lessons Learned at Localytics

  • 1. Optimizing MongoDB: Lessons Learned at Localytics
    Andrew Rollins, June 2011, MongoNYC
  • 2. Me
    • Email: my first name @ localytics.com
    • twitter.com/andrew311
    • andrewrollins.com
    • Founder, Chief Software Architect at Localytics
  • 3. Localytics
    • Real-time analytics for mobile applications
    • Built on:
      – Scala
      – MongoDB
      – Amazon Web Services
      – Ruby on Rails
      – and more…
  • 4. Why I'm here: brain dump!
    • To share tips, tricks, and gotchas about:
      – Documents
      – Indexes
      – Fragmentation
      – Migrations
      – Hardware
      – MongoDB on AWS
    • Basic to more advanced, a complement to MongoDB Perf Tuning at MongoSF 2011
  • 5. MongoDB at Localytics
    • Use cases:
      – Anonymous loyalty information
      – De-duplication of incoming data
    • Requirements:
      – High throughput
      – Add capacity without long downtime
    • Scale today:
      – Over 1 billion events tracked in May
      – Thousands of MongoDB operations a second
  • 6. Why MongoDB?
    • Stability
    • Community
    • Support
    • Drivers
    • Ease of use
    • Feature rich
    • Scale out
  • 8. Shorten names
    Bad:
    { super_happy_fun_awesome_name: "yay!" }
    Good:
    { s: "yay!" }
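The saving comes from the fact that BSON stores every field name inside every document. The minimal sketch below handles only flat documents with string values. For each BSON string element it counts a type byte, the NUL-terminated name, an int32 length, the bytes, and a trailing NUL. It then computes the encoded size of both versions from the slide:

```python
import struct


def bson_string_doc_size(doc: dict) -> int:
    """Encoded BSON size of a flat document whose values are all strings."""
    body = b""
    for key, value in doc.items():
        encoded = value.encode("utf-8")
        body += b"\x02"                              # element type 0x02: UTF-8 string
        body += key.encode("utf-8") + b"\x00"        # field name, stored in every document
        body += struct.pack("<i", len(encoded) + 1)  # int32 byte length incl. trailing NUL
        body += encoded + b"\x00"                    # string bytes + NUL
    return 4 + len(body) + 1                         # int32 total size + elements + final NUL


long_name = bson_string_doc_size({"super_happy_fun_awesome_name": "yay!"})
short_name = bson_string_doc_size({"s": "yay!"})
print(long_name, short_name)  # 44 17
```

The 27-byte difference is paid once per document, so it scales with the size of the collection.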
  • 9. Use BinData for UUIDs/hashes
    Bad:
    {
        u: "21EC2020-3AEA-1069-A2DD-08002B30309D",
        // 36 bytes plus field overhead
    }
    Good:
    {
        u: BinData(0, "…"),
        // 16 bytes plus field overhead
    }
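You can check the byte counts on the slide with Python's standard `uuid` module. The sketch below compares the BSON value sizes of the 36-character string and the 16 raw bytes. Length prefixes and the subtype byte are included; the field name is excluded. Subtype 0 is the generic subtype the slide uses. BSON also defines subtype 4 specifically for UUIDs.

```python
import uuid

text = "21EC2020-3AEA-1069-A2DD-08002B30309D"
raw = uuid.UUID(text).bytes  # the same 128-bit value as 16 raw bytes

# BSON string value: int32 length + UTF-8 bytes + trailing NUL
string_value_size = 4 + len(text.encode("utf-8")) + 1
# BSON binary value: int32 length + 1-byte subtype (0 = generic) + raw bytes
binary_value_size = 4 + 1 + len(raw)

print(len(text), len(raw))                   # 36 16
print(string_value_size, binary_value_size)  # 41 21
```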
  • 10. Override _id
    Turn this:
    {
        _id: ObjectId("47cc67093475061e3d95369d"),
        u: BinData(0, "…")  // <- this is uniquely indexed
    }
    into this:
    {
        _id: BinData(0, "…")  // was the u field
    }
    This eliminates an extra index, but be careful about locality (more later; see Further Reading at the end).
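Here is one way to see the locality concern. An ObjectId starts with a timestamp, so new values sort after old ones, and inserts land at the right-hand edge of the `_id` index. A random hash or UUID can land anywhere in it. The toy model below uses a sorted Python list as a stand-in for the index's key order (it is not MongoDB's B-tree) and counts how often an insert lands at the end:

```python
import bisect
import random


def append_fraction(keys) -> float:
    """Fraction of inserts that land at the right-hand end of a sorted index."""
    index = []  # sorted list standing in for the index's key order
    appends = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):  # new key is larger than every existing key
            appends += 1
        index.insert(pos, key)
    return appends / len(keys)


rng = random.Random(42)
# Increasing 16-byte keys, like timestamp-prefixed ObjectIds.
sequential = [i.to_bytes(16, "big") for i in range(1000)]
# Random 16-byte keys, like hashes or random UUIDs used as _id.
random_ids = [rng.getrandbits(128).to_bytes(16, "big") for _ in range(1000)]

print(append_fraction(sequential))  # 1.0
print(append_fraction(random_ids))  # close to 0
```

When most inserts land mid-index, the index pages that writes touch are spread across the whole index instead of concentrated at one edge, so more of the index has to stay in RAM.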
  • 11. Pack 'em in
    • Look for cases where you can squish multiple "records" into a single document.
    • Why?
      – Decreases number of index entries
      – Brings documents closer to the size of a page, alleviating potential fragmentation
    • Example: comments for a blog post.
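Here is a sketch of the blog-comment example. It assumes hypothetical flat comment records with `post_id`, `author`, and `text` fields. Grouping them by post gives one document, and one `_id` index entry, per post instead of one per comment. The short field names `c`, `a`, and `t` follow the earlier "Shorten names" advice.

```python
from collections import defaultdict

# Hypothetical flat comment records: one document per comment.
comments = [
    {"post_id": 1, "author": "ann", "text": "First!"},
    {"post_id": 1, "author": "bob", "text": "Nice post"},
    {"post_id": 2, "author": "cat", "text": "Thanks"},
    {"post_id": 1, "author": "dan", "text": "+1"},
]


def pack_by_post(records):
    """Group comment records into one document per post."""
    packed = defaultdict(list)
    for record in records:
        packed[record["post_id"]].append({"a": record["author"], "t": record["text"]})
    return [{"_id": post_id, "c": items} for post_id, items in packed.items()]


packed_docs = pack_by_post(comments)
# One _id index entry per document: 4 before packing, 2 after.
print(len(comments), len(packed_docs))  # 4 2
```

Documents have a maximum size, so an unbounded list (for example, a post with unlimited comments) would need a cap or some form of bucketing.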
  • 12. Prefix Indexes
    Suppose you have an index on a large field, but that field doesn't have many possible values. You can use a "prefix index" to greatly decrease index size.
    Turn this:
    find({k: <kval>})
    {
        k: BinData(0, "…")  // 32-byte SHA-256, indexed
    }
    into this:
    find({p: <prefix>, k: <kval>})
    {
        k: BinData(0, "…"),  // 28-byte SHA-256 suffix, not indexed
        p: <32-bit integer>  // first 4 bytes of k packed into an integer, indexed
    }
    Example: git commits
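Here is a sketch of splitting the key as the slide describes, using Python's `hashlib` and `struct`. The first 4 bytes of the SHA-256 digest become a signed 32-bit integer `p` (signed so it fits a BSON int32). The remaining 28 bytes become `k`. Only `p` would be indexed, for example with PyMongo's `create_index("p")`. Different keys can share a 32-bit prefix, so the query still matches on `k` to confirm. The input bytes are hypothetical.

```python
import hashlib
import struct


def split_sha256(data: bytes):
    """Split a SHA-256 digest into (signed 32-bit prefix, 28-byte suffix)."""
    digest = hashlib.sha256(data).digest()       # 32 bytes
    (prefix,) = struct.unpack(">i", digest[:4])  # first 4 bytes as a signed int32
    return prefix, digest[4:]


prefix, suffix = split_sha256(b"hypothetical commit contents")
doc = {"p": prefix, "k": suffix}    # stored document: only p is indexed
query = {"p": prefix, "k": suffix}  # p narrows via the index; k confirms the match
```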
  • 14. Fragmentation
    • Data on disk is memory-mapped into RAM.
    • Mapped in pages (usually 4 KB).
    • Deletes/updates will cause memory fragmentation.
    [Diagram: Disk and RAM side by side; find(doc1) maps the page containing doc1 into RAM, along with the deleted records on that page.]
  • 15. New writes mingle with old data
    [Diagram: data pages on disk; a new write, docX, lands on the same page as old doc1.]
    find(docX) also pulls in old doc1, wasting RAM.
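The cost in these two diagrams can be expressed in numbers. This minimal sketch assumes 4 KB pages and ignores OS read-ahead. Reading a document maps every page it overlaps. So when a small live document shares its page with deleted or stale records, only a small fraction of the mapped bytes are actually useful.

```python
PAGE_SIZE = 4096  # typical page size, as on the slide


def pages_touched(offset: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages that must be mapped to read bytes [offset, offset + size)."""
    first = offset // page_size
    last = (offset + size - 1) // page_size
    return last - first + 1


def ram_efficiency(offset: int, size: int, page_size: int = PAGE_SIZE) -> float:
    """Fraction of the mapped bytes that belong to the requested document."""
    return size / (pages_touched(offset, size, page_size) * page_size)
```

For example, a 500-byte document alone on a 4 KB page uses about 12% of the RAM it pulls in. A 200-byte document at offset 4000 straddles two pages.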
  • 16. Dealing with fragmentation
    • "mongod --repair" on a secondary, then swap it with the primary.
    • 1.9 has in-place compaction, but this still holds a write lock.
    • MongoDB will auto-pad records.
    • Pad records yourself by including and then removing extra bytes on first insert.
      – Alternative offered in SERVER-1810.
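Here is a sketch of the do-it-yourself padding trick. It uses a hypothetical `pad` field and a hypothetical 200-byte reservation. The first insert carries the extra bytes, so the record is allocated larger. An immediate `$unset` then removes the field. On the storage engine the slide describes, the record keeps its allocated space for later in-place growth. The helpers only build the documents; with PyMongo they would be passed to `insert_one` and `update_one`.

```python
PAD_FIELD = "pad"  # hypothetical field name used only for padding
PAD_BYTES = 200    # hypothetical amount of room to reserve for growth


def padded_insert(doc: dict, pad_bytes: int = PAD_BYTES) -> dict:
    """Copy of doc with a throwaway padding field, so the first write allocates extra space."""
    padded = dict(doc)
    padded[PAD_FIELD] = b"\x00" * pad_bytes
    return padded


def unpad_update() -> dict:
    """Update document that removes the padding field right after the insert."""
    return {"$unset": {PAD_FIELD: ""}}
```

Usage (not run here): `coll.insert_one(padded_insert(doc))` followed by `coll.update_one({"_id": doc["_id"]}, unpad_update())`.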
  • 17. The Dark Side of Migrations
    • Chunks are a logical construct, not physical.
    • Shard keys have serious implications.
    • What could go wrong?
      – Let's run through an example.
  • 18. Suppose the following
    • k is the shard key
    • k is random
    [Diagram: Shard 1 holds Chunk 1 (k: 1 to 5) and Chunk 2 (k: 6 to 9); documents are written in the order {k: 3, …} (1st write), {k: 9, …} (2nd write), {k: 1, …}, {k: 7, …}, {k: 2, …}, {k: 8, …}, and so on.]
  • 19. Migrate Chunk 1
    [Diagram: Chunk 1 (k: 1 to 5) moves from Shard 1 to Shard 2; copying {k: 3, …}, {k: 1, …}, {k: 2, …}, which sit interleaved on Shard 1 with {k: 9, …}, {k: 7, …}, {k: 8, …}, takes random IO.]
  • 20. Shard 1 is now heavily fragmented
    [Diagram: Shard 2 now holds {k: 3, …}, {k: 1, …}, {k: 2, …}; on Shard 1 their old slots between {k: 9, …}, {k: 7, …}, {k: 8, …} are marked "Fragmented".]
  • 21. Why is this scenario bad?
    • Random reads
    • Massive fragmentation
    • New writes mingle with old data
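You can replay the migration scenario with a toy model. Like the diagrams, it assumes documents are laid out on disk in write order. It lists the disk slots holding Chunk 1 (k from 1 to 5) and counts how many separate regions a migration must read. The second case assumes keys that arrive roughly in key order, as a time-prefixed shard key would produce.

```python
def chunk_disk_slots(write_order, lo, hi):
    """Disk slots (in write order) holding documents whose key lies in [lo, hi]."""
    return [slot for slot, key in enumerate(write_order) if lo <= key <= hi]


def contiguous_runs(slots):
    """Number of separate contiguous regions the slots form (1 = one sequential read)."""
    if not slots:
        return 0
    return 1 + sum(1 for a, b in zip(slots, slots[1:]) if b != a + 1)


# Write order from the slides: k is random, so Chunk 1 is interleaved with Chunk 2.
random_writes = [3, 9, 1, 7, 2, 8]
slots = chunk_disk_slots(random_writes, 1, 5)
print(slots, contiguous_runs(slots))  # [0, 2, 4] 3

# Keys arriving roughly in key order keep Chunk 1 in one region.
ordered_writes = [1, 2, 3, 7, 8, 9]
slots = chunk_disk_slots(ordered_writes, 1, 5)
print(slots, contiguous_runs(slots))  # [0, 1, 2] 1
```

Each extra run means another seek when the chunk is copied, and another hole left behind on the donor shard once the migrated documents are removed.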
  • 22. How can we avoid bad migrations?
    • Pre-split, pre-chunk
    • Better shard keys for better locality
      – Ideally where data in the same chunk tends to be in the same region of disk
  • 23. Pre-split and move
    • If you know your key distribution, then pre-create your chunks and assign them.
    • See this:
      – http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
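Here is a sketch of pre-splitting under stated assumptions:

- an integer shard key spread roughly uniformly over a known range,
- a hypothetical namespace `analytics.events`,
- hypothetical shard names,
- all chunks initially on the first listed shard.

It builds the `split` (with `middle`) and `moveChunk` (with `find` and `to`) admin command documents. These would be run against the `admin` database through `mongos`, for example with PyMongo's `client.admin.command(cmd)`.

```python
def presplit_commands(namespace, shard_key, lo, hi, num_chunks, shards):
    """Build split and moveChunk admin commands for evenly spaced integer split points.

    Assumes keys are roughly uniform over [lo, hi) and that every chunk starts on shards[0].
    """
    step = (hi - lo) // num_chunks
    points = [lo + i * step for i in range(1, num_chunks)]
    splits = [{"split": namespace, "middle": {shard_key: p}} for p in points]
    moves = []
    for i, p in enumerate(points, start=1):  # chunk i begins at split point p
        target = shards[i % len(shards)]     # round-robin chunk placement
        if target != shards[0]:              # chunks assigned to shards[0] are already there
            moves.append({"moveChunk": namespace, "find": {shard_key: p}, "to": target})
    return splits, moves


# Hypothetical namespace and shard names; 4 chunks over a 32-bit key space, 2 shards.
splits, moves = presplit_commands("analytics.events", "k", 0, 2**32, 4, ["shard0000", "shard0001"])
```

With a skewed key distribution, the split points should come from the real distribution rather than even spacing.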
  • 24. Better shard keys
    • Usually means including a time prefix in your shard key (e.g., {day: 100, id: X})
    • Beware of write hotspots
    • How to Choose a Shard Key
      – http://www.snailinaturtleneck.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
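Here is one hypothetical way to build the `{day: …, id: …}` key from the slide, assuming `day` means whole days since the Unix epoch. Documents written on the same day share a prefix and sort together, so a chunk tends to hold data written at about the same time. The trade-off the slide warns about follows directly: every write made today falls in the newest day's key range, so the chunk holding that range takes all current writes.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_prefixed_key(event_time: datetime, record_id: str) -> dict:
    """Shard key document: coarse time prefix, then a unique id.

    event_time must be timezone-aware (compared against a UTC epoch).
    """
    day = (event_time - EPOCH).days  # whole days since the Unix epoch
    return {"day": day, "id": record_id}
```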
  • 26. Working Set in RAM
    • EC2 m2.2xlarge, RAID0 setup with 16 EBS volumes.
    • Workers hammering MongoDB with this loop, growing data:
      – Loop { insert 500-byte record; find random record }
    • Thousands of ops per second when in RAM
    • Much less throughput when the working set (in this case, all data and index) grows beyond RAM.
    [Chart: ops per second over time, with an "In RAM" phase followed by a "Not In RAM" phase]
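A toy page-cache model reproduces the shape of that chart. It is a sketch under simple assumptions:

- 500-byte records are appended in order onto 4 KB pages.
- An LRU cache of a fixed number of pages stands in for RAM.
- Each iteration runs the slide's loop: one insert, then one random find.

It ignores indexes, fragmentation, and read-ahead.

```python
import random
from collections import OrderedDict

PAGE_SIZE = 4096   # bytes per page
RECORD_SIZE = 500  # bytes per record, as in the benchmark loop


def hit_rate(num_ops: int, ram_pages: int, seed: int = 0) -> float:
    """Page-cache hit rate for the loop {insert record; find random record}."""
    rng = random.Random(seed)
    cache = OrderedDict()  # resident pages, least recently used first
    hits = misses = 0

    def touch(page: int) -> None:
        nonlocal hits, misses
        if page in cache:
            cache.move_to_end(page)
            hits += 1
        else:
            misses += 1
            cache[page] = None
            if len(cache) > ram_pages:
                cache.popitem(last=False)  # evict the least recently used page

    records = 0
    for _ in range(num_ops):
        touch(records * RECORD_SIZE // PAGE_SIZE)  # insert: append after the last record
        records += 1
        touch(rng.randrange(records) * RECORD_SIZE // PAGE_SIZE)  # find: any existing record
    return hits / (hits + misses)


print(hit_rate(20_000, ram_pages=1_000_000))  # whole data set fits in "RAM"
print(hit_rate(20_000, ram_pages=100))        # data set outgrows "RAM"
```

When the whole data set is resident, the only misses are first touches of new pages. Once it outgrows the cache, most random finds miss, and each miss becomes a disk read.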
  • 27. Pre-fetch
    • Updates hold a lock while they fetch the original from disk.
    • Instead, do a read to warm the doc in RAM under a shared read lock, then update.
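Here is a sketch of the pre-fetch pattern with PyMongo-style `find_one`/`update_one` calls. Any object with those methods works, which also makes the function easy to test. The read is only a cache warm-up and is not atomic with the update. That is fine, because the update itself is atomic. The sketch fetches the whole document. A projection on `_id` alone might be answered from the `_id` index (a covered query) without touching the document.

```python
def prefetch_then_update(collection, doc_id, update: dict):
    """Read the document first so its page is in RAM, then apply the update.

    `collection` is any object with PyMongo-style find_one/update_one methods.
    """
    # Read under a shared (read) lock; this faults the document's page in from disk.
    collection.find_one({"_id": doc_id})
    # The update should now find the page resident, so the write lock is held more briefly.
    return collection.update_one({"_id": doc_id}, update)
```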
  • 28. Shard per core
    • Instead of a shard per server, try a shard per core.
    • Use this strategy to overcome write locks when writes per second matter.
    • Why? Because MongoDB has one big write lock.
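Here is a sketch of generating one `mongod` command line per core. It assumes hypothetical data paths and consecutive ports starting at 27018, the default port for `--shardsvr`. Each process would still need to be added to the cluster, for example with the `addShard` command. Newer MongoDB versions require shards to be replica sets, so treat this as specific to the setup the slide describes.

```python
import os


def shard_per_core_commands(base_port: int = 27018, data_root: str = "/data/shard"):
    """One mongod command line per CPU core; ports and paths are hypothetical."""
    cores = os.cpu_count() or 1
    return [
        ["mongod", "--shardsvr", "--port", str(base_port + i), "--dbpath", f"{data_root}{i}"]
        for i in range(cores)
    ]
```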
  • 29. Amazon EC2
    • High throughput / small working set
      – RAM matters, go with high memory instances.
    • Low throughput / large working set
      – Ephemeral storage might be OK.
      – Remember that EBS IO goes over Ethernet.
      – Pay attention to IO wait time (iostat).
      – Your only shot at consistent perf: use the biggest instances in a family.
    • Read this:
      – http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html
  • 30. Amazon EBS
    • ~200 seeks per second per EBS on a good day
    • EBS has *much* better random IO perf than ephemeral, but adds a dependency
    • Use RAID0
    • Check out this benchmark:
      – http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs/
    • To understand how to monitor EBS:
      – https://forums.aws.amazon.com/thread.jspa?messageID=124044
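Here is some rough arithmetic with the slide's figure of about 200 seeks per second per EBS volume. At best, a RAID0 array spreads random IO evenly across its members. So the 16-volume array from the working-set test tops out around 3,200 random reads per second, which becomes the ceiling once finds stop hitting RAM. Real EBS performance varies, so treat this as an optimistic upper bound.

```python
def raid0_random_iops(volumes: int, seeks_per_volume: int = 200) -> int:
    """Optimistic upper bound: random IO spread perfectly across RAID0 members."""
    return volumes * seeks_per_volume


print(raid0_random_iops(16))  # 3200
```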
  • 31. Further Reading
    • MongoDB Performance Tuning
      – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
    • Monitoring Tips
      – http://blog.boxedice.com/mongodb-monitoring/
    • Markus' manual
      – http://www.markus-gattol.name/ws/mongodb.html
    • Helpful/interesting blog posts
      – http://nosql.mypopescu.com/tagged/mongodb/
    • MongoDB on EC2
      – http://www.slideshare.net/jrosoff/mongodb-on-ec2-and-ebs
    • EC2 and Ephemeral Storage
      – http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws-ec2.html
    • MongoDB Strategies for the Disk Averse
      – http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/
    • MongoDB Perf Tuning at MongoSF 2011
      – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
  • 32. Thank you.
    • Check out Localytics for mobile analytics!
    • Reach me at:
      – Email: my first name @ localytics.com
      – twitter.com/andrew311
      – andrewrollins.com