2. Or:
● Cautionary Tales
● Don’t solve the wrong problems
● Bad schemas hurt ops too
● etc.
3. The Stories
● Are (mostly) true, and (mostly) actually happened
● Names have been changed to protect the (mostly)
innocent
● No animals were harmed during the making of this
presentation
○ Perhaps a few DBAs and engineers had light
emotional scarring
● Some of the people who inspired the stories may well be here today at MongoDB London
4. Story #1: Bill the Bulk Updater
● Bill built a system that tracked status information for
entities in his business domain
● State changes for this system happened in batches:
○ Sometimes 10% of entities were updated
○ Sometimes 100% were updated
● Essentially, lots of random updates (sketched below)
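A minimal sketch of one such batch update, assuming a hypothetical entities collection and filter (none of the original system's names are known):
db.entities.update(
  { region : "EMEA" },  // hypothetical filter - sometimes matching 10% of entities, sometimes all
  { $set : { status : "PROCESSED", updatedAt : new Date() } },
  { multi : true }      // apply the change to every matching document
)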
6. What about production?
● Bill’s system was a success!
● The product grew, and the number of entities increased
by a factor of 5
● Not a problem - add more shards!
12. What did we recommend?
● Scale the random I/O vertically, not horizontally
● Sometimes a combination of vertical & horizontal
scaling is the best approach
14. Story #2: Gary the Game Developer
● Gary was launching a AAA game title
● MongoDB would provide the backend for the players' online experience
● Launched worldwide on the same day, with midnight launches in each region
15. Complex Cloud Deployment
● Deployed in the cloud, but on very beefy instances
● 32 vCPU, 244GiB RAM, 8 x SSD
● A single mongod was unable to stress such instances
● Hence “micro-sharding” was required to get the most out of them
16. Micro-What?
Micro-Sharding is the practice of deploying multiple relatively small (hence “micro”) shards on
large hosts to better take advantage of available resources which are difficult to utilise with a
single mongod instance.
For example, 9 shards evenly distributed across 3 hosts:
HOST1: Primary1, Primary2, Primary3, Secondary4, Secondary5, Secondary6, Secondary7, Secondary8, Secondary9
HOST2: Secondary1, Secondary2, Secondary3, Primary4, Primary5, Primary6, Secondary7, Secondary8, Secondary9
HOST3: Secondary1, Secondary2, Secondary3, Secondary4, Secondary5, Secondary6, Primary7, Primary8, Primary9
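As a sketch of how such a layout is wired together (host names and ports are assumptions, not taken from the original deployment), each shard's replica set is registered with the cluster from a mongos:
sh.addShard( "shard1/host1:27018" )  // replica set shard1, seeded from its member on host1
sh.addShard( "shard2/host1:27019" )
sh.addShard( "shard3/host1:27020" )
// ...and so on for shards 4-9, whose members are seeded from host2 and host3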
17. Extensive Pre-Production Testing
● Load tested
● Failover and Backups tested
● Procedures, architecture reviewed
● Basically, lots of testing/reviewing was done (all
passed)
18. However…
The production layout of mongod processes was actually 8 shards on 3 hosts, reproduced below.
This layout caused a problem in production. But it was tested and had no issues, right?
Almost: the backup process was tested, and load was tested, but not together…
HOST1: Primary1, Primary2, Primary3, Secondary4, Secondary5, Secondary6, Secondary7, Secondary8
HOST2: Secondary1, Secondary2, Secondary3, Primary4, Primary5, Primary6, Secondary7, Secondary8
HOST3: Secondary1, Secondary2, Secondary3, Secondary4, Secondary5, Secondary6, Primary7, Primary8
19. The Backup Process
Backups took place on a single host (Host2 in the layout below).
The databases were locked, an LVM snapshot was taken, and then the lock was released.
This was almost instantaneous in pre-production testing (no load); not so in production.
HOST1: Primary1, Primary2, Primary3, Secondary4, Secondary5, Secondary6, Secondary7, Secondary8
HOST2: Secondary1, Secondary2, Secondary3, Primary4, Primary5, Primary6, Secondary7, Secondary8
HOST3: Secondary1, Secondary2, Secondary3, Secondary4, Secondary5, Secondary6, Primary7, Primary8
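A minimal sketch of that lock-snapshot-unlock sequence, using the standard shell commands (the LVM step is shown only as a comment, and the volume names are assumptions):
db.fsyncLock()     // flush pending writes and block writes for a consistent snapshot
// at the OS level, snapshot the data volume, e.g.:
//   lvcreate --snapshot --name mongo-backup --size 10G /dev/vg0/mongo-data
db.fsyncUnlock()   // release the lock - under load, this window was far from instantaneous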
20. Backup Under Load
Once load was introduced to the equation, the snapshots were no longer instantaneous.
The primaries on the host taking the backup became unresponsive, but did not fail over.
This eventually caused a cascading failure, bringing the whole cluster down.
HOST1: Primary1, Primary2, Primary3, Secondary4, Secondary5, Secondary6, Secondary7, Secondary8
HOST2: Secondary1, Secondary2, Secondary3, Primary4, Primary5, Primary6, Secondary7, Secondary8
HOST3: Secondary1, Secondary2, Secondary3, Secondary4, Secondary5, Secondary6, Primary7, Primary8
22. What did we recommend?
A new process layout was proposed (below); backups are still taken on Host2, which now holds no primaries.
The database lock was removed: it was not necessary, since an LVM snapshot is already point-in-time consistent.
Limits were also put on maximum connections, just in case.
HOST1: Primary1, Primary2, Primary3, Primary4, Secondary5, Secondary6, Secondary7, Secondary8
HOST2: Secondary1, Secondary2, Secondary3, Secondary4, Secondary5, Secondary6, Secondary7, Secondary8
HOST3: Secondary1, Secondary2, Secondary3, Secondary4, Primary5, Primary6, Primary7, Primary8
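As a sketch of the "just in case" safeguard: connection usage can be watched from the shell with a standard command, and mongod exposes a maxIncomingConnections setting; the figures below are illustrative, not from the deployment:
db.serverStatus().connections
// e.g. { "current" : 742, "available" : 19258, "totalCreated" : 51023 }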
23. Summary
No single cause:
● Small issue with the deployment layout
● Small error in the backup process
● Lack of integration in the testing plan (backup and load were never tested together)
● Relatively new system
● Some bad luck
Led to:
● Large outage, slow cautious recovery
24. Story #3: Rita the Retailer
Rita the Retailer had an ecommerce site, selling
diverse goods in 20+ countries.
25. Product Catalog: Original Schema
{
_id: 375,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... >
}
26. What’s good about this schema?
● Each document contains all the data about a given
product, across all languages/locales
● Very efficient way to retrieve the English, French,
German, etc. translations of a single product’s
information in one query
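For example (product id illustrative), a single query returns every locale:
db.catalog.find( { _id : 375 } )   // one round trip returns the product in all languages/locales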
27. However…
That is not how the product data is
actually used
(except perhaps by translation staff)
29. Which means…
The Product Catalog’s data model
did not fit the way the data was
accessed.
30. Consequences
● Each document contained ~20x more data than any
common use case needed
● MongoDB lets you request just a subset of a
document’s contents (using a projection), but…
○ Typically the whole document will get loaded into RAM to serve the request
● There are other overheads for reading from disk into
memory (like readahead)
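For example (locale and product id illustrative), a projection trims the response but not the read:
db.catalog.find( { _id : 375 }, { en_GB : true } )
// only en_GB is returned over the wire, but the whole ~20-locale document
// is still read from disk and paged into RAM to serve the request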
31. Therefore…
Less than 5% of data loaded into RAM from disk is
actually required at the time - highly inefficient
32. Visualising the problem
{ _id: 42,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... > }
<READAHEAD OVERHEAD>
{ _id: 709,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... > }
<READAHEAD OVERHEAD>
{ _id: 3600,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... > }
- Data in RED (the single requested locale) are loaded into RAM and used.
- Data in BLUE (every other locale) take up memory but are not required.
- Readahead padding in GREEN makes things even more inefficient.
34. What did we recommend?
● Design for your use case, your dominant query pattern
○ In this case: 99.99% of queries want the product data for exactly one locale at a time
○ Hence, alter the schema appropriately
● Eliminate inefficiencies on the system
○ Make reading from disk less wasteful and maximise I/O capabilities: reduce readahead settings (on Linux, e.g. with blockdev --setra)
35. Schema: Before & After
Schema Before (embedded):
{ _id: 375,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
<... and so on for other locales... >
}
Query Before:
db.catalog.find( { _id : 375 } , { en_US : true } );
db.catalog.find( { _id : 375 } , { fr_FR : true } );
db.catalog.find( { _id : 375 } , { de_DE : true } );
Schema After (one document per locale):
{ _id: "375-en_US",
name : ..., description : ..., <etc...> }
{ _id: "375-en_GB",
name : ..., description : ..., <etc...> }
{ _id: "375-fr_FR",
name : ..., description : ..., <etc...> }
... and so on for other locales ...
Query After:
db.catalog.find( { _id : "375-en_US" } );
db.catalog.find( { _id : "375-fr_FR" } );
db.catalog.find( { _id : "375-de_DE" } );
36. Consequences of Changes
● Queries induced minimal overhead
● More than 20x as many distinct products fit in memory at once
● Disk I/O utilization reduced
● UI latency decreased
● Happier Customers
● Profit (well, we hope)
37. Conclusions
● MongoDB can be used for a wide range of (sometimes pretty cool) use cases
● A small problem can seem much bigger
when it happens in production
● We are here to help - if you hit a problem, it’s
likely you are not the first to hit it
● We can provide a fresh perspective and advice based on experience, to prevent and solve issues
39. Further Reading for Retail/Catalogs
● Antoine Girbal (my team mate) has produced a full reference architecture for this type of application
○ Blog part 1: http://tmblr.co/ZiOADx1RRsAWe
○ Blog part 2: http://tmblr.co/ZiOADx1LfVmfm
● Detailed presentations and talks from MongoDB World:
○ http://www.mongodb.com/presentations/retail-reference-architecture-part-1-flexible-searchable-low-latency-product-catalog
○ http://www.mongodb.com/presentations/retail-reference-architecture-part-2-real-time-geo-distributed-inventory
○ http://www.mongodb.com/presentations/retail-reference-architecture-part-3-scalable-insight-component-providing-user-history
Editor's Notes
Field/Trenches
We also try to make it about interesting systems
Generally, MongoDB works well, but like all complex systems, it has its foibles, pitfalls and preferences
Despite the blurb, will not be talking about binary shard keys
Some borrowed, some merged into a single narrative
More ops focused take on borrowed stories
Sensor data, say from a trucking fleet, some real time data, rest uploaded in batches
He set up a sharded collection across 4 shards, all using locally-attached, commodity storage.
Everything worked well in the test environment…
Bill's Bulk Updates randomly affected an ever larger data set.
In order to cope with the database size, Bill added more shards.
The cluster scaled linearly, as intended.
Imagine that the sample rate was going to go from once a minute to once every 5 seconds
You can run 200 shards; we have customers that have been doing so for years
But if you are already worried about the TCO at 20 shards, 200 is a significant problem
Just because you can add horizontal capacity, does not mean it is the optimum solution
When you have identified your scaling driver (I/O, memory, CPU, locking), always consider alternatives to simply adding more of the same
Spinning rust (commodity hard disks) is generally poor for random I/O, but cheap. SSDs are expensive, but an order of magnitude faster for random I/O
Another alternative would be a high-performance SAN or a RAID of commodity spinning disks, but that was not an option here.
Fewer, beefier shards.
Bill went with SSDs.
Ultimately, Bill only needed about 4 shards, so cost savings overall.
Bill, and Bill's boss, were very relieved.
Lots of hype, so Gary’s load was going to go from zero to insane in a matter of minutes/hours when the game launched
Taking story 1 to an extreme
Usually sharding = one host per data bearing node. Good for scaling horizontally on hosts with modest capabilities.
Micro sharding deploys more than one mongod process on a host. In this case 8 data bearing processes per host for a total of 24 = 8 “micro” shards on 3 hosts
Taking story 1 to an extreme
Individual tests were OK, but not combined
The process followed our reference documentation, for the general use case:
Lock the database, take a backup (snapshot), release the lock
Generally you would do this on a host running a secondary, not on a host running 8 mongod processes, some of them primary
Connections built up on the clients; more were opened; hosts began to run out of resources (open files, ephemeral ports, memory in general)
Which boils down to……
Well, not quite, the dev people were on the call trying to fix this too
No recurrence, still running without an issue
In fact, this is how the data was used because of the current schema
Which means lots of I/O, lots of page faults, low cache hit rate, slowness – even though rough calculations say it should fit in RAM
Out of RAM, no problem, just download more! With the cloud, VMs, containers this is close to being true, but there is still a cost
It might fix things, but it’s expensive and the real problem is the efficiency, or lack thereof
Not the only approach, but allowed for minimal changes on the client side
Disk accessed more efficiently and less often, actually allows the readahead to potentially go lower too by freeing up IOPS capacity
For example, at the “Ask The Experts” table which I will be at for the next hour following this presentation