In this talk we will review the factors that drive the capacity requirements: volume of queries, access patterns, indexing, working set size, among others. View the slides with video recording: www.mongodb.com/presentations/hardware-provisioning-mongodb
11. Requirements – Step One
• It is impossible to properly size a MongoDB
cluster without first documenting your business
requirements
• Availability: what is your uptime requirement?
• Throughput
• Responsiveness
– what is acceptable latency?
– is higher latency during peak times acceptable?
12. Requirements – Step Two
• Understand the resources available to you
– Storage
– Memory
– Network
– CPU
• Many customers are limited to the options available in
AWS or presented by their own Enterprise
Virtualization team
13. Continuing Requirements – Step Three
• Once you deploy initially, it is common for requirements to
change
– More users added to the application
• Causes more queries and a larger working set
– New functionality changes query patterns
• Adding new indexes causes a larger working set
– What started as a read-intensive application can add more and more
write-heavy workloads
• More write-locking increases reader queue depth
• You must monitor and collect metrics and update your
hardware selection as necessary (scale up / add RAM? Add
more shards?)
14. Run a Proof of Concept
• Forces you to:
– Do schema / index design
– Understand query patterns
– Get a handle on Working Set size
• Start small on a single node
– See how much performance you can get from one box
• Add replication, then add sharding
– Understand how these affect performance in your use case
• POC can be done on a smaller scale to infer what will be
needed for production
15. POC – Requirements to Gather
• Data Sizes
– Total Number of Documents
– Average Document Size
– Size of Data on Disk
– Size of Indexes on Disk
– Expected growth
– What is your document model?
• Ingestion
– Insertions / Updates / Deletes per second, peak &
average
– Bulk inserts / updates? If so, how large and how often?
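The data-size figures listed above can be combined into a first-pass storage estimate. The following is a minimal Python sketch, not something shown in the talk: the class and field names are hypothetical, the growth model (compound monthly growth) is an assumption, and it ignores compression, padding, and storage engine overhead.

```python
from dataclasses import dataclass


@dataclass
class SizingInputs:
    """POC measurements. Field names are illustrative, not a MongoDB API."""
    total_documents: int
    avg_document_bytes: int
    index_bytes: int
    monthly_growth_rate: float  # assumed compound rate, e.g. 0.05 for 5% per month


def data_bytes(inputs: SizingInputs) -> int:
    # Raw data size: document count times average document size.
    return inputs.total_documents * inputs.avg_document_bytes


def projected_total_bytes(inputs: SizingInputs, months: int) -> float:
    # Data plus indexes, grown at the assumed monthly rate.
    current = data_bytes(inputs) + inputs.index_bytes
    return current * (1 + inputs.monthly_growth_rate) ** months
```

Feeding in numbers measured during the POC, rather than guesses, is what makes an estimate like this useful.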
16. POC – Requirements to Gather
• Query Patterns and Performance Expectations
– Read Response SLA
– Write Response SLA
– Range queries or single document queries?
– Sort conditions
– Is more recent data queried more frequently?
• Data Policies
– How long will you keep the data for?
– Replication Requirements
– Backup Requirements / Time to Recovery
17. POC – Requirements to Gather
• Multi-datacenter Requirements
– Number and location of datacenters
– Cross DC latency
– Active / Active or Active / Passive?
– Geographical / Data locality requirements?
• Security Requirements
– Encryption over the wire (SSL)?
– Encryption of data at rest?
18. Resource Usage
• Storage
– IOPS
– Size
– Data & Loading Patterns
• Memory
– Working Set
• CPU
– Speed
– Cores
• Network
– Latency
– Throughput
26. Case Study #1: A Spanish Bank
• Problem statement: want to store 6 months’ worth of
logs
• 18TB of total data (3 TB/month)
• Primarily analyzing the last month’s worth of logs, so
Working Set Size is 1 month’s worth of data (3TB)
plus indexes (1TB) = 4 TB Working Set
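The working-set arithmetic on this slide can be written out as a short Python sketch. The numbers come from the slide; the variable names are illustrative.

```python
MONTHS_RETAINED = 6   # retention period from the problem statement
TB_PER_MONTH = 3      # log volume per month
HOT_MONTHS = 1        # only the last month is analyzed regularly
INDEX_TB = 1          # index size stated on the slide

total_data_tb = MONTHS_RETAINED * TB_PER_MONTH
working_set_tb = HOT_MONTHS * TB_PER_MONTH + INDEX_TB

print(total_data_tb, working_set_tb)  # 18 4
```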
27. Case Study #1: Hardware Selection
• QA Environment
– Did not want to mirror a full production cluster. Just
wanted to hold 2TB of data
– 3 nodes / shard * 4 shards = 12 physical machines
– 2 mongos
– 3 config servers (virtual machines)
• Production Environment
– 3 nodes / shard * 36 shards = 108 physical machines
– 128 GB RAM per node * 36 shards = 4.6 TB RAM
– 2 mongos
– 3 config servers (virtual machines)
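The machine and RAM totals above follow from two multiplications, sketched here in Python. The function names are hypothetical. Counting RAM once per shard mirrors the slide's own arithmetic (128 GB * 36); the likely reasoning, stated here as an assumption, is that the members of a replica set hold copies of the same data.

```python
def machine_count(shards: int, nodes_per_shard: int) -> int:
    # Every shard is a replica set of nodes_per_shard physical machines.
    return shards * nodes_per_shard


def cluster_ram_gb(shards: int, ram_gb_per_node: int) -> int:
    # RAM is counted once per shard, as on the slide.
    return shards * ram_gb_per_node


qa_machines = machine_count(shards=4, nodes_per_shard=3)
prod_machines = machine_count(shards=36, nodes_per_shard=3)
prod_ram_gb = cluster_ram_gb(shards=36, ram_gb_per_node=128)

print(qa_machines, prod_machines, prod_ram_gb / 1000)  # 12 108 4.608
```

The resulting 4.6 TB sits above the 4 TB working set from the previous slide, which leaves some headroom.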
28. Case Study #2: A Large Online Retailer
• Problem statement: Moving their product catalog
from SQL Server to MongoDB as part of a larger
architectural overhaul to Open Source Software
• 2 main datacenters running active/active
• On Cyber Monday they peaked at 214 requests/sec,
so let’s budget for 400 requests/sec to give some
headroom
29. Case Study #2: The POC
• A POC yielded the following numbers:
– 4 million product SKUs, average JSON document size
30KB
• Need to service requests for:
– a specific product (by _id)
– Products in a specific category (e.g. “Desks” or “Hard
Drives”)
• Returns 72 documents, or 200 if it’s a Google bot
crawling
30. Case Study #2: The Math
• Want to partition (Shard) by category, and have
products that exist in multiple categories duplicated
– The average product appears in 2 categories, so we
actually need to store 8M SKU documents, not 4M
• 8M docs * 30KB/doc = 240GB of data
• 270 GB with indexes
• Working Set is 100% of all data + indexes as this is
a core functionality that must be fast at all times
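The math on this slide, written out as a Python sketch. The inputs are the slide's figures; the 30 GB index size is not stated directly and is only implied by the difference between 270 GB and 240 GB. Decimal units are assumed (1 GB = 1,000,000 KB).

```python
SKUS = 4_000_000
AVG_CATEGORIES_PER_PRODUCT = 2   # each product is duplicated per category
DOC_KB = 30
DATA_PLUS_INDEX_GB = 270         # stated on the slide

documents = SKUS * AVG_CATEGORIES_PER_PRODUCT
data_gb = documents * DOC_KB // 1_000_000      # decimal KB -> GB
implied_index_gb = DATA_PLUS_INDEX_GB - data_gb

# The whole data set plus indexes is treated as the working set.
working_set_gb = DATA_PLUS_INDEX_GB

print(documents, data_gb, implied_index_gb)  # 8000000 240 30
```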
31. Case Study #2: Our Recommendation
• MongoDB’s initial recommendation was to deploy a single
Replica Set with enough RAM in each server to hold all the
data (at least 384GB RAM/server)
• 4 node Replica Set (2 nodes in each DC, 1 arbiter in a 3rd DC)
– Allows for a node in each DC to go down for maintenance or system
crash while still servicing the application centers in that datacenter
• Deploy using secondary reads (NEAREST read preference)
• This avoids the complexity of sharding, setting up mongos,
config servers, worrying about orphaned documents, etc.
33. Case Study #2: Actual Provisioning
• Customer decided to deploy on their corporate
VMWare Cloud
• IT would not give them nodes any bigger than 64
GB RAM
• Decided to deploy 3 shards (4 nodes each + arbiter)
= 192 GB RAM cluster-wide into a staging
environment and add a fourth shard if staging
proves it would be worthwhile
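To see why a fourth shard was on the table, the cluster-wide RAM can be compared with the 270 GB working set from the earlier math slide. This is a rough Python sketch under simplifying assumptions: RAM is counted once per shard, and memory used by the operating system and by mongod itself is ignored. The function name is hypothetical.

```python
WORKING_SET_GB = 270    # data + indexes, from the earlier math slide
RAM_GB_PER_NODE = 64    # largest VM that IT would provide


def ram_coverage(shards: int) -> float:
    """Fraction of the working set that fits in RAM, counted once per shard."""
    return shards * RAM_GB_PER_NODE / WORKING_SET_GB


for shards in (3, 4):
    print(shards, shards * RAM_GB_PER_NODE, round(ram_coverage(shards), 2))
# 3 192 0.71
# 4 256 0.95
```

Under these assumptions neither configuration holds the full working set in memory, which is why testing in staging with a real workload matters before settling on a shard count.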
34. Key Takeaways
• Document your performance requirements up front
• Conduct a Proof of Concept
• Always test with a real workload
• Constantly monitor and adjust based on changing
requirements