When running MongoDB on AWS, you want to design your infrastructure to avoid storage bottlenecks and make the best use of your available storage resources. AWS offers a range of storage options, including ephemeral disks, EBS, Provisioned IOPS, and ephemeral SSDs, each with different performance and persistence characteristics. In this session, we’ll evaluate each of these options in the context of your MongoDB deployment, assessing the benefits and drawbacks of each.
7. MongoDB Elements & AWS Storage: Data Lifecycle
• Instance
– Data
– Log
– Journal
• EBS
– Data
– Log
– Journal
– Snapshots
• S3
– Snapshots
– Archived backups
• Glacier
– Archived backups
8. Instance Storage
• Ephemeral
– If your instance is stopped or terminated, ephemeral
storage is lost (!)
• Configurations
– Single or multiple volumes per instance
• Management
– LVM for RAID or snapshots
9. EBS
• Persistent
– Allocated and attached to individual instances like
network-attached storage
– Storage lifecycle independent of instances
• Configuration
– Single or multiple volumes per instance
• Management
– LVM or MD for RAID
– EBS Snapshots (Console or API)
10. Standard EBS
Standard volumes are designed for applications with
moderate I/O requirements. They are also well-suited
for use as boot volumes or applications where I/O can
be bursty.
• Performance is somewhat variable
• Average of 100 IOPS
• Possible to aggregate via RAID but underlying
bursty nature still exists
11. Provisioned IOPS EBS
Provisioned IOPS volumes offer storage with
consistent and low-latency performance, and are
designed for applications with I/O-intensive workloads
such as databases.
• Consistent volume I/O performance
• Available with 100-4000 IOPS per volume
• Launch with EBS-Optimized
– Adds additional network bandwidth for EBS volumes
12. Measuring IOPS
• Volumes are optimized for 4 KB per operation
• MongoDB document sizes and workload patterns
will affect throughput
• Use mongoperf to test disk configuration
– Threads
– Data file size
– Document size
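The three mongoperf knobs above can be captured in a small config generator. This is a minimal sketch: mongoperf reads a JSON document on standard input, and the mapping used here (threads to nThreads, data file size to fileSizeMB, document size to recSizeKB) is an assumption about how the slide's terms line up with the tool's options. The helper name mongoperf_config and the values in the example are hypothetical.

```python
import json


def mongoperf_config(threads, file_size_mb, record_size_kb, reads=True, writes=True):
    """Return a mongoperf configuration as a JSON string.

    Assumed mapping from the slide's terms to mongoperf options:
      threads        -> nThreads
      data file size -> fileSizeMB
      document size  -> recSizeKB
    """
    config = {
        "nThreads": threads,
        "fileSizeMB": file_size_mb,
        "recSizeKB": record_size_kb,
        "r": reads,        # issue read operations
        "w": writes,       # issue write operations
        "mmf": False,      # False tests raw disk I/O rather than memory-mapped files
        "sleepMicros": 0,  # no pause between operations
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Hypothetical starting values; vary them to match your own workload.
    print(mongoperf_config(threads=16, file_size_mb=1000, record_size_kb=4))
```

The output can be piped into mongoperf on the volume under test; rerunning with different thread counts and record sizes shows how the disk configuration responds to workload shape.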
14. Multiple EBS Volumes
• Provisioned IOPS EBS
• EBS-optimized
• Separate volumes for
– Data
– Journal
– Log
• Decrease disk contention during high load
15. Disk Configurations
• Mirror or stripe multiple disks (or both)
– LVM
– MDADM
• Different implications for each RAID level
– Durability
– Performance
– Cost
16. Aggregating IOPS
• Single volumes capable of 4000 IOPS
• Stripe volumes to aggregate IOPS (RAID0, RAID10)
• Note: network bandwidth is the limiting factor
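The arithmetic behind the network note can be sketched as follows. Striping adds up per-volume IOPS, but the result cannot exceed what the instance's EBS network link can carry at a given I/O size. The function name, the 4 KB I/O size, and the 1000 Mbps link figure are illustrative assumptions, not measured values.

```python
def effective_iops(volumes, iops_per_volume, io_size_kb, network_mbps):
    """Estimate usable IOPS for a RAID0 stripe of EBS volumes.

    The striped total is capped by the number of I/O operations that
    fit through the network link each second (a simplifying assumption
    that ignores protocol overhead).
    """
    striped = volumes * iops_per_volume
    bytes_per_second = network_mbps * 1_000_000 // 8
    network_cap = bytes_per_second // (io_size_kb * 1024)
    return min(striped, network_cap)


if __name__ == "__main__":
    # 4 volumes: the stripe itself is the limit.
    print(effective_iops(4, 4000, 4, 1000))  # 16000
    # 8 volumes: the assumed 1000 Mbps link is the limit, not the 32000 striped IOPS.
    print(effective_iops(8, 4000, 4, 1000))  # 30517
```

Larger I/O sizes lower the network cap proportionally, so the point at which adding volumes stops helping depends on the workload.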
21. Data Safety
• What’s your backup plan?
• Have you tested restoring?
• Is your data highly available?
• How do you recover from disaster?
22. Protecting Your Data
• Replica Sets
– Proper deployments provide HA and DR
• Manual backup/restore
– Scriptable, tuneable
• MMS Backup
– Continuous, secure backup
23. Manual Backup Procedures
EBS
• EBS Snapshots
• LVM Snapshots
Ephemeral
• LVM Snapshots
Note:
• EBS snapshots can be done “hot” but for MongoDB it’s better
to fsyncLock()
• LVM snapshots require enough free space on instance to
store snapshot
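The lock, snapshot, unlock ordering in the note above can be sketched as a small wrapper that always releases the lock, even when the snapshot fails. This is a sketch under stated assumptions: the three callables are placeholders that a real script would back with fsyncLock(), an EBS or LVM snapshot command, and fsyncUnlock(); the name snapshot_with_lock is hypothetical.

```python
def snapshot_with_lock(lock, snapshot, unlock):
    """Run snapshot() while writes are locked, and always unlock afterwards.

    lock, snapshot, and unlock are caller-supplied callables (assumed
    to wrap fsyncLock, the snapshot command, and fsyncUnlock).
    """
    lock()
    try:
        return snapshot()
    finally:
        # Runs on success and on failure, so mongod is never left locked.
        unlock()


if __name__ == "__main__":
    # Stand-in callables that only record the order of operations.
    steps = []
    snapshot_with_lock(
        lock=lambda: steps.append("fsyncLock"),
        snapshot=lambda: steps.append("snapshot"),
        unlock=lambda: steps.append("fsyncUnlock"),
    )
    print(steps)
```

Keeping the unlock in a finally clause matters because a mongod left locked blocks all writes until someone notices.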
24. Restore
• Boot new or use existing instance
• Create new volume from EBS snapshot and attach
or
• Copy over LVM snapshot and create/mount LV
25. Archiving Backups
• LVM
– Copy snapshots to S3 bucket
– Create lifecycle rules to move data from bucket to Glacier
• EBS
– Mount volume from snapshot
– Copy volume data to S3 bucket
– Create lifecycle rules to move data from bucket to Glacier
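A lifecycle rule that moves backups from an S3 bucket to Glacier can be sketched as a plain configuration document. The structure below follows the shape of an S3 bucket lifecycle configuration; the rule ID, the backups/ prefix, and the 30-day threshold are hypothetical values chosen for illustration.

```python
import json


def glacier_lifecycle(prefix, days_until_glacier):
    """Build a lifecycle configuration that transitions objects under
    `prefix` to the GLACIER storage class after the given number of days."""
    return {
        "Rules": [
            {
                "ID": "archive-mongodb-backups",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Assumed layout: backups are uploaded under the "backups/" prefix.
    print(json.dumps(glacier_lifecycle("backups/", 30), indent=2))
```

The resulting document can be applied to the bucket through the console, the CLI, or an SDK; only the document itself is shown here.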
30. Standard Ephemeral Storage
• Remember, it’s ephemeral
• Technically feasible
• Lack of persistence is a big negative
• Any benefits can’t outweigh the negatives
31. Ephemeral SSDs
• Performance ceiling might outweigh typical
negatives
• Cost implications: SSD-backed instances are more
expensive
• Does your workload truly need flash?
– Profile early and often to make this determination
• How many drives do you need?
– Drives instance choice
33. SSD Deployment Strategies
• SSD deployments
– Replica Sets and MMS Backup
• High performance
• Highly available
• Continuous backup
[Diagram: replica set of three mongod processes (one Primary, two Secondaries) with an MMS Backup Agent]
34. SSD Deployment Considerations
• One Secondary could use EBS
• Will need to have an instance with
– High network bandwidth and
– Multiple EBS volumes aggregated to approach IOPS
parity
• Key is avoiding significant replication lag because of
IO performance dropoff
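Sizing the EBS-backed Secondary can be sketched as a quick calculation: how many volumes, and how much network bandwidth, are needed to approach the IOPS of the SSD-backed members. The 30000 IOPS target and the 4 KB I/O size are hypothetical inputs; the 4000 IOPS per-volume default is the ceiling quoted earlier in the deck.

```python
import math


def ebs_parity_sizing(target_iops, iops_per_volume=4000, io_size_kb=4):
    """Return (volume_count, required_mbps) to approach target_iops on EBS.

    required_mbps is the raw data rate for the target at the given I/O
    size, ignoring protocol overhead (a simplifying assumption).
    """
    volumes = math.ceil(target_iops / iops_per_volume)
    required_mbps = target_iops * io_size_kb * 1024 * 8 / 1_000_000
    return volumes, required_mbps


if __name__ == "__main__":
    # Hypothetical SSD figure of 30000 IOPS.
    volumes, mbps = ebs_parity_sizing(30000)
    print(volumes, round(mbps, 2))  # 8 983.04
```

If the required bandwidth exceeds what the chosen instance type offers for EBS traffic, the Secondary will fall short of parity regardless of volume count, which is where replication lag starts.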
37. Best Practices
• Prototype > Test > Scale
• IO on AWS is easy to scale
• AWS makes it easy to iterate deployment
– Start small
– Profile your workload
– Remove all other bottlenecks
– Add instance and IO capacity
38. Recommended Starting Points
• EBS-Optimized and PIOPS EBS
• M1.large is an effective starting point for profiling an
early production deployment
• Use volumes with 250 or 500 IOPS for data to start
– Adding more IOPS is easy: snapshot and recreate with more capacity
40. Resources
• MMS Monitoring and Backup
– http://mms.mongodb.com
• MongoDB on AWS best practices:
– http://bit.ly/deploy-mongodb-ec2
• MongoDB on AWS Marketplace:
– http://bit.ly/aws-marketplace-mongodb
• MongoDB docs
– http://docs.mongodb.org
41. MongoDB World
New York City, June 23-25
#MongoDBWorld
See what’s next in MongoDB including
• MongoDB 2.6
• Sharding
• Replication
• Aggregation
http://world.mongodb.com
Save 25% with discount code 25SandeepParikh