This document provides an introduction and best practices for deploying MongoDB on AWS. It describes MongoDB's document model and features like rich queries, geospatial search, and aggregation. New features in MongoDB 3.2 include in-memory storage, encrypted storage, document validation, and dynamic lookups. The document discusses MongoDB high availability using replica sets, elastic scalability through automatic sharding, and query routing. It offers guidance on AWS instance types, EBS volumes, and global deployment architectures. Lastly, it covers management and monitoring tools like Ops Manager, backup strategies, and visual profiling.
4. Features
Rich Queries: Find Paul’s cars. Find everybody in London with a car built between 1970 and 1980.
Geospatial: Find all of the car owners within 5km of Trafalgar Sq.
Text Search: Find all the cars described as having leather seats.
Aggregation: Calculate the average value of Paul’s car collection.
Map Reduce: What is the ownership pattern of colors by geography over time (is purple trending in China?)
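As a rough illustration, the example questions on this slide could be expressed as MongoDB query documents along the following lines. The slides do not define a schema, so every field name here (owner, city, cars.year, cars.value, location, description) and the approximate Trafalgar Square coordinates are assumptions made for the sketch.

```python
# Hypothetical schema: one document per owner with an embedded "cars" array.
# Field names are illustrative only; the slides do not specify them.

TRAFALGAR_SQ = [-0.1281, 51.5080]  # approximate, GeoJSON order [longitude, latitude]

# Rich query: everybody in London with a car built between 1970 and 1980.
london_classics = {
    "city": "London",
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}

# Geospatial: car owners within 5 km of Trafalgar Square.
# Assumes a 2dsphere index on the "location" field.
near_trafalgar = {
    "location": {
        "$nearSphere": {
            "$geometry": {"type": "Point", "coordinates": TRAFALGAR_SQ},
            "$maxDistance": 5000,  # metres
        }
    }
}

# Text search: cars described as having leather seats.
# Assumes a text index on the description field; the escaped quotes
# request a phrase match.
leather_seats = {"$text": {"$search": "\"leather seats\""}}

# Aggregation: average value of Paul's car collection.
avg_value_pipeline = [
    {"$match": {"owner": "Paul"}},
    {"$unwind": "$cars"},
    {"$group": {"_id": "$owner", "avg_value": {"$avg": "$cars.value"}}},
]
```

With a driver such as PyMongo, the first three documents would typically be passed to a collection's find method and the last to aggregate.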
6. New Storage Engines
In-memory: Run your most demanding, real-time apps with the in-memory storage engine (beta)
Encrypted: Secure your data at rest with the encrypted storage engine
7. Data Governance: Document Validation
Implement data governance without sacrificing the agility that comes from a dynamic schema
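A minimal sketch of what a validation rule might look like, assuming a hypothetical collection of car owners. In MongoDB 3.2 a validator is written with ordinary query operators; the field names and the specific rules below are invented for illustration.

```python
# Hypothetical validation rules: every document must have a string "owner"
# and, if you insert it, an "email" field must be present.
validator = {
    "$and": [
        {"owner": {"$type": "string"}},
        {"email": {"$exists": True}},
    ]
}

# Options that would accompany collection creation (or a collMod command).
# validationLevel may be "off", "strict", or "moderate";
# validationAction may be "error" (reject) or "warn" (log only).
collection_options = {
    "validator": validator,
    "validationLevel": "moderate",
    "validationAction": "warn",
}
```

Starting with "moderate" and "warn" is one way to introduce rules on an existing collection without rejecting writes; switching to "strict" and "error" would enforce them.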
8. Dynamic Lookup
Combine data from multiple collections with left outer joins for richer analytics & more flexibility in data modeling
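The left outer join described here is exposed as the $lookup aggregation stage. The sketch below shows a $lookup stage for two hypothetical collections (owners and cars, linked by an assumed owner_id field), followed by a small pure-Python emulation of the join semantics so the result shape can be seen without a server. The emulation is a simplification and not how the server implements the stage.

```python
# $lookup stage, run against a hypothetical "owners" collection.
lookup_pipeline = [
    {
        "$lookup": {
            "from": "cars",              # collection to join with
            "localField": "_id",         # field in owners
            "foreignField": "owner_id",  # field in cars
            "as": "cars",                # output array field
        }
    }
]


def left_outer_join(owners, cars):
    """Emulate the join: every owner is kept, unmatched owners get []."""
    joined = []
    for owner in owners:
        doc = dict(owner)
        doc["cars"] = [c for c in cars if c.get("owner_id") == owner["_id"]]
        joined.append(doc)
    return joined


owners = [{"_id": 1, "name": "Paul"}, {"_id": 2, "name": "Alex"}]
cars = [{"model": "Mini", "owner_id": 1}, {"model": "E-Type", "owner_id": 1}]
result = left_outer_join(owners, cars)
```

Because the join is a left outer join, the owner with no cars still appears in the result, with an empty cars array.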
9. Richer In-Database Analytics & Search
New aggregation operators extend options for performing analytics with lower developer complexity
Array Operators: $slice, $arrayElemAt, $concatArrays, $filter, $min, $max, $avg, $sum, and more.
Math Operators: $stdDevSamp, $stdDevPop, $sqrt, $abs, $trunc, $ceil, $floor, $log, $pow, $exp, and more.
Text: Case sensitive text search and support for languages such as Arabic, Farsi, Chinese, and more.
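To give a feel for how a few of these operators combine, here is a hedged sketch of an aggregation pipeline over a hypothetical cars collection (the owner and value fields are assumptions). It uses $stdDevSamp and $avg as group accumulators, then $trunc and $slice in a projection.

```python
# Hypothetical pipeline: per-owner statistics on car values.
stats_pipeline = [
    {
        "$group": {
            "_id": "$owner",
            "avg_value": {"$avg": "$value"},
            "value_spread": {"$stdDevSamp": "$value"},  # sample std deviation
            "values": {"$push": "$value"},
        }
    },
    {
        "$project": {
            "avg_value": {"$trunc": "$avg_value"},       # drop the fraction
            "value_spread": 1,
            "first_three": {"$slice": ["$values", 3]},   # first 3 array items
        }
    },
]
```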
10. MongoDB Connector for BI
Visualize and explore multi-structured data using SQL-based BI platforms.
Connector for BI: provides schema, translates queries, translates responses
11. Compass: The GUI for MongoDB
Visually explore your data and schema.
Run ad hoc queries in seconds
Make smarter decisions about indexing, document validation, and more.
No command line needed
13. Replica Sets
Replica set – 2 to 50 copies
Replica sets make up a self-healing ‘shard’
Replica sets address high availability, maintenance (e.g., HW swaps), and disaster recovery
[Diagram: Application and Driver connected to a Primary, which replicates to two Secondaries]
14. Replica Sets – Workload Isolation
Replica sets enable workload isolation
Example: Operational workloads on the primary node, analytical workloads on the secondary nodes
[Diagram: Application; Primary (In-memory); two Secondaries (WiredTiger); labels: User Data (Sessions, Cart, Recommendations), Persisted User Data]
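One common way to steer analytical reads to secondaries is the readPreference option in the connection string. The sketch below builds two such URIs; the host names and the replica set name rs0 are placeholders, not values from the slides.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference):
    """Build a mongodb:// URI with replicaSet and readPreference options."""
    query = urlencode(
        {"replicaSet": replica_set, "readPreference": read_preference}
    )
    return "mongodb://{}/?{}".format(",".join(hosts), query)


# Placeholder hosts for a three-member replica set.
HOSTS = ["a.example.net:27017", "b.example.net:27017", "c.example.net:27017"]

# Operational traffic reads from (and writes to) the primary.
operational_uri = replica_set_uri(HOSTS, "rs0", "primary")

# Analytical traffic reads from secondaries only.
analytics_uri = replica_set_uri(HOSTS, "rs0", "secondary")
```

Reads from secondaries can lag behind the primary, so this pattern suits analytics that tolerate slightly stale data.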
15. Elastic Scalability: Automatic Sharding
Increase or decrease capacity as you go
Automatic load balancing
Three types of sharding: hash-based, range-based, and tag-aware
[Diagram: Shard 1, Shard 2, Shard 3, ... Shard N; horizontally scalable]
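The difference between hash-based and range-based sharding can be illustrated with a toy model. The two command documents show how a shard key strategy would be chosen through the shardCollection admin command (the namespace garage.cars and the key fields are hypothetical). The two functions are a deliberately simplified stand-in for chunk placement: MongoDB actually assigns ranges of keys (or of hashed keys) to chunks, not a modulo of an MD5 digest.

```python
import hashlib

# Admin commands issued through a mongos; names are placeholders.
hashed_cmd = {"shardCollection": "garage.cars", "key": {"owner_id": "hashed"}}
ranged_cmd = {"shardCollection": "garage.cars", "key": {"city": 1, "owner_id": 1}}


def range_shard(key, upper_bounds):
    """Toy range sharding: index of the first range whose bound exceeds key."""
    for index, upper in enumerate(upper_bounds):
        if key < upper:
            return index
    return len(upper_bounds)  # keys beyond the last bound go to the last shard


def hash_shard(key, n_shards):
    """Toy hash sharding: hash the key, then spread it over n_shards."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


# A burst of monotonically increasing keys, e.g. newly issued IDs.
new_ids = range(1000, 1100)
range_targets = {range_shard(k, [100, 200, 300]) for k in new_ids}
hash_targets = {hash_shard(k, 4) for k in new_ids}
```

In this model every new ID lands on the same shard under range sharding, while hashing spreads the same IDs across shards. The trade-off is that range-based keys keep adjacent values together, which favors range queries.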
16. Query Routing
Multiple query optimization models
Each of the sharding options is appropriate for different apps / use cases
18. AWS EC2 Instance Types
General Purpose - M3, M4
• Start with General Purpose instances and EBS GP2
Compute-optimized - C3, C4
• WiredTiger - Write performance due to document-level concurrency control
Memory-optimized - R3
• In-memory storage engine: NEW!
• Larger working set - MMAPv1 with read-intensive applications
Storage-optimized - I2, D2
• Local instance store, but data is lost when the instance is stopped or terminated
• Always use with a higher replication factor
https://docs.mongodb.com/ecosystem/platforms/amazon-ec2/#deployment-notes
19. Amazon EBS Volumes
EBS GP2 for general workloads and EBS PIOPS (Provisioned IOPS) for consistent performance
https://docs.mongodb.com/ecosystem/platforms/amazon-ec2/#storage-considerations
EBS-optimized instances provide additional/dedicated bandwidth
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html
EBS snapshots for recovery and backups
https://docs.mongodb.com/ecosystem/platforms/amazon-ec2/#backup-restore-verify
28. The Best Way to Run MongoDB
Ops Manager and Cloud Manager allow you to leverage and automate best practices.
10x-20x more efficient operations
Complete performance visibility
Protection from data loss
Performance optimization
29. How they work
[Diagram: Ops Manager or Cloud Manager communicating with an Agent running alongside each of three mongod processes]
30. Amazon QuickStart
https://aws.amazon.com/quickstart/
VPC with private and public subnets
Instance role with fine-grained permissions.
Security groups
A fully customized MongoDB cluster - replica sets, shards, and config servers - and customized EBS storage
31. Monitoring & Alerting
More than 100 database-related metrics
Dozens of optimized charts
Custom alerts so incidents do not become emergencies
32. Do-It-Yourself
Set of utilities distributed with MongoDB – mongostat, mongotop
Database commands – serverStatus, dbStats, collStats, etc.
Self-hosted – Ganglia, mtop, munin, nagios
Hosted (SaaS) – New Relic, Datadog, Server Density
https://docs.mongodb.com/manual/administration/monitoring/
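For the do-it-yourself route, a small script can turn serverStatus output into a couple of headline numbers. The sketch below assumes the document has already been fetched (with PyMongo this would typically be the result of running the serverStatus command against the admin database) and reads only the connections, opcounters, and uptime sections. The summarize helper and the sample values are hypothetical.

```python
def summarize(status):
    """Derive two simple health indicators from a serverStatus document."""
    conn = status["connections"]
    total_slots = conn["current"] + conn["available"]
    total_ops = sum(status["opcounters"].values())
    return {
        # Fraction of connection slots currently in use.
        "connection_utilization": conn["current"] / total_slots,
        # Average operations per second since the process started.
        "avg_ops_per_second": total_ops / status["uptime"],
    }


# Made-up sample resembling part of a serverStatus response.
sample_status = {
    "uptime": 60,
    "connections": {"current": 50, "available": 150},
    "opcounters": {
        "insert": 100, "query": 300, "update": 100,
        "delete": 50, "getmore": 20, "command": 30,
    },
}
summary = summarize(sample_status)
```

Because opcounters are cumulative, a real monitor would more likely sample twice and report the difference over the interval rather than an average since startup.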
33. Backup with Point-in-time Recovery
Cluster-wide snapshots of sharded clusters
Restore to precisely the moment you need, quickly and safely with point-in-time restores
34. Manual Backup Considerations
Consider a hidden member in a Replica set
Consider EBS Snapshots
Consider journaling (write-ahead log) to allow for DB durability in case of a fault
Ensure consistency by using db.fsyncLock()
https://docs.mongodb.com/ecosystem/tutorial/backup-and-restore-mongodb-on-amazon-ec2/
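The lock, snapshot, unlock sequence implied by db.fsyncLock() can be wrapped so the unlock always runs, even if the snapshot step fails. The sketch below assumes an object with a PyMongo-style command method for the admin database and a caller-supplied take_snapshot function (for example, one that triggers an EBS snapshot); both names are hypothetical. RecordingAdmin is only a stand-in so the sketch runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def writes_locked(admin_db):
    """Flush and block writes for the duration of the with-block."""
    admin_db.command("fsync", lock=True)   # same effect as db.fsyncLock()
    try:
        yield
    finally:
        admin_db.command("fsyncUnlock")    # always release the lock


def snapshot_with_lock(admin_db, take_snapshot):
    """Run take_snapshot() while the node is locked against writes."""
    with writes_locked(admin_db):
        return take_snapshot()


class RecordingAdmin:
    """Stand-in for an admin database handle; records commands it receives."""

    def __init__(self):
        self.calls = []

    def command(self, name, **kwargs):
        self.calls.append((name, kwargs))
        return {"ok": 1}


demo_admin = RecordingAdmin()
snapshot_id = snapshot_with_lock(demo_admin, lambda: "snap-placeholder")
```

Running this against a hidden replica set member, as suggested on this slide, would keep the write lock away from the primary.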
35. Visual Query Profiler
Identify the slow-running queries across your cluster with just the click of a button
Index suggestions to improve your query performance
Automate rolling index builds to reduce operational overhead and the risk of failovers
36. Resources
MongoDB on AWS best practices:
http://docs.mongodb.org/ecosystem/platforms/amazon-ec2/
MongoDB Production Notes:
http://docs.mongodb.org/manual/administration/production-notes/
MongoDB Documentation:
http://docs.mongodb.org