Hardware Provisioning
Chad Tindel
Solution Architect, MongoDB
chad.tindel@mongodb.com
@ctindel
#MongoDBWorld
MongoDB is so easy for programmers… even a baby can write an application!
MongoDB is so easy to manage with MMS… even a baby can manage a cluster!
Hardware selection for MongoDB is… not so easy!
A Cautionary Tale
The methodology (in theory)
Requirements – Step One
• It is impossible to properly size a MongoDB cluster without first documenting your business requirements
• Availability: what is your uptime requirement?
• Throughput
• Responsiveness
– What is acceptable latency?
– Is higher latency during peak times acceptable?
Requirements – Step Two
• Understand the resources available to you
– Storage
– Memory
– Network
– CPU
• Many customers are limited to the options available in AWS or presented by their own Enterprise Virtualization team
Continuing Requirements – Step Three
• Once you deploy initially, it is common for requirements to change
– More users added to the application
• Causes more queries and a larger working set
– New functionality changes query patterns
• New indexes added cause a larger working set
– What started as a read-intensive application can add more and more write-heavy workloads
• More write-locking increases reader queue depth
• You must monitor, collect metrics, and update your hardware selection as necessary (scale up / add RAM? add more shards?)
Run a Proof of Concept
• Forces you to:
– Do schema / index design
– Understand query patterns
– Get a handle on Working Set size
• Start small on a single node
– See how much performance you can get from one box
• Add replication, then add sharding (see the sketch below)
– Understand how these affect performance in your use case
• A POC can be done at a smaller scale to infer what will be needed for production
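For the "add replication" step, a minimal mongo shell sketch, assuming three hypothetical hosts host1 through host3 on the default port:

    // Turn a single-node POC into a 3-member replica set
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "host1:27017" },
        { _id: 1, host: "host2:27017" },
        { _id: 2, host: "host3:27017" }
      ]
    })
    rs.status()   // confirm member states before re-running the benchmark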
POC – Requirements to Gather
• Data Sizes (see the measurement sketch below)
– Total Number of Documents
– Average Document Size
– Size of Data on Disk
– Size of Indexes on Disk
– Expected growth
– What is your document model?
• Ingestion
– Insertions / Updates / Deletes per second, peak & average
– Bulk inserts / updates? If so, how large and how often?
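Most of the size numbers can be read straight from the shell; a minimal sketch, assuming a hypothetical logs database with an events collection:

    // Gather data-size inputs for the POC
    use logs
    db.stats()          // dataSize, storageSize, indexSize for the whole database
    db.events.stats()   // per-collection count, avgObjSize, totalIndexSize
    db.events.count()   // total number of documents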
POC – Requirements to Gather
• Query Patterns and Performance Expectations
– Read Response SLA
– Write Response SLA
– Range queries or single document queries?
– Sort conditions
– Is more recent data queried more frequently?
• Data Policies
– How long will you keep the data? (a TTL index sketch follows below)
– Replication Requirements
– Backup Requirements / Time to Recovery
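If the retention answer is "N days", a TTL index can enforce it automatically; a minimal sketch, assuming a hypothetical createdAt date field and a six-month (180-day) window:

    // Documents expire ~180 days after their createdAt timestamp
    // (createdAt must hold a BSON date for TTL deletion to apply)
    db.events.ensureIndex(
      { createdAt: 1 },
      { expireAfterSeconds: 180 * 24 * 60 * 60 }
    )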
POC – Requirements to Gather
• Multi-datacenter Requirements
– Number and location of datacenters
– Cross-DC latency
– Active / Active or Active / Passive?
– Geographical / Data locality requirements?
• Security Requirements
– Encryption over the wire (SSL)?
– Encryption of data at rest?
Resource Usage
• Storage
– IOPS
– Size
– Data & Loading Patterns
• Memory
– Working Set
• CPU
– Speed
– Cores
• Network
– Latency
– Throughput
Storage Capability
7,200 rpm SATA       ~ 75-100 IOPS
15,000 rpm SAS       ~ 175-210 IOPS
Amazon SSD EBS       ~ 4,000 PIOPS / volume, ~ 48,000 PIOPS / instance
Intel X25-E (SLC)    ~ 5,000 IOPS
Fusion IO            ~ 135,000 IOPS
Violin Memory 6000   ~ 1,000,000 IOPS
Storage Measuring
Memory Measuring
• Added in 2.4: the workingSet option on db.serverStatus()
    > db.serverStatus( { workingSet: 1 } )
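The relevant portion of the response looks roughly like this (a sketch of the 2.4-era output; the numbers are illustrative, not from a real server):

    > db.serverStatus( { workingSet: 1 } ).workingSet
    {
      "note" : "thisIsAnEstimate",
      "pagesInMemory" : 18248,          // multiply by the 4KB page size for bytes
      "computationTimeMicros" : 26453,
      "overSeconds" : 840               // window over which the estimate was computed
    }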
Network
• Latency
– WriteConcern (see the sketch below)
– ReadPreference
• Throughput
– Update/Write Patterns
– Reads/Queries
• Come to love netperf
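WriteConcern directly trades latency for durability: the stronger the concern, the more network round-trips each write waits on. A minimal sketch, assuming a hypothetical orders collection:

    // Each insert waits for acknowledgement from a majority of replica set
    // members (or errors after 5 seconds), adding cross-node network latency
    db.orders.insert(
      { _id: 1, status: "new" },
      { writeConcern: { w: "majority", wtimeout: 5000 } }
    )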
CPU Usage
• Non-indexed Queries
• Sorting
• Aggregation (see the sketch below)
– Map/Reduce
– Aggregation Framework
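These are CPU-heavy because they scan, sort, or transform documents server-side; a minimal sketch of such an aggregation, assuming a hypothetical events collection:

    // Group and sort server-side: CPU cost grows with the documents scanned
    db.events.aggregate([
      { $match: { level: "ERROR" } },                      // can use an index
      { $group: { _id: "$service", n: { $sum: 1 } } },     // scans every match
      { $sort: { n: -1 } }                                 // in-memory sort
    ])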
Case Studies (theory applied)
Case Study #1: A Spanish Bank
• Problem statement: want to store 6 months' worth of logs
• 18 TB of total data (3 TB/month)
• Primarily analyzing the last month's worth of logs, so Working Set Size is 1 month's worth of data (3 TB) plus indexes (1 TB) = 4 TB Working Set
Case Study #1: Hardware Selection
• QA Environment
– Did not want to mirror a full production cluster; just wanted to hold 2 TB of data
– 3 nodes / shard * 4 shards = 12 physical machines
– 2 mongos
– 3 config servers (virtual machines)
• Production Environment
– 3 nodes / shard * 36 shards = 108 physical machines
– 128 GB RAM * 36 = 4.6 TB RAM
– 2 mongos
– 3 config servers (virtual machines)
Case Study #2: A Large Online Retailer
• Problem statement: moving their product catalog from SQL Server to MongoDB as part of a larger architectural overhaul to Open Source Software
• 2 main datacenters running active/active
• On Cyber Monday they peaked at 214 requests/sec, so let's budget for 400 requests/sec to give some headroom
Case Study #2: The POC
• A POC yielded the following numbers:
– 4 million product SKUs, average JSON document size 30 KB
• Need to service requests for:
– A specific product (by _id)
– Products in a specific category (e.g. "Desks" or "Hard Drives")
• Returns 72 documents, or 200 if it's a Google bot crawling
Case Study #2: The Math
• Want to partition (shard) by category, and have products that exist in multiple categories duplicated (see the sketch below)
– The average product appears in 2 categories, so we actually need to store 8M SKU documents, not 4M
• 8M docs * 30 KB/doc = 240 GB of data
• 270 GB with indexes
• Working Set is 100% of all data + indexes, as this is core functionality that must be fast at all times
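A minimal sketch of the category-based sharding being weighed here, assuming a hypothetical catalog.products namespace:

    // Partition the (duplicated) product documents on their category field
    sh.enableSharding("catalog")
    sh.shardCollection("catalog.products", { category: 1 })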
Case Study #2: Our Recommendation
• MongoDB's initial recommendation was to deploy a single Replica Set with enough RAM in each server to hold all the data (at least 384 GB RAM/server)
• 4-node Replica Set (2 nodes in each DC, 1 arbiter in a 3rd DC)
– Allows a node in each DC to go down for maintenance or a system crash while still servicing the application servers in that datacenter
• Deploy using secondary reads (NEAREST read preference; see the sketch below)
• This avoids the complexity of sharding, setting up mongos, config servers, worrying about orphaned documents, etc.
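In shell terms, the NEAREST preference looks roughly like this (a sketch; the products query is hypothetical):

    // Route reads to the lowest-latency member, primary or secondary
    db.getMongo().setReadPref("nearest")
    db.products.find({ category: "Desks" })   // may now be served locally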
[Diagram: Datacenter 1 – Node 1 (Primary), Node 2 (Secondary); Datacenter 2 – Node 3 (Secondary), Node 4 (Secondary); Datacenter 3 – Arbiter]
Case Study #2: Actual Provisioning
• Customer decided to deploy on their corporate VMware Cloud
• IT would not give them nodes any bigger than 64 GB RAM
• Decided to deploy 3 shards (4 nodes each + arbiter) = 192 GB RAM cluster-wide into a staging environment, and add a fourth shard if staging proves it worthwhile
Key Takeaways
• Document your performance requirements up front
• Conduct a Proof of Concept
• Always test with a real workload
• Constantly monitor and adjust based on changing requirements
Thank You
Chad Tindel
Solution Architect, MongoDB
#MongoDBWorld
Speaker Notes
1. Initialize -> Election: Primary + data replication from primary to secondary