Deployment Preparedness

Our application development is nearing completion. It's time to prepare our cluster for production, but are we sure the system is capable of handling the load? Have we achieved high availability? What preflight checks should we be running? Learn how Dev and Ops work together to achieve production readiness and plan for scale, availability, and monitoring.


  1. 1. Production Preparedness { name: ‘Bryan Reinero’, title: ‘Developer Advocate’, twitter: ‘@blimpyacht’, code: ‘github.com/breinero’, email: ‘bryan@mongdb.com’ }
  2. 2. 2 Deploy with Joy!
  5. 5. 5 Production Checklist • Proper Infrastructure • Proper Configuration • Proper Monitoring • Emergency Procedures
  6. 6. 6 Infrastructure Sizing • RAM • CPU • Disk Size • I/O Bandwidth • Availability
  7. 7. 7 Sizing • Indexes need to be in RAM • Working set needs to be in RAM • I/O Bandwidth - write load - Index updates - Working set migration { _id: ObjectId(), tour: UUID, user: UUID, name: "Doug's Dogs", desc: "The best hot-dog", clues: [ "Hungry for a Coney Island?", "Ask for Dr. Frankenfurter", "Look for the hot dog stand" ], "geometry": { "type": "Point", "coordinates": [125.6, 10.1] } }
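A back-of-the-envelope check for the sizing rules on this slide might look like the sketch below. The function name, the headroom margin, and the example figures are all hypothetical; real numbers would come from measured index and working-set sizes.

```python
def fits_in_ram(index_bytes, working_set_bytes, ram_bytes, headroom=0.2):
    """Return True if indexes plus working set fit in RAM with some headroom.

    headroom is an assumed safety margin (the fraction of RAM left for the OS,
    connections, and so on); it is not a figure from the slides.
    """
    usable = ram_bytes * (1 - headroom)
    return index_bytes + working_set_bytes <= usable


GIB = 1024 ** 3

# Hypothetical server with 32 GiB of RAM, so roughly 25.6 GiB usable.
print(fits_in_ram(6 * GIB, 18 * GIB, 32 * GIB))  # 24 GiB needed
print(fits_in_ram(6 * GIB, 22 * GIB, 32 * GIB))  # 28 GiB needed
```

The first call reports that the data fits and the second that it does not, which is the point at which page faults and I/O load would be expected to climb.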
  8. 8. 11 Load Testing
  9. 9. 12 Load Testing • Test it like you use it, benchmarks don’t count
  10. 10. 13 Load Testing • Test it like you use it, benchmarks don’t count • Test to failure
  11. 11. 14 Load Testing • Test it like you use it, benchmarks don’t count • Test to failure • Instrument your code!
  12. 12. 15 Load Testing • Test it like you use it, benchmarks don’t count • Test to failure • Instrument your code! https://github.com/breinero/Firehose https://github.com/ParsePlatform/flashback
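One way to read "Instrument your code!" is to time every database call in the application itself, so that load tests report the latencies the application actually sees. The sketch below is a minimal illustration; the `OpStats` class is hypothetical and is not taken from the Firehose or flashback projects linked above.

```python
import math
import time
from contextlib import contextmanager
from statistics import mean


class OpStats:
    """Collects wall-clock latencies for one kind of operation."""

    def __init__(self):
        self.latencies = []  # seconds, one entry per timed operation

    @contextmanager
    def timed(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the latency even if the operation raised.
            self.latencies.append(time.perf_counter() - start)

    def summary(self):
        if not self.latencies:
            return {"count": 0}
        ordered = sorted(self.latencies)
        # Nearest-rank 95th percentile.
        p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]
        return {"count": len(ordered), "mean": mean(ordered), "p95": p95}


stats = OpStats()
for _ in range(3):
    with stats.timed():
        sum(range(1000))  # stand-in for a real database call
print(stats.summary()["count"])
```

Wrapping each query in `stats.timed()` during a test-to-failure run gives a latency curve that can be compared against the server-side metrics.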
  13. 13. 16 Growth [Chart: load in thousands of ops/second over time, plotted against Warn and Saturation thresholds]
  14. 14. 17 Growth [Chart: memory load plotted against Warn and Saturation thresholds]
  15. 15. 18 Growth [Chart: input/output load plotted against Warn and Saturation thresholds]
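The Growth charts plot load against Warn and Saturation thresholds. A rough way to turn such a trend into a planning number is a straight-line fit, sketched below. The function name and the sample values are hypothetical, and real growth is rarely perfectly linear, so the result is only a first estimate.

```python
def periods_until(threshold, samples):
    """Estimate how many more sample periods pass before load hits threshold.

    Fits a least-squares line through equally spaced samples and extrapolates.
    Returns None when there are too few samples or load is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    variance = sum((x - x_mean) ** 2 for x in range(n))
    slope = covariance / variance
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    crossing = (threshold - intercept) / slope  # x position where the line hits threshold
    return max(0.0, crossing - (n - 1))         # periods remaining after the last sample


# Hypothetical monthly peaks in thousands of ops/second, with a Warn level of 12.
print(periods_until(12, [2, 4, 6, 8]))
```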
  17. 17. 20 Monitoring Baseline • MongoDB Cloud Manager • MongoDB Ops Manager • Nagios, Zenoss, … Detailed Query Specific • mongotop • db.currentOp() • Query Profiler • mtools
  18. 18. 21 Forensics
      2014-08-08T21:15:25.181-0500 [conn1026] getmore myDB.myCollection cursorid:100012502307 ntoreturn:0 keyUpdates:0 numYields:1406953 locks(micros) r:11887558422 nreturned:289 reslen:4208149 28795759ms
      2014-08-07T15:31:51.714-0500 [conn7] command myDB.$cmd command: createIndexes { createIndexes: "myColletion", indexes: [ { key: { Claims.ICN: 1.0 }, name: "test.a_1" } ] } keyUpdates:0 numYields:0 locks(micros) r:14476 w:25176930351 reslen:113 25176955ms
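Log lines in this older format end with the operation's duration in milliseconds, which makes slow operations easy to pull out with a small script. The sketch below is a hypothetical helper, not part of mtools, and its regular expression assumes the line layout shown on the slide; newer MongoDB versions log in a different, structured format.

```python
import re

# Assumed layout: timestamp, [connection], operation, namespace, details, duration.
LINE = re.compile(
    r"^(?P<ts>\S+) \[(?P<conn>\w+)\] (?P<op>\w+) (?P<ns>\S+) .* (?P<ms>\d+)ms$"
)


def slow_ops(lines, threshold_ms=100):
    """Yield (timestamp, operation, namespace, millis) for slow operations."""
    for line in lines:
        match = LINE.match(line.strip())
        if match and int(match["ms"]) >= threshold_ms:
            yield match["ts"], match["op"], match["ns"], int(match["ms"])


sample = (
    "2014-08-08T21:15:25.181-0500 [conn1026] getmore myDB.myCollection "
    "cursorid:100012502307 ntoreturn:0 keyUpdates:0 numYields:1406953 "
    "locks(micros) r:11887558422 nreturned:289 reslen:4208149 28795759ms"
)
for ts, op, ns, millis in slow_ops([sample]):
    print(ts, op, ns, f"{millis / 3_600_000:.1f} hours")
```

Run against the first line from the slide, this shows that the `getmore` took about eight hours.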
  19. 19. 22 Logging
  20. 20. 23 Logging • Save and Rotate • Don’t use --quiet • --logpath != --dbpath • Use component verbosity for debugging
  21. 21. 24 Security
  22. 22. 25 Security • Firewall • Bind IP • Encrypt Networks • Enable Access Control • Don’t enable REST interface • Auditing Limit Exposure and use the Principle of Least Privilege
  23. 23. 26 Tuning Best Practices • Disable transparent huge pages • NTP to synchronize time • Set ulimits • Use XFS or Ext4 • Don’t use NFS • Disable NUMA • Have swap • Read the Production Notes Tunables • Set I/O scheduler to NOOP • Adjust readahead (MMAPv1) • Avoid cgroups • SELinux (?) • RAID
  24. 24. 27 Availability http://avstop.com/ac/flighttrainghandbook/imagel4b.jpg
  25. 25. 28 Availability S S DC1 DC2 P Avoid Critical Data Centers
  26. 26. 29 Availability P S DC1 DC2 S DC3
  27. 27. 30 Availability P S DC1 DC2 S AWS
  28. 28. 31 Availability P S DC1 DC2 Arbiter DC3
  29. 29. 32 Availability P DC1 Arbiter AWS S DC2 Down for maintenance
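The Availability layouts above come down to one question: if any single data center is lost, can the remaining voting members still form a majority and elect a primary? The sketch below checks that arithmetic. The function name is hypothetical, and the example layouts are one reading of the slide diagrams, with arbiters counted because they vote in elections.

```python
def survives_any_single_dc_loss(voters_by_dc):
    """Return True if a majority of voters remains after losing any one site.

    voters_by_dc maps a data center name to its number of voting members
    (primaries, secondaries, and arbiters all count).
    """
    total = sum(voters_by_dc.values())
    majority = total // 2 + 1
    return all(total - lost >= majority for lost in voters_by_dc.values())


# Two of three members share one data center: losing it leaves no majority.
print(survives_any_single_dc_loss({"DC1": 2, "DC2": 1}))
# One member per site, including an arbiter in a third location.
print(survives_any_single_dc_loss({"DC1": 1, "DC2": 1, "DC3": 1}))
```

The first layout is the "critical data center" case the slide warns against; the second survives the loss of any one site.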
  30. 30. 33 Emergency Procedures https://spinoff.nasa.gov/spinoff2002/images/070.jpg
  31. 31. 34 Emergency Procedures Backup and Recovery • File System Snapshot • MMS Cloud • Ops Manager • Mongodump
  32. 32. 35 Backups and Recovery PERFORM DRILLS OFTEN AND ROUTINELY
  33. 33. 36 Emergency Procedures Document your Procedures • Include ETAs • Follow procedures in docs.mongodb.org
  34. 34. 37 Production Ready Architecture L.B.
  35. 35. 38 Production Ready Architecture L.B. Unindexed queries
  36. 36. 39 Production Ready Architecture L.B. Unindexed queries Leads to collection scans
  37. 37. 40 Production Ready Architecture L.B. Unindexed queries Leads to collection scans Results in high latencies
  38. 38. 41 Classic Failure Scenario L.B. Unindexed queries Leads to collection scans Results in high latencies Causes memory exhaustion
  39. 39. 42 Production Ready Architecture L.B. Unindexed queries Leads to collection scans Results in high latencies Causes memory exhaustion CASCADING FAILURE
  40. 40. 43 Circuit Breaker Trigger Conditions • Latency stats.getMean() >= max • OpsPerSecond stats.getN() >= max • ConcurrentOperations stats.getN()*stats.getMean() >= max
  41. 41. 44 Circuit Breaker Trigger Conditions • Latency stats.getMean() >= max • OpsPerSecond stats.getN() >= max • ConcurrentOperations stats.getN()*stats.getMean() >= max https://github.com/breinero/Firehose
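The three trigger conditions can be sketched in a few lines. This is an illustration of the idea on the slide, not the Firehose implementation: the class name and thresholds are hypothetical, and it assumes the statistics cover a one-second window with latencies in seconds, so that count times mean latency approximates the number of concurrent operations (Little's law).

```python
from statistics import mean


class CircuitBreaker:
    """Trips when latency, throughput, or concurrency reaches its limit."""

    def __init__(self, max_latency, max_ops_per_second, max_concurrent):
        self.max_latency = max_latency
        self.max_ops_per_second = max_ops_per_second
        self.max_concurrent = max_concurrent

    def tripped_conditions(self, latencies):
        """latencies: seconds taken by each operation in the last second."""
        count = len(latencies)
        average = mean(latencies) if latencies else 0.0
        tripped = []
        if average >= self.max_latency:
            tripped.append("latency")
        if count >= self.max_ops_per_second:
            tripped.append("ops_per_second")
        if count * average >= self.max_concurrent:
            tripped.append("concurrent_operations")
        return tripped


breaker = CircuitBreaker(max_latency=0.5, max_ops_per_second=1000, max_concurrent=50)
print(breaker.tripped_conditions([0.01] * 100))  # healthy window
print(breaker.tripped_conditions([0.1] * 600))   # too much work in flight
```

When any condition is reported, the application would shed or reject new database work instead of queueing it, which is what stops a slow query from turning into a cascading failure.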
  42. 42. 45 Client Side • Don’t use ensureIndex() in the application • Look out for connection bombs (--maxConns) • DO use operation timeouts • DON’T cause socket timeouts (lower keepalives) • Avoid retry bombs
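A "retry bomb" happens when every client retries a failed operation immediately and without limit, multiplying load on a database that is already struggling. One common mitigation, sketched below, is a capped number of attempts with exponential backoff and random jitter. The function and its defaults are hypothetical and not tied to any particular driver; `ConnectionError` stands in for whatever transient error the driver raises.

```python
import random
import time


def call_with_retries(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run operation, retrying transient failures a bounded number of times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            # Exponential backoff with full jitter spreads retries out in time.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(random.uniform(0, delay))


attempts = []


def flaky():
    """Stand-in for a database call that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(call_with_retries(flaky, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; production code would use the default `time.sleep`.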
  43. 43. 46 Requirements & Specs Make a DevOps Contract • Database Access Requirements • Database Access Fulfillment Specification • Cluster Configuration • Monitoring and Alerting Specification
  44. 44. 47 Monitoring • Opcounters • Memory • Page Faults • Queues • Replication Lag • Oplog Window • Background Flush Average • Disk space
  45. 45. Thanks! { name: ‘Bryan Reinero’, title: ‘Developer Advocate’, twitter: ‘@blimpyacht’, code: ‘github.com/breinero’, email: ‘bryan@mongdb.com’ }
