Monitoring MySQL with OpenTSDB

A monitoring system is arguably the most crucial system to have in place when administering and tuning the performance of any database system. DBAs also find themselves with a variety of monitoring systems and plugins to choose from, ranging from small scripts in cron to complex data collection systems. In this talk, I’ll discuss how Box shifted from the Cacti monitoring system and various other shell scripts to OpenTSDB, and the changes we made to our servers and to our daily interaction with monitoring to increase our agility in identifying and addressing changes in database behavior.

  1. Monitoring MySQL with OpenTSDB. Percona Live 2013. Geoffrey Anderson, Box Inc. @geodbz
  2. Who: Geoffrey Anderson
     • Database Operations Engineer @ Box, Inc.
     • a.k.a. DBA
     • Tooling for MySQL and HBase
     • #DBHangOps
  3. The Situation
  4. Then You Get More Servers
  5. Enter OpenTSDB
  6. OpenTSDB is...
     • Distributed
     • Scalable
     • Time Series Database
     • Runs on HBase
     • Created By Benoit Sigoure
     (Diagram labels: HBase; TSD for Querying; TSD for Storing; HAProxy; mydb.example.com; fe1.example.com; Push Metrics; Query via API)
  7. Why OpenTSDB?
     • FAST
     • EASY to Scale
     • EASY to Populate
     • EASY to collect data
     • EASY to Query
  8. Collecting Data
  9. Example: mysql_collector.sh
        #!/usr/bin/env bash
        timestamp=$(date +%s)
        mysql -ss -e "SHOW GLOBAL STATUS" | while read var val
        do
          echo "mysql.$var $timestamp $val host=$HOSTNAME"
        done
        ganderson@mydb.example.com:~$ ./mysql_collector.sh
        mysql.Aborted_connects 1366399993 0 host=mydb.example.com
        mysql.Binlog_cache_disk_use 1366399993 0 host=mydb.example.com
        mysql.Binlog_cache_use 1366399993 0 host=mydb.example.com
        mysql.Binlog_stmt_cache_disk_use 1366399993 0 host=mydb.example.com
        mysql.Binlog_stmt_cache_use 1366399993 0 host=mydb.example.com
        mysql.Bytes_received 1366399993 19453687 host=mydb.example.com
        mysql.Bytes_sent 1366399993 1238166682 host=mydb.example.com
        mysql.Com_admin_commands 1366399993 1 host=mydb.example.com
        mysql.Com_assign_to_keycache 1366399993 0 host=mydb.example.com
        ...
  10. Example: mysql_collector.sh (same script and output as slide 9, annotated). Each output line consists of: metric name, timestamp, value, “tags” (key=val).
  11. Example: adding a cron for OpenTSDB
        * * * * * mysql_collector.sh | nc opentsdb.example.com 4242
  12. tcollector directory layout
        ganderson@mydb.example.com:tcollector$ tree
        .
        |-- collectors
        |   |-- 0
        |   |   |-- ifstat.py
        |   |   |-- iostat.py
        |   |   |-- procnettcp.py
        |   |   |-- procstats.py
        |   |-- 15
        |   |   `-- dfstat.py
        |   |-- 30
        |   |   |-- mysql_collector.sh
        |   |-- 300
        |   |   `-- ptTcpModel.sh
        |   `-- etc
        |       |-- config.py
        |-- config
        |-- startstop
        `-- tcollector.py
     Directory names give the run interval: 0 = run forever, 15 = run every 15 seconds, 30 = run every 30 seconds, 300 = run every 5 minutes.
  13. Querying Data
  14. Interactive graph URL
        http://opentsdb.example.com/#start=2013/04/10-07:32:29
          &end=2013/04/10-07:57:57
          &m=sum:proc.stat.cpu.percentage_idle{host=db22}
          &o=axis x1y1
          &m=sum:db.threads_running{host=db22}
          &o=axis x1y2
          &ylabel=CPU idle
          &y2label=Threads Running
          &yrange=[0:]
          &wxh=1475x600
          &png
  15. Plain-text API URL
        http://opentsdb.example.com/q?start=2013/04/10-07:32:29
          &end=2013/04/10-07:57:57
          &m=sum:proc.stat.cpu.percentage_idle{host=db22}
          &o=axis x1y1
          &m=sum:db.threads_running{host=db22}
          &o=axis x1y2
          &ylabel=CPU idle
          &y2label=Threads Running
          &yrange=[0:]
          &ascii
  16. Leveraging OpenTSDB For MySQL
  17. user_statistics monitoring
  18. table_statistics monitoring
  19. Table Info from I_S
        SELECT *, DATA_LENGTH+INDEX_LENGTH AS TOTAL_LENGTH
        FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_SCHEMA NOT IN ('PERFORMANCE_SCHEMA','INFORMATION_SCHEMA')
  20. Query Throughput
  21. And other “common” metrics
     • Various MySQL status counters
        • QPS (questions)
        • Threads connected
        • Temporary tables on disk
        • Etc.
     • Various server statistics
        • %CPU Idle
        • Free disk space
        • I/O utilization
        • Network traffic
        • Etc.
  22. Future collectors
     • pt-query-digest/mysql slow query statistics
     • Data from “show engine innodb status” (that is missing from counters)
     • PERFORMANCE_SCHEMA (MySQL 5.6+)
        • Query statistics
        • Processlist information
        • Background thread information
  23. How does this change things?
  24. In all seriousness, though...
     • Easily see aggregate graphs
     • Easily build graphs on-the-fly
     • Full granularity forever
     • API request for raw data
     • Cluster-wide nagios checks with check_tsd
  25. Challenges Switching
     • Aggregates are the default
     • Mouse-zooming (patched!)
     • Auto-suggest for metrics
     • “The graphs aren’t pretty”
     • Migrating from proof of concept
     • Plan for 3+ machines
     • Data pruning may be required
  26. Some Quick Numbers: OpenTSDB @ Box
     • 21,294 metrics
     • 72 tag keys
     • 5,145,745 tag values
     • 90% Interactive graphs return <300ms
  27. Next Steps
  28. Enjoy #PerconaLive 2013. We’re hiring! https://www.box.com/about-us/careers/ geoff@box.com
  29. Image credits
     • http://upload.wikimedia.org/wikipedia/commons/7/7b/Batelco_Network_Operations_Centre_(NOC).JPG
     • http://www.flickr.com/photos/hoyvinmayvin/5873697252/
     • http://www.percona.com/doc/percona-monitoring-plugins
     • http://www.2cto.com/uploadfile/2012/0731/20120731112415744.jpg
     • http://media.tumblr.com/tumblr_lvfspoenWU1qi19a2.png
     • http://img.izismile.com/img/img4/20110527/640/you_can_be_a_superhero_640_01.jpg
     • http://openclipart.org/image/250px/svg_to_png/26427/Anonymous_notebook.png
     • http://images.alphacoders.com/768/2560-1600-76893.jpg
     • http://www.flickr.com/photos/in365/4861180503/
     • http://openclipart.org/image/250px/svg_to_png/130915/Prohibido_3D.png
     • http://www.flickr.com/photos/61114149@N02/5566484951/
     • http://opentsdb.net/img/tsd-sample.png
     • http://images2.wikia.nocookie.net/__cb20080911160202/bttf/images/5/57/WhatdidItellyou-HQ.jpg
     • http://www.flickr.com/photos/lisakayaks/3028350539/
     • http://www.flickr.com/photos/25566302@N00/1472400115
     • http://www.flickr.com/photos/grandmaitre/5846058698/
     • http://www.flickr.com/photos/7518432@N06/2673347604/
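
Slide 11 pipes the collector output straight into `nc` on port 4242. OpenTSDB's telnet-style interface expects each data point as a `put <metric> <timestamp> <value> <tag=value>` line and rejects non-numeric values, while tcollector adds the `put` prefix itself, which is presumably why the collector on slide 9 prints bare lines. The following is a minimal sketch for the cron-plus-`nc` route, not the talk's actual script: the function name `to_put_lines` is hypothetical, and it assumes the tab- or space-separated `name value` pairs that `mysql -ss -e "SHOW GLOBAL STATUS"` prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the talk): convert "name value" pairs on stdin
# into OpenTSDB telnet-style "put" lines.
# Usage: to_put_lines <timestamp> <host>
to_put_lines() (
  timestamp=$1
  host=$2
  while read -r var val; do
    # Keep only non-negative decimal numbers; status values such as
    # ON/OFF or empty strings are skipped.
    case $val in
      ''|.*|*.|*.*.*|*[!0-9.]*) continue ;;
    esac
    echo "put mysql.$var $timestamp $val host=$host"
  done
)

# Example wiring (assumes a reachable TSD at opentsdb.example.com:4242):
# mysql -ss -e "SHOW GLOBAL STATUS" | to_put_lines "$(date +%s)" "$HOSTNAME" | nc opentsdb.example.com 4242
```

Filtering in the collector keeps one bad value from polluting the stream; when the script runs under tcollector instead, drop the `put` prefix.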

Editor's notes

  • Will be talking about OpenTSDB:
     • How OpenTSDB changed monitoring at Box
     • How we leverage its abilities for day-to-day management of MySQL DBs
  • You probably have the Percona Cacti graphs and monitoring plugins
  • You add some other Nagios checks for fun edge cases
  • And you use different tools from the Percona Toolkit like:
     • Stalk
     • Poor man’s profiler (PMP)
     • Query Digest
  • Suddenly finding problems and correlating issues is difficult
     • Maybe you don’t have a NOC yet
     • Maybe you do, and they need better graphs
  • IT’S BIGGER ON THE INSIDE – just kidding
     • Fast!
     • Easy to build graphs on the fly
     • Hella easy to scale – just add nodes (HBase or TSDs)
     • Very easy to put data into it – NEXT SLIDES TALK ABOUT THIS YO
  • Running threads follows the CPU spikes PERFECTLY
     • Box has a “long query” killer that gets more aggressive as more threads stack up
     • Should get a look at queries on the server
  • Zoom in to get the exact time interval
  • Know the exact time of a high stack up
     • Go to check Box Anemometer to see what query is there
  • This is the URL for that
     • Can easily paste this to anyone to see the same interactive graph
  • If you prefer text, that’s also an option via API
     • You can build cool tools using the API
        • Week over Week graphs
        • Simplifies anomaly detection
     • URL is pretty simple
        • Effectively just use “q?” and add “&ascii”
  • Get audit log:
     • Logins
     • Types of statements issued
     • Etc.
  • Get performance information about:
     • Row and index change activity
     • Row read activity
  • Generate daily reports of:
     • Are auto-increment columns nearing a boundary on a table?
     • Number of records in a table
     • Size of a datafile for a table
  • Using pt-tcp-model
     • Allows us to identify when server stops doing work
     • 5min interval
  • Aggregate graphs are the default
     • Drill down only when problems in aggregate
  • Aggregates are the default – shift in thinking from looking at specific important servers.
     • Zooming in on a time slice was painfully manual – I wrote up a patch to add mouse-zooming and upstreamed it. This cemented OpenTSDB as a powerful monitoring tool for Box, overnight.
     • Auto-suggest for metrics is spotty – we wrote a quick cron job that dumps the full metric list into JSON.
     • “Graphs aren’t pretty” – a few changes to the base GNUPlot options solved this. There’s also a “Smooth” option in the interface now.
     • Migrating from POC – we had a single-node setup for the longest time until that fell over... a lot.
     • Plan for 3+ machines – it’s enough to run all the needed bits for a light-weight distributed HBase and TSD setup.
     • Data pruning – ~4 bytes per metric before HDFS replication add up quickly.
        • mysql_tcollector: 370 metrics, ~1.5k per server, x 30s interval = ~4.2MB/day
        • Either have a plan to prune old data or build out extra capacity and predict storage needs per server/metric added.
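
The storage figure in the data-pruning note can be reproduced with plain shell arithmetic. This is a rough sketch using only the numbers quoted in the notes (370 metrics, about 4 bytes per data point, one sample every 30 seconds); the function name `bytes_per_day` is hypothetical, and the result excludes HDFS replication and any HBase overhead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough bytes per day for one server, before HDFS replication.
# Usage: bytes_per_day <metrics> <bytes_per_point> <interval_seconds>
bytes_per_day() {
  # 86400 seconds per day divided by the interval gives samples per day.
  echo $(( $1 * $2 * 86400 / $3 ))
}

# Figures from the notes: 370 metrics, ~4 bytes each, 30 s interval.
bytes_per_day 370 4 30   # prints 4262400, i.e. roughly 4.2 MB/day
```

Multiplying the result by the number of servers and by the cluster's HDFS replication factor gives a first estimate of raw capacity needs.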
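
As the notes on slide 15 put it, the text API is effectively the `/q` endpoint with `&ascii` appended. The sketch below builds such a URL from the host, time range, and metric expression shown on the slide; the function name `tsd_ascii_url` is hypothetical, and the commented-out `curl` call assumes a TSD is actually reachable at that address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an OpenTSDB /q URL that returns plain text.
# Usage: tsd_ascii_url <base_url> <start> <end> <metric_expression>
tsd_ascii_url() {
  printf '%s/q?start=%s&end=%s&m=%s&ascii\n' "$1" "$2" "$3" "$4"
}

url=$(tsd_ascii_url http://opentsdb.example.com \
  2013/04/10-07:32:29 2013/04/10-07:57:57 \
  'sum:db.threads_running{host=db22}')
echo "$url"

# Fetch the data (-g stops curl from treating the braces as glob patterns):
# curl -sg "$url"
```

Calling the helper twice with time ranges seven days apart is one simple way to get the two series for a week-over-week comparison.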