Diese Präsentation wurde erfolgreich gemeldet.
Die SlideShare-Präsentation wird heruntergeladen. ×

Monitoring all Elements of Your Database Operations With Zabbix

Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Wird geladen in …3
×

Hier ansehen

1 von 75 Anzeige

Monitoring all Elements of Your Database Operations With Zabbix

Herunterladen, um offline zu lesen

In depth look into all aspects of Zabbix, from the history and origins of the software to an overview of the latest features, introduced in Zabbix 3.2 .
Presented by the founder and CEO of Zabbix, Alexei Vladishev at Percona Live 2016 Europe.

In depth look into all aspects of Zabbix, from the history and origins of the software to an overview of the latest features, introduced in Zabbix 3.2 .
Presented by the founder and CEO of Zabbix, Alexei Vladishev at Percona Live 2016 Europe.

Anzeige
Anzeige

Weitere Verwandte Inhalte

Diashows für Sie (20)

Ähnlich wie Monitoring all Elements of Your Database Operations With Zabbix (20)

Anzeige

Weitere von Zabbix (20)

Aktuellste (20)

Anzeige

Monitoring all Elements of Your Database Operations With Zabbix

  1. 1. The Universal Open Source Enterprise Level Monitoring Solution Monitoring All Elements of Your Database Operations with Zabbix
  2. 2. Who am I? Alexei Vladishev Founder of Zabbix CEO, Architect and Product Manager Twitter: @avladishev Email: alex@zabbix.com 2
  3. 3. My plan • Introduction to Zabbix • Zabbix capabilities • What’s possible? 3
  4. 4. History of Zabbix
  5. 5. It started with a simple script
  6. 6. What’s wrong with the script? • Hard to maintain and extend • It did not scale well • Provided no advanced problem detection • Any change required script modifications • Etc etc etc
  7. 7. Then I redesigned everything and eventually named it Zabbix…
  8. 8. Zabbix is a universal open source enterprise level monitoring solution
  9. 9. Zabbix team 45 members working full-time Offices in Riga (Headquarters), Tokyo and New-York 9
  10. 10. All levels of infrastructure 10
  11. 11. All platforms and OS All platforms All vendors Any metrics 11
  12. 12. Zabbix Architecture
  13. 13. 13 Data collection
  14. 14. 14 History MySQL or PostgreSQL or Oracle or DB2 Analysis Zabbix server Data collection
  15. 15. 15 History Analysis Data collection Reaction: Alerts Automatic actions Zabbix server WEB UI: Visualization Management
  16. 16. How to install Zabbix Official packages Debian, Ubuntu CentOS, RedHat, Oracle Linux
  17. 17. Database monitoring 17
  18. 18. Production environment Load balancer based on HA Proxy 7 x MySQL nodes running on Linux
  19. 19. What do we need to monitor? OS metrics • CPU, memory and network utilization • Disk IO time • Available disk space Database metrics • Configuration related: max connections, buffer sizes, sync mode • Performance: QPS, query performance, cache hit rate, slow queries, buffer pool usage • Availability: DB is up, connections, log files • Consistency: DB encoding, replication • Security: SSL enabled, opened ports, log files 19
  20. 20. Linux (OS) metrics: Zabbix Agent shell> apt-get install zabbix-agent or shell> rpm -Uvh zabbix-agent * for AIX, Solaris, HP-UX, *BSD, Windows: download pre-compiled from www.zabbix.com 20
  21. 21. Active vs Passive Pull • Service checks • Passive agent • SSH and Telnet Push • Active agent • Zabbix Trapper and SNMP Traps • Monitoring of log files 21
  22. 22. Passive PULL 22 Zabbix Server Zabbix Agent Zabbix Agent Active PUSH MySQL MySQL
  23. 23. Add a new host to Zabbix 23
  24. 24. Create a new host 24
  25. 25. 25
  26. 26. 26
  27. 27. Monitoring MySQL metrics show global status … select * from sys.* select * from performance_schema mysql log file
  28. 28. 28 /etc/zabbix/agent/zabbix_mysql.conf: /etc/zabbix/zabbix_agentd.conf:
  29. 29. 29
  30. 30. 30
  31. 31. 31 Template App MySQL
  32. 32. 32
  33. 33. Graphs 33
  34. 34. Maps 34 Any data
  35. 35. 35 Aggregated metrics Calculated metrics Buffer pool disk read percentage: 100 * Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests Number of QPS per cluster: grpsum[“DB Cluster","mysql.status[Questions]",last] Buffer pool utilization: 100 * (Innodb_buffer_pool_pages_total - Innodb_buffer_pool_pages_free) / Innodb_buffer_pool_pages_total
  36. 36. 36 Also calculated Aggregated metrics for a group of hosts
  37. 37. Custom dashboards 37
  38. 38. How to detect problems in this data flow? 38
  39. 39. Triggers! 39
  40. 40. Trigger is а problem definition 40
  41. 41. {server:mysql.status[Questions].last()} > 5000 41 MySQL server is overloaded Tags Datacenter: AM2 Env: Production Service: DB Cluster
  42. 42. Triggers Example {server:mysql.status[Com_update].last()} / {server:mysql.status[Questions].last()} > 0.1 Operators - + / * < > = <> <= >= or and not Functions min max avg last count date time diff regexp and much more! 42
  43. 43. {MySQL_001:mysql.status[Questions].last()} > 5000
 and
 {MySQL_002:mysql.status[Questions].last()} > 5000
 and
 {MySQL_003:mysql.status[Questions].last()} > 5000 43 Analyse everything: any metric and any hosts Cluster is overloaded
  44. 44. Problem view 44
  45. 45. 45
  46. 46. 46
  47. 47. 47
  48. 48. Problems 48
  49. 49. Performance: MySQL is overloaded {MySQL_001:mysql.status[Questions].last()} > 5000 Availability: MySQL is not available {MySQL_001:mysql.ping.last()} = 0 Junior level 49
  50. 50. Flapping! 50 0 2500 5000 7500 10000 10:00 10:05 10:10 10:15 10:20 10:25 10:30 10:35 10:40 10:45 10:50 {MySQL_001:mysql.status[Questions].last()} > 5000
  51. 51. Too sensitive leads to false positives 51
  52. 52. How to get rid of false positives? 52
  53. 53. Properly define problem conditions and think carefully! MySQL is overloaded MySQL is not available running out of disk space 53 What really means ?
  54. 54. Take advantage of history MySQL is overloaded {MySQL_001:mysql.status[Questions].min(10m)} > 5000 MySQL node is not available {MySQL_001:mysql.ping.max(#3)} = 0 54
  55. 55. Problem disappeared != problem is resolved 55
  56. 56. A few examples Problem: Queries per second > 5000
 Now: 4999 Resolved? Problem: Disk space < 10%
 Now: 9.95% Resolved? Problem: MySQL is not available
 Now: last check returned Up Resolved? 56
  57. 57. A few examples Problem: Queries per second > 5000
 Now: 4999 Resolved? Problem: Disk space < 10%
 Now: 10.05% Resolved? Problem: MySQL is not available
 Now: last check returned Up Resolved? 57
  58. 58. A few examples Problem: Queries per second > 5000
 Now: 4999 Resolved? Problem: Disk space < 10%
 Now: 10.05% Resolved? Problem: MySQL is not available
 Now: last check returned Up Resolved? 58
  59. 59. Different conditions for problem and recovery Before: {MySQL_001:mysql.status[Questions].last()} > 5000 Better alternative: Problem: {MySQL_001:mysql.status[Questions].last()} > 5000 Recovery: {MySQL_001:mysql.status[Questions].last()} < 3000 59
  60. 60. Several examples System is overloaded Problem: {MySQL_001:mysql.status[Questions].min(2m)} > 5000
 Recovery: {MySQL_001:mysql.status[Questions].max(10m)} < 3000 MySQL server is not available
 
 Problem: {MySQL_001:mysql.ping.max(#3)} = 0
 Recovery: {MySQL_001:mysql.ping.min(#10)} = 1 60
  61. 61. No flapping. No false positives. Suddenly we trust our monitoring! 61
  62. 62. Anomaly detection Compare with a norm, where norm is system state in the past. Average number of queries per second for the last hour is 2x less than number of queries per second for the same period week ago {HA Proxy:Questions.avg(1h)} < 2 * {HA Proxy:Questions.avg(1h, 7d)} 62
  63. 63. Problem forecasting 63
  64. 64. Problem Forecasting 64 0 12,5 25 37,5 50 7:00 8:00 9:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 18:00 19:00 20:00 21:00 y = -2,9455x + 48,309 Problem 10% Trigger function timeleft() ??? hours
  65. 65. Trend Prediction 65 0 12,5 25 37,5 50 7:00 8:00 9:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 18:00 19:00 20:00 21:00 y = -2,9455x + 48,309 ??? % 4 hours Trigger function forecast()
  66. 66. How Zabbix reacts on problems? 66
  67. 67. 67
  68. 68. Possible reactions • Automatic problem resolution • Sending alerts to user and user group • Opening tickets in Helpdesk systems • Unlimited number of possible reactions 68
  69. 69. Escalate! 69 MySQL Cluster is down Repeated Email SMS and ticket in Helpdesk system Restart HA Proxy SMS to manager 5 min 10 min 15 min 20 min 0 min
  70. 70. Why Zabbix? 70
  71. 71. 71 All-in-one solution Trend prediction Data collection Problem detection Automatic actions Agent based monitoring Encryption Anomaly detection Maintenance Event correlation Scalability Visualization Auto discovery Trigger dependencies Centralized management Service checks IoT and embedded Distributed monitoring Zabbix API Alerting Escalations User permissions Integration with AD, OpenLDAP LLD Agent-less monitoring … and more
  72. 72. Focus on quality and ease of maintenance 72 All components are compatible within one major release Virtually no third party dependencies Zabbix Agents are backward compatible since Zabbix 1.0!
  73. 73. Benefits of Zabbix Free and Open Source Software Extremely flexible Easy to adopt, use commercial services if needed No License Fees Extremely low TCO No vendor lock in 73
  74. 74. share.zabbix.com
  75. 75. The Universal Open Source Enterprise Level Monitoring Solution Thank you! Twitter: @avladishev Email: alex@zabbix.com Learn more at Zabbix booth or www.zabbix.com

×