That's not a metric! Data for cloud-native success

“Without data, you’re just another person with an opinion.” W. Edwards Deming was talking about statistical quality control in manufacturing, but he could equally have been referring to managing modern iterative, automated software deployment pipelines and cloud-native infrastructure. There is certainly a wealth of open source tools to capture and visualize data. However, a data strategy isn’t solely, or even mostly, about drawing up a long list of technical measurements and instrumenting software to capture everything.

It's crucial to distinguish between metrics that relate software initiatives to positive business outcomes, the alerts needed to respond to problems now, and the data required for root cause analysis or to optimize processes over time. Not all data is equal, and most data is not a metric for measuring success.

Published in: Software

  1. THAT’S NOT A METRIC! DATA FOR CLOUD-NATIVE SUCCESS. Gordon Haff, Technology Evangelist, Red Hat. LC3 China 2017. @ghaff
  3. “Without data you’re just a person with an opinion.” - W. Edwards Deming
  4. “Implicit in the phrase ‘big data,’ as well as the concept of data as gold, is that more is better. But in the case of analytics, a legitimate question worth considering: Is more data really better?” - Bob O’Donnell
  5. “You can’t pick your data, but you must pick your metrics.” - Jeff Bladt and Bob Filbin
  6. “A familiar phrase on the turf is ‘horses for courses.’” - Unknown British writer, 1898
  7. “Human beings adjust behavior based on the metrics they’re held against. Anything you measure will impel a person to optimize his score on that metric. What you measure is what you’ll get. Period.”
  8. THE PRINCIPLES ● You need to measure ● You need to choose relevant metrics ● Quantity may not lead to quality ● Different measurements serve different purposes ● Measurements drive behaviors
  9. LENSES
  10. AUDIENCE ● BUSINESS: Customer satisfaction, Shopping cart abandons, Employee turnover ● OPERATIONS: Cluster health, Utilization, Outages ● DEVELOPERS: “Productivity”, Test coverage, Time to deploy
  11. PEOPLE, PROCESS, AND TECHNOLOGY ● PEOPLE: Turnover, Capability, Response time ● PROCESS: Effectiveness, Efficiency, Deployment frequency ● TECHNOLOGY: Performance, Failure rate, Uptime (Hat tip to Chris Riley on DevOps.com)
  12. FUNCTIONAL GOALS (NEW RELIC) ● BUSINESS SUCCESS: Churn, Conversion rates, Avg revenue per user ● CUSTOMER EXPERIENCE: Customer satisfaction, Frequency of visits, A/B test results ● APPLICATION PERFORMANCE: Application response, Database query time, Uptime ● SPEED: Lead time for changes, Code release frequency, Mean time to resolution ● QUALITY: Deployment success rate, Incident severity, Outstanding bugs
  13. DATA
  14. 4 RULES FOR DATA ● Instrument (many/most of) the things ● Root cause analysis (reactive) ● Detect patterns/trends (proactive) ● Context and distributions matter
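The point that distributions matter can be illustrated with a small sketch. The latency numbers below are invented for illustration and are not from the talk; the `percentile` helper is a hypothetical nearest-rank implementation. The same data looks healthy by median, mediocre by mean, and alarming at the 99th percentile.

```python
import statistics


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Invented sample: 95 fast requests and 5 very slow ones (milliseconds).
latencies_ms = [100] * 95 + [2000] * 5

mean_ms = statistics.mean(latencies_ms)      # pulled upward by the slow tail
median_ms = statistics.median(latencies_ms)  # hides the slow tail entirely
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)        # exposes the slow tail

print(mean_ms, median_ms, p95_ms, p99_ms)
```

A single average would report 195 ms here, a value that no actual request experienced.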
  15. WHAT DO WE MEASURE AND STORE? ● Most things ● Unexamined data has negative ROI ● General trend toward keeping data “forever”. “Give it two years and everything will be stored.” - Harel Kodesh, GE Digital CTO. 300GB of data per engine per flight.
  16. SOME DIRECTIONS ● Increased use of statistics and machine learning (eyeballing dashboards doesn’t scale) ● Better understand how data interacts (latency affects page load affects customer conversion affects revenue) ● Context (seasonal patterns are OK) ● Bottom line: Find patterns that don't conform to expected behavior (anomalies 101)
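As a minimal sketch of "find patterns that don't conform to expected behavior" while treating seasonal patterns as normal, the code below keeps a separate baseline per hour of day and flags a value only when it is far from the baseline for that hour. The function names, the three-standard-deviation threshold, and the traffic numbers are all assumptions for illustration; the talk does not prescribe a specific technique.

```python
import statistics
from collections import defaultdict


def build_baseline(history):
    """Map each hour of day to the (mean, population stdev) of its values."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {
        hour: (statistics.mean(values), statistics.pstdev(values))
        for hour, values in by_hour.items()
    }


def is_anomaly(baseline, hour, value, threshold=3.0):
    """Flag values more than `threshold` stdevs from that hour's mean."""
    mean, stdev = baseline[hour]
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


# Invented requests-per-minute history: quiet at 03:00, busy at 12:00.
history = (
    [(3, v) for v in (100, 110, 90, 105, 95)]
    + [(12, v) for v in (1000, 1100, 900, 1050, 950)]
)
baseline = build_baseline(history)

print(is_anomaly(baseline, 12, 1050))  # normal lunchtime peak
print(is_anomaly(baseline, 3, 1050))   # same value at 03:00 is unexpected
```

The same reading of 1050 is routine at noon and anomalous at night, which is the kind of context a single static threshold cannot express.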
  17. LOGGING: EFK STACK ● Elasticsearch, Fluentd, Kibana ● Collect, index, search, and visualize log data ● Good for ad hoc analytics ● Good for post-mortem forensics because of extensive log information ● Fluentd can serve as an integration point between cloud-native software like Kubernetes and Prometheus
  18. MONITORING: PROMETHEUS ● Time series data model: each series is identified by a metric name and key/value pairs ● Collection happens via a pull model over HTTP ● Values reliability, even under failure conditions, over 100% accuracy ● Most associated with web-scale DevSecOps
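To make the data model concrete, here is a small sketch that renders one sample in the Prometheus text exposition format: a metric name, key/value label pairs in braces, then the value. The `format_sample` helper and the example metric are hypothetical and written only for illustration; a real service would normally use an official client library and expose such lines on an HTTP endpoint for the Prometheus server to scrape.

```python
def format_sample(name, labels, value):
    """Render one sample as a line of Prometheus text exposition format."""

    def escape(text):
        # Label values escape backslash, double quote, and newline.
        return (
            text.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )

    if not labels:
        return f"{name} {value}"
    pairs = ",".join(
        f'{key}="{escape(str(val))}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


# Hypothetical metric: the name plus its labels identifies one time series.
line = format_sample(
    "http_requests_total", {"method": "GET", "status": "200"}, 1027
)
print(line)
```

Changing any label value, such as `status="500"`, yields a different time series under the same metric name.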
  19. MONITORING: HAWKULAR ● REST API to store and retrieve availability, counter, and gauge measurements ● Visualization and alerting ● Application performance management ● Integration with ManageIQ (cloud mgmt) ● Most associated with large scale central IT teams with lots of apps
  20. ALARMS
  21. 4 RULES FOR ALARMS ● Exciting, not routine ● Something needs to be fixed. Now. ● No ambers! ● Must reach the right people
  23. WHICH OF THE FOLLOWING SHOULD WAKE UP AN EXPENSIVE ENGINEER AT 2AM? A: Based on current trends, we need to add additional capacity within 2 weeks B: A hardware failure led to a successful cluster failover C: Response time has increased by 20% D: Our customer support site is down because of an AWS-East outage
  24. D: Our customer support site is down because of an AWS-East outage
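The quiz can be expressed as a tiny triage rule. This is only a sketch: the `Event` fields and the way each scenario is classified are assumptions made for illustration (for instance, that the 20% slowdown in C is still within acceptable thresholds). It follows the "no ambers" rule by returning exactly one of two outcomes.

```python
from dataclasses import dataclass


@dataclass
class Event:
    label: str
    service_down: bool        # are users currently unable to use the service?
    needs_action_now: bool    # would waiting until morning make things worse?


def triage(event):
    """Return "page" or "record"; there is deliberately no middle state."""
    if event.service_down and event.needs_action_now:
        return "page"
    return "record"


# Classifications below are assumptions, not statements from the talk.
EVENTS = [
    Event("A: capacity needed within 2 weeks", False, False),
    Event("B: hardware failure, failover succeeded", False, False),
    Event("C: response time up 20%", False, False),
    Event("D: customer support site down", True, True),
]

decisions = {event.label[0]: triage(event) for event in EVENTS}
print(decisions)
```

Everything that is merely recorded remains available as data for trend analysis and root cause work during working hours.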
  26. 4 RULES FOR METRICS ● What’s important to you? (Success criteria) ● Tied to business outcomes ● Traceable to root cause(s) ● Not too many!
  27. SELECTED PAYPAL METRICS (WHAT) ● % of failed deployments ● Customer ticket volume ● Response time ● Deployment frequency ● Change volume
  28. SELECTED PAYPAL METRICS (WHAT: WHY) ● % of failed deployments: Dysfunction in deployment pipeline ● Customer ticket volume: Basic customer satisfaction measure ● Response time: Service operating within thresholds ● Deployment frequency: Faster iterations for new code ● Change volume: User stories/new lines of code
  29. PUPPET LABS METRICS ● Deployment (or change) frequency ● Change lead time ● Change failure rate ● Mean Time to Recover
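The four metrics on the Puppet Labs slide can be derived from a plain list of deployment records. The sketch below assumes a hypothetical `Deployment` record with commit, deploy, and recovery timestamps, and uses invented sample data; real pipelines will define these events differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None


def summarize(deployments, period_days):
    """Compute the four delivery metrics over a reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    count = len(deployments)
    failures = [d for d in deployments if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    recoveries = [
        d.recovered_at - d.deployed_at for d in failures if d.recovered_at
    ]
    return {
        "deploys_per_day": count / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / count,
        "change_failure_rate": len(failures) / count,
        "mean_time_to_recover": (
            sum(recoveries, timedelta()) / len(recoveries)
            if recoveries
            else None
        ),
    }


# Invented sample: four deployments over two days, one of which failed.
sample = [
    Deployment(datetime(2017, 1, 1, 9), datetime(2017, 1, 1, 11)),
    Deployment(
        datetime(2017, 1, 1, 12),
        datetime(2017, 1, 1, 16),
        failed=True,
        recovered_at=datetime(2017, 1, 1, 16, 30),
    ),
    Deployment(datetime(2017, 1, 2, 9), datetime(2017, 1, 2, 10)),
    Deployment(datetime(2017, 1, 2, 13), datetime(2017, 1, 2, 14)),
]
report = summarize(sample, period_days=2)
print(report)
```

Keeping the raw records and deriving the metrics from them preserves the link from each number back to the individual changes behind it.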
  30. RED HAT OPENSHIFT ONLINE METRICS ● Number of applications ● Efficiency (cost) ● Response time (various measures) ● Uptime
  31. GARTNER: DEVOPS METRICS (Source: Gartner, “Data-Driven DevOps: Use Metrics to Help Guide Your Journey,” May 2014)
  33. ANTI-PATTERN WARNING SIGNS ● Easy to collect but don’t really mean anything ● Drive lack of cooperation ● Not observable or not actionable ● Not aligned with business objectives
  34. WHAT MATTERS TO YOU? What do you want to optimize for? Customers, cost, speed…?
  35. SUMMARY
  36. ● Measurements matter ● They’re not metrics ● Metrics are about your success factors ● Do you need to wake someone up? ● New open source tooling (but early)
  37. THANK YOU ● plus.google.com/+RedHat ● linkedin.com/company/red-hat ● youtube.com/user/RedHatVideos ● facebook.com/redhatinc ● twitter.com/RedHatNews
  38. CREDITS ● Lens porn: Ash, https://www.flickr.com/photos/neothermic/3485301339 ● Piggy bank: https://www.flickr.com/photos/marcmos/3644751092 ● Horse racing: https://www.flickr.com/photos/rogerbarker/2881596967 ● Report card: https://www.flickr.com/photos/richardgiles/3835758300 ● Traffic light: https://www.flickr.com/photos/96dpi/3124912138/ ● Air traffic: NATS - UK air traffic control ● Sleeping: https://www.flickr.com/photos/barkbud/4126277314/