
Building a Monitoring Plan.pdf


  1. Building a Monitoring Plan. Paul Ferguson, Senior Consultant, Professional Services, Amazon Web Services; Chris Kozlowski, Senior Technical Account Manager, Amazon Web Services. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. Who we are • Paul Ferguson: Senior Consultant, London • Chris Kozlowski: Senior Technical Account Manager, US East Coast
  3. Buzzword Bingo • Observability • Operational Intelligence • ‘No Ops’ • Composable monitoring • Event correlation • Signal-to-noise ratio • Alarm fatigue • Single pane of glass • E-bonding
  4. What we’ll discuss… • Who needs to be involved, and why • What to monitor • What makes for an effective monitoring rule • What tools to use and when • Metrics, business outcomes, improvements
  5. Operational challenges will always exist… But with proper planning and design, you will be ready for them.
  6. So Why Monitor in the First Place? To gain insights! • Customer Experience • Performance & Cost • Trends • Troubleshooting & Remediation • Learning & Improvement
  7. What Goes Into a Monitoring Plan? • Alerts • System Knowledge • People • Actions • Tools
  8. People
  9. Roles and Responsibilities • Operations: first responders, triage • Developers/Engineers: define normal operation • Management: make business decisions in response to events
  10. Monitoring Plan
  11. System Knowledge
  12. Categories of Insight • Faults • Configuration • Accounting • Performance • Security
  13. Things to Monitor • AWS Foundation Services: Compute, Storage, Database, Networking • AWS Global Infrastructure: Regions, Availability Zones, Edge Locations • Operating Systems • Applications • Databases • Networking • Internet Gateway • Elastic Load Balancer • Web Servers (EC2 w/ Auto Scaling) • RDS
  14. Monitoring Plan
  15. Crafting Alerts
  16. (No slide text.)
  17. Anatomy of an Effective Alert • FCAPS Category: Performance • Amazon CloudWatch • Element: Web Server • Custom Alert: ALARM, site latency >= 2s for 1 minute • Elastic Load Balancing, EC2 Instances, Auto Scaling, EC2 Instance • Runbook • Owner • Test • Action
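The alert on slide 17 can be written down as a small record, which makes it easier to see which parts are filled in. The following is a minimal sketch and not the presenters' implementation: the `Alert` class, its field names, and the owner, runbook, and action values are all hypothetical. The state names mirror the OK / ALARM / INSUFFICIENT_DATA states that CloudWatch alarms use, and the sketch assumes one latency datapoint per minute.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Alert:
    """One alert record. Field names are illustrative, not from the slides."""
    name: str
    fcaps_category: str       # e.g. "Performance"
    element: str              # monitored component, e.g. "Web Server"
    threshold_seconds: float  # latency at or above this value is a breach
    breach_points: int        # consecutive breaching datapoints needed for ALARM
    owner: str
    runbook: str
    action: str

    def state(self, latencies: Sequence[float]) -> str:
        """Evaluate the most recent `breach_points` latency samples."""
        if len(latencies) < self.breach_points:
            return "INSUFFICIENT_DATA"
        recent = latencies[-self.breach_points:]
        if all(value >= self.threshold_seconds for value in recent):
            return "ALARM"
        return "OK"


# Slide 17's example: site latency >= 2s for 1 minute.
site_latency = Alert(
    name="site-latency-high",            # hypothetical name
    fcaps_category="Performance",
    element="Web Server",
    threshold_seconds=2.0,
    breach_points=1,                     # assumes one datapoint per minute
    owner="web-ops",                     # hypothetical owner
    runbook="runbooks/site-latency.md",  # hypothetical path
    action="Page the on-call engineer",  # hypothetical action
)

print(site_latency.state([0.4, 2.0]))
```

Keeping owner, runbook, and action on the same record as the threshold means an alert cannot be defined without saying who handles it and what they do.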
  18. The Drive Towards Achieving Business Insight • Metrics: CPU Wait %, Disk Queue Depth • Operational Outcomes: Webpage Response Time, Job Run Length • Business Insight: Customer Sentiment, SLAs
  19. Alerting Best Practices • Break alert crafting into batches, highest priority first • Refine quickly • Alert to prompt an action • Write descriptive alerts to aid prompt resolution • Don’t rely only on email
  20. Monitoring Plan: System Knowledge • Component: Area • IGW: Faults • ELB: Faults • ELB: Performance • Web Servers: Faults • Web Servers: Performance
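The component and area pairs on slide 20 can be held as plain data and compared against the five categories of insight from slide 12. This is a rough sketch under assumptions: the `uncovered` helper is hypothetical, and the slides do not say that every component needs every category, so the output is only a list of what the plan does not yet mention.

```python
from typing import Dict, List, Tuple

# Categories of insight from slide 12.
FCAPS = ("Faults", "Configuration", "Accounting", "Performance", "Security")

# Component/area rows as shown on slide 20.
PLAN: List[Tuple[str, str]] = [
    ("IGW", "Faults"),
    ("ELB", "Faults"),
    ("ELB", "Performance"),
    ("Web Servers", "Faults"),
    ("Web Servers", "Performance"),
]


def uncovered(plan: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """For each component, list the categories that have no row in the plan."""
    covered: Dict[str, set] = {}
    for component, area in plan:
        covered.setdefault(component, set()).add(area)
    return {
        component: [area for area in FCAPS if area not in areas]
        for component, areas in covered.items()
    }


for component, gaps in uncovered(PLAN).items():
    print(component, "->", ", ".join(gaps))
```

A listing like this is a prompt for discussion, not a verdict: some gaps may be deliberate.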
  21. Tools to use
  22. (No slide text.)
  23. How to select a good tool • Let your requirements dictate your tools • Start with the tools you have • Consider using native tools on the platform • Integrate tools; ergonomics matter
  24. Example Workload • CloudTrail: logging of API calls • AWS Config Rules: configuration • CloudWatch: resources • APM: customer experience / synthetic monitoring
  25. Dashboards
  26. System Knowledge • Component: Area • IGW: Faults • ELB: Faults • ELB: Performance • Web Servers: Faults • Web Servers: Performance
  27. Actions and Improvements
  28. Actions • Every alert and event should end in an action to be taken • Escalations to another person should end with an action to be taken by them • Actions are not only technical; plan for the business decisions that might need to be made • Runbooks and Playbooks
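The rule that every alert should end in an action can be checked mechanically before an alert goes live. The sketch below is one possible approach, not something shown in the deck: the `missing_fields` helper, the dictionary layout, and the sample alerts are hypothetical, and the required fields are an assumed minimum borrowed from the Runbook, Owner, and Action labels on slide 17.

```python
from typing import Dict, List

# Assumed minimum set of fields; borrowed from the labels on slide 17.
REQUIRED_FIELDS = ("owner", "runbook", "action")


def missing_fields(alerts: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each incomplete alert's name to the required fields it lacks."""
    problems: Dict[str, List[str]] = {}
    for alert in alerts:
        missing = [
            field for field in REQUIRED_FIELDS
            if not alert.get(field, "").strip()
        ]
        if missing:
            problems[alert["name"]] = missing
    return problems


# Hypothetical alert definitions for illustration only.
alerts = [
    {
        "name": "site-latency-high",
        "owner": "web-ops",
        "runbook": "runbooks/site-latency.md",
        "action": "Page the on-call engineer",
    },
    {"name": "elb-5xx", "owner": "web-ops", "runbook": "", "action": ""},
]

print(missing_fields(alerts))
```

Running a check like this in review or CI keeps alerts without an owner or an action from reaching production unnoticed.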
  29. Improvement and Automation. Whenever alarms and alerts fire, identify whether they can be improved. • Make the alert more accurate, descriptive, timely • Improve remediation • Establish processes with people first, then automate • Identify routine or standard changes as early candidates for automation
  30. Monitoring Plan: System Knowledge • Component: Area • IGW: Faults • ELB: Faults • ELB: Performance • Web Servers: Faults • Web Servers: Performance
  31. Summary and Next Steps • Check your monitoring approach: Is it user-centric? Are you measuring the right things? • Write a monitoring plan • Start monitoring, test and iterate. The reason operations exists is to support the needs of the business.
  32. Thank you!
