Barry Laffoy – Senior DevOps Engineer
Scaling a Monitoring Strategy for a Microservices Architecture
Thanks to Our Sponsors
Join Our Community Slack Channel: http://community.kloia.co.uk
Monitoring in a Microservices Environment
Or how to scale your alerting strategy with your team and application
Who Am I?
- Why should you listen to me?
- Physics
- Actuarial science
- Build engineering
- Experience building and maintaining Excel and Jenkins
- DevOps at ClearScore
Who Are ClearScore?
- Aim to solve money for the world
- Present people their data in a beautiful way, to empower financial decision making
- Committed to best-in-class technical solutions
- Committed to having fun while we do it
1. What’s the point?
What Are Microservices?
Why would we want them?
- 12-factor app
- Scalable
- Releasable
- Loggable
- Discoverable
- Monitorable
- And several more “-ables”
- We handle:
  - Scaling
  - Releasing
  - Logging
  - Discovering
  - Monitoring
- Released independently
- Empower ownership
- Distribute risk
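Being "loggable" in the 12-factor sense means a service writes its logs as an event stream to stdout and leaves collection to the platform, which is what lets the platform handle logging for every service. A minimal sketch, assuming one JSON object per line is the agreed format; the field names (`service`, `level`, `msg`) and the service name are hypothetical, not ClearScore's actual schema:

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (hypothetical field names)."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),  # local time
            "service": self.service,
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),  # applies %-style args
        }
        return json.dumps(payload)


def make_logger(service: str, stream=sys.stdout) -> logging.Logger:
    """Build a logger that writes JSON lines to stdout; the platform ships them onwards."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # idempotent if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("credit-report")  # hypothetical service name
    log.info("report generated in %d ms", 42)
```

Because the service never opens log files or talks to Elasticsearch directly, the collection pipeline can change without touching any service.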
More Releases
More Stability
CALMS
- Culture
- Automation
- Lean
- Measurement (monitoring)
- Sharing
Business Objectives
- Autonomous cross-functional teams
- Increased releases
- Deliver features more quickly with less risk
- Uptime
The Platform
- Drive developer ownership
- Not just a scheduler (Nomad/k8s/etc.)
- CI/CD
- Local dev tools
- Cloud-native resources
- Logging
- Metrics
- Monitoring
- Alerting
The Platform
- Amazon Web Services
- Immutable infrastructure (Packer)
- Infrastructure as code (Terraform)
- Service discovery (Consul)
- Scheduling (Nomad)
- CI/CD (Jenkins)
2. How Hard Is Monitoring?
Application Performance Monitoring
Traditional APM
- Worked great for bare-metal deployment of a single Java app
- Tracing
- Alerts
- Health dashboards
Not So Great for Microservices
- Instrumented inside the container (not 12-factor)
- Paying for a license per process (not scalable)
- Manual configuration of alerting rules
- Limited language support
- Tracing from service to service is very difficult
- Alerting on “abnormal traffic” is limited by a simple statistical model
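To make the last point concrete, here is a sketch of the kind of simple statistical model meant: flag a data point when it sits more than a few standard deviations from the mean of a trailing window. This rolling z-score is an illustrative assumption, not the algorithm any particular APM vendor uses; the window size and threshold are arbitrary.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_alerts(series: Sequence[float], window: int = 10, threshold: float = 3.0) -> List[int]:
    """Return indices whose value deviates more than `threshold` standard
    deviations from the mean of the preceding `window` points."""
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as abnormal.
            if series[i] != mu:
                alerts.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts


# Requests per minute: steady traffic, then a spike at the end.
traffic = [100, 102, 98, 101, 99, 100, 102, 98, 101, 99, 300]
print(zscore_alerts(traffic))
```

The weakness is visible in the model itself: it has no notion of daily or weekly seasonality, so a perfectly normal morning ramp-up looks exactly like the spike above, and every such false positive feeds alert fatigue.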
3. How to move forward?
Tools, Tools, Tools
- Pingdom
- Liveness probes
- CloudWatch
- Elasticsearch
- StatsD, InfluxDB, Grafana
- Next-gen APM: Instana
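Custom application metrics reach StatsD as small plain-text UDP datagrams of the form `name:value|type`. A minimal stdlib sketch of that line protocol, for illustration only: the host, port 8125 (the usual StatsD default), prefix and metric names are assumptions, and in practice an existing StatsD client library would normally be used.

```python
import socket

# This sketch only supports counters (c), timers (ms) and gauges (g).
SUPPORTED_TYPES = {"c", "ms", "g"}


def format_statsd(name: str, value: float, metric_type: str, sample_rate: float = 1.0) -> str:
    """Build one StatsD line, e.g. 'api.requests:1|c' or 'api.requests:1|c|@0.1'."""
    if metric_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type}")
    line = f"{name}:{value:g}|{metric_type}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate:g}"
    return line


class StatsdClient:
    """Fire-and-forget UDP client (hypothetical defaults: localhost agent on port 8125)."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125, prefix: str = "") -> None:
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, line: str) -> None:
        try:
            self.sock.sendto(line.encode("ascii"), self.addr)
        except OSError:
            pass  # losing a metric must never take the service down

    def incr(self, name: str, count: int = 1) -> None:
        self._send(format_statsd(self.prefix + name, count, "c"))

    def timing(self, name: str, ms: float) -> None:
        self._send(format_statsd(self.prefix + name, ms, "ms"))

    def gauge(self, name: str, value: float) -> None:
        self._send(format_statsd(self.prefix + name, value, "g"))


if __name__ == "__main__":
    metrics = StatsdClient(prefix="credit-report.")  # hypothetical service prefix
    metrics.incr("requests")
    metrics.timing("render_time", 12.5)
```

UDP keeps instrumentation cheap and non-blocking, at the cost of silently dropping metrics when the agent is down.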
Third-Party Services
- Partner integrations
- Flaky
- Knock-on effects
Off the Shelf
- External synthetics with Pingdom
- Container security scanning with quay.io
- Dependency security scanning with Maven/npm
- AMI security scanning with Inspector
- Performance monitoring as part of the CI pipeline
- Internal synthetics with consul-alerts/liveness-readiness probes
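Internal synthetics and liveness/readiness probes both come down to an HTTP endpoint that answers 200 when healthy. A minimal stdlib sketch, assuming hypothetical paths `/health/live` and `/health/ready` and port 8080; the real paths, port and readiness criteria would be whatever the Consul check or Kubernetes probe definitions point at. `probe()` stands in for the checker.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthState:
    """Readiness flag the application sets once its dependencies are reachable."""

    def __init__(self) -> None:
        self.ready = threading.Event()


def make_handler(state: HealthState):
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path == "/health/live":
                # Liveness: the process is up and can still serve HTTP.
                self._reply(200, b"alive")
            elif self.path == "/health/ready":
                # Readiness: only take traffic once dependencies are available.
                if state.ready.is_set():
                    self._reply(200, b"ready")
                else:
                    self._reply(503, b"not ready")
            else:
                self._reply(404, b"not found")

        def _reply(self, status: int, body: bytes) -> None:
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args) -> None:
            pass  # keep probe traffic out of the application logs

    return HealthHandler


def start_health_server(state: HealthState, host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Serve the health endpoints on a background thread (port 0 picks a free port)."""
    server = ThreadingHTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Bypass any HTTP proxy configured in the environment: checks talk to the service directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code, which is essentially what an HTTP health check inspects."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Keeping liveness and readiness separate matters: a service waiting on a slow dependency should be taken out of rotation (readiness), not restarted (liveness).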
Highly Customizable
- Cloud native with CloudWatch
- Annotating releases in Grafana
- Self-managed with StatsD
- Infrastructure metrics
- Custom application metrics
- Third-party integration monitoring
- Alerting rules are “all or nothing”
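Release annotations can be created from the deployment job through Grafana's annotations HTTP API (`POST /api/annotations`). A sketch that only builds the request; the Grafana URL, API token, service name and tag scheme are hypothetical. Leaving out `dashboardUID` and `panelId` makes the annotation organisation-wide, so any dashboard can display it via an annotation query filtered by tag.

```python
import json
import time
import urllib.request
from typing import List, Optional


def build_release_annotation(service: str, version: str, tags: Optional[List[str]] = None,
                             when_ms: Optional[int] = None) -> dict:
    """Body for POST /api/annotations; 'time' is epoch milliseconds."""
    return {
        "time": when_ms if when_ms is not None else int(time.time() * 1000),
        "tags": ["release", service] + (tags or []),  # hypothetical tag scheme
        "text": f"Released {service} {version}",
    }


def annotation_request(grafana_url: str, api_token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP request."""
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/annotations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_token}"},
        method="POST",
    )
```

A CI/CD step would then send it with `urllib.request.urlopen(annotation_request("https://grafana.example.com", token, build_release_annotation("accounts", "1.4.2")))`, so every latency or error-rate graph shows a marker at the moment of each release.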
4. What about APM?
What Do We Need?
- Light-touch configuration
- Scalable deployment model
- Automatic service discovery
- Sensible default alerting rules
- Flexible configuration
- Tracing between asynchronous services
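Tracing between asynchronous services only works if the trace context travels inside the message itself, for example as a message header, rather than only on synchronous HTTP calls. A sketch using the W3C Trace Context `traceparent` header format (`version-traceid-parentid-flags`) as the propagation convention; that choice, the message shape and the function names are assumptions, since the slides don't say what format ClearScore or Instana used.

```python
import re
import secrets
from typing import Dict, Optional

# This sketch only accepts traceparent version "00" with lowercase hex fields.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(value: str) -> Optional[Dict[str, str]]:
    """Return the trace context fields, or None if the header is invalid."""
    m = _TRACEPARENT.match(value.strip())
    if not m or m.group(1) == "0" * 32 or m.group(2) == "0" * 16:
        return None  # all-zero trace or parent IDs are invalid
    return {"trace_id": m.group(1), "parent_id": m.group(2), "flags": m.group(3)}


def child_traceparent(parent: Optional[str]) -> str:
    """Continue the caller's trace with a fresh span ID, or start a new sampled trace."""
    ctx = parse_traceparent(parent) if parent else None
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    if ctx is None:
        return f"00-{secrets.token_hex(16)}-{span_id}-01"
    return f"00-{ctx['trace_id']}-{span_id}-{ctx['flags']}"


def publish(body: dict, incoming_traceparent: Optional[str]) -> dict:
    """Producer: embed trace context in the message headers (hypothetical message shape)."""
    return {"headers": {"traceparent": child_traceparent(incoming_traceparent)}, "body": body}


def consume(message: dict) -> str:
    """Consumer: derive its own span from whatever the producer put on the message."""
    return child_traceparent(message.get("headers", {}).get("traceparent"))
```

However long the message sits on the queue, the consumer's span carries the same `trace_id` as the original request, which is what lets a tracing backend stitch the two halves together.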
Roll Our Own
- Already collecting StatsD metrics
- Means writing and supporting a lot of logic
- Using some sort of ML?
Traditional Vendors
- Poor support for distributed microservices
- Poor language support (Scala/Akka)
- Mixed results on configurability
Enter Instana
- Discovered quite by accident
- Beautiful UI
- Extremely easy to set up
- Covered most of our desired features out of the box:
  - Infrastructure monitoring
  - Microservice APM
  - End-user monitoring
5. Culture of Ownership
(It’s not just about tools)
You Build It, You Run It!
- Delivery teams own their microservices
- Responsible for performance and monitoring in dev/CI/staging environments
- Ideally, incidents alert the responsible dev team
- Unfortunately, we don’t quite do that
- Sophisticated routing system <picture of me>
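Replacing the human "routing system" starts with a machine-readable mapping from service to owning team. A sketch under stated assumptions: the `OWNERS` registry, channel names and environment labels are hypothetical (in practice ownership might be kept in service metadata such as Consul tags or Kubernetes labels). Non-prod alerts go only to the owning team; prod alerts also reach on-call, who support rather than own the fix.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical ownership registry: service name -> team alert channel.
OWNERS: Dict[str, str] = {
    "credit-report": "#team-reports",
    "offers": "#team-offers",
}
ONCALL_CHANNEL = "#platform-oncall"  # hypothetical


@dataclass
class Alert:
    service: str
    environment: str  # e.g. "dev", "ci", "stg", "prod"
    summary: str


def route(alert: Alert, owners: Dict[str, str] = OWNERS) -> List[str]:
    """Return the channels an alert should be delivered to."""
    team = owners.get(alert.service)
    if team is None:
        # Unowned service: on-call triages and finds (or assigns) an owner.
        return [ONCALL_CHANNEL]
    if alert.environment == "prod":
        # Production: the owning team acts, on-call supports.
        return [team, ONCALL_CHANNEL]
    # Non-prod: only the owning team needs to know.
    return [team]


if __name__ == "__main__":
    print(route(Alert("offers", "stg", "p95 latency above threshold")))
```

An alert for a service missing from the registry is itself a useful signal: it points at a gap in ownership rather than a gap in monitoring.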
People Cause Problems
- Things go wrong when people change things
- Luckily, this means things go wrong during business hours (mostly)
- Everyone is empowered to inspect monitoring tools
- The on-call team supports problem resolution; it doesn’t fix everything
- Understanding teams and services drives platform improvement
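If most incidents follow a change, the first diagnostic question for any alert is "what was deployed just before this?". A sketch of that lookup, assuming deployment events are recorded somewhere queryable (for example by the CI/CD job); the `Deployment` record and the 30-minute lookback are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Deployment(NamedTuple):
    service: str
    version: str
    at: datetime


def suspect_changes(alert_at: datetime, deployments: List[Deployment],
                    lookback: timedelta = timedelta(minutes=30)) -> List[Deployment]:
    """Deployments within `lookback` before the alert fired, most recent first."""
    window_start = alert_at - lookback
    recent = [d for d in deployments if window_start <= d.at <= alert_at]
    return sorted(recent, key=lambda d: d.at, reverse=True)
```

Attaching this list to the alert message gives whoever is looking at it a concrete starting point, and it naturally points at the team that made the change.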
Alert Grooming
- Lots of noise on alert channels
- Alert fatigue
- “Boy who cried wolf” syndrome
- Requires proactive maintenance of alerts
- Fix ALL annoying alerts, even if that means fixing the alert, not the underlying service
- The investment takes time, but pays dividends in productivity
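Grooming goes faster when you know which rules generate the most noise. A sketch that ranks alert rules by how often they fire without anyone acting on them, assuming alert history can be exported as `(rule_name, was_actioned)` pairs; what counts as "actioned" (acknowledged, linked to a fix) is a hypothetical signal you would have to define.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def noisiest_alerts(events: Iterable[Tuple[str, bool]], top: int = 5) -> List[Tuple[str, int, float]]:
    """Return (rule, ignored_count, ignored_fraction) for the noisiest rules, worst first."""
    fired: Counter = Counter()
    ignored: Counter = Counter()
    for rule, actioned in events:
        fired[rule] += 1
        if not actioned:
            ignored[rule] += 1
    ranked = [(rule, ignored[rule], ignored[rule] / count) for rule, count in fired.items()]
    # Most ignored firings first; break ties by the share of firings ignored.
    ranked.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return ranked[:top]
```

A rule at the top of this list with an ignored fraction near 1.0 is a candidate to re-threshold, re-route or delete, which is the "fix the alert, not the service" case.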
Major Incidents
- Zero-blame retros
- Involve stakeholders
- Generate action points with owners (and follow up)
- Detailed incident report with business-friendly summaries and cost estimates
6. Our Platform Future
Replatforming
- HashiCorp platform
  - Great choice to get us to the cloud
  - Focused on supporting zillions of containers in an HPC environment
  - Limiting our scalability and speed of delivery
  - Encouraged the anti-pattern of integrating platform details into services
- Kubernetes migration
  - Solves many of our problems
  - Natively supports blue-green deployments
  - Instana support for cluster health monitoring
  - Prometheus on-cluster monitoring
  - What to do with our StatsD?
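One answer to "what to do with our StatsD?" is to translate existing StatsD metrics into the text format Prometheus scrapes; the Prometheus project's `statsd_exporter` does this properly. Purely to illustrate the mapping, a toy sketch that aggregates counter and gauge lines: dots become underscores and counters get the conventional `_total` suffix. Timers, sets, sample rates and relative gauge updates are ignored here.

```python
import re
from typing import Dict, Iterable

_INVALID = re.compile(r"[^a-zA-Z0-9_:]")


def prom_name(statsd_name: str) -> str:
    """Map 'api.requests' to a legal Prometheus metric name 'api_requests'."""
    name = _INVALID.sub("_", statsd_name)
    return "_" + name if name[0].isdigit() else name


def statsd_to_prometheus(lines: Iterable[str]) -> str:
    """Aggregate StatsD counter (c) and gauge (g) lines into Prometheus text format."""
    counters: Dict[str, float] = {}
    gauges: Dict[str, float] = {}
    for line in lines:
        try:
            name_value, metric_type = line.strip().split("|")[:2]
            name, value = name_value.rsplit(":", 1)
            num = float(value)
        except ValueError:
            continue  # skip malformed lines rather than failing the scrape
        if not name:
            continue
        if metric_type == "c":
            key = prom_name(name) + "_total"
            counters[key] = counters.get(key, 0.0) + num
        elif metric_type == "g":
            gauges[prom_name(name)] = num  # last value wins
    out = []
    for key in sorted(counters):
        out += [f"# TYPE {key} counter", f"{key} {counters[key]:g}"]
    for key in sorted(gauges):
        out += [f"# TYPE {key} gauge", f"{key} {gauges[key]:g}"]
    return "\n".join(out) + "\n"
```

Bridging like this lets services keep their existing StatsD instrumentation during the Kubernetes migration while dashboards and alerts move to Prometheus.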
Continuous Deployment 2.0
- Investigating CD platforms (Spinnaker/Concourse/Drone)
- Routing non-prod alerts to development teams
- Performance, tracing, and vulnerability issues should be flagged
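Flagging performance issues in the pipeline can start as simply as comparing a latency percentile from a load test of the candidate build against the last release. A sketch assuming the pipeline already collects per-request latencies for both builds; the nearest-rank p95 and the 10% tolerance are arbitrary illustrative choices.

```python
import math
from typing import List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_regressions(baseline_ms: Sequence[float], candidate_ms: Sequence[float],
                        pct: float = 95.0, tolerance: float = 0.10) -> List[str]:
    """Return findings to flag in the pipeline; an empty list means no regression."""
    base = percentile(baseline_ms, pct)
    cand = percentile(candidate_ms, pct)
    findings = []
    if cand > base * (1 + tolerance):
        findings.append(
            f"p{pct:g} latency regressed: {base:g} ms -> {cand:g} ms (> {tolerance:.0%} tolerance)"
        )
    return findings
```

Returning findings rather than failing outright leaves the pipeline free to decide whether a regression blocks the release or is routed to the owning team as a non-prod alert.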
Going Global
- Support across timezones
- More and more services
- More and more teams
Serverless
- Functions as a service (on AWS Lambda)
  - Horizontal auto-scaling
  - “No Ops”
  - Cheap
- Unsupported by traditional monitoring/tracing solutions
- X-Ray tracing features with Instana
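For custom metrics from functions, where there is no long-running agent to push to, one option is CloudWatch's Embedded Metric Format (EMF): the function prints a structured JSON log line and CloudWatch Logs extracts the metric from it. This is not something the talk describes; it is a sketch with a hypothetical namespace, metric name and handler, shown only to illustrate metrics without agents.

```python
import json
import time
from typing import Any, Dict


def emf_record(namespace: str, dimensions: Dict[str, str], metrics: Dict[str, float],
               unit: str = "Milliseconds") -> Dict[str, Any]:
    """Build an Embedded Metric Format document for the given metrics."""
    record: Dict[str, Any] = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set: all given keys
                "Metrics": [{"Name": name, "Unit": unit} for name in metrics],
            }],
        },
    }
    record.update(dimensions)  # dimension values live at the top level
    record.update(metrics)     # and so do the metric values
    return record


def handler(event, context):
    """Hypothetical Lambda handler that reports its own processing time."""
    start = time.perf_counter()
    items = event.get("items", [])
    result = {"processed": len(items)}  # stand-in for real work
    elapsed_ms = (time.perf_counter() - start) * 1000
    function_name = getattr(context, "function_name", "local")  # context is None when run locally
    # One JSON line on stdout is enough; no agent or extra API call is needed.
    print(json.dumps(emf_record("ServerlessExample", {"FunctionName": function_name},
                                {"ProcessingTime": elapsed_ms})))
    return result
```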
Thank you
We’re hiring

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Kürzlich hochgeladen (20)

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

DevOps Underground - Microservices Monitoring

  • 1. Barry Laffoy, Senior DevOps Engineer. Scaling a Monitoring Strategy for a Microservices Architecture. Thanks to Our Sponsors. Join Our Community Slack Channel: http://community.kloia.co.uk
  • 2. Monitoring in a Microservices Environment, or how to scale your alerting strategy with your team and application
  • 3. Who Am I? Why should you listen to me? Physics; Actuarial Science; Build Engineering; experience building and maintaining Excel and Jenkins; DevOps at ClearScore
  • 4. Who Are ClearScore? Aim to solve money for the world; present people their data in a beautiful way, to empower financial decision-making; committed to best-in-class technical solutions; committed to having fun while we do it
  • 9. Why would we want them? 12-factor app: scalable, releasable, loggable, discoverable, monitorable, and several more “-ables”. We handle: scaling, releasing, logging, discovering, monitoring. Released independently; empower ownership; distribute risk
  • 11. CALMS: Culture; Automation; Lean; Monitoring; Sharing
  • 12. Business Objectives: autonomous cross-functional teams; increased releases; deliver features more quickly with less risk; uptime
  • 13. The Platform: drive developer ownership; not just a scheduler (Nomad/k8s/etc.): CI/CD, local dev tools, cloud-native resources, logging, metrics, monitoring, alerting
  • 14. The Platform: Amazon Web Services; immutable infrastructure (Packer); infrastructure as code (Terraform); service discovery (Consul); scheduling (Nomad); CI/CD (Jenkins)
  • 15. 2. How Hard Is Monitoring? Application Performance Monitoring
  • 16. Traditional APM: worked great for bare-metal deployment of a single Java app; tracing; alerts; health dashboards
  • 17. Not so great for microservices: instrumented inside the container (not 12-factor); paying for a license per process (not scalable); manual configuration of alerting rules; limited language support; tracing from service to service very difficult; alerting on “abnormal traffic” limited by a simple statistical model
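As a rough illustration of why “abnormal traffic” alerts built on a simple statistical model struggle, here is a minimal Python sketch of one such model: a z-score check of the latest sample against a trailing window. The function name, the window and the threshold of 3 standard deviations are illustrative assumptions, not the rule any particular APM vendor uses.

```python
from statistics import mean, pstdev


def is_abnormal(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it sits more than `threshold` standard deviations
    from the mean of the trailing window `history` (a plain z-score rule)."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history makes any change look infinitely abnormal.
        return latest != mu
    return abs(latest - mu) / sigma > threshold
```

For a window of [100, 102, 98, 101, 99] requests per minute, a reading of 180 is flagged and 101 is not. The weakness is that the rule knows nothing about daily cycles or release events, so a predictable morning peak looks just as “abnormal” as an outage.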
  • 18. 3. How to move forward?
  • 19. Tools, tools, tools: Pingdom; liveness probes; CloudWatch; Elasticsearch; StatsD, InfluxDB, Grafana; next-gen APM, Instana
  • 22. Off the Shelf: external synthetics with Pingdom; container security scanning with quay.io; dependency security scanning with Maven/npm; AMI security scanning with Inspector; performance monitoring as part of the CI pipeline; internal synthetics with consul-alerts/liveness-readiness probes
  • 23. Highly Customizable: cloud native with CloudWatch; annotating releases in Grafana; self-managed with StatsD; infrastructure metrics; custom application metrics; third-party integration monitoring; alerting rules are “all or nothing”
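Custom application metrics in a self-managed StatsD setup are typically pushed as plain-text lines over UDP. The sketch below, using only the Python standard library, formats counters (`c`), gauges (`g`) and timers (`ms`) in the StatsD line format and sends them fire-and-forget. The host, the port 8125 (the conventional StatsD default) and the metric names are assumptions for illustration.

```python
import socket

STATSD_HOST = "127.0.0.1"  # assumption: a StatsD agent on the same host
STATSD_PORT = 8125         # conventional StatsD UDP port


def format_metric(name: str, value: float, metric_type: str, sample_rate: float = 1.0) -> str:
    """Render one metric as a StatsD line, e.g. 'checkout.orders:1|c'."""
    if metric_type not in ("c", "g", "ms"):
        raise ValueError(f"unsupported metric type: {metric_type}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        # Tell StatsD the value was sampled so it can scale counts back up.
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = STATSD_HOST, port: int = STATSD_PORT) -> None:
    """Fire-and-forget UDP send: a missing agent never blocks the service."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

For example, `send_metric(format_metric("checkout.orders", 1, "c"))` would increment a hypothetical `checkout.orders` counter. UDP keeps metrics off the request path, at the cost of occasionally losing a data point.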
  • 25. What do we need? Light-touch configuration; scalable deployment model; automatic service discovery; sensible default alerting rules; flexible configuration; tracing between asynchronous services
  • 26. Roll our own? Already collecting StatsD metrics; means writing and supporting a lot of logic; using some sort of ML?
  • 27. Traditional Vendors: poor support for distributed microservices; poor language support (Scala/Akka); mixed results on configurability
  • 28. Enter Instana: discovered quite by accident; beautiful UI; extremely easy to set up; covered most of our desired features out of the box: infrastructure monitoring, microservice APM, end-user monitoring
  • 32. 5. Culture of Ownership (It’s not just about tools)
  • 33. You build it, you run it! Delivery teams own their microservices; responsible for performance and monitoring in dev/CI/stg environments; ideally, incidents alert the dev team responsible; unfortunately, we don’t quite do that; sophisticated routing system <picture of me>
  • 35. People cause problems: things go wrong when people change things; luckily, this means things go wrong during business hours (mostly); everyone is empowered to inspect monitoring tools; the on-call team supports problem resolution but doesn’t fix everything; understanding teams and services drives platform improvement
  • 36. Alert Grooming: lots of noise on alert channels; alert fatigue; “boy who cried wolf” syndrome; requires proactive maintenance of alerts; fix ALL annoying alerts, even if that means fixing the alert, not the underlying service; the investment takes time, but pays dividends in productivity
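One way to start grooming is to measure which alert rules fire most and how often anyone acts on them. This Python sketch assumes a hypothetical export of alert events as (rule name, was it actioned) pairs; the data shape and rule names are illustrative, not taken from any specific alerting tool.

```python
from collections import Counter


def noisiest_alerts(events: list[tuple[str, bool]], top: int = 3) -> list[tuple[str, int, float]]:
    """Rank alert rules by how often they fired.

    `events` holds (rule_name, was_actioned) pairs (hypothetical export format).
    Returns (rule_name, fire_count, actioned_ratio) for the `top` most
    frequent rules; a high count with a low ratio marks a grooming candidate.
    """
    fired = Counter(rule for rule, _ in events)
    actioned = Counter(rule for rule, acted in events if acted)
    return [
        (rule, count, actioned[rule] / count)  # missing keys count as 0
        for rule, count in fired.most_common(top)
    ]
```

A rule that fires constantly with an actioned ratio near zero is the “boy who cried wolf”, and a prime candidate for fixing the alert itself rather than the service behind it.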
  • 37. Major Incidents: zero-blame retros; involve stakeholders; generate action points with owners (and follow up); detailed incident report with business-friendly summaries and cost estimates
  • 38. 6. Our Platform Future
  • 39. Replatforming. HashiCorp platform: great choice to get us to the cloud; focused on supporting zillions of containers in an HPC environment; limiting our scalability and speed of delivery; encouraged the anti-pattern of integrating platform details into services. Kubernetes migration: solves many of our problems; natively supports blue-green deployments; Instana support for cluster health monitoring; Prometheus on-cluster monitoring; what to do with our StatsD?
  • 40. Continuous Deployment 2.0: investigating CD platforms (Spinnaker/Concourse/Drone); routing non-prod alerts to development teams; performance, tracing and vulnerability issues should be flagged
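Routing non-prod alerts to development teams needs a lookup from service to owning team. A minimal Python sketch, assuming a hypothetical in-code ownership map, hypothetical channel names and environment labels (`dev`, `ci`, `stg`, `prod`), might look like this: non-prod alerts go to the owning team, production alerts still go to on-call with the owner copied in, and unowned services fall back to a platform channel.

```python
from dataclasses import dataclass

# Hypothetical ownership registry: service name -> owning team's alert channel.
SERVICE_OWNERS = {
    "report-service": "#team-reports-alerts",
    "notification-service": "#team-engagement-alerts",
}
ON_CALL_CHANNEL = "#ops-on-call"       # hypothetical production on-call channel
FALLBACK_CHANNEL = "#platform-alerts"  # hypothetical catch-all for unowned services


@dataclass(frozen=True)
class Alert:
    service: str
    environment: str  # e.g. "dev", "ci", "stg", "prod"
    message: str


def route(alert: Alert) -> list[str]:
    """Return the channels this alert should be sent to."""
    owner = SERVICE_OWNERS.get(alert.service)
    if alert.environment == "prod":
        # Production still goes to on-call; the owning team is copied in if known.
        return [ON_CALL_CHANNEL] + ([owner] if owner else [])
    # Non-prod alerts go straight to the team that owns the service.
    return [owner] if owner else [FALLBACK_CHANNEL]
```

In practice the ownership map would more likely live alongside service metadata (for example service-discovery tags or deployment manifests) than in code; keeping it in one place is what lets routing scale with the number of teams.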
  • 41. Going Global: support across time zones; more and more services; more and more teams
  • 42. Serverless: functions as a service (on AWS Lambda); horizontal auto-scaling; “No Ops”; cheap; unsupported by traditional monitoring/tracing solutions; X-Ray tracing features with Instana