Monitoring & Alerting
Quick dive
How much do outages cost us?
Facebook - $500k in just 30 min of outage in 2014
Amazon - $66k/min
Industry average - $300k/hour
Industry total lost revenue - $26.5B
What is monitoring?
The process of becoming aware of the state of a system.
Is my website up and accessible?
Does all the important functionality work?
Is each server up?
Are all the applications we deployed up?
What’s my CPU usage per machine? disk? memory?
Swap?
Start simple
Basic monitoring systems that you can try straight away:
● Google Analytics (Android, iOS, Unity, HTTP, analytics.js)
● Fabric (Crashlytics integration for Android and iOS)
You can also check this detailed comparison table of different monitoring systems.
What does monitoring help with?
● Early problem detection
● Decision making
● Automation
Early problem detection
Performance
● Monitoring anomalies in the behavior of the system helps to detect resource
saturation and rare defects (hard to spot by QA)
● Particular types of bugs related to heavy system load are hard to detect in test
environments, but can be consistently reproduced in production
Availability
● Downtime usually translates directly to losses in revenue and credibility
● 99.99% availability is the industry standard (roughly 53 min of downtime per year)
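The downtime figure behind an availability target is simple arithmetic, and a small sketch makes it easy to compare targets. The function name and the choice of a 365.25-day year are assumptions made for illustration only.

```python
# Minutes in an average year (365.25 days, to account for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) allowed by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


# Compare a few common targets.
for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

Each extra "nine" cuts the yearly downtime budget by a factor of ten, which is why the step from 99.9% to 99.99% is usually far more expensive than it looks.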
Decision making
Baselining
● Know the normal, average state of your system (baseline)
● Data-backed Service-Level Agreements (SLAs)
● In-depth performance analysis, saving costs
Predictions
● Help predict what normal traffic levels are during peaks of activity, like
holidays, social events and such (capacity planning)
● Close interaction with monitoring may help predict business trends
Automation
Allows the system to adapt automatically to high-load situations.
Bursts of input may saturate a system's capacity, forcing it to drop
some traffic. To prevent a uniformly bad experience for all users, the
system deliberately rejects a portion of the inputs. This is commonly known as
admission control.
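One simple way to picture admission control is probabilistic load shedding: when the observed load exceeds capacity, admit each request with probability capacity / load. This is a minimal sketch under that assumption, not a description of any particular product; the function names and the request rates are hypothetical.

```python
import random


def accept_probability(observed_rps: float, capacity_rps: float) -> float:
    """Fraction of requests to admit so that admitted load stays within capacity."""
    if capacity_rps <= 0:
        return 0.0
    if observed_rps <= capacity_rps:
        return 1.0  # below capacity: admit everything
    return capacity_rps / observed_rps


def admit(observed_rps: float, capacity_rps: float, rng: random.Random) -> bool:
    """Decide whether to admit a single request."""
    return rng.random() < accept_probability(observed_rps, capacity_rps)


# Hypothetical burst: 1500 requests/s arriving at a system sized for 1000.
rng = random.Random(42)  # seeded only to make the demo repeatable
admitted = sum(admit(1500.0, 1000.0, rng) for _ in range(10_000))
print(f"admitted {admitted} of 10000 requests")
```

The monitoring system supplies `observed_rps`; without that measurement the system has no basis for deciding how much traffic to reject.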
Monitoring system architecture
● Data collection
● Data aggregation and storage
● Presentation
Data collection
The sources of data are logs, device statistics, and system measurements:
● Logging network request failure rates (4xx, 5xx)
● Tracking performance of calls to individual
remote services
● Database calls and response time
● Disk and CPU usage
● Logging mobile clients analytics events
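As an illustration of the first bullet, failure rates can be derived from the HTTP status codes seen in an interval. The sketch below assumes the status codes have already been parsed out of the logs; the sample data and the function name are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable


def status_class_rates(status_codes: Iterable[int]) -> Dict[str, float]:
    """Return the share of responses per status class, keyed as '2xx', '4xx', ..."""
    counts = Counter(f"{code // 100}xx" for code in status_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status_class: n / total for status_class, n in counts.items()}


# Hypothetical status codes collected over one interval.
sample = [200, 200, 404, 500, 200, 503, 200, 200, 301, 200]
rates = status_class_rates(sample)
print("4xx rate:", rates.get("4xx", 0.0))
print("5xx rate:", rates.get("5xx", 0.0))
```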
Data aggregation and storage
● Incoming data inputs are grouped by their properties and stored as timeseries
● The resulting timeseries are submitted to an alarm evaluation engine, which
generates alarms if anomalies are detected (anomaly detection).
One such system is Graphite.
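The grouping step can be sketched as bucketing raw events by metric name and time window, then averaging each bucket. The metric name, the 60-second window, and the sample events are assumptions for illustration. The output lines use the `<metric path> <value> <timestamp>` layout of Graphite's plaintext protocol, but nothing is sent over the network here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float, int]  # (metric name, value, unix timestamp)
Series = Dict[Tuple[str, int], float]  # (metric name, bucket start) -> average


def aggregate(events: Iterable[Event], bucket_seconds: int = 60) -> Series:
    """Average event values per (metric, time bucket) to form a timeseries."""
    grouped: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for name, value, ts in events:
        bucket_start = ts - ts % bucket_seconds
        grouped[(name, bucket_start)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


def to_plaintext_lines(series: Series) -> List[str]:
    """Format each point as '<metric path> <value> <timestamp>'."""
    return [f"{name} {value} {ts}" for (name, ts), value in sorted(series.items())]


# Hypothetical CPU samples; timestamps are small numbers to keep the demo readable.
events = [
    ("web01.cpu.user", 40.0, 120),
    ("web01.cpu.user", 60.0, 150),
    ("web01.cpu.user", 80.0, 185),
]
for line in to_plaintext_lines(aggregate(events)):
    print(line)
```

Averaging is only one possible aggregation; counts, sums, or percentiles per bucket follow the same pattern.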
Presentation
Allows visualisation of the real-time state of the system. When a fault is identified
and fixed, the correction should be immediately visible.
One powerful tool for dashboarding is Grafana:
● Integrate with Graphite, InfluxDB, OpenTSDB, and KairosDB
● Introduction and basic concepts can be found here
● Useful video on how to set up your first dashboard
● Give it a try
Alerting
Alerting is the capability of a
monitoring system to detect and notify
the engineer about meaningful events.
Levels of alert urgency
● Alerts as records - anomalies that do not impact the service functionality.
● Alerts as notifications - do not need immediate attention.
● Alerts as pages - high severity, response time enforced by internal SLAs.
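The three urgency levels can be expressed as a small routing rule. This is only a sketch: the two boolean inputs are a simplification chosen for the example, and a real alerting setup would derive them from its own severity rules.

```python
from enum import Enum


class Urgency(Enum):
    RECORD = "record"              # stored for later analysis, nobody is notified
    NOTIFICATION = "notification"  # someone should look, but not right now
    PAGE = "page"                  # high severity, the on-call engineer is paged


def classify(impacts_service: bool, needs_immediate_attention: bool) -> Urgency:
    """Map an alert to one of the three urgency levels."""
    if not impacts_service:
        return Urgency.RECORD
    if needs_immediate_attention:
        return Urgency.PAGE
    return Urgency.NOTIFICATION


print(classify(impacts_service=True, needs_immediate_attention=True))
```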
Tools
● Pagerduty
● OpsGenie
● VictorOps
Anomaly detection
The identification of items, events or observations which do not conform to an
expected pattern or other items in a dataset.
Let’s see how Uber does it.
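For a feel of the basic idea, here is a minimal z-score sketch. It is not Uber's method or any specific product's algorithm. It assumes a baseline of at least two recent "normal" measurements and flags observations that lie more than a chosen number of standard deviations from the baseline mean; the threshold of 3 and the sample values are illustrative assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_anomalies(
    baseline: Sequence[float],
    observed: Sequence[float],
    threshold: float = 3.0,
) -> List[int]:
    """Return indices of observed points that deviate strongly from the baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 baseline points
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as anomalous.
        return [i for i, x in enumerate(observed) if x != mu]
    return [i for i, x in enumerate(observed) if abs(x - mu) / sigma > threshold]


# Hypothetical response times in milliseconds.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
observed = [101, 99, 140, 100]
print(find_anomalies(baseline, observed))
```

A fixed threshold like this ignores seasonality and trends, which is one reason production systems use more elaborate models.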
Issue is detected and fixed, now what?
Detecting and fixing an issue are only the first steps. We need to make sure that the
issue does not happen again.
Use of postmortems is one interesting approach.
Challenges
● Baselining
● Coverage
● Manageability
● Accuracy
● Context
● Human nature
Conclusion
● Get in the habit of measuring: you cannot manage what you cannot measure
● Monitor extensively
● Alarm selectively
● Work smart, not hard: learn from the experience of others
● Have a tactic
Further reading: Effective Monitoring and Alerting
Thank you!
Contact:
sabin.roman@gmail.com
https://nl.linkedin.com/in/sabinroman

Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

Monitoring & alerting presentation sabin&mustafa

  • 2. How much do outages cost us? Facebook - $500k in just 30 min of outage in 2014 Amazon - $66k/min Industry average - $300k/hour Industry total lost revenue - $26.5B
  • 3. What is monitoring? The process of becoming aware of the state of a system. Is my website up and accessible? Does all the important functionality work? Is each server up? Are all the applications we deployed up? What’s my CPU usage per machine? disk? memory? Swap?
  • 4. Start simple Basic monitoring systems that you can try straight away: ● Google Analytics (Android, iOS, Unity, HTTP, analytics.js) ● Fabric (Crashlytics integration for Android and iOS) You can also check this detailed comparison table of different monitoring systems.
  • 5. What does monitoring help with? ● Early problem detection ● Decision making ● Automation
  • 6. Early problem detection Performance ● Monitoring anomalies in the behavior of the system helps to detect resource saturation and rare defects (hard for QA to spot) ● Particular types of bugs related to heavy system load are hard to detect in test environments, but can be consistently reproduced in production Availability ● Downtime usually translates directly to losses in revenue and credibility ● 99.99% availability is the industry standard (roughly 53 min of downtime per year)
  • 7. Decision making Baselining ● Know the normal, average state of your system (baseline) ● Data-backed Service-Level Agreements (SLAs) ● In-depth performance analysis, saving costs Predictions ● Help predict what normal traffic levels are during peaks of activity, like holidays, social events and such (capacity planning) ● Close interaction with monitoring may help predict business trends
  • 8. Automation Allows the system to automatically adapt to high-load situations. Bursts of input may saturate a system’s capacity, and it may have to drop some traffic. To prevent a uniformly bad experience for all users, an attempt is made to reject a portion of the inputs. This is commonly known as admission control.
  • 9. Monitoring system architecture ● Data collection ● Data aggregation and storage ● Presentation
  • 10. Data collection The sources of data are logs, device statistics, and system measurements: ● Logging network request failure rates (4xx, 5xx) ● Tracking performance of calls to individual remote services ● Database calls and response time ● Disk and CPU usage ● Logging mobile client analytics events
  • 11. Data aggregation and storage ● Incoming data inputs are grouped by their properties and stored as time series ● The resulting time series are submitted to an alarm evaluation engine, which generates alarms if anomalies are detected (anomaly detection). One such system is Graphite.
  • 12. Presentation Allows visualisation of the real-time state of the system. When a fault is identified and fixed, the correction should be immediately visible. One powerful tool for dashboarding is Grafana: ● Integrates with Graphite, InfluxDB, OpenTSDB, and KairosDB ● Introduction and basic concepts can be found here ● Useful video on how to set up your first dashboard ● Give it a try
  • 13. Alerting Alerting is the capability of a monitoring system to detect and notify the engineer about meaningful events.
  • 14. Levels of alert urgency ● Alerts as records - anomalies that do not impact the service functionality. ● Alerts as notifications - do not need immediate attention. ● Alerts as pages - high severity, response time enforced by internal SLAs.
  • 16. Anomaly detection The identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. Let’s see how Uber does it.
  • 17. Issue is detected and fixed, now what? Detecting and fixing an issue are only the first steps. We need to make sure that the issue does not happen again. Use of postmortems is one interesting approach.
  • 18. Challenges ● Baselining ● Coverage ● Manageability ● Accuracy ● Context ● Human nature
  • 19. Conclusion ● Get in the habit of measuring, you cannot manage what you cannot measure ● Monitor extensively ● Alarm selectively ● Work smart, not hard, learn from the experience of others ● Have a tactic Further reading: Effective Monitoring and Alerting
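The aggregation and anomaly-detection steps from slides 11 and 16 can be illustrated with a minimal sketch. It assumes a very simple baseline (rolling mean and standard deviation over recent samples) and is not how Graphite or Uber detect anomalies; the names `RollingBaseline`, `detect`, `window`, and `threshold` are hypothetical and chosen only for this example.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Keeps the last `window` samples of one time series as its baseline."""

    def __init__(self, window=30, threshold=3.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold  # assumed cut-off, in standard deviations

    def is_anomaly(self, value):
        """Return True if `value` deviates strongly from the baseline."""
        anomalous = False
        if len(self.samples) >= 5:  # wait for some history before judging
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            if sigma > 0:
                anomalous = abs(value - mu) > self.threshold * sigma
            else:
                # Perfectly flat history: any change counts as an anomaly.
                anomalous = value != mu
        # Anomalous points are kept out of the baseline so that one spike
        # does not hide the next one.
        if not anomalous:
            self.samples.append(value)
        return anomalous


def detect(series, window=30, threshold=3.0):
    """Return the indices of anomalous points in a list of measurements."""
    baseline = RollingBaseline(window, threshold)
    return [i for i, v in enumerate(series) if baseline.is_anomaly(v)]
```

For example, `detect([100, 102, 98, 101, 99, 100, 103, 97, 500, 101])` flags only the spike at index 8. A production system would also need to handle seasonality and trends, which this sketch ignores.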

Editor's notes

  1. Today we will discuss what we love the most in engineering: being woken up at 4 a.m. because of a bug! We will talk about how to detect problems with your application and how to fix them as soon as possible.
  2. Has anybody used these tools?
  3. The ability to predict demands and then match them based on seasonality translates directly into revenue gains
  4. When a data store that supports a user-facing service starts serving queries much slower than usual, but not slow enough to make an appreciable difference in the overall service’s response time, that should generate a low-urgency alert that is recorded in your monitoring system for future reference or investigation but does not interrupt anyone’s work. Another example: the data store is running low on disk space and should be scaled out in the next several days.
  5. Pics, charts, examples, how much time it takes to set up the system, conclusion, pitfalls
  6. Baselining: “nothing endures but change” Coverage: systems evolve, so should the coverage
  7. Tactic: Runbooks (80% disk storage issue)
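As an illustration of the disk storage note above, here is a small sketch of a check that a runbook entry might be attached to. Only the 80% figure comes from the notes; the 95% paging threshold, the mapping to urgency levels, and the function names are assumptions made for this example.

```python
import shutil

WARN_PERCENT = 80.0  # threshold mentioned in the runbook note
PAGE_PERCENT = 95.0  # assumed value, not taken from the slides


def disk_used_percent(path="/"):
    """Return the percentage of disk space in use on the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify_disk_alert(used_percent):
    """Map disk usage to an alert urgency level, or None if no alert is needed."""
    if used_percent >= PAGE_PERCENT:
        return "page"          # needs immediate attention
    if used_percent >= WARN_PERCENT:
        return "notification"  # scale out within the next several days
    return None
```

Calling `classify_disk_alert(disk_used_percent("/"))` from a scheduled job would yield `None`, `"notification"`, or `"page"`, which can then be routed to the matching channel.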