PROPRIETARY AND CONFIDENTIAL
Site Reliability Engineering
Michael Blakeney
What is SRE?
"an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s)."
• Ensuring a Durable Focus on Engineering
• Pursuing Maximum Change Velocity Without Violating a Service’s SLO
• Monitoring
• Emergency Response
• Change Management
• Demand Forecasting and Capacity Planning
• Provisioning
• Efficiency and Performance
Availability
"If you haven't tried it, assume it's broken"
• Time Based: too binary for distributed systems that can enter partial downtime or degraded states
• Aggregate Based: much broader and able to capture user-facing experience more effectively
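The difference between the two availability measures can be sketched in a few lines of Python. This is a minimal illustration, not part of the talk: the function names and the sample numbers are assumptions, and the formulas are the usual uptime ratio and request success ratio.

```python
def time_based_availability(uptime_seconds: float, downtime_seconds: float) -> float:
    """Fraction of wall-clock time the service was considered up."""
    total = uptime_seconds + downtime_seconds
    if total <= 0:
        raise ValueError("the measurement window must be longer than zero")
    return uptime_seconds / total


def aggregate_availability(successful_requests: int, total_requests: int) -> float:
    """Fraction of requests that were served successfully."""
    if total_requests <= 0:
        raise ValueError("at least one request is needed")
    return successful_requests / total_requests


# Hypothetical day: the service never went fully down,
# but 5% of requests failed while it ran in a degraded state.
by_time = time_based_availability(uptime_seconds=86_400, downtime_seconds=0)
by_requests = aggregate_availability(successful_requests=950, total_requests=1_000)
print(by_time, by_requests)
```

In this made-up day the time-based figure reports a perfect score while the aggregate figure exposes the failed requests, which is the "too binary" problem in practice.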
SLI, SLO, SLA
• Service Level Indicators
• Service Level Objectives
• Service Level Agreement
Examples:
• Database state should be 100% recovered in no more than 1 day.
• "99% of pipeline runs cover 100% of the data."
• 90% (averaged over 1 minute) of HTTP requests to the backend should complete in less than 10 ms.
https://landing.google.com/sre/workbook/chapters/slo-document/
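The HTTP latency example can be read as an SLI (the fraction of fast requests in a window) compared against an SLO target (90%). A rough Python sketch under that reading follows; the function names and the sample window are hypothetical, and it assumes the latencies for one 1-minute window have already been collected.

```python
from typing import Sequence


def fraction_under_threshold(latencies_ms: Sequence[float], threshold_ms: float = 10.0) -> float:
    """SLI: share of requests in the window that finished faster than the threshold."""
    if not latencies_ms:
        raise ValueError("no requests in this window")
    fast = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return fast / len(latencies_ms)


def meets_latency_slo(
    latencies_ms: Sequence[float],
    threshold_ms: float = 10.0,
    target: float = 0.90,
) -> bool:
    """SLO check: is the SLI for this window at or above the target?"""
    return fraction_under_threshold(latencies_ms, threshold_ms) >= target


# Hypothetical 1-minute window: nine fast requests and one slow outlier.
window = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.5, 50.0]
print(fraction_under_threshold(window), meets_latency_slo(window))
```

An SLA would sit on top of this as a contractual commitment and is not modelled here.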
Four Golden Signals
• Latency: the time it takes for your service to process a request
• Traffic: a measurement of the requests the service is handling
• Errors: the rate of requests that fail
• Saturation: how much of a limited resource is utilized, usually measured as a percentage of that resource
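As a rough illustration of how the four golden signals might be derived from raw data, the Python sketch below computes them for one measurement window. The `Request` record, the function name, and the choice of mean latency and a single capacity number are all assumptions made for the example; real monitoring systems typically use percentiles and several saturation sources.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Request:
    duration_ms: float  # how long the request took to process
    ok: bool            # False if the request failed


def golden_signals(
    requests: Sequence[Request],
    window_seconds: float,
    used_capacity: float,
    total_capacity: float,
) -> Dict[str, float]:
    """Summarize one window of traffic as the four golden signals."""
    if window_seconds <= 0 or total_capacity <= 0:
        raise ValueError("window and capacity must be positive")
    count = len(requests)
    failed = sum(1 for r in requests if not r.ok)
    return {
        # Latency: mean processing time (0.0 when there was no traffic).
        "latency_ms": sum(r.duration_ms for r in requests) / count if count else 0.0,
        # Traffic: requests handled per second.
        "traffic_rps": count / window_seconds,
        # Errors: fraction of requests that failed.
        "error_rate": failed / count if count else 0.0,
        # Saturation: percentage of the limited resource in use.
        "saturation_pct": used_capacity / total_capacity * 100,
    }


sample = [Request(10, True), Request(20, True), Request(30, False), Request(40, True)]
print(golden_signals(sample, window_seconds=2, used_capacity=6, total_capacity=8))
```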
Error Budgets
• Error budgets enable teams to make objective decisions about prioritizing features versus reliability.
• Given an availability target, the error budget defines the tolerable amount of service unavailability, e.g. 99.99% availability => 0.01% unavailability, or 12.96 minutes per quarter.
https://landing.google.com/sre/sre-book/chapters/availability-table/
https://landing.google.com/sre/workbook/chapters/error-budget-policy/
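The arithmetic behind the 12.96-minute figure can be checked with a short Python sketch. The function names are hypothetical, and a 90-day quarter is assumed because that is the window length that reproduces the number on the slide.

```python
def error_budget_minutes(availability_target: float, window_days: float) -> float:
    """Minutes of tolerable unavailability for a target (as a fraction) over a window."""
    if not 0 < availability_target <= 1:
        raise ValueError("availability_target must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1 - availability_target) * window_minutes


def budget_remaining(budget_minutes: float, downtime_minutes: float) -> float:
    """Budget left after observed downtime; a negative value means it is overspent."""
    return budget_minutes - downtime_minutes


# 99.99% over an assumed 90-day quarter: 0.01% of 129,600 minutes.
quarterly_budget = error_budget_minutes(0.9999, window_days=90)
print(round(quarterly_budget, 2))  # approximately 12.96
print(round(budget_remaining(quarterly_budget, downtime_minutes=5), 2))
```

A team could compare the remaining budget against zero when deciding whether to keep shipping features or to shift effort toward reliability work.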
"Ways in which things go wrong are special cases of the ways in which things go right"
Being Agile with SLOs
• Transparency - the SLO and error budget policies, along with all other relevant material, should be made available to the team and stakeholders.
• Inspection - the team should regularly review and analyze the effectiveness and relevance of the policies.
• Adaptation - the team should be willing to adjust the policies so as to maximize the value delivered to customers.
References
https://landing.google.com/sre/books/

Site reliability engineering - Lightning Talk
