Normal Accidents and
Outpatient Surgeries
Resilience Engineering Done Right
Safety in a Complex and Changing Environment
"...so safety isn't about the absence of something...that
you need to count errors or monitor violations. But
the presence of something. But the presence of what?
When we need to find that things go right under difficult
circumstances, it's mostly because of people's adaptive
capability; their ability to recognize, adapt to, and absorb
changes and disruptions, some of which might fall outside
of what the system is designed or trained to handle"
-Sidney Dekker
RESILIENCE
Vocabulary Lesson
Continuous Integration: The ability to quickly make sure
the system is ready for production.
Resilience: The intrinsic ability of a system to adjust its
functioning prior to, during, or following changes and
disturbances in order to sustain required operations.
Maintainability: Characteristic of design and installation
which determines the probability that a failed equipment,
machine, or system can be restored to its normal state
within a given timeframe.
The SYSTEM includes all the
hardware and software, but
also all of the PEOPLE
involved.
Maintainability = Uptime Goodness
MTTR vs. MTBF
Low MTTR > High MTBF
Low MTTR = Better Uptime for most types of F (failure)
Low MTTR Requires:
• more useful metrics
• intelligent data analysis
• pre-planned, purposeful resilience
• cooperation between application and infrastructure
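The MTTR vs. MTBF trade-off can be made concrete with the common steady-state model, availability = MTBF / (MTBF + MTTR). This is a minimal sketch; the function name and the hour figures are hypothetical and do not come from the talk.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, from mean hours between failures
    and mean hours to repair (steady-state model)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures, for illustration only.
baseline = availability(mtbf_hours=500, mttr_hours=4)
doubled_mtbf = availability(mtbf_hours=1000, mttr_hours=4)
halved_mttr = availability(mtbf_hours=500, mttr_hours=2)

print(f"baseline:     {baseline:.4%}")
print(f"doubled MTBF: {doubled_mtbf:.4%}")
print(f"halved MTTR:  {halved_mttr:.4%}")
```

In this model, halving MTTR buys the same availability as doubling MTBF. Which of the two is cheaper to move is a judgment the formula does not make.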
Your Average Operations Engineer
Automation as a Default:
"One of the best ways to eliminate human problems is to
take the human out of the problem. Machines are very good
at doing things repeatedly and doing them the same way
every single time. Humans are not good at this. Let the
machines do it."
Rapid Recovery:
"Do we spend an unpredictable amount of time trying to
solve some obscure issue, or do we simply recreate the
instance providing the service from configuration
management"
blog.lusis.org/blog/2011/10/18/deploy-all-the-things/
PUPPET + KICKSTART
+ Network Automation
ESPER + HEALTHCHECK + NAGIOS
+ SPLUNK + OHSHIT
Comfortable Changes
1) Are Small
• Many Small Changes = Fewer Incidents with lower MTTR
2) Are Reproducible
RPM:
• Really Peaceful Mornings
• Reduce Paging Monitors
• Reusable Provisioning Methods
Rule # 81: If you are logging into servers, you are doing it
wrong.
Comfortable Changes
3) Are easily understood by your most junior team members
Rule # 4: Keep it Simple, because you are smart. Do not
make it overly complex because you can.
4) Can be deployed to a subset of production systems
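One way to pick the subset in item 4 is to hash each hostname into a bucket, so the same hosts are chosen on every run and a larger percentage always includes the smaller one. This is a sketch under assumptions: the function name `in_canary` and the `example.com` hostnames are hypothetical, and the talk does not say how subsets were chosen.

```python
import hashlib


def in_canary(hostname: str, percent: int) -> bool:
    """Return True if the host falls in the first `percent` of 100 hash buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


# Hypothetical fleet of 20 front-end hosts.
hosts = [f"web{n:02d}.example.com" for n in range(1, 21)]
first_wave = [h for h in hosts if in_canary(h, 10)]
second_wave = [h for h in hosts if in_canary(h, 50)]

print("10% wave:", first_wave)
print("50% wave:", second_wave)
```

Because the bucket depends only on the hostname, widening the rollout from 10 to 50 percent keeps every host from the first wave in the second.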
Comfortable Changes
5) Follow Process
Change control, deployment processes, peer review: all of
these things matter for a world-class OPS organization.
Comfortable Changes
6) Have been approved by a GO / NO-GO process with all
relevant parties checking in.
Ensure that all teams involved in a change have signed off,
including ON-CALL and CUSTOMER SERVICE
Tracking Changes
Small Changes
John Allspaw presented these graphs
of data gathered at Etsy.
More Smaller Deployments
means
Faster MTTR
means
Fewer Minutes of Disruption
Operations Meta-Metrics
When in doubt, COLLECT DATA, Build a Timeline!
Things to Monitor:
Changes
(who/what/when/type)
Incidents
(Type/Severity/Duration)
Responses to Incidents
(TTD/TTR)
Things to Collect:
IRC/Jabber Logs
Jira Logs
Search your Data: Use
HBASE+PIG/HIVE, ESPER,
SOLR and SPLUNK
Store everything, even
stuff you don't yet know
how to use.
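The fields listed above (who/what/when/type for changes, type/severity/duration for incidents) are enough to build a simple timeline and ask which changes landed shortly before an incident. This is a minimal sketch: the class names, the one-hour window, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    who: str
    what: str
    when: datetime
    kind: str  # e.g. "deploy" or "config"


@dataclass(frozen=True)
class Incident:
    kind: str
    severity: int
    started: datetime
    duration: timedelta


def changes_before(incident, changes, window=timedelta(hours=1)):
    """Changes that landed within `window` before the incident began, newest first."""
    earliest = incident.started - window
    recent = [c for c in changes if earliest <= c.when <= incident.started]
    return sorted(recent, key=lambda c: c.when, reverse=True)


# Hypothetical records, for illustration only.
changes = [
    Change("bob", "puppet run on db hosts", datetime(2012, 1, 1, 8, 0), "config"),
    Change("alice", "deploy web 1.2", datetime(2012, 1, 1, 9, 50), "deploy"),
    Change("carol", "rollback web 1.2", datetime(2012, 1, 1, 10, 30), "deploy"),
]
outage = Incident("site down", 1, datetime(2012, 1, 1, 10, 0), timedelta(minutes=30))

suspects = changes_before(outage, changes)
for change in suspects:
    print(change.when.isoformat(), change.who, change.what)
```

A change appearing in the window is only a lead to follow up on, not proof that it caused the incident.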
Tracking Incidents - MTTD
1. Frequency
2. Severity
3. Root Cause: Five Whys Mentality
   o Why was the website down? The CPU utilization on all our front-end servers went to 100%.
   o Why did the CPU usage spike? A new bit of code contained an infinite loop!
   o Why did that code get written? So-and-so made a mistake.
   o Why did his mistake get checked in? He didn't write a unit test for the feature.
   o Why didn't he write a unit test? He's a new employee, and he was not properly trained.
4. Time-to-Detect
5. Time-to-Resolve
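Time-to-Detect and Time-to-Resolve become MTTD and MTTR once they are averaged over incidents. The sketch below assumes each incident records three timestamps and that time-to-resolve is measured from detection; some teams measure it from the start of the incident instead. The names and sample times are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass(frozen=True)
class IncidentTimes:
    started: datetime   # when the problem actually began
    detected: datetime  # when monitoring or a human noticed it
    resolved: datetime  # when service was confirmed stable


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def mttd_minutes(incidents):
    return mean_minutes([i.detected - i.started for i in incidents])


def mttr_minutes(incidents):
    # Assumption: resolve time is counted from detection, not from start.
    return mean_minutes([i.resolved - i.detected for i in incidents])


# Hypothetical incidents, for illustration only.
history = [
    IncidentTimes(datetime(2012, 1, 1, 10, 0), datetime(2012, 1, 1, 10, 5), datetime(2012, 1, 1, 10, 35)),
    IncidentTimes(datetime(2012, 1, 2, 14, 0), datetime(2012, 1, 2, 14, 15), datetime(2012, 1, 2, 14, 25)),
]

print("MTTD (minutes):", mttd_minutes(history))
print("MTTR (minutes):", mttr_minutes(history))
```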
Tracking Incidents - MTTD
Rule # 18: Monitor EVERYTHING, alert on actionable items
only, record the rest for trend information.
Rule # 20: Do not make the monitoring system so noisy it is
useless.
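Rules 18 and 20 amount to a routing decision: every check result is stored, but only a failing check that someone can act on should page. This is a minimal sketch; the `Check` class, the `actionable` flag, and the check names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    actionable: bool  # can the on-call engineer do something about it right now?


def route(check: Check, healthy: bool) -> str:
    """Return "page" only for failing, actionable checks; record everything else."""
    if not healthy and check.actionable:
        return "page"
    return "record"


# Hypothetical checks, for illustration only.
disk_full = Check("disk usage above 95%", actionable=True)
cpu_trend = Check("cpu utilization sample", actionable=False)

print(route(disk_full, healthy=False))  # page
print(route(disk_full, healthy=True))   # record
print(route(cpu_trend, healthy=False))  # record
```

Every result still lands in the recorded stream, so trend data is kept without adding noise to the pager.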
Tracking Incidents - MTTD
Data Points to source these metrics from:
Output from Application, CLOG, Puppet, Jabber, Jira,
healthcheck, hardware, Eluna, Nagios... all collectible data
Handling Incident Response - MTTR
Detect a Problem
Communicate to Support/Community/Executives
Begin to take Action
Communicate to Support/Community/Executives
Coordinate Troubleshooting/Diagnosis
Communicate to Support/Community/Executives
Confirm Stability, Resolving Steps
Communicate to Support/Community/Executives
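The pattern in the steps above, where every action is followed by a round of communication, can be sketched as a loop that refuses to move on until each audience has been told. The function and callback names are hypothetical, and real notification would go through whatever channels the team already uses.

```python
RESPONSE_STEPS = (
    "Detect a problem",
    "Begin to take action",
    "Coordinate troubleshooting/diagnosis",
    "Confirm stability, resolving steps",
)
AUDIENCES = ("Support", "Community", "Executives")


def respond(steps, do_step, notify):
    """Run each step, then tell every audience the outcome before continuing."""
    for step in steps:
        outcome = do_step(step)
        for audience in AUDIENCES:
            notify(audience, f"{step}: {outcome}")


# Stand-ins for real work and real messaging, for illustration only.
sent = []
respond(
    RESPONSE_STEPS,
    do_step=lambda step: "done",
    notify=lambda audience, message: sent.append((audience, message)),
)

for audience, message in sent:
    print(f"[{audience}] {message}")
```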
Handling Incident Response - MTTR
Rule # 24: Assign people to be point people for every bit
of technology
Rule # 25: Assign Backup People to those People
Rule # 12: Know your bottlenecks, and how to spot them.
Rule # 42: Create gigantic poster size drawings of the
physical layouts of your data center
Rule # 43: Create gigantic poster size drawings of the
logical flows of each part of your product.
XKCD #974:
I find that when someone is taking time to do something
right in the present, they're a perfectionist with no ability
to prioritize, whereas when someone took time to do
something right in the past, they're a master artisan of
great foresight.

 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Normal accidents and outpatient surgeries

  • 1. Normal Accidents and Outpatient Surgeries Resilience Engineering Done Right
  • 3. Safety in a Complex and Changing Environment "...so safety isn't about the absence of something...that you need to count errors or monitor violations. But the presence of something. But the presence of what? When we need to find that things go right under difficult circumstances, it's mostly because of people's adaptive capability; their ability to recognize, adapt to, and absorb changes and disruptions, some of which might fall outside of what the system is designed or trained to handle" -Sidney Dekker RESILIENCE
  • 7. Vocabulary Lesson
      Continuous Integration: The ability to quickly make sure the system is ready for production.
      Resilience: The intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances in order to sustain required operations.
      Maintainability: Characteristic of design and installation which determines the probability that a failed equipment, machine, or system can be restored to its normal state within a given timeframe.
      The SYSTEM includes all the hardware and software, but also all of the PEOPLE involved.
  • 11. Maintainability = Uptime Goodness
      MTTR vs. MTBF
      Low MTTR > High MTBF
      Low MTTR = Better Uptime for most types of F
      Low MTTR Requires:
        • more useful metrics
        • intelligent data analysis
        • pre-planned, purposeful resilience
        • cooperation between application and infrastructure
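The MTTR vs. MTBF trade-off on slide 11 can be made concrete with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below is only an illustration: the function name and the hour figures are hypothetical and do not come from the deck.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures, chosen only to show the shape of the trade-off.
baseline = availability(mtbf_hours=500, mttr_hours=4)
doubled_mtbf = availability(mtbf_hours=1000, mttr_hours=4)  # fail half as often
halved_mttr = availability(mtbf_hours=500, mttr_hours=2)    # recover twice as fast

for label, value in [("baseline", baseline),
                     ("doubled MTBF", doubled_mtbf),
                     ("halved MTTR", halved_mttr)]:
    print(f"{label:>12}: {value:.4%}")
```

With these numbers, halving MTTR buys the same uptime as doubling MTBF (both land at about 99.6%), which is one way to read the slide: recovery speed is usually the cheaper lever to pull.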
  • 16. Automation as a Default: "One of the best ways to eliminate human problems is to take the human out of the problem. Machines are very good at doing things repeatedly and doing them the same way every single time. Humans are not good at this. Let the machines do it."
      Rapid Recovery: "Do we spend an unpredictable amount of time trying to solve some obscure issue, or do we simply recreate the instance providing the service from configuration management"
      blog.lusis.org/blog/2011/10/18/deploy-all-the-things/
      PUPPET + KICKSTART + Network Automation
      ESPER + HEALTHCHECK + NAGIOS + SPLUNK + OHSHIT
  • 19. Comfortable Changes
      1) Are Small
        • Many Small Changes = Fewer Incidents with lower MTTR
      2) Are Reproducible
      RPM:
        • Really Peaceful Mornings
        • Reduce Paging Monitors
        • Reusable Provisioning Methods
      Rule # 81: If you are logging into servers, you are doing it wrong.
  • 22. Comfortable Changes
      3) Are easily understood by your most junior team members
      Rule # 4: Keep it Simple, because you are smart. Do not make it overly complex because you can.
      4) Can be deployed to a subset of production systems
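Point 4 on slide 22 (deploying to a subset of production systems) might be sketched as below. This is an assumption-laden illustration, not the deck's tooling: `canary_hosts`, the host names, and the 10% fraction are all hypothetical. Hosts are ranked by a hash of their name so that the same subset is picked on every run.

```python
import hashlib


def canary_hosts(hosts, fraction):
    """Pick a stable subset of hosts that should receive a change first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Ranking by a hash of the name keeps the choice independent of input order.
    ranked = sorted(hosts, key=lambda h: hashlib.sha256(h.encode()).hexdigest())
    # Always keep at least one host in the subset when any hosts exist.
    count = max(1, round(len(ranked) * fraction))
    return ranked[:count]


# Hypothetical fleet of 20 front-end servers; roll out to 10% first.
fleet = [f"web{i:02d}" for i in range(20)]
first_wave = canary_hosts(fleet, 0.10)
print(first_wave)
```

Because the selection is deterministic, a failed rollout can be reproduced against exactly the same hosts, which fits the "Are Reproducible" point made earlier in the deck.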
  • 24. Comfortable Changes
      5) Follow Process
      Change control, deployment processes, peer review: all of these things matter for a world-class OPS organization.
  • 26. Comfortable Changes 6) Have been approved by a GO / NO-GO process with all relevant parties checking in. Ensure that all teams involved in a change have signed off, including ON-CALL and CUSTOMER SERVICE
  • 29. Small Changes John Allspaw presented these graphs of data gathered at Etsy. More, smaller deployments mean lower MTTR, which means fewer minutes of disruption.
  • 32. Operations Meta-Metrics
      When in doubt, COLLECT DATA, Build a Timeline!
      Things to Monitor:
        • Changes (who/what/when/type)
        • Incidents (Type/Severity/Duration)
        • Responses to Incidents (TTD/TTR)
      Things to Collect:
        • IRC/Jabber Logs
        • Jira Logs
      Search your Data: Use HBASE+PIG/HIVE, ESPER, SOLR and SPLUNK
      Store everything, even stuff you don't yet know how to use.
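"Build a Timeline" on slide 32 could look something like the following minimal sketch, which merges change and incident records into one ordered list and finds the most recent change before an incident. The `Event` type, both function names, and the sample records are hypothetical; real data would come from the sources the slide lists.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Event(NamedTuple):
    when: datetime
    kind: str    # "change" or "incident"
    detail: str


def build_timeline(*sources):
    """Merge any number of event lists into one list ordered by time."""
    return sorted((e for src in sources for e in src), key=lambda e: e.when)


def last_change_before(timeline, incident) -> Optional[Event]:
    """Return the most recent change at or before the incident, if any."""
    changes = [e for e in timeline
               if e.kind == "change" and e.when <= incident.when]
    return changes[-1] if changes else None


# Hypothetical records for illustration only.
changes = [Event(datetime(2024, 1, 1, 9, 0), "change", "config push"),
           Event(datetime(2024, 1, 1, 10, 0), "change", "deploy A")]
incidents = [Event(datetime(2024, 1, 1, 10, 30), "incident", "site down")]

timeline = build_timeline(changes, incidents)
suspect = last_change_before(timeline, incidents[0])
print(suspect.detail if suspect else "no prior change")
```

The nearest preceding change is only a starting point for investigation, not proof of cause.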
  • 33. Tracking Incidents - MTTD
      1. Frequency
      2. Severity
      3. Root Cause: Five Whys Mentality
        o Why was the website down? The CPU utilization on all our front-end servers went to 100%.
        o Why did the CPU usage spike? A new bit of code contained an infinite loop!
        o Why did that code get written? So-and-so made a mistake.
        o Why did his mistake get checked in? He didn't write a unit test for the feature.
        o Why didn't he write a unit test? He's a new employee, and he was not properly trained.
      4. Time-to-Detect
      5. Time-to-Resolve
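Time-to-Detect and Time-to-Resolve from slide 33 can be averaged into MTTD and MTTR once each incident has three timestamps. The sketch below assumes both intervals are measured from the moment the incident started; the `Incident` record and the sample times are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    started: datetime
    detected: datetime
    resolved: datetime


def mean_time_to_detect(incidents) -> timedelta:
    """Average gap between an incident starting and someone noticing it."""
    return timedelta(seconds=mean(
        (i.detected - i.started).total_seconds() for i in incidents))


def mean_time_to_resolve(incidents) -> timedelta:
    """Average gap between an incident starting and being resolved
    (assumption: measured from start, not from detection)."""
    return timedelta(seconds=mean(
        (i.resolved - i.started).total_seconds() for i in incidents))


# Hypothetical incident log.
log = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5),
             datetime(2024, 1, 1, 10, 35)),
    Incident(datetime(2024, 1, 2, 12, 0), datetime(2024, 1, 2, 12, 15),
             datetime(2024, 1, 2, 12, 45)),
]
print("MTTD:", mean_time_to_detect(log))
print("MTTR:", mean_time_to_resolve(log))
```

Both functions raise `statistics.StatisticsError` on an empty log, which seems preferable to reporting a misleading zero.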
  • 34. Tracking Incidents - MTTD
      Rule # 18: Monitor EVERYTHING, alert on actionable items only, and record the rest for trend information.
      Rule # 20: Do not make the monitoring system so noisy it is useless.
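Rules 18 and 20 amount to a routing decision: every check is recorded, but only actionable breaches page a human. A minimal sketch under that reading follows; the `Check` fields, the `route` function, and the sample checks are hypothetical and not tied to Nagios or any other tool named in the deck.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    value: float
    threshold: float
    actionable: bool  # can the on-call person actually do something about it?


def route(check: Check) -> str:
    """Return "page" for actionable threshold breaches, else "record"."""
    if check.actionable and check.value >= check.threshold:
        return "page"
    # Everything else is still kept, but only as trend data.
    return "record"


# Hypothetical checks for illustration.
checks = [
    Check("frontend_cpu_percent", 100.0, 90.0, actionable=True),
    Check("frontend_cpu_percent", 40.0, 90.0, actionable=True),
    Check("cache_hit_ratio_drop", 12.0, 10.0, actionable=False),
]
for c in checks:
    print(c.name, c.value, "->", route(c))
```

Keeping the `actionable` flag explicit forces the question "what would the person paged do?" to be answered when the check is defined rather than at 3 a.m.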
  • 35. Tracking Incidents - MTTD Data Points to source these metrics from: Output from Application, CLOG, Puppet, Jabber, Jira, healthcheck, hardware, Eluna, Nagios....all collectible data
  • 36. Handling Incident Response - MTTR
      Detect a Problem
        • Communicate to Support/Community/Executives
      Begin to take Action
        • Communicate to Support/Community/Executives
      Coordinate Troubleshooting/Diagnosis
        • Communicate to Support/Community/Executives
      Confirm Stability, Resolving Steps
        • Communicate to Support/Community/Executives
  • 37. Handling Incident Response - MTTR
      Rule # 24: Assign people to be point people for every bit of technology.
      Rule # 25: Assign Backup People to those People.
      Rule # 12: Know your bottlenecks, and how to spot them.
      Rule # 42: Create gigantic poster size drawings of the physical layouts of your data center.
      Rule # 43: Create gigantic poster size drawings of the logical flows of each part of your product.
  • 38. XKCD #974: I find that when someone is taking time to do something right in the present, they're a perfectionist with no ability to prioritize, whereas when someone took time to do something right in the past, they're a master artisan of great foresight.