AA261
DevOps lessons in collaborative maintenance
Lindsay Holmwood
@auxesis
Software Manager @ Bulletproof Networks
Trigger warning: death
January 31, 2000




Puerto Vallarta
Seattle
Departed PVR at 13.37 PST
Ascended to 31,000ft
2 hours into flight:
Jammed horizontal stabiliser
No trim control
Redirected to LAX
Pilots unjammed
horizontal stabilisers
2 pilots
3 crew
83 passengers
This is a maintenance accident. Alaska
Airlines' maintenance and inspection of its
horizontal stabilizer activation system was
poorly conceived and woefully executed. The
failure was compounded by poor oversight...
had any of the managers, mechanics,
inspectors, supervisors or FAA overseers
whose job it was to protect this mechanism
done their job conscientiously, this accident
cannot happen.
                  -- John J. Goglia, NTSB Board Member
hindsight != foresight
[hindsight] converts a once
vague, unlikely future into an
   immediate, certain past
                   -- Sidney Dekker
This is a maintenance accident. Alaska
Airlines' maintenance and inspection of its
horizontal stabilizer activation system was
poorly conceived and woefully executed. The
failure was compounded by poor oversight...
had any of the managers, mechanics,
inspectors, supervisors or FAA overseers
whose job it was to protect this mechanism
done their job conscientiously, this accident
cannot happen.
                  -- John J. Goglia, NTSB Board Member
“poorly conceived and woefully executed”
DC-9 -> MD-80 -> MD-83
Evolutionary
product development
Appropriated
maintenance schedules
Jackscrew
lubrication interval
Year   Interval                      Event
1965   every 300-350 hours           launch of DC-9
1985   every 700 hours               industry deregulation
1987   every 1000 hours              industry standardisation
1991   every 1200 hours              industry standardisation
1994   every 1600 hours              industry standardisation
1996   every 8 months (2550 hours)   Alaska Airlines policy change
Decrementalism
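A rough way to see the drift in the lubrication-interval table is to compare each change with the one before it. The sketch below uses only the figures from the table; taking the upper bound (350 hours) of the 1965 range is an assumption made here, and the function names are hypothetical.

```python
# Jackscrew lubrication intervals from the table, in flight hours.
# Assumption: the 1965 entry is a range (300-350 hours); the upper bound
# is used, so the growth figures below are on the conservative side.
intervals = [
    (1965, 350),
    (1985, 700),
    (1987, 1000),
    (1991, 1200),
    (1994, 1600),
    (1996, 2550),
]


def step_ratios(rows):
    """Return (year, hours, ratio to the previous interval) for each change."""
    out = []
    for (_, prev_hours), (year, hours) in zip(rows, rows[1:]):
        out.append((year, hours, hours / prev_hours))
    return out


def cumulative_ratio(rows):
    """Return the last interval divided by the first one."""
    return rows[-1][1] / rows[0][1]


for year, hours, ratio in step_ratios(intervals):
    print(f"{year}: {hours} h ({ratio:.2f}x the previous interval)")
print(f"overall: {cumulative_ratio(intervals):.1f}x the 1965 interval")
```

No single change more than doubles the previous interval, yet the 1996 figure is roughly seven times the 1965 one.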
Complex system constraints
     Jens Rasmussen
[Diagram: three boundaries labelled workload, economy, safety]
[Diagram: three boundaries labelled time, cost, quality]
[Diagram: the workload / economy / safety boundaries, repeated across several slides]
[Diagram: the workload / economy / safety boundaries]
outside: failure of foresight
inside: trade-offs in direction of greater efficiency
trade-offs
 in direction of
greater efficiency
Constraints on knowledge
Why would they make bad
decisions intentionally?
Decisions seemed rational
Local rationalisation
“people make what they
  consider to be the best
decision based on available
 knowledge at the time”
This is a maintenance accident. Alaska
Airlines' maintenance and inspection of its
horizontal stabilizer activation system was
poorly conceived and woefully executed. The
failure was compounded by poor oversight...
had any of the managers, mechanics,
inspectors, supervisors or FAA overseers
whose job it was to protect this mechanism
done their job conscientiously, this accident
cannot happen.
                  -- John J. Goglia, NTSB Board Member
[Diagram: the workload / economy / safety boundaries]
[Diagram: the time / cost / quality boundaries]
DevOps constraints
“God, our ops team are arseholes. I just want
to deploy this change and go home!”
[Diagram: the workload / economy / safety boundaries, shown first as one set and then as two sets side by side]
What are the circumstances?
Where are the tensions?
Have ops been burnt before?
Is there deployment friction?
            Why?
Is deployment high-risk?
Is deployment time consuming?
Is deployment important
    to the business?
“It’s 3am and the pager has gone off again. Why
can’t these devs just write code that works?”
[Diagram: the workload / economy / safety boundaries, shown first as one set and then as two sets side by side]
[hindsight] converts a once
vague, unlikely future into an
   immediate, certain past
                   -- Sidney Dekker
What are the circumstances?
Where are the tensions?
Why didn’t the dev know the
 code would fail like this?
Why weren’t you involved
when the code was written?
How is code reviewed?
Is the infrastructure anti-fragile?
Is the code anti-fragile?
Hindsight bias
[hindsight] converts a once
vague, unlikely future into an
   immediate, certain past
                   -- Sidney Dekker
What are the motivations?
“amoral actors”
[Diagram: the workload / economy / safety boundaries]
“root cause” is simply the
point you stop looking
                    -- Sidney Dekker
What are the circumstances?
Where are the tensions?
Thank you!
Liked the talk? Let @auxesis know!
Sidney Dekker [books]
The Field Guide to Understanding Human Error
Drift Into Failure
Just Culture

Dan Manges [blog]
How incidents affect infrastructure priorities
