Limiting Damage During Chaos Experiments
Nils Meder | Computer Scientist @ Adobe
© 2016 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
When running chaos experiments, how can we avoid damage, and how can we limit experiments to specific parts of the infrastructure or application?

Published in: Engineering

  1. Limiting Damage During Chaos Experiments
     Nils Meder | Computer Scientist @ Adobe
  2. Agenda
     • Doing Chaos In Your Production System
     • Building A Context Around Your Experiments
     • Protect Your Infrastructure
     • Example: Kill Random Instances
     • Protect Your Application
     • Resilience Patterns
     • Wrap-Up & Discussion
  3. Doing Chaos In Your Production System
     • Testing in Production is The Ultimate Goal
     • But It is Not The First Step
     • There are Always Differences Between Staging and Production
       • Scale, Networking, Datasets, …
     • Start In The Staging Environment
     • Make Sure An Experiment Doesn’t Bring Down The Whole Service
     • “Know Your Enemy” - Have A Clear View of Your Environment
     • Iterate Over Your Experiments
     • Be Brave - Having Just Some Basic Tests Running in Production is Better Than None
  4. Building A Context Around Your Experiments
     • Chaos Testing is Not Just Pulling The Plug
     • Focus On Business Critical Scenarios/Components First
     • Have A Clear Goal, e.g. What Happens When The Network Fails?
     • Focus - Run One Experiment At a Time
     • Monitor Your Experiments
     • Define Fallbacks And Defaults
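
The points on this slide (a clear goal, one experiment at a time, monitoring, a defined fallback) can be sketched as a small experiment runner. This is a minimal illustration, not the speaker's tooling: `Experiment`, `run_experiment`, and the callback names are hypothetical, and the health check stands in for whatever monitoring is actually in place.

```python
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    goal: str                        # e.g. "What happens when the network fails?"
    inject: Callable[[], None]       # hypothetical: starts the failure
    rollback: Callable[[], None]     # hypothetical: restores the default state
    is_healthy: Callable[[], bool]   # hypothetical: monitoring / steady-state check


# A single process-wide lock enforces "one experiment at a time".
_running = threading.Lock()


def run_experiment(exp: Experiment) -> str:
    if not _running.acquire(blocking=False):
        raise RuntimeError("another experiment is already running")
    try:
        # Do not add chaos to a system that is already unhealthy.
        if not exp.is_healthy():
            return "skipped: system unhealthy before start"
        try:
            exp.inject()
            return "passed" if exp.is_healthy() else "failed"
        finally:
            # Roll back once injection was attempted, even if it raised.
            exp.rollback()
    finally:
        _running.release()
```

Keeping the rollback in a `finally` block means the fallback runs whether the experiment passes, fails, or crashes halfway through.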
  5. Protect Your Infrastructure
     • Target Infrastructure Components
     • Think About Recovery
     • Take Snapshots
     • Limit The Damage To Single Instances
     • Limit The Damage To Groups of Instances
       • Of The Same Kind
       • Within The Same Workflow
     • Limit Percentage Of Impact
     • Limit What Chaos Tests Are Allowed To Do
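
One way to read "limit the damage to groups of instances" and "limit percentage of impact" is as a target-selection step that runs before any failure is injected. The sketch below assumes instances carry a group label; `Instance`, `pick_targets`, and `max_fraction` are hypothetical names introduced for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    group: str  # hypothetical label, e.g. "web" or "worker"


def pick_targets(instances, group, max_fraction, rng=None):
    """Pick random targets from one group, capped at max_fraction of that group."""
    if not 0 <= max_fraction <= 1:
        raise ValueError("max_fraction must be between 0 and 1")
    rng = rng or random.Random()
    # Only instances of the same kind are eligible.
    candidates = [i for i in instances if i.group == group]
    # Round down so the cap is never exceeded, and always leave one survivor.
    limit = min(math.floor(len(candidates) * max_fraction),
                max(len(candidates) - 1, 0))
    return rng.sample(candidates, limit)
```

Rounding down and keeping at least one survivor are deliberately conservative choices for this sketch; a real setup might tie the cap to the autoscaling group's minimum size instead.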
  6. Example: Kill Random Instances
     • Terminate Random EC2 Instances
     • Focus:
       • What Happens If A Number Of My Servers Die?
       • Does Autoscaling Work?
       • Is The Web API Still Serving Requests?
     • The Test is Only Allowed To Terminate Instances
     • Simulate The Experiment Beforehand
     • Take An Environment Snapshot
     • Run The Test
     [Diagram: Chaos Test, Client, App1, App2, App3, Appx]
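
The sequence on this slide (simulate first, take a snapshot, then run a test that may only terminate instances) could look roughly like the following. It is a sketch under stated assumptions: `ChaosTest` is a hypothetical class, and the `terminate` and `snapshot` callables are injected placeholders rather than real cloud API calls, so the example runs without any AWS access.

```python
import random


class ChaosTest:
    """A chaos test whose only permitted action is terminating instances."""

    def __init__(self, terminate, snapshot=None, dry_run=True, rng=None):
        self._terminate = terminate   # injected callable: instance_id -> None
        self._snapshot = snapshot     # optional callable run before a real test
        self.dry_run = dry_run        # simulate by default
        self._rng = rng or random.Random()
        self.log = []                 # planned or executed actions

    def run(self, instance_ids, count):
        instance_ids = list(instance_ids)
        if count >= len(instance_ids):
            raise ValueError("refusing to terminate every instance")
        victims = self._rng.sample(instance_ids, count)
        if not self.dry_run and self._snapshot is not None:
            self._snapshot()          # capture the environment before real damage
        for instance_id in victims:
            if self.dry_run:
                self.log.append(("would-terminate", instance_id))
            else:
                self._terminate(instance_id)
                self.log.append(("terminated", instance_id))
        return victims
```

A first pass with `dry_run=True` only records what would be terminated; the same call with `dry_run=False` takes the snapshot and then invokes the injected terminate function for each selected instance.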
  7. Protect Your Application
     • Plan For Chaos in Your Application
     • Fail Fast, But Keep The Streams Flowing
     • Build Your Application Isolated
     • Apply Loose Coupling
     • Introduce Latency Control
     • Real-Time Data and Diagnostics
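
"Fail fast" and "latency control" are often implemented as a bounded wait with a default answer, so a slow dependency cannot stall the caller. The following is one possible sketch using only the Python standard library; `call_with_timeout` and the pool size are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# A small, bounded pool also limits how many slow calls can pile up.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s, fallback):
    """Run func in the pool; return fallback if it is too slow or fails."""
    future = _pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Best effort only: a call that is already running keeps its worker
        # thread until it finishes on its own.
        future.cancel()
        return fallback
    except Exception:
        # Any other failure also degrades to the default answer.
        return fallback
```

The caller gets a response within roughly `timeout_s` seconds either way, which keeps requests flowing while the dependency is degraded.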
  8. Resilience Patterns
     • Bulkheads
     • Building Failure Units
     • Protect App Against Cross-Failures
     • Event-Driven & Stateless
     • Embrace Loose Coupling
     • Circuit Breaker
     • Timeouts
     • Fallbacks
     • Healthchecks
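
Of the patterns listed, the circuit breaker is perhaps the easiest to show in a few lines. The version below is a simplified, single-threaded sketch, not the design from any particular library: `CircuitBreaker`, `CircuitOpenError`, and the thresholds are hypothetical, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Open: fail fast without touching the dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Half-open: the wait is over, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Callers would typically catch `CircuitOpenError` and return a fallback value, which combines this pattern with the "Fallbacks" item on the same slide.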
  9. Resilience Patterns
     • “Release It!” - Michael Nygard
     • More On Resilience Patterns, Anti-Patterns and Case Studies
     • ISBN-13: 978-0978739218
  10. Wrap-Up & Discussion
     • Expect The Unexpected
     • Failures Are The Normal Case & Not Predictable
     • Do Not Try To Avoid Failures. Embrace Them.
     • Chaos Engineering Helps To Discover Weak Points
     • Apply Resilience Patterns
  11. References
     • Resilience Patterns: http://de.slideshare.net/ufried/patterns-of-resilience
     • Bulkheads: http://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html
     • Making APIs More Resilient: http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html
     • “Release It!” - Michael Nygard
