
stackconf 2023 | Bringing Order to Chaos: Make Your Systems More Resilient with Chaos Engineering by Sayan Mondal.pdf

NETWAYS

Chaos Engineering is a relatively new approach that helps identify and address weaknesses in software systems by intentionally introducing controlled failures. This talk covers the principles and practices of chaos engineering, using real-world examples to show how it has improved resiliency and performance and saved costs. You'll learn how to design and execute chaos experiments, interpret the results, and implement chaos engineering within your organization. The goal is to create highly resilient systems that can withstand the challenges of today's fast-paced digital landscape.

  • 1. Confidential / © Harness Inc. 2020 Bringing order to Chaos: Make your systems more resilient with Chaos Engineering
  • 2. Sayan Mondal (@s_ayanide, @s-ayanide) ● Senior Software Engineer II at Harness ● Maintainer of LitmusChaos (CNCF incubating project) for 2 years, contributing for 3.5 years ● Volunteer and mentor at the Linux Foundation ● Chaos Carnival organizing team member
  • 3. What causes downtime? Application failures, infrastructure failures and operational failures. Their impact: ● Reputational impact: Slack's outages ● Financial impact: est. >$55M in losses to WF ● Poor user experience: travel plans of 75,000+ passengers impacted
  • 4. The cloud native problem: the proliferation of applications into microservices leads to a RELIABILITY challenge. In cloud native, your code depends on hundreds of other microservices and runs on many platforms, so the chance of being hit by a failure in a dependent component is huge. The layers your code depends on: (1) your application, (2) your application's dependencies (MongoDB, Kafka, TiKV, Vitess, Postgres, etc.), (3) cloud native services (CoreDNS, Envoy, Prometheus, OpenEBS, etc.), (4) Kubernetes services, (5) platform services and infrastructure.
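To put a rough number on slide 4's point, the sketch below computes the availability of a request path that needs every dependency to be up at once. It assumes independent failures and the same availability for every component, which real systems rarely satisfy; `serial_availability` and `expected_downtime_hours_per_year` are illustrative helper names, not part of any library. Even at 99.9% per component, 100 serial dependencies leave only about 90.5% overall availability.

```python
# Availability of a request path that needs ALL of its dependencies.
# Assumption (illustrative only): failures are independent and every
# component has the same availability.


def serial_availability(per_component: float, n_components: int) -> float:
    """Probability that all n independent components are up at the same time."""
    if not 0.0 <= per_component <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if n_components < 0:
        raise ValueError("component count must be non-negative")
    return per_component ** n_components


def expected_downtime_hours_per_year(availability: float) -> float:
    """Convert an availability fraction into expected downtime per year."""
    return (1.0 - availability) * 365 * 24


for n in (1, 10, 100):
    a = serial_availability(0.999, n)
    print(f"{n:>3} deps at 99.9% each -> {a:.4f} overall, "
          f"~{expected_downtime_hours_per_year(a):.0f} h/year down")
```

The product form is why adding dependencies hurts so quickly: every extra component multiplies in another factor below 1.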
  • 5. Problems with existing solutions ● Reactive approach: no proactive investment in failure testing; generally driven by root cause analysis ● Not collaborative: driven by Ops ● Not automated: not integrated into CI/CD or GameDays
  • 6. The solution? Chaos Engineering ● Chaos engineering is collaborative: SREs + developers run collaborative chaos experiments in a centralized control plane, and experiments live in Git just like code ● Optimize initial investment: reduce the inertia for starting chaos ● Robust experiments: public and private ChaosHubs with ready-to-use experiments ● Find weaknesses during the build/test phase: verifying at the dev stage saves money ● Integrate into CI/CD systems: roll out automated and controlled chaos experiments across prod/non-prod environments ● Measure the impact of inducing chaos: build confidence by starting small ● Observability for chaos: chaos metrics are used to assess impact and manage SLOs/errors
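Slide 6 mentions using chaos metrics to manage SLOs. One common way to do that is an error budget: a 99.9% availability SLO over 30 days allows (1 - 0.999) × 30 × 24 × 60 = 43.2 minutes of unavailability. The sketch below computes the budget, how much of it remains after some observed bad minutes, and whether it still seems safe to start another experiment, in the spirit of "start small". The function names are hypothetical, and the 25% floor in `safe_to_run_chaos` is an arbitrary illustrative policy, not something from the talk.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows within the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget left after `bad_minutes` of unavailability.

    The result goes negative once the budget is overspent.
    """
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget


def safe_to_run_chaos(remaining: float, min_fraction: float = 0.25) -> bool:
    """Hypothetical policy: only schedule new experiments while enough budget is left."""
    return remaining >= min_fraction


# Example: 10.8 bad minutes so far against a 99.9% SLO over 30 days.
left = budget_remaining(0.999, 10.8)
print(f"{left:.0%} of the error budget remains; run chaos: {safe_to_run_chaos(left)}")
```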
  • 7. How to do Chaos Engineering? Agenda: ● Why Resilience? ● Achieving Resilience ● Litmus 101 ● Litmus Experiments ● Contributing
  • 8. Project LitmusChaos, a CNCF incubating project ● Litmus is an open source platform for practicing chaos engineering in a cloud native way ● Started in 2017; 4+ years of active development ● 350K+ Litmus installations; 30x growth in per-day installations in the last 3 quarters (1,500 installations per day) ● 50+ chaos experiments, 100+ contributors ● Stable platform: 2.0 released; 50+ enterprises using 2.0 ● Litmus is adopted by
  • 9. What's exciting about LitmusChaos? ● Chaos Workflow editor ● ChaosCenter User Management & Teaming ● ChaosAgent ● Custom Image Registry ● GitOps ● ChaosCenter Monitoring and Observability ● Resilience Score calculation ● ChaosHub ● Scheduling ● More control over chaos results ● New & enhanced experiments ● Support for CRI-O, containerd and Docker
  • 10. Architecture Overview of LitmusChaos 2.0
  • 11. Chaos Workflow Deep Dive
  • 12. Core Components of LitmusChaos ● ChaosCenter: the single source of truth for controlling all the chaos activities happening in Litmus. From the ChaosCenter you can manage every part of Litmus and shape your workflows exactly the way you want. ● ChaosAgent: a ChaosAgent in Litmus is simply the target cluster where chaos is injected via Litmus. At least one ChaosAgent must always be connected to the ChaosCenter, and any connected ChaosAgent can be chosen as the target agent for chaos injection.
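The one-control-plane, many-agents model on slide 12 can be sketched as a small registry. This is a toy model for illustration only: the `ChaosCenter` and `ChaosAgent` classes and the agent names below are hypothetical and do not mirror Litmus's actual code or API. It captures the two rules stated on the slide: at least one agent must be connected, and any connected agent can be picked as the injection target.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChaosAgent:
    """A target cluster that can receive chaos (hypothetical model)."""
    name: str
    connected: bool = True


@dataclass
class ChaosCenter:
    """Control plane that tracks agents and picks injection targets."""
    agents: Dict[str, ChaosAgent] = field(default_factory=dict)

    def register(self, agent: ChaosAgent) -> None:
        self.agents[agent.name] = agent

    def connected_agents(self) -> List[ChaosAgent]:
        return [a for a in self.agents.values() if a.connected]

    def select_target(self, name: str) -> ChaosAgent:
        # Chaos can only be scheduled if at least one agent is connected...
        if not self.connected_agents():
            raise RuntimeError("no ChaosAgent is connected to the ChaosCenter")
        # ...and the chosen target must be one of those connected agents.
        agent = self.agents.get(name)
        if agent is None or not agent.connected:
            raise LookupError(f"agent {name!r} is not connected")
        return agent


center = ChaosCenter()
center.register(ChaosAgent("prod-us"))
center.register(ChaosAgent("staging-eu", connected=False))
print(center.select_target("prod-us").name)
```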
  • 13. Variety of faults offered in LitmusChaos ● Pod Chaos: Pod Failure, Container Kill, Pod Autoscale ● Node Chaos: Node Drain, Forced Eviction (Node Taints), Node Restart/PowerOff ● Network Chaos: Network Latency, Packet Loss, Network Corruption, Duplication ● Stress Chaos: Pod/Node CPU Hog, Pod/Node Memory Hog, Pod/Node Disk Stress, Pod Ephemeral Storage Fill ● Cloud Services: AWS EKS EC2 Termination, AWS EBS Disk Detach, GCP GPD Disk Detach ● Application Chaos: Kafka Leader Broker Failure, Cassandra Ring Disruption, OpenEBS Control Plane / Volume Failure
  • 14. The Features: let's take a look at the core features on offer
  • 15. ChaosCenter ● Chaos Workflows: automate dependency setup, create complex chaos scenarios, and define load/validation jobs alongside chaos injection ● Multiple options: create workflows from templates, build custom workflows from scratch (using ChaosHubs), or start from pre-created YAMLs ● Control chaos experiments: sequence control (create parallel as well as sequential steps) ● Schedules: create either single-run or cron workflows as schedules ● Experiment priority: attach a priority to chaos experiments based on your use cases
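Slide 15's sequence control (parallel as well as sequential steps) can be illustrated with stages: stages run one after another, and the steps inside one stage run concurrently. This resembles the list-of-lists `steps` layout in Argo Workflows, which Litmus 2.0 chaos workflows are based on, but the Python below is only a local sketch: `run_workflow` and the step names are hypothetical, and each step is a plain function that returns True when it passes. Stopping after the first failing stage is a design choice made here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A step is any callable that returns True if it passed (hypothetical convention).
Step = Callable[[], bool]


def run_workflow(stages: List[Dict[str, Step]]) -> Dict[str, bool]:
    """Run stages sequentially; steps within one stage run in parallel.

    Execution stops after the first stage containing a failed step.
    Exceptions raised by a step propagate to the caller.
    """
    results: Dict[str, bool] = {}
    for stage in stages:
        with ThreadPoolExecutor(max_workers=max(1, len(stage))) as pool:
            futures = {name: pool.submit(step) for name, step in stage.items()}
            stage_results = {name: fut.result() for name, fut in futures.items()}
        results.update(stage_results)
        if not all(stage_results.values()):
            break
    return results


workflow = [
    {"install-app": lambda: True},               # dependency setup
    {"pod-delete": lambda: True,                 # two faults in parallel
     "network-latency": lambda: False},
    {"cleanup": lambda: True},                   # skipped: previous stage failed
]
print(run_workflow(workflow))
```

A real workflow would usually still run a cleanup step after a failure; that is left out to keep the sketch short.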
  • 16. Workflow Management ● GitOps: roll out automated changes using GitOps ● Custom image: add images from a custom image server (public or private) ● Resilience Score: measure and analyse the resilience score of each workflow
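For the resilience score on slide 16, the Litmus 2.x documentation describes a weighted average: each experiment's probe success percentage is multiplied by the experiment's weight (its priority), and the sum is divided by the total weight. The sketch below assumes that definition; check the documentation for the Litmus version you run, and note that `resilience_score` is a hypothetical helper, not a Litmus API.

```python
from typing import Iterable, Tuple


def resilience_score(results: Iterable[Tuple[float, float]]) -> float:
    """Weighted average of probe success percentages.

    `results` holds (experiment_weight, probe_success_percentage) pairs,
    with percentages in the range 0-100.
    """
    pairs = list(results)  # materialize so the pairs can be iterated twice
    total_weight = sum(weight for weight, _ in pairs)
    if total_weight <= 0:
        raise ValueError("at least one experiment needs a positive weight")
    return sum(weight * pct for weight, pct in pairs) / total_weight


# Hypothetical run: a high-priority experiment fully passed,
# a medium-priority one half passed, and a low-priority one failed.
print(resilience_score([(10, 100), (5, 50), (1, 0)]))  # 78.125
```

Because weights multiply the percentages, a failure in a high-priority experiment pulls the score down much more than the same failure in a low-priority one.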
  • 17. Multi-Tenancy ● Scope support: set up the control plane and agents, and execute chaos experiments, in either cluster-scoped or namespace-scoped mode ● Authentication: a smooth onboarding process; choose between email/password auth or OAuth with Google or GitHub for your teams ● Create teams: build a team of multiple users and manage projects ● Fine-grained RBAC: flexible RBAC to drill down and grant the correct privileges to users
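Slide 17's fine-grained RBAC boils down to checking a user's role in a project before allowing an action. Litmus 2.x ChaosCenter projects use roles along the lines of owner, editor and viewer, but the permission sets and action names below are hypothetical, chosen only to show the shape of a deny-by-default check.

```python
from enum import Enum
from typing import Dict, FrozenSet, Tuple


class Role(Enum):
    OWNER = "owner"
    EDITOR = "editor"
    VIEWER = "viewer"


# Hypothetical permission sets: each role extends the one below it.
_VIEW = frozenset({"view_workflows"})
_EDIT = _VIEW | {"create_workflow", "run_workflow"}
_OWN = _EDIT | {"invite_member", "manage_agents"}
PERMISSIONS: Dict[Role, FrozenSet[str]] = {
    Role.VIEWER: _VIEW,
    Role.EDITOR: _EDIT,
    Role.OWNER: _OWN,
}

# (user, project) -> role; a user can hold different roles in different projects.
Memberships = Dict[Tuple[str, str], Role]


def is_allowed(memberships: Memberships, user: str, project: str, action: str) -> bool:
    """Deny by default: users with no role in the project get no permissions."""
    role = memberships.get((user, project))
    return role is not None and action in PERMISSIONS[role]


members: Memberships = {
    ("alice", "payments"): Role.OWNER,
    ("bob", "payments"): Role.VIEWER,
}
print(is_allowed(members, "bob", "payments", "run_workflow"))
```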
  • 18. Monitoring & Observability ● Connect a data source: connect a data source (from any agent) and monitor workflows ● Visualization: visualize workflow run statistics and aggregated schedules ● Comparison: compare two or more workflows ● Upload dashboards: upload shared/downloadable dashboards available in the community ● Tune dashboards: edit queries and tune dashboards, or create a custom one from scratch ● Monitor in real time: monitor the effect of chaos in real time with interleaved events and metrics from a Prometheus data source
  • 19. GitOps for Chaos ● Git-based SCM: integrates with Git-based SCM to provide a single source of truth for chaos artifacts (workflows); changes are synchronized bi-directionally between the Git source and the ChaosCenter, so the latest artifact is pulled for execution ● Tracking: an event-tracker microservice automatically launches "subscribed" chaos workflows upon app upgrades effected by GitOps tools like Argo CD and Flux
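The event tracker on slide 19 launches subscribed chaos workflows when a GitOps tool upgrades an application. A minimal sketch of that subscription logic follows, assuming an upgrade event is just the app's namespace, name, and old and new image tags. `EventTracker`, `AppUpgrade` and the workflow names are hypothetical and do not reflect how the Litmus component is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AppUpgrade:
    """A change applied by a GitOps tool such as Argo CD or Flux (simplified)."""
    namespace: str
    app: str
    old_image: str
    new_image: str


class EventTracker:
    """Maps applications to the chaos workflows subscribed to their upgrades."""

    def __init__(self) -> None:
        self._subscriptions: Dict[Tuple[str, str], List[str]] = {}

    def subscribe(self, namespace: str, app: str, workflow: str) -> None:
        subs = self._subscriptions.setdefault((namespace, app), [])
        if workflow not in subs:  # avoid launching the same workflow twice
            subs.append(workflow)

    def on_upgrade(self, event: AppUpgrade) -> List[str]:
        """Return the workflows to launch for this event (none if nothing changed)."""
        if event.old_image == event.new_image:
            return []
        return list(self._subscriptions.get((event.namespace, event.app), []))


tracker = EventTracker()
tracker.subscribe("shop", "checkout", "checkout-pod-delete")
tracker.subscribe("shop", "checkout", "checkout-network-latency")
event = AppUpgrade("shop", "checkout", "checkout:1.4.0", "checkout:1.5.0")
print(tracker.on_upgrade(event))
```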
  • 20. Non-Kubernetes Chaos ● Chaos on infra: inject chaos on infrastructure resources such as VMs/instances and disks (AWS, GCP, Azure, VMware) ● Attack bare metal: chaos experiments that bring down bare-metal nodes providing IPMI-based out-of-band access ● Chaos on machines: Litmus has developed m-agent, a platform-generic daemon agent for orchestrating chaos on any computing node
  • 21. Hands-On Demo Time
  • 22. Install the components
  • 23. Pick the faults
  • 24. Inject chaos into your application
  • 25. Observe the impact
  • 26. Multi-Cloud Support with LitmusChaos 2.0: seamless support for cross-cloud connectivity and interactions. Target your applications running on your preferred cloud provider with litmusctl.
  • 27. Future Roadmap: what's ahead for us? ● Increased support for chaos against non-Kubernetes infrastructure components ● More application-specific chaos experiments with native faults and health checks ● Improved Chaos SDK for creating user-defined experiments ● Additional probe types for diverse steady-state hypothesis validation ● Improved observability for chaos experiments ● More community-supported chaos types
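The roadmap on slide 27 mentions probe types for validating a steady-state hypothesis. The idea behind a probe can be sketched generically: run a check several times while chaos is active, compute the success percentage, and compare it with a threshold. `run_probe`, `steady_state_holds` and `http_ok` are hypothetical helpers written for this illustration, not Litmus's probe API, and the 95% default threshold is an assumption you would tune per service.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 2.0) -> bool:
    """Example check: the endpoint answers with HTTP 200 within the timeout.

    urlopen raises for 4xx/5xx responses, which run_probe counts as failures.
    """
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.status == 200


def run_probe(check: Callable[[], bool], attempts: int = 5, interval_s: float = 1.0) -> float:
    """Run `check` repeatedly and return the percentage of passing attempts.

    Any exception (timeout, connection refused, ...) counts as a failed attempt.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    passed = 0
    for i in range(attempts):
        try:
            if check():
                passed += 1
        except Exception:
            pass  # a crashing check is a failed check
        if i < attempts - 1:
            time.sleep(interval_s)
    return 100.0 * passed / attempts


def steady_state_holds(success_pct: float, threshold_pct: float = 95.0) -> bool:
    """The hypothesis holds if enough attempts passed."""
    return success_pct >= threshold_pct


# Usage against a hypothetical service endpoint:
# pct = run_probe(lambda: http_ok("http://checkout.shop.svc:8080/healthz"))
# print(pct, steady_state_holds(pct))
```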
  • 28. How to Contribute? We welcome contributions of all kinds: ● Development of features, bug fixes, and other improvements ● Documentation, including reference material and examples ● Bug and feature reports. You can choose from a list of dependent repos to contribute to; a few highlighted repos that Litmus uses are: ● Chaos-charts ● Chaos-workflows ● Test-tools ● Litmus UI ● Litmus-go ● website-litmuschaos
  • 29. Conclusion ● Chaos Engineering ● A tool called LitmusChaos ● Architecture principles ● Core components of chaos induction ● Demo
  • 30. Thank You ● Follow Litmus on: /LitmusChaos, /litmuschaos ● Contact me on: @s_ayanide, /s-ayanide