SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Downloaden Sie, um offline zu lesen
STRICTLY CONFIDENTIAL
Application Resilience: Challenges & Good Practice
City Business Club - ISITC Europe | July 2019
Dr Aled Sage
aled@cloudsoft.io
© Cloudsoft Corporation 2019
About Aled Sage
VP Engineering @ Cloudsoft
PhD and 20 years experience with
distributed systems
Expertise in Cloud, Devops and Automation
Across different verticals, including
financial services
@aledsage
aled@cloudsoft.io
linkedin.com/in/aledsage
github.com/aledsage© Cloudsoft Corporation 2019 2
© Cloudsoft Corporation 2019
1. What is resiliency?
2. Changing landscape
3. Case study
4. Problems & solutions
Agenda
© Cloudsoft Corporation 2019 3
© Cloudsoft Corporation 2019
• Business applications
• Service level objectives (SLOs)
Service level agreements (SLAs)
• Aim is not 100% availability
• “Availability” is not binary
• Design for failure
• Proactive and reactive
What is Resiliency
Google Site Reliability Engineers (SRE)
“Striving for Imperfection: Using an error budget
to move fast without compromising high
reliability”
CRM, payroll, analysis, trading systems,
settlement systems, …
Proactive: High availability (HA)
Avoid problems
Reactive: Know when something is wrong
Know how to respond
Repair
Disaster recovery (DR)© Cloudsoft Corporation 2019 4
© Cloudsoft Corporation 2019
Business Impact
© Cloudsoft Corporation 2019 5
Gartner
Why Business Leaders Don’t Care About the Cost of Downtime
9 April 2019
© Cloudsoft Corporation 2019
• Private and public cloud
• Kubernetes and Openshift (using containers)
• Bare-metal and mainframes
• Business wants agility
• Security and reliability still vital
• All while driving down costs
Changing Landscape
• Metro Bank, Monzo, Starling
• Facebook? Amazon?
© Cloudsoft Corporation 2019 6
© Cloudsoft Corporation 2019
Comparison with Disruptors
Incumbents Disruptors
Hybrid environment Everything is cloud-native
Chaos engineering?
“No way!”
Kill servers in production
Wall Street bank: decrease VM
provisioning to 9 hours
New container in seconds;
New Virtual Machines in minute(s)
3-6 month releases Small changes
several times a day
Improve the basics:
● Improve resiliency and
agility of existing apps
● Fit with old-world and the
future
© Cloudsoft Corporation 2019
100s of applications
Many lines of business
Small product set
Legacy high-value apps No legacy apps
7
© Cloudsoft Corporation 2019
APPLICATION-CENTRIC APPROACH
✓ Empower and involve application architects and developers
✓ Codify best practices and use of the bank’s strategic tools
✓ Consistent way to automate resiliency
Self-healing applications!
Your application can automatically detect and fix problems
Case-study: Tier-1 Investment Bank
ACHIEVEMENTS UNLOCKED
✓ Application-level Resiliency at scale
✓ Meets business & regulatory needs
✓ Across multiple evolving platforms
© Cloudsoft Corporation 2019 8
© Cloudsoft Corporation 2019
Challenges
© Cloudsoft Corporation 2019 9
Wide variation in
techniques and quality;
bespoke processes.
Involves many systems.
Fiefdoms & politics.
Resiliency is a
cross-cutting concern.
Manual procedures and
tribal knowledge.
Responses to
degradation or failure
not automated.
Manual Bespoke
Operations
fragmented
Rarely tested
Sufficient to meet
auditors’ requirements.
But little practice or
variation in scenarios;
manual testing is
expensive.
© Cloudsoft Corporation 2019
Problems and Solutions
Problems Solutions
Tribal knowledge, rarely tested Gamedays
Processes, environments and applications are
complicated and ever-changing
Post mortems & continual improvement
High availability and DR needs understanding of
application, but infrastructure-focused
Application-centric approach
Bespoke / inconsistent approaches Empower the application architects;
Codify and share best practices.
© Cloudsoft Corporation 2019 10
© Cloudsoft Corporation 2019
Problems and Solutions
© Cloudsoft Corporation 2019
Problems and Solutions
Problems Solutions
Manual runbooks Automated recovery
Involves many systems Encourage APIs and automation;
“glue code” to codify their use;
focus on common application architectures.
Runbooks and CMDB out-of-date “Living model” of the application;
Deployment and in-life management codified in this model.
© Cloudsoft Corporation 2019 12
© Cloudsoft Corporation 2019
• Restart the web-servers (e.g. on an out-of-memory)
• Repave servers or clusters (e.g. when a restart fails, or an update is
required)
• Auto-scale based on latency, load or time-of-day
• Manage DR within a region and/or across regions
Automated Recovery
© Cloudsoft Corporation 2019 13
© Cloudsoft Corporation 2019
Self Managing Systems: Autonomics
© Cloudsoft Corporation 2019 14
Self configuring
• Automated configuration of components and systems follows
high-level policies. Rest of system adjusts automatically and
seamlessly
Self healing
• System automatically detects, diagnoses, and repairs software and
hardware problems
Self optimizing
• Components and systems continually seek opportunities to
improve their own performance and efficiency
Self protecting
• System automatically defends against malicious attacks or
cascading failures. It uses early warning to anticipate and prevent
system wide failures
© Cloudsoft Corporation 2019
Cloudsoft AMP
© Cloudsoft Corporation 2019 15
AMP streamlines development, operations and governance for any
application on any cloud
Rapidly Scalable
Autonomic policy-based
management
Cloud Enabling
Liberate applications
from infrastructure
Powerful Tools
Consistent in-life management
tooling and features
Improve Visibility
Unified management
view of applications
© Cloudsoft Corporation 2019
Suggested Next Steps
© Cloudsoft Corporation 2019
This week
❏ Get the Gartner report: “Why Business Leaders Don’t Care About the Cost of Downtime” free of charge
❏ Talk to person(s) in charge of application resiliency
❏ Example post mortem from the last major incident; how was this knowledge shared and used?
❏ How are application authors involved in planning for and implementing resiliency?
❏ Is there consistent monitoring and recovery strategies (focusing 80:20 rule of existing apps)?
❏ How often is it tested? Are there gamedays?
Next 90 days
❏ Identify example app(s)
❏ Get a review from an expert of an example workload (“well-architected review”)
❏ Organise gamedays
❏ Introduce post mortems
Next 6 months
❏ Consider automation software at the application level for ‘low-hanging fruit’
❏ Involve and empower your application authors
16
get report for free
cloudsoft.io/report
17
e: aled@cloudsoft.io
w: cloudsoft.io
Thank You!
Any Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

SG MVPA Workshop Booklet Fall 2015
SG MVPA Workshop Booklet Fall 2015SG MVPA Workshop Booklet Fall 2015
SG MVPA Workshop Booklet Fall 2015
Josh Russ
 

Was ist angesagt? (19)

Digital Transformation, Cloud Adoption and the Impact on SAM and Security
Digital Transformation, Cloud Adoption and the Impact on SAM and SecurityDigital Transformation, Cloud Adoption and the Impact on SAM and Security
Digital Transformation, Cloud Adoption and the Impact on SAM and Security
 
HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...
HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...
HP Software Performance Tour 2014 - Enterprise Agility in the age of Applicat...
 
Democratizing App Development in Insurance Industry
Democratizing App Development in Insurance IndustryDemocratizing App Development in Insurance Industry
Democratizing App Development in Insurance Industry
 
How to Increase Fintech Contact Center Productivity
How to Increase Fintech Contact Center ProductivityHow to Increase Fintech Contact Center Productivity
How to Increase Fintech Contact Center Productivity
 
When it comes to ADCs, Perception is not Reality - Radware
When it comes to ADCs, Perception is not Reality - RadwareWhen it comes to ADCs, Perception is not Reality - Radware
When it comes to ADCs, Perception is not Reality - Radware
 
Optimizing Technology Refresh
Optimizing Technology RefreshOptimizing Technology Refresh
Optimizing Technology Refresh
 
Kespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone Program
Kespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone ProgramKespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone Program
Kespry | 3 Keys to Lowering the Cost of Roof Inspections with a Drone Program
 
New Model for IT: Cloud Service Provider
New Model for IT: Cloud Service ProviderNew Model for IT: Cloud Service Provider
New Model for IT: Cloud Service Provider
 
FY11 PC Refresh Plan
FY11 PC Refresh PlanFY11 PC Refresh Plan
FY11 PC Refresh Plan
 
Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...
Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...
Higher Efficiency and IT Empowerment with VMware vSphere with Operations Mana...
 
Progress Pacific: Contemporary App Development
Progress Pacific: Contemporary App DevelopmentProgress Pacific: Contemporary App Development
Progress Pacific: Contemporary App Development
 
Why Observability is Key to Solving Business and Operational Challenges
Why Observability is Key to Solving Business and Operational ChallengesWhy Observability is Key to Solving Business and Operational Challenges
Why Observability is Key to Solving Business and Operational Challenges
 
Unified PPM & Agile
Unified PPM & AgileUnified PPM & Agile
Unified PPM & Agile
 
XebiaLabs Overview Slides
XebiaLabs Overview SlidesXebiaLabs Overview Slides
XebiaLabs Overview Slides
 
Cloud Accounting Process Infographic
Cloud Accounting Process InfographicCloud Accounting Process Infographic
Cloud Accounting Process Infographic
 
SECC_Software Testing Services
SECC_Software Testing ServicesSECC_Software Testing Services
SECC_Software Testing Services
 
SG MVPA Workshop Booklet Fall 2015
SG MVPA Workshop Booklet Fall 2015SG MVPA Workshop Booklet Fall 2015
SG MVPA Workshop Booklet Fall 2015
 
Technical Webinar: By the (Play) Book: The Agile Practice at OutSystems
Technical Webinar: By the (Play) Book: The Agile Practice at OutSystemsTechnical Webinar: By the (Play) Book: The Agile Practice at OutSystems
Technical Webinar: By the (Play) Book: The Agile Practice at OutSystems
 
CMMI services presentation -SECC
CMMI services presentation -SECCCMMI services presentation -SECC
CMMI services presentation -SECC
 

Ähnlich wie Application resilience: challenges and good practice

Presentation advanced management – the road ahead
Presentation   advanced management – the road aheadPresentation   advanced management – the road ahead
Presentation advanced management – the road ahead
solarisyourep
 

Ähnlich wie Application resilience: challenges and good practice (20)

Enterprise mobile strategy framework - 1st part
Enterprise mobile strategy framework  - 1st partEnterprise mobile strategy framework  - 1st part
Enterprise mobile strategy framework - 1st part
 
Security Across the Cloud Native Continuum with ESG and Palo Alto Networks
Security Across the Cloud Native Continuum with ESG and Palo Alto NetworksSecurity Across the Cloud Native Continuum with ESG and Palo Alto Networks
Security Across the Cloud Native Continuum with ESG and Palo Alto Networks
 
CloudOps with OpsRamp: From Discovery to Resolution
CloudOps with OpsRamp: From Discovery to ResolutionCloudOps with OpsRamp: From Discovery to Resolution
CloudOps with OpsRamp: From Discovery to Resolution
 
Five keys to successful cloud migration
Five keys to successful cloud migrationFive keys to successful cloud migration
Five keys to successful cloud migration
 
Accelerate Application Migration - August 5, 2020
Accelerate Application Migration - August 5, 2020Accelerate Application Migration - August 5, 2020
Accelerate Application Migration - August 5, 2020
 
AWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
AWS May Webinar Series - Industry Trends and Best Practices for Cloud AdoptionAWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
AWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
 
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
 
Industry Perspective: DevOps - What it Means for the Average Business
Industry Perspective: DevOps - What it Means for the Average BusinessIndustry Perspective: DevOps - What it Means for the Average Business
Industry Perspective: DevOps - What it Means for the Average Business
 
DevOps for Enterprise Systems : Innovate like a Startup
DevOps for Enterprise Systems : Innovate like a StartupDevOps for Enterprise Systems : Innovate like a Startup
DevOps for Enterprise Systems : Innovate like a Startup
 
Adopting Cloud Testing for Continuous Delivery
Adopting Cloud Testing for Continuous DeliveryAdopting Cloud Testing for Continuous Delivery
Adopting Cloud Testing for Continuous Delivery
 
Presentation advanced management – the road ahead
Presentation   advanced management – the road aheadPresentation   advanced management – the road ahead
Presentation advanced management – the road ahead
 
Presentation advanced management – the road ahead
Presentation   advanced management – the road aheadPresentation   advanced management – the road ahead
Presentation advanced management – the road ahead
 
La Digital Transformation ha un nuovo alleato: Value Stream Management
La Digital Transformation ha un nuovo alleato: Value Stream ManagementLa Digital Transformation ha un nuovo alleato: Value Stream Management
La Digital Transformation ha un nuovo alleato: Value Stream Management
 
Cloud computing in practice
Cloud computing in practiceCloud computing in practice
Cloud computing in practice
 
End to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOpsEnd to-End Monitoring for ITSM and DevOps
End to-End Monitoring for ITSM and DevOps
 
Enterprise Mobile Strategy Framework - I
Enterprise Mobile Strategy Framework - IEnterprise Mobile Strategy Framework - I
Enterprise Mobile Strategy Framework - I
 
Enterprise mobile strategy framework- I
Enterprise mobile strategy framework- IEnterprise mobile strategy framework- I
Enterprise mobile strategy framework- I
 
Under cloud cover: How leaders are accelerating competitive differentiation
Under cloud cover: How leaders are accelerating competitive differentiationUnder cloud cover: How leaders are accelerating competitive differentiation
Under cloud cover: How leaders are accelerating competitive differentiation
 
Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...
Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...
Cloud Expo Asia 20181010 - Bringing Your Applications into the Future with Ha...
 
Make A Stress Free Move To The Cloud: Application Modernization and Managemen...
Make A Stress Free Move To The Cloud: Application Modernization and Managemen...Make A Stress Free Move To The Cloud: Application Modernization and Managemen...
Make A Stress Free Move To The Cloud: Application Modernization and Managemen...
 

Kürzlich hochgeladen

%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 

Kürzlich hochgeladen (20)

%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 

Application resilience: challenges and good practice

  • 1. STRICTLY CONFIDENTIAL Application Resilience: Challenges & Good Practice City Business Club - ISITC Europe | July 2019 Dr Aled Sage aled@cloudsoft.io
  • 2. © Cloudsoft Corporation 2019 About Aled Sage VP Engineering @ Cloudsoft PhD and 20 years experience with distributed systems Expertise in Cloud, Devops and Automation Across different verticals, including financial services @aledsage aled@cloudsoft.io linkedin.com/in/aledsage github.com/aledsage© Cloudsoft Corporation 2019 2
  • 3. © Cloudsoft Corporation 2019 1. What is resiliency? 2. Changing landscape 3. Case study 4. Problems & solutions Agenda © Cloudsoft Corporation 2019 3
  • 4. © Cloudsoft Corporation 2019 • Business applications • Service level objectives (SLOs) Service level agreements (SLAs) • Aim is not 100% availability • “Availability” is not binary • Design for failure • Proactive and reactive What is Resiliency Google Site Reliability Engineers (SRE) “Striving for Imperfection: Using an error budget to move fast without compromising high reliability” CRM, payroll, analysis, trading systems, settlement systems, … Proactive: High availability (HA) Avoid problems Reactive: Know when something is wrong Know how to respond Repair Disaster recovery (DR)© Cloudsoft Corporation 2019 4
  • 5. © Cloudsoft Corporation 2019 Business Impact © Cloudsoft Corporation 2019 5 Gartner Why Business Leaders Don’t Care About the Cost of Downtime 9 April 2019
  • 6. © Cloudsoft Corporation 2019 • Private and public cloud • Kubernetes and Openshift (using containers) • Bare-metal and mainframes • Business wants agility • Security and reliability still vital • All while driving down costs Changing Landscape • Metro Bank, Monzo, Starling • Facebook? Amazon? © Cloudsoft Corporation 2019 6
  • 7. © Cloudsoft Corporation 2019 Comparison with Disruptors Incumbents Disruptors Hybrid environment Everything is cloud-native Chaos engineering? “No way!” Kill servers in production Wall Street bank: decrease VM provisioning to 9 hours New container in seconds; New Virtual Machines in minute(s) 3-6 month releases Small changes several times a day Improve the basics: ● Improve resiliency and agility of existing apps ● Fit with old-world and the future © Cloudsoft Corporation 2019 100s of applications Many lines of business Small product set Legacy high-value apps No legacy apps 7
  • 8. © Cloudsoft Corporation 2019 APPLICATION-CENTRIC APPROACH ✓ Empower and involve application architects and developers ✓ Codify best practices and use of the bank’s strategic tools ✓ Consistent way to automate resiliency Self-healing applications! Your application can automatically detect and fix problems Case-study: Tier-1 Investment Bank ACHIEVEMENTS UNLOCKED ✓ Application-level Resiliency at scale ✓ Meets business & regulatory needs ✓ Across multiple evolving platforms © Cloudsoft Corporation 2019 8
  • 9. © Cloudsoft Corporation 2019 Challenges © Cloudsoft Corporation 2019 9 Wide variation in techniques and quality; bespoke processes. Involves many systems. Fiefdoms & politics. Resiliency is a cross-cutting concern. Manual procedures and tribal knowledge. Responses to degradation or failure not automated. Manual Bespoke Operations fragmented Rarely tested Sufficient to meet auditors’ requirements. But little practice or variation in scenarios; manual testing is expensive.
  • 10. © Cloudsoft Corporation 2019 Problems and Solutions Problems Solutions Tribal knowledge, rarely tested Gamedays Processes, environments and applications are complicated and ever-changing Post mortems & continual improvement High availability and DR needs understanding of application, but infrastructure-focused Application-centric approach Bespoke / inconsistent approaches Empower the application architects; Codify and share best practices. © Cloudsoft Corporation 2019 10
  • 11. © Cloudsoft Corporation 2019 Problems and Solutions
  • 12. © Cloudsoft Corporation 2019 Problems and Solutions Problems Solutions Manual runbooks Automated recovery Involves many systems Encourage APIs and automation; “glue code” to codify their use; focus on common application architectures. Runbooks and CMDB out-of-date “Living model” of the application; Deployment and in-life management codified in this model. © Cloudsoft Corporation 2019 12
  • 13. © Cloudsoft Corporation 2019 • Restart the web-servers (e.g. on an out-of-memory) • Repave servers or clusters (e.g. when a restart fails, or an update is required) • Auto-scale based on latency, load or time-of-day • Manage DR within a region and/or across regions Automated Recovery © Cloudsoft Corporation 2019 13
  • 14. © Cloudsoft Corporation 2019 Self Managing Systems: Autonomics © Cloudsoft Corporation 2019 14 Self configuring • Automated configuration of components and systems follows high-level policies. Rest of system adjusts automatically and seamlessly Self healing • System automatically detects, diagnoses, and repairs software and hardware problems Self optimizing • Components and systems continually seek opportunities to improve their own performance and efficiency Self protecting • System automatically defends against malicious attacks or cascading failures. It uses early warning to anticipate and prevent system wide failures
  • 15. © Cloudsoft Corporation 2019 Cloudsoft AMP © Cloudsoft Corporation 2019 15 AMP streamlines development, operations and governance for any application on any cloud Rapidly Scalable Autonomic policy-based management Cloud Enabling Liberate applications from infrastructure Powerful Tools Consistent in-life management tooling and features Improve Visibility Unified management view of applications
  • 16. © Cloudsoft Corporation 2019 Suggested Next Steps © Cloudsoft Corporation 2019 This week ❏ Get the Gartner report: “Why Business Leaders Don’t Care About the Cost of Downtime” free of charge ❏ Talk to person(s) in charge of application resiliency ❏ Example post mortem from the last major incident; how was this knowledge shared and used? ❏ How are application authors involved in planning for and implementing resiliency? ❏ Is there consistent monitoring and recovery strategies (focusing 80:20 rule of existing apps)? ❏ How often is it tested? Are there gamedays? Next 90 days ❏ Identify example app(s) ❏ Get a review from an expert of an example workload (“well-architected review”) ❏ Organise gamedays ❏ Introduce post mortems Next 6 months ❏ Consider automation software at the application level for ‘low-hanging fruit’ ❏ Involve and empower your application authors 16 get report for free cloudsoft.io/report