SlideShare ist ein Scribd-Unternehmen logo
1 von 89
Downloaden Sie, um offline zu lesen
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How AWS Minimizes the Blast
Radius of Failures
Peter Vosshall
VP & Distinguished Engineer
Amazon Web Services
A R C 3 3 8
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
“If anything can go wrong, it will.”
Murphy’s Law
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Blast radius
• Who is impacted?
• How many workloads?
• What functionality?
• How many locations?
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
“As a thought exercise, how could you
cut the blast radius for a similar event
in half?”
Correction of Errors Template
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Region isolation
Availability Zone independence
Cell-based architecture
Shuffle sharding
Operational practices
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS regions
Region New region (coming soon)
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Regional service endpoints
ec2.us-west-1.amazonaws.com
ec2.us-east-2.amazonaws.com
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shared nothing architecture
X
X
X
us-west-1 us-east-2
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Inter-Region Virtual Private Cloud (VPC) Peering
Region A
VPC A VPC B
Region B
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Inter-Region Virtual Private Cloud (VPC) Peering
Region A
VPC A
VPC Peering
Connection
VPC B
Region B
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Inter-Region Virtual Private Cloud (VPC) Peering
Region A
VPC A
VPC Peering
Connection
VPC B
Region B
X
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Inter-Region Virtual Private Cloud (VPC) Peering
Region A
VPC A
VPC Peering
Connection
VPC B
Cross
Region
Orchestrator
Region B
X
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS regions
Region isolation Single region blast radius
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Availability Zones
Region
Availability zone Availability zone
Availability zone
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Availability Zones
Region
Availability zone Availability zone
Availability zone
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Availability Zones
Region
Availability zone Availability zone
Availability zone
• Geographically spread to avoid
correlated failure
• Geographically close for low latency
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Availability Zones
Region
Availability zone Availability zone
Availability zone
• n data centers per AZ (1 or more)
• n AZs per region (typically 3+)
• 57 AZs across 19 regions globally
• Coming soon: 15 additional AZs
across 5 new regions
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Region
Availability zone a
Instances
DB Instance DB instance
standby
Availability zone b
Instances
Multi-AZ architecture
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Region
Availability zone a
Instances
DB instance DB instance
standby
Availability zone b
Instances
Multi-AZ architecture
DB instanceDB instance
standby
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Multi-AZ architecture
Region
Availability zone Availability zone
Availability zone
• Enables fault-tolerant applications
• AWS regional services designed to
withstand AZ failures
• Leveraged in S3’s design for
99.999999999% durability
Multi-AZ Zero blast radius!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Zonal services
• Amazon EC2 instances
• Amazon EBS volumes
• Amazon EMR clusters
• AWS CloudHSM instances
• NAT gateways
• etc.
• Zone-specific resources
• Zone-local control planes
• Availability Zone independence
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Control planes and data planes
• Control plane: administration of resources
• Data plane: usage of resources
• Separating control plane from data plane reduces blast radius
Service Control plane Data plane
Amazon DynamoDB DescribeTable API Query API
Amazon EC2 RunInstances API A running EC2 instance
AWS Lambda CreateFunction API Invoke API
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Zone a control plane
Zone a
Zone a data plane
Zone b control plane
Zone b
Zone b data plane
Zone c control plane
Zone c
Zone c data plane
Availability Zone independence
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Zone a control plane
Zone a
Zone a data plane
Regional control plane
Zone b control plane
Zone b
Zone b data plane
Zone c control plane
Zone c
Zone c data plane
Availability Zone independence
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
✔ ✔
✔
✔ ✔
Zone a control plane
Zone a
Zone b data plane
Regional control plane
Zone b control plane
Zone b
Zone b data plane
Zone c control plane
Zone c
Zone c data plane
Availability Zone independence
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Blast Radius with Availability Zones
Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Availability zone failure
Zone a Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Theoretical blast radius
Zone a Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Abstracted Architecture
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Typical service application
Compute
Service
Storage
Compute
Service
Storage
Compute
Service
Storage
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cell-based architecture
Compute
Cell 0
Compute
Cell n
Service
Storage
Compute
Cell 1
Storage Storage
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cells and Availability zones
Zone a Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Availability Zone failure
Zone a Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Theoretical blast radius
Zone a Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cells and Availability Zones – zonal services
Zone a Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Availability zone failure
Zone a Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Partial availability zone failure
Zone a Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Theoretical blast radius
Zone a Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
System properties
Cell 0
Service
Cell 1 Cell n
• Workload isolation
• Failure containment
• Scale-out vs. scale-up
• Testability
• Manageability
✔X ✔
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
System properties
Cell 0
Service
Cell 1 Cell n
• Workload isolation
• Failure containment
• Scale-out vs. scale-up
• Testability
• Manageability
Cell n+1
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Core considerations
Cell 0
Service
Cell 1 Cell n
• Cell size
• Router robustness
• Partitioning dimension
• Cross-cell use cases
• Cell migration
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cell size tradeoffs
Smaller cells
• Reduced blast radius
• Easier to test
• Cells easier to operate
Larger cells
• Cost efficiency
• Reduced splits
• System easier to operate
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Router robustness
Cell 0
Service
Cell 1 Cell n
• Remaining shared component
• Stress testing
• Battle hardening
• Keep it simple!
• “Thinnest Possible Layer”
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Partitioning dimension
Cell 0
Service
Cell 1 Cell n
• System-specific
• Needs careful analysis
• Cut with grain of domain
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cut with the grain
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cross-cell use cases
Cell 0
Service
Cell 1 Cell n
• May be unavoidable…keep to
small subset of workload
• Scatter/gather queries
• Batch operations
• Coordinated writes
• Cell migration
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cell migration
Cell 0
Service
Cell 1 Cell n
• Relocation of workloads
• Driven by
• Heat management
• Size balancing
• Scale-out
• Similar to VM live migration
• Careful coordination between
router and cells
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Theoretical blast radius
Zone a Zone a
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
X X X X X X XX
♤♡♢ ⚀ ⚁ ⚂ ⚃♧♢
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cell-based architecture
XX
♤♡♢ ⚀ ⚁ ⚂ ⚃♧♢
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shuffle sharding
XX
♤♡♢ ⚀ ⚁⚂ ⚃♡ ♤ ♧♢ ⚀⚂♧ ⚁⚃♢ ♢
✔ ✔
♡ ♧♢
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shuffle sharding
Nodes = 8
Shard size = 2
Combinations = 28
Overlap % customers
0 53.6%
1 42.8%
2 3.6%
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shuffle sharding
Nodes = 100
Shard size = 5
Combinations = 75 million!
Overlap % customers
0 77%
1 21%
2 1.8%
3 0.06%
4 0.0006%
5 0.0000013%
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shuffle sharding
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
“I'd wager that every new AWS
engineer knows within their first week,
if not their first day, that we never
want to touch more than one zone at
a time.”
Werner Vogels
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Software deployments
• Staggered deployments
• Across regions
• Across availability zones
• Across cells
• Fractional deployments
• Within deployment units
• Automated tests and canaries
• Automated rollback
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Automation is key
• Reviewable
• Testable
• Predictable
• Repeatable
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
End-to-end ownership
• Service teams own everything!
• No “throw it over the wall”
• Direct feedback loop to reduce
blast radius
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
In conclusion…
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
In conclusion…
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
In conclusion…
Thank you!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Peter Vosshall
vosshall@amazon.com
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Weitere ähnliche Inhalte

Was ist angesagt?

마이그레이션과 함께 시작되는 Cloud Financial Management 전략 세우기-곽내인, AWS Cloud Financial Ma...
마이그레이션과 함께 시작되는 Cloud Financial Management 전략 세우기-곽내인, AWS Cloud Financial Ma...마이그레이션과 함께 시작되는 Cloud Financial Management 전략 세우기-곽내인, AWS Cloud Financial Ma...
마이그레이션과 함께 시작되는 Cloud Financial Management 전략 세우기-곽내인, AWS Cloud Financial Ma...Amazon Web Services Korea
 
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...Amazon Web Services
 
AWS Black Belt Online Seminar 2017 AWS Shield
AWS Black Belt Online Seminar 2017 AWS ShieldAWS Black Belt Online Seminar 2017 AWS Shield
AWS Black Belt Online Seminar 2017 AWS ShieldAmazon Web Services Japan
 
현대백화점 리테일테크랩과 AWS Prototyping 팀 개발자가 들려주는 인공 지능 무인 스토어 개발 여정 - 최권열 AWS 프로토타이핑...
현대백화점 리테일테크랩과 AWS Prototyping 팀 개발자가 들려주는 인공 지능 무인 스토어 개발 여정 - 최권열 AWS 프로토타이핑...현대백화점 리테일테크랩과 AWS Prototyping 팀 개발자가 들려주는 인공 지능 무인 스토어 개발 여정 - 최권열 AWS 프로토타이핑...
현대백화점 리테일테크랩과 AWS Prototyping 팀 개발자가 들려주는 인공 지능 무인 스토어 개발 여정 - 최권열 AWS 프로토타이핑...Amazon Web Services Korea
 
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
금융 회사를 위한 클라우드 이용 가이드 – 신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
금융 회사를 위한 클라우드 이용 가이드 –  신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...금융 회사를 위한 클라우드 이용 가이드 –  신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
금융 회사를 위한 클라우드 이용 가이드 – 신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...Amazon Web Services Korea
 
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...Amazon Web Services Korea
 
Secured API Acceleration with Engineers from Amazon CloudFront and Slack
Secured API Acceleration with Engineers from Amazon CloudFront and SlackSecured API Acceleration with Engineers from Amazon CloudFront and Slack
Secured API Acceleration with Engineers from Amazon CloudFront and SlackAmazon Web Services
 
20190402 AWS Black Belt Online Seminar Let's Dive Deep into AWS Lambda Part1 ...
20190402 AWS Black Belt Online Seminar Let's Dive Deep into AWS Lambda Part1 ...20190402 AWS Black Belt Online Seminar Let's Dive Deep into AWS Lambda Part1 ...
20190402 AWS Black Belt Online Seminar Let's Dive Deep into AWS Lambda Part1 ...Amazon Web Services Japan
 
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항Opennaru, inc.
 
20191016 AWS Black Belt Online Seminar Amazon Route 53 Resolver
20191016 AWS Black Belt Online Seminar Amazon Route 53 Resolver20191016 AWS Black Belt Online Seminar Amazon Route 53 Resolver
20191016 AWS Black Belt Online Seminar Amazon Route 53 ResolverAmazon Web Services Japan
 
Building the Business Case for AWS
Building the Business Case for AWSBuilding the Business Case for AWS
Building the Business Case for AWSAmazon Web Services
 
AWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAmazon Web Services
 
DAT302_Deep Dive on Amazon Relational Database Service (RDS)
DAT302_Deep Dive on Amazon Relational Database Service (RDS)DAT302_Deep Dive on Amazon Relational Database Service (RDS)
DAT302_Deep Dive on Amazon Relational Database Service (RDS)Amazon Web Services
 
쿠키런: 킹덤 대규모 인프라 및 서버 운영 사례 공유 [데브시스터즈 - 레벨 200] - 발표자: 용찬호, R&D 엔지니어, 데브시스터즈 ...
쿠키런: 킹덤 대규모 인프라 및 서버 운영 사례 공유 [데브시스터즈 - 레벨 200] - 발표자: 용찬호, R&D 엔지니어, 데브시스터즈 ...쿠키런: 킹덤 대규모 인프라 및 서버 운영 사례 공유 [데브시스터즈 - 레벨 200] - 발표자: 용찬호, R&D 엔지니어, 데브시스터즈 ...
쿠키런: 킹덤 대규모 인프라 및 서버 운영 사례 공유 [데브시스터즈 - 레벨 200] - 발표자: 용찬호, R&D 엔지니어, 데브시스터즈 ...Amazon Web Services Korea
 

Was ist angesagt? (20)

마이그레이션과 함께 시작되는 Cloud Financial Management 전략 세우기-곽내인, AWS Cloud Financial Ma...
마이그레이션과 함께 시작되는 Cloud Financial Management 전략 세우기-곽내인, AWS Cloud Financial Ma...마이그레이션과 함께 시작되는 Cloud Financial Management 전략 세우기-곽내인, AWS Cloud Financial Ma...
마이그레이션과 함께 시작되는 Cloud Financial Management 전략 세우기-곽내인, AWS Cloud Financial Ma...
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Introduction to Amazon EKS
Introduction to Amazon EKSIntroduction to Amazon EKS
Introduction to Amazon EKS
 
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...
 
AWS Black Belt Online Seminar 2017 AWS Shield
AWS Black Belt Online Seminar 2017 AWS ShieldAWS Black Belt Online Seminar 2017 AWS Shield
AWS Black Belt Online Seminar 2017 AWS Shield
 
현대백화점 리테일테크랩과 AWS Prototyping 팀 개발자가 들려주는 인공 지능 무인 스토어 개발 여정 - 최권열 AWS 프로토타이핑...
현대백화점 리테일테크랩과 AWS Prototyping 팀 개발자가 들려주는 인공 지능 무인 스토어 개발 여정 - 최권열 AWS 프로토타이핑...현대백화점 리테일테크랩과 AWS Prototyping 팀 개발자가 들려주는 인공 지능 무인 스토어 개발 여정 - 최권열 AWS 프로토타이핑...
현대백화점 리테일테크랩과 AWS Prototyping 팀 개발자가 들려주는 인공 지능 무인 스토어 개발 여정 - 최권열 AWS 프로토타이핑...
 
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
금융 회사를 위한 클라우드 이용 가이드 – 신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
금융 회사를 위한 클라우드 이용 가이드 –  신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...금융 회사를 위한 클라우드 이용 가이드 –  신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
금융 회사를 위한 클라우드 이용 가이드 – 신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
 
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
 
Secured API Acceleration with Engineers from Amazon CloudFront and Slack
Secured API Acceleration with Engineers from Amazon CloudFront and SlackSecured API Acceleration with Engineers from Amazon CloudFront and Slack
Secured API Acceleration with Engineers from Amazon CloudFront and Slack
 
AWS Cost Management Workshop
AWS Cost Management WorkshopAWS Cost Management Workshop
AWS Cost Management Workshop
 
20190402 AWS Black Belt Online Seminar Let's Dive Deep into AWS Lambda Part1 ...
20190402 AWS Black Belt Online Seminar Let's Dive Deep into AWS Lambda Part1 ...20190402 AWS Black Belt Online Seminar Let's Dive Deep into AWS Lambda Part1 ...
20190402 AWS Black Belt Online Seminar Let's Dive Deep into AWS Lambda Part1 ...
 
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
 
Introduction to microservices
Introduction to microservicesIntroduction to microservices
Introduction to microservices
 
20191016 AWS Black Belt Online Seminar Amazon Route 53 Resolver
20191016 AWS Black Belt Online Seminar Amazon Route 53 Resolver20191016 AWS Black Belt Online Seminar Amazon Route 53 Resolver
20191016 AWS Black Belt Online Seminar Amazon Route 53 Resolver
 
Building the Business Case for AWS
Building the Business Case for AWSBuilding the Business Case for AWS
Building the Business Case for AWS
 
AWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices Webinar
 
DAT302_Deep Dive on Amazon Relational Database Service (RDS)
DAT302_Deep Dive on Amazon Relational Database Service (RDS)DAT302_Deep Dive on Amazon Relational Database Service (RDS)
DAT302_Deep Dive on Amazon Relational Database Service (RDS)
 
쿠키런: 킹덤 대규모 인프라 및 서버 운영 사례 공유 [데브시스터즈 - 레벨 200] - 발표자: 용찬호, R&D 엔지니어, 데브시스터즈 ...
쿠키런: 킹덤 대규모 인프라 및 서버 운영 사례 공유 [데브시스터즈 - 레벨 200] - 발표자: 용찬호, R&D 엔지니어, 데브시스터즈 ...쿠키런: 킹덤 대규모 인프라 및 서버 운영 사례 공유 [데브시스터즈 - 레벨 200] - 발표자: 용찬호, R&D 엔지니어, 데브시스터즈 ...
쿠키런: 킹덤 대규모 인프라 및 서버 운영 사례 공유 [데브시스터즈 - 레벨 200] - 발표자: 용찬호, R&D 엔지니어, 데브시스터즈 ...
 

Ähnlich wie How AWS Minimizes the Blast Radius of Failures (ARC338) - AWS re:Invent 2018

Building Global Multi-Region, Active-Active Serverless Backends
Building Global Multi-Region, Active-Active Serverless Backends Building Global Multi-Region, Active-Active Serverless Backends
Building Global Multi-Region, Active-Active Serverless Backends Amazon Web Services
 
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...Amazon Web Services
 
Scaling up to and beyond 10M users
Scaling up to and beyond 10M usersScaling up to and beyond 10M users
Scaling up to and beyond 10M usersAmazon Web Services
 
建構全球跨區域 x Active-Active架構的無伺服器化後台服務
建構全球跨區域  x Active-Active架構的無伺服器化後台服務建構全球跨區域  x Active-Active架構的無伺服器化後台服務
建構全球跨區域 x Active-Active架構的無伺服器化後台服務Amazon Web Services
 
How to build scalable and resilient applications in the cloud - AWS Summit Ca...
How to build scalable and resilient applications in the cloud - AWS Summit Ca...How to build scalable and resilient applications in the cloud - AWS Summit Ca...
How to build scalable and resilient applications in the cloud - AWS Summit Ca...Amazon Web Services
 
Resiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudResiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudAmazon Web Services
 
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Amazon Web Services
 
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Amazon Web Services
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of usersAmazon Web Services
 
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Amazon Web Services
 
How to Build Multi-Region Applications in the Cloud: AWS Developer Workshop -...
How to Build Multi-Region Applications in the Cloud: AWS Developer Workshop -...How to Build Multi-Region Applications in the Cloud: AWS Developer Workshop -...
How to Build Multi-Region Applications in the Cloud: AWS Developer Workshop -...Amazon Web Services
 
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Amazon Web Services
 
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...Amazon Web Services
 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Adrian Hornsby
 
AWS DevDay Vienna - Resiliency and availability design patterns for the cloud
AWS DevDay Vienna - Resiliency and availability design patterns for the cloudAWS DevDay Vienna - Resiliency and availability design patterns for the cloud
AWS DevDay Vienna - Resiliency and availability design patterns for the cloudCobus Bernard
 
AWS DevDay Cologne - Resiliency and availability design patterns for the cloud
AWS DevDay Cologne - Resiliency and availability design patterns for the cloudAWS DevDay Cologne - Resiliency and availability design patterns for the cloud
AWS DevDay Cologne - Resiliency and availability design patterns for the cloudCobus Bernard
 
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...Amazon Web Services
 
Under the Hood of Amazon Route 53 (ARC408-R1) - AWS re:Invent 2018
Under the Hood of Amazon Route 53 (ARC408-R1) - AWS re:Invent 2018Under the Hood of Amazon Route 53 (ARC408-R1) - AWS re:Invent 2018
Under the Hood of Amazon Route 53 (ARC408-R1) - AWS re:Invent 2018Amazon Web Services
 
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018Amazon Web Services
 
Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringAmazon Web Services
 

Ähnlich wie How AWS Minimizes the Blast Radius of Failures (ARC338) - AWS re:Invent 2018 (20)

Building Global Multi-Region, Active-Active Serverless Backends
Building Global Multi-Region, Active-Active Serverless Backends Building Global Multi-Region, Active-Active Serverless Backends
Building Global Multi-Region, Active-Active Serverless Backends
 
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
Best Practices for Building Multi-Region, Active-Active Serverless Applicatio...
 
Scaling up to and beyond 10M users
Scaling up to and beyond 10M usersScaling up to and beyond 10M users
Scaling up to and beyond 10M users
 
建構全球跨區域 x Active-Active架構的無伺服器化後台服務
建構全球跨區域  x Active-Active架構的無伺服器化後台服務建構全球跨區域  x Active-Active架構的無伺服器化後台服務
建構全球跨區域 x Active-Active架構的無伺服器化後台服務
 
How to build scalable and resilient applications in the cloud - AWS Summit Ca...
How to build scalable and resilient applications in the cloud - AWS Summit Ca...How to build scalable and resilient applications in the cloud - AWS Summit Ca...
How to build scalable and resilient applications in the cloud - AWS Summit Ca...
 
Resiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudResiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the Cloud
 
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
 
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of users
 
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018
 
How to Build Multi-Region Applications in the Cloud: AWS Developer Workshop -...
How to Build Multi-Region Applications in the Cloud: AWS Developer Workshop -...How to Build Multi-Region Applications in the Cloud: AWS Developer Workshop -...
How to Build Multi-Region Applications in the Cloud: AWS Developer Workshop -...
 
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
 
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...
Chaos Engineering: Why Breaking Things Should Be Practiced - AWS Developer Wo...
 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.
 
AWS DevDay Vienna - Resiliency and availability design patterns for the cloud
AWS DevDay Vienna - Resiliency and availability design patterns for the cloudAWS DevDay Vienna - Resiliency and availability design patterns for the cloud
AWS DevDay Vienna - Resiliency and availability design patterns for the cloud
 
AWS DevDay Cologne - Resiliency and availability design patterns for the cloud
AWS DevDay Cologne - Resiliency and availability design patterns for the cloudAWS DevDay Cologne - Resiliency and availability design patterns for the cloud
AWS DevDay Cologne - Resiliency and availability design patterns for the cloud
 
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
 
Under the Hood of Amazon Route 53 (ARC408-R1) - AWS re:Invent 2018
Under the Hood of Amazon Route 53 (ARC408-R1) - AWS re:Invent 2018Under the Hood of Amazon Route 53 (ARC408-R1) - AWS re:Invent 2018
Under the Hood of Amazon Route 53 (ARC408-R1) - AWS re:Invent 2018
 
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
How to Use Predictive Scaling (API331-R1) - AWS re:Invent 2018
 
Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos Engineering
 

Mehr von Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSAmazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 
Come costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWSCome costruire un'architettura Serverless nel Cloud AWS
Come costruire un'architettura Serverless nel Cloud AWS
 

How AWS Minimizes the Blast Radius of Failures (ARC338) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How AWS Minimizes the Blast Radius of Failures Peter Vosshall VP & Distinguished Engineer Amazon Web Services A R C 3 3 8
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 5. “If anything can go wrong, it will.” Murphy’s Law
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Blast radius • Who is impacted? • How many workloads? • What functionality? • How many locations?
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 10. “As a thought exercise, how could you cut the blast radius for a similar event in half?” Correction of Errors Template
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda Region isolation Availability Zone independence Cell-based architecture Shuffle sharding Operational practices
  • 25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 26. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS regions Region New region (coming soon)
  • 27. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Regional service endpoints ec2.us-west-1.amazonaws.com ec2.us-east-2.amazonaws.com
  • 28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shared nothing architecture X X X us-west-1 us-east-2
  • 29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Inter-Region Virtual Private Cloud (VPC) Peering Region A VPC A VPC B Region B
  • 30. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Inter-Region Virtual Private Cloud (VPC) Peering Region A VPC A VPC Peering Connection VPC B Region B
  • 31. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Inter-Region Virtual Private Cloud (VPC) Peering Region A VPC A VPC Peering Connection VPC B Region B X
  • 32. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Inter-Region Virtual Private Cloud (VPC) Peering Region A VPC A VPC Peering Connection VPC B Cross Region Orchestrator Region B X
  • 33. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS regions Region isolation Single region blast radius
  • 34. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 35. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Availability Zones Region Availability zone Availability zone Availability zone
  • 36. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Availability Zones Region Availability zone Availability zone Availability zone
  • 37. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Availability Zones Region Availability zone Availability zone Availability zone • Geographically spread to avoid correlated failure • Geographically close for low latency
  • 38. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Availability Zones Region Availability zone Availability zone Availability zone • n data centers per AZ (1 or more) • n AZs per region (typically 3+) • 57 AZs across 19 regions globally • Coming soon: 15 additional AZs across 5 new regions
  • 39. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Region Availability zone a Instances DB Instance DB instance standby Availability zone b Instances Multi-AZ architecture
  • 40. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Region Availability zone a Instances DB instance DB instance standby Availability zone b Instances Multi-AZ architecture DB instanceDB instance standby
  • 41. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Multi-AZ architecture Region Availability zone Availability zone Availability zone • Enables fault-tolerant applications • AWS regional services designed to withstand AZ failures • Leveraged in S3’s design for 99.999999999% durability Multi-AZ Zero blast radius!
  • 42. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Zonal services • Amazon EC2 instances • Amazon EBS volumes • Amazon EMR clusters • AWS CloudHSM instances • NAT gateways • etc. • Zone-specific resources • Zone-local control planes • Availability Zone independence
  • 43. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Control planes and data planes • Control plane: administration of resources • Data plane: usage of resources • Separating control plane from data plane reduces blast radius Service Control plane Data plane Amazon DynamoDB DescribeTable API Query API Amazon EC2 RunInstances API A running EC2 instance AWS Lambda CreateFunction API Invoke API
  • 44. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Zone a control plane Zone a Zone a data plane Zone b control plane Zone b Zone b data plane Zone c control plane Zone c Zone c data plane Availability Zone independence
  • 45. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Zone a control plane Zone a Zone a data plane Regional control plane Zone b control plane Zone b Zone b data plane Zone c control plane Zone c Zone c data plane Availability Zone independence
  • 46. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. ✔ ✔ ✔ ✔ ✔ Zone a control plane Zone a Zone b data plane Regional control plane Zone b control plane Zone b Zone b data plane Zone c control plane Zone c Zone c data plane Availability Zone independence
  • 47. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Blast Radius with Availability Zones Zone a
  • 48. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Availability zone failure Zone a Zone a
  • 49. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Theoretical blast radius Zone a Zone a
  • 50. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Abstracted Architecture
  • 51. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 52. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 53. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Typical service application Compute Service Storage Compute Service Storage Compute Service Storage
  • 54. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cell-based architecture Compute Cell 0 Compute Cell n Service Storage Compute Cell 1 Storage Storage
  • 55. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cells and Availability zones Zone a Zone a
  • 56. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Availability Zone failure Zone a Zone a
  • 57. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Theoretical blast radius Zone a Zone a
  • 58. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cells and Availability Zones – zonal services Zone a Zone a
  • 59. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Availability zone failure Zone a Zone a
  • 60. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Partial availability zone failure Zone a Zone a
  • 61. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Theoretical blast radius Zone a Zone a
  • 62. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. System properties Cell 0 Service Cell 1 Cell n • Workload isolation • Failure containment • Scale-out vs. scale-up • Testability • Manageability ✔X ✔
  • 63. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. System properties Cell 0 Service Cell 1 Cell n • Workload isolation • Failure containment • Scale-out vs. scale-up • Testability • Manageability Cell n+1
  • 64. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Core considerations Cell 0 Service Cell 1 Cell n • Cell size • Router robustness • Partitioning dimension • Cross-cell use cases • Cell migration
  • 65. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cell size tradeoffs Smaller cells • Reduced blast radius • Easier to test • Cells easier to operate Larger cells • Cost efficiency • Reduced splits • System easier to operate
  • 66. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Router robustness Cell 0 Service Cell 1 Cell n • Remaining shared component • Stress testing • Battle hardening • Keep it simple! • “Thinnest Possible Layer”
  • 67. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Partitioning dimension Cell 0 Service Cell 1 Cell n • System-specific • Needs careful analysis • Cut with grain of domain
  • 68. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cut with the grain
  • 69. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cross-cell use cases Cell 0 Service Cell 1 Cell n • May be unavoidable…keep to small subset of workload • Scatter/gather queries • Batch operations • Coordinated writes • Cell migration
  • 70. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cell migration Cell 0 Service Cell 1 Cell n • Relocation of workloads • Driven by • Heat management • Size balancing • Scale-out • Similar to VM live migration • Careful coordination between router and cells
  • 71. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Theoretical blast radius Zone a Zone a
  • 72. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 73. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 74. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. X X X X X X XX ♤♡♢ ⚀ ⚁ ⚂ ⚃♧♢
  • 75. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cell-based architecture XX ♤♡♢ ⚀ ⚁ ⚂ ⚃♧♢
  • 76. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shuffle sharding XX ♤♡♢ ⚀ ⚁⚂ ⚃♡ ♤ ♧♢ ⚀⚂♧ ⚁⚃♢ ♢ ✔ ✔ ♡ ♧♢
  • 77. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shuffle sharding Nodes = 8 Shard size = 2 Combinations = 28 Overlap % customers 0 53.6% 1 42.8% 2 3.6%
  • 78. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shuffle sharding Nodes = 100 Shard size = 5 Combinations = 75 million! Overlap % customers 0 77% 1 21% 2 1.8% 3 0.06% 4 0.0006% 5 0.0000013%
  • 79. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shuffle sharding
  • 80. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 81. “I'd wager that every new AWS engineer knows within their first week, if not their first day, that we never want to touch more than one zone at a time.” Werner Vogels
  • 82. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Software deployments • Staggered deployments • Across regions • Across availability zones • Across cells • Fractional deployments • Within deployment units • Automated tests and canaries • Automated rollback
  • 83. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Automation is key • Reviewable • Testable • Predictable • Repeatable
  • 84. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. End-to-end ownership • Service teams own everything! • No “throw it over the wall” • Direct feedback loop to reduce blast radius
  • 85. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. In conclusion…
  • 86. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. In conclusion…
  • 87. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. In conclusion…
  • 88. Thank you! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Peter Vosshall vosshall@amazon.com
  • 89. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.