Infrastructure as code, automation, monitoring, disaster recovery, security, scaling and cost tracking are all easily accessible subjects, yet they are too often overlooked until it is already too late. In this session Cotap will share what AWS offers to help them stay ahead of the curve. By following 4 simple rules, they will show how Cotap's Engineering team has been able to run for the past 12 months with over four nines of availability. They deploy 3 to 5 times a day, run in 2 regions / 6 AZs, and still keep AWS costs below the monthly salary of an engineer.
24. Monitoring & Alerting
● Cost of
o Interruptions
o Waking somebody up
● Channels
● Self-healing infrastructure
● External monitoring
● Page only when critical
25. Monitoring & Alerting
Situation                        | Channel                  | Page
Disk full 60%                    | Chat, Email              | ✗
Disk full 90%                    | Chat, Email, PagerDuty   | ✓
Chef not running for > 30m       | Chat, Email              | ✗
Redis not running for > 3 x 5s   | Chat, Email, PagerDuty   | ✓
ElasticSearch N-1                | Chat, Email              | ✗
ElasticSearch N-2                | Chat, Email, PagerDuty   | ✓
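A policy like the table above can be encoded as a small routing function. This is a minimal sketch of the idea, not Cotap's actual code; the channel names and the shape of the return value are assumptions.

```python
# Sketch of an alert-routing policy in the spirit of the table above.
# Channel names and the return structure are illustrative assumptions.

def route_alert(situation: str, critical: bool) -> dict:
    """Return the channels an alert goes to; page only when critical."""
    channels = ["chat", "email"]       # every alert is visible somewhere
    if critical:
        channels.append("pagerduty")   # wake somebody up only when it matters
    return {"situation": situation, "channels": channels, "page": critical}

# Disk 60% full: visible, but nobody gets paged.
warn = route_alert("disk full 60%", critical=False)
# Disk 90% full: now it is worth the interruption.
crit = route_alert("disk full 90%", critical=True)
```

The point of the single `critical` flag is that the cost of an interruption is decided once, in one place, rather than ad hoc per alert.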
28. Platform to fail
● Easy creation of temporary “Stacks”
● Branches can get their own hardware
● Clients can talk to a branch
● QA happens on Sandbox
● Exact copy of Production
● Scale up/down based on needs
● Different Region (us-east-1)
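Giving each branch its own temporary stack starts with a deterministic name. Here is a small sketch of how branch names might map to CloudFormation-safe stack names; the naming convention is a hypothetical example, not Cotap's actual scheme.

```python
import re

def stack_name_for_branch(branch: str, env: str = "sandbox") -> str:
    """Derive a CloudFormation-safe stack name from a git branch.

    CloudFormation stack names may only contain letters, digits and
    hyphens, so everything else in the branch name is squashed.
    (The env prefix and scheme are hypothetical examples.)
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "-", branch).strip("-").lower()
    return f"{env}-{slug}"

print(stack_name_for_branch("feature/new-search"))  # sandbox-feature-new-search
```

With names derived mechanically, tearing a branch stack down after merge is just a lookup, which keeps temporary hardware actually temporary.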
31. Rule #3
All changes have to go through Sandbox.
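Rule #3 can be enforced mechanically in a deploy script rather than by convention. A toy sketch of the gate, assuming a simple in-memory record of what has been deployed where (the environment names and tracking structure are assumptions):

```python
# Sketch: refuse a production deploy unless the same revision has
# already been through Sandbox. Names and structure are assumptions.

deployed = {"sandbox": set(), "production": set()}

def deploy(env: str, revision: str) -> None:
    """Record a deploy, rejecting production deploys of unseen revisions."""
    if env == "production" and revision not in deployed["sandbox"]:
        raise RuntimeError(f"{revision} was never deployed to sandbox")
    deployed[env].add(revision)

deploy("sandbox", "abc123")
deploy("production", "abc123")  # fine: it went through Sandbox first
```

In practice the "has this revision been to Sandbox" check would live in a deploy tool or CI pipeline, but the invariant is the same one sentence long.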
45. Cost Control
● Tags
o Role
o Environment
● Cost explorer
● Threshold alerting
● Share monthly
● Export to CSV
● Right-Scale (ASG)
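Once everything is tagged with Role and Environment, a monthly CSV export can be summarized with a few lines of code. A minimal sketch; the column names here ("Role", "Environment", "Cost") are assumptions about the export format and should be adjusted to match the actual file.

```python
import csv
import io
from collections import defaultdict

def cost_by_tag(csv_text: str, tag: str = "Role") -> dict:
    """Sum the Cost column grouped by a tag column.

    Column names ('Role', 'Environment', 'Cost') are assumed here;
    adjust them to match your actual Cost Explorer export.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[tag]] += float(row["Cost"])
    return dict(totals)

export = """Role,Environment,Cost
web,production,120.50
web,sandbox,30.00
redis,production,45.25
"""
print(cost_by_tag(export))                      # cost per role
print(cost_by_tag(export, tag="Environment"))   # cost per environment
```

Grouping by Environment is what makes "Sandbox costs X% of Production" a number you can share monthly rather than a guess.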
46. 4 rules of 5 nines.
● All changes have to be under version control
● No instance should be launched manually
● All changes are deployed to Sandbox first
● Production is just a more powerful Sandbox
A breakthrough came in April 1913. A production engineer in the flywheel magneto assembly area tried a new way to put this component's parts together. The operation was divided into 29 separate steps. Workers placed only one part in the assembly before pushing the flywheel down the line to the next employee.
Previously, it had taken one employee about 20 minutes to assemble a flywheel magneto. Divided among 29 men, the job took 13 minutes. It was eventually trimmed to five minutes. This approach was applied gradually to the construction of the engine and other parts.
Give people a platform to fail.
Code from the assembly line goes to Sandbox.
It is reviewed, tested and used internally.
Identify problems in Sandbox early on, before pushing them out to the public.
People can actually build their own stacks (thanks to automation) and try their own stuff.
We also do a bi-weekly catastrophe scenario, a manual Chaos Monkey if you will.
There is a fine balance between productivity and keeping systems running: if you are constantly fixing your systems, you are not shipping code.
Have rules and procedures in place to deploy code that won't break your infrastructure.
If you have applied most of the previous rules, then easy scaling comes at a low cost on Amazon.
CloudFormation handles creating AutoScaling groups easily.
Here we can talk about routing traffic from one AZ to another on the day of launch, because our instances in AZ1 had the wrong MTU.
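The AutoScaling piece of a CloudFormation template boils down to a small resource block. This sketch generates one as a Python dict; the names, sizes and AZs are illustrative, and a real template also needs the referenced launch configuration and the rest of the stack.

```python
import json

def autoscaling_group(name: str, min_size: int, max_size: int,
                      azs: list) -> dict:
    """Build an AWS::AutoScaling::AutoScalingGroup resource block for a
    CloudFormation template. 'AppLaunchConfig' is a placeholder Ref to a
    launch configuration defined elsewhere in the (omitted) template.
    """
    return {
        name: {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {
                # CloudFormation takes MinSize/MaxSize as strings
                "MinSize": str(min_size),
                "MaxSize": str(max_size),
                "AvailabilityZones": azs,
                "LaunchConfigurationName": {"Ref": "AppLaunchConfig"},
            },
        }
    }

resource = autoscaling_group("WebAsg", 2, 10, ["us-west-1a", "us-west-1b"])
print(json.dumps(resource, indent=2))
```

Because the group spans multiple AZs, shifting capacity away from a bad zone (like the wrong-MTU AZ1 story above) is a template change, not an emergency manual procedure.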
4 types of scaling:
Preemptive: you know traffic is coming (press, a conference, etc.). How quickly can you scale up your applications?
Automatic: scaling up, and back down, when necessary.
Vertical: can you change instance type easily? With CloudFormation we are able to rotate an entire cluster to a new instance type without manual intervention.
Horizontal: add machines and grow your cluster.
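The "automatic" case above is ultimately a threshold comparison. A toy sketch of the decision, clamped to the group's bounds; the 70%/20% thresholds are illustrative assumptions, and in practice CloudWatch alarms drive the AutoScaling group rather than hand-rolled code.

```python
def desired_capacity(current: int, cpu_pct: float,
                     minimum: int = 2, maximum: int = 10) -> int:
    """Scale up when hot, back down when idle, clamped to [min, max].

    Thresholds (70% / 20%) are illustrative, not a recommendation.
    """
    if cpu_pct > 70:
        current += 1       # hot: add an instance
    elif cpu_pct < 20:
        current -= 1       # idle: shed an instance
    return max(minimum, min(maximum, current))

print(desired_capacity(4, 85.0))  # 5  (scale up)
print(desired_capacity(4, 10.0))  # 3  (scale back down)
print(desired_capacity(2, 10.0))  # 2  (never below the floor)
```

The clamp is the important part: the floor keeps you able to serve baseline traffic, and the ceiling keeps an alerting bug from scaling you into a surprise bill.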