Managing ECS hosts with AWS Lambda and Step Functions
Terraform at Comtravo
➢ Six environments maintained by Terraform.
➢ Integrated into our CI/CD pipeline.
➢ Each environment has:
○ 500+ AWS components.
○ 43 Lambdas.
○ 25 microservices.
CI/CD at Comtravo: Mono-repo Pull request
CI/CD at Comtravo: Mono-repo Merge to master
ECS at Comtravo
ECS: Many interesting challenges
One such challenge:
Update EC2 hosts in an ECS cluster
Update EC2 hosts in an ECS cluster: Use cases
➢ You have a custom AMI for your ECS cluster(s).
➢ You want to always roll out the latest ECS-optimized AMIs.
➢ You want to rotate the admin keys.
➢ You want to change the instance type.
➢ You want to use an updated user_data script.
Update EC2 hosts in an ECS cluster: The process
➢ Terraform emits an AWS CloudWatch event once a new launch configuration has been created.
➢ Detach “old instances” from the ASG and wait for capacity.
➢ “Move” services from old instances to new instances.
➢ Terminate old instances once they have no running tasks.
➢ Alert on failures.
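The steps above can be sketched in Python. This is a minimal sketch, not the implementation from the talk: it assumes that "moving" services is done by setting the old container instances to DRAINING, and the function names, the `old_hosts` mapping and the `capacity` parameter are hypothetical. The clients are passed in so that boto3 clients (`autoscaling`, `ecs`, `ec2`) or test doubles can be used.

```python
import time


def in_service_count(asg, asg_name):
    """Number of instances the ASG currently reports as InService."""
    groups = asg.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    instances = groups["AutoScalingGroups"][0]["Instances"]
    return sum(1 for i in instances if i["LifecycleState"] == "InService")


def wait_until(predicate, poll_seconds, max_polls):
    """Poll `predicate` until it returns True; give up after `max_polls`."""
    for _ in range(max_polls):
        if predicate():
            return True
        time.sleep(poll_seconds)
    return False


def rotate_hosts(asg, ecs, ec2, asg_name, cluster, old_hosts, capacity,
                 poll_seconds=15, max_polls=40):
    """Replace old hosts; `old_hosts` maps EC2 instance id -> container instance ARN.

    Returns True on success and False on a timeout, so the caller can alert.
    """
    instance_ids = list(old_hosts)
    container_arns = list(old_hosts.values())

    # Detach without lowering the desired capacity, so the ASG launches
    # replacements from the new launch configuration.
    asg.detach_instances(
        InstanceIds=instance_ids,
        AutoScalingGroupName=asg_name,
        ShouldDecrementDesiredCapacity=False,
    )
    if not wait_until(lambda: in_service_count(asg, asg_name) >= capacity,
                      poll_seconds, max_polls):
        return False

    # Assumption: draining lets the ECS service scheduler restart the
    # service tasks on the remaining ACTIVE instances.
    ecs.update_container_instances_state(
        cluster=cluster, containerInstances=container_arns, status="DRAINING"
    )

    def drained():
        return not any(
            ecs.list_tasks(cluster=cluster, containerInstance=arn)["taskArns"]
            for arn in container_arns
        )

    if not wait_until(drained, poll_seconds, max_polls):
        return False

    ec2.terminate_instances(InstanceIds=instance_ids)
    return True
```

In a Step Functions setup each of these stages would more naturally be its own state with retries and a failure branch; the single function here only shows the order of the calls.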
Terraform + AWS Events + AWS Step Functions = Awesome
“I created a new launch configuration lc-1234 for ASG asg-1234 belonging to ECS cluster cluster-A.”
AWS CloudWatch Events
[Figure: timeline of CloudWatch events, showing ECS host events (Task A started, Task C started, Task B stopped) alongside custom events.]
Terraform Event Emitter
resource "null_resource" "launch-config-update" {
  # Re-run the provisioner whenever a new launch configuration is created.
  triggers = {
    launchConfigurationName = "${aws_launch_configuration.ecs-lc.name}"
  }

  provisioner "local-exec" {
    # Emit a custom CloudWatch event describing the new launch configuration.
    command = <<EOT
python ${path.module}/scripts/emit_launchconfig_event.py \
  --launch_configuration_name ${aws_launch_configuration.ecs-lc.name} \
  --autoscaling_group_name ${aws_autoscaling_group.ecs-asg.name} \
  --ami ${var.aws_ami} \
  --cluster ${var.cluster}
EOT
  }
}
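The script called by the provisioner is not shown in the slides. A minimal sketch of what it might do, assuming it publishes the event through the CloudWatch Events `put_events` API: `build_entry` and `emit` are hypothetical names, and the `cluster_arn` and `environment` parameters are assumptions, since the slides only show `--cluster` being passed.

```python
import json


def build_entry(launch_configuration_name, autoscaling_group_name, ami,
                cluster_arn, environment):
    """Build one PutEvents entry describing a launch configuration change."""
    detail = {
        "ami": ami,
        "status": "ACTIVE",
        "autoscalingGroupName": autoscaling_group_name,
        "environment": environment,
        "clusterArn": cluster_arn,
        "launchConfigurationName": launch_configuration_name,
    }
    return {
        "Source": f"comtravo.terraform.{environment}",
        "DetailType": "ECS Launch Configuration Change",
        # PutEvents expects the detail as a JSON string, not a dict.
        "Detail": json.dumps(detail),
        "Resources": [launch_configuration_name],
    }


def emit(events_client, entry):
    """Publish the entry; raise if CloudWatch Events rejected it."""
    response = events_client.put_events(Entries=[entry])
    if response["FailedEntryCount"]:
        raise RuntimeError(f"event rejected: {response['Entries']}")
    return response
```

With boto3 installed and credentials configured, `emit(boto3.client("events"), build_entry(...))` would publish the event. Keeping the entry construction separate from the API call makes the payload easy to check without touching AWS.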
Terraform Event
{
  "version": "0",
  "id": "f24d8f1c-8c3f-9b62-cb3c-54430739fc55",
  "source": "comtravo.terraform.alpha",
  "account": "1234567890",
  "time": "2018-05-09T13:35:43Z",
  "region": "eu-west-1",
  "resources": [
    "ct-backend-ecs-alpha-t2.large-generic20180509133303168200000003"
  ],
  "detail": {
    "ami": "ami-bfb5fec6",
    "status": "ACTIVE",
    "agentConnected": false,
    "autoscalingGroupName": "ct-backend-ecs-alpha-t2.large-generic20180503065507554700000005",
    "environment": "alpha",
    "clusterArn": "arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-alpha",
    "launchConfigurationName": "ct-backend-ecs-alpha-t2.large-generic20180509133303168200000003"
  },
  "detail-type": "ECS Launch Configuration Change"
}
AWS CloudWatch Event Rules
resource "aws_cloudwatch_event_rule" "ecs-manager" {
  name        = "capture-ecs-events-${terraform.workspace}"
  description = "Capture ECS related events"

  event_pattern = <<PATTERN
{
  "source": [
    "comtravo.terraform.${terraform.workspace}"
  ],
  "detail-type": [
    "ECS Launch Configuration Change"
  ],
  "detail": {
    "clusterArn": [
      "arn:aws:ecs:${var.region}:${var.ct_account_id}:cluster/ct-backend-ecs-${terraform.workspace}"
    ],
    "status": ["ACTIVE"]
  }
}
PATTERN
}
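To see how such a pattern selects events, here is a simplified matcher in Python. It is a sketch of exact-value matching only (every field in the pattern must be present in the event, and the event's value must be one of the listed values); it does not cover the other operators CloudWatch Events supports, such as prefix or numeric matching. The function name `matches` and the sample values are illustrative.

```python
def matches(pattern, event):
    """Return True if `event` satisfies an exact-value event pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # Array in the event: one element in common is enough.
            if not any(value in expected for value in actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = {
    "source": ["comtravo.terraform.alpha"],
    "detail-type": ["ECS Launch Configuration Change"],
    "detail": {
        "clusterArn": ["arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-alpha"],
        "status": ["ACTIVE"],
    },
}

event = {
    "source": "comtravo.terraform.alpha",
    "detail-type": "ECS Launch Configuration Change",
    "detail": {
        "ami": "ami-bfb5fec6",
        "status": "ACTIVE",
        "clusterArn": "arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-alpha",
    },
}

is_match = matches(pattern, event)
```

Fields that appear in the event but not in the pattern, such as `ami` here, are ignored, which is why the rule can stay short while the event carries extra detail.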
AWS Step Functions
DEMO
Questions
You all have been awesome!!!
Extras
ECS Challenge #1
ECS AGENT DISCONNECTS
#1 ECS agent disconnects - Initial solution
➢ A cron job on each ECS host sends an SNS notification and restarts the ECS agent.
➢ The ECS agent is likely to fail again if the instance itself has an underlying problem.
#1 ECS agent disconnects - Better solution
➢ Detect ECS agent disconnects.
➢ Boot up a new ECS host and wait for it to be healthy.
➢ “Move” all existing containers from the problematic instance to the new instance.
➢ Terminate the problematic instance.
➢ Alert on failures.
#1 ECS agent disconnects: Detection
How do we detect ECS agent disconnects?
AWS CloudWatch Events to the rescue!
#1 ECS agent disconnects: ECS Events
[Figure: timeline of ECS events: Task A started, Task C started, Task B stopped, ECS agent disconnected, ECS agent connected, ECS agent disconnected.]
#1 ECS agent disconnects: Filter ECS Events
{
  "detail": {
    "agentConnected": [
      false
    ],
    "clusterArn": [
      "arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-qa"
    ],
    "status": [
      "ACTIVE"
    ]
  },
  "detail-type": [
    "ECS Container Instance State Change"
  ],
  "source": [
    "aws.ecs"
  ]
}
#1 ECS agent disconnects: Trigger step function
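The slide for this step is a diagram, so the wiring is not recoverable from the text. One possible approach, sketched here as an assumption rather than the setup from the talk, is a small function (for example inside a Lambda) that starts a Step Functions execution for each disconnect event. `start_replacement` and the example state machine ARN are hypothetical; `start_execution` is the Step Functions API call for starting an execution.

```python
import json


def start_replacement(sfn, state_machine_arn, event):
    """Start a host-replacement execution for an agent-disconnect event.

    Returns the execution ARN, or None if the agent is not reported as
    disconnected in the event.
    """
    detail = event.get("detail", {})
    if detail.get("agentConnected") is not False:
        return None

    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        # Reuse the event id so each event maps to one named execution.
        name=event["id"],
        input=json.dumps(detail),
    )
    return response["executionArn"]
```

With boto3 this would be called as `start_replacement(boto3.client("stepfunctions"), arn, event)`. A CloudWatch Events rule can also target a state machine directly, in which case no intermediate function is needed.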

Zero downtime ECS host updates with Terraform
