Using EBS with
Auto Scaling Groups
How to use the immense power of AWS
Auto-Scaling Groups for a stateful Docker application.
Background
In a service-oriented world where requests can come from anywhere at any time,
keeping a system constantly up and available is essential to its success.
When running at scale, failures happen. This is just a fact of life for modern,
distributed systems.
The focus should not be on trying to prevent those failures, unless you want to start
a hard-disk company. Instead, we should endeavour to automatically react to those
failures - restoring service quickly and with minimal impact.
ASGs (Auto Scaling Groups) can help by automatically monitoring the load and
health of your instances. If a node fails, it will be replaced automatically so you
don't get woken up in the middle of the night with a PagerDuty alert.
This post explores how to use AWS auto-scaling groups for stateful apps because
special care needs to be taken when using EBS volumes.
A Use Case
Our application allows users to post data to our API and we use Cassandra to both
save and analyse the data.
[Diagram: REST API, Cassandra, disk, Analytics]
We decide to employ one of the killer features of Cassandra - the ability to scale
horizontally.
We settle on having three nodes in our Cassandra ring. As well as providing high
availability in the event of a node failure, this will also mean we distribute read
queries across more CPUs and increase the total disk capacity across the cluster.
We could spin up three EC2 nodes and install Cassandra using Terraform or
Ansible. However, for the reasons mentioned above, we want the Cassandra
cluster to auto-heal if a node fails and so we decide to use an Auto-Scaling Group.
Auto-Scaling Groups
There are a few steps to creating an ASG:
● Create AMI
○ create the base image to launch instances from
● Launch Configuration
○ configure the instance size and key name
● Auto-Scaling Group
○ manage multiple instances in the group
Let's walk through this setup:
Create AMI
We bake an AMI based on Ubuntu Xenial 16.04 with Docker CE 17.03 installed so
we can run Cassandra inside a container.
$ aws ec2 create-image \
    --instance-id ${BUILDER_INSTANCE_ID} \
    --name myami
Launch Configuration
Then we create a launch configuration which uses our AMI and the `t2.large` instance
type for our group.
Notice the `--block-device-mappings` field - this describes how our launch configuration will create and
attach a new EBS drive to each instance. Note that `--image-id` expects the ID of the AMI we just
created (shown here as `${AMI_ID}`) rather than its name.
$ aws autoscaling create-launch-configuration \
    --launch-configuration-name asg_demo_config \
    --image-id ${AMI_ID} \
    --instance-type t2.large \
    --key-name my-key \
    --block-device-mappings \
    '[{"DeviceName":"/dev/sda1","Ebs":{"SnapshotId":"snap-3decf207"}},{"DeviceName":"/dev/sdf","Ebs":{"SnapshotId":"snap-eed6ac86"}}]'
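Hand-quoting the JSON for `--block-device-mappings` is easy to get wrong, so one option is to generate it. The sketch below builds the same two mappings as the command above. The helper name `block_device_mappings` is hypothetical, and `DeleteOnTermination` is spelled out only to make visible the setting that decides whether a volume outlives its instance.

```python
import json


def block_device_mappings(root_snapshot, data_snapshot, delete_on_termination=True):
    """Return a JSON string suitable for --block-device-mappings.

    Hypothetical helper: the device names and snapshot IDs come from the
    example in this deck; DeleteOnTermination is made explicit for clarity.
    """
    mappings = [
        {
            "DeviceName": "/dev/sda1",
            "Ebs": {
                "SnapshotId": root_snapshot,
                "DeleteOnTermination": delete_on_termination,
            },
        },
        {
            "DeviceName": "/dev/sdf",
            "Ebs": {
                "SnapshotId": data_snapshot,
                "DeleteOnTermination": delete_on_termination,
            },
        },
    ]
    return json.dumps(mappings)


if __name__ == "__main__":
    # Paste the printed value (wrapped in single quotes) after --block-device-mappings.
    print(block_device_mappings("snap-3decf207", "snap-eed6ac86"))
```

Generating the string with `json.dumps` avoids the nested-quote problems that come with writing the JSON inline in a shell command.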
Auto-Scaling Group
Next, we create the Auto-Scaling Group and point it at the Launch Configuration we
just made. This ASG now manages the number of instances in our Cassandra
cluster. We create a Load-Balancer and point it at the ASG which lets us send traffic
to any of the instances in that group.
$ aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name asg_demo \
    --launch-configuration-name asg_demo_config \
    --min-size 3 \
    --max-size 3 \
    --desired-capacity 3 \
    --availability-zones eu-west-2a
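To make the effect of `--min-size`, `--max-size` and `--desired-capacity` concrete, here is a toy model of the decision the group makes after a failure. It is an illustration only, not how AWS implements Auto Scaling, and the function name `instances_to_launch` is hypothetical.

```python
def instances_to_launch(healthy, min_size, max_size, desired):
    """Return how many replacement instances a group would need to start.

    Toy model: the target is the desired capacity clamped to [min_size, max_size],
    and the group launches enough instances to get back to that target.
    """
    target = max(min_size, min(desired, max_size))
    return max(0, target - healthy)


if __name__ == "__main__":
    # With min = max = desired = 3, losing one node means one replacement.
    print(instances_to_launch(healthy=2, min_size=3, max_size=3, desired=3))
```

Setting all three values to 3, as in the command above, pins the group at a fixed size, so the group is used purely for self-healing rather than for scaling on load.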
Let's see what this looks like:
[Diagram: a Load Balancer in front of an Auto Scaling Group containing Cassandra 1, Cassandra 2 and Cassandra 3, each with its own EBS volume; application replication between the Cassandra nodes]
A Problem
When running tests with this setup, we realise a fundamental flaw in our system:
If a node fails, it is automatically replaced with a new EC2 instance and EBS
volume (great), but this volume doesn't have any data. Cassandra will populate
this empty volume using a replica but this can take a significant amount of time
- which hurts our performance until complete.
The problem is that within an Auto-Scaling Group, AWS by default treats an EC2 instance and
the EBS volumes created from its launch configuration as a single, atomic unit.
This means that if the EC2 instance is terminated, the EBS drive is deleted - along
with the dataset Cassandra was using.
A new EBS drive will be created but Cassandra will have to send all the data over
the network to rebuild the dataset on that node.
This can take a long time if the dataset is large. What if we could just reuse the EBS
drive that was attached to the old node? Then most of our dataset is already there
when the new node starts up.
We realise that we need to de-couple compute and storage.
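A quick back-of-the-envelope calculation shows why the rebuild matters. The figures below (a 500 GB dataset per node, 2 GB of writes missed during the outage, 100 MB/s of sustained streaming throughput) are assumptions chosen for illustration, not measurements from this setup.

```python
def rebuild_seconds(data_gb, throughput_mb_per_s):
    """Seconds needed to stream data_gb gigabytes at the given rate (1 GB = 1000 MB)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_gb * 1000 / throughput_mb_per_s


if __name__ == "__main__":
    # Assumed figures, for illustration only.
    full_rebuild = rebuild_seconds(data_gb=500, throughput_mb_per_s=100)
    catch_up = rebuild_seconds(data_gb=2, throughput_mb_per_s=100)
    print(f"empty volume:  {full_rebuild / 60:.1f} minutes")
    print(f"reused volume: {catch_up / 60:.1f} minutes")
```

Under these assumptions the empty volume takes well over an hour to repopulate, while a reused volume only needs the writes it missed.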
[Diagram: Node Failure - the EC2 instance for one Cassandra node is deleted and its EBS drive is deleted with it. Cluster Repair - a new EC2 instance and a new, empty EBS drive are created, and the node rebuilds via application replication.]
Portworx: The Solution
By using Portworx to add a data services layer, we get a level of separation:
Auto-Scaling Groups manage EC2 instances (compute) and Portworx
manages EBS volumes (storage).
The key aspect of this solution is that when the Auto Scaling Group terminates an
EC2 instance - the EBS volume is NOT removed. More importantly, the same EBS
volume that was previously attached to an instance is re-used for the next
instance.
Let's see what this looks like:
[Diagram: a Load Balancer in front of an Auto Scaling Group in which Cassandra 1, Cassandra 2 and Cassandra 3 each use a Portworx volume (PX 1, PX 2, PX 3); the Portworx volumes are backed by EBS storage volumes (EBS 1, EBS 2, EBS 3) in a Portworx EBS pool that also holds other Portworx volumes; application replication between the Cassandra nodes]
This means our design now works because:
● data written by an instance that is terminated is not lost
● Cassandra containers re-use volumes and so already have most of their data
● rebuilding shards takes significantly less time because only the writes that
happened in the last few minutes need to be copied
This works because Portworx is a data services layer that manages
your underlying storage (EBS) and leaves the Auto-Scaling Group to manage only
the compute (EC2).
Let's compare how this works in a failure scenario:
Failover with pure ASGs
1. A single node fails in a 3-node Cassandra ring
2. The ASG creates a new EC2 instance and a new EBS volume to attach to it
3. Cassandra starts on the new node and discovers an empty volume and so
starts to rebuild from the replica
4. Once the rebuild is complete (some time later) - the cluster is back and healthy
[Diagram: Node Failure - the EC2 instance is deleted and its EBS drive is deleted with it. Cluster Repair - a new EC2 instance and a new, empty EBS drive are created, and the node rebuilds via application replication.]
Failover with ASGs plus Portworx
1. A single node fails in a 3-node Cassandra ring
2. The ASG creates only a new EC2 instance - the old EBS volume is not
deleted
3. Cassandra starts on the new node and discovers a mostly full volume - it starts
re-building to catch up with any writes that happened in the last few moments
4. Once the rebuild is complete (significantly less time later) - the cluster is back
and healthy
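The two failover paths differ only in which volume the replacement node receives. The sketch below models that difference. The `Volume` class, the `replace_node` function and the sizes used are all hypothetical and stand in for the real EBS and Cassandra behaviour.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    volume_id: str
    data_gb: float  # how much of the node's dataset is already on the volume


def replace_node(old_volume, dataset_gb, reuse_volume):
    """Return (volume given to the new node, GB Cassandra must still stream to it).

    reuse_volume=False models the pure ASG path: the old volume is gone and the
    new node starts empty. reuse_volume=True models the Portworx path: the old
    volume is re-attached and only the missing writes are streamed.
    """
    if reuse_volume:
        return old_volume, max(0.0, dataset_gb - old_volume.data_gb)
    new_volume = Volume(volume_id="new-empty-volume", data_gb=0.0)
    return new_volume, dataset_gb


if __name__ == "__main__":
    # Assumed sizes: 500 GB dataset, of which 498 GB was on the failed node's volume.
    old = Volume(volume_id="old-volume", data_gb=498.0)
    for reuse in (False, True):
        volume, to_stream = replace_node(old, dataset_gb=500.0, reuse_volume=reuse)
        print(f"reuse={reuse}: attach {volume.volume_id}, stream {to_stream} GB")
```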
[Diagram: Node Failure - the EC2 instance running Cassandra 3 is deleted, but its EBS drive (EBS 3) remains in the PX EBS pool. Cluster Repair - a new EC2 instance is created and re-uses the EBS drive that remained, so application replication only needs a minimal rebuild.]
Comparison
An EBS volume could contain hundreds of gigabytes. Being able to reuse that
existing EBS drive, with its dataset intact, means Cassandra can take an order of
magnitude less time to rebuild.
This only works because Portworx can de-couple compute from storage.
How it works
Portworx has a cluster ID and can use one of three methods to connect to the AWS
API:
● Using AWS access credentials
● Using Cloud Init
● Using Instance Privileges
Portworx is now able to create new EBS volumes on demand. As it creates these
EBS volumes, it tags them with identifying values so that, at any time, it can
enumerate the pool of EBS volumes available to an Auto-Scaling Group.
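Tag-based enumeration can be pictured as a simple filter. In the sketch below the volume records are plain dictionaries rather than real AWS API responses, and the tag key `px-cluster-id` is a made-up name; the actual tags Portworx writes are not given in this deck.

```python
def volumes_in_pool(volumes, cluster_id, tag_key="px-cluster-id"):
    """Return the volumes whose tags mark them as belonging to cluster_id.

    `volumes` is a list of simplified records of the form
    {"id": str, "tags": dict, "attached_to": str or None}; the tag key is hypothetical.
    """
    return [v for v in volumes if v["tags"].get(tag_key) == cluster_id]


def free_volumes(volumes):
    """Return the volumes that are not currently attached to any instance."""
    return [v for v in volumes if v["attached_to"] is None]


if __name__ == "__main__":
    all_volumes = [
        {"id": "vol-1", "tags": {"px-cluster-id": "demo"}, "attached_to": "i-a"},
        {"id": "vol-2", "tags": {"px-cluster-id": "demo"}, "attached_to": None},
        {"id": "vol-3", "tags": {}, "attached_to": None},
    ]
    pool = volumes_in_pool(all_volumes, "demo")
    print([v["id"] for v in pool])
    print([v["id"] for v in free_volumes(pool)])
```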
When a new instance is added to the group - Portworx does the following:
● check the pool to see if there are candidate EBS volumes that can be used
● if not - create one using the specification of the volumes already in the pool
● in both cases - the EBS volume is associated with the new instance
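The three steps above can be sketched as a small function. This is a minimal model of the decision only, assuming an in-memory pool; the names `EbsVolume` and `attach_volume`, the generated volume IDs and the `default_size_gb` fallback are hypothetical and do not reflect the Portworx implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EbsVolume:
    volume_id: str
    size_gb: int
    attached_to: Optional[str] = None  # instance ID, or None when detached


def attach_volume(pool: List[EbsVolume], instance_id: str, default_size_gb: int = 100) -> EbsVolume:
    """Give a new instance a volume: reuse a detached one, else create one."""
    # Step 1: look for a candidate volume that is not attached to anything.
    for volume in pool:
        if volume.attached_to is None:
            volume.attached_to = instance_id
            return volume
    # Step 2: no candidate, so create one matching the volumes already in the pool.
    # default_size_gb is an assumed fallback for an empty pool.
    size_gb = pool[0].size_gb if pool else default_size_gb
    volume = EbsVolume(volume_id=f"vol-sketch-{len(pool) + 1}", size_gb=size_gb)
    pool.append(volume)
    # Step 3: in both cases the volume ends up associated with the new instance.
    volume.attached_to = instance_id
    return volume


if __name__ == "__main__":
    pool = [
        EbsVolume("vol-1", 500, attached_to="i-a"),
        EbsVolume("vol-2", 500, attached_to="i-b"),
        EbsVolume("vol-3", 500),  # left behind by a terminated instance
    ]
    print(attach_volume(pool, "i-new").volume_id)
```

In the usage example the replacement instance picks up the volume left behind by the terminated one, which is the behaviour the deck relies on for a minimal rebuild.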
Using this setup - if we have a healthy 3 node Cassandra cluster and one of our
nodes dies - whilst the Auto-Scaling Group will replace the compute instance,
Portworx will reuse the storage volume.
Conclusion
By de-coupling compute from storage, we get the immense power of AWS
Auto-Scaling Groups to manage compute without worrying that our data will
disappear or that our cluster will take hours to actually scale.
To try this out - check out our documentation on AWS Auto Scaling Groups.
Visit the Portworx website to find out more!

More Related Content

Recently uploaded

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 

Recently uploaded (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Featured

How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellSaba Software
 
Introduction to C Programming Language
Introduction to C Programming LanguageIntroduction to C Programming Language
Introduction to C Programming LanguageSimplilearn
 

Featured (20)

How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
 
Introduction to C Programming Language
Introduction to C Programming LanguageIntroduction to C Programming Language
Introduction to C Programming Language
 

Using EBS with AWS auto-scaling groups and Docker

  • 1. Using EBS with Auto Scaling Groups How to use the immense power of AWS Auto-Scaling Groups for a stateful Docker application.
  • 3. In a service-oriented world where requests can come from anywhere at any time, keeping a system constantly up and available is essential to its success. When running at scale, failures happen. This is just a fact of life for modern, distributed systems. The focus should not be on trying to prevent those failures, unless you want to start a hard-disk company. Instead, we should endeavour to automatically react to those failures - restoring service quickly and with minimal impact. Background
  • 4. ASGs (Auto Scaling Groups) can help by automatically monitoring the load and health of your instances. If a node fails, it will be replaced automatically so you don't get woken up in the middle of the night with a PagerDuty alert. This post explores how to use AWS auto-scaling groups for stateful apps because special care needs to be taken when using EBS volumes. Background
  • 5. Our application allows users to post data to our API and we use Cassandra to both save and analyse the data. A Use Case CassandraREST API diskAnalytics We decide to employ one of the killer features of Cassandra - the ability to scale horizontally.
  • 6. We settle on having three nodes in our Cassandra ring. As well as providing high availability in the event of a node failure, this will also mean we distribute read queries across more CPUs and increase the total disk capacity across the cluster. We could spin up three EC2 nodes and install Cassandra using Terraform or Ansible. However, for the reasons mentioned above, we want the Cassandra cluster to auto-heal if a node fails and so we decide to use an Auto-Scaling Group. A Use Case
  • 8. There are a few steps to creating an ASG: ● Create AMI ○ creating the base image to launch instances from ● Launch Configuration ○ configure what instance size and keyname ● Auto-Scaling Group ○ manage multiple instances in the group Let's walk through this setup: Auto-Scaling Groups
  • 9. Create AMI We bake an AMI based on Ubuntu Xenial 16.04 with Docker CE 17.03 installed so we can run Cassandra inside a container. Auto-Scaling Groups $ aws ec2 create-image --instance-id ${BUILDER_INSTANCE_ID} --name myami
  • 10. Launch Configuration Then we create a launch configuration which uses our AMI and instance type `t2.large` for our group. Notice the `--block-device-mappings` field - this describes how our launch configuration will create and attach a new EBS drive to each instance. $ aws autoscaling create-launch-configuration --launch-configuration-name asg_demo_config --image-id myami --instance-type t2.large --key-name my-key --block-device-mappings "[{"DeviceName":"/dev/sda1","Ebs":{"SnapshotId":"snap-3decf207"}},{"DeviceName":"/dev/sdf","Ebs":{"SnapshotI d":"snap-eed6ac86"}}]" Auto-Scaling Groups
  • 11. Auto-Scaling Group Next, we create the Auto-Scaling Group and point at the Launch Configuration we just made. This ASG now manages the number of instances in our Cassandra cluster. We create a Load-Balancer and point it at the ASG which lets us send traffic to any of the instances in that group. $ aws autoscaling create-auto-scaling-group --auto-scaling-group-name asg_demo --launch-configuration-name asg_demo_config --min-size 3 --max-size 3 --desired-capacity 3 --availability-zones eu-west-2 Auto-Scaling Groups
  • 12. Auto Scaling Groups Cassandra 1 Load Balancer Cassandra 2 Cassandra 3 EBS EBS EBS Application Replication Auto Scaling Group Let's see what this looks like:
  • 14. When running tests with this setup, we realise a fundamental flaw in our system: If a node fails, it is automatically replaced with a new EC2 instance and EBS volume (great), but this volume doesn't have any data. Cassandra will populate this empty volume using a replica but this can take a significant amount of time - which hurts our performance until complete. The problem is that within an Auto-Scaling Group - AWS treats an EC2 instance and its associated EBS volume as a single, atomic unit. A Problem
  • 15. A Problem This means that if the EC2 instance is terminated, the EBS drive is deleted - along with the dataset Cassandra it was using. A new EBS drive will be created but Cassandra will have to send all the data over the network to rebuild the dataset on that node. This can take a long time if the dataset is large. What if we could just reuse the EBS drive that was attached to the old node? Then most of our dataset is already there when the new node starts up. We realise that we need to de-couple compute and storage.
  • 16. [Diagram in two panels. Node Failure: the EC2 instance for one Cassandra node in the Auto Scaling Group is deleted, and its EBS drive is deleted with it. Cluster Repair: a new EC2 instance and a new, empty EBS drive are created, and application replication rebuilds the data on the new node.]
  • 18. Portworx: The Solution By using Portworx to add a data services layer, we get a level of separation: Auto-Scaling Groups manage EC2 instances (compute) and Portworx manages EBS volumes (storage). The key aspect of this solution is that when the Auto Scaling Group terminates an EC2 instance, the EBS volume is NOT removed. More importantly, the same EBS volume that was previously attached to an instance is re-used for the next instance. Let's see what this looks like:
  • 19. [Diagram: a Load Balancer in front of an Auto Scaling Group running Cassandra 1, Cassandra 2 and Cassandra 3. Each node runs Portworx (PX 1, PX 2, PX 3) and uses a Portworx volume backed by an EBS storage volume (EBS 1, EBS 2, EBS 3) from the Portworx EBS pool, with application replication between the nodes. The pool can also back other Portworx volumes.]
  • 20. Portworx: The Solution This means our design now works because: ● data written by an instance that is terminated is not lost ● Cassandra containers re-use volumes and so already have most of their data ● rebuilding shards takes significantly less time because only the writes that happened in the last few minutes need to be copied This works because Portworx is a data services layer that manages your underlying storage (EBS) and leaves the Auto-Scaling Group to manage only the compute (EC2). Let's compare how this works in a failure scenario:
  • 21. Failover with pure ASGs 1. A single node fails in a 3 node Cassandra ring 2. The ASG creates a new EC2 instance and a new EBS volume to attach to it 3. Cassandra starts on the new node, discovers an empty volume and so starts to rebuild from the replica 4. Once the rebuild is complete (some time later) - the cluster is back and healthy
  • 22. [Diagram in two panels, as on slide 16. Node Failure: the EC2 instance is deleted and its EBS drive is deleted with it. Cluster Repair: a new EC2 instance and a new, empty EBS drive are created, and application replication rebuilds the data.]
  • 23. Auto scaling ASGs plus Portworx 1. A single node fails in a 3 node Cassandra ring 2. The ASG creates only a new EC2 instance - the old EBS volume is not deleted 3. Cassandra starts on the new node and discovers a mostly full volume - it starts re-building to catch up with any writes that happened in the last few moments 4. Once the rebuild is complete (significantly less time later) - the cluster is back and healthy
  • 24. [Diagram in two panels. Node Failure: the EC2 instance running Cassandra 3 and PX 3 is deleted, but its EBS drive (EBS 3) remains in the Portworx EBS pool. Cluster Repair: a new EC2 instance is created and re-attached to the same EBS drive, so application replication only needs a minimal rebuild.]
  • 25. Comparison An EBS volume could contain hundreds of gigabytes. Being able to reuse that existing EBS drive, with its dataset intact, means Cassandra takes an order of magnitude less time to rebuild. This only works because Portworx can de-couple compute from storage.
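To make the comparison concrete, here is a rough back-of-the-envelope sketch. All three figures (dataset size, missed writes and streaming rate) are hypothetical values chosen for illustration, not measurements from the slides, and the model ignores everything except raw transfer time.

```python
def transfer_seconds(gigabytes, megabytes_per_second):
    """Seconds needed to copy `gigabytes` at a sustained rate (1 GB = 1000 MB)."""
    return gigabytes * 1000 / megabytes_per_second


# Hypothetical figures for illustration only; not measurements.
DATASET_GB = 500        # data held by the failed node
MISSED_WRITES_GB = 5    # writes that arrived while the node was down
STREAM_MB_PER_S = 100   # sustained rate at which replicas stream data

# Pure ASG: the new volume is empty, so the whole dataset is streamed.
full_rebuild = transfer_seconds(DATASET_GB, STREAM_MB_PER_S)

# Reused volume: only the writes missed during the outage are streamed.
catch_up = transfer_seconds(MISSED_WRITES_GB, STREAM_MB_PER_S)

print(f"full rebuild: {full_rebuild / 60:.0f} min")
print(f"catch-up:     {catch_up:.0f} s")
```

With the same streaming rate in both cases, the speed-up is simply the ratio of the dataset size to the amount of missed writes, so the benefit grows with the size of the dataset.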
  • 26. How it works Portworx has a cluster ID and can use one of three methods to connect to the AWS API: ● Using AWS access credentials ● Using Cloud Init ● Using Instance Privileges Portworx is now able to create new EBS volumes on demand. As it creates these EBS volumes, it tags them with identifying values so that, at any time, it can enumerate the pool of EBS volumes available to an Auto-Scaling Group.
  • 27. How it works When a new instance is added to the group, Portworx does the following: ● checks the pool to see if there are candidate EBS volumes that can be used ● if there are none, creates one using the specification of the volumes already in the pool ● in both cases, associates the EBS volume with the new instance
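The steps above can be sketched as a small in-memory simulation. This is not Portworx's implementation or API: the `Volume` class, the `volume_for_instance` function, the `cluster_id` tag field and the `DEFAULT_SIZE_GB` fallback are all hypothetical names introduced here to illustrate the reuse-or-create logic.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_SIZE_GB = 100  # hypothetical fallback when the pool has no volumes yet


@dataclass
class Volume:
    volume_id: str
    size_gb: int
    cluster_id: str                    # stands in for the identifying tag
    attached_to: Optional[str] = None  # instance ID, or None when free


def volume_for_instance(pool: List[Volume], cluster_id: str,
                        instance_id: str) -> Volume:
    """Reuse a free volume tagged for this cluster, or create one like its peers."""
    tagged = [v for v in pool if v.cluster_id == cluster_id]
    free = [v for v in tagged if v.attached_to is None]
    if free:
        volume = free[0]  # a candidate exists: reuse it, data intact
    else:
        # No candidate: copy the specification of a volume already in the pool.
        size = tagged[0].size_gb if tagged else DEFAULT_SIZE_GB
        volume = Volume(f"vol-{len(pool) + 1}", size, cluster_id)
        pool.append(volume)
    volume.attached_to = instance_id  # in both cases, associate with the instance
    return volume


pool = [
    Volume("vol-1", 100, "cassandra", "i-1"),
    Volume("vol-2", 100, "cassandra", "i-2"),
    Volume("vol-3", 100, "cassandra", "i-3"),
]

pool[1].attached_to = None  # instance i-2 is terminated; its volume survives
reused = volume_for_instance(pool, "cassandra", "i-4")   # replacement node
created = volume_for_instance(pool, "cassandra", "i-5")  # scale-out, no free volume
print(reused.volume_id, created.volume_id)
```

The replacement instance picks up the volume left behind by the terminated one, while a genuinely new instance gets a fresh volume sized like the others in the pool.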
  • 28. How it works Using this setup - if we have a healthy 3 node Cassandra cluster and one of our nodes dies - whilst the Auto-Scaling Group will replace the compute instance, Portworx will reuse the storage volume.
  • 30. Conclusion By de-coupling compute from storage, we get the immense power of AWS Auto-Scaling Groups to manage compute without worrying that our data will disappear or that our cluster will take hours to actually scale. To try this out, check out our documentation on AWS Auto Scaling Groups.
  • 31. Using EBS with Auto Scaling Groups Visit the Portworx website to find out more!