Autoscaling your
Nomad jobs
stackconf online 2021
➔ Used to be a Molecular Biologist,
➔ Then became a Dev,
➔ Now an Ops.
➔ Currently CTO @ Hot Potatoes
Moving it all to the cloud
Vertical Scaling
Horizontal Scaling / Load Balancers
And then stuff got complicated…
Nomad
job "blog" {
  datacenters = ["aws"]
  type        = "service"

  group "hugo" {
    network {
      port "http" {
        to = 80
      }
    }

    task "nginx" {
      driver = "docker"

      config {
        image = "${PRIVATE}.dkr.ecr.us-east-1.amazonaws.com/blog:19"
        ports = ["http"]
      }
    }
  }
}
Deploy the blog
job "blog" {
  group "hugo" {
    count = 2

    service {
      name = "blog"
      tags = ["traefik.enable=true"]
      port = "http"

      check {
        type     = "tcp"
        interval = "10s"
        timeout  = "2s"
      }
    }
  }
}
1 == None
job "blog" {
  datacenters = ["aws"]
  type        = "service"

  group "hugo" {
    count = 2

    constraint {
      operator = "distinct_hosts"
      value    = "true"
    }
  }
}
Force onto different hardware
job "blog" {
  datacenters = ["aws"]
  type        = "service"

  group "hugo" {
    count = 2

    spread {
      attribute = "${meta.rack}"

      target "his" {
        percent = 50
      }

      target "her" {
        percent = 50
      }
    }
  }
}
Suggest onto different hardware
/etc/nomad.d/config.hcl
client {
  enabled = true

  meta {
    "rack" = "his"
  }
}
Based on custom meta-data
● Introduced with Nomad 0.11
● (Currently) independent release cycle
● Gaining new functionality every release
● Built-in functionality for horizontal and vertical scaling
● Extendable with your own (or community) plugins
Nomad-Autoscaler
● Makes decisions based on checks
● Checks are a combination of
  – Data queried from an APM
  – A defined STRATEGY
  – An attempt to approach a TARGET value
● Multiple checks can be combined
● The answer with the most resources wins!
● ScaleOut and ScaleIn => ScaleOut
● ScaleOut and ScaleNone => ScaleOut
● ScaleOut(10) and ScaleOut(9) => ScaleOut(10)
Nomad-Autoscaler TLDR
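The "most resources wins" rule above can be sketched in a few lines of Python. This is an illustration only, not the autoscaler's own code: the `Action` class and `pick_winner` function are hypothetical names, and each check is assumed to report the instance count it would like to see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical result of one check: the count that check asks for."""
    check: str
    desired_count: int


def pick_winner(actions):
    """Return the action that asks for the most resources."""
    return max(actions, key=lambda a: a.desired_count)


# Current count is assumed to be 5 in these examples.
scale_out = Action("cpu", 10)
scale_in = Action("memory", 3)
scale_none = Action("sessions", 5)

print(pick_winner([scale_out, scale_in]).check)        # ScaleOut and ScaleIn
print(pick_winner([scale_out, scale_none]).check)      # ScaleOut and ScaleNone
print(pick_winner([scale_out, Action("other", 9)]).desired_count)
```

Read literally, the same rule would also let a "no change" answer outrank a scale-in. That is an inference from the slide's wording, not something the slide states.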
job "autoscaler" {
  type = "service"

  group "autoscaler" {
    task "autoscaler" {
      driver = "docker"

      config {
        image   = "hashicorp/nomad-autoscaler:0.3.3"
        command = "nomad-autoscaler"
        args = [
          "agent",
          "-config",
          "${NOMAD_TASK_DIR}/config.hcl",
          "-http-bind-address",
          "0.0.0.0",
        ]
      }
    }
  }
}
Deploy the autoscaler
/etc/nomad.d/config.hcl
nomad {
  address = "http://{{env "attr.unique.network.ip-address" }}:4646"
}

apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://prometheus.service.consul:9090"
  }
}

strategy "target-value" {
  driver = "target-value"
}
Config for the autoscaler
Metrics
https://prometheus.io/
group "hugo" {
  count = 3

  scaling {
    enabled = true
    min     = 1
    max     = 20

    policy {
      cooldown = "20s"

      check "avg_instance_sessions" {
        source = "prometheus"
        query  = "scalar(avg(traefik_service_open_connections{service=\"blog@consulcatalog\"}))"

        strategy "target-value" {
          target = 5
        }
      }
    }
  }
}
Enable autoscaling for the blog
Dashboards
https://grafana.com/oss/grafana/
Enable autoscaling
Observe scaling down event
Observe the autoscaler
agent: querying APM: policy_id=248f6157-ca37-f868-a0ab-cabbc67fec1d source=prometheus strategy=target-value target=local-nomad
agent: calculating new count: policy_id=248f6157-ca37-f868-a0ab-cabbc67fec1d source=prometheus strategy=target-value target=local-nomad
agent: next count outside limits: policy_id=248f6157-ca37-f868-a0ab-cabbc67fec1d source=prometheus strategy=target-value target=local-nomad from=3 to=0 min=1 max=10
agent: updated count to be within limits: policy_id=248f6157-ca37-f868-a0ab-cabbc67fec1d source=prometheus strategy=target-value target=local-nomad from=3 to=1 min=1 max=10
agent: scaling target: policy_id=248f6157-ca37-f868-a0ab-cabbc67fec1d source=prometheus strategy=target-value target=local-nomad target_config="map[group:demo job_id:webapp]" from=3 to=1 reason="capping count to min value of 1"
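The capping step in the log above (from=3 to=0, then to=1 because min=1) can be sketched as follows. This is a hedged illustration: `clamp_count` is a hypothetical helper, and the "max" reason string simply mirrors the wording of the "min" message in the log; it is an assumption, not copied from real output.

```python
def clamp_count(desired, minimum, maximum):
    """Keep a desired count inside the policy's min/max limits.

    Returns the (possibly adjusted) count and a reason string,
    which is empty when no capping was needed.
    """
    if desired < minimum:
        return minimum, f"capping count to min value of {minimum}"
    if desired > maximum:
        # Assumed wording, modelled on the min message in the log.
        return maximum, f"capping count to max value of {maximum}"
    return desired, ""


# Values taken from the log: the strategy asked for 0, limits are 1..10.
count, reason = clamp_count(0, 1, 10)
print(count, reason)
```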
Apply load
hey -z 1m -c 30 http://127.0.0.1:8000
Remove load
Logs
https://grafana.com/oss/loki/
group "autoscaler" {
  task "autoscaler" {
    driver = "docker"

    config {
      image   = "hashicorp/nomad-autoscaler:0.3.3"
      command = "nomad-autoscaler"

      logging {
        type = "loki"

        config {
          loki-url = "http://loki.service.consul:3100/api/prom/push"
          tag      = "loki"
        }
      }
    }
  }
}
Directly to loki
docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
task "promtail" {
  driver = "docker"

  lifecycle {
    hook    = "prestart"
    sidecar = true
  }

  config {
    image = "grafana/promtail:2.2.1"
    args = [
      "-config.file",
      "${NOMAD_TASK_DIR}/promtail.yaml",
    ]
  }
}
Promtail sidecar
${NOMAD_TASK_DIR}/promtail.yaml
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          task: autoscaler
          __path__: /alloc/logs/autoscaler*
    pipeline_stages:
      - match:
          selector: '{task="autoscaler"}'
          stages:
            - json:
                expressions:
                  policy_id: '"@policy_id"'
                  source: '"@source"'
                  strategy: '"@strategy"'
                  target: '"@target"'
                  group: '"@group"'
                  job: '"@job"'
                  namespace: '"@namespace"'
Promtail sidecar
https://grafana.com/docs/loki/latest/clients/promtail/
Annotate your graphs
Correlate events with metrics
https://learn.hashicorp.com/tutorials/nomad/autoscaler-vagrant-demo?in=nomad/ecosystem
Try it yourself
Moving it all to the cloud *
apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://prometheus.service.consul:9090"
  }
}

target "aws-asg" {
  driver = "aws-asg"
  config = {
    aws_region = "{{ $x := env "attr.platform.aws.placement.availability-zone" }}{{ $length := len $x |subtract 1 }}{{ slice $x 0 $length}}"
  }
}
Grow into your platform
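The `aws_region` template above is terse, so here is what it appears to do, restated in Python: take the node's availability zone and drop the last character to get the region. This is a sketch of that reading only. `region_from_az` is a hypothetical name, and the example zone is an assumption chosen to match the `us-east-1` region used earlier in the deck.

```python
def region_from_az(availability_zone):
    """Derive the AWS region by dropping the zone's trailing letter.

    Mirrors the template: length = len(x) - 1, then slice x from 0 to length.
    """
    length = len(availability_zone) - 1
    return availability_zone[0:length]


print(region_from_az("us-east-1a"))  # assumed example zone
```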
scaling "cluster_policy" {
  policy {
    cooldown            = "2m"
    evaluation_interval = "1m"

    check "cpu_allocated_percentage" {
      source = "prometheus"
      query  = "scalar(sum(nomad_client_allocated_cpu{node_class=\"hashistack\"}*100/(nomad_client_unallocated_cpu{node_class=\"hashistack\"}+nomad_client_allocated_cpu{node_class=\"hashistack\"}))/count(nomad_client_allocated_cpu{node_class=\"hashistack\"}))"

      strategy "target-value" {
        target = 70
      }
    }

    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "${client_asg_name}"
      node_class          = "hashistack"
      node_drain_deadline = "5m"
    }
  }
}
Grow into your platform
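The `cpu_allocated_percentage` query is easier to read as plain arithmetic: for every client node, compute allocated * 100 / (allocated + unallocated), then average those percentages across nodes. A minimal Python sketch of that calculation follows. The function name and the sample node values are hypothetical, chosen only to make the numbers easy to follow.

```python
def allocated_cpu_percentage(nodes):
    """Average per-node CPU allocation percentage.

    nodes is a list of (allocated, unallocated) pairs, one per client node,
    standing in for the two Prometheus series used in the query.
    """
    per_node = [alloc * 100 / (alloc + unalloc) for alloc, unalloc in nodes]
    # sum(...) / count(...) in the query is the mean over nodes.
    return sum(per_node) / len(per_node)


# Two made-up nodes: one 75% allocated, one 25% allocated.
print(allocated_cpu_percentage([(3000, 1000), (1000, 3000)]))
```

The policy then tries to keep this value near the target of 70.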
Observe the autoscaler again
agent.worker.check_handler: querying source: check=mem_allocated_percentage policy_id=bf68649a-d087-2e69-362e-bbe71b5544f7 source=prometheus strategy=target-value target=aws-asg query=scalar(sum(nomad_client_allocated_memory{node_class="hashistack"}*100/(nomad_client_unallocated_memory{node_class="hashistack"}+nomad_client_allocated_memory{node_class="hashistack"}))/count(nomad_client_allocated_memory))
agent.worker.check_handler: calculating new count: check=mem_allocated_percentage policy_id=bf68649a-d087-2e69-362e-bbe71b5544f7 source=prometheus strategy=target-value target=aws-asg count=1 metric=95.17948717948718
agent.worker.check_handler: scaling target: check=mem_allocated_percentage policy_id=bf68649a-d087-2e69-362e-bbe71b5544f7 source=prometheus strategy=target-value target=aws-asg from=1 to=2 reason="scaling up because factor is 1.359707" meta=map[]
internal_plugin.aws-asg: successfully performed and verified scaling out: action=scale_out asg_name=hashistack-nomad_client desired_count=2
agent.worker.check_handler: successfully submitted scaling action to target: check=mem_allocated_percentage policy_id=bf68649a-d087-2e69-362e-bbe71b5544f7 source=prometheus strategy=target-value target=aws-asg desired_count=2
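The numbers in this log can be reproduced by hand. With a target of 70 (as in the CPU check shown earlier, assumed here to apply to the memory check too), the factor is metric / target, and the new count is the current count multiplied by that factor. Rounding up to a whole instance is an assumption in this sketch; it is consistent with from=1 to=2 in the log, but the slides do not spell it out. `target_value_count` is a hypothetical name.

```python
import math


def target_value_count(current_count, metric, target):
    """Sketch of a target-value calculation.

    factor > 1 means the metric is above target (scale out),
    factor < 1 means it is below target (scale in).
    """
    factor = metric / target
    # Assumption: round up so a partial instance becomes a whole one.
    new_count = math.ceil(current_count * factor)
    return factor, new_count


# count and metric come from the log; target=70 is assumed from the CPU check.
factor, new_count = target_value_count(1, 95.17948717948718, 70)
print(f"{factor:.6f}", new_count)  # → 1.359707 2
```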
https://github.com/hashicorp/nomad-autoscaler/tree/master/demo/remote
Try it yourself
Moving it all to the cloud – QED
bram@attachmentgenie.com
@attachmentgenie
slideshare.net/attachmentgenie
Thank You
