Using Ansible at Scale to Manage a Public Cloud

Jesse Keating

An overview of three scale challenges at Rackspace and how key Ansible features help us solve them.

Technology

Jesse Keating – Linux Systems Engineer IV – Cloud Servers
@iamjkeating
Using Ansible at Scale to Manage
a Public Cloud
06/13/2013 – AnsibleFest

RACKSPACE® HOSTING | WWW.RACKSPACE.COM
Rackspace cares about scale
● Scale of server systems
● Scale of environments
● Scale of engineers

Rackspace Public Cloud
● 4 “Production” regions
– 1 to 8 cells per region
– 250 to 500 nodes per cell
● Nearly 15K “systems” in production
● Another 500~ in CI/pre-production
● Mixed use of copy-pasta pssh scripts, pre-configured
agent actions, jenkins automation, and host-based
config management
● Managed by admins, engineers, developers

Case study: Hotpatch One Production
Environment
● 3900~ compute-nodes
– Spread across 8 cells
– Out of 6000~ total hosts
● Alerting will flood admins
● Output is hard to parse

Ansible Key Features
● Inventory plugin
● Simple process flow
● Reusable playbooks with variable adjustments
● Avoids repeated actions on downed hosts
● Cleaner output

Ansible Use
● Replacing use of pssh for Random Tasks
● Replacing use of pssh for Expected Tasks (outside
config management)
● Reuse existing inventory content
● Easily bolt together processes such as disabling nagios
alerts prior to execution

Rackspace OpenStack Development
● At least 7 major software projects
– Different feature schedules within each
● One Continuous Integration environment
● One Pre-production environment
● One branch of code that can easily be deployed
● New code deploys every two weeks

Case Study: Create production-like
environment to test disruptive product code
change
● 30~ virtual instances
– DB servers
– Rabbit servers
– Service providers
● 40~ capacity nodes
– Hypervisor + nova-compute VM
● Mixed use of fabric, shell scripts, copy-pasta
● No self service

Ansible Key Features
● Intermix local actions and remote actions
● External inventory plugin
● Start from nothing
● API to use directly within another application

Ansible Use
● Replacing use of fabric, pssh, copy-pasta
● Bootstrapping environment to the point where existing
config management can take over
● Freeing up Engineer time by making it self-service
● Freeing up resources by tearing down environments
after use
● Working toward using same process to build out
production environments

Rackspace Engineering
● Between 4K and 6K employees/contractors
● Between 500 and 1K Engineer/Developer types
● Many dozens of summer interns
● Countless groups
● Countless projects
● Rapid team creation / shifting of resources
● Mixed use of Mac OS X and Linux
● Mixed use of automation, configuration, et al tools
● Disjoint ownership of engineering onboarding

Case study: Ozone Onboard
● 30+ git repos
● 5+ utilities w/ configuration
● Permissions to a plethora of services
● Configuration for CI/preprod/prod environments
● Details scattered throughout wiki pages and tribal
knowledge

Ansible Key Features
● Modular Roles
● Minimal dependencies
● OS agnostic
● Idempotent
● Fast
● Easy to use and extend

Ansible Use
● Developer bootstraps their own system by selecting
roles and providing details
● Teams own role definitions within a shared framework
● Repeatable process
– Ansible playbook to clone/update roles
– Second playbook to process roles

Conclusion
● Ansible solves many problems Rackspace faces
● Chip away at edges with Ansible, perhaps one day
replace existing config management systems with
Ansible
● Continue to assist in development of Ansible
modules, plugins, and scale testing
● Launch Ansibox soon!

RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218
US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM
RACKSPACE® HOSTING | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. |
WWW.RACKSPACE.COM

Using Ansible at Scale to Manage a Public Cloud

3. Scale of Server Systems

7. Need to change

8. .. and

9. to...

11. So we can do...

12. Or this

14. Scale of Environments

18. Start with localhost prep

19. Local actions to boot instances

20. Remote actions on hosts

21. Existing yaml for host vars

24. Scale of Engineers

28. Overview of Ansibox

29. User edited file

30. Top level playbook

31. Generated Playbook

32. Making it go

33. Ozone Tasks

Editor's notes

I'm Jesse Keating. I work at Rackspace, and I'm going to talk about what Rackspace does with Ansible.
At Rackspace we care about scale: the number of server systems, the number of product environments, and the number of engineers doing awesome things at Rackspace. I'm going to cover three scale challenges with three case studies that highlight the key Ansible features that have made it my go-to tool in the box.
First is the scale of servers. I work in the Rackspace Public Cloud product group. We have... It is a lot to handle. We have existing inventory files for use with pssh and similar tools. Admins worry about what's there, engineers work on growing capacity and automation, and developers work on new code and new tools to deploy code. We all work together: DevOps.
A real-world example from a couple of days ago: we needed to copy one file out to the nova-compute VMs and restart the nova-compute service. We wanted to avoid flooding the admins with alerts, and we wanted easy-to-read output to know what happened. Before, this would have meant manual actions on the nagios hosts, a bash script around pssh, lots of output noise, and repeated delays on inactive hosts.
Key things Ansible brings to the party
Example of existing inventory contents. Regions with cells with groups
More
JSON output that Ansible can use: groups of groups, group_vars, addresses.
A fairly simple Python script to hand to Ansible (but it can be anything, so long as it hands back JSON).
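The script on the slide is not reproduced in these notes, so the following is only a minimal sketch of what such an external inventory script could look like. It assumes the usual contract where `--list` prints every group as JSON and `--host NAME` prints that host's variables. The `REGIONS` data, the group naming scheme, and all addresses are made up for illustration.

```python
#!/usr/bin/env python
"""Sketch of an external inventory script; all names and addresses are made up."""
import argparse
import json

# Hypothetical stand-in for the existing region/cell inventory files.
REGIONS = {
    "region1": {
        "cell0001": {"compute": ["10.0.1.10", "10.0.1.11"], "api": ["10.0.1.2"]},
        "cell0002": {"compute": ["10.0.2.10"], "api": ["10.0.2.2"]},
    },
}


def build_inventory(regions):
    """Return the group dictionary printed for `--list`."""
    inventory = {}
    for region, cells in regions.items():
        # A region is a group of groups: its children are the cells.
        inventory[region] = {"children": sorted(cells), "vars": {"region": region}}
        for cell, roles in cells.items():
            cell_children = []
            for role, hosts in roles.items():
                group = "%s-%s-%s" % (region, cell, role)
                inventory[group] = {"hosts": list(hosts)}
                cell_children.append(group)
                # Also collect every host of a role across all cells.
                inventory.setdefault(role, {"hosts": []})["hosts"].extend(hosts)
            inventory[cell] = {"children": sorted(cell_children), "vars": {"cell": cell}}
    return inventory


def host_vars(regions, host):
    """Return per-host variables for `--host`; empty if the host is unknown."""
    for region, cells in regions.items():
        for cell, roles in cells.items():
            for role, hosts in roles.items():
                if host in hosts:
                    return {"region": region, "cell": cell, "role": role}
    return {}


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args(argv)
    if args.host:
        print(json.dumps(host_vars(REGIONS, args.host)))
    else:
        print(json.dumps(build_inventory(REGIONS), indent=2))


if __name__ == "__main__":
    main()
```

Saved as an executable file, a script like this would be passed to `ansible` or `ansible-playbook` with `-i`. Keeping the data loading separate from the JSON shaping means the existing pssh inventory files could be read in place of the hard-coded dictionary.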
Silly example of a one-off task
Actual playbook used to hot-patch production
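The playbook itself is not captured in these notes. As a rough illustration of why a task list avoids the repeated delays on inactive hosts mentioned earlier, here is a toy model in Python: once a host fails a step, it is dropped from every later step. This is not Ansible's code; `run_play`, the task functions, and the host names are all hypothetical.

```python
def run_play(hosts, tasks):
    """Run tasks in order; a host that fails a task is skipped for later tasks.

    `tasks` is a list of (name, callable) pairs; each callable takes a host
    name and returns True on success. Returns (results, failed).
    """
    active = list(hosts)
    failed = {}
    results = {host: [] for host in hosts}
    for name, action in tasks:
        still_active = []
        for host in active:
            try:
                ok = action(host)
            except Exception:
                ok = False
            if ok:
                results[host].append(name)
                still_active.append(host)
            else:
                failed[host] = name  # remember where the host dropped out
        active = still_active
    return results, failed


# Toy run: "compute2" is down, so the copy fails there.
DOWN = {"compute2"}
attempts = []


def copy_patch(host):
    attempts.append(("copy", host))
    return host not in DOWN


def restart_service(host):
    attempts.append(("restart", host))
    return True


results, failed = run_play(
    ["compute1", "compute2", "compute3"],
    [("copy patch", copy_patch), ("restart nova-compute", restart_service)],
)
for host in sorted(results):
    status = "FAILED at '%s'" % failed[host] if host in failed else "ok"
    print("%s: %s" % (host, status))
```

In this run the restart is never attempted on `compute2`, and the per-host summary at the end stands in for the cleaner output the slides mention.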
This is how we're using Ansible RIGHT NOW with our production environment. We're building up a toolbox as we go.
Next I want to talk about the scale of our environments. Again I'll be focusing on our public cloud, which is powered by OpenStack. Stop me when you spot the problem. Servers, block storage, object storage, networks, auth, usage, etc. CI is really just for automated tests to gauge health. There are way too many moving parts for one pre-production environment, which puts timely code deploys at risk. It is not easy to deploy from a personal branch or fork.
What we want to do is build out pre-production environments for each group or individual developer. That is a big task. Before, it could be days or weeks before an environment could be created, and then it could sit unused for long periods of time. Devs couldn't do it themselves; engineers had to find time to fit it in.
Why we went with Ansible to back this service
Apologies for the puppet/mco stuff here, but that is what is pre-existing. Localhost actions prepare files for the new hosts.
Use the host loop to parallelize host boot-up in one of our internal Nova environments. Eventually this will use the rax module, which could do the DNS step for us.
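The idea of letting the per-host loop do the parallel work can be sketched in plain Python: one boot call per inventory host, all issued concurrently from the local machine. This is an analogy rather than the playbook from the slide; `boot_instance` is a hypothetical placeholder and does not talk to any real API.

```python
from concurrent.futures import ThreadPoolExecutor


def boot_instance(name):
    """Hypothetical placeholder for a call that boots one instance."""
    # A real implementation would call the cloud API and wait for the
    # instance to become active; here we only return a fake record.
    return {"name": name, "status": "ACTIVE"}


def boot_all(names, boot=boot_instance, max_workers=10):
    """Boot every named instance concurrently; records come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(boot, names))


records = boot_all(["db1", "rabbit1", "api1"])
for record in records:
    print("%(name)s: %(status)s" % record)
```

`max_workers` plays roughly the role that the fork count plays for a playbook run: it caps how many boots are in flight at once.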
Now do some actions on the remote hosts. This is not showing everything, and it is still in development.
Inventory files look a little different here, with more details per host. They make use of some YAML syntax to have defaults that can be overridden.
Plugin to read the files, and use --host
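A minimal sketch of the defaults-plus-overrides idea, assuming the YAML files have already been parsed into dictionaries. The keys, values, and host names below are invented for illustration, and the real plugin's file layout is not shown in these notes.

```python
import json

# Hypothetical content, shown as the dictionaries a YAML parser would return.
DEFAULTS = {"flavor": "2GB", "image": "base-image", "puppet_role": "none"}
HOSTS = {
    "db1": {"puppet_role": "db", "flavor": "8GB"},
    "rabbit1": {"puppet_role": "rabbit"},
    "api1": {},
}


def vars_for_host(name, hosts=HOSTS, defaults=DEFAULTS):
    """Merge the shared defaults with one host's overrides."""
    if name not in hosts:
        return {}
    merged = dict(defaults)      # start from a copy of the defaults...
    merged.update(hosts[name])   # ...then let per-host values win
    return merged


def answer_host_query(name):
    """Return the JSON string an inventory plugin would print for `--host NAME`."""
    return json.dumps(vars_for_host(name), sort_keys=True)


print(answer_host_query("db1"))
```

Copying the defaults before updating keeps the shared dictionary untouched, so one host's overrides cannot leak into the next lookup.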
What could take days or weeks to get done can now take minutes. We are automating the part that isn't already automated, filling the gap. We will hook it into a web service where developers can make a reservation and provide input as to what they want deployed. There is significant overlap with the process to roll out new production environments, which is the obvious next step.
Finally, let's talk about the scale of our Engineering organization(s). There are no hard rules about what tech must be used; best practices bubble up. It is a real challenge to bring on new employees, and worse to bring on an intern and make the most use of their time.
Once more I'm talking about our cloud group, ozone. This is not the full story, but it gives some idea of what has to happen. It took me weeks to get fully set up, and I think I'm still missing some stuff, exacerbated by being remote and sometimes off-hours from the main group.
How can Ansible help here?
Ansibox is a project I'm working on personally to help with onboarding. Taking inspiration from Github's Boxen project. Roles are where the magic happens.
Engineers should have to give limited input to Ansibox in order for Ansibox to be able to perform the setup. These could be prompted for in the future. Engineer names a role and provides a location to find that role.
The top level playbook fetches all the roles, can update them optionally. Generates another playbook to actually go through and apply the roles to the host. Generated playbook comes from a template and is very simple.
Here is a look at it after it gets generated. It sets sudo to no at this level; each task in each role can decide to use sudo if its author wants it.
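Since the generated playbook is not reproduced here, the following is a guess at its general shape, rendered from a template in Python. The template text and the role names are hypothetical. The `sudo: no` line follows the note above and the Ansible syntax of that era; newer releases use `become` instead.

```python
from string import Template

# Hypothetical template; the real one in Ansibox is not reproduced in these notes.
PLAYBOOK_TEMPLATE = Template(
    "---\n"
    "- hosts: localhost\n"
    "  connection: local\n"
    "  sudo: no\n"
    "  roles:\n"
    "$role_lines"
)


def render_playbook(role_names):
    """Return playbook text that applies each named role to localhost."""
    role_lines = "".join("    - %s\n" % name for name in role_names)
    return PLAYBOOK_TEMPLATE.substitute(role_lines=role_lines)


playbook_text = render_playbook(["ozone", "personal"])
print(playbook_text)
```

Generating the play from the user's role list keeps the generated file trivial, which matches the description of it as "very simple".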
A very simple start to an ansibox executable. Two playbooks are necessary due to Ansible's design. The prompt is there for the second play in case any role wants sudo.
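A sketch of such a wrapper, assuming it does nothing more than run the two playbooks back to back. The playbook file names are hypothetical. `-K` is the `ansible-playbook` option that prompts for the privilege-escalation (sudo) password.

```python
import subprocess


def build_commands(fetch_playbook="fetch_roles.yml", generated_playbook="generated.yml"):
    """Return the two ansible-playbook invocations, in order."""
    return [
        # First run: clone or update the roles and write the generated playbook.
        ["ansible-playbook", fetch_playbook],
        # Second run: apply the roles; -K prompts for a sudo password.
        ["ansible-playbook", "-K", generated_playbook],
    ]


def run(commands, runner=subprocess.check_call):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        runner(command)


commands = build_commands()
for command in commands:
    print(" ".join(command))
```

Calling `run(commands)` would execute both steps; `subprocess.check_call` raises on a non-zero exit, so the second playbook never runs against a half-fetched set of roles.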
This is the start of a task list for the ozone role. Repos get cloned, tools get installed, and configuration files get put into place. Here we could also check for permissions to services and prompt the engineer on what to do to gain access.
With this system it becomes easy for an engineer to bootstrap a system, and easy for a group to own that process for the group. Engineers can also add their own roles for personal setups, and be unafraid to refresh devices. Engineers can also contribute to the system as gaps are found.