Deploying OpenStack with Ansible
1. Created by: Kevin Carter & Curtis Collicutt
OS-Ansible-Deployment
Deploying OpenStack with Ansible
presentation > osad <<EOP
2. www.rackspace.com 2
Who am I?
Kevin Carter
● Developer at the Rackspace Private Cloud
● Open source activist
● Cloud operations junkie
● Python aficionado
● Recovering rubyist
● Beer lover
● Soccer fanatic
Who am I?
Curtis Collicutt
● Lead OpenStack Engineer @ AURO
● Information Security
● Storage systems
● How do computers even?
● Films
OSAD and what we’re about
● Deployer Experience
● Vanilla OpenStack
● Scalability
● Stability
Why are we here?
In late 2013, the Rackspace Private Cloud team set out to
solve our common deployment, maintenance, scalability, and
stability problems.
Distribution packaging of OpenStack
● Out of date packages
● Out of band configuration
● Packages include proprietary patches
● Time to bug resolution is longer than it should be
● Broken dependencies
Available deployment tooling
● Maybe, sometimes, sorta, eventually “consistent”, kinda?
● Upgrades difficult or impossible
● Steep learning curve
Legacy architecture does not scale
● Almost all deployment systems reference an architecture that suffers from the “controller 1 / controller 2” model
● VIP failover for OpenStack supporting services is bound to break, and when it does it’ll break spectacularly!
What we devised
A source-based installation of OpenStack, built within LXC
containers, using a multi-master architecture orchestrated
and deployed via Ansible.
Why Ansible?
● Community engagement
● Orchestration
● Almost no code
● Low barrier to entry
● Crazy powerful, stupid simple
What is OSAD?
OSAD == OpenStack Ansible Deployment
● Uses LXC containers to isolate components and services
● Deploys OpenStack from upstream sources
● Runs on Ubuntu 14.04
● Built for production
● No proprietary secret sauce
○ But you could bolt on as much as you want
● Created following the KISS principle
● All Ansible tasks and roles target multiple nodes, even if that number is a multiple of one (1)
○ EVERYTHING is tagged!
● Process separation on infrastructure components (controller nodes)
○ Microservice-like, where it makes sense
OSAD architecture
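The tagging convention above can be sketched as a role task file; the task and tag names below are illustrative, not taken from the actual roles:

```yaml
# Illustrative tasks/main.yml: every task carries namespaced tags, so a
# deployer can re-run one logical slice of a role, e.g.
#   openstack-ansible setup-everything.yml --tags galera-config
- name: Drop galera config
  template:
    src: my.cnf.j2
    dest: /etc/mysql/my.cnf
  tags:
    - galera-config
    - galera
```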
● Galera multi-master cluster
● RabbitMQ with mirrored queues and deterministic sorting of the master queues
● “Cheese shop” (PyPI) index built for your environment and stored within it
OSAD infrastructure components
● OSAD does not special-case the “all in one” deployment
○ LXC enables the base system to deploy a multi-node cloud even with only one physical node
○ An AIO in our gate job emulates a 32 node cloud
● Neutron with the Linux Bridge agent offers stability and supportability
○ Open vSwitch is feature-full but Linux Bridge “just works”™
OSAD scale
Community project
● We support Juno and Icehouse, but the code contains Rackspace-isms
● Kilo is our first “community” release of OSAD
● 41 contributors presently in the project
○ Not all Rackers
Community project
We take our role within the community seriously!
# Lines of change between Juno and Kilo
git diff --stat juno kilo
1158 files changed, 39061 insertions(+), 81368 deletions(-)
● Deployer experience: Ansible
● Vanilla OpenStack: Source-based installation
● Scalability: Built within LXC containers
● Stability: Obviously!
OSAD and what we’re about
OSAD configuration
● OSAD configuration is your window into inventory
○ lives in /etc/openstack_deploy
● Dynamic inventory generated via config
● Compatible with Ansible static inventory
● Execution made simple using the openstack-ansible wrapper.
OSAD deployment
# Change to the playbooks directory
cd /opt/os-ansible-deployment/playbooks
# Open your favorite terminal multiplexer
tmux new -s osad-deployment
# Do all the things!
openstack-ansible setup-everything.yml
Go get coffee|food|beer, this will take a minute.
What an OpenStack deployment looks like with OSAD
Diagram not built to scale.
Derived from an All in One Installation.
OSAD adding a compute node
# Execute a run limited to the nova_compute group
openstack-ansible setup-everything.yml --limit nova_compute
compute_hosts:
compute1:
ip: 172.29.236.201
compute2:
ip: 172.29.236.202
compute3:
ip: 172.29.236.203
compute4:
ip: 172.29.236.204
compute5:
ip: 172.29.236.205
EDIT: /etc/openstack_deploy/openstack_user_config.yml
OSAD adding an infrastructure node
# Execute the setup with a limit on the infra groups we’re adding
openstack-ansible setup-everything.yml --limit os-infra_all,shared-infra_all,identity_all
shared-infra_hosts:
infra1:
ip: 172.29.236.101
os-infra_hosts:
infra1:
ip: 172.29.236.101
identity_hosts:
infra1:
ip: 172.29.236.101
EDIT: /etc/openstack_deploy/openstack_user_config.yml
OSAD reconfiguring all of neutron
# Execute a run limited to neutron_all
openstack-ansible setup-everything.yml --limit neutron_all
global_overrides:
provider_networks:
- network:
container_bridge: "br-vxlan"
container_type: "veth"
container_interface: "eth10"
ip_from_q: "tunnel"
type: "vxlan"
range: "1:1000"
net_name: "vxlan"
group_binds:
- neutron_linuxbridge_agent
EDIT: /etc/openstack_deploy/conf.d/neutron_networks.yml
● AURO - Public OpenStack Cloud
● Compute, Volume, Swift, Heat, Neutron
● Canadian data residency, ownership
● Vancouver region, Toronto up next
AURO - OpenStack
● Not using as much as we’d like
● Mostly the infrastructure components
○ Rabbit, Galera, Memcached, etc
● Absolutely invaluable as an example
● Will continue to bring in more OSAD components as we operate over time
● Team somewhat new to config mgmt
AURO & OSAD - What we are using
● Great example of:
o Using Ansible
o Deploying OpenStack
o Testing - All in one, use of OpenStack infra
● Already supports Kilo
● Packaging and deploying OpenStack (i.e. not using OS packages; Python wheels are very cool)
● Segregation of services
AURO & OSAD - What we like
● Public cloud
● Midonet
● Different HA Model
● Billing
● Support Model
○ Multiple tiers of internal support
AURO - Differences from OSAD
● Don’t restart services in the same run as changes
o Restarts need to be controlled in an HA manner, rolling
● Every task tagged
● Continuously run (from Ansible Tower and/or Jenkins)
● Installing once is easy, operating forever is hard
● Ansible to help manage many small changes faster
● People don’t ssh into servers, only Ansible
AURO - Ansible Guiding Principles
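The rolling-restart principle above can be sketched with Ansible’s serial keyword; the group and service names here are illustrative:

```yaml
# Illustrative rolling restart: "serial: 1" runs the play to completion
# on one host at a time, so an HA cluster never loses quorum.
- hosts: rabbitmq_all
  serial: 1
  tasks:
    - name: Restart rabbitmq-server
      service:
        name: rabbitmq-server
        state: restarted
```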
● It’s easy to use mostly idempotent modules, then run a command or shell task and make a mess of it
● changed_when: False is too easy to stumble with
● Multiple environments
● Being able to run one-time commands across all systems is as powerful as it is dangerous
$ ansible -a reboot all
AURO - Ansible Struggles
● Deploy OpenStack from source
● Segregation of services
● More monitoring
● Ansible callback plugins are useful
● Learn more from OpenStack testing infra
● Need a couple modules
o Midonet
o Swift
AURO - Near term improvements
● Be “Pluggable?” (What does that even mean?)
o Neutron network - eg. Midonet
o HA model - eg. ECMP/BGP load balancing
● Balancing community roles and playbooks with custom
requirements
● Learn how to consume OSAD properly
AURO - OSAD Comments/Ideas/Questions
● Secrets (eg. Hashicorp Vault, KeyWhiz)
● Continuous integration...err integration
● Caching (Ansible has Redis, other ideas?)
● What is the “future” of config mgmt? Must be more than just pkg/config/start/bootstrap
● Change request workflow
AURO - Configuration Management Future
● Increase community participation in OSAD
○ Community members wanted!
○ Pull requests welcome :)
● Build out the operational modules found within the upstream
● Modular Dynamic inventory
● etc . . .
Where do Ansible and OpenStack go from here?
OpenStack is hard, plain and simple. I’m here to talk about how Ansible makes operating and deploying OpenStack clouds easier; it by no means makes them simple. I have no magic pixie dust that makes OpenStack simple. Deployers who claim to have a scalable, production-ready OpenStack cloud in under 10 minutes are on crack. People writing configuration management software for OpenStack know that OpenStack is hard, but we’re all out there trying to make life easier for everyone in the community.
Talk about why I’m here presenting about Ansible and OpenStack
Where did we come from? - Rackspace Private Cloud has been here a while. I’d go as far as saying we were the first.
Don’t call it a comeback we’ve been here for years.
Packaging OpenStack sucks, say why.
RCBOPS Chef was a good example of the “run thrice” philosophy.
- The Stackforge Chef cookbooks are not much better.
Upgrading required a lot of retooling for every release, even if it was a point release.
If you’re using Puppet or Chef, you’re learning a “DSL” that is more like a language than a task-driven system.
Additionally, when coming from the greater OpenStack community, telling people that they need to learn Ruby
or some variant thereof is a hard sell.
The controller model makes it hard or impossible to scale past two controllers, and in production under heavy workloads
we’ve found that operators need the ability to scale beyond the two-node limit.
If you use the controller model and you have two of them, then you likely have a VIP that fails over between the two nodes.
This VIP failover is error prone and makes services like plain jane MySQL and RabbitMQ very unhappy. The controller
model generally does not account for the issues that can be caused when using mirrored queues.
* Community, community, community…
* The power of true orchestration and task driven deployments, not a system of run thrice until nice.
* YAML is not code, YAML is easy to read, YAML is not code, YAML is easy to read.
* Everything is SSH, no agent, no CVEs due to agents.
* If the environment is large enough simply set Ansible forks accordingly and go…
* We made the LXC module.
** Pull request from rackspace for use of lxc in ansible natively: https://github.com/ansible/ansible-modules-extras/pull/123
* LXC is almost bare metal. With LXC we can simulate additional host machines and treat the containers just the same as we would another physical node.
* LXC is compatible with a lot of networks: veth, vlan, macvlan, and even physical device management.
* LXC can be built in an LVM using a real filesystem that can handle a production workload.
* LXC is rock solid. Containers don’t crash under our workload; we’ve had containers up with impressive uptime, though we still treat them like disposable resources.
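The upstream lxc_container module mentioned above can be used roughly like this; the container name and options are illustrative:

```yaml
# Illustrative task using the lxc_container module (from
# ansible-modules-extras): create and start an LVM-backed container.
- name: Create an infrastructure container
  lxc_container:
    name: galera_container
    template: ubuntu
    backing_store: lvm
    state: started
```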
* OSAD is in stackforge and is gated using the OpenStack development process and model.
* Everything is tested with tempest.
* Containers for process and service separation.
* OpenStack services are installed from upstream sources.
* No proprietary software that you have to buy into.
^ and we have scale using OpenStack as it was intended by the upstream developers.
Our OpenStack deployment includes:
galera, rabbitmq, repository servers, rsyslog, memcached, keystone, glance, nova, neutron, heat, cinder, tempest, swift, horizon
* Ansible tagging allows me to run one logical set of tasks in a given role.
- Within the roles everything is namespaced, even the tags.
- there are presently 319 tags in master.
* Process and service separation in containers means everything is a “node”.
* In the spirit of all things open source, we use MariaDB + Galera.
* Your own personal PyPI index, local to your deployment, is always available to you, but it’s also mirrored at:
- http://rpc-repo.rackspace.com/
- https://mirror.rackspace.com/rackspaceprivatecloud/
* All in One simulates a larger environment than most production clouds.
* We used OVS, it worked, until it didn’t.
- For production we use LinuxBridge and in the future we’ll visit other plugins.
* We have an internally elected PTL at this point, though we’ll have a formal election soon.
* Everything is gated through gerrit.
The community commitment within the project forced us to refactor to make the system more supportable from the perspective of the greater community. That refactor forced us to “keystone-lite” the repo, removing all of the Rackspace-isms and making the deployment system more generic.
Contributor list
# git log --format='%aN' | sort -u | wc -l
* We have an internally elected PTL (me) at this point, though we’ll have a formal election soon.
* Everything is gated through gerrit.
* When we committed to stackforge we excised cruft and deployment decisions that only benefited the Rackspace Private Cloud
* We made the decision to follow Ansible best practices to the letter where we could.
Total lines of content in juno which includes all the things within the repo.
# find . -type f -exec grep -v -e '^#' -e '^$' {} \; | wc -l
77391
Total lines of content in master which includes all the things within the repo.
# find . -type f -exec grep -v -e '^#' -e '^$' {} \; | wc -l
37045
Lines of YAML no comments no new lines in master
# find . -type f -name '*.yml' -exec grep -v -e '^#' -e '^$' {} \; | wc -l
9881
Vanilla OpenStack, in terms of the bits that power all of OpenStack, is simpler to use, operate, and understand.
Simple is amazing!
* https://github.com/docker/docker/issues/7229
* https://www.mail-archive.com/aufs-users@lists.sourceforge.net/msg03847.html
* http://www.linuxquestions.org/questions/linux-general-1/which-linux-distros-use-aufs-unionfs-630594/
Cloud components are cattle: spend 30 minutes troubleshooting a broken component, and if it’s not simple to fix, kill it when it misbehaves.
This is what a basic openstack_user_config.yml file looks like.
* It’s easy to get started
* the config is simple to understand
* can become as complex as you want it.
The basic openstack_user_config.yml file is essentially your entry point into Ansible inventory.
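A minimal sketch of such a file; the host names and addresses below are illustrative, and internal_lb_vip_address / external_lb_vip_address are assumed setting names, not taken from the slides:

```yaml
# Illustrative /etc/openstack_deploy/openstack_user_config.yml:
# map each host group to the physical nodes that back it.
global_overrides:
  internal_lb_vip_address: 172.29.236.10
  external_lb_vip_address: 192.168.1.10
shared-infra_hosts:
  infra1:
    ip: 172.29.236.101
compute_hosts:
  compute1:
    ip: 172.29.236.201
```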
ascii diagram of stack.
Thanks to Kevin and the OpenStack Ansible Deployment team, all the people who have contributed.
As usual we are standing on the shoulders of giants, from OSAD to Ansible to OpenStack to Linux and more
I’m not used to speaking in front of this many people, so forgive my mistakes
AURO - one of the few OpenStack public clouds in Canada; we have a lot of work ahead of us, but with such a great community we can get the job done
Canadian data residency and ownership is important to many of our customers
Fairly stock OpenStack other than using Midonet
We started our second generation deployment while the OSAD team was working on moving from Juno to Kilo and removing “raxisms”; we had to get started, and that has caused us not to use as much of OSAD as we would like
We definitely have some thinking and learning to do in terms of creating a process and workflow to consume OSAD and to integrate our particular infrastructure choices
When we upgrade from Juno to Kilo we will bring in much more of OSAD, if not all
We have a lot of work to do in terms of getting our organization up to speed and into a more “devopsy” style of working
Having full, working config files is a tremendous help to anyone deploying OpenStack
We like the emphasis on testing, that is the only way we will be able to continuously improve our deployment, the only way we will be able to operate a cloud over a long period of time
We need to get off of our dependency on the os packages, we will deploy from source using OSAD’s methodology
We really want to be part of the Ansible, Openstack, and OSAD community, we are committed to giving back where we can, low on resources at the moment though
Segregation of services is important to us
Public cloud is in many ways quite different than private cloud
We have multiple tiers of support and need to ensure they have the tools to do their job but also keep segregation of duties
We have to bill people, will be implementing stacktach, currently our own internal system
I think one of the most powerful things about ansible is the ability to use it to operate openstack over time, not just initial deployment
These are things that I personally struggle with and are not necessarily issues with Ansible or OSAD; have to watch I don’t shoot myself in the foot so to speak
Ansible’s power and flexibility are...very powerful, almost too powerful in some cases
I wrote a quick callback plugin to send a notification to slack when a playbook causes changes or fails
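A minimal sketch of the core of such a plugin, assuming a Slack incoming webhook: a real plugin would subclass Ansible’s CallbackBase and call notify() from the playbook-stats hook. The webhook URL and the function names here are placeholders, not the actual plugin.

```python
# Sketch of a Slack-notifying callback plugin's core. A real plugin
# would subclass Ansible's CallbackBase and call notify() from the
# playbook_on_stats hook; SLACK_WEBHOOK_URL is a placeholder.
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder


def build_message(playbook, stats):
    """Summarise per-host stats into a Slack message payload."""
    lines = ["Playbook %s finished:" % playbook]
    for host, s in sorted(stats.items()):
        lines.append("  %s: ok=%d changed=%d failed=%d"
                     % (host, s.get("ok", 0), s.get("changed", 0),
                        s.get("failures", 0)))
    return {"text": "\n".join(lines)}


def notify(playbook, stats):
    """POST the summary to the Slack incoming webhook."""
    payload = json.dumps(build_message(playbook, stats)).encode("utf-8")
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL, data=payload,
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)
```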
I think it’s good that we are a public cloud and want to use OSAD
Mostly we just need to figure out how to use as much of OSAD as possible while still having a unique environment
Though almost all OpenStack deployments are unique
I do struggle with secrets and variables in Ansible
If we need to do ITIL like things, how do we do that with config mgmt?
Ansible as the “execution engine” for change mgmt, “continuous improvement”
CONCLUSION: Basically we consume as much of OSAD as we can, add our custom requirements and account for differences, then wrap that all in monitoring, continuous integration and change management
Again thanks to the community, we have a lot of work to do for AURO and a lot of learning to do and changes to implement
Thanks to all the people writing modules too
Talk about where Ansible and OpenStack go from here.
modules
commits upstream
improving ansible
issues we’ve faced