4.
Why are we here?
In late 2013, the Rackspace Private Cloud team set out to solve our common
deployment, maintenance, scalability, and stability problems.
5.
Distribution packaging of OpenStack
● Out of date packages
● Out of band configuration
● Packages include proprietary patches
● Time to bug resolution is longer than it should be
● Broken dependencies
6.
Legacy architecture does not scale
● Almost all deployment systems reference an architecture that
suffers from the “controller 1 controller 2” model
● VIP failover for OpenStack supporting services is bound to break, and when it does, it breaks spectacularly!
7.
What we devised
A source-based installation of OpenStack, built within LXC containers, using a
multi-master architecture orchestrated and deployed via Ansible.
8.
Why Ansible?
● Community engagement
● Orchestration
● Almost no code
● Low barrier to entry
● Crazy powerful, stupid simple
9.
Why containers?
● LXC ≈ More bare metal
● Compatible with many networking architectures
● Supports an LVM backend
● Stable
10.
What is OSAD?
OSAD == OpenStack Ansible Deployment
● Uses LXC containers to isolate components and services
● Deploys OpenStack from upstream sources
● Runs on Ubuntu 14.04
● Built for production
● No proprietary secret sauce
○ But you could bolt on as much as you want
● Created following the KISS principle
11.
● All Ansible tasks and roles target multiple nodes, even if that number is one (1)
○ EVERYTHING is tagged!
● Process separation on infrastructure components (controller nodes)
○ Microservice-like, where it makes sense
OSAD architecture
12.
● Galera multi-master cluster
● RabbitMQ with mirrored queues and deterministic sorting of the master queues
● A pip package index built for your environment, stored within your environment
OSAD infrastructure components
13.
● OSAD does not know about the “all in one” deployment
○ LXC enables the base system to deploy a multi-node cloud even with only one physical node
○ An AIO in our gate job emulates a 32 node cloud
● Neutron with the Linux Bridge agent offers stability and supportability
○ Open vSwitch is feature-full but Linux Bridge “just works”™
OSAD scale
14.
Community project
● We support Juno and Icehouse but the code contains Rackspace-isms
● Kilo is our first “community” release of OSAD
● 41 contributors presently in the project
○ Not all Rackers
15.
● Deployer experience: Ansible
● Vanilla OpenStack: Source-based installation
● Scalability: Built within LXC containers
● Stability: Obviously!
OSAD and what we’re about
OpenStack is hard, plain and simple, especially in production. People writing configuration management software for OpenStack know that OpenStack is hard, but we're all out there trying to make life easier for everyone in the community.
From the old method to the new: what issues did we have, and why re-architect?
What issues did we have with packaging?
Carrying monkey patches etc.
The controller model makes it hard or impossible to scale past two controllers, and in production under heavy workloads we've found that operators need the ability to scale beyond the two-node limit.
If you use the controller model and you have two of them, then you likely have a VIP that fails over between the two nodes. This VIP failover is error prone and makes services like plain-Jane MySQL and RabbitMQ very unhappy. The controller model generally does not account for the issues that can be caused when using mirrored queues (a sketch of enabling a mirrored-queue policy follows).
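For context, mirrored queues are enabled through a RabbitMQ HA policy; here is a minimal, hedged sketch using Ansible's rabbitmq_policy module, where the inventory group, policy name, and pattern are assumptions rather than OSAD's actual values:

- hosts: rabbitmq_all          # assumed inventory group
  tasks:
    - name: Mirror all queues across the cluster
      rabbitmq_policy:
        name: HA               # assumed policy name
        pattern: '.*'          # match every queue
        tags:
          ha-mode: all         # mirror queues to all nodes in the cluster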
Mention that OSAD was selected as the official OpenStack deployment for Ansible.
* Community, community, community…
* The power of true orchestration and task driven deployments, not a system of run thrice until nice.
* YAML is not code, YAML is easy to read, YAML is not code, YAML is easy to read.
* Everything is SSH, no agent, no CVEs due to agents.
* If the environment is large enough, simply set Ansible forks accordingly and go… (a minimal playbook sketch follows)
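To make that concrete, here is a minimal sketch of a tagged, agentless play; every name in it is illustrative, not OSAD's actual playbook:

- hosts: memcached_all         # assumed inventory group
  remote_user: root
  tasks:
    - name: Ensure memcached is present
      apt:
        name: memcached
        state: present
      tags:
        - memcached-install    # assumed tag name

# Agentless, over SSH; concurrency is just a flag:
#   ansible-playbook -i inventory --forks 50 memcached-install.yml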
* We made the LXC module.
** Pull request from Rackspace adding native LXC support to Ansible: https://github.com/ansible/ansible-modules-extras/pull/123
* LXC is almost like having more bare metal. With LXC we can simulate additional host machines and treat the containers just the same as we would another physical node.
* LXC is compatible with a lot of networks: veth, vlan, macvlan, and even physical device management.
* LXC can be built on an LVM backend, using a real filesystem that can handle a production workload.
* LXC is rock solid. Containers don't crash under our workload; we've had containers up with impressive uptimes, though we still treat them like disposable resources. (A sketch of building a container with the module follows.)
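A hedged sketch of what the module makes possible, creating an LVM-backed container; the inventory group, container name, volume group, and size are placeholders:

- hosts: lxc_hosts             # assumed inventory group
  tasks:
    - name: Create and start an LVM-backed container
      lxc_container:
        name: keystone1        # placeholder container name
        template: ubuntu
        backing_store: lvm     # back the rootfs with an LVM volume
        vg_name: lxc           # placeholder volume group
        fs_size: 5G
        state: started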
* OSAD is in stackforge and is gated using the OpenStack development process and model.
* Everything is tested with tempest.
* Containers for process and service separation.
* OpenStack services are installed from upstream sources.
* No proprietary software that you have to buy into.
^ And we achieve scale using OpenStack as it was intended by the upstream developers.
Our OpenStack deployment includes (an ordering sketch follows the list):
galera, rabbitmq, repository servers, rsyslog, memcached, keystone, glance, nova, neutron, heat, cinder, tempest, swift, horizon
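A sketch of how a top-level playbook might order those components by dependency, with the database and message queues coming up before the API services; the group and role names are assumptions, not necessarily OSAD's:

- hosts: galera_all
  roles:
    - galera_server

- hosts: rabbitmq_all
  roles:
    - rabbitmq_server

- hosts: keystone_all
  roles:
    - os_keystone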
* Ansible tagging allows me to run one logical set of tasks in a given role.
- Within the roles everything is namespaced, even the tags.
- There are presently 319 tags in master (a tag-usage sketch follows below).
* Process and service separation in containers means everything is a “node”.
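A hedged sketch of what a namespaced tag looks like inside a role's task file, and how to run just that logical slice; the tag, file names, and playbook name are assumptions:

# roles/galera_server/tasks/galera_post_install.yml (hypothetical path)
- name: Drop the galera configuration file
  template:
    src: my.cnf.j2
    dest: /etc/mysql/my.cnf
  tags:
    - galera-config            # assumed, role-namespaced tag

# Run only that slice of the role:
#   ansible-playbook galera-install.yml --tags galera-config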
* In the spirit of all things open source, we use MariaDB + Galera.
* Your own personal PyPI index, local to your deployment, is always available to you, but it's also mirrored at the links below (a pip usage sketch follows them):
- http://rpc-repo.rackspace.com/
- https://mirror.rackspace.com/rackspaceprivatecloud/
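A hedged sketch of pointing Ansible's pip module at that local index; the inventory group, index URL, package, and virtualenv path are placeholders:

- hosts: keystone_all          # assumed inventory group
  tasks:
    - name: Install keystone from the deployment's own index
      pip:
        name: keystone
        virtualenv: /openstack/venvs/keystone       # placeholder path
        extra_args: '--index-url http://repo.local/simple/'  # placeholder URL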
* An All-in-One simulates a larger environment than most production clouds.
* We used OVS; it worked, until it didn't.
- For production we use Linux Bridge, and in the future we'll visit other plugins.
* We have an internally elected PTL at this point, though we’ll have a formal election soon.
* Everything is gated through gerrit.
The community commitment within the project forced us to refactor to make the system more supportable from the perspective of the greater community. That refactor forced us to "keystone-lite" the repo, removing all of the Rackspace-isms and making the deployment system more generic.
Contributor list
# git log --format='%aN' | sort -u | wc -l
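(That one-liner counts the unique commit author names in the repository.)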
Vanilla OpenStack, in terms of the bits that power all of OpenStack, is simpler to use, operate, and understand.
Simple is amazing!
Further reading on AUFS and Docker storage issues:
* https://github.com/docker/docker/issues/7229
* https://www.mail-archive.com/aufs-users@lists.sourceforge.net/msg03847.html
* http://www.linuxquestions.org/questions/linux-general-1/which-linux-distros-use-aufs-unionfs-630594/
Cloud components are cattle: spend 30 minutes troubleshooting a broken component, and if it's not simple to fix, kill it; kill them when they misbehave.
Key Message: Brings familiarity to the OpenStack ecosystem; focuses on Keystone use and role management; provides tight security controls
Key Message: A visual demonstration of the new v9.0 architecture; built with a distributed approach in mind
* A highly available, production-ready build of OpenStack requires a series of servers.
* Configuration of components must be error-free and repeatable (a sketch of a repeatable play follows).
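Repeatability falls out of Ansible's idempotent modules; a minimal, hedged sketch where the package, template, and service names are placeholders:

- hosts: all
  tasks:
    - name: Ensure ntp is installed
      apt:
        name: ntp
        state: present
    - name: Ensure the ntp configuration is in place
      template:
        src: ntp.conf.j2       # placeholder template
        dest: /etc/ntp.conf
      notify: Restart ntp

  handlers:
    - name: Restart ntp
      service:
        name: ntp
        state: restarted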