Considerations for Operating An OpenStack Cloud
1. Mark T. Voelker, Technical Leader @ Cisco
OpenStack ATC/StackForge Puppet Core/Foundation Member #54
All Things Open 2014
© 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1
2. @marktvoelker
• Tech Lead at Cisco, StackForge Puppet core developer, OS Foundation
Member #54
• Fact: can be bribed with doughnuts
• Currently works in Cisco’s Cloud & Virtualization Group
• In copious (hah!) spare time: OpenStack solutions, Big Data, Massively
Scalable Data Centers, Devops, making sawdust with extreme prejudice
3. • Tech lead, manager, software developer, architect
• Started in OpenStack in 2011 at the Diablo Design Summit
4. The great thing about my job is that I get to have fun exploring a lot
of new things…
5. …and I get to help build a LOT of clouds.
6. Today’s talk won’t be overly formal…
7. …because I tend to get excited by this stuff.
10. …then you know how to get to Day 1.
Now let’s talk about getting to Day 30…
11. • Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
12. High Availability? Sounds great--I’ll take two!
13. • Consider whether you want active/active or active/passive
• Setup and tooling differ a bit, but I generally like active/active
• Note that docs.openstack.org has an HA Guide
• A bit dated…patches welcome!
• Prioritize HA for the control plane
• That also means thinking about your database, network, and RPC bus
• Instance-level HA: there be dragons
• But yes, it’s being looked at
• Pets vs cattle
• Note: HA == more hardware
• Some components need at least 3 nodes
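The "at least 3 nodes" note most likely comes down to majority quorum: a cluster that must keep a strict majority alive cannot survive any failure with only two members. A minimal sketch of that arithmetic follows; the function names are illustrative and not taken from any OpenStack tool.

```python
def quorum_size(nodes: int) -> int:
    """Smallest strict majority of a cluster with `nodes` members."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many members can fail while a strict majority survives."""
    return nodes - quorum_size(nodes)


# Print cluster size, quorum size, and tolerated failures.
for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Two nodes tolerate zero failures, the same as one node, which is why three is the practical minimum for quorum-based components.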
14. • Stuff OpenStack needs to run: message brokers and databases
• Check out RabbitMQ clustering and mirrored queues
• Check out Galera for MySQL/MariaDB
• I usually see Percona XtraDB
• Frontend with an HAProxy/Keepalived pair
15. • Don’t do RabbitMQ clustering over a WAN
• Be aware of the SELECT…FOR UPDATE issue
16. • Long story short: Neutron and some parts of Nova invoke an SQL
pattern known as “SELECT…FOR UPDATE” which Galera
doesn’t support due to issues with cross-node locking.
• Can cause symptoms that look like deadlocks.
• The Neutron/Nova code is being refactored to remove it, but that
likely won’t be done until at least Kilo.
• Meanwhile: use HAProxy to send writes to a single Galera node
and you should be fine
• With the obvious scalability bottleneck
• More info here.
• Thank Jay Pipes & Peter Boros for
the find!
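One way to picture the single-writer workaround is an HAProxy `listen` section in which only the first Galera node is active and the others carry the `backup` keyword, so they receive traffic only if the active node fails its health check. The sketch below just renders such a section as text; the section name, node names, and addresses are made up for illustration, and a plain TCP `check` only proves that the port answers, not that the node is in sync.

```python
from typing import Mapping


def render_galera_listen(vip: str, nodes: Mapping[str, str], port: int = 3306) -> str:
    """Render an HAProxy `listen` section that sends all traffic to one node.

    The first node in `nodes` is the active writer. Every other node is
    marked `backup`, so HAProxy uses it only when the active node is down.
    """
    if not nodes:
        raise ValueError("need at least one Galera node")
    lines = [
        "listen galera",
        f"    bind {vip}:{port}",
        "    mode tcp",
        "    option tcpka",
    ]
    for index, (name, address) in enumerate(nodes.items()):
        role = "" if index == 0 else " backup"
        lines.append(f"    server {name} {address}:{port} check{role}")
    return "\n".join(lines) + "\n"


# Hypothetical addresses from the documentation range 192.0.2.0/24.
print(render_galera_listen(
    "192.0.2.10",
    {"db1": "192.0.2.11", "db2": "192.0.2.12", "db3": "192.0.2.13"},
))
```

All writes land on `db1`, which avoids cross-node lock conflicts at the cost of the scalability bottleneck noted above.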
17. • Use Swift, Ceph, or other highly available storage to back Glance
• Pick a highly available storage backend for Cinder too
• Use Keepalived/HAProxy to front-end multiple API servers
• Or another load balancer technology of your choice
• Can be deployed as dedicated nodes for scale, or colocated with other services
• Network: DVR vs Provider Network Extensions
• Distributed Virtual Routers are a new experimental feature in Juno (not yet
ready for production)
• Please go test it and report/fix bugs!
• Provider networks essentially punt the availability issue to your physical
network
• Allows you to use standard tools like virtual port channels and VRRP
• Also highly performant
18. • Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
19. We start with bare metal.
20. • For a cloud of any real size, you don’t want to be installing
operating systems by hand
• Remember that baremetal bringup actually isn’t something that
just happens once…often recurs for upgrades, capacity
expansion, etc.
• Baremetal bringup tools can also have other uses, like inventory
or bootstrapping configuration management agents.
21. • A simple (~15k lines of Python code) tool for managing baremetal
deployments
• Flexible usage (API, CLI, GUI)
• Allows you to define systems (actual machines) and profiles (what
you want to do with them)
• Provides hooks for Puppet so you can then do further automation
once the OS is up and running
• Provides control for power (via IPMI or other means), DHCP/PXE
(for netbooting machines), and more.
24. • Razor
• Developed by EMC, managed by Puppet Labs (occasionally used with Chef
too)
• Initial release in 2012
• Uses a “microkernel” loaded onto the machine to gather facts before
provisioning
• Tag + Policy model
• Crowbar
• Originally written by Dell, now a community project
• Originally designed to deploy OpenStack all the way from bare metal
• Now deploys other stuff too (namely, Hadoop)
• Uses Chef to handle everything after the OS install
• Foreman
• Used by Red Hat among others
• Does baremetal bringup and serves as a Puppet ENC
25. • Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
27. “Cloud isn’t just an infrastructure technology….it’s a new
operations model. And with OpenStack in particular, it’s
one that’s very well suited to a DevOps style of
management. Many companies aren’t just adopting
cloud, they’re changing how they operate.”
“Besides, logging into servers to mess with config files
makes me sad.”
--That ranty guy in Raleigh again
28. • Remember, OpenStack is a set of interoperating distributed
systems
• That means you’re going to have a lot of software to configure on
a lot of machines
• You’re probably going to want to make changes over time
• You’re probably going to have more than one person touching
your cloud
• CM tools help you treat configuration as code, so you can
collaborate more easily
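To make "configuration as code" concrete: the desired state lives in a reviewable data structure, and a tool computes and applies only the difference, so re-running it changes nothing. The sketch below is a toy model of that idea in plain Python, not how Puppet, Chef, or any other CM tool is implemented; the setting names are invented.

```python
from typing import Dict, List, Tuple

Settings = Dict[str, str]
Step = Tuple[str, str, str]  # (action, key, value)


def plan_changes(current: Settings, desired: Settings) -> List[Step]:
    """Return the steps that turn `current` into `desired`."""
    steps: List[Step] = []
    for key, value in sorted(desired.items()):
        if key not in current:
            steps.append(("add", key, value))
        elif current[key] != value:
            steps.append(("change", key, value))
    for key in sorted(set(current) - set(desired)):
        steps.append(("remove", key, current[key]))
    return steps


def apply_changes(current: Settings, steps: List[Step]) -> Settings:
    """Apply planned steps to a copy of `current` and return the result."""
    result = dict(current)
    for action, key, value in steps:
        if action == "remove":
            result.pop(key, None)
        else:
            result[key] = value
    return result


# Invented settings, used only to show the flow.
current = {"debug": "True", "old_option": "1"}
desired = {"debug": "False", "workers": "4"}
steps = plan_changes(current, desired)
converged = apply_changes(current, steps)
print(steps)
print(plan_changes(converged, desired))  # empty plan: nothing left to do
```

Because the desired state is plain data, it can sit in version control and go through code review like any other change.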
29. Pile of Bash Scripts
31. • An increasingly common pattern:
• Puppet or Chef for configuration management, PLUS
• Ansible or Salt for cross-node orchestration
• Recommendation: use the tools that work for you!
• But remember: you don’t have to do it alone.
• Several CM tools have thriving collaborators in the OpenStack community
• Links for later:
• Puppet for OpenStack
• Chef for OpenStack
• Ansible for OpenStack
• SaltStack for OpenStack
• Pile of bash scripts for OpenStack
32. • Unit tests for your deployment code are a good idea
• ServerSpec tests to make sure your config management system
did what it was supposed to are great
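ServerSpec itself is a Ruby/RSpec tool. As a rough Python stand-in for the same idea (check the resulting system rather than the CM code), the sketch below verifies that expected services accept TCP connections. The helper names and the service map are hypothetical.

```python
import socket
from typing import Dict, List, Tuple


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failed_checks(expected: Dict[str, Tuple[str, int]]) -> List[str]:
    """Return the names of expected services that are not reachable."""
    return sorted(
        name
        for name, (host, port) in expected.items()
        if not port_is_listening(host, port)
    )
```

A post-deployment job could call `failed_checks` with a map such as `{"keystone": ("192.0.2.20", 5000)}` (a made-up address) and fail the run if the returned list is not empty.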
33. • Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
34. …well, haven’t you always wanted a butler?
35. • DevOps: actually pretty handy
• OpenStack change velocity (community’s and yours)
• Anecdote: the majority of deployments I work with have some
customizations or backports from future releases
• It’s not just OpenStack, it’s all the underpinning components and
your CM code too!
36. • OpenStack itself uses CI/CD tools in its development
process…you should consider using them in your cloud buildout
too!
• The OpenStack Infra team has created some awesome tools: JJB, Zuul, etc
• They’re all open source and you can even see how OpenStack’s own CI is
set up (check out Elizabeth Joseph’s slides from yesterday for more!).
• The basics:
• An integration server (Jenkins, Go, Travis, etc)
• A code review and repository tool (Gerrit, Cgit, GitHub, etc)
• A battery of automated tests (lint checks, rspec-puppet, Tempest, Rally, etc)
• Some form of packaging (rpmbuild/mock, sbuilder/pbuilder, etc)
• An artifact repository (Artifactory, yum/apt repos, etc)
• Optionally, some deployment jobs (usually powered by your CM tool)
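The basics listed above chain together as ordered stages where a failure stops everything downstream. The sketch below models only that control flow; real pipelines are defined in the integration server (Jenkins jobs, for example), and the stage names and lambdas here are placeholders.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (succeeded, names of the stages that actually ran).
    """
    ran: List[str] = []
    for name, step in stages:
        ran.append(name)
        if not step():
            return False, ran
    return True, ran


# Placeholder stages: each callable would really invoke a tool.
stages: List[Stage] = [
    ("lint", lambda: True),
    ("unit tests", lambda: True),
    ("build package", lambda: False),  # simulate a packaging failure
    ("publish artifact", lambda: True),
]
print(run_pipeline(stages))
```

With the simulated failure, the artifact is never published, which is the point: nothing untested or unbuildable reaches the repository your deployment jobs pull from.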
37. • …you never intend to change the code yourself
• …building your own packages would violate a support contract
with your distribution
• …you’ve never used a CI/CD pipeline before (but really: you
should start learning)
• …you have a static environment that absolutely will not change,
never need added capacity, etc.
38. • Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
39. • Now that you have a cloud, you’ll probably want to know that all
its parts stay in good working order.
42. • I’ve worked on a lot of OpenStack clouds and almost everyone
has their own preferred monitoring toolset.
• One possible exception: almost everybody seems to love Graphite.
• The golden rule is: use the tools that work for you!
• Very often this will be whatever you’re using in the rest of your
infrastructure.
• Break it down into at least two buckets:
• Up/down and alerting (ex: Nagios or its derivatives…yes, there are
OpenStack plugins out there on NagiosExchange)
• Trending data collection/plotting (ex: collectd/statsd feeding graphite)
• Also: use your peers!
• Check out Tong Li’s Monitoring as a Service talk later today!
• Operators are often willing to share, so ask on the openstack-operators list.
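The two buckets differ mainly in what they emit. A Nagios-style check reduces a measurement to a status code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), while a trending collector ships the raw number, for example as a line in Graphite's plaintext protocol (`path value timestamp`). The sketch below shows both sides; the thresholds and metric path are invented.

```python
import time
from typing import Optional

# Status codes used by Nagios-style plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_threshold(value: float, warn: float, crit: float) -> int:
    """Map a measurement to a plugin status, where higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one sample for Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


disk_used_percent = 93.5  # made-up measurement
print(check_threshold(disk_used_percent, warn=80, crit=90))
print(graphite_line("cloud.compute01.disk.used_percent", disk_used_percent, 1413979200), end="")
```

The same measurement feeds both buckets: the status code drives alerting, and the timestamped sample builds the trend you look at when planning capacity.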
43. • Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
45. • Distributed systems generate logs…all over the place.
• Finding the root of problems may mean correlating logs from different
machines…but which?
• OpenStack in particular *can* be pretty verbose
• You may also be dealing with logs from other distributed tools in
your cloud (RabbitMQ, databases, etc)
• Generally you want to get logs together, be able to search them,
and be able to visualize them.
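As a small illustration of "getting logs together", the sketch below merges per-host log lines into one timeline and filters them by a request ID. It assumes each host's lines are already in time order, start with a fixed-width sortable timestamp, and come from hosts with synchronized clocks; the sample lines and host names are invented.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

TIMESTAMP_WIDTH = 23  # e.g. "2014-10-22 14:03:07.120"


def merge_logs(logs_by_host: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (host, line) pairs from all hosts in timestamp order."""
    streams = [
        [(line[:TIMESTAMP_WIDTH], host, line) for line in lines]
        for host, lines in logs_by_host.items()
    ]
    # heapq.merge lazily combines inputs that are each already sorted.
    for _, host, line in heapq.merge(*streams):
        yield host, line


def lines_for_request(logs_by_host: Dict[str, List[str]], request_id: str) -> List[Tuple[str, str]]:
    """Return the merged timeline restricted to one request ID."""
    return [(host, line) for host, line in merge_logs(logs_by_host) if request_id in line]


# Invented sample data.
logs = {
    "api01": [
        "2014-10-22 14:03:07.120 INFO nova.api [req-abc] POST /servers",
        "2014-10-22 14:03:09.500 INFO nova.api [req-def] GET /servers",
    ],
    "compute01": [
        "2014-10-22 14:03:08.010 INFO nova.compute [req-abc] Starting instance",
    ],
}
for host, line in lines_for_request(logs, "req-abc"):
    print(host, line)
```

Dedicated log aggregation and search tools do this at scale, but the underlying operation is the same: one ordered stream, searchable by an identifier that follows a request across machines.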
46. Unlike monitoring tools, there seems to be pretty broad consensus
on good tools here in deployments I’ve worked with….