10. Redundancy & Resilience
CloudStack Management Server
Very easy to set up additional management servers
Load balancing required to give high availability
www.shapeblue.com
11. Redundancy & Resilience
MySQL
Master / Slave is ‘standard’
Alternatives include
MySQL Proxy (Mirroring)
Galera Cluster
MMM
12. Redundancy & Resilience
[Diagram: a load balancer in front of two CS Man servers; a second load balancer in front of three MySQL nodes in a Galera cluster]
13. Redundancy & Resilience
[Diagram: two data centres (DC1, DC2); in each, F5 load balancers in front of two CS Man servers and F5 load balancers in front of three MySQL nodes in a Galera cluster]
14. Redundancy & Resilience
Server ‘pairs’
MySQL masters and slaves
CS Man & MySQL master
vCenter & MS SQL server
Any other redundant servers, e.g. DNS
15. Redundancy & Resilience
Laws of probability….
Same failure probability as RAID 0
If management server OR MySQL
master fails – downtime for the
whole management system.
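The RAID 0 comparison can be made concrete with a little arithmetic. This is a minimal sketch, assuming the two hosts fail independently and with the same probability p; the value 0.01 is purely illustrative, not a measured figure, and the function name is hypothetical.

```shell
#!/bin/bash
# Probability that at least one of n independent components fails,
# given each fails with probability p: 1 - (1 - p)^n
# (independence and an equal p are assumptions made for illustration)
failure_probability() {
    local p="$1" n="$2"
    awk -v p="$p" -v n="$n" 'BEGIN { printf "%.4f\n", 1 - (1 - p) ^ n }'
}

# CS Man and MySQL master on the same host: only one host to lose
failure_probability 0.01 1
# Split across two hosts: losing either host takes the system down
failure_probability 0.01 2
```

With a small p the second figure is very nearly double the first, which is the argument for keeping the management server and the MySQL master together.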
16. Automation
Automation of the
infrastructure build
17. Automation
Uses:
Why automate
Automation tools
Examples
Configuring management VMs
Build a CloudStack 4.0 management server
Deployment of hosts
Configuration of CloudStack (through API)
21. Automation
Example:
Base Build of Management
Servers using Shell Scripting
22. Automation – Management VM Configuration
Create Deployment VM: CentOS (Minimal) + wget
Download CSV & hostconfig script
Make it executable, run it.
26. Automation – Management VM Configuration
# if vm management use this one
if [ "$SecondaryNICNetwork" == "Mgmt" ]; then
echo "ADDRESS0=10.128.9.9
NETMASK0=255.255.255.255
GATEWAY0=10.14.16.1
ADDRESS1=10.128.3.13
NETMASK1=255.255.255.255
GATEWAY1=10.14.16.1
ADDRESS2=64.238.199.202
NETMASK2=255.255.255.255
GATEWAY2=10.141.163.1
ADDRESS3=213.212.65.202
NETMASK3=255.255.255.255
GATEWAY3=10.141.163.1
ADDRESS4=213.212.68.168
NETMASK4=255.255.255.248
GATEWAY4=10.141.163.1
ADDRESS5=213.212.69.0
NETMASK5=255.255.255.0
GATEWAY5=10.141.163.1" > /etc/sysconfig/network-scripts/route-eth1
fi
28. Automation – CS4 Management Server
Create Deployment VM: CentOS (Minimal) + wget
Set static IP address and ensure it has an FQDN
Download the script, make it executable, run it.
29. Automation – CS4 Management Server
#!/bin/bash

# Relax SELinux now and across reboots
setenforce permissive
sed -i "/SELINUX=/ cSELINUX=permissive" /etc/selinux/config

# Add the CloudStack 4.0 yum repository
echo "
[cloudstack]
name=cloudstack
baseurl=http://cloudstack.apt-get.eu/rhel/4.0/
enabled=1
gpgcheck=0" > /etc/yum.repos.d/cloudstack.repo
yum update -y
yum install ntp cloud-client mysql-server -y

# Append the MySQL settings after the datadir line in my.cnf
sed -i -e '/datadir/ ainnodb_rollback_on_timeout=1' \
    -e '/datadir/ ainnodb_lock_wait_timeout=600' \
    -e '/datadir/ amax_connections=350' \
    -e '/datadir/ alog-bin=mysql-bin' \
    -e "/datadir/ abinlog-format = 'ROW'" /etc/my.cnf

# Enable the services at boot, then start them
chkconfig ntpd on
chkconfig mysqld on
chkconfig nfs on
chkconfig rpcbind on
service ntpd restart
service mysqld restart
service rpcbind start
service nfs start

# Set the MySQL root password, then create the databases and set up the management server
/usr/bin/mysqladmin -u root password 'password'
cloud-setup-databases cloud:cloud@localhost --deploy-as=root:password
cloud-setup-management
30. Automation
Example:
Host deployment server build
using shell scripting
31. Automation – Host Deployment
Create Deployment VM: CentOS (Minimal) + wget
Download to VM: Hypervisor installation media (inc. XenServer Updates)
Download build script, make it executable, run it.
32. Automation – Host Deployment
Deployment server (VM)
The script downloads, builds and configures:
DHCP
PXE (TFTP)
HTTP server
Script writes the scripts needed for PXE boot of XenServer &
ESXi hosts
Script also writes the answer files and post installation scripts to
configure XenServer and ESXi hosts
41. Automation – The API
What is the API
The API is the real engine of CloudStack
The web GUI is simply making API calls
Port 8096 by default (the unauthenticated integration API port)
42. Automation – The API
Using the API
CloudStack GUI
Browser
Word, Excel
Using Firebug/IE Developer Tools with CloudStack
43. Automation – The API
API commands directly through a browser
Immediate response
44. Automation – The API
API calls from a Word document or Excel spreadsheet
45. Automation – The API
Using Firebug / IE Developer Tools
IE: press F12 or ‘view developer tools’
Firefox: install the Firebug add-on
47. Automation – The API
Global Settings
http://csman:8096/client/api?command=updateConfiguration&name=vmware.management.portgroup&value=svc-console
http://csman:8096/client/api?command=updateConfiguration&name=allow.user.create.projects&value=false
http://csman:8096/client/api?command=updateConfiguration&name=allow.public.user.templates&value=false
http://csman:8096/client/api?command=updateConfiguration&name=apply.allocation.algorithm.to.pods&value=true
http://csman:8096/client/api?command=updateConfiguration&name=cpu.overprovisioning.factor&value=2
http://csman:8096/client/api?command=updateConfiguration&name=vm.allocation.algorithm&value=random
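The global settings calls above all have the same shape, so they can be generated from a list of name=value pairs. This is a minimal sketch: the function name update_config_url and the API_BASE variable are hypothetical, and the curl line is left commented out because it assumes curl is installed and the management server is reachable.

```shell
#!/bin/bash
# Base URL as used on the slide; override by exporting API_BASE
API_BASE="${API_BASE:-http://csman:8096/client/api}"

# Build an updateConfiguration URL for one global setting
update_config_url() {
    local name="$1" value="$2"
    printf '%s?command=updateConfiguration&name=%s&value=%s\n' "$API_BASE" "$name" "$value"
}

# name=value pairs taken from the slide
settings="allow.user.create.projects=false
allow.public.user.templates=false
cpu.overprovisioning.factor=2
vm.allocation.algorithm=random"

printf '%s\n' "$settings" | while IFS='=' read -r name value; do
    url=$(update_config_url "$name" "$value")
    echo "$url"
    # To actually send the request (assumption: curl available, csman reachable):
    # curl -s "$url"
done
```

Printing the URLs first and only then sending them keeps a human eye on what is about to change.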
48. Automation – The API
Getting information
http://csman1:8096/client/api?command=listServiceOfferings
http://csman1:8096/client/api?command=listTemplates&templatefilter=featured
49. Automation – The API
Deploying an instance
Base command:
http://csman1:8096/client/api?command=deployVirtualMachine
The required options:
The Service Offering and Template IDs
&serviceofferingid=XXX
&templateid=XXX
&zoneid=XXX
&domainid=XXX
&account=XXX
Optional options:
&displayname=xxx
50. Automation – The API
Deploying an instance
http://csman1:8096/client/api?command=deployVirtualMachine&serviceofferingid=XXX&templateid=XXX
Can be used to create a large number of instances very quickly
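Creating many instances is then just a loop over the same call. This is a minimal sketch: the XXX IDs are the placeholders from the slide and must be replaced with real IDs, while the function name deploy_urls and the display name prefix "stress" are hypothetical.

```shell
#!/bin/bash
# Print one deployVirtualMachine URL per instance.
# $1 = number of instances, $2 = display name prefix (hypothetical example)
deploy_urls() {
    local count="$1" prefix="$2" i
    for i in $(seq 1 "$count"); do
        printf 'http://csman1:8096/client/api?command=deployVirtualMachine&serviceofferingid=XXX&templateid=XXX&zoneid=XXX&displayname=%s-%d\n' "$prefix" "$i"
    done
}

deploy_urls 3 stress
# Each URL could then be requested in turn, for example:
#   deploy_urls 150 stress | while read -r url; do curl -s "$url"; done
# (assumption: curl is installed and the management server is reachable)
```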
51. Documentation
A word on documentation
52. Documentation
Dull, boring, tedious, slow – Crucial.
Write what you’re going to do
Follow what you wrote
Update it
With redundant servers; follow it again
With scripts ‘snapshot’ and start again
53. Any Questions?
Paul Angus
paul.angus@shapeblue.com
Twitter: @ShapeBlue
www.shapeblue.com
End of day, so we’ll start gently. Hopefully something for everyone. Please bear with me if it seems obvious to you. Bear with my English accent and phrases/terminology. Basics and more advanced ideas. Paul Angus. Engineering and Science degrees. CloudStack 2.13
Design Phase – some tips from experience. Redundancy & resilience – again some thoughts on building redundancy and resilience into the infrastructure. Automation – some examples of automation in the building of a CS architecture. Finally a ‘word’ on documentation.
Storage, networking, overall architects, technical and managerial. Everything is interconnected and feels like everything relies on everything else. Someone chipping in can be invaluable – particularly if they have past experience.
One that gets everyone: switch supports 4096 VLANs - but not at the same time. Gotchas… [add more]
Private clouds – you have your current usage to judge. Public clouds harder to predict – is guided by offerings.
Performance and / or capacity. Storage. Network. Network (to storage) is often the limiting factor as the jump to 10 GbE is large (although LACP in XenServer 6.1 and ESXi 5.1 will help to mitigate this). Not much point being able to run VMs per host if only 1Gb/s link. Not much point to 256GB RAM with a single quad core processor unless a specific workload.
A few words on designing the infrastructure to maximise uptime.
I’ll look at the major CloudStack management elements – CSMan, MySQL. And then look at considerations if you’ve virtualised your management farm.
CSMan internally, worth having a second management server as a ‘hot spare’. Otherwise you’re going to need to load balance your connections to them.
Master / Slave is the ‘supported’ configuration; manual switchover not ideal, but in an open source environment anything goes. Some alternatives...
Example of active/active load balanced elements
Really cool setup – Trader Media Group > AutoTrader. They can suffer the loss of an entire datacenter. Use RightScale to burst to Amazon.
As well as what you have, ‘where’ you have it is important. Redundant pairs – DNS servers. Want anti-affinity (two MySQL servers). Want affinity: CSMan and MySQL master.
Similar to RAID 0. Similarly with 2 hosts – if the pair is split across them, you double the probability of loss of the system. Use WLB / DRS rules, or switch off WLB/DRS.
Look at: advantages of automation; tools – from behemoth infrastructures to the simplest of tools; host deployment, configuring management VMs and configuration of CloudStack; + a couple of odds and ends.
How do we achieve that…>
Aim to remove as much human error as possible. At the same time, speed deployment up. Running a script is also quicker than typing and far more repeatable. Kick a ‘load’ of scripts off at the same time.
Automation can come in multiple forms. Chef & Puppet – enterprise grade automation – works for in-house use (required infrastructure makes it less useful for SIs). KickStart and Python – enables you to learn one language and stick to it, requires the interpreter to be installed, but Python is… Shell scripts don’t have to be fancy – have to learn awk & sed – have to learn the different languages. API calls.
Simple example using BASH scripting. A management farm with a reasonable level of redundancy can easily have more than a dozen servers. The particular installation that this example is based on was a multi-tiered network with three interfaces on each VM plus static routes, but one of the networks I worked on had 7 tiers and used proxy servers to reach the internet - a lot of typing to configure it all.
We actually tend to combine these steps and create a VM template with these baked in. But essentially this is all we’d do.
We have a single CSV with all the networking information. The actual one this is based on had 3 interfaces in each VM.
Script itself: ask for the hostname, then read values from the CSV using grep and awk based on hostname.
Write the ifcfg-ethX files, ntp and network files.
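The lookup-and-write step can be sketched as follows. This is a minimal sketch, not the original hostconfig script: the CSV column order (hostname, IP address, netmask, gateway), the file name hosts.csv, the sample addresses and the function names are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical CSV layout (an assumption, not the original file):
#   hostname,ipaddr,netmask,gateway

# Print field number $3 from the CSV row in file $1 whose hostname is $2
csv_field() {
    local csv="$1" host="$2" field="$3"
    grep "^${host}," "$csv" | awk -F',' -v f="$field" '{ print $f }'
}

# Write an ifcfg-style file for interface $3 of host $2 into directory $4
write_ifcfg() {
    local csv="$1" host="$2" dev="$3" outdir="$4"
    cat > "${outdir}/ifcfg-${dev}" <<EOF
DEVICE=${dev}
BOOTPROTO=static
ONBOOT=yes
IPADDR=$(csv_field "$csv" "$host" 2)
NETMASK=$(csv_field "$csv" "$host" 3)
GATEWAY=$(csv_field "$csv" "$host" 4)
EOF
}

# Example run against throwaway data in a temporary directory
workdir=$(mktemp -d)
printf 'csman1,192.0.2.10,255.255.255.0,192.0.2.1\n' > "${workdir}/hosts.csv"
write_ifcfg "${workdir}/hosts.csv" csman1 eth0 "$workdir"
cat "${workdir}/ifcfg-eth0"
```

On a real server the output directory would be /etc/sysconfig/network-scripts and the hostname would come from a prompt, for example with read -p.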
In this case there were routes which changed depending on which network the ‘secondary’ interface was connected to...takes out human error
Req: static IP and FQDN. Pure management server (no NFS or KVM). Separate scripts to add NFS and KVM (thanks to Wido who built the repo).
Again BASH scripting to build the server, however the configuring of hypervisors requires other scripting languages, inc. kickstart for ESXi.
Use a management VM created by previous script. Repeat with new script to configure the server.
Write configuration file. Conscious decision to limit the number of files required. Self contained (requires hypervisor installation files). Look at some elements of the file >
After yum install of DHCP, syslinux and HTTP. See that script writes files rather than importing / downloading them. Adds complexity in script because of escape characters.
This section writes the default file for PXE booting. This section just shows the ESXi option; XenServer or XCP as well. Could have a different script for each host, but then need to generate a new file for each host to tie it to its MAC address. Simpler to add a line in the final CSV.
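For comparison, the per-host alternative that was not chosen here relies on PXELINUX looking for a configuration file named after the client's MAC address: 01- followed by the MAC in lower case, with dashes instead of colons. This is a minimal sketch of deriving that file name; the function name and the sample MAC address are hypothetical.

```shell
#!/bin/bash
# Turn a MAC address such as 00:1A:2B:3C:4D:5E into the per-host
# PXELINUX config file name: 01-00-1a-2b-3c-4d-5e
pxe_cfg_name() {
    local mac="$1"
    # Lower-case the hex digits and swap colons for dashes
    printf '01-%s\n' "$(printf '%s' "$mac" | tr 'A-Z:' 'a-z-')"
}

pxe_cfg_name "00:1A:2B:3C:4D:5E"
```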
XenServer answer file. Note escape characters for quotes, but variables come from earlier in full script.
CSV file for hosts
Note escape characters \\. Weird stuff because of Xen XE command syntax. ESXi has the ESXCli and vicfg commands.
Xen updates are usually painful
In Word or Excel click on link in documentation. Imagine a spreadsheet of the required storage with the final command built at the end. We don’t tend to fully automate this as ‘press-and-go’ because we want to keep an eye on what’s actually happening. Through the GUI itself.
Cheat for finding out what the CloudStack GUI is actually up to.
See the call to the API and the resulting response. Can be used to ‘see how the GUI does it’.
Example of global settings (still need to restart the management server).
Can retrieve information using the GUI. Otherwise only available through database.
Example of deploying a virtual machine
Paste into browser and keep pressing refresh. Spin up 150 hosts to stress test an environment.
Take your pick. Only way to remember what you did, only way for others to replicate. Run scripts from scratch; impossible to update code and separately make changes.