This document summarizes the development of Atlassian's private cloud platform. It describes how the team built a small-scale test environment (Block-1) and then a larger one (Block-2) to validate the architecture. Over time the platform grew to 13,500 VMs, which led to issues such as poor performance and slow deployments. The team then focused on optimizing the platform infrastructure, using technologies like OpenVZ containers to reduce overhead and read-only OS images to improve consistency and simplify management. Throughout, the team took an iterative approach, testing concepts at small scale before full deployment so that issues surfaced early.
13. Be less flexible about what
infrastructure you provide.
14. “You can use any database you like, as long as it’s PostgreSQL 8.4.”
#summit12
15. • Stop trying to be everything to everyone
• (we have other clouds within Atlassian)
• Lower operational complexity
• Easier to provide a deeply integrated, well-supported toolchain
• Small test surface matrix
17. Do as little as possible: deploy and use it
18. Block-1
A small-scale model of the initial proposed platform architecture. 4 desktop machines and a switch.
Purpose: Validate design, evaluate failure modes.
http://history.nasa.gov/Apollo204/blocks.html
19. Block-1
Applications do not fall over.
Network boot assumptions validated.
Creation of VMs over NFS was too resource- and time-intensive. (more on this later)
20. Block-2
A large scale model of the platform architecture.
Purpose: Validate hardware resource assumptions and compare CPU vendors.
http://history.nasa.gov/Apollo204/blocks.html
21. Block-2
Customers per GB of RAM metric validated
VM Distribution and failover tools work.
Initial specs of compute hardware too conservative.
Decided to add 50% more RAM.
23. Challenge
Existing platform hardware was a poor fit for our workload.
Memory and IO were heavily constrained, but CPU was not.
24. Monitoring
We took 6 months' worth of monitoring data from our existing platform.
We used this data to determine the right mix of hardware.
25. • 10 x Compute nodes (144G RAM, 12 cores, NO disks)
• 3 x Storage nodes (24 disks)
• Each rack delivered fully assembled
• Unwrap, provide power, networking
• Connected to customers in ~2 hours
26. Advantage #1
Reliable.
Each machine goes through a 2-day burn-in before it goes into the rack.
32. Challenge
Existing compute infrastructure used local disk for swap
and hypervisor boot.
Once we got the memory density right, the only thing left needing local disk was boot.
34. • No disks in compute infrastructure
• Avoid spinning 20 more disks per rack for a hypervisor OS
• Evaluated booting from:
• USB drives (unreliable and slow!)
• NFS (what if the network goes away?)
• Custom binary initrd image + kernel
35. • Image is a ~170 MB gzipped filesystem
• Downloaded on boot, extracted into RAM: ~400 MB (see the sketch below)
• No external dependencies after boot
• All compute nodes boot from the same image
• Reboot to known state
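The boot flow above: each compute node network-boots a kernel and a custom initrd, fetches the gzipped filesystem image, unpacks it into RAM, and switches root into it. A minimal sketch of that initrd logic, assuming an HTTP boot server and a tmpfs root (the server name, image path, and sizes are illustrative, not the actual implementation):

#!/bin/sh
# Illustrative initrd fragment: fetch the hypervisor OS image and run entirely from RAM.
mount -t tmpfs -o size=512m tmpfs /newroot            # extracted OS (~400 MB) lives in RAM
wget -q -O /tmp/hypervisor.tar.gz \
  http://boot-server.example/images/hypervisor-v1.tar.gz   # ~170 MB gzipped filesystem
tar -xzf /tmp/hypervisor.tar.gz -C /newroot
exec switch_root /newroot /sbin/init                  # no external dependencies from here on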
37. Sharp Edges.
• No swap == provision carefully
• Not a problem if you automate provisioning
• Treat running hypervisor image like an appliance
• Don’t change code - rebuild image and reboot
• Doing this often? Too many services in the hypervisor
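The "rebuild image and reboot" workflow above can be as simple as repacking the hypervisor chroot and republishing it to the boot server; a rough sketch (paths, hostnames, and the rollout mechanism are assumptions):

# Repack the hypervisor OS from its build chroot and publish a new version.
tar -czf hypervisor-v2.tar.gz -C /build/hypervisor-chroot .
scp hypervisor-v2.tar.gz boot-server.example:/images/
# Point the network-boot config at the new image, then reboot compute nodes
# (rolling, after their containers have moved) to pick up the known state.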
39. Challenge
Virtualisation is often inefficient.
There’s a memory and CPU penalty which is hard to avoid.
40. OpenVZ
• Linux containers
• Basis for Parallels Virtuozzo Containers
• LXC isn’t there yet
• No guest OS kernels
• No performance hit
• Better resource sharing
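For context, the stock OpenVZ workflow looks roughly like the sketch below; the CTID, template, and limits are illustrative, and the platform described in these slides later replaces the per-container OS copy with shared read-only mounts:

# Standard OpenVZ container lifecycle - containers share the host kernel,
# so there is no guest kernel to boot and no hypervisor translation layer.
vzctl create 101 --ostemplate centos-6-x86_64
vzctl set 101 --ram 2G --swap 0 --save     # hard resource limits, written to the config
vzctl start 101
vzctl exec 101 ps aux                      # processes run directly on the shared kernel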
46. Challenge
Java VMs aren’t lightweight.
47. • Full virtualisation does a poor job at this
• 50 VMs = 50 Kernels + 50 caches + 50 shared libs!
• Memory de-dupe combats this, but burns CPU.
• Memory de-dupe works across all OSes
• We don’t use Windows.
• By being less flexible, we can exploit Linux specific features.
49. • Provide a single OS image to all - free benefits:
• Shared libraries only load once.
• OS is cached only once.
• OS image is the same on every instance.
50. Challenge
If all containers share the same OS image, then managing state is a nightmare!
One bad change in one container would break them all!
51. • But managing state on multiple machines is a solved problem!
• What if you have >10,000 machines?
• Why are you modifying the OS anyway?
52. Does your iPhone upgrade iOS when you install an app?
53. “Fix problems by removing them, not by adding systems to manage them.”
#summit12
62-71. Container (layout built up across slides)
• OpenVZ Kernel underneath every container
• / - Read Only: OS tools, system supplied code
• /sw - Read Only: Applications, JVMs, Configs
• /data - Read/Write: application and user data, under /data/service/
72. How?
• Storage nodes export /e/ro/ & /e/rw
• Build an OS distro inside a chroot.
• Use whatever tools you are comfortable with.
• Put this chroot tree in the RO location on storage nodes
• Make a “data” dir in the RW location for each container
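A sketch of the publish step described above, assuming debootstrap as the distro-build tool and the /e/ro and /e/rw export layout from the slide (tool choice, paths, and the example container ID are illustrative):

# Build a distro inside a chroot, then publish it to the read-only export.
debootstrap stable /build/linux-image-v1 http://deb.debian.org/debian
rsync -a /build/linux-image-v1/ /e/ro/os/linux-image-v1/
# Per-container writable area on the RW export.
mkdir -p /e/rw/101/data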
73. How?
• On container start, bind mount:
/net/storage-n/e/ro/os/linux-image-v1/ -> /vz/<ctid>/root
• Replace etc, var & tmp with a memfs
• Linux expects to be able to write to these
• Mount the container’s data dir (RW) to /data
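A sketch of the container-start mounts described above, using tmpfs as the memfs (the OpenVZ mount-hook wiring, tmpfs sizes, and example container ID are assumptions):

CT=/vz/101/root
# The shared read-only OS image becomes the container root.
mount --bind /net/storage-n/e/ro/os/linux-image-v1 $CT
# Linux expects to write to these, so back them with small in-memory filesystems
# (seeded with whatever config the container needs).
mount -t tmpfs -o size=64m tmpfs $CT/etc
mount -t tmpfs -o size=64m tmpfs $CT/var
mount -t tmpfs -o size=64m tmpfs $CT/tmp
# Per-container writable data.
mount --bind /net/storage-n/e/rw/101/data $CT/data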
74. More benefits
• Distribute OS images as a simple directory.
• Prove that environments (Dev, Stg, Prd) are identical using MD5sum (sketch below).
• Flip between OS versions by changing a variable
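One way to get that MD5sum guarantee: checksum every file in the image tree, then checksum the sorted list, giving a single fingerprint to compare across Dev, Stg, and Prd (the exact recipe is an assumption; the slide only names MD5sum):

# Single fingerprint for an OS image tree; identical output means identical bits.
cd /e/ro/os/linux-image-v1
find . -type f -print0 | sort -z | xargs -0 md5sum | md5sum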
76. The swear wall helps prevent death by a thousand cuts.
Your team has a gut feeling about what’s hurting them; this helps you quantify that feeling and act on the pain.