This document summarizes the development of Atlassian's private cloud platform. It describes how the team built a small-scale test environment (Block-1) and then a larger one (Block-2) to validate the architecture. Over time the platform grew to 13,500 VMs, which led to issues such as poor performance and slow deployments. The team then focused on optimizing the platform infrastructure, using technologies like OpenVZ containers to reduce overhead and read-only OS images to improve consistency and simplify management. Throughout, the team took an iterative approach, testing concepts at small scale before full deployment so that issues surfaced early.
13. Be less flexible about what
infrastructure you provide.
14. “You can use any database you like, as long as it’s PostgreSQL 8.4.”
#summit12
15. • Stop trying to be everything to everyone
• (we have other clouds within Atlassian)
• Lower operational complexity
• Easier to provide a deeply integrated, well-supported toolchain
• Small test surface matrix
17. Do as little as possible: deploy and use it
18. Block-1
A small-scale model of the initial proposed platform architecture. 4 desktop machines and a switch.
Purpose: Validate design, evaluate failure modes.
http://history.nasa.gov/Apollo204/blocks.html
19. Block-1
Applications do not fall over.
Network boot assumptions validated.
Creation of VMs over NFS was too resource- and time-intensive. (more on this later)
20. Block-2
A large scale model of the platform architecture.
Purpose: Validate hardware resource assumptions and compare CPU vendors.
http://history.nasa.gov/Apollo204/blocks.html
21. Block-2
Customers per GB of RAM metric validated
VM Distribution and failover tools work.
Initial specs of compute hardware too conservative.
Decided to add 50% more RAM.
23. Challenge
Existing platform hardware was a poor fit for our workload.
Memory and IO were heavily constrained, but CPU was not.
24. Monitoring
We took 6 months' worth of monitoring data from our existing platform.
We used this data to determine the right mix of hardware.
25. • 10 x Compute nodes (144G RAM, 12 cores, NO disks)
• 3 x Storage nodes (24 disks)
• Each rack delivered fully assembled
• Unwrap, provide power, networking
• Connected to customers in ~2 hours
26. Advantage #1
Reliable.
Each machine goes through a 2-day burn-in before it goes into the rack.
32. Challenge
Existing compute infrastructure used local disk for swap
and hypervisor boot.
Once we got the memory density right, the only thing left needing local disk was boot.
34. • No disks in compute infrastructure
• Avoid spinning 20 more disks per rack for a hypervisor OS
• Evaluated booting from:
• USB drives (unreliable and slow!)
• NFS (what if the network goes away?)
• Custom binary initrd image + kernel
35. • Image is a ~170 MB gzipped filesystem
• Downloaded on boot, extracted into RAM: ~400 MB (see the sketch below)
• No external dependencies after boot
• All compute nodes boot from the same image
• Reboot to known state
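The boot flow above: each compute node network-boots a kernel and a custom initrd, fetches the gzipped filesystem image, unpacks it into RAM, and switches root into it. A minimal sketch of that initrd logic, assuming an HTTP boot server and a tmpfs root (the server name, image path, and sizes are illustrative, not the actual implementation):

#!/bin/sh
# Illustrative initrd fragment: fetch the hypervisor OS image and run entirely from RAM.
mount -t tmpfs -o size=512m tmpfs /newroot            # extracted OS (~400 MB) lives in RAM
wget -q -O /tmp/hypervisor.tar.gz \
  http://boot-server.example/images/hypervisor-v1.tar.gz   # ~170 MB gzipped filesystem
tar -xzf /tmp/hypervisor.tar.gz -C /newroot
exec switch_root /newroot /sbin/init                  # no external dependencies from here on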
37. Sharp Edges.
• No swap == provision carefully
• Not a problem if you automate provisioning
• Treat running hypervisor image like an appliance
• Don’t change code - rebuild image and reboot
• Doing this often? Too many services in the hypervisor
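The "rebuild image and reboot" workflow above can be as simple as repacking the hypervisor chroot and republishing it to the boot server; a rough sketch (paths, hostnames, and the rollout mechanism are assumptions):

# Repack the hypervisor OS from its build chroot and publish a new version.
tar -czf hypervisor-v2.tar.gz -C /build/hypervisor-chroot .
scp hypervisor-v2.tar.gz boot-server.example:/images/
# Point the network-boot config at the new image, then reboot compute nodes
# (rolling, after their containers have moved) to pick up the known state.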
39. Challenge
Virtualisation is often inefficient.
There’s a memory and CPU penalty which is hard to avoid.
40. OpenVZ
• Linux containers
• Basis for Parallels Virtuozzo Containers
• LXC isn’t there yet
• No guest OS kernels
• No performance hit
• Better resource sharing
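For context, the stock OpenVZ workflow looks roughly like the sketch below; the CTID, template, and limits are illustrative, and the platform described in these slides later replaces the per-container OS copy with shared read-only mounts:

# Standard OpenVZ container lifecycle - containers share the host kernel,
# so there is no guest kernel to boot and no hypervisor translation layer.
vzctl create 101 --ostemplate centos-6-x86_64
vzctl set 101 --ram 2G --swap 0 --save     # hard resource limits, written to the config
vzctl start 101
vzctl exec 101 ps aux                      # processes run directly on the shared kernel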
46. Challenge
Java VMs aren’t lightweight.
47. • Full virtualisation does a poor job at this
• 50 VMs = 50 Kernels + 50 caches + 50 shared libs!
• Memory de-dupe combats this, but burns CPU.
• Memory de-dupe works across all OSes
• We don’t use Windows.
• By being less flexible, we can exploit Linux specific features.
49. • Provide a single OS image to all - free benefits:
• Shared libraries only load once.
• OS is cached only once.
• OS image is the same on every instance.
50. Challenge
If all containers share the same OS image, then managing state is a nightmare!
One bad change in one container would break them all!
51. • But managing state on multiple machines is a solved problem!
• What if you have >10,000 machines?
• Why are you modifying the OS anyway?
52. Does your iPhone upgrade iOS when you install an app?
53. “Fix problems by removing them, not by adding systems to manage them.”
#summit12
62-71. Container (layout built up across slides)
• OpenVZ Kernel underneath every container
• / - Read Only: OS tools, system supplied code
• /sw - Read Only: Applications, JVMs, Configs
• /data - Read/Write: application and user data, under /data/service/
72. How?
• Storage nodes export /e/ro/ & /e/rw
• Build an OS distro inside a chroot.
• Use whatever tools you are comfortable with.
• Put this chroot tree in the RO location on storage nodes
• Make a “data” dir in the RW location for each container
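A sketch of the publish step described above, assuming debootstrap as the distro-build tool and the /e/ro and /e/rw export layout from the slide (tool choice, paths, and the example container ID are illustrative):

# Build a distro inside a chroot, then publish it to the read-only export.
debootstrap stable /build/linux-image-v1 http://deb.debian.org/debian
rsync -a /build/linux-image-v1/ /e/ro/os/linux-image-v1/
# Per-container writable area on the RW export.
mkdir -p /e/rw/101/data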
73. How?
• On container start, bind mount:
/net/storage-n/e/ro/os/linux-image-v1/ -> /vz/<ctid>/root
• Replace etc, var & tmp with a memfs
• Linux expects to be able to write to these
• Mount the container’s data dir (RW) to /data
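A sketch of the container-start mounts described above, using tmpfs as the memfs (the OpenVZ mount-hook wiring, tmpfs sizes, and example container ID are assumptions):

CT=/vz/101/root
# The shared read-only OS image becomes the container root.
mount --bind /net/storage-n/e/ro/os/linux-image-v1 $CT
# Linux expects to write to these, so back them with small in-memory filesystems
# (seeded with whatever config the container needs).
mount -t tmpfs -o size=64m tmpfs $CT/etc
mount -t tmpfs -o size=64m tmpfs $CT/var
mount -t tmpfs -o size=64m tmpfs $CT/tmp
# Per-container writable data.
mount --bind /net/storage-n/e/rw/101/data $CT/data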
74. More benefits
• Distribute OS images as a simple directory.
• Prove that environments (Dev, Stg, Prd) are identical using MD5sum (sketch below).
• Flip between OS versions by changing a variable
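One way to get that MD5sum guarantee: checksum every file in the image tree, then checksum the sorted list, giving a single fingerprint to compare across Dev, Stg, and Prd (the exact recipe is an assumption; the slide only names MD5sum):

# Single fingerprint for an OS image tree; identical output means identical bits.
cd /e/ro/os/linux-image-v1
find . -type f -print0 | sort -z | xargs -0 md5sum | md5sum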
76. The swear wall helps prevent death by a thousand cuts.
Your team has a gut feeling about what’s hurting them; this helps you quantify that feeling and act on the pain.