Multi-Cell OpenStack: How to Evolve Your Cloud to Scale
OpenStack Design Summit, Paris - November, 2014
Belmiro Moreira - CERN
Matt Van Winkle - Rackspace
Sam Morrison - NeCTAR, University of Melbourne
1. Multi-Cell OpenStack: How to Evolve Your Cloud to Scale
● Belmiro Moreira - CERN
● Matt Van Winkle - Rackspace
● Sam Morrison - NeCTAR, University of Melbourne
2. Cells: How we use them at NeCTAR
Sam Morrison
sam.morrison@unimelb.edu.au
3. NeCTAR Research Cloud
● Started in 2011
● Funded by the Australian Government
● 8 institutions around the country
● In production since early 2012 - OpenStack Diablo
● All federated to appear as one cloud from the user's point of view
● Put the compute near the data and tools
● 5000+ users
4. NeCTAR Sites
● University of Melbourne
● National Computation Infrastructure
● Monash University
● Queensland CyberInfrastructure Foundation
● eResearch SA
● University of Tasmania
● Intersect, NSW
● iVEC, WA
5. Cells to build a Federation
● Use cells to federate geographically separated sites
● Different hardware/networks/people
● Parent cell runs centrally at UniMelb, along with Keystone/Cinder/Glance etc. (no Neutron) - a configuration sketch follows this list
● Each site has 1 or more compute cells
● These roughly match up to availability zones from a user's perspective (cells stay behind the scenes)
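For illustration only (a minimal sketch, not NeCTAR's actual configuration; the cell name below is invented), a two-level cells-v1 tree of this era was typically wired up through the [cells] section of nova.conf, with the parent running the API services and each site running a compute cell:

    # nova.conf on the parent (API) cell, run centrally
    [cells]
    enable = True
    name = api
    cell_type = api

    # nova.conf on a site's compute cell (cell name is hypothetical)
    [cells]
    enable = True
    name = melbourne-np
    cell_type = compute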
6. How big?
● Each site ~4000 cores, ~150 hypervisors
● 6 sites in production, 4600+ instances
● Last 2 sites in production by the end of the year
● ~1000 hypervisors, 40k cores
● ~10 compute cells
● Some sites have multiple data centres, so they have multiple cells
7. Pain points
● Cell scheduling isn’t smart
● Broadcast calls rely on all cells to be alive
● Not many people to share experiences with
● Upgrades, although Havana → Icehouse could happen in stages - much easier!
8. Things we've added, not in trunk (yet)
● Security group syncing
● EC2 ID mappings (needed for metadata)
● Availability zone / aggregate support
● Flavour management (see the sketch after this list)
*We assume a cell only has 1 parent
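A toy Python sketch of the flavour-syncing idea (this is not NeCTAR's patch, which lives inside nova itself; it assumes each cell exposes an admin API, and all endpoints and credentials below are placeholders):

    # Copy any flavour that exists in the API cell but not in a child
    # cell, using the Icehouse-era python-novaclient.
    from novaclient.v1_1 import client

    def sync_flavors(parent, child):
        existing = set(f.name for f in child.flavors.list())
        for flavor in parent.flavors.list():
            if flavor.name in existing:
                continue
            child.flavors.create(name=flavor.name,
                                 ram=flavor.ram,
                                 vcpus=flavor.vcpus,
                                 disk=flavor.disk,
                                 flavorid=flavor.id)  # keep IDs stable across cells

    parent = client.Client('admin', 'secret', 'admin',
                           'http://api-cell.example.org:5000/v2.0')
    child = client.Client('admin', 'secret', 'admin',
                          'http://child-cell.example.org:5000/v2.0')
    sync_flavors(parent, child)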
9. Cells: How we use them at CERN
Belmiro Moreira
email: belmiro.moreira@cern.ch
@belmiromoreira
10. CERN
● Conseil Européen pour la Recherche Nucléaire – aka
European Organization for Nuclear Research
● Founded in 1954 by an international treaty
○ 21 member states; other countries contribute to experiments
○ Situated between Geneva and the Jura Mountains, straddling the Swiss-French border
● CERN's mission is fundamental research
● CERN provides particle accelerators and other infrastructure for high-energy physics research
11. CERN - Cloud Infrastructure
● In production since July 2013
● Performed two upgrades: Grizzly → Havana → Icehouse
○ Currently running: Nova; Glance; Keystone; Horizon; Cinder w/ Ceph; Ceilometer
● RDO distribution on SLC6; pip-installed on Windows Server 2012 R2
● 2 geographically separated data centres
○ Geneva (Switzerland) and Budapest (Hungary)
● Numbers
○ ~3000 compute nodes (75k cores; 140 TB RAM)
■ ~2900 KVM; ~100 Hyper-V
○ ~8000 virtual machines
12. CERN - Cloud Infrastructure - Cells
● Why do we use cells?
○ Scale transparently between different data centres
○ Availability and resilience
○ Isolate different use cases
● Today: 1 API cell and 8 compute cells, registered as sketched after this list
○ 2-level tree
○ Sizes range from ~100 to ~1600 compute nodes
○ 6 compute cells in Switzerland; 2 compute cells in Hungary
● “Shared” and “Private” cells
○ 3 availability zones available in “Shared” cells
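For context (these are not CERN's actual commands; hostnames and credentials are placeholders), a two-level cells-v1 tree was typically registered with nova-manage, telling the parent about each child and each child about its parent:

    # On the API (parent) cell: register a compute (child) cell
    nova-manage cell create --name=compute-cell-01 --cell_type=child \
        --username=guest --password=secret --hostname=rabbit-cell-01 \
        --port=5672 --virtual_host=/ --woffset=1.0 --wscale=1.0

    # On each compute (child) cell: register the parent
    nova-manage cell create --name=api-cell --cell_type=parent \
        --username=guest --password=secret --hostname=rabbit-api \
        --port=5672 --virtual_host=/ --woffset=1.0 --wscale=1.0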
13. CERN - Cells Limitations
● Missing functionality
○ Security groups
○ Flavor propagation (API → compute)
○ Managing aggregates on the API cell
○ Server groups
● Cell scheduler (a filter sketch follows this list)
● Ceilometer integration
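The cells-v1 scheduler could at least be extended with custom filters. A minimal sketch, assuming the Icehouse-era BaseCellFilter interface (the class name and threshold below are invented, and the capacity keys may differ between releases):

    # A custom cell filter that rejects cells with too little free RAM.
    # Enabled via scheduler_filter_classes in the [cells] section.
    from nova.cells import filters

    class MinimumFreeRamFilter(filters.BaseCellFilter):
        """Reject cells whose reported free RAM is below a threshold."""

        MIN_FREE_RAM_MB = 64 * 1024  # made-up threshold

        def cell_passes(self, cell, filter_properties):
            # Child cells periodically report capacity figures upward.
            free = cell.capacities.get('ram_free', {}).get('total_mb', 0)
            return free >= self.MIN_FREE_RAM_MB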
14. CERN - Cells Challenges
● ~74,000 more cores by the beginning of 2015
○ How to organize and distribute nodes between the different cells?
● Split current large cells into smaller cells of ~200 compute nodes each
○ At CERN's ~25 cores per node, ~74k extra cores is roughly 3,000 more compute nodes - about 15 new cells at that size, on top of splitting the existing ones
○ Expected to have 30+ cells by the end of 2015
○ How to manage a large number of cells?
15. Cells at Rackspace
Matt Van Winkle - @mvanwink
Cells: How to Evolve Your Cloud to Scale
16. Rackspace
• Managed Cloud company offering a suite of dedicated and cloud hosting products
• Founded in 1998 in San Antonio, TX
• Home of Fanatical Support
• More than 200,000 customers in 120 countries
17. Rackspace - Cloud Infrastructure
• In production since August 2012
– Currently running: Nova; Glance; Neutron; Ironic; Swift; Cinder
• Regular upgrades from trunk
– A package built from a 10/21 trunk pull is in testing now
• Compute nodes are Debian based
– Run as VMs on the hypervisors and managed via XAPI
• 6 geographic regions around the globe
– DFW; ORD; IAD; LON; SYD; HKG
• Numbers
– Tens of thousands of hypervisors (over 330K cores, 1+ PB of RAM)
• All XenServer
– Over 150,000 virtual machines
18. Rackspace - Cloud Infrastructure - Cells
• Why do we use cells?
– Manage multiple flavor classes
– Network resources (public IPs, private IPs, aggregation routers, etc.)
– Network constraints
– Continual supply chain
• 1 global API cell per region, with multiple compute cells (3 to 35+)
– 2-level tree
– Between ~100 and ~600 hosts per cell
• Control infrastructure exists as instances in a small OpenStack deployment
• All cells available to all tenants
– Tested “dedicated” cells for potential large customers
19. Rackspace - Cells Limitations
• Missing functionality
– Security groups
– Host aggregates
• Scheduler (a weigher sketch follows this list)
– No “disable”
– Incomplete host statuses
• Other services are not cell-aware
– Neutron is a prime example
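One way a deployer could work around incomplete host statuses is to weigh builds away from cells whose reports look stale. A rough sketch, assuming the Icehouse-era BaseCellWeigher interface (the class name and threshold are invented; nova's own mute-child weigher does something similar):

    # Penalize child cells that have not reported state recently,
    # e.g. because their nova-cells service is down or wedged.
    import datetime

    from nova.cells import weights

    class StaleCapacityWeigher(weights.BaseCellWeigher):
        """Push builds away from cells with stale state reports."""

        STALE_AFTER = datetime.timedelta(seconds=120)  # made-up threshold

        def _weigh_object(self, cell, weight_properties):
            # `last_seen` is refreshed when a child reports upward.
            cutoff = datetime.datetime.utcnow() - self.STALE_AFTER
            if cell.last_seen < cutoff:
                return -1000.0
            return 0.0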
20. Rackspace - Cells Challenges
• Increasing number of flavor classes
– Different hardware specs per class
– Sizing varies by average VM density
• Multiple vendor sources
– Subtle hardware differences in the same specs across different vendors
• Scaling global services with cell growth
– Still don't have the perfect ratios
21. Cells Feature Completion
• The Nova dev team met this morning to discuss cells, in a few sessions:
– Cells - Wednesday, November 5, 09:00
– Cells, continued - Wednesday, November 5, 09:50
• Areas of discussion
– Feature completion
– No-op/single cell as the default
– Cell awareness in the APIs
• Recap from sessions
22. Thank You!
● Belmiro Moreira - CERN - belmiro.moreira@cern.ch
● Matt Van Winkle - Rackspace - @mvanwink
● Sam Morrison - NeCTAR, University of Melbourne - sam.morrison@unimelb.edu.au
Questions?