Unveiling CERN Cloud Architecture
OpenStack Design Summit – Tokyo, 2015
Belmiro Moreira
belmiro.moreira@cern.ch @belmiromoreira
What is CERN?
•  European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
•  Founded in 1954
•  21 member states; other countries contribute to experiments
•  Situated between Geneva and the Jura Mountains, straddling the Swiss-French border
•  CERN's mission is to do fundamental research
LHC - Large Hadron Collider
LHC and Experiments
CMS detector
https://www.google.com/maps/streetview/#cern
LHC and Experiments
Proton-lead collisions at ALICE detector
CERN Data Centres
OpenStack at CERN by numbers
~5000 compute nodes (~130k cores)
•  ~4800 KVM
•  ~200 Hyper-V
~2400 images (~30 TB in use)
~1800 volumes (~800 TB allocated)
~2000 users
~2300 projects
~16000 VMs running
Number of VMs created (green) and VMs deleted (red) every 30 minutes
OpenStack timeline at CERN
•  Upstream releases: Essex (5 Apr 2012), Folsom (27 Sep 2012), Grizzly (4 Apr 2013), Havana (17 Oct 2013), Icehouse (17 Apr 2014), Juno (16 Oct 2014), Kilo (30 Apr 2015), Liberty
•  CERN deployments: “Guppy” (Jun 2012), “Ibex” (Mar 2013), Grizzly (Jul 2013), “Hamster” (Oct 2013), Havana (Feb 2014), Icehouse (Oct 2014), Juno (Apr 2015), Kilo (Oct 2015)
CERN production infrastructure
•  Evolution of the number of VMs created since July 2013
Figure: Number of VMs running; number of VMs created (cumulative)
Infrastructure Overview
•  One region, two data centres, 26 cells
•  HA architecture only in the Top Cell
•  Child cell control planes are usually VMs running in the shared infrastructure
•  Using nova-network with a custom CERN driver
•  2 hypervisor types (KVM, Hyper-V)
•  Scientific Linux CERN 6; CERN CentOS 7; Windows Server 2012 R2
•  2 Ceph instances
•  Keystone integrated with the CERN account/lifecycle system
•  Nova, Keystone, Glance, Cinder, Heat, Horizon, Ceilometer, Rally
•  Deployment using OpenStack Puppet modules and RDO
Architecture Overview
Figure: One region spanning two data centres. Geneva Data Centre: load balancer, Nova Top Cell, several Nova Compute Cells, Glance, Cinder, Heat, Ceilometer, Horizon, Keystone, Ceph and the DB infrastructure. Budapest Data Centre: several Nova Compute Cells, Ceph and the DB infrastructure.
Why Cells?
•  Single endpoint to users
•  Scale transparently between Data Centres
•  Availability and Resilience
•  Isolate different use-cases
CellsV1 Limitations
•  Functionality Limitations:
•  Security Groups
•  Managing aggregates in the Top Cell
•  Availability Zone support
•  Limited cell scheduler functionality
•  Ceilometer integration
Nova Deployment at CERN
Figure: The Top Cell controller runs nova-cells and RabbitMQ, with nova-api on separate API nodes behind a load balancer and its own DB. Each child cell controller runs RabbitMQ, nova-cells, nova-api, nova-scheduler, nova-conductor and nova-network with its own DB. Compute nodes run nova-compute.
Nova - Cells Control Plane
Top Cell Controller:
•  Controller nodes run only on physical nodes
•  Clustered RabbitMQ with mirrored queues
•  “nova-api” nodes are VMs
•  Deployed in the “common” (shared user) infrastructure
Child Cell Controllers:
•  Only ONE controller node per cell
•  NO HA at the child cell level
•  Most are VMs running in other cells
•  What if a child cell controller fails?
•  It is replaced by another VM
•  User VMs are still available
•  ~200 compute nodes per cell
Nova - Cells Scheduling
•  Different cells have different use cases
•  Hardware, location, network configuration, hypervisor type, ...
•  Cell capabilities
•  “datacentre”, “hypervisor”, “avzs”
•  Example: capabilities=hypervisor=kvm,avzs=avz-a,datacentre=geneva
•  Scheduler filters use these capabilities
•  CERN cell filters are available at:
https://github.com/cernops/nova/tree/cern-2014.2.2-1/nova/cells/filters
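The capability matching can be illustrated with a minimal Python sketch. It is not the CERN filter code linked above: it assumes each child cell advertises a capabilities string in the `key=value,key=value` form shown in the example, that several values for one key would be separated by `;` (an assumption), and that a request simply lists the values it needs. The functions `parse_capabilities` and `filter_cells` and the cell names are hypothetical.

```python
from typing import Dict, List, Set


def parse_capabilities(spec: str) -> Dict[str, Set[str]]:
    """Parse 'hypervisor=kvm,avzs=avz-a' into {'hypervisor': {'kvm'}, 'avzs': {'avz-a'}}."""
    caps: Dict[str, Set[str]] = {}
    for item in spec.split(","):
        key, sep, values = item.strip().partition("=")
        if not sep:
            continue  # skip empty or malformed entries
        # Assumption: multiple values for one key are separated by ';'
        caps.setdefault(key.strip(), set()).update(
            v.strip() for v in values.split(";") if v.strip()
        )
    return caps


def filter_cells(cells: Dict[str, str], wanted: Dict[str, str]) -> List[str]:
    """Return the (sorted) names of cells that advertise every wanted key=value pair."""
    selected = []
    for name, spec in cells.items():
        caps = parse_capabilities(spec)
        if all(value in caps.get(key, set()) for key, value in wanted.items()):
            selected.append(name)
    return sorted(selected)


# Hypothetical cells and capability strings
cells = {
    "cell01": "hypervisor=kvm,avzs=avz-a,datacentre=geneva",
    "cell02": "hypervisor=hyperv,avzs=avz-b,datacentre=geneva",
    "cell03": "hypervisor=kvm,avzs=avz-c,datacentre=budapest",
}
print(filter_cells(cells, {"hypervisor": "kvm"}))                         # ['cell01', 'cell03']
print(filter_cells(cells, {"hypervisor": "kvm", "datacentre": "geneva"}))  # ['cell01']
```

An availability zone request fits the same mechanism in this sketch: asking for `{"avzs": "avz-a"}` keeps only the cells that advertise that zone.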
Nova - Cells Scheduling - Project Mapping
How do we map projects to cells?
https://github.com/cernops/nova/blob/cern-2014.2.2-2/nova/cells/filters/target_cell_project.py
•  Default cells and dedicated cells
•  The target cell is selected based on the following configuration in “nova.conf”:
cells_default=cellA,cellB,cellC,cellD
cells_projects=cellE:<project_uuid1>;<project_uuid2>,cellF:<project_uuid3>
•  “Disabling” a cell means removing it from the list:
http://openstack-in-production.blogspot.fr/2015/10/scheduling-and-disabling-cells.html
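A minimal Python sketch of the selection logic, assuming only the two option formats in the `nova.conf` snippet: `cells_default` is a comma-separated list, and `cells_projects` holds comma-separated `cell:uuid;uuid` entries. A project listed in a dedicated entry goes only to that cell; any other project may land in any default cell. The function names and project IDs are hypothetical; the actual filter is the `target_cell_project.py` file linked above.

```python
from typing import Dict, List


def parse_default_cells(value: str) -> List[str]:
    """'cellA,cellB' -> ['cellA', 'cellB']"""
    return [cell.strip() for cell in value.split(",") if cell.strip()]


def parse_project_cells(value: str) -> Dict[str, List[str]]:
    """'cellE:p1;p2,cellF:p3' -> {'p1': ['cellE'], 'p2': ['cellE'], 'p3': ['cellF']}"""
    mapping: Dict[str, List[str]] = {}
    for entry in value.split(","):
        cell, sep, projects = entry.strip().partition(":")
        if not sep:
            continue  # skip empty or malformed entries
        for project in projects.split(";"):
            if project.strip():
                mapping.setdefault(project.strip(), []).append(cell.strip())
    return mapping


def candidate_cells(project_id: str, cells_default: str, cells_projects: str) -> List[str]:
    """A project with dedicated cells only goes there; everyone else uses the defaults."""
    dedicated = parse_project_cells(cells_projects)
    if project_id in dedicated:
        return dedicated[project_id]
    return parse_default_cells(cells_default)


# Values shaped like the nova.conf snippet (project IDs are placeholders)
defaults = "cellA,cellB,cellC,cellD"
projects = "cellE:proj-1;proj-2,cellF:proj-3"
print(candidate_cells("proj-2", defaults, projects))  # ['cellE']
print(candidate_cells("proj-9", defaults, projects))  # ['cellA', 'cellB', 'cellC', 'cellD']
```

In this model, disabling `cellB` amounts to dropping it from `cells_default`: new requests from default projects can no longer be placed there, while the instances already running in it are unaffected.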
Nova - Cells Scheduling - AVZs
•  The CellsV1 implementation is not aware of aggregates
•  How do we provide AVZs with cells?
•  Create the aggregate/availability zone in the Top Cell
•  Create “fake” nova-compute services to add nodes to the AVZ aggregates
•  The cell scheduler uses “capabilities” to identify AVZs
•  NO aggregates in the child cells
Nova - Legacy Child Cell configuration at CERN
•  Our first cell (2013)
•  Cell with >1000 compute nodes
•  Any problem in the cell control plane had a huge impact
•  All availability zones were behind this cell, implemented with aggregates
•  Aggregates dedicated to specific projects
•  Multiple hardware types
•  KVM and Hyper-V
Nova - Cell Division (from 1 to 9)
How do we divide an existing cell?
•  Set up new child cell controllers
•  Copy the existing DB to every new cell and, in each copy, delete all instance records that will not belong to that cell
•  Move compute nodes to the new cells
•  Change the instances' “cell path” in the Top Cell DB
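The bookkeeping behind the last three steps can be sketched in Python. The sketch assumes that every instance record carries the compute host it runs on and that a hypothetical `host_to_cell` map says where each compute node will move; from that it derives which records each new cell's DB copy keeps (all others are deleted from that copy) and each instance's new path for the Top Cell DB. The `!` path separator and the `top` cell name are assumptions for illustration, and no real database is touched.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PATH_SEP = "!"  # assumed CellsV1 path separator, e.g. "top!cell01"


def plan_cell_split(
    instances: List[Tuple[str, str]],  # (instance_uuid, compute_host)
    host_to_cell: Dict[str, str],      # compute_host -> new child cell (hypothetical map)
    top_cell: str = "top",
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Return (instance uuids each new cell keeps, new Top Cell path per instance)."""
    keep: Dict[str, List[str]] = defaultdict(list)
    new_path: Dict[str, str] = {}
    for uuid, host in instances:
        cell = host_to_cell.get(host)
        if cell is None:
            # Refuse to guess: every compute node must be assigned before the split.
            raise ValueError(f"host {host!r} has no target cell")
        keep[cell].append(uuid)
        new_path[uuid] = f"{top_cell}{PATH_SEP}{cell}"
    return dict(keep), new_path


def records_to_delete(all_uuids: List[str], kept_by_cell: List[str]) -> List[str]:
    """Instance records to delete from one new cell's copy of the old DB."""
    kept = set(kept_by_cell)
    return [uuid for uuid in all_uuids if uuid not in kept]


instances = [("vm-1", "hv01"), ("vm-2", "hv02"), ("vm-3", "hv03")]
host_to_cell = {"hv01": "cell01", "hv02": "cell01", "hv03": "cell02"}
keep, paths = plan_cell_split(instances, host_to_cell)
print(keep)  # {'cell01': ['vm-1', 'vm-2'], 'cell02': ['vm-3']}
print(records_to_delete([u for u, _ in instances], keep["cell02"]))  # ['vm-1', 'vm-2']
print(paths["vm-3"])  # top!cell02
```

Computing the whole plan up front, and failing on any unassigned host, keeps the irreversible DB deletions consistent with the Top Cell path updates.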
Nova - Live Migration
•  Block live migration
•  Compute nodes don’t have shared storage
•  Not used for daily operations...
•  Resource availability and network cluster constraints
•  Only considered for pets
•  Planned for the SLC6 to CC7 migration
•  Planned for hardware end of life
•  How do we orchestrate a large live-migration campaign?
Nova - Live Migration
•  Block live migration with volumes attached is problematic...
•  Attached Cinder volumes are block-migrated along with the instance
•  They are copied, over the network, from themselves to themselves
•  This can cause data corruption
•  https://bugs.launchpad.net/nova/+bug/1376615
•  https://bugzilla.redhat.com/show_bug.cgi?id=1203032
•  https://review.openstack.org/#/c/176768/
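One way to approach the orchestration question is to plan a campaign before executing anything. The Python sketch below is hypothetical: it assumes a per-host network cluster map, per-instance flags for “pet” and for attached volumes, and a simplified capacity model of free VM slots per target host (real placement would check RAM, disk and CPU). It only targets hosts in the source host's network cluster and leaves instances with attached volumes out of block live migration, because of the bug above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instance:
    uuid: str
    host: str
    is_pet: bool       # only pets are candidates for live migration
    has_volumes: bool  # attached Cinder volumes: avoid block live migration


def plan_migrations(
    instances: List[Instance],
    draining: List[str],         # hosts to empty (e.g. SLC6 or end-of-life servers)
    free_slots: Dict[str, int],  # target host -> free VM slots (simplified capacity)
    cluster_of: Dict[str, str],  # host -> network cluster
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Return ([(uuid, source, target), ...], [uuids left out of the campaign])."""
    slots = dict(free_slots)  # work on a copy; the caller's dict stays untouched
    plan: List[Tuple[str, str, str]] = []
    skipped: List[str] = []
    for inst in instances:
        if inst.host not in draining:
            continue
        if not inst.is_pet or inst.has_volumes:
            skipped.append(inst.uuid)
            continue
        # The VM keeps its IP, so the target must be in the same network cluster.
        source_cluster = cluster_of[inst.host]
        targets = [h for h, n in slots.items()
                   if n > 0 and h not in draining and cluster_of.get(h) == source_cluster]
        if not targets:
            skipped.append(inst.uuid)
            continue
        target = max(targets, key=lambda h: slots[h])  # host with most free slots
        slots[target] -= 1
        plan.append((inst.uuid, inst.host, target))
    return plan, skipped


instances = [
    Instance("vm-1", "old01", is_pet=True, has_volumes=False),
    Instance("vm-2", "old01", is_pet=True, has_volumes=True),
    Instance("vm-3", "old01", is_pet=False, has_volumes=False),
    Instance("vm-4", "old02", is_pet=True, has_volumes=False),
]
cluster_of = {"old01": "A", "old02": "B", "new01": "A", "new02": "B"}
free = {"new01": 2, "new02": 1}
plan, skipped = plan_migrations(instances, ["old01", "old02"], free, cluster_of)
print(plan)     # [('vm-1', 'old01', 'new01'), ('vm-4', 'old02', 'new02')]
print(skipped)  # ['vm-2', 'vm-3']
```

Executing the plan (calling the Nova live-migration API, throttling concurrent migrations, retrying failures) is deliberately left out of this sketch.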
Nova - Kilo with SLC6
•  Kilo dropped support for Python 2.6
•  We still have ~800 compute nodes running SLC6
•  We needed to build a Nova RPM for SLC6
•  Original recipe from GoDaddy!
•  Create a venv using Python 2.7 from SCL (Software Collections)
•  Build the venv with Anvil
•  Package the venv in an RPM
Nova - Network
CERN network configuration:
•  The network is divided into several “network clusters” (L3 networks), each with several “IP services” (L2 subnets)
•  Each compute node is associated with a “network cluster”
•  VMs running on a compute node can only get an IP from the network cluster associated with that compute node
•  https://etherpad.openstack.org/p/Network_Segmentation_Usecases
Nova - Network
•  Developed a CERN network driver
•  When a new VM is created, the driver:
1.  Selects the network cluster based on the compute node chosen to boot the instance
2.  Selects an address from the network cluster
3.  Updates the CERN network database
4.  Waits for the central DNS refresh
•  The “fixed_ips” table contains IPv4, IPv6, MAC and network cluster
•  A new table maps “host” -> network cluster
•  Network constraints affect some Nova operations
•  Resize, live migration
•  https://github.com/cernops/nova/blob/cern-2014.2.2-2/nova/network/manager.py
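Steps 1 and 2 of the allocation can be sketched with Python's standard `ipaddress` module. The data model is an assumption that mirrors the description above: a host -> network cluster table, and for each cluster a list of IP services (L2 subnets). Only IPv4 is handled, steps 3 and 4 (updating the CERN network database, waiting for DNS) are site-specific and only marked by a comment, and all host names and subnets are hypothetical. This is not the linked `manager.py` code.

```python
import ipaddress
from typing import Dict, List, Set

# Hypothetical data: compute host -> network cluster, cluster -> IP services (L2 subnets)
HOST_CLUSTER: Dict[str, str] = {"hv01": "cluster-a", "hv02": "cluster-b"}
CLUSTER_SERVICES: Dict[str, List[str]] = {
    "cluster-a": ["10.0.0.0/29", "10.0.1.0/29"],
    "cluster-b": ["10.1.0.0/29"],
}


def allocate_ip(host: str, used: Set[str]) -> str:
    """Return a free IPv4 address from the network cluster of the chosen compute node."""
    cluster = HOST_CLUSTER[host]               # step 1: the cluster follows the host
    for service in CLUSTER_SERVICES[cluster]:  # step 2: first free address in the cluster
        for addr in ipaddress.ip_network(service).hosts():
            if str(addr) not in used:
                used.add(str(addr))
                # Steps 3 and 4 (update the network DB, wait for DNS) would go here.
                return str(addr)
    raise RuntimeError(f"network cluster {cluster} has no free address")


used_ips = {"10.0.0.1"}
print(allocate_ip("hv01", used_ips))  # 10.0.0.2
print(allocate_ip("hv02", used_ips))  # 10.1.0.1
```

Because the address is drawn from the host's cluster, moving a VM to a host in a different cluster (during a resize or live migration) would leave it with an address that is invalid there, which is the constraint noted above.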
Neutron is coming...
•  NOT in production; test/development instance only
•  What we use/don't use from Neutron
•  No SDN or tunneling
•  Only provider networks, no private/tenant networks
•  Flat networking: VMs are bridged directly to the real network
•  No DHCP or DNS from Neutron; we already have our own infrastructure
•  We don't use floating IPs
•  Neutron API not exposed to users
•  Implemented API extensions and a mechanism driver for our use case
•  https://github.com/cernops/neutron/commit/63f4e19c7423dcdc2b5a7573d0898ec9e799663b
•  How do we migrate from nova-network to Neutron?
Keystone Deployment at CERN
Figure: A load balancer in front of two Keystone deployments, one exposed to users and one dedicated to Ceilometer, each with a service catalogue and DB; Keystone is connected to Active Directory.
Keystone
•  Keystone nodes are VMs
•  Integrated with CERN’s Active Directory infrastructure
•  Project life cycle
•  ~200 arrivals/departures per month
•  A CERN user subscribes to the “cloud service”
•  A “Personal Project” with a limited quota is created
•  “Shared Projects” are created on request
•  The “Personal Project” is disabled when the user leaves the Organization
•  After 3 months its resources are stopped, and after 6 months they are deleted (VMs, volumes, images, …)
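The life cycle of a departed user's personal project can be expressed as a small Python function. It is a sketch that assumes the 3- and 6-month periods are counted in calendar months from the departure date, with the day clamped to the end of shorter months; the returned action names are hypothetical labels, not Keystone API calls.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day (e.g. 31 Aug + 6 months -> 29 Feb 2016)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def personal_project_action(departure: date, today: date) -> str:
    """Return the state a departed user's personal project should be in today."""
    if today < departure:
        return "active"
    if today < add_months(departure, 3):
        return "disabled"           # user left: project disabled
    if today < add_months(departure, 6):
        return "resources_stopped"  # after 3 months: stop VMs, etc.
    return "resources_deleted"      # after 6 months: delete VMs, volumes, images


departure = date(2015, 8, 31)
print(personal_project_action(departure, date(2015, 9, 15)))  # disabled
print(personal_project_action(departure, date(2015, 12, 1)))  # resources_stopped
print(personal_project_action(departure, date(2016, 3, 1)))   # resources_deleted
```

Keeping the decision a pure function of two dates makes it easy to run daily over all ~200 monthly arrivals and departures and to test edge cases such as month ends.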
Glance Deployment at CERN
Figure: A load balancer in front of two sets of Glance nodes, each running glance-api and glance-registry: one set exposed to users, one used only for Ceilometer calls. Both use the Glance DB and the Ceph cluster in Geneva.
Glance
•  Uses the Ceph backend in Geneva
•  Glance nodes are VMs
•  NO Glance image cache
•  Glance API and Glance Registry run on the same node
•  Glance API only talks to the local Glance Registry
•  Two sets of nodes (API exposed to users; API for Ceilometer)
•  When will Glance have per-project quotas?
•  Missing quotas are problematic in private clouds where users are not “charged” for storage
Cinder Deployment at CERN
Figure: A load balancer in front of Cinder nodes running cinder-api, cinder-volume and cinder-scheduler, with a DB and RabbitMQ. Backends: Ceph in Geneva, Ceph in Budapest, and NetApp.
Cinder
•  Ceph and NetApp backends
•  Extended list of available volume types (QoS, backend, location)
•  Cinder nodes are VMs
•  Active/Active?
•  When a volume is created, it is associated with a “cinder-volume” node
•  That node is responsible for the volume's operations
•  Replacing Cinder controller nodes is not easy
•  DB entries need to be changed manually
•  More about the CERN storage infrastructure for OpenStack:
•  https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/ceph-at-cern-a-year-in-the-life-of-a-petabyte-scale-block-storage-service
Ceilometer Deployment at CERN
Figure: Ceilometer components: ceilometer-compute running next to nova-compute on each compute node; ceilometer-central-agent; notification agent; collectors for polling, notifications and UDP samples (samples are sent over RPC and UDP); a dedicated Ceilometer RabbitMQ alongside the cell RabbitMQ notifications; HBase, MySQL and MongoDB storage; the Ceilometer API (also used by Heat); and the evaluator & notifier.
Ceilometer
•  “ceilometer-compute-agent” queries “nova-api” for the instances hosted on the compute node
•  This can be very demanding for “nova-api”
•  When using the default “instance_name_template”, the “instance_name” in the Top Cell differs from the one in the child cell
•  A “nova-api” is needed per cell
Figure: Number of Nova API calls made by ceilometer-compute-agent per hour
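The name mismatch follows from how the default template is filled in: in Nova of this era, `instance_name_template` defaults to `instance-%08x` and is formatted with the instance's integer database ID. The Top Cell and a child cell keep separate instances tables, so the same VM generally has a different ID, and therefore a different name, in each. A minimal Python illustration with hypothetical IDs:

```python
DEFAULT_TEMPLATE = "instance-%08x"  # Nova's default instance_name_template


def instance_name(db_id: int, template: str = DEFAULT_TEMPLATE) -> str:
    """Render an instance name from its integer DB id, as the default template does."""
    return template % db_id


# Hypothetical IDs: the same VM as recorded in two different databases.
top_cell_id, child_cell_id = 123456, 4242
print(instance_name(top_cell_id))    # instance-0001e240
print(instance_name(child_cell_id))  # instance-00001092
```

An agent on the compute node sees the child cell's name, so matching it against the Top Cell's view fails, which is why each cell needs its own “nova-api”.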
Ceilometer
•  Using a dedicated RabbitMQ cluster for Ceilometer
•  Initially we used the child cells' RabbitMQ: not a good idea!
•  Any failure/slowdown in the backend storage system can create a big queue...
Figure: Size of the “metering.sample” queue
Rally
•  Probing/benchmarking the infrastructure every hour
Challenges
•  Capacity increase to 200k cores by Summer 2016
•  Live-migrate thousands of VMs
•  Upgrade ~800 compute nodes from SLC6 to CC7
•  Retire old servers
•  Move to Neutron
•  Identity Federation with different scientific sites
•  Magnum and container possibilities
belmiro.moreira@cern.ch
@belmiromoreira
http://openstack-in-production.blogspot.com

A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Recently uploaded (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Unveiling CERN Cloud Architecture - October, 2015

  • 12. Architecture Overview
    (Diagram components: Nova Top Cell; several Nova Compute Cells (...) in the Geneva Data Centre and the Budapest Data Centre; Load Balancer; Glance, Cinder, Heat, Ceilometer, Horizon, Keystone; Ceph and DB infrastructure in each data centre.)
  • 13. Why Cells?
    •  Single endpoint for users
    •  Scale transparently between Data Centres
    •  Availability and Resilience
    •  Isolate different use cases
  • 14. CellsV1 Limitations
    •  Functionality limitations:
       •  Security Groups
       •  Managing aggregates on the Top Cell
       •  Availability Zone support
       •  Limited Cell scheduler functionality
       •  Ceilometer integration
  • 15. Nova Deployment at CERN
    (Diagram: a Load Balancer in front of the API nodes running nova-api; the Top cell controller runs nova-cells and rabbitmq, with its own DB; each Child cell controller runs rabbitmq, nova-cells, nova-api, nova-scheduler, nova-conductor and nova-network, with its own DB, and serves compute nodes running nova-compute; (...) further child cells.)
  • 16. Nova - Cells Control Plane
    Top Cell controller:
    •  Controller nodes run only on physical nodes
    •  Clustered RabbitMQ with mirrored queues
    •  “nova-api” nodes are VMs
       •  deployed in the “common” (user shared) infrastructure
    Child Cell controllers:
    •  Only ONE controller node per cell
    •  NO HA at the Child Cell level
    •  Most are VMs running in other Cells
    •  Child Cell controller fails?
       •  Replaced by another VM
       •  User VMs are still available
    •  ~200 compute nodes per cell
  • 17. Nova - Cells Scheduling
    •  Different cells have different use cases
       •  Hardware, location, network configuration, hypervisor type, ...
    •  Cell capabilities
       •  “datacentre”, “hypervisor”, “avzs”
       •  example: capabilities=hypervisor=kvm,avzs=avz-a,datacentre=geneva
    •  Scheduler filters use these capabilities
    •  CERN Cell filters available at: https://github.com/cernops/nova/tree/cern-2014.2.2-1/nova/cells/filters
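A minimal Python sketch of capability-based cell filtering, loosely modelled on the `capabilities=hypervisor=kvm,avzs=avz-a,datacentre=geneva` example above. It is not the code of the CERN filters in the linked repository: `parse_capabilities` and `filter_cells` are hypothetical names, and treating `;` as the separator for several values of one key is an assumption.

```python
from typing import Dict, List, Set


def parse_capabilities(raw: str) -> Dict[str, Set[str]]:
    """Parse 'hypervisor=kvm,avzs=avz-a,datacentre=geneva' into {key: {values}}.

    Assumption: several values for one key are separated by ';'
    (e.g. 'avzs=avz-b;avz-c').
    """
    caps: Dict[str, Set[str]] = {}
    for item in raw.split(","):
        key, sep, values = item.partition("=")
        if not sep:
            continue  # ignore malformed entries without '='
        bucket = caps.setdefault(key.strip(), set())
        for value in values.split(";"):
            if value.strip():
                bucket.add(value.strip())
    return caps


def filter_cells(cells: Dict[str, str], wanted: Dict[str, str]) -> List[str]:
    """Return the cells whose capabilities contain every requested key/value pair."""
    selected = []
    for name, raw in cells.items():
        caps = parse_capabilities(raw)
        if all(value in caps.get(key, set()) for key, value in wanted.items()):
            selected.append(name)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical cells and capability strings.
    cells = {
        "cell01": "hypervisor=kvm,avzs=avz-a,datacentre=geneva",
        "cell02": "hypervisor=hyperv,avzs=avz-a,datacentre=geneva",
        "cell03": "hypervisor=kvm,avzs=avz-b;avz-c,datacentre=budapest",
    }
    print(filter_cells(cells, {"hypervisor": "kvm", "avzs": "avz-a"}))  # ['cell01']
    print(filter_cells(cells, {"avzs": "avz-c"}))  # ['cell03']
```

The same lookup is what lets the cell scheduler map a requested availability zone to cells via the `avzs` capability, without any aggregates in the child cells.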
  • 18. Nova - Cells Scheduling - Project Mapping
    How do we map projects to cells?
    https://github.com/cernops/nova/blob/cern-2014.2.2-2/nova/cells/filters/target_cell_project.py
    •  Default cells; dedicated cells
    •  The target cell is selected according to the following configuration in “nova.conf”:
         cells_default=cellA,cellB,cellC,cellD
         cells_projects=cellE:<project_uuid1>;<project_uuid2>,cellF:<project_uuid3>
    •  “Disabling” a cell means removing it from the list...
    http://openstack-in-production.blogspot.fr/2015/10/scheduling-and-disabling-cells.html
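The `cells_default` / `cells_projects` options above can be read as a simple lookup. The sketch below is a hypothetical reading, not the logic of `target_cell_project.py`: it assumes a project listed in `cells_projects` is only scheduled to its dedicated cell(s) and every other project to the default list, and it leaves the final choice among the candidates to the remaining filters.

```python
from typing import Dict, List


def parse_cells_projects(raw: str) -> Dict[str, List[str]]:
    """Turn 'cellE:p1;p2,cellF:p3' into {'p1': ['cellE'], 'p2': ['cellE'], 'p3': ['cellF']}."""
    mapping: Dict[str, List[str]] = {}
    for entry in raw.split(","):
        cell, sep, projects = entry.partition(":")
        if not sep:
            continue  # skip entries without a project list
        for project in projects.split(";"):
            project = project.strip()
            if project:
                mapping.setdefault(project, []).append(cell.strip())
    return mapping


def candidate_cells(project_id: str, cells_default: str, cells_projects: str) -> List[str]:
    """Cells a new instance of `project_id` may be scheduled to.

    Assumption: a project listed in cells_projects only goes to its dedicated
    cell(s); every other project is spread over cells_default.
    """
    dedicated = parse_cells_projects(cells_projects)
    if project_id in dedicated:
        return dedicated[project_id]
    return [cell.strip() for cell in cells_default.split(",") if cell.strip()]


if __name__ == "__main__":
    defaults = "cellA,cellB,cellC,cellD"
    projects = "cellE:uuid1;uuid2,cellF:uuid3"
    print(candidate_cells("uuid2", defaults, projects))  # ['cellE']
    print(candidate_cells("other", defaults, projects))  # ['cellA', 'cellB', 'cellC', 'cellD']
    # "Disabling" cellD: drop it from cells_default.
    print(candidate_cells("other", "cellA,cellB,cellC", projects))  # ['cellA', 'cellB', 'cellC']
```

Keeping the mapping in configuration means disabling a cell for new instances is a one-line change that does not touch the running VMs in that cell.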
  • 19. Nova - Cells Scheduling - AVZs
    •  The CellsV1 implementation is not aware of aggregates
    •  How to have AVZs with cells?
       •  Create the aggregate/availability zone in the Top Cell
       •  Create “fake” nova-compute services to add nodes into the AVZ aggregates
       •  The Cell scheduler uses “capabilities” to identify AVZs
       •  NO aggregates in the Child Cells
  • 20. Nova - Legacy Child Cell configuration at CERN
    •  Our first cell (2013)
    •  Cell with >1000 compute nodes
       •  Any problem in the Cell control plane had a huge impact
    •  All availability zones behind this Cell, using aggregates
    •  Aggregates dedicated to specific projects
    •  Multiple hardware types
    •  KVM and Hyper-V
  • 21. Nova - Cell Division (from 1 to 9)
    How to divide an existing Cell?
    •  Set up new Child Cell controllers
    •  Copy the existing DB to all new Cells and, in each, delete all instance records that will not belong to that new Cell
    •  Move compute nodes to the new Cells
    •  Change the instances' “cell path” in the Top Cell DB
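The DB part of these division steps can be sketched with in-memory SQLite. Everything here is an assumption for illustration: the two-column `instances` tables stand in for the real Nova schema (which has many more instance-related tables), `split_cell` is a hypothetical helper, and the `top!child` path format with `!` as separator is how we understand cells v1 to record an instance's cell. Moving the compute nodes themselves is configuration work and is not modelled.

```python
import sqlite3
from typing import Dict, List

# Assumption: cells v1 records an instance's location in the Top Cell DB as a
# path such as "top!cell01" ('!' separates the path components).
CELL_PATH_SEP = "!"


def split_cell(legacy: sqlite3.Connection, top: sqlite3.Connection,
               top_name: str, new_cells: Dict[str, List[str]]) -> Dict[str, sqlite3.Connection]:
    """Divide one child-cell DB into several, following the steps on the slide.

    Hypothetical, simplified schema: child DB instances(uuid, host),
    Top Cell DB instances(uuid, cell_name).
    """
    result: Dict[str, sqlite3.Connection] = {}
    for cell_name, hosts in new_cells.items():
        if not hosts:
            raise ValueError(f"{cell_name} has no compute nodes")
        # Copy the existing DB for the new cell.
        new_db = sqlite3.connect(":memory:")
        legacy.backup(new_db)
        # Delete the instance records that will not belong to this new cell,
        # i.e. those on compute nodes that move to another cell.
        marks = ",".join("?" for _ in hosts)
        new_db.execute(f"DELETE FROM instances WHERE host NOT IN ({marks})", hosts)
        new_db.commit()
        # Change the "cell path" of the remaining instances in the Top Cell DB.
        path = f"{top_name}{CELL_PATH_SEP}{cell_name}"
        uuids = [row[0] for row in new_db.execute("SELECT uuid FROM instances")]
        top.executemany("UPDATE instances SET cell_name = ? WHERE uuid = ?",
                        [(path, uuid) for uuid in uuids])
        result[cell_name] = new_db
    top.commit()
    return result


if __name__ == "__main__":
    legacy = sqlite3.connect(":memory:")
    legacy.execute("CREATE TABLE instances (uuid TEXT, host TEXT)")
    legacy.executemany("INSERT INTO instances VALUES (?, ?)",
                       [("vm-1", "cn001"), ("vm-2", "cn002"), ("vm-3", "cn003")])
    legacy.commit()
    top = sqlite3.connect(":memory:")
    top.execute("CREATE TABLE instances (uuid TEXT, cell_name TEXT)")
    top.executemany("INSERT INTO instances VALUES (?, ?)",
                    [(u, "top!legacy") for u in ("vm-1", "vm-2", "vm-3")])
    top.commit()
    cells = split_cell(legacy, top, "top", {"cell01": ["cn001", "cn002"], "cell02": ["cn003"]})
    print(sorted(r[0] for r in cells["cell01"].execute("SELECT uuid FROM instances")))
    print(top.execute("SELECT cell_name FROM instances WHERE uuid = 'vm-3'").fetchone()[0])
```

Copying the whole DB and then deleting, rather than selectively copying, keeps every instance's row (and in a real schema its related rows) identical to the original in the cell that keeps it.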
  • 22. Nova - Live Migration
    •  Block live migration
       •  Compute nodes don’t have shared storage
    •  Not used for daily operations...
       •  Resource availability and network cluster constraints
       •  Only considered for pets
    •  Planned for the SLC6 to CC7 migration
    •  Planned for hardware end of life
    •  How to orchestrate a large live-migration campaign?
  • 23. Nova - Live Migration
    •  Block live migration with volumes attached is problematic...
       •  Attached Cinder volumes are block migrated along with the instance
       •  They are copied, over the network, from themselves to themselves
       •  Can cause data corruption
    •  https://bugs.launchpad.net/nova/+bug/1376615
    •  https://bugzilla.redhat.com/show_bug.cgi?id=1203032
    •  https://review.openstack.org/#/c/176768/
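One hypothetical way to plan such a campaign, given the constraints on these two slides: block live migration only to a host in the same network cluster, instances with attached volumes set aside because of the corruption risk, and moves cut into fixed-size batches. `Instance`, `plan_campaign` and the round-robin target choice are illustrative assumptions, not CERN tooling.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instance:
    uuid: str
    host: str
    has_volumes: bool = False  # attached Cinder volumes


Move = Tuple[str, str]  # (instance uuid, target host)


def plan_campaign(instances: List[Instance], host_cluster: Dict[str, str],
                  targets: List[str], batch_size: int) -> Tuple[List[List[Move]], List[str]]:
    """Plan block live migrations off hosts that are being drained.

    - a target must be in the same network cluster as the source host,
      because the VM's IP belongs to that cluster;
    - instances with attached volumes are deferred (block migration with
      volumes attached is problematic);
    - targets in a cluster are used round-robin; moves are cut into batches.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    targets_by_cluster: Dict[str, List[str]] = {}
    for target in targets:
        targets_by_cluster.setdefault(host_cluster[target], []).append(target)
    next_index: Dict[str, int] = {}
    moves: List[Move] = []
    deferred: List[str] = []
    for inst in instances:
        cluster = host_cluster[inst.host]
        candidates = targets_by_cluster.get(cluster, [])
        if inst.has_volumes or not candidates:
            deferred.append(inst.uuid)
            continue
        index = next_index.get(cluster, 0)
        moves.append((inst.uuid, candidates[index % len(candidates)]))
        next_index[cluster] = index + 1
    batches = [moves[start:start + batch_size] for start in range(0, len(moves), batch_size)]
    return batches, deferred


if __name__ == "__main__":
    host_cluster = {"old01": "c1", "old02": "c2", "new01": "c1", "new02": "c1", "new03": "c3"}
    vms = [Instance("a", "old01"), Instance("b", "old01"),
           Instance("c", "old01", has_volumes=True), Instance("d", "old02")]
    batches, deferred = plan_campaign(vms, host_cluster, ["new01", "new02", "new03"], batch_size=1)
    print(batches)   # [[('a', 'new01')], [('b', 'new02')]]
    print(deferred)  # ['c', 'd']
```

Deferred instances (with volumes, or in a cluster with no free target) would need separate handling; batching bounds how many block migrations copy disks over the network at the same time.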
  • 24. Nova - Kilo with SLC6
    •  Kilo dropped support for Python 2.6
    •  We still have ~800 compute nodes running SLC6
    •  We needed to build a Nova RPM for SLC6
       •  Original recipe from GoDaddy!
       •  Create a venv using Python 2.7 from SCL
       •  Build the venv with Anvil
       •  Package the venv in an RPM
  • 25. Nova - Network
    CERN network configuration:
    •  The network is divided into several “network clusters” (L3 networks), each of which has several “IP services” (L2 subnets)
    •  Each compute node is associated with a “network cluster”
    •  VMs running on a compute node can only have an IP from the “network cluster” associated with that compute node
    •  https://etherpad.openstack.org/p/Network_Segmentation_Usecases
  • 26. Nova - Network
    •  Developed a CERN network driver
    •  Creating a new VM:
       1.  Select the network cluster based on the compute node selected to boot the instance
       2.  Select an address from the network cluster
       3.  Update the CERN network database
       4.  Wait for the central DNS refresh
    •  The “fixed_ips” table contains IPv4, IPv6, MAC and network cluster
    •  A new table maps “host” -> network cluster
    •  Network constraints in some Nova operations
       •  Resize, Live-Migration
    •  https://github.com/cernops/nova/blob/cern-2014.2.2-2/nova/network/manager.py
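A minimal sketch of the allocation steps above using Python's `ipaddress` module. The classes, the in-memory `network_db` dictionary (a stand-in for the CERN network database) and the first-free-address policy are assumptions; IPv6, MAC addresses and the DNS wait are left out. `can_move` illustrates why resize and live migration are constrained: assuming the VM keeps its IP, the target host must belong to the same network cluster.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NetworkCluster:
    """An L3 "network cluster" made of several L2 "IP services" (subnets)."""
    name: str
    ip_services: List[ipaddress.IPv4Network]
    allocated: Set[ipaddress.IPv4Address] = field(default_factory=set)


class ClusterAddressAllocator:
    """Hypothetical allocator: a VM's IP comes from its compute node's network cluster."""

    def __init__(self, host_to_cluster: Dict[str, str], clusters: Dict[str, NetworkCluster]):
        self.host_to_cluster = host_to_cluster  # the "host" -> network cluster mapping
        self.clusters = clusters
        self.network_db: Dict[str, str] = {}    # stand-in for the site network database

    def allocate(self, vm_name: str, compute_host: str) -> ipaddress.IPv4Address:
        # 1. Select the network cluster from the compute node chosen by the scheduler.
        cluster = self.clusters[self.host_to_cluster[compute_host]]
        # 2. Select a free address from one of the cluster's IP services.
        for subnet in cluster.ip_services:
            for address in subnet.hosts():
                if address not in cluster.allocated:
                    cluster.allocated.add(address)
                    # 3. Record the VM in the network database.
                    self.network_db[vm_name] = str(address)
                    # 4. Waiting for the central DNS refresh is not modelled.
                    return address
        raise RuntimeError(f"network cluster {cluster.name} has no free addresses")

    def can_move(self, source_host: str, target_host: str) -> bool:
        """Assuming the VM keeps its IP, both hosts must be in the same cluster."""
        return self.host_to_cluster[source_host] == self.host_to_cluster[target_host]


if __name__ == "__main__":
    clusters = {
        "c1": NetworkCluster("c1", [ipaddress.ip_network("10.0.0.0/30")]),
        "c2": NetworkCluster("c2", [ipaddress.ip_network("10.1.0.0/29")]),
    }
    alloc = ClusterAddressAllocator({"cn1": "c1", "cn2": "c1", "cn3": "c2"}, clusters)
    print(alloc.allocate("vm1", "cn1"))   # 10.0.0.1
    print(alloc.allocate("vm3", "cn3"))   # 10.1.0.1
    print(alloc.can_move("cn1", "cn3"))   # False
```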
  • 27. Neutron is coming...
    •  NOT in production. Testing/development instance
    •  What we use/don't use from Neutron
       •  No SDN or tunneling
       •  Only provider networks, no private/tenant networks
       •  Flat networking: VMs bridged directly to the real network
       •  No DHCP or DNS from Neutron; we already have our own infrastructure
       •  We don't use floating IPs
       •  Neutron API not exposed to users
    •  Implemented API extensions and a Mechanism Driver for our use case
       •  https://github.com/cernops/neutron/commit/63f4e19c7423dcdc2b5a7573d0898ec9e799663b
    •  How to migrate from nova-network to Neutron?
  • 28. Keystone Deployment at CERN
    (Diagram components: Load Balancer; two sets of Keystone nodes, each with its own Service Catalogue, one exposed to users and one dedicated to Ceilometer; DB; Active Directory.)
  • 29. Keystone
    •  Keystone nodes are VMs
    •  Integrated with CERN’s Active Directory infrastructure
    •  Project life cycle
       •  ~200 arrivals/departures per month
       •  A CERN user subscribes to the “cloud service”
       •  A “Personal Project” with limited quota is created
       •  “Shared Projects” are created on request
       •  The “Personal Project” is disabled when the user leaves the Organization
       •  After 3 months its resources are stopped, and after 6 months they are deleted (VMs, Volumes, Images, …)
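The departure rules for personal projects translate into a small state function. This is a hypothetical sketch: the state names are made up, and reading "3 months" and "6 months" as calendar months counted from the departure date is an assumption.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(day: date, months: int) -> date:
    """Same day `months` calendar months later, clamped to that month's last day."""
    index = day.month - 1 + months
    year, month = day.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def personal_project_state(left_on: Optional[date], today: date) -> str:
    """State of a "Personal Project" given the owner's departure date (None = still at CERN)."""
    if left_on is None or today < left_on:
        return "active"
    if today >= add_months(left_on, 6):
        return "delete_resources"  # VMs, volumes, images, ...
    if today >= add_months(left_on, 3):
        return "stop_resources"
    return "disabled"


if __name__ == "__main__":
    left = date(2015, 1, 31)
    for today in (date(2015, 2, 1), date(2015, 4, 30), date(2015, 7, 31)):
        print(today, personal_project_state(left, today))
```

A daily job applying such a function to every personal project would keep up with the ~200 monthly arrivals and departures without manual intervention.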
  • 30. Glance Deployment at CERN
    (Diagram components: Load Balancer; DB; two kinds of Glance node, each running Glance-api and Glance-registry, one exposed to users and one only used for Ceilometer calls; Ceph Geneva.)
  • 31. Glance
    •  Uses the Ceph backend in Geneva
    •  Glance nodes are VMs
    •  NO Glance image cache
    •  Glance API and Glance Registry run on the same node
       •  Glance API only talks to the local Glance Registry
    •  Two sets of nodes (API exposed to users, and API for Ceilometer)
    •  When will Glance have quotas per project?
       •  Problematic in private clouds where users are not “charged” for storage
  • 32. Cinder Deployment at CERN
    (Diagram components: Load Balancer; DB; Cinder node running Cinder-api, Cinder-volume and Cinder-scheduler; rabbitmq; backends Ceph Geneva, Ceph Budapest and NetApp.)
  • 33. Cinder
    •  Ceph and NetApp backends
    •  Extended list of available volume types (QoS, Backend, Location)
    •  Cinder nodes are VMs
    •  Active/Active?
       •  When a volume is created, a “cinder-volume” node is associated with it
          •  Responsible for the volume's operations
       •  Not easy to replace Cinder controller nodes
          •  DB entries need to be changed manually
    •  More about the CERN storage infrastructure for OpenStack:
       •  https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/ceph-at-cern-a-year-in-the-life-of-a-petabyte-scale-block-storage-service
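Because each volume is tied to the `cinder-volume` node it was created on, replacing a node means rewriting DB entries. A hedged sketch against a simplified, hypothetical `volumes(id, host)` table in SQLite, assuming the `node@backend#pool` host format; a real Cinder database has more columns and possibly other tables that reference the host, and should be backed up before any manual change.

```python
import sqlite3


def reassign_volume_host(db: sqlite3.Connection, old_node: str, new_node: str) -> int:
    """Point volumes owned by `old_node` at `new_node`, keeping '@backend#pool'.

    Assumes the host column looks like 'node@backend#pool' and a simplified
    volumes(id, host) table; returns the number of rows changed.
    """
    changed = 0
    rows = db.execute("SELECT id, host FROM volumes").fetchall()
    for volume_id, host in rows:
        if host is None:
            continue  # e.g. a volume that was never scheduled
        node, sep, backend_and_pool = host.partition("@")
        if node == old_node:
            db.execute("UPDATE volumes SET host = ? WHERE id = ?",
                       (new_node + sep + backend_and_pool, volume_id))
            changed += 1
    db.commit()
    return changed


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE volumes (id TEXT, host TEXT)")
    db.executemany("INSERT INTO volumes VALUES (?, ?)", [
        ("v1", "cinder01@ceph-geneva#ceph"),
        ("v2", "cinder02@ceph-budapest#ceph"),
    ])
    db.commit()
    print(reassign_volume_host(db, "cinder01", "cinder03"))  # 1
```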
  • 34. Ceilometer Deployment at CERN
    (Diagram components: compute node running nova-compute and ceilometer-compute; ceilometer-central-agent; Cell rabbitmq (notifications); Ceilometer Notification Agent; Ceilometer rabbitmq; Ceilometer Polling Collector, Notification Collector and UDP Collector (samples via RPC and UDP); HBase, MySQL and MongoDB; Ceilometer API; Ceilometer Evaluator & Notifier; Heat.)
  • 35. Ceilometer
    •  “ceilometer-compute-agent” queries “nova-api” for the instances hosted on the compute node
       •  This can be very demanding for “nova-api”
       •  When using the default “instance_name_template”, the “instance_name” in the Top Cell is different from the one in the Child Cell
       •  We need a “nova-api” per Cell
    (Graph: number of Nova API calls made by ceilometer-compute-agent per hour)
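Why the names differ, as we understand it: with the default `instance_name_template` (`instance-%08x` in Nova), the name is derived from the integer database id, and the Top Cell and each Child Cell database assign their own ids to the same instance. The sketch below uses a simplified, hypothetical renderer (not Nova's code) and made-up ids; for contrast, a UUID-based template gives the same name in both databases.

```python
DEFAULT_INSTANCE_NAME_TEMPLATE = "instance-%08x"  # Nova's default instance_name_template


def render_instance_name(template: str, db_id: int, uuid: str) -> str:
    """Simplified rendering: positional templates use the integer DB id,
    named templates such as 'instance-%(uuid)s' use the UUID."""
    if "%(" in template:
        return template % {"id": db_id, "uuid": uuid}
    return template % db_id


if __name__ == "__main__":
    uuid = "0b8e6f52-5f3c-4d7e-9a61-2c4d8e1f7a90"  # same instance in both DBs
    top_id, child_id = 1042, 17                    # hypothetical per-DB row ids
    print(render_instance_name(DEFAULT_INSTANCE_NAME_TEMPLATE, top_id, uuid))    # instance-00000412
    print(render_instance_name(DEFAULT_INSTANCE_NAME_TEMPLATE, child_id, uuid))  # instance-00000011
```

The hypervisor-side name comes from the Child Cell, so a name reported by the Top Cell's API does not match it; querying a per-cell “nova-api” avoids the mismatch.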
  • 36. Ceilometer
    •  Using a dedicated RabbitMQ cluster for Ceilometer
       •  Initially we used the Child Cells' RabbitMQ. Not a good idea!
       •  Any failure/slowdown in the backend storage system can create a big queue...
    (Graph: size of the “metering.sample” queue)
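A toy model of the queue-growth problem: once the collectors can store fewer samples per minute than are published, the backlog grows, and after recovery it drains only as fast as the spare capacity allows. The rates below are invented for illustration, not CERN measurements.

```python
from typing import Iterable, List


def simulate_queue(publish_per_min: int, consume_per_min: Iterable[int]) -> List[int]:
    """Queue depth at the end of each minute, for a constant publish rate and a
    per-minute consumer capacity (which drops when the storage backend slows down)."""
    depth = 0
    history: List[int] = []
    for capacity in consume_per_min:
        depth += publish_per_min       # samples published this minute
        depth -= min(depth, capacity)  # samples the collectors managed to store
        history.append(depth)
    return history


if __name__ == "__main__":
    # Hypothetical rates: the backend slows down in minutes 3-4, then recovers.
    print(simulate_queue(1000, [1200, 1200, 300, 300, 1200, 1200]))
    # [0, 0, 700, 1400, 1200, 1000]
```

With a spare capacity of only 200 samples per minute, a two-minute slowdown leaves a backlog for several more minutes; on a broker shared with a cell, that backlog competes with Nova's own traffic.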
  • 37. Rally
    •  Probes/benchmarks the infrastructure every hour
  • 38. Challenges
    •  Capacity increase to 200k cores by Summer 2016
    •  Live migrate thousands of VMs
       •  Upgrade ~800 compute nodes from SLC6 to CC7
       •  Retire old servers
    •  Move to Neutron
    •  Identity Federation with different scientific sites
    •  Magnum and container possibilities