10 years of OpenStack at CERN
(From 0 to 300k cores)
Virtual Open Infrastructure Summit
Belmiro Moreira
belmiro.moreira@cern.ch @belmiromoreira
2009 - 2011
Virtualization and Server Consolidation
CVI - CERN Virtual Infrastructure
LxCloud - Virtualize Batch Infrastructure
2010 - 2011
OpenStack – Early Days
OpenStack at CERN - Early days
https://indico.cern.ch/event/118726/attachments/60920/87520/OpenStack.pdf
OpenStack at CERN - Early days
● Snapshot of one of the first development versions of OpenStack Horizon
2011 - 2013
Agile Infrastructure and Cloud Prototype
Agile Infrastructure Project
Prototyping CERN OpenStack Cloud
● OpenStack releases: Essex (April 2012), Folsom (September 2012), Grizzly (April 2013)
● CERN prototypes: “Guppy” (June 2012), “Hamster” (October 2012), “Ibex” (March 2013)
Engaging with the Community
2013 - 2020
From 0 to +300k cores
CERN OpenStack Cloud - 2013
CERN OpenStack Cloud - Growth
● OpenStack projects available in the CERN Cloud over releases
○ Grizzly: Nova, Glance, Horizon, Keystone, Ceilometer *
○ Havana: Nova, Glance, Horizon, Keystone, Ceilometer *
○ Icehouse: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder
○ Juno: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat *, Rally *
○ Kilo: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally
○ Liberty: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally, EC2API, Magnum *, Barbican *, Neutron *
○ Mitaka: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally, EC2API, Magnum, Barbican, Neutron, Ironic ?, Mistral ?, Manila ?
○ Newton: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally, EC2API, Magnum, Barbican, Neutron, Ironic ?, Mistral ?, Manila ?
○ Ocata: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally, EC2API, Magnum, Barbican, Neutron, Ironic ?, Mistral ?, Manila *, Trove ?, Murano ?
○ Pike: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally, EC2API, Magnum, Barbican, Neutron, Ironic *, Mistral *, Manila *
○ Queens: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally, EC2API, Magnum, Barbican, Neutron, Ironic, Mistral, Manila, Qinling ?, Watcher ?
○ Rocky: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally, EC2API, Magnum, Barbican, Neutron, Ironic, Mistral, Manila, Qinling ?
○ Stein: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally, EC2API, Magnum, Barbican, Neutron, Ironic, Mistral, Manila, Qinling ?, Watcher ?
○ Train: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally, EC2API, Magnum, Barbican, Neutron, Ironic, Mistral, Manila, Qinling ?
○ Ussuri: Nova, Glance, Horizon, Keystone, Ceilometer, Cinder, Heat, Rally, EC2API, Magnum, Barbican, Neutron, Ironic, Mistral, Manila, Qinling ?
* - Pilot service
? - Trial service
Milestone Highlights
Nova - Cells
● Allows Nova to scale to thousands of compute nodes
● Biggest Nova Cells deployment
● Grew from 2 cells to more than 80 cells
● Upgraded from Cells v1 to Cells v2 in 2018
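To illustrate why cells help Nova scale, here is a minimal toy sketch, not Nova's actual code. It assumes a top-level index that records which cell owns each instance, so a lookup touches only one cell rather than every compute node. The class name `CellRouter`, the least-loaded placement policy, and the cell names are all hypothetical.

```python
from typing import Dict, List


class CellRouter:
    """Toy model of a top-level index that maps instances to cells."""

    def __init__(self, cells: List[str]) -> None:
        # Each cell keeps its own list of instances (its own shard of state).
        self.instances_by_cell: Dict[str, List[str]] = {c: [] for c in cells}
        # The top-level index only stores instance -> cell.
        self.instance_to_cell: Dict[str, str] = {}

    def boot(self, instance_id: str) -> str:
        # Hypothetical placement policy: pick the least-loaded cell.
        cell = min(self.instances_by_cell,
                   key=lambda c: len(self.instances_by_cell[c]))
        self.instances_by_cell[cell].append(instance_id)
        self.instance_to_cell[instance_id] = cell
        return cell

    def locate(self, instance_id: str) -> str:
        # A lookup consults the index, then only the owning cell.
        return self.instance_to_cell[instance_id]


router = CellRouter(["cell-a", "cell-b"])
placements = [router.boot(f"vm-{n}") for n in range(4)]
```

Adding a cell in this model adds capacity without growing the state that any single cell has to manage, which is the property the slide refers to.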
Ceilometer - The Rise & Fall
● OpenStack Ceilometer deployed
● Removed after running it for 3 years: it did not scale and retrieving data was difficult
Storage - Cinder, Manila, S3
● OpenStack Cinder with Ceph backend (2014)
○ Several volume types available
● OpenStack Manila (file share service), backed by CephFS (2017)
● S3 available (end of 2018)
Container Orchestration - Magnum
● OpenStack Magnum service available since 2016
● Extremely popular service, with more than 500 clusters
Networking - Nova-network to Neutron
Baremetal Provisioning - Ironic
● In production since 2018
● All new hardware is enrolled using Ironic. More than 5000 nodes are managed by Ironic
● Existing hardware will be enrolled into Ironic during 2020
Operations - Rundeck and Mistral
● OpenStack Mistral
● Rundeck
Operations
● Experience growing and managing the infrastructure over the last 7 years
● Several upgrades during this journey
○ OpenStack release cycle is every 6 months!
○ SLC6 to CC7 upgrade
○ CC7 upgrades
○ CC7 to C8 upgrade?
● Supported KVM and Hyper-V in the same infrastructure for a few years
○ Migrated CVI VMs to OpenStack Hyper-V and then to OpenStack KVM
● Security updates required rebooting the entire cloud
● Most user management operations are automated
○ project creation; quotas; ...
Splitting the Infrastructure into Regions
https://techblog.web.cern.ch/techblog/post/region-split/
Preemptible Instances
● Public Clouds
○ Offer different pricing/SLA levels depending on resource availability
○ Reserved instances vs. spot market
● Private Clouds
○ Quotas are hard limits, which reduces resource utilization
○ Preemptible instances
■ Projects that have exhausted their quota can continue to create instances
● Opportunistic workloads
● Low SLA
● Preemptible Instances Workflow in OpenStack Nova
○ The creation of a non-preemptible VM fails because no resources are available
○ Instances that fail with “No Valid Host” go to the “PENDING” state instead of “ERROR”
○ The Reaper service is notified and tries to free the requested resources
■ If it succeeds, the instance is rebuilt
■ Otherwise, the instance state changes to “ERROR”
https://techblog.web.cern.ch/techblog/post/preemptible-instances/
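The workflow above can be sketched as a small simulation. This is a simplified illustration under stated assumptions, not the Reaper service's real implementation: it models a single host, counts only vCPUs, and evicts the largest preemptible instances first. The names `Instance`, `Host`, `schedule` and `reap`, and the eviction policy, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    name: str
    vcpus: int
    preemptible: bool = False
    state: str = "BUILDING"


@dataclass
class Host:
    total_vcpus: int
    instances: List[Instance] = field(default_factory=list)

    def free_vcpus(self) -> int:
        return self.total_vcpus - sum(i.vcpus for i in self.instances)


def schedule(host: Host, inst: Instance) -> bool:
    """Place the instance if it fits; otherwise mark it PENDING, not ERROR."""
    if inst.vcpus <= host.free_vcpus():
        host.instances.append(inst)
        inst.state = "ACTIVE"
        return True
    inst.state = "PENDING"
    return False


def reap(host: Host, pending: Instance) -> None:
    """Hypothetical reaper: evict preemptible instances until the request fits."""
    victims = [i for i in host.instances if i.preemptible]
    reclaimable = host.free_vcpus() + sum(v.vcpus for v in victims)
    if reclaimable < pending.vcpus:
        # Evicting everything would still not be enough: give up.
        pending.state = "ERROR"
        return
    # Assumed policy: evict the largest preemptible instances first.
    for victim in sorted(victims, key=lambda i: i.vcpus, reverse=True):
        if host.free_vcpus() >= pending.vcpus:
            break
        host.instances.remove(victim)
        victim.state = "DELETED"
    # "Rebuild": retry the placement now that resources are free.
    schedule(host, pending)


host = Host(total_vcpus=8)
spot_a = Instance("spot-a", vcpus=4, preemptible=True)
spot_b = Instance("spot-b", vcpus=2, preemptible=True)
schedule(host, spot_a)
schedule(host, spot_b)

normal = Instance("normal", vcpus=4)
schedule(host, normal)          # only 2 vCPUs free, so it goes to PENDING
state_before_reap = normal.state
reap(host, normal)              # frees spot-a, then rebuilds the instance

too_big = Instance("too-big", vcpus=16)
schedule(host, too_big)
reap(host, too_big)             # cannot fit even after eviction
```

In this run only `spot-a` is evicted, because freeing it already gives enough room; `spot-b` keeps running as an opportunistic workload.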
Work with the Community
2020 - ...
What’s next?
Challenges
● Leveraging Container Orchestration to deploy OpenStack control plane
● Re-enroll existing physical resources into OpenStack Ironic
● Introduction of GPU resources
● Move all resources from nova-network to Neutron
● Exploring how to provide ML platforms and Functions as a Service to our users
Summary
● Over the last 10 years, the resource management and deployment model changed completely
○ From Virtualization and Server consolidation to a Cloud Infrastructure
○ From Baremetal to VMs, to managed Baremetal, to Containers
● Continue to adapt the Infrastructure to new technologies and requirements
○ Iterative approach to introducing new services and new functionality
○ Continue to explore new approaches to deploy/manage a large infrastructure
■ Control Plane managed by Kubernetes
■ New regions
■ Preemptible instances
Hall of Fame
Stefano Zilli
Wataru Takase
Thomas Hartland
Mihai Patrascoiu
Belmiro Moreira
Mateusz Kowalski
Thomas Oulevey
Arne Wiebalck
Jan van Eldik
Jose Castro
Spyridon Trigazis
Daniel Abad
Luis Pigueiras
Vitor Araujo
Luis Fernandez Alvarez
Daniel Fernandez Rodriguez
Gary McGilvary
Marek Denis
Andrea Giardini
Bruno Bompastor
Joe Harrison
Thodoris Tsioutias
Clenimar Filemon
Markus Sommer
Vipin Rathi
Ran Du
Cris Cordeiro
Luca Tartarini
Dinika Saxena
Shweta Oak
Jakub
Pavel
Antonio Marino
Marcos Fermin Lobo
Davide Michelino
Parin Pocheba
Nitin Aggarwal
Sean Crosby
Ignacio Dominguez Martinez-Casanueva
Monika Talach
Mathieu Velten
Bertrand Noel
Konstantinos Samaras-Tsakiris
Surya Seetharaman
Robert Vasek
Ricardo Brito da Rocha
Costin Gament
Domenico Giordano
Iago Santos Pardo
Victor Araujo
Chirag Arora
Cas van der Laan
Zygimantas Matonis
Patrycja Gorniak
Elizaveta Svitanko
Venkata Ravicharan Nudurupati
Fedor Kitashov
Juan Dupuis
Serena Ziviani
Diogo Guerra
Evangelia Santorinaiou
Henni Mohamed
Roberto Soares
Theodoros Tsioutsias
Dheeraj Gupta
Vineet Menon
Lalit Dagre
Pranav Gaur
@belmiromoreira
