SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Containers on Baremetal
And Preemptible Servers
At CERN and SKA
Belmiro Moreira, CERN
@belmoreira
John Garbutt, StackHPC
@johnthetubaguy
CERN - Large Hadron Collider (LHC)
CERN: Compact Muon Solenoid (CMS)
CERN - Cloud resources status board - 11/05/2018@09:23
What is SKA?
Image courtesy of CSIRO
Containers on Baremetal
SKA’s Science Data Processor
ALaSKA
SKA Performance Prototype
AlaSKA
Why Baremetal? Why Containers?
● Single security zone
● No need for virt, so target baremetal
● 30 seconds to switch ingest to Supernova, Fast radio burst, …
● Easier development and deployment cycles
Magnum with Ironic
● Magnum used extensively at CERN
● Docker Swarm and Kubernetes are supported
● Historically a separate driver for baremetal, badly maintained
● Queens moves to using Fedora Atomic for VM and baremetal
System Integration Prototype
Lessons learned
● Extra network ports added after initial setup
● Updated Docker version in Fedora Atomic 27
● Updating Atomic image with RDMA drivers was tricky
● Root disk wasn’t resized by cloud-init
http://www.stackhpc.com/magnum-queens.html
Preemptible Instances
Resources Utilization
● Public Clouds give the illusion of infinite capacity
○ Users pay for resources that they use
● Private Clouds
○ Resource management usually is based in project quotas
○ Prevent resources being exhausted
○ Prevent “over-committing” resources/quota
○ Manage individual projects requirements
○ Reserve resources for operations with higher priority
○ Scientific Clouds
■ Projects have different funding models
■ They expect a predefined number of resources available
■ But not always these resources are used full time
Idle Resources with quotas
Project 1 Project 2
Quota utilization Quota utilization
Idle Resources with quotas
Idle Resources with quotas
Maximize Resource Utilization
● Public Clouds
○ Based on different pricing/SLA considering resource availability
○ Reserved instances vs spot-market
● Private Clouds
○ Quotas are hard limits. Leads to a reduction in resource utilization
○ Preemptible instances
■ Projects that exhausted their quota can continue to create instances
● Opportunistic workloads
● Low SLA
Preemptible Instances
● Proposal to implement Preemptible Instances into OpenStack
○ Build a prototype
○ Minimise changes required in OpenStack nova
● Starting simple
○ Use dedicated projects for Preemptible Instances
■ Avoids tagging individual instances
○ Introduce a “Reaper” service
■ Orchestrator to manage preemptible instances
● Removes preemptible instances when resources are required for non preemptible
instances
● Applies a maximum TTL to preemptible instances
Workflow
Instance set into
PENDING state
Nova notifications
consumer
Selects
preemptible(s)
instance(s) to delete
Nova “Reaper” service
nova-api
nova-scheduler
“No Valid Host”
“No Valid Host”
Notification
1) delete selected preemptible(s)
2) rebuild instance
Reset instance ERROR state
Workflow
● The creation of a non preemptible VM fails because there aren’t available
resources
● Instances that fail with “Nova Valid Host”, go to “PENDING” state instead of
“ERROR”
● The Reaper service is notified and it tries to free the requested resources
○ Rebuild the instance
○ Or change instance state to “ERROR”
Current work in Preemptible Instances
● Add instance state PENDING (spec)
○ https://review.openstack.org/#/c/554212/
● Allow rebuild instances in cell0 (spec)
○ https://review.openstack.org/#/c/554218/
● Add scheduling notification
○ https://review.openstack.org/#/c/566470/
● Implement instance state PENDING
○ https://review.openstack.org/#/c/566473/
● Reaper prototype:
○ https://gitlab.cern.ch/ttsiouts/ReaperServicePrototype
Join the Scientific SIG and...
Get involved!
https://www.openstack.org/science/
Belmiro Moreira, CERN
@belmoreira
John Garbutt, StackHPC
@johnthetubaguy
Join the Scientific SIG and...
Get involved!
https://www.openstack.org/science/

Weitere ähnliche Inhalte

Was ist angesagt?

The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicTim Bell
 
Cern Cloud Architecture - February, 2016
Cern Cloud Architecture - February, 2016Cern Cloud Architecture - February, 2016
Cern Cloud Architecture - February, 2016Belmiro Moreira
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1Tim Bell
 
Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November, 2014
Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November, 2014Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November, 2014
Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November, 2014Belmiro Moreira
 
OpenStack @ CERN, by Tim Bell
OpenStack @ CERN, by Tim BellOpenStack @ CERN, by Tim Bell
OpenStack @ CERN, by Tim BellAmrita Prasad
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3Tim Bell
 
OpenContrail Implementations
OpenContrail ImplementationsOpenContrail Implementations
OpenContrail ImplementationsJakub Pavlik
 
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015Belmiro Moreira
 
Operators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 NetworksOperators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 NetworksJakub Pavlik
 
20121017 OpenStack CERN Accelerating Science
20121017 OpenStack CERN Accelerating Science20121017 OpenStack CERN Accelerating Science
20121017 OpenStack CERN Accelerating ScienceTim Bell
 
Learning to Scale OpenStack
Learning to Scale OpenStackLearning to Scale OpenStack
Learning to Scale OpenStackRainya Mosher
 
OpenStack Ousts vCenter for DevOps and Unites IT Silos at AVG Technologies
OpenStack Ousts vCenter for DevOps and Unites IT Silos at AVG Technologies OpenStack Ousts vCenter for DevOps and Unites IT Silos at AVG Technologies
OpenStack Ousts vCenter for DevOps and Unites IT Silos at AVG Technologies Jakub Pavlik
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Belmiro Moreira
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3Tim Bell
 
OpenContrail Experience tcp cloud OpenStack Summit Tokyo
OpenContrail Experience tcp cloud OpenStack Summit TokyoOpenContrail Experience tcp cloud OpenStack Summit Tokyo
OpenContrail Experience tcp cloud OpenStack Summit TokyoJakub Pavlik
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveTim Bell
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Belmiro Moreira
 
What's new in OpenStack Liberty
What's new in OpenStack LibertyWhat's new in OpenStack Liberty
What's new in OpenStack LibertyStephen Gordon
 
Kubernetes and OpenStack at Scale
Kubernetes and OpenStack at ScaleKubernetes and OpenStack at Scale
Kubernetes and OpenStack at ScaleStephen Gordon
 
How to Survive an OpenStack Cloud Meltdown with Ceph
How to Survive an OpenStack Cloud Meltdown with CephHow to Survive an OpenStack Cloud Meltdown with Ceph
How to Survive an OpenStack Cloud Meltdown with CephSean Cohen
 

Was ist angesagt? (20)

The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
 
Cern Cloud Architecture - February, 2016
Cern Cloud Architecture - February, 2016Cern Cloud Architecture - February, 2016
Cern Cloud Architecture - February, 2016
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1
 
Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November, 2014
Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November, 2014Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November, 2014
Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November, 2014
 
OpenStack @ CERN, by Tim Bell
OpenStack @ CERN, by Tim BellOpenStack @ CERN, by Tim Bell
OpenStack @ CERN, by Tim Bell
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3
 
OpenContrail Implementations
OpenContrail ImplementationsOpenContrail Implementations
OpenContrail Implementations
 
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
 
Operators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 NetworksOperators experience and perspective on SDN with VLANs and L3 Networks
Operators experience and perspective on SDN with VLANs and L3 Networks
 
20121017 OpenStack CERN Accelerating Science
20121017 OpenStack CERN Accelerating Science20121017 OpenStack CERN Accelerating Science
20121017 OpenStack CERN Accelerating Science
 
Learning to Scale OpenStack
Learning to Scale OpenStackLearning to Scale OpenStack
Learning to Scale OpenStack
 
OpenStack Ousts vCenter for DevOps and Unites IT Silos at AVG Technologies
OpenStack Ousts vCenter for DevOps and Unites IT Silos at AVG Technologies OpenStack Ousts vCenter for DevOps and Unites IT Silos at AVG Technologies
OpenStack Ousts vCenter for DevOps and Unites IT Silos at AVG Technologies
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
OpenContrail Experience tcp cloud OpenStack Summit Tokyo
OpenContrail Experience tcp cloud OpenStack Summit TokyoOpenContrail Experience tcp cloud OpenStack Summit Tokyo
OpenContrail Experience tcp cloud OpenStack Summit Tokyo
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspective
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015
 
What's new in OpenStack Liberty
What's new in OpenStack LibertyWhat's new in OpenStack Liberty
What's new in OpenStack Liberty
 
Kubernetes and OpenStack at Scale
Kubernetes and OpenStack at ScaleKubernetes and OpenStack at Scale
Kubernetes and OpenStack at Scale
 
How to Survive an OpenStack Cloud Meltdown with Ceph
How to Survive an OpenStack Cloud Meltdown with CephHow to Survive an OpenStack Cloud Meltdown with Ceph
How to Survive an OpenStack Cloud Meltdown with Ceph
 

Ähnlich wie Containers on Baremetal and Preemptible VMs at CERN and SKA

OpenNebula and StorPool: Building Powerful Clouds
OpenNebula and StorPool: Building Powerful CloudsOpenNebula and StorPool: Building Powerful Clouds
OpenNebula and StorPool: Building Powerful CloudsOpenNebula Project
 
Boyan Krosnov - Building a software-defined cloud - our experience
Boyan Krosnov - Building a software-defined cloud - our experienceBoyan Krosnov - Building a software-defined cloud - our experience
Boyan Krosnov - Building a software-defined cloud - our experienceShapeBlue
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...HostedbyConfluent
 
OpenStack Neutron: What's New In Kilo and a Look Toward Liberty
OpenStack Neutron: What's New In Kilo and a Look Toward LibertyOpenStack Neutron: What's New In Kilo and a Look Toward Liberty
OpenStack Neutron: What's New In Kilo and a Look Toward Libertymestery
 
To Russia with Love: Deploying Kubernetes in Exotic Locations On Prem
To Russia with Love: Deploying Kubernetes in Exotic Locations On PremTo Russia with Love: Deploying Kubernetes in Exotic Locations On Prem
To Russia with Love: Deploying Kubernetes in Exotic Locations On PremCloudOps2005
 
Build bare metal kubernetes cluster for hpc on open stack in translational me...
Build bare metal kubernetes cluster for hpc on open stack in translational me...Build bare metal kubernetes cluster for hpc on open stack in translational me...
Build bare metal kubernetes cluster for hpc on open stack in translational me...Shuquan Huang
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetesdatamantra
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Chris Fregly
 
From swarm to swam-mode in the CERN container service
From swarm to swam-mode in the CERN container serviceFrom swarm to swam-mode in the CERN container service
From swarm to swam-mode in the CERN container serviceSpyros Trigazis
 
Rally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at ScaleRally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at ScaleMirantis
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
Our Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudOur Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudHostedbyConfluent
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with SparkRoger Rafanell Mas
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKevin Lynch
 
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...OpenNebula Project
 
Kubernetes vs dockers swarm supporting onap oom on multi-cloud multi-stack en...
Kubernetes vs dockers swarm supporting onap oom on multi-cloud multi-stack en...Kubernetes vs dockers swarm supporting onap oom on multi-cloud multi-stack en...
Kubernetes vs dockers swarm supporting onap oom on multi-cloud multi-stack en...Arthur Berezin
 
Bug smash day magnum
Bug smash day magnumBug smash day magnum
Bug smash day magnumTon Ngo
 
Bug smash day magnum
Bug smash day magnumBug smash day magnum
Bug smash day magnumTon Ngo
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesDatabricks
 

Ähnlich wie Containers on Baremetal and Preemptible VMs at CERN and SKA (20)

OpenNebula and StorPool: Building Powerful Clouds
OpenNebula and StorPool: Building Powerful CloudsOpenNebula and StorPool: Building Powerful Clouds
OpenNebula and StorPool: Building Powerful Clouds
 
Neutron Updates - Kilo Edition
Neutron Updates - Kilo EditionNeutron Updates - Kilo Edition
Neutron Updates - Kilo Edition
 
Boyan Krosnov - Building a software-defined cloud - our experience
Boyan Krosnov - Building a software-defined cloud - our experienceBoyan Krosnov - Building a software-defined cloud - our experience
Boyan Krosnov - Building a software-defined cloud - our experience
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 
OpenStack Neutron: What's New In Kilo and a Look Toward Liberty
OpenStack Neutron: What's New In Kilo and a Look Toward LibertyOpenStack Neutron: What's New In Kilo and a Look Toward Liberty
OpenStack Neutron: What's New In Kilo and a Look Toward Liberty
 
To Russia with Love: Deploying Kubernetes in Exotic Locations On Prem
To Russia with Love: Deploying Kubernetes in Exotic Locations On PremTo Russia with Love: Deploying Kubernetes in Exotic Locations On Prem
To Russia with Love: Deploying Kubernetes in Exotic Locations On Prem
 
Build bare metal kubernetes cluster for hpc on open stack in translational me...
Build bare metal kubernetes cluster for hpc on open stack in translational me...Build bare metal kubernetes cluster for hpc on open stack in translational me...
Build bare metal kubernetes cluster for hpc on open stack in translational me...
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
 
From swarm to swam-mode in the CERN container service
From swarm to swam-mode in the CERN container serviceFrom swarm to swam-mode in the CERN container service
From swarm to swam-mode in the CERN container service
 
Rally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at ScaleRally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at Scale
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Our Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudOur Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent Cloud
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the Datacenter
 
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
 
Kubernetes vs dockers swarm supporting onap oom on multi-cloud multi-stack en...
Kubernetes vs dockers swarm supporting onap oom on multi-cloud multi-stack en...Kubernetes vs dockers swarm supporting onap oom on multi-cloud multi-stack en...
Kubernetes vs dockers swarm supporting onap oom on multi-cloud multi-stack en...
 
Bug smash day magnum
Bug smash day magnumBug smash day magnum
Bug smash day magnum
 
Bug smash day magnum
Bug smash day magnumBug smash day magnum
Bug smash day magnum
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on Kubernetes
 

Kürzlich hochgeladen

Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 

Kürzlich hochgeladen (20)

Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 

Containers on Baremetal and Preemptible VMs at CERN and SKA

  • 1. Containers on Baremetal And Preemptible Servers At CERN and SKA
  • 2. Belmiro Moreira, CERN @belmoreira John Garbutt, StackHPC @johnthetubaguy
  • 3.
  • 4.
  • 5.
  • 6. CERN - Large Hadron Collider (LHC)
  • 7. CERN: Compact Muon Solenoid (CMS)
  • 8. CERN - Cloud resources status board - 11/05/2018@09:23
  • 9. What is SKA? Image courtesy of CSIRO
  • 10.
  • 12. SKA’s Science Data Processor
  • 15. Why Baremetal? Why Containers? ● Single security zone ● No need for virt, so target baremetal ● 30 seconds to switch ingest to Supernova, Fast radio burst, … ● Easier development and deployment cycles
  • 16. Magnum with Ironic ● Magnum used extensively at CERN ● Docker Swarm and Kubernetes are supported ● Historically a separate driver for baremetal, badly maintained ● Queens moves to using Fedora Atomic for VM and baremetal
  • 18. Lessons learned ● Extra network ports added after initial setup ● Updated Docker version in Fedora Atomic 27 ● Updating Atomic image with RDMA drivers was tricky ● Root disk wasn’t resized by cloud-init http://www.stackhpc.com/magnum-queens.html
  • 20. Resources Utilization ● Public Clouds give the illusion of infinite capacity ○ Users pay for resources that they use ● Private Clouds ○ Resource management usually is based in project quotas ○ Prevent resources being exhausted ○ Prevent “over-committing” resources/quota ○ Manage individual projects requirements ○ Reserve resources for operations with higher priority ○ Scientific Clouds ■ Projects have different funding models ■ They expect a predefined number of resources available ■ But not always these resources are used full time
  • 21. Idle Resources with quotas Project 1 Project 2 Quota utilization Quota utilization
  • 24. Maximize Resource Utilization ● Public Clouds ○ Based on different pricing/SLA considering resource availability ○ Reserved instances vs spot-market ● Private Clouds ○ Quotas are hard limits. Leads to a reduction in resource utilization ○ Preemptible instances ■ Projects that exhausted their quota can continue to create instances ● Opportunistic workloads ● Low SLA
  • 25. Preemptible Instances ● Proposal to implement Preemptible Instances into OpenStack ○ Build a prototype ○ Minimise changes required in OpenStack nova ● Starting simple ○ Use dedicated projects for Preemptible Instances ■ Avoids tagging individual instances ○ Introduce a “Reaper” service ■ Orchestrator to manage preemptible instances ● Removes preemptible instances when resources are required for non preemptible instances ● Applies a maximum TTL to preemptible instances
  • 26. Workflow Instance set into PENDING state Nova notifications consumer Selects preemptible(s) instance(s) to delete Nova “Reaper” service nova-api nova-scheduler “No Valid Host” “No Valid Host” Notification 1) delete selected preemptible(s) 2) rebuild instance Reset instance ERROR state
  • 27. Workflow ● The creation of a non preemptible VM fails because there aren’t available resources ● Instances that fail with “Nova Valid Host”, go to “PENDING” state instead of “ERROR” ● The Reaper service is notified and it tries to free the requested resources ○ Rebuild the instance ○ Or change instance state to “ERROR”
  • 28. Current work in Preemptible Instances ● Add instance state PENDING (spec) ○ https://review.openstack.org/#/c/554212/ ● Allow rebuild instances in cell0 (spec) ○ https://review.openstack.org/#/c/554218/ ● Add scheduling notification ○ https://review.openstack.org/#/c/566470/ ● Implement instance state PENDING ○ https://review.openstack.org/#/c/566473/ ● Reaper prototype: ○ https://gitlab.cern.ch/ttsiouts/ReaperServicePrototype
  • 29. Join the Scientific SIG and... Get involved! https://www.openstack.org/science/
  • 30. Belmiro Moreira, CERN @belmoreira John Garbutt, StackHPC @johnthetubaguy
  • 31. Join the Scientific SIG and... Get involved! https://www.openstack.org/science/