SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
CERN Batch in the HNSciCloud
14/06/2018 HNSciCloud & Batch 2
Ben Jones IT-CM
CERN Batch Service
• 230k cores of compute
provided via HTCondor (or
LSF) for LHC Grid and local
users
• Large increase in capacity
over last couple of years to
support Run 2
• Batch service has been able
to run on public cloud and
other opportunistic resources
– in particular grid workflows
14/06/2018 HNSciCloud & Batch 3
HNSciCloud & Batch 4
HTCondor Utilization
Grid vs Local
Grid Local
Authentication X509 Proxy Kerberos
Submitters LHC experiments, COMPASS,
NA62, ILC, DUNE…
Local users of experiments,
Beams, Theorists, AMS,
ATLAS Tier-0
Submission method Submission frameworks:
GlideinWMS, Dirac, PanDA,
AliEn
From condor_submit by hand,
to complicated DAGs, to Tier-0
submit frameworks.
Storage Grid protocols. SRM,
XRootD…
AFS, EOS
HNSciCloud & Batch14/06/2018 5
XBatch Cloud experience
14/06/2018 HNSciCloud & Batch 6
2016 2017 2018
“XBatch” integrate cloud
resources into standard pool
with standard tools
Use Docker to abstract
external cloud platform
Batch on Cloud resources
• Grid jobs better candidate to run in cloud as already
designed to be location agnostic, with sophisticated job
management & monitoring
• Use same provisioning & orchestration for public cloud
and local cloud, where possible
• We generally have flat capacity & more jobs than
resources
• The machine / container running job the job lives longer
than the job
• Limited infrastructure required in cloud (proxies…)
14/06/2018 HNSciCloud & Batch 7
xBatch Stack
14/06/2018 HNSciCloud & Batch 8
Terraform
Cloud-init / Provisioning Scripts
Puppet / Foreman
HTCondor startd / Docker Universe
Configuration
Personalization
Provisioning
Application
Job of each layer is just to bootstrap the next
xBatch Stack
14/06/2018 HNSciCloud & Batch 9
Terraform
Cloud-init / Provisioning Scripts
Puppet / Foreman
HTCondor startd / Docker Universe
Configuration
Personalization
Provisioning
Application
Job of each layer is just to bootstrap the next
• Terraform selected as
industry standard tool to
abstract APIs
• Support out of the box for
OpenStack (ie T-Systems,
CERN) or CloudStack
(Exoscale)
• Issues if used to expand /
shrink regularly, but ideal
for our purposes
xBatch Stack
14/06/2018 HNSciCloud & Batch 10
Terraform
Cloud-init / Provisioning Scripts
Puppet / Foreman
HTCondor startd / Docker Universe
Configuration
Personalization
Provisioning
Application
Job of each layer is just to bootstrap the next
• Cloud-init personalizes
machine
• Install and configure
puppet client
• Use one-shot time limited
secret, unique to each
machine, to sign off x509
certificate needed for
puppet & condor
• Terraform post-exec where
cloud-init support lacking
xBatch Stack
14/06/2018 HNSciCloud & Batch 11
Terraform
Cloud-init / Provisioning Scripts
Puppet / Foreman
HTCondor startd / Docker Universe
Configuration
Personalization
Provisioning
Application
Job of each layer is just to bootstrap the next
• As with internal machines,
foreman used for
classification & inventory,
puppet used for
configuration
• Some differences with
internal machines, but re-
use components &
configuration where
possible
xBatch Stack
14/06/2018 HNSciCloud & Batch 12
Terraform
Cloud-init / Provisioning Scripts
Puppet / Foreman
HTCondor startd / Docker Universe
Configuration
Personalization
Provisioning
Application
Job of each layer is just to bootstrap the next
• HTCondor with Docker
”universe” to abstract
cloud machine from wlcg
worker node environment
• Host provides CVMFS and
HTCondor but can use
Cloud-provided CentOS
images
• HTCondor can be
configured to work across
firewalls & NATs
HTCondor Communication
14/06/2018 HNSciCloud & Batch 13
Communication via firewall
14/06/2018 HNSciCloud & Batch 14
CERN HTCondor to HNSci Cloud
• HTCondor works with symmetric
matching of Host Properties with
Job Requirements
• We route jobs explicitly asking for
cloud resources (ie
“WantHNSciRHEA”
“WantHNSciTSys”) to machines
in those clouds
• For experiments, they can be
monitored as specific sites. For
HTCondor, they are separate
routes
14/06/2018 15HNSciCloud & Batch
T-Systems Challenges
• T-Systems primarily RFC1918 addresses, requiring NAT.
• No issue for HTCondor, but additional issue for managing the network
resource
• Initial deployment used self-managed SNAT server
• Single point of failure, that in fact failed during Availability Zone outage
• Migrated to new T-Systems managed SNAT service
• Different flavours of service by # of connections difficult to provision for
• Both solutions require manual intervention from T-Systems to set ports
to 10Gb
• Actual bandwidth has varied under testing
• Still unexplained network issues leading to expired jobs
14/06/2018 HNSciCloud & Batch 16
RHEA Challenges
• Prefer consistent use of provisioning via terraform against
Cloud APIs rather than intermediate PaaS
• Necessary to re-provision resources after contract changes
• Exoscale flavours provide more RAM than strictly required
• 8 core/32GB RAM (4GB/Core), WLCG standard is 2GB/core
• Terraform CloudStack provider sets disk in GiB, but gives status
in KiB
• Leads to circular provisioning for any change
• Required to “ignore” Disk in terraform resource definition
• We learn that we should just migrate to terraform-exoscale :)
14/06/2018 HNSciCloud & Batch 17
Positives
• TSystems API speed now much improved
from previous engagement
• No bottlenecks for terraform provisioning
• Exoscale performance very good, and useful
features like online resizing of instances
14/06/2018 HNSciCloud & Batch 18
WLCG Cloud Consolidation
14/06/2018 HNSciCloud & Batch 19
• 8 of the 10 members of the Buyers Group have
WLCG workload
• Agreement to consolidate to a shared WLCG
tenant
• Reduce effort at procurer sites to support WLCG
workload in Commercial Cloud
• Reduce network traffic between each procurer
site and commercial clouds to support WLCG
workload
• Simplification for LHC experiments to exploit
commercial cloud resources
Unconsolidated
• In the current setup,
an experiment could
have to define 8
additional sites to
send jobs to the
same resources
14/06/2018 HNSciCloud & Batch 20
Shared tenant
• With a shared
tenant, and a single
entry point (CE),
only one site has to
be set up, per cloud
vendor
14/06/2018 HNSciCloud & Batch 21
Vendor batch service
• If Cloud vendors
provide metered
batch services (ie
HTCondor) then
there’s the
possibility of further
simplifying
14/06/2018 HNSciCloud & Batch 22
Multicloud
• If we have to
manage multiple
clouds, having entry
points / routes
onsite may remain
the best solution.
14/06/2018 HNSciCloud & Batch 23
Questions?
HNSciCloud & Batch 2414/06/2018

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

[WSO2Con Asia 2018] Deploying Applications in K8S and Docker
[WSO2Con Asia 2018] Deploying Applications in K8S and Docker[WSO2Con Asia 2018] Deploying Applications in K8S and Docker
[WSO2Con Asia 2018] Deploying Applications in K8S and Docker
 
Taking Your Database Beyond the Border of a Single Kubernetes Cluster
Taking Your Database Beyond the Border of a Single Kubernetes ClusterTaking Your Database Beyond the Border of a Single Kubernetes Cluster
Taking Your Database Beyond the Border of a Single Kubernetes Cluster
 
OpenNebulaConf2015 1.06 Fermilab Virtual Facility: Data-Intensive Computing i...
OpenNebulaConf2015 1.06 Fermilab Virtual Facility: Data-Intensive Computing i...OpenNebulaConf2015 1.06 Fermilab Virtual Facility: Data-Intensive Computing i...
OpenNebulaConf2015 1.06 Fermilab Virtual Facility: Data-Intensive Computing i...
 
KubeCon NA, Seattle, 2016: Performance and Scalability Tuning Kubernetes for...
KubeCon NA, Seattle, 2016:  Performance and Scalability Tuning Kubernetes for...KubeCon NA, Seattle, 2016:  Performance and Scalability Tuning Kubernetes for...
KubeCon NA, Seattle, 2016: Performance and Scalability Tuning Kubernetes for...
 
Kubernetes on Bare Metal at the Kitchener-Waterloo Kubernetes and Cloud Nativ...
Kubernetes on Bare Metal at the Kitchener-Waterloo Kubernetes and Cloud Nativ...Kubernetes on Bare Metal at the Kitchener-Waterloo Kubernetes and Cloud Nativ...
Kubernetes on Bare Metal at the Kitchener-Waterloo Kubernetes and Cloud Nativ...
 
Erasure Code at Scale - Thomas William Byrne
Erasure Code at Scale - Thomas William ByrneErasure Code at Scale - Thomas William Byrne
Erasure Code at Scale - Thomas William Byrne
 
State of Java Elasticity. Tuning Java Efficiency - GIDS.JAVA LIVE 2020
State of Java Elasticity. Tuning Java Efficiency - GIDS.JAVA LIVE 2020State of Java Elasticity. Tuning Java Efficiency - GIDS.JAVA LIVE 2020
State of Java Elasticity. Tuning Java Efficiency - GIDS.JAVA LIVE 2020
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der Ster
 
Ceph and OpenStack - Feb 2014
Ceph and OpenStack - Feb 2014Ceph and OpenStack - Feb 2014
Ceph and OpenStack - Feb 2014
 
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red HatHyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
 
Open ebs 101
Open ebs 101Open ebs 101
Open ebs 101
 
Ceph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion ObjectsCeph scale testing with 10 Billion Objects
Ceph scale testing with 10 Billion Objects
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
XCP-ng - past, present and future
XCP-ng - past, present and futureXCP-ng - past, present and future
XCP-ng - past, present and future
 
Ceph used in Cancer Research at OICR
Ceph used in Cancer Research at OICRCeph used in Cancer Research at OICR
Ceph used in Cancer Research at OICR
 
Adventures in Research
Adventures in ResearchAdventures in Research
Adventures in Research
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah Watkins
 
Red Hat Summit 2017: Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Red Hat Summit 2017:  Wicked Fast PaaS: Performance Tuning of OpenShift and D...Red Hat Summit 2017:  Wicked Fast PaaS: Performance Tuning of OpenShift and D...
Red Hat Summit 2017: Wicked Fast PaaS: Performance Tuning of OpenShift and D...
 
Red Hat Storage Day New York - What's New in Red Hat Ceph Storage
Red Hat Storage Day New York - What's New in Red Hat Ceph StorageRed Hat Storage Day New York - What's New in Red Hat Ceph Storage
Red Hat Storage Day New York - What's New in Red Hat Ceph Storage
 
OpenEBS Technical Workshop - KubeCon San Diego 2019
OpenEBS Technical Workshop - KubeCon San Diego 2019OpenEBS Technical Workshop - KubeCon San Diego 2019
OpenEBS Technical Workshop - KubeCon San Diego 2019
 

Ähnlich wie CERN Batch in the HNSciCloud

40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
inside-BigData.com
 

Ähnlich wie CERN Batch in the HNSciCloud (20)

Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
Finding New Sub-Atomic Particles on the AWS Cloud (BDT402) | AWS re:Invent 2013
 
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ...
MayaData  Datastax webinar - Operating Cassandra on Kubernetes with the help ...MayaData  Datastax webinar - Operating Cassandra on Kubernetes with the help ...
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ...
 
Introducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerIntroducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 Supercomputer
 
Running Legacy Applications with Containers
Running Legacy Applications with ContainersRunning Legacy Applications with Containers
Running Legacy Applications with Containers
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
2014 11-05 hpcac-kniep_christian_dockermpi
2014 11-05 hpcac-kniep_christian_dockermpi2014 11-05 hpcac-kniep_christian_dockermpi
2014 11-05 hpcac-kniep_christian_dockermpi
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttu
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3
 
Bridge to the Future: Migrating to KRaft
Bridge to the Future: Migrating to KRaftBridge to the Future: Migrating to KRaft
Bridge to the Future: Migrating to KRaft
 
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred Hutch
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
 
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
 
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
 
[WSO2Con EU 2018] Deploying Applications in K8S and Docker
[WSO2Con EU 2018] Deploying Applications in K8S and Docker[WSO2Con EU 2018] Deploying Applications in K8S and Docker
[WSO2Con EU 2018] Deploying Applications in K8S and Docker
 
Boyan Krosnov - Building a software-defined cloud - our experience
Boyan Krosnov - Building a software-defined cloud - our experienceBoyan Krosnov - Building a software-defined cloud - our experience
Boyan Krosnov - Building a software-defined cloud - our experience
 
How to Integrate Hyperconverged Systems with Existing SANs
How to Integrate Hyperconverged Systems with Existing SANsHow to Integrate Hyperconverged Systems with Existing SANs
How to Integrate Hyperconverged Systems with Existing SANs
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 
Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...
Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...
Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising perf...
 
”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 

Mehr von Helix Nebula The Science Cloud

Mehr von Helix Nebula The Science Cloud (20)

M-PIL-3.2 Public Session
M-PIL-3.2 Public SessionM-PIL-3.2 Public Session
M-PIL-3.2 Public Session
 
Deep Learning for Fast Simulation
Deep Learning for Fast SimulationDeep Learning for Fast Simulation
Deep Learning for Fast Simulation
 
Interactive Data Analysis for End Users on HN Science Cloud
Interactive Data Analysis for End Users on HN Science CloudInteractive Data Analysis for End Users on HN Science Cloud
Interactive Data Analysis for End Users on HN Science Cloud
 
Container Federation Use Cases
Container Federation Use CasesContainer Federation Use Cases
Container Federation Use Cases
 
LHCb on RHEA and T-Systems
LHCb on RHEA and T-SystemsLHCb on RHEA and T-Systems
LHCb on RHEA and T-Systems
 
HNSciCloud CMS status-report
HNSciCloud CMS status-reportHNSciCloud CMS status-report
HNSciCloud CMS status-report
 
Helix Nebula Science Cloud usage by ALICE
Helix Nebula Science Cloud usage by ALICEHelix Nebula Science Cloud usage by ALICE
Helix Nebula Science Cloud usage by ALICE
 
Hybrid cloud for science
Hybrid cloud for scienceHybrid cloud for science
Hybrid cloud for science
 
HNSciCloud PILOT PLATFORM OVERVIEW
HNSciCloud PILOT PLATFORM OVERVIEWHNSciCloud PILOT PLATFORM OVERVIEW
HNSciCloud PILOT PLATFORM OVERVIEW
 
HNSciCloud Overview
HNSciCloud Overview HNSciCloud Overview
HNSciCloud Overview
 
This Helix Nebula Science Cloud Pilot Phase Open Session
This Helix Nebula Science Cloud Pilot Phase Open SessionThis Helix Nebula Science Cloud Pilot Phase Open Session
This Helix Nebula Science Cloud Pilot Phase Open Session
 
Cloud Services for Education - HNSciCloud applied to the UP2U project
Cloud Services for Education - HNSciCloud applied to the UP2U projectCloud Services for Education - HNSciCloud applied to the UP2U project
Cloud Services for Education - HNSciCloud applied to the UP2U project
 
Network experiences with Public Cloud Services @ TNC2017
Network experiences with Public Cloud Services @ TNC2017Network experiences with Public Cloud Services @ TNC2017
Network experiences with Public Cloud Services @ TNC2017
 
EOSC in practice - Silvana Muscella (chair EOSC HLEG)
EOSC in practice - Silvana Muscella (chair EOSC HLEG)EOSC in practice - Silvana Muscella (chair EOSC HLEG)
EOSC in practice - Silvana Muscella (chair EOSC HLEG)
 
Helix Nebula Science Cloud Pilot Phase, 6 February 2018, Bologna, Italy
Helix Nebula Science Cloud Pilot Phase, 6 February 2018, Bologna, ItalyHelix Nebula Science Cloud Pilot Phase, 6 February 2018, Bologna, Italy
Helix Nebula Science Cloud Pilot Phase, 6 February 2018, Bologna, Italy
 
Pilot phase Award Ceremony - INFN Introduction and welcome
Pilot phase Award Ceremony - INFN Introduction and welcomePilot phase Award Ceremony - INFN Introduction and welcome
Pilot phase Award Ceremony - INFN Introduction and welcome
 
Early adopter group and closing of webinar - João Fernandes (CERN)
Early adopter group and closing of webinar - João Fernandes (CERN)Early adopter group and closing of webinar - João Fernandes (CERN)
Early adopter group and closing of webinar - João Fernandes (CERN)
 
HNSciCloud pilot phase - Andrea Chierici (INFN)
HNSciCloud pilot phase - Andrea Chierici (INFN)HNSciCloud pilot phase - Andrea Chierici (INFN)
HNSciCloud pilot phase - Andrea Chierici (INFN)
 
Pilot phase Award Ceremony - T-Systems
Pilot phase Award Ceremony - T-SystemsPilot phase Award Ceremony - T-Systems
Pilot phase Award Ceremony - T-Systems
 
Pilot phase Award Ceremony - RHEA
Pilot phase Award Ceremony - RHEAPilot phase Award Ceremony - RHEA
Pilot phase Award Ceremony - RHEA
 

Kürzlich hochgeladen

%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 

Kürzlich hochgeladen (20)

%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 

CERN Batch in the HNSciCloud

  • 1.
  • 2. CERN Batch in the HNSciCloud 14/06/2018 HNSciCloud & Batch 2 Ben Jones IT-CM
  • 3. CERN Batch Service • 230k cores of compute provided via HTCondor (or LSF) for LHC Grid and local users • Large increase in capacity over last couple of years to support Run 2 • Batch service has been able to run on public cloud and other opportunistic resources – in particular grid workflows 14/06/2018 HNSciCloud & Batch 3
  • 4. HNSciCloud & Batch 4 HTCondor Utilization
  • 5. Grid vs Local Grid Local Authentication X509 Proxy Kerberos Submitters LHC experiments, COMPASS, NA62, ILC, DUNE… Local users of experiments, Beams, Theorists, AMS, ATLAS Tier-0 Submission method Submission frameworks: GlideinWMS, Dirac, PanDA, AliEn From condor_submit by hand, to complicated DAGs, to Tier-0 submit frameworks. Storage Grid protocols. SRM, XRootD… AFS, EOS HNSciCloud & Batch14/06/2018 5
  • 6. XBatch Cloud experience 14/06/2018 HNSciCloud & Batch 6 2016 2017 2018 “XBatch” integrate cloud resources into standard pool with standard tools Use Docker to abstract external cloud platform
  • 7. Batch on Cloud resources • Grid jobs better candidate to run in cloud as already designed to be location agnostic, with sophisticated job management & monitoring • Use same provisioning & orchestration for public cloud and local cloud, where possible • We generally have flat capacity & more jobs than resources • The machine / container running job the job lives longer than the job • Limited infrastructure required in cloud (proxies…) 14/06/2018 HNSciCloud & Batch 7
  • 8. xBatch Stack 14/06/2018 HNSciCloud & Batch 8 Terraform Cloud-init / Provisioning Scripts Puppet / Foreman HTCondor startd / Docker Universe Configuration Personalization Provisioning Application Job of each layer is just to bootstrap the next
  • 9. xBatch Stack 14/06/2018 HNSciCloud & Batch 9 Terraform Cloud-init / Provisioning Scripts Puppet / Foreman HTCondor startd / Docker Universe Configuration Personalization Provisioning Application Job of each layer is just to bootstrap the next • Terraform selected as industry standard tool to abstract APIs • Support out of the box for OpenStack (ie T-Systems, CERN) or CloudStack (Exoscale) • Issues if used to expand / shrink regularly, but ideal for our purposes
  • 10. xBatch Stack 14/06/2018 HNSciCloud & Batch 10 Terraform Cloud-init / Provisioning Scripts Puppet / Foreman HTCondor startd / Docker Universe Configuration Personalization Provisioning Application Job of each layer is just to bootstrap the next • Cloud-init personalizes machine • Install and configure puppet client • Use one-shot time limited secret, unique to each machine, to sign off x509 certificate needed for puppet & condor • Terraform post-exec where cloud-init support lacking
  • 11. xBatch Stack 14/06/2018 HNSciCloud & Batch 11 Terraform Cloud-init / Provisioning Scripts Puppet / Foreman HTCondor startd / Docker Universe Configuration Personalization Provisioning Application Job of each layer is just to bootstrap the next • As with internal machines, foreman used for classification & inventory, puppet used for configuration • Some differences with internal machines, but re- use components & configuration where possible
  • 12. xBatch Stack 14/06/2018 HNSciCloud & Batch 12 Terraform Cloud-init / Provisioning Scripts Puppet / Foreman HTCondor startd / Docker Universe Configuration Personalization Provisioning Application Job of each layer is just to bootstrap the next • HTCondor with Docker ”universe” to abstract cloud machine from wlcg worker node environment • Host provides CVMFS and HTCondor but can use Cloud-provided CentOS images • HTCondor can be configured to work across firewalls & NATs
  • 14. Communication via firewall 14/06/2018 HNSciCloud & Batch 14
  • 15. CERN HTCondor to HNSci Cloud • HTCondor works with symmetric matching of Host Properties with Job Requirements • We route jobs explicitly asking for cloud resources (ie “WantHNSciRHEA” “WantHNSciTSys”) to machines in those clouds • For experiments, they can be monitored as specific sites. For HTCondor, they are separate routes 14/06/2018 15HNSciCloud & Batch
  • 16. T-Systems Challenges • T-Systems primarily RFC1918 addresses, requiring NAT. • No issue for HTCondor, but additional issue for managing the network resource • Initial deployment used self-managed SNAT server • Single point of failure, that in fact failed during Availability Zone outage • Migrated to new T-Systems managed SNAT service • Different flavours of service by # of connections difficult to provision for • Both solutions require manual intervention from T-Systems to set ports to 10Gb • Actual bandwidth has varied under testing • Still unexplained network issues leading to expired jobs 14/06/2018 HNSciCloud & Batch 16
  • 17. RHEA Challenges • Prefer consistent use of provisioning via terraform against Cloud APIs rather than intermediate PaaS • Necessary to re-provision resources after contract changes • Exoscale flavours provide more RAM than strictly required • 8 core/32GB RAM (4GB/Core), WLCG standard is 2GB/core • Terraform CloudStack provider sets disk in GiB, but gives status in KiB • Leads to circular provisioning for any change • Required to “ignore” Disk in terraform resource definition • We learn that we should just migrate to terraform-exoscale :) 14/06/2018 HNSciCloud & Batch 17
  • 18. Positives • TSystems API speed now much improved from previous engagement • No bottlenecks for terraform provisioning • Exoscale performance very good, and useful features like online resizing of instances 14/06/2018 HNSciCloud & Batch 18
  • 19. WLCG Cloud Consolidation 14/06/2018 HNSciCloud & Batch 19 • 8 of the 10 members of the Buyers Group have WLCG workload • Agreement to consolidate to a shared WLCG tenant • Reduce effort at procurer sites to support WLCG workload in Commercial Cloud • Reduce network traffic between each procurer site and commercial clouds to support WLCG workload • Simplification for LHC experiments to exploit commercial cloud resources
  • 20. Unconsolidated • In the current setup, an experiment could have to define 8 additional sites to send jobs to the same resources 14/06/2018 HNSciCloud & Batch 20
  • 21. Shared tenant • With a shared tenant, and a single entry point (CE), only one site has to be set up, per cloud vendor 14/06/2018 HNSciCloud & Batch 21
  • 22. Vendor batch service • If Cloud vendors provide metered batch services (ie HTCondor) then there’s the possibility of further simplifying 14/06/2018 HNSciCloud & Batch 22
  • 23. Multicloud • If we have to manage multiple clouds, having entry points / routes onsite may remain the best solution. 14/06/2018 HNSciCloud & Batch 23