Clouds at CERN : A 5 year perspective
Utility and Cloud Computing Conference, December 19, 2018
Tim Bell
@noggin143
UCC 2018 2
About Tim
• Responsible for Compute and Monitoring in the CERN IT department
• Elected member of the OpenStack Foundation management board
• Member of the OpenStack user committee from 2013-2015
UCC 2018 3
UCC 2018 4
CERN: a worldwide collaboration
CERN’s primary mission: SCIENCE
Fundamental research on particle physics, pushing the boundaries of knowledge and technology
CERN: World’s largest particle physics laboratory
UCC 2018 5
Image credit: CERN
UCC 2018 6
The Large Hadron Collider (LHC): a 27 km ring of 1232 dipole magnets, each 15 metres long and weighing 35 t
Image credit: CERN
Image credit: CERN
LHC: World’s Largest Cryogenic System (1.9 K), with colder temperatures than outer space (120 t of helium)
UCC 2018 7
Vacuum? Yes.
LHC: Highest Vacuum, with 104 km of pipes at 10^-11 bar (comparable to the moon)
UCC 2018 8
Image credit: CERN
UCC 2018 9
ATLAS, CMS, ALICE and LHCb: detectors heavier than the Eiffel Tower
Image credit: CERN
UCC 2018 10
40 million pictures per second (1 PB/s)
Image credit: CERN
About the CERN IT Department
UCC 2018 11
Enable the laboratory to fulfill its mission
- Main data centre on Meyrin site
- Wigner data centre in Budapest (since 2013)
- Connected via three dedicated 100 Gb/s links
- Where possible, resources at both sites
(plus disaster recovery)
Drone footage of the CERN CC
Status: Service Level Overview
UCC 2018
12
Outline
UCC 2018
13
• Fabric Management before 2012
• The Agile Infrastructure (AI) Project
• The three AI areas
- Configuration Management
- Monitoring
- Resource provisioning
• Review
CERN IT Tools up to 2011 (1)
UCC 2018
14
• Developed in a series of EU-funded projects
- 2001-2004: European DataGrid
- 2004-2010: EGEE
• Work package 4 – Fabric management:
“Deliver a computing fabric comprised of all the necessary tools to
manage a centre providing grid services on clusters of thousands of
nodes.”
CERN IT Tools up to 2011 (2)
UCC 2018
15
• The WP4 software was developed from scratch
- Scale and experience needed for LHC Computing was special
- Config’ mgmt, monitoring, secret store, service status, state mgmt, service databases, …
LEMON – LHC Era Monitoring
- client/server based monitoring
- local agent with sensors
- samples stored in a cache & sent to server
- UDP or TCP, w/ or w/o encryption
- support for remote entities
- system administration toolkit
- automated installation, configuration &
management of clusters
- clients interact with a configuration database (CMDB) and an installation infrastructure (AII)
Around 8’000 servers managed!
2012: A Turning Point for CERN IT
UCC 2018
16
• EU projects finished in 2010: decreasing development and support
• LHC compute and data requirements increasing
- Moore’s law would help, but not enough
• Staff would not grow with managed resources
- Standardization & automation needed; current tools not suitable
• Other deployments had surpassed the CERN one
- Mostly commercial companies like Google, Facebook, Rackspace, Amazon, Yahoo!, …
- We were no longer special! Can we profit?
[Chart: projected compute needs for the Grid, ATLAS, CMS, LHCb and ALICE across Run 1 to Run 4, compared with what we can afford]
LS1 (2013) ahead, next window for change would only open in 2019 …
UCC 2018
17
How we began …
• Formed a small team of service managers from …
- Large services (e.g. batch, plus)
- Existing fabric services (e.g. monitoring)
- Existing virtualization service
• ... to define project goals
- What issues do we need to address?
- What forward looking features do we need?
http://iopscience.iop.org/article/10.1088/1742-6596/396/4/042002/pdf
Agile Infrastructure Project Goals
UCC 2018
18
New data centre support
- Overcome limits of CC in Meyrin
- Disaster recovery and business continuity
- ‘Smart hands’ approach
1
Agile Infrastructure Project Goals
UCC 2018
19
Sustainable tool support
- Tools to be used at our scale need maintenance
- Tools with a limited community require more time for newcomers to become productive and provide less value afterwards (transferable skills)
2
Agile Infrastructure Project Goals
UCC 2018
20
Improve user response time
- Reduce the resource provisioning time span
(current virtualization service reached scaling limits)
- Self-service kiosk
3
Agile Infrastructure Project Goals
UCC 2018
21
Enable cloud interfaces
- Experiments already started to use EC2
- Enable libraries such as Apache’s libcloud
4
Agile Infrastructure Project Goals
UCC 2018
22
Precise monitoring and
accounting
- Enable timely monitoring for debugging
- Showback usage to the cloud users
- Consolidate accounting data for usage of CPU, network,
storage … across batch, physical nodes and grid
resources
5
Agile Infrastructure Project Goals
UCC 2018
23
Improve resource
efficiency
- Adapt provisioned resources to services’ needs
- Streamline the provisioning workflows
(e.g. burn-in, repair or retirement)
6
Our Approach: Tool Chain and DevOps
UCC 2018
24
• CERN’s requirements are no longer special!
• A set of tools emerged when looking at other places
• Small dedicated tools
allowed for rapid validation &
prototyping
• Adapted our processes,
policies and work flows
to the tools!
• Join (and contribute to)
existing communities!
IT Policy Changes for Services
UCC 2018
25
• Services shall be virtual …
- Within reason
- Exceptions are costly!
• Puppet managed, and …
• … monitored!
- (Semi-)automatic with Puppet
Decrease provisioning time
Increase resource efficiency
Simplify infrastructure mgmt
Profit from others’ work
Speed up deployment
‘Automatic’ documentation
Centralized monitoring
Integrated alarm handling
UCC 2018
26
Tools + Policies:
Sounds simple!
From tools to services is complex!
- Integration w/ sec services?
- Incident handling?
- Request work flows?
- Change management?
- Accounting and charging?
- Life cycle management?
- …
Image: Subbu Allamaraju
Public Procurement Timelines
UCC 2018 27
Resource Provisioning: IaaS
UCC 2018
28
• Based on OpenStack
- Collection of open source projects for cloud orchestration
- Started by NASA and Rackspace in 2010
- Grown into a global software community
Early Prototypes
UCC 2018 29
The CERN Cloud Service
UCC 2018
30
• Production since July 2013
- Several rolling upgrades since,
now on Rocky
- Many sub services deployed
• Spans two data centers
- One region, one API entry point
• Deployed using RDO + Puppet
- Mostly upstream, patched where needed
• Many sub services run on VMs!
- Bootstrapping
UCC 2018
31
Agility in the Cloud
UCC 2018
32
• Use case spectrum
- Batch service (physics analysis)
- IT services (built on each other)
- Experiment services (build)
- Engineering (chip design)
- Infrastructure (hotel, bikes)
- Personal (development)
• Hardware spectrum
- Processor archs (features, NUMA, …)
- Core-to-RAM ratio (1:2, 1:3, 1:5, …)
- Core-to-disk ratio (2x or 4x SSDs)
- Disk layout (2, 3, 4, mixed)
- Network (1/10GbE, FC, domain)
- Location (DC, power)
- SLC6, CC7, RHEL, Windows
- …
What about our initial goals?
UCC 2018
33
• The remote DC is seamlessly
integrated
- No difference from provisioning PoV
- Easily accessible by users
- Local DC limits overcome (business continuity?)
• Sustainable tools
- Number of managed machines has multiplied
- Good collaboration with upstream communities
- Newcomers know tools, can use knowledge
afterwards
• Provisioning time span is ~minutes
- Was several months before
- Self-service kiosk with automated workflows
• Cloud interfaces
- Good OpenStack adoption, EC2 support
• Flexible monitoring infra
- Automatic for simple cases
- Powerful tool set for more complex ones
- Accounting for local and grid resources
• Increased resource efficiency
- ‘Packing’ of services
- Overcommit
- Adapted to services’ needs
- Quick draining & back filling
So … 100% success?
Cloud Architecture Overview
UCC 2018
34
• Top and child cells for scaling
- API, DB, MQ, Compute nodes
- Remote DC is set of cells
• Nova HA only on top cell
- Simplicity vs impact
• Other projects global
- Load balanced controllers
- RabbitMQ clusters
• Three Ceph instances
- Volumes (Cinder), images (Glance), shares (Manila)
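As an illustration of how the Ceph-backed Cinder volumes above are consumed from Python, here is a minimal sketch using openstacksdk; the cloud name, server name and volume size are illustrative assumptions rather than CERN's actual configuration.
```python
# Minimal sketch: create a Cinder volume and attach it to an existing VM.
# Assumptions: a clouds.yaml entry named "cern", an existing server "myvm".
import openstack

conn = openstack.connect(cloud='cern')          # one region, one API entry point

server = conn.compute.find_server('myvm')       # look up an existing instance
volume = conn.block_storage.create_volume(name='data01', size=100)  # size in GB
conn.block_storage.wait_for_status(volume, status='available')

# Attach the new volume to the server (shows up as an extra block device).
conn.compute.create_volume_attachment(server, volume_id=volume.id)
```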
UCC 2018 35
HL-LHC SKA
Tech. Challenge: Scaling
• OpenStack Cells provide composable units
• Cells V1 – Special custom developments
• Cells V2 – Now the standard deployment model
• Broadcast vs Targeted queries
• Handling down cells
• Quota
• Academic and scientific instances push the
limits
• Now many enterprise clouds above 1000
hypervisors
• CERN running 73 Cells in production
UCC 2018 36
https://www.openstack.org/analytics
Tech. Challenge: CPU Performance
UCC 2018
37
• Benchmark results on full-node VMs were about 20% lower than on the underlying host
- Smaller VMs much better
• Investigated various tuning options
- KSM*, EPT**, PAE, Pinning, … +hardware type dependencies
- Discrepancy down to ~10% between virtual and physical
• Comparison with Hyper-V: no general issue
- Loss w/o tuning ~3% (full-node), <1% for small VMs
- … NUMA-awareness!
*KSM on/off: beware of memory reclaim! **EPT on/off: beware of expensive page table walks!
CPU Performance: NUMA
UCC 2018
38
• NUMA-awareness identified as most
efficient setting
• “EPT-off” side-effect
- Small number of hosts, but very
visible there
• Use 2MB Huge Pages
- Keep the “EPT off” performance gain
with “EPT on”
NUMA roll-out
UCC 2018
39
• Rolled out on ~2’000 batch hypervisors (~6’000 VMs)
- Huge page (HP) allocation as boot parameter → reboot
- VM NUMA awareness as flavor metadata → delete/recreate (see the flavor sketch after this slide)
• Cell-by-cell (~200 hosts):
- Queue-reshuffle to minimize resource impact
- Draining & deletion of batch VMs
- Hypervisor reconfiguration (Puppet) & reboot
- Recreation of batch VMs
• Whole update took about 8 weeks
- Organized between batch and cloud teams
- No performance issue observed since
Performance overhead by VM size, before / after the NUMA roll-out:
VM       Before   After
4x 8      8%
2x 16    16%
1x 24    20%      5%
1x 32    20%      3%
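The roll-out exposed NUMA topology and huge pages through Nova flavor extra specs; the sketch below shows how such a flavor could be tagged with python-novaclient. The flavor name and the exact values are illustrative assumptions, not the ones used at CERN.
```python
# Minimal sketch: tag a flavor so Nova builds NUMA-aware guests backed by 2MB huge pages.
# Assumptions: credentials in the environment, a flavor called "m2.xlarge" (illustrative).
import os
from keystoneauth1 import session
from keystoneauth1.identity import v3
from novaclient import client

auth = v3.Password(auth_url=os.environ['OS_AUTH_URL'],
                   username=os.environ['OS_USERNAME'],
                   password=os.environ['OS_PASSWORD'],
                   project_name=os.environ['OS_PROJECT_NAME'],
                   user_domain_name='Default',
                   project_domain_name='Default')
nova = client.Client('2.1', session=session.Session(auth=auth))

flavor = nova.flavors.find(name='m2.xlarge')
flavor.set_keys({
    'hw:numa_nodes': '2',          # expose two NUMA nodes to the guest
    'hw:mem_page_size': '2048',    # back guest RAM with 2MB huge pages
})
```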
Tech. Challenge: Underused Resources
UCC 2018 40
VM Expiry
UCC 2018 41
• Each personal instance will have an expiration date
• Set shortly after creation and evaluated daily
• Configured to 180 days, renewable
• Reminder mails starting 30 days before expiration
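The daily evaluation is conceptually simple; a minimal sketch is shown below. It assumes the expiration date is kept as instance metadata, and the key name, mail helper and lifetimes are illustrative assumptions rather than CERN's actual implementation.
```python
# Minimal sketch: daily pass over personal instances, warning and expiring as needed.
# Assumptions: expiry stored in server metadata under "expire_at" (ISO date),
# a notify() helper standing in for the reminder mail, 180-day lifetime, 30-day warning.
import datetime
import openstack

LIFETIME = datetime.timedelta(days=180)
WARNING = datetime.timedelta(days=30)

def notify(owner, server, expire_at):
    print(f"Reminder to {owner}: {server.name} expires on {expire_at.date()}")

conn = openstack.connect(cloud='cern')
today = datetime.datetime.utcnow()

for server in conn.compute.servers(details=True):
    expire_str = (server.metadata or {}).get('expire_at')
    if not expire_str:
        # Newly created instance: set the expiration date shortly after creation.
        conn.compute.set_server_metadata(
            server, expire_at=(today + LIFETIME).isoformat())
        continue
    expire_at = datetime.datetime.fromisoformat(expire_str)
    if expire_at <= today:
        conn.compute.delete_server(server)              # expired, reclaim resources
    elif expire_at - today <= WARNING:
        notify(server.user_id, server, expire_at)       # reminders in the last 30 days
```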
Expiry results
UCC 2018 42
• Results exceeded
expectations
• Expired
• >1000 VMs
• >3000 cores
Tech. Challenge: Bare Metal
UCC 2018 43
• VMs not suitable for all of our use cases
- Storage and database nodes, HPC clusters, boot strapping,
critical network equipment or specialised network setups,
precise/repeatable benchmarking for s/w frameworks, …
• Complete our service offerings
- Physical nodes (in addition to VMs and containers)
- OpenStack UI as the single pane of glass
• Simplify hardware provisioning workflows
- For users: openstack server create/delete
- For procurement & h/w provisioning team: initial on-boarding, server re-assignments
• Consolidate accounting & bookkeeping
- Resource accounting input will come from fewer sources
- Machine re-assignments will be easier to track
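With Ironic behind the same Nova API, provisioning a physical node looks the same to the user as provisioning a VM; below is a sketch of the `openstack server create` equivalent with openstacksdk. The image, flavor and network names are illustrative assumptions.
```python
# Minimal sketch: the same "openstack server create/delete" workflow, but against
# a bare-metal flavor. All names are illustrative assumptions.
import openstack

conn = openstack.connect(cloud='cern')

server = conn.compute.create_server(
    name='physical-worker-01',
    image_id=conn.compute.find_image('CC7 Base').id,
    flavor_id=conn.compute.find_flavor('p1.baremetal').id,   # maps to an Ironic node class
    networks=[{'uuid': conn.network.find_network('CERN_NETWORK').id}])

conn.compute.wait_for_server(server, wait=3600)   # bare metal deploys take longer than VMs

# Retiring the node is the same call as for a VM:
# conn.compute.delete_server(server)
```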
Adapt the Burn In process
• “Burn-in” before acceptance
- Compliance with technical spec (e.g. performance)
- Find failed components (e.g. broken RAM)
- Find systematic errors (e.g. bad firmware)
- Provoke early failures through stress
- Tests include
- CPU: burnK7, burnP6, burnMMX (cooling)
- RAM: memtest, Disk: badblocks
- Network: iperf(3) between pairs of nodes
- automatic node pairing (see the sketch after this slide)
- Benchmarking: HEPSpec06 (& fio)
- derivative of SPEC06
- we buy total compute capacity (not newest processors)
UCC 2018 44
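The network part of the burn-in pairs nodes automatically and runs iperf between them; a minimal sketch of that pairing step is below. The host list, SSH access and iperf invocation are assumptions for illustration only.
```python
# Minimal sketch: pair up freshly delivered nodes and run iperf3 between each pair.
# Assumptions: passwordless SSH to the nodes, iperf3 installed, hostnames illustrative.
import itertools
import random
import subprocess

hosts = [f"node{i:03d}.example.org" for i in range(1, 17)]   # delivery batch (hypothetical)

random.shuffle(hosts)
pairs = list(itertools.zip_longest(hosts[::2], hosts[1::2]))  # automatic node pairing

for server, client in pairs:
    if client is None:
        continue  # odd node out, re-test it in the next round
    # Start an iperf3 server on one node, then drive traffic from its partner.
    subprocess.run(["ssh", server, "iperf3", "-s", "-D"], check=True)
    result = subprocess.run(["ssh", client, "iperf3", "-c", server, "-t", "30", "-J"],
                            check=True, capture_output=True, text=True)
    print(f"{client} -> {server}: {len(result.stdout)} bytes of JSON results")
```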
Exploiting cloud services for burn in
UCC 2018 45
Tech. Challenge: Containers
UCC 2018 46
An OpenStack API service (Magnum) that allows creation of container clusters
● Use your OpenStack credentials, quota and roles
● You choose your cluster type
● Multi-Tenancy
● Quickly create new clusters with advanced features
such as multi-master
● Integrated monitoring and CERN storage access
● Making it easy to do the right thing
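Cluster creation goes through the normal OpenStack credentials and quota; a minimal sketch with openstacksdk is below. It assumes the SDK build exposes the Magnum (container-infra) proxy and that a Kubernetes cluster template already exists; all names are illustrative.
```python
# Minimal sketch: create a Kubernetes cluster through the container-infra (Magnum) API.
# Assumptions: openstacksdk with the container_infrastructure_management proxy,
# an existing cluster template called "kubernetes-preview" (illustrative).
import openstack

conn = openstack.connect(cloud='cern')
coe = conn.container_infrastructure_management

template = coe.find_cluster_template('kubernetes-preview')
cluster = coe.create_cluster(
    name='demo',
    cluster_template_id=template.id,
    master_count=1,      # multi-master is a template/label option
    node_count=4)

print(f"cluster {cluster.name} requested")
```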
Scale Testing using Rally
• An OpenStack benchmark test tool
• Easily extended via plugins (see the sketch after this slide)
• Test results in HTML reports
• Used by many projects
• Context: sets up the environment
• Scenario: runs the benchmark
• Recommended for a production service to verify that the service behaves as expected at all times
UCC 2018 47
[Diagram: Rally driving benchmark scenarios against a Kubernetes cluster (pods, containers) and producing a Rally report]
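Rally's plugin mechanism is what made these container benchmarks possible; below is a rough sketch of what a custom scenario plugin can look like. The plugin API varies between Rally versions, and the class, names and the measured operation here are illustrative assumptions only.
```python
# Rough sketch of a custom Rally scenario plugin (illustrative; check the Rally
# plugin docs for the exact base classes and decorators of your Rally version).
import requests
from rally.task import atomic
from rally.task import scenario


@scenario.configure(name="Demo.measure_request")
class MeasureRequest(scenario.Scenario):
    """Times one HTTP request per iteration (purely illustrative workload)."""

    @atomic.action_timer("demo.single_request")
    def _do_request(self, endpoint):
        requests.get(endpoint, timeout=10)

    def run(self, endpoint="http://localhost:8080/healthz"):
        # Rally calls run() once per iteration with the concurrency set in the
        # task file; each atomic action becomes a timed entry in the HTML report.
        self._do_request(endpoint)
```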
First Attempt – 1M requests/sec
• 200 Nodes
• Found multiple limits
• Heat Orchestration scaling
• Authentication caches
• Volume deletion
• Site services
UCC 2018 48
Second Attempt – 7M requests/sec
• Fixes and scale to 1000 Nodes
UCC 2018 49
Cluster size (nodes)   Concurrency   Deployment time (min)
2                      50            2.5
16                     10            4
32                     10            4
128                    5             5.5
512                    1             14
1000                   1             23
Tech. Challenge: Meltdown
UCC 2018 50
• In January 2018, a security vulnerability was disclosed, requiring a new kernel everywhere
• Staged campaign
• 7 reboot days, 7 tidy up days
• By availability zone
• Benefits
• Automation now in place to reboot the cloud if needed (sketched below): 33,000 VMs on 9,000 hypervisors
• Latest QEMU and RBD user code on all VMs
• Then L1TF came along
• And we had to do it all again......
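The reboot automation essentially walks the cloud one availability zone at a time; a very simplified sketch of such a campaign with openstacksdk is below. Zone handling, scheduling and tidy-up are far more involved in practice, and all names here are assumptions.
```python
# Very simplified sketch: hard-reboot all VMs one availability zone at a time,
# so only a fraction of the cloud is unavailable on any given reboot day.
# Assumptions: admin credentials, availability zones used as the campaign unit.
import collections
import openstack

conn = openstack.connect(cloud='cern')

by_zone = collections.defaultdict(list)
for server in conn.compute.servers(details=True, all_projects=True):
    by_zone[server.availability_zone].append(server)

for zone, servers in sorted(by_zone.items()):
    print(f"Zone {zone}: rebooting {len(servers)} VMs onto the patched kernel/QEMU")
    for server in servers:
        conn.compute.reboot_server(server, reboot_type='HARD')
    # In the real campaign: wait, verify, tidy up for a day, then move on.
```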
UCC 2018 51
[Timeline: LHC schedule from 2009 to ~2030: Run 1, LS1, Run 2, LS2, Run 3, LS3, then HL-LHC Run 4 from ~2026]
• A significant part of the cost comes from global operations
• Even with a technology increase of ~15%/year, we still have a big gap if we keep trying to do things with our current compute models
• Raw data volume increases significantly for the High Luminosity LHC
Commercial Clouds
UCC 2018 52
Non-Technical Challenges (1)
UCC 2018
53
• Agile Infrastructure Paradigm Adoption
- ‘VMs are slower than physical machines.’
- ‘I need to keep control on the full stack.’
- ‘This would not have happened with physical machines.’
- ‘It’s the cloud, so it should be able to do X!’
- ‘Using a config’ management tool is too dangerous!’
- ‘They are my machines’
Non-Technical Challenges (2)
UCC 2018
54
• Agility can bring great benefits …
• … but mind (adapted) Hooke’s Law!
- Avoid irreversible deformations
• Ensure the tail is moving as well as
the head
- Application support
- Cultural changes
- Workflow adoption
- Open source community culture can help
Non-Technical Challenges (3)
• Contributor License Agreements
• Patches needed but merge/review takes time
• Regular staff changes limit karma
• Need to be a polyglot
• Python, Ruby, Go, … and legacy Perl etc.
• Keep riding the release wave
• Avoid the end-of-life scenarios
UCC 2018 55
Ongoing Work Areas
• Spot Market / Pre-emptible instances
• Software Defined Networking
• Regions
• GPUs
• Containers on Bare Metal
• …
UCC 2018 56
Summary
UCC 2018
57
Positive results 5 years into the project!
- LHC needs met without additional staff
- Tools and workflows widely adopted and accepted
- Many technical challenges were mastered and the solutions returned upstream
- Integration with open source communities successful
- Use of common tools increased CERN’s attractiveness to talent
Further enhancements in function & scale needed for HL-LHC
Further Information
• CERN information outside the auditorium
• Jobs at CERN – wide range of options
• http://jobs.cern
• CERN blogs
• http://openstack-in-production.blogspot.ch
• https://techblog.web.cern.ch/techblog/
• Recent Talks at OpenStack summits
• https://www.openstack.org/videos/search?search=cern
• Source code
• https://github.com/cernops and https://github.com/openstack
UCC 2018 58
UCC 2018
59
Agile Infrastructure Core Areas
UCC 2018
61
• Resource provisioning (IaaS)
- Based on OpenStack
• Centralized Monitoring
- Based on Collectd (sensor) + ‘ELK’ stack
• Configuration Management
- Based on Puppet
Configuration Management
UCC 2018
62
• Client/server architecture
- ‘agents’ running on hosts plus horizontally scalable ‘masters’
• Desired state of hosts described in ‘manifests’
- Simple, declarative language
- ‘resource’ basic unit for system modeling, e.g. package or service
• ‘agent’ discovers system state using ‘facter’
- Sends current system state to masters
• Master compiles data and manifests into ‘catalog’
- Agent applies catalog on the host
Status: Config’ Management (1)
UCC 2018
63
[Key figures shown graphically on the slide:]
- Hosts managed (virtual and physical, private and public cloud)
- ‘base’ configuration is what every Puppet node gets
- Catalog compilations are spread out
- Change counts include dev changes
- Number of Puppet code committers
Status: Config’ Management (2)
UCC 2018
64
Status: Config’ Management (3)
UCC 2018
65
• Changes to QA are
announced publicly
• QA duration: 1 week
• All Service Managers
can stop a change!
Monitoring: Scope
UCC 2018
66
Data Centre Monitoring
• Two DCs at CERN and Wigner
• Hardware, O/S, and services
• PDUs, temp sensors, …
• Metrics and logs
Experiment Dashboards
- WLCG Monitoring
- Sites availability, data transfers,
job information, reports
- Used by WLCG, experiments,
sites and users
UCC 2018
67
Status: (Unified) Monitoring (1)
• Offering: monitor, collect, aggregate, process,
visualize, alarm … for metrics and logs!
• ~400 (virtual) servers, 500GB/day, 1B docs/day
- Monitoring data management from CERN IT and WLCG
- Infrastructure and tools for CERN IT and WLCG
• Migrations ongoing (double maintenance)
- CERN IT: From Lemon sensor to collectd
- WLCG: From former infra, tools, and dashboards
Status: (Unified) Monitoring (2)
UCC 2018
68
[Architecture diagram: data sources (FTS, Rucio, XRootD, jobs, Lemon, syslog, application logs, databases, HTTP feeds) are collected through AMQ and Flume log/metric gateways, buffered in a Kafka cluster, processed (data enrichment, data aggregation, batch processing), and stored in HDFS, Elasticsearch and others (InfluxDB) for data access via CLI, API, user views and user jobs.]
Today: > 500 GB/day, 72h buffering
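Everything in the pipeline above transits the Kafka buffering layer, so monitoring data can be consumed independently of the built-in dashboards; a minimal sketch with the kafka-python client is shown below. The broker addresses, topic name and message fields are assumptions for illustration.
```python
# Minimal sketch: read monitoring events from the Kafka buffering layer.
# Assumptions: kafka-python installed, JSON-encoded messages on a "metrics" topic,
# broker addresses and field names illustrative.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'metrics',
    bootstrap_servers=['kafka01.example.org:9092'],
    group_id='demo-dashboards',
    value_deserializer=lambda raw: json.loads(raw.decode('utf-8')))

for message in consumer:
    event = message.value
    # e.g. fan out to Elasticsearch, InfluxDB or a custom aggregation job
    print(event.get('host'), event.get('metric'), event.get('value'))
```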

Weitere ähnliche Inhalte

Was ist angesagt?

20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN BarcelonaTim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3Tim Bell
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1Tim Bell
 
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)Arne Wiebalck
 
TOWARDS Hybrid OpenStack Clouds in the Real World
TOWARDS Hybrid OpenStack Clouds in the Real WorldTOWARDS Hybrid OpenStack Clouds in the Real World
TOWARDS Hybrid OpenStack Clouds in the Real WorldAndrew Hickey
 
(R)Evolution in the CERN IT Department: A 5 year perspective on the Agile Inf...
(R)Evolution in the CERN IT Department: A 5 year perspective on the Agile Inf...(R)Evolution in the CERN IT Department: A 5 year perspective on the Agile Inf...
(R)Evolution in the CERN IT Department: A 5 year perspective on the Agile Inf...Arne Wiebalck
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014Tim Bell
 
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)Jose Antonio Coarasa Perez
 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresExperiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresRafael Ferreira da Silva
 
OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017Stacy Véronneau
 
C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...
C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...
C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...J On The Beach
 
Montreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUpMontreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUpStacy Véronneau
 
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...OpenStack
 
Overlay Opportunistic Clouds in CMS/ATLAS at CERN: The CMSooooooCloud in Detail
Overlay Opportunistic Clouds in CMS/ATLAS at CERN: The CMSooooooCloud in DetailOverlay Opportunistic Clouds in CMS/ATLAS at CERN: The CMSooooooCloud in Detail
Overlay Opportunistic Clouds in CMS/ATLAS at CERN: The CMSooooooCloud in DetailJose Antonio Coarasa Perez
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Futureinside-BigData.com
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Igor Sfiligoi
 
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...OpenStack
 
Hpc Cloud project Overview
Hpc Cloud project OverviewHpc Cloud project Overview
Hpc Cloud project OverviewFloris Sluiter
 

Was ist angesagt? (20)

20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1
 
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)
 
TOWARDS Hybrid OpenStack Clouds in the Real World
TOWARDS Hybrid OpenStack Clouds in the Real WorldTOWARDS Hybrid OpenStack Clouds in the Real World
TOWARDS Hybrid OpenStack Clouds in the Real World
 
(R)Evolution in the CERN IT Department: A 5 year perspective on the Agile Inf...
(R)Evolution in the CERN IT Department: A 5 year perspective on the Agile Inf...(R)Evolution in the CERN IT Department: A 5 year perspective on the Agile Inf...
(R)Evolution in the CERN IT Department: A 5 year perspective on the Agile Inf...
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014
 
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresExperiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
 
OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017
 
C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...
C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...
C2MON - A highly scalable monitoring platform for Big Data scenarios @CERN by...
 
Helix Nebula - The Science Cloud, Status Update
Helix Nebula - The Science Cloud, Status UpdateHelix Nebula - The Science Cloud, Status Update
Helix Nebula - The Science Cloud, Status Update
 
Montreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUpMontreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUp
 
Big Data Management at CERN: The CMS Example
Big Data Management at CERN: The CMS ExampleBig Data Management at CERN: The CMS Example
Big Data Management at CERN: The CMS Example
 
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
 
Overlay Opportunistic Clouds in CMS/ATLAS at CERN: The CMSooooooCloud in Detail
Overlay Opportunistic Clouds in CMS/ATLAS at CERN: The CMSooooooCloud in DetailOverlay Opportunistic Clouds in CMS/ATLAS at CERN: The CMSooooooCloud in Detail
Overlay Opportunistic Clouds in CMS/ATLAS at CERN: The CMSooooooCloud in Detail
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
 
Hpc Cloud project Overview
Hpc Cloud project OverviewHpc Cloud project Overview
Hpc Cloud project Overview
 

Ähnlich wie 20181219 ucc open stack 5 years v3

Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Belmiro Moreira
 
RECAP Project Overview
RECAP Project OverviewRECAP Project Overview
RECAP Project OverviewRECAP Project
 
Grid Computing - Collection of computer resources from multiple locations
Grid Computing - Collection of computer resources from multiple locationsGrid Computing - Collection of computer resources from multiple locations
Grid Computing - Collection of computer resources from multiple locationsDibyadip Das
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceinside-BigData.com
 
CERN Data Centre Evolution
CERN Data Centre EvolutionCERN Data Centre Evolution
CERN Data Centre EvolutionGavin McCance
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchRyousei Takano
 
PEPS: CNES Sentinel Satellite Image Analysis, On-Premises and in the Cloud wi...
PEPS: CNES Sentinel Satellite Image Analysis, On-Premises and in the Cloud wi...PEPS: CNES Sentinel Satellite Image Analysis, On-Premises and in the Cloud wi...
PEPS: CNES Sentinel Satellite Image Analysis, On-Premises and in the Cloud wi...OW2
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?Robert Grossman
 
Grid optical network service architecture for data intensive applications
Grid optical network service architecture for data intensive applicationsGrid optical network service architecture for data intensive applications
Grid optical network service architecture for data intensive applicationsTal Lavian Ph.D.
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningDataWorks Summit
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Belmiro Moreira
 
Palladio Optimization Suite: QoS optimization for component-based Cloud appli...
Palladio Optimization Suite: QoS optimization for component-based Cloud appli...Palladio Optimization Suite: QoS optimization for component-based Cloud appli...
Palladio Optimization Suite: QoS optimization for component-based Cloud appli...Michele Ciavotta, PH. D.
 

Ähnlich wie 20181219 ucc open stack 5 years v3 (20)

Jorge gomes
Jorge gomesJorge gomes
Jorge gomes
 
Jorge gomes
Jorge gomesJorge gomes
Jorge gomes
 
Jorge gomes
Jorge gomesJorge gomes
Jorge gomes
 
Grid computing & its applications
Grid computing & its applicationsGrid computing & its applications
Grid computing & its applications
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
 
Hybrid Cloud for CERN
Hybrid Cloud for CERN Hybrid Cloud for CERN
Hybrid Cloud for CERN
 
RECAP Project Overview
RECAP Project OverviewRECAP Project Overview
RECAP Project Overview
 
Grid Computing - Collection of computer resources from multiple locations
Grid Computing - Collection of computer resources from multiple locationsGrid Computing - Collection of computer resources from multiple locations
Grid Computing - Collection of computer resources from multiple locations
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
 
Dice presents-feb2014
Dice presents-feb2014Dice presents-feb2014
Dice presents-feb2014
 
CERN Data Centre Evolution
CERN Data Centre EvolutionCERN Data Centre Evolution
CERN Data Centre Evolution
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
 
PEPS: CNES Sentinel Satellite Image Analysis, On-Premises and in the Cloud wi...
PEPS: CNES Sentinel Satellite Image Analysis, On-Premises and in the Cloud wi...PEPS: CNES Sentinel Satellite Image Analysis, On-Premises and in the Cloud wi...
PEPS: CNES Sentinel Satellite Image Analysis, On-Premises and in the Cloud wi...
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
 
EPCC MSc industry projects
EPCC MSc industry projectsEPCC MSc industry projects
EPCC MSc industry projects
 
Grid optical network service architecture for data intensive applications
Grid optical network service architecture for data intensive applicationsGrid optical network service architecture for data intensive applications
Grid optical network service architecture for data intensive applications
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015
 
Palladio Optimization Suite: QoS optimization for component-based Cloud appli...
Palladio Optimization Suite: QoS optimization for component-based Cloud appli...Palladio Optimization Suite: QoS optimization for component-based Cloud appli...
Palladio Optimization Suite: QoS optimization for component-based Cloud appli...
 

Mehr von Tim Bell

CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019Tim Bell
 
OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?Tim Bell
 
20141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v320141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v3Tim Bell
 
20140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v320140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v3Tim Bell
 
Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Tim Bell
 
CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013Tim Bell
 
20130529 openstack cee_day_v6
20130529 openstack cee_day_v620130529 openstack cee_day_v6
20130529 openstack cee_day_v6Tim Bell
 
Academic cloud experiences cern v4
Academic cloud experiences cern v4Academic cloud experiences cern v4
Academic cloud experiences cern v4Tim Bell
 
Ceilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitCeilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitTim Bell
 
Havana survey results-final-v2
Havana survey results-final-v2Havana survey results-final-v2
Havana survey results-final-v2Tim Bell
 
Havana survey results-final
Havana survey results-finalHavana survey results-final
Havana survey results-finalTim Bell
 
20121205 open stack_accelerating_science_v3
20121205 open stack_accelerating_science_v320121205 open stack_accelerating_science_v3
20121205 open stack_accelerating_science_v3Tim Bell
 
20121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.220121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.2Tim Bell
 
20121017 OpenStack Accelerating Science
20121017 OpenStack Accelerating Science20121017 OpenStack Accelerating Science
20121017 OpenStack Accelerating ScienceTim Bell
 
Accelerating science with Puppet
Accelerating science with PuppetAccelerating science with Puppet
Accelerating science with PuppetTim Bell
 
20120524 cern data centre evolution v2
20120524 cern data centre evolution v220120524 cern data centre evolution v2
20120524 cern data centre evolution v2Tim Bell
 
CERN User Story
CERN User StoryCERN User Story
CERN User StoryTim Bell
 

Mehr von Tim Bell (17)

CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019
 
OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?
 
20141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v320141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v3
 
20140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v320140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v3
 
Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4
 
CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013
 
20130529 openstack cee_day_v6
20130529 openstack cee_day_v620130529 openstack cee_day_v6
20130529 openstack cee_day_v6
 
Academic cloud experiences cern v4
Academic cloud experiences cern v4Academic cloud experiences cern v4
Academic cloud experiences cern v4
 
Ceilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitCeilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summit
 
Havana survey results-final-v2
Havana survey results-final-v2Havana survey results-final-v2
Havana survey results-final-v2
 
Havana survey results-final
Havana survey results-finalHavana survey results-final
Havana survey results-final
 
20121205 open stack_accelerating_science_v3
20121205 open stack_accelerating_science_v320121205 open stack_accelerating_science_v3
20121205 open stack_accelerating_science_v3
 
20121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.220121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.2
 
20121017 OpenStack Accelerating Science
20121017 OpenStack Accelerating Science20121017 OpenStack Accelerating Science
20121017 OpenStack Accelerating Science
 
Accelerating science with Puppet
Accelerating science with PuppetAccelerating science with Puppet
Accelerating science with Puppet
 
20120524 cern data centre evolution v2
20120524 cern data centre evolution v220120524 cern data centre evolution v2
20120524 cern data centre evolution v2
 
CERN User Story
CERN User StoryCERN User Story
CERN User Story
 

Kürzlich hochgeladen

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Kürzlich hochgeladen (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

20181219 ucc open stack 5 years v3

  • 1.
  • 2. Clouds at CERN : A 5 year perspective Utility and Cloud Computing Conference, December 19, 2018 Tim Bell @noggin143UCC 2018 2
  • 3. About Tim • Responsible for Compute and Monitoring in CERN IT department • Elected member of the OpenStack Foundation management board • Member of the OpenStack user committee from 2013- 2015 UCC 2018 3
  • 4. UCC 2018 4 CERNa Worldwide collaboration CERN’s primary mission: SCIENCE Fundamental research on particle physics, pushing the boundaries of knowledge and technology
  • 6. UCC 20186 The Large Hadron Collider: LHC 1232 dipole magnets 15 metres 35t EACH 27km Image credit: CERN
  • 7. Image credit: CERN COLDER TEMPERATURES than outer space ( 120t He ) UCC 20187 LHC: World’s Largest Cryogenic System (1.9 K)
  • 8. Vacuum? • Yes UCC 20188 LHC: Highest Vacuum 104 km of PIPES 10-11bar (~ moon) Image credit: CERN
  • 9. Image credit: CERN Image credit: CERN UCC 20189 ATLAS, CMS, ALICE and LHCb EIFFEL TOWER HEAVIER than the Image credit: CERN
  • 10. UCC 2018 10 40 million pictures per second 1PB/s Image credit: CERN
  • 11. About the CERN IT Department UCC 2018 11 Enable the laboratory to fulfill its mission - Main data centre on Meyrin site - Wigner data centre in Budapest (since 2013) - Connected via three dedicated 100Gbs links - Where possible, resources at both sites (plus disaster recovery) Drone footage of the CERN CC About the CERN IT Department UCC 2018 4 Enable the laboratory to fulfill its mission - Main data centre on Meyrin site - Wigner data centre in Budapest (since 2013) - Connected via three dedicated 100Gbs links - Where possible, resources at both sites (plus disaster recovery) Drone footage of the CERN CC 19/12/2018
  • 12. Status: Service Level Overview UCC 2018 12
  • 13. Outline UCC 2018 13 • Fabric Management before 2012 • The AI Project • The three AI areas - Configuration Management - Monitoring - Resource provisioning • Review
  • 14. CERN IT Tools up to 2011 (1) UCC 2018 14 • Developed in series of EU funded projects - 2001-2004: European DataGrid - 2004-2010: EGEE • Work package 4 – Fabric management: “Deliver a computing fabric comprised of all the necessary tools to manage a centre providing grid services on clusters of thousands of nodes.”
  • 15. CERN IT Tools up to 2011 (2) UCC 2018 15 • The WP4 software was developed from scratch - Scale and experience needed for LHC Computing was special - Config’ mgmt, monitoring, secret store, service status, state mgmt, service databases, … LEMON – LHC Era Monitoring - client/server based monitoring - local agent with sensors - samples stored in a cache & sent to server - UDP or TCP, w/ or w/o encryption - support for remote entities - system administration toolkit - automated installation, configuration & management of clusters - clients interact with a configuration database (CMDB) & and an installation infrastructure (AII) Around 8’000 servers managed!
  • 16. 2012: A Turning Point for CERN IT UCC 2018 16 • EU projects finished in 2010: decreasing development and support • LHC compute and data requirements increasing - Moore’s law would help, but not enough • Staff would not grow with managed resources - Standardization & automation, current tools not apt • Other deployments have surpassed the CERN one - Mostly commercial companies like Google, Facebook, Rackspace, Amazon, Yahoo!, … - We were no longer special! Can we profit? 0 20 40 60 80 100 120 140 160 Run 1 Run 2 Run 3 Run 4 GRID ATLAS CMS LHCb ALICE we are here what we can afford LS1 (2013) ahead, next window for change would only open in 2019 … 2012
  • 17. UCC 2018 17 How we began … • Formed a small team of service managers from … - Large services (e.g. batch, plus) - Existing fabric services (e.g. monitoring) - Existing virtualization service • ... to define project goals - What issues do we need to address? - What forward looking features do we need? http://iopscience.iop.org/article/10.1088/1742-6596/396/4/042002/pdf
  • 18. Agile Infrastructure Project Goals UCC 2018 18 New data centre support - Overcome limits of CC in Meyrin - Disaster recovery and business continuity - ‘Smart hands’ approach 1
  • 19. Agile Infrastructure Project Goals UCC 2018 19 Sustainable tool support - Tools to be used at our scale need maintenance - Tools with a limited community require more time for newcomers to become productive and are less valuable for the time after (transferable skills) 2
  • 20. Agile Infrastructure Project Goals UCC 2018 20 Improve user response time - Reduce the resource provisioning time span (current virtualization service reached scaling limits) - Self-service kiosk 3
  • 21. Agile Infrastructure Project Goals UCC 2018 21 Enable cloud interfaces - Experiments already started to use EC2 - Enable libraries such as Apache’s libcloud 4
  • 22. Agile Infrastructure Project Goals UCC 2018 22 Precise monitoring and accounting - Enable timely monitoring for debugging - Showback usage to the cloud users - Consolidate accounting data for usage of CPU, network, storage … across batch, physical nodes and grid resources 5
  • 23. Agile Infrastructure Project Goals UCC 2018 23 Improve resource efficiency - Adapt provisioned resources to services’ needs - Streamline the provisioning workflows (e.g. burn-in, repair or retirement) 6
  • 24. Our Approach: Tool Chain and DevOps UCC 2018 24 • CERN’s requirements are no longer special! • A set of tools emerged when looking at other places • Small dedicated tools allowed for rapid validation & prototyping • Adapted our processes, policies and work flows to the tools! • Join (and contribute to) existing communities!
  • 25. IT Policy Changes for Services UCC 2018 25 • Services shall be virtual … - Within reason - Exceptions are costly! • Puppet managed, and … • … monitored! - (Semi-)automatic with Puppet Decrease provisioning time Increase resource efficiency Simplify infrastructure mgmt Profit from others’ work Speed up deployment ‘Automatic’ documentation Centralized monitoring Integrated alarm handling
  • 26. UCC 2018 26 Tools + Policies: Sounds simple! From tools to services is complex! - Integration w/ sec services? - Incident handling? - Request work flows? - Change management? - Accounting and charging? - Life cycle management? - … Image: Subbu Allamaraju
  • 28. Resource Provisioning: IaaS UCC 2018 28 • Based on OpenStack - Collection of open source projects for cloud orchestration - Started by NASA and Rackspace in 2010 - Grown into a global software community
  • 30. The CERN Cloud Service UCC 2018 30 • Production since July 2013 - Several rolling upgrades since, now on Rocky - Many sub services deployed • Spans two data centers - One region, one API entry point • Deployed using RDO + Puppet - Mostly upstream, patched where needed • Many sub services run on VMs! - Boot strapping
  • 32. Agility in the Cloud UCC 2018 32 • Use case spectrum - Batch service (physics analysis) - IT services (built on each other) - Experiment services (build) - Engineering (chip design) - Infrastructure (hotel, bikes) - Personal (development) • Hardware spectrum - Processor archs (features, NUMA, …) - Core-to-RAM ratio (1:2, 1:3, 1:5, …) - Core-to-disk ratio (2x or 4x SSDs) - Disk layout (2, 3, 4, mixed) - Network (1/10GbE, FC, domain) - Location (DC, power) - SLC6, CC7, RHEL, Windows - …
  • 33. What about our initial goals? UCC 2018 33 • The remote DC is seamlessly integrated - No difference from provisioning PoV - Easily accessible by users - Local DC limits overcome (business continuity?) • Sustainable tools - Number of managed machines has multiplied - Good collaboration with upstream communities - Newcomers know tools, can use knowledge afterwards • Provisioning time span is ~minutes - Was several months before - Self-service kiosk with automated workflows • Cloud interfaces - Good OpenStack adoption, EC2 support • Flexible monitoring infra - Automatic in for simple cases - Powerful tool set for more complex ones - Accounting for local and grid resources • Increased resource efficiency - ‘Packing’ of services - Overcommit - Adapted to services’ needs - Quick draining & back filling So … 100% success?
  • 34. Cloud Architecture Overview UCC 2018 34 • Top and child cells for scaling - API, DB, MQ, Compute nodes - Remote DC is set of cells • Nova HA only on top cell - Simplicity vs impact • Other projects global - Load balanced controllers - RabbitMQ clusters • Three Ceph instances - Volumes (Cinder), images (Glance), shares (Manila)
  • 36. Tech. Challenge: Scaling • OpenStack Cells provides composable units • Cells V1 – Special custom developments • Cells V2 – Now the standard deployment model • Broadcast vs targeted queries • Handling down cells • Quota • Academic and scientific instances push the limits • Now many enterprise clouds above 1000 hypervisors • CERN running 73 Cells in production UCC 2018 36 https://www.openstack.org/analytics
  • 37. Tech. Challenge: CPU Performance UCC 2018 37 • Benchmark results on full-node VMs were about 20% lower than on the underlying host - Smaller VMs did much better • Investigated various tuning options - KSM*, EPT**, PAE, Pinning, … +hardware type dependencies - Discrepancy down to ~10% between virtual and physical • Comparison with Hyper-V: no general issue - Loss w/o tuning ~3% (full-node), <1% for small VMs - … NUMA-awareness! *KSM on/off: beware of memory reclaim! **EPT on/off: beware of expensive page table walks!
  • 38. CPU Performance: NUMA UCC 2018 38 • NUMA-awareness identified as most efficient setting • “EPT-off” side-effect - Small number of hosts, but very visible there • Use 2MB Huge Pages - Keep the “EPT off” performance gain with “EPT on”
  • 39. NUMA roll-out UCC 2018 39 • Rolled out on ~2’000 batch hypervisors (~6’000 VMs) - Huge page allocation as boot parameter → requires a reboot - VM NUMA awareness as flavor metadata → requires delete/recreate • Cell-by-cell (~200 hosts): - Queue-reshuffle to minimize resource impact - Draining & deletion of batch VMs - Hypervisor reconfiguration (Puppet) & reboot - Recreation of batch VMs • Whole update took about 8 weeks - Organized between batch and cloud teams - No performance issue observed since • Overhead by VM layout (before / after): 4x 8 cores: 8% / –; 2x 16 cores: 16% / –; 1x 24 cores: 20% / 5%; 1x 32 cores: 20% / 3%
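The flavor metadata mentioned above maps onto standard Nova extra specs. Below is a minimal sketch using python-novaclient; the Keystone endpoint, credentials and flavor name are placeholders, not CERN's actual tooling:

from keystoneauth1 import identity, session
from novaclient import client as nova_client

# Authenticate against Keystone (endpoint and credentials are assumed values).
auth = identity.Password(
    auth_url='https://keystone.example.org:5000/v3',
    username='ops', password='secret',
    project_name='cloud', user_domain_id='default', project_domain_id='default')
nova = nova_client.Client('2.60', session=session.Session(auth=auth))

# Create an illustrative flavor and tag it with NUMA / huge-page extra specs.
flavor = nova.flavors.create(name='m2.2xlarge.numa', ram=65536, vcpus=32, disk=160)
flavor.set_keys({
    'hw:numa_nodes': '2',       # expose two guest NUMA nodes to match the host
    'hw:mem_page_size': '2MB',  # back guest RAM with 2 MB huge pages
})

Existing VMs only pick up the new topology after a delete/recreate, which is why the roll-out above had to drain and recreate the batch VMs.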
  • 40. Tech. Challenge: Underused resources UCC 2018 40
  • 41. VM Expiry UCC 2018 41 • Each personal instance will have an expiration date • Set shortly after creation and evaluated daily • Configured to 180 days, renewable • Reminder mails starting 30 days before expiration
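A minimal sketch of the daily evaluation, assuming hypothetical created_at/renewed_at date fields on the instance record (this is an illustration, not the production expiry service):

import datetime

EXPIRY_DAYS = 180      # lifetime of a personal instance, renewable
REMINDER_DAYS = 30     # reminder mails start this many days before expiry

def evaluate(instance, today=None):
    """Return 'ok', 'remind' or 'expire' for a personal instance."""
    today = today or datetime.date.today()
    start = instance.renewed_at or instance.created_at
    expires_on = start + datetime.timedelta(days=EXPIRY_DAYS)
    if today >= expires_on:
        return 'expire'
    if today >= expires_on - datetime.timedelta(days=REMINDER_DAYS):
        return 'remind'
    return 'ok'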
  • 42. Expiry results UCC 2018 42 • Results exceeded expectations • Expired: >1,000 VMs and >3,000 cores
  • 43. Tech. Challenge: Bare Metal UCC 2018 43 • VMs not suitable for all of our use cases - Storage and database nodes, HPC clusters, bootstrapping, critical network equipment or specialised network setups, precise/repeatable benchmarking for s/w frameworks, … • Complete our service offerings - Physical nodes (in addition to VMs and containers) - OpenStack UI as the single pane of glass • Simplify hardware provisioning workflows - For users: openstack server create/delete - For procurement & h/w provisioning team: initial on-boarding, server re-assignments • Consolidate accounting & bookkeeping - Resource accounting input will come from fewer sources - Machine re-assignments will be easier to track
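To illustrate the "single pane of glass" idea, here is a hedged sketch with openstacksdk: the same server create/delete call a user would issue for a VM provisions a physical node when a bare-metal flavor is requested. The cloud name, image, flavor and network below are invented placeholders:

import openstack

conn = openstack.connect(cloud='cern')   # assumes a matching clouds.yaml entry

# Request a physical node through the normal Nova API; Ironic does the rest.
server = conn.create_server(
    name='db-node-001',
    image='CC7 - x86_64',           # placeholder image name
    flavor='p1.database',           # placeholder bare-metal flavor
    network='ironic-provisioning',  # placeholder network
    wait=True)
print(server.status)

# Retiring the node later is the same call as deleting a VM.
conn.delete_server('db-node-001', wait=True)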
  • 44. Adapt the Burn-In process • “Burn-in” before acceptance - Compliance with technical spec (e.g. performance) - Find failed components (e.g. broken RAM) - Find systematic errors (e.g. bad firmware) - Provoke early failures through stress - Tests include - CPU: burnK7, burnP6, burnMMX (cooling) - RAM: memtest, Disk: badblocks - Network: iperf(3) between pairs of nodes - automatic node pairing - Benchmarking: HEPSpec06 (& fio) - derivative of SPEC06 - we buy total compute capacity (not newest processors) UCC 2018 44
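As a hedged illustration of the automatic node pairing used for the iperf(3) network test, a small helper that groups freshly delivered nodes into test pairs (the pairing rule is an assumption, not CERN's actual procedure):

def pair_for_iperf(hostnames):
    """Pair nodes two by two; with an odd count, the last node is
    re-tested against the first so every node runs the network test."""
    nodes = sorted(hostnames)
    pairs = [(nodes[i], nodes[i + 1]) for i in range(0, len(nodes) - 1, 2)]
    if len(nodes) % 2 == 1 and len(nodes) > 1:
        pairs.append((nodes[-1], nodes[0]))
    return pairs

# pair_for_iperf(['hw101', 'hw102', 'hw103'])
# -> [('hw101', 'hw102'), ('hw103', 'hw101')]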
  • 45. Exploiting cloud services for burn-in UCC 2018 45
  • 46. Tech. Challenge: Containers UCC 2018 46 An OpenStack API Service that allows creation of container clusters ● Use your OpenStack credentials, quota and roles ● You choose your cluster type ● Multi-Tenancy ● Quickly create new clusters with advanced features such as multi-master ● Integrated monitoring and CERN storage access ● Making it easy to do the right thing
  • 47. Scale Testing using Rally • An OpenStack benchmarking tool • Easily extended via plugins • Test results in HTML reports • Used by many projects • Context: set up the environment • Scenario: run the benchmark • Recommended for a production service, to verify that the service behaves as expected at all times UCC 2018 47 (Diagram: Rally driving a Kubernetes cluster - pods and containers - and producing a Rally report)
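To show what "easily extended via plugins" looks like, here is a minimal scenario plugin sketch assuming Rally's rally.task.scenario interface; the scenario name and the load it generates (plain HTTP GETs against a cluster endpoint) are invented for illustration:

import requests
from rally.task import scenario

@scenario.configure(name="CERNContainers.get_endpoint")
class GetEndpoint(scenario.Scenario):
    """One Rally iteration: issue a burst of requests against an endpoint."""

    def run(self, endpoint, requests_per_iteration=10):
        for _ in range(requests_per_iteration):
            requests.get(endpoint, timeout=5).raise_for_status()

Rally then takes care of concurrency, timing and the HTML report; the "context" part would prepare the Kubernetes cluster before the scenario runs.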
  • 48. First Attempt – 1M requests/Seq • 200 Nodes • Found multiple limits • Heat Orchestration scaling • Authentication caches • Volume deletion • Site services UCC 2018 48
  • 49. Second Attempt – 7M requests/Seq • Fixes and scale to 1000 Nodes UCC 2018 49 • Cluster size (nodes) / concurrency / deployment time (min): 2 / 50 / 2.5; 16 / 10 / 4; 32 / 10 / 4; 128 / 5 / 5.5; 512 / 1 / 14; 1000 / 1 / 23
  • 50. Tech. Challenge: Meltdown UCC 2018 50 • In January 2018, a security vulnerability was disclosed, requiring a new kernel everywhere • Staged campaign • 7 reboot days, 7 tidy-up days • By availability zone • Benefits • Automation now in place to reboot the whole cloud if needed - 33,000 VMs on 9,000 hypervisors • Latest QEMU and RBD user-space code on all VMs • Then L1TF came along • And we had to do it all again…
  • 51. UCC 2018 51 (Timeline: LHC first, second and third runs with long shutdowns LS1-LS3, leading into the High-Luminosity LHC / Run 4 from ~2026 to 2030?) • A significant part of the cost comes from global operations • Even with a technology increase of ~15%/year, we still have a big gap if we keep trying to do things with our current compute models • Raw data volume increases significantly for the High-Luminosity LHC
  • 53. Non-Technical Challenges (1) UCC 2018 53 • Agile Infrastructure Paradigm Adoption - ‘VMs are slower than physical machines.’ - ‘I need to keep control on the full stack.’ - ‘This would not have happened with physical machines.’ - ‘It’s the cloud, so it should be able to do X!’ - ‘Using a config’ management tool is too dangerous!’ - ‘They are my machines’
  • 54. Non-Technical Challenges (2) UCC 2018 54 • Agility can bring great benefits … • … but mind (adapted) Hooke’s Law! - Avoid irreversible deformations • Ensure the tail is moving as well as the head - Application support - Cultural changes - Workflow adoption - Open source community culture can help
  • 55. Non-Technical Challenges (3) • Contributor License Agreements • Patches are needed, but merge/review times can be long • Regular staff changes limit karma in the upstream communities • Need to be a polyglot • Python, Ruby, Go, … and legacy Perl etc. • Keep riding the release wave • Avoid the end-of-life scenarios UCC 2018 55
  • 56. Ongoing Work Areas • Spot Market / Pre-emptible instances • Software Defined Networking • Regions • GPUs • Containers on Bare Metal • … UCC 2018 56
  • 57. Summary UCC 2018 57 Positive results 5 years into the project! - LHC needs met without additional staff - Tools and workflows widely adopted and accepted - Many technical challenges were mastered, with the fixes contributed back upstream - Integration with open source communities successful - Use of common tools makes CERN more attractive to new talent Further enhancements in function & scale needed for HL-LHC
  • 58. Further Information • CERN information outside the auditorium • Jobs at CERN – wide range of options • http://jobs.cern • CERN blogs • http://openstack-in-production.blogspot.ch • https://techblog.web.cern.ch/techblog/ • Recent Talks at OpenStack summits • https://www.openstack.org/videos/search?search=cern • Source code • https://github.com/cernops and https://github.com/openstack UCC 2018 58
  • 61. Agile Infrastructure Core Areas UCC 2018 61 • Resource provisioning (IaaS) - Based on OpenStack • Centralized Monitoring - Based on Collectd (sensor) + ‘ELK’ stack • Configuration Management - Based on Puppet
  • 62. Configuration Management UCC 2018 62 • Client/server architecture - ‘agents’ running on hosts plus horizontally scalable ‘masters’ • Desired state of hosts described in ‘manifests’ - Simple, declarative language - ‘resource’ basic unit for system modeling, e.g. package or service • ‘agent’ discovers system state using ‘facter’ - Sends current system state to masters • Master compiles data and manifests into ‘catalog’ - Agent applies catalog on the host
  • 63. Status: Config’ Management (1) UCC 2018 63 (Dashboard of Puppet service statistics: managed hosts - virtual and physical, private and public cloud; the ‘base’ configuration that every Puppet node gets; catalog compilations, spread out over time; changes, including dev changes; number of Puppet code committers)
  • 64. Status: Config’ Management (2) UCC 2018 64
  • 65. Status: Config’ Management (3) UCC 2018 65 • Changes to QA are announced publicly • QA duration: 1 week • All Service Managers can stop a change!
  • 66. Monitoring: Scope UCC 2018 66 Data Centre Monitoring • Two DCs at CERN and Wigner • Hardware, O/S, and services • PDUs, temp sensors, … • Metrics and logs Experiment Dashboards - WLCG Monitoring - Sites availability, data transfers, job information, reports - Used by WLCG, experiments, sites and users
  • 67. UCC 2018 67 Status: (Unified) Monitoring (1) • Offering: monitor, collect, aggregate, process, visualize, alarm … for metrics and logs! • ~400 (virtual) servers, 500GB/day, 1B docs/day - Monitoring data management for CERN IT and WLCG - Infrastructure and tools for CERN IT and WLCG • Migrations ongoing (double maintenance) - CERN IT: from the Lemon sensor to collectd - WLCG: from the former infrastructure, tools, and dashboards
  • 68. Status: (Unified) Monitoring (2) UCC 2018 68 • Data sources: FTS, Rucio, XRootD, jobs, Lemon metrics, syslog, application logs, databases, HTTP feeds • Transport: Flume metric/log gateways and Flume AMQ/DB/HTTP sources, with a Flume Kafka sink feeding a Kafka cluster used for buffering • Processing: data enrichment, data aggregation, batch processing • Storage & search: HDFS, Elasticsearch, others (InfluxDB) • Data access: CLI, API, user views, user jobs, user data • Today: > 500 GB/day, 72h buffering
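A hedged, minimal sketch of the final hop of such a pipeline: consume enriched monitoring documents from Kafka and index them into Elasticsearch. Topic, broker and index names are invented, and the real service uses Flume sinks and dedicated processing jobs rather than this toy consumer:

import json
from kafka import KafkaConsumer          # kafka-python
from elasticsearch import Elasticsearch

consumer = KafkaConsumer(
    'monit-metrics-enriched',                           # placeholder topic
    bootstrap_servers=['kafka01.example.org:9092'],     # placeholder broker
    value_deserializer=lambda raw: json.loads(raw.decode('utf-8')))
es = Elasticsearch(['http://es01.example.org:9200'])    # placeholder endpoint

for message in consumer:
    # One JSON document per metric or log record, as delivered by the
    # enrichment step; indexed for dashboards and ad-hoc searches.
    es.index(index='monit-metrics', document=message.value)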

Editor's notes

  1. Reference: Fabiola’s talk @ Univ of Geneva https://www.unige.ch/public/actualites/2017/le-boson-de-higgs-et-notre-vie/ European Organization for Nuclear Research. Founded in 1954, today 22 member states. World’s largest particle physics laboratory. ~2,300 staff, 13k users on site. Budget ~1,000 MCHF. Mission: answer fundamental questions about the universe, advance the technology frontiers, train the scientists of tomorrow, bring nations together. https://communications.web.cern.ch/fr/node/84
  2. For all this fundamental research, CERN provides different facilities to scientists, for example the LHC. It is a ring 27 km in circumference that crosses two countries, 100 m underground; it accelerates two particle beams to near the speed of light and makes them collide at four points, where detectors observe the fireworks. 2,500 people are employed by CERN, > 10k users are on site. Talk about the LHC here, describe the experiments, Lake Geneva, Mont Blanc, and then jump in. The big ring is the LHC, the small one is the SPS; the computer centre is not far away. Pushing the boundaries of technology: CERN facilitates research and runs the accelerators; the experiments are done by institutes, member states and universities. On the Franco-Swiss border, very close to Geneva.
  3. Our flagship programme is the LHC. Trillions of protons race around the 27 km ring in opposite directions, over 11,000 times a second, travelling at 99.9999991 per cent of the speed of light. It is the largest machine on Earth.
  4. With an operating temperature of about -271 degrees Celsius, just 1.9 degrees above absolute zero, the LHC is one of the coldest places in the universe. 120 t of helium; only at that temperature is there no electrical resistance in the magnets.
  5. https://home.cern/about/engineering/vacuum-empty-interstellar-space Inside, the beam pipes operate at a very high vacuum, comparable to the vacuum at the moon. There are actually two beams, proton beams going in opposite directions; the vacuum avoids the protons interacting with other particles.
  6. The detectors are very advanced beasts, four of them. ATLAS and CMS are the best-known ones: general-purpose detectors testing Standard Model properties; in those detectors the Higgs particle was discovered in 2012. In the picture you can see physicists. ALICE and LHCb are the other two. To sample and record the debris from up to 600 million proton collisions per second, scientists are building gargantuan devices that measure particles with micron precision.
  7. 100 Mpixel camera, 40 million pictures per second. https://www.ethz.ch/en/news-and-events/eth-news/news/2017/03/new-heart-for-cerns-cms.html