SlideShare ist ein Scribd-Unternehmen logo
1 von 32
CERN Infrastructure Evolution




                                 Tim Bell
                                   CHEP
CERN IT Department
                              24th May 2012
 CH-1211 Genève 23
        Switzerland
www.cern.ch/it
Agenda

• Problems to address
• Ongoing projects
  – Data centre expansion
  – Configuration management
  – Infrastructure as a Service
  – Monitoring
• Timelines
• Summary

          CERN Infrastructure Evolution   2
Problems to address

• CERN data centre is reaching its
  limits
• IT staff numbers remain fixed but
  more computing capacity is needed
• Tools are high maintenance and
  becoming increasingly brittle
• Inefficiencies exist but root cause
  cannot be easily identified

          CERN Infrastructure Evolution   3
Data centre extension history

• Studies in 2008 into building a new
  computer centre on the CERN
  Prevessin site
  – Too costly
• In 2011, tender run across CERN
  member states for remote hosting
  – 16 bids
• In March 2012, Wigner Institute in
  Budapest, Hungary selected
          CERN Infrastructure Evolution   4
Data Centre Selection




• Wigner Institute in Budapest, Hungary

           CERN Infrastructure Evolution   5
Data Centre Layout & Ramp-up




          CERN Infrastructure Evolution   6
Data Centre Plans
•   Timescales
     – Prototyping in 2012
     – Testing in 2013
     – Production in 2014
•   This will be a “hands-off” facility for CERN
     – They manage the infrastructure and hands-on work
     – We do everything else remotely
•   Need to be reasonably dynamic in our usage
     – Will pay power bill
•   Various possible models of usage – sure to evolve
     – The less specific the hardware installed there, the
       easier to change function
     – Disaster Recovery for key services in Meyrin data
       centre becomes a realistic scenario

                 CERN Infrastructure Evolution               7
Infrastructure Tools Evolution

• We had to develop our own toolset in 2002
• Nowadays,
   – CERN compute capacity is no longer leading edge
   – Many options available for open source fabric
     management
   – We need to scale to meet the upcoming capacity
     increase
• If there is a requirement which is not available
  through an open source tool, we should question
  the need
   – If we are the first to need it, contribute it back to the
     open source tool


                CERN Infrastructure Evolution                    8
Where to Look?
• Large community out there taking the “tool chain”
  approach whose scaling needs match ours:
  O(100k) servers and many applications
• Become standard and join this community
Bulk Hardware Deployment
• Adapt processes for minimal physical
  intervention
• New machines register themselves
  automatically into network databases
  and inventory
• Burn-in test uses standard Scientific
  Linux and production monitoring
  tools
• Detailed hardware inventory with
  serial number tracking to support
  accurate CERN Infrastructureof failures
            analysis Evolution            10
Current Configurations




• Many, diverse applications (“clusters”) managed by different teams
• ..and 700+ other “unmanaged” Linux nodes in VMs that could benefit
  from a simple configuration system


                  CERN Infrastructure Evolution                 11
Job Adverts from Indeed.com

                                                    Index of millions
    Puppet                                          of worldwide job
                                                    posts across
                                                    thousands of
                                                    job sites




                                                    These are the
                                                    sort of posts our
     Quattor                                        departing staff
                                                    will be applying
                                                    for.


         Agile Infrastructure - Configuration and
          CERN Infrastructure Evolution                         12
                     Operation Tools
Configuration Management

• Using off the shelf components
  – Puppet – configuration definition
  – Foreman – GUI and Data store
  – Git – version control
  – Mcollective – remote execution
• Integrated with
  – CERN Single Sign On
  – CERN Certificate Authority
  – Installation Server
           CERN Infrastructure Evolution   13
Status




         CERN Infrastructure Evolution   14
Different Service Models
                      • Pets are given names like
                        lsfmaster.cern.ch
                      • They are unique, lovingly hand
                        raised and cared for
                      • When they get ill, you nurse them
                        back to health

                      • Cattle are given numbers like
                        vm0042.cern.ch
                      • They are almost identical to other
                        cattle
                      • When they get ill, you get another
                        one


• Future application architectures tend towards Cattle
• .. But users now need support for both modes of working

                 CERN Infrastructure Evolution               15
Infrastructure as a Service
• Goals
   –   Improve repair processes with virtualisation
   –   More efficient use of our hardware
   –   Better tracking of usage
   –   Enable remote management for new data centre
   –   Support potential new use cases (PaaS, Cloud)
   –   Sustainable support model
• At scale for 2015
   – 15,000 servers
   – 90% of hardware virtualized
   – 300,000 VMs needed

               CERN Agile Infrastructure
                                              16
OpenStack
• Open source cloud software
• Supported by 173 companies including
  IBM, RedHat, Rackspace, HP, Cisco,
  AT&T, …
• Vibrant development community and
  ecosystem
• Infrastructure as a Service to our scale
• Started in 2010 but maturing rapidly




             CERN Agile Infrastructure
                                             17
Openstack @ CERN
                                                         HORIZON
              KEYSTONE




     GLANCE                                       NOVA




                                                                   Scheduler
                                     Compute
Registry      Image




                                        Volume
                                                                    Network



                      CERN Agile Infrastructure
                                                                   18
Scheduling and Accounting
• Multiple uses of IaaS
  – Server consolidation
  – Classic batch (single or multi-core)
  – Cloud VMs such as CERNVM
• Scheduling options
  – Availability zones for disaster recovery
  – Quality of service options to improve efficiency
    such as build machines, public login services
  – Batch system scalability is likely to be an issue
• Accounting
  – Use underlying services of IaaS and
    Hypervisors for reporting and quotas
             CERN Infrastructure Evolution          19
Monitoring in IT

• >30 monitoring applications
   – Number of producers: ~40k
   – Input data volume: ~280 GB per day
• Covering a wide range of different resources
   – Hardware, OS, applications, files, jobs, etc.
• Application-specific monitoring solutions
   – Using different technologies (including commercial
     tools)
   – Sharing similar needs: aggregate metrics, get alarms,
     etc
• Limited sharing of monitoring data
   – Hard to implement complex monitoring queries



                         20
Monitoring Architecture

              Operations                       Dashboards
                Tools                                              Splunk
                                                and APIs



                                               Storage and         Hadoop
                                              Analysis Engine



              Operations                         Storage
              Consumers                         Consumer




                           Messaging Broker                        Apollo




   Producer                   Producer                  Producer
                                                                   Lemon
     Sensor                    Sensor                    Sensor


                   CERN Infrastructure Evolution                     21
Current Tool Snapshot (Liable to Change)

                                                     Puppet
                                                    Foreman
                 mcollective, yum                                   Jenkins

                                                    AIMS/PXE
                                                     Foreman                  JIRA

Openstack Nova



                                                                                 git, SVN



                                                                              Koji, Mock
                                                     Yum repo
                                                       Pulp




     Hardware                                                   Lemon
     database
                                    Puppet stored
                                      config DB
Timelines
Year   What                Actions
2012                       Prepare formal project plan
                           Establish IaaS in CERN Data Centre
                           Monitoring Implementation as per WG
                           Migrate lxcloud users
                           Early adopters to use new tools


2013   LS 1                Extend IaaS to remote Data Centre
       New Data Centre     Business Continuity
                           Migrate CVI users
                           General migration to new tools with SLC6
                           and Windows 8


2014   LS 1 (to            Phase out legacy tools such as Quattor
       November)




            CERN Infrastructure Evolution                           23
Conclusions
• New data centre provides opportunity to
  expand the Tier-0 capacity
• New tools are required to manage these
  systems and improve processes given
  fixed staff levels
• Integrated monitoring is key to identify
  inefficiencies, bottlenecks and usage
• Moving to an open source tool chain has
  allowed a rapid proof of concept and will
  be a more sustainable solution


            CERN Infrastructure Evolution     24
Background Information

• HEPiX Agile Talks
  – http://cern.ch/go/99Ck
• Tier-0 Upgrade
  – http://cern.ch/go/NN98

Other information, get in touch
         Tim.Bell@cern.ch
         CERN Infrastructure Evolution   25
Backup Slides




CERN Infrastructure Evolution   26
CERN Data Centre Numbers

Systems                      7,899 Hard disks               62,023
Processors                 14,972 Raw disk capacity (TiB)   62,660
Cores                      64,623 Tape capacity (PiB)             47
Memory (TiB)                   165 Ethernet 1Gb ports       16,773
Racks                        1,070 Ethernet 10Gb ports        622
Power Consumption (KiW)      2,345




From http://sls.cern.ch/sls/service.php?id=CCBYNUM

               CERN Infrastructure Evolution                 27
Computer Centre Capacity




         CERN Infrastructure Evolution   28
Capacity to end of LEP




          CERN Infrastructure Evolution   29
Foreman Dashboard




Agile Infrastructure -
Configuration and Operation
Tools
Industry Context - Cloud

•   Cloud computing models are now standardising
     –   Facilities as a Service – such as Equinix, Safehost
     –   Infrastructure as a Service - Amazon EC2, CVI or lxcloud
     –   Platform as a Service - Microsoft Azure or CERN Web Services
     –   Software as a Service – Salesforce, Google Mail, Service-Now, Indico
•   Different customers want access to different layers
     – Both our users and the IT Service Managers




                                          Applications
                                     Platform
                              Infrastructure
                             Facilities


                     CERN Infrastructure Evolution                         31
SLC 4 to SLC 5 migration




          CERN Infrastructure Evolution   32

Weitere ähnliche Inhalte

Was ist angesagt?

Open solaris customer presentation
Open solaris customer presentationOpen solaris customer presentation
Open solaris customer presentationxKinAnx
 
The Evolving Data Center Network: Open and Software-Defined
The Evolving Data Center Network: Open and Software-DefinedThe Evolving Data Center Network: Open and Software-Defined
The Evolving Data Center Network: Open and Software-DefinedDell World
 
Novedades Oow Boot
Novedades Oow BootNovedades Oow Boot
Novedades Oow BootFran Navarro
 
Enterprise-Grade Networking in OpenStack
Enterprise-Grade Networking in OpenStackEnterprise-Grade Networking in OpenStack
Enterprise-Grade Networking in OpenStackMarten Hauville
 
Accelerate to the Cloud
Accelerate to the CloudAccelerate to the Cloud
Accelerate to the CloudNovell
 
Silver peak acceleration, agility and velocity
Silver peak   acceleration, agility and velocitySilver peak   acceleration, agility and velocity
Silver peak acceleration, agility and velocityresponsedatacomms
 
Oracle Database Appliance RAC in a box Some Strings Attached
Oracle Database Appliance RAC in a box Some Strings AttachedOracle Database Appliance RAC in a box Some Strings Attached
Oracle Database Appliance RAC in a box Some Strings AttachedFuad Arshad
 
Securing Your Cloud Applications with Novell Cloud Security Service
Securing Your Cloud Applications with Novell Cloud Security ServiceSecuring Your Cloud Applications with Novell Cloud Security Service
Securing Your Cloud Applications with Novell Cloud Security ServiceNovell
 
Realizing the Promise of the Cloud
Realizing the Promise of the CloudRealizing the Promise of the Cloud
Realizing the Promise of the CloudNovell
 
2020 Vision For Your Network
2020 Vision For Your Network2020 Vision For Your Network
2020 Vision For Your NetworkDell World
 
Big Data 2107 for Ribbon
Big Data 2107 for RibbonBig Data 2107 for Ribbon
Big Data 2107 for RibbonSamuel Dratwa
 
Commercial track 2_UDP Solution Selling Made Simple
Commercial track 2_UDP Solution Selling Made SimpleCommercial track 2_UDP Solution Selling Made Simple
Commercial track 2_UDP Solution Selling Made Simplearcserve data protection
 
Introduction to Distributed Computing & Distributed Databases
Introduction to Distributed Computing & Distributed DatabasesIntroduction to Distributed Computing & Distributed Databases
Introduction to Distributed Computing & Distributed DatabasesShankar Iyer
 
The Software Based Data Center. Is It For You?
The Software Based Data Center. Is It For You?The Software Based Data Center. Is It For You?
The Software Based Data Center. Is It For You?Dell World
 
Communication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/SubscribeCommunication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/SubscribeSumant Tambe
 

Was ist angesagt? (19)

Open solaris customer presentation
Open solaris customer presentationOpen solaris customer presentation
Open solaris customer presentation
 
The Evolving Data Center Network: Open and Software-Defined
The Evolving Data Center Network: Open and Software-DefinedThe Evolving Data Center Network: Open and Software-Defined
The Evolving Data Center Network: Open and Software-Defined
 
Exalogic Bcn
Exalogic BcnExalogic Bcn
Exalogic Bcn
 
Novedades Oow Boot
Novedades Oow BootNovedades Oow Boot
Novedades Oow Boot
 
Enterprise-Grade Networking in OpenStack
Enterprise-Grade Networking in OpenStackEnterprise-Grade Networking in OpenStack
Enterprise-Grade Networking in OpenStack
 
Accelerate to the Cloud
Accelerate to the CloudAccelerate to the Cloud
Accelerate to the Cloud
 
Silver peak acceleration, agility and velocity
Silver peak   acceleration, agility and velocitySilver peak   acceleration, agility and velocity
Silver peak acceleration, agility and velocity
 
Oracle Database Appliance RAC in a box Some Strings Attached
Oracle Database Appliance RAC in a box Some Strings AttachedOracle Database Appliance RAC in a box Some Strings Attached
Oracle Database Appliance RAC in a box Some Strings Attached
 
Securing Your Cloud Applications with Novell Cloud Security Service
Securing Your Cloud Applications with Novell Cloud Security ServiceSecuring Your Cloud Applications with Novell Cloud Security Service
Securing Your Cloud Applications with Novell Cloud Security Service
 
Realizing the Promise of the Cloud
Realizing the Promise of the CloudRealizing the Promise of the Cloud
Realizing the Promise of the Cloud
 
2020 Vision For Your Network
2020 Vision For Your Network2020 Vision For Your Network
2020 Vision For Your Network
 
Big Data 2107 for Ribbon
Big Data 2107 for RibbonBig Data 2107 for Ribbon
Big Data 2107 for Ribbon
 
Commercial track 2_UDP Solution Selling Made Simple
Commercial track 2_UDP Solution Selling Made SimpleCommercial track 2_UDP Solution Selling Made Simple
Commercial track 2_UDP Solution Selling Made Simple
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Introduction to Distributed Computing & Distributed Databases
Introduction to Distributed Computing & Distributed DatabasesIntroduction to Distributed Computing & Distributed Databases
Introduction to Distributed Computing & Distributed Databases
 
Cont0519
Cont0519Cont0519
Cont0519
 
The Software Based Data Center. Is It For You?
The Software Based Data Center. Is It For You?The Software Based Data Center. Is It For You?
The Software Based Data Center. Is It For You?
 
Communication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/SubscribeCommunication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/Subscribe
 
Big Data NoSQL 1017
Big Data NoSQL 1017Big Data NoSQL 1017
Big Data NoSQL 1017
 

Andere mochten auch

SoftLayer Presentation at Stifel Nicolaus Event Feb 10 2011 vfinal
SoftLayer Presentation at Stifel Nicolaus Event Feb 10 2011 vfinalSoftLayer Presentation at Stifel Nicolaus Event Feb 10 2011 vfinal
SoftLayer Presentation at Stifel Nicolaus Event Feb 10 2011 vfinalSoftLayer Technologies
 
SoftLayer overview for Telx CBX Jun10 Final
SoftLayer overview for Telx CBX Jun10 FinalSoftLayer overview for Telx CBX Jun10 Final
SoftLayer overview for Telx CBX Jun10 FinalSoftLayer Technologies
 
Shams.khawaja
Shams.khawajaShams.khawaja
Shams.khawajaNASAPMC
 
CERN Agile Infrastructure, Road to Production
CERN Agile Infrastructure, Road to ProductionCERN Agile Infrastructure, Road to Production
CERN Agile Infrastructure, Road to ProductionSteve Traylen
 

Andere mochten auch (6)

SoftLayer Company Overview
SoftLayer Company OverviewSoftLayer Company Overview
SoftLayer Company Overview
 
SoftLayer Presentation at Stifel Nicolaus Event Feb 10 2011 vfinal
SoftLayer Presentation at Stifel Nicolaus Event Feb 10 2011 vfinalSoftLayer Presentation at Stifel Nicolaus Event Feb 10 2011 vfinal
SoftLayer Presentation at Stifel Nicolaus Event Feb 10 2011 vfinal
 
SoftLayer overview for Telx CBX Jun10 Final
SoftLayer overview for Telx CBX Jun10 FinalSoftLayer overview for Telx CBX Jun10 Final
SoftLayer overview for Telx CBX Jun10 Final
 
Soft Layer Q109 Media Kit
Soft Layer Q109 Media KitSoft Layer Q109 Media Kit
Soft Layer Q109 Media Kit
 
Shams.khawaja
Shams.khawajaShams.khawaja
Shams.khawaja
 
CERN Agile Infrastructure, Road to Production
CERN Agile Infrastructure, Road to ProductionCERN Agile Infrastructure, Road to Production
CERN Agile Infrastructure, Road to Production
 

Ähnlich wie 20120524 cern data centre evolution v2

OSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best PracticesOSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best PracticesMatt Ray
 
Oracleonoracle dec112012
Oracleonoracle dec112012Oracleonoracle dec112012
Oracleonoracle dec112012patmisasi
 
Operating the Hyperscale Cloud
Operating the Hyperscale CloudOperating the Hyperscale Cloud
Operating the Hyperscale CloudOpen Stack
 
Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Eshed Gal-Or
 
(ATS4-PLAT06) Considerations for sizing and deployment
(ATS4-PLAT06) Considerations for sizing and deployment(ATS4-PLAT06) Considerations for sizing and deployment
(ATS4-PLAT06) Considerations for sizing and deploymentBIOVIA
 
Veloxum corporate introduction for crowdfunder may 29 2012
Veloxum corporate introduction for crowdfunder may 29 2012Veloxum corporate introduction for crowdfunder may 29 2012
Veloxum corporate introduction for crowdfunder may 29 2012Veloxum Corporation
 
OpenStack Boston User Group, OpenStack overview
OpenStack Boston User Group, OpenStack overviewOpenStack Boston User Group, OpenStack overview
OpenStack Boston User Group, OpenStack overviewOpen Stack
 
EMEA OpenStack Day Intro, July 13th 2011 in London
EMEA OpenStack Day Intro, July 13th 2011 in LondonEMEA OpenStack Day Intro, July 13th 2011 in London
EMEA OpenStack Day Intro, July 13th 2011 in LondonMark Collier
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebula Project
 
Serverless spark
Serverless sparkServerless spark
Serverless sparkMamathaBusi
 
To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…Sergey Dzyuban
 
CMPE 297 Lecture: Building Infrastructure Clouds with OpenStack
CMPE 297 Lecture: Building Infrastructure Clouds with OpenStackCMPE 297 Lecture: Building Infrastructure Clouds with OpenStack
CMPE 297 Lecture: Building Infrastructure Clouds with OpenStackJoe Arnold
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user storylaurabeckcahoon
 
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry intro
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry introEMEA OpenStack Day, July 13th 2011 in London - Jim Curry intro
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry introOpen Stack
 
LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...
LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...
LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...The Linux Foundation
 
CERN Data Centre Evolution
CERN Data Centre EvolutionCERN Data Centre Evolution
CERN Data Centre EvolutionGavin McCance
 
Common Sense Performance Indicators in the Cloud
Common Sense Performance Indicators in the CloudCommon Sense Performance Indicators in the Cloud
Common Sense Performance Indicators in the CloudNick Gerner
 
Enterprise Virtualization with Xen
Enterprise Virtualization with XenEnterprise Virtualization with Xen
Enterprise Virtualization with XenFrank Martin
 
Implementing Private Database Clouds
Implementing Private Database CloudsImplementing Private Database Clouds
Implementing Private Database CloudsRoland Slee
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 

Ähnlich wie 20120524 cern data centre evolution v2 (20)

OSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best PracticesOSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best Practices
 
Oracleonoracle dec112012
Oracleonoracle dec112012Oracleonoracle dec112012
Oracleonoracle dec112012
 
Operating the Hyperscale Cloud
Operating the Hyperscale CloudOperating the Hyperscale Cloud
Operating the Hyperscale Cloud
 
Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?
 
(ATS4-PLAT06) Considerations for sizing and deployment
(ATS4-PLAT06) Considerations for sizing and deployment(ATS4-PLAT06) Considerations for sizing and deployment
(ATS4-PLAT06) Considerations for sizing and deployment
 
Veloxum corporate introduction for crowdfunder may 29 2012
Veloxum corporate introduction for crowdfunder may 29 2012Veloxum corporate introduction for crowdfunder may 29 2012
Veloxum corporate introduction for crowdfunder may 29 2012
 
OpenStack Boston User Group, OpenStack overview
OpenStack Boston User Group, OpenStack overviewOpenStack Boston User Group, OpenStack overview
OpenStack Boston User Group, OpenStack overview
 
EMEA OpenStack Day Intro, July 13th 2011 in London
EMEA OpenStack Day Intro, July 13th 2011 in LondonEMEA OpenStack Day Intro, July 13th 2011 in London
EMEA OpenStack Day Intro, July 13th 2011 in London
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
 
Serverless spark
Serverless sparkServerless spark
Serverless spark
 
To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…
 
CMPE 297 Lecture: Building Infrastructure Clouds with OpenStack
CMPE 297 Lecture: Building Infrastructure Clouds with OpenStackCMPE 297 Lecture: Building Infrastructure Clouds with OpenStack
CMPE 297 Lecture: Building Infrastructure Clouds with OpenStack
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
 
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry intro
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry introEMEA OpenStack Day, July 13th 2011 in London - Jim Curry intro
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry intro
 
LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...
LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...
LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...
 
CERN Data Centre Evolution
CERN Data Centre EvolutionCERN Data Centre Evolution
CERN Data Centre Evolution
 
Common Sense Performance Indicators in the Cloud
Common Sense Performance Indicators in the CloudCommon Sense Performance Indicators in the Cloud
Common Sense Performance Indicators in the Cloud
 
Enterprise Virtualization with Xen
Enterprise Virtualization with XenEnterprise Virtualization with Xen
Enterprise Virtualization with Xen
 
Implementing Private Database Clouds
Implementing Private Database CloudsImplementing Private Database Clouds
Implementing Private Database Clouds
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 

Mehr von Tim Bell

CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring Tim Bell
 
CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019Tim Bell
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3Tim Bell
 
20190314 cern register v3
20190314 cern register v320190314 cern register v3
20190314 cern register v3Tim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3Tim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3Tim Bell
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveTim Bell
 
20170926 cern cloud v4
20170926 cern cloud v420170926 cern cloud v4
20170926 cern cloud v4Tim Bell
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicTim Bell
 
20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN BarcelonaTim Bell
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1Tim Bell
 
OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?Tim Bell
 
20141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v320141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v3Tim Bell
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014Tim Bell
 
20140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v320140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v3Tim Bell
 
Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Tim Bell
 
CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013Tim Bell
 
20130529 openstack cee_day_v6
20130529 openstack cee_day_v620130529 openstack cee_day_v6
20130529 openstack cee_day_v6Tim Bell
 
Academic cloud experiences cern v4
Academic cloud experiences cern v4Academic cloud experiences cern v4
Academic cloud experiences cern v4Tim Bell
 
Ceilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitCeilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitTim Bell
 

Mehr von Tim Bell (20)

CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring
 
CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3
 
20190314 cern register v3
20190314 cern register v320190314 cern register v3
20190314 cern register v3
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspective
 
20170926 cern cloud v4
20170926 cern cloud v420170926 cern cloud v4
20170926 cern cloud v4
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
 
20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1
 
OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?
 
20141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v320141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v3
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014
 
20140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v320140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v3
 
Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4
 
CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013
 
20130529 openstack cee_day_v6
20130529 openstack cee_day_v620130529 openstack cee_day_v6
20130529 openstack cee_day_v6
 
Academic cloud experiences cern v4
Academic cloud experiences cern v4Academic cloud experiences cern v4
Academic cloud experiences cern v4
 
Ceilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitCeilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summit
 

Kürzlich hochgeladen

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Kürzlich hochgeladen (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

20120524 cern data centre evolution v2

  • 1. CERN Infrastructure Evolution Tim Bell CHEP CERN IT Department 24th May 2012 CH-1211 Genève 23 Switzerland www.cern.ch/it
  • 2. Agenda • Problems to address • Ongoing projects – Data centre expansion – Configuration management – Infrastructure as a Service – Monitoring • Timelines • Summary CERN Infrastructure Evolution 2
  • 3. Problems to address • CERN data centre is reaching its limits • IT staff numbers remain fixed but more computing capacity is needed • Tools are high maintenance and becoming increasingly brittle • Inefficiencies exist but root cause cannot be easily identified CERN Infrastructure Evolution 3
  • 4. Data centre extension history • Studies in 2008 into building a new computer centre on the CERN Prevessin site – Too costly • In 2011, tender run across CERN member states for remote hosting – 16 bids • In March 2012, Wigner Institute in Budapest, Hungary selected CERN Infrastructure Evolution 4
  • 5. Data Centre Selection • Wigner Institute in Budapest, Hungary CERN Infrastructure Evolution 5
  • 6. Data Centre Layout & Ramp-up CERN Infrastructure Evolution 6
  • 7. Data Centre Plans • Timescales – Prototyping in 2012 – Testing in 2013 – Production in 2014 • This will be a “hands-off” facility for CERN – They manage the infrastructure and hands-on work – We do everything else remotely • Need to be reasonably dynamic in our usage – Will pay power bill • Various possible models of usage – sure to evolve – The less specific the hardware installed there, the easier to change function – Disaster Recovery for key services in Meyrin data centre becomes a realistic scenario CERN Infrastructure Evolution 7
  • 8. Infrastructure Tools Evolution • We had to develop our own toolset in 2002 • Nowadays, – CERN compute capacity is no longer leading edge – Many options available for open source fabric management – We need to scale to meet the upcoming capacity increase • If there is a requirement which is not available through an open source tool, we should question the need – If we are the first to need it, contribute it back to the open source tool CERN Infrastructure Evolution 8
  • 9. Where to Look? • Large community out there taking the “tool chain” approach whose scaling needs match ours: O(100k) servers and many applications • Become standard and join this community
  • 10. Bulk Hardware Deployment • Adapt processes for minimal physical intervention • New machines register themselves automatically into network databases and inventory • Burn-in test uses standard Scientific Linux and production monitoring tools • Detailed hardware inventory with serial number tracking to support accurate CERN Infrastructureof failures analysis Evolution 10
  • 11. Current Configurations • Many, diverse applications (“clusters”) managed by different teams • ..and 700+ other “unmanaged” Linux nodes in VMs that could benefit from a simple configuration system CERN Infrastructure Evolution 11
  • 12. Job Adverts from Indeed.com Index of millions Puppet of worldwide job posts across thousands of job sites These are the sort of posts our Quattor departing staff will be applying for. Agile Infrastructure - Configuration and CERN Infrastructure Evolution 12 Operation Tools
  • 13. Configuration Management • Using off the shelf components – Puppet – configuration definition – Foreman – GUI and Data store – Git – version control – Mcollective – remote execution • Integrated with – CERN Single Sign On – CERN Certificate Authority – Installation Server CERN Infrastructure Evolution 13
  • 14. Status CERN Infrastructure Evolution 14
  • 15. Different Service Models • Pets are given names like lsfmaster.cern.ch • They are unique, lovingly hand raised and cared for • When they get ill, you nurse them back to health • Cattle are given numbers like vm0042.cern.ch • They are almost identical to other cattle • When they get ill, you get another one • Future application architectures tend towards Cattle • .. But users now need support for both modes of working CERN Infrastructure Evolution 15
  • 16. Infrastructure as a Service • Goals – Improve repair processes with virtualisation – More efficient use of our hardware – Better tracking of usage – Enable remote management for new data centre – Support potential new use cases (PaaS, Cloud) – Sustainable support model • At scale for 2015 – 15,000 servers – 90% of hardware virtualized – 300,000 VMs needed CERN Agile Infrastructure 16
  • 17. OpenStack • Open source cloud software • Supported by 173 companies including IBM, RedHat, Rackspace, HP, Cisco, AT&T, … • Vibrant development community and ecosystem • Infrastructure as a Service to our scale • Started in 2010 but maturing rapidly CERN Agile Infrastructure 17
  • 18. Openstack @ CERN HORIZON KEYSTONE GLANCE NOVA Scheduler Compute Registry Image Volume Network CERN Agile Infrastructure 18
  • 19. Scheduling and Accounting • Multiple uses of IaaS – Server consolidation – Classic batch (single or multi-core) – Cloud VMs such as CERNVM • Scheduling options – Availability zones for disaster recovery – Quality of service options to improve efficiency such as build machines, public login services – Batch system scalability is likely to be an issue • Accounting – Use underlying services of IaaS and Hypervisors for reporting and quotas CERN Infrastructure Evolution 19
  • 20. Monitoring in IT • >30 monitoring applications – Number of producers: ~40k – Input data volume: ~280 GB per day • Covering a wide range of different resources – Hardware, OS, applications, files, jobs, etc. • Application-specific monitoring solutions – Using different technologies (including commercial tools) – Sharing similar needs: aggregate metrics, get alarms, etc • Limited sharing of monitoring data – Hard to implement complex monitoring queries 20
  • 21. Monitoring Architecture Operations Dashboards Tools Splunk and APIs Storage and Hadoop Analysis Engine Operations Storage Consumers Consumer Messaging Broker Apollo Producer Producer Producer Lemon Sensor Sensor Sensor CERN Infrastructure Evolution 21
  • 22. Current Tool Snapshot (Liable to Change) Puppet Foreman mcollective, yum Jenkins AIMS/PXE Foreman JIRA Openstack Nova git, SVN Koji, Mock Yum repo Pulp Hardware Lemon database Puppet stored config DB
  • 23. Timelines Year What Actions 2012 Prepare formal project plan Establish IaaS in CERN Data Centre Monitoring Implementation as per WG Migrate lxcloud users Early adopters to use new tools 2013 LS 1 Extend IaaS to remote Data Centre New Data Centre Business Continuity Migrate CVI users General migration to new tools with SLC6 and Windows 8 2014 LS 1 (to Phase out legacy tools such as Quattor November) CERN Infrastructure Evolution 23
  • 24. Conclusions • New data centre provides opportunity to expand the Tier-0 capacity • New tools are required to manage these systems and improve processes given fixed staff levels • Integrated monitoring is key to identify inefficiencies, bottlenecks and usage • Moving to an open source tool chain has allowed a rapid proof of concept and will be a more sustainable solution CERN Infrastructure Evolution 24
  • 25. Background Information • HEPiX Agile Talks – http://cern.ch/go/99Ck • Tier-0 Upgrade – http://cern.ch/go/NN98 Other information, get in touch Tim.Bell@cern.ch CERN Infrastructure Evolution 25
  • 27. CERN Data Centre Numbers Systems 7,899 Hard disks 62,023 Processors 14,972 Raw disk capacity (TiB) 62,660 Cores 64,623 Tape capacity (PiB) 47 Memory (TiB) 165 Ethernet 1Gb ports 16,773 Racks 1,070 Ethernet 10Gb ports 622 Power Consumption (KiW) 2,345 From http://sls.cern.ch/sls/service.php?id=CCBYNUM CERN Infrastructure Evolution 27
  • 28. Computer Centre Capacity CERN Infrastructure Evolution 28
  • 29. Capacity to end of LEP CERN Infrastructure Evolution 29
  • 30. Foreman Dashboard Agile Infrastructure - Configuration and Operation Tools
  • 31. Industry Context - Cloud • Cloud computing models are now standardising – Facilities as a Service – such as Equinix, Safehost – Infrastructure as a Service - Amazon EC2, CVI or lxcloud – Platform as a Service - Microsoft Azure or CERN Web Services – Software as a Service – Salesforce, Google Mail, Service-Now, Indico • Different customers want access to different layers – Both our users and the IT Service Managers Applications Platform Infrastructure Facilities CERN Infrastructure Evolution 31
  • 32. SLC 4 to SLC 5 migration CERN Infrastructure Evolution 32

Hinweis der Redaktion

  1. The CERN computing infrastructure has a number of immediate problems to be solved. Specifically, how do we continue providing the capacity for the LHC experiments. The CERN computer centre was built in the 1970s for a Cray and a Mainframe and has been gradually adapted to meet the needs of modern commodity computing. However, we have reached the limits and can no longer adapt due to limits on cooling and electricity. At the same time, the tool set that we wrote during the past 10 years to manage the 8,000 servers is getting old. There were no alternatives when we started and so a hand-crafted set of tools such as Quattor, Sindes, SMS and Lemon have been developed. These are now requiring more work to maintain than resources are available. Equally, we know that we are not getting the most out of our installed capacity but identifying the root causes behind inefficiencies is difficult with the vast amounts of data to work through and correlate.
  2. Deployment is staged, starting small in 2013 and growing through 2017 in gradual steps. This allows us to debug processes and facilities early in 2013.
  3. CERN’s computing capacity is no longer at a leading edge. Companies like Rackspace have over 80,000 servers and Google is rumoured to have 500,000 worldwide. Upcoming evolution like the new datacentre, IPv6, scalability pressures, even new operating system versions require substantial effort to support on our own. We need to share the effort with others and benefit from their work. We started a project to make the CERN tools more agile… with the basic principles of We are not unique, there are solutions out there If we find our requirements are not met, question the requirement (and question it again) If we need to develop something, find the nearest tool and submit a patch rather than write from scratch
  4. Followed a Google toolchain approach of small tools glued together… aim is that we try quickly and change if we find limitations. Tool selection is based on maintenance, community support, packaging and documentation quality.
  5. We arrange bulk deliveries of 1000s of servers a year and these currently can take weeks to install and get into production. The large web companies do this automatically using network boot and automatic registering. We will do the same so that the only necessary steps for installation are to boot the machines on a switch which is declared as being an installation target. The machines then register themselves into the network and inventory tools. This saves manual effort on entry of MAC addresses. After this, the burn in tests are started to stress the machines to the full. This finds both near failing parts along with major issues such as firmware problems. The end results are automatically analysed and failures reported to the vendor. All serial numbers and firmware versions are tracked so that failure analysis is easier and statistics on exchanges can be maintained over the life of machines.
  6. When it comes to configuring the boxes after they have been tested, we have a large variety of configurations. There are 100s of different configurations in the computer centre with only a few such as the batch farm being large. Many of the smaller ones have eless than 10 boxes where the effort is high to configure them for newly trained service managers. Over 700 boxes in the computer centre are not under configuration management control too….
  7. At the same time, the tools we use are not commonly used elsewhere. New arrivals do not learn them at university and require individual training (as no formal training courses are available). By adopting a tool commonly used elsewhere, our staff can learn from books and courses and if they leave after a few years at CERN, they have directly re-usable skills for their CVs.
  8. So, we chose a set of software and integrated it with some standard CERN services which are stable and scalable. Puppet is key for us as there are many off the shelf components, from NTP through to web services, which can be re-used. We chose Git although the default source code version control system at CERN is svn. This is because the toolset we are investigating come with out of the box support for git. Thus, we should do what the others do…
  9. Currently, managing 160 boxes on the pilot platform.
  10. Managing 15,000 servers is not simple… there are pets which need looking after and cattle which are disponsable. The aim is to create the infrastructure to support both pets and cattle but manage the hardware like cattle so that hardware management staff can work without needing to keep on scheduling interventions.
  11. Our objectives for the coming years are having 90% of the hardware resources on the DCs virtualized. In this scenario w e will have around 300,000 Virtual Machines. CERN IT department currently has two small projects covering this (LxCloud and CVI). The main objective is building a common solution combining these two projects in terms of scalability and efficiency. We need to support multiple centres, scalability to 300K VMs and assist in reducing the running effort via live migration and platform as a service.
  12. So after some research in IaaS solutions, we found this solution (Openstack). It provides a set of tools that let us build our private cloud. This solution is multi-site, scalable and uses standard cloud interfaces. The clients could build their applications using these cloud interfaces. This will position our cloud solution as another cloud provider.
  13. Openstack has different components like Nova, Glance, Keystone, Horizon and Swift. This is the architecture of the different services and how the interact with each other. There are also the internal components of Glance and Nova. If you have a look on Nova, they are two key components that are the Queue and the Database. Nova is built on a shared-nothing, messaging-based architecture. All the major nova components can be run on multiple servers. This means that most component to component communication must go via message queue. Our implementation at CERN uses these main components.
  14. So, final tool summary …
  15. Early adopters such as dev boxes, non-quattorised boxes, lxplus