2. Keeping up with big data science
Experiences and outlook for the CERN LHC computing
14/03/2019 Tim Bell 2
Tim Bell
CERN IT
@noggin143
Register Lectures
14th March 2019
3. About Tim
• Responsible for
Compute and
Monitoring in
CERN IT
department
• Previously worked
for IBM and
Deutsche Bank
14/03/2019 Tim Bell 3
4. The Mission of CERN
Push back the frontiers of knowledge
E.g. the secrets of the Big Bang …what was the matter like
within the first moments of the Universe’s existence?
Develop new technologies for
accelerators and detectors
Information technology - the Web and the GRID
Medicine - diagnosis and therapy
Train scientists and engineers of
tomorrow
Unite people from different countries
and cultures
14/03/2019 Tim Bell 4
5. CERN: founded in 1954 by 12 European States
“Science for Peace”
Today: 22 Member States
Member States: Austria, Belgium, Bulgaria, Czech Republic, Denmark, Finland,
France, Germany, Greece, Hungary, Israel, Italy, Netherlands, Norway, Poland,
Portugal, Romania, Slovak Republic, Spain, Sweden, Switzerland and
United Kingdom
Associate Members in the Pre-Stage to Membership: Cyprus, Serbia, Slovenia
Associate Member States: India, Lithuania, Pakistan, Turkey, Ukraine
Applications for Membership or Associate Membership:
Brazil, Croatia, Estonia
Observers to Council: Japan, Russia, United States of America;
European Union, JINR and UNESCO
~ 2600 staff
~ 1800 other paid personnel
~ 13000 scientific users
Budget (2018) ~ 1150 MCHF
6. Science is getting more and more global
CERN: 235 staff, 55 fellows, 7 doctoral + 3 technical students
10. Discovery 2012, Nobel Prize in Physics 2013
The Nobel Prize in Physics 2013 was awarded jointly to François Englert
and Peter W. Higgs "for the theoretical discovery of a mechanism that
contributes to our understanding of the origin of mass of subatomic
particles, and which recently was confirmed through the discovery of the
predicted fundamental particle, by the ATLAS and CMS experiments at
CERN's Large Hadron Collider”. 10
11. 12th March 1989, 30 years ago
“Vague but interesting”
Or Archie? Gopher?
14/03/2019 Tim Bell 11
https://web30.web.cern.ch/
https://www.youtube.com/watch?v=A1L2xODZSI4
12. Medical Application as an Example of Particle Physics Spin-off
Accelerating particle beams
~30’000 accelerators worldwide
~17’000 used for medicine
Hadron Therapy
• Leadership in ion beam therapy now in Europe and Japan
• Protons / light ions directed at the tumour target (figure: X-rays vs. protons)
• >100’000 patients treated worldwide (45 facilities)
• >50’000 patients treated in Europe (14 facilities)
Detecting particles
• Imaging: PET scanner
• Clinical trial in Portugal, France and Italy for new breast imaging system (ClearPEM)
14/03/2019 Tim Bell 12
13. Data Analysis at the LHC
The process to transform raw data into useful physics datasets
• This is a complicated series of steps at the LHC (Run2)
The chain runs from the detector through the high-level trigger (HLT), reconstruction, reprocessing and organized analysis to the final selection:
• Data volume: from detector ~1 PB/s; after hardware trigger ~TB/s; selected RAW ~1 GB/s; derived data ~2 GB/s; analysis selection ~100 MB/s
• Processing: roughly 50k, 80k, 20k and 40k cores across the stages
• People: DAQ and trigger (less than 200), operations (less than 100), analysis users (more than 1000)
14/03/2019 Tim Bell 13
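To make the reduction along this chain concrete, a small illustrative calculation (Python) using only the approximate rates quoted on this slide:

# Approximate data-rate reduction through the Run 2 online/offline chain.
# The rates are the ones quoted on the slide; this is illustrative arithmetic only.
rates_bytes_per_s = {
    "from detector": 1e15,            # ~1 PB/s
    "after hardware trigger": 1e12,   # ~TB/s
    "selected RAW (after HLT)": 1e9,  # ~1 GB/s
    "derived data": 2e9,              # ~2 GB/s
    "analysis selection": 100e6,      # ~100 MB/s
}
previous = None
for stage, rate in rates_bytes_per_s.items():
    if previous is None:
        print(f"{stage}: {rate:.0e} B/s")
    else:
        factor = max(previous / rate, rate / previous)
        direction = "reduction" if previous >= rate else "increase"
        print(f"{stage}: {rate:.0e} B/s ({factor:.0f}x {direction})")
    previous = rate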
14. The Worldwide LHC Computing Grid
WLCG: an international collaboration to distribute and analyse LHC data
Integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists
• Tier-0 (CERN and Hungary): data recording, reconstruction and distribution
• Tier-1: permanent storage, re-processing, analysis
• Tier-2: simulation, end-user analysis
~170 sites, 42 countries; ~1M CPU cores; ~1 EB of storage
> 2 million jobs/day; 10-100 Gb links
14/03/2019 Tim Bell 14
15. Big Science – Big Data
40 million pictures per second in one experiment, of which about 1000 recorded
Worldwide LHC Computing Grid – 800 PB of storage
>170 sites in 42 countries
14/03/2019 Tim Bell 15
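The selectivity of the trigger chain implied by those two numbers is easy to quantify (illustrative arithmetic only):

# ~40 million bunch crossings per second, of which about 1000 are recorded.
collisions_per_s = 40_000_000
recorded_per_s = 1_000
print(f"1 event kept out of every {collisions_per_s // recorded_per_s:,}")
print(f"fraction kept: {recorded_per_s / collisions_per_s:.6%}")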
16. 2018 was quite a year - Storage
2018: 88 PB written to tape (inc. parked b-physics data)
• ATLAS: 24.7 PB
• CMS: 43.6 PB
• LHCb: 7.3 PB
• ALICE: 12.4 PB
[Plots: data transfers during the year, including the heavy-ion run]
CERN Tape Store: 330 PB archived
Source: Ian Bird, LHCC, 26 Feb 2019
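As a quick consistency check, the per-experiment volumes quoted above do add up to the 88 PB total:

# Per-experiment tape volumes for 2018, in PB, as quoted on the slide.
recorded_pb = {"ATLAS": 24.7, "CMS": 43.6, "LHCb": 7.3, "ALICE": 12.4}
print(f"Sum over experiments: {sum(recorded_pb.values()):.1f} PB")  # -> 88.0 PB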
18. Worldwide networking
LHCONE L3VPN: a global infrastructure for High Energy Physics data analysis (LHC, Belle II, Pierre Auger Observatory, NOvA, XENON)
[Map: LHCONE connects the LHC and Belle II Tier 0/1/2/3 sites through national and regional research networks and exchange points worldwide, over links of 1/10, 20/30/40 and 100 Gb/s]
Notes: 1) LHCOPN paths are not shown on this diagram. 2) The “LHCONE peerings” at the exchange points indicate who has a presence there and not that all peer with each other (see https://twiki.cern.ch/twiki/bin/view/LHCONE/LhcOneVRF)
See http://lhcone.net for more detail. Ver. 4.2, May 29, 2018 – W. E. Johnston, ESnet, wej@es.net
19. LHC Schedule
Run 3: ALICE and LHCb upgrades
Run 4: ATLAS and CMS upgrades
14/03/2019 Tim Bell 19
20. CERN Infrastructure Transitions
• Pre-LHC (-2009)
• Mainframes (80s) to Unix (90s) to Linux (00s)
• EU funded developments such as Quattor and
Lemon
• Long Shutdown 1 (2013-2015)
• Move to open source cloud based infrastructure
• Community tools such as OpenStack, Puppet and
Grafana
• Long Shutdown 2 (2019-2021) ?
• Add Containerisation with Kubernetes and Terraform
14/03/2019 Tim Bell 20
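As a flavour of what the open-source, cloud-based tooling looks like in practice, a minimal sketch using the OpenStack SDK for Python; the cloud, image and flavor names are placeholders, not CERN's actual configuration:

# Minimal OpenStack SDK sketch: list servers in a project and boot a new VM.
# "mycloud", "CC7" and "m1.small" are placeholder names for illustration;
# network and keypair settings are omitted for brevity.
import openstack

conn = openstack.connect(cloud="mycloud")   # credentials come from clouds.yaml

for server in conn.compute.servers():
    print(server.name, server.status)

image = conn.compute.find_image("CC7")
flavor = conn.compute.find_flavor("m1.small")
new_server = conn.compute.create_server(name="demo-vm", image_id=image.id, flavor_id=flavor.id)
new_server = conn.compute.wait_for_server(new_server)
print("booted", new_server.name)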
22. Open Source Communities
• Good cultural fit with CERN
• Meritocracy
• Sharing with other labs
• Giving back to society
• Matches staffing models
• Contract lengths
• Attracts skills
• Peer recognition
• Career opportunities
• Need to support growth
• Contributions back
• Scale testing
• Dojos e.g. OpenStack, CentOS, Ceph
• Evangelise
• Input for Governance
14/03/2019 Tim Bell 22
24. CERN Open Data Portal
14/03/2019 Tim Bell 24
Publicly-accessible site for curated releases of CERN data sets and software
http://opendata.cern.ch
Content: LHC data and more
Releases: 2016 – CMS, 300 TB; 2017 – CMS, ~1 PB
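A minimal sketch of querying the portal programmatically, assuming its Invenio-style JSON records API (an assumption; the query string and record fields are illustrative):

# Search the CERN Open Data portal for records and print their titles.
import requests

resp = requests.get(
    "http://opendata.cern.ch/api/records/",
    params={"q": "CMS", "size": 5},
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json().get("hits", {}).get("hits", []):
    print(hit.get("metadata", {}).get("title"))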
25. LHC Schedule
Run 3: ALICE and LHCb upgrades
Run 4: ATLAS and CMS upgrades
14/03/2019 Tim Bell 25
26. Events at HL-LHC
• Increased complexity due to much higher pile-up and higher trigger rates will bring several challenges to reconstruction algorithms
• In 2017, CMS had to cope with monster pile-up: the 8b4e bunch structure gave a pile-up of ~60 events/crossing (rather than ~20 events/crossing)
CMS: event from 2017 with 78 reconstructed vertices
ATLAS: simulation for HL-LHC with 200 vertices
14/03/2019 Tim Bell 26
27. HL-LHC computing cost parameters
Tim Bell 27
• Parameters – business of the experiments: amount of raw data, thresholds; detector design has long-term computing cost implications
• Core algorithms – business of the experiments: reconstruction and simulation algorithms
• Software performance – performance/architectures/memory etc.; tools to support automated build/validation; collaboration with externals via HSF
• Infrastructure – new grid/cloud models; optimize CPU/disk/network; economies of scale via clouds, joint procurements etc.
28. The HL-LHC computing challenge
• HL-LHC needs for ATLAS and CMS are above the expected
hardware technology evolution (15% to 20%/yr) and funding (flat)
• The main challenge is storage, but computing requirements grow
20-50x
14/03/2019 Tim Bell 28
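A rough compound-growth estimate shows the size of the gap; the 15% to 20%/yr figure is from this slide, while the eight-year horizon (roughly 2018 to the start of HL-LHC) is my assumption:

# Capacity gained from technology evolution at flat cost vs. HL-LHC needs.
years = 8  # assumed horizon, ~2018 to ~2026
for yearly_gain in (0.15, 0.20):
    growth = (1 + yearly_gain) ** years
    print(f"{yearly_gain:.0%}/yr over {years} years -> ~{growth:.1f}x capacity")
print("Needs quoted above: 20-50x, i.e. roughly an order of magnitude short")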
30. Data flow challenges (SKA)
Desert site → SDP at Perth / Cape Town → world users
10-50x data rate reduction by the SDP (Science Data Processor)
31. Yearly data volumes
• Google searches: 98 PB
• LHC – 2016: 50 PB raw data
• LHC science data: ~200 PB
• Facebook uploads: 180 PB
• SKA Phase 1 – 2023: ~300 PB/year science data
• HL-LHC – 2026: ~600 PB raw data, ~1 EB physics data
• SKA Phase 2 – mid-2020’s: ~1 EB science data
• Google Internet archive: ~15 EB
14/03/2019 Tim Bell 31
32. Medical Data Deluge
• “150 EBytes of
medical data in the
US, growing 48%
annually” [1]
• Cost of instruments
and laboratory
equipment
decreasing fast (e.g.
sub-1k$ genomic
sequencers)
• Medical and fitness wearable devices are on the rise, with data production projected to reach 335 PB/month in 2020 [2]
Sources: wearable devices, instruments, images, publications/EHR/notes, clinical trials, simulations
[1] Esteva A. et al., A Guide to Deep Learning in Healthcare, in Nature – Medicine, Vol. 25, Jan 2019, 24-29
[2] https://www.statista.com/statistics/292837/global-wearable-device-mobile-data-traffic/
34. “data lake” Concept
Idea is to localize bulk
data in a cloud service
(Tier 1’s data lake):
minimize replication,
assure availability
Serve data to remote
(or local) compute –
grid, cloud, HPC, ???
Simple caching is all
that is needed at
compute site
Works at national,
regional, global scales
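A minimal cache-aside sketch of the “simple caching at the compute site” idea; the LakeClient interface and the paths are hypothetical stand-ins, not an actual WLCG service:

# Compute sites keep a small local cache and fall back to the central data
# lake only on a miss. Everything here is illustrative.
from pathlib import Path

class LakeClient:
    """Stand-in for whatever protocol the data lake exposes (HTTP, xrootd, ...)."""
    def fetch(self, dataset: str, destination: Path) -> Path:
        raise NotImplementedError("site-specific transfer goes here")

def open_dataset(dataset: str, cache_dir: Path, lake: LakeClient) -> Path:
    """Return a local path for `dataset`, transferring it only on a cache miss."""
    cached = cache_dir / dataset
    if cached.exists():
        return cached                      # hit: serve from the site cache
    cache_dir.mkdir(parents=True, exist_ok=True)
    return lake.fetch(dataset, cached)     # miss: pull once from the lake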
37. Using Supercomputers?
• Can we use supercomputers in the various national laboratories for
LHC computing?
• Large scale super computer resources optimized for tightly coupled computing
are being used for more HEP applications
• Many of the techniques needed to burst jobs to clouds and handle distributed storage are the same ones needed to burst to high scale on centralized HPC resources
• HPC resources have many cores but generally less memory per core
• Applications have been modified to be better suited to HPC
• Smaller memory footprints, more use of parallel algorithms, modifications to I/O
14/03/2019 Tim Bell 37
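One concrete way to shrink per-core memory footprints on many-core HPC nodes, sketched with Python's standard shared-memory support (illustrative only; production HEP frameworks use their own mechanisms):

# Put a large read-only lookup table in shared memory once, so worker
# processes on the same node map one copy instead of each holding their own.
import numpy as np
from multiprocessing import Pool, shared_memory

def worker(args):
    shm_name, shape, dtype, start, stop = args
    shm = shared_memory.SharedMemory(name=shm_name)
    table = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    result = float(table[start:stop].sum())   # stand-in for real per-event work
    shm.close()
    return result

if __name__ == "__main__":
    table = np.random.rand(8_000_000)         # pretend this is a big calibration table
    shm = shared_memory.SharedMemory(create=True, size=table.nbytes)
    np.ndarray(table.shape, dtype=table.dtype, buffer=shm.buf)[:] = table
    chunks = [(shm.name, table.shape, table.dtype, i, i + 2_000_000)
              for i in range(0, 8_000_000, 2_000_000)]
    with Pool(4) as pool:
        print(sum(pool.map(worker, chunks)))
    shm.close()
    shm.unlink()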
39. New methods
14/03/2019 Tim Bell 39
Data acquisition
• Real time event categorization
• Data monitoring &
certification
• Fast inference for trigger
systems
Data Reconstruction
• Calorimeter reconstruction
• Boosted object jet tagging
Data Processing
• Computing resource
optimization
• Predicting data popularity
• Intelligent networking
Data Simulation
• Adversarial networks
• Fast simulation
Data Analysis
• Knowledge base
• Data reduction
• Searches for new physics
• Future detectors will be 3D arrays of sensors with regular geometry
• It would be ideal to quickly reconstruct particles directly from the image (which is what Deep Learning became famous for)
Particle reconstruction as image detection
Deep Learning for Imaging Calorimetry
Vitoria Barin Pacela, Jean-Roch Vlimant, Maurizio Pierini, and Maria Spiropulu (California Institute of Technology and CMS)
Abstract: We investigate particle reconstruction using Deep Learning, based on a dataset consisting of single-particle energy showers in a highly-granular Linear Collider Detector calorimeter with a regular 3D array of cells. We perform energy regression on photons, electrons, neutral and charged pions, and discuss the performance of our model in each particle dataset.
I. INTRODUCTION
One of the greatest challenges at the LHC at CERN is to collect and analyse data efficiently. Sophisticated machine learning methods have been researched to tackle this problem, such as boosted decision trees and deep learning. In this project, we are using deep neural networks (DNN) [1] [2] to recognize images originated by the collisions in the Linear Collider Detector (LCD) calorimeter [3] [4], designed to operate at the Compact Linear Collider (CLIC).
Preliminary studies have explored the possibility of reconstructing particles from calorimetric deposits using image recognition techniques based on convolutional neural networks, using a dataset of simulated hits of individual particles on the LCD surface. The dataset consists of calorimetric showers produced by single particles (pions, electrons or photons) hitting the surface of an electromagnetic calorimeter…
FIG. 1. Visualization of the data. Charged pion event displayed in the ECAL and HCAL. Every hit is shown in its respective cell in each of the calorimeters. Warmer colors (like orange and pink) represent higher energies, as 420 GeV, whereas colder colors, like blue, represent lower energies, as 50 GeV. [5]
II. METHODS
The datasets were simulated as close as pos…
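A minimal sketch of the approach described above: treat the calorimeter cells as a 3D image and regress the particle energy with a small convolutional network. The grid size, layer widths and toy data are illustrative assumptions, not the architecture of the cited study:

# 3D CNN energy regression on toy calorimeter "showers" (PyTorch).
import torch
import torch.nn as nn

class CaloEnergyRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(16, 1)   # single output: predicted energy

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = CaloEnergyRegressor()
showers = torch.rand(4, 1, 16, 16, 16)        # batch of 4 toy showers, 16^3 cells
target_energy = torch.rand(4, 1) * 400        # pretend energies in GeV
loss = nn.MSELoss()(model(showers), target_energy)
loss.backward()
print(f"toy regression loss: {loss.item():.3f}")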
40. CERN openlab
Evaluate state of the art technologies in
collaboration with companies to address CERN’s
extreme computing challenges
14/03/2019 Tim Bell 40
41. High Luminosity LHC until 2035
• Ten times more collisions than
the original design
Studies in progress:
Compact Linear Collider (CLIC)
• Up to 50 km long
• Linear e+e- collider, √s up to 3 TeV
Future Circular Collider (FCC)
• ~100 km circumference
• New technology magnets
• 100 TeV pp collisions in a 100 km ring
• e+e- collider (FCC-ee) as 1st step?
European Strategy for Particle Physics
• Preparing next update in 2020
Future of particle physics ?
14/03/2019 Tim Bell 41
43. Summary
• CERN’s physics program will challenge storage,
networking and compute technology
• Collaborations with industry, open source, open
data and outreach drive CERN’s missions with
significant benefits outside High Energy Physics
and research
Further information at
• http://home.cern
• http://techblog.web.cern.ch
• http://lhcathome.web.cern.ch/
• http://opendata.cern.ch/
14/03/2019 Tim Bell 43
46. ESFRI Science Projects
HL-LHC, SKA, FAIR, CTA, KM3NeT, JIVE-ERIC, ELT, EST, EURO-VO, EGO-VIRGO, (LSST), (CERN, ESO)
Goals:
Prototype an infrastructure for the European
Open Science Cloud that is adapted to the
Exabyte-scale needs of the large ESFRI science
projects.
Ensure that the science communities drive the
development of the EOSC.
Has to address FAIR data management, long term
preservation, open access, open science, and
contribute to the EOSC catalogue of services.
• HL-LHC
• Square Kilometer Array (SKA)
• Facility for Antiproton and Ion Research (FAIR)
• Cubic Kilometre Neutrino Telescope
(KM3NET)
• Cherenkov Telescope Array (CTA)
• Extremely Large Telescope (ELT)
• European Solar Telescope (EST)
• European Gravitational Observatory (EGO)
Editor's notes
Takes weeks and involves a big central operation team and large user community. Data is touched several times and by different sets of teams
90% of compute resources are now allocated on the cloud
ESCAPE (European Science Cluster of Astronomy & Particle physics ESFRI research infrastructures) aims to address the Open Science challenges shared by ESFRI facilities (CTA, ELT, EST, FAIR, HL-LHC, KM3NeT, SKA) as well as other pan-European research infrastructures (CERN, ESO, JIV-ERIC, EGO-Virgo) in astronomy and particle physics research domains.
ESFRI https://www.esfri.eu/
FAIR is at GSI, Darmstadt, Germany
KM3NeT – http://www.km3net.org/ – in the Mediterranean
ELT –