Panel: NRP Science Impacts

Larry Smarr, Institute Director, Calit2 at California Institute for Telecommunications and Information Technology
Accelerating Science Discovery with AI Inference-as-a-Service
High Energy Physics and Gravitational Wave Showcases
Shih-Chieh Hsu
University of Washington
Fourth National Research Platform (4NRP)
Feb 9 2023
San Diego Supercomputer Center
https://a3d3.ai/
OAC-2117997
NSF HDR Institute A3D3
Accelerated Artificial Intelligence Algorithms for Data-Driven Discovery
Our vision is to establish a tightly coupled organization of
domain scientists, computer scientists, and engineers that
unites three core components essential to achieving
real-time AI to transform science and engineering discoveries.
NSF Harnessing the Data Revolution (HDR)
● A3D3 is part of a much bigger ecosystem
● A national-scale activity to enable new
modes of data-driven discovery that will
address fundamental questions at the
frontiers of science and engineering.
● Three parallel tracks (~70 awards, ~$200M)
○ Institutes
■ Ideas Labs+Framework (28)
[NSF 19-543][NSF 19-549] $52.8M
■ Institutes (5) [NSF 21-519] $78.5M
○ TRIPODS
■ Phase I (15) [NSF 19-550] $22.2M
■ Phase II (2) [NSF 21-604] $20M
○ DSC (19)
■ [NSF 19-518] [NSF 21-604] $25.4M
Amy Walton Oct 26
HDR
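The per-track figures above can be checked against the quoted totals (~70 awards, ~$200M). The following is a small sketch that only sums the numbers listed on the slide; the track labels used as dictionary keys are shorthand chosen here, not official program names.

```python
# Award counts and funding (in $M) per HDR track, as listed on the slide.
tracks = {
    "Ideas Labs + Frameworks": (28, 52.8),
    "Institutes": (5, 78.5),
    "TRIPODS Phase I": (15, 22.2),
    "TRIPODS Phase II": (2, 20.0),
    "DSC": (19, 25.4),
}

total_awards = sum(n for n, _ in tracks.values())
total_funding = round(sum(m for _, m in tracks.values()), 1)

print(f"{total_awards} awards, ${total_funding}M")  # 69 awards, $198.9M
```

The sums (69 awards, $198.9M) agree with the rounded totals on the slide.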
A Nationwide Institute
79 Members/10 institutions:
● 17 Senior Personnel
● 3 Research Scientists
● 11 Postdocs
● 27 PhD students
● 3 Master's students
● 12 Undergraduates
● 4 Postbacs (Summer '22)
● 1 High School
$15M for 5 years since 2021
$1.25M supplement to empower
HDR Ecosystem
trainees.html
S. Hauck
S.-C. Hsu
A. Orsborn E. Shlizerman
M. Coughlin P. Harris E. Katsavounidis
S. Han
K. Hanson
M. Neubauer D. Chen
K. Scholberg
J. Duarte
M. Graham
M. Liu M. D. Makin
P. Li
Director Deputy Dir.
co-PI
co-PI
co-PI
Trends in big data volume
• Next-generation experiments will outpace industry data volumes
APS/Alan Stonebraker and V. Gülzow/DESY
HL-LHC projection
CMS Twiki
ATLAS twiki
● Substantial continued software R&D improvements starting to fit into optimistic resource regions
● Similar story for disk and tape
● Memory and network projections are uncertain, but these are undeniably finite resources
Increasing complexity of data
CMS High Granularity Calorimeter
w/ 200 simultaneous pp collisions
Global LIGO-Virgo-KAGRA Gravitational Wave
detection and parameter inference analysis
Cross-disciplinary challenge
Revolution of AI
E. Moreno et al. Phys. Rev. D 102, 012010 (2020)
AI algorithms can go beyond traditional algorithms
- Using low-level features with deep neural networks and more
advanced data structures leads to long latency
AI algorithms can naturally be accelerated by coprocessors.
The question is HOW!
Direct connect
● Simplest form of coprocessor
implementation
● Difficult to scale up
● User needs to know the coprocessor
in some detail
as-a-Service Connection
● Simplest support for mixed hardware
● Scalable
● Throughput optimizations for multiple clients
● Simple client side
FastML Lab
https://fastmachinelearning.org/
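The as-a-service pattern described above (simple clients, with throughput gained by batching requests from many clients on the server side) can be illustrated with a toy, in-process sketch. This is not how SONIC or any production inference server is implemented; `InferenceService`, `toy_model`, and the batching parameters are hypothetical names chosen for illustration, and only the Python standard library is used.

```python
import queue
import threading
from concurrent.futures import Future


class InferenceService:
    """Toy stand-in for an inference server that batches requests from many clients."""

    def __init__(self, model, max_batch=8, wait_s=0.01):
        self.model = model          # callable: list of inputs -> list of outputs
        self.max_batch = max_batch  # largest batch sent to the "coprocessor"
        self.wait_s = wait_s        # how long to wait for more requests
        self.batch_sizes = []       # recorded so batching can be inspected
        self._requests = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def submit(self, x):
        """Client-side call: enqueue one input and get a Future for its output."""
        fut = Future()
        self._requests.put((x, fut))
        return fut

    def _serve(self):
        while True:
            batch = [self._requests.get()]      # block until a request arrives
            while len(batch) < self.max_batch:  # then briefly gather more
                try:
                    batch.append(self._requests.get(timeout=self.wait_s))
                except queue.Empty:
                    break
            outputs = self.model([x for x, _ in batch])  # one call per batch
            self.batch_sizes.append(len(batch))
            for (_, fut), y in zip(batch, outputs):
                fut.set_result(y)


def toy_model(batch):
    # Hypothetical model: doubles each input.
    return [2 * x for x in batch]


service = InferenceService(toy_model)
futures = [service.submit(i) for i in range(20)]  # 20 independent client requests
results = [f.result(timeout=5) for f in futures]
print(results[:5], "served in", len(service.batch_sizes), "batches")
```

The client only calls `submit` and waits for a result; it needs no knowledge of the hardware behind the service, which is the point made in the "Simple client side" bullet.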
Machine Learning as-a-Service
D. Rankin’s talk
J. Duarte et al., Comput. Softw. Big Sci. 3 (2019) 13
D. Rankin et al., H2RC51942.2020.00010
Brainwave
EC2 F1
M. Wang et al., fdata.2020.604083
J. Krupa, MLST 2 (2021) 035005
Y. Feng et al., ML Acce@MIT 2023
OAC-1904444
Machine Learning as-a-Service
Use NVIDIA Triton Inference Server
for GPU + customized GCP
Kubernetes
SONIC for HEP
https://github.com/hls-fpga-machine-learning/SonicCMS
Hermes for Gravitational Wave
https://github.com/ML4GW/hermes
aaS for High-Level Trigger and Offline Reconstruction
Scalability Test
Benchmarks
Calorimeter energy regression: 3-layer MLP, 2k parameters
Top jet classification: large CNN, 10M parameters
Heterogeneous computing performance comparison
● FPGA-aaST greatly outperforms GPU-aaS for FACILE
○ Small network, large batch is ideally suited for FPGA
● Comparable performance between FPGA-aaST and GPU-aaS for ResNet
CMS MiniAOD
MiniAOD derivation = step in offline processing
● Large-scale tests of SONIC-enabled workflows in Google Cloud: ~100 GPUs / ~10,000 CPU cores
● Achieved ~10% speed-up relative to running all jobs on CPU
● Optimized CPU:GPU ratio of 32:1
Three ML algorithms in workflow (10% per-event latency): jet tagger, MET regression, tau ID
P. McCormack
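The relation between the ML share of per-event latency and the achievable speed-up can be sketched with an Amdahl-style estimate. This assumes the quoted "10%" is the fraction of per-event time spent in the three ML algorithms on CPU, and that offloading hides that time completely; `offload_speedup` and its `remaining` parameter are illustrative names, not part of SONIC.

```python
def offload_speedup(ml_fraction, remaining=0.0):
    """Amdahl-style estimate of the speed-up from offloading ML inference.

    ml_fraction: share of per-event time spent in ML inference on CPU.
    remaining:   share of that ML time still seen by the CPU job after
                 offloading (0.0 means fully hidden, an assumption).
    """
    new_time = (1.0 - ml_fraction) + ml_fraction * remaining
    return 1.0 / new_time


s = offload_speedup(0.10)
print(f"speed-up: {s:.2f}x, time saved per event: {1 - 1 / s:.0%}")
```

Under these assumptions the estimate is about 1.11x, i.e. roughly 10% less time per event, which is in line with the ~10% speed-up reported above.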
ProtoDUNE-Single Phase 1kt LAr-TPC
● SONIC framework has been implemented
to enable use of CNN for track vs. EM
cascade discrimination (EmTrkMichelId)
● A 100 GPU cloud server with Kubernetes
load balancer was used to process 6.4
million out of 7.2 million events from 2018
● SONIC accelerates ML inference for
ProtoDUNE reconstruction
○ 2.7x speed-up of full ProtoDUNE workflow
● Optimal: 1 T4 GPU per 68 CPU
T. Yang
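A few quantities follow directly from the ProtoDUNE numbers above. The sketch below computes the fraction of 2018 events processed, a lower bound on how much of the CPU-only runtime the accelerated step must have occupied (assuming the rest of the workflow is unchanged), and the GPU count implied by the quoted ratio for a hypothetical 1,000-CPU pool; `gpus_needed` and the pool size are illustrative assumptions.

```python
import math

events_processed = 6.4e6  # from the slide
events_total = 7.2e6      # from the slide
workflow_speedup = 2.7    # from the slide
cpus_per_gpu = 68         # quoted optimum: 1 T4 GPU per 68 CPU

fraction_processed = events_processed / events_total

# If the accelerated part took a fraction f of the CPU-only runtime and the
# rest is unchanged, the speed-up S cannot exceed 1 / (1 - f), so f >= 1 - 1/S.
min_accelerated_fraction = 1.0 - 1.0 / workflow_speedup


def gpus_needed(cpu_count, ratio=cpus_per_gpu):
    """GPUs to provision for a CPU pool at the quoted CPU:GPU ratio."""
    return math.ceil(cpu_count / ratio)


print(f"{fraction_processed:.1%} of the 2018 events processed")
print(f"accelerated part: at least {min_accelerated_fraction:.0%} of CPU-only runtime")
print(f"hypothetical 1,000-CPU pool -> {gpus_needed(1000)} GPUs")
```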
A real-time monitoring view
Gravitational Wave
Nature Astronomy 6 (2022) 529
NVIDIA V100 32GB GPUs + 32 vCPUs + 6 concurrent executions
Deploy DeepClean to remove noise from roughly a month’s worth of strain data from the O3
observing run of the LIGO-Virgo instruments
[Figure: IaaS CPU vs. IaaS GPU, with x10 and x5 annotations]
FACILE @ NRP with FaaST [2010.08556]
● Original studies with FaaST leveraged
large DDR4 memory banks
● NRP is equipped with Alveo U55C cards,
which contain new high-bandwidth memory
(HBM)
○ Understanding how to use HBM most efficiently
to transfer data is vital for
high-throughput applications
○ Only really possible with physical cards at
NRP!
NRP - a wonderful playground for aaS R&D
● CMS: High-Level Trigger / HPC
○ miniAOD-as-a-Service (offline integration)
○ HLT-as-a-Service (concept demo)
● ATLAS:
○ AthenaTriton
○ Simulation: FastCaloSimGPU-as-a-Service
● Clustering: SPVCNN Calorimeter and Vertex
● Tracking: ACTSExaTrkX-as-a-Service
● LIGO-Virgo-KAGRA
○ ML4GW/HERMES: Inference-as-a-Service in the upcoming 4th run for hardware denoising and gravitational
wave detection
● Zwicky Transient Facility
○ NMMA: ML-assisted follow-up for GW counterparts due to kilonovae or gamma-ray burst afterglows
● Neuroscience
○ Large-scale brain recording & behavioral monitoring
Accelerating Physics with ML@MIT, Jan 2023
Summary
● AI as-a-service shifts the paradigm of real-time AI processing and offline
processing
○ We have demonstrated promising acceleration for LHC experiments and LIGO
○ It has been used in the ProtoDUNE-SP data reprocessing
● Access to systems like those at NRP is crucial for future tests (multiple cards,
multiple algorithms)
https://a3d3.ai/jobs.html
Backup
Community engagement and training
● Bring together developers and stakeholders with an interest in fully integrating
machine learning-based tools from experiment to physics analyses and results
Accelerating Physics with ML@MIT, Jan 2023
https://indico.cern.ch/event/1224718/
Fast Machine Learning workshop, Oct 2022
A3D3 for Machine Learning Challenge
● A3D3 received a $1.25M supplement grant. One of the activities is to lead a Machine Learning Challenge
for the NSF HDR Ecosystem; we are looking for collaboration with the HEP community
● The aim is to release a series of datasets to the public and explore common ML and data approaches
a. Use these datasets to make a set of ML Challenges
b. Use for education, training and outreach
c. Engage with industry partners to ensure challenges are aligned with real-world applications
(training and professional development pipeline)
● We are lacking a clear framework for testing and validation
a. There are potentially a few options:
i. Hugging Face
ii. https://www.modelshare.org/
● We are looking to build strong connections with MLCommons Science, FAIR4HEP and FAIR-Universe.
FACILE @ NRP with FaaST [2010.08556]
● FACILE algorithm for calorimeter
energy reconstruction from
overlapping pulses in CMS
hadronic calorimeter (HCAL)
● 3-layer MLP, 2k parameters,
necessary to run 16k times per
event (large inherent batch)
● 15 ms latency on CPU, 2 ms
on GPU (8x), 0.2 ms on FPGA
(80x)
D. Rankin
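The quoted FACILE latencies can be turned into speed-up ratios and an amortized time per inference. This sketch assumes the latencies are per event and that "16k" means 16,000 inferences per event; because the slide's latencies are rounded, the ratios computed from them (7.5x and 75x) differ slightly from the quoted 8x and 80x.

```python
latency_ms = {"CPU": 15.0, "GPU": 2.0, "FPGA": 0.2}  # per-event latencies from the slide
inferences_per_event = 16_000                         # "16k", taken as 16,000 (assumption)

summary = {}
for device, t in latency_ms.items():
    speedup = latency_ms["CPU"] / t                    # ratio relative to CPU
    ns_per_inference = t * 1e6 / inferences_per_event  # ms -> ns, amortized over the batch
    summary[device] = (speedup, ns_per_inference)
    print(f"{device}: {speedup:.1f}x vs CPU, {ns_per_inference:.1f} ns per inference")
```

The amortized figures illustrate why a small network with a large inherent batch suits an FPGA: the per-inference cost drops to roughly 12.5 ns under these assumptions.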
Panel: NRP Science Impacts​

  • 1. Accelerating Science Discovery with AI Inference-as-a-Service High Energy Physics and Gravitational Wave Showcases Shih-Chieh Hsu University of Washington Fourth National Research Platform (4NRP) Feb 9 2023 San Diego Supercomputer Center https://a3d3.ai/ OAC-2117997
  • 2. NSF HDR Institute A3D3 Accelerated Artificial Intelligence Algorithms for Data-Driven Discovery
  • 3. Our vision is to establish a tightly coupled organization of domain scientists, computer scientists, and engineers that unite three core components which are essential to achieve real- time AI to transform science and engineering discoveries.
  • 4. NSF Harnessing the Data Revolution (HDR) ● A3D3 is among a much bigger Ecosystem ● A national-scale activity to enable new modes of data-driven discovery that will address fundamental questions at the frontiers of science and engineering. ● Three parallel tracks (~70 awards,~$200M) ○ Institutes ■ Ideas Labs+Framework (28) [NSF 19-543][NSF 19-549] $52.8M ■ Institutes (5) [NSF 21-519] $78.5M ○ TRIPODS ■ Phase I (15) [NSF 19-550] $22.2M ■ Phase II (2) [NSF 21-604] $20M ○ DSC (19) ■ [NSF 19-518] [NSF 21-604] $25.4M Amy Walton Oct 26 HDR
  • 5. A Nationwide Institute 79 Members/10 institutions: ● 17 Senior Personnel ● 3 Research Scientists ● 11 Postdocs ● 27 PhD ● 3 Master ● 12 Undergrad ● 4 Postbacs (Sum ‘22) ● 1 High School $15M for 5 years since 2021 $1.25M supplement to empower HDR Ecosystem trainees.html S. Hauck S.-C. Hsu A. Orsborn E. Shlizerman M. Coughlin P. Harris E. Katsavounidis S. Han K. Hanson M. Neubauer D. Chen K.Scholberg J. Duart M. Graham M. Liu M. D. Makin P. Li Director Deputy Dir. co-PI co-PI co-PI
  • 6. Trending of big data volume • Next-generation experiments will outpace industry data volumes APS/Alan Stonebraker and V. Gülzow/DESY
  • 7. HL-LHC projection CMS Twiki ATLAS twiki ● Substantial continued software R&D improvements starting to fit into optimistic resource regions ● Similar story for disk and tape ● Memory and network projections are uncertain, but both are undeniably finite resources
  • 8. Increasing complexity of data CMS High Granularity Calorimeter w/ 200 simultaneous pp collisions Global LIGO-VIRGO-KAGRA Gravitational Wave detection and parameter inference analysis
  • 11. Revolution of AI E. Moreno et al. Phys. Rev. D 102, 012010 (2020) AI algorithms have the ability to go beyond algorithms - Using low-level features with deep neural networks and more advanced data structures leads to long latency AI algorithms can naturally be accelerated by coprocessors. The question is HOW!
  • 12. Direct connect ● Simplest form of coprocessor implementation ● Difficult to scale up ● User needs to know the coprocessor in some detail
  • 13. as-a-Service Connection ● Simplest support for mixed hardware ● Scalable ● Throughput optimizations for multiple clients ● Simple client side FastML Lab https://fastmachinelearning.org/
  • 16. Machine Learning as-a-Service D. Rankin’s talk J. Duarte et al., Com. Soft. Big Sci. 3 (2019) 13 D. Rankin et al., H2RC51942.2020.00010 Brainwave EC2 F1 M. Wang et al., fdata.2020.604083 J. Krupa, MLST 2 (2021) 035005 Y. Feng et al., ML Acce@MIT 2023 OAC-1904444
  • 17. Machine Learning as-a-Service Use NVIDIA Triton Inference Server for GPU + Customized GCP Kubernetes SONIC for HEP https://github.com/hls-fpga-machine-learning/SonicCMS Hermes for Gravitational Wave https://github.com/ML4GW/hermes
  • 18. aaS for High-Level Trigger and Offline Reconstruction
  • 20. Benchmarks Calorimeter Energy regression 3 layer MLP 2k parameters Top jet classification Large CNN 10M parameters
  • 22. Heterogeneous computing performance comparison ● FPGA-aaST greatly outperforms GPU-aaS for FACILE ○ Small network, large batch is ideally suited for FPGA ● Comparable performance between FPGA-aaST and GPU-aaS for ResNet
  • 23. CMS MiniAOD MiniAOD derivation = step in offline processing ● Large-scale tests of SONIC-enabled workflows in Google Cloud: ~100 GPUs / ~10,000 CPU cores ● Achieved ~10% speedup relative to running all jobs on CPU ● Optimized CPU:GPU ratio of 32:1 Three ML algorithms in workflow (10% per-event latency): Jet tagger, MET regression, Tau ID P. McCormack
  • 24. ProtoDUNE-Single Phase 1kt LAr-TPC ● SONIC framework has been implemented to enable use of CNN for track vs. EM cascade discrimination (EmTrkMichelId) ● A 100 GPU cloud server with Kubernetes load balancer was used to process 6.4 million out of 7.2 million events from 2018 ● SONIC accelerates ML inference for ProtoDUNE reconstruction ○ 2.7x speedup of full ProtoDUNE workflow ● Optimal: 1 T4 GPU per 68 CPU T. Yang A real-time monitoring view
  • 25. Gravitational Wave Nature Astronomy 6 (2022) 529 NVIDIA V100 32GB GPUs + 32 vCPUs + 6 concurrent executions Deploy DeepClean to remove noise from roughly a month’s worth of strain data from the O3 observing run of the LIGO-Virgo instruments [Figure labels: x10, x5, IaaS CPU, IaaS GPU]
  • 26. FACILE @ NRP with FaaST [2010.08556] ● Original studies with FaaST leveraged large DDR4 memory banks ● NRP is equipped with Alveo U55Cs, which contain new high-bandwidth memory (HBM) ○ Understanding how to use HBM most efficiently to transfer data is vital for high-throughput applications ○ Only really possible with physical cards at NRP!
  • 27. NRP - a wonderful playground for aaS R&D ● CMS: High-Level Trigger / HPC: ○ miniAOD-as-a-Service (offline integration) ○ HLT-as-a-Service (concept demo) ● ATLAS: ○ AthenaTriton ○ Simulation: FastCaloSimGPU-as-a-Service ○ Clustering: SPVCNN Calorimeter and Vertex ○ Tracking: ACTSExaTrkX-as-a-Service ● LIGO-Virgo-KAGRA ○ ML4GW/HERMES: Inference-as-a-Service in the upcoming 4th run for hardware denoising and Gravitational Wave detection ● Zwicky Transient Facility ○ NMMA: ML-assisted follow-up for GW counterparts due to kilonovae or Gamma-Ray Burst afterglow ● Neural science ○ Large-scale brain recording & behavioral monitoring Accelerating Physics with ML@MIT, Jan 2023
  • 28. Summary ● AI as a service shifts the paradigm of real-time AI processing and offline processing ○ We have demonstrated promising acceleration for LHC experiments and LIGO ○ It has been used in the ProtoDUNE-SP data reprocessing ● Influence on system like at NRP is crucial for future tests (multiple cards, multiple algorithms)
  • 31. Community engagement and training ● Bring together developers and stakeholders with an interest in fully integrating machine learning-based tools from experiment to physics analyses and results Accelerating Physics with ML@MIT, Jan 2023 https://indico.cern.ch/event/1224718/ Fast Machine Learning workshop, Oct 2022
  • 32. A3D3 for Machine Learning Challenge ● A3D3 received a $1.25M supplement grant. One of the activities is to lead a Machine Learning Challenge for the NSF HDR Ecosystem, and we are looking for collaboration with the HEP community ● Aim is to release a series of datasets to the public and explore common ML and data approaches a. Use these datasets to make a set of ML Challenges b. Use for education, training and outreach c. Engagement with industry partners to ensure challenges are aligned with real-world applications (training and professional development pipeline) ● We are lacking a clear framework for testing and validation a. There are potentially a few options: i. Hugging Face ii. https://www.modelshare.org/ ● We are looking to build strong connections with MLCommons Science, FAIR4HEP and FAIR-Universe.
  • 33. FACILE @ NRP with FaaST [2010.08556] ● FACILE algorithm for calorimeter energy reconstruction from overlapping pulses in CMS hadronic calorimeter (HCAL) ● 3-layer MLP, 2k parameters, necessary to run 16k times per event (large inherent batch) ● 15 ms latency on CPU, 2 ms on GPU (8x), 0.2 ms on FPGA (80x) D. Rankin
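
The FACILE latencies quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper names `speedup` and `summarize` are hypothetical, and the three latencies are taken directly from the slide. The raw ratios come out at 7.5x and 75x, so the slide's 8x and 80x are presumably rounded, or the quoted latencies are.

```python
# Per-event latencies quoted on the FACILE slide, in milliseconds.
latency_ms = {"CPU": 15.0, "GPU": 2.0, "FPGA": 0.2}


def speedup(baseline_ms: float, accelerated_ms: float) -> float:
    """Return how many times faster the accelerated latency is than the baseline."""
    return baseline_ms / accelerated_ms


def summarize(latencies: dict) -> dict:
    """Compute the speedup of every device relative to the CPU entry."""
    base = latencies["CPU"]
    return {name: speedup(base, ms) for name, ms in latencies.items()}


speedups = summarize(latency_ms)
for name, factor in speedups.items():
    print(f"{name}: {latency_ms[name]} ms per event, {factor:.1f}x relative to CPU")
```

The same helper can be reused for any of the other latency or throughput comparisons in the deck, provided both numbers are expressed in the same unit.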

Editor's notes

  1. Large Hadron Collider (LHC) and the Square Kilometre Array (SKA)
  2. Are there things around you in the ecosystem that do not exist, but if they were there it would make your project more impactful or increase its chances of success? Kyle has good connections through IRIS-HEP with Hugging Face and ModelShare. This could connect well with REANA.