1. Accelerating Science Discovery
with AI Inference-as-a-Service
High Energy Physics and Gravitational Wave Showcases
Shih-Chieh Hsu
University of Washington
Fourth National Research Platform (4NRP)
Feb 9 2023
San Diego Supercomputer Center
https://a3d3.ai/
OAC-2117997
2. NSF HDR Institute A3D3
Accelerated Artificial Intelligence Algorithms for Data-Driven Discovery
3. Our vision is to establish a tightly coupled organization of
domain scientists, computer scientists, and engineers that
unites three core components essential to achieving real-time
AI to transform science and engineering discoveries.
4. NSF Harnessing the Data Revolution (HDR)
● A3D3 is part of a much bigger ecosystem
● A national-scale activity to enable new
modes of data-driven discovery that will
address fundamental questions at the
frontiers of science and engineering.
● Three parallel tracks (~70 awards,~$200M)
○ Institutes
■ Ideas Labs+Framework (28)
[NSF 19-543][NSF 19-549] $52.8M
■ Institutes (5) [NSF 21-519] $78.5M
○ TRIPODS
■ Phase I (15) [NSF 19-550] $22.2M
■ Phase II (2) [NSF 21-604] $20M
○ DSC (19)
■ [NSF 19-518] [NSF 21-604] $25.4M
Amy Walton Oct 26
HDR
5. A Nationwide Institute
79 Members/10 institutions:
● 17 Senior Personnel
● 3 Research Scientists
● 11 Postdocs
● 27 PhD
● 3 Master
● 12 Undergrad
● 4 Postbacs (Sum ‘22)
● 1 High School
$15M for 5 years since 2021
$1.25M supplement to empower
HDR Ecosystem
trainees.html
S. Hauck
S.-C. Hsu
A. Orsborn E. Shlizerman
M. Coughlin P. Harris E. Katsavounidis
S. Han
K. Hanson
M. Neubauer D. Chen
K. Scholberg
J. Duarte
M. Graham
M. Liu M. D. Makin
P. Li
Director Deputy Dir.
co-PI
co-PI
co-PI
6. Trends in big data volume
• Next-generation experiments will outpace industry data volumes
APS/Alan Stonebraker and V. Gülzow/DESY
7. HL-LHC projection
CMS Twiki
ATLAS twiki
● With substantial, continued software R&D improvements, projections are starting to fit into the optimistic resource regions
● Similar story for disk and tape
● Memory and network projections are uncertain, but both are undeniably finite resources
8. Increasing complexity of data
CMS High Granularity Calorimeter
w/ 200 simultaneous pp collisions
Global LIGO-VIRGO-KAGRA Gravitational Wave
detection and parameter inference analysis
11. Revolution of AI
E. Moreno et al. Phys. Rev. D 102, 012010 (2020)
AI algorithms have the ability to go beyond conventional algorithms
- Using low-level features with deep neural networks and more
advanced data structures leads to long latency
AI algorithms can naturally be accelerated by coprocessors.
The question is HOW!
12. Direct connect
● Simplest form of coprocessor
implementation
● Difficult to scale up
● User needs to know the coprocessor
in some detail
13. as-a-Service Connection
● Simplest support for mixed hardware
● Scalable
● Throughput optimizations for multiple clients
● Simple client side
FastML Lab
https://fastmachinelearning.org/
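The as-a-service idea can be illustrated with a toy sketch: clients only submit inputs, and a server groups requests from several clients into batches before calling the model. This is a minimal illustration under stated assumptions, not the SONIC, Hermes, or Triton implementation; `ToyInferenceServer`, `submit`, `flush`, and the `double` model are hypothetical names invented for this example.

```python
from typing import Callable, List, Tuple


class ToyInferenceServer:
    """Hypothetical stand-in for an inference server (not a real API)."""

    def __init__(self, model: Callable[[List[float]], List[float]],
                 max_batch: int) -> None:
        self.model = model            # stands in for the coprocessor-side model
        self.max_batch = max_batch    # largest batch passed to the model at once
        self.queue: List[Tuple[str, float]] = []
        self.model_calls = 0          # counts how often the model is invoked

    def submit(self, client_id: str, value: float) -> None:
        """A client only sends its input; it needs no coprocessor knowledge."""
        self.queue.append((client_id, value))

    def flush(self) -> List[Tuple[str, float]]:
        """Batch queued requests from all clients and run the model per batch."""
        results: List[Tuple[str, float]] = []
        while self.queue:
            batch = self.queue[: self.max_batch]
            self.queue = self.queue[self.max_batch:]
            outputs = self.model([value for _, value in batch])
            self.model_calls += 1
            results.extend(
                (client_id, out) for (client_id, _), out in zip(batch, outputs)
            )
        return results


def double(batch: List[float]) -> List[float]:
    """Toy model: doubles every input in the batch."""
    return [2.0 * x for x in batch]


server = ToyInferenceServer(model=double, max_batch=4)
for i in range(6):
    # Three hypothetical clients share one server.
    server.submit(f"client-{i % 3}", float(i))

results = server.flush()
print(results)
print("model calls:", server.model_calls)
```

Six requests from three clients are served with two model calls here, which is the kind of multi-client throughput optimization the slide refers to; a production server would batch asynchronously rather than on an explicit `flush`.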
16. Machine Learning as-a-Service
D. Rankin’s talk
J. Duarte et al., Com. Soft. Big Sci. 3 (2019) 13
D. Rankin et al., H2RC51942.2020.00010
Brainwave
EC2 F1
M. Wang et al., fdata.2020.604083
J. Krupa, MLST 2 (2021) 035005
Y. Feng et al., ML Acce@MIT 2023
OAC-1904444
17. Machine Learning as-a-Service
Use NVIDIA Triton Inference Server
for GPU + Customized GCP
Kubernetes
SONIC for HEP
https://github.com/hls-fpga-machine-learning/SonicCMS
Hermes for Gravitational Wave
https://github.com/ML4GW/hermes
22. Heterogeneous computing performance comparison
● FPGA-aaST greatly outperforms GPU-aaS for FACILE
○ Small network, large batch is ideally suited for FPGA
● Comparable performance between FPGA-aaST and GPU-aaS for ResNet
23. CMS MiniAOD
MiniAOD derivation = step in offline
processing
● Large-scale tests of SONIC-enabled workflows in Google Cloud:
~100 GPUs / ~10,000 CPU cores
● Achieved ~10% speed up relative
to running all jobs on CPU
● Optimized CPU:GPU ratio of 32:1
Three ML algorithms in workflow (10% per-event latency):
jet tagger, MET regression, tau ID
P. McCormack
24. ProtoDUNE-Single Phase 1 kt LAr-TPC
● SONIC framework has been implemented
to enable use of CNN for track vs. EM
cascade discrimination (EmTrkMichelId)
● A 100 GPU cloud server with Kubernetes
load balancer was used to process 6.4
million out of 7.2 million events from 2018
● SONIC accelerates ML inference for
ProtoDUNE reconstruction
○ 2.7x speedup of full ProtoDUNE workflow
● Optimal: 1 T4 GPU per 68 CPU
T. Yang
A real-time monitoring view
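The optimal ratios quoted on the two preceding slides (32 CPU : 1 GPU for CMS MiniAOD, 1 T4 GPU per 68 CPU for ProtoDUNE) can be turned into a rough GPU count for a given number of CPU clients. The sketch below is only back-of-the-envelope arithmetic: `gpus_needed` is a hypothetical helper, and the 1,000-client job size is an assumed example value, not a figure from the talk.

```python
def gpus_needed(cpu_clients: int, clients_per_gpu: int) -> int:
    """Smallest number of GPUs so that no GPU serves more than clients_per_gpu."""
    if cpu_clients < 0 or clients_per_gpu <= 0:
        raise ValueError("cpu_clients must be >= 0 and clients_per_gpu > 0")
    # Integer ceiling division avoids floating-point rounding.
    return -(-cpu_clients // clients_per_gpu)


# Ratios from the slides; the job size of 1,000 CPU clients is hypothetical.
RATIOS = {"CMS MiniAOD": 32, "ProtoDUNE": 68}
JOB_SIZE = 1000

for workflow, ratio in RATIOS.items():
    print(f"{workflow}: {gpus_needed(JOB_SIZE, ratio)} GPUs for {JOB_SIZE} CPU clients")

# Share of the 2018 ProtoDUNE events processed in the test (6.4M of 7.2M).
processed_fraction = 6.4 / 7.2
print(f"ProtoDUNE events processed: {processed_fraction:.1%}")
```

Rounding up matters: a single extra client beyond a multiple of the ratio requires one more GPU if the quoted optimum is treated as a hard limit.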
25. Gravitational Wave
Nature Astronomy 6 (2022) 529
NVIDIA V100 32GB GPUs + 32 vCPUs + 6 concurrent executions
Deploy DeepClean to remove noise from roughly a month’s worth of strain data from the O3
observing run of the LIGO-Virgo instruments
[Figure: IaaS CPU vs. IaaS GPU, annotated x10 and x5]
26. FACILE @ NRP with FaaST [2010.08556]
● Original studies with FaaST leveraged
large DDR4 memory banks
● NRP is equipped with Alveo U55Cs,
which contain new high-bandwidth memory
(HBM)
○ Understanding how to most efficiently use
HBM memory to transfer data is vital for
high throughput applications
○ Only really possible with physical cards at
NRP!
27. NRP - a wonderful playground for aaS R&D
● CMS: High-Level Trigger / HPC
○ miniAOD-as-a-Service (offline integration)
○ HLT-as-a-Service (concept demo)
● ATLAS
○ AthenaTriton
○ Simulation: FastCaloSimGPU-as-a-Service
○ Clustering: SPVCNN Calorimeter and Vertex
○ Tracking: ACTSExaTrkX-as-a-Service
● LIGO-Virgo-KAGRA
○ ML4GW/HERMES: Inference-as-a-Service in the upcoming 4th run for hardware denoising and Gravitational
Wave detection
● Zwicky Transient Facility
○ NMMA: ML-assisted follow-up for GW counterparts due to kilonovae or Gamma-Ray Burst afterglow
● Neuroscience
○ Large-scale brain recording & behavioral monitoring
Accelerating Physics with ML@MIT, Jan 2023
28. Summary
● AI as a service shifts the paradigm of real-time AI processing and offline
processing
○ We have demonstrated promising acceleration for LHC experiments and LIGO
○ It has been used in the ProtoDUNE-SP data reprocessing
● Inference on systems like those at NRP is crucial for future tests (multiple cards,
multiple algorithms)
31. Community engagement and training
● Bring together developers and stakeholders with an interest in fully integrating
machine learning-based tools from experiment to physics analyses and results
Accelerating Physics with ML@MIT, Jan 2023
https://indico.cern.ch/event/1224718/
Fast Machine Learning workshop, Oct 2022
32. A3D3 for Machine Learning Challenge
● A3D3 received a $1.25M supplement grant. One of its activities is to lead a Machine Learning Challenge
for the NSF HDR Ecosystem, and we are looking for collaboration with the HEP community
● The aim is to release a series of datasets to the public and explore common ML and data approaches
a. Use these datasets to make a set of ML Challenges
b. Use for education, training and outreach
c. Engagement with industry partners to ensure challenges are aligned with real-world applications
(training and professional development pipeline)
● We are lacking a clear framework for testing and validation
a. There are potentially a few options:
i. Hugging Face
ii. https://www.modelshare.org/
● We are looking to build strong connections with MLCommons Science, FAIR4HEP and
FAIR-Universe.
33. FACILE @ NRP with FaaST [2010.08556]
● FACILE algorithm for calorimeter
energy reconstruction from
overlapping pulses in CMS
hadronic calorimeter (HCAL)
● 3-layer MLP, 2k parameters,
necessary to run 16k times per
event (large inherent batch)
● 15 ms latency on CPU, 2 ms
on GPU (8x), 0.2 ms on FPGA
(80x)
D. Rankin
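As a quick check of the speedup factors quoted above, the latencies can be divided directly. This is plain arithmetic on the slide's numbers; the `speedup` helper and the dictionary layout are just illustrative names.

```python
# Latencies quoted on the slide, in milliseconds.
latency_ms = {"CPU": 15.0, "GPU": 2.0, "FPGA": 0.2}


def speedup(baseline_ms: float, accelerated_ms: float) -> float:
    """Ratio of baseline latency to accelerated latency."""
    if accelerated_ms <= 0:
        raise ValueError("accelerated latency must be positive")
    return baseline_ms / accelerated_ms


for device in ("GPU", "FPGA"):
    factor = speedup(latency_ms["CPU"], latency_ms[device])
    print(f"{device}: {factor:.1f}x faster than CPU")
```

The exact ratios are 7.5 and 75, which the slide rounds to 8x and 80x.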
Editor's notes
Large Hadron Collider (LHC) and the Square Kilometre Array (SKA)
Are there things around you in the ecosystem that do not exist, but if they were there it would make your project more impactful or increase its chances of success?
Kyle has good connections through iris-hep with Hugging-Face and ModelShare
This could connect well with Reana