Exascale Deep Learning for Climate Analytics
Thorsten Kurth*, Josh Romero*, Sean Treichler, Mayur Mudigonda,
Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe,
Massimiliano Fatica, Prabhat, Michael Houston
GTC Silicon Valley
03/17/2019, San Jose, CA
The Team
Thorsten Kurth Sean Treichler Josh Romero Mayur Mudigonda Nathan Luehr Everett Phillips
Ankur Mahesh Michael Matheson Jack Deslippe Massimiliano Fatica Prabhat Michael Houston
Socio-Economic Impact of
Extreme Weather Events
• tropical cyclones and atmospheric rivers have a major impact on modern economy and society
• CA: 50% of rainfall through atmospheric rivers
• FL: flooding, influence on insurance premiums and home prices
• $200B worth of damage in 2017
• costs of ~$10B/event for large events
Photos: Harvey 2017, Katrina 2005, Berkeley 2019, Santa Rosa 2018 (credits: pixabay, Chris Samuel)
Understanding Extreme
Weather Phenomena
• will there be more hurricanes?
• will they be more intense?
• will they make landfall more often?
• will atmospheric rivers carry more water?
• can they help mitigate droughts and decrease risk of forest fires?
• will they cause flooding and heavy precipitation?
Image credits: pixabay
Typical Climate Analytics
[Figure: 14M variables/3h are reduced to O(1) variables/y, e.g. mean/std of temperature]
Impact Quantification of Extreme Weather Events
• detect hurricanes and atmospheric rivers in climate model projections
• enable geospatial analysis of EW events and statistical impact studies for regions around the world
• flexible and scalable detection algorithm
• gear up for future simulations with ∼1 km² spatial resolution
M.F. Wehner, doi:10.1002/2013MS000276
Unique Challenges for Climate Analytics
• interpret as segmentation problem
• 3 classes - background (BG), tropical cyclones (TC), atmospheric rivers (AR)
• deep learning has proven successful for these tasks
• climate data is complex
• high imbalance - more than 95% of pixels are background
• high variance - shapes of events change
• many input channels w/ different properties
• high resolution required
• no static background, highly variable in space and time
Image credit: NASA
Unique Challenges for Deep Learning
• need labeled data for supervised approach
• can be leveraged from existing heuristic-based approaches
• define neural network architecture
• balance between compute performance and model accuracy
• employ high productivity/flexibility frameworks for rapid prototyping
• performance optimization requires holistic approach
• hyperparameter optimization (HPO)
• necessary for convergence and accuracy
Unique Challenges for Deep Learning at Extreme Scale
• data management
• shuffling/loading/processing/feeding 20 TB dataset to keep GPUs busy
• efficient use of remote filesystem
• multi-node coordination and synchronization
• synchronous reduction of O(50)MB across 27360 GPUs after each iteration
• hyperparameter optimization (HPO)
• convergence and accuracy challenging due to larger global batch sizes
Label Creation: Atmospheric Rivers
1. The climate model predicts water vapor, wind speeds and humidity
2. These observables are used to compute the Integrated Water Vapor Transport
3. Binarization by thresholding at 95th percentile
4. Flood fill algorithm generates AR candidates by masking out regions in mid-latitudes
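Steps 3 and 4 can be sketched on a toy grid. This is a minimal illustration in pure Python, not the authors' implementation: the grid values, the nearest-rank percentile, the 4-connected neighbourhood, and all function names are assumptions, and the mid-latitude masking is left out.

```python
import math
from collections import deque

def percentile_threshold(grid, q):
    """Nearest-rank q-th percentile over all grid values (assumed definition)."""
    values = sorted(v for row in grid for v in row)
    rank = max(1, math.ceil(q * len(values) / 100))
    return values[rank - 1]

def binarize(grid, threshold):
    """Mark every pixel at or above the threshold with 1, all others with 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in grid]

def flood_fill_components(mask):
    """Group marked pixels into 4-connected candidate regions via flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(region)
    return components

# Toy 6x8 field standing in for Integrated Water Vapor Transport.
ivt = [[0.0] * 8 for _ in range(6)]
ivt[1][1], ivt[1][2], ivt[4][6] = 5.0, 6.0, 7.0

threshold = percentile_threshold(ivt, 95)
candidates = flood_fill_components(binarize(ivt, threshold))
```

Here the two adjacent high-value pixels form one candidate region and the isolated pixel forms a second one.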
Label Creation: Tropical Cyclones
1. Extract cyclone center and radius using thresholds for pressure, temperature, and vorticity
2. Binarize patch around cyclone center using thresholds for water vapor, wind, and precipitation
Software: TensorFlow
• high-productivity deep learning framework in Python with C++ backend, developed by Google
• leverages optimized cuDNN library for performance-sensitive kernels (e.g. convolutions) on NVIDIA GPUs
• dataflow-style programming and asynchronous graph execution
• provides features for building I/O input pipeline
• can be combined with external Python modules to provide good flexibility
Software: Horovod
• library developed by Uber for distributed training
• provides straightforward wrappers to convert existing single-process TensorFlow programs to multi-process data-parallel programs
• uses MPI for worker coordination/control
• MPI is ubiquitous in HPC and available broadly on systems targeting HPC applications
• can leverage MPI or NCCL for collective communication on NVIDIA GPUs (e.g. allreduce)
Systems
Piz Daint
• Cray XC50 HPC system at CSCS, 5th on top500
• 5320 nodes with Intel Xeon E5-2695v3 and 1 NVIDIA P100 GPU
• Cray Aries interconnect in diameter 5 dragonfly topology
• ~54.4 PetaFlop/s peak performance (FP32)
Summit
• leadership class HPC system at OLCF, 1st on top500
• 4609 nodes with 2 IBM P9 CPU and 6 NVIDIA V100 GPU
• 300 GB/s NVLink connection btw. 3 GPUs in a group
• 800 GB available NVMe storage/node
• dual-rail EDR Infiniband in fat-tree topology
• ~3.45 ExaFlop/s theoretical peak performance (FP16)
Image credits: Carlos Jones (ORNL), CSCS
Single GPU
[Diagram: one GPU running cuDNN kernels]
Things to consider:
• Is my TensorFlow model efficiently using GPU resources?
• Is my data input pipeline keeping up?
• Is my TensorFlow model providing reasonable results?
Single Node
[Diagram: six GPUs running cuDNN kernels, connected via NCCL and MPI]
Things to consider:
• Is my data input pipeline still keeping up?
• Is my data-parallel TensorFlow model providing reasonable results?
• How is my performance scaling over PCIe/NVLink using Horovod?
Multi Node
[Diagram: multiple nodes connected via NCCL and MPI]
Things to consider:
• Is my data-parallel TensorFlow model still providing reasonable results?
• How is my performance scaling over NVLink + InfiniBand using Horovod?
• How do I distribute my data across so many nodes?
Deep Learning Models for Extreme Weather Segmentation
[Figures: encoder/decoder layer diagrams of the two networks; both take a 1152×768 input with 16 channels and produce a 1152×768 output with 3 channels]
• Tiramisu, 35 layers, 7.8M parameters, 4.2 TF/sample
• DeepLabv3+, 66 layers, 43.7M parameters, 14.4 TF/sample (encoder, ASPP, decoder)
Data Staging
• 250 training samples/GPU (15 GB), sample w/ replacement
• each file will be read at most once from FS
• files shared between nodes via MPI (mpi4py)
• preprocess and feed data to GPU asynchronously using tf.data and Python multiprocessing

Dataset size          Required BW (27K GPUs)  GPFS/LUSTRE  BurstBuffer  NVMe or DRAM
20 TB (~63K samples)  3.8 TB/s                ~400 GB/s    ~2 TB/s      ~26 TB/s

[Diagram: each node holds 1.5K samples on its NVMe; samples are shuffled between nodes]
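The staging rules above (sample with replacement, read each file at most once from the file system, share files between nodes) can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not the talk's code: the round-robin file ownership, the function name `stage_files`, and the string payloads are hypothetical, and a dictionary copy stands in for the MPI exchange.

```python
import random

def stage_files(num_files, num_ranks, samples_per_rank, seed=0):
    """Simulate staging: every file is read from the file system at most once."""
    rng = random.Random(seed)
    # Each rank draws its local training set with replacement.
    wanted = [rng.choices(range(num_files), k=samples_per_rank)
              for _ in range(num_ranks)]
    fs_reads = {}   # file id -> number of file-system reads
    reader = {}     # file id -> rank that performed the read
    local_store = [{} for _ in range(num_ranks)]
    for file_id in sorted({f for draw in wanted for f in draw}):
        # Assumed policy: the owner rank is the only one touching the file system.
        reader[file_id] = file_id % num_ranks
        payload = f"contents-of-file-{file_id}"
        fs_reads[file_id] = fs_reads.get(file_id, 0) + 1
        # The owner forwards the payload to every rank that drew this file
        # (an MPI exchange in the talk, a plain copy here).
        for rank in range(num_ranks):
            if file_id in wanted[rank]:
                local_store[rank][file_id] = payload
    return wanted, local_store, fs_reads

wanted, local_store, fs_reads = stage_files(num_files=20, num_ranks=4,
                                            samples_per_rank=8)
```

After staging, every rank can serve all of its draws from `local_store`, and no file appears more than once in the file-system read count.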
On-Node I/O Pipeline
• files are in HDF5 with single sample + label/file
• list of filenames passed to TensorFlow Dataset API (tf.data)
• HDF5 serialization bottleneck addressed with multiprocessing + h5py
• extract and batch using tf.data input pipeline
[Diagram: an ordered file list (data-2107-12-26-02-4.h5, data-2107-12-26-03-1.h5, data-2107-12-26-03-4.h5, ...) is shuffled (data-2107-03-03-06-1.h5, data-2107-05-24-00-4.h5, data-2107-08-30-03-4.h5, ...), read and preprocessed 4-way parallel on the CPU, batched, and fed to the network on the GPU. Network icon: asimovinstitute.org/neural-network-zoo]
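The shuffle, parallel read, and batch stages can be mimicked with the standard library. This is only a sketch: the talk uses tf.data together with multiprocessing and h5py, whereas this example uses a thread pool and a hypothetical `load_sample` stand-in so that it runs without HDF5 files.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def load_sample(filename):
    """Hypothetical stand-in for reading one HDF5 file and preprocessing it."""
    return {"name": filename, "size": len(filename)}

def batches(filenames, batch_size, workers=4, seed=0):
    """Shuffle the file list, load with `workers` parallel readers, yield batches."""
    order = list(filenames)
    random.Random(seed).shuffle(order)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch = []
        # pool.map keeps the shuffled order while loads run concurrently.
        for sample in pool.map(load_sample, order):
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # final, possibly smaller batch
            yield batch

files = [f"data-{i:02d}.h5" for i in range(10)]
result = list(batches(files, batch_size=4))
```

Processes rather than threads matter in the real pipeline because HDF5 reads were the serialization bottleneck; the thread pool here only shows the data flow.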
Single Node Performance
• GPU execution profiled with CUDA profiler, kernels grouped by category
• convolution kernels: use latest cuDNN, favor higher computational intensity
• pay attention to memory layout to reduce transposes and copies
• tune the input pipeline on the CPU to keep it off the critical path

DeepLabv3+ FP16 Training
Category                  # Kern  Time (ms)  Math (TF)  Mem (GB)  % Time  % Math  % Mem
Forward   Convolutions       158      147.9       9.61      27.6    18.1    52.0   20.7
Forward   Point-wise         829       52.3      < 0.1      24.3     6.4           51.6
Backward  Convolutions       195      300.2      19.21      50.5    36.7    51.2   18.7
Backward  Point-wise         157       25.6      < 0.1       6.3     3.1           27.3
Optimizer                   1219        3.9      < 0.1       1.1     0.5           31.3
Copies / Transposes          708      213.2          -      92.6    26.1           48.3
Allreduce (NCCL)              30       58.7      < 0.1       0.6     7.2            1.1
Type Conversions             201        1.3          -       0.6     0.2           51.3
GPU Idle                               14.2                          1.7
Total                       3497      817.3      28.82     203.6            28.2   27.7
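The slide does not define the % Math and % Mem columns. One reading that reproduces the reported values is "achieved rate as a fraction of per-GPU peak", assuming a V100 peak of 125 TF/s (FP16 Tensor Cores) and 900 GB/s memory bandwidth. Both peak figures and the helper name are assumptions of this sketch.

```python
PEAK_MATH_TF_PER_S = 125.0  # assumed V100 FP16 Tensor Core peak
PEAK_MEM_GB_PER_S = 900.0   # assumed V100 HBM2 peak bandwidth

def percent_of_peak(time_ms, math_tf, mem_gb):
    """Return (% of peak math rate, % of peak memory bandwidth) for one table row."""
    seconds = time_ms / 1000.0
    pct_math = 100.0 * (math_tf / seconds) / PEAK_MATH_TF_PER_S
    pct_mem = 100.0 * (mem_gb / seconds) / PEAK_MEM_GB_PER_S
    return round(pct_math, 1), round(pct_mem, 1)

# Rows taken from the profile table: time (ms), math (TF), memory traffic (GB).
forward_conv = percent_of_peak(147.9, 9.61, 27.6)
backward_conv = percent_of_peak(300.2, 19.21, 50.5)
total = percent_of_peak(817.3, 28.82, 203.6)
```

Under these assumptions the convolution rows come out near 52% of peak math, while the whole training step reaches about 28%, mostly because copies, transposes, and point-wise kernels contribute time but almost no math.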
Improvements to Horovod: Original Control Plane
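[Diagram: each worker w1 ... wN holds a list of tensors that are ready for reduction, e.g. {1, 2, 5, 8, 13}, {2, 3, 5, 10, 13}, {1, 3, 5, 10, 13, 14}, and {1, 3, 5, 7, 13} on w2. The lists are gathered on w1, which intersects them to {5, 13} and broadcasts the result to all workers; the workers then allreduce tensors 5 and 13.]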
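The gather, intersect, broadcast sequence of the original control plane can be written down in a few lines. This is a single-process sketch, not Horovod code: the function name is hypothetical, and apart from w2 the assignment of the slide's example lists to specific workers is an assumption.

```python
def negotiate(ready_lists):
    """Gather all ready-tensor sets on a coordinator, intersect, and broadcast."""
    gathered = list(ready_lists.values())       # gather on the coordinator (w1)
    common = set.intersection(*gathered)        # tensors ready on every worker
    return {worker: sorted(common) for worker in ready_lists}  # broadcast

# Example lists from the slide; only w2's list is attributed there.
ready = {
    "w1": {1, 2, 5, 8, 13},
    "w2": {1, 3, 5, 7, 13},
    "w3": {2, 3, 5, 10, 13},
    "w4": {1, 3, 5, 10, 13, 14},
}
agreed = negotiate(ready)
```

Every worker ends up with the same list, so all of them launch the allreduce for tensors 5 and 13 in the same order. The cost is that a single coordinator has to receive one message per worker in every negotiation round.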
Improvements to Horovod: Tree-based Control Plane
[Diagram: workers w1 ... wN are arranged in a tree. Ready lists are gathered and intersected asynchronously up the tree, the result is distributed with a tree-based broadcast, and the workers then allreduce tensors 5 and 13.]
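Why a tree helps can be seen by counting rounds. The sketch below intersects ready lists pairwise, level by level, so the number of sequential steps grows with the logarithm of the worker count instead of linearly. A binary tree is assumed here for simplicity; the slide does not state the fan-out, and the function name is hypothetical.

```python
def tree_intersect(ready_sets):
    """Intersect ready-tensor sets pairwise up a binary tree; return (result, rounds)."""
    level = [set(s) for s in ready_sets]
    rounds = 0
    while len(level) > 1:
        # Each parent intersects the sets of its (up to two) children.
        level = [set.intersection(*level[i:i + 2])
                 for i in range(0, len(level), 2)]
        rounds += 1
    return level[0], rounds

small_result, small_rounds = tree_intersect([
    {1, 2, 5, 8, 13},
    {1, 3, 5, 7, 13},
    {2, 3, 5, 10, 13},
    {1, 3, 5, 10, 13, 14},
])

# Same procedure for as many workers as the full-scale run had GPUs.
large_result, large_rounds = tree_intersect([{5, 13}] * 27360)
```

With four workers the tree needs two rounds; with 27360 workers it needs 15, whereas a flat gather makes one coordinator handle all 27360 messages itself.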
Improvements to Horovod: Hybrid All-Reduce
• NCCL uses NVLink for high throughput, but ring-based algorithms are latency-limited at scale
• hybrid NCCL/MPI strategy uses strengths of both
• one inter-node allreduce per virtual NIC
• MPI work overlaps well with GPU computation
[Diagram: intra-node allreduce (NCCL), then 4x inter-node allreduce (MPI), then 4x intra-node broadcast (NCCL)]
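The three stages of the hybrid scheme can be simulated with plain lists. This sketch only reproduces the data flow, not the communication libraries. Splitting the node-level sum into four chunks (one per inter-node allreduce) is an assumption based on the "4x" labels in the diagram, and the function name is hypothetical.

```python
def hybrid_allreduce(grads, num_chunks=4):
    """grads[node][gpu] is a list of numbers; returns the global sum on every GPU."""
    # Stage 1: intra-node allreduce (NCCL in the talk), summing over a node's GPUs.
    node_sums = [[sum(vals) for vals in zip(*node)] for node in grads]
    length = len(node_sums[0])
    # Stage 2: inter-node allreduce (MPI in the talk), one per chunk of the buffer.
    bounds = [length * i // num_chunks for i in range(num_chunks + 1)]
    global_sum = [0.0] * length
    for lo, hi in zip(bounds, bounds[1:]):
        for i in range(lo, hi):
            global_sum[i] = sum(node_sum[i] for node_sum in node_sums)
    # Stage 3: intra-node broadcast (NCCL in the talk) of the finished chunks.
    return [[list(global_sum) for _ in node] for node in grads]

# Two nodes with three GPUs each; GPU g on node n holds the constant n*3 + g + 1.
grads = [[[float(n * 3 + g + 1)] * 8 for g in range(3)] for n in range(2)]
reduced = hybrid_allreduce(grads)
```

Each of the six GPUs ends up with the same vector, whose entries equal 1 + 2 + ... + 6 = 21.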
Gradient Pipelining (Lag)
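[Diagram, lag-0 (fully synchronous): in step k each worker w1 ... wN computes its gradient g_i^k, an allreduce produces the averaged gradient ḡ^k, and every worker applies ḡ^k before the next step.]
[Diagram, lag-1: each worker places its gradient g_i^k in a queue q_i; while step k is being computed, the allreduce of the queued gradients g_i^(k-1) runs, and the weights are updated with ḡ^(k-1).]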
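The difference between lag-0 and lag-1 can be tried on a toy problem. In this sketch each simulated worker minimizes 0.5·(w − t)² for its own target t, so the averaged gradient pulls w toward the mean target. The objective, learning rate, and function name are illustrative assumptions; only the update rule (apply ḡ^(k−lag) in step k) follows the slide.

```python
def train(targets, steps, lr, lag):
    """Plain SGD where the update in step k uses the averaged gradient of step k - lag."""
    w = 0.0
    pending = []  # averaged gradients whose allreduce is still "in flight"
    for _ in range(steps):
        local = [w - t for t in targets]          # per-worker gradients at current w
        pending.append(sum(local) / len(local))   # allreduce: average over workers
        if len(pending) > lag:
            w -= lr * pending.pop(0)              # apply the oldest finished gradient
    return w

targets = [1.0, 2.0, 3.0]
w_sync = train(targets, steps=200, lr=0.1, lag=0)
w_lag1 = train(targets, steps=200, lr=0.1, lag=1)
```

Both variants converge to the mean target 2.0 in this toy setting. With lag 1 the first step applies no update, and every later step uses a gradient that is one step stale, which is what lets the allreduce overlap with the next step's computation.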
Importance of Node-Local Memory
• ~10% performance penalty for reading from global file system
Scaling Tiramisu
• FP16-model sensitive to communication
• FP16-model BW-bound (only 2.5x faster than FP32)
• almost ideal scaling for both precisions on Summit when gradient lag is used
Scaling DeepLabv3+
• FP16-model sensitive to communication
• FP16-model BW-bound (only 2.5x faster than FP32)
• excellent scaling for both precisions on Summit when gradient lag is used
DeepLabv3+, 4560 nodes (27360 GPUs)
• 999 PetaFlop/s (FP16) sustained
• 1.13 ExaFlop/s (FP16) peak
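To put the sustained figure in perspective, it can be broken down per GPU using only numbers quoted in this talk (Summit's theoretical FP16 peak, node count, and GPUs per node). The variable names are illustrative, and treating the system peak as evenly divided across GPUs is an assumption.

```python
SUMMIT_PEAK_FP16 = 3.45e18   # Flop/s, theoretical peak from the Systems slide
SUMMIT_NODES = 4609
GPUS_PER_NODE = 6

RUN_NODES = 4560
SUSTAINED_FP16 = 999e15      # Flop/s, sustained during the DeepLabv3+ run

run_gpus = RUN_NODES * GPUS_PER_NODE
peak_per_gpu_tf = SUMMIT_PEAK_FP16 / (SUMMIT_NODES * GPUS_PER_NODE) / 1e12
sustained_per_gpu_tf = SUSTAINED_FP16 / run_gpus / 1e12
fraction_of_peak = sustained_per_gpu_tf / peak_per_gpu_tf
```

This gives roughly 125 TF/s peak and about 36.5 TF/s sustained per GPU, i.e. a little under 30% of the theoretical FP16 peak across the whole run.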
Concurrency/Precision and Convergence
~2.1x improvement in time to solution
Model/Lag and Convergence
Segmentation Animation
• best result for intersection-over-union (IoU) obtained: ∼73%
• result at large scale (batch-size > 1500): IoU ∼55%
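For reference, intersection-over-union for a segmentation mask can be computed per class and then averaged. The slides do not say how the reported IoU is aggregated, so the mean over the three classes (BG = 0, TC = 1, AR = 2) used below is an assumption, as are the function names and the toy masks.

```python
def iou_per_class(pred, truth, num_classes=3):
    """Per-class IoU between two label masks given as nested lists of class ids."""
    scores = {}
    for cls in range(num_classes):
        inter = union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                inter += int(p == cls and t == cls)
                union += int(p == cls or t == cls)
        if union:  # skip classes absent from both masks
            scores[cls] = inter / union
    return scores

def mean_iou(pred, truth, num_classes=3):
    """Unweighted mean of the per-class IoU values."""
    scores = iou_per_class(pred, truth, num_classes)
    return sum(scores.values()) / len(scores)

truth = [[0, 0, 1],
         [0, 2, 2]]
pred = [[0, 0, 1],
        [0, 0, 2]]
per_class = iou_per_class(pred, truth)
score = mean_iou(pred, truth)
```

In this toy case one AR pixel is mislabeled as background, giving per-class values of 0.75 (BG), 1.0 (TC), and 0.5 (AR), and a mean of 0.75.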
What has happened since Gordon Bell?
• Horovod:
• Hierarchical (hybrid) allreduce available in upstream code (NVIDIA/Amazon)
• Control plane bypass via caching available soon (NVIDIA)
• NCCL:
• Version 2.4.x introduced tree-based reduction algorithms for improved
performance at scale
• NERSC:
• ClimateNet: human annotated labels from domain experts
• project serves as science problem for the ISC-HPC19 student cluster AI challenge
• Gained insights for design and evaluation of upcoming NERSC Perlmutter system
Conclusions
• deep learning and HPC converge, achieving exascale performance
• compute capabilities of contemporary HPC systems can be utilized to tackle
challenging scientific deep learning problems
• HPO and convergence at scale still an open problem — but now we can do it
• software enhancements benefit deep learning community at large
• deep learning-powered techniques usher in a new era of precision analytics for
various science areas
Thank You

CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Exascale Deep Learning for Climate Analytics

  • 1. Exascale Deep Learning for Climate Analytics
      Thorsten Kurth*, Josh Romero*, Sean Treichler, Mayur Mudigonda, Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat, Michael Houston
      GTC Silicon Valley, 03/17/2019, San Jose, CA
  • 2. The Team
      Thorsten Kurth, Sean Treichler, Josh Romero, Mayur Mudigonda, Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat, Michael Houston
  • 3. Socio-Economic Impact of Extreme Weather Events
      • tropical cyclones and atmospheric rivers have a major impact on the modern economy and society
      • CA: 50% of rainfall through atmospheric rivers
      • FL: flooding, influence on insurance premiums and home prices
      • $200B worth of damage in 2017
      • costs of ~$10B/event for large events
      Photos: Harvey 2017, Katrina 2005, Berkeley 2019, Santa Rosa 2018 (credits: pixabay, Chris Samuel)
  • 4. Understanding Extreme Weather Phenomena
      • will there be more hurricanes?
      • will they be more intense?
      • will they make landfall more often?
      • will atmospheric rivers carry more water?
      • can they help mitigate droughts and decrease risk of forest fires?
      • will they cause flooding and heavy precipitation?
      Photos: pixabay
  • 9. Impact Quantification of Extreme Weather Events
      • detect hurricanes and atmospheric rivers in climate model projections
      • enable geospatial analysis of EW events and statistical impact studies for regions around the world
      • flexible and scalable detection algorithm
      • gear up for future simulations with ∼1 km² spatial resolution
      M.F. Wehner, doi:10.1002/2013MS000276
  • 10. Unique Challenges for Climate Analytics
      • interpret as segmentation problem
      • 3 classes: background (BG), tropical cyclones (TC), atmospheric rivers (AR)
      • deep learning has proven successful for these tasks
      • climate data is complex
      • high imbalance: more than 95% of pixels are background
      • high variance: the shape of events changes
      • many input channels w/ different properties
      • high resolution required
      • no static background, highly variable in space and time
      Image: NASA
  • 11. Unique Challenges for Deep Learning
      • need labeled data for supervised approach
      • can be leveraged from existing heuristic-based approaches
      • define neural network architecture
      • balance between compute performance and model accuracy
      • employ high productivity/flexibility frameworks for rapid prototyping
      • performance optimization requires holistic approach
      • hyperparameter tuning (HPO)
      • necessary for convergence and accuracy
  • 12. Unique Challenges for Deep Learning at Extreme Scale
      • data management
      • shuffling/loading/processing/feeding 20 TB dataset to keep GPUs busy
      • efficient use of remote filesystem
      • multi-node coordination and synchronization
      • synchronous reduction of O(50) MB across 27360 GPUs after each iteration
      • hyperparameter tuning (HPO)
      • convergence and accuracy challenging due to larger global batch sizes
  • 13. Label Creation: Atmospheric Rivers
      1. The climate model predicts water vapor, wind speeds and humidity
      2. These observables are used to compute the Integrated Water Vapor Transport
  • 14. Label Creation: Atmospheric Rivers
      3. Binarization by thresholding at 95th percentile
      4. Flood fill algorithm generates AR candidates by masking out regions in mid-latitudes
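Steps 3 and 4 can be illustrated with a minimal sketch in plain Python. It assumes a small 2D grid of Integrated Water Vapor Transport values, a nearest-rank definition of the 95th percentile, and 4-connected flood fill; the toy grid, the function names and the connectivity choice are illustrative assumptions, and the mid-latitude masking mentioned on the slide is not reproduced here.

```python
def percentile_95(values):
    """Nearest-rank 95th percentile of a flat list of numbers."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]

def binarize(field):
    """Mark every grid cell whose value exceeds the 95th percentile."""
    threshold = percentile_95([v for row in field for v in row])
    return [[v > threshold for v in row] for row in field]

def flood_fill_components(mask):
    """Group marked cells into 4-connected candidate regions."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack, region = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            components.append(sorted(region))
    return components

# Toy 10x10 field (hypothetical values): two separate high-transport patches.
field = [[0.0] * 10 for _ in range(10)]
field[2][1], field[2][2], field[2][3] = 500.0, 600.0, 700.0
field[7][8], field[8][8] = 800.0, 900.0

candidates = flood_fill_components(binarize(field))
```

On this toy field the two patches come out as two separate candidate regions of three and two cells.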
  • 15. Label Creation: Tropical Cyclones
      1. Extract cyclone center and radius using thresholds for pressure, temperature, and vorticity
      2. Binarize patch around cyclone center using thresholds for water vapor, wind, and precipitation
  • 16. Software: TensorFlow
      • high-productivity deep learning framework in Python with C++ backend, developed by Google
      • leverages optimized cuDNN library for performance-sensitive kernels (e.g. convolutions) on NVIDIA GPUs
      • dataflow-style programming and asynchronous graph execution
      • provides features for building I/O input pipeline
      • can be combined with external Python modules to provide good flexibility
  • 17. Software: Horovod
      • library developed by Uber for distributed training
      • provides straightforward wrappers to convert existing single-process TensorFlow programs to multi-process data-parallel programs
      • uses MPI for worker coordination/control
      • MPI is ubiquitous in HPC and available broadly on systems targeting HPC applications
      • can leverage MPI or NCCL for collective communication on NVIDIA GPUs (e.g. allreduce)
  • 18. Systems
      Piz Daint
      • Cray XC50 HPC system at CSCS, 5th on top500
      • 5320 nodes with Intel Xeon E5-2695v3 and 1 NVIDIA P100 GPU
      • Cray Aries interconnect in diameter 5 dragonfly topology
      • ~54.4 PetaFlop/s peak performance (FP32)
      Summit
      • leadership class HPC system at OLCF, 1st on top500
      • 4609 nodes with 2 IBM P9 CPU and 6 NVIDIA V100 GPU
      • 300 GB/s NVLink connection between 3 GPUs in a group
      • 800 GB available NVMe storage/node
      • dual-rail EDR InfiniBand in fat-tree topology
      • ~3.45 ExaFlop/s theoretical peak performance (FP16)
      Image credits: Carlos Jones (ORNL), CSCS
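The Summit peak figure can be reproduced with a short back-of-the-envelope calculation. The node and GPU counts come from the slide; the per-GPU value of 125 TFlop/s (V100 FP16 Tensor Core peak) is an assumption taken from NVIDIA's published specification, not from the slide itself.

```python
# Values from the slide.
summit_nodes = 4609
gpus_per_node = 6

# Assumption: V100 FP16 Tensor Core peak of 125 TFlop/s per GPU.
v100_fp16_peak_flops = 125e12

summit_gpus = summit_nodes * gpus_per_node
summit_peak_flops = summit_gpus * v100_fp16_peak_flops

print(f"{summit_gpus} GPUs, {summit_peak_flops / 1e18:.3f} ExaFlop/s (FP16 peak)")
```

Under that assumption the product lands just above 3.45 ExaFlop/s, consistent with the number quoted for Summit.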
  • 23. Single GPU (diagram: cuDNN)
      Things to consider:
      • Is my TensorFlow model efficiently using GPU resources?
      • Is my data input pipeline keeping up?
      • Is my TensorFlow model providing reasonable results?
  • 27. Single Node (diagram: six GPUs with cuDNN, connected via NCCL and MPI)
      Things to consider:
      • Is my data input pipeline still keeping up?
      • Is my data-parallel TensorFlow model providing reasonable results?
      • How is my performance scaling over PCIe/NVLink using Horovod?
  • 29. Multi Node (diagram: nodes connected via NCCL and MPI)
      Things to consider:
      • Is my data-parallel TensorFlow model still providing reasonable results?
      • How is my performance scaling over NVLink + InfiniBand using Horovod?
      • How do I distribute my data across so many nodes?
  • 30. Deep Learning Models for Extreme Weather Segmentation
      • Tiramisu (encoder, decoder): 35 layers, 7.8M parameters, 4.2 TF/sample
      • DeepLabv3+ (encoder, ASPP, decoder): 66 layers, 43.7M parameters, 14.4 TF/sample
      • both networks: input 1152×768 with 16 channels, output 1152×768 with 3 channels
      [Figure: layer-by-layer architecture diagrams of both networks]
  • 31. Data Staging
      • 250 training samples/GPU (15 GB), sampled w/ replacement
      • each file will be read at most once from FS
      • files shared between nodes via MPI (mpi4py)
      • preprocess and feed data to GPU asynchronously using tf.data and Python multiprocessing

      Dataset size            Required BW (27K GPUs)   GPFS/Lustre   BurstBuffer   NVMe or DRAM
      20 TB (~63K samples)    3.8 TB/s                 ~400 GB/s     ~2 TB/s       ~26 TB/s

      [Figure: each node holds 1.5K samples on its NVMe; samples are shuffled between nodes]
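The staging idea (draw per-GPU samples with replacement, read every needed file at most once from the file system, then share files between ranks) can be sketched in a single process. This is a simulation under stated assumptions, not the mpi4py implementation from the talk: the function name, the round-robin assignment of readers, and the file and rank counts are all hypothetical.

```python
import random

def stage_samples(num_files, num_ranks, samples_per_rank, seed=0):
    rng = random.Random(seed)
    # Each rank draws its local training set with replacement.
    wanted = [[rng.randrange(num_files) for _ in range(samples_per_rank)]
              for _ in range(num_ranks)]
    # Every distinct file that any rank needs gets exactly one reader rank
    # (round-robin assignment is an assumption made for this sketch).
    needed = sorted({f for draw in wanted for f in draw})
    reader_of = {f: i % num_ranks for i, f in enumerate(needed)}
    # Simulated file system access: one read per needed file.
    fs_reads = {}
    cache = {}
    for f in reader_of:
        cache[f] = "payload-%d" % f
        fs_reads[f] = fs_reads.get(f, 0) + 1
    # Stand-in for the MPI exchange: every rank obtains its files from the
    # rank that read them instead of going back to the file system.
    local = [[cache[f] for f in draw] for draw in wanted]
    return local, fs_reads, reader_of

local, fs_reads, reader_of = stage_samples(
    num_files=1000, num_ranks=8, samples_per_rank=250)
```

The value 250 matches the per-GPU sample count on the slide; everything else is scaled down for illustration.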
  • 32. On-Node I/O Pipeline
      • files are in HDF5 with single sample + label/file
      • list of filenames passed to TensorFlow Dataset API (tf.data)
      • HDF5 serialization bottleneck addressed with multiprocessing + h5py
      • extract and batch using tf.data input pipeline
      [Figure: ordered file list (data-2107-12-26-02-4.h5, data-2107-12-26-03-1.h5, ...) → shuffle → 4-way parallel read + preprocess (CPU) → batch → GPU; network icon from asimovinstitute.org/neural-network-zoo]
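The shuffle, 4-way parallel read and batch stages can be sketched with the standard library alone. This is not the tf.data + h5py pipeline from the talk: `load_sample` is a hypothetical stand-in for reading and preprocessing one HDF5 file, and a thread pool replaces the multiprocessing workers to keep the example self-contained.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def load_sample(filename):
    # Hypothetical stand-in for "open HDF5 file, extract sample + label, preprocess".
    return {"file": filename, "data": len(filename)}

def batches(filenames, batch_size, workers=4, seed=0):
    """Shuffle the file list, load files with `workers` parallel readers, yield batches."""
    files = list(filenames)
    random.Random(seed).shuffle(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch = []
        for sample in pool.map(load_sample, files):
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch  # final partial batch

# File names below are made up for the example.
files = ["data-%02d.h5" % i for i in range(10)]
out = list(batches(files, batch_size=4))
```

In the real pipeline the loaders run on the CPU while the GPU consumes earlier batches, which is what keeps I/O off the critical path.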
  • 33. Single Node Performance
      • GPU execution profiled with CUDA profiler, kernels grouped by category
      • convolution kernels: use latest cuDNN, favor higher computational intensity
      • pay attention to memory layout to reduce transposes and copies
      • tuning input pipeline on CPU to keep off critical path

      DeepLabv3+ FP16 Training
      Category              # Kern  Time (ms)  Math (TF)  Mem (GB)  % Time  % Math  % Mem
      Forward
        Convolutions           158      147.9       9.61      27.6    18.1    52.0   20.7
        Point-wise             829       52.3      < 0.1      24.3     6.4           51.6
      Backward
        Convolutions           195      300.2      19.21      50.5    36.7    51.2   18.7
        Point-wise             157       25.6      < 0.1       6.3     3.1           27.3
      Optimizer               1219        3.9      < 0.1       1.1     0.5           31.3
      Copies / Transposes      708      213.2          -      92.6    26.1           48.3
      Allreduce (NCCL)          30       58.7      < 0.1       0.6     7.2            1.1
      Type Conversions         201        1.3          -       0.6     0.2           51.3
      GPU Idle                           14.2                          1.7
      Total                   3497      817.3      28.82     203.6            28.2   27.7
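The % Math and % Mem columns appear to express achieved throughput as a fraction of the GPU's peak. The sketch below recomputes them for three rows, assuming V100 peaks of 125 TF/s (FP16 Tensor Cores) and 900 GB/s (HBM2 bandwidth); both peak values are assumptions and are not stated on the slide.

```python
# Assumed per-GPU peaks (not stated on the slide).
PEAK_MATH_TF_PER_S = 125.0
PEAK_MEM_GB_PER_S = 900.0

def utilization(time_ms, math_tf, mem_gb):
    """Return (% of peak math, % of peak memory bandwidth), rounded to 0.1."""
    seconds = time_ms / 1000.0
    pct_math = 100.0 * (math_tf / seconds) / PEAK_MATH_TF_PER_S
    pct_mem = 100.0 * (mem_gb / seconds) / PEAK_MEM_GB_PER_S
    return round(pct_math, 1), round(pct_mem, 1)

# Time (ms), Math (TF), Mem (GB) taken from the profile rows.
forward_conv = utilization(147.9, 9.61, 27.6)
backward_conv = utilization(300.2, 19.21, 50.5)
total = utilization(817.3, 28.82, 203.6)
```

With these assumed peaks the recomputed values match the percentages listed for the forward convolutions, backward convolutions and total rows.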
  • 34. Improvements to Horovod: Original Control Plane
      [Figure: workers w1 ... w4 each report a list of ready tensors: {1, 2, 5, 8, 13}, {2, 3, 5, 10, 13}, {1, 3, 5, 10, 13, 14}, {1, 3, 5, 7, 13}. The lists are gathered on w1, which intersects them to {5, 13} and broadcasts the result; all workers then allreduce tensors 5 and 13.]
  • 35. Improvements to Horovod: Tree-based Control Plane
      [Figure: workers w1 ... wN arranged in a tree; asynchronous gather + intersect up the tree, tree-based broadcast back down, then allreduce of tensors 5 and 13.]
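The control-plane decision (which tensors are ready on every worker) amounts to intersecting the per-worker ready lists. A minimal sketch of a tree-shaped reduction over those lists follows; the pairwise (binary) tree and the function name are assumptions made for illustration, and the actual Horovod implementation may shape its tree differently.

```python
def tree_intersect(ready_sets):
    """Intersect per-worker ready lists pairwise, level by level."""
    level = [set(s) for s in ready_sets]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            # An unpaired worker at the end of a level is passed up unchanged.
            next_level.append(pair[0] & pair[1] if len(pair) == 2 else pair[0])
        level = next_level
    return level[0]

# Ready lists as shown on the control-plane slide.
ready = [
    {1, 2, 5, 8, 13},
    {2, 3, 5, 10, 13},
    {1, 3, 5, 10, 13, 14},
    {1, 3, 5, 7, 13},
]
common = tree_intersect(ready)
```

The number of sequential intersection steps grows with the depth of the tree rather than with the number of workers, which is the point of replacing a single gathering coordinator.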
  • 36. Improvements to Horovod: Hybrid All-Reduce
      • NCCL uses NVLink for high throughput, but ring-based algorithms are latency-limited at scale
      • hybrid NCCL/MPI strategy uses strengths of both
      • one inter-node allreduce per virtual NIC
      • MPI work overlaps well with GPU computation
      [Figure: intra-node allreduce (NCCL) → 4x inter-node allreduce (MPI) → 4x intra-node broadcast (NCCL)]
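The three stages can be simulated in one process to check that they yield the same result as a flat global sum. This sketch models gradients as plain lists, splits the node-level result into 4 shards to mirror the four inter-node allreduces, and uses hypothetical names throughout; it does not call NCCL or MPI.

```python
def hybrid_allreduce(grads, shards=4):
    """grads[node][gpu] is a list of numbers; returns the summed buffer per GPU."""
    n = len(grads[0][0])
    # Stage 1: intra-node reduction (NCCL in the talk).
    node_sums = [[sum(g[i] for g in node) for i in range(n)] for node in grads]
    # Stage 2: one inter-node allreduce per shard (MPI in the talk).
    total = [0] * n
    for s in range(shards):
        lo, hi = s * n // shards, (s + 1) * n // shards
        for i in range(lo, hi):
            total[i] = sum(node_sum[i] for node_sum in node_sums)
    # Stage 3: intra-node broadcast of the full result (NCCL in the talk).
    return [[list(total) for _ in node] for node in grads]

# Toy setup (hypothetical sizes): 2 nodes x 3 GPUs, 8 gradient values each.
grads = [[[10 * node + gpu + i for i in range(8)] for gpu in range(3)]
         for node in range(2)]
result = hybrid_allreduce(grads)
expected = [sum(g[i] for node in grads for g in node) for i in range(8)]
```

Every GPU ends up with the same buffer, equal to the sum over all GPUs on all nodes.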
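  • 37. Gradient Pipelining (Lag)
      [Figure, lag-0 (fully synchronous): each worker w1 ... wN computes its gradient g^k, the allreduce produces the average ḡ^k, and every worker applies ḡ^k in the same step.]
      [Figure, lag-1: gradients pass through per-worker queues q1 ... qN; while g^k is produced, the allreduce runs on g^(k-1) and the workers apply ḡ^(k-1).]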
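The effect of lag on the update rule can be seen on a toy problem. The sketch below minimizes f(w) = w²/2 with plain gradient descent and a queue that delays each gradient by `lag` steps; the objective, learning rate and step count are illustrative assumptions and say nothing about convergence of the actual networks.

```python
from collections import deque

def grad(w):
    # Gradient of the toy objective f(w) = 0.5 * w**2.
    return w

def train(lag, steps=200, lr=0.1, w0=1.0):
    """Gradient descent in which each gradient is applied `lag` steps late."""
    w = w0
    pending = deque()
    for _ in range(steps):
        pending.append(grad(w))       # gradient computed in this step
        if len(pending) > lag:
            w -= lr * pending.popleft()  # apply the gradient from `lag` steps ago
    return w

w_lag0 = train(lag=0)  # fully synchronous
w_lag1 = train(lag=1)  # one-step stale gradients
```

With lag-1 the weight update no longer waits for the current allreduce to finish, so communication can overlap with the computation of the next gradient.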
  • 38. Importance of Node-Local Memory
      • ~10% performance penalty for reading from global file system
  • 39. Scaling Tiramisu
      • FP16-model sensitive to communication
      • FP16-model BW-bound (only 2.5x faster than FP32)
      • almost ideal scaling for both precisions on Summit when gradient lag is used
  • 40. Scaling DeepLabv3+
      • FP16-model sensitive to communication
      • FP16-model BW-bound (only 2.5x faster than FP32)
      • excellent scaling for both precisions on Summit when gradient lag is used
  • 45. Concurrency/Precision and Convergence
      • ~2.1x improvement in time to solution
  • 47. Segmentation Animation
      • best result for intersection-over-union (IoU) obtained: ∼73%
      • result at large scale (batch-size > 1500): IoU ∼55%
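For reference, intersection-over-union for a 3-class segmentation (BG, TC, AR) can be computed per class and then averaged. The sketch below works on flat lists of class labels; the label encoding (0, 1, 2), the toy data and the unweighted mean are assumptions, and the talk does not specify how its IoU figures were aggregated.

```python
def iou_per_class(pred, truth, num_classes=3):
    """Per-class IoU over two equally long sequences of class labels."""
    scores = {}
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # skip classes absent from both prediction and truth
            scores[c] = inter / union
    return scores

# Toy labels (hypothetical): 0 = background, 1 = tropical cyclone, 2 = atmospheric river.
truth = [0, 0, 0, 0, 1, 1, 2, 2]
pred = [0, 0, 0, 1, 1, 1, 2, 0]

scores = iou_per_class(pred, truth)
mean_iou = sum(scores.values()) / len(scores)
```

Because more than 95% of pixels are background, per-class IoU is more informative here than plain pixel accuracy.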
  • 49. What has happened since Gordon Bell?
      Horovod:
      • Hierarchical (hybrid) allreduce available in upstream code (NVIDIA/Amazon)
      • Control plane bypass via caching available soon (NVIDIA)
      NCCL:
      • Version 2.4.x introduced tree-based reduction algorithms for improved performance at scale
      NERSC:
      • ClimateNet: human-annotated labels from domain experts
      • project serves as science problem for the ISC-HPC19 student cluster AI challenge
      • Gained insights for design and evaluation of upcoming NERSC Perlmutter system
  • 50. Conclusions
      • deep learning and HPC converge, achieving exascale performance
      • compute capabilities of contemporary HPC systems can be utilized to tackle challenging scientific deep learning problems
      • HPO and convergence at scale are still an open problem, but now we can do it
      • software enhancements benefit deep learning community at large
      • deep learning-powered techniques usher in a new era of precision analytics for various science areas