SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
Perspective on HPC-enabled AI
Tim Barr
September 7, 2017
AI is Everywhere
Copyright© 2017 Cray Inc. 2
Deep Learning Component of AI
The punchline: Deep Learning is a High Performance
Computing problem
• Delivers benefits similar to HPC in other disciplines
• The value is in the decisions that are enabled
• Characterized by the same underlying factors
• Large amount of computation
• Large amount of data motion (I/O and network)
• The same methods work
• HPC Technology and HPC Best Practice apply directly to DL
3
Deep Learning Training: Behind the Scenes
Compute gradients locally
Global average of gradients
PnP1 P2
Process samples
} One
Mini-batch
Deploying lots of computational power requires lots of communication.
} One
Mini-batchRepeat…
Computationally-intensive training phase
Copyright© 2017 Cray Inc. 4
High
Performance
Simulation
High
Performance
Machine and
Deep Learning
Why Are We Here?
Faster is
better
More accurate
is better
Computationally
Intensive
Communication
Intensive
Copyright© 2017 Cray Inc. 5
Let’s Use Weather As An Example
• More Accurate is
Better
• At100km (top) and
25km (bottom)
• Missed tropical
cyclones and big
waves up to 30 meters
high
• Faster is Better
• Higher resolution
simulation requires 64X
more computation
http://www.nersc.gov/news-publications/nersc-news/science-news/2017-
2/researchers-catch-extreme-waves-with-high-resolution-modeling
Copyright© 2017 Cray Inc.
6
HPC and AI Will Converge
Big Data HPC
40% Reduction in error
rates when 10x more data is
being used in coordination
with AI in speech recognition 1
28% believe HPC
will allow them to scale
computationally to build
deep learning
algorithms that can take
advantage of high
volumes of data 1
2xDigital data
is doubling in size
every two years,
and by 2020 the
digital universe
will reach 44
zettabytes 2
Machine Learning
Deep Learning
1. “Are AI/Machine Learning/Deep Learning in Your Company’s Future?”,
insideBigData + NVIDIA
2. EMC Digital Universe with Research & Analysis by IDC Copyright© 2017 Cray Inc.
7
What is Deep Learning ?
ARTIFICIAL INTELLIGENCE
Design of intelligent systems that augments human productivity. Systems
that help decision makers do what they do best; leveraging computers doing what they do best
Sense Comprehend Predict Act and Adapt
ANALYTICS MACHINE LEARNING
Search for the what, when, where and why Learn patterns from the past to predict future
Leverage domain and data science to query
datasets for insights:
Unsupervised
Group, cluster and
organize content with
domain-specific
heuristic models
Supervised
Train mathematical
predictive models with
labelled data
Descriptive What happened?
Diagnostic Why did it happen? DEEPLEARNING
Predictive What will happen? Train and use neural networks as a predictive model
Prescriptive How to make it happen? Vision Speech Language
Copyright© 2017 Cray Inc.
8
“AI and machine learning have reached a critical tipping point and will
increasingly augment and extend virtually every technology enabled
service, thing or application.”
“The combination of extensive parallel processing power, advanced
algorithms and massive data sets to feed the algorithms has
unleashed this new era.”
Gartner’s Top 10 Strategic Technology Trends for 2017
“Fast data is just as important as big data. In 2016, we’ll witness
the emergence of a new class of real-time applications in e-
commerce and financial technology services powered by super-
speedy data analytics. ‘Fast data’ is the second iteration of big
data, and it will create a lot of value.”
Fortune Magazine, December 2015
In a competitive international
economy, advanced AI combined
with supercomputing are essential
ingredients for:
▪ Solution of strategically important
problems
▪ Maintaining global leadership in
industry, government and
academia
▪ Creating next generation
technologies, products and
services
Performance will be an AI Innovation and
Adoption Driver
Copyright© 2017 Cray Inc.
9
Deep Learning Will Require Supercomputing
• An AI Revolution Started For
Courageous Enterprises
• Yes, Deep Learning Warrants All
The Fuss
• Expect To Need Thousands Of
Cores
10
Copyright© 2017 Cray Inc.
Deep Learning with Supercomputers
NERSC – Deep Learning in Science
11
Opportunities to apply
DL widely in support of
classic HPC simulation
and modelling
Copyright© 2017 Cray Inc.
Deep Learning in Automotive
Noise, Vibration and Harshness at Daimler
• Noise, Vibration and
Harshness is a traditional
HPC application used in
automotive and aerospace
• Deep Learning has the
potential to do an
automatic evaluation of
results in complex, multi-
component, non-linear
applications
Copyright© 2017 Cray Inc.
12
Deep Learning Examples in Manufacturing
Aerospace Drones
10-fold increase in the commercial drone
fleet by 2021…FAA, 2017
Digital Twin
“Top 10 technologies for 2017”,
Gartner
Autonomous Vehicle
OEMs will invest $7 billion in
development…Frost &Sullivan, 2016
Leveraging data analytics and deep learning between engineering disciplines
and across the enterprise has great potential for product quality and innovation
Copyright© 2017 Cray Inc.
13
Will not see
ROI
imminently
Will not see
ROI for
sometime
Beginning to
see ROI
See significant
ROI
17%46%25%10%
ROI Timeline
When Should You Start?
A Sample from the Financial Services Sector
Source: Innovita Partners, 7/2017, exclusively for Cray
▪ ROI payoff will be 1 – 2 years
▪ Time to begin experimentation
is now
<1 year 1 year 1 to 2 years 3 to 4 years 5 to 7 years
Copyright© 2017 Cray Inc.
14
Why Deep Learning Now?
Adjustable weights
Weights are not learned
Learnable weights and
threshold
XOR Problem Solution to nonlinearly
separable problems
Big computation, local
optima/overfitting
Limitations of
learning prior
Kernel function:
Human intervention
Hierarchical feature learning
Electronic
brain
Perceptron ADALINE XOR Backpropagation Deep LearningSVM
AI WinterGolden Age
"Large
Enough"
Data to
Train
Compute
Power
Advanced
Algorithms
and Software
Frameworks
Data
Science
Expertise
Deep
Learning
Now
Image Source: Andrew L. Beam. (2017, February 13). Deep Learning 101 – Part 1:History and Background[Blog post]. Retrieved from
https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html
15
Deep Learning Challenges
“AI systems still demand considered design,
knowledge engineering and model building”, ForresterAI
TechRadarQ1 2017
▪ A lot to learn for practitioners and end-users:
▪ Large, complex workflows
▪ Different Toolkits + Data Movement + Network
▪ Defining the value returned to the business
▪ Training times grow with data sizes and complexity:
▪ Days to Weeks
▪ Compounded with hyper parameter optimization
(O(1000) is not unrealistic)
Copyright© 2017 Cray Inc.
16
HPC and AI
Enabling resource intensive training by delivering
performance efficiencies and scalability
Architectures
Software
Platforms
▪Deep Learning Platforms - dense
GPU to scalable platforms with
optimized software stacks
▪Apply HPC best practices and
expertise to improve deep
learning frameworks and core
algorithms
Expertise
Copyright© 2017 Cray Inc.
17
Reduce Total Workflow Time
Why? The Deep Neural Net Training Problem
• DNN model with weights on all connections
• Largest models now hundreds of layers,
and millions (to billions) of nodes
• Large set of labeled training data
• Idealized training algorithm:
• For every minibatch of training samples:
• run samples forward through the model
• compute the error vs. the training data
• back-propagate error through the NN to update the weights (gradient descent)
• After all data processed, iteratively optimize hyperparameters until
required accuracy is achieved
A (not particularly deep) neural net
Copyright© 2017 Cray Inc.
18
Reduce Total Workflow Time
▪ Minutes, Hours:
▪ Interactive research!
Instant gratification!
▪ 1-4 days
▪ Tolerable
▪ Interactivity replaced by
running many
experiments in parallel
▪ 1-4 weeks:
▪ High value experiments
only
▪ Progress stalls
▪ >1 month
▪ Don’t even try
Data
Acquisition
Data
Preparation
Model
Training
Model
Testing
Source: Large-Scale Deep Learning for
Intelligent Computer Systems, Jeff
Dean, Google
Apply HPC best practices
and expertise to improve
deep learning frameworks
and core algorithms
Copyright© 2017 Cray Inc.
19
0
100
200
300
400
500
600
700
64 Nodes 128 Nodes 256 Nodes 512 Nodes 1024 Nodes 2048 Nodes
EpochElapsedTime(Seconds)
“Applying a supercomputing approach to optimize deep
learning workloads represents a powerful breakthrough
for training and evaluating deep learning algorithms at
scale. Our collaboration with Cray and CSCS has
demonstrated how the Microsoft Cognitive Toolkit can be
used to push the boundaries of deep learning.”
- Dr. Xuedong Huang, distinguished engineer, Microsoft AI and
Research
Microsoft Cognitive Toolkit
Cray Focus: Deep Learning Training at Scale
CNTK: Distributed Version vs Cray MPI Parallel Implementation
Copyright© 2017 Cray Inc.
▪ Apply HPC Best Practices and Cray Expertise
to improve DL systems and core algorithms
with real-world use cases
▪ Collaborations across Cray customers and
other stakeholders
▪ Currently optimizing different toolkits:
▪ CNTK
▪ TensorFlow
▪ MXNet
20
HPC Focus: Comprehensive Systems
Configuration
Monitoring
Serving
Infrastructure
Data
Collection
Feature
Extraction
Data
Verification
Machine
Resource
Management
Analysis Tools
ML
Code
Process
Management Tools
“Only a small fraction of real-world ML systems is composed of
the ML code, as shown by the small black box in the middle.
The required surrounding infrastructure is vast and complex.”
-Adapted from Hidden Technical Debt in Machine Learning Systems,
Sculley et. al., NIPS ‘15
Copyright© 2017 Cray Inc.
21
HPC Supports the Entire AI Workflow
Deep Learning
workflows are not
limited to training.
● Similar to other HPC
and analytics
workloads, significant
portions of DL jobs are
devoted to data
collection, preparation
and management.
Data
Acquisition
Data
Preparation
Model
Training
Model
Testing
• Cleansing
• Shaping
• Enrichment
Data Annotation
(Ground Truth)
Test
Set
Validation
Set
Train
Model
Evaluate Performance and
optimize model
Cross-
Validation
Iterative
Training
Set
Copyright© 2017 Cray Inc.
22
AI is everywhere… Even the grocery store
23
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bank
pkaviya
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 

Was ist angesagt? (20)

Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
 
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Simplifying AI Infrastructure: Lessons in Scaling on DGX SystemsSimplifying AI Infrastructure: Lessons in Scaling on DGX Systems
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems
 
CLOUD ENABLING TECHNOLOGIES.pptx
 CLOUD ENABLING TECHNOLOGIES.pptx CLOUD ENABLING TECHNOLOGIES.pptx
CLOUD ENABLING TECHNOLOGIES.pptx
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bank
 
Gpu Systems
Gpu SystemsGpu Systems
Gpu Systems
 
High Performance Computing - The Future is Here
High Performance Computing - The Future is HereHigh Performance Computing - The Future is Here
High Performance Computing - The Future is Here
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
 
Dual-core processor
Dual-core processorDual-core processor
Dual-core processor
 
Presentation on graphics processing unit (GPU)
Presentation on graphics processing unit (GPU)Presentation on graphics processing unit (GPU)
Presentation on graphics processing unit (GPU)
 
GTC 2022 Keynote
GTC 2022 KeynoteGTC 2022 Keynote
GTC 2022 Keynote
 
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big DataABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data
 
Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)
 
Cuda tutorial
Cuda tutorialCuda tutorial
Cuda tutorial
 
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPCHPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
HPC DAY 2017 | HPE Strategy And Portfolio for AI, BigData and HPC
 
Graphics processing unit ppt
Graphics processing unit pptGraphics processing unit ppt
Graphics processing unit ppt
 
HPC in the Cloud
HPC in the CloudHPC in the Cloud
HPC in the Cloud
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 
Distributed Computing ppt
Distributed Computing pptDistributed Computing ppt
Distributed Computing ppt
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware Landscape
 

Ähnlich wie Perspective on HPC-enabled AI

Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Ian Gomez
 
Data Con LA 2022 - Democratizing AI Across Clouds: Low-Cost, Easy-to-Deploy M...
Data Con LA 2022 - Democratizing AI Across Clouds: Low-Cost, Easy-to-Deploy M...Data Con LA 2022 - Democratizing AI Across Clouds: Low-Cost, Easy-to-Deploy M...
Data Con LA 2022 - Democratizing AI Across Clouds: Low-Cost, Easy-to-Deploy M...
Data Con LA
 
"1,000X in Three Years: How Embedded Vision is Transitioning from Exotic to E...
"1,000X in Three Years: How Embedded Vision is Transitioning from Exotic to E..."1,000X in Three Years: How Embedded Vision is Transitioning from Exotic to E...
"1,000X in Three Years: How Embedded Vision is Transitioning from Exotic to E...
Edge AI and Vision Alliance
 

Ähnlich wie Perspective on HPC-enabled AI (20)

The Transformation of HPC: Simulation and Cognitive Methods in the Era of Big...
The Transformation of HPC: Simulation and Cognitive Methods in the Era of Big...The Transformation of HPC: Simulation and Cognitive Methods in the Era of Big...
The Transformation of HPC: Simulation and Cognitive Methods in the Era of Big...
 
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
 
Using Algorithmia to leverage AI and Machine Learning APIs
Using Algorithmia to leverage AI and Machine Learning APIsUsing Algorithmia to leverage AI and Machine Learning APIs
Using Algorithmia to leverage AI and Machine Learning APIs
 
Building Data Science Ecosystems for Smart Cities and Smart Commerce
Building Data Science Ecosystems for Smart Cities and Smart CommerceBuilding Data Science Ecosystems for Smart Cities and Smart Commerce
Building Data Science Ecosystems for Smart Cities and Smart Commerce
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Center of Excellence
Center of Excellence Center of Excellence
Center of Excellence
 
Deep learning at supercomputing scale by Rangan Sukumar from Cray
Deep learning at supercomputing scale  by Rangan Sukumar from CrayDeep learning at supercomputing scale  by Rangan Sukumar from Cray
Deep learning at supercomputing scale by Rangan Sukumar from Cray
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Data Tells the Story - Greenplum Summit 2018
Data Tells the Story - Greenplum Summit 2018Data Tells the Story - Greenplum Summit 2018
Data Tells the Story - Greenplum Summit 2018
 
Innovating to Create a Brighter Future for AI, HPC, and Big Data
Innovating to Create a Brighter Future for AI, HPC, and Big DataInnovating to Create a Brighter Future for AI, HPC, and Big Data
Innovating to Create a Brighter Future for AI, HPC, and Big Data
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and CarsPractical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
 
Enterprise deep learning lessons bodkin o reilly ai sf 2017
Enterprise deep learning lessons bodkin o reilly ai sf 2017Enterprise deep learning lessons bodkin o reilly ai sf 2017
Enterprise deep learning lessons bodkin o reilly ai sf 2017
 
Data Con LA 2022 - Democratizing AI Across Clouds: Low-Cost, Easy-to-Deploy M...
Data Con LA 2022 - Democratizing AI Across Clouds: Low-Cost, Easy-to-Deploy M...Data Con LA 2022 - Democratizing AI Across Clouds: Low-Cost, Easy-to-Deploy M...
Data Con LA 2022 - Democratizing AI Across Clouds: Low-Cost, Easy-to-Deploy M...
 
Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017Austin,TX Meetup presentation tensorflow final oct 26 2017
Austin,TX Meetup presentation tensorflow final oct 26 2017
 
Webinar - Patient Readmission Risk
Webinar - Patient Readmission RiskWebinar - Patient Readmission Risk
Webinar - Patient Readmission Risk
 
Data Science Salon: Applying Machine Learning to Modernize Business Processes
Data Science Salon: Applying Machine Learning to Modernize Business ProcessesData Science Salon: Applying Machine Learning to Modernize Business Processes
Data Science Salon: Applying Machine Learning to Modernize Business Processes
 
"1,000X in Three Years: How Embedded Vision is Transitioning from Exotic to E...
"1,000X in Three Years: How Embedded Vision is Transitioning from Exotic to E..."1,000X in Three Years: How Embedded Vision is Transitioning from Exotic to E...
"1,000X in Three Years: How Embedded Vision is Transitioning from Exotic to E...
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 

Mehr von inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 

Mehr von inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Perspective on HPC-enabled AI

  • 1. Perspective on HPC-enabled AI Tim Barr September 7, 2017
  • 2. AI is Everywhere Copyright© 2017 Cray Inc. 2
  • 3. Deep Learning Component of AI The punchline: Deep Learning is a High Performance Computing problem • Delivers benefits similar to HPC in other disciplines • The value is in the decisions that are enabled • Characterized by the same underlying factors • Large amount of computation • Large amount of data motion (I/O and network) • The same methods work • HPC Technology and HPC Best Practice apply directly to DL 3
  • 4. Deep Learning Training: Behind the Scenes Compute gradients locally Global average of gradients PnP1 P2 Process samples } One Mini-batch Deploying lots of computational power requires lots of communication. } One Mini-batchRepeat… Computationally-intensive training phase Copyright© 2017 Cray Inc. 4
  • 5. High Performance Simulation High Performance Machine and Deep Learning Why Are We Here? Faster is better More accurate is better Computationally Intensive Communication Intensive Copyright© 2017 Cray Inc. 5
  • 6. Let’s Use Weather As An Example • More Accurate is Better • At100km (top) and 25km (bottom) • Missed tropical cyclones and big waves up to 30 meters high • Faster is Better • Higher resolution simulation requires 64X more computation http://www.nersc.gov/news-publications/nersc-news/science-news/2017- 2/researchers-catch-extreme-waves-with-high-resolution-modeling Copyright© 2017 Cray Inc. 6
  • 7. HPC and AI Will Converge Big Data HPC 40% Reduction in error rates when 10x more data is being used in coordination with AI in speech recognition 1 28% believe HPC will allow them to scale computationally to build deep learning algorithms that can take advantage of high volumes of data 1 2xDigital data is doubling in size every two years, and by 2020 the digital universe will reach 44 zettabytes 2 Machine Learning Deep Learning 1. “Are AI/Machine Learning/Deep Learning in Your Company’s Future?”, insideBigData + NVIDIA 2. EMC Digital Universe with Research & Analysis by IDC Copyright© 2017 Cray Inc. 7
  • 8. What is Deep Learning ? ARTIFICIAL INTELLIGENCE Design of intelligent systems that augments human productivity. Systems that help decision makers do what they do best; leveraging computers doing what they do best Sense Comprehend Predict Act and Adapt ANALYTICS MACHINE LEARNING Search for the what, when, where and why Learn patterns from the past to predict future Leverage domain and data science to query datasets for insights: Unsupervised Group, cluster and organize content with domain-specific heuristic models Supervised Train mathematical predictive models with labelled data Descriptive What happened? Diagnostic Why did it happen? DEEPLEARNING Predictive What will happen? Train and use neural networks as a predictive model Prescriptive How to make it happen? Vision Speech Language Copyright© 2017 Cray Inc. 8
  • 9. “AI and machine learning have reached a critical tipping point and will increasingly augment and extend virtually every technology enabled service, thing or application.” “The combination of extensive parallel processing power, advanced algorithms and massive data sets to feed the algorithms has unleashed this new era.” Gartner’s Top 10 Strategic Technology Trends for 2017 “Fast data is just as important as big data. In 2016, we’ll witness the emergence of a new class of real-time applications in e- commerce and financial technology services powered by super- speedy data analytics. ‘Fast data’ is the second iteration of big data, and it will create a lot of value.” Fortune Magazine, December 2015 In a competitive international economy, advanced AI combined with supercomputing are essential ingredients for: ▪ Solution of strategically important problems ▪ Maintaining global leadership in industry, government and academia ▪ Creating next generation technologies, products and services Performance will be an AI Innovation and Adoption Driver Copyright© 2017 Cray Inc. 9
  • 10. Deep Learning Will Require Supercomputing • An AI Revolution Started For Courageous Enterprises • Yes, Deep Learning Warrants All The Fuss • Expect To Need Thousands Of Cores 10 Copyright© 2017 Cray Inc.
  • 11. Deep Learning with Supercomputers NERSC – Deep Learning in Science 11 Opportunities to apply DL widely in support of classic HPC simulation and modelling Copyright© 2017 Cray Inc.
  • 12. Deep Learning in Automotive Noise, Vibration and Harshness at Daimler • Noise, Vibration and Harshness is a traditional HPC application used in automotive and aerospace • Deep Learning has the potential to do an automatic evaluation of results in complex, multi- component, non-linear applications Copyright© 2017 Cray Inc. 12
  • 13. Deep Learning Examples in Manufacturing Aerospace Drones 10-fold increase in the commercial drone fleet by 2021…FAA, 2017 Digital Twin “Top 10 technologies for 2017”, Gartner Autonomous Vehicle OEMs will invest $7 billion in development…Frost &Sullivan, 2016 Leveraging data analytics and deep learning between engineering disciplines and across the enterprise has great potential for product quality and innovation Copyright© 2017 Cray Inc. 13
  • 14. Will not see ROI imminently Will not see ROI for sometime Beginning to see ROI See significant ROI 17%46%25%10% ROI Timeline When Should You Start? A Sample from the Financial Services Sector Source: Innovita Partners, 7/2017, exclusively for Cray ▪ ROI payoff will be 1 – 2 years ▪ Time to begin experimentation is now <1 year 1 year 1 to 2 years 3 to 4 years 5 to 7 years Copyright© 2017 Cray Inc. 14
  • 15. Why Deep Learning Now? Adjustable weights Weights are not learned Learnable weights and threshold XOR Problem Solution to nonlinearly separable problems Big computation, local optima/overfitting Limitations of learning prior Kernel function: Human intervention Hierarchical feature learning Electronic brain Perceptron ADALINE XOR Backpropagation Deep LearningSVM AI WinterGolden Age "Large Enough" Data to Train Compute Power Advanced Algorithms and Software Frameworks Data Science Expertise Deep Learning Now Image Source: Andrew L. Beam. (2017, February 13). Deep Learning 101 – Part 1:History and Background[Blog post]. Retrieved from https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html 15
  • 16. Deep Learning Challenges “AI systems still demand considered design, knowledge engineering and model building”, ForresterAI TechRadarQ1 2017 ▪ A lot to learn for practitioners and end-users: ▪ Large, complex workflows ▪ Different Toolkits + Data Movement + Network ▪ Defining the value returned to the business ▪ Training times grow with data sizes and complexity: ▪ Days to Weeks ▪ Compounded with hyper parameter optimization (O(1000) is not unrealistic) Copyright© 2017 Cray Inc. 16
  • 17. HPC and AI Enabling resource intensive training by delivering performance efficiencies and scalability Architectures Software Platforms ▪Deep Learning Platforms - dense GPU to scalable platforms with optimized software stacks ▪Apply HPC best practices and expertise to improve deep learning frameworks and core algorithms Expertise Copyright© 2017 Cray Inc. 17
  • 18. Reduce Total Workflow Time Why? The Deep Neural Net Training Problem • DNN model with weights on all connections • Largest models now hundreds of layers, and millions (to billions) of nodes • Large set of labeled training data • Idealized training algorithm: • For every minibatch of training samples: • run samples forward through the model • compute the error vs. the training data • back-propagate error through the NN to update the weights (gradient descent) • After all data processed, iteratively optimize hyperparameters until required accuracy is achieved A (not particularly deep) neural net Copyright© 2017 Cray Inc. 18
  • 19. Reduce Total Workflow Time ▪ Minutes, Hours: ▪ Interactive research! Instant gratification! ▪ 1-4 days ▪ Tolerable ▪ Interactivity replaced by running many experiments in parallel ▪ 1-4 weeks: ▪ High value experiments only ▪ Progress stalls ▪ >1 month ▪ Don’t even try Data Acquisition Data Preparation Model Training Model Testing Source: Large-Scale Deep Learning for Intelligent Computer Systems, Jeff Dean, Google Apply HPC best practices and expertise to improve deep learning frameworks and core algorithms Copyright© 2017 Cray Inc. 19
  • 20. 0 100 200 300 400 500 600 700 64 Nodes 128 Nodes 256 Nodes 512 Nodes 1024 Nodes 2048 Nodes EpochElapsedTime(Seconds) “Applying a supercomputing approach to optimize deep learning workloads represents a powerful breakthrough for training and evaluating deep learning algorithms at scale. Our collaboration with Cray and CSCS has demonstrated how the Microsoft Cognitive Toolkit can be used to push the boundaries of deep learning.” - Dr. Xuedong Huang, distinguished engineer, Microsoft AI and Research Microsoft Cognitive Toolkit Cray Focus: Deep Learning Training at Scale CNTK: Distributed Version vs Cray MPI Parallel Implementation Copyright© 2017 Cray Inc. ▪ Apply HPC Best Practices and Cray Expertise to improve DL systems and core algorithms with real-world use cases ▪ Collaborations across Cray customers and other stakeholders ▪ Currently optimizing different toolkits: ▪ CNTK ▪ TensorFlow ▪ MXNet 20
  • 21. HPC Focus: Comprehensive Systems Configuration Monitoring Serving Infrastructure Data Collection Feature Extraction Data Verification Machine Resource Management Analysis Tools ML Code Process Management Tools “Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.” -Adapted from Hidden Technical Debt in Machine Learning Systems, Sculley et. al., NIPS ‘15 Copyright© 2017 Cray Inc. 21
  • 22. HPC Supports the Entire AI Workflow Deep Learning workflows are not limited to training. ● Similar to other HPC and analytics workloads, significant portions of DL jobs are devoted to data collection, preparation and management. Data Acquisition Data Preparation Model Training Model Testing • Cleansing • Shaping • Enrichment Data Annotation (Ground Truth) Test Set Validation Set Train Model Evaluate Performance and optimize model Cross- Validation Iterative Training Set Copyright© 2017 Cray Inc. 22
  • 23. AI is everywhere… Even the grocery store 23