Transparent Hardware Acceleration for Deep Learning
Indrajit Poddar (I.P), STSM, IBM Cognitive Systems
ipoddar@us.ibm.com
Enterprise Amnesia
[Chart: data over time, available data versus understood data]
• 80 million wearable health devices will be available by 2017.
• 2.5 quintillion bytes of data are generated daily by connected machines.
• There will be 28 times more sensor-enabled devices than people by the year 2020.
• A connected car generates 25 gigabytes of data per hour.
• 90% of cars will be connected by 2020.
• Devices generated 153 exabytes of healthcare data in 2013, increasing to 2,314 exabytes in 2020.
• Every human being on the planet will generate 1.7 megabytes of data per second by 2020.
2
see, hear, feel, talk, learn, write, read, find, discover
3
accident risk rate
90% inspection times
10X number of inspections
4
Culex quinquefasciatus, the Southern House Mosquito
Assembled the 1.2-billion-letter genome (faster and cheaper than ever before) to understand its vulnerabilities
5
Radiologists
Overloaded with medical imaging data.
Eye Fatigue.
Missed Diagnoses.
Radiologists are scarce.
6
Technology understands morphology: shape, attenuation, boundary.
91% accuracy in cancerous determination.
Holy grail? Premalignant lesions.
Save time, money, lives.
7
Oil & Gas billion-cell reservoir calculation:
• 22,000-node compute cluster with 716,000 Intel CPUs
• 30-node compute cluster with 60 POWER CPUs and 120 NVIDIA Tesla P100 GPUs, connected with NVIDIA NVLink
8
Artificial Intelligence and Cognitive Applications
Machine Learning
Deep Learning (Neural Networks)
The deeper you go, the more value you gain, and the more you know
9
As neural networks go deeper, they provide a dramatic increase in accuracy.
Higher-accuracy networks require far more computation, which increases prediction latency.
10
When scale-out is not enough…
Deep Learning model training is not easy to distribute
Training can take hours, days or weeks with large data sets
Real-time analytics possible with:
Unprecedented demand for offloaded computation, accelerators, and higher memory bandwidth systems
Resulting in…
Moore’s law is dying
11
OpenPOWER: Open Hardware for High Performance
Systems designed for big data analytics and superior cloud economics
Up to:
• 12 cores per CPU
• 96 hardware threads per CPU
• 1 TB RAM
• 7.6 Tb/s combined I/O bandwidth
• GPUs and FPGAs coming…
[Chart: OpenPOWER versus traditional Intel x86]
http://www.softlayer.com/POWER-SERVERS
https://mc.jarvice.com/
Why IBM's shrinking transistors look like a breakthrough for all of IT…
Faster, Lower Power, Smaller
5 nm: 50% more switches than 7 nm
First-ever 5 nm transistor structure (nanosheet)
40% more throughput at fixed power, or 75% power savings at the same throughput
13
IBM PowerAI: the accelerated platform for deep learning
Dramatically improved training times
The LARGER the problem, the BIGGER the NVLink advantage
More powerful vs. x86:
• 4X threads/core
• memory bandwidth
• more cache
+ UNIQUE CPU ↔ NVLink ↔ GPU
14
[Diagram: POWER8 CPU with DDR4 connected to the GPU and its graphics memory over NVLink, compared with a CPU with DDR4 connected to the GPU and its graphics memory over PCIe]
POWER8 with NVLink delivers 2.8X the bandwidth
PCIe data pipe: system bottleneck here
POWER8 NVLink data pipe
Performance… Faster Training and Inferencing
Unique innovation through OpenPOWER collaboration
15
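As a rough illustration of what a 2.8X bandwidth advantage could mean for moving data between host memory and the GPU, here is a minimal sketch. Only the 2.8X ratio comes from the slide; the baseline PCIe bandwidth and the payload size are assumptions chosen for illustration, and the model ignores latency and protocol overhead.

```python
"""Idealized host-to-GPU transfer time under two link bandwidths."""

NVLINK_BANDWIDTH_RATIO = 2.8   # from the slide: POWER8 with NVLink vs. PCIe
ASSUMED_PCIE_GB_PER_S = 16.0   # assumption for illustration only
ASSUMED_PAYLOAD_GB = 8.0       # assumption: amount of data to move


def transfer_seconds(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal transfer time, ignoring latency and protocol overhead."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return payload_gb / bandwidth_gb_per_s


pcie_seconds = transfer_seconds(ASSUMED_PAYLOAD_GB, ASSUMED_PCIE_GB_PER_S)
nvlink_seconds = transfer_seconds(
    ASSUMED_PAYLOAD_GB, ASSUMED_PCIE_GB_PER_S * NVLINK_BANDWIDTH_RATIO
)

print(f"PCIe:   {pcie_seconds:.3f} s")
print(f"NVLink: {nvlink_seconds:.3f} s")
print(f"Ratio:  {pcie_seconds / nvlink_seconds:.1f}x")
```

Under this idealized model the transfer time shrinks by the same factor as the bandwidth grows, which is why the benefit matters most when data movement, not compute, is the bottleneck.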
16
Large Model Support, Distributed Deep Learning: faster training times for data scientists
Traditional model support (competitors): limited memory on the GPU forces a trade-off in model size / data resolution
[Diagram: CPU with DDR4 connected over PCIe to the GPU and its graphics memory]
Large model support (PowerAI): use system memory and GPU to support more complex models and higher resolution data
[Diagram: POWER CPU with DDR4 connected over NVLink to the GPU and its graphics memory]
Performance… Faster Training and Inferencing
95% scaling efficiency on the Caffe deep learning framework over 256 NVIDIA GPUs in 64 systems
IBM Research achieved a new image recognition accuracy of 33.8% for a neural network trained on a very large data set (7.5M images). The previously published record demonstrated 29.8% accuracy.
https://www.ibm.com/blogs/research/2017/08/distributed-deep-learning/
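The scaling figure above can be unpacked with a little arithmetic. This is a sketch using the common definition of scaling efficiency (measured speedup divided by the number of GPUs); the slide does not state which definition IBM used, so treat that as an assumption. The function names are illustrative.

```python
"""What 95% scaling efficiency over 256 GPUs implies, under a common definition."""


def effective_speedup(num_gpus: int, efficiency: float) -> float:
    """Speedup over one GPU if efficiency = speedup / num_gpus."""
    return num_gpus * efficiency


def scaling_efficiency(
    single_gpu_seconds: float, multi_gpu_seconds: float, num_gpus: int
) -> float:
    """Efficiency computed from measured wall-clock times."""
    return single_gpu_seconds / (multi_gpu_seconds * num_gpus)


NUM_GPUS = 256     # from the slide
NUM_SYSTEMS = 64   # from the slide
EFFICIENCY = 0.95  # from the slide

gpus_per_system = NUM_GPUS // NUM_SYSTEMS
speedup = effective_speedup(NUM_GPUS, EFFICIENCY)

print(f"GPUs per system: {gpus_per_system}")
print(f"Effective speedup over one GPU: {speedup:.1f}x")
```

Read this way, 256 GPUs at 95% efficiency behave like roughly 243 perfectly scaled GPUs.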
Enterprise-ready software distribution built on open source
Tools for ease of development
Performance: faster training times for data scientists
17
Enterprise-Ready Software Distribution Built on Open Source
Precompiled and current open source frameworks
18
Tools for Ease of Development
Rich advisory and building toolsets to flatten time to value
AI Vision: rich toolset for image recognition neural networks
Automated deep learning toolkit for data preparation
DL Insight toolkit supports auto-training runs for hyperparameter tuning
+++
19
In my dreams I’m coding in an open data science framework, running on Spark and Power …in minutes
IBM Data Science Experience: Learn, Create, Collaborate
Tools for Ease of Development
20
Accelerators/GPUs in a Cloud Stack
21
[Diagram labels]
Containers and images
Accelerators
Clustering frameworks
Workload Aware Scheduling
Shared Resource Management
Emerging Workloads
Dev Ops & Micro Services
High Performance Computing: Design / Simulation / Modeling
‘New-gen Workloads’: Hadoop, Spark, Containers with Spark
IBM Cloud private
New
High Performance Analytics: Trade / Risk Analytics
IBM Data Science Experience
Deep Learning Training & Inference
Accelerated Deep Learning on OpenPOWER
Try on NIMBIX: https://mc.jarvice.com/
22
Build Deep Learning Docker Images Using PowerAI Software
23
Dockerfile extending NVIDIA base images for POWER:
FROM nvidia/cuda-ppc64le:8.0-cudnn6-devel-ubuntu16.04
ENV POWERAI_REPO mldl-repo-local_4.0.0_ppc64el.deb
# install basic tools
RUN apt-get update && apt-get install -y git wget ssh vim curl && \
    apt-get clean
# import PowerAI repo
RUN cd /tmp && \
    wget https://public.dhe.ibm.com/software/server/POWER/Linux/mldl/ubuntu/${POWERAI_REPO} && \
    dpkg -i ${POWERAI_REPO} && \
    rm ${POWERAI_REPO}
# install PowerAI
RUN apt-get update && apt-get install -y power-mldl && \
    apt-get clean
IBM Cloud private
See example: https://github.com/knm3000/nvidia-powerai/blob/master/Dockerfile
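Once the Dockerfile is saved, the image still has to be built. The sketch below only assembles and prints a `docker build` command line so it can be inspected before running it on a POWER host; the image tag `powerai-demo:latest` and the helper name are hypothetical and not part of the slides.

```python
"""Assemble (but do not run) a docker build command for the PowerAI Dockerfile."""
import shlex
from typing import List


def docker_build_command(
    tag: str, context_dir: str = ".", dockerfile: str = "Dockerfile"
) -> List[str]:
    """Return the argument list for `docker build -t <tag> -f <dockerfile> <context>`."""
    return ["docker", "build", "-t", tag, "-f", dockerfile, context_dir]


# Hypothetical tag, chosen for illustration.
command = docker_build_command("powerai-demo:latest")
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute the build; printing it first keeps the example safe to run anywhere.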
Run PowerAI software with NVIDIA Docker
24
• A Docker wrapper and tool to package and run GPU-based apps
• Enhances portability of images by using drivers on the host
• No need to include drivers in the Docker image
See blog: https://developer.ibm.com/linuxonpower/tutorials/powerai-docker-images/
https://github.com/NVIDIA/nvidia-docker/tree/ppc64le
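A quick way to check that a container can see the host's GPUs is to run `nvidia-smi` inside it through the wrapper. The sketch below assembles such a command under the assumption that the `nvidia-docker` wrapper from the linked repository is installed and mirrors `docker run`; the image tag is hypothetical.

```python
"""Assemble (but do not run) an nvidia-docker command that lists visible GPUs."""
import shlex
from typing import List


def nvidia_docker_run_command(image: str, container_command: List[str]) -> List[str]:
    """Return the argument list for `nvidia-docker run --rm <image> <command...>`."""
    return ["nvidia-docker", "run", "--rm", image, *container_command]


# Hypothetical image tag; nvidia-smi reports the GPUs visible in the container.
command = nvidia_docker_run_command("powerai-demo:latest", ["nvidia-smi"])
print(shlex.join(command))
```

Because the driver files come from the host at run time, the same image can be moved between hosts with different driver versions, which is the portability point made above.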
Manage GPU clusters with Kubernetes and IBM Cloud private
• Open source orchestration system for Docker containers on multiple hosts: https://kubernetes.io/
• GPU scheduling features are being contributed upstream
• IBM Cloud private: download the free Community Edition, ask on Slack
• Download RPM from https://www.rpmfind.net/linux/rpm2html/search.php?query=kubernetes&arch=ppc64le
| 25
http://on-demand.gputechconf.com/gtc/2017/presentation/s7258-seetharami-seelam-speed-up-deep-learning-service.pdf
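To make GPU scheduling concrete, the sketch below generates a pod manifest that requests one GPU. It assumes a cluster that exposes GPUs under the `nvidia.com/gpu` resource name used by the NVIDIA device plugin; the resource name on a given cluster depends on its Kubernetes version and plugin, so check before relying on it. The pod name and image tag are hypothetical.

```python
"""Generate a Kubernetes pod manifest (as JSON) that requests a GPU."""
import json
from typing import Any, Dict


def gpu_pod_manifest(name: str, image: str, gpu_count: int = 1) -> Dict[str, Any]:
    """Build a pod spec whose single container is limited to gpu_count GPUs."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Assumed resource name; varies with cluster setup.
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }


# Hypothetical pod name and image tag.
manifest = gpu_pod_manifest("caffe-inference", "powerai-demo:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.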
Show >100x speedup for Caffe inferencing with GPUs in PowerAI in NIMBIX in less than 5 minutes
PowerAI Trial Configurations in a public cloud:
• Docker container builds and comes up in minutes
• Single P100 GPU
  • 30 days with 60 hrs standard (120 for Sales referral)
  • 128GB RAM, 32 CPU threads, 1TB shared storage
• Quad P100 GPUs
  • 30 days with 120 hrs standard (more by request)
  • 512GB RAM, 128 CPU threads, 1TB shared storage
Nimbix Cloud Advantages
• Easier to use
• Highest Performance
• Ultra Fast Launch Times
• Lower Cost
• Faster time to Value
• Bare-Metal Acceleration
• Enterprise Accounting
• Application Marketplace
• Private Apps
https://www.slideshare.net/IndrajitPoddar/fast-scalable-easy-machine-learning-with-openpower-gpus-and-docker
Experience performance with productivity
A superior integrated stack and adequate hardware resources for deep learning insights
https://www.nimbix.net/ibm-power-nimbix-cloud
26
Accelerated training: days become hours
9 days → 4 hours
54x learning runs with POWER8
[Diagram: recognition of shape, attenuation, and boundary; repeated 4-hour training runs]
What will you do?
Iterate more and create more accurate models?
Create more models?
Both?
27
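The 54x figure on this slide follows directly from the two training times. A minimal worked check, using only the numbers on the slide (9 days and 4 hours):

```python
"""Check the 54x figure: 9 days of training reduced to 4 hours."""

HOURS_PER_DAY = 24


def speedup(baseline_hours: float, accelerated_hours: float) -> float:
    """Ratio of baseline training time to accelerated training time."""
    if accelerated_hours <= 0:
        raise ValueError("accelerated_hours must be positive")
    return baseline_hours / accelerated_hours


baseline_hours = 9 * HOURS_PER_DAY   # 9 days, from the slide
accelerated_hours = 4                # 4 hours, from the slide

ratio = speedup(baseline_hours, accelerated_hours)
runs_in_baseline_window = baseline_hours // accelerated_hours

print(f"Speedup: {ratio:.0f}x")
print(f"4-hour runs that fit in 9 days: {runs_in_baseline_window}")
```

The same number can be read two ways: one model trains 54 times faster, or 54 training runs fit in the time one run used to take, which is the choice the slide poses.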
Developer Resources for POWER systems
• Linux on POWER Developer Portal: https://developer.ibm.com/linuxonpower/
• Find open source Linux packages in popular OS distros: https://developer.ibm.com/linuxonpower/open-source-pkgs/
• Request free VMs from Oregon State University Open Source Lab: http://osuosl.org/services/powerdev/
• Get answers to Linux-specific questions in Stack Overflow: https://developer.ibm.com/answers/smartspace/linuxonpower/index.html
• See blogs on Deep Learning and PowerAI topics: https://developer.ibm.com/linuxonpower/blog/
| 28
Notices and Disclaimers
29
Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written
permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of
initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS
DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM
THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF
OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our
warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those
customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries
in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials
and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant
or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as
to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any
actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services
or products will ensure that the customer is in compliance with any law.
Notices and Disclaimers (continued)
30
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources.
IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to
non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of
any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES,
EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other
intellectual property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™,
FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on
Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®,
PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS,
Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are
trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of
IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
31
Thank You!!
Q & A

Transparent Hardware Acceleration for Deep Learning

  • 1. Transparent Hardware Acceleration for Deep Learning Indrajit Poddar (I.P), STSM, IBM Cognitive Systems ipoddar@us.ibm.com
  • 2. Data Time Available Data Understood Data Enterprise Amnesia 80 million wearable health devices will be available by 2017. 2.5 quintillion bytes of data generated daily by connected machines. There will be 28 times more sensor- enabled devices than people by the year 2020. 25 gigabytes of data per hour is generated by a connected car. 90% of cars will be connected by 2020. 153 exabytes of healthcare data generated by devices in 2013. Increasing to 2,314 exabytes in 2020. 1.7 megabytes of data per second generated by every human being on the planet by 2020. 2
  • 3. see hear feel talk learn write read find discover 3
  • 5. Assembled the 1.2 billion letter genome (faster and cheaper than ever before) to understand its vulnerabilities Culex quinquefasciatus The Southern House Mosquito 5
  • 6. Radiologists Overloaded with medical imaging data. Eye Fatigue. Missed Diagnoses. Radiologists are scarce. 6
  • 8. 22,000 computer node cluster 716,000 Intel CPUs 30 computer node cluster 60 Power CPUs 120 NVIDIA Tesla P100 GPUs NVidia NVLink Oil & Gas Billion Cell Reservoir calculation 8
  • 10. As neural networks go deeper, they provide a dramatic increase in accuracy. Higher accuracy networks require way higher computation which increases prediction latency. 10
  • 11. When scale-out is not enough… Deep Learning model training is not easy to distribute Training can take hours, days or weeks with large data-sets Real-time analytics possible with: Unprecedented demand for offloaded computation, accelerators, and higher memory bandwidth systems Resulting in…. Moore’s law is dying 11
  • 12. OpenPOWER: Open Hardware for High Performance 1 2 Systems designed for big data analytics and superior cloud economics Upto: 12 cores per cpu 96 hardware threads per cpu §1 TB RAM 7.6Tb/s combined I/O Bandwidth GPUs and FPGAs coming… OpenPOWER Traditional Intel x86 http://www.softlayer.com/POWER-SERVERS https://mc.jarvice.com/
  • 13. Why IBM's shrinking transistors look like a breakthrough for all of IT… Faster – Lower Power - Smaller 5 NM 50% more switches than 7nm 1st ever 5nm transistor structure (nanosheet) 40% more throughout @ fixed power …or 75% power savings at same throughput 13
  • 14. IBM PowerAI the accelerated platform for deep learning dramatically improved training times The LARGER the problem … the BIGGER the NVLink advantage 4Xthreads/core memory bandwidth more cache UNIQUE CPU ßà NVLink ßà GPU more powerful vs. x86 + 14
  • 15. Graphics Memory POWER8 CPU DDR4 GPU NVLink CPUDDR4 PCIe Graphics Memory POWER8 with NVLink delivers 2.8X the bandwidth PCIe Data Pipe POWER8 NVLink Data Pipe GPU Performance… Faster Training and Inferencing Unique innovation through OpenPower collaboration System Bottleneck Here 15
  • 16. 16 Large Model SupportDistributed Deep Learning faster training times for data scientists (Competitors) Limited memory on GPU forces trade-off in model size / data resolution POWER CPU DDR4 GPU NVLink Graphics Memory (PowerAI) Use system memory and GPU to support more complex models and higher resolution data Traditional Model Support à CPUDDR4 GPU PCIe Graphics Memory Performance… Faster Training and Inferencing 95% scaling efficiency on the Caffe deep learning framework over 256 NVIDIA GPUs in 64 systems IBM Research achieved a new image recognition accuracy of 33.8% for a neural network trained on a very large data set (7.5M images). The previous record published by demonstrated 29.8% accuracy. https://www.ibm.com/blogs/research/2017/08/distributed-deep-learning/
  • 17. enterprise-ready software distribution built on open source tools for ease of development performance faster training times for data scientists 17
  • 18. precompiled and current open source frameworks Enterprise-Ready Software Distribution Built on Open Source 18
  • 19. Tools for Ease of Development rich advisory and building toolsets to flatten time to value AI Vision rich toolset image recognition neural networks automated deep learning toolkit data preparation DL Insight toolkit supports auto-training runs for hyper parameter tuning +++ 19
  • 20. In my dreams I’m coding in an open data science framework, running on Spark and Power …in minutes IBM Data Science Experience Learn Create Collaborate Tools for Ease of Development 20
  • 21. Accelerators/GPUs in a Cloud Stack: Containers and images; Accelerators; Clustering frameworks; Workload Aware Scheduling; Shared Resource Management. Emerging Workloads: Dev Ops & Micro Services; High Performance Computing; Design / Simulation / Modeling; ‘New-gen Workloads’; Hadoop, Spark, Containers with Spark; IBM Cloud private; New High Performance Analytics; Trade / Risk Analytics; IBM Data Science Experience; Deep Learning Training & Inference.
  • 22. Accelerated Deep Learning on OpenPOWER. Try on NIMBIX: https://mc.jarvice.com/
  • 23. Build Deep Learning Docker Images Using PowerAI Software. Dockerfile extending NVIDIA base images for POWER:

      FROM nvidia/cuda-ppc64le:8.0-cudnn6-devel-ubuntu16.04
      ENV POWERAI_REPO mldl-repo-local_4.0.0_ppc64el.deb
      RUN apt-get update && apt-get install -y git wget ssh vim curl && apt-get clean
      # import PowerAI repo
      RUN cd /tmp && wget https://public.dhe.ibm.com/software/server/POWER/Linux/mldl/ubuntu/${POWERAI_REPO} && dpkg -i ${POWERAI_REPO} && rm ${POWERAI_REPO}
      # install PowerAI
      RUN apt-get update && apt-get install -y power-mldl && apt-get clean

    See example: https://github.com/knm3000/nvidia-powerai/blob/master/Dockerfile
  • 24. Run PowerAI software with NVIDIA Docker: a Docker wrapper and tool to package GPU-based apps. It enhances portability of images by using the drivers on the host, so there is no need to include drivers in the Docker image. See blog: https://developer.ibm.com/linuxonpower/tutorials/powerai-docker-images/ and https://github.com/NVIDIA/nvidia-docker/tree/ppc64le
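The build-then-run workflow from the two slides above can be sketched as a small Python helper that assembles the command lines. The image tag `powerai-demo:latest` is hypothetical, and the sketch assumes the `nvidia-docker` wrapper CLI from the linked repository is installed on the host; nothing is executed unless the commands are passed to `subprocess.run`.

```python
import shlex


def build_command(tag: str, dockerfile: str = "Dockerfile", context: str = ".") -> list:
    """Command line that builds an image from a Dockerfile in the given context."""
    return ["docker", "build", "-t", tag, "-f", dockerfile, context]


def run_command(tag: str, *container_args: str) -> list:
    """Command line that runs the image through the nvidia-docker wrapper,
    which makes the host's GPU driver available inside the container."""
    return ["nvidia-docker", "run", "--rm", tag, *container_args]


IMAGE_TAG = "powerai-demo:latest"  # hypothetical tag, not from the slides

build = build_command(IMAGE_TAG)
smoke_test = run_command(IMAGE_TAG, "nvidia-smi")  # lists the GPUs visible in the container

print(shlex.join(build))
print(shlex.join(smoke_test))
# To execute for real: subprocess.run(build, check=True)
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems if a tag or path contains unusual characters.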
  • 25. Manage GPU clusters with Kubernetes and IBM Cloud private • Kubernetes: open source orchestration system for Docker containers on multiple hosts: https://kubernetes.io/ • GPU scheduling features are being upstreamed • IBM Cloud private: download the free Community Edition, ask on Slack • Download RPM from https://www.rpmfind.net/linux/rpm2html/search.php?query=kubernetes&arch=ppc64le • See also: http://on-demand.gputechconf.com/gtc/2017/presentation/s7258-seetharami-seelam-speed-up-deep-learning-service.pdf
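As an illustration of GPU scheduling in Kubernetes, the sketch below builds a Pod manifest that requests GPUs through a resource limit. It assumes a cluster running the NVIDIA device plugin, which exposes the `nvidia.com/gpu` resource; clusters from the time of this deck may use a different, alpha resource name. The pod name and image are hypothetical.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1, command=None) -> dict:
    """Build a Pod manifest whose single container requests `gpus` GPUs."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    container = {
        "name": name,
        "image": image,
        # Assumption: the NVIDIA device plugin is installed on the cluster.
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
    }
    if command:
        container["command"] = list(command)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"restartPolicy": "Never", "containers": [container]},
    }


# Hypothetical pod name and image tag, used only for illustration.
manifest = gpu_pod_manifest("caffe-train", "powerai-demo:latest", gpus=2,
                            command=["nvidia-smi"])
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.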
  • 26. Show >100x speedup for Caffe inferencing with GPUs in PowerAI in NIMBIX in less than 5 minutes. PowerAI Trial Configurations in a public cloud: • Docker container builds and comes up in minutes • Single P100 GPU: 30 days with 60 hrs standard (120 for Sales referral); 128GB RAM, 32 CPU threads, 1TB shared storage • Quad P100 GPUs: 30 days with 120 hrs standard (more by request); 512GB RAM, 128 CPU threads, 1TB shared storage. Nimbix Cloud Advantages: • Easier to use • Highest Performance • Ultra Fast Launch Times • Lower Cost • Faster time to Value • Bare-Metal Acceleration • Enterprise Accounting • Application Marketplace • Private Apps. Experience performance with productivity: a superior integrated stack and adequate hardware resources for deep learning insights. https://www.slideshare.net/IndrajitPoddar/fast-scalable-easy-machine-learning-with-openpower-gpus-and-docker https://www.nimbix.net/ibm-power-nimbix-cloud
  • 27. Accelerated training: days become hours. Recognition (Shape, Attenuation, Boundary): a training run drops from 9 days to 4 hours, giving 54x learning runs with POWER8. What will you do? Iterate more and create more accurate models? Create more models? Both?
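The 54x figure follows directly from the two durations on the slide, as this short check shows. It assumes runs execute back to back with no idle time between them.

```python
HOURS_PER_DAY = 24


def runs_in_window(window_days: float, hours_per_run: float) -> float:
    """How many back-to-back runs of a given length fit in a time window."""
    if hours_per_run <= 0:
        raise ValueError("run length must be positive")
    return (window_days * HOURS_PER_DAY) / hours_per_run


# Figures from the slide: one run used to take 9 days and now takes 4 hours.
runs = runs_in_window(window_days=9, hours_per_run=4)
print(f"{runs:.0f} four-hour runs fit in the original 9 days")
```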
  • 28. Developer Resources for POWER systems • Linux on POWER Developer Portal: https://developer.ibm.com/linuxonpower/ • Find open source Linux packages in popular OS distros: https://developer.ibm.com/linuxonpower/open-source-pkgs/ • Request free VMs from Oregon State University Open Source Lab: http://osuosl.org/services/powerdev/ • Get answers to Linux specific questions in Stack Overflow: https://developer.ibm.com/answers/smartspace/linuxonpower/index.html • See blogs on Deep Learning and PowerAI topics: https://developer.ibm.com/linuxonpower/blog/
  • 29. Notices and Disclaimers. Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply. Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. 
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law. © 2016 International Business Machines C
  • 30. Notices and Disclaimers (cont’d). Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. 
A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.