Transparent Hardware Acceleration for Deep Learning

Indrajit Poddar (I.P), STSM, IBM Cognitive Systems
ipoddar@us.ibm.com

[Figure: data plotted against time. The gap between available data and understood data is labeled "Enterprise Amnesia".]
• 80 million wearable health devices will be available by 2017.
• 2.5 quintillion bytes of data are generated daily by connected machines.
• There will be 28 times more sensor-enabled devices than people by the year 2020.
• 25 gigabytes of data per hour is generated by a connected car. 90% of cars will be connected by 2020.
• 153 exabytes of healthcare data were generated by devices in 2013, increasing to 2,314 exabytes in 2020.
• 1.7 megabytes of data per second will be generated by every human being on the planet by 2020.

see, hear, feel, talk, learn, write, read, find, discover

accident risk rate
90% inspection times
10X number of inspections

Culex quinquefasciatus, the Southern House Mosquito
Assembled the 1.2-billion-letter genome (faster and cheaper than ever before) to understand its vulnerabilities

Radiologists
• Overloaded with medical imaging data.
• Eye fatigue.
• Missed diagnoses.
• Radiologists are scarce.

Technology understands morphology: shape, attenuation, boundary.
91% accuracy in cancerous determination.
Save time, money, lives.
Holy grail? Premalignant lesions.

Oil & Gas Billion Cell Reservoir calculation
• 22,000 computer node cluster: 716,000 Intel CPUs
• 30 computer node cluster: 60 Power CPUs, 120 NVIDIA Tesla P100 GPUs, NVIDIA NVLink

Artificial Intelligence and Cognitive Applications
Machine Learning
Deep Learning (Neural Networks)
The deeper you go, the more value you gain, and the more you know

As neural networks go deeper, they provide a dramatic increase in accuracy.
Higher-accuracy networks require far more computation, which increases prediction latency.

When scale-out is not enough…
Deep Learning model training is not easy to distribute
Training can take hours, days or weeks with large data sets
Real-time analytics possible with:
Unprecedented demand for offloaded computation, accelerators, and higher memory bandwidth systems
Resulting in…
Moore’s law is dying

OpenPOWER: Open Hardware for High Performance
Systems designed for big data analytics and superior cloud economics
Up to:
• 12 cores per CPU
• 96 hardware threads per CPU
• 1 TB RAM
• 7.6 Tb/s combined I/O bandwidth
• GPUs and FPGAs coming…
OpenPOWER
Traditional Intel x86
http://www.softlayer.com/POWER-SERVERS
https://mc.jarvice.com/

Why IBM's shrinking transistors look like a breakthrough for all of IT…
Faster – Lower Power – Smaller
5 nm: 50% more switches than 7 nm
1st ever 5 nm transistor structure (nanosheet)
40% more throughput @ fixed power… or 75% power savings at same throughput

IBM PowerAI: the accelerated platform for deep learning
dramatically improved training times
The LARGER the problem… the BIGGER the NVLink advantage
more powerful vs. x86: 4X threads/core, memory bandwidth, more cache
+ UNIQUE: CPU ↔ NVLink ↔ GPU

[Diagram: on POWER8 with NVLink, the POWER8 CPU (DDR4) connects to the GPU (graphics memory) over the POWER8 NVLink data pipe. On the comparison system, the CPU (DDR4) connects to the GPU (graphics memory) over the PCIe data pipe, marked "System Bottleneck Here".]
POWER8 with NVLink delivers 2.8X the bandwidth
GPU Performance… Faster Training and Inferencing
Unique innovation through OpenPower collaboration
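To make the 2.8X bandwidth claim concrete, here is a minimal sketch of how link bandwidth translates into host-to-GPU copy time. Only the 2.8X ratio comes from the slide. The 16 GB/s PCIe baseline and the 8 GB payload are assumptions chosen for illustration, and the model ignores latency and protocol overhead.

```python
def transfer_seconds(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Idealized copy time: payload size divided by link bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes / bandwidth_bytes_per_s


GB = 1e9
PCIE_BANDWIDTH = 16 * GB   # assumption: rough PCIe 3.0 x16 figure, not from the slides
NVLINK_RATIO = 2.8         # from the slide: NVLink delivers 2.8X the bandwidth
PAYLOAD = 8 * GB           # assumption: hypothetical batch of training data

pcie_time = transfer_seconds(PAYLOAD, PCIE_BANDWIDTH)
nvlink_time = transfer_seconds(PAYLOAD, PCIE_BANDWIDTH * NVLINK_RATIO)

print(f"PCIe:   {pcie_time:.3f} s")
print(f"NVLink: {nvlink_time:.3f} s")
print(f"Ratio:  {pcie_time / nvlink_time:.1f}x")
```

Under these assumptions the copy time shrinks by the same 2.8X factor, which matters most when data movement, not GPU compute, is the bottleneck.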

Large Model Support
[Diagram: Traditional Model Support → CPU with DDR4 connected over PCIe to a GPU with graphics memory. Large Model Support → POWER CPU with DDR4 connected over NVLink to a GPU with graphics memory.]
• Traditional Model Support (Competitors): Limited memory on GPU forces trade-off in model size / data resolution
• Large Model Support (PowerAI): Use system memory and GPU to support more complex models and higher resolution data
Performance… Faster Training and Inferencing

Distributed Deep Learning
faster training times for data scientists
• 95% scaling efficiency on the Caffe deep learning framework over 256 NVIDIA GPUs in 64 systems
• IBM Research achieved a new image recognition accuracy of 33.8% for a neural network trained on a very large data set (7.5M images). The previous published record demonstrated 29.8% accuracy.
https://www.ibm.com/blogs/research/2017/08/distributed-deep-learning/
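As a worked example of the scaling figure above, the sketch below uses the common definition of scaling efficiency (measured speedup divided by the number of GPUs). That definition is an assumption here; the linked blog post may measure it differently.

```python
def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Efficiency = speedup over one GPU divided by the GPU count."""
    return speedup / num_gpus


def effective_speedup(efficiency: float, num_gpus: int) -> float:
    """Speedup implied by a given efficiency at a given GPU count."""
    return efficiency * num_gpus


NUM_GPUS = 256      # from the slide
NUM_SYSTEMS = 64    # from the slide
EFFICIENCY = 0.95   # from the slide

gpus_per_system = NUM_GPUS // NUM_SYSTEMS
speedup = effective_speedup(EFFICIENCY, NUM_GPUS)

print(f"GPUs per system: {gpus_per_system}")
print(f"Implied speedup over one GPU: {speedup:.1f}x")
```

Under this definition, 95% efficiency on 256 GPUs corresponds to roughly a 243x speedup over a single GPU, with 4 GPUs in each of the 64 systems.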

• enterprise-ready software distribution built on open source
• tools for ease of development
• performance: faster training times for data scientists

Enterprise-Ready Software Distribution Built on Open Source
precompiled and current open source frameworks

Tools for Ease of Development
rich advisory and building toolsets to flatten time to value
• AI Vision: rich toolset for image recognition neural networks
• automated deep learning toolkit for data preparation
• DL Insight: toolkit supports auto-training runs for hyperparameter tuning
• +++
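For readers unfamiliar with automated hyperparameter tuning, the following is a generic grid-search sketch of what a series of auto-training runs does conceptually. It is not the DL Insight API: the objective function, parameter names, and grid values are all hypothetical stand-ins.

```python
import itertools


def toy_validation_loss(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for a full training run that returns a loss."""
    return (learning_rate - 0.01) ** 2 + abs(batch_size - 64) / 1000.0


def grid_search(objective, grid):
    """Try every combination in the grid and keep the lowest-loss one."""
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space, for illustration only.
grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}
best_params, best_score = grid_search(toy_validation_loss, grid)
print(best_params, best_score)
```

In practice each call to the objective is a complete training run, which is why shorter training times directly increase how much of the search space can be explored.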

Tools for Ease of Development
IBM Data Science Experience: Learn, Create, Collaborate
"In my dreams I’m coding in an open data science framework, running on Spark and Power… in minutes"

Accelerators/GPUs in a Cloud Stack
[Diagram: workloads running on a shared cloud stack]
Workloads:
• Emerging Workloads: Dev Ops & Micro Services
• High Performance Computing: Design / Simulation / Modeling
• ‘New-gen Workloads’: Hadoop, Spark, Containers
• High Performance Analytics: Trade / Risk Analytics
• New: Deep Learning Training & Inference
Stack layers:
• Workload Aware Scheduling
• Shared Resource Management
• Clustering frameworks
• Containers and images
• Accelerators
Platforms: IBM Cloud private with Spark, IBM Data Science Experience

Accelerated Deep Learning on OpenPOWER
Try on NIMBIX: https://mc.jarvice.com/

Build Deep Learning Docker Images Using PowerAI Software
Dockerfile extending NVIDIA base images for POWER:

    FROM nvidia/cuda-ppc64le:8.0-cudnn6-devel-ubuntu16.04
    ENV POWERAI_REPO=mldl-repo-local_4.0.0_ppc64el.deb
    # install basic tools
    RUN apt-get update && apt-get install -y git wget ssh vim curl && \
        apt-get clean
    # import PowerAI repo
    RUN cd /tmp && \
        wget https://public.dhe.ibm.com/software/server/POWER/Linux/mldl/ubuntu/${POWERAI_REPO} && \
        dpkg -i ${POWERAI_REPO} && \
        rm ${POWERAI_REPO}
    # install PowerAI
    RUN apt-get update && apt-get install -y power-mldl && \
        apt-get clean

See example: https://github.com/knm3000/nvidia-powerai/blob/master/Dockerfile

Run PowerAI software with NVIDIA Docker
• A Docker wrapper and tool to package and run GPU-based apps
• Enhance portability of images by using drivers on the host
• No need to include drivers in Docker image
See Blog: https://developer.ibm.com/linuxonpower/tutorials/powerai-docker-images/
https://github.com/NVIDIA/nvidia-docker/tree/ppc64le
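A minimal sketch of invoking the `nvidia-docker` wrapper from a script follows. It only assembles the command line, so it runs without Docker installed. The image tag `powerai-example:latest` is a hypothetical name for an image built from a PowerAI Dockerfile, and running `nvidia-smi` as a smoke test assumes the host drivers are exposed to the container by the wrapper.

```python
import shlex


def nvidia_docker_run(image, command, interactive=False):
    """Build the argv for `nvidia-docker run`, which wraps `docker run`."""
    argv = ["nvidia-docker", "run", "--rm"]
    if interactive:
        argv.append("-it")
    argv.append(image)
    argv.extend(command)
    return argv


IMAGE = "powerai-example:latest"  # hypothetical tag, not from the slides

argv = nvidia_docker_run(IMAGE, ["nvidia-smi"])
print(" ".join(shlex.quote(arg) for arg in argv))
```

To actually launch the container, the list can be passed to `subprocess.run(argv, check=True)` on a host with Docker, the NVIDIA driver, and nvidia-docker installed.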

Manage GPU clusters with Kubernetes and IBM Cloud private
• Open source orchestration system for Docker containers on multiple hosts: https://kubernetes.io/
• GPU scheduling features are being merged upstream
• IBM Cloud private: download the free Community Edition, ask on Slack
• Download RPM from https://www.rpmfind.net/linux/rpm2html/search.php?query=kubernetes&arch=ppc64le
http://on-demand.gputechconf.com/gtc/2017/presentation/s7258-seetharami-seelam-speed-up-deep-learning-service.pdf
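As an illustration of GPU scheduling in Kubernetes, the sketch below generates a pod manifest that requests one GPU. The resource name `nvidia.com/gpu` is an assumption: it is the name used by the NVIDIA device plugin in later Kubernetes releases, and clusters from the period of this talk may expose GPUs under a different name. The pod name is hypothetical; the image is the base image from the Dockerfile slide.

```python
import json


def gpu_pod_manifest(name, image, gpus, resource_name="nvidia.com/gpu"):
    """Return a Pod manifest (as a dict) that requests `gpus` GPUs."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": ["nvidia-smi"],
                    # GPUs are requested through resource limits.
                    "resources": {"limits": {resource_name: gpus}},
                }
            ],
        },
    }


manifest = gpu_pod_manifest(
    "powerai-gpu-test",  # hypothetical pod name
    "nvidia/cuda-ppc64le:8.0-cudnn6-devel-ubuntu16.04",
    gpus=1,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.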

Show >100x speedup for Caffe inferencing with GPUs in PowerAI in
NIMBIX in less than 5 minutes
Experience performance with productivity: a superior integrated stack and adequate hardware resources for deep learning insights
PowerAI Trial Configurations in a public cloud:
• Docker container builds and comes up in minutes
• Single P100 GPU
  - 30 days with 60 hrs standard (120 for Sales referral)
  - 128GB RAM, 32 CPU threads, 1TB shared storage
• Quad P100 GPUs
  - 30 days with 120 hrs standard (more by request)
  - 512GB RAM, 128 CPU threads, 1TB shared storage
Nimbix Cloud Advantages
• Easier to use
• Highest Performance
• Ultra Fast Launch Times
• Lower Cost
• Faster time to Value
• Bare-Metal Acceleration
• Enterprise Accounting
• Application Marketplace
• Private Apps
https://www.slideshare.net/IndrajitPoddar/fast-scalable-easy-machine-learning-with-openpower-gpus-and-docker
https://www.nimbix.net/ibm-power-nimbix-cloud

Acceleration training… days become hours
9 days → 4 hours
54x learning runs with POWER8
[Diagram: recognition by shape, attenuation, and boundary. One 9-day training run is compared with a series of repeated 4-hour runs.]
What will you do?
• Iterate more and create more accurate models?
• Create more models?
• Both?
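The 54x figure follows directly from the two training times on the slide, as this short check shows. It assumes the runs are back to back with no setup time in between.

```python
from datetime import timedelta

before = timedelta(days=9)   # training time before acceleration (from the slide)
after = timedelta(hours=4)   # training time after acceleration (from the slide)

speedup = before / after                 # ratio of the two durations
runs_in_old_window = before // after     # whole 4-hour runs that fit in 9 days

print(f"Speedup: {speedup:.0f}x")
print(f"Training runs possible in the original 9 days: {runs_in_old_window}")
```

In other words, 9 days is 216 hours, and 216 / 4 = 54 training runs in the time one run used to take.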

Developer Resources for POWER systems
• Linux on POWER Developer Portal: https://developer.ibm.com/linuxonpower/
• Find open source Linux packages in popular OS distros: https://developer.ibm.com/linuxonpower/open-source-pkgs/
• Request free VMs from Oregon State University Open Source Lab: http://osuosl.org/services/powerdev/
• Get answers to Linux specific questions in Stack Overflow: https://developer.ibm.com/answers/smartspace/linuxonpower/index.html
• See Blogs on Deep Learning and PowerAI topics: https://developer.ibm.com/linuxonpower/blog/

Notices and Disclaimers
Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written
permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of
initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS
DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM
THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF
OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our
warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those
customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries
in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials
and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant
or their specific situation.
It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as
to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any
actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services
or products will ensure that the customer is in compliance with any law.

Notices and Disclaimers Cont'd.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources.
IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to
non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of
any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES,
EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other
intellectual property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™,
FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on
Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®,
PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS,
Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are
trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of
IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
