2. #ibmedge
Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
and at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction and it
should not be relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or legal
obligation to deliver any material, code or functionality. Information about potential future products may not be
incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our products
remains at our sole discretion.
• Performance is based on measurements and projections using standard IBM benchmarks in a controlled
environment. The actual throughput or performance that any user will experience will vary depending upon
many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the
I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be
given that an individual user will achieve results similar to those stated here.
3. #ibmedge
About Indrajit (a.k.a. I.P.)
Expertise:
• Accelerated Cloud Data Services, Machine
Learning and Deep Learning
• Apache Spark, TensorFlow… with GPUs
• Distributed Computing (scale out and up)
• Cloud Foundry, Spectrum Conductor, Mesos,
Kubernetes, Docker, OpenStack, WebSphere
• Cloud computing on High Performance Systems
• OpenPOWER, IBM POWER
Indrajit Poddar
Senior Technical Staff Member,
Master Inventor, IBM Systems
ipoddar@us.ibm.com
Twitter: @ipoddar
8. #ibmedge
We will talk about
- Current state of Deep Learning
- Deep Learning for cancer diagnosis (Digital Pathology)
- TensorFlow framework for Deep Learning
- Distributing TensorFlow with Docker
- Faster training with TensorFlow on OpenPOWER and GPUs
- Infrastructure for TensorFlow as a Service
16. #ibmedge
Medical Data Analysis Example: Image classification
Comparing classification by humans and by machines
[Side-by-side slide images: a region detected visually by a doctor, and a region caught by a trained model]
17. #ibmedge
Time Scale: Before, Digital Pathology, Deep Learning
[Timeline figure, 1980 to 2005. Digital pathology: video cameras; progress in functional telemedicine; robotic microscopy; first fully functional WSI scanner. Deep Learning: introduction of ANNs; backpropagation algorithm (Yann LeCun et al.); "Deep Learning" for speech recognition]
18. #ibmedge
Machines are now learning the way we learn
[Figure: drawing from "Texture of the Nervous System of Man and the Vertebrates" by Santiago Ramón y Cajal, beside an artificial neural network diagram]
20. #ibmedge
Time Scale: Advances in Deep Learning
[Timeline figure, 2005 to 2016: Whole Slide Image (WSI) scanner; GPUs; 12 cores/socket with 8 threads/core]
21. #ibmedge
Open Source Deep Learning Libraries
IBM Machine Learning and Deep Learning distribution for Ubuntu on OpenPOWER:
http://openpowerfoundation.org/blogs/openpower-deep-learning-distribution/
(does not include TensorFlow and DL4J in the current release)
22. #ibmedge
Why TensorFlow?
• Authored by Google
• Open source
• Python API
• Learn with Jupyter notebooks and examples
• Supports distributed training
https://www.tensorflow.org/
23. #ibmedge
Why distribute in clusters and why use GPUs?
• Input data sets are becoming larger
• High resolution images
• Video feeds
• Large number of training features
• Training times are very long (hours, days, or weeks)
• Moore’s law is dying
• CPUs are not getting any faster
• Even the largest machine has limited capacity
24. #ibmedge
Distributed Deep Learning using TensorFlow
• TensorFlow (version 0.8.0 and later) can distribute compute-intensive tasks across
multiple nodes
• A Parameter Server stores the parameters (weight matrices)
• Clients (Workers) perform the computations
• Once computed, gradients are sent to the Parameter Server to update the stored parameters
[Cluster diagram: one Parameter Server task and Worker tasks spread across nodes #1 through #10 on the SuperVessel private network]

# define Parameter Server jobs:
with tf.device('/job:ps/task:%d' % taskID):
    ...

# define Worker jobs:
with tf.device('/job:worker/task:%d' % taskID):
    ...
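The parameter-server pattern described above can be illustrated with a toy sketch in plain Python (no TensorFlow required). The `ParameterServer` class and the linear model here are hypothetical stand-ins for the real `/job:ps` and `/job:worker` tasks:

```python
# Toy illustration of the parameter-server pattern: workers compute
# gradients against a shared parameter store, which applies the updates.
class ParameterServer:
    def __init__(self, dim, lr=0.1):
        self.weights = [0.0] * dim
        self.lr = lr

    def apply_gradients(self, grads):
        # Update the stored parameters with a gradient from one worker.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grads)]

def worker_step(weights, example, target):
    # One worker computes the gradient of squared error for a linear model.
    pred = sum(w * x for w, x in zip(weights, example))
    err = pred - target
    return [2 * err * x for x in example]

ps = ParameterServer(dim=2)
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
for _ in range(50):
    for x, y in data:  # each (x, y) stands in for one worker's batch
        ps.apply_gradients(worker_step(ps.weights, x, y))
# ps.weights has converged close to [1.0, -1.0]
```

In real distributed TensorFlow the workers run asynchronously on separate nodes; the toy loop above serializes them only to keep the sketch short.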
26. #ibmedge
Medical Data Analysis Example
The Problem: automated detection of metastases in whole-slide images of lymph node sections (source: Camelyon16)
The Solution: train a Deep Learning model and classify the whole-slide histology image at “Level 0”
27. #ibmedge
Questions to address
- How long does it take to train a model?
- How will performance scale with cluster size?
- How will scaling the cluster out affect accuracy?
28. #ibmedge
Deep Learning in a TensorFlow cluster
Goal: improve the training time for Camelyon16 without losing accuracy significantly.
100K images, ~2GB
4 training epochs
(~5.5k iterations at batch size 72)
VGG model
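As a quick sanity check on the iteration count quoted above (a sketch assuming each epoch makes one full pass over the 100K images):

```python
import math

# 100K images at batch size 72, for 4 epochs:
images, batch_size, epochs = 100_000, 72, 4
iters_per_epoch = math.ceil(images / batch_size)  # 1389 batches per pass
total_iters = iters_per_epoch * epochs
print(total_iters)  # 5556, matching the slide's "~5.5k iterations"
```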
30. #ibmedge
Medical Data Analysis Example: applying Deep Learning
Accuracy metrics: ROC
[Figure: ROC curves for the trained model, with a zoomed inset]
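An ROC curve plots true-positive rate against false-positive rate, and the area under it (AUC) is a common single-number accuracy summary. A minimal, framework-free sketch of AUC, using its rank interpretation (the function name and toy data are illustrative, not from the talk):

```python
def roc_auc(labels, scores):
    """AUC = probability that a randomly chosen positive example
    is scored above a randomly chosen negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: two positives, two negatives
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In practice a library routine (e.g. scikit-learn's `roc_auc_score`) would be used; the sketch just shows what the metric measures.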
33. #ibmedge
Example Dockerfile to create Deep Learning images
FROM ppc64le/ubuntu:14.04
MAINTAINER Mike Hollinger <mchollin@us.ibm.com>

#bring in some base utils
#enable apt-add-repository and wget for the next line and for the cuda installer to work correctly
RUN apt-get -y update && apt-get -y install software-properties-common wget build-essential bash-completion

#inexplicably, this needs to be first before vnc-related things will install successfully
RUN apt-get -y install dictionaries-common

#install VNC and VNC-related items
RUN apt-get -y install x11vnc xfce4 xvfb xfce4-artwork xubuntu-icon-theme

#install advanced toolchain and Linux SDK
RUN wget ftp://public.dhe.ibm.com/software/server/iplsdk/v1.9.0/packages/deb/repo/dists/trusty/B346CA20.gpg.key -O /tmp/B346CA20.gpg.key
RUN wget ftp://ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu/dists/precise/6976a827.gpg.key -O /tmp/6976a827.gpg.key
RUN wget http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu/public.gpg -O /tmp/xl_public.gpg
RUN apt-key add /tmp/B346CA20.gpg.key
RUN apt-key add /tmp/6976a827.gpg.key
RUN apt-key add /tmp/xl_public.gpg
RUN add-apt-repository "deb ftp://ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at9.0"
RUN apt-get -y update
RUN apt-get -y install advance-toolchain-at9.0-runtime \
    advance-toolchain-at9.0-devel \
    advance-toolchain-at9.0-perf \
    advance-toolchain-at9.0-mcore-libs

#install XL C/C++ Community Edition, auto-accepting the license (from Ke Wen Lin)
RUN apt-get -y install xlc.13.1.4 xlc-license-community.13.1.4
RUN mkdir -p /opt/ibm/xlC/13.1.4/lap/license/ && chmod a+rx /opt/ibm/xlC/13.1.4/lap/license
RUN echo "Status=9" >/opt/ibm/xlC/13.1.4/lap/license/status.dat
RUN /opt/ibm/xlC/13.1.4/bin/xlc_configure
RUN apt-get -y install ibm-sdk-lop

#bring in the ibm mldl PPA
RUN apt-add-repository -y ppa:ibmpackages/mldl

#bring local cuda repo with GPU driver 352.39 and CUDA 7.5 (install GPU drivers and libraries)
RUN wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1404-7-5-local_7.5-18_ppc64el.deb && \
    dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_ppc64el.deb && apt-get update && \
    apt-get install -y --no-install-recommends --force-yes cuda gpu-deployment-kit && \
    ln -s /usr/lib/nvidia-352/libnvidia-ml.so /usr/lib/libnvidia-ml.so && \
    rm cuda-repo-ubuntu1404-7-5-local_7.5-18_ppc64el.deb

#bring in and install cudnn
#copy then untar to handle ownership problems vs "add"
COPY rootfs/cudnn-7.0-linux-ppc64le-v3.0-prod.tgz /tmp/cudnn-7.0-linux-ppc64le-v3.0-prod.tgz
RUN tar --no-same-owner -xvf /tmp/cudnn-7.0-linux-ppc64le-v3.0-prod.tgz -C /usr/local

#install the MLDL frameworks (install deep learning software)
RUN apt-get update && apt-get -y install torch caffe theano
continued ..
37. #ibmedge
OpenPOWER: GPU support
Mesos supports GPUs (credit: Kevin Klaues, Mesosphere)
IBM Spectrum Conductor includes enhanced support for fine-grained GPU and CPU scheduling with Apache Spark and Docker
[Figure: GPU scheduling architecture diagram]
38. #ibmedge
POWER8 Core: Backbone of big data computing systems
• Enhanced micro-architecture
• Increased execution bandwidth
• SMT8 (eight hardware threads per core)
• Transactional Memory
• Vector/Scalar Unit: high-performance integer and FP vector processor
• Optimized for data-rich applications
[Die diagram: VSU, FXU, IFU, DFU, ISU, PC, and LSU units]
39. #ibmedge
Combined I/O Bandwidth = 7.6 Tb/s
[Diagram: two POWER8 processors connected by on-node and node-to-node SMP links, with memory buffers attached over DMI links and PCIe I/O]
Putting it all together with the memory links, on- and off-node SMP links as well as PCIe, at 7.6 Tb/s of chip I/O bandwidth
40. #ibmedge
New OpenPOWER Systems with NVLink
S822LC-hpc “Minsky”:
2 POWER8 CPUs with 4 NVIDIA® Tesla® P100 GPUs attached directly to the CPUs using NVIDIA’s NVLink high-speed interconnect
http://www-03.ibm.com/systems/power/hardware/s822lc-hpc/index.html
42. #ibmedge
Machine Learning and Deep Learning analytics on OpenPOWER
No code changes needed!
ATLAS (Automatically Tuned Linear Algebra Software)
43. #ibmedge
Challenges and what’s next
● Infrastructure issues:
○ Advanced resource scheduling with Platform Conductor and Kubernetes or Mesos
○ More GPUs per system (up to 4-16 cards) for improved power consumption and better density
● TensorFlow issues:
○ Resolve problems with TF-Slim and model convergence
○ Integrate HDFS or another Distributed FS with TensorFlow
○ Try Synchronous training and compare results with Asynchronous
● Improve model training for better accuracy:
○ Train on a 300K samples dataset
○ Increase the number of training iterations to 30 epochs
○ 2 iterations to update false-positive samples in the dataset
○ Use another model (change from VGG16 to Inception-v3)
44. #ibmedge
More related sessions at Edge
•Expo Center Demo
•Tue, Sept 20, 1:00-2:00PM, RM 312: Docker on IBM Power Systems: Build, Ship and Run
•Tue, Sept 20, 1:00-2:00PM, RM 313: Docker Containers for High Performance Computing
•Tue, Sept 20, 1:00-2:00PM, RM 317C: Lab: FPGA Virtualization and Operations Environment for
Accelerator Application Development on Cloud
•Tue, Sept 20, 2:15-3:15PM, RM 320: Bringing the Deep Learning Revolution into the Enterprise
•Tue, Sept 20, 5:00-6:00PM, RM 308, Thu, Sep 22, 09:45 AM - 10:45 AM : Enabling Cognitive
Workloads on the Cloud: GPU Enablement with Mesos, Docker and Marathon on POWER
•Wed, Sept 21, 9:45-10:45AM, RM 317 C: Lab: Fast, Scalable, Easy Machine Learning in the Cloud
with OpenPOWER, GPUs and Docker
46. #ibmedge
Notices and Disclaimers (cont’d)
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not
tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the
ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual
property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®,
FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG,
Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®,
PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®,
StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business
Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.