Talk at the ACM SF Bay Area Chapter on Deep Learning for medical imaging.
The talk covers use cases, special challenges and solutions for Deep Learning for Medical Image Analysis using Tensorflow+Keras. You will learn about:
- Use cases for Deep Learning in Medical Image Analysis
- Different DNN architectures used for Medical Image Analysis
- Special purpose compute / accelerators for Deep Learning (in the Cloud / On-prem)
- How to parallelize your models for faster training and for serving inference
- Optimization techniques to get the best performance from your cluster (like Kubernetes / Apache Mesos / Spark)
- How to build an efficient Data Pipeline for Medical Image Analysis using Deep Learning
- Resources to jump start your journey - like public data sets, common models used in Medical Image Analysis
2. Agenda
Use cases for Deep Learning in Medical Imaging
What is Deep Learning?
Deep Learning models in Medical Imaging
Rise of Specialized Compute
Techniques for Optimization
E2E Pipeline
Look into future
Steps for starting your journey
References
4. Deep Learning in Medical Imaging
Real-time Clinical Diagnostics (Enlitic)
Whole-body Portable Ultrasound (Butterfly Networks, Baylabs)
Radiology Assistant, Cloud Imaging AI (Zebra, Arterys)
Intelligent Stroke Care (Viz.ai)
Screening: Tumor, Diabetic Retinopathy (Google, Enlitic, IBM)
Oncology (Flatiron Health)
5. Skin Cancer
5.4M cases of non-melanoma skin cancer each year in the US
20% of Americans will get skin cancer
Actinic Keratosis (pre-cancer) affects 58M Americans
78K melanomas each year, 10K deaths
$8.1B in annual US costs for skin cancer
Source: Nature
6. Successes!
Mammographic mass classification
Brain Lesions
Airway leakages
Diabetic Retinopathy
Prostate Segmentation
Breast cancer metastasis
Skin Lesion Classification
Bone suppression in Chest X-Rays
Source: arXiv:1702.05747
7. What is Deep Learning?
AI Neural Networks composed of many layers
Learn like humans
Automated Feature Learning
Layers are like Image Filters
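To make the "layers are like image filters" point concrete, here is a minimal sketch in plain Python of what one filter does when it slides over an image. The tiny image and the hand-written Sobel kernel are illustrative assumptions; in a trained network the kernel values are learned rather than chosen by hand.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    This is the cross-correlation that deep learning frameworks call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 4x6 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

# Hand-written Sobel kernel that responds to vertical edges.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

edges = conv2d(image, sobel_x)
for row in edges:
    print(row)  # [0, 36, 36, 0] for each row: strong response at the edge only
```

A convolutional layer applies many such kernels in parallel and stacks the responses, so later layers see filtered versions of the image rather than raw pixels.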
11. Shift towards Specialized Compute
Special purpose Cloud
Google TPU, Microsoft Brainwave, Intel Nervana, IBM Power AI, Nvidia v100
Bare Metal Cloud – Preview AWS, GCE coming April 2018
Spectrum: CPU, GPU, FPGA, Custom ASICs
Edge Compute: Hardware accelerators, AI SoC
Intel Neural Compute Stick, Nvidia Jetson, Nvidia Drive PX (Self driving cars)
Architectures
Cluster Compute, HPC, Neuromorphic, Quantum compute
Complexity in Software
Model tuning/optimizations specific to hardware
Growing need for compilers to optimize based on deployment hardware
Workload specific compute: Model training, Inference
12. CPU Optimizations
Leverage high-performance compute tools
Intel Python, Intel Math Kernel Library (MKL), NNPack (for multi-core CPUs)
Compile Tensorflow from Source for CPU Optimizations
Proper Batch size, using all cores & memory
Proper Data Format
NCHW for CPUs vs Tensorflow default NHWC
Use Queues for Reading Data
Source: Intel Research Blog
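The NCHW vs NHWC point above is about memory layout: the same tensor values, stored in a different order. As a rough sketch in plain Python (the helper name `nhwc_to_nchw` and the toy 2x2 image are assumptions for illustration, not Tensorflow code), the reordering looks like this:

```python
def nhwc_to_nchw(flat, n, h, w, c):
    """Reorder a flat NHWC buffer (channels interleaved) into NCHW (channel planes)."""
    out = [None] * (n * c * h * w)
    for ni in range(n):
        for hi in range(h):
            for wi in range(w):
                for ci in range(c):
                    src = ((ni * h + hi) * w + wi) * c + ci   # NHWC offset
                    dst = ((ni * c + ci) * h + hi) * w + wi   # NCHW offset
                    out[dst] = flat[src]
    return out

# One hypothetical 2x2 RGB image; pixel p stores its channels as (Rp, Gp, Bp).
nhwc = ["R0", "G0", "B0", "R1", "G1", "B1",
        "R2", "G2", "B2", "R3", "G3", "B3"]

nchw = nhwc_to_nchw(nhwc, n=1, h=2, w=2, c=3)
print(nchw)
# ['R0', 'R1', 'R2', 'R3', 'G0', 'G1', 'G2', 'G3', 'B0', 'B1', 'B2', 'B3']
```

In NCHW each channel is a contiguous plane, which is the layout the slide recommends for CPU runs. In practice the framework performs this conversion; the loop only shows what moves where.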
14. Parallelize your models
Data Parallelism
Tensorflow Estimator + Experiments
Parameter Server, Worker cluster
Intel BigDL Spark Cluster
Baidu’s Ring AllReduce
Uber’s Horovod TensorFusion
HyperTune Google Cloud ML
Model Parallelism
Graph too large to fit on one machine
Tensorflow Model Towers
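For the data-parallel options above, the ring allreduce pattern can be simulated in a few lines of plain Python. This is a single-process sketch under stated assumptions (every worker holds an equal-length gradient list whose length divides evenly by the number of workers); it is not the Baidu or Horovod implementation, only the communication pattern.

```python
def ring_allreduce(worker_grads):
    """Simulate a sum ring allreduce; returns each worker's final buffer."""
    n = len(worker_grads)
    size = len(worker_grads[0])
    assert size % n == 0, "sketch assumes the buffer splits evenly into n chunks"
    chunk = size // n
    data = [list(g) for g in worker_grads]

    def span(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, scatter-reduce: in step s, worker i sends chunk (i - s) % n
    # to its right neighbour, which adds it to its own copy.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, data[i][span((i - s) % n)]) for i in range(n)]
        for i, c, payload in sends:
            dst = (i + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]

    # Now worker i holds the fully summed chunk (i + 1) % n.
    # Phase 2, allgather: pass the finished chunks around the ring.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, data[i][span((i + 1 - s) % n)]) for i in range(n)]
        for i, c, payload in sends:
            data[(i + 1) % n][span(c)] = payload

    return data

# Three hypothetical workers, each with its own local gradients.
grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
reduced = ring_allreduce(grads)
print(reduced[0])  # [111, 222, 333]
```

Each worker only ever talks to its right neighbour and sends 2 * (n - 1) chunks in total, so per-worker traffic stays roughly constant as workers are added, unlike a single parameter server that receives every gradient.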
16. Workload Partitioning
Source: Amazon MXNet
Minimize communication time
Place neighboring layers on same GPU
Balance workload between GPUs
Different layers have different memory-compute properties
Model on left more balanced
LSTM unrolling: ↓ memory, ↑ compute time
Encode/Decode: ↑ memory
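One way to picture the two goals above (keep neighboring layers together, balance load across GPUs) is as a contiguous partitioning problem. The sketch below assumes each layer has a single positive integer cost and searches for the smallest per-GPU budget that fits; the cost numbers and the function name `partition_layers` are hypothetical, and real placement would also weigh memory and communication.

```python
def partition_layers(costs, num_gpus):
    """Split layers into at most `num_gpus` contiguous groups,
    minimizing the cost of the heaviest group."""

    def groups_for(cap):
        # Greedily fill each GPU up to `cap`, never reordering layers.
        groups, current, total = [], [], 0
        for idx, cost in enumerate(costs):
            if current and total + cost > cap:
                groups.append(current)
                current, total = [], 0
            current.append(idx)
            total += cost
        groups.append(current)
        return groups

    # Binary search for the smallest cap that needs no more than num_gpus groups.
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(groups_for(mid)) <= num_gpus:
            hi = mid
        else:
            lo = mid + 1
    return groups_for(lo)

# Hypothetical per-layer costs for a six-layer model, placed on three GPUs.
layer_costs = [4, 2, 3, 7, 1, 5]
groups = partition_layers(layer_costs, num_gpus=3)
print(groups)  # [[0, 1, 2], [3, 4], [5]]
print([sum(layer_costs[i] for i in g) for g in groups])  # [9, 8, 5]
```

Because groups are contiguous, activations cross a GPU boundary only between consecutive groups, which keeps communication down while the search keeps the heaviest GPU as light as possible.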
17. Optimizations for Inferencing
Graph Transform Tool
Freeze graph (variables to constants)
Quantize weights (20 M weights for IV3)
Inception v3 93 MB → 1.5 MB
Pruning, Weight Sharing, Deep Compression
AlexNet 35x smaller, VGG-16 49x smaller
3x to 4x speedup, 3x to 7x more energy-efficient
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=/tmp/classify_image_graph_def.pb \
  --outputs="softmax" \
  --out_graph=/tmp/quantized_graph.pb \
  --transforms='
    add_default_attributes
    strip_unused_nodes(type=float, shape="1,299,299,3")
    remove_nodes(op=Identity, op=CheckNumerics)
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms
    quantize_weights
    quantize_nodes
    strip_unused_nodes
    sort_by_execution_order'
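The weight quantization step can be illustrated with a minimal sketch of 8-bit linear quantization in plain Python. This shows the general idea only (store one byte per weight plus a minimum and a scale), not the Graph Transform Tool's implementation; the sample weights are made up.

```python
def quantize(weights):
    """Map float weights to 8-bit codes plus the (minimum, scale) needed to decode."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0          # fall back to 1.0 if all weights are equal
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate float weights from the 8-bit codes."""
    return [lo + q * scale for q in codes]

# Hypothetical weights from one layer.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)

print(list(codes))
print(restored)
# Each weight now takes 1 byte instead of 4 (float32); the round-trip error
# is at most half a quantization step.
print(max(abs(a - b) for a, b in zip(weights, restored)) <= scale / 2 + 1e-9)
```

This alone gives roughly a 4x size reduction; the larger ratios quoted above for Deep Compression come from combining quantization with pruning and weight sharing.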
18. Cluster Optimizations
Define your ML Container locally
Evaluate with different parameters in the cloud
Use EFS / GFS for data storage and sharing across nodes
Create separate Data processing container
Mount EFS/GFS drive on all pods for shared storage
Avoid GPU Fragmentation problems by bundling jobs
Placement optimizations: bundle as pods in Kubernetes, placement constraints in Mesos
Bundling GPU drivers in the container is a problem
Mount as Readonly volume, or use nvidia-docker
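As a rough illustration of the cluster points above, the sketch below builds a Kubernetes Pod manifest as a Python dict: one pod requests all of its GPUs together, mounts a shared data volume, and mounts the host's GPU driver directory read-only. Every name, image, claim, and path here is a hypothetical placeholder, and the driver mount is just one of the two options mentioned (the other being nvidia-docker).

```python
import json

def training_pod(name, image, gpus, data_claim, driver_dir):
    """Return a Pod manifest (plain dict) for a GPU training job."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",
                "image": image,
                # Request all GPUs in one pod so the job is placed as a unit.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
                "volumeMounts": [
                    # Shared storage (e.g. backed by EFS) visible to every pod.
                    {"name": "shared-data", "mountPath": "/data"},
                    # Host GPU driver files, mounted read-only instead of baked into the image.
                    {"name": "gpu-driver", "mountPath": "/usr/local/nvidia", "readOnly": True},
                ],
            }],
            "volumes": [
                {"name": "shared-data", "persistentVolumeClaim": {"claimName": data_claim}},
                {"name": "gpu-driver", "hostPath": {"path": driver_dir}},
            ],
        },
    }

# All arguments below are made-up example values.
manifest = training_pod("unet-train", "example.com/ml/unet:latest", 4,
                        "efs-claim", "/var/lib/nvidia")
print(json.dumps(manifest, indent=2))
```

Generating the manifest from code makes it easy to launch the same container with different parameters in the cloud, as the slide suggests, by varying only the function arguments.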
19. Uber’s Horovod on Mesos
Peloton Gang Scheduler
MPI-based, bandwidth-optimized communication
Code for one GPU, replicates across cluster
Nested Containers
Source: Uber MesosCon
20. Pipeline: Google’s TFX
Continuous Training & Serving
Data Analysis, Transformation, Validation
Model Training, Validation, Serving
Warm-starting
22. Future: FPGA Hardware Microservices
Project Brainwave
Source: Microsoft Research Blog
23. FPGA Optimizations
Brainwave Compiler
Source: Microsoft Research Blog
Can FPGAs Beat GPUs Paper:
➢ Optimizing CNNs on Intel FPGA
➢ FPGA vs GPU: 60% faster, 2.3x more energy-efficient
➢ <1% loss of accuracy
ESE on FPGA Paper:
➢ Optimizing LSTMs on Xilinx FPGA
➢ FPGA vs CPU: 43x faster, 40x more energy-efficient
➢ FPGA vs GPU: 3x faster, 11.5x more energy-efficient
26. Medical Imaging Open Datasets
http://www.cancerimagingarchive.net/
Lung Cancer, Skin Cancer, Breast Cancer…
Kaggle Open Datasets
Diabetic Retinopathy, Lung Cancer
Kaggle Data Science Bowl 2018
https://www.kaggle.com/c/data-science-bowl-2018
ISIC Skin Cancer Dataset
https://challenge.kitware.com/#challenge/583f126bcad3a51cc66c8d9a
Grand Challenges in Medical Image Analysis
https://grand-challenges.grand-challenge.org/all_challenges/
And more…
https://github.com/sfikas/medical-imaging-datasets
27. Where to start your journey?
Level 1: Just Starting
Start with the Kaggle and other Open Competitions
Use existing pre-trained networks (like GoogLeNet) with open medical data sets
Level 2: Intermediate
Experiment with models specific to Medical Imaging space like U-Net/V-Net
Combine 3rd party data sets for greater insights
Level 3: Advanced
Experiment with building new models from scratch
Level 4: Mature
Add feedback loop to your models, learning from outcomes
Experiment with Deep Reinforcement Learning
Industrialize the ML/DL Pipeline, shared model repository across company
28. Resources
CBInsights AI in Healthcare Map: https://www.cbinsights.com/research/artificial-intelligence-startups-healthcare/
DL in Medical Imaging Survey : https://arxiv.org/pdf/1702.05747.pdf
Unet: https://arxiv.org/pdf/1505.04597.pdf
Learning to diagnose from scratch exploiting dependencies in labels: https://arxiv.org/pdf/1710.10501.pdf
TieNet Chest X-Ray Auto-reporting: https://arxiv.org/pdf/1801.04334.pdf
Dermatologist level classification of Skin Cancer using DL: https://www.nature.com/articles/nature21056
Tensorflow Intel CPU Optimized: https://software.intel.com/en-us/articles/tensorflow-optimizations-on-modern-intel-architecture
Tensorflow Quantization: https://www.tensorflow.org/performance/quantization
Deep Compression Paper: https://arxiv.org/abs/1510.00149
Microsoft’s Project Brainwave: https://www.microsoft.com/en-us/research/blog/microsoft-unveils-project-brainwave/
Can FPGAs Beat GPUs?: http://jaewoong.org/pubs/fpga17-next-generation-dnns.pdf
ESE on FPGA: https://arxiv.org/abs/1612.00694
Intel Spark BigDL: https://software.intel.com/en-us/articles/bigdl-distributed-deep-learning-on-apache-spark
Baidu’s PaddlePaddle on Kubernetes: http://blog.kubernetes.io/2017/02/run-deep-learning-with-paddlepaddle-on-kubernetes.html
Uber’s Horovod Distributed Training framework for Tensorflow: https://github.com/uber/horovod
TFX: Tensorflow based production scale ML Platform: https://dl.acm.org/citation.cfm?id=3098021
Explainable AI: https://www.cc.gatech.edu/~alanwags/DLAI2016/(Gunning)%20IJCAI-16%20DLAI%20WS.pdf