SlideShare ist ein Scribd-Unternehmen logo
1 von 39
Downloaden Sie, um offline zu lesen
Improving Hardware Efficiency for
DNN Applications
Hai (Helen) Li
Electrical and Computing Engineering Department
Duke Center for Evolutionary Intelligence
Technology Landscape
2
Desktop/Workstation
• Personal
computing
• Wired internet
Mobile Phone
• Sensing
• Display
• Wireless
communication
& internet
• Computing at
the edge
• Data integration
• Large-scale storage
• Large-scale
computing
Data Center / Cloud
• IoT
• Robotics
• Industrial internet
• Self-driving cars
• Smart Grid
• Secure autonomous
networks
• Real-time data to
decision
Intelligent SystemsProgram Learn
Static
Offline
Virtual world
Dynamic
Online
Real world
T. Hylton, “Perspectives on
Neuromorphic Computing,” 2016.
Deep Neural Networks
Deeper Model
 Lower Error Rate
 Higher Requirement on Computation
Input
Information
Weights
Recognition
Results
ImageNet Challenge (ILSVRC)
• Dataset: 1.2M images in 1K categories
• Classification: make 5 guesses
0
20
40
60
80
100
120
140
160
0
8
16
24
32
2010 2011 2012 2013 2014 2015
3.57%
6.7%
11.7%
16.4%
25.8%
28.2%
Shallow 8 8 22
152
Top5 Error # Layers
3
Data Explosion & Hardware Development
• Every minute, we send over 200
million emails, click almost 2
million likes on Facebook, send
almost 300K tweets and up-load
200K photos to Facebook as
well as 100 hours of video to
YouTube.
• “Data centers consumes up to 1.5%
of all the world’s electricity…”
• “Google’s data centers draw almost
260 MW of power, which is more
power than Salt Lake City uses…”
4
Google’s “Council Bluffs” data center
facilities in Iowa.
2X transistor count,
But only 40% faster,
50% more efficient…
L. Ceze, “Approximate Overview of Approximate Computing”.
J. Glanz, “Google Details, and Defends, Its Use of Electricity”
Hardware Acceleration for DNNs
• GPUs
– Fast, but high power
consumption (~200W)
– Training DNNs in back-end
GPU clusters
• FPGAs
– Massively parallel + low-power
(~25W) + reconfigurable
– Suitable for latency-sensitive
real-time inference job
• ASICs
– Fast + energy efficient
– Long development cycle
• Novel architectures and
emerging devices
5
Efficiency vs. Flexibility
Mismatch: Software vs. Hardware
Software Hardware
Model/Component scale Large Small/Moderate
Reconfigurability Easy Hard
Accuracy vs. Power Accuracy Tradeoff
Training implementation Easy Hard
Precision vs.
Limited programmability
Double (high)
precision
Low precision
(often a few bits)
Connectivity realization Easy Hard
6
Our work: Improve Efficiency for DNN Applications thorugh
Software/Hardware Co-Design Framework
Outline
Our work: Improve Efficiency for DNN Applications
Through Software/Hardware Co-Design
• Introduction
• Research Spotlights
– Structured Sparsity Regularization (NIPS’16)
– Local Distributed Mobile System for DNN
– ApesNet for Image Segmentation
• Conclusion
7
Related Work
• State-of-the-art methods to reduce the number of parameters
– Weight regularization (L1-norm)
– Connection pruning
8
Layer conv1 conv2 conv3 conv4 conv5
Sparsity 0.927 0.95 0.951 0.942 0.938
Theoretical speedup 2.61 7.14 16.12 12.42 10.77
AlexNet, B. Liu, et al., CVPR 2015
AlexNet, S. Han, et al., NIPS 2015
Remained
Theoretical Speedup ≠ Practical Speedup
• Forwarding speedups of
AlexNet on GPU platforms
and the sparsity.
• Baseline is GEMM of cuBLAS.
• The sparse matrixes are stored
in the format of compressed
sparse row (CSR) and
accelerated by cuSPARSE.
9
Speedup Sparsity
B. Liu, et al., “Hardcoding nonzero weights in source code,” CVPR’15
S. Han, et al., “Customizing an EIE chip accelerator for compressed DNN,” ISCA’17
Random
Sparsity
Irregular
Memory
Access
Poor
Data
Locality
No or
trivial
Speedup
Software or
Hardware
Customization
Structured
Sparsity
Regular
Memory
Access
Good
Data
Locality
Real
Speedup
Combining
SW and HW
Customization
Computation-efficient Structured Sparsity
Example 1: Removing 2D filters in convolution (2D-filter-wise sparsity)
Example 2: Removing rows/columns in GEMM (row/column-wise sparsity)
http://deeplearning.net/
3D filter =
stacked 2D filters
Lowering
Weight matrix
filter
feature map
GEMM
GEneral Matrix Matrix Multiplication
Non-structured sparsity
Structured sparsity
featurematrix
10
Structured Sparsity Regularization
• Group Lasso regularization in ML model
11
argmin
w
E w   argmin
w
ED
w  
s.t.	Rg
w g
			
argmin
w
E w   argmin
w
ED
w g
Rg
w  
w0
w1
w2
		
ED
w Example:
Rg
w0
,w1
,w2  w0
2
w1
2
 w2
2
g
group 1 group 2
M. Yuan, 2006
(w0,w1)=(0,0)
Many groups will be zeros
SSL: Structured Sparsity Learning
• Group Lasso regularization in DNNs
• The learned structured sparsity is determined by the way of splitting
groups
12
shortcut
depth-wisefilter-wise
channel-wise
…
shape-wise
W(l)
:,cl,:,:
W(l)
nl,:,:,:
W
(l)
:,cl,ml,kl
W(l)
Penalize unimportant
filters and channels Learn filter shapes Learn the depth of layers
SSL: Structured Sparsity Learning
• Group Lasso regularization in DNNs
13
shortcut
depth-wisefilter-wise
channel-wise …
shape-wise
W(l)
:,cl,:,:
W(l)
nl,:,:,:
W
(l)
:,cl,ml,kl
W(l)
filters and channels Learn filter shapes Learn the depth of layers
Penalizing Unimportant Filters and Channels
14
LeNet # Error Filter # Channel # FLOP Speedup
Baseline - 1 0.9% 20 – 50 1 – 20 100% – 100% 1.00× – 1.00×
2 0.8% 5 – 19 1 – 4 25% – 7.6% 1.64× – 5.23×
3 1.0% 3 - 12 1 – 3 15% – 3.6% 1.99× – 7.44×
LeNet on MNIST
The data is represented in the order of conv1 – conv2.
1
2
3
Conv1 filters (gray level 128 represents zero)
SSL obtains FEWER but more natural patterns
Learning Smaller Filter Shapes
LeNet on MNIST
LeNet # Error Filter # Channel # FLOP Speedup
Baseline - 1 0.9% 25 – 500 1 – 20 100% – 100% 1.00× – 1.00×
4 0.8% 21 – 41 1 – 2 8.4% – 8.2% 2.33× – 6.93×
5 1.0% 7 - 14 1 – 1 1.4% – 2.8% 5.19× – 10.82×
The size of filters after removing zero shape fibers, in the order of conv1 – conv2.
Learned shapes of
conv1 filters:
LeNet 1
5x5
LeNet 4
21
LeNet 5
7
Learned shape of conv2 filters @ LeNet 5 3D 20x5x5 filters is regularized to 2D filters!
Smaller weight matrix
16
SSL obtains SMALLER filters w/o accuracy loss
Learning Smaller Dense Weight Matrix
17
ConvNet # Error Row Sparsity Column Sparsity Speedup
Baseline 1 17.9% 12.5%–0%–0% 0%–0%–0% 1.00×–1.00×–1.00×
4 17.9% 50.0%–28.1%–1.6% 0%–59.3%–35.1% 1.43×–3.05×–1.57×
5 16.9% 31.3%–0%–1.6% 0%–42.8%–9.8% 1.25×–2.01×–1.18×
ConvNet on CIFAR-10
Row/column sparsity is represented in the order of conv1–conv2–conv3.
The learned conv1 filters
3
2
1
SSL can efficiently learn DNNs with smaller but dense
weight matrix which has good locality
Regularizing The Depth of DNNs
18
Baseline, K. He, CVPR’16
# layers Error # layers Error
ResNet 20 8.82% 32 7.51%
SSL-ResNet 14 8.54% 18 7.40%
ResNet on CIFAR-10
shortcut
depth-wise W(l)
Depth-wise SSL
Experiments – AlexNet@ImageNet
• SSL saves 30-40% (60-70%) FLOPs with 0 (<1.5%) accuracy loss by
structurally removing 2D filters.
• Deeper layers have higher sparsity.
19
0
20
40
60
80
100
0
20
40
60
80
100
41.5 42 42.5 43 43.5 44
conv1
conv2
conv3
conv4
conv5
FLOP
p y
% top-1 error
%2D-filtersparsity
%FLOPreduction
Baseline
http://deeplearning.net/
Stacked
2D filters
Learning 2D-filter-wise sparsity
3D convolution = sum of 2D convolutions:
Experiments – AlexNet@ImageNet
Higher speedups than non-structured speedups
• 5.1X/3.1X layer-wise speedup on CPU/GPU with 2% accuracy loss
• 1.4X layer-wise speedup on both CPU and GPU w/o accuracy loss
20
0
1
2
3
4
5
6
Quadro Tesla Titan
Black
Xeon
T8
Xeon
T4
Xeon
T2
Xeon
T1
l1 SSL
speedup
2% loss
Method
Top1
Error
Statistics conv1 conv2 conv3 conv4 conv5
1 L1 44.67%
Sparsity 67.6% 92.4% 97.2% 96.6% 94.3%
CPU 0.80× 2.91× 4.84× 3.83× 2.76×
GPU 0.25× 0.52× 1.38× 1.04× 1.36×
2 SSL 44.66%
Col. Sparsity 0.0% 63.2% 76.9% 84.7% 80.7%
Row Sparsity 9.4% 12.9% 40.6% 46.9% 0.0%
CPU 1.05× 3.37× 6.27× 9.73× 4.93×
GPU 1.00× 2.37× 4.94× 4.03× 3.05×
3 Pruning* 42.80% Sparsity 16.0% 62.0% 65.0% 63.0% 63.0%
4 L1 42.51%
Sparsity 14.7% 76.2% 85.3% 81.5% 76.3%
CPU 0.34× 0.99× 1.30× 1.10× 0.93×
GPU 0.08× 0.17× 0.42× 0.30× 0.32×
5 SSL 42.53%
Sparsity 0.00% 20.9% 39.7% 39.7% 24.6%
CPU 1.00× 1.27× 1.64× 1.68× 1.32×
GPU 1.00× 1.25× 1.63× 1.72× 1.36×
Open Source
• Source code in Github, and trained model in model zoo
• https://github.com/wenwei202/caffe/tree/scnn
21
Collaborated Work with Intel – ICLR 2017
Jongsoo Park, et al, “Faster CNNs with Direct Sparse Convolutions and Guided
Pruning,” ICLR 2017.
 Direct Sparse Convolution: an
efficient implementation of
convolution with sparse filters
 Performance Model: A
predictor of the speedup vs.
sparsity level
 Guided Sparsity Learning: A
performance-model-supervised
pruning process that avoids
pruning layers, which are
unbeneficial for speedup but
harmful for accuracy
27
Outline
Our work: Improve Efficiency for DNN Applications
Through Software/Hardware Co-Design
• Introduction
• Research Spotlights
– Structured Sparsity Regularization
– Local Distributed Mobile System for DNN (DATE’17)
– ApesNet for Image Segmentation
• Conclusion
28
Neural Network on Mobile Platforms
Advantages: Data Accessibility
Mobility
Navigation
AR/VR ApplicationsImage Analysis
Language Processing
Object Recognition Motion Prediction
Malware Detection
Wearable Devices
(1) A brain-inspired
chip from IBM
(2) Google glass
(3) Nixie
(4) Android wear
smart watch
Applications:
29
Challenges and Preliminary Analysis
30
Layer Analysis of DNNs on Smartphones:
Computing-intensive Memory-intensive
Security Battery Capacity Hardware Performance
CPU MemoryGPU
Challenges:
Original DNN Model
Local Distributed Mobile System for DNN
31
System Overview of MoDNN:
BODP
MSCC&
FGCP
Model Processor
Middleware
Computing Cluster
Group Owner
Worker Node Worker Node
Optimization of Convolutional Layers
32
Node 0 Node 1
Node 2 Node 3
Node 3Node 2
Node 0 Node 1
InputNeurons
Node 0
Node 1
Node 2
Node 3
Node 0
Node 1
Node 3
Node 2
OutputNeurons
‐240
‐180
‐120
‐600
0
6000
1200
1800
2400
32
64
128
256
512
1024
2048
4096
8192
TransmissionSize(KB)
ExecutionTime(ms)
1_1 1_2 2_1 2_2 3_1 3_2 3_3 4_1 4_2 4_3 5_35_25_1
Local 2 Workers 3 Workers 4 Workers 4 Workers 2D Grid
0
24
48
72
96
32
64
128
256
512
1024
2048
4096
8192
Execution Time Reduction of 16 layers in VGG
Biased One-dimensional Partition (BODP)
Optimization of Fully-connected Layers
33
Schematic diagram of the partition scheme for fully-connected layers
Sparsity PartitionCluster
0 4 8 12 16
Array
Linked List
ExecutionTime(ms)
Calculation (Millions)
0
350
50
100
150
200
250
300
GET
SPT
Performance characterization of Nexus 5
Speed-oriented decision: SPMV and GEMV
Optimization of Sparse Fully-connected Layers
• Apply spectral co-clustering to original sparse matrix
• Decide the execution method for each cluster
• Initialize the estimated time with the execution time of each
cluster and their corresponding outliers.
34
Modified Spectral Co-Clustering (MSCC)
Target: Higher Computing Efficiency & Lower Transmission Amount
Input Neurons
OutputNeurons
(a) Original (b) Clustered (MSCC) (c) Outliers (FGCP)
Optimization of Sparse Fully-connected Layers
35
• Choose the node of
highest time
• Find the sparsest line
• Offload it to GO
Fine-Grain Cross Partition (FGCP)
Target: Workload Balance & Lower Transmission Amount
Input Neurons
OutputNeurons
(a) Original (b) Clustered (MSCC) (c) Outliers (FGCP)
2 Workers 3 Workers 4 Workers
70% 100%Sparsity 70% 100%Sparsity 70% 100%Sparsity
20%
30%
40%
50%
60%
70%
TransmissionSizeReduction(%)
FL_6: 25088×4096 FL_7: 4096×4096 FL_8: 1000×4096
Transmission size decrease ratio of MoDNN to baseline.
Outline
Our work: Improve Efficiency for DNN Applications
Through Software/Hardware Co-Design
• Introduction
• Research Spotlights
– Structured Sparsity Regularization
– Local Distributed Mobile System for DNN
– ApesNet for Image Segmentation (ESWEEK’16)
• Conclusion
36
ApesNet Design Concept
• ConvBlock & ApesBlock
• Asymmetric encoder & decoder
• Thin ConvBlock w/ large feature map
• Thick ApesBlock w/ small kernel
37
ConvBlock
16 feature maps 8 feature maps
#Feature map #Feature map
SegNet 64 64
Deconv-Net 64 64
AlexNet 96 -
VGG16 64 -
Preliminary Result on
Running Time
Conv 7x7
67%
Max-
Pooling
10%
Batch-
normalization
10%
Relu
6%
Drop-out
7%
VGG: arxiv 1409.1556; Deconv-Net: arxiv 1505.04366; SegNet: arxiv 1511.00561
M: # Feature map
k: Kernel size
r: Dropout ratio
38
ApesBlock
Two ApesBlock: 5x5 kernel, 64 feature maps
Previous models
# Parameters
64 Conv 8x8 0.4M
64 Conv 8x8 0.2M
4096 Full-connected 4.2M
Max-Pooling -
M: # Feature map
k: Kernel size
r: Dropout ratio
Conv
ReLU
Conv
Dropout
Conv
Dropout
39
Convolution (En) Up-sampling Convolution (De)
Neuron Visualization of ApesNet
Feature response at each layer
(Neuron visualization)
• Encoder: Convolution
• Decoder: Up-sampling
• Decoder: Convolution
40
Input SegNet Ours Label
CamVid Dataset
SegNet-
Basic
ApesNet
Model size 5.40MB 3.40MB
GTX 760M 181ms 73ms
GTX 860M 170ms 70ms
TITAN X
(Kepler)
63ms 40ms
Tesla K40 58ms 39ms
GTX 1080 48ms 33ms
Class
Avg.
Mean
IoU
Bike Tree Sky Car Sign Road Pedestrian
SegNet-Basic 62.9% 46.2% 75.1% 83.1% 88.3% 80.2% 36.2% 91.4% 56.2%
ApesNet 69.3% 48.0% 76.0% 80.2% 95.7% 84.2% 52.3% 93.9% 59.9%
Evaluation and Comparison
41
Input ApesNet Output
ApesNet: An Efficient DNN for Image Segmentation
42
Outline
Our work: Improve Efficiency for DNN Applications
Through Software/Hardware Co-Design
• Introduction
• Research Spotlights
– Structured Sparsity Regularization
– Local Distributed Mobile System for DNN
– ApesNet for Image Segmentation
• Conclusion
43
Conclusion
• DNNs demonstrate great success and potentials in
various types of applications;
• Many software optimization techniques cannot obtain
the theoretical speedup in real implementations;
• The mismatch between the software requirement and
hardware capability is more severe as problem scale
increases;
• New approaches that coordinate the software and
hardware co-design and optimization are necessary.
• New devices and novel architecture could play an
important role too.
44
Thanks to Our Sponsors, Collaborators, & Students
We welcome industrial collaborators.
45

Weitere ähnliche Inhalte

Was ist angesagt?

Deep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenDeep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenPoo Kuan Hoong
 
Mastering Computer Vision Problems with State-of-the-art Deep Learning
Mastering Computer Vision Problems with State-of-the-art Deep LearningMastering Computer Vision Problems with State-of-the-art Deep Learning
Mastering Computer Vision Problems with State-of-the-art Deep LearningMiguel González-Fierro
 
On-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidOn-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidYufeng Guo
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA Taiwan
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Codemotion
 
Deep Learning Primer: A First-Principles Approach
Deep Learning Primer: A First-Principles ApproachDeep Learning Primer: A First-Principles Approach
Deep Learning Primer: A First-Principles ApproachMaurizio Calo Caligaris
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for RoboticsIntel Nervana
 
The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 Eran Shlomo
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...Edge AI and Vision Alliance
 
An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)Julien SIMON
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceIntel Nervana
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at ScaleIntel Nervana
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Julien SIMON
 
Why is Deep learning hot right now? and How can we apply it on each day job?
Why is Deep learning hot right now? and How can we apply it on each day job?Why is Deep learning hot right now? and How can we apply it on each day job?
Why is Deep learning hot right now? and How can we apply it on each day job?Issam AlZinati
 
Small Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their DesignSmall Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their DesignForrest Iandola
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of ComputingIntel Nervana
 
AI powered emotion recognition: From Inception to Production - Global AI Conf...
AI powered emotion recognition: From Inception to Production - Global AI Conf...AI powered emotion recognition: From Inception to Production - Global AI Conf...
AI powered emotion recognition: From Inception to Production - Global AI Conf...Apache MXNet
 
Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16Intel Nervana
 
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetScalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetAmazon Web Services
 

Was ist angesagt? (20)

Deep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenDeep Learning with Microsoft R Open
Deep Learning with Microsoft R Open
 
Mastering Computer Vision Problems with State-of-the-art Deep Learning
Mastering Computer Vision Problems with State-of-the-art Deep LearningMastering Computer Vision Problems with State-of-the-art Deep Learning
Mastering Computer Vision Problems with State-of-the-art Deep Learning
 
On-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on AndroidOn-device machine learning: TensorFlow on Android
On-device machine learning: TensorFlow on Android
 
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detectionNVIDIA 深度學習教育機構 (DLI): Approaches to object detection
NVIDIA 深度學習教育機構 (DLI): Approaches to object detection
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...
 
Deep Learning Primer: A First-Principles Approach
Deep Learning Primer: A First-Principles ApproachDeep Learning Primer: A First-Principles Approach
Deep Learning Primer: A First-Principles Approach
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 
CNN Quantization
CNN QuantizationCNN Quantization
CNN Quantization
 
The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 The deep learning tour - Q1 2017
The deep learning tour - Q1 2017
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
 
An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)An Introduction to Deep Learning (May 2018)
An Introduction to Deep Learning (May 2018)
 
Rethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligenceRethinking computation: A processor architecture for machine intelligence
Rethinking computation: A processor architecture for machine intelligence
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
Why is Deep learning hot right now? and How can we apply it on each day job?
Why is Deep learning hot right now? and How can we apply it on each day job?Why is Deep learning hot right now? and How can we apply it on each day job?
Why is Deep learning hot right now? and How can we apply it on each day job?
 
Small Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their DesignSmall Deep-Neural-Networks: Their Advantages and Their Design
Small Deep-Neural-Networks: Their Advantages and Their Design
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of Computing
 
AI powered emotion recognition: From Inception to Production - Global AI Conf...
AI powered emotion recognition: From Inception to Production - Global AI Conf...AI powered emotion recognition: From Inception to Production - Global AI Conf...
AI powered emotion recognition: From Inception to Production - Global AI Conf...
 
Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16
 
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetScalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNet
 

Andere mochten auch

SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLabSF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLabChester Chen
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? HackerEarth
 
IPAB2017 深層学習を使った新薬の探索から創造へ
IPAB2017 深層学習を使った新薬の探索から創造へIPAB2017 深層学習を使った新薬の探索から創造へ
IPAB2017 深層学習を使った新薬の探索から創造へPreferred Networks
 
Learning to remember rare events
Learning to remember rare eventsLearning to remember rare events
Learning to remember rare events홍배 김
 
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...홍배 김
 
Normalization 방법
Normalization 방법 Normalization 방법
Normalization 방법 홍배 김
 
Predix Builder Roadshow
Predix Builder RoadshowPredix Builder Roadshow
Predix Builder RoadshowPredix
 
Módulo3. redes sociales presentación
Módulo3. redes sociales presentaciónMódulo3. redes sociales presentación
Módulo3. redes sociales presentacióneSAMFyC
 
Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016Chester Chen
 
Alpine Tech Talk: System ML by Berthold Reinwald
Alpine Tech Talk: System ML by Berthold ReinwaldAlpine Tech Talk: System ML by Berthold Reinwald
Alpine Tech Talk: System ML by Berthold ReinwaldChester Chen
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkChester Chen
 
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferencePruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferenceKaushalya Madhawa
 
Entering the Conversation (Week 2)
Entering the Conversation (Week 2)Entering the Conversation (Week 2)
Entering the Conversation (Week 2)Ron Martinez
 
Paper Reading, "On Causal and Anticausal Learning", ICML-12
Paper Reading, "On Causal and Anticausal Learning", ICML-12Paper Reading, "On Causal and Anticausal Learning", ICML-12
Paper Reading, "On Causal and Anticausal Learning", ICML-12Yusuke Iwasawa
 
Meta-Learning with Memory Augmented Neural Networks
Meta-Learning with Memory Augmented Neural NetworksMeta-Learning with Memory Augmented Neural Networks
Meta-Learning with Memory Augmented Neural Networks홍배 김
 
Data Science Competition
Data Science CompetitionData Science Competition
Data Science CompetitionJeong-Yoon Lee
 
[SOSCON 2016] 오픈스택을 살펴보는 오픈 소스 컨트리뷰션
[SOSCON 2016] 오픈스택을 살펴보는 오픈 소스 컨트리뷰션[SOSCON 2016] 오픈스택을 살펴보는 오픈 소스 컨트리뷰션
[SOSCON 2016] 오픈스택을 살펴보는 오픈 소스 컨트리뷰션Ian Choi
 

Andere mochten auch (20)

SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLabSF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
SF Big Analytics: Introduction to Succinct by UC Berkeley AmpLab
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ?
 
IPAB2017 深層学習を使った新薬の探索から創造へ
IPAB2017 深層学習を使った新薬の探索から創造へIPAB2017 深層学習を使った新薬の探索から創造へ
IPAB2017 深層学習を使った新薬の探索から創造へ
 
Learning to remember rare events
Learning to remember rare eventsLearning to remember rare events
Learning to remember rare events
 
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
 
Normalization 방법
Normalization 방법 Normalization 방법
Normalization 방법
 
Predix Builder Roadshow
Predix Builder RoadshowPredix Builder Roadshow
Predix Builder Roadshow
 
Evernote
EvernoteEvernote
Evernote
 
Módulo3. redes sociales presentación
Módulo3. redes sociales presentaciónMódulo3. redes sociales presentación
Módulo3. redes sociales presentación
 
Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016
 
Alpine Tech Talk: System ML by Berthold Reinwald
Alpine Tech Talk: System ML by Berthold ReinwaldAlpine Tech Talk: System ML by Berthold Reinwald
Alpine Tech Talk: System ML by Berthold Reinwald
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
 
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferencePruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inference
 
Fraud embezzlement
Fraud embezzlementFraud embezzlement
Fraud embezzlement
 
Entering the Conversation (Week 2)
Entering the Conversation (Week 2)Entering the Conversation (Week 2)
Entering the Conversation (Week 2)
 
Midterm review2
Midterm review2Midterm review2
Midterm review2
 
Paper Reading, "On Causal and Anticausal Learning", ICML-12
Paper Reading, "On Causal and Anticausal Learning", ICML-12Paper Reading, "On Causal and Anticausal Learning", ICML-12
Paper Reading, "On Causal and Anticausal Learning", ICML-12
 
Meta-Learning with Memory Augmented Neural Networks
Meta-Learning with Memory Augmented Neural NetworksMeta-Learning with Memory Augmented Neural Networks
Meta-Learning with Memory Augmented Neural Networks
 
Data Science Competition
Data Science CompetitionData Science Competition
Data Science Competition
 
[SOSCON 2016] 오픈스택을 살펴보는 오픈 소스 컨트리뷰션
[SOSCON 2016] 오픈스택을 살펴보는 오픈 소스 컨트리뷰션[SOSCON 2016] 오픈스택을 살펴보는 오픈 소스 컨트리뷰션
[SOSCON 2016] 오픈스택을 살펴보는 오픈 소스 컨트리뷰션
 

Ähnlich wie Improving Hardware Efficiency for DNN Applications

Anomaly Detection with Azure and .NET
Anomaly Detection with Azure and .NETAnomaly Detection with Azure and .NET
Anomaly Detection with Azure and .NETMarco Parenzan
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)DonghyunKang12
 
Anomaly Detection with Azure and .net
Anomaly Detection with Azure and .netAnomaly Detection with Azure and .net
Anomaly Detection with Azure and .netMarco Parenzan
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044Jinwon Lee
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용홍배 김
 
Comparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionComparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionSafaa Alnabulsi
 
Recent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesRecent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesNamkug Kim
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用CHENHuiMei
 
Introduction to computer vision
Introduction to computer visionIntroduction to computer vision
Introduction to computer visionMarcin Jedyk
 
Introduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural NetworksIntroduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural NetworksMarcinJedyk
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMLAI2
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”Dr.(Mrs).Gethsiyal Augasta
 
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesAI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesValue Amplify Consulting
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakPyData
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningOswald Campesato
 
Intelligence at scale through AI model efficiency
Intelligence at scale through AI model efficiencyIntelligence at scale through AI model efficiency
Intelligence at scale through AI model efficiencyQualcomm Research
 

Ähnlich wie Improving Hardware Efficiency for DNN Applications (20)

Deep learning
Deep learningDeep learning
Deep learning
 
Anomaly Detection with Azure and .NET
Anomaly Detection with Azure and .NETAnomaly Detection with Azure and .NET
Anomaly Detection with Azure and .NET
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
Anomaly Detection with Azure and .net
Anomaly Detection with Azure and .netAnomaly Detection with Azure and .net
Anomaly Detection with Azure and .net
 
MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용
 
Comparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionComparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit Recognition
 
Recent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesRecent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectives
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用
 
Introduction to computer vision
Introduction to computer visionIntroduction to computer vision
Introduction to computer vision
 
Introduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural NetworksIntroduction to computer vision with Convoluted Neural Networks
Introduction to computer vision with Convoluted Neural Networks
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
 
Neural Networks in Data Mining - “An Overview”
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”
 
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesAI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Intelligence at scale through AI model efficiency
Intelligence at scale through AI model efficiencyIntelligence at scale through AI model efficiency
Intelligence at scale through AI model efficiency
 

Mehr von Chester Chen

SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfChester Chen
 
zookeeer+raft-2.pdf
zookeeer+raft-2.pdfzookeeer+raft-2.pdf
zookeeer+raft-2.pdfChester Chen
 
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...Chester Chen
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...Chester Chen
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?Chester Chen
 
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataShopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataChester Chen
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...Chester Chen
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...Chester Chen
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProChester Chen
 
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleSF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleChester Chen
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftChester Chen
 
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapSFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapChester Chen
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bigheadChester Chen
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformChester Chen
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Chester Chen
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in sparkChester Chen
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 Chester Chen
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_indexChester Chen
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathChester Chen
 

Mehr von Chester Chen (20)

SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
 
zookeeer+raft-2.pdf
zookeeer+raft-2.pdfzookeeer+raft-2.pdf
zookeeer+raft-2.pdf
 
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?
 
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataShopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdata
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
 
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleSF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
 
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapSFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdap
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bighead
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in spark
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
 

Kürzlich hochgeladen

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 

Kürzlich hochgeladen (20)

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 

Improving Hardware Efficiency for DNN Applications

  • 1. Improving Hardware Efficiency for DNN Applications Hai (Helen) Li Electrical and Computing Engineering Department Duke Center for Evolutionary Intelligence
  • 2. Technology Landscape 2 Desktop/Workstation • Personal computing • Wired internet Mobile Phone • Sensing • Display • Wireless communication & internet • Computing at the edge • Data integration • Large-scale storage • Large-scale computing Data Center / Cloud • IoT • Robotics • Industrial internet • Self-driving cars • Smart Grid • Secure autonomous networks • Real-time data to decision Intelligent SystemsProgram Learn Static Offline Virtual world Dynamic Online Real world T. Hylton, “Perspectives on Neuromorphic Computing,” 2016.
  • 3. Deep Neural Networks Deeper Model  Lower Error Rate  Higher Requirement on Computation Input Information Weights Recognition Results ImageNet Challenge (ILSVRC) • Dataset: 1.2M images in 1K categories • Classification: make 5 guesses 0 20 40 60 80 100 120 140 160 0 8 16 24 32 2010 2011 2012 2013 2014 2015 3.57% 6.7% 11.7% 16.4% 25.8% 28.2% Shallow 8 8 22 152 Top5 Error # Layers 3
  • 4. Data Explosion & Hardware Development • Every minute, we send over 200 million emails, click almost 2 million likes on Facebook, send almost 300K tweets and up-load 200K photos to Facebook as well as 100 hours of video to YouTube. • “Data centers consumes up to 1.5% of all the world’s electricity…” • “Google’s data centers draw almost 260 MW of power, which is more power than Salt Lake City uses…” 4 Google’s “Council Bluffs” data center facilities in Iowa. 2X transistor count, But only 40% faster, 50% more efficient… L. Ceze, “Approximate Overview of Approximate Computing”. J. Glanz, “Google Details, and Defends, Its Use of Electricity”
  • 5. Hardware Acceleration for DNNs • GPUs – Fast, but high power consumption (~200W) – Training DNNs in back-end GPU clusters • FPGAs – Massively parallel + low-power (~25W) + reconfigurable – Suitable for latency-sensitive real-time inference job • ASICs – Fast + energy efficient – Long development cycle • Novel architectures and emerging devices 5 Efficiency vs. Flexibility
  • 6. Mismatch: Software vs. Hardware Software Hardware Model/Component scale Large Small/Moderate Reconfigurability Easy Hard Accuracy vs. Power Accuracy Tradeoff Training implementation Easy Hard Precision vs. Limited programmability Double (high) precision Low precision (often a few bits) Connectivity realization Easy Hard 6 Our work: Improve Efficiency for DNN Applications thorugh Software/Hardware Co-Design Framework
  • 7. Outline Our work: Improve Efficiency for DNN Applications Through Software/Hardware Co-Design • Introduction • Research Spotlights – Structured Sparsity Regularization (NIPS’16) – Local Distributed Mobile System for DNN – ApesNet for Image Segmentation • Conclusion 7
  • 8. Related Work • State-of-the-art methods to reduce the number of parameters – Weight regularization (L1-norm) – Connection pruning 8 Layer conv1 conv2 conv3 conv4 conv5 Sparsity 0.927 0.95 0.951 0.942 0.938 Theoretical speedup 2.61 7.14 16.12 12.42 10.77 AlexNet, B. Liu, et al., CVPR 2015 AlexNet, S. Han, et al., NIPS 2015 Remained
  • 9. Theoretical Speedup ≠ Practical Speedup • Forwarding speedups of AlexNet on GPU platforms and the sparsity. • Baseline is GEMM of cuBLAS. • The sparse matrixes are stored in the format of compressed sparse row (CSR) and accelerated by cuSPARSE. 9 Speedup Sparsity B. Liu, et al., “Hardcoding nonzero weights in source code,” CVPR’15 S. Han, et al., “Customizing an EIE chip accelerator for compressed DNN,” ISCA’17 Random Sparsity Irregular Memory Access Poor Data Locality No or trivial Speedup Software or Hardware Customization Structured Sparsity Regular Memory Access Good Data Locality Real Speedup Combining SW and HW Customization
  • 10. Computation-efficient Structured Sparsity Example 1: Removing 2D filters in convolution (2D-filter-wise sparsity) Example 2: Removing rows/columns in GEMM (row/column-wise sparsity) http://deeplearning.net/ 3D filter = stacked 2D filters Lowering Weight matrix filter feature map GEMM GEneral Matrix Matrix Multiplication Non-structured sparsity Structured sparsity featurematrix 10
  • 11. Structured Sparsity Regularization • Group Lasso regularization in ML model 11 argmin w E w   argmin w ED w   s.t. Rg w g argmin w E w   argmin w ED w g Rg w   w0 w1 w2 ED w Example: Rg w0 ,w1 ,w2  w0 2 w1 2  w2 2 g group 1 group 2 M. Yuan, 2006 (w0,w1)=(0,0) Many groups will be zeros
  • 12. SSL: Structured Sparsity Learning • Group Lasso regularization in DNNs • The learned structured sparsity is determined by the way of splitting groups 12 shortcut depth-wisefilter-wise channel-wise … shape-wise W(l) :,cl,:,: W(l) nl,:,:,: W (l) :,cl,ml,kl W(l) Penalize unimportant filters and channels Learn filter shapes Learn the depth of layers
  • 13. SSL: Structured Sparsity Learning • Group Lasso regularization in DNNs 13 shortcut depth-wisefilter-wise channel-wise … shape-wise W(l) :,cl,:,: W(l) nl,:,:,: W (l) :,cl,ml,kl W(l) filters and channels Learn filter shapes Learn the depth of layers
  • 14. Penalizing Unimportant Filters and Channels 14 LeNet # Error Filter # Channel # FLOP Speedup Baseline - 1 0.9% 20 – 50 1 – 20 100% – 100% 1.00× – 1.00× 2 0.8% 5 – 19 1 – 4 25% – 7.6% 1.64× – 5.23× 3 1.0% 3 - 12 1 – 3 15% – 3.6% 1.99× – 7.44× LeNet on MNIST The data is represented in the order of conv1 – conv2. 1 2 3 Conv1 filters (gray level 128 represents zero) SSL obtains FEWER but more natural patterns
  • 15. Learning Smaller Filter Shapes LeNet on MNIST LeNet # Error Filter # Channel # FLOP Speedup Baseline - 1 0.9% 25 – 500 1 – 20 100% – 100% 1.00× – 1.00× 4 0.8% 21 – 41 1 – 2 8.4% – 8.2% 2.33× – 6.93× 5 1.0% 7 - 14 1 – 1 1.4% – 2.8% 5.19× – 10.82× The size of filters after removing zero shape fibers, in the order of conv1 – conv2. Learned shapes of conv1 filters: LeNet 1 5x5 LeNet 4 21 LeNet 5 7 Learned shape of conv2 filters @ LeNet 5 3D 20x5x5 filters is regularized to 2D filters! Smaller weight matrix 16 SSL obtains SMALLER filters w/o accuracy loss
  • 16. Learning Smaller Dense Weight Matrix 17 ConvNet # Error Row Sparsity Column Sparsity Speedup Baseline 1 17.9% 12.5%–0%–0% 0%–0%–0% 1.00×–1.00×–1.00× 4 17.9% 50.0%–28.1%–1.6% 0%–59.3%–35.1% 1.43×–3.05×–1.57× 5 16.9% 31.3%–0%–1.6% 0%–42.8%–9.8% 1.25×–2.01×–1.18× ConvNet on CIFAR-10 Row/column sparsity is represented in the order of conv1–conv2–conv3. The learned conv1 filters 3 2 1 SSL can efficiently learn DNNs with smaller but dense weight matrix which has good locality
  • 17. Regularizing The Depth of DNNs 18 Baseline, K. He, CVPR’16 # layers Error # layers Error ResNet 20 8.82% 32 7.51% SSL-ResNet 14 8.54% 18 7.40% ResNet on CIFAR-10 shortcut depth-wise W(l) Depth-wise SSL
  • 18. Experiments – AlexNet@ImageNet • SSL saves 30-40% (60-70%) FLOPs with 0 (<1.5%) accuracy loss by structurally removing 2D filters. • Deeper layers have higher sparsity. 19 0 20 40 60 80 100 0 20 40 60 80 100 41.5 42 42.5 43 43.5 44 conv1 conv2 conv3 conv4 conv5 FLOP p y % top-1 error %2D-filtersparsity %FLOPreduction Baseline http://deeplearning.net/ Stacked 2D filters Learning 2D-filter-wise sparsity 3D convolution = sum of 2D convolutions:
  • 19. Experiments – AlexNet@ImageNet Higher speedups than non-structured speedups • 5.1X/3.1X layer-wise speedup on CPU/GPU with 2% accuracy loss • 1.4X layer-wise speedup on both CPU and GPU w/o accuracy loss 20 0 1 2 3 4 5 6 Quadro Tesla Titan Black Xeon T8 Xeon T4 Xeon T2 Xeon T1 l1 SSL speedup 2% loss Method Top1 Error Statistics conv1 conv2 conv3 conv4 conv5 1 L1 44.67% Sparsity 67.6% 92.4% 97.2% 96.6% 94.3% CPU 0.80× 2.91× 4.84× 3.83× 2.76× GPU 0.25× 0.52× 1.38× 1.04× 1.36× 2 SSL 44.66% Col. Sparsity 0.0% 63.2% 76.9% 84.7% 80.7% Row Sparsity 9.4% 12.9% 40.6% 46.9% 0.0% CPU 1.05× 3.37× 6.27× 9.73× 4.93× GPU 1.00× 2.37× 4.94× 4.03× 3.05× 3 Pruning* 42.80% Sparsity 16.0% 62.0% 65.0% 63.0% 63.0% 4 L1 42.51% Sparsity 14.7% 76.2% 85.3% 81.5% 76.3% CPU 0.34× 0.99× 1.30× 1.10× 0.93× GPU 0.08× 0.17× 0.42× 0.30× 0.32× 5 SSL 42.53% Sparsity 0.00% 20.9% 39.7% 39.7% 24.6% CPU 1.00× 1.27× 1.64× 1.68× 1.32× GPU 1.00× 1.25× 1.63× 1.72× 1.36×
  • 20. Open Source • Source code in Github, and trained model in model zoo • https://github.com/wenwei202/caffe/tree/scnn 21
  • 21. Collaborated Work with Intel – ICLR 2017 Jongsoo Park, et al, “Faster CNNs with Direct Sparse Convolutions and Guided Pruning,” ICLR 2017.  Direct Sparse Convolution: an efficient implementation of convolution with sparse filters  Performance Model: A predictor of the speedup vs. sparsity level  Guided Sparsity Learning: A performance-model-supervised pruning process that avoids pruning layers, which are unbeneficial for speedup but harmful for accuracy 27
  • 22. Outline Our work: Improve Efficiency for DNN Applications Through Software/Hardware Co-Design • Introduction • Research Spotlights – Structured Sparsity Regularization – Local Distributed Mobile System for DNN (DATE’17) – ApesNet for Image Segmentation • Conclusion 28
  • 23. Neural Network on Mobile Platforms Advantages: Data Accessibility Mobility Navigation AR/VR ApplicationsImage Analysis Language Processing Object Recognition Motion Prediction Malware Detection Wearable Devices (1) A brain-inspired chip from IBM (2) Google glass (3) Nixie (4) Android wear smart watch Applications: 29
  • 24. Challenges and Preliminary Analysis 30 Layer Analysis of DNNs on Smartphones: Computing-intensive Memory-intensive Security Battery Capacity Hardware Performance CPU MemoryGPU Challenges:
  • 25. Original DNN Model Local Distributed Mobile System for DNN 31 System Overview of MoDNN: BODP MSCC& FGCP Model Processor Middleware Computing Cluster Group Owner Worker Node Worker Node
  • 26. Optimization of Convolutional Layers 32 Node 0 Node 1 Node 2 Node 3 Node 3Node 2 Node 0 Node 1 InputNeurons Node 0 Node 1 Node 2 Node 3 Node 0 Node 1 Node 3 Node 2 OutputNeurons ‐240 ‐180 ‐120 ‐600 0 6000 1200 1800 2400 32 64 128 256 512 1024 2048 4096 8192 TransmissionSize(KB) ExecutionTime(ms) 1_1 1_2 2_1 2_2 3_1 3_2 3_3 4_1 4_2 4_3 5_35_25_1 Local 2 Workers 3 Workers 4 Workers 4 Workers 2D Grid 0 24 48 72 96 32 64 128 256 512 1024 2048 4096 8192 Execution Time Reduction of 16 layers in VGG Biased One-dimensional Partition (BODP)
  • 27. Optimization of Fully-connected Layers 33 Schematic diagram of the partition scheme for fully-connected layers Sparsity PartitionCluster 0 4 8 12 16 Array Linked List ExecutionTime(ms) Calculation (Millions) 0 350 50 100 150 200 250 300 GET SPT Performance characterization of Nexus 5 Speed-oriented decision: SPMV and GEMV
  • 28. Optimization of Sparse Fully-connected Layers • Apply spectral co-clustering to original sparse matrix • Decide the execution method for each cluster • Initialize the estimated time with the execution time of each cluster and their corresponding outliers. 34 Modified Spectral Co-Clustering (MSCC) Target: Higher Computing Efficiency & Lower Transmission Amount Input Neurons OutputNeurons (a) Original (b) Clustered (MSCC) (c) Outliers (FGCP)
  • 29. Optimization of Sparse Fully-connected Layers 35 • Choose the node of highest time • Find the sparsest line • Offload it to GO Fine-Grain Cross Partition (FGCP) Target: Workload Balance & Lower Transmission Amount Input Neurons OutputNeurons (a) Original (b) Clustered (MSCC) (c) Outliers (FGCP) 2 Workers 3 Workers 4 Workers 70% 100%Sparsity 70% 100%Sparsity 70% 100%Sparsity 20% 30% 40% 50% 60% 70% TransmissionSizeReduction(%) FL_6: 25088×4096 FL_7: 4096×4096 FL_8: 1000×4096 Transmission size decrease ratio of MoDNN to baseline.
  • 30. Outline Our work: Improve Efficiency for DNN Applications Through Software/Hardware Co-Design • Introduction • Research Spotlights – Structured Sparsity Regularization – Local Distributed Mobile System for DNN – ApesNet for Image Segmentation (ESWEEK’16) • Conclusion 36
  • 31. ApesNet Design Concept • ConvBlock & ApesBlock • Asymmetric encoder & decoder • Thin ConvBlock w/ large feature map • Thick ApesBlock w/ small kernel 37
  • 32. ConvBlock 16 feature maps 8 feature maps #Feature map #Feature map SegNet 64 64 Deconv-Net 64 64 AlexNet 96 - VGG16 64 - Preliminary Result on Running Time Conv 7x7 67% Max- Pooling 10% Batch- normalization 10% Relu 6% Drop-out 7% VGG: arxiv 1409.1556; Deconv-Net: arxiv 1505.04366; SegNet: arxiv 1511.00561 M: # Feature map k: Kernel size r: Dropout ratio 38
  • 33. ApesBlock Two ApesBlock: 5x5 kernel, 64 feature maps Previous models # Parameters 64 Conv 8x8 0.4M 64 Conv 8x8 0.2M 4096 Full-connected 4.2M Max-Pooling - M: # Feature map k: Kernel size r: Dropout ratio Conv ReLU Conv Dropout Conv Dropout 39
  • 34. Convolution (En) Up-sampling Convolution (De) Neuron Visualization of ApesNet Feature response at each layer (Neuron visualization) • Encoder: Convolution • Decoder: Up-sampling • Decoder: Convolution 40
  • 35. Input SegNet Ours Label CamVid Dataset SegNet- Basic ApesNet Model size 5.40MB 3.40MB GTX 760M 181ms 73ms GTX 860M 170ms 70ms TITAN X (Kepler) 63ms 40ms Tesla K40 58ms 39ms GTX 1080 48ms 33ms Class Avg. Mean IoU Bike Tree Sky Car Sign Road Pedestrian SegNet-Basic 62.9% 46.2% 75.1% 83.1% 88.3% 80.2% 36.2% 91.4% 56.2% ApesNet 69.3% 48.0% 76.0% 80.2% 95.7% 84.2% 52.3% 93.9% 59.9% Evaluation and Comparison 41
  • 36. Input ApesNet Output ApesNet: An Efficient DNN for Image Segmentation 42
  • 37. Outline Our work: Improve Efficiency for DNN Applications Through Software/Hardware Co-Design • Introduction • Research Spotlights – Structured Sparsity Regularization – Local Distributed Mobile System for DNN – ApesNet for Image Segmentation • Conclusion 43
  • 38. Conclusion • DNNs demonstrate great success and potentials in various types of applications; • Many software optimization techniques cannot obtain the theoretical speedup in real implementations; • The mismatch between the software requirement and hardware capability is more severe as problem scale increases; • New approaches that coordinate the software and hardware co-design and optimization are necessary. • New devices and novel architecture could play an important role too. 44
  • 39. Thanks to Our Sponsors, Collaborators, & Students We welcome industrial collaborators. 45