For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/qualcomm/embedded-vision-training/videos/pages/may-2015-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Jeff Gehlhaar, Vice President of Technology, Corporate Research and Development, at Qualcomm, presents the "Deep-learning-based Visual Perception in Mobile and Embedded Devices: Opportunities and Challenges" tutorial at the May 2015 Embedded Vision Summit.
Deep learning approaches have proven extremely effective for a range of perceptual tasks, including visual perception. Incorporating deep-learning-based visual perception into devices such as robots, automobiles and smartphones enables these machines to become much more intelligent and intuitive. While some applications can rely on the enormous compute power available in the cloud, many systems require local intelligence for various reasons. In these applications, the enormous computing requirements of deep-learning-based vision create unique challenges related to power and efficiency.
In this talk, Jeff explores applications and use cases where on-device deep-learning-based visual perception provides great benefits. He dives deeply into the challenges that these applications face, and explores techniques to overcome them.
"Deep-learning-based Visual Perception in Mobile and Embedded Devices: Opportunities and Challenges," a Presentation from Qualcomm
Slide 1
Deep-learning-based visual perception in mobile and embedded devices: Opportunities and challenges
Jeff Gehlhaar, Vice President, Qualcomm Research
May 12, 2015
Slide 4
Key elements of “Cognition”
[Diagram: three pillars — Perception, Reasoning, Action — with elements including See, Hear, Classify, Infer context, Concepts, Relationships, Plan, Anticipate, Autonomous]
Slide 5
On the road to a “Cognitive Platform”
On-device capabilities:
Rich Connectivity
• Integrated modem & AP
• Adaptive RF front end
• LTE broadcast & service focused modem features
• Tightly integrated Wi-Fi/BT
• Leading location / GPS
Heterogeneous Computing
• Fully customized architecture
• Superior performance at low power consumption
• Highly optimized for cutting-edge cognitive capabilities
On-device Intelligence
• On-device machine learning
• Computer vision
• Behavioral analysis
• Sensor processing and classification algorithms
• Natural language processing
[Diagram: capability ring — Visual Perception, Speech & Audio Understanding, Natural Interactions, Intelligent Connectivity, Immersive Multimedia, Intuitive Security, Always On Awareness]
Slide 6
On-device visual perception is key
• Democratizing robotics to assist us in daily lives
• Revolutionizing transportation with autonomous cars
• Contextualizing your environment through scene understanding
Slide 7
Why fully on-device matters
Process data closest to the source, complementing the cloud:
• Reliability
• Efficient use of network bandwidth
• Low latency
• Security and user privacy
Slide 8
Deep learning solves visual perception
• Qualcomm Technologies, Inc. has been applying machine learning to mobile for many years
• Deep learning for visual perception
• Provides best-in-class solutions
• Traditionally a cloud-only solution, not available on mobile (until now)
• Presents many implementation challenges
• Our mobile-focused platform goes beyond deep learning to include RNNs and other strategies
• Applications: security, handwriting, natural language processing, etc.
[Diagram: deep network — convolutional layers, pooling, fully connected layer, result]
Slide 10
Typical computing environment for deep learning
• Performance: teraflops
• Memory bandwidth: 100s of GB/s
• Storage: 10s of GBs of RAM
• Power: 100s of watts
Best-in-class server-based visual perception models require roughly 2 billion MAC operations per image.
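The billions-of-MACs figure follows from summing per-layer multiply-accumulate counts across the network. A minimal sketch of that arithmetic (the layer shape below is a hypothetical example, not taken from the talk):

```python
# Back-of-the-envelope MAC count for one convolutional layer:
#   MACs = H_out * W_out * C_out * (K_h * K_w * C_in)

def conv_macs(h_out, w_out, c_out, k_h, k_w, c_in):
    """Multiply-accumulate count for one conv layer
    (stride/padding already folded into h_out, w_out)."""
    return h_out * w_out * c_out * (k_h * k_w * c_in)

# A single 3x3 conv mapping 256 -> 256 channels on a 28x28 feature map:
print(conv_macs(28, 28, 256, 3, 3, 256))  # 462422016 -- ~0.46B MACs for one layer
```

A handful of such layers quickly reaches the ~2B-MACs-per-image scale quoted above, which a server absorbs easily but a phone cannot.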
Slide 11
Supporting deep learning on-device is a major challenge
Constrained mobile environment:
• Power and thermal efficiency
• Storage and memory bandwidth limitations
• Battery powered
Visual perception workloads:
• Compute intensive
• Large and complicated neural network models
Slide 12
Solving the challenge of on-device visual perception
Within the power and thermal constraints of mobile devices
Slide 15
Efficient execution on mobile SoCs
Key to deep learning on mobile is an efficient execution environment that considers all aspects of the SoC, combined with efficient library implementations:
• Careful analysis of deep learning tradeoffs
• Consider the impact of different network architectures
• Focus on cache performance, data locality, and DRAM utilization efficiency
• Focus on parallelism and heterogeneity
• Take advantage of heterogeneous computing frameworks (e.g., Qualcomm MARE)
• Span execution across Qualcomm® Snapdragon™ CPU, DSP, and GPU
• Focus on underlying optimizations
• Convolutions implemented as highly efficient matrix multiply operations
• Smart buffer management for GPU and fixed bit-width optimizations for DSP
• Optimized matrix multiply for Snapdragon processors¹: 6X faster than Eigen
1. Results are based on the Snapdragon 805 processor and Eigen 3.2.2.
Qualcomm Snapdragon and Qualcomm Multicore Asynchronous Runtime Environment are products of Qualcomm Technologies, Inc.
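The "convolutions as matrix multiply" point above refers to the standard im2col lowering, which turns a convolution into one large GEMM that optimized matrix libraries execute efficiently. A minimal NumPy sketch of the idea (a generic illustration, not Qualcomm's implementation):

```python
import numpy as np

def im2col(x, k):
    """Unroll every k x k patch of x (shape: C, H, W) into a column,
    giving a matrix of shape (C*k*k, H_out*W_out)."""
    c, h, w = x.shape
    h_out, w_out = h - k + 1, w - k + 1
    cols = np.empty((c * k * k, h_out * w_out))
    for i in range(h_out):
        for j in range(w_out):
            cols[:, i * w_out + j] = x[:, i:i + k, j:j + k].ravel()
    return cols

def conv2d_as_matmul(x, weights):
    """Valid convolution computed as a single GEMM. weights: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = weights.shape
    cols = im2col(x, k)                   # (C_in*k*k, H_out*W_out)
    w_mat = weights.reshape(c_out, -1)    # (C_out, C_in*k*k)
    out = w_mat @ cols                    # the one big matrix multiply
    h_out = x.shape[1] - k + 1
    return out.reshape(c_out, h_out, -1)
```

The payoff is that all the arithmetic lands in a single dense matrix multiply with good cache behavior and data locality, which is exactly where a tuned GEMM (such as the Snapdragon-optimized one cited above) pays off.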
Slide 16
Reducing model size through compression
Goal: reduce both the physical size and the number of MACs required at equivalent precision
• Utilize available memory bandwidth and computation effectively -> power efficiency
• Smaller size permits in-field model upgrades and improvements
[Diagram: deep network — convolutional layers, pooling, fully connected layer, result]
Qualcomm Technologies, Inc. approach:
• Initial SVD approach based on a paper by Denton et al. of NYU¹
• The Qualcomm Technologies, Inc. approach involves replacing single layers with multiple layers
• The approach permits fine-tuning all layers, not just the layers above the compressed layers
Results:
• Up to a 10X reduction in physical model size
• Up to a 35% reduction in the number of MAC operations with minimal loss of precision
1. “Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation,” Denton et al., arXiv:1404.0736 [cs.CV]
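The SVD approach above can be sketched as follows: a dense m x n weight matrix is replaced by two stacked smaller layers of rank r, cutting parameters from m*n to r*(m+n). This is a generic illustration of the technique with hypothetical sizes, not Qualcomm's implementation:

```python
import numpy as np

def compress_fc(weight, rank):
    """Replace one dense layer W (m x n) with two stacked layers A (m x r)
    and B (r x n) via truncated SVD, so that W ~= A @ B."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # absorb singular values into the first factor
    b = vt[:rank, :]
    return a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024))   # illustrative FC layer
a, b = compress_fc(w, rank=64)
print(w.size / (a.size + b.size))       # 8.0 -- 8x fewer parameters at rank 64
```

In practice the factorized layers are then fine-tuned end-to-end, which matches the slide's point about fine-tuning all layers rather than only the ones above the compressed layer.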
Slide 17
Size compression and error rate impact
[Chart: physical network size for the original network, the FC-layer-compressed network, and the FC-and-conv-layer-compressed network]
Fully connected layer compression significantly impacts physical network size: a 10X size reduction with a ~1 percentage point loss in top-5 error.
Slide 18
MAC compression and error rate impact
[Chart: MAC counts for the original AlexNet, the FC-layer-compressed network, and the FC-and-conv-layer-compressed network]
Convolutional layer compression significantly impacts MAC requirements:
• Compression: ~35% MAC reduction with a ~1.3 percentage point loss in top-5 error
• Fine tuning: 2.5 percentage point improvement in top-5 error under max MAC constraints
Slide 19
Fixed point and reduced bit widths
Focus on reducing precision for both weights (static values) and activations (dynamic values) versus traditional 32-bit floating-point approaches:
• Physically smaller networks
• 2X improvement in memory access efficiency for network weights
16-bit values are used with no net increase in top-5 error.

Top-5 error impact by bit width (rows: activation bit widths; columns: neural network weight bit widths):

Activations \ Weights |   4    |   8   |  16   |  24   |  32   | Float
8                     | 20.0%  | 1.4%  | 0.1%  | 0.1%  | 0.1%  | 0.1%
16                    | 20.1%  | 1.4%  | 0.0%  | 0.0%  | 0.0%  | 0.0%
24                    | 20.1%  | 1.4%  | 0.0%  | 0.0%  | 0.0%  | 0.0%
32                    | 20.1%  | 1.4%  | 0.0%  | 0.0%  | 0.0%  | 0.0%
Float                 | 20.1%  | 1.4%  | 0.0%  | 0.0%  | 0.0%  | 0.0%
Slide 21
What comes next?
Expanding the frontier of visual perception:
• More complex models
• Video classification
• Scene parsing, object localization, and tracking
Platform enhancements:
• Evolution of the SoC
Working towards “Cognition”:
• Qualcomm Research is experimenting with algorithms for “reasoning” to link perception to action
Slide 22
Additional resources
• Qualcomm Technologies, Inc. web sites:
• Computer Vision: https://www.qualcomm.com/invention/research/projects/computer-vision
• Cognitive Technologies: https://www.qualcomm.com/invention/cognitive-technologies
• FastCV™ SDK: https://developer.qualcomm.com/mobile-development/add-advanced-features/computer-vision-fastcv/tools-and-resources
• Embedded Vision Alliance web sites:
• Heterogeneous computing for CV: http://www.embedded-vision.com/platinum-members/qualcomm/embedded-vision-training/videos/pages/oct-2013-embedded-vision-summit-heterogeneous
• CV acceleration: http://www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/videos/pages/september-2013-qualcomm-uplinq-conferenc
• Demo in Technology Showcase: scene detection through on-device deep learning
FastCV is a product of Qualcomm Technologies, Inc.
Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries.
FastCV is a trademark of Qualcomm Incorporated. All Qualcomm Incorporated trademarks are used with permission.
Other products and brand names may be trademarks or registered trademarks of their respective owners.