Object Detection using Deep Neural Networks

Object Detection
By Usman Qayyum
4 Dec 2018

Talk Covers Three Papers (Object Detection -> Embedded Computing)
SqueezeNet (2016) + SSD (2016) = TinySSD (2018)

Image Classification/Object Detection
● Autonomous vehicles, smart video surveillance, facial detection and various other applications need fast and robust object detection.
● Object detection means not only recognizing and classifying every object in an image, but also localizing each one by drawing the appropriate bounding box around it.

CNN Migration (Image Classification)

Object Detection as Classification
(Figure: a CNN classifies an image region as deer?, cat?, or background?)

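Object Detection as Classification with Sliding Window
(Figure: the same CNN classifies each window slid across the image as deer?, cat?, or background?)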
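The sliding-window approach can be sketched as below. This is a minimal illustration, not an implementation from the talk: `classify` stands in for a hypothetical CNN wrapper that labels one crop, and the window sizes and stride are arbitrary assumptions.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_windows(img_w: int, img_h: int, win: int, stride: int) -> Iterator[Box]:
    """Yield every square window of side `win` that fits inside the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)


def detect_by_sliding_window(
    img_w: int,
    img_h: int,
    classify: Callable[[Box], str],  # hypothetical CNN wrapper: window -> label
    sizes: Tuple[int, ...] = (64, 128),
    stride: int = 32,
) -> List[Tuple[Box, str]]:
    """Run the classifier on every window at every size; keep non-background hits."""
    hits = []
    for win in sizes:
        for box in sliding_windows(img_w, img_h, win, stride):
            label = classify(box)
            if label != "background":
                hits.append((box, label))
    return hits
```

Even on a 128x128 image with two window sizes this makes 10 classifier calls; realistic image sizes, window sizes and strides multiply that count, which is the cost that box proposals aim to cut.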

Object Detection as Classification with Box Proposals

Box Proposal Method: Selective Search
Segmentation as Selective Search for Object Recognition. van de Sande et al. ICCV 2011.
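To make the grouping idea behind Selective Search concrete, here is a toy sketch under heavy simplifying assumptions: the initial regions are given, each region is summarized by a single mean intensity, two regions count as neighbours when their boxes touch, and the similarity weights are made up. The actual method starts from a graph-based (Felzenszwalb and Huttenlocher) over-segmentation and combines colour, texture, size and fill similarities; as here, every region created while merging becomes a box proposal.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)
Region = Tuple[Box, float, int]  # (bounding box, mean intensity, pixel count)


def union_box(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def touching(a: Box, b: Box) -> bool:
    # Boxes overlap or share an edge: a crude stand-in for true region adjacency.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def similarity(a: Region, b: Region) -> float:
    # Toy score (made-up weights): similar intensity, and prefer merging small regions.
    return -abs(a[1] - b[1]) - 0.001 * (a[2] + b[2])


def hierarchical_grouping(regions: List[Region]) -> List[Box]:
    proposals = [r[0] for r in regions]  # every initial region is a proposal
    regions = list(regions)
    while len(regions) > 1:
        pairs = [(similarity(regions[i], regions[j]), i, j)
                 for i in range(len(regions)) for j in range(i + 1, len(regions))
                 if touching(regions[i][0], regions[j][0])]
        if not pairs:
            break  # no adjacent regions are left to merge
        _, i, j = max(pairs)  # merge the most similar neighbouring pair
        a, b = regions[i], regions[j]
        n = a[2] + b[2]
        merged = (union_box(a[0], b[0]), (a[1] * a[2] + b[1] * b[2]) / n, n)
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged[0])  # each merge yields a new proposal
    return proposals
```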

Idea behind Object Detectors
● Box Proposals
● Classifier Algorithm
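The two ingredients combine into a generic two-stage loop. In this sketch `propose` and `classify` are hypothetical callables (for example Selective Search and a CNN), and the 0.5 score threshold is an arbitrary assumption.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def two_stage_detect(
    image: object,
    propose: Callable[[object], Sequence[Box]],            # stage 1: where might objects be?
    classify: Callable[[object, Box], Tuple[str, float]],  # stage 2: what is in this box?
    min_score: float = 0.5,
) -> List[Tuple[Box, str, float]]:
    """Classify every proposed box and keep confident non-background results."""
    detections = []
    for box in propose(image):
        label, score = classify(image, box)
        if label != "background" and score >= min_score:
            detections.append((box, label, score))
    return detections
```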

RCNN
Rich feature hierarchies for accurate object detection and semantic segmentation.
Girshick et al. CVPR 2014.
https://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf

Fast-RCNN
Fast R-CNN. Girshick. ICCV 2015.
https://arxiv.org/abs/1504.08083
Idea: no need to recompute features for every box independently; also regress refined bounding box coordinates.
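One way to picture Fast R-CNN's feature sharing: the CNN feature map is computed once per image, and each box is then max-pooled from that shared map into a fixed-size grid (RoI pooling). The sketch assumes a single-channel feature map stored as nested lists, a box already projected into feature-map coordinates, and a 2x2 output (Fast R-CNN with VGG16 uses 7x7). `decode_box` applies regression offsets with the (dx, dy, dw, dh) box parameterization from R-CNN; both function names are illustrative.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]  # one channel, indexed [row][col]


def roi_max_pool(fmap: FeatureMap, roi: Tuple[int, int, int, int], out: int = 2) -> FeatureMap:
    """Max-pool the cells inside roi = (x1, y1, x2, y2), inclusive and in
    feature-map coordinates, into a fixed out x out grid."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out):
        # Bin rows: floor of the start, ceiling of the end, so no bin is empty.
        r0, r1 = y1 + (i * h) // out, y1 + -(-((i + 1) * h) // out)
        row = []
        for j in range(out):
            c0, c1 = x1 + (j * w) // out, x1 + -(-((j + 1) * w) // out)
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def decode_box(proposal: Tuple[float, float, float, float],
               deltas: Tuple[float, float, float, float]) -> Tuple[float, float, float, float]:
    """Refine a proposal (cx, cy, w, h) with regressed offsets (dx, dy, dw, dh)."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))
```

Boxes of any size yield the same fixed-size output, so a single feature map can feed the classifier and regressor for every proposal.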

Faster-RCNN
Ren et al. NIPS 2015.
Idea: integrate the bounding box proposals as part of the CNN predictions.
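In Faster R-CNN the proposals come from a Region Proposal Network that scores and refines a fixed set of reference boxes (anchors) placed at every feature-map location. A minimal anchor generator, assuming the paper's 3 scales (box areas of 128², 256², 512² pixels) and 3 aspect ratios; the feature stride of 16 is an assumption tied to a VGG-style backbone.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def make_anchors(feat_h: int, feat_w: int, stride: int = 16,
                 scales: Tuple[int, ...] = (128, 256, 512),
                 ratios: Tuple[float, ...] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Return len(scales) * len(ratios) anchors for every feature-map cell."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # cell centre in the image
            for s in scales:
                for r in ratios:  # r = height / width; the area stays s * s
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```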

YOLO- You Only Look Once
● Single Shot Detector
Redmon et al. CVPR 2016.
Idea: no bounding box proposals. Predict a class and a box for every location in a grid.
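In YOLO the grid cell that contains an object's centre is responsible for detecting it, and each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities. This sketch assumes the YOLO paper's PASCAL VOC setting of S = 7, B = 2, C = 20; the function names are illustrative.

```python
from typing import Tuple


def responsible_cell(cx: float, cy: float, img_w: float, img_h: float,
                     S: int = 7) -> Tuple[int, int]:
    """Return (row, col) of the S x S grid cell that contains the box centre."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centres on the right/bottom edge
    row = min(int(cy / img_h * S), S - 1)
    return row, col


def yolo_output_size(S: int = 7, B: int = 2, C: int = 20) -> int:
    # Each cell: B boxes x 5 numbers (x, y, w, h, confidence) + C class probabilities.
    return S * S * (B * 5 + C)
```

With the defaults the network output is a 7 x 7 x 30 tensor, i.e. 1470 numbers, produced in a single forward pass.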

SSD: Single Shot Detector
Liu et al. ECCV 2016.
Idea: similar to YOLO, but with a denser grid map and multiscale grid maps, plus data augmentation, hard negative mining, and other design choices in the network.
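SSD's multiscale grid maps each get default boxes of a different scale. Following the SSD paper, the scale for feature map k of m is s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) with s_min = 0.2 and s_max = 0.9, and a box with aspect ratio a_r has width s_k·sqrt(a_r) and height s_k/sqrt(a_r). This sketch simplifies by giving every map all five aspect ratios, and assumes s_{m+1} = 1.0 for the extra square box on the last map.

```python
import math
from typing import List, Tuple


def default_box_shapes(m: int = 6, s_min: float = 0.2, s_max: float = 0.9,
                       ratios: Tuple[float, ...] = (1.0, 2.0, 3.0, 1 / 2, 1 / 3)
                       ) -> List[List[Tuple[float, float]]]:
    """For each feature map k = 1..m, return (width, height) of its default boxes,
    relative to the input image size."""
    scales = [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]
    shapes = []
    for k, s in enumerate(scales):
        boxes = [(s * math.sqrt(a), s / math.sqrt(a)) for a in ratios]
        # Extra square box with scale sqrt(s_k * s_{k+1}); s_{m+1} = 1.0 is an assumption.
        s_next = scales[k + 1] if k + 1 < m else 1.0
        s_extra = math.sqrt(s * s_next)
        boxes.append((s_extra, s_extra))
        shapes.append(boxes)
    return shapes
```

Early, high-resolution maps get small boxes and later, coarse maps get large ones, which is how a single pass covers objects of very different sizes.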

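-The overall objective loss function is a weighted sum of the localization loss (loc) and the confidence loss (conf):
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
N: the number of matched default boxes (if N = 0, the loss is set to 0)
x: matching indicator; x = 1 denotes that a certain default box is matched to a ground truth box
c: class confidences
l: predicted boxes; g: ground truth boxes
α: weight between the two terms (set to 1 by cross validation in the SSD paper)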
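The loss above can be sketched in plain Python as follows. It assumes the matching step has already produced a class label per default box (0 = background) and encoded regression targets; it uses Smooth L1 for L_loc and softmax cross-entropy for L_conf as in SSD, and keeps at most 3 negatives per positive through hard negative mining. All names are hypothetical.

```python
import math
from typing import List, Sequence


def smooth_l1(x: float) -> float:
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def softmax_ce(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of a softmax over `logits` (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def ssd_loss(labels: List[int], logits: List[List[float]],
             loc_pred: List[List[float]], loc_target: List[List[float]],
             alpha: float = 1.0, neg_pos_ratio: int = 3) -> float:
    """labels[i] is the class matched to default box i (0 = background)."""
    pos = [i for i, y in enumerate(labels) if y > 0]
    n = len(pos)
    if n == 0:
        return 0.0  # no matched default boxes: loss is 0
    # L_loc: Smooth L1 over the matched (positive) boxes only.
    l_loc = sum(smooth_l1(p - t) for i in pos for p, t in zip(loc_pred[i], loc_target[i]))
    # L_conf: all positives plus the hardest negatives (largest background loss).
    neg_losses = sorted((softmax_ce(logits[i], 0) for i, y in enumerate(labels) if y == 0),
                        reverse=True)
    l_conf = sum(softmax_ce(logits[i], labels[i]) for i in pos)
    l_conf += sum(neg_losses[: neg_pos_ratio * n])
    return (l_conf + alpha * l_loc) / n
```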
 

AI Workload Migration
(Figure: AI workloads split between embedded (mobile/edge) devices and server/cloud; training, execution/inference, and intelligence & analytics are shown, with execution/inference appearing on both sides.)
Key Use Cases: Vision | Audio | Security
Benefits: Low Latency | Privacy

How? (AI in Embedded Devices)
Pruning, Quantization
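Both techniques can be illustrated on a flat list of weights. This is a minimal sketch assuming unstructured magnitude pruning and symmetric per-tensor int8 quantization; the function names are hypothetical, and practical toolchains add variants such as per-channel scales or fine-tuning after pruning.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:  # the k smallest-magnitude weights
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric linear quantization: w ≈ scale * q with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [scale * v for v in q]
```

Storing q as 8-bit integers plus one float scale takes about a quarter of the memory of 32-bit floats, at the price of a rounding error of at most scale/2 per weight.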

SqueezeNet (Parameter Reduction)
● Strategy 1. Replace 3x3 filters with 1x1 filters
○ Parameters per filter: a 3x3 filter has 9 times as many parameters as a 1x1 filter
● Strategy 2. Decrease the number of input channels to 3x3 filters
○ Total # of parameters = (# of input channels) * (# of filters) * (# of parameters per filter)
● Strategy 3. Downsample late in the network so that convolution layers have large activation maps
○ The size of the activation maps is determined by the size of the input data and by the choice of layers in which to downsample in the CNN architecture
Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size."
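The parameter formula from Strategy 2 makes Strategies 1 and 2 easy to check numerically. The channel counts below are illustrative assumptions, and biases are ignored.

```python
def conv_params(in_channels: int, num_filters: int, k: int) -> int:
    """Weights of a k x k convolution layer (no biases):
    (# input channels) * (# filters) * (# parameters per filter)."""
    return in_channels * num_filters * k * k


baseline = conv_params(64, 64, 3)    # 64 -> 64 channels with 3x3 filters
strategy1 = conv_params(64, 64, 1)   # same channels, 1x1 filters
strategy2 = conv_params(16, 64, 3)   # 3x3 filters that see only 16 input channels
print(baseline, strategy1, strategy2)
```

This prints 36864 4096 9216: switching to 1x1 filters saves 9x, and feeding the 3x3 filters a quarter of the input channels saves 4x.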

Strategy #1: Conv1x1 or Kernel Reduction
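A 1x1 convolution looks at no spatial neighbourhood: at every pixel it is a weighted sum across input channels, which is why it costs 1/9 of a 3x3 filter per input channel. A plain-Python sketch, assuming a [channel][row][col] nested-list layout and no bias; `conv1x1` is an illustrative name.

```python
from typing import List

Tensor = List[List[List[float]]]  # [channel][row][col]


def conv1x1(x: Tensor, weights: List[List[float]]) -> Tensor:
    """weights[o][c] mixes input channel c into output channel o.
    Each pixel is transformed independently of its neighbours."""
    in_c, h, w = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(in_c))
              for j in range(w)]
             for i in range(h)]
            for o in range(len(weights))]
```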

Microarchitecture – Fire Module
● Squeeze layer: set s1x1 < (e1x1 + e3x3), which limits the # of input channels to the 3x3 filters
● Strategy 2. Decrease the number of input channels to 3x3 filters
○ Total # of parameters = (# of input channels) * (# of filters) * (# of parameters per filter)
○ How much can we limit s1x1?
● Strategy 1. Replace 3x3 filters with 1x1 filters
○ Parameters per filter: (3x3 filter) = 9 * (1x1 filter)
○ How much can we replace 3x3 with 1x1 (e1x1 vs. e3x3)?
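Counting weights shows how the squeeze layer pays off. The channel counts (a 96-channel input, s1x1 = 16, e1x1 = e3x3 = 64) are illustrative, biases are ignored, and the comparison layer is a hypothetical plain 3x3 convolution with the same 128 output channels.

```python
def fire_params(in_channels: int, s1x1: int, e1x1: int, e3x3: int) -> int:
    """Weights of a Fire module (no biases)."""
    squeeze = in_channels * s1x1 * 1 * 1
    # Both expand branches see only the s1x1 squeezed channels (Strategy 2).
    expand = s1x1 * e1x1 * 1 * 1 + s1x1 * e3x3 * 3 * 3
    return squeeze + expand


def plain_3x3_params(in_channels: int, out_channels: int) -> int:
    return in_channels * out_channels * 3 * 3


fire = fire_params(96, 16, 64, 64)   # outputs 64 + 64 = 128 channels
plain = plain_3x3_params(96, 128)
print(fire, plain)
```

Here the Fire module needs 11,776 weights versus 110,592 for the plain 3x3 layer, roughly 9.4x fewer for the same number of output channels.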

Expand
● In the "expand" layers, what are the
tradeoffs when we turn the knob
between mostly 1x1 and mostly 3x3
filters?
● Hypothesis: if having more weights
leads to higher accuracy, then having
all 3x3 filters should give the highest
accuracy
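The knob can be expressed as the fraction of expand filters that are 3x3 (called pct3x3 here). The sketch below, with illustrative sizes s1x1 = 16 and 128 expand filters in total, only counts weights: parameters grow linearly from mostly-1x1 to all-3x3, so the hypothesis asks whether accuracy grows with them.

```python
def expand_params(s1x1: int, expand_total: int, pct3x3: float) -> int:
    """Expand-layer weights (no biases) when a fraction pct3x3 of the filters are 3x3."""
    e3x3 = round(expand_total * pct3x3)
    e1x1 = expand_total - e3x3
    return s1x1 * e1x1 + s1x1 * e3x3 * 9


for pct in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(pct, expand_params(16, 128, pct))
```

The count rises from 2,048 weights with all 1x1 filters to 18,432 with all 3x3 filters.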

Macroarchitecture
● Strategy 3. Downsample late in the network so that convolution layers have large activation maps
○ The size of the activation maps is determined by the size of the input data and by the choice of layers in which to downsample in the CNN architecture
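Strategy 3 can be seen by tracking spatial sizes through a network. This sketch assumes "same"-padded layers so that only strides change the map size; the two stride schedules are made-up examples with the same overall downsampling (8x) but different placement.

```python
from typing import List


def activation_sizes(input_size: int, strides: List[int]) -> List[int]:
    """Spatial size after each layer, assuming 'same' padding (only strides shrink maps)."""
    sizes, s = [], input_size
    for st in strides:
        s = -(-s // st)  # ceiling division
        sizes.append(s)
    return sizes


early = activation_sizes(224, [2, 2, 2, 1, 1, 1, 1, 1])  # downsample early
late = activation_sizes(224, [1, 1, 1, 1, 1, 2, 2, 2])   # downsample late
print(early)
print(late)
```

Both schedules end at 28x28, but with late downsampling most layers keep 224x224 activation maps, which is the property Strategy 3 aims for.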

TinySSD (SSD with Microarchitecture)
