Tutorial
Faster R-CNN
Object Detection: Localization & Classification
Hwa Pyung Kim
Department of Computational Science and Engineering, Yonsei University
hpkim0512@yonsei.ac.kr
Object Detection: Classification + Regression
Classification (recognition): What?
• A one-hot class vector (1, 0, 0, ⋯) over the classes (Dog, Cat, ⋯, Person)
Bounding box regression (localization): Where?
• A bounding box (𝑥, 𝑦, 𝑤, ℎ)
Classification + Regression = "A dog at (𝒙, 𝒚, 𝒘, 𝒉)"
Object Detection
[Figure: input image → encoding (conv & pool) → feature map → combining features → class and bounding box]
Bounding box information
• 𝒙, 𝒚 : top left corner position
• w = width
• h = height
CNN-based Object Detection:
There are clues of a dog (What) at a local position (Where) in the convolution feature map.
VGG16 network
• Input image: [224, 224, 3]
• pool5 features: [7, 7, 512]
• 7 = 224 / 32, where 32 = 2^5 and 5 = # of pooling layers
[Figure: pool5 features → fully-connected layers → classification (one-hot vector (1, 0, 0, ⋯) over Dog, Cat, Person, ⋯) and regression (𝑥, 𝑦, 𝑤, ℎ)]
The red boxes contain clues of "dog at the bounding box (𝑥, 𝑦, 𝑤, ℎ)".
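The size arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `feature_map_size` is hypothetical, and it assumes each of the five pooling layers halves the spatial size and rounds down.

```python
def feature_map_size(height, width, num_pools=5):
    """Spatial size after `num_pools` stride-2 poolings (assumes floor division)."""
    stride = 2 ** num_pools  # 32 for the five pooling layers of VGG16
    return height // stride, width // stride


print(feature_map_size(224, 224))  # (7, 7)
print(feature_map_size(360, 480))  # (11, 15)
```

The second call uses the 360 × 480 image size that appears later in the deck.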
Multiple Object Detection:
Localize and Classify all objects appearing in the image
How many objects are in there?
• Classify these multiple overlapping objects
• Identify their bounding boxes
PASCAL VOC2007
Region-based CNN (R-CNN) method
• Extract "region proposals" using the selective search method.
• Affine image warping: compute a fixed-size CNN input from each region proposal, regardless of the region's shape.
• Each warped region (fixed-size CNN input) goes through the ConvNet and a Classifier & Regressor.
[Figure: example detections labeled Background, Person, Dining table]
Fast R-CNN
feature map
ConvNet
Classifier &
Regressor
RoI pooling: Convert the features inside a valid RoI into a small feature map with a fixed spatial extent.
Faster R-CNN:
Towards Real-Time Object Detection with Region Proposal Networks
feature map
Region Proposal
Network
RoI pooling
proposals
ConvNet
Classifier
&
Regressor
What is Region Proposal Network?
Region Proposal Network (RPN)
• Input image: 360 × 480
• Conv feature map: 11 × 15 × 512
• 11 = ⌊360 / 32⌋ and 15 = 480 / 32, where 32 = 2^5, 5 = # of pooling layers, and 512 = # of filters
RPN outputs a set of rectangular object proposals, each with an objectness score.
How?
[Figure: conv feature map → RPN → region proposals]
Region Proposals & Anchor Boxes
[Figure: a 3×3×512 sliding window (red cuboid) on the 11 × 15 × 512 conv feature map is fed to fully-connected layers, which output 𝑠^obj, 𝑠^nobj, 𝑡𝑥, 𝑡𝑦, 𝑡𝑤, 𝑡ℎ]
Input: each sliding window (3×3×512)
For each sliding window (red cuboid), expressed as a 𝟑 × 𝟑 × 𝟓𝟏𝟐 vector, the proposal is parametrized relative to an anchor:
𝑝𝑥 = 𝑎𝑥 + 𝑎𝑤 ⋅ 𝑡𝑥
𝑝𝑦 = 𝑎𝑦 + 𝑎ℎ ⋅ 𝑡𝑦
𝑝𝑤 = 𝑎𝑤 ⋅ exp 𝑡𝑤
𝑝ℎ = 𝑎ℎ ⋅ exp 𝑡ℎ
Output:
• 4 coordinates: 𝑝𝑥, 𝑝𝑦, 𝑝𝑤, 𝑝ℎ
• 2 scores: 𝑠^obj, 𝑠^nobj, which estimate the probability of object or not object for each proposal
Anchor box information
• 𝒂𝒙 , 𝒂𝒚 : center position
• 𝒂𝒘 = width
• 𝒂𝒉 = height
Anchor box
For example, 𝑎𝑤 = 𝑎ℎ = 128
• 𝑎𝑤 and 𝑎ℎ are fixed.
• 𝑎𝑥, 𝑎𝑦 are determined by the position of the red box
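The parametrization above can be written as a small decoding function. This is a minimal sketch: the name `decode_proposal` and the example anchor position are hypothetical, and boxes are plain (x, y, w, h) tuples.

```python
import math


def decode_proposal(anchor, t):
    """Apply p_x = a_x + a_w*t_x, p_y = a_y + a_h*t_y, p_w = a_w*exp(t_w), p_h = a_h*exp(t_h)."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = t
    return (ax + aw * tx, ay + ah * ty, aw * math.exp(tw), ah * math.exp(th))


# Hypothetical anchor with a_w = a_h = 128, as in the example on the slide.
anchor = (240.0, 176.0, 128.0, 128.0)

# Zero offsets give back the anchor itself.
print(decode_proposal(anchor, (0.0, 0.0, 0.0, 0.0)))  # (240.0, 176.0, 128.0, 128.0)

# Shift right by a quarter width, up by half a height, and double the width.
print(decode_proposal(anchor, (0.25, -0.5, math.log(2.0), 0.0)))
```

Because the width and height use an exponential, the decoded size is always positive regardless of the sign of 𝑡𝑤 and 𝑡ℎ.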
Region Proposals & Anchor Boxes
[Figure: each 3×3×512 sliding window (red cuboid) on the 11 × 15 × 512 conv feature map is fed to fully-connected layers, which output (𝑠𝑖^obj, 𝑠𝑖^nobj, 𝑡𝑥𝑖, 𝑡𝑦𝑖, 𝑡𝑤𝑖, 𝑡ℎ𝑖) for 𝑖 = 1, ⋯, 9]
Input: each sliding window (3×3×512)
For each sliding window (red cuboid), expressed as a 𝟑 × 𝟑 × 𝟓𝟏𝟐 vector, the 9 proposals are parametrized relative to 9 anchors.
9 Anchor boxes = 3 ratios × 3 scales
• 𝑎𝑤𝑖 and 𝑎ℎ𝑖 are fixed.
• 𝑎𝑥𝑖, 𝑎𝑦𝑖 are determined by the position of the red box.
For example,
𝑎𝑤1 = 𝑎ℎ1 = 128, 𝑎𝑤2 = 𝑎ℎ2 = 2 × 128, 𝑎𝑤3 = 𝑎ℎ3 = 4 × 128,
𝑎𝑤4 = 2 × 𝑎ℎ4 = 128, ⋯
𝑎𝑤7 = ½ × 𝑎ℎ7 = 128, ⋯
Output: For 𝑖 = 1, ⋯ , 9,
• 4 coordinates: 𝑝𝑥𝑖, 𝑝𝑦𝑖, 𝑝𝑤𝑖, 𝑝ℎ𝑖
• 2 scores: 𝑠𝑖^obj, 𝑠𝑖^nobj, which estimate the probability of object or not object for each proposal
For 𝑖 = 1, ⋯ 9,
𝑝𝑥𝑖 = 𝑎𝑥𝑖 + 𝑎𝑤𝑖 ⋅ t𝑥𝑖
𝑝𝑦𝑖 = 𝑎𝑦𝑖 + 𝑎ℎ𝑖 ⋅ t𝑦𝑖
𝑝𝑤𝑖 = 𝑎𝑤𝑖 ⋅ exp t𝑤𝑖
𝑝ℎ𝑖 = 𝑎ℎ𝑖 ⋅ exp tℎ𝑖
Anchor box information
• 𝒂𝒙𝒊, 𝒂𝒚𝒊 : center position
• 𝒂𝒘𝒊 = width
• 𝒂𝒉𝒊 = height
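A possible way to enumerate the 9 anchors at one window position is sketched below. The slide only spells out anchors 1 to 4 and 7; the sketch assumes the pattern continues with widths 128, 256, 512 inside each ratio group, and the function name and center position are hypothetical.

```python
def make_anchors(ax, ay, widths=(128, 256, 512), height_over_width=(1.0, 0.5, 2.0)):
    """Return 9 anchors (a_x, a_y, a_w, a_h) = 3 ratios x 3 scales around one center."""
    anchors = []
    for ratio in height_over_width:      # assumed order: square, wide, tall
        for aw in widths:                # assumed scales: 128, 2*128, 4*128
            anchors.append((ax, ay, float(aw), aw * ratio))
    return anchors


anchors = make_anchors(ax=240.0, ay=176.0)
print(len(anchors))   # 9
print(anchors[0])     # (240.0, 176.0, 128.0, 128.0)  anchor 1: a_w = a_h = 128
print(anchors[3])     # (240.0, 176.0, 128.0, 64.0)   anchor 4: a_w = 2 * a_h = 128
print(anchors[6])     # (240.0, 176.0, 128.0, 256.0)  anchor 7: a_w = a_h / 2 = 128
```

The widths and heights never change from window to window; only the center (𝑎𝑥, 𝑎𝑦) moves with the sliding window.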
[Figure: anchor boxes on the 11 × 15 × 512 conv feature map, followed by fully-connected layers]
For 𝑖 = 1, ⋯ 9,
𝑝𝑥𝑖 = 𝑎𝑥𝑖 + 𝑎𝑤𝑖 ⋅ 𝑡𝑥𝑖
𝑝𝑦𝑖 = 𝑎𝑦𝑖 + 𝑎ℎ𝑖 ⋅ 𝑡𝑦𝑖
𝑝𝑤𝑖 = 𝑎𝑤𝑖 ⋅ exp 𝑡𝑤𝑖
𝑝ℎ𝑖 = 𝑎ℎ𝑖 ⋅ exp 𝑡ℎ𝑖
$p_i = \dfrac{\exp(s_i^{obj})}{\exp(s_i^{obj}) + \exp(s_i^{nobj})}$
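The objectness probability is a two-way softmax over the pair of scores. A minimal sketch follows; the function name is hypothetical, and subtracting the maximum score is a numerical-stability choice of this sketch rather than something stated on the slide.

```python
import math


def objectness(s_obj, s_nobj):
    """p = exp(s_obj) / (exp(s_obj) + exp(s_nobj)), computed stably."""
    m = max(s_obj, s_nobj)           # shifting both scores leaves the ratio unchanged
    e_obj = math.exp(s_obj - m)
    e_nobj = math.exp(s_nobj - m)
    return e_obj / (e_obj + e_nobj)


print(objectness(0.0, 0.0))   # 0.5
print(objectness(2.0, -1.0))  # same as a sigmoid of the score difference 3.0
```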
Extract 9 Proposals relative to 9 Anchors
[Figure: for each 3×3×512 window, the network outputs (𝑠𝑖^obj, 𝑠𝑖^nobj, 𝑡𝑥𝑖, 𝑡𝑦𝑖, 𝑡𝑤𝑖, 𝑡ℎ𝑖), 𝑖 = 1, ⋯, 9, are converted into the proposals (𝑝𝑖, 𝑝𝑥𝑖, 𝑝𝑦𝑖, 𝑝𝑤𝑖, 𝑝ℎ𝑖), 𝑖 = 1, ⋯, 9]
Generate Region Proposals
• Total # of windows = 11 × 15
• # of proposals per window = 9
• Total # of proposals: 11 × 15 × 9 = 1485
The proposals overlap each other heavily!
Need to reduce redundancy.
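The proposal count can be checked by enumerating every (window row, window column, anchor index) triple. This is a small sketch with illustrative variable names.

```python
feat_h, feat_w, anchors_per_window = 11, 15, 9

# One proposal per anchor at every sliding-window position on the feature map.
proposal_ids = [
    (row, col, k)
    for row in range(feat_h)
    for col in range(feat_w)
    for k in range(anchors_per_window)
]
print(len(proposal_ids))  # 1485
```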
Reduce redundancy by Non-Maximum Suppression (NMS)
Step 1.
Take the most probable proposal from the 1485 proposals.
Proposal information
• 𝒑𝒙𝒊, 𝒑𝒚𝒊 : top left corner position
• 𝒑𝒘𝒊 = width
• 𝒑𝒉𝒊 = height
• 𝒑𝒊 = objectness probability, 𝒑𝟏 ≥ 𝒑𝟐 ≥ ⋯ ≥ 𝒑𝟏𝟒𝟖𝟓
[Figure: proposals 1, 2, ⋯, 173, ⋯, 1480, ⋯, 1485, each given as (𝑝𝑥𝑖, 𝑝𝑦𝑖, 𝑝𝑤𝑖, 𝑝ℎ𝑖, 𝑝𝑖); proposal 1 is the most probable proposal]
Reduce redundancy by Non-Maximum Suppression (NMS)
Step 1.
Take the most probable proposal from the 1485 proposals.
Step 2.
Compute the IoU between the most probable proposal and the other proposals, and discard proposals having 𝑰𝒐𝑼 > threshold (0.7).
IoU = (area of intersection) / (area of union)
[Figure: example IoU values between the most probable proposal and proposals 2, 173, 1480, 1485, ranging from 0.83 down to 0]
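IoU for two axis-aligned boxes can be computed as below. This sketch assumes boxes are (x, y, w, h) tuples with (x, y) the top left corner, matching the proposal information listed in the deck; the function name is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes, (x, y) = top left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (0, 0, 10, 10)))   # 1.0 (identical boxes)
print(iou((0, 0, 10, 10), (5, 0, 10, 10)))   # 50 / 150, about 0.333
print(iou((0, 0, 10, 10), (20, 20, 5, 5)))   # 0.0 (disjoint boxes)
```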
Reduce redundancy by NMS
Summary of steps 1-2 in NMS:
Given the most probable proposal, the blue proposals have 𝑰𝒐𝑼 > threshold (0.7).
• Most probable proposal: 30 proposals having IoU > 0.7 are discarded.
Step 3:
Get the next most probable proposal among the remaining 1485 − 30 proposals and repeat the previous process.
• Next most probable proposal: 36 proposals having IoU > 0.7 are discarded.
Reduce redundancy by NMS
Repeat the previous procedure until…
• Before NMS: 1,485 proposals
• After NMS: 300 proposals
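Steps 1 to 3 can be put together as a greedy loop. The sketch below is one possible reading, not the deck's implementation: the stopping rule is elided on the slide, so capping the output at 300 kept proposals (`max_keep`) is an assumption, as are the function names and the (x, y, w, h, p) tuple layout.

```python
def iou(box_a, box_b):
    """IoU of two (x, y, w, h) boxes with (x, y) the top left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(proposals, iou_threshold=0.7, max_keep=300):
    """Greedy NMS over (x, y, w, h, p) tuples; returns kept proposals, most probable first."""
    remaining = sorted(proposals, key=lambda q: q[4], reverse=True)
    kept = []
    while remaining and len(kept) < max_keep:   # assumed stopping rule
        best = remaining.pop(0)                 # Step 1: most probable remaining proposal
        kept.append(best)
        # Step 2: discard proposals whose IoU with it exceeds the threshold.
        remaining = [q for q in remaining if iou(best[:4], q[:4]) <= iou_threshold]
        # Step 3: the loop repeats with the next most probable proposal.
    return kept


proposals = [(0, 0, 10, 10, 0.9), (1, 0, 10, 10, 0.8), (20, 20, 10, 10, 0.7)]
print([q[4] for q in nms(proposals)])  # [0.9, 0.7]
```

In the example, the second box overlaps the first with IoU 90/110 ≈ 0.82 and is suppressed, while the third box does not overlap and survives.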
Summary of RPN
Inputs:
• Conv feature map
Outputs:
• Region proposal coordinates.
• Probabilities representing how likely each region proposal is to contain an object.
Region Proposal Network
feature map
Region Proposal
Network
RoI pooling
proposals
ConvNet
Now we are ready to explain
Classifier & Regressor.
Classifier
&
Regressor
Classifier & Regressor
RoI pooling layer
Convert the features inside a valid RoI into a small feature map with a fixed spatial extent.
• A proposal (𝑝𝑥, 𝑝𝑦, 𝑝𝑤, 𝑝ℎ) on the 360 × 480 image is mapped to (𝑝𝑥′, 𝑝𝑦′, 𝑝𝑤′, 𝑝ℎ′) on the 11 × 15 conv feature map:
$p_x' = p_x \cdot \frac{15}{480}, \quad p_y' = p_y \cdot \frac{11}{360}, \quad p_w' = p_w \cdot \frac{15}{480}, \quad p_h' = p_h \cdot \frac{11}{360}$
• Bilinear interpolation & max pooling turn the RoI features into the fixed-size (7 × 7) input for the Classifier & Regressor.
[Figure: an RoI on the 11 × 15 conv feature map, annotated with the numbers 5, 8, 3, 9, pooled to a 7 × 7 map]
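The two RoI pooling steps can be sketched as follows. This is a simplified illustration under stated assumptions: it handles a single channel stored as a 2-D list, uses plain integer binning with max pooling only (no bilinear interpolation), and assumes the RoI lies inside the feature map. The function names are hypothetical.

```python
def to_feature_coords(p, image_hw=(360, 480), feat_hw=(11, 15)):
    """Scale (px, py, pw, ph) from image pixels to feature-map cells."""
    px, py, pw, ph = p
    sy = feat_hw[0] / image_hw[0]   # 11 / 360 for the vertical axis
    sx = feat_hw[1] / image_hw[1]   # 15 / 480 for the horizontal axis
    return (px * sx, py * sy, pw * sx, ph * sy)


def roi_max_pool(feature, roi, out_size=7):
    """Max-pool one channel inside roi = (x, y, w, h) into out_size x out_size bins."""
    x, y, w, h = roi
    pooled = []
    for i in range(out_size):
        r0 = int(y + i * h / out_size)
        r1 = max(r0 + 1, int(y + (i + 1) * h / out_size))   # at least one row per bin
        row = []
        for j in range(out_size):
            c0 = int(x + j * w / out_size)
            c1 = max(c0 + 1, int(x + (j + 1) * w / out_size))
            row.append(max(feature[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


print(to_feature_coords((160, 120, 320, 240)))

# Tiny 4 x 4 channel pooled to 2 x 2 so the result is easy to check by hand.
feature = [[r * 4 + c for c in range(4)] for r in range(4)]
print(roi_max_pool(feature, (0, 0, 4, 4), out_size=2))  # [[5, 7], [13, 15]]
```

Whatever the RoI size, the output always has `out_size × out_size` entries, which is what lets the fully-connected layers accept every proposal.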
RoI pooling layer generates inputs for Classifier & Regressor
• 300 RoI pooled feature maps, each of size 7 × 7 × 512
Classification & Regression per each proposal
[Figure: RoI pooling → 7×7×512 feature map → fully-connected layers (4096) → per-class outputs (𝑠𝑖, 𝑟𝑥𝑖, 𝑟𝑦𝑖, 𝑟𝑤𝑖, 𝑟ℎ𝑖), 𝑖 = 0, ⋯, 20]
For a proposal (𝑝𝑥, 𝑝𝑦, 𝑝𝑤, 𝑝ℎ) and each class 𝑖 = 0, ⋯, 20:
𝑥𝑖 = 𝑝𝑥 + 𝑝𝑤 ⋅ 𝑟𝑥𝑖
𝑦𝑖 = 𝑝𝑦 + 𝑝ℎ ⋅ 𝑟𝑦𝑖
𝑤𝑖 = 𝑝𝑤 ⋅ exp(𝑟𝑤𝑖)
ℎ𝑖 = 𝑝ℎ ⋅ exp(𝑟ℎ𝑖)
$p_i = \dfrac{\exp(s_i)}{\sum_{j=0}^{20} \exp(s_j)}$
Example probabilities:
• Background: 𝑝0 = 0.0124
• Person: 𝑝15 = 0.9797
• TV monitor: 𝑝20 = 0.0001
Proposal Classification & Bounding-box regression
Each of the 21 classes gets its own refined bounding-box prediction and an estimated probability.
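The per-proposal outputs can be turned into class probabilities and refined boxes as sketched below. The scores and offsets here are made-up numbers for illustration, and the function names are hypothetical; only the formulas come from the slide.

```python
import math


def softmax(scores):
    """p_i = exp(s_i) / sum_j exp(s_j), with the max subtracted for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine(proposal, r):
    """x = p_x + p_w*r_x, y = p_y + p_h*r_y, w = p_w*exp(r_w), h = p_h*exp(r_h)."""
    px, py, pw, ph = proposal
    rx, ry, rw, rh = r
    return (px + pw * rx, py + ph * ry, pw * math.exp(rw), ph * math.exp(rh))


# Made-up scores for the 21 classes (index 0 = background, 15 = person).
scores = [0.0] * 21
scores[15] = 5.0
probs = softmax(scores)
best = max(range(21), key=lambda i: probs[i])
print(best)  # 15

# Made-up regression output for class 15 applied to one proposal.
print(refine((10.0, 20.0, 100.0, 50.0), (0.1, 0.0, 0.0, 0.0)))
```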
Summary of Classification & Regression
• Regress & classify each class from the region proposals.
• Reduce redundancy by NMS.
• Discard bounding boxes (p < 0.6 or background).
[Figure: per-proposal results such as Background, Person, TV monitor, Dining table; discarded proposals are marked None]
Summary of Classifier & Regressor
Inputs:
• Conv feature map
• Region proposals
Outputs:
• Bounding-box coordinates of objects in the image.
• Classification of the bounding boxes.
Classifier & Regressor
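Training process for RPN
Input
• Image $X^{(k)}$
Ground-truth
• Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$
• Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
Anchor boxes
• $A^{(k)} = \{A_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}}$, where $A_j^{(k)} = (a_{x_j}^{(k)}, a_{y_j}^{(k)}, a_{w_j}^{(k)}, a_{h_j}^{(k)})$
Ground-truth proposals associated with anchors $A_j^{(k)}$
Find the nearest bounding box for each anchor: $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, A_j^{(k)})$
• Ground-truth probability of objectness:
$p_j^{(k)} := \begin{cases} 1, & \text{if } IoU(B_i^{(k)}, A_j^{(k)}) > 0.7 \\ 0, & \text{if } IoU(B_i^{(k)}, A_j^{(k)}) < 0.3 \end{cases}$
• Ground-truth proposal transformation: $t_j^{(k)} := (t_{x_j}^{(k)}, t_{y_j}^{(k)}, t_{w_j}^{(k)}, t_{h_j}^{(k)})$, where
$t_{x_j}^{(k)} = (x_i^{(k)} - a_{x_j}^{(k)}) / a_{w_j}^{(k)}, \quad t_{y_j}^{(k)} = (y_i^{(k)} - a_{y_j}^{(k)}) / a_{h_j}^{(k)}, \quad t_{w_j}^{(k)} = \log(w_i^{(k)} / a_{w_j}^{(k)}), \quad t_{h_j}^{(k)} = \log(h_i^{(k)} / a_{h_j}^{(k)})$
Predicted proposals (predictions are marked with a hat)
• Predicted probability of objectness: $\hat{p}_j^{(k)}$
• Predicted proposal transformation: $\hat{t}_j^{(k)} = (\hat{t}_{x_j}^{(k)}, \hat{t}_{y_j}^{(k)}, \hat{t}_{w_j}^{(k)}, \hat{t}_{h_j}^{(k)})$, where
$\{\hat{t}_j^{(k)}, \hat{p}_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}} = RPN(CNN(X^{(k)}; W_{CNN}); W_{RPN})$
$L_{RPN}(\hat{p}_j^{(k)}, \hat{t}_j^{(k)}, p_j^{(k)}, t_j^{(k)}; W_{CNN}, W_{RPN}) = \frac{1}{2} \sum_{j=1}^{N_{batch}} H(p_j^{(k)}, \hat{p}_j^{(k)}) + \lambda_{RPN} \frac{1}{N_{anc}^{(k)}} \sum_{j=1}^{N_{batch}} p_j^{(k)} \, smooth_{L1}(t_j^{(k)}, \hat{t}_j^{(k)})$
where $H$ is the cross-entropy function and $smooth_{L1}$ is applied to each component of the difference of its two arguments, with
$smooth_{L1}(x) = \begin{cases} 0.5 x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$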
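The RPN training targets and the regression term can be sketched in plain Python. This is an illustration under assumptions: the function names are hypothetical, anchors with IoU between 0.3 and 0.7 get no label here because the slide defines none for them, and summing smooth L1 over the four components is an assumed reduction.

```python
import math


def rpn_label(iou_value, pos=0.7, neg=0.3):
    """Ground-truth objectness: 1 if IoU > 0.7, 0 if IoU < 0.3, None otherwise."""
    if iou_value > pos:
        return 1
    if iou_value < neg:
        return 0
    return None  # no label is defined on the slide for this range


def encode_target(box, anchor):
    """Ground-truth transformation of `box` relative to `anchor`, both (x, y, w, h)."""
    x, y, w, h = box
    ax, ay, aw, ah = anchor
    return ((x - ax) / aw, (y - ay) / ah, math.log(w / aw), math.log(h / ah))


def smooth_l1(x):
    """0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def regression_term(t_true, t_pred):
    """Smooth L1 summed over the four components (assumed reduction)."""
    return sum(smooth_l1(a - b) for a, b in zip(t_true, t_pred))


anchor = (240.0, 176.0, 128.0, 128.0)
box = (272.0, 112.0, 256.0, 128.0)
print(rpn_label(0.83), rpn_label(0.10), rpn_label(0.50))  # 1 0 None
print(encode_target(box, anchor))
print(smooth_l1(0.5), smooth_l1(2.0))  # 0.125 1.5
```

`encode_target` is the inverse of the decoding formulas used to produce proposals, so a perfect prediction reproduces the ground-truth box.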
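Training process for Classifier & Regressor
Input
• Image $X^{(k)}$
Ground-truth
• Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$
• Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
Region Proposals associated with anchors $A_j^{(k)}$ (predictions are marked with a hat)
$P^{(k)} := \{P_j^{(k)}, \hat{p}_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}}, \quad P_j^{(k)} = (p_{x_j}^{(k)}, p_{y_j}^{(k)}, p_{w_j}^{(k)}, p_{h_j}^{(k)})$, where
$p_{x_j}^{(k)} = a_{x_j}^{(k)} + a_{w_j}^{(k)} \hat{t}_{x_j}^{(k)}, \quad p_{y_j}^{(k)} = a_{y_j}^{(k)} + a_{h_j}^{(k)} \hat{t}_{y_j}^{(k)}, \quad p_{w_j}^{(k)} = a_{w_j}^{(k)} \exp(\hat{t}_{w_j}^{(k)}), \quad p_{h_j}^{(k)} = a_{h_j}^{(k)} \exp(\hat{t}_{h_j}^{(k)})$
$P^{(k)} \leftarrow NMS(P^{(k)}, N_{prop})$
Ground-truth Classification & Regression associated with proposals $P_j^{(k)}$
Find the nearest bounding box for each proposal: $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, P_j^{(k)})$
• Ground-truth Classification:
$c_j^{(k)} := (c_{j,0}^{(k)}, \cdots, c_{j,N_{cls}}^{(k)}) = \begin{cases} (1, 0, \cdots, 0), & \text{if } IoU(B_i^{(k)}, P_j^{(k)}) < 0.5 \\ (0, \cdots, 0, 1, 0, \cdots, 0), & \text{otherwise,} \end{cases}$
where the 1 in the second case is the $(C_i^{(k)} + 1)$-th component.
• Ground-truth Regression: $r_j^{(k)} := (r_{x_j}^{(k)}, r_{y_j}^{(k)}, r_{w_j}^{(k)}, r_{h_j}^{(k)})$, where
$r_{x_j}^{(k)} = (x_i^{(k)} - p_{x_j}^{(k)}) / p_{w_j}^{(k)}, \quad r_{y_j}^{(k)} = (y_i^{(k)} - p_{y_j}^{(k)}) / p_{h_j}^{(k)}, \quad r_{w_j}^{(k)} = \log(w_i^{(k)} / p_{w_j}^{(k)}), \quad r_{h_j}^{(k)} = \log(h_i^{(k)} / p_{h_j}^{(k)})$
Predicted Classification & Regression
• Predicted Classification: $\hat{c}_j^{(k)} = (\hat{c}_{j,0}^{(k)}, \cdots, \hat{c}_{j,N_{cls}}^{(k)})$
• Predicted Regression: $\hat{r}_j^{(k)} = (\hat{r}_{x_j}^{(k)}, \hat{r}_{y_j}^{(k)}, \hat{r}_{w_j}^{(k)}, \hat{r}_{h_j}^{(k)})$, where
$\{\hat{r}_j^{(k)}, \hat{c}_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}} = CR(CNN(X^{(k)}; W_{CNN}), P^{(k)}; W_{CR})$
$L_{CR}(\hat{r}_j^{(k)}, \hat{c}_j^{(k)}, r_j^{(k)}, c_j^{(k)}; W_{CNN}, W_{CR}) = \sum_{j=1}^{N_{prop}} H(c_j^{(k)}, \hat{c}_j^{(k)}) + \lambda_{CR} \sum_{j=1}^{N_{prop}} (1 - c_{j,0}^{(k)}) \, smooth_{L1}(r_j^{(k)}, \hat{r}_j^{(k)})$
where $H$ is the cross-entropy function and $smooth_{L1}$ is applied to each component of the difference of its two arguments, with
$smooth_{L1}(x) = \begin{cases} 0.5 x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$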
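For a single proposal, the classification target and the loss can be sketched as below. Several points are assumptions of this sketch: class labels run from 1 to 20 with index 0 reserved for background (so the (C + 1)-th component, counting from one, is index C counting from zero), smooth L1 is summed over the four components, and a small `eps` guards the logarithm. All function names are hypothetical.

```python
import math


def one_hot_target(class_label, iou_value, num_classes=20, threshold=0.5):
    """Background (index 0) if IoU < 0.5, otherwise a 1 at index `class_label` (1..20)."""
    target = [0] * (num_classes + 1)
    if iou_value < threshold:
        target[0] = 1
    else:
        target[class_label] = 1
    return target


def cross_entropy(target, probs, eps=1e-12):
    """H(c, c_hat) = -sum_i c_i * log(c_hat_i); eps avoids log(0)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def smooth_l1(x):
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def cr_loss(c_true, c_pred, r_true, r_pred, lam=1.0):
    """One proposal's term: H(c, c_hat) + lam * (1 - c_0) * smooth L1 on the box offsets."""
    reg = sum(smooth_l1(a - b) for a, b in zip(r_true, r_pred))
    return cross_entropy(c_true, c_pred) + lam * (1 - c_true[0]) * reg


uniform = [1.0 / 21] * 21
person = one_hot_target(15, iou_value=0.8)
background = one_hot_target(15, iou_value=0.3)
print(person.index(1), background.index(1))  # 15 0
print(cr_loss(person, uniform, (0.1, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)))
print(cr_loss(background, uniform, (0.1, 0.0, 0.0, 0.0), (5.0, 5.0, 5.0, 5.0)))
```

The factor (1 − c₀) switches the regression term off for background proposals, so the last call depends only on the classification error.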
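The History of object detection in deep learning
[Figure: timeline of backbone networks (AlexNet, 2012.12; VggNet & InceptionNet, 2014.9; ResNet, 15.12.10) and detectors (RCNN, Fast RCNN, Faster RCNN, Mask RCNN, Yolo, Yolo v2, SSD, DSSD), with the dates 2013.11.11, 2015.4.30, 2015.5.14, 15.6.8, 15.12.25, 15.12.08, 17.1.23, 17.3.20 marked along the detector timeline]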
Application to Ultrasound-based Fetal biometry
References
[Gitbooks] Object Localization and Detection
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/object_localization_and_detection.html
[ICCV2015 Tutorial] Convolutional Feature Maps
https://courses.engr.illinois.edu/ece420/sp2017/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdf
[Infographic] The Modern History of Object Recognition
https://github.com/Nikasa1889/HistoryObjectRecognition
[Tensorflow Code] tf-Faster-RCNN
https://github.com/kevinjliang/tf-Faster-RCNN
[Medium] A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN
https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4
[pyimagesearch] Intersection over Union (IoU) for object detection
https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
[Stanford c231n] Lecture 11: Detection and Segmentation
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Thank you
E-mail: hpkim0512@yonsei.ac.kr
Homepage: https://hpkim0512.blogspot.com

Tutorial on Object Detection (Faster R-CNN)

  • 1. Tutorial Faster R-CNN Object Detection: Localization & Classification Hwa Pyung Kim Department of Computational Science and Engineering, Yonsei University hpkim0512@yonsei.ac.kr
  • 2. 𝑥 𝑦 𝑤 ℎ Bounding box regression (localization): Where? Object Detection: Classification + Regression A dog at (𝒙, 𝒚, 𝒘, 𝒉) + = 1 0 0 ⋮ Dog Cat ⋮ Person Classification (recognition): What? Objection Detection Feature map Encoding (conv&pool) Combining features 𝒙, 𝒚 w h Bounding box information • 𝒙, 𝒚 : top left corner position • w = width • h = height
  • 3. Dog Cat Person ⋮ pool5 features[224,224,3] [7,7,512] Input image 224 224 7 = 224 32 32 = 25 5 = # of pooling 7 7 Vgg16 Networks Pooling CNN-based Object Detection: There are clues of dog (What) at local position (Where) in the convolution feature map Fully-connected layers Classification Regression 𝑥 𝑦 𝑤 ℎ 1 0 0 ⋮ These red boxes contains clues of “dog at the bounding box (𝑥, 𝑦, 𝑤, ℎ)”. ⋯ ⋯ Dog
  • 4. Multiple Object Detection: Localize and Classify all objects appearing in the image How many objects are in there? • Classify these multiply overlapping objects • Identify their bounding boxes PASCAL VOC2007
  • 5. Background Person Dining table Extract “region proposals” using selective search method. ConvNet Region based CNN (R-CNN) method CNN input (fixed size) Affine image warping: Compute fixed-size CNN input from each region proposal, regardless of the region’s shape Classifier & Regressor Classifier & Regressor Classifier & Regressor
  • 6. Fast R-CNN feature map ConvNet Classifier & Regressor RoI pooling: Convert the features inside valid RoI into a small feature map with a fixed spatial
  • 7. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks feature map Region Proposal Network RoI pooling proposals ConvNet Classifier & Regressor What is Region Proposal Network?
  • 8. Region Proposal Network (RPN) Region Proposal Network 380 480 11 = 360 32 , 15 = 480 32 32 = 25 5 = # of pooling 512 = # of filters 15 11 512 Conv feature map RPN RPN outputs a set of rectangular object proposals, each with an objectness score. How? Region proposals
  • 9. Region Proposal Network Conv feature map 15 11 512 Region Proposals & Anchor Boxes 𝑠 𝑜𝑏𝑗 𝑠 𝑛𝑜𝑏𝑗 t𝑥 t𝑦 t𝑤 tℎ Fully- connected layers Input: each sliding window 3×3×512 For each sliding window (red cuboid) expressed by a vector 𝟑 × 𝟑 × 𝟓𝟏𝟐 , the proposal is parametrized relative to an anchor. 𝑝𝑥 = 𝑎𝑥 + 𝑎𝑤 ⋅ 𝑡𝑥 𝑝𝑦 = 𝑎𝑦 + 𝑎ℎ ⋅ 𝑡𝑦 𝑝𝑤 = 𝑎𝑤 ⋅ exp 𝑡𝑤 𝑝ℎ = 𝑎ℎ ⋅ exp 𝑡ℎ Output: • 4 coordinates: 𝑝𝑥 , 𝑝𝑦, 𝑝𝑤, 𝑝ℎ • 2 scores: 𝑠 𝑜𝑏𝑗 , 𝑠 𝑛𝑜𝑏𝑗 that estimate probability of object or not object for each proposal Anchor box information • 𝒂𝒙 , 𝒂𝒚 : center position • 𝒂𝒘 = width • 𝒂𝒉 = height Anchor box For example, 𝑎𝑤 = 𝑎ℎ = 128 • 𝑎𝑤 and 𝑎ℎ are fixed. • 𝑎𝑥 , 𝑎𝑦 is determined by the position of the red box
  • 10. Region Proposals & Anchor Boxes ⋮ 𝑠1 𝑜𝑏𝑗 𝑠1 𝑛𝑜𝑏𝑗 t𝑥1 t𝑦1 t𝑤1 tℎ1Conv feature map 15 11 512 Fully- connected layers 3×3×512 • 𝑎𝑤𝑖 and 𝑎ℎ𝑖 are fixed. • 𝑎𝑥𝑖, 𝑎𝑦𝑖 is determined by the position of the red box 9 Anchor boxes = 3 ratios × 3 scales For example, 𝑎𝑤1 = 𝑎ℎ1 = 128, 𝑎𝑤2 = 𝑎ℎ2 = 2 × 128, 𝑎𝑤3 = 𝑎ℎ3 = 4 × 128, 𝑎𝑤4 = 2 × 𝑎ℎ4 = 128, ⋯ 𝑎𝑤7 = 1 2 × 𝑎ℎ7 = 128, ⋯ Output: For 𝑖 = 1, ⋯ , 9, • 4 coordinates: 𝑝𝑥𝑖, 𝑝𝑦𝑖, 𝑝𝑤𝑖, 𝑝ℎ𝑖 • 2 scores: 𝑠𝑖 𝑜𝑏𝑗 , 𝑠𝑖 𝑛𝑜𝑏𝑗 that estimate probability of object or not object for each proposal For each sliding window (red cuboid) expressed by a vector 𝟑 × 𝟑 × 𝟓𝟏𝟐 , the 9 proposals are parametrized relative to 9 anchors. Input: each sliding window Region Proposal Network 𝑠2 𝑜𝑏𝑗 𝑠2 𝑛𝑜𝑏𝑗 t𝑥2 t𝑦2 t𝑤2 tℎ2 𝑠9 𝑜𝑏𝑗 𝑠9 𝑛𝑜𝑏𝑗 t𝑥9 t𝑦9 t𝑤9 tℎ9 For 𝑖 = 1, ⋯ 9, 𝑝𝑥𝑖 = 𝑎𝑥𝑖 + 𝑎𝑤𝑖 ⋅ t𝑥𝑖 𝑝𝑦𝑖 = 𝑎𝑦𝑖 + 𝑎ℎ𝑖 ⋅ t𝑦𝑖 𝑝𝑤𝑖 = 𝑎𝑤𝑖 ⋅ exp t𝑤𝑖 𝑝ℎ𝑖 = 𝑎ℎ𝑖 ⋅ exp tℎ𝑖 Anchor box information • 𝒂𝒙𝒊, 𝒂𝒚𝒊 : center position • 𝒂𝒘𝒊 = width • 𝒂𝒉𝒊 = height
  • 11. Region Proposal Network Fully- connected layers Conv feature map Anchor boxes 15 11 512 For 𝑖 = 1, ⋯ 9, 𝑝𝑥𝑖 = 𝑎𝑥𝑖 + 𝑎𝑤𝑖 ⋅ 𝑡𝑥𝑖 𝑝𝑦𝑖 = 𝑎𝑦𝑖 + 𝑎ℎ𝑖 ⋅ 𝑡𝑦𝑖 𝑝𝑤𝑖 = 𝑎𝑤𝑖 ⋅ exp 𝑡𝑤𝑖 𝑝ℎ𝑖 = 𝑎ℎ𝑖 ⋅ exp 𝑡ℎ𝑖 𝑝𝑖 = exp 𝑠𝑖 𝑜𝑏𝑗 exp 𝑠𝑖 𝑜𝑏𝑗 + exp 𝑠𝑖 𝑛𝑜𝑏𝑗 ⋮ 𝑝1 𝑝𝑥1 𝑝𝑦1 𝑝𝑤1 𝑝ℎ1 𝑝2 𝑝𝑥2 𝑝𝑦2 𝑝𝑤2 𝑝ℎ2 𝑝9 𝑝𝑥9 𝑝𝑦9 𝑝𝑤9 𝑝ℎ9 Extract 9 Proposals relative to 9 Anchors Proposals 3×3×512 ⋮ 𝑠1 𝑜𝑏𝑗 𝑠1 𝑛𝑜𝑏𝑗 t𝑥1 t𝑦1 t𝑤1 tℎ1 𝑠2 𝑜𝑏𝑗 𝑠2 𝑛𝑜𝑏𝑗 t𝑥2 t𝑦2 t𝑤2 tℎ2 𝑠9 𝑜𝑏𝑗 𝑠9 𝑛𝑜𝑏𝑗 t𝑥9 t𝑦9 t𝑤9 tℎ9
  • 12. ⋮ ⋮ Total # of windows # of proposals per a window Total # of proposals: 11 × 15 × 9 = 1485 Conv feature map The proposals highly overlaps each other! Need to reduce redundancy. Generate Region Proposals 15 11 512 Total#ofwindows=11×15 Region Proposal Network
  • 13. Reduce redundancy by Non-Maximum Suppression (NMS) 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 173p𝑟𝑜𝑝𝑜𝑠𝑎𝑙1 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 1480𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙2 ⋯ 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 1485 ⋯ ⋯ Most probable proposal Region Proposal Network Step 1. Take the most probable proposal from 1485 proposals Proposal information • 𝒑𝒙𝒊, 𝒑𝒚𝒊 : top left corner position • 𝒑𝒘𝒊 = width • 𝒑𝒉𝒊 = height • 𝒑𝒊 = objectness probability, 𝒑 𝟏 ≥ 𝒑 𝟐 ≥ 𝒑 𝟏𝟒𝟖𝟓 𝑝𝑥1, 𝑝𝑦1, 𝑝𝑤1, 𝑝ℎ1, 𝑝1 𝑝𝑥2, 𝑝𝑦2, 𝑝𝑤2, 𝑝ℎ2, 𝑝2 𝑝𝑥173, 𝑝𝑦173, 𝑝𝑤173, 𝑝ℎ173, 𝑝173 𝑝𝑥1480, 𝑝𝑦1480, 𝑝𝑤1480, 𝑝ℎ1480, 𝑝1480 𝑝𝑥1485, 𝑝𝑦1485, 𝑝𝑤1485, 𝑝ℎ1485, 𝑝1485
  • 14. Region Proposal Network Step 2. Compute the 𝐼𝑜𝑈 between the most probable and the other proposals, and reduce proposals having 𝑰𝒐𝑼 > 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 (0.7) Step 1. Take the most probable proposal from 1485 proposals Reduce redundancy by Non-Maximum Suppression (NMS) 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 173 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 1480 0.83𝐼𝑂𝑈 = 0.71 ⋯ ⋯ 0.30 0 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 1485 ⋯ 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 2
  • 15. Region Proposal Network Step 1. Take the most probable proposal from 1485 proposals Reduce redundancy by Non-Maximum Suppression (NMS) 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 173 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 1480 0.830.71 ⋯ ⋯ 0.30 0 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 1485 ⋯ 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 2 Step 2. Compute the 𝐼𝑜𝑈 between the most probable and the other proposals, and reduce proposals having 𝑰𝒐𝑼 > 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 (0.7) 𝐼𝑂𝑈 =
  • 16. Most probable proposal 30 proposals having IoU>0.7 are discarded. Region Proposal Network Given the most probable proposal, the blue proposals have 𝑰𝒐𝑼 > 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 (0.7) Summary of step 1-2 in NMS. Step 3: Get the next most probable proposal among the rest 1485 − 30 proposals & repeat the previous process. Next most probable proposal 36 proposals having IoU>0.7 are discarded. Reduce redundancy by NMS
  • 17. Region Proposal Network Before NMS After NMS 1,485 proposals 300 proposals Repeats the previous procedure until… Reduce redundancy by NMS
  • 18. Summary of RPN Inputs: • Conv feature map Outputs: • Region proposals coordinates. • Probabilities representing how likely the image in that region proposal will be an object. Region Proposal Network
  • 19. feature map Region Proposal Network RoI pooling proposals ConvNet Now we are ready to explain Classifier & Regressor. Classifier & Regressor Classifier & Regressor
  • 20. RoI pooling layer Proposal 𝑝𝑥, 𝑝𝑦, 𝑝𝑤, 𝑝ℎ 𝑝𝑥′ , 𝑝𝑦′ , 𝑝𝑤′ , 𝑝ℎ′ 𝑝𝑥, 𝑝𝑦, 𝑝𝑤, 𝑝ℎ Classifier & Regressor Bilinear interpolation & Max pooling Input for Classifier & Regressor : fixed-size Conv feature map Bilinear interpolation & Max pooling Convert the features inside valid RoI into a small feature map with a fixed spatial extent. 𝑝𝑥′ = 𝑝𝑥 ⋅ 15 , 𝑝𝑦′ = 𝑝𝑦 ⋅ 11 , 𝑝𝑤′ = 𝑝𝑤 ⋅ 15 , 𝑝ℎ′ = 𝑝ℎ ⋅ 11 360 480 11 15 5 8 3 9 7 7 7 7 𝑝𝑥′ , 𝑝𝑦′ , 𝑝𝑤′ , 𝑝ℎ′
  • 21. RoI pooling layer generates inputs for the Classifier & Regressor: 300 RoI-pooled feature maps, each of size 7 × 7 × 512.
  • 22. Classifier & Regressor: Proposal classification & bounding-box regression, performed for each proposal. The 7 × 7 × 512 RoI-pooled feature map of a proposal $(p_x, p_y, p_w, p_h)$ is passed through fully-connected layers (7×7×512 → 4096), which output a score $s_i$ and regression offsets $(r_{x_i}, r_{y_i}, r_{w_i}, r_{h_i})$ for each class $i = 0, \dots, 20$ (0: Background, ..., 15: Person, ..., 20: TV monitor). Classification: $p_i = \exp(s_i) / \sum_{j=0}^{20} \exp(s_j)$, e.g. $p_0 = 0.0124$, $p_{15} = 0.9797$, $p_{20} = 0.0001$. Regression: $x_i = p_x + p_w \cdot r_{x_i}$, $y_i = p_y + p_h \cdot r_{y_i}$, $w_i = p_w \cdot \exp(r_{w_i})$, $h_i = p_h \cdot \exp(r_{h_i})$. Each of the 21 classes gets its own refined bounding-box prediction $(x_i, y_i, w_i, h_i)$ and an estimated probability $p_i$.
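The per-proposal softmax and box refinement on this slide can be written out as a short NumPy sketch. The function name and argument layout (one row of offsets per class, ordered as r_x, r_y, r_w, r_h) are assumptions made for illustration.

```python
import numpy as np

def classify_and_refine(proposal, scores, offsets):
    """Turn raw head outputs for one proposal into probabilities and boxes.

    proposal: (p_x, p_y, p_w, p_h)
    scores:   array of shape (21,), one score s_i per class
    offsets:  array of shape (21, 4), rows (r_x, r_y, r_w, r_h) per class
    """
    px, py, pw, ph = proposal

    # Softmax over the class scores (shifted by the max for numerical stability).
    shifted = scores - scores.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()

    # Class-specific refined boxes.
    x = px + pw * offsets[:, 0]
    y = py + ph * offsets[:, 1]
    w = pw * np.exp(offsets[:, 2])
    h = ph * np.exp(offsets[:, 3])
    return probs, np.stack([x, y, w, h], axis=1)
```

With all offsets equal to zero, every class simply returns the proposal itself; a positive `r_w` or `r_h` enlarges the box multiplicatively through the exponential.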
  • 23. Summary of Classification & Regression (Classifier & Regressor): Regress & classify each class from the region proposals, discard bounding boxes with p < 0.6 or labeled as background, and reduce redundancy by NMS. [Figure: per-class results for the region proposals (Background, Person, Dining table, TV monitor, ...); classes left without any bounding box are marked None.]
  • 24. Summary of Classifier & Regressor. Inputs: • Conv feature map • Region proposals. Outputs: • Bounding-box coordinates of the objects in the image. • Classification of the bounding boxes.
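  • 25. Training process for RPN
      Input
        • Image $X^{(k)}$
      Ground-truth
        • Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$
        • Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
      Anchor boxes
        • $A^{(k)} = \{A_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}}$, where $A_j^{(k)} = (a_{x_j}^{(k)}, a_{y_j}^{(k)}, a_{w_j}^{(k)}, a_{h_j}^{(k)})$
      Predicted proposals (hats mark predicted quantities)
        • Predicted probability of objectness: $\hat{p}_j^{(k)}$
        • Predicted proposal transformation: $\hat{t}_j^{(k)} = (\hat{t}_{x_j}^{(k)}, \hat{t}_{y_j}^{(k)}, \hat{t}_{w_j}^{(k)}, \hat{t}_{h_j}^{(k)})$, where $\{(\hat{t}_j^{(k)}, \hat{p}_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}} = RPN(CNN(X^{(k)}; W_{CNN}); W_{RPN})$
      Ground-truth proposals associated with anchors $A_j^{(k)}$
        • Find the nearest bounding box to each anchor: $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, A_j^{(k)})$
        • Ground-truth probability of objectness: $p_j^{(k)} := \begin{cases} 1, & \text{if } IoU(B_i^{(k)}, A_j^{(k)}) > 0.7 \\ 0, & \text{if } IoU(B_i^{(k)}, A_j^{(k)}) < 0.3 \end{cases}$
        • Ground-truth proposal transformation: $t_j^{(k)} := (t_{x_j}^{(k)}, t_{y_j}^{(k)}, t_{w_j}^{(k)}, t_{h_j}^{(k)})$, where $t_{x_j}^{(k)} = (x_i^{(k)} - a_{x_j}^{(k)}) / a_{w_j}^{(k)}$, $t_{y_j}^{(k)} = (y_i^{(k)} - a_{y_j}^{(k)}) / a_{h_j}^{(k)}$, $t_{w_j}^{(k)} = \log(w_i^{(k)} / a_{w_j}^{(k)})$, $t_{h_j}^{(k)} = \log(h_i^{(k)} / a_{h_j}^{(k)})$
      Loss function
        $$L_{RPN}\big(p_j^{(k)}, t_j^{(k)}, \hat{p}_j^{(k)}, \hat{t}_j^{(k)}; W_{CNN}, W_{RPN}\big) = \frac{1}{2} \sum_{j=1}^{N_{batch}} H\big(p_j^{(k)}, \hat{p}_j^{(k)}\big) + \lambda_{RPN} \frac{1}{N_{anc}^{(k)}} \sum_{j=1}^{N_{batch}} p_j^{(k)} \, \mathrm{smooth}_{L1}\big(t_j^{(k)}, \hat{t}_j^{(k)}\big)$$
        where $H$ is the cross-entropy function, $\mathrm{smooth}_{L1}$ is applied to the components of the difference of its two arguments, and $\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$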
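The ground-truth assignment for the RPN can be sketched as follows: each anchor is matched to its highest-IoU ground-truth box, labeled with the 0.7 / 0.3 thresholds from the slide, and given a target transformation relative to the anchor. The slide does not say what happens to anchors whose IoU lies between 0.3 and 0.7; marking them with label -1 (ignored) is an assumption here, as are the function names and the (x, y, w, h) layout.

```python
import numpy as np

def pairwise_iou(anchors, gt_boxes):
    """IoU matrix of shape (N_anc, N_obj) for (x, y, w, h) boxes."""
    a = anchors[:, None, :]
    b = gt_boxes[None, :, :]
    inter_w = np.clip(np.minimum(a[..., 0] + a[..., 2], b[..., 0] + b[..., 2])
                      - np.maximum(a[..., 0], b[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 1] + a[..., 3], b[..., 1] + b[..., 3])
                      - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = inter_w * inter_h
    union = a[..., 2] * a[..., 3] + b[..., 2] * b[..., 3] - inter
    return inter / union

def rpn_targets(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return (labels, t): objectness labels and target transformations."""
    ious = pairwise_iou(anchors, gt_boxes)
    nearest = ious.argmax(axis=1)                      # nearest box per anchor
    best_iou = ious[np.arange(len(anchors)), nearest]

    labels = np.full(len(anchors), -1, dtype=int)      # -1: ignored (assumption)
    labels[best_iou > pos_thresh] = 1
    labels[best_iou < neg_thresh] = 0

    gt = gt_boxes[nearest]
    t = np.stack([
        (gt[:, 0] - anchors[:, 0]) / anchors[:, 2],    # t_x
        (gt[:, 1] - anchors[:, 1]) / anchors[:, 3],    # t_y
        np.log(gt[:, 2] / anchors[:, 2]),              # t_w
        np.log(gt[:, 3] / anchors[:, 3]),              # t_h
    ], axis=1)
    return labels, t
```

Only the targets of anchors with label 1 enter the regression term of the loss, since that term is weighted by the ground-truth objectness.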
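  • 26. Training process for Classifier & Regressor
      Input
        • Image $X^{(k)}$
      Ground-truth
        • Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$
        • Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
      Region proposals associated with anchors $A_j^{(k)}$
        • $P^{(k)} := \{(P_j^{(k)}, \hat{p}_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}}$, $P_j^{(k)} = (p_{x_j}^{(k)}, p_{y_j}^{(k)}, p_{w_j}^{(k)}, p_{h_j}^{(k)})$, where $p_{x_j}^{(k)} = a_{x_j}^{(k)} + a_{w_j}^{(k)} \hat{t}_{x_j}^{(k)}$, $p_{y_j}^{(k)} = a_{y_j}^{(k)} + a_{h_j}^{(k)} \hat{t}_{y_j}^{(k)}$, $p_{w_j}^{(k)} = a_{w_j}^{(k)} \exp(\hat{t}_{w_j}^{(k)})$, $p_{h_j}^{(k)} = a_{h_j}^{(k)} \exp(\hat{t}_{h_j}^{(k)})$; here $\hat{t}_j^{(k)}$ and $\hat{p}_j^{(k)}$ are the transformation and objectness probability predicted by the RPN
        • $P^{(k)} \leftarrow NMS(P^{(k)}, N_{prop})$
      Predicted Classification & Regression (hats mark predicted quantities)
        • Predicted Classification: $\hat{c}_j^{(k)} = (\hat{c}_{j,0}^{(k)}, \cdots, \hat{c}_{j,N_{cls}}^{(k)})$
        • Predicted Regression: $\hat{r}_j^{(k)} = (\hat{r}_{x_j}^{(k)}, \hat{r}_{y_j}^{(k)}, \hat{r}_{w_j}^{(k)}, \hat{r}_{h_j}^{(k)})$, where $\{(\hat{r}_j^{(k)}, \hat{c}_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}} = CR(CNN(X^{(k)}; W_{CNN}), P^{(k)}; W_{CR})$
      Ground-truth Classification & Regression associated with proposals $P_j^{(k)}$
        • Find the nearest bounding box to each proposal: $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, P_j^{(k)})$
        • Ground-truth Classification: $c_j^{(k)} := (c_{j,0}^{(k)}, \cdots, c_{j,N_{cls}}^{(k)}) = \begin{cases} (1, 0, \cdots, 0), & \text{if } IoU(B_i^{(k)}, P_j^{(k)}) < 0.5 \\ (0, \cdots, 0, 1, 0, \cdots, 0), & \text{otherwise,} \end{cases}$ with the 1 in the $(C_i^{(k)} + 1)$-th component
        • Ground-truth Regression: $r_j^{(k)} := (r_{x_j}^{(k)}, r_{y_j}^{(k)}, r_{w_j}^{(k)}, r_{h_j}^{(k)})$, where $r_{x_j}^{(k)} = (x_i^{(k)} - p_{x_j}^{(k)}) / p_{w_j}^{(k)}$, $r_{y_j}^{(k)} = (y_i^{(k)} - p_{y_j}^{(k)}) / p_{h_j}^{(k)}$, $r_{w_j}^{(k)} = \log(w_i^{(k)} / p_{w_j}^{(k)})$, $r_{h_j}^{(k)} = \log(h_i^{(k)} / p_{h_j}^{(k)})$
      Loss function
        $$L_{CR}\big(r_j^{(k)}, c_j^{(k)}, \hat{r}_j^{(k)}, \hat{c}_j^{(k)}; W_{CNN}, W_{CR}\big) = \sum_{j=1}^{N_{prop}} H\big(c_j^{(k)}, \hat{c}_j^{(k)}\big) + \lambda_{CR} \sum_{j=1}^{N_{prop}} \big(1 - c_{j,0}^{(k)}\big) \, \mathrm{smooth}_{L1}\big(r_j^{(k)}, \hat{r}_j^{(k)}\big)$$
        where $H$ is the cross-entropy function, $\mathrm{smooth}_{L1}$ is applied to the components of the difference of its two arguments, and $\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$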
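The Classifier & Regressor loss can be illustrated with a small NumPy sketch. It assumes one-hot ground-truth class vectors with background at index 0, predicted class probabilities that already sum to one, and a smooth L1 term summed over the four box components; the value of `lam` (standing in for λ_CR) is not given on the slide, so the default of 1.0 is a placeholder.

```python
import numpy as np

def smooth_l1(x):
    """Element-wise smooth L1: 0.5 x^2 if |x| < 1, else |x| - 0.5."""
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy H(target, predicted) for each row of probabilities."""
    return -np.sum(target * np.log(predicted + eps), axis=-1)

def cr_loss(c_true, c_pred, r_true, r_pred, lam=1.0):
    """Classification + regression loss over N_prop proposals.

    c_true, c_pred: arrays of shape (N_prop, N_cls + 1)
    r_true, r_pred: arrays of shape (N_prop, 4)
    """
    cls_term = cross_entropy(c_true, c_pred).sum()
    foreground = 1.0 - c_true[:, 0]      # zero for background proposals
    reg_term = (foreground * smooth_l1(r_true - r_pred).sum(axis=1)).sum()
    return cls_term + lam * reg_term
```

The factor `1 - c_true[:, 0]` switches the regression term off for proposals labeled as background, so box errors only count for proposals matched to an object.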
  • 27. The History of object detection in deep learning. [Timeline figure. Backbone networks: AlexNet (2012.12), VggNet & InceptionNet (2014.9), ResNet (15.12.10). Detectors shown: RCNN, Fast RCNN, Faster RCNN, Mask RCNN, Yolo, Yolo v2, SSD, DSSD. Dates marked on the timeline: 2013.11.11, 2015.4.30, 2015.5.14, 15.6.8, 15.12.08, 15.12.25, 17.1.23, 17.3.20.]
  • 29. References
      [Gitbooks] Object Localization and Detection. https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/object_localization_and_detection.html
      [ICCV2015 Tutorial] Convolutional Feature Maps. https://courses.engr.illinois.edu/ece420/sp2017/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdf
      [Infographic] The Modern History of Object Recognition. https://github.com/Nikasa1889/HistoryObjectRecognition
      [Tensorflow Code] tf-Faster-RCNN. https://github.com/kevinjliang/tf-Faster-RCNN
      [Medium] A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN. https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4
      [pyimagesearch] Intersection over Union (IoU) for object detection. https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
      [Stanford cs231n] Lecture 11: Detection and Segmentation. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
  • 30. Thank you. E-mail: hpkim0512@yonsei.ac.kr / Homepage: https://hpkim0512.blogspot.com