Object Detection
Computer Vision 2
Xavier Giro-i-Nieto
@DocXavi
xavier.giro@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Spring 2020
Acknowledgements
Amaia Salvador
amaia.salvador@upc.edu
PhD Candidate
Universitat Politècnica de Catalunya
[UPC TelecomBCN 2016] [UPC TelecomBCN 2017]
[UPC TelecomBCN 2018]
Míriam Bellver
miriam.bellver@bsc.edu
PhD Candidate
Barcelona Supercomputing Center
Universitat Politècnica de Catalunya
Andreu Girbau
andreu.girbau@upc.edu
PhD Candidate
Universitat Politècnica de Catalunya
AutomaticTV
[UPC TelecomBCN 2019]
Outline
1. Motivation
2. Datasets
3. Evaluation
4. Neural Architectures
a. Two-stage
b. Single-stage
5. Software implementations
Recap
Figure from Charles Ollion - Olivier Grisel
Object Detection
CAT, DOG, DUCK
The task of assigning a label and a bounding box to every object in the image:
1. The number of objects is not known in advance.
2. Object detection relies on object proposal and object classification.
Object Detection as Classification
Classes = [cat, dog, duck]
Each candidate window of the image is classified against every class:
Window 1: Cat? NO, Dog? NO, Duck? NO
Window 2: Cat? NO, Dog? NO, Duck? NO
Window 3: Cat? YES, Dog? NO, Duck? NO
Window 4: Cat? NO, Dog? NO, Duck? NO
Object Detection as Classification
Challenge: a very large number of possible windows, which vary in:
● position
● scale
● aspect ratio
Question: Do you think it is feasible to evaluate all possibilities?
Solution: If your classifier is fast enough, go for it.
Object Detection as Classification
Object Detection with ConvNets?
ConvNets are computationally demanding: we cannot test all positions and scales!
Solution: Look at a tiny subset of positions. Choose them wisely :)
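To get a feel for the numbers, the sketch below counts the windows of an exhaustive sliding-window search. The image size, stride, scales and aspect ratios are illustrative assumptions, not values from the lecture, and `count_windows` is a hypothetical helper.

```python
def count_windows(img_w, img_h, scales, aspect_ratios, stride):
    """Count sliding-window placements over all scales and aspect ratios."""
    total = 0
    for s in scales:                  # s: side of a square window with the same area
        for ar in aspect_ratios:      # ar: width / height
            w = round(s * ar ** 0.5)
            h = round(s / ar ** 0.5)
            if w > img_w or h > img_h:
                continue              # this window does not fit in the image
            nx = (img_w - w) // stride + 1   # horizontal placements
            ny = (img_h - h) // stride + 1   # vertical placements
            total += nx * ny
    return total


# Assumed setup: an 800 x 600 image, 4 scales, 3 aspect ratios, stride of 4 pixels.
n = count_windows(800, 600, scales=[64, 128, 256, 512],
                  aspect_ratios=[0.5, 1.0, 2.0], stride=4)
print(n)  # each of these windows would need one classifier evaluation
```

Even with this coarse stride the count is well above one hundred thousand windows per image, which is why running a ConvNet on every window is impractical.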
Classic Datasets
PASCAL: 20 categories; 6k training images, 6k validation images, 10k test images
ILSVRC: 200 categories; 456k training images, 60k validation + test images
COCO: 80 categories; 200k training images, 60k val + test images
Open Images Dataset
Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., ... & Ferrari, V. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. IJCV 2020. [dataset]
Open Images Dataset v6
Images with a large number of different classes annotated (11 on the left, 7 on the right).
Evaluation metrics: Intersection over Union (IoU)
● a.k.a. Jaccard index
● Size of the intersection divided by the size of the union
● Evaluates localization
Figure: Pyimagesearch
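The IoU definition above can be sketched in a few lines. The `(x1, y1, x2, y2)` corner format is an assumption made for this example; other conventions such as `(x, y, w, h)` are also common.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2), with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero when boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 4 + 4 - 1 = 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```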
Metric: Average Precision (AP) for Object Detection
Consider the case in which your object detection algorithm provides:
● Coordinates for each bounding box.
● A confidence score for each bounding box.
Predictions (figure): three boxes with confidence scores 0.7, 0.9 and 0.5.
Rank your predictions by the confidence score of your object detection algorithm:
#1: 0.9
#2: 0.7
#3: 0.5
Set a criterion to decide whether each prediction is correct.
Typically, this is a minimum IoU with respect to the ground-truth bounding boxes.
○ For example, IoU > 0.5. This is referred to as AP0.5.
○ Other popular options: AP0.75, or a range of IoU thresholds [0.5:0.95] in steps of 0.05.
○ Each GT box can only be assigned to one predicted box.
Figure legend: ground truth, True Positive (TP), False Positive (FP). Predictions #1 (0.9), #2 (0.7) and #3 (0.5) are ordered by confidence score.
Compute the points of the Precision-Recall curve by using the confidence scores of the ranked detections as decision thresholds (Thr).
Figure legend: ground truth, True Positive (TP), False Positive (FP) or False Negative (FN).
Rank  Correct?  Threshold  Precision  Recall
1     True      0.9        1/1        1/4
2     False     0.7        1/2        1/4
3     True      0.5        2/3        2/4
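The table can be reproduced with a short sketch. It assumes each detection has already been marked correct or incorrect by the IoU criterion, and that the image contains 4 ground-truth objects, as the recall denominators suggest; `precision_recall_points` is a hypothetical helper name.

```python
def precision_recall_points(scores, correct, num_gt):
    """Return (threshold, precision, recall) for each ranked detection."""
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points, tp = [], 0
    for rank, i in enumerate(order, start=1):
        tp += int(correct[i])              # true positives among the top `rank` detections
        precision = tp / rank              # fraction of kept detections that are correct
        recall = tp / num_gt               # fraction of ground-truth objects found
        points.append((scores[i], precision, recall))
    return points


# Detections in arbitrary order: 0.7 is wrong, 0.9 and 0.5 are correct.
for thr, p, r in precision_recall_points([0.7, 0.9, 0.5], [False, True, True], num_gt=4):
    print(thr, round(p, 2), r)
```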
In object detection, some GT objects may never be matched by any prediction. We may consider that trying to find the missing objects with an infinite number of object proposals would drop precision to 0.0 but would eventually find all objects, so recall would reach 1.0.
Figure legend: ground truth, True Positive (TP), False Positive (FP) or False Negative (FN).
Rank  Correct?  Threshold  Precision  Recall
1     True      0.9        1/1        1/4
2     False     0.7        1/2        1/4
3     True      0.5        2/3        2/4
∞     True(s)   0.0        ≈ 0        1
Table inspired by: Jonathan Hui, “mAP (mean Average Precision) for Object Detection” (Medium 2018)
Figure: Precision-Recall curve of the four points listed in the table (axes: Precision and Recall, both from 0 to 1.0).
“The precision at each recall level r is interpolated by taking the maximum precision (...) for which the
corresponding recall exceeds r.” (from Pascal VOC) [ref]
[ref] Everingham, Mark, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. "The Pascal Visual
Object Classes (VOC) challenge." IJCV 2010.
Not all PR pairs need to be computed: AP for object detection only requires the PR pairs associated with true positives (here, the thresholds 0.9, 0.5 and 0.0).
● The AP metric approximates the area under the PR curve.
● There are different methods for this approximation, which may cause inconsistencies between implementations.
● Popular ones:
○ (suggested) “the mean precision at a set of eleven equally spaced recall levels [0, 0.1, ...1]”
○ “weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight” (scikit-learn).
In our work, we adopt the approach from Pascal VOC:
● AP is “the mean precision at a set of eleven equally spaced recall levels [0, 0.1, ...1]”
Threshold  Precision  Recall
0.9        1/1        1/4
0.5        2/3        2/4
0.0        ≈ 0        1
Recall  Precision
0.0     1.00
0.1     1.00
0.2     1.00
0.3     0.67
0.4     0.67
0.5     0.00
...     0.00
1.0     0.00
AP = 0.39
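The 11-point computation can be checked with a small sketch. The `strict` flag is an assumption introduced here: with `strict=True` the interpolated precision at recall level r uses only points whose recall strictly exceeds r, which reproduces the 0.39 of the table; many implementations use "greater than or equal" instead, which changes the result for this example.

```python
def ap_11_point(pr_points, strict=True):
    """11-point interpolated AP from a list of (precision, recall) pairs."""
    total = 0.0
    for i in range(11):
        r = i / 10                                   # recall levels 0.0, 0.1, ..., 1.0
        if strict:
            candidates = [p for p, rec in pr_points if rec > r]
        else:
            candidates = [p for p, rec in pr_points if rec >= r]
        # Interpolated precision: best precision at any recall beyond r (0 if none).
        total += max(candidates, default=0.0)
    return total / 11


# PR pairs of the true positives, plus the "infinite proposals" point.
pr = [(1.0, 0.25), (2 / 3, 0.5), (0.0, 1.0)]
print(round(ap_11_point(pr, strict=True), 2))    # matches the table
print(round(ap_11_point(pr, strict=False), 2))   # differs at recall level 0.5
```

The gap between the two variants is an instance of the inconsistencies between implementations mentioned earlier.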
Metric: Average Precision w/o confidence scores
What if your object detection algorithm does not provide any confidence score?
If your object detection algorithm does not provide any confidence score:
● Generate N random rankings of the predictions (e.g. N=10) and compute the AP of each one (AP1, AP2, ..., APN).
● Average the obtained APs to get the final AP.
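A minimal sketch of this procedure follows. It assumes the predictions have already been marked correct or incorrect, and it uses an 11-point AP with a "greater than or equal" recall comparison; both helper names and the fixed random seed are choices made for this example.

```python
import random


def ap_from_ranking(correct_in_rank_order, num_gt):
    """11-point AP for detections that are already sorted by rank."""
    points, tp = [], 0
    for rank, ok in enumerate(correct_in_rank_order, start=1):
        tp += int(ok)
        points.append((tp / rank, tp / num_gt))      # (precision, recall)
    total = 0.0
    for i in range(11):
        r = i / 10
        total += max((p for p, rec in points if rec >= r), default=0.0)
    return total / 11


def ap_without_scores(correct, num_gt, n_runs=10, seed=0):
    """Average the AP over n_runs random rankings of the predictions."""
    rng = random.Random(seed)                        # fixed seed for reproducibility
    aps = []
    for _ in range(n_runs):
        shuffled = list(correct)
        rng.shuffle(shuffled)                        # one random ranking
        aps.append(ap_from_ranking(shuffled, num_gt))
    return sum(aps) / len(aps)


print(ap_without_scores([True, False, True], num_gt=4))
```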
Evaluation metrics: mean Average Precision (mAP)
When there are Q classes (e.g. car, bike, person…), the mAP averages the AP(q) of each class q:
mAP = (1/Q) · Σ_q AP(q), for q = 1, ..., Q
● Further reading:
○ Tarang Shah, “Measuring Object Detection models - mAP - What is Mean Average Precision?” (Medium 2018)
Evaluation metrics: Average Precision (AP)
Implementations of Average Precision for object detection are available from:
● TensorFlow
● Microsoft COCO dataset API
Object Detection
There are two main families:
● Two-Stage: region proposal, then classification
● Single-Stage: a grid over the image where each cell is a proposal
Region Proposals
● Find “blobby” image regions that are likely to contain objects
● “Class-agnostic” object detector
Slide Credit: CS231n
Region Proposals
Typical object detection/segmentation pipelines:
1. Object proposal
2. Refinement and Classification
3. NMS: Non-Maximum Suppression
Example detections (figure): Dog 0.85, Cat 0.80, Dog 0.75, Cat 0.90
Region Proposals: from pixels
#SS Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. Selective search for object recognition. IJCV 2013.
#MCG Pont-Tuset, J., Arbelaez, P., Barron, J. T., Marques, F., & Malik, J. Multiscale combinatorial grouping for image segmentation and object proposal generation. TPAMI 2016.
R-CNN
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014.
R-CNN + Non-Maximum Suppression (NMS)
Figure: the detections we expect vs. the detections we get.
Non-Maximum Suppression + score threshold
#DPM Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2009). Object detection with discriminatively
trained part-based models. TPAMI 2009.
Figure: Adrian Rosebrock
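A minimal sketch of greedy NMS with a score threshold is given below. It is class-agnostic and uses `(x1, y1, x2, y2)` boxes; the threshold values and function names are assumptions for illustration, and in practice NMS is usually applied separately per class.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5, score_thr=0.0):
    """Return the indices of the boxes kept by greedy non-maximum suppression."""
    # Drop low-confidence boxes, then visit the rest from most to least confident.
    order = sorted((i for i, s in enumerate(scores) if s > score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first one and is suppressed
```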
R-CNN: Problems
1. Slow at test time: a full forward pass of the CNN is needed for each region proposal.
2. SVMs and regressors are post-hoc: CNN features are not updated in response to the SVMs and regressors.
Slide Credit: CS231n
Fast R-CNN
Girshick. Fast R-CNN. ICCV 2015
R-CNN Problem #1: Slow at test time: a full forward pass of the CNN is needed for each region proposal.
Solution: Share the computation of the convolutional layers between the region proposals of an image.
R-CNN Problem #2&3: SVMs and regressors are post-hoc. Complex training.
Solution: Train it all together, end to end.
- Softmax over (K+1) classes and 4 box offsets
- Positive boxes are the ones with a high Intersection over Union with the ground truth
Fast R-CNN: RoI-Pooling
1. Hi-res input image: 3 x 800 x 600, with region proposal
2. Convolution and Pooling
3. Hi-res conv features: C x H x W, with region proposal (variable size)
4. Max-pool within each grid cell
5. RoI conv features: C x h x w for region proposal (fixed size)
6. Fully-connected layers, which expect low-res conv features: C x h x w
Slide Credit: CS231n. Girshick. Fast R-CNN. ICCV 2015
RoI pooling allows 1) propagating gradients only through the regions of interest, and 2) efficient computation.
Input: convolutional feature map + N regions of interest
Output: tensor of N x 7 x 7 x depth features
Slide Credit: CS231n
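The input/output contract of RoI pooling can be illustrated with a NumPy sketch. It is a simplified forward pass only: regions are given directly in feature-map coordinates as `(x1, y1, x2, y2)` and are assumed to lie inside the map, and the bin rounding is one possible choice rather than the exact scheme of any particular library.

```python
import numpy as np


def roi_max_pool(features, roi, out_size=7):
    """Max-pool one region of a (C, H, W) feature map into a (C, out_size, out_size) grid."""
    x1, y1, x2, y2 = roi
    # Edges of the out_size x out_size grid laid over the region.
    xs = np.linspace(x1, x2, out_size + 1)
    ys = np.linspace(y1, y2, out_size + 1)
    out = np.zeros((features.shape[0], out_size, out_size), dtype=features.dtype)
    for i in range(out_size):
        for j in range(out_size):
            ya, yb = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            xa, xb = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            yb, xb = max(yb, ya + 1), max(xb, xa + 1)   # every cell covers at least one pixel
            out[:, i, j] = features[:, ya:yb, xa:xb].max(axis=(1, 2))
    return out


def roi_pool_batch(features, rois, out_size=7):
    """Pool N regions and return an array of shape (N, out_size, out_size, C)."""
    return np.stack([roi_max_pool(features, r, out_size).transpose(1, 2, 0) for r in rois])


feats = np.arange(2 * 14 * 14, dtype=float).reshape(2, 14, 14)   # toy map with depth 2
pooled = roi_pool_batch(feats, [(0, 0, 14, 14), (3, 2, 13, 9)])
print(pooled.shape)  # regions of different sizes give outputs of the same size
```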
Fast R-CNN
Using VGG-16 CNN on the Pascal VOC 2007 dataset:
                      R-CNN        Fast R-CNN
Training time         84 hours     9.5 hours
(Speedup)             1x           8.8x           Faster!
Test time per image   47 seconds   0.32 seconds
(Speedup)             1x           146x           FASTER!
mAP (VOC 2007)        66.0         66.9           Better!
Fast R-CNN: Limitation
Test-time speeds do not include region proposals:
                                            R-CNN        Fast R-CNN
Test time per image                         47 seconds   0.32 seconds
(Speedup)                                   1x           146x
Test time per image with Selective Search   50 seconds   2 seconds
(Speedup)                                   1x           25x
Slide Credit: CS231n
Faster R-CNN
Learn proposals end-to-end, sharing parameters with the classification network.
This network is called the Region Proposal Network (RPN), and the proposals are learnt!
Faster R-CNN replaces selective search (SS) with the Region Proposal Network (RPN), which is trained jointly.
Figure labels: Conv layers, Conv5_3, Region Proposal Network, RPN Proposals, RoI Pooling, FC6, FC7, FC8, Class probabilities, Fast R-CNN.
#Faster R-CNN Ren, S., He, K., Girshick, R., & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015.
Region Proposal Network (RPN)
For each of the k anchor boxes at a location, the RPN predicts:
● Objectness scores (object / no object)
● Bounding box regression
In practice, k = 9 (3 different scales and 3 aspect ratios).
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
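The k = 9 boxes per location can be sketched as follows. The concrete scales (128, 256, 512 pixels) and aspect ratios (0.5, 1, 2) are assumptions for this example, and `make_anchors` is a hypothetical helper; each anchor keeps an area of roughly scale squared while its shape changes with the ratio.

```python
def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchors (x1, y1, x2, y2) centred at (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:               # r: height / width of the anchor
            w = s / r ** 0.5
            h = s * r ** 0.5           # w * h == s * s for every ratio
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(400, 300)
print(len(anchors))  # 3 scales x 3 aspect ratios
```

For every such anchor the RPN outputs an objectness score and four box-regression offsets.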
Faster R-CNN
                                       R-CNN        Fast R-CNN   Faster R-CNN
Test time per image (with proposals)   50 seconds   2 seconds    0.2 seconds
(Speedup)                              1x           25x          250x
mAP (VOC 2007)                         66.0         66.9         66.9
Slide Credit: CS231n
Mask R-CNN: Object Detection + Instance Segmentation
He et al. Mask R-CNN. ICCV 2017
Next lecture: Instance & Image Segmentation (Carles Ventura)
Source: Detectron2
Two-stage vs Single-stage methods
Two-stage pipeline (figure): Image pixels, Object proposals generation, resample pixels for each BBOX, resample features for each BBOX, high quality classifier.
Computationally too intensive and too slow for real-time applications (Faster R-CNN: 7 FPS).
Instead of having two networks (Region Proposal Network + Classifier Network), one-stage architectures predict bounding boxes and confidences for multiple categories directly with a single network.
One-stage methods
Previously…:
Problem: Too many positions & scales to test.
Solution: If your classifier is fast enough, go for it.
Modern detectors parallelize feature extraction across all locations. Region classification is not slow anymore!
OverFeat
#OverFeat Sermanet, Pierre, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. "OverFeat: Integrated recognition, localization and detection using convolutional networks." ICLR 2014
YOLO: You Only Look Once
#YOLO Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
Proposal-free object detection pipeline
S x S grid on input
For each cell of the S x S grid, predict:
● B boxes and confidence scores C (5 x B values) + classes c
Outputs: bounding boxes + confidence, and a class probability map
Final detections: Cj * prob(c) > threshold
Each cell predicts:
- For each bounding box:
  - 4 coordinates (x, y, w, h)
  - 1 confidence value
- Some number of class probabilities
For Pascal VOC:
- 7x7 grid
- 2 bounding boxes / cell
- 20 classes
7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
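The output size and the final detection score can be checked with a short sketch. The flat per-cell layout used here (B groups of x, y, w, h, confidence followed by the class probabilities) is an assumption made for illustration, and `class_scores` is a hypothetical helper.

```python
S, B, NUM_CLASSES = 7, 2, 20                 # Pascal VOC setting from the slide

values_per_cell = B * 5 + NUM_CLASSES        # 5 values per box + class probabilities
total_outputs = S * S * values_per_cell
print(values_per_cell, total_outputs)        # 30 values per cell, 1470 outputs in total


def class_scores(cell, box_index, num_boxes=B):
    """Class-specific scores of one box: box confidence times each class probability."""
    confidence = cell[box_index * 5 + 4]     # 5th value of the box group
    class_probs = cell[num_boxes * 5:]       # shared by all boxes of the cell
    return [confidence * p for p in class_probs]


# Toy cell: first box has confidence 0.8, class number 3 has probability 0.5.
cell = [0.0] * values_per_cell
cell[4] = 0.8
cell[B * 5 + 3] = 0.5
scores = class_scores(cell, box_index=0)
detections = [c for c, s in enumerate(scores) if s > 0.3]   # threshold chosen for the example
print(detections)
```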
SSD: Single Shot MultiBox Detector
Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016
Same idea as YOLO, plus several predictors at different stages in the network to allow different receptive fields.
YOLOv2
Redmon & Farhadi. YOLO9000: Better, Faster, Stronger. CVPR 2017
YOLOv3
YOLOv2, plus:
+ residual blocks
+ skip connections
+ upsampling
+ detection at multiple scales
YOLOv4
Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection”, arXiv 2020.
Military Applications & Privacy Risks
#YOLO Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
RetinaNet
Matching proposal-based performance with a one-stage approach.
Problem of one-stage detectors: they evaluate many candidate locations, but only a few contain objects. This imbalance makes learning inefficient.
Focal loss: the key idea is to lower the loss weight of well-classified samples, which increases the relative weight of the difficult ones.
Lin et al. Focal Loss for Dense Object Detection. ICCV 2017
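The down-weighting idea can be sketched for a single binary prediction. The form below scales the cross-entropy term by (1 - p_t)^gamma and a class weight alpha_t; the default values gamma = 2 and alpha = 0.25 are assumptions for this example and are not taken from the slides.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted probability p of the positive class and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p                 # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha     # class-balancing weight
    # (1 - p_t) ** gamma is close to 0 for well-classified samples, so their loss shrinks.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)    # well-classified positive
hard = focal_loss(0.1, 1)    # badly classified positive
print(easy, hard)

# With gamma = 0 the modulating factor disappears and only weighted cross-entropy remains.
print(focal_loss(0.7, 1, gamma=0.0, alpha=0.5), -0.5 * math.log(0.7))
```

Compared with plain cross-entropy, the ratio between the hard and the easy loss grows considerably, so the many easy background locations contribute much less to the total loss.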
Overview
Neural Architectures for Object Detection
Two-stage methods
● R-CNN
● Fast R-CNN
● Faster R-CNN
● Mask R-CNN
Single-stage methods
● YOLO
● SSD
● RetinaNet
Software implementations
Most models are publicly available and ready to be used off the shelf.
Model          Framework
Faster R-CNN   [torchvision] (suggested), [Detectron2], [Keras]
RetinaNet      [Detectron2] (suggested), [Keras]
Benchmark      [TensorFlow Object Detection API]
YOLOv3         [PyTorch]
SSD            [PyTorch], [Tutorial on Keras]
Mask R-CNN     [torchvision] (suggested), [PyTorch], [Keras & TF], [tutorial]
Probably, you will not be interested in the object classes defined in Pascal/COCO. You can adapt (fine-tune) existing models to your own object classes.
Wang, Xin, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, and Fisher Yu. "Frustratingly Simple Few-Shot Object Detection." arXiv preprint arXiv:2003.06957 (2020). [code based on Detectron2]
Software implementations for Mobile
TensorFlow Lite: Object Detection
PyTorch Mobile (no specific solutions for object detection)
Jordi Torres, “TensorFlow or PyTorch?” (2020) [in Catalan]
Next lab: ImageNet models (Dani Fojo)
Your questions

Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data Demystified
 
IRJET- Object Detection using Machine Learning Technique
IRJET- Object Detection using Machine Learning TechniqueIRJET- Object Detection using Machine Learning Technique
IRJET- Object Detection using Machine Learning Technique
 

Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcelona 2020

  • 1. Object Detection Computer Vision 2 Xavier Giro-i-Nieto @DocXavi xavier.giro@upc.edu Associate Professor Universitat Politècnica de Catalunya Spring 2020
  • 2. Acknowledgements: Amaia Salvador, amaia.salvador@upc.edu, PhD Candidate, Universitat Politècnica de Catalunya [UPC TelecomBCN 2016] [UPC TelecomBCN 2017]
  • 3. Acknowledgements: Míriam Bellver, miriam.bellver@bsc.edu, PhD Candidate, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya [UPC TelecomBCN 2018]; Andreu Girbau, andreu.girbau@upc.edu, PhD Candidate, Universitat Politècnica de Catalunya, AutomaticTV [UPC TelecomBCN 2019]
  • 4. Outline: 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures a. Two-stage b. Single-stage
  • 5. Recap Figure from Charles Ollion - Olivier Grisel
  • 6. Recap Figure from Charles Ollion - Olivier Grisel
  • 7. Recap Figure from Charles Ollion - Olivier Grisel
  • 8. Recap Figure from Charles Ollion - Olivier Grisel
  • 9. Object Detection (CAT, DOG, DUCK). The task of assigning a label and a bounding box to every object in the image: 1. We don’t know the number of objects in advance. 2. Object detection relies on object proposal and object classification.
  • 10. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? NO. Dog? NO. Duck? NO.
  • 11. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? NO. Dog? NO. Duck? NO.
  • 12. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? YES. Dog? NO. Duck? NO.
  • 13. Object Detection as Classification. Classes = [cat, dog, duck]. Cat? NO. Dog? NO. Duck? NO.
  • 14. Object Detection as Classification. Challenge: a very large number of possibilities: ● position ● scale ● aspect ratio. Question: Do you think it is feasible to evaluate all possibilities?
  • 15. Object Detection as Classification. Challenge: a very large number of possibilities: ● position ● scale ● aspect ratio. Solution: If your classifier is fast enough, go for it.
  • 16. Object Detection with ConvNets? ConvNets are computationally demanding. We can’t test all positions and scales! Solution: Look at a tiny subset of positions. Choose them wisely :)
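To get a feel for how quickly the number of candidate windows grows with position, scale and aspect ratio, the following is a minimal sketch of a sliding-window counter. The function name `count_windows` and all parameter values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def count_windows(img_w, img_h, scales, aspect_ratios, stride):
    """Count sliding-window positions over all scales and aspect ratios.

    scales: window sizes in pixels (square root of the window area).
    aspect_ratios: width / height ratios.
    stride: step in pixels between consecutive window positions.
    """
    total = 0
    for scale in scales:
        for ratio in aspect_ratios:
            # Keep the area roughly scale**2 while changing the shape.
            w = int(round(scale * ratio ** 0.5))
            h = int(round(scale / ratio ** 0.5))
            if w > img_w or h > img_h:
                continue  # window does not fit in the image
            positions_x = (img_w - w) // stride + 1
            positions_y = (img_h - h) // stride + 1
            total += positions_x * positions_y
    return total


# Illustrative (hypothetical) setting: a 640x480 image, 4 scales, 3 aspect ratios.
n = count_windows(640, 480, scales=[32, 64, 128, 256],
                  aspect_ratios=[0.5, 1.0, 2.0], stride=4)
print(n)
```

Every one of these windows would need its own classifier evaluation, which is why an expensive ConvNet classifier motivates looking at only a small, well-chosen subset.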
  • 17. Outline: 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures a. Two-stage b. Single-stage
  • 18. Classic Datasets. PASCAL: 20 categories, 6k training images, 6k validation images, 10k test images. ILSVRC: 200 categories, 456k training images, 60k validation + test images. COCO: 80 categories, 200k training images, 60k validation + test images.
  • 21. Open Images Dataset Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., ... & Ferrari, V. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. IJCV 2020. [dataset]
  • 22. Open Images Dataset v6. Kuznetsova et al., IJCV 2020. [dataset] PASCAL: 20 categories, 6k training images, 6k validation images, 10k test images. ILSVRC: 200 categories, 456k training images, 60k validation + test images. COCO: 80 categories, 200k training images, 60k validation + test images.
  • 23. Open Images Dataset v6. Kuznetsova et al., IJCV 2020. [dataset]
  • 24. Open Images Dataset v6. Images with a large number of different classes annotated (11 on the left, 7 on the right). Kuznetsova et al., IJCV 2020. [dataset]
  • 25. Outline: 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures a. Two-stage b. Single-stage 5. Software implementations
  • 26. Evaluation metrics: Intersection over Union (IoU) ● Also known as the Jaccard index ● Size of the intersection divided by the size of the union ● Evaluates localization. Figure: PyImageSearch
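As a concrete illustration of the IoU definition, here is a minimal sketch for two axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions made for this example; the slides do not prescribe a box format.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```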
  • 27. Metric: Average Precision (AP) for Object Detection. Consider the case in which your object detection algorithm provides: ● Coordinates for each bounding box ● A confidence score for each bounding box. Example: three predictions with confidences 0.7, 0.9 and 0.5.
  • 28. Metric: Average Precision (AP) for Object Detection. Rank your predictions by the confidence score of your object detection algorithm: #1 (0.9), #2 (0.7), #3 (0.5).
  • 29. Metric: Average Precision (AP) for Object Detection. Set a criterion to decide whether each prediction is correct. Typically, this is a minimum IoU with respect to the bounding boxes from the ground truth (GT) annotation. ○ For example, IoU > 0.5. This is referred to as AP0.5. ○ Other popular options: AP0.75, or a range of IoU thresholds [0.5:0.95] in 0.05 steps. ○ Each GT box can only be assigned to one predicted box. Figure legend: ground truth, True Positive (TP), False Positive (FP); predictions #1 (0.9), #2 (0.7), #3 (0.5) ranked by confidence score.
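The ranking and matching steps can be sketched as follows. This is one simple greedy variant, under the assumptions that boxes are `(x1, y1, x2, y2)` tuples and that each prediction is matched to the still-unassigned GT box with the highest IoU; benchmark implementations differ in such details. The names `label_detections`, `preds` and `gt_boxes`, and the example coordinates, are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_detections(preds, gt_boxes, iou_thr=0.5):
    """Mark each prediction as TP or FP.

    preds: list of (box, confidence).
    Returns a list of (confidence, is_true_positive), highest confidence first.
    """
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    matched = set()  # indices of GT boxes already assigned to a prediction
    labels = []
    for box, conf in ranked:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_boxes):
            if idx in matched:
                continue  # each GT box can be assigned to only one prediction
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx is not None and best_iou > iou_thr
        if is_tp:
            matched.add(best_idx)
        labels.append((conf, is_tp))
    return labels


gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.7), ((0, 0, 10, 10), 0.9), ((20, 20, 30, 31), 0.5)]
print(label_detections(preds, gt_boxes))
# [(0.9, True), (0.7, False), (0.5, True)]
```

The 0.7 prediction overlaps a GT box well, but that box was already claimed by the 0.9 prediction, so it counts as a false positive.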
  • 30. Metric: Average Precision (AP) for Object Detection. Compute the points of the Precision-Recall curve by taking as decision thresholds (Thr) the confidence scores of the ranked detections. With 4 ground truth objects: rank 1 (threshold 0.9) is correct, giving precision 1/1 and recall 1/4; rank 2 (threshold 0.7) is incorrect, giving precision 1/2 and recall 1/4; rank 3 (threshold 0.5) is correct, giving precision 2/3 and recall 2/4. Figure legend: ground truth, True Positive (TP), False Positive (FP) or False Negative (FN).
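The table on this slide can be reproduced with a few lines. This sketch assumes the detections are already ranked and labelled as correct or not; `precision_recall_points` is a hypothetical helper name, and exact fractions are used only so the output matches the slide's notation.

```python
from fractions import Fraction


def precision_recall_points(correct, num_gt):
    """Return one (precision, recall) pair per ranked detection.

    correct: list of booleans, ordered by decreasing confidence.
    num_gt: total number of ground truth objects.
    """
    points = []
    true_positives = 0
    for rank, is_correct in enumerate(correct, start=1):
        true_positives += int(is_correct)
        precision = Fraction(true_positives, rank)    # TP / detections so far
        recall = Fraction(true_positives, num_gt)     # TP / all GT objects
        points.append((precision, recall))
    return points


# Slide example: ranks 1-3 are True, False, True, with 4 ground truth objects.
points = precision_recall_points([True, False, True], num_gt=4)
for precision, recall in points:
    print(precision, recall)
# 1 1/4
# 1/2 1/4
# 2/3 1/2
```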
  • 31. Metric: Average Precision (AP) for Object Detection. In the object detection case, some GT objects may never be matched by any prediction. We may consider that trying to find the missing objects with an infinite number of object proposals would drop precision to 0.0 but would eventually find all objects, so recall would be 1.0. The table gains a final row: rank ∞ (threshold 0.0), True(s), precision ≈ 0, recall 1. Previous rows: threshold 0.9 gives precision 1/1 and recall 1/4; 0.7 gives 1/2 and 1/4; 0.5 gives 2/3 and 2/4. Table inspired by: Jonathan Hui, “mAP (mean Average Precision) for Object Detection” (Medium 2018)
  • 32. Metric: Average Precision (AP) for Object Detection. Figure: Precision-Recall curve (precision on the vertical axis, recall on the horizontal axis) through the points of the table: thresholds 0.9 (1/1, 1/4), 0.7 (1/2, 1/4), 0.5 (2/3, 2/4) and 0.0 (≈ 0, 1).
  • 33. Metric: Average Precision (AP) for Object Detection. “The precision at each recall level r is interpolated by taking the maximum precision (...) for which the corresponding recall exceeds r.” (from Pascal VOC) [ref] Everingham, Mark, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. "The Pascal Visual Object Classes (VOC) challenge." IJCV 2010. Figure: interpolated Precision-Recall curve for the same table.
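Written as a formula, the interpolation quoted above is commonly stated as follows, where $p(\tilde{r})$ is the measured precision at recall $\tilde{r}$. The symbols $p_{\text{interp}}$ and $\tilde{r}$ are notation introduced here. The inequality is written as $\geq$, which is how the VOC paper's equation is usually reproduced; reading "exceeds" literally would give a strict inequality, and the two only differ when a measured recall coincides exactly with $r$.

```latex
p_{\text{interp}}(r) = \max_{\tilde{r} \,:\, \tilde{r} \geq r} p(\tilde{r})
```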
  • 34. Metric: Average Precision (AP) for Object Detection. Actually, not all PR pairs need to be computed, because AP for object detection only requires the PR pairs related to true positives. In the table (threshold 0.9: precision 1/1, recall 1/4; 0.7: 1/2, 1/4; 0.5: 2/3, 2/4; 0.0: ≈ 0, 1), the pair at threshold 0.7 comes from a false positive and can be skipped.
  • 35. Metric: Average Precision (AP) for Object Detection ● The AP metric approximates the area under the PR curve. ● There are different methods for this approximation, which may cause inconsistencies between implementations. ● Popular ones: ○ (suggested) “the mean precision at a set of eleven equally spaced recall levels [0, 0.1, ...1]” ○ “weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight” (scikit-learn). [ref] Everingham, Mark, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. "The Pascal Visual Object Classes (VOC) challenge." IJCV 2010.
  • 36. Metric: Average Precision (AP) for Object Detection. In our work, we adopt the approach from Pascal VOC: ● AP is “the mean precision at a set of eleven equally spaced recall levels [0, 0.1, ...1]”. PR pairs used: threshold 0.9 (precision 1/1, recall 1/4), 0.5 (2/3, 2/4), 0.0 (≈ 0, 1). Interpolated precision per recall level: 0.0 → 1.00; 0.1 → 1.00; 0.2 → 1.00; 0.3 → 0.67; 0.4 → 0.67; 0.5 → 0.00; ...; 1.0 → 0.00. AP = 0.39.
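The eleven-point computation can be sketched as below. The `strict` flag is an assumption added for this example: the worked table on slide 36 assigns precision 0.00 at recall level 0.5, which corresponds to requiring recall strictly greater than the level, while an inclusive comparison (recall greater than or equal to the level) is also common and gives a different value. The function name `eleven_point_ap` is hypothetical.

```python
def eleven_point_ap(pr_points, strict=True):
    """Mean interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    pr_points: list of (precision, recall) pairs.
    strict: if True, use points with recall > level; otherwise recall >= level.
    """
    levels = [i / 10 for i in range(11)]
    total = 0.0
    for level in levels:
        if strict:
            candidates = [p for p, r in pr_points if r > level]
        else:
            candidates = [p for p, r in pr_points if r >= level]
        # Interpolated precision: best precision at any higher recall, else 0.
        total += max(candidates, default=0.0)
    return total / len(levels)


# PR pairs from the slide: (precision, recall) at thresholds 0.9, 0.5 and 0.0.
pr_points = [(1.0, 0.25), (2 / 3, 0.5), (0.0, 1.0)]

ap_strict = eleven_point_ap(pr_points, strict=True)
ap_inclusive = eleven_point_ap(pr_points, strict=False)
print(round(ap_strict, 2))     # 0.39, matching the slide
print(round(ap_inclusive, 2))  # 0.45
```

The gap between the two values is a small instance of the inconsistencies between implementations mentioned on slide 35.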
  • 37. Metric: Average Precision w/o confidence scores? What if your object detection algorithm does not provide any confidence score? (Predictions #1, #2, #3.)
  • 38. Metric: Average Precision w/o confidence scores. If your object detection algorithm does not provide any confidence score: ● Generate N random rankings of the predictions (e.g. N = 10) and compute the AP for each of them (AP1, AP2, ..., APN). ● Average the obtained APs to get the final AP.
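A minimal sketch of this procedure follows. It assumes each prediction has already been labelled correct or incorrect, and it uses the weighted-mean form of AP (precision at each true positive, weighted by the recall step) rather than the eleven-point form, purely to keep the example short. The names `average_precision` and `ap_without_scores`, and the fixed `seed`, are hypothetical.

```python
import random


def average_precision(correct, num_gt):
    """AP as the sum of precision at each true positive times the recall step."""
    true_positives = 0
    ap = 0.0
    for rank, is_correct in enumerate(correct, start=1):
        if is_correct:
            true_positives += 1
            ap += (true_positives / rank) / num_gt  # recall increases by 1/num_gt
    return ap


def ap_without_scores(correct, num_gt, n_runs=10, seed=0):
    """Average AP over n_runs random rankings of unscored predictions."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    aps = []
    for _ in range(n_runs):
        shuffled = list(correct)
        rng.shuffle(shuffled)  # a random rank replaces the missing confidence
        aps.append(average_precision(shuffled, num_gt))
    return sum(aps) / n_runs


# Three unscored predictions (two correct), four ground truth objects.
print(ap_without_scores([True, False, True], num_gt=4, n_runs=10))
```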
  • 39. Evaluation metrics: mean Average Precision (mAP). With multiple classes Q (e.g. car, bike, person…), the mAP is the average of the AP(q) obtained for each class q. ● Further reading: ○ Tarang Shah, “Measuring Object Detection models — mAP — What is Mean Average Precision?” (Medium 2018)
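The averaging described on this slide is usually written as below, taking $Q$ as the number of classes and $\text{AP}(q)$ as the Average Precision of class $q$. This is a reconstruction from the slide's wording, since the slide's own formula did not survive in the transcript.

```latex
\text{mAP} = \frac{1}{Q} \sum_{q=1}^{Q} \text{AP}(q)
```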
  • 40. 40 Evaluation metrics: Average Precision (AP) You can obtain implementations for this Average Precision for Object Detection from: TensorFlow Microsoft CoCo dataset API
  • 41. Outline 41 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures a. Two-stage b. Single-stage 5. Software implementations
  • 43. Object Detection There are two main families: ● Two-Stage: Region proposal and then classification ● Single-Stage: A grid in the image where each cell is a proposal
  • 44. Region Proposals ● Find “blobby” image regions that are likely to contain objects ● “Class-agnostic” object detector Slide Credit: CS231n 44
  • 46. Region Proposals ● Typical object detection/segmentation pipelines: Object proposal → Refinement and Classification (e.g. Dog 0.85, Cat 0.80, Dog 0.75, Cat 0.90) → NMS: Non-Maximum Suppression
  • 47. Region Proposals: from pixels #SS Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. IJCV 2013 47
  • 48. Region Proposals: from pixels #MCG Pont-Tuset, J., Arbelaez, P., Barron, J. T., Marques, F., & Malik, J. (2016). Multiscale combinatorial grouping for image segmentation and object proposal generation. TPAMI 2016 48
  • 49. Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014. 49 R-CNN
  • 50. R-CNN ● We expect: [figure] ● We get: [figure] ● Non-Maximum Suppression + score threshold
  • 51. R-CNN + Non Maximum Suppression (NMS) 51 #DPM Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2009). Object detection with discriminatively trained part-based models. TPAMI 2009. Figure: Adrian Rosebrock
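A greedy form of Non-Maximum Suppression with a score threshold can be sketched as below. The box format `(x1, y1, x2, y2)`, the function names and the default thresholds are assumptions made for this example. In practice NMS is usually applied separately for each class.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5, score_threshold=0.0):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    # Drop low-scoring boxes, then visit the rest in decreasing score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up boxes: the first two overlap heavily, the third is far away.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.90, 0.75, 0.80]
print(nms(boxes, scores))
```

The second box is suppressed because its IoU with the higher-scoring first box exceeds the threshold, while the distant third box survives.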
  • 52. 52 Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014. R-CNN
  • 53. R-CNN: Problems 1. Slow at test time: a full forward pass of the CNN must be run for each region proposal 2. SVMs and regressors are post-hoc: the CNN features are not updated in response to the SVMs and regressors. Slide Credit: CS231n
  • 54. Fast R-CNN ● R-CNN Problem #1: Slow at test time: a full forward pass of the CNN must be run for each region proposal ● Solution: Share the computation of the convolutional layers between the region proposals of an image. Girshick. Fast R-CNN. ICCV 2015
  • 55. Fast R-CNN ● R-CNN Problem #2&3: SVMs and regressors are post-hoc. Complex training. ● Solution: Train it all together, end to end ● Softmax over (K+1) classes and 4 box offsets ● Positive boxes are the ones with a larger Intersection over Union (IoU) with the ground truth. Girshick. Fast R-CNN. ICCV 2015
  • 56. Fast R-CNN: RoI-Pooling ● Hi-res input image: 3 x 800 x 600 with region proposal ● Convolution and Pooling ● Hi-res conv features: C x H x W with region proposal (variable size) ● Max-pool within each grid cell ● RoI conv features: C x h x w for region proposal (fixed size) ● Fully-connected layers, which expect low-res conv features: C x h x w. Slide Credit: CS231n. Girshick. Fast R-CNN. ICCV 2015
  • 57. Fast R-CNN: RoI-Pooling ● RoI pooling allows 1) propagating gradients only through the regions of interest, and 2) efficient computation. ● Input: convolutional map + N regions of interest ● Output: tensor of N x 7 x 7 x depth features
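The pooling step for a single region can be sketched as follows. This is a simplified illustration under stated assumptions: the feature map is a nested list indexed `[channel][row][col]`, the region is already expressed in integer feature-map cells, and the bin boundaries use a floor/ceil rule chosen for this example. It returns a depth x 7 x 7 result for one region, whereas the slide describes N regions stacked as N x 7 x 7 x depth.

```python
import math


def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one region of a C x H x W feature map to C x out_h x out_w.

    feature_map: nested lists indexed [channel][row][col] (assumed layout).
    roi: (x1, y1, x2, y2) in feature-map cells, half-open on x2 and y2.
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for channel in feature_map:
        out = []
        for i in range(out_h):
            # Rows covered by output row i (always at least one row).
            r0 = y1 + math.floor(i * roi_h / out_h)
            r1 = max(y1 + math.ceil((i + 1) * roi_h / out_h), r0 + 1)
            row = []
            for j in range(out_w):
                # Columns covered by output column j (at least one column).
                c0 = x1 + math.floor(j * roi_w / out_w)
                c1 = max(x1 + math.ceil((j + 1) * roi_w / out_w), c0 + 1)
                row.append(max(channel[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
            out.append(row)
        pooled.append(out)
    return pooled


# Toy feature map with C=1, H=4, W=6 and a 4 x 4 region pooled to 2 x 2.
fmap = [[[6 * r + c for c in range(6)] for r in range(4)]]
pooled = roi_max_pool(fmap, roi=(1, 0, 5, 4), out_h=2, out_w=2)
print(pooled)
```

Whatever the size of the region, the output grid has a fixed size, which is what the fully-connected layers require.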
  • 58. Fast R-CNN ● Using VGG-16 CNN on the Pascal VOC 2007 dataset. Slide Credit: CS231n
                              R-CNN         Fast R-CNN
      Training time           84 hours      9.5 hours       (Faster!)
      (Speedup)               1x            8.8x
      Test time per image     47 seconds    0.32 seconds    (FASTER!)
      (Speedup)               1x            146x
      mAP (VOC 2007)          66.0          66.9            (Better!)
  • 59. Fast R-CNN: Limitation ● Test-time speeds do not include region proposals. Slide Credit: CS231n
                                                   R-CNN         Fast R-CNN
      Test time per image                          47 seconds    0.32 seconds
      (Speedup)                                    1x            146x
      Test time per image with Selective Search    50 seconds    2 seconds
      (Speedup)                                    1x            25x
  • 61. Faster R-CNN ● Learn proposals end-to-end, sharing parameters with the classification network. ● This network is called the Region Proposal Network (RPN), and the proposals are learnt! ● [Figure blocks: Conv layers, Conv5_3, Region Proposal Network, RPN Proposals, RoI Pooling, FC6, FC7, FC8, Class probabilities] #Faster R-CNN Ren, S., He, K., Girshick, R., & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015.
  • 62. Faster R-CNN ● Faster R-CNN replaces selective search (SS) with the Region Proposal Network (RPN), which is trained jointly.
  • 63. Region Proposal Network (RPN) ● Objectness scores (object / no object) ● Bounding box regression ● In practice, k = 9 anchor boxes per location (3 different scales and 3 aspect ratios) #Faster R-CNN Ren, S., He, K., Girshick, R., & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015.
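The k = 9 reference boxes at one location can be generated as in the sketch below. The concrete scale values (128, 256, 512 pixels) and aspect ratios (0.5, 1, 2) are assumptions not stated on the slide, chosen to match the settings commonly quoted for Faster R-CNN. The function name and box format are likewise illustrative.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) boxes (x1, y1, x2, y2) centred at (cx, cy).

    Each box has area scale**2 and aspect ratio height / width = ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(cx=300, cy=200)
print(len(anchors))
for box in anchors[:3]:
    print([round(v, 1) for v in box])
```

The RPN then predicts, for each of these k boxes at each location, an objectness score and four regression offsets.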
  • 64. Faster R-CNN ● Slide Credit: CS231n
                                              R-CNN         Fast R-CNN    Faster R-CNN
      Test time per image (with proposals)    50 seconds    2 seconds     0.2 seconds
      (Speedup)                               1x            25x           250x
      mAP (VOC 2007)                          66.0          66.9          66.9
      Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
  • 65. Mask R-CNN: Object Detection + Instance Segmentation 65 He et al. Mask R-CNN. ICCV 2017
  • 66. Next lecture: Instance & Image Segmentation 66 Source: Detectron2 Carles Ventura
  • 67. Two-stage vs Single-stage methods ● Two-stage pipeline: Image pixels → Object proposals generation → resample pixels for each BBOX / resample features for each BBOX → high quality classifier ● Computationally too intensive and too slow for real-time applications (Faster R-CNN: 7 FPS)
  • 68. Two-stage vs Single-stage methods ● Two-stage pipeline: Image pixels → Object proposals generation → resample pixels for each BBOX / resample features for each BBOX → high quality classifier ● Instead of having two networks (Region Proposal Network + classifier network), one-stage architectures predict bounding boxes and confidences for multiple categories directly with a single network.
  • 69. Outline 69 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures a. Two-stage b. Single-stage 5. Software implementations
  • 70. One-stage methods ● Previously…: ● Problem: Too many positions & scales to test
  • 71. OverFeat #OverFeat Sermanet, Pierre, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. "OverFeat: Integrated recognition, localization and detection using convolutional networks." ICLR 2014
  • 72. One-stage methods ● Previously…: ● Problem: Too many positions & scales to test ● Solution: If your classifier is fast enough, go for it
  • 73. One-stage methods ● Previously…: ● Problem: Too many positions & scales to test ● Modern detectors parallelize feature extraction across all locations. Region classification is not slow anymore!
  • 74. YOLO: You Only Look Once ● Proposal-free object detection pipeline ● S x S grid on input ● For each cell of the S x S grid, predict: B boxes with confidence scores C (5 x B values) + class probabilities c. #YOLO Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
  • 76. YOLO: You Only Look Once ● Proposal-free object detection pipeline: S x S grid on input → Bounding boxes + confidence, Class probability map → Final detections ● Final detections: Cj * prob(c) > threshold. Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
  • 77. YOLO: You Only Look Once. Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
  • 78. YOLO: You Only Look Once ● Each cell predicts: ○ For each bounding box: 4 coordinates (x, y, w, h) and 1 confidence value ○ Some number of class probabilities ● For Pascal VOC: ○ 7x7 grid ○ 2 bounding boxes / cell ○ 20 classes ● 7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
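The output size and the thresholding rule Cj * prob(c) > threshold can be checked with a short sketch. The memory layout of the 30 values per cell (B blocks of x, y, w, h, confidence, followed by the class probabilities), the `decode` function, the threshold value and the synthetic output are all assumptions made for this illustration; real implementations may order the values differently.

```python
S, B, NUM_CLASSES = 7, 2, 20          # Pascal VOC setting from the slide
CELL = B * 5 + NUM_CLASSES            # 30 values per grid cell


def decode(output, threshold=0.2):
    """Turn a flat list of S*S*CELL values into thresholded detections.

    ASSUMED layout per cell: B blocks of (x, y, w, h, confidence),
    followed by NUM_CLASSES class probabilities.
    """
    assert len(output) == S * S * CELL
    detections = []
    for row in range(S):
        for col in range(S):
            start = (row * S + col) * CELL
            cell = output[start:start + CELL]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                for cls, prob in enumerate(class_probs):
                    score = conf * prob   # Cj * prob(c)
                    if score > threshold:
                        detections.append((row, col, cls, score, (x, y, w, h)))
    return detections


# Synthetic output: all zeros except one confident box in cell (3, 4).
out = [0.0] * (S * S * CELL)
base = (3 * S + 4) * CELL
out[base:base + 5] = [0.5, 0.5, 0.2, 0.3, 0.9]   # box 0 with confidence 0.9
out[base + B * 5 + 7] = 0.8                       # class index 7
dets = decode(out)
print(len(out), dets)
```

The surviving detections would normally still go through Non-Maximum Suppression to remove duplicates across neighbouring cells.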
  • 79. SSD: Single Shot MultiBox Detector ● Same idea as YOLO, plus several predictors at different stages of the network to allow different receptive fields. Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016
  • 80. YOLOv2 Redmon & Farhadi. YOLO9000: Better, Faster, Stronger. CVPR 2017
  • 81. YOLOv3 81 YOLO v2 + residual blocks + skip connections + upsampling + detection at multiple scales
  • 82. YOLOv4 Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection” arXiv 2020.
  • 83. 83 #YOLO Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
  • 84. Military Applications & Privacy Risks 84
  • 85. RetinaNet: matching proposal-based performance with a one-stage approach ● Problem of one-stage detectors: they evaluate many candidate locations, but only a few contain objects. This class imbalance makes learning inefficient. ● Focal loss: the key idea is to lower the loss weight of well-classified samples, which raises the relative weight of difficult ones. Lin et al. Focal Loss for Dense Object Detection. ICCV 2017
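The effect of the focal loss can be seen by comparing it with plain cross-entropy on a few examples. The sketch below uses the binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The default values gamma = 2 and alpha = 0.25 are assumptions not given on the slide, taken from the settings commonly quoted for RetinaNet, and the function names are illustrative.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted foreground probability p and label y (0 or 1)."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y):
    """Standard binary cross-entropy, for comparison."""
    return -math.log(p if y == 1 else 1.0 - p)


for p in (0.95, 0.6, 0.1):  # easy, medium and hard positive examples
    print(p, round(cross_entropy(p, 1), 4), round(focal_loss(p, 1), 6))
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified samples far more than that of hard ones, so the many easy background locations no longer dominate training.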
  • 87. Neural Architectures for Object Detection ● Two-stage methods: ○ R-CNN ○ Fast R-CNN ○ Faster R-CNN ○ Mask R-CNN ● Single-stage methods: ○ YOLO ○ SSD ○ RetinaNet
  • 88. Software implementations ● Most models are publicly available, ready to be used off-the-shelf.
      Model           Framework
      Faster R-CNN    [torchvision] (< suggested) [Detectron2] [Keras]
      RetinaNet       [Detectron2] (< suggested) [Keras]
      YOLOv3          [PyTorch]
      SSD             [PyTorch] [Tutorial on Keras]
      Mask R-CNN      [torchvision] (< suggested) [PyTorch] [Keras & TF] [tutorial]
      Benchmark       [TensorFlow Object Detection API]
  • 89. Software implementations ● You will probably not be interested in the object classes defined in Pascal/COCO. You can adapt (fine-tune) existing models to your own object classes. Wang, Xin, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, and Fisher Yu. "Frustratingly Simple Few-Shot Object Detection." arXiv preprint arXiv:2003.06957 (2020). [code based on Detectron 2]
  • 90. Software implementations for Mobile 90 TensorFlow Lite: Object Detection PyTorch Mobile (no specific solutions for object detection)
  • 91. Software implementations 91 Jordi Torres, “TensorFlow or PyTorch? ” (2020) [in Catalan]
  • 92. Outline 92 1. Motivation 2. Datasets 3. Evaluation 4. Neural Architectures a. Two-stage b. Single-stage 5. Software implementations
  • 93. Next lab: ImageNet models 93 Dani Fojo