SSD:
Single Shot MultiBox Detector
Wei Liu, et al., “SSD: Single Shot MultiBox Detector”, ECCV 2016
6th January, 2019
PR12 Paper Review
JinWon Lee
Samsung Electronics
Related Papers in PR12
• Faster R-CNN
 ▪ PR-013 : https://youtu.be/kcPAGIgBGRs
• YOLO
 ▪ PR-016 : https://youtu.be/eTDcoeqj1_w
• YOLO9000
 ▪ PR-023 : https://youtu.be/6fdclSGgeio
Object Detection
Slide from Stanford CS231n Lecture Slide
Milestones in Generic Object Detection
Figure from “Deep Learning for Generic Object Detection: A Survey”, arXiv:1809.02165
2-stage detector
1-stage detector
Faster R-CNN & YOLO
Figure from “Deep Learning for Generic Object Detection: A Survey”, arXiv:1809.02165
Introduction
• Current state-of-the-art object detection systems are variants of the
following approach:
 ▪ Hypothesize bounding boxes, resample pixels or features for each box, and
apply a high-quality classifier.
• This approach is too computationally intensive and too slow for real-time
applications.
 ▪ Faster R-CNN operates at only 7 FPS with mAP 73.2%
• Significantly increased speed has so far come only at the cost of
significantly decreased detection accuracy.
 ▪ YOLO operates at 45 FPS with mAP 63.4%
Single Shot MultiBox Detector
• First deep-network-based object detector that does not resample
pixels or features for bounding box hypotheses (1-stage) and is as
accurate as approaches that do.
 ▪ 59 FPS with mAP 74.3% on VOC2007 test
Model
• Base network: VGG16, with FC6 and FC7 converted to convolution layers
• Auxiliary structure to produce detections
• Output: collections of BBoxes & scores
Multi-scale Feature Maps for Detection
• Convolutional feature layers are added to the end of the truncated base
network.
• These layers decrease in size progressively and allow predictions of
detections at multiple scales.
 ▪ YOLO uses only a single-scale feature map.
Convolutional Predictors for Detection
• For a feature layer of size m x n with p channels, the basic element
for predicting parameters of a potential detection is a small 3 x 3 x p
kernel that produces either a score for a category, or a shape offset
relative to the default box coordinates.
 ▪ YOLO uses a fully connected layer to predict the scores and bbox coordinates.
Default Boxes and Aspect Ratios
• For each box out of k at a given location, we compute c class scores
and the 4 offsets relative to the original default box shape.
• Total of (c+4)k filters at each location in the feature map.
• Yielding (c+4)kmn outputs for an m x n feature map.
Example (figure): a 19 x 19 feature map with k = 6 default boxes and c = 21 classes
 ▪ 1st bbox: dx, dy, dh, dw – 4 ch; class scores – 21 ch
 ▪ 2nd bbox: dx, dy, dh, dw – 4 ch; class scores – 21 ch
 ▪ ......
 ▪ Total: 150 = 6 x (21 + 4) ch
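The channel arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting rule (c+4)k and (c+4)kmn; the function names are made up for illustration and do not come from the paper or any library.

```python
def prediction_channels(num_classes: int, boxes_per_location: int) -> int:
    """Filters applied at each feature map location: (c + 4) * k."""
    return (num_classes + 4) * boxes_per_location


def prediction_outputs(m: int, n: int, num_classes: int, boxes_per_location: int) -> int:
    """Total outputs for an m x n feature map: (c + 4) * k * m * n."""
    return prediction_channels(num_classes, boxes_per_location) * m * n


# Values from the slide: c = 21 classes (20 VOC classes + background), k = 6 boxes.
channels = prediction_channels(num_classes=21, boxes_per_location=6)
outputs = prediction_outputs(19, 19, num_classes=21, boxes_per_location=6)
print(channels, outputs)
```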
Training – Matching Strategy
• First, match each ground truth box to the default box with the best
Jaccard overlap, so that every ground truth box has at least 1
correspondence.
• Then, match default boxes to any ground truth with Jaccard
overlap higher than a threshold (0.5).
Jaccard Overlap: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
Figure from Wikipedia
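The two matching steps might be sketched as follows. This is a simplified illustration, not the authors' implementation: boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, `match_default_boxes` is a hypothetical name, a default box that overlaps several ground truths above the threshold is assigned to the best one (an assumption), and the case where two ground truths share the same best default box is not handled.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def jaccard(a: Box, b: Box) -> float:
    """Jaccard overlap (intersection over union) of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_default_boxes(defaults: List[Box], gts: List[Box],
                        threshold: float = 0.5) -> Dict[int, int]:
    """Return {default box index: ground truth index} for positive matches."""
    matches: Dict[int, int] = {}
    # Threshold step: a default box is positive if its best overlap exceeds the threshold.
    for d, dbox in enumerate(defaults):
        overlaps = [jaccard(dbox, g) for g in gts]
        best_g = max(range(len(gts)), key=lambda g: overlaps[g])
        if overlaps[best_g] > threshold:
            matches[d] = best_g
    # Best-overlap step (applied last so it wins): each ground truth claims its best default box.
    for g, gbox in enumerate(gts):
        best_d = max(range(len(defaults)), key=lambda d: jaccard(defaults[d], gbox))
        matches[best_d] = g
    return matches


defaults = [(0, 0, 10, 10), (0, 0, 6, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
gts = [(0, 0, 10, 8), (22, 22, 40, 40)]
matches = match_default_boxes(defaults, gts)
print(matches)
```

In this toy example the second ground truth overlaps no default box above 0.5, yet it still receives a correspondence through the best-overlap step.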
Training Objective
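• Similar to Faster R-CNN: a weighted sum of the confidence loss and the
localization loss
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
 ▪ N : number of matched default boxes
 ▪ Lconf : classification loss → cross-entropy
 ▪ Lloc : localization loss → Smooth L1 loss
 ▪ α : weight term, set to 1 by cross validation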
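A minimal sketch of this objective in plain Python, assuming the usual form (Lconf + α Lloc) / N with Smooth L1 on the 4 box offsets of positive boxes and softmax cross-entropy on the class scores. The function names and the input layout (one list entry per default box included in the loss, class 0 = background) are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def smooth_l1(x: float) -> float:
    """Smooth L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of a softmax over raw class scores (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]


def ssd_loss(conf_logits: List[Sequence[float]], conf_targets: List[int],
             loc_preds: List[Sequence[float]], loc_targets: List[Sequence[float]],
             alpha: float = 1.0) -> float:
    """(Lconf + alpha * Lloc) / N, where N counts boxes with a non-background target."""
    num_pos = sum(1 for t in conf_targets if t > 0)
    if num_pos == 0:
        return 0.0  # no matched boxes: the loss is defined as 0
    l_conf = sum(softmax_cross_entropy(s, t) for s, t in zip(conf_logits, conf_targets))
    l_loc = 0.0
    for pred, tgt, cls in zip(loc_preds, loc_targets, conf_targets):
        if cls > 0:  # localization loss only for positive boxes
            l_loc += sum(smooth_l1(p - g) for p, g in zip(pred, tgt))
    return (l_conf + alpha * l_loc) / num_pos


loss = ssd_loss(conf_logits=[[0.0, 0.0]], conf_targets=[1],
                loc_preds=[[0.5, 0.0, 0.0, 2.0]], loc_targets=[[0.0, 0.0, 0.0, 0.0]])
print(loss)
```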
Choosing Scales and Aspect Ratios for Default Boxes
• Feature maps from different layers are used to handle scale variance.
• Specific feature map locations learn to be responsive to specific areas
of the image and particular scales of objects.
Choosing Scales and Aspect Ratios for Default Boxes
Slide from https://goo.gl/NsP6Wg
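Choosing Scales and Aspect Ratios for Default Boxes
• If m feature maps are used for prediction, the scale of the default
boxes for each feature map is computed as:
$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$
• smin is 0.2 and smax is 0.9
• For example (PASCAL VOC 2007)
 ▪ sk is 0.1, 0.2, 0.375, 0.55, 0.725, 0.9 (0.1 is set separately for the
first feature map; the formula with m = 5 gives the other five)
 ▪ i.e. 30, 60, 112.5, 165, 217.5, 270 pixels on a 300 x 300 input image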
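The scale values listed above can be reproduced with a short sketch of the linear rule from the SSD paper, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1). Treating 0.1 as a separately chosen scale for the first feature map, and applying the rule with m = 5 to the rest, is an assumption that matches the numbers on the slide; `default_box_scales` is a hypothetical name.

```python
from typing import List


def default_box_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Linearly spaced scales s_1..s_m between s_min and s_max."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


# Assumption: the first feature map uses a hand-picked scale of 0.1.
scales = [0.1] + default_box_scales(5)
pixels = [round(s * 300, 1) for s in scales]  # sizes on a 300 x 300 input
print(scales)
print(pixels)
```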
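Choosing Scales and Aspect Ratios for Default Boxes
• At each scale, different aspect ratios are considered:
$a_r \in \{1, 2, 3, \frac{1}{2}, \frac{1}{3}\}$, with width $w_k^a = s_k\sqrt{a_r}$ and height $h_k^a = s_k/\sqrt{a_r}$
• For the aspect ratio of 1, one default box is added whose scale is
$s'_k = \sqrt{s_k s_{k+1}}$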
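As a sketch of how the box shapes follow from a scale, assuming the SSD paper's definitions w = s_k·sqrt(a_r), h = s_k/sqrt(a_r) and the extra aspect-ratio-1 box of scale sqrt(s_k·s_{k+1}). The function name and the (width, height) tuple layout are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def default_box_shapes(s_k: float, s_k_next: float,
                       aspect_ratios: Sequence[float] = (1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)
                       ) -> List[Tuple[float, float]]:
    """Return (width, height) pairs, relative to the image size, for one feature map."""
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]
    # Extra square box whose scale lies between this feature map's scale and the next one.
    s_extra = math.sqrt(s_k * s_k_next)
    shapes.append((s_extra, s_extra))
    return shapes


shapes = default_box_shapes(0.2, 0.375)
for w, h in shapes:
    print(f"w={w:.3f} h={h:.3f}")
```

With five aspect ratios plus the extra box this gives 6 default boxes per location, and every box except the extra one has the same area s_k squared.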
Hard Negative Mining
• Significant imbalance between positive and negative training
examples.
 ▪ After the matching step, most of the default boxes are negatives, especially
when the number of possible default boxes is large.
• Sort the negatives by the confidence loss of each default box, highest first.
• Pick the top ones so that the ratio between the negatives and
positives is at most 3:1
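The selection step might look like this in Python. It is a sketch under the assumption that a per-box confidence loss and a positive/negative flag are already available; `select_hard_negatives` is a hypothetical helper, not a library function.

```python
from typing import List


def select_hard_negatives(conf_losses: List[float], is_positive: List[bool],
                          max_ratio: int = 3) -> List[int]:
    """Indices of the negatives kept for training: highest loss first, at most max_ratio per positive."""
    num_pos = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return sorted(negatives[: max_ratio * num_pos])


conf_losses = [0.1, 2.0, 0.3, 1.5, 0.05, 0.9, 0.7]
is_positive = [True, False, False, False, False, False, False]
kept = select_hard_negatives(conf_losses, is_positive)
print(kept)
```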
Data Augmentation
• Each training image is randomly sampled by one of the following options:
 ▪ Use the entire original input image.
 ▪ Sample a patch so that the minimum Jaccard overlap with the
objects is 0.1, 0.3, 0.5, 0.7 or 0.9.
 ▪ Randomly sample a patch.
• The size of each sampled patch is [0.1, 1] of the original image size.
• Aspect ratio is between ½ and 2.
• Horizontally flipped with probability of 0.5.
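A rough sketch of the patch sampling in normalized image coordinates follows. Several details are assumptions made for illustration: the patch size is treated as a side-length scale, a patch is accepted when at least one object meets the minimum overlap, sampling falls back to the whole image after a fixed number of trials, and all names (`sample_patch`, `choose_crop`) are hypothetical.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in [0, 1]
WHOLE_IMAGE: Box = (0.0, 0.0, 1.0, 1.0)


def jaccard(a: Box, b: Box) -> float:
    """Jaccard overlap (intersection over union) of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_patch(rng: random.Random) -> Box:
    """Sample a patch with scale in [0.1, 1] and aspect ratio in [1/2, 2] that fits the image."""
    while True:
        scale = rng.uniform(0.1, 1.0)
        ratio = rng.uniform(0.5, 2.0)
        w, h = scale * ratio ** 0.5, scale / ratio ** 0.5
        if w <= 1.0 and h <= 1.0:
            x, y = rng.uniform(0.0, 1.0 - w), rng.uniform(0.0, 1.0 - h)
            return (x, y, x + w, y + h)


def choose_crop(rng: random.Random, gt_boxes: List[Box], max_trials: int = 50) -> Box:
    """Pick one sampling option at random and return the crop region."""
    mode = rng.choice(["whole", 0.1, 0.3, 0.5, 0.7, 0.9, "random"])
    if mode == "whole":
        return WHOLE_IMAGE
    for _ in range(max_trials):
        patch = sample_patch(rng)
        if mode == "random" or any(jaccard(patch, gt) >= mode for gt in gt_boxes):
            return patch
    return WHOLE_IMAGE  # assumption: fall back to the full image if no patch qualifies


rng = random.Random(0)
gt_boxes = [(0.2, 0.2, 0.8, 0.8)]
crops = [choose_crop(rng, gt_boxes) for _ in range(200)]
flips = [rng.random() < 0.5 for _ in crops]  # horizontal flip with probability 0.5
```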
Experimental Results
• Base Network
 ▪ VGG16 with fc6 and fc7 converted to conv layers, pool5 changed from
2x2 – s2 to 3x3 – s1 (using the atrous algorithm to fill the holes), and
fc8 and all dropout layers removed
 ▪ Fine-tuned using SGD with momentum
PASCAL VOC2007 Detection Results
Model Analysis
Visualization of Performance
Sensitivity to Different Object Characteristics
Data Augmentation for Small Objects
• Randomly place an image on a canvas of 16× the original image size,
filled with mean values, before doing any random crop operation.
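The geometry of this "zoom-out" step can be sketched without any image library. The sketch assumes that 16× refers to area, so each side grows by a factor of up to 4, and that the expansion ratio and the paste position are sampled uniformly; both are assumptions, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

PixelBox = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def expand_placement(rng: random.Random, width: int, height: int,
                     max_side_ratio: float = 4.0) -> Tuple[int, int, int, int]:
    """Return (canvas_w, canvas_h, left, top): canvas size and where the image is pasted."""
    ratio = rng.uniform(1.0, max_side_ratio)
    canvas_w, canvas_h = int(width * ratio), int(height * ratio)
    left = rng.randint(0, canvas_w - width)
    top = rng.randint(0, canvas_h - height)
    return canvas_w, canvas_h, left, top


def shift_boxes(boxes: List[PixelBox], left: int, top: int) -> List[PixelBox]:
    """Move ground truth boxes along with the pasted image."""
    return [(x0 + left, y0 + top, x1 + left, y1 + top) for x0, y0, x1, y1 in boxes]


rng = random.Random(0)
canvas_w, canvas_h, left, top = expand_placement(rng, 300, 300)
moved = shift_boxes([(10, 20, 50, 60)], left, top)
# The canvas itself would be filled with the dataset mean before pasting the image.
```

Because objects keep their pixel size while the canvas grows, they become smaller relative to the training image once it is resized, which is why this helps with small objects.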
Results on Multiple Datasets with New Data Augmentation
Inference Time
Conclusions
• Single-shot object detector for multiple categories
• One key feature is the use of multiple convolutional feature maps to deal
with different scales
• The more default bounding boxes, the better the results
• Comparable accuracy to state-of-the-art object detectors, but much
faster
Thank you

PR-132: SSD: Single Shot MultiBox Detector
