You Only Look Once:
Unified, Real-Time Object Detection (2016)
Taegyun Jeon
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Slide courtesy of DeepSystem.io “YOLO: You only look once Review”
Evaluation on VOC2007
Main Concept
• Object Detection
• Regression problem
• YOLO
• Only One Feedforward
• Global context
• Unified (Real-time detection)
• YOLO: 45 FPS
• Fast YOLO: 155 FPS
• General representation
• Robust to various backgrounds
• Other domains
(Real-time system: https://cs.stackexchange.com/questions/56569/what-is-real-time-in-a-computer-vision-context)
Object Detection as Regression Problem
• Previous: Repurpose classifiers to perform detection
• Deformable Parts Models (DPM)
• Sliding window
• R-CNN based methods
• 1) generate potential bounding boxes.
• 2) run classifiers on these proposed boxes
• 3) post-processing (refinement, elimination, rescore)
Object Detection as Regression Problem
• YOLO: Single Regression Problem
• Image → bounding box coordinate and class probability.
• Extremely Fast
• Global reasoning
• Generalizable representation
Unified Detection
• All BBox, All classes
1) Image → S x S grid
2) Each grid cell
→ B: BBoxes, each with (x, y, w, h, confidence)
→ C: class probabilities, one per class
Appendix: Intersection over Union (IOU)
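The grid bookkeeping above can be sketched as a small shape calculation. This is only an illustration: the function name is hypothetical, S=7 and B=2 are the Pascal VOC settings quoted later in the deck, and C=20 assumes the 20 Pascal VOC classes.

```python
def yolo_output_shape(S: int, B: int, C: int) -> tuple:
    """Shape of the prediction tensor for an S x S grid.

    Each cell predicts B boxes with 5 values each (x, y, w, h, confidence)
    plus one shared set of C class probabilities.
    """
    values_per_box = 5  # x, y, w, h, confidence
    return (S, S, B * values_per_box + C)


# C=20 is an assumption (Pascal VOC class count); S and B come from the deck.
shape = yolo_output_shape(S=7, B=2, C=20)
print(shape)  # (7, 7, 30)
```

With these settings each cell carries 2 * 5 + 20 = 30 numbers, so the whole image is described by a 7 x 7 x 30 tensor.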
Unified Detection
• Predict one set of class probabilities per grid cell, regardless of the number of boxes B.
• At test time, multiply the conditional class probabilities by the individual box confidence predictions.
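A minimal sketch of the test-time scoring described in the Redmon et al. paper: each box's confidence is multiplied by the cell's conditional class probabilities, giving a class-specific score per box. The function name and the numbers are hypothetical, chosen only for illustration.

```python
def class_specific_scores(class_probs, box_confidences):
    """Return one list of class scores per box.

    score[b][c] = Pr(class_c | object) * confidence_b
    """
    return [[p * conf for p in class_probs] for conf in box_confidences]


# Hypothetical values for one grid cell with C=3 classes and B=2 boxes.
cell_class_probs = [0.7, 0.2, 0.1]
cell_box_confidences = [0.9, 0.3]
scores = class_specific_scores(cell_class_probs, cell_box_confidences)
```

Because the class probabilities are shared by the cell, the B boxes of a cell differ only through their confidence values.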
Network Design: YOLO
• Modified GoogLeNet
• 1x1 reduction layer (“Network in Network”)
Appendix: GoogLeNet
[Figure: YOLO network. 24 Conv. layers (1, 1, 4, 10, 6, 2 per block) followed by 2 Fully conn. layers (1, 1)]
Layers labeled in the figure:
Conv. layer 3x3x512
Maxpool Layer 2x2-s-2
Conv. layer 3x3x1024
Maxpool Layer 2x2-s-2
Conv. layer 3x3x1024
Conv. layer 3x3x1024-s-2
Conv. layer 3x3x1024
Conv. layer 3x3x1024
Network Design: YOLO-tiny (9 Conv. Layers)
Conv. layer 3x3x16
Maxpool Layer 2x2-s-2
Conv. layer 3x3x32
Maxpool Layer 2x2-s-2
Conv. layer 3x3x64
Maxpool Layer 2x2-s-2
Conv. layer 3x3x128
Maxpool Layer 2x2-s-2
Conv. layer 3x3x256
Maxpool Layer 2x2-s-2
Conv. layer 3x3x512
Maxpool Layer 2x2-s-2
Conv. layer 3x3x1024
Conv. layer 3x3x1024
Conv. layer 3x3x1024
[Figure: YOLO-Tiny (9 Conv. layers, 1 per block, followed by 2 Fully conn. layers) compared with YOLO]
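The six stride-2 max-pool layers listed for YOLO-tiny suggest how a 448x448 input is reduced to the 7x7 grid used for Pascal VOC. The sketch below is a hedged illustration: it assumes every conv. layer preserves spatial size (stride 1, "same" padding, which the slide does not state) and that YOLO-tiny uses the 448x448 input given in the training slides. The function name is hypothetical.

```python
def spatial_size_after_pools(input_size: int, num_pools: int, stride: int = 2) -> int:
    """Spatial side length after num_pools pooling layers of the given stride.

    Assumes the conv. layers in between leave the spatial size unchanged.
    """
    size = input_size
    for _ in range(num_pools):
        size //= stride  # each 2x2-s-2 max-pool halves the side length
    return size


grid = spatial_size_after_pools(448, 6)
print(grid)  # 7
```

Under these assumptions the side length goes 448 → 224 → 112 → 56 → 28 → 14 → 7, matching S=7.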
Training
1) Pretrain with ImageNet 1000-class competition dataset
[Figure: 20 Conv. layers (Feature Extractor) followed by the Object Classifier]
Training
2) “Network on Convolutional Feature Maps”
Increased input resolution: 224x224 → 448x448
Appendix: Networks on Convolutional Feature Maps
[Figure: 20 pretrained Conv. layers (Feature Extractor) followed by 4 Conv. layers and 2 Fully conn. layers (Object Classifier)]
Inference
Loss Function (sum-squared error)
Appendix: sum-squared error
• 1_i^obj: an object appears in cell i
• 1_ij^obj: the jth bbox predictor in cell i is “responsible” for that prediction
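For reference, the sum-squared-error loss from the cited Redmon et al. (2016) paper can be written out as below. The slide's equation image did not survive extraction, so this is a reconstruction from the paper, not from the slide. Hatted symbols are predictions, C_i is the box confidence, p_i(c) is the class probability, and the paper sets λ_coord = 5 and λ_noobj = 0.5.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \,\in\, \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on width and height reduce the weight of deviations in large boxes relative to small ones, and λ_noobj down-weights the many cells that contain no object.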
Train strategy
epochs=135
batch_size=64
momentum=0.9
decay=0.0005
lr=[10^-3, 10^-2, 10^-3, 10^-4]
dropout_rate=0.5
augmentation=[scaling, translation, exposure, saturation]
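The four learning-rate values above can be read as a piecewise schedule over the 135 epochs. The sketch below follows my reading of the Redmon et al. paper (10^-2 for 75 epochs, then 10^-3 for 30, then 10^-4 for 30). The linear ramp, its length (`warmup_epochs`), and folding the warm-up into the first 75 epochs are assumptions, since the paper only says the rate is raised slowly over the first epochs.

```python
def learning_rate(epoch: int, warmup_epochs: int = 5) -> float:
    """Piecewise schedule over the values [1e-3, 1e-2, 1e-3, 1e-4].

    epoch is 0-based. warmup_epochs and the linear ramp are assumptions.
    """
    if epoch < warmup_epochs:
        # Ramp from 1e-3 towards 1e-2 to avoid early divergence.
        return 1e-3 + (1e-2 - 1e-3) * epoch / warmup_epochs
    if epoch < 75:
        return 1e-2
    if epoch < 105:  # 75 + 30
        return 1e-3
    return 1e-4      # final 30 epochs, up to epoch 134


schedule = [learning_rate(e) for e in range(135)]
```

The resulting list has one entry per epoch and could be fed to whatever optimizer wrapper a given framework provides.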
Inference
Just like in training?
S=7, B=2 for Pascal VOC
Limitation of YOLO
• Group of small objects
• Unusual aspect ratios
• Coarse feature
• Localization error of bounding box
Comparison to Other Detection Systems
Comparison to Other Real-Time Systems
VOC Error
Combining Fast R-CNN and YOLO
VOC 2012 Leaderboard
Generalizability: Person Detection in Artwork
Summary
Appendix | Implementation
• YOLO (darknet): https://pjreddie.com/darknet/yolov1/
• YOLOv2 (darknet): https://pjreddie.com/darknet/yolo/
• YOLO (caffe): https://github.com/xingwangsfu/caffe-yolo
• YOLO (TensorFlow: Train+Test): https://github.com/thtrieu/darkflow
• YOLO (TensorFlow: Test): https://github.com/gliese581gg/YOLO_tensorflow
Appendix | Slides
• DeepSense.io (google presentation - "YOLO: Inference")
Thanks
fb.com/taegyun.jeon
github.com/tgjeon
taylor.taegyun.jeon@gmail.com
Paper reviewed by Taegyun Jeon
Appendix | Intersection over Union (IoU)
• IoU(pred, truth) ∈ [0, 1]
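A minimal sketch of the IoU computation for two axis-aligned boxes. The corner format (x_min, y_min, x_max, y_max) is an assumption made for simplicity; YOLO itself predicts centre coordinates plus width and height, which would need converting first.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlap rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```

Identical boxes give 1, disjoint boxes give 0, and every other case falls in between.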
Appendix | GoogLeNet
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Appendix | GoogLeNet
Inception Module (1x1 convolution for dimension reductions)
Appendix | Networks on Convolutional Feature Maps
Previous: FC
Proposed : Conv + FC
Ren, Shaoqing, et al. "Object detection networks on convolutional feature maps." IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).
Appendix | Sum-squared error (SSE)
The sum of squared errors of prediction (SSE), also called the residual
sum of squares (RSS), is the sum of the squares of residuals (deviations
of predicted values from the actual empirical values of data). It is a
measure of the discrepancy between the data and an estimation model. A
small SSE indicates a tight fit of the model to the data. It is used as
an optimality criterion in parameter selection and model selection.
https://en.wikipedia.org/wiki/Residual_sum_of_squares
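The definition above amounts to a one-line computation. A small sketch with a hypothetical function name and made-up numbers:

```python
def sum_squared_error(predicted, actual):
    """Sum of squared residuals between predictions and observed values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


# Residuals are 0, -0.5 and 1, so the squares are 0, 0.25 and 1.
sse = sum_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(sse)  # 1.25
```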