Object Detection using Deep Neural Networks
Object Detection
By Usman Qayyum
4 Dec 2018
Talk Covers Three Papers (Object Detection -> Embedded Computing)
SqueezeNet (2016) + SSD (2016) = TinySSD (2018)
Image Classification/Object Detection
● Autonomous vehicles, smart video surveillance, face detection, and many other applications need fast and robust object detection.
● The task is not only to recognize and classify every object in an image, but also to localize each one by drawing the appropriate bounding box around it.
CNN Migration (Image Classification)
Object Detection as Classification
CNN
deer?
cat?
background?
Object Detection as Classification with Sliding Window
CNN
deer?
cat?
background?
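The sliding-window idea can be sketched in a few lines. This is only an illustration: `classify` is a hypothetical stand-in for the CNN, and the window size, stride, and toy classifier are assumptions made for the demo, not values from the talk.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield (x, y, w, h) windows that scan the image left to right, top to bottom."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def detect(image_w, image_h, classify, win=64, stride=32):
    """Classify every window and keep the ones that are not background.

    `classify` is a hypothetical stand-in for the CNN: it receives a window
    (x, y, w, h) and returns a label such as "deer", "cat" or "background".
    """
    hits = []
    for box in sliding_windows(image_w, image_h, win, win, stride):
        label = classify(box)
        if label != "background":
            hits.append((box, label))
    return hits


# Toy classifier (assumption for the demo): a cat sits in the window at (32, 32).
def toy_classifier(box):
    return "cat" if box[:2] == (32, 32) else "background"


windows = list(sliding_windows(128, 128, 64, 64, 32))
hits = detect(128, 128, toy_classifier)
print(len(windows), hits)
```

Even this tiny 128x128 example needs 9 classifier calls at a single window size. Covering several scales and aspect ratios multiplies that count, which is the cost that box proposals try to reduce.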
Object Detection as Classification with Box Proposals
Box Proposal Method: Selective Search
Segmentation as Selective Search for Object Recognition. van de Sande et al. ICCV 2011.
Idea behind Object Detectors
● Box Proposals
● Classifier Algorithm
R-CNN
Rich feature hierarchies for accurate object detection and semantic segmentation.
Girshick et al. CVPR 2014.
https://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf
Fast R-CNN
Fast R-CNN. Girshick. ICCV 2015.
https://arxiv.org/abs/1504.08083
Idea: Compute the convolutional features once per image and share them across all boxes instead of recomputing them for every box independently. Also regress refined bounding box coordinates.
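As a rough illustration of what "regress refined bounding box coordinates" means, the sketch below applies predicted offsets to a proposal using the center/size parameterization from the R-CNN paper (shift the center in units of the proposal size, scale width and height through an exponential). The function name and the sample numbers are assumptions for the example.

```python
import math


def apply_box_deltas(proposal, deltas):
    """Refine a proposal box with regression offsets.

    proposal: (center_x, center_y, width, height)
    deltas:   (tx, ty, tw, th) as predicted by the regressor
    """
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    gx = px + pw * tx          # shift center x by a fraction of the width
    gy = py + ph * ty          # shift center y by a fraction of the height
    gw = pw * math.exp(tw)     # scale width (exp keeps it positive)
    gh = ph * math.exp(th)     # scale height
    return (gx, gy, gw, gh)


# Hypothetical proposal and offsets: move right, move up slightly, make 1.5x taller.
refined = apply_box_deltas((50.0, 50.0, 20.0, 40.0),
                           (0.1, -0.05, 0.0, math.log(1.5)))
print(refined)
```

All-zero offsets leave the proposal unchanged, so the regressor only has to learn small corrections.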
Faster R-CNN
Ren et al. NIPS 2015.
https://arxiv.org/abs/1506.01497
Idea: Integrate the bounding box proposals as part of the CNN predictions.
YOLO: You Only Look Once
● Single Shot Detector
Redmon et al. CVPR 2016.
https://arxiv.org/abs/1506.02640
Idea: No bounding box proposals. Predict a class and a box for every location in a grid.
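A small sketch of the grid idea, assuming the settings reported in the YOLO paper (a 7x7 grid, 2 boxes per cell, 20 classes). The helper names are made up for this example, and the 448x448 image size is only a sample input.

```python
def responsible_cell(cx, cy, image_w, image_h, grid=7):
    """Return the (row, col) of the grid cell that contains a box center."""
    col = min(int(cx / image_w * grid), grid - 1)  # clamp centers on the right edge
    row = min(int(cy / image_h * grid), grid - 1)  # clamp centers on the bottom edge
    return row, col


def output_size(grid=7, boxes_per_cell=2, num_classes=20):
    """Number of values predicted per image.

    Each box carries 5 numbers (x, y, w, h, confidence) and each cell
    carries one set of class probabilities.
    """
    return grid * grid * (boxes_per_cell * 5 + num_classes)


cell = responsible_cell(224, 100, 448, 448)
print(cell, output_size())
```

The whole prediction is one fixed-size tensor produced in a single forward pass, which is why no separate proposal stage is needed.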
SSD: Single Shot Detector
Liu et al. ECCV 2016.
Idea: Similar to YOLO, but with a denser grid map and multiscale grid maps, plus data augmentation, hard negative mining, and other design choices in the network.
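To make "denser, multiscale grid maps" concrete, the sketch below counts the default boxes over several feature maps and spaces their scales linearly. The feature-map sizes, boxes per location, and the 0.2 to 0.9 scale range are the SSD300 settings as I understand them from the SSD paper, not values given in the slides, so treat them as assumptions.

```python
def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales, one per feature map (fraction of image size)."""
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def count_default_boxes(map_sizes, boxes_per_location):
    """Total default boxes: every cell of every feature map gets a few boxes."""
    return sum(size * size * n for size, n in zip(map_sizes, boxes_per_location))


# Assumed SSD300 configuration: square feature maps from fine to coarse.
SSD300_MAP_SIZES = [38, 19, 10, 5, 3, 1]
SSD300_BOXES_PER_LOCATION = [4, 6, 6, 6, 4, 4]

total_boxes = count_default_boxes(SSD300_MAP_SIZES, SSD300_BOXES_PER_LOCATION)
scales = default_box_scales(len(SSD300_MAP_SIZES))
print(total_boxes, [round(s, 2) for s in scales])
```

Under these assumptions the detector scores 8732 boxes per image, compared with 98 for a 7x7 grid with 2 boxes per cell. Most of them are background, which is why hard negative mining matters.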
SSD: Single Shot Detector
The overall objective loss function is a weighted sum of the localization loss (loc) and the confidence loss (conf):

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

N: the number of matched default boxes
l: the predicted boxes
g: the ground truth boxes
x: the match indicator; x = 1 means that a given default box is matched to a ground truth box
α: the weight on the localization loss
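The weighted sum of confidence and localization loss can be sketched as follows. This is a simplified illustration, not the authors' implementation: the confidence loss is passed in as an already-summed number, the localization loss is a Smooth L1 over the box offsets of the matched boxes (the choice made in the SSD paper), and all sample values are invented.

```python
def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def ssd_loss(conf_loss_sum, matched_pred, matched_target, alpha=1.0):
    """Combine confidence and localization loss, normalized by matched boxes.

    conf_loss_sum:  summed confidence loss (assumed to be computed elsewhere)
    matched_pred:   predicted offsets for each matched default box
    matched_target: ground-truth offsets for the same boxes
    alpha:          weight on the localization term
    """
    n = len(matched_pred)
    if n == 0:
        return 0.0  # no matched boxes: the loss is defined as zero
    loc_loss = sum(
        smooth_l1(p - t)
        for pred_box, target_box in zip(matched_pred, matched_target)
        for p, t in zip(pred_box, target_box)
    )
    return (conf_loss_sum + alpha * loc_loss) / n


# Two matched boxes with invented offsets; targets are all zeros.
pred = [(0.1, 0.2, 0.0, 0.0), (2.0, 0.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
loss = ssd_loss(2.475, pred, target)
print(loss)
```

Dividing by the number of matched boxes keeps the loss comparable between images with few and many objects.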
Performance
Accuracy Vs Computation
AI Workload Migration
Embedded
(Mobile/Edge)
Server/Cloud
Execution/Inference
Training
Execution/Inference
Intelligence &
Analytics
Key Use Cases
Vision | Audio | Security
Benefits
Low Latency | Privacy
AI in Embedded Devices
How? (AI in Embedded Devices)
● Pruning
● Quantization
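A minimal sketch of the two techniques on a flat list of weights, assuming magnitude-based pruning and symmetric 8-bit quantization. These are common textbook variants chosen for illustration; the talk does not say which variants it has in mind, and the weights below are invented.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Ties at the threshold are pruned too, so slightly more than the
    requested fraction can be removed.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] with one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.1]
pruned = prune_by_magnitude(weights, 0.5)
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print(pruned, codes)
```

Pruning shrinks the model by making weights zero (and therefore cheap to store or skip), while quantization shrinks it by storing each remaining weight in 8 bits instead of 32.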
SqueezeNet (Parameter Reduction)
● Strategy 1. Replace 3x3 filters with 1x1 filters
○ A 3x3 filter has 9x as many parameters as a 1x1 filter
● Strategy 2. Decrease the number of input channels to 3x3 filters
○ Total # of parameters in a layer: (# of input channels) * (# of filters) * (# of parameters per filter)
● Strategy 3. Downsample late in the network so that convolution layers have large activation maps
○ The size of the activation maps is controlled by the size of the input data and the choice of layers in which to downsample in the CNN architecture
Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size."
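The parameter formula behind Strategies 1 and 2 is easy to check numerically. The sketch below counts the weights of a convolution layer (biases optional); the channel counts are arbitrary values picked for the example.

```python
def conv_params(in_channels, num_filters, kernel_size, bias=False):
    """Parameters of a conv layer: input channels * filters * kernel area."""
    per_filter = in_channels * kernel_size * kernel_size
    total = num_filters * per_filter
    if bias:
        total += num_filters  # one bias per filter
    return total


baseline = conv_params(64, 64, 3)          # 3x3 filters on 64 input channels
strategy1 = conv_params(64, 64, 1)         # Strategy 1: same layer with 1x1 filters
strategy2 = conv_params(16, 64, 3)         # Strategy 2: 3x3 filters on 16 channels
print(baseline, strategy1, strategy2)
```

Swapping 3x3 for 1x1 filters cuts the count by 9x, and feeding the 3x3 filters 16 channels instead of 64 cuts it by 4x.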
Strategy#1 Conv1x1 or Kernel Reduction
Microarchitecture – Fire Module
s1x1: # of 1x1 filters in the squeeze layer; e1x1 and e3x3: # of 1x1 and 3x3 filters in the expand layer
Squeeze Layer
● Set s1x1 < (e1x1 + e3x3), which limits the # of input channels to the 3x3 filters
● Strategy 2. Decrease the number of input channels to 3x3 filters
○ Total # of parameters: (# of input channels) * (# of filters) * (# of parameters per filter)
● How much can we limit s1x1?
● Strategy 1. Replace 3x3 filters with 1x1 filters
○ A 3x3 filter has 9x as many parameters as a 1x1 filter
● How much can we replace 3x3 with 1x1 (e1x1 vs. e3x3)?
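To see what the squeeze layer buys, the sketch below counts the parameters of one Fire module and compares it with a plain 3x3 convolution producing the same number of output channels. The sizes (96 input channels, s1x1 = 16, e1x1 = e3x3 = 64) follow the first Fire module of SqueezeNet as I recall it from the paper; treat them as an assumption rather than something stated in the slides.

```python
def fire_params(in_channels, s1x1, e1x1, e3x3):
    """Weights plus biases of a Fire module (squeeze, then two expand branches)."""
    squeeze = in_channels * s1x1 + s1x1        # 1x1 filters on the module input
    expand_1x1 = s1x1 * e1x1 + e1x1            # 1x1 filters on the squeeze output
    expand_3x3 = s1x1 * e3x3 * 9 + e3x3        # 3x3 filters on the squeeze output
    return squeeze + expand_1x1 + expand_3x3


def plain_conv3x3_params(in_channels, out_channels):
    """Weights plus biases of an ordinary 3x3 convolution layer."""
    return in_channels * out_channels * 9 + out_channels


fire = fire_params(96, 16, 64, 64)
plain = plain_conv3x3_params(96, 64 + 64)      # same 128 output channels
print(fire, plain, round(plain / fire, 1))
```

Because the 3x3 filters only see the 16 squeezed channels instead of all 96, the module needs roughly 9x fewer parameters than the plain layer in this configuration.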
Expand
● In the "expand" modules, what are the tradeoffs when we turn the knob between mostly 1x1 and mostly 3x3 filters?
● Hypothesis: if having more weights leads to higher accuracy, then having all 3x3 filters should give the highest accuracy
Macroarchitecture
Strategy 3. Downsample late in the network so that convolution layers have large activation maps
The size of the activation maps is controlled by the size of the input data and the choice of layers in which to downsample in the CNN architecture
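The effect of where the downsampling happens can be traced with the standard convolution output-size formula. The sketch below compares two hypothetical four-layer stacks with the same strides in a different order; the input size, kernel size, and padding are assumptions for illustration.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def activation_sizes(input_size, strides, kernel=3, padding=1):
    """Activation map size after each layer of a stack of conv layers."""
    sizes = []
    size = input_size
    for stride in strides:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


early = activation_sizes(224, [2, 2, 1, 1])   # downsample in the first layers
late = activation_sizes(224, [1, 1, 2, 2])    # downsample in the last layers
print(early, late)
```

Both stacks end at the same resolution, but the late-downsampling stack keeps 224x224 and 112x112 maps in layers where the other one is already down to 56x56. Larger maps cost more computation but do not add parameters.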
Performance
TinySSD (SSD with Microarchitecture)
Thanks for your attention.