Contextless Object Recognition with Shape-enriched SIFT and Bags of Features
Marcel Tella Amo 
Directed by Dr. Matthias Zeppelzauer (TU Wien) 
Co-directed by Dr. Xavier Giró-i-Nieto (UPC)
Motivation 
Object Recognition and Classification 
Categories 
• Ball 
• Airplane 
• Chair 
• Beaver 
• … 
[Figure: example images of a ball, an airplane and a chair, annotated with shape information and texture information]
Index 
Requirements 
State of the Art 
Design 
Results
Requirements 
Design shape features that can be used in an aggregated framework, such as Bag of Words, with no need for matching or alignment.
Take a successful method (SIFT) and add shape information to it.
Analyse the effect of the vocabulary size in relation to the size of the shape features.
[Figure: SIFT and shape features]
The proposed features should be at least scale-, rotation- and translation-invariant and, if possible, flip-invariant as well.
Segmentation is needed to encode the shape
Study the limitations of shape coding when using a state-of-the-art segmentation.
Manual annotations vs. automatic segmentation
State of the Art 
Object candidate algorithms
Multiscale Combinatorial Grouping (MCG)
[Figure: object candidates ranked by object plausibility, from high to low]
Arbelaez, P., Pont-Tuset, J., Barron, J. T., Marques, F., Malik, J. (2014). Multiscale Combinatorial Grouping. CVPR.
Shape Context 
G. Mori, S. Belongie, and J. Malik. Efficient shape matching using shape contexts. PAMI, 27(11), 2005.
Interest point descriptors: 
SIFT descriptor 
Simplified example
Typically 4x4 divisions × 8 bins per histogram = 128 features
Two sampling strategies: dense SIFT and sparse SIFT
David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 2004, 91-110.
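As a rough sketch of the arithmetic on this slide, the Python snippet below computes the descriptor length from the grid layout and lists where a dense sampler would place keypoints. The function names and the `step` parameter are hypothetical and only illustrate the idea; they are not taken from the slides or from any SIFT library.

```python
def descriptor_length(cells_per_side=4, orientation_bins=8):
    """Length of a SIFT-style descriptor: one orientation histogram per cell."""
    return cells_per_side * cells_per_side * orientation_bins


def dense_grid(width, height, step):
    """Keypoint centres on a regular grid (dense sampling), as (x, y) tuples.

    Sparse SIFT would instead keep only the locations chosen by an
    interest point detector.
    """
    offset = step // 2  # assumed: start half a step inside the image
    return [(x, y)
            for y in range(offset, height, step)
            for x in range(offset, width, step)]


n_features = descriptor_length()      # 4 * 4 * 8
grid = dense_grid(32, 16, step=8)     # toy 32x16 image
print(n_features, len(grid))
```

With dense sampling every image yields the same regular set of locations, which is convenient when a shape value has to be attached to each keypoint.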
Enrichment of SIFT 
Extra features: absolute spatial location (X, Y), or angle and distance
Rene Grzeszick, Leonard Rothacker, and Gernot A. Fink, "Bag-of-features representations using spatial visual vocabularies for object classification," in IEEE Intl. Conf. on Image Processing, Melbourne, Australia, 2013
Extra features: relative position + aspect ratio + scale ratio + color space
Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.
Enriched descriptor: 128-dimensional SIFT descriptor + extra features
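A minimal sketch of what "enrichment" means in practice: the extra features are simply appended to the 128 SIFT values. The `enrich` helper and its `weight` parameter are hypothetical; the slides do not say whether or how the extra features are weighted.

```python
def enrich(sift, extra, weight=1.0):
    """Append extra features to a SIFT descriptor.

    `weight` is a hypothetical knob to balance the extra features against
    the 128 SIFT values; the slides do not specify any weighting.
    """
    if len(sift) != 128:
        raise ValueError("expected a 128-dimensional SIFT descriptor")
    return list(sift) + [weight * v for v in extra]


sift = [0.0] * 128
# Absolute spatial location (x, y), normalised by image size (assumed).
enriched = enrich(sift, [0.25, 0.75])
print(len(enriched))  # 128 SIFT values + 2 extra features
```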
Bag of Words 
Bag of Words - Pipeline
Training: Get descriptors → Clustering (K-means) → Create histograms → Train model (SVM)
Testing: Image → Create histogram → Evaluate (SVM)
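The first three stages of the pipeline can be sketched in plain Python as follows. This is an illustration under simplifying assumptions: toy 2-D vectors stand in for real descriptors, the k-means is a basic Lloyd iteration, and the SVM stage is left out. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(words, d):
    """Index of the visual word closest to descriptor d."""
    return min(range(len(words)), key=lambda k: sq_dist(words[k], d))


def kmeans(descriptors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k cluster centres (the vocabulary)."""
    rng = random.Random(seed)
    words = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(words, d)].append(d)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if a cluster is empty
                words[j] = [sum(col) / len(g) for col in zip(*g)]
    return words


def bow_histogram(descriptors, words):
    """Normalised histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(words)
    for d in descriptors:
        hist[nearest(words, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy 2-D "descriptors" from two training images.
train_images = [
    [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1]],
    [[5.1, 5.0], [4.9, 5.0], [0.0, 0.0]],
]
all_desc = [d for img in train_images for d in img]
vocabulary = kmeans(all_desc, k=2)
histograms = [bow_histogram(img, vocabulary) for img in train_images]
print(histograms)
```

Each histogram would then be passed, together with its category label, to the classifier (an SVM in the slides); a test image goes through the same `bow_histogram` step with the vocabulary learned during training.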
Design 
Why dense SIFT? 
Main principle: Combination of dense SIFT and Object Candidates 
Distance to the nearest border (DNB) 
Logarithmic distance to the nearest border (LDNB) 
The logarithm reduces the influence of large distances.
Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.
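One possible way to compute DNB and LDNB for a keypoint, given a binary mask of the object candidate, is sketched below. The brute-force search, the 4-neighbourhood border definition and the `log(1 + d)` form of the logarithm are all assumptions made for illustration; the slides do not give the exact formulas.

```python
import math


def border_pixels(mask):
    """Region pixels that touch the outside (4-neighbourhood or image edge)."""
    h, w = len(mask), len(mask[0])
    border = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                    border.append((x, y))
                    break
    return border


def dnb(point, border):
    """Distance to the nearest border pixel (brute force)."""
    px, py = point
    return min(math.hypot(px - bx, py - by) for bx, by in border)


def ldnb(point, border):
    """Logarithmic DNB; log(1 + d) is an assumed form, not from the slides."""
    return math.log1p(dnb(point, border))


# 7x7 square region inside a 9x9 image.
mask = [[1 if 1 <= x <= 7 and 1 <= y <= 7 else 0 for x in range(9)]
        for y in range(9)]
border = border_pixels(mask)
print(dnb((4, 4), border), ldnb((4, 4), border))
```

Note that the raw distance depends on the size of the object in pixels, so a scale normalisation (for example by the region size) would be needed to meet the scale-invariance requirement; that step is omitted here.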
Distance and Angle to the nearest border (DANB) 
Problem: directions that are really similar in 2D can have very different values.
Solution: encode them as two separate features.
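The slides do not say which two features are used. A common choice for this kind of wrap-around problem, shown here purely as an assumption, is to replace the angle by its cosine and sine, so that 1° and 359° end up close together in feature space.

```python
import math


def encode_angle(theta):
    """Represent an angle (radians) as two features: (cos, sin).

    Assumed encoding; the slides only state that two features are used.
    """
    return (math.cos(theta), math.sin(theta))


a = math.radians(1.0)
b = math.radians(359.0)

raw_gap = abs(a - b)  # large, although the directions almost coincide
encoded_gap = math.dist(encode_angle(a), encode_angle(b))  # small
print(raw_gap, encoded_gap)
```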
Rotation Invariant Angle to the nearest border 
Distance to the center (DC) 
η - Angular Scan (ηAS) 
WINNER! 
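A minimal sketch of the angular scan idea, assuming it means casting η rays from the keypoint at evenly spaced angles and recording the distance at which each ray reaches the region border. The ray-marching loop, the `step` size and the function name are illustrative assumptions, not the author's implementation.

```python
import math


def angular_scan(mask, point, eta=8, step=0.5):
    """Distances from `point` to the region border along `eta` directions.

    Each ray advances in small steps and stops as soon as the next step
    would leave the region, so it never crosses the contour.
    """
    h, w = len(mask), len(mask[0])
    px, py = point
    distances = []
    for i in range(eta):
        angle = 2.0 * math.pi * i / eta
        dx, dy = math.cos(angle), math.sin(angle)
        t = 0.0
        while True:
            x = int(round(px + (t + step) * dx))
            y = int(round(py + (t + step) * dy))
            if not (0 <= x < w and 0 <= y < h) or not mask[y][x]:
                break
            t += step
        distances.append(t)
    return distances


# 7x7 square region inside a 9x9 image, keypoint at the centre.
mask = [[1 if 1 <= x <= 7 and 1 <= y <= 7 else 0 for x in range(9)]
        for y in range(9)]
scan = angular_scan(mask, (4, 4), eta=8)
print(scan)
```

This sketch starts the scan at a fixed image direction, so it is not rotation-invariant by itself; how the slides achieve rotation, scale and flip invariance for ηAS is not shown here.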
Shape Context from a dense SIFT (DSC) 
Note: like Shape Context, DSC crosses the contour of the region; ηAS does not.
Rotation Invariant Region Quantization (RIRQ) 
Main idea: Get spatial information. 
Easily extensible to a pyramid! 
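Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 2169-2178). IEEE.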
Achieving flip invariance (RIRQ) 
[Figure: regions numbered 1 to 4 for an object and for its flipped version; the label sequences "4 2" and "2 4" are each sorted (SORT) and both become "2 4"]
Where do we integrate our features? 
Two main Architectures 
• Enriched SIFT (eSIFT): SIFT + shape features → Visual vocabulary → Bag of eSIFT visual words
• BoW+Shape: SIFT → Visual vocabulary → Bag of Words + shape histogram
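The difference between the two architectures is where the shape information enters: before quantization (eSIFT) or after it (BoW+Shape). The sketch below illustrates this with toy 2-D vectors in place of SIFT descriptors and hand-made vocabularies; all names and values are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, vocabulary):
    """Index of the nearest visual word."""
    return min(range(len(vocabulary)),
               key=lambda k: sq_dist(vocabulary[k], descriptor))


def word_histogram(descriptors, vocabulary):
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[quantize(d, vocabulary)] += 1
    return hist


def esift_representation(sift, shape, esift_vocabulary):
    """eSIFT: concatenate per keypoint first, then quantize."""
    enriched = [s + f for s, f in zip(sift, shape)]
    return word_histogram(enriched, esift_vocabulary)


def bow_plus_shape_representation(sift, shape_histogram, sift_vocabulary):
    """BoW+Shape: quantize SIFT only, then append the shape histogram."""
    return word_histogram(sift, sift_vocabulary) + list(shape_histogram)


# Toy data: 2-D stand-ins for SIFT, one shape feature per keypoint.
sift = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.0]]
shape = [[0.1], [0.8], [0.9]]
sift_vocab = [[0.0, 0.0], [1.0, 1.0]]
esift_vocab = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

esift_vec = esift_representation(sift, shape, esift_vocab)
bow_shape_vec = bow_plus_shape_representation(sift, [1, 2], sift_vocab)
print(esift_vec, bow_shape_vec)
```

In eSIFT the vocabulary itself is learned on enriched descriptors, so the shape values influence which visual word a keypoint falls into; in BoW+Shape the visual words depend on SIFT alone and the shape information is kept in a separate block of the final vector.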
BoW+Shape: Creation of the shape histograms
[Diagram: SIFT → Visual vocabulary → Bag of Words; accumulation of features → shape histogram]
1. Accumulate the same feature for all points.
2. Create a histogram of X bins for that feature.
3. Concatenate the histograms to create the final one.
Example: 8-Angular Scan gives 8 distances (different angles) for each SIFT keypoint.
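The three steps for building the shape histogram can be sketched as follows. The sketch assumes the shape values have already been normalised to the range [0, 1] and uses equal-width bins; neither detail is stated in the slides, and the function names are hypothetical.

```python
def histogram(values, bins, lo, hi):
    """Count values into `bins` equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp to the valid range
    return counts


def shape_histogram(features, bins, lo=0.0, hi=1.0):
    """features[i][j] = value of shape feature j at keypoint i."""
    final = []
    for column in zip(*features):                 # step 1: same feature, all points
        final += histogram(column, bins, lo, hi)  # steps 2 and 3
    return final


# 3 keypoints, 2 shape features each (an 8-Angular Scan would have 8).
features = [[0.1, 0.9],
            [0.2, 0.6],
            [0.7, 0.4]]
shape_hist = shape_histogram(features, bins=2)
print(shape_hist)
```

For an 8-Angular Scan with X bins per feature, the resulting shape histogram has 8 × X entries, regardless of how many keypoints the image contains.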
Results and conclusions 
The dataset: Caltech-101 
• Well-recognized dataset
• 101 different categories of images
• Ground-truth annotations available
• From 40 to 800 images per category
Metrics: Accuracy (%) 
$$\text{Accuracy (\%)} = 100 \times \frac{\text{Correct classifications}}{\text{Correct} + \text{Incorrect classifications}}$$
Experiments setup 
• 30 images per category for training and 30-50 for testing.
• 101 categories + background category.
• Different vocabulary sizes on the X axis.
• Accuracy (%) on the Y axis.
• Experiments and analysis:
  • eSIFT
  • BoW+S
  • eSIFT vs BoW+S
  • Performance achieved
  • Comparison between adding features before or after quantization
  • Number of bins per histogram
  • Ground truth vs MCG object candidates
  • Context vs shape
Results enriched SIFT 
Results BoW+S 
Performance achieved 
Conclusion
With Angular Scan, accuracy increases from 16% to around 41%.
Comparison between adding features before and after quantization
Conclusion
With Angular Scan, if the number of shape features is high, both architectures tend to converge.
Number of bins per histogram 
Conclusion
With Angular Scan, 8 bins per histogram gives the best performance.
Ground truth vs MCG Object Candidates 
Conclusions
1. Larger vocabularies make the approach more robust to segmentation errors.
2. Shape-based methods are more sensitive to segmentation errors than texture-based ones.
Context gain vs Shape gain 
Conclusion
Encoding the shape of the object gives better performance than encoding the context of the image.
Future Work
Comparison between our work and Second Order Pooling
PhD thesis of Carles Ventura
Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Computer Vision-ECCV 2012 (pp. 430-443). Springer Berlin Heidelberg.
Future Work
Distance to the nearest border (DNB)
Conclusions 
1. Accuracy increases from 16% to around 41%.
2. With Angular Scan, if the number of shape features is high, both architectures tend to converge.
3. With Angular Scan, 8 bins per histogram gives the best performance.
4. Larger vocabularies make the approach more robust to segmentation errors.
5. Shape-based methods are more sensitive to segmentation errors than texture-based ones.
6. Encoding the shape gives better performance than encoding the context of the image.
Thank you!
Questions?
