Object Recognition with
     Deformable Models
            Pedro F. Felzenszwalb
        Department of Computer Science
            University of Chicago



Joint work with: Dan Huttenlocher, Joshua Schwartz,
         David McAllester, Deva Ramanan.
Example Problems
  Detecting rigid objects
  PASCAL challenge
  Detecting non-rigid objects
  Medical image analysis
  Segmenting cells
Deformable Models
•   Significant challenge:
    - Handling variation in appearance within object classes
    - Non-rigid objects, generic categories, etc.
•   Deformable models approach:
    - Consider each object as a deformed version of a template
    - Compact representation
    - Leads to interesting modeling and algorithmic problems
Overview
•   Part I: Pictorial Structures
    - Deformable part models
    - Highly efficient matching algorithms
•   Part II: Deformable Shapes
    - Triangulated polygons
    - Hierarchical models
•   Part III: The PASCAL Challenge
    - Recognizing 20 object categories in realistic scenes
    - Discriminatively trained, multiscale, deformable part models
Part I: Pictorial Structures

•   Introduced by Fischler and Elschlager in 1973

•   Part-based models:
    - Each part represents local visual properties
    - “Springs” capture spatial relationships
•   Matching model to image involves joint optimization of part locations
    (“stretch and fit”)
Local Evidence + Global Decision

•   Parts have a match quality at each image location

•   Local evidence is noisy
    - Parts are detected in the context of the whole model
    (Figure panels: part, test image, match quality)
Matching Problem

•   Model is represented by a graph $G = (V, E)$
    - $V = \{v_1, \ldots, v_n\}$ are the parts
    - $(v_i, v_j) \in E$ indicates a connection between parts

•   $m_i(l_i)$ is a cost for placing part $i$ at location $l_i$

•   $d_{ij}(l_i, l_j)$ is a deformation cost

•   Optimal configuration for the object is $L = (l_1, \ldots, l_n)$ minimizing

    $$E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$$
Matching Problem
    $$E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$$

•   Assume n parts, k possible locations for each part
    - There are $k^n$ configurations L

•   If graph is a tree we can use dynamic programming
    - $O(nk^2)$ algorithm

•   If $d_{ij}(l_i, l_j) = g(l_i - l_j)$ we can use min-convolutions
    - $O(nk)$ algorithm
    - As fast as matching each part separately!
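
A minimal sketch of the exhaustive baseline, assuming locations are indexed 0..k-1, match costs are given as a table `m[i][l]`, and all edges share one deformation function `d`. The names `energy` and `brute_force_match` and the toy costs are illustrative and do not come from the slides. It enumerates all k^n configurations, so it is only usable for tiny models:

```python
from itertools import product


def energy(L, m, d, edges):
    """E(L) = sum_i m[i][L[i]] + sum over edges (i, j) of d(L[i], L[j])."""
    unary = sum(m[i][li] for i, li in enumerate(L))
    pairwise = sum(d(L[i], L[j]) for i, j in edges)
    return unary + pairwise


def brute_force_match(m, d, edges):
    """Try every one of the k**n configurations and keep the cheapest."""
    n, k = len(m), len(m[0])
    best = min(product(range(k), repeat=n),
               key=lambda L: energy(L, m, d, edges))
    return energy(best, m, d, edges), best


# Toy model (hypothetical numbers): 3 parts, 3 locations, part 0 linked to 1 and 2.
m = [[4, 0, 3],
     [1, 5, 0],
     [2, 2, 0]]
edges = [(0, 1), (0, 2)]
d = lambda a, b: (a - b) ** 2  # assumed quadratic "spring" cost

best_energy, best_L = brute_force_match(m, d, edges)
print(best_energy, best_L)
```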
Dynamic Programming on Trees
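    $$E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$$

    (Figure: two-part model in which leaf v2 is connected to v1)

•   For each $l_1$ find best $l_2$:

    - $\mathrm{Best}_2(l_1) = \min_{l_2} \left[ m_2(l_2) + d_{12}(l_1, l_2) \right]$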
•   “Delete” v2 and solve problem with smaller model

•   Keep removing leaves until there is a single part left
Min-Convolution Speedup
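    $$\mathrm{Best}_2(l_1) = \min_{l_2} \left[ m_2(l_2) + d_{12}(l_1, l_2) \right]$$

    (Figure: two-part model in which v2 is connected to v1)

•   Brute force: $O(k^2)$, where k is the number of locations

•   Suppose $d_{12}(l_1, l_2) = g(l_1 - l_2)$:

    - $\mathrm{Best}_2(l_1) = \min_{l_2} \left[ m_2(l_2) + g(l_1 - l_2) \right]$

•   Min-convolution: $O(k)$ if g is convex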
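
As an illustration of the linear-time idea, here is a sketch for one specific convex choice, g(x) = c·|x| on a 1D grid of locations. The slides do not say which g is used, so this choice and the function names are assumptions; quadratic costs need a different (lower-envelope) algorithm. Two sweeps over the array replace the O(k^2) double loop:

```python
def min_conv_brute(m2, g):
    """Reference O(k^2) version: Best2(l1) = min over l2 of m2[l2] + g(l1 - l2)."""
    k = len(m2)
    return [min(m2[l2] + g(l1 - l2) for l2 in range(k)) for l1 in range(k)]


def min_conv_l1(m2, c):
    """O(k) min-convolution for g(x) = c * |x| using a forward and a backward pass."""
    best = list(m2)
    for l in range(1, len(best)):             # propagate cheap values rightwards
        best[l] = min(best[l], best[l - 1] + c)
    for l in range(len(best) - 2, -1, -1):    # propagate cheap values leftwards
        best[l] = min(best[l], best[l + 1] + c)
    return best


m2 = [5, 0, 7, 7, 1]          # hypothetical match costs for part 2
fast = min_conv_l1(m2, c=2)
slow = min_conv_brute(m2, lambda x: 2 * abs(x))
print(fast, slow)
```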
Finding Motorbikes

Model with 6 parts:
    - 2 wheels
    - 2 headlights
    - front & back of seat
Human Pose Estimation
Human Tracking




Ramanan, Forsyth, Zisserman, "Tracking People by Learning their Appearance",
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Jan 2007
Part II: Deformable Shapes
•   Shape is a fundamental cue for recognizing objects

•   Many objects have no well-defined parts
    - We can capture their outlines using deformable models
Triangulated Polygons




•   Polygonal templates

•   Delaunay triangulation gives natural decomposition of an object

•   Consider deforming each triangle “independently”


    (Figure: rabbit ear can be bent by changing shape of a single triangle)
Structure of Triangulated Polygons


There are 2 graphs associated with a triangulated polygon

If the polygon is simple (no holes):
    - Dual graph is a tree
    - Graphical structure of triangulation is a 2-tree
Deformable Matching
•   Consider piecewise affine maps from model to image (taking triangles to triangles)

•   Find globally optimal deformation using dynamic programming over 2-tree

(Figure: model; matching to MRI data)
Hierarchical Shape Model
•   Shape-tree of curve from a to b:
    -   Select midpoint c, store relative location c | a,b.
    -   Left child is a shape-tree of sub-curve from a to c.
    -   Right child is a shape-tree of sub-curve from c to b.

    Example: curve through points a, f, e, g, c, h, d, i, b (in order), with shape-tree

    c | a,b
        e | a,c
            f | a,e
            g | e,c
        d | c,b
            h | c,d
            i | d,b
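
A minimal sketch of building such a tree, assuming the curve is an open polyline stored as complex numbers and the relative location c | a,b is encoded as (c - a) / (b - a). That encoding, the choice of the middle sample as midpoint, and the names `ShapeTree`, `build` and `reconstruct` are assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShapeTree:
    rel: complex                      # location of midpoint c relative to a and b
    left: Optional["ShapeTree"]       # shape-tree of sub-curve from a to c
    right: Optional["ShapeTree"]      # shape-tree of sub-curve from c to b


def build(points, lo=0, hi=None):
    """Shape-tree of the sub-curve points[lo..hi]; None if it has no interior point."""
    if hi is None:
        hi = len(points) - 1
    if hi - lo < 2:
        return None
    mid = (lo + hi) // 2
    a, b, c = points[lo], points[hi], points[mid]
    return ShapeTree((c - a) / (b - a), build(points, lo, mid), build(points, mid, hi))


def reconstruct(tree, a, b):
    """Interior points of the curve from a to b encoded by the tree."""
    if tree is None:
        return []
    c = a + tree.rel * (b - a)
    return reconstruct(tree.left, a, c) + [c] + reconstruct(tree.right, c, b)


# Nine hypothetical samples playing the roles of a, f, e, g, c, h, d, i, b.
pts = [0j, 1 + 1j, 2 + 0.5j, 3 + 2j, 4 + 1j, 5 + 3j, 6 + 1j, 7 + 2j, 8 + 0j]
tree = build(pts)
rebuilt = reconstruct(tree, pts[0], pts[-1])
print(tree.rel, len(rebuilt))
```

Placing the two endpoints and walking the tree recovers every interior sample, so the tree plus the endpoints is a complete description of the curve.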
Deformations

•   Independently perturb relative locations stored in a shape-tree
    -   Local and global properties are preserved
    -   Reconstructed curve is perceptually similar to original
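
One way to see why such perturbations stay well behaved, assuming the relative location of c given a and b is encoded as the complex ratio (c - a) / (b - a) (an assumption, the slides do not fix an encoding): the ratio is unchanged by translating, rotating or scaling the curve, and a perturbation of it moves c by an amount proportional to the length of the sub-curve it belongs to. The helper names below are illustrative:

```python
def relative_location(a, b, c):
    """Encode c relative to the segment from a to b."""
    return (c - a) / (b - a)


def place(a, b, rel):
    """Decode a relative location back into a point."""
    return a + rel * (b - a)


def similarity(z):
    """Rotate by 90 degrees, scale by 3, then translate (an arbitrary example)."""
    return 3j * z + (5 - 2j)


a, b, c = 0j, 4 + 0j, 2 + 1j
rel = relative_location(a, b, c)
rel_after = relative_location(similarity(a), similarity(b), similarity(c))

delta = 0.1j                                  # small perturbation of the stored value
shift = abs(place(a, b, rel + delta) - c)     # equals |delta| * |b - a|
print(rel, rel_after, shift)
```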
Matching
(Figure: model curve from a to b with its shape-tree, in which node w = e | a,c has
children v = f | a,e and u = g | e,c; image curve with points p, q, r)

Match(v, [p,q]) = w1
Match(u, [q,r]) = w2
Match(w, [p,r]) = w1 + w2 + dif((e|a,c), (q|p,r))

         similar to parsing with the CKY algorithm
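
The recurrence can be sketched as a memoized recursion over (tree node, curve interval), in the spirit of CKY. This is a simplified illustration under several assumptions that are not in the slides: tree nodes are tuples `(rel, left, right)` or `None`, relative locations are complex ratios, `dif` is the squared distance between them, and sub-curves left over below a `None` node cost nothing:

```python
from functools import lru_cache


def match_shape_tree(tree, curve):
    """Cost of the best match of the whole tree to the whole curve."""

    def rel(p, q, r):
        # Relative location q | p,r measured on the image curve.
        return (curve[q] - curve[p]) / (curve[r] - curve[p])

    @lru_cache(maxsize=None)
    def match(node, p, r):
        if node is None:
            return 0.0
        model_rel, left, right = node
        best = float("inf")
        for q in range(p + 1, r):             # try every split point, as in CKY
            cost = (match(left, p, q) + match(right, q, r)
                    + abs(model_rel - rel(p, q, r)) ** 2)
            best = min(best, cost)
        return best

    return match(tree, 0, len(curve) - 1)


model = (0.5 + 0.5j, None, None)              # one midpoint, hypothetical values
exact = match_shape_tree(model, [0j, 1 + 1j, 2 + 0j])
flat = match_shape_tree(model, [0j, 1 + 0j, 2 + 0j])
longer = match_shape_tree(model, [0j, 0.5 + 0j, 1 + 1j, 2 + 0j])
print(exact, flat, longer)
```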
Recognizing Leaves




Nearest neighbor classification

   Shape-tree        96.28
   Inner distance    94.13
   Shape context     88.12

15 species, 75 examples per species (25 training, 50 test)
Part III: PASCAL Challenge
•   ~10,000 images, with ~25,000 target objects
    - Objects from 20 categories (person, car, bicycle, cow, table...)
    - Objects are annotated with labeled bounding boxes
Model Overview




(Figure panels: detection, root filter, part filters, deformation models)

Model has a root filter plus deformable parts
Histogram of Oriented Gradients (HOG) Features




•   Image is partitioned into 8x8 pixel blocks

•   In each block we compute a histogram of gradient orientations
    - Invariant to changes in lighting, small deformations, etc.
•   We compute features at different resolutions (pyramid)
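
A deliberately simplified sketch of the per-block histogram, assuming a grayscale image given as a list of rows, 9 unsigned orientation bins, and gradient magnitude as the vote. The bin count, the border handling and the function name are assumptions; block normalization, interpolation between bins and the pyramid are left out:

```python
import math


def hog_cells(image, cell=8, bins=9):
    """Histogram of gradient orientations for each cell x cell pixel block."""
    h, w = len(image), len(image[0])
    rows, cols = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    for y in range(rows * cell):
        for x in range(cols * cell):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % math.pi          # unsigned orientation
            b = min(int(ang / math.pi * bins), bins - 1)
            hist[y // cell][x // cell][b] += mag        # magnitude-weighted vote
    return hist


# Horizontal intensity ramp: every gradient points along x, so only bin 0 fills.
image = [[x for x in range(24)] for _ in range(16)]
hist = hog_cells(image)
print(len(hist), len(hist[0]), hist[0][1][0])
```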
Filters

•   Filters are rectangular templates defining weights for features

•   Score is dot product of filter and subwindow of HOG pyramid


    Score of H at this location is H ⋅ W

    (Figure: H and W marked on the HOG pyramid)
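
The dot-product score can be sketched for a single pyramid level, assuming the feature map is a grid of cells that each hold a short feature vector and the filter is a smaller grid of weight vectors of the same depth. The nested-list layout and the names `filter_score` and `score_map` are assumptions for illustration:

```python
def filter_score(filt, features, top, left):
    """Dot product of the filter with the same-sized subwindow at (top, left)."""
    return sum(
        weight * features[top + dy][left + dx][b]
        for dy, row in enumerate(filt)
        for dx, cell in enumerate(row)
        for b, weight in enumerate(cell)
    )


def score_map(filt, features):
    """Filter score at every placement that fits inside the feature map."""
    fh, fw = len(filt), len(filt[0])
    h, w = len(features), len(features[0])
    return [[filter_score(filt, features, y, x) for x in range(w - fw + 1)]
            for y in range(h - fh + 1)]


# Hypothetical 2x3 feature map with 2 features per cell, and a 1x2 filter.
features = [[[1, 0], [0, 1], [2, 0]],
            [[0, 0], [1, 1], [0, 3]]]
filt = [[[1, 0], [0, 1]]]
scores = score_map(filt, features)
print(scores)
```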
Object Hypothesis




•   Score is sum of filter scores plus deformation scores

•   Multiscale model captures features at two resolutions

(Figure: image pyramid and HOG feature pyramid)
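
A rough sketch of scoring one hypothesis, assuming the root filter score and the part response maps are already computed, each part has an anchor position, and the deformation score is a negative quadratic penalty on displacement from that anchor. The quadratic form, the search radius, and all names are assumptions; the sketch also ignores that the model uses two resolutions:

```python
def hypothesis_score(root_score, part_maps, anchors, defs, radius=2):
    """Root score plus, for each part, its best response minus deformation cost."""
    total = root_score
    placements = []
    for resp, (ay, ax), (cy, cx) in zip(part_maps, anchors, defs):
        best = None
        for y in range(max(0, ay - radius), min(len(resp), ay + radius + 1)):
            for x in range(max(0, ax - radius), min(len(resp[0]), ax + radius + 1)):
                s = resp[y][x] - cy * (y - ay) ** 2 - cx * (x - ax) ** 2
                if best is None or s > best[0]:
                    best = (s, (y, x))
        total += best[0]
        placements.append(best[1])
    return total, placements


# One part whose strongest response sits one cell to the right of its anchor.
resp = [[0, 0, 0],
        [0, 1, 5],
        [0, 0, 0]]
loose_total, loose_place = hypothesis_score(2.0, [resp], [(1, 1)], [(1.0, 1.0)])
stiff_total, stiff_place = hypothesis_score(2.0, [resp], [(1, 1)], [(10.0, 10.0)])
print(loose_total, loose_place, stiff_total, stiff_place)
```

With a small deformation cost the part moves to the stronger response; with a large one it stays at its anchor.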
Training
•   Training data consists of images with labeled bounding boxes

•   Need to learn the model structure, filters and deformation costs




Connection With Linear Classifiers
 •   Score of model is sum of filter scores plus deformation scores
     - Bounding box in training data specifies that score should be
       high for some placement in a range


                   w is a model (concatenation of filters and deformation parameters)
                   x is a detection window
                   z are filter placements
                   The model is scored against the concatenation of features and
                   part displacements
Latent SVMs


•   Linear in w if z is fixed

•   Objective has two terms: regularization and hinge loss
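
The formulas on this slide did not survive extraction. For orientation, the latent SVM formulation published by the authors has the following general shape. The symbols Φ, λ, N and y_i are not in the transcript and are assumptions here, as is the exact placement of constants:

```latex
% Score of a detection window x under model w, maximizing over placements z
f_w(x) = \max_{z} \; w \cdot \Phi(x, z)

% Training objective over labeled examples (x_i, y_i), with y_i \in \{-1, +1\}:
% a regularization term plus a hinge loss on the latent score
\min_{w} \; \lambda \lVert w \rVert^2
  + \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i \, f_w(x_i)\bigr)
```

Once z is fixed, w · Φ(x, z) is an ordinary linear score, which is the sense in which the problem is linear in w.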
Learned Models
(Figure: learned models for Bicycle, Sofa, Car, Bottle)
Example Results
More Results
Overall Results

•   9 systems competed in the 2007 challenge

•   Out of 20 classes we get:
    - First place in 10 classes
    - Second place in 6 classes
•   Some statistics:
    - It takes ~2 seconds to evaluate a model in one image
    - It takes ~3 hours to train a model
    - MUCH faster than most systems
Component Analysis

(Figure: precision vs. recall on PASCAL2006 Person; legend values in parentheses)
    - Root (0.18)
    - Root+Latent (0.24)
    - Parts+Latent (0.29)
    - Root+Parts+Latent (0.34)
Summary

•   Deformable models provide an elegant framework for object
    detection and recognition

    - Efficient algorithms for matching models to images
    - Applications: pose estimation, medical image analysis,
      object recognition, etc.

•   We can learn models from partially labeled data

    - Generalized standard ideas from machine learning
    - Leads to state-of-the-art results in PASCAL challenge
•   Future work: hierarchical models, grammars, 3D objects

Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSE
 

Object Recognition with Deformable Models

  • 1. Object Recognition with Deformable Models Pedro F. Felzenszwalb Department of Computer Science University of Chicago Joint work with: Dan Huttenlocher, Joshua Schwartz, David McAllester, Deva Ramanan.
  • 2. Example Problems: Detecting rigid objects; PASCAL challenge; Detecting non-rigid objects; Medical image analysis; Segmenting cells
  • 3. Deformable Models • Significant challenge: - Handling variation in appearance within object classes - Non-rigid objects, generic categories, etc. • Deformable models approach: - Consider each object as a deformed version of a template - Compact representation - Leads to interesting modeling and algorithmic problems
  • 4. Overview • Part I: Pictorial Structures - Deformable part models - Highly efficient matching algorithms • Part II: Deformable Shapes - Triangulated polygons - Hierarchical models • Part III: The PASCAL Challenge - Recognizing 20 object categories in realistic scenes - Discriminatively trained, multiscale, deformable part models
  • 5. Part I: Pictorial Structures • Introduced by Fischler and Elschlager in 1973 • Part-based models: - Each part represents local visual properties - “Springs” capture spatial relationships • Matching model to image involves joint optimization of part locations (“stretch and fit”)
  • 6. Local Evidence + Global Decision • Parts have a match quality at each image location • Local evidence is noisy - Parts are detected in the context of the whole model (Figure panels: part, test image, match quality)
  • 7. Matching Problem • Model is represented by a graph $G = (V, E)$ - $V = \{v_1, \dots, v_n\}$ are the parts - $(v_i, v_j) \in E$ indicates a connection between parts • $m_i(l_i)$ is a cost for placing part $i$ at location $l_i$ • $d_{ij}(l_i, l_j)$ is a deformation cost • Optimal configuration for the object is $L = (l_1, \dots, l_n)$ minimizing $E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$
  • 8. Matching Problem $E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$ • Assume $n$ parts, $k$ possible locations for each part - There are $k^n$ configurations $L$ • If graph is a tree we can use dynamic programming - $O(nk^2)$ algorithm • If $d_{ij}(l_i, l_j) = g(l_i - l_j)$ we can use min-convolutions - $O(nk)$ algorithm - As fast as matching each part separately!
  • 9. Dynamic Programming on Trees $E(L) = \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$ (Figure: parts $v_1$ and $v_2$) • For each $l_1$ find best $l_2$: - $\mathrm{Best}_2(l_1) = \min_{l_2} [m_2(l_2) + d_{12}(l_1, l_2)]$ • “Delete” $v_2$ and solve problem with smaller model • Keep removing leaves until there is a single part left
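The leaf-removal procedure on slide 9 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `match_tree`, the `parents` array encoding of the tree, and the requirement that part 0 is the root and every child has a larger index than its parent are all assumptions made here for brevity. Locations are plain indices `0..k-1`.

```python
import math

def match_tree(parents, m, d):
    """Minimize E(L) = sum_i m[i][l_i] + sum over edges d(i, j, l_i, l_j).

    parents[j] is the parent of part j (None for the root, part 0).
    Assumption: every child has a larger index than its parent.
    m[i][l] is the match cost of part i at location index l.
    d(i, j, li, lj) is the deformation cost between parent i and child j.
    Runs in O(n k^2) for n parts and k locations.
    """
    n, k = len(parents), len(m[0])
    # cost[i][l]: best cost of the subtree rooted at i, with i placed at l.
    cost = [list(row) for row in m]
    best_loc = [[0] * k for _ in range(n)]
    # "Delete" parts from the leaves upward (highest index first).
    for j in range(n - 1, 0, -1):
        i = parents[j]
        for li in range(k):
            best, arg = math.inf, 0
            for lj in range(k):
                c = cost[j][lj] + d(i, j, li, lj)
                if c < best:
                    best, arg = c, lj
            cost[i][li] += best          # fold Best_j(l_i) into the parent
            best_loc[j][li] = arg
    # Only the root is left: pick its best location, then backtrack.
    root = min(range(k), key=lambda l: cost[0][l])
    L = [root] + [0] * (n - 1)
    for j in range(1, n):
        L[j] = best_loc[j][L[parents[j]]]
    return cost[0][root], L

# Hypothetical 4-part model with 4 locations and a quadratic spring cost.
parents = [None, 0, 0, 1]
m = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8], [9, 7, 9, 3]]
energy, L = match_tree(parents, m, lambda i, j, li, lj: (li - lj) ** 2)
```

The inner double loop over `li` and `lj` is the brute-force computation of Best for one edge, which is where the $k^2$ factor comes from.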
  • 10. Min-Convolution Speedup (Figure: parts $v_1$ and $v_2$) $\mathrm{Best}_2(l_1) = \min_{l_2} [m_2(l_2) + d_{12}(l_1, l_2)]$ • Brute force: $O(k^2)$, where $k$ is the number of locations • Suppose $d_{12}(l_1, l_2) = g(l_1 - l_2)$: - $\mathrm{Best}_2(l_1) = \min_{l_2} [m_2(l_2) + g(l_1 - l_2)]$ • Min-convolution: $O(k)$ if $g$ is convex
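As a concrete instance of the speedup on slide 10, the sketch below handles one particular convex cost, $g(x) = c\,|x|$ on a 1D grid of locations, using a forward and a backward pass. The choice of this cost and the function names are assumptions made for illustration; the slide does not say which convex $g$ is used, and other convex costs (for example quadratic) need a different linear-time method. A brute-force version is included for comparison.

```python
def min_convolution_abs(m, c):
    """Return D with D[p] = min_q (m[q] + c * |p - q|), for c >= 0, in O(k)."""
    D = list(m)
    for p in range(1, len(D)):             # forward pass: best q <= p
        D[p] = min(D[p], D[p - 1] + c)
    for p in range(len(D) - 2, -1, -1):    # backward pass: best q >= p
        D[p] = min(D[p], D[p + 1] + c)
    return D

def min_convolution_brute(m, g):
    """Reference O(k^2) min-convolution for an arbitrary cost g."""
    k = len(m)
    return [min(m[q] + g(p - q) for q in range(k)) for p in range(k)]

# Hypothetical match costs for one part over 6 locations.
costs = [5, 2, 7, 9, 1, 8]
fast = min_convolution_abs(costs, 2)
slow = min_convolution_brute(costs, lambda x: 2 * abs(x))
```

Here `fast` and `slow` agree, but the two-pass version touches each location only twice.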
  • 11. Finding Motorbikes Model with 6 parts: 2 wheels, 2 headlights, front & back of seat
  • 13. Human Tracking Ramanan, Forsyth, Zisserman, “Tracking People by Learning their Appearance”, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Jan 2007
  • 14. Part II: Deformable Shapes • Shape is a fundamental cue for recognizing objects • Many objects have no well defined parts - We can capture their outlines using deformable models
  • 15. Triangulated Polygons • Polygonal templates • Delaunay triangulation gives natural decomposition of an object • Consider deforming each triangle “independently” • Rabbit ear can be bent by changing shape of a single triangle
  • 16. Structure of Triangulated Polygons • There are 2 graphs associated with a triangulated polygon • If the polygon is simple (no holes): - Dual graph is a tree - Graphical structure of triangulation is a 2-tree
  • 17. Deformable Matching • Consider piecewise affine maps from model to image (taking triangles to triangles) • Find globally optimal deformation using dynamic programming over 2-tree (Figure panels: Model; Matching to MRI data)
  • 18. Hierarchical Shape Model • Shape-tree of curve from a to b: - Select midpoint c, store relative location c | a,b. - Left child is a shape-tree of sub-curve from a to c. - Right child is a shape-tree of sub-curve from c to b. (Figure: curve with points a, f, e, g, c, h, d, i, b and its shape-tree: root c | a,b; children e | a,c and d | c,b; leaves f | a,e, g | e,c, h | c,d, i | d,b)
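The recursive definition on slide 18 can be sketched as follows. The slides do not say how the relative location c | a,b is encoded, so this sketch assumes one simple choice: points are complex numbers and c | a,b is stored as the ratio (c - a) / (b - a), which is unchanged by translation, rotation, and scaling of the curve. The function names and the tuple layout of a tree node are likewise hypothetical, and the endpoints of every sub-curve are assumed to be distinct.

```python
def build_shape_tree(points, lo, hi):
    """Shape-tree of the sub-curve points[lo..hi] (points are complex numbers).

    Returns None when the sub-curve has no interior point, otherwise a
    tuple (rel, left, right) where rel is the assumed encoding of "c | a,b".
    """
    if hi - lo < 2:
        return None
    mid = (lo + hi) // 2
    a, b, c = points[lo], points[hi], points[mid]
    rel = (c - a) / (b - a)               # assumes a != b
    return (rel,
            build_shape_tree(points, lo, mid),    # sub-curve from a to c
            build_shape_tree(points, mid, hi))    # sub-curve from c to b

def reconstruct(tree, a, b):
    """Rebuild the interior points of a curve from a to b from its shape-tree."""
    if tree is None:
        return []
    rel, left, right = tree
    c = a + rel * (b - a)
    return reconstruct(left, a, c) + [c] + reconstruct(right, c, b)

# Hypothetical open curve with five sample points.
curve = [0 + 0j, 1 + 1j, 2 + 0.5j, 3 + 2j, 4 + 0j]
tree = build_shape_tree(curve, 0, len(curve) - 1)
interior = reconstruct(tree, curve[0], curve[-1])
```

Perturbing the stored `rel` values before calling `reconstruct` would give a deformed curve in the sense of the next slide; placing the endpoints elsewhere reproduces the same shape up to a similarity transform.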
  • 19. Deformations • Independently perturb relative locations stored in a shape-tree - Local and global properties are preserved - Reconstructed curve is perceptually similar to original
  • 20. Matching (Figure: model curve with the shape-tree of slide 18, with labels u, v, w and p, q, r) • Match(v, [p,q]) = w1 • Match(u, [q,r]) = w2 • Match(w, [p,r]) = w1 + w2 + dif((e|a,c), (q|p,r)) • Similar to parsing with the CKY algorithm
  • 21. Recognizing Leaves • Nearest neighbor classification • 15 species, 75 examples per species (25 training, 50 test) • Results: Shape-tree 96.28; Inner distance 94.13; Shape context 88.12
  • 22. Part III: PASCAL Challenge • ~10,000 images, with ~25,000 target objects - Objects from 20 categories (person, car, bicycle, cow, table...) - Objects are annotated with labeled bounding boxes
  • 23.
  • 24. Model Overview • Model has a root filter plus deformable parts (Figure panels: detection, root filter, part filters, deformation models)
  • 25. Histogram of Gradient (HOG) Features • Image is partitioned into 8x8 pixel blocks • In each block we compute a histogram of gradient orientations - Invariant to changes in lighting, small deformations, etc. • We compute features at different resolutions (pyramid)
  • 26. Filters • Filters are rectangular templates defining weights for features • Score is dot product of filter and subwindow of HOG pyramid • Score of $H$ at this location is $H \cdot W$ (Figure: $H$ and $W$ in the HOG pyramid)
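The filter score on slide 26 is an ordinary dot product, which the following sketch spells out in plain Python. The nested-list layout (rows, then columns, then a feature vector per cell), the function names, and the toy 2-dimensional features are assumptions for illustration; real HOG cells have many more dimensions and a full pyramid has several such levels.

```python
def filter_score(features, filt, top, left):
    """Dot product of a filter with the subwindow of one pyramid level
    whose top-left cell is (top, left).

    features[r][c] and filt[r][c] are equal-length lists of numbers.
    """
    total = 0.0
    for r, filt_row in enumerate(filt):
        for c, weights in enumerate(filt_row):
            cell = features[top + r][left + c]
            total += sum(w * f for w, f in zip(weights, cell))
    return total

def score_map(features, filt):
    """Score of the filter at every placement that fits inside the level."""
    h, w = len(filt), len(filt[0])
    rows, cols = len(features), len(features[0])
    return [[filter_score(features, filt, y, x) for x in range(cols - w + 1)]
            for y in range(rows - h + 1)]

# Hypothetical 2x3 level with 2-dimensional features, and a 1x2 filter.
level = [[[1, 0], [0, 1], [1, 1]],
         [[2, 0], [0, 2], [1, 0]]]
filt = [[[1, 1], [0, 2]]]
scores = score_map(level, filt)
```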
  • 27. Object Hypothesis • Score is sum of filter scores plus deformation scores • Multiscale model captures features at two resolutions (Figure panels: Image pyramid; HOG feature pyramid)
  • 28. Training • Training data consists of images with labeled bounding boxes • Need to learn the model structure, filters and deformation costs (Figure label: Training)
  • 29. Connection With Linear Classifiers • Score of model is sum of filter scores plus deformation scores - Bounding box in training data specifies that score should be high for some placement in a range • w is a model: concatenation of filters and deformation parameters • x is a detection window • z are filter placements • The features for a placement are a concatenation of features and part displacements
  • 30. Latent SVMs (Equations not preserved in this transcript; annotations: linear in w if z is fixed; regularization; hinge loss)
  • 31. Learned Models: Bicycle, Sofa, Car, Bottle
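The equations of slide 30 are missing from the transcript, so the sketch below is an assumption rather than a transcription: it takes the score of a window to be the maximum over placements z of w · φ(x, z), and the objective to be a squared-norm regularizer plus C times the sum of hinge losses, which matches the three annotations that did survive. The function names and the representation of each example as a list of candidate feature vectors (one per placement z) are hypothetical.

```python
def latent_score(w, candidates):
    """max over placements z of w . phi(x, z).

    candidates holds one feature vector phi(x, z) per placement z.
    For a fixed z the score is linear in w.
    """
    return max(sum(wi * pi for wi, pi in zip(w, phi)) for phi in candidates)

def latent_svm_objective(w, examples, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * latent_score(w, x_i)).

    examples is a list of (candidates, y) pairs with label y in {+1, -1}.
    """
    regularization = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - y * latent_score(w, cands))
                for cands, y in examples)
    return regularization + C * hinge

# Hypothetical data: one positive and one negative window, two placements each.
w = [1.0, 0.0]
examples = [([[2.0, 0.0], [0.0, 5.0]], +1),
            ([[0.5, 0.0], [-1.0, 3.0]], -1)]
value = latent_svm_objective(w, examples, C=2.0)
```

With the placements of the positive examples held fixed, each term is a maximum of functions linear in w, so the objective becomes convex in w; that is the sense in which the slide's remark about fixed z matters for training.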
  • 34. Overall Results • 9 systems competed in the 2007 challenge • Out of 20 classes we get: - First place in 10 classes - Second place in 6 classes • Some statistics: - It takes ~2 seconds to evaluate a model in one image - It takes ~3 hours to train a model - MUCH faster than most systems
  • 35. Component Analysis PASCAL2006 Person (Figure: precision vs. recall plot; legend: Root (0.18), Root+Latent (0.24), Parts+Latent (0.29), Root+Parts+Latent (0.34))
  • 36. Summary • Deformable models provide an elegant framework for object detection and recognition - Efficient algorithms for matching models to images - Applications: pose estimation, medical image analysis, object recognition, etc. • We can learn models from partially labeled data - Generalized standard ideas from machine learning - Leads to state-of-the-art results in PASCAL challenge • Future work: hierarchical models, grammars, 3D objects