Compositional Hierarchy for 3D
Object Recognition
Maria Isabel Restrepo

October 26, 2009
Goal:




Goal

[Figure: two renderings of the scene model, labeled "Geometry" and "Expected Appearance". Renderings obtained by Dan Crispell.]


Goal: Recognition in a 3D World




Compelling Characteristics
POWERFUL GEOMETRIC AND PHOTOMETRIC REPRESENTATION* OF SCENES


✤ It is a 3D, geometric representation that supports discovery of spatial relations

✤ Its appearance is modeled by a mixture of Gaussians (MOG) to handle illumination variations

✤ Appearance and geometry are automatically learned from multiple images with
   calibrated cameras


✤ It is faithful to the scenes: There are no prior assumptions about the model

          THESE CHARACTERISTICS ARE IDEAL FOR OBJECT RECOGNITION

* [Pollard and Mundy, CVPR 2007] [Crispell]


Outline

✤   Volumetric appearance model - The Voxel World

✤   Insights on classical recognition methods

✤   Compositional hierarchies
    ✤   Bienenstock, Geman, Potter, 97; Geman, Chi, 2002; Geman, Jin, CVPR 2006
    ✤   Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008
    ✤   Mundy & Ozcanli, SPIE ’09


✤   Experimental work: Proof of concept

✤   Future work
The Voxel World

  Probabilistic representation of 3-d scenes based on volumetric units (voxels).




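[Figure: plot of p(intensity) against intensity]

Surface probability is given by incremental learning:

$$P^{N+1}(X \in S) = P^{N}(X \in S)\,\frac{p^{N}(I_x^{N+1} \mid X \in S)}{p^{N}(I_x^{N+1})}$$

Appearance is modeled as a Mixture of Gaussians:

$$p(I) = \sum_{k=1}^{3} \frac{w_k}{W}\,\frac{1}{\sqrt{2\pi\sigma_k^2}}\,e^{-\frac{(I-\mu_k)^2}{2\sigma_k^2}}$$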
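The two formulas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the voxel-world implementation: the function names are hypothetical, the marginal likelihood p(I) in the update is taken as a given input (how it is computed is not stated on the slide), and the clamp to 1.0 is an assumption added for numerical safety.

```python
import math

def mog_density(intensity, weights, means, sigmas):
    """Mixture-of-Gaussians density p(I); weights are normalized by their sum W."""
    total_weight = sum(weights)  # W in the slide's formula
    density = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
        exponent = -((intensity - mu) ** 2) / (2.0 * sigma ** 2)
        density += (w / total_weight) * norm * math.exp(exponent)
    return density

def update_surface_probability(prior, likelihood_given_surface, marginal_likelihood):
    """One incremental step: P^{N+1}(X in S) = P^N(X in S) * p(I | X in S) / p(I)."""
    posterior = prior * likelihood_given_surface / marginal_likelihood
    # Assumption of this sketch: clamp so rounding never yields a value above 1.
    return min(posterior, 1.0)

# Hypothetical voxel with a three-component appearance model.
weights, means, sigmas = [5.0, 3.0, 2.0], [0.2, 0.5, 0.8], [0.05, 0.1, 0.05]
likelihood = mog_density(0.5, weights, means, sigmas)
# marginal_likelihood=1.0 stands in for p(I); here it is a made-up placeholder.
new_probability = update_surface_probability(0.1, likelihood, 1.0)
```

An observation that the voxel's appearance model explains better than the marginal raises the surface probability; one it explains worse lowers it.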




Outline

✤   Volumetric appearance model - The Voxel World

✤   Insights on classical recognition methods

✤   Compositional hierarchies
    ✤   Jin & Geman
    ✤   Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008
    ✤   Mundy & Ozcanli, SPIE ’09


✤   Experimental work: Proof of concept

✤   Future work
Classical Recognition: Bag of Features
Pipeline:

✤   Feature descriptor (e.g. SIFT - Lowe, HOG - Dalal)
✤   Codeword, Codebook
✤   Feature space - Classify (e.g. SVM, Naive Bayes, NN)

Drawbacks:

✤   Disregards spatial information
✤   A large number of features is needed

Many have proposed more complex representations of spatial object structure.

✤   Constellation Models [Weber and Welling et al, Fergus et al] - Complex, few parts
✤   Probabilistic voting [Leibe, Schiele] - Large codebook, complex matching
✤   Hierarchical representations
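The bag-of-features pipeline can be sketched as follows. This is a minimal illustration under stated assumptions: descriptors are already extracted (SIFT or HOG computation is not shown), the codebook is built with a plain k-means loop, and all function names are hypothetical. The final histogram would then be passed to a classifier such as an SVM.

```python
import numpy as np

def assign_codewords(descriptors, codebook):
    """Index of the nearest codeword (Euclidean distance) for every descriptor."""
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    return np.linalg.norm(diffs, axis=2).argmin(axis=1)

def build_codebook(descriptors, num_words, num_iters=20, seed=0):
    """Cluster an (n x d) descriptor array into num_words codewords with k-means."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=num_words, replace=False)
    centers = descriptors[start].copy()
    for _ in range(num_iters):
        labels = assign_codewords(descriptors, centers)
        for k in range(num_words):
            members = descriptors[labels == k]
            if len(members) > 0:  # keep the previous center if a cluster empties
                centers[k] = members.mean(axis=0)
    return centers

def bag_of_features(descriptors, codebook):
    """Normalized codeword histogram; feature positions play no role."""
    descriptors = np.asarray(descriptors, dtype=float)
    labels = assign_codewords(descriptors, codebook)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The histogram depends only on which codewords occur and how often, which is the first drawback listed above: two images with the same parts in different spatial arrangements map to the same feature vector.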




Hierarchical Representations

✤   Address the need for a representation that incorporates geometric coherence
✤   Allow for a more efficient representation
✤   Consistent with biological systems

[Figure: first-page excerpts from four related papers, labeled as follows]

✤   Sudderth, Torralba, Freeman & Willsky, "Learning Hierarchical Models of Scenes, Objects, and Parts" (MIT, 2006)
✤   Mikolajczyk, Leibe & Schiele (UK, Switzerland, Germany, 2006)
✤   Jin and Geman (Brown University, CVPR 2006)
✤   Fidler, Boben & Leonardis (U. of Ljubljana, Slovenia, CVPR 2007, 2008)
Prior work by Geman: Efficient Discrimination
        [Bienenstock, Geman, Potter, 97], [Geman, Chi, 2002], [Geman, Jin, CVPR 2006]

         A COMPOSITIONAL MACHINE:
        ✤      Probabilistic framework
        ✤      Hierarchy and reusability
        ✤      It does not exclude the sharing of subparts
        ✤      Parts are everywhere, compositions are rare
        ✤      Need to model relative geometry of parts

[Figure: license-plate hierarchy with the labels, in the order shown: license plates; license numbers; plate boundary; generic letter, generic number; characters, plate sides; parts of characters and plate sides]

Test set: 385 images, mostly from Logan Airport

Efficient discrimination: Markov versus Content-Sensitive dist.

[Figure: sample instantiations with the panel labels "Markovian distribution", "Compositional distribution", "Basic structures", "Composition vs. Coincidence" and "Sampling"]

Paper excerpt shown on the slide:

The proportionality sign (∝) can be replaced with equality (=) if, at the introduction of each attribute function, $a^\beta$, care is taken to ensure that $p^0_\beta(a^\beta)$ is exactly the current (“unperturbed”) conditional distribution on $a^\beta$ given $x^\beta > 0$. In general, it is not practical to compute an exact null distribution and P must be re-normalized.

The effect on coverage of the perturbation can be seen by comparing the upper and lower panels in Figure 3. For each non-terminal brick β, the denominator, $p^0_\beta(a^\beta)$, was approximated by assuming that in the absence of an explicit constraint, the prior distribution on $a^\beta$ is the one consistent with independent instantiations of the children. The numerator, $p^c_\beta(a^\beta)$, was constructed to encourage regularity in the relative positions of character parts, and of characters, in composing characters and strings, respectively. The upper panel is a sample instantiation from the Markov backbone; the lower panel is a sample instantiation from the full compositional distribution. Samples from the full compositional distribution can be computed (at considerable computational cost) through a variant of importance sampling.
                                                                                                                                      β            Conditional Data Models. The data model connects in-
                                                                                    approximated by assuming that in the absence of an explicit    terpretations to the grey-level image, and completes the
     160                       Original image
                                image
                  discrimination: 160 180 200                               Zoomed license license region 200 aβ is the one consis-
        EfficientOriginal 120 140 Markov versus Content-Sensitive dist. 60 the region
                                                                                160     Zoomed prior 140 160 180
                                                                                    constraint, 80 100 120 distribution on
           20  40  60  80  100                                                        20   40
                                                                                                                                                   Bayesian framework. In the license-plate-reading demon-
                                                                                    tent with independent instantiations of the children. The      stration system, we have assumed that the data distribution,
                                                                           Figure 3. Samples from Markov backbone (upper panel, ‘4850’)
                                                                                    numerator, pc (aβ ), was constructed to encourage regularity
                                                                                                  β                                                conditioned on an interpretation, is a function only of the
                                                                           and compositional distribution (lower panel, ‘8502’).
                                                                                    in the relative positions of character parts, and of charac-
      20                                                                                                                                           states of the terminal bricks:
      40
                                                                                    ters, in composing characters and strings, respectively. The
                                                                           aβ (I ) returns the relative coordinates of the four numerals back-
                                                                                    upper panel is a sample instantiation from the Markov                          P (y|I ) = P (y|{xβ : β ∈ T })
      60
                                                                                    bone; the lower panel is a sample instantiation from the full
                                                                           that instantiate β in the interpretation I . Similarly, each            where T ⊆ B is the set of terminal, or bottom-row, bricks.
                                                          Zoomed license character brick, and each numeral Samples fromhas an as-
                                                                                    compositional distribution. in particular, the full compo-
      80
                       Original image                                      region
                                                                                                                                                       Good performance in most image analysis applications
     100


     120
                                                   Detection               sociated attribute function can be computed (at considerable com-
                                                                                    sitional distribution that computes the relative coor-
                                                                                     of the particular parts a variant of importance that
                                                                                                                                                   requires some degree of photometric invariance. In the
                                                                           dinatesputational cost) through that are composed into sampling. context of a probability model, the notion of invariance is
     140                                                                            Conditional Data Models. The A “compositional
                                                                           character in a particular interpretation. data model connects in-       closely connected to the statistical notion of sufficiency.
           Top object under MarkovMarkov Top object under built to thea grey-level image, and completes the
                       Top object under                                    distribution” is content-sensitive
                                                                               Top object under content-sensitive (Equation 1)
                                                                                    terpretations from Markov backbone
     160                                                                                                                                           The following data model, employed in the demonstration
                   60 distribution                                                    distribution
           20  40      80  100  120
                                    distribution
                                     140 160 180 200
                                                                                                 distribution
                                                                                    Bayesian framework. In the license-plate-reading demon-
                                                                           and a pair of probability distributions, pc (“composed”) and
                                                                                                                          β                        system, is an example of the application of sufficiency to
                                                                                    stration system, we have assumed that the data distribution,
 Figure 3. Samples from Markov backbone (upper panel, ‘4850’)β (“null”), on each attribute a . The former, composed
                                                                           p0                                     β
                                                                                                                                                   invariance. As remarked earlier, the terminal bricks in
                   Top object under Markov                                 distribution, captures regularities of the is a function only of the
                                                                                    conditioned on an interpretation, arrangements (i.e.
                                                     Top object under content-sensitive
 and compositional distribution (lower panel, ‘8502’).                                                                                                                                     Maria Isabel Restrepo
                                                                                                                                                   the demonstration system represent reusable parts of alpha-
                          distribution                          distribution        states of the terminal bricks:
                                                                           instantiations) of the children bricks, given that they are             numeric characters. The states of the terminal bricks code
                                                                           parts of the object represented by (y|{xβlatter, null distribu-
                                                                                                     P (y|I ) = P β; the : β ∈ T })                the local position of the represented part. Some of the parts
 aβ (I ) returns the relative coordinates of the four numerals             tion, is the attribute distribution in the absence of the non-          can be more-or-less clearly discerned from the upper-hand
 that instantiate β in the interpretation I . Similarly, each
Prior Work by Fidler and Leonardis
[Fidler, Berginc, Leonardis CVPR 2006], [Fidler, Leonardis, CVPR 2007], [Fidler, Boben, Leonardis CVPR 2008]

Compositionality and bottom-up learning
✤   Computational efficiency: scalable
✤   Bottom-up learning: all classes share the early layers, later layers are class specific
✤   Models are general and discriminative
✤   Sharing of parts

Have learned complete objects from simple edges

[Figure: Example of learned whole-object shape models.]
S. Fidler, M. Boben, A. Leonardis. Learning a Hierarchical Compositional Shape Vocabulary for Multi-class Object Representation. Submitted to a journal.
Images from Fidler webpage

Work by Mundy and Ozcanli
[Mundy, Ozcanli, SPIE 2009]

✤    Combine Geman’s and Leonardis’ work into a unified Bayesian framework
✤    Classification of foreground objects: Vehicles
✤    Domain: Low resolution, satellite images

Composition of Parts

Figure 6. An example of vehicle extrema operator responses, [...] 1, 0.5, 90°, dark. The spatial resolution is around 0.7 meters, with about 25 pixels on a vehicle. The operator response is indicated by the cyan dot. The operator kernel extent is indicated in blue. The original grey scale intensity is in the red channel.

Figure 7. The composition of extrema operators. The anisotropic dark operator, [...], is composed with one of a bright peak operator, [...]. The composition is characterized by distance d [...] and relative orientation [...].

Figure 8. Three primitive extrema operators compose in a Layer 1 node. The central part is [...] 2, 1, -45°, bright, and the second primitive part is [...] 2, 1, -45°, dark. The peak responses of the operators are indicated by cyan pixels. The operator kernel is indicated in blue. The vehicle intensity is in the red channel.

Probabilistic Score:

$$p(c^i_{\alpha\alpha} \mid d_{\alpha\alpha}, \theta_{\alpha\alpha}) = \frac{p(d_{\alpha\alpha}, \theta_{\alpha\alpha} \mid c^i_{\alpha\alpha})\, P(c^i_{\alpha\alpha})}{p(d_{\alpha\alpha}, \theta_{\alpha\alpha})}$$

$$p(d_{\alpha\alpha}, \theta_{\alpha\alpha}) = p(d_{\alpha\alpha}, \theta_{\alpha\alpha} \mid \bar{c}_{\alpha\alpha})\, P(\bar{c}_{\alpha\alpha}) + \sum_{j=0}^{k-1} p(d_{\alpha\alpha}, \theta_{\alpha\alpha} \mid c^j_{\alpha\alpha})\, P(c^j_{\alpha\alpha})$$

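The probabilistic score on this slide is Bayes' rule with an explicit background class in the evidence term. A minimal Python sketch of that computation follows. The function name, argument names and the numbers are hypothetical, chosen only for illustration; the likelihood values would in practice come from learned distributions over distance and relative orientation, which the slide does not specify.

```python
def composition_posteriors(likelihoods, priors, bg_likelihood, bg_prior):
    """Posterior P(c^j | d, theta) for each composition class j.

    likelihoods[j] = p(d, theta | c^j) and priors[j] = P(c^j);
    bg_likelihood and bg_prior are the same quantities for the
    background (non-class) hypothesis, written c-bar on the slide.
    """
    if len(likelihoods) != len(priors):
        raise ValueError("one prior per class likelihood is required")
    # Evidence p(d, theta): background term plus the sum over the k classes.
    evidence = bg_likelihood * bg_prior + sum(
        lik * pri for lik, pri in zip(likelihoods, priors)
    )
    if evidence <= 0.0:
        raise ValueError("evidence must be positive")
    return [lik * pri / evidence for lik, pri in zip(likelihoods, priors)]


# Hypothetical numbers: two composition classes and one background class.
posteriors = composition_posteriors(
    likelihoods=[0.8, 0.1],
    priors=[0.25, 0.25],
    bg_likelihood=0.2,
    bg_prior=0.5,
)
```

Because the background hypothesis takes part of the evidence, the class posteriors sum to less than one; the remainder is the posterior of the background class.
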
Hierarchical Composition for 3D Objects

Buildings, streets, trees, rivers...
Windows, street lines, roofs, leaves...
Junctions, curves...
Simple primitives, e.g. edges

Learn bottom-up

Outline
✤   Volumetric appearance model - The Voxel World

✤   Insights on classical recognition methods

✤   Compositional hierarchies
      ✤   Jin & Geman
      ✤   Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008
      ✤   Mundy & Ozcanli, SPIE ’09

✤   Proof of concept: Construction of a simple hierarchy to find windows in the voxel world

✤   Future Work

Data and Algorithm

Top: Mean appearance near wall surface. Bottom: occupancy

$$f(x) = \sum_{k=1}^{K} w_k f_k(x), \qquad \tilde{f}(x) \sim N(\tilde{\mu}_f, \tilde{\sigma}_f^2)$$

$$\min D_{KL}\big(f(x) \,\|\, \tilde{f}(x)\big) \quad \text{or} \quad f_1(x)$$

Algorithm Steps
1. For each orientation
  ✤   Apply corner kernel on appearance and occupancy grids
  ✤   Perform non-maxima suppression on kernel-specific region
2. Build a hierarchy to find windows

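One way to read the KL-divergence step on this slide: if the divergence is taken from the mixture $f$ to a single Gaussian $\tilde{f}$, the minimizer is the Gaussian whose mean and variance match those of the mixture (moment matching). The sketch below assumes that direction of the divergence and a one-dimensional intensity; the function name and the example numbers are hypothetical.

```python
def moment_match(weights, means, variances):
    """Single Gaussian (mean, variance) with the same first two moments
    as a 1-D mixture of Gaussians.

    This is the minimizer of KL(f || g) over Gaussians g, where f is
    the mixture. Weights are normalized here in case they do not sum to 1.
    """
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    w = [x / total for x in weights]
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # E[x^2] of the mixture is the weighted sum of (variance + mean^2).
    second_moment = sum(
        wi * (vi + mi * mi) for wi, mi, vi in zip(w, means, variances)
    )
    return mean, second_moment - mean * mean


# Hypothetical two-component appearance mixture for one voxel.
mu_f, var_f = moment_match(
    weights=[0.5, 0.5], means=[0.2, 0.6], variances=[0.01, 0.01]
)
```

The matched variance is larger than either component variance whenever the component means differ, because the spread between modes is folded into a single Gaussian.
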
The Primitives: Corner Kernel

Corner kernel in 2D: every pixel has a label/weight
Corner kernel in 3D: every voxel has a label/weight

[Figure: 3D corner kernel with axes WIDTH, HEIGHT and DEPTH, split into a PLUS (+) region and a MINUS (-) region.]

The Primitives: Corner Kernel

[Figure: PLUS (+) region shown as white voxels; MINUS (-) region shown as black voxels.]

The Primitives: Corner Kernel
Rotate kernel to create layer of primitives

[Figure: Coordinate system of a corner kernel, with axes x, y, z and angles φ, θ, ψ.]

Layer 1: Primitives - 3D Corners

Applying the Kernel

                      Corresponding voxels




Applying the Kernel

                      “Convolve” kernel with
                         appearance grid




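Operator Response and Simplifications

$I_{x_i}$: Intensity at voxel $x_i$
$K$: Kernel response

$$K = \sum_{i:\, x_i \in R^+} I_{x_i} - \sum_{j:\, x_j \in R^-} I_{x_j}$$

Distribution of the response:

$$K \sim N_k(\mu_k, \sigma_k^2)$$

$$\mu_k = \sum_{i:\, x_i \in R^+} \mu_{x_i} - \sum_{j:\, x_j \in R^-} \mu_{x_j}, \qquad \sigma_k^2 = \sum_{i:\, x_i \in R^+} \sigma_{x_i}^2 + \sum_{j:\, x_j \in R^-} \sigma_{x_j}^2$$

This may be the first feature detector based on the spatial arrangement of appearance distributions

$$\text{kernel response} = r_\alpha = \begin{cases} \mu_k, & \dfrac{1}{|R^+|} \sum_{i:\, x_i \in R^+} P(x_i \in S) > t \ \text{and}\ \mu_k > 0 \\ 0, & \text{otherwise} \end{cases}$$
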
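The response statistics on this slide can be computed directly from the per-voxel appearance Gaussians. The sketch below is an illustration under stated assumptions: each voxel is summarized by one Gaussian (mean, variance), voxels are treated as independent (which is what summing the variances implies), and the two conditions are read as strict inequalities. The function name, argument layout and numbers are hypothetical.

```python
def kernel_response(plus, minus, surface_probs, t):
    """Mean, variance and thresholded response of a +/- kernel.

    plus, minus   : lists of (mean, variance) for the voxels in R+ and R-
    surface_probs : P(x_i in S) for the voxels in R+
    t             : threshold on the average surface probability over R+
    """
    if not plus or not minus or len(surface_probs) != len(plus):
        raise ValueError("need non-empty regions and one probability per R+ voxel")
    # Mean of a difference of sums: means subtract.
    mu_k = sum(m for m, _ in plus) - sum(m for m, _ in minus)
    # Variance of a difference of independent sums: variances add.
    var_k = sum(v for _, v in plus) + sum(v for _, v in minus)
    mean_surface = sum(surface_probs) / len(surface_probs)
    # Keep the mean response only where R+ is likely surface and the
    # response has the expected (positive) sign.
    r_alpha = mu_k if (mean_surface > t and mu_k > 0.0) else 0.0
    return mu_k, var_k, r_alpha


# Hypothetical kernel with two voxels per region.
mu_k, var_k, r_alpha = kernel_response(
    plus=[(0.8, 0.01), (0.7, 0.02)],
    minus=[(0.2, 0.01), (0.3, 0.01)],
    surface_probs=[0.9, 0.7],
    t=0.5,
)
```

Working on means and variances rather than on sampled intensities keeps the detector a function of the appearance distributions themselves.
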
Experiment Setup:

1. Demonstrate hierarchy on a small region
2. Show some results on the full grid

Experimental hierarchy (top to bottom)
    Object Layer: Window
    Layer 3: Triplets of corners
    Layer 2: Pairs of corners
    Layer 1: Corner primitives

Algorithm Steps
1. For each orientation
  ✤   Run a corner kernel

Layer 1: Simple Features

Algorithm Steps
1. For each orientation
  ✤   Run a corner kernel
  ✤   Perform non-maxima suppression on kernel-specific region

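The non-maxima suppression step can be sketched as follows. The slides do not say how the kernel-specific region is defined or how ties are handled, so this example assumes the region is given as a list of voxel offsets, stores responses sparsely in a dictionary, and keeps tied maxima. All names are hypothetical.

```python
def non_maxima_suppression(responses, region_offsets):
    """Keep only voxels whose response is a local maximum.

    responses      : dict mapping (x, y, z) -> kernel response
    region_offsets : list of (dx, dy, dz) offsets defining the
                     suppression region around each voxel
    Voxels missing from `responses` count as zero response.
    """
    kept = {}
    for (x, y, z), r in responses.items():
        if r <= 0.0:
            continue  # zero responses were already rejected by the kernel
        is_max = True
        for dx, dy, dz in region_offsets:
            if (dx, dy, dz) == (0, 0, 0):
                continue
            neighbor = responses.get((x + dx, y + dy, z + dz), 0.0)
            if neighbor > r:
                is_max = False
                break
        if is_max:
            kept[(x, y, z)] = r
    return kept


# Hypothetical 3x3x3 suppression region and three responses.
cube = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)]
corners = non_maxima_suppression(
    {(0, 0, 0): 0.5, (1, 0, 0): 0.9, (5, 5, 5): 0.3}, cube
)
```

In the example the response at (0, 0, 0) is suppressed by its stronger neighbor at (1, 0, 0), while the isolated response at (5, 5, 5) survives.
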
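Layer 2

Algorithm Steps
2. Build a hierarchy
   2.1. Pair corners (90°) → Pairs

$$p(c^i_{\alpha_i,\alpha_j} \mid d_{\alpha_i,\alpha_j}, \theta_{\alpha_i,\alpha_j}) = \begin{cases} \dfrac{1}{|\{r_{\alpha_i}, r_{\alpha_j} > 0\}|}, & \text{for } r_{\alpha_i}, r_{\alpha_j} > 0 \\ 0, & \text{otherwise} \end{cases}$$
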
Layer 3

Algorithm Steps
1. ...
2. Build a hierarchy
   2.1. Pair coplanar corners (90°) → Pairs
   2.2. Pair corner pairs → Triplets

Object Layer: Windows

Algorithm Steps
1. ...
2. Build a hierarchy
   2.1. Pair corners (90°) → Pair
   2.2. Pair corner pairs → L-shape
   2.3. Pair triplets → Window

Full Grid: Occupancy Probabilities
Full Grid Results: Corners
Full Grid Results: Windows
Summary

✤ Appealing characteristics of The Voxel World and Compositional Hierarchies

✤ Introduced volumetric feature detectors that operate on distribution functions of appearance

✤ Demonstrated, using a very simple instance of a compositional hierarchy, the efficiency of such a representation

✤ Localized a large number of windows

Future Work

✤ Include other extrema operators in the hierarchy (e.g. edges)

✤ Use occupancy information

✤ Learn prior distributions to fully explain probability density of compositions

✤ Optimize source code: Search and storage of parts (e.g. octree)

✤ Learn parts automatically

✤ Learn whole-object hierarchies

The Principle of Compositionality
The meaning of a complex expression is determined by
  its structure and the meanings of its constituents.
              Stanford Encyclopedia of Philosophy




                      Questions?





Progress review1

  • 1. Compositional Hierarchy for 3D Object Recognition Maria Isabel Restrepo October 26, 2009
  • 2. Goal:
  • 3. Goal: Geometry, Expected Appearance. Renderings obtained by Dan Crispell.
  • 4. Goal: Recognition in a 3D World
  • 5. Compelling Characteristics
      POWERFUL GEOMETRIC AND PHOTOMETRIC REPRESENTATION* OF SCENES
      ✤ It is a 3D, geometric representation that supports discovery of spatial relations
      ✤ Its appearance is modeled by MOG to handle illumination variations
      ✤ Appearance and geometry are automatically learned from multiple images with calibrated cameras
      ✤ It is faithful to the scenes: There are no prior assumptions about the model
      THESE CHARACTERISTICS ARE IDEAL FOR OBJECT RECOGNITION
      * [Pollard and Mundy, CVPR 2007] [Crispell]
  • 6. Outline
      ✤ Volumetric appearance model - The Voxel World
      ✤ Insights on classical recognition methods
      ✤ Compositional hierarchies
          ✤ Bienenstock, Geman, Potter, 97; Geman, Chi, 2002; Geman, Jin, CVPR 2006
          ✤ Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008
          ✤ Mundy & Ozcanli, SPIE ’09
      ✤ Experimental work: Proof of concept
      ✤ Future work
  • 7. The Voxel World
      Probabilistic representation of 3-d scenes based on volumetric units (voxels).
      Surface probability is given by incremental learning:
      \[ P^{N+1}(X \in S) = P^{N}(X \in S)\,\frac{p^{N}(I_x^{N+1} \mid X \in S)}{p^{N}(I_x^{N+1})} \]
      Appearance is modeled as a Mixture of Gaussians (plot: p(intensity) versus intensity):
      \[ p(I) = \sum_{k=1}^{3} \frac{w_k}{W}\,\frac{1}{\sqrt{2\pi\sigma_k^2}}\,e^{-\frac{(I-\mu_k)^2}{2\sigma_k^2}} \]
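The two formulas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the voxel-world implementation: the function names `mog_density` and `update_surface_probability` are hypothetical, and the marginal likelihood in the denominator is taken as a given number, whereas the slide does not say how it is computed.

```python
import math


def mog_density(intensity, components):
    """Evaluate a mixture-of-Gaussians density at one intensity value.

    components: list of (weight, mean, sigma) tuples. Weights are divided
    by their sum W, matching the w_k / W factor in p(I).
    """
    total_weight = sum(w for w, _, _ in components)
    density = 0.0
    for w, mu, sigma in components:
        norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
        density += (w / total_weight) * norm * math.exp(
            -((intensity - mu) ** 2) / (2.0 * sigma ** 2)
        )
    return density


def update_surface_probability(prior, likelihood_given_surface, marginal_likelihood):
    """One incremental update: P^{N+1} = P^N * p(I | X in S) / p(I).

    Assumption: the caller supplies the marginal likelihood p(I).
    """
    return prior * likelihood_given_surface / marginal_likelihood


# Hypothetical three-component appearance model for one voxel.
voxel_model = [(0.5, 0.2, 0.05), (0.3, 0.6, 0.10), (0.2, 0.9, 0.05)]
print(mog_density(0.25, voxel_model))
print(update_surface_probability(0.2, 0.6, 0.3))
```

When the observed intensity is more likely under the surface hypothesis than under the marginal, the ratio exceeds 1 and the surface probability grows. Otherwise it shrinks.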
  • 8. Outline
      ✤ Volumetric appearance model - The Voxel World
      ✤ Insights on classical recognition methods
      ✤ Compositional hierarchies
          ✤ Jin & Geman
          ✤ Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008
          ✤ Mundy & Ozcanli, SPIE ’09
      ✤ Experimental work: Proof of concept
      ✤ Future work
  • 9. Classical Recognition: Bag of Features
      Diagram labels: Feature descriptor (e.g. SIFT - Lowe, HOG - Dalal); Feature space; Codeword, Codebook; Classify (e.g. SVM, Naive Bayes, NN)
      Drawbacks:
      ✤ Disregards spatial information
      ✤ Large number of features are needed
      Many have proposed more complex representations of spatial object structure.
      ✤ Constellation Models [Weber and Welling et al, Fergus et al] - Complex, few parts
      ✤ Probabilistic voting [Leibe, Schiele] - Large codebook, complex matching
      ✤ Hierarchical representations
  • 10. Hierarchical Representations
      ✤ Address the need for a representation that incorporates geometric coherence
      ✤ Allow for a more efficient representation
      ✤ Consistent with biological systems
      [Figure: excerpts from four papers on hierarchical models]
      ✤ Sudderth, Torralba, Freeman & Willsky, "Learning Hierarchical Models of Scenes, Objects, and Parts" - MIT, 2006
      ✤ Mikolajczyk, Leibe, Schiele - UK, Germany, Switzerland, 2006
      ✤ Jin and Geman - Brown University, CVPR 2006
      ✤ Fidler, Boben & Leonardis - U. of Ljubljana, Slovenia, CVPR 2007, 2008
  • 11. Prior work by Geman
      [Bienenstock, Geman, Potter, 97], [Geman, Chi, 2002], [Geman, Jin, CVPR 2006]
      A COMPOSITIONAL MACHINE:
      ✤ Probabilistic framework
      ✤ Hierarchy and reusability
      ✤ It does not exclude the sharing of subparts
      ✤ Parts are everywhere, compositions are rare
      ✤ Need to model relative geometry of parts
      [Figure: license-plate hierarchy. Levels: license plates; license numbers; plate boundary; generic letter, generic number; characters, plate sides; parts of characters and plate sides]
      [Figure panels: Basic structures; Composition vs. Coincidence; Sampling (samples from Markov backbone, upper panel, ‘4850’, and compositional distribution, lower panel, ‘8502’); Detection; Efficient discrimination: Markov versus content-sensitive distribution (original image, zoomed license region, top object under each distribution)]
      Test set: 385 images, mostly from Logan Airport
  • 12. Prior Work by Fidler and Leonardis
      [Fidler, Berginc, Leonardis CVPR 2006], [Fidler, Leonardis, CVPR 2007], [Fidler, Boben, Leonardis CVPR 2008]
      Compositionality and bottom-up learning
      ✤ Computational efficiency - scalable
      ✤ Bottom-up learning: all classes in early layers, then class-specific
      ✤ Models are general and discriminative
      ✤ Sharing of parts
      They have learned complete objects from simple edges.
      [Figure: Example of learned whole-object shape models. From S. Fidler, M. Boben, A. Leonardis, "Learning a Hierarchical Compositional Shape Vocabulary for Multi-class Object Representation", submitted to a journal. Images from Fidler's webpage.]
  • 13. Work by Mundy and Ozcanli
      [Mundy, Ozcanli, SPIE 2009]
      Composition of Parts
      ✤ Combine Geman’s and Leonardis’ work into a unified Bayesian framework
      ✤ Classification of foreground objects: Vehicles
      ✤ Domain: Low resolution, satellite images
      [Figure 6: An example of vehicle extrema operator responses (operator parameters 1, 0.5, 90°, dark). The spatial resolution is around 0.7 meters, with about 25 pixels on a vehicle. The operator response is indicated by the cyan dot. The operator kernel extent is indicated in blue. The original grey scale intensity is in the red channel.]
      [Figure 7: The composition of extrema operators. The anisotropic dark operator is composed with one of a bright peak operator. The composition is characterized by distance d and relative orientation.]
      [Figure 8: Three primitive extrema operators compose in a Layer 1 node. The central part is (2, 1, -45°, bright), and the second primitive part is (2, 1, -45°, dark). The peak responses of the operators are indicated by cyan pixels. The operator kernel is indicated in blue. The vehicle intensity is in the red channel.]
      Probabilistic Score:
      \[ p(c^{i}_{\alpha\alpha} \mid d_{\alpha\alpha}, \theta_{\alpha\alpha}) = \frac{p(d_{\alpha\alpha}, \theta_{\alpha\alpha} \mid c^{i}_{\alpha\alpha})\,P(c^{i}_{\alpha\alpha})}{p(d_{\alpha\alpha}, \theta_{\alpha\alpha})} \]
      \[ p(d_{\alpha\alpha}, \theta_{\alpha\alpha}) = p(d_{\alpha\alpha}, \theta_{\alpha\alpha} \mid \bar{c}_{\alpha\alpha})\,P(\bar{c}_{\alpha\alpha}) + \sum_{j=0}^{k-1} p(d_{\alpha\alpha}, \theta_{\alpha\alpha} \mid c^{j}_{\alpha\alpha})\,P(c^{j}_{\alpha\alpha}) \]
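The probabilistic score on this slide is Bayes' rule with an explicit background (non-class) term in the evidence. A minimal sketch follows, assuming the likelihoods of the observed distance and orientation have already been evaluated as plain numbers. The function name `composition_posterior` and its arguments are hypothetical and are not taken from the Mundy and Ozcanli code.

```python
def composition_posterior(class_likelihoods, class_priors,
                          background_likelihood, background_prior):
    """Posterior of each composition class given an observed (d, theta).

    class_likelihoods[j] stands for p(d, theta | c^j), class_priors[j] for
    P(c^j); the background pair stands for the non-class term in the evidence.
    """
    # Evidence: background term plus the sum over the k classes.
    evidence = background_likelihood * background_prior + sum(
        lik * prior for lik, prior in zip(class_likelihoods, class_priors)
    )
    return [lik * prior / evidence
            for lik, prior in zip(class_likelihoods, class_priors)]


# Hypothetical numbers: two classes and a background hypothesis.
print(composition_posterior([2.0, 1.0], [0.25, 0.25], 1.0, 0.5))
```

Because the background term sits in the denominator, the class posteriors sum to less than one. The remainder is the probability that the observed arrangement is not a composition of any known class.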
  • 14. Hierarchical Composition for 3D Objects
      Learn bottom-up (layers listed top to bottom):
      ✤ Buildings, streets, trees, rivers...
      ✤ Windows, street lines, roofs, leaves...
      ✤ Junctions, curves...
      ✤ Simple primitives, e.g. edges
  • 15. Outline
      ✤ Volumetric appearance model - The Voxel World
      ✤ Insights on classical recognition methods
      ✤ Compositional hierarchies
          ✤ Jin & Geman
          ✤ Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008
          ✤ Mundy & Ozcanli, SPIE ’09
      ✤ Proof of concept: Construction of a simple hierarchy to find windows in the voxel world
      ✤ Future Work
  • 16. Data and Algorithm
      Data (figure): Top: mean appearance near wall surface. Bottom: occupancy.
      The appearance mixture \( f(x) = \sum_{k=1}^{K} w_k f_k(x) \) is represented by a single Gaussian \( \tilde{f}(x) \sim N(\tilde{\mu}_f, \tilde{\sigma}_f^2) \), obtained from \( \min D_{KL}(f(x) \,\|\, \tilde{f}(x)) \), or by \( f_1(x) \).
      Algorithm Steps
      1. For each orientation
          ✤ Apply corner kernel on appearance and occupancy grids
          ✤ Perform non-maxima suppression on kernel-specific region
      2. Build a hierarchy to find windows
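For the KL criterion on this slide, the Gaussian that minimizes D_KL(f || f~) over all Gaussians is the one that matches the mean and variance of the mixture. This is a standard moment-matching result, and the slide does not state how the minimization was carried out in practice. A minimal sketch, with a hypothetical function name:

```python
def collapse_mixture(components):
    """Moment-match a 1-D Gaussian mixture with a single Gaussian.

    components: list of (weight, mean, variance) tuples.
    Returns (mean, variance) of the Gaussian minimizing KL(mixture || Gaussian).
    """
    total_weight = sum(w for w, _, _ in components)
    mean = sum(w * mu for w, mu, _ in components) / total_weight
    # Second moment of the mixture: E[x^2] = sum_k w_k (var_k + mu_k^2).
    second_moment = sum(
        w * (var + mu ** 2) for w, mu, var in components
    ) / total_weight
    return mean, second_moment - mean ** 2


# Hypothetical two-component mixture.
print(collapse_mixture([(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)]))
```

The resulting variance is larger than that of either component, because it also absorbs the spread between the component means.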
  • 17. The Primitives: Corner Kernel
      Corner kernel in 2D: every pixel has a label/weight
      Corner kernel in 3D: every voxel has a label/weight
      [Figure labels: PLUS (+) REGION, MINUS (-) REGION; WIDTH, HEIGHT, DEPTH]
  • 18. The Primitives: Corner Kernel
      PLUS (+) REGION - WHITE VOXELS
      MINUS (-) REGION - BLACK VOXELS
  • 19. The Primitives: Corner Kernel
      Rotate kernel to create layer of primitives
      [Figure: Coordinate system of a corner kernel (axes x, y, z; angles φ, θ, ψ)]
      Layer 1: Primitives, 3D Corners
  • 20. Applying the Kernel: Corresponding voxels
  • 21. Applying the Kernel: “Convolve” kernel with appearance grid
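  • 22. Operator Response and Simplifications
      \( I_{x_i} \): Intensity at voxel \( x_i \); \( K \): Kernel response
      \[ K = \sum_{i:\,x_i \in R^+} I_{x_i} - \sum_{j:\,x_j \in R^-} I_{x_j} \]
      Distribution of the response:
      \[ K \sim N_k(\mu_k, \sigma_k^2), \qquad \mu_k = \sum_{i:\,x_i \in R^+} \mu_{x_i} - \sum_{j:\,x_j \in R^-} \mu_{x_j}, \qquad \sigma_k^2 = \sum_{i:\,x_i \in R^+} \sigma_{x_i}^2 + \sum_{j:\,x_j \in R^-} \sigma_{x_j}^2 \]
      This may be the first feature detector based on the spatial arrangement of appearance distributions.
      Kernel response:
      \[ r_\alpha = \begin{cases} \frac{1}{|R^+|}\,\mu_k, & \text{if the surface probabilities } P(x_i \in S) \text{ over } i: x_i \in R^+ \text{ exceed } t \text{ and } \mu_k > 0 \\ 0, & \text{otherwise} \end{cases} \]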
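A minimal sketch of the response computation on this slide, under stated assumptions. Each voxel is reduced to a single Gaussian with a mean and a variance plus a surface probability. How the surface probabilities over the plus region are aggregated before thresholding is not recoverable from the slide, so this sketch assumes a plain sum. The strict comparisons are also assumptions, and the function name `kernel_response` is hypothetical.

```python
def kernel_response(plus_voxels, minus_voxels, threshold):
    """Corner-kernel response from per-voxel appearance distributions.

    plus_voxels, minus_voxels: lists of (mean, variance, surface_probability)
    for the voxels in R+ and R-.
    Returns (mu_k, var_k, response).
    """
    # Mean of the response: sum of means in R+ minus sum of means in R-.
    mu_k = sum(m for m, _, _ in plus_voxels) - sum(m for m, _, _ in minus_voxels)
    # Variance of the response: variances add over both regions.
    var_k = sum(v for _, v, _ in plus_voxels) + sum(v for _, v, _ in minus_voxels)
    # Assumption: surface evidence in R+ is aggregated by a plain sum.
    surface_mass = sum(p for _, _, p in plus_voxels)
    if surface_mass > threshold and mu_k > 0:
        response = mu_k / len(plus_voxels)
    else:
        response = 0.0
    return mu_k, var_k, response


# Hypothetical kernel footprint: two bright voxels in R+, two dark voxels in R-.
plus = [(0.8, 0.01, 0.9), (0.6, 0.01, 0.8)]
minus = [(0.2, 0.02, 0.1), (0.2, 0.02, 0.1)]
print(kernel_response(plus, minus, threshold=1.0))
```

Adding the variances treats the voxel intensities as independent, which is consistent with the variance formula on the slide.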
  • 23. Experiment Setup
      1. Demonstrate hierarchy on a small region
      2. Show some results on the full grid
      Experimental hierarchy (top to bottom):
      ✤ Object Layer: Window
      ✤ Layer 3: Triplets of corners
      ✤ Layer 2: Pairs of corners
      ✤ Layer 1: Corner primitives
  • 24. Algorithm Steps
      1. For each orientation
          ✤ Run a corner kernel
  • 25. Layer 1: Simple Features
      Algorithm Steps
      1. For each orientation
          ✤ Run a corner kernel
          ✤ Perform non-maxima suppression on kernel-specific region
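Non-maxima suppression on a voxel grid can be sketched as follows. The slide says the suppression region is kernel-specific. This sketch substitutes a cubic neighbourhood of a given radius, which is an assumption, and it stores responses in a sparse dictionary. The function name is hypothetical.

```python
def non_maxima_suppression(responses, radius):
    """Keep only voxels whose response is a local maximum.

    responses: dict mapping (x, y, z) integer coordinates to a response value.
    radius: half-width of the cubic neighbourhood (assumed stand-in for the
    kernel-specific region).
    Voxels with non-positive response are dropped; ties are kept.
    """
    kept = {}
    offsets = range(-radius, radius + 1)
    for (x, y, z), value in responses.items():
        if value <= 0:
            continue
        is_maximum = True
        for dx in offsets:
            for dy in offsets:
                for dz in offsets:
                    if (dx, dy, dz) == (0, 0, 0):
                        continue
                    neighbour = responses.get((x + dx, y + dy, z + dz), 0.0)
                    if neighbour > value:
                        is_maximum = False
        if is_maximum:
            kept[(x, y, z)] = value
    return kept


# Hypothetical responses: two adjacent voxels and one isolated voxel.
grid = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (5, 5, 5): 0.5}
print(non_maxima_suppression(grid, radius=1))
```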
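  • 26. Layer 2
      Algorithm Steps
      2. Build a hierarchy
          2.1. Pair corners (90°) → Pairs
      \[ p(c^{i}_{\alpha_i,\alpha_j} \mid d_{\alpha_i,\alpha_j}, \theta_{\alpha_i,\alpha_j}) = \begin{cases} \dfrac{1}{|\{r_{\alpha_i}, r_{\alpha_j} > 0\}|}, & \text{for } r_{\alpha_i}, r_{\alpha_j} > 0 \\ 0, & \text{otherwise} \end{cases} \]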
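The pairing step can be illustrated with a heavily simplified sketch. Each corner is reduced to a position, a single in-plane orientation angle in degrees, and a response, whereas the deck uses full 3D kernel orientations. Reading the denominator of the score as the number of candidate pairs with both responses positive is an assumption, and all names below are hypothetical.

```python
from itertools import combinations


def pair_corners(corners, angle_tolerance=1e-6):
    """Pair detected corners whose orientations differ by 90 degrees.

    corners: list of (position, angle_degrees, response) tuples.
    Returns a list of (corner_a, corner_b, score), where every candidate
    pair gets the same score 1 / (number of candidate pairs).
    """
    candidates = []
    for a, b in combinations(corners, 2):
        # Both parts must have a positive response.
        if a[2] <= 0 or b[2] <= 0:
            continue
        # Smallest absolute angular difference, in [0, 180].
        diff = abs(a[1] - b[1]) % 360.0
        diff = min(diff, 360.0 - diff)
        if abs(diff - 90.0) <= angle_tolerance:
            candidates.append((a, b))
    if not candidates:
        return []
    score = 1.0 / len(candidates)
    return [(a, b, score) for a, b in candidates]


# Hypothetical corners of a rectangular window; the last one has no response.
corners = [
    ((0, 0, 0), 0.0, 1.0),
    ((4, 0, 0), 90.0, 0.5),
    ((4, 3, 0), 180.0, 0.7),
    ((0, 3, 0), 270.0, 0.0),
]
for a, b, score in pair_corners(corners):
    print(a[0], b[0], score)
```

A fuller version would also test the distance between the two corners and whether they are coplanar, as the later layers of the hierarchy require.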
  • 27. Layer 3
      Algorithm Steps
      1. ...
      2. Build a hierarchy
          2.1. Pair coplanar corners (90°) → Pairs
          2.2. Pair corner pairs → Triplets
  • 28. Object Layer: Windows
      Algorithm Steps
      1. ...
      2. Build a hierarchy
          2.1. Pair corners (90°) → Pair
          2.2. Pair corner pairs → L-shape
          2.3. Pair Triplets → Window
  • 29. Full Grid: Occupancy Probabilities
  • 32. Summary
      ✤ Appealing characteristics of The Voxel World and Compositional Hierarchies
      ✤ Introduced volumetric feature detectors that operate on distribution functions of appearance
      ✤ Demonstrated, using a very simple instance of a compositional hierarchy, the efficiency of such a representation
      ✤ Localized a large number of windows
  • 33. Future Work
      ✤ Include other extrema operators in the hierarchy (e.g. edges)
      ✤ Use occupancy information
      ✤ Learn prior distributions to fully explain probability density of compositions
      ✤ Optimize source code: Search and storage of parts (e.g. octree)
      ✤ Learn parts automatically
      ✤ Learn whole-object hierarchies
  • 34. The Principle of Compositionality
      "The meaning of a complex expression is determined by its structure and the meanings of its constituents." (Stanford Encyclopedia of Philosophy)
      Questions?