SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
Machine Vision Background
Region Annotation by Hierarchical Region Topic Discovery




        Probabilistic Generative Models for Machine
                           Vision

                                              Davide Bacciu

                                             bacciu@di.unipi.it
                                         Dipartimento di Informatica
                                             Università di Pisa


                                 Corso di Intelligenza Artificiale
                                   Prof. Alessandro Sperduti
                                 Università degli Studi di Padova
                                         A.A. 2008/2009


                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Machine Vision Background
Region Annotation by Hierarchical Region Topic Discovery


Outline


   1     Machine Vision Background
           Introduction
           Visual Content Representation
           Machine Vision Applications

   2     Region Annotation by Hierarchical Region Topic Discovery
           Visual Content Representation
           Hierarchical Probabilistic Latent Semantic Analysis
           Experimental Evaluation




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Introduction


           Visual content representation
                   Visual features - how to identify and describe the single bits
                   of visual information
                   Image representation - how to describe the visual
                   information within a picture
           Machine vision applications
                   Scene and object recognition
                   Image annotation

   Focus on Probabilistic Generative Models and the Bag of
   Words image representation



                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Visual Features Descriptors


         How do we describe the single bits of visual information?


   Global descriptors
           Color histograms
           Shape features
           Texture
           Spatial Envelope
   Local descriptors
           Gabor filters
           Differential filters
           Distribution-based
           descriptors


                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Scale Invariant Feature Transform (SIFT)

   Calculate gradient magnitude
   and orientation at a given
   keypoint (x, y ) and scale σ
   Compute 3D histogram of
   magnitude orientations in a
   4 × 4 neighborhood of the
   keypoint
   Angles are quantized to 8
   directions
   Robust to geometric and
   illumination distortion


                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Visual Features Detectors

   How do we identify the interesting/informative parts of an
   image?

           Feature detectors
                   Segment/Blob detectors
                   Corner/Interest point detectors
                   Affine invariant keypoint detectors
                   Dense sampling




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Image Representation



           Straightforward for global descriptors
                   An image corresponds to its descriptor
                   Images are confronted, classified and indexed based on
                   their global color and/or texture histograms
           Requires an additional step for local descriptors
                   Local descriptors provide only information on a portion of
                   the scene
                   Probabilistic modeling of segments
                   Bag of words




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Bag of Words Document Representation




           Count the occurrences of each dictionary word in your
           document
           Represent the document as a vector of word counts

   Definition (Bag of Words Assumption)
   Word order is not relevant for determining document semantics

                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                                Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                                Machine Vision Applications


Bag of Words Image Representation
           Each image is a document and each visual patch is a word
           Count the occurrences of each dictionary visual word
           (visterm) in your image
           Represent the image as a vector of visterm counts
           (histogram)


                                         ...                  k−means


                    Image                                                          Codebook
                   Collection
                                    Local patches



                                                           Vector
                                                         Quantization

                    Image            Input Image                                       BOW
                   Collection


                                         Davide Bacciu          Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Image Understanding Applications

           Images are typically represented as vector of features
           Discover the semantics of an image from its vectorial
           representation
                   Object recognition (e.g. Airplane)
                   Scene recognition (e.g. Forest)
                   Image annotation (e.g. Sky, Tree, Boat, Sea, ...)




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Scene and Object Recognition


   Structured approaches
                                                                                                   Place Context
           Object/scene categories are
           represented as collections of
           connected parts characterized by                                                                   Object Context

           distinctive appearance and spatial
                                                                                                                   Part Context
           position
           Part-based probabilistic models
   Weakly structured approaches
                                                                                                             PixelContext
           Based on bag-of-word assumption
           Latent aspect models




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Latent Aspect Discovery



           Rooted into text processing
           Probabilistic models describing the generative process of
           image content using hidden (i.e. latent) variables
           Model likelihood is used to measure the fit of the unknown
           parameters with respect to the observations
           Model parameters are fitted by Expectation Maximization
           (EM) or variational methods




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                               Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                               Machine Vision Applications


Probabilistic Latent Semantic Analysis (pLSA) (I)



                              P(d)               P(z|d)            P(w|z)
                                          d                z                  w
                                                                               Wd
                                                                                     D



           Observations are collected in the integer matrix n(wj , di )
           Every observation is associated to a latent topic k by
           means of the hidden variable zk
           Each hidden topic k denotes a particular visual theme (e.g.
           an object)


                                         Davide Bacciu         Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                               Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                               Machine Vision Applications


Probabilistic Latent Semantic Analysis (pLSA) (II)


                              P(d)               P(z|d)            P(w|z)
                                          d                z                  w
                                                                               Wd
                                                                                     D


           pLSA is trained by Expectation Maximization (EM)
           Conditional independence is used to decompose model
           likelihood
                                                   D       W
                               L(N |θ) =                       P(wj , di )n(wj ,di )
                                                 i=1 j=1
                                                                         K
           P(wj , di ) = P(di )P(wj |di ) = P(di )                            P(zk |di )P(wj |zk )
                                                                       k =1

                                         Davide Bacciu         Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                            Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                            Machine Vision Applications


An Object Recognition Example
           Caltech-4 Dataset: cars, faces, motorbikes and airplanes on a
           cluttered background
           Fit a pLSA model with 7 latent topics (4 objects + 3 background)
           Determine the most likely topic of each visual word in image di
                                                           P(zk |di )P(wj |zk )
                                 P(zk |wj , di ) =         K
                                                           k =1   P(zk |di )P(wj |zk )




                               Figure: Images by Sivic et al 2005
                                         Davide Bacciu      Probabilistic Generative Models for Machine Vision
Introduction
                              Machine Vision Background
                                                                Visual Content Representation
 Region Annotation by Hierarchical Region Topic Discovery
                                                                Machine Vision Applications


Latent Dirichlet Allocation (LDA)

                                                            β                  φ
                                                                                   K




                                  α            θ            z                  w
                                                                                Wd
                                                                                       D



            pLSA provides a generative model only for the training data
            LDA treats P(zk |di ) as a latent random variable, sampling
            it from a Dirichlet distribution P(θ|α)
            Maximizes data likelihood

                 P(w|φ, α, β) =                        P(w|z, φ)P(z|θ)P(θ|α)P(φ|β)dθ
                                                   z

                                          Davide Bacciu         Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Hierarchical Latent Topic Models...




   Dependent Pachinko Allocation Model (Li and McCallum, 2006)



                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Image Annotation


           Learn association between image content and text
           Typically exploits region-based image representation
                   Image segmentation
                   Color and texture descriptors
           Non-probabilistic image annotation
                   Information retrieval approaches (e.g. vector based)
                   Multiple instance learning
           Probabilistic Image Annotation
                   Machine translation approach
                   Probabilistic generative
                   Latent topic models



                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Latent Aspect Models in Image Annotation


                                                     µ                                                    µ
                            z            b                 α         θ           z            b
                                          Bd         σ                    Bd                              σ
       α         θ


                            v            t           β                           v             t          β
                                          Md                                                   Md
                                               D                                                    D


                        GM-LDA                                            CORR-LDA

   Based on a Mixture of Gaussians modeling of the Bd image
   regions and a multinomial distribution for annotations Md




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


Image Annotation - Is it Worth the Hassle?




              The Web is full of annotated multimedia information


                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Introduction
                             Machine Vision Background
                                                           Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
                                                           Machine Vision Applications


How Informative is Image Representation?




           BOW assumption is strong in the visual domain: spatial
           context matters!
           Region-based representation is often too coarse and
           unstable.
           How can we retain the robustness of BOW while
           introducing spatial context?
                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                           Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                           Experimental Evaluation


Introduction


           Overtake flat region-based and keypoint-based
           approaches
           Structured representation conveying richer semantics and
           discriminative power than a BOW model without the rigidity
           of a part-based approach
           Key idea
                   Introduce a multi-resolution representation of visual content
                   drawing both on region-based and keypoint-based models
                   Introduce a multi-layered structure at the semantic level of
                   latent topic discovery




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                           Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                           Experimental Evaluation


Multi-Resolution Image Representation


           Multi-resolution    describing visual information at different
           spatial granularity
           BOW representation is based on assumption that spatial
           distribution of visual keywords can be neglected when
           determining the overall image appearance
                 Biased view holding mostly for corpora characterized by
                 small variability of the pictorial content
                 Need of structured approach that can isolate semantically
                 different image portions
           Text representation
                Documents are collections of paragraphs
                Paragraphs are collections of words



                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                            Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                            Experimental Evaluation


Region-Keypoints Representation


                                                           r1
                                                           r2
                                                           r3
                                                           r4                                wr
                                                                              bag−of−words
                                                           r5

                       segmented image                     ...

                                                     bag−of−regions



           Use an unsupervised learning model to extract perceptually
           coherent portions of the image rl
           Use dense sampling to extract a set of local visual descriptors
                                         Davide Bacciu      Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                               Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                               Experimental Evaluation


Hierarchical Region-Topic Probabilistic Latent
Semantic Analysis (HRPLSA)

                                                                          φ
                                                                               T




                                 d          z              t              w
                                                                               Wr
                                                                                     rd
                                                                                          D



   Complete model likelihood
                                                                                                           nilj
                           D     Ri    K                         Wl        T
      LC (N |θ) =                           P(zk |di )Zilk                        P(tp |zk )(φjp )Tkjp 
                          i=1 l=1 k =1                          j=1       p=1


                                         Davide Bacciu         Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                           Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                           Experimental Evaluation


Model Fitting

           HRPLSA parameters are fitted by applying Expectation
           Maximization to the complete log-likelihood, e.g.
                                        D         Ri                  Wl
                                        i=1       l=1 P(zk |di , rl ) j=1 nilj P(tp |zk , wj )
                P(tp |zk ) =                              Ri
                                                   D
                                                   i =1   l =1 P(zk |di , rl )ni l ·

           Model parameters θ
                   P(zk |di ) - Document-specific mixing proportions of region
                   topics
                   P(tp |zk ) - Mixing proportions of keyword topics given region
                   aspects
                   φjp - Multinomial keyword-topic distribution
           Probabilistic inference is extended to images outside the
           training pool by folding-in

                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                                    Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                                    Experimental Evaluation


Modeling Images and Words by HRPLSA
(HRPLSA-ANN)

                                                                                                  d
                                                sky

                                                building
                                                                   P (a|z)
                                                streetlight

                                                sing                              z1        z2        ...   zR
                                                awning
                                                                                            ...
                                                window        P (a|t)
                                                sidewalk
                                                                        t1       t2    ... tW1        t1    t2   ... tWR
                                                plants

                                                car

                                                road
                                                                        w1       w2       wW1         w1    w2      wWR



   Connection between region-topics (keyword-topics) and word is
   learned by maximizing constrained annotation log-likelihood
                                            D     F                          K
                     Lcal (N |θ ) =                        mfi log                P(af |zk )P(zk |di )
                                          i=1 f =1                       k =1

                                         Davide Bacciu              Probabilistic Generative Models for Machine Vision
Visual Content Representation
                              Machine Vision Background
                                                                 Hierarchical Probabilistic Latent Semantic Analysis
 Region Annotation by Hierarchical Region Topic Discovery
                                                                 Experimental Evaluation


Probabilistic Inference
A Few Examples


            Most likely visual aspect within an image di

                                             z i = arg max P(zk |di )
                                                                zk ∈Z

            Most likely visual aspect z ∗ for an image segment rl

                                          z ∗ = arg max P(zk |di , rl )
                                                            zk ∈Z

            Most likely annotation a∗ for region rl in image dnew
                                                            K
                                a∗ = arg max                     P(af |zk )P(zk |dnew )
                                               af ∈A
                                                       k =1



                                          Davide Bacciu          Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                            Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                            Experimental Evaluation


Unsupervised Discovery and Segmentation of Visual
Classes




           Microsoft Research Cambridge dataset (MSRC-B1)
                   240 images and 9 visual classes
                   Ground truth segmentation of visual classes
           Image segments are assigned to the most likely
           discovered aspect and segmentation overlap is measured
                                                           GTo ∩ Rr
                                                ρor =               .
                                                           GTo ∪ Rr

                                         Davide Bacciu      Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                           Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                           Experimental Evaluation


Semantic Segmentation




   Discovery of semantic visual aspects corrects segmentation
   errors


                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                                 Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                                 Experimental Evaluation


MSRC-B1 Semantic Segmentation Result

                               Table:     Segmentation Overlap on the MSRC-B1 Dataset

          Algorithm   Airplanes   Bicycles   Buildings    Cars   Cows    Faces    Grass    Tree   Sky    Average
           dLDA         0.43       0.50        0.16       0.45    0.52    0.44     0.60    0.71   0.74     0.51
          LDA10         0.11       0.06        0.09       0.14    0.11    0.40     0.41    0.69   0.41     0.27
          LDA15         0.08       0.56        0.06       0.14    0.14    0.43     0.57    0.62   0.41     0.33
          LDA20         0.10       0.50        0.40       0.15    0.58    0.45     0.54    0.46   0.50     0.40
          LDA25         0.14       0.52        0.21       0.17    0.48    0.44     0.45    0.59   0.50     0.39
         HPRLSA10
                15      0.32       0.37        0.59       0.68    0.74    0.44     0.87    0.81   0.90     0.62
         HPRLSA10
                20      0.35       0.56        0.65       0.67    0.74    0.47     0.89    0.47   0.91     0.63
         HPRLSA10
                25      0.35       0.52        0.75       0.63    0.73    0.50     0.89    0.71   0.86     0.66
         HPRLSA15
                20      0.36       0.49        0.56       0.60    0.68    0.52     0.89    0.77   0.91     0.64
         HPRLSA15
                25      0.42       0.52        0.61       0.68    0.69    0.45     0.90    0.61   0.92     0.65
         HPRLSA20
                25      0.44       0.43        0.48       0.52    0.67    0.48     0.88    0.75   0.91     0.61
                 10
         HPRLSA25       0.16       0.32        0.15       0.40    0.43    0.22     0.28    0.33   0.38     0.30
                 15
         HPRLSA25       0.24       0.31        0.13       0.38    0.37    0.19     0.25    0.21   0.45     0.28

   Comparative analysis
           HRPLSA
           Multi-segmentation Latent Dirichlet Allocation (LDA)
           Hierarchical Latent Dirichlet Allocation (hLDA)
                                          Davide Bacciu          Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                           Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                           Experimental Evaluation


MSRC-B1 Segmentation Example I




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                           Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                           Experimental Evaluation


MSRC-B1 Segmentation Example II




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                           Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                           Experimental Evaluation


MSRC-B1 Segmentation Example III




                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                             Machine Vision Background
                                                           Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
                                                           Experimental Evaluation


Image Annotation


           Washington Ground Truth dataset
                   854 images manually annotated with 392 words
                   427 training/test images
           LabelMe dataset
                   2688 images annotated with ∼ 400 words
                   800 training / 1888 test images
           Comparative analysis
                   HRPLSA-ANN
                   pLSA-FEATURES
                   Annotation Propagation (AP)
           Precision/recall analysis



                                         Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                                   Machine Vision Background
                                                                                    Hierarchical Probabilistic Latent Semantic Analysis
      Region Annotation by Hierarchical Region Topic Discovery
                                                                                    Experimental Evaluation


Washington Dataset
Image Annotation


                             Precision/Recall Plot                                 Comparative precision/recall results on the Washington dataset



             0.8
                                                                 plsa245
            0.75                                                 hrplsa245
                                                                 ap245
             0.7
                                                                 plsa188

            0.65                                                 hrplsa188
                                                                 ap188
             0.6
precision




                                                                 plsa158
                                                                 hrplsa158
            0.55
                                                                 ap158

             0.5

            0.45

             0.4

            0.35
                   0   0.1   0.2    0.3            0.4   0.5     0.6         0.7
                                          recall




                                                               Davide Bacciu        Probabilistic Generative Models for Machine Vision
Visual Content Representation
                              Machine Vision Background
                                                            Hierarchical Probabilistic Latent Semantic Analysis
 Region Annotation by Hierarchical Region Topic Discovery
                                                            Experimental Evaluation


Washington Dataset
Image Annotation Example




                                          Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                              Machine Vision Background
                                                                                            Hierarchical Probabilistic Latent Semantic Analysis
 Region Annotation by Hierarchical Region Topic Discovery
                                                                                            Experimental Evaluation


Washington Dataset
Region Annotation Example (I)


                    Examples of good HRPLSA-ANN annotations

                    50                                                          50
                               clouds                                                                                                          tree
                                                            clouds                               tree
                   100                                                        100
                                                                                                                                        building
                                                                              150                                         tree tree
                   150
                                                                                              tree
                   200
                                                                     clouds   200
                           clouds
                                              sky
                                                                              250                                                       tree
                   250
                                                                                                 tree
                                                                              300
                   300

                                                                     house    350
                   350                    tree
                                                                              400
                   400     people                                                                       tree                 sidewalk    tree
                                                            people            450
                   450
                                                                              500




                                         50
                                                                                 clear
                                        100          clear
                                                                                                         house
                                        150
                                                                                                                        window
                                        200


                                        250         tree             building
                                                                                     tree
                                        300                                                 building

                                        350
                                                                                                                  tree
                                        400                tree

                                                    tree                           tree
                                        450
                                                                           building                              tree

                                        500



                                                 Davide Bacciu                              Probabilistic Generative Models for Machine Vision
Visual Content Representation
                              Machine Vision Background
                                                                                             Hierarchical Probabilistic Latent Semantic Analysis
 Region Annotation by Hierarchical Region Topic Discovery
                                                                                             Experimental Evaluation


Washington Dataset
Region Annotation Example (II)


                         Examples of bad HRPLSA-ANN annotations

                    50                                                               50
                                                                            tree                  clouds
                                  tree                    tree                                                          clouds           clouds
                   100                                                              100
                                         tree
                   150                                            tree              150


                   200                                                              200                    water                 sky
                           tree                  tree
                   250                                                              250
                                                 tree                       tree                                                        tree
                                                                                                     tree
                   300                                                              300
                                                                 tree
                   350
                                                 tree                               350

                                                                                                                          sky
                   400                                                              400
                           tree                                                                     sky                                tree
                                                                             bush
                   450                                                              450
                                                 flower
                   500                                                              500




                                                 50
                                                                                    overcast
                                                                                                              sky
                                                                   people
                                                100


                                                150

                                                                                          house                clouds
                                                200


                                                250                 tree
                                                                                                            people
                                                300


                                                350
                                                                                      tree
                                                400
                                                                 tree                                        tree
                                                450


                                                500



                                                        Davide Bacciu                        Probabilistic Generative Models for Machine Vision
Visual Content Representation
                              Machine Vision Background
                                                            Hierarchical Probabilistic Latent Semantic Analysis
 Region Annotation by Hierarchical Region Topic Discovery
                                                            Experimental Evaluation


Washington Dataset
Annotated Segments




                                          Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                              Machine Vision Background
                                                                             Hierarchical Probabilistic Latent Semantic Analysis
 Region Annotation by Hierarchical Region Topic Discovery
                                                                             Experimental Evaluation


LabelMe Dataset
Image Annotation

                                                      Precision/Recall Plot

                                      0.8
                                                                                               plsa60
                                     0.75                                                      hrplsa60,80
                                                                                               plsa25
                                      0.7
                                                                                               hrplsa25,50
                                                                                               ap
                                     0.65
                         Precision




                                      0.6

                                     0.55

                                      0.5

                                     0.45

                                      0.4

                                     0.35
                                            0   0.1       0.2     0.3            0.4   0.5      0.6          0.7
                                                                        Recall




                                                      Davide Bacciu          Probabilistic Generative Models for Machine Vision
Visual Content Representation
                              Machine Vision Background
                                                            Hierarchical Probabilistic Latent Semantic Analysis
 Region Annotation by Hierarchical Region Topic Discovery
                                                            Experimental Evaluation


LabelMe Dataset
Example of Annotated Images




                                          Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                                         Machine Vision Background
                                                                           Hierarchical Probabilistic Latent Semantic Analysis
            Region Annotation by Hierarchical Region Topic Discovery
                                                                           Experimental Evaluation


LabelMe Dataset
Recalling Specific Terms



                                      Balcony                                                                     Sign
                                        balcony                                                                    sign
                   0.45                                                                        0.45
                                                               hrplsa                                                              hrplsa
                    0.4                                        plsa                             0.4                                plsa
                                                               ap                                                                  ap
                   0.35                                                                        0.35

                    0.3                                                                         0.3
precision/recall




                                                                            precision/recall
                   0.25                                                                        0.25

                    0.2                                                                         0.2

                   0.15                                                                        0.15

                    0.1                                                                         0.1

                   0.05                                                                        0.05

                     0                                                                           0
                          precision               recall                                              precision           recall




                                                           Davide Bacciu   Probabilistic Generative Models for Machine Vision
Visual Content Representation
                              Machine Vision Background
                                                                        Hierarchical Probabilistic Latent Semantic Analysis
 Region Annotation by Hierarchical Region Topic Discovery
                                                                        Experimental Evaluation


LabelMe Dataset
Region Annotation Issue I



     Region annotation tends to predict only a limited number of
     words

  0.16                                                                                      coast                                forest
                                                                           400                                  2000
  0.14                                                                     200                                  1000

  0.12                                                                       0                                    0
                                                                                  0   100      200        300          0   100            200   300
                                                                                         highway                                   city
   0.1                                                                     1000                                 2000
                                                                           500                                  1000
  0.08
                                                                             0                                    0
                                                                                  0   100     200         300          0   100         200      300
  0.06                                                                                  mountain                                 country
                                                                           1000                                 1000
  0.04
                                                                           500                                  500
  0.02                                                                       0                                    0
                                                                                  0   100         200     300          0   100         200      300
                                                                                            street                           tallbuilding
    0                                                                      2000                                 2000
             g




                          n
                        on




                                             er
                         ld




                       ad




                                                         w
                                                          e

                                                         er
                          r




                               a

                                    y
                 ca




                                   sk
          in




                       ai




                                                  tre
                              se




                                                     do
                     fie




                                         ap



                                                      at
                     rs

                    ro
         ild




                    nt




                                                                           1000                                 1000
                                                    w

                                                   in
                  pe




                                        cr
                      ou
     bu




                                                        w
                                    ys
                      m




                                   sk




                                                                             0                                    0
                                                                                  0   100           200   300          0   100            200   300




                                                        Davide Bacciu   Probabilistic Generative Models for Machine Vision
Visual Content Representation
                              Machine Vision Background
                                                            Hierarchical Probabilistic Latent Semantic Analysis
 Region Annotation by Hierarchical Region Topic Discovery
                                                            Experimental Evaluation


LabelMe Dataset
Region Annotation Issue II

                  Distribution of region-topics in mountain caption




                                                                     Topic 11                Topic 40




                                                                                 Topic 20




    The unbalanced distribution of term occurrences introduces a bias in
    the generation of the segment caption that yields to distinct region
    aspects being associated to the same frequently co-occurring word

                                          Davide Bacciu     Probabilistic Generative Models for Machine Vision
Visual Content Representation
                              Machine Vision Background
                                                            Hierarchical Probabilistic Latent Semantic Analysis
 Region Annotation by Hierarchical Region Topic Discovery
                                                            Experimental Evaluation


LabelMe Dataset
Annotated Segments




                                          Davide Bacciu     Probabilistic Generative Models for Machine Vision
Concluding Remarks
                          Conclusion
                                       Bibliography and Online Resources


Wrapping up...

     Inspiration can be sought in related application fields
          Bag of words representation
          Generative models for text
          Need to pay attention to the underlying assumptions of a
          model!
     Hierarchical multi-resolution latent aspect model (HRPLSA)
          Unsupervised semantic segmentation of visual content
          Multi-level image captioning
     Future challenges
          Generalize multi-resolution representation and topic
          hierarchy beyond a two-layered structure
          Video/multimedia understanding



                       Davide Bacciu   Probabilistic Generative Models for Machine Vision
Concluding Remarks
                             Conclusion
                                          Bibliography and Online Resources


Online Resources
  A notable introductory course to object recognition (with code)
       http://people.csail.mit.edu/torralba/
       shortCourseRLOC/
  Code repositories
       http://www.cs.ubc.ca/~lowe/keypoints/ (Original
       SIFT software by D. Lowe)
       http://vlfeat.org/ (SIFT, MSER and VQ routines)
       www.robots.ox.ac.uk/~vgg/research/affine/ (Affine
       detectors and more)
       http://www.di.ens.fr/~russell/projects/mult_
       seg_discovery/index.html (Multiple segmentation LDA)
  Image annotation tool and repository
       http://labelme.csail.mit.edu/ (LabelMe)
                          Davide Bacciu   Probabilistic Generative Models for Machine Vision
Concluding Remarks
                                        Conclusion
                                                          Bibliography and Online Resources


Bibliography

     D. Bacciu, A Perceptual Learning Model to Discover the Hierarchical Latent Structure of Image Collections,
     Ph.D. Thesis, 2008.
     J. Sivic and A. Zisserman. Video google: a text retrieval approach to object matching in videos.
     Proceedings of the Ninth IEEE International Conference on Computer Vision, ICCV 2003, pages
     1470–1477, vol.2, Oct. 2003.
     J. Sivic, B.C. Russell, A.A. Efros, A. Zisserman, and W.T. Freeman. Discovering objects and their location in
     images. Proceedings of the Tenth IEEE International Conference on Computer Vision, ICCV 2005,
     370–377, Oct. 2005.
     J. Sivic, B. C. Russell, A. Zisserman, W. T. Freeman, and A. A. Efros. Unsupervised discovery of visual
     object class hierarchies. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
     2008.
     K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. M. Blei, and M. I. Jordan. Matching words and pictures.
     Journal of Machine Learning Research, vol. 3, 1107–1135, Feb. 2003.
     D. M. Blei and M. I. Jordan. Modeling annotated data. Proceedings of the International Conference on
     Research and Development in Information Retrieval, SIGIR 2003, Aug. 2003.
     F. Monay and D. Gatica-Perez. Modeling semantic aspects for cross-media image indexing. IEEE
     Transactions on Pattern Analysis and Machine Intelligence, 29(10):1802–1817, Oct. 2007.




                                    Davide Bacciu         Probabilistic Generative Models for Machine Vision
Concluding Remarks
   Conclusion
                Bibliography and Online Resources




       Thank You




Davide Bacciu   Probabilistic Generative Models for Machine Vision

Weitere ähnliche Inhalte

Andere mochten auch

Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)KU Leuven
 
Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses
Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin GlassesAnalyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses
Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin GlassesMartin Pelikan
 
Lets eat presentation_final_20160521
Lets eat presentation_final_20160521Lets eat presentation_final_20160521
Lets eat presentation_final_20160521Lesley Chapman
 
CDTW Capstone Presentation
CDTW Capstone Presentation CDTW Capstone Presentation
CDTW Capstone Presentation Todd Rutherford
 
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...Brittne Kakulla, Ph.D.
 
Analysis of differential investor performance captstone presentation final
Analysis of differential investor  performance   captstone  presentation finalAnalysis of differential investor  performance   captstone  presentation final
Analysis of differential investor performance captstone presentation finalHoward Ho
 
Capital Bikeshare Presentation
Capital Bikeshare PresentationCapital Bikeshare Presentation
Capital Bikeshare Presentationdonahuerm
 
Deep Advances in Generative Modeling
Deep Advances in Generative ModelingDeep Advances in Generative Modeling
Deep Advances in Generative Modelingindico data
 
Machine learning
Machine learningMachine learning
Machine learningShreyas G S
 
Machine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative ModelsMachine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative Modelsbutest
 
Georgetown Data Analytics - Team 1 Capstone Project
Georgetown Data Analytics - Team 1 Capstone ProjectGeorgetown Data Analytics - Team 1 Capstone Project
Georgetown Data Analytics - Team 1 Capstone ProjectMark Phillips
 
Georgetown Data Analytics Project (Team DC)
Georgetown Data Analytics Project (Team DC)Georgetown Data Analytics Project (Team DC)
Georgetown Data Analytics Project (Team DC)Noah Turner
 
Discriminant analysis basicrelationships
Discriminant analysis basicrelationshipsDiscriminant analysis basicrelationships
Discriminant analysis basicrelationshipsdivyakalsi89
 
Modeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John GloverModeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John GloverSebastian Ruder
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Salah Amean
 
Iris data analysis example in R
Iris data analysis example in RIris data analysis example in R
Iris data analysis example in RDuyen Do
 

Andere mochten auch (20)

Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)
 
Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses
Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin GlassesAnalyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses
Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses
 
Lets eat presentation_final_20160521
Lets eat presentation_final_20160521Lets eat presentation_final_20160521
Lets eat presentation_final_20160521
 
Red Blue Presentation
Red Blue PresentationRed Blue Presentation
Red Blue Presentation
 
CDTW Capstone Presentation
CDTW Capstone Presentation CDTW Capstone Presentation
CDTW Capstone Presentation
 
Personalizing a Stream of Content
Personalizing a Stream of ContentPersonalizing a Stream of Content
Personalizing a Stream of Content
 
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
 
Analysis of differential investor performance captstone presentation final
Analysis of differential investor  performance   captstone  presentation finalAnalysis of differential investor  performance   captstone  presentation final
Analysis of differential investor performance captstone presentation final
 
Capital Bikeshare Presentation
Capital Bikeshare PresentationCapital Bikeshare Presentation
Capital Bikeshare Presentation
 
Deep Advances in Generative Modeling
Deep Advances in Generative ModelingDeep Advances in Generative Modeling
Deep Advances in Generative Modeling
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative ModelsMachine Learning: Generative and Discriminative Models
Machine Learning: Generative and Discriminative Models
 
Machine learning
Machine learningMachine learning
Machine learning
 
Georgetown Data Analytics - Team 1 Capstone Project
Georgetown Data Analytics - Team 1 Capstone ProjectGeorgetown Data Analytics - Team 1 Capstone Project
Georgetown Data Analytics - Team 1 Capstone Project
 
Georgetown Data Analytics Project (Team DC)
Georgetown Data Analytics Project (Team DC)Georgetown Data Analytics Project (Team DC)
Georgetown Data Analytics Project (Team DC)
 
Discriminant analysis basicrelationships
Discriminant analysis basicrelationshipsDiscriminant analysis basicrelationships
Discriminant analysis basicrelationships
 
Hotel Performance FINAL
Hotel Performance FINALHotel Performance FINAL
Hotel Performance FINAL
 
Modeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John GloverModeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John Glover
 
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
Data Mining Concepts and Techniques, Chapter 10. Cluster Analysis: Basic Conc...
 
Iris data analysis example in R
Iris data analysis example in RIris data analysis example in R
Iris data analysis example in R
 

Ähnlich wie Probabilistic generative models for machine vision

Handwritten and Machine Printed Text Separation in Document Images using the ...
Handwritten and Machine Printed Text Separation in Document Images using the ...Handwritten and Machine Printed Text Separation in Document Images using the ...
Handwritten and Machine Printed Text Separation in Document Images using the ...Konstantinos Zagoris
 
Computer Assisted Manufacturing
Computer Assisted ManufacturingComputer Assisted Manufacturing
Computer Assisted Manufacturingvrt-medialab
 
A NOVAL ARTECHTURE FOR 3D MODEL IN VIRTUAL COMMUNITIES FROM FACE DETECTION
A NOVAL ARTECHTURE FOR 3D MODEL IN VIRTUAL COMMUNITIES FROM FACE DETECTIONA NOVAL ARTECHTURE FOR 3D MODEL IN VIRTUAL COMMUNITIES FROM FACE DETECTION
A NOVAL ARTECHTURE FOR 3D MODEL IN VIRTUAL COMMUNITIES FROM FACE DETECTIONIJASCSE
 
Multi-touch 3D-set modeler
Multi-touch 3D-set modelerMulti-touch 3D-set modeler
Multi-touch 3D-set modelervrt-medialab
 
iMinds The Conference 2012: Adrian Munteanu
iMinds The Conference 2012: Adrian MunteanuiMinds The Conference 2012: Adrian Munteanu
iMinds The Conference 2012: Adrian Munteanuimec
 
426 Lecture 9: Research Directions in AR
426 Lecture 9: Research Directions in AR426 Lecture 9: Research Directions in AR
426 Lecture 9: Research Directions in ARMark Billinghurst
 
AI Use Cases: Special Attention on Semantic Segmentation
AI Use Cases: Special Attention on Semantic SegmentationAI Use Cases: Special Attention on Semantic Segmentation
AI Use Cases: Special Attention on Semantic SegmentationFrederick Apina
 
P02 sparse coding cvpr2012 deep learning methods for vision
P02 sparse coding cvpr2012 deep learning methods for visionP02 sparse coding cvpr2012 deep learning methods for vision
P02 sparse coding cvpr2012 deep learning methods for visionzukun
 
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)npinto
 
01 introduction image processing analysis
01 introduction image processing analysis01 introduction image processing analysis
01 introduction image processing analysisRumah Belajar
 
2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]imec.archive
 
2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]imec.archive
 

Ähnlich wie Probabilistic generative models for machine vision (17)

Handwritten and Machine Printed Text Separation in Document Images using the ...
Handwritten and Machine Printed Text Separation in Document Images using the ...Handwritten and Machine Printed Text Separation in Document Images using the ...
Handwritten and Machine Printed Text Separation in Document Images using the ...
 
Computer Assisted Manufacturing
Computer Assisted ManufacturingComputer Assisted Manufacturing
Computer Assisted Manufacturing
 
A NOVAL ARTECHTURE FOR 3D MODEL IN VIRTUAL COMMUNITIES FROM FACE DETECTION
A NOVAL ARTECHTURE FOR 3D MODEL IN VIRTUAL COMMUNITIES FROM FACE DETECTIONA NOVAL ARTECHTURE FOR 3D MODEL IN VIRTUAL COMMUNITIES FROM FACE DETECTION
A NOVAL ARTECHTURE FOR 3D MODEL IN VIRTUAL COMMUNITIES FROM FACE DETECTION
 
Vb
VbVb
Vb
 
Multi-touch 3D-set modeler
Multi-touch 3D-set modelerMulti-touch 3D-set modeler
Multi-touch 3D-set modeler
 
426 lecture2: AR Technology
426 lecture2: AR Technology426 lecture2: AR Technology
426 lecture2: AR Technology
 
iMinds The Conference 2012: Adrian Munteanu
iMinds The Conference 2012: Adrian MunteanuiMinds The Conference 2012: Adrian Munteanu
iMinds The Conference 2012: Adrian Munteanu
 
426 Lecture 9: Research Directions in AR
426 Lecture 9: Research Directions in AR426 Lecture 9: Research Directions in AR
426 Lecture 9: Research Directions in AR
 
Ib2
Ib2Ib2
Ib2
 
Sa 008 patterns
Sa 008 patternsSa 008 patterns
Sa 008 patterns
 
AI Use Cases: Special Attention on Semantic Segmentation
AI Use Cases: Special Attention on Semantic SegmentationAI Use Cases: Special Attention on Semantic Segmentation
AI Use Cases: Special Attention on Semantic Segmentation
 
1.pdf
1.pdf1.pdf
1.pdf
 
P02 sparse coding cvpr2012 deep learning methods for vision
P02 sparse coding cvpr2012 deep learning methods for visionP02 sparse coding cvpr2012 deep learning methods for vision
P02 sparse coding cvpr2012 deep learning methods for vision
 
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
 
01 introduction image processing analysis
01 introduction image processing analysis01 introduction image processing analysis
01 introduction image processing analysis
 
2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]
 
2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]
 

Mehr von zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

Mehr von zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Kürzlich hochgeladen

fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Kürzlich hochgeladen (20)

fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

Probabilistic generative models for machine vision

  • 1. Machine Vision Background Region Annotation by Hierarchical Region Topic Discovery Probabilistic Generative Models for Machine Vision Davide Bacciu bacciu@di.unipi.it Dipartimento di Informatica Università di Pisa Corso di Intelligenza Artificiale Prof. Alessandro Sperduti Università degli Studi di Padova A.A. 2008/2009 Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 2. Machine Vision Background Region Annotation by Hierarchical Region Topic Discovery Outline 1 Machine Vision Background Introduction Visual Content Representation Machine Vision Applications 2 Region Annotation by Hierarchical Region Topic Discovery Visual Content Representation Hierarchical Probabilistic Latent Semantic Analysis Experimental Evaluation Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 3. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Introduction Visual content representation Visual features - how to identify and describe the single bits of visual information Image representation - how to describe the visual information within a picture Machine vision applications Scene and object recognition Image annotation Focus on Probabilistic Generative Models and the Bag of Words image representation Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 4. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Visual Features Descriptors How do we describe the single bits of visual information? Global descriptors Color histograms Shape features Texture Spatial Envelope Local descriptors Gabor filters Differential filters Distribution-based descriptors Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 5. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Scale Invariant Feature Transform (SIFT) Calculate gradient magnitude and orientation at a given keypoint (x, y ) and scale σ Compute 3D histogram of magnitude orientations in a 4 × 4 neighborhood of the keypoint Angles are quantized to 8 directions Robust to geometric and illumination distortion Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 6. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Visual Features Detectors How do we identify the interesting/informative parts of an image? Feature detectors Segment/Blob detectors Corner/Interest point detectors Affine invariant keypoint detectors Dense sampling Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 7. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Image Representation Straightforward for global descriptors An image corresponds to its descriptor Images are confronted, classified and indexed based on their global color and/or texture histograms Requires an additional step for local descriptors Local descriptors provide only information on a portion of the scene Probabilistic modeling of segments Bag of words Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 8. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Bag of Words Document Representation Count the occurrences of each dictionary word in your document Represent the document as a vector of word counts Definition (Bag of Words Assumption) Word order is not relevant for determining document semantics Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 9. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Bag of Words Image Representation Each image is a document and each visual patch is a word Count the occurrences of each dictionary visual word (visterm) in your image Represent the image as a vector of visterm counts (histogram) ... k−means Image Codebook Collection Local patches Vector Quantization Image Input Image BOW Collection Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 10. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Image Understanding Applications Images are typically represented as vector of features Discover the semantics of an image from its vectorial representation Object recognition (e.g. Airplane) Scene recognition (e.g. Forest) Image annotation (e.g. Sky, Tree, Boat, Sea, ...) Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 11. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Scene and Object Recognition Structured approaches Place Context Object/scene categories are represented as collections of connected parts characterized by Object Context distinctive appearance and spatial Part Context position Part-based probabilistic models Weakly structured approaches PixelContext Based on bag-of-word assumption Latent aspect models Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 12. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Latent Aspect Discovery Rooted into text processing Probabilistic models describing the generative process of image content using hidden (i.e. latent) variables Model likelihood is used to measure the fit of the unknown parameters with respect to the observations Model parameters are fitted by Expectation Maximization (EM) or variational methods Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 13. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Probabilistic Latent Semantic Analysis (pLSA) (I) P(d) P(z|d) P(w|z) d z w Wd D Observations are collected in the integer matrix n(wj , di ) Every observation is associated to a latent topic k by means of the hidden variable zk Each hidden topic k denotes a particular visual theme (e.g. an object) Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 14. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Probabilistic Latent Semantic Analysis (pLSA) (II) P(d) P(z|d) P(w|z) d z w Wd D pLSA is trained by Expectation Maximization (EM) Conditional independence is used to decompose model likelihood D W L(N |θ) = P(wj , di )n(wj ,di ) i=1 j=1 K P(wj , di ) = P(di )P(wj |di ) = P(di ) P(zk |di )P(wj |zk ) k =1 Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 15. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications An Object Recognition Example Caltech-4 Dataset: cars, faces, motorbikes and airplanes on a cluttered background Fit a pLSA model with 7 latent topics (4 objects + 3 background) Determine the most likely topic of each visual word in image di P(zk |di )P(wj |zk ) P(zk |wj , di ) = K k =1 P(zk |di )P(wj |zk ) Figure: Images by Sivic et al 2005 Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 16. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Latent Dirichlet Allocation (LDA) β φ K α θ z w Wd D pLSA provides a generative model only for the training data LDA treats P(zk |di ) as a latent random variable, sampling it from a Dirichlet distribution P(θ|α) Maximizes data likelihood P(w|φ, α, β) = P(w|z, φ)P(z|θ)P(θ|α)P(φ|β)dθ z Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 17. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Hierarchical Latent Topic Models... Dependent Pachinko Allocation Model (Li and McCallum, 2006) Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 18. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Image Annotation Learn association between image content and text Typically exploits region-based image representation Image segmentation Color and texture descriptors Non-probabilistic image annotation Information retrieval approaches (e.g. vector based) Multiple instance learning Probabilistic Image Annotation Machine translation approach Probabilistic generative Latent topic models Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 19. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Latent Aspect Models in Image Annotation µ µ z b α θ z b Bd σ Bd σ α θ v t β v t β Md Md D D GM-LDA CORR-LDA Based on a Mixture of Gaussians modeling of the Bd image regions and a multinomial distribution for annotations Md Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 20. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications Image Annotation - Is it Worth the Hassle? The Web is full of annotated multimedia information Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 21. Introduction Machine Vision Background Visual Content Representation Region Annotation by Hierarchical Region Topic Discovery Machine Vision Applications How Informative is Image Representation? BOW assumption is strong in the visual domain: spatial context matters! Region-based representation is often too coarse and unstable. How can we retain the robustness of BOW while introducing spatial context? Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 22. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Introduction Overtake flat region-based and keypoint-based approaches Structured representation conveying richer semantics and discriminative power than a BOW model without the rigidity of a part-based approach Key idea Introduce a multi-resolution representation of visual content drawing both on region-based and keypoint-based models Introduce a multi-layered structure at the semantic level of latent topic discovery Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 23. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Multi-Resolution Image Representation Multi-resolution describing visual information at different spatial granularity BOW representation is based on assumption that spatial distribution of visual keywords can be neglected when determining the overall image appearance Biased view holding mostly for corpora characterized by small variability of the pictorial content Need of structured approach that can isolate semantically different image portions Text representation Documents are collections of paragraphs Paragraphs are collections of words Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 24. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Region-Keypoints Representation r1 r2 r3 r4 wr bag−of−words r5 segmented image ... bag−of−regions Use an unsupervised learning model to extract perceptually coherent portions of the image rl Use dense sampling to extract a set of local visual descriptors Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 25. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Hierarchical Region-Topic Probabilistic Latent Semantic Analysis (HRPLSA) φ T d z t w Wr rd D Complete model likelihood  nilj D Ri K Wl T LC (N |θ) = P(zk |di )Zilk  P(tp |zk )(φjp )Tkjp  i=1 l=1 k =1 j=1 p=1 Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 26. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Model Fitting HRPLSA parameters are fitted by applying Expectation Maximization to the complete log-likelihood, e.g. D Ri Wl i=1 l=1 P(zk |di , rl ) j=1 nilj P(tp |zk , wj ) P(tp |zk ) = Ri D i =1 l =1 P(zk |di , rl )ni l · Model parameters θ P(zk |di ) - Document-specific mixing proportions of region topics P(tp |zk ) - Mixing proportions of keyword topics given region aspects φjp - Multinomial keyword-topic distribution Probabilistic inference is extended to images outside the training pool by folding-in Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 27. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Modeling Images and Words by HRPLSA (HRPLSA-ANN) d sky building P (a|z) streetlight sing z1 z2 ... zR awning ... window P (a|t) sidewalk t1 t2 ... tW1 t1 t2 ... tWR plants car road w1 w2 wW1 w1 w2 wWR Connection between region-topics (keyword-topics) and word is learned by maximizing constrained annotation log-likelihood D F K Lcal (N |θ ) = mfi log P(af |zk )P(zk |di ) i=1 f =1 k =1 Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 28. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Probabilistic Inference A Few Examples Most likely visual aspect within an image di z i = arg max P(zk |di ) zk ∈Z Most likely visual aspect z ∗ for an image segment rl z ∗ = arg max P(zk |di , rl ) zk ∈Z Most likely annotation a∗ for region rl in image dnew K a∗ = arg max P(af |zk )P(zk |dnew ) af ∈A k =1 Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 29. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Unsupervised Discovery and Segmentation of Visual Classes Microsoft Research Cambridge dataset (MSRC-B1) 240 images and 9 visual classes Ground truth segmentation of visual classes Image segments are assigned to the most likely discovered aspect and segmentation overlap is measured GTo ∩ Rr ρor = . GTo ∪ Rr Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 30. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Semantic Segmentation Discovery of semantic visual aspects corrects segmentation errors Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 31. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation MSRC-B1 Semantic Segmentation Result Table: Segmentation Overlap on the MSRC-B1 Dataset Algorithm Airplanes Bicycles Buildings Cars Cows Faces Grass Tree Sky Average dLDA 0.43 0.50 0.16 0.45 0.52 0.44 0.60 0.71 0.74 0.51 LDA10 0.11 0.06 0.09 0.14 0.11 0.40 0.41 0.69 0.41 0.27 LDA15 0.08 0.56 0.06 0.14 0.14 0.43 0.57 0.62 0.41 0.33 LDA20 0.10 0.50 0.40 0.15 0.58 0.45 0.54 0.46 0.50 0.40 LDA25 0.14 0.52 0.21 0.17 0.48 0.44 0.45 0.59 0.50 0.39 HPRLSA10 15 0.32 0.37 0.59 0.68 0.74 0.44 0.87 0.81 0.90 0.62 HPRLSA10 20 0.35 0.56 0.65 0.67 0.74 0.47 0.89 0.47 0.91 0.63 HPRLSA10 25 0.35 0.52 0.75 0.63 0.73 0.50 0.89 0.71 0.86 0.66 HPRLSA15 20 0.36 0.49 0.56 0.60 0.68 0.52 0.89 0.77 0.91 0.64 HPRLSA15 25 0.42 0.52 0.61 0.68 0.69 0.45 0.90 0.61 0.92 0.65 HPRLSA20 25 0.44 0.43 0.48 0.52 0.67 0.48 0.88 0.75 0.91 0.61 10 HPRLSA25 0.16 0.32 0.15 0.40 0.43 0.22 0.28 0.33 0.38 0.30 15 HPRLSA25 0.24 0.31 0.13 0.38 0.37 0.19 0.25 0.21 0.45 0.28 Comparative analysis HRPLSA Multi-segmentation Latent Dirichlet Allocation (LDA) Hierarchical Latent Dirichlet Allocation (hLDA) Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 32. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation MSRC-B1 Segmentation Example I Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 33. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation MSRC-B1 Segmentation Example II Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 34. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation MSRC-B1 Segmentation Example III Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 35. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Image Annotation Washington Ground Truth dataset 854 images manually annotated with 392 words 427 training/test images LabelMe dataset 2688 images annotated with ∼ 400 words 800 training / 1888 test images Comparative analysis HRPLSA-ANN pLSA-FEATURES Annotation Propagation (AP) Precision/recall analysis Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 36. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Washington Dataset Image Annotation Precision/Recall Plot Comparative precision/recall results on the Washington dataset 0.8 plsa245 0.75 hrplsa245 ap245 0.7 plsa188 0.65 hrplsa188 ap188 0.6 precision plsa158 hrplsa158 0.55 ap158 0.5 0.45 0.4 0.35 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 recall Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 37. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Washington Dataset Image Annotation Example Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 38. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Washington Dataset Region Annotation Example (I) Examples of good HRPLSA-ANN annotations 50 50 clouds tree clouds tree 100 100 building 150 tree tree 150 tree 200 clouds 200 clouds sky 250 tree 250 tree 300 300 house 350 350 tree 400 400 people tree sidewalk tree people 450 450 500 50 clear 100 clear house 150 window 200 250 tree building tree 300 building 350 tree 400 tree tree tree 450 building tree 500 Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 39. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Washington Dataset Region Annotation Example (II) Examples of bad HRPLSA-ANN annotations 50 50 tree clouds tree tree clouds clouds 100 100 tree 150 tree 150 200 200 water sky tree tree 250 250 tree tree tree tree 300 300 tree 350 tree 350 sky 400 400 tree sky tree bush 450 450 flower 500 500 50 overcast sky people 100 150 house clouds 200 250 tree people 300 350 tree 400 tree tree 450 500 Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 40. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation Washington Dataset Annotated Segments Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 41. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation LabelMe Dataset Image Annotation Precision/Recall Plot 0.8 plsa60 0.75 hrplsa60,80 plsa25 0.7 hrplsa25,50 ap 0.65 Precision 0.6 0.55 0.5 0.45 0.4 0.35 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Recall Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 42. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation LabelMe Dataset Example of Annotated Images Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 43. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation LabelMe Dataset Recalling Specific Terms Balcony Sign balcony sign 0.45 0.45 hrplsa hrplsa 0.4 plsa 0.4 plsa ap ap 0.35 0.35 0.3 0.3 precision/recall precision/recall 0.25 0.25 0.2 0.2 0.15 0.15 0.1 0.1 0.05 0.05 0 0 precision recall precision recall Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 44. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation LabelMe Dataset Region Annotation Issue I Region annotation tends to predict only a limited number of words 0.16 coast forest 400 2000 0.14 200 1000 0.12 0 0 0 100 200 300 0 100 200 300 highway city 0.1 1000 2000 500 1000 0.08 0 0 0 100 200 300 0 100 200 300 0.06 mountain country 1000 1000 0.04 500 500 0.02 0 0 0 100 200 300 0 100 200 300 street tallbuilding 0 2000 2000 g n on er ld ad w e er r a y ca sk in ai tre se do fie ap at rs ro ild nt 1000 1000 w in pe cr ou bu w ys m sk 0 0 0 100 200 300 0 100 200 300 Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 45. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation LabelMe Dataset Region Annotation Issue II Distribution of region-topics in mountain caption Topic 11 Topic 40 Topic 20 The unbalanced distribution of term occurrences introduces a bias in the generation of the segment caption that yields to distinct region aspects being associated to the same frequently co-occurring word Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 46. Visual Content Representation Machine Vision Background Hierarchical Probabilistic Latent Semantic Analysis Region Annotation by Hierarchical Region Topic Discovery Experimental Evaluation LabelMe Dataset Annotated Segments Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 47. Concluding Remarks Conclusion Bibliography and Online Resources Wrapping up... Inspiration can be sought in related application fields Bag of words representation Generative models for text Need to pay attention to the underlying assumptions of a model! Hierarchical multi-resolution latent aspect model (HRPLSA) Unsupervised semantic segmentation of visual content Multi-level image captioning Future challenges Generalize multi-resolution representation and topic hierarchy beyond a two-layered structure Video/multimedia understanding Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 48. Concluding Remarks Conclusion Bibliography and Online Resources Online Resources A notable introductory course to object recognition (with code) http://people.csail.mit.edu/torralba/ shortCourseRLOC/ Code repositories http://www.cs.ubc.ca/~lowe/keypoints/ (Original SIFT software by D. Lowe) http://vlfeat.org/ (SIFT, MSER and VQ routines) www.robots.ox.ac.uk/~vgg/research/affine/ (Affine detectors and more) http://www.di.ens.fr/~russell/projects/mult_ seg_discovery/index.html (Multiple segmentation LDA) Image annotation tool and repository http://labelme.csail.mit.edu/ (LabelMe) Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 49. Concluding Remarks Conclusion Bibliography and Online Resources Bibliography D. Bacciu, A Perceptual Learning Model to Discover the Hierarchical Latent Structure of Image Collections, Ph.D. Thesis, 2008. J. Sivic and A. Zisserman. Video google: a text retrieval approach to object matching in videos. Proceedings of the Ninth IEEE International Conference on Computer Vision, ICCV 2003, pages 1470–1477, vol.2, Oct. 2003. J. Sivic, B.C. Russell, A.A. Efros, A. Zisserman, and W.T. Freeman. Discovering objects and their location in images. Proceedings of the Tenth IEEE International Conference on Computer Vision, ICCV 2005, 370–377, Oct. 2005. J. Sivic, B. C. Russell, A. Zisserman, W. T. Freeman, and A. A. Efros. Unsupervised discovery of visual object class hierarchies. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008. K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. M. Blei, and M. I. Jordan. Matching words and pictures. Journal of Machine Learning Research, vol. 3, 1107–1135, Feb. 2003. D. M. Blei and M. I. Jordan. Modeling annotated data. Proceedings of the International Conference on Research and Development in Information Retrieval, SIGIR 2003, Aug. 2003. F. Monay and D. Gatica-Perez. Modeling semantic aspects for cross-media image indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1802–1817, Oct. 2007. Davide Bacciu Probabilistic Generative Models for Machine Vision
  • 50. Concluding Remarks Conclusion Bibliography and Online Resources Thank You Davide Bacciu Probabilistic Generative Models for Machine Vision