Novel Approaches to Natural Scene Categorization
                    Amit Prabhudesai
                   Roll No. 04307002
                  amitp@ee.iitb.ac.in


                 M.Tech Thesis Defence
                  Under the guidance of
                Prof. Subhasis Chaudhuri
          Indian Institute of Technology, Bombay




                                                   Natural Scene Categorization – p.1/32
Overview of topics to be covered

     • Natural Scene Categorization: Challenges
     • Our contribution
        ◦ Qualitative visual environment description
          • Portable, real-time system to aid the visually impaired
          • System has peripheral vision!
        ◦ Model-based approaches
          • Use of stochastic models to capture semantics
          • pLSA and maximum entropy models
     • Conclusions and Future Work




                                                         Natural Scene Categorization – p.2/32
Natural Scene Categorization

     • Interesting application of a CBIR system
     • Images from a broad image domain: diverse and often
       ambiguous
     • Bridging the semantic gap
     • Grouping scenes into semantically meaningful categories
       could aid further retrieval
     • Efficient schemes for grouping images into semantic
       categories




                                                       Natural Scene Categorization – p.3/32
Qualitative Visual Environment Retrieval

      [Figure: example environment around the user, with regions
      labelled SKY, BUILDING, LAWN, WOODS and WATER BODY, view sectors
      FR, LT, RT, LB and RB, and points P1, P2, P3.]



     • Use of omnidirectional images
     • Challenges
        ◦ Unstructured environment
        ◦ No prior learning (unlike navigation/localization)
     • Target application and objective
        ◦ Wearable computing community, emphasis on visually
          challenged people
        ◦ Real-time operation

                                                                 Natural Scene Categorization – p.4/32
Qualitative Visual Environment System: Overview

     • Environment representation
     • Environment retrieval
        ◦ View partitioning
        ◦ Feature extraction
        ◦ Node annotation
        ◦ Dynamic node annotation
        ◦ Real-time operation
     • Results




                                                  Natural Scene Categorization – p.5/32
System Overview (contd.)

     • Environment representation
        ◦ Image database containing images belonging to 6
            classes: Lawns(L), Woods(W), Buildings(B),
            Waterbodies(H), Roads(R) and Traffic(T)
        ◦   Moderately large intra-class variance (in the feature
            space) in images of each category
        ◦   Description relative to the person using the system: e.g.,
            ‘to left of’, ‘in the front’, etc.
        ◦   Topological relationships indicated by a graph
        ◦   Each node annotated by an identifier associated with a
            class
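The annotated-graph representation above can be illustrated in code; the sector layout, adjacency and example labels below are hypothetical, chosen only to show the structure, not taken from the thesis.

```python
# One node per view sector, annotated with a class identifier.
CLASSES = {"L": "Lawns", "W": "Woods", "B": "Buildings",
           "H": "Waterbodies", "R": "Roads", "T": "Traffic"}

# Hypothetical adjacency of sectors around the user:
# FR is flanked by LT and RT, BS by LB and RB.
graph = {
    "FR": {"label": "B", "adj": ["LT", "RT"]},
    "LT": {"label": "L", "adj": ["FR", "LB"]},
    "RT": {"label": "H", "adj": ["FR", "RB"]},
    "LB": {"label": "W", "adj": ["LT", "BS"]},
    "RB": {"label": "H", "adj": ["RT", "BS"]},
    "BS": {"label": "R", "adj": ["LB", "RB"]},
}

def describe(node):
    """Description relative to the user, e.g. 'Buildings in the front'."""
    place = {"FR": "in the front", "LT": "to the left",
             "RT": "to the right", "LB": "to the left-back",
             "RB": "to the right-back", "BS": "behind"}[node]
    return f"{CLASSES[graph[node]['label']]} {place}"

print(describe("FR"))  # Buildings in the front
```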




                                                            Natural Scene Categorization – p.6/32
System Overview (contd.)

     • Environment Retrieval
        ◦ View Partitioning
           [Figure: view partitioning (left) and graphical
           representation (right). The omnidirectional view is divided
           into sectors FR, RT, RB, BS, LB and LT between the forward
           and backward directions; each sector maps to a node of the
           graph.]

        ◦ Feature Extraction
          • Feature must be invariant to scaling, viewpoint and
            illumination changes, and to the geometric warping
            introduced by omnicam images
          • Colour histogram selected as the feature for
            performing CBIR
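The partitioning and feature steps can be sketched as follows. This is an illustrative reconstruction, not the thesis implementation: the sector geometry (six equal angular sectors) and the binning (8 bins per RGB channel) are assumptions.

```python
import numpy as np

def sector_masks(h, w, n_sectors=6):
    """Partition an omnicam frame (disc centred in the image) into
    equal angular sectors, e.g. FR, RT, RB, BS, LB, LT."""
    ys, xs = np.mgrid[0:h, 0:w]
    ang = np.arctan2(ys - h / 2, xs - w / 2)          # angle in [-pi, pi]
    idx = ((ang + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    return [idx == k for k in range(n_sectors)]

def colour_histogram(img, mask, bins=8):
    """Joint RGB histogram over the masked pixels, L1-normalised so
    that histograms of different sector sizes are comparable."""
    pix = img[mask]                                   # (n, 3) RGB values
    hist, _ = np.histogramdd(pix, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / max(pix.shape[0], 1)

# One compact feature vector per view sector of a (toy) omnicam frame.
frame = np.random.randint(0, 256, size=(120, 120, 3))
feats = [colour_histogram(frame, m) for m in sector_masks(120, 120)]
```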

                                                                                                                                                Natural Scene Categorization – p.7/32
System Overview (contd.)

     • Environment Retrieval
        ◦ Node annotation
          • Objective: Robust retrieval against illumination
            changes and intra-class variations
          • Solution: Annotation decided by a simple voting
            scheme
        ◦ Dynamic node annotation
          • Temporal evolution of graph Gn with time tn
          • Complete temporal evolution of the graph given by G,
            obtained by concatenating the subgraphs Gn,
            i.e., G = {G1, G2, . . ., Gk, . . .}




                                                         Natural Scene Categorization – p.8/32
System Overview (contd.)

     • Environment Retrieval
        ◦ Real-time operation
          • Colour histogram: compact feature vector
          • Pre-computed histograms of all the database images
          • Linear time complexity O(N): on a P-IV 2.0 GHz, ∼100 ms
            for a single omnicam image
        ◦ Portable, low-cost system for visually impaired
          • Modest hardware and software requirements
          • Easily put together using off-the-shelf components
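A minimal sketch of the retrieval loop over precomputed histograms (bin names and database entries below are illustrative; the real system uses quantized colour bins):

```python
def intersection(h1, h2):
    # Histogram intersection of two L1-normalized histograms:
    # 1.0 means identical colour distributions, 0.0 means disjoint.
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

def retrieve(query_hist, database):
    # One pass over the precomputed database histograms: O(N) in the
    # number of database images, hence the near-constant per-frame cost.
    return max(database, key=lambda name: intersection(query_hist, database[name]))

# Hypothetical precomputed histograms for two database images.
database = {
    'lawn_01':  {'green': 0.8, 'blue': 0.2},
    'water_01': {'blue': 0.9, 'green': 0.1},
}
query = {'green': 0.7, 'blue': 0.3}
best = retrieve(query, database)
```

Precomputing the database histograms offline is what keeps the online cost to a single O(N) scan per frame.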




                                                        Natural Scene Categorization – p.9/32
System Overview (contd.)

     • Results
        ◦ Cylindrical concentric mosaics




                                           Natural Scene Categorization – p.10/32
System Overview (contd.)

     • Results
        ◦ Still omnicam image




                                Natural Scene Categorization – p.11/32
System Overview (contd.)

     • Results
        ◦ Omnivideo sequence

       [Figure: temporal evolution of the node annotations for an omnivideo
       sequence, plotted along the forward and backward viewing directions
       against the frame index n]
                                                                                        Natural Scene Categorization – p.12/32
Analyzing our results

      • System accuracy: close to 70%. This is not enough!
     • Some scenes are inherently ambiguous!
     • Often the second best class is the correct class

     • Limitations
       1. Limited discriminating power of global colour histogram
          (GCH)
       2. Local colour histogram (LCH) based on tiling cannot be
          used
       3. Each frame analyzed independently
     • Possible solutions
       1. Adding memory to the system
       2. Clustering scheme before computing similarity measure


                                                          Natural Scene Categorization – p.13/32
Method I. Adding memory to the system

     • System uses only the current observation in labeling
      • Good idea to use all observations up to the current one
     • Desired: A recursive implementation to calculate the
       posterior (should be able to do it in real-time!)
     • Hidden Markov Model: Parameter estimation using Kevin
       Murphy’s HMM toolkit
     • Challenges
        1. Estimation of the transition matrix: a possible solution is
           to use a limited set of classes
       2. Enormous training data required
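The recursive posterior update is the HMM forward recursion; a minimal sketch follows (the transition and likelihood values are illustrative, not the parameters learned with Kevin Murphy's toolkit):

```python
def forward_step(prev, transition, likelihood):
    # One step of the HMM forward recursion: propagate the previous
    # posterior over scene categories through the transition model,
    # weight by the likelihood of the current observation, renormalize.
    # Each update is O(K^2) in the number of categories K, so it can
    # run in real time.
    k = len(prev)
    alpha = [sum(prev[i] * transition[i][j] for i in range(k)) * likelihood[j]
             for j in range(k)]
    z = sum(alpha)
    return [a / z for a in alpha]

# Two illustrative categories with 'sticky' transitions (made-up values).
transition = [[0.9, 0.1], [0.1, 0.9]]
posterior = [0.5, 0.5]
for _ in range(3):  # three frames, each weakly favouring category 0
    posterior = forward_step(posterior, transition, [0.6, 0.4])
```

Each frame's label uses all observations up to the current one, yet the per-frame cost stays constant.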




                                                           Natural Scene Categorization – p.14/32
Adding memory. . . (Results)

     • Improved confidence in the results. However, negligible
       improvement in the accuracy
     • Reasons for poor performance
         ◦ Limited number of transitions in categories (as opposed
           to locations)
         ◦ Typical training data for HMMs is thousands of labels:
           difficult to collect such vast data
      • Limitation: makes the system dependent on the training
        sequence




                                                        Natural Scene Categorization – p.15/32
Method II. Preclustering the image

     • Presence of clutter, images from a broad domain
     • Premise: The part of the image indicative of the semantic
       category forms a distinct part in the feature space




     Some test images belonging to the ‘Water-bodies’ category

     • Possible solution: segment out the clutter in the scene




                                                             Natural Scene Categorization – p.16/32
Preclustering the image. . .

     • K means clustering of the image
     • Use only pixels from the largest cluster to compute the
        colour histogram
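These two steps can be sketched as follows (pixel values, cluster count, and initial centres are our own toy choices):

```python
def kmeans(points, centers, iters=10):
    # Plain K-means on RGB pixel triples: alternately assign each pixel
    # to its nearest centre and re-estimate centres as cluster means.
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        centers = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else c
                   for cl, c in zip(clusters, centers)]
    return clusters

# Toy scene: four 'water' pixels plus two bright clutter pixels.
pixels = [(0, 0, 200), (10, 10, 210), (5, 5, 190), (0, 10, 205),
          (200, 200, 200), (210, 190, 200)]
clusters = kmeans(pixels, centers=[(0, 0, 255), (255, 255, 255)])
largest = max(clusters, key=len)  # only these pixels enter the histogram
```

Restricting the histogram to the dominant cluster discards the clutter pixels that would otherwise dilute the class-indicative colours.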




           Results of K means clustering on the test images

     • Results
         ◦ Accuracy improves significantly: for the ‘water-bodies’
           class, from 25% to about 72%
     • Limitations: What about, say, a traffic scene?!

                                                         Natural Scene Categorization – p.17/32
Model-based approaches

    • Stochastic models used to learn semantic concepts from
      training images
    • Use of normal perspective images
    • Use of local image features
    • Two models examined
      1. probabilistic Latent Semantic Analysis (pLSA)
      2. Maximum entropy models
    • Use of the ‘bag of words’ approach




                                                         Natural Scene Categorization – p.18/32
Bag of words approach

     • Local features more robust to occlusions and spatial
       variations
     • Image represented as a collection of local patches
     • Image patches are members of a learned (visual)
       vocabulary
     • Positional relationships not considered!
     • Data representation by a co-occurrence matrix

     • Notation
         ◦ D = {d1, . . ., dN}: corpus of documents
         ◦ W = {w1, . . ., wM}: dictionary of words
         ◦ Z = {z1, . . ., zK}: (latent) topic variables
         ◦ N = {n(w, d)}: co-occurrence table
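With this notation, building the co-occurrence table N from quantized patches can be sketched as (vocabulary size and indices below are toy values):

```python
def cooccurrence(corpus, vocab_size):
    # Build the word-document co-occurrence table N = {n(w, d)}:
    # entry [w][d] counts occurrences of visual word w (a codebook
    # index) in image d.  Patch positions are deliberately discarded,
    # which is exactly the 'bag of words' assumption.
    n = [[0] * len(corpus) for _ in range(vocab_size)]
    for d, words in enumerate(corpus):
        for w in words:
            n[w][d] += 1
    return n

# Two toy 'images', each a list of quantized patch indices (M = 4 words).
corpus = [[0, 0, 2, 3], [1, 2, 2]]
n = cooccurrence(corpus, vocab_size=4)
```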


                                                              Natural Scene Categorization – p.19/32
pLSA model . . .

     • Generative model
        ◦ select a document d with probability P (d)
        ◦ select a latent class z with probability P (z|d)
        ◦ select a word w with probability P (w|z)

      • Joint observation probability
        P(d, w) = P(d) P(w|d), where
        P(w|d) = Σz∈Z P(w|z) P(z|d)

     • Modeling assumptions
        1. Observation pairs (d, w) generated independently
        2. Conditional independence assumption
           P (w, d|z) = P (w|z)P (d|z)



                                                             Natural Scene Categorization – p.20/32
pLSA model . . .

     • Model fitting
         ◦ Maximize the log-likelihood function
           L = Σd∈D Σw∈W n(d, w) log P(d, w)
        ◦ Minimizing the KL divergence between the empirical
          distribution and the model
        ◦ EM algorithm to learn model parameters

     • Evaluating model on unseen test images
        ◦ P (w|z) and P (z|d) learned from the training dataset
        ◦ ‘Fold-in’ heuristic for categorization: learned factors
           P (w|z) are kept fixed, mixing coefficients P (z|dtest ) are
           estimated using the EM iterations
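One possible sketch of these EM iterations (our own simplified implementation, not the modified Rob Fergus code used in the experiments):

```python
import random

def plsa_em(n, K, iters=50, seed=0):
    # n[d][w]: co-occurrence count of word w in document d; K topics.
    # Returns P(w|z) and P(z|d).  The random initialization is exactly
    # what makes different runs converge to different local optima.
    norm = lambda v: [x / sum(v) for x in v]
    rng = random.Random(seed)
    D, W = len(n), len(n[0])
    p_wz = [norm([rng.random() for _ in range(W)]) for _ in range(K)]
    p_zd = [norm([rng.random() for _ in range(K)]) for _ in range(D)]
    for _ in range(iters):
        acc_wz = [[1e-12] * W for _ in range(K)]  # small floor avoids /0
        acc_zd = [[1e-12] * K for _ in range(D)]
        for d in range(D):
            for w in range(W):
                if n[d][w]:
                    # E-step: posterior P(z|d, w) ∝ P(w|z) P(z|d)
                    post = norm([p_wz[z][w] * p_zd[d][z] for z in range(K)])
                    for z in range(K):  # M-step accumulators
                        acc_wz[z][w] += n[d][w] * post[z]
                        acc_zd[d][z] += n[d][w] * post[z]
        p_wz = [norm(row) for row in acc_wz]
        p_zd = [norm(row) for row in acc_zd]
    return p_wz, p_zd

# Toy corpus: words {0,1} dominate documents 0-1, words {2,3} documents 2-3.
counts = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
p_wz, p_zd = plsa_em(counts, K=2)
```

The fold-in heuristic corresponds to keeping p_wz fixed and re-running only the p_zd updates for the test document.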




                                                           Natural Scene Categorization – p.21/32
pLSA model . . .

     • Details of experiment to evaluate model
        ◦ 5 categories: houses, forests, mountains, streets and
            beaches
        ◦   Image dataset: COREL photo CDs, images from internet
            search engines, and personal image collections
        ◦   100 images of each category
         ◦   Rob Fergus’s code, modified for our experiments
        ◦   128-dim SIFT feature used to represent a patch
        ◦   Visual codebook with 125 entries
      • Image annotation
        ẑ = arg maxi P(zi | dtest)




                                                        Natural Scene Categorization – p.22/32
pLSA model. . . Results

     • 50 runs of the experiment: with random partitioning on each
       run
     • Vastly different accuracy on different runs: best case ∼ 46%,
       and worst case 5%
     • Analysis of the results
        ◦ Confusion matrix gives us further insights
        ◦ Most of the labeling errors occur between houses and
          streets
        ◦ Ambiguity between mountains and forests




                                                         Natural Scene Categorization – p.23/32
Results using the pLSA model




    Figure 1: Some images that were wrongly annotated by our system




                                        Natural Scene Categorization – p.24/32
Results of the pLSA model . . .

     • Comparison with the naive Bayes’ classifier




    Figure 2: Confusion matrices for the pLSA and naive Bayes models
      • 10-fold cross-validation test on the same dataset: mean
        accuracy ∼66%




                                                        Natural Scene Categorization – p.25/32
Analysis of our results

     • Reasons for poor performance
        ◦ Model convergence!
         ◦ Local optima problem in the EM algorithm
         ◦ Optimum value of the objective function depends on the
           initialized values
         ◦ We initialize the algorithm randomly at each run!

     • Possible solution: Deterministic annealing EM (DAEM)
       algorithm
     • Even with DAEM no guarantee of converging to the global
       optimal solution




                                                         Natural Scene Categorization – p.26/32
Maximum entropy models

    • Maximum entropy prefers a uniform distribution when no
      data are available
    • Best model is the one that is:
       1. Consistent with the constraints imposed by training data
       2. Makes as few assumptions as possible
     • Training dataset: {(x1, y1), (x2, y2), . . ., (xN, yN)}, where xi
       represents an image and yi represents a label
    • Predicate functions
       ◦ Unigram predicate: co-occurrence statistics of a word
          and a label
           fv1,LABEL(x, y) = 1 if y = LABEL and v1 ∈ x, and 0 otherwise




                                                                   Natural Scene Categorization – p.27/32
Maximum entropy models . . .

     • Notation
        ◦ f : predicate function
        ◦ p(x, y): empirical distribution of the observed pairs
          ˜
        ◦ p(y|x): stochastic model to be learnt

      • Model fitting: expected value of the predicate function w.r.t.
        the stochastic model should equal the expected value of the
        predicate measured from the training data
     • Constrained optimization problem
        Maximize H(p) = − Σx,y p̃(x) p(y|x) log p(y|x)
        s.t.  Σx,y p̃(x, y) f(x, y) = Σx,y p̃(x) p(y|x) f(x, y)

      • p(y|x) = (1/Z(x)) exp( Σi=1..k λi fi(x, y) )
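A sketch of evaluating this exponential form with unigram predicates (the words, labels, and weights below are made up for illustration; training the λi requires an iterative method such as GIS, as provided by maximum entropy toolkits):

```python
import math

def maxent_posterior(x_words, labels, weights):
    # p(y|x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x), where a unigram
    # predicate f_{v,LABEL} fires when visual word v occurs in image x
    # and the candidate label is LABEL.  weights[(v, y)] plays lambda_i.
    scores = {y: math.exp(sum(weights.get((v, y), 0.0) for v in x_words))
              for y in labels}
    z = sum(scores.values())  # the partition function Z(x)
    return {y: s / z for y, s in scores.items()}

# Made-up weights linking visual words to scene labels (not trained).
weights = {('wave', 'beach'): 2.0, ('sand', 'beach'): 1.0,
           ('bark', 'forest'): 2.0, ('wave', 'forest'): -1.0}
p = maxent_posterior(['wave', 'sand'], ['beach', 'forest'], weights)
```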




                                                               Natural Scene Categorization – p.28/32
Results for the maximum entropy model

     • Same dataset, feature and codebook as used for the pLSA
       experiment
     • Evaluation using Zhang Le’s maximum entropy toolkit

     • 25-fold cross-validation accuracy: ∼ 70%
     • The second best label is often the correct label: accuracy
       improves to 85%




    Figure 3: Confusion matrices for the maximum entropy and naive Bayes models
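The 'second best label' observation corresponds to top-k accuracy, which can be computed as follows (the per-image scores are hypothetical):

```python
def topk_accuracy(posteriors, truths, k=2):
    # Fraction of test images whose true label is among the k
    # highest-scoring labels; comparing k=1 and k=2 quantifies how
    # often the second best label is the correct one.
    hits = 0
    for scores, truth in zip(posteriors, truths):
        ranked = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += truth in ranked
    return hits / len(truths)

# Hypothetical per-label scores for three test images.
posteriors = [
    {'houses': 0.5, 'streets': 0.4, 'forests': 0.1},
    {'streets': 0.6, 'houses': 0.3, 'forests': 0.1},
    {'forests': 0.5, 'mountains': 0.4, 'beaches': 0.1},
]
truths = ['houses', 'houses', 'mountains']
top1 = topk_accuracy(posteriors, truths, k=1)
top2 = topk_accuracy(posteriors, truths, k=2)
```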
                                                         Natural Scene Categorization – p.29/32
A comparative study


            Method                    # of categories   training images per category   perf. (%)
       Maximum entropy                       5                      50                    70
       pLSA                                  5                      50                    46
       Naive Bayes’ classifier               5                      50                    66
       Fei-Fei                              13                     100                    64
       Vogel                                 6                    ∼100                   89.3
       Vogel                                 6                    ∼100                   67.2
       Oliva                                 8                 250–300                    89

    Table 1: A performance comparison with other studies reported in the
    literature.


                                                           Natural Scene Categorization – p.30/32
Future Work

    • Further investigations into the pLSA model
    • Issue of model convergence
    • DAEM algorithm is not the ideal solution
    • Using a richer feature set, e.g., bank of Gabor filters
    • For maximum entropy models, ways to define predicates
      that will capture semantic information better




                                                         Natural Scene Categorization – p.31/32
THANK YOU




            Natural Scene Categorization – p.32/32

Weitere ähnliche Inhalte

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Empfohlen

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Empfohlen (20)

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary

Categorization of natural images

  • 1. Novel Approaches to Natural Scene Categorization Amit Prabhudesai Roll No. 04307002 amitp@ee.iitb.ac.in M.Tech Thesis Defence Under the guidance of Prof. Subhasis Chaudhuri Indian Institute of Technology, Bombay Natural Scene Categorization – p.1/32
  • 2. Overview of topics to be covered • Natural Scene Categorization: Challenges • Our contribution ◦ Qualitative visual environment description • Portable, real-time system to aid the visually impaired • System has peripheral vision! ◦ Model-based approaches • Use of stochastic models to capture semantics • pLSA and maximum entropy models • Conclusions and Future Work
  • 3. Natural Scene Categorization • Interesting application of a CBIR system • Images from a broad image domain: diverse and often ambiguous • Bridging the semantic gap • Grouping scenes into semantically meaningful categories could aid further retrieval • Efficient schemes for grouping images into semantic categories
  • 4. Qualitative Visual Environment Retrieval [Figure: schematic environment map with regions labeled BUILDING, SKY, LAWN, WOODS and WATER BODY, view sectors LT/RT/LB/RB/FR, and positions P1, P2, P3] • Use of omnidirectional images • Challenges ◦ Unstructured environment ◦ No prior learning (unlike navigation/localization) • Target application and objective ◦ Wearable computing community, emphasis on visually challenged people ◦ Real-time operation
  • 5. Qualitative Visual Environment System: Overview • Environment representation • Environment retrieval ◦ View partitioning ◦ Feature extraction ◦ Node annotation ◦ Dynamic node annotation ◦ Real-time operation • Results
  • 6. System Overview (contd.) • Environment representation ◦ Image database containing images belonging to 6 classes: Lawns(L), Woods(W), Buildings(B), Waterbodies(H), Roads(R) and Traffic(T) ◦ Moderately large intra-class variance (in the feature space) in images of each category ◦ Description relative to the person using the system: e.g., ‘to left of’, ‘in the front’, etc. ◦ Topological relationships indicated by a graph ◦ Each node annotated by an identifier associated with a class
  • 8. System Overview (contd.) • Environment Retrieval ◦ View Partitioning [Figure: the omnidirectional view is partitioned into sectors FR (front), LT/RT (left/right), LB/RB (left-back/right-back) and BS (back), shown alongside the corresponding graphical representation] ◦ Feature Extraction • Feature invariant to scaling, viewpoint, illumination changes, and geometric warping introduced by omnicam images • Colour histogram selected as the feature for performing CBIR
  • 10. System Overview (contd.) • Environment Retrieval ◦ Node annotation • Objective: Robust retrieval against illumination changes and intra-class variations • Solution: Annotation decided by a simple voting scheme ◦ Dynamic node annotation • Temporal evolution of graph Gn with time tn • Complete temporal evolution of the graph given by G, obtained by concatenating the subgraphs Gn, i.e., G = {G1, G2, . . . , Gk, . . .}
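The slides do not spell out the voting rule; a minimal sketch of one plausible scheme, where each view sector is labeled by majority vote over its k nearest database histograms. The L1 distance, the default k = 5, and all function and variable names are illustrative assumptions, not the thesis's exact design.

```python
from collections import Counter

import numpy as np

def annotate_sector(query_hist, db_hists, db_labels, k=5):
    """Label one view sector by majority vote over its k nearest
    database histograms (L1 distance). All names are illustrative."""
    dists = np.abs(db_hists - query_hist).sum(axis=1)  # L1 distance to each DB image
    nearest = np.argsort(dists)[:k]                    # indices of the k best matches
    votes = Counter(db_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy example: 2-bin histograms, classes W (woods) and H (water-bodies)
db_hists = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.85, 0.15]])
db_labels = ["W", "W", "H", "H", "W"]
print(annotate_sector(np.array([0.88, 0.12]), db_hists, db_labels, k=3))  # W
```

Voting over several neighbours rather than taking the single best match is what gives the annotation some robustness to illumination changes and intra-class variation.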
  • 12. System Overview (contd.) • Environment Retrieval ◦ Real-time operation • Colour histogram: compact feature vector • Pre-computed histograms of all the database images • Linear time complexity (O(N)): on P-IV 2.0 GHz, ∼ 100 ms for single omnicam image ◦ Portable, low-cost system for visually impaired • Modest hardware and software requirements • Easily put together using off-the-shelf components
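A sketch of the retrieval core as described on this slide: a coarsely quantized colour histogram as the compact feature vector, matched by a linear O(N) scan over precomputed database histograms. The bin count (4 per channel) and the L1 distance are illustrative assumptions, not necessarily the thesis's settings.

```python
import numpy as np

def colour_histogram(img, bins_per_channel=4):
    """Coarsely quantized RGB histogram, normalized to sum to 1.
    img: H x W x 3 uint8 array. 4^3 = 64 bins is an assumed setting."""
    q = (img.astype(np.int32) * bins_per_channel) // 256          # per-channel bin index
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()

def nearest_image(query_hist, db_hists):
    """Linear O(N) scan over precomputed database histograms (L1 distance)."""
    return int(np.argmin(np.abs(db_hists - query_hist).sum(axis=1)))

img = np.zeros((8, 8, 3), dtype=np.uint8)   # all-black toy image
h = colour_histogram(img)
print(h[0])  # 1.0 -- every black pixel falls in bin 0
```

Because the database histograms are computed once, the per-frame cost is one histogram plus one pass over N short vectors, which is what makes the quoted ∼100 ms per omnicam frame plausible on modest hardware.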
  • 14. System Overview (contd.) • Results ◦ Cylindrical concentric mosaics
  • 16. System Overview (contd.) • Results ◦ Still omnicam image
  • 18. System Overview (contd.) • Results ◦ Omnivideo sequence [Figure: temporal evolution of the annotation graph over frames n = 1 to 25; node labels W (Woods), B (Buildings), L (Lawns), R (Roads) and X (unknown)]
  • 21. Analyzing our results • System accuracy: close to 70% – this is not enough! • Some scenes are inherently ambiguous! • Often the second best class is the correct class • Limitations 1. Limited discriminating power of global colour histogram (GCH) 2. Local colour histogram (LCH) based on tiling cannot be used 3. Each frame analyzed independently • Possible solutions 1. Adding memory to the system 2. Clustering scheme before computing similarity measure
  • 23. Method I. Adding memory to the system • System uses only the current observation in labeling • Good idea to use all observations up to the current one • Desired: A recursive implementation to calculate the posterior (should be able to do it in real-time!) • Hidden Markov Model: Parameter estimation using Kevin Murphy’s HMM toolkit • Challenges 1. Estimation of the transition matrix – possible solution is to use limited classes 2. Enormous training data required
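The recursive posterior asked for here is exactly what the standard HMM forward (filtering) recursion provides; a sketch assuming the per-frame class likelihoods come from the histogram matcher. The transition matrix and likelihood values below are toy numbers, not learned parameters.

```python
import numpy as np

def hmm_filter(likelihoods, A, prior):
    """Recursive posterior P(state_t | obs_1..t) for an HMM.
    likelihoods: T x S per-frame observation likelihoods P(obs_t | state)
    A: S x S transition matrix; prior: initial state distribution."""
    belief = prior * likelihoods[0]
    belief /= belief.sum()
    posteriors = [belief]
    for lik in likelihoods[1:]:
        belief = (A.T @ belief) * lik      # predict with A, then update with evidence
        belief /= belief.sum()             # renormalize each step (real-time friendly)
        posteriors.append(belief)
    return np.array(posteriors)

# Two classes with "sticky" transitions: a single noisy frame gets smoothed out
A = np.array([[0.95, 0.05], [0.05, 0.95]])
lik = np.array([[0.8, 0.2], [0.8, 0.2], [0.4, 0.6], [0.8, 0.2]])  # frame 3 is an outlier
post = hmm_filter(lik, A, np.array([0.5, 0.5]))
print(post[2])   # still favours class 0 despite the outlier frame
```

The per-frame cost is one S x S matrix-vector product, so the memory comes essentially for free at runtime; the hard part, as the slide notes, is estimating A from limited labeled sequences.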
  • 24. Adding memory. . . (Results) • Improved confidence in the results; however, negligible improvement in the accuracy • Reasons for poor performance ◦ Limited number of transitions in categories (as opposed to locations) ◦ Typical training data for HMMs is thousands of labels: difficult to collect such vast data • Limitation: Makes the system dependent on the training sequence
  • 25. Method II. Preclustering the image • Presence of clutter, images from a broad domain • Premise: The part of the image indicative of the semantic category forms a distinct cluster in the feature space [Figure: some test images belonging to the ‘Water-bodies’ category] • Possible solution: segment out the clutter in the scene
  • 28. Preclustering the image. . . • K-means clustering of the image • Use only pixels from the largest cluster to compute the colour histogram [Figure: results of K-means clustering on the test images] • Results ◦ Accuracy improves significantly – for the ‘water-bodies’ class, improvement from 25% to about 72% • Limitations: What about, say, a traffic scene?!
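A minimal sketch of the preclustering step: K-means on the raw RGB pixel values, keeping only the largest cluster before the colour histogram is computed. The plain NumPy K-means, the default k, and the toy image are assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain K-means on pixel feature vectors (here: RGB values)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):            # leave empty clusters untouched
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def largest_cluster_pixels(img, k=3):
    """Return only the pixels of the biggest K-means cluster; the colour
    histogram would then be computed from these pixels alone."""
    X = img.reshape(-1, 3).astype(float)
    labels = kmeans(X, k)
    biggest = np.bincount(labels, minlength=k).argmax()
    return X[labels == biggest]

# Toy image: 91 "water" pixels plus a small clutter patch -> clutter is discarded
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[:, :, 2] = 200                # mostly blue
img[:3, :3] = [120, 60, 20]       # brown clutter patch (9 of 100 pixels)
pix = largest_cluster_pixels(img, k=2)
print(len(pix))  # 91 blue pixels survive
```

The premise from the previous slide is visible here: when the category-relevant region dominates the image, the largest cluster is that region, and the histogram is no longer polluted by clutter. The stated limitation also follows: in a traffic scene the largest cluster may be the road, not the traffic.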
  • 29. Model-based approaches • Stochastic models used to learn semantic concepts from training images • Use of normal perspective images • Use of local image features • Two models examined 1. probabilistic Latent Semantic Analysis (pLSA) 2. Maximum entropy models • Use of the ‘bag of words’ approach
  • 31. Bag of words approach • Local features more robust to occlusions and spatial variations • Image represented as a collection of local patches • Image patches are members of a learned (visual) vocabulary • Positional relationships not considered! • Data representation by a co-occurrence matrix • Notation ◦ D = {d1, . . . , dN}: corpus of documents ◦ W = {w1, . . . , wM}: dictionary of words ◦ Z = {z1, . . . , zK}: (latent) topic variables ◦ N = {n(w, d)}: co-occurrence table
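The bag-of-words representation above can be sketched in two steps: quantize each local descriptor to its nearest codebook entry (visual word), then accumulate the word-document co-occurrence table n(w, d). The 2-D toy descriptors below stand in for the 128-dim SIFT vectors mentioned later.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Map each local descriptor to its nearest visual word (codebook row)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return np.argmin(d2, axis=1)

def cooccurrence_matrix(docs_descriptors, codebook):
    """n(w, d): count of visual word w in image d. Positional
    relationships between patches are discarded ('bag of words')."""
    M, N = len(codebook), len(docs_descriptors)
    n = np.zeros((M, N), dtype=int)
    for d, desc in enumerate(docs_descriptors):
        for w in quantize(desc, codebook):
            n[w, d] += 1
    return n

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])       # toy 2-word vocabulary
docs = [np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]]),
        np.array([[0.0, 0.2]])]
print(cooccurrence_matrix(docs, codebook))
# [[1 1]
#  [2 0]]
```

This matrix N = {n(w, d)} is exactly the input both the pLSA and the maximum entropy experiments operate on.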
  • 34. pLSA model . . . • Generative model ◦ select a document d with probability P(d) ◦ select a latent class z with probability P(z|d) ◦ select a word w with probability P(w|z) • Joint observation probability P(d, w) = P(d)P(w|d), where P(w|d) = Σz∈Z P(w|z)P(z|d) • Modeling assumptions 1. Observation pairs (d, w) generated independently 2. Conditional independence assumption P(w, d|z) = P(w|z)P(d|z)
  • 36. pLSA model . . . • Model fitting ◦ Maximize the log-likelihood function L = Σd∈D Σw∈W n(d, w) log P(d, w) ◦ Equivalent to minimizing the KL divergence between the empirical distribution and the model ◦ EM algorithm to learn model parameters • Evaluating model on unseen test images ◦ P(w|z) and P(z|d) learned from the training dataset ◦ ‘Fold-in’ heuristic for categorization: learned factors P(w|z) are kept fixed, mixing coefficients P(z|dtest) are estimated using the EM iterations
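The EM updates for pLSA follow directly from the model: E-step responsibilities P(z|d,w) ∝ P(w|z)P(z|d), then M-step re-estimation from expected counts. A compact NumPy sketch (dense arrays and random initialization, so it reaches a local optimum only, which is exactly the convergence issue discussed on the later slides):

```python
import numpy as np

def plsa_em(n, K, iters=100, seed=0):
    """EM for pLSA on a word-document count matrix n (M words x N docs).
    Returns P(w|z) (M x K) and P(z|d) (K x N). Random init: local optimum only."""
    rng = np.random.default_rng(seed)
    M, N = n.shape
    Pw_z = rng.random((M, K)); Pw_z /= Pw_z.sum(0, keepdims=True)
    Pz_d = rng.random((K, N)); Pz_d /= Pz_d.sum(0, keepdims=True)
    for _ in range(iters):
        # E-step: P(z|d,w) proportional to P(w|z) P(z|d)
        Pz_dw = Pw_z[:, None, :] * Pz_d.T[None, :, :]          # M x N x K
        Pz_dw /= Pz_dw.sum(-1, keepdims=True) + 1e-12
        # M-step: re-estimate factors from expected counts n(d,w) P(z|d,w)
        nz = n[:, :, None] * Pz_dw
        Pw_z = nz.sum(1); Pw_z /= Pw_z.sum(0, keepdims=True) + 1e-12
        Pz_d = nz.sum(0).T; Pz_d /= Pz_d.sum(0, keepdims=True) + 1e-12
    return Pw_z, Pz_d

# Toy corpus with two clearly separated topics
n = np.array([[5, 5, 0, 0],
              [4, 6, 0, 0],
              [0, 0, 5, 5],
              [0, 0, 6, 4]])
Pw_z, Pz_d = plsa_em(n, K=2)
print(np.round(Pz_d, 2))   # documents 0,1 should load on one topic, 2,3 on the other
```

Fold-in for a test image is the same loop with Pw_z frozen and only the column P(z|d_test) updated; annotation then takes the arg max over that column.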
  • 37. pLSA model . . . • Details of experiment to evaluate model ◦ 5 categories: houses, forests, mountains, streets and beaches ◦ Image dataset: COREL photo CDs, images from internet search engines, and personal image collections ◦ 100 images of each category ◦ Modifications in Rob Fergus’s code for the experiments ◦ 128-dim SIFT feature used to represent a patch ◦ Visual codebook with 125 entries • Image annotation: ẑ = arg maxi P(zi|dtest)
  • 39. pLSA model. . . Results • 50 runs of the experiment: with random partitioning on each run • Vastly different accuracy on different runs: best case ∼ 46%, and worst case 5% • Analysis of the results ◦ Confusion matrix gives us further insights ◦ Most of the labeling errors occur between houses and streets ◦ Ambiguity between mountains and forests
  • 40. Results using the pLSA model Figure 1: Some images that were wrongly annotated by our system
  • 41. Results of the pLSA model . . . • Comparison with the naive Bayes’ classifier Figure 2: Confusion matrices for the pLSA and naive Bayes models • 10-fold cross validation test on the same dataset: mean accuracy ∼ 66%
  • 43. Analysis of our results • Reasons for poor performance ◦ Model convergence! ◦ Local optima problem in the EM algorithm ◦ Optimum value of the objective function depends on the initialized values ◦ We initialize the algorithm randomly at each run! • Possible solution: Deterministic annealing EM (DAEM) algorithm • Even with DAEM, no guarantee of converging to the globally optimal solution
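DAEM modifies only the E-step: the responsibilities are raised to an inverse temperature β that is annealed from a small value up to 1, flattening early posteriors so the search is less sensitive to the random start. A sketch of just that tempered E-step, with toy parameter values (the annealing schedule itself is a design choice):

```python
import numpy as np

def tempered_responsibilities(Pw_z, Pz_d, beta):
    """DAEM E-step for pLSA: responsibilities raised to inverse temperature
    beta. beta -> 1 recovers standard EM; small beta gives near-uniform
    (flattened) posteriors early in the annealing schedule."""
    Pz_dw = (Pw_z[:, None, :] * Pz_d.T[None, :, :]) ** beta    # M x N x K
    return Pz_dw / Pz_dw.sum(-1, keepdims=True)

Pw_z = np.array([[0.9, 0.1], [0.1, 0.9]])
Pz_d = np.array([[0.5], [0.5]])
soft = tempered_responsibilities(Pw_z, Pz_d, beta=0.1)
hard = tempered_responsibilities(Pw_z, Pz_d, beta=1.0)
print(round(soft[0, 0, 0], 3), round(hard[0, 0, 0], 3))  # flattened vs. peaked
```

At β well below 1 the posteriors barely commit to any topic, so early M-steps average over many configurations; as β rises the updates sharpen. This reduces, but as the slide says does not eliminate, the dependence on initialization.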
  • 44. Maximum entropy models • Maximum entropy prefers a uniform distribution when no data are available • Best model is the one that: 1. Is consistent with the constraints imposed by training data 2. Makes as few assumptions as possible • Training dataset: {(x1, y1), (x2, y2), . . . , (xN, yN)}, where xi represents an image and yi represents a label • Predicate functions ◦ Unigram predicate: co-occurrence statistics of a word and a label fv1,LABEL(x, y) = 1 if y = LABEL and v1 ∈ x, 0 otherwise
  • 47. Maximum entropy models . . . • Notation ◦ f: predicate function ◦ p̃(x, y): empirical distribution of the observed pairs ◦ p(y|x): stochastic model to be learnt • Model fitting: expected value of the predicate function w.r.t. the stochastic model should equal the expected value of the predicate measured from the training data • Constrained optimization problem: Maximize H(p) = −Σx,y p̃(x)p(y|x) log p(y|x) s.t. Σx,y p̃(x, y)f(x, y) = Σx,y p̃(x)p(y|x)f(x, y) • Solution: p(y|x) = (1/Z(x)) exp(Σi=1..k λi fi(x, y))
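Given learned weights λi, classification with the exponential-form solution is just a softmax over labels; a sketch with binary unigram predicates of the kind defined earlier. The toy vocabulary, weights, and helper names are invented for illustration (in practice the λi come from a trainer such as GIS or L-BFGS in the maximum entropy toolkit).

```python
import numpy as np

def maxent_posterior(lambdas, features, x, labels):
    """p(y|x) = (1/Z(x)) exp(sum_i lambda_i f_i(x, y)) with binary unigram
    predicates. `features` is a list of (word, label) pairs, mirroring
    f_{v,LABEL}; `x` is the set of visual words present in the image."""
    scores = []
    for y in labels:
        s = sum(lam for lam, (v, lab) in zip(lambdas, features)
                if lab == y and v in x)     # unigram predicate fires
        scores.append(np.exp(s))
    Z = sum(scores)                         # partition function Z(x)
    return {y: sc / Z for y, sc in zip(labels, scores)}

# Toy vocabulary: word 'w7' votes for 'beach', word 'w3' for 'forest'
features = [('w7', 'beach'), ('w3', 'forest')]
lambdas = [1.5, 2.0]
post = maxent_posterior(lambdas, features, x={'w7'}, labels=['beach', 'forest'])
print(max(post, key=post.get))  # beach
```

Because p(y|x) is a full distribution over labels, the "second best label" analysis on the next slide falls out for free: just sort the posterior instead of taking only its arg max.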
  • 50. Results for the maximum entropy model • Same dataset, feature and codebook as used for the pLSA experiment • Evaluation using Zhang Le’s maximum entropy toolkit • 25-fold cross-validation accuracy: ∼ 70% • The second best label is often the correct label: accuracy improves to 85% Figure 3: Confusion matrices for the maximum entropy and naive Bayes models
  • 51. A comparative study

  Method                    # of catg.   training # per catg.   perf (%)
  Maximum entropy                5                 50              70
  pLSA                           5                 50              46
  Naive Bayes’ classifier        5                 50              66
  Fei-Fei                       13                100              64
  Vogel                          6               ∼100              89.3
  Vogel                          6               ∼100              67.2
  Oliva                          8             250-300             89

  Table 1: A performance comparison with other studies reported in the literature.
  • 52. Future Work • Further investigations into the pLSA model • Issue of model convergence • DAEM algorithm is not the ideal solution • Using a richer feature set, e.g., bank of Gabor filters • For maximum entropy models, ways to define predicates that will capture semantic information better
  • 53. THANK YOU