Bag-of-Words models

                  Computer Vision
                     Lecture 8


Slides from: S. Lazebnik, A. Torralba, L. Fei-Fei, D. Lowe, G. Csurka
Bag-of-features models
Overview: Bag-of-features models
• Origins and motivation
• Image representation
• Discriminative methods
  – Nearest-neighbor classification
  – Support vector machines
• Generative methods
  – Naïve Bayes
  – Probabilistic Latent Semantic Analysis
• Extensions: incorporating spatial information
Origin 1: Texture recognition
• Texture is characterized by the repetition of basic
  elements or textons
• For stochastic textures, it is the identity of the textons,
  not their spatial arrangement, that matters




Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Origin 1: Texture recognition

[Figure: texture images summarized as histograms over a universal texton dictionary]

Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Origin 2: Bag-of-words models
• Orderless document representation: frequencies
  of words from a dictionary (Salton & McGill, 1983)

                    US Presidential Speeches Tag Cloud
               http://chir.ag/phernalia/preztags/
Bags of features for image classification
1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
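The four steps translate into only a few lines of code. Below is a minimal sketch of steps 2–4 using NumPy and scikit-learn; it assumes step 1 has already produced local descriptors (e.g. 128-D SIFT vectors) per image, and the helper names are illustrative, not from the slides.

```python
# Sketch of steps 2-4, assuming descriptors were already extracted (step 1).
import numpy as np
from sklearn.cluster import KMeans

def learn_vocabulary(train_descriptors, k=200):
    """Step 2: cluster pooled training descriptors into k visual words."""
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(train_descriptors))

def bag_of_words(image_descriptors, vocabulary):
    """Steps 3-4: quantize each descriptor and count word frequencies."""
    words = vocabulary.predict(image_descriptors)  # nearest codevector indices
    hist = np.bincount(words, minlength=vocabulary.n_clusters)
    return hist / hist.sum()                       # normalized frequency histogram
```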
1. Feature extraction
• Regular grid
   – Vogel & Schiele, 2003
   – Fei-Fei & Perona, 2005
• Interest point detector
   – Csurka et al. 2004
   – Fei-Fei & Perona, 2005
   – Sivic et al. 2005
• Other methods
   – Random sampling (Vidal-Naquet & Ullman, 2002)
   – Segmentation-based patches (Barnard et al. 2003)
1. Feature extraction

Detect patches
  [Mikolajczyk and Schmid ’02]
  [Matas, Chum, Urban & Pajdla ’02]
  [Sivic & Zisserman ’03]

Normalize patch

Compute SIFT descriptor [Lowe ’99]

                                                               Slide credit: Josef Sivic
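For reference, the same detect–normalize–describe pipeline can be reproduced today with OpenCV's SIFT implementation; a hedged sketch (the filename is hypothetical, and this stands in for the original detectors cited above rather than replicating them exactly):

```python
import cv2

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
sift = cv2.SIFT_create()
# detectAndCompute finds interest points, normalizes the surrounding patch,
# and returns one 128-D SIFT descriptor per keypoint
keypoints, descriptors = sift.detectAndCompute(img, None)
print(descriptors.shape)  # (num_keypoints, 128)
```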
2. Learning the visual vocabulary

[Figure: extracted descriptors … → Clustering → Visual vocabulary]

                               Slide credit: Josef Sivic
K-means clustering
• Want to minimize the sum of squared
  Euclidean distances between points xi and
  their nearest cluster centers mk

  $$D(X, M) = \sum_{\text{cluster } k} \;\; \sum_{i \,\in\, \text{cluster } k} (x_i - m_k)^2$$

• Algorithm:
   – Randomly initialize K cluster centers
   – Iterate until convergence:
      • Assign each data point to the nearest center
      • Recompute each cluster center as the mean of
        all points assigned to it
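The two alternating steps map directly onto code. A from-scratch sketch of this iteration (Lloyd's algorithm, one standard implementation of the procedure above; not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]  # random initialization
    for _ in range(n_iters):
        # assignment step: nearest center for each point
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each center becomes the mean of its assigned points
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels
```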
From clustering to vector quantization
• Clustering is a common method for learning a visual
  vocabulary or codebook
   – Unsupervised learning process
   – Each cluster center produced by k-means becomes a
     codevector
   – Codebook can be learned on a separate training set
   – Provided the training set is sufficiently representative,
     the codebook will be “universal”

• The codebook is used for quantizing features
   – A vector quantizer takes a feature vector and maps it
     to the index of the nearest codevector in a codebook
   – Codebook = visual vocabulary
   – Codevector = visual word
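Quantization itself is a single nearest-neighbor lookup against the codebook. A minimal sketch, assuming `codebook` is the (M, d) array of codevectors produced by k-means:

```python
import numpy as np

def quantize(feature, codebook):
    """Map a feature vector to the index of its nearest codevector (visual word)."""
    d2 = ((codebook - feature) ** 2).sum(axis=1)  # squared Euclidean distances
    return int(d2.argmin())
```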
Example visual vocabulary




                            Fei-Fei et al. 2005
Image patch examples of visual words




                                 Sivic et al. 2005
Visual vocabularies: Issues
• How to choose vocabulary size?
  – Too small: visual words not representative of all
    patches
  – Too large: quantization artifacts, overfitting
• Generative or discriminative learning?
• Computational efficiency
  – Vocabulary trees
    (Nister & Stewenius, 2006)
3. Image representation

[Figure: bar chart of the image's visual-word histogram — frequency on the vertical axis, codewords along the horizontal axis]
Image classification
• Given the bag-of-features representations of
  images from different classes, how do we
  learn a model for distinguishing them?
Discriminative and generative methods for
             bags of features

                       Zebra
                       Non-zebra
Discriminative methods
• Learn a decision rule (classifier) assigning bag-
  of-features representations of images to
  different classes
      Decision                Zebra
      boundary
                              Non-zebra
Classification
• Assign input vector to one of two or more
  classes
• Any decision rule divides input space into
  decision regions separated by decision
  boundaries
Nearest Neighbor Classifier
• Assign label of nearest training data point
  to each test data point




      from Duda et al.


                    Voronoi partitioning of feature space
                     for two-category 2D and 3D data        Source: D. Lowe
K-Nearest Neighbors
• For a new point, find the k closest points from training
  data
• Labels of the k points “vote” to classify
• Works well provided there is lots of data and the distance
  function is good
                                   k=5




                                                    Source: D. Lowe
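A sketch of k-NN voting over bag-of-words histograms (the L1 distance used here anticipates the next slide; variable names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(query, train_hists, train_labels, k=5):
    d = np.abs(train_hists - query).sum(axis=1)  # L1 distance to every training point
    nearest = np.argsort(d)[:k]                  # indices of the k closest points
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]            # majority label wins
```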
Functions for comparing histograms
• L1 distance:

  $$D(h_1, h_2) = \sum_{i=1}^{N} |h_1(i) - h_2(i)|$$

• χ2 distance:

  $$D(h_1, h_2) = \sum_{i=1}^{N} \frac{(h_1(i) - h_2(i))^2}{h_1(i) + h_2(i)}$$
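Both distances are one-liners in NumPy. A sketch (h1, h2 are same-length nonnegative histograms; the small epsilon guarding empty bins is an added safeguard, not on the slide):

```python
import numpy as np

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    # eps avoids division by zero in bins where h1(i) + h2(i) = 0
    return (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum()
```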
Linear classifiers
• Find linear function (hyperplane) to separate
  positive and negative examples

  $$\mathbf{x}_i \text{ positive}: \quad \mathbf{x}_i \cdot \mathbf{w} + b \ge 0$$
  $$\mathbf{x}_i \text{ negative}: \quad \mathbf{x}_i \cdot \mathbf{w} + b < 0$$

  Which hyperplane is best?

                                             Slide: S. Lazebnik
Support vector machines
    • Find hyperplane that maximizes the margin
      between the positive and negative examples




C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and
Knowledge Discovery, 1998
Support vector machines
• Find hyperplane that maximizes the margin
  between the positive and negative examples

  $$\mathbf{x}_i \text{ positive } (y_i = 1): \quad \mathbf{x}_i \cdot \mathbf{w} + b \ge 1$$
  $$\mathbf{x}_i \text{ negative } (y_i = -1): \quad \mathbf{x}_i \cdot \mathbf{w} + b \le -1$$

  For support vectors, $\mathbf{x}_i \cdot \mathbf{w} + b = \pm 1$

  Distance between a point and the hyperplane: $\dfrac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\|\mathbf{w}\|}$

  Therefore, the margin is $2 / \|\mathbf{w}\|$

  [Figure: support vectors and margin]

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin hyperplane
1. Maximize margin 2/||w||
2. Correctly classify all training data:

   $$\mathbf{x}_i \text{ positive } (y_i = 1): \quad \mathbf{x}_i \cdot \mathbf{w} + b \ge 1$$
   $$\mathbf{x}_i \text{ negative } (y_i = -1): \quad \mathbf{x}_i \cdot \mathbf{w} + b \le -1$$

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Nonlinear SVMs
• Datasets that are linearly separable work out great:
  [Figure: 1D points separable by a threshold at 0 on the x-axis]
• But what if the dataset is just too hard?
  [Figure: 1D points that no single threshold can separate]
• We can map it to a higher-dimensional space:
  [Figure: mapping x → (x, x²) makes the same data linearly separable]

                                                   Slide credit: Andrew Moore
Nonlinear SVMs
• General idea: the original input space can
  always be mapped to some higher-
  dimensional feature space where the training
  set is separable:

                   Φ: x → φ(x)




                                     Slide credit: Andrew Moore
Nonlinear SVMs
• The kernel trick: instead of explicitly
  computing the lifting transformation φ(x),
  define a kernel function K such that
                K(xi, xj) = φ(xi) · φ(xj)
• (to be valid, the kernel function must satisfy
  Mercer’s condition)
• This gives a nonlinear decision boundary in
  the original feature space:

  $$\sum_i \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b$$

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Kernels for bags of features
• Histogram intersection kernel:

  $$I(h_1, h_2) = \sum_{i=1}^{N} \min(h_1(i), h_2(i))$$

• Generalized Gaussian kernel:

  $$K(h_1, h_2) = \exp\left(-\frac{1}{A} \, D(h_1, h_2)^2\right)$$

• D can be Euclidean distance, χ2 distance

J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, IJCV 2007
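The histogram intersection kernel plugs straight into an SVM library that accepts a callable kernel. A sketch using scikit-learn (the histogram variables are assumed, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

def hist_intersection(X, Y):
    """Gram matrix: K[a, b] = sum_i min(X[a, i], Y[b, i])."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

# train_hists: (n, N) word histograms; train_labels: (n,) class labels
# clf = SVC(kernel=hist_intersection).fit(train_hists, train_labels)
```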
What about multi-class SVMs?
• Unfortunately, there is no “definitive” multi-class SVM
  formulation
• In practice, we have to obtain a multi-class SVM by
  combining multiple two-class SVMs
• One vs. others
   – Training: learn an SVM for each class vs. the others
   – Testing: apply each SVM to test example and assign to it the
     class of the SVM that returns the highest decision value
• One vs. one
   – Training: learn an SVM for each pair of classes
   – Testing: each learned SVM “votes” for a class to assign to the
     test example



                                                         Slide: S. Lazebnik
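Both strategies are available off the shelf; a sketch with scikit-learn's wrappers (a linear SVM is an assumed base classifier here):

```python
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

ovr = OneVsRestClassifier(LinearSVC())  # one SVM per class vs. the others
ovo = OneVsOneClassifier(LinearSVC())   # one SVM per pair of classes; majority vote
# ovr.fit(train_hists, train_labels); ovr.predict(test_hists)
```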
SVMs: Pros and cons
• Pros
  – Many publicly available SVM packages:
    http://www.kernel-machines.org/software
  – Kernel-based framework is very powerful, flexible
  – SVMs work very well in practice, even with very
    small training sample sizes

• Cons
  – No “direct” multi-class SVM, must combine two-
    class SVMs
  – Computation, memory
     • During training time, must compute matrix of kernel
       values for every pair of examples
     • Learning can take a very long time for large-scale
       problems

                                                    Slide: S. Lazebnik
Summary: Discriminative methods
• Nearest-neighbor and k-nearest-neighbor classifiers
   – L1 distance, χ2 distance, quadratic distance
• Support vector machines
   – Linear classifiers
   – Margin maximization
   – The kernel trick
   – Kernel functions: histogram intersection, generalized
     Gaussian, pyramid match
   – Multi-class
• Of course, there are many other classifiers out there
   – Neural networks, boosting, decision trees, …
Generative learning methods for bags of features

• Model the probability of a bag of features
  given a class
Generative methods
• We will cover two models, both inspired by
  text document analysis:
  – Naïve Bayes
  – Probabilistic Latent Semantic Analysis
The Naïve Bayes model
• Assume that each feature is conditionally
  independent given the class

  $$p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c)$$

     fi: ith feature in the image
     N: number of features in the image




                                              Csurka et al. 2004
The Naïve Bayes model
• Assume that each feature is conditionally
  independent given the class
  $$p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c) = \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)}$$


     fi: ith feature in the image
     N: number of features in the image
      wj: jth visual word in the vocabulary
      M: size of visual vocabulary
      n(wj): number of features of type wj in the image
                                                   Csurka et al. 2004
The Naïve Bayes model
• Assume that each feature is conditionally
  independent given the class
  $$p(f_1, \ldots, f_N \mid c) = \prod_{i=1}^{N} p(f_i \mid c) = \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)}$$

  $$p(w_j \mid c) = \frac{\text{no. of features of type } w_j \text{ in training images of class } c}{\text{total no. of features in training images of class } c}$$




                                                                         Csurka et al. 2004
The Naïve Bayes model
• Maximum A Posteriori decision:

  $$c^* = \arg\max_c \; p(c) \prod_{j=1}^{M} p(w_j \mid c)^{n(w_j)} = \arg\max_c \left[ \log p(c) + \sum_{j=1}^{M} n(w_j) \log p(w_j \mid c) \right]$$


   (you should compute the log of the likelihood
   instead of the likelihood itself in order to avoid
   underflow)
                                                   Csurka et al. 2004
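Putting the last three slides together — frequency-count estimation of p(wj | c) and the log-domain MAP decision — gives a compact classifier. A sketch (the Laplace smoothing term `alpha` is an added safeguard against zero counts, not on the slides):

```python
import numpy as np

def train_nb(counts, labels, n_classes, alpha=1.0):
    """counts: (n_images, M) matrix of n(w_j) per image; labels in {0..C-1}."""
    log_pw = np.zeros((n_classes, counts.shape[1]))
    log_prior = np.zeros(n_classes)
    for c in range(n_classes):
        wc = counts[labels == c].sum(axis=0) + alpha  # smoothed word counts
        log_pw[c] = np.log(wc / wc.sum())             # log p(w_j | c) by counting
        log_prior[c] = np.log((labels == c).mean())   # log p(c)
    return log_prior, log_pw

def classify_nb(x, log_prior, log_pw):
    """MAP decision in the log domain, as the slide recommends (x = word counts)."""
    return int((log_prior + log_pw @ x).argmax())
```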
The Naïve Bayes model

• “Graphical model”:

  [Figure: plate notation — class node c points to word node w; the w node sits in a plate replicated N times]

                                           Csurka et al. 2004
Probabilistic Latent Semantic Analysis

[Figure: Image = p1 · (zebra topic) + p2 · (grass topic) + p3 · (tree topic) — the “visual topics”]

T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
Probabilistic Latent Semantic Analysis
• Unsupervised technique
• Two-level generative model: a document is a
  mixture of topics, and each topic has its own
  characteristic word distribution


        d  ──P(z|d)──►  z  ──P(w|z)──►  w
    (document)        (topic)        (word)


      T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
Probabilistic Latent Semantic Analysis
• Unsupervised technique
• Two-level generative model: a document is a
  mixture of topics, and each topic has its own
  characteristic word distribution


        d  ──►  z  ──►  w

  $$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$$

      T. Hofmann, Probabilistic Latent Semantic Analysis, UAI 1999
The pLSA model
  $$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$$

  p(wi|dj): probability of word i in document j (known)
  p(wi|zk): probability of word i given topic k (unknown)
  p(zk|dj): probability of topic k given document j (unknown)
The pLSA model
  $$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$$

[Figure: matrix decomposition — the observed codeword distributions p(wi|dj), an M×N words-by-documents matrix, factor into the codeword distributions per topic (class) p(wi|zk), an M×K matrix, times the class distributions per image p(zk|dj), a K×N matrix]
Learning pLSA parameters
Maximize the likelihood of the data (via the EM algorithm):

  $$L = \prod_{i=1}^{M} \prod_{j=1}^{N} p(w_i \mid d_j)^{n(w_i, d_j)}$$

  n(wi, dj) … observed counts of word i in document j
  M … number of codewords
  N … number of images

                                            Slide credit: Josef Sivic
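A sketch of the EM iterations that maximize this likelihood (the standard pLSA updates from Hofmann 1999; the dense (M, N, K) posterior array is a simplification that is fine for small vocabularies):

```python
import numpy as np

def plsa(n, K, n_iters=100, seed=0):
    """n: (M, N) matrix of observed counts n(w_i, d_j)."""
    M, N = n.shape
    rng = np.random.default_rng(seed)
    p_w_z = rng.random((M, K)); p_w_z /= p_w_z.sum(axis=0)  # p(w|z), columns sum to 1
    p_z_d = rng.random((K, N)); p_z_d /= p_z_d.sum(axis=0)  # p(z|d), columns sum to 1
    for _ in range(n_iters):
        # E-step: posterior p(z | w, d), shape (M, N, K)
        joint = p_w_z[:, None, :] * p_z_d.T[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: re-estimate both factors from expected counts
        nz = n[:, :, None] * post
        p_w_z = nz.sum(axis=1); p_w_z /= p_w_z.sum(axis=0)
        p_z_d = nz.sum(axis=0).T; p_z_d /= p_z_d.sum(axis=0)
    return p_w_z, p_z_d
```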
Inference
• Finding the most likely topic (class) for an image:

  $$z^* = \arg\max_z \; p(z \mid d)$$

• Finding the most likely topic (class) for a visual
  word in a given image:

  $$z^* = \arg\max_z \; p(z \mid w, d)$$
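Given the fitted factors from EM, both queries are argmax lookups. A sketch (matrix conventions follow the EM sketch above):

```python
import numpy as np

def topic_of_image(p_z_d, j):
    """z* = argmax_z p(z | d_j)."""
    return int(p_z_d[:, j].argmax())

def topic_of_word_in_image(p_w_z, p_z_d, i, j):
    """z* = argmax_z p(z | w_i, d_j), since p(z | w, d) is proportional to p(w | z) p(z | d)."""
    return int((p_w_z[i, :] * p_z_d[:, j]).argmax())
```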
Topic discovery in images




J. Sivic, B. Russell, A. Efros, A. Zisserman, B. Freeman, Discovering Objects and their Location
in Images, ICCV 2005
Application of pLSA: Action recognition
                                  Space-time interest points




Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action
Categories Using Spatial-Temporal Words, IJCV 2008.
Application of pLSA: Action recognition




Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action
Categories Using Spatial-Temporal Words, IJCV 2008.
pLSA model

  $$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k) \, p(z_k \mid d_j)$$

  p(wi|dj): probability of word i in video j (known)
  p(wi|zk): probability of word i given topic k (unknown)
  p(zk|dj): probability of topic k given video j (unknown)

– wi = spatial-temporal word
– dj = video
– n(wi, dj) = co-occurrence table
  (# of occurrences of word wi in video dj)
– z = topic, corresponding to an action
Action recognition example
Multiple Actions
Summary: Generative models
• Naïve Bayes
  – Unigram models in document analysis
  – Assumes conditional independence of words given
    class
  – Parameter estimation: frequency counting
• Probabilistic Latent Semantic Analysis
  – Unsupervised technique
  – Each document is a mixture of topics (image is a
    mixture of classes)
  – Can be thought of as matrix decomposition


Editor’s notes

  1. For that we will use a method called probabilistic latent semantic analysis. pLSA can be thought of as a matrix decomposition. Here is our term-document matrix (documents, words), and we want to find topic vectors common to all documents and mixture coefficients P(z|d) specific to each document. Note that these are all probabilities, which sum to one, so a column here is expressed as a convex combination of the topic vectors. What we would like to see is that the topics correspond to objects, so that an image is expressed as a mixture of different objects and backgrounds.
  2. We use a maximum likelihood approach to fit the parameters of the model. We write down the likelihood of the whole collection and maximize it using EM. M and N range over all visual words and all documents. P(w|d) factorizes as on the previous slide; the entries in those matrices are our parameters. n(w,d) are the observed counts in our data.