Bag-of-features models




     Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Overview: Bag-of-features models
• Origins and motivation
• Image representation
  • Feature extraction
  • Visual vocabularies
• Discriminative methods
  • Nearest-neighbor classification
  • Distance functions
  • Support vector machines
  • Kernels
• Generative methods
  • Naïve Bayes
  • Probabilistic Latent Semantic Analysis
• Extensions: incorporating spatial information
Origin 1: Texture recognition

• Texture is characterized by the repetition of basic elements
  or textons
• For stochastic textures, it is the identity of the textons, not
  their spatial arrangement, that matters




Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001;
Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Origin 1: Texture recognition

[Figure: each texture image is represented as a histogram of texton frequencies over a universal texton dictionary]

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001;
Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Origin 2: Bag-of-words models
• Orderless document representation: frequencies of words
  from a dictionary (Salton & McGill, 1983)

                    US Presidential Speeches Tag Cloud
                 http://chir.ag/phernalia/preztags/
Bags of features for object recognition




[Figure: an image represented as a collection of local features, with image-level labels such as face, flowers, building]



•   Works pretty well for image-level classification



        Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
Bags of features for object recognition


[Figure: classification results on the Caltech6 dataset, comparing bag-of-features models with a parts-and-shape model]
Bag of features: outline
1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
1. Feature extraction

Regular grid
   • Vogel & Schiele, 2003
   • Fei-Fei & Perona, 2005
Interest point detector
   • Csurka et al. 2004
   • Fei-Fei & Perona, 2005
   • Sivic et al. 2005
Other methods
   • Random sampling (Vidal-Naquet & Ullman, 2002)
   • Segmentation-based patches (Barnard et al. 2003)
1. Feature extraction


The standard pipeline: detect patches [Mikolajczyk & Schmid ’02; Matas, Chum, Urban & Pajdla ’02; Sivic & Zisserman ’03], normalize each patch, then compute a SIFT descriptor [Lowe ’99] on the normalized patch.

                                                        Slide credit: Josef Sivic
1. Feature extraction

[Figure: local descriptors extracted from a set of training images]
2. Learning the visual vocabulary

[Figure: descriptors pooled from the training images are clustered; the cluster centers form the visual vocabulary]

                           Slide credit: Josef Sivic
K-means clustering
•   Want to minimize sum of squared Euclidean
    distances between points xi and their
    nearest cluster centers mk

               $$D(X, M) = \sum_{k=1}^{K} \;\; \sum_{i \,\in\, \text{cluster } k} \| x_i - m_k \|^2$$

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
    •   Assign each data point to the nearest center
    •   Recompute each cluster center as the mean of all points
        assigned to it
From clustering to vector quantization
• Clustering is a common method for learning a
  visual vocabulary or codebook
  • Unsupervised learning process
  • Each cluster center produced by k-means becomes a
    codevector
  • Codebook can be learned on separate training set
  • Provided the training set is sufficiently representative, the
    codebook will be “universal”


• The codebook is used for quantizing features
  • A vector quantizer takes a feature vector and maps it to the
    index of the nearest codevector in a codebook
  • Codebook = visual vocabulary
  • Codevector = visual word
Example visual vocabulary




                            Fei-Fei et al. 2005
Image patch examples of visual words




                               Sivic et al. 2005
Visual vocabularies: Issues
• How to choose vocabulary size?
  • Too small: visual words not representative of all patches
  • Too large: quantization artifacts, overfitting
• Generative or discriminative learning?
• Computational efficiency
  • Vocabulary trees
    (Nistér & Stewénius, 2006)
3. Image representation
[Figure: a bag-of-features histogram; the horizontal axis ranges over the codewords, the vertical axis gives each codeword’s frequency in the image]
Image classification
• Given the bag-of-features representations of
  images from different classes, how do we
  learn a model for distinguishing them?
