Probabilistic generative models for machine vision
1. Machine Vision Background
Region Annotation by Hierarchical Region Topic Discovery
Probabilistic Generative Models for Machine
Vision
Davide Bacciu
bacciu@di.unipi.it
Dipartimento di Informatica
Università di Pisa
Corso di Intelligenza Artificiale
Prof. Alessandro Sperduti
Università degli Studi di Padova
A.A. 2008/2009
Davide Bacciu Probabilistic Generative Models for Machine Vision
2. Machine Vision Background
Region Annotation by Hierarchical Region Topic Discovery
Outline
1 Machine Vision Background
Introduction
Visual Content Representation
Machine Vision Applications
2 Region Annotation by Hierarchical Region Topic Discovery
Visual Content Representation
Hierarchical Probabilistic Latent Semantic Analysis
Experimental Evaluation
Davide Bacciu Probabilistic Generative Models for Machine Vision
3. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Introduction
Visual content representation
Visual features - how to identify and describe the single bits
of visual information
Image representation - how to describe the visual
information within a picture
Machine vision applications
Scene and object recognition
Image annotation
Focus on Probabilistic Generative Models and the Bag of
Words image representation
Davide Bacciu Probabilistic Generative Models for Machine Vision
4. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Visual Features Descriptors
How do we describe the single bits of visual information?
Global descriptors
Color histograms
Shape features
Texture
Spatial Envelope
Local descriptors
Gabor filters
Differential filters
Distribution-based
descriptors
Davide Bacciu Probabilistic Generative Models for Machine Vision
5. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Scale Invariant Feature Transform (SIFT)
Calculate gradient magnitude
and orientation at a given
keypoint (x, y ) and scale σ
Compute 3D histogram of
magnitude orientations in a
4 × 4 neighborhood of the
keypoint
Angles are quantized to 8
directions
Robust to geometric and
illumination distortion
Davide Bacciu Probabilistic Generative Models for Machine Vision
6. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Visual Features Detectors
How do we identify the interesting/informative parts of an
image?
Feature detectors
Segment/Blob detectors
Corner/Interest point detectors
Affine invariant keypoint detectors
Dense sampling
Davide Bacciu Probabilistic Generative Models for Machine Vision
7. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Image Representation
Straightforward for global descriptors
An image corresponds to its descriptor
Images are confronted, classified and indexed based on
their global color and/or texture histograms
Requires an additional step for local descriptors
Local descriptors provide only information on a portion of
the scene
Probabilistic modeling of segments
Bag of words
Davide Bacciu Probabilistic Generative Models for Machine Vision
8. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Bag of Words Document Representation
Count the occurrences of each dictionary word in your
document
Represent the document as a vector of word counts
Definition (Bag of Words Assumption)
Word order is not relevant for determining document semantics
Davide Bacciu Probabilistic Generative Models for Machine Vision
9. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Bag of Words Image Representation
Each image is a document and each visual patch is a word
Count the occurrences of each dictionary visual word
(visterm) in your image
Represent the image as a vector of visterm counts
(histogram)
... k−means
Image Codebook
Collection
Local patches
Vector
Quantization
Image Input Image BOW
Collection
Davide Bacciu Probabilistic Generative Models for Machine Vision
10. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Image Understanding Applications
Images are typically represented as vector of features
Discover the semantics of an image from its vectorial
representation
Object recognition (e.g. Airplane)
Scene recognition (e.g. Forest)
Image annotation (e.g. Sky, Tree, Boat, Sea, ...)
Davide Bacciu Probabilistic Generative Models for Machine Vision
11. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Scene and Object Recognition
Structured approaches
Place Context
Object/scene categories are
represented as collections of
connected parts characterized by Object Context
distinctive appearance and spatial
Part Context
position
Part-based probabilistic models
Weakly structured approaches
PixelContext
Based on bag-of-word assumption
Latent aspect models
Davide Bacciu Probabilistic Generative Models for Machine Vision
12. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Latent Aspect Discovery
Rooted into text processing
Probabilistic models describing the generative process of
image content using hidden (i.e. latent) variables
Model likelihood is used to measure the fit of the unknown
parameters with respect to the observations
Model parameters are fitted by Expectation Maximization
(EM) or variational methods
Davide Bacciu Probabilistic Generative Models for Machine Vision
13. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Probabilistic Latent Semantic Analysis (pLSA) (I)
P(d) P(z|d) P(w|z)
d z w
Wd
D
Observations are collected in the integer matrix n(wj , di )
Every observation is associated to a latent topic k by
means of the hidden variable zk
Each hidden topic k denotes a particular visual theme (e.g.
an object)
Davide Bacciu Probabilistic Generative Models for Machine Vision
14. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Probabilistic Latent Semantic Analysis (pLSA) (II)
P(d) P(z|d) P(w|z)
d z w
Wd
D
pLSA is trained by Expectation Maximization (EM)
Conditional independence is used to decompose model
likelihood
D W
L(N |θ) = P(wj , di )n(wj ,di )
i=1 j=1
K
P(wj , di ) = P(di )P(wj |di ) = P(di ) P(zk |di )P(wj |zk )
k =1
Davide Bacciu Probabilistic Generative Models for Machine Vision
15. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
An Object Recognition Example
Caltech-4 Dataset: cars, faces, motorbikes and airplanes on a
cluttered background
Fit a pLSA model with 7 latent topics (4 objects + 3 background)
Determine the most likely topic of each visual word in image di
P(zk |di )P(wj |zk )
P(zk |wj , di ) = K
k =1 P(zk |di )P(wj |zk )
Figure: Images by Sivic et al 2005
Davide Bacciu Probabilistic Generative Models for Machine Vision
16. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Latent Dirichlet Allocation (LDA)
β φ
K
α θ z w
Wd
D
pLSA provides a generative model only for the training data
LDA treats P(zk |di ) as a latent random variable, sampling
it from a Dirichlet distribution P(θ|α)
Maximizes data likelihood
P(w|φ, α, β) = P(w|z, φ)P(z|θ)P(θ|α)P(φ|β)dθ
z
Davide Bacciu Probabilistic Generative Models for Machine Vision
17. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Hierarchical Latent Topic Models...
Dependent Pachinko Allocation Model (Li and McCallum, 2006)
Davide Bacciu Probabilistic Generative Models for Machine Vision
18. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Image Annotation
Learn association between image content and text
Typically exploits region-based image representation
Image segmentation
Color and texture descriptors
Non-probabilistic image annotation
Information retrieval approaches (e.g. vector based)
Multiple instance learning
Probabilistic Image Annotation
Machine translation approach
Probabilistic generative
Latent topic models
Davide Bacciu Probabilistic Generative Models for Machine Vision
19. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Latent Aspect Models in Image Annotation
µ µ
z b α θ z b
Bd σ Bd σ
α θ
v t β v t β
Md Md
D D
GM-LDA CORR-LDA
Based on a Mixture of Gaussians modeling of the Bd image
regions and a multinomial distribution for annotations Md
Davide Bacciu Probabilistic Generative Models for Machine Vision
20. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
Image Annotation - Is it Worth the Hassle?
The Web is full of annotated multimedia information
Davide Bacciu Probabilistic Generative Models for Machine Vision
21. Introduction
Machine Vision Background
Visual Content Representation
Region Annotation by Hierarchical Region Topic Discovery
Machine Vision Applications
How Informative is Image Representation?
BOW assumption is strong in the visual domain: spatial
context matters!
Region-based representation is often too coarse and
unstable.
How can we retain the robustness of BOW while
introducing spatial context?
Davide Bacciu Probabilistic Generative Models for Machine Vision
22. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Introduction
Overtake flat region-based and keypoint-based
approaches
Structured representation conveying richer semantics and
discriminative power than a BOW model without the rigidity
of a part-based approach
Key idea
Introduce a multi-resolution representation of visual content
drawing both on region-based and keypoint-based models
Introduce a multi-layered structure at the semantic level of
latent topic discovery
Davide Bacciu Probabilistic Generative Models for Machine Vision
23. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Multi-Resolution Image Representation
Multi-resolution describing visual information at different
spatial granularity
BOW representation is based on assumption that spatial
distribution of visual keywords can be neglected when
determining the overall image appearance
Biased view holding mostly for corpora characterized by
small variability of the pictorial content
Need of structured approach that can isolate semantically
different image portions
Text representation
Documents are collections of paragraphs
Paragraphs are collections of words
Davide Bacciu Probabilistic Generative Models for Machine Vision
24. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Region-Keypoints Representation
r1
r2
r3
r4 wr
bag−of−words
r5
segmented image ...
bag−of−regions
Use an unsupervised learning model to extract perceptually
coherent portions of the image rl
Use dense sampling to extract a set of local visual descriptors
Davide Bacciu Probabilistic Generative Models for Machine Vision
25. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Hierarchical Region-Topic Probabilistic Latent
Semantic Analysis (HRPLSA)
φ
T
d z t w
Wr
rd
D
Complete model likelihood
nilj
D Ri K Wl T
LC (N |θ) = P(zk |di )Zilk P(tp |zk )(φjp )Tkjp
i=1 l=1 k =1 j=1 p=1
Davide Bacciu Probabilistic Generative Models for Machine Vision
26. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Model Fitting
HRPLSA parameters are fitted by applying Expectation
Maximization to the complete log-likelihood, e.g.
D Ri Wl
i=1 l=1 P(zk |di , rl ) j=1 nilj P(tp |zk , wj )
P(tp |zk ) = Ri
D
i =1 l =1 P(zk |di , rl )ni l ·
Model parameters θ
P(zk |di ) - Document-specific mixing proportions of region
topics
P(tp |zk ) - Mixing proportions of keyword topics given region
aspects
φjp - Multinomial keyword-topic distribution
Probabilistic inference is extended to images outside the
training pool by folding-in
Davide Bacciu Probabilistic Generative Models for Machine Vision
27. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Modeling Images and Words by HRPLSA
(HRPLSA-ANN)
d
sky
building
P (a|z)
streetlight
sing z1 z2 ... zR
awning
...
window P (a|t)
sidewalk
t1 t2 ... tW1 t1 t2 ... tWR
plants
car
road
w1 w2 wW1 w1 w2 wWR
Connection between region-topics (keyword-topics) and word is
learned by maximizing constrained annotation log-likelihood
D F K
Lcal (N |θ ) = mfi log P(af |zk )P(zk |di )
i=1 f =1 k =1
Davide Bacciu Probabilistic Generative Models for Machine Vision
28. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Probabilistic Inference
A Few Examples
Most likely visual aspect within an image di
z i = arg max P(zk |di )
zk ∈Z
Most likely visual aspect z ∗ for an image segment rl
z ∗ = arg max P(zk |di , rl )
zk ∈Z
Most likely annotation a∗ for region rl in image dnew
K
a∗ = arg max P(af |zk )P(zk |dnew )
af ∈A
k =1
Davide Bacciu Probabilistic Generative Models for Machine Vision
29. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Unsupervised Discovery and Segmentation of Visual
Classes
Microsoft Research Cambridge dataset (MSRC-B1)
240 images and 9 visual classes
Ground truth segmentation of visual classes
Image segments are assigned to the most likely
discovered aspect and segmentation overlap is measured
GTo ∩ Rr
ρor = .
GTo ∪ Rr
Davide Bacciu Probabilistic Generative Models for Machine Vision
30. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Semantic Segmentation
Discovery of semantic visual aspects corrects segmentation
errors
Davide Bacciu Probabilistic Generative Models for Machine Vision
32. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
MSRC-B1 Segmentation Example I
Davide Bacciu Probabilistic Generative Models for Machine Vision
33. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
MSRC-B1 Segmentation Example II
Davide Bacciu Probabilistic Generative Models for Machine Vision
34. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
MSRC-B1 Segmentation Example III
Davide Bacciu Probabilistic Generative Models for Machine Vision
35. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Image Annotation
Washington Ground Truth dataset
854 images manually annotated with 392 words
427 training/test images
LabelMe dataset
2688 images annotated with ∼ 400 words
800 training / 1888 test images
Comparative analysis
HRPLSA-ANN
pLSA-FEATURES
Annotation Propagation (AP)
Precision/recall analysis
Davide Bacciu Probabilistic Generative Models for Machine Vision
36. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Washington Dataset
Image Annotation
Precision/Recall Plot Comparative precision/recall results on the Washington dataset
0.8
plsa245
0.75 hrplsa245
ap245
0.7
plsa188
0.65 hrplsa188
ap188
0.6
precision
plsa158
hrplsa158
0.55
ap158
0.5
0.45
0.4
0.35
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
recall
Davide Bacciu Probabilistic Generative Models for Machine Vision
37. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Washington Dataset
Image Annotation Example
Davide Bacciu Probabilistic Generative Models for Machine Vision
38. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Washington Dataset
Region Annotation Example (I)
Examples of good HRPLSA-ANN annotations
50 50
clouds tree
clouds tree
100 100
building
150 tree tree
150
tree
200
clouds 200
clouds
sky
250 tree
250
tree
300
300
house 350
350 tree
400
400 people tree sidewalk tree
people 450
450
500
50
clear
100 clear
house
150
window
200
250 tree building
tree
300 building
350
tree
400 tree
tree tree
450
building tree
500
Davide Bacciu Probabilistic Generative Models for Machine Vision
39. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Washington Dataset
Region Annotation Example (II)
Examples of bad HRPLSA-ANN annotations
50 50
tree clouds
tree tree clouds clouds
100 100
tree
150 tree 150
200 200 water sky
tree tree
250 250
tree tree tree
tree
300 300
tree
350
tree 350
sky
400 400
tree sky tree
bush
450 450
flower
500 500
50
overcast
sky
people
100
150
house clouds
200
250 tree
people
300
350
tree
400
tree tree
450
500
Davide Bacciu Probabilistic Generative Models for Machine Vision
40. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
Washington Dataset
Annotated Segments
Davide Bacciu Probabilistic Generative Models for Machine Vision
44. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
LabelMe Dataset
Region Annotation Issue I
Region annotation tends to predict only a limited number of
words
0.16 coast forest
400 2000
0.14 200 1000
0.12 0 0
0 100 200 300 0 100 200 300
highway city
0.1 1000 2000
500 1000
0.08
0 0
0 100 200 300 0 100 200 300
0.06 mountain country
1000 1000
0.04
500 500
0.02 0 0
0 100 200 300 0 100 200 300
street tallbuilding
0 2000 2000
g
n
on
er
ld
ad
w
e
er
r
a
y
ca
sk
in
ai
tre
se
do
fie
ap
at
rs
ro
ild
nt
1000 1000
w
in
pe
cr
ou
bu
w
ys
m
sk
0 0
0 100 200 300 0 100 200 300
Davide Bacciu Probabilistic Generative Models for Machine Vision
45. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
LabelMe Dataset
Region Annotation Issue II
Distribution of region-topics in mountain caption
Topic 11 Topic 40
Topic 20
The unbalanced distribution of term occurrences introduces a bias in
the generation of the segment caption that yields to distinct region
aspects being associated to the same frequently co-occurring word
Davide Bacciu Probabilistic Generative Models for Machine Vision
46. Visual Content Representation
Machine Vision Background
Hierarchical Probabilistic Latent Semantic Analysis
Region Annotation by Hierarchical Region Topic Discovery
Experimental Evaluation
LabelMe Dataset
Annotated Segments
Davide Bacciu Probabilistic Generative Models for Machine Vision
47. Concluding Remarks
Conclusion
Bibliography and Online Resources
Wrapping up...
Inspiration can be sought in related application fields
Bag of words representation
Generative models for text
Need to pay attention to the underlying assumptions of a
model!
Hierarchical multi-resolution latent aspect model (HRPLSA)
Unsupervised semantic segmentation of visual content
Multi-level image captioning
Future challenges
Generalize multi-resolution representation and topic
hierarchy beyond a two-layered structure
Video/multimedia understanding
Davide Bacciu Probabilistic Generative Models for Machine Vision
48. Concluding Remarks
Conclusion
Bibliography and Online Resources
Online Resources
A notable introductory course to object recognition (with code)
http://people.csail.mit.edu/torralba/
shortCourseRLOC/
Code repositories
http://www.cs.ubc.ca/~lowe/keypoints/ (Original
SIFT software by D. Lowe)
http://vlfeat.org/ (SIFT, MSER and VQ routines)
www.robots.ox.ac.uk/~vgg/research/affine/ (Affine
detectors and more)
http://www.di.ens.fr/~russell/projects/mult_
seg_discovery/index.html (Multiple segmentation LDA)
Image annotation tool and repository
http://labelme.csail.mit.edu/ (LabelMe)
Davide Bacciu Probabilistic Generative Models for Machine Vision
49. Concluding Remarks
Conclusion
Bibliography and Online Resources
Bibliography
D. Bacciu, A Perceptual Learning Model to Discover the Hierarchical Latent Structure of Image Collections,
Ph.D. Thesis, 2008.
J. Sivic and A. Zisserman. Video google: a text retrieval approach to object matching in videos.
Proceedings of the Ninth IEEE International Conference on Computer Vision, ICCV 2003, pages
1470–1477, vol.2, Oct. 2003.
J. Sivic, B.C. Russell, A.A. Efros, A. Zisserman, and W.T. Freeman. Discovering objects and their location in
images. Proceedings of the Tenth IEEE International Conference on Computer Vision, ICCV 2005,
370–377, Oct. 2005.
J. Sivic, B. C. Russell, A. Zisserman, W. T. Freeman, and A. A. Efros. Unsupervised discovery of visual
object class hierarchies. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
2008.
K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. M. Blei, and M. I. Jordan. Matching words and pictures.
Journal of Machine Learning Research, vol. 3, 1107–1135, Feb. 2003.
D. M. Blei and M. I. Jordan. Modeling annotated data. Proceedings of the International Conference on
Research and Development in Information Retrieval, SIGIR 2003, Aug. 2003.
F. Monay and D. Gatica-Perez. Modeling semantic aspects for cross-media image indexing. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 29(10):1802–1817, Oct. 2007.
Davide Bacciu Probabilistic Generative Models for Machine Vision
50. Concluding Remarks
Conclusion
Bibliography and Online Resources
Thank You
Davide Bacciu Probabilistic Generative Models for Machine Vision