Introduction Sub-space Embedding Fuzzy Visual Model Structure Estimation Structured Sparse Model Summary
Learning A Structured Model For Visual Category Recognition
Ashish Gupta
University of Surrey
a.gupta@surrey.ac.uk
July 5, 2013
Introduction
Introduction : What is Category Recognition?
Feature vector Embedding : Information in Sub-Manifold.
Feature vector distribution: Fuzzy Visual Model.
Estimating semantic structure: Co-clustering.
Sparse Models: Semantically structured.
Summary & Future Work
Motivation
Visual Category?
Robot interacts with physical objects.
Object taxonomy is based on physical properties.
Robot recognizes objects using visual appearance.
Motivation
Visual Category Model
Appearance variation → scatter of semantically related descriptors in feature space.
Can this scatter distribution be estimated?
Can this structure be used to improve the learnt visual model?
Visual category model ≈ Visual object model + Estimated structure of visual category variation
Approach
Visual Classification Pipeline
Structure in sub-spaces → groups of sub-spaces → dictionary
Structure in dictionary → groups of prototypes → encoding
Approach
Feature Descriptor Matrix
Figure: Matrix of 500 D-SIFT feature descriptors from Scene-15, each of 128 dimensions (axes: feature vectors, dimensions).
Approach
Encoded Feature Matrix
Conceptual illustration of encoded feature matrix, occurrence
histogram of visual words in images.
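The occurrence histogram described above can be sketched as follows. This is a minimal illustration that assumes hard nearest-word assignment with Euclidean distance; the function name `bow_histogram` and the toy data are hypothetical and not taken from the slides.

```python
import numpy as np

def bow_histogram(descriptors, dictionary):
    """Encode one image as a normalized histogram of visual-word occurrences.

    descriptors: (n, p) array of feature vectors from one image.
    dictionary:  (k, p) array of visual words (e.g. K-means centres).
    """
    # Squared Euclidean distance between every descriptor and every word: (n, k).
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment: each descriptor votes for its nearest visual word.
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=dictionary.shape[0])
    return counts / counts.sum()

# Toy example: two visual words, four descriptors.
toy_dictionary = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_descriptors = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [0.5, 0.5]])
toy_hist = bow_histogram(toy_descriptors, toy_dictionary)
print(toy_hist)
```

Stacking one such histogram per image gives an encoded feature matrix of the kind the slide illustrates, with images along one axis and visual words along the other.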
Approach
Conceptual Interpretation
Structure estimation can be interpreted as estimation of
semantically related rows or columns of data matrix. These are
projected to a lower dimensional space such that mutual separation
between equivalent feature vectors is reduced.
Sub-space Embedding
Feature descriptor space is high dimensional.
Relevant information is embedded in a lower dimensional
sub-manifold.
What is the appropriate lower dimensionality?
Measure efficacy of sub-space embedding method?
Measure information in embedded feature vectors.
Intrinsic Dimensionality
Intrinsic dimensionality p estimation
Correlation Dimension
Number of feature vectors in a hypersphere of radius r is proportional to r^p.
Maximum Likelihood Estimate
Expectation of the number of feature vectors covered by a hypersphere of growing radius r.
Eigenvalue Estimate
Number of eigenvalues greater than a small threshold value.
Geodesic Minimum Spanning Tree
Based on the length of the GMST of k descriptors in a neighbourhood graph.
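Of the four estimators listed, the eigenvalue estimate is the simplest to sketch. The version below is an assumption-laden illustration: it uses the eigenvalues of the sample covariance matrix, normalizes them by the largest eigenvalue, and counts those above a threshold. The threshold value, the normalization and the name `eigenvalue_intrinsic_dim` are hypothetical choices, not details given in the slides.

```python
import numpy as np

def eigenvalue_intrinsic_dim(Z, threshold=0.025):
    """Count covariance eigenvalues whose normalized size exceeds `threshold`.

    Z: (n, d) array of feature vectors. `threshold` is an assumed value.
    """
    Zc = Z - Z.mean(axis=0)
    cov = np.cov(Zc, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)        # symmetric matrix, real eigenvalues
    eigvals = np.clip(eigvals, 0.0, None)    # remove tiny negative round-off
    normalized = eigvals / eigvals.max()
    return int(np.sum(normalized > threshold))

# Synthetic check: 3 latent dimensions embedded linearly in 128 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 128))
Z_demo = latent @ mixing + 1e-3 * rng.normal(size=(500, 128))
p_hat = eigenvalue_intrinsic_dim(Z_demo)
print(p_hat)
```

This linear estimate can overstate the dimensionality of curved manifolds, which is one reason to compare it with the neighbourhood-based estimators in the list.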
Intrinsic Dimensionality
Estimated Intrinsic Dimensionality
Intrinsic Dimensionality
Subspace Embedding Methods
Global Methods
Principal Components
Multi-Dimensional Scaling
Stochastic Proximity Embedding
Isomap
Diffusion Maps
Local Methods
Locally Linear Embedding
Locality Preserving Projection
Neighbourhood Preserving Projection
Landmark Isomap
t-Stochastic Neighbourhood Embedding
Entropic Measure
Entropy Measure Intuition
Figure: Three example data sets plotted in 3-D (axes X, Y, Z): 'swiss' synthetic data, 'intersect' synthetic data, and 'VOC2006,car' data.
Figure: Distribution of pair-wise distances in data (x-axis: bin index; y-axis: normalized frequency). Entropy H per data set: swiss, H = −25.3355; intersect, H = −19.3150; VOC2006,car, H = −33.0302.
Empirical Results
Comparison of Embedded Entropy
Empirical Results
Computational Time Complexity
Empirical Results
Classification Performance
Empirical Results
Conclusion
The estimated intrinsic dimensionality of the 128-dimensional descriptor was approximately 14.
The performance of LPP in comparison to other embedding methods highlights the importance of modelling structure in local distributions.
Fuzzy Visual Model
Structure in distribution of descriptors in feature space?
Issues with K-means clustering in the Bag-of-Words model.
Visual model incorporating Fuzzy logic framework.
Visual Ambiguity
Descriptor assignment has issues of uncertainty and
plausibility.
Kernel Codebook uses soft-assignment to resolve the
ambiguity.
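Soft assignment in a kernel codebook can be sketched as follows. This is a minimal illustration that assumes a Gaussian kernel on Euclidean distance with a single bandwidth `sigma`; the function name `kernel_codebook_encode`, the bandwidth and the toy data are hypothetical and not taken from the slides.

```python
import numpy as np

def kernel_codebook_encode(descriptors, dictionary, sigma=1.0):
    """Soft-assign each descriptor to all visual words with Gaussian weights.

    Returns the per-descriptor weights (n, k) and the image encoding (k,),
    which is the mean of the weights over descriptors.
    """
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    # Subtract the row minimum before exponentiating for numerical stability;
    # the shift cancels in the normalization below.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights, weights.mean(axis=0)

toy_words = np.array([[0.0, 0.0], [2.0, 0.0]])
# First descriptor is equidistant from both words; second sits on word 0.
toy_points = np.array([[1.0, 0.0], [0.0, 0.0]])
soft_weights, soft_encoding = kernel_codebook_encode(toy_points, toy_words, sigma=0.5)
print(soft_weights)
```

The equidistant descriptor splits its vote evenly, which is the ambiguity that a hard assignment would hide by picking one word arbitrarily.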
Fuzzy Models
Visual Dictionary
Figure: Three partitions of the Motorcycle data (x-axis: time, normalized scale; y-axis: acceleration, normalized scale): K-means hard partition, Fuzzy K-means partition, and Gustafson-Kessel fuzzy partition.
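K-means:
\[ L(Z; \mu_C) = \sum_{j=1}^{r} \sum_{i \in C_j} \left\| z_i - \mu_{C_j} \right\|^2 \]
Fuzzy K-means:
\[ L(Z; D, A) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \left\| z_j - \mu_{C_i} \right\|^2_{\Sigma} \]
Gustafson-Kessel:
\[ L(Z; D, A, \{\Sigma_i\}) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \left\| z_j - d_i \right\|^2_{\Sigma_i} \]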
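The fuzzy K-means objective can be minimized by alternating between prototype and membership updates. The sketch below assumes the standard fuzzy c-means update rules with a plain Euclidean norm (Σ taken as the identity), random initial memberships and a fixed iteration count; the function name `fuzzy_kmeans` and all parameter defaults are hypothetical and not taken from the slides.

```python
import numpy as np

def fuzzy_kmeans(Z, r, m=2.0, n_iter=50, seed=0):
    """Alternating minimization of sum_i sum_j (alpha_ij)^m ||z_j - d_i||^2.

    Z: (n, p) feature vectors; r: number of prototypes; m > 1: fuzzifier.
    Returns prototypes D (r, p), memberships A (r, n) and the final loss.
    """
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    A = rng.random((r, n))
    A /= A.sum(axis=0, keepdims=True)          # memberships of each z_j sum to 1
    for _ in range(n_iter):
        W = A ** m
        # Prototype update: membership-weighted mean of the feature vectors.
        D = (W @ Z) / W.sum(axis=1, keepdims=True)
        d2 = ((Z[None, :, :] - D[:, None, :]) ** 2).sum(axis=2)   # (r, n)
        d2 = np.maximum(d2, 1e-12)             # avoid division by zero
        # Membership update: alpha_ij proportional to d2_ij^(-1/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        A = inv / inv.sum(axis=0, keepdims=True)
    loss = float(((A ** m) * d2).sum())
    return D, A, loss

rng_demo = np.random.default_rng(1)
Z_blobs = np.vstack([rng_demo.normal(0.0, 0.5, size=(20, 2)),
                     rng_demo.normal(10.0, 0.5, size=(20, 2))])
D_fkm, A_fkm, loss_fkm = fuzzy_kmeans(Z_blobs, r=2)
print(D_fkm)
```

Unlike the K-means partition, every feature vector keeps a graded membership in every prototype, so vectors near a cluster boundary contribute to both neighbouring prototypes.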
Fuzzy Models
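\[ d^2_{\Sigma}(z, \mu_C) = (z - \mu_C)^T \, \Sigma \, (z - \mu_C) \]
\[ \Sigma = \begin{pmatrix} (1/\sigma_1)^2 & 0 & \cdots & 0 \\ 0 & (1/\sigma_2)^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (1/\sigma_n)^2 \end{pmatrix} \]
\[ d^2_{\Sigma_i}(z_j, \mu_{C_i}) = (z_j - \mu_{C_i})^T \, \Sigma_i \, (z_j - \mu_{C_i}) \]
\[ F_i = \frac{\sum_{j=1}^{n} (\alpha_{ij})^m (z_j - d_i)(z_j - d_i)^T}{\sum_{j=1}^{n} (\alpha_{ij})^m} \]
\[ \Sigma_i = \left( \rho_i \det(F_i) \right)^{1/p} F_i^{-1} \]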
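The per-cluster norm used by Gustafson-Kessel can be sketched as follows. The code computes the fuzzy covariance F_i and then the norm-inducing matrix in its standard Gustafson-Kessel form, which uses the inverse of F_i scaled so that its determinant equals ρ_i. The function names, the default ρ_i = 1 and the toy data are hypothetical and not taken from the slides.

```python
import numpy as np

def gk_norm_matrix(Z, d_i, alpha_i, m=2.0, rho_i=1.0):
    """Fuzzy covariance F_i and norm-inducing matrix Sigma_i for one cluster.

    Z: (n, p) feature vectors; d_i: (p,) prototype; alpha_i: (n,) memberships.
    """
    w = alpha_i ** m
    diff = Z - d_i
    # Membership-weighted scatter of the vectors around the prototype.
    F = (w[:, None] * diff).T @ diff / w.sum()
    p = Z.shape[1]
    # Scale the inverse covariance so that det(Sigma) equals rho_i.
    Sigma = (rho_i * np.linalg.det(F)) ** (1.0 / p) * np.linalg.inv(F)
    return F, Sigma

def gk_sq_distance(z, d_i, Sigma):
    """Squared distance (z - d_i)^T Sigma (z - d_i)."""
    diff = z - d_i
    return float(diff @ Sigma @ diff)

rng_gk = np.random.default_rng(0)
Z_gk = rng_gk.normal(size=(200, 2)) * np.array([3.0, 0.5])   # elongated cluster
d_gk = Z_gk.mean(axis=0)
F_gk, Sigma_gk = gk_norm_matrix(Z_gk, d_gk, np.ones(200))
print(np.linalg.det(Sigma_gk))
```

Fixing the determinant keeps the cluster volume constant while letting its shape and orientation follow the local distribution of feature vectors, which is what distinguishes this norm from the single shared norm of fuzzy K-means.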
Empirical Results
FKM Classification Performance
Figure: Per-category classification accuracy (Acc) of Bag-of-Words and Fuzzy K-means on Scene15 and VOC2006 (x-axis: visual category).
Empirical Results
GK Classification Performance
Figure: Per-category classification accuracy (Acc) of Bag-of-Words and Gustafson-Kessel on Scene15 and VOC2006 (x-axis: visual category).
Empirical Results
Dictionary Size
Figure: Comparison of BoW with FKM and GK for different sizes of dictionary on Caltech101 (x-axis: dictionary size, 32 to 512; y-axis: Acc). Panels: Bag-of-Words vs. Fuzzy K-means; Bag-of-Words vs. Gustafson-Kessel.
Empirical Results
Aggregate Performance
Figure: Aggregate classification accuracy (Acc) of Bag-of-Words, Fuzzy K-means and Gustafson-Kessel. (a) VOC datasets: VOC2006, VOC2010. (b) Caltech datasets: Caltech101, Caltech256.

Visual Model   VOC-2006   VOC-2010   Caltech-101   Caltech-256
BoW            0.50825    0.52446    0.60111       0.67606
FKM            0.52635    0.53736    0.61928       0.68357
G-K            0.52885    0.54224    0.62413       0.68623
Empirical Results
Conclusion
A visual model learnt within the framework of fuzzy logic adapts to the local distribution of feature vectors.
Learning a better fuzzy membership function is an effective alternative to learning increasingly large dictionaries to adapt to the increasing complexity of visual categories.
Co-clustering for Structure Estimation
What is co-clustering?
Co-clustering for structure in descriptor data matrix.
Co-clustering for structure in encoded feature matrix.
Co-clustering Methods
Co-clustering
Co-clustering is the simultaneous clustering of the rows and columns of a data matrix, alternating between the two.
At each step of the optimization routine, the groups of rows guide column clustering and vice versa.
\[ C_X : \{x_1, \ldots, x_m\} \to \{\hat{x}_1, \ldots, \hat{x}_k\} \]
\[ C_Y : \{y_1, \ldots, y_n\} \to \{\hat{y}_1, \ldots, \hat{y}_l\} \]
Co-clustering Methods
Co-clustering methods
Information-Theoretic Co-Clustering
Data matrix is considered a joint probability distribution.
Minimizes KL-divergence between original data and co-clustered
matrices.
Sum-Squared Residue Co-Clustering
Alternating k-means clustering of rows and columns. Minimizes the squared Euclidean distance of rows and columns from their respective row and column cluster means.
Co-clustering Methods
Information-Theoretic Co-clustering
\[ I(X; Y) - I(\hat{X}; \hat{Y}) = d_{KL}\left( p(X, Y), q(X, Y) \right) \]
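The left-hand side of this identity, the loss in mutual information caused by a given co-clustering, can be computed directly. The sketch below treats a non-negative data matrix as a joint distribution and takes the row and column cluster labels as given; it does not implement the ITCC optimization itself. All function names and the toy matrix are hypothetical and not taken from the slides.

```python
import numpy as np

def mutual_information(P):
    """Mutual information (in nats) of a joint distribution given as a matrix."""
    P = P / P.sum()
    px = P.sum(axis=1, keepdims=True)      # row marginal, shape (m, 1)
    py = P.sum(axis=0, keepdims=True)      # column marginal, shape (1, n)
    mask = P > 0                           # 0 * log 0 is taken as 0
    return float((P[mask] * np.log(P[mask] / (px @ py)[mask])).sum())

def coclustered_joint(P, row_labels, col_labels, k, l):
    """Joint distribution of the cluster variables: sum P over each co-cluster."""
    P = P / P.sum()
    R = np.eye(k)[row_labels]              # (m, k) one-hot row assignments
    C = np.eye(l)[col_labels]              # (n, l) one-hot column assignments
    return R.T @ P @ C

def itcc_loss(P, row_labels, col_labels, k, l):
    """I(X;Y) - I(X_hat;Y_hat) for the given co-clustering."""
    return mutual_information(P) - mutual_information(
        coclustered_joint(P, row_labels, col_labels, k, l))

# Toy 4x4 matrix with two clean diagonal blocks.
P_toy = np.array([[1.0, 1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0]])
loss_good = itcc_loss(P_toy, np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1]), 2, 2)
loss_bad = itcc_loss(P_toy, np.array([0, 1, 0, 1]), np.array([0, 0, 1, 1]), 2, 2)
print(loss_good, loss_bad)
```

A co-clustering that matches the block structure loses no mutual information, while one that mixes the blocks loses all of it, so a smaller loss indicates a grouping that better preserves the dependence between rows and columns.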
Multiple Sub-spaces
Multiple Sub-spaces Intuition
\[ \sum_{i,j} d_E\left( z^{\bullet}_{i|S_l}, z^{\bullet}_{j|S_q} \right) > \sum_{i,j} d_E\left( z^{\bullet}_i, z^{\bullet}_j \right), \quad l = q \]
Multiple Sub-spaces
Co-clustering descriptor data matrix
Figure: Scene-15 D-SIFT data matrix, 500 feature vectors of 128 dimensions, before and after Information-Theoretic Co-Clustering into 10 row and 10 column clusters (axes: feature vectors, dimensions).
Multiple Sub-spaces
Dictionary on single and multiple sub-spaces
Figure: Universal PCA Dictionary (VOC-2006, D-SIFT, 10 x 500, PCA + Kmeans) and Universal CC Dictionary (VOC-2006, D-SIFT, 10 x 500, SSRCC + Kmeans); axes: dictionary [500], dimensions [10].
Multiple Sub-spaces
Classification performance
Figure: Comparison of classification performance (F1) of single and multiple sub-space dictionaries on VOC2006 and VOC2007. First panel: Dict 10x1000, MSSD:(i) 5x1000, MSSD:(r) 5x1000. Second panel: Dict 10x1000, MSSD:(i) 10x1000, MSSD:(r) 10x1000.
Multiple Sub-spaces
Dictionary projected to multiple sub-spaces
Figure: Universal Dictionary (VOC-2006, D-SIFT, 128 x 500, Kmeans) and Universal Submanifold Dictionary (VOC-2006, D-SIFT, 128 (10) x 500, SSRCC + Kmeans); axes: dictionary [500], dimensions [128] grouped into submanifolds [10].
Multiple Sub-spaces
Classification performance
Figure: Comparison of classification performance of a dictionary projected to multiple sub-spaces on VOC2006 and VOC2007. First panel (F1(5)) and second panel (F1(50)): Dict 128x1000, SSSD:(i) 128x1000, SSSD:(r) 128x1000.
Topic Dictionary
Structure in Dictionary Intuition
Estimating groups of non-contiguous partitions of feature space
that are semantically related.
Topic Dictionary
Topic Dictionary Concept
Topic Dictionary
Classification Performance
Comparison of classification performance of dictionaries using BoW
and ITCC, for VOC2006 and Scene15 datasets.
Topic Dictionary
Dictionary sizes
Figure: Comparative classification performance (F1) of BoW and CC:i for dictionary sizes 100, 500 and 1000 on VOC2006, VOC2007, VOC2010, Scene15 and Caltech101.
Topic Dictionary
Conclusion
Groups of sub-spaces computed using co-clustering yielded
dictionaries with better classification performance.
Groups of feature space partitions (dictionary elements) yielded improved classification results.
These estimated groups can be used in learning a semantically
structured visual model.
Sparse Decomposition
Sparse Visual Model
Sparse model approximates a feature vector as a combination
of a sub-set of an over-complete basis set.
Sparsity is induced by adding a regularization constraint on the coefficients to the loss function.
Degree of sparsity is determined empirically.
Each basis element is considered individually.
Possible structure amongst basis elements is disregarded.
Structured Sparse Model
SSPCA (structure in sub-spaces)
Co-clustered groups of sub-spaces are used to augment Sparse-PCA to compute a Structured Sparse-PCA dictionary.
Group Lasso (structure in dictionary)
Co-clustered groups of dictionary elements are used to augment Lasso to compute a group Lasso feature encoding.
Sparse Regularization
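Sparse regularization:
\[ \min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} L(z_i, D\alpha_i) + \lambda \Omega(\alpha) \]
Lasso:
\[ \min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \left\| z_i - D\alpha_i \right\|^2 + \lambda \left\| \alpha_i \right\|_1 \]
Group Sparsity:
\[ \min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \left\| z_i - D\alpha_i \right\|^2 + \lambda \sum_{j=1}^{k} \left\| \alpha_i \right\|_{G_j} \]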
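The difference between the Lasso and group sparsity penalties is easiest to see in their proximal operators, which are the shrinkage steps used inside many sparse solvers. The sketch below assumes that the group norm is the Euclidean norm of the coefficients in each group and that groups do not overlap; the slides do not state either point. The function names and the toy coefficients are hypothetical.

```python
import numpy as np

def soft_threshold(alpha, lam):
    """Proximal operator of lam * ||alpha||_1: shrink each coefficient on its own."""
    return np.sign(alpha) * np.maximum(np.abs(alpha) - lam, 0.0)

def group_soft_threshold(alpha, groups, lam):
    """Proximal operator of lam * sum_j ||alpha[G_j]||_2 for disjoint groups.

    Each group is kept (and shrunk) or discarded as a whole.
    """
    out = np.zeros_like(alpha)
    for idx in groups:
        norm = np.linalg.norm(alpha[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * alpha[idx]
    return out

# Toy coefficients: one strong group and one weak group of dictionary elements.
alpha_toy = np.array([3.0, 4.0, 0.1, -0.2])
groups_toy = [[0, 1], [2, 3]]
alpha_lasso = soft_threshold(alpha_toy, lam=1.0)
alpha_group = group_soft_threshold(alpha_toy, groups_toy, lam=1.0)
print(alpha_lasso, alpha_group)
```

With the Lasso step each dictionary element survives or vanishes individually, whereas the group step selects or drops whole groups, which is how co-clustered groups of dictionary elements can enter the encoding.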
Structured Sub-space
Structured Sub-space Dictionary using ITCC
Figure: Per-category mAP of Sparse Subspace and Structured Subspace dictionaries on VOC2006 and VOC2007 (x-axis: visual category).
Structured Sub-space
Structured Sub-space Dictionary using SSRCC
Figure: Per-category mAP of Sparse Subspace and Structured Subspace dictionaries on VOC2006 and VOC2007 (x-axis: visual category).
Structured Sub-space
Data Set   Sparse Subspace   Structured Sparse Subspace (ITCC)   Structured Sparse Subspace (SSRCC)
VOC2006    67.5941           70.8295                             68.5808
VOC2007    67.9971           68.0783                             68.3718
Sparse selection of semantically related set of sub-spaces
performs better than sparse individual selection of sub-spaces.
Structured Sparse Dictionary
Structured Sparse Encoding using ITCC
Figure: Per-category mAP of Sparse Encoding and Structured Encoding using ITCC on Scene15 and VOC2006 (x-axis: visual category).
Structured Sparse Dictionary
Structured Sparse Encoding using SSRCC
Figure: Per-category mAP of Sparse Encoding and Structured Encoding using SSRCC on Scene15 and VOC2006 (x-axis: visual category).
Structured Sparse Dictionary
Data Set   Sparse Encoding   Structured Sparse Encoding (ITCC)   Structured Sparse Encoding (SSRCC)
VOC-2006   72.8386           73.3977                             72.7738
Scene-15   68.5737           79.8794                             72.1155
Sparse selection of semantically related sets of dictionary elements performs better than sparse selection of individual dictionary elements.
Summary
Semantically relevant structure learnt in feature space was used to compute better visual models.
Analysis of sub-space embedding emphasized modelling local
distributions.
Incorporation of fuzzy logic framework to learn dictionary
kernels that adapt to local distributions yielded better visual
models.
Co-clustering was successful in grouping semantically related
sub-spaces and feature space partitions.
Estimated groups of sub-spaces and dictionary elements were
used to compute structured sparse visual models, improving
upon regular sparse models.
Future Work
Future Work
Visual models using Fisher Kernel coding, which uses a Gaussian kernel, have been very successful. Combining the approach in Fisher Kernels with the learnt fuzzy membership functions could potentially improve the visual model.
Fuzzy logic based learning algorithms that are more advanced
than Gustafson-Kessel could be explored to learn better
membership functions.
Co-clustering creates a block factorization of the data matrix.
Partial membership of rows and columns to the co-clusters
would be the natural next step.
Explore ways of using semantic structure to improve feature
generation techniques like hierarchical models that aim to
learn category specific descriptors.
Future Work
End
Questions...
Appendices
BoW Partitioning
Figure: Bag-of-Words partition for image '000017' in the VOC-2006 dataset (axes x, y). The dictionary of size 25 is computed using K-means clustering. The feature vectors are projected to 2 dimensions using PCA.
Appendices
FKM Partitioning
Figure: Fuzzy K-means fuzzy partition for image '000017' in the VOC-2006 dataset (axes x, y).
Appendices
GK Partitioning
Figure: Gustafson-Kessel fuzzy partition for image '000017' in the VOC-2006 dataset (axes x, y).
Ashish Gupta University of Surrey
Learning A Structured Model For Visual Category Recognition

Weitere ähnliche Inhalte

Ähnlich wie Learning a structured model for visual category recognition

Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object TrackingIntegrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object Trackingijsrd.com
 
IRJET- Recognition of OPS using Google Street View Images
IRJET-  	  Recognition of OPS using Google Street View ImagesIRJET-  	  Recognition of OPS using Google Street View Images
IRJET- Recognition of OPS using Google Street View ImagesIRJET Journal
 
Vertical Image Search Engine
 Vertical Image Search Engine Vertical Image Search Engine
Vertical Image Search Engineshivam_kedia
 
Object Capturing In A Cluttered Scene By Using Point Feature Matching
Object Capturing In A Cluttered Scene By Using Point Feature MatchingObject Capturing In A Cluttered Scene By Using Point Feature Matching
Object Capturing In A Cluttered Scene By Using Point Feature MatchingIJERA Editor
 
Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Jill Lyons
 
Image Retrieval using Graph based Visual Saliency
Image Retrieval using Graph based Visual SaliencyImage Retrieval using Graph based Visual Saliency
Image Retrieval using Graph based Visual SaliencyIRJET Journal
 
Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)PetteriTeikariPhD
 
Fisheye-Omnidirectional View in Autonomous Driving III
Fisheye-Omnidirectional View in Autonomous Driving IIIFisheye-Omnidirectional View in Autonomous Driving III
Fisheye-Omnidirectional View in Autonomous Driving IIIYu Huang
 
Image Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionImage Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionPetteriTeikariPhD
 
IRJET- Semantic Retrieval of Trademarks based on Text and Images Conceptu...
IRJET-  	  Semantic Retrieval of Trademarks based on Text and Images Conceptu...IRJET-  	  Semantic Retrieval of Trademarks based on Text and Images Conceptu...
IRJET- Semantic Retrieval of Trademarks based on Text and Images Conceptu...IRJET Journal
 
A Convolutional Neural Network approach for Signature verification
A Convolutional Neural Network approach for Signature verificationA Convolutional Neural Network approach for Signature verification
Learning a structured model for visual category recognition

  • 1. Learning A Structured Model For Visual Category Recognition. Ashish Gupta, University of Surrey, a.gupta@surrey.ac.uk. July 5, 2013. Sections: Introduction, Sub-space Embedding, Fuzzy Visual Model, Structure Estimation, Structured Sparse Model, Summary.
  • 2. Introduction. Introduction: what is category recognition? Feature vector embedding: information in a sub-manifold. Feature vector distribution: fuzzy visual model. Estimating semantic structure: co-clustering. Sparse models: semantically structured. Summary and future work.
  • 3. Motivation: Visual Category? A robot interacts with physical objects. Object taxonomy is based on physical properties. The robot recognizes an object using its visual appearance.
  • 4. Motivation: Visual Category Model. Appearance variation → scatter of semantically related descriptors in feature space. Can this scatter distribution be estimated? Can this structure be used to improve the learnt visual model? Visual category model ≈ visual object model + estimated structure of visual category variation.
  • 5. Approach: Visual Classification Pipeline. Structure in sub-spaces → groups of sub-spaces → dictionary. Structure in dictionary → groups of prototypes → encoding.
  • 6. Approach: Feature Descriptor Matrix. [Figure: Scene-15 D-SIFT descriptor matrix; axes: feature vectors × dimensions; colour scale 0 to 250.] Matrix of 500 D-SIFT feature descriptors, each of 128 dimensions.
  • 7. Approach: Encoded Feature Matrix. Conceptual illustration of the encoded feature matrix: an occurrence histogram of visual words in images.
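
A minimal sketch of how such an encoded feature matrix might be built, assuming a dictionary of visual words is already available and each descriptor is hard-assigned to its nearest word. The function name `encode_images` and the toy data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def encode_images(descriptor_sets, dictionary):
    """Return an (images x visual words) occurrence-histogram matrix.

    descriptor_sets: list of (n_i, d) arrays, one per image (hypothetical input).
    dictionary: (k, d) array of visual words.
    """
    k = dictionary.shape[0]
    encoded = np.zeros((len(descriptor_sets), k))
    for row, descriptors in enumerate(descriptor_sets):
        # Squared Euclidean distance from every descriptor to every visual word.
        d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # Count how often each visual word is the nearest one in this image.
        encoded[row] = np.bincount(nearest, minlength=k)
    return encoded

# Toy example: two visual words, two images.
dictionary = np.array([[0.0, 0.0], [10.0, 10.0]])
images = [np.array([[0.1, 0.2], [9.0, 9.5], [0.3, -0.1]]),
          np.array([[10.2, 9.9]])]
histograms = encode_images(images, dictionary)
```

Each row of `histograms` describes one image and each column one visual word, which is the matrix layout the later structure-estimation slides operate on.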
  • 8. Approach: Conceptual Interpretation. Structure estimation can be interpreted as the estimation of semantically related rows or columns of the data matrix. These are projected to a lower-dimensional space such that the mutual separation between equivalent feature vectors is reduced.
  • 9. Sub-space Embedding. The feature descriptor space is high dimensional. Relevant information is embedded in a lower-dimensional sub-manifold. What is the appropriate lower dimensionality? How can the efficacy of a sub-space embedding method be measured? By measuring the information in the embedded feature vectors.
  • 10. Intrinsic Dimensionality: estimation of the intrinsic dimensionality $p$. Correlation Dimension: the number of feature vectors in a hypersphere of radius $r$ is proportional to $r^p$. Maximum Likelihood Estimate: expectation of the number of feature vectors covered by a hypersphere of growing radius $r$. Eigenvalue Estimate: the number of eigenvalues greater than a small threshold value. Geodesic Minimum Spanning Tree: based on the length of the GMST of $k$ descriptors in a neighbourhood graph.
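
As an illustration of the eigenvalue estimate only, the sketch below counts the covariance eigenvalues that exceed a small threshold after normalizing by the largest eigenvalue. The normalization, the default threshold of `1e-3`, and the synthetic data are assumptions made for this example; the slides do not state the values used.

```python
import numpy as np

def eigenvalue_intrinsic_dim(features, threshold=1e-3):
    """Count covariance eigenvalues whose normalized size exceeds `threshold`."""
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (len(features) - 1)
    eigvals = np.linalg.eigvalsh(cov)   # symmetric matrix, so eigenvalues are real
    eigvals = eigvals / eigvals.max()   # assumed normalization by the largest one
    return int((eigvals > threshold).sum())

# Synthetic check: 3-dimensional data linearly embedded in 20 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
features = latent @ mixing + 1e-4 * rng.normal(size=(500, 20))
estimated_dim = eigenvalue_intrinsic_dim(features)
```

This estimate only sees linear structure; the correlation-dimension, maximum-likelihood and GMST estimators listed above are the ones that can account for a curved sub-manifold.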
  • 11. Intrinsic Dimensionality: Estimated Intrinsic Dimensionality.
  • 12. Intrinsic Dimensionality: Subspace Embedding Methods. Global methods: Principal Components, Multi-Dimensional Scaling, Stochastic Proximity Embedding, Isomap, Diffusion Maps. Local methods: Locally Linear Embedding, Locality Preserving Projection, Neighbourhood Preserving Projection, Landmark Isomap, t-Stochastic Neighbourhood Embedding.
  • 13. Entropic Measure: Entropy Measure Intuition. [Figure: 3-D scatter plots (axes X, Y, Z) of the 'swiss' synthetic data, the 'intersect' synthetic data and the 'VOC2006, car' data; and the distribution of pair-wise distances in the data (normalized frequency vs. bin index).] Reported values: swiss, H = -25.3355; intersect, H = -19.3150; VOC2006, car, H = -33.0302.
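
The slide does not define how H is computed from the distribution of pair-wise distances. The sketch below is therefore only an assumption: it takes the plain Shannon entropy (in bits) of the normalized histogram of pair-wise Euclidean distances, which is non-negative and will not reproduce the H values reported on the slide. The function name and the bin count of 100 are illustrative.

```python
import numpy as np

def pairwise_distance_entropy(points, bins=100):
    """Shannon entropy (bits) of the histogram of pair-wise Euclidean distances."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Keep each unordered pair once (strict upper triangle).
    upper = dists[np.triu_indices(len(points), k=1)]
    counts, _ = np.histogram(upper, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                        # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())

# A 3-4-5 right triangle has three distinct pair-wise distances.
triangle = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
h_triangle = pairwise_distance_entropy(triangle)
```

Under this definition, data whose pair-wise distances concentrate in a few bins score low, and data with widely spread distances score high.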
  • 14. Empirical Results: Comparison of Embedded Entropy.
  • 15. Empirical Results: Computational Time Complexity.
  • 16. Empirical Results: Classification Performance.
  • 17. Empirical Results: Conclusion. The estimated intrinsic dimensionality was in the neighbourhood of 14 for the 128-dimensional descriptor. The performance of LPP in comparison to other embedding methods accentuates the importance of modelling structure in local distributions.
  • 18. Fuzzy Visual Model. Structure in the distribution of descriptors in feature space? Issues with K-means clustering in the Bag-of-Words model. Visual model incorporating a fuzzy logic framework.
  • 19. Visual Ambiguity. Descriptor assignment has issues of uncertainty and plausibility. Kernel Codebook uses soft-assignment to resolve the ambiguity.
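
A minimal sketch of soft assignment in the spirit of a kernel codebook, assuming a Gaussian kernel on the Euclidean distance between a descriptor and each visual word. The function name `soft_assign` and the bandwidth `sigma` are illustrative assumptions; the slides do not give the kernel or its parameters.

```python
import numpy as np

def soft_assign(descriptors, dictionary, sigma=1.0):
    """Return (n_descriptors, n_words) weights; each row sums to one."""
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    # Shifting by the row minimum leaves the normalized weights unchanged
    # and avoids underflow for descriptors far from every word.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    return weights / weights.sum(axis=1, keepdims=True)

dictionary = np.array([[0.0, 0.0], [2.0, 0.0]])
descriptors = np.array([[1.0, 0.0],    # equidistant from both words
                        [0.0, 0.0]])   # sits exactly on the first word
weights = soft_assign(descriptors, dictionary)
```

An ambiguous descriptor, such as the first one here, spreads its weight over several words instead of being forced onto a single one.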
  • 20. Fuzzy Models: Visual Dictionary. [Figure: three partitions of the Motorcycle Data: K-means hard partition, fuzzy K-means partition and Gustafson-Kessel fuzzy partition; axes: time (normalized scale) vs. acceleration (normalized scale).] Objective functions, in the same order. K-means: $L(Z; \mu_C) = \sum_{j=1}^{r} \sum_{i \in C_j} \| z_i - \mu_{C_j} \|^2$. Fuzzy K-means: $L(Z; D, A) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \| z_j - \mu_{C_i} \|^2_{\Sigma}$. Gustafson-Kessel: $L(Z; D, A, \{\Sigma_i\}) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \| z_j - d_i \|^2_{\Sigma_i}$.
  • 21. Fuzzy Models. Fuzzy K-means distance: $d^2_{\Sigma}(z, \mu_C) = (z - \mu_C)^T \Sigma (z - \mu_C)$, with the diagonal norm matrix $\Sigma = \mathrm{diag}\big((1/\sigma_1)^2, (1/\sigma_2)^2, \ldots, (1/\sigma_n)^2\big)$. Gustafson-Kessel distance: $d^2_{\Sigma_i}(z_j, \mu_{C_i}) = (z_j - \mu_{C_i})^T \Sigma_i (z_j - \mu_{C_i})$, with the fuzzy covariance $F_i = \frac{\sum_{j=1}^{n} (\alpha_{ij})^m (z_j - d_i)(z_j - d_i)^T}{\sum_{j=1}^{n} (\alpha_{ij})^m}$ and the cluster-specific norm matrix $\Sigma_i = (\rho_i \det(F_i))^{1/p} F_i^{-1}$.
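
The quantities above can be sketched in a few lines, under stated assumptions: the membership update is the standard fuzzy K-means rule, which the slides do not spell out, and the norm matrix follows the standard Gustafson-Kessel formulation, in which it is built from the inverse of the fuzzy covariance $F_i$. Function names and the toy data are illustrative.

```python
import numpy as np

def memberships(d2, m=2.0):
    """Memberships alpha[i, j] of descriptor j in cluster i from squared distances d2[i, j]."""
    d2 = np.maximum(d2, 1e-12)                      # guard against division by zero
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)     # each column sums to one

def gk_norm_matrix(Z, d_i, alpha_i, m=2.0, rho_i=1.0):
    """Cluster-specific norm matrix (rho_i * det(F_i))^(1/p) * inv(F_i)."""
    w = alpha_i ** m
    diff = Z - d_i
    # Fuzzy covariance F_i: membership-weighted average of outer products.
    F = (w[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0) / w.sum()
    p = Z.shape[1]
    return (rho_i * np.linalg.det(F)) ** (1.0 / p) * np.linalg.inv(F)

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2)) * np.array([3.0, 0.5])   # elongated toy cluster
alpha = np.full((2, 50), 0.5)                         # uniform memberships, 2 clusters
Sigma_0 = gk_norm_matrix(Z, Z.mean(axis=0), alpha[0])
alpha_demo = memberships(np.array([[1.0, 4.0], [1.0, 1.0]]))
```

With this scaling the determinant of each norm matrix equals `rho_i`, so a cluster can change its shape to follow the local distribution of descriptors but not its volume.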
  • 22. Empirical Results: FKM Classification Performance. [Figure: per-category accuracy (Acc vs. visual category) of Bag-of-Words and Fuzzy K-means on Scene15 and on VOC2006.]
  • 23. Empirical Results: GK Classification Performance. [Figure: per-category accuracy (Acc vs. visual category) of Bag-of-Words and Gustafson-Kessel on Scene15 and on VOC2006.]
  • 24. Empirical Results: Dictionary Size. [Figure: accuracy on Caltech101 vs. dictionary size (32, 64, 128, 256, 512) for Bag-of-Words against Fuzzy K-means, and for Bag-of-Words against Gustafson-Kessel.] Comparison of BoW with FKM and GK for different sizes of dictionary.
  • 25. Empirical Results: Aggregate Performance. [Figure: accuracy of Bag-of-Words, Fuzzy K-means and Gustafson-Kessel on (a) the VOC datasets (VOC2006, VOC2010) and (b) the Caltech datasets (Caltech101, Caltech256).] Accuracy by visual model and data set:
      Visual model   VOC-2006   VOC-2010   Caltech-101   Caltech-256
      BoW            0.50825    0.52446    0.60111       0.67606
      FKM            0.52635    0.53736    0.61928       0.68357
      G-K            0.52885    0.54224    0.62413       0.68623
  • 26. Empirical Results: Conclusion. A visual model learnt within the framework of fuzzy logic adapts to the local distribution of feature vectors. Learning a better fuzzy membership function is an effective alternative to learning increasingly large dictionaries to adapt to the increasing complexity of visual categories.
  • 27. Co-clustering for Structure Estimation. What is co-clustering? Co-clustering for structure in the descriptor data matrix. Co-clustering for structure in the encoded feature matrix.
  • 28. Co-clustering Methods: Co-clustering. Co-clustering is the simultaneous, alternating clustering of the rows and columns of a data matrix. At each step of the optimization routine, the groups of rows guide the column clustering and vice versa. $C_X : \{x_1, \ldots, x_m\} \to \{\hat{x}_1, \ldots, \hat{x}_k\}$, $C_Y : \{y_1, \ldots, y_n\} \to \{\hat{y}_1, \ldots, \hat{y}_l\}$.
  • 29. Co-clustering Methods: Co-clustering methods. Information-Theoretic Co-Clustering: the data matrix is considered a joint probability distribution; the method minimizes the KL-divergence between the original data matrix and the co-clustered matrix. Sum-Squared Residue Co-Clustering: alternating k-means clustering of rows and columns; the method minimizes the squared Euclidean distance of rows and columns from the row and column means respectively.
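
To make the sum-squared residue objective concrete, the sketch below scores a given row and column partition by the squared deviation of every entry from the mean of its co-cluster block. This block-mean residue is one of the residue definitions used in the co-clustering literature and is chosen here as an assumption; the slides do not say which variant was used. The function name and the toy matrix are illustrative.

```python
import numpy as np

def sum_squared_residue(A, row_labels, col_labels):
    """Sum of squared deviations of each entry from its co-cluster block mean."""
    residue = np.empty_like(A, dtype=float)
    for I in np.unique(row_labels):
        for J in np.unique(col_labels):
            block = np.ix_(row_labels == I, col_labels == J)
            residue[block] = A[block] - A[block].mean()
    return float((residue ** 2).sum())

A = np.array([[1.0, 1.0, 5.0],
              [1.0, 1.0, 5.0],
              [3.0, 3.0, 7.0]])
cols = np.array([0, 0, 1])
good = sum_squared_residue(A, np.array([0, 0, 1]), cols)   # rows grouped correctly
bad = sum_squared_residue(A, np.array([0, 1, 1]), cols)    # rows 2 and 3 merged
```

An alternating scheme would reassign rows with the column clusters fixed, then columns with the row clusters fixed, accepting moves that lower this score.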
  • 30. Co-clustering Methods: Information-Theoretic Co-clustering. $I(X; Y) - I(\hat{X}; \hat{Y}) = d_{KL}(p(X, Y), q(X, Y))$.
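
The identity can be checked numerically. The sketch assumes the usual information-theoretic co-clustering approximation $q(x, y) = p(\hat{x}, \hat{y})\, p(x \mid \hat{x})\, p(y \mid \hat{y})$, which the slide does not write out; function names, cluster labels and the random matrix are illustrative.

```python
import numpy as np

def mutual_information(p):
    """Mutual information (nats) of a joint distribution given as a matrix."""
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

def itcc_loss(p, row_labels, col_labels, k, l):
    """Return (KL(p || q), I(X;Y) - I(X_hat;Y_hat)) for a given co-clustering."""
    R = np.eye(k)[row_labels]            # (m, k) one-hot row-cluster indicators
    C = np.eye(l)[col_labels]            # (n, l) one-hot column-cluster indicators
    p_hat = R.T @ p @ C                  # joint distribution over co-clusters
    px, py = p.sum(axis=1), p.sum(axis=0)
    px_hat, py_hat = R.T @ px, C.T @ py
    # q(x, y) = p(x_hat, y_hat) * p(x | x_hat) * p(y | y_hat)
    q = (R @ p_hat @ C.T) \
        * (px / px_hat[row_labels])[:, None] \
        * (py / py_hat[col_labels])[None, :]
    mask = p > 0
    kl = float((p[mask] * np.log(p[mask] / q[mask])).sum())
    return kl, mutual_information(p) - mutual_information(p_hat)

rng = np.random.default_rng(0)
p = rng.random((6, 4))
p /= p.sum()                             # treat the data matrix as a joint distribution
rows = np.array([0, 0, 1, 1, 2, 2])
cols = np.array([0, 1, 0, 1])
kl, info_loss = itcc_loss(p, rows, cols, k=3, l=2)
```

The two returned values agree up to rounding, so minimizing the KL-divergence is the same as minimizing the mutual information lost by co-clustering.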
  • 31. Multiple Sub-spaces: Multiple Sub-spaces Intuition. $\sum_{i,j} d_E(z^{\bullet}_{i|S_l}, z^{\bullet}_{j|S_q}) > \sum_{i,j} d_E(z^{\bullet}_i, z^{\bullet}_j), \quad l \neq q$.
  • 32. Multiple Sub-spaces: Co-clustering descriptor data matrix. [Figure: Scene-15 D-SIFT, 500 feature vectors of 128 dimensions, and its Information-Theoretic Co-Clustering (500 × 128) into 10 row and 10 column clusters; axes: feature vectors × dimensions; colour scale 0 to 250.]
  • 33. Multiple Sub-spaces: Dictionary on single and multiple sub-spaces. [Figure: two panels for VOC-2006 D-SIFT, each 10 x 500, with axes "dictionary [500]" and "dimensions [10]". Panel 1: Universal PCA Dictionary (PCA + Kmeans). Panel 2: Universal CC Dictionary (SSRCC + Kmeans).]
  • 34. Multiple Sub-spaces: Classification performance. [Figure: two plots of F1 against data set (VOC2006, VOC2007). Plot 1 compares Dict: 10x1000, MSSD:(i): 5x1000 and MSSD:(r): 5x1000. Plot 2 compares Dict: 10x1000, MSSD:(i): 10x1000 and MSSD:(r): 10x1000.] Comparison of classification performance of single and multiple sub-space dictionaries.
  • 35. Multiple Sub-spaces: Dictionary projected to multiple sub-spaces. [Figure: two panels for VOC-2006 D-SIFT with axes "dictionary [500]" and "dimensions [128]". Panel 1: Universal Dictionary, 128x500, Kmeans. Panel 2: Universal Submanifold Dictionary, 128 (10) x 500, SSRCC + Kmeans, with 10 submanifolds.]
  • 36. Multiple Sub-spaces: Classification performance. [Figure: two plots against data set (VOC2006, VOC2007), one of F1(5) and one of F1(50). Each compares Dict: 128x1000, SSSD:(i): 128x1000 and SSSD:(r): 128x1000.] Comparison of classification performance of dictionary projected to multiple sub-spaces.
  • 37. Topic Dictionary: Structure in Dictionary. Intuition: estimating groups of non-contiguous partitions of feature space that are semantically related.
  • 38. Topic Dictionary: Topic Dictionary Concept.
  • 39. Topic Dictionary: Classification Performance. Comparison of classification performance of dictionaries using BoW and ITCC, for VOC2006 and Scene15 datasets.
  • 40. Topic Dictionary: Dictionary sizes. [Figure: three plots of F1 against data set (VOC2006, VOC2007, VOC2010, Scene15, Caltech101), comparing BoW and CC:i at dictionary sizes 100, 500 and 1000.] Comparative classification performance for different dictionary sizes.
  • 41. Topic Dictionary: Conclusion. Groups of sub-spaces computed using co-clustering yielded dictionaries with better classification performance. Groups of feature space partitions (dictionary elements) yielded improved classification results. These estimated groups can be used in learning a semantically structured visual model.
  • 42. Sparse Decomposition
  • 43. Sparse Visual Model. A sparse model approximates a feature vector as a combination of a sub-set of an over-complete basis set. Sparsity is induced by adding a regularization constraint on the coefficients to the loss function. The degree of sparsity is determined empirically. Each basis element is considered individually. Possible structure amongst basis elements is disregarded.
  • 44. Structured Sparse Model. SSPCA (structure in sub-spaces): co-clustered groups of sub-spaces are used to augment Sparse-PCA to compute a Structured Sparse-PCA dictionary. Group Lasso (structure in dictionary): co-clustered groups of dictionary elements are used to augment Lasso to compute a group Lasso feature encoding.
  • 45. Sparse Regularization. Sparse regularization: $\min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} L(z_i, D\alpha_i) + \lambda\,\Omega(\alpha)$. Lasso: $\min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \left( \|z_i - D\alpha_i\|^2 + \lambda \|\alpha_i\|_1 \right)$. Group Sparsity: $\min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \left( \|z_i - D\alpha_i\|^2 + \lambda \sum_{j=1}^{k} \|\alpha_i\|_{G_j} \right)$.
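The difference between the Lasso and group-sparsity penalties is easiest to see in their proximal operators. The sketch below solves both problems for a single feature vector with proximal gradient descent (ISTA). It is one standard way to minimize these objectives, not necessarily the solver used in this work. It assumes non-overlapping groups and a Euclidean norm within each group. The function names, the random dictionary, and the group layout are hypothetical.

```python
import numpy as np

def soft_threshold(a, t):
    """Proximal operator of t * ||a||_1: shrinks each coefficient on its own."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def group_soft_threshold(a, groups, t):
    """Proximal operator of t * sum_j ||a[G_j]||_2 for non-overlapping groups."""
    out = np.zeros_like(a)
    for g in groups:
        norm = np.linalg.norm(a[g])
        if norm > t:                          # otherwise the whole group is zeroed
            out[g] = (1.0 - t / norm) * a[g]
    return out

def ista(z, D, lam, prox, n_iter=500):
    """Minimise ||z - D a||^2 + lam * penalty(a) by proximal gradient descent."""
    # Step size 1/L, where L = 2 * sigma_max(D)^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - z)
        a = prox(a - step * grad, step * lam)
    return a

# Hypothetical dictionary with 12 elements in 3 groups of 4.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 12))
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
z = D[:, 4:8] @ np.array([1.0, -2.0, 0.5, 1.5])   # signal built from the second group

a_lasso = ista(z, D, lam=0.5, prox=soft_threshold)
a_group = ista(z, D, lam=0.5, prox=lambda a, t: group_soft_threshold(a, groups, t))
```

The Lasso operator decides coefficient by coefficient, whereas the group operator keeps or discards each group $G_j$ as a unit. The group form is therefore where structure such as co-clustered dictionary elements could enter the encoding.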
  • 46. Structured Sub-space: Structured Sub-space Dictionary using ITCC. [Figure: two plots of mAP per visual category, one for VOC2006 and one for VOC2007, each comparing Sparse Subspace and Structured Subspace.]
  • 47. Structured Sub-space: Structured Sub-space Dictionary using SSRCC. [Figure: two plots of mAP per visual category, one for VOC2006 and one for VOC2007, each comparing Sparse Subspace and Structured Subspace.]
  • 48. Structured Sub-space.
      Data Set | Sparse Subspace | Structured Sparse Subspace (ITCC) | Structured Sparse Subspace (SSRCC)
      VOC2006  | 67.5941         | 70.8295                           | 68.5808
      VOC2007  | 67.9971         | 68.0783                           | 68.3718
    Sparse selection of semantically related sets of sub-spaces performs better than sparse selection of individual sub-spaces.
  • 49. Structured Sparse Dictionary: Structured Sparse Encoding using ITCC. [Figure: two plots of mAP per visual category, one for Scene15 and one for VOC2006, each comparing Sparse Encoding and Structured Encoding.]
  • 50. Structured Sparse Dictionary: Structured Sparse Encoding using SSRCC. [Figure: two plots of mAP per visual category, one for Scene15 and one for VOC2006, each comparing Sparse Encoding and Structured Encoding.]
  • 51. Structured Sparse Dictionary.
      Data Set | Sparse Encoding | Structured Sparse Encoding (ITCC) | Structured Sparse Encoding (SSRCC)
      VOC-2006 | 72.8386         | 73.3977                           | 72.7738
      Scene-15 | 68.5737         | 79.8794                           | 72.1155
    Sparse selection of semantically related sets of dictionary elements performs better than sparse selection of individual dictionary elements.
  • 52. Summary. Learning semantically relevant structure in feature space was used to compute better visual models. Analysis of sub-space embedding emphasized modelling local distributions. Incorporating a fuzzy logic framework to learn dictionary kernels that adapt to local distributions yielded better visual models. Co-clustering was successful in grouping semantically related sub-spaces and feature space partitions. Estimated groups of sub-spaces and dictionary elements were used to compute structured sparse visual models, improving upon regular sparse models.
  • 53. Future Work. Visual models using Fisher Kernel coding, which uses a Gaussian kernel, have been very successful. Combining the Fisher Kernel approach with the learnt fuzzy membership functions could potentially improve the visual model. Fuzzy logic based learning algorithms that are more advanced than Gustafson-Kessel could be explored to learn better membership functions. Co-clustering creates a block factorization of the data matrix. Partial membership of rows and columns in the co-clusters would be the natural next step. Explore ways of using semantic structure to improve feature generation techniques, such as hierarchical models that aim to learn category-specific descriptors.
  • 54. Future Work: End. Questions...
  • 55. Appendices: BoW Partitioning. [Plot: Bag-of-Words Partition | VOC-2006 | #000017, axes x and y.] Figure: Bag-of-Words model and image ‘000017’ in VOC-2006 dataset. The dictionary of size 25 is computed using K-means clustering. The feature vectors are projected to 2 dimensions using PCA.
  • 56. Appendices: FKM Partitioning. [Plot: Fuzzy K-means Fuzzy Partition | VOC-2006 | #000017, axes x and y.] Figure: Fuzzy K-means model and image ‘000017’ in VOC-2006 dataset.
  • 57. Appendices: GK Partitioning. [Plot: Gustafson-Kessel Fuzzy Partition | VOC-2006 | #000017, axes x and y.] Figure: Gustafson-Kessel model and image ‘000017’ in VOC-2006 dataset.