Cmap presentation
1. ConceptMap: Learning Visual Concepts
from Weakly-Labeled WWW images
A work by Eren Golge
Supervised by Asst. Prof. Pinar Duygulu
2. Dictionary
●
Visual Concept – the visual counterpart of a semantic
value
– Objects (car, bus … ), attributes (red, metallic … ) or scenes
(indoor, kitchen, office …)
●
Polysemy – multiple semantic meanings for a given
word
●
Model – a classifier, in the Machine Learning sense
●
BoW – Bag of Words feature representation
3. Problems
●
Large labeled datasets are hard to obtain
●
Query Web sources: Google, Bing, Yahoo, etc.
●
Handle polysemy and irrelevancy in the gathered data
●
Deal with Domain Adaptation
●
Learn salient models
●
Use lower-level concept models (objects) to discover
higher-level concepts (scenes)
7. RSOM
●
A generic method, applicable to other domains as well (textual, biological, etc.)
●
Extension of SOM (a.k.a. Kohonen's Map) *
●
Inspired by biological phenomena **
●
Able to cluster data and detect outliers
●
IRRELEVANCY SOLVED!!
*Kohonen, T.: Self-organizing maps. Springer (1997)
**Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal
of Physiology 160(1) (1962) 106–154
(Figure: outlier clusters, and outlier instances within salient clusters)
8. RSOM cont'd: finding outlier units
●
Examine the activation statistics of each SOM unit
during the learning phase
●
Later learning iterations are more reliable
IF a unit is activated
RARELY → OUTLIER
FREQUENTLY → SALIENT
(Figure: winner activations vs. neighbor activations)
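As a rough sketch of this unit-scoring idea (the recency weighting and the mean-minus-std cut-off below are our assumptions, not the exact RSOM formulation):

```python
import numpy as np

def find_outlier_units(activations, n_units, decay=0.99):
    """Score each SOM unit by a recency-weighted activation count.

    activations: winning-unit index per learning iteration. Later
    iterations get weights closer to 1.0, since late activations are
    more reliable once the map has settled.
    """
    scores = np.zeros(n_units)
    n_iters = len(activations)
    for t, unit in enumerate(activations):
        scores[unit] += decay ** (n_iters - 1 - t)
    # assumed cut-off: units activated well below average are outliers
    threshold = scores.mean() - scores.std()
    outliers = np.where(scores < threshold)[0]
    return scores, outliers
```

Rarely activated units are flagged as outlier units; the remaining units are treated as salient.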
10. Learning Models
●
Learn L1-regularized linear SVM models
– Easier to train
– Better for high-dimensional data (wide data matrix)
– Implicit feature selection via the L1 norm
●
Learn one linear model from each
salient cluster
●
Each concept has multiple models
– POLYSEMY SOLVED!!
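A minimal sketch of the per-cluster model learning, with scikit-learn's `LinearSVC` standing in for the L1 linear SVM (the function name and data layout are illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_cluster_models(salient_clusters, negatives, C=1.0):
    """Train one L1-regularized linear SVM per salient cluster.

    salient_clusters: list of (n_i, d) arrays, the positives per cluster.
    negatives: (m, d) array of outliers / other concepts' instances.
    Returning one model per cluster keeps multiple models per concept,
    which is how polysemy is handled.
    """
    models = []
    for pos in salient_clusters:
        X = np.vstack([pos, negatives])
        y = np.concatenate([np.ones(len(pos)), np.zeros(len(negatives))])
        # L1 penalty drives many weights to zero: implicit feature selection
        clf = LinearSVC(penalty="l1", dual=False, C=C)
        models.append(clf.fit(X, y))
    return models
```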
12. Retrospective
●
Fergus et al. [1]
– They use a human-annotated control set to cull data
– We use data with no human effort at all
●
Berg and Forsyth [2]
– They use textual surrounding
– We use only visual content
●
OPTIMOL, Li and Fei-Fei [3]
– They use seed images and update the model incrementally
– We use no supervision, in a single iteration
●
Efros et al. [4] “Discriminative Patches”
– They require large computer clusters and iterative data elimination
– We use a single computer, with faster and better results and no time-wasting iterations
●
CMAP has broader possible applications
[1] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google's image search. In: ICCV (2005)
[2] Berg, T.L., Berg, A.C., Edwards, J., Maire, M., White, R., Teh, Y.W., Learned-Miller, E.G., Forsyth, D.A.: Names and faces in the news. In: CVPR. Volume 2. (2004) 848–854
[3] Li, L.J., Fei-Fei, L.: OPTIMOL: automatic online picture collection via incremental model learning. International Journal of Computer Vision 88(2) (2010) 147–168
[4] Singh, S., Gupta, A., Efros, A.A.: Unsupervised discovery of mid-level discriminative patches. In: ECCV. Springer (2012) 73–86
13. Experiments
●
Only use images for learning
●
Attack the following problems:
– Attribute Learning : ImageNet [1], Google and EBAY images [2]
●
Learn Texture and Color attributes
– Scene Learning : MIT-indoor [4], Scene-15 [5]
●
Use Attributes as mid-level features
– Face Recognition : FAN-Large [6]
●
Use the EASY and HARD subsets of the dataset
– Object Recognition : Google data-set [3]
[1] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
[2] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. IEEE Transactions on Image Processing (2009)
[3] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google's image search. In: ICCV (2005)
[4] Quattoni, A., Torralba, A.: Recognizing indoor scenes. CVPR (2009)
[5] Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. CVPR 2006
[6] Ozcan, M., Luo, J., Ferrari, V., Caputo, B.: A large-scale database of images and captions for automatic face naming. In: BMVC. (2011)
18. Implementation
●
Visual Features :
– BoW SIFT with 4000 words (for texture attributes, objects and faces)
– 3D 10x20x20 Lab histograms (for attributes)
– 256-dimensional LBP [1] (for objects and faces)
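The 3D Lab histogram feature can be sketched as follows, assuming the image is already converted to CIE Lab (the bin ranges below are our assumption):

```python
import numpy as np

def lab_histogram(lab_image, bins=(10, 20, 20)):
    """3D color histogram over an image given in CIE Lab.

    lab_image: (H, W, 3) array with L in [0, 100] and a, b in [-128, 128).
    Returns a flattened, L1-normalized 10x20x20 = 4000-dim descriptor.
    """
    pixels = lab_image.reshape(-1, 3)
    hist, _ = np.histogramdd(
        pixels,
        bins=bins,
        range=[(0, 100), (-128, 128), (-128, 128)],
    )
    hist = hist.ravel()
    # normalize so images of different sizes yield comparable descriptors
    return hist / max(hist.sum(), 1.0)
```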
●
Preprocessing
– Attribute: Extract random 100x100 non-overlapping patches from each image.
– Scene: Represent each image by the confidence scores of the attribute classifiers, in a Spatial Pyramid sense.
– Face: Apply face detection [2] to each image and keep the single highest-scoring patch.
– Object: Apply unsupervised saliency detection [3] and keep the single highest-activation region.
●
Model Learning
– Use outliers and a sample of other concepts' instances as the negative set
– Apply Hard Mining
– Tune all hyper-parameters (classifier and RSOM parameters) via cross-validation
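Hard mining can be sketched like this (the round count, sample sizes, and the use of scikit-learn's `LinearSVC` are our assumptions):

```python
import numpy as np
from sklearn.svm import LinearSVC

def hard_mine(positives, negative_pool, rounds=3, per_round=5, C=1.0):
    """Hard-negative mining: refit the model, each time adding the pool
    negatives the current model scores highest (the 'hard' ones)."""
    rng = np.random.default_rng(0)
    start = rng.choice(len(negative_pool), size=per_round, replace=False)
    negs = negative_pool[start]
    for _ in range(rounds):
        X = np.vstack([positives, negs])
        y = np.concatenate([np.ones(len(positives)), np.zeros(len(negs))])
        clf = LinearSVC(penalty="l1", dual=False, C=C).fit(X, y)
        # negatives with the highest decision values are the hardest
        order = np.argsort(clf.decision_function(negative_pool))
        negs = np.vstack([negs, negative_pool[order[-per_round:]]])
    return clf
```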
●
NOTICE:
– We use Google images to train concept models and deal with DOMAIN ADAPTATION
[1] Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002) 971–987
[2] Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: CVPR (2012) 2879–2886
[3] Erdem, E., Erdem, A.: Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision 13(4) (2013) 1–20
19. Results
                      Ours    State of the art
Face                  0.66    0.58 [1]
Object                0.78    0.75 [2]
Attribute (ImageNet)  0.37    0.36 [3]
Attribute (EBAY)      0.81    0.79 [4]
Attribute (Bing)      0.82    -
- We beat all state-of-the-art methods except in scene recognition!
However, our method is far cheaper than that of Li et al. [5]
[1] Ozcan, M., Luo, J., Ferrari, V., Caputo, B.: A large-scale database of images and captions for automatic face naming. In: BMVC (2011)
[2] Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google's image search. In: ICCV (2005)
[3] Russakovsky, O., Fei-Fei, L.: Attribute learning in large-scale datasets. In: Trends and Topics in Computer Vision. Springer (2012)
[4] Van De Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. IEEE Transactions on Image Processing (2009)
[5] Li, Q., Wu, J., Tu, Z.: Harvesting mid-level visual concepts from large-scale internet images. In: CVPR (2013)
20. Last Words
●
Fact – We propose a novel algorithm, RSOM
●
Fact – It roughly beats all state-of-the-art methods
●
Fact – It offers a route to better datasets with little or no
human effort
●
Improvement – Estimate the number of clusters implicitly,
without a hyper-parameter
●
Improvement – Use a more complex classification scheme