Towards Data-Driven Estimation of Image Tag Relevance
Using Visually Similar and Dissimilar Folksonomy Images
ACM Multimedia 2012:
Workshop on Socially-Aware Multimedia
Nara – Oct. 29, 2012
Sihyoung Lee1, Wesley De Neve1,2, Yong Man Ro1
1 Image and Video Systems Lab, Dept. of Electrical Engineering, KAIST
2 Multimedia Lab, ELIS, Ghent University - iMinds
Introduction
• Increasing online availability of images
– thanks to easy-to-use multimedia devices and online services
– thanks to cheap storage and bandwidth
– thanks to an increasing number of people going online
• Some statistics
– every minute, over 2,500 images are uploaded to Flickr
– every day, over 300 million photos are uploaded to Facebook
• How to effectively retrieve images for consumption
purposes?
Problems in Image Folksonomies
• Most image search engines strongly depend on tags
• Non-relevant tags hinder effective consumption
[Figure: example image search for 'apple' – among the 60 images retrieved, only 20 are actually related to 'apple']
Motivation
• The correlation between visual and semantic similarity
– is high for images that are visually distant: visually dissimilar images are very likely to also be semantically dissimilar
– is lower for images that are visually close: visual similarity does not guarantee semantic similarity
The probability of finding semantically similar images in a set of
visually similar images is thus lower than the probability of finding
semantically dissimilar images in a set of visually dissimilar images
• The above observation motivated us to develop a novel
technique for tag relevance learning
– takes advantage of both visually similar and dissimilar images
Conceptual Illustration of
The Proposed Method
[Figure: a seed image tagged 'desert, bicycle' within a folksonomy of images carrying other tags (e.g., food, sign, airplane, building, ship, street, nature, atomium, basketball); image tag relevance is estimated by collecting votes from visually similar images and counter-votes from visually dissimilar images, so that the proposed method assigns higher relevance to 'desert' and 'bicycle' than relevance estimation using visually dissimilar images alone]
Proposed Method
• Let r(i, t) be an image tag relevance learning function
based on the proposed method, then it is defined as

  r(i, t) := r_similar(i, t, k) − r_dissimilar(i, t, l)

– r_similar(i, t, k) := n_t[N_s(i, k)] − n_t[N_rand(k)] = \sum_{j \in N_s(i,k)} vote(j, t) − k · \frac{\sum_{j \in I} vote(j, t)}{|I|}

– r_dissimilar(i, t, l) := n_t[N_d(i, l)] − n_t[N_rand(l)] = \sum_{j \in N_d(i,l)} vote(j, t) − l · \frac{\sum_{j \in I} vote(j, t)}{|I|}

where n_t[∙] represents the number of images annotated with t,
N_s(i, k) is a set of k images visually similar to i,
N_d(i, l) is a set of l images visually dissimilar to i,
N_rand(k) is a set of k randomly selected neighbors, and
I is the set of all folksonomy images

Relationship between r_similar(i, t, k), r_dissimilar(i, t, l), and r(i, t):

                 r_similar(i, t, k)   r_dissimilar(i, t, l)   r(i, t)
t relevant               +                     −                ++
t irrelevant             −                     +                −−
Rationale
• For t_relevant relevant to the content of i,
– P(t_relevant | N_s(i, k)) is higher than P(t_relevant | N_rand(k))
  ⇒ r_similar(i, t_relevant, k) thus returns a positive value
– P(t_relevant | N_d(i, l)) is lower than P(t_relevant | N_rand(l))
  ⇒ r_dissimilar(i, t_relevant, l) thus returns a negative value
• For t_irrelevant irrelevant to the content of i,
– P(t_irrelevant | N_s(i, k)) is lower than P(t_irrelevant | N_rand(k))
  ⇒ r_similar(i, t_irrelevant, k) thus returns a negative value
– P(t_irrelevant | N_d(i, l)) is higher than P(t_irrelevant | N_rand(l))
  ⇒ r_dissimilar(i, t_irrelevant, l) thus returns a positive value
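As an illustration, the scoring functions above can be sketched in Python. This is a minimal sketch under assumed data structures (a `tags_of` mapping from image id to tag set, and precomputed ranked neighbor lists); following the equations, the random-neighbor term n_t[N_rand(k)] is replaced by its expectation k · Σ_j vote(j, t) / |I|. All function names here are our own, not from the paper.

```python
def vote(tags_of, j, t):
    """1 if image j is annotated with tag t, else 0."""
    return 1 if t in tags_of[j] else 0

def n_t(tags_of, images, t):
    """Number of images in `images` annotated with tag t."""
    return sum(vote(tags_of, j, t) for j in images)

def r_similar(tags_of, all_images, similar, t, k):
    # Votes from the k visually similar neighbors, minus the expected
    # number of votes from k randomly chosen folksonomy images.
    prior = n_t(tags_of, all_images, t) / len(all_images)
    return n_t(tags_of, similar[:k], t) - k * prior

def r_dissimilar(tags_of, all_images, dissimilar, t, l):
    # Same idea, but using the l visually dissimilar images.
    prior = n_t(tags_of, all_images, t) / len(all_images)
    return n_t(tags_of, dissimilar[:l], t) - l * prior

def relevance(tags_of, all_images, similar, dissimilar, t, k, l):
    """r(i, t) := r_similar(i, t, k) - r_dissimilar(i, t, l)."""
    return (r_similar(tags_of, all_images, similar, t, k)
            - r_dissimilar(tags_of, all_images, dissimilar, t, l))
```

On a toy folksonomy where the seed's similar neighbors share a tag and its dissimilar neighbors do not, a relevant tag yields a strongly positive r(i, t) and an irrelevant tag a strongly negative one, matching the +/− table above.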
Outline
• Introduction
• Motivation
• The Proposed Image Tag Relevance Estimation
• Experiments
• Conclusions
Experimental Setup (1/2)
• Image set used: subset of MIRFlickr-1M
– 100,000 images annotated with 1,130,342 tags by 13,343 users
  ⇒ concept vocabulary of 159,300 unique tags
– test set: 1,000 images annotated with at least four tags
  ⇒ annotated with 24,474 tags in total
  • we manually classified 6,534 tags as correct
  • we manually classified 17,940 tags as noisy
• Image descriptor
– Bag of Visual Words (BoVW), vocabulary size: 500
– use of cosine similarity for measuring image similarity
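The similarity measure in the setup above can be sketched as follows; this is a plain-Python sketch, assuming each image is represented by a fixed-length BoVW histogram (the slide states a 500-word vocabulary), with a hypothetical helper for ranking neighbors of a seed image.

```python
import math

def cosine_similarity(h1, h2):
    """Cosine similarity between two BoVW histograms (equal-length lists)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = math.sqrt(sum(a * a for a in h1))
    n2 = math.sqrt(sum(b * b for b in h2))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)

def rank_neighbors(seed_hist, histograms):
    """Rank candidate images by similarity to the seed, most similar first.

    The most similar images form N_s(i, k); the least similar form N_d(i, l).
    """
    scored = [(cosine_similarity(seed_hist, h), idx)
              for idx, h in enumerate(histograms)]
    return [idx for _, idx in sorted(scored, reverse=True)]
```

Taking the head of the ranked list gives the visually similar neighbors and the tail the visually dissimilar ones.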
Experimental Setup (2/2)
• Metrics used for evaluating the effectiveness of the
proposed technique for image tag relevance estimation
– for image tag refinement:

  NL = |A_noise| / |A|

  where NL (Noise Level) denotes the proportion of irrelevant tag assignments in the set of all tag assignments,
  A is the set of tag assignments in an image folksonomy, and
  A_noise is the set of incorrect (noisy) tag assignments
– for tag-based image retrieval:

  P@m for t = |I_t^relevant ∩ I_{t,m}^retrieved| / m

  where I_t^relevant is the set of all folksonomy images relevant to t, and
  I_{t,m}^retrieved is the set of the m topmost images that have been retrieved for t
  (given the estimated tag relevance values)
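Both metrics are straightforward to compute; the sketch below implements them directly from the definitions above (the function and parameter names are ours).

```python
def noise_level(num_noisy, num_total):
    """NL = |A_noise| / |A|: proportion of noisy tag assignments."""
    return num_noisy / num_total

def precision_at_m(relevant, retrieved, m):
    """P@m: fraction of the top-m retrieved images that are relevant to t."""
    top_m = retrieved[:m]
    return len(set(top_m) & set(relevant)) / m
```

For example, with the test-set counts from the previous slide (17,940 noisy out of 24,474 tag assignments), the noise level before refinement is roughly 0.733.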
Effectiveness of Image Tag Relevance Estimation
for Image Tag Refinement
• Effectiveness of image tag relevance estimation using
visually similar and dissimilar images, compared to
previous approaches
– neighbor voting and a variant of neighbor voting estimate image
tag relevance by only making use of visually similar images
                            Before           After refinement,              After refinement,
                            refinement       using visually similar images  using the proposed technique
Number of relevant tags      6,534            5,881                          5,881
Number of irrelevant tags   17,940           13,117                         12,094
NL                           0.733            0.690                          0.673
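A tag refinement step of this kind can be sketched as a simple thresholding filter over the estimated relevance scores; the threshold value and the `relevance_fn` interface here are illustrative assumptions, not details from the paper.

```python
def refine_tags(image_tags, relevance_fn, threshold=0.0):
    """Keep only the tags whose estimated relevance exceeds the threshold.

    image_tags maps an image id to its list of user-supplied tags;
    relevance_fn(image, tag) is assumed to return the r(i, t) score.
    """
    return {img: [t for t in tags if relevance_fn(img, t) > threshold]
            for img, tags in image_tags.items()}
```

Tags with a non-positive score (i.e., those the similar/dissimilar voting judges irrelevant) are dropped, which is what reduces the noise level in the table above.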
Effectiveness of Image Tag Relevance Estimation
for Tag-based Image Retrieval
• Effectiveness of image tag relevance estimation using
visually similar and dissimilar images, compared to
previous approaches
– neighbor voting and a variant of neighbor voting estimate image
tag relevance by only making use of visually similar images
Conclusions
• We proposed an image tag relevance estimation technique that
makes use of both visually similar and dissimilar images
– increases the difference in estimated relevance between tags
relevant and tags not relevant with respect to a seed image
– comes with only a low increase in computational complexity
• The effectiveness of the proposed technique was
confirmed using MIRFLICKR-25000 and MIRFLICKR-1M
– by showing that the proposed technique allows increasing the
effectiveness of tag refinement and tag-based image retrieval
• Future research
– combining visual information and tag statistics
– comparing our data-driven approach with a classifier-based
approach for detecting a number of predefined semantic concepts