Towards Data-Driven Estimation of Image Tag Relevance
Using Visually Similar and Dissimilar Folksonomy Images
ACM Multimedia 2012:
Workshop on Socially-Aware Multimedia
Nara – Oct. 29, 2012
Sihyoung Lee1, Wesley De Neve1,2, Yong Man Ro1
1 Image and Video Systems Lab, Dept. of Electrical Engineering, KAIST
2 Multimedia Lab, ELIS, Ghent University - iMinds
Introduction
• Increasing online availability of images
– thanks to easy-to-use multimedia devices and online services
– thanks to cheap storage and bandwidth
– thanks to an increasing number of people going online
• Some statistics
– every minute, over 2,500 images are uploaded to Flickr
– every day, over 300 million photos are uploaded to Facebook
• How to effectively retrieve images for consumption
purposes?
Problems in Image Folksonomies
• Most image search engines strongly depend on tags
• Non-relevant tags hinder effective consumption
[Figure: example image search for 'apple' – among the 60 images retrieved, only 20 are actually related to 'apple']
Motivation
• The correlation between visual and semantic similarity
– is high for images that are visually distant: visually dissimilar images are very likely to also be semantically dissimilar
– is lower for images that are visually close: visual similarity does not guarantee semantic similarity
The probability of finding semantically similar images in a set of
visually similar images is thus lower than the probability of finding
semantically dissimilar images in a set of visually dissimilar images
• The above observation motivated us to develop a novel
technique for tag relevance learning
– takes advantage of both visually similar and dissimilar images
Conceptual Illustration of
The Proposed Method
[Figure: a seed image tagged 'desert, bicycle' within a folksonomy of images carrying other tags (e.g., food, sign, airplane, building, ship, street, nature, atomium, basketball); image tag relevance is estimated by collecting votes from visually similar images and counter-votes from visually dissimilar images, so that the proposed method assigns higher relevance to 'desert' and 'bicycle' than relevance estimation using visually dissimilar images alone]
Proposed Method
• Let r(i, t) be an image tag relevance learning function
based on the proposed method, then it is defined as

  r(i, t) := r_similar(i, t, k) − r_dissimilar(i, t, l)

– r_similar(i, t, k) := n_t[N_s(i, k)] − n_t[N_rand(k)] = \sum_{j \in N_s(i,k)} vote(j, t) − k · \frac{\sum_{j \in I} vote(j, t)}{|I|}

– r_dissimilar(i, t, l) := n_t[N_d(i, l)] − n_t[N_rand(l)] = \sum_{j \in N_d(i,l)} vote(j, t) − l · \frac{\sum_{j \in I} vote(j, t)}{|I|}

where n_t[∙] represents the number of images annotated with t,
N_s(i, k) is a set of k images visually similar to i,
N_d(i, l) is a set of l images visually dissimilar to i,
N_rand(k) is a set of k randomly selected neighbors, and
I is the set of all folksonomy images

Relationship between r_similar(i, t, k), r_dissimilar(i, t, l), and r(i, t):

                 r_similar(i, t, k)   r_dissimilar(i, t, l)   r(i, t)
t relevant               +                     −                ++
t irrelevant             −                     +                −−
Rationale
• For t_relevant relevant to the content of i,
– P(t_relevant | N_s(i, k)) is higher than P(t_relevant | N_rand(k))
  ⇒ r_similar(i, t_relevant, k) thus returns a positive value
– P(t_relevant | N_d(i, l)) is lower than P(t_relevant | N_rand(l))
  ⇒ r_dissimilar(i, t_relevant, l) thus returns a negative value
• For t_irrelevant irrelevant to the content of i,
– P(t_irrelevant | N_s(i, k)) is lower than P(t_irrelevant | N_rand(k))
  ⇒ r_similar(i, t_irrelevant, k) thus returns a negative value
– P(t_irrelevant | N_d(i, l)) is higher than P(t_irrelevant | N_rand(l))
  ⇒ r_dissimilar(i, t_irrelevant, l) thus returns a positive value
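As an illustration, the scoring functions above can be sketched in Python. This is a minimal sketch under assumed data structures (a `tags_of` mapping from image id to tag set, and precomputed ranked neighbor lists); following the equations, the random-neighbor term n_t[N_rand(k)] is replaced by its expectation k · Σ_j vote(j, t) / |I|. All function names here are our own, not from the paper.

```python
def vote(tags_of, j, t):
    """1 if image j is annotated with tag t, else 0."""
    return 1 if t in tags_of[j] else 0

def n_t(tags_of, images, t):
    """Number of images in `images` annotated with tag t."""
    return sum(vote(tags_of, j, t) for j in images)

def r_similar(tags_of, all_images, similar, t, k):
    # Votes from the k visually similar neighbors, minus the expected
    # number of votes from k randomly chosen folksonomy images.
    prior = n_t(tags_of, all_images, t) / len(all_images)
    return n_t(tags_of, similar[:k], t) - k * prior

def r_dissimilar(tags_of, all_images, dissimilar, t, l):
    # Same idea, but using the l visually dissimilar images.
    prior = n_t(tags_of, all_images, t) / len(all_images)
    return n_t(tags_of, dissimilar[:l], t) - l * prior

def relevance(tags_of, all_images, similar, dissimilar, t, k, l):
    """r(i, t) := r_similar(i, t, k) - r_dissimilar(i, t, l)."""
    return (r_similar(tags_of, all_images, similar, t, k)
            - r_dissimilar(tags_of, all_images, dissimilar, t, l))
```

On a toy folksonomy where the seed's similar neighbors share a tag and its dissimilar neighbors do not, a relevant tag yields a strongly positive r(i, t) and an irrelevant tag a strongly negative one, matching the +/− table above.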
Outline
• Introduction
• Motivation
• The Proposed Image Tag Relevance Estimation
• Experiments
• Conclusions
Experimental Setup (1/2)
• Image set used: subset of MIRFlickr-1M
– 100,000 images annotated with 1,130,342 tags by 13,343 users
  ⇒ concept vocabulary of 159,300 unique tags
– test set: 1,000 images annotated with at least four tags
  ⇒ annotated with 24,474 tags in total
  • we manually classified 6,534 tags as correct
  • we manually classified 17,940 tags as noisy
• Image descriptor
– Bag of Visual Words (BoVW), vocabulary size: 500
– use of cosine similarity for measuring image similarity
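The similarity measure in the setup above can be sketched as follows; this is a plain-Python sketch, assuming each image is represented by a fixed-length BoVW histogram (the slide states a 500-word vocabulary), with a hypothetical helper for ranking neighbors of a seed image.

```python
import math

def cosine_similarity(h1, h2):
    """Cosine similarity between two BoVW histograms (equal-length lists)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = math.sqrt(sum(a * a for a in h1))
    n2 = math.sqrt(sum(b * b for b in h2))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)

def rank_neighbors(seed_hist, histograms):
    """Rank candidate images by similarity to the seed, most similar first.

    The most similar images form N_s(i, k); the least similar form N_d(i, l).
    """
    scored = [(cosine_similarity(seed_hist, h), idx)
              for idx, h in enumerate(histograms)]
    return [idx for _, idx in sorted(scored, reverse=True)]
```

Taking the head of the ranked list gives the visually similar neighbors and the tail the visually dissimilar ones.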
Experimental Setup (2/2)
• Metrics used for evaluating the effectiveness of the
proposed technique for image tag relevance estimation
– for image tag refinement:

  NL = |A_noise| / |A|

  where NL (Noise Level) denotes the proportion of irrelevant tag assignments in the set of all tag assignments,
  A is the set of tag assignments in an image folksonomy, and
  A_noise is the set of incorrect (noisy) tag assignments
– for tag-based image retrieval:

  P@m for t = |I_t^relevant ∩ I_{t,m}^retrieved| / m

  where I_t^relevant is the set of all folksonomy images relevant to t, and
  I_{t,m}^retrieved is the set of the m topmost images that have been retrieved for t
  (given the estimated tag relevance values)
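Both metrics are straightforward to compute; the sketch below implements them directly from the definitions above (the function and parameter names are ours).

```python
def noise_level(num_noisy, num_total):
    """NL = |A_noise| / |A|: proportion of noisy tag assignments."""
    return num_noisy / num_total

def precision_at_m(relevant, retrieved, m):
    """P@m: fraction of the top-m retrieved images that are relevant to t."""
    top_m = retrieved[:m]
    return len(set(top_m) & set(relevant)) / m
```

For example, with the test-set counts from the previous slide (17,940 noisy out of 24,474 tag assignments), the noise level before refinement is roughly 0.733.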
Effectiveness of Image Tag Relevance Estimation
for Image Tag Refinement
• Effectiveness of image tag relevance estimation using
visually similar and dissimilar images, compared to
previous approaches
– neighbor voting and a variant of neighbor voting estimate image
tag relevance by only making use of visually similar images
                            Before           After refinement,              After refinement,
                            refinement       using visually similar images  using the proposed technique
Number of relevant tags      6,534            5,881                          5,881
Number of irrelevant tags   17,940           13,117                         12,094
NL                           0.733            0.690                          0.673
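A tag refinement step of this kind can be sketched as a simple thresholding filter over the estimated relevance scores; the threshold value and the `relevance_fn` interface here are illustrative assumptions, not details from the paper.

```python
def refine_tags(image_tags, relevance_fn, threshold=0.0):
    """Keep only the tags whose estimated relevance exceeds the threshold.

    image_tags maps an image id to its list of user-supplied tags;
    relevance_fn(image, tag) is assumed to return the r(i, t) score.
    """
    return {img: [t for t in tags if relevance_fn(img, t) > threshold]
            for img, tags in image_tags.items()}
```

Tags with a non-positive score (i.e., those the similar/dissimilar voting judges irrelevant) are dropped, which is what reduces the noise level in the table above.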
Effectiveness of Image Tag Relevance Estimation
for Tag-based Image Retrieval
• Effectiveness of image tag relevance estimation using
visually similar and dissimilar images, compared to
previous approaches
– neighbor voting and a variant of neighbor voting estimate image
tag relevance by only making use of visually similar images
Conclusions
• We proposed an image tag relevance estimation technique that
makes use of both visually similar and dissimilar images
– increases the difference in estimated relevance between tags
relevant and tags not relevant with respect to a seed image
– comes with only a low increase in computational complexity
• The effectiveness of the proposed technique was
confirmed using MIRFLICKR-25000 and MIRFLICKR-1M
– by showing that the proposed technique allows increasing the
effectiveness of tag refinement and tag-based image retrieval
• Future research
– combining visual information and tag statistics
– comparing our data-driven approach with a classifier-based
approach for detecting a number of predefined semantic concepts