Dmitry Ustalov — TagBag: Annotating a Foreign Language Lexical Resource with Pictures

TagBag: Annotating a Foreign Language Lexical Resource with Pictures. Talk by Dmitry Ustalov — http://ustalov.name/.

  1. Annotating a Foreign Language Lexical Resource with Pictures. Dmitry Ustalov, IMM UB RAS / UrFU, Yekaterinburg, Russia
  2. Outline
     • Introduction
     • Related Work
     • Approach
     • Evaluation
     • Results
     • Discussion
     • Conclusion
  3. Introduction
     • Mapping images to word senses is important for:
       • multimedia search,
       • text illustration,
       • quality assessment.
     • It is also interesting to assess the Yet Another RussNet lexical resource (Braslavski et al., 2014).
  4. Related Work
     • PicNet, a proprietary resource (Mihalcea & Leong, 2008).
     • ImageNet annotates WordNet with pictures & bounding boxes (Deng et al., 2009).
     • Intersection with WordNet.ru is negligible.
     • ImageCLEF creates software and datasets for image indexing (Müller et al., 2010).
  5. Related Work: Flickr
     • Single-query image retrieval (Reiter et al., 2007).
     • Semantic Web-based approach (Trojahn et al., 2008).
     • Wikipedia-based approach (Stampouli et al., 2010).
     • Flickr tags with visual saliency of images (Jiang et al., 2014).
  6. Problem
     Given an annotated image I, a bilingual dictionary B, and a lexical resource S, produce a mapping Is.
     “cat”, “tomcat”, “kitten” → «кот, кошка, котёночек»
  7. TagBag: Assumptions
     • Most image tags are nouns.
     • Tags may be polysemous, and redundant tags may be present.
       • Is “crane” «журавль» (the bird) or «кран» (the machine)?
     • The image has a “main” object.
  8. TagBag
     • Tag. Initialize an empty vector.
       • Iterate over the image tags and retrieve all the translations for each tag.
       • Add each occurrence of a translation to its dimension.
     • Bag. Prune that vector.
       • Remove the low-frequency dimensions using the cut-off value.
       • Return the resulting vector.
  9. TagBag: Pseudocode
  10. Evaluation
      • The approach is quite simple, so let’s evaluate it empirically.
      • Take the top 1500 English nouns and search for Flickr photos. http://www.talkenglish.com/Vocabulary/Top-1500-Nouns.aspx
      • Use V. K. Mueller’s dictionary. http://ustalov.imm.uran.ru/pub/mueller.tar.gz
  11. Experimental Setup
      • Yet Another RussNet (CC BY-SA). http://russianword.net/
      • Similarity measures:
        • cosine similarity,
        • Jaccard index.
      • Ask three annotators to submit judgements.
  12. призрак, тень, намёк (ghost, shadow, hint): https://www.flickr.com/photos/127324269@N03/16217604730
  13. труд, работа, занятие (labour, work, occupation): https://www.flickr.com/photos/79304587@N07/16192772090
  14. мужчина, парень, юноша (man, guy, youth): https://www.flickr.com/photos/94029069@N03/15797009873
  15. футбол (football): https://www.flickr.com/photos/113780395@N05/15789001293
  16. пища, провизия, питание, корм (food, provisions, nutrition, fodder): https://www.flickr.com/photos/80972943@N00/16396295195
  17. Results
      • The accuracy is moderately high and the agreement level is good.
      • Both measures demonstrate the same performance.
      http://ustalov.imm.uran.ru/pub/tagbag-aist.tar.gz
  18. Discussion
      • 43 mappings are the same under both similarity measures, and 13 of these 43 mappings are wrong.
      • Three sources of errors:
        • sloppy image tags (7 of 13),
        • actual mapping errors (3 of 13),
        • batch uploads (3 of 13).
  19. Conclusion
      • TagBag is an unsupervised approach for mapping images to synsets.
      • The performance depends both on image tags and ontology bias.
      • Visual saliency and spam filtering may increase the quality.
  20. Further Work
  21. Thank you!
      Dmitry Ustalov, a post-graduate student @ IMM UB RAS, Yekaterinburg, Russia.
      https://ustalov.name/
      dau@imm.uran.ru
      The present work is supported by the Russian Foundation for the Humanities, project no. 13-04-12020.
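
The two TagBag steps listed on slide 8 (Tag, then Bag) might look roughly like the Python sketch below. It is not the pseudocode from slide 9, which did not survive in this transcript. The function name `tagbag`, the `cutoff` parameter, the rule that a dimension survives when its count is at least `cutoff`, and the toy dictionary are all assumptions made for illustration.

```python
from collections import Counter


def tagbag(tags, dictionary, cutoff=2):
    """Build a pruned bag of translations for the tags of one image."""
    # Tag step: start from an empty vector and count every translation
    # of every tag; each distinct translation is one dimension.
    vector = Counter()
    for tag in tags:
        for translation in dictionary.get(tag, ()):
            vector[translation] += 1
    # Bag step: drop the low-frequency dimensions.
    # Assumption: a dimension is kept when its count is >= cutoff.
    return {word: count for word, count in vector.items() if count >= cutoff}


# Hypothetical toy dictionary, not taken from Mueller's dictionary.
TOY_DICTIONARY = {
    "cat": ["кот", "кошка"],
    "tomcat": ["кот"],
    "kitten": ["котёнок", "кошка"],
    "crane": ["журавль", "кран"],
}

bag = tagbag(["cat", "tomcat", "kitten", "crane"], TOY_DICTIONARY)
print(bag)
```

In this toy run the redundant tags reinforce the cat-related translations, while both readings of the polysemous tag “crane” occur only once and are pruned.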

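Slide 11 names cosine similarity and the Jaccard index as the similarity measures. The sketch below shows one way a pruned tag vector could be compared with candidate synsets under each measure. The slides do not say how synsets are represented, so treating a synset as a set of words (a binary vector for the cosine) is an assumption, as are the helper names and the toy data.

```python
import math


def cosine(bag, synset):
    """Cosine between a count vector and a synset seen as a binary vector."""
    dot = sum(count for word, count in bag.items() if word in synset)
    norm_bag = math.sqrt(sum(count * count for count in bag.values()))
    norm_synset = math.sqrt(len(synset))
    if norm_bag == 0 or norm_synset == 0:
        return 0.0
    return dot / (norm_bag * norm_synset)


def jaccard(bag, synset):
    """Jaccard index between the words of the bag and the synset."""
    words = set(bag)
    union = words | synset
    return len(words & synset) / len(union) if union else 0.0


def best_synset(bag, synsets, measure):
    """Return the synset that is most similar to the bag."""
    return max(synsets, key=lambda synset: measure(bag, synset))


# Hypothetical toy data: a pruned tag vector and three candidate synsets.
toy_bag = {"кот": 2, "кошка": 2}
toy_synsets = [
    frozenset({"кот", "кошка", "котёночек"}),
    frozenset({"журавль"}),
    frozenset({"кран"}),
]

by_cosine = best_synset(toy_bag, toy_synsets, cosine)
by_jaccard = best_synset(toy_bag, toy_synsets, jaccard)
print(sorted(by_cosine), sorted(by_jaccard))
```

Here both measures select the same synset, which is the kind of agreement between measures that the Discussion slide counts.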