Multipedia: Enriching DBpedia with Multimedia information
1. Multipedia: Enriching DBpedia with Images
Andrés García-Silva†, Asunción Gómez-Pérez†, Max Jakob*, Pablo Mendez* and Chris Bizer*
† {hgarcia, ocorcho, asun}@fi.upm.es, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain
* first.last@fu-berlin.de, Web-based Systems Group, Freie Universität Berlin, Germany
2. Introduction
Enriching ontologies with multimedia
- Images and videos complement the information about concepts/entities in existing knowledge bases.
- Multimodal ontologies can help in QA systems, user interfaces, search and recommendation processes.
[Figure: ontology fragment relating Pathology and Bone through depicts, occurs and isA links, illustrated by the query «Show me X-ray images with fractures of the femur»]
Radhouani, S., Hwee Lim, J., Pierre Chevallet, J., Falquet, G.: Combining textual and visual ontologies to solve medical multimodal queries. In: IEEE International Conference on Multimedia and Expo, pp. 1853-1856 (2006).
Garcia-Silva et al.
3. Introduction
Goal: Populate a general-purpose ontology with images from the Web.
- Find relevant images for ontology instances with ambiguous names.
DBpedia knowledge base
- Collects facts from Wikipedia, covering 3.5 million entities.
- Entities are classified into a consistent cross-domain ontology: 272 classes and 1.6 million instances.
- Has evolved into a hub in the linked data cloud.
Images in DBpedia
- Wikipedia images are represented in DBpedia (foaf:depiction).
- About 70% of the Wikipedia articles don't have images.
4. Introduction
Challenges
- Ambiguity of instance labels.
- Example: querying the Web for images related to the resource dbpedia:hornet.
6. Enriching DBpedia with Multimedia
[Figure: pipeline overview, illustrated with dbpr:Hornet]
- Get Context: dbpr:Hornet -> Wikipedia-based Context Index -> related terms.
- Retrieve Images: one query per context term & dbpr name -> Image Search Engines -> rankings of images (one per query).
- Aggregate: rankings of images -> ranking of images.
- Generate tag-based ranking: list of images annotated with tags -> ranking of images.
- Aggregate: rankings of images -> final ranking of images.
7. Enriching DBpedia with Multimedia
Get Context
- Input: dbpr:Hornet and its Wikipedia article.
- Wikipedia-based Context Index -> context terms: family, wasps, insect.
8. Enriching DBpedia with Multimedia
Retrieve Images
- Input: dbpr:Hornet with the context terms family, wasps, insect.
- Queries sent to the Image Search Engines:
  Q0 = Hornet
  Q1 = Hornet and Family
  Q2 = Hornet and Wasps
  Q3 = Hornet and insect
- Image rankings returned:
  R0 = img0,1; img0,2; ... img0,k
  R1 = img1,1; img1,2; ... img1,l
  R2 = img2,1; img2,2; ... img2,m
  R3 = img3,1; img3,2; ... img3,n
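The query construction on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `build_queries` is hypothetical, and joining the terms with the word "and" simply mirrors the slide's notation; the syntax an actual image search engine expects is not stated in the slides.

```python
def build_queries(resource_name, context_terms):
    """Return Q0 (the bare resource name) followed by one query per context term.

    Hypothetical helper: the "and" connective copies the slide notation and is
    an assumption, not a documented search-engine syntax.
    """
    queries = [resource_name]  # Q0: the resource name alone
    for term in context_terms:
        # Q1..Qn: resource name combined with a single context term
        queries.append(f"{resource_name} and {term}")
    return queries


queries = build_queries("Hornet", ["family", "wasps", "insect"])
for index, query in enumerate(queries):
    print(f"Q{index} = {query}")
```

Each query would then be sent to the image search engines, giving one ranking Ri per query Qi.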
10. Aggregate
- Each query result Ri is a voter and the images imgj are candidates.
- For each candidate imgj in Ri: Si(imgj) = number of candidates ranked below imgj in Ri.
- Output: the images imgj ordered by their S(imgj) value: Rcontext-based = img1; img2; ... imgp
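The voting scheme above resembles a Borda-style positional count. A minimal sketch follows, under two assumptions that the slide does not spell out: the overall score S(imgj) is taken as the sum of the per-voter scores Si(imgj), and an image missing from a ranking receives nothing from that voter. The function name `aggregate_rankings` and the alphabetical tie-break are illustrative choices.

```python
from collections import defaultdict


def aggregate_rankings(rankings):
    """Aggregate several best-first rankings into a single ranking.

    Si(img) = number of candidates ranked below img in ranking Ri.
    ASSUMPTION: S(img) is the sum of Si(img) over all rankings; images
    absent from a ranking get no points from it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        size = len(ranking)
        for position, image in enumerate(ranking):
            # Candidates below this image = size - 1 - position
            scores[image] += size - 1 - position
    # Highest score first; ties broken by name (arbitrary, for determinism)
    return sorted(scores, key=lambda image: (-scores[image], image))


r0 = ["a", "b", "c"]
r1 = ["b", "a"]
r2 = ["b", "c", "d"]
print(aggregate_rankings([r0, r1, r2]))  # ['b', 'a', 'c', 'd']
```

In this toy input, image b scores 1 + 1 + 2 = 4, image a scores 2 + 0 = 2, image c scores 0 + 1 = 1 and image d scores 0, which gives the printed order.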
11. Enriching DBpedia with Multimedia
Generate tag-based ranking
- List of images: L = R0 ∪ R1 ∪ R2 ∪ R3
1) Measure the relatedness between a DBpedia resource and an image:
- Overlap of terms between the context of the former and the tags of the latter.
2) Use the Vector Space Model to represent the DBpedia resource and the images:
- TF as weighting scheme,
- cosine function to measure similarity.
3) Generate the ranking of images according to the similarity value: Rtag-based = img1; img2; ... imgq
Aggregate
- Rcontext-based = img1; img2; ... imgp and Rtag-based = img1; img2; ... imgq are aggregated into Rfinal = img1; img2; ... imgl
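The three steps of the tag-based ranking can be sketched as follows. This is a simplified illustration, assuming raw term counts as the TF weights and exact string matching between context terms and tags; the function names and the sample tags are hypothetical, and any preprocessing (stemming, stop-word removal) that the original system may apply is omitted.

```python
import math
from collections import Counter


def cosine_tf(terms_a, terms_b):
    """Cosine similarity between two bags of terms with raw TF weights."""
    tf_a, tf_b = Counter(terms_a), Counter(terms_b)
    # Dot product over the terms the two vectors share
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_a * norm_b)


def tag_based_ranking(context_terms, tagged_images):
    """Rank image ids by similarity between the resource context and image tags."""
    scored = [(cosine_tf(context_terms, tags), image)
              for image, tags in tagged_images.items()]
    # Most similar first; ties broken by image id (arbitrary, for determinism)
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image for _, image in scored]


context = ["family", "wasps", "insect"]
images = {  # hypothetical tags, for illustration only
    "img_a": ["insect", "wasps", "nest"],
    "img_b": ["car", "hudson"],
    "img_c": ["insect"],
}
print(tag_based_ranking(context, images))  # ['img_a', 'img_c', 'img_b']
```

Here img_a shares two of its three tags with the context (similarity 2/3), img_c shares its only tag (similarity about 0.58), and img_b shares none, so it is ranked last.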
12. Experiments
How many context words produce the best results?
- Apple context: «juice, fruit, apples, capital, michigan, orange»
13. Experiments
Ambiguity
Search engines work well for:
- unambiguous names,
- ambiguous names that refer to a dominant sense, e.g., dbpedia:Stonehenge.
However, they fail for ambiguous names:
- that lack a dominant sense, e.g., dbpedia:Apple,
- that do not refer to the dominant sense, e.g., dbpedia:Blackberry.
14. Experiments
Dominance
- Dataset: 10 classes and 15 dbpr randomly selected per class.
- Each dbpr must 1) be popular and 2) have a dominance under 0.7.
- With this limit we found dbpr for Mammals, Birds and Insects.
- Increasing the dominance limit to 0.9, we found dbpr for the rest of the classes.
15. Experiments
- 15 people evaluated the results of three approaches.
- Each image was rated by 3 evaluators.
17. Conclusions
- Multipedia is an approach to automatically populate an ontology with images related to existing instances.
- We focused on the particularly challenging problem of ambiguity in instance names.
- Human-driven evaluation of the approach involved 15 users and a total of 2250 image ratings, covering DBpedia resources from several classes.
- A variation of Multipedia improves average precision by 9.4% over a baseline of keyword queries to commercial image search engines.
- We have validated that, in contrast to the baseline, our approach achieves the highest precision with ambiguous names lacking a dominant sense.