The document discusses Press Association's semantic technology project which aims to generate a knowledge base using information extraction and the Linked Data Cloud. It outlines Press Association's operations and workflow, and how semantic technologies can be used to develop taxonomies, annotate images, and extract entities from captions into an ontology-based knowledge base. The knowledge base can then be populated and interlinked with external datasets from the Linked Data Cloud like DBpedia to provide a comprehensive, semantically-structured source of information.
6. PA Images Workflow: Agency/Photographers; Metadata Company; Captioners; Website. Provides minimum metadata in IPTC. Images with metadata passed to Captioners for batch processing. Modifies existing and adds new metadata. Information Extraction; Storage & Browsing; Semantic structure. (Slide sections: Background, Semantic Web project, Knowledge base, Conclusions)
8. Text Mining System Overview: Images with captions; GATE-based IE System: Gazetteer (known entities), JAPE Grammar (context rules), Disambiguation/Summarisation; Entities of interest; Annotated image captions; PA KB; Linked Data Cloud; What to store; What to extract; Confirmation; Captions; Learned Facts; Schema; PA Images view; PA Images ontology
9. PA Images Ontology (OWL)
14. Linked Data cloud 31/03/2008
21. Ontology Mapping: map the ontology and the data will follow. Linked Data Cloud (DBpedia, YAGO, Geonames, ...) linked via sameAs to the PA Images Ontology; Knowledge base/data for our ontology; Similar Entities & Their Features
Editor's notes
In terms of data quality, we found the following limitations of the DBpedia knowledge base. First, DBpedia is less formally structured and is governed by a number of ontologies, so retrieving a particular class of entity requires joining several of them. For example, a comprehensive list of footballers can only be retrieved by combining the YAGO, DBpedia and SKOS ontologies. Second, the data quality falls short of our expectations because there are considerable inconsistencies within DBpedia. For example, some object properties link to temporal templates instead of to other entities, and some entities are incorrectly classified: some bands are classified as persons. In addition to these shortcomings, we have our own view of the world and define concepts differently in the PA Images ontology. As suggested by the DBpedia authors [9], one way to combine the advantages of both worlds is to interlink DBpedia with hand-crafted ontologies, which enables applications to use the formal knowledge from these ontologies together with the instance data from DBpedia.
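The two issues described above, coverage spread across several ontologies and inconsistent typing, can be illustrated on a toy data set. The following is only a sketch in plain Python: every identifier (the `ex:` entities, the prefixed class names, the category name) is made up for illustration and is not taken from DBpedia or from the PA system.

```python
# Toy triples (subject, predicate, object). All names are hypothetical
# stand-ins, not real DBpedia, YAGO or SKOS data.
TYPE = "rdf:type"
SUBJECT = "skos:subject"

TRIPLES = {
    ("ex:PlayerA", TYPE, "dbo:SoccerPlayer"),
    ("ex:PlayerA", TYPE, "yago:FootballPlayer"),
    ("ex:PlayerB", TYPE, "yago:FootballPlayer"),
    ("ex:PlayerC", SUBJECT, "category:Footballers"),
    ("ex:BandX", TYPE, "dbo:Band"),
    ("ex:BandX", TYPE, "dbo:Person"),
}


def entities_matching(triples, patterns):
    """Return the subjects matching any (predicate, object) pattern.

    The result is a set, so an entity found through several ontologies
    is listed only once.
    """
    return {s for (s, p, o) in triples if (p, o) in patterns}


def conflicting_types(triples, class_a, class_b):
    """Return entities typed as both class_a and class_b."""
    typed_a = entities_matching(triples, {(TYPE, class_a)})
    typed_b = entities_matching(triples, {(TYPE, class_b)})
    return typed_a & typed_b


# Coverage: no single source lists all three players.
footballers = entities_matching(
    TRIPLES,
    {
        (TYPE, "dbo:SoccerPlayer"),
        (TYPE, "yago:FootballPlayer"),
        (SUBJECT, "category:Footballers"),
    },
)

# Consistency: flag bands that are also typed as persons for review.
suspect = conflicting_types(TRIPLES, "dbo:Band", "dbo:Person")
```

A check of this kind only flags candidates; deciding which type is correct would still need the hand-crafted ontology or manual review.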
The required accuracy needs to be close to 100%. As mentioned earlier, the coverage of data in DBpedia is richer when multiple ontologies are used, which requires mapping one ontology to many in such a way that the coverage benefits are retained and redundancy is countered. We know of no automatic ontology mapping approach that fulfils these criteria. We have successfully used SPARQL CONSTRUCT [17] queries to map between the PA Images and DBpedia ontologies, to extract entities from the DBpedia KB, and to generate a clean, contextualised PA KB.
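As a rough illustration of what a one-to-many mapping with SPARQL CONSTRUCT might look like, the sketch below builds a query string that retypes entities from several source classes as a single target class. It is not the queries used in the project: the `pa:` namespace URI, the `Footballer` class name and the second source class URI are assumptions made for this example. Only `http://dbpedia.org/ontology/SoccerPlayer` is a real DBpedia ontology class.

```python
# Hypothetical namespace for the PA Images ontology; the real URI is
# not given in the source.
PA_NS = "http://example.org/pa-images#"


def build_construct_query(target_class, source_classes):
    """Build a SPARQL CONSTRUCT query mapping many source classes to one
    target class in the (hypothetical) pa: namespace."""
    if not source_classes:
        raise ValueError("at least one source class is required")
    # One UNION branch per source class: the one-to-many mapping.
    branches = "\n  UNION\n".join(
        f"  {{ ?entity rdf:type <{uri}> . }}" for uri in source_classes
    )
    lines = [
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        f"PREFIX pa: <{PA_NS}>",
        "CONSTRUCT {",
        f"  ?entity rdf:type pa:{target_class} ;",
        "          rdfs:label ?label .",
        "}",
        "WHERE {",
        branches,
        "  ?entity rdfs:label ?label .",
        '  FILTER (lang(?label) = "en")',
        "}",
    ]
    return "\n".join(lines)


query = build_construct_query(
    "Footballer",
    [
        "http://dbpedia.org/ontology/SoccerPlayer",
        # Placeholder for a second source class (for example a YAGO class).
        "http://example.org/yago/FootballPlayer",
    ],
)
```

Because the result of a CONSTRUCT query is an RDF graph, which is a set of triples, an entity matched by more than one UNION branch still contributes each output triple only once. That is one way the redundancy between source ontologies can be countered while keeping their combined coverage.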