Linkator: enriching web pages by automatically adding dereferenceable semantic annotations

Samur Araujo, Geert-Jan Houben, Daniel Schwabe. Web Information Systems, Delft University of Technology, the Netherlands

Summary – dereferencing semantic annotations What is dereferencing semantic annotations about? Automatically linking web pages. Summary: an overview of the problem and motivation, our approach to solving the problem, and one example of use.

Motivation Links between HTML pages are the main mechanism for navigating the web. However, many pages are unlinked or poorly linked. Terms on pages have meaning and are intrinsically associated with concepts or entities that the user is interested in. These terms can be interpreted by machines and automatically linked to relevant resources on the web.

Problem Statement The problem of automatic linking can be divided into three sub-problems: (1) How to identify candidate terms (anchors) for adding links? Candidate terms denote concepts in which the user is interested. (2) Which concept does a candidate term represent? This requires disambiguating the candidate term. (3) How to identify a web resource to be the link target? This requires selecting a source of data in which to find the destination of the link.

State-of-the-Art in Automatic Linking Candidate terms: existing work focuses on term disambiguation using an auxiliary knowledge base or dictionaries (e.g. Wikipedia and WordNet). Link target: it is selected from a specific knowledge base [1] or from a collection of target documents [2]. Limitations: these approaches do not adequately support users interested in a broader range of domains.
[1] Mihalcea, R. and Csomai, A. Wikify!: Linking Documents to Encyclopedic Knowledge. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM 07), Lisbon, Portugal, pp. 233-242, 2007.
[2] Gardner, J. J., Krowne, A. and Xiong, L. NNexus: An Automatic Linker for Collaborative Web-Based Corpora. IEEE Trans. Knowl. Data Eng. 21(6), pp. 829-839, 2009.

Linkator Approach Linkator performs three steps: extract terms from web pages, associate terms with concepts, and find resources that represent these concepts. Components shown in the diagram: Core Linkator, Information Extraction Engine, Semantic Annotator.

Workflow (diagram). Page accessed: the page is accessed, terms are extracted, the page is semantically annotated, and semantic links are created. Link clicked: the annotation is extracted from the annotated page, an endpoint is chosen, a query is formulated, and a resource is searched for. The diagram also contains an "if not found" branch.

Linkator Approach (architecture diagram). Components shown: Web Browser; Linkator Client (Firefox plugin); Annotator; RDFa Annotator; Information Extraction Engine; Linkator Server; Endpoint Resolution; SPARQL Query Formulation; Linked Data. The connections in the diagram are labelled HTTP.

Semantic Link – Definition A semantic link is an HTML A (anchor) tag that is semantically annotated with RDFa, so it has RDF triples associated with it. Clicking a semantic link triggers a query over Linked Data.
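As an illustration of this definition, the following is a minimal sketch of how an anchor carrying RDFa attributes might be generated. The transcript does not show Linkator's actual markup, so the attribute choice (`about`, `typeof`, `property`, `content`), the `foaf:name` property, and the function name `semantic_link` are assumptions made for this example. The `swrc` and `foaf` prefixes would also have to be declared on an enclosing element.

```python
from html import escape


def semantic_link(label, subject, rdf_type, prop):
    """Return an <a> element carrying RDFa attributes (hypothetical markup)."""
    attrs = {
        "about": subject,    # subject of the triples
        "typeof": rdf_type,  # encodes: subject rdf:type rdf_type
        "property": prop,    # encodes: subject prop "label"
        "content": label,    # literal value, stated explicitly
    }
    rendered = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    return f"<a {rendered}>{escape(label)}</a>"


# Assumed example values: an author name taken from the citation on a later slide.
link = semantic_link("Geert-Jan Houben", "#author-2", "swrc:Author", "foaf:name")
print(link)
```

The anchor deliberately has no `href` here, on the assumption that the destination is only resolved once the link is clicked.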

Semantic Links (figure): RDF triples associated with the semantic link.

Dereferencing Semantic Links Linkator uses the Linked Data cloud to discover a destination for the semantic link, as opposed to querying search engines or a fixed knowledge base. This relies on two algorithms: one for endpoint resolution and one for query formulation.

Endpoint Resolution Linkator selects available endpoints based on the vocabularies used in the semantic links. The endpoints are described with voiD (Vocabulary of Interlinked Datasets).

Endpoint Resolution Select the vocabularies of all RDF types associated with the annotation. If no endpoint is found for them, select the vocabularies of all predicates associated with the annotation instead.

Endpoint Resolution The SelectEndpoint function finds the resource: http://ontoware.org/swrc/swrc_v0.3.owl#Author It extracts the vocabulary associated with this resource: http://ontoware.org/swrc/swrc_v0.3.owl It then queries the voiD descriptors of the available SPARQL endpoints, looking for that vocabulary.
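The voiD lookup described above could be expressed as a SPARQL query over the dataset descriptions. The sketch below only builds the query text. It assumes the descriptors use the standard voiD terms `void:vocabulary` and `void:sparqlEndpoint`; the function name `void_lookup_query` is hypothetical, and the slides do not show how Linkator actually performs this lookup.

```python
def void_lookup_query(vocabulary_uri):
    """Build a SPARQL query listing endpoints whose voiD description
    declares the given vocabulary (assumed descriptor layout)."""
    # Reject characters that would break out of the <...> IRI syntax.
    if any(ch in vocabulary_uri for ch in '<>" \n'):
        raise ValueError("not a usable IRI: %r" % vocabulary_uri)
    return (
        "PREFIX void: <http://rdfs.org/ns/void#>\n"
        "SELECT DISTINCT ?endpoint WHERE {\n"
        "  ?dataset a void:Dataset ;\n"
        f"           void:vocabulary <{vocabulary_uri}> ;\n"
        "           void:sparqlEndpoint ?endpoint .\n"
        "}"
    )


# Vocabulary taken from the slide's example.
print(void_lookup_query("http://ontoware.org/swrc/swrc_v0.3.owl"))
```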

Query Formulation The query is based on the object of the triple. It tries to find a human-readable representation of the resource, i.e., it tries to match predicates such as foaf:homepage, akt:has-web-address and rdfs:seeAlso.
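One way such a query might be assembled is sketched below. This is a hedged illustration, not Linkator's actual query: the query shape (matching the object as a plain literal and taking the union of the three predicates), the function name `formulate_query`, and the `akt` namespace IRI are assumptions.

```python
# Namespace IRIs; the akt one is assumed to be the AKT portal ontology.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "akt": "http://www.aktors.org/ontology/portal#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
# Predicates named on the slide as pointing to human-readable pages.
PAGE_PREDICATES = ["foaf:homepage", "akt:has-web-address", "rdfs:seeAlso"]


def formulate_query(object_literal):
    """Build a SPARQL query for a page about the resource that has
    `object_literal` as the object of some triple."""
    # Only backslashes and double quotes are escaped in this sketch.
    literal = object_literal.replace("\\", "\\\\").replace('"', '\\"')
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()]
    branches = " UNION ".join(
        f"{{ ?resource {pred} ?page }}" for pred in PAGE_PREDICATES
    )
    body = [
        "SELECT DISTINCT ?page WHERE {",
        f'  ?resource ?anyPredicate "{literal}" .',
        f"  {branches}",
        "}",
    ]
    return "\n".join(prefix_lines + body)


print(formulate_query("Geert-Jan Houben"))
```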

Proof of Concept Semantic links for pages that contain bibliographic citations. Uses an extended version of the FreeCite parsing engine. Example of a bibliographic citation: Kees van der Sluijs, Geert-Jan Houben, Erwin Leonardi, Jan Hidders. Hera: Engineering Web Applications Using Semantic Web-Based Models. Book chapter: Semantic Web Information Management: A Model-Based Perspective, De Virgilio, Roberto; Giunchiglia, Fausto; Tanca, Letizia (Eds.), Chapter 22, 2010, Springer.

Linkator (pipeline diagram). Linkator extracts terms from web pages, associates terms with concepts, and finds resources that represent these concepts. Information Extraction Engine (FreeCite Extraction Engine): HTML page, markup removed, plain text, entity extraction. Semantic Annotator: text semantically annotated, semantic annotation, annotations inserted on the page, HTML page semantically annotated. Core Linkator, once a semantic link is clicked: SPARQL endpoint discovery and selection, endpoint querying, URL generation.

Example – HTML Page without Links

Example – Page annotated with RDFa

Example – Page with Semantic Links

Conclusion and Future Work For the specific scenario of linking bibliographic citations, Linkator provides a reasonable solution. More generally, a composition of Semantic Web technologies can provide a reasonable solution to the problem of automatic linking. Linkator is a concrete application that uses Semantic Web technologies. Future work: use Linkator in a broader scenario, enhance the Linkator algorithms, and evaluate the precision and recall of the linking.

Questions? Thank you for your attention! Samur Araujo s.f.cardosodearaujo@tudelft.nl You can download Linkator at: http://www.wis.ewi.tudelft.nl/

Annotations on the page are used to find the link destination. (Diagram: HTML Page, page is annotated, Annotated HTML Page, link is clicked, RDF.)

State-of-the-Art in Automatic Linking Examples: Wikify! [1] focuses on linking keywords on web pages to Wikipedia articles. NNexus [2] focuses on linking keywords obtained from an index extracted from the target documents.

Endpoint Resolution

FUNCTION SelectEndpoint
    E := Array
    R := select all rdf:type objects associated to the semantic link
    T := ExtractVocabulary(R)
    FOR EACH vocabulary in T DO {
        E.add (select endpoints that contain this vocabulary)
    }
    IF E = Empty {
        R := select all predicates associated to the semantic link
        T := ExtractVocabulary(R)
        FOR EACH vocabulary in T DO {
            E.add (select endpoints that contain this vocabulary)
        }
    }
    RETURN E

FUNCTION ExtractVocabulary(R)
    V := Array
    FOR EACH resource in R DO {
        V.add (extract the vocabulary from the resource)
    }
    RETURN V
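The pseudocode above can be sketched in runnable form as follows. Several details are assumptions made for the sketch: the semantic link is represented as a list of (subject, predicate, object) URI tuples, the voiD descriptors are replaced by an in-memory `registry` mapping each endpoint URL to the set of vocabularies it declares, and a vocabulary is taken to be the URI up to the `#` (or up to the last `/`). The endpoint URLs in the usage example are placeholders.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def extract_vocabulary(resources):
    """Return the distinct vocabulary URIs of the given resource URIs."""
    vocabularies = []
    for uri in resources:
        if "#" in uri:
            vocab = uri.split("#", 1)[0]        # hash namespace
        else:
            vocab = uri.rsplit("/", 1)[0] + "/"  # slash namespace
        if vocab not in vocabularies:
            vocabularies.append(vocab)
    return vocabularies


def endpoints_for(vocabularies, registry):
    """Select endpoints whose description declares any of the vocabularies."""
    found = []
    for vocab in vocabularies:
        for endpoint, declared in registry.items():
            if vocab in declared and endpoint not in found:
                found.append(endpoint)
    return found


def select_endpoint(triples, registry):
    """Try the vocabularies of the rdf:type objects first, then fall back
    to the vocabularies of all predicates."""
    types = [o for (_, p, o) in triples if p == RDF_TYPE]
    endpoints = endpoints_for(extract_vocabulary(types), registry)
    if not endpoints:
        predicates = [p for (_, p, _) in triples]
        endpoints = endpoints_for(extract_vocabulary(predicates), registry)
    return endpoints


# Usage with placeholder endpoints and the resource from the earlier slide.
registry = {
    "http://example.org/swrc/sparql": {"http://ontoware.org/swrc/swrc_v0.3.owl"},
    "http://example.org/foaf/sparql": {"http://xmlns.com/foaf/0.1/"},
}
triples = [
    ("#author-2", RDF_TYPE, "http://ontoware.org/swrc/swrc_v0.3.owl#Author"),
    ("#author-2", "http://xmlns.com/foaf/0.1/name", "Geert-Jan Houben"),
]
print(select_endpoint(triples, registry))
```

In this usage example the rdf:type vocabulary already matches an endpoint, so the predicate fallback is not reached.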

Semantic Link – Example (figure): triples associated with the semantic link.
