In this paper, we introduce Linkator, an application architecture that
exploits semantic annotations for automatically adding links to previously
generated web pages. Linkator provides a mechanism for dereferencing these
semantic annotations with what it calls semantic links. Automatically adding
links to web pages improves users’ navigation: it connects the visited page
with external sources of information that the user may be interested in but that
were not identified as such during the web page design phase. The process of
auto-linking comprises two stages: finding the terms to be linked and finding the
destination of each link. Linkator delegates the first stage to external semantic
annotation tools and concentrates on the process of finding a relevant
resource to link to. In this paper, a use case is presented that shows how this
mechanism can support knowledge workers in finding publications during their
navigation on the web.
Linkator: enriching web pages by automatically adding dereferenceable semantic annotations
1. Linkator: enriching web pages by automatically adding dereferenceable semantic annotations. Samur Araujo, Geert-Jan Houben, Daniel Schwabe. Web Information Systems, Delft University of Technology, the Netherlands
2. Summary – dereferencing semantic annotations What is dereferencing semantic annotations about? Automatically linking web pages. Summary: Overview of the problem and motivation. Our approach to solving the problem. One example of use.
3. Motivation Links between HTML pages are the main mechanism for navigating the web. However, many pages are unlinked or poorly linked. Terms on pages have meaning and are intrinsically associated with concepts or entities that the user is interested in. These terms can be interpreted by machines and automatically linked to relevant resources on the web.
5. Problem Statement The problem of automatic linking can be divided into three sub-problems: (1) How to identify candidate terms (anchors) for adding links? These terms denote concepts in which the user is interested. (2) Which concept does a candidate term represent? The candidate term must be disambiguated. (3) How to identify a web resource to be the link target? That is, how to select a source of data for finding the destination of the link?
6. State-of-the-Art in Automatic Linking Candidate Terms: Existing work focuses on term disambiguation using an auxiliary knowledge base or dictionaries (e.g. Wikipedia and WordNet). Link Target: It is selected from a specific knowledge base [1] or from a collection [2] of target documents. Limitations: Existing approaches do not adequately support users interested in a broader range of domains. [1] Mihalcea, R. and Csomai, A. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM 07), Lisbon, Portugal, pp. 233-242, 2007. [2] Gardner, J. J., Krowne, A. and Xiong, L. NNexus: An Automatic Linker for Collaborative Web-Based Corpora. IEEE Trans. Knowl. Data Eng. 21(6), pp. 829-839, 2009.
7. Linkator Approach (diagram labels) Linkator steps: Extract Terms from Web Pages; Associate Terms to Concepts; Find Resources that Represent these Concepts. Components: Information Extraction Engine; Semantic Annotator; Core Linkator.
8. Process flow (diagram labels) Link Clicked; Page Accessed; Page is accessed; Annotated page; Terms are extracted; Annotation is extracted; Page is semantically annotated; Endpoint is chosen; Semantic Links created; Query is formulated; If not found; Search for a resource.
9. Linkator Approach (architecture diagram labels) Web Browser; Linkator Client (Firefox Plugin); Annotator; RDFa Annotator; Information Extraction Engine; HTTP; Linkator Server; Linked Data; Endpoint Resolution; SPARQL Query Formulation.
10. Semantic Link – Definition A semantic link is an HTML A (anchor) element that is semantically annotated with RDFa. It has RDF triples associated with it. Clicking a semantic link triggers a query over Linked Data.
12. Dereferencing Semantic Links Linkator uses the Linked Data cloud for discovering a destination for the semantic link, as opposed to querying search engines or a fixed knowledge base. Algorithm for Endpoint Resolution. Algorithm for Query Formulation.
14. Endpoint Resolution Select the vocabularies of all RDF types associated with the annotation. If no endpoint is found for them, select the vocabularies of all predicates associated with the annotation.
15. Endpoint Resolution The SelectEndpoint function finds the resource: http://ontoware.org/swrc/swrc_v0.3.owl#Author It extracts the vocabulary associated with this resource: http://ontoware.org/swrc/swrc_v0.3.owl It queries the voiD descriptors of the available SPARQL endpoints, looking for such a vocabulary.
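The voiD lookup described on this slide could be expressed as a SPARQL query over a store of voiD descriptors. The following is only a sketch under assumptions: the function name void_lookup_query is hypothetical, and the slides do not say how Linkator actually stores or queries the descriptors. It relies only on the standard voiD terms void:Dataset, void:vocabulary and void:sparqlEndpoint.

```python
VOID_NS = "http://rdfs.org/ns/void#"


def void_lookup_query(vocabulary_uri: str) -> str:
    """Build a SPARQL query that lists endpoints whose voiD description
    declares the given vocabulary (hypothetical helper, not Linkator's API)."""
    return (
        f"PREFIX void: <{VOID_NS}>\n"
        "SELECT DISTINCT ?endpoint WHERE {\n"
        "  ?dataset a void:Dataset ;\n"
        f"           void:vocabulary <{vocabulary_uri}> ;\n"
        "           void:sparqlEndpoint ?endpoint .\n"
        "}"
    )


# Vocabulary taken from the slide's example resource.
query = void_lookup_query("http://ontoware.org/swrc/swrc_v0.3.owl")
print(query)
```

Running the resulting query against a collection of voiD descriptions would return candidate endpoints for the SWRC vocabulary, assuming those descriptions declare void:vocabulary at all.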
16. Query Formulation Query is based on the object of the triple. Try to find a human-readable representation of the resource, i.e., try to match predicates such as: foaf:homepage, akt:has-web-address, rdfs:seeAlso.
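One possible reading of this query formulation step is sketched below. It assumes that the object of the annotation triple is a plain literal (for example an author name) and that the target is any value reachable through one of the three predicates named on the slide. The function names, the VALUES-based query shape and the akt namespace URI are assumptions made for illustration, not details confirmed by the slides.

```python
# Predicates from the slide, expanded to full URIs.
# The akt namespace below is an assumption about which AKT ontology is meant.
TARGET_PREDICATES = [
    "http://xmlns.com/foaf/0.1/homepage",
    "http://www.aktors.org/ontology/portal#has-web-address",
    "http://www.w3.org/2000/01/rdf-schema#seeAlso",
]


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for use in a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def formulate_query(object_literal: str, limit: int = 1) -> str:
    """Build a query that finds a resource described by the literal and
    returns a human-readable target for it (hypothetical sketch)."""
    values = " ".join(f"<{p}>" for p in TARGET_PREDICATES)
    return (
        "SELECT DISTINCT ?target WHERE {\n"
        f"  VALUES ?link {{ {values} }}\n"
        f'  ?resource ?anyPredicate "{escape_literal(object_literal)}" .\n'
        "  ?resource ?link ?target .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = formulate_query("Geert-Jan Houben")
print(query)
```

Matching a plain literal exactly is a deliberate simplification: literals with language tags or datatypes would not match, so a real implementation would likely need a looser comparison.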
17. Proof of Concept Semantic links for pages that contain bibliographic citations. Extended version of the FreeCite parsing engine. Example of bibliographic citation: Kees van der Sluijs, Geert-Jan Houben, Erwin Leonardi, Jan Hidders. Hera: Engineering Web Applications Using Semantic Web-Based Models. Book chapter: Semantic Web Information Management: A Model-Based Perspective, De Virgilio, Roberto; Giunchiglia, Fausto; Tanca, Letizia (Eds.), Chapter 22, 2010, Springer.
18. Linkator (diagram labels) Linkator steps: Extract Terms from Web Pages; Associate Terms to Concepts; Find Resources that Represent these Concepts. Components: Information Extraction Engine; Semantic Annotator; Core Linkator; FreeCite Extraction Engine. Flow labels: HTML Page; Markup Removed; Plain Text; Entity Extraction; Semantic Annotation; Text Semantically Annotated; Insert annotations on the page; HTML Page Semantically Annotated; Semantic link clicked; SPARQL Endpoint Discovery and Selection; Endpoint Querying; URL Generation.
25. Conclusion and Future Work For the specific scenario of linking bibliographic citations, Linkator provides a reasonable solution. Combining Semantic Web technologies can provide a reasonable solution to the problem of automatic linking. Linkator is a concrete application that uses Semantic Web technologies. Future Work: Use Linkator in a broader scenario. Enhance the Linkator algorithms. Evaluate the precision and recall of the linking.
26. Questions? Thank you for your attention! Samur Araujo s.f.cardosodearaujo@tudelft.nl You can download Linkator at: http://www.wis.ewi.tudelft.nl/
27. Annotations on the page are used to find the link destination (diagram labels) HTML Page; Page is annotated; Annotated HTML Page; Link is clicked; RDF.
28. State-of-the-Art in Automatic Linking Example: Wikify! [1] focuses on linking keywords on web pages to Wikipedia articles. NNexus [2] focuses on linking keywords obtained from an index extracted from target documents.
29. Endpoint Resolution
FUNCTION SelectEndpoint
    E := Array
    R := select all rdf:type objects associated to the semantic link
    T := ExtractVocabulary(R)
    FOR EACH vocabulary in T DO {
        E.add(select endpoints that contain this vocabulary)
    }
    IF E = Empty {
        R := select all predicates associated to the semantic link
        T := ExtractVocabulary(R)
        FOR EACH vocabulary in T DO {
            E.add(select endpoints that contain this vocabulary)
        }
    }
    RETURN E

FUNCTION ExtractVocabulary(R)
    V := Array
    FOR EACH resource in R DO {
        V.add(extract the vocabulary from the resource)
    }
    RETURN V
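The SelectEndpoint pseudocode can be sketched in Python as follows. This is an illustrative reading, not the Linkator implementation: the in-memory registry (a mapping from endpoint URL to the set of vocabularies it declares) stands in for the voiD descriptors, the endpoint URL is invented, and stripping the fragment or last path segment is only one plausible way to "extract the vocabulary" from a resource URI.

```python
from typing import Dict, Iterable, List, Set


def extract_vocabulary(resources: Iterable[str]) -> List[str]:
    """Return the distinct vocabulary URIs of the given resource URIs.

    Assumption: the vocabulary is the URI without its fragment, or without
    its last path segment when there is no fragment.
    """
    vocabularies: List[str] = []
    for uri in resources:
        if "#" in uri:
            vocabulary = uri.split("#", 1)[0]
        else:
            vocabulary = uri.rsplit("/", 1)[0] + "/"
        if vocabulary not in vocabularies:
            vocabularies.append(vocabulary)
    return vocabularies


def endpoints_for(vocabularies: List[str], registry: Dict[str, Set[str]]) -> List[str]:
    """Select endpoints whose declared vocabularies include any of the given ones."""
    return [
        endpoint
        for endpoint, declared in registry.items()
        if any(vocabulary in declared for vocabulary in vocabularies)
    ]


def select_endpoints(
    rdf_types: Iterable[str],
    predicates: Iterable[str],
    registry: Dict[str, Set[str]],
) -> List[str]:
    """Try the vocabularies of the rdf:type objects first, then fall back
    to the vocabularies of the predicates, as in the pseudocode."""
    endpoints = endpoints_for(extract_vocabulary(rdf_types), registry)
    if not endpoints:
        endpoints = endpoints_for(extract_vocabulary(predicates), registry)
    return endpoints


# Hypothetical registry; the endpoint URL is made up for the example.
registry = {
    "http://example.org/sparql": {"http://ontoware.org/swrc/swrc_v0.3.owl"},
}
selected = select_endpoints(
    rdf_types=["http://ontoware.org/swrc/swrc_v0.3.owl#Author"],
    predicates=["http://xmlns.com/foaf/0.1/name"],
    registry=registry,
)
print(selected)
```

Unlike the pseudocode, the sketch adds each endpoint at most once even when it matches several vocabularies, which avoids querying the same endpoint repeatedly.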
30. Semantic Link – Example Triples associated with the semantic link.
Editor's notes
I am in the starting phase of my PhD research. In this presentation, I will outline our initial vision of the research problem, which is building trust in web content, and our approach to solving it. I will also give a brief plan of my PhD research.
We focus on content trust and formulate our main research questions. The first key issue is to investigate what kinds of factors can influence trust in content. Following from the first, we also need to know how to capture and represent the information about these factors. The third key issue is how to assess or compute content trust based on the information we get from the second step. Ideally, we want to have a trust value assigned to every piece of content. Unlike the propagation of trust through a network of people, since we now have more information and semantics about the content, we want to build metrics that assess trustworthiness based on the content itself and the connections between different pieces of content, especially their semantic similarity and relations.