This document summarizes work from the INSEMTIVES project on models and methods for creating and using lightweight, structured knowledge on the semantic web. It discusses problems that affect current web annotations, such as synonymy, polysemy, and specificity gaps. The project aims to address these by developing models that enrich web annotations with semantics, together with associated services. Key challenges include finding the right level of semantic complexity for ordinary users, bootstrapping annotations algorithmically, reaching consensus on vocabularies, and keeping annotations up to date as the vocabulary evolves.
1. 5/17/2011 www.insemtives.eu 1 INSEMTIVES: Incentives for Semantics. WP 2 - Models and Methods for the Creation and Usage of Lightweight, Structured Knowledge. Pierre Andrews, Ilya Zaihrayeu (UNITN)
3. Motivation The current Web 2.0 annotation model lacks formal semantics and therefore suffers from several shortcomings, e.g.: searching for “images” should (not) return resources annotated with “picture” (synonymy problem); searching for “java” (books) should (not) return resources annotated with “java” (drink) (polysemy problem); searching for “animals” should (not) return resources annotated with “dogs” (specificity gap problem). These problems have a negative effect on the QoS for the end user (e.g., correctness, completeness). An annotation model with formal semantics can address these (and other) problems and enable new services.
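The three problems on this slide can be illustrated with a minimal Python sketch in which resources are annotated with disambiguated senses rather than raw strings. The sense identifiers, the `PARENT` relation, and the sample resources are illustrative assumptions, not part of the INSEMTIVES model or vocabulary.

```python
# Toy controlled vocabulary. All sense identifiers and relations below are
# illustrative assumptions, not taken from the INSEMTIVES project.
PARENT = {  # sense -> more general sense
    "dog#1": "animal#1",
    "cat#1": "animal#1",
}

# Each resource is annotated with senses instead of free-text tags.
ANNOTATIONS = {
    "photo1": {"image#1"},        # the user typed "picture"
    "photo2": {"image#1"},        # the user typed "image"
    "book1": {"java#language"},   # the user typed "java"
    "cafe1": {"java#coffee"},     # the user also typed "java"
    "photo3": {"dog#1"},          # the user typed "dogs"
}


def ancestors(sense):
    """Yield the sense itself, then every more general sense above it."""
    while sense is not None:
        yield sense
        sense = PARENT.get(sense)


def search(query_sense):
    """Return resources annotated with the query sense or a more specific one."""
    return sorted(
        resource
        for resource, senses in ANNOTATIONS.items()
        if any(query_sense in ancestors(s) for s in senses)
    )
```

With this toy data, `search("image#1")` returns both photos regardless of the word the annotator typed (synonymy), `search("java#language")` returns only `book1` (polysemy), and `search("animal#1")` returns `photo3` although it was annotated with the more specific sense (specificity gap).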
4. Aims and outcomes Key problem: how to enrich the Web 2.0 user-annotation-resource model with semantics and semantics-aware services that an ordinary user can comprehend and make use of? Models for annotations: structural complexity (e.g., tags vs. attributes); vocabulary support (e.g., free tags vs. controlled tags); collaboration level (e.g., single user vs. shared vocabularies). Services: annotation bootstrapping (help the user at the initial phase); vocabulary evolution (help users “talk” the same annotation language); annotation evolution (keep annotations synced with the vocabulary); semantic search; annotation linkage (enable cross-platform services). [Diagram labels: user, annotation, resource, semantics, semantics-aware services]
5. Research – key challenges What is the right level of complexity of semantics to let the ordinary user generate semantic annotations and to provide the user with useful new services? How to (semi-)automatically extract annotations from user-generated content and link them to the underlying model? How to support the users in (re)using the same semantics for annotations? How to keep semantic annotations up-to-date when the underlying semantic model changes?
6. WP 2 timeline and deliverables [Gantt chart; month axis: 0, 6, 12, 18, 24, 30, 36] Task 2.1 Designing models (UIBK): D2.1.1: State of the art and requirements from the use case partners; D2.1.2: Specification of the model. Task 2.2 Designing methods (UNITN): D2.2.1: Report on bootstrapping semantic annotations and on reaching consensus in the use of semantics; D2.2.2+D2.2.3: Report on linking semantic annotations to external sources and on keeping them up-to-date when the underlying semantic model changes. Task 2.3 Research on Information Retrieval (IR) methods for semantic content (ONTO): D2.3.1: Requirements for semantics-aware IR methods; D2.3.2: Specification for semantics-aware IR methods. D2.4: Report on the refinement of the proposed models, methods and semantic search.
7. Outcomes Semantic annotation model; annotation bootstrapping algorithm; consensus reaching algorithm; algorithm for supporting the evolution of annotations over time; algorithms for semantic search and faceted navigation; algorithms for linking semantic annotations to external sources.
8. Annotation lifecycle (D2.2.1, D2.2.2, D2.2.3) [Diagram; recoverable labels follow.] Steps: 1. Uncontrolled annotation; 2. Vocabulary evolution via consensus on the use (ontology maturing); 3. Annotation evolution. Actors: User (creator), User (annotator), User (consumer). Annotation types: bootstrapped, manually added, uncontrolled, controlled. Actions: publish, annotate, bootstrapping, consult and import, link to, search, navigate. Other elements: Resource 1, Resource 2, context, file, external sources (DBpedia, Yago, etc.). Several steps are marked “User involvement”.
19. Taxonomy [Diagram of annotation model dimensions.] Community Type: single user (private), single user (public), collective. Vocabulary Type: uncontrolled, authority file, taxonomy. Third axis (not labeled in the transcript): tags, attributes, relations.
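As a rough illustration, the dimensions named on this slide can be written down as a small Python data structure. The class and member names are hypothetical, the name `Structure` for the unlabeled third axis is an assumption, and the sketch enumerates combinations without claiming that the project considers all of them meaningful.

```python
from dataclasses import dataclass
from enum import Enum


class Structure(Enum):
    # Assumed name for the axis that is unlabeled in the transcript.
    TAGS = "tags"
    ATTRIBUTES = "attributes"
    RELATIONS = "relations"


class VocabularyType(Enum):
    UNCONTROLLED = "uncontrolled"
    AUTHORITY_FILE = "authority file"
    TAXONOMY = "taxonomy"


class CommunityType(Enum):
    SINGLE_USER_PRIVATE = "single user (private)"
    SINGLE_USER_PUBLIC = "single user (public)"
    COLLECTIVE = "collective"


@dataclass(frozen=True)
class AnnotationModel:
    """One point in the three-dimensional space of annotation models."""
    structure: Structure
    vocabulary: VocabularyType
    community: CommunityType


def all_configurations():
    """Enumerate every combination of the three dimensions."""
    return [
        AnnotationModel(s, v, c)
        for s in Structure
        for v in VocabularyType
        for c in CommunityType
    ]
```

Under these assumptions, free tagging on a typical Web 2.0 site would correspond to `AnnotationModel(Structure.TAGS, VocabularyType.UNCONTROLLED, CommunityType.SINGLE_USER_PUBLIC)`.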
20. Bootstrapping: motivation Data on the web grows very fast: 161 exabytes (10^8 TB) of information was created or replicated worldwide in 2006; 6X growth is predicted by 2010. The largest source of data is user-generated content from 4+ billion devices (cameras, phones, PCs, CCTVs), mostly multimedia, and expected to increase 50% by 2010. However, at publishing time, the metadata encoded in the local context in which the multimedia items resided is lost. Source: invited talk of Michael Brodie at VLDB 2007
27. 13 163 034 tag-photo pairs. About 49% of tags are used only once. Frequently used tags make up a small part of the total vocabulary (less than 1% of tags are used more than 512 times). [Chart axes: how many times a tag is used vs. % of total vocabulary]
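Statistics of this kind can be derived from a list of tag-photo pairs along the following lines. This is a minimal sketch: the function name `usage_stats`, the shape of the input, and the sample data are assumptions, and it does not reproduce the dataset or the figures reported on the slide.

```python
from collections import Counter


def usage_stats(pairs, threshold=512):
    """Summarize how often each tag is used in (tag, photo) pairs.

    Returns the vocabulary size, the share of tags used exactly once,
    and the share of tags used more than `threshold` times.
    """
    counts = Counter(tag for tag, _photo in pairs)
    vocabulary = len(counts)
    if vocabulary == 0:
        return {"vocabulary": 0, "used_once": 0.0, "frequent": 0.0}
    used_once = sum(1 for n in counts.values() if n == 1)
    frequent = sum(1 for n in counts.values() if n > threshold)
    return {
        "vocabulary": vocabulary,
        "used_once": used_once / vocabulary,
        "frequent": frequent / vocabulary,
    }


# Hypothetical sample data, only to show the call.
sample = [
    ("sea", "p1"), ("sea", "p2"), ("sea", "p3"),
    ("adriatic", "p1"), ("boat", "p2"), ("sunset", "p3"),
]
stats = usage_stats(sample, threshold=2)
```

For the sample above, the vocabulary has four tags, three of which are used once, so `stats["used_once"]` is 0.75 and `stats["frequent"]` is 0.25.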
32. Evolution in Time [Diagram labels: resource; Sea; Sea#1; Adriatic#1; evolve; water#2]
33. Evolution in time through clustering [Diagram labels: Adriatic#sea; Sea#water mass]
34. Outlook Further specification and refinement of models, algorithms, and semantic IR methods (to be reported in D2.4 due on M20). Know-how transfer and integration with WP3 and WP4 as well as with use case WPs. Work on the implementation of models and algorithms (part of WP3).