Pundit is a novel semantic annotation tool that allows users to create structured data while annotating Web pages, relying on stand-off mark-up techniques. Pundit supports different types of annotations, ranging from simple comments to semantic links to Web of Data entities and fine-grained cross-references and citations. In addition, it can be configured to include custom controlled vocabularies and has been designed to enable groups of users to share their annotations and collaboratively create structured knowledge. Pundit allows users to create semantically typed relations among heterogeneous resources, which may have different multimedia formats and belong to different pages and domains. In this way, annotations can reinforce existing data connections or create new ones, augmenting the original information and generating new semantically structured aggregations of knowledge. These can later be exploited both by other users, to better navigate Digital Libraries and Web content, and by applications, to improve data management.
Pundit: Semantically Structured Annotations for Web Contents and Digital Libraries
1. SDA 2012
Semantic Digital Archives
PUNDIT: SEMANTICALLY STRUCTURED
ANNOTATIONS FOR WEB CONTENTS
AND DIGITAL LIBRARIES
Marco Grassi(1), Christian Morbidoni(2), Michele Nucci(3),
Simone Fonda(4), Giovanni Ledda(5)
Semedia
(Semantic Web and Multimedia)
http://semedia.dii.univpm.it www.netseven.it/
(1,2,3,5) DII - Department of Information Engineering. Polytechnic University of Le Marche, Ancona, Italy
(4) - NET7 srl
This work is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0)
2. THE WEB SCENARIO
• Annotating web content has become a common task
• Comments and tags are widely supported by mainstream applications
• Many tools let users bookmark, highlight and comment on web page fragments
• Some tools support collaborative annotations
• Web content annotations are beneficial:
• More engaging and productive user experience
• Exploit social engagement to improve resource ranking and classification
SDA 2012 Pundit: Semantically Structured Annotations for Web Contents... m.grassi@univpm.it
3. DL SCENARIO
• Digital Libraries (DLs) are no longer simple “expositions” of digital objects but provide users with more interaction
[Figure: three models of interaction with a Digital Library, with increasing user interaction: expert model, crowdsourcing and social engagement; besides creating and consuming contents, experts and users add contents and annotations (commenting, tagging, linking)]
• Crowdsourcing experiments for enriching DLs, curating contents or uploading digital material of interest to the DL (BBC WW2 People’s War, …)
4. WHAT’S MISSING? ...
• Most existing annotation tools are limited to simple textual tags and comments
[Figure: an ambiguous tag, “Orange?”]
• This is a limitation due to the ambiguity of natural language
• Their semantics is not machine-interpretable
This limits the efficiency of resource classification and retrieval, as well as the possibility of reusing these annotations in other contexts
5. SEMANTICALLY STRUCTURED
ANNOTATIONS
• Semantically structured annotations make smart use of such added knowledge:
• Unambiguously express semantics that software agents can process:
• annotations can be harvested periodically and published back
• used by recommender systems or search engines
• ...
• Enhance Digital Library capabilities
• improving browsing
• enabling automatic content classification
• ...
• Reuse such collaborative knowledge in different contexts and by different applications
6. SEMANTICALLY STRUCTURED
ANNOTATIONS
Users should be able to create knowledge graphs where web content fragments, concepts and entities are meaningfully connected.
7. SEMANTICALLY STRUCTURED
ANNOTATIONS
• Rely on controlled vocabularies and ontologies
• share the same terminology and “talk about the same things”
• annotations can be meaningfully mashed up
• Link to the emerging Web of Data
• software can automatically retrieve additional, useful semantic data (e.g. date and place of birth, pictures, citations, multi-language data)
Augmenting the information of the annotation and of the original content to support smarter application behaviors!
Example: we have discovered that the two images contain American film actors showing anger!
8. Pundit is a novel semantic annotation tool:
• developed by: Semedia (Semantic Web and Multimedia), http://semedia.dii.univpm.it, in collaboration with NET7
• funded by: Semlib Project (EU Project)
• supported and further developed in: DM2E EU Project (http://dm2e.edu/) and AGORA EU Project (http://project-agora.eu/)
9. SEMLIB PROJECT
Semlib Project
Semantic Web Tools for DL
http://www.semlibproject.eu/
• R&D project supported by EU FP7 Theme: Research for SMEs (no. FP7-SME -2010-01- 262301 -
SEMLIB)
• 24 months (commenced in January 2011, currently at month 19)
www.semedia.dii.univpm.it/ www.deri.ie/
www.in-two.com www.liberologico.com/ www.knowledgehives.com/ www.netseven.it/
10. ANNOTATION MODEL
• Based on the Open Annotation Collaboration (OAC) ontology (work is in progress to provide full compliance with Open Annotation, OA)
[Figure: the annotation model in layers: Contextual Information, Annotation Content, and Semantically Structured Content stored as a Named Graph, with SPARQL support to query slices of knowledge]
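The layered model can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Pundit's implementation: the contextual information and the free-text comment go into the default graph, while the semantically structured content is written into a named graph that the annotation points to with oac:hasBody. The OAC, Dublin Core Terms, RDF and RDFS namespace IRIs are the published ones; the `semlib` namespace IRI, the `Annotation` class and all example IRIs are hypothetical.

```python
from dataclasses import dataclass, field

# Published namespaces (OAC beta, Dublin Core Terms, RDF, RDFS).
OAC = "http://www.openannotation.org/ns/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical namespace standing in for the project's own vocabulary.
SEMLIB = "http://example.org/semlib#"


def iri(value: str) -> str:
    """Render an IRI term in N-Quads syntax."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Render a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


@dataclass
class Annotation:
    """Hypothetical container mirroring the layers on the slide."""
    uri: str
    graph_uri: str                     # named graph holding the body
    creator: str                       # contextual information
    created: str                       # plain literal here; a store would use xsd:dateTime
    target: str
    comment: str = ""                  # annotation content
    body: list = field(default_factory=list)  # structured content: (s, p, o) IRIs

    def to_nquads(self) -> list:
        a = iri(self.uri)
        # Contextual information and comment: default graph (no fourth term).
        meta = [
            (a, iri(RDF_TYPE), iri(OAC + "Annotation")),
            (a, iri(DCTERMS + "creator"), iri(self.creator)),
            (a, iri(DCTERMS + "created"), literal(self.created)),
            (a, iri(OAC + "hasTarget"), iri(self.target)),
            (a, iri(OAC + "hasBody"), iri(self.graph_uri)),
        ]
        if self.comment:
            meta.append((a, iri(RDFS + "comment"), literal(self.comment)))
        lines = [f"{s} {p} {o} ." for s, p, o in meta]
        # Body triples: each one is labelled with the annotation's named graph.
        g = iri(self.graph_uri)
        lines += [f"{iri(s)} {iri(p)} {iri(o)} {g} ." for s, p, o in self.body]
        return lines


ann = Annotation(
    uri="http://example.com/ANNOTATION-ID-1",
    graph_uri="http://example.com/ANNOTATION-GRAPH-ID-1",
    creator="http://example.com/MarcoGrassi",
    created="2011-01-27T10:30:56",
    target="http://example.com/mypage.htm#textFragment",
    comment="An example annotation showing the annotation model",
    body=[("http://example.com/mypage.htm#textFragment",
           SEMLIB + "mentionsAuthor", SEMLIB + "DanteAlighieri")],
)
print("\n".join(ann.to_nquads()))
```

Keeping each body in its own graph is what lets a store delete or re-scope one annotation's statements without touching the others.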
14. NAMED GRAPHS AS BODIES
...named graphs keep the statements belonging to different annotations separate...
[Figure: Annotation 1 (ex:ANNOTATION-ID-1, created 2011-01-27 10:30:56, “An example annotation showing the annotation model”) and Annotation 2 (ex:ANNOTATION-ID-2, created 2011-09-27 11:43:12, “Another annotation whose content can be merged with the former one”), both of type oac:Annotation and created by ex:MarcoGrassi. Each points via oac:hasBody to its own named graph (ex:ANNOTATION-GRAPH-ID-1, ex:ANNOTATION-GRAPH-ID-2) and via oac:hasTarget to text fragments and images on http://example.com/ pages; the body graphs relate them through properties such as semlib:mentionsAuthor, semlib:mentionsPeriod, semlib:talksAbout, semlib:depicts and semlib:hasSimilarContent. A second panel shows the two bodies merged into a single graph.]
...but they can also be aggregated into “composite” graphs and queried using standard SPARQL
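Aggregating separate bodies into a composite graph amounts to a set union over the selected named graphs, after which ordinary triple patterns can be matched. The Python sketch below illustrates that idea only; the quads are illustrative data loosely modelled on the figure (prefixed names kept as plain strings), and `composite` and `match` are hypothetical helpers, not a SPARQL engine.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # (subject, predicate, object, named graph)

# Illustrative quads: each annotation body lives in its own named graph.
QUADS: List[Quad] = [
    ("ex:textFragment", "semlib:mentionsAuthor", "semlib:DanteAlighieri", "ex:ANNOTATION-GRAPH-ID-1"),
    ("ex:textFragment", "semlib:mentionsPeriod", "semlib:Renaissance", "ex:ANNOTATION-GRAPH-ID-1"),
    ("ex:textFragment2", "semlib:talksAbout", "semlib:Politics", "ex:ANNOTATION-GRAPH-ID-2"),
    ("ex:img1.jpeg", "semlib:depicts", "semlib:DanteAlighieri", "ex:ANNOTATION-GRAPH-ID-2"),
]


def composite(quads: Iterable[Quad], graphs: Set[str]) -> Set[Triple]:
    """Merge the selected named graphs into one composite graph (set union)."""
    return {(s, p, o) for s, p, o, g in quads if g in graphs}


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Match one triple pattern; None plays the role of a SPARQL variable."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


# Roughly what this SPARQL query would return over the two graphs:
#   SELECT ?s ?p WHERE {
#     GRAPH ?g { ?s ?p semlib:DanteAlighieri }
#     FILTER (?g IN (ex:ANNOTATION-GRAPH-ID-1, ex:ANNOTATION-GRAPH-ID-2))
#   }
merged = composite(QUADS, {"ex:ANNOTATION-GRAPH-ID-1", "ex:ANNOTATION-GRAPH-ID-2"})
for triple in match(merged, o="semlib:DanteAlighieri"):
    print(triple)
```

Selecting a different set of graphs (for instance, only those in active notebooks) changes the composite view without rewriting any stored statement.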
16. NOTEBOOKS
• Annotations are collected in notebooks
• Users can organize their annotations
• Notebooks aggregate annotations so that they can be retrieved and queried
• Different UNIX-style read/write privileges (from private to completely public)
• Activate/deactivate a notebook to filter the public annotations, visualizing only those of interest
• Identified by a (dereferenceable) URI
[Figure: a notebook (NotebookURI) described by dcterms:creator, dcterms:created (2011-01-27 10:30:56), rdfs:label “My Example Notebook” and rdfs:comment “An Example Notebook used to show the model”]
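One way to read “UNIX-style read/write privileges” is a permission mode split into owner, group and others bits, as with file modes, minus the execute bit. The sketch below is a hypothetical model, not Pundit's access-control code: the `Notebook` class, its `group` field and the octal modes are assumptions. A mode of 0o600 is private, 0o644 is publicly readable, 0o666 is completely public.

```python
from dataclasses import dataclass

# UNIX-like permission bits, without "execute": read = 4, write = 2.
READ, WRITE = 4, 2


@dataclass
class Notebook:
    """Hypothetical notebook with an owner/group/others mode such as 0o644."""
    uri: str
    owner: str
    mode: int = 0o600                # default: private (owner read/write only)
    group: frozenset = frozenset()   # users the notebook is shared with (hypothetical)

    def _bits_for(self, user: str) -> int:
        if user == self.owner:
            return (self.mode >> 6) & 0o7
        if user in self.group:
            return (self.mode >> 3) & 0o7
        return self.mode & 0o7       # everybody else ("public")

    def can_read(self, user: str) -> bool:
        return bool(self._bits_for(user) & READ)

    def can_write(self, user: str) -> bool:
        return bool(self._bits_for(user) & WRITE)


private = Notebook("http://example.org/notebooks/1", owner="alice")
public_read = Notebook("http://example.org/notebooks/2", owner="alice", mode=0o644)
print(private.can_read("bob"), public_read.can_read("bob"), public_read.can_write("bob"))
```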
17. NOTEBOOKS
• Notebooks allow annotations to be shared
[Figure: the same notebook (NotebookURI) used by a single user, shared in a wiki or among communities, or made public, simply by sharing its notebook URI]
• Sharing a notebook is as easy as sharing its URL on the web (similar to popular file-sharing platforms)
18. NOTEBOOK MANAGEMENT
• Create new notebooks
• Set the current notebook (where annotations are written)
• Set a notebook private or public
• Activate/deactivate owned or public notebooks to filter the annotations of interest
• Share a notebook by URI
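The operations listed above can be sketched as a small piece of client-side state. This is a hypothetical illustration: `NotebookManager`, its fields and the annotation dictionaries are assumptions, not Pundit's API. Owned notebooks carry a public/private flag, one of them is the current write target, and the set of active notebooks (owned or shared by URI) decides which annotations are displayed.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class NotebookManager:
    """Hypothetical client-side bookkeeping for notebook management."""
    base_uri: str
    public: Dict[str, bool] = field(default_factory=dict)  # owned notebook URI -> public flag
    active: Set[str] = field(default_factory=set)          # notebooks whose annotations are shown
    current: Optional[str] = None                          # where new annotations are written

    def create(self, public: bool = False) -> str:
        uri = f"{self.base_uri}/{uuid.uuid4().hex}"
        self.public[uri] = public
        self.active.add(uri)
        if self.current is None:
            self.current = uri
        return uri

    def set_current(self, uri: str) -> None:
        if uri not in self.public:
            raise KeyError("only owned notebooks can be written to")
        self.current = uri

    def set_public(self, uri: str, public: bool) -> None:
        if uri not in self.public:
            raise KeyError(uri)
        self.public[uri] = public

    def activate(self, uri: str) -> None:
        # Works for owned notebooks and for public ones shared by URI.
        self.active.add(uri)

    def deactivate(self, uri: str) -> None:
        self.active.discard(uri)

    def visible(self, annotations: Iterable[dict]) -> List[dict]:
        """Keep only annotations stored in an active notebook."""
        return [a for a in annotations if a["notebook"] in self.active]


manager = NotebookManager("http://example.org/notebooks")
drafts = manager.create()
shared = manager.create(public=True)
manager.set_current(shared)
manager.deactivate(drafts)
other = "http://example.org/notebooks/someone-elses"  # a public notebook shared by URI
manager.activate(other)
annotations = [{"id": 1, "notebook": drafts}, {"id": 2, "notebook": shared},
               {"id": 3, "notebook": other}]
print([a["id"] for a in manager.visible(annotations)])
```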
19. USER AUTHENTICATION
• Authentication is based on OpenID:
• No need to store users’ credentials
• Already implemented by mainstream companies (Google, Yahoo, ...)
• Users avoid multiple registrations (a waste of time, yet another password)
• A single identity can be used across different Pundit-enabled Digital Libraries
• Adding an OpenID provider is easy and transparent to the Pundit server
20. PUNDIT ARCHITECTURE
CLIENT
• Set of JavaScript modules (Dojo Framework)
• Easily extendable
• Highly customizable
SERVER
• Open Source RESTful Web Service (Java Jersey framework)
• Cross-origin requests:
• CORS (Cross-Origin Resource Sharing)
• JSONP
• Sesame triple store
• SPARQL and inference
• Different SAILs (Storage And Inference Layers) are provided to implement different storages (BigOWLIM, MySQL, PostgreSQL, Virtuoso, ...)
• MySQL for user data
• RESTful API to edit and consume annotations
21. DIFFERENT ANNOTABLE CONTENTS
• Pundit allows the annotation of different types of contents at different levels of granularity:
• Text fragments
• Images
• Image fragments (under development)
• Videos and video fragments (experimental, in Semtube)
22. • Semantic annotation of YouTube videos (alpha stage), based on the Pundit JavaScript libraries and annotation server
http://semedia.dii.univpm.it/semtube
26. DIFFERENT TYPES OF ANNOTATIONS
Annotations with different levels of expressivity and structure
• Textual comments
• Semantic tags
• Automatically extracted from textual comments (DBpedia Spotlight)
• From popular Linked Data services (DBpedia, Freebase, WordNet, ...)
• Define your own source of named entities (SPARQL endpoint, HTTP API)
[Figure: Comment/Tag Panel]
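Extracting semantic tags from a comment with DBpedia Spotlight boils down to sending the text to its annotation service and keeping the recognized resources. The sketch below is an assumption-laden illustration, not Pundit's client code: the endpoint URL reflects DBpedia Spotlight's public REST service and should be checked before use, and `spotlight_request_url` and `semantic_tags` are hypothetical helpers. The sample reply follows Spotlight's JSON shape, where attribute names are prefixed with `@`.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

# Public DBpedia Spotlight endpoint (an assumption: check the current
# service URL before use). Send "Accept: application/json" to get JSON back.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def spotlight_request_url(text: str, confidence: float = 0.5) -> str:
    """Build the GET URL that asks Spotlight to annotate a comment."""
    return f"{SPOTLIGHT_URL}?{urlencode({'text': text, 'confidence': confidence})}"


def semantic_tags(response_json: str) -> List[Tuple[str, str]]:
    """Turn a Spotlight JSON reply into (surface form, DBpedia URI) tags, deduplicated by URI."""
    data = json.loads(response_json)
    seen, tags = set(), []
    for resource in data.get("Resources", []):   # key is absent when nothing is found
        uri = resource["@URI"]
        if uri not in seen:
            seen.add(uri)
            tags.append((resource["@surfaceForm"], uri))
    return tags


# A trimmed reply in Spotlight's JSON shape.
SAMPLE = json.dumps({
    "@text": "Dante was born in Florence.",
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Dante_Alighieri", "@surfaceForm": "Dante", "@offset": "0"},
        {"@URI": "http://dbpedia.org/resource/Florence", "@surfaceForm": "Florence", "@offset": "18"},
    ],
})
print(spotlight_request_url("Dante was born in Florence."))
print(semantic_tags(SAMPLE))
```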
27. DIFFERENT TYPES OF ANNOTATIONS
Annotations with different levels of expressivity and structure
• Textual comments
• Semantic tags
• Semantic relations
• Subject-property-object statements
• Drag&drop and suggestions
• Connect different resources (user selections, linked data entities, ...) with semantically defined properties
[Figure: Triple Composer]
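The core check behind composing a subject-property-object statement can be sketched as follows: the subject must be a resource, the property must come from a known vocabulary, and the object may be a resource or a literal. This is a hypothetical model of the idea, not the Triple Composer's code; `Item`, `compose`, the vocabulary dictionary and all IRIs are illustrative assumptions. The result is emitted as one N-Triples line.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Item:
    """Something the user can drop into a slot (hypothetical model)."""
    value: str   # an IRI, or the text of a literal
    kind: str    # "resource" (page fragment, image, linked data entity) or "literal"


def _term(item: Item) -> str:
    if item.kind == "literal":
        escaped = item.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return f"<{item.value}>"


def compose(subject: Item, predicate: str, obj: Item, vocabulary: Dict[str, str]) -> str:
    """Validate a subject-property-object statement and return it as an N-Triples line."""
    if subject.kind != "resource":
        raise ValueError("the subject must be a resource, not a literal")
    if predicate not in vocabulary:
        raise ValueError(f"unknown property: {predicate}")
    return f"{_term(subject)} <{predicate}> {_term(obj)} ."


# Hypothetical vocabulary: property IRI -> label shown to the user.
VOCAB = {"http://example.org/semlib#cites": "cites"}
quote = Item("http://example.com/page1.htm#fragment-1", "resource")
source = Item("http://example.com/page2.htm#fragment-2", "resource")
print(compose(quote, "http://example.org/semlib#cites", source, VOCAB))
```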
29. CUSTOM VOCABULARIES
• Pundit allows the use of custom vocabularies/taxonomies (and relations):
• Create a JSONP file (manually, or automatically from an ontology)
• Put it online
• Add its URL to the configuration to import and use it
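Producing such a JSONP file automatically is mostly a matter of wrapping a JSON list of terms in a callback. The slide does not document the expected file layout, so everything below is a hypothetical sketch: the callback name, the `name`/`items`/`uri`/`label`/`description` fields and the example vocabulary are assumptions. The terms could, for instance, be extracted from an ontology's class labels and comments before being passed in.

```python
import json
import re
from typing import Iterable, Tuple

# Hypothetical callback name: the real format is not documented on the slide.
DEFAULT_CALLBACK = "punditVocabulary"


def vocabulary_to_jsonp(name: str, terms: Iterable[Tuple[str, str, str]],
                        callback: str = DEFAULT_CALLBACK) -> str:
    """Serialise (uri, label, description) terms as a JSONP document."""
    payload = {
        "name": name,
        "items": [{"uri": uri, "label": label, "description": description}
                  for uri, label, description in terms],
    }
    return f"{callback}({json.dumps(payload, ensure_ascii=False)});"


def parse_jsonp(text: str) -> dict:
    """Strip the callback wrapper and decode the JSON inside."""
    match = re.fullmatch(r"\s*[\w$.]+\((.*)\);?\s*", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP document")
    return json.loads(match.group(1))


TERMS = [
    ("http://example.org/vocab#Manuscript", "Manuscript", "A handwritten document"),
    ("http://example.org/vocab#Scribe", "Scribe", "The person who copied a manuscript"),
]
document = vocabulary_to_jsonp("Example manuscript vocabulary", TERMS)
print(document)
```

Using JSONP rather than plain JSON lets a vocabulary hosted on any domain be loaded by a script tag, without requiring CORS headers on the hosting server.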
31. CROSS PAGE / DOMAIN ANNOTATIONS
• A special bookmarklet launches Pundit on any Web page to perform annotations
• Selected resources (text fragments, images, ...) on different pages and domains can be added to “My Items”, stored on the server and reused on other pages
[Figure: “Add to My Items”, then “Use in another page” to create cross-page semantic relations (e.g. “cites”)]
32. NAMED CONTENT
• DLs change over time
• Presentation can be restyled and content can be re-organized
• The same content can appear in different pages
• Some parts of the page should not be annotated (menus, ...)
• Specific markup can be added to the pages to allow Pundit to:
• identify atomic pieces of content (by means of URIs)
• attach the annotations to such contents
• avoid annotating accessory page components

<div class="pundit-content" about="http://example.org/contents/123">
  <!-- HTML goes here. -->
  <p>This is a named content and contains both text and a picture</p>
  <img src="http://example.org/pictures/picture123.png" />
  <p><em>Caption:</em> this is a caption.</p>
</div>
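Recognizing named content means finding elements marked with `class="pundit-content"` and keying their text by the `about` URI instead of by the page URL, so the same content carries the same annotations wherever it appears. Pundit's client does this in JavaScript; the Python sketch below only illustrates the idea with the standard `html.parser` module, and `NamedContentFinder` and `find_named_content` are hypothetical names. Text outside named content (menus, for example) is ignored.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NamedContentFinder(HTMLParser):
    """Collect the text of every class="pundit-content" element, keyed by its about URI."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: Dict[str, List[str]] = {}
        self._open: List[Tuple[str, Optional[str]]] = []  # (tag, about URI or None)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        about = attributes.get("about") if "pundit-content" in classes else None
        if about:
            self.chunks.setdefault(about, [])
        if tag not in VOID:
            self._open.append((tag, about))

    def handle_endtag(self, tag):
        if tag in VOID or all(t != tag for t, _ in self._open):
            return  # void element or stray end tag: nothing to close
        # Pop up to and including the matching open tag (tolerates unclosed children).
        while self._open and self._open.pop()[0] != tag:
            pass

    def handle_data(self, data):
        # Text belongs to every named content it is nested in.
        for _, about in self._open:
            if about:
                self.chunks[about].append(data)


def find_named_content(html: str) -> Dict[str, str]:
    finder = NamedContentFinder()
    finder.feed(html)
    finder.close()  # flush any buffered text
    return {uri: " ".join(" ".join(parts).split()) for uri, parts in finder.chunks.items()}


PAGE = """<div class="menu">Home | About</div>
<div class="pundit-content" about="http://example.org/contents/123">
  <!-- HTML goes here. -->
  <p>This is a named content and contains both text and a picture</p>
  <img src="http://example.org/pictures/picture123.png" />
  <p><em>Caption:</em> this is a caption.</p>
</div>"""
print(find_named_content(PAGE))
```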
34. NAMED CONTENT
[Figure: the same named content displayed in different pages]
The same content in different pages shows the same annotations!
36. CONSUMING THE ANNOTATIONS
• The Pundit server provides RESTful APIs to consume annotations
• (Public) annotations can be consumed by third-party applications
• We are currently conceiving and developing apps to display and reuse Pundit annotations
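A third-party application consuming public annotations typically fetches an RDF document over HTTP and walks its triples. The slides do not say which endpoints or serialization the Pundit API uses, so this sketch assumes RDF/JSON (subject, then predicate, then a list of typed object descriptions) and leaves the endpoint URL to the caller; `fetch`, `rdf_json_triples` and `annotation_ids` are hypothetical helpers, and `fetch` is not called in the example.

```python
import json
from typing import List, Tuple
from urllib.request import Request, urlopen

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OAC_ANNOTATION = "http://www.openannotation.org/ns/Annotation"


def fetch(url: str) -> str:
    """GET a JSON document; the endpoint URL is not given on the slides and must be supplied."""
    with urlopen(Request(url, headers={"Accept": "application/json"})) as response:
        return response.read().decode("utf-8")


def rdf_json_triples(text: str) -> List[Tuple[str, str, dict]]:
    """Flatten RDF/JSON ({subject: {predicate: [object, ...]}}) into triples."""
    data = json.loads(text)
    return [(s, p, o) for s, predicates in data.items()
            for p, objects in predicates.items() for o in objects]


def annotation_ids(triples) -> List[str]:
    """Subjects typed as oac:Annotation."""
    return sorted(s for s, p, o in triples
                  if p == RDF_TYPE and o.get("type") == "uri" and o.get("value") == OAC_ANNOTATION)


SAMPLE = json.dumps({
    "http://example.com/ANNOTATION-ID-1": {
        RDF_TYPE: [{"type": "uri", "value": OAC_ANNOTATION}],
        "http://purl.org/dc/terms/created": [{"type": "literal", "value": "2011-01-27T10:30:56"}],
    },
    "http://example.com/mypage.htm#textFragment": {
        "http://www.w3.org/2000/01/rdf-schema#label": [{"type": "literal", "value": "Dante Alighieri"}],
    },
})
print(annotation_ids(rdf_json_triples(SAMPLE)))
```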
37. ASK THE PUND
• A social web app that consumes people's annotations and lets groups of people organize them into a shared collection that tells a meaningful story
http://ask.thepund.it/
38. EDGEMAPS VISUALIZATION
• An Edgemaps graph populated with Pundit annotations
http://thepund.it/edgemaps_demo/demo.html
39. TIMELINE ANNOTATION
http://ask.thepund.it/#/timeline/31951d93
40. MORE...
• Find out and suggest more: http://thepund.it/okfest.php
...and don’t forget to leave some feedback :-)
41. DEMO TIME!
http://thepund.it
42. SDA 2012
Semantic Digital Archives
THANK YOU!
http://thepund.it
Semedia
(Semantic Web and Multimedia)
http://semedia.dii.univpm.it www.netseven.it/
Semlib Project Eu Project DM2E EU Project AGORA EU Project
http://www.semlibproject.eu/ http://dm2e.edu/ http://project-agora.eu/
This work is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0)