2. Interlinking wikis
Digital Enterprise Research Institute www.deri.ie
A large body of shared knowledge is spread across many different wiki platforms:
TWiki, DokuWiki, MoinMoin, Trac, XWiki, Atlassian Confluence
Widely used, even in the workplace...
All with different structures, platform dependent, and all disconnected...
3. Many isolated communities of users and their data
Wikis are also disconnected from other
social media websites
* Source: Pidgin Technologies, www.pidgintech.com
4. Interlinking wikis
We propose a new approach based on Linked Data principles to solve such
issues and to enable semantic search across heterogeneous wiki systems
5. Wiki Models
Several semantic models have been implemented and used within
specific semantic wiki platforms, e.g. Semantic MediaWiki.
There have also been efforts to create generic ontology models:
•WikiOnt ontology (DERI)
•WIF (Wiki Interchange Format) ontology
(Völkel, Oren - 1st Workshop on Semantic Wikis - 2006)
But they are all specific to wikis and not open to other social
websites
6. SIOC
Semantically-Interlinked Online Communities
• A project developed by DERI to semantically describe the content
and structure of community sites
• In particular the SIOC ontology is not specific to wikis and is widely
used on the Web
• It aims to create new connections between online discussion posts
and items, forums, blogs... and wikis.
• Supported by more than 50 applications and deployed on
over 400 sites,
including Drupal 7 and Yahoo! SearchMonkey
http://sioc-project.org
7. Extending the SIOC ontology
We decided to extend the SIOC ontology so that it can describe wikis,
making wikis interoperable with, and linkable to, other social objects.
First we considered the typical and relevant features of wikis in terms of
structure and social interactions.
Modeling these features using SIOC has other advantages:
• Integration with existing SIOC data, as well as interlinking with other
RDF data for advanced querying purposes;
• Ability to run the same SPARQL query to find items on a particular wiki
site or on a weblog or a forum.
8. Relevant wiki features
• Multi-authoring: multiple users edit the same content collaboratively.
This feature is modeled with the class sioc:UserAccount (a subclass of foaf:OnlineAccount),
which describes a user account on an online community site and is used as the
object of sioc:has_creator.
In this way a foaf:Person can be linked to several sioc:UserAccount instances belonging to
different wiki sites.
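The link from one person to several accounts can be sketched in a few lines of Python. This is only an illustration: triples are plain tuples, and every URI (ex:alice, wikiA:..., wikiB:...) is a hypothetical example, not data from the exporters.

```python
# Triples are (subject, predicate, object) tuples; all URIs are hypothetical.
FOAF_ACCOUNT = "foaf:account"
SIOC_HAS_CREATOR = "sioc:has_creator"

triples = [
    ("ex:alice", FOAF_ACCOUNT, "wikiA:User/alice"),
    ("ex:alice", FOAF_ACCOUNT, "wikiB:User/asmith"),
    ("wikiA:Page/Home", SIOC_HAS_CREATOR, "wikiA:User/alice"),
    ("wikiB:Page/Setup", SIOC_HAS_CREATOR, "wikiB:User/asmith"),
    ("wikiB:Page/Other", SIOC_HAS_CREATOR, "wikiB:User/bob"),
]


def accounts_of(person, graph):
    """Return every sioc:UserAccount linked to a foaf:Person."""
    return {o for s, p, o in graph if s == person and p == FOAF_ACCOUNT}


def pages_created_by(person, graph):
    """Return pages whose creator is any account of the given person."""
    accounts = accounts_of(person, graph)
    return sorted(s for s, p, o in graph
                  if p == SIOC_HAS_CREATOR and o in accounts)


print(pages_created_by("ex:alice", triples))
```

Because the person, not the account, is the entry point, pages from both hypothetical wikis are returned by a single lookup.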
10. Relevant wiki features
• Categories: sets of articles on related topics which are hierarchically
organized.
A solution is provided by the SKOS vocabulary, as it offers a way to model
hierarchical structures between various categories, as instances of skos:Concept
[Miles, Bechhofer – W3C Recommendation - 2009]
Hence we defined the sioct:Category class as a subclass of skos:Concept.
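As a rough sketch of what the hierarchy enables, the Python below walks a skos:broader chain to collect every ancestor of a category. The category names and the dictionary layout are assumptions made for the example.

```python
# child category -> parent categories (skos:broader); hypothetical data.
broader = {
    "cat:Punk_rock": ["cat:Rock_music"],
    "cat:Rock_music": ["cat:Music"],
    "cat:Music": [],
}


def ancestors(category, broader):
    """Collect all categories reachable through skos:broader links."""
    seen = []
    stack = list(broader.get(category, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # guard against cycles and repeats
            seen.append(parent)
            stack.extend(broader.get(parent, []))
    return seen


print(ancestors("cat:Punk_rock", broader))
```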
11. Relevant wiki features
• Social tagging: an unstructured but dynamic way of organizing content.
The properties sioc:topic (using URIs) and dc:subject (using keywords) can be
used to represent tags related to a particular wiki page.
http://wiki.../The_Clash sioc:topic http://wiki.../punk_rock
dc:subject tag:hasTag
Punk rock
12. Relevant wiki features
• Discussions: pages where people can discuss the subject of the article.
We added a new sioc:has_discussion property, with domain sioc:Item and an open
range (to make the property reusable).
13. Relevant wiki features
• Backlinks (or “what links here”): internal wiki links that point to a given
wiki article.
We modeled this feature using the existing sioc:links_to property
(a subproperty of dcterms:references).
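Since backlinks are simply sioc:links_to statements read in the opposite direction, they can be derived by inverting the link pairs. The sketch below assumes the links are available as (source, target) tuples; the page URIs are hypothetical.

```python
from collections import defaultdict

# (source page, target page) pairs, i.e. "source sioc:links_to target".
links_to = [
    ("wiki:A", "wiki:C"),
    ("wiki:B", "wiki:C"),
    ("wiki:C", "wiki:A"),
]


def backlinks(pairs):
    """Invert links_to pairs into a target -> set of sources index."""
    index = defaultdict(set)
    for source, target in pairs:
        index[target].add(source)
    return index


what_links_here = backlinks(links_to)
print(sorted(what_links_here["wiki:C"]))
```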
14. Relevant wiki features
• Page versioning: each page has an associated page history.
To define an essential and lightweight model we:
• Added a sioc:latest_version property;
• Added two transitive (OWL) properties: sioc:earlier_version and sioc:later_version;
• Defined sioc:later_version as the inverse property of sioc:earlier_version;
• Defined sioc:next_version as a subproperty of sioc:later_version, and
sioc:previous_version as a subproperty of sioc:earlier_version.
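The relationship between the direct and the transitive properties can be illustrated with a short Python sketch: following sioc:next_version step by step yields everything a reasoner would infer as sioc:later_version. The version URIs and the dictionary representation are assumptions for this example only.

```python
# version -> its direct successor (sioc:next_version); hypothetical URIs.
next_version = {
    "page?v=1": "page?v=2",
    "page?v=2": "page?v=3",
}


def later_versions(version, next_version):
    """Follow next_version links to list all later versions in order."""
    result = []
    current = next_version.get(version)
    while current is not None and current not in result:
        result.append(current)
        current = next_version.get(current)
    return result


def latest_version(version, next_version):
    """Return the last version in the chain, or the version itself."""
    later = later_versions(version, next_version)
    return later[-1] if later else version


print(later_versions("page?v=1", next_version))
print(latest_version("page?v=1", next_version))
```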
15. SIOC-MediaWiki Exporter
An exporter that exposes data from a popular wiki platform in RDF using our
proposed model.
A web service, written in PHP, that exports a MediaWiki article as RDF. It is publicly
available at:
http://ws.sioc-project.org/mediawiki/
17. Browsing the generated data
RDF data extracted from a wiki page is browsable with tools such as
the Tabulator.
To offer a better browsing experience and ease the process of
crawling SIOC exports of MediaWiki instances, the web service
automatically produces rdfs:seeAlso links between wiki pages,
following Linked Data principles.
A link to the corresponding DBpedia resource is added automatically (with
foaf:primaryTopic) if the article comes from the English Wikipedia.
An RDF crawler can follow the seeAlso links found in every
document and continue to crawl, so it is possible to crawl an entire
wiki site starting from a single URI.
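A crawl of this kind amounts to a breadth-first traversal over rdfs:seeAlso links. The Python sketch below is not the crawler used in this work: the fetch step is replaced by a lookup in a hypothetical in-memory table (SEE_ALSO), where a real crawler would download and parse each RDF document.

```python
from collections import deque


def crawl(start_uri, fetch_see_also):
    """Visit every document reachable through rdfs:seeAlso links.

    fetch_see_also(uri) must return the seeAlso targets of one document.
    """
    visited = []
    seen = {start_uri}
    queue = deque([start_uri])
    while queue:
        uri = queue.popleft()
        visited.append(uri)
        for target in fetch_see_also(uri):
            if target not in seen:  # never enqueue a document twice
                seen.add(target)
                queue.append(target)
    return visited


# Hypothetical stand-in for fetching and parsing RDF documents.
SEE_ALSO = {
    "wiki:Main": ["wiki:A", "wiki:B"],
    "wiki:A": ["wiki:B", "wiki:Main"],
    "wiki:B": ["wiki:C"],
    "wiki:C": [],
}

order = crawl("wiki:Main", lambda uri: SEE_ALSO.get(uri, []))
print(order)
```

The seen set is what keeps the crawl finite on a wiki whose pages link back to each other.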
19. The DokuSIOC plugin
A plugin for DokuWiki that exports RDF data using popular lightweight ontologies
(originally developed by M. Haschke, a SIOC contributor).
We modified and extended this plugin to make it compliant with our proposed
model and to export all the required wiki features.
It takes information from the metadata stored in the wiki system about pages,
users, links, etc. and provides it as raw RDF/XML serialized data
(instead of the usual HTML page).
Developed in PHP and easy to install in any DokuWiki system.
It uses the SIOC PHP API.
21. Collecting Data
To evaluate our proposal, we exported and crawled different MediaWiki
and DokuWiki instances: 5 wikis were crawled, yielding more
than 1 GB of RDF data,
covering more than 3000 wiki articles and 700 users.
The RDF data was loaded into a triple store: Sesame + OWLIM.
Using the SPARQL endpoint, it is possible to run advanced cross-site
queries on top of the collected data by combining FOAF and
SIOC,
e.g.:
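PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT DISTINCT ?content
WHERE {
  <http://example.org/js#me> foaf:account ?account .
  ?account rdf:type sioc:UserAccount .
  ?content sioc:has_creator ?account .
}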
23. Building the application
The data acquisition module is a PHP script that:
• queries the triple store
• collects and parses the results
• translates the data into the correct format (JSON) for the visualization
layer
The visualization layer was built with the Exhibit framework from the
MIT SIMILE Project:
• It is a set of JavaScript files configured directly in the HTML code of
the page to display
• It provides faceted browsing capabilities
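The translation step can be sketched as follows. The original module is a PHP script whose code is not shown here, so this Python version is only an illustration under assumptions: the input follows the standard SPARQL JSON results layout, the output is an Exhibit-style object with an "items" list, and the function name, the label_var and item_type parameters, and the sample data are all hypothetical.

```python
import json


def bindings_to_exhibit(sparql_json, label_var, item_type):
    """Flatten SPARQL JSON result bindings into Exhibit-style items."""
    items = []
    for row in sparql_json["results"]["bindings"]:
        # Keep only the plain value of each bound variable.
        item = {var: cell["value"] for var, cell in row.items()}
        item["label"] = item.get(label_var, "")  # Exhibit items carry a label
        item["type"] = item_type
        items.append(item)
    return {"items": items}


# Hypothetical query result in SPARQL JSON results format.
sample = {
    "head": {"vars": ["wiki", "title"]},
    "results": {"bindings": [
        {"wiki": {"type": "uri", "value": "http://wiki.example.org/"},
         "title": {"type": "literal", "value": "Home"}},
    ]},
}

exhibit = bindings_to_exhibit(sample, label_var="title", item_type="Article")
print(json.dumps(exhibit, indent=2))
```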
25. The underlying queries
The first part shows the co-authors of the requested user and the articles they
have in common.
# dc: is assumed to be Dublin Core Elements 1.1
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT DISTINCT ?wiki ?title ?coauthor
WHERE {
  ?pag1 dc:contributor ?me .
  FILTER regex(str(?me), "UserX", "i")
  ?pag1 dc:title ?title ;
        sioc:has_container ?wiki .
  ?pag2 dc:title ?title2 .
  FILTER regex(str(?title), str(?title2))
  ?pag2 dc:contributor ?coauthor .
  FILTER (?coauthor != ?me)
}
The second part shows all the articles, and the related categories, contributed
by the requested user on different wikis.
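# dc: is assumed to be Dublin Core Elements 1.1
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT DISTINCT ?wiki ?title ?category
WHERE {
  ?pag1 dc:contributor ?me .
  FILTER regex(str(?me), "UserX", "i")
  ?pag1 dc:title ?title ;
        sioc:has_container ?wiki ;
        sioc:topic ?category .
}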
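The logic of the co-author query can be approximated in plain Python. This is a simplified sketch rather than a translation: pages are hypothetical in-memory records, and co-authors are taken from the same page record instead of being matched by title as the SPARQL version does. The case-insensitive substring test stands in for the regex filter with the "i" flag.

```python
# Hypothetical page records; the real data lives in the triple store.
pages = [
    {"wiki": "wikiA", "title": "Home", "contributors": ["UserX", "Bob"]},
    {"wiki": "wikiB", "title": "Setup", "contributors": ["userx", "Carol", "Bob"]},
    {"wiki": "wikiB", "title": "FAQ", "contributors": ["Dave"]},
]


def coauthors(user, pages):
    """Return sorted (wiki, title, coauthor) rows for the given user."""
    user_key = user.lower()
    rows = set()  # a set gives the effect of SELECT DISTINCT
    for page in pages:
        names = page["contributors"]
        for me in names:
            if user_key in me.lower():  # case-insensitive match on the user
                for other in names:
                    if other != me:  # mirrors FILTER (?coauthor != ?me)
                        rows.add((page["wiki"], page["title"], other))
    return sorted(rows)


for row in coauthors("UserX", pages):
    print(row)
```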
26. Conclusions and Future Work
We presented how the SIOC ontology and lightweight semantics can be used and
extended to represent the structure of wikis in a unified way;
We demonstrated an overall benefit of applying Semantic Web technologies to wikis:
– enabling end users to access the generated information in a simple and
transparent way,
– offering capabilities that cannot be obtained using traditional Web
2.0 tools;
The presented work moves towards creating a collective knowledge
system on the Web that follows Linked Data principles.
Future work:
• To provide more details about the content of wiki articles
• To add a real-time search functionality to the system architecture
• To standardise and spread plugins and exporters