SKOS and semantic web best practice to access terminological resources: NatureSDIPlus and CHRONIOUS hand-on experience
1. +
Networked Knowledge Organization
Systems and Services
The 9th European NKOS Workshop
at the 14th ECDL Conference,
Glasgow, Scotland
10 September 2010
SKOS and semantic web best practice to access
terminological resources: NatureSDIPlus and
CHRONIOUS hand-on experience
Riccardo Albertoni, Albertoni@ge.imati.cnr.it
Monica De Martino, Franca Giannini, IMATI-CNR-GE, Italy
2. +
Goals of this presentation
To share the hands-on experience we gained working in two European
projects (NatureSDIPlus and CHRONIOUS)
Motivations that led us to deploy KOS in the projects
SKOS + linked data in NatureSDIPlus
SKOS + OWL ontologies in CHRONIOUS
Common abstract pipeline to set up and exploit KOS
Instantiations of this pipeline according to the constraints
arising in the NatureSDIPlus and CHRONIOUS projects
To provide
Hands-on recipes: hopefully, you can adopt, adapt, and enhance our
solutions
A basis for critical discussion
Suggestions from the audience are welcome
3. + NatureSDIPlus and CHRONIOUS
NatureSDIPlus
ECP-2007-GEO-317007, http://www.nature-sdi.eu/
Best Practice Network aimed at establishing a Spatial Data
Infrastructure (SDI) for Nature Conservation
October 2008-2011 (30 months)
To enable and improve the harmonisation of national
datasets on nature conservation.
The considered data themes are: Protected Sites (Annex I);
Bio-geographical Regions, Habitats and Biotopes, and Species
Distribution (Annex III).
We were leading the task defining a terminology/thesaurus
as a common base for metadata and keyword search
CHRONIOUS
FP7-ICT-2007–1–216461, http://www.chronious.eu/
An Open, Ubiquitous and Adaptive Chronic Disease Management
Platform for COPD and CKD
February 2008-2012 (48 months)
To define a European framework for a generic health status
monitoring platform addressing people with chronic health
conditions. This will be achieved by developing an intelligent,
ubiquitous and adaptive chronic disease platform to be used by
both patients and clinicians
We were involved in the Thesaurus-Ontology module supporting the
search for scientific literature pertaining to COPD and CKD
5. +
Define a brand new thesaurus?
Don’t reinvent the wheel!
1. different communities with a large spectrum of
competencies are involved in Nature
Conservation;
2. many terminologies have already been
developed and adopted for these competencies
(but still in different formats and models);
3. more than one terminology can be available for a
given competency;
4. adopted terminologies often have a national
origin, so they are not uniform across
European countries, and even stakeholders from
the same country can adopt different
terminologies in everyday practice.
6. +
Common thesaurus framework
Integrating well known existing thesauri or
classifications.
Framework Design Requirements
Modularity: Each new thesaurus can be added as a new
module in the framework
Openness: Each terminology/thesaurus should be easily
extendable
Interlinking: Interlinking among the terms and concepts of
different available thesauri is allowed in order to harmonize
terminologies
Exploitability: framework thesauri are encoded in a standard and
flexible format to encourage their adoption and enrichment
by third-party users and systems
9. +
Linked data Best Practice
[Figure: thesauri published as SKOS/RDF on separate web servers (http:xxx, http:yyy, http:xyz, http:zzz). Each concept has its own URI and labels (skos:prefLabel, skos:altLabel; e.g., animal, farm animal, dog, guard dog, cat, tomcat). Concepts are linked by skos:broader within a thesaurus and by a mapping property (skos:broadMatch) across servers. Web clients access the thesauri as HTML; semantic web clients (Tabulator, OpenLinkData) retrieve SKOS/RDF fragments in machine-understandable form or query them via SPARQL.]
10. +
Common thesaurus: Integrating well
known existing thesauri or
classifications.
SKOS/RDF as the common thesaurus format, supporting
multilingualism
SKOS/RDF + Linked Data best practices pave the way for
Modularity: each new thesaurus can be added as a new module in
the framework
Openness: each terminology/thesaurus should be easily
extendable
Interlinking: interlinking among the terms and concepts of different
available thesauri in order to harmonize their usage
Exploitability: framework thesauri are encoded in a standard and
flexible format to encourage their adoption and enrichment by
third-party users and systems
11. Common thesaurus framework
Current state
[Figure: the thesauri in the framework and their links: GEMET (published by EEA according to Linked Data Best Practice), EARTh, DMEER/Treats, Biodiversity by Biogeographical Regions, EUNIS Habitat Types - NATURA 2000 A I, EUNIS Species, IUCN Classification. Concepts (Id1, Id2, ...) are connected by skos:broader and skos:related within each thesaurus, and by skos:exactMatch and skos:relatedMatch across thesauri; some connections are marked "????".]
13. +
Terminology to index scientific
literature
MeSH is a well-known controlled vocabulary used for indexing articles
from MEDLINE/PubMed
But it is not specialized enough to cover COPD and CKD in depth
Formal ontologies have been defined to cover these diseases in detail
MloC (middle layer), COPD and CKD ontologies provided by IFOMIS
However, MeSH is still required in CHRONIOUS
Searches are not always made at the same level of granularity; keyword
searches often move back and forth between coarse and very
disease-specialized concepts
Multilingual support: some “certified” translations are available, for example in it, pt,
es
Terminological de facto standard: clinicians expect it to be included
How to combine ontologies and MeSH in CHRONIOUS?
We built a Skossyfied version of MeSH and used RDF as a kind of lingua franca
14. +
CHRONIOUS’s KOS
[Figure: MeSH 2010 and its translations in Italian, Portuguese and Spanish are Skossyfied (SKOS-RDF, URIs) into a Skossyfied MeSH; IFOMIS’s specialized ontologies are in OWL; a mapping connects the Skossyfied MeSH and the ontologies.]
18. +
Which resources?
In NatureSDIPlus, we have more than 30 partners involved (with multiple
competencies/fields of expertise)
How to manage feedback from experts with limited time and
economic resources?
SUGGESTION:
1. Questionnaire to partners and partners’ friends by surveymonkey.com
2. Restricted group of experts to revise the feedback
• Are the suggested resources available in electronic form?
3. Selection re-discussed with all partners by a second electronic questionnaire
19. +
Copyright?
Extremely tricky: it was extremely hard to find out
Who is the owner of the data
Whether we could use and republish the data
Under which restrictions
Example in NatureSDIPlus: we asked both the distributor of the
data and the owner, and then we got a mail saying go ahead!
Extremely demanding task
Often you have more than one owner.
Distribution issues were not always addressed at the time the data
was created
Example in CHRONIOUS: you can use MeSH in your systems
but it seems you cannot republish it
Providing MeSH as linked data is probably not allowed, but you can
provide services based on it
Suggestions:
• Deal with copyright issues from the earliest phase of resource selection
• Selecting resources that cannot be exploited as you need might
jeopardize your project efforts
• Take a look at the initiatives that have been established in the meantime; if I
had to face this problem now I would start from
• http://www.opendatacommons.org/guide/
• http://www.slideshare.net/jordanhatcher/linked-data-licensing-introduction-isemantics-2010
21. +
What we have used …
D2R Server, http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/
To map relational DB into RDF vocabularies
To publish vocabularies as linked data
To dump the data as RDF
Open source from Freie Universität Berlin
Very simple: if you know SQL (MySQL), you just have to learn D2RQ,
the mapping language
Jena, http://jena.sourceforge.net/
is a Java framework for building Semantic Web applications. It provides
a programmatic environment for RDF, RDFS and OWL, SPARQL and
includes a rule-based inference engine.
Open source and grown out of work with the HP Labs Semantic Web
Programme.
Consideration:
I would recommend this set of technologies at least as a
starting point
• Tools and frameworks available for free
• Very limited technological knowledge is required
• Basic semantic web/linked data principles
• JAVA – MySQL
22. +
Where we have used what ..
Project | Resources | SKOSifycation
NatureSDIPlus | Excel, relational database | • Importing in MySQL • Extraction of a simplified data view • D2R server
CHRONIOUS | XML MeSH 2010 dump; Italian MeSH translation | • Conversion in MySQL • Importing in MySQL • Extraction of a simplified data view • D2R server • DUMP to RDF
CHRONIOUS | Spanish and Portuguese MeSH translations | Ad hoc program developed with JAVA and JENA to read a file and convert the info into SKOS/RDF
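The relational route listed above (import into a database, extract a simplified view, expose it as SKOS) can be illustrated with a small sketch. This is not D2R Server or the D2RQ mapping language: it is a minimal Python stand-in that uses an in-memory SQLite database in place of MySQL. The table layout (`concept` with `id`, `label`, `lang`, `broader_id`), the base URI and the sample rows are all hypothetical.

```python
import sqlite3

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/thesaurus/"  # hypothetical base URI


def rows_to_skos(conn):
    """Turn rows of a simplified 'concept' view into SKOS N-Triples lines.

    Labels are assumed to contain no quotes or backslashes (no escaping here).
    """
    triples = []
    query = "SELECT id, label, lang, broader_id FROM concept ORDER BY id"
    for cid, label, lang, broader_id in conn.execute(query):
        subject = f"<{BASE}{cid}>"
        triples.append(f'{subject} <{SKOS}prefLabel> "{label}"@{lang} .')
        if broader_id is not None:
            triples.append(f"{subject} <{SKOS}broader> <{BASE}{broader_id}> .")
    return triples


# Hypothetical sample data standing in for the simplified data view.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept "
    "(id INTEGER PRIMARY KEY, label TEXT, lang TEXT, broader_id INTEGER)"
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?)",
    [(1, "animal", "en", None), (2, "dog", "en", 1)],
)

skos_triples = rows_to_skos(conn)
for line in skos_triples:
    print(line)
```

In the projects this mapping was declared in D2RQ rather than coded by hand; the sketch only shows which columns end up as which SKOS properties.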
24. +
Project | Kind of Access | How
NatureSDIPlus | Linked data | D2R server
NatureSDIPlus | Web Services similar to GEMET API | We provided a DUMP to partners who had SKOS included in their own platform, http://www.mdweb-project.org/web
CHRONIOUS | Ad-hoc API | We developed an ad-hoc API that has been included in the CHRONIOUS architecture
Suggestion:
According to our experience, linked data is very good for sharing your
resources with third parties, enabling them to extend your resources
However, harvesting can be very costly
So it is very useful to also provide updated dump copies of your resources
26. +
Strategies for Interlinking
Exploitation of domain experts:
The interlinking can be defined manually by the domain experts.
Huge effort; it is very tricky to reach a consensus, especially when a large group of
experts is involved.
The process can result in a high quality mapping, but only if domain experts are
very willing and knowledgeable.
Exploitation of a priori knowledge:
Very often KOS share common origins or have been built
including parts of other pre-existing resources.
Knowledge about these interrelations can be crucial to link different KOS,
especially with generally accepted naming schemata, for instance, DOI for libraries or habitat
classifications such as NATURA 2000 A I
If the link source and the link target data sets already support one of these
identification schemata, the implicit relationships between entities in the data sets
can easily be made explicit.
Exploitation of automatic tools:
The idea behind these tools is to compare concepts belonging to distinct KOS,
assessing their similarity, and then to link the concepts whose similarity is
higher than a given threshold.
SILK, discovering relationships between data items within different Linked Data
sources http://www4.wiwiss.fu-berlin.de/bizer/silk/
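The threshold idea behind the automatic tools can be sketched in a few lines. This is not how SILK works internally: it is a minimal Python illustration that compares preferred labels only, using the standard library's `difflib.SequenceMatcher` as the similarity measure. The URIs, labels and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_candidates(kos_a, kos_b, threshold=0.9):
    """Return (uri_a, uri_b, score) for label pairs at or above the threshold."""
    links = []
    for uri_a, label_a in kos_a.items():
        for uri_b, label_b in kos_b.items():
            score = similarity(label_a, label_b)
            if score >= threshold:
                links.append((uri_a, uri_b, score))
    return links


# Hypothetical concepts: URI -> preferred label.
kos_a = {"http://example.org/a/1": "Wood", "http://example.org/a/2": "Wetland"}
kos_b = {"http://example.org/b/7": "wood", "http://example.org/b/8": "River"}

links = link_candidates(kos_a, kos_b)
for uri_a, uri_b, score in links:
    print("candidate link:", uri_a, uri_b, round(score, 2))
```

A real tool would also weigh definitions and broader, narrower and related concepts, and the all-pairs loop would need blocking or indexing for thesauri of realistic size.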
27. Common thesaurus framework
Interlinking
[Figure: the framework diagram of slide 11 (GEMET published by EEA according to Linked Data Best Practice, EARTh, DMEER/Treats, Biodiversity by Biogeographical Regions, EUNIS Habitat Types - NATURA 2000 A I, EUNIS Species, IUCN Classification), with the skos:exactMatch and skos:relatedMatch links between thesauri annotated by the strategy that produced them: exploitation of domain experts, exploitation of a priori knowledge, exploitation of automatic tools.]
28. +
Interlinking
Exploitation of domain experts (EARTh-BiogeographicalRegions, skos:relatedMatch)
We asked the EARTh team directly to figure out the connections
between EARTh and BiogeographicalRegions
It worked because of their valuable expertise on EARTh and the
limited number of concepts in BiogeographicalRegions (80 concepts)
Exploitation of a priori knowledge (EARTh-GEMET,
skos:exactMatch)
EARTh is an extension of GEMET: when a concept comes from
GEMET, the GEMET identifier is kept internally
E.g., Wood (ID 30510) has GEMET ID 9349 within EARTh, hence the
GEMET URI
(http://www.eionet.europa.eu/gemet/concept?cp=9349)
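The a priori case reduces to string construction once the stored GEMET identifiers are available. A minimal Python sketch follows, using the Wood example from the slide. The GEMET URI pattern is the one shown above; the EARTh URI pattern is an assumption inferred from the example resources in the VoID description later in the deck, and the second concept (99999) is made up to show a concept with no GEMET counterpart.

```python
GEMET_CONCEPT = "http://www.eionet.europa.eu/gemet/concept?cp={}"
# Assumed EARTh URI pattern; not confirmed by the slides for every concept.
EARTH_CONCEPT = "http://linkeddata.ge.imati.cnr.it:2020/resource/EARTh/{}"
EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def exact_match_links(earth_to_gemet):
    """Build skos:exactMatch N-Triples from EARTh ID -> GEMET ID pairs."""
    return [
        f"<{EARTH_CONCEPT.format(earth_id)}> <{EXACT_MATCH}> "
        f"<{GEMET_CONCEPT.format(gemet_id)}> ."
        for earth_id, gemet_id in sorted(earth_to_gemet.items())
        if gemet_id is not None  # concepts that do not come from GEMET are skipped
    ]


# Wood (EARTh ID 30510) keeps GEMET ID 9349; 99999 is a made-up concept
# without a GEMET counterpart.
links = exact_match_links({30510: 9349, 99999: None})
print("\n".join(links))
```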
29. +
Interlinking
Exploitation of automatic tools (EUNIS Habitat and Species, skos:relatedMatch)
Example of HABITAT: Low energy littoral rock
skos:definition
Sheltered to extremely sheltered rocky shores with
very weak to weak tidal streams are typically
characterized by a dense cover of fucoid seaweeds
which form distinct zones (the wrack
[Pelvetia canaliculata] on the upper shore through
to the wrack [Fucus serratus] on the lower shore).
…
Species are easily identifiable in the Habitat title and description!
We didn't use SILK; we defined an
ad hoc procedure in JAVA + JENA:
For each HABITAT Y
  Extract from the Habitat title and description the species A = {a1, a2, a3, .., an}
  For each X in A
    assert URI(X) skos:relatedMatch URI(Y) and
    URI(Y) skos:relatedMatch URI(X)
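The procedure above can be sketched as follows. The original was written in Java with Jena; this is a plain Python illustration of the same loop, under the assumption that a species counts as "mentioned" when its name occurs, case-insensitively, in the habitat text. All URIs are hypothetical, and the description is shortened from the slide's example.

```python
def link_habitats_to_species(habitats, species):
    """Link each habitat to every species whose name occurs in its text.

    habitats: habitat URI -> title and description text
    species:  species name -> species URI
    Returns (subject, "skos:relatedMatch", object) triples in both directions.
    """
    triples = []
    for habitat_uri, text in habitats.items():
        lowered = text.lower()
        for name, species_uri in species.items():
            if name.lower() in lowered:
                triples.append((species_uri, "skos:relatedMatch", habitat_uri))
                triples.append((habitat_uri, "skos:relatedMatch", species_uri))
    return triples


# Hypothetical URIs; the description is shortened from the slide's example.
habitats = {
    "http://example.org/habitat/low-energy-littoral-rock": (
        "Low energy littoral rock. Dense cover of fucoid seaweeds, from "
        "[Pelvetia canaliculata] on the upper shore to [Fucus serratus] "
        "on the lower shore."
    )
}
species = {
    "Pelvetia canaliculata": "http://example.org/species/pelvetia-canaliculata",
    "Fucus serratus": "http://example.org/species/fucus-serratus",
    "Canis lupus": "http://example.org/species/canis-lupus",
}

triples = link_habitats_to_species(habitats, species)
for s, p, o in triples:
    print(s, p, o)
```

Plain substring matching is only safe here because habitat descriptions and the species list come from the same source, so names are spelled consistently.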
30. +
CHRONIOUS: Mapping between
MeSH and Ontologies
Example:
31. +
Mapping between MeSH and
Ontologies
Obtained by a two-step process
– First step: automatic syntactic comparison between
ontology class labels and MeSH terms
mesh:mapToEquivalent mappings are created
– Second step: manual check
To delete wrong mappings
– Concepts whose terms have the same syntax but different semantics
To specialize the mapping, if required, into
– mesh:mapToNarrower
– mesh:mapToBroader
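The first, automatic step can be sketched as a normalized label comparison. This is a minimal Python illustration, not the project's code: the normalization (lower-casing and collapsing whitespace), the direction of the mapping (MeSH concept as subject) and all URIs and identifiers are assumptions. The output is only a list of candidates for the manual check of the second step.

```python
def normalize(label):
    """Lower-case and collapse whitespace so labels compare syntactically."""
    return " ".join(label.lower().split())


def candidate_mappings(ontology_labels, mesh_terms):
    """First step only: propose mesh:mapToEquivalent for identical labels.

    ontology_labels: ontology class URI -> class label
    mesh_terms:      MeSH concept URI -> term
    """
    by_term = {}
    for mesh_uri, term in mesh_terms.items():
        by_term.setdefault(normalize(term), []).append(mesh_uri)

    mappings = []
    for class_uri, label in ontology_labels.items():
        for mesh_uri in by_term.get(normalize(label), []):
            mappings.append((mesh_uri, "mesh:mapToEquivalent", class_uri))
    return mappings


# Hypothetical classes and terms (URIs and identifiers are made up).
ontology_labels = {
    "http://example.org/onto#Dyspnea": "Dyspnea",
    "http://example.org/onto#Stage": "Stage",
}
mesh_terms = {
    "http://example.org/mesh/M1": "dyspnea",
    "http://example.org/mesh/M2": "Kidney",
}

mappings = candidate_mappings(ontology_labels, mesh_terms)
for triple in mappings:
    print(*triple)
```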
33. +
Void: Vocabulary of Interlinked
Datasets
Void provides metadata for your resources. It makes available
info about
License, data dumps, SPARQL endpoints, interlinked datasets,
exploited RDF vocabularies, example resources, homepage
DETAILED INSTRUCTIONS
http://vocab.deri.ie/void
http://semanticweb.org/wiki/VoiD
EDITOR:
ve2- the voiD editor, http://ld2sd.deri.org/ve2
34. +
Let’s provide a Void to Earth
First suggestion: Define a URI for each of your
resources
<http://purl.org/NET/Earth> rdf:type void:Dataset ;
You need a stable URI, so I suggest exploiting a service that
keeps the URI under your control
E.g. PURL to set up a URI (http://purl.com )
THINK twice before using the URL of the KOS web page as URI
What happens if this URL changes?
Example 1: you use your company/institute server and
eventually the company/institute changes its name
From http://mycompany/mydataset to http://???/mydataset
35. +
Let’s provide a Void to Earth
Second Suggestion: Beware of Inverse Functional
Properties (e.g., foaf:homepage)
<http://purl.org/NET/Earth> foaf:homepage <http://ekolab.iia.cnr.it/earth_eng.htm> ;
Pay attention!!!
foaf:homepage is an Inverse Functional Property
A’s VOID description: A foaf:homepage C
B’s VOID description: B foaf:homepage C
Someone’s reasoner: A owl:sameAs B (A=B)
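What a reasoner does with an inverse functional property can be sketched in a few lines. This is a minimal Python illustration rather than an OWL reasoner: it groups subjects by their foaf:homepage value and reports every pair sharing one as owl:sameAs. The second and third dataset URIs are hypothetical, and the first statement deliberately gives EARTh the GEMET homepage to show the kind of mistake the slide warns about.

```python
from collections import defaultdict
from itertools import combinations


def infer_same_as(homepage_triples):
    """Derive owl:sameAs pairs from (subject, homepage) statements.

    foaf:homepage is inverse functional: two subjects that share a homepage
    are taken to denote the same resource.
    """
    subjects_by_homepage = defaultdict(set)
    for subject, homepage in homepage_triples:
        subjects_by_homepage[homepage].add(subject)

    same_as = set()
    for subjects in subjects_by_homepage.values():
        for a, b in combinations(sorted(subjects), 2):
            same_as.add((a, "owl:sameAs", b))
    return same_as


# Two descriptions state the same homepage; the third does not.
statements = [
    ("http://purl.org/NET/Earth", "http://eionet.europa.eu/gemet"),
    ("http://example.org/dataset/gemet", "http://eionet.europa.eu/gemet"),
    ("http://example.org/dataset/other", "http://example.org/other.html"),
]
inferred = infer_same_as(statements)
print(inferred)
```

If the EARTh description used the GEMET homepage as its own, any consumer applying this rule would treat the two datasets as one resource.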
36. +
Let’s provide a Void to Earth
<http://purl.org/NET/Earth> rdf:type void:Dataset ;
foaf:homepage <http://ekolab.iia.cnr.it/earth_eng.htm> ;
dcterms:title "EARTh" ;
dcterms:description "Environmental Applications Reference THesaurus" ;
dcterms:publisher <http://dblp.l3s.de/d2r/resource/authors/Riccardo_Albertoni> ;
dcterms:license <http://purl.org/NET/EARTHlicence> ;
void:sparqlEndpoint <http://linkeddata.ge.imati.cnr.it:2020/sparql> ;
void:dataDump <http://purl.oclc.org/net/DumpEarthRDF> ;
void:vocabulary <http://www.w3.org/2004/02/skos/core#> ;
void:vocabulary <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ;
void:exampleResource <http://linkeddata.ge.imati.cnr.it:2020/resource/EARTh/100000> ;
void:exampleResource <http://linkeddata.ge.imati.cnr.it:2020/resource/EARTh/13040> ;
dcterms:subject <http://dbpedia.org/resource/Natural_environment> ;
dcterms:subject <http://dbpedia.org/resource/Thesaurus> ;
void:subset :myDS-DS1 ; # EARTh has also a subset :myDS-DS1
void:subset :myDS-DS2 . # EARTh has also a subset :myDS-DS2
37. +
Let’s provide a Void to Earth
# that are linked to GEMET
:DS1 rdf:type void:Dataset ;
foaf:homepage <http://eionet.europa.eu/gemet> ;
dcterms:title "GEMET" ;
dcterms:description "GEneral Multilingual Environmental Thesaurus" ;
void:exampleResource <http://www.eionet.europa.eu/gemet/concept?cp=8344> .
:myDS-DS1
rdf:type void:Linkset ;
void:linkPredicate <http://www.w3.org/2004/02/skos/core#exactMatch> ;
void:target <http://purl.org/NET/Earth> ;
void:target :DS1 .
38. +
Good .. What next, once we have written
a VOID description?
Publish it
Ve2 provides support to notify the VOID to Sindice, the RKB voiD
store, the Talis voiD store, and PingtheSemanticWeb.com.
Publish it as RDF on your Web site and notify on your own where to
harvest the VOID description of your data
Search your dataset
By Sindice (sindice.com),
a semantic web index to search RDF fragments
Different query forms: keywords, relations (* foaf:knows C),
advanced queries (AND, OR, ..)
40. +
Other SINDICE queries
http://sindice.com/
Give me the list of datasets for which Riccardo Albertoni is publisher
* <http://purl.org/dc/terms/publisher> <http://dblp.l3s.de/d2r/resource/authors/Riccardo_Albertoni>
Give me the list of VOID datasets whose publisher is Riccardo Albertoni
(* <http://purl.org/dc/terms/publisher> <http://dblp.l3s.de/d2r/resource/authors/Riccardo_Albertoni>) AND (* <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/ns/void#Dataset>)
If you ask for GEMET you also get EARTH
* <http://xmlns.com/foaf/0.1/homepage> <http://eionet.europa.eu/gemet>
Give me all RDF fragments pertaining to http://www.eea.europa.eu/data-and-maps/data/digital-map-of-european-ecological-regions
* <http://xmlns.com/foaf/0.1/homepage> <http://www.eea.europa.eu/data-and-maps/data/digital-map-of-european-ecological-regions>
41. +
Conclusion - Discussion
Resource Selection: Which resources? Copyright
Translation into SKOS: DB + D2R; converter developed with Jena
Publication/Access: Linked data by D2R; Ad-hoc API
(Inter)linking: Ad-hoc programming to relate published entities; SKOS & OWL
Advertisement: VOID; SINDICE
42. + For further information & questions
please write to
Riccardo.Albertoni@ge.imati.cnr.it
The interlinking procedure among KOS aims at harmonizing the use of their terms when the terms refer to the same concepts in more than one thesaurus. It enables access to the same concept from a multicultural point of view, allowing users to move from one KOS to another. Interlinking is not aimed at connecting all KOS to each other: connections between KOS should be created only where they are useful and sound. Interlinking is one of the hot topics addressed by the semantic web/linked data research community. As far as we know there are no consolidated practices, but different strategies are employed according to the characteristics of the resources that have to be interlinked. These strategies can be roughly grouped as follows.
Strategy A: “Exploitation of automatic tools”. The idea behind these tools is to compare concepts belonging to distinct KOS, assessing their similarity, and then to link the concepts whose similarity is higher than a given threshold. Different similarity measures can be employed, considering the concept definitions and their broader, narrower and related concepts. In particular, as recently pointed out by (Heath et al, 2009), two research frameworks have been developed by the linked data community so far: SILK (Volz et al., 2009) and LinQL (Hassanzadeh et al., 2009). The Silk framework works against local and remote SPARQL endpoints and is designed to be employed in distributed environments without having to replicate data sets locally. The LinQL framework works over relational databases and is designed to be used together with database-to-RDF mapping tools such as D2R Server.
Strategy B: “Exploitation of a priori knowledge”. Very often KOS share common origins or have been built including parts of other pre-existing resources. Knowledge about these interrelations can be crucial to link different KOS, since it suggests a good starting point to find comparable concepts. It is even more effective when they refer to generally accepted naming schemata. For instance, in the domain of Nature Conservation different agencies have produced data referring to habitat classifications such as NATURA 2000 A I. If the link source and the link target data sets already support one of these identification schemata, the implicit relationships between entities in the data sets can easily be made explicit.
Strategy C: “Exploitation of domain experts”. The interlinking can be defined manually by the domain experts. Managing this activity requires a huge effort. In particular, it is very tricky to reach a consensus, especially when a large group of experts is involved. The process can result in a high quality mapping, but only if domain experts are very willing and knowledgeable. To reach a high quality, it is highly recommended to involve the experts who originally defined the thesauri and classifications that are going to be mapped.
All these strategies have been adopted to realise the interlinking within our Common Thesaurus Framework. In the first stage, we considered SILK and LinQL as automatic tools to interlink the sub-thesauri according to strategy A. The rationale behind these tools is very promising and interesting, especially from a research point of view, but they are available only as research prototypes (alpha/beta releases) and they have not yet reached the technological maturity required for our purpose and to scale up to the huge number of concepts provided in the framework. Since the demand for this kind of tool comes from a large set of applications and projects, we foresee that the aforementioned tools will soon be improved or that brand new, more exploitable and stable tools will soon be available. In any case, the attempt to use these prototype tools was very inspiring in outlining the approach adopted in our activity.
In particular, inspired by LinQL, which works at the database level, we combined the automatic and the a priori strategies (strategies A and B) to interlink habitats and species. Habitats in the EUNIS database were equipped with descriptions listing the species that they host. Although different taxonomies of species exist, we assumed the species used in the habitat descriptions were coherent with the species taxonomy because both resources were provided by EUNIS. Exploiting such a priori knowledge and inspired by the approach followed by LinQL, we developed a stored procedure adding a link between a species and a habitat whenever the species was mentioned in the habitat description.
The link between EARTh and Biogeographical Regions has been obtained by exploiting the domain experts (strategy C). We exploited the EARTh developers’ knowledge to find a mapping between EARTh and the resources pertaining to the biogeographical regions. This activity was feasible because of the moderate number of concepts contained in the biogeographical regions (about 80) and the experts’ deep knowledge of EARTh.
Finally, strategy B has been deployed to interlink EARTh and GEMET, as well as to map the EUNIS Species published in our framework to the official linked data EUNIS species dataset. In the former case, EARTh is a significant extension of GEMET [Plini et al., 2009], and it contains explicit references to the GEMET ID for the EARTh concepts that are in common with GEMET. Exploiting these references and the fact that GEMET has been published assigning URIs derived from the original GEMET IDs, it has been possible to materialize the links from EARTh to GEMET. In the latter case, we exposed as SKOS concepts the species and their whole taxonomic information (e.g., kingdom, phylum, class, order, family and genus). After we had exposed them, EUNIS provided Species as linked data. Since we know how we exploited the internal species identifiers to provide a URI for each species we published, we know the correspondence between the EUNIS official species and those we have provided. So we have been able to interlink these two datasets.
In the case of GEMET I was not able to find a URI set by the EEA which refers to GEMET. So we are forced to rely on the homepage inverse functional property (IFP) to discover that EARTh is connected to GEMET.
Resource selection is the phase that identifies and selects a list of the relevant resources, i.e., classifications, terminologies and thesauri. Criteria adopted in the selection, such as the relevance of the resource content, copyright constraints and availability in structured digital form, will be introduced.
Resource translation in SKOS (SKOSifycation) is the phase where the data model of the selected resources is mapped into the SKOS model. The deployed strategies will be described, considering that further RDF vocabularies might also be used to complement the SKOS model whenever information relevant for the final application does not completely fit the SKOS model.
KOS Publication/Access is fundamental to make the SKOSsifyied KOS available to third parties. The strategies exploited in the two mentioned projects will be presented. In particular, we will detail the Linked Data Best Practices technologies adopted in NatureSDIPlus and the JAVA API developed to grant access in CHRONIOUS.
Knowledge organization systems interlinking: some specific mappings among KOS, and between KOS and ontologies, have been defined in the projects relying on a priori knowledge (e.g., knowledge about how the KOS originated) and/or developing appropriate semi-automatic procedures. The interlinking procedures and their results will be shown in the presentation.
KOS advertisement: once KOS have been interlinked and stable versions have been reached, specific strategies to advertise them in related communities should be foreseen. Different actions can be undertaken to advertise the KOS availability. Actions such as registering KOS in existing websites, producing VOID descriptions and submitting the KOS to semantic web search engines such as SINDICE will be briefly introduced as strategies that we are going to deploy.