The uptake of Linked Data (LD) has promoted the proliferation of datasets and their associated ontologies, which bring semantics to the data being published. These ontologies should be evaluated at different stages, both during their development and at publication time. Publishing, sharing and facilitating the (re)use of the resulting model is as important as correctly modelling the intended part of the world to be captured in an ontology. In this paper, 11 evaluation characteristics related to publishing, sharing and facilitating reuse are proposed: 6 good practices and 5 pitfalls, together with their associated detection methods. In addition, a grid-based rating system is built from the results of analysing the vocabularies gathered in the LOV repository. Both contributions, the set of evaluation characteristics and the grid system, could help ontologists reuse existing LD vocabularies or check the one being built.
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
1. Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
María Poveda-Villalón¹, Bernard Vatant², Mari Carmen Suárez-Figueroa¹, Asunción Gómez-Pérez¹
¹Ontology Engineering Group, Universidad Politécnica de Madrid, Spain.
²Mondeca, Paris, France.
mpoveda@fi.upm.es, bernard.vatant@mondeca.com, {mcsuarez, asun}@fi.upm.es
Speaker: Asunción Gómez-Pérez
Contact author: María Poveda-Villalón: mpoveda@fi.upm.es
Date: 10/28/13
2. Table of Contents
• Introduction
• Good practices and pitfalls for publishing vocabularies
• Results and analysis over LOV vocabularies
• Conclusions and future work
3. Introduction
• Different formats: RDFS, OWL, HTML
• Different configurations
• Do they ease or impede applications consuming vocabularies?
→ Good practices & pitfalls
Vocabularies bring semantics to data
[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
In this work:
• Detailed analysis of 355 vocabularies gathered in the LOV registry (http://lov.okfn.org/)
• Why LOV: it provides complete information about each vocabulary, namely its URI, namespace and prefix
• Results:
1. a non-exhaustive list of good practices and pitfalls about publishing LD vocabularies
2. specific methods for detecting such good practices and pitfalls
3. some metadata about ontology quality
4. the inclusion of the pitfalls in services such as OOPS! (http://www.oeg-upm.net/oops) to help eager vocabulary managers
4. Table of Contents
• Introduction
• Good practices and pitfalls for publishing vocabularies
  • Previous work
  • Proposed good practices and pitfalls
• Results and analysis over LOV vocabularies
• Conclusions and future work
5. Previous work (I)
Linked Open Data 5 Star rating system (Tim Berners-Lee): http://www.w3.org/DesignIssues/LinkedData.html. 2006 (last change 2009).
LOD1. Available on the web (whatever format) but with an open licence, to be Open Data
LOD2. Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
LOD3. As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
LOD4. All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
LOD5. All the above, plus: link your data to other people’s data to provide context
Is your linked data vocabulary 5-star? (Bernard Vatant): http://bvatant.blogspot.fr/2012/02/is-your-linked-data-vocabulary-5-star_9588.html. 2012.
LDV1. Publish your vocabulary on the Web at a stable URI
LDV2. Provide human-readable documentation and basic metadata such as creator, publisher, date of creation, last modification, version number
LDV3. Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes
LDV4. Make your vocabulary available via its namespace URI, both as a formal file and as human-readable documentation, using content negotiation
LDV5. Link to other vocabularies by re-using elements rather than re-inventing
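LDV4 above, and GP3/GP4 among the proposals, hinge on content negotiation at the namespace URI. The following is a minimal sketch, assuming Python's standard `urllib` and an illustrative (non-exhaustive) set of RDF media types, of how such a check could be run: the namespace is dereferenced once asking for RDF and once asking for HTML, and the media type of the final response is inspected. The function names are hypothetical and this is not the detection code used in the study.

```python
from urllib.request import Request, urlopen

# Illustrative, non-exhaustive set of RDF serialization media types (assumption).
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "application/ld+json",
    "text/n3",
}
HTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def classify_media_type(content_type: str) -> str:
    """Map a Content-Type header value to 'rdf', 'html' or 'other'."""
    # Drop parameters such as '; charset=utf-8' and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in RDF_MEDIA_TYPES:
        return "rdf"
    if media_type in HTML_MEDIA_TYPES:
        return "html"
    return "other"


def negotiated_kind(namespace_uri: str, accept: str, timeout: float = 10.0) -> str:
    """Dereference namespace_uri with the given Accept header.

    urlopen follows redirects such as 303 See Other and keeps the Accept
    header, so the Content-Type inspected is the one of the final response.
    """
    request = Request(namespace_uri, headers={"Accept": accept})
    with urlopen(request, timeout=timeout) as response:
        return classify_media_type(response.headers.get("Content-Type", ""))


def check_content_negotiation(namespace_uri: str) -> dict:
    """Hypothetical GP3/GP4 check: does the namespace serve both RDF and HTML?"""
    return {
        "GP3_rdf": negotiated_kind(
            namespace_uri, "application/rdf+xml, text/turtle;q=0.9"
        ) == "rdf",
        "GP4_html": negotiated_kind(namespace_uri, "text/html") == "html",
    }
```

`check_content_negotiation` needs network access; as written, an unreachable namespace raises `urllib.error.URLError`, which a fuller check would catch and report separately (for instance as an availability problem) rather than as a negotiation failure.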
6. Previous work (II)
Archer, P., Goedertier, S., Loutas, N.: D7.1.3 – Study on persistent URIs, with identification of best practices and recommendations on the topic for the MSs and the EC. Deliverable. December 17, 2012.
Heath, T., Bizer, C.: Linked data: Evolving the Web into a global data space (1st edition). Morgan & Claypool. 2011.
7. Proposed good practices and pitfalls
Previous work (brief reminder), which inspired the proposals:
Linked Open Data 5 Star
LOD1. On the web. Open.
LOD2. Machine-readable
LOD3. Non-proprietary
LOD4. Open standards
LOD5. Link
Is your linked data vocabulary 5-star?
LDV1. Vocabulary on the Web
LDV2. Human-readable documentation and metadata
LDV3. Labels and descriptions
LDV4. Content negotiation
LDV5. Link
10 rules for persistent URIs
• Follow the pattern
• Re-use existing identifiers
• Multiple representations
• Implement 303 redirects
• Use a dedicated server
• Avoid stating ownership
• Avoid version numbers
• Avoid using auto-increment
• Avoid query strings
• Avoid file extensions
Linked data: Evolving the Web into a global data space:
“Only define new terms in a namespace that you control.”
Proposals:
Good practices
GP1. Provide RDF description
GP2. Provide HTML documentation
GP3. Content negotiation for RDF
GP4. Content negotiation for HTML
GP5. Provide vann metadata
GP6. Well-established prefix
Pitfalls
P36. URI contains file extension
P37. Ontology not available
P38. No OWL ontology declaration
P39. Ambiguous namespace
P40. Namespace hijacking
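Two of the pitfalls above lend themselves to simple string-level checks. The sketch below assumes Python and an assumed list of serialization extensions for P36 (motivated by the "Avoid file extensions" rule). For P40 it follows one reading of the quoted rule, namely that a vocabulary refers to terms in someone else's namespace that the namespace owner never defined. Obtaining the referenced terms and the per-namespace definitions (e.g. by parsing the vocabularies with an RDF library) is not shown, and all names are hypothetical.

```python
from urllib.parse import urlsplit

# Extensions treated as "file extensions" for P36 (assumed list).
FILE_EXTENSIONS = (".owl", ".rdf", ".ttl", ".n3", ".nt", ".rdfxml", ".jsonld")


def contains_file_extension(vocabulary_uri: str) -> bool:
    """P36 sketch: does the last path segment end with a serialization extension?"""
    # urlsplit separates the '#fragment', so 'http://ex.org/v.owl#' has path '/v.owl'.
    last_segment = urlsplit(vocabulary_uri).path.rsplit("/", 1)[-1].lower()
    return last_segment.endswith(FILE_EXTENSIONS)


def namespace_of(term: str) -> str:
    """Split a term IRI into its namespace (up to the last '#', else the last '/')."""
    separator = "#" if "#" in term else "/"
    return term[: term.rfind(separator) + 1]


def hijacked_terms(referenced: set[str], defined: dict[str, set[str]]) -> set[str]:
    """P40 sketch: referenced terms that their own namespace does not define.

    `defined` maps a namespace to the terms its published vocabulary declares;
    namespaces that could not be retrieved are skipped rather than flagged.
    """
    hijacked = set()
    for term in referenced:
        declared = defined.get(namespace_of(term))
        if declared is not None and term not in declared:
            hijacked.add(term)
    return hijacked
```

Skipping namespaces with no retrieved definition is a deliberate choice in this sketch: it avoids reporting hijacking merely because a remote vocabulary was unreachable.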
8. Table of Contents
• Introduction
• Good practices and pitfalls for publishing vocabularies
• Results and analysis over LOV vocabularies
• Conclusions and future work
9. Results and analysis over LOV vocabularies (I)
Good practices and pitfalls frequency
355 vocabularies registered in LOV (19th June, 2013)
[Figure: Good practices distribution. GP1. Provide RDF description; GP2. Provide HTML documentation; GP3. Content negotiation for RDF; GP4. Content negotiation for HTML; GP5. Provide vann metadata; GP6. Well-established prefix]
[Figure: Pitfalls distribution. P36. URI contains file extension; P37. Ontology not available; P38. No OWL ontology declaration; P39. Ambiguous namespace; P40. Namespace hijacking]
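Counting how many vocabularies exhibit each characteristic presupposes a per-vocabulary test. For GP5 and P38, which only inspect the RDF description, a minimal sketch could look as follows, assuming the vocabulary has already been parsed into (subject, predicate, object) string triples (for instance with an RDF library; that step is not shown). The RDF, OWL and VANN IRIs are the standard ones; the function names and the exact criteria (GP5 read as "some resource carries both vann:preferredNamespacePrefix and vann:preferredNamespaceUri") are assumptions, not the authors' definitions.

```python
# Standard RDF, OWL and VANN IRIs; the triple representation is an assumption.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_ONTOLOGY = "http://www.w3.org/2002/07/owl#Ontology"
VANN_PREFIX = "http://purl.org/vocab/vann/preferredNamespacePrefix"
VANN_URI = "http://purl.org/vocab/vann/preferredNamespaceUri"

# A triple is (subject, predicate, object); literals are kept as plain strings.
Triple = tuple[str, str, str]


def lacks_owl_ontology_declaration(triples: set[Triple]) -> bool:
    """P38 sketch: no subject is typed as owl:Ontology."""
    return not any(p == RDF_TYPE and o == OWL_ONTOLOGY for _, p, o in triples)


def provides_vann_metadata(triples: set[Triple]) -> bool:
    """GP5 sketch: some resource states both the vann preferred prefix and namespace."""
    predicates_by_subject: dict[str, set[str]] = {}
    for s, p, _ in triples:
        predicates_by_subject.setdefault(s, set()).add(p)
    required = {VANN_PREFIX, VANN_URI}
    return any(required <= predicates for predicates in predicates_by_subject.values())
```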
10. Results and analysis over LOV vocabularies (II)
Grid of vocabularies according to the number of good practices and pitfalls observed.
Available at http://goo.gl/zu9ZbW
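A grid like the one on this slide can be derived from per-vocabulary results. As a minimal sketch, assuming each vocabulary's observed good practices and pitfalls are already known as sets of codes, the following counts the per-characteristic frequency (as on the previous slide) and places each vocabulary in a cell indexed by (good practices observed, pitfalls observed). The function names and the data are hypothetical placeholders, not results from LOV.

```python
from collections import Counter

GOOD_PRACTICES = ("GP1", "GP2", "GP3", "GP4", "GP5", "GP6")
PITFALLS = ("P36", "P37", "P38", "P39", "P40")


def frequencies(observations: dict[str, set[str]]) -> Counter:
    """Number of vocabularies in which each good practice or pitfall is observed."""
    return Counter(code for codes in observations.values() for code in codes)


def rating_grid(observations: dict[str, set[str]]) -> dict[tuple[int, int], list[str]]:
    """Group vocabularies by (number of good practices, number of pitfalls)."""
    grid: dict[tuple[int, int], list[str]] = {}
    for vocabulary, codes in sorted(observations.items()):
        cell = (
            sum(code in codes for code in GOOD_PRACTICES),
            sum(code in codes for code in PITFALLS),
        )
        grid.setdefault(cell, []).append(vocabulary)
    return grid


# Hypothetical observations, not LOV data.
example = {
    "vocab-a": {"GP1", "GP2", "GP3", "GP4", "GP5", "GP6"},
    "vocab-b": {"GP1", "P36", "P38"},
    "vocab-c": {"GP1", "GP2", "P40"},
}
print(rating_grid(example))
# {(6, 0): ['vocab-a'], (1, 2): ['vocab-b'], (2, 1): ['vocab-c']}
```

Sorting the items keeps the vocabulary names inside each cell in a stable, alphabetical order, which makes grids from successive runs easier to compare.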
11. Table of Contents
• Introduction
• Good practices and pitfalls for publishing vocabularies
• Results and analysis over LOV vocabularies
• Conclusions and future work
12. Conclusions
• 6 good practices and 5 pitfalls proposed
• Based on existing works
• Implementation of the detection methods
• Grid-based rating system proposed. Useful for:
• Vocabulary registry maintainers
• Vocabulary developers and creators
• Execution over 355 vocabularies
• All good practices and pitfalls were observed
• Some of them surprisingly (e.g., P40. Namespace hijacking)
• LOV vocabularies seem to be well maintained and likely to be of high quality. Is this due to semi-handcrafted maintenance rather than crawlers?
13. Future work (I)
[Figure: the Linked Open Data 5 Star (LOD1 to LOD5) and Is your linked data vocabulary 5-star? (LDV1 to LDV5) schemes]
• Take into account:
  • metadata about licences
  • other metadata, e.g., creators, authors, dates, languages, etc.
  • linguistic information
  • reused terms from other vocabularies
• Provide guidelines to solve pitfalls and to follow good practices
• Execute the methods over LOV on a regular basis
  • Observe the evolution of the ecosystem
  • Draw trends for vocabulary publication
14. Future work (II)
• Integration with third-party systems, e.g.:
  • LOV search
  • OOPS! - OntOlogy Pitfall Scanner! (http://oeg-upm.net/oops/)
    ✓ Done for pitfalls
• Assign importance levels for good practices and pitfalls
  ✓ Done for pitfalls
…
…
…
…