3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data
1. 19/03/2014
3LD: Towards high quality, industry-ready
Linguistic Linked Licensed Data
Daniel Vila-Suero (1), Victor Rodríguez-Doncel (1), Asunción Gómez-Pérez (1), Philipp Cimiano (2), John P. McCrae (2), and Guadalupe Aguado-de-Cea (1)
(1) Ontology Engineering Group, Facultad de Informática, UPM. Madrid, Spain
{dvila, vrodriguez, asun, lupe}@fi.upm.es
(2) Forschungsbau Intelligente Systeme (FBIIS). Universität Bielefeld. Bielefeld, Germany
{cimiano, jmccrae}@cit-ec.uni-bielefeld.de
2. 19/03/2014 Daniel Vila-Suero
Context: Lider project
• Ecosystem of linguistic resources (corpora, lexico-semantic data, etc.) as Linked Data and NLP services to support content analytics.
Join us!
http://lider-project.eu
Linked Data for Language Technologies
Community Group (LD4LT)
3. 19/03/2014 Daniel Vila-Suero
Licensing Linked Data, why?
Open Data:
- Gain visibility
- Encourage re-use
Proprietary Data:
- Protect your data
- Enable ways to track usage
- Think about new business models
4. 19/03/2014 Daniel Vila-Suero
How open is the LOD cloud?
[1] Rodriguez-Doncel, Victor et al., 2013. Rights declaration in Linked Data. In Proc. of the 3rd Int. Workshop on Consuming Linked Data, O. Hartig et al. (Eds.), CEUR vol. 1034 (2013)
5. 19/03/2014 Daniel Vila-Suero
How open is the LOD cloud?
• 338 datasets in :
[1] Rodriguez-Doncel, Victor et al., 2013. Rights declaration in Linked Data. In Proc. of the 3rd Int. Workshop on Consuming Linked Data, O. Hartig et al. (Eds.), CEUR vol. 1034 (2013)
6. 19/03/2014 Daniel Vila-Suero
Linguistic Linked Data
1 "Open Data and Linguistics" working group, Open Knowledge Foundation, see more: http://linguistics.okfn.org/
Language resources as Linked Data:
- Lexica
- Language descriptions
- Corpora
- ...
Linguistic LOD (LLOD) cloud
9. 19/03/2014 Daniel Vila-Suero
What is 3LD?
3LD
Linguistic Linked Licensed Data
Language resources such as:
- Lexica
- Corpora
- Dictionaries, ...
10. 19/03/2014 Daniel Vila-Suero
What is 3LD?
3LD
Linguistic Linked Licensed Data
Linguistic data as Linked Data using RDF and standard data models (vocabularies):
- Lexica
- Corpora, ...
NIF: NLP Interchange Format
11. 19/03/2014 Daniel Vila-Suero
What is 3LD?
3LD
Linguistic Linked Licensed Data
Linguistic Linked Data published along with a machine-readable license.
ODRL: Open Digital Rights Language
NIF: NLP Interchange Format
16. 19/03/2014 Daniel Vila-Suero
Guideline: Licensing models & mechanisms
1 Add "rights" metadata in the dataset description (e.g., VoID, DCAT)
2 Use standard predicates to declare "rights" statements (e.g., Dublin Core terms: dc:rights, dct:license)
3 Standard license available?
3a Yes: use the URI of a standard license, e.g., CC0
3b No: use a rights declaration language, e.g., ODRL
ODRL: Open Digital Rights Language
DCAT: Data Catalog Vocabulary
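As a rough illustration of the guideline above, the following Turtle sketch describes two datasets with DCAT and attaches rights information through dct:license. Every example.org URI and both dataset titles are hypothetical. Pointing dct:license at an ODRL policy in case 3b is one possible convention, assumed here for illustration rather than taken from the slides.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix odrl: <http://www.w3.org/ns/odrl/2/> .
@prefix ex:   <http://example.org/> .

# Case 3a: a standard license fits, so reference it by its URI (here CC0).
ex:openLexicon a dcat:Dataset ;
    dct:title "Example lexicon (hypothetical)" ;
    dct:license <http://creativecommons.org/publicdomain/zero/1.0/> .

# Case 3b: no standard license fits, so the terms are expressed as an ODRL policy.
ex:restrictedCorpus a dcat:Dataset ;
    dct:title "Example corpus (hypothetical)" ;
    dct:license ex:corpusPolicy .

# Hypothetical policy: permits reproducing the corpus and nothing else.
ex:corpusPolicy a odrl:Policy , odrl:Set ;
    odrl:permission [
        odrl:target ex:restrictedCorpus ;
        odrl:action odrl:reproduce
    ] .
```

Because both cases use the same predicate, a consumer can look up dct:license on any dataset description and then check whether the object is a well-known license URI or an ODRL policy to be interpreted.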
17. 19/03/2014 Daniel Vila-Suero
Demo: Conditional access to Linked Data
• Prototype developed at the Ontology Engineering Group.
• A license-aware Linked Data server and a manager for data policies and licenses
• Using Web standards (DCAT descriptions, SPARQL constructs, ODRL RDF policies, etc.)
Victor Rodríguez-Doncel
vrodriguez@fi.upm.es
18. 19/03/2014 Daniel Vila-Suero
Demo: Use case
• Spanish geographical data: administrative units, geopositions, links to DBpedia
1 Browse the data (user)
2 Set policies for parts of the dataset (admin)
3 Gain access to the restricted data (user)
27. 19/03/2014 Daniel Vila-Suero
Gain access to restricted data (user)
<http://localhost:99/ldr/policy/ee32f675-ccae-4ca9-a544-3c07abf0b16e>
a <http://www.w3.org/ns/odrl/2/Policy> , <http://www.w3.org/ns/odrl/2/Set>;
<http://www.w3.org/2000/01/rdf-schema#comment>
"Individual triples are available upon payment of 1 euro cent" ;
<http://www.w3.org/ns/odrl/2/permission>….
The work I will present today is a collaboration between the Ontology Engineering Group at Universidad Politécnica de Madrid and Universität Bielefeld, but it is also the result of many discussions among the partners of the EU project Lider.
But what is Lider? Lider is a support and coordination action whose goal is to set the pathway for the creation of an ecosystem of linguistic Linked Data and NLP services to support enterprise content analytics in Europe. A crucial step towards this is to listen to industry and the community, so please join us in the newly created W3C Linked Data for Language Technologies Community Group.
In these discussions with the community there are several recurring topics, such as data modelling, quality, and provenance. One of them seems to be especially relevant, and it will be the main topic of this talk.
The main outcomes of the project will be a roadmap for the EU, several guidelines to help data publishers and consumers, a reference architecture, and an industrial community.
As you might have guessed from the title, the topic is licensing Linked Data, and in particular Linguistic Linked Data. But why is this important? Whether you are publishing Open Data or data under more restrictive terms, you and your potential data consumers will benefit from providing a license along with your data.
In the case of Open Data …
For data under more restrictive terms of use …
Given that everyone seems to agree on this, what is the current practice? In 2013, members of my group performed a study on the so-called Linked Data cloud, and the results were a bit surprising (or maybe not).
Although there are many green areas (with licenses such as public domain or those that require attribution), you can see several red and orange areas corresponding to restrictive licenses, and a considerable mass of grey, which corresponds to
If you are interested, you can read the paper, but as you can see in this graph, almost 50% of the datasets are published either under unspecified licenses or even under closed licenses.
Going back to our topic, Linguistic Linked Data: as you might be aware, a new cloud of Linguistic Linked Data (LLD) has recently emerged with the support of the Open Data and Linguistics working group. One can look at this cloud from several perspectives (language, type of resource, data models, or quality), but what about the licenses? How open is this cloud?
In this case we found that it is certainly more open than the LOD cloud, although there is still around 13 percent with unspecified or restrictive licenses. Additionally, this cloud has been selected and curated by a working group, but what will happen when the scope gets broader and includes resources from ELRA or META-SHARE, for example?
This concern is why we have come up with the concept of 3LD, which stands for Linguistic Linked Licensed Data.