Presentation of our paper at the WHISE workshop at ESWC 2016 on requirements for metadata over non-public datasets for the science & technology studies field.
Managing Metadata for Science and Technology Studies: the RISIS case
1. Managing Metadata for Science, Technology and
Innovation Studies: The RISIS Case
Al Koudous Idrissou, Ali Khalili, Rinke Hoekstra and Peter van den Besselaar
Vrije Universiteit Amsterdam/University of Amsterdam
rinke.hoekstra@vu.nl
• 6 public resea
• Goal: promote a
to advance science
universities
public research org
oal: promote a distribute
advance science & innova
• 6 public resea
• Goal: promote a
to advance science
2. Science & Technology
Studies
• Study the dynamics of scientific
ideas.
• Interaction between academia,
business and government.
• Highly interdisciplinary
social sciences, economics,
political science, humanities
• Highly heterogeneous data
structured vs. unstructured
qualitative vs. quantitative
3. The RISIS Project
• "an explosion of experimental datasets since 2000 … mostly thanks to EC
supported project"
• A distributed research infrastructure to advance science & innovation studies
• Serving research:
consolidate and integrate existing datasets
complement with new datasets on key issues currently not covered
develop software platforms to support research
(extract, integrate, structure and treat semantic web data)
• Serving society:
A radically improved evidence base for research & innovation policies
4. Six Use Cases
1. Where do what types of firms innovate, how do they develop, where do they
grow fastest?
2. How stable and large are EU-promoted networks? How do joint funding and
emerging science & technologies affect Europe?
3. What is the quality and extent of public sector research? Build registers at a
European level, integrated views of excellence (leiden ranking etc.)
4. Track the careers of researchers across borders
5. Effect and impact of research & innovation studies
6. Develop integrated data and tools for researchers in the field
5. Six Use Cases
1. Where do what types of firms innovate, how do they develop, where do they
grow fastest?
2. How stable and large are EU-promoted networks? How do joint funding and
emerging science & technologies affect Europe?
3. What is the quality and extent of public sector research? Build registers at a
European level, integrated views of excellence (leiden ranking etc.)
4. Track the careers of researchers across borders
5. Effect and impact of research & innovation studies
6. Develop integrated data and tools for researchers in the field
6. The Types of Data in SMS
Data Integration
Organization Product Agreement
Person Policy
Policy
Evaluation Location
CIB ETER EUPRO JOREP Leiden-Ranking
MORE I Nano Profile SIPER VICO
Higher
Education
Firm
Funding
Body
Publication
Patent
Project
Investment
Funding
Program
7. Semantically Mapping Science (SMS)
DB DBDB DB
RISIS Private DataRISIS Public Data
VOID
RDF
VOID
RDF
VOID
RDF
VOID
[Linked Data] API
Data Cache
(Triple Store)
Data Viz. & Exploration views Interoperability with corTEXT
Named Entity
Recognition
[Linked] Open Data
Public Data Access Methods
(SPARQL, API, RSS,…)
Meta-data
Services
Basic Geo
Services
Innovative
Geo Services
Integration with
local datasets
Integration with
public datasets
Category
Services
Apps
Integration with
social data
Access Control
Service
Domain
Adaptation
Service
Identifier
Management
ServiceIdentity Resolution
Service
VOID
8. But wait a moment… hasn't this been done before?
• … solve a similar data integration problem
pharma (OpenPHACTS), socio-economic history, linguistics, media (CLARIAH, CEDAR,
etc.).
• … solve a similar data search, indexing and cataloguing problem
datahub.io, lodlaundromat.org
• … solve similar metadata representation problems
DCAT, VOID, etc.
14. Semantically Mapping Science (SMS)
DB DBDB DB
RISIS Private DataRISIS Public Data
VOIDVOIDVOID
SMS [Linked Data] API
Data Cache
(Triple Store)
Data Viz. & Exploration views Interoperability with corTEXT
Named Entity
Recognition
[Linked] Open Data
Meta-data
Services
Basic Geo
Services
Innovative
Geo Services
Integration with
local datasets
Integration with
public datasets
Category
Services
Apps
Integration with
social data
Domain
Adaptation
Service
Identifier
Management
Service
Identity Resolution
Service
Access Control Points
RDF
metadata
VOIDVOID
RDFstore
convert convert
metadata
metadata
RDF
metadata
convert
15. Semantically Mapping Science (SMS)
DB DBDB DB
RISIS Private DataRISIS Public Data
VOIDVOIDVOID
SMS [Linked Data] API
Data Cache
(Triple Store)
Data Viz. & Exploration views Interoperability with corTEXT
Named Entity
Recognition
[Linked] Open Data
Meta-data
Services
Basic Geo
Services
Innovative
Geo Services
Integration with
local datasets
Integration with
public datasets
Category
Services
Apps
Integration with
social data
Domain
Adaptation
Service
Identifier
Management
Service
Identity Resolution
Service
Access Control Points
RDF
metadata
VOIDVOID
RDFstore
convert convert
metadata
metadata
RDF
metadata
convert
How can we still provide an integrated view on this data?
16. Semantically Mapping Science (SMS)
DB DBDB DB
RISIS Private DataRISIS Public Data
VOIDVOIDVOID
SMS [Linked Data] API
Data Cache
(Triple Store)
Data Viz. & Exploration views Interoperability with corTEXT
Named Entity
Recognition
[Linked] Open Data
Meta-data
Services
Basic Geo
Services
Innovative
Geo Services
Integration with
local datasets
Integration with
public datasets
Category
Services
Apps
Integration with
social data
Domain
Adaptation
Service
Identifier
Management
Service
Identity Resolution
Service
Access Control Points
RDF
metadata
VOIDVOID
RDFstore
convert convert
metadata
metadata
RDF
metadata
convert
How can we still provide an integrated view on this data?
Do existing vocabularies suffice?
17. How do experts assess the suitability of a dataset?
• Knowledge acquisition & elicitation with experts
interviews -> first design -> user experiences -> revise & adapt
• Distinguish between private, publicly accessible, and other public data.
18. How do experts assess the suitability of a dataset?
• Knowledge acquisition & elicitation with experts
interviews -> first design -> user experiences -> revise & adapt
• Distinguish between private, publicly accessible, and other public data.
1. User friendly web interface for viewing dataset metadata;
2. Show conditions under which the data can be used;
3. Provide detailed information about the dataset, to
4. Enable users to gain an in depth understanding of the data;
5. Facilitate trust (quality assessment);
6. Allow for both simple and advanced search (background knowledge)
19. Operationalisation
1. User interface
categorisation of different types of metadata, non-technical terms, hints
2. Usage conditions
legal aspects, access conditions, but also technical (data format, size, model)
3. Information & Understanding
overview, content description, temporal aspects, structure of the data
4. Trust
provenance and origin of data, when, how, and by whom it was created
5. Search
All of the above + the use of external knowledge sources to show connections
20. Operationalisation
1. User interface
categorisation of different types of metadata, non-technical terms, hints
2. Usage conditions
legal aspects, access conditions, but also technical (data format, size, model)
3. Information & Understanding
overview, content description, temporal aspects, structure of the data
4. Trust
provenance and origin of data, when, how, and by whom it was created
5. Search
All of the above + the use of external knowledge sources to show connections
Fig. 2. RISIS metadata coverage overview through knowledge type categorization.
21. In more detail
and generic domain of the problem, SMS is intended to be useful not only for
STIS but also for the humanities and social sciences.
5 Conclusions & Future Work
This paper presents an approach for managing metadata in the field of science,
technology and innovation studies. The approach was developed and applied in
the context of the RISIS-SMS project with the goal of supporting data integra-
tion, discovery and search across datasets, maintaining privacy, and obtaining
user trust while focussing on data that are not directly accessible. A contribu-
tion of this work is the requirements elicited by interviewing the stakeholders.
The requirement analysis guided the design of a new vocabulary, together with
review of existing metadata vocabularies that helped us filling in part of the
metadata needed to accommodate the domain needs. Additionally, to meet the
requirements, we designed and implemented a user-friendly interface which al-
lows non-expert users to easily author metadata in RDF.
As future work, we envisage to extend our vocabulary to cover aspects related
to the quality and provenance of data. We also plan to conduct a usability
evaluation with end-users of the system to ensure that our user interface and
metadata specifications fulfil the user needs.
References
1. P. Ciccarese, S. Soiland-Reyes, K. Belhajjame, A. J. Gray, C. Goble, and T. Clark.
Pav ontology: provenance, authoring and versioning. Journal of biomedical seman-
tics, 4(1):1–22, 2013.
2. C. Daraio, M. Lenzerini, C. Leporelli, H. F. Moed, P. Naggar, A. Bonaccorsi, and
A. Bartolucci. Data integration for research and innovation policy: an ontology-
based data management approach. Scientometrics, pages 1–15, 2015.
3. P. Groth, A. Loizou, A. J. Gray, C. Goble, L. Harland, and S. Pettifer. Api-centric
linked data integration: The open phacts discovery platform case study. Web Se-
mantics: Science, Services and Agents on the World Wide Web, 29:12–18, 2014.
4. E. J. Hackett, O. Amsterdamska, M. Lynch, and J. Wajcman. The handbook of
science and technology studies. The MIT Press, 2008.
5. A. Khalili, A. Loizou, and F. van Harmelen. Adaptive linked data-driven web
components: Building flexible and reusable semantic web interfaces. Semantic Web
Conference (ESWC) 2016, 2016.
6. J. P. McCrae, P. Labropoulou, J. Gracia, M. Villegas, V. Rodr´ıguez-Doncel, and
P. Cimiano. One ontology to bind them all: The meta-share owl ontology for the
interoperability of linguistic datasets on the web. In The Semantic Web: ESWC
2015 Satellite Events, pages 271–282. Springer, 2015.
7. A. Mero˜no-Pe˜nuela, A. Ashkpour, M. Van Erp, K. Mandemakers, L. Breure,
A. Scharnhorst, S. Schlobach, and F. Van Harmelen. Semantic technologies for
historical research: A survey. Semantic Web, 6(6):539–564, 2014.
8. P. Van den Besselaar. The cognitive and the social structure of sts. Scientometrics,
51(2):441–460, 2001.
Fig. 4. The RISIS Ontology and the vocabularies it reuses.
22. In more detail
and generic domain of the problem, SMS is intended to be useful not only for
STIS but also for the humanities and social sciences.
5 Conclusions & Future Work
This paper presents an approach for managing metadata in the field of science,
technology and innovation studies. The approach was developed and applied in
the context of the RISIS-SMS project with the goal of supporting data integra-
tion, discovery and search across datasets, maintaining privacy, and obtaining
user trust while focussing on data that are not directly accessible. A contribu-
tion of this work is the requirements elicited by interviewing the stakeholders.
The requirement analysis guided the design of a new vocabulary, together with
review of existing metadata vocabularies that helped us filling in part of the
metadata needed to accommodate the domain needs. Additionally, to meet the
requirements, we designed and implemented a user-friendly interface which al-
lows non-expert users to easily author metadata in RDF.
As future work, we envisage to extend our vocabulary to cover aspects related
to the quality and provenance of data. We also plan to conduct a usability
evaluation with end-users of the system to ensure that our user interface and
metadata specifications fulfil the user needs.
References
1. P. Ciccarese, S. Soiland-Reyes, K. Belhajjame, A. J. Gray, C. Goble, and T. Clark.
Pav ontology: provenance, authoring and versioning. Journal of biomedical seman-
tics, 4(1):1–22, 2013.
2. C. Daraio, M. Lenzerini, C. Leporelli, H. F. Moed, P. Naggar, A. Bonaccorsi, and
A. Bartolucci. Data integration for research and innovation policy: an ontology-
based data management approach. Scientometrics, pages 1–15, 2015.
3. P. Groth, A. Loizou, A. J. Gray, C. Goble, L. Harland, and S. Pettifer. Api-centric
linked data integration: The open phacts discovery platform case study. Web Se-
mantics: Science, Services and Agents on the World Wide Web, 29:12–18, 2014.
4. E. J. Hackett, O. Amsterdamska, M. Lynch, and J. Wajcman. The handbook of
science and technology studies. The MIT Press, 2008.
5. A. Khalili, A. Loizou, and F. van Harmelen. Adaptive linked data-driven web
components: Building flexible and reusable semantic web interfaces. Semantic Web
Conference (ESWC) 2016, 2016.
6. J. P. McCrae, P. Labropoulou, J. Gracia, M. Villegas, V. Rodr´ıguez-Doncel, and
P. Cimiano. One ontology to bind them all: The meta-share owl ontology for the
interoperability of linguistic datasets on the web. In The Semantic Web: ESWC
2015 Satellite Events, pages 271–282. Springer, 2015.
7. A. Mero˜no-Pe˜nuela, A. Ashkpour, M. Van Erp, K. Mandemakers, L. Breure,
A. Scharnhorst, S. Schlobach, and F. Van Harmelen. Semantic technologies for
historical research: A survey. Semantic Web, 6(6):539–564, 2014.
8. P. Van den Besselaar. The cognitive and the social structure of sts. Scientometrics,
51(2):441–460, 2001.
Fig. 4. The RISIS Ontology and the vocabularies it reuses.
Fig. 3. RISIS's Ontology. A view over mapped vocabularies reused.
respectively The Dublin Core metadata Element Set9
which is a ”vocabulary of
fifteen properties for use in resource description”, The PROV Ontology10
which is
23. Ali Khalili, Antonis Loizou and Frank van Harmelen. Adaptive Linked Data-driven Web Components: Building Flexible and Reusable Semantic Web Interfaces
24. Ali Khalili, Antonis Loizou and Frank van Harmelen. Adaptive Linked Data-driven Web Components: Building Flexible and Reusable Semantic Web Interfaces
25. Ali Khalili, Antonis Loizou and Frank van Harmelen. Adaptive Linked Data-driven Web Components: Building Flexible and Reusable Semantic Web Interfaces
26. Discussion
• Science & innovation studies thrives on diverse and heterogeneous data
• Existing platforms do not take access restrictions into account, or
• … they do not provide sufficiently descriptive metadata to support research
• We performed a requirements analysis for minimal metadata needs
• Resulting in a vocabulary that integrates and connects existing standards, and
• … drives a Linked Data driven data search portal.