This document summarizes a presentation on Research Objects given on October 29, 2018 in Amsterdam. It discusses how Research Objects can bundle together different components of a research investigation, such as data, methods, provenance, and results, to facilitate exchange, reproducibility, and preservation of research. The Research Object framework carries machine-readable metadata about these components. Examples are given of Research Objects that bundle workflows and computational experiments. Challenges and opportunities are discussed around developing community tools and standards to work with Research Objects.
2. Research Object
Community Update
Carole Goble, Stian Soiland-Reyes, Sean Bechhofer
The University of Manchester, UK
carole.goble@manchester.ac.uk
RO2018, 29 October 2018, Amsterdam.
Satellite workshop of IEEE 14th International Conference on e-Science 2018
3. 2010 - Research Objects not PDFs
Research has many components, of many types
Exchange of all the components of an investigation
Computational instruments break or need to be maintained
Science, and its products, evolve
5. Research Object Framework
Bechhofer et al (2013) https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) https://eprints.soton.ac.uk/268555/
carry machine processable metadata common and specific to different object types
bundle together and relate digital resources with their context
snapshot, cite, exchange
Standards-based generic metadata framework
Data used and results produced
Methods used to produce / analyse that data
Provenance and settings, People involved,
Annotations understanding & interpretation
6.
Howard Ratner,
Chair STM Future Labs Committee, CEO EVP Nature Publishing Group
Director of Development for CHORUS (Clearinghouse for the Open Research of US)
STM Innovations Seminar 2012
http://www.youtube.com/watch?v=p-W4iLjLTrQ&list=PLC44A300051D052E5
7. Container
"Unbounded" Objects
Bags of things and external references to things
A Digital Package Object
Type composed of many interrelated elements that bundles together and relates digital resources of a scientific investigation with context.
A Metadata Object
that represents properties in common across all research artefact types, common PIDs and metadata
9. Workflow RO
Describe and run workflows, and the command line tools they orchestrate, supporting containers to be portable, transparent and interoperable.
Describe the workflow inputs, outputs, tools and data with controlled vocabularies / ontologies
EDAM
Describe the provenance of the workflow
Software components are containerised to be portable
Workflow systems run the CWL workflow
Gather the CWL workflow descriptions + rich context, provenance using multi-tiered descriptions
Snapshot workflow. Relate it to other objects.
Archive formats to contain the object
Container
Metadata
https://www.commonwl.org/
17. Container Profiles
Research Object Bundle 1.0
Specification for a structured ZIP-file, based on the ePub and Adobe UCF specifications
https://researchobject.github.io/specifications/bundle/
BagIt: specifies a file system structure for transferring and archiving a collection of files, including checksums to verify and validate content, and brief metadata.
https://github.com/ResearchObject/bagit-ro
BagIt: mechanism for serialization and transport consistency
Research Object: capture identity, annotations and provenance of the resources
Big Data collections of arbitrary referenced content
https://github.com/fair-research/bdbag
18. Manifest Profile Description
general construction & validation tooling
Linked Data and RDF Shapes
Validate graph-based data against a set of conditions
Shapes Constraint Language
Gamble, Zhao, Klyne, Goble. IEEE eScience 2012,
http://dx.doi.org/10.1109/eScience.2012.6404489
Minim model for defining checklists
ro-show
• RO pre-processing to merge to single graph
• RDF Shape that indicates to follow links
• Bespoke validators / unpackers to iterate over the RO
[Lilian Gorea, Oluwatomide Fasugba 2018]
19. Research Object drivers
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, DOI: 10.1007/978-3-642-37186-8_1
Exchange & Commons
Preservation and fixed point publishing
Reproducibility and execution
Active "release" research
24. Acknowledgements
Barend Mons
Sean Bechhofer
Matthew Gamble
Raul Palma
Jun Zhao
Mark Robinson
Alan Williams
Norman Morrison
Stian Soiland-Reyes
Tim Clark
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
Daniel Garijo
Catarina Martins
Iain Buchan
Michael Crusoe
Rob Finn
Carl Kesselman
Ian Foster
Kyle Chard
Vahan Simonyan
Ravi Madduri
Raja Mazumder
Gil Alterovitz
Denis Dean II
Durga Addepalli
Wouter Haak
Anita De Waard
Paul Groth
Oscar Corcho
Josh Sommer
Project ID: 675728
Editor's Notes
Nested content
Heterogeneous elements.
Distributed and embedded content.
Externally stewarded content.
Checklists + Checksums
Standards-based generic
metadata framework
A unit for exchanges and turning…
Validation, Verification
Provenance
Dependencies Versions
Checklists Variance
Portability
Transparent Processes
Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
Fixed and Living
This is what we are turning
More than Data, More than one store
Let's make this more concrete before we go into the HOW
Standards-based metadata framework for logically and physically bundling resources with context, http://researchobject.org
Bigger on the inside than the outside - external referencing
Research Object Framework: Standards-based metadata framework for logically and physically bundling resources with context
In Seven Bridges Language
The abstract one is the task (blue)
The results as well (green) is the analysis
Studies are lots of analyses
Community led standard way of expressing and running workflows and the command line tools they orchestrate, supporting containers for portability.
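As a rough illustration of what such a workflow tool description carries, the sketch below builds a minimal CWL CommandLineTool document as a Python dict and serializes it as JSON (which is also valid YAML). The wrapped command (`wc -l`), the container image name and the helper `make_cwl_tool` are hypothetical choices for this example, not taken from the presentation.

```python
import json


def make_cwl_tool(base_command, docker_image):
    """Build a minimal CWL CommandLineTool description as a plain dict."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": base_command,
        # Container hint so the tool can run portably (image is hypothetical).
        "hints": {"DockerRequirement": {"dockerPull": docker_image}},
        # One input file, passed as the first positional argument.
        "inputs": {
            "input_file": {"type": "File", "inputBinding": {"position": 1}},
        },
        # Capture standard output as the tool's single output.
        "outputs": {"line_count": {"type": "stdout"}},
        "stdout": "count.txt",
    }


tool = make_cwl_tool(["wc", "-l"], "debian:stable-slim")
cwl_text = json.dumps(tool, indent=2)
print(cwl_text)
```

A CWL-aware workflow system could then run the saved description against a concrete input; the description itself is what a Workflow RO would gather alongside provenance and context.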
JSON-LD
Predated the FAIR Principles
Element enumeration
Identification & citation
Description tracking attributes (metadata) and origins (provenance) of contents.
Simplicity - low user overhead and thin (no) client
Research Objects are designed to: be tailored to be domain or type specific; work at many levels of granularity, with their own identifiers, citation metadata; and be snapshots or living to suit their place in the research lifecycle.
Data used and results produced …
Methods employed to produce and analyse that data …
Provenance and settings …
People involved …
Annotations understanding & interpretation …
Describe how to describe
It's a scaffold
In the presentation we outline features of the latest specifications (https://w3id.org/ro/2016-01-28)
Under the hood building blocks
BagIt is an Internet Draft that specifies a file system structure for transferring and archiving a collection of files, including their checksums and brief metadata.
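A minimal sketch of the BagIt idea in Python, assuming the usual layout of a `bagit.txt` declaration, a `data/` payload directory and a `manifest-sha256.txt` of checksums. The helper names `make_bag` and `verify_bag` and the example payload are made up for illustration; a real bag would normally be produced with an existing BagIt library.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir, payload):
    """Write payload files under data/ plus bagit.txt and a SHA-256 manifest."""
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (bag / "data" / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag


def verify_bag(bag_dir):
    """Return the payload paths whose checksum no longer matches the manifest."""
    bag = Path(bag_dir)
    failures = []
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            failures.append(rel_path)
    return failures


with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(Path(tmp) / "example-bag", {"results.csv": b"a,b\n1,2\n"})
    ok_before = verify_bag(bag)            # untouched bag: nothing fails
    (bag / "data" / "results.csv").write_bytes(b"tampered")
    bad_after = verify_bag(bag)            # modified payload is detected
print(ok_before, bad_after)
```

The checksum manifest is what gives the bag its transport consistency: any change to the payload after packaging shows up on verification.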
Research Object Bundle is a specification for a structured ZIP-file, based on the ePub and Adobe UCF specifications. The bundle serializes a Research Object, embedding some or all of its resources within the ZIP file, and lists the RO content in a manifest, in addition to embedding and referencing annotations and provenance.
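The bundle layout can be sketched with Python's standard `zipfile` module. The media type string, the `.ro/manifest.json` path, the `@context` URL and the `aggregates`/`uri` field names below are written from memory of the Research Object Bundle 1.0 specification and should be checked against it; the helper `make_ro_bundle`, the file names and the example.org URL are hypothetical.

```python
import io
import json
import zipfile

# Assumed media type for RO bundles (verify against the specification).
MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def make_ro_bundle(resources, remote_uris):
    """Return the bytes of a ZIP with embedded resources and a manifest."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        # Embedded files and external references are aggregated side by side.
        "aggregates": (
            [{"uri": "/" + name} for name in sorted(resources)]
            + [{"uri": uri} for uri in remote_uris]
        ),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # ePub/UCF style: the mimetype entry comes first, stored uncompressed.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name in sorted(resources):
            bundle.writestr(name, resources[name],
                            compress_type=zipfile.ZIP_DEFLATED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                        compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


data = make_ro_bundle(
    {"workflow.cwl": b"cwlVersion: v1.0\n", "results.csv": b"a,b\n1,2\n"},
    ["https://example.org/big-dataset.csv"],
)
with zipfile.ZipFile(io.BytesIO(data)) as bundle:
    names = bundle.namelist()
    manifest = json.loads(bundle.read(".ro/manifest.json"))
print(names)
print([entry["uri"] for entry in manifest["aggregates"]])
```

Note how the remote dataset appears in the manifest without being embedded in the ZIP: the bundle can be "bigger on the inside than the outside".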
A BagIt bag can be considered a mechanism for serialization and transport consistency, while a Research Object can be considered a way to capture identity, annotations and provenance of the resources. As such, the two formats complement each other. They are, however, not directly compatible.
This GitHub repository explains by example a profile for a BagIt bag to also be a Research Object.
https://www.slideshare.net/jelabra/shex-vs-shacl
ShEx is schema based
SHACL is constraint based
encoding manifest content profiles using Linked Data and RDF Shapes in order to support general validation tools (https://github.com/researchobject/ro-show)
SHACL: validating graph-based data against a set of conditions. Among others, SHACL includes features to express conditions that constrain the number of values that a property may have, the type of such values, numeric ranges, string matching patterns, and logical combinations of such constraints. SHACL also includes an extension mechanism to express more complex conditions in languages such as SPARQL. A SHACL validation engine takes as input a data graph and a graph containing shapes declarations and produces a validation report that can be consumed by tools. All these graphs can be represented in any Resource Description Framework (RDF) serialization format, including JSON-LD or Turtle. The adoption of SHACL may influence the future of linked data.
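To make the data graph / shapes / report flow concrete, here is a toy stand-in written in plain Python. It is not a SHACL engine and does not parse RDF: triples are plain tuples, the "shape" is a dict, and only count and pattern constraints are checked. All identifiers (`ro:1`, `dct:creator`, the `validate` helper, the example.org URL) are made up for illustration.

```python
import re

RDF_TYPE = "rdf:type"


def validate(triples, shape):
    """Check every node of the shape's target class against its constraints."""
    results = []
    focus_nodes = sorted(
        s for s, p, o in triples
        if p == RDF_TYPE and o == shape["target_class"]
    )
    for node in focus_nodes:
        for constraint in shape["properties"]:
            path = constraint["path"]
            values = [o for s, p, o in triples if s == node and p == path]
            # Cardinality constraints (analogous to min/max count).
            if len(values) < constraint.get("min_count", 0):
                results.append((node, path, "too few values"))
            if len(values) > constraint.get("max_count", float("inf")):
                results.append((node, path, "too many values"))
            # String pattern constraint, applied to each value.
            pattern = constraint.get("pattern")
            for value in values:
                if pattern and not re.search(pattern, str(value)):
                    results.append((node, path, "pattern mismatch"))
    return {"conforms": not results, "results": results}


data_graph = {
    ("ro:1", RDF_TYPE, "ro:ResearchObject"),
    ("ro:1", "dct:creator", "https://example.org/people/alice"),
    ("ro:2", RDF_TYPE, "ro:ResearchObject"),  # no creator recorded
}
manifest_shape = {
    "target_class": "ro:ResearchObject",
    "properties": [
        {"path": "dct:creator", "min_count": 1, "pattern": r"^https?://"},
    ],
}
report = validate(data_graph, manifest_shape)
print(report)
```

A manifest profile expressed as real RDF shapes would play the role of `manifest_shape` here, with a generic validation engine replacing the hand-written loop.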
Exchange & Commons: The transfer of knowledge, data and results between the services and actors and the development of RO Commons to enable reuse and sharing. References to remote content allow for access management due to privacy restrictions or data scales.
Commons & Catalogues
Publishing,
Exchange between people and platforms
Sharing,
Training
Preservation: Snapshots of state of a collection of resources for fixed-point publishing.
Conservation
Repair
Archive
Reproducibility: Describing the structured collection, its components and its context in a rich enough way to support inspection and interpretation by people, and re-execution and comparison by computational machinery.
Execution
Replication
Reproducibility
Preservation
Portability
Active "release"-oriented research: Accumulating metadata to reflect versions, new configurations of content, evolutions, relationships between objects, and metadata reflecting who has added to the object or used it.
Active Research
Release
Evolution
Snapshots
Remixing,
Comparison,
Review
Automated processing
RO Composer
Community forking & fragmentation
Arose from workflow sharing and preservation
Atomicity
Granularity
Aggregation
Composition, Fragmentation
Versioning
Forking
Cloning
Portability
Dependency management