Datasets with bioschemas
1. Datasets with Bioschemas
Alejandra Gonzalez-Beltran(*), Philippe Rocca-Serra(*),
Susanna Sansone(*) and the bioCADDIE Team and Community
(*) Oxford e-Research Centre, University of Oxford
Bioschemas community meeting,
Harpenden, Hertfordshire, UK
8th-9th November 2016
2. The problem
how to describe scientific(*)
datasets to enable data discovery
(*) considering in particular
biological and biomedical datasets
3. Design principles
The model for data description is to be designed around the
Dataset entity, i.e. a unit of information stored by a data
repository:
● Archived experimental datasets, which do not change after
deposition to the repository; e.g. dbGaP, GEO,
ClinicalTrials.gov
● Datasets in reference knowledge bases, describing dynamic
concepts, such as “genes” whose definition morphs over
time; e.g. UniProt
Additionally:
● A dataset and related datasets may be available in multiple
repositories
● A dataset may be available in multiple forms
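The design principles above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the DATS model itself: the class and field names (`Dataset`, `Distribution`, `is_archived`, `related`) are hypothetical and were chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One concrete form of a dataset held by one repository (hypothetical name)."""
    repository: str    # name of the repository holding this form
    media_format: str  # the form in which the dataset is available


@dataclass
class Dataset:
    """A unit of information stored by a data repository (illustrative only)."""
    title: str
    # True: archived, fixed after deposition.
    # False: knowledge-base entry whose definition may change over time.
    is_archived: bool
    distributions: List[Distribution] = field(default_factory=list)
    related: List["Dataset"] = field(default_factory=list)

    def repositories(self) -> List[str]:
        """Return the distinct repositories holding this dataset, in insertion order."""
        seen: List[str] = []
        for dist in self.distributions:
            if dist.repository not in seen:
                seen.append(dist.repository)
        return seen
```

Keeping distributions separate from the dataset itself is one way to express that the same dataset may sit in several repositories and in several forms without duplicating its description.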
6. Extracting requirements from use cases
❖ Selected competency questions
✧ representative set collected from: use cases workshop, white
paper, community submissions, and NIH and Phil
Bourne’s ADDS office
✧ key metadata elements processed: abstracted, color-coded, and
terms binned as Material, Process, Information,
Properties; relations identified
7. Mapping existing metadata schemas/models
❖ schema.org
❖ DataCite
❖ RIF-CS
❖ W3C HCLS dataset descriptions (mapping of many models including DCAT, PROV,
VoID, Dublin Core)
❖ Project Open Metadata (used by HealthData.gov; being added in this new
iteration)
❖ ISA
❖ BioProject
❖ BioSample
❖ MINiML
❖ PRIDE-ml
❖ MAGE-TAB
❖ GA4GH metadata schema
❖ SRA XML
❖ CDISC SDM / element of BRIDG model
bottom-up approach
https://biosharing.org/collection/bioCADDIE
https://github.com/biocaddie/WG3-MetadataSpecifications/
8. DATS: DatA Tag Suite
Core entities
Biomedical extension
Just as JATS (Journal Article Tag Suite) is used by
PubMed to index literature,
DATS (DatA Tag Suite) is needed as a scalable way to
index data sources in the DataMed prototype
https://github.com/biocaddie/WG3-MetadataSpecifications/
14. Mapping DATS to schema.org
✧ Missing elements (needed by DATS) submitted to
the tracker. Roughly 80% of DATS entities and
properties can be mapped (although the alignment is
imperfect or less precise); the remaining 20%
constitute major gaps
✧ Tracking schema.org and its related Health and
Life Science extension evolution (the latter
focuses on clinical studies)
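A mapping exercise of this kind can be tracked with a simple table and a coverage figure. The sketch below is purely illustrative: the term names in `EXAMPLE_MAPPING` are hypothetical placeholders, not the actual DATS to schema.org mapping, and `coverage` is a helper name made up for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical placeholder entries: source term -> target term,
# or None where no target term exists (a gap to report to the tracker).
EXAMPLE_MAPPING: Dict[str, Optional[str]] = {
    "source:termA": "target:termA",
    "source:termB": "target:termB",
    "source:termC": "target:termC",
    "source:termD": "target:termD",
    "source:termE": None,
}


def coverage(mapping: Dict[str, Optional[str]]) -> Tuple[float, List[str]]:
    """Return the fraction of mapped terms and the sorted list of unmapped terms."""
    gaps = sorted(term for term, target in mapping.items() if target is None)
    if not mapping:
        return 0.0, gaps
    mapped_count = len(mapping) - len(gaps)
    return mapped_count / len(mapping), gaps


ratio, gaps = coverage(EXAMPLE_MAPPING)
print(f"mapped: {ratio:.0%}, gaps: {gaps}")
```

A boolean mapped/unmapped flag does not capture imprecise alignments; a real tracking table would probably also record a match quality for each mapped term.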
20. DATS exported by
https://github.com/datacite/spinone/issues/3
● An API endpoint that returns
DataCite metadata in DATS
format is a work in progress:
http://api.datacite.org/dats
● The DataCite Metadata Schema allows
for a RelatedIdentifier with
the HasMetadata relation type;
this allows linking to the DATS
metadata from a DataCite
metadata record
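The linking mechanism described above can be sketched as follows. This is a minimal illustration using Python's standard library, not DataCite's own tooling: the function name `has_metadata_link` and the example URL are assumptions, and the element is built without the DataCite XML namespace for brevity.

```python
import xml.etree.ElementTree as ET


def has_metadata_link(dats_url: str) -> ET.Element:
    """Build a relatedIdentifier element pointing at a DATS metadata document.

    Hypothetical helper: a real DataCite record would place this element
    inside <relatedIdentifiers> within a namespaced <resource> document.
    """
    elem = ET.Element(
        "relatedIdentifier",
        {
            "relatedIdentifierType": "URL",
            "relationType": "HasMetadata",
        },
    )
    elem.text = dats_url
    return elem


# Placeholder URL, assumed for illustration only.
link = has_metadata_link("https://example.org/dats/record.json")
print(ET.tostring(link, encoding="unicode"))
```

A consumer reading the DataCite record could then follow the `HasMetadata` link to fetch the richer DATS description.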
Martin Fenner
DataCite