My presentation at http://neuroinformatics2017.org (Kuala Lumpur, Malaysia) on FAIR and FAIRsharing (previously BioSharing): metadata standards, their implementation by databases/repositories, and their adoption in journals' and funders' data policies.
FAIR and metadata standards - FAIRsharing and Neuroscience
1. FAIR digital research assets:
beyond the acronym
Susanna-Assunta Sansone, PhD
@SusannaASansone
ORCiD 0000-0001-5306-5690
Consultant,
Founding Academic Editor
Associate Director,
Principal Investigator
Neuroinformatics, Kuala Lumpur, 20-21 August, 2017
2. To do better science, more efficiently, we need data that are:
• Available in a public repository
• Findable through some sort of search facility
• Retrievable in a standard format
• Self-described so that third parties can make sense of it
• Intended to outlive the experiment for which they were collected
3. A set of principles for those wishing to enhance the value of their data holdings
5. Wider adoption of the FAIR principles, by research
infrastructure programmes, e.g.
7. Defining a framework for evaluating FAIRness
By the
fairmetrics.org
Working Group
8. NOTE:
The Principles are high-level; they do not suggest any specific technology, standard, or implementation solution
The Principles put emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals
Interoperability standards - the pillars of FAIR
9. The invisible machinery
• Identifiers and metadata to be implemented by technical experts in tools, registries, catalogues, databases, services
• It is essential to make standards "invisible" to lay users, who often have little or no familiarity with them
11. Metadata standards - fundamentals
• Descriptors for a digital object that help to understand what it is, where to find it, how to access it, etc.
• The type of metadata also depends on the type of digital object (e.g. software, dataset)
• The depth and breadth of metadata vary according to their purpose
§ e.g. reproducibility requires richer metadata than citation
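As an illustration of the point that the depth of metadata depends on its purpose, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `DigitalObjectMetadata` class, its field names, and the `REQUIRED_EXTRA` descriptor sets are hypothetical and do not come from any specific standard or from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObjectMetadata:
    """Hypothetical minimal descriptor set for a digital object."""
    identifier: str   # where to find it (e.g. a persistent identifier)
    title: str        # what it is
    object_type: str  # e.g. "dataset" or "software"
    access_url: str   # how to access it
    extra: dict = field(default_factory=dict)  # purpose-specific descriptors


# Assumed descriptor sets: reproducibility asks for more than citation.
REQUIRED_EXTRA = {
    "citation": {"creators", "year"},
    "reproducibility": {
        "creators", "year", "protocol", "parameters", "software_version",
    },
}


def missing_descriptors(record: DigitalObjectMetadata, purpose: str) -> set:
    """Return the purpose-specific descriptors that the record lacks."""
    return REQUIRED_EXTRA[purpose] - set(record.extra)


record = DigitalObjectMetadata(
    identifier="doi:10.0000/example",  # placeholder, not a real DOI
    title="Example imaging dataset",
    object_type="dataset",
    access_url="https://example.org/data/1",
    extra={"creators": ["A. Researcher"], "year": 2017},
)

print(missing_descriptors(record, "citation"))
print(sorted(missing_descriptors(record, "reproducibility")))
```

The same record is complete enough for citation but lacks the richer descriptors that the assumed reproducibility profile asks for.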
13. • Domain-level descriptors that are essential for interpretation, verification and reproducibility of datasets
• The depth and breadth of descriptors vary according to the domain, broadly covering the what, who, when, how and why, allowing:
§ experimental components (e.g., design, conditions, parameters),
§ fundamental biological entities (e.g., samples, genes, cells),
§ complex concepts (such as bioprocesses, tissues and diseases),
§ analytical processes and mathematical models, and
§ their instantiation in computational simulations (from the molecular level through to whole populations of individuals)
to be harmonized with respect to structure, format and annotation
Metadata standards - datasets
22. Domain-specific metadata standards for datasets
[Figure: example standards by type, developed by de jure standards organizations and de facto grass-roots groups]
Guidelines: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
Formats: SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA, CML, MITAB, …
Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
Counts shown on the slide: 220+, 115+, 548+, ~1000
24. • Perspective and focus vary, ranging:
§ from standards with a specific biological or clinical domain of study (e.g. neuroscience) or significance (e.g. model processes)
§ to the technology used (e.g. imaging modality)
• Motivation is different, spanning:
§ creation of new standards (to fill a gap)
§ mapping and harmonization of complementary or contrasting efforts
§ extensions and repurposing of existing standards
• Stakeholders are diverse, including those:
§ involved in managing, serving, curating, preserving, publishing or regulating data and/or other digital objects
§ in academia, industry, governmental sectors, and funding agencies
§ who are producers but also consumers of the standards, as domain (and not just technical) expertise is a must
A complex landscape
25. Standards' life cycle
• Formulation
§ use cases, scope, prioritization and expertise
• Development
§ iterations, tests, feedback and evaluation
§ harmonization of different perspectives and available options
• Maintenance
§ (exemplar) implementations, technical documentation, education material, metrics
§ sustainability, evolution (versions) and conversion modules
26. Fragmentation, duplications and gaps
[Figure: standards grouped by two views of the world, with a shared core]
Technologically-delineated views of the world: Arrays & Scanning, Columns, Gels, MS, FTIR, NMR, …
Biologically-delineated views of the world: transcriptomics, proteomics, metabolomics, plant biology, epidemiology, neuroscience
Generic features ("common core"):
- description of source biomaterial
- experimental design components
27. Modularization to combine and validate
[Figure: technology modules (Arrays & Scanning, Columns, Gels, MS, FTIR, NMR, …) and biology modules (transcriptomics, proteomics, metabolomics, plant biology, epidemiology, neuroscience) combined for specific investigations]
Examples:
- Proteomics-based investigations of neurodegenerative diseases
- Proteomics and metabolomics-based investigations of neurodegenerative diseases
28. Working in/across multiple domains is challenging
• Requires:
§ Mapping between/among heterogeneous representations
§ Conceptual modelling framework to encompass the domain-specific metadata standards
§ Tools to handle customizable annotation, multiple conversions and validation
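The mapping, conversion and validation requirements listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two representations, the field names in `FIELD_MAP_A_TO_B` and the required-field set are hypothetical and do not correspond to any real standard.

```python
# Hypothetical field names for two representations of the same sample record.
FIELD_MAP_A_TO_B = {
    "organism": "species",
    "tissue": "anatomical_part",
    "assay": "technology_type",
}

# Assumed minimum content for representation B.
REQUIRED_B_FIELDS = {"species", "anatomical_part", "technology_type"}


def convert_a_to_b(record_a: dict) -> dict:
    """Rename mapped fields; keep unmapped ones under an 'unmapped' key."""
    record_b = {"unmapped": {}}
    for key, value in record_a.items():
        if key in FIELD_MAP_A_TO_B:
            record_b[FIELD_MAP_A_TO_B[key]] = value
        else:
            record_b["unmapped"][key] = value
    return record_b


def validate_b(record_b: dict) -> list:
    """Return a sorted list of required fields missing from the record."""
    return sorted(REQUIRED_B_FIELDS - set(record_b))


source = {"organism": "Homo sapiens", "tissue": "hippocampus", "operator": "lab-1"}
converted = convert_a_to_b(source)
print(converted)
print(validate_b(converted))  # reports what the source record did not supply
```

Keeping unmapped fields rather than dropping them is a design choice in this sketch: it makes gaps in the mapping visible instead of silently losing information during conversion.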
30. Technical and social engineering required
• Pain points include:
§ Fragmentation
§ Coordination, harmonization, extensions
§ Credit, incentives for contributors
§ Governance, ownership
§ Indicators and evaluation methods
§ Outreach and engagement with all stakeholders
§ Synergies between basic and clinical/medical areas
§ Implementations: infrastructures, tools, services
§ Education, documentation and training
§ Funding streams
§ Business models for sustainability
35. • Consumers:
§ How do I find the standards appropriate for my case?
• Producers:
§ How do I make my standards visible to others?
Improving discoverability of standards
39. Standard developing groups, incl:
Journals, publishers, incl:
Cross-links, data exchange, incl:
Societies and organisations, incl:
Institutional RDM services, incl:
Projects, programmes:
Working with and for producers and consumers
41. …and to indicate "adoption"
Metadata standards: Formats, Terminologies, Guidelines
Databases/data repositories
Data policies by funders, journals and other organizations
42. Assign "indicators" to describe their status…
[Chart: counts per status indicator; values as extracted: 270, 48, 23, 2, 97, 87, 4, 204, 9, 6, 8]
Status indicators:
• Ready for use, implementation, or recommendation
• In development
• Status uncertain
• Deprecated as subsumed or superseded
All records are manually curated in-house and verified by the community behind each resource
Paper in preparation, preliminary information as of July 2017
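One way to picture the four status indicators on this slide is as an enumeration attached to each record. The sketch below is an assumption for illustration only: the `Status` enum, the record layout and the record names are hypothetical and do not reflect how FAIRsharing stores its data.

```python
from enum import Enum


class Status(Enum):
    """The four indicators named on the slide."""
    READY = "Ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "In development"
    UNCERTAIN = "Status uncertain"
    DEPRECATED = "Deprecated as subsumed or superseded"


# Hypothetical records; names and statuses are placeholders, not registry data.
records = [
    {"name": "standard-a", "status": Status.READY},
    {"name": "standard-b", "status": Status.IN_DEVELOPMENT},
    {"name": "standard-c", "status": Status.DEPRECATED},
    {"name": "standard-d", "status": Status.READY},
]


def recommendable(items: list) -> list:
    """Keep only the names of records whose indicator marks them as ready."""
    return [r["name"] for r in items if r["status"] is Status.READY]


def count_by_status(items: list) -> dict:
    """Count records per indicator, including indicators with no records."""
    counts = {status: 0 for status in Status}
    for r in items:
        counts[r["status"]] += 1
    return counts


print(recommendable(records))
for status, n in count_by_status(records).items():
    print(f"{status.value}: {n}")
```

A journal or funder drafting a data policy could use a filter like `recommendable` to restrict recommendations to standards that are ready for use.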
43. Help us map the neuroscience standards landscape
50. Journal Recommendations
Number of standards recommended by 68 journal/publisher policies (top one in parentheses)
Models/Formats: 6 out of 223 (ISA-Tab)
Reporting Guidelines: 26 out of 118 (MIAME)
Terminology Artifacts: 8 out of 343 (NCBI Tax)
Paper in preparation, preliminary information as of July 2017
Activating the decision-making chain
51. Journal Recommendations
Number of standards recommended by 68 journal/publisher policies (top one in parentheses)
Models/Formats: 6 out of 223 (ISA-Tab)
Reporting Guidelines: 26 out of 118 (MIAME)
Terminology Artifacts: 8 out of 343 (NCBI Tax)
Database Implementations
Number of standards implemented by 544 databases/repositories (top one in parentheses)
Models/Formats: 146 out of 223 (FASTA)
Reporting Guidelines: 59 out of 116 (MIAME)
Terminology Artifacts: 121 out of 343 (GO)
Paper in preparation, preliminary information as of July 2017
Activating the decision-making chain
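To make the gap between journal recommendations and database implementations easier to see, the counts on this slide can be turned into percentages. In the sketch below the numbers are taken from the slide as reported (preliminary, July 2017); pairing each database count with a standard type is an assumption inferred from the named top standard (FASTA as a format, MIAME as a guideline, GO as a terminology), and the variable and function names are hypothetical.

```python
# (number adopted, number registered, top standard), as reported on the slide.
JOURNAL_RECOMMENDATIONS = {
    "Models/Formats": (6, 223, "ISA-Tab"),
    "Reporting Guidelines": (26, 118, "MIAME"),
    "Terminology Artifacts": (8, 343, "NCBI Tax"),
}

# Assumed pairing of counts and types, inferred from the named top standards.
DATABASE_IMPLEMENTATIONS = {
    "Models/Formats": (146, 223, "FASTA"),
    "Reporting Guidelines": (59, 116, "MIAME"),
    "Terminology Artifacts": (121, 343, "GO"),
}


def adoption_share(adopted: int, total: int) -> float:
    """Percentage of registered standards that are adopted, to one decimal."""
    return round(100 * adopted / total, 1)


for label, table in (
    ("Journal recommendations", JOURNAL_RECOMMENDATIONS),
    ("Database implementations", DATABASE_IMPLEMENTATIONS),
):
    print(label)
    for kind, (adopted, total, top) in table.items():
        share = adoption_share(adopted, total)
        print(f"  {kind}: {adopted}/{total} = {share}% (top: {top})")
```

For every type of standard, the share implemented by databases is well above the share recommended by journal policies.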
52. Philippe Rocca-Serra, PhD, Senior Research Lecturer
Alejandra Gonzalez-Beltran, PhD, Research Lecturer
Milo Thurston, DPhil, Research Software Engineer
Massimiliano Izzo, PhD, Research Software Engineer
Peter McQuilton, PhD, Knowledge Engineer
Allyson Lister, PhD, Knowledge Engineer
Eamonn Maguire, DPhil, Contractor
David Johnson, PhD, Research Software Engineer
Melanie Adekale, PhD, Biocurator Contractor
Delphine Dauga, PhD, Biocurator Contractor
Susanna-Assunta Sansone, PhD, Principal Investigator, Associate Director
53. The (long) road to FAIR
Interoperability standards
are digital objects in their own right,
with their associated research, development and educational activities