A 30-minute seminar presented at the National Bioscience Database Center, part of the Japan Science and Technology Agency, in Tokyo, Japan. This presentation covers the FAIR Principles; the aims, methodology and use of FAIRsharing; related projects such as Bioschemas; and international initiatives such as ELIXIR and EOSC.
4. A set of principles for those wishing to enhance the value of their data holdings
Designed and endorsed by a diverse set of stakeholders representing academia, industry, funding agencies, and scholarly publishers
9. Not FAIR: low findability and poor documentation
• Not always well cited or stored
o Software, code and workflows are hard to find and access
• Poorly described for third-party reuse
o Different levels of detail and annotation
• Curation activities are perceived as time-consuming
o Collection and harmonization of detailed methods and experimental steps are rushed at the publication stage
10. To do better science, more efficiently, we need data that are…
• Available in a public repository
• Findable through some kind of search facility
• Retrievable in a standard format
• Self-described so that third parties can make sense of them
• Intended to outlive the experiment for which they were collected
11. My database is going offline; where should I put the data, and in what format?
Before accepting my paper, this journal wants my data to be in a public repository, but which one?
My funder says I should deposit the data in a reputable repository. But which one?
I’m collecting in vivo animal testing data: what metadata should I curate?
I’m about to start a set of experiments. In what format should I record the data?
12. A web-based, curated and searchable portal that monitors the development and evolution of standards* across all disciplines, interrelated with databases/repositories and data policies
* A standard is a formal community specification for reporting, sharing and citing data, metadata and other digital assets
14. Formats, Terminologies, Guidelines
FAIRsharing enhances their findability
Guidelines, e.g. MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT
Formats, e.g. SRAxml, SOFT, FASTA, DICOM, mzML, SBRML, SEDML…, GELML, ISA, CML, MITAB
Terminologies, e.g. AAO, ChEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO
[Figure: record counts of 240+, 119+, 709+ and ~1500, each with a cited source]
15. Mapping a complex and evolving landscape
Content standards (formats, terminologies, guidelines)
Databases/Repositories
Data policies by funders, journals and other organizations
16. Community-verified status indicators
Ready for use, implementation, or recommendation
In development
Status uncertain
Deprecated as subsumed or superseded
[Figure: record counts per status indicator (values shown: 270, 48, 23, 2, 97, 87, 4, 204, 9, 6, 8)]
Paper in preparation; preliminary information as of July 2017
All records are manually curated in-house and verified by the community behind each resource
17. My funder’s data policy recommends the use of established standards, but which ones are widely endorsed and applicable to my crop data?
We need a standard for sharing social science data; what’s out there and who should we talk to?
I have some old rice genomic data in format X, which is now deprecated; what format has replaced X?
Which are the mature standards and standards-compliant databases that we should recommend to our authors?
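Questions such as "what has replaced format X?" and "which standards are mature?" are answered by the status indicators and supersession links that curators maintain. A minimal Python sketch, assuming a hypothetical in-memory list of records: the `Record` class, its fields, the `superseded_by` link and the record names are illustrative only, not the FAIRsharing data model or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # The four community-verified status indicators listed on slide 16
    READY = "Ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "In development"
    UNCERTAIN = "Status uncertain"
    DEPRECATED = "Deprecated as subsumed or superseded"


@dataclass
class Record:
    # Hypothetical registry record: field names are illustrative, not FAIRsharing's schema
    name: str
    kind: str  # e.g. "format", "terminology", "guideline" or "database"
    status: Status
    superseded_by: Optional[str] = None  # name of the record that replaced this one


def ready_records(records: list[Record], kind: str) -> list[str]:
    """Names of records of one kind whose status is READY."""
    return [r.name for r in records if r.kind == kind and r.status is Status.READY]


def current_replacement(records: list[Record], name: str) -> Optional[str]:
    """Follow superseded_by links from `name` to the most recent replacement, if any."""
    by_name = {r.name: r for r in records}
    seen = {name}
    replacement = None
    record = by_name.get(name)
    # Stop at the end of the chain, at an unregistered name, or on a cycle
    while record is not None and record.superseded_by and record.superseded_by not in seen:
        replacement = record.superseded_by
        seen.add(replacement)
        record = by_name.get(replacement)
    return replacement


if __name__ == "__main__":
    # Hypothetical records standing in for "format X" and its successor
    records = [
        Record("FormatX", "format", Status.DEPRECATED, superseded_by="FormatY"),
        Record("FormatY", "format", Status.READY),
        Record("DraftFormat", "format", Status.IN_DEVELOPMENT),
    ]
    print(current_replacement(records, "FormatX"))  # FormatY
    print(ready_records(records, "format"))  # ['FormatY']
```

Following the chain, rather than a single link, covers the case where a replacement has itself been superseded since the old data were produced.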
22. Grouping the data
Collections group together one or more types of resource by domain, project or organization.
Recommendations are a core set of resources selected and recommended by a funder or journal data policy.
26. Making FAIRsharing FAIR – Findable: ELIXIR
A pan-European infrastructure for biological information
€19 million, 2015-2019
27. Making FAIRsharing FAIR – Findable: Bioschemas
• Web mark-up: Schema.org and Bioschemas.org
• BioSchemas/specifications/tree/master/DataCatalog
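Bioschemas mark-up works by embedding Schema.org descriptions, typically as JSON-LD, in a web page so that search engines and crawlers can index the resource. A minimal Python sketch of a Schema.org `DataCatalog` description, assuming a small set of generic properties (`name`, `url`, `description`, `keywords`); the Bioschemas DataCatalog profile may require or recommend other properties, so check the specification before relying on this selection. The function names are hypothetical.

```python
import json


def data_catalog_jsonld(name: str, url: str, description: str, keywords: list[str]) -> str:
    """Build a Schema.org DataCatalog description as a JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "DataCatalog",
        "name": name,
        "url": url,
        "description": description,
        "keywords": keywords,
    }
    return json.dumps(doc, indent=2)


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the <script> element that crawlers read from a web page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    markup = data_catalog_jsonld(
        name="FAIRsharing",
        url="https://fairsharing.org",
        description="A curated registry of standards, databases and data policies.",
        keywords=["standards", "databases", "data policies"],
    )
    print(script_tag(markup))
```

The printed `<script>` element would go in the page's HTML; the page content itself does not change.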
28. European Open Science Cloud (EOSC) Pilot
A consortium of 33 pan-European organisations and 15 third parties, covering a range of disciplines, working together to develop a governance framework for a pan-European “trusted virtual environment with free, open and seamless services for data storage, management, analysis, sharing and re-use, across disciplines”
29. Wider adoption of FAIRsharing by many biomedical research infrastructure programmes in the EU and USA, e.g.
30. Embeddable Widget
• Recommendation/Collection widget for embedding in third-party websites
• Journal data policies (GigaScience, PLOS, Springer Nature…)
• Standard-developing organisations (e.g. TDWG)
• Societies/Organisations (e.g. ELIXIR)
Dr Massimiliano Izzo
33. FAIR: Interoperability/Accessibility
• Data annotation:
o Users/maintainers – ORCID
o Organisations – FundRef
o Species – NCBI Taxon ontology
o Disciplines and domains – re3data/EDAM/BRO
• API – Swagger (ELIXIR guidelines)
• DOIs for standards (coming soon)
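Annotating users and maintainers with ORCID iDs only helps if the stored iDs are correct. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a registry could reject mistyped iDs before storing them. A minimal Python sketch; the function names are hypothetical, and it checks only syntax and checksum, not whether the iD is actually registered with ORCID.

```python
import re

# Four hyphenated groups of four; the final character may be the check value X
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the string is a well-formed ORCID iD whose check character matches."""
    # Accept both the bare iD and the https://orcid.org/ URI form
    orcid = orcid.removeprefix("https://orcid.org/")
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False: check character altered
```

`0000-0002-1825-0097` is the example iD used in ORCID's own documentation. Validation of this kind catches typing errors early; resolving the iD against ORCID would still be needed to confirm that the person exists.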
34. Reaching out to the community: what were the aims of the RDA/FORCE11 BioSharing WG?
• To develop guidelines for linking information on databases, content standards, and journal and funder data policies in the life sciences
• To develop a curated registry (running since 2011) to access and cross-search this information, so that a variety of stakeholders can decide which standards and databases to use or endorse
36. Working with and for the community
Standard-developing groups, incl.:
Journal publishers, incl.:
Cross-links, data exchange, incl.:
Societies and organisations, incl.:
Institutional RDM services, incl.:
Projects, programmes:
OBO