Scientific Data is a data journal. In three sentences or fewer:
Scientific Data publishes structured Data Descriptors and accompanying research data to promote open and reproducible science. Data Descriptors provide detailed methods and technical validation so that other researchers can understand and reuse shared data. Through peer review of data quality and reuse potential, and by providing incentives such as citations, Scientific Data aims to help address issues like selective reporting and to make shared research data more accessible and useful.
3. Plagued by selective reporting of data and methods
Why? For example:
• Researchers still lack sufficient motivation
o Focus on big discoveries and impact, because they “have to”
• Hypothesis-confirming results get prioritized
o Other results are harder to get through review
• Agreements, disagreements and timing
o Unclear or missing data-sharing agreements and timing of disclosure
• Loose requirements and monitoring by journals and funders
o Publish and release just enough; keep the rest, move on to the next grant
5. Are open data and methods understandable and reusable?
Not always…
• Outputs are multidimensional and diverse, and not always well cited or stored
• Software, code, workflows, etc. are hard(er) to get hold of
• Data are often distributed and fragmented to fit (siloed) databases
o They often do not contain enough information for others to understand them
• Uneven levels of detail and annotation across different databases
o Specialized, generalist, public and institutional
• Data curation activities are perceived as time-consuming
o Collection and harmonization of detailed methods and experimental steps are done (or rushed) at the publication stage
7. Role of data papers / data journals
• Incentive, credit for sharing
• Data-focused peer review
• Value of data vs. analysis, results
• Support of the FAIR concept
8. Market research (2011)
• What do researchers want from data publications?
o 96% - increased visibility and discovery
o 95% - increased usability of their research data
o 93% - credit mechanism for deposit of data
o 80% - peer review of content/datasets
Respondent characteristics
387 respondents (329 active researchers)
Physics (24%)
Earth and environmental science (21%)
Biology (20%)
Chemistry (19%)
Others (16%)
9. Publishers occupy a leverage point
Because of the importance of formal publications in the academic incentive structure
11.
Helping you publish, discover and reuse research data
Credit for sharing your data
Focused on reuse and reproducibility
Peer reviewed, curated
Promoting community data and code repositories
Open Access
• Currently covering life, natural and environmental sciences
• Big and small data
o The power of small data lies in their aggregation and integration with other datasets
• New and previously published individual datasets, curated collections and citizen science
o A fuller, more in-depth look at the data processing steps, additional data files, code, etc.
o Tutorial-like information for scientists interested in reusing or integrating the data with their own
12. Introducing a new content type: Data Descriptor
Designed to make data more FAIR
Focused mainly on:
• Methods
• Technical Validation
• Data Records
• Usage Notes
Methods and technical analyses supporting the quality of the measurements:
• What did I do to generate the data?
• How was the data processed?
• Where is the data?
• Who did what, and when?
• How can the data be used or reused?
14. Relation with traditional article: timing
AFTER: expand on your research articles, adding further information for reuse of the data
AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)
OR BEFORE: publish your data first
16. Peer review process focused on quality and reuse
Evaluation is not based on the perceived impact or novelty of the findings, or on the size of the data.
• Experimental rigour and technical data quality
o Methodologically sound
o Technical validation experiments and statistical analyses
o Depth, coverage, size and/or completeness of data sufficient for the intended types of applications
• Completeness of the description
o Sufficient detail to allow others to reproduce the results, and to reuse the data or integrate them with other data
o Compliance with relevant minimum information or reporting standards
• Integrity of the data files and repository record
o Data files match the descriptions in the Data Descriptor
o Deposited in the most appropriate available databases
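The file-integrity criterion can be illustrated with a simple consistency check. This is a minimal, hypothetical Python sketch, assuming the Data Records section of a Data Descriptor and the repository record can each be reduced to a list of file names; `check_file_integrity`, `IntegrityReport` and the file names are illustrative only and are not part of any Scientific Data tool. A fuller check might also compare file formats, sizes or contents.

```python
"""Hypothetical sketch: do the files described in a Data Descriptor match
the files deposited in the repository record? (Names are illustrative.)"""

from dataclasses import dataclass


@dataclass
class IntegrityReport:
    missing_from_repository: set[str]    # described, but not deposited
    undescribed_in_descriptor: set[str]  # deposited, but not described

    @property
    def ok(self) -> bool:
        # The record is consistent only if both differences are empty.
        return not self.missing_from_repository and not self.undescribed_in_descriptor


def check_file_integrity(described: list[str], deposited: list[str]) -> IntegrityReport:
    """Compare the two file listings as sets of file names."""
    described_set = set(described)
    deposited_set = set(deposited)
    return IntegrityReport(
        missing_from_repository=described_set - deposited_set,
        undescribed_in_descriptor=deposited_set - described_set,
    )


if __name__ == "__main__":
    # Made-up listings, for illustration only.
    described = ["liver_sample1.fastq.gz", "sample_table.csv"]
    deposited = ["liver_sample1.fastq.gz", "notes.txt"]
    report = check_file_integrity(described, deposited)
    print(report.ok)                                  # False
    print(sorted(report.missing_from_repository))     # ['sample_table.csv']
    print(sorted(report.undescribed_in_descriptor))   # ['notes.txt']
```

Set differences in both directions catch the two failure modes separately: a described file that was never deposited, and a deposited file that the narrative never explains.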
17. Data Descriptor: narrative and structure
Experimental metadata or structured component (in-house curated, machine-readable formats)
Article or narrative component (PDF and HTML)
18. Data Descriptor: narrative
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements
Does not contain tests of new scientific hypotheses
Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group
19. Data Descriptor: structure (CC0)
In-house editorial curator assists authors via
• Excel spreadsheet templates
• an internal authoring tool
to create the structured component, also performing value-added semantic annotation.
(Diagram labels: analysis method or script; data file or record in a database)
20. Because we do not want cryptic experimental info, e.g.:
LS1_C2_LD_TP2_P1    file1-fastq.gz
21. …how not to report the experimental information
Sample name (?!): LS1_C2_LD_TP2_P1
Data file: file1-fastq.gz
• LS1: liver sample 1
• C2: compound 2
• LD: low dose
• TP2: time point 2
• P1: protocol 1
• file1-fastq.gz: compressed data file for sequence information corresponding to this sample
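To make the problem concrete, here is a hypothetical Python sketch that expands the sample name using the code meanings listed on the slide. The field names (`sample`, `compound`, `dose`, `time_point`, `protocol`), the `CODEBOOK` table and `decode_sample_name` are assumptions for illustration. The point it shows: the name is only decodable with a side table that a reader of the deposited files usually never sees.

```python
"""Hypothetical sketch: expanding the cryptic sample name into explicit
fields. Code meanings are from the slide; field names are assumed."""

# Assumed field for each underscore-separated token, in order.
FIELDS = ["sample", "compound", "dose", "time_point", "protocol"]

# Meanings of each code, as listed on the slide. Without this table,
# the name is unreadable to anyone but its author.
CODEBOOK = {
    "LS1": "liver sample 1",
    "C2": "compound 2",
    "LD": "low dose",
    "TP2": "time point 2",
    "P1": "protocol 1",
}


def decode_sample_name(name: str) -> dict[str, str]:
    """Split the name on '_' and look up each token in the codebook."""
    tokens = name.split("_")
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} tokens, got {len(tokens)}: {name!r}")
    # Raises KeyError if a code is not in the codebook: its meaning is lost.
    return {field: CODEBOOK[token] for field, token in zip(FIELDS, tokens)}


if __name__ == "__main__":
    record = decode_sample_name("LS1_C2_LD_TP2_P1")
    record["data_file"] = "file1-fastq.gz"  # explicit sample-to-file link
    for key, value in record.items():
        print(f"{key}: {value}")
```

Reporting the decoded fields directly, rather than the encoded name, removes the dependency on the hidden codebook.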
22. Structured component: key information from narrative
Seven-week-old C57BL/6N mice were treated with a low-fat diet.
The liver was dissected out, hepatocytes prepared…
23. From natural language to ‘computable’ concepts
Seven-week-old C57BL/6N mice were treated with a low-fat diet.
The liver was dissected out, hepatocytes prepared…
• Seven: age value; week: unit
• C57BL/6N: strain name
• mice: subject of the experiment
• low-fat diet: type of diet and experimental condition
• Liver: anatomy part
• Type of protocol: sample treatment, liver preparation, cell preparation
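The annotated sentence can be sketched as a structured, machine-readable record. This hypothetical Python example uses dataclasses serialized to JSON; the class and field names are illustrative assumptions and do not reproduce the journal's actual structured-component format or its semantic annotation with ontology terms.

```python
"""Hypothetical sketch: key information from the narrative sentence as a
structured record. Class/field names and JSON layout are assumptions."""

import json
from dataclasses import dataclass, field, asdict


@dataclass
class Subject:
    organism: str   # subject of the experiment
    strain: str     # strain name
    age_value: int  # age value
    age_unit: str   # unit


@dataclass
class Protocol:
    protocol_type: str  # e.g. "sample treatment"
    description: str    # the phrase it was extracted from


@dataclass
class StructuredComponent:
    subject: Subject
    condition: str      # type of diet and experimental condition
    anatomy_part: str
    protocols: list[Protocol] = field(default_factory=list)


record = StructuredComponent(
    subject=Subject(organism="mouse", strain="C57BL/6N", age_value=7, age_unit="week"),
    condition="low-fat diet",
    anatomy_part="liver",
    protocols=[
        Protocol("sample treatment", "treated with low-fat diet"),
        Protocol("liver preparation", "liver dissected out"),
        Protocol("cell preparation", "hepatocytes prepared"),
    ],
)

if __name__ == "__main__":
    # asdict() recurses into nested dataclasses and lists of them.
    print(json.dumps(asdict(record), indent=2))
```

Unlike the sentence, each value here sits in a named field, so "age" and "unit" cannot be misread and the record can be queried directly.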
25. What does a structured component add?
• Supplements the scientific discourse
o natural language has a degree of ambiguity
• Brings clarity to the reporting of research methods and procedures
o no trimming, no cooking
o clear links from samples to data files, and their relation to methods
• Provides the basis for search and discovery features
(Diagram: Scientific Data Data Descriptors (SciData DD) and their structured content, linked to one another by same tissue, same organism and same assay)
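The linking idea can be sketched in a few lines: once each Data Descriptor carries structured annotations, related datasets can be found by simple field matching. This is a hypothetical Python sketch; the descriptor IDs, annotation values and the `shared_links` function are made up for illustration and do not reflect how the journal actually implements search and discovery.

```python
"""Hypothetical sketch: linking Data Descriptors that share structured
annotations (tissue, organism, assay). All IDs and values are made up."""

from collections import defaultdict
from itertools import combinations

descriptors = {
    "DD-A": {"tissue": "liver", "organism": "Mus musculus", "assay": "RNA-seq"},
    "DD-B": {"tissue": "liver", "organism": "Homo sapiens", "assay": "proteomics"},
    "DD-C": {"tissue": "kidney", "organism": "Mus musculus", "assay": "RNA-seq"},
}


def shared_links(annotations: dict[str, dict[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Map each pair of descriptor IDs to the fields on which they agree."""
    links: dict[tuple[str, str], list[str]] = defaultdict(list)
    for (id1, a1), (id2, a2) in combinations(sorted(annotations.items()), 2):
        # Only compare fields that both descriptors annotate.
        for key in sorted(a1.keys() & a2.keys()):
            if a1[key] == a2[key]:
                links[(id1, id2)].append(f"same {key}")
    return dict(links)  # pairs with nothing in common are absent


if __name__ == "__main__":
    for pair, reasons in shared_links(descriptors).items():
        print(pair, reasons)
```

Run as-is, it reports that DD-A and DD-B share a tissue, and that DD-A and DD-C share an assay and an organism; DD-B and DD-C are not linked.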
Community Data Repositories
28. Big data | CSE 2014
Repository criteria:
1. Broad support and recognition within their scientific community
2. Ensure long-term persistence and preservation of datasets
3. Provide expert curation
4. Implement relevant, community-endorsed reporting requirements
5. Provide for confidential review of submitted datasets
6. Provide stable identifiers for submitted datasets
7. Allow public access to data without unnecessary restrictions
31. Key part of NPG data access & reproducible research policies
Nature 515, 312 (20 November 2014) doi:10.1038/515312a
http://www.nature.com/news/data-access-practices-strengthened-1.16370
32. Responsibilities lie across several stakeholder groups
• Understand the benefits of sharing FAIR datasets and enact them
• Engage and assist researchers, enabling them to share FAIR datasets
• Release or endorse practices and policies, but also incentive and credit mechanisms for researchers, curators and developers
33. Acknowledgements
Visit nature.com/scientificdata
Email scientificdata@nature.com
Tweet @ScientificData
Honorary Academic Editor: Susanna-Assunta Sansone, PhD
Managing Editor: Andrew L Hufton, PhD
Editorial Curator: Varsha Khodiyar
Publisher: Iain Hrynaszkiewicz
Advisory Panel and Editorial Board, including senior researchers, funders, librarians and curators, and our Advisory Boards and Collaborators
Philippe Rocca-Serra, PhD, Senior Research Lecturer
Alejandra Gonzalez-Beltran, PhD, Research Lecturer
Eamonn Maguire, DPhil, Contractor
Milo Thurston, PhD, Senior Bioinformatician
Allyson Lister, PhD, Knowledge Engineer
Alfie Abdul-Rahman, PhD, Research Software Engineer