
Gaining credit for sharing research data

Data & Connectivity Project Manager at Health Data Research UK
18 Mar 2016

  1. Gaining credit for sharing research data: Data publishing with Scientific Data. Varsha Khodiyar, PhD, Data Curation Editor, Scientific Data, Nature Publishing Group (@varsha_khodiyar, @scientificdata). RIKEN Center for Life Science Technologies, 4th March 2016. Tweet with #SDJPN16
  2. My background • Joined Scientific Data in October 2014 • Professional data curator since 2003 • PhD in Molecular Biology from the University of Leicester • Contributed to the Human Genome Project as member of the Human Gene Nomenclature Committee (HGNC) • Gene Ontology curator for 8 years, at University College London, UK • 3 years of open data publishing experience
  3. Why share research data?
  4. Generating research data is expensive Just 18.1% of NIH grant applications were funded in 2014* • Hours spent writing grants? • Hours spent reviewing grants? Resources are finite/expensive • Modified animals • Specialized reagents Time and effort taken in the laboratory to generate good, valid data * report.nih.gov/success_rates/Success_ByIC.cfm
  5. Irreproducibility of published science Figure 1 - Ioannidis JPA. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149–55 (2009) doi:10.1038/ng.295
  6. Withholding data impacts on human health Clinical study reports, detailed data and software code available at Dryad Digital Repository doi:10.5061/dryad.bv8j6 and www.Study329.org
  7. Sharing data promotes • Diversity of analyses and opinion • New research • testing of new hypotheses • new analysis methods • meta-analyses to create new datasets • studies on data collection methods • Education of new researchers • Increased return on investment in research Vickers AJ: Whose data set is it anyway? Sharing raw data from randomized trials. Trials 2006, 7:15 Hrynaszkiewicz I, Altman DG: Towards agreement on best practice for publishing raw clinical trial data. Trials 2009, 10:17
  8. Researchers already share data • Most researchers are sharing data, and using the data of others • Direct contact between researchers (on request) is a common way of sharing data • Repositories are the second most common method of sharing Kratz and Strasser (2015) doi: 10.1371/journal.pone.0117619
  9. Some problems… • Sharing upon request relies heavily on trust • Informally stored data associated with published works disappear at a rate of ~17% per year (Vines et al. 2014; doi: 10.1016/j.cub.2013.11.014) • Datasets not referenced in a manuscript are essentially invisible (a.k.a. “dark data”) • If data are available, they are often not interpretable or reusable because sufficient detail is not included • Data producers do not get appropriate credit for their work
  10. www.nature.com/scientificdata
  11. Credit – Scholarly credit for publishing data; all publications are indexed and citeable. Reuse – Standardized and detailed descriptions enable easier reuse of published research data. Quality – Rigorous peer review of technical quality and reusability. An Editorial Board of experts in their fields maintains community standards. Discovery – Curated, machine-readable metadata for dataset discovery. Validated links to published data in each article. Open – Use of CC-BY licence for articles and CC0 for metadata. Promote use of open licences for published data. Service – Commitment to excellent service for authors and readers.
  12. What is a Data Descriptor?
  13. Data Descriptors have human and machine readable components • Human readable representation of study, i.e. article (HTML & PDF) • Machine readable representation of study, i.e. metadata
  14. Comparison of Data Descriptor to traditional article. Traditional article: Synthesis, Analysis, Conclusions. Data Descriptor: What did I do to generate the data? How was the data processed? Where is the data? Who did what and when? Methods and technical analyses supporting the quality of the measurements. Data Descriptors do not contain tests of new scientific hypotheses
  15. What types of data can be published? • Decades-old dataset • Standalone dataset • Data that has been used in an analysis article • Large consortium dataset • Data from a single experiment • Data that the researcher finds valuable and that others might find useful too • Data associated with a high impact analysis article
  16. When can a Data Descriptor be published? • After the data analysis has been published • Before the analysis has been published • Alongside the analysis article • When the authors do not intend to analyse the data. Data Descriptors can be submitted and published at any point in the research workflow, i.e. whenever it makes most sense for your data
  17. Scientific Data accepts submissions from all quantitative research disciplines
  18. Helping authors find the right place for their data
  19. Scientific Data’s Repository List Browse our recommended data repositories online. • We currently list almost 80 repositories, across biological, medical, physical and social sciences • When required, we provide guidance to authors on the best place to store their data www.nature.com/sdata/data-policies/repositories
  20. Generation of machine readable metadata
  21. • We want to capture metadata about the dataset being described in each Data Descriptor • The manuscript captures human readable metadata needed for data reuse • The curated metadata records capture machine readable metadata needed for machine based data discovery Metadata at Scientific Data
  22. ISA-Tab format for machine readable metadata • Study workflow • Key sample characteristics needed for data discovery • Relates samples to data files • Shows location of dataset • Uses controlled vocabularies and ontologies (where possible)
  23. Use of community endorsed ontologies and controlled vocabularies • Controlled vocabulary = list of standardized phrases of scientific concepts • Ontology = controlled vocabulary with defined relationships between terms
  24. Structured Summary table from curated metadata Investigation file Study file Sample characteristics reported in Structured Summary table: • Organism • Organism part • Cell line • Geographical location • Environment type
  25. Viewing the metadata
  26. Metadata for data discovery Search by: • Data Repositories • Experiment design • Measurements made • Technologies used • Factor types • Sample Characteristics • Organism • Environment types • Geographic locations scientificdata.isa-explorer.org
  27. Citing Data
  28. Citing my own data 1. In the article text 2. In the Data Citation section
  29. Citing data I’ve reused 1. In the article text 2. In the References section
  30. Clinical researchers support sharing, but… Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS: Sharing of clinical trial data among trialists: a cross sectional survey. BMJ 2012;345:e7570 • Sharing de-identified data via repositories should be required (236 respondents, 74%) • Investigators should share de-identified data on request (229 respondents, 72%)
  31. …clinical data producers have specific concerns Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS: Sharing of clinical trial data among trialists: a cross sectional survey. BMJ 2012;345:e7570
  32. Example initiatives for sharing clinical data Yale Open Data Access (YODA) & Clinical Study Data Request (CSDR) projects: • Data Use Agreements (DUAs) • Controlled access environment • Scientific validity of reanalysis checked • Independent governance • Data anonymisation checks http://yoda.yale.edu/ https://www.clinicalstudydatarequest.com/
  33. Clinical data publication at Scientific Data • Identify repositories able to archive clinical data • Work with identified repositories to establish workflows for peer review and publication, whilst maintaining patient privacy • Facilitate specialist peer review process for clinical data, for example ensure peer reviewers have agreed to the terms of the data use agreement Hrynaszkiewicz, I., Khodiyar, V., Hufton, A. & Sansone, S. A. Publishing descriptions of non-public clinical datasets: guidance for researchers, repositories, editors and funding organisations. bioRxiv http://dx.doi.org/10.1101/021667 (2015).
  34. A robust data-on-request workflow?
  35. Published Data Descriptor with clinical data Data Records section details how to access the data
  36. Links to restricted access data • Data Citations link to repository • Data files requiring permission to access • Freely accessible data files
  37. Data Reuse stories
  38. Data reuse by (some of) the same researchers
  39. Data reuse by other researchers in the same field “The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.” Professor Daniele Marinazzo
  40. Data reuse and citation by researchers According to Google Scholar, cited 43 times! (February 2016)
  41. Data reuse by the non-research community www.bbc.co.uk/news/science-environment-33057402
  42. Data reuse by the non-research community http://www.nytimes.com/interactive/2014/12/30/science/history-of-ebola-in-24-outbreaks.html
  43. Data Descriptors… • …enable you to gain scholarly credit for your data gathering efforts. • …are human AND machine readable. • …can be published with, or independently of, an analysis article. • …can be published at any point in the research workflow. • …allow the publication and discovery of clinical data, whilst maintaining your patients' privacy. • …result in greater reuse and citation by fellow members of your research community. • …extend the impact of your research data by enabling access to and reuse by the non-research community.
  44. Get more from your data Preserve it Encourage reuse Get credit for it Visit nature.com/sdata Email scientificdata@nature.com Tweet @ScientificData #SDJPN16
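
Slide 23 distinguishes a controlled vocabulary (a list of standardized phrases) from an ontology (a controlled vocabulary with defined relationships between terms). The difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the terms, the `IS_A` mapping and the helper functions are hypothetical and are not taken from any real ontology, from ISA-Tab, or from Scientific Data's curated metadata.

```python
# Illustrative only: these terms are made up for the sketch and do not come
# from a real ontology or from Scientific Data's curated metadata records.

# A controlled vocabulary is a flat list of standardized phrases.
CONTROLLED_VOCABULARY = {"liver", "kidney", "organ", "anatomical structure"}

# An ontology adds defined relationships between terms (here a single
# hypothetical "is_a" relationship from child term to parent term).
IS_A = {
    "liver": "organ",
    "kidney": "organ",
    "organ": "anatomical structure",
}


def is_valid_term(term: str) -> bool:
    """A vocabulary can only say whether a phrase is one of the standard ones."""
    return term in CONTROLLED_VOCABULARY


def ancestors(term: str) -> list[str]:
    """Walk the is_a relationships upward from a term to its broader terms."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def matches(annotation: str, query: str) -> bool:
    """True if the annotation is the query term or a narrower term under it."""
    return annotation == query or query in ancestors(annotation)


if __name__ == "__main__":
    print(is_valid_term("liver"))     # checks membership only
    print(ancestors("liver"))         # broader terms reachable via is_a
    print(matches("liver", "organ"))  # a search for "organ" finds "liver" samples
```

With only the flat vocabulary, a search for "organ" would miss a sample annotated as "liver". The relationships are what allow a broader query to find narrower annotations, which is one reason ontologies help with machine-based data discovery.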