
INSERM - Data Management & Reuse of Health Data - May 2017


INSERM Workshop 246 - Management and reuse of health data: methodological issues: https://ateliersinserm.dakini.fr/en/workshop.246.management.and.reuse.of.health.data.methodological.issues-66-22.php


  1. On community standards, FAIR data and scholarly communication. Susanna-Assunta Sansone, PhD (ORCID: 0000-0001-5306-5690). INSERM Workshop 246 “Management and reuse of health data: methodological issues”, Bordeaux, 14-17 May 2017. Data Consultant, Founding Academic Editor; Associate Director, Principal Investigator. www.slideshare.net/SusannaSansone
  2. Simplified research data life cycle. Source: https://www.dataone.org/best-practices
  3. To do better science, more efficiently, we need data that are: • Available in a public repository • Findable through some sort of search facility • Retrievable in a standard format • Self-describing so that third parties can make sense of them • The product of careful planning, organization and stewardship • Intended to outlive the experiment for which they were collected
  4. Key problem: low findability and understandability • Not always well cited and stored o True for data as well as for any other digital asset • Poorly described for third-party reuse o Different levels of detail and annotation • Reporting and annotation activities are perceived as time-consuming o Often rushed and minimally done
  5. We need content or reporting standards • To harmonize the datasets with respect to the structure and level of annotation of their: § experimental components (e.g., design, conditions, parameters), § fundamental biological entities (e.g., samples, genes, cells), § complex concepts (such as bioprocesses, tissues, diseases), § analytical processes and mathematical models, and § their instantiation in computational simulations (from the molecular level through to whole populations of individuals)
  6. Types of content standards: Guidelines, Terminologies, Formats • Guidelines: minimum information reporting requirements, checklists o Report the same core, essential information o e.g. MIAME guidelines • Terminologies: controlled vocabularies, taxonomies, thesauri, ontologies etc. o Unambiguous identification and definition of concepts o e.g. Gene Ontology • Formats: conceptual models, schemas, exchange formats etc. o Define the structure and interrelation of information, and the transmission format o e.g. FASTA
  7. Community-driven efforts, just a few examples (figure labels: de jure, de facto; grass-roots groups, standard organizations; Nanotechnology Working Group; Formats, Terminologies, Guidelines)
  8. Content standards in numbers (Formats, Terminologies, Guidelines); counts shown on the slide, each with a source link: 224, 115, 500+ • Guidelines, e.g.: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT… • Formats, e.g.: SRAxml, SOFT, FASTA, DICOM, mzML, SBRML, SEDML, GELML, ISA, CML, MITAB… • Terminologies, e.g.: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO…
  9. How to discover the ‘right’ standards for your data?
  10. A web-based, curated and searchable portal that monitors the development and evolution of standards, their use in databases and the adoption of both in data policies, to inform and educate the user community
  11. Map this complex and evolving landscape: Content standards (Formats, Terminologies, Guidelines); Databases; Data policies by funders, journals and other organizations. All records are manually curated in-house and verified by the community behind each resource
  12. Using indicators to describe ‘status’ (for Content standards, Databases and Data policies): • Ready for use, implementation, or recommendation • In development • Status uncertain • Deprecated as subsumed or superseded
  13. Understanding how standards are used
  14. Understanding how standards are used (figure labels: Guideline)
  15. Understanding how standards are used (figure labels: Formats, Guideline)
  16. Understanding how standards are used (figure labels: Formats, Guideline, Formats)
  17. Understanding how standards are used (figure labels: Formats, Guideline, Formats, Terminology)
  18. Using indicators to indicate ‘adoption’ (for Content standards, Databases and Data policies by funders, journals and other organizations)
  19. Categories shown: Standard-developing groups; Journals, publishers; Cross-links, data exchange; Societies and organisations; Institutional RDM services; Projects, programmes
  20. Duplications & lack of interoperability among standards • Technologically-delineated views of the world: Arrays, Scanning, Arrays & Scanning, Columns, Gels, MS, MS, FTIR, NMR, Columns • Biologically-delineated views of the world: transcriptomics, proteomics, metabolomics, plant biology, epidemiology, microbiology • Generic features (‘common core’): description of source biomaterial; experimental design components
  21. Hard to use them in combinations, e.g. to represent: • Proteomics-based gut microbiota profiling • Proteomics- and metabolomics-based gut microbiota profiling (same technology and biology labels as on slide 20)
  22. Enhancing modularization • Proteomics-based gut microbiota profiling • Proteomics- and metabolomics-based gut microbiota profiling (same technology and biology labels as on slide 20)
  23. Enhancing modularization • Proteomics-based gut microbiota profiling • Proteomics- and metabolomics-based gut microbiota profiling (same technology and biology labels as on slide 20)
  24. Serve machine-readable content metadata standards, providing provenance for their elements, rendering standards invisible to the researchers; inform the creation of metadata templates • High-level information about the metadata standards: bsg-000174, bsg-000161 (biosharing: ReportingGuideline); MINSEQE, MIMARKS • Representations of the standards' elements: sample information, sample identifier, taxonomy identifier, sequence read, geo location • Template elements for el-000001, el-000002, el-000003, with provenance: MINSEQE; MINSEQE and MIMARKS; MIMARKS
  25. How to discover the datasets relevant to your work?
  26. OmicsDI (omicsdi.org): Nature Biotechnology 35, 406–409 (2017) doi:10.1038/nbt.3790
  27. datamed.org • DataMed: bioRxiv 094888; https://doi.org/10.1101/094888; Nature Genetics (in press) • DATS: bioRxiv 103143; https://doi.org/10.1101/103143; Scientific Data (in press)
  28. • Discoverability and reusability o Complementing community databases • Incentive, credit for sharing o Big and small data o Unpublished data o Long tail of data o Curated aggregation • Peer review of data • Value of data vs. analysis Growing number of data papers and data journals, e.g.:
  29. nature.com/scientificdata: A new open-access, online-only publication for descriptions of scientifically valuable datasets • Honorary Academic Editor: Susanna-Assunta Sansone, PhD • Managing Editor: Andrew L Hufton, PhD • Editorial Curator: Varsha Khodiyar • Publisher: Iain Hrynaszkiewicz Supported by
  30. New article type • A peer-reviewed description of data, to maximize usage • Citable publications that give credit for reusable data • Requires data deposition to the appropriate repository(ies) • Is complementary to traditional article(s) and may or may not be associated with them
  31. Value-added component, complementing articles and repositories: Research papers, Data records, Data Descriptors
  32. Article structure • Title • Abstract • Background & Summary • Methods • Data Records • Technical Validation • Usage Notes • Figures & Tables • References • Data Citations, following the Joint Declaration of Data Citation Principles. Detailed description of the methods and technical analyses supporting the quality of the measurements; no scientific hypotheses
  33. Focus on data peer review • Completeness = can others reproduce? • Consistency = were community standards followed? • Integrity = are data in the best repository? • Experimental rigour, technical quality = were the methods sound? Does not focus on perceived impact, importance, size or complexity of data
  34. Credit for data producers, data managers/curators etc. Credit to: Varsha Khodiyar
  35. Data (re)use made easier: “The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.” Professor Daniele Marinazzo. Credit to: Varsha Khodiyar
  36. What makes a good ? • Decades-old dataset • Aggregated or curated data resources • Computationally produced data products • Large consortium dataset • Data from a single experiment • Data that YOU find valuable and that others might find useful too • Data associated with a high-impact analysis article
  37. Data Descriptors have two components • Experimental metadata or structured component (in-house curated, machine-readable formats) • Article or narrative component (PDF and HTML)
  38. Curation and discoverability: The Data Curation Editor is responsible for creating and curating the machine-readable structured component • Enables browsing and searching the articles • Facilitates links to related journal articles and repository records
  39. Data Descriptors: structured component. Created with the input of the authors, includes value-added semantic annotation of the experimental metadata (figure labels: analysis method script; Data file or record in a database)
  40. Complementary roles of ISA and nanopublications. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics. PLoS ONE (2015). https://doi.org/10.1371/journal.pone.0127612
  41. The (long) road to FAIR
  42. Responsibilities lie across several stakeholder groups • Understand the benefits of sharing FAIR datasets and enact them • Engage and assist researchers to enable them to share FAIR datasets • Release or endorse practices and policies, but also incentive and credit mechanisms for researchers, curators and developers
  43. “As Data Science culture grows, digital research outputs (such as data, computational analysis and software) are being established as first-class citizens. This cultural shift is required to go one step further: to recognize interoperability standards as digital objects in their own right, with their associated research, development and educational activities.” Sansone, Susanna-Assunta; Rocca-Serra, Philippe (2016). Interoperability Standards - Digital Objects in Their Own Right. Wellcome Trust. https://dx.doi.org/10.6084/m9.figshare.4055496.v1
  44. Susanna-Assunta Sansone, PhD: Principal Investigator, Associate Director and Data Consultant for Springer Nature • Philippe Rocca-Serra, PhD: Senior Research Lecturer • Alejandra Gonzalez-Beltran, PhD: Research Lecturer • Milo Thurston, DPhil: Research Software Engineer • Massimiliano Izzo, PhD: Research Software Engineer • Peter McQuilton, PhD: Knowledge Engineer • Allyson Lister, PhD: Knowledge Engineer • Eamonn Maguire, DPhil: Contractor • David Johnson, PhD: Research Software Engineer • Melanie Adekale, PhD: Biocurator Contractor • Delphine Dauga, PhD: Biocurator Contractor We work with and for to make data and other digital research assets enabling open science, driving science and discoveries
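
Slide 24 describes serving metadata standards in machine-readable form, with provenance recorded for each element, so that templates can be assembled without researchers handling the standards directly. A minimal Python sketch of that idea follows. It is an illustration only, not the portal's actual data model: the class and function names are hypothetical, and the pairing of the ids el-000001 to el-000003 with particular element names and standards is assumed, since the slide text does not make it explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One metadata element plus the standards it comes from (its provenance)."""
    element_id: str
    name: str
    provenance: frozenset


# ASSUMPTION: which id goes with which name and which standard is invented
# here for illustration; only the ids, names and standards appear on the slide.
ELEMENTS = [
    Element("el-000001", "sample identifier", frozenset({"MINSEQE"})),
    Element("el-000002", "taxonomy identifier", frozenset({"MINSEQE", "MIMARKS"})),
    Element("el-000003", "geo location", frozenset({"MIMARKS"})),
]


def build_template(standards):
    """Return every element required by at least one of the given standards.

    An element shared by several standards appears once, which is how a
    combined template avoids duplicated fields.
    """
    wanted = set(standards)
    return [e for e in ELEMENTS if e.provenance & wanted]


def describe(template):
    """Render each template element with its provenance, for display."""
    return [
        f"{e.name} (from {' and '.join(sorted(e.provenance))})"
        for e in template
    ]


if __name__ == "__main__":
    # A template for a study that must satisfy both standards at once.
    for line in describe(build_template(["MINSEQE", "MIMARKS"])):
        print(line)
```

Because provenance is stored on each element, a template can show the researcher only the field names while still recording which standard each field satisfies.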
