FAIR BioData Management

Keynote at Workshop "Ready for BioData Management?" 2nd July 2019, Lisboa, Portugal

Published in: Science

  1. 1. FAIR BioData Management. Ulrike Wittig, Heidelberg Institute for Theoretical Studies, Germany
  2. 2. Why do you need data management? - Where do you store your experimental data? - What happens to the data when a PhD student leaves the group? - Are all data complete for a publication? - Do you make regular backups of your local machine? - Do you send emails to share data with your colleagues? - Do you always store email attachments in your local directory? - Do you store all different versions of a data file together in the same place? - Which protocol was used for the experiment? ...
  3. 3. How well is your experiment documented? (Vahan Simonyan, Center for Biologics Evaluation and Research, Food and Drug Administration, USA)
  4. 4. Purpose of Project Data Management (data life cycle: creating, processing, analysing, preserving, giving access to and re-using data) • Track collection of raw and processed (secondary) data, models & metadata • Maintain experimental context • Organise and link assets • Choose what to keep and what to ditch • Report consistently • Reproducible publications • Promote standardised metadata practices • Exchange among colleagues • How and when to share and publish • Get and give credit • Retain and find beyond project • Integrate with legacy, home-grown, external systems • Reuse tools and community archives • Support automation and analytics workflows • Support curation
  5. 5. Purpose of Project Data Management Organisation Communication Dissemination Partners Funders Public
  6. 6. The FAIR Guiding Principles for scientific data management and stewardship https://www.nature.com/articles/sdata201618 (2016)
  7. 7. FAIR Principles FAIR ≠ FREE
  8. 8. FAIR Checklists
     Making Data Findable (documentation and metadata management)
     • What documentation and metadata will accompany the data to assist its discoverability? (Details on methodology, definitions, procedures, SOPs, vocabularies, units, dependencies, etc.)
     • What information is needed for the data to be read and interpreted in the future?
     • What naming conventions will be used?
     • How will you approach versioning your data?
     • How will you capture / create this documentation and metadata?
     • How do you ensure the completeness of the captured data?
     Making Data Accessible (specify which data will be made openly available, taking the following into consideration)
     • What ethics and legal compliance issues do you have, if any? Do you need consent for data preservation and sharing? Do you have to protect certain data? Is any data sensitive?
     • Do you think you might have Intellectual Property Rights issues? Have you considered ownership of the data, licensing, restrictions on use?
     • Do you think you will need to embargo any data?
     • How will you make the data available? (Consider the platforms you will use: databases, repositories, etc.)
     • What methods or software tools are needed to access the data? Should you include documentation on how to obtain and use the software needed to access the data? Is it possible to include this software with the data (e.g. source code, Docker, etc.)?
     • If there are any restrictions on accessibility, how will you provide access?
     Making Data Interoperable
     • What standards (metadata vocabularies, formats, checklists) or methodologies will you use?
     • How do you address data and model quality? What validation steps do you foresee?
     • Will you use standardised vocabulary for all data types to allow inter-disciplinary interoperability?
     • Where you cannot use a standardised vocabulary for all types of data, can you map to more commonly used ontologies?
     Making Data Re-usable
     • How will you licence your data to permit the widest re-use possible?
     • When will the data be made available for re-use? Does this include an embargo period? (If so, why?)
     • Which data will be available for re-use during/after the project? If some will not, why not?
     • What are your data quality assurance processes?
     • How long do you expect your data to remain re-usable?
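
The Findable checklist asks how completeness of the captured metadata is ensured. One lightweight option is an automated check at registration time. The following is a minimal sketch under stated assumptions: the field names in `REQUIRED_FIELDS` and the example record are hypothetical and do not come from the slides or from any particular metadata standard.

```python
# Hypothetical required fields; a real project would derive these from its own
# metadata standard or minimum-information checklist.
REQUIRED_FIELDS = ("title", "creator", "protocol", "units", "version", "licence")


def missing_metadata(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in `record`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Illustrative record only: "units" is blank and "licence" is absent.
example = {
    "title": "HK kinetics",
    "creator": "A. Researcher",
    "protocol": "SOP: HK",
    "units": "",
    "version": 2,
}
print(missing_metadata(example))  # prints ['units', 'licence']
```

A check like this could run before upload, so that incomplete records are flagged while the person who knows the missing details is still at hand.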
  9. 9. FAIRDOM Initiative - develop a community - establish an internationally sustained Data and Model Management service - joint action of ERA-Net EraSysAPP and European Research Infrastructure ISBE
  10. 10. A bit of history: 11-year anniversary (timeline 2008 to 2020). Standards-based asset management (data, models, workflows, SOPs…) for multi-party projects; sensitive sharing; self-deposit / curation; mixed stewardship skills; legacy local systems; community resources. Started in Systems Biology. Now widened.
  11. 11. SEEK Software - Open source web platform for sharing scientific research assets, processes and outcomes - Associations between data along with information about the people and organisations (yellow pages) - ISA (Investigation, Study, Assay) structure for describing how individual experiments are aggregated into studies and investigations - Flexible and detailed sharing permissions - DOI can be generated for individual items, or entire aggregates - Semantic technology, allowing sophisticated queries over the content - Collection of metadata https://seek4science.org/
  12. 12. SEEK Software: Models, Data files, SOPs (Standard Operating Procedures), Documents, People, Programmes, Projects, Publications, Presentations, Events, Samples
  13. 13. Catalogue of distributed data Personal Data Local Stores External Databases Articles Models Standards SOPs
  14. 14. Investigation Study Analysis Data Model SOP(Assay) https://fairdomhub.org/investigations/56
  15. 15. Investigation - Study - Assay https://fairdomhub.org/investigations/56
  16. 16. Design an ISA: Investigation - Study - Assay. Investigation: Glucose metabolism in P. falciparum trophozoites. Studies: Model construction; Model validation. Assays: LDH, PK, ENO, PGM, PGK, GAPDH, TPI, ALD, PFK, PGI, HK, GLCtr, PYRtr, LACtr, G3PDH, GLYtr, ATPase, Culturing, Lysate prep. Data and models: Data: GLCtr, Model: GLCtr, Data: HK, Model: HK, ... SOPs: GLCtr, HK, ..., Validation, Culturing, Lysate prep. Further diagram labels: Steady state; Incubation; penkler1 (Validation data); penkler2 (Validation data); ...
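
The Investigation - Study - Assay hierarchy can be pictured as nested records with assets attached at the assay level. The sketch below is only an illustration of that nesting, not SEEK's actual data model: the class and attribute names are assumptions, and only the HK and GLCtr assays are filled in, using the labels from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    name: str
    sops: list[str] = field(default_factory=list)
    data_files: list[str] = field(default_factory=list)
    models: list[str] = field(default_factory=list)


@dataclass
class Study:
    name: str
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: list[Study] = field(default_factory=list)

    def assets(self) -> list[str]:
        """Flatten every SOP, data file and model linked below this investigation."""
        found = []
        for study in self.studies:
            for assay in study.assays:
                found += assay.sops + assay.data_files + assay.models
        return found


investigation = Investigation(
    title="Glucose metabolism in P. falciparum trophozoites",
    studies=[
        Study("Model construction", [
            Assay("HK", sops=["SOP: HK"], data_files=["Data: HK"], models=["Model: HK"]),
            Assay("GLCtr", sops=["SOP: GLCtr"], data_files=["Data: GLCtr"], models=["Model: GLCtr"]),
        ]),
        # Assays of the validation study are left out rather than guessed.
        Study("Model validation"),
    ],
)
print(len(investigation.assets()))  # prints 6
```

Keeping assets on the assay, and reaching them through the study and investigation, is what preserves the experimental context when a single file is later found through a search.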
  17. 17. People - Yellow Pages
  18. 18. Data Files, SOPs, Documents - no file format restrictions - some formats can be viewed directly in SEEK, e.g. Excel, Word, PDF, XML, PNG
  19. 19. Models SBML Model simulation Model comparison Model versioning Reproducing simulations [Jacky Snoep, Dagmar Waltemath, Martin Peters, Martin Scharm]
  20. 20. Tracking versions
  21. 21. Tracking model versions smartly. Scharm, M., Wolkenhauer, O., & Waltemath, D. (2015). An algorithm to detect and communicate the differences in computational models describing biological systems. Bioinformatics
  22. 22. Spreadsheet Templates: Embed ontologies into Excel templates; Excel spreadsheets enriched with ontology annotations; Upload, extract metadata and register http://www.rightfield.org.uk
  23. 23. Samples Generation of templates for sample types Sample extraction from spreadsheets HTP sample referencing and metadata migration
  24. 24. Data Sharing in SEEK
  25. 25. Publishing in SEEK
  26. 26. Publishing in SEEK - DOI https://fairdomhub.org/investigations/56
  27. 27. Publishing in SEEK Fix state with particular versions Active entry continues to evolve Assign a DOI
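
The publishing idea on this slide, a fixed snapshot that carries the DOI while the active entry keeps changing, can be sketched in a few lines. This is an illustration of the concept only, not SEEK's implementation: the `Entry` class, its fields and the DOI string are hypothetical placeholders.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Entry:
    """An evolving catalogue entry with frozen, citable snapshots."""
    title: str
    content: dict
    snapshots: list[dict] = field(default_factory=list)

    def publish(self, doi: str) -> dict:
        """Freeze the current content under `doi`; later edits do not affect it."""
        snapshot = {
            "number": len(self.snapshots) + 1,
            "doi": doi,
            "content": copy.deepcopy(self.content),
        }
        self.snapshots.append(snapshot)
        return snapshot


entry = Entry("Glycolysis investigation", {"model.xml": "version 1"})
frozen = entry.publish("10.1234/example.1")  # placeholder DOI, not a real identifier
entry.content["model.xml"] = "version 2"      # the active entry keeps evolving

print(frozen["content"]["model.xml"])  # prints version 1
print(entry.content["model.xml"])      # prints version 2
```

The deep copy is the important design choice: a citation has to keep pointing at exactly what was published, whatever happens to the working copy afterwards.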
  28. 28. DOI in Publication
  29. 29. More than simple supplementary materials 16 datafiles (kinetic, flux inhibition, runout) 19 models (kinetics, validation) 13 SOPs 3 studies (model analysis, construction, validation) 24 assays/analyses (simulations, model characterisations) Penkler, G., du Toit, F., Adams, W., Rautenbach, M., Palm, D. C., van Niekerk, D. D. and Snoep, J. L. (2015), Construction and validation of a detailed kinetic model of glycolysis in Plasmodium falciparum. FEBS J, 282: 1481–1511. doi:10.1111/febs.13237
  30. 30. Packaging: COMBINE archive. A zip-like file with a manifest & metadata - Bundling files - Keeping provenance - Exchanging data - Shipping results. Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D (SEMS, University of Rostock). Bergmann, F. T., Adams, R., Moodie, S., Cooper, J., Glont, M., Golebiewski, M., ... & Olivier, B. G. (2014). COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics, 15(1), 1.
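
Because a COMBINE archive is a zip file with a manifest, the bundling step can be sketched with the Python standard library alone. This is a simplified sketch, not a reference implementation: the helper names are invented here, the archive metadata file is omitted, and the namespace and format URIs should be checked against the OMEX specification before real use.

```python
import zipfile
from xml.sax.saxutils import quoteattr

# Manifest namespace as commonly used for OMEX; verify against the specification.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"


def build_manifest(entries: dict[str, str]) -> str:
    """Render a manifest listing each file location and its format URI."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<omexManifest xmlns="{OMEX_NS}">']
    for location, fmt in entries.items():
        lines.append(f"  <content location={quoteattr(location)} format={quoteattr(fmt)}/>")
    lines.append("</omexManifest>")
    return "\n".join(lines)


def write_archive(path: str, files: dict[str, tuple[str, bytes]]) -> None:
    """Write `files` (name -> (format URI, payload)) plus manifest.xml into a zip."""
    entries = {
        ".": "http://identifiers.org/combine.specifications/omex",
        "./manifest.xml": OMEX_NS,
    }
    entries.update({f"./{name}": fmt for name, (fmt, _) in files.items()})
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", build_manifest(entries))
        for name, (_, payload) in files.items():
            archive.writestr(name, payload)
```

A call such as `write_archive("demo.omex", {"model.xml": ("http://identifiers.org/combine.specifications/sbml", b"<sbml/>")})` would produce one file holding the model and a manifest that states what the model file is.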
  31. 31. Standards-based metadata framework for bundling (scattered) resources with context and citation Packaging: Research Objects http://researchobject.org
  32. 32. SEEK as project-specific local instances or as central FAIRDOMHub Service hosted at HITS (Institutional Guarantee at least until 2029)
  33. 33. FAIRDOMHub Statistics 1st July 2019 Programmes 60 Projects 144 Institutions 274 People 1291 Data files 2280 Models 487 SOPs 301 Sample types 63 Presentations 729 Publications 370 Events 178
  34. 34. FAIRDOM Platform (free and open source). Front end, Project(s) Hub: web-based portal; project controlled spaces; metadata catalogue & Yellow pages; results repository, dissemination and collaboration; tool gateway. Back end, on-site storage & analytics: on-site tracking; data analytic pipelines; Extract, Transform and Load direct from the instruments; large data management; LIMS; auto-archiving.
  35. 35. Back end Instrument Data Management, LIMS, ELN • samples • protocols • instruments • data management • experimental description Norway’s national e-Infrastructure for Life Science https://nels.bioinfo.no/ Electronic Laboratory Notebook and Laboratory Information Management System (ELN-LIMS) https://csb.ethz.ch/tools/software/openbis-lims-eln.html
  36. 36. [Adapted from Ursula Klingmüller, Martin Böhm] Excemplify Antibody Database FAIR collaboration from the ERANet ERASysAPP
  37. 37. Integration with Norway’s national e-Infrastructure for Life Science (NeLS). Programme: overarching research theme (The Digital Salmon). Project: research grant (DigiSal, GenoSysFat). Investigation: a particular biological process, phenomenon or thing (typically corresponds to [plans for] one or more closely related papers). Study: experiment whose design reflects a specific biological research question. Assay: standardized measurement or diagnostic experiment using a specific protocol (applied to material from a study). (Jon Olav Vik, Norwegian University of Life Sciences)
  38. 38. • Project controlled protected spaces – Working space, show space for results – Supp. materials space for publications – Yellow pages and collaboration – Upload or link to data • One place catalogue – Regardless of physical store – ISA with shared metadata – Standards-compliant • Linked with other systems – Project on-site (secure) repositories – Public deposition archives – Integrated with JWSOnline modelling tools Front End Find, Access and Organise assets “Using FAIRDOMHub my own lab colleagues saw what I was doing and called to collaborate!” https://fairdomhub.org/
  39. 39. Catalogue across repositories regardless of location In House Stores External Databases Publishing services Secure Stores Model Resources Upload or Reference
  40. 40. Active and Published Data
  41. 41. Metadata Exchange along the Pipeline ELNs
  42. 42. PALs - Project Area Liaisons PALs DM Team Data management training Requirements & Suggestions • Training needs for users • Suggestions to improve SEEK • Requirements for new SEEK features and DM services
  43. 43. PALs - Project Area Liaisons - our user focus group - post docs, postgrads and techs - experimentalists, modellers and bioinformaticians - act as advocates and communicate our progress back to their projects
  44. 44. Data Stewards: function, profession, cultural shift • 500,000 needed in Europe* • Specialist skills • Career pathways • Recognition. Curation and management • Supported, Resourced • Recognised, Rewarded. Sharing policy and practice embedded. * Realising the European Open Science Cloud (2016)
  45. 45. Stewardship Support
  46. 46. Independent researchers Facilities Centres Projects Programmes Infrastructures Different Users, Different Use
  47. 47. LiSyM (Liver Systems Medicine): German Research Network on Systems Medicine for Liver Disease. Supported by the German Federal Ministry of Education and Research, 2016-2020. Multiple disciplines: • Medicine • Biology, Biochemistry • Pharmacology • Physics • Bioinformatics • Data management • Industry. 38 independent research groups: • Bayer AG • Max Planck (Dresden and Berlin) • MEVIS Fraunhofer (Bremen) • Leibniz Institute IfADo (Dortmund) • Charité (Berlin) • DKFZ (Heidelberg) • Hospitals: Dresden, Kiel, Aachen, Homburg, Berlin, Heidelberg, Munich • + 18 Universities. http://www.lisym.org/
  48. 48. LiSyM Data Management
  49. 49. Clinical data sharing concept. Goal: • Spread descriptions of the data throughout the consortium. Challenge: • Some partners cannot share their data. Solution: • Share the table structure • Create & share common code • Create summaries in a distributed way
  50. 50. NMTrypI: Trypanosomatid parasites cause sleeping sickness, leishmaniasis and Chagas disease in Africa, South America and India. EU-funded project 2014 – 2017. Goal: new candidate drugs against Trypanosomatidic infections. Consortium: 12 partners (3 SMEs and 9 academics) in Europe and in disease-endemic countries (Italy, Greece, Portugal, Spain, Germany, UK, Sudan, Brazil) https://fp7-nmtrypi.eu
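
One way to read the solution on this slide: every partner runs the same shared code on a table with the agreed structure, and only aggregate numbers leave the site. The sketch below illustrates that pattern under stated assumptions. It is not the LiSyM code; the function names and the numbers are made up, and a real deployment would also need rules for small counts that could identify individual patients.

```python
import math


def local_summary(values: list[float]) -> dict:
    """Computed at each partner site; only these aggregates leave the site."""
    return {
        "n": len(values),
        "total": sum(values),
        "total_sq": sum(v * v for v in values),
    }


def pooled(summaries: list[dict]) -> dict:
    """Combine per-site aggregates into a consortium-wide mean and sample SD."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    total_sq = sum(s["total_sq"] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return {"n": n, "mean": mean, "sd": math.sqrt(max(variance, 0.0))}


site_a = local_summary([1.0, 2.0, 3.0])  # made-up measurements
site_b = local_summary([4.0, 5.0])
result = pooled([site_a, site_b])
print(result["n"], result["mean"])  # prints 5 3.0
```

The pooled mean and standard deviation equal what a single analysis of all five values would give, although no site ever sees another site's rows.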
  51. 51. NMTrypI specific challenges • New visualizations of spreadsheet data • Cross-references with external databases • Chemical compound specific features – show structure – allow (sub)structure search – create compound summary reports
  52. 52. Visualization of enzyme inhibition by different compounds (in %): heat map + parallel coordinates plot
  53. 53. Automatic detection of UniProt IDs in Excel tables and links to UniProtKB and STRING
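
Detecting accessions in spreadsheet cells is essentially pattern matching. The sketch below assumes the accession pattern published in the UniProt help pages and works on a plain list of rows instead of an Excel file, so it runs with the standard library only. The function name, the sample table and the URL pattern are illustrative assumptions, not the SEEK implementation.

```python
import re

# Accession pattern as given in the UniProt help pages, wrapped in word boundaries.
UNIPROT_ACCESSION = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def find_accessions(rows: list[list[str]]) -> dict[str, str]:
    """Map each accession found in any cell to a UniProtKB entry URL."""
    links = {}
    for row in rows:
        for cell in row:
            for accession in UNIPROT_ACCESSION.findall(str(cell)):
                # URL pattern is an assumption; check the current UniProt site.
                links[accession] = f"https://www.uniprot.org/uniprotkb/{accession}"
    return links


# Made-up table standing in for the cells read from a spreadsheet.
table = [
    ["compound", "target", "inhibition (%)"],
    ["cmpd-1", "P12345", "87"],
    ["cmpd-2", "Q9Y6K9 (human)", "12"],
]
links = find_accessions(table)
print(sorted(links))  # prints ['P12345', 'Q9Y6K9']
```

A purely syntactic match can produce false positives on other six-character codes, so a production version would probably confirm each hit against the database before linking.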
  54. 54. Chemical compound specific features
  55. 55. de.NBI - The German Network for Bioinformatics Infrastructure. de.NBI consortium: • 39 project partners • 30 institutions • 8 service centers https://www.denbi.de/ Mission: • Provide, expand and improve specialized bioinformatics tools • Provide access to computing and storage capacities • Provide regular training events and workshops • Maintain and develop specific high-quality data resources
  56. 56. Research and service topics of de.NBI service centers
     HD-HuB (Bioinformatics Infrastructures in Biomedical Research) • Human genetics and genomics • Metagenomics • Systematic phenotyping of human cells • Epigenetics
     BiGi (Microbial Research for Biotechnology and Medicine) • High performance computing services • Repository of reusable workflows • Comparative genomics and meta-omics • Post-genomics data integration
     BioData (Reference Databases, Services and Tools) • Ribosomal RNAs (SILVA) • Environmental data (PANGAEA) • Taxon-associated metadata (BacDive) • Enzymes & Ligands (BRENDA/EnzymeStructures)
     CIBI (Tools for omics data and imaging) • Open-source libraries (OpenMS, SeqAn, FIJI) • Tools for NGS, mass spec, and imaging • Workflow engine (KNIME) for automation • (Multi-)omics data analysis workflows
     RBC (RNA Bioinformatics) • Analysis of RNA-related data • Life science data analysis with Galaxy • Meta-transcriptomics • Epigenetic research
     de.NBI-SysBio (Standards-based Systems Biology) • Data and model management tools • SABIO-RK reaction kinetics data • Methods and tools for modeling in Systems Biology • Standards & tools for model search and management
     GCBN (Crops and BioGreenFormatics) • Plant genetic resources and traits • Bridging genotypes to phenotypes • Plant gene and genome annotation • Enabling technologies to improve crops
     BioInfra.Prot (Bioinformatics for Proteomics) • Comprehensive proteomics workflow • Data publication, analysis & tool services • Quality standards for targeted proteomics • Lipidomics
  57. 57. Current Actions in de.NBI • Goal: Make Data FAIRness part of all de.NBI centers • Idea: Have service centers collect more metadata. No metadata, no service. • Approach: Build use cases that involve data management and service centers Two example use cases: Medical proteomics center • Statistical advice service – tracking of advice given – making reports FAIR • From data to PRIDE – Catalogue links to PRIDE in SEEK/FAIRDOMHub – Store and standardise intermediate files
  58. 58. Summary FAIRDOM FAIRDOM Software Platform+Tools A Central Public Hub for Projects Customised Project Installations Project Stewardship Consultancy Services Community Activities 144 Projects 30+ Installations
  59. 59. Summary FAIRDOM
     Find & Access: Central catalogue • Link to original files and external resources • Search • Metadata tagging and standards • Yellow pages of projects and people • Access control to spaces • Embedded tools
     Interoperate: Rich metadata, standards compliance • Consistent reporting – ISA • Curation support • Integration with other resources, archives, tools • Export packages
     Reuse: Secure sharing space • Long term retention • Reproducible publication
  60. 60. Why do you need data management? - Where do you store your experimental data? - What happens to the data when a PhD student leaves the group? - Are all data complete for a publication? - Do you make regular backups of your local machine? - Do you send emails to share data with your colleagues? - Do you always store email attachments in your local directory? - Do you store all different versions of a data file together in the same place? - Which protocol was used for the experiment? ...
  61. 61. What can you do? Be FAIR! 1. make a Data Management Plan 2. use standard identifiers 3. use metadata standards 4. catalogue / register data with metadata 5. define and share your SOPs 6. use data (assets) management platforms and tools that work together 7. deposit into public archives 8. have a sustainability / end project plan 9. resource and support, and that means people too 10. embed data management into work practices and do some training 11. give credit 12. check if you have sensitive data issues 13. educate your supervisors, institutions and peers
  62. 62. FAIRDOM Team
  63. 63. Thanks to our sponsors, partners and collaborators
  64. 64. Thank you! https://fair-dom.org/ Questions? ulrike.wittig@h-its.org