
Enhance your research impact through open science


Presentation slides on Open Science and research reproducibility. Presented by Gareth Knight (LSHTM Research Data Manager) on 18th September 2018, as part of an Open Science event for LSHTM Week 2018.

Published in: Technology


  1. Enhance your research impact through open science
     Gareth Knight, Research Data Manager, Library & Archives Service
     researchdatamanagement@lshtm.ac.uk
  2. Open Science
     A broad movement that seeks to improve the quality of research through greater:
     • Transparency: Methods are clearly explained and made available earlier.
     • Consistency: Common standards, tools and services are used to perform analysis.
     • Collaboration: Opportunities are available for external contribution to, and collaboration on, research.
     • Access: All resources necessary to recreate the analysis are made available in a form that enables verification & reuse.
     (Summary: it’s science with the benefit of 21st century tools)
  3. Reproducibility Crisis
     Vines et al. (2014) investigated data availability for 516 articles published 2-22 years previously: the odds of a dataset being obtainable fell by 17% per year.
     A 2016 Nature survey revealed that 52% of 1,576 surveyed researchers considered there to be a 'significant' reproducibility crisis in science.
     • Approx. 68% of respondents failed to reproduce a medical experiment.
     Research replication is time-consuming and expensive:
     • Cancer Biology: https://osf.io/e81xl/wiki/home/
     • Psychological Science: https://osf.io/ezcuj/wiki/home/
     Retraction Watch lists 18,000+ papers that have been retracted, many as a result of faulty science.
     Vines et al. (2014) https://doi.org/10.1016/j.cub.2013.11.014
     Nature (2016) https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
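
To make the 17% per-year figure on the slide above concrete, the sketch below compounds a fixed yearly drop in odds and converts the result back to a probability. The starting odds of 4:1 are a hypothetical value chosen only for illustration; they are not a figure reported by Vines et al.

```python
def odds_after(initial_odds: float, years: int, yearly_drop: float = 0.17) -> float:
    """Odds remaining after compounding a fixed proportional drop each year."""
    return initial_odds * (1.0 - yearly_drop) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    start = 4.0  # hypothetical starting odds of 4:1 (an 80% chance), for illustration
    for years in (0, 2, 10, 20):
        odds = odds_after(start, years)
        print(f"{years:>2} years: odds {odds:.2f}, probability {odds_to_probability(odds):.0%}")
```

Note that a 17% fall in odds is not the same as a 17% fall in probability, which is why the conversion step is kept separate.
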
  4. What are the benefits of open science?
     Analysis of open research practices and motivations of 583 Wellcome & 259 ESRC funded researchers:
     • Improved visibility of research
     • More publications
     • Higher citation rate (see Piwowar & Vision, 2013)
     • Contribute to academic profile
     • Career benefits (e.g. promotion)
     • New collaborations
     Van den Eynden, V. et al. (2016) Towards Open Research: Practices, experiences, barriers and opportunities. Wellcome Trust. https://doi.org/10.6084/m9.figshare.4055448
     Piwowar, H.A. and Vision, T.J. (2013) Data reuse and the open data citation advantage. https://doi.org/10.7717/peerj.175
  5. Open Science by Design
     [Diagram: research lifecycle stages Plan, Collect, Manage, Analyse, Publish, annotated with: enhanced research standards, open education resources, open software, citizen science & peer review opportunities, open access, reusable resources]
     Icon: https://www.flaticon.com/free-icon/scientist_857648
  6. Research Reproducibility
  7. Research Objectives
     Research is reviewed for many purposes:
     • Verification: check the analysis to confirm conclusions are valid
     • Replicate: same methods applied to get the same result, different environment
     • Reproduce: same methods applied, different setup
     • Reuse: same data, different research
     What steps do you take to make your research easier for others to validate, replicate, reproduce or reuse?
     The Difference https://xkcd.com/242/
  8. Plan for openness from the outset
     Plan
     • Be aware of requirements
     • Consider community engagement opportunities
     • Document the research protocol & publish it
     Data collection
     • Inform participants and relevant stakeholders
     • Acquire raw data in electronic form using secure systems (e.g. ODK)
     Data management
     • Organise resources logically
     • Ensure raw data is read only
     • Assign unique IDs to relevant items
     Data processing
     • Automate processing activities (as far as possible) in an open format so that they can be re-applied
     • Document activities performed to ensure an audit trail
     Data analysis
     • Provide opportunities for relevant individuals to contribute
     • Store resources used to underpin analysis (inc. those used to produce graphs)
     Reporting
     • Consider how resources can be made accessible
     • Ensure resources are curated & accessible in the long term
     https://doi.org/10.1371/journal.pcbi.1003285
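
Two of the data management steps above (read-only raw data, unique IDs) can be sketched in a few lines. This is one possible approach using only the Python standard library, not a method prescribed by the slides; the function names are hypothetical, and on Windows `chmod` only toggles the read-only flag.

```python
import stat
import uuid
from pathlib import Path


def make_read_only(path: Path) -> None:
    """Remove write permission for owner, group and others."""
    mode = path.stat().st_mode
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def new_item_id() -> str:
    """Return a random identifier that can be assigned to a file, sample or record."""
    return str(uuid.uuid4())
```

File permissions guard against accidental edits only; a storage location with enforced access control is still needed for real protection.
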
  9. Openness requirements
     Research practice:
     • Demonstrate rigour of research
     Funder requirements:
     • Gold vs. Green open access
     • Publication status, research data, other outputs
     Domain-specific reporting guidelines:
     • For study protocol and project outputs https://www.equator-network.org/
     Journal policies:
     • Transparency and Openness Promotion (TOP) https://cos.io/our-services/top-guidelines/
     • Joint Data Archiving Policy (JDAP) https://datadryad.org//pages/jdap
     https://cos.io/prereg/
  10. Storage and organisation
      • Ensure project resources are stored in a location that is secure and available to relevant parties
          • Can you find files from a project completed 10 years ago?
          • Store on the Secure Server or another defined location
      • Adopt a consistent structure to organise & label content
          • Content type (data, documents, code)
          • Version (raw, processed)
          • Sensitivity: store personal information in a secure location
      • Create a file inventory spreadsheet
          • Filename, location, content, source, sensitivity, etc.
      https://xkcd.com/1459/
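
The file inventory spreadsheet suggested above can be started automatically. The sketch below is an illustrative assumption about how such an inventory might be built, not a tool referenced in the slides: it walks a project folder and writes a CSV with the columns the slide lists, plus size and modification date. The function names are hypothetical, and the content, source and sensitivity columns are left empty for a person to complete.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["filename", "location", "size_bytes", "modified_utc",
          "content", "source", "sensitivity"]


def build_inventory(project_dir: Path) -> list[dict]:
    """List every file under project_dir with the facts that can be read automatically."""
    rows = []
    for path in sorted(project_dir.rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()
        rows.append({
            "filename": path.name,
            "location": str(path.parent.relative_to(project_dir)),
            "size_bytes": info.st_size,
            "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
            # Left blank on purpose: these columns need human judgement.
            "content": "",
            "source": "",
            "sensitivity": "",
        })
    return rows


def write_inventory(rows: list[dict], out_file: Path) -> None:
    """Write the inventory rows to a CSV file that opens in any spreadsheet tool."""
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A typical call would be `write_inventory(build_inventory(Path("my_project")), Path("file_inventory.csv"))`, where both paths are placeholders. Writing the CSV outside the project folder avoids the inventory listing itself on the next run.
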
  11. Tidy data
      A set of principles to make data more consistent.
      Common issues:
      • Column headers contain values
      • Multiple variables held in one column
      • Variables held in both rows and columns
      • Multiple types of observation recorded in the same table
      Wickham applies 3rd Normal Form:
      • One row for each observation
      • One column for each variable
      • One table for each type of observation
      • Column headers (where they are used) should be variable names
      Tidy data tools: tidyr, dplyr, ggplot2, data.table, pandas
      https://www.jstatsoft.org/article/view/v059i10/v59i10.pdf
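
The first common issue above (column headers that contain values) is easiest to see in a small example. The sketch below uses plain Python and made-up data so it runs without extra packages; the site names, years and counts are hypothetical, and `to_tidy` is an illustrative helper rather than a function from any of the tools listed.

```python
# Untidy: the column headers "2016" and "2017" are values of a "year" variable.
untidy = [
    {"site": "A", "2016": 12, "2017": 15},
    {"site": "B", "2016": 7, "2017": 9},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Turn each non-id column into its own row, giving one observation per row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy_rows.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy_rows


tidy = to_tidy(untidy, id_column="site", variable_name="year", value_name="cases")
for record in tidy:
    print(record)
```

In practice the listed tools do this reshaping directly, for example `melt` in pandas or `pivot_longer` in tidyr.
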
  12. Documentation & metadata
      What info is needed to replicate or re-apply your analysis?
      What info is needed to analyse and use your data?
      User guide:
      • Study design and data collection methods
      • Data analysis and preparation
      • Quality checks applied
      Codebook:
      • Variable type (continuous, ordinal, categorical, missing values, censored/redacted)
      • Permitted responses & their meaning (what is 1?)
      • Abbreviations & phrases
      Types of documentation:
      • Research protocols
      • Standard Operating Procedures
      • Codebooks & data dictionaries
      • Informed consent form & participant information sheet
      • Questionnaires, interview guide and other collection tools
      • Data papers and other publications
      • Other relevant documents
      http://www.dcc.ac.uk/resources/metadata-standards
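
A codebook becomes more useful when it is machine-readable, because the same file that answers "what is 1?" can also be used to check the data. The sketch below is a minimal illustration under assumed conventions: the variable names, codes, ranges and the missing-value marker are all hypothetical.

```python
# Hypothetical codebook: every name, code and range here is an example only.
CODEBOOK = {
    "sex": {
        "type": "categorical",
        "values": {1: "Female", 2: "Male", 9: "Not recorded"},
    },
    "age_years": {
        "type": "continuous",
        "min": 0,
        "max": 120,
        "missing": -99,  # assumed marker for a missing value
    },
}


def check_value(variable: str, value) -> bool:
    """Return True if value is permitted for variable according to CODEBOOK."""
    entry = CODEBOOK[variable]
    if entry["type"] == "categorical":
        return value in entry["values"]
    if value == entry.get("missing"):
        return True
    return entry["min"] <= value <= entry["max"]


def label_for(variable: str, value) -> str:
    """Look up the meaning of a coded categorical response."""
    return CODEBOOK[variable]["values"][value]
```
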
  13. Working with code and scripts in workflows
      • Use ‘open’ programming/scripting languages that do not depend on proprietary software
      • Don’t reinvent the wheel: reuse existing code if it serves the purpose
      • Don’t update the source data: generate a derived file and label it with a version number
      • Add a header to each code file that explains its purpose and indicates who created it and when
      • Add comments throughout the code explaining the purpose of functions and specific lines (if not obvious)
      • Document dependencies, including version numbers
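
The points above can be combined in a single script skeleton. The sketch below is one possible layout, not a required template: the script name, author and date in the header are placeholders, and the helper names are hypothetical. It shows a descriptive header, a versioned name for derived files so the source is never overwritten, and a way to record dependency versions using the standard library's `importlib.metadata`.

```python
"""
clean_survey.py (hypothetical script name)

Purpose: Describe in one or two sentences what this script does.
Author:  A. Researcher (placeholder)
Created: YYYY-MM-DD (placeholder)
Requires: Python 3.8 or later; list any third-party packages here.
"""
import platform
from importlib import metadata
from pathlib import Path


def derived_path(source: Path, version: int) -> Path:
    """Build the name of a derived file; the source file itself is left untouched."""
    return source.with_name(f"{source.stem}_v{version}{source.suffix}")


def describe_environment(packages):
    """Record interpreter and package versions so others can recreate the setup."""
    lines = [f"python=={platform.python_version()}"]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name} (not installed)")
    return lines
```

Saving the output of `describe_environment` alongside each derived file gives a simple record of which versions produced it.
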
  14. Providing access to resources
      What do you make available?
      • Anonymised data
      • Code
      • Research tools
      • Workflows
      When do you make it available?
      • During the project lifetime
      • On publication of findings
      • Within 6-12 months of publication
      Where do you host it?
      • What platforms are appropriate to your needs?
      How will access be provided?
      • Open vs. controlled access
      • Controlled access needs a reason, e.g. participant consent, identifiable data
      How will it be managed?
      • Corresponding author, Data Access Committee, Data Sharing Agreement
      https://www.flickr.com/photos/lwr/3897479560 https://www.flickr.com/photos/ryanr/142455033/
  15. Data sharing principles
      • Publish a description in a research catalogue
      • Obtain a permanent ID to make it easy to cite
      • Provide a clear method to obtain files (open vs. safeguarded)
      • Handle access consistently (PLOS requirement)
      • Use recognised domain standards & vocabularies
      • Use common formats, e.g. Stata, CSV
      • Apply a clear usage licence (Creative Commons or other)
      • Provide documentation relevant to researchers in your field
      The FAIR Guiding Principles for scientific data management and stewardship
  16. Resource management tools
      Functionality:
      • Lifecycle management
      • Object & version identifiers
      • Workflow description standards that balance generic & domain-specific needs (e.g. DDI Lifecycle, BPM variants)
      Platforms:
      • Electronic lab notebooks (RSpace, SciNote, LabArchives)
      • Code hosting: myExperiment, RunMyCode, GitHub/GitLab
      • Repository platforms: OSF, Data Compass
  17. Analysis and reporting tools
      A growing number of online tools allow you to create and share interactive documents that contain live code, data and other resources:
      • R Markdown: https://rmarkdown.rstudio.com/
      • Jupyter: http://jupyter.org/
      • Colaboratory: https://colab.research.google.com/
      Benefits:
      • Dynamic content that combines data & analysis
      • Development environment: R, Python, SQL
      Disadvantages:
      • Another complex platform to host & manage
      • Content will become publicly accessible
      Images sourced from project webpages
  18. In summary
      Open science requires you to consider:
      • Research stakeholders who will be interested in your work
      • The value of research outputs for verification and further use
      • Systems that will be used to collect, manage, analyse and provide access to research
      https://www.flickr.com/photos/keith_marshall_avery/8132240925/