Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Data Management in Research – Why and How?

137 Aufrufe

Veröffentlicht am

What is Data Management and why should it concern you?

Veröffentlicht in: Daten & Analysen
  • Als Erste(r) kommentieren

  • Gehören Sie zu den Ersten, denen das gefällt!

Data Management in Research – Why and How?

  1. 1. ||ETH Library / Scientific IT Services Ana Sesartic & Matthias Töwe Caterina Barillari Digital Curation Office Scientific IT-Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 1 Data Management in Research – Why and How?
  2. 2. ||ETH Library / Scientific IT Services  from the  Digital Curation Office at ETH-Library  Scientific IT-Services  sharing a scientific background ourselves  here to discuss data management as part of your research  to learn more about your needs in the process  and to motivate you to think critically about the chances and limitations of data management and re-use. 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 2 Nice to meet you, we’re…
  3. 3. ||ETH Library / Scientific IT Services  Your scientific background  Motivation to attend this course 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 3 Shortly tell us something about… © icon 54 from Noun Project
  4. 4. ||ETH Library / Scientific IT Services What is data management and why should it concern you? ETH Regulations • Intellectual property, privacy and access rights Data management plans Exercise • What does it take to understand someone‘s data? Coffee Break Active Data Management Data sharing • With project partners & with the community Long term preservation of data 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 4 Today’s Programme
  5. 5. ||ETH Library / Scientific IT Services What is data management and why should it concern you? Introduction 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 5
  6. 6. ||ETH Library / Scientific IT Services “A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.” Digital Curation Centre 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 6 What is data? Slide adapted from the PrePARe Project
  7. 7. ||ETH Library / Scientific IT Services  Manage your data while doing research  Share, publish, preserve your data for others – and for yourself ! 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 7 Two major issues
  8. 8. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 8 Why spend time and effort on this?  Preserve data that cannot be replicated (e.g. observational data)  Avoid redundant data creation/collection  Highlight patterns or connections that might otherwise be missed  To enable data re-use and sharing – even for yourself  Facilitate collaboration  Raise your impact: your data can be cited  To meet funders’ and institutional requirements  SNF asks for data management plans as of October 2017  EU Horizon 2020 asking for data management plans  Keep work in accordance to good scientific practice, transparency and validity  You may be able to influence the discussion in your community, in your institution and with funders
  9. 9. ||ETH Library / Scientific IT Services ETH regulations, intellectual property, privacy and access rights 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 9
  10. 10. ||ETH Library / Scientific IT Services https://itsecurity.ethz.ch/en/#/manage_your_data 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 10 Recent Overview
  11. 11. ||ETH Library / Scientific IT Services  At the ETH Zurich research is founded on intellectual honesty. Researchers […] are committed to scientific integrity and truthfulness in research and peer review.  For research data, see Art. 11, in particular.  https://doi.org/10.3929/ethz-b-000179298 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 11 Guidelines for Research Integrity
  12. 12. ||ETH Library / Scientific IT Services  Project Members:  adhere to the principles of good scientific practice and the guidelines for Research Integrity at ETH.  All steps of treatment of primary data must be documented in a form appropriate to the discipline and results must be reproducible.  Project Manager:  responsible for data management (data collection, storage, data access, compliance with data protection requirements, retention for the period prescribed by the discipline ...).  Ensures that all research project participants are aware of the guidelines.  Determines together with the professor, which departed project members should retain access to the primary data or materials.  See Guidelines for Research Integrity: https://doi.org/10.3929/ethz-b-000179298 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 12 Roles and Responsibilities
  13. 13. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 13 Do you know where your data is and who has access to it? http://fsfe.org/nocloud
  14. 14. ||ETH Library / Scientific IT Services  The removal of sensitive data from ETH Zurich (e.g. research data subject to contractual confidentiality with third parties, important ETH Zurich business data such as financial data, personal employee or student data, reports) is not permitted. ETH Zurich must retain access to and control over such data at all times.  The use of cloud and social media services (e.g. Facebook, Google, Dropbox) in research, for exchange with researchers at other universities, or in teaching for exchange with students (lecture folders, etc.) is permitted as long as no sensitive ETH Zurich data are affected and no third party rights, in particular privacy or intellectual property rights, are infringed. Links: https://www.ethz.ch/content/dam/ethz/associates/services/Service/IT-Services/files/broschueren/rechtliches/de/Merkblatt_Cloud_Computing_MA.pdf https://itsecurity.ethz.ch/leaflet_example_cloud_EN.pdf 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 14 Cloud Computing@ ETH Zurich Rules and Regulations © Symbolon from Noun Project
  15. 15. ||ETH Library / Scientific IT Services  […] all ETH members […] are required to integrate the general conditions and internal directives into the work process.  In the research context, the project manager plays an active role in guiding and monitoring junior scientists. In particular, he or she is responsible for making sure that everyone involved in the project is aware of the research integrity guidelines.  Junior scientists are given appropriate guidance.  Primary data is carefully archived.  From: https://rechtssammlung.sp.ethz.ch/Dokumente/133en.pdf 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 15 Compliance Guide
  16. 16. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 16 Privacy © Hea Poh Lin from Noun Project  People-related data need to be preserved according to Swiss data protection law Federal Act on Research involving Human Beings (https://www.admin.ch/opc/en/classified-compilation/20061313/index.html) Federal Act on Data Protection (https://www.admin.ch/opc/en/classified-compilation/19920153/index.html) Swiss Criminal Code (https://www.admin.ch/opc/en/classified-compilation/19370083/index.html)  Appropriate anonymization might be required  The deletion of individual datasets must be possible at all times  The study subjects need to sign a declaration of consent  More information: ETH Zürich Ethikkommission (German): https://www.ethz.ch/services/de/organisation/gremien-gruppen-kommissionen/ethikkommission.html
  17. 17. ||ETH Library / Scientific IT Services 17 Personalized Health Data in Switzerland Secure data management & computing environment:  Multi-factor authentication  Auditable user actions  Data encryption  Usage policy (IT, legal, ethical) User actions:  Authorized system access  Role based data access  Results can be shared, Data Not Ecosystem Requirements  distributed data sources  primary data management at data source  expose anonymized/identified sensitive personal data Ana Sesartic / Matthias Töwe / Caterina Barillari 25.10.2017  ID-SIS will provide trainings in this area starting from 2018
  18. 18. ||ETH Library / Scientific IT Services For publications and for data!  Respect the rights of others  Third parties  Individuals you work with  In case of doubt: seek permission even when a CC-licence is assigned  Note that according to ETH law, ETH reserves most immaterial rights in works by its employees. When in doubt, contact ETH transfer (www.transfer.ethz.ch)  Make sure you keep sufficient rights  Eg. for Open Access Publishing (green path)  Eg. with respect to patent applications: ETH transfer (www.transfer.ethz.ch) 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 18 Intellectual Property Rights: What you need to consider
  19. 19. ||ETH Library / Scientific IT Services Exercise: What it takes to understand someone’s data 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 19 ©TheUpturnedMicroscope
  20. 20. ||ETH Library / Scientific IT Services 1. What information is needed to understand your data? 2. What information do you expect from metadata in your field? Is this sufficient for you to work with others’ data? 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 20 Discuss
  21. 21. ||ETH Library / Scientific IT Services  Data, metadata and context are needed to properly understand a data set.  Data management does not start with your own data, but also includes a critical view on other people’s data you use:  Do you understand how they were produced?  Do you have enough information on evaluating their reliability?  Are you comfortable with using data without talking to its producers?  Will you know in a few months time which data you re-used from other researchers?  Do you know how to cite the data you use? 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 21 Critically re-thinking the (re-)use of data
  22. 22. ||ETH Library / Scientific IT Services Data Management Planning 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 22
  23. 23. ||ETH Library / Scientific IT Services A brief plan written at the start of a project and updated during its course to define:  What data will be collected or created?  How the data will be documented and described?  Where the data will be stored?  Who will be responsible for data security and backup?  Which data will be shared and/or preserved?  How the data will be shared and with whom? 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 23 What is a Data Management Plan (DMP)? DMPs are e.g. demanded by: SNSF from October 2017 on http://www.snf.ch/en/theSNSF/research- policies/open_research_data/Pages/default.aspx Horizon2020 EU funding programme http://ec.europa.eu/research/participants/data/ref/h2020/grant s_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
  24. 24. ||ETH Library / Scientific IT Services  Aim: plan and document the research data life cycle from creation to preservation  Facilitate making data FAIR: Findable – Accessible – Interoperable – Re-usable  At minimum: make underlying data to publications accessible  DMP must be entered in mySNF web form together with proposal  It is a living document to be updated until project is finished  Final version of the DMP will be moved to P3 grants database  DMP is not part of the scientific assessment, but checked for plausibility by SNSF staff  Additional funding available for costs of enabling access (up to 10’000.- CHF)  Advice from the SNSF: comment on any issues not applicable for your project and give reasons  See SNSF’s documentation at http://www.snf.ch/en/theSNSF/research- policies/open_research_data/Pages/default.aspx, also contact ord@snf.ch for questions 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 24 The Data Management Plan (DMP) according to SNSF
  25. 25. ||ETH Library / Scientific IT Services  A proposal can only be submitted if a DMP was created  A DMP for SNSF must be created online in mySNF  You cannot upload a DMP created outside of mySNF – except in Lead Agency process, where the DMP has to be uploaded as a PDF version in the data container “other annexes”  Contents of DMP:  Instructions and examples for ETH Zurich: 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 25 How to submit a DMP to SNSF https://www.mysnf.ch http://www.snf.ch/SiteCollectionDocuments/DMP_content_mySNF-form_en.pdf http://www.library.ethz.ch/en/Media/Files/DLCM-template-for-the-SNSF-Data-Management-Plan
  26. 26. ||ETH Library / Scientific IT Services  Data Management Checklist by ETH and EPFL  Supports you in the creation of a DMP or in discussing data management in general, even if you don’t need to do it to comply with funders  http://bit.ly/rdmchecklist  DMPOnline  A tool by the UK Digital Curation Centre that helps you create Horizon 2020 compliant data management plans, by answering a questionnaire  https://dmponline.dcc.ac.uk 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 26 What to do for other funders? Collection of DMP examples: http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
  27. 27. ||ETH Library / Scientific IT Services  Self-critical question:  What must data look like to enable us to re-use it with scientific conviction and trust into its quality and correctness?  Is this true for our own data? What is missing?  Tasks for group leaders  Agree on binding rules  Define data management responsible (DMR) within the group  Discuss and document rules (in writing) with DMR 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 27 Research Group Policy
  28. 28. ||ETH Library / Scientific IT Services Group discussion on current practices in data management 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 28
  29. 29. ||ETH Library / Scientific IT Services  Naming conventions: Do you have any? Which rules apply?  Versioning: How do you currently handle it? What works well? What went wrong?  ELNs: Do you have experience with any?  Sharing: Which tools or services do you use? What are your experiences?  Literature Management: Which tools do you use? What are their pros and cons? 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 29 Topics
  30. 30. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 30
  31. 31. ||ETH Library / Scientific IT Services Active research data management 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 31 Picture by Mushonz (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
  32. 32. ||ETH Library / Scientific IT Services Research workflow in experimental labs Sample preparation Measurement Analysis Publication 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 32 ?
  33. 33. ||ETH Library / Scientific IT Services What is active research data management? ARDM is the process of annotating, storing and backing up all data when it is generated, throughout the life-span of a project. Why is it important? 1. Data without context is not very useful, same for experimental descriptions without data. 2. Think about your future self. Will you be able to find/reproduce your data in 6 months, 1 year? Having organized & documented data simplifies the process of writing papers and PhD thesis. 3. Think about others. Would a new lab member be able find/reproduce your data? 4. Always have backup in place! 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 33
  34. 34. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 34 Data types in experimental labs Meeting notes Presentations Literature .... Generic files Descriptions of experimental procedures Protocols List of all materials used in lab (e.g chemicals, bio- samples, equipment) Materials Data generated by processing of raw data Processed dataData generated by measuring instruments Raw data Data generated by analysis of raw/processed data with different software/scripts Analyzed data Scripts used for data processing/ analysis Scripts
  35. 35. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 35 Storage options at ETH Personal computer Measuring instrument Local HD • Departmental IT Service Group (ISG) is the primary contact regarding data storage & backups • ETH IT Services provides centralized storage solutions and backup File server across network NAS For large amounts of data (>100TB) CDS Long term storage of valuable data LTS
  36. 36. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 36 The “data spread” Local HD NAS CDS LTS Generic files Protocols Materials Raw data Processed data Analyzed data Scripts
  37. 37. ||ETH Library / Scientific IT Services What does it take to manage research data? 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari Experimental description / notes Date Title Materials Methods Analysis Results Data files (raw, analysed, scripts, etc) Materials/samples 37 Protocols  Complex process that requires tracking and linking different types of information
  38. 38. ||ETH Library / Scientific IT Services 1. Management of materials and samples 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 38  Biological samples  Chemical samples  Materials  Devices How? What?  Not scalable  No sharing  No efficient search  Easy to use  Scalable  Sharing  Search functionality  Require time for set up and maintenance Spreadsheets Database
  39. 39. ||ETH Library / Scientific IT Services 2. Management of protocols 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 39  Step by step description of procedure  Experimental/computational parameters (e.g. temperature, time, etc)  Machine used (experimental)  OS, program, version, etc (calculation) How? What?  Not scalable  No sharing  No efficient search  Easy to use  Scalable  Sharing  Search functionality  Require time for set up and maintenance  Scalable  Sharing  Search functionality  Versioning  Not scalable  No sharing  No search  Easy to usePaper notebook Text files Database
  40. 40. ||ETH Library / Scientific IT Services 3. Management of research data files 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari Files/folders naming conventions  Structured organization of data  Data is annotated with metadata  Searchable  One central location 40 Raw data Processed data Analyzed data Scripts What? How? MyPhD Admin Contracts Budget Lab Gear Conference Travel Academic Writing Reviews Proposals Publications Paper 1 Images TeX Src Paper 2 Modelling Source Code Original Modified Input Data Output Data Lab Data Exp. 1 Exp. 2 Database (DM platform)
  41. 41. ||ETH Library / Scientific IT Services  Keep stuff together that belongs together  Keep path names short  < 255 characters  File names should  Reflect content and be unique  Use only ASCII characters (no diacritic characters)  No spaces  Lowercase or camel case (LikeThis)  Careful! Not all systems are case sensitive!  UNIX: case sensitive  Win/Mac: mostly case insensitive  Assume that this, THIS and tHiS are the same.  Document your structure and file naming conventions in a README text file 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 41 File organisation tipps  Write dates like this: YYYY-MM-DD © XKCD https://xkcd.com/1179/ For further file and folder organisation tipps, see:  http://www.data.cam.ac.uk/data-management- guide/organising-your-data  http://www.wur.nl/en/Expertise-Services/Data- Management-Support-Hub/Browse-by- Subject/Organising-files-and-folders.htm  http://datalib.edina.ac.uk/mantra/organisingdata/
  42. 42. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 42 MyPhD Admin Contracts Budget Lab Gear Conference Travel Academic Writing Reviews Proposals Publications Paper 1 Images TeX Src Paper 2 Modelling Source Code Original Modified Input Data Output Data Lab Data Exp. 1 Exp. 2 A possible structure…
  43. 43. ||ETH Library / Scientific IT Services Version control 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 43  Management of changes to documents and computer programs https://subversion.apache.org https://github.com https://bitbucket.org https://www.ethz.ch/services/en/it-services/catalogue/web-application-hosting/sharepoint.html (For documents only!) https://gitlab.ethz.ch  ID-SIS provides hands-on trainings on git for code management (info @ https://sis.id.ethz.ch/consulting) How? What?  Naming convention with numbers (filename_v1, filename_v2, filename_FINAL, etc)  Use versioning tools
  44. 44. ||ETH Library / Scientific IT Services 4. Experimental description / notes 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari Date Title Materials Methods Analysis Results 44  Goals  Materials  Methods  Experimental/computational procedure  Analysis procedures  Results  Links to data What to document?
  45. 45. ||ETH Library / Scientific IT Services Metadata & Standards 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 45  Metadata is the data about your data.  Use of structured metadata facilitates data organization and searches.  Examples of metadata:  Investigator  Date  Title  Description  Several metadata schemas are available. For info check the DCC website  Standards (taxonomies, synonyms, ontologies) are important to guarantee consistency.  General standards:  ISO 8601 for dates (YYYY-MM-DD or YYYYMMDD)  ISO 6709 for latitude/longitude  standards for SI base units (meters, kilograms, etc)  Scientific standards examples  Biology -> Gene ontology, NCBI taxonomy, etc  Physical sciences -> IUPAC, InChI  Earth science and ecology -> USGS Thesaurus, GIS dictionary, etc  Math & computer science -> Mathematics Subject Classification, ACM Computing Classification System
  46. 46. ||ETH Library / Scientific IT Services 4. Experimental description / notes 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari Date Title Materials Methods Analysis Results Paper laboratory notebook Electronic laboratory notebook (ELN) 46  Goals  Materials  Methods  Experimental/computational procedure  Analysis procedures  Results  Links to data How? What to document?
  47. 47. ||ETH Library / Scientific IT Services Electronic Laboratory Notebooks  ELNs are the digital version of the classic paper laboratory notebooks, used to record experimental lab procedures  Several open-source and commercial ELNs are available:  market analysis based on LIMSwiki.org (74 commercials + 11 open source)  There are different types of ELNs:  Generic note keeping applications (e.g. OneNote, Evernote, etc)  Discipline-specific ELNs  Free online cloud-based version  ELN with local installation 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 47
  48. 48. ||ETH Library / Scientific IT Services ELN vs. paper notebook 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 48 Advantages of ELN over paper notebook: 1. Sharing 2. Most ELNs have rights management 3. Most ELNs keep track of changes 4. Searching 5. Easier to link digital data 6. No issues with handwriting 7. Can be backed up Disadvantages of ELNs over paper notebooks: 1. Require change in working mode 2. Have a learning curve When choosing an ELN always make sure you can export your data in a common open format (e.g. xml, html, pdf, .txt, .docx, etc)
  49. 49. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 49 1. Paper lab notebooks ? Exp. description Local HD NAS CDS LTS Generic files Protocols Materials Raw data Processed data Analyzed data Scripts MyPhD Admin Contracts Budget Lab Gear Conference Travel Academic Writing Reviews Proposals Publications Paper 1 Images TeX Src Paper 2 Modelling Source Code Original Modified Input Data Output Data Lab Data Exp. 1 Exp. 2
  50. 50. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 50 2. Generic note-keeping applications Central local storage Cloud Generic files Protocols ? Local HD NAS CDS LTS Scripts Raw data Processed data Analyze d data Materials Date Title Materials Methods Analysis Results MyPhD Admin Contracts Budget Lab Gear Conference Travel Academic Writing Reviews Proposals Publications Paper 1 Images TeX Src Paper 2 Modelling Source Code Original Modified Input Data Output Data Lab Data Exp. 1 Exp. 2 Advantages over paper notebook:  Can be shared  Small files upload
  51. 51. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 51 3. Scientific online ELNs Generic files Protocols (Materials) Cloud ? Local HD NAS CDS LTS Scripts Raw data Processed data Analyze d data (Materials) Date Title Materials Methods Analysis Results 1. Online ELNs store data on an external could: can you upload your data? 2. Free version only offers limited space, so these should not be considered DM solutions MyPhD Admin Contracts Budget Lab Gear Conference Travel Academic Writing Reviews Proposals Publications Paper 1 Images TeX Src Paper 2 Modelling Source Code Original Modified Input Data Output Data Lab Data Exp. 1 Exp. 2
  52. 52. ||ETH Library / Scientific IT Services 4. Local installation of ELN+LIMS 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 52 Central local storage Generic files Protocols Materials Raw data Analyzed data Processed data Date Title Materials Methods Analysis Results Scripts ETH
  53. 53. ||ETH Library / Scientific IT Services Notebooks for code documentation 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 53 • Open source + commercial edition • Integrated development environment for R IPython Beaker • Open source • Several languages supported (Python, R, Scala, Julia, etc)  Applications that combine documentation, code, input and output generated by the code (e.g. graphs, plots) • Open source • > 40 languages supported (Python, R, Scala, Julia, etc) • Open source • Python only Mathematica • Commercial (Wolfram Research) • Used in scientific, engineering, mathematical fields
  54. 54. ||ETH Library / Scientific IT Services openBIS ELN-LIMS (ETH Scientific IT Services) 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 54
  55. 55. ||ETH Library / Scientific IT Services  Originally developed for management of life sciences data within SystemsX projects (CISD 2007-2013, ID-SIS since 2013)  Generic underlying structure makes it amenable to be used in other disciplines  Use-case in IFV: used to store data on students, educational materials, test results in a project to evaluate effectiveness of educational materials on students across Switzerland.  Physics UNIGE/CERN: used to keep track of all components used to build the cameras for the cherenkov telescopes and all technical data + quality control reports associated with them.  Currently used in several labs and facilities at ETH, in other Swiss and European universities, and a few companies 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 55 openBIS facts
  56. 56. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 56 openBIS ELN-LIMS in a nutshell Workflow manager (e.g Snakemake) Date Title Descripti on ….
  57. 57. ||ETH Library / Scientific IT Services openBIS ELN-LIMS features Relationships Storage manager Plasmid mapsBLAST search Import/Export File upload Big data Instrument integration 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 57 https://openbis-eln-lims.ethz.ch Jupyter integration Ordering
  58. 58. ||ETH Library / Scientific IT Services openBIS ELN-LIMS features Users need to authenticate to the system to login. Users can have different roles: Instance admin, Space admin, Space User, Space observer, Instance observer. Data is stored in “datasets”, which are immutable. Modified data must be uploaded in a new dataset. All modifications made to any entity in the system are recorded in the database. Space admins and Instance admins are allowed to delete entities. This permission can be removed upon request. A record of deleted entities can be kept in the database. Datasets can be archived to affordable tape storage when no longer needed. They can be recovered from this storage when needed. It is possible to export the complete lab notebook or parts of it. 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 58 https://openbis-eln-lims.ethz.ch Rights management Audit Trail Deletion Data preservation Data archiving Export
  59. 59. ||ETH Library / Scientific IT Services  Basic service for research groups:  Install and maintain openBIS on ID infrastructure.  Initial training.  Continuous support. 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 59 openBIS as a service from ID-SIS at ETH  Additional services (on demand)  Database customization.  Migration of existing databases.  Instrument integration for direct data upload.  Upload of existing historic raw data.  From 2018, SIS will have the mandate to provide active data management services to all ETH.
  60. 60. ||ETH Library / Scientific IT Services Remarks on active research data management  There are several options available  There is no “best for all”, it all depends on your research field, data types and research workflow  ARDM requires WORK & TIME, but the time spent on this is an investment for the future! 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 60
  61. 61. ||ETH Library / Scientific IT Services Collaboration – Creative Commons – Data Sharing 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 61
  62. 62. ||ETH Library / Scientific IT Services Data sharing 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 62  Data sharing / collaboration with project partners (during the project)  Data sharing with / publishing to the community (after publication of results)  Creative Commons Licenses for third parties
  63. 63. ||ETH Library / Scientific IT Services Only conditionally recommended  Data stored in EU/USA  Security regulations only partially fulfilled  Never store sensitive / private data there! Recommended  Data stored in Switzerland  Security regulations fulfilled 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 63 File Sharing tools https://www.dropbox.com https://www.switch.ch/drive/ https://www.switch.ch/filesender https://cifex.ethz.ch/ https://polybox.ethz.ch https://www.wetransfer.com
  64. 64. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 64 Collaborative project management tools https://www.openproject.org http://www.redmine.org https://trello.com https://slack.com https://tagpacker.com https://asana.com
  65. 65. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 65 Collaborative writing tools https://www.overleaf.com https://www.authorea.com https://atlas.oreilly.com https://hypothes.is https://evernote.com http://simplenote.com https://www.onenote.com https://www.ethz.ch/services/en/it- services/catalogue/web-application- hosting/sharepoint.html
  66. 66. ||ETH Library / Scientific IT Services www.zotero.org 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 66 Reference Management tools www.mendeley.com endnote.com http://www.jabref.org www.citeulike.org www.bibsonomy.org
  67. 67. ||ETH Library / Scientific IT Services Data sharing with the community And how to prepare for it 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 67
  68. 68. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 68 Benefits of data sharing © Neil Chue Hong http://dx.doi.org/10.6084/m9.figshare.942289
  69. 69. ||ETH Library / Scientific IT Services “In genomics research, a large-scale analysis of data sharing shows that studies that made data available in repositories received 9% more citations, when controlling for other variables; and that whilst self-reuse citation declines steeply after two years, reuse by third parties increases even after six years.” (Piwowar and Vision, 2013) 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 69 Benefits of Open Data: Impact and Longevity Van den Eynden, V. and Bishop, L. (2014). Incentives and motivations for sharing research data, a researcher’s perspective. A Knowledge Exchange Report, http://repository.jisc.ac.uk/5662/1/KE_report- incentives-for-sharing-researchdata.pdf
  70. 70. ||ETH Library / Scientific IT Services www.dcc.ac.uk/resources/how-guides/license-research-data 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 70 Licensing research data Outlines pros and cons of each approach and gives practical advice on how to implement your licence CREATIVE COMMONS LIMITATIONS NC Non-Commercial What counts as commercial? SA Share Alike Reduces interoperability ND No Derivatives Severely restricts use Horizon 2020 guidelines point to OR
  71. 71. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 71 share-alike by non-derivative Some rights reserved share non-commercial public domainremix
  72. 72. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 72 Data should be FAIR CC-BY-SA 4.0 Sangya Pundir https://upload.wikimedia.org/wikipedia/commons/a/aa/FAIR_data_principles.jpg
  73. 73. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 73 Deposit in a repository – but in which one? http://databib.org http://service.re3data.org/search
  74. 74. ||ETH Library / Scientific IT Services  Where will your data reside?  Which legislation applies, e.g. in terms of data protection?  Is the service sustainable?  Do you trust the provider?  Who else can access and use which of your data?  How can you get your data back?  Is a certain license required?  Are there immediate or longer term costs? 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 74 Criteria for chosing services and tools © Jorgen Stamp
  75. 75. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 75 Data repositories http://www.re3data.org http://datadryad.org https://zenodo.org http://figshare.com https://www.openaire.eu/search/data-providers (Only partially recommendable as according to their Terms of Use, figshare is allowed to delete data anytime and without notice.)
  76. 76. ||ETH Library / Scientific IT Services  New one-stop-shop for depositing research output ETH Research Collection (https://www.research-collection.ethz.ch)  Publications, Research Data  Web upload, DOI-reservation and registration, ORCID, Export to OpenAire…  Long term preservation in ETH Data Archive (http://www.library.ethz.ch/Digital-Curation)  Metadata is always public, access to content may be delayed or restricted  Aligned with FAIR principles (Findable – Accessible – Interoperable – Re-usable) according to SNSF guidelines. 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 76 ETH Research Collection
  77. 77. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 77 Registry of Publications / University Bibliography Web pages (AEM) Annual Academic Achievements Slide by Barbara Hirschmann
  78. 78. ||ETH Library / Scientific IT Services  Primary Publication of Reports, Presentations, Dissertations etc.  Secondary Publication of scientific papers (Green Road to Open Access) 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 78 Open Access Repository Publisher’s version Open Access version Slide by Barbara Hirschmann
  79. 79. ||ETH Library / Scientific IT Services • Publication of Research Data as Supplementary Material or stand alone • Access limited to selected users • Deposit for preservation only • All file formats permitted • Retention periods: 10 years / 15 years / unlimited 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 79 Research Data Repository Slide by Barbara Hirschmann
  80. 80. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 80 3 Ways for Importing Data Manual Entry Web of Science / Scopus: daily data export Input form DOI- Search Batch-Import: BibTex / RIS New Entry in Research Collection Slide by Barbara Hirschmann
  81. 81. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 81 Selection of Access Rights for Full Texts / Data Open access Embargoed ETHZ users Selected users Closed access Publications   Research Data      Slide by Barbara Hirschmann
  82. 82. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 82 Citable DOIs & Possibility to reserve a DOI Slide by Barbara Hirschmann
  83. 83. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 83 Citation Numbers / Altmetrics / Download-Statistics Slide by Barbara Hirschmann
  84. 84. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 84 Linking between Data Set and Publication Slide by Barbara Hirschmann
  85. 85. ||ETH Library / Scientific IT Services  Legal Issues in Open Acess Publishing  Open-Access- and Guidelines of Research Funders (SNSF, EU)  Data Management and Digital Curation  ORCID support 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 85 Advice and Support by ETH Library www.research-collection.ethz.ch Mail: research-collection@library.ethz.ch Tel. 27 222 Slide by Barbara Hirschmann
  86. 86. ||ETH Library / Scientific IT Services Long term preservation of data And how to prepare for it 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 86
  87. 87. ||ETH Library / Scientific IT Services What does long term mean?  Different time horizons and purposes  Keeping data for at least ten years to ensure accountability if results are challenged (as defined in the ETH “Guidelines for Research Integrity”)  Potentially unlimited retention of data with permanent value (e.g. long running series of observational data)  Permanent retention of published data which is considered as part of the scientific record and is expected to remain available just like articles and journals are  In general “long term” signifies any time period which spans technological changes in the way data is being used 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 87
  88. 88. ||ETH Library / Scientific IT Services  Proper data management or its absence determine if preservation of data will be possible  For a period of ten years, data management alone might suffice, but thinking further ahead is useful  If data is to be kept and used for longer periods, further measures apply:  Data should be as self-contained as possible, including documentation of any tools used or better: the tools themselves; remember e.g. including reference outputs for model algorithms  More care is required in the choice and use of file formats 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 88 How does this relate to data management?
  89. 89. ||ETH Library / Scientific IT Services  Open standards (non proprietary)  If proprietary, convert or if not possible include data viewer  Well documented  Widely used and supported by many tools  Uncompressed (or at least losslessly compressed)  Unencrypted  When in doubt, keep original and create a copy in an open or exchange format  Don’t rely on file extensions  Consider that data might be used in different operating systems 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 89 Preferences for file formats
  90. 90. ||ETH Library / Scientific IT Services  Images: uncompressed TIFF; JPEG2000  Text: ASCII, including XML etc.  Add encoding information and dependencies such as stylesheets or TeX-libraries  Text (page based): PDF/A1-b, (PDF)  Data from spreadsheets: CSV  Spreadsheets: (CSV), (ODF, OOXML)  More information: https://documentation.library.ethz.ch/display/DD/File+formats+for+archiving 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 90 Examples
  91. 91. ||ETH Library / Scientific IT Services  This does not mean you «must not» keep data in other formats  Just be aware that proprietary or undocumented formats (even your own!) might cause trouble in the future  Think about adding an alternative format (yes, redundantly) for a proprietary one…  …and add any context information you yourself would like to have on your own formats in a few years time in a readme-file, an accompanying document or as metadata 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 91 Note
  92. 92. ||ETH Library / Scientific IT Services  Digital preservation solution for ETH Zurich, operated by ETH Library  Automatically archives content from Research Collection and also heritage content from ETH University Archives and ETH Library  Handles «Software Disclosure» workflow for ETH transfer  For certain automated use cases, Research Data can also be submitted directly to ETH Data Archive via dedicated interfaces  Data previously organised in docuteam packer will also be submitted to ETH Data Archive  More information: https://www.library.ethz.ch/Digital-Curation 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 92 ETH Data Archive
  93. 93. ||ETH Library / Scientific IT Services Further Services and Trainings 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 93
  94. 94. ||ETH Library / Scientific IT Services IT Services  Storage provisioning (usually via your IT Support Group)  Research data management support www.sis.id.ethz.ch/researchdatamanagement  openBIS Electronic Lab Notebook & Laboratory Information Management System https://openbis-eln-lims.ethz.ch Versioning  Gitlab - gitlab.ethz.ch (hosted by IT services)  SharePoint - mysite.sp.ethz.ch (free up to 1 GB) ETH transfer https://www.ethz.ch/en/the-eth-zurich/organisation/staff-units/eth-transfer.html  Software disclosure workflow with ETH Data Archive  Advice on Intellectual Property, Patents, Licensing of Software etc. 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 94 IT services and ETH transfer
  95. 95. ||ETH Library / Scientific IT Services  Training courses on information research, reference management, data management, scientific writing and open access by the ETH Library: http://www.library.ethz.ch/en/Services/Training-courses-guided-tours  Comprehensive workshop on data management offered by ETH Library in collaboration with Scientific IT Services: see link above or ask for additional dates!  Courses offered by the ETH Information Center for Chemistry/Biology/Pharmacy: http://www.infozentrum.ethz.ch/en/whats-up/events/  Further topics on demand 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 95 Trainings
  96. 96. ||ETH Library / Scientific IT Services  Think about what you do!  Start early  Agree on clean concepts and simple tools  You do not need the latest sophisticated apps – but there are useful tools  Talk to colleagues  Check what your local service providers can offer  «Keep it as simple as possible – but distrust it!» 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 96 Take home message
  97. 97. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 97Source: https://doi.org/10.22010/ethz-exp-0002-en
  98. 98. ||ETH Library / Scientific IT Services Dr. Matthias Töwe Head Digital Curation ETH-Bibliothek Rämistrasse 101 8092 Zurich 044 632 60 32 matthias.toewe@library.ethz.ch 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 98 Thank you! Questions? Dr. Ana Sesartic Digital Curation ETH-Bibliothek Rämistrasse 101 8092 Zurich 044 632 73 76 ana.sesartic@library.ethz.ch www.library.ethz.ch/Digital-Curation data-archive@library.ethz.ch Dr. Caterina Barillari ID Scientific IT Services Mattenstrasse 22 4058 Basel 061 387 31 29 caterina.barillari@id.ethz.ch Digital Curation Scientific IT Services https://sis.id.ethz.ch/ sis.helpdesk@ethz.ch Research Data www.ethz.ch/researchdata researchdata@ethz.ch
  99. 99. ||ETH Library / Scientific IT Services 25.10.2017Ana Sesartic / Matthias Töwe / Caterina Barillari 99 We need your… Please fill out the course evaluation form provided – Thank you!

×