
Data Management in Research – Why and How?

What is Data Management and why should it concern you?


  1. Data Management in Research – Why and How?
     ETH Library / Scientific IT Services / D-BSSE IT Services
     Matthias Töwe, Caterina Barillari, John Ryan (Digital Curation Office, Scientific IT Services, D-BSSE IT Services)
     30.11.2017, Caterina Barillari / John Ryan / Ana Sesartic / Matthias Töwe
  2. Nice to meet you, we're…
     • from the
         • Digital Curation Office at ETH Library
         • Scientific IT Services
         • D-BSSE IT Services
     • sharing a scientific background ourselves
     • here to discuss data management as part of your research
     • to learn more about your needs in the process
     • and to motivate you to think critically about the opportunities and limitations of data management and re-use.
  3. Today's Programme
     • What is data management and why should it concern you?
     • ETH regulations: intellectual property, privacy and access rights
     • Data management plans
     • Active data management
     • Coffee break
     • Data sharing: with project partners and with the community
     • Long-term preservation of data
  4. Introduction: What is data management and why should it concern you?
  5. What is data?
     "A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing." (Digital Curation Centre)
     Slide adapted from the PrePARe Project
  6. The two phases of research data management
     • Manage your data while doing research
     • Share, publish, preserve your data for others – and for yourself!
  7. Why spend time and effort on this?
     • Preserve data that cannot be replicated (e.g. observational data)
     • Avoid redundant data creation/collection
     • Highlight patterns or connections that might otherwise be missed
     • Enable data re-use and sharing, even for yourself
     • Facilitate collaboration
     • Raise your impact: your data can be cited
     • Meet funders' and institutional requirements
         • SNSF asks for data management plans as of October 2017
         • EU Horizon 2020 asks for data management plans
     • Keep work in accordance with good scientific practice, transparency and validity
     • You may be able to influence the discussion in your community, in your institution and with funders
  8. What is the issue?
     To retrieve data and to work with it in a scientifically sound way, comprehensive documentation is required.
     • This often goes beyond what is covered in publications and must be gathered and preserved, too: context, aim, sampling, measurement protocol, hardware/tools, software, algorithms, errors, implicit knowledge (???)
  9. ETH regulations and legal issues
  10. Guidelines for Research Integrity
     • At the ETH Zurich research is founded on intellectual honesty. Researchers […] are committed to scientific integrity and truthfulness in research and peer review.
     • For research data, see Art. 11 in particular.
     • https://doi.org/10.3929/ethz-b-000179298
  11. Roles and Responsibilities: Assign on Group Level
     • All project members:
         • adhere to the principles of good scientific practice and the Guidelines for Research Integrity at ETH.
         • All steps in the treatment of primary data must be documented in a form appropriate to the discipline, and results must be reproducible.
     • Project manager / principal investigator:
         • is responsible for data management (data collection, storage, data access, compliance with data protection requirements, retention for the period prescribed by the discipline ...).
         • ensures that all research project participants are aware of the guidelines.
         • determines, together with the professor, which departed project members should retain access to the primary data or materials.
     • See Guidelines for Research Integrity: https://doi.org/10.3929/ethz-b-000179298
  12. Do you know where your data is and who has access to it?
     http://fsfe.org/nocloud
  13. Cloud Computing @ ETH Zurich: Rules and Regulations
     • The removal of sensitive data from ETH Zurich (e.g. research data subject to contractual confidentiality with third parties, important ETH Zurich business data such as financial data, personal employee or student data, reports) is not permitted. ETH Zurich must retain access to and control over such data at all times.
     • The use of cloud and social media services (e.g. Facebook, Google, Dropbox) in research, for exchange with researchers at other universities, or in teaching for exchange with students (lecture folders, etc.) is permitted as long as no sensitive ETH Zurich data are affected and no third party rights, in particular privacy or intellectual property rights, are infringed.
     Links:
     • https://www.ethz.ch/content/dam/ethz/associates/services/Service/IT-Services/files/broschueren/rechtliches/de/Merkblatt_Cloud_Computing_MA.pdf
     • https://itsecurity.ethz.ch/leaflet_example_cloud_EN.pdf
     Image © Symbolon from Noun Project
  14. Compliance Guide
     • […] all ETH members […] are required to integrate the general conditions and internal directives into the work process.
     • In the research context, the project manager plays an active role in guiding and monitoring junior scientists. In particular, he or she is responsible for making sure that everyone involved in the project is aware of the research integrity guidelines.
     • Junior scientists are given appropriate guidance.
     • Primary data is carefully archived.
     • From: https://rechtssammlung.sp.ethz.ch/Dokumente/133en.pdf
  15. Recent Overview – Suitable as Introduction for Students
     https://itsecurity.ethz.ch/en/#/manage_your_data
  16. Privacy
     • People-related data need to be preserved according to Swiss data protection law:
         • Federal Act on Research involving Human Beings (https://www.admin.ch/opc/en/classified-compilation/20061313/index.html)
         • Federal Act on Data Protection (https://www.admin.ch/opc/en/classified-compilation/19920153/index.html)
         • Swiss Criminal Code (https://www.admin.ch/opc/en/classified-compilation/19370083/index.html)
     • Appropriate anonymization might be required
     • The deletion of individual datasets must be possible at all times
     • The study subjects need to sign a declaration of consent
     • More information: ETH Zürich Ethikkommission (German): https://www.ethz.ch/services/de/organisation/gremien-gruppen-kommissionen/ethikkommission.html
     Image © Hea Poh Lin from Noun Project
  17. Personalized Health Data in Switzerland
     • Secure data management & computing environment:
         • Multi-factor authentication
         • Auditable user actions
         • Data encryption
         • Usage policy (IT, legal, ethical)
     • User actions:
         • Authorized system access
         • Role-based data access
         • Results can be shared, data cannot
     • Ecosystem requirements:
         • distributed data sources
         • primary data management at data source
         • expose anonymized/identified sensitive personal data
     • ID-SIS will provide training in this area starting from 2018
     (Slide dated 25.10.2017, Ana Sesartic / Matthias Töwe / Caterina Barillari)
  18. Intellectual Property Rights: What you need to consider
     For publications and for data!
     • Respect the rights of others
         • Third parties
         • Individuals you work with
         • In case of doubt: seek permission even when a CC licence is assigned
         • Note that according to ETH law, ETH reserves most immaterial rights in works by its employees. When in doubt, contact ETH transfer (www.transfer.ethz.ch)
     • Make sure you keep sufficient rights
         • E.g. for Open Access publishing (green path)
         • E.g. with respect to patent applications: ETH transfer (www.transfer.ethz.ch)
     Interested? Ask for an in-depth workshop!
  19. Data Management Planning
  20. What is a Data Management Plan (DMP)?
     A brief plan written at the start of a project and updated during its course to define:
     • What data will be collected or created?
     • How will the data be documented and described?
     • Where will the data be stored?
     • Who will be responsible for data security and backup?
     • Which data will be shared and/or preserved?
     • How will the data be shared, and with whom?
     DMPs are demanded by, for example:
     • SNSF, since October 2017: http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx
     • The Horizon 2020 EU funding programme: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
  21. The Data Management Plan (DMP) according to SNSF
     • Aim: plan and document the research data life cycle from creation to preservation
     • Facilitate making data FAIR: Findable – Accessible – Interoperable – Re-usable
     • At minimum: make the data underlying publications accessible
     • The DMP must be entered in the mySNF web form together with the proposal and updated as a living document
     • The final version of the DMP will be moved to the P3 grants database
     • The DMP is not part of the scientific assessment, but is checked for plausibility by SNSF staff
     • Additional funding is available for the costs of enabling access (generally up to CHF 10'000)
     • Advice from the SNSF: comment on any issues not applicable to your project and give reasons
     • See SNSF's documentation at http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx; contact ord@snf.ch for questions
  22. How to submit a DMP to SNSF
     • A proposal can only be submitted if a DMP was created
     • A DMP for SNSF must be created online in mySNF: https://www.mysnf.ch
     • You cannot upload a DMP created outside of mySNF, except in the Lead Agency process, where the DMP has to be uploaded as a PDF in the data container "other annexes"
     • Contents of DMP: http://www.snf.ch/SiteCollectionDocuments/DMP_content_mySNF-form_en.pdf
     • Instructions and examples for ETH Zurich: http://www.library.ethz.ch/en/Media/Files/DLCM-template-for-the-SNSF-Data-Management-Plan
  23. What to do for other funders?
     • Data Management Checklist by ETH and EPFL
         • Supports you in creating a DMP or in discussing data management in general, even if you don't need to do it to comply with funders
         • http://bit.ly/rdmchecklist
     • DMPOnline
         • A tool by the UK Digital Curation Centre that helps you create Horizon 2020 compliant data management plans by answering a questionnaire
         • https://dmponline.dcc.ac.uk
     • Collection of DMP examples: http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
  24. Research Group Policy
     • Self-critical questions:
         • What must data look like to enable us to re-use it with scientific conviction and trust in its quality and correctness?
         • Is this true for our own data? What is missing?
     • Tasks for group leaders
         • Agree on binding rules according to your discipline's best practices
         • Appoint a data management responsible (DMR) within the group
         • Discuss and document rules (in writing) with the DMR
  25. Active research data management
  26. Research workflow in experimental labs
     Sample preparation → Measurement → Analysis → Publication
  27. What is active research data management?
     ARDM is the process of annotating, storing and backing up all data when it is generated, throughout the life-span of a project.
     Why is it important?
     1. Data without context is not very useful; the same holds for experimental descriptions without data.
     2. Think about your future self. Will you be able to find/reproduce your data in 6 months or 1 year? Having organized and documented data simplifies writing papers and a PhD thesis.
     3. Think about others. Would a new lab member be able to find/reproduce your data?
     4. Always have a backup in place!
  28. Data types in experimental labs
     • Generic files: meeting notes, presentations, literature, ...
     • Protocols: descriptions of experimental procedures
     • Materials: list of all materials used in the lab (e.g. chemicals, bio-samples, equipment)
     • Raw data: data generated by measuring instruments
     • Processed data: data generated by processing of raw data
     • Analyzed data: data generated by analysis of raw/processed data with different software/scripts
     • Scripts: scripts/code used for data processing/analysis
  29. Storage options at D-BSSE
     • Local HD: personal computer, measuring instrument
     • NAS: file server across network
     • CDS: for large amounts of data (>100 TB)
     • LTS: long-term storage of valuable data
     Notes:
     • D-BSSE ITSC is the primary contact regarding data storage & backups
     • ETH IT Services also provides centralized storage solutions and backup, which the D-BSSE may make more use of in the future.
  30. Storage growth at D-BSSE (figure covering the storage tiers Local HD, NAS, CDS and LTS)
  31. NAS storage growth at D-BSSE over the last 6 years (figure)
  32. Average cost per GB for hard drives (figure)
  33. Local HD encryption at D-BSSE
     • If your local HD is encrypted and someone has your laptop, your data and emails are safe.
     • D-BSSE ITSC is now encrypting all new or repaired Mac laptops with "FileVault".
     • We are also planning to implement "BitLocker" on all Windows laptops early next year.
     • If you would like your disk encrypted, please talk to our ITSC team. We will keep a copy of your recovery keys, in case another authorized person needs access to your data in your absence.
  34. The "data spread" (figure: generic files, protocols, materials, raw data, processed data, analyzed data and scripts spread across Local HD, NAS, CDS and LTS)
  35. What does it take to manage research data?
     • A complex process that requires tracking and linking different types of information:
         • Materials/samples
         • Protocols
         • Data files (raw, analysed, scripts, etc.)
         • Experimental description/notes (title, date, materials, methods, analysis, results)
  36. (1) Management of materials and samples
     • What? Biological samples, chemical samples, materials, devices
     • How?
         • Spreadsheets: easy to use; not scalable, no sharing, no efficient search
         • Database: scalable, sharing, search functionality; requires time for setup and maintenance
  37. (2) Management of protocols
     • What?
         • Step-by-step description of the procedure
         • Experimental/computational parameters (e.g. temperature, time, etc.)
         • Machine used (experimental)
         • OS, program, version, etc. (calculation)
     • How?
         • Paper notebook: easy to use; not scalable, no sharing, no search
         • Text files: easy to use; not scalable, no sharing, no efficient search
         • Database: scalable, sharing, search functionality, versioning; requires time for setup and maintenance
  38. (3) Management of research data files
     • What? Raw data, processed data, analyzed data, scripts
     • How?
         • File/folder naming conventions: structured organization of data (example folder tree: MyPhD with Admin, Academic Writing, Modelling and Lab Data)
         • DM platform: data is annotated with metadata, searchable, one central location
  39. File organisation tips
     • Keep stuff together that belongs together
     • Keep path names short: < 255 characters
     • File names should
         • reflect content and be unique
         • use only ASCII characters (no diacritic characters)
         • contain no spaces
         • be lowercase or camel case (LikeThis)
     • Careful! Not all systems are case sensitive!
         • UNIX: case sensitive
         • Win/Mac: mostly case insensitive
         • Assume that this, THIS and tHiS are the same.
     • Write dates like this: YYYY-MM-DD (© XKCD, https://xkcd.com/1179/)
     • Document your structure and file naming conventions in a README text file
     For further file and folder organisation tips, see:
     • http://www.data.cam.ac.uk/data-management-guide/organising-your-data
     • http://www.wur.nl/en/Expertise-Services/Data-Management-Support-Hub/Browse-by-Subject/Organising-files-and-folders.htm
     • http://datalib.edina.ac.uk/mantra/organisingdata/
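Several of these naming tips can be checked automatically. The following is a minimal sketch in Python and is not part of the original slides: the function names `check_path`, `has_iso_date` and `case_collisions` are hypothetical, and applying the 255-character limit to the full path is an assumption.

```python
import re
from pathlib import PurePosixPath

MAX_PATH_LENGTH = 255  # limit quoted on the slide, applied here to the full path (assumption)
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # YYYY-MM-DD


def check_path(path):
    """Return a list of problems found in a relative path (empty list = no problems)."""
    problems = []
    if len(path) >= MAX_PATH_LENGTH:
        problems.append("path is 255 characters or longer")
    for part in PurePosixPath(path).parts:
        if not part.isascii():
            problems.append(f"non-ASCII characters in {part!r}")
        if " " in part:
            problems.append(f"space in {part!r}")
    return problems


def has_iso_date(name):
    """True if the name contains a date written as YYYY-MM-DD."""
    return ISO_DATE.search(name) is not None


def case_collisions(names):
    """Return pairs of names that differ only by upper/lower case."""
    seen = {}
    collisions = []
    for name in names:
        key = name.lower()
        if key in seen and seen[key] != name:
            collisions.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return collisions


if __name__ == "__main__":
    print(check_path("lab_data/exp1/2017-11-30_growthCurve.csv"))
    print(check_path("Lab Data/résumé.txt"))
    print(case_collisions(["this.txt", "THIS.txt", "other.txt"]))
```

Treating names that differ only in case as collisions follows the slide's advice to assume that this, THIS and tHiS are the same.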
  40. A possible structure…
     • MyPhD
         • Admin: Contracts, Budget, Lab Gear, Conference Travel
         • Academic Writing: Reviews, Proposals, Publications (Paper 1 with Images and TeX Src, Paper 2)
         • Modelling: Source Code (Original, Modified), Input Data, Output Data
         • Lab Data: Exp. 1, Exp. 2
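A structure like this could be created with a small script. This is a sketch under assumptions: the nesting of the folders is a guess from the slide text, the names are lowercased with underscores to follow the naming tips, and `create_skeleton` and the README wording are hypothetical.

```python
import tempfile
from pathlib import Path

# Folder names come from the slide; the nesting is an assumption.
STRUCTURE = {
    "admin": ["contracts", "budget", "lab_gear", "conference_travel"],
    "academic_writing": [
        "reviews",
        "proposals",
        "publications/paper_1/images",
        "publications/paper_1/tex_src",
        "publications/paper_2",
    ],
    "modelling": [
        "source_code/original",
        "source_code/modified",
        "input_data",
        "output_data",
    ],
    "lab_data": ["exp_1", "exp_2"],
}


def create_skeleton(root):
    """Create the folder tree under root and return the list of leaf folders."""
    created = []
    for top, subfolders in STRUCTURE.items():
        for sub in subfolders:
            folder = root / top / sub
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    # A README documents the structure and naming conventions, as the tips suggest.
    readme = root / "README.txt"
    if not readme.exists():
        readme.write_text(
            "Describe the folder structure and file naming conventions here.\n",
            encoding="utf-8",
        )
    return created


if __name__ == "__main__":
    # Demonstration in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_skeleton(Path(tmp) / "my_phd"):
            print(folder.relative_to(tmp))
```

Because `mkdir` is called with `exist_ok=True` and the README is only written when missing, the script can be re-run on an existing project without overwriting anything.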
  41. Version control
     • What? Management of changes to documents and computer programs
     • How?
         • Naming convention with numbers (filename_v1, filename_v2, filename_FINAL, etc.)
         • Use versioning tools
     • Hosted at ETH Zurich:
         • https://gitlab.ethz.ch
         • https://www.ethz.ch/services/en/it-services/catalogue/web-application-hosting/sharepoint.html (for documents only!)
     • Cloud based – use with consideration:
         • https://github.com
         • https://bitbucket.org
     • Also shown: https://subversion.apache.org
     • ID-SIS provides hands-on training on git for code management (info at https://sis.id.ethz.ch/consulting)
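For files that are not kept in a versioning tool, the numbered naming convention can be applied consistently with a small helper. This is a sketch, assuming the `_vN` suffix shown on the slide; `next_version_name` is a hypothetical name and not part of any tool mentioned here.

```python
import re

# Matches names such as "report_v12" and captures the stem and the number.
VERSION_SUFFIX = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def next_version_name(existing_names, stem):
    """Return the next free '<stem>_vN' name given the names already in use."""
    highest = 0
    for name in existing_names:
        match = VERSION_SUFFIX.match(name)
        if match and match.group("stem") == stem:
            highest = max(highest, int(match.group("num")))
    return f"{stem}_v{highest + 1}"


if __name__ == "__main__":
    print(next_version_name(["report_v1", "report_v2", "other_v7"], "report"))
```

Comparing the numbers as integers rather than as strings keeps the order correct once a file reaches version 10 or higher. For source code, a versioning tool such as git remains the more robust option.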
  42. (4) Experimental description / notes
     • What to document?
         • Goals
         • Materials
         • Methods
             • Experimental/computational procedure
             • Analysis procedures
         • Results
         • Links to data
     • Notebook entry fields: title, date, materials, methods, analysis, results
  43. Metadata & Standards
     • Metadata is the data about your data.
     • Use of structured metadata facilitates data organization and searches.
     • Examples of metadata: investigator, date, title, description
     • Several metadata schemas are available. For info, check the DCC website.
     • Standards (taxonomies, synonyms, ontologies) are important to guarantee consistency.
     • General standards:
         • ISO 8601 for dates (YYYY-MM-DD or YYYYMMDD)
         • ISO 6709 for latitude/longitude
         • Standards for SI base units (meters, kilograms, etc.)
     • Examples of scientific standards:
         • Biology: Gene Ontology, NCBI Taxonomy, etc.
         • Physical sciences: IUPAC, InChI
         • Earth science and ecology: USGS Thesaurus, GIS dictionary, etc.
         • Math & computer science: Mathematics Subject Classification, ACM Computing Classification System
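As an illustration of structured metadata, the four example fields can be stored in a small machine-readable record. This is a sketch only: the field names, the JSON format, the function `build_metadata` and the sample values are assumptions for illustration and do not follow any particular metadata schema.

```python
import json
from datetime import date


def build_metadata(investigator, title, description, created):
    """Build a metadata record with the example fields from the slide."""
    return {
        "investigator": investigator,
        "date": created.isoformat(),  # ISO 8601 calendar date, YYYY-MM-DD
        "title": title,
        "description": description,
    }


# Sample values are invented for illustration.
record = build_metadata(
    investigator="Jane Doe",
    title="Growth curve, experiment 1",
    description="Optical density measured every 30 minutes",
    created=date(2017, 11, 30),
)
print(json.dumps(record, indent=2))
```

Generating the date with `date.isoformat()` rather than typing it by hand keeps every record in the same ISO 8601 form, which makes the records sortable and searchable.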
  44. (4) Experimental description / notes
     • What to document?
         • Goals
         • Materials
         • Methods
             • Experimental/computational procedure
             • Analysis procedures
         • Results
         • Links to data
     • How? Paper laboratory notebook or electronic laboratory notebook (ELN)
     • Notebook entry fields: title, date, materials, methods, analysis, results
  45. Electronic Laboratory Notebooks
     • ELNs are the digital version of the classic paper laboratory notebooks, used to record experimental lab procedures… and much more!
     • Several open-source and commercial ELNs are available:
         • market analysis based on LIMSwiki.org (74 commercial + 11 open source)
     • There are different types of ELNs:
         • Generic note-keeping applications (e.g. OneNote, Evernote, etc.)
         • Discipline-specific ELNs
         • Free online cloud-based versions
         • ELNs with local installation
  46. ELN vs. paper notebook
     Advantages of ELNs over paper notebooks:
     1. Sharing
     2. Most ELNs have rights management
     3. Most ELNs keep track of changes
     4. Searching
     5. Easier to link digital data
     6. No issues with handwriting
     7. Can be backed up
     8. Can be provided as DM solutions in DMPs
     Disadvantages of ELNs compared with paper notebooks:
     1. Require a change in working mode
     2. Have a learning curve
     When choosing an ELN, always make sure you can export your data in a common open format (e.g. .xml, .html, .pdf, .txt, .docx, etc.)
  47. (1) Paper lab notebooks (figure relating the experimental description to the data types, the storage locations Local HD, NAS, CDS and LTS, the example folder tree and a DM platform)
  48. (2) Generic note-keeping applications (figure relating notebook entries in central local storage or the cloud to the data types, the storage locations Local HD, NAS, CDS and LTS, the example folder tree and a DM platform)
     Advantages over paper notebook:
     • Can be shared
     • Small files can be uploaded
  49. (3) Scientific online ELNs (figure relating notebook entries in the cloud to the data types, the storage locations Local HD, NAS, CDS and LTS, the example folder tree and a DM platform)
     1. Free online ELNs store data on an external cloud: can you upload your data?
     2. The free version only offers limited space, so these should not be considered DM solutions
  50. (4) Local installation of ELN+LIMS (figure: notebook entries with date, title, materials, methods, analysis and results, together with generic files, protocols, materials, raw data, processed data, analyzed data and scripts, in central local storage at ETH)
  51. Notebooks for code documentation
     • Applications that combine documentation, code, input and output generated by the code (e.g. graphs, plots)
     • Tools shown:
         • An integrated development environment for R: open source + commercial edition
         • A notebook supporting > 40 languages (Python, R, Scala, Julia, etc.): open source
         • IPython: open source, Python only
         • Beaker: open source, several languages supported (Python, R, Scala, Julia, etc.)
         • Mathematica: commercial (Wolfram Research), used in scientific, engineering and mathematical fields
  52. openBIS ELN-LIMS (ETH Scientific IT Services)
  53. openBIS facts
     • Originally developed for management of life sciences data within SystemsX projects (CISD 2007-2013, ID-SIS since 2013)
     • In addition to data management, it can also be used as a Laboratory Information Management System (LIMS) and Electronic Lab Notebook (ELN)
     • Currently used in several labs and facilities at ETH, in other Swiss and European universities, and in a few companies
     • Its generic underlying structure makes it suitable for use in other disciplines
  54. openBIS ELN-LIMS in a nutshell (figure: direct upload + metadata registration; workflow manager, e.g. Snakemake; notebook entries with title, date, materials, methods, analysis and results)
  55. openBIS ELN-LIMS features
     • Relationships
     • Storage manager
     • Plasmid maps
     • BLAST search
     • Import/Export
     • File upload
     • Big data
     • Instrument integration
     • Jupyter integration
     • Ordering
     https://openbis.elnlims.ch
  56. openBIS ELN-LIMS features
     • Rights management: users need to authenticate to the system to log in. Users can have different roles: Instance admin, Space admin, Space user, Space observer, Instance observer.
     • Data preservation: data is stored in "datasets", which are immutable. Modified data must be uploaded in a new dataset.
     • Audit trail: all modifications made to any entity in the system are recorded in the database.
     • Deletion: Space admins and Instance admins are allowed to delete entities. This permission can be removed upon request. A record of deleted entities can be kept in the database.
     • Data archiving: datasets can be archived to affordable tape storage when no longer needed. They can be recovered from this storage when needed.
     • Export: it is possible to export the complete lab notebook or parts of it.
     https://openbis.elnlims.ch
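The principle of immutable datasets combined with an audit trail can be illustrated independently of openBIS. The sketch below is not the openBIS API: `Dataset`, `DatasetStore`, the `DS-0001` style codes and the layout of the audit entries are all invented for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    """An immutable dataset: fields cannot be reassigned after creation."""
    code: str
    content: bytes
    parent: Optional[str] = None  # code of the dataset this one was derived from


class DatasetStore:
    """Toy in-memory store that records every registration in an audit trail."""

    def __init__(self):
        self._datasets = {}
        self.audit_trail = []

    def register(self, content, user, parent=None):
        if parent is not None and parent not in self._datasets:
            raise KeyError(f"unknown parent dataset: {parent}")
        code = f"DS-{len(self._datasets) + 1:04d}"
        dataset = Dataset(code=code, content=content, parent=parent)
        self._datasets[code] = dataset
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((timestamp, user, "register", code))
        return dataset


store = DatasetStore()
raw = store.register(b"1,2,3", user="alice")
# Modified data goes into a new dataset that points back to the original.
corrected = store.register(b"1,2,4", user="alice", parent=raw.code)

try:
    raw.content = b"overwritten"
except FrozenInstanceError:
    print("datasets cannot be modified in place")
```

Keeping the original dataset untouched and linking the corrected one to it preserves the full history, which is what makes later re-analysis traceable.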
  57. openBIS users @ D-BSSE
     • Research groups: Dittrich, Stelling, Müller, Pantazis, Panke, Schröder, Iber, Reddy
     • Facilities: GFB, SCU
  58. Remarks on active research data management
     • There are several options available
     • There is no "best for all"; it all depends on your research field, data types and research workflow
     • ARDM requires WORK & TIME, but the time spent on this is an investment for the future!
  60. Collaboration – Creative Commons – Data Sharing
  61. Data sharing
     • Data sharing / collaboration with project partners (during the project)
     • Data sharing with / publishing to the community (after publication of results)
     • Creative Commons licenses for third parties
  62. File sharing tools
     • Recommended (data stored in Switzerland, security regulations fulfilled):
         • https://polybox.ethz.ch
         • https://cifex.ethz.ch/
         • https://www.switch.ch/filesender
         • https://www.switch.ch/drive/ (not subscribed by ETH Zurich)
     • Only conditionally recommended (data stored in EU/USA, security regulations only partially fulfilled; never store sensitive / private data there!):
         • https://www.dropbox.com
         • https://www.wetransfer.com
  63. Collaborative project management tools
      • https://www.openproject.org
      • http://www.redmine.org
      • https://trello.com
      • https://slack.com
      • https://tagpacker.com
      • https://asana.com
      Cloud based – use with consideration!
      Hosted at ETH Zurich (@ D-BSSE): https://jira-bsse.ethz.ch
  64. Collaborative writing tools
      Cloud based – use with consideration (sometimes, on-site installations are also available):
      • https://www.overleaf.com
      • https://www.authorea.com
      • https://atlas.oreilly.com
      • https://hypothes.is
      • https://evernote.com
      • http://simplenote.com
      • https://www.onenote.com
      Hosted at ETH Zurich:
      • https://www.ethz.ch/services/en/it-services/catalogue/web-application-hosting/sharepoint.html
      • https://www.ethz.ch/services/en/it-services/catalogue/web-application-hosting/wiki.html
      @ D-BSSE:
      • https://wiki-bsse.ethz.ch
  65. Reference Management tools
      • www.zotero.org
      • www.mendeley.com
      • endnote.com
      • http://www.jabref.org
      • www.citeulike.org
      • www.bibsonomy.org
      Increasingly cloud based or synchronised with the cloud – use with consideration! Your reading can expose your research interests.
  66. Data sharing with the community – And how to prepare for it
  67. Benefits of Open Data: Impact and Longevity
      “In genomics research, a large-scale analysis of data sharing shows that studies that made data available in repositories received 9% more citations, when controlling for other variables; and that whilst self-reuse citation declines steeply after two years, reuse by third parties increases even after six years.” (Piwowar and Vision, 2013)
      Source: Van den Eynden, V. and Bishop, L. (2014). Incentives and motivations for sharing research data, a researcher’s perspective. A Knowledge Exchange Report, http://repository.jisc.ac.uk/5662/1/KE_report-incentives-for-sharing-researchdata.pdf
  68. Licensing research data
      www.dcc.ac.uk/resources/how-guides/license-research-data
      Outlines pros and cons of each approach and gives practical advice on how to implement your licence.
      Creative Commons limitations:
      • NC (Non-Commercial): What counts as commercial?
      • SA (Share Alike): Reduces interoperability
      • ND (No Derivatives): Severely restricts use
      Horizon 2020 guidelines point to [licence image] OR [licence image]
  69. [Figure: Creative Commons licence elements. Labels: by, share-alike, non-commercial, non-derivative, public domain, share, remix, “Some rights reserved”]
  70. Data should be FAIR
      [Figure: FAIR data principles. CC-BY-SA 4.0, Sangya Pundir, https://upload.wikimedia.org/wikipedia/commons/a/aa/FAIR_data_principles.jpg]
  71. Deposit in a repository – but in which one?
      • http://databib.org
      • http://service.re3data.org/search
  72. Criteria for choosing services and tools
      • Where will your data reside?
      • Which legislation applies, e.g. in terms of data protection?
      • Is the service sustainable?
      • Do you trust the provider?
      • Who else can access and use which of your data?
      • How can you get your data back?
      • Is a certain license required?
      • Are there immediate or longer term costs?
      [Image: © Jorgen Stamp]
  73. Data Repositories
      Repositories:
      • http://datadryad.org
      • https://zenodo.org
      • http://figshare.com (Only partially recommendable: according to their Terms of Use, figshare is allowed to delete data anytime and without notice.)
      Registries of existing repositories – not hosting data themselves:
      • http://www.re3data.org
      • https://www.openaire.eu/search/data-providers
  74. ETH Research Collection
      • New one-stop-shop for depositing research output: ETH Research Collection (https://www.research-collection.ethz.ch)
      • Publications, Research Data
      • Web upload, DOI reservation and registration, ORCID, export to OpenAIRE…
      • Long term preservation in ETH Data Archive (http://www.library.ethz.ch/Digital-Curation)
      • Metadata is always public; access to content may be delayed or restricted
      • Aligned with FAIR principles (Findable – Accessible – Interoperable – Re-usable) according to SNSF guidelines
  75. [Figure labels: Registry of Publications / University Bibliography; Web pages (AEM); Annual Academic Achievements]
      Slide by Barbara Hirschmann
  76. Open Access Repository
      • Primary Publication of Reports, Presentations, Dissertations etc.
      • Secondary Publication of scientific papers (Green Road to Open Access)
      [Figure labels: Publisher’s version; Open Access version]
      Slide by Barbara Hirschmann
  77. Research Data Repository
      • Publication of Research Data as Supplementary Material or stand alone
      • Access limited to selected users
      • Deposit for preservation only
      • All file formats permitted
      • Retention periods: 10 years / 15 years / unlimited
      Slide by Barbara Hirschmann
  78. 3 Ways for Importing Data
      [Figure labels: Manual Entry; Web of Science / Scopus: daily data export; Input form; DOI-Search; Batch-Import: BibTeX / RIS; New Entry in Research Collection]
      Slide by Barbara Hirschmann
  79. Selection of Access Rights for Full Texts / Data
      Options: Open access, Embargoed, ETHZ users, Selected users, Closed access
      • Publications: ✓ ✓ ✓ (three of the five options)
      • Research Data: ✓ ✓ ✓ ✓ ✓ (all five options)
      Slide by Barbara Hirschmann
  80. Citable DOIs & Possibility to reserve a DOI
      Slide by Barbara Hirschmann
  81. Citation Numbers / Altmetrics / Download-Statistics
      Slide by Barbara Hirschmann
  82. Linking between Data Set and Publication
      Slide by Barbara Hirschmann
  83. Advice and Support by ETH Library
      • Legal Issues in Open Access Publishing
      • Open Access guidelines of research funders (SNSF, EU)
      • Data Management and Digital Curation
      • ORCID support
      www.research-collection.ethz.ch
      Mail: research-collection@library.ethz.ch
      Tel. 27 222
      Slide by Barbara Hirschmann
  84. Long term preservation of data – And how to prepare for it
  85. Time scales of different storage solutions
      Short term: Everyday storage on NAS + backup
        • 3 months to detect data loss before overwrite
      Few years, up to 10: Storage on LTS
        • Immutable storage (secondary copy does not depend on primary)
        • It is up to you to keep track of what is where
        • Satisfies requirements for being able to present data on demand
      10 years to permanent: ETH Data Archive via Research Collection or API
        • Storage on NAS/HSM or LTS, both with replication
        • Retention for 10 or 15 years or permanently
        • Metadata required, citable with DOI, format analysis
        • Satisfies requirements for data sharing, publishing, and preservation
  86. What does long term mean?
      • Different time horizons and purposes
      • Keeping data for at least ten years to ensure accountability if results are challenged (as defined in the ETH “Guidelines for Research Integrity”)
      • Potentially unlimited retention of data with permanent value (e.g. long running series of observational data)
      • Permanent retention of published data which is considered part of the scientific record and is expected to remain available just like articles and journals are
      • In general, “long term” signifies any time period which spans technological changes in the way data is being used
  87. How does this relate to data management?
      • Proper data management, or its absence, determines whether preservation of data will be possible
      • For a period of ten years, data management alone might suffice, but thinking further ahead is useful
      • If data is to be kept and used for longer periods, further measures apply:
      • Data should be as self-contained as possible, including documentation of any tools used or, better, the tools themselves; remember, for example, to include reference outputs for model algorithms
      • More care is required in the choice and use of file formats
  88. Preferences for file formats
      • Open standards (non-proprietary)
      • If proprietary, convert; if that is not possible, include a data viewer
      • Well documented
      • Widely used and supported by many tools
      • Uncompressed (or at least losslessly compressed)
      • Unencrypted
      • When in doubt, keep the original and create a copy in an open or exchange format
      • Don’t rely on file extensions
      • Consider that data might be used in different operating systems
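The advice not to rely on file extensions can be illustrated with a minimal Python sketch that guesses a format from the first bytes of a file instead of its name. The function name `sniff_format` and the short signature table are assumptions made for this illustration; they are not part of the slides or of any ETH tool, and a real format analysis covers far more formats.

```python
# Leading byte signatures ("magic numbers") of a few common formats.
# The selection is illustrative only, not a complete registry.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"\xff\xd8\xff": "JPEG",
    b"PK\x03\x04": "ZIP container (also used by ODF and OOXML)",
}


def sniff_format(path):
    """Return a format name guessed from the file's first bytes, or None if unknown."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for signature, name in SIGNATURES.items():
        if head.startswith(signature):
            return name
    return None
```

A file named `results.txt` that starts with `%PDF-` would be reported as PDF, while a plain CSV file renamed to `data.pdf` would yield `None`, which is exactly the kind of mismatch the extension alone hides.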
  89. Examples
      • Images: uncompressed TIFF; JPEG2000
      • Text: ASCII, including XML etc.
      • Add encoding information and dependencies such as stylesheets or TeX libraries
      • Text (page based): PDF/A-1b, (PDF)
      • Data from spreadsheets: CSV
      • Spreadsheets: (CSV), (ODF, OOXML)
      • More information: https://documentation.library.ethz.ch/display/DD/File+formats+for+archiving
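As a hedged illustration of the CSV and encoding advice, the following Python sketch writes tabular data as CSV and records the encoding in a small accompanying text file. The function name, the `.readme.txt` suffix, the readme fields and the choice of UTF-8 (a superset of ASCII) are assumptions made for this example, not requirements stated on the slides.

```python
import csv
import os


def export_csv_with_readme(header, rows, csv_path):
    """Write rows as UTF-8 CSV and note encoding and layout in a sidecar readme."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)  # default dialect: comma-separated
        writer.writerow(header)
        writer.writerows(rows)

    # Hypothetical naming convention: readme sits next to the data file.
    readme_path = csv_path + ".readme.txt"
    with open(readme_path, "w", encoding="utf-8") as handle:
        handle.write("File: " + os.path.basename(csv_path) + "\n")
        handle.write("Encoding: UTF-8\n")
        handle.write("Delimiter: comma\n")
        handle.write("Columns: " + ", ".join(header) + "\n")
    return readme_path
```

Calling `export_csv_with_readme(["sample", "od600"], [["a1", 0.5]], "growth.csv")` would produce `growth.csv` plus `growth.csv.readme.txt`; the column names here are made up for the example.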
  90. Note
      • This does not mean you «must not» keep data in other formats
      • Just be aware that proprietary or undocumented formats (even your own!) might cause trouble in the future
      • Think about adding an alternative format (yes, redundantly) for a proprietary one…
      • …and add any context information you yourself would like to have on your own formats in a few years’ time, in a readme file, an accompanying document or as metadata
      • If possible, try to retain executables required to open such files and document their dependencies (system and hardware requirements…)
  91. ETH Data Archive via API
      • Digital preservation solution for ETH Zurich, operated by ETH Library
      • Automatically archives content from Research Collection and also heritage content from ETH University Archives and ETH Library
      • Handles «Software Disclosure» workflow for ETH transfer
      • For certain automated use cases, Research Data can also be submitted directly to ETH Data Archive via API
      • Data previously organised in docuteam packer will also be submitted to ETH Data Archive
      • More information: https://www.library.ethz.ch/Digital-Curation
  92. Further Services and Trainings
  93. ETH transfer
      ETH transfer: https://www.ethz.ch/en/the-eth-zurich/organisation/staff-units/eth-transfer.html
      • Software disclosure workflow with ETH Data Archive
      • Advice on Intellectual Property, Patents, Licensing of Software etc.
      Statistical Consulting Service @ D-MATH: https://www.math.ethz.ch/sfs/consulting.html
      • Consulting service and contractual data analysis: https://www.math.ethz.ch/sfs/consulting/consulting-service.html
      • Statistics and software courses: https://www.math.ethz.ch/sfs/consulting/software-courses.html
  94. Trainings
      • Training courses on information research, reference management, data management, scientific writing, open access and more by the ETH Library – or «Book a Librarian»: http://www.library.ethz.ch/en/Services/Training-courses-guided-tours
      • Comprehensive workshop on data management offered by ETH Library in collaboration with Scientific IT Services: see link above or ask for additional dates!
      • SIS trainings and courses (e.g. openBIS, Python, bioinformatics, etc.): https://sis.id.ethz.ch/consulting/
      • Courses offered by the ETH Information Center for Chemistry/Biology/Pharmacy: http://www.infozentrum.ethz.ch/en/whats-up/events/
      • Further topics on demand
  95. Take home message
      • Think about what you do!
      • Start early
      • Agree on clean concepts and simple tools
      • You do not need the latest sophisticated apps – but there are useful tools
      • Talk to colleagues
      • Check what your local service providers can offer
      • «Keep it as simple as possible – but distrust it!»
  96. Source: https://doi.org/10.22010/ethz-exp-0002-en
  97. Thank you! Questions?
      Digital Curation: www.library.ethz.ch/Digital-Curation, data-archive@library.ethz.ch
      • Dr. Matthias Töwe, Head Digital Curation, ETH-Bibliothek, Rämistrasse 101, 8092 Zurich, 044 632 60 32, matthias.toewe@library.ethz.ch
      • Dr. Ana Sesartic, Digital Curation, ETH-Bibliothek, Rämistrasse 101, 8092 Zurich, 044 632 73 76, ana.sesartic@library.ethz.ch
      Scientific IT Services: https://sis.id.ethz.ch/, sis.helpdesk@ethz.ch
      • Dr. Caterina Barillari, ID Scientific IT Services, Mattenstrasse 26, 4058 Basel, 061 387 31 29, caterina.barillari@id.ethz.ch
      D-BSSE IT Services: https://www.bsse.ethz.ch/department/administration-services/it-services.html, helpdesk@bsse.ethz.ch
      • Dr. Vernon Bailey, Head IT Services and Consulting, Mattenstrasse 26, 4058 Basel, 061 387 33 88, vernon.bailey@bsse.ethz.ch
      • John Ryan, IT Services and Consulting, Mattenstrasse 26, 4058 Basel, 061 387 31 55, john.ryan@bsse.ethz.ch
      Research Data: www.ethz.ch/researchdata, researchdata@ethz.ch
  98. We need your… Please fill out the course evaluation form provided – Thank you!
