Presentations, exercises and discussion of the following topics:
- General requirements of research data management
- Guidelines and responsibilities
- Data management plans (DMP) for the Swiss National Science Foundation (SNSF)
- Data management in practice
- Prerequisites for re-use
- Useful services and tools
- Exchange of experiences, methods and tools
1. ||ETH Library / Scientific IT Services
Ana Sesartic Petrus, Research Data Management and Digital Curation
Caterina Barillari, Scientific IT Services
31.10.2018 Ana Sesartic Petrus / Caterina Barillari
Data Management in Research – Why and How?
http://hdl.handle.net/20.500.11850/296468
2. ||ETH Library / Scientific IT Services
From the
Research Data Management and Digital Curation
Team at ETH Library, and the Scientific IT Services,
both at ETH Zurich
Sharing a scientific background ourselves
Here to discuss data management as part of your
research
To learn more about your needs in the process
And to motivate you to think critically about the opportunities
and limitations of data management and re-use
Nice to meet you, we’re…
http://www.library.ethz.ch/Digital-Curation
https://sis.id.ethz.ch
3. ||ETH Library / Scientific IT Services
Let’s get to know you a bit better…
«Cross the line»
4. ||ETH Library / Scientific IT Services
What we are going to do…
What is data management
and why should it concern
you?
Regulations, intellectual
property, privacy and
access rights
Data Management Planning
Long-term preservation
Data sharing
Active Data Management
5. ||ETH Library / Scientific IT Services
What is data management and why should it concern you?
An introduction
6. ||ETH Library / Scientific IT Services
What is data?
“A reinterpretable representation of information in a formalized manner suitable for communication,
interpretation, or processing.”
Digital Curation Centre, UK
Slide adapted from
the PrePARe Project
7. ||ETH Library / Scientific IT Services
The data life cycle
Data processing
Data preservation
Publication and
preservation: annotate,
share, publish, preserve
data at the end of the
project/publication
Active data management:
annotate, store, backup data
while it is produced
8. ||ETH Library / Scientific IT Services
Preserve data that cannot be replicated
(e.g. observational data)
Avoid redundant data creation/collection
Highlight patterns or connections that
might otherwise be missed
Enable data re-use and sharing – even for
yourself
Facilitate collaboration
Raise your impact: your data can be cited
Meet funders’ and institutional requirements
SNSF has asked for data management plans since October 2017
EU Horizon 2020 asks for data management plans
Keep work in accordance with good scientific
practice, transparency and validity
You may be able to influence the
discussion in your community, in your
institution and with funders
Why spend time and effort on this?
Your benefit Your duty
9. ||ETH Library / Scientific IT Services
Regulations, intellectual property, privacy and access rights
An overview
10. ||ETH Library / Scientific IT Services
https://itsecurity.ethz.ch/en/#/manage_your_data
Recent overview
11. ||ETH Library / Scientific IT Services
What you (should have) received… …at the beginning of
Your studies
Your PhD
Your employment
at ETH Zurich
12. ||ETH Library / Scientific IT Services
Compliance Guide
[…] all ETH members […] are required to integrate the
general conditions and internal directives into the work
process.
In the research context, the project manager plays an
active role in guiding and monitoring junior scientists.
In particular, he or she is responsible for making sure that
everyone involved in the project is aware of the research
integrity guidelines.
Junior scientists are given appropriate guidance.
Primary data is carefully archived.
From:
https://rechtssammlung.sp.ethz.ch/Dokumente/133en.pdf
13. ||ETH Library / Scientific IT Services
Compliance Guide
At the ETH Zurich research is founded on intellectual
honesty. Researchers […] are committed to scientific
integrity and truthfulness in research and peer review.
For research data, see Art. 11, in particular.
From
https://doi.org/10.3929/ethz-b-000179298
14. ||ETH Library / Scientific IT Services
Project Members:
adhere to the principles of good scientific practice and the guidelines for Research Integrity at ETH.
All steps of treatment of primary data must be documented in a form appropriate to the discipline and results
must be reproducible.
Project Manager:
responsible for data management (data collection, storage, data access,
compliance with data protection requirements,
retention for the period prescribed by the discipline ...).
Ensures that all research project participants are aware of the guidelines.
Determines, together with the professor, which departed project members should
retain access to the primary data or materials.
Roles and responsibilities
15. ||ETH Library / Scientific IT Services
Do you know where your data is and who has access to it?
http://fsfe.org/nocloud
18. ||ETH Library / Scientific IT Services
Security Landscape: Norms are Shaping Best Practices
ETH Zurich
Acceptable Use
Policy for Telematics
Resources (BOT)
ISO 27001 (ISO
27799)
Federal Act on Data
Protection
Federal Act on
Research involving
Human Beings
Swiss Criminal Law
EU General Data
Protection
Regulation
Health Insurance
Portability and
Accountability Act
(HIPAA)
Professional Code
FMH (Swiss Medical
Association)
ACM Code of Ethics
and Professional
Conduct
Recommendations
ETH Ethics Group
“Biomedical Big
Data"
Best Practices
ETH Zurich
• Laws and binding regulations
• ISO standards
• Code of conduct
• Recommendations
• Best practices
19. ||ETH Library / Scientific IT Services
Personalized Health Data in Switzerland
Secure data management &
computing environment:
Multi-factor authentication
Auditable user actions
Data encryption
Usage policy (IT, legal, ethical)
User actions:
Authorized system access
Role based data access
Results can be shared, data cannot
Ecosystem Requirements
Distributed data sources
Primary data management at data source
Expose anonymized/identified sensitive personal data
ID-SIS provides trainings in this area
20. ||ETH Library / Scientific IT Services
For publications and for data!
Respect the rights of others
Third parties
Individuals you work with
In case of doubt: seek permission even when a CC-licence is assigned
Note that under the ETH Act, ETH retains most intellectual property rights in works created by its
employees. When in doubt, contact ETH transfer (www.transfer.ethz.ch)
Make sure you keep sufficient rights
E.g. for Open Access Publishing (green path)
E.g. with respect to patent applications: ETH transfer (www.transfer.ethz.ch)
Intellectual Property Rights: What you need to consider
21. ||ETH Library / Scientific IT Services
What’s next?
What is data management
and why should it concern
you?
Regulations, intellectual
property, privacy and
access rights
Data Management Planning
22. ||ETH Library / Scientific IT Services
Data Management Planning
What? Why? How?
23. ||ETH Library / Scientific IT Services
A brief plan written at the start of a project and updated during its course to define:
What data will be collected or created?
How will the data be documented and described?
Where will the data be stored?
Who will be responsible for data security and backup?
Which data will be shared and/or preserved?
How will the data be shared and with whom?
What is a Data Management Plan (DMP)?
DMPs are required by, for example:
SNSF since October 2017
http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx
Horizon2020 EU funding programme
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
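The six questions a DMP should answer can be kept as a small working checklist while the plan is still a draft. The following is a minimal sketch under stated assumptions: the dictionary keys, the `DmpDraft` class and the sample answer are hypothetical and only serve drafting. The SNSF plan itself is entered online in mySNF.

```python
from dataclasses import dataclass, field

# The six questions from the slide; the keys are hypothetical short labels.
DMP_QUESTIONS = {
    "data_collected": "What data will be collected or created?",
    "documentation": "How will the data be documented and described?",
    "storage": "Where will the data be stored?",
    "responsibility": "Who will be responsible for data security and backup?",
    "selection": "Which data will be shared and/or preserved?",
    "sharing": "How will the data be shared and with whom?",
}


@dataclass
class DmpDraft:
    """Working notes for a DMP; answers are updated as the project progresses."""

    answers: dict = field(default_factory=dict)

    def answer(self, key: str, text: str) -> None:
        """Store or overwrite the answer to one of the six questions."""
        if key not in DMP_QUESTIONS:
            raise KeyError(f"unknown DMP question: {key}")
        self.answers[key] = text.strip()

    def open_questions(self) -> list:
        """Return the questions that still have no non-empty answer."""
        return [q for k, q in DMP_QUESTIONS.items() if not self.answers.get(k)]


draft = DmpDraft()
# Made-up answer, for illustration only.
draft.answer("storage", "Group share on the institute NAS, backed up nightly")
for question in draft.open_questions():
    print("still open:", question)
```

Because the plan is meant to be updated during the project, re-running `open_questions()` after each revision shows what still needs an answer.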
24. ||ETH Library / Scientific IT Services
Goal of the SNSF:
Research data should be freely accessible to everyone – for scientists as well as for the
general public.
Article 47 of the Funding Regulations
(1 Jan 2016, http://www.snf.ch/SiteCollectionDocuments/allg_reglement_16_e.pdf):
“[…] the data collected with the aid of an SNSF grant must also be made available to other researchers for
further research and integrated into recognised scientific data pools […]”
→ A data management plan is just one of the tools to reach this goal
Please also be aware of SNSF’s updated Open Access Policy for Publications and changes to the General implementation
regulations for the Funding Regulations! http://www.snf.ch/en/theSNSF/research-policies/open-access/
SNSF policy on Open Research Data
25. ||ETH Library / Scientific IT Services
Aims of the DMP according to SNSF
Planning and documenting
the life cycle of data
In the ideal case, you only
need to document your
current practice / best
practice in your field
Making data FAIR:
Findable
Accessible
Interoperable
Re-usable
Updating the plan
as the project progresses
Offering a long-term perspective
by outlining how the data will be:
Generated
Collected
Documented
Shared / Published
Preserved
26. ||ETH Library / Scientific IT Services
Making research data FAIR
CC-BY-SA 4.0 Sangya Pundir
https://upload.wikimedia.org/wikipedia/commons/a/aa/FAIR_data_principles.jpg
The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3, 160018 (2016).
https://doi.org/10.1038/sdata.2016.18
27. ||ETH Library / Scientific IT Services
The FAIR data principles matrix
Findable
F1. (Meta)data are assigned a globally unique and persistent identifier
F2. Data are described with rich metadata
F3. Metadata clearly and explicitly include the identifier of the data they describe
F4. (Meta)data are registered or indexed in a searchable resource
Accessible
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
Interoperable
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data
Reusable
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards
https://www.go-fair.org/fair-principles
How do you know if your data is FAIR?
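A few of the principles in the matrix can be checked mechanically against a metadata record. The following is a rough sketch, not an official FAIR assessment: the field names (`identifier`, `license`, `provenance`, ...) and the sample record are assumptions made for illustration, and real repositories define their own metadata schemas.

```python
# Hypothetical field names; each check covers only a small part of a principle.
CHECKS = {
    "F1 persistent identifier": lambda m: bool(m.get("identifier")),
    "F2 rich metadata": lambda m: all(
        m.get(key) for key in ("title", "description", "creator")
    ),
    "R1.1 usage licence": lambda m: bool(m.get("license")),
    "R1.2 provenance": lambda m: bool(m.get("provenance")),
}


def fair_gaps(metadata: dict) -> list:
    """Return the names of the checks that the metadata record does not pass."""
    return [name for name, check in CHECKS.items() if not check(metadata)]


# Made-up record for illustration; the identifier is a placeholder, not a real DOI.
record = {
    "identifier": "doi:10.0000/example",
    "title": "Example data set",
    "description": "Made-up record used only for illustration",
    "creator": "A. Researcher",
    "license": "",
}

print(fair_gaps(record))
```

Passing such checks does not make a data set FAIR. Principles such as I1 to I3 or R1.3 depend on community standards and need human judgement.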
28. ||ETH Library / Scientific IT Services
Collection of SNSF information on Open Research Data including FAQ:
http://www.snf.ch/en/theSNSF/research-policies/open_research_data/
SNSF’s explanation of the DMP expected content:
http://www.snf.ch/SiteCollectionDocuments/DMP_content_mySNF-form_en.pdf
Guidance for ETH researchers on filling out SNSF Data Management Plans:
https://documentation.library.ethz.ch/display/DD/Guidance+for+ETH+researchers+on+filling+out+SNSF+Data+Management+Plans
Includes:
explanations per question, examples from DMPs, contacts and links specific to ETH Zurich
Information to support you
29. ||ETH Library / Scientific IT Services
A proposal can only be submitted if a DMP
was created
A DMP for SNSF must be created online
in mySNF
You cannot upload a DMP created outside
of mySNF – except in Lead Agency process
Contents of DMP:
How to submit a DMP to SNSF
https://www.mysnf.ch
http://www.snf.ch/SiteCollectionDocuments/DMP_content_mySNF-form_en.pdf
30. ||ETH Library / Scientific IT Services
The DMP is assessed by SNSF staff for plausibility
and compliance with its Open Research Data policy
It is not sent to external reviewers
Applicants can be assigned «tasks» for enhancing
their DMP as part of the funding decision
DMP Guidelines for researchers
http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/data-management-plan-dmp-guidelines-for-researchers.aspx
Assessment of the DMP
31. ||ETH Library / Scientific IT Services
Consider your first DMP version as a draft to be adapted later
SNSF is aware of limitations: not everything applies to everyone – give reasons
SNSF needs feedback from your research practice!
Please get involved with your colleagues:
What do you consider as best practice in your field?
SNSF offers financial support for community workshops via the funding scheme «Scientific
Exchanges» (http://www.snf.ch/en/funding/science-communication/scientific-exchanges/)
If you encounter difficulties or have comments, suggestions, questions:
Please contact the SNSF via ord@snf.ch
Send feedback to SNSF
32. ||ETH Library / Scientific IT Services
Data Management Checklist by ETH and EPFL
Supports you in the creation of a DMP or in discussing
data management in general, even if you don’t need to
do it to comply with funders
https://documentation.library.ethz.ch/display/DD/Data+Management+Checklist
Collection of DMP examples
http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
H2020 Information by EU GrantsAccess
http://grantsaccess.ethz.ch/en/servicesupport/uzh-eth-zurich-support/open-access-publications-data/
DMPOnline
A tool by the UK Digital Curation Centre that
helps you create Horizon 2020 compliant data
management plans, by answering a
questionnaire
https://dmponline.dcc.ac.uk
What to do for other funders?
33. ||ETH Library / Scientific IT Services
Self-critical questions:
What must data look like to enable us to re-use it with
scientific conviction and trust in its quality and correctness?
Is this true for our own data? What is missing?
Tasks for group leaders
(not required but recommended):
Agree on binding rules
Define data management responsible
(DMR) within the group
Discuss and document rules (in writing) with DMR
Research Group Policy
34. ||ETH Library / Scientific IT Services
Post it!
Current practices in data
management
Naming
conventions
Versioning
ELN Sharing
Current practices in data management
What are your best practices?
35. ||ETH Library / Scientific IT Services
Current practices in data management
Naming conventions:
Do you have any? Which rules apply?
Versioning:
How do you currently handle it? What works well? What went wrong?
Electronic Laboratory Notebooks (ELN):
Do you have experience with any?
Sharing:
Which tools or services do you use? What are your experiences?
36. ||ETH Library / Scientific IT Services
Time for a
break…
37. ||ETH Library / Scientific IT Services
Active research data management
38. ||ETH Library / Scientific IT Services
Research workflow in experimental & computational labs
Sample
preparation
Analysis
Publication
Measurement
Processing
Data collection
39. ||ETH Library / Scientific IT Services
What does it take to manage research data?
Complex process that requires tracking and linking different types of information
Materials/samples
Protocols/ SOPs
Raw data
Processed
data
Results
Code
Analysis
notebooks
Model
Title
Date
Materials
Methods
Analysis
Results
Experimental
description/notes
40. ||ETH Library / Scientific IT Services
Results Analysis
notebooks
A common scenario @ ETHZ
Experimental
description/notes
Materials/samples Code
Raw data
Processed
data
Protocols/ SOPs
Model
Local HD
NAS CDS LTS
Cluster Cloud
41. ||ETH Library / Scientific IT Services
How can we take care of the individual components and
how can we bring things together?
42. ||ETH Library / Scientific IT Services
Management of materials and samples
What?
Biological samples
Chemical samples
Materials
Devices
…
How?
Spreadsheets: easy to use, but not scalable, no sharing, no efficient search
Databases: scalable, sharing, search functionality, but require time for set-up and maintenance
43. ||ETH Library / Scientific IT Services
Management of protocols
Step by step description of procedure
Experimental/computational parameters (e.g. temperature, time, etc.)
Machine used (experimental)
OS, program, version, etc. (calculation)
How?
What?
Not scalable
No sharing
No efficient search
Easy to use
Scalable
Sharing
Search functionality
Require time for set
up and maintenance
Scalable
Sharing
Search functionality
Versioning
Not scalable
No sharing
No search
Easy to use
Paper notebook
Text files
ELN/LIMS
44. ||ETH Library / Scientific IT Services
Management of research data files
Files/folders naming conventions
What?
How?
MyPhD
Admin
Contracts
Budget
Lab Gear
Conference
Travel
Academic
Writing
Reviews
Proposals
Publications
Paper 1
Images
TeX Src
Paper 2
Modelling
Source Code
Original
Modified
Input Data
Output Data
Lab Data
Exp. 1
Exp. 2
Data management platform
Results
Analysis notebooks
Raw data
Processed data
Model
46. ||ETH Library / Scientific IT Services
File & folder organisation
My PhD
Research
Experiment
Exp1
Raw
Processed
Exp2
Raw
Processed
Analysis
Code
Results
Simulation
Code
Results
Writing
Paper1
Paper2
Review
Admin
Grants
SNF
H2020
Travel
Conf1
Conf2
Orders
Lab
Office
Example hierarchy for an individual project
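A hierarchy like this one is easier to keep consistent when it is created by a script rather than by hand. The sketch below uses only the Python standard library. The nesting is one plausible reading of the flattened slide (for instance, `Analysis` is assumed to sit under `Experiment`), and the tree is built in a temporary directory so that nothing is written to a real project folder.

```python
import tempfile
from pathlib import Path

# Folder names taken from the example hierarchy; the nesting is an assumption.
HIERARCHY = {
    "Research": {
        "Experiment": {
            "Exp1": {"Raw": {}, "Processed": {}},
            "Exp2": {"Raw": {}, "Processed": {}},
            "Analysis": {"Code": {}, "Results": {}},
        },
        "Simulation": {"Code": {}, "Results": {}},
    },
    "Writing": {"Paper1": {}, "Paper2": {}, "Review": {}},
    "Admin": {
        "Grants": {"SNF": {}, "H2020": {}},
        "Travel": {"Conf1": {}, "Conf2": {}},
        "Orders": {"Lab": {}, "Office": {}},
    },
}


def create_tree(root: Path, tree: dict) -> list:
    """Create the nested folders below root and return every folder created."""
    created = []
    for name, children in tree.items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
        created.extend(create_tree(folder, children))
    return created


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "MyPhD"
    folders = create_tree(root, HIERARCHY)
    for folder in folders:
        print(folder.relative_to(root))
```

To use it for real, replace the temporary directory with the intended project location. Because of `exist_ok=True`, running the script again does not touch folders that already exist.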
47. ||ETH Library / Scientific IT Services
File & folder organisation
Smith lab
Projects
P1_Title
Exp1
Raw
Processed
Exp2
Raw
Processed
Analysis
Code
Results
P2_Title
Exp1
Exp2
Analysis
Protocols
Cloning
Imaging
Biochemistry
Grants
SNF
H2020
Presentations
2016
2017
2018
Photos
2017-06_Retreat
2016-12_XmasDinner
Example hierarchy for a research group
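Folder names such as `2017-06_Retreat` in this hierarchy suggest a date-prefixed naming convention, and a convention of this kind can be checked automatically. The following is a small sketch in which the pattern (`YYYY-MM_Name`, letters and digits only after the underscore) is an assumption inferred from the two dated examples, not a rule stated in the slides.

```python
import re

# Assumed convention: four-digit year, two-digit month (01-12), underscore, short name.
DATED_NAME = re.compile(r"^\d{4}-(0[1-9]|1[0-2])_[A-Za-z0-9]+$")


def follows_convention(name: str) -> bool:
    """Return True if the folder name matches the assumed YYYY-MM_Name pattern."""
    return DATED_NAME.match(name) is not None


# The first two names come from the slide; the last two are made-up counterexamples.
names = ["2017-06_Retreat", "2016-12_XmasDinner", "Retreat_June17", "2017-13_Workshop"]
for name in names:
    status = "ok" if follows_convention(name) else "does not match"
    print(f"{name}: {status}")
```

Names that start with the year and month also sort chronologically in any file browser, which is one practical reason to prefer them over free-form names.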
48. ||ETH Library / Scientific IT Services
Data management software
System that allows structured organization of data
Data is described by metadata
Searchable, scalable, flexible
Accessible by anyone with sufficient rights
Backup procedures for data are put in place
49. ||ETH Library / Scientific IT Services
Metadata & Standards
Metadata is the data about your data
Use of structured metadata facilitates data
organization and searches
Examples of metadata:
Investigator
Date
Title
Description
Several metadata schemas are available. For info
check the DCC website
Standards (taxonomies, synonyms, ontologies)
are important to guarantee consistency
General standards:
ISO 8601 for dates (YYYY-MM-DD or
YYYYMMDD)
ISO 6709 for latitude/longitude
standards for SI base units (meters, kilograms,
etc.)
Scientific standards examples
Biology -> Gene ontology, NCBI taxonomy, etc.
Physical sciences -> IUPAC, InChI
Earth science and ecology -> USGS Thesaurus,
GIS dictionary, etc.
Math & computer science -> Mathematics Subject
Classification, ACM Computing Classification
System
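The example metadata fields and the ISO 8601 date format can be combined in a small, structured record stored next to the data. The following is a minimal sketch: the function name, the key names and the sample values are assumptions for illustration and do not follow any particular metadata schema.

```python
import json
from datetime import date


def build_metadata(investigator: str, title: str, description: str, created: date) -> dict:
    """Collect the four example fields, with the date written as ISO 8601."""
    return {
        "investigator": investigator,
        "date": created.isoformat(),  # extended format, YYYY-MM-DD
        "title": title,
        "description": description,
    }


# Made-up values for illustration.
created = date(2018, 10, 31)
meta = build_metadata(
    investigator="A. Researcher",
    title="Exp1 raw measurements",
    description="Made-up example record",
    created=created,
)

print(json.dumps(meta, indent=2))
# The basic format without separators (YYYYMMDD) is handy in file names.
print(created.strftime("%Y%m%d"))
```

Writing the JSON to a file beside the data set (a so-called sidecar file) keeps the description searchable even without a dedicated data management system.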
50. ||ETH Library / Scientific IT Services
Code management
Software tools specialized in managing and documenting changes to source code over time
Used for managing large code bases
They are the standard in professional software development
https://subversion.apache.org
https://github.com
https://gitlab.ethz.ch
ID-SIS provides hands-on trainings on git for code management (info @ https://sis.id.ethz.ch/consulting)
Tools
Version control systems
Many journals require code availability after publication and during review (see Nature 555, 142)
51. ||ETH Library / Scientific IT Services
Code management
Applications that combine documentation, code, input and output generated by the code, e.g. graphs,
plots (Nature 515, 151–152, 06 Nov 2014)
Very useful for exploratory data analysis
• Open source
• > 40 languages supported
(Python, R, Julia, Matlab,
IDL, etc.)
• Open source + commercial edition
• Integrated development
environment for R
Interactive notebooks
• Commercial
• Used in scientific, engineering,
mathematical fields
52. ||ETH Library / Scientific IT Services
Containers
Running a given piece of software reproducibly is often an issue
It depends on specific libraries and versions, the runtime environment,
settings, etc.
A container image is an executable package of a piece of software that
includes everything needed to run it: code, runtime, system tools, system
libraries, settings (https://www.docker.com/what-container)
https://www.docker.com
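Short of building a container image, the dependencies named on this slide (library versions, runtime environment) can at least be recorded next to the results. The sketch below does only that, using the Python standard library. It is not a container and does not replace one. The function name and the package list are assumptions chosen for illustration, and packages that are not installed are reported as `None`.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages: list) -> dict:
    """Return interpreter, operating system and package versions as a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


# Hypothetical package list; replace it with the libraries your analysis imports.
env = describe_environment(["pip", "numpy"])
print(json.dumps(env, indent=2))
```

Saving this output together with each set of results documents which environment produced them. A container image goes further by shipping that environment itself.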
53. ||ETH Library / Scientific IT Services
Experimental description / notes
Paper laboratory notebook Electronic laboratory notebook (ELN)
Goals
Materials
Methods
Experimental/computational procedure
Analysis procedures
Results
Links to data
How?
What to document?
Title
Date
Materials
Methods
Analysis
Results
54. ||ETH Library / Scientific IT Services
ELNs vs. paper notebook
Advantages of ELNs over paper notebooks:
1. Sharing
2. Most ELNs have rights management
3. Most ELNs keep track of changes
4. Searching
5. Easier to link digital data
6. No issues with handwriting
7. Can be backed up
Disadvantages of ELNs compared with paper notebooks:
1. Require change in working mode
2. Have a learning curve
When choosing an ELN, always make sure you can export your data in a common open format
(e.g. XML, HTML, PDF, TXT, DOCX)
55. ||ETH Library / Scientific IT Services
ELN types
Discipline-specific applications
cloud-based
self-hosted
Generic note-keeping applications
cloud-based
self-hosted
Some systems only allow partial information management
Some systems offer an all-in-one solution for RDM
Results
Analysis notebooks
Experimental description/notes
Materials/samples
Code
Raw data
Processed data
Protocols/SOPs
Model
Local HD
NAS CDS LTS
Cluster Cloud
56. ||ETH Library / Scientific IT Services
The ETH Scientific IT Services data management solution
57. ||ETH Library / Scientific IT Services
openBIS facts
Summer 2007: openBIS development start (SystemsX)
April 2008: first openBIS release (v08.04)
Summer 2009: SystemsX projects start using openBIS
Summer 2013: openBIS ELN-LIMS UI start
Spring 2014: first ELN-LIMS beta version
May 2015: first downloadable ELN-LIMS plugin
May 2016: first ELN-LIMS official release
May 2017: BigDataLink v.1
December 2017: JupyterHub integration
Platform for managing scientific information and supporting research data workflows from “bench” to publication
Can be used in most quantitative science fields (e.g. life sciences, physics, env. sciences, etc.)
Used by research groups and facilities @ ETH, Swiss & European universities, a few companies
58. ||ETH Library / Scientific IT Services
openBIS in a nutshell
Workflow
manager (e.g.
Snakemake)
openBIS is a solution for research labs
Title
Date
Materials
Methods
Analysis
Results
59. ||ID Scientific IT Services
60. ||ETH Library / Scientific IT Services
openBIS ELN-LIMS features
Relationships
Storage manager
Import/Export
openBIS features
Ordering
https://labnotebook.ch
User rights management
Long-term storage (tape)
Audit trail
Data preservation
61. ||ETH Library / Scientific IT Services
openBIS ELN-LIMS features
Plasmid maps
BLAST search
openBIS features for life sciences
https://labnotebook.ch
User rights management
62. ||ETH Library / Scientific IT Services
Standard openBIS-based service for research groups:
Provide openBIS to research groups:
central instance (ETH Research Data Hub)
private group instances
Initial training
Continuous support
Prefilled DMP template for openBIS users
(https://www.ethz.ch/services/en/service/a-to-z/research-data/data-management-planning.html)
openBIS as a service from ID-SIS at ETH
Since 2018, SIS has had the mandate to provide services in active data management to ETH researchers
As part of this mandate, we offer services based on openBIS
63. ||ETH Library / Scientific IT Services
ETH Research Data Hub (ETH RDH)
A central openBIS instance, ETH RDH, is available to ETH research groups
(https://www.ethz.ch/services/en/it-services/catalogue/software-business-applications/research-data-hub.html).
Only storage costs have to be covered by research groups:
First 100 GB free
Up to 1 TB: reduced rate
LTS (i.e. tapes): regular rates
ETH RDH is suitable for use by groups that:
Do not require much customization
Do not need to store sensitive data (e.g. patient data)
Do not have a high data volume (< 50,000 objects/datasets)
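The three suitability criteria can be written down as a simple self-check for a research group. The following is a sketch only: the class, the field names and the example group are hypothetical. The single number used is the limit of 50,000 objects/datasets from the slide, and the actual decision remains with the group and SIS.

```python
from dataclasses import dataclass

MAX_OBJECTS = 50_000  # upper bound on objects/datasets quoted on the slide


@dataclass
class GroupProfile:
    """Hypothetical description of a research group's needs."""

    needs_much_customization: bool
    stores_sensitive_data: bool
    expected_objects: int


def rdh_concerns(profile: GroupProfile) -> list:
    """Return the criteria that speak against using the central instance."""
    concerns = []
    if profile.needs_much_customization:
        concerns.append("requires much customization")
    if profile.stores_sensitive_data:
        concerns.append("stores sensitive data (e.g. patient data)")
    if profile.expected_objects >= MAX_OBJECTS:
        concerns.append("high data volume")
    return concerns


# Made-up example group.
profile = GroupProfile(
    needs_much_customization=False,
    stores_sensitive_data=True,
    expected_objects=12_000,
)
print(rdh_concerns(profile) or "no concerns: the central instance looks suitable")
```

A non-empty result points towards a private group instance rather than the central one.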
64. ||ETH Library / Scientific IT Services
ETH Research Data Hub (ETH RDH)
Access can be requested via the IT shop (https://itshop.ethz.ch)
Request must be approved by a fund owner (usually PI).
An admin must be nominated in the lab. The admin will be able to do some minimal customization.
Training will be provided by SIS
A launch event will take place on November 7th, HG E21, 13:30-14:30
https://www.ethz.ch/services/en/service/a-to-z/research-data/active-data-management/resources-for-ardm1.html
65. ||ETH Library / Scientific IT Services
Private group instances of openBIS
Private instances can be requested by email to sis.helpdesk@ethz.ch (via IT shop in the future)
Additional services available for private group instances (on demand)
Database customization
Migration of existing databases
Instrument integration for direct data upload
Upload of existing historic raw data
Additional JupyterHub server
The service is charged to groups in departments without an SIS subscription that covers these costs
66. ||ETH Library / Scientific IT Services
Remarks on active research data management
There are several options available
There is no “best for all”, it all depends on your research field, data types and research workflow
ARDM is the basis for having FAIR published data
ARDM requires WORK & TIME, but the time spent on this is an investment for the future!
67. ||ETH Library / Scientific IT Services
A glimpse into the toolbox
68. ||ETH Library / Scientific IT Services
File sharing tools
→ Data stored in Switzerland
→ Security regulations fulfilled
polybox.ethz.ch
www.switch.ch/drive/
www.switch.ch/filesender
cifex.ethz.ch/
recommended
www.dropbox.com
www.wetransfer.com
only conditionally recommended
→ Data stored in EU/USA
→ Security regulations only partially fulfilled
→ Never store sensitive / private data there!
69. ||ETH Library / Scientific IT Services
Collaborative project management tools
https://www.openproject.org
http://www.redmine.org https://trello.com
https://slack.com
https://tagpacker.com
https://asana.com
Cloud based – use with consideration!
70. ||ETH Library / Scientific IT Services
Collaborative writing tools
Cloud based – use with consideration:
https://www.overleaf.com
https://www.authorea.com
https://atlas.oreilly.com
https://hypothes.is
https://evernote.com
http://simplenote.com
https://www.onenote.com
Sometimes, on-site installations are also available
Hosted at ETH Zurich:
https://www.ethz.ch/services/en/it-services/catalogue/web-application-hosting/sharepoint.html
https://www.ethz.ch/services/en/it-services/catalogue/web-application-hosting/wiki.html
71. ||ETH Library / Scientific IT Services
Reference management tools
www.zotero.org
www.mendeley.com
endnote.com
http://www.jabref.org
www.citeulike.org
www.bibsonomy.org
Increasingly cloud based or synchronised with cloud – use with consideration!
Your reading can expose your research interests.
72. ||ETH Library / Scientific IT Services
Location of the service and its servers
Legal regulations on data protection
Sustainability and trustworthiness
Access rights for your data
Licences and immediate/long-term costs
How can you get your data back?
Ana Sesartic 73
Criteria for choice of tools and services
20.03.2018
74. ||ETH Library / Scientific IT Services
1. What information is needed to understand your data?
2. What information do you expect from metadata in your field?
Is this sufficient for you to work with others’ data?
Someone’s data
metadata
What it takes to understand someone’s data – Mind map
75. ||ETH Library / Scientific IT Services
Data, metadata and context are needed to properly understand a data set
Data management does not start with your own data; it also includes a critical view
of other people’s data you use:
Do you understand how they were produced?
Do you have enough information to evaluate their reliability?
Are you comfortable with using data without talking to its producers?
Will you know in a few months’ time which data you re-used from other researchers?
Do you know how to cite the data you use?
Critically re-thinking the (re-)use of data
76. ||ETH Library / Scientific IT Services
What’s next… What is data management
and why should it concern
you?
Regulations, intellectual
property, privacy and
access rights
Data Management Planning
Data sharing
Long-term preservation
77. ||ETH Library / Scientific IT Services
Data sharing
Data sharing/collaboration with project partners (during the project)
Data sharing with/publishing to the community (after publication of results)
Creative Commons Licenses for third parties
78. ||ETH Library / Scientific IT Services
79. ||ETH Library / Scientific IT Services
See also our Explora story for more!
Source: https://doi.org/10.22010/ethz-exp-0004-en
80. ||ETH Library / Scientific IT Services
Image: word cloud of Creative Commons terms (by, share, remix, share-alike, non-commercial, non-derivative, some rights reserved, public domain)
“Creative Commons” (4.9.2018)
by Michael Porter
CC BY-NC-ND 2.0
81. ||ETH Library / Scientific IT Services
www.dcc.ac.uk/resources/how-guides/license-research-data
Licensing research data
Outlines pros and cons of each approach and gives practical advice on how to implement your licence
Creative Commons limitations
NC (Non-Commercial): What counts as commercial?
SA (Share Alike): Reduces interoperability
ND (No Derivatives): Severely restricts use
Horizon 2020 guidelines
point to
OR
82. ||ETH Library / Scientific IT Services
www.re3data.org
www.openaire.eu/search/data-providers
zenodo.org
datadryad.org
figshare.com*
Repositories and registries
*Only partially recommendable: according to its Terms of Use, figshare is allowed to delete data at any time and without notice
83. ||ETH Library / Scientific IT Services
Deposit in a repository – but in which one?
http://databib.org
www.re3data.org
84. ||ETH Library / Scientific IT Services
New one-stop-shop for depositing research output
ETH Research Collection (https://www.research-collection.ethz.ch)
Publications, Research Data
Web upload, DOI reservation and registration, ORCID, export to OpenAIRE…
Long-term preservation in ETH Data Archive (http://www.library.ethz.ch/Digital-Curation)
Metadata is always public, access to content may be delayed or restricted
Aligned with FAIR principles (Findable – Accessible – Interoperable – Re-usable) according to SNSF guidelines
ETH Research Collection
85. ||ETH Library / Scientific IT Services
Registry of publications / University bibliography
Web pages (AEM)
Annual Academic Achievements
Slide by Barbara Hirschmann
86. ||ETH Library / Scientific IT Services
Primary publication of reports, presentations, dissertations etc.
Secondary publication of scientific papers (Green Road to Open Access)
Open Access repository
Publisher’s version Open Access version
Slide by Barbara Hirschmann
87. ||ETH Library / Scientific IT Services
• Publication of research data as supplementary material or stand-alone
• Access limited to selected users
• Deposit for preservation only
• All file formats permitted
• Retention periods: 10 years / 15 years / unlimited
Research data repository
Slide by Barbara Hirschmann
88. ||ETH Library / Scientific IT Services
3 ways of importing data
Manual entry
Web of Science / Scopus: daily data export
Input form
DOI search
Batch import: BibTeX / RIS
New entry in Research Collection
Slide by Barbara Hirschmann
89. ||ETH Library / Scientific IT Services
Selection of access rights for full texts / data
Access levels (columns): Open Access, Embargoed, ETHZ users, Selected users, Closed access
Item types (rows): Publications, Research data
Slide by Barbara Hirschmann
90. ||ETH Library / Scientific IT Services
Citable DOIs & possibility to reserve a DOI
Slide by Barbara Hirschmann
91. ||ETH Library / Scientific IT Services
Citation numbers / altmetrics / download statistics
Slide by Barbara Hirschmann
92. ||ETH Library / Scientific IT Services
Linking between data set and publication
Slide by Barbara Hirschmann
93. ||ETH Library / Scientific IT Services
Legal issues in Open Access publishing
Open Access policies and guidelines of research funders (SNSF, EU)
Data management and digital curation
ORCID support
Advice and support by ETH Library
www.research-collection.ethz.ch
Mail: research-collection@library.ethz.ch
Tel. 27 222
Slide by Barbara Hirschmann
94. ||ETH Library / Scientific IT Services
Long-term preservation of data
And how to prepare for it
95. ||ETH Library / Scientific IT Services
What does long-term mean?
Permanent retention of published data which is considered part of the scientific record and is expected to remain available just like articles and journals are
In general, “long-term” signifies any time period which spans technological changes in the way data is being used
Different time horizons and purposes
Short term (up to 10 years): keeping data for at least ten years to ensure accountability if results are challenged (as defined in the ETH “Guidelines for Research Integrity”)
10 years to permanent: potentially unlimited retention of data with permanent value (e.g. long-running series of observational data)
96. ||ETH Library / Scientific IT Services
How does this relate to data management?
Proper data management, or its absence, determines whether preservation of data will be possible
Short term (up to 10 years): for a period of ten years, data management alone might suffice, but thinking further ahead is useful
10 years to permanent: if data is to be kept and used for longer periods:
Data should be as self-contained as possible, including documentation of any tools used or, better, the tools themselves; remember, for example, to include reference outputs for model algorithms
More care is required in the choice and use of file formats
97. ||ETH Library / Scientific IT Services
Open standards (non-proprietary)
If proprietary, convert; if that is not possible, include a data viewer
Well documented
Widely used and supported by many tools
Uncompressed (or at least losslessly compressed)
Unencrypted
When in doubt, keep original and create a copy in an open or exchange format
Don’t rely on file extensions
Consider that data might be used in different operating systems
Preferences for file formats
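The advice "Don’t rely on file extensions" can be illustrated by looking at the first bytes of a file instead of its name. This is only a sketch under stated assumptions: the function name `sniff_format` and the short signature table are chosen here for illustration, the table covers just a handful of formats, and a real format-identification tool would know far more.

```python
import tempfile
from pathlib import Path

# Leading byte signatures ("magic numbers") of a few common formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container (also used by ODF and OOXML)"),
    (b"\x1f\x8b", "gzip"),
]


def sniff_format(path: Path) -> str:
    """Guess a file's format from its first bytes, ignoring the extension."""
    with path.open("rb") as handle:
        head = handle.read(16)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


# Demonstration with a deliberately misnamed file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    misnamed = Path(tmp) / "figure.txt"
    misnamed.write_bytes(b"%PDF-1.4\n")
    print(misnamed.name, "->", sniff_format(misnamed))
```

A check like this is useful before depositing data: a file whose content and extension disagree is likely to confuse both people and tools later on.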
98. ||ETH Library / Scientific IT Services
Examples
More information:
https://documentation.library.ethz.ch/display/DD/File+formats+for+archiving
Data                      File format
Images                    Uncompressed TIFF, JPEG2000
Text                      ASCII, including XML etc.
Text (page-based)         PDF/A-1b, (PDF)
Data from spreadsheets    CSV
Spreadsheets              (CSV), (ODF, OOXML)
Add encoding information and dependencies such as stylesheets or TeX-libraries!
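As an illustration of exporting spreadsheet data to CSV and adding encoding information, the sketch below writes a UTF-8 CSV file together with a small JSON sidecar file describing it. The function name `export_csv_with_sidecar`, the sidecar naming scheme (`.meta.json`) and its fields are assumptions made for this example, not an established standard; the sample rows are invented.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_csv_with_sidecar(rows, header, target: Path) -> Path:
    """Write rows as UTF-8 CSV and record format details in a JSON sidecar."""
    # newline="" lets the csv module control line endings itself.
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)

    sidecar = target.with_name(target.name + ".meta.json")
    details = {
        "file": target.name,
        "encoding": "utf-8",
        "delimiter": ",",
        "columns": list(header),
        "dependencies": [],  # e.g. stylesheets or TeX libraries, if any
    }
    sidecar.write_text(json.dumps(details, indent=2), encoding="utf-8")
    return sidecar


# Demonstration with invented sample data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "measurements.csv"
    meta_path = export_csv_with_sidecar(
        rows=[("Zürich", 12.5), ("Basel", 13.1)],
        header=("site", "temperature_c"),
        target=csv_path,
    )
    print(csv_path.read_text(encoding="utf-8"))
    print(meta_path.read_text(encoding="utf-8"))
```

Stating the encoding explicitly matters as soon as the data contains non-ASCII characters such as the "ü" above; without that note, a later reader has to guess.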
99. ||ETH Library / Scientific IT Services
This does not mean you «must not» keep data in other formats
Just be aware that proprietary or undocumented formats (even your own!) might cause trouble in the future
Think about adding an alternative format (yes, redundantly) for a proprietary one…
… and add any context information you yourself would like to have on your own formats in a few years’ time in a readme file, an accompanying document or as metadata
Note
100. ||ETH Library / Scientific IT Services
ETH Data Archive
Digital preservation solution for ETH Zurich, operated by ETH Library
Research Collection: automatic archiving
Heritage content from ETH University Archives and ETH Library: automatic archiving
«Software Disclosure» workflow for ETH transfer: software disclosure workflow
Docuteam packer
Data
Yael Fitzpatrick: Science Cover Vol 331 (4.9.2018)
http://science.sciencemag.org/content/331/6018/728
101. ||ETH Library / Scientific IT Services
Further Services and Trainings
102. ||ETH Library / Scientific IT Services
IT Services
Storage provisioning (usually via your IT Support Group)
Research data management support and consulting www.sis.id.ethz.ch/researchdatamanagement
ARDM services based on openBIS https://sis.id.ethz.ch/researchdatamanagement
SIS bioinformatic co-analysis service for –omics data (Contact: Michal Okoniewski, michalo@id.ethz.ch)
Versioning
Gitlab - gitlab.ethz.ch (hosted by IT services)
SharePoint - mysite.sp.ethz.ch (free up to 1 GB, hosted by IT services)
ETH transfer https://www.ethz.ch/en/the-eth-zurich/organisation/staff-units/eth-transfer.html
Software disclosure workflow with ETH Data Archive
Advice on Intellectual Property, Patents, Licensing of Software etc.
IT services and ETH transfer
103. ||ETH Library / Scientific IT Services
Training courses on information research, reference management, data management, scientific writing and open access offered by the ETH Library
Comprehensive workshop on data management offered by ETH Library in collaboration with SIS:
see link above or ask for additional dates!
New modular format from 2019 on: separate shorter trainings on DMP, ARDM and open access, held in three consecutive weeks
Summer school on research data management in preparation (4-7 June 2019)
Trainings on coding best practices, Python, Python for NGS, git, and more offered by SIS
SPHN Data Privacy and IT Security Training: https://www.sib.swiss/training/course/2018-11-sphn-security
Courses offered by the ETH Information Center for Chemistry/Biology/Pharmacy
Further topics on demand
Trainings
104. ||ETH Library / Scientific IT Services
What messages are you taking home with you?
105. ||ETH Library / Scientific IT Services
Think about what you do!
Start early
Agree on clean concepts and simple tools
You do not need the latest sophisticated apps – but there are useful tools
Talk to colleagues
Check what your local service providers can offer
«Keep it as simple as possible – but distrust it!»
Take home message
We are here to help you!
106. ||ETH Library / Scientific IT Services
Source: https://doi.org/10.22010/ethz-exp-0002-en
Six easy tips to keep your data safe
107. ||ETH Library / Scientific IT Services
Thank you!
Questions?
Dr. Ana Sesartic Petrus
Research Data Management and
Digital Curation
ETH-Bibliothek
Rämistrasse 101
8092 Zurich
044 632 73 76
ana.petrus@library.ethz.ch
www.library.ethz.ch/RDM
data-management@library.ethz.ch
Dr. Caterina Barillari
ID Scientific IT Services
Mattenstrasse 22
4058 Basel
061 387 31 29
caterina.barillari@id.ethz.ch
RDM and Digital Curation Scientific IT Services
https://sis.id.ethz.ch/
sis.helpdesk@ethz.ch
Research Data
www.ethz.ch/researchdata
researchdata@ethz.ch
108. ||ETH Library / Scientific IT Services
We need your feedback
Please fill out the course evaluation form – Thank you!
https://www.umfrageonline.ch/s/a13b937