1) A workshop on data was held in Munich with 15 leading scientists and data practitioners to discuss challenges around data sharing, reuse, and persistence.
2) Key topics discussed included incentives for sharing data, establishing trust in data quality, and ensuring long-term access to data.
3) Participants made recommendations for the Research Data Alliance (RDA) to develop interoperable standards and specifications to facilitate more widespread and sustainable data sharing across disciplines.
OSGIS Conference: report on RDA/MPG Science workshop
1. 1st Science Workshop on Data
10-11 February, Munich, Germany
Herman Stehouwer
Thanks to Raphael Ritz for the slides
Garching Compute Center, Max-Planck-Society, Germany
06/05/14 RDA-Europe Forum 1
2. Science Workshop on Data
February 10-11, 2014, in Munich
15 leading scientists + data practitioners
Hosted by RDA-E and MPG
Key points:
Rewards (“what’s in it for me?”)
Trust (“are these data and metadata correct?”)
Persistence (“what will be there in 20 years?”)
Report available
06/05/14 RDA-Europe/MPG Science Workshop 2
3. Participants - Scientists
• Bernard Schutz - Gravitational Physics - Cardiff U / MPG
• Bruce Allen - Gravitational Physics - MPI for Gravitational Physics, Hannover
• Bruno Leibundgut - Astronomy - ESO, Garching
• Cécile Callou - Archaeozoology/Biodiversity-Ecology-Environment - Museum d'Histoire Naturelle, Paris
• Christine Gaspin - Bio-Informatics - INRA, Toulouse
• Dick Dee - Meteorology - ECMWF
• Jan Bjaalie - Neuroanatomy and Computer Science - University of Oslo
• Françoise Genova - Astronomy - CNRS, Strasbourg
• Jochem Marotzke - Climate Model - MPI for Meteorology, Hamburg
• Manfred Laubichler - History of Science - New Mexico University
• Marc Brysbaert - Psychology - Ghent University
• Mark Hahnel - Biology - Figshare and Imperial College, London
• Markku Kulmala - Atmospheric Sciences - University of Helsinki
• Peter Coveney - Chemistry, biomedicine - UCL, London
• Stefano Nativi – Earth System Science and Environmental Technologies – CNR, Rome
4. Participants - Experts
• Carlos Morais-Pires - e-Infrastructures - European Commission
• Donatella Castelli - RDA/E Member/Computer Science - ISTI-CNR, Pisa
• Frank Sander - MPDL Director - MPDL, Munich
• Herman Stehouwer - RDA Secretariat - MPI for Psycholinguistics, Nijmegen
• Leif Laaksonen - RDA/E Coordinator - CSC, Helsinki
• Peter Wittenburg - RDA-TAB Member/Linguistics - MPI for Psycholinguistics, Nijmegen
• Ramin Yahyapour - GWDG Director - GWDG, Göttingen
• Raphael Ritz - RDA/E Member/NeuroInformatics - MPG, Garching
• Stefan Heinzel - RZG Director - MPG, Garching
• Reinhard Budich - Data Scientist - MPI for Meteorology, Hamburg
• Riam Kanso – Data Policies/Cognitive Neuroscience – UCL, London
• Ari Asmi – Atmospheric Sciences – University of Helsinki
6. Observations
Sharing and Re-use of Data
Publishing and Citing Data
Infrastructure and Repositories
Leading to
Recommendations
7. Sharing and Re-use of Data
Still in its infancy
Lack of efficient, cross-disciplinary methods enabling re-use
Reference data is costly to establish and maintain
Trust needed in the identity, integrity, authenticity of data and the seriousness of all actors involved
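Of the trust properties listed above, integrity is the one that is easiest to check mechanically. The following is a minimal sketch, not something presented at the workshop: it assumes a data provider publishes a SHA-256 checksum alongside each file, and the function names `sha256_of` and `verify_integrity` are hypothetical. It says nothing about identity or authenticity, which need additional mechanisms such as persistent identifiers or signatures.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_integrity(path, expected_digest):
    """Return True if the file's SHA-256 digest matches the published one.

    Assumption: the provider published `expected_digest` as a hex string.
    """
    return sha256_of(path) == expected_digest.strip().lower()
```

A re-user would call `verify_integrity` after download and refuse to analyse a file whose digest does not match.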
8. Publishing and Citing Data
Core of the scientific process
Referencing data (e.g. via PIDs) must be stable
Reproducible Science
Quality assessment (e.g. peer review)
Career building
Suitable infrastructure needed
9. Infrastructure and Repositories
They are desperately needed
Researchers in the driving seat
Limits to Open Access
Services on data
Legacy data and systems
Companies more cost effective?
Registries and catalogues needed
10. Recommendations to RDA
Come up with recommendations and specifications to overcome the one-shot, point solutions of today
Be really bottom-up
Engage a “middle layer” of data scientists
Compete or cooperate with commercial players
11. Expectations towards RDA
Invest in training younger generations of data scientists
Push demo projects, act as clearing house, provide advice on data management, access and re-use
Have data experts visit and support institutes
Perform good quality assessment on own results
Don’t oversell
12. A New Way of Doing Science?
https://materialsproject.org/escience
13. Challenges of Data Intensive Science
Large Volumes
Accumulated at high rate (Velocity)
Integration across a Variety of sources
Trust/Validity/Veracity
This has implications for data capture, curation, storage, search, sharing, transfer, analysis and visualization
14. Data Sharing != Open Access
Data Sharing starts in your own laboratory.
It continues with collaborators, consortia, peers, the scientific communities and maybe the general public.
15. Making data sharing feasible
Questions:
Are you willing to take the time to prepare your data for sharing?
Is there somewhere for you to put your data?
Can I read your data?
If so, can I understand it? (e.g., how it was acquired and analysed?)
Will it be possible to still understand the data in the future?
Actions:
Develop methods to ensure interoperability
Develop standardised and comprehensive metadata schemes
Develop automated methods for metadata capture
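Automated metadata capture, the last action above, can start very small. The sketch below is an illustration under stated assumptions, not a method from the workshop: it records only technical metadata that can be read from the file itself and writes it to a JSON sidecar file. The field names, the `.meta.json` suffix, and the function names `capture_metadata` and `write_sidecar` are all hypothetical; a real deployment would follow a community metadata scheme and also capture how the data was acquired and analysed.

```python
import hashlib
import json
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def capture_metadata(path):
    """Collect basic technical metadata for one data file."""
    p = Path(path)
    stat = p.stat()
    # Guess the media type from the file extension; fall back to a generic type.
    media_type, _ = mimetypes.guess_type(p.name)
    return {
        "file_name": p.name,
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "media_type": media_type or "application/octet-stream",
        # Reads the whole file into memory; acceptable for a small sketch only.
        "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
    }


def write_sidecar(path):
    """Write the captured metadata next to the data file and return its path."""
    record = capture_metadata(path)
    sidecar = Path(str(path) + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Running `write_sidecar` from the acquisition script itself, rather than asking scientists to fill in a form later, is the point of the exercise: the metadata is captured when the data is created and costs the researcher no extra time.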
16. A Lot of Work
Managing data in a lab requires increasing amounts of time by the scientists involved.
Preparing for sharing requires additional work.
How can we make this as effective as possible?
RDA’s idea: adopt a bottom-up approach towards Research Data Sharing without barriers.