ODiP: Reproducibility, open data and GDPR

Reproducibility, open data, & GDPR
Cylcia Bolibaugh, Education, CReLLU

Data sharing in Education
EROS (Education Researchers for Open Science)
(UYSEG, CRESJ, PERC, CReLLU)
• qualitative
• quantitative (experimental)
• quantitative (individual differences)
• Various goals for sharing data -- today’s focus on reproducibility
– Verifiability of a publication’s findings -- data and code

GDPR & Data Protection Act complicate sharing of research data…
– Co-regulatory approach: a shift in accountability from data protection authorities to data controllers and data processors (us!)
– Adoption of open science practices hindered by worries about compliance (funder, university requirements, legal, ethical)

Personal data & identifiability
“‘personal data’ means any information relating to an
identified or identifiable natural person (‘data subject’);
an identifiable natural person is one who can be identified,
directly or indirectly, in particular by reference to an identifier
such as a name, an identification number, location data, an
online identifier or to one or more factors specific to the
physical, physiological, genetic, mental, economic, cultural
or social identity of that natural person”

The ‘motivated intruder’ test:
To determine whether a natural person is identifiable, account should
be taken of all the means reasonably likely to be used, such as
singling out, either by the controller or by another person to identify the
natural person directly or indirectly.
To ascertain whether means are reasonably likely to be used to identify
the natural person, account should be taken of all objective factors,
such as the costs of and the amount of time required for identification,
taking into consideration the available technology at the time of the
processing and technological developments. (Recital 26 EU GDPR)

Differentiating between personal and anonymised data:
A balance between
(1) risk of disclosure/re-identification
(2) consequences of disclosure (“perceived value of the information”)

A toy dataset (Polish immigrants to the UK)
-- accuracy scores on language measure
-- reaction times on language measure
-- score on cognitive measure
-- score on cognitive measure
-- Age
-- Native language
-- Age of arrival to UK
-- Length of residence in UK

Assessing risk of reidentification (Klein et al. 2018)
• Small population and rare traits
• Dyadic data
• Hierarchical data (e.g., small subsamples of students, co-workers)
• Motivated intruder test (e.g., jealous partner, nosy neighbor, envious co-worker, insurers, criminals)

questions, questions…
(1) Do the biographical variables constitute indirect identifiers?
(1b) How can I systematically calculate the risk of re-identification (e.g. what is the risk of re-identification for a Polish immigrant to the UK, based on their age, length of residence in the UK and age at time of immigration)?
(2) If there is only a very slight possibility that an individual could be indirectly identified, is it still personal data?
(3) What if the perceived value of the information that might be linked to that individual is actually quite low (e.g. how many milliseconds an individual took to identify an English word, or their rating of how acceptable a particular phrase or grammatical construction is)?
(4) How would one go about documenting one's consideration of these factors?
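One possible starting point for question (1b) is to count how many records share each combination of indirect identifiers. The sketch below is illustrative only: the records are invented, the function name `reidentification_summary` is hypothetical, and the figures describe risk within the sample, not within the wider population of Polish immigrants to the UK, which a motivated intruder would also need to consider.

```python
from collections import Counter

# Invented toy records: (age, age_of_arrival, length_of_residence), in years.
records = [
    (25, 20, 5),
    (25, 20, 5),
    (31, 24, 7),
    (31, 24, 7),
    (31, 24, 7),
    (47, 19, 28),
]


def reidentification_summary(rows):
    """Summarise sample-level risk from combinations of indirect identifiers."""
    # Size of each equivalence class (records sharing the same combination).
    class_sizes = Counter(rows)
    # k in the k-anonymity sense: the size of the smallest class.
    k = min(class_sizes.values())
    # Records whose combination appears only once in the sample.
    n_unique = sum(1 for size in class_sizes.values() if size == 1)
    # Per-record risk, taken here as 1 / (size of the record's class).
    risks = [1 / class_sizes[row] for row in rows]
    return {
        "k": k,
        "unique_records": n_unique,
        "max_risk": max(risks),
        "mean_risk": sum(risks) / len(risks),
    }


summary = reidentification_summary(records)
print(summary)
```

In this invented sample one participant has a unique combination, so k is 1 and the maximum per-record risk is 1. A summary like this, saved alongside the decision taken, could also serve as part of the documentation asked about in question (4).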

solutions?
                                                Reproducibility   Open Data   Usability
Binning                                         ✗                 ✓✓          ✓✓✓
Permutation                                     ✓✗                ✓✓          ✓✓✓
K-anonymity tools (e.g. R package sdcMicro)     ✗                 ✓✓          ✓✓
Synthesized dataset (e.g. R package Synthpop)   ✓✓                ✗           ✓
Encrypted data with script (e.g. OSF)           ✓✓✓               ✗           ✓
Restricted access depository                    ✓✓✓               ✓✓✓         ✓✓
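As a rough illustration of the binning option, the sketch below coarsens two biographical variables into bands and compares the smallest group size before and after. The values, bin widths and helper names (`bin_value`, `smallest_class`) are assumptions made for the example, not a recommended procedure.

```python
from collections import Counter


def bin_value(value, width):
    """Map a number to the lower bound of its bin, e.g. 27 -> 20 for width 10."""
    return (value // width) * width


def smallest_class(rows):
    """Size of the smallest group of records sharing the same values (k)."""
    return min(Counter(rows).values())


# Invented toy records: (age, length_of_residence), in years.
raw = [(23, 4), (27, 2), (28, 3), (34, 11), (36, 14), (39, 12)]

# Coarsen age into 10-year bands and length of residence into 5-year bands.
binned = [(bin_value(age, 10), bin_value(lor, 5)) for age, lor in raw]

k_raw = smallest_class(raw)        # every raw record is unique
k_binned = smallest_class(binned)  # records now fall into shared bands

print(k_raw, k_binned)
```

Binning raises the smallest group size here, but any analysis that used exact age or length of residence can no longer be rerun on the shared file, which is in line with the ✗ for reproducibility in the table.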

OSF-approved Protected Access repositories which are GDPR-compliant
- Research Data Center of the SOEP (DE)
- Datorium (DE)
- DataFirst (ZA)
- PsychData (ZPID, Leibniz)
- University of Bristol Research Data Repository
- The UK Data Service (ESRC)

Anonymisation
• Europe-wide standards for anonymisation are needed.
– OpenAIRE → European Data Protection Board could issue guidelines concerning anonymisation.
• Nationally, codes of conduct to differentiate between personal and anonymised data.
– may only be binding for members
– involvement of umbrella orgs -- UKRN
• Institutionally, researcher-friendly guidance (decision trees, case studies, tools for documentation of risk assessment etc.)

Thanks!
Questions?

The Open Data badge is earned for making publicly available the digitally-shareable data necessary to reproduce the reported results.
