Reproducibility, open data, &
GDPR
Cylcia Bolibaugh, Education,
CReLLU
Data sharing in Education
EROS (Education Researchers for Open Science)
(UYSEG, CRESJ, PERC, CReLLU)
• qualitative
• quantitative (experimental)
• quantitative (individual differences)
• Various goals for sharing data -- today’s
focus on reproducibility
– Verifiability of a publication’s findings -- data
and code
GDPR & Data Protection Act
complicate sharing of research data…
– Co-regulatory approach: a shift in accountability
from data protection authorities to data controllers
and data processors (us!)
– Adoption of open science practices is hindered by
worries about compliance (funder and university
requirements, legal, ethical)
Personal data & identifiability
“‘personal data’ means any information relating to an
identified or identifiable natural person (‘data subject’);
an identifiable natural person is one who can be identified,
directly or indirectly, in particular by reference to an identifier
such as a name, an identification number, location data, an
online identifier or to one or more factors specific to the
physical, physiological, genetic, mental, economic, cultural
or social identity of that natural person”
The ‘motivated intruder’ test:
To determine whether a natural person is identifiable, account should
be taken of all the means reasonably likely to be used, such as
singling out, either by the controller or by another person to identify the
natural person directly or indirectly.
To ascertain whether means are reasonably likely to be used to identify
the natural person, account should be taken of all objective factors,
such as the costs of and the amount of time required for identification,
taking into consideration the available technology at the time of the
processing and technological developments. (Recital 26 EU GDPR)
Differentiating between personal
and anonymised data:
A balance between
(1) risk of disclosure / re-identification
(2) consequences of disclosure (“perceived
value of the information”)
A toy dataset (Polish immigrants to the UK)
-- accuracy scores on language measure
-- reaction times on language measure
-- score on cognitive measure
-- score on cognitive measure
-- Age
-- Native language
-- Age of arrival to UK
-- Length of residence in UK
Assessing risk of reidentification (Klein et al 2018)
• Small population and rare traits
• Dyadic data
• Hierarchical data (e.g., small subsamples of students, co-workers)
• Motivated intruder test (e.g., jealous partner, nosy neighbor, envious co-worker, insurers, criminals)
questions, questions…
(1) Do the biographical variables constitute indirect identifiers?
(1b) How can I systematically calculate the risk of re-identification (e.g. what is the
risk of re-identification for a Polish immigrant to the UK, based on their age, length of
residence in the UK and age at time of immigration)?
(2) If there is only a very slight possibility that an individual could be indirectly
identified, is it still personal data?
(3) What if the perceived value of the information that might be linked to that
individual is actually quite low (e.g. how many milliseconds an individual took to
identify an English word, or their rating of how acceptable a particular phrase or
grammatical construction is)?
(4) How would one go about documenting their consideration of these factors?
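One rough way to approach question (1b) is to count how many records in the dataset share each combination of biographical variables. The sketch below is only an illustration: the records are invented, the function names are hypothetical, and it measures uniqueness within the sample only. Estimating risk against the wider population (e.g. all Polish immigrants to the UK) would need external figures that are not given here.

```python
from collections import Counter

# Invented example records: (age, age of arrival in UK, length of residence), in years.
records = [
    (34, 22, 12),
    (34, 22, 12),
    (41, 30, 11),
    (29, 25, 4),
    (29, 25, 4),
    (29, 25, 4),
    (52, 19, 33),
]


def class_sizes(rows):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(rows)


def k_anonymity(rows):
    """Size of the smallest group; k = 1 means at least one record is unique."""
    return min(class_sizes(rows).values())


def unique_share(rows):
    """Proportion of records whose combination appears only once in the sample."""
    sizes = class_sizes(rows)
    return sum(1 for row in rows if sizes[row] == 1) / len(rows)


print("k =", k_anonymity(records))
print("share of unique records =", round(unique_share(records), 2))
```

A record that is unique in the sample is not necessarily identifiable in the population, so these counts are best read as an upper bound that flags which records deserve a closer look.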
solutions?
                                                Reproducibility   Open Data   Usability
Binning                                         ✗                 ✓✓          ✓✓✓
Permutation                                     ✓✗                ✓✓          ✓✓✓
K-anonymity tools (e.g. R package sdcMicro)     ✗                 ✓✓          ✓✓
Synthesized dataset (e.g. R package Synthpop)   ✓✓                ✗           ✓
Encrypted data with script (e.g. OSF)           ✓✓✓               ✗           ✓
Restricted access depository                    ✓✓✓               ✓✓✓         ✓✓
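To make the first two rows concrete, here is a minimal sketch of binning and permutation on invented records. The data, band width, and function names are assumptions for illustration, not the procedure used in the talk. Binning coarsens each value into a band, which raises the smallest group size but discards the exact values an analysis may need; permutation shuffles one column across participants, which keeps that column's distribution but breaks its link to the other variables.

```python
import random
from collections import Counter

# Invented example records: (age, age of arrival in UK, length of residence), in years.
rows = [
    (34, 22, 12), (36, 24, 12), (31, 20, 11),
    (47, 33, 14), (42, 31, 11), (45, 35, 10),
]


def k_anonymity(records):
    """Size of the smallest group of records sharing the same values."""
    return min(Counter(records).values())


def bin_value(value, width):
    """Map a number to the lower bound of its band, e.g. 34 -> 30 for width 10."""
    return (value // width) * width


def bin_rows(records, width):
    """Apply the same band width to every variable in every record."""
    return [tuple(bin_value(v, width) for v in row) for row in records]


def permute_column(records, index, seed=0):
    """Shuffle one column across records; the seed keeps the result repeatable."""
    rng = random.Random(seed)
    column = [row[index] for row in records]
    rng.shuffle(column)
    return [row[:index] + (value,) + row[index + 1:]
            for row, value in zip(records, column)]


binned = bin_rows(rows, 10)
permuted = permute_column(rows, 0)
print("k before binning:", k_anonymity(rows))
print("k after binning: ", k_anonymity(binned))
```

In this toy case every record is unique before binning, while ten-year bands leave two groups of three. The exact ages are gone, though, which is the reproducibility cost the table marks against binning.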
OSF approved Protected Access
repositories which are GDPR compliant
- Research Data Center of the SOEP (DE)
- Datorium (DE)
- DataFirst (ZA)
- PsychData (ZPID, Leibniz)
- University of Bristol Research Data
Repository
- The UK Data Service (ESRC)
Anonymisation
• Europe-wide standards for anonymisation are needed.
– OpenAIRE → European Data Protection Board could issue
guidelines concerning anonymisation.
• Nationally, codes of conduct to differentiate between
personal and anonymised data.
– may only be binding for members
– involvement of umbrella orgs -- UKRN
• Institutionally, researcher friendly guidance (decision
trees, case studies, tools for documentation of risk
assessment etc)
Thanks!
Questions?
The Open Data badge is earned for making publicly available the
digitally-shareable data necessary to reproduce the reported results.

ODiP: Reproducibility, open data and GDPR

  • 1. Reproducibility, open data, & GDPR Cylcia Bolibaugh, Education, CReLLU
  • 2. Data sharing in Education EROS (Education Researchers for Open Science) (UYSEG, CRESJ, PERC, CReLLU) • qualitative • quantitative (experimental) • quantitative (individual differences) • Various goals for sharing data -- today’s focus on reproducibility – Verifiability of a publication’s findings -- data and code
  • 3. GDPR & Data Protection Act complicate sharing of research data… – Co-regulatory approach: a shift in accountability from data protection authorities to data controllers and data processors (us!) – Adoption of open science practices hindered by worries about compliance (funder, university requirements, legal, ethical),
  • 4. Personal data & identifiability “‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”
  • 5. The ‘motivated intruder’ test: To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments. (Recital 26 EU GDPR)
  • 6. Differentiating between personal and anonymised data: A balance between (1) risk of disclosure/ re-identification (2) consequences of disclosure (“perceived value of the information”)
  • 7. A toy dataset (Polish immigrants to the UK) -- accuracy scores on language measure -- reaction times on language measure -- score on cognitive measure -- score on cognitive measure -- Age -- Native language -- Age of arrival to UK -- Length of residence in UK
  • 8. Assessing risk of reidentification (Klein et al 2018)  Small population and rare traits  Dyadic data  Hierarchical data (e.g., small subsamples of students, co-workers)  Motivated intruder test (e.g., jealous partner, nosy neighbor, envious co-worker, insurers, criminals)
  • 9. questions, questions… 1) do the biographical variables constitute indirect identifiers? (1b) how can I systematically calculate the risk of re-identification (e.g. what is the risk of reidentification for a Polish immigrant to the UK, based on their age, length of residence in UK and age at time of immigration?) (2) If there is only a very slight possibility that an individual could be indirectly identified, is it still personal data? (3) What if the perceived value of the information that might be linked to that individual is actually quite low (e.g. how many milliseconds an individual took to identify an English word, or their rating of how acceptable a particular phrase or grammatical construction is)? (4) How would one go about documenting their consideration of these factors?
  • 10. solutions? Reproducibility Open Data Usability Binning ✗ ✓✓ ✓✓✓ Permutation ✓✗ ✓✓ ✓✓✓ K-anonymity tools (e.g. R package sdcMicro) ✗ ✓✓ ✓✓ Synthesized dataset (e.g. R package Synthpop) ✓✓ ✗ ✓ Encrypted data with script (e.g. OSF) ✓✓✓ ✗ ✓ Restricted access depository ✓✓✓ ✓✓✓ ✓✓
  • 11. OSF approved Protected Access repositories which are GDPR compliant - Research Data Center of the SOEP (DE) - Datorium (DE) - DataFirst (DE) - PsychData (ZPID, Leibniz) - University of Bristol Research Data Repository - The UK Data Service (ESRC)
  • 12. Anonymisation • Europe-wide standards for anonymisation are needed. – OpenAire  European Data Protection Board could issue guidelines concerning anonymisation. • Nationally, codes of conduct to differentiate between personal and anonymised data. – may only be binding for members – involvement of umbrella orgs -- UKRN • Institutionally, researcher friendly guidance (decision trees, case studies, tools for documentation of risk assessment etc)
  • 13. Thanks! Questions?
  • 14. The Open Data badge is earned for making publicly available the digitally-shareable data necessary to reproduce the reported results.
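
One rough way to approach question (1b) on slide 9, and to see what the binning option on slide 10 does, is to count how many records in the sample share each combination of indirect identifiers. The sketch below is only an illustration under stated assumptions: the records, the field names (`age`, `age_at_arrival`, `rt_ms`) and the bin width of 10 years are all hypothetical, and the count is taken within the sample only, so it says nothing definite about uniqueness in the wider population. It does not use sdcMicro or any other package named on the slides.

```python
from collections import Counter

# Hypothetical toy records; none of these values come from the study.
participants = [
    {"age": 34, "age_at_arrival": 21, "rt_ms": 612},
    {"age": 36, "age_at_arrival": 22, "rt_ms": 588},
    {"age": 33, "age_at_arrival": 20, "rt_ms": 640},
    {"age": 71, "age_at_arrival": 9, "rt_ms": 702},
]
# Assumed set of indirect identifiers for this illustration.
QUASI_IDENTIFIERS = ["age", "age_at_arrival"]


def class_sizes(records, keys):
    """For each record, count the records sharing its identifier combination."""
    combos = [tuple(record[k] for k in keys) for record in records]
    counts = Counter(combos)
    return [counts[combo] for combo in combos]


def bin_records(records, widths):
    """Replace each listed numeric field with the lower bound of its bin."""
    return [
        {k: (v // widths[k]) * widths[k] if k in widths else v
         for k, v in record.items()}
        for record in records
    ]


raw_sizes = class_sizes(participants, QUASI_IDENTIFIERS)
binned = bin_records(participants, {"age": 10, "age_at_arrival": 10})
binned_sizes = class_sizes(binned, QUASI_IDENTIFIERS)

# The smallest class size is the k in "k-anonymity" (within the sample).
print("k before binning:", min(raw_sizes))
print("k after binning:", min(binned_sizes))
# 1 / class size is a crude per-record risk; it is not spread evenly.
print("per-record sample risk after binning:", [1 / s for s in binned_sizes])
```

In this toy sample, binning merges the three younger participants into one class, but the oldest participant stays unique, which is the uneven-risk pattern raised in the notes. The binned ages are also exactly what an analysis using age as a continuous predictor would lose.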

Editor's notes

  1. Lack of clear procedural guidance and of precedent/case studies means that data controllers (i.e., researchers!) are understandably risk averse (risk being not only legal compliance, but also the time investment necessary to
  2. (https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/what-is-personal-data/can-we-identify-an-individual-indirectly/) If there is only a very slight possibility that an individual could be indirectly identified, is it still personal data? You should assume that you are not looking just at the means reasonably likely to be used by an ordinary person, but also by a determined person with a particular reason to want to identify individuals. The measures reasonably likely to be taken to identify an individual may vary depending upon the perceived value of the information.
  3. https://www.york.ac.uk/library/info-for/researchers/data/sharing/#tab-3 “In practice, even sensitive and personal data may be shared ethically if care has been taken in anonymisation, suitable consent obtained, reuse conditions prudently planned and appropriate data access restrictions applied.”
  4. From a project investigating whether there are differences in learning mechanisms between child and adult language learners: the minimal data required to model variability in the language attainment/proficiency of bilinguals as a function of their learning history (what age they started, how long their exposure has been, and cognitive skills theorised to underlie particular learning mechanisms). In this case, the biographical data are integral to the reproducibility of the analysis, and cannot be separated or binned etc. without detriment to the reproducibility.
  5. I have a sample of Polish immigrants, and data about their age at test, the age at which they arrived in the UK, and their length of residence. Is the combination of these indirect identifiers sufficient to reidentify an individual? There are approx. 900,000 Polish immigrants to the UK, so my population is large and the risk of reidentification small. However, the sampling criteria (very advanced proficiency in English, and minimum 12 years residence) likely increase that risk, but by how much? Finally, the risk is not evenly spread throughout the sample: WWII immigrants.
  6. Depending on the answer to these questions, there are a variety of means by which data can be further anonymised, or other ways in which the data could be shared. However, there is a tradeoff between increasing the availability of the dataset and ensuring the reproducibility of analyses underlying a published output, which I tried to sketch here in a back-of-the-envelope fashion.
  7. My feeling is that there is likely to be a bias toward placing data in restricted access repositories, even when the disclosure risk is relatively small. The problem with this solution, at least for language researchers, is that it eliminates 2 repositories that are most commonly used (IRIS, which is a repository specialised in materials for L2 research, and OSF, Figshare etc.). If you are interested in obtaining an Open Data badge, a Restricted Access notation was added earlier this year, but only a small number of repositories have been certified. The first 4 on the list are in Germany, and relatively few UKDA has an end user agreement. But in practice, the repositories most commonly used, OSF, and Figshare, GitHub
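
The "but by how much" question in note 5 can at least be framed as arithmetic. The sketch below is a back-of-the-envelope illustration, not a validated disclosure-risk method: it starts from the approximate population figure quoted in note 5, multiplies it by the share of people who would satisfy each narrowing criterion, and treats one over the resulting count as a crude risk for someone in that group. The two fractions are placeholders invented for the example, and treating the criteria as independent is itself an assumption. The function name `expected_matches` is hypothetical.

```python
def expected_matches(population, fractions):
    """Expected number of people left after applying each criterion in turn.

    Assumes the criteria are independent, which is a simplification.
    """
    matches = float(population)
    for fraction in fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("each fraction must lie between 0 and 1")
        matches *= fraction
    return matches


POPULATION = 900_000  # approximate figure quoted in note 5
# Placeholder shares, NOT real statistics: e.g. share meeting the sampling
# criteria, then share with the same age / arrival profile.
HYPOTHETICAL_FRACTIONS = [0.05, 0.02]

matches = expected_matches(POPULATION, HYPOTHETICAL_FRACTIONS)
# With fewer than one expected match, treat the person as unique.
risk = 1.0 / matches if matches >= 1 else 1.0

print(f"expected matching individuals: {matches:.0f}")
print(f"crude re-identification risk: {risk:.4f}")
```

Running the same calculation with a much smaller fraction for a rare subgroup (such as the WWII immigrants mentioned in note 5) shows how quickly the crude risk approaches 1, which is one way to document that risk is not evenly spread.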