Workshop session delivered alongside the 'Making your thesis legal' workshop in July and September 2013 to PhD, MPhil and DrPH students who are completing their thesis. Discusses standards for sharing data, issues that need addressing, formats, data protection, usability and licences.
Preparing research data for sharing
1. Preparing Research Data for Sharing
An overview for LSHTM students
Gareth Knight & Victoria Cranna
This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License
LSHTM eThesis session
Presented on 10th and 18th July 2013
3. Research Data
“Data produced during the research activity
should be managed appropriately, ensuring
that it is stored, organised and documented in
a manner that allows it to be understood and
used for the intended purpose.”
Research Degrees Handbook: Academic Year 2012-13
4. To Share or not to Share
1. Is sharing justified?
• What benefits will it provide?
• What are the risks associated with sharing data?
2. Do you have the ability to share?
• Intellectual Property Rights (IPR)
• Participant Consent
• Other obligations, e.g. confidentiality
3. Are there any conditions associated with sharing?
• What measures need to be in place to protect data? (e.g. record access
requests, specific use only)
Information Commissioner's Office: Data Sharing Code of Practice
http://www.ico.org.uk/for_organisations/data_protection/topic_guides/data_sharing
5. Reasons for Sharing
• Encourages validation of research findings
• Increases visibility of research findings through attribution and further analysis
• Complies with sponsor obligations
• Complies with journal publisher requirements
• Offers a simple way to deal with annoying data requests
6. Reasons against Sharing
• Ownership issues, e.g. third-party rights
• Participant confidentiality: the DPA 1998 does not apply to anonymised data
• Sensitivity: implications of release (e.g. geo-references for animal migration)
• Commercial/research exploitation
• Contractual, regulatory and legislative restrictions
What are the risks of data release?
7. Protection of Research Participants
“Researchers must ensure the confidentiality
of personal information relating to research
participants”
“Prior to publication or depositing data in a
public depository, data should be fully
anonymised”
LSHTM Guidelines on Good Research Practice
8. Data Protection Act 1998
Personal Data
Information that can be used to identify a living individual, either on its own or in combination with other information, e.g. name, age, address.
Sensitive Personal Data
racial or ethnic origin
political opinions
religious beliefs
trade union membership
physical or mental health
sexual life
criminal convictions
Protects living individuals’ fundamental rights and freedoms in relation to the storage, processing and disclosure of information held about them
9. Data Protection Principles
Eight principles which broadly state that personal data shall be:
1. Fairly and lawfully processed
2. Obtained only for specified purposes, and shall not be further processed
for other purposes that are incompatible with the original reason
3. Adequate, relevant and not excessive in comparison to original purpose
4. Accurate and where necessary, kept up to date
5. Held no longer than is necessary
6. Processed in accordance with the data subject’s rights
7. Kept securely and safely with appropriate measures to prevent
unauthorised or unlawful processing of the data and against accidental
loss, destruction or damage
8. Not transferred to countries without adequate protection
10. Potential Exemptions
No blanket exemption, but...
• Certain exemptions for research purposes including
statistical or historical purposes.
• If the research processing is not targeted at particular individuals and does not cause substantial damage or distress to a data subject, then:
• 2nd principle - personal data can be processed for purposes other
than for which they were originally obtained
• 5th principle - personal data can be held indefinitely
• Analysis results do not identify data subjects
Information Commissioner's Office: Guide to Data Protection
http://www.ico.org.uk/for_organisations/data_protection/the_guide
11. Reducing Disclosure risk
Disclosure Types:
• Identity: identify a person directly
• Attribute: identify sensitive information about a subject
• Inferential: Determine value of a subject’s
characteristic more accurately than would have
been otherwise possible
Techniques:
• Remove obvious identifiers (DPA 1998)
• Replace real data with synthetic
• Limit variables that are made available
• Sampling with a larger group
• Group significant values / Top/bottom coding
• Limit geographic detail
Avoiding inappropriate attribution of information to a data subject
Information Commissioner's Office: Anonymisation Code of Practice
http://www.ico.org.uk/for_organisations/data_protection/topic_guides/anonymisation
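Three of the techniques listed above (removing obvious identifiers, top coding and limiting geographic detail) can be sketched in a few lines of Python. This is a minimal illustration, not a complete anonymisation procedure: the column names (`name`, `address`, `nhs_number`, `age`, `postcode`), the 80+ top-coding threshold and the choice to truncate UK postcodes to their outward code are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical direct-identifier columns; adjust to your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "nhs_number"}

def top_code(value: int, ceiling: int) -> str:
    """Collapse values at or above `ceiling` into a single '<ceiling>+' band."""
    return f"{ceiling}+" if value >= ceiling else str(value)

def coarsen_postcode(postcode: str) -> str:
    """Keep only the outward code (the part before the space) of a UK postcode."""
    parts = postcode.split()
    return parts[0].upper() if parts else ""

def anonymise(records: List[Dict[str, object]], age_ceiling: int = 80) -> List[Dict[str, object]]:
    out = []
    for rec in records:
        # 1. Remove obvious identifiers.
        clean = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        # 2. Top-code age so that rare high values cannot single anyone out.
        if "age" in clean:
            clean["age"] = top_code(int(clean["age"]), age_ceiling)
        # 3. Limit geographic detail.
        if "postcode" in clean:
            clean["postcode"] = coarsen_postcode(str(clean["postcode"]))
        out.append(clean)
    return out
```

Applied to a record such as `{"name": "A. Smith", "age": 87, "postcode": "WC1E 7HT", "outcome": "recovered"}`, the sketch drops `name`, reports the age as `"80+"` and keeps only `"WC1E"` of the postcode.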
12. Ensuring continued access
Problems:
1. User doesn’t possess relevant
software package
2. User runs a different operating system from the creator (e.g. Linux, macOS)
3. Software package is obsolete
Options:
• Emulation of original
environment
• Export to other format
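Exporting to another format usually means writing the data out as plain delimited text. The sketch below assumes the data have already been read from the original package into a list of Python dictionaries (how that is done depends on the source format and is not shown). It writes tab-delimited text with Python's standard `csv` module and checks that every value survives a round trip. The file name `survey.tab` is hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

def export_delimited(rows: Iterable[Dict[str, object]], fieldnames: List[str],
                     stream: TextIO, delimiter: str = "\t") -> None:
    """Write rows as delimited text with a header line."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

def round_trips(rows: List[Dict[str, object]], fieldnames: List[str],
                delimiter: str = "\t") -> bool:
    """Return True if exporting and re-reading preserves every value (as text)."""
    buf = io.StringIO()
    export_delimited(rows, fieldnames, buf, delimiter)
    buf.seek(0)
    back = list(csv.DictReader(buf, delimiter=delimiter))
    # Delimited text has no types, so compare the string form of each value.
    expected = [{k: str(r[k]) for k in fieldnames} for r in rows]
    return back == expected

# Writing to a file: open with newline="" so the csv module controls line endings.
# with open("survey.tab", "w", newline="", encoding="utf-8") as f:
#     export_delimited(rows, ["id", "age"], f)
```

Delimited text carries no variable labels, value codes or types, which is why it is recommended together with a command/setup file rather than on its own.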
13. Choosing File Formats
Format should be:
• Accessible using a wide range of software tools
• In widespread use
• Able to support relevant information attributes without loss
• Based upon a public specification
• Able to be created without DRM or other limitations
“turning [a] PDF into XML is like turning a hamburger into a cow”
Peter Murray-Rust
14. Recommended Formats
Quantitative tabular:
• Preferred: SPSS portable format (.por), delimited txt & command/setup file
• Acceptable: SPSS (.sav), Stata (.dta), MS Access & other proprietary formats
Geospatial:
• Preferred: ESRI Shapefile, Geo-referenced TIFF (.tif, .tfw)
• Acceptable: ESRI Geodatabase format (.mdb), MapInfo Interchange Format (.mif), Keyhole Markup Language (KML) (.kml)
Qualitative text:
• Preferred: XML-encoded text (e.g. DDI, TEI), OpenDocument Format (ODF), Rich Text Format (RTF)
• Acceptable: MS Word, NVivo
Still Images:
• Preferred: TIFF, lossless JPEG 2000
• Acceptable: PNG, RAW, lossy (compressed) JPEG 2000
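The recommendations above can be turned into a simple lookup for checking files before deposit. This sketch is an illustration only: the mapping from the named formats to file extensions (for example `.sps` for an SPSS setup file, `.nvp` for NVivo projects, `.dng` as one RAW image format, and `.tab`/`.csv`/`.txt` for delimited text) is an assumption. An extension also cannot reveal whether a JPEG 2000 file is lossless, so `.jp2` is simply treated as preferred here.

```python
from pathlib import PurePath
from typing import Dict, Set

# Hypothetical extension mapping transcribed from the format names on the slide.
RECOMMENDED: Dict[str, Dict[str, Set[str]]] = {
    "tabular": {
        "preferred": {".por", ".txt", ".tab", ".csv", ".sps"},  # .sps: assumed setup file
        "acceptable": {".sav", ".dta", ".mdb", ".accdb"},
    },
    "geospatial": {
        "preferred": {".shp", ".shx", ".dbf", ".tif", ".tfw"},  # shapefile components + GeoTIFF
        "acceptable": {".mdb", ".mif", ".kml"},
    },
    "qualitative": {
        "preferred": {".xml", ".odt", ".rtf"},
        "acceptable": {".doc", ".docx", ".nvp"},
    },
    "image": {
        "preferred": {".tif", ".tiff", ".jp2"},  # lossless vs lossy JP2 not visible here
        "acceptable": {".png", ".dng"},
    },
}

def classify(filename: str, category: str) -> str:
    """Return 'preferred', 'acceptable' or 'not listed' for a file in a data category."""
    ext = PurePath(filename).suffix.lower()
    tiers = RECOMMENDED[category]
    for tier in ("preferred", "acceptable"):
        if ext in tiers[tier]:
            return tier
    return "not listed"
```

The category has to be supplied because the same extension can mean different things: `.mdb` is an MS Access database for tabular data but an ESRI Geodatabase for geospatial data.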
15. Ensuring Understandability
Questions a reusing researcher may ask:
• What does the variable mean?
• How were the results produced?
• What are the boundaries of the
measurement?
• What instruments and measures
were used?
A user (a third party or your future self) has difficulty understanding some aspect of the research data
Source:
• Lab notebooks & research protocols
• Codebooks and data dictionaries
• Equipment settings &
instrument calibration
Approach:
1. Check requirements in your field (e.g. clinical)
2. Look at other collections (e.g. UK Data Service)
3. Consider the questions a user may have when accessing the data
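One way to answer these questions is to keep a machine-readable data dictionary alongside the data. The sketch below is a hypothetical structure, not a DDI or UK Data Service schema: each entry records what a variable means, its unit, the instrument used and any value codes, and a helper lists dataset columns that have no entry yet. The example variables are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Variable:
    name: str
    label: str                                           # What does the variable mean?
    unit: Optional[str] = None                           # Unit / boundaries of the measurement
    instrument: Optional[str] = None                     # Instrument or measure used
    codes: Dict[str, str] = field(default_factory=dict)  # coded value -> meaning

def undocumented(columns: List[str], dictionary: List[Variable]) -> List[str]:
    """List dataset columns that have no data dictionary entry."""
    known = {v.name for v in dictionary}
    return [c for c in columns if c not in known]

def render(dictionary: List[Variable]) -> str:
    """Produce a plain-text codebook, one block per variable."""
    lines: List[str] = []
    for v in dictionary:
        lines.append(f"{v.name}: {v.label}")
        if v.unit:
            lines.append(f"  unit: {v.unit}")
        if v.instrument:
            lines.append(f"  instrument: {v.instrument}")
        for code, meaning in v.codes.items():
            lines.append(f"  {code} = {meaning}")
    return "\n".join(lines)

# Invented example entries.
dictionary = [
    Variable("sbp", "Systolic blood pressure", unit="mmHg",
             instrument="Automated cuff, seated, mean of two readings"),
    Variable("sex", "Sex of participant", codes={"1": "Male", "2": "Female"}),
]
```

Running `undocumented(["id", "sbp", "sex"], dictionary)` before deposit would flag `id` as a column a future user could not interpret from the codebook alone.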
16. Ensuring Usability
Scenarios:
1. User is uncertain whether they are permitted to analyse the data, so does not use it.
2. Researcher uses the data for a purpose that is not permitted.
End user unsure of the permitted use of the data
Licence should specify:
• Data that the licence applies to;
• Who owns each component;
• Who is permitted access & use;
• Conditions associated with use
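The four items above can be captured as a structured record and checked for gaps before data are released. This is a hypothetical sketch (the class and field names are illustrative, not part of any licence standard). It flags components with no recorded owner and a missing statement of who may use the data, and it leaves `conditions` unchecked because an empty list can legitimately mean no conditions, as with a public domain dedication.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LicenceTerms:
    applies_to: List[str]        # Data that the licence applies to
    owners: Dict[str, str]       # component -> owner
    permitted_users: str         # Who is permitted access & use
    conditions: List[str] = field(default_factory=list)  # Conditions associated with use

    def problems(self) -> List[str]:
        """Return a list of gaps in the licence record (empty if none found)."""
        issues: List[str] = []
        if not self.applies_to:
            issues.append("no data components listed")
        missing = [c for c in self.applies_to if c not in self.owners]
        if missing:
            issues.append("no owner recorded for: " + ", ".join(missing))
        if not self.permitted_users.strip():
            issues.append("permitted users not stated")
        return issues
```

For example, a record covering `survey.tab` and `interviews.rtf` that names an owner only for `survey.tab` reports `["no owner recorded for: interviews.rtf"]`.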
17. 1. Standard licence model
Creative Commons
Attribution (BY): Creator must be credited
No Derivatives (ND): No editing or manipulation
Non-Commercial (NC): No commercial use
Share Alike (SA): Derivatives must be shared under the same licence
Open Data Commons
Public Domain Dedication & License (PDDL)
Attribution License (ODC-By)
Open Database License (ODC-ODbL): Attribution Share-Alike
Various software Licence Models
GNU General Public License (GPL)
GNU Lesser General Public License (LGPL)
BSD license
Etc.
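The Creative Commons elements combine according to a small rule set: every current CC licence includes BY, NC can be added to any of them, and ND and SA are mutually exclusive because share-alike only governs derivatives, which ND forbids. A minimal sketch (the function name `cc_licence` is hypothetical) that builds the licence code from the chosen elements:

```python
def cc_licence(non_commercial: bool = False,
               no_derivatives: bool = False,
               share_alike: bool = False) -> str:
    """Compose a Creative Commons licence code from the chosen elements."""
    if no_derivatives and share_alike:
        # Share-alike applies to derivatives, which ND does not allow.
        raise ValueError("ND and SA cannot be combined")
    parts = ["BY"]  # Attribution is part of every current CC licence
    if non_commercial:
        parts.append("NC")
    if no_derivatives:
        parts.append("ND")
    elif share_alike:
        parts.append("SA")
    return "CC " + "-".join(parts)
```

For example, requiring credit while forbidding commercial reuse and editing gives `cc_licence(non_commercial=True, no_derivatives=True)`, i.e. `CC BY-NC-ND`.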
18. 2. Tailored Licence form
• National Cancer Research Institute: Data and Material Transfer Agreement template
http://www.ncri.org.uk/default.asp?s=1&p=8&ss=9
• UK Data Service licence
http://ukdataservice.ac.uk/deposit-data/support/licence.aspx
• CeLSIUS Data Access Agreement
http://celsius.lshtm.ac.uk/documents/Data%20Access%20Agreement.doc
• Participant Consent form
http://www.lshtm.ac.uk/research/ethicscommittees/
Digital Curation Centre: How to License Research Data
http://www.dcc.ac.uk/resources/how-guides/license-research-data
19. LSHTM Data Repository
• Public: data made available for anonymous access
• Registered: end user required to register for time-limited access
• Approved: end user must state the purpose for which they wish to use the data
• Embargoed: data withheld for a designated time period, e.g. 5 years
• Request: data not held in the repository may be requested from the creator
A service in development, capable of curating, preserving and sharing LSHTM research data
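The access levels can be read as a set of rules applied to each download request. The sketch below is hypothetical and is not the repository's actual logic: the `purpose_approved` flag stands in for a decision by the data owner, and it assumes that embargoed data become available once the embargo date is reached.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

class Access(Enum):
    PUBLIC = "public"
    REGISTERED = "registered"
    APPROVED = "approved"
    EMBARGOED = "embargoed"
    REQUEST = "request"

@dataclass
class Requester:
    registered_until: Optional[date] = None  # end of a time-limited registration
    stated_purpose: Optional[str] = None     # purpose given when requesting access
    purpose_approved: bool = False           # hypothetical: owner accepted the purpose

def may_download(level: Access, user: Requester, today: date,
                 embargo_ends: Optional[date] = None) -> bool:
    """Decide whether a repository download should be allowed."""
    if level is Access.PUBLIC:
        return True
    if level is Access.REGISTERED:
        return user.registered_until is not None and today <= user.registered_until
    if level is Access.APPROVED:
        return bool(user.stated_purpose) and user.purpose_approved
    if level is Access.EMBARGOED:
        # Assumption: once the embargo lapses the data become openly available.
        return embargo_ends is not None and today >= embargo_ends
    # Access.REQUEST: data are not held in the repository; contact the creator.
    return False
```

Keeping the rules in one function makes the time-limited registration and the embargo date explicit, rather than relying on someone remembering to change a dataset's status by hand.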
20. A Few Useful References
• MANTRA – Data Management training for PhD students
http://datalib.edina.ac.uk/mantra/
• UK Data Archive – Managing and Sharing Data
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
• LSHTM Information Management support material
http://intra.lshtm.ac.uk/infoman/
• Data Protection web pages: http://intra.lshtm.ac.uk/infoman/data/
• Guidelines on good research practice: Implementing research governance:
http://www.lshtm.ac.uk/research/ethicscommittees/good_research_practice.pdf
• Research Degrees Handbook:
http://www.lshtm.ac.uk/study/currentstudents/studentinformation/rd_handbook_12_13.pdf
• Information Management and Security Policy:
http://intra.lshtm.ac.uk/infoman/security/index.html
Aims to protect an individual’s fundamental rights and freedoms in respect of the processing of personal data. Gives individuals the right to access the personal data the School holds on them, to correct it, and to know the purpose for which it is held and to whom it can be disclosed. Only relates to living individuals. Public register of data controllers, to which institutions have to add their notification.
If the purpose of the research processing is not to support measures or decisions targeted at particular individuals, and it does not cause substantial distress or damage to a data