This document discusses research lifecycles and data management. It begins by outlining typical stages in a research lifecycle from planning to publication. It then discusses how data is created and managed at various stages, and raises questions researchers should consider around formatting, documenting, storing, sharing and preserving data. The document provides examples of research lifecycle models and gives advice on best practices for managing data at each stage of the research process to support reuse and ensure data is well documented and preserved.
2. About this presentation
Managing data throughout the
research lifecycle
• What is the research lifecycle?
• How do you manage data?
• What questions does managing data raise?
3. What is the research lifecycle?
• Research activity often takes place in stages
which form a ‘lifecycle’
• Data is created at points during this lifecycle
• The data created has its own lifespan
“Data often have a longer lifespan than the research project that creates
them. Researchers may continue to work on data after funding has ceased,
follow-up projects may analyse or add to the data, and data may be re-used
by other researchers.” UKDA
4. Example 1: DCC lifecycle model
A model to show the activities and people involved in managing data.
[Diagram labels: activities PLAN, CREATE DATA, ADD DOCUMENTATION; people RESEARCHERS, IT, DATA CENTRE]
5. Example 2: Research360 lifecycle
[Diagram: the "Research Process" cycle, with stages labelled Write proposal; Start project; Acquire sample; Generate, Create, Collect; Process; Analyze; Interpret; Publish; Validate]
8. Key ideas from the research lifecycle
Different research lifecycles suit different researchers
Research is a circular process
Certain stages are likely to be familiar to many researchers
– conceptualisation/planning, creation, active
use/documentation, publication etc…
Certain stages are likely to be familiar to fewer researchers –
sharing, re-use etc…
Data may be created at many stages during the process
(intervention points)
Data is likely to need management at many stages during
the process
9. Key Qs from the research lifecycle
Stage 1 (Create): What data will you produce?
Stage 2 (Active Use): How will you organise the data?
Stage 3 (Documentation): Can you/others understand the data?
Stage 4 (Publication & Deposit): What data will be deposited and where?
Stage 5 (Preservation & Re-Use): Who will be interested in re-using the data?
10. What is data curation?
“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”
Data management is part of good research practice
Manage
Share
13. How do you manage data?
Key questions to consider when:
- Creating data
- Documenting data
- Storing data
- Sharing data
- Preserving data
- Planning data management
Examples and pointers to support
14. Creating data: questions
What formats will you use?
- determined by the instruments / software you have to use
- common, widespread formats to enable reuse
How will you create your data?
- What methodologies and standards will you use?
- How will you address ethical concerns and protect participants?
- Will you control variations to provide quality assurance?
- What external data sets will you use?
(See the BL Social Science Collection guide to Management and
Business studies datasets)
15. Creating data: advice
Different formats are good for different things
- open, lossless formats are more sustainable e.g. rtf, xml, tif, wav
- proprietary and/or compressed formats are less preservable but are often in widespread use e.g. doc, jpg, mp3
You may choose one format for analysis then convert to a standard format for preservation / sharing
Excellent guidance on creating data & managing ethics in:
www.data-archive.ac.uk/media/2894/managingsharing.pdf
16. File formats for long-term access
- Unencrypted
- Uncompressed
- Not proprietary or patent-encumbered
- Open, documented standard
- Standard representation (ASCII, Unicode)
Type            | Recommended                                         | Avoid for data sharing
Tabular data    | CSV, TSV, SPSS portable                             | Excel
Text            | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media           | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC     | Quicktime, H264
Images          | TIFF, JPEG2000, PNG                                 | GIF, JPG
Structured data | XML, RDF                                            | RDBMS
Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
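As a rough illustration of the formats table, the recommendations could be turned into a simple file-extension check. This is only a sketch: the mapping from format names to file extensions (for example .por for SPSS portable) and the function name classify_format are assumptions made for this example, and an extension alone cannot tell PDF from PDF/A or reveal which codec is inside an MP4 container.

```python
from pathlib import Path

# Extensions chosen to match the formats named in the table.
# The name-to-extension mapping is an assumption of this sketch.
RECOMMENDED = {
    ".csv", ".tsv", ".por",            # tabular data
    ".txt", ".html", ".rtf",           # text
    ".mp4", ".ogg", ".flac",           # media containers / codecs
    ".tif", ".tiff", ".jp2", ".png",   # images
    ".xml", ".rdf",                    # structured data
}
AVOID_FOR_SHARING = {
    ".xls", ".xlsx",                   # Excel
    ".doc", ".docx",                   # Word
    ".mov",                            # Quicktime
    ".gif", ".jpg", ".jpeg",           # images
}


def classify_format(filename: str) -> str:
    """Return a coarse recommendation based only on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in RECOMMENDED:
        return "recommended"
    if ext in AVOID_FOR_SHARING:
        return "avoid for sharing"
    return "unknown"
```

For example, classify_format("interviews.docx") returns "avoid for sharing", which is a prompt to export a copy to RTF or plain text before deposit.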
17. Documenting data: questions
What information do users need to understand the data?
- descriptions of all variables / fields and their values
- code labels, classification schema, abbreviations list
- information about the project and data creators
- tips on usage e.g. exceptions, quirks, questionable results
How will you capture this?
Are there standards you can use?
18. Dublin Core metadata example
Creator: Donald Cooper
  Role=Photographer
Subject: Shakespeare, William, 1564-1616, Antony and Cleopatra [LC]
Description: Vanessa Redgrave as Cleopatra
Date: 1973-08-09
Type: Image
Format: JPEG
Identifier: 4150 [catalogue no]
Source: negative no 235
Relation: Antony and Cleopatra: Thompson/73-8
  IsPartOf
Coverage: Bankside Globe
  Role=Spatial
Rights: Donald Cooper
http://www.ahds.ac.uk/performingarts
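To show how a record like the one above might be captured in machine-readable form, here is a minimal sketch that writes the simple Dublin Core elements to XML using Python's standard library. The wrapper element name metadata and the function name to_dublin_core_xml are assumptions for illustration, and the qualifiers in the example (Role, IsPartOf) and the bracketed notes are left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Values copied from the slide; qualifiers such as Role are omitted.
record = {
    "creator": "Donald Cooper",
    "subject": "Shakespeare, William, 1564-1616, Antony and Cleopatra",
    "description": "Vanessa Redgrave as Cleopatra",
    "date": "1973-08-09",
    "type": "Image",
    "format": "JPEG",
    "identifier": "4150",
    "source": "negative no 235",
    "relation": "Antony and Cleopatra: Thompson/73-8",
    "coverage": "Bankside Globe",
    "rights": "Donald Cooper",
}


def to_dublin_core_xml(fields: dict[str, str]) -> str:
    """Serialise a flat dict of Dublin Core fields as an XML string."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_dublin_core_xml(record)
```

Storing the description alongside the image in a standard schema like this is one way to answer "Are there standards you can use?"; a repository may well expect a different wrapper or a qualified Dublin Core profile.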
20. Storing data: questions
What is available to you?
What facilities do you need?
- remote access
- file sharing with colleagues
- high levels of security
How will the data be backed up?
21. Storing data: advice
Speak to the Northampton IT Team for advice – TUNDRA2
Remember that all storage is fallible – you need to back up
- keep 2+ copies on different types of media in different locations
- manage back-ups (migrate media, test integrity)
Choose appropriate methods to transfer / share data
- email, dropbox, ftp, encrypted media, filestore, VREs...
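One common way to "test integrity" of a back-up is to compare checksums of the original file and its copy. The sketch below uses SHA-256 from Python's standard library; the function names sha256_of and backup_matches are made up for this example, and it assumes both copies are reachable as ordinary files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """Return True when the back-up is byte-for-byte identical."""
    return sha256_of(original) == sha256_of(backup)
```

Recording the checksum at the time of back-up also lets you re-check a copy later, after media have been migrated, without needing the original to hand.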
22. Sharing data: questions
Does your funder expect you to share data?
Which data can be shared?
How will you share your data?
What do you get from sharing?
- citations, recognition...
23. Sharing data: advice
Where possible, make your data
available via repositories, data
centres and structured databases
http://datacite.org/repolist http://databib.org/
Northampton Electronic Collection of Theses and Research (NECTAR)
http://nectar.northampton.ac.uk/
24. Preserving data: questions
Are you required to preserve (or destroy) your data?
How will you select what to keep?
Is there somewhere you can archive your data?
How can you support the reuse of your data?
25. Preserving data: advice
How to select and appraise research data:
www.dcc.ac.uk/resources/how-guides/appraise-select-research-data
How to license research data
www.dcc.ac.uk/resources/how-guides/license-research-data
How to cite datasets and link to publications
www.dcc.ac.uk/resources/how-guides/cite-datasets
26. Planning data management
What do you (and others) want to do with the data?
Your decisions should bear this in mind and make it feasible
Remember:
Data management is about making informed decisions
Talk to colleagues and support staff to see which option works best
27. Data Management and Sharing Plans
Funders typically want a short statement covering:
- What data will be created (format, types) and how?
- How will the data be documented and described?
- How will you manage ethics and Intellectual Property?
- What are the plans for data sharing and access?
- What is the strategy for long-term preservation?
DMP tool: https://dmponline.dcc.ac.uk/
How to write a DMP:
www.dcc.ac.uk/resources/how-guides/develop-data-plan
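The five questions above can double as a checklist when drafting a plan. The following sketch is purely illustrative: it is not how DMPonline works, and the names DMP_QUESTIONS and missing_sections are hypothetical. It simply reports which questions a draft has not yet answered.

```python
# The five questions funders typically ask, as listed on the slide.
DMP_QUESTIONS = [
    "What data will be created (format, types) and how?",
    "How will the data be documented and described?",
    "How will you manage ethics and Intellectual Property?",
    "What are the plans for data sharing and access?",
    "What is the strategy for long-term preservation?",
]


def missing_sections(plan: dict[str, str]) -> list[str]:
    """Return the questions that have no (non-blank) answer in the draft."""
    return [q for q in DMP_QUESTIONS if not plan.get(q, "").strip()]


# Hypothetical draft with only the first question answered.
draft = {DMP_QUESTIONS[0]: "Interview transcripts as RTF; survey data as CSV."}
unanswered = missing_sections(draft)
```

Here unanswered holds the four remaining questions, which is a quick way to see what is still to be discussed with colleagues and support staff.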
28. The research process at Cardiff
Take 10 minutes to think about one of the academic
departments you work with
― How would you characterise the subject as a whole?
― What do you know about the social organisation of the
discipline?
― What do you know about the data they create?
― Are you familiar with the research process in this
department?
― When might they require your help?
29. Thanks - any questions?
Acknowledgements:
Thanks to DCC staff, UK Data Archive and Research360 for slides