Presentation given to the High Performance Computing Summer School as part of a hands-on workshop developing software management plans and looking at software as data within the context of research data management best practices.
1. Library Services
Software Management Plans and ‘Software as Data’
HPC Summer School
Research Data Management Community Session
Sept. 30th, 2016
Sarah Stewart, Research Data Management Team, Central Library
2. Missing Data (and Software)
“In their parents' attic, in boxes in the garage, or stored on now-defunct
floppy disks — these are just some of the inaccessible places in which
scientists have admitted to keeping their old research data.”
http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
4. Software: What do I do with it?
• Lots of emphasis on ‘data’ management, but software in
research is often neglected.
• Software is sensitive to changes in its ‘environment’
• There is a lot of variation inherent in software
(languages, versions, licensing, etc.)
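One way to make the ‘environment’ point concrete is to record the environment alongside the code. The sketch below is an illustration only and was not part of the talk; it assumes the research software is written in Python, and the function name environment_snapshot is hypothetical. It gathers the interpreter version, the operating system and the installed package versions so they can be stored next to the source.

```python
import json
import platform
from importlib import metadata


def environment_snapshot() -> dict:
    """Collect interpreter, operating system and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be saved with the software.
    print(json.dumps(environment_snapshot(), indent=2))
```

Saving this output with each archived version gives a later reader a starting point for rebuilding the environment, although it does not capture system libraries or compilers.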
5. Software as ‘Data’
• ‘Software is used to create, interpret, present,
manipulate and manage data’ (Software Sustainability
Institute)
• Data: ‘recorded factual material commonly retained by
and accepted…as necessary to validate research
findings’ (EPSRC)
• Software = Data!
6. Software should be preserved if:
• Software can’t be separated from the data or digital
object.
• Software is classified as a research output
• Software has intrinsic value
7. Digital Preservation Issues
• Storage, retrieval, reconstruction and replay are all
complicated by code libraries, dependencies and
software engineering practice in general.
• Planning is essential for subsequent retrieval,
reconstruction and replay.
• Software is a digital object which is frequently the result
of research and is often a vital prerequisite for the
preservation of other digital objects.
• Software preservation should be part of a broader
preservation strategy: Research Data Management.
8. Strategies for Digital Preservation
• Data Integrity and File Fixity checks (management of
checksums) – for source code
• Media and Format Migrations
• Refreshing (reduces bit-rot)
• Replication (create duplicate copies, avoids corruption,
loss, erasure)
• Emulation
• Encapsulation (linking content with all information
required for it to be deciphered and understood)
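The fixity check in the first bullet can be sketched in a few lines. This is a minimal illustration, not a tool referenced in the talk: it assumes SHA-256 as the checksum algorithm, and the function names (sha256_of, build_manifest, verify_manifest) are hypothetical. A manifest of checksums is built once for a source tree, then compared against the files later, or against a replicated copy, to detect corruption or loss.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose content changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

An empty list from verify_manifest means every recorded file is still present and unchanged. Files added after the manifest was built are not reported, which is a deliberate simplification here.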
9. Software Management Plans
What?
• Like Data Management Plans, Software Management
Plans outline the uses, responsibilities, ownership,
access and sharing, storage, maintenance and
archiving of research software.
10. Software Management Plans
Why?
• No clear funder requirements yet, but…
• Promotes citability and credit for your research =
Increased Research Impact
• Research Output can be validated/checked by others
• Supports transparency of research and promotes Open
Research.
• Good practice!
11. DMPOnline for Software Management Plans
• Imperial-specific Software Management Plan templates
are currently being developed in DMPOnline.
• Earlier templates came through the Software Sustainability
Institute; some sources are available via GitHub.
12. Software Management Plans
How? (at Imperial College London):
• Specialised template in DMPOnline (via DCC)
• Imperial-specific DMPOnline template (in development).
• Use GitHub (Imperial has an enterprise account)
• Use Zenodo or another subject-specific repository to archive
versions of research software (GitHub integration)
• Log metadata about your software into Symplectic.
• Contact RDM Team (Central Library) for assistance/support:
rdm-enquiries@imperial.ac.uk
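As an illustration of the Zenodo step, a repository can carry a small metadata file that Zenodo's GitHub integration reads when it archives a release. The sketch below is an assumption-laden example rather than Imperial guidance: the file name .zenodo.json and the field names follow Zenodo's deposit metadata as commonly documented and should be checked against the current Zenodo documentation. Every value shown is a placeholder, and write_zenodo_metadata is a hypothetical helper.

```python
import json
from pathlib import Path

# All values are placeholders for illustration, not details from the talk.
# Field names are assumed to match Zenodo's deposit metadata; verify them
# against the Zenodo documentation before relying on this.
zenodo_metadata = {
    "title": "example-simulation-code",
    "upload_type": "software",
    "description": "Code used to produce the results in an example study.",
    "creators": [
        {"name": "Researcher, Example", "affiliation": "Example University"}
    ],
    "license": "MIT",
    "keywords": ["research software", "HPC"],
}


def write_zenodo_metadata(repo_root: Path, metadata: dict) -> Path:
    """Write the metadata as .zenodo.json at the top of the repository."""
    target = repo_root / ".zenodo.json"
    target.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return target
```

Keeping this file under version control means the description, authorship and licence travel with each archived version of the software.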
13. Any Questions?
Thank you!
For more information and support:
Webpage: www.imperial.ac.uk/research-data-management
E-mail: rdm-enquiries@imperial.ac.uk
And also:
DMPOnline: https://dmponline.dcc.ac.uk/
Software Sustainability Institute:
https://www.software.ac.uk/