The open repositories community has made great strides in recent years in addressing interoperability and policy, and in making the case for open access and sharing. One aspect of open research that has come to prominence is the importance of software as a fundamental part of reproducible research, which in turn raises issues around the preservation of software.
In this short presentation, I will describe some of the work that the Software Sustainability Institute (SSI) has been doing to address the structural and policy issues that currently present barriers to the deposit and use of software in open repositories.
Where does it go from here? The role of software in digital repositories
1. www.software.ac.uk
Where does it go from here?
The Place of Software in Digital Repositories
12 July 2012
OR2012, Edinburgh
Neil Chue Hong (@npch)
N.ChueHong@software.ac.uk
Software Sustainability Institute
2. Software is pervasive in research
3. The Software Sustainability Institute
A national facility for building better software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate the transition to the next stage
• Software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, community building, publicity and more…
Supported by EPSRC Grant EP/H043160/1
4. Software Sustainability: preservation vs sustainability
Sustainability?
Image courtesy of London Permaculture under a CC BY-NC-SA license
Image courtesy of Mortati under a CC BY-NC-ND license
Preservation?
5. Why are you considering software sustainability?
Purpose:
- Achieve legal compliance
- Create heritage value
- Enable continued access to data
- Encourage software reuse
JISC-funded, with Curtis+Cartwright
http://www.software.ac.uk/resources/preserving-software-resources
6. How are you going to choose the right approach?
Approach:
- Preservation (techno-centric)
- Emulation (data-centric)
- Migration (functionality-centric)
- Transition (process-centric)
- Hibernation (knowledge-centric)
- Deprecation
7. Software Carpentry
• Helping scientists be more productive by teaching them basic computing skills
• How to use repositories properly is a key skill
• http://software-carpentry.org
8. Just the Nature of the problem?
Hacking is fun
Maintenance is not fun
Statistics courtesy of Greg Wilson, Software Carpentry, from a Nature article: published online 13 October 2010, Nature 467, 775-777 (2010), doi:10.1038/467775a
10. Slide from Carole Goble, JCDL 2012
[Diagram: kinds of reuse (review, rerun, repeat, replay, reproduce with new data, reproduce with a new method, repurpose, reuse, recover, repair, reconstruct, refresh) set against what is shared: publication only, documentation, provenance, or execution (linking data and code).]
Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online.
Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227.
11. The most important: Reward
• How do we reward people for important software contributions?
• Traditionally: publish a research paper that happens to mention software
  ▪ Can we provide more direct, acceptable software citations?
• A Research Software Impact Manifesto
  ▪ http://www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software
  ▪ NB Authorship is hard
13. Boundary
What do we choose to keep:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
- Software dependencies?
What's the minimum citable part?
14. Granularity
- Function
- Algorithm
- Program
- Library / Suite / Package
- …
15. Versioning
Why do we version?
- To indicate a change
- To allow sharing
- To confer special status
[Diagram: public versions v1, v2 and v3, with personal versions (v1, v2, v2a, v3, v3a) developed alongside them.]
17. Differing roles, different repositories
backup → sharing → archiving
Timescales, ingest, policy, metadata, licensing, assurance
18. Software Metapapers
• Create a complete scholarly record including “standard” publication, method, dataset and models, and software
  ▪ e.g. modelling and simulation, statistical analysis
  ▪ Enable replay, reproduction and reuse
• A pragmatic approach is to create a metadata record for the software, and link it to a copy of the software in some storage infrastructure
  ▪ This is a software metapaper
  ▪ Peer-review the metadata, not the software
• Journal of Open Research Software:
  ▪ http://openresearchsoftware.metajnl.com/
See: http://openresearchsoftware.metajnl.com/faq/ and the work by B. Matthews et al.: The Significant Properties of Software: A Study
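To make "peer-review the metadata, not the software" concrete, here is a minimal sketch of a metapaper-style record that points at a deposited copy of the software, plus a check that looks only at the record. The class name, every field name and the example URL are hypothetical illustrations, not the Journal of Open Research Software's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class SoftwareMetapaper:
    """Hypothetical metadata record for one released version of some software.

    Field names are illustrative only; they are not a real journal's schema.
    """
    title: str
    authors: list[str]
    version: str
    license_id: str       # e.g. an SPDX identifier such as "BSD-3-Clause"
    repository_url: str   # link to the copy of the software held in a repository
    description: str = ""
    related_papers: list[str] = field(default_factory=list)


def review_metadata(record: SoftwareMetapaper) -> list[str]:
    """Return the problems found in the record; an empty list means it passes.

    Only the metadata is checked: the software behind repository_url is not
    downloaded, built or run.
    """
    problems = []
    if not record.title.strip():
        problems.append("missing title")
    if not record.authors:
        problems.append("no authors listed")
    if not record.version.strip():
        problems.append("missing version")
    if not record.license_id.strip():
        problems.append("missing license")
    if not record.repository_url.startswith(("http://", "https://")):
        problems.append("repository_url is not an http(s) link")
    return problems


# Hypothetical example record; the URL is a placeholder, not a real deposit.
record = SoftwareMetapaper(
    title="Example Simulation Toolkit",
    authors=["A. Researcher"],
    version="1.0.2",
    license_id="BSD-3-Clause",
    repository_url="https://repository.example.org/deposit/1234",
)
print(review_metadata(record))  # prints []
```

Keeping the software itself out of the check reflects the division of labour on the slide: the journal vouches for the description, and the repository is responsible for the bits.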
19. An acceptable repository
• A metapaper references an instance of software, stored in a “suitable” repository
  ▪ Clear access / deposit / preservation policy
  ▪ Adherence to standards
  ▪ Ability to easily “transfer”
  ▪ Sustainability of hosting organisation
  ▪ Ability to monitor, check integrity (obsolescence?)
• We may be storing
  ▪ Binaries, source code (as text or archived), virtual machines(!)
20. Potential for confusion
• “The right license for all parts of the scholarly record”
  ▪ Victoria Stodden, Enabling Reproducible Research: Open Licensing for Scientific Innovation
• Commonly used OSI-approved licenses include:
  ▪ Apache License, 2.0 (Apache-2.0)
  ▪ BSD 3-Clause “New” or “Revised” license (BSD-3-Clause)
  ▪ BSD 2-Clause “Simplified” or “FreeBSD” license (BSD-2-Clause)
  ▪ GNU General Public License (GPL)
  ▪ GNU Library or “Lesser” General Public License (LGPL)
  ▪ MIT license (MIT)
  ▪ Mozilla Public License 2.0 (MPL-2.0)
  ▪ Common Development and Distribution License (CDDL-1.0)
  ▪ Eclipse Public License (EPL-1.0)
• Does enabling the deposit of software just confuse those already depositing publications/data?
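One way a repository could reduce that confusion is to group the licenses above by family, so that depositors used to CC-BY can see which ones only ask for attribution. The sketch below is an illustration, not an official classification service. "GPL" and "LGPL" are left unversioned as on the slide, whereas real SPDX identifiers carry a version (e.g. "GPL-3.0-only"). Treating the permissive family as "CC-BY-like" follows the speaker notes' BSD/Apache example.

```python
from enum import Enum


class LicenseFamily(Enum):
    PERMISSIVE = "permissive"            # attribution-style terms, closest in spirit to CC-BY
    WEAK_COPYLEFT = "weak copyleft"      # modifications to the licensed code itself must stay open
    STRONG_COPYLEFT = "strong copyleft"  # distributed derivative works must use the same license


# Keys follow the identifiers shown on the slide.
LICENSE_FAMILIES = {
    "Apache-2.0": LicenseFamily.PERMISSIVE,
    "BSD-3-Clause": LicenseFamily.PERMISSIVE,
    "BSD-2-Clause": LicenseFamily.PERMISSIVE,
    "MIT": LicenseFamily.PERMISSIVE,
    "GPL": LicenseFamily.STRONG_COPYLEFT,
    "LGPL": LicenseFamily.WEAK_COPYLEFT,
    "MPL-2.0": LicenseFamily.WEAK_COPYLEFT,
    "CDDL-1.0": LicenseFamily.WEAK_COPYLEFT,
    "EPL-1.0": LicenseFamily.WEAK_COPYLEFT,
}


def is_cc_by_like(license_id: str) -> bool:
    """True if the license is in the permissive family; unknown IDs return False."""
    return LICENSE_FAMILIES.get(license_id) is LicenseFamily.PERMISSIVE


for lic in ("Apache-2.0", "GPL", "MPL-2.0"):
    print(lic, is_cc_by_like(lic))
```

Unknown identifiers deliberately fall through to `False`, so a deposit with an unrecognised license gets flagged for a human to look at rather than waved through.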
21. 5 Stars of Software?
• Do we need a 5-star scheme for software?
  ▪ Existence: there is accurate metadata that defines the software
  ▪ Availability: you can access and run the software
  ▪ Openness: the software has an open, permissive license
  ▪ Assured: the software provides ways of assuring its correctness
  ▪ Linked: the related data, dependencies and papers are indicated
cf. 5 Stars of Linked Data (Berners-Lee), 5 Stars of Online Journals (Shotton)
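A repository could record the five proposed criteria per deposit and compute a rating from them. The slide does not say how the criteria combine. This sketch *assumes* that, as in the 5 Stars of Linked Data, the stars are cumulative: each star counts only if all earlier ones are met. The class and function names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class SoftwareStars:
    """The five criteria proposed on the slide, in the order listed."""
    existence: bool     # accurate metadata defines the software
    availability: bool  # you can access and run the software
    openness: bool      # the software has an open, permissive license
    assured: bool       # the software provides ways of assuring its correctness
    linked: bool        # related data, dependencies and papers are indicated


def star_rating(s: SoftwareStars) -> int:
    """Count stars cumulatively, stopping at the first unmet criterion.

    The cumulative rule is an assumption borrowed from the 5 Stars of
    Linked Data; it is not stated on the slide.
    """
    stars = 0
    for f in fields(s):  # fields() preserves definition order
        if not getattr(s, f.name):
            break
        stars += 1
    return stars


example = SoftwareStars(existence=True, availability=True, openness=True,
                        assured=False, linked=True)
print(star_rating(example))  # prints 3
```

Under a simple sum of criteria met, the same example would score 4. Which rule is preferable depends on whether later stars should be meaningful without the earlier ones.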
22. Take-home points
1) Researchers are developing more software than ever, and trying to do it better
2) They want to be rewarded for creating a complete scholarly record, and this includes software
3) We still don't know the best way to shift from one repository role to another when it comes to software!
backup → sharing → archiving
Notes
Steven Gray here at CASA has produced a proof of concept showing the last hour's snowfall in the UK as tweets, and the last 24 hours' snowfall by postcode district (the important part here is the data underneath, not the tweets as such). Based on Ben Marsh's work.
I ended up doing this because we needed to fix the basics: reproducible research, software credit / career paths, and software skills.
- Drawing on a pool of specialists to drive the continued improvement and impact of research software developed by and for researchers
- Providing services for research software users and developers
- Developing research community interactions and capacity
- Promoting research software best practice and capability
Clarifying the Purposes and Benefits of Software Preservation: http://softwarepreservation.jiscinvolve.org/wp/about/
There is a spectrum of approaches
Statistics from Greg Wilson.
Are academics software developers?
Can research consortia manage production?
Are timing constraints different?
What is the role of the PI in software development management?
Are the skills for software and research the same?
Cf. the work of James Howison.
Based on a study done for Cameron Neylon's Beyond Impact workshop.
Is it more important to sustain the software that this workflow references, or the workflow itself?
At what level do you reference, at what level do you deposit?
This is made more difficult than for data by the fluidly changing, collaborative nature of software development: it is not just a matter of adding to the contributor pool.
Based on OR2012 workshop outputs
We want to move towards OSI-approved licenses that are similar in spirit to CC-BY, e.g. BSD or Apache.
Cf. 5 Stars of Linked Data (Berners-Lee): available with an open license, machine-readable, non-proprietary format, open standards, linked to provide context.
5 Stars of Online Journals (Shotton): peer review, open access, enriched content, available datasets, machine-readable metadata.
What about community?