Ever spotted some great-looking software only to discover that you can’t get it, it doesn’t work, there is no documentation to help fix it, and the developers don’t have the time or incentive to help? Ever produced some software that you want to be widely used, or that you want folks to contribute to? How sustainable is that key platform, library, tool or database your lab uses day in and day out? Are you helping the providers? The same issues stand for data (or as we now say, “FAIR” data: Findable, Accessible, Interoperable, Reusable) and its metadata. Is anyone looking out for Europe’s data services: the datasets and analysis systems you use and make, the standards they use, and the curators and developers who make them? Or is FAIR just a FAIRy story? I’ll describe how two organisations with quite different structures and approaches, the UK’s Software Sustainability Institute and the ELIXIR European Research Infrastructure for Life Science Data, are working towards the common goal of better software, better service, and better research.
https://www.rothamsted.ac.uk/events/14th-international-symposium-integrative-bioinformatics
Better software, better service, better research: The Software Sustainability Institute, ELIXIR and you
1. Better software, better service,
better research
The Software Sustainability Institute, ELIXIR and you
Professor Carole Goble
Head of Node ELIXIR UK
Software Sustainability Institute UK
The University of Manchester, UK
carole.goble@manchester.ac.uk
Keynote: 14th Intl Symposium on Integrative Bioinformatics, IB2018
Rothamsted Research, Harpenden, UK, 13-15 June 2018
2. My team has produced lots of software and
services used by others for a long time…
… including data and
metadata management
and sharing systems …
3. Shared and sharable data and software
key to reproducibility & productivity
• Improve transparency, understanding,
trust
• Eliminate errors
• Encourage collaboration
• Ease on-boarding
“Scholarship is the full
software environment, code
and data, that produced the
result” - Claerbout
4. Hey, some great-looking software!
you can’t get it
it doesn’t work for me
no documentation
developers don’t have resources to help
or have gone
Hey, I have some great software!
how do I get it to be widely used?
have folks contribute
make it sustainable
get folk who use it to contribute to it
5. Hey, how can I get hold of and use data?
A great deal of talk of FAIR at Integrative Bioinformatics 2018…
Hey, how do I make my (meta)data FAIR?
Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
6. Are they FAIR so I can upload
to them and use them?
Are they sustained?
7. FAIR ≠ FREE
No software or data is free. It’s all sponsored.
8. “Better Software,
Better Research”
Software Sustainability
Institute
UK national facility
cultivating better, more
sustainable, research software to
enable world-class research
Est 2010 By UK funders
Better Software, Better Research
IEEE Internet Computing (2014)
doi.ieeecomputersociety.org/10.1109/MIC.2014.88
“FAIR Data for Life”
ELIXIR
European Research
Infrastructure
operating a sustainable European
infrastructure for biological information,
supporting life-science research and its
translation to society, the bio-industries,
environment and medicine.
Est 2013 by
inter-govt
agreement
The FAIR guiding principles for Scientific data
management and stewardship Scientific Data 3,
160018 (2016) doi:10.1038/sdata.2016.18
http://elixir-europe.org
9. 4 organisations
Edinburgh, Manchester, Southampton,
Oxford
21 National Nodes* + Hub
*Counting EMBL-EBI as a Nation
Seeded an international movement
>180 organisations
“Act Local
Think Global”
“Act Global,
Think Global….”
11. The research community and research
depends on software
Do you use research
software?
What would happen to your
research without software?
Survey of researchers from 15 UK Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, disciplines and seniority.
12. Software Sustainability Institute
www.software.ac.uk
The research community produces software
91%: scientific software is important for their own research
84%: developing scientific software is important for their own research
53%: claimed to spend more time developing scientific software than they did 10 years ago
38%: spend at least one fifth of their time developing software
2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc.
ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8.
13. The cost of UK research that relies on software
£840m: investment in the 2013-2014 financial year, an amount that has risen by 3% on average over the last four years
30%: of total research investment has been spent on research which relies on software over the last four financial years
Analysis of data from 49,650 grant titles and abstracts published on Gateway to Research covering 2010-2014.
16. Culture change is hard
In 2011 Science changed its editorial policies:
“We require that all computer code used for modeling and/or data analysis that is not commercially available be deposited in a publicly accessible repository upon publication.”
“After publication, all reasonable requests for data, code, or materials must be fulfilled.”
Stodden, Seiler, Ma. An empirical analysis of journal policy effectiveness for computational reproducibility
https://doi.org/10.1073/pnas.1708290115
17. Culture change is hard
Stodden, Seiler, Ma. An empirical analysis of journal policy effectiveness for computational
reproducibility, PNAS March 13, 2018. 115 (11) 2584-2589;
https://doi.org/10.1073/pnas.1708290115
19. Software Ecosystem: Patchworks and Spectrums
Not all software is equal and worth sustaining. It’s all worth being good.
Nangia and Katz: https://arxiv.org/pdf/1706.06527.pdf
Invisible ↔ Visible
Domain generic ↔ Domain specific
Teams ↔ Individuals
Tools, Services, Workflows, Scripts, Libraries, Frameworks, Platforms
20. Software Ecosystem: Patchworks and Spectrums
Not all software is equal and worth sustaining. It’s all worth being good.
Nangia and Katz: https://arxiv.org/pdf/1706.06527.pdf
Intentional ↔ Side-effect
Full-fledged for reuse ↔ Throwaway
Code ↔ Algorithm
21. The UK research community making software
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
140,000 UK researchers rely on their own coding skills
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014. 406 respondents covering a representative range of funders, disciplines and seniority.
22. Software making practices
“As a general rule,
researchers do not test
or document their
programs rigorously, and
they rarely release their
codes, making it almost
impossible to reproduce
and verify published
results generated by
scientific software”
Zeeya Merali, Computational science: ...Error… why scientific programming does not compute. Nature 467, 775-777 (2010), doi:10.1038/467775a
2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software
Eng. for Computational Science and Eng., 2009, pp. 1–8.
24. Llorente et al. Science, 350, 6262
doi:10.1126/science.aad2879
The results presented in the Report “Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent” were affected by a bioinformatics pipeline error that wrongly discarded data.
25. The Software Sustainability Institute
Software Consultancy: helping the community to develop software that meets the needs of reliable, reproducible, and reusable research
Training: delivering essential software skills to researchers via CDTs, institutions & doctoral schools
Community: bringing together the right people to understand and address topical issues
Policy: collecting evidence on the community’s software issues & policymaking with stakeholders
Outreach: exploiting our platform to enable engagement, delivery & uptake
26. The Software Sustainability Institute
Software Consultancy, Training, Community, Policy, Outreach
140+ UK Carpentry workshops; 4500+ learners; 10 delivery partners
50+ projects; 130+ evaluations; 4 surgeries
90+ guides; 50,000 readers
Network of 112 Fellows across 70 orgs; 20+ workshops organised
740 researchers surveyed; 50,000 grants analysed
Web site & blogs: 150+ contributed articles; 20,000 unique visitors per month; 3,000 Twitter followers
300+ RSEs engaged
Advice to UK, USA and EU govt stakeholders
28. Fellowship Programme
The Fellowship programme funds researchers in exchange for their expertise and advice.
• travel to conferences, set up and run workshops, organise software sustainability sessions at domain conferences, host, organise or teach training events
Annual Collaborations Workshop
SSI Fellows
29. Workshops
https://software.ac.uk/workshops
Specialist, International, Annual Collaborations Workshop
• Software Deposit and Preservation Workshop, 11 July 2018, with Jisc
• Developing Software Licensing Guidance for the BBSRC, 24 April 2017, with ELIXIR-UK and the BBSRC
• Docker Containers for Reproducible Research, 27-28 June 2017
• 9th International Workshop on Sustainable Software for Science: Practice and Experiences, 29 Oct 2018, Amsterdam, NL
• 3rd Research Software Engineers Conference, 3-4 Sept 2018, Birmingham, UK
• NSF Workshop Data and Software Citation, 6-7 June 2016, Boston, USA
• Software Credit Workshop, 19 Oct 2015, Natural History Museum, London
30. Policy making
Campaigning for Software Recognition by researchers,
publishers, journals, funders, institutions, societies …
When and how
should I cite?
How do I deal with
components and
teams?
Be a better reviewer
Wynholds et al. (2012) Data, data use, and scientific inquiry: two case studies of data practices. doi:10.1145/2232817.2232822
34. Get help
Biomolecular systems and protein modelling codes
BoneJ: suite of open-source plug-ins for bone shape analysis based on ImageJ
Community assessment and building
Improved testing framework
Packaging and installation
Improved coding standards
Improved web site
Community web portal: ionomic data on over 300,000 plant and yeast samples
Rehosted service
Migration of portal from Purdue to Nottingham
Technical analysis of the service + a migration process
Changes to ensure the long-term sustainability
User assessment
Re-architect and scale
One-man, small-scale software project into multi-developer programme
Chris Wood, David Salt, Michael Doube
36. Get a plan and publish…
develop: developed and versioned using code repository
share: published via code repository or website
preserve: deposited in digital repository with paper / for preservation
SOFTWARE HERIT
38. Writing for strangers
Goldilocks principle
• Readers of papers
• Reviewers
• Future collaborators
• Potential users
• Potential contributors
• Future members of
your research group
• Current students
• Co-authors
• You in 6 months’ time
“stranger - anyone who doesn’t possess our
current short-term memory and experiences”
– David Donoho
40. All software is “legacy code”
Maintenance = Evolution
prepare to repair
if it’s used it will evolve
Corrective
Preventative
Adaptive
Perfective
Keeping the Show
on the Road
Dealing with
change
41. Writing for Strangers
good enough practices
provenance
portability
access documentation
adopt a licence
make it discoverable
source code accessible
citation metadata
validation docs
test data
example data
version control, automated build and test,
code reviews by mates, modularise, use standards
clear and transparent contribution, governance and communication processes
packaging, containers
dependencies
ids steps
44. ELIXIR
Services & Activities, Training, Communities, Policy
Data, Tools, Compute, Interoperability
Engage
European, International, National, Industry
domains, technologies, techniques
45. European Level
EOSC Summit 11 June
Open for comments until 5th August
https://github.com/FAIR-Data-EG/action-plan
http://bit.ly/interim_FAIR_report
https://ec.europa.eu/info/events/2nd-eosc-summit-2018-jun-11_en
49. Standards
general ↔ specific
FAIRification Processes
1. Define driving user question(s)
2. Pre-FAIRification analysis
3. Define semantic model
4. Transform data records
5. Define metadata
6. Deploy FAIR data point
7. Query interface
Data Stewards (cf RSE)
50. Data Validation
Open validation services for archetype archival databases and knowledge
bases: public APIs, min information checklists, file formats, phenotyping data.
ELIXIR-BE, EBI, UK, FR
[Frederik Coppens]
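A minimum information checklist can be read as a list of fields a record must carry before it is accepted. The sketch below is only an illustration of that idea in Python: the field names and the `validate` helper are hypothetical and are not taken from any ELIXIR validation service or published checklist.

```python
# Hypothetical checklist: field name -> expected Python type.
# These fields are invented for illustration, not a real standard.
REQUIRED_FIELDS = {
    "sample_id": str,
    "organism": str,
    "collection_date": str,
}


def validate(record, checklist=REQUIRED_FIELDS):
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for field, expected_type in checklist.items():
        if field not in record or record[field] in ("", None):
            problems.append(f"missing: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type: {field}")
    return problems


good = {"sample_id": "S1", "organism": "Arabidopsis thaliana", "collection_date": "2018-06-13"}
incomplete = {"sample_id": "S2", "organism": ""}

print(validate(good))
print(validate(incomplete))
```

Returning a list of problems rather than raising on the first one lets a submitter fix everything in one pass, which is what an open validation service would typically want.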
51. Bioschemas.org
Universal Lightweight Web Mark-up
to Find, Cite, Index, Summarise
without API tears
DataCatalog
Dataset
Event
Lab Protocol
Tool
Training Material
Protein
ProteinAnnotation
ProteinStructure
Sample
Beacon
Machine-processable metadata for better software, better search
< / >
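As a rough illustration of lightweight web mark-up, the Python sketch below builds a schema.org `Dataset` description as JSON-LD and wraps it in the script tag a web page would embed. It uses only generic schema.org properties; it has not been checked against any Bioschemas profile, and the dataset name, URL and helper functions are hypothetical.

```python
import json


def dataset_jsonld(name, description, url, identifier, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": list(keywords),
    }


def as_script_tag(doc):
    """Wrap a JSON-LD document in the script element used to embed it in HTML."""
    payload = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical dataset, used only to show the shape of the mark-up.
example = dataset_jsonld(
    name="Example marine metagenomics samples",
    description="Hypothetical dataset used only to illustrate the mark-up.",
    url="https://example.org/datasets/42",
    identifier="https://example.org/datasets/42",
    keywords=["metagenomics", "marine"],
)

print(as_script_tag(example))
```

Because the description travels with the page itself, a crawler can find and index the dataset without calling a dedicated API.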
52. Bioschemas.org
Towards Knowledge
Graphs for Biology
MarRef: Marine Metagenomics Database
BioSamples: Deposition Database
Aside: Google alpha test dataset-search feature (under NDA) invitation….
53. Describe workflows to be portable,
scalable & interoperable with different
workflow systems and containerised tools
54. ELIXIR Tools, Workflows & Containers
BioTools Registry
Packaging
Containers
Integration
Workflows
Benchmarking
Info Standards
Software Best Practice
Communities
Interoperability
Training
Compute
Data
EDAM
55. Five steps to better data better research –
metadata at source
• Get expert help
• Train your Team
• Publish your Data
• Develop a Data Management Plan
• Annotate for strangers
56. Five steps to better data better research –
metadata at source
Annotate for strangers
• create analysis-friendly data
• use a unique identifier for each record
• record your processing steps
• use standards
• try to use platforms and tools that work together & help
• save and back up raw data
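Three of the points above (a unique identifier per record, recorded processing steps, and preserved raw data) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `ProcessingLog` class, the helper names and the toy CSV are hypothetical, and a real lab would more likely rely on a data management platform or workflow system.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def checksum(raw_bytes):
    """SHA-256 of the raw data, so later users can confirm it is unchanged."""
    return hashlib.sha256(raw_bytes).hexdigest()


def new_record(values):
    """Attach a unique identifier to a record at the moment it is created."""
    return {"id": str(uuid.uuid4()), "values": dict(values)}


class ProcessingLog:
    """Hypothetical helper that records each processing step alongside the raw-data checksum."""

    def __init__(self, raw_bytes):
        self.raw_sha256 = checksum(raw_bytes)
        self.steps = []

    def record(self, name, **params):
        self.steps.append({
            "step": name,
            "params": params,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def to_json(self):
        return json.dumps({"raw_sha256": self.raw_sha256, "steps": self.steps}, indent=2)


# Toy raw data; in practice this would be read from a file that is kept and backed up untouched.
raw = b"sample,reading\ns1,0.42\ns2,0.57\n"
log = ProcessingLog(raw)

rows = [line.split(",") for line in raw.decode().splitlines()[1:]]
records = [new_record({"sample": s, "reading": float(r)}) for s, r in rows]
log.record("parse_csv", delimiter=",")

kept = [r for r in records if r["values"]["reading"] > 0.5]
log.record("filter", field="reading", greater_than=0.5)

print(log.to_json())
```

The log is written as plain JSON so that a stranger can read it without the original software.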
57. Tragedy of the Commons
metadata & identifier quality
https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/
https://metadatacenter.org
“Creating good metadata takes considerable work …. When investigators act in their own self-interest, taking short cuts to generate metadata as quickly as possible, we should expect that the overall utility of the resource will decline. … a need for easy-to-use solutions that are generic to provide guidance over the entire life cycle of metadata, streamlining metadata creation, discovery, and access, as well as supporting metadata publication to third-party repositories”
Mark Musen
58. The Nodes: The last (or is it first?) mile
Bench: HEIs, Institutes, Industry
Nodes: “Act Local Think Global”
Benefit: HEIs, Institutes, Industry, Policy Makers, Public
The ‘last mile’ challenge for European research e-infrastructures
https://doi.org/10.3897/rio.2.e9933
59. The Nodes: The last (or is it first?) mile
Bench Benefit
“Act Local Think Global”
Nodes
The ‘last mile’ challenge for European research e-infrastructures
https://doi.org/10.3897/rio.2.e9933
60. FAIRDOM
Project Commons, Stewardship and the Last Mile
https://fairdomhub.org
SOPs
https://nels.bioinfo.no
https://bio.tools/nels
NORWAY
models
Data
DATA
fair-dom.org
Data
Stewards
61. Researcher / Developer
Institution / Lab
National / International
Knowledge Exchange Report: http://www.knowledge-exchange.info/event/ke-approach-open-scholarship
The ‘last mile’ challenge for European research e-infrastructures https://doi.org/10.3897/rio.2.e9933
Act
Think
62. Overcoming the Tragedy of the Commons at all scales … TOGETHER….
Help
Skills
Community
Plans and Policies
Work for strangers
Value Systems
Sweatshops
Credit
Infrastructure funding models
FAIR ≠ FREE
First Mile Ramps
Professionalise
Beat Cultural Inertia
Pay
RSEs
Data stewards
Skill at Source
Services and Practices embedding
65. Funder
Acknowledgements
European Union Horizon 2020 program under
grant agreement 676559
Implementation Studies
CWL and Bioschemas
European Union Horizon 2020 program
under grant agreement 675728.
European Union Horizon 2020 program
under grant agreement 654248.
European Union Horizon 2020 program
under grant agreement 739563.