Talk given at the National Science Foundation on the UK e-Science programme, the UK Software Sustainability Institute, and some of the challenges faced in ensuring long term development and maintenance of scientific software
2. Overview e-Science software in the UK A brief history OMII-UK Commissioned Software Programme ENGAGE Programme Software Sustainability Institute Approaches Software Preservation Challenges
3. UK e-Science Programme: Preparing the Ground “e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it” John Taylor, D-G RCUK e-Science Centres e-Science Pilot Projects
4. UK e-Science Budget (2001-2006) EPSRC Breakdown + Industrial Contributions £25M + £100M via JISC Total: £213M Staff costs - Grid Resources Computers & Network funded separately Source: Science Budget 2003/4 – 2005/6, DTI(OST) Slide from Steve Newhouse
5. OMII: Sowing the first seeds 11 initial projects funded by Managed Programme Many projects flourished But some wilted and decayed OMII set up to harvest and maintain software output of UK e-Science Core Programme
6. OMII-UK: Cultivating and Nurturing Emphasis on helping existing software grow Extra gardeners brought in (Edinburgh and Manchester) with their own plant stock Making the garden public through initiatives like Google Summer of Code and ENGAGE Inviting specialists through the PALs scheme Cultivate and sustain community software important to research
7. Software Sustainability Institute: pruning, staking, grafting Working with research software users and developers Helping review and refactor Providing support and skills Identifying areas of convergence Producing strong, capable software able to live long and be successfully built on
170. PCB Chile Tutorials to over 2000 researchers: Antwerp, Bangkok, Basel, Boston, Cambridge, Catania, CERN, Chicago, Edinburgh, Hanoi, Hawaii, Helsinki, Leeds, London, Manchester, Newcastle, Nijmegen, Nottingham, Oxford, San Francisco, Seattle, Seoul, Sheffield, Southampton, Tenerife, Tokyo, Toronto, ISSGC 03 to 09
171. Developing the role of standards in the community OMII-UK is instrumental in the development and use of standards Enabling interoperation over continental scale AHE across TeraGrid, DEISA, EGEE DataMINX across SRB, GridFTP Reference implementations SAGA enabling legacy applications WS-DAI for data access JSDL/BES/HPC BP for computational job submission
172. Impact on UK Research The top 75% of “Quality of Research” funding is allocated to 49 UK research institutions out of a total of 159 HEIs
173. Impact on UK Research OMII-UK has worked with all 7 of the top research intensive institutions in each region: Oxford, Cambridge, UCL, Imperial, Edinburgh, Cardiff, QUB
174. Commissioned Software Programme Commissioning Supporting Developing GridSAM Condor WS Geodise Lab AHE BPEL Designer Compute Grimoires Open Grid Manager Info / Registry MANGO Visual / Collab WSRF::Lite FINS/FIRMS Infra / Security WSeSSH £3.4m initial funding for Managed Programme 2006: Q1 – Initial projects commissioned; open call to community Deprecated:
175. Commissioned Software Programme Commissioning Supporting Developing GridSAM Geodise Lab AHE BPEL Designer Compute Grimoires Open Grid Manager Info / Registry KNOOGLE MANGO Visual / Collab WSRF::Lite FINS/FIRMS Infra / Security OMII-AuthZ 2006: Q3 – trials complete; new specific commissions Deprecated: Condor WS WSeSSH
177. Commissioned Software Programme Commissioning Supporting Developing GridSAM Geodise Lab GridBSBroker RAPID AHE BPEL Designer Compute OGRSH SAGA Grimoires Open Grid Manager Info / Registry KNOOGLE Visual / Collab RAVE NGS JSDL App Rep PAG WSRF::Lite Infra / Security OMII-AuthZ SCAMP NDG Security WHIP £1.4m additional funding for Commissioned Software Programme 2007: Q3 – Software integrated; new portal and simplified access calls Deprecated: MANGO Condor WS WSeSSH FINS/FIRMS
178. Commissioned Software Programme Commissioning Supporting Developing GridSAM GridBSBroker RAPID AHE BPEL Designer Compute OGRSH SAGA Grimoires Open Grid Manager Info / Registry Visual / Collab KNOOGLE RAVE NGS JSDL App Rep VIC + RAT PAG WSRF::Lite Infra / Security OMII-AuthZ SCAMP NDG Security WHIP 2008: Q1 – significant support for implementations of standards Deprecated: MANGO Condor WS WSeSSH FINS/FIRMS Geodise Lab
179. Commissioned Software Programme Commissioning Supporting Developing GridSAM GridBSBroker RAPID AHE BPEL Designer Compute OGRSH SAGA Grimoires Open Grid Manager Info / Registry Visual / Collab RAVE NGS JSDL App Rep VIC + RAT PAG WSRF::Lite Infra / Security OMII-AuthZ SCAMP NDG Security WHIP 2008: Q3 – start of investment in community development Deprecated: MANGO Condor WS WSeSSH FINS/FIRMS Geodise Lab KNOOGLE
180. Commissioned Software Programme Commissioning Supporting Developing GridSAM GridBSBroker RAPID AHE BPEL Designer Compute SAGA Grimoires Open Grid Manager Info / Registry Visual / Collab RAVE NGS JSDL App Rep VIC + RAT PAG WSRF::Lite Infra / Security OMII-AuthZ WHIP SCAMP NDG Security 2009: Q1 – many projects complete, in use by community Deprecated: MANGO Condor WS WSeSSH FINS/FIRMS Geodise Lab KNOOGLE OGRSH
181. Commissioned Software Programme Commissioning Supporting Developing GridSAM GridBSBroker RAPID AHE BPEL Designer Compute SAGA 8 projects with multiple international contributors through SF/CPAN/PyPI 75+ evaluations of 40+ components Grimoires Info / Registry Visual / Collab RAVE NGS JSDL App Rep VIC + RAT PAG WSRF::Lite Infra / Security WHIP SCAMP NDG Security Data DiGS CIAS DataMINX OSCAR 2009: Q3 – Data call commissioned; focus on community need Deprecated: Open Grid Manager MANGO Condor WS WSeSSH FINS/FIRMS Geodise Lab OMII-AuthZ KNOOGLE OGRSH
182. Case Study: Taverna Workbench Initially funded through e-Science myGrid project (2001-2005) Directly funded through OMII-UK (2006-2010) Plus marketing, outreach, legal and networking Platform funding (2009-2014) caBIG subcontract Eli Lilly development 40,000+ downloads of Taverna 1.x Take up in other domains, e.g. astronomy
183. Case Study: NERC Data Grid Security Provides single sign-on to federated data infrastructure NDGS software now installed at major NERC data centres in the UK Now used across multiple projects Filter based approach and OpenID work used by US Earth System Grid for access to CMIP5 archive METAFOR QUESTIONNAIRE COWS/NCEO Contributions back to Python community ndg_saml, ndg_xacml, MyProxyClient
184. Case Study: VIC + RAT Media backbone tools for audio and video maintained by UCL since early 90s Used as the basis for Access Grid, VRVS OMII-UK funding when other sources cut Allowed continued maintenance and bug fixes Enabled projects from Australia, Korea to contribute However difficulties in sustaining Rapid changes in hardware / software Too low profile Other projects not contributing back
185. Engaging Research with e-Infrastructure 53 direct interviews 200+ interviews total Interviews 30 month programme 14 projects 3-6 months duration £650,000 funding Wider deployment 17 papers 10 posters 50 presentations £1.36m further funding Projects Dissemination Adoption New requirements
186. First Phase ENGAGE Development Projects High Throughput Humanities for e-Research Exposing bioinformatic programs as Web Services Protein Molecule Simulation on the Grid Enable workflows in a Shared Genomics causality workbench Linking and Querying Ancient Texts SWARMCloud Rapid Chemistry Portals by Engaging Users
187. Second Phase ENGAGE Development Projects Monte Carlo Treatment Planning Crystal Energy Landscape Application Epigraphy and papyrology image processing Strengthening and support for eMinerals RMCS system Configuration parameters for the GENIE simulator Lab Blog Book Strengthening and supporting the text and data analysis toolkit OSCAR
188. ENGAGE Findings Significant findings include: the most challenging aspect of e-Science application development is the communication between development and research teams; there are differing time constraints on researchers and developers; having good facilitators improves the success of a project; centralisation of IT services means that it is harder to do exploratory development; adherence to standards can reduce the barriers for the deployment of technology; removing complexity can allow researchers to become developers; there are still issues when trying to migrate from local to national resources; issues which appear trivial to computer scientists can cause researchers to consider the software unusable.
189. ENGAGE Outputs Significant outputs of the development projects include: publicly available workflows in daily use by students to do analyses of 15,000 protein sequences; a protein molecule simulation portal available to any user with a valid UK e-Science certificate; a live portal being used to teach over 140 students how to optimise molecule structures; new data exploration techniques being enabled; a number of follow-on projects funded to take the work pioneered in ENGAGE to a larger or different community; many improvements to commonly used software being released back to the community.
190. Case Study: Crystal Energy Landscapes Understanding polymorphism in drugs E.g. Dosage profile Chemists Computational Experimental Developers Domain S/W Engineers Integrators Research Computing Services Facilitator http://www.youtube.com/watch?v=bkbRwOWmiwo
191. The Software Sustainability Institute A national facility for research software Providing services for research software users and developers Developing research community interactions and capacity Promoting research software best practice and capability Sustaining software by helping to negotiate the stages of the software maturity cycle
193. Consultative advice (software evaluation, development process, community engagement, dissemination, workshops + surgeries)
199. How are you going to choose the right approach? Preservation (techno-centric) Emulation (data-centric) Migration (functionality-centric) Transition (process-centric) Hibernation (knowledge-centric) Approach SSI effort focused here
200. Current SSI Guides Software development Software development: general best practice Developing maintainable software Testing your software Repositories Choosing a repository for your software project Migrating project resources: what to remember Creating and managing SourceForge projects Retrieving project resources from NeSCForge Open source Adopting an open-source licence Supporting open-source software Community building Recruiting champions for your project Recruiting student developers
201. SSI Evaluation Criteria Importance: the alignment of the research domain to the UK’s strategic research roadmap. Enthusiasm: the impact which the work will have on the community, the engagement of software authors with the process and the likely additional contribution that would be gained from the community. Value: the impact on the research outputs. Would the science enabled be significantly improved by the work? This is a measure of the user demand for improvement. Availability: the likelihood that the work would enable the software to reach a new stage of availability, e.g. taking it from within one collaboration to make it fit for the whole research community or a new community. Tractability: the impact on the software. Will it be possible to easily improve the quality or performance of the software? Opportunity: will the work lead to new opportunities for sustainability, e.g. collaboration with other groups, commercialisation, alternative funding or new effort?
210. Exploiting software for sustainability Models Grant Mosaic Institutional support Fully Costed Service External Enterprise / Consultancy Royalties and Fees Donations Advertising T-shirt (spinoff merchandising) Vehicles University based Spin out company Consultancy and Customisation Industrial knowledge transfer Contracts Licensing Certification Support services / training Software as a Service Software Foundation Most common but what happens when PI retires?
211. Sustainability in Context Support / Contributions Software Sustainability Community Engagement Software Engineering Product Management Market Development Funding/ Effort
212. Software sustainability is part of the process Comparable to risk management No one right “solution” but many examples of best practice and process Plan from before the start if possible But must be reviewed regularly No longer considering timescales bounded by a project, but considering the product
213. Software development comes in stages Bridging criteria: strength of team; strength of market; proximity of software to market Idea Prototype Research Idea Prototype Idea Idea Prototype Research Supported Product Idea Prototype Research Supported An idea to solve a problem Understand the functionality Scaling to work for others Allow others to participate
219. Case Study: CASTEP Building intellectual access ramps to support incremental engagement – building capacity and capability Individual Group Consortium W/ industry Community Active
220. Case Study: R-Project Basics: Website, mailing list, code repository, issue resolution Remove barriers to participation, increase efficiency 1993: First public release; 2 devs 1995: Code open sourced; 3 devs 1996: r-testers list set up 1997: lists split: r-announce, r-help, r-devel; public CVS; 11 devs 2000: CRAN split and mirror 2001: BioConductor 2003: Namespaces 2005: i18n, l10n 2007: R-Forge Today: BioConductor (33 core devs), R-Forge (532 projects, 1562 devs), CRAN (1400+ packages) http://cran.r-project.org/doc/html/interface98-paper/paper_2.html
221. The Software Maturity Curve Portals Quantum chemistry Cloud Computing RDBMS Social Simulation Workflows Spatio-Temporal viz Molecular Dynamics Geospatial viz Digitised Doc Analysis Digital repositories Software proliferation Innovation Consolidation Customisation Time
222. Enabling Innovation Supporting emergent disciplines Needs recognition of innovative software development as part of funding Breaking down barriers We cannot assume that the way people interact with resources will conform to expectations e.g. researchers will use/store files outside of universities Researchers will do whatever they can to get an edge – they will not always conform
223. Supporting Consolidation “e-Science is an organic, emergent process requiring ongoing, coordinated investment from multiple funders and coordinated action by multiple research and infrastructure communities. It is both an enabler of research and an object of research” – RCUK Review of e-Science Bridging the expectation gaps between participants Maintenance vs. research Different timescales for “exciting” work Well supported open platforms are the key in the age of the research mashup Platforms to enable bottom-up innovation Platforms to enable citizen participation Competition/innovation built on top c.f. industry
224. Sustaining Customisation “The time constants for real transformative impact and significant competitive advantage is decades” – RCUK Review of e-Science Sustain software infrastructure in the long term Differing models: through centres; within institutions; distributed Need to change perceptions so that software is seen as valuable! (and not just invaluable) Lower barriers to community growth and participation Increase value of providing services Virtually merge + map small amounts of effort / funding
225. Invest in people People are the most important investment Adaptability, ability to recognise transferable skills, not strict career paths Software developers come from many backgrounds If e-Science is multi-disciplinary, multi-institution, multi-scale then make it easier to recognise people's efforts as they move University structures do not make it easy These people are key to effective e-Science as they bridge the gap between other participants
226. The credit question How do we get credit for reusing, extending and sustaining software? Research credit is based on publication output Data citations and credit for reuse are still not commonplace Software credit is the next stage Otherwise how can we persuade people to contribute back?
Managed Programme gave money to address gaps. Many projects flourished (such as GridSAM, the Application Hosting Environment from RealityGrid and BPEL Designer), but some wilted and faded away.
8 projects with multiple international contributors through SF/CPAN/PyPI
With the SSI we have reached a new stage where we are working to support all the current gardeners who are already out there. So, how are we going to do this?
Quality of Research funding
The reason we are able to have such an impact is the approaches we have developed in working with the community. Leads to: CSP – how we got better; ENGAGE – how we encourage investment.
Interviews, from ENGAGE and from eUptake/eIUS. Distilled into development projects. Guided by database of findings: barriers and enablers. Pushed out through NGS roadshows, websites, newsletters, workshops. Identifying the new requirements.
Monte Carlo Treatment Planning (MCTP): Groups of users at Velindre Hospital and collaborating centres will be able to use the NGS-based computationally intensive radiotherapy planning software through the RTGrid portal on a routine basis, both within and outside an NHS firewall. The documentation and software will be of a sufficiently high quality to allow the RTGrid software to be established at institutions without any help from specialists in the RTGrid project. Data protection and security issues will also be addressed.
Crystal Energy Landscape Application: The application uses a good part of the OMII stack, in particular WS-I, GridSAM, OMII-BPEL and Grimoires. This servlet then invokes the BPEL engine that orchestrates the workflow required to perform the search, and at the end of the search the results are visualised on a web page. The scientists also use this web page to check progress of the calculation, as it is updated as the results come in. Replace DMAREL with DMACRYS, which is capable of dealing with much larger molecules and crystal structures; expand the BPEL workflow to perform post-processing of the results; port the deployment to run on both Legion and a Condor pool for testing, and design it to then also run on the NGS so that polymorph calculations can be performed by a wider range of users.
Epigraphy and papyrology image processing (VRE-SDM): Applications developed within eSAD will be encapsulated such that they are easily transported to a distributed computing environment such as the NGS. The presentation to the user will be through a custom development of the NGS applications repository portlet, such that complications such as remote resource and application version selection are handled automatically. This JSR-168 compliant portlet will also then seamlessly fit into the portal environment developed within the VRE-SDM project. By basing this development on the NGS application repository we will be able to take advantage of already existing web-service endpoints that are able to connect into the computational resources of the NGS using the OMII-UK developed GridSAM software as currently deployed at partner resources of the NGS.
Strengthening and support for eMinerals RMCS system: Enabling RMCS to work on the hardware provided by partner and affiliate sites in addition to that of the core sites; supporting one change to the software, from the AgentX XML tool (no longer under active development since the loss of core STFC staff earlier in 2008) to the use of XPath (we have carried out some preliminary work on this); enhancing support for MS Windows users, including reactivation of a Java GUI (support lost since the STFC financial crisis) and user-friendly packaging of the client tools; revision and field-testing of the documentation; support for working with campus grids using Condor (there are some oddities with the Globus–Condor interface that need examination); support for the NGS training teams; creation of some use cases with groups of new users, focussing on the DL_POLY and CASTEP modelling codes and the SHETRAN hydrology codes. Specific groups will be easy to select from within the materials modelling community if this proposal is approved; the SHETRAN community is based in Newcastle.
Configuration parameters for the GENIE simulator: The aim of this project is to provide a fully functional prototype of a 'launchpad' application which will facilitate set-up and launch of GENIE model runs, and to facilitate its use in a GENIE training workshop for PhD students and more senior researchers and in Masters-level teaching units at the University of East Anglia and Bristol in the Spring Semester 2009. After evaluation in these environments, an improved version will be added to the trunk of the GENIE subversion repository and a tagged release will be made to allow the use of the launchpad by anybody using the latest stable release of the model.
Integrating field work with the e-Lab Notebook with centralized services and archives: This scenario offers integration with grid computing and the associated storage, retrieval and integration of instrument-recorded data. Use of the blog framework makes it easier to store more fully annotated data. The results of other services, for example NGS calculations, can be returned to the blog in an annotated and context-rich format. The investigative computations, “a soft pipelines approach”, can be tried and tested incrementally and recorded for discussion, before formally committing to pipelines and other more rigid workflows. This benefits the wider research community by providing improved context for the data and, significantly, for the processes, as these are recorded automatically and are therefore more easily searchable.
Strengthening and supporting the text and data analysis toolkit OSCAR: The ease with which developer-users could work with OSCAR, and with which developers could build end-user tools, would be massively increased by refactoring all of OSCAR to the same object-oriented style API, with good configuration support and developer and user documentation. Implementing unit testing across the library will make it easier to maintain in the future. The developer-user utility would also be enhanced by building a component that enables OSCAR to work in the UIMA architecture, and therefore with the various tools provided by NaCTeM. NaCTeM have indicated strong interest in seeing OSCAR integrated with UIMA.
Drawing on pool of specialists to drive the continued improvement and impact of research software developed by and for researchers
There is a spectrum of approaches. Examples:
Based on CSP evaluation and Engage triage
JournalTOCS largest collection of TOCs from major publication
Update slide for surveymapper?
How does software sustainability fit within the context of software engineering, community engagement, project management and funding? What are the external factors, like changes in effort, timelines and deadlines, licensing, and step changes in product development?
No one sets out to make a bad piece of software
Frequency Hopping Spread Spectrum (Hedy Lamarr) originally using a piano roll, Nikola Tesla for controlling boats
Tools – Signal Data Explorer (SDE). We developed SDE, which is now being used in CARMEN (neuroscience tools and data sharing), in BROADEN and in Rolls-Royce. We exploited SDE through Cybula Ltd. Being used on trains. Started to sell out-of-the-box system.
CASTEP: keeping up with the community
Allowing people to move makes it easier to bridge gaps as you have a chance of creating common communication structures
Become our next collaborator – email info@software.ac.uk