This webinar discusses lessons learned from developing West-Life, a virtual research environment for integrative structural biology. West-Life brought together software tools from different disciplines to facilitate complex multi-technique research projects. Key lessons included understanding researchers' needs, promoting interoperability between existing tools, and involving the user community in strategic decisions through a "middle-out" approach. The webinar also highlights the maturity of structural biology users and the importance of semantic technologies and community development.
1. West-Life: Lessons learnt from developing a virtual research environment
Presenters: Chris Morris (Daresbury Laboratory)
Host: Natalie Haley (INSTRUCT)
31/10/2018
CORBEL Webinar Series
4. BACKGROUND
Since 2015, thirteen ESFRI Research Infrastructures from the field
of BioMedical Science (BMS RI) joined their scientific capabilities
and services to transform the understanding of biological
mechanisms and accelerate its translation into medical care.
• biobanking & biomolecular resources
• curated databases
• marine model organisms
• systems biology
• translational research
• functional genomics
• screening & medicinal chemistry
• microorganisms
• clinical trials
• structural biology
• biological/medical imaging
• plant phenotyping
• highly pathogenic microorganisms
5. CORBEL MISSION
Modern biological and biomedical research involves complex
projects and a variety of different technologies.
Some of the most important discoveries are made at the
interface between different disciplines.
CORBEL will harmonise access and services for complex
research projects involving more than one RI that offer:
– biological and medical technologies
– biological samples and
– data services
6. TODAY’S PRESENTER
Chris Morris is deputy group leader for the Computational Biology Group at Daresbury Laboratory. He works on software project management and data analysis, including work for the Scientific Machine Learning group. He has been a software developer for twenty-five years. He says: “Eventually I realised that the coding is not the hardest part of the job.”
7. West-Life: the partners
• 10 Partners:
o STFC (UK) (lead partner, Martyn Winn Coordinator)
o Netherlands Cancer Institute (NKI) (NL)
o EMBL (DE)
o Masaryk University (MU) (CZ)
o Consejo Superior De Investigaciones Cientificas (CSIC) (ES)
o Consorzio Interuniversitario Risonanze Magnetiche Di Metallo Proteine (CIRMMP) (IT)
o INSTRUCT (UK)
o Utrecht University (NL)
o Luna (FR) – (SME)
o INFN (IT)
Budget: €4 000 000
Duration: 36 months
Started: 1 Nov 2015
Proposal ID 675858
9. Integrative structural biology…
• Larger macromolecular machines
• Time dimension
• Hybrid methods
• Resolution revolution in cryo Electron Microscopy
• Delivery of results to other disciplines
10. Survey at Instruct Biennial 2014
• 73% working on eukaryotic rather than prokaryotic systems
• 84% working on complexes rather than single gene products
• Each research team routinely uses three or four different techniques
• 84% agreed “I would use combined techniques if the software was easier to get and use.”
• 61% disagree that “It is easy to combine software tools for different techniques in integrated workflows.”
11. … new IT challenges
• Algorithms for hybrid methods
– Link the silos
• Validation methods
• Data management
• Metadata management
– Provenance, mmCIF
– Web services not installables
• Single Sign On
• 21st Century software development
18. How it happened
• Workshop in 2012: “Integrated Software for Integrative Structural Biology”
– http://journals.iucr.org/d/issues/2013/05/00/index.html
• Survey at Instruct Biennial 2014
• Application to EINFRA-9-2015
“users need to become much more directly involved in strategy,
coordination and innovation in each of the e-Infrastructure
components. This implies that users also need to be empowered to
drive the direction of e-Infrastructure service. To this end, the
funding for service delivery should be channelled through the users,
rather than directly to the service delivery organisations.”
e-IRG White Paper 2013
20. Lessons
• Understanding the context of use
• … and the context of development
– decades of prior work
– W-L is small part of community
• Interoperability
– No single data model
– No single workflow engine
• Approach may be different for greenfield
development
21. Crowdsourcing from the middle tier
• Community includes:
– Life scientists who use computers (10^6)
– End user programmers (10^4)
– Algorithm developers (10^2)
• Widgets / BioJS / Web Components
– Compose existing services
– Flask: web development in Python
• Semantic web underexploited
22. Questions?
West-Life: Lessons learnt from developing a virtual
research environment
Chris Morris (Daresbury Laboratory)
25. Misuse cases
• Deliberate injection of false data
• Plagiarism
– Industrial project in progress has value
€1e7
• Identity theft
• As research becomes e-enabled …
• … so does research misconduct
Editor's Notes
Today’s speaker is the project manager for West-Life, a Horizon 2020 project which has developed a Virtual Research Environment for structural biology.
Here are two recent structures which were processed using West-Life services.
On the left is pseudopodium-enriched atypical kinase 1. This structure shows similarities with the Parkinson's disease-associated kinase PINK1.
On the right is the N-terminal domain of Syncrip, bound to an extended RNA Recognition Motif. As a result of this structure, the N-terminal domain was identified as a sequence-specific RNA binding domain.
West-Life is an e-infrastructure project, working to serve the domain of structural biology. It is driven by some developments in the research questions chosen by structural biologists. They are now investigating the macromolecular machinery of the cell rather than individual proteins. The graphs divide the structures deposited in the PDB each year into categories, showing a trend to undertake more challenging research projects over time. In order to do this, structural biologists need to use not just one experimental technique, but several. A survey at the Instruct conference in 2014 confirmed this picture.
For those of us interested in metadata, it’s worth adding that 26% agreed that “Last year I discarded some samples or files because their provenance was not recorded well enough” and 24% agreed that “Last year I repeated some work because I could not find the sample or file produced.”
To support these methods, we have enhanced existing web services for structural biology. In particular, EMBL extended ARP/wARP to build models in EM maps, and Utrecht University extended its docking predictor, HADDOCK, to accept cryo-EM density maps as restraints. CIRMMP extended the MetalPDB database and services.
We also participated in the development of new services. Utrecht University (UU) developed SpotOn for the identification of hot-spot residues in protein complexes, and PRODIGY for the prediction of binding affinities for protein-protein complexes and, as PRODIGY-LIG, for protein-small ligand complexes. CSIC developed 3DBionotes for annotating structures with biochemical and biomedical information, and Dipcheck for validating protein backbone geometry.
We also joined up existing services into pipelines. After using HADDOCK or PDB-REDO, the researcher will find a link to see the results in the visualization tool 3DBIONOTES. Someone who uses the CCP4Online web services is then invited to submit the results to ARP/wARP for refinement, and then to the PDB-REDO service provided by partner NKI.
With smaller files, these pipelines simply submit the output from the first service to the second. With larger files, we provided an interface to e-Infrastructure data services. In particular partner INFN made a link to Onedata.
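The hand-off described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not West-Life code: the 10 MB cut-off, the names `Handoff` and `hand_off`, and the `stage` callable (standing in for an upload to a data service such as Onedata) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical cut-off; the notes do not say where "smaller" ends.
INLINE_LIMIT_BYTES = 10 * 1024 * 1024


@dataclass
class Handoff:
    """What the first service passes on to the second."""
    inline: Optional[bytes] = None    # the output itself, for small files
    reference: Optional[str] = None   # a link into shared storage, for large files


def hand_off(output: bytes, stage: Callable[[bytes], str],
             limit: int = INLINE_LIMIT_BYTES) -> Handoff:
    """Pass small outputs directly; stage large ones and pass a reference.

    `stage` is a placeholder for whatever uploads the data to a shared
    e-Infrastructure store and returns a link to it.
    """
    if len(output) <= limit:
        return Handoff(inline=output)
    return Handoff(reference=stage(output))


if __name__ == "__main__":
    store = {}

    def fake_stage(data: bytes) -> str:
        # Stand-in for a real data service: keep the bytes in a dict.
        key = f"store://{len(store)}"
        store[key] = data
        return key

    small = hand_off(b"ATOM ...", fake_stage, limit=16)
    large = hand_off(b"x" * 64, fake_stage, limit=16)
    print(small.inline is not None, large.reference)
```

Keeping the choice inside one small function means neither service has to know how the other stores its data, which fits the loosely coupled style of these pipelines.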
Some of our services need no login. To make the pipelines useable, the services that do require login all accept the same identity, via a new identity proxy developed by partner Masaryk University.
The weakness of current European Single Sign On efforts is the lack of singularity. We chose technologies compatible with those used by Elixir, so a merger will be possible in future.
Some of our services are easily provided with local compute resources. Others rely on EGI services, and their future relies on the development of the European Open Science Cloud. Partner CSIC has already run an EOSC pilot project about publishing workflows.
So this is what we made.
A CECAM workshop in 2012 explored the scientific needs for enhanced IT infrastructure for structural biology, to support Instruct’s mission to promote integrative methods. The proceedings were published in Acta Cryst D.
At the time, there was no appropriate call to fund this work. But one opened up two years later. We were in the unusual situation that the scientific case for the project was already published, so we were able to submit a successful grant application.
No software development can be successful without understanding the context of use. Most West-Life developers are embedded in structural biology groups. The others, from solution providers like STFC and INFN, have had to listen to other partners to ensure that their work is relevant.
The PhenoMeNal project built a VRE for metabolomics. They designed a microservice architecture using Docker. This is a great example of how to build a VRE from scratch.
West-Life was in a different situation, because structural biology is far from a green field for IT development. Key codes are decades old. Several of them contain a custom workflow engine that long predates workflow exchange standards. They are, and should be, developed in universities across the world. So our loosely coupled pipelines are the right way to integrate services in this field.