This document summarizes ELIXIR's plans to develop a cloud computing platform to support life science research across Europe. It discusses ELIXIR's goals to integrate user authentication, rationalize reference data distribution, support hybrid cloud/HPC deployments, develop a task distribution network using Kubernetes, and support workflow engines. Key components include Biocontainers for tools, RDSDS for reference data, TESK for task execution, and WES-ELIXIR for workflows. The platform aims to be compatible with GA4GH standards and support projects like EOSC-Hub and EOSC-Life.
1. eosc-hub.eu
@EOSC_eu
EOSC-hub receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 777536.
Dr Susheel Varma, EMBL-EBI
Dissemination level: Public
3. ELIXIR Today
• ELIXIR has reached a healthy critical mass:
• International coordination through 23 Nodes
• Reaching over 200 institutes
• Joint scientific direction through 23 national node
directors (HoN committee)
• Coordinated ELIXIR infrastructure developments
(Platform ExCos)
• ELIXIR Communities validating infrastructure
services
• Strong implementation mechanisms:
• Legal framework for coordinated transnational
actions
• Experience operating large infrastructure grant
consortia
4. Data flowing freely between connected national infrastructures
Data
Tools
Standards
Compute
Training and
Skills
…delivered in partnership with
research communities...the
ELIXIR Communities
5. ELIXIR 2019-23 Programme
• Approved by ELIXIR Board in November 2018
• Came into effect 1 January 2019
• Accompanied by Financial Plan (2019-23)
• Programme coordinated from the ELIXIR Hub
and implemented through Annual Work
Plans via Commissioned Services
• Mid-term review of Programme planned for
2021
https://www.elixir-europe.org/about-us/what-we-do/elixir-programme
8. ELIXIR Compute Platform for EOSC-Hub & EOSC-Life
• Integrate user federation into local compute
and data deployments - ELIXIR AAI
• Rationalise an ELIXIR-wide Data Distribution
Network – starting with Reference datasets ~
RDSDS
• Drive ELIXIR Compute Platform support to
Nodes to develop hybrid cloud/HPC
deployments in ELIXIR and EOSC
• Develop Task Distribution Network using Task
orchestration engines – e.g. Kubernetes ~ TESK
• Support national or regional Workflow
Choreography Engines – e.g. CWL-TES,
Cromwell, Nextflow, Galaxy, etc. ~ WES-ELIXIR
9. Developing standards and tools
Simplify the way people search for and request access
to potentially identifiable data in international and
national genomic data resources
Working towards GA4GH standards, APIs and toolkits to be used
throughout ELIXIR Nodes for human data discovery and access
10. ELIXIR Cloud and AAI Programme
WES
Workflow Execution Service
DRS
Data Repository Service
TRS
Tool Registry Service
TES
Task Execution Service
RDSDS, TESK, WES-ELIXIR, Biocontainers
AAI
ELIXIR TOOLS PLATFORM
ELIXIR COMPUTE PLATFORM
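As a rough illustration of what a TES (Task Execution Service) request can look like, the sketch below builds a task description using field names from the GA4GH TES schema (name, inputs, outputs, executors, resources). The container image, command, bucket URLs and resource values are hypothetical placeholders, not taken from the ELIXIR deployment, and no request is actually sent.

```python
import json


def build_tes_task(name, image, command, input_url, input_path,
                   output_url, output_path):
    """Return a dict shaped like a GA4GH TES task creation request."""
    return {
        "name": name,
        # Files staged into the container before the executor runs.
        "inputs": [{"url": input_url, "path": input_path, "type": "FILE"}],
        # Files copied out of the container after the executor finishes.
        "outputs": [{"url": output_url, "path": output_path, "type": "FILE"}],
        # One containerised command; stdout is captured to the output path.
        "executors": [{"image": image, "command": command,
                       "stdout": output_path}],
        # Assumed resource request, chosen only for illustration.
        "resources": {"cpu_cores": 1, "ram_gb": 2.0},
    }


# All values below are hypothetical examples.
task = build_tes_task(
    name="md5-example",
    image="ubuntu:20.04",
    command=["md5sum", "/data/input.txt"],
    input_url="s3://example-bucket/input.txt",
    input_path="/data/input.txt",
    output_url="s3://example-bucket/input.md5",
    output_path="/data/input.md5",
)

print(json.dumps(task, indent=2))
```

A TES implementation such as TESK would receive a document of this shape over HTTP and translate it into jobs on its backend (Kubernetes in the case of TESK).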
11. ELIXIR Human Data – RD-Connect Pipeline
• ELIXIR Human Data – Rare Disease
Community
• Demonstrator focussed on the variant-calling pipeline
curated by the RD-Connect platform authors – Laurie et
al. 2016
• RD researchers submit raw (FASTQ) files from EGA and
via data transfers
• The platform scales the analysis to obtain unannotated
gVCF files
• Further analysis via RD-Connect by data transfer back
to RD-Connect
• Pipeline converted to CWL and Nextflow for execution
via the platform
12. Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP)
• 2018-IS-Interop-CWL
• Pipeline for the assembly and annotation of
marine eukaryotic transcriptomes
• Datasets: Tara Oceans and MMETSP
• Goal is to develop and deploy CWL workflows to
multiple environments
• ELIXIR GA4GH Compatible Platform (WES - TES)
• ELIXIR Galaxy Community (usegalaxy.eu)
• Current focus on:
• Containerisation of the assembly and annotation
pipelines
• Galaxy – CWL Interoperability
• Ready for multi-site CWL execution
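Multi-site CWL execution through WES can be pictured as sending the same run request to several WES endpoints. The sketch below only prepares those requests; it uses the form field names and the `/ga4gh/wes/v1/runs` path from the GA4GH WES 1.0 specification, while the site URLs, workflow URL and parameters are hypothetical placeholders.

```python
import json


def build_wes_run_request(workflow_url, workflow_params,
                          workflow_type="CWL", workflow_type_version="v1.0"):
    """Return the form fields of a GA4GH WES run request."""
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # WES expects the workflow parameters as a JSON-encoded string.
        "workflow_params": json.dumps(workflow_params),
    }


def plan_multi_site_runs(wes_endpoints, request):
    """Pair each WES base URL with the same run request (nothing is sent)."""
    return [(endpoint.rstrip("/") + "/ga4gh/wes/v1/runs", request)
            for endpoint in wes_endpoints]


# Hypothetical sites and workflow, for illustration only.
sites = ["https://wes.site-a.example.org", "https://wes.site-b.example.org/"]
request = build_wes_run_request(
    workflow_url="https://example.org/workflows/assembly.cwl",
    workflow_params={"reads": {"class": "File",
                               "location": "s3://example-bucket/reads.fastq"}},
)

for url, fields in plan_multi_site_runs(sites, request):
    print(url, fields["workflow_type"], fields["workflow_type_version"])
```

Each planned pair could then be submitted as a multipart form POST by an HTTP client, with authentication handled separately (for example via ELIXIR AAI).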
16. Reference Data Set Distribution Service (RDSDS)
• Developed as part of EOSC-Hub
• RDSDS – Data Distribution Network
• Provides a centralised Dataset registry,
management and distribution
• Integrated with ELIXIR AAI for AuthN-Z
• Supports HTTP, FTP, GSIFTP, S3, FTS3 and Globus
Connect
• Decentralised Identifiers using content-based
dataset identifiers
• Virtual Dataset sidecar container for local data
cache (Q2 2019)
• EMBL-EBI Data Archives (>80% Reference Data)
• Metabolights, ArrayExpress, PRIDE, ENA*
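The idea of content-based dataset identifiers can be sketched as deriving the identifier from a cryptographic hash of the data, so that identical content yields the same identifier wherever it is stored. The `algorithm:hexdigest` format and the use of SHA-256 below are assumptions for illustration; the slides do not specify the scheme RDSDS actually uses.

```python
import hashlib


def content_identifier(chunks, algorithm="sha256"):
    """Derive an identifier from data content, read as an iterable of bytes."""
    digest = hashlib.new(algorithm)
    for chunk in chunks:
        # Hash incrementally so large files never need to fit in memory.
        digest.update(chunk)
    return f"{algorithm}:{digest.hexdigest()}"


def file_chunks(path, chunk_size=1024 * 1024):
    """Yield a file's bytes in fixed-size chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


# The same bytes give the same identifier, however they are chunked.
whole = content_identifier([b"ACGTACGT"])
split = content_identifier([b"ACGT", b"ACGT"])
print(whole == split)
```

For a file on disk, `content_identifier(file_chunks(path))` would give the identifier of its content; any two mirrors holding the same bytes would agree on it without consulting each other.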
22. ELIXIR Compute Platform for EOSC-Hub & EOSC-Life
• Integrate user federation into local compute
and data deployments - ELIXIR AAI
• Rationalise an ELIXIR-wide Data Distribution
Network – starting with Reference datasets ~
RDSDS
• Drive ELIXIR Compute Platform support to
Nodes to develop hybrid cloud/HPC
deployments across ELIXIR and EOSC
• Develop Task Distribution Network using Task
orchestration engines – e.g. Kubernetes ~ TESK
• Support national or regional Workflow
Choreography Engines – e.g. CWL-TES,
Cromwell, Nextflow, Galaxy, etc. ~ WES-ELIXIR
26. Thank you to Heads of
Nodes, Platform and
Communities leaders,
Training and Technical
Coordinators and the whole
ELIXIR Community!