OBIWEE is an open source bioinformatics cloud environment for running intensive workflows. It uses SLICEE as a workflow authoring tool and runs jobs on a scalable virtual cluster deployed on private or public clouds using OpenNebula or EC2. Workflows are authored by describing command lines and OBIWEE handles parallelization and job submission to the cluster. It provides tools for deployment, data management, and client access via command line or GUI interfaces like Kepler.
1. OBIWEE : an open source bioinformatics cloud environment
OBIWEE : On Demand Bioinformatics Intensive Workflow Execution Environment
J. Piat, F. Moreews, O. Sallou
http://vapor.gforge.inria.fr/
OBIWEE - BOSC 2011, July 16, Vienna
2. What is OBIWEE?
OBIWEE is an open source bioinformatics intensive computation execution environment based on SLICEE.
It comes preconfigured on a scalable Linux virtual cluster with the Torque job scheduler and can be deployed on a private cloud, using OpenNebula, or on the EC2 public cloud.
S3 is used as a persistent storage layer (Eucalyptus Walrus or Amazon S3).
It is based on Ubuntu/Debian Linux bioinformatics images/packages.
3. What is OBIWEE?
1/ A workflow authoring tool +
2/ A virtual cluster (Torque) +
3/ A set of deployment scripts for
Private cloud (OpenNebula / KVM )
and/or
Public cloud (EC2)
4. OBIWEE : components
1/ SLICEE : A workflow authoring tool
● Tool descriptions are command-line based:
  Write the command line in your workflow as you would locally; it is executed remotely
  All installed tools are immediately available
  Easy file referencing method
● Job scheduler front end (queue selection per job)
● Reference IDs: a dataset reference mechanism for remote service invocation
● Access data via URI: multiple protocols (sftp, ftp, http, file, s3) + internal reference ID URI
● Standard authentication (ssh)
● Persistence and logs
● Automatic coarse-grain parallelism extraction:
  Basic bioinformatics formats implemented
  Easy extension with regular expressions/external scripts
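As an illustration of coarse-grain parallelism extraction driven by a regular expression, the following is a minimal sketch only, not SLICEE's actual implementation. It assumes FASTA input, where a record starts at a line beginning with ">", and groups records into chunks that could each be run as a separate job. The class name FastaChunker and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SLICEE: splits a multi-record file into chunks.
public class FastaChunker {
    // Assumption: a FASTA record begins with '>' at the start of a line.
    private static final Pattern RECORD_START = Pattern.compile("(?m)^>");

    /** Cuts the text at every match of recordStart; text before the first match is dropped. */
    public static List<String> records(String text, Pattern recordStart) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = recordStart.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            int end = (i + 1 < starts.size()) ? starts.get(i + 1) : text.length();
            out.add(text.substring(starts.get(i), end));
        }
        return out;
    }

    /** Groups FASTA records into chunks holding at most recordsPerChunk records each. */
    public static List<String> chunks(String text, int recordsPerChunk) {
        if (recordsPerChunk < 1) {
            throw new IllegalArgumentException("recordsPerChunk must be >= 1");
        }
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int count = 0;
        for (String record : records(text, RECORD_START)) {
            current.append(record);
            count++;
            if (count == recordsPerChunk) {
                out.add(current.toString());
                current.setLength(0);
                count = 0;
            }
        }
        if (count > 0) {
            out.add(current.toString()); // last, possibly smaller, chunk
        }
        return out;
    }

    public static void main(String[] args) {
        String fasta = ">seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n";
        for (String chunk : chunks(fasta, 2)) {
            System.out.println("--- chunk ---");
            System.out.print(chunk);
        }
    }
}
```

Passing a different pattern to records() is one way a new format could be supported without changing the chunking logic.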
5. OBIWEE : components
2/ A virtual cluster
A scalable cluster using the Torque/SGE scheduler
Workflow jobs and parallelized jobs are submitted to the distributed resource manager (DRM). The DRM is easy to scale to increase the workload capacity of the tool.
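To make per-job queue selection concrete, here is a minimal sketch of how a front end might assemble a submission command for a Torque or SGE cluster. It is an assumption for illustration, not how SLICEE submits jobs. The class QsubCommand, the queue name, and the script name are hypothetical; the qsub options -q (queue) and -N (job name) are standard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OBIWEE: builds a qsub argument list.
public class QsubCommand {
    /**
     * Builds the argument list for submitting scriptPath to the scheduler.
     * queue and jobName are optional; null or empty values are skipped.
     */
    public static List<String> build(String queue, String jobName, String scriptPath) {
        if (scriptPath == null || scriptPath.isEmpty()) {
            throw new IllegalArgumentException("scriptPath is required");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("qsub");
        if (queue != null && !queue.isEmpty()) {
            cmd.add("-q"); // target queue for this job
            cmd.add(queue);
        }
        if (jobName != null && !jobName.isEmpty()) {
            cmd.add("-N"); // job name shown by the scheduler
            cmd.add(jobName);
        }
        cmd.add(scriptPath);
        return cmd;
    }

    public static void main(String[] args) {
        // "batch" and "blast_chunk_1.sh" are placeholder values.
        List<String> cmd = build("batch", "blast_chunk_1", "blast_chunk_1.sh");
        System.out.println(String.join(" ", cmd));
        // The list could be handed to new ProcessBuilder(cmd) on a host where qsub is installed.
    }
}
```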
6. OBIWEE : components
3/ A set of deployment scripts for
Private cloud (OpenNebula / KVM )
and/or
Public cloud (EC2)
7. OBIWEE : installation
[Diagram: installation dependencies, linked by "needs" and "provides" relations.
Virtual image creation: bioinformatics software installation, Lyncee install, image configuration, Slicee install.
Cluster generation: NFS mount of working directory, node deployment.
Capabilities listed: workflow management, data parallelization, data management, job submission, authorization.]
8. OBIWEE : architecture
[Diagram: architecture. Labels: Client (Kepler/command line), "Run job", master, three nodes, NFS share, "Add node" (Amazon EC2/OpenNebula), "publish" and "Retrieve" (S3, Amazon EC2/OpenNebula).]
9. OBIWEE : clients
● API (job submission): create your own submission/orchestration clients
    CommonRestClient client = new CommonRestClient(serverUrl);
    // upload data
    client.upload(sessionId, inputDataUriPath);
    // asynchronous execution
    rdsid = client.getDSIDFromAsyncExe(xmlQuery, sessionId);
    // wait (client.waitAndGetResult()), or do something else
    // download/move results
    client.move(vaporSession, uri, new URI(path));
● Command line (workflow execution)
java -cp $cp vapor.cli.VaporCmdClient -w workflow.xml -i input.xml -d auth.xml
● GUI (workflow execution and design):
Kepler with SLICEE actors (workflow creation/execution)
10. OBIWEE : KEPLER client with SLICEE actors
miRNA detection workflow
11. OBIWEE : road map
Road map
● Monitoring, failover
● Custom full web client
● Integration in existing popular clients
● Data cleanup policies
THANK YOU !
More information on SLICEE and the OBIWEE EC2 deployment tutorial at
http://vapor.gforge.inria.fr/