Christophe Blanchet, Clément Gauthey
Infrastructure Distributed for Biology
IDB-IBCP CNRS FR3302 - LYON - FRANCE
http://idee-b.ibcp.fr
IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552)
and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)
Providing Bioinformatics Services
on Cloud
C. Blanchet and C. Gauthey
EGI CF13, Manchester, 9 April 2013
Bioinformatics Today
• Biological data are big data
• 1512 online databases (NAR Database Issue 2013)
• Sanger Institute, UK, 5 PB
• Beijing Genomics Institute (BGI), China, 4 sites, 10 PB
➡ Big data in many places
• Analysing such data has become difficult
• Scale-up of the analyses: from gene/protein to complete genome/proteome, ...
• Many different tools in daily use
• That need to be combined in workflows
• Usual interfaces: portals, Web services, federation, ...
➡ Datacenters with ease of access/use
• Distributed resources
• Experimental platforms: NGS, imaging, ...
• Bioinformatics platforms
➡ Federation of datacenters
Sequencing Genomes
source: www.politigenomics.com/next-generation-sequencing-informatics
Complete genome sequencing has become a lab commodity with NGS (cheap and efficient)
source: www.genomesonline.org
Infrastructures in Biology
Many tools and web services to process and visualize large amounts of data
The scene
• Bioinformatics services providers
• Is it easy to deploy many (incompatible) tools?
• To connect them to public databases?
• To limit the transfer of huge data?
• To provide users with their own computing resources?
• With their own isolated storage?
• Scientists
• Is it easy to access and use these tools?
• To adapt them to your usage?
• To get your own or other tools deployed on a datacenter?
• To combine them?
• To get your own computing and storage resources?
IDB’s Cloud
• Cloud workbench for Biology
• 13 turnkey bioinformatics appliances (as of Apr. 2013)
• Running since Sept. 2011, open to the biology community
• Lyon, France
• Powered by
• StratusLab
• Compute nodes, block storage
• More than 900 cores, more than 4 TB RAM, 36 TB of virtual disks
• Mainly Intel Sandy Bridge servers with 32 cores and 128 GB RAM
• Bigmem servers with 64 cores and 768 GB RAM
• VMs from 1 to 64 cores, 512 MB to 760 GB RAM
• + OpenStack
• Object storage (Swift)
• More than 200 TB of redundant and scalable storage
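The VM size range quoted above can be turned into a quick sanity check before requesting an instance. This is only a sketch: the `VMRequest` type and `fits_limits` helper are hypothetical names, and 1 GB is assumed to be 1024 MB.

```python
from dataclasses import dataclass

# Limits taken from the slide: VMs from 1 to 64 cores, 512 MB to 760 GB RAM.
MIN_CORES, MAX_CORES = 1, 64
MIN_RAM_MB = 512
MAX_RAM_MB = 760 * 1024  # assumption: 1 GB = 1024 MB


@dataclass(frozen=True)
class VMRequest:
    """Hypothetical description of a requested VM size."""
    cores: int
    ram_mb: int


def fits_limits(req: VMRequest) -> bool:
    """Return True if the request lies within the advertised VM size range."""
    cores_ok = MIN_CORES <= req.cores <= MAX_CORES
    ram_ok = MIN_RAM_MB <= req.ram_mb <= MAX_RAM_MB
    return cores_ok and ram_ok


if __name__ == "__main__":
    print(fits_limits(VMRequest(cores=32, ram_mb=128 * 1024)))
    print(fits_limits(VMRequest(cores=128, ram_mb=1024)))
```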
Driven through a simple web interface
Integrate Bioinformatics Tools in Cloud
Diagram: bioinformatics tools (BLAST, GOR4, FastA, SSearch, Abyss, ClustalW, Ray, BWA, PhyML) are installed on Linux virtual machines (RedHat, CentOS, Debian, Ubuntu, Suse) to create new appliances, which are published in the Bioinformatics Marketplace (Sequence, NGS, Structure, Galaxy, ARIA, (…)).
• Appliances are virtual machines
• Small: a few GB, easy to convert to most virtualization formats
• Installed and pre-configured with common bioinformatics tools
• e.g. BLAST, ClustalW, ARIA, MEME, HMMER, TopHat, BWA, Samtools, etc.
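Converting an appliance image between virtualization formats might be done with `qemu-img convert`. The slides do not say which tool IDB uses, so the choice of `qemu-img`, the file names, and the `convert_argv` helper below are all assumptions made for illustration. The helper only builds the command line; it does not run it.

```python
# A subset of the image format names accepted by qemu-img.
KNOWN_FORMATS = {"raw", "qcow2", "vmdk", "vdi", "vpc"}


def convert_argv(src: str, src_fmt: str, dst: str, dst_fmt: str) -> list[str]:
    """Build the argument list for converting a disk image with qemu-img."""
    for fmt in (src_fmt, dst_fmt):
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
    # -f names the source format, -O names the output format.
    return ["qemu-img", "convert", "-f", src_fmt, "-O", dst_fmt, src, dst]


if __name__ == "__main__":
    # Hypothetical file names.
    print(" ".join(convert_argv("biocompute.qcow2", "qcow2",
                                "biocompute.vmdk", "vmdk")))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where `qemu-img` is installed.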
Bioinformatics Appliances
Select your bioinformatics tools
Run Bioinformatics Cloud Instances
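Diagram: from the Bioinformatics Marketplace (Sequence, NGS, Structure, Galaxy, ARIA, (…)), users launch instances on IBCP's cloud resources. IaaS: VMs (e.g. CNS, ARIA) reached over ssh. PaaS: a portal launches jobs on a master & storage node and workers with a shared file system, running BLAST, Clustal, etc.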
Manage your Cloud Instances
Biological Data in Cloud
Diagram: public data sources (UniProt, PDB, EMBL, PROSITE, genomes) are shared (NFS) with the bioinformatics cloud; persistent user data is stored on pdisk (iSCSI). The cloud offers IaaS (VMs such as CNS and ARIA, reached over ssh) and PaaS (a portal that launches jobs on a master & storage node and workers with a shared file system, running BLAST, Clustal, etc.). Users upload their data and get their results via scp or http/S3.
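The scp route for uploading data to an instance could look like the sketch below. The user name, host name, and paths are hypothetical placeholders, and `scp_upload_argv` is an illustrative helper rather than part of any IDB tooling. It only assembles the command.

```python
import shlex


def scp_upload_argv(local_path: str, user: str, host: str,
                    remote_dir: str) -> list[str]:
    """Build the scp argument list that copies a local file to a remote directory."""
    return ["scp", local_path, f"{user}@{host}:{remote_dir}"]


if __name__ == "__main__":
    # Hypothetical instance address and file names.
    argv = scp_upload_argv("reads.fastq.gz", "alice",
                           "vm.example.org", "/data/input/")
    print(shlex.join(argv))
```

Fetching results works the same way with source and destination swapped.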
Example: ‘biocompute’ Appliance
• Use your own instance(s)
• With pre-installed standard bioinformatics tools
• BLAST, FastA, SSearch, HMM, ...
• ClustalW2, Clustal-Omega, Muscle, ...
• Bowtie(2), BWA, samtools, ...
• MEME, R, etc.
• Connected to public reference data
• UniProt, EMBL, genomes, PDB, etc.
• Automatically shared with the VMs
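On such an instance, a protein search against the shared reference data might be launched as sketched below. This assumes the installed BLAST is the NCBI BLAST+ suite (`blastp`); the database path `/shared/db/uniprot` and the file names are hypothetical, since the slides do not give the actual mount points.

```python
def blastp_argv(query_fasta: str, db_prefix: str, out_path: str,
                threads: int = 4) -> list[str]:
    """Build a BLAST+ blastp command line producing tabular output."""
    return [
        "blastp",
        "-query", query_fasta,          # input protein sequences (FASTA)
        "-db", db_prefix,               # path prefix of a formatted BLAST database
        "-out", out_path,               # where to write the hits
        "-outfmt", "6",                 # tabular output format
        "-num_threads", str(threads),   # CPU threads to use
    ]


if __name__ == "__main__":
    # Hypothetical paths on the shared reference-data volume.
    print(" ".join(blastp_argv("query.fasta", "/shared/db/uniprot", "hits.tsv")))
```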
Example: Galaxy portal for NGS analyses
• Analyse NGS data
• The Galaxy portal is widely used in the community
• Connected to large public data: sequences and indexes
• Handles large user data (GBs)
• Preserve workflows and results (persistent storage)
Example: Proteomics
• Motivation
• Collaboration with a mass spectrometry platform
• Running out of space on their local resources
• Protein identification
• Experimental mass spectrometry data
• Reference databases: nr, Swiss-Prot
• Reference screening tools: OMSSA, X!Tandem
• User interface
• Remote display
• NX
• Reference GUIs
• SearchGUI
• PeptideShaker
source: PeptideShaker site
Conclusion
• Provide turnkey bioinformatics appliances
• Standard tools and pipelines
• Interoperability: ready to run on cloud
• Easier to transfer appliances than data (GB vs TB)
• Provide a cloud infrastructure tightly connected to existing bioinformatics infrastructure
• IDB’s public bioinformatics cloud
• Linked to public biological databases
• In collaboration with the French Bioinformatics Institute
• Ease the usage by scientists
• Usual bioinformatics gateways
• Persistent and large ubiquitous storage
• Web interface for cloud management
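The "GB vs TB" argument can be made concrete with a back-of-the-envelope calculation. The figures below (a 4 GB appliance, a 5 TB dataset, a 1 Gbit/s link with no protocol overhead) are illustrative assumptions, not measurements from IDB.

```python
GB = 10**9
TB = 10**12


def transfer_hours(size_bytes: float, link_bits_per_s: float) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead and contention."""
    return size_bytes * 8 / link_bits_per_s / 3600


if __name__ == "__main__":
    link = 1e9  # assumed 1 Gbit/s network link
    appliance_h = transfer_hours(4 * GB, link)  # assumed appliance size
    dataset_h = transfer_hours(5 * TB, link)    # assumed dataset size
    print(f"appliance: {appliance_h * 3600:.0f} s")
    print(f"dataset:   {dataset_h:.1f} h")
    print(f"ratio:     {dataset_h / appliance_h:.0f}x")
```

Under these assumptions the appliance moves in about half a minute while the dataset needs roughly eleven hours, which is why bringing the tools to the data is attractive.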
Perspectives
• Define good practices for providing the academic community and industry with bioinformatics services
• French Bioinformatics Institute - IFB
• Its goals are to provide core bioinformatics resources to the national and international life science research community in key fields such as genomics, proteomics, systems biology, etc.
• It aims to build a national academic cloud devoted to bioinformatics, inspired by the model evaluated through IDB’s cloud.
• European ELIXIR infrastructure
• To build a sustainable European infrastructure for biological
information, supporting life science research and its
translation
• IFB will be the French representative in ELIXIR.
Acknowledgments
• StratusLab members
• Co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and by the French National Research Agency's Arpege Programme (ANR-10-SEGI-001).
Questions?
http://idee-b.ibcp.fr
