e-Biothon
V. Breton (breton@clermont.in2p3.fr)
LPC Clermont-Ferrand, IdGC
CNRS-IN2P3
http://france-grilles.fr
Credit: N. Bard, A. Franc, JF Gibrat
Extreme Performance Computational Science workshop
Tokyo, April 15th 2014
Table of contents
• What are the computing challenges of life
sciences?
• France Grilles: a multidisciplinary distributed
e-infrastructure for science
• E-Biothon: an HPC platform for research in
life sciences
Generalities on sequencing
• Genome = DNA sequence (4 nucleotides:
A, C, G, T)
– Smallest non-viral genome:
Carsonella ruddii (0.16 Mbp)
– Largest genome: Polychaos dubium (670 Gbp)
Sanger technology: 500 bp sequences
454 technology: 10^5 reads of 450 to 600 bp
Illumina technology: 10^6 reads of 100 bp
Current projects (Tara): 10^7 reads of 100 to 400 bp
Explosion of data set size
Data analysis?
Algorithms?
Heuristics?
Tara @ http://oceans.taraexpeditions.org/
Evolution of sequencing
techniques
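The growth in data set size can be made concrete with a little arithmetic. The sketch below assumes the read counts on the slide are powers of ten (10^5, 10^6, 10^7) and multiplies them by the quoted read lengths; the dictionary and function names are illustrative only.

```python
# Rough data-volume estimate per data set, from the read counts and
# read lengths quoted on the slide (read counts assumed to be powers of ten).
technologies = {
    "454": (10**5, 450, 600),
    "Illumina": (10**6, 100, 100),
    "Current projects (Tara)": (10**7, 100, 400),
}


def base_pairs(reads, min_len, max_len):
    """Return the (low, high) number of base pairs for one data set."""
    return reads * min_len, reads * max_len


for name, (reads, lo, hi) in technologies.items():
    low, high = base_pairs(reads, lo, hi)
    print(f"{name}: {low / 1e6:.0f} to {high / 1e6:.0f} Mbp")
```

Under these assumptions, a Tara-scale data set is on the order of one to a few billion base pairs, before any comparison between reads is attempted.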
Data production is distributed
2558 high-throughput "next-generation" sequencing facilities in the world,
located in 920 centers (only 10 with more than 15 machines)
Source: omicspmaps.com
Data production
grows faster than Moore's law
Sequencing scenarios
• Interest for a new genome requires assembly
– process of taking a large number of short DNA sequences and
putting them back together to create a representation of the
original
– Algorithms based on read overlapping benefit from large RAM (1
TB) -> HPC
• Working with a reference genome requires comparative
analysis
– Alignment algorithms (BLAST) find regions of local
similarity between sequences
– Phylogeny algorithms (PhyML) build evolutionary relationships
between genomes
– Comparative analyses are easily parallelized at data level -> HTC
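To illustrate what "algorithms based on read overlapping" means in the assembly scenario, here is a toy greedy assembler. It is a minimal sketch, not the method used by any production assembler: the function names, the `min_len` threshold, and the example reads are all hypothetical. Real assemblers keep an overlap structure for millions of reads in memory, which is where the need for large RAM comes from.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`.

    Returns 0 when the longest overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two reads with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        # All-against-all overlap search: quadratic in the number of contigs.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"]))
```

The all-against-all search in the inner loops is what makes this approach memory- and compute-hungry at scale, in contrast to comparative analyses, where each comparison is independent.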
Summary
• Life sciences have specific computational challenges
– Data production grows faster than Moore's law
– Permanent need to compare new data with existing data
• Life sciences needs can be addressed effectively on
multidisciplinary IT infrastructures (e-infrastructures)
– HPC resources best suited for genome assembly
– Grid/cloud HTC resources well suited for comparative analysis
• Life sciences are among the main users of the French
national grid/cloud production infrastructure
France Grilles
• Is a Scientific Interest Group…
– Created in 2010 by 8 partners: CEA, CNRS, CPU, INRA, INRIA,
INSERM, MESR, RENATER…
– To steer and coordinate the national strategy in the fields of
grids and clouds
• Vision:
– Build and operate a national distributed computing
infrastructure open to all sciences and to developing countries
France Grilles model
• France Grilles does not own the resources
– Resources owned by user communities
• France Grilles provides a framework
– To share resources, expertise and know-how
– To promote innovation and initiatives
– To foster collaboration at national and international
levels
– To reach out to the long tail of users
France Grilles resources
France-Grilles backbone:
LCG-France
France-Grilles spine:
CC-IN2P3
EGI from 2010 to 2013
2010-2013: from 14 regional to 34 operations centres in 53 countries,
from 188,000 jobs/day with 80,000 cores on 250 Resource Centres
to 1,200,000 jobs/day with 430,000 cores on 337 Resource Centres
Technologies
• Grids
• Clouds
• Desktops
Talk by S. Newhouse, Madrid, Sept. 2013
France Grilles, a partner of EGI
Provide a common framework to all user communities
Provide an open environment for fruitful disciplinary and
multidisciplinary research
Over 1500 scientific publications, June 2010 – April 2014
(bar chart of publication counts on a logarithmic scale; category labels not recoverable)
Web portal
Users
479 registered users in Nov 2013 (175 in France)
Most used robot certificate in EGI (http://go.egi.eu/wiki.robot.users)
Neuro-image analysis
Cancer therapy simulation
Prostate radiotherapy plan simulated
with GATE (L. Grevillot and D. Sarrut)
Image simulation
Echocardiography simulated with
FIELD-II (O. Bernard et al)
Modeling and optimization of
distributed computing systems
Acceleration yielded by non-clairvoyant
task replication (R. Ferreira da Silva et al)
Brain tissue segmentation
with Freesurfer
Scientific applications
Infrastructure
Supported by EGI Infrastructure
Uses biomed VO (most used EGI VO for life sciences in 2013)
VIP accounts for ~25% of biomed's activity
VIP consumes ~50 CPU years every month
DIRAC
France-Grilles
Application as a service
File transfer to/from grid
Virtual Imaging Platform:
http://www.creatis.insa-lyon.fr/vip
Collaborations with dedicated life sciences infrastructures
• Institut Français de Bioinformatique (computing
and storage resources at IDRIS)
• France Génomique (computing and
storage resources at TGCC)
• France Life Imaging (infrastructure for
biomedical imaging)
• E-Biothon
• Telethon: yearly fundraising by
French media for the French
Muscular Dystrophy Association (AFM)
• From Telethon to Decrypthon
– Computing infrastructure (IBM)
– Research projects (CNRS)
– Human resources (AFM)
• From Decrypthon to E-Biothon
E-Biothon: history
e-Biothon: an HPC platform for
research in life sciences
User support
Technical support
Blue Gene/P machines
Blue Gene/P operation
Web access portal
E-Biothon: infrastructure
• 2 IBM Blue Gene/P racks
with 200 TB of storage
– 2 x 1024 4-core nodes
– up to 28 TFlops peak
performance
• SysFera-DS web access
to computing resources
• 2 modes:
– Standard (MPI)
– HTC (1024
independent tasks in
parallel)
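The peak-performance figure can be checked with a back-of-the-envelope calculation. The rack, node, and core counts come from the slide; the 850 MHz clock rate and the 4 floating-point operations per cycle per core are assumptions taken from commonly published Blue Gene/P specifications, not from this presentation.

```python
# Peak-performance estimate for the two Blue Gene/P racks.
RACKS = 2                 # from the slide
NODES_PER_RACK = 1024     # from the slide
CORES_PER_NODE = 4        # from the slide
CLOCK_HZ = 850e6          # assumption: published Blue Gene/P clock rate
FLOPS_PER_CYCLE = 4       # assumption: per core, via the dual floating-point unit


def peak_tflops(racks, nodes_per_rack, cores_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak in TFlops: total cores x clock x flops per cycle."""
    cores = racks * nodes_per_rack * cores_per_node
    return cores * clock_hz * flops_per_cycle / 1e12


peak = peak_tflops(RACKS, NODES_PER_RACK, CORES_PER_NODE, CLOCK_HZ, FLOPS_PER_CYCLE)
print(f"{RACKS * NODES_PER_RACK * CORES_PER_NODE} cores, about {peak:.2f} TFlops peak")
```

Under these assumptions the estimate lands just under 28 TFlops, consistent with the "up to 28 TFlops" quoted on the slide.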
E-Biothon vision is to offer a service to
the user communities in life sciences
• 2013-2014: first 3 projects
– Jean-François Gibrat et al. (MIGALE
platform, INRA Jouy-en-Josas)
– Olivier Gascuel, Stéphane Guindon and
Vincent Lefort (CNRS Montpellier)
– Yec'han Laizet, Philippe
Chaumeil, Jean-Marc
Frigerio, Stéphanie Mariette, Sophie
Gerber, Alain Franc (INRA BioGeCo –
Bordeaux)
• After 2014: open call for projects (IFB)
Studying synteny over a wide
range of microbial genomes
• Definition: similar blocks of genes in the same relative positions in
the genome
• Interest: the study of synteny can show how the genome is cut and pasted
in the course of evolution
• The MIGALE team at INRA designed an analysis pipeline to
compute synteny between 2 genomes and store it in a database
• E-Biothon impact: change in scale - capacity to
compute synteny between 2000 complete bacterial genomes (7
million comparisons)
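Because each genome-to-genome comparison is independent, an all-against-all study maps naturally onto many parallel tasks. The sketch below shows one way to split unordered genome pairs into a fixed number of batches, for example one batch per task slot. The function name, the round-robin policy, and the default of 1024 batches are illustrative assumptions; how the MIGALE pipeline itself counts and schedules comparisons is not described in the slides.

```python
from itertools import combinations


def pairwise_batches(genomes, n_batches=1024):
    """Distribute all unordered genome pairs round-robin over n_batches lists."""
    batches = [[] for _ in range(n_batches)]
    for idx, pair in enumerate(combinations(genomes, 2)):
        batches[idx % n_batches].append(pair)
    return batches


# Small demonstration with hypothetical genome identifiers.
demo_genomes = [f"genome_{i}" for i in range(10)]
demo_batches = pairwise_batches(demo_genomes, n_batches=4)
for task_id, batch in enumerate(demo_batches):
    print(task_id, len(batch))
```

Round-robin assignment keeps batch sizes within one pair of each other, which is adequate when comparisons cost roughly the same; uneven genome sizes would call for a weighted scheme.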
PhyML
Phylogenetics is the study of evolutionary relationships among groups of
organisms
PhyML is a software package that estimates maximum
likelihood phylogenies from alignments of nucleotide or
amino acid sequences
The original PhyML publication (2007) is the most cited in environment and
ecology (> 6000 citations).
E-Biothon impact: change in scale in the resources made available
to PhyML users
Characterizing biodiversity
According to botanical theory,
biodiversity is organized in
species, genera, families, orders:
is this confirmed by the distances
between sequences?
Study of biodiversity in French Guiana
16000 different tree species
in the Amazonian forest (≈ 300
in Europe)
More biodiversity in 10000
m² of forest in French
Guiana than in Europe
Decrypthon added value
Change in scale (from the local mesocenter in
Bordeaux)
Millions of reads
Exact distance computation
without heuristics (alignment scores)
Terabytes of data produced every week
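As an illustration of an exact, non-heuristic alignment score, the sketch below computes the Smith-Waterman local alignment score by dynamic programming. This is a standard textbook algorithm swapped in for illustration: the slides do not say which alignment method or scoring scheme the project used, and the `match`, `mismatch`, and `gap` values here are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each pairwise score costs time proportional to the product of the two sequence lengths, so scoring millions of reads against each other exactly is what motivates the change in scale mentioned on the slide.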
Conclusion
• Both HPC and HTC resources are increasingly needed to
address life sciences data and computing challenges:
– As sequencing technologies keep evolving, data production
grows faster than Moore's law and is increasingly distributed
– Biological data need to be constantly compared to
each other (phylogenetics, comparative genomic analysis)
• France is developing complementary HPC and HTC
infrastructures for life sciences
– Institut Français de Bioinformatique, France Génomique
– E-Biothon: an HPC platform for research in life sciences
– France Grilles: a multidisciplinary grid/cloud production
infrastructure
2558 next-generation sequencers in the world
Are life sciences
specific w.r.t. computing?
What is specific to life sciences:
- As sequencing technologies keep evolving, data production grows faster than
Moore's law
- Biological data need to be constantly compared to each other (phylogenetics,
comparative genomic analysis)
What is not specific?
- Data production is distributed
- Multiscale modeling
