2014 05-27 - Opinion: Computing for genomics sucks.


Some thoughts on (1) why genomics bioinformaticians need hardware that differs from what traditional HPC providers offer, and (2) why it's challenging to get it. With input from @bmpvieira, @yeban, @gawbul. Video: https://www.youtube.com/watch?v=mmMQw2gIozI


Why computing for genomics research sucks.
y.wurm@qmul.ac.uk
BaltiBio 2014-05-27

Example Genomics Tasks

Task                              Repetitiveness  “Disk” Input/Output  Memory   Duration per task
Build 10,000 trees                10,000x         low                  low      short
Trim FASTQ files                  40-400x         high                 low      short
One de novo genome assembly       1               high                 high     long
Many de novo genome assemblies    20-1000x        high                 high     long
Determine which of 10 new tools
that promise X can actually do X
(once): “genome hacking”          1               depends              depends  depends

Traditional High Performance Computing (HPC)
• Physics? Astronomy? Maths? Chemistry?
• Traditional HPC infrastructures are great at small tasks:

Task                              Repetitiveness  “Disk” Input/Output  Memory   Duration per task
Build 10,000 trees                10,000x         low                  low      short

• And/or have mechanisms/tools that transform their challenges into many small tasks.
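
As a rough illustration of turning one large job into many small tasks, the sketch below splits the 10,000 tree-building tasks from the table into batches that could each be submitted as one queue job. The function name `chunk_tasks` and the batch size of 250 are assumptions made for this example; they do not come from the talk.

```python
from typing import Iterator, List


def chunk_tasks(task_ids: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of task IDs, each at most chunk_size long."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(task_ids), chunk_size):
        yield task_ids[start:start + chunk_size]


# 10,000 independent tree-building tasks, numbered 0..9999.
tree_ids = list(range(10_000))

# Hypothetical batch size: 250 trees per queue job.
batches = list(chunk_tasks(tree_ids, chunk_size=250))
print(len(batches), "jobs of up to", len(batches[0]), "trees each")
```

This works because each tree is independent, low in memory, and short. A single de novo genome assembly cannot be split this way.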

“We have 9999 cores!” - central IT admin
but they are inadequate

Big Ass Servers
• e.g.: 1.5 TB RAM; 48 cores - SSH into it and do whatever you want.

Task                              Repetitiveness  “Disk” Input/Output  Memory   Duration per task
Build 10,000 trees                10,000x         low                  low      short
Trim FASTQ files                  40-400x         high                 low      short
One de novo genome assembly       1               high                 high     long
Many de novo genome assemblies    20-1000x        high                 high     long
Determine which of 10 new tools
that promise X can actually do X  1               depends              depends  depends

Jeremy Leipzig

Additional challenges for biologists
• Datasets continue growing fast!
• Generally:
• We lack computational training.
• Bioinformatics tools suck (badly written, badly
tested, hard to install).

So what do we need?
• access to machines of all shapes and sizes
• big and small machines
• direct access via ssh (for hacking & doing things a few times)
• indirect access via queue (for doing things many times)
• fast I/O - cheap archival.
• single login: all files “feel” like they’re in one place
• easily changeable software & OS versions
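
One way to read the list above together with the task table from earlier in the talk is as a simple routing rule: repeat a task many times and it belongs on a queue, run it once or a few times and direct ssh access is enough, and high memory needs call for a big machine. The sketch below encodes that reading. The `Task` class, the `suggest_access` function, and the threshold of 20 repeats are assumptions for illustration, not something the talk specifies.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    repeats: int        # how many times the task is run
    high_io: bool       # heavy "disk" input/output
    high_memory: bool   # needs a large amount of RAM
    long_running: bool  # long duration per task


def suggest_access(task: Task, many_threshold: int = 20) -> str:
    """Suggest an access mode from repeats and memory only.

    The threshold of 20 repeats is an arbitrary assumption. The I/O and
    duration fields are recorded but not used by this simple rule.
    """
    access = "queue" if task.repeats >= many_threshold else "ssh"
    machine = "big" if task.high_memory else "small"
    return f"{access} on a {machine} machine"


# Rows taken from the task table; repeats use the lower bound of each range.
tasks = [
    Task("Build 10,000 trees", 10_000, False, False, False),
    Task("Trim FASTQ files", 40, True, False, False),
    Task("One de novo genome assembly", 1, True, True, True),
    Task("Many de novo genome assemblies", 20, True, True, True),
]

for t in tasks:
    print(f"{t.name}: {suggest_access(t)}")
```

The "genome hacking" row is left out on purpose: all of its requirements are "depends", which is the case for direct, flexible access.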

Swiss Institute of Bioinformatics: Vital-IT

Easily changeable OS & software versions
https://www.docker.io
> docker-switch bio-linux7
# do stuff
> docker-switch pacbio-assembly-vm
# do other stuff
> docker-switch antlab-ubuntu
# do more stuff
FAKE (the docker-switch commands are a mock-up)
@bmpvieira
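
The talk labels the `docker-switch` commands as fake, so the sketch below only suggests what such a wrapper might run underneath using the real `docker run` command. The function name `docker_shell_command`, the `/data` mount point, and the assumption that each image provides `bash` are hypothetical, and the image name is the placeholder from the slide, not a published image.

```python
import shlex
from typing import List


def docker_shell_command(image: str, workdir: str) -> List[str]:
    """Build a `docker run` command that opens an interactive shell in `image`.

    The host directory `workdir` is mounted at /data inside the container;
    /data is an arbitrary choice for this sketch.
    """
    return [
        "docker", "run",
        "--rm",                          # remove the container when the shell exits
        "--interactive", "--tty",        # keep stdin open and allocate a terminal
        "--volume", f"{workdir}:/data",  # make the project files visible inside
        "--workdir", "/data",            # start the shell in the mounted directory
        image,
        "bash",                          # assumes the image ships bash
    ]


# Placeholder image name taken from the slide.
cmd = docker_shell_command("bio-linux7", "/home/me/project")
print(shlex.join(cmd))
```

The command is built as a list, not executed, so it can be inspected first or passed to `subprocess.run(cmd)` on a machine where Docker is installed.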

What if Apple/Google made an idiot-proof cloud computing system for genomics?
• Always on - single place to connect to:
  ssh mylab.awskiller.co.uk
• Dropbox-like shared directories & file checksumming.
• Easily switchable OS version / “VM”.
• Automagically & transparently migrates:
  • from small to huge machines (and back) as CPU and RAM demands change.
  • from one physical site (huge dataset) to another
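
The file checksumming mentioned above can be sketched with Python's standard `hashlib` module. The helper names `sha256_of_file` and `same_content` and the choice of SHA-256 are assumptions for illustration; the talk does not name an algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, block_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB blocks
    so that large sequence files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def same_content(a: Path, b: Path) -> bool:
    """True if both files have identical SHA-256 digests."""
    return sha256_of_file(a) == sha256_of_file(b)
```

Comparing digests before and after a transfer, for example `same_content(Path("reads.fastq"), Path("/archive/reads.fastq"))` with hypothetical paths, gives a cheap check that a copy was not corrupted.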

Summary
• Broad range of needs:
  • some similar to traditional HPC.
  • some very different!
• Users are naive.
• Tools are experimental.
• Datasets are experimental.
• IT people have difficulty understanding this.
• Do not trust them when they say things will just work!
• A lot of potential to make things not suck.

Evolutionary Genetics group & Queen Mary U London
Bruno Vieira - @bmpvieira
Steve Moss - @gawbul
Anurag Priyam - @yeban
Richard Christie & ITS Research Support team @ Queen Mary U London
Ioannis Xenarios & Vital-IT team @ Swiss Institute of Bioinformatics
http://yannick.poulet.org
y.wurm@qmul.ac.uk
