Complex metagenome
assembly, + bonus
career thoughts.
C. Titus Brown
UC Davis
ctbrown@ucdavis.edu
Hello!
Research background:
Computing, modeling, and data analysis: 1989-2000
(high school & undergrad+)
Molecular biology, genomics, and data analysis:
2000-2007
(grad school + postdoc)
Bioinformatics, data analysis, and Comp Sci: 2007-present
(assistant professor)
Genomics and Veterinary Medicine (?)
Two topics for this talk:
1. Metagenome assembly.
2. Careers & a “middle class” of
bioinformaticians.
Shotgun metagenomics
Collect samples;
Extract DNA;
Feed into sequencer;
Computationally analyze.
Image: Wikipedia, "Environmental shotgun sequencing.png"
To assemble, or not to
assemble?
Goals: reconstruct phylogenetic content and predict
functional potential of ensemble.
Should we analyze short reads directly?
OR
Do we assemble short reads into longer contigs first, and
then analyze the contigs?
Assembly: good.
Howe et al., 2014, PMID 24632729
Assemblies yield much
more significant
homology matches.
But!
Does assembly work well!?
(Short reads, chimerism, strain
variation, coverage, compute
resources, etc. etc.)
Yes: metagenome assemblers
recover the majority of known
content from a mock community.
Metric                       Velvet     IDBA       SPAdes
Total length (>= 0 bp)       1.6E+08    2.0E+08    2.0E+08
Total length (>= 1000 bp)    1.6E+08    1.9E+08    1.9E+08
Largest contig               561,449    979,948    1,387,918
# misassembled contigs       631        1032       752
Genome fraction (%)          72.949     90.969     90.424
Duplication ratio            1.004      1.007      1.004
Results: Dr. Sherine Awad
Reads from Shakya et al., 2013; PMID 23387867
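The length-based rows of a table like this can be derived from contig lengths alone. Here is a minimal sketch, assuming a hypothetical list of contig lengths (the numbers are invented for illustration and are not taken from these assemblies). The misassembly, genome fraction, and duplication rows also need the reference genomes and are not covered here.

```python
def length_summary(contig_lengths, min_len=1000):
    """Summarize an assembly from its contig lengths (in bp)."""
    # Contigs at or above the length cutoff, as in the ">= 1000 bp" row.
    kept = [n for n in contig_lengths if n >= min_len]
    return {
        "total_length_all": sum(contig_lengths),
        "total_length_min_len": sum(kept),
        "largest_contig": max(contig_lengths, default=0),
    }


# Hypothetical contig lengths, for illustration only.
example_lengths = [250, 800, 1_200, 5_000, 42_000]
summary = length_summary(example_lengths)
print(summary)
```

Short contigs contribute little to total length, which is why the ">= 0 bp" and ">= 1000 bp" rows in the table are so close.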
But!
A study of the Rifle site comparing long read
(Moleculo/TruSeq) and short read/assembly content
concluded that their short read assembly was not
comprehensive.
“Low rate of read mapping (18-30%) is typically indicative of
complex communities with a large number of low
abundance genomes or with high degree of
species and strain variations.”
Sharon et al., Banfield lab; PMID 25665577
The dirty not-so-secret (?)
about sequence assembly:
The assembler will simply discard two types of data.
1. Low coverage data - can’t be reconstructed with
confidence; may be erroneous.
2. Highly polymorphic data – confuses the assembler.
So: why didn’t the Rifle data
assemble?
There are no published approaches that will
discriminate between low coverage and strain
variation.
But we’ve known about this problem for ages.
So we’ve been working with something called
“assembly graphs”.
Assembly graphs.
Assembly “graphs” are a way of representing
sequencing data that retains all variation;
assemblers then crunch this down into a single
sequence.
Image from Iqbal et al., 2012. PMID 23172865
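To make "retains all variation" concrete, here is a toy de Bruijn graph, one common kind of assembly graph. It is a sketch under simplifying assumptions (exact k-mers, no reverse complements, no error handling), the two reads are hypothetical, and it is not the implementation used in the work described here.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its prefix (k-1)-mer to its suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_points(graph):
    """Nodes with more than one successor: places where the reads disagree."""
    return {node: sorted(nexts) for node, nexts in graph.items() if len(nexts) > 1}


# Two hypothetical reads that differ at a single base (C vs T).
reads = ["ATTCGACCTA", "ATTCGATCTA"]
graph = build_graph(reads, k=4)
branches = branch_points(graph)
print(branches)
```

Both versions of the variable base stay in the graph as two paths that split at one node and rejoin later; an assembler that outputs a single sequence has to pick one path or break the contig there.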
Our work on assembly graphs
enables:
Evaluation of data set coverage profiles prior to assembly.
Variant calling and quantification on raw metagenomic
data.
Analysis of strain variation.
Evaluation of “what’s in my reads but not in my assembly”.
(See http://ivory.idyll.org/blog/2015-wok-notes.html
for details.)
Rifle: Low coverage? (Yes.)
Assembly starts to work @ ~10x
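One way to look at coverage before assembling is to count k-mers across the data set and use the median k-mer count within each read as that read's estimated coverage. The sketch below assumes exact counting in a dictionary, ignores reverse complements, and uses a tiny hypothetical data set; it is an illustration, not the code behind the result on this slide.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def estimated_coverage(read, counts, k):
    """Median abundance of the read's k-mers in the whole data set."""
    return median(counts[km] for km in kmers(read, k))


# Hypothetical data: one sequence sampled 12 times, another only 3 times.
reads = ["ACGTACGGTC"] * 12 + ["TTGACCAGTA"] * 3
counts = count_kmers(reads, k=5)
high = estimated_coverage("ACGTACGGTC", counts, k=5)
low = estimated_coverage("TTGACCAGTA", counts, k=5)
print(high, low)
```

Reads whose estimate falls well below the ~10x mark are the ones an assembler is likely to drop.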
Rifle: strain variation? (Maybe.)
ATTCGTCGATTGGCAAAAGTTCTTTCCAGAGCCTACGGGAGAAGTGTA
|||||||||||||||||||||||||||||||| |||||||||||||||
ATTCGTCGATTGGCAAAAGTTCTTTCCAGAGCTTACGGGAGAAGTGTA
GTCAAAATAAGGTGAGGTTGCTAATCCTCGAACTTTTCAC
||||||||||||||| |||||||| ||||||| |||||||
GTCAAAATAAGGTGAAGTTGCTAACCCTCGAATTTTTCAC
Above: a typical subalignment between all short reads and one long read.
If we saw many medium-to-high coverage alignments with this
level of variation, that would point to strain variation.
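The level of variation in the two alignment segments can be quantified by counting matching positions. A minimal sketch, assuming ungapped segments of equal length (true for the two segments on this slide); the function name is hypothetical.

```python
def matches_and_length(a, b):
    """Return (matching positions, aligned length) for two ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("aligned segments must have the same length")
    return sum(x == y for x, y in zip(a, b)), len(a)


# The two alignment segments shown on the slide.
segments = [
    ("ATTCGTCGATTGGCAAAAGTTCTTTCCAGAGCCTACGGGAGAAGTGTA",
     "ATTCGTCGATTGGCAAAAGTTCTTTCCAGAGCTTACGGGAGAAGTGTA"),
    ("GTCAAAATAAGGTGAGGTTGCTAATCCTCGAACTTTTCAC",
     "GTCAAAATAAGGTGAAGTTGCTAACCCTCGAATTTTTCAC"),
]

results = [matches_and_length(a, b) for a, b in segments]
for matched, length in results:
    print(f"{matched}/{length} identical ({100 * matched / length:.1f}%)")
```

For the segments shown, this gives 47 of 48 and 37 of 40 matching positions. A handful of mismatches like these at low coverage cannot be told apart from sequencing error, which is why the answer on this slide is "maybe".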
My thoughts on metagenome
assembly & Rifle data:
The Rifle short-read data is low coverage, based
on both indirect (in paper) and direct (our)
observations. This is the first reason why it didn’t
assemble well.
Strain variation is also present, within the limits of
low coverage analysis. That will cause problems
in the future.
=> Your methods limit and bias your results.
The problem:
Assembly graphs are coming to all of
genomics.
Because they are fundamentally different
they require a completely new bioinformatics
tool chain. (They don’t use FASTA.)
For better or for worse, we bioinformaticians
are not going to write tools that are easy to
use.
It’s hard;
There’s little incentive;
The tool/application needs are incredibly
Who ya gonna call??

to do your bioinformatics?
Choices
(1) Focus on biology and avoid computation as much as
possible.
(2) Integrate large scale data analysis into your biology.
(3) Become purely computational.
https://commons.wikimedia.org/wiki/File:Three_options_-_three_choices_scheme.png
Towards a “bioinformatics
middle class”
Most bioinformaticians are quite ignorant of the biology
you’re doing; biologists are often more aware of the
bioinformatics they’re using.
There is amazing opportunity at the intersection of
biology and computing.
I think of it as a “bioinformatics middle class” –
biologists who are comfortable with computing, and
deploy large scale data analysis in the service of their
biological work.
Towards a “bioinformatics
middle class”
We need many more biologists who have an
intuitive & deep understanding of the
computing.
Such people are rare, and there is no defined
“pipeline” for them. Training must be self-
motivated.
(And higher ed has really abdicated its
responsibilities in this area.)
My top four suggestions
(more at end)
1. Don’t avoid computing; embrace it.
2. Invest in the Internet and social media
(blogs, Twitter) – seqanswers, biostars, etc.
3. Be patient and aware of the time it takes
to effectively cross-train.
4. Seek out formal training opportunities.
If you’re a senior scientist, or
know any:
Ask them to lobby for funding at this
intersection.
Ask them to lobby for good (nay, excellent)
funding for training opportunities.
Make sure they respect the challenges and
opportunities of large scale data analysis and
modeling (along with those who do it).
Career benefits of doing large-
scale data analysis.
Alternative career paths (i.e. “jobs actually exist in this
area.”)
Flexibility in work hours & location.
Work with an even broader diversity of people and
projects.
Dangers:
It’s easy to get caught up in the computing and ignore
the biology!
...but right now training & culture are tilted too much
towards experimental and field research, which
presents its own problems in a data-intensive era of
research.
What’s coming?
Lots more data.
Where am I going?
Data integration.
Figure 2. Summary of challenges associated with the data integration in the proposed project.
Figure via E. Kujawinski
An optimistic message
This is a great time to be alive and doing
research!
We can look at & try to understand
environmental microbes with many new tools
and new approaches!
The skills you need to do this extend across
disciplines, across the public and private
sectors, and cannot be automated or
outsourced!
Thank you for listening!
I’ll be here all week; really
looking forward to it!
More advice.
don’t avoid computing
teach and train what you do know; put together classes and
workshops;
host and run software and data carpentry workshops, then put
together more advanced workshops;
do hackathons or compute-focused events where you just sit
down in groups and work on data analysis.
(push admin to support all this, or just do it without your admin);
invest in the internet and social media – blogs, twitter, biostars

take a CS prof to lunch, seek joint funding, do a sabbatical in a
purely compute lab, etc.
support open source bioinformatics software
invest in reproducibility
be aware that compute people’s time is as oversubscribed as
yours, or more so, & value it prospectively.

 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 

2015 aem-grs-keynote

  • 1. Complex metagenome assembly, + bonus career thoughts. C. Titus Brown UC Davis ctbrown@ucdavis.edu
  • 2.
  • 3. Hello! Research background: Computing, modeling, and data analysis: 1989-2000 (high school & undergrad+) Molecular biology, genomics, and data analysis: 2000-2007 (grad school + postdoc) Bioinformatics, data analysis, and Comp Sci: 2007-present (assistant professor) Genomics and Veterinary Medicine (?)
  • 4. Two topics for this talk: 1. Metagenome assembly. 2. Careers & a “middle class” of bioinformaticians.
  • 5. Shotgun metagenomics Collect samples; Extract DNA; Feed into sequencer; Computationally analyze. Wikipedia: Environmental shotgun sequencing.png
  • 6. To assemble, or not to assemble? Goals: reconstruct phylogenetic content and predict functional potential of ensemble. Should we analyze short reads directly? OR Do we assemble short reads into longer contigs first, and then analyze the contigs?
  • 7. Assembly: good. Howe et al., 2014, PMID 24632729 Assemblies yield much more significant homology matches.
  • 8. But! Does assembly work well!? (Short reads, chimerism, strain variation, coverage, compute resources, etc. etc.)
  • 9. Yes: metagenome assemblers recover the majority of known content from a mock community.
      Metric                       Velvet     IDBA       Spades
      Total length (>= 0 bp)       1.6E+08    2.0E+08    2.0E+08
      Total length (>= 1000 bp)    1.6E+08    1.9E+08    1.9E+08
      Largest contig               561,449    979,948    1,387,918
      # misassembled contigs       631        1032       752
      Genome fraction (%)          72.949     90.969     90.424
      Duplication ratio            1.004      1.007      1.004
      Results: Dr. Sherine Awad. Reads from Shakya et al., 2013; PMID 23387867
  • 10. But! A study of the Rifle site comparing long read (Moleculo/TruSeq) and short read/assembly content concluded that their short read assembly was not comprehensive. “Low rate of read mapping (18-30%) is typically indicative of complex communities with a large number of low abundance genomes or with high degree of species and strain variations.” Sharon et al., Banfield lab; PMID 25665577
  • 11. The dirty not-so-secret (?) about sequence assembly: The assembler will simply discard two types of data. 1. Low coverage data - can’t be reconstructed with confidence; may be erroneous. 2. Highly polymorphic data – confuses the assembler.
  • 12. So: why didn’t the Rifle data assemble? There are no published approaches that will discriminate between low coverage and strain variation. But we’ve known about this problem for ages. So we’ve been working with something called “assembly graphs”.
  • 13. Assembly graphs. Assembly “graphs” are a way of representing sequencing data that retains all variation; assemblers then crunch this down into a single sequence. Image from Iqbal et al., 2012. PMID 23172865
  • 14. Our work on assembly graphs enables: Evaluation of data set coverage profiles prior to assembly. Variant calling and quantification on raw metagenomic data. Analysis of strain variation. Evaluation of “what’s in my reads but not in my assembly”. (See http://ivory.idyll.org/blog/2015-wok-notes.html for details.)
  • 15. Rifle: Low coverage? (Yes.) Assembly starts to work @ ~10x
  • 16. Rifle: strain variation? (Maybe.) A typical subalignment between all short reads & one long read:
      ATTCGTCGATTGGCAAAAGTTCTTTCCAGAGCCTACGGGAGAAGTGTA
      |||||||||||||||||||||||||||||||| |||||||||||||||
      ATTCGTCGATTGGCAAAAGTTCTTTCCAGAGCTTACGGGAGAAGTGTA

      GTCAAAATAAGGTGAGGTTGCTAATCCTCGAACTTTTCAC
      ||||||||||||||| |||||||| ||||||| |||||||
      GTCAAAATAAGGTGAAGTTGCTAACCCTCGAATTTTTCAC
      If we saw many medium-high coverage alignments with this level of variation, => strain variation.
  • 17. My thoughts on metagenome assembly & Rifle data: The Rifle short-read data is low coverage, based on both indirect (in paper) and direct (our) observations. This is the first reason why it didn’t assemble well. Strain variation is also present, within the limits of low coverage analysis. That will cause problems in future => Your methods limit and bias your results.
  • 18. The problem: Assembly graphs are coming to all of genomics. Because they are fundamentally different, they require a completely new bioinformatics tool chain. (They don’t use FASTA…) For better or for worse, we bioinformaticians are not going to write tools that are easy to use. It’s hard; there’s little incentive; the tool/application needs are incredibly
  • 19. Who ya gonna call?? …to do your bioinformatics?
  • 20. Choices (1) Focus on biology and avoid computation as much as possible. (2) Integrate large scale data analysis into your biology. (3) Become purely computational https://commons.wikimedia.org/wiki/File:Three_options_-_three_choices_scheme.png
  • 21. Choices (1) Focus on biology and avoid computation as much as possible. (2) Integrate large scale data analysis into your biology. (3) Become purely computational.
  • 22. Towards a “bioinformatics middle class” Most bioinformaticians are quite ignorant of the biology you’re doing; biologists are often more aware of the bioinformatics they’re using. There is amazing opportunity at the intersection of biology and computing. I think of it as a “bioinformatics middle class” – biologists who are comfortable with computing, and deploy large scale data analysis in the service of their biological work.
  • 23. Towards a “bioinformatics middle class” We need many more biologists who have an intuitive & deep understanding of the computing. Such people are rare, and there is no defined “pipeline” for them. Training must be self-motivated. (And higher ed has really abdicated its responsibilities in this area.)
  • 24. My top four suggestions (more at end) 1. Don’t avoid computing; embrace it. 2. Invest in the Internet and social media (blogs, Twitter) – seqanswers, biostars, etc. 3. Be patient and aware of the time it takes to cross-train effectively. 4. Seek out formal training opportunities.
  • 25. If you’re a senior scientist, or know any: Ask them to lobby for funding at this intersection. Ask them to lobby for good (nay, excellent) funding for training opportunities. Make sure they respect the challenges and opportunities of large scale data analysis and modeling (along with those who do it).
  • 26. Career benefits of doing large-scale data analysis. Alternative career paths (i.e. “jobs actually exist in this area.”) Flexibility in work hours & location. Work with an even broader diversity of people and projects.
  • 27. Dangers: It’s easy to get caught up in the computing and ignore the biology! …but right now training & culture are tilted too much towards experimental and field research, which presents its own problems in a data-intensive era of research.
  • 29. Where am I going? Data integration. Figure 2. Summary of challenges associated with the data integration in the proposed project. Figure via E. Kujawinski
  • 30. An optimistic message This is a great time to be alive and doing research! We can look at & try to understand environmental microbes with many new tools and new approaches! The skills you need to do this extend across disciplines, across the public and private sectors, and cannot be automated or outsourced!
  • 31. Thank you for listening! I’ll be here all week; really looking forward to it!
  • 32. More advice. Don’t avoid computing; teach and train what you do know; put together classes and workshops; host and run Software and Data Carpentry workshops, then put together more advanced workshops; do hackathons or compute-focused events where you just sit down in groups and work on data analysis (push admin to support all this, or just do it without your admin); invest in the Internet and social media (blogs, Twitter, biostars…); take a CS prof to lunch, seek joint funding, do a sabbatical in a purely compute lab, etc.; support open source bioinformatics software; invest in reproducibility; be aware that compute people’s time is as oversubscribed as yours, or more so, and prospectively value it.
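
As a small illustration of the assembly-graph idea from slides 13 and 16, here is a toy sketch, not the tooling the talk refers to. It builds a minimal de Bruijn-style graph from the first pair of aligned sequences on slide 16 and reports where the graph branches. The function names and the choice of k=21 are assumptions made for this example; real metagenome graphs also have to handle reverse complements, sequencing errors, and coverage, all of which are ignored here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_points(graph):
    """Return nodes with more than one successor, i.e. where the reads disagree."""
    return {node: sorted(nexts) for node, nexts in graph.items() if len(nexts) > 1}


# First pair of aligned sequences from slide 16; they differ at one base (C vs T).
seq_a = "ATTCGTCGATTGGCAAAAGTTCTTTCCAGAGCCTACGGGAGAAGTGTA"
seq_b = "ATTCGTCGATTGGCAAAAGTTCTTTCCAGAGCTTACGGGAGAAGTGTA"

# k=21 is an arbitrary choice for this toy example.
graph = build_de_bruijn([seq_a, seq_b], k=21)
branches = branch_points(graph)

for node, successors in branches.items():
    # The last base of each successor is the variant base at the branch.
    print(node, "->", [s[-1] for s in successors])
```

The graph keeps both variants as separate paths leaving one shared node. An assembler that must emit a single sequence has to pick one path or break the contig there, which is how polymorphic data ends up discarded, as slide 11 describes.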

Editor's notes

  1. Tweet; funding; affiliation.
  2. Nothing in this life that is worth doing is *easy*.