FIND MEANING IN COMPLEXITY
© Copyright 2013 by Pacific Biosciences of California, Inc. All rights reserved.
Jane Landolin
Wednesday, January 21st
Open Data: PacBio Long Reads for Model Organisms
Best Balti in the Bay Area?
Little Delhi
• Bad neighborhood, good food
• San Francisco “Tenderloin”
• 83 Eddy St
• (415) 398-3173
Amber India
• Unlimited Balti Buffet
• Mountain View
• Palo Alto
• San Jose
• San Francisco
Bioinformatics Support
• Bioinformatics Scientist in the Customer Support group
• Focus on enabling our customers
• Installing and using SMRT Analysis software
• Interpreting data
• Experimental design
• Referencing third-party tools
• Customer Portal/Github Issues
• We support everyone with PacBio Data
http://marketanalysts.lifescienceexecutive.com/blog/?p=1510
Mini Survey: Customer Service and Technical Support for Life Science Products Courtesy of BioInformatics, LLC June 2013
How important is the quality of technical
support in your decision to purchase a
new instrument?
Agenda
• Paper summary
• How to download the data
• DNA & Sample Prep
• Quality filter & technical validation
• Summary Statistics
Goal: Instructions to enable users
• Analysis and Assembly
• S. cerevisiae
• Neurospora
• Drosophila
• MHAP + Human
• Thoughts on open data
http://www.nature.com/articles/sdata201445
Summary
• bioRxiv preprint first released on August 15, 2014
• Published online in Scientific Data on Nov 25, 2014 (4 months later)
• Data released without restriction
• Data released at NCBI SRA (.sra and .fastq formats) and Amazon S3 (.h5 format)
• Five model organisms (E. coli, S. cerevisiae, N. crassa, A. thaliana, D. melanogaster)
• Eight datasets
• 55.8 gigabases of filtered sequence (adapters and low-quality sequence removed)
How to Download Data (Supplementary Table 1)
Data & Technology Release Timelines (Newest P6C4)
[Figure: average read length (bp), axis 0 to 14,000, by year from 2008 to 2015, with the bioRxiv preprint and the publication marked. Chemistries shown: early PacBio® chemistries (labeled 453, 1012, 1734), LPR, FCR, ECR2, C2–C2, P4–C2, P5–C3, P6–C4.]
Half of data in reads: >14,000 bp
Average read length: 10,000 - 15,000 bp
Consensus accuracy: Achieves QV50 @30X
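To make the QV50 figure concrete: quality values are Phred-scaled, so QV = -10 * log10(error probability). The sketch below converts between QV and per-base accuracy. The function names are illustrative and are not part of SMRT Analysis.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy."""
    return 1.0 - 10 ** (-qv / 10.0)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (strictly between 0 and 1) to a Phred-scaled QV."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


# QV50 corresponds to about one error per 100,000 bases (99.999% accuracy).
qv50_accuracy = qv_to_accuracy(50)

# 99.99% accuracy corresponds to roughly QV40.
qv_for_four_nines = accuracy_to_qv(0.9999)
```

On this scale every 10 QV points is one more "nine" of accuracy.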
DNA Prep is Lab & Organism-specific
Get large fragments of DNA:
- gentle handling of DNA
- sequence right after prep
- minimal freeze-thaws
- Blue Pippin size selection
Remove Contaminants
- CTAB
- CsCl
- RNase
- Phenol Chloroform
- Ampure bead cleanup
[Gel image: lanes S1, S2, S3; size markers at 40 kb, 20 kb, and 15 kb]
Quality Filtering
• In SMRT Sequencing, we typically have high yields after quality filtering
• On average, 95% of bases are high-quality bases and pass quality filtering
• All high-quality samples retained 90-97% of the bases after filtering
(E. coli, A. thaliana, D. melanogaster)
o Retain high-quality (HQ) regions, remove others
o Remove adapter sequences between subreads
Quality Filtering Statistics
Mapping and Coverage Statistics
• Subreads are mapped to available reference
- In some cases, reference is not the same strain
- Results are typical of SMRT Sequencing
- concordance includes indels and mismatches
- mode at 86%
• De novo assemblies achieve > 99.99% consensus accuracy
• Coverage is even along entire genome
- Expected distribution of coverage
- Least bias
- Random profile
• Mapping artifacts reflect poor quality of reference genome, not sequence data
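The slide says only that concordance includes indels and mismatches. One common convention, assumed here and not confirmed by the slides, is matches divided by all aligned columns (matches + mismatches + insertions + deletions). A minimal sketch with hypothetical counts:

```python
def concordance(matches: int, mismatches: int, insertions: int, deletions: int) -> float:
    """Fraction of alignment columns that are matches.

    Assumed definition: every mismatch, inserted base, and deleted base
    counts against the subread.
    """
    aligned_columns = matches + mismatches + insertions + deletions
    if aligned_columns == 0:
        raise ValueError("alignment has no columns")
    return matches / aligned_columns


# Hypothetical subread alignment: 860 matching columns out of 1,000.
example = concordance(matches=860, mismatches=30, insertions=70, deletions=40)
# example is 0.86, i.e. the 86% mode quoted above
```

This is single-pass subread concordance. The consensus accuracy quoted for de novo assemblies is a different, much higher number, because consensus is called from many overlapping reads.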
New Analysis and Genome Assemblies
PacBio-only De Novo Sequencing of Yeast
[Figure: chromosomes I to XVI and M; scale bar: 100 kb]
Reference (S288C): 17 chromosomes
• Genome size = 12.3 Mb
• N50 = 950 kb
• Max chrom = 1.5 Mb (chr. IV)
HGAP de novo assembly : 30 contigs
• Assembly size = 12.3 Mb
• N50 = 770 kb
• Max contig = 1.5 Mb (chr. IV)
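N50 is the length L such that sequences of length L or longer contain at least half of the total bases. A minimal sketch of that calculation follows. The example lengths are hypothetical and are not the actual yeast contigs.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or read) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that takes the running sum past half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths in kb: total 3,920, half 1,960.
# 1,500 + 950 = 2,450 >= 1,960, so the N50 is 950.
example_n50 = n50([1500, 950, 770, 400, 200, 100])
```

The same function applies to read lengths: "half of data in reads longer than X" is a read-length N50.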
Neurospora HGAP assembly fills 356 gaps (only 4 left)
Chromosome (reference scaffold) | Length | # gaps | Assembled contigs | # gaps
Supercontig 12.1 | 9.7 Mb | 89 | Contig 54 (6.6 Mb), Contig 63 (3.4 Mb) | 1
Supercontig 12.2 | 4.5 Mb | 56 | Contig 60 (4.5 Mb) | 0
Supercontig 12.3 | 5.3 Mb | 45 | Contig 59 (5.3 Mb) | 0
Supercontig 12.4 | 6.0 Mb | 47 | Contig 57 (6.2 Mb), Contig 58 (12 kb) | 1
Supercontig 12.5 | 6.4 Mb | 42 | Contig 56 (6.4 Mb) | 0
Supercontig 12.6 | 4.2 Mb | 38 | Contig 62 (4.3 Mb) | 0
Supercontig 12.7 | 4.3 Mb | 43 | Contig 69 (2.6 Mb), Contig 70 (1.7 Mb), Contig 61 (20 kb) | 2
• Added >0.5 Mb of sequence
http://figshare.com/articles/ENCODE_like_study_using_PacBio_sequencing/928630
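The headline numbers can be checked from the per-supercontig gap counts in the table: 360 gaps in the reference scaffolds, 4 remaining in the assembly, so 356 filled. A short sketch of that arithmetic, with the values transcribed from the table:

```python
# Gap counts per reference supercontig, transcribed from the table.
reference_gaps = {
    "12.1": 89, "12.2": 56, "12.3": 45, "12.4": 47,
    "12.5": 42, "12.6": 38, "12.7": 43,
}

# Gaps remaining between the assembled contigs covering each supercontig.
assembly_gaps = {
    "12.1": 1, "12.2": 0, "12.3": 0, "12.4": 1,
    "12.5": 0, "12.6": 0, "12.7": 2,
}

total_reference_gaps = sum(reference_gaps.values())   # 360
total_assembly_gaps = sum(assembly_gaps.values())     # 4
gaps_filled = total_reference_gaps - total_assembly_gaps  # 356
```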
Telomere-to-Telomere Assembly!
Boundary of 276 kb centromere
4.5 Mb chromosome captured in contig 60
Drosophila Assembly (~160 Mb)
Chromosome | Reference genome | De novo assembly
chr2L | 6 pieces | 4-6 pieces
chr2R | 27 pieces | 2 pieces
chr3L | 22 pieces | 1 piece
chr3R | 15 pieces | 3 pieces
chr4 | 2 pieces | 3 pieces
chrX | 3 pieces | 42 pieces
Reference genome:
• 10+ years
• Shotgun Sanger + BAC + OpGen + manual finishing
• $millions$
De novo assembly:
• 1 week: collect DNA
• 1 week: sample prep
• 6 days: sequencing
• 3 weeks: assembly
• $9,000
24.6 MB!!
Drosophila Y Chromosome
Release 5 reference contains
~1% of chromosome Y
This assembly: >50%
Drosophila Assembly (vs. Synthetic reads)
Metric | Assembly I (FALCON) | Assembly II (Celera Assembler + PBcR) | Moleculo (Celera Assembler)
Number of contigs | 434 | 128 | 5,066
N50 length | 5.0 Mb | 15.3 Mb | 0.1 Mb
http://biorxiv.org/content/early/2014/06/17/001834
“By directly sequencing long
molecules, these third-generation
technologies will likely outperform
TruSeq synthetic long-reads in certain
capacities, such as assembly contiguity
enabled by homogeneous genome
coverage. Indeed, preliminary results
from the assembly of a different
substrain of D. melanogaster using
corrected PacBio data achieved an N50
contig length of 15.3 Mbp and closed two
of the remaining gaps in the euchromatin
of the Release 5 reference sequence
(Landolin, et al., 2014,
http://dx.doi.org/10.6084/m9.figshare.976097).”
Completely spans repeat elements
[Figure: read alignments over chr2L:922,441-1,013,372. Tracks: PacBio reads, Moleculo synthetic reads, repeats. Annotations: 6 kb roo elements, simple repeats, stacked reads, no reads.]
Resolved roo TEs: PacBio 96%, Moleculo 5.2%
Sequences through GC-rich regions
[Figure: read alignments over chr2L:1,714,784-1,741,283. Tracks: PacBio, GC percent, Moleculo synthetic reads.]
Advances in PacBio-only De Novo Assembly
[Timeline figure, 2013 to 2014]
• Bacteria: finished genomes
• Yeast (12 Mb): resolve most chromosomes
• Spinach (1 Gb): contig N50 531 kbp
• Drosophila (170 Mb): contig N50 4.5 Mbp
• Arabidopsis (120 Mb): contig N50 7.1 Mbp
• Human (3.2 Gb): contig N50 4.4 Mbp, max = 44 Mbp (assembly powered by Google® Exacycle)
“Haploid” Assemblies
Next Challenge: Diploid Assemblies
MinHash Alignment Process (MHAP)
http://biorxiv.org/content/early/2014/08/14/008003
For D. melanogaster, MHAP achieved a 600-fold speedup relative to prior methods and a cloud computing cost of a few hundred dollars.
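The idea behind MinHash can be sketched in a few lines: summarize each read's k-mer set by the minimum value under each of several hash functions, then estimate the Jaccard similarity of two reads as the fraction of sketch positions that agree. This is a textbook illustration, not the MHAP implementation. The k-mer size, sketch size, hash function, and test sequences are all assumptions chosen for the example.

```python
import hashlib
import random


def kmers(seq: str, k: int) -> set:
    """Return the set of k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    """64-bit hash of a k-mer; the seed selects one of many hash functions."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq: str, k: int = 8, num_hashes: int = 128) -> list:
    """Keep the minimum hash value per hash function over all k-mers."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a: list, sketch_b: list) -> float:
    """Fraction of sketch positions that agree estimates Jaccard similarity."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    return sum(a == b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)


# Two simulated reads drawn from the same random "genome" with a 160 bp overlap.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(240))
read_a, read_b = genome[:200], genome[40:]
overlap_estimate = estimate_jaccard(minhash_sketch(read_a), minhash_sketch(read_b))
```

Comparing fixed-size sketches is far cheaper than aligning every pair of long reads, so sketching works as a fast filter for candidate overlaps before any detailed alignment.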
Public Genome Assembly Tools (blog/preprint)
• Dazzler
- Gene Myers, U. Dresden
- Benchmarking on H. sapiens
- Distributed filesystem (GlusterFS) to optimize read/write I/O operations
- New data structures to minimize data loading/memory burden (.qva, DAM)
- Blog: https://dazzlerblog.wordpress.com/
- Code: https://github.com/thegenemyers/DALIGNER
• ECtools
- Mike Schatz, CSHL
- Benchmarking on E. coli, S. cerevisiae, A. thaliana, O. sativa
- Hybrid Assembly
- Support Vector Regression
- Preprint: http://schatzlab.cshl.edu/data/ectools/AssemblyComplexity.pdf
- Code: https://github.com/jgurtowski/ectools
Human PacBio Data
• Resolved >26,000 euchromatic structural variants at the base-pair level
• ~22,000 (85%) of these are novel
• Closes/extends 55% of the remaining gaps in human reference genome
Chaisson et al. (2014) Nature doi:10.1038/nature13907
PacBio Data vs. GRCh37 & 1000 Genomes Project
Chaisson et al. (2014) Nature doi:10.1038/nature13907
Gonzaga-Jauregui et al. (2012) Annu Rev Med 63: 35-61
Genomic Variation Detection by 2nd Gen Technologies
“Approximately 35% of the genes in the human genome are
encompassed either totally or partially by a CNV that can alter their
expression or even their structure, possibly giving rise to novel fusion
transcripts.”
“Detection of structural variation is imperative in any WGS study.”
Behavioral Diseases Associated with Structural Variation
Girirajan & Eichler (2010) Human Molecular Genetics 19: R176-187
Accelerating discovery in open-access/preprint world
Data Release Paper:
bioRxiv preprint: http://biorxiv.org/content/early/2014/10/23/008037
Publication: http://www.nature.com/articles/sdata201445
Neurospora:
Poster: http://figshare.com/articles/ENCODE_like_study_using_PacBio_sequencing/928630
Publication: In the works
Drosophila:
Poster: http://figshare.com/articles/A_better_Drosophila_Melanogaster_genome_by_long_read_sequencing/976097
Publication: In the works
Moleculo Synthetic Reads Paper:
bioRxiv preprint: http://biorxiv.org/content/early/2014/01/19/001834
MinHash Alignment Process (MHAP) Paper:
bioRxiv preprint: http://biorxiv.org/content/early/2014/08/14/008003
Publication: Accepted
Human Structural Complexity Paper:
Publication: http://www.nature.com/nature/journal/vaop/ncurrent/full/nature13907.html
What’s next for PacBio
Data
• https://github.com/PacificBiosciences/DevNet/wiki/Datasets
o P6C4 C. elegans 40X dataset
o HLA Multiplexed GenDx Amplicon
Software/Analysis
• Scaling with increasing platform throughput and providing faster time to results
• De novo assembly for larger genomes
• Diploid Genome Assembler
• Regional methylation analysis for large genomes
• Intuitive and easy to use Graphical User Interface (SMRT® Portal)
Performance
Release | Estimated output per SMRT® Cell | Read length: avg | Read length: N50 (bases) | Read length: max
Jan 2013 | ~100 Mb | 4,500 bp | 6 kb | >20 kb
Oct 2013 (P5-C3) | ~400 Mb | 8,500 bp | 10 kb | >40 kb
Oct 2014 (P6-C4) | 500 Mb to 1 Gb | 10-15 kb | 12-18 kb | >60 kb
Active Loading, Template Prep, & Read Length Improvements | 2 to 4 Gb | 15-20 kb | 17-23 kb | >60 kb
2015 roadmap (http://blog.pacificbiosciences.com/)
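For planning purposes, the output-per-cell figures translate into a rough count of SMRT Cells for a target coverage: cells = ceil(genome size × coverage / output per cell). The sketch below is a back-of-the-envelope estimate under stated assumptions. The 30X target is illustrative, the function name is hypothetical, and the estimate ignores run-to-run variation and any losses from filtering.

```python
import math


def smrt_cells_needed(genome_size_bp: int, target_coverage: float,
                      output_per_cell_bp: int) -> int:
    """Rough number of SMRT Cells needed to reach a target coverage."""
    if output_per_cell_bp <= 0:
        raise ValueError("output per cell must be positive")
    return math.ceil(genome_size_bp * target_coverage / output_per_cell_bp)


# Drosophila (~160 Mb) at an assumed 30X target, using the Oct 2014 (P6-C4)
# output range of 500 Mb to 1 Gb per cell.
cells_low_yield = smrt_cells_needed(160_000_000, 30, 500_000_000)     # 10
cells_high_yield = smrt_cells_needed(160_000_000, 30, 1_000_000_000)  # 5
```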
Thank you!
PacBio
Kristi Spittle-Kim, Primo Babayan, Paul Peluso, David Rank, Jonas Korlach
Collaborators
Casey Bergman, Sue Celniker, Jane Yeadon, David Catcheside, Joachim Li
Community
Lex Nederbragt, Konstantin Berlin, Sergey Koren, Adam Phillippy, Gene Myers,
Mike Schatz, Nick Loman

Weitere ähnliche Inhalte

Was ist angesagt?

NGx Sequencing 101-platforms
NGx Sequencing 101-platformsNGx Sequencing 101-platforms
NGx Sequencing 101-platformsAllSeq
 
Combining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assemblyCombining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assemblyLex Nederbragt
 
Jan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carrollJan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carrollGenomeInABottle
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorialc.titus.brown
 
Generating high-quality reference human genomes using PromethION nanopore seq...
Generating high-quality reference human genomes using PromethION nanopore seq...Generating high-quality reference human genomes using PromethION nanopore seq...
Generating high-quality reference human genomes using PromethION nanopore seq...Miten Jain
 
40 Years of Genome Assembly: Are We Done Yet?
40 Years of Genome Assembly: Are We Done Yet?40 Years of Genome Assembly: Are We Done Yet?
40 Years of Genome Assembly: Are We Done Yet?Adam Phillippy
 
Aug2015 analysis team spiral genetics
Aug2015 analysis team spiral geneticsAug2015 analysis team spiral genetics
Aug2015 analysis team spiral geneticsGenomeInABottle
 
Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Li Shen
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Manikhandan Mudaliar
 
Aug2015 analysis team 04 10x genomics
Aug2015 analysis team 04 10x genomicsAug2015 analysis team 04 10x genomics
Aug2015 analysis team 04 10x genomicsGenomeInABottle
 
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...Torsten Seemann
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopMorgan Langille
 
Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishingNikolay Vyahhi
 
Jan2016 bio nano han cao
Jan2016 bio nano han caoJan2016 bio nano han cao
Jan2016 bio nano han caoGenomeInABottle
 
Previewing GRCm39: Assembly Updates from the GRC
Previewing GRCm39: Assembly Updates from the GRCPreviewing GRCm39: Assembly Updates from the GRC
Previewing GRCm39: Assembly Updates from the GRCGenome Reference Consortium
 

Was ist angesagt? (20)

NGx Sequencing 101-platforms
NGx Sequencing 101-platformsNGx Sequencing 101-platforms
NGx Sequencing 101-platforms
 
Combining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assemblyCombining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assembly
 
Jan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carrollJan2016 dnanexus giab uses andrew carroll
Jan2016 dnanexus giab uses andrew carroll
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial
 
Generating high-quality reference human genomes using PromethION nanopore seq...
Generating high-quality reference human genomes using PromethION nanopore seq...Generating high-quality reference human genomes using PromethION nanopore seq...
Generating high-quality reference human genomes using PromethION nanopore seq...
 
40 Years of Genome Assembly: Are We Done Yet?
40 Years of Genome Assembly: Are We Done Yet?40 Years of Genome Assembly: Are We Done Yet?
40 Years of Genome Assembly: Are We Done Yet?
 
Aug2015 analysis team spiral genetics
Aug2015 analysis team spiral geneticsAug2015 analysis team spiral genetics
Aug2015 analysis team spiral genetics
 
Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
 
Aug2015 analysis team 04 10x genomics
Aug2015 analysis team 04 10x genomicsAug2015 analysis team 04 10x genomics
Aug2015 analysis team 04 10x genomics
 
ChIP-seq Theory
ChIP-seq TheoryChIP-seq Theory
ChIP-seq Theory
 
Ngs intro_v6_public
 Ngs intro_v6_public Ngs intro_v6_public
Ngs intro_v6_public
 
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...De novo genome assembly  - T.Seemann - IMB winter school 2016 - brisbane, au ...
De novo genome assembly - T.Seemann - IMB winter school 2016 - brisbane, au ...
 
GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
 
2015 03 13_puurs_v_public
2015 03 13_puurs_v_public2015 03 13_puurs_v_public
2015 03 13_puurs_v_public
 
Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishing
 
Jan2016 bio nano han cao
Jan2016 bio nano han caoJan2016 bio nano han cao
Jan2016 bio nano han cao
 
Mane v2 final
Mane v2 finalMane v2 final
Mane v2 final
 
Schneider grc workshop_final
Schneider grc workshop_finalSchneider grc workshop_final
Schneider grc workshop_final
 
Previewing GRCm39: Assembly Updates from the GRC
Previewing GRCm39: Assembly Updates from the GRCPreviewing GRCm39: Assembly Updates from the GRC
Previewing GRCm39: Assembly Updates from the GRC
 

Ähnlich wie Open pacbiomodelorgpaper j_landolin_20150121

BioSB meeting 2015
BioSB meeting 2015BioSB meeting 2015
BioSB meeting 2015hansjansen9999
 
Bda2015 tutorial-part2-data&databases
Bda2015 tutorial-part2-data&databasesBda2015 tutorial-part2-data&databases
Bda2015 tutorial-part2-data&databasesInterpretOmics
 
CRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowCRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowHorizonDiscovery
 
whole-genome-sequencing-guide-small-genomes.pdf.pdf
whole-genome-sequencing-guide-small-genomes.pdf.pdfwhole-genome-sequencing-guide-small-genomes.pdf.pdf
whole-genome-sequencing-guide-small-genomes.pdf.pdfCRISTIANALONSORODRIG1
 
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Surya Saha
 
Generations of sequencing technologies.
Generations of sequencing technologies. Generations of sequencing technologies.
Generations of sequencing technologies. ShadenAlharbi
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_coursehansjansen9999
 
DNA sequencing: rapid improvements and their implications
DNA sequencing: rapid improvements and their implicationsDNA sequencing: rapid improvements and their implications
DNA sequencing: rapid improvements and their implicationsJeffrey Funk
 
China Medical University Student ePaper2
China Medical University Student ePaper2China Medical University Student ePaper2
China Medical University Student ePaper2Isabelle Chiu
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisUniversity of California, Davis
 
Rewriting the Genome Using CRISPR and Synthetic Biology
Rewriting the Genome Using CRISPR and Synthetic Biology Rewriting the Genome Using CRISPR and Synthetic Biology
Rewriting the Genome Using CRISPR and Synthetic Biology Integrated DNA Technologies
 
Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...QBiC_Tue
 
140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposalGenomeInABottle
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbioc.titus.brown
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesGuy Coates
 

Ähnlich wie Open pacbiomodelorgpaper j_landolin_20150121 (20)

BioSB meeting 2015
BioSB meeting 2015BioSB meeting 2015
BioSB meeting 2015
 
Bda2015 tutorial-part2-data&databases
Bda2015 tutorial-part2-data&databasesBda2015 tutorial-part2-data&databases
Bda2015 tutorial-part2-data&databases
 
Introduction to 16S Microbiome Analysis
Introduction to 16S Microbiome AnalysisIntroduction to 16S Microbiome Analysis
Introduction to 16S Microbiome Analysis
 
CRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowCRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and How
 
whole-genome-sequencing-guide-small-genomes.pdf.pdf
whole-genome-sequencing-guide-small-genomes.pdf.pdfwhole-genome-sequencing-guide-small-genomes.pdf.pdf
whole-genome-sequencing-guide-small-genomes.pdf.pdf
 
Thesis biobix
Thesis biobixThesis biobix
Thesis biobix
 
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
 
Generations of sequencing technologies.
Generations of sequencing technologies. Generations of sequencing technologies.
Generations of sequencing technologies.
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_course
 
DNA sequencing: rapid improvements and their implications
DNA sequencing: rapid improvements and their implicationsDNA sequencing: rapid improvements and their implications
DNA sequencing: rapid improvements and their implications
 
Cloud bioinformatics 2
Cloud bioinformatics 2Cloud bioinformatics 2
Cloud bioinformatics 2
 
China Medical University Student ePaper2
China Medical University Student ePaper2China Medical University Student ePaper2
China Medical University Student ePaper2
 
26072016 uc davis_small
26072016 uc davis_small26072016 uc davis_small
26072016 uc davis_small
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Rewriting the Genome Using CRISPR and Synthetic Biology
Rewriting the Genome Using CRISPR and Synthetic Biology Rewriting the Genome Using CRISPR and Synthetic Biology
Rewriting the Genome Using CRISPR and Synthetic Biology
 
Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...
 
Variant analysis and whole exome sequencing
Variant analysis and whole exome sequencingVariant analysis and whole exome sequencing
Variant analysis and whole exome sequencing
 
140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciences
 

KĂźrzlich hochgeladen

Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSĂŠrgio Sacani
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSĂŠrgio Sacani
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSĂŠrgio Sacani
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSĂŠrgio Sacani
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSĂŠrgio Sacani
 

KĂźrzlich hochgeladen (20)

Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...

Open pacbiomodelorgpaper j_landolin_20150121

  • 1. FIND MEANING IN COMPLEXITY © Copyright 2013 by Pacific Biosciences of California, Inc. All rights reserved. Jane Landolin Wednesday, January 21st Open Data: PacBio Long Reads for Model Organisms
  • 2. Best Balti in Bay Area? Little Delhi • Bad neighborhood, good food • San Francisco “Tenderloin” • 83 Eddy St • (415) 398-3173 Amber India • Unlimited Balti Buffet • Mountain View • Palo Alto • San Jose • San Francisco
  • 3. Bioinformatics Support • Bioinformatics Scientist in the Customer Support group • Focus on enabling our customers • Installing and using SMRT Analysis software • Interpreting data • Experimental design • Referencing third-party tools • Customer Portal/GitHub Issues • We support everyone with PacBio Data http://marketanalysts.lifescienceexecutive.com/blog/?p=1510 Mini Survey: Customer Service and Technical Support for Life Science Products, courtesy of BioInformatics, LLC, June 2013: How important is the quality of technical support in your decision to purchase a new instrument?
  • 4. Agenda • Paper summary • How to download the data • DNA & Sample Prep • Quality filter & technical validation • Summary Statistics Goal: Instructions to enable users • Analysis and Assembly • S. cerevisiae • Neurospora • Drosophila • MHAP + Human • Thoughts on open data http://www.nature.com/articles/sdata201445
  • 5. Summary • bioRxiv paper first released on August 15, 2014 • Published online in Scientific Data on Nov 25, 2014 (4 months later) • Data released without restriction • Data released at NCBI SRA (.sra .fastq format) & Amazon S3 (.h5 format) • Five Model Organisms (E. coli, S. cerevisiae, N. crassa, A. thaliana, D. melanogaster) • Eight datasets • 55.8 gigabases of filtered sequence (adapters and low-quality sequence removed)
  • 6. How to Download Data (Supplementary Table 1)
  • 7. Data & Technology Release Timelines (Newest P6C4) [Figure: average read length (bp), axis 0 to 14,000, by year, 2008 to 2015, for chemistries LPR, FCR, ECR2, C2–C2, P4–C2, P5–C3, P6–C4; early PacBio® chemistries labeled 453, 1012, 1734; preprint (bioRxiv) and publication dates marked] • Half of data in reads: >14,000 bp • Average read length: 10,000 - 15,000 bp • Consensus accuracy: achieves QV50 at 30X
  • 8. DNA Prep is Lab & Organism-specific • Get large fragments of DNA: - gentle handling of DNA - sequence right after prep - minimal freeze-thaws - Blue Pippin size selection • Remove contaminants: - CTAB - CsCl - RNase - Phenol Chloroform - Ampure bead cleanup [Figure labels: 40 kb, 20 kb, 15 kb; S1, S2, S3]
  • 9. Quality Filtering • In SMRT Sequencing, we typically have high yields after quality filtering • On average, 95% of bases are high quality bases and pass quality filtering • All high-quality samples retained 90-97% of the bases after filtering (E. coli, A. thaliana, D. melanogaster) o Retain high-quality (HQ) regions, remove others o Remove adapter sequences between subreads
  • 11. Mapping and Coverage Statistics • Subreads are mapped to available reference - In some cases, reference is not the same strain - Results are typical of SMRT Sequencing - concordance includes indels and mismatches - mode at 86% • De novo assemblies achieve >99.99% consensus accuracy • Coverage is even along entire genome - Expected distribution of coverage - Least bias - Random profile • Mapping artifacts reflect poor quality of reference genome, not sequence data
  • 12. New Analysis and Genome Assemblies
  • 13. PacBio-only De Novo Sequencing of Yeast [Figure: chromosomes I to XVI and M; scale bar 100 kb] Reference (S288C): 17 chromosomes • Genome size = 12.3 Mb • N50 = 950 kb • Max chrom = 1.5 Mb (chr. IV) HGAP de novo assembly: 30 contigs • Assembly size = 12.3 Mb • N50 = 770 kb • Max contig = 1.5 Mb (chr. IV)
  • 14. Neurospora HGAP assembly fills 356 gaps (only 4 left) • Added >0.5 Mb of sequence • Table columns: chromosome (reference scaffold), length, # gaps in reference, assembled contigs, # gaps remaining • Supercontig 12.1: 9.7 Mb, 89 gaps; Contig_54 (6.6 Mb), Contig_63 (3.4 Mb); 1 gap • Supercontig 12.2: 4.5 Mb, 56 gaps; Contig 60 (4.5 Mb); 0 gaps • Supercontig 12.3: 5.3 Mb, 45 gaps; Contig 59 (5.3 Mb); 0 gaps • Supercontig 12.4: 6.0 Mb, 47 gaps; Contig 57 (6.2 Mb), Contig 58 (12 kb); 1 gap • Supercontig 12.5: 6.4 Mb, 42 gaps; Contig 56 (6.4 Mb); 0 gaps • Supercontig 12.6: 4.2 Mb, 38 gaps; Contig 62 (4.3 Mb); 0 gaps • Supercontig 12.7: 4.3 Mb, 43 gaps; Contig 69 (2.6 Mb), Contig 70 (1.7 Mb), Contig 61 (20 kb); 2 gaps http://figshare.com/articles/ENCODE_like_study_using_PacBio_sequencing/928630
  • 15. Telomere-to-Telomere Assembly! • Boundary of 276 kb centromere • 4.5 Mb chromosome captured in contig 60
  • 16. Drosophila Assembly (~160 Mb) • Pieces per chromosome arm (reference genome / de novo assembly): chr2L 6 / 4-6; chr2R 27 / 2; chr3L 22 / 1; chr3R 15 / 3; chr4 2 / 3; chrX 3 / 42 • Reference genome: 10+ years, shotgun Sanger + BAC + OpGen + manual finishing, $millions$ • De novo assembly: 1 week – collect DNA, 1 week – sample prep, 6 days – sequencing, 3 weeks – assembly, $9,000 • 24.6 MB!!
  • 17. Drosophila Y Chromosome • Release 5 reference contains ~1% of chromosome Y • This assembly: >50%
  • 18. Drosophila Assembly (vs. Synthetic reads) • Number of contigs: Assembly I (FALCON) 434; Assembly II (Celera Assembler + PBcR) 128; Moleculo (Celera Assembler) 5,066 • N50 length: Assembly I 5.0 Mb; Assembly II 15.3 Mb; Moleculo 0.1 Mb http://biorxiv.org/content/early/2014/06/17/001834 “By directly sequencing long molecules, these third-generation technologies will likely outperform TruSeq synthetic long-reads in certain capacities, such as assembly contiguity enabled by homogeneous genome coverage. Indeed, preliminary results from the assembly of a different substrain of D. melanogaster using corrected PacBio data achieved an N50 contig length of 15.3 Mbp and closed two of the remaining gaps in the euchromatin of the Release 5 reference sequence (Landolin, et al., 2014, http://dx.doi.org/10.6084/m9.figshare.976097).”
  • 19. Completely spans repeat elements [Figure: chr2L:922,441-1,013,372 with tracks for PacBio reads, Moleculo synthetic reads, and repeats; labels: 6 kb roo elements, stacked reads, no reads, simple repeats] • % resolved roo TEs: PacBio 96, Moleculo 5.2
  • 20. Sequences through GC-rich regions [Figure: tracks for PacBio reads, GC percent, and Moleculo synthetic reads at chr2L:1,714,784-1,741,283]
  • 21. Advances in PacBio-only De Novo Assembly • Spinach 1G: contig N50 531 kbp • Drosophila 170M: contig N50 4.5 Mbp • Arabidopsis 120M: contig N50 7.1 Mbp • Human 3.2G: contig N50 4.4 Mbp, max = 44 Mbp (assembly powered by Google® Exacycle) [Timeline: 2013 to 2014] • Bacteria: finished genomes • Yeast 12M: resolve most chromosomes • “Haploid” assemblies • Next challenge: diploid assemblies
  • 22. MinHash Alignment Process (MHAP) http://biorxiv.org/content/early/2014/08/14/008003 For D. melanogaster, MHAP achieved a 600-fold speedup relative to prior methods and a cloud computing cost of a few hundred dollars.
  • 23. Public Genome Assembly Tools (blog/preprint) • Dazzler – Gene Myers, U. Dresden - Benchmarking on H. sapiens - Distributed filesystem (GlusterFS) to optimize read/write I/O operations - New data structures to minimize data loading/memory burden (.qva, DAM) - Blog: https://dazzlerblog.wordpress.com/ - Code: https://github.com/thegenemyers/DALIGNER • ECtools - Mike Schatz, CSHL - Benchmarking on E. coli, S. cerevisiae, A. thaliana, O. sativa - Hybrid Assembly - Support Vector Regression - Preprint: http://schatzlab.cshl.edu/data/ectools/AssemblyComplexity.pdf - Code: https://github.com/jgurtowski/ectools
  • 24. Human PacBio Data • Resolved >26,000 euchromatic structural variants at the base-pair level • ~22,000 (85%) of these are novel • Closes/extends 55% of the remaining gaps in human reference genome Chaisson et al. (2014) Nature doi:10.1038/nature13907
  • 25. PacBio Data vs. GRCh37 & 1000 Genomes Project Chaisson et al. (2014) Nature doi:10.1038/nature13907
  • 26. Gonzaga-Jauregui et al. (2012) Annu Rev Med 63: 35-61 Genomic Variation Detection by 2nd Gen Technologies “Approximately 35% of the genes in the human genome are encompassed either totally or partially by a CNV that can alter their expression or even their structure, possibly giving rise to novel fusion transcripts.” “Detection of structural variation is imperative in any WGS study.”
  • 27. Behavioral Diseases Associated with Structural Variation Girirajan & Eichler (2010) Human Molecular Genetics 19: R176-187
  • 28. Accelerating discovery in open-access/preprint world • Data Release Paper: bioRxiv preprint: http://biorxiv.org/content/early/2014/10/23/008037 Publication: http://www.nature.com/articles/sdata201445 • Neurospora: Poster: http://figshare.com/articles/ENCODE_like_study_using_PacBio_sequencing/928630 Publication: In the works • Drosophila: Poster: http://figshare.com/articles/A_better_Drosophila_Melanogaster_genome_by_long_read_sequencing/976097 Publication: In the works • Moleculo Synthetic Reads Paper: bioRxiv preprint: http://biorxiv.org/content/early/2014/01/19/001834 • MinHash Alignment Process Paper: bioRxiv: http://biorxiv.org/content/early/2014/08/14/008003 Publication: Accepted • Human Structural Complexity Paper: Publication: http://www.nature.com/nature/journal/vaop/ncurrent/full/nature13907.html
  • 29. What’s next for PacBio • Data: https://github.com/PacificBiosciences/DevNet/wiki/Datasets o P6C4 C. elegans 40X dataset o HLA Multiplexed GenDx Amplicon Performance Analysis • Software/Analysis: • Scaling with increasing platform throughput and provide faster time to results • De novo assembly for larger genomes • Diploid Genome Assembler • Regional methylation analysis for large genomes • Intuitive and easy to use Graphical User Interface (SMRT® Portal) • Table columns: release, estimated output per SMRT® Cell, read length average, read length N50 (bases), read length max • Jan 2013: ~100 Mb, 4,500 bp, 6 kb, >20 kb • Oct 2013 (P5-C3): ~400 Mb, 8,500 bp, 10 kb, >40 kb • Oct 2014 (P6-C4): 500 Mb – 1 Gb, 10-15 kb, 12-18 kb, >60 kb • 2015 roadmap (Active Loading, Template Prep, & Read Length Improvements; http://blog.pacificbiosciences.com/): 2 – 4 Gb, 15-20 kb, 17-23 kb, >60 kb
  • 30. Thank you! • PacBio: Kristi Spittle-Kim, Primo Babayan, Paul Peluso, David Rank, Jonas Korlach • Collaborators: Casey Bergman, Sue Celniker, Jane Yeadon, David Catcheside, Joachim Li • Community: Lex Nederbragt, Konstantin Berlin, Sergey Koren, Adam Phillippy, Gene Myers, Mike Schatz, Nick Loman
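
Several slides above compare assemblies by N50 (for example, the yeast HGAP assembly at 770 kb against the reference at 950 kb, and the Drosophila assemblies at 5.0 Mb and 15.3 Mb). For readers unfamiliar with the statistic, the following is a minimal sketch of the usual definition: the length of the contig at which the running total, counted from the longest contig down, first reaches half of the assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions and do not come from the slides or from any PacBio tool.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Illustrative helper, not part of SMRT Analysis or HGAP.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half
        # of the total assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in kb), not data from the talk.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because N50 depends only on the sorted lengths, it rewards a few long contigs over many short ones, which is why the slides also report contig counts and maximum contig size alongside it.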