Transcript discovery and gene model correction using next generation sequencing data

Sucheta Tripathy, 6th July 2012
NextGen Sequencing Methods
 454 sequencing methods (2006)
     Principles of pyrophosphate detection (1985, 1988)
 Illumina (Solexa) genome sequencing methods (2007)
 Applied Biosystems ABI SOLiD System (2007)
 Helicos single-molecule sequencing (HeliScope, 2007)
 Pacific Biosciences single-molecule real-time (SMRT) technology (2010)
 Sequenom for nanotechnology-based sequencing
 BioNanomatrix nanofluidics
 RNAP technology
Cost
[Figure: Roberts et al., Genome Biology 2011]
RNASeq
 Catalogue all species of transcripts:
    mRNA
    Non-coding RNA
    Small RNA
 Determine splicing patterns and other post-transcriptional modifications.
 Quantify expression levels.
Topics covered
 Sequence formats
   Calculate the sequencing depth of coverage
 Data Analysis Workflow
   Mapping programs
     Output data files
       SAM
       SHRiMP
       MAQ
   Clustering and assembly programs
   Finding new genes and correction of existing genes
   Annotation of RNAseq data
Input File Types
Raw sequence files in csfasta or fastq format

 fastq:
   @SNPSTER4_90_307R0AAXX:2:41:528:604 run=080625_SNPSTER4_090_307R0AAXX
   GCGCCTATCCACTTTGCGGTCTTCCAAAGNCTCCGG
   +
   IIIIIIIIIIIIIIIIIIIIIIIIII,II!IIIIII

 csfasta (colorspace):
   >853_22_43_F3
   T32310120021231211023112232332233113303231202211332

 Quality values for the csfasta read (one per color):
   >853_22_43_F3
   20 24 23 22 14 13 18 12 23 22 14 14 17 26 26 18 12 17 16 26 23 16 15 16 25 5 14 25 26 23 8 10 9 20 2 11 2 9 25 26 8 6 19 24 15 18 6 10 20 12
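To make the fastq layout concrete, here is a minimal Python sketch that reads four-line records and converts the quality string to numeric scores. It assumes unwrapped records (exactly four lines each) and a Phred offset of 33; the offset is an assumption, since the slide does not state which quality encoding the example uses. The function names `parse_fastq` and `phred_scores` are illustrative only.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes unwrapped FASTQ: exactly four lines per record.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # The read id is the first whitespace-delimited token after '@'.
        yield header[1:].split()[0], seq, qual


def phred_scores(quality, offset=33):
    """Convert a quality string to integer scores (offset 33 is an assumption)."""
    return [ord(ch) - offset for ch in quality]


record = """@SNPSTER4_90_307R0AAXX:2:41:528:604 run=080625_SNPSTER4_090_307R0AAXX
GCGCCTATCCACTTTGCGGTCTTCCAAAGNCTCCGG
+
IIIIIIIIIIIIIIIIIIIIIIIIII,II!IIIIII
"""

for read_id, seq, qual in parse_fastq(record.splitlines()):
    scores = phred_scores(qual)
    print(read_id, len(seq), min(scores), max(scores))
```

Under the offset-33 assumption, the lowest score in this record falls on the ambiguous base N, which is what one would expect from a base caller.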
Calculate the sequencing depth of coverage
 Read length
 Number of reads
 GeneSpace size/genome size

Depth of coverage = (Read length * Number of reads) / GeneSpace size (or genome size)

Problem: 12 million reads, read length = 50 bases, total GeneSpace = 8 Mb
      (12 * 10^6 * 50) / (8 * 10^6) = 75X
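The same arithmetic as a small Python sketch; the function name `depth_of_coverage` is illustrative, and the target size is given in bases.

```python
def depth_of_coverage(num_reads, read_length, target_size):
    """Average depth = total sequenced bases / size of the target (genome or gene space)."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return num_reads * read_length / target_size


# Worked problem from the slide: 12 million reads of 50 bases over an 8 Mb gene space.
print(depth_of_coverage(12_000_000, 50, 8_000_000))  # 75.0
```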
Part 1: Alignment of the reads to the reference genome

 Raw sequence data files (FastQ/colorspace)
 QC with R (ShortRead)
 Reads mapped to reference: Bowtie, BWA, SHRiMP
 Processing of mapped reads
    1. Filter out spike-ins
    2. Filter reads mapping to multiple locations
    3. SAM -> BAM
    4. Remove PCR duplicates
    5. Sort, view, pileup, merge
 BEDTools
    1. Read depth of coverage
    2. Manipulation of BED, SAM, BAM, GTF, GFF files
 SNP discovery, indel
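The filtering steps in this workflow can be sketched in a few lines of Python. This is a simplified illustration, not what samtools or any other tool actually does: it assumes multi-mapped reads are marked by a low mapping quality (aligner-dependent), and it treats two reads as PCR duplicates when they share reference, leftmost position, and strand. The `Alignment` tuple and `filter_alignments` are hypothetical names.

```python
from collections import namedtuple

# Minimal stand-in for an alignment record (hypothetical, not a library type).
Alignment = namedtuple("Alignment", "name flag ref pos mapq")


def filter_alignments(alignments, min_mapq=1):
    """Drop unmapped reads, low-MAPQ reads, and positional duplicates.

    Assumptions: flag bit 0x4 marks unmapped reads and bit 0x10 marks the
    reverse strand; a MAPQ below min_mapq is treated as a multi-mapped read.
    """
    seen = set()
    kept = []
    for aln in alignments:
        if aln.flag & 0x4:          # unmapped
            continue
        if aln.mapq < min_mapq:     # ambiguous placement (assumed convention)
            continue
        key = (aln.ref, aln.pos, bool(aln.flag & 0x10))
        if key in seen:             # same start and strand: treat as duplicate
            continue
        seen.add(key)
        kept.append(aln)
    return kept


reads = [
    Alignment("r1", 0, "scaffold_1", 100, 60),
    Alignment("r2", 0, "scaffold_1", 100, 60),   # duplicate of r1
    Alignment("r3", 16, "scaffold_1", 100, 60),  # reverse strand, kept
    Alignment("r4", 4, "*", 0, 0),               # unmapped
    Alignment("r5", 0, "scaffold_1", 200, 0),    # low MAPQ
]
print([aln.name for aln in filter_alignments(reads)])
```

Keeping the first read seen at each position is the simplest policy; real duplicate removal usually keeps the best-quality read and handles read pairs.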
Part 2: Data Analysis

 Assembly of mapped reads (Cufflinks)
 Merging Cufflinks outputs from different libraries (cuffcompare)
 Assembly of raw QC'd reads by de novo methods (ABySS, Velvet)
 Align assembled reads back to genome (BLAT)
 Gene model correction/junction finding (TopHat, Trans-ABySS)
 Splice variants
 Expression analysis and differential expression (cuffdiff, DEGseq, edgeR)
 Copy number variation
[Figure: Zhong Wang et al., Nat. Rev. Genetics, 2009]
Mapping
 One or two mismatches allowed for reads < 35 bases.
 One insertion/deletion.
 K-mer based seeding.
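K-mer based seeding with a mismatch budget can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm of any particular mapper: the reference is indexed by exact k-mers, each non-overlapping k-mer of the read proposes candidate positions, and a candidate is accepted if the full read matches with at most `max_mismatches` substitutions (indels are not handled). All names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Return sorted (reference_start, mismatches) pairs for acceptable placements."""
    hits = {}
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset  # where the read would begin on the reference
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


reference = "ACGTTGCATGCCGATAGGCTAACGTAGCTAGG"
index = build_kmer_index(reference, 4)
read = "TGCCGCTAGGCTAACG"  # reference[8:24] with one substitution
print(map_read(read, reference, index, 4))
```

Because at least one seed must match exactly, a read with more mismatches than it has seeds can still be missed; real mappers choose seed length and spacing to control that trade-off.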




 •Identification of Novel Transcripts.
 •Transcript abundance.
Available tools for Nextgen sequence alignment
BFAST: BLAT-like Fast Accurate Search Tool.
Bowtie: Burrows-Wheeler Transform (BWT) index.
BWA: Gapped global alignment with respect to query sequences.
ELAND: Part of the Illumina distribution; runs on a single processor; local alignment.
SOAP: Short Oligonucleotide Alignment Program.
SSAHA: Sequence Search and Alignment by Hashing Algorithm.
SHRiMP: Short Read Mapping Package.
SOCS: Rabin-Karp string search algorithm, which
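Several of the tools above index the reference with the Burrows-Wheeler Transform. As a rough illustration of the transform itself, here is a naive Python sketch that sorts all rotations of the text. It is quadratic in memory and only meant for short strings; production aligners build a compressed index on top of the BWT, which this sketch does not attempt. The `$` terminator and the function names are illustrative choices.

```python
def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler Transform: last column of the sorted rotations."""
    if terminator in text:
        raise ValueError("text must not contain the terminator character")
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Invert the transform by repeatedly prepending the BWT column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


print(bwt("banana"))                  # annb$aa
print(inverse_bwt(bwt("GATTACA")))    # GATTACA
```

The transform groups characters with similar right-hand context, which is what makes the resulting index both compressible and searchable.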
Integrated Pipeline

• SOLiD™ System Analysis Pipeline Tool (Corona Lite)
• CLC bio Genomics Workbench
• Partek
• Galaxy server
• ERANGE: a full package for RNASeq and ChIP-seq data analysis
• DESeq (Bioconductor package, an alternative to edgeR)
Output File Formats
     SAM (Sequence Alignment/Map)
        SAM -> BAM
        Sorting/indexing BAM/SAM files
        Extracting and viewing alignment
        SNP calling (mpileup)
        Text viewer (tview)

Example SAM record (tab-separated fields, shown on one line):
1082_1988_1406_F3 16 scaffold_1 31452 255 48M * 0 0 TCCACGTCACCAGCAAGCCTCCGGTCAATCCGTCTGACTTGTCCTGTC 8E/./:R*$BIG/!%GP9@MMK;@FMJIXVNSWNNUUOTXQNGFQUPN XA:i:0 MD:Z:48 NM:i:0 CM:i:5

FLAG values:
0 -> the read is not paired and is mapped to the forward strand
4 -> unmapped read
16 -> mapped to the reverse strand
http://samtools.sourceforge.net/SAM1.pdf
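The flag values listed above are bit fields, so they can be decoded with bitwise tests. The following Python sketch splits the example record into its eleven mandatory SAM columns and decodes three flag bits (0x1 paired, 0x4 unmapped, 0x10 reverse strand). The function and key names are illustrative; only the column order and bit meanings come from the SAM format.

```python
SAM_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual")


def parse_sam_line(line):
    """Split one tab-separated SAM alignment line into a dict of fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 columns")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["tags"] = cols[11:]  # optional TAG:TYPE:VALUE fields
    return record


def describe_flag(flag):
    """Decode the three flag bits discussed on the slide."""
    return {
        "paired": bool(flag & 0x1),
        "unmapped": bool(flag & 0x4),
        "reverse_strand": bool(flag & 0x10),
    }


example = "\t".join([
    "1082_1988_1406_F3", "16", "scaffold_1", "31452", "255", "48M", "*", "0", "0",
    "TCCACGTCACCAGCAAGCCTCCGGTCAATCCGTCTGACTTGTCCTGTC",
    "8E/./:R*$BIG/!%GP9@MMK;@FMJIXVNSWNNUUOTXQNGFQUPN",
    "XA:i:0", "MD:Z:48", "NM:i:0", "CM:i:5",
])
rec = parse_sam_line(example)
print(rec["rname"], rec["pos"], describe_flag(rec["flag"]))
```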
SHRiMP and MAQ Format
 SHRiMP:
 >947_1567_1384_F3 reftig_991 + 22901 22923 3 25 25 2020 18x2x3
    Edit string (e.g. 18x2x3 above):
    A perfect match for 25-bp tags is: "25"
    A SNP at the 16th base of the tag is: "15A9"
    A four-base insertion in the reference: "3(TGCT)20"
    A four-base deletion in the reference: "5----20"
    Two sequencing errors: "4x15x6" (i.e. 25 matches with 2 crossovers)
 http://compbio.cs.toronto.edu/shrimp/README
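The edit-string notation above is regular enough to tokenize with one regular expression. The Python sketch below counts matches, mismatches, inserted bases, deleted positions, and crossovers for the examples on this slide. It is an illustration built only from those examples, so it assumes uppercase base letters and may not cover every construct the real format allows; the names are hypothetical.

```python
import re

# One alternative per token type seen in the slide's examples.
_TOKEN = re.compile(
    r"(?P<matches>\d+)"            # run of matching bases, e.g. 25
    r"|(?P<mismatch>[ACGTN])"      # single substituted base, e.g. A
    r"|\((?P<insertion>[ACGTN]+)\)"  # bases in parentheses, e.g. (TGCT)
    r"|(?P<deletion>-+)"           # run of dashes, e.g. ----
    r"|(?P<crossover>x)"           # colorspace crossover
)


def summarize_edit_string(edit):
    """Count each kind of event in a SHRiMP-style edit string."""
    summary = {"matches": 0, "mismatches": 0, "insertion_bases": 0,
               "deletion_bases": 0, "crossovers": 0}
    pos = 0
    while pos < len(edit):
        token = _TOKEN.match(edit, pos)
        if token is None:
            raise ValueError(f"unexpected character at position {pos} in {edit!r}")
        kind = token.lastgroup
        text = token.group(kind)
        if kind == "matches":
            summary["matches"] += int(text)
        elif kind == "mismatch":
            summary["mismatches"] += 1
        elif kind == "insertion":
            summary["insertion_bases"] += len(text)
        elif kind == "deletion":
            summary["deletion_bases"] += len(text)
        else:
            summary["crossovers"] += 1
        pos = token.end()
    return summary


for example in ("25", "15A9", "3(TGCT)20", "5----20", "4x15x6", "18x2x3"):
    print(example, summarize_edit_string(example))
```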


 MAQ:
 ID19_190907_6_195_127_427 Contig0_2091311 60 + 0 0 30 30 30 0 0 1 4 35 GTGCAGCCATTTGCGTACaAGCaTCtCaaGctACt ?IIIIIIIIIIIIII@EI6<II6HB9I(8I6.G<-
Assembly program
 ABySS
    Supports multiple k values
    Fast
    Assemblies built with different k values can be merged
    The Trans-ABySS pipeline runs on top of it
 MIRA (Mimicking Intelligent Read Assembly)
    Hybrid de novo assembler
    Genome mapper
 Velvet
Splice Junction prediction
 TopHat
 Cufflinks
 MapSplice
 Trans-ABySS
[Figure: Trapnell et al., 2009]
[Figure: An overview of the MapSplice pipeline. Wang K et al., Nucl. Acids Res. 2010;38:e178. © The Author(s) 2010. Published by Oxford University Press.]
[Figure: Denoeud et al., 2008]
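Whichever junction finder is used, a spliced alignment written in SAM form records each intron as an N operation in the CIGAR string. The Python sketch below walks a CIGAR string and reports the reference interval skipped by every N. It is a generic illustration of the SAM convention, not the method of TopHat, MapSplice, or any other tool named here, and `introns_from_cigar` is a hypothetical helper.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_cigar(pos, cigar):
    """Return (start, end) reference intervals, 1-based inclusive, for each N operation.

    pos is the 1-based leftmost reference position of the alignment.
    """
    if cigar == "*":
        return []
    ref = pos
    introns = []
    for length, op in _CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in "MDN=X":  # operations that advance along the reference
            ref += length
    return introns


# A 48-base read split across a 500-base intron.
print(introns_from_cigar(100, "20M500N28M"))  # [(120, 619)]
print(introns_from_cigar(31452, "48M"))       # []
```

Collecting these intervals over all reads and counting how many reads support each one gives a simple junction table that can be compared against existing gene models.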
Cufflinks
 Transcript assembly
 Expression levels with a reference GTF
 Expression levels without a GTF
 Merging experimental replicates (cuffcompare)
 Differential expression analysis (cuffdiff)
Annotation of RNASeq Data

 De novo assembled reads (contigs)
 Reads mapped to reference, assembled
 Map back to genome (BLAT)
 Junction/novel transcripts/splice variants
 Train for gene prediction
 Expression profiling
 Differential expression analysis
 CNV
Genome Viewer
 Desktop/standalone applications
     Tbrowse
     BamView
     Savant
     IGV
     IGB
 Web-based browsers
    GBrowse
    UCSC Genome Browser
    VBI Transcriptomics browser
Other Applications
 SNP detection
 Splice Variant Discovery
 Identification of miRNA targets
 TF binding sites
 Genome Methylation pattern
 RNA editing
 Metagenomic projects
 Gene Expression Analysis
Difference from other expression sequencing methods
 EST: Low throughput, expensive, NOT quantitative.
 SAGE, CAGE, MPSS: High-throughput, digital gene expression levels
    Expensive
    Based on Sanger sequencing methods
    Only a portion of the transcript is analyzed
    Isoforms are indistinguishable
Advantages of RNASeq:
 Little or no background noise.
 Sensitive to isoform discovery.
 Both low and highly expressed genes can be quantified.
 Highly reproducible.
Transcripts discovered/corrected
 10,000 new transcription start sites discovered in Rhesus macaque (Liu et al., NAR 2010)
 602 transcriptionally active regions and numerous introns in Candida albicans (Bruno et al., Genome Research 2010)
 96% of the genes were corrected in Laccaria bicolor (Larsen et al., PLoS One 2010)
 16,923 regions in mouse (Mortazavi et al., 2008)
 3,724 novel isoforms (Trapnell et al., 2010)
Bioinformatics Challenges
 Store, retrieve and analyze large amounts of data
 Matching of reads to multiple locations
 Short reads with higher copy number and long reads representing less expressed genes.
References:
 Wilhelm J. Ansorge. Next-generation DNA sequencing techniques. New Biotechnology, Volume 25, Issue 4, April 2009, Pages 195-203.
 Zhong Wang, Mark Gerstein, and Michael Snyder. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 January; 10(1): 57–63.
 Peter E. Larsen et al. Using Deep RNA Sequencing for the Structural Annotation of the Laccaria Bicolor Mycorrhizal Transcriptome. PLoS One. 2010; 5(7): e9780.
 Wang et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. NAR, 2010.
 Denoeud et al. Annotating genomes with massive-scale RNA sequencing. Genome Biology, 2008.
 Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology. doi:10.1038/nbt.1621
 Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. doi:10.1093/bioinformatics/btp120
 Mortazavi et al. Nature Methods, May 2008.

The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George Wells
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptx
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
Work Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sashaWork Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sasha
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
 
Department of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfDepartment of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdf
 
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustVani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
 
Finals of Kant get Marx 2.0 : a general politics quiz
Finals of Kant get Marx 2.0 : a general politics quizFinals of Kant get Marx 2.0 : a general politics quiz
Finals of Kant get Marx 2.0 : a general politics quiz
 
Slides CapTechTalks Webinar March 2024 Joshua Sinai.pptx
Slides CapTechTalks Webinar March 2024 Joshua Sinai.pptxSlides CapTechTalks Webinar March 2024 Joshua Sinai.pptx
Slides CapTechTalks Webinar March 2024 Joshua Sinai.pptx
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 Sales
 

RNA-Seq for gene finding

  • 1. Transcript discovery and gene model correction using next generation sequencing data. Sucheta Tripathy, 6th July 2012
  • 2. NextGen Sequencing Methods: 454 sequencing methods (2006), based on the principles of pyrophosphate detection (1985, 1988); Illumina (Solexa) genome sequencing methods (2007); Applied Biosystems ABI SOLiD System (2007); Helicos single-molecule sequencing (HeliScope, 2007); Pacific Biosciences single-molecule real-time (SMRT) technology (2010); Sequenom nanotechnology-based sequencing; BioNanomatrix nanofluidics; RNAP technology.
  • 4. Roberts et al. Genome Biology 2011
  • 5. RNASeq: catalogue all species of transcripts (mRNA, non-coding RNA, small RNA); determine splicing patterns or other post-transcriptional modifications; quantify the expression levels.
  • 6. Topics covered: sequence formats (including how to calculate the sequencing depth of coverage); data analysis workflow: mapping programs and their output data files (SAM, SHRiMP, MAQ), clustering and assembly programs, finding new genes and correcting existing genes, annotation of RNA-Seq data.
  • 7. Input File Types: raw sequence files in csfasta or fastq format.
      FASTQ record:
        @SNPSTER4_90_307R0AAXX:2:41:528:604 run=080625_SNPSTER4_090_307R0AAXX
        GCGCCTATCCACTTTGCGGTCTTCCAAAGNCTCCGG
        +
        IIIIIIIIIIIIIIIIIIIIIIIIII,II!IIIIII
      csfasta record:
        >853_22_43_F3
        T32310120021231211023112232332233113303231202211332
      Quality values for the same read:
        >853_22_43_F3
        20 24 23 22 14 13 18 12 23 22 14 14 17 26 26 18 12 17 16 26 23 16 15 16 25 5 14 25 26 23 8 10 9 20 2 11 2 9 25 26 8 6 19 24 15 18 6 10 20 12
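To make the FASTQ layout on this slide concrete, here is a minimal Python sketch that splits one four-line record and converts the quality characters to Phred scores. It assumes the Phred+33 (Sanger) offset, which the slide does not state but which is consistent with the `!` and `,` characters in the example; the function name `parse_fastq_record` is hypothetical.

```python
def parse_fastq_record(lines, offset=33):
    """Parse one 4-line FASTQ record into (read name, sequence, Phred scores).

    The offset of 33 is an assumption (Sanger / Phred+33 encoding).
    """
    header, seq, plus, qual = (line.strip() for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Each quality character encodes one Phred score as chr(score + offset).
    scores = [ord(ch) - offset for ch in qual]
    # The read name is the first whitespace-delimited token after '@'.
    name = header[1:].split()[0]
    return name, seq, scores


record = [
    "@SNPSTER4_90_307R0AAXX:2:41:528:604 run=080625_SNPSTER4_090_307R0AAXX",
    "GCGCCTATCCACTTTGCGGTCTTCCAAAGNCTCCGG",
    "+",
    # Quality line from the slide, written compactly (assumed 36 characters).
    "I" * 26 + ",II!" + "I" * 6,
]
name, seq, scores = parse_fastq_record(record)
n_index = seq.index("N")
print(name, len(seq), scores[n_index])
```

Under this assumption the uncalled base `N` carries the lowest possible score, which is one reason such positions are usually trimmed or masked during QC.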
  • 8. Calculate the sequencing depth of coverage. Inputs: read length, number of reads, and gene-space size (or genome size). Depth = (read length × number of reads) / gene-space (or genome) size. Problem: 12 million reads, read length = 50 bases, total gene space = 8 Mb. Depth = (12 × 10^6 × 50) / (8 × 10^6) = 75X
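The worked problem on this slide can be checked with a few lines of Python. This is only a sketch of the formula as stated; the function name `depth_of_coverage` is hypothetical, and all sizes are taken in bases.

```python
def depth_of_coverage(read_length, num_reads, target_size):
    """Average depth = total sequenced bases / size of the target (bases)."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return read_length * num_reads / target_size


# Values from the slide: 12 million reads of 50 bases over an 8 Mb gene space.
depth = depth_of_coverage(read_length=50, num_reads=12_000_000, target_size=8_000_000)
print(f"{depth:.0f}X")  # prints "75X"
```

The result is an average: actual depth in RNA-Seq varies strongly from gene to gene with expression level.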
  • 9. Part 1: Alignment of the reads to the reference genome. Workflow boxes: raw sequence data files (FastQ/colorspace); QC by R ShortRead; reads mapped to reference (Bowtie, BWA, SHRiMP); post-processing: 1. filter out spike-ins, 2. filter reads mapping to multiple locations, 3. SAM -> BAM, 4. remove PCR duplicates, 5. sort, view, pileup, merge; BEDTools: 1. read depth of coverage, 2. manipulation of BED, SAM, BAM, GTF, GFF files; SNP discovery, indels.
  • 10. Part 2: Data Analysis. Workflow boxes: assembly of mapped reads (Cufflinks); assembly of raw QC'd reads by de novo methods (ABySS, Velvet); gene model correction/junction finding (TopHat, Trans-ABySS); align assembled reads back to the genome (BLAT); merging Cufflinks outputs from different libraries (cuffcompare); splice variants; expression analysis and differential expression (cuffdiff, DEGseq, edgeR); copy number variation.
  • 11. Zhong Wang et al; Nat. Rev. Genetics, 2009
  • 12. Mapping: one or two mismatches (< 35 bases); one insertion/deletion; k-mer based seeding. Uses: identification of novel transcripts; transcript abundance.
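The idea of k-mer based seeding with a small mismatch budget can be sketched in Python as follows. This is a toy illustration, not how any particular aligner named in this deck is implemented; the function names, the toy reference, and the choice of k = 4 are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Let each exact k-mer hit vote for the read start position it implies."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], ()):
            start = ref_pos - offset
            if start >= 0:
                votes[start] += 1
    return votes.most_common()


def count_mismatches(read, reference, start):
    """Count mismatches for an ungapped placement, or None if it runs off the end."""
    window = reference[start:start + len(read)]
    if len(window) < len(read):
        return None
    return sum(a != b for a, b in zip(read, window))


reference = "ACGTACGGTCATTGCAAGCTTAGG"  # toy reference
read = "TCATTTCAAG"                     # reference[8:18] with one substitution
index = build_kmer_index(reference, k=4)
best_start, seed_hits = candidate_positions(read, index, k=4)[0]
mismatches = count_mismatches(read, reference, best_start)
print(best_start, seed_hits, mismatches)
```

Seeds locate candidate positions cheaply; the full read is then verified at each candidate and kept only if it stays within the allowed number of mismatches.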
  • 13. Available tools for Nextgen sequence alignment. BFAST: BLAT-like Fast Alignment Tool. Bowtie: Burrows-Wheeler-Transformed (BWT) index. BWA: gapped global alignment with respect to query sequences. ELAND: part of the Illumina distribution; runs on a single processor; local alignment. SOAP: Short Oligonucleotide Alignment Program. SSAHA: Sequence Search and Alignment by Hashing Algorithm. SHRiMP: Short Read Mapping algorithm. SOCS: Rabin-Karp string search algorithm, which
  • 14. Integrated Pipeline: SOLiD™ System Analysis Pipeline Tool (Corona Lite); CLC bio Genomics Workbench; Partek; Galaxy server; ERANGE, a full package for RNA-Seq and ChIP-Seq data analysis; DESEQ (used by edgeR package).
  • 15. Output File Formats: SAM (Sequence Alignment/Map); SAM to BAM conversion; sorting/indexing BAM/SAM files; extracting and viewing alignments; SNP calling (mpileup); text viewer (tview).
      Example SAM record:
        1082_1988_1406_F3 16 scaffold_1 31452 255 48M * 0 0 TCCACGTCACCAGCAAGCCTCCGGTCAATCCGTCTGACTTGTCCTGTC 8E/./:R* $BIG/!%GP9@MMK;@FMJIXVNSWNNUUOTXQNGFQUPN XA:i:0 MD:Z:48 NM:i:0 CM:i:5
      Flag values: 0 -> the read is unpaired and mapped to the forward strand; 4 -> unmapped read; 16 -> mapped to the reverse strand.
      http://samtools.sourceforge.net/SAM1.pdf
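The three flag values listed on this slide are special cases of a bit field. The following Python sketch decodes any SAM flag; the bit values follow the SAM specification linked above, while the label strings and the function names `decode_flag` and `describe` are hypothetical choices made for this example.

```python
# Bit values as defined in the SAM specification; label strings are illustrative.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def describe(flag):
    """Summarise pairing, mapping status, and strand for a flag value."""
    names = decode_flag(flag)
    if "unmapped" in names:
        return "unmapped"
    strand = "reverse" if "reverse_strand" in names else "forward"
    pairing = "paired" if "paired" in names else "unpaired"
    return f"{pairing}, mapped, {strand} strand"


for flag in (0, 4, 16):
    print(flag, "->", describe(flag))
```

The example record on the slide has flag 16, so under this decoding it is an unpaired read mapped to the reverse strand of scaffold_1.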
  • 16. SHRiMP and MAQ Format.
      SHRiMP record: >947_1567_1384_F3 reftig_991 + 22901 22923 3 25 25 2020 18x2x3
      Edit string: a perfect match for 25-bp tags is "25"; a SNP at the 16th base of the tag is "15A9"; a four-base insertion in the reference is "3(TGCT)20"; a four-base deletion in the reference is "5----20"; two sequencing errors are "4x15x6" (i.e. 25 matches with 2 crossovers). http://compbio.cs.toronto.edu/shrimp/README
      MAQ record: ID19_190907_6_195_127_427 Contig0_2091311 60 + 0 0 30 30 30 0 0 1 4 35 GTGCAGCCATTTGCGT ACaAGCaTCtCaaGctACt ?IIIIIIIIIIIIII@EI6<II6HB9I(8I6.G<-
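The edit-string examples on this slide can be split into tokens with a short Python sketch. The grammar below is inferred only from the five examples shown here (digits, a base letter, a parenthesised insertion, a run of dashes, and `x` for a crossover); it is an assumption, not the full SHRiMP specification, and the function name `tokenize_edit_string` is hypothetical.

```python
import re

# Token grammar inferred from the slide's examples only (an assumption).
TOKEN = re.compile(
    r"(?P<match>\d+)"                # run of matching positions
    r"|(?P<snp>[ACGTN])"             # single substituted base
    r"|\((?P<insertion>[ACGTN]+)\)"  # inserted bases, e.g. (TGCT)
    r"|(?P<deletion>-+)"             # one dash per deleted base
    r"|(?P<crossover>x)"             # colour-space crossover (sequencing error)
)


def tokenize_edit_string(edit):
    """Split an edit string into (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(edit):
        m = TOKEN.match(edit, pos)
        if m is None:
            raise ValueError(f"unexpected character {edit[pos]!r} at offset {pos}")
        kind = m.lastgroup
        text = m.group(kind)
        if kind == "match":
            tokens.append((kind, int(text)))
        elif kind == "deletion":
            tokens.append((kind, len(text)))
        else:
            tokens.append((kind, text))
        pos = m.end()
    return tokens


for example in ("25", "15A9", "3(TGCT)20", "5----20", "4x15x6"):
    print(example, tokenize_edit_string(example))
```

Summing the `match` tokens of "4x15x6" gives 25, which agrees with the slide's note that this string represents 25 matches with 2 crossovers.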
  • 17. Assembly programs. ABySS: supports multiple k values; fast; assemblies built with different k values can be merged; the Trans-ABySS pipeline runs on it. MIRA (Mimicking Intelligent Read Assembly): hybrid de novo assembler. GenomeMapper. Velvet.
  • 18. Splice Junction prediction: TopHat; Cufflinks; MapSplice; Trans-ABySS.
  • 20. An overview of the MapSplice pipeline. © The Author(s) 2010. Published by Oxford University Press. Wang K et al. Nucl. Acids Res. 2010;38:e178-e178
  • 22. Cufflinks: transcript assembly; expression levels with a reference GTF; expression levels without a GTF; merging experimental replicates (cuffcompare); differential expression analysis (cuffdiff).
  • 23. Annotation of RNASeq Data. Workflow boxes: de novo assembled reads (contigs); reads mapped to reference and assembled; map back to genome (BLAT); expression profiling; train for gene prediction; junctions/novel transcripts/splice variants; differential expression analysis; CNV.
  • 24. Genome Viewer. Desktop/standalone applications: Tbrowse, Bamview, Savant, IGV, IGB. Web-based browsers: Gbrowse, UCSC Genome Browser, VBI Transcriptomics browser.
  • 25. Other Applications: SNP detection; splice variant discovery; identification of miRNA targets; TF binding sites; genome methylation patterns; RNA editing; metagenomic projects; gene expression analysis.
  • 26. Difference with other expression sequencing. EST: low throughput, expensive, NOT quantitative. SAGE, CAGE, MPSS: high throughput, digital gene expression levels; expensive; Sanger sequencing methods; only a portion of the transcript is analyzed; isoforms are indistinguishable.
  • 27. Advantages: zero or very little background noise; sensitive to isoform discovery; both low and highly expressed genes can be quantified; highly reproducible.
  • 28. Transcripts discovered/corrected: 10,000 new transcription start sites discovered in Rhesus macaque (Liu et al., NAR 2010); 602 transcriptionally active regions and numerous introns in Candida albicans (Bruno et al., 2010, Genome Research); 96% of the genes were corrected in Laccaria bicolor (Larsen et al., PLoS One 2010); 16,923 regions in mouse (Mortazavi et al., 2008); 3,724 novel isoforms (Trapnell 2010).
  • 29. Bioinformatics Challenges: storing, retrieving and analyzing large amounts of data; matching of reads to multiple locations; short reads with higher copy number and long reads representing less expressed genes.
  • 30. References:
      Wilhelm J. Ansorge. Next-generation DNA sequencing techniques. New Biotechnology, Volume 25, Issue 4, April 2009, Pages 195-203.
      Zhong Wang, Mark Gerstein, and Michael Snyder. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 January; 10(1): 57–63.
      Peter E. Larsen et al. Using Deep RNA Sequencing for the Structural Annotation of the Laccaria Bicolor Mycorrhizal Transcriptome. PLoS One. 2010; 5(7): e9780.
      Wang et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. NAR, 2010.
      Denoeud et al. Annotating genomes with massive-scale RNA sequencing. Genome Biology, 2008.
      Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology. doi:10.1038/nbt.1621
      Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. doi:10.1093/bioinformatics/btp120
      Mortazavi et al. Nature Methods, May 2008.

Editor's notes

  1. An overview of the MapSplice pipeline. The algorithm contains two phases: tag alignment (Step 1–Step 4) and splice inference (Step 5–Step 6). In the 'tag alignment' phase, candidate alignments of the mRNA tags to the reference genome are determined. In the 'splice inference' phase, splice junctions that appear in one or more tag alignments are analyzed to determine a splice significance score based on the quality and diversity of alignments that include the splice. Ambiguous candidate alignments are resolved by selecting the alignment with the overall highest quality match and highest confidence splice junctions.
  2. CAGE: cap analysis of gene expression; MPSS: massively parallel signature sequencing; SAGE: serial analysis of gene expression.