[by Joseph Robertson] RNAseq analysis: Transcript detection (1/2) What is a jar ? August 11, 2011
Quick recap: Production informatics
Sequencing -> Images -> Conversion (Demultiplexing)
Resulting file type: FASTQ, “having raw sequence reads and quality scores”
Pipeline stages shown: Sequencing, Image, FASTQ, Quality Control, Projects
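To make the FASTQ recap concrete, here is a minimal Python sketch of reading FASTQ records and decoding the quality string into Phred scores. The names `FastqRecord` and `parse_fastq` are illustrative, not part of any pipeline mentioned in the slides. The sketch assumes unwrapped four-line records and a quality offset of 33; Illumina pipelines 1.3 to 1.7 used an offset of 64, so the right value depends on how the file was produced.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ entry (unwrapped records assumed)."""
    # Reads everything into memory: fine for a sketch, not for a full lane.
    stripped = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("truncated FASTQ input: expected a multiple of 4 lines")
    for i in range(0, len(stripped), 4):
        header, sequence, separator, quality = stripped[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Each quality character encodes a Phred score as ASCII code minus offset.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


example = ["@read1", "ACGT", "+", "II5!"]
for record in parse_fastq(example):
    print(record.name, record.sequence, record.qualities)
# prints: read1 ACGT [40, 40, 20, 0]
```

In practice an established parser would be the safer choice; the point here is only what "raw sequence reads and quality scores" looks like on disk.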
Objective & Challenges
Objective: study the active transcriptome of the cell.
Problems:
  The RNA content of a cell is dominated by tRNA, rRNA and housekeeping genes.
  The flowcell has only finite real estate, most of which would be occupied by these mainly invariable transcripts.
  How can the sequencing be focused on the “interesting” part of the transcriptome, i.e. mRNA and ncRNA?
What RNAseq protocols are there?
RNA seq: total RNA, tRNA/rRNA removed + polyA-tail filtered. Good for studying protein-coding genes, e.g. gene expression, isoforms, expression of variant alleles, and RNA editing events. RNA-DNA differences in the human transcriptome provide a yet-unexplored aspect of genome variation.
Small RNAseq: total RNA, size selection for small RNA molecules. Good for small ncRNA, e.g. miRNAs, snoRNA.
Duplex-specific thermostable nuclease (DSN) guided RNA seq normalization: total RNA, highly abundant transcripts are digested. Good for studying all transcripts.
Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.
Christodoulou DC, Gorham JM, Herman DS, Seidman JG. Construction of normalized RNA-seq libraries for next-generation sequencing using the crab duplex-specific nuclease. Curr Protoc Mol Biol. 2011. PMID: 21472699.
RNA-seq workflow
Select polyA-tail + remove tRNA/rRNA
Fragment RNA
Make cDNA (caution: you may lose strand info)
Sequence
Map reads
Identify transcripts
Quantify transcripts
Identify differences between conditions
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008. PMID: 18516045.
Production Informatics and Bioinformatics
Basic Production Informatics: produce raw sequence reads.
Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs).
Bioinformatics Research: analyze the data; uncover the biological meaning.
(Per one-flowcell project.)
Challenges for RNAseq read mapping
Losing reads because they do not match the reference genome:
  Reads spanning exon junctions
  RNA editing events
Approaches:
  Align to a reference transcriptome library
  Exon-first, e.g. TopHat
  Seed-extend methods, e.g. GSNAP
Figure labels: sequencing reads, DNA, gRNA, mRNA, editing event.
Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.
Exon-first approach
Align reads to the reference genome.
Chop up unaligned reads and try to identify matching regions.
Find splice junctions around the matches.
Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
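The three exon-first steps can be illustrated with a toy Python sketch. This is not how TopHat is implemented: it uses exact string matching only, ignores splice-site signals and sequencing errors, and all names (`align_unspliced`, `align_spliced`, `min_anchor`) are hypothetical. It only shows the idea of first trying an unspliced alignment and then splitting the leftover reads to locate a junction.

```python
from typing import Optional, Tuple


def align_unspliced(genome: str, read: str) -> Optional[int]:
    """Step 1: try to place the whole read on the genome (exact match only)."""
    position = genome.find(read)
    return position if position >= 0 else None


def align_spliced(genome: str, read: str, min_anchor: int = 4) -> Optional[Tuple[int, int, int]]:
    """Steps 2 and 3: chop the read in two and look for a junction between the pieces.

    Returns (read_start, intron_start, intron_end) in genome coordinates,
    with the intron occupying genome[intron_start:intron_end].
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = genome.find(left)
        while left_pos >= 0:
            intron_start = left_pos + split
            # The right piece must match downstream, leaving a gap (the intron).
            right_pos = genome.find(right, intron_start + 1)
            if right_pos >= 0:
                return left_pos, intron_start, right_pos
            left_pos = genome.find(left, left_pos + 1)
    return None


exon1, intron, exon2 = "ACGTACGGTC", "GT" + "A" * 12 + "AG", "TTGACCATGA"
genome = exon1 + intron + exon2
read = exon1[-6:] + exon2[:6]  # a read spanning the exon1/exon2 junction

print(align_unspliced(genome, read))  # None: the read does not match contiguously
print(align_spliced(genome, read))    # (4, 10, 26)
```

Because most reads fall entirely within exons, the cheap first step handles the bulk of the data and the expensive split search runs only on what is left over.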
Seed-extend approach
Break reads into smaller k-mers and find matches.
Iteratively extend the k-mers to identify the exact spliced alignment.
Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
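A toy Python sketch of the seed-extend idea follows. It is not GSNAP's algorithm: seeds must match exactly, the first downstream hit is always taken, and extension stops at the first mismatch. The function names and the choice of `k` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_kmer_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the genome to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def extend(genome: str, read: str, genome_pos: int, read_pos: int) -> int:
    """Count matching bases when walking right from the two start positions."""
    length = 0
    while (genome_pos + length < len(genome)
           and read_pos + length < len(read)
           and genome[genome_pos + length] == read[read_pos + length]):
        length += 1
    return length


def seed_extend(genome: str, read: str, index: Dict[str, List[int]], k: int) -> List[Tuple[int, int, int]]:
    """Return aligned blocks as (read_start, genome_start, length), or [] if unaligned."""
    blocks = []
    read_pos, min_genome_pos = 0, 0
    while read_pos + k <= len(read):
        seed = read[read_pos:read_pos + k]
        hits = [g for g in index.get(seed, []) if g >= min_genome_pos]
        if not hits:
            return []
        genome_pos = hits[0]  # toy choice: first downstream hit
        length = extend(genome, read, genome_pos, read_pos)
        blocks.append((read_pos, genome_pos, length))
        read_pos += length
        min_genome_pos = genome_pos + length
    # A leftover tail shorter than k cannot be seeded in this sketch.
    return blocks if read_pos == len(read) else []


exon1, intron, exon2 = "ACGTACGGTC", "GT" + "A" * 12 + "AG", "TTGACCATGA"
genome = exon1 + intron + exon2
read = exon1[-6:] + exon2[:6]
index = build_kmer_index(genome, k=4)
print(seed_extend(genome, read, index, k=4))  # [(0, 4, 6), (6, 26, 6)]
```

The gap between consecutive blocks on the genome (positions 10 to 26 here) is the inferred intron.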
Which method?
Exon-first: less computationally intensive.
The additional exon junctions found by seed-extend have not (yet) been demonstrated to be real.
Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
Challenges for transcript detection
Identifying isoforms is difficult:
  Transcript abundance is volatile.
  Most reads are not helpful (reads from exons) or even misleading (incompletely spliced precursor RNA).
  Genes can have many isoforms.
Approaches:
  Ignore isoforms
  Genome-guided reconstruction, e.g. Cufflinks
  Genome-independent reconstruction, e.g. Trinity
Figure label: QBI data.
Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
Genome-guided reconstruction
Use reads spanning splice junctions to assemble the transcript paths.
Work out the minimal set of paths such that all reads are covered (graph theory).
If there is more than one such set, use read counts to pick the most probable one.
Figure labels: reads aligned to the genome; isoforms.
Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
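The path-cover idea can be sketched on a tiny splice graph. The Python below is a brute-force illustration, not Cufflinks' method: exons are nodes, observed junctions are edges with read counts, and the smallest set of end-to-end paths covering every junction is searched exhaustively. The tie-breaking score (sum of each path's weakest junction count) is a hypothetical stand-in for "use read counts to pick the most probable", and the exon names and counts are made up.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Junction = Tuple[str, str]
Path = Tuple[str, ...]


def enumerate_paths(junctions: Dict[Junction, int]) -> List[Path]:
    """List all source-to-sink paths in the (acyclic) splice graph."""
    successors: Dict[str, List[str]] = {}
    has_incoming = set()
    for upstream, downstream in junctions:
        successors.setdefault(upstream, []).append(downstream)
        successors.setdefault(downstream, [])
        has_incoming.add(downstream)
    paths: List[Path] = []

    def walk(node: str, prefix: Path) -> None:
        prefix = prefix + (node,)
        if not successors[node]:
            paths.append(prefix)
        for nxt in successors[node]:
            walk(nxt, prefix)

    for node in successors:
        if node not in has_incoming:
            walk(node, ())
    return paths


def path_edges(path: Path) -> List[Junction]:
    return list(zip(path, path[1:]))


def pick_isoforms(junctions: Dict[Junction, int]) -> Tuple[Path, ...]:
    """Smallest set of paths covering all junctions; ties broken by read support."""
    paths = enumerate_paths(junctions)
    all_edges = set(junctions)

    def score(cover: Tuple[Path, ...]) -> int:
        # Hypothetical score: each path is only as well supported as its weakest junction.
        return sum(min(junctions[e] for e in path_edges(p)) for p in cover)

    for size in range(1, len(paths) + 1):
        covers = [c for c in combinations(paths, size)
                  if {e for p in c for e in path_edges(p)} == all_edges]
        if covers:
            return max(covers, key=score)
    return ()


junction_counts = {
    ("E1", "E2"): 20, ("E1", "E3"): 5, ("E2", "E4"): 20, ("E3", "E4"): 5,
    ("E4", "E5"): 20, ("E4", "E6"): 5, ("E5", "E7"): 20, ("E6", "E7"): 5,
}
for isoform in pick_isoforms(junction_counts):
    print("-".join(isoform))
```

Two different pairs of paths cover all eight junctions in this example; the read counts favour pairing the highly supported junctions into one isoform and the weakly supported ones into the other. The exhaustive search grows exponentially, so it is only usable on toy graphs.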
Genome-independent reconstruction
Break reads into k-mers and find their mutual overlaps to build a de Bruijn graph.
Find probable paths through the graph by using read counts.
Map the consensus assembly to the genome.
Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011. PMID: 21747384.
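The first two steps can be sketched in a few lines of Python. This is a minimal illustration, not Trinity's implementation: it ignores reverse complements and sequencing errors, and the greedy walk simply follows the best-supported edge. The reads, `k`, and the function names are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Dict[str, int]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix, weighted by count."""
    graph: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def greedy_path(graph: Dict[str, Dict[str, int]], start: str, max_steps: int = 1000) -> str:
    """Spell a contig by always following the outgoing edge with the most read support."""
    contig, node, seen = start, start, {start}
    for _ in range(max_steps):
        options = graph.get(node)
        if not options:
            break  # reached a node with no outgoing edge
        nxt = max(options, key=options.get)
        if nxt in seen:
            break  # stop rather than loop around a cycle
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCGTG", "GGCGTGCA", "GTGCAAT"]  # overlapping reads from one toy transcript
graph = build_de_bruijn(reads, k=4)
print(greedy_path(graph, "ATG"))  # ATGGCGTGCAAT
```

The edge weights are where read counts enter: at a branch point, the walk prefers the continuation seen in more reads.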
Which method?
De novo methods are very computationally intensive.
However, they are able to find alternative isoforms and promoters, and structural variation such as deletions (yellow in the figure) and chimeras (green in the figure).
Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011. PMID: 21747384.
What are real transcripts?
Even the most sophisticated computational method can’t tell you what is a real transcript.
Figure panels: Roberts et al.; QBI data.
Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011. PMID: 21697122.
Solution: biological replicates
Significant findings (here: new isoforms) in small sample sets can be due to:
  Technical errors
  Biological variability
  Population outliers
Sequencing experiments are subject to the same issues (even though they are more expensive than arrays).
Replicates are necessary to build confidence in your results!
Hansen KD, Wu Z, Irizarry RA, Leek JT. Sequencing technology does not eliminate biological variability. Nat Biotechnol. 2011. PMID: 21747377.
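One simple way to use replicates when calling new isoforms is to keep only candidates that show up in more than one replicate. The Python sketch below shows such a filter; it is a crude reproducibility check rather than a statistical model, and the function name, the threshold of two, and the isoform labels are all hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def reproducible_features(replicates: Iterable[Iterable[Hashable]],
                          min_replicates: int = 2) -> Set[Hashable]:
    """Keep features (e.g. candidate isoforms) seen in at least min_replicates replicates."""
    support: Counter = Counter()
    for features in replicates:
        # set() ensures a feature counts once per replicate, however often it is listed.
        support.update(set(features))
    return {feature for feature, count in support.items() if count >= min_replicates}


replicate_calls = [
    {"isoform_A", "isoform_B", "isoform_C"},
    {"isoform_A", "isoform_C"},
    {"isoform_A", "isoform_D"},
]
print(sorted(reproducible_features(replicate_calls)))
# prints: ['isoform_A', 'isoform_C']
```

Candidates seen in a single replicate (isoform_B and isoform_D here) are exactly the ones that technical error or an outlier sample could explain.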
Three things to remember
Methods for analyzing RNAseq data are not yet as mature as expression array analysis tools.
Identifying transcript isoforms is especially difficult.
Replicates are crucial to account for biological variability.
Next Week
Abstract: This session will follow up on transcript quantification of RNAseq data and discuss statistical means of identifying differentially regulated transcripts and isoforms, and contrast these with microarray analysis approaches.

Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Transcript detection in RNAseq

  • 1. [by Joseph Robertson] RNAseq analysis: Transcript detection (1/2). What is a jar? August 11, 2011
  • 2. Quick recap: Production informatics. Sequencing -> Images -> Conversion (Demultiplexing). Resulting file type: FASTQ, “Having raw sequence reads and quality scores”. Stages shown: Sequencing, Image, FASTQ, Quality Control, Projects.
  • 3. Objective & Challenges. Objective: study the active transcriptome of the cell. Problems: the RNA content of a cell is dominated by tRNA, rRNA and housekeeping genes; the flowcell has only finite real estate, most of which would be occupied by these mainly invariable transcripts. How can the sequencing be focused on the “interesting” part of the transcriptome (mRNA and ncRNA)?
  • 4. What RNAseq protocols are there? RNA-seq: total RNA, tRNA/rRNA removed + polyA-tail filtered; good for studying protein-coding genes, e.g. gene expression, isoforms, expression of variant alleles, and RNA editing events (“RNA-DNA differences in the human transcriptome provide a yet-unexplored aspect of genome variation.”). Small RNAseq: total RNA, size selection for small RNA molecules; good for small ncRNA, e.g. miRNAs, snoRNA. Duplex-specific thermostable nuclease (DSN) guided RNA-seq normalization: total RNA, highly abundant transcripts are digested; good for studying all transcripts. References: Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952. Christodoulou DC, Gorham JM, Herman DS, Seidman JG. Construction of normalized RNA-seq libraries for next-generation sequencing using the crab duplex-specific nuclease. Curr Protoc Mol Biol. 2011. PMID: 21472699.
  • 5. RNA-seq workflow: select polyA-tail + remove tRNA/rRNA; fragment RNA; make cDNA (caution: you may lose strand info); sequence; map reads; identify transcripts; quantify transcripts; identify differences between conditions. Reference: Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008. PMID: 18516045.
  • 6. Production Informatics and Bioinformatics. Basic Production Informatics: produce raw sequence reads. Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs). Bioinformatics Research: analyze the data; uncover the biological meaning. (Per one-flowcell project.)
  • 7. Challenges for RNAseq read mapping. Losing reads because they do not match the ref. genome: reads spanning exon junctions; RNA editing events. Approaches: align to a ref. transcriptome library; exon-first, e.g. TopHat; seed-extend methods, e.g. GSNAP. Figure labels: sequencing reads, DNA, gRNA, mRNA, editing event. References: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353. Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.
  • 8. Exon-first approach: align reads to the ref. genome; chop up unaligned reads and try to identify matching regions; find splice junctions around the matches. Reference: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
  • 9. Seed-extend approach: break reads into smaller k-mers and find matches; iteratively extend the k-mers to identify the exact spliced alignment. Reference: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
  • 10. Which method? Exon-first: less computationally intensive. The additional exon junctions found by seed-extend have not (yet) been demonstrated to be real. Reference: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
  • 11. Challenges for transcript detection. Identifying isoforms is difficult: transcript abundance is volatile; most reads are not helpful (reads from exons) or even misleading (incompletely spliced precursor RNA); genes can have many isoforms. Approaches: ignore isoforms; genome-guided reconstruction, e.g. Cufflinks; genome-independent reconstruction, e.g. Trinity. (Figure: QBI data.) Reference: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
  • 12. Genome-guided reconstruction: use reads spanning splice junctions to assemble the transcript path; work out the minimal possible set of paths such that all reads are visited (graph theory); if there is more than one such set, use read counts to pick the most probable. Figure labels: reads aligned to the genome, isoforms. Reference: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
  • 13. Genome-independent reconstruction: break reads into k-mers and find their mutual overlaps to build a de Bruijn graph; find probable paths through the graph by using read counts; map the consensus assembly to the genome. Reference: Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011. PMID: 21747384.
  • 14. Which method? De novo methods are very computationally intensive. However, they are able to find alternative isoforms and promoters, and structural variation: deletions (yellow), chimeras (green). Reference: Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011. PMID: 21747384.
  • 15. What are real transcripts? Even the most sophisticated computational method can’t tell you what is a real transcript. Figure labels: Roberts et al., QBI data. Reference: Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011. PMID: 21697122.
  • 16. Solution: biological replicates. Significant findings (here: new isoforms) in small sample sets can be due to technical errors, biological variability, or population outliers. Sequencing experiments are subject to the same issues (even though they are more expensive than arrays). Replicates are necessary to build confidence in your results! Reference: Hansen KD, Wu Z, Irizarry RA, Leek JT. Sequencing technology does not eliminate biological variability. Nat Biotechnol. 2011. PMID: 21747377.
  • 17. Three things to remember: methods for analyzing RNAseq data are not yet as mature as expression array analysis tools; identifying transcript isoforms is especially difficult; replicates are crucial to account for biological variability.
  • 18. Next Week. Abstract: This session will follow up on transcript quantification of RNAseq data and discuss statistical means of identifying differentially regulated transcripts and isoforms, contrasting these with microarray analysis approaches.
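
Slide 13 describes genome-independent reconstruction only in outline: break reads into k-mers, link overlapping k-mers into a de Bruijn graph, and use read counts to pick probable paths. The Python sketch below illustrates that idea on three invented toy reads. It is not how Trinity or any other assembler is implemented; the function names, the choice of k = 4, and the greedy "follow the most-counted edge" rule are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all overlapping k-mers of a read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer is an edge.

    graph[prefix][suffix] counts how often the k-mer joining them was seen.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def greedy_path(graph, start):
    """Follow the highest-count outgoing edge until a dead end or a revisit.

    Hypothetical simplification: real tools score whole paths, not single edges.
    """
    path, node, seen = start, start, {start}
    while graph.get(node):
        nxt = max(graph[node], key=graph[node].get)
        if nxt in seen:
            break
        path += nxt[-1]
        seen.add(nxt)
        node = nxt
    return path


# Toy reads, invented for illustration; they overlap end to end.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)
contig = greedy_path(graph, "ATG")
print(contig)  # ATGGCGTGCAA
```

With alternative isoforms the graph branches, and a single greedy walk would report only one of them. That is where the read counts mentioned on the slide matter, and part of why the deck calls de novo methods computationally intensive.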

Editor's notes

  1. http://www.nature.com/nbt/journal/v29/n7/full/nbt.1915.html
  2. That is, double-stranded cDNA is denatured, then allowed to partially re-anneal, and the most abundant species, which re-anneal most rapidly, are digested with crab duplex-specific nuclease.
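
The second note can be made concrete with a small calculation. Under idealized second-order re-annealing kinetics, the fraction of a species still single-stranded after time t is 1 / (1 + k·C0·t), so abundant species (large C0) become double-stranded faster and are preferentially digested. The Python sketch below applies that formula to invented toy abundances. It assumes each species re-anneals only with itself and that the nuclease removes every duplex completely; the rate constant, time, and starting amounts are arbitrary illustrative values, not figures from the talk or the cited protocol.

```python
def single_stranded_fraction(c0, rate, time):
    """Fraction still single-stranded under ideal second-order re-annealing."""
    return 1.0 / (1.0 + rate * c0 * time)


def dsn_normalize(abundances, rate, time):
    """Relative abundances after all re-annealed duplexes are digested.

    Assumption: digestion of double-stranded molecules is complete.
    """
    surviving = {
        name: c0 * single_stranded_fraction(c0, rate, time)
        for name, c0 in abundances.items()
    }
    total = sum(surviving.values())
    return {name: amount / total for name, amount in surviving.items()}


# Toy starting amounts in arbitrary units, invented for illustration.
before = {"rRNA": 1000.0, "housekeeping": 100.0, "rare_mRNA": 1.0}
after = dsn_normalize(before, rate=1.0, time=1.0)

share_before = before["rare_mRNA"] / sum(before.values())
print(f"rare_mRNA share before: {share_before:.4f}, after: {after['rare_mRNA']:.4f}")
```

In this toy setting the rare transcript goes from well under 1% of the pool to roughly a fifth of it, which is the sense in which the library is "normalized".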