Technical and biological variance structure in mRNA-Seq data: life in the real world
Paper by Ann Oberg, et al.
October 2, 2013
Concept
Suppose x is helpful in predicting y.
y = β0 + β1x + ε    (1)
ε ∼ N(0, σ²)
No variation, no model
°C = (°F − 32) × 5/9    (2)
Concept
RNA-Seq studies: sources of variation
Technical variation: flow cell, replication in lanes, library preparation, etc.
Biological variation: person to person
Observed count data: a combination of both types of variation.
Concept
Technical variation: Poisson distribution, Var(Y) = µ
Total variation: over-dispersion, Var(Y) > µ
Within-sample variation ∼ Poisson distribution
Between-sample variation ∼ Gamma distribution
This gives rise to the Negative Binomial distribution
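The Gamma-Poisson mixture described on this slide can be checked by simulation. The following is a minimal sketch, not code from the paper: the values of mu and phi, the seed, and the helper names are all illustrative assumptions. Each sample first draws its own mean from a Gamma distribution (between-sample variation), then a count from a Poisson distribution with that mean (within-sample variation); the resulting variance should be close to µ + φµ².

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def gamma_poisson_counts(rng, mu, phi, n):
    """Simulate n counts: Gamma-distributed per-sample mean, then Poisson sampling noise."""
    shape = 1.0 / phi   # Gamma shape parameter
    scale = mu * phi    # Gamma scale, so the Gamma mean is shape * scale = mu
    return [poisson_draw(rng, rng.gammavariate(shape, scale)) for _ in range(n)]


# Illustrative values only (assumptions, not estimates from the paper).
rng = random.Random(0)
mu, phi = 20.0, 0.25
counts = gamma_poisson_counts(rng, mu, phi, 20000)
sample_mean = statistics.fmean(counts)
sample_var = statistics.variance(counts)
nb_theory = mu + phi * mu ** 2
print(f"mean={sample_mean:.2f}  variance={sample_var:.2f}  NB theory={nb_theory:.2f}")
```

With these settings the simulated variance is several times the mean, which a pure Poisson model (variance equal to the mean) cannot reproduce.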
Purpose of the paper
Describe the mean-variance relationship in mRNA-Seq data
1. Var(Y) = µ: Poisson
2. Var(Y) = kµ: Overdispersed Poisson (OD)
3. Var(Y) = µ + φµ²: Negative Binomial distribution
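As a small worked example of the three mean-variance relationships listed above, the sketch below evaluates each variance function at a few means. The values of k and phi are made up for illustration and are not estimates from the paper.

```python
def poisson_variance(mu):
    """Poisson: Var(Y) = mu."""
    return mu


def od_poisson_variance(mu, k):
    """Overdispersed Poisson: Var(Y) = k * mu, linear in the mean."""
    return k * mu


def nb_variance(mu, phi):
    """Negative Binomial: Var(Y) = mu + phi * mu**2, quadratic in the mean."""
    return mu + phi * mu ** 2


# Illustrative parameter values (assumptions).
k, phi = 3.0, 0.25
for mu in (1.0, 10.0, 100.0):
    print(mu, poisson_variance(mu), od_poisson_variance(mu, k), nb_variance(mu, phi))
```

The OD variance stays a fixed multiple of the mean, while the Negative Binomial variance pulls away from it as the mean grows; this difference is what a mean-variance plot is meant to reveal.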
Purpose of the paper
Estimation of φ is a crucial step
1. Per gene: glm.nb function in MASS
2. Local: empirical Bayes estimate shrinking the per-gene estimate towards the global estimate, edgeR
3. Global: quantile-adjusted conditional maximum likelihood, edgeR
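To make the role of φ concrete, here is a rough per-gene estimate obtained by solving Var(Y) = µ + φµ² for φ with the sample mean and variance. This is a method-of-moments stand-in chosen for simplicity; it is not what glm.nb or edgeR compute (those use likelihood-based estimation), and the gene names and counts are hypothetical.

```python
import statistics


def moment_dispersion(counts):
    """Method-of-moments estimate of phi from Var(Y) = mu + phi * mu**2, floored at 0."""
    mean = statistics.fmean(counts)
    if mean == 0:
        return 0.0
    variance = statistics.variance(counts)  # sample variance; needs at least 2 counts
    return max(0.0, (variance - mean) / mean ** 2)


# Hypothetical counts for three genes across six subjects (made-up numbers).
genes = {
    "gene_a": [12, 9, 11, 10, 8, 10],
    "gene_b": [3, 40, 0, 25, 7, 15],
    "gene_c": [100, 180, 60, 140, 95, 205],
}
per_gene = {name: moment_dispersion(c) for name, c in genes.items()}
for name, phi_hat in per_gene.items():
    print(f"{name}: phi_hat = {phi_hat:.3f}")
```

Per-gene estimates like these are noisy with few subjects, which is the motivation for the local (shrunken) and global estimates in the list above.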
Data and Statistical Experimental Design, Figure 1
25 study subjects (all female Caucasians): 12 high and 13 low antibody responders
13 flow cells, each with 8 lanes: 4 for high response, 4 for low response
For each response group, two specimens: unstimulated and stimulated
2 replicates each for the unstimulated and stimulated specimens
2 subjects from the high-response group failed, leaving 10 high and 13 low responders
Only the unstimulated specimens were used, to avoid correlation
Figure: 2
Statistical Analysis
Models were fit to unstimulated specimens only, to focus on biological variation.
Counts for the two technical replicates were summed for the models.
Normalization: none, total count per lane-pair, or 75th-percentile count per lane-pair as the normalization constant.
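The two preprocessing steps on this slide can be sketched as follows: sum the two technical replicates gene by gene, then take the 75th percentile of the summed counts as the lane-pair normalization constant, whose log would enter a count model as an offset. The counts are made up, and details such as the quantile interpolation method and whether zero-count genes are excluded are assumptions here, not taken from the paper.

```python
import math
import statistics


def sum_replicates(rep1, rep2):
    """Add the counts of two technical replicates gene by gene."""
    return [a + b for a, b in zip(rep1, rep2)]


def upper_quartile(counts):
    """75th percentile of the counts (inclusive interpolation; an assumed convention)."""
    return statistics.quantiles(counts, n=4, method="inclusive")[2]


# Hypothetical replicate counts for five genes from one subject.
rep1 = [1, 2, 3, 4, 5]
rep2 = [1, 2, 3, 4, 5]
lane_pair = sum_replicates(rep1, rep2)
q75 = upper_quartile(lane_pair)
offset = math.log(q75)  # log normalization constant, used as a model offset
print(lane_pair, q75, round(offset, 3))
```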
Technical variation
Representative scatter plot of technical replicate 1 versus technical
replicate 2 for one subject. Spearman correlation was 0.9941 for
this pair.
Figure: Supplementary plot
Technical variation
The vertical axis is the difference between the counts in the two
replicates on the log2 scale, and the horizontal axis is the average
of the two counts on the log2 scale.
Figure: Bland Altman plot: Supplementary plot
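The coordinates of such a Bland-Altman plot can be computed as in the sketch below. The pseudocount of 1 (added so that zero counts have a defined log2) is an assumption for illustration; the slides do not say how zeros were handled. The replicate counts are hypothetical.

```python
import math


def bland_altman_point(count1, count2, pseudocount=1.0):
    """Return (average, difference) of two replicate counts on the log2 scale."""
    a = math.log2(count1 + pseudocount)
    b = math.log2(count2 + pseudocount)
    return (a + b) / 2.0, a - b


# Hypothetical counts for four genes in technical replicates 1 and 2.
replicate1 = [3, 15, 120, 0]
replicate2 = [1, 17, 110, 2]
points = [bland_altman_point(c1, c2) for c1, c2 in zip(replicate1, replicate2)]
for average, difference in points:
    print(f"average={average:.2f}  difference={difference:+.2f}")
```

If technical variation is well behaved, the differences scatter around zero and tighten as the average count increases.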
Technical variation
QQ plots assuming a Poisson distribution are in the additional files.
Technical variation in general follows a Poisson distribution.
Biological variation, Figure 3
A. Plot of mean (x̄) and variance (S²)
B. Local estimates of φ and per-group mean count
Figure: 3
Goodness of fit
QQ plots
1. Standard Poisson
2. NB with global estimate of φ
3. NB with per-gene estimate of φ
4. NB with local estimate of φ
Figure: 4
Experimental variation
Potential sources of experimental variation examined (when experimental factors were included in the model): flow cell, lane-pair and library preparation batch
Figure 5
Figure: 5
Flow-cell effect: the observed counts were all smaller than the expected counts.
The reason was a software upgrade midway through the experiment.
The number of reads increased with the software upgrade (Figure 6A).
After the 75th-percentile offset was used, there was no clear flow-cell effect.
Figure: 6
Characterizing genes with poor model fit
Effect of genes with small counts.
1. Smallest GOF statistics: indicative of overfitting
2. Largest GOF statistics: indicative of underfitting (not explaining enough variance)
Filtering out genes with total counts of up to 10,000 had minor impact
GOF statistics for genes with an average count < 5 per subject were distributed throughout the range (Figure 7A).
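To illustrate why small and large goodness-of-fit values point in opposite directions, the sketch below uses a Pearson-type statistic: squared deviations from the fitted mean divided by the model variance µ + φµ². The slides do not state which GOF statistic the paper used, so this choice, along with the function name and the counts, is an assumption for illustration.

```python
def pearson_gof(counts, mu, phi=0.0):
    """Pearson-type statistic: sum of (y - mu)**2 divided by the model variance mu + phi * mu**2."""
    model_variance = mu + phi * mu ** 2
    return sum((y - mu) ** 2 for y in counts) / model_variance


# Hypothetical genes with the same fitted mean but very different spread.
consistent = [10, 10, 10, 10]
variable = [0, 20, 1, 19]
mu_hat = 10.0
for label, counts in (("consistent", consistent), ("variable", variable)):
    poisson_stat = pearson_gof(counts, mu_hat)       # phi = 0 reduces to the Poisson case
    nb_stat = pearson_gof(counts, mu_hat, phi=0.25)  # illustrative phi
    print(f"{label}: Poisson={poisson_stat:.2f}  NB={nb_stat:.2f}")
```

Very consistent counts give a statistic near zero (the model allows more variance than the data show), while highly variable counts give a large value (the model explains too little variance).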
Figure: 7
Data records of genes with very small GOF statistics:
1. All zero counts in one response group and non-zero counts in the other
2. Counts very consistent, with small variance
Data records of genes with very large GOF statistics:
1. The variance is very high. An example of one such gene is in Figure 7B
Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technical and biological variance structure in mRNA-Seq data: life in the real world"

Flow Your Strategy at Flight Levels Day 2024
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
 
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / NcrCall Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
 

Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technical and biological variance structure in mRNA-Seq data: life in the real world"

  • 1. Technical and biological variance structure in mRNA-Seq data: life in the real world. Paper by Ann Oberg, et al. October 2, 2013
  • 2. Concept: Suppose x is helpful in predicting y: $y = \beta_0 + \beta_1 x + \varepsilon$ (1), with $\varepsilon \sim N(0, \sigma^2)$. No variation, no model: $^{\circ}C = (^{\circ}F - 32) \times \frac{5}{9}$ (2)
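The contrast between equations (1) and (2) can be illustrated with a small sketch. It fits a straight line by ordinary least squares to temperature pairs generated from equation (2). Because that relationship is deterministic, the fit recovers the slope 5/9 exactly (up to floating-point error) and the residuals are all zero, so there is no variation left to model. The function name `fit_line` and the sample values are illustrative assumptions, not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Illustrative data: Celsius computed exactly from Fahrenheit via equation (2).
fahrenheit = [32.0, 50.0, 68.0, 86.0, 104.0]
celsius = [(f - 32.0) * 5.0 / 9.0 for f in fahrenheit]

b0, b1 = fit_line(fahrenheit, celsius)
residuals = [c - (b0 + b1 * f) for f, c in zip(fahrenheit, celsius)]
print("intercept:", b0, "slope:", b1)
print("largest absolute residual:", max(abs(r) for r in residuals))
```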
  • 3. Concept: RNA-Seq studies, sources of variation. Technical variation: flow cell, replication in lanes, library preparation, etc. Biological variation: person to person. Observed count data: a combination of both types of variation.
  • 4. Concept: Technical variation follows a Poisson distribution: $\mathrm{Var}(Y) = \mu$. Total variation shows over-dispersion: $\mathrm{Var}(Y) > \mu$. Within-sample variation ∼ Poisson distribution; between-sample variation ∼ Gamma distribution. This gives rise to the Negative Binomial distribution.
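A minimal simulation sketch of the Poisson-Gamma mixture described on this slide. Each sample draws its own rate from a Gamma distribution (between-sample variation) and then a count from a Poisson distribution with that rate (within-sample variation). With Gamma shape 1/φ and scale μφ, the law of total variance gives Var(Y) = μ + φμ², the Negative Binomial form. The parameter values, the seed, and the helper names are assumptions chosen for illustration. The Poisson sampler uses Knuth's multiplication method, which is adequate only for modest rates.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for modest lam)."""
    threshold = math.exp(-lam)
    k = 0
    p = 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(mu, phi, n, seed=0):
    """Gamma-distributed rates across samples, Poisson counts within a sample."""
    rng = random.Random(seed)
    shape = 1.0 / phi   # Gamma shape
    scale = mu * phi    # Gamma scale, so the mean rate is shape * scale = mu
    return [sample_poisson(rng.gammavariate(shape, scale), rng) for _ in range(n)]


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / (n - 1)
    return m, v


counts = simulate_counts(mu=10.0, phi=0.5, n=20000, seed=1)
m, v = mean_and_variance(counts)
print("sample mean:", m)
print("sample variance:", v, "versus mu + phi * mu^2 =", 10.0 + 0.5 * 10.0 ** 2)
```

The sample variance should land near 60 rather than near the Poisson value of 10, which is the over-dispersion the slide refers to.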
  • 5. Purpose of the paper: Describe the mean-variance relationship in mRNA-Seq data. 1. $\mathrm{Var}(Y) = \mu$: Poisson. 2. $\mathrm{Var}(Y) = k\mu$: Overdispersed Poisson (OD). 3. $\mathrm{Var}(Y) = \mu + \phi\mu^2$: Negative Binomial distribution.
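The three mean-variance relationships listed on this slide can be written as plain functions, which makes their difference easy to see. Under the overdispersed Poisson the variance-to-mean ratio is the constant k, while under the Negative Binomial it is 1 + φμ and grows with the mean. The values of `k` and `phi` below are arbitrary illustrations, not estimates from the paper.

```python
def var_poisson(mu):
    """Poisson: Var(Y) = mu."""
    return mu


def var_overdispersed_poisson(mu, k):
    """Overdispersed Poisson: Var(Y) = k * mu."""
    return k * mu


def var_negative_binomial(mu, phi):
    """Negative Binomial: Var(Y) = mu + phi * mu^2."""
    return mu + phi * mu ** 2


k, phi = 3.0, 0.25  # illustrative values only
for mu in (1.0, 10.0, 100.0, 1000.0):
    print(
        mu,
        var_poisson(mu) / mu,
        var_overdispersed_poisson(mu, k) / mu,
        var_negative_binomial(mu, phi) / mu,
    )
```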
  • 6. Purpose of the paper: Estimation of $\phi$ is a crucial step. 1. Per gene: glm.nb function in MASS. 2. Local: empirical Bayes estimate shrinking the per-gene estimate towards the global estimate, edgeR. 3. Global: quantile-adjusted conditional maximum likelihood, edgeR.
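To make the per-gene, global, and local distinction concrete, here is a deliberately simplified sketch. It is not what glm.nb or edgeR do: it swaps in a method-of-moments estimate, φ = (s² − m) / m², in place of maximum likelihood, takes the global value as the mean of the per-gene values, and forms a "local" value as a fixed-weight average of the two. The `weight` parameter and all function names are hypothetical and exist only to show the idea of shrinking a noisy per-gene estimate towards a shared one.

```python
from statistics import mean, variance


def phi_per_gene(counts):
    """Method-of-moments dispersion for one gene, truncated at zero."""
    m = mean(counts)
    if m == 0:
        return 0.0
    return max(0.0, (variance(counts) - m) / (m * m))


def phi_global(count_table):
    """One shared dispersion: here simply the mean of the per-gene estimates."""
    return mean(phi_per_gene(counts) for counts in count_table)


def phi_local(per_gene, global_phi, weight=0.5):
    """Shrink a per-gene estimate towards the global one (hypothetical weight)."""
    return weight * global_phi + (1.0 - weight) * per_gene


# Toy table: one row of counts per gene (made-up numbers).
table = [[2, 4, 6, 8, 10], [5, 5, 5, 5, 5], [0, 30, 3, 50, 7]]
shared = phi_global(table)
for counts in table:
    own = phi_per_gene(counts)
    print(counts, "per-gene:", own, "local:", phi_local(own, shared))
```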
  • 7. Data and Statistical Experimental Design, Figure 1: 25 study subjects (all female Caucasians): 12 high and 13 low antibody responders. 13 flow cells, each with 8 lanes: 4 for high response, 4 for low response. For each response group, two specimens: unstimulated and stimulated. 2 replicates each for the unstimulated and stimulated specimens. 2 subjects from the high response group failed, leaving 10 high and 13 low subjects. Only the unstimulated specimens were used, to avoid correlation.
  • 8. Figure 2
  • 9. Statistical Analysis: Models were fit to unstimulated specimens only, to focus on biological variation. Counts for the two technical replicates were summed for the models. No normalization with total count per lane-pair OR 75th percentile count per lane-pair as normalization constant.
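The preprocessing named on this slide can be sketched as follows: sum the two technical replicates gene by gene, then compute a candidate normalization constant per lane-pair, either the total count or the 75th percentile count. Several details are assumptions: the nearest-rank definition of the percentile, keeping zero counts in the percentile calculation, and entering the constant into a log-link count model as a log offset. The counts are made up.

```python
import math


def sum_replicates(rep1, rep2):
    """Gene-wise sum of the counts from two technical replicates."""
    if len(rep1) != len(rep2):
        raise ValueError("replicates must cover the same genes")
    return [a + b for a, b in zip(rep1, rep2)]


def total_count(counts):
    """Total count for one lane-pair."""
    return sum(counts)


def percentile_75(counts):
    """75th percentile by the nearest-rank rule (assumed definition)."""
    ordered = sorted(counts)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def log_offset(constant):
    """Offset for a log-link count model (assumed usage of the constant)."""
    return math.log(constant)


lane_pair = sum_replicates([10, 0, 5, 3], [12, 1, 4, 2])  # toy counts
print("summed counts:", lane_pair)
print("total count:", total_count(lane_pair))
print("75th percentile:", percentile_75(lane_pair))
print("log offset from 75th percentile:", log_offset(percentile_75(lane_pair)))
```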
  • 10. Technical variation: Representative scatter plot of technical replicate 1 versus technical replicate 2 for one subject. Spearman correlation was 0.9941 for this pair. Figure: Supplementary plot
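For reference, a Spearman correlation between two replicate count vectors is the Pearson correlation of their ranks, with tied values sharing the average rank. The sketch below implements that definition with the standard library only. The counts are toy values and do not reproduce the 0.9941 reported on the slide.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


replicate_1 = [0, 3, 15, 120, 7, 980]   # toy counts
replicate_2 = [1, 2, 18, 111, 9, 1015]
print("Spearman correlation:", spearman(replicate_1, replicate_2))
```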
  • 11. Technical variation: The vertical axis is the difference between the counts in the two replicates on the log2 scale, and the horizontal axis is the average of the two counts on the log2 scale. Figure: Bland-Altman plot (supplementary plot)
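The coordinates of such a Bland-Altman plot can be computed as below: for each gene, the horizontal value is the average of the two log2 counts and the vertical value is their difference. The slide does not say how zero counts were handled, so the `pseudocount` added before taking logs is an assumption, as are the toy counts.

```python
import math


def bland_altman_log2(rep1, rep2, pseudocount=1.0):
    """Return (average, difference) pairs on the log2 scale, one per gene.

    The pseudocount is an assumed guard against log2(0); it is not taken
    from the slides.
    """
    points = []
    for a, b in zip(rep1, rep2):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        points.append(((log_a + log_b) / 2.0, log_a - log_b))
    return points


replicate_1 = [7, 0, 63, 1023]   # toy counts
replicate_2 = [1, 0, 63, 511]
for average, difference in bland_altman_log2(replicate_1, replicate_2):
    print("average:", average, "difference:", difference)
```

Points scattered symmetrically around a difference of zero across the range of averages indicate that the two replicates agree without a count-dependent bias.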
  • 12. Technical variation: QQ plots assuming a Poisson distribution are in the additional files. Technical variation in general follows a Poisson distribution.
  • 13. Biological variation, Figure 3: A. Plot of mean ($\bar{x}$) and variance ($S^2$). B. Local estimates of $\phi$ and per-group mean count. Figure: 3
  • 14. Goodness of fit: QQ plots. 1. Standard Poisson. 2. NB with global estimate of $\phi$. 3. NB with per-gene estimate of $\phi$. 4. NB with local estimate of $\phi$.
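The slides do not state which goodness-of-fit statistic underlies the QQ plots. As one common choice, and purely as an assumption, the sketch below uses the Pearson statistic: the sum over subjects of (observed − fitted)² divided by the model variance. Setting `phi` to zero gives the Poisson version, and a positive `phi` gives the Negative Binomial version, so the four models on the slide differ only in which dispersion value is plugged in. The counts and fitted means are made up.

```python
def pearson_gof(counts, fitted_means, phi=0.0):
    """Pearson goodness-of-fit statistic with variance mu + phi * mu^2.

    phi = 0 corresponds to the Poisson model; phi > 0 to the Negative Binomial.
    """
    return sum(
        (y - mu) ** 2 / (mu + phi * mu ** 2)
        for y, mu in zip(counts, fitted_means)
    )


counts = [5, 15, 2, 18]                 # toy counts for one gene
fitted = [10.0, 10.0, 10.0, 10.0]       # toy fitted means
print("Poisson:", pearson_gof(counts, fitted))
print("NB, phi = 0.25:", pearson_gof(counts, fitted, phi=0.25))
```

For the same fitted means, a larger dispersion always gives a smaller statistic, which is why an over-dispersed gene can look badly fit under the Poisson model and acceptable under the Negative Binomial.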
  • 15. Figure 4
  • 16. Experimental variation: Potential sources of experimental variation examined (when experimental factors were included in the model): flow cell, lane-pair and library preparation batch. Figure 5
  • 17. Figure 5
  • 18. Flow cell: the observed counts were all smaller than the expected counts. The reason was a software upgrade midway through the experiment. The number of reads increased with the software upgrade (Figure 6A). After the 75th percentile offset was used, there was no clear flow-cell effect.
  • 19. Figure 6
  • 20. Characterizing genes with poor model fit: Effect of genes with small counts. 1. Smallest GOF statistics: indicative of overfitting. 2. Largest GOF statistics: indicative of underfitting (not explaining enough variance). Filtering out genes with up to 10,000 total count had minor impact. GOF statistics for genes with an average count < 5 per subject were distributed throughout the range (Figure 7A).
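The two low-count criteria mentioned on this slide, a total-count filter and an average count below 5 per subject, can be expressed as simple filters. This is a sketch over a hypothetical dictionary that maps a gene name to its per-subject counts. The gene names, the counts, and the choice of a strict "greater than" comparison at the threshold are all assumptions.

```python
def filter_by_total_count(gene_counts, min_total):
    """Keep genes whose total count across subjects exceeds min_total."""
    return {
        gene: counts
        for gene, counts in gene_counts.items()
        if sum(counts) > min_total
    }


def low_average_genes(gene_counts, max_average=5.0):
    """Names of genes whose average count per subject is below max_average."""
    return [
        gene
        for gene, counts in gene_counts.items()
        if sum(counts) / len(counts) < max_average
    ]


# Hypothetical gene names and made-up counts for four subjects.
gene_counts = {
    "geneA": [0, 2, 1, 3],
    "geneB": [4000, 5200, 3900, 4100],
    "geneC": [12, 9, 15, 11],
}
print("kept at total > 10,000:", sorted(filter_by_total_count(gene_counts, 10000)))
print("average below 5 per subject:", low_average_genes(gene_counts))
```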
  • 21. Figure 7
  • 22. Data records of genes with very small GOF statistics: 1. All 0 counts in one response group and non-zero counts in the other. 2. Counts very consistent, with small variance. Data records of genes with very large GOF statistics: 1. The variance is very high. Example of one such gene in Figure 7B.
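The first pattern above is easy to flag programmatically. A gene with all zero counts in one response group has a fitted mean near zero for that group, so a model with a group term matches those observations almost exactly, which is consistent with a very small GOF statistic. The sketch below flags that pattern and reports a variance-to-mean ratio as a rough indicator of the other two patterns. No cutoffs for "small" or "very high" variance are given in the slides, so none are applied here. The function names and the counts are illustrative assumptions.

```python
from statistics import mean, variance


def all_zero_in_one_group(high_counts, low_counts):
    """True when exactly one response group has only zero counts."""
    high_all_zero = all(count == 0 for count in high_counts)
    low_all_zero = all(count == 0 for count in low_counts)
    return high_all_zero != low_all_zero


def variance_to_mean_ratio(counts):
    """Sample variance divided by sample mean; None when the mean is zero."""
    m = mean(counts)
    if m == 0:
        return None
    return variance(counts) / m


high = [0, 0, 0, 0]     # made-up counts, high responders
low = [3, 1, 4, 2]      # made-up counts, low responders
print("all zero in one group:", all_zero_in_one_group(high, low))
print("variance-to-mean ratio, low group:", variance_to_mean_ratio(low))
```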