SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
Generating the count table
and validating assumptions
RNA-seq for DE analysis training
Joachim Jacob
20 and 27 January 2014

This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to
http://www.bits.vib.be/ if you use this presentation or parts hereof.
Goal
Summarize the read counts per gene from
a mapping result.
The outcome is a raw count table on
which we can perform some QC.
This table is used by the differential
expression algorithm to detect DE genes.
Status
The challenge
'Exons' are the type of features used here.
They are summarized per 'gene'

Alt splicing
Overlaps no feature

Concept:
GeneA = exon 1 + exon 2 + exon 3 + exon 4 = 215 reads
GeneB = exon 1 + exon 2 + exon 3 = 180 reads
No normalization yet! Just pure counts, aka 'raw counts',
Tools to count features
●

Different tools exist to accomplish this:

http://wiki.bits.vib.be/index.php/RNAseq_toolbox#Feature_counting
Dealing with ambiguity
●

We focus on the gene level: merge all counts over
different isoforms into one, taking into account:
●

●

●

Reads that do not overlap a feature, but appear in
introns. Take into account?
Reads that align to more than one feature (exon or
transcript). Transcripts can be overlapping - perhaps
on different strands. (PE, and strandedness can
resolve this partially).
Reads that partially overlap a feature, not following
known annotations.
HTSeq count has 3 modes
HTSeq-count
recommends
the 'union
mode'. But
depending on
your genome,
you may opt
for the
'intersection_st
rict mode'.
Galaxy allows
experimenting!

http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
Indicate the SE or PE nature of your data
(note: mate-pair is not
appropriate naming here)
The annotation file with the coordinates
of the features to be counted
mode
Reverse stranded: heck with mapping viz
Check with mapping QC (see earlier)
For RNA-seq DE we summarize over
'exons' grouped by 'gene_id'. Make sure
these fields are correct in your GTF file.
Resulting count table column

One sample !
Merging to create experiment count table
Resulting count table
Quality control of count table
Relative numbers

Absolute numbers

In the end, we used about 70% of the reads. Check for your experiment.
Quality control of count table
2 types of QC:
●

General metrics

●

Sample-specific quality control
QC: general metrics
●

General numbers
QC: general metrics
Which genes are most highly present?
Which fractions do they occupy?
Gene

Counts

42 genes (0,0063%)
of the 6665 genes
take 25% of all
counts.
This graph can be
constructed from
the count table.
TEF1alpha, putative ribo prot,...
QC: general metrics
●

General numbers
QC: general metrics
●

We can plot the counts per sample: filter
out the '0', and transform on log2.

The bulk of the genes have counts
in the hundreds.

Few are extremely highly expressed
A minority have extremely low counts
log2(count)
QC: log2 density graph
●

We can do this for all samples, and merge
All samples show
nice overlap, peaks
are similar

Strange
Deviation
here
QC: log2 merging samples
Here, we take one sample,
plot the log2 density
graph, add the counts of
another sample, and plot
again, add the counts of
another sample, etc. until
we have merged all
samples.
We see a horizontal shift
of the graph, rather than a
vertical shift, pointing to
no saturation.
QC: log2, merging samples
Here, we take one sample,
plot the log2 density
graph, add the counts of
another sample, and plot
again, add the counts of
another sample, etc. until
we have merged all
samples.
QC: rarefaction curve
What is the number
of total detected
features, how does
the feature space
increase with each
additional sample
added?
There should be
saturation, but
here there is none.
Code:
ggplot(data = nonzero_counts, aes(total,
counts)) + geom_line() + labs(x = "total
number of sequenced reads",
y = "number of genes with counts > 0")
Sample A
Sample A + sample B
Sample A + sample B + sample C
Etc.

QC: rarefaction curve
rRNA genes

Saturation: OK!
QC: transformations for viz

Regularized log (rLog) and 'Variance Stabilizing Transformation'
(VST) as alternatives to log2.
http://www.bioconductor.org/packages/2.12/bioc/html/DESeq2.html
QC: count transformations
Not normalizations!
●

Techniques used for microarray can be
applied on VST transformed counts.
Log2

http://www.biomedcentral.com/1471-2105/14/91

rLog

VST

http://www.bioconductor.org/packages/2.12/bioc/html/DESeq2.html
QC including condition info
●

●

We can also include condition
information, to interpret our QC better.
For this, we need to gather sample
information.
Make a separate file
in which sample info
is provided (metadata)
QC with condition info

What are the differences in
counts in each sample
dependent on? Here: counts are
dependent on the treatment
and the strain. Must match
the sample descriptions file.
QC with condition info
Clustering of the distance between samples based on
transformed counts can reveal sample errors.

VST transformed

Colour scale
Of the distance
measure between
Samples. Similar conditions
Should cluster together

rLog transformed
QC with condition info
Clustering of transformed counts can reveal sample
errors.

VST transformed

rLog transformed
QC with condition info
Principal component (PC) analysis allows to display
the samples in a 2D scatterplot based on variability
between the samples. Samples close to each other
resemble each other more.
Collect enough metadata
Principal component (PC) analysis allows to display
the samples in a 2D scatterplot based on variability
between the samples. Samples close to each other
resemble each other more.

Why do
these resemble
each other?
QC with condition info
During library preparation, collect as much as
information as possible, to add to the sample
descriptions. Pay particular attention to differences
between samples: e.g. day of preparation,
centrifuges used, ...

Why do
these resemble
each other?
Collect enough metadata
In the QC of the count table, you can map this
additional info to the PC graph. In this case, library
prep on a different day had effect on the WT
samples.

Day 1
Day 2

Additional metadata
Collect enough metadata
In the QC of the count table, you can map this
additional info to the PC graph. In this case, library
prep on a different day had effect on the WT
samples (batch effect).

Day 1
Day 2

Additional metadata
Collect enough metadata
Next step
Now we know our data from the inside out, we
can run a DE algorithm on the count table!
Keywords
Raw counts
VST

Write in your own words what the terms mean
Break

Weitere ähnliche Inhalte

Was ist angesagt?

RNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal ClubRNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal Club
Jennifer Shelton
 
DEseq, voom and vst
DEseq, voom and vstDEseq, voom and vst
DEseq, voom and vst
Qiang Kou
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues
Dongyan Zhao
 
RNA-Seq_Presentation
RNA-Seq_PresentationRNA-Seq_Presentation
RNA-Seq_Presentation
Toyin23
 

Was ist angesagt? (20)

RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2
 
RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3
 
presentation
presentationpresentation
presentation
 
Part 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw dataPart 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw data
 
wings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizewings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualize
 
RNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal ClubRNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal Club
 
Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)
 
DEseq, voom and vst
DEseq, voom and vstDEseq, voom and vst
DEseq, voom and vst
 
RNA-Seq Analysis: Everything You Always Wanted to Know...and then some
RNA-Seq Analysis: Everything You Always Wanted to Know...and then someRNA-Seq Analysis: Everything You Always Wanted to Know...and then some
RNA-Seq Analysis: Everything You Always Wanted to Know...and then some
 
RNASeq Experiment Design
RNASeq Experiment DesignRNASeq Experiment Design
RNASeq Experiment Design
 
BITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra tool
 
2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues2015.04.08-Next-generation-sequencing-issues
2015.04.08-Next-generation-sequencing-issues
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseq
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Rna seq pipeline
Rna seq pipelineRna seq pipeline
Rna seq pipeline
 
Eccmid meet the expert 2015
Eccmid meet the expert 2015Eccmid meet the expert 2015
Eccmid meet the expert 2015
 
Rna seq
Rna seqRna seq
Rna seq
 
RNA-Seq_Presentation
RNA-Seq_PresentationRNA-Seq_Presentation
RNA-Seq_Presentation
 
ChipSeq Data Analysis
ChipSeq Data AnalysisChipSeq Data Analysis
ChipSeq Data Analysis
 
GIAB Sep2016 Lightning megan cleveland targeted seq
GIAB Sep2016 Lightning megan cleveland targeted seqGIAB Sep2016 Lightning megan cleveland targeted seq
GIAB Sep2016 Lightning megan cleveland targeted seq
 

Andere mochten auch

Unit 9 - DNA, RNA, and Proteins Notes
Unit 9  - DNA, RNA, and Proteins NotesUnit 9  - DNA, RNA, and Proteins Notes
Unit 9 - DNA, RNA, and Proteins Notes
asteinman
 

Andere mochten auch (20)

An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
RNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingRNA-seq quality control and pre-processing
RNA-seq quality control and pre-processing
 
Introduction to Linux for bioinformatics
Introduction to Linux for bioinformaticsIntroduction to Linux for bioinformatics
Introduction to Linux for bioinformatics
 
Text mining on the command line - Introduction to linux for bioinformatics
Text mining on the command line - Introduction to linux for bioinformaticsText mining on the command line - Introduction to linux for bioinformatics
Text mining on the command line - Introduction to linux for bioinformatics
 
Managing your data - Introduction to Linux for bioinformatics
Managing your data - Introduction to Linux for bioinformaticsManaging your data - Introduction to Linux for bioinformatics
Managing your data - Introduction to Linux for bioinformatics
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in R
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
BITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics data
 
Productivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformaticsProductivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformatics
 
BITS - Comparative genomics on the genome level
BITS - Comparative genomics on the genome levelBITS - Comparative genomics on the genome level
BITS - Comparative genomics on the genome level
 
The structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformaticsThe structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformatics
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
 
Rna seq and chip seq
Rna seq and chip seqRna seq and chip seq
Rna seq and chip seq
 
Macs course
Macs courseMacs course
Macs course
 
Bioinformatics and NGS for advancing in hearing loss research
Bioinformatics and NGS for advancing in hearing loss researchBioinformatics and NGS for advancing in hearing loss research
Bioinformatics and NGS for advancing in hearing loss research
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Sfu ngs course_workshop tutorial_2.1
Sfu ngs course_workshop tutorial_2.1Sfu ngs course_workshop tutorial_2.1
Sfu ngs course_workshop tutorial_2.1
 
BITS - Search engines for mass spec data
BITS - Search engines for mass spec dataBITS - Search engines for mass spec data
BITS - Search engines for mass spec data
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomics
 
Unit 9 - DNA, RNA, and Proteins Notes
Unit 9  - DNA, RNA, and Proteins NotesUnit 9  - DNA, RNA, and Proteins Notes
Unit 9 - DNA, RNA, and Proteins Notes
 

Ähnlich wie RNA-seq for DE analysis: extracting counts and QC - part 4

Abstract - Mining Source Code Change Patterns from Open-Source Repositories
Abstract - Mining Source Code Change Patterns from Open-Source Repositories Abstract - Mining Source Code Change Patterns from Open-Source Repositories
Abstract - Mining Source Code Change Patterns from Open-Source Repositories
ISSEL
 
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού ΛογισμικούΕξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
ISSEL
 
1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx
felicidaddinwoodie
 
Devnology Workshop Genpro 2 feb 2011
Devnology Workshop Genpro 2 feb 2011Devnology Workshop Genpro 2 feb 2011
Devnology Workshop Genpro 2 feb 2011
Devnology
 
A tale of bug prediction in software development
A tale of bug prediction in software developmentA tale of bug prediction in software development
A tale of bug prediction in software development
Martin Pinzger
 

Ähnlich wie RNA-seq for DE analysis: extracting counts and QC - part 4 (20)

3302 3305
3302 33053302 3305
3302 3305
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGS
 
Bioinformatica t4-alignments
Bioinformatica t4-alignmentsBioinformatica t4-alignments
Bioinformatica t4-alignments
 
Bioinformatica 27-10-2011-t4-alignments
Bioinformatica 27-10-2011-t4-alignmentsBioinformatica 27-10-2011-t4-alignments
Bioinformatica 27-10-2011-t4-alignments
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioinformatics t4-alignments wim_vancriekingev2013
Bioinformatics t4-alignments wim_vancriekingev2013Bioinformatics t4-alignments wim_vancriekingev2013
Bioinformatics t4-alignments wim_vancriekingev2013
 
RNA-Seq with R-Bioconductor
RNA-Seq with R-BioconductorRNA-Seq with R-Bioconductor
RNA-Seq with R-Bioconductor
 
Abstract - Mining Source Code Change Patterns from Open-Source Repositories
Abstract - Mining Source Code Change Patterns from Open-Source Repositories Abstract - Mining Source Code Change Patterns from Open-Source Repositories
Abstract - Mining Source Code Change Patterns from Open-Source Repositories
 
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού ΛογισμικούΕξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
 
1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx1PhylogeneticAnalysisHomeworkassignmentThisa.docx
1PhylogeneticAnalysisHomeworkassignmentThisa.docx
 
Introduction to Julia
Introduction to JuliaIntroduction to Julia
Introduction to Julia
 
Devnology Workshop Genpro 2 feb 2011
Devnology Workshop Genpro 2 feb 2011Devnology Workshop Genpro 2 feb 2011
Devnology Workshop Genpro 2 feb 2011
 
BITS: Basics of sequence analysis
BITS: Basics of sequence analysisBITS: Basics of sequence analysis
BITS: Basics of sequence analysis
 
TMPA-2017: Evolutionary Algorithms in Test Generation for digital systems
TMPA-2017: Evolutionary Algorithms in Test Generation for digital systemsTMPA-2017: Evolutionary Algorithms in Test Generation for digital systems
TMPA-2017: Evolutionary Algorithms in Test Generation for digital systems
 
A tale of experiments on bug prediction
A tale of experiments on bug predictionA tale of experiments on bug prediction
A tale of experiments on bug prediction
 
A tale of bug prediction in software development
A tale of bug prediction in software developmentA tale of bug prediction in software development
A tale of bug prediction in software development
 
Biomart Update
Biomart UpdateBiomart Update
Biomart Update
 
The Use of K-mer Minimizers to Identify Bacterium Genomes in High Throughput ...
The Use of K-mer Minimizers to Identify Bacterium Genomes in High Throughput ...The Use of K-mer Minimizers to Identify Bacterium Genomes in High Throughput ...
The Use of K-mer Minimizers to Identify Bacterium Genomes in High Throughput ...
 
Bio-protocol4446.GO_KEGG_RICE.pdf
Bio-protocol4446.GO_KEGG_RICE.pdfBio-protocol4446.GO_KEGG_RICE.pdf
Bio-protocol4446.GO_KEGG_RICE.pdf
 

Mehr von BITS

Vnti11 basics course
Vnti11 basics courseVnti11 basics course
Vnti11 basics course
BITS
 
Bits protein structure
Bits protein structureBits protein structure
Bits protein structure
BITS
 

Mehr von BITS (16)

BITS - Comparative genomics: gene family analysis
BITS - Comparative genomics: gene family analysisBITS - Comparative genomics: gene family analysis
BITS - Comparative genomics: gene family analysis
 
BITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS - Introduction to comparative genomics
BITS - Introduction to comparative genomics
 
BITS - Protein inference from mass spectrometry data
BITS - Protein inference from mass spectrometry dataBITS - Protein inference from mass spectrometry data
BITS - Protein inference from mass spectrometry data
 
BITS - Overview of sequence databases for mass spectrometry data analysis
BITS - Overview of sequence databases for mass spectrometry data analysisBITS - Overview of sequence databases for mass spectrometry data analysis
BITS - Overview of sequence databases for mass spectrometry data analysis
 
BITS - Introduction to proteomics
BITS - Introduction to proteomicsBITS - Introduction to proteomics
BITS - Introduction to proteomics
 
BITS - Introduction to Mass Spec data generation
BITS - Introduction to Mass Spec data generationBITS - Introduction to Mass Spec data generation
BITS - Introduction to Mass Spec data generation
 
BITS training - UCSC Genome Browser - Part 2
BITS training - UCSC Genome Browser - Part 2BITS training - UCSC Genome Browser - Part 2
BITS training - UCSC Genome Browser - Part 2
 
Marcs (bio)perl course
Marcs (bio)perl courseMarcs (bio)perl course
Marcs (bio)perl course
 
Basics statistics
Basics statistics Basics statistics
Basics statistics
 
Cytoscape: Integrating biological networks
Cytoscape: Integrating biological networksCytoscape: Integrating biological networks
Cytoscape: Integrating biological networks
 
Cytoscape: Gene coexppression and PPI networks
Cytoscape: Gene coexppression and PPI networksCytoscape: Gene coexppression and PPI networks
Cytoscape: Gene coexppression and PPI networks
 
Genevestigator
GenevestigatorGenevestigator
Genevestigator
 
BITS: UCSC genome browser - Part 1
BITS: UCSC genome browser - Part 1BITS: UCSC genome browser - Part 1
BITS: UCSC genome browser - Part 1
 
Vnti11 basics course
Vnti11 basics courseVnti11 basics course
Vnti11 basics course
 
Bits protein structure
Bits protein structureBits protein structure
Bits protein structure
 
BITS: Introduction to Linux - Software installation the graphical and the co...
BITS: Introduction to Linux -  Software installation the graphical and the co...BITS: Introduction to Linux -  Software installation the graphical and the co...
BITS: Introduction to Linux - Software installation the graphical and the co...
 

Kürzlich hochgeladen

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 

Kürzlich hochgeladen (20)

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 

RNA-seq for DE analysis: extracting counts and QC - part 4

  • 1. Generating the count table and validating assumptions RNA-seq for DE analysis training Joachim Jacob 20 and 27 January 2014 This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to http://www.bits.vib.be/ if you use this presentation or parts hereof.
  • 2. Goal Summarize the read counts per gene from a mapping result. The outcome is a raw count table on which we can perform some QC. This table is used by the differential expression algorithm to detect DE genes.
  • 4. The challenge 'Exons' are the type of features used here. They are summarized per 'gene' Alt splicing Overlaps no feature Concept: GeneA = exon 1 + exon 2 + exon 3 + exon 4 = 215 reads GeneB = exon 1 + exon 2 + exon 3 = 180 reads No normalization yet! Just pure counts, aka 'raw counts',
  • 5. Tools to count features ● Different tools exist to accomplish this: http://wiki.bits.vib.be/index.php/RNAseq_toolbox#Feature_counting
  • 6. Dealing with ambiguity ● We focus on the gene level: merge all counts over different isoforms into one, taking into account: ● ● ● Reads that do not overlap a feature, but appear in introns. Take into account? Reads that align to more than one feature (exon or transcript). Transcripts can be overlapping - perhaps on different strands. (PE, and strandedness can resolve this partially). Reads that partially overlap a feature, not following known annotations.
  • 7. HTSeq count has 3 modes HTSeq-count recommends the 'union mode'. But depending on your genome, you may opt for the 'intersection_st rict mode'. Galaxy allows experimenting! http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
  • 8. Indicate the SE or PE nature of your data (note: mate-pair is not appropriate naming here) The annotation file with the coordinates of the features to be counted mode Reverse stranded: heck with mapping viz Check with mapping QC (see earlier) For RNA-seq DE we summarize over 'exons' grouped by 'gene_id'. Make sure these fields are correct in your GTF file.
  • 9. Resulting count table column One sample !
  • 10. Merging to create experiment count table
  • 12. Quality control of count table Relative numbers Absolute numbers In the end, we used about 70% of the reads. Check for your experiment.
  • 13. Quality control of count table 2 types of QC: ● General metrics ● Sample-specific quality control
  • 15. QC: general metrics Which genes are most highly present? Which fractions do they occupy? Gene Counts 42 genes (0,0063%) of the 6665 genes take 25% of all counts. This graph can be constructed from the count table. TEF1alpha, putative ribo prot,...
  • 17. QC: general metrics ● We can plot the counts per sample: filter out the '0', and transform on log2. The bulk of the genes have counts in the hundreds. Few are extremely highly expressed A minority have extremely low counts log2(count)
  • 18. QC: log2 density graph ● We can do this for all samples, and merge All samples show nice overlap, peaks are similar Strange Deviation here
  • 19. QC: log2 merging samples Here, we take one sample, plot the log2 density graph, add the counts of another sample, and plot again, add the counts of another sample, etc. until we have merged all samples. We see a horizontal shift of the graph, rather than a vertical shift, pointing to no saturation.
  • 20. QC: log2, merging samples Here, we take one sample, plot the log2 density graph, add the counts of another sample, and plot again, add the counts of another sample, etc. until we have merged all samples.
  • 21. QC: rarefaction curve What is the number of total detected features, how does the feature space increase with each additional sample added? There should be saturation, but here there is none. Code: ggplot(data = nonzero_counts, aes(total, counts)) + geom_line() + labs(x = "total number of sequenced reads", y = "number of genes with counts > 0")
  • 22. Sample A Sample A + sample B Sample A + sample B + sample C Etc. QC: rarefaction curve rRNA genes Saturation: OK!
  • 23. QC: transformations for viz Regularized log (rLog) and 'Variance Stabilizing Transformation' (VST) as alternatives to log2. http://www.bioconductor.org/packages/2.12/bioc/html/DESeq2.html
  • 24. QC: count transformations Not normalizations! ● Techniques used for microarray can be applied on VST transformed counts. Log2 http://www.biomedcentral.com/1471-2105/14/91 rLog VST http://www.bioconductor.org/packages/2.12/bioc/html/DESeq2.html
  • 25. QC including condition info ● ● We can also include condition information, to interpret our QC better. For this, we need to gather sample information. Make a separate file in which sample info is provided (metadata)
  • 26. QC with condition info What are the differences in counts in each sample dependent on? Here: counts are dependent on the treatment and the strain. Must match the sample descriptions file.
  • 27. QC with condition info Clustering of the distance between samples based on transformed counts can reveal sample errors. VST transformed Colour scale Of the distance measure between Samples. Similar conditions Should cluster together rLog transformed
  • 28. QC with condition info Clustering of transformed counts can reveal sample errors. VST transformed rLog transformed
  • 29. QC with condition info Principal component (PC) analysis allows to display the samples in a 2D scatterplot based on variability between the samples. Samples close to each other resemble each other more.
  • 30. Collect enough metadata Principal component (PC) analysis allows to display the samples in a 2D scatterplot based on variability between the samples. Samples close to each other resemble each other more. Why do these resemble each other?
  • 31. QC with condition info During library preparation, collect as much as information as possible, to add to the sample descriptions. Pay particular attention to differences between samples: e.g. day of preparation, centrifuges used, ... Why do these resemble each other?
  • 32. Collect enough metadata In the QC of the count table, you can map this additional info to the PC graph. In this case, library prep on a different day had effect on the WT samples. Day 1 Day 2 Additional metadata
  • 33. Collect enough metadata In the QC of the count table, you can map this additional info to the PC graph. In this case, library prep on a different day had effect on the WT samples (batch effect). Day 1 Day 2 Additional metadata
  • 35. Next step Now we know our data from the inside out, we can run a DE algorithm on the count table!
  • 36. Keywords Raw counts VST Write in your own words what the terms mean
  • 37. Break