GENETIC
ANALYSIS
of Complex Human Diseases
Examining Gene Expression and
Methylation with Next-Gen
Sequencing
Stephen Turner, Ph.D.
Bioinformatics Core Director
bioinformatics.virginia.edu
University of Virginia
Gene expression pre-2008
PCR Microarrays
Advantages of RNA-Seq
n  No reference necessary
n  Low background (no cross-hybridization)
n  Unlimited dynamic range (FC 9000 Science
320:1344)
n  Direct counting (microarrays: indirect – hybridization)
n  Can characterize full transcriptome
u mRNA and ncRNA (miRNA, lncRNA, snoRNA, etc)
u Differential gene expression
u Differential coding output
u Differential TSS usage
u Differential isoform expression
Isoform level data
Differential splicing & TSS use
Is it accurate?
n  Marioni et al. RNA-seq: An assessment of technical reproducibility and
comparison with gene expression arrays. Genome Research 2008
18:1509.
RNA-Seq Challenges
n  Library construction
u  Size selection (messenger, small)
u  Strand specificity?
n  Bioinformatic challenges
u  Spliced alignment
u  Transcript deconvolution
n  Statistical Challenges
u  Highly variable abundance
u  Sample size: never, ever, plan n=1
u  Normalization (RPKM)
►  More reads from longer transcripts,
higher sequencing depth
►  Want to compare features of different
lengths
►  Want to compare conditions with
different total sequence depth
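The RPKM normalization listed above can be sketched as a small calculation. This is only an illustration, assuming the usual definition (reads per kilobase of transcript per million mapped reads); the function name and the toy numbers are hypothetical and not from the slides.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # Divide by length in kb and by library size in millions of reads.
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Toy numbers (hypothetical): a longer transcript collects more reads,
# and a deeper library collects more reads, at the same expression level.
short_gene = rpkm(read_count=500, length_bp=1_000, total_mapped_reads=20_000_000)
long_gene = rpkm(read_count=2_000, length_bp=4_000, total_mapped_reads=20_000_000)
deeper_run = rpkm(read_count=1_000, length_bp=1_000, total_mapped_reads=40_000_000)

print(short_gene, long_gene, deeper_run)
```

All three toy cases give the same RPKM, which is the point of the normalization: features of different lengths and libraries of different depth become comparable.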
RNA-Seq Overview
Samples of interest: Condition 1 (normal colon), Condition 2 (colon tumor)
[Figure: mRNA (poly-A, AAAAA / TTTTT) → library → sequence reads]
@HWUSI-EAS100R:6:73:941:1973#0/1
GATTTGGGGTTCAAAGCAGTATCGATCAAATA
+HWUSI-EAS100R:6:73:941:1973#0/1
!''*((((***+))%%%++)(%%%%).1***-
@HWUSI-EAS100R:6:73:941:1973#0/1
CATCGACGTAGATCGACTACATGAACTGCTCG
+HWUSI-EAS100R:6:73:941:1973#0/1
!''*+(*+!+(*!+*(((***!%%%%!%%(+-
Common question #1: Depth
n  Question: how much sequence do I need?
n  Answer: it’s complicated.
n  Oversimplified answer: 20-50 million PE reads / sample
(mouse/human).
n  Depends on:
u  Size & complexity of transcriptome
u  Application: differential gene expression, transcript
discovery
u  Tissue type, RNA quality, library preparation
u  Sequencing type: length, paired-end vs single-end, etc.
n  Find a publication in your field with similar goals.
n  Good news: ¼ HiSeq lane usually sufficient.
Common question #2: Sample Size
n Question: How many samples should I
sequence?
n Oversimplified Answer: At least 3 biological
replicates per condition.
n Depends on:
u Sequencing depth
u Application
u Goals (prioritization, biomarker discovery, etc.)
u Effect size, desired power, statistical significance
n Find a publication with similar goals
Common question #3: Workflow
n  How do I analyze the data?
n  No standards!
u  Unspliced aligners: BWA, Bowtie, Bowtie2, MANY others!
u  Spliced aligners: STAR, Rum, Tophat, Tophat2-Bowtie1, Tophat2-Bowtie2,
GSNAP, MANY others.
u  Reference builds & annotations: UCSC, Entrez, Ensembl
u  Assembly: Cufflinks, Scripture, Trinity, G.Mor.Se, Velvet, TransABySS
u  Quantification: Cufflinks, RSEM, eXpress, MISO, etc.
u  Differential expression: Cuffdiff, Cuffdiff2, DegSeq, DESeq, EdgeR, Myrna
n  Like early microarray days: lots of excitement, lots of tools, little
knowledge of integrating tools in pipeline!
n  Benchmarks
u  Microarray: Spike-ins (Irizarry)
u  RNA-Seq: ???, simulation, ???
Eyras et al. Methods to Study Splicing from RNA-Seq.
http://dx.doi.org/10.6084/m9.figshare.679993
Turner SD. RNA-seq Workflows and Tools.
http://dx.doi.org/10.6084/m9.figshare.662782
Phases of NGS Analysis
n  Primary
u  Conversion of raw machine signal into sequence and qualities
n  Secondary
u  Alignment of reads to reference genome or transcriptome
u  or de novo assembly of reads into contigs
n  Tertiary
u  SNP discovery/genotyping
u  Peak discovery/quantification (ChIP, MeDIP)
u  Transcript assembly/quantification (RNA-seq)
n  Quaternary
u  Differential expression
u  Enrichment, pathways, correlation, clustering, visualization, etc.
u  http://gettinggeneticsdone.blogspot.com/2012/03/pathway-analysis-for-high-throughput.html
u  http://www.slideshare.net/turnersd/pathway-analysis-2012-17947529
Primary Analysis: Get FASTQ file
@HWUSI-EAS100R:6:73:941:1973#0/1
GATTTGGGGTTCAAAGCAGTATCGATCAAATA
+HWUSI-EAS100R:6:73:941:1973#0/1
!''*((((***+))%%%++)(%%%%).1***-
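A FASTQ record like the one above has four lines: a header starting with "@", the sequence, a separator starting with "+", and one quality character per base. A minimal parsing sketch follows, assuming unwrapped four-line records; the `FastqRecord` and `parse_fastq` names are hypothetical, and a real pipeline would more likely use an established library.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (no line-wrapped sequences)."""
    stripped = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    for i in range(0, len(stripped), 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], seq, qual)


example = """@HWUSI-EAS100R:6:73:941:1973#0/1
GATTTGGGGTTCAAAGCAGTATCGATCAAATA
+HWUSI-EAS100R:6:73:941:1973#0/1
!''*((((***+))%%%++)(%%%%).1***-
"""

records = list(parse_fastq(example.splitlines()))
print(records[0].name, len(records[0].sequence))
```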
“Phred-scaled” base qualities
# $p is probability base is erroneous
$Q = -10 * log($p) / log(10); # Phred Q
$q = chr(($Q<=40? $Q : 40) + 33); # FASTQ quality character
$Q = ord($q) - 33; # 33 offset
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
|                         |    |        |                              |                     |
33                        59   64       73                             104                   126
S - Sanger Phred+33, 41 values (0, 40)
I - Illumina 1.3 Phred+64, 41 values (0, 40)
X - Solexa Solexa+64, 68 values (-5, 62)
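The Phred conversions above can also be written in Python. This sketch mirrors the Perl lines on the slide (Q = -10 log10 p, capped at 40, offset 33 for Sanger encoding); the function names are hypothetical, and the offset parameter is included only so the Illumina 1.3 Phred+64 case from the chart can be tried.

```python
import math


def error_prob_to_phred(p: float) -> float:
    """Phred quality: Q = -10 * log10(p), where p is the error probability."""
    return -10.0 * math.log10(p)


def phred_to_char(q: float, offset: int = 33, max_q: int = 40) -> str:
    """Encode a Phred score as a FASTQ quality character (capped at max_q)."""
    return chr(min(int(round(q)), max_q) + offset)


def char_to_phred(c: str, offset: int = 33) -> int:
    """Decode a FASTQ quality character back to a Phred score."""
    return ord(c) - offset


def phred_to_error_prob(q: float) -> float:
    """Invert the Phred scale: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


q30 = error_prob_to_phred(0.001)                 # 1 error in 1,000 bases
first_quals = [char_to_phred(c) for c in "!''*(("]
print(q30, phred_to_char(q30), first_quals)
```

Decoding the same character with the wrong offset shifts every score by 31, which is why the encoding (Phred+33 vs Phred+64) has to be known before filtering on quality.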
Secondary analysis
n Alignment back to the reference
u Computationally demanding: can’t use BLAST
u Many algorithms (Maq, BWA, bowtie, bowtie2, Mosaik, NovoAlign, SOAP2, SSAHA, …)
u  http://en.wikipedia.org/wiki/List_of_sequence_alignment_software
u Sensitivity to sequencing errors, polymorphisms, indels, rearrangements
u Tradeoffs in time vs. memory vs. performance
RNA-Seq Workflow 1: Differential
Gene Expression
RNA-Seq Workflow 2: Differential
Isoform Expression, Exon Usage
Download data & software
n  Public data from GEO. E.g. GSE32038
u  http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE32038
u  Trapnell et al. Differential gene and transcript expression analysis of RNA-seq
experiments with TopHat and Cufflinks. Nature Protocols 2012: 7:562.
n  Sequence, annotation, indexes (Ensembl)
u  iGenomes: http://tophat.cbcb.umd.edu/igenomes.html
u  Genes: /Annotation/Genes/genes.gtf
u  Indexes: /Sequence/BowtieIndex/genome.*
n  Software:
u  Samtools: http://samtools.sourceforge.net/
u  FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
u  Bowtie: http://bowtie-bio.sourceforge.net/index.shtml
u  Tophat: http://tophat.cbcb.umd.edu/
u  HTSeq: http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html
u  R: http://www.r-project.org/
u  DESeq2: http://www.bioconductor.org/packages/2.12/bioc/html/DESeq2.html
u  Cufflinks: http://cufflinks.cbcb.umd.edu/
u  cummeRbund: http://compbio.mit.edu/cummeRbund/
Do some quality assessment
Software:
Picard picard.sourceforge.net
FastQC bioinformatics.bbsrc.ac.uk/projects/fastqc
RSeQC code.google.com/p/rseqc
FastX Toolkit hannonlab.cshl.edu/fastx_toolkit
R/ShortRead bioconductor.org/packages/bioc/html/ShortRead.html
Mapping across splice junctions: tophat
1.  Map reads to genome
2.  Collect unmappable reads
3.  Break unmapped reads into segments. Short segments often align
independently; if segments of one read align 100 bp to several kb apart,
infer a splice junction.
tophat -G genes.gtf -o C1_R1_tophatout /path/bowtieindex/genome C1_R1_1.fq C1_R1_2.fq
tophat -G genes.gtf -o C1_R2_tophatout /path/bowtieindex/genome C1_R2_1.fq C1_R2_2.fq
tophat -G genes.gtf -o C1_R3_tophatout /path/bowtieindex/genome C1_R3_1.fq C1_R3_2.fq
tophat -G genes.gtf -o C2_R1_tophatout /path/bowtieindex/genome C2_R1_1.fq C2_R1_2.fq
tophat -G genes.gtf -o C2_R2_tophatout /path/bowtieindex/genome C2_R2_1.fq C2_R2_2.fq
tophat -G genes.gtf -o C2_R3_tophatout /path/bowtieindex/genome C2_R3_1.fq C2_R3_2.fq
Arguments, in order: gene annotation (-G), output directory (-o), Bowtie index, read 1, read 2
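With six samples the tophat calls differ only in the sample name, so they can be generated rather than typed. The sketch below only builds and prints the command lines shown on the slide; it does not run tophat. The `tophat_command` helper is hypothetical, and the paths are the placeholder paths from the slide.

```python
import shlex


def tophat_command(sample, gtf="genes.gtf", index="/path/bowtieindex/genome"):
    """Build one tophat call as an argument list (placeholder paths from the slide)."""
    return [
        "tophat",
        "-G", gtf,                      # gene annotation
        "-o", f"{sample}_tophatout",    # output directory
        index,                          # Bowtie index prefix
        f"{sample}_1.fq",               # read 1
        f"{sample}_2.fq",               # read 2
    ]


# Two conditions (C1, C2) with three replicates each, as in the slides.
samples = [f"C{cond}_R{rep}" for cond in (1, 2) for rep in (1, 2, 3)]
commands = [tophat_command(s) for s in samples]

for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as a list makes it straightforward to hand to `subprocess.run` later, or to write into a job script for a cluster scheduler.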
Workflow 1: Differential Gene Expression
Step 1: Align to Genome
Step 2: Count Reads overlapping genes
Step 3: Differential expression
Workflow 1, Step 2: Count reads overlapping genes
Software: HTSeq
http://www-huber.embl.de/users/anders/HTSeq
First convert binary .bam file to
text .sam file using samtools:
samtools view accepted_hits.bam > C1_R1.sam
Then run htseq-count on each of the
alignments, redirecting its output to a counts file:
htseq-count <sam_file> <gtf_file> > <counts_file>
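The counting step can be illustrated with a toy version of "count reads overlapping genes". This is a simplified sketch, not htseq-count's actual algorithm: genes are single intervals rather than exon sets, strand is ignored, and the `__no_feature` / `__ambiguous` labels are only illustrative bucket names.

```python
from collections import Counter


def count_reads(genes, reads):
    """Count reads per gene.

    genes: dict of gene name -> (chrom, start, end), 1-based inclusive.
    reads: iterable of (chrom, start, end) alignments.
    A read is counted only if it overlaps exactly one gene.
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Hypothetical toy annotation with two overlapping genes.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 151),   # inside geneA only
    ("chr1", 460, 491),   # in the geneA/geneB overlap
    ("chr1", 800, 831),   # inside geneB only
    ("chr2", 10, 41),     # no annotated gene
]
counts = count_reads(genes, reads)
print(dict(counts))
```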
Workflow 1, Step 3: Differential expression
Software: DESeq2
http://www.bioconductor.org/packages/2.12/bioc/html/DESeq2.html
> library(DESeq2)
> sampleFiles <- c("C1_R1.counts.txt", "C1_R2.counts.txt", "C1_R3.counts.txt",
"C2_R1.counts.txt", "C2_R2.counts.txt", "C2_R3.counts.txt")
> sampleCondition <- factor(substr(sampleFiles, 1, 2))
> sampleTable <- data.frame(sampleName=sampleFiles, fileName=sampleFiles,
condition=sampleCondition)
> sampleTable
sampleName fileName condition
1 C1_R1.counts.txt C1_R1.counts.txt C1
2 C1_R2.counts.txt C1_R2.counts.txt C1
3 C1_R3.counts.txt C1_R3.counts.txt C1
4 C2_R1.counts.txt C2_R1.counts.txt C2
5 C2_R2.counts.txt C2_R2.counts.txt C2
6 C2_R3.counts.txt C2_R3.counts.txt C2
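dds <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=".",
design=~condition)
dds <- DESeq(dds)
# Store results under a new name so the results() function is not shadowed
res <- results(dds)
# Sort by adjusted p-value (FDR), stored in the "padj" column
res <- res[order(res$padj), ]
plotMA(dds)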
...
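The adjusted p-values (FDR) that DESeq2 reports use the Benjamini-Hochberg procedure by default. A minimal sketch of that adjustment follows, to show what the sorted column means; the function name and the four p-values are hypothetical, and this ignores details such as missing values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(padj)
```

Each raw p-value is scaled by (number of tests / rank), so a gene's adjusted value depends on how many other genes were tested alongside it.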
RNA-Seq Workflow 2: Differential
Isoform Expression, Exon Usage
A change in fragment count for a
gene does not necessarily equal a
change in expression.
Trapnell, Cole, et al. "Differential analysis of gene regulation at transcript resolution with RNA-seq." Nature biotechnology 31.1 (2012): 46-53.
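A small worked example makes this point concrete. It assumes an idealized model in which the expected number of fragments from an isoform is proportional to its copy number times its length; the isoform lengths and copy numbers are made up for illustration.

```python
def expected_fragments(isoform_copies, isoform_lengths, fragments_per_kb_per_copy=1.0):
    """Expected fragments for a gene under an idealized length-proportional model."""
    return sum(
        copies * (isoform_lengths[name] / 1000) * fragments_per_kb_per_copy
        for name, copies in isoform_copies.items()
    )


# Hypothetical gene with a 1 kb and a 3 kb isoform.
lengths = {"short": 1000, "long": 3000}

# Both conditions contain 100 transcript copies of the gene,
# but the gene switches from the short to the long isoform.
condition1 = {"short": 100, "long": 0}
condition2 = {"short": 0, "long": 100}

frags1 = expected_fragments(condition1, lengths)
frags2 = expected_fragments(condition2, lengths)
print(frags1, frags2, frags2 / frags1)
```

The gene-level fragment count triples although the number of transcripts is unchanged, which is why transcript-level estimation matters when isoform usage shifts between conditions.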
Workflow 2a: Assemble transcripts
for each sample: cufflinks
n Cufflinks
u Identifies mutually
incompatible
fragments
u Identify minimal set
of transcripts to
explain all the
fragments.
cufflinks -o C1_R1_cufflinksout C1_R1_tophatout/accepted_hits.bam
cufflinks -o C1_R2_cufflinksout C1_R2_tophatout/accepted_hits.bam
cufflinks -o C1_R3_cufflinksout C1_R3_tophatout/accepted_hits.bam
cufflinks -o C2_R1_cufflinksout C2_R1_tophatout/accepted_hits.bam
cufflinks -o C2_R2_cufflinksout C2_R2_tophatout/accepted_hits.bam
cufflinks -o C2_R3_cufflinksout C2_R3_tophatout/accepted_hits.bam
Arguments, in order: output directory (-o), path to alignment
Merge assemblies: cuffmerge
n  Merge assemblies to create single merged transcriptome
annotation.
u  Option 1: Pool alignments and assemble all at once.
►  Computationally demanding
►  Assembler will be faced complex mixture of isoforms à more error
u  Option 2: Assemble alignments individually, merge resulting
assemblies
►  Cuffmerge: meta-assembler using parsimony.
►  Genes with low expression à insufficient coverage for reconstruction.
►  Merging often recovers complete gene.
►  Newly discovered isoforms integrated w/ known ones (RABT).
n Create “manifest” of location of all assemblies
n Run Cuffmerge on assemblies using RABT
cuffmerge –g /path/to/annotation/genes.gtf –s /path/to/refgenome/genome.fa assemblies.txt
Reference Gene Annotation
./C1_R1_cufflinksout/transcripts.gtf
./C1_R2_cufflinksout/transcripts.gtf
./C1_R3_cufflinksout/transcripts.gtf
./C2_R1_cufflinksout/transcripts.gtf
./C2_R2_cufflinksout/transcripts.gtf
./C2_R3_cufflinksout/transcripts.gtf
Assemblies.txt: location of assemblies
Reference Genome Sequence Manifest from above
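The manifest is just a text file with one assembly path per line, so it can be written by a short script. This sketch assumes the Cufflinks output directories follow the `<sample>_cufflinksout` naming used in the slides; the `write_manifest` helper is hypothetical, and the example writes into a temporary directory only so that it is safe to run anywhere.

```python
import tempfile
from pathlib import Path


def write_manifest(samples, manifest_path):
    """Write one transcripts.gtf path per line and return the lines written."""
    lines = [f"./{sample}_cufflinksout/transcripts.gtf" for sample in samples]
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return lines


samples = [f"C{cond}_R{rep}" for cond in (1, 2) for rep in (1, 2, 3)]

with tempfile.TemporaryDirectory() as tmpdir:
    manifest = Path(tmpdir) / "assemblies.txt"
    written = write_manifest(samples, manifest)
    manifest_text = manifest.read_text()

print(manifest_text, end="")
```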
Differential expression: cuffdiff
n Identify differentially expressed genes &
transcripts
cuffdiff -o cuffdiff_out -b genome.fa -u merged.gtf \
./C1_R1_tophatout/accepted_hits.bam,\
./C1_R2_tophatout/accepted_hits.bam,\
./C1_R3_tophatout/accepted_hits.bam \
./C2_R1_tophatout/accepted_hits.bam,\
./C2_R2_tophatout/accepted_hits.bam,\
./C2_R3_tophatout/accepted_hits.bam
Arguments: output directory (-o), reference sequence (-b), merged assembly (merged.gtf), location of alignments (replicates comma-separated, conditions space-separated)
•  1 gene
•  2 TSS
•  2 CDS
•  3 Isoforms
Downstream analysis &
visualization
Visualization with cummeRbund
n Install cummeRbund:
u Install from BioConductor:
►  source("http://bioconductor.org/biocLite.R")
►  biocLite("cummeRbund")
u Download and install latest version from
http://compbio.mit.edu/cummeRbund/
n Load the package
u library(cummeRbund)
n Read in the data
u  cuff <- readCufflinks(“/path/to/cuffdiff/output”)
csDensity(genes(cuff))
csBoxplot(genes(cuff))
csScatter(genes(cuff), "C1", "C2", smooth=T)
csVolcano(genes(cuff), "C1", "C2")
mygene2 <- getGene(cuff, "Rala")
expressionBarplot(mygene2)
expressionBarplot(isoforms(mygene2))
DEXSeq
n  Differential Gene Expression (E.g.
DESeq)
n  Differential Isoform Expression
(E.g. Cufflinks)
n  Differential Exon Usage
n  What’s different about DEXSeq?
u  Doesn’t do full transcript
assembly (Cufflinks)
u  Doesn’t count fragments
mapping to genes (DESeq)
u  Avoids assembly and looks for
differences in reads mapping to
individual exons.
u  Uses counts (negative binomial)
Using DEXSeq: Installation
n Installation & load:
u  source("http://bioconductor.org/biocLite.R")
u  biocLite(“DEXSeq”)
u  library(DEXSeq)
n Installation comes bundled with useful python
scripts in the python_scripts directory of the
library. Put these in your PATH.
Using DEXSeq: Data preparation
n First, prepare “flattened” GFF:
n Create sorted SAM files
n Count reads overlapping counting bins
dexseq_prepare_annotation.py input.gtf exons.gff
Reference
AnnotationScript comes with DEXSeq
samtools view C1_R1-tophat-out/accepted_hits.bam | sort –k 1,1 –k2,2n > C1_R1.sam
samtools view C1_R2-tophat-out/accepted_hits.bam | sort –k 1,1 –k2,2n > C1_R2.sam
samtools view C1_R3-tophat-out/accepted_hits.bam | sort –k 1,1 –k2,2n > C1_R3.sam
samtools view C2_R1-tophat-out/accepted_hits.bam | sort –k 1,1 –k2,2n > C2_R1.sam
samtools view C2_R2-tophat-out/accepted_hits.bam | sort –k 1,1 –k2,2n > C2_R2.sam
samtools view C2_R3-tophat-out/accepted_hits.bam | sort –k 1,1 –k2,2n > C2_R3.sam
dexseq_count.py -p no -s no exons.gff C1_R1.sam C1_R1.counts.txt
dexseq_count.py -p no -s no exons.gff C1_R2.sam C1_R2.counts.txt
dexseq_count.py -p no -s no exons.gff C1_R3.sam C1_R3.counts.txt
dexseq_count.py -p no -s no exons.gff C2_R1.sam C2_R1.counts.txt
dexseq_count.py -p no -s no exons.gff C2_R2.sam C2_R2.counts.txt
dexseq_count.py -p no -s no exons.gff C2_R3.sam C2_R3.counts.txt
Script comes with DEXSeq
Flattened
Annotation Alignment Output file
Output file
Using DEXSeq: Data import
n  The pasilla package vignette gives detailed
instructions on how to do this:
http://www.bioconductor.org/packages/release/data/experiment/html/pasilla.html
> design <- data.frame(condition=c(rep("C1",3), rep("C2",3)), replicate=rep(1:3,2))
> rownames(design) <- with(design, paste(condition, "_R", replicate, sep=""))
> design
condition replicate
C1_R1 C1 1
C1_R2 C1 2
C1_R3 C1 3
C2_R1 C2 1
C2_R2 C2 2
C2_R3 C2 3
> countfiles <- file.path(".", paste(rownames(design), ".counts.txt", sep=""))
> countfiles
[1] "./C1_R1.counts.txt" "./C1_R2.counts.txt" "./C1_R3.counts.txt" "./C2_R1.counts.txt"
[5] "./C2_R2.counts.txt" "./C2_R3.counts.txt"
> flattenedfile <- "/Users/sdt5z/smb/u/genomes/dexseq/exons_dme_ens_bdgp525.gff"
> exons <- read.HTSeqCounts(countfiles=countfiles, design=design,
flattenedfile=flattenedfile)
> sampleNames(exons) <- rownames(design)
Using DEXSeq: Data Analysis
# Estimate size factors (normalizes for sequencing depth)
exons <- estimateSizeFactors(exons)
sizeFactors(exons)
# Estimate dispersion
exons <- estimateDispersions(exons)
exons <- fitDispersionFunction(exons)
# Test for Differential Exon Usage
exons <- testForDEU(exons)
exons <- estimatelog2FoldChanges(exons)
result <- DEUresultTable(exons)
# How many are significant at FDR 0.0001?
table(result$padjust<0.0001)
# M vs A plot
plot(result$meanBase, result[, "log2fold(C2/C1)"], log="x")
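The size-factor step at the top of this analysis normalizes for sequencing depth. The sketch below shows the median-of-ratios idea from DESeq, on the assumption that DEXSeq's `estimateSizeFactors` follows the same approach; the function name and the toy count matrix are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per feature), each a list of per-sample counts.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue  # skip features with a zero count (no finite log geometric mean)
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's factor is the median ratio to the per-feature geometric mean.
    return [math.exp(median(ratios)) for ratios in log_ratios]


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
sf = size_factors(toy_counts)
normalized = [[c / f for c, f in zip(row, sf)] for row in toy_counts]
print(sf)
```

Using the median rather than the total read count keeps a handful of very highly expressed features from dominating the depth estimate.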
Using DEXSeq: visualization
plotDEXSeq(exons, "FBgn0030362", cex.axis=1.2, cex=1.3, lwd=2, legend=T, displayTranscripts=T)
Using DEXSeq: HTML Report
library(biomaRt)
mart <- useMart("ensembl", dataset="dmelanogaster_gene_ensembl")
listAttributes(mart)[1:25,]
attributes <- c("ensembl_gene_id", "external_gene_id", "description")
DEXSeqHTML(exons, FDR=0.0001, mart=mart, filter="ensembl_gene_id", attributes=attributes)
Downstream analysis
n  Now you have a list of:
u Genes
u Isoforms (genes)
u Exons (genes)
n  How to place in functional context?
n  Pathway / functional analysis!
u Gene Ontology over-representation
u Gene Set Enrichment Analysis
u Signaling Pathway Impact Analysis
u Many more…
n  Resources:
u  hKp://geMnggene8csdone.blogspot.com/2012/03/pathway-­‐analysis-­‐for-­‐high-­‐throughput.html	
  
u  hKp://www.slideshare.net/turnersd/pathway-­‐analysis-­‐2012-­‐17947529	
  
Workflow Management: Galaxy
n http:usegalaxy.org
Workflow Management: Taverna
n  Taverna: http://www.taverna.org.uk/
n  TavernaPBS: http://sourceforge.net/projects/tavernapbs/
Further Reading
n  RNA-Seq:
u  Garber, M., Grabherr, M. G., Guttman, M., & Trapnell, C. (2011). Computational methods for transcriptome annotation and
quantification using RNA-seq. Nature methods, 8(6), 469-77.
u  Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M., & Gilad, Y. (2008). RNA-seq: an assessment of technical
reproducibility and comparison with gene expression arrays. Genome research, 18(9), 1509-17.
u  Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L., & Wold, B. (2008). Mapping and quantifying mammalian
transcriptomes by RNA-Seq. Nature methods, 5(7), 621-8.
u  Ozsolak, F., & Milos, P. M. (2011). RNA sequencing: advances, challenges and opportunities. Nature reviews. Genetics,
12(2), 87-98.
u  Toung, J. M., Morley, M., Li, M., & Cheung, V. G. (2011). RNA-sequence analysis of human B-cells. Genome research,
991-998.
u  Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews. Genetics,
10(1), 57-63.
n  Bowtie/Tophat:
u  Langmead, B., Trapnell, C., Pop, M., & Salzberg, S. L. (2009). Ultrafast and memory-efficient alignment of short DNA
sequences to the human genome. Genome biology, 10(3), R25.
u  Trapnell, C., Pachter, L., & Salzberg, S. L. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (Oxford,
England), 25(9), 1105-11.
n  Cufflinks:
u  Roberts, A., Pimentel, H., Trapnell, C., & Pachter, L. (2011). Identification of novel transcripts in annotated genomes using
RNA-Seq. Bioinformatics (Oxford, England), 27(17), 2325-9.
u  Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., Pimentel, H., et al. (2012). Differential gene and transcript
expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols, 7(3), 562-578.
u  Trapnell, C., Williams, B. a, Pertea, G., Mortazavi, A., Kwan, G., van Baren, M. J., Salzberg, S. L., et al. (2010). Transcript
assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell
differentiation. Nature biotechnology, 28(5), 511-5.
n  DEXSeq:
u  Vignette: http://watson.nci.nih.gov/bioc_mirror/packages/2.9/bioc/html/DEXSeq.html.
u  Pre-pub manuscript: Anders, S., Reyes, A., Huber, W. (2012). Detecting differential usage of exons from RNA-Seq data.
Nature Precedings, DOI: 10.1038/npre.2012.6837.2.
Online Community Forum and
Discussion
n Seqanswers
u  http://SEQanswers.com
u  Format: Forum
u  Li et al. SEQanswers : An open access community for
collaboratively decoding genomes. Bioinformatics (2012).
n BioStar:
u  http://biostar.stackexchange.com
u  Format: Q&A
u  Parnell et al. BioStar: an online question & answer resource for
the bioinformatics community. PLoS Comp Bio (2011).
n  Other Bioinformatics Resources: stephenturner.us/p/edu
DNA Methylation: Importance
n Occurs most frequently at CpG sites
n High methylation at promoters ≈ silencing
n Methylation perturbed in cancer
n Methylation associated with many other complex
diseases: neural, autoimmune, response to env.
n Mapping DNA methylation à new disease
genes & drug targets.
DNA Methylation: Challenges
n Dynamic and tissue-specific
n DNA à Collection of cells which vary in 5meC
patterns à 5meC pattern is complex.
n Further, uneven distribution of CpG targets
n Multiple classes of methods:
u Bisulfite, sequence-based: Assay methylated
target sequences across individual DNAs.
u Affinity enrichment, count-based: Assay
methylation level across many genomic loci.
DNA Methylation: Mapping
DNA methylation:
BS-Seq Whole-genome bisulfite sequencing
RRBS-Seq Reduced representation bisulfite sequencing
BC-Seq Bisulfite capture sequencing
BSPP Bisulfite specific padlock probes
Methyl-Seq Restriction enzyme based methyl-seq
MSCC Methyl sensitive cut counting
HELP-Seq HpaII fragment enrichment by ligation PCR
MCA-Seq Methylated CpG island amplification
MeDIP-Seq Methylated DNA immunoprecipitation
MBP-Seq Methyl-binding protein sequencing
MethylCap-seq Methylated DNA capture by affinity purification
MIRA-Seq Methylated CpG island recovery assay
Gene expression:
RNA-Seq High-throughput cDNA sequencing
Methylation: REs and PCR
n Restriction enzyme digest
u Isoschizomers HpaII and MspI both recognize
same sequence: 5’-CCGG-3’
u MspI digests regardless of methylation
u HpaII only digests at unmethylated sites
n PCR à gel electrophoresis à southern blot
n Pros: Highly sensitive
n Cons: Low-throughput, high false positive rate
because of incomplete digestion (for reasons
other than methylation).
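The HpaII/MspI logic above can be sketched in a few lines: both enzymes recognize CCGG, MspI cuts every site, and HpaII cuts only unmethylated sites. The helper names, the toy sequence and the choice to represent methylation as a set of site start positions are all assumptions made for illustration.

```python
def find_sites(seq, motif="CCGG"):
    """Return 0-based start positions of every occurrence of the motif."""
    sites = []
    start = seq.find(motif)
    while start != -1:
        sites.append(start)
        start = seq.find(motif, start + 1)
    return sites


def cut_sites(seq, methylated_sites, enzyme):
    """Sites that the given isoschizomer would digest."""
    sites = find_sites(seq)
    if enzyme == "MspI":
        return sites                                   # cuts regardless of methylation
    if enzyme == "HpaII":
        return [s for s in sites if s not in methylated_sites]  # blocked by methylation
    raise ValueError(f"unknown enzyme: {enzyme}")


# Toy sequence with CCGG at positions 2 and 9; the second site is methylated.
dna = "AACCGGTTTCCGGAA"
methylated = {9}

msp_cuts = cut_sites(dna, methylated, "MspI")
hpa_cuts = cut_sites(dna, methylated, "HpaII")
protected = sorted(set(msp_cuts) - set(hpa_cuts))
print(msp_cuts, hpa_cuts, protected)
```

Sites cut by MspI but not by HpaII are the inferred methylated sites, which is also why incomplete HpaII digestion shows up as false positives.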
Bisulfite sequencing
n  Sodium bisulfite converts unmethylated (but not methylated) C’s into U’s.
n  This introduces a methylation-specific “SNP”.
n  RRBS – library enriched for CpG-dense regions by digesting with MspI.
MeDIP-Seq
n MeDIP-Seq = Methylated
DNA immunoprecipitation
n Uses antibody against 5-
methylcytosine to retrieve
methylated fragments from
sonicated DNA.
n Enrichment method = count
number of reads
MethylCap-Seq
n Uses methyl-binding domain (MBD) protein to
obtain DNA with similar methylation levels.
n Also a counting method.
Methylation: Accuracy
n  Bock et al. Quantitative
comparison of genome-wide DNA
methylation mapping technologies.
Nature biotechnology, 28(10),
1106-14.
n  MeDIP, MethylCap, RRBS largely
concordant with Illumina Infinium
assay
Methylation methods: coverage
n Coverage varies among different methods
Methylation: Features & Biases
Methylation: Bioinformatics Resources
Resource | Purpose | URL
Batman | MeDIP DNA methylation analysis tool | http://td-blade.gurdon.cam.ac.uk/software/batman
BDPC | DNA methylation analysis platform | http://biochem.jacobs-university.de/BDPC
BSMAP | Whole-genome bisulphite sequence mapping | http://code.google.com/p/bsmap
CpG Analyzer | Windows-based program for bisulphite DNA | -
CpGcluster | CpG island identification | http://bioinfo2.ugr.es/CpGcluster
CpGFinder | Online program for CpG island identification | http://linux1.softberry.com
CpG Island Explorer | Online program for CpG Island identification | http://bioinfo.hku.hk/cpgieintro.html
CpG Island Searcher | Online program for CpG Island identification | http://cpgislands.usc.edu
CpG PatternFinder | Windows-based program for bisulphite DNA | -
CpG Promoter | Large-scale promoter mapping using CpG islands | http://www.cshl.edu/OTT/html/cpg_promoter.html
CpG ratio and GC content Plotter | Online program for plotting the observed:expected ratio of CpG | http://mwsross.bms.ed.ac.uk/public/cgi-bin/cpg.pl
CpGviewer | Bisulphite DNA sequencing viewer | http://dna.leeds.ac.uk/cpgviewer
CyMATE | Bisulphite-based analysis of plant genomic DNA | http://www.gmi.oeaw.ac.at/en/cymate-index/
EMBOSS CpGPlot/CpGReport | Online program for plotting CpG-rich regions | http://www.ebi.ac.uk/Tools/emboss/cpgplot/index.html
Epigenomics Roadmap | NIH Epigenomics Roadmap Initiative homepage | http://nihroadmap.nih.gov/epigenomics
Epinexus | DNA methylation analysis tools | http://epinexus.net/home.html
MEDME | Software package (using R) for modelling MeDIP experimental data | http://espresso.med.yale.edu/medme
methBLAST | Similarity search program for bisulphite-modified DNA | http://medgen.ugent.be/methBLAST
MethDB | Database for DNA methylation data | http://www.methdb.de
MethPrimer | Primer design for bisulphite PCR | http://www.urogene.org/methprimer
methPrimerDB | PCR primers for DNA methylation analysis | http://medgen.ugent.be/methprimerdb
MethTools | Bisulphite sequence data analysis tool | http://www.methdb.de
MethyCancer Database | Database of cancer DNA methylation data | http://methycancer.psych.ac.cn
Methyl Primer Express | Primer design for bisulphite PCR | http://www.appliedbiosystems.com/
Methylumi | Bioconductor pkg for DNA methylation data from Illumina | http://www.bioconductor.org/packages/bioc/html/
Methylyzer | Bisulphite DNA sequence visualization tool | http://ubio.bioinfo.cnio.es/Methylyzer/main/index.html
mPod | DNA methylation viewer integrated w/ Ensembl genome browser | http://www.compbio.group.cam.ac.uk/Projects/
PubMeth | Database of DNA methylation literature | http://www.pubmeth.org
QUMA | Quantification tool for methylation analysis | http://quma.cdb.riken.jp
TCGA Data Portal | Database of TCGA DNA methylation data | http://cancergenome.nih.gov/dataportal
Methylation: Further Reading
Bock, C., Tomazou, E. M., Brinkman, A. B., Müller, F., Simmer, F., Gu, H., Jäger, N., et al. (2010). Quantitative
comparison of genome-wide DNA methylation mapping technologies. Nature biotechnology, 28(10), 1106-14.
Brinkman, A. B., Simmer, F., Ma, K., Kaan, A., Zhu, J., & Stunnenberg, H. G. (2010). Whole-genome DNA methylation
profiling using MethylCap-seq. Methods (San Diego, Calif.), 52(3), 232-6.
Brunner, A. L., Johnson, D. S., Kim, S. W., Valouev, A., Reddy, T. E., et al. (2009). Distinct DNA methylation patterns
characterize differentiated human embryonic stem cells and developing human fetal liver, 1044-1056.
Gu, H., Bock, C., Mikkelsen, T. S., Jäger, N., Smith, Z. D., Tomazou, E., Gnirke, A., et al. (2010). Genome-scale DNA
methylation mapping of clinical samples at single-nucleotide resolution. Nature methods, 7(2), 133-6.
Harris, R. A., Wang, T., Coarfa, C., Nagarajan, R. P., Hong, C., Downey, S. L., Johnson, B. E., et al. (2010). Comparison
of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic
modifications. Nature biotechnology, 28(10), 1097-105.
Kerick, M., Fischer, A., & Schweiger, M.-ruth. (2012). Bioinformatics for High Throughput Sequencing. (N.
Rodríguez-Ezpeleta, M. Hackenberg, & A. M. Aransay, Eds.), 151-167. New York, NY: Springer New York.
Laird, P. W. (2010). Principles and challenges of genomewide DNA methylation analysis. Nature reviews. Genetics,
11(3), 191-203.
Weber, M., Davies, J. J., Wittig, D., Oakeley, E. J., Haase, M., Lam, W. L., & Schübeler, D. (2005). Chromosome-wide
and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed
human cells. Nature genetics, 37(8), 853-62. doi:10.1038/ng1598
Thank you
Web: bioinformatics.virginia.edu
E-mail: bioinformatics@virginia.edu
Blog: www.GettingGeneticsDone.com
Twitter: twitter.com/genetics_blog

Examining gene expression and methylation with next gen sequencing

  • 1. GENETIC ANALYSIS of Complex Human Diseases Examining Gene Expression and Methylation with Next-Gen Sequencing Stephen Turner, Ph.D. Bioinformatics Core Director bioinformatics.virginia.edu University of Virginia
  • 2. Gene expression pre-2008: PCR, microarrays
  • 4. Advantages of RNA-Seq
      - No reference necessary
      - Low background (no cross-hybridization)
      - Unlimited dynamic range (FC 9000, Science 320:1344)
      - Direct counting (microarrays: indirect, via hybridization)
      - Can characterize full transcriptome
          - mRNA and ncRNA (miRNA, lncRNA, snoRNA, etc.)
          - Differential gene expression
          - Differential coding output
          - Differential TSS usage
          - Differential isoform expression
  • 5. Isoform level data
  • 6. Isoform level data
  • 7. Differential splicing & TSS use
  • 8. Is it accurate?
      - Marioni et al. RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Research 2008; 18:1509.
  • 9. RNA-Seq Challenges
      - Library construction
          - Size selection (messenger, small)
          - Strand specificity?
      - Bioinformatic challenges
          - Spliced alignment
          - Transcript deconvolution
      - Statistical challenges
          - Highly variable abundance
          - Sample size: never, ever, plan n=1
          - Normalization (RPKM)
              - More reads from longer transcripts, higher sequencing depth
              - Want to compare features of different lengths
              - Want to compare conditions with different total sequence depth
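The RPKM normalization named on this slide divides a feature's read count by its length in kilobases and by the library size in millions of mapped reads, which addresses both comparisons listed above. A minimal Python sketch follows; the helper name `rpkm` and all numbers are made up for illustration and do not come from the slides.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads."""
    kilobases = feature_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Illustrative numbers only (assumptions, not data from the talk).
# A long gene and a short gene with the same underlying expression:
long_gene = rpkm(read_count=2000, feature_length_bp=4000, total_mapped_reads=20_000_000)
short_gene = rpkm(read_count=500, feature_length_bp=1000, total_mapped_reads=20_000_000)
# The same long gene in a library sequenced twice as deeply:
deeper_run = rpkm(read_count=4000, feature_length_bp=4000, total_mapped_reads=40_000_000)

print(long_gene, short_gene, deeper_run)
```

In this toy case the three values come out equal even though the raw counts differ fourfold and twofold, which is the point of the length and depth corrections.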
  • 10. RNA-Seq Overview
      [Figure: samples of interest, Condition 1 (normal colon) and Condition 2 (colon tumor); poly-A mRNA is converted to a library and sequenced into FASTQ reads]
  • 11. Common question #1: Depth
      - Question: how much sequence do I need?
      - Answer: it’s complicated.
      - Oversimplified answer: 20-50 million PE reads / sample (mouse/human).
      - Depends on:
          - Size & complexity of transcriptome
          - Application: differential gene expression, transcript discovery
          - Tissue type, RNA quality, library preparation
          - Sequencing type: length, paired-end vs single-end, etc.
      - Find a publication in your field with similar goals.
      - Good news: ¼ HiSeq lane usually sufficient.
  • 12. Common question #2: Sample Size
      - Question: How many samples should I sequence?
      - Oversimplified answer: At least 3 biological replicates per condition.
      - Depends on:
          - Sequencing depth
          - Application
          - Goals (prioritization, biomarker discovery, etc.)
          - Effect size, desired power, statistical significance
      - Find a publication with similar goals
  • 13. Common question #3: Workflow
      - How do I analyze the data?
      - No standards!
          - Unspliced aligners: BWA, Bowtie, Bowtie2, MANY others!
          - Spliced aligners: STAR, RUM, TopHat, TopHat2-Bowtie1, TopHat2-Bowtie2, GSNAP, MANY others.
          - Reference builds & annotations: UCSC, Entrez, Ensembl
          - Assembly: Cufflinks, Scripture, Trinity, G.Mor.Se, Velvet, TransABySS
          - Quantification: Cufflinks, RSEM, eXpress, MISO, etc.
          - Differential expression: Cuffdiff, Cuffdiff2, DEGseq, DESeq, edgeR, Myrna
      - Like early microarray days: lots of excitement, lots of tools, little knowledge of integrating tools in a pipeline!
      - Benchmarks
          - Microarray: Spike-ins (Irizarry)
          - RNA-Seq: ???, simulation, ???
  • 14. Common question #3: Workflow
      - Eyras et al. Methods to Study Splicing from RNA-Seq. http://dx.doi.org/10.6084/m9.figshare.679993
      - Turner SD. RNA-seq Workflows and Tools. http://dx.doi.org/10.6084/m9.figshare.662782
  • 15. Phases of NGS Analysis
      - Primary
          - Conversion of raw machine signal into sequence and qualities
      - Secondary
          - Alignment of reads to reference genome or transcriptome
          - or de novo assembly of reads into contigs
      - Tertiary
          - SNP discovery/genotyping
          - Peak discovery/quantification (ChIP, MeDIP)
          - Transcript assembly/quantification (RNA-seq)
      - Quaternary
          - Differential expression
          - Enrichment, pathways, correlation, clustering, visualization, etc.
          - http://gettinggeneticsdone.blogspot.com/2012/03/pathway-analysis-for-high-throughput.html
          - http://www.slideshare.net/turnersd/pathway-analysis-2012-17947529
  • 16. Primary Analysis: Get FASTQ file
      @HWUSI-EAS100R:6:73:941:1973#0/1
      GATTTGGGGTTCAAAGCAGTATCGATCAAATA
      +HWUSI-EAS100R:6:73:941:1973#0/1
      !''*((((***+))%%%++)(%%%%).1***-
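The record on this slide follows the usual four-line FASTQ layout: an `@` header, the sequence, a `+` separator line, and one quality character per base. As a rough sketch (the function name `read_fastq` is hypothetical, and it assumes unwrapped four-line records), the layout can be read in Python like this:

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


# The example record from the slide, held in memory instead of a file.
example = StringIO(
    "@HWUSI-EAS100R:6:73:941:1973#0/1\n"
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATA\n"
    "+HWUSI-EAS100R:6:73:941:1973#0/1\n"
    "!''*((((***+))%%%++)(%%%%).1***-\n"
)
records = list(read_fastq(example))
for name, sequence, quality in records:
    print(name, len(sequence), len(quality))
```

For real data an established parser would normally be preferable; the sketch only makes the record structure explicit.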
  • 17. “Phred-scaled” base qualities
      # $p is probability base is erroneous
      $Q = -10 * log($p) / log(10);       # Phred Q
      $q = chr(($Q<=40 ? $Q : 40) + 33);  # FASTQ quality character
      $Q = ord($q) - 33;                  # 33 offset

      SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
      ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
      ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      |                         |    |        |                              |                     |
      33                        59   64       73                            104                   126

      S - Sanger        Phred+33,  41 values (0, 40)
      I - Illumina 1.3  Phred+64,  41 values (0, 40)
      X - Solexa        Solexa+64, 68 values (-5, 62)
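The Perl fragment on this slide can be mirrored in Python to make the round trip between error probability, Phred score, and quality character explicit. This is a sketch with hypothetical function names; it rounds Q to the nearest integer before encoding, whereas Perl's `chr` truncates, and the `offset` parameter covers both the Phred+33 and Phred+64 encodings in the chart.

```python
import math


def error_prob_to_phred(p):
    """Q = -10 * log10(p), where p is the probability the base call is wrong."""
    return -10 * math.log10(p)


def phred_to_char(q, offset=33, cap=40):
    """Encode a Phred score as one quality character (capped at 40, as on the slide)."""
    return chr(min(int(round(q)), cap) + offset)


def char_to_phred(c, offset=33):
    """Decode one quality character back to its Phred score."""
    return ord(c) - offset


def phred_to_error_prob(q):
    """Invert the Phred scale: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


q30 = error_prob_to_phred(0.001)            # 1 error in 1000 calls
encoded = phred_to_char(q30)                # Sanger / Phred+33 character
decoded = [char_to_phred(c) for c in "!''*"]  # start of the slide's quality string
print(round(q30), encoded, decoded)
```

Reading an Illumina 1.3 file would use `offset=64` instead; choosing the wrong offset shifts every score by 31.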
  • 18. Secondary analysis
      - Alignment back to the reference
          - Computationally demanding: can’t use BLAST
          - Many algorithms (Maq, BWA, bowtie, bowtie2, Mosaik, NovoAlign, SOAP2, SSAHA, …)
          - http://en.wikipedia.org/wiki/List_of_sequence_alignment_software
          - Sensitivity to sequencing errors, polymorphisms, indels, rearrangements
          - Tradeoffs in time vs. memory vs. performance
  • 19. RNA-Seq Workflow 1: Differential Gene Expression
  • 20. RNA-Seq Workflow 2: Differential Isoform Expression, Exon Usage
  • 21. Download data & software
      - Public data from GEO, e.g. GSE32038
          - http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE32038
          - Trapnell et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 2012; 7:562.
      - Sequence, annotation, indexes (Ensembl)
          - iGenomes: http://tophat.cbcb.umd.edu/igenomes.html
          - Genes: /Annotation/Genes/genes.gtf
          - Indexes: /Sequence/BowtieIndex/genome.*
      - Software:
          - Samtools: http://samtools.sourceforge.net/
          - FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
          - Bowtie: http://bowtie-bio.sourceforge.net/index.shtml
          - Tophat: http://tophat.cbcb.umd.edu/
          - HTSeq: http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html
          - R: http://www.r-project.org/
          - DESeq2: http://www.bioconductor.org/packages/2.12/bioc/html/DESeq2.html
          - Cufflinks: http://cufflinks.cbcb.umd.edu/
          - cummeRbund: http://compbio.mit.edu/cummeRbund/
  • 22. Do some quality assessment
      - Software:
          - Picard: picard.sourceforge.net
          - FastQC: bioinformatics.bbsrc.ac.uk/projects/fastqc
          - RSeQC: code.google.com/p/rseqc
          - FastX Toolkit: hannonlab.cshl.edu/fastx_toolkit
          - R/ShortRead: bioconductor.org/packages/bioc/html/ShortRead.html
  • 23. Mapping across splice junctions: tophat
      1. Map reads to genome
      2. Collect unmappable reads
      3. Break reads into segments. Small segments often align independently. If segments align 100 bp to kilobases apart, infer a splice.
      tophat -G genes.gtf -o C1_R1_tophatout /path/bowtieindex/genome C1_R1_1.fq C1_R1_2.fq
      tophat -G genes.gtf -o C1_R2_tophatout /path/bowtieindex/genome C1_R2_1.fq C1_R2_2.fq
      tophat -G genes.gtf -o C1_R3_tophatout /path/bowtieindex/genome C1_R3_1.fq C1_R3_2.fq
      tophat -G genes.gtf -o C2_R1_tophatout /path/bowtieindex/genome C2_R1_1.fq C2_R1_2.fq
      tophat -G genes.gtf -o C2_R2_tophatout /path/bowtieindex/genome C2_R2_1.fq C2_R2_2.fq
      tophat -G genes.gtf -o C2_R3_tophatout /path/bowtieindex/genome C2_R3_1.fq C2_R3_2.fq
      Arguments, in order: gene annotation (-G), output directory (-o), Bowtie index, read 1, read 2.
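The six tophat invocations on this slide differ only in the sample name, so they can be generated rather than typed. The following Python sketch builds the same argument lists; the helper `tophat_command` is hypothetical, the paths are the slide's placeholders, and nothing is executed here.

```python
import shlex


def tophat_command(sample, gtf="genes.gtf", index="/path/bowtieindex/genome"):
    """Build the tophat argument list for one paired-end sample (e.g. 'C1_R1')."""
    return [
        "tophat",
        "-G", gtf,                    # gene annotation
        "-o", f"{sample}_tophatout",  # output directory
        index,                        # Bowtie index prefix
        f"{sample}_1.fq",             # read 1
        f"{sample}_2.fq",             # read 2
    ]


# Two conditions (C1, C2) with three replicates each, as on the slide.
samples = [f"C{cond}_R{rep}" for cond in (1, 2) for rep in (1, 2, 3)]
commands = [tophat_command(sample) for sample in samples]

for cmd in commands:
    print(shlex.join(cmd))

# To actually run a command, something like subprocess.run(cmd, check=True)
# could be used, assuming tophat is installed and on PATH.
```

Keeping each command as a list rather than a single string avoids shell quoting problems if a path ever contains spaces.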
  • 24. Workflow 1: Differential Gene Expression
      Step 1: Align to Genome
      Step 2: Count Reads overlapping genes
      Step 3: Differential expression
  • 25. Workflow 1: Differential Gene Expression
      - Steps: 1. Align to genome; 2. Count reads overlapping genes; 3. Differential expression
      - Step 2 software: HTSeq http://www-huber.embl.de/users/anders/HTSeq
      - First convert the binary .bam file to a text .sam file using samtools:
          samtools view accepted_hits.bam > C1_R1.sam
      - Then run htseq-count on each of the alignments:
          htseq-count <sam_file> <gtf_file>
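The conversion and counting steps have to be repeated for every sample. A minimal sketch of that loop, assuming the TopHat output directories are named `<sample>_tophatout` and that counts should land in `<sample>.counts.txt` (the file names the DESeq2 slide reads). The function name `count_cmds` is hypothetical, and the script prints the commands rather than running them. Depending on the HTSeq version, paired-end input may also need to be sorted by read name first, so check the HTSeq documentation before relying on this.

```shell
#!/bin/sh
# Dry run: print the BAM-to-SAM conversion and the counting command for each sample.
# Assumes the TopHat output directories are named <sample>_tophatout.
GTF=genes.gtf

count_cmds() {
    for s in C1_R1 C1_R2 C1_R3 C2_R1 C2_R2 C2_R3; do
        echo "samtools view ${s}_tophatout/accepted_hits.bam > ${s}.sam"
        # htseq-count writes its count table to standard output
        echo "htseq-count ${s}.sam $GTF > ${s}.counts.txt"
    done
}

count_cmds
```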
  • 26. Workflow 1: Differential Gene Expression
      - Steps: 1. Align to genome; 2. Count reads overlapping genes; 3. Differential expression
      - Step 3 software: DESeq2 http://www.bioconductor.org/packages/2.12/bioc/html/DESeq2.html
          > library(DESeq2)
          > sampleFiles <- c("C1_R1.counts.txt", "C1_R2.counts.txt", "C1_R3.counts.txt",
                             "C2_R1.counts.txt", "C2_R2.counts.txt", "C2_R3.counts.txt")
          > sampleCondition <- factor(substr(sampleFiles, 1, 2))
          > sampleTable <- data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
          > sampleTable
                  sampleName         fileName condition
          1 C1_R1.counts.txt C1_R1.counts.txt        C1
          2 C1_R2.counts.txt C1_R2.counts.txt        C1
          3 C1_R3.counts.txt C1_R3.counts.txt        C1
          4 C2_R1.counts.txt C2_R1.counts.txt        C2
          5 C2_R2.counts.txt C2_R2.counts.txt        C2
          6 C2_R3.counts.txt C2_R3.counts.txt        C2
          > dds <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=".", design=~condition)
          > dds <- DESeq(dds)
          > res <- results(dds)
          > res <- res[order(res$padj), ]   # sort by adjusted p-value (FDR); the DESeq2 column is named padj
          > plotMA(dds)
          ...
  • 27. RNA-Seq Workflow 2: Differential Isoform Expression, Exon Usage
  • 28. A change in fragment count for a gene does not necessarily mean a change in expression. Trapnell, Cole, et al. "Differential analysis of gene regulation at transcript resolution with RNA-seq." Nature biotechnology 31.1 (2012): 46-53.
  • 29. Workflow 2a: Assemble transcripts for each sample: cufflinks
      - Cufflinks
          - Identifies mutually incompatible fragments.
          - Identifies the minimal set of transcripts that explains all the fragments.
      - Commands (one per sample):
          cufflinks -o C1_R1_cufflinksout C1_R1_tophatout/accepted_hits.bam
          cufflinks -o C1_R2_cufflinksout C1_R2_tophatout/accepted_hits.bam
          cufflinks -o C1_R3_cufflinksout C1_R3_tophatout/accepted_hits.bam
          cufflinks -o C2_R1_cufflinksout C2_R1_tophatout/accepted_hits.bam
          cufflinks -o C2_R2_cufflinksout C2_R2_tophatout/accepted_hits.bam
          cufflinks -o C2_R3_cufflinksout C2_R3_tophatout/accepted_hits.bam
      - Arguments, in order: -o output directory, path to the alignment.
  • 30. Merge assemblies: cuffmerge
      - Merge assemblies to create a single merged transcriptome annotation.
          - Option 1: Pool alignments and assemble all at once.
              - Computationally demanding.
              - The assembler is faced with a complex mixture of isoforms → more errors.
          - Option 2: Assemble alignments individually, then merge the resulting assemblies.
              - Cuffmerge: meta-assembler using parsimony.
              - Genes with low expression → insufficient coverage for reconstruction in a single sample.
              - Merging often recovers the complete gene.
              - Newly discovered isoforms are integrated with known ones (RABT).
  • 31. Merge assemblies: cuffmerge
      - Create a "manifest" listing the location of all assemblies (assemblies.txt):
          ./C1_R1_cufflinksout/transcripts.gtf
          ./C1_R2_cufflinksout/transcripts.gtf
          ./C1_R3_cufflinksout/transcripts.gtf
          ./C2_R1_cufflinksout/transcripts.gtf
          ./C2_R2_cufflinksout/transcripts.gtf
          ./C2_R3_cufflinksout/transcripts.gtf
      - Run Cuffmerge on the assemblies using RABT:
          cuffmerge -g /path/to/annotation/genes.gtf -s /path/to/refgenome/genome.fa assemblies.txt
      - Arguments, in order: -g reference gene annotation, -s reference genome sequence, the manifest from above.
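The manifest is just a text file with one `transcripts.gtf` path per line, so it can be written by a short loop instead of by hand. A minimal sketch, assuming the Cufflinks output directories follow the `<sample>_cufflinksout` naming used on the previous slides:

```shell
#!/bin/sh
# Write the Cuffmerge manifest: one transcripts.gtf path per line.
# Assumes Cufflinks output directories named <sample>_cufflinksout.
MANIFEST=assemblies.txt
: > "$MANIFEST"                      # create or truncate the manifest
for s in C1_R1 C1_R2 C1_R3 C2_R1 C2_R2 C2_R3; do
    echo "./${s}_cufflinksout/transcripts.gtf" >> "$MANIFEST"
done
cat "$MANIFEST"
```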
  • 32. Differential expression: cuffdiff
      - Identify differentially expressed genes & transcripts:
          cuffdiff -o cuffdiff_out -b genome.fa -u merged.gtf \
            ./C1_R1_tophatout/accepted_hits.bam,./C1_R2_tophatout/accepted_hits.bam,./C1_R3_tophatout/accepted_hits.bam \
            ./C2_R1_tophatout/accepted_hits.bam,./C2_R2_tophatout/accepted_hits.bam,./C2_R3_tophatout/accepted_hits.bam
      - Arguments, in order: -o output directory, -b reference sequence, merged assembly, location of alignments (replicates of one condition separated by commas without spaces, conditions separated by a space).
      - Example locus in the figure: 1 gene, 2 TSS, 2 CDS, 3 isoforms.
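Cuffdiff expects one argument per condition, with the replicate BAM files of that condition joined by commas and no spaces. A minimal sketch of building those lists, assuming the `<sample>_tophatout` directory naming from the earlier slides. The helper name `bam_list` is made up for this illustration, and the final command is only printed. The sketch also passes `-L C1,C2` so that the condition labels match the names used in the cummeRbund calls later in the deck. That is an addition here, not part of the slide's command.

```shell
#!/bin/sh
# Build a comma-separated list of replicate BAMs for one condition.
# No spaces are allowed inside a list: a space starts the next condition.
bam_list() {
    cond=$1
    list=""
    for rep in R1 R2 R3; do
        bam="./${cond}_${rep}_tophatout/accepted_hits.bam"
        if [ -z "$list" ]; then list="$bam"; else list="$list,$bam"; fi
    done
    echo "$list"
}

# Dry run: print the cuffdiff command. -L sets the condition labels
# (C1 and C2 here); without it cuffdiff assigns default labels.
echo "cuffdiff -o cuffdiff_out -b genome.fa -u -L C1,C2 merged.gtf $(bam_list C1) $(bam_list C2)"
```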
  • 33. Downstream analysis & visualization
  • 34. Visualization with cummeRbund
      - Install cummeRbund
          - From Bioconductor:
              source("http://bioconductor.org/biocLite.R")
              biocLite("cummeRbund")
          - Or download and install the latest version from http://compbio.mit.edu/cummeRbund/
      - Load the package:
          library(cummeRbund)
      - Read in the data:
          cuff <- readCufflinks("/path/to/cuffdiff/output")
  • 35. Visualization with cummeRbund
          csDensity(genes(cuff))
          csBoxplot(genes(cuff))
          csScatter(genes(cuff), "C1", "C2", smooth=T)
          csVolcano(genes(cuff), "C1", "C2")
  • 36. Visualization with cummeRbund
          mygene2 <- getGene(cuff, "Rala")
          expressionBarplot(mygene2)
          expressionBarplot(isoforms(mygene2))
  • 37. DEXSeq
      - Differential gene expression (e.g. DESeq)
      - Differential isoform expression (e.g. Cufflinks)
      - Differential exon usage
      - What's different about DEXSeq?
          - Doesn't do full transcript assembly (Cufflinks).
          - Doesn't count fragments mapping to genes (DESeq).
          - Avoids assembly and looks for differences in reads mapping to individual exons.
          - Uses counts (negative binomial).
  • 38. Using DEXSeq: Installation
      - Install & load:
          source("http://bioconductor.org/biocLite.R")
          biocLite("DEXSeq")
          library(DEXSeq)
      - The installation comes bundled with useful Python scripts in the python_scripts directory of the library. Put these in your PATH.
  • 39. Using DEXSeq: Data preparation
      - First, prepare a "flattened" GFF (script comes with DEXSeq; arguments: reference annotation, output file):
          dexseq_prepare_annotation.py input.gtf exons.gff
      - Create sorted SAM files:
          samtools view C1_R1_tophatout/accepted_hits.bam | sort -k 1,1 -k2,2n > C1_R1.sam
          samtools view C1_R2_tophatout/accepted_hits.bam | sort -k 1,1 -k2,2n > C1_R2.sam
          samtools view C1_R3_tophatout/accepted_hits.bam | sort -k 1,1 -k2,2n > C1_R3.sam
          samtools view C2_R1_tophatout/accepted_hits.bam | sort -k 1,1 -k2,2n > C2_R1.sam
          samtools view C2_R2_tophatout/accepted_hits.bam | sort -k 1,1 -k2,2n > C2_R2.sam
          samtools view C2_R3_tophatout/accepted_hits.bam | sort -k 1,1 -k2,2n > C2_R3.sam
      - Count reads overlapping counting bins (script comes with DEXSeq; arguments: flattened annotation, alignment, output file):
          dexseq_count.py -p no -s no exons.gff C1_R1.sam C1_R1.counts.txt
          dexseq_count.py -p no -s no exons.gff C1_R2.sam C1_R2.counts.txt
          dexseq_count.py -p no -s no exons.gff C1_R3.sam C1_R3.counts.txt
          dexseq_count.py -p no -s no exons.gff C2_R1.sam C2_R1.counts.txt
          dexseq_count.py -p no -s no exons.gff C2_R2.sam C2_R2.counts.txt
          dexseq_count.py -p no -s no exons.gff C2_R3.sam C2_R3.counts.txt
  • 40. Using DEXSeq: Data import
      - The pasilla package vignette gives detailed instructions on how to do this: http://www.bioconductor.org/packages/release/data/experiment/html/pasilla.html
          > design <- data.frame(condition=c(rep("C1",3), rep("C2",3)), replicate=rep(1:3,2))
          > rownames(design) <- with(design, paste(condition, "_R", replicate, sep=""))
          > design
                condition replicate
          C1_R1        C1         1
          C1_R2        C1         2
          C1_R3        C1         3
          C2_R1        C2         1
          C2_R2        C2         2
          C2_R3        C2         3
          > countfiles <- file.path(".", paste(rownames(design), ".counts.txt", sep=""))
          > countfiles
          [1] "./C1_R1.counts.txt" "./C1_R2.counts.txt" "./C1_R3.counts.txt" "./C2_R1.counts.txt"
          [5] "./C2_R2.counts.txt" "./C2_R3.counts.txt"
          > flattenedfile <- "/Users/sdt5z/smb/u/genomes/dexseq/exons_dme_ens_bdgp525.gff"
          > exons <- read.HTSeqCounts(countfiles=countfiles, design=design, flattenedfile=flattenedfile)
          > sampleNames(exons) <- rownames(design)
  • 41. Using DEXSeq: Data Analysis
          # Estimate size factors (normalizes for sequencing depth)
          exons <- estimateSizeFactors(exons)
          sizeFactors(exons)
          # Estimate dispersion
          exons <- estimateDispersions(exons)
          exons <- fitDispersionFunction(exons)
          # Test for differential exon usage
          exons <- testForDEU(exons)
          exons <- estimatelog2FoldChanges(exons)
          result <- DEUresultTable(exons)
          # How many are significant at FDR 0.0001?
          table(result$padjust < 0.0001)
          # M vs A plot
          plot(result$meanBase, result[, "log2fold(C2/C1)"], log="x")
  • 42. Using DEXSeq: visualization
          plotDEXSeq(exons, "FBgn0030362", cex.axis=1.2, cex=1.3, lwd=2, legend=T, displayTranscripts=T)
  • 43. Using DEXSeq: HTML Report
          library(biomaRt)
          mart <- useMart("ensembl", dataset="dmelanogaster_gene_ensembl")
          listAttributes(mart)[1:25,]
          attributes <- c("ensembl_gene_id", "external_gene_id", "description")
          DEXSeqHTML(exons, FDR=0.0001, mart=mart, filter="ensembl_gene_id", attributes=attributes)
  • 44. Downstream analysis
      - Now you have a list of:
          - Genes
          - Isoforms (genes)
          - Exons (genes)
      - How to place these in functional context?
      - Pathway / functional analysis!
          - Gene Ontology over-representation
          - Gene Set Enrichment Analysis
          - Signaling Pathway Impact Analysis
          - Many more...
      - Resources:
          - http://gettinggeneticsdone.blogspot.com/2012/03/pathway-analysis-for-high-throughput.html
          - http://www.slideshare.net/turnersd/pathway-analysis-2012-17947529
  • 45. Workflow Management: Galaxy
      - http://usegalaxy.org
  • 46. Workflow Management: Taverna
      - Taverna: http://www.taverna.org.uk/
      - TavernaPBS: http://sourceforge.net/projects/tavernapbs/
  • 47. Further Reading
      - RNA-Seq:
          - Garber, M., Grabherr, M. G., Guttman, M., & Trapnell, C. (2011). Computational methods for transcriptome annotation and quantification using RNA-seq. Nature methods, 8(6), 469-77.
          - Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M., & Gilad, Y. (2008). RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome research, 18(9), 1509-17.
          - Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L., & Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods, 5(7), 621-8.
          - Ozsolak, F., & Milos, P. M. (2011). RNA sequencing: advances, challenges and opportunities. Nature reviews. Genetics, 12(2), 87-98.
          - Toung, J. M., Morley, M., Li, M., & Cheung, V. G. (2011). RNA-sequence analysis of human B-cells. Genome research, 991-998.
          - Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews. Genetics, 10(1), 57-63.
      - Bowtie/Tophat:
          - Langmead, B., Trapnell, C., Pop, M., & Salzberg, S. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology, 10(3), R25.
          - Trapnell, C., Pachter, L., & Salzberg, S. L. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (Oxford, England), 25(9), 1105-11.
      - Cufflinks:
          - Roberts, A., Pimentel, H., Trapnell, C., & Pachter, L. (2011). Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics (Oxford, England), 27(17), 2325-9.
          - Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., Pimentel, H., et al. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols, 7(3), 562-578.
          - Trapnell, C., Williams, B. A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M. J., Salzberg, S. L., et al. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology, 28(5), 511-5.
      - DEXSeq:
          - Vignette: http://watson.nci.nih.gov/bioc_mirror/packages/2.9/bioc/html/DEXSeq.html
          - Pre-pub manuscript: Anders, S., Reyes, A., Huber, W. (2012). Detecting differential usage of exons from RNA-Seq data. Nature Precedings, DOI: 10.1038/npre.2012.6837.2.
  • 48. Online Community Forum and Discussion
      - SEQanswers:
          - http://SEQanswers.com
          - Format: Forum
          - Li et al. SEQanswers: An open access community for collaboratively decoding genomes. Bioinformatics (2012).
      - BioStar:
          - http://biostar.stackexchange.com
          - Format: Q&A
          - Parnell et al. BioStar: an online question & answer resource for the bioinformatics community. PLoS Comp Bio (2011).
      - Other bioinformatics resources: stephenturner.us/p/edu
  • 49. DNA Methylation: Importance
      - Occurs most frequently at CpG sites.
      - High methylation at promoters ≈ silencing.
      - Methylation is perturbed in cancer.
      - Methylation is associated with many other complex diseases: neural, autoimmune, response to environment.
      - Mapping DNA methylation → new disease genes & drug targets.
  • 50. DNA Methylation: Challenges
      - Dynamic and tissue-specific.
      - DNA → collection of cells which vary in 5meC patterns → the 5meC pattern is complex.
      - Further, CpG targets are unevenly distributed.
      - Multiple classes of methods:
          - Bisulfite, sequence-based: assay methylated target sequences across individual DNAs.
          - Affinity enrichment, count-based: assay methylation level across many genomic loci.
  • 51. DNA Methylation: Mapping
      - DNA methylation
          - BS-Seq: Whole-genome bisulfite sequencing
          - RRBS-Seq: Reduced representation bisulfite sequencing
          - BC-Seq: Bisulfite capture sequencing
          - BSPP: Bisulfite specific padlock probes
          - Methyl-Seq: Restriction enzyme based methyl-seq
          - MSCC: Methyl sensitive cut counting
          - HELP-Seq: HpaII fragment enrichment by ligation PCR
          - MCA-Seq: Methylated CpG island amplification
          - MeDIP-Seq: Methylated DNA immunoprecipitation
          - MBP-Seq: Methyl-binding protein sequencing
          - MethylCap-seq: Methylated DNA capture by affinity purification
          - MIRA-Seq: Methylated CpG island recovery assay
      - Gene expression
          - RNA-Seq: High-throughput cDNA sequencing
  • 52. Methylation: REs and PCR
      - Restriction enzyme digest
          - Isoschizomers HpaII and MspI both recognize the same sequence: 5'-CCGG-3'.
          - MspI digests regardless of methylation.
          - HpaII only digests at unmethylated sites.
      - PCR → gel electrophoresis → Southern blot
      - Pros: highly sensitive.
      - Cons: low-throughput; high false positive rate because of incomplete digestion (for reasons other than methylation).
  • 53. Bisulfite sequencing
      - Sodium bisulfite converts unmethylated (but not methylated) C's into U's.
      - This introduces a methylation-specific "SNP".
      - RRBS: library enriched for CpG-dense regions by digesting with MspI.
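The conversion rule on this slide can be illustrated with a toy example. The sketch below is only an illustration under simplifying assumptions: it treats a single strand, takes the methylated positions as given input, and writes converted bases as T (the U is read as T after amplification). The function name `bisulfite_convert` and the example sequence are made up here and do not come from any tool in the deck.

```shell
#!/bin/sh
# Toy bisulfite conversion on a single strand.
# $1: DNA sequence; $2: space-separated 1-based positions of methylated C's.
# Unmethylated C -> T (C -> U, read as T after PCR); methylated C is kept.
bisulfite_convert() {
    seq=$1
    meth=" $2 "
    out=""
    i=1
    while [ "$i" -le "${#seq}" ]; do
        base=$(printf '%s' "$seq" | cut -c "$i")
        case "$base" in
            C)
                case "$meth" in
                    *" $i "*) out="${out}C" ;;   # methylated: protected
                    *)        out="${out}T" ;;   # unmethylated: converted
                esac ;;
            *) out="${out}${base}" ;;
        esac
        i=$((i + 1))
    done
    echo "$out"
}

# C's at positions 2 and 6 are methylated; those at 5 and 10 are not.
bisulfite_convert "ACGTCCGGAC" "2 6"   # prints ACGTTCGGAT
```

Comparing the converted read with the reference then shows a C-to-T difference at every unmethylated cytosine, which is the methylation-specific "SNP" the slide refers to.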
  • 54. MeDIP-Seq
      - MeDIP-Seq = Methylated DNA immunoprecipitation sequencing.
      - Uses an antibody against 5-methylcytosine to retrieve methylated fragments from sonicated DNA.
      - Enrichment method = count the number of reads.
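For an enrichment method, the raw signal is simply the number of reads that fall in each region. A minimal sketch of that counting step, assuming a simplified input of one read per line with chromosome and 1-based start position, and fixed-size bins. The function name `bin_counts`, the input format, and the 500 bp bin size are all assumptions for this illustration. Dedicated MeDIP tools additionally model CpG density, which is not attempted here.

```shell
#!/bin/sh
# Count aligned reads per fixed-size genomic bin (toy enrichment signal).
# Input on stdin: lines of "<chrom> <1-based start>"; bin size in bp is $1.
# Output: chrom, 1-based bin start, read count (tab-separated).
bin_counts() {
    awk -v size="$1" '
        { bin = int(($2 - 1) / size) * size + 1      # 1-based start of the bin
          n[$1 "\t" bin]++ }
        END { for (k in n) print k "\t" n[k] }
    ' | sort -k1,1 -k2,2n
}

printf '%s\n' "chr1 120" "chr1 180" "chr1 950" "chr1 1020" "chr2 40" | bin_counts 500
```

In this toy input, the first chr1 bin (starting at position 1) receives two reads and every other occupied bin receives one.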
  • 55. MethylCap-Seq
      - Uses methyl-binding domain (MBD) protein to obtain DNA with similar methylation levels.
      - Also a counting method.
  • 56. Methylation: Accuracy
      - Bock et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nature biotechnology, 28(10), 1106-14.
      - MeDIP, MethylCap, RRBS largely concordant with the Illumina Infinium assay.
  • 57. Methylation methods: coverage
      - Coverage varies among different methods.
  • 58. Methylation: Features & Biases
  • 59. Methylation: Bioinformatics Resources
      Resource | Purpose | URL
      Batman | MeDIP DNA methylation analysis tool | http://td-blade.gurdon.cam.ac.uk/software/batman
      BDPC | DNA methylation analysis platform | http://biochem.jacobs-university.de/BDPC
      BSMAP | Whole-genome bisulphite sequence mapping | http://code.google.com/p/bsmap
      CpG Analyzer | Windows-based program for bisulphite DNA | -
      CpGcluster | CpG island identification | http://bioinfo2.ugr.es/CpGcluster
      CpGFinder | Online program for CpG island identification | http://linux1.softberry.com
      CpG Island Explorer | Online program for CpG Island identification | http://bioinfo.hku.hk/cpgieintro.html
      CpG Island Searcher | Online program for CpG Island identification | http://cpgislands.usc.edu
      CpG PatternFinder | Windows-based program for bisulphite DNA | -
      CpG Promoter | Large-scale promoter mapping using CpG islands | http://www.cshl.edu/OTT/html/cpg_promoter.html
      CpG ratio and GC content Plotter | Online program for plotting the observed:expected ratio of CpG | http://mwsross.bms.ed.ac.uk/public/cgi-bin/cpg.pl
      CpGviewer | Bisulphite DNA sequencing viewer | http://dna.leeds.ac.uk/cpgviewer
      CyMATE | Bisulphite-based analysis of plant genomic DNA | http://www.gmi.oeaw.ac.at/en/cymate-index/
      EMBOSS CpGPlot/CpGReport | Online program for plotting CpG-rich regions | http://www.ebi.ac.uk/Tools/emboss/cpgplot/index.html
      Epigenomics Roadmap | NIH Epigenomics Roadmap Initiative homepage | http://nihroadmap.nih.gov/epigenomics
      Epinexus | DNA methylation analysis tools | http://epinexus.net/home.html
      MEDME | Software package (using R) for modelling MeDIP experimental data | http://espresso.med.yale.edu/medme
      methBLAST | Similarity search program for bisulphite-modified DNA | http://medgen.ugent.be/methBLAST
      MethDB | Database for DNA methylation data | http://www.methdb.de
      MethPrimer | Primer design for bisulphite PCR | http://www.urogene.org/methprimer
      methPrimerDB | PCR primers for DNA methylation analysis | http://medgen.ugent.be/methprimerdb
      MethTools | Bisulphite sequence data analysis tool | http://www.methdb.de
      MethyCancer Database | Database of cancer DNA methylation data | http://methycancer.psych.ac.cn
      Methyl Primer Express | Primer design for bisulphite PCR | http://www.appliedbiosystems.com/
      Methylumi | Bioconductor pkg for DNA methylation data from Illumina | http://www.bioconductor.org/packages/bioc/html/
      Methylyzer | Bisulphite DNA sequence visualization tool | http://ubio.bioinfo.cnio.es/Methylyzer/main/index.html
      mPod | DNA methylation viewer integrated w/ Ensembl genome browser | http://www.compbio.group.cam.ac.uk/Projects/
      PubMeth | Database of DNA methylation literature | http://www.pubmeth.org
      QUMA | Quantification tool for methylation analysis | http://quma.cdb.riken.jp
      TCGA Data Portal | Database of TCGA DNA methylation data | http://cancergenome.nih.gov/dataportal
  • 60. Methylation: Further Reading
      - Bock, C., Tomazou, E. M., Brinkman, A. B., Müller, F., Simmer, F., Gu, H., Jäger, N., et al. (2010). Quantitative comparison of genome-wide DNA methylation mapping technologies. Nature biotechnology, 28(10), 1106-14.
      - Brinkman, A. B., Simmer, F., Ma, K., Kaan, A., Zhu, J., & Stunnenberg, H. G. (2010). Whole-genome DNA methylation profiling using MethylCap-seq. Methods (San Diego, Calif.), 52(3), 232-6.
      - Brunner, A. L., Johnson, D. S., Kim, S. W., Valouev, A., Reddy, T. E., et al. (2009). Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver, 1044-1056.
      - Gu, H., Bock, C., Mikkelsen, T. S., Jäger, N., Smith, Z. D., Tomazou, E., Gnirke, A., et al. (2010). Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nature methods, 7(2), 133-6.
      - Harris, R. A., Wang, T., Coarfa, C., Nagarajan, R. P., Hong, C., Downey, S. L., Johnson, B. E., et al. (2010). Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications. Nature biotechnology, 28(10), 1097-105.
      - Kerick, M., Fischer, A., & Schweiger, M.-R. (2012). Bioinformatics for High Throughput Sequencing. (N. Rodríguez-Ezpeleta, M. Hackenberg, & A. M. Aransay, Eds.), 151-167. New York, NY: Springer New York.
      - Laird, P. W. (2010). Principles and challenges of genomewide DNA methylation analysis. Nature reviews. Genetics, 11(3), 191-203.
      - Weber, M., Davies, J. J., Wittig, D., Oakeley, E. J., Haase, M., Lam, W. L., & Schübeler, D. (2005). Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nature genetics, 37(8), 853-62. doi:10.1038/ng1598
  • 61. Thank you
      - Web: bioinformatics.virginia.edu
      - E-mail: bioinformatics@virginia.edu
      - Blog: www.GettingGeneticsDone.com
      - Twitter: twitter.com/genetics_blog