diffReps: automated ChIP-seq
differential analysis package
Li Shen
Asst. Professor
Neuroscience, Mount Sinai
06/28/2013
Slides adapted from previous presentation
ChIP-seq differential analysis
Treatment
(coc i.p.)
Control
(sal i.p.)
Rep1
Rep2
Rep3
Rep1
Rep2
Rep3
Differences
Venn diagram for peak lists
Treatment Control
False positive
False negative
Treatment Control
2
Subtle changes of chromatin
modifications
H3K4me3 from ENCODE
K562
ESC
ASUN: Asunder, Spermatogenesis Regulator
[0, 1.2]
[0, 1.2]
3
Existing programs for differential
analysis
• ChIPDiff(2008): HMM-based
approach. NOT sensitive
enough for brain data.
• Peak-based: DIME(2011),
DBChIP(2012). Caveats.
• Read counts +
DESeq(2010)/edgeR(2010):
Not convenient to use.
K562
ESC
Peaks
4
diffReps: a ChIP-seq differential analysis package
• Written in Perl; an easy-to-use
command-line tool that does
everything in one command.
• Sliding window
strategy.
Workflow: Background modeling → Normalization → Differential test → Merge and re-test → Multiple testing correction
diffReps.pl -tr A.bed B.bed -co C.bed D.bed -gn mm9 -re report.txt
Google code:
5
Differential analysis & tail behavior
Gaussian: p=1E-5
Empirical: p=1E-5
 H3K4me3 from mouse
brain; bin1kb counts
normalized.
6
Statistical tests for differential analysis
• Negative binomial test:
models biological replicates,
over-dispersion
• T-test: NOT recommended
• χ² test: SUM((emp – exp)^2 / exp)
=> χ² distr (p-val).
• G-test: 2 × SUM(emp × ln(emp / exp))
=> χ² distr (p-val). A
modification to the χ² test,
recommended.
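The two count-based tests can be sketched in a few lines of Python. This is only an illustration using the textbook forms of the χ² and G statistics, not the diffReps implementation: the function names, the read counts, and the library sizes are all made up, and the p-value helper assumes one degree of freedom (two groups compared in one window).

```python
import math

def expected_counts(observed, library_sizes):
    """Split the total count across groups in proportion to sequencing depth."""
    total = sum(observed)
    depth = sum(library_sizes)
    return [total * size / depth for size in library_sizes]

def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: SUM((emp - exp)^2 / exp)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def g_stat(observed, expected):
    """G statistic: 2 * SUM(emp * ln(emp / exp)); zero counts contribute 0."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def p_value_df1(stat):
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical read counts in one window: treatment vs. control.
observed = [60, 30]
library_sizes = [1_000_000, 1_000_000]  # assumed equal sequencing depth

expected = expected_counts(observed, library_sizes)
x2 = chi_square_stat(observed, expected)
g = g_stat(observed, expected)
print(f"chi-square = {x2:.3f}, p = {p_value_df1(x2):.4f}")
print(f"G          = {g:.3f}, p = {p_value_df1(g):.4f}")
```

Both statistics are compared against the same χ² distribution, so on the same counts they usually give similar p-values.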
diffReps on H3K4me3: cocaine vs. saline
Venn diagram: Negative binomial test vs. T-test (counts shown: 6527, 282, 130)
7
Two additional tools
1. Find hotspots - hotspots are regions where the differential
sites or peaks occur significantly more often than random
chance.
Figure labels: Hotspot; Differential sites; Greedy search algorithm; Local Poisson Eval
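The slide does not spell out how the local Poisson evaluation works, so the following is only a rough sketch of the general idea: given an expected number of differential sites for a region (a hypothetical local rate), a Poisson upper-tail probability says how surprising the observed number is. The function name and both numbers are assumptions for illustration.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    # Sum the probabilities of 0..k-1 events, then take the complement.
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

expected_sites = 2.0  # hypothetical local expectation for a region of this size
observed_sites = 8    # hypothetical number of differential sites found in it

p_hotspot = poisson_sf(observed_sites, expected_sites)
print(f"P(X >= {observed_sites} | lambda = {expected_sites}) = {p_hotspot:.2e}")
```

A small tail probability suggests the sites cluster more than chance would predict, which matches the definition of a hotspot given above.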
2. Region analysis - any file with the first 3 columns to be:
chromosome, start, end. Annotate gene and heterochromatic
regions
Easy to use: region_analysis.pl -i input.txt
8
Test data: ENCODE H3K4me3 between
K562 and ESC
Target: H3K4me3 Mock: DNA Input
Identify differential chromatin
modification sites
ESC K562
Rep1
Rep2
Rep1
Rep2
Estimate empirical false
positive rate
9
Sensitivity & Specificity
Target
Mock
Negative binomial vs. G-test
eFDR < .05%
10
Overlapped & specific sites
Up-regulated sites, do the same for down sites
“Specific”
“Overlapped”
Promoter
Genebody Promoter Genebody
Using default
p<1E-4
RNA-seq
11
Correlating differential sites with transcription
“Specific” “Overlapped”
K562, ESC RNA-seq TopHat-Cufflinks: gene exp change,
alternative promoter/splicing
12
diffReps “specific” sites - examples
13
diffReps is used in many works
Big cocaine project:
14
diffReps: current status & community
feedback
diffReps
published
Great to see diffreps has found a nice home in plos one. It is
literally the program which has saved my sanity, my phD and
probably the paper i'm writing!
- Michael Reschen, Oxford Univ., UK
15
http://dx.plos.org/10.1371/journal.pone.0065598
Acknowledgement
Role Li Shen Ningyi Shao Xiaochuan Liu Eric Nestler
Development
Test & result
Documentation
Google code
Money$
diffReps:
16

Editor's notes

  1. Good morning! Thank you for inviting me. I’ve come to this meeting many times but never made any contribution. So today is payback time.
  2. The first problem I identified is something called differential analysis for ChIP-seq data. Basically, you have two groups of animals: one group is treatment and the other is control. You take samples from these animals and send them for sequencing to measure the chromatin modifications, and you want to compare the two groups to find the differences in chromatin modifications. This sounds like a straightforward question, but the solution is not. Some people may say, well, this is easy: why not do peak calling for each group separately, then compare the two peak lists using a Venn diagram? Well, this is surely going to be problematic. A treatment-specific peak may not be truly different: the cutoff may happen to fall between the two peak heights. This leads to false positives. On the other hand, a common peak may actually be different, but the cutoff sits below both heights. This leads to false negatives. So we definitely need to treat this problem more carefully.
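A toy example makes the cutoff problem concrete. Everything here is hypothetical (the cutoff, the locus names, and the peak heights); it only mimics how a Venn-diagram comparison of two peak lists would classify each locus.

```python
CUTOFF = 10.0  # assumed peak-calling threshold on normalized read counts

# Hypothetical peak heights: name -> (treatment height, control height)
loci = {
    "locus_A": (10.5, 9.5),   # nearly identical heights that straddle the cutoff
    "locus_B": (40.0, 12.0),  # large difference, but both above the cutoff
}

def venn_call(treatment, control, cutoff=CUTOFF):
    """Classify a locus the way a peak-list Venn diagram would."""
    t_peak = treatment >= cutoff
    c_peak = control >= cutoff
    if t_peak and c_peak:
        return "common"
    if t_peak:
        return "treatment-specific"
    if c_peak:
        return "control-specific"
    return "no peak"

calls = {name: venn_call(t, c) for name, (t, c) in loci.items()}
for name, call in calls.items():
    t, c = loci[name]
    print(f"{name}: fold change {t / c:.2f} -> {call}")
```

Here locus_A is reported as treatment-specific despite an almost unchanged signal (a false positive), while locus_B is reported as common despite a more than threefold change (a false negative).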
  3. In addition to this problem, we also found, from real data, that some of the chromatin modification changes are really subtle. This is an example from the ENCODE project. ChIP-seq was performed on the histone mark H3K4me3 in two cell lines, K562 and embryonic stem cells. Clearly, there is a peak at the TSS of the gene ASUN in both cell lines. There also appear to be two increased sites, one on each side of the peak, which are really subtle. And the site downstream of the TSS seems to overlap with this variant exon. Could this chromatin modification site cause the change in expression of these two isoforms? To answer this kind of question, you really want to cut the whole genome into many small slices and determine the chromatin modification at each slice.
  4. Unfortunately, when I started to work on this kind of problem, there were very few choices available. Back in 2009, there seemed to be only one program, called ChIPDiff, which specifically targeted differential analysis for ChIP-seq. Based on our experience, ChIPDiff tends to generate very few targets. When I used it on our brain data, it often gave me nothing. It was not until 2011 and 2012 that two new tools appeared, DIME and DBChIP, which base their differential analysis on peak lists given by another peak calling program. But this kind of approach has caveats. Using the example I just showed you, you may identify a peak in K562 like this, and another peak in stem cells like this. How can you compare these two peaks? It’s very likely you’ll miss these two differential sites. Finally, people have also tried to use DESeq and edgeR on ChIP-seq data. These two programs are my favorites because they treat statistics seriously. But they were originally designed for RNA-seq. To use them on ChIP-seq, you have to add a lot of pre- and post-processing steps. So they are not convenient to use.
  5. Out of these frustrations, I decided to develop my own program, called diffReps. It is a program package written in Perl. The workflow of diffReps is illustrated here. It goes from background modeling and normalization all the way down to multiple testing correction. It is typically triggered by one command line like this, which does all these things. It uses a sliding window strategy so you won’t miss a thing. By the way, diffReps is developed as an open source project and is hosted on Google Code.
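
The sliding window strategy mentioned in note 5 can be illustrated with a short Python sketch. This is not diffReps' Perl implementation; the function name `window_counts`, the window and step sizes, and the input format are all assumptions made for the example. It assumes the window size is a multiple of the step size.

```python
from collections import Counter

def window_counts(read_positions, window=1000, step=100):
    """Count reads in overlapping windows along one chromosome.

    read_positions: iterable of 0-based read positions (hypothetical input).
    Assumes `window` is a multiple of `step`.
    Returns a dict mapping window start -> reads in [start, start + window).
    """
    counts = Counter()
    for pos in read_positions:
        # Last window start (a multiple of step) that still covers pos.
        last_start = (pos // step) * step
        # First covering start; clamp at 0 so no window begins before the chromosome.
        first_start = max(0, last_start - window + step)
        for start in range(first_start, last_start + 1, step):
            counts[start] += 1
    return dict(counts)

# Two reads, windows of 200 bp moved in 100 bp steps.
print(window_counts([50, 150], window=200, step=100))
```

Because the windows overlap, a read near a window boundary is still counted in a neighbouring window that contains it fully, which is why a sliding scheme is less likely to split a small differential site across bins.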
  6. Across my career, I have heard some people say things like “it doesn’t matter what kind of distribution you use, they are all about the same”. I do not agree with that. One of the most common mistakes people make with sequencing data is that they normalize the read counts and then assume these values are normally distributed. Here I used a ChIP-seq dataset from our brain samples. I then calculated the difference between the means of two groups under different conditions. The dot-dashed line shows you the empirical density, while the red line shows you the Gaussian fit. As you can see, the two distributions are totally different. The empirical data show a sharp peak with a long right-hand tail, while the Gaussian is much broader. In differential analysis, it’s all about the tail behavior of the distribution. At a p-value of 10 to the minus 5, this is where the Gaussian cutoff is, and this is where the empirical cutoff is. Look at how big the difference is between them.
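
The gap between a Gaussian cutoff and an empirical cutoff can be reproduced on synthetic data. The sketch below is only an illustration: the log-normal sample is a stand-in for skewed, normalized bin counts (not the brain data from the talk), the function name `tail_cutoffs` is made up, and it uses p = 1E-3 instead of 1E-5 so that 100,000 samples are enough to read off the empirical quantile.

```python
import random
from statistics import NormalDist, mean, stdev

def tail_cutoffs(values, p):
    """Return (gaussian_cutoff, empirical_cutoff) for an upper-tail p-value p."""
    # Cutoff implied by a Gaussian fitted to the sample mean and standard deviation.
    gaussian = NormalDist(mean(values), stdev(values)).inv_cdf(1.0 - p)
    # Cutoff read directly from the data: the (1 - p) sample quantile.
    ordered = sorted(values)
    index = min(len(ordered) - 1, int((1.0 - p) * len(ordered)))
    return gaussian, ordered[index]

random.seed(0)
# Synthetic stand-in: sharp peak near zero with a long right-hand tail.
sample = [random.lognormvariate(0.0, 1.0) for _ in range(100_000)]
gaussian_cutoff, empirical_cutoff = tail_cutoffs(sample, p=1e-3)
print(f"Gaussian cutoff:  {gaussian_cutoff:.2f}")
print(f"Empirical cutoff: {empirical_cutoff:.2f}")
```

For this particular skewed sample the two cutoffs land far apart, so a threshold derived from the Gaussian fit would call a very different set of values significant than the data themselves support.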
  7. So choosing the right statistical test is extremely important for ChIP-seq differential analysis. In diffReps, we implemented four different tests: negative binomial, t-test, chi-square test and G-test. If you have biological replicates, then the negative binomial test is really what you should use. It models the over-dispersion among the biological replicates and controls false positives. The t-test, on the other hand, really should not be used. I only added it for comparison purposes. If your data do not contain biological replicates, then the chi-square test or G-test can be an excellent choice. The G-test can basically be considered a modification of the chi-square test and is recommended by some statisticians as a replacement for it. On the top right, this group of people from Oregon State has done a very nice comparison between negative binomial and t-test. The conclusion is that the t-test is no good: it is neither sensitive nor specific on sequencing data. But they somehow published this study in a not so prominent journal, so probably most people did not notice this paper. If you are interested in differential analysis, I would suggest you read it. On the bottom right, I also did some comparison between negative binomial and t-test on our own ChIP-seq data. The difference is striking. Negative binomial predicts 20-fold more sites than the t-test. What’s even worse is that fewer than half of the t-test sites overlap with the negative binomial sites. So this really raises a red flag for those who are using the t-test on ChIP-seq or RNA-seq data.
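
For the no-replicate case, the two tests can be written out in a few lines. The chi-square statistic is the sum of (observed - expected)^2 / expected, and the G statistic is 2 times the sum of observed * ln(observed / expected); both are compared against a chi-square distribution. The sketch below is a generic textbook version, not diffReps' code. It assumes a single window with one treatment count and one control count, equal library sizes (so the expected count is the average of the two), and one degree of freedom; the counts are made up.

```python
import math

def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def chi_square_test(observed, expected):
    """Pearson chi-square statistic and p-value (df = 1)."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf_df1(stat)

def g_test(observed, expected):
    """G statistic (likelihood ratio) and p-value (df = 1)."""
    # A zero observed count contributes 0, the limit of o * ln(o / e) as o -> 0.
    stat = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return stat, chi2_sf_df1(stat)

# Hypothetical window: 60 treatment reads vs. 30 control reads.
observed = [60, 30]
expected = [sum(observed) / 2.0] * 2
chi_stat, chi_p = chi_square_test(observed, expected)
g_stat, g_p = g_test(observed, expected)
print(f"chi-square: stat={chi_stat:.3f}, p={chi_p:.3g}")
print(f"G-test:     stat={g_stat:.3f}, p={g_p:.3g}")
```

Neither test sees any within-group variation, which is the reason they suit data without replicates and also the reason they can be over-confident when replicates do vary.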
  8. Besides differential tests, diffReps also includes two additional tools. The first tool is called find hotspots. A hotspot is basically a region where differential sites or peaks occur significantly more often than expected by random chance. In this cartoon, these guys are very close to each other and they form a hotspot, while this guy is being squared. A greedy search algorithm is designed to identify those hotspots. It basically goes from start to end and eats a differential site whenever that improves the score. When a hotspot is found, it is evaluated by a local Poisson model. The second tool is called region analysis. It is a script which accepts any input file as long as the first 3 columns contain genomic coordinates. It assigns each region to genes or heterochromatic regions.
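
The hotspot idea can be sketched as follows. The talk does not give the scoring function of the greedy search, so this example swaps in a plain distance rule (extend a cluster while the next site is within `max_gap`) and then evaluates each cluster with a Poisson tail probability whose rate is estimated from a flanking neighbourhood. The function names, `max_gap`, `flank` and `min_sites` are all hypothetical, and this is not the algorithm diffReps ships.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)      # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)

def group_by_gap(site_positions, max_gap):
    """Greedily extend a cluster while the next site lies within max_gap."""
    clusters = []
    for pos in sorted(site_positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters

def evaluate_hotspots(site_positions, max_gap, flank, min_sites=3):
    """Return (start, end, n_sites, p_value) for each candidate hotspot."""
    sites = sorted(site_positions)
    results = []
    for cluster in group_by_gap(sites, max_gap):
        if len(cluster) < min_sites:
            continue
        start, end = cluster[0], cluster[-1]
        lo, hi = start - flank, end + flank
        # Local background rate (sites per bp); it includes the cluster itself,
        # which makes the estimate conservative.
        local_sites = sum(1 for s in sites if lo <= s <= hi)
        rate = local_sites / (hi - lo + 1)
        expected = rate * (end - start + 1)
        results.append((start, end, len(cluster), poisson_sf(len(cluster), expected)))
    return results

# Four tightly spaced sites and one isolated site on a made-up chromosome.
print(evaluate_hotspots([100, 110, 120, 130, 5000], max_gap=50, flank=10_000))
```

Estimating the rate locally rather than genome-wide keeps a generally busy region from being reported as a hotspot just because the whole neighbourhood is dense.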
  9. So we’ve talked a lot about methodology. Now, let’s put diffReps to the test. This test dataset is from the ENCODE project. ChIP-seq was performed on H3K4me3 in two cell lines: K562 and embryonic stem cells. There are two replicates in each group, and the number of aligned reads ranges from 7 to 16 million. We also created a mock dataset using DNA input samples, and we mixed the replicates between the two cell lines. The reason for doing that is that the DNA input actually contains information about chromatin structure, so we want to remove those biases. By using this mock dataset, we can estimate the empirical false positive rate.
  10. These two figures show you that diffReps predicts many more differential sites than the other approaches at different p-value cutoffs. Although diffReps also produces some differential sites on the mock data, the number decreases rapidly as the p-value cutoff gets stricter. And the empirical false discovery rate is below 0.5% for diffReps. It should also be noted that the G-test is very sensitive and produces many more sites than the negative binomial test. This is not surprising, because the G-test ignores the variation within a group, so it tends to have a higher false positive rate. But the nice thing about the G-test is that its site list includes nearly all of the negative binomial sites. So if false positives are not your major concern, the G-test can be an excellent choice.
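
An empirical false discovery rate of the kind described in notes 9 and 10 can be computed as the number of sites called on the mock data divided by the number called on the real data at the same cutoff. The sketch below assumes that definition and assumes the mock and real datasets are comparable in size; the function name and the p-values are made up for illustration.

```python
def empirical_fdr(real_pvalues, mock_pvalues, cutoff):
    """Mock-data hits divided by real-data hits at a given p-value cutoff.

    Returns None when nothing is called on the real data.
    """
    real_hits = sum(1 for p in real_pvalues if p < cutoff)
    mock_hits = sum(1 for p in mock_pvalues if p < cutoff)
    if real_hits == 0:
        return None
    return mock_hits / real_hits

# Made-up p-values for a handful of windows.
real = [1e-6, 1e-5, 0.2, 1e-7]
mock = [1e-5, 0.5]
for cutoff in (1e-4, 1e-5):
    print(cutoff, empirical_fdr(real, mock, cutoff))
```

Evaluating the ratio at several cutoffs shows how quickly the mock hits fall away relative to the real ones as the threshold tightens.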
  11. Now, at the default p-value cutoff, diffReps produces a differential site list that basically includes those of DESeq and ChIPDiff. There are lots of diffReps-specific sites that do not overlap with the other methods. A natural question is whether those sites are actually biological, not just noise in the data. So we separate the differential sites into specific and overlapped categories, and further classify them by location into promoter and gene body. Then we correlate those sites with RNA-seq data.
  12. The RNA-seq data from the two cell lines were processed using the TopHat-Cufflinks pipeline. This pipeline measures not only gene expression change, but also more complicated things like alternative promoter usage and alternative splicing. We correlated these different categories of events using Fisher’s exact test. When we look at the overlapped category, the sites correlate very well with gene expression changes. They also show some correlation with alternative promoters, but not with alternative splicing. When we look at the diffReps-specific category, the sites also show different kinds of correlation with transcriptional change. So this is very positive: it means a lot of the diffReps-specific sites are likely to be biological. What is interesting here is that those diffReps-specific sites also correlate with alternative splicing. This seems to suggest that a lot of subtle chromatin modifications are missed by other methods, but diffReps can pick them up. So diffReps is a very sensitive method that catches both major and minor changes.
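
The correlation step relies on Fisher’s exact test on a 2x2 table, for example genes with or without a differential site against genes with or without an expression change. The sketch below is a generic one-sided (enrichment) version built on the hypergeometric distribution; the function name and the gene counts are hypothetical, and the talk does not say whether a one-sided or two-sided test was used.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the probability of an overlap at least as large as observed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)
    upper = min(row1, col1)
    numer = sum(comb(row1, x) * comb(total - row1, col1 - x)
                for x in range(a, upper + 1))
    return numer / denom

# Hypothetical counts:
#                      expression change   no change
# differential site           40              60
# no differential site        100             800
print(fisher_exact_greater(40, 60, 100, 800))
```

A small p-value here says that genes carrying a differential site show expression changes more often than the table margins alone would predict.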
  13. To give you some more intuitive, real examples, we created these two figures. In the upper figure, this MICU1 gene has two alternative promoters. The second one is many kb downstream of the first one. The longer TSS has increased expression in the K562 cell line. diffReps found two increased sites at the longer TSS. This is consistent with this histone mark’s role as an activation mark at the TSS. In the lower figure, this FANCI gene has two isoforms. The second isoform contains a variant exon which has increased expression in the K562 cell line. diffReps found an increased site which overlaps with this variant exon. This seems to suggest a positive role for H3K4me3 in this exon’s inclusion.
  14. As you can see, diffReps can be a very useful tool for ChIP-seq analysis. We have used it on literally every ChIP-seq dataset we have. It was used to study morphine-regulated H3K9me2 in mouse brain, a study that was published last year in the Journal of Neuroscience. It was also used in our big cocaine project to study the cocaine-regulated chromatin modifications of 7 different histone marks.
  15. The paper about diffReps is now in production at PLOS ONE and should come online in no time. Recently, I received this email from one of diffReps’ users. This guy from the UK said, and I quote, “great to see…”. Well, I am really flattered. Sometimes, I do feel that it is users like this who keep me motivated to improve my programs and make them even better.
  16. I thought I could be innovative in this section too. These are two heatmaps that show you each person’s role in the two programs. diffReps is kind of a one-man project. I pretty much did everything, and Ningyi helped a lot with testing and results generation. For ngsplot, I developed most of the code. Ningyi also made some contributions. Leo has been helping with testing, documentation and maintaining the Google Code page. He also imported it into Galaxy. Eric Nestler is all about money.