Using combined evidence from replicates to evaluate ChIP-seq peaks.
The analysis of ChIP-seq samples outputs a number of enriched regions, each indicating a protein-DNA interaction or a specific chromatin modification. Enriched regions (commonly known as "peaks") are called when the read distribution is significantly different from the background and its corresponding significance measure (p-value) is below a user-defined threshold.
When replicate samples are analysed, overlapping enriched regions are expected. This repeated evidence can therefore be used to locally lower the minimum significance required to accept a peak. Here, we propose a method for joint analysis of weak peaks.
Given a set of peaks from (biological or technical) replicates, the method combines the p-values of overlapping enriched regions: users can choose a threshold on the combined significance of overlapping peaks and set the minimum number of replicates in which the overlapping peaks must be present. The method allows the "rescue" of weak peaks occurring in more than one replicate and outputs a new set of enriched regions for each replicate.
MSPC: Joint analysis of ChIP-seq replicates
1. Politecnico di Milano
Department of Electronics, Information and Bioengineering
July 20, 2015
Using combined evidence from replicates to evaluate ChIP-seq peaks
Vahid Jalili
Vahid Jalili (vahid.jalili@polimi.it)
Matteo Matteucci (matteo.matteucci@polimi.it)
Marco Masseroli (marco.masseroli@polimi.it)
Marco Morelli (marco.morelli@iit.it)
Website: https://mspc.codeplex.com
2. Politecnico di Milano, Vahid Jalili, July 20, 2015
Motivation
[Figure: tag count along genomic DNA for a ChIP-seq sample (signal vs. background), with a stringent threshold and a permissive threshold; regions labeled True Positive, False Positive, False Negative, and True Negative.]
3. Motivation
Benefit from Replicates
Use replicates to discriminate sub-threshold binding from truly non-bound regions
[Figure: tag count along genomic DNA (signal vs. background) for Replicate 1 and Replicate 2.]
4. Motivation
Benefit from Replicates
5. Method
Notations
$\mathcal{T}^s$: strong threshold
$\mathcal{T}^w$: weak threshold
Strong peak: $p\text{-value} \le \mathcal{T}^s$
Weak peak: $\mathcal{T}^s < p\text{-value} \le \mathcal{T}^w$
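The two thresholds above partition peaks by p-value. A minimal sketch of that partition follows; the function name, the default threshold values, and the "background" label for peaks above the weak threshold are assumptions made for illustration and are not taken from the slides.

```python
def classify_peak(p_value, strong_threshold=1e-8, weak_threshold=1e-4):
    """Label a peak using the strong/weak thresholds from the Notations slide.

    The default threshold values and the "background" label are
    illustrative assumptions, not values given in the talk.
    """
    if not 0.0 <= strong_threshold < weak_threshold <= 1.0:
        raise ValueError("expected 0 <= strong_threshold < weak_threshold <= 1")
    if p_value <= strong_threshold:
        return "strong"      # p-value <= T^s
    if p_value <= weak_threshold:
        return "weak"        # T^s < p-value <= T^w
    return "background"      # p-value > T^w: not considered further


if __name__ == "__main__":
    for p in (1e-12, 1e-6, 1e-2):
        print(p, classify_peak(p))
```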
6. Method
Combining Evidence
How to combine evidence?
Fisher's combined probability test:
$$X^2_{2k} = -2 \sum_{i=1}^{k} \ln p_i$$
$X^2_{2k}$ follows a $\chi^2$ distribution with $2k$ degrees of freedom.
Confirm if $X^2_{2k} \ge \text{threshold}$; discard if $X^2_{2k} < \text{threshold}$.
Alternatives for combining test statistics:
Liptak's method (Liptak, 1958)
Mudholkar and George (Mudholkar & George, 1979)
Wilkinson's method (Wilkinson, 1951)
Truncated product method (Zaykin, Zhivotovsky, Westfall, & Weir, 2002)
…
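As a rough illustration of Fisher's combined probability test, the sketch below computes the statistic and its right-tail probability using only the standard library. It relies on the closed form of the chi-square survival function for an even number of degrees of freedom, and assumes independent p-values in (0, 1]. The function names and the parameter `gamma` (a cutoff on the combined p-value, equivalent to a cutoff on the statistic) are hypothetical.

```python
import math


def fisher_statistic(p_values):
    """X^2 = -2 * sum(ln p_i); each p_i must lie in (0, 1]."""
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be in (0, 1]")
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_dof(x, k):
    """P(X >= x) for a chi-square variable with 2k degrees of freedom.

    For even degrees of freedom the tail has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    half = x / 2.0
    term = total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def combined_p_value(p_values):
    """Right-tail probability of Fisher's statistic for k p-values."""
    return chi2_sf_even_dof(fisher_statistic(p_values), len(p_values))


def confirm(p_values, gamma):
    """Confirm when the combined p-value is at most gamma (hypothetical cutoff)."""
    return combined_p_value(p_values) <= gamma


if __name__ == "__main__":
    print(fisher_statistic([0.05, 0.05]))
    print(combined_p_value([0.05, 0.05]))
```

Because the survival function is decreasing in the statistic, testing the combined p-value against `gamma` gives the same decision as testing the statistic against the corresponding chi-square quantile.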
7. Method
Combining Evidence
Which evidence to combine?
[Figure: peaks across Replicate 1, Replicate 2, Replicate 3, and Replicate 4.]
11. Method
Intersection Determination
The challenge: an optimal method for finding the intersections
Some possible methods and their complexity:
Sorted lists: $O(mn)$
Naïve method: $O(n^m)$
Hashing based: $O\left(\frac{n \log^2 w}{w} + mr\right)$
Interval trees: $O(n \log_2 n)$
where
• $n$: average number of peaks per sample
• $m$: number of samples
• $w$: number of bits in a machine word
• $r$: intersection size
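To make the sorted-lists option concrete, here is a small sketch of a linear sweep over two samples. It assumes closed intervals, that each list is sorted by start, and that peaks within one sample do not overlap each other; the function name is hypothetical, and this is only one way such a sweep could be written.

```python
def overlapping_pairs(sample_a, sample_b):
    """Return all overlapping (a, b) pairs from two sorted peak lists.

    Assumptions (not from the slides): intervals are closed (start, end)
    tuples, each list is sorted by start, and intervals within one list
    are pairwise disjoint.
    """
    pairs = []
    i = j = 0
    while i < len(sample_a) and j < len(sample_b):
        a, b = sample_a[i], sample_b[j]
        if a[0] <= b[1] and b[0] <= a[1]:
            pairs.append((a, b))
        # Advance whichever interval ends first; it cannot overlap
        # anything further along the other list.
        if a[1] <= b[1]:
            i += 1
        else:
            j += 1
    return pairs


if __name__ == "__main__":
    rep1 = [(0, 3), (5, 8), (15, 23)]
    rep2 = [(2, 6), (8, 9), (25, 30)]
    print(overlapping_pairs(rep1, rep2))
```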
12. Method
Intersection Determination: Interval Trees
[Figure: interval tree over positions 0 to 31; each node stores an interval and its data. Intervals shown: [16, 21], [8, 9], [25, 30], [17, 19], [26, 27], [19, 20], [15, 23], [5, 8], [6, 10], [0, 3].]
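The following sketch shows one common way to answer overlap queries with a tree: a balanced binary search tree keyed on interval start and augmented with the maximum end point of each subtree. This is an illustrative variant, not necessarily the structure used in the tool. The class and function names are hypothetical, and the intervals are the ones listed on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # closed interval (start, end)


@dataclass
class Node:
    interval: Interval
    max_end: int                     # largest end point in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(intervals: List[Interval]) -> Optional[Node]:
    """Build a balanced tree from the intervals, ordered by start."""
    def _build(items: List[Interval]) -> Optional[Node]:
        if not items:
            return None
        mid = len(items) // 2
        node = Node(items[mid], items[mid][1])
        node.left = _build(items[:mid])
        node.right = _build(items[mid + 1:])
        for child in (node.left, node.right):
            if child is not None:
                node.max_end = max(node.max_end, child.max_end)
        return node
    return _build(sorted(intervals))


def query(node: Optional[Node], start: int, end: int,
          out: Optional[List[Interval]] = None) -> List[Interval]:
    """Collect every stored interval that overlaps [start, end]."""
    if out is None:
        out = []
    if node is None or node.max_end < start:
        return out                   # nothing here reaches the query
    query(node.left, start, end, out)
    s, e = node.interval
    if s <= end and start <= e:
        out.append(node.interval)
    if s <= end:                     # right subtree starts at or after s
        query(node.right, start, end, out)
    return out


if __name__ == "__main__":
    tree = build([(16, 21), (8, 9), (25, 30), (17, 19), (26, 27),
                  (19, 20), (15, 23), (5, 8), (6, 10), (0, 3)])
    print(query(tree, 7, 9))
```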
13. Method
Algorithm
16. Method
Algorithm … an example
[Figure: Replicates 1 to 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
17. Method
Algorithm … an example
Determine intersecting regions across all samples
[Figure: Replicates 1 to 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region); R1 is shown together with R2 and R3.]
18. Method
Algorithm … an example
Determine intersecting regions across all samples
If multiple regions on one sample are found to intersect, choose the strongest one
[Figure: Replicates 1 to 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region); R1 is shown together with R2 and R3.]
19. Method
Algorithm … an example
Determine intersecting regions across all samples
If multiple regions on one sample are found to intersect, choose the strongest one
Combine test statistics using Fisher's method
[Figure: Replicates 1 to 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
20. Method
Algorithm … an example
Determine intersecting regions across all samples
If multiple regions on one sample are found to intersect, choose the strongest one
Combine test statistics using Fisher's method
$X^2 \ge \text{Threshold}$? NO!
[Figure: Replicates 1 to 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
21. Method
Algorithm … an example
Intermediate Sets
[Figure: intermediate sets for Replicate 1, Replicate 2, and Replicate 3, color-coded as Confirmed Peaks Set or Discarded Peaks Set; regions shown: R1, R2.]
22. Method
Algorithm … an example
Determine intersecting regions across all samples
Since R2 intersects only with R1, and the R1-R2 test has already been performed, no further processing is needed
[Figure: Replicates 1 to 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
23. Method
Algorithm … an example
Determine intersecting regions across all samples
Combine test statistics using Fisher's method
$X^2 \ge \text{Threshold}$? YES!
[Figure: Replicates 1 to 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region); R3 is shown together with R4 and R1.]
24. Method
Algorithm … an example
Intermediate Sets
[Figure: intermediate sets for Replicate 1, Replicate 2, and Replicate 3, color-coded as Confirmed Peaks Set or Discarded Peaks Set; regions shown: R1, R2, R3, R4.]
25. Method
Algorithm … an example
Determine intersecting regions across all samples
Combine test statistics using Fisher's method
$X^2 \ge \text{Threshold}$? YES!
[Figure: Replicates 1 to 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region).]
26. Method
Algorithm … an example
Intermediate Sets
[Figure: intermediate sets for Replicate 1, Replicate 2, and Replicate 3, color-coded as Confirmed Peaks Set or Discarded Peaks Set; regions shown: R1, R2, R3, R4.]
27. Method
Algorithm … an example
Intermediate Sets and Output Sets
[Figure: intermediate sets and output sets for Replicate 1, Replicate 2, and Replicate 3, color-coded as Confirmed Peaks Set, Discarded Peaks Set, or Output Set; regions shown: R1 (twice), R2, R3, R4.]
28. Method
Algorithm … an example
[Figure: Replicates 1 to 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region), shown before and after the analysis.]
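The steps walked through in the example can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the tool's implementation: the `Peak` record, the coordinates and p-values, the cutoff `gamma` on the combined p-value, and the `min_replicates` parameter are all hypothetical. It also assumes that a peak confirmed by at least one test ends up in the output even if another test discarded it. Intersections are found by brute force here rather than with an interval tree.

```python
import math
from collections import namedtuple

# Hypothetical record type; field names are illustrative only.
Peak = namedtuple("Peak", "name replicate start end p_value")


def chi2_sf_even_dof(x, k):
    """Right tail of a chi-square distribution with 2k degrees of freedom."""
    half = x / 2.0
    term = total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def overlaps(a, b):
    return a.start <= b.end and b.start <= a.end


def evaluate(peaks, gamma, min_replicates=2):
    """Return (confirmed, discarded) peak names after combining evidence."""
    confirmed, discarded = set(), set()
    for peak in peaks:
        # Step 1: find intersecting regions across the other samples and,
        # per sample, keep only the strongest (lowest p-value) one.
        strongest = {}
        for other in peaks:
            if other.replicate == peak.replicate or not overlaps(peak, other):
                continue
            best = strongest.get(other.replicate)
            if best is None or other.p_value < best.p_value:
                strongest[other.replicate] = other
        group = [peak] + list(strongest.values())
        # Step 2: combine the p-values with Fisher's method.
        x2 = -2.0 * sum(math.log(p.p_value) for p in group)
        passed = (len(group) >= min_replicates
                  and chi2_sf_even_dof(x2, len(group)) <= gamma)
        # Step 3: file the whole group in an intermediate set.
        (confirmed if passed else discarded).update(p.name for p in group)
    # Assumption: confirmation in any test outweighs a discard.
    return confirmed, discarded - confirmed


if __name__ == "__main__":
    # Made-up coordinates and p-values arranged like the slide example.
    peaks = [
        Peak("R1", 1, 100, 200, 1e-5),
        Peak("R2", 2, 90, 130, 2e-5),
        Peak("R3", 2, 180, 260, 5e-5),
        Peak("R4", 3, 230, 300, 1e-12),
    ]
    confirmed, discarded = evaluate(peaks, gamma=1e-12)
    print(sorted(confirmed), sorted(discarded))
```

With these made-up values, the R1-R2 test fails while the tests starting from R3 and from R4 pass, so R1 is rescued through its overlap with R3 and only R2 remains discarded.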
29. Results
[Figure: bar charts for Myc2_1, Myc2_2, Myc3_1, and Myc3_2 (y-axes up to 1e+05 and up to 30000), comparing input (In) and analysis results (Re) for Set 1, Set 2, and Set 3.]
Abbreviation    File name
Myc2_1          wgEncodeSydhTfbsK562CmycIggrabAlnRep1
Myc2_2          wgEncodeSydhTfbsK562CmycIggrabAlnRep2
Myc3_1          wgEncodeSydhTfbsK562CmycStdAlnRep1
Myc3_2          wgEncodeSydhTfbsK562CmycStdAlnRep2
Category                    Abbreviation    Color implication
Input (source BED file)     In              Strong; Weak
Analysis Results            Re              Strong Confirmed; Weak Confirmed; Weak Discarded
30. Results
Presence of E-box
[Figure legend: motif was enriched in the sequence defined by peaks; motif was NOT enriched in the sequence defined by peaks.]
31. Implementation
Performance
[Figure: "Running Time" plot of time in seconds (0 to 50) against peak count (0 to 45, × 10000) for 2, 4, and 6 replicates.]
Demo
32. Questions
Questions are welcome at: https://mspc.codeplex.com/discussions