1. XPRIME: A Novel Motif Searching Method
Rachel L. Poulsen
Department of Statistics
Brigham Young University
June 15, 2009
3. Introduction
DNA contains the genetic instructions that uniquely define an
organism
RNA is created to carry genetic instructions from the DNA to
the rest of the cell
The process of DNA “talking” to the rest of the cell is called
transcription
14. De Novo motif searching
Regular expression enumeration
1 Actual count vs. expected count
2 Dictionary-based sequence model (Bussemaker et al. 2000)
PWM updating
1 MEME (Bailey et al. 1995)
2 Gibbs Motif Sampler (GMS) (Lawrence et al. 1993)
3 BioProspector (Liu et al. 2001)
4 AlignACE (Roth et al. 1998)
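To make the "actual count vs. expected count" idea concrete, here is a minimal sketch. It assumes an independent-base background model estimated from the input sequences, which the slides do not specify; the function names are illustrative and do not come from any of the cited tools.

```python
from collections import Counter

def kmer_counts(seqs, w):
    """Count every overlapping word of width w across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - w + 1):
            counts[s[i:i + w]] += 1
    return counts

def expected_count(word, seqs, base_freq):
    """Expected count under an independent-base background (an assumption)."""
    p = 1.0
    for b in word:
        p *= base_freq[b]
    n_windows = sum(max(len(s) - len(word) + 1, 0) for s in seqs)
    return p * n_windows

def over_represented(seqs, w, top=5):
    """Rank observed words by the ratio of actual to expected count."""
    total = sum(len(s) for s in seqs)
    base_freq = {b: sum(s.count(b) for s in seqs) / total for b in "ACGT"}
    obs = kmer_counts(seqs, w)
    ratios = {word: n / expected_count(word, seqs, base_freq)
              for word, n in obs.items()}
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

The sketch expects sequences over the alphabet A, C, G, T only. Words with a high ratio are candidate motifs; a real enumeration method would also attach a significance measure to each ratio.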
15. Known Motif Search
1 GREP
2 Database search with scoring function (Hertz et al. 1990)
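The two known-motif strategies can be sketched as follows. The first function mimics a GREP-style pattern search using Python's `re` module. The second scans with a position weight matrix and a log-odds score, which is an assumed stand-in because the slides do not state which scoring function is used. All names and the threshold are hypothetical.

```python
import math
import re

def grep_motif(seq, pattern):
    """Start positions of all (possibly overlapping) regex matches."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]

def log_odds_score(window, pwm, background):
    """Sum over positions of log(p_motif / p_background); assumed scoring."""
    return sum(math.log(pwm[j][b] / background[b])
               for j, b in enumerate(window))

def scan(seq, pwm, background, threshold):
    """Return (position, score) for every window scoring at least threshold."""
    w = len(pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        s = log_odds_score(seq[i:i + w], pwm, background)
        if s >= threshold:
            hits.append((i, s))
    return hits
```

Here `pwm` is a list with one dictionary per motif position mapping each base to a probability, and `background` is a single such dictionary.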
19. XPRIME: An Improved Method
TRANSFAC (Matys et al. 2003)
Information pulled from in vitro experiments and literature
Most methods justify results using TRANSFAC
XPRIME incorporates prior information
XPRIME can search for both de novo motifs and known motifs
simultaneously
23. Notation and Data
Indices
w: width of motif
L: length of sequence
m: motif indicator
i: position in sequence
j: position in motif
s: indicates sequence
The data, $z_s$
$z_s = (y_{is}, \Delta_{1i}, \Delta_{2i}, \ldots, \Delta_{(m+1)i})$
$y_i$ is the w-mer at position $i$
$\Delta_{mi}$ indicates whether $y_i$ belongs to motif $m$
$\Delta_{(m+1)i}$ indicates whether $y_i$ belongs to the background motif
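As a small illustration of this notation, the sketch below builds the w-mers of one sequence and a record holding a w-mer with its indicator variables. The helper names are hypothetical, and in practice the indicators are unobserved; here one is filled in by hand just to show the layout.

```python
def wmers(seq, w):
    """All overlapping w-mers y_i of a sequence, indexed by start position i."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def make_record(y, component, n_components):
    """Tuple (y_i, Delta_1i, ..., Delta_(m+1)i) with a single indicator set.

    component is a 0-based index; the last index stands for the background.
    """
    delta = [0] * n_components
    delta[component] = 1
    return (y, *delta)

# One motif plus background gives two components.
ys = wmers("ACGTA", 3)
record = make_record(ys[0], component=1, n_components=2)
```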
26. Methods: Complete Data Likelihood
(m+1)-component mixture model
$$L(\theta \mid z) = \prod_{i=1}^{L_s} C(y_i)\,[r_1 f_1(y_i)]^{\Delta_{1i}}\,[r_2 f_2(y_i)]^{\Delta_{2i}} \cdots [r_{m+1} f_{m+1}(y_i)]^{\Delta_{(m+1)i}}$$
f(y) is the Motif Score equation
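The complete-data likelihood can be evaluated on the log scale with a few lines of code. This sketch assumes that the motif score is the product of per-position base probabilities, which the slides do not spell out, and it drops the factor C(y_i) because it does not depend on the parameters. Function and argument names are illustrative.

```python
import math

def motif_score(y, pwm):
    """Assumed motif score: product over positions j of P(base y_j at j)."""
    score = 1.0
    for j, base in enumerate(y):
        score *= pwm[j][base]
    return score

def complete_data_loglik(ys, deltas, r, pwms):
    """Sum over i and components c of Delta_ci * log(r_c * f_c(y_i)).

    ys:     list of w-mers
    deltas: one 0/1 indicator tuple per w-mer, background last
    r:      mixing weights, one per component
    pwms:   one matrix per component (list of per-position dicts)
    """
    total = 0.0
    for y, delta in zip(ys, deltas):
        for c, d in enumerate(delta):
            if d:
                total += math.log(r[c] * motif_score(y, pwms[c]))
    return total
```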
27. Methods: Priors
$f_{m+1}(y)$ is fixed a priori
The $\Delta_{(m+1)i}$'s are missing a priori
$f_1(y), \ldots, f_m(y)$ have product Dirichlet priors such that
$$\pi(f_m(y)) \propto \prod_{j=1}^{w} \prod_{k \in \{A,C,G,T\}} p_{mjk}^{\,a_{p_{mjk}} - 1}$$
$r$ also has a Dirichlet prior
$$\pi(r) \propto \prod_{i=1}^{M} r_i^{\,a_{r_i} - 1}$$
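A short worked step shows why Dirichlet priors are convenient here. This is standard conjugacy rather than something stated on the slides, and the component index $c$ is introduced only to avoid a clash with the sequence position $i$. Multiplying the prior on $r$ by the terms of the complete-data likelihood that involve $r$ gives

```latex
\pi(r \mid \Delta)
  \;\propto\; \prod_{c=1}^{M} r_c^{\sum_{i} \Delta_{ci}}
              \prod_{c=1}^{M} r_c^{\,a_{r_c} - 1}
  \;=\; \prod_{c=1}^{M} r_c^{\,\sum_{i} \Delta_{ci} + a_{r_c} - 1}
```

so the conditional posterior of $r$ is again Dirichlet, with each prior parameter increased by the number of w-mers currently assigned to that component. The same argument applies position by position to the motif probabilities.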
31. Methods: Gibbs Algorithm
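1 Draws the $\Delta$'s from a multinomial distribution
$p_\Delta \propto r_M \, f_M(y)$
2 Draws $r$ from a Dirichlet distribution
$\alpha_r = \sum_{i=1}^{L} \Delta_{Mi} + a_M$
3 Draws $p_{mjk}$ from a Dirichlet distribution
$\alpha_{p_{mjk}} = \sum_{i=1}^{L} \Delta_{mi}\, I(y_{ij} = k) + a_{p_{mjk}}$, for $k \in \{A, C, G, T\}$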
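The three draws can be put together in a compact sampler. This is a minimal sketch under several assumptions that are not from the slides: w-mers are treated as exchangeable, the motif score is a product of per-position probabilities, and the hyperparameters `a_r` and `a_p` are symmetric constants. It is not the XPRIME implementation; an informative prior would replace `a_p` with motif-specific pseudo-counts.

```python
import random

BASES = "ACGT"

def dirichlet(alphas, rng):
    """Draw from a Dirichlet by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def score(y, pwm):
    """Assumed motif score: product of per-position base probabilities."""
    p = 1.0
    for j, b in enumerate(y):
        p *= pwm[j][BASES.index(b)]
    return p

def gibbs(ys, n_motifs, background, a_r=1.0, a_p=1.0, n_iter=200, seed=0):
    rng = random.Random(seed)
    w = len(ys[0])
    M = n_motifs + 1                      # last component is the background
    r = [1.0 / M] * M
    pwms = [[dirichlet([a_p] * 4, rng) for _ in range(w)]
            for _ in range(n_motifs)]
    bg = [list(background)] * w           # fixed a priori, never updated
    labels = [M - 1] * len(ys)
    for _ in range(n_iter):
        comps = pwms + [bg]
        # Step 1: draw each indicator with probability proportional to r_c * f_c(y_i).
        for i, y in enumerate(ys):
            weights = [r[c] * score(y, comps[c]) for c in range(M)]
            labels[i] = rng.choices(range(M), weights=weights)[0]
        # Step 2: draw r from a Dirichlet with parameters counts + prior.
        r = dirichlet([labels.count(c) + a_r for c in range(M)], rng)
        # Step 3: draw each motif column from a Dirichlet with base counts + prior.
        for m in range(n_motifs):
            for j in range(w):
                alpha = [a_p] * 4
                for y, lab in zip(ys, labels):
                    if lab == m:
                        alpha[BASES.index(y[j])] += 1
                pwms[m][j] = dirichlet(alpha, rng)
    return r, pwms, labels
```

A call such as `gibbs(ys, 1, [0.25] * 4)` returns the last draw of the mixing weights, the motif matrices, and the component labels; a fuller analysis would keep the draws from every iteration after burn-in.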
32. An Example: ETS1
We hypothesize that ETS1 has a specific binding site
The Data
1 ETS1 only
2 GABP only
3 ETS1 and GABP
33. ETS1 Binding Motifs
[Figure: ETS1 binding motifs. Panels: (a) ETS1 from TRANSFAC; (b) ETS1 from ETS1 only; (c) ETS1 from GABP only; (d) ETS1 from ETS1/GABP]
40. Conclusions and Future Research
XPRIME successfully searches for de novo and known motifs
Evidence found suggesting ETS1 has its own binding motif
Hidden Markov Models and the forward-backward algorithm
Prior information on r