XPRIME: A Novel Motif Searching Method

             Rachel L. Poulsen

             Department of Statistics
            Brigham Young University


               June 15, 2009
Introduction




      DNA contains the genetic instructions that uniquely define an
      organism
      RNA is created to carry genetic instructions from the DNA to
      the rest of the cell
      The process of DNA “talking” to the rest of the cell is called
      transcription
Transcription

      [Figure: transcription diagram, labeled DNA and RNA]
Position Weight Matrix (PWM) (Hertz et al 1990)

       ETS1 TF binding motif
       Position:     1       2       3     4     5     6       7       8
       A           0.067   0.333   0.0   0.0   1.0   0.533   0.267   0.067
       C           0.933   0.600   0.0   0.0   0.0   0.133   0.067   0.400
       G           0.000   0.000   1.0   1.0   0.0   0.000   0.667   0.000
       T           0.000   0.067   0.0   0.0   0.0   0.333   0.000   0.533
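
A PWM like the one above can be held as a small lookup table. The sketch below is only an illustration: the names `ETS1_PWM`, `column_sums`, and `consensus` are made up here, and the numbers are copied from the slide. It checks that each position is a probability distribution (up to rounding) and reads off the most probable base per position.

```python
# ETS1 PWM values copied from the slide: 8 probabilities per nucleotide.
ETS1_PWM = {
    "A": [0.067, 0.333, 0.0, 0.0, 1.0, 0.533, 0.267, 0.067],
    "C": [0.933, 0.600, 0.0, 0.0, 0.0, 0.133, 0.067, 0.400],
    "G": [0.000, 0.000, 1.0, 1.0, 0.0, 0.000, 0.667, 0.000],
    "T": [0.000, 0.067, 0.0, 0.0, 0.0, 0.333, 0.000, 0.533],
}


def column_sums(pwm):
    """Sum the four base probabilities at each motif position."""
    width = len(pwm["A"])
    return [sum(pwm[base][j] for base in "ACGT") for j in range(width)]


def consensus(pwm):
    """Return the most probable base at each motif position."""
    width = len(pwm["A"])
    return "".join(
        max("ACGT", key=lambda base: pwm[base][j]) for j in range(width)
    )


sums = column_sums(ETS1_PWM)
motif = consensus(ETS1_PWM)
print(sums)
print(motif)
```

The column sums differ from 1 only by the three-decimal rounding used on the slide.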
Sequence Logos




           Figure: DNA binding motif for the ETS1 TF
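
In the usual sequence-logo convention, the height of the letter stack at a position is its information content in bits: 2 minus the Shannon entropy of the column for DNA. The sketch below assumes that convention, without any small-sample correction, and applies it to three columns of the ETS1 PWM. The function name `information_content` is illustrative.

```python
import math


def information_content(column):
    """Bits of information in one PWM column (dict: base -> probability)."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA


# Columns 1, 3, and 6 of the ETS1 PWM from the slide.
columns = {
    1: {"A": 0.067, "C": 0.933, "G": 0.000, "T": 0.000},
    3: {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    6: {"A": 0.533, "C": 0.133, "G": 0.000, "T": 0.333},
}

bits = {pos: information_content(col) for pos, col in columns.items()}
for pos, value in bits.items():
    print(pos, round(value, 3))
```

A fully conserved position such as position 3 reaches the 2-bit maximum, while a mixed position such as position 6 is much shorter in the logo.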
De Novo motif searching




      Regular expression enumeration
        1   Actual count vs. expected count
        2   Dictionary-based sequence model (Bussemaker et al. 2000)
      PWM updating
        1   MEME (Bailey et al 1995)
        2   Gibbs Motif Sampler (GMS) (Lawrence et al 1993)
        3   BioProspector (Liu et al 2001)
        4   AlignACE (Roth et al 1998)
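
The first enumeration idea, comparing how often a word actually occurs with how often it would be expected to occur, can be sketched as follows. This is a minimal illustration under assumptions not stated on the slide: the background is an independent-base model with frequencies estimated from the sequence itself, overlapping occurrences are counted, and the function name `observed_vs_expected` is hypothetical.

```python
from collections import Counter


def observed_vs_expected(sequence, word):
    """Count a word in a sequence and compute its expected count.

    Assumes independent bases with frequencies taken from `sequence`.
    """
    n, k = len(sequence), len(word)
    observed = sum(1 for i in range(n - k + 1) if sequence[i:i + k] == word)

    freqs = Counter(sequence)
    p_word = 1.0
    for base in word:
        p_word *= freqs[base] / n  # probability of this base under the background

    expected = (n - k + 1) * p_word  # number of start positions times P(word)
    return observed, expected


obs, exp = observed_vs_expected("ACGTACGTACGT", "ACG")
print(obs, exp)
```

A word whose observed count is far above its expected count is a candidate motif.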
Known Motif Search




    1   GREP
    2   Database search with scoring function (Hertz et al 1990)
XPRIME: An Improved Method

     TRANSFAC (Matys et al 2003)
         Information pulled from in vitro experiments and literature
         Most methods justify results using TRANSFAC
         XPRIME incorporates prior information
         XPRIME can search for both de novo motifs and known motifs
         simultaneously
Notation and Data

      Indices
          w: width of motif
          L: length of sequence
          m: motif indicator
          i: position in sequence
          j: position in motif
          s: indicates sequence
      The data, $z_s$
          $z_s = (y_{is}, \Delta_{1i}, \Delta_{2i}, \cdots, \Delta_{(m+1)i})$
          $y_i$ represents the w-mer at position $i$
          $\Delta_{mi}$ indicates whether $y_i$ belongs to motif $m$
          $\Delta_{(m+1)i}$ indicates whether $y_i$ belongs to the background motif
The Scoring Function

\[
  \text{MotifScore} = f(y) = \prod_{j=1}^{w} \prod_{i \in \{A,C,G,T\}} p_{ij}^{I(y_j = i)}.
\]
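
A minimal sketch of the scoring function, assuming it is the product over motif positions of the PWM probability of the observed base. The 3-wide PWM below is a made-up toy, not a TRANSFAC entry, and `motif_score` is an illustrative name.

```python
# Hypothetical toy PWM of width 3: TOY_PWM[base][j] is p_ij for base i, position j.
TOY_PWM = {
    "A": [0.50, 0.10, 0.0],
    "C": [0.25, 0.70, 0.0],
    "G": [0.25, 0.10, 1.0],
    "T": [0.00, 0.10, 0.0],
}


def motif_score(pwm, wmer):
    """Multiply the PWM probabilities of the bases observed in the w-mer."""
    score = 1.0
    for j, base in enumerate(wmer):
        score *= pwm[base][j]  # the indicator I(y_j = i) picks out one entry per position
    return score


score_acg = motif_score(TOY_PWM, "ACG")
score_tcg = motif_score(TOY_PWM, "TCG")
print(score_acg, score_tcg)
```

A single zero entry in the PWM drives the score of any w-mer that uses it to zero, which is one reason pseudocounts or priors are often added.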
Methods: Complete Data Likelihood

      (m+1)-component mixture model
\[
  L(\theta \mid z) = \prod_{i=1}^{L_s} C(y_i)\,[r_1 f_1(y_i)]^{\Delta_{1i}} [r_2 f_2(y_i)]^{\Delta_{2i}} \cdots [r_{m+1} f_{m+1}(y_i)]^{\Delta_{(m+1)i}}
\]

      f(y) is the Motif Score equation
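
To make the mixture likelihood concrete, the sketch below evaluates its logarithm for a toy data set with one motif and a background component. It is an illustration under stated assumptions: the factor C(y_i) is treated as the constant 1, each indicator vector is stored as the index of its single nonzero component, and all numbers and names (`complete_log_likelihood`, `MOTIF`, `BACKGROUND`) are hypothetical.

```python
import math

# Hypothetical width-2 motif PWM and a uniform background PWM.
MOTIF = {"A": [0.80, 0.10], "C": [0.10, 0.10], "G": [0.05, 0.70], "T": [0.05, 0.10]}
BACKGROUND = {base: [0.25, 0.25] for base in "ACGT"}


def motif_score(pwm, wmer):
    """Product of PWM probabilities of the observed bases."""
    score = 1.0
    for j, base in enumerate(wmer):
        score *= pwm[base][j]
    return score


def complete_log_likelihood(wmers, labels, components, r):
    """Sum of log(r_k * f_k(y_i)) over w-mers, k being the labeled component.

    labels[i] is the index k with Delta_ki = 1; C(y_i) is assumed to be 1.
    """
    total = 0.0
    for y, k in zip(wmers, labels):
        total += math.log(r[k]) + math.log(motif_score(components[k], y))
    return total


wmers = ["AG", "CT"]
labels = [0, 1]          # "AG" assigned to the motif, "CT" to the background
r = [0.2, 0.8]           # mixing proportions for motif and background
log_lik = complete_log_likelihood(wmers, labels, [MOTIF, BACKGROUND], r)
print(log_lik)
```

Because the indicators are known in the complete-data setting, only one bracketed factor contributes for each w-mer.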
Methods: Priors

      $f_{m+1}(y)$ is fixed a priori
      $\Delta_{(m+1)i}$'s are missing a priori
      $f_1(y), \cdots, f_m(y)$ have product Dirichlet priors such that
\[
  \pi(f_m(y)) \propto \prod_{j=1}^{L} \prod_{k \in \{A,C,G,T\}} p_{mjk}^{a_{p_{mij}} - 1}
\]
      $r$ also has a Dirichlet prior
\[
  \pi(r) \propto \prod_{i=1}^{M} r_i^{a_{r_i} - 1}
\]
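
As a hedged illustration, the kernel of a product Dirichlet prior can be evaluated on the log scale, one Dirichlet per motif position. The hyperparameter values below are invented for the example and are not taken from TRANSFAC or from XPRIME; the function names are also illustrative.

```python
import math


def log_dirichlet_kernel(probs, alphas):
    """Unnormalized log Dirichlet density: sum of (a - 1) * log(p).

    Assumes p > 0 wherever a != 1.
    """
    total = 0.0
    for p, a in zip(probs, alphas):
        if a != 1.0:  # the term is 0 when a == 1, even if p == 0
            total += (a - 1.0) * math.log(p)
    return total


def log_prior_motif(pwm_columns, alpha_columns):
    """Sum the per-position kernels (the product prior on the log scale)."""
    return sum(
        log_dirichlet_kernel(p, a) for p, a in zip(pwm_columns, alpha_columns)
    )


# Two motif positions, probabilities ordered A, C, G, T.
pwm_columns = [[0.5, 0.25, 0.125, 0.125], [0.1, 0.1, 0.7, 0.1]]
flat = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]       # uninformative
informed = [[2.0, 1.0, 1.0, 1.0], [1.0, 1.0, 3.0, 1.0]]   # favors A, then G

log_flat = log_prior_motif(pwm_columns, flat)
log_informed = log_prior_motif(pwm_columns, informed)
print(log_flat, log_informed)
```

Hyperparameters above 1 for a base pull that position toward the base, which is one way prior knowledge about a motif could enter the model.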
Methods: Gibbs Algorithm

    1   Draws $\Delta$'s from a multinomial distribution
            $p_\Delta \propto r_M \cdot f_M(y)$
    2   Draws $r$ from a Dirichlet distribution
            $\alpha_r = \sum_{i=1}^{L} \Delta_{Mi} + a_M$
    3   Draws $p_{mij}$ from a Dirichlet distribution
            $\alpha_{p_{mij}} = \sum_{i=1}^{L} \sum_{k \in \{A,C,G,T\}} \Delta_{mi}\, I(y_{ij} = k) + a_{p_{mij}}$
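
The three draws above can be sketched as one sweep of a Gibbs sampler. This is not the XPRIME implementation, only a minimal standard-library illustration under assumptions: the background PWM is held fixed, the Dirichlet parameter for each motif position is the count of each base among w-mers currently assigned to that motif plus its prior value, and every name and number is hypothetical.

```python
import random

BASES = "ACGT"


def motif_score(pwm, wmer):
    """Product of PWM probabilities of the observed bases."""
    score = 1.0
    for j, base in enumerate(wmer):
        score *= pwm[base][j]
    return score


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def gibbs_sweep(wmers, pwms, background, r, a_r, a_p, rng):
    """One sweep: indicators, then mixing proportions, then motif PWMs."""
    m, w = len(pwms), len(wmers[0])
    # Step 1: draw each indicator with weights r_k * f_k(y).
    labels = []
    for y in wmers:
        weights = [r[k] * motif_score(pwms[k], y) for k in range(m)]
        weights.append(r[m] * motif_score(background, y))
        labels.append(rng.choices(range(m + 1), weights=weights)[0])
    # Step 2: draw r from Dirichlet(assignment counts + prior).
    new_r = sample_dirichlet([labels.count(k) + a_r[k] for k in range(m + 1)], rng)
    # Step 3: draw each motif column from Dirichlet(base counts + prior).
    new_pwms = []
    for k in range(m):
        pwm = {b: [0.0] * w for b in BASES}
        for j in range(w):
            alphas = [
                sum(1 for y, lab in zip(wmers, labels) if lab == k and y[j] == b)
                + a_p[k][b][j]
                for b in BASES
            ]
            for b, p in zip(BASES, sample_dirichlet(alphas, rng)):
                pwm[b][j] = p
        new_pwms.append(pwm)
    return labels, new_r, new_pwms


rng = random.Random(0)
wmers = ["ACG", "ACG", "TTT", "ACG", "GGA"]
uniform = {b: [0.25] * 3 for b in BASES}
flat_prior = [{b: [1.0] * 3 for b in BASES}]  # one motif, flat Dirichlet prior
labels, r, pwms = gibbs_sweep(
    wmers, [uniform], uniform, [0.5, 0.5], [1.0, 1.0], flat_prior, rng
)
print(labels, r)
```

In practice the sweep would be repeated many times, with an informative prior in `a_p` standing in for a known motif and a flat prior for a de novo one.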
An Example: ETS1




     We hypothesize that ETS1 has a specific binding site

     The Data
       1   ETS1 only
       2   GABP only
       3   ETS1 and GABP
ETS1 Binding Motifs




       (a) ETS1 from TRANSFAC     (b) ETS1 from ETS1 only




       (c) ETS1 from GABP only   (d) ETS1 from ETS1/GABP
Justification of Prior Information


       Pete Hollenhorst sequence logo
Justification of Prior Information


             Figure: Motif found without prior specification




              Figure: Motif found with prior specification
Conclusions and Future Research




      XPRIME successfully searches for de novo and known motifs
      Evidence found suggesting ETS1 has its own binding motif
      Hidden Markov Models and the forward-backward algorithm
      Prior information on r
