Fauteux Seeder Bosc2009
1. Seeder: Perl Modules for Cis-regulatory Motif Discovery
Bioinformatics Open Source Conference
June 28 2009, Stockholm
François Fauteux
Department of Plant Science
McGill University
Macdonald campus
2. Introduction
• Precise control of where, when and at which level transcription occurs
• Synthetic promoter engineering
M. Venter, Trends Plant Sci 12, 118 (2007).
4. DNA Motif Discovery
• Searching for imperfect copies of an unknown pattern
• Sequence-driven approaches: not guaranteed to yield a global optimum
• Enumerative approaches: computationally expensive
• Convergence towards low-complexity motifs
D. GuhaThakurta, Nucleic Acids Res 34, 3585 (2006).
W. W. Wasserman, A. Sandelin, Nat Rev Genet 5, 276 (2004).
5. Seeder Algorithm: Input
• Set B={B1,...,Bm} of background sequences
• Set P={P1,...,Pn} of positive sequences
• Length k of the motif seed
• Length l of the full motif to discover
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
6. Seeder::Background
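• Enumerate all words of length k over the alphabet {A, C, G, T}
• Substring minimal distance (SMD): smallest Hamming distance (HD) between a word w and any |w|-length substring of a sequence s
• SMDs between word w and the background sequences → probability distribution g_w(y)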
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
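The background step can be illustrated with a minimal Python sketch. It is not the Seeder::Background implementation (which is written in Perl); the function names `hamming`, `smd`, `background_distribution` and `all_words` are hypothetical. It assumes that g_w(y) is estimated as the fraction of background sequences whose smallest Hamming distance to w (over all |w|-length substrings) equals y, and that every sequence is at least as long as the word.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def smd(word, seq):
    """Smallest Hamming distance between `word` and any
    len(word)-long substring of `seq` (assumes len(seq) >= len(word))."""
    k = len(word)
    return min(hamming(word, seq[i:i + k]) for i in range(len(seq) - k + 1))


def background_distribution(word, background):
    """Empirical g_w(y) for y = 0..k: fraction of background sequences
    whose SMD to `word` equals y (an assumed estimator)."""
    counts = Counter(smd(word, s) for s in background)
    return [counts[y] / len(background) for y in range(len(word) + 1)]


def all_words(k):
    """Generate every word of length k over A, C, G, T (4**k words)."""
    return ("".join(p) for p in product("ACGT", repeat=k))


if __name__ == "__main__":
    background = ["ACGTACGT", "TTTTTTTT", "ACGAAAAA"]  # toy data
    print(background_distribution("ACG", background))
```

Enumerating all 4^k words and scanning every background sequence for each is the expensive part, which is why this distribution is worth computing once and reusing.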
7. Seeder::Finder
• Sum S(w) of SMDs between w and the positive sequences → p-value
• Closest match to word w* (min. q-value) found in each positive sequence → seed position weight matrix (PWM)
• The seed matrix is extended to the full motif width, and the sites maximizing the score against the extended weight matrix are selected
• A PWM is built from those sites and the process is iterated
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
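A minimal Python sketch of the first two Finder steps follows. It is not the Seeder::Finder code, and all function names are hypothetical. It assumes that the positive sequences are independent, so that under the background model the sum of n minimal distances follows the n-fold convolution of g_w, and that the p-value is the lower-tail probability (a small sum means close matches). The q-value correction, matrix extension and iteration are not covered.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def p_value(g, n, observed_sum):
    """P(sum of n independent draws from g <= observed_sum)."""
    dist = [1.0]  # distribution of an empty sum: always 0
    for _ in range(n):
        dist = convolve(dist, g)
    return sum(dist[:observed_sum + 1])


def closest_match(word, seq):
    """First len(word)-long substring of `seq` with minimal Hamming
    distance to `word` (ties go to the leftmost substring)."""
    k = len(word)
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return min(windows, key=lambda s: hamming(word, s))


def seed_counts(word, positives):
    """Per-position base counts over the closest match in each positive
    sequence: a raw count matrix from which a seed PWM could be derived."""
    sites = [closest_match(word, p) for p in positives]
    return [{b: sum(s[i] == b for s in sites) for b in "ACGT"}
            for i in range(len(word))]


if __name__ == "__main__":
    print(p_value([0.5, 0.5], n=2, observed_sum=0))
    print(seed_counts("ACG", ["TTACGTT", "AAGCC"]))
```

Turning the counts into a weight matrix would additionally require pseudocounts and a background base composition, which the slides do not specify.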
8. Seeder::Index
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
9. Seeder::Index
• List of indices corresponding to words of increasing HD
• Efficient lookup of the minimally distant subsequence
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
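The lookup idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the Seeder::Index data structure: it generates neighbouring words on the fly and stores them as strings, whereas the slides describe a precomputed list of indices, and the names `neighbors_at_distance`, `kmer_set` and `smd_lookup` are hypothetical. Candidate words are visited in order of increasing Hamming distance from the query, so the first candidate present in the sequence gives the minimal distance without scanning the sequence again.

```python
from itertools import combinations, product

BASES = "ACGT"


def neighbors_at_distance(word, d):
    """Yield every word at Hamming distance exactly d from `word`."""
    for positions in combinations(range(len(word)), d):
        # At each chosen position, substitute one of the 3 other bases.
        alternatives = [[b for b in BASES if b != word[i]] for i in positions]
        for subs in product(*alternatives):
            chars = list(word)
            for i, b in zip(positions, subs):
                chars[i] = b
            yield "".join(chars)


def kmer_set(seq, k):
    """Set of all k-long substrings of `seq`, for O(1) membership tests."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def smd_lookup(word, kmers):
    """Return (distance, substring) for the first candidate, in order of
    increasing Hamming distance, that occurs in `kmers`."""
    for d in range(len(word) + 1):
        for candidate in neighbors_at_distance(word, d):
            if candidate in kmers:
                return d, candidate
    raise ValueError("no substring of the required length")


if __name__ == "__main__":
    print(smd_lookup("ACG", kmer_set("TTACGTT", 3)))
```

The search stops as soon as a match is found, so the cost is low when a close match exists and grows only for words that are far from every substring of the sequence.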
11. Benchmark Against Popular Tools
• Binding site sequences from the TRANSFAC database
G. K. Sandve, O. Abul, V. Walseng, F. Drablos, BMC Bioinformatics 8, 193 (2007).
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).