Short introduction to Bioinformatics
             What are the Probabilistic Models?
                            Sequence Alignment
                             Pairwise Alignment
            Multiple Sequence Alignment Models
                         What is Phylogenetics?
                     Building Phylogenetic Trees
                                   Other Models
                                    Contact Us




Introduction to Probabilistic Models for Bioinformatics

              Igor Bogicevic (igor.bogicevic@sbgenomics.com)




                                          July 3, 2011




                                                                                                         SEVEN BRIDGES
                                                                                                             GENOMICS, LLC




  Igor Bogicevic (igor.bogicevic@sbgenomics.com)   Introduction to Probabilistic Models for Bioinformatics
Short introduction to Bioinformatics




       Bioinformatics is the application of statistics and computer science to the field of
       molecular biology.
       Major research efforts in the field include sequence alignment, gene finding,
       genome assembly, drug design, drug discovery, protein structure alignment,
       protein structure prediction, prediction of gene expression and protein-protein
       interactions, genome-wide association studies and the modeling of evolution.
       Today, given the enormous volume of sequencing data, one of the biggest
       challenges is no longer producing the data but understanding it.
What are the Probabilistic Models?

       There are two basic definitions:
       A statistical analysis tool that estimates, from past (historical) data, the
       probability of an event occurring again.
       A system that simulates the object under consideration and produces different
       outcomes with different probabilities.
       A simple example is rolling a fair die: each of the six faces comes up with
       probability 1/6.
       A more relevant example is the random sequence model for DNA.
       Biological sequences are strings over a finite alphabet of residues, most
       commonly either the four nucleotides or the twenty amino acids.
       Assume that each residue a occurs independently with probability q_a. If a protein
       or DNA sequence is denoted x_1 ... x_n, then the probability of the whole sequence is:

       $$q_{x_1} q_{x_2} \cdots q_{x_n} = \prod_{i=1}^{n} q_{x_i}$$
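       The random sequence model fits in a few lines of Python. This is an illustration,
       not a reference implementation. The function name `sequence_log_prob` and the
       background frequencies in `q_dna` are hypothetical choices; in practice each q_a
       would be estimated from data. The sketch sums logarithms instead of multiplying
       probabilities, so that long sequences do not underflow to zero.

```python
import math


def sequence_log_prob(seq: str, q: dict) -> float:
    """Log-probability of seq under the random (independent residue) model.

    Because residues are assumed independent, P(x) = prod_i q[x_i], so
    log P(x) = sum_i log q[x_i]. Working in log space avoids underflow.
    """
    return sum(math.log(q[residue]) for residue in seq)


# Hypothetical background frequencies for DNA (they must sum to 1).
q_dna = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

log_p = sequence_log_prob("ACGT", q_dna)
print(math.exp(log_p))  # approximately 0.0036 = 0.3 * 0.2 * 0.2 * 0.3
```

       Comparing the log-probabilities of the same sequence under two such models is
       the usual next step when testing whether a sequence looks "random".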
Sequence Alignment




       Sequence alignment is a way of arranging the sequences of DNA, RNA, or protein
       to identify regions of similarity that may be a consequence of functional,
       structural, or evolutionary relationships between the sequences.
       A variety of computational algorithms have been applied to the sequence
       alignment problem, such as dynamic programming, heuristic algorithms, and
       probabilistic methods.
       Common file formats for sequences are FASTA and GenBank; alignments are often
       stored as FASTA with gap characters.
Pairwise Alignment


       Pairwise sequence alignment methods are used to find the best-matching
       piecewise (local) or global alignments of two query sequences.
       The three primary methods of producing pairwise alignments are dot-matrix
       methods, dynamic programming, and word methods.
       Needleman-Wunsch algorithm (Global Alignment)
       Smith-Waterman algorithm (Local Alignment)
       FASTA/BLAST algorithms (k-tuple heuristic methods, often combined with
       dynamic programming)
       Gap penalties - modeling the cost of a gap in aligned sequences (linear, affine,
       etc.)
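       The Needleman-Wunsch recurrence differs from the local Smith-Waterman one in
       this section in three places. It has no 0 floor, its borders are initialized with
       gap penalties, and its answer is read from the bottom-right cell. The Python sketch
       below shows this. It assumes a linear gap penalty and the illustrative scores from
       the Smith-Waterman example (match +2, mismatch and gap -1). The function name is
       our own, and the sketch returns only the score, with no traceback.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 2,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b with a linear gap penalty."""
    m, n = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j] end to end
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, n + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + w,  # match/mismatch
                          F[i - 1][j] + gap,    # a[i-1] against a gap
                          F[i][j - 1] + gap)    # gap against b[j-1]
    # Global alignment: both sequences must be consumed completely
    return F[m][n]


print(needleman_wunsch_score("ACACACTA", "AGCACACA"))  # 12
```

       With an affine penalty, a gap of length k is penalized by d + (k - 1)e, where d is
       the opening cost and e the extension cost. Computing that efficiently needs extra
       dynamic programming matrices (Gotoh's method), which this sketch omits.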
Example - Smith-Waterman: for sequences a = a_1 ... a_m and b = b_1 ... b_n, a matrix H is built as follows:

$$H(i,0) = 0, \quad 0 \le i \le m$$
$$H(0,j) = 0, \quad 0 \le j \le n$$

$$w(a_i, b_j) = \begin{cases} w(\text{match}) & \text{if } a_i = b_j \\ w(\text{mismatch}) & \text{if } a_i \ne b_j \end{cases}$$

$$H(i,j) = \max \begin{cases} 0 \\ H(i-1,j-1) + w(a_i, b_j) & \text{(match/mismatch)} \\ H(i-1,j) + w(a_i, -) & \text{(deletion)} \\ H(i,j-1) + w(-, b_j) & \text{(insertion)} \end{cases} \qquad 1 \le i \le m,\ 1 \le j \le n$$
Sequence 1 = ACACACTA, Sequence 2 = AGCACACA
w(match) = +2
w(a,-) = w(-,b) = w(mismatch) = -1

             -    A    C    A    C    A    C    T    A
        -    0    0    0    0    0    0    0    0    0
        A    0    2    1    2    1    2    1    0    2
        G    0    1    1    1    1    1    1    0    1
        C    0    0    3    2    3    2    3    2    1
H =     A    0    2    2    5    4    5    4    3    4
        C    0    1    4    4    7    6    7    6    5
        A    0    2    3    6    6    9    8    7    8
        C    0    1    4    5    8    8   11   10    9
        A    0    2    3    6    7   10   10   10   12

In the example, the highest value (12) is in cell (8,8). Here (i, j) indexes the row
(Sequence 2) and the column (Sequence 1). The traceback visits (8,8), (7,7), (7,6),
(6,5), (5,4), (4,3), (3,2), (2,1), (1,1), and (0,0), which gives the local alignment:
Sequence 1 = A-CACACTA
Sequence 2 = AGCACAC-A
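The fill-and-traceback procedure can be sketched in Python as follows. This is a minimal
illustration that uses the scores from the example (match +2, mismatch and gaps -1). It
is not a production implementation. The function name `smith_waterman` and its return
tuple are our own choices. When several predecessors tie during traceback, the sketch
arbitrarily prefers the diagonal move, then the deletion, then the insertion. Other
tie-breaking rules can yield different but equally scored alignments.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Local alignment of a (rows, index i) against b (columns, index j).

    Returns (best_score, aligned_a, aligned_b, H).
    """
    m, n = len(a), len(b)
    # Row 0 and column 0 stay 0: H(i,0) = H(0,j) = 0
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + w,  # match/mismatch
                H[i - 1][j] + gap,    # deletion: a[i-1] against a gap
                H[i][j - 1] + gap,    # insertion: gap against b[j-1]
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Walk back from the highest cell until a cell with value 0 is reached
    i, j = best_pos
    out_a, out_b = [], []
    while H[i][j] > 0:
        w = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + w:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b)), H


# Sequence 2 indexes the rows and Sequence 1 the columns, as in the matrix above
score, row_aln, col_aln, H = smith_waterman("AGCACACA", "ACACACTA")
print(score)    # 12
print(col_aln)  # A-CACACTA  (Sequence 1)
print(row_aln)  # AGCACAC-A  (Sequence 2)
```

On the example pair it reproduces the matrix H, the score 12 and the alignment shown above.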
Multiple Sequence Alignment Models



       A multiple sequence alignment (MSA) is a sequence alignment of three or more
       biological sequences, commonly protein, DNA, or RNA.
       We usually want to do multiple alignments to find a homologous sequences that
       point to a shared evolutionary origins that can be used for further phylogenetic
       analysis.
       Progressive Alignment Methods - constructing succession of a pairwise alignment.




                                                                                                                    EVEN BRIDGES
                                                                                                                        G E N O M I C S, LLC




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   Introduction to Probabilistic Models for Bioinformatics
Short introduction to Bioinformatics
                        What are the Probabilistic Models?
                                       Sequence Alignment
                                        Pairwise Alignment
                       Multiple Sequence Alignment Models
                                    What is Phylogenetics?
                                Building Phylogenetic Trees
                                              Other Models
                                               Conctact Us


Multiple Sequence Alignment Models



       A multiple sequence alignment (MSA) is an alignment of three or more
       biological sequences, commonly protein, DNA, or RNA.
       Multiple alignments are usually computed to identify homologous sequences,
       i.e. sequences that share a common evolutionary origin; the resulting MSA
       can then be used for further phylogenetic analysis.
       Progressive alignment methods build the MSA from a succession of pairwise
       alignments, starting with the most similar sequences.
       Hidden Markov models represent the MSA as a directed acyclic graph (DAG):
       the observed states are the individual alignment columns, and the hidden
       states represent the presumed ancestral sequence.
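The progressive idea can be sketched as follows, under loudly stated assumptions: the scoring (match +1, mismatch -1, residue against gap -2, gap against gap 0) is illustrative, and sequences are added in the given input order instead of following a guide tree as tools such as ClustalW do. Each new sequence is globally aligned (Needleman-Wunsch style) to the current alignment, scoring a residue against a column by the sum of its pair scores with every entry of that column. All function names are hypothetical.

```python
GAP = "-"
MATCH, MISMATCH, GAP_SCORE = 1, -1, -2   # assumed scoring scheme

def pair_score(a, b):
    """Score of two aligned symbols; a gap aligned to a gap costs nothing."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return GAP_SCORE
    return MATCH if a == b else MISMATCH

def column_score(column, symbol):
    """Sum-of-pairs score of adding `symbol` to an existing alignment column."""
    return sum(pair_score(c, symbol) for c in column)

def align_to_profile(msa, seq):
    """Globally align one sequence to an existing MSA (list of equal-length rows)."""
    columns = ["".join(row[k] for row in msa) for k in range(len(msa[0]))]
    n, m = len(columns), len(seq)
    gap_col = GAP * len(msa)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + column_score(columns[i - 1], GAP)
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + column_score(gap_col, seq[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + column_score(columns[i - 1], seq[j - 1]),
                          S[i - 1][j] + column_score(columns[i - 1], GAP),
                          S[i][j - 1] + column_score(gap_col, seq[j - 1]))
    # Traceback: rebuild the columns of the enlarged alignment from the end.
    out_cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + column_score(columns[i - 1], seq[j - 1]):
            out_cols.append(columns[i - 1] + seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + column_score(columns[i - 1], GAP):
            out_cols.append(columns[i - 1] + GAP)
            i -= 1
        else:
            out_cols.append(gap_col + seq[j - 1])
            j -= 1
    out_cols.reverse()
    return ["".join(col[r] for col in out_cols) for r in range(len(msa) + 1)]

def progressive_msa(seqs):
    """Add sequences one at a time; gaps inserted earlier are never removed."""
    msa = [seqs[0]]
    for s in seqs[1:]:
        msa = align_to_profile(msa, s)
    return msa


print(progressive_msa(["ACGT", "ACT"]))   # ['ACGT', 'AC-T']
```

The "once a gap, always a gap" behavior of `progressive_msa` is the main weakness of progressive methods: an error made in an early pairwise step is propagated to the final alignment.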




What is Phylogenetics?



       Phylogenetics is the study of evolutionary relatedness among groups of organisms
       (e.g. species, populations), inferred from molecular sequencing data and
       morphological data matrices.
       Evolution is regarded as a branching process, whereby populations are altered
       over time and may speciate into separate branches, hybridize together, or
       terminate by extinction. This process can be visualized as a phylogenetic tree.
       Ernst Haeckel's recapitulation theory ("ontogeny recapitulates phylogeny") is a
       now largely discredited hypothesis that, in developing from embryo to adult,
       animals go through stages resembling or representing successive stages in the
       evolution of their remote ancestors.



Building Phylogenetic Trees


       Phylogenetic trees for more than a few input sequences are constructed using
       computational phylogenetics methods.
       A common approach is maximum-likelihood or Bayesian inference, which applies
       an explicit model of sequence evolution to tree estimation.
       Finding the optimal tree is NP-hard for many of these criteria, so heuristic
       search and optimization methods are combined with tree-scoring functions to
       find a reasonably good tree that fits the data.
       The resulting trees do not necessarily represent the true evolutionary history
       of the species: the underlying data are noisy, and the analysis can be
       confounded by horizontal gene transfer, hybridization between species that were
       not nearest neighbors on the tree before hybridization took place, convergent
       evolution, and conserved sequences.
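To make "explicit model of evolution" and "tree-scoring function" concrete, here is a minimal sketch of one such scoring function: the log-likelihood of a fixed tree under the Jukes-Cantor (JC69) substitution model, computed with Felsenstein's pruning algorithm. The tree encoding (nested lists of (child, branch length) pairs, leaves given as taxon names), the uniform base frequencies, the toy alignment, and all function names are assumptions for illustration only.

```python
import math

BASES = "ACGT"

def jc69(x, y, t):
    """Jukes-Cantor probability that base x becomes base y along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def partials(node, site):
    """Felsenstein pruning: P(observed leaves below node | node has base x), for each x."""
    if isinstance(node, str):                     # leaf: a taxon name
        return {x: 1.0 if site[node] == x else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:                         # internal node: list of (child, branch length)
        child_p = partials(child, site)
        for x in BASES:
            result[x] *= sum(jc69(x, y, t) * child_p[y] for y in BASES)
    return result

def log_likelihood(tree, alignment):
    """Tree score: sum of per-site log-likelihoods (sites assumed independent)."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(length):
        site = {name: seq[k] for name, seq in alignment.items()}
        root = partials(tree, site)
        total += math.log(sum(0.25 * root[x] for x in BASES))   # uniform base frequencies
    return total


# Toy data: two candidate topologies with identical branch lengths.
alignment = {"a": "ACGTTA", "b": "ACGTTG", "c": "ACATCA", "d": "GCATCA"}
t1 = [([("a", 0.1), ("b", 0.1)], 0.1), ([("c", 0.1), ("d", 0.1)], 0.1)]   # ((a,b),(c,d))
t2 = [([("a", 0.1), ("c", 0.1)], 0.1), ([("b", 0.1), ("d", 0.1)], 0.1)]   # ((a,c),(b,d))
print(log_likelihood(t1, alignment), log_likelihood(t2, alignment))
```

A heuristic search would call a function like `log_likelihood` on many candidate topologies and branch lengths and keep the best-scoring tree. With this toy data, ((a,b),(c,d)) scores higher, because columns 3 and 5 both support the a,b | c,d split.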


Other Models




       Transformational grammars (Chomsky hierarchy)
       RNA structure analysis models (for structural RNAs, evolution tends to conserve
       the base-pairing interactions rather than the sequence itself)
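A classical starting point for RNA structure analysis is the Nussinov algorithm, which scores the pairing interactions rather than the sequence by finding the maximum number of nested base pairs. The sketch below is an illustration, not a probabilistic model; stochastic context-free grammars, the probabilistic counterpart linked to the transformational grammars above, use a similar dynamic-programming recursion. The allowed pairs (Watson-Crick plus G-U wobble), the minimum hairpin loop of 3, and the function name are assumptions, and only the score is returned (no traceback).

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence (Nussinov recursion).

    N[i][j] is the best score for seq[i..j]; a pair (i, j) needs at least
    `min_loop` unpaired bases between its two ends.
    """
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):                  # span = j - i
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])         # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, N[i + 1][j - 1] + 1)    # i pairs with j
            for k in range(i + 1, j):                    # bifurcation into two substructures
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]


print(nussinov("GGGAAACCC"))   # 3
```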




Contact Us




       We are Hiring!




Introduction to Probabilistic Models for Bioinformatics

  • 1. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us Introduction to Probabilistic Models for Bioinformatics Igor Bogicevic (igor.bogicevic@sbgenomics.com) July 3, 2011 EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • Short introduction to Bioinformatics. Bioinformatics is the application of statistics and computer science to the field of molecular biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies, and the modeling of evolution. Given the enormous volumes of sequence data now being produced, one of the biggest challenges is no longer generating the data but understanding it.
  • What are the Probabilistic Models? There are two basic definitions: (1) a statistical analysis tool that estimates, on the basis of past (historical) data, the probability of an event occurring again; (2) a system that simulates the object under consideration and produces different outcomes with different probabilities. A simple example is rolling a die. A more relevant example is the random sequence model for DNA. Biological sequences are strings from a finite alphabet of residues, most commonly either the four nucleotides or the twenty amino acids. Suppose that residue a occurs with probability $q_a$, independently at each position. If a protein or DNA sequence is denoted $x_1 \ldots x_n$, the probability of the whole sequence is:
    $$P(x_1 \ldots x_n) = q_{x_1} q_{x_2} \cdots q_{x_n} = \prod_{i=1}^{n} q_{x_i}$$
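The random sequence model above can be sketched in Python as follows. This is a minimal illustration, assuming independent residues and a hypothetical uniform distribution $q_a = 0.25$ for DNA (the slides give no concrete values); the names `Q_DNA`, `sequence_prob`, and `sequence_log_prob` are made up for the example. Working with log-probabilities avoids numerical underflow, since the plain product becomes vanishingly small for long sequences.

```python
import math

# Hypothetical residue probabilities q_a (uniform DNA background, illustration only).
Q_DNA = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def sequence_prob(seq: str, q: dict) -> float:
    """P(x) = q_{x_1} * q_{x_2} * ... * q_{x_n}, assuming independent residues."""
    return math.prod(q[residue] for residue in seq)


def sequence_log_prob(seq: str, q: dict) -> float:
    """log P(x) = sum_i log q_{x_i}; numerically safe for long sequences."""
    return sum(math.log(q[residue]) for residue in seq)


if __name__ == "__main__":
    x = "ACGT"
    print(sequence_prob(x, Q_DNA))                # 0.00390625, i.e. 0.25 ** 4
    print(math.exp(sequence_log_prob(x, Q_DNA)))  # the same value up to rounding
    # A 10,000-residue sequence underflows to 0.0 in the product form,
    # but its log-probability is still representable.
    long_x = "ACGT" * 2500
    print(sequence_prob(long_x, Q_DNA), sequence_log_prob(long_x, Q_DNA))
```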
  • Sequence Alignment. Sequence alignment is a way of arranging DNA, RNA, or protein sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. A variety of computational approaches have been applied to the sequence alignment problem, e.g. dynamic programming, heuristic algorithms, and probabilistic methods. Common formats for representing alignments include FASTA and GenBank.
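Reading an alignment stored in FASTA format takes only a few lines. The sketch below is a minimal parser under stated assumptions: the helper names `read_fasta` and `alignment_length` are hypothetical, residue characters are not validated, duplicate headers silently overwrite each other, and GenBank is not handled. In aligned FASTA, gaps are written as `-` and every row must have the same length, which the second function checks. Real projects would normally use an established parser such as Biopython's `Bio.SeqIO` or `Bio.AlignIO` instead.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def alignment_length(records: dict) -> int:
    """In aligned FASTA every row (gaps '-' included) must have the same length."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    return lengths.pop()


if __name__ == "__main__":
    aligned = """>seq1
A-CACACTA
>seq2
AGCACAC-A
"""
    recs = read_fasta(aligned)
    print(recs)                    # {'seq1': 'A-CACACTA', 'seq2': 'AGCACAC-A'}
    print(alignment_length(recs))  # 9
```

The toy rows are the alignment from the Smith-Waterman example later in the deck; the headers `seq1` and `seq2` are invented.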
  • Pairwise Alignment. Pairwise sequence alignment methods are used to find the best-matching piecewise (local) or global alignment of two query sequences. The three primary methods of producing pairwise alignments are dot-matrix methods, dynamic programming, and word methods. Key algorithms are the Needleman-Wunsch algorithm (global alignment), the Smith-Waterman algorithm (local alignment), and the FASTA/BLAST algorithms (k-tuple heuristic word methods, often combined with dynamic programming). Gap penalties model the cost of a gap in the matched sequences (linear, affine, etc.).
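For comparison with the local Smith-Waterman example, here is a minimal Needleman-Wunsch (global alignment) sketch with a linear gap penalty. The default scores (match +2, mismatch -1, gap -1) are assumptions chosen to match the Smith-Waterman example in this deck. Unlike Smith-Waterman, the borders are initialised with cumulative gap costs, there is no 0 floor, and the traceback always runs from the bottom-right corner to (0, 0).

```python
def needleman_wunsch(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    m, n = len(a), len(b)
    # F[i][j]: best score of a global alignment of a[:i] and b[:j].
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, n + 1):
        F[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trace back from the bottom-right corner; diagonal moves are preferred on ties.
    i, j = m, n
    top, bottom = [], []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("AC", "A"))  # (1, 'AC', 'A-')
```

An affine gap penalty (separate gap-open and gap-extension costs) is usually handled with Gotoh's three-matrix variant, which this sketch does not implement.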
  • Example: Smith-Waterman. A matrix H is built as follows:
    $$H(i,0) = 0,\ 0 \le i \le m; \qquad H(0,j) = 0,\ 0 \le j \le n$$
    $$w(a_i, b_j) = \begin{cases} w(\text{match}) & \text{if } a_i = b_j \\ w(\text{mismatch}) & \text{if } a_i \ne b_j \end{cases}$$
    $$H(i,j) = \max \begin{cases} 0 \\ H(i-1,j-1) + w(a_i,b_j) & \text{match/mismatch} \\ H(i-1,j) + w(a_i,-) & \text{deletion} \\ H(i,j-1) + w(-,b_j) & \text{insertion} \end{cases} \qquad 1 \le i \le m,\ 1 \le j \le n$$
  • Example: Sequence 1 = ACACACTA, Sequence 2 = AGCACACA, with w(match) = +2 and w(a,-) = w(-,b) = w(mismatch) = -1. Sequence 1 labels the columns of H and Sequence 2 labels the rows:
    $$H = \begin{array}{c|ccccccccc}
        & - & A & C & A & C & A & C & T & A \\ \hline
      - & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
      A & 0 & 2 & 1 & 2 & 1 & 2 & 1 & 0 & 2 \\
      G & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
      C & 0 & 0 & 3 & 2 & 3 & 2 & 3 & 2 & 1 \\
      A & 0 & 2 & 2 & 5 & 4 & 5 & 4 & 3 & 4 \\
      C & 0 & 1 & 4 & 4 & 7 & 6 & 7 & 6 & 5 \\
      A & 0 & 2 & 3 & 6 & 6 & 9 & 8 & 7 & 8 \\
      C & 0 & 1 & 4 & 5 & 8 & 8 & 11 & 10 & 9 \\
      A & 0 & 2 & 3 & 6 & 7 & 10 & 10 & 10 & 12
    \end{array}$$
    The highest value, 12, is in cell (8,8), written as (row i, column j). The walk back visits (8,8), (7,7), (7,6), (6,5), (5,4), (4,3), (3,2), (2,1), (1,1), and (0,0), giving the local alignment Sequence 1 = A-CACACTA, Sequence 2 = AGCACAC-A.
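The worked example can be reproduced with a short Python sketch of Smith-Waterman using the slide's scores (match +2, mismatch and gaps -1). Following the recurrence, the first argument `a` indexes the rows and the second argument `b` the columns, so Sequence 2 is passed first. When a cell can be reached from several neighbours, this sketch prefers the diagonal, then up, then left; that tie-breaking rule is an assumption, and other choices can yield different but equally scoring alignments. The traceback stops at a cell with value 0 or at the matrix border.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Local alignment of a (rows) and b (columns) with a linear gap penalty."""
    m, n = len(a), len(b)
    # H(i, 0) = H(0, j) = 0: the first row and column stay zero.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + w,  # match/mismatch
                H[i - 1][j] + gap,    # deletion: a[i-1] against a gap
                H[i][j - 1] + gap,    # insertion: gap against b[j-1]
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Walk back from the highest-scoring cell until a 0 cell or the border.
    i, j = best_pos
    path = [(i, j)]
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        w = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + w:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
        path.append((i, j))
    return H, best, path, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    seq1, seq2 = "ACACACTA", "AGCACACA"
    H, score, path, aligned2, aligned1 = smith_waterman(seq2, seq1)
    print(score)     # 12
    print(path)      # [(8, 8), (7, 7), (7, 6), (6, 5), (5, 4), (4, 3), (3, 2), (2, 1), (1, 1), (0, 0)]
    print(aligned1)  # A-CACACTA  (Sequence 1)
    print(aligned2)  # AGCACAC-A  (Sequence 2)
```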
  • Multiple Sequence Alignment Models. A multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences, commonly protein, DNA, or RNA. Multiple alignments are usually built to identify homologous sequences, which point to a shared evolutionary origin and can be used for further phylogenetic analysis. Progressive alignment methods construct the MSA from a succession of pairwise alignments. Hidden Markov models represent the MSA as a directed acyclic graph (DAG), in which the observed states are individual alignment columns and the hidden states represent the presumed ancestral sequence.
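A full profile HMM for MSA is beyond a short example, but the decoding step such models rely on, the Viterbi algorithm, can be sketched on a hypothetical two-state HMM. Here the states `H` and `L` stand for GC-rich and AT-rich regions, and every probability is invented for illustration. Viterbi recovers the most probable hidden-state path behind an observed sequence, which is the same question a profile HMM answers when it assigns residues to its match, insert, and delete states.

```python
import math


def viterbi(obs, states, start, trans, emit):
    """Return (most probable hidden-state path, its log-probability) for the observations."""
    # v[s]: best log-probability of any state path ending in s at the current position.
    v = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        prev_v, v, ptr = v, {}, {}
        for s in states:
            # Best predecessor state for s at this position.
            best_prev = max(states, key=lambda p: prev_v[p] + math.log(trans[p][s]))
            v[s] = prev_v[best_prev] + math.log(trans[best_prev][s]) + math.log(emit[s][symbol])
            ptr[s] = best_prev
        backpointers.append(ptr)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1], v[last]


# Hypothetical two-state model: H = GC-rich region, L = AT-rich region.
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

if __name__ == "__main__":
    path, logp = viterbi("GGCCAATT", STATES, START, TRANS, EMIT)
    print("".join(path))  # HHHHLLLL
```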
  • What is Phylogenetics? Phylogenetics is the study of evolutionary relatedness among groups of organisms (e.g. species, populations), which is inferred from molecular sequencing data and morphological data matrices. Evolution is regarded as a branching process, whereby populations are altered over time and may speciate into separate branches, hybridize together, or terminate by extinction. This may be visualized in a phylogenetic tree. Ernst Haeckel's recapitulation theory ("ontogeny recapitulates phylogeny") is a hypothesis that, in developing from embryo to adult, animals go through stages resembling or representing successive stages in the evolution of their remote ancestors.
  • Building Phylogenetic Trees. Phylogenetic trees for a nontrivial number of input sequences are constructed using computational phylogenetics methods. A common approach is to use maximum likelihood as the optimality criterion, often within a Bayesian framework, and to apply an explicit model of evolution to phylogenetic tree estimation. Identifying the optimal tree under many of these criteria is NP-hard, so heuristic search and optimization methods are combined with tree-scoring functions to find a reasonably good tree that fits the data. The resulting trees do not necessarily represent the species' evolutionary history accurately, because the underlying data are noisy; the analysis can be confounded by horizontal gene transfer, hybridisation between species that were not nearest neighbors on the tree before hybridisation takes place, convergent evolution, and conserved sequences.
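As a small illustration of maximum likelihood under an explicit model of evolution, the sketch below uses the Jukes-Cantor (JC69) substitution model, a swapped-in example rather than a method from the slides, to estimate the distance d (expected substitutions per site) between two aligned DNA sequences. A brute-force grid search over d stands in for the heuristic search used on real trees; for JC69 the result can be checked against the closed form $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$, where p is the fraction of differing sites. All function names and the grid settings are hypothetical.

```python
import math


def count_differences(x: str, y: str) -> int:
    """Number of mismatching sites in two aligned, gap-free DNA sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(x, y) if a != b)


def jc69_log_likelihood(n_sites: int, n_diff: int, d: float) -> float:
    """Log-likelihood of the site pattern at distance d under JC69 (constant root term dropped)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e        # P(site unchanged)
    p_each_other = 0.25 - 0.25 * e  # P(change to one specific other base)
    ll = (n_sites - n_diff) * math.log(p_same)
    if n_diff:
        ll += n_diff * math.log(p_each_other)
    return ll


def ml_distance_grid(x: str, y: str, step: float = 0.001, d_max: float = 5.0) -> float:
    """Pick the grid value of d (> 0) that maximizes the JC69 likelihood."""
    n, k = len(x), count_differences(x, y)
    grid = [i * step for i in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(n, k, d))


def jc69_distance(x: str, y: str) -> float:
    """Closed-form JC69 distance; infinite when p >= 3/4 (saturation)."""
    p = count_differences(x, y) / len(x)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    x, y = "AAAA", "AAAT"
    print(ml_distance_grid(x, y), jc69_distance(x, y))  # both close to 0.304
```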
  • Other Models: transformational grammars (Chomsky hierarchy) and RNA structure analysis models (for RNA, evolution tends to conserve the base-pairing interactions rather than the sequence itself).
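One of the simplest RNA structure models is the Nussinov base-pair maximization algorithm, shown here as a swapped-in example rather than a method from the slides. It scores a structure by counting nested base pairs (Watson-Crick plus G-U wobble) rather than by sequence identity, and its recursion can be read as a simple context-free grammar, which links it to the transformational-grammar view. The minimum hairpin loop of 3 unpaired bases is an assumed parameter, and the sketch returns only the maximum pair count, not the structure itself.

```python
# Allowed base pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in an RNA sequence (Nussinov-style DP)."""
    n = len(seq)
    if n == 0:
        return 0
    # N[i][j]: max pairs within seq[i..j]; spans too short to close a hairpin stay 0.
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])       # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, N[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                   # split into two substructures
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))  # 3: a stem of three G-C pairs around an AAA loop
```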
  • Contact Us. We are Hiring!