Evolutionary design of energy functions
    for protein structure prediction

              Natalio Krasnogor
            nxk@cs.nott.ac.uk
        Paweł Widera, Jonathan Garibaldi




         7th Annual HUMIES Awards
                 2010-07-09
Protein structure prediction
From 1D sequence to 3D structure

           LFSKELRCMMYGFGDDQNPYTESVDILEDLVIEFITEMTHKAMSIFSEEQLNRYEMYRRSAFPKAA
          IKRLIQSITGTSVSQNVVIAMSGISKVFVGEVVEEALDVCEKWGEMPPLQPKHMREAVRRLKSKGQIP




                                                          Protein basics
                                                              20 amino acid alphabet
                                                                 sequence encodes
                                                                 structure
                                                                 structure determines
                                                                 activity
                                                                 ratio structures/sequences = 0.2%



 Natalio Krasnogor      Evolutionary design of energy functions for PSP       HUMIES 2010   2 / 15
The algorithm of folding
Anfinsen’s thermodynamic hypothesis [Anfinsen, 1973]




                                                              Refolding experiment
                                                                    folds to the same
                                                                    native state
                                                                    native state is
                                                                    energetically stable

                                                              Energy funnel
                                                                    roll down free
                                                                    energy hill
                                                                    avoid local minima
                                                                    traps
[Dill and Chan, 1997]


The two aspects of folding
Towards practical prediction


                                                               Energy landscape
                                                                     all-atom force field
                                                                     statistical potential

                                                               Search method
                                                                   random walk
                                                                     structure
                                                                     optimisation

                                                               Folding@home
                                                               8.5 peta FLOPS
                                                                     10 000 CPU days
                                                                     for 10µs of folding
 [Dill and Chan, 1997]

Community wide prediction experiment
Critical Assessment of techniques for protein Structure Prediction




CASP facts
    biennial competition started in 1994
    parallel prediction and experimental verification
    model assessment by human experts

9th edition of CASP
    150 human groups
    140 server groups




How to find good quality models?
Correlation between energy and distance to the native structure




       [Figure: energy (vertical axis) against distance (horizontal axis), native state marked]

                                                               Requirements
                                                                        energy reflects distance
                                                                        distance reflects similarity
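
The requirement that energy reflects distance can be checked numerically on a set of decoys. The following is a minimal sketch, assuming plain Pearson correlation as the measure; the decoy values and the names `pearson`, `good_energy` and `poor_energy` are hypothetical and do not come from the slides.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical decoys: distance to the native structure and two energies.
distances = [1.0, 2.0, 3.0, 4.0, 5.0]
good_energy = [-50.0, -40.0, -30.0, -20.0, -10.0]  # rises with distance
poor_energy = [-30.0, -10.0, -50.0, -20.0, -40.0]  # poor guide to distance

r_good = pearson(distances, good_energy)
r_poor = pearson(distances, poor_energy)
```

An energy function that satisfies the requirement gives a correlation close to 1, so that lowering the energy moves a model towards the native state.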


How do the best of CASP do it?
Energy of models vs. distance to a target structure




                                                            Similarity measure
                                                                  $\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \delta_i^2}$


                                                            Decoys generated by
                                                                  I-TASSER
                                                                  [Wu et al., 2007]

                                                                  Robetta
                                                                  [Rohl et al., 2004]
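
RMSD is the square root of the mean squared deviation over N positions. A minimal sketch of the calculation follows, assuming that the deviation at position i is the Euclidean distance between corresponding atoms and that the two structures are already superimposed; the slide states neither, and the function name `rmsd` and the coordinates are hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two lists of (x, y, z) points.

    Assumes both structures are already superimposed and that position i
    in one list corresponds to position i in the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared deviation for one pair of corresponding atoms.
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return sqrt(total / len(coords_a))


# Hypothetical three-atom example: the decoy is the native shifted by 1 along y.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
decoy = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
shift_rmsd = rmsd(native, decoy)
```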




How is the energy function designed?
Weighted sum vs. free combination of terms



$F(T) = w_1 \cdot T_1 + \ldots + w_n \cdot T_n$
[Zhang et al., 2003]

$F(T) = w_1 \cdot \frac{T_1 \cdot T_3}{\log(T_2)} + \sin\left(\frac{T_4 - w_2 \cdot T_1}{T_5 \cdot \exp(\cos(w_1 \cdot T_3))}\right)$
[Widera et al., 2010]

Decision support
      local numerical approximation

GP input
      terminals: T1, ..., T8
      functions: add sub mul div sin cos exp log
      random ephemerals in range [0,1]
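
The difference between a weighted sum and a free combination of terms can be sketched as follows. This is an illustration only, not the authors' implementation: the nested-tuple tree encoding, the helper names and the protected variants of `div`, `log` and `exp` (a common GP convention, assumed here) are all hypothetical.

```python
import math


def protected_div(a, b):
    # Assumption: division by (near) zero returns 1.0.
    return a / b if abs(b) > 1e-12 else 1.0


def protected_log(a):
    # Assumption: log of the absolute value, 0.0 at (near) zero.
    return math.log(abs(a)) if abs(a) > 1e-12 else 0.0


def protected_exp(a):
    # Assumption: clamp the argument to avoid floating-point overflow.
    return math.exp(min(a, 50.0))


FUNCTIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": protected_div,
    "sin": math.sin,
    "cos": math.cos,
    "exp": protected_exp,
    "log": protected_log,
}


def evaluate(node, terms):
    """Evaluate a tree: a tuple (op, child, ...), a terminal name, or a constant."""
    if isinstance(node, tuple):
        op, *children = node
        return FUNCTIONS[op](*(evaluate(child, terms) for child in children))
    if isinstance(node, str):
        return terms[node]  # terminal such as "T1"
    return float(node)  # ephemeral constant


def weighted_sum(weights, terms):
    """The fixed functional form: sum of w_i * T_i."""
    return sum(w * terms[name] for name, w in weights.items())


# Hypothetical energy term values for one structure.
terms = {"T1": 2.0, "T2": 0.0}
tree = ("add", ("mul", 0.5, "T1"), ("sin", "T2"))  # 0.5 * T1 + sin(T2)
tree_value = evaluate(tree, terms)
sum_value = weighted_sum({"T1": 0.5, "T2": 0.25}, terms)
```

A weighted sum only lets the optimiser change the numbers, whereas a tree lets the search change the shape of the function as well.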


What is the GP objective?
Beyond simple symbolic regression



  1   construction of the reference ranking RR
      (structures sorted by distance to the native)
       3         1     5         0         4          7         6       2


  2   construction of the evolutionary ranking RE
      (structures sorted by the evolved energy function)
       5         1     3         4         0         2          6       7


  3   rankings comparison — RR vs. RE
  4   total fitness = average distance for m proteins
      $F = \frac{1}{m} \sum_{i=1}^{m} d(RR_i, RE_i)$
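
The four steps can be sketched as follows. The slide does not say which ranking distance d is used, so this sketch assumes the Spearman footrule (sum of absolute position differences) as a stand-in; the function names are hypothetical. The two rankings are the ones shown in steps 1 and 2.

```python
def rank_by(scores):
    """Return structure indices sorted by ascending score (distance or energy)."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


def footrule(ranking_a, ranking_b):
    """Spearman footrule: sum of absolute position differences per structure."""
    position_b = {item: pos for pos, item in enumerate(ranking_b)}
    return sum(abs(pos - position_b[item]) for pos, item in enumerate(ranking_a))


def fitness(reference_rankings, evolved_rankings, d=footrule):
    """Average ranking distance over m proteins (lower is better)."""
    m = len(reference_rankings)
    pairs = zip(reference_rankings, evolved_rankings)
    return sum(d(rr, re) for rr, re in pairs) / m


# Rankings from the slide: structures sorted by distance and by evolved energy.
rr = [3, 1, 5, 0, 4, 7, 6, 2]
re_evolved = [5, 1, 3, 4, 0, 2, 6, 7]

single_distance = footrule(rr, re_evolved)
# Two hypothetical proteins: one ranked as on the slide, one ranked perfectly.
average = fitness([rr, rr], [re_evolved, rr])
example_ranking = rank_by([2.5, 0.7, 4.1])
```

Because fitness depends only on the order of structures, an evolved function is rewarded for sorting decoys correctly rather than for reproducing distance values, which is what sets the objective apart from plain symbolic regression.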




Can GP improve over a weighted sum of terms?
Nelder-Mead downhill simplex optimisation



              method     spearman-sigmoid       correlation
                         d-100      all         d-100      all
              simplex    0.734      0.638       0.650      0.166
              GP         0.835      0.714      *0.740     *0.200



              Computational cost of experiments
                     55 proteins, 1000-2000 structures for each
                     5 different ranking distance measures
                     20 different configurations of GP parameters
                     total of 150 CPU days

Criteria for human-competitiveness


                                CRITERION F
                    result >= past achievement in the field


                                 CRITERION E
                    result >= most recent human-created
                      solution to a long-standing problem


                                CRITERION H
                     result holds its own in a competition
                        involving human contestants


Comparison to the human-made solution



          1   automated method to discover the best combination
              of the energy terms

          2   human-competitive improvement to the solution of a
              long-standing problem

          3   challenge weighted sum of terms with expert-picked
              weights




Potential impact



           1   automated energy design using a free functional
               combination of terms has not been used before

           2   the energy function determines the search landscape,
               and its smoothness is key to efficient
               prediction

           3   long-term effects in protein science that the
               improvement in prediction quality could bring




Why is this the best entry?



         1   innovates the field with a novel approach to a
             long-standing problem

         2   could be a step towards more accurate prediction and,
             in the long term, improve drug design and identification of
             disease-causing mutations

         3   represents a new and difficult challenge for GP
             http://www.infobiotics.org/gpchallenge/




References

    Anfinsen, C. (1973).
    Principles that Govern the Folding of Protein Chains.
    Science, 181(4096):223–30.

    Dill, K. A. and Chan, H. S. (1997).
    From Levinthal to pathways to funnels.
    Nat Struct Mol Biol, 4(1):10–19.

    Rohl, C. A., Strauss, C. E. M., Misura, K. M. S., and Baker, D. (2004).
    Protein Structure Prediction Using Rosetta.
    In Brand, L. and Johnson, M. L., editors, Numerical Computer Methods, Part D, volume 383 of Methods in
    Enzymology, pages 66–93. Academic Press.

    Widera, P., Garibaldi, J., and Krasnogor, N. (2009).
    Evolutionary design of the energy function for protein structure prediction.
    In IEEE Congress on Evolutionary Computation 2009, pages 1305–1312, Trondheim, Norway.

    Widera, P., Garibaldi, J., and Krasnogor, N. (2010).
    GP challenge: evolving energy function for protein structure prediction.
    Genetic Programming and Evolvable Machines, 11(1):61–88.

    Wu, S., Skolnick, J., and Zhang, Y. (2007).
    Ab initio modeling of small proteins by iterative TASSER simulations.
    BMC Biol, 5(1):17.

    Zhang, Y., Kolinski, A., and Skolnick, J. (2003).
    TOUCHSTONE II: A New Approach to Ab Initio Protein Structure Prediction.
    Biophys. J., 85(2):1145–1164.



Natalio Krasnogor               Evolutionary design of energy functions for PSP               HUMIES 2010           15 / 15

Weitere ähnliche Inhalte

Andere mochten auch

Curso Murcia julio 2014
 Curso Murcia julio 2014 Curso Murcia julio 2014
Curso Murcia julio 2014ACREDITRA
 
Publicitatea Online In Romania
Publicitatea Online In RomaniaPublicitatea Online In Romania
Publicitatea Online In RomaniaGabriel Curcudel
 
6. proceso de control administrativo
6. proceso de control administrativo6. proceso de control administrativo
6. proceso de control administrativoruizver
 
Osha construction-hazards-los-angeles
Osha construction-hazards-los-angelesOsha construction-hazards-los-angeles
Osha construction-hazards-los-angelesGlobalCompliancePanel
 
The Popularity Myth - A Guide to Mean Girls with Julie Hiramine
The Popularity Myth - A Guide to Mean Girls with Julie HiramineThe Popularity Myth - A Guide to Mean Girls with Julie Hiramine
The Popularity Myth - A Guide to Mean Girls with Julie HiramineGenerations of Virtue
 
Ayuda nemon directorio de contactos
Ayuda nemon directorio de contactosAyuda nemon directorio de contactos
Ayuda nemon directorio de contactosNemon Intelligence
 
Arquitectura del computador
Arquitectura del computadorArquitectura del computador
Arquitectura del computadoryurelyguevara
 
Aufwertung von Printprodukten für die crossmediale Weiterverwertung
Aufwertung von Printprodukten für die crossmediale WeiterverwertungAufwertung von Printprodukten für die crossmediale Weiterverwertung
Aufwertung von Printprodukten für die crossmediale WeiterverwertungNadine Edelmann
 
2016 Predictions for the Luxury Industry: Executive Summary
2016 Predictions for the Luxury Industry: Executive Summary2016 Predictions for the Luxury Industry: Executive Summary
2016 Predictions for the Luxury Industry: Executive SummaryPosLux
 
Direccionamiento ip privado automático APIPA
Direccionamiento ip privado automático APIPADireccionamiento ip privado automático APIPA
Direccionamiento ip privado automático APIPADaniel Gvtierrex
 
Cultivo de granadilla municipio de boavita
Cultivo de granadilla municipio de boavitaCultivo de granadilla municipio de boavita
Cultivo de granadilla municipio de boavitaUNAD
 

Andere mochten auch (16)

Curso Murcia julio 2014
 Curso Murcia julio 2014 Curso Murcia julio 2014
Curso Murcia julio 2014
 
Computacion
ComputacionComputacion
Computacion
 
Memorias ddr
Memorias ddrMemorias ddr
Memorias ddr
 
Publicitatea Online In Romania
Publicitatea Online In RomaniaPublicitatea Online In Romania
Publicitatea Online In Romania
 
UX i fysiske produkter
UX i fysiske produkterUX i fysiske produkter
UX i fysiske produkter
 
6. proceso de control administrativo
6. proceso de control administrativo6. proceso de control administrativo
6. proceso de control administrativo
 
Osha construction-hazards-los-angeles
Osha construction-hazards-los-angelesOsha construction-hazards-los-angeles
Osha construction-hazards-los-angeles
 
The Popularity Myth - A Guide to Mean Girls with Julie Hiramine
The Popularity Myth - A Guide to Mean Girls with Julie HiramineThe Popularity Myth - A Guide to Mean Girls with Julie Hiramine
The Popularity Myth - A Guide to Mean Girls with Julie Hiramine
 
Ayuda nemon directorio de contactos
Ayuda nemon directorio de contactosAyuda nemon directorio de contactos
Ayuda nemon directorio de contactos
 
Arquitectura del computador
Arquitectura del computadorArquitectura del computador
Arquitectura del computador
 
Aufwertung von Printprodukten für die crossmediale Weiterverwertung
Aufwertung von Printprodukten für die crossmediale WeiterverwertungAufwertung von Printprodukten für die crossmediale Weiterverwertung
Aufwertung von Printprodukten für die crossmediale Weiterverwertung
 
2016 Predictions for the Luxury Industry: Executive Summary
2016 Predictions for the Luxury Industry: Executive Summary2016 Predictions for the Luxury Industry: Executive Summary
2016 Predictions for the Luxury Industry: Executive Summary
 
Direccionamiento ip privado automático APIPA
Direccionamiento ip privado automático APIPADireccionamiento ip privado automático APIPA
Direccionamiento ip privado automático APIPA
 
Rafael patxot ti
Rafael patxot tiRafael patxot ti
Rafael patxot ti
 
Equinoterapia
EquinoterapiaEquinoterapia
Equinoterapia
 
Cultivo de granadilla municipio de boavita
Cultivo de granadilla municipio de boavitaCultivo de granadilla municipio de boavita
Cultivo de granadilla municipio de boavita
 

Mehr von Natalio Krasnogor

Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...
Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...
Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...Natalio Krasnogor
 
Pathology is being disrupted by Data Integration, AI & Blockchain
Pathology is being disrupted by Data Integration, AI & BlockchainPathology is being disrupted by Data Integration, AI & Blockchain
Pathology is being disrupted by Data Integration, AI & BlockchainNatalio Krasnogor
 
Biological Apps: Rapidly Converging Technologies for Living Information Proce...
Biological Apps: Rapidly Converging Technologies for Living Information Proce...Biological Apps: Rapidly Converging Technologies for Living Information Proce...
Biological Apps: Rapidly Converging Technologies for Living Information Proce...Natalio Krasnogor
 
Advanced computationalsyntbio
Advanced computationalsyntbioAdvanced computationalsyntbio
Advanced computationalsyntbioNatalio Krasnogor
 
Introduction to biocomputing
 Introduction to biocomputing Introduction to biocomputing
Introduction to biocomputingNatalio Krasnogor
 
Evolvability of Designs and Computation with Porphyrins-based Nano-tiles
Evolvability of Designs and Computation with Porphyrins-based Nano-tilesEvolvability of Designs and Computation with Porphyrins-based Nano-tiles
Evolvability of Designs and Computation with Porphyrins-based Nano-tilesNatalio Krasnogor
 
Plenary Speaker slides at the 2016 International Workshop on Biodesign Automa...
Plenary Speaker slides at the 2016 International Workshop on Biodesign Automa...Plenary Speaker slides at the 2016 International Workshop on Biodesign Automa...
Plenary Speaker slides at the 2016 International Workshop on Biodesign Automa...Natalio Krasnogor
 
Integrative Networks Centric Bioinformatics
Integrative Networks Centric BioinformaticsIntegrative Networks Centric Bioinformatics
Integrative Networks Centric BioinformaticsNatalio Krasnogor
 
An Unorthodox View on Memetic Algorithms
An Unorthodox View on Memetic AlgorithmsAn Unorthodox View on Memetic Algorithms
An Unorthodox View on Memetic AlgorithmsNatalio Krasnogor
 
Darwin’s Magic: Evolutionary Computation in Nanoscience, Bioinformatics and S...
Darwin’s Magic: Evolutionary Computation in Nanoscience, Bioinformatics and S...Darwin’s Magic: Evolutionary Computation in Nanoscience, Bioinformatics and S...
Darwin’s Magic: Evolutionary Computation in Nanoscience, Bioinformatics and S...Natalio Krasnogor
 
Towards a Rapid Model Prototyping Strategy for Systems & Synthetic Biology
Towards a Rapid Model Prototyping  Strategy for Systems & Synthetic BiologyTowards a Rapid Model Prototyping  Strategy for Systems & Synthetic Biology
Towards a Rapid Model Prototyping Strategy for Systems & Synthetic BiologyNatalio Krasnogor
 
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...Natalio Krasnogor
 
Evolutionary Symbolic Discovery for Bioinformatics, Systems and Synthetic Bi...
Evolutionary Symbolic Discovery for Bioinformatics,  Systems and Synthetic Bi...Evolutionary Symbolic Discovery for Bioinformatics,  Systems and Synthetic Bi...
Evolutionary Symbolic Discovery for Bioinformatics, Systems and Synthetic Bi...Natalio Krasnogor
 
Computational Synthetic Biology
Computational Synthetic BiologyComputational Synthetic Biology
Computational Synthetic BiologyNatalio Krasnogor
 
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Natalio Krasnogor
 
Synthetic Biology - Modeling and Optimisation
Synthetic Biology -  Modeling and OptimisationSynthetic Biology -  Modeling and Optimisation
Synthetic Biology - Modeling and OptimisationNatalio Krasnogor
 
A Genetic Programming Challenge: Evolving the Energy Function for Protein Str...
A Genetic Programming Challenge: Evolving the Energy Function for Protein Str...A Genetic Programming Challenge: Evolving the Energy Function for Protein Str...
A Genetic Programming Challenge: Evolving the Energy Function for Protein Str...Natalio Krasnogor
 
Building Executable Biology Models for Synthetic Biology
Building Executable Biology Models for Synthetic BiologyBuilding Executable Biology Models for Synthetic Biology
Building Executable Biology Models for Synthetic BiologyNatalio Krasnogor
 

Mehr von Natalio Krasnogor (20)

Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...
Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...
Designing for Addressability, Bio-orthogonality and Abstraction Scalability a...
 
Pathology is being disrupted by Data Integration, AI & Blockchain
Pathology is being disrupted by Data Integration, AI & BlockchainPathology is being disrupted by Data Integration, AI & Blockchain
Pathology is being disrupted by Data Integration, AI & Blockchain
 
Biological Apps: Rapidly Converging Technologies for Living Information Proce...
Biological Apps: Rapidly Converging Technologies for Living Information Proce...Biological Apps: Rapidly Converging Technologies for Living Information Proce...
Biological Apps: Rapidly Converging Technologies for Living Information Proce...
 
DNA data-structure
DNA data-structureDNA data-structure
DNA data-structure
 
Advanced computationalsyntbio
Advanced computationalsyntbioAdvanced computationalsyntbio
Advanced computationalsyntbio
 
The Infobiotics workbench
The Infobiotics workbenchThe Infobiotics workbench
The Infobiotics workbench
 
Introduction to biocomputing
 Introduction to biocomputing Introduction to biocomputing
Introduction to biocomputing
 
Evolvability of Designs and Computation with Porphyrins-based Nano-tiles
Evolvability of Designs and Computation with Porphyrins-based Nano-tilesEvolvability of Designs and Computation with Porphyrins-based Nano-tiles
Evolvability of Designs and Computation with Porphyrins-based Nano-tiles
 
HUMIES presentation: Evolutionary design of energy functions for protein structure prediction

  • 1. Evolutionary design of energy functions for protein structure prediction Natalio Krasnogor nxk@cs.nott.ac.uk Paweł Widera, Jonathan Garibaldi 7th Annual HUMIES Awards 2010-07-09
  • 2. Protein structure prediction: from 1D sequence to 3D structure. LFSKELRCMMYGFGDDQNPYTESVDILEDLVIEFITEMTHKAMSIFSEEQLNRYEMYRRSAFPKAA IKRLIQSITGTSVSQNVVIAMSGISKVFVGEVVEEALDVCEKWGEMPPLQPKHMREAVRRLKSKGQIP Protein basics: 20 amino acid alphabet; sequence encodes structure; structure determines activity; ratio structures / sequences = 0.2%. Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 2 / 15
  • 3. The algorithm of folding: Anfinsen’s thermodynamic hypothesis [Anfinsen, 1973]. Refolding experiment: the protein folds to the same native state; the native state is energetically stable. Energy funnel: roll down the free energy hill; avoid local minima traps [Dill and Chan, 1997].
  • 4. The two aspects of folding: towards practical prediction. Energy landscape: all-atom force field; statistical potential. Search method: random walk; structure optimisation. Folding@home: 8.5 petaFLOPS; 10 000 CPU days for 10 µs of folding [Dill and Chan, 1997].
  • 6. Community-wide prediction experiment: Critical Assessment of techniques for protein Structure Prediction (CASP). CASP facts: biennial competition started in 1994; parallel prediction and experimental verification; model assessment by human experts. 9th edition of CASP: 150 human groups; 140 server groups.
  • 7. How to find good quality models? Correlation between energy and distance to the native structure. Requirements: energy reflects distance; distance reflects similarity. [Figure: energy plotted against distance, with the native state marked.]
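
The requirement that energy reflects distance can be checked numerically. The following is a minimal sketch, assuming a plain Pearson correlation between the energies of a set of models and their distances to the native structure; the function name and the sample values are illustrative and do not come from the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical decoy set: lower energy should mean smaller distance to the native.
energies = [-10.0, -8.0, -6.0, -4.0]
distances = [1.0, 2.0, 3.0, 4.0]
r = pearson(energies, distances)
```

A value of r close to 1 means the energy function orders the models much as the distance to the native structure does; a value near 0 means the energy carries little information about model quality.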
  • 8. How do the best of CASP do it? Energy of models vs. distance to a target structure. Similarity measure: $\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \delta_i^2}$. Decoys generated by I-TASSER [Wu et al., 2007] and Robetta [Rohl et al., 2004].
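
The RMSD similarity measure can be sketched in a few lines. This example assumes that δ_i is the Euclidean distance between the i-th pair of corresponding atoms and that the two structures have already been superimposed; the superposition step is not shown, and the function name is illustrative.

```python
import math


def rmsd(model, native):
    """Root mean square deviation between two lists of (x, y, z) coordinates.

    Assumes both structures list the same atoms in the same order and are
    already superimposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("structures must be non-empty and of equal length")
    squared = [
        sum((a - b) ** 2 for a, b in zip(p, q))  # delta_i squared
        for p, q in zip(model, native)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical two-atom example: one atom matches, the other is 2 units away.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = rmsd(model, native)  # squared deviations 0 and 4, mean 2
```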
  • 12. How is the energy function designed? Weighted sum vs. free combination of terms.
      Decision support: $F(T) = w_1 T_1 + \ldots + w_n T_n$ [Zhang et al., 2003]
      Local numerical approximation: $F(T) = \frac{T_1 T_3}{w_1 \log(T_2)} + \sin\left(\frac{T_4 - w_2 T_1}{T_5 \exp(\cos(w_1 T_3))}\right)$
      GP input: terminals $T_1, \ldots, T_8$; functions add, sub, mul, div, sin, cos, exp, log [Widera et al., 2010]; random ephemerals in range [0,1]
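
The difference between a weighted sum and a free combination of terms can be made concrete with a small expression-tree evaluator. This is a sketch under stated assumptions, not the authors' implementation: the nested-tuple tree encoding, the protected `div` and `log`, the weights 0.5 and 0.25, and the term values are all hypothetical. The example tree encodes one reading of the sample function on the slide.

```python
import math


def protected_div(a, b):
    # Assumption: division by (near) zero returns 1.0, a common GP convention.
    return a / b if abs(b) > 1e-9 else 1.0


def protected_log(a):
    # Assumption: log of the absolute value, 0.0 for (near) zero input.
    return math.log(abs(a)) if abs(a) > 1e-9 else 0.0


FUNCTIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": protected_div,
    "sin": math.sin,
    "cos": math.cos,
    "exp": math.exp,
    "log": protected_log,
}


def evaluate(node, terms):
    """Evaluate a tree of nested tuples: (function name, child, child, ...)."""
    if isinstance(node, (int, float)):
        return node          # ephemeral constant, used here as a weight
    if isinstance(node, str):
        return terms[node]   # terminal such as "T1"
    name, *children = node
    return FUNCTIONS[name](*(evaluate(child, terms) for child in children))


def weighted_sum(weights, terms):
    """Linear energy function F(T) = w1*T1 + ... + wn*Tn."""
    return sum(w * terms[name] for name, w in weights.items())


# Hypothetical term values T1..T8 for one structure.
terms = {f"T{i}": float(i) for i in range(1, 9)}

# T1*T3 / (w1*log(T2)) + sin((T4 - w2*T1) / (T5*exp(cos(w1*T3))))
EXAMPLE_TREE = (
    "add",
    ("div", ("mul", "T1", "T3"), ("mul", 0.5, ("log", "T2"))),
    ("sin", ("div",
             ("sub", "T4", ("mul", 0.25, "T1")),
             ("mul", "T5", ("exp", ("cos", ("mul", 0.5, "T3")))))),
)

linear_energy = weighted_sum({"T1": 0.5, "T2": 0.25}, terms)
evolved_energy = evaluate(EXAMPLE_TREE, terms)
```

The weighted sum can only rescale terms, whereas the tree can multiply, divide and nest them, which is the larger search space that GP explores.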
  • 13. What is the GP objective? Beyond simple symbolic regression.
      1. Construction of the reference ranking RR (structures sorted by distance to the native): 3 1 5 0 4 7 6 2
      2. Construction of the evolutionary ranking RE (structures sorted by the evolved energy function): 5 1 3 4 0 2 6 7
      3. Rankings comparison: RR vs. RE
      4. Total fitness = average distance for m proteins: $F = \frac{1}{m} \sum_{i=1}^{m} d(RR_i, RE_i)$
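
The ranking-based fitness can be sketched as follows. The slides do not define the distance d at this point, so this example assumes a normalised Spearman footrule distance as a stand-in; the function names are illustrative. The two rankings are the ones shown on the slide.

```python
def footrule_distance(reference, evolved):
    """Normalised Spearman footrule distance between two rankings.

    Both rankings list the same structure ids, best first. The result is 0.0
    for identical rankings and 1.0 for a fully reversed one.
    """
    if sorted(reference) != sorted(evolved):
        raise ValueError("rankings must contain the same structure ids")
    position = {item: rank for rank, item in enumerate(evolved)}
    total = sum(abs(rank - position[item]) for rank, item in enumerate(reference))
    max_total = len(reference) ** 2 // 2  # footrule of a reversed ranking
    return total / max_total if max_total else 0.0


def rank_by(scores):
    """Structure ids sorted by increasing score (distance or energy)."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


def total_fitness(reference_rankings, evolved_rankings):
    """Average ranking distance over m proteins: F = (1/m) * sum of d(RR_i, RE_i)."""
    distances = [
        footrule_distance(rr, rev)
        for rr, rev in zip(reference_rankings, evolved_rankings)
    ]
    return sum(distances) / len(distances)


RR = [3, 1, 5, 0, 4, 7, 6, 2]  # sorted by distance to the native
RE = [5, 1, 3, 4, 0, 2, 6, 7]  # sorted by the evolved energy function
fitness = total_fitness([RR], [RE])
```

With this choice of d, lower fitness is better: an energy function that reproduces the reference ranking exactly scores 0.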
  • 14. Can GP improve over a weighted sum of terms? Nelder-Mead downhill simplex optimisation.
      method   | spearman-sigmoid  | correlation
               | d-100    all      | d-100    all
      simplex  | 0.734    0.638    | 0.650    0.166
      GP       | 0.835    0.714    | *0.740   *0.200
      Computational cost of experiments: 55 proteins, 1000-2000 structures for each; 5 different ranking distance measures; 20 different configurations of GP parameters; total of 150 CPU days.
  • 16. Criteria for human-competitiveness. CRITERION F: result >= past achievement in the field. CRITERION E: result >= most recent human-created solution to a long-standing problem. CRITERION H: result holds its own in a competition involving human contestants.
  • 19. Comparison to the human-made solution: (1) automated method to discover the best combination of the energy terms; (2) human-competitive improvement to the solution of a long-standing problem; (3) challenges the weighted sum of terms with expert-picked weights.
  • 20. Potential impact: (1) automated energy design using a free functional combination of terms has not been used before; (2) the energy function determines the search landscape, and its smoothness is key to efficient prediction; (3) long-term effects in protein science that the improvement in prediction quality could bring.
  • 21. Why is this the best entry? (1) It innovates the field with a novel approach to a long-standing problem; (2) it could be a step towards more accurate prediction and, in the long term, improve drug design and identification of disease-causing mutations; (3) it represents a new and difficult challenge for GP. http://www.infobiotics.org/gpchallenge/
  • 22. References
      Anfinsen, C. (1973). Principles that Govern the Folding of Protein Chains. Science, 181(4096):223–30.
      Dill, K. A. and Chan, H. S. (1997). From Levinthal to pathways to funnels. Nat Struct Mol Biol, 4(1):10–19.
      Rohl, C. A., Strauss, C. E. M., Misura, K. M. S., and Baker, D. (2004). Protein Structure Prediction Using Rosetta. In Brand, L. and Johnson, M. L., editors, Numerical Computer Methods, Part D, volume 383 of Methods in Enzymology, pages 66–93. Academic Press.
      Widera, P., Garibaldi, J., and Krasnogor, N. (2009). Evolutionary design of the energy function for protein structure prediction. In IEEE Congress on Evolutionary Computation 2009, pages 1305–1312, Trondheim, Norway.
      Widera, P., Garibaldi, J., and Krasnogor, N. (2010). GP challenge: evolving energy function for protein structure prediction. Genetic Programming and Evolvable Machines, 11(1):61–88.
      Wu, S., Skolnick, J., and Zhang, Y. (2007). Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol, 5(1):17.
      Zhang, Y., Kolinski, A., and Skolnick, J. (2003). TOUCHSTONE II: A New Approach to Ab Initio Protein Structure Prediction. Biophys. J., 85(2):1145–1164.