Machine Learning as Applied to Structural Bioinformatics: Results and Challenges. Philip E. Bourne, University of California San Diego, [email_address]
The Current Situation
Example Unsolved Problems that Machine Learning Can Address (key: * Will talk about this; * Will offer as a challenge)
The Current Situation: The Potential “Training Set” Is Growing Quickly
Predicting Functional Flexibility. Jenny Gu. Gu, Gribskov & Bourne, PLoS Computational Biology, 2006 (early online release)
Spectrum of Protein Order and Disorder. [Figure: spectrum from ordered structures to disordered structures.] If we believe that the three-dimensional structure of a protein is defined by its one-dimensional sequence, then why not its flexibility?
Bridging the Sequence-Flexibility Gap. Generalize the sequence-flexibility relationship to identify local protein regions important for allostery.
The Training Dataset
Obtaining Protein Dynamic Information. Reference: Bahar, I., A.R. Atilgan, and B. Erman. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design, 1997. 2(3): p. 173-181.
Defining the Target Features (Bahar, Atilgan & Erman, Folding & Design, 1997)
Side Note: Gaussian Network Model vs. Molecular Dynamics
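The target flexibility features come from the Gaussian Network Model (GNM) of Bahar, Atilgan & Erman (1997), cited above. The slide details are not preserved, so the following is only a minimal sketch of a standard GNM calculation, assuming C-alpha coordinates as input and a 7 Å contact cutoff (a commonly used value, not taken from the slides). The names `kirchhoff_matrix` and `gnm_fluctuations` are hypothetical. The sketch builds the Kirchhoff matrix, inverts it over the non-zero modes, and returns mean-square fluctuations (up to the factor kT/γ) together with the normalized cross-correlations.

```python
import numpy as np


def kirchhoff_matrix(coords, cutoff=7.0):
    """GNM Kirchhoff (connectivity) matrix from C-alpha coordinates (N x 3)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Residue pairs within the cutoff are joined by identical springs.
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma


def gnm_fluctuations(coords, cutoff=7.0, n_modes=None):
    """Return (mean-square fluctuations, cross-correlations), up to kT/gamma."""
    eigvals, eigvecs = np.linalg.eigh(kirchhoff_matrix(coords, cutoff))
    # Drop the zero eigenvalue(s): rigid-body motion, not internal flexibility.
    keep = eigvals > 1e-8
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    if n_modes is not None:
        # eigh returns eigenvalues in ascending order, so these are the slowest modes.
        eigvals, eigvecs = eigvals[:n_modes], eigvecs[:, :n_modes]
    # Pseudo-inverse of the Kirchhoff matrix over the retained modes.
    inv = (eigvecs / eigvals) @ eigvecs.T
    msf = np.diag(inv).copy()
    corr = inv / np.sqrt(np.outer(msf, msf))
    return msf, corr
```

Positive entries of `corr` indicate correlated motion and negative entries anti-correlated motion. Passing a small `n_modes` restricts the analysis to the slowest, most collective modes, which are usually the ones associated with functional motions.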
Functional Flexibility Score
Example: Identifying Functionally Flexible Regions (FFR) in HIV Protease. Gu, Gribskov & Bourne, PLoS Comp. Biol., 2006 (early release). [Figure: correlated modes (yellow), anti-correlated modes (blue); normalized scores, single chain.]
Identifying Regions in Bovine Pancreatic Trypsin Inhibitor and Calmodulin
How to Represent the Protein Sequence?
Architecture of Wiggle: captures evolutionary effects; captures local effects (smoothing); 9 × 29 features are used for each residue.
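The slide states that 9 × 29 features describe each residue, and the editor's notes explain the 29 (20 amino-acid states plus 9 transition states). One natural reading, which is an assumption rather than something the slides confirm, is that the 9 is a sliding window of nine residues centred on the target residue, which would also provide the local smoothing. Under that assumption the per-residue input vector can be sketched as below; `window_features` is a hypothetical helper, and zero-padding at the chain termini is likewise assumed.

```python
from typing import List


def window_features(profile: List[List[float]], window: int = 9) -> List[List[float]]:
    """Concatenate per-residue feature vectors over a centred sliding window.

    Assumptions: `window` residues are centred on the target residue, and
    positions beyond either chain terminus contribute all-zero vectors.
    """
    if window % 2 != 1:
        raise ValueError("window must be odd so it can be centred on a residue")
    n = len(profile)
    width = len(profile[0]) if n else 0
    half = window // 2
    zero = [0.0] * width
    features = []
    for i in range(n):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            row.extend(profile[j] if 0 <= j < n else zero)
        features.append(row)
    return features
```

With a 29-column profile and the default window, each residue ends up with 9 × 29 = 261 input values.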
Generating Additional Input Features. Modified bootstrapping for tripeptides accounts for nearest-neighbor effects. Pooled patterns (window size: 3) are sampled with replacement to build null models for FFR regions and for non-FFR regions (the diagram shows sample sizes of 44,645 and 199,515); 10,000 null models are generated. A Z score and P value are then calculated for each pattern against its respective null model.
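The slide's "modified bootstrapping" is not described in enough detail to reproduce. The sketch below shows only a plain bootstrap of the general idea: resample tripeptides from a pooled set with replacement to build null models, then score one tripeptide pattern in a region by a Z score and an empirical P value. `tripeptides` and `bootstrap_pattern_stats` are hypothetical names; the default of 10,000 null models follows the slide, and everything else is an assumption.

```python
import random
import statistics
from typing import List, Tuple


def tripeptides(seq: str) -> List[str]:
    """Overlapping tripeptides (window size 3) of a sequence."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def bootstrap_pattern_stats(pool: List[str], region: List[str], pattern: str,
                            n_models: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Z score and empirical one-sided P value for `pattern` being enriched in `region`.

    Each null model draws len(region) tripeptides from `pool` with replacement.
    """
    rng = random.Random(seed)
    observed = region.count(pattern)
    null_counts = [rng.choices(pool, k=len(region)).count(pattern)
                   for _ in range(n_models)]
    mu = statistics.fmean(null_counts)
    sd = statistics.pstdev(null_counts)
    z = (observed - mu) / sd if sd > 0 else 0.0
    # Add-one correction so the empirical P value is never exactly zero.
    p = (1 + sum(c >= observed for c in null_counts)) / (n_models + 1)
    return z, p
```

Using a seeded `random.Random` keeps the null models reproducible between runs.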
Predictors Trained on the Entire Dataset Perform Poorly on Smaller Proteins. [Figure: false positives and false negatives.] The characteristics of small proteins are different, e.g., the percentage of complexes.
Partition Training Set Based on Sequence Length: <200 AA long; >200 AA long
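A sketch of the partitioning step, assuming the training set is a dictionary mapping chain IDs to sequences. A chain of exactly 200 residues is placed in the long group, since the slide's "<200" and ">200" labels leave that case open. `partition_by_length` is a hypothetical name; one predictor would then be trained on each group.

```python
from typing import Dict, List

LENGTH_THRESHOLD = 200  # boundary shown on the slide ("<200 AA" vs ">200 AA")


def partition_by_length(sequences: Dict[str, str],
                        threshold: int = LENGTH_THRESHOLD) -> Dict[str, List[str]]:
    """Split chain IDs into short and long groups by sequence length.

    Assumption: a chain of exactly `threshold` residues counts as long.
    """
    groups: Dict[str, List[str]] = {"short": [], "long": []}
    for chain_id, seq in sequences.items():
        key = "short" if len(seq) < threshold else "long"
        groups[key].append(chain_id)
    return groups
```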
Performance of Wiggle Predictors
Case Study: PvuII Endonuclease (homodimer for DNA-specific cleavage). [Figure panels: FF score; Wiggle 200.]
Conclusions for Wiggle. Gu, Gribskov & Bourne, PLoS Comp. Biol., 2006 (early release)
Exploiting Sequence and Structure Homologs to Identify Protein-Protein Binding Sites. JoLan Chung. Chung, Wang & Bourne, Proteins: Structure, Function, and Bioinformatics, 2006, 62(3): 630-640
Methods to Identify Protein-Protein Binding Sites
Structurally Conserved Surface Residues?
Method: Incorporate Structural Conservation to Predict Interface Residues Using a Support Vector Machine (SVM). [Diagram: sequence + structure information → support vector machine → binding-site location.]
Derive the Structurally Conserved Residues
Structurally Conserved Residues and Interface Residues. Example: residues with the top 20% of structural conservation scores (red) mapped onto adrenodoxin (Adx, PDB code 1E6E:B), which is known to bind adrenodoxin reductase (AR, blue).
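Selecting the "top 20%" residues and checking how many known interface residues they cover can be sketched as follows. This assumes that per-residue structural conservation scores are already available (their derivation is not preserved here) and that the cut-off is rounded to the nearest whole residue. `top_fraction_residues` and `interface_coverage` are hypothetical names.

```python
from typing import Dict, Iterable, List


def top_fraction_residues(scores: Dict[int, float], fraction: float = 0.2) -> List[int]:
    """Residue numbers whose conservation score ranks in the top `fraction` of the chain."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Assumption: round to the nearest whole residue, keeping at least one.
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(ranked[:k])


def interface_coverage(selected: Iterable[int], interface: Iterable[int]) -> float:
    """Fraction of known interface residues that fall in the selected set."""
    interface_set = set(interface)
    if not interface_set:
        return 0.0
    return len(interface_set & set(selected)) / len(interface_set)
```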
Training Dataset
SVM Training
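The SVM set-up (kernel, software, parameters) is not preserved in these slides. As a self-contained illustration only, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method in plain Python. This is a swapped-in technique, not necessarily what Chung, Wang & Bourne used. Each row of `X` stands for one residue's feature vector, and `y` is +1 for interface residues and -1 otherwise (an assumed labeling). `train_linear_svm` and `predict` are hypothetical names.

```python
import random
from typing import List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos training of a linear SVM (hinge loss + L2 penalty)."""
    rng = random.Random(seed)
    # A constant 1.0 feature acts as a (regularized) bias term.
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is evaluated at the current weights, before the update.
            violated = y[i] * _dot(w, data[i]) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 (predicted interface residue) or -1."""
    return 1 if _dot(w, list(x) + [1.0]) >= 0 else -1
```

A kernel SVM, as is common in this kind of work, would replace the dot product with a kernel function; the linear version is used here only because it fits in a short, dependency-free example.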
The Performance of Various Predictors
Predictor 1: sequence profile + ASA (accessible surface area)
Predictor 2: sequence profile + ASA + structural conservation score
Predictor 3: sequence profile + ASA + raw structural conservation score (not weighted by the normalized B-factor)
Predictor 4: sequence profile + ASA + normalized B-factor
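The four predictors differ only in which per-residue features are concatenated. A minimal sketch follows, assuming a 20-value sequence-profile column per residue and assuming that the "normalized B-factor" is a z-score within the chain (the slides define neither). `Residue`, `normalize_b_factors` and `feature_vector` are hypothetical names, and the two conservation scores are taken as precomputed inputs because their definition is not preserved in this transcript.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import List


@dataclass
class Residue:
    profile: List[float]   # sequence-profile column (assumed 20 values)
    asa: float             # accessible surface area
    cons_weighted: float   # structural conservation score (B-factor weighted)
    cons_raw: float        # raw structural conservation score


def normalize_b_factors(b_factors: List[float]) -> List[float]:
    """Assumption: z-score normalization of B-factors within one chain."""
    mu, sd = fmean(b_factors), pstdev(b_factors)
    return [(b - mu) / sd if sd > 0 else 0.0 for b in b_factors]


def feature_vector(res: Residue, norm_b: float, predictor: int) -> List[float]:
    """Feature vector for Predictor 1-4 as listed on the slide."""
    extra = {1: [], 2: [res.cons_weighted], 3: [res.cons_raw], 4: [norm_b]}
    if predictor not in extra:
        raise ValueError("predictor must be 1, 2, 3 or 4")
    return list(res.profile) + [res.asa] + extra[predictor]
```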
The Performance of the Predictors
Precise prediction: at least 70% of interface residues were identified
Correct prediction: at least 50% of interface residues were identified
Partial prediction: some, but fewer than 50%, of interface residues were identified
Wrong prediction: no interface residues were identified
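The four evaluation categories can be applied to a single protein as sketched below. Because a prediction covering 70% of the interface also satisfies the 50% rule, this sketch assumes the strictest matching category is reported. `classify_prediction` is a hypothetical name, and residues are assumed to be identified by residue number.

```python
from typing import Iterable


def classify_prediction(predicted: Iterable[int], interface: Iterable[int]) -> str:
    """Label a binding-site prediction by the fraction of interface residues it recovers."""
    interface_set = set(interface)
    if not interface_set:
        raise ValueError("interface must contain at least one residue")
    frac = len(interface_set & set(predicted)) / len(interface_set)
    # Checked from the strictest category down.
    if frac >= 0.7:
        return "precise"
    if frac >= 0.5:
        return "correct"
    if frac > 0:
        return "partial"
    return "wrong"
```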
Conclusions: Protein-Protein Binding Sites
General Conclusions
Consider Domain Definitions (Holland et al. 2006, JMB, early release; Veretnik et al. 2004, JMB 339(3), 647-678). Number of domains assigned by PUU and by experts (panels A-E):
1fohb: PUU 2, experts 3
1ytf: PUU 1, experts 2
1d0gt: PUU 1, experts 3
1dgk: PUU 6, experts 4
1aoga: PUU 4, experts 3
Challenge: Defining Domain Boundaries from Sequence. Benchmark data available; see Holland et al. 2006, JMB (early release).
Acknowledgements
The structural conservation score

Editor's Notes

1. On "9*29 features": 29 = 20 amino-acid states + 9 transition states (deletion, insertion, match)