1. Machine Learning as Applied to Structural Bioinformatics: Results and Challenges. Philip E. Bourne, University of California San Diego, [email_address]
7. Spectrum of Protein Order and Disorder. Figure labels: ordered structures; disordered structures. If we believe that the three-dimensional structure of a protein is defined by its one-dimensional sequence, then why not its flexibility?
8. Bridging the Sequence-Flexibility Gap. Generalize the sequence-flexibility relationship to identify local protein regions important for allostery.
14. Example: Identifying Functional Flexible Regions (FFR) in HIV Protease. Gu, Gribskov & Bourne, PLoS Comp. Biol., 2006, Early Release. Figure labels: correlated modes (yellow); anti-correlated modes (blue); normalized scores, single chain.
17. Architecture of Wiggle. Diagram labels: captures evolutionary effects; captures local effects (smoothing). 9*29 features are used for each residue.
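To make the 9*29 figure concrete, here is a minimal sketch of how such a per-residue feature vector could be assembled. It takes the 29 from the editor's note (20 amino acid states + 9 transition states) and assumes the 9 is a sliding window of residues centered on the target. That window interpretation, the zero padding at chain ends, and the name `window_features` are assumptions made for illustration, not details given on the slide.

```python
from typing import List

PROFILE_WIDTH = 29  # 20 amino acid states + 9 transition states (per the notes)
WINDOW = 9          # assumed: number of residues in a window centered on the target


def window_features(profile: List[List[float]], index: int) -> List[float]:
    """Concatenate the profile rows in a window centered on `index`.

    Positions that fall off either end of the chain are zero-padded
    (an assumption made for this sketch).
    """
    half = WINDOW // 2
    features: List[float] = []
    for pos in range(index - half, index + half + 1):
        if 0 <= pos < len(profile):
            row = profile[pos]
            if len(row) != PROFILE_WIDTH:
                raise ValueError("each profile row must have 29 values")
            features.extend(row)
        else:
            features.extend([0.0] * PROFILE_WIDTH)
    return features


# Toy profile: 12 residues, each row filled with its own position number.
toy_profile = [[float(i)] * PROFILE_WIDTH for i in range(12)]
vec = window_features(toy_profile, 0)
print(len(vec))  # 261, i.e. 9 * 29
```

Each residue then contributes one fixed-length vector to the classifier, regardless of where it sits in the chain.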
18. Generating Additional Input Features. Modified bootstrapping for tripeptides, which accounts for nearest-neighbor effects. Diagram labels: pooled patterns (window size: 3); sample with replacement 44645 times; sample with replacement 199515 times; null model* for FFR regions; null model* for non-FFR regions; calculate Z score and P value for each pattern with respective null models. * Generate 10,000 null models.
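The bootstrapping step can be sketched roughly as follows. The sketch pools tripeptides from both region types, repeatedly resamples the pool with replacement to build a null distribution of counts for one pattern, and then derives a Z score and an empirical one-sided P value for the observed count. The exact statistic used in the talk is not stated, so the formulas, the toy sequences, the reduced number of null models, and all function names here are assumptions.

```python
import random
import statistics
from typing import List, Tuple


def tripeptides(seq: str) -> List[str]:
    """Return all overlapping windows of size 3 from a sequence."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def null_counts(pool: List[str], n_draws: int, n_models: int,
                pattern: str, rng: random.Random) -> List[int]:
    """Count `pattern` in each of `n_models` resamples of the pooled patterns."""
    counts = []
    for _ in range(n_models):
        sample = rng.choices(pool, k=n_draws)  # sampling with replacement
        counts.append(sample.count(pattern))
    return counts


def z_and_p(observed: int, null: List[int]) -> Tuple[float, float]:
    """Z score against the null mean/sd and an empirical one-sided P value."""
    mean = statistics.fmean(null)
    sd = statistics.pstdev(null)
    z = (observed - mean) / sd if sd > 0 else 0.0
    p = sum(1 for c in null if c >= observed) / len(null)
    return z, p


# Toy data (hypothetical): one flexible region, one non-flexible region.
ffr_patterns = tripeptides("GGGGGGGGGG")
non_ffr_patterns = tripeptides("ACDEFHIKLMNPQRSTVWYACDEFHIK")
pool = ffr_patterns + non_ffr_patterns

rng = random.Random(0)
# The slide uses 10,000 null models; 200 keeps this toy run fast.
null = null_counts(pool, n_draws=len(ffr_patterns), n_models=200,
                   pattern="GGG", rng=rng)
z, p = z_and_p(ffr_patterns.count("GGG"), null)
print(f"Z = {z:.2f}, P = {p:.4f}")
```

In this sketch the number of draws per null model matches the number of patterns observed in the region type being tested, which would be consistent with the two different sample counts on the slide.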
20. Predictors Trained on the Entire Dataset Perform Poorly on Smaller Proteins. Figure labels: false positive; false negative. The characteristics of small proteins are different, e.g. the percentage of complexes.
25. Exploiting Sequence and Structure Homologs to Identify Protein-Protein Binding Sites. JoLan Chung. Chung, Wang & Bourne 2006, Proteins: Structure, Function and Bioinformatics, 62(3), 630-640.
28. Method: Incorporate Structural Conservation to Predict Interface Residues Using an SVM. Diagram labels: sequence + structure information; support vector machine; binding site location.
30. Structurally Conserved Residues and Interface Residues. E.g., residues with the top 20% of structure conservation scores (red) mapped onto adrenodoxin (Adx, PDB code 1E6E:B), which is known to bind adrenodoxin reductase (AR, blue).
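Selecting the top 20% of residues by conservation score can be sketched as below. The residue labels and scores are made up for illustration, and rounding the cutoff up to a whole residue is an assumption; the slide does not say how the threshold was applied.

```python
import math
from typing import Dict, List


def top_fraction(scores: Dict[str, float], fraction: float = 0.2) -> List[str]:
    """Return residue labels whose scores rank in the top `fraction`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, math.ceil(fraction * len(scores)))  # assumed: round up
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical structure conservation scores for ten residues.
toy_scores = {
    "R1": 0.10, "R2": 0.35, "R3": 0.80, "R4": 0.20, "R5": 0.55,
    "R6": 0.05, "R7": 0.95, "R8": 0.40, "R9": 0.30, "R10": 0.60,
}
print(top_fraction(toy_scores))  # ['R7', 'R3']
```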
34. The Performance of Various Predictors. Predictor 1: sequence profile + ASA. Predictor 2: sequence profile + ASA + structural conservation score. Predictor 3: sequence profile + ASA + raw structural conservation score, not weighted by the normalized B-factor. Predictor 4: sequence profile + ASA + normalized B-factor.
35. The Performance of the Predictors. Precise prediction: at least 70% of interface residues were identified. Correct prediction: at least 50% of interface residues were identified. Partial prediction: some, but fewer than 50%, of interface residues were identified. Wrong prediction: no interface residues were identified.
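The four outcome categories can be expressed as a small function. This sketch assumes the percentages refer to the fraction of true interface residues recovered by the prediction, and it returns the most stringent label that applies (a prediction recovering 70% or more is reported as "precise" even though it also meets the 50% bar). Both choices, and the name `classify_prediction`, are assumptions.

```python
from typing import Set


def classify_prediction(predicted: Set[int], interface: Set[int]) -> str:
    """Label a prediction by the share of true interface residues it recovers."""
    if not interface:
        raise ValueError("interface must contain at least one residue")
    hits = len(predicted & interface)
    total = len(interface)
    # Integer comparisons avoid floating-point edge cases at the thresholds.
    if 10 * hits >= 7 * total:
        return "precise"
    if 2 * hits >= total:
        return "correct"
    if hits > 0:
        return "partial"
    return "wrong"


# Hypothetical example: ten true interface residues, numbered 0 to 9.
true_interface = set(range(10))
print(classify_prediction({0, 1, 2, 3, 4, 5, 6, 42}, true_interface))  # precise
print(classify_prediction({0, 1, 2, 3, 4}, true_interface))            # correct
print(classify_prediction({0, 99}, true_interface))                    # partial
print(classify_prediction({50, 51}, true_interface))                   # wrong
```

Note that this measure ignores false positives; a full evaluation would also need a specificity or accuracy term.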
41. Consider Domain Definitions. Holland et al. 2006, JMB, Early Release; Veretnik et al. 2004, JMB 339(3), 647-678. Examples (panels A-E): 1fohb (PUU: 2, Experts: 3); 1ytf (PUU: 1, Experts: 2); 1d0gt (PUU: 1, Experts: 3); 1dgk (PUU: 6, Experts: 4); 1aoga (PUU: 4, Experts: 3).
Editor's notes
29 = 20 amino acid states + 9 transition states (deletion, insertion, match)