The document discusses the RCSB Protein Data Bank (PDB) and how it can be leveraged by UCSD. It provides an overview of the PDB, gives examples of how it has been used in drug discovery research, including work on HIV and tuberculosis proteins, and proposes ways that UCSD could collaborate with the PDB, such as in drug repositioning efforts. The PDB contains information on protein structures that has enabled greater understanding of protein function and evolution and has been instrumental in structure-based drug design.
5. [Figure: depositor locations and download locations for RCSB PDB, PDBe, and PDBj; depositions since 2000] 1/25/11 UCSD Deans and Chairs Meeting
7. [Figures: structure distribution by molecule type (protein only, protein-DNA complexes, DNA only, protein-RNA complexes, RNA only, RNA-DNA hybrid, other); structure determination methods (number of structures by year); resolution distribution for protein structures, other structures, and all structures (resolution by year)]
14. Consider one example of using the corpus as a whole, taken from our own research: high-throughput hypothesis generation for use in drug discovery
19. Map 2 onto 1 – The TB-Drugome http://funsite.sdsc.edu/drugome/TB/ Similarities between the binding sites of M.tb proteins (blue), and binding sites containing approved drugs (red).
Response to stimuli (stress, inflammation, DNA repair) Cellular processes (transcription, translation, RNA splicing, photosynthesis) Other
Bacteriophage T4 lysozyme (458 structures), with many different changes (substitutions, ligands)
Protein folding: Mutagenesis studies suggest that the fraction of amino acid residues that define the structure of T4 lysozyme is about 50%.
Ligand binding: Understanding binding-site flexibility would allow more effective design of ligands or inhibitors.
Relationship between protein activity and stability: A reduction in activity may increase protein stability.
Enzyme catalysis
Matthews, B.W. 1996, The FASEB Journal, v 10, 35-41

Hen egg white lysozyme (257 structures)
Protein evolution
Conclusion: Phage and hen lysozymes show no significant sequence similarity, but their three-dimensional structures are homologous. The interactions between substrate and enzyme are homologous, and the mechanism of catalysis is the same.
Matthews, B.W., Remington, M.G., Grutter, M.G., Anderson, W.F. 1981, J. Mol. Biol., v 147 (4), 545-58

Human lysozyme (196 structures)
Implication in disease (amyloidosis)
Conclusion: The crystal structures of the wild type and the amyloidogenic variants are similar, but the amyloidogenic variants have reduced protein stability and altered folding kinetics.
Booth, R.B., et al. 1997, Nature, 387, 787-93
Whale myoglobin (185 structures)
Looked at: different ligands (carbon dioxide, ...), different substitutions (...)
Protein folding, ligand binding, protein dynamics, protein evolution
Myoglobin: other species looked at
Hemoglobin: different ligands, different species with different sequences; conclusion: same fold (evolutionary)
Conclusion: Sequence similarity between whale and plant myoglobins is ~25%, but their three-dimensional structures and functions are homologous. Globin genes evolved by divergence from one ancestral globin gene; myoglobin and hemoglobin genes diverged into separate subfamilies.
Dickerson, R.E., Geis, I. 1983, Hemoglobin: structure, function, and pathology

Human hemoglobin (178 structures)
Implication in disease (sickle cell anemia, thalassemia)
3,996 proteins in the TB proteome
749 solved structures in the PDB, representing a total of 284 proteins (7.2% coverage)
ModBase contains homology models for the entire TB proteome
1,446 ‘high quality’ homology models were added to the data set
Structural coverage increased to 43.8%

Model selection: Retained only those models with a model score of > 0.7 and a ModPipe quality score of > 1.1 (2,818 models). There were multiple models per protein. For each TB protein, chose the model with the best model score and, if the model scores were equal, the model with the best ModPipe quality score (1,703 models). Of these, 251 (+6) models were removed because they correspond to TB proteins that already have solved structures, leaving 1,446 models.

Model score: A score for the reliability of a model, derived from statistical potentials (F. Melo, R. Sanchez, A. Sali, 2001). A model is predicted to be good when the model score is higher than a pre-specified cutoff (0.7). A reliable model has a probability of the correct fold that is larger than 95%. A fold is correct when at least 30% of its Cα atoms superpose within 3.5 Å of their correct positions.

ModPipe Protein Quality Score (MPQS): A composite score comprising sequence identity to the template, coverage, and the three individual scores e-value, z-DOPE, and GA341. We consider an MPQS of > 1.1 as reliable.
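The model-selection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Model` record, the `select_models` function, and the protein names and scores in the example are hypothetical and are not taken from ModBase or from the original analysis. Only the two cutoffs (model score > 0.7, ModPipe quality score > 1.1), the tie-breaking rule, and the removal of proteins with solved structures come from the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical record for one homology model."""
    protein: str        # identifier of the modeled TB protein
    model_score: float  # reliability score derived from statistical potentials
    mpqs: float         # ModPipe Protein Quality Score


def select_models(models, solved_proteins, min_score=0.7, min_mpqs=1.1):
    """Return one reliable model per protein that lacks a solved structure."""
    # Step 1: keep only models that pass both reliability cutoffs.
    reliable = [m for m in models
                if m.model_score > min_score and m.mpqs > min_mpqs]

    # Step 2: per protein, keep the best model score; ties go to the best MPQS.
    best = {}
    for m in reliable:
        current = best.get(m.protein)
        if current is None or (m.model_score, m.mpqs) > (current.model_score, current.mpqs):
            best[m.protein] = m

    # Step 3: drop proteins that already have a solved structure in the PDB.
    return {p: m for p, m in best.items() if p not in solved_proteins}


# Made-up example data, for illustration only.
models = [
    Model("protA", 0.90, 1.5),
    Model("protA", 0.90, 1.8),   # same model score, better MPQS: wins the tie
    Model("protA", 0.60, 2.0),   # fails the model score cutoff
    Model("protB", 0.95, 1.0),   # fails the MPQS cutoff
    Model("protC", 0.80, 1.2),   # reliable, but protC has a solved structure
    Model("protD", 1.00, 1.3),
]
selected = select_models(models, solved_proteins={"protC"})
print(sorted(selected))
```

Comparing the pair `(model_score, mpqs)` as a tuple implements the stated rule directly: the model score decides first, and the ModPipe quality score is consulted only when the model scores are equal.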
(nutraceuticals excluded)
Multi-target therapy may be more effective than single-target therapy to treat infectious diseases.
Most of the proteins listed are potential novel drug targets for the development of efficient anti-tuberculosis chemotherapeutics.
GSMN-TB: Genome Scale Metabolic Reaction Network of M.tb (http://sysbio/sbs.surrey.ac.uk/tb)
849 reactions, 739 metabolites, 726 genes
Can optimize the model for in vivo growth
Can carry out multiple gene inhibition and compute the maximal theoretical growth rate (if it is close to zero, that combination of genes is essential for growth)
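The multiple-gene-inhibition idea can be illustrated with a small combinatorial screen. The sketch below does not use GSMN-TB and does not perform the flux optimization a genome-scale model would require; it swaps in a hypothetical toy network (`ROUTES`, with invented gene names and capacities) in which growth is simply the summed capacity of the routes that are still intact. What it shares with the approach in the notes is the screening logic: inhibit every combination of genes up to a given size, compute the maximal growth rate, and report combinations whose growth is close to zero.

```python
from itertools import combinations

# Hypothetical toy network: two parallel routes feed biomass production.
# Each route needs all of its genes and can carry a fixed maximal flux.
ROUTES = {
    "route_A": {"genes": {"geneA1", "geneA2"}, "capacity": 1.0},
    "route_B": {"genes": {"geneB1"}, "capacity": 0.5},
}


def max_growth(inhibited):
    """Maximal growth rate of the toy network with the given genes inhibited."""
    inhibited = set(inhibited)
    # A route contributes only if none of its genes is inhibited.
    return sum(route["capacity"] for route in ROUTES.values()
               if not (route["genes"] & inhibited))


def essential_combinations(max_size=2, tol=1e-9):
    """List gene combinations (up to max_size) whose inhibition stops growth."""
    genes = sorted(set().union(*(route["genes"] for route in ROUTES.values())))
    hits = []
    for size in range(1, max_size + 1):
        for combo in combinations(genes, size):
            if max_growth(combo) <= tol:
                hits.append(combo)
    return hits


print(max_growth([]))
print(essential_combinations())
```

In this toy network no single gene is essential, because each route can compensate for the loss of the other, but inhibiting one gene from each route drives growth to zero. That is the kind of result that motivates multi-target therapy.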