Bioinformatics: A Biased Overview
1. A {Biased} Overview of Bioinformatics with Examples Drawn from Our Own Work. Philip E. Bourne, Professor of Pharmacology, UCSD. [email_address] Bioinformatics - Overview
2. There Are Multiple Types of Informatics in the Life Sciences. Pharmacy Informatics: drug dosing, pharmacokinetics, pharmacy information systems. Biomedical Informatics: EHR, decision support systems, hospital information systems. Bioinformatics: algorithms, genomics, proteomics, biological networks, systems biology. Note: These are only representative examples.
3. There Are Multiple Types of Informatics in the Life Sciences. Further examples across Pharmacy Informatics, Biomedical Informatics and Bioinformatics: controlled vocabularies, ontologies, literature searching, data management, pharmacogenomics, personalized medicine. Note: These are only representative examples.
4. Bioinformatics In One Slide [Figure: The Omics Revolution. A biological experiment yields data, information, knowledge and discovery (collect, characterize, compare, model, infer) across increasing complexity (sequence, structure, assembly, sub-cellular, cellular, organ, higher life). A 1990-2005 timeline tracks computing power and sequencing data, with milestones including the E. coli, yeast, C. elegans and human genomes, ESTs, gene chips, virus structure, the ribosome model and the metabolic pathway of E. coli, followed by brain mapping, genetic circuits, neuronal and cardiac modeling and GWAS. A further axis shows the number of people per web site alongside the growth of virtual communities (blogs, Facebook).]
6. Biological Scales (Complexity): genomics, proteomics, protein-protein interactions, biological networks, systems biology. We will look at an example of how bioinformatics is used at each scale.
10. Metagenomics: New Discoveries. [Figure: phylogenetic tree of environmental (red) vs. currently known (blue) PTPases, with higher eukaryotes and regions 1-4 marked.] Bioinformatics at Different Scales - Genomics
12. It’s Not Just About Numbers, It’s About Complexity. [Figure: number of released entries per year. Courtesy of the RCSB Protein Data Bank.] Bioinformatics at Different Scales - Proteomics
15. Nature’s Reductionism. There are ~20^300 possible proteins, vastly more than all the atoms in the Universe. Yet ~20M protein sequences from UniProt/TrEMBL and ~75,000 protein structures yield only ~1500 folds, ~2000 superfamilies and ~4000 families (SCOP 1.75). Using Protein Structure to Study Evolution
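The ~20^300 figure corresponds to a chain of about 300 residues, each drawn from 20 amino acids. Comparing orders of magnitude through logarithms makes the gap concrete. This is a minimal sketch; the 300-residue length is read off the exponent, and the ~10^80 atoms in the observable universe is a commonly cited estimate that the slide does not state.

```python
import math

ALPHABET = 20     # standard amino acids
LENGTH = 300      # assumed chain length behind the "20^300" figure
ATOMS_LOG10 = 80  # assumed order of magnitude of atoms in the universe (~10^80)

def log10_sequence_space(alphabet: int, length: int) -> float:
    """Base-10 exponent of the number of distinct sequences of a given length."""
    # log10(alphabet ** length) == length * log10(alphabet)
    return length * math.log10(alphabet)

exponent = log10_sequence_space(ALPHABET, LENGTH)
print(f"20^300 is about 10^{exponent:.1f}")
print(f"That exceeds 10^{ATOMS_LOG10} atoms by a factor of about 10^{exponent - ATOMS_LOG10:.0f}")
```

Working in log space avoids materializing the 391-digit integer, although Python could hold it exactly.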
17. Method – Distance Determination. A presence/absence data matrix of SCOP fold superfamilies (FSF) per organism is converted into a distance matrix between organisms.

Presence/absence data matrix (1 = superfamily present, 0 = absent):

SCOP superfamily   C. intestinalis   C. briggsae   F. rubripes
a.1.1              1                 1             1
a.1.2              1                 1             1
a.10.1             0                 0             1
a.100.1            1                 1             1
a.101.1            0                 0             0
a.102.1            0                 1             1
a.102.2            1                 1             1

Distance matrix:

                   C. intestinalis   C. briggsae   F. rubripes
C. intestinalis    0                 101           109
C. briggsae                          0             144
F. rubripes                                        0
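The slide does not say which distance measure turns the presence/absence matrix into the distance matrix. As an assumption, the sketch below counts the superfamilies present in one genome but absent from the other (a Hamming distance on the 0/1 columns). It uses only the seven rows excerpted above, so it produces toy values; the slide's distances (101, 109, 144) come from the full superfamily set and will not be reproduced here.

```python
from itertools import combinations

# Column order for the presence/absence tuples below.
ORGANISMS = ["C. intestinalis", "C. briggsae", "F. rubripes"]

# The seven SCOP superfamily rows shown on the slide (1 = present, 0 = absent).
PRESENCE = {
    "a.1.1":   (1, 1, 1),
    "a.1.2":   (1, 1, 1),
    "a.10.1":  (0, 0, 1),
    "a.100.1": (1, 1, 1),
    "a.101.1": (0, 0, 0),
    "a.102.1": (0, 1, 1),
    "a.102.2": (1, 1, 1),
}

def pairwise_distances(organisms, presence):
    """Assumed measure: number of superfamilies whose presence differs between two organisms."""
    dist = {}
    for i, j in combinations(range(len(organisms)), 2):
        dist[(organisms[i], organisms[j])] = sum(row[i] != row[j] for row in presence.values())
    return dist

for (a, b), d in pairwise_distances(ORGANISMS, PRESENCE).items():
    print(f"{a} vs {b}: {d}")
```

A normalized measure, such as one that divides by the number of superfamilies present in either genome, would be an equally plausible reading; swapping it in only changes the body of `pairwise_distances`.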
19. The Influence of Environment on Life. Chris Dupont, Scripps Institution of Oceanography, UCSD. Dupont, Yang, Palenik, Bourne. 2006 PNAS 103(47) 17822-17827
28. Metal-Binding Proteins Are Not Consistent Across Superkingdoms. Since these data are derived from current species, they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis.
31. Do the Metallomes Contain Further Support for This Hypothesis?
36. A Reverse Engineering Approach to Drug Discovery Across Gene Families. Computational Methodology: (1) characterize the ligand binding site of the primary target (geometric potential); (2) identify off-targets by ligand binding site similarity (sequence-order-independent profile-profile alignment); (3) extract known drugs or inhibitors of the primary and/or off-targets; (4) search for similar small molecules; (5) dock molecules to both primary and off-targets; (6) perform statistical analysis of docking score correlations; … Xie and Bourne 2009 Bioinformatics 25(12) 305-312
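The last listed step correlates docking scores. One simple reading, assumed here rather than taken from Xie and Bourne, is a Pearson correlation between the scores of the same ligands docked against the primary target and against a candidate off-target: a strong positive correlation suggests the two sites rank ligands similarly. The score lists are hypothetical placeholders, and `pearson` is written out by hand so the sketch needs only the standard library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical docking scores (more negative = better) for the same five ligands.
primary_scores = [-9.1, -8.4, -7.9, -6.5, -5.2]
off_target_scores = [-8.7, -8.0, -7.1, -6.8, -5.0]

r = pearson(primary_scores, off_target_scores)
print(f"Pearson r = {r:.2f}")
```

In practice one would also want a significance estimate (for example against scores for unrelated proteins) before calling a site a likely off-target; that step is left out of this sketch.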
41. Map 2 onto 1 – The TB-Drugome. http://funsite.sdsc.edu/drugome/TB/ Similarities between the binding sites of M.tb proteins (blue) and binding sites containing approved drugs (red).
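Mapping 2 onto 1 can be pictured as a bipartite network: an M.tb protein is linked to an approved drug whenever one of its binding sites is sufficiently similar to a site known to bind that drug. The sketch assumes the binding-site similarity scores have already been computed; the protein and site names, the scores and the 0.8 cut-off are all hypothetical, not values from the TB-Drugome.

```python
from collections import defaultdict

# Hypothetical inputs: (M.tb protein, drug-binding site, similarity score).
SIMILARITIES = [
    ("mtb_protein_1", "siteX", 0.91),
    ("mtb_protein_2", "siteX", 0.42),
    ("mtb_protein_2", "siteY", 0.85),
    ("mtb_protein_3", "siteZ", 0.30),
]
# Hypothetical annotation: drug-binding site -> approved drug bound there.
SITE_TO_DRUG = {"siteX": "drug_A", "siteY": "drug_B", "siteZ": "drug_C"}

def build_drugome(similarities, site_to_drug, threshold=0.8):
    """Map each drug to the set of M.tb proteins with a sufficiently similar binding site."""
    drug_to_targets = defaultdict(set)
    for protein, site, score in similarities:
        if score >= threshold:
            drug_to_targets[site_to_drug[site]].add(protein)
    return dict(drug_to_targets)

for drug, proteins in sorted(build_drugome(SIMILARITIES, SITE_TO_DRUG).items()):
    print(drug, sorted(proteins))
```

Inverting the dictionary gives the protein-centric view (which approved drugs might hit a given M.tb protein); both directions describe the same network.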
2D hyperbolic view of the phylogenetic tree, colored by the origin of the sequences (red: ocean data set from CVI; blue: NCBI NR). The alignment was performed with MUSCLE on sequences identified by PDB-BLAST searches of a combined ocean80_nr80 database. Visualization by the HyperTree program from Sugen.
Tuberculosis, which is caused by the bacterial pathogen Mycobacterium tuberculosis, is a leading cause of mortality among the infectious diseases. The World Health Organization (WHO) has estimated that almost one-third of the world's population, around 2 billion people, is infected with the disease. Every year more than 8 million people develop an active form of the disease, which claims the lives of nearly 2 million. This translates to over 4,900 deaths per day, more than 95% of them in developing countries.

Despite this global situation, antitubercular drugs have remained largely unchanged over the last four decades. The widespread use of these agents has exerted strong selective pressure on M. tuberculosis, encouraging the emergence of resistant strains. Multidrug-resistant (MDR) tuberculosis is defined as resistance to the first-line drugs isoniazid and rifampin. Effective treatment of MDR tuberculosis requires long-term use of second-line drug combinations, an unfortunate consequence of which is the emergence of further drug resistance. Enter extensively drug-resistant (XDR) tuberculosis: M. tuberculosis strains that are resistant to both isoniazid and rifampin as well as to key second-line drugs. Since the only remaining drug classes have low potency and high toxicity, XDR tuberculosis is extremely difficult to treat. The worldwide rise of XDR tuberculosis poses a serious threat to human health, making the development of new antitubercular agents an urgent priority.

Very few Mtb proteins have been explored as drug targets.
3,996 proteins in the TB proteome. 749 solved structures in the PDB, representing a total of 284 proteins (7.2% coverage). ModBase contains homology models for the entire TB proteome; 1,446 ‘high quality’ homology models were added to the data set, increasing structural coverage to 43.8%.

Model selection: only models with a model score > 0.7 and a ModPipe quality score > 1.1 were retained (2818 models). Because there were multiple models per protein, the model with the best model score was chosen for each TB protein, with the best ModPipe quality score used to break ties (1703 models). Of these, 251 (+6) models were removed because they correspond to TB proteins that already have solved structures, leaving 1446 models.

Model score: a score for the reliability of a model, derived from statistical potentials (F. Melo, R. Sanchez, A. Sali, 2001). A model is predicted to be good when its model score is higher than a pre-specified cut-off (0.7). A reliable model has a probability of more than 95% of having the correct fold, where a fold is correct when at least 30% of its Cα atoms superpose within 3.5 Å of their correct positions.

The ModPipe Protein Quality Score (MPQS) is a composite score comprising sequence identity to the template, coverage, and the three individual scores E-value, z-DOPE and GA341. We consider an MPQS > 1.1 reliable.
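The selection procedure above (two strict cut-offs, one best model per protein with MPQS as tie-breaker, then removal of proteins that already have solved structures) can be sketched as follows. The `Model` record, the protein identifiers and the toy scores are hypothetical; real inputs would come from ModBase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    protein: str        # hypothetical TB protein identifier
    model_score: float  # model reliability score (cut-off 0.7)
    mpqs: float         # ModPipe Protein Quality Score (cut-off 1.1)

def select_models(models, solved, min_score=0.7, min_mpqs=1.1):
    """Return one model per protein, excluding proteins that have a solved structure."""
    best = {}
    for m in models:
        # Both cut-offs are strict: model score > 0.7 and MPQS > 1.1.
        if m.model_score <= min_score or m.mpqs <= min_mpqs:
            continue
        current = best.get(m.protein)
        # Tuples compare element by element: model score first, MPQS breaks ties.
        if current is None or (m.model_score, m.mpqs) > (current.model_score, current.mpqs):
            best[m.protein] = m
    # Drop proteins already covered by an experimental structure.
    return {p: m for p, m in best.items() if p not in solved}

def structural_coverage(n_proteome, solved, modelled):
    """Percentage of the proteome with either a solved structure or a kept model."""
    return 100.0 * (len(solved) + len(modelled)) / n_proteome

# Hypothetical toy data, not taken from ModBase.
models = [
    Model("P1", 0.90, 1.20),
    Model("P1", 0.90, 1.50),  # same model score, better MPQS: wins the tie
    Model("P2", 0.75, 1.30),
    Model("P2", 0.95, 1.15),  # higher model score: wins
    Model("P3", 0.60, 2.00),  # fails the model-score cut-off
    Model("P4", 0.80, 1.30),  # protein already has a solved structure
]
solved = {"P4", "P5"}
kept = select_models(models, solved)
print(sorted(kept), structural_coverage(10, solved, kept))
```

Because the solved and modelled sets are disjoint after the last filter, coverage is simply their combined size over the proteome size.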
(nutraceuticals excluded)
Multi-target therapy may be more effective than single-target therapy for treating infectious diseases. Most of the proteins listed are potential novel drug targets for the development of efficient anti-tuberculosis chemotherapeutics. GSMN-TB: Genome-Scale Metabolic Reaction Network of M.tb (http://sysbio/sbs.surrey.ac.uk/tb), with 849 reactions, 739 metabolites and 726 genes. The model can be optimized for in vivo growth. Multiple genes can then be inhibited together and the maximal theoretical growth rate computed; if it is close to zero, that combination of genes is essential for growth.
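GSMN-TB computes a maximal growth rate with a constraint-based (linear programming) model. The sketch below swaps in a much simpler boolean stand-in: on a hypothetical five-reaction network, a knockout set "grows" if biomass is still reachable from the nutrients. It therefore answers only grow/no-grow, but it shows the logic of a multiple-gene inhibition screen, namely finding gene pairs that are individually dispensable yet jointly lethal. All metabolite, reaction and gene names are invented for illustration.

```python
from itertools import combinations

# Hypothetical toy network: (substrates, products, genes).
# A reaction can run if ANY of its genes is still intact (isozyme "OR" rule).
REACTIONS = [
    ({"glc"}, {"g6p"}, {"gA"}),
    ({"g6p"}, {"pyr"}, {"gB"}),
    ({"g6p"}, {"r5p"}, {"gE"}),  # alternative route, step 1
    ({"r5p"}, {"pyr"}, {"gF"}),  # alternative route, step 2
    ({"pyr"}, {"biomass"}, {"gD"}),
]
NUTRIENTS = {"glc"}

def can_grow(knocked_out):
    """Boolean stand-in for growth: is biomass reachable with these genes removed?"""
    knocked_out = set(knocked_out)
    available = set(NUTRIENTS)
    changed = True
    while changed:  # expand the set of producible metabolites to a fixed point
        changed = False
        for substrates, products, genes in REACTIONS:
            active = bool(genes - knocked_out)
            if active and substrates <= available and not products <= available:
                available |= products
                changed = True
    return "biomass" in available

def synthetic_lethal_pairs():
    """Pairs of individually non-essential genes whose joint knockout blocks growth."""
    genes = sorted(set().union(*(g for _, _, g in REACTIONS)))
    viable_singles = [g for g in genes if can_grow({g})]
    return [pair for pair in combinations(viable_singles, 2) if not can_grow(pair)]

print("essential genes:", [g for g in ["gA", "gB", "gD", "gE", "gF"] if not can_grow({g})])
print("synthetic lethal pairs:", synthetic_lethal_pairs())
```

Reachability ignores stoichiometry and flux limits, so it can call a knockout viable when a flux-based model would predict a growth rate near zero; it is a teaching simplification, not a replacement for the GSMN-TB analysis.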