Diese Präsentation wurde erfolgreich gemeldet.
Die SlideShare-Präsentation wird heruntergeladen. ×

Introduction to Chemoinformatics

Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige

Hier ansehen

1 von 33 Anzeige

Introduction to Chemoinformatics

Herunterladen, um offline zu lesen

AACIMP 2010 Summer School lecture by Igor Tetko. "Physics, Сhemistry and Living Systems" stream. "Chemoinformatics" course.
More info at http://summerschool.ssa.org.ua

AACIMP 2010 Summer School lecture by Igor Tetko. "Physics, Сhemistry and Living Systems" stream. "Chemoinformatics" course.
More info at http://summerschool.ssa.org.ua

Anzeige
Anzeige

Weitere Verwandte Inhalte

Diashows für Sie (20)

Andere mochten auch (20)

Anzeige

Ähnlich wie Introduction to Chemoinformatics (20)

Weitere von SSA KPI (20)

Anzeige

Aktuellste (20)

Introduction to Chemoinformatics

  1. 1. Introduction to Chemoinformatics Dr. Igor V. Tetko Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH) Institute of Bioinformatics & Systems Biology (HMGU) Kyiv, 13 August 2010, V Summer School AACIMP 2010
  2. 2. Helmholtz Zentrum München statistics •  Part of Helmholtz Network (€3 Milliards, 30000 people) •  Leading center for Environmental Health in Germany •  25 institutes (1789 people, 569 scientists & 280 PhD students)* •  70 contracts with EU •  Disciplines •  Biology 40% •  Chemistry/biochemistry 15% •  Physics 10% •  Medicine 8% •  Chemoinformatics group, Institute for Bioinformatics & Systems Biology   9 peoples, strong expertise in in silico data analysis, machine learning methods, chemoinfomatics software development, data dissemination *January 2010
  3. 3. Layout •  Chemoiformatics??? What does it mean? •  Role of chemoinformatics •  in drug discovery •  Chemical Biology, NIH Roadmaps •  REACH, chemical safety •  Definitions of chemoinformatics •  OCHEM •  Overview of the demonstration course
  4. 4. Cheminformatics & Bioinformatics keywords in SCOPUS 250 70000 2000 200 2001 60000 2000 2002 2001 50000 2002 150 2003 40000 2003 2004 2004 100 2005 30000 2005 2006 2006 20000 2007 50 2007 2008 10000 2008 2009 0 2009 0 Chemoinfomatics Cheminformatics Bioinfomatics There was no dedicated journal, about 10x lower if journals many authors does not explicitly use world and references are exclued chemoinformatics
  5. 5. Dmitrii Ivanovich Mendeleev, 1834-1907 Discoverer of the Periodic Table — An Early “Chemoinformatician”
  6. 6. Why Mendeleev? Faced with a large amount of data, with many gaps, Mendeleev: •  Sought patterns where none were obvious, •  Made predictions about properties of unknown chemical substances, based on observed properties of known substances, •  Created a great visualization tool! Why he? Why at that time? “The Law of transformation of Quantity into Quality” by Karl Marx
  7. 7. Quantity aspects of chemoinformatics
  8. 8. Major challenges of Chemoinformatics Millions of structures Thousand of publications Storage, organization and search of information QSAR / QSPR studies: prediction of properties and activities Chemical biology & drug discovery In silico environmental toxicology, REACH QSAR/QSPR - Quantitative Structure-Activity (Property) Relationship Studies
  9. 9. Goal of chemistry: Synthesis of Properties The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties George S. Hammond Norris Award Lecture, 1968
  10. 10. Where Chemoinformatics is required? Pharma companies and public organisation •  data collection and handling •  model development QSAR/QSPR, in silico design •  In vitro, in vivo data analysis and interpretation REACH •  > 140,000 compounds •  Development of in silico methods to predict toxicity of chemical compounds Chemical industry •  Design of new properties; chemical synthesis •  Prediction of toxicity of chemicals BEFORE they enter market
  11. 11. Pharma R&D: Cost and Productivity issues Compound numbers 1.000.000 Cpds <5000 Cpds < 500 Cpds < 5 Cpds Up to 15 Years: 1 drug Lead Lead Optimisation Lead Profiling Clinics Approval Target Discovery Discovery ADME/T ADME/T In vitro and in vivo property determination: Millions of screens for solubility, stability, absorption, metabolism, transport, reactive products, drug interactions, etc etc Preclinics Costs: > $300m PER COMPOUND to reach approval
  12. 12. Pharma: Declining R&D productivity in the drug discovery industry Approved medicine See http://www.frost.com/prod/servlet/market-insight-top.pag?docid=128394740
  13. 13. Pharma R&D Cost and productivity: Reasons for compound failure Pharmacokinetics Animal toxicity Adverse effects Lack of efficacy Commercial reasons Miscellaneous TOP four reasons are connected to compound Absorption, Distribution, Metabolism and Excretion, all of which may contribute to lack of efficacy and Toxicity: ADME/T issues Experimental measurements or computational predictions? Chemoinformatics!
  14. 14. Public organizations: NIH Roadmaps Dr. Elias Zerhouni Former NIH director (2002-2008) $29.5 billions in 2008 for 27 Institutes Ukraine GDP is ca $127 billions Chemical Biology Belarus GDP is ca $48 billions
  15. 15. Public organizations: NIH Roadmaps New pathways to discovery Molecular Libraries Screening Center Network •  understand complex biological (MLSCN) systems •  10 centers were created •  understand their connections •  250 assays to screen >200,000 molecules •  build better "toolbox" for researchers Chemoinfomatics! Research teams of the future •  PubChem database to handle data Re-engineering the clinical research team
  16. 16. REACH Registration, Evaluation, Authorisation and Restriction of Chemical substances European Chemicals Agency (ECHA) in Helsinki
  17. 17. REACH and Environmental Chemoinformatics > 140,000 chemicals to be registered It is expensive to measure all of them ($200,000 per compound), a lot of animal testing => €24 billions over next 10 years QSAR models can be used to prioritize compounds •  Compound is predicted to be toxic •  Biological testing will be done to prove/ disprove the models •  Compound is predicted to be not toxic •  tests can be avoided, saving money, animals •  but ... only if we are confident in the predictions
  18. 18. "One can not embrace the unembraceable.” Possible: 1060 - 10100 molecules theoretically exist Achievable: 1020 - 1024 can be synthesized now by companies (weight of the Moon is ca 1023 kg) Available: 2*107 molecules are on the market Measured: 102 - 105 molecules with ADME/T data Kozma Prutkov Problem: To predict ADME/T properties of just molecules 1080 atoms in the Universe on the market we must extrapolate data from one to 1,000 - 100,000 molecules!
  19. 19. Applicability domain: Key points There are NO universal computational models that work well If x is small, on the whole chemical space Sin(x) ≈ x
  20. 20. Applicability domain: Key points There are NO universal computational models that work well on the whole chemical space Performance on the test set Performance on a dissimilar set
  21. 21. Making predictions with the experimental accuracy: case study of 96,000 Pfizer molecules Non-reliable Reliable 60% of molecules are predicted with average accuracy of 0.3 log units Tetko et al, Chem. & Biodiver., 2009, 6(11), 197-202.
  22. 22. Definitions of Chemoinformatics Chemoinformatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information. G. Paris, 1988 Chemoinformatics is the mixing of those information resources to transform data in to information and information in to knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization. F.K. Brown, 1998 Chemoinformatics is the application of informatics methods to solve chemical problems. J. Gasteiger, 2004 Chemoinformatics is a field dealing with molecular objects (graphs, vectors) in multidimensional chemical space. A. Varnek & A. Tropsha, 2007.
  23. 23. Chemoinfomatics: data to knowledge
  24. 24. What is required to create a good (chemoinformatics) model? •  Data is the most essential part! •  Appropriate representation of molecules (choice of descriptors is crucial) •  Good statistical methods
  25. 25. >100,000 lines of code in Java
  26. 26. Database schema Simplified overview Property Condition Filtering Toxicology, Biology, Temperature, logP = 0.5 Partition coefficient. pH, species, Melting Point = 100 tissue, method °C Tags Data Point Introducer Bill G., Sergey B. Toxicology, Biology, Partition coefficient. Date of modification Informationsystem Structure Article Manipulation Benzene, Urea, ... Editing Garberg, P Organization “In vitro models for …” Working sets<
  27. 27. Practical course on using OCHEM and data modeling on-line Tomorrow 10:00-12:00 auditorium 235-6 Advice: make yourself familiar with the on-line tutorials available as http://ochem/tutorials If you have data (SMILES/sdf file and molecular activities) – bring it with you – we can try to build model.
  28. 28. HMGU Team
  29. 29. Interested in Short-Term stays (3-12 months)? Apply!!! http://www.eco-itn.eu

×