This document discusses open source bioinformatics tools and resources for data scientists working in drug discovery. It provides an overview of recent projects involving druggability prediction, protein structure and function prediction, and identification of new targets for cancer. It also summarizes key steps in the drug discovery process and some of the main challenges, including drug resistance and tumor heterogeneity. Resources mentioned include databases of protein structures, drug data, gene expression and pathways involved in DNA damage response.
2. Recent Projects
! Druggability prediction
- 3D structure
- Protein sequence
- Predict a protein's druggability based on its position in the protein-protein interaction network
! Drug resistance
! Therapeutic opportunities
- Identification of new gene targets for cancer
- Are they druggable?
! Candidate compounds
- Compounds more likely to be a hit for a bioassay
3. Drug Discovery Process
! Early stage: Discovery
- Target evaluation
- Compound screening
- Computational chemistry
- Structure-based drug design
! Optimisation: ADMET
- Absorption, Distribution, Metabolism, Excretion, Toxicity
! Clinical trials
- Patient stratification
- Protocol
! Paperwork
- Drug approval
4. Biology 101
! There is a many-to-many relationship between genes and proteins
! A protein is a large molecule; a drug is a small molecule
! Gene expression data
- The amount of gene product (mRNA) produced; regulated in part by epigenetics
- Highly / lowly expressed; over- / under-expressed, measured as fold change
- Warning: values depend on the platform and preprocessing
! Gene copy number
- Loss / gain of a gene
- On one copy (allele) or both?
! There are only approx. 400 targets of approved pharmaceuticals
- From only a handful of protein families
- Desperate need for diversity
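Fold change is usually reported on a log2 scale, so a 2-fold increase is +1 and a 2-fold decrease is -1. A minimal Python sketch, assuming expression values that are already normalised on the same platform; the pseudocount of 1, the 2-fold threshold and the gene names are illustrative assumptions, not values from this deck:

```python
import math


def log2_fold_change(case: float, control: float, pseudocount: float = 1.0) -> float:
    """log2((case + pseudocount) / (control + pseudocount)).

    The pseudocount (an assumed default of 1) avoids log(0) and division
    by zero for genes with no measured expression.
    """
    return math.log2((case + pseudocount) / (control + pseudocount))


def call_regulation(lfc: float, threshold: float = 1.0) -> str:
    """Label a gene from its log2 fold change; threshold 1.0 = 2-fold change."""
    if lfc >= threshold:
        return "over-expressed"
    if lfc <= -threshold:
        return "under-expressed"
    return "unchanged"


# Hypothetical normalised expression values: (tumour, matched normal)
expression = {
    "GENE_A": (310.0, 75.0),
    "GENE_B": (12.0, 50.0),
    "GENE_C": (40.0, 44.0),
}
calls = {gene: call_regulation(log2_fold_change(t, n)) for gene, (t, n) in expression.items()}
print(calls)
# {'GENE_A': 'over-expressed', 'GENE_B': 'under-expressed', 'GENE_C': 'unchanged'}
```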
6. Target Identification
! Prediction of disease-associated genes
- Patient level
- Gene / protein level
- Network
! Prediction of mechanisms of disease
! Epigenetic targets – meta-targets
! Prediction of protein function from sequence / structure / network
- Multi-class; multi-label
! Prediction of 3D structure
! Prediction of protein binding
! New immune targets
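The multi-class vs multi-label distinction above is about how function labels are encoded. In a multi-class setup each protein gets exactly one label (one-hot); in a multi-label setup it can carry several at once (multi-hot), often handled by training one binary classifier per label. A minimal sketch; the label vocabulary is hypothetical:

```python
# Hypothetical function-label vocabulary, for illustration only
LABELS = ("kinase", "receptor", "transporter", "enzyme")


def one_hot(label: str, labels: tuple[str, ...] = LABELS) -> list[int]:
    """Multi-class target: exactly one position is 1."""
    if label not in labels:
        raise ValueError(f"unknown label: {label}")
    return [1 if lab == label else 0 for lab in labels]


def multi_hot(assigned: set[str], labels: tuple[str, ...] = LABELS) -> list[int]:
    """Multi-label target: any subset of positions can be 1."""
    unknown = assigned - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if lab in assigned else 0 for lab in labels]


print(one_hot("receptor"))              # [0, 1, 0, 0]
print(multi_hot({"kinase", "enzyme"}))  # [1, 0, 0, 1]
```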
7. Druggability Prediction
! Drugs: FDA-approved, ~350 targets. Very strict: therapeutic benefit is known
! DrugBank: loose; the compound binds the target but has no proven therapeutic benefit
! Tractable or Druggable
! Rule of 5 compliant
! Precedence-based
- Druggable families / Homology
- Ligand-based scoring
- UniProt; bioassays from EBI and PubChem BioAssay
- Statistical analysis
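Rule-of-5 compliance refers to Lipinski's rule of thumb for oral drug-likeness: molecular weight ≤ 500 Da, calculated logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors. Lipinski flagged compounds that break more than one of these, so this sketch allows one violation by default. The descriptor values would come from a cheminformatics toolkit (for example the ChemAxon tools linked in this deck); the compounds below are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float       # Da
    clogp: float            # calculated octanol/water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(c: Compound) -> int:
    """Count how many of Lipinski's four criteria the compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def ro5_compliant(c: Compound, max_violations: int = 1) -> bool:
    """Compliant if no more than max_violations criteria are broken."""
    return ro5_violations(c) <= max_violations


# Hypothetical descriptor values
small = Compound("cmpd_1", mol_weight=350.4, clogp=2.1, h_bond_donors=2, h_bond_acceptors=5)
large = Compound("cmpd_2", mol_weight=720.9, clogp=6.3, h_bond_donors=4, h_bond_acceptors=12)
for c in (small, large):
    print(c.name, ro5_violations(c), ro5_compliant(c))
```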
8. Druggability Prediction
! Sequence Analysis
- Amino Acid motifs and composition
- Physicochemical descriptors
- Practically unlimited number of descriptors: a very wide data set (many more features than samples)
- Supervised classification
! FASTA: all human protein sequences can be downloaded from UniProt
>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTD
! R: protr; R Bioconductor
! species,mhc,peptide_length,A,R,N,D,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y,V,scl1.lag1,scl2.lag1,scl1.lag2,scl2.lag2,scl1.2.lag1,scl2.1.lag1,scl1.2.lag2,scl2.1.lag2,AA,RA,NA,DA,CA,EA,QA,GA,HA,IA,LA,KA,MA,FA,PA,SA,TA,WA,YA,VA,AR,RR,NR,DR,CR ..... ,Schneider.Xr.K,Schneider.Xr.M,Schneider.Xr.F,Grantham.Xr.A,Grantham.Xr.R,
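Amino acid composition (the A, R, N, ... columns) and dipeptide composition (the AA, RA, ... columns) are the simplest of these sequence descriptors. A minimal pure-Python sketch of the same kind of features, not protr itself: it normalises counts to fractions and its column order may differ from protr's output. It uses the seq0 record above:

```python
from itertools import product

# The 20 standard amino acids, in the order of the A,R,N,... columns above
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record id: sequence}; the id is the first word of the header."""
    records: dict[str, str] = {}
    current_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            words = line[1:].split()
            current_id, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def aa_composition(seq: str) -> dict[str, float]:
    """Fraction of each standard amino acid (20 features)."""
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> dict[str, float]:
    """Fraction of each of the 400 ordered pairs of adjacent residues (assumes len(seq) >= 2)."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    n_pairs = len(seq) - 1
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues (e.g. X)
            counts[pair] += 1
    return {pair: c / n_pairs for pair, c in counts.items()}


fasta = """>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTD
"""
seq = parse_fasta(fasta)["seq0"]
features = {**aa_composition(seq), **dipeptide_composition(seq)}
print(len(features))  # 420 = 20 + 400
```

For a full UniProt download, Biopython's `SeqIO.parse(handle, "fasta")` can replace the hand-written parser.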
9. Druggability Prediction
! 3D structure
- Pockets, surface area
- Ligand interaction fingerprints
- Supervised classification
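A ligand interaction fingerprint records which binding-site residues make which type of contact with a ligand; two pockets or poses can then be compared with the Tanimoto coefficient, |A ∩ B| / |A ∪ B|, and the similarities or the bits themselves used as classifier features. A minimal sketch that stores each fingerprint as a set of (residue, interaction type) bits; the residues and interactions are hypothetical, and real fingerprints would be computed from a protein-ligand structure by a dedicated tool:

```python
Bit = tuple[str, str]  # (residue, interaction type)


def tanimoto(a: set[Bit], b: set[Bit]) -> float:
    """Tanimoto coefficient of two binary fingerprints stored as sets of 'on' bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical interaction fingerprints for two ligand poses
pose_1 = {("LYS33", "hbond"), ("GLU81", "hbond"), ("LEU134", "hydrophobic"), ("PHE80", "aromatic")}
pose_2 = {("LYS33", "hbond"), ("LEU134", "hydrophobic"), ("ASP145", "ionic")}
print(tanimoto(pose_1, pose_2))  # 2 shared bits / 5 distinct bits = 0.4
```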
11. Druggability Prediction
! Interaction Network
! Many use cases
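! Data from EBI and yeast two-hybrid (Y2H) screens
! List of binary interactions
! Be careful 1: the data is inherently biased (well-studied proteins have more recorded interactions)
! Be careful 2: complex interactions (e.g. multi-protein complexes) do not reduce neatly to binary pairs
! R igraph; Gephi for visualisation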
! Topological properties
! Community analysis
! Subgraph analysis
! Statistical analysis, network analysis and supervised
classification
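Topological properties are computed directly from the list of binary interactions. A minimal pure-Python sketch of two common ones, degree and the local clustering coefficient (the fraction of a protein's interaction partners that also interact with each other); in R igraph, `degree()` and `transitivity(g, type = "local")` cover the same ground at scale. The protein identifiers are hypothetical, and a high degree partly reflects how well studied a protein is:

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets; duplicate edges and self-interactions are dropped."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degree(adj: dict[str, set[str]]) -> dict[str, int]:
    """Number of distinct interaction partners per protein."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def local_clustering(adj: dict[str, set[str]], node: str) -> float:
    """Fraction of pairs of the node's neighbours that are themselves connected."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# Hypothetical binary interactions (the last one duplicates the first)
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "P4"), ("P4", "P5"), ("P2", "P1")]
adj = build_graph(edges)
print(degree(adj))
print({n: round(local_clustering(adj, n), 2) for n in sorted(adj)})
```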
15. Compound Bioactivity
! Brute-force mass screening
! Thousands of compounds screened in batches
! Primary assays; secondary / confirmatory assays
! Can be binary classification or regression
! IC50: the concentration of a compound that inhibits the target or process by 50%; the lower the IC50, the more potent the compound
! Active / inactive: defined by an IC50 threshold
! Goal is also to identify diverse compound structures
- Scaffold hopping
! Same kind of method as the protein sequence conversion: compounds are converted to numeric descriptors
- Pharmacophore fingerprints
! https://www.chemaxon.com/free-software/
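Because IC50 values span orders of magnitude, they are usually converted to pIC50 = -log10(IC50 in mol/L) before regression, and thresholded for binary classification. A minimal sketch; the 10 µM activity cut-off and the compound values are assumptions for illustration, and the right threshold depends on the assay:

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 [M]); for IC50 in nM this equals 9 - log10(IC50 [nM])."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def label_activity(ic50_nm: float, threshold_nm: float = 10_000.0) -> str:
    """Binary label: lower IC50 means more potent, so active if IC50 <= threshold."""
    return "active" if ic50_nm <= threshold_nm else "inactive"


# Hypothetical IC50 measurements in nM
measurements = {"cmpd_1": 35.0, "cmpd_2": 48_000.0, "cmpd_3": 2_500.0}
for name, ic50 in measurements.items():
    print(name, round(pic50_from_nm(ic50), 2), label_activity(ic50))
```

The pIC50 column is a natural regression target; the active / inactive label is the binary-classification target.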
17. Compound ADMET
! Many use cases
! ADMET of hits
! Absorption
! Distribution
! Metabolism
! Excretion
! Toxicity
! Mutagenicity
! Protein binding
18. General Resources
! EBI (European Bioinformatics Institute) / PubChem
- API
- Integrates several downloadable data sources (expression, copy number, bioassays, network, disease-specific)
- Baseline data (normal, not diseased)
! Protein Data Bank: 3D structures
! DrugBank
! Cancer: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC)
! Coding tools: R Bioconductor, BioPerl, Biopython
! https://docs.chemaxon.com/display/docs/Documentation
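As an example of the API route, PubChem's PUG REST interface returns compound properties for a URL of the form .../compound/name/<name>/property/<properties>/JSON. A minimal standard-library sketch; the helper names are hypothetical, the fetch needs network access, and PubChem's documentation should be checked for current usage limits:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(name: str, properties: list[str]) -> str:
    """Build a PUG REST URL that looks up properties of a compound by name."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/{','.join(properties)}/JSON"


def fetch_properties(name: str, properties: list[str]) -> list[dict]:
    """Fetch the properties over the network; returns one dict per matching compound."""
    with urlopen(compound_property_url(name, properties), timeout=30) as response:
        data = json.load(response)
    return data["PropertyTable"]["Properties"]


url = compound_property_url("aspirin", ["MolecularWeight", "XLogP", "HBondDonorCount", "HBondAcceptorCount"])
print(url)
# fetch_properties("aspirin", ["MolecularWeight", "XLogP"])  # requires internet access
```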
19. General Resources
! canSAR database
! Integration of biological, pharmacological, chemical, structural
biology and protein network data
20. Beware 101
! Non-standard gene names
! Some experiments measure genes, some measure proteins
! We need new drug targets, different from established ones
- Keep this in mind when analysing results
! Cancer is difficult
- Drug resistance
- Data has not caught up with the science
- Tumour heterogeneity
! Wide data (many more features than samples) = chance patterns
! Different expression / sequencing platforms
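Why wide data produces chance patterns: with many more features than samples, some purely random features will correlate strongly with any outcome. A minimal simulation under assumed sizes (10 samples, 1000 random features); no feature here carries any real signal, yet several typically reach |r| > 0.6:

```python
import random
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n_samples, n_features = 10, 1000

# Outcome and features are independent Gaussian noise
outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
features = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]

abs_r = [abs(pearson(f, outcome)) for f in features]
print(f"max |r| = {max(abs_r):.2f}; features with |r| > 0.6: {sum(r > 0.6 for r in abs_r)}")
```

Any "best" features selected from such data need validating on independent samples.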
21. Therapeutic Opportunities
! Only approximately 350-400 protein targets of approved drugs
! DNA damage response (DDR) is essential for maintaining
the genomic integrity of the cell
! Currently targeted by chemotherapy and radiation; the goal is small-molecule targeting
! TCGA Patient Analysis: Expression, Copy Number Variation
and Mutation data.
! 15 cancer disease types
! Telegraph March 2015
! New drugs to tackle cancer cell weak spots could end
'scattergun' chemotherapy
Laurence H. Pearl, Amanda C. Schierz, Simon E. Ward, Bissan Al-Lazikani, Frances M. G. Pearl. Therapeutic opportunities within the DNA damage response. Nature Reviews Cancer (2015).
22. Therapeutic Opportunities
! Statistical analysis of DDR deregulation in patients compared
to a random set of genes
! Druggability prediction of deregulated DDR genes
! Synthetic lethality analysis of yeast DDR orthologues
! Two genes are synthetic lethal if mutation of either one alone is tolerated (the cell survives) but mutation of both leads to cell death. Inhibiting the synthetic-lethal partner of a cancer-specific mutation should, in theory, kill only the cancer cells that carry the mutation and spare normal cells.
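The comparison against a random set of genes can be framed as a permutation test: score each gene (for example by the fraction of patients in which it is deregulated), then ask how often random gene sets of the same size score at least as high as the DDR set. A minimal sketch; the scoring, the gene names and the scores are hypothetical, and this is not necessarily the statistic used in the cited study:

```python
import random
import statistics


def permutation_p_value(scores: dict[str, float], gene_set: set[str],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided empirical p-value that gene_set has a higher mean score than
    random gene sets of the same size drawn from all scored genes.
    Assumes at least one gene of gene_set has a score."""
    members = [g for g in gene_set if g in scores]
    observed = statistics.fmean(scores[g] for g in members)
    all_genes = list(scores)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(members))
        if statistics.fmean(scores[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction: the observed set counts as one of the permutations
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-gene deregulation scores (fraction of patients affected)
rng = random.Random(1)
scores = {f"BG_{i}": rng.uniform(0.0, 0.3) for i in range(200)}
ddr_genes = {f"DDR_{i}" for i in range(10)}
scores.update({g: rng.uniform(0.4, 0.6) for g in sorted(ddr_genes)})
print(permutation_p_value(scores, ddr_genes, n_perm=2000))
```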