1. Towards a modular Web-based
Workflow environment for enabling
large scale Virtual Screening in Cancer
Chemoprevention Research
19 June 2012
COST Conference
Personalised Medicine: Better Healthcare for the Future
Christos Kannas
Computer Science Dept., University of Cyprus
2. June 19, 2012 2
Outline
• About the Project
• Overview of the Project
• Objectives
• State of the Art Review
• Implementation
• Virtual Screening Process
• Predictive Model Preparation
• In-Silico Tools and Methods
• Early in silico experiments
• Concluding Remarks
3. About the Project
• The vision of the GRANATUM project is to:
  • bridge the information, knowledge and collaboration gap among biomedical researchers in Europe (at least),
  • ensure that the biomedical scientific community has homogenized, integrated access to the globally available information and data resources needed to perform complex cancer chemoprevention experiments and conduct studies on large-scale datasets.
• The GRANATUM project is partially funded by the European Commission under the Seventh Framework Programme in the area of Virtual Physiological Human (ICT-2009.5.3).
• http://www.granatum.org/
5. Objectives
• Design a scientific algorithmic workflow for the development of in silico chemoprevention models.
• Implement workflow(s) for the selection of promising chemopreventive agents.
• Connect the custom in silico models for compound selection to other datasets and evidence included in the Linked Biomedical Data Space.
• Test the performance of the custom in silico models.
6. State of the Art
• Significant overlap between chemoprevention and the traditional drug discovery process (DDP).
  • Chemoprevention is a special case with additional constraints, e.g. no toxicity.
• In silico models and tools: heavily borrowed from DDP.
SOA Review
• Online resources: databases (e.g. ChEMBL), journals, reports, …
• Infrastructure tools: chemoinformatics toolkits (e.g. RDKit and CDK) for compound representation, property and descriptor calculation, substructure mining, …
• Advanced comp. chem.: biological property predictive models, compound 3D conformations, docking tools, …
• Machine learning: classification and regression methods, available open source libraries
• Scientific workflow systems: KNIME, Taverna, Galaxy, …
8. Predictive Model Preparation Template
[Diagram: Chemical data + Biological data → Algorithm (with algorithm parameters) → Predictive Model]
9. Chemopreventive Property Models
• Anti-oxidant: direct effect; indirect effect; direct/indirect effect
• Anti-inflammatory: COX-2 but not COX-1 inhibitor; reduction of TNF-a; reduction of LOX; induction of AP-1; reduction of interleukins
• Anti-proliferating: Cyclin D1 down-regulation; Her-2 down-regulation; Cyclin E down-regulation; EGFR down-regulation
• Apoptotic / Anti-apoptotic: members of Bcl-2 family down-regulation; IAP family down-regulation; caspase up-regulation/activation
• Anti-metastatic / Anti-angiogenic: COX-2 down-regulation; VEGF down-regulation; PDGF down-regulation
• Estrogenic activity: ER-alpha binding affinity; ER-beta binding affinity; ER-alpha/beta binding affinity; no affinity; estrogen antagonists; Selective Estrogen Receptor Modulators (SERMs); Estrogen Receptor Modulators
10. In Silico Tools and Methods
• Generic Chemoinformatics Tools:
  • E-Health Lab and collaborators' resources
  • RDKit
• Docking Experiment Tools:
  • AutoDock Vina
  • Chil2 GlamDock
• Data Mining & Statistics Tools:
  • In-house tools
  • R
• Scientific Workflow System:
  • Galaxy
11. Early In Silico Experiments
• In silico tool & models validation
• Steps:
  • Prepare compound dataset
    • Mix of natural products and known inhibitors (4% actives)
  • Implementation/application of predictive models
    • Rule of Five
    • Toxicity model
  • Implementation/application of docking model
    • ER-alpha
  • Compound prioritization
  • Top selections visualization/evaluation
12. Virtual Screening Process Example
[Diagram: Natural products collection + known ER-alpha inhibitors → Calculate physicochemical molecular descriptors → Rule of Five filter → Toxicity model → Docking to ER-alpha → Compound prioritization; report on top selections]
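The Rule of Five filter step can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes the descriptors have already been computed by a chemoinformatics toolkit such as RDKit, it uses the four thresholds quoted elsewhere in these slides (HBA <= 10, HBD <= 5, molecular weight <= 500, logP <= 5), and it assumes a strict variant in which every threshold must hold. The `Descriptors` record, the compound names and the descriptor values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical record)."""
    name: str
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    hbd: int           # hydrogen-bond donors
    hba: int           # hydrogen-bond acceptors


def passes_rule_of_five(d: Descriptors) -> bool:
    """Strict variant (an assumption): all four thresholds must hold."""
    return (
        d.hba <= 10
        and d.hbd <= 5
        and d.mol_weight <= 500
        and d.logp <= 5
    )


# Illustrative values only; these are not real screening data.
compounds = [
    Descriptors("small_polar", 180.2, 1.2, 1, 4),
    Descriptors("large_lipophilic", 720.9, 6.8, 2, 9),
    Descriptors("many_donors", 340.3, 0.5, 7, 8),
]

passed = [c.name for c in compounds if passes_rule_of_five(c)]
rejected = [c.name for c in compounds if not passes_rule_of_five(c)]
print("pass:", passed)
print("not pass:", rejected)
```

Lipinski's original formulation tolerates a single violation; if that behaviour is wanted, the function could count violations and accept compounds with at most one.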
13. Cytotoxicity Predictive Model
[Diagram: Cytotoxicity Dataset → Clean Molecules → Oral Drug-like Filtering → Morgan Fingerprints → Cytotoxicity Predictive Model]
• Cytotoxicity Dataset
  • Source: The Scripps Research Institute Molecular Screening Center
  • PubChem BioAssay: AID 464
  • Tested: 706
  • Active: 331
  • Inactive: 375
• Clean Molecules
  • Remove salts
• Oral Drug-like Filtering
  • HBA <= 10
  • HBD <= 5
  • Molecular Weight <= 500
  • logP <= 5
• Morgan Fingerprints
  • Bit vector, 2048 bits
• Cytotoxicity Predictive Model
  • Cytotoxicity bio-chemical data
  • SVM:
    • Kernel: linear
  • Stratified K-Fold:
    • 5 folds
    • 10 folds
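To make the stratified K-fold validation concrete, the sketch below splits a label vector with the class counts quoted for this dataset (331 active, 375 inactive) into folds that keep the active/inactive ratio roughly constant. It is a pure-Python illustration under assumptions: the function name `stratified_k_fold`, the round-robin assignment and the fixed seed are choices made here, and the slides do not say which library or splitting routine was actually used.

```python
import random
from collections import Counter


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class ratios."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin, so per-class counts differ by at most one.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, test


# Label vector with the class counts from the slide: 1 = active, 0 = inactive.
labels = [1] * 331 + [0] * 375

for fold_no, (train, test) in enumerate(stratified_k_fold(labels, k=5), start=1):
    counts = Counter(labels[i] for i in test)
    print(f"fold {fold_no}: test size={len(test)}, "
          f"active={counts[1]}, inactive={counts[0]}")
```

Each test fold would then be scored by a classifier trained on the matching training indices; calling the function with `k=10` gives the 10-fold variant.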
14. Virtual Screening Process Example
• Demo Dataset
  • Known ER-alpha inhibitors (42), Indofine dataset (2494)
  • Result: 2451 OK, 85 removed (valence errors, empty molecule block)
• Clean Molecules
  • Remove salts
  • 2451 molecules (42 known, 2409 Indofine)
• Oral Drug-like Filtering
  • HBA <= 10, HBD <= 5, Molecular Weight <= 500, logP <= 5
  • Result: 2035 pass, 416 do not pass
• Calculate Morgan Fingerprints
  • Bit vector, 2048 bits
• Cytotoxicity (Predictive Model)
  • SVM classifier trained with the BioAssay 464 dataset
  • Predict: 2451 molecules
• ER-alpha Docking (GlamDock)
  • ER-alpha protein
  • 2451 molecules for docking experiments
• Ranked order of cytotoxicity prediction, docking and oral drug-likeness filtering results
[Figures: row-20-top-known, row-36-top-known, row-42-top-known; row-729-top-unknown, row-1652-top-unknown, row-1988-top-unknown]
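The final ranking step combines three per-compound results: the cytotoxicity prediction, the drug-likeness filter outcome and the docking score. The slides do not state how these were combined, so the following is only one plausible sketch. It assumes a lexicographic ordering (predicted non-cytotoxic first, then drug-like, then docking score) and assumes that a lower docking score means better predicted binding, which depends on the docking tool. The `ScreeningResult` type, the compound identifiers and all values are hypothetical.

```python
from typing import NamedTuple


class ScreeningResult(NamedTuple):
    compound_id: str           # hypothetical identifier
    predicted_cytotoxic: bool  # output of the cytotoxicity classifier
    druglike: bool             # passed the oral drug-likeness filter
    docking_score: float       # assumed: lower means better predicted binding


def prioritize(results):
    """Sort non-cytotoxic first, then drug-like, then by best docking score."""
    return sorted(
        results,
        key=lambda r: (r.predicted_cytotoxic, not r.druglike, r.docking_score),
    )


# Illustrative values only; these are not results from the experiment.
results = [
    ScreeningResult("cmpd_A", False, True, -7.1),
    ScreeningResult("cmpd_B", True, True, -9.4),
    ScreeningResult("cmpd_C", False, False, -8.8),
    ScreeningResult("cmpd_D", False, True, -8.2),
]

ranked = [r.compound_id for r in prioritize(results)]
print(ranked)
```

A weighted score would be an alternative design; the lexicographic key is used here because it keeps a predicted-cytotoxic compound from outranking a clean one on docking score alone.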
15. Docking results: known ER inhibitors
[Figure: row-20-top-known]
16. Docking results: known ER inhibitors
[Figure: row-36-top-known]
17. Docking results: known ER inhibitors
[Figure: row-42-top-known]
18. Docking results: Indofine compounds
[Figure: row-729-top-unknown]
19. Docking results: Indofine compounds
[Figure: row-1652-top-unknown]
20. Docking results: Indofine compounds
[Figure: row-1988-top-unknown]
21. Concluding Remarks
• Support for chemoprevention-specific predictive models.
• Promising initial results on ER-alpha (based on the Indofine dataset).
• Modular architecture and workflow management.
• Integrated with additional tools within the GRANATUM project:
  • Linked Biomedical Data Space.
  • Social Collaborative Workspace.
• Product release:
  • Advanced Prototype Version: October 2012
  • Final Version: April 2013
Emphasize Chemopreventive specific Predictive Models.
Expert User with knowledge on biology, chemistry, data mining and knowledge extraction algorithms.
• Search from LBS, get ZINC database.
• Molecular descriptors.
• Get compounds that might be active to estrogen receptors.
• Prepare files required for docking.
• Docking to ER-a using AutoDock Vina.
• Gather results.