In-silico study of ToxCast GPCR assays by quantitative structure-activity relationships (QSARs) modeling. Presented at ACS 2014, 11 August 2014, San Francisco, USA
The EPA tested several thousand chemicals in 700 toxicity-related in-vitro HTS bioassays through the ToxCast and Tox21 projects. However, the chemical space of interest for environmental exposure is much wider than this set of chemicals. Thus, there is a need to fill data gaps with in-silico methods, and quantitative structure-activity relationships (QSARs) are a cost-effective approach to predicting biological activity. The overall goal of this project was to use QSAR predictions to fill the data gaps in a larger environmental database of ~30K structures. The specific aim of the current work was to build QSAR models for multiple ToxCast assays using a subset of 1800 chemicals tested in 18 G-protein coupled receptor (GPCR) assays. These assays are part of the aminergic category, which was among the most active within the biochemical assays. Using partial least squares discriminant analysis (PLSDA) for the human histamine H1 GPCR assay, the classification accuracy reached 94% with a non-error rate of 89% in fitting and 80% in 5-fold cross-validation (CV), with only 2 latent variables. These results demonstrate the ability of QSAR models to predict bioactivity.
1. Office of Research and Development
In-silico study of ToxCast GPCR assays by quantitative structure-activity relationships (QSARs) modeling
Kamel Mansouri
ORISE postdoctoral fellow
National Center for Computational Toxicology, U.S. EPA
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
ACS 2014
11 August 2014, San Francisco
2. Outline
• ToxCast
– High-throughput Screening Data Generation
– Available data with lots of gaps to fill
– Time, cost, ethics
• GPCR assays
– Importance
– Classification models
– Regression models
– Predictions
• Future Plans
– Model other assays and extend the prediction list
3. Problem Statement
Too many chemicals to test with standard animal-based methods
– Cost, time, animal welfare
Alternative
4. ToxCast / Tox21 Overall Strategy
• Identify targets or pathways linked to toxicity
• Develop high throughput assays for these targets or pathways
• Develop predictive systems models
– in vitro → in vivo
– in vitro → in silico
• Use predictive models (qualitative):
– Prioritize chemicals for targeted testing
• High Throughput Risk Assessments (quantitative)
• High Throughput Exposure Predictions
5. Testing under ToxCast and Tox21
Chemicals, Data and Release Timelines
Set                 Chemicals  Assays  Endpoints  Completion     Available
ToxCast Phase I     293        ~600    ~700       2011           Now
ToxCast Phase II    767        ~600    ~700       03/2013        Now
ToxCast E1K         800        ~50     ~120       03/2013        Now
Tox21               ~9000      ~80     ~150       Ongoing        Ongoing
ToxCast Phase III   ~900       ~300    ~300       Just starting  2014-2015
[Figure: chemicals-by-assays coverage diagram; labels: Chemicals (~9000), Assays (~600)]
6. QSAR for ToxCast targets: GPCRs
• G-Protein coupled receptors
• Common drug targets
• Chemicals are often promiscuous across GPCRs, leading to off-target side effects
• ToxCast includes a large, diverse set of ~1000 chemicals
8. GPCR Family Tree
[Figure: GPCR family tree; labels: Muscarinic AChR, Biogenic Amine Receptors, Dopaminergic, Histamine, Serotonin, Adrenergic]
Source: http://dx.doi.org/10.1016/j.str.2011.05.012
9. GPCR Targets for QSAR modeling
Assay                      Symbol    Name
hM1-5                      CHRM1-5   cholinergic receptor, muscarinic 1-5
gMPeripheral_NonSelective  M1        Muscarinic receptor peripheral
hAdrb2                     ADRB2     adrenergic, beta-2-, receptor, surface
bDR_NonSelective           DRD1      Dopamine receptor D1
h5HT2A                     HTR2A     5-hydroxytryptamine (serotonin) receptor 2A
rAdra1A,B                  Adra1a,b  adrenergic, alpha-1A-B, receptor
rAdra1_NonSelective        Adra1a    adrenergic, alpha-1A-, receptor
hH1                        HRH1      Histamine receptor H1
gH2                        Hrh2      Guinea pig histamine receptor H2
rAdra2_NonSelective        Adra2a    adrenergic, alpha-2A-, receptor
hAdra2A                    ADRA2A    adrenergic, alpha-2A-, receptor
rmAdra2B                   Adra2b    adrenergic, alpha-2B-, receptor
10. Curation of chemical structures for QSAR
KNIME workflow (EPA/NCCT, DTU-food, UNC/MML)
INITIAL LIST OF CHEMICALS
• File parsing & first check
• Check for mixtures
• Check for salts & counter ions
• Normalization of structures
• Manual inspection
• Removal of duplicates
• Treatment of tautomeric forms
• Check for inorganics
CURATED DATASET
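A few of the curation checks listed above can be illustrated with a deliberately simplified, string-level sketch. The actual workflow was built in KNIME; the function names, the counter-ion list, and the SMILES heuristics below are assumptions made for illustration only. Normalization, tautomer handling, and manual inspection are not covered, and a real pipeline would rely on a cheminformatics toolkit rather than string matching.

```python
import re

# Hypothetical, illustrative list of common counter ions (not from the slides).
COUNTER_IONS = {"[Na+]", "[K+]", "[Cl-]", "[Br-]", "Cl", "Br"}

# Atom tokens: bracket atoms such as [Na+] or [C@@H], two-letter halogens,
# then single-letter organic-subset atoms (aliphatic or aromatic).
ATOM_RE = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT_RE = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def has_carbon(smiles):
    """Return True if any atom token in the SMILES string is carbon."""
    for token in ATOM_RE.findall(smiles):
        if token.startswith("["):
            match = BRACKET_ELEMENT_RE.match(token)
            element = match.group(1) if match else ""
        else:
            element = token
        if element in ("C", "c"):
            return True
    return False


def curate(records):
    """Apply toy versions of four checks to (name, SMILES) records."""
    curated, seen = [], set()
    for name, smiles in records:
        # Salts & counter ions: drop components found in the counter-ion list.
        parts = [p for p in smiles.split(".") if p not in COUNTER_IONS]
        # Mixtures: more than one remaining component (or nothing left).
        if len(parts) != 1:
            continue
        parent = parts[0]
        # Inorganics: no carbon atom in the remaining structure.
        if not has_carbon(parent):
            continue
        # Duplicates: exact string match only; a real workflow would compare
        # canonical, normalized (and neutralized) structures instead.
        if parent in seen:
            continue
        seen.add(parent)
        curated.append((name, parent))
    return curated


records = [
    ("ethanol", "CCO"),
    ("ethanol duplicate", "CCO"),
    ("sodium acetate", "CC(=O)[O-].[Na+]"),
    ("sodium chloride", "[Na+].[Cl-]"),
    ("two-component mixture", "CCO.c1ccccc1"),
]
print(curate(records))
```

In this toy input, the duplicate, the inorganic salt, and the two-component mixture are dropped, while sodium acetate is reduced to its (still charged) organic component.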
11. Training & prediction sets
• Training set (ToxCast):
• 1005 chemicals with data for 18 endpoints
• Binary classes (qualitative models)
• AC50 (quantitative models)
• Prediction set (Human Exposure Universe):
• 32464 chemicals to be predicted
Molecular descriptors calculation:
• 1022 molecular descriptors calculated from 2D chemical structures
• Software: Indigo, RDKit, CDK and MOE
• To reduce collinearity, a correlation threshold of 0.96 was applied
• Constant & near-constant descriptors removed
• Descriptors with missing values removed
• The remaining set consisted of 470 descriptors
QSAR-ready data
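The descriptor reduction described above (dropping descriptors with missing values, constant and near-constant descriptors, and one descriptor of each highly correlated pair) might be sketched as follows. Only the 0.96 correlation threshold comes from the slide; the near-constant criterion (`min_variance`), the rule of keeping the first descriptor of a correlated pair, the function names, and the example values are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance(x):
    """Population variance of a numeric list."""
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / len(x)


def filter_descriptors(columns, corr_threshold=0.96, min_variance=1e-8):
    """columns: dict of descriptor name -> list of values (None = missing)."""
    # 1. Remove descriptors with missing values.
    kept = {k: v for k, v in columns.items() if all(x is not None for x in v)}
    # 2. Remove constant and near-constant descriptors (assumed variance cutoff).
    kept = {k: v for k, v in kept.items() if variance(v) > min_variance}
    # 3. Reduce collinearity: keep a descriptor only if its |r| with every
    #    already selected descriptor does not exceed the threshold.
    selected = {}
    for name, values in kept.items():
        if all(abs(pearson(values, s)) <= corr_threshold for s in selected.values()):
            selected[name] = values
    return selected


# Invented descriptor values for five hypothetical chemicals.
descriptors = {
    "mol_weight": [180.2, 151.2, 206.3, 94.1, 128.2],
    "heavy_atoms": [13, 11, 15, 7, 10],        # nearly collinear with mol_weight
    "logp": [1.2, 0.5, 3.5, 1.5, 3.3],
    "n_boron": [0, 0, 0, 0, 0],                # constant
    "tpsa": [63.6, None, 37.3, 20.2, 0.0],     # has a missing value
}
print(list(filter_descriptors(descriptors)))
```

Keeping the first descriptor of a correlated pair is only one possible rule; the slide does not say which member of a pair was retained.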
12. Preliminary structure-activity analysis (Self-organizing maps)
Supervised-learning SOM map of all actives for the 18 assays with the set of 470 descriptors.
            NER     Ac
Fitting     81.79   95.4
5-fold CV   66.8    89.5
[Figure: ROC curves of the SOM map in fitting for active and non-active compounds.]
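The two statistics reported above can be made concrete with a small worked example. The slides do not give formulas, so the definitions here are assumed to be the usual chemometrics ones: Ac is the overall accuracy and NER (non-error rate) is the mean of the per-class sensitivities, which for two classes equals balanced accuracy. The confusion counts are invented; they only illustrate why Ac can sit well above NER when actives are rare.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all compounds classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def non_error_rate(tp, fn, tn, fp):
    """Mean of the per-class sensitivities (two-class case)."""
    sensitivity = tp / (tp + fn)  # fraction of actives recovered
    specificity = tn / (tn + fp)  # fraction of inactives recovered
    return (sensitivity + specificity) / 2


# Invented counts for an imbalanced set: 37 actives, 968 inactives.
tp, fn, tn, fp = 20, 17, 950, 18
print(f"Ac  = {100 * accuracy(tp, fn, tn, fp):.1f}")
print(f"NER = {100 * non_error_rate(tp, fn, tn, fp):.1f}")
```

Because inactives dominate the set, a model that misses almost half of the actives still scores a high Ac, which is why NER is reported alongside it.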
13. Machine learning methods evaluated
• Partial least squares - discriminant analysis (PLSDA)
• K nearest neighbors (KNN)
• Support vector machines (SVM)
Genetic algorithm for feature selection
Dsc: number of selected descriptors; LVs, k, C: parameter of each method; NER: non-error rate; Ac: accuracy.
Endpoint (positives)  Method  Dsc  Parameter  Fitting NER  Fitting Ac  5-fold CV NER  5-fold CV Ac
hH1 (37)              PLSDA   26   LVs = 5    91.9         89.4        91.5           88.6
hH1 (37)              KNN     27   k = 1      81.1         96.2        81.2           96.3
hH1 (37)              SVM     30   C = 5      87.8         99.1        68.5           96.9
hM1-5 (76)            PLSDA   15   LVs = 3    84.1         82.9        85.7           83.6
hM1-5 (76)            KNN     19   k = 4      76.7         93.8        78.8           94.3
hM1-5 (76)            SVM     12   C = 10     94.7         99.2        76.6           94.7
All (115)             PLSDA   20   LVs = 5    84.0         85.8        82.3           84.1
All (115)             KNN     22   k = 1      76.8         90.3        78.1           90.1
All (115)             SVM     16   C = 10     94.3         98.7        75.9           92.1
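As an illustration of how one of the evaluated methods and the 5-fold CV statistics fit together, here is a minimal sketch of a k-nearest-neighbors classifier scored by cross-validation. It is not the authors' implementation: the Euclidean distance, the majority-vote rule, the shuffled fold assignment, the function names, and the synthetic two-descriptor data are all assumptions, and the genetic-algorithm feature selection is left out.

```python
import math
import random


def knn_predict(train_x, train_y, query, k):
    """Majority vote (1 = active) among the k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = sum(label for _, label in ranked[:k])
    return 1 if 2 * votes > k else 0  # ties go to the inactive class


def cross_validate(x, y, k=3, n_folds=5, seed=0):
    """Return (NER, Ac) computed from pooled n-fold CV predictions."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::n_folds] for f in range(n_folds)]
    predicted = {}
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        # Each compound is predicted by a model that never saw it.
        for i in fold:
            predicted[i] = knn_predict(train_x, train_y, x[i], k)
    tp = sum(1 for i in indices if y[i] == 1 and predicted[i] == 1)
    tn = sum(1 for i in indices if y[i] == 0 and predicted[i] == 0)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    ner = (tp / n_pos + tn / n_neg) / 2  # mean of class sensitivities
    acc = (tp + tn) / len(y)
    return ner, acc


# Synthetic, imbalanced data: 20 "actives" and 80 "inactives" described by
# two made-up descriptors that separate the classes cleanly.
rng = random.Random(42)
actives = [(rng.uniform(2.5, 3.5), rng.uniform(2.5, 3.5)) for _ in range(20)]
inactives = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(80)]
x = actives + inactives
y = [1] * 20 + [0] * 80

ner, acc = cross_validate(x, y, k=3, n_folds=5)
print(f"5-fold CV: NER = {ner:.3f}, Ac = {acc:.3f}")
```

Pooling the held-out predictions before computing NER and Ac is one common convention; averaging per-fold statistics is another, and the slides do not say which was used.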
16. Clustered heatmap of the 18 assays (experimental and predicted values of AC50)
[Figure: clustered heatmap; axes: Chemicals, Assays]
Training set: 1005 chemicals
Predicted: 778 chemicals
Data gaps of ToxCast for 18 GPCR assays filled
17. Examples of chemical structures
[Figure: two panels of structures: Most potent in training set; Most potent in prediction set]
18. Summary and future directions
• High-accuracy classification & regression models were built on quality data
• ToxCast data gaps filled with:
1. Qualitative predictions: active/inactive
2. Quantitative predictions for active chemicals: estimation of AC50
• ToxCast is providing large data sets ready for building new QSAR models
– 300 gene targets
– 1000-8000 chemicals
• Forms the basis for predicting other inventories
– Validate with external test sets
– Apply all models on a list of 32,000 chemicals: Human Exposure Universe
– Use QSAR predictions to help regulators in the prioritization procedures
19. Acknowledgements
EPA NCCT
Rusty Thomas
Kevin Crofton
Keith Houck
Ann Richard
Richard Judson
Tom Knudsen
Matt Martin
Woody Setzer
John Wambaugh
Monica Linnenbrink
Jim Rabinowitz
Steve Little
Agnes Forgacs
Jill Franzosa
Chantel Nicolas
Bhavesh Ahir
Nisha Sipes
Lisa Truong
Max Leung
Kamel Mansouri
Eric Watt
Corey Strope
EPA NCCT
Nancy Baker
Jeff Edwards
Dayne Filer
Jayaram Kancherla
Parth Kothiya
Jimmy Phuong
Jessica Liu
Doris Smith
Jamey Vail
Hao Truong
Sean Watford
Indira Thillainadarajah
Christina Baghdikian
Contributors:
Nisha Sipes
Richard Judson
www.epa.gov/ncct/
US EPA National Center for Computational Toxicology