1) The document presents research on using deep neural networks and transfer learning to improve virtual screening for drug discovery.
2) The researchers trained protein family-specific models based on the DenseNet architecture on training sets of different sizes, and used transfer learning and fine-tuning to adapt the models to each protein family.
3) The results showed that the protein family-specific models outperformed baseline models on standard evaluation metrics, highlighting both the importance of more target-specific models and the need for more data to train such models.
Protein family specific models using deep neural networks and transfer learning improve virtual screening and highlight the need for more data
1. Protein Family-Specific Models Using Deep Neural
Networks and Transfer Learning Improve Virtual Screening
and Highlight the Need for More Data
Presenter: Aydin Ayanzadeh
Authors: Fergus Imrie, Anthony R. Bradley, Mihaela van der Schaar, and Charlotte M. Deane*
2. Agenda
- Introduction
- Training Set Size
- DenseNet
- Transfer Learning
- Fine Tuning
- Evaluation Metrics
- Results
- Quantitative Results
- Qualitative Results
- Visualization
4. Introduction
- Data sets: DUD-E and ChEMBL.
- Virtual screening is a computational technique used in drug discovery to search libraries of small molecules in order to identify those structures which are most likely to bind to a drug target.
- A major challenge in virtual screening is the heterogeneity of binding between different targets, arising from the structural diversity of proteins.
6. DenseNet
Figure 2. Schematic of the DenseNet architecture used in our model.
Model Description
- Dense connections
  - strengthen feature propagation
  - encourage feature reuse
  - reduce the number of parameters
- Maintain low-complexity features
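The dense connectivity pattern described on this slide can be illustrated with a toy sketch. It is not the authors' model: the `toy_layer` function and the list-of-floats "feature maps" are assumptions made purely to show that every layer receives the concatenation of the block input and all earlier layer outputs.

```python
def dense_block(x, layers):
    """Apply layers with dense connectivity.

    Each layer receives the concatenation of the block input and the
    outputs of all earlier layers, which is how features get reused.
    """
    features = [x]  # list of feature groups produced so far
    for layer in layers:
        concatenated = [v for group in features for v in group]
        features.append(layer(concatenated))
    # The block output is the concatenation of everything computed.
    return [v for group in features for v in group]


def toy_layer(inputs):
    """Hypothetical stand-in for a conv layer: emits one new feature."""
    return [sum(inputs)]


def input_widths(in_channels, growth_rate, num_layers):
    """Number of input channels seen by each layer of a dense block."""
    return [in_channels + i * growth_rate for i in range(num_layers)]


output = dense_block([1.0, 2.0], [toy_layer, toy_layer])
widths = input_widths(in_channels=2, growth_rate=1, num_layers=2)
```

Because each layer only has to add a small number of new features (the growth rate, a hypothetical value of 1 here) on top of what already exists, the layers themselves can stay narrow, which is where the parameter saving comes from.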
7. Transfer Learning
- Transfer learning is the reuse of a pre-trained model on a new problem.
- Very popular in the field of deep learning
- ImageNet
- Ensemble learning
8. Fine Tuning
- Fine-tuning the classifier
- Fine-tuning all layers
Figure 3. (a-d) Illustration of the different training regimes adopted to construct family-specific models. White corresponds to layers of the model that have been trained on all training data; blue, to layers that have been trained first on all training data and then fine-tuned on data from a specific protein family; and orange, to layers that have been trained only on data from a specific protein family.
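The two fine-tuning options named on this slide differ only in which layers are updated when training continues on family-specific data. A minimal sketch under stated assumptions: the layer names and the regime labels `"classifier_only"` and `"all_layers"` are hypothetical, the classifier is assumed to be the last layer, and the sketch covers only the two bulleted options, not all four regimes of Figure 3.

```python
# Hypothetical layer names; the real model's layers are not listed on the slide.
LAYERS = ["dense_block_1", "dense_block_2", "dense_block_3", "classifier"]


def trainable_layers(regime, layers=LAYERS):
    """Return {layer_name: is_updated_during_fine_tuning} for a regime."""
    if regime == "classifier_only":
        # Feature layers keep the weights learned on all training data;
        # only the final classifier is fine-tuned on the protein family.
        return {name: name == layers[-1] for name in layers}
    if regime == "all_layers":
        # Every layer starts from the general model and is fine-tuned.
        return {name: True for name in layers}
    raise ValueError(f"unknown regime: {regime!r}")


classifier_only = trainable_layers("classifier_only")
all_layers = trainable_layers("all_layers")
```

In a deep learning framework, the resulting flags would typically be used to freeze or unfreeze the parameters of each layer before resuming training on the family-specific subset.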
9. Evaluation Metrics
- False positive: the model predicts an event when there was no event.
- False negative: the model predicts no event when in fact there was an event.
- ROC curves summarize the trade-off between the true positive rate and the false positive rate for a predictive model using different probability thresholds.
- Precision is the fraction of predicted positives that are actually positive (the positive predictive value).
- Recall is the fraction of all actual positives that the model correctly identifies (the true positive rate).
- Precision-recall curves summarize the trade-off between the true positive rate and the positive predictive value for a predictive model using different probability thresholds.
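The metric definitions on this slide can be made concrete with a small worked example. The labels and scores below are made-up illustration data, not results from the paper, and the function names are hypothetical. The ROC AUC is computed as the probability that a randomly chosen active scores higher than a randomly chosen inactive, which is equivalent to the area under the ROC curve.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision_recall_fpr(labels, scores, threshold):
    """Precision (PPV), recall (TPR), and false positive rate at a threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, fpr


def roc_auc(labels, scores):
    """AUC ROC via pairwise ranking; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = active (binder), 0 = inactive (decoy).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
precision, recall, fpr = precision_recall_fpr(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
```

Sweeping `threshold` from high to low and recording (false positive rate, recall) pairs traces the ROC curve; recording (recall, precision) pairs traces the precision-recall curve.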
10. Quantitative Results
Table 2. Mean AUC ROC, AUC PRC, and ROC Enrichment Across Targets in the DUD-E Data Set for Our Method, DenseFS, Compared to Baseline CNN and the AutoDock Vina Scoring Function
11. Quantitative Results
Table 5. Mean AUC ROC, AUC PRC, and ROC Enrichment Across Targets in the MUV Test Set for Our Method, DenseFS, Compared to Baseline CNN and the AutoDock Vina Scoring Function
13. Visualization
Fig. Visualization of the known active CHEMBL293409 ligand (a) docked against the DUD-E target ANDR. (b and c) Results of the visualization procedure for Baseline CNN and DenseFS, respectively. Areas of green indicate a score for that region above 0.5, whereas red represents a score below 0.5, with the intensity depending on the magnitude of the difference. The Baseline CNN assigned the complex an overall score of 0.34, while DenseFS scored the complex at 0.91.
- Protein families:
  - kinase (26 targets)
  - protease (15)
  - nuclear receptor (11)
  - GPCR (5)
  - other (45)