A lecture on molecular docking that I give to master's students at Université Paris Diderot.
Warning: this presentation has numerous animations that are not included in the SlideShare document.
https://florentbarbault.wordpress.com/
6. Dr. Florent Barbault, ITODYS (CNRS UMR 7086)
1. Introduction
Molecular docking: prediction of the association between two molecules.
Experimentally, characterizing the interaction between two compounds is never easy
and provides little or no information about the structure of the complex.
We use computational approaches to:
Observe how a compound is structurally placed with (or inside) its partner
Understand the recognition process and establish structure activity/property
relationships
Predict, from a database of chemical compounds, which ones are the most likely
to interact with the target
Molecular docking is mainly applied in medicinal chemistry. However,
the technique can also be used to study the interaction between two biological
macromolecules (protein/protein or DNA/protein) or any other association.
2. Basic concepts
A drug always acts on a bio-macromolecule (protein, DNA or RNA) as a key (ligand)
in a lock (target).
Most of the time we wish to directly compete with the substrate.
[Figure: enzyme + substrate + drug]
Competitive inhibition:
concentration and affinity are key elements for inhibiting the enzyme.
It's the most widespread case.
$$A + B \rightleftharpoons AB \qquad \Delta G = \Delta H - T\Delta S$$
$$K_D = \frac{[A][B]}{[AB]} = \frac{1}{K_A}$$
Even the most complex biomacromolecules obey thermodynamics.
If ΔG is negative, the reaction is driven toward the formation of AB.
If ΔG decreases by 2.7 kcal/mol, the dissociation constant (KD) is divided by 100
and the population of the associated form rises from 50% to 99% (Boltzmann
statistics).
This logarithmic dependency illustrates the accuracy problem in molecular
modeling:
$$\Delta G_{bind} = RT \ln K_D$$
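The 2.7 kcal/mol figure can be checked with a few lines of Python. This is only a sketch under stated assumptions: room temperature (298.15 K), KD expressed in mol/L, and a simple two-state Boltzmann model in which the bound/free ratio is exp(-ΔG/RT). The function names are illustrative and do not come from the lecture.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_bind(kd_molar, temperature=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


def bound_fraction(delta_g, temperature=298.15):
    """Population of the bound state in a two-state Boltzmann model (assumption)."""
    ratio = math.exp(-delta_g / (R_KCAL * temperature))
    return ratio / (1.0 + ratio)


# A 100-fold change in KD (here 1 µM versus 10 nM, arbitrary example values)
ddg = delta_g_bind(1e-6) - delta_g_bind(1e-8)
print(f"Free energy gap for a factor 100 in KD: {ddg:.2f} kcal/mol")
print(f"Bound population at dG = 0: {bound_fraction(0.0):.0%}")
print(f"Bound population at dG = -{ddg:.2f}: {bound_fraction(-ddg):.0%}")
```

An error of a couple of kcal/mol in a computed ΔG therefore shifts the predicted KD by two orders of magnitude.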
3. Preparation steps of molecular docking
3.1. Basic knowledge
We know the 3D structure of the target and we wish to simulate its interaction
with a database of compounds (around 1 million!)
One naive approach is to perform molecular dynamics (MD) in explicit solvent:
The protein is embedded in a box
The ligand is randomly placed in this box
MD predicts the interaction
This should work, but it requires trajectories on the ms to s timescale, whereas
we generally simulate ns to µs.
See the work of David Shaw.
We need other, more direct methods, since predicting the interaction of two
molecules is highly complex and requires tremendous exploration.
3.2. Target structure
3.2.1. Sources
A 3D structure of the target is required!
The PDB (Protein Data Bank)
➔ X-ray diffraction
● No size limit
● More accurate
● A single structure (that of the crystal)
● Crystallization problems
● Hydrogens are missing
➔ NMR
● Lower accuracy
● Solution structure
● Size limit around 150 residues (for a protein)
● Average structure
➔ Homology modelling
● Free and quick
● No experimental data
● Low precision of the sidechains
● Sequence similarity or identity?
3.2.2. Resolution
Accuracy is an important parameter for X-ray structures.
In the structure shown here, the precision is very good.
[Figure: a protein alpha-helix at different resolutions]
For NMR structures, the resolution is hard to quantify:
generally we look at the RMSD or the number of restraints per residue.
For homology modelling (comparative modelling), resolution has no real
meaning.
In all cases, it is essential to have a feeling for the quality of the target structure at
the interacting site. For enzymes, this area is generally the best defined.
Beware: in X-ray structures some protein parts or atoms may be missing. In this
case, we choose whether or not to add these parts depending on their location and
their influence on the association.
To sum up, always gather as much information as you can about
the target.
3.2.3. Treatment
Experimental structures are far from perfect!
You can find in them:
o Ions
o Water
o Detergents
o Glycosyl groups
o Antibodies
o Chaperone proteins
o Missing atoms…
You must clean the PDB file.
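A first cleaning pass can be sketched in plain Python, relying only on the fixed-column PDB format (record name in columns 1-6, residue name in columns 18-20). The function name, the `keep_hetero` whitelist and the sample records are illustrative assumptions; a real preparation also has to deal with missing atoms, alternate locations and protonation, which this sketch ignores.

```python
def clean_pdb_lines(lines, keep_hetero=()):
    """Keep ATOM records, chain terminators, and whitelisted HETATM residues only."""
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record == "ATOM":
            kept.append(line)
        elif record == "HETATM" and line[17:20].strip() in keep_hetero:
            # Hetero groups (ions, cofactors) are kept only on explicit request
            kept.append(line)
        elif record in ("TER", "END"):
            kept.append(line)
    return kept


# Hypothetical records: one protein atom, one water, one zinc ion
sample = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "HETATM 2001  O   HOH A 301      10.000  10.000  10.000  1.00 20.00           O",
    "HETATM 2002 ZN    ZN A 401      12.000  11.000   9.000  1.00 15.00          ZN",
    "END",
]
cleaned = clean_pdb_lines(sample, keep_hetero=("ZN",))
for line in cleaned:
    print(line)
```

Whether an ion or a water molecule belongs in `keep_hetero` is a modelling decision that depends on its role in the binding site.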
3.3. Interacting site
Where is the interacting site on the protein?
Three major methods:
Experimental complex
- The safest method
- Requires that the ligands share the same binding mechanism
Analysis of structural properties
- Cavity detection is complex
- More an art than a definite method
Molecular docking on the whole protein
- Time-consuming and tedious
- Needs a lot of docking poses (~ 1000) to do statistics
- Generally gives "surprising" results
The "knob & hole" cavity detection method.
Principle:
We consider a sphere of a given volume V. The center of this sphere is placed on
the molecular surface (Connolly). We roll this sphere around the molecular surface
and compute the common volume, Vcom, that it shares with the protein.
If 0 < Vcom ≤ V/3: knob
If V/3 < Vcom < 2V/3: plane
If 2V/3 ≤ Vcom < V: hole
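The three volume ranges translate directly into a small classifier. This is a minimal sketch: the function name is illustrative, and the calculation of the common volume itself (which requires the molecular surface and the protein atoms) is assumed to be done elsewhere.

```python
def classify_surface_point(v_common, v_sphere):
    """Label a surface point from the volume the probe sphere shares with the protein."""
    if not 0 < v_common < v_sphere:
        raise ValueError("v_common must lie strictly between 0 and v_sphere")
    ratio = v_common / v_sphere
    if ratio <= 1 / 3:
        return "knob"   # the sphere is mostly outside the protein
    if ratio < 2 / 3:
        return "plane"  # roughly half of the sphere is buried
    return "hole"       # the sphere is mostly buried: candidate cavity


for shared in (0.2, 0.5, 0.9):
    print(shared, classify_surface_point(shared, 1.0))
```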
[Figure: "knob & hole" cavity detection technique, with a knob and a hole labelled]
3.4. Ligand structure
Ligands are generally small organic molecules. We build them with GUI
(Graphical User Interface) software based on molecular mechanics, such as
Maestro, Sybyl, Accelrys, MOE, ICM...
Not an easy step:
Few or no experimental 3D structures (CSD)
No absolute force-field parameters
Stereochemistry is sometimes not an issue for organic chemists, but it is
for you.
Ionization states?
Physiological pH?
Atomic type
hybridization
Tautomeric forms
Partial atomic charges
Resonance structures
3.5. Flexibility
During the association, ligand flexibility plays a major role, whereas the
protein (the larger molecule) hardly moves.
Rigid docking
Flexible docking
Induced fit docking
3.5.1. Ligand flexibility
It is impractical to treat all ligand Cartesian coordinates as variables. Thus, only
rotatable dihedral angles (torsions) move. Rings are kept rigid, so they must be
correctly minimized beforehand.
Some questions remain:
conjugated bonds, peptide bonds, guanidinium... should they be fixed or
rotatable?
Another, more anecdotal method: perform rigid docking with several ligand conformations.
[Figure: captopril]
3.5.2. Target flexibility
Direct methods are still in development. In Autodock 4.2, the user can define
rotatable sidechain torsion angles for a few protein residues inside the active site.
Advantage:
You choose the amino acids you want to involve
Drawbacks:
Difficult to choose which amino acids
Only sidechain movements are considered
Possible expulsion of the ligand by collapse of the rotatable residues
Indirect methods: molecular docking is performed first, then a molecular dynamics
simulation of the resulting complex is run.
Advantages:
With methods such as MM-PBSA you can estimate the binding free energy
You can explore the physical chemistry of the recognition process
You have access to a statistical view of the interactions (hydrogen bond lifetimes)
Drawback:
If the starting structure is not correct...
Indirect methods: an MD simulation is run on the apo protein. Representative
structures are then extracted and molecular docking is performed against these
targets.
Advantage:
The behaviour of the apo protein is genuinely taken into account
Drawbacks:
How to extract "representative" target conformations?
What about the precision of the molecular docking?
4. Manual docking
It looks like a joke... The ligand is placed in the interacting site and the association
energy is calculated at each step.
The user manually rotates or translates the compound inside the protein
cavity. A new association energy is recorded, and so on.
Advantages:
Quick (and dirty?)
Can be very efficient if the user knows the interacting site well
Drawbacks:
User dependent
You can obtain really stupid results
This rudimentary method has, surprisingly, provided interesting results in the past. It
is still applicable if only small ligand modifications are explored.
5. Automatic docking
5.1. Rules
Principles:
The ligand is automatically placed onto the macromolecule. More exhaustive and
safer than manual docking, this technique requires long CPU time.
Dreaming of a perfect molecular docking technique:
Reasonable computation time
The global minimum of the ligand/target interaction energy is reached
The calculated free energies reproduce the experimental ones
The interaction patterns observed in experimental X-ray complexes are reproduced
Generally, a molecular docking simulation can be split into two steps.
DOCKING
Searching algorithm:
- Conformational exploration
- Several possible docking poses
Scoring function:
- Energy quantification
- Ranking of docking poses
- Clustering
5.2. Algorithms and methods
5.2.1. Grid method
A box is drawn on the protein. The interaction is then explored only inside this
box, which drastically limits the computational time.
Beware:
o If the box is too small, the docking will be wrong
o If the box is too large, the exploration must be more intensive and may
produce strange "false positive" ligand conformations
o Take care over which amino acids you embed in the box (especially
the charged residues)
A probe atom is positioned at every point (node) of the grid.
There are as many probes as ligand atom types. An additional probe carrying a +e
charge is also considered for the electrostatic computation.
The software iteratively places the probe atom at each node and computes the
energy. These values (tables) are recorded in map files.
Formaldehyde example:
[Figure: formaldehyde on the grid, with atoms H, H, C and O at nodes 1, 2, 3 and 4]
The evaluation of the interaction energy is instantaneous:
$$E_{inter}^{grid} = E_{O}(4) + E_{C}(3) + E_{H}(1) + E_{H}(2)$$
Computationally, the energy calculation is a simple sum of table lookups. However,
the molecule is treated as a list of points without bonds.
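The table-lookup idea can be sketched as follows. This is a toy illustration and not the implementation of any docking program: the maps are plain dictionaries with made-up energies, each atom is assigned to its nearest node, and the names `nearest_node` and `grid_energy` are hypothetical. Production codes typically interpolate between neighbouring nodes instead of snapping to the nearest one.

```python
def nearest_node(coord, origin, spacing):
    """Index (i, j, k) of the grid node closest to a Cartesian coordinate."""
    return tuple(round((c - o) / spacing) for c, o in zip(coord, origin))


def grid_energy(ligand_atoms, maps, origin, spacing):
    """Sum precomputed probe energies: one lookup per ligand atom, no bonds involved."""
    total = 0.0
    for atom_type, coord in ligand_atoms:
        total += maps[atom_type][nearest_node(coord, origin, spacing)]
    return total


# Made-up map values (kcal/mol) on a tiny grid with 1 Å spacing
maps = {
    "C": {(0, 0, 0): -0.2, (1, 0, 0): -0.1},
    "O": {(0, 0, 0): -0.5, (1, 0, 0): -0.8},
    "H": {(0, 1, 0): -0.05, (0, -1, 0): -0.05, (0, 0, 0): 0.0},
}
# Formaldehyde-like arrangement with approximate, illustrative coordinates
formaldehyde = [
    ("C", (0.1, 0.0, 0.0)),
    ("O", (1.2, 0.0, 0.0)),
    ("H", (-0.4, 0.9, 0.0)),
    ("H", (-0.4, -0.9, 0.0)),
]
energy = grid_energy(formaldehyde, maps, origin=(0.0, 0.0, 0.0), spacing=1.0)
print(f"Grid interaction energy: {energy:.2f}")
```

The cost of scoring one pose is proportional to the number of ligand atoms and independent of the protein size, which is what makes the grid approach fast.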
Now, we have to explore the box in order to find the global optimum.
It's a "classical" molecular modelling problem... without an absolute solution.
In docking, several exploration methods are used:
Molecular dynamics (global search)
Simulated annealing (global search)
Genetic algorithm (global search)
Conjugate gradient (local search)
Currently, the best method seems to be a (Lamarckian) genetic algorithm
followed by a few steps of conjugate gradient.
Dihedral angles are encoded as genes (binary)
101001010101001011001100111010101010
A random initial population is easily generated
001011010111000101001010011101010101
101010010111101110001010010100111010
110101101011100010100101001110101010
010111110110101110001010010100111010
......
[Diagram: genetic algorithm cycle. Starting population (genotypes translated into phenotypes) → parent selection with the fitness function → crossover and mutation → children. The process is stopped after a defined number of steps.]
[Diagram: the same cycle with an additional step, optimisation of the parents.]
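The cycle can be sketched with a plain genetic algorithm over binary-encoded torsions. Everything here is an assumption made for illustration: 8 bits per torsion, tournament selection, one-point crossover, bit-flip mutation, two elite individuals copied unchanged, and a toy score that simply measures the distance to arbitrary target angles instead of a docking energy. The local optimisation step of the Lamarckian variant is not included.

```python
import math
import random

BITS = 8  # bits used to encode one torsion angle (assumption)


def decode(genotype):
    """Translate the binary genotype into torsion angles in degrees (phenotype)."""
    return [int(genotype[i:i + BITS], 2) * 360.0 / 2 ** BITS
            for i in range(0, len(genotype), BITS)]


def toy_score(angles, targets):
    """Stand-in for a docking score: 0 when every torsion matches its target."""
    return sum(1.0 - math.cos(math.radians(a - t)) for a, t in zip(angles, targets))


def evolve(targets, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    n_bits = BITS * len(targets)

    def score(genotype):
        return toy_score(decode(genotype), targets)

    population = ["".join(rng.choice("01") for _ in range(n_bits))
                  for _ in range(pop_size)]
    history = [min(map(score, population))]
    for _ in range(generations):
        children = sorted(population, key=score)[:2]  # elitism: keep the two best
        while len(children) < pop_size:
            # Tournament selection of two parents
            mother = min(rng.sample(population, 3), key=score)
            father = min(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n_bits)            # one-point crossover
            child = mother[:cut] + father[cut:]
            child = "".join(b if rng.random() > mutation_rate else "10"[int(b)]
                            for b in child)           # bit-flip mutation
            children.append(child)
        population = children
        history.append(min(map(score, population)))
    return min(population, key=score), history


best, history = evolve(targets=[60.0, 180.0, 300.0])
print("best torsions:", [round(a, 1) for a in decode(best)])
print("best score: first generation", round(history[0], 3), "last", round(history[-1], 3))
```

Because the elite individuals are copied unchanged, the best score can never get worse from one generation to the next.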
5.2.2. Sphere method
[Figures: illustrations of the sphere method]
Assumption: distances between sphere centers correspond to inter-atomic
distances (heavy atoms).
The DOCK software uses this method.
This technique relies more on the shape of the molecules than on the
complementarity of their interactions.
Some issues:
Sphere dimensions?
Matching of sphere centers?
Ligand flexibility?
This old method has proven its efficiency and is still employed.
5.2.3. Incremental method
[Figure: a ligand split into fragments]
Interactions are defined as "umbrellas".
[Figure: interaction "umbrellas" around O and H-N groups]
The base fragment is placed by triangulation.
The second fragment is linked to the first. A torsion exploration is made
to find the best pose for this new fragment.
The ligand is then built incrementally inside the protein.
The target can only be a protein.
Umbrella interactions:
Hbond
electrostatic
hydrophobic contact
This method tends to overestimate the importance of Hbonds relative to other
interactions.
5.3. Scoring
Scoring aims to describe and quantify the association.
Purpose:
Quick computation
Able to compare results with experimental data
Able to distinguish true inhibitors from false positive ligands
Able to rank the ligands
5.3.1. Force-field
A force-field (FF) is used to describe the interaction.
It is based on classical FFs such as AMBER or CHARMM.
$$E = \sum_{i}^{NBOND} \sum_{j=i+1}^{NBOND} \left( \frac{q_i q_j}{\varepsilon_{ij} r_{ij}} + \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)$$
Advantages:
• Quick
• Good parameterization based on empirical parameters
Drawbacks:
• Electrostatics are generally overestimated
• Entropy??
Example: Dock
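The force-field expression (a Coulomb term plus a 12-6 Lennard-Jones term, summed over atom pairs) can be sketched as below. This is a simplified illustration: a single A, B and dielectric value is used for every pair, whereas a real force-field tabulates them per pair of atom types, and the unit conversion factor of the Coulomb term is omitted as in the slide formula. The function names are hypothetical.

```python
import math


def pair_energy(q_i, q_j, r_ij, a_ij, b_ij, eps_ij=1.0):
    """Coulomb + 12-6 Lennard-Jones energy of one atom pair."""
    return q_i * q_j / (eps_ij * r_ij) + a_ij / r_ij ** 12 - b_ij / r_ij ** 6


def interaction_energy(atoms, a_ij, b_ij, eps_ij=1.0):
    """Double sum over unique pairs (j > i); atoms is a list of (charge, (x, y, z))."""
    total = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            r_ij = math.dist(atoms[i][1], atoms[j][1])
            total += pair_energy(atoms[i][0], atoms[j][0], r_ij, a_ij, b_ij, eps_ij)
    return total


# Two opposite unit charges 2 Å apart, with arbitrary A and B parameters
atoms = [(+1.0, (0.0, 0.0, 0.0)), (-1.0, (2.0, 0.0, 0.0))]
print(interaction_energy(atoms, a_ij=4096.0, b_ij=64.0))
```

In the example the repulsive and dispersive terms cancel (4096/2^12 = 64/2^6 = 1), so only the Coulomb contribution of -0.5 remains.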
5.3.2. Empirical potential
A function is designed to estimate the free energy of binding instead of the interaction
energy.
These functions are calibrated with experimental data.
Advantages:
Safer evaluation of the energy
More physical effects are incorporated in the equation
More accurate results
Drawbacks:
The function is calibrated on a training set. Beware if your
system is not "classical".
The electrostatic effect is sometimes overestimated
The estimation of entropy is far from correct.
Examples: FlexX, Autodock, Gold...
5.3.3. Knowledge based
Used only for scoring (after the docking poses have been generated).
How it works:
A statistical analysis is made on a dataset of complex structures from the PDB.
Ligand/protein atomic distances are recorded. According to the distributions (clouds) found, a
score is given to the atomic distances found in the docking calculation.
This technique works well but has no physical interpretation. This type of score ranks
more on drug-likeness than on interactions.
This technique is sensitive to the protein family studied. For example,
different scoring values are found depending on the location of the protein in the cell and on its
function. This can be an advantage or a drawback.
This type of scoring (DrugScore, LigScore...) is usually used in consensus
scoring: compounds that rank well with several scoring functions
may be the best ones.
5.4. Conclusions
Applications:
Localizing a ligand inside a biological macromolecule.
Analyzing the binding mode.
Drawing structure-activity relationships.
Limitations:
Target flexibility is rarely, if ever, taken into account.
Scoring functions are far from perfect. Energetic interpretations
are thus questionable.
Beware of the search parameters.
Generally, several binding modes are proposed... which one should be
picked?
Software:
Grid method: Autodock, Gold, ICM, Glide
Sphere method: Dock
Incremental construction: FlexX, Ludi
6. Applications
6.1. Direct design
This is iterative work carried out with experimental chemists. The purpose is to propose
original ideas for obtaining more active compounds.
Requirements:
A collaboration with people from experimental fields (chemists/biologists).
Everyone must understand each other! This is not so obvious, because each field of
research has its own logic.
Structural analyses must be performed for all ligands.
The pros and cons:
Provides more original compounds than screening.
Safer interpretation of results compared with virtual screening (see
later).
Real scientific interactions, but it needs human and computational time.
Example 1: exploring a protein cavity with several moieties.
Example 2: extending a ligand to pick up a new favourable interaction.
6.2. Virtual screening
6.2.1. Rules
Instead of docking a small set of defined ligands, the
computation is extended to a large database.
The compounds with the best ranks are then purchased and biologically
tested.
Virtual screening is named by analogy with experimental screening methods.
Three major steps:
1. Ligand database. If you remove the good ones... you will have nothing
at the end.
2. Molecular docking. Even if your database is full of good compounds, if
you are not able to dock each one correctly... you will have nothing at
the end.
3. Ranking. Even if the two previous steps were done correctly, if you are
not able to rank the ligands meaningfully... you will have nothing at the
end.
6.2.2. Databases
Chemical universe: 10^100 to 10^400 compounds.
Organic molecules: 10^24 to 10^40 compounds.
Synthesized molecules: 10^6 compounds.
Active molecules: 10^? molecules.
We are looking for a needle in a haystack... if this needle exists.
Numerous chemical databases exist. Some of them are commercial.
Name            Type         Number of compounds
PubChem         Public       30 million
ChEMBL          Public       1 million
NCI set         Public       140 000
ChemSpider      Public       26 million
CoCoCo          Public       7 million
TCM             Public       32 000
ZINC            Public       13 million
ChemBridge      Commercial   700 000
Specs           Commercial   240 000
IUPHAR          Public       3 180
Asinex          Commercial   550 000
Enamine         Commercial   1.7 million
Maybridge       Commercial   56 000
WOMBAT          Commercial   263 000
ChemDiv         Commercial   1.5 million
ChemNavigator   Commercial   55.3 million
ACD             Commercial   3 870 000
MDDR            Commercial   150 000
6.2.2.1. 1D storage
Storing chemical data as 3D files raises problems:
it is difficult to compare chemical compositions
it requires a lot of hard-drive access
databases are hard to modify
Can we simplify?
[Figure: benzene example]
The SMILES code describes benzene in a single line.
SMILES: Simplified Molecular Input Line Entry System
Benzene: c1ccccc1
Toluene: Cc1ccccc1
Phenol: Oc1ccccc1
Other coding systems exist (SLN, WLN, STRAPS...). However, they share a similar
philosophy, and their differences matter only to specialists.
Example of SMILES code for a molecule:
[Figure: a molecule and its SMILES code]
This system has numerous advantages:
Simple storage (1 line!)
Easy to manage
Generating a virtual library is very easy
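Because a SMILES code is just a string, a small virtual library can be enumerated with simple text substitution. The sketch below is an illustration under assumptions: the `{R1}`/`{R2}` placeholder syntax and the function name are hypothetical conventions, and no chemical validity check is performed (a cheminformatics toolkit would be needed for that).

```python
from itertools import product


def enumerate_library(scaffold, substituents_by_site):
    """Fill each placeholder of the scaffold with every allowed substituent."""
    sites = sorted(substituents_by_site)
    library = []
    for combo in product(*(substituents_by_site[site] for site in sites)):
        library.append(scaffold.format(**dict(zip(sites, combo))))
    return library


# para-disubstituted benzene scaffold with two variable positions
library = enumerate_library(
    "{R1}c1ccc({R2})cc1",
    {"R1": ["C", "O", "N"], "R2": ["F", "Cl"]},
)
for smiles in library:
    print(smiles)
```

Three substituents at one position and two at the other give six compounds; the library size grows as the product of the list lengths.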
[Figure: a chemical database in SMILES format]
Unfortunately, SMILES coding has drawbacks:
Hydrogens are not stored and must be added afterwards to fill the valences
Software is required to convert 1D into 3D. These programs are generally commercial
and have their own drawbacks (CORINA, Omega, ROTATE, CAESAR...)
A SMILES code is not (yet) unique! A molecule might be present twice (or more)
3D storage partially solves the 1D problems... but:
Storage problems: if a 1D database of 1.5 GB is transformed into 3D, its size is
around 132 GB.
It is much more difficult to create virtual chemical databases than with
SMILES codes.
Tautomeric forms and charges remain a problem.
6.2.3. Filtering
The main problem with chemical databases is that they mainly contain
uninteresting compounds.
We must filter them to:
Eliminate as many uninteresting compounds as possible
Spend more computational time on the molecular docking calculations.
6.2.3.1. Redundancy
The first obvious filter is redundancy: chemical databases (even commercial ones)
sometimes contain the same compound several times. Why?
1D databases → the SMILES code is not unique
3D databases → comparison of compounds is hard to perform
Comparison of 3D information is hard to perform.
Other types of redundancy:
[Figure: three compounds]
These three compounds may appear as different entries in a database!
6.2.3.2. Reactivity and toxicity
Some chemical moieties are known to be highly reactive and/or toxic.
The compounds that carry these moieties can thus be set aside.
The artemisinin counterexample (an antimalarial drug).
[Figure: structure of artemisinin, which carries a peroxide bridge]
6.2.3.3. Drug-like
The global assumption of this filtering step is that a biologically active molecule looks
like... any other biologically active compound.
From this idea (which may be false), several filters can be set:
The 32 types of rings
The 34 types of moieties
The Lipinski rule
From these filters, a score is determined. Depending on the thresholds you define,
you will get a database with more or fewer compounds.
When a ligand interacts with its target, it loses some degrees of freedom. This
loss lowers the entropy of association and thus increases the free
energy of binding.
To limit this effect, there is no other way than to eliminate, as much as possible,
ligand degrees of freedom by... making rings. But keep in mind that:
You must maintain a similar interaction scaffold (the bioactive
conformation)
Generally, a ligand without flexibility has difficulty crossing
membranes (distribution)
Making rings is thus a smart idea when you are designing biologically active
compounds. Some researchers made an inventory of the 32 rings
classically encountered in drugs.
[Figure: compounds with one and two rings (5- or 6-membered)]
[Figure: compounds with three rings (5- or 6-membered)]
[Figure: other scaffolds]
From a statistical study of 2548 commercially available, orally active substances,
Lipinski defined a rule: Lipinski's "rule of five".
If you want to design an orally available active substance, it must satisfy at least
4 of these 5 points:
A molecular weight lower than 500 g/mol
A logP lower than 5
A number of hydrogen bond donor atoms lower than 5
A number of hydrogen bond acceptor atoms lower than 10
A polar surface area lower than 150 Ų
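The five criteria can be applied as a simple filter once the descriptors are available. This sketch takes the thresholds exactly as listed on the slide and assumes the descriptors were computed beforehand by some cheminformatics tool; the dictionary keys, the function name and the example values are illustrative assumptions (the first entry uses approximate, aspirin-like values, the second is a hypothetical large molecule).

```python
def passes_drug_like_filter(desc, max_violations=1):
    """Return True when at most `max_violations` of the five criteria fail."""
    checks = [
        desc["mol_weight"] < 500.0,  # g/mol
        desc["logp"] < 5.0,
        desc["hbd"] < 5,             # hydrogen bond donors
        desc["hba"] < 10,            # hydrogen bond acceptors
        desc["psa"] < 150.0,         # polar surface area, in square angstroms
    ]
    return checks.count(False) <= max_violations


small_molecule = {"mol_weight": 180.2, "logp": 1.2, "hbd": 1, "hba": 4, "psa": 63.6}
large_molecule = {"mol_weight": 900.0, "logp": 7.0, "hbd": 8, "hba": 15, "psa": 250.0}

print(passes_drug_like_filter(small_molecule))  # all five criteria satisfied
print(passes_drug_like_filter(large_molecule))  # five violations
```

Setting `max_violations=1` implements the "at least 4 of these 5 points" tolerance; a stricter or looser threshold changes the size of the filtered database accordingly.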
6.2.3.4. ADMET
ADMET: Absorption, Distribution, Metabolism, Excretion, Toxicity
Drug candidates usually fail during clinical trials. It is thus essential to
remove compounds that have bad ADMET properties.
2D QSAR equations are used to estimate the various ADMET properties.
Some software is dedicated to predicting pharmacokinetic properties (Volsurf)
or toxicity (CORAL).
With all of these properties, a chemical space can be defined. This space is
useful to visualize the compounds and to select diverse or similar ones.
Chemical descriptors define the axes and colors of the chemical space.
Statistical tools are useful to analyze this chemical space.
6.2.4. Scoring
The molecular docking calculation is a long step.
You can decrease the computational time in two ways:
Generating few docking poses... In this case, you have to be lucky to get
the docking right the first time.
Using a highly filtered database... In this case, you have "few" compounds, but
you have to be lucky that the good molecules have not been discarded.
To sum up, you have to be lucky (or gifted).
Scoring is the Achilles' heel of structure-based virtual screening.
There are 3 main scoring methods (see previous slides). Consensus scoring
is certainly the best way to avoid the major drawbacks of each technique.
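One simple form of consensus scoring is "rank-by-rank": each scoring function ranks the compounds, and the ranks are averaged. The sketch below illustrates that idea only; it assumes that every function scored the same compounds and that a lower score is better for all of them. The function name, the scoring-function labels and the scores are made up for the example.

```python
def consensus_ranking(scores_by_function):
    """Average the rank of each compound over several scoring functions."""
    rank_sums = {}
    for scores in scores_by_function.values():
        ordered = sorted(scores, key=scores.get)  # best (lowest) score first
        for rank, compound in enumerate(ordered, start=1):
            rank_sums[compound] = rank_sums.get(compound, 0) + rank
    n_functions = len(scores_by_function)
    # Sort by mean rank; ties are broken alphabetically by compound name
    return sorted((total / n_functions, compound)
                  for compound, total in rank_sums.items())


scores = {
    "force_field":     {"A": -9.0, "B": -7.5, "C": -8.0},
    "empirical":       {"A": -6.0, "B": -6.5, "C": -5.0},
    "knowledge_based": {"A": -120.0, "B": -90.0, "C": -100.0},
}
ranking = consensus_ranking(scores)
for mean_rank, compound in ranking:
    print(f"{compound}: mean rank {mean_rank:.2f}")
```

Working on ranks rather than raw scores avoids mixing quantities that have different units and scales.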
6.2.5. Assessing quality
As in "classical" molecular docking calculations, if experimental structures of
complexes are known, it is worth adding these compounds to your database.
Normally, these compounds must not be discarded during the filtering
steps.
We can compare the predicted docked position with the experimental
structure. A root mean square deviation (RMSD) can thus be determined:
$$RMSD = \sqrt{\frac{\sum_{i=1}^{N} \left( r_i - r_i^{o} \right)^2}{N}}$$
where r_i is the docked position of atom i, r_i^o its experimental position, and N the number of atoms compared.
The user should define a threshold value, generally between 0.3 and 2 Å. Despite
its simplicity, this metric is far from perfect (size, interaction, symmetry...).
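A direct implementation of the RMSD between a docked pose and a reference structure might look like this. It is a minimal sketch with an illustrative function name: both coordinate lists are assumed to be in the same frame and in the same atom order, and no correction for molecular symmetry is attempted.

```python
import math


def rmsd(docked, reference):
    """Root mean square deviation between two equally ordered coordinate lists."""
    if len(docked) != len(reference) or not docked:
        raise ValueError("both poses must contain the same, non-zero number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(docked, reference))
    return math.sqrt(squared / len(docked))


# Toy example: a two-atom pose shifted by 1 Å along z
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(docked_pose, reference_pose))
```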
[Figure: number of compounds as a function of interaction energy or score, with a selection threshold; false positive and false negative compounds are highlighted]
How can we evaluate a virtual screening procedure? Several groups have
developed the use of decoys in the VS strategy.
Decoys are designed to display physico-chemical properties similar to those of
known ligands.
For example, the DUD-E (Directory of Useful Decoys, Enhanced) contains:
around 102 protein systems (classical drug targets)
for each system, several known ligands (13 on average)
for each ligand, around 50 decoys (with similar properties)
Each database contains, on average, 650 compounds.
We can compute the enrichment factor (EF) for the top x% of selected compounds:
$$EF_{x\%} = \frac{\text{ratio of active compounds in the selected } x\%}{\text{ratio of active compounds in the whole database}}$$
EF must be greater than 1: if EF = 1, your molecular docking is equivalent to
a random choice!
EF is easy to calculate but:
It requires known active compounds
It is not easy to compare virtual screening methods across distinct databases
All active compounds are treated equally. However, some "active"
molecules are only weakly active...
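The enrichment factor is straightforward to compute from a ranked list. In this sketch the input is assumed to be a list of booleans (True for a known active) already sorted from best to worst score; the function name and the example numbers are illustrative.

```python
def enrichment_factor(ranked_labels, fraction):
    """EF for the top `fraction` of a list sorted from best to worst score."""
    n_total = len(ranked_labels)
    actives_total = sum(ranked_labels)
    if actives_total == 0:
        raise ValueError("at least one known active compound is required")
    n_selected = max(1, round(n_total * fraction))
    actives_selected = sum(ranked_labels[:n_selected])
    return (actives_selected / n_selected) / (actives_total / n_total)


# 100 compounds, 10 actives overall, 5 of them ranked in the top 10
ranked = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 85
print(enrichment_factor(ranked, 0.10))
```

Here half of the top 10% are actives against 10% in the whole database, so the EF is about 5; a random ranking would give a value close to 1.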
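ROC (Receiver Operating Characteristic) curves:
% of false positives: $FP = 1 - \frac{N_{\text{inactive unselected}}}{N_{\text{total inactive}}}$
[Figure: ROC plot of the % of active compounds against the % of false positives, showing the random line, the ideal ROC curve and a classical good ROC curve.]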
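A ROC curve can be built by walking down the ranked list and recording, after each compound, the fraction of actives retrieved and the fraction of false positives (selected inactives divided by all inactives, which is the same quantity as FP above). The sketch below makes several assumptions: a lower score means a better rank, there are no tied scores, and the names `roc_points` and `auc` as well as the toy scores are hypothetical.

```python
def roc_points(scores, is_active):
    """Return the (false positive rate, true positive rate) points of the ROC curve."""
    n_active = sum(1 for a in is_active if a)
    n_inactive = len(is_active) - n_active
    if n_active == 0 or n_inactive == 0:
        raise ValueError("ROC needs at least one active and one inactive compound")
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _, active in ranked:
        if active:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / n_inactive, true_pos / n_active))
    return points

def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy screening of six compounds (three actives, three decoys).
toy_scores = [-9.1, -8.4, -7.9, -7.2, -6.5, -6.0]
toy_active = [True, True, False, True, False, False]
toy_auc = auc(roc_points(toy_scores, toy_active))
```

An area under the curve of 0.5 corresponds to the random line and 1.0 to the ideal ROC curve; the toy ranking, with one decoy placed before the last active, gives 8/9.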
6.3. De novo design
Moon and Howe: "Given detailed structural knowledge of the target receptor, it
should be possible to construct a model of a potential ligand, by algorithmic
connection of small molecular fragments, that will exhibit the desired structural
and electrostatic complementarity with the receptor."
De novo design purpose and challenge: build an ideal compound inside the
protein. If synthesized, this compound should be a perfect inhibitor.
Several methods exist for this task. All of them have advantages and
drawbacks. To date, none is perfect!
Examples of software: LUDI, CAVEAT, SPROUT, MCSS... Here, we will see only the
MCSS strategy.
MCSS: Multiple Copy Simultaneous Search
We start from an empty protein binding site.
The binding site is filled with a lot of identical fragments.
Only the best position of the fragment is kept.
Step by step, the protein binding site is completely filled with fragments at optimal
positions.
Last step: linkage of all elements.
[Figure: schematic binding site lined by Thr, Lys, Phe, Trp, Leu, Ile, Val and Ser residues (with NH3+ and OH groups). Many copies of a negatively charged oxygen-containing fragment are placed in the site, only the best one is kept, an N-H fragment is then added, and the fragments are finally linked.]
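The multiple-copy idea can be illustrated with a toy model. The sketch below is not the MCSS program: it assumes a hypothetical 2D binding site made of three point charges, a fragment reduced to a single negative charge, an arbitrary-unit energy (Coulomb term plus r^-12 repulsion), and a greedy random walk standing in for a real force-field minimization. Copies are started on a regular grid to keep the example reproducible; random placement would serve the same purpose.

```python
import numpy as np

# Hypothetical 2D binding site: columns are x, y, charge.
SITE = np.array([
    [0.0, 0.0, +1.0],   # positively charged group (e.g. an ammonium)
    [4.0, 0.0, -0.5],
    [0.0, 4.0, -0.5],
])

def energy(pos, frag_charge=-1.0):
    """Toy fragment/site energy: Coulomb attraction or repulsion plus r^-12 clash term."""
    d = np.linalg.norm(SITE[:, :2] - pos, axis=1)
    d = np.maximum(d, 1e-6)  # avoid division by zero
    return float(np.sum(frag_charge * SITE[:, 2] / d + 1.0 / d ** 12))

def minimize(start, rng, steps=2000, step_size=0.2):
    """Greedy random walk: a trial move is kept only if it lowers the energy."""
    best = np.array(start, dtype=float)
    e_best = energy(best)
    for _ in range(steps):
        trial = best + rng.normal(scale=step_size, size=2)
        e_trial = energy(trial)
        if e_trial < e_best:
            best, e_best = trial, e_trial
    return best, e_best

def multiple_copy_search(seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(-1.0, 5.0, 5)
    starts = [(x, y) for x in grid for y in grid]  # 25 identical copies
    # Each copy interacts with the site only, never with the other copies.
    minimized = [minimize(s, rng) for s in starts]
    # Only the best position of the fragment is kept.
    return min(minimized, key=lambda result: result[1])

best_pos, best_energy = multiple_copy_search()
```

In this toy site the retained copy ends up close to the positive charge, on the side facing away from the two negative charges. Repeating the search with other fragment types and linking the retained positions would correspond to the following steps of the strategy.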
First example: the design of a novel inhibitor of the hepatitis C virus (HCV) helicase.
[Figure: chemical structures of the first LigBuilder proposition, the second LigBuilder proposition, and the compound obtained after human enhancement (IC50 = 260 nM).]
Second example: the design of an inverse agonist of the cannabinoid receptor 1 (CB1).
[Figure: chemical structures of the TOPAS proposition (IC50 = 1500 nM) and of the compound obtained after human enhancement (IC50 = 4 nM).]
Advantages:
Quick
Provides new ideas for chemical scaffolds
The compounds are original, which is not the case in virtual screening
Drawbacks:
How can they be synthesized? Recent software attempts to follow some chemical
synthesis rules...
Sometimes the molecules are generated only to fill the protein cavity and
not to inhibit the enzyme.
A final human design step is always required
7. General conclusions
Molecular docking is an efficient method to predict the structural interaction of an
organic molecule inside a biomacromolecule binding site.
However, molecular docking is weak at determining the interaction energy
(scoring function).
Generally, molecular docking calculations and their applications do not give a unique
solution but rather several solutions. The human expert has the last word.
Molecular docking is mainly applied to drug design and has had many successes.
Some successful drugs through molecular docking between 1995 and 2009.
1) CML: chronic myeloid leukemia
2) EGFR: epidermal growth factor receptor
3) Non-small-cell lung cancer
4) VEGFR: vascular endothelial growth factor receptor
5) Imatinib-resistant gastrointestinal cancer
6) Cutaneous T-cell lymphoma
7) NNRTI: non-nucleoside reverse transcriptase inhibitor.
8. References
Articles:
Shoichet, B.K., D.L. Bodian, and I.D. Kuntz, J. Comp. Chem., 1992. 13(3): p. 380-397.
Meng, E.C., B.K. Shoichet, and I.D. Kuntz, J. Comp. Chem., 1992. 13: p. 505-524.
Kuntz, I.D., J.M. Blaney, S.J. Oatley, R. Langridge, and T.E. Ferrin, J. Mol. Biol., 1982. 161: p.
269-288.
Meng, E.C., D.A. Gschwend, J.M. Blaney, and I.D. Kuntz, Proteins, 1993. 17(3): p. 266-278.
F. Barbault, C. Landon, M. Guenneugues, M. Legrain, et al, Biochemistry 2003. 42 14434-42
D. Eisenberg, E. Schwarz, M. Komaromy and R. Wall, J. Mol. Biol. 1984. 179 125-142.
F. Barbault, B. Ren, J. Rebehmed, C. Teixeira, Y. Luo, et al, Eur. J. Med. Chem. 2008. 43 1648-56.
W. Humphrey, A. Dalke, K. Schulten, J. Mol. Graph. 1996 (14) 33-8
C. Teixeira, N. Serradji, F. Maurel, F. Barbault, Eur. J. Med. Chem. 2009 . 44 3524-32
Hu R., Barbault F., Delamar M., Zhang R. Bioorg. Med. Chem. 2009. 17 2400–9
Morris G.M., Huey R., Lindstrom W., Sanner M.F,et al, J Comput Chem 2009. 30 2785–91.
Morris G.M., Goodsell D.S., Halliday R.S., Huey R., Hart W.E., Belew R.K., Olson A.J., J. Comput.
Chem. 1998. 19 1639–62
T. Cheng, Q. Li, Z. Zhou, Y. Wang, S.H. Bryant, AAPS Journal 2012. 14 133-41
Crivori P, Cruciani G, Carrupt P-A, Testa B., J Med Chem. 2000;43(11):2204–2216.
Toropova AP, Toropov AA, Lombardo A, Roncaglioni A, Benfenati E, Gini G. , J Comput. Chem.
2012. doi:10.1002/jcc.22953.
Sharman JL, Benson HE, Pawson AJ, et al. , Nucleic Acids Res. 2013;41(D1):D1083–D1088
Nesrine Ben Nasr, "Optimisation de méthodes de criblage virtuel et synthèse de molécules
à visée thérapeutique pour le traitement des maladies auto-immunes", 2013, Thesis, CNAM
OMEGA, ROTATE, CAESAR
Boström J, Greenwood JR, Gottfries J., J. Mol. Graph. Model. 2003;21(5):449–462
CORINA - http://www.molecular-networks.com/products/corina
Renner S, Schwab CH, Gasteiger J, Schneider G., J Chem Inf Model. 2006;46(6):2324–2332
Hawkins PCD, Skillman AG, Warren GL, et al, J Chem Inf Model. 2010;50(4):572–584
Li J, Ehlers T, Sutter J, Varma-O’brien S, Kirchmair J., J Chem Inf Model. 2007;47(5):1923–1932
Brooijmans N, Kuntz ID., Annu Rev Biophys Biomol Struct. 2003;32(1):335–373
Boström J., J Comput Aided Mol Des. 2001;15(12):1137–1152
Bissantz C, Folkers G, Rognan D. , J Med Chem. 2000;43(25):4759–4767
Pham TA, Jain AN., J Med Chem. 2006;49(20):5856–5868
Irwin JJ, Raushel FM, Shoichet BK., Biochemistry. 2005;44(37):12316–12328
Huang N, Shoichet BK, Irwin JJ., J Med Chem. 2006;49(23):6789–6801
DUD - A Directory of Useful Decoys. http://dud.docking.org/
Spitzer R, Jain AN., J Comput Aided Mol Des. 2012;26(6):687–699
Neves MAC, Totrov M, Abagyan R., J Comput Aided Mol Des. 2012;26(6):675–686
Brozell SR, Mukherjee S, Balius TE, et al, J Comput Aided Mol Des. 2012;26(6):749–773
Fan H, Irwin JJ, Webb BM, Klebe G, Shoichet BK, Sali A., J Chem Inf Model. 2009;49(11):2512–2527.
DUD-E: A Database of Useful (Docking) Decoys — Enhanced. http://dude.docking.org/
Mysinger MM, Carchia M, Irwin JJ, Shoichet BK., J Med Chem. 2012;55(14):6582–6594
Triballeau N, Acher F, Brabet I, Pin J-P, Bertrand H-O. J Med Chem. 2005;48(7):2534–2547
Kirchmair J, Distinto S, Markt P, et al. J Chem Inf Model. 2009;49(3):678–692
Giganti D, Guillemain H, Spadoni J-L, et al., J Chem Inf Model. 2010;50(6):992–1004
Böhm HJ. J Comput Aided Mol Des. 1992;6(1):61–78
Böhm HJ. J Comput Aided Mol Des. 1992;6(6):593–606
Miranker, A.; Karplus, M., PROTEINS: Struct, Funct, and Gene 1991 11:29–34
Bohacek, R. S. & McMartin, C., J Am Chem Soc 1994 116:5560–71
Gisbert Schneider and Karl-Heinz Baringhaus, "De Novo Design: From Models to
Molecules"(book)
Kandil, S., Biondaro, S., Vlachakis, D., et al, Bioorg. Med. Chem. Lett 2009 19:2935–7
Wang, R., Gao, Y., and Lai, L., J Mol Model 2000 6:498–516
Rogers-Evans, M., Alanine, A., Bleicher, et al, QSAR Comb Sci 2004 26:426–30
Schneider, G., Neidhart, W., Giller, T., et al, Angew. Chem Int Ed 1999 38:2894–6
Alig, L., Alsenz, J., Andjelkovic, M., Bendels, S., Benardeau, A., et al, J Med Chem 2008
51:2115–27
Alex AA, Millan DS. In: Drug Design Strategies.; 2011 (chapter book)