
43_EMIJ-06-00212.pdf


Search algorithms and scoring methods in protein-ligand docking

Umesh Yadava
Professor, Department of Physics, Deen Dayal Upadhyaya Gorakhpur University, India

Endocrinology & Metabolism International Journal, Mini Review, Open Access. Volume 6 Issue 6 - 2018
Correspondence: Umesh Yadava, Professor, Department of Physics, Deen Dayal Upadhyaya Gorakhpur University, Gorakhpur–273009, India, Email u_yadava@yahoo.com
Received: September 10, 2018 | Published: November 13, 2018
Citation: Yadava U. Search algorithms and scoring methods in protein-ligand docking. Endocrinol Metab Int J. 2018;6(6):359‒367. DOI: 10.15406/emij.2018.06.00212
© 2018 Yadava. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.

Abstract
Accurate structural modeling and correct prediction of activity are the two aims of docking studies. Identifying the molecular features of compounds, and the modifications that improve their potency, is a further and difficult problem. Docking is a multi-step process in which each step introduces one or more additional degrees of complication. The first step is to generate ligand conformations and locate the most stable state in the energy landscape. Different search algorithms are used to treat ligand flexibility and, to some extent, protein flexibility. The predicted ligand conformations are evaluated and ranked by scoring functions. Scoring functions make various assumptions and simplifications and do not fully account for a number of physical phenomena that determine molecular recognition. This chapter focuses on the methodological development of search algorithms and scoring functions, including their limitations and advantages.

Introduction
In 1981, Fortune magazine published a cover article entitled “Next Industrial Revolution: Designing Drugs by Computer at Merck”.[1] This event may be regarded as the beginning of scientists' interest in computer-aided drug design (CADD). In the past decade, CADD has re-emerged as a very effective way to develop new potential drugs. A large number of compounds can easily be screened for lead compound discovery. Compounds predicted to be inactive can be skipped and those predicted to be active can be prioritized, which reduces the cost and workload of a full high-throughput screen without compromising lead discovery. This feature provides new impetus for drug discovery.

In the modern post-genomic era, the number of proteins with a known three-dimensional structure is increasing rapidly. This increase in the number of protein targets is due in part to improvements in techniques for structure determination, such as high-throughput X-ray crystallography. The structures produced by structural genomics initiatives are publicly available. Efforts are being made to generate and analyze information on the three-dimensional structures and dynamics of proteins on a large scale, through computational as well as experimental methods. Computational generation of targets through modeling and ab initio prediction of protein structures are helpful tools in computational proteomics. Molecular docking of a protein structure with potential interacting partners (ligands) is heavily used in industrial rational drug design. Discovery and development of a new medicine is a long and expensive process. A newly discovered compound must not only produce the desired response with minimal side effects but also be better than the existing remedies.[2]

Docking is a computational technique that samples conformations of a ligand in a protein binding site. It aims to find an optimized conformation for both the protein and the ligand, and a relative orientation between them, such that the free energy of the overall system is minimized. The interaction between a ligand and its target may be due to non-bonded forces and, in some cases, covalent interactions. Upon binding, many ligands show significant shape complementarity with the region of the macromolecule at the binding site. Ligands often form hydrogen bonds in the active site. Some receptors have a hydrophobic pocket, formed by a group of non-polar amino acids, into which the ligand can place a hydrophobic group of appropriate size. Ligands must be sufficiently lipophilic to partition into the cell membrane, but not so lipophilic that they stay there.

The components of a docking simulation are molecular representation, the conformational search space and ranking of the docked complexes. The protein surface provides toe-hold information about interactions with other molecules. Different mathematical models, such as geometrical shape descriptors or grid generation, are used to define the protein surface, and they can be modified for rigid and flexible portions of the receptor. Docking starts from non-native folded protein chains and ligand conformations guided by experimental observations. Multipart docking demands a large number of possible arrangements. Also, protein domains are not necessarily stable and may have low population times, which indicates that these domains are more flexible than entire protein chains. Solving these problems requires two components: an efficient search procedure and a good scoring function.

The first step in docking is to generate ligand conformations and locate the most stable state in the energy landscape. The scoring function should include, and appropriately parameterize, all the energetic ingredients. To predict the possible conformation of the binary complex, each docking program uses a specific search algorithm, and different scoring functions are used to assign a numerical fitness value to each computed protein-ligand conformation. Scoring functions are also valuable for optimizing and ranking the best docking poses. The scoring function should be fast enough to be applied to a large number of potential solutions. Speed and accuracy are key features for obtaining a successful result in docking simulations. The objective of the docking algorithm and scoring function is a fast method that can identify novel lead compounds or reproduce experimental conformations as accurately as possible.

Search algorithms
In the conformational search, structural parameters of the ligands, such as torsional (dihedral), translational and rotational degrees of freedom, are incrementally modified. Conformational search algorithms perform this task by applying different methods.[3] Identifying the molecular features of compounds, and the modifications that improve potency, is a difficult problem. The docking process may be regarded as a multi-step process in which each step introduces one or more additional degrees of complication. Accurate structural modeling and correct prediction of activity are the aims of docking studies. The search algorithms used to predict plausible conformations of the complex are defined by a set of rules and parameters. In terms of the flexibility of the ligand and/or the receptor, docking algorithms fall into two large sets, rigid-body and flexible docking, which are based on different types of algorithms.

Rigid-body docking considers essential geometric complementarity and treats neither the ligand nor the receptor as flexible, which limits the specificity and accuracy of the results. In many cases, rigid-body docking simulations have nevertheless identified ligand binding sites for proteins that are close to the crystallographic structures.[4] The root mean square deviation (RMSD) between the atomic coordinates obtained from the docking simulation and those of the crystallographic structure is used to compare the structures. In docking simulations, the best results give RMSD values below 1.5 Å. Rigid-body docking is usually used as the fastest way to perform an initial screening of small-molecule databases.
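The RMSD comparison described above can be sketched in a few lines. This is a minimal illustration, assuming the docked and crystallographic poses list the same atoms in the same order and already share one coordinate frame (no superposition is performed); the function name `rmsd` and the sample coordinates are made up for the example.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD in Å between two equally ordered lists of (x, y, z) atomic coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical three-atom ligand: the docked pose is shifted by 1 Å along z.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

value = rmsd(crystal, docked)
print(value)        # every atom is displaced by exactly 1 Å, so the RMSD is 1.0
print(value < 1.5)  # True: below the 1.5 Å threshold quoted above
```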
A well-known example of a rigid-body docking algorithm is DOCK, which is designed to find molecules with a high degree of shape complementarity to the binding site.[5] First, the DOCK program derives a negative image of the binding pocket from the molecular surface of the receptor (protein or nucleic acid). The negative image consists of a collection of overlapping spheres of varying radii, each of which touches the molecular surface at only two points. Ligand atoms are matched to sphere centers to find matching sets in which all the distances between the sphere centers match the corresponding distances between the ligand atoms (Figure 1). The ligand can then be oriented within the binding site by a least-squares fit of the atoms to the sphere centers. The algorithm also checks for steric clashes between the ligand and the receptor. If an orientation is unacceptable, the ligand is reoriented within the least-squares fit limit until an acceptable orientation is obtained. The acceptable orientation is then scored on the basis of a computed interaction energy. Subsequently, new orientations are generated by matching sphere centers and ligand atoms and scored using scoring functions. The orientations are ranked by these scores for subsequent analysis.

Figure 1 The DOCK algorithm: a) atoms are matched with sphere centers and the molecule is then oriented in the binding site; b) docked molecule in the active site of an enzyme.

After the initial screening of ligands through rigid-body docking, flexible docking is used for more specific refinement and lead optimization. Flexible docking requires more computational power. In flexible docking, several possible conformations of the ligand, the receptor, or both are considered at the same time.
Rigid-body docking considers only six degrees of freedom (translational and rotational), whereas flexible docking also considers the conformational degrees of freedom of the ligand and the receptor. Most methods consider only the conformational space of the ligand and treat the receptor as rigid. Docking algorithms share several common methods for searching conformational space. For instance, docking through Monte Carlo methods incorporates simulated annealing as well.[6] Different search algorithms are used to treat ligand flexibility and, to some extent, protein flexibility. Ligand flexibility search methods can be divided into three basic classes: systematic search methods, random or stochastic methods, and simulation methods.

Systematic search algorithms
Systematic search algorithms apply slight variations to the structural parameters, progressively changing the conformation of the ligand. They try to explore all the degrees of freedom in a molecule, as dictated by the rotations of the bonds and angles and by the size of the increments. The number of possible molecular conformations is given by

$$N_{\text{conformations}} = \prod_{i=1}^{N} \prod_{j=1}^{n_{\text{inc}}} \frac{360^{\circ}}{\theta_{i,j}}$$

where $N$ is the number of rotatable bonds, $n_{\text{inc}}$ is the number of increments and $\theta_{i,j}$ is the size of incremental angle $j$ for bond $i$. Because of the large number of conformations, systematic searches ultimately face the problem of combinatorial explosion.[7] The systematic search method probes the energy landscape of the conformational space and, after numerous search and evaluation
cycles, converges to the minimum-energy solution corresponding to the most likely binding mode. Although the method is effective in exploring the conformational space, it can converge to a local minimum rather than the global minimum. This drawback can be overcome by performing simultaneous searches starting from different points of the energy landscape.[3] Incremental construction, conformational search, database methods, fast shape matching and distance geometry are examples of systematic search algorithms. Systematic search methods can be categorized into exhaustive search algorithms and fragmentation-based algorithms.

Exhaustive search algorithms
Exhaustive searches generate ligand conformations by systematically rotating all rotatable bonds at a given interval. The large conformational space often prohibits an exhaustive systematic search. Algorithms such as GLIDE[8] use heuristics to focus on regions of conformational space that are likely to contain good-scoring ligand poses. GLIDE also precomputes a grid representation of the target's shape and properties and creates an initial set of low-energy ligand conformations in ligand torsion-angle space. Initial favorable ligand poses are identified by approximate positioning and scoring methods. This initial screening reduces the conformational space over which the high-resolution docking search is applied. The high-resolution search involves minimization of the ligand using a standard molecular mechanics energy function, followed by a Monte Carlo procedure for examining nearby torsional minima.
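To make the combinatorial explosion of an exhaustive torsion scan concrete, the sketch below counts and enumerates grid conformations. It is a simplified illustration and not GLIDE's procedure: it assumes one fixed integer increment per rotatable bond that divides 360 evenly, and the names `conformation_count` and `torsion_grid` are hypothetical.

```python
import itertools
import math

def conformation_count(increments_deg):
    """Number of grid conformations, given one increment in degrees per rotatable bond."""
    for inc in increments_deg:
        if inc <= 0 or 360 % inc != 0:
            raise ValueError("each increment must be a positive divisor of 360")
    return math.prod(360 // inc for inc in increments_deg)

def torsion_grid(increments_deg):
    """Lazily enumerate every combination of torsion angles on the grid."""
    axes = [range(0, 360, inc) for inc in increments_deg]
    return itertools.product(*axes)

# A 30-degree scan gives 12 settings per bond, so the count grows as 12**N.
print(conformation_count([30] * 5))   # 248832
print(conformation_count([30] * 10))  # 61917364224

# Small grid that is cheap to enumerate: 3 settings x 2 settings.
print(list(torsion_grid([120, 180])))
```

Doubling the number of rotatable bonds from five to ten multiplies the work by roughly a quarter of a million, which is why heuristic pre-screening of the kind described above is needed.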
Fragment-based algorithms
Incremental construction: Fragmentation methods sample ligand conformations by incremental construction from fragments obtained by dividing the ligand of interest. Ligand conformations are obtained either by docking fragments in the binding site one at a time and growing them incrementally, or by docking all fragments into the binding site and linking them covalently. The incremental search is carried out using two different techniques. In the first, known as de novo ligand design, the various molecular fragments are docked into the active region and then linked covalently. The second breaks the ligand up into a rigid part (core) and flexible parts (side chains); the rigid core is docked into the active site first and the flexible parts are added incrementally.[9] The fragments are added within geometric constraints that depend on their steric complementarity and binding affinity to the binding site. An algorithm is also introduced to remove unfavourable conformations. In another systematic search method, libraries of pre-generated ligand conformations are used. This is a helpful tool in rigid-body docking procedures.

Distance geometry (DG): The distance geometry systematic algorithm uses intramolecular and intermolecular distances.[10] Other methods tend to work with a large number of constraints, either by imposing additional bounds or by deducing bounds from the given ones, whereas the distance geometry algorithm uses a smaller set of distance constraints. FLOG[11] uses distance geometry to generate database conformations, which can then be used in the same manner as in DOCK.[12]

Fast shape matching (SM): Fast shape matching algorithms are based on the geometrical overlap between two molecules, derived from their molecular surfaces. Different algorithms are employed to generate several alignments between the ligand and the receptor.
Fast shape matching also predicts possible conformations of the binding site that suit the ligand. Rigid-body docking applications usually make use of the basic concept of the fast shape matching algorithm. For example, ZDOCK[13] uses a geometrical surface model that combines shape complementarity, desolvation and electrostatics through a fast Fourier transform algorithm. Fast shape matching has shown high accuracy in protein-protein docking. It is also widely used within flexible docking strategies. The first step in the DOCK algorithm is based on a sphere-matching procedure combined with the incremental construction method. In FlexX, the interaction geometries between the core fragments and the receptor groups are taken into account when placing the rigid core.[14] FlexX also differs from DOCK in that it uses a pose-clustering algorithm to classify the docked poses. In common with other incremental search algorithms, Hammerhead divides ligands into fragments and docks them into the binding site. After docking the fragments, it rebuilds the ligand using energy minimization criteria, starting from fragments with acceptable initial scores. The use of libraries of pre-generated ligand conformations is another systematic search method. Once the library of acceptable conformations has been calculated, the search problem is reduced to a rigid-body docking procedure.

Stochastic or random search methods
Stochastic or random search methods make random changes to either a single ligand or a population of ligands, and the changes are evaluated with a predefined probability function. Favorable changes are accepted according to the probability criteria. To this end, the algorithm generates ensembles of molecular conformations and populates a wide range of the energy landscape.
This strategy avoids trapping the final solution at a local energy minimum and increases the probability of finding the global minimum. Because the algorithm promotes broad coverage of the energy landscape, the associated computational cost is an important limitation. Genetic algorithms, Monte Carlo simulation and tabu search are examples of stochastic or random search methods, which use different probability criteria for acceptance.

Monte Carlo algorithm
The Monte Carlo algorithm generates an initial configuration of the ligand in the active site, consisting of a random conformation, translation and rotation. This initial configuration is scored on specific criteria (e.g. energy). Small changes are then made to generate a new configuration, which is scored on the same criteria. If the new configuration scores better than the previous one it is retained; otherwise it is accepted or rejected according to the Metropolis criterion. Under the Metropolis criterion, if the configuration is not a new minimum, a Boltzmann-based probability function is applied. If the solution passes the probability test it is accepted; if not, the configuration is rejected. The process is repeated until the desired number of configurations is obtained. Monte Carlo algorithms are used as an initial minimization process in some molecular dynamics programs, such as GROMOS[15] and GROMACS.[16] The docking programs MCDOCK[17] and ICM[18] use Monte Carlo as a flexible docking algorithm. The strength
of the Monte Carlo method is that it provides accurate and precise results under different thermodynamic conditions. Working on an ensemble, the method tries to dock the ligand inside the receptor binding site through numerous random positions and rotations, which reduces the chance of being trapped in local minima. Random orientations and conformations are generated using a random number generator, which decreases the chance of false results.

Genetic algorithm
Genetic algorithms are based on the principles of population and biological evolution. A particular arrangement of the ligand and protein is defined by a set of parameters describing the translation, rotation and conformation of the ligand with respect to the protein. These parameters, called ‘state variables’, correspond to genes in the genetic algorithm. The parameters are encoded in a chromosome, varied stochastically and evaluated by a fitness function. In molecular docking, the total interaction energy of the ligand with the protein is taken as the fitness value. Random pairs of chromosomes are combined (mated) through a process called crossover to produce a new chromosome (offspring). Based on the fitness criteria, the new chromosome inherits genes from either parent. Additionally, some offspring undergo random mutation, in which one gene is changed by a random amount. The mutation is accepted only if it gives a better fitness value. Consequently, solutions better suited to their environment reproduce, whereas poorer ones die out. This process is analogous to gene recombination and mutation producing the next generation.
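The crossover, mutation and selection steps described above can be illustrated with a toy genetic algorithm. This is only a sketch under stated assumptions, not the implementation used by GOLD or AutoDock: the "interaction energy" is a made-up quadratic function (`toy_energy`) with a known minimum at `TARGET`, each chromosome holds four state variables, crossover picks each gene from either parent at random, and all parameter values are arbitrary.

```python
import random

# Hypothetical optimum of the toy landscape (stand-in for the best pose).
TARGET = [1.0, -2.0, 0.5, 90.0]

def toy_energy(genes):
    """Stand-in for the ligand-protein interaction energy; lower is fitter."""
    return sum((g - t) ** 2 for g, t in zip(genes, TARGET))

def evolve(pop_size=40, generations=60, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)  # seeded so that runs are reproducible
    population = [[rng.uniform(-180.0, 180.0) for _ in TARGET]
                  for _ in range(pop_size)]
    history = []  # best energy recorded at the start of each generation
    for _ in range(generations):
        population.sort(key=toy_energy)
        history.append(toy_energy(population[0]))
        parents = population[: pop_size // 2]  # the fitter half reproduces
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: each gene is inherited from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            if rng.random() < mutation_rate:
                # Mutation: change one gene by a random amount and keep the
                # change only if it improves the fitness, as in the text.
                mutant = list(child)
                mutant[rng.randrange(len(mutant))] += rng.gauss(0.0, 10.0)
                if toy_energy(mutant) < toy_energy(child):
                    child = mutant
            children.append(child)
        population = parents + children  # the poorer half is discarded
    population.sort(key=toy_energy)
    history.append(toy_energy(population[0]))
    return population[0], history

best, history = evolve()
print(f"best energy: first generation {history[0]:.1f}, final {history[-1]:.3f}")
```

Because the fitter half of each generation is carried over unchanged, the best energy in `history` can never increase from one generation to the next.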
These algorithms are controlled by a number of parameters, such as mutation rates, crossover rates and the number of evolutionary rounds. The genetic algorithm as implemented in GOLD requires the approximate size and location of the receptor active site as input. Several methods are used to define the active site of the protein. A Lamarckian genetic algorithm is implemented in AutoDock, which switches between genotypic space and phenotypic space. Mutation and crossover take place in genotypic space, while the phenotype is evaluated by the energy function and optimized. After energy minimization, the phenotypic alterations are mapped back onto the genes by changing the state variables of the ligand.[19]

Tabu search algorithm
In the tabu search algorithm, a number of small random changes are made to the current configuration of the ligand and ranked according to the fitness function. The best-ranked change that is not tabu (that is, not a previously rejected conformation) is normally accepted. Developed and described by Glover, tabu search is a meta-heuristic method that has been used to solve hard optimization problems.[20] Heuristics, i.e. approximate solution methods, are used to tackle difficult problems. Initially, combinatorial problems in search space were handled with local search techniques, which start from an initial feasible solution and progressively improve it by a series of local modifications (or moves). The search terminates when it reaches a local minimum, which is an important limitation of this approach. The tabu method consists of n small random changes to the current conformation, which are ranked according to the value of the chosen fitness function. The ‘tabu’ changes (that is, previously rejected conformations) are determined. If the best modification has a lower value than any other accepted so far, it is accepted even if it is ‘tabu’; otherwise, the best ‘non-tabu’ change is accepted and recorded.
This process is repeated until the true minimum is obtained. Usually, tabus are stored in a short-term memory of the search space. Some tabu search techniques implement aspiration criteria, which revoke tabus after a specific period because they may prevent attractive moves even when there is no danger of cycling,[18] but these are rarely used. The tabu search docking algorithm has demonstrated high accuracy: it can prevent the simulation from being trapped in local minima and avoids revisiting previously known minimum-energy conformations.

Simulation methods
Molecular dynamics simulation
The most popular simulation approach for molecular docking is molecular dynamics (MD) simulation, which calculates the trajectory of the system by applying Newtonian mechanics. The force on each atom is calculated from the change in potential energy between its current and new positions, $F_i = -\nabla_i U$. These atomic forces and the corresponding atomic masses are used to determine the atomic positions over a series of small time steps by integrating Newton's second law of motion, $m_i \frac{d^2 r_i}{dt^2} = F_i$. In this way, a trajectory of changes in atomic positions over time is obtained. The problem with molecular dynamics simulation is that it is unable to cross high-energy barriers within a practicable simulation time. Hence, molecular dynamics simulation may locate ligands only within local minima. Combining molecular dynamics with other methods, such as simulated annealing, may give better results. In 1999, Mangoni and coworkers described an MD protocol for docking small flexible ligands to flexible targets in water.[21] The center-of-mass motion of the ligand was separated from its internal and rotational motions, and each was coupled to a different temperature bath. A rigid or flexible ligand and/or receptor was defined using appropriate values of the temperature and coupling constants.
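The integration of Newton's second law in small time steps, described in the molecular dynamics passage above, can be sketched for a single particle. This is a minimal illustration under stated assumptions, not any MD package's integrator: the system is one particle in a one-dimensional harmonic well $U(x) = \tfrac{1}{2} k x^2$, the velocity Verlet scheme is chosen here as a common integrator, and all parameter values are arbitrary reduced units.

```python
def force(x, k=1.0):
    """F = -dU/dx for the assumed harmonic potential U(x) = 0.5 * k * x**2."""
    return -k * x

def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the particle."""
    return 0.5 * mass * v * v + 0.5 * k * x * x

def velocity_verlet(x, v, mass=1.0, dt=0.01, steps=1000, k=1.0):
    """Integrate m * d2x/dt2 = F(x) and return the positions and the final velocity."""
    trajectory = [x]
    f = force(x, k)
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x, k)                    # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append(x)
    return trajectory, v

# Start at rest, displaced by 1.0 from the minimum (initial energy 0.5).
trajectory, v_final = velocity_verlet(x=1.0, v=0.0)
print(len(trajectory))
print(total_energy(trajectory[-1], v_final))
print(max(abs(x) for x in trajectory))
```

The particle never travels beyond its starting displacement: with a fixed total energy it cannot climb higher up the well, which is the same reason a plain MD run struggles to cross high-energy barriers.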
Later, the relaxed-complex approach was introduced, which explores binding conformations that may occur only rarely in the unbound target protein. In this approach, an MD simulation of the ligand-free target is carried out for 2 ns and the ligand is then docked. The relaxed-complex method led to the discovery of raltegravir, the first clinically approved HIV integrase inhibitor.[22] Long MD simulations are used to study the binding of drugs to their protein targets. In contrast to MD simulation, energy minimization methods are hardly ever used as stand-alone search techniques. Energy minimization leads to local energy minima, so these methods are often used to complement other search methods, including Monte Carlo. The DOCK program performs a minimization step after each fragment addition, followed by a final minimization before scoring.

Simulated annealing
Simulated annealing is a probabilistic method for finding the global minimum of a cost function that possesses several local minima. In the simulated annealing search method, the biomolecular system is simulated by a specific kind of molecular dynamics simulation in which every docked conformer is simulated while the temperature is gradually decreased at regular time intervals. Unlikely solutions may be rejected on the basis of scoring criteria or through approximations for steric clashes and distance matches. Simulated annealing has been used in several conformational analysis, protein structure prediction and molecular docking search methods. Since this method considers the conformations and flexibility of both the ligand and the protein, it may give more accurate results than Monte Carlo. However, simulated annealing docking may be more time-consuming, as the annealing cycle has to be repeated for each ligand positioned inside the binding pocket of the receptor. Simulated annealing is combined
with Monte Carlo (AutoDock), where random changes are made to the ligand orientation during each simulated annealing temperature cycle. The energy of the new configuration is compared with that of the previous one. If the new energy is lower, the new configuration is kept for comparison with the next configuration; otherwise, it is accepted or rejected according to the Metropolis criterion.

Several docking algorithms are used by different programs and have demonstrated considerable accuracy in different cases. Table 1 lists various docking programs and the search algorithms implemented in them. Most of the programs are focused on virtual screening initiatives. The key point in the development of docking algorithms is how closely the results match experimental values, which mostly depends on the type of target and ligands under study. Methods that are more complex and consider many physicochemical and thermodynamic properties are likely to provide higher accuracy but cost more computational time.

Table 1 Docking algorithms implemented in various docking programs

Program        Algorithm
AutoDock       Lamarckian genetic algorithm
DOCK           Shape matching
EUDOCK         Shape matching
FlexX          Incremental construction
FLOG           Incremental construction
GLIDE          Descriptor matching/Monte Carlo
GOLD           Genetic algorithm
HammerHead     Incremental construction
MSDOCK         Shape matching
MCDOCK         Monte Carlo
M-ZDOCK        Shape matching
ICM            MC minimization
LigandFit      Shape matching
QXP            MC minimization, free searching and pruning
SLIDE          Descriptor matching
Surflex Dock   Surface-based molecular similarity
SYSDOCK        Shape matching
ZDOCK          Shape matching

Docking algorithms in drug design and discovery
Drugs are small ligands that are often highly flexible, with a large number of conformations.
The surfaces of drug molecules are optimally complementary to the receptor pocket. Docking schemes for drug design are therefore based on enhanced ligand flexibility. Water molecules at the interface play important roles by mediating hydrogen bonds.[23] Two approaches are adopted for docking in drug design. In the first approach, fragments are selected combinatorially from a library of chemical fragments and docked into the binding site, and the molecule is grown considering all the permissible degrees of freedom. The docked orientation is examined to minimize energies and to search for the most favourable combinations. Algorithms for such problems are reviewed by Bohacek and others.[24] The second approach is based on docking the entire molecule. Instead of ligating fragments step by step, database matching and scoring are carried out. The databases used for this purpose include CMC (Comprehensive Medicinal Chemistry), NCI (National Cancer Institute databases of drugs), ACD (Available Chemicals Directory) and ZINC. Either the selected compounds are docked directly one by one, or only compounds containing given pharmacophores are docked. Pharmacophores are the geometrical features or chemical attributes common to the compounds found experimentally to bind to the given receptor. Geometrical features include hydrogen bonds, the coordination of hydrophobic atoms, charged groups and so on, which are responsible for recognition of the ligand by the receptor. Pharmacophores help to identify potential ligands in the databases. Pharmacophores are identified by algorithms for multiple structural alignment.[25] Finn et al.[26] developed an algorithm called Randomized Pharmacophore Identification for Drug Design (RAPID), which is designed to find the structural alignment between a pair of molecules. In this method, a triplet is selected randomly from one molecule and a congruent triplet is sought in the other molecule.
If the yield is larger than required, the transformation is considered a solution. At each iteration, the problem must be solved again between the current solution and the next molecule. In these approaches, the difficulty lies in drug flexibility. An efficient algorithm for flexible 3D structure matching against massive databases of small molecules has been proposed by Rigoutsos et al.[27] This method finds the molecules that share a substructure with the query molecule, allowing torsional flexibility around rotatable bonds. Pharmacophoric searches are routinely used in the search for novel active compounds.[28] Taking into account the intrinsic flexibility of the active site and the entropic penalty associated with ligand binding, Carlson et al.[29] developed a dynamic pharmacophore construction algorithm, which has been tested on HIV-1 integrase. Despite the advantages of using pharmacophores and docking compounds, this method is limited by drug diversity. Drug diversity is of particular importance because it has been shown that the volume and shape of the binding site can change, and drugs of diverse shape, size and composition can bind at the same site.

Scoring functions
Docking algorithms predict a number of orientations (poses) for the ligand inside the binding site. The predicted ligand conformations are evaluated and ranked by approximate mathematical functions known as scoring functions. Scoring functions have three important applications in molecular docking: ligand binding mode identification, binding affinity prediction, and virtual database screening. An accurate scoring function would perform equally well on each of them. The design of consistent and reliable scoring functions is vital. Generally, free-energy estimation techniques are used to develop scoring functions for protein-ligand complexes.
Enthalpic and entropic effects also play important roles in ligand-binding events. Free energy perturbation approaches consider an additive equation of the various components of binding.30 A complete equation of this kind would have the terms appearing in equation (1).

\[
\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{solvent}} + \Delta G_{\mathrm{conf}} + \Delta G_{\mathrm{int}} + \Delta G_{\mathrm{rot}} + \Delta G_{\mathrm{trans/rot}} + \Delta G_{\mathrm{vib}} \tag{1}
\]

where ΔGsolvent is the contribution arising from the interaction of solvent with ligand and protein. ΔGconf is the effect of conformational changes in protein and ligand. ΔGint represents the contribution of the
free energy of specific protein-ligand interactions. ΔGrot is the free energy loss due to freezing rotatable bonds, generally regarded as an entropic contribution. ΔGtrans/rot represents the loss in translational and rotational free energy caused by the association of two bodies (protein and ligand) into a single body (the protein-ligand complex). ΔGvib is the free energy contribution due to changes in vibrational modes. There are a number of approaches for estimating the various terms. Scoring functions make various assumptions and simplifications regarding these terms and do not fully account for a number of physical phenomena that determine molecular recognition. Docking programs usually employ three types of scoring functions: force-field based, knowledge based and empirical.

Force-field based scoring

Force-field based scoring functions are built on a force field that expresses the energy of the system as a sum of the terms involved in molecular recognition: non-bonded terms (van der Waals (VDW) and electrostatic interactions) and bonded terms (bond stretching/bending/torsional forces). Force-field methods use a variety of force-field parameter sets. Empirical scoring functions, in contrast, use several intermolecular interaction terms calibrated against as much experimental data as possible. Their design rests on the idea that binding energies can be approximated by a sum of individual uncorrelated terms.
A typical semiempirical force-field scoring function, as used for molecular docking in DOCK, is composed of two energy components: a Lennard-Jones potential and an electrostatic term, with energy parameters taken from the Amber force fields. The force field as implemented in DOCK can be expressed by equation (2).

\[
U = \sum_{i}\sum_{j}\left(\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}}\right) \tag{2}
\]

where rij represents the distance between protein atom i and ligand atom j, Aij and Bij are the van der Waals parameters, and qi and qj are the atomic charges. Here, ε(rij) is the distance-dependent dielectric constant, which represents the effect of solvent implicitly. Despite its computational efficiency, this scoring function cannot account for desolvation effects, which reflect the aqueous or non-aqueous environment of polar and non-polar groups. Without a desolvation term, the scoring function is biased towards coulombic interactions and favours highly charged ligands. The common way of introducing desolvation is to treat water molecules explicitly; however, such methods are computationally expensive. The computational cost is reduced by treating water as a continuum dielectric medium. These models include Poisson-Boltzmann surface area (PB/SA) and Generalized-Born surface area (GB/SA),31 which are often used for post-scoring in docking programs. In some simplified scoring schemes, the solvation effect in ligand binding free energy calculations is evaluated using a GB/SA approach. The electrostatic interactions and the electrostatic desolvation costs are calculated with the GB model, while the hydrophobic contributions of non-polar atoms are estimated from the solvent-accessible surface areas (SA) of the atoms. The Lennard-Jones potential is used to estimate van der Waals energies. The parameters of the van der Waals, hydrophobic and electrostatic contributions are optimized for agreement with experimental affinity data.
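The pairwise sum of equation (2) can be sketched in a few lines. This is a simplified illustration, not the DOCK code: it assumes a single pair of van der Waals parameters `a_ij` and `b_ij` for all atom pairs (real force fields use per-atom-type values), a dielectric of the form ε(r) = `dielectric_scale` × r, and the usual conversion factor of 332.06 kcal Å mol⁻¹ e⁻² so that the Coulomb term comes out in kcal/mol. All names are hypothetical.

```python
import math

# Converts e^2/Å to kcal/mol; not part of equation (2), added for units (assumption).
COULOMB_CONST = 332.06


def dock_like_energy(protein, ligand, a_ij, b_ij, dielectric_scale=4.0):
    """Sum a 12-6 Lennard-Jones term and a Coulomb term over all
    protein-ligand atom pairs. Atoms are (x, y, z, charge) tuples."""
    total = 0.0
    for (*p_xyz, q_i) in protein:
        for (*l_xyz, q_j) in ligand:
            r = math.dist(p_xyz, l_xyz)
            eps = dielectric_scale * r  # distance-dependent dielectric (assumed form)
            total += a_ij / r**12 - b_ij / r**6
            total += COULOMB_CONST * q_i * q_j / (eps * r)
    return total


# Two neutral atoms 1 Å apart: only the Lennard-Jones part contributes.
lj_only = dock_like_energy([(0, 0, 0, 0.0)], [(1, 0, 0, 0.0)], a_ij=1.0, b_ij=2.0)

# Opposite unit charges 2 Å apart with the van der Waals part switched off.
coulomb_only = dock_like_energy([(0, 0, 0, 1.0)], [(2, 0, 0, -1.0)], a_ij=0.0, b_ij=0.0)
```

Because no desolvation penalty appears in this sum, increasing the ligand charges always strengthens the attractive Coulomb term, which is the bias towards highly charged ligands noted above.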
Various force-field scoring functions are based on different force-field parameter sets. For example, G-Score32 is based on the Tripos force field33 and AutoDock34 on the AMBER force field. However, the functional forms are usually similar. Standard force-field scoring functions are limited not only by the lack of solvation and entropic terms; they also suffer from the cut-off distances used in the treatment of non-bonded interactions, which complicate the accurate treatment of long-range effects. Hydrogen-bonding terms are included in different ways. G-Score includes hydrogen-bonding terms that depend upon the nature and geometry of the hydrogen-bonding interaction. AutoDock treats all hydrogen bonds with a directional term and a 12–10 Lennard–Jones potential. The recent semiempirical AutoDock scoring scheme includes evaluations for dispersion/repulsion, hydrogen bonding, electrostatics, and desolvation terms (equation (3)):

\[
V = W_{\mathrm{vdw}}\sum_{i,j}\left(\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right)
  + W_{\mathrm{hbond}}\sum_{i,j} E(\theta)\left(\frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}}\right)
  + W_{\mathrm{elec}}\sum_{i,j}\frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}}
  + W_{\mathrm{sol}}\sum_{i,j}\left(S_i V_j + S_j V_i\right) e^{-r_{ij}^{2}/2\sigma^{2}} \tag{3}
\]

The weighting constants W have been optimized to calibrate the empirical free energy against a set of experimentally determined binding constants. The first term represents the van der Waals dispersion/repulsion interactions. The second term is the directional H-bond energy term, in which C and D are the optimal parameters for hydrogen bonds and the function E(θ) provides directionality based on the angle θ from ideal hydrogen-bonding geometry. The third term is the Coulomb electrostatic potential. The final term is a desolvation potential based on the volume of atoms (V) that surround a given atom and shelter it from solvent, weighted by a solvation parameter (S) and an exponential term with distance-weighting factor σ = 3.5 Å.35

Empirical scoring

Empirical scoring functions consist of several energy terms whose coefficients (weights) are obtained by regression analysis using experimentally determined binding energies and X-ray structures. Because the energy terms are simple, binding scores are calculated much faster with empirical scoring functions than with force-field scoring functions. Böhm36 developed the first empirical scoring function, SCORE1, based on experimental data for forty-five protein-ligand complexes; it consists of four energy terms: ionic interactions, hydrogen bonds, the lipophilic contact between protein and ligand, and the number of rotatable bonds in the ligand. Empirical scoring functions were later improved by taking additional parameters into account. The ChemScore empirical scoring function, presented by Eldridge and coworkers,37 consists of terms for metal-atom interactions, hydrogen bonds, the lipophilic effects of atoms, and the effective number of rotatable bonds in the ligand. In ChemScore, the free energy of binding is given by equation (4).

\[
\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{Hbond}}\sum_{\mathrm{Hbond}} f(\Delta R, \Delta\alpha)
  + \Delta G_{\mathrm{metal}}\sum_{\mathrm{metal}} f(\Delta R, \Delta\alpha)
  + \Delta G_{\mathrm{lipo}}\sum_{\mathrm{lipo}} f(\Delta R)
  + \Delta G_{\mathrm{rot}}\sum_{\mathrm{rot}} f(P_{nl}, P'_{nl})
  + \Delta G_{0} \tag{4}
\]
Here, f is a function that depends on angular (Δα) and/or distance (ΔR) terms, and ΔG0 is the regression constant. Wang et al. developed a new empirical function, called X-Score, based on a larger set of 200 protein–ligand complexes. X-Score contains VDW interactions, hydrogen bonds, hydrophobic effects and effective rotatable bonds in the ligand.38 The terms accounting for non-bonded interactions are included in empirical scoring functions in various ways. For example, in the early LUDI formulation,30 the hydrogen-bonding term is separated into neutral hydrogen bonds and ionic hydrogen bonds, whereas ChemScore does not differentiate between different types of hydrogen bonds. The treatment of hydrophobic interactions also differs between functions. F-Score has an additional term for aromatic interactions. GlideScore39 takes into account a number of parameters: hydrogen bonds (H-bond), hydrophobic contacts (Lipo), van der Waals (vdW) and coulombic (Coul) interactions, polar interactions in the binding site (Site), a metal-binding term (Metal), and penalties for buried polar groups (BuryP) and for freezing rotatable bonds (RotB) (equation (5)).
\[
\text{G-score} = \text{H-bond} + \text{Lipo} + \text{Metal} + \text{Site} + 0.130\,\text{Coul} + 0.065\,\text{vdW} - \text{BuryP} - \text{RotB} \tag{5}
\]

In the Extra Precision (XP) docking protocol of Glide, in addition to unique water desolvation energy terms, protein-ligand structural motifs leading to enhanced binding affinity are included: (i) hydrophobic enclosure, where groups of lipophilic ligand atoms are enclosed on opposite faces by lipophilic protein atoms, (ii) neutral-neutral single or correlated hydrogen bonds in a hydrophobically enclosed environment, and (iii) five categories of charged-charged hydrogen bonds.40 The XP GlideScore contains the following terms (equation (6)).41

\[
\text{XP GlideScore} = E_{\mathrm{coul}} + E_{\mathrm{vdW}} + E_{\mathrm{bind}} + E_{\mathrm{penalty}} \tag{6}
\]

where

\[
E_{\mathrm{bind}} = E_{\mathrm{hyd\_enclosure}} + E_{\mathrm{hb\_nn\_motif}} + E_{\mathrm{hb\_cc\_motif}} + E_{\mathrm{PI}} + E_{\mathrm{hb\_pair}} + E_{\mathrm{phobic\_pair}}
\]

and

\[
E_{\mathrm{penalty}} = E_{\mathrm{desolv}} + E_{\mathrm{ligand\_strain}}
\]

Despite their fast and direct estimation of binding affinities, these scoring schemes are limited by the penalty term for bad structures and by their strong dependence on the placement of hydrogen atoms.

Knowledge based scoring

Knowledge-based scoring functions are derived from statistical observations of intermolecular contacts identified in structural databases. They are designed to reproduce structures rather than binding energies. Potential of Mean Force (PMF), DrugScore, etc. are popular implementations of these scoring functions, which use pairwise atomic potentials. Like empirical methods, knowledge-based scoring functions attempt to capture implicitly the binding effects that are difficult to model explicitly. Compared with force-field and empirical scoring functions, knowledge-based scoring functions offer a good balance between accuracy and speed. The potentials employed in these schemes are extracted from the structures rather than fitted to reproduce known affinities.
Knowledge-based scoring functions are consequently quite robust and relatively insensitive to the training set, since they are derived from a large and diverse structural database. Tanaka and Scheraga pioneered knowledge-based scoring, which led to the development of a number of knowledge-based scoring functions that have been applied in protein structure prediction and protein–ligand studies (Li and Liang, 2006). The functional form of the knowledge-based scoring function SMoG42 is given by equation (7):

\[
G = \sum_{i,j} g_{ij}\,\Delta_{ij}; \qquad g_{ij} = -kT \log\left(\frac{p_{ij}}{\bar{p}}\right) \tag{7}
\]

where Δij = 0 or 1 depending on whether atoms i and j are more than 5 Å apart or within 5 Å of each other, pij and p̄ represent the interatomic and averaged interatomic interactions, k is the Boltzmann constant and T is the temperature in kelvin. Potential of Mean Force (PMF)43,44 and DrugScore also include solvent-accessibility corrections to the pairwise potentials. Knowledge-based scoring functions are computationally simple. A disadvantage is that their derivation relies on information implicitly encoded in limited sets of protein–ligand complex structures. Table 2 lists the types of scoring functions employed in different docking programs.

Table 2 Types of scoring functions employed in docking programs
Force field based scoring: DOCK, AutoDock, GOLD, SYBYL/D-Score, SYBYL/G-Score, etc.
Empirical scoring: FlexX, Glide, LUDI, ICM, ChemScore, X-Score, Surflex, SYBYL/F-Score, SCALE, SFCscore, LigScore, etc.
Knowledge based scoring: ITScore, PMF, DrugScore, SMoG, BLEEP, MScore, KScore, GOLD/ASP, DFIRE, etc.

Consensus scoring

Because of the imperfections of individual scoring functions, another type of scoring, known as consensus scoring, is now used; it combines the information from different scoring schemes to overcome their limitations. This can be very useful, as it combines the advantages and simultaneously attenuates the shortcomings of each method.
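One simple way to combine several scoring schemes is to average the rank each compound receives from each function. The sketch below assumes this rank-averaging scheme, which is only one of several possible consensus strategies and is not taken from any specific program. It also assumes that lower scores are better and that every compound is scored by every function; the function name and the toy scores are hypothetical.

```python
def consensus_rank(scores_by_function):
    """Return (compound, mean rank) pairs sorted from best to worst.

    scores_by_function maps a scoring-function name to a dict of
    {compound: score}; lower scores are assumed to be better.
    """
    rank_sums = {}
    for scores in scores_by_function.values():
        ordered = sorted(scores, key=scores.get)  # best (lowest) score first
        for rank, compound in enumerate(ordered, start=1):
            rank_sums[compound] = rank_sums.get(compound, 0) + rank
    n_functions = len(scores_by_function)
    mean_rank = {c: total / n_functions for c, total in rank_sums.items()}
    return sorted(mean_rank.items(), key=lambda item: item[1])


# Toy scores from three hypothetical scoring functions for compounds A, B and C.
toy_scores = {
    "force_field": {"A": -9.0, "B": -7.0, "C": -5.0},
    "empirical": {"A": -6.0, "B": -8.0, "C": -4.0},
    "knowledge": {"A": -3.0, "B": -1.0, "C": -2.0},
}
ranking = consensus_rank(toy_scores)
```

Working with ranks rather than raw scores avoids having to put energies from different functions on a common scale, at the cost of discarding the size of the score differences.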
Chemscore, GFscore, Xcscore, Gold-like, FlexX, etc. are examples of consensus scoring functions.

Concluding remarks

Docking simulations support pharmaceutical research by cutting much of the cost and effort involved in traditional drug discovery. Virtual screening on protein templates provides an opportunity for de novo identification of ligands without bias from known hits. A large number of search algorithms have been developed to incorporate the flexibility of ligands and/or proteins and to obtain the correct poses of the complexes. The results
obtained with the search methodologies employed by different docking programs depend strongly on the system chosen for study. Therefore, the docking algorithm should be chosen with caution. The interplay between docking and scoring functions is fairly complex, but it is often easier to produce reliable models of bound ligands than to distinguish true ligands from false positives. Despite considerable interest and improvements, current scoring functions are still far from being universally acceptable. Each scoring function has its own advantages and limitations. Comparisons of scoring functions are not meaningful if the functions are tested on different sets. Some comparison studies available online (http://www.csardock.org, http://dud.docking.org, CCDC/Astex etc.) may be invaluable for the development of new scoring functions and the improvement of existing ones.

Acknowledgments

None.

Conflict of interest

The author declares there is no conflict of interest.

References

1. Van Drie JH. Computer-aided drug design: the next 20 years. J Comput Aided Mol Des. 2007;21(10-11):591–601.
2. Kitchen DB, Decornez H, Furr JR, et al. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov. 2004;3(11):935–949.
3. Ferreira LG, dos Santos RN, Oliva G, et al. Molecular Docking and Structure-Based Drug Design Strategies. Molecules. 2015;20(7):13384–13421.
4. Oliveira JS, Pereira JH, Canduri F, et al. Crystallographic and pre-steady-state kinetics studies on binding of NADH to wild-type and isoniazid-resistant enoyl-ACP(CoA) reductase enzymes from Mycobacterium tuberculosis. J Mol Biol. 2006;359(3):646–666.
5. Kuntz ID, Blaney JM, Oatley SJ, et al. A geometric approach to macromolecule-ligand interactions. J Mol Biol. 1982;161(2):269–288.
6. Goodsell DS, Olson AJ. Automated Docking of Substrates to Proteins by Simulated Annealing. Proteins. 1990;8(3):195–202.
7. Leach AR. Molecular Modelling: Principles and Applications, Addison Wesley Longman Limited, Harlow.
8. Friesner RA, Banks JL, Murphy RB, et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem. 2004;47(7):1739–1749.
9. Rarey M, Kramer B, Lengauer T, et al. A Fast Flexible Docking Method using an Incremental Construction Algorithm. J Mol Biol. 1996;261(3):470–489.
10. Moré JJ, Wu Z. Distance Geometry Optimization for Protein Structures. Journal of Global Optimization. 1999;15(3):219–234.
11. Miller MD, Kearsley SK, Underwood DJ, et al. FLOG: a system to select ‘quasi-flexible’ ligands complementary to a receptor of known three-dimensional structure. J Comput Aided Mol Des. 1994;8(2):153–174.
12. Ewing TJ, Makino S, Skillman AG, et al. DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases. J Comput Aided Mol Des. 2001;15(5):411–428.
13. Chen R, Li L, Weng Z. ZDOCK: An initial-stage protein-docking algorithm. Proteins. 2003;52(1):80–87.
14. Lorber DM, Shoichet B. Flexible ligand docking using conformational ensembles. Protein Sci. 1998;7(4):938–950.
15. Van Der Spoel D, Lindahl E, Hess B, et al. GROMACS: fast, flexible, and free. J Comput Chem. 2005;26(16):1701–1718.
16. Christen M, Hunenberger PH, Bakowies D, et al. The GROMOS software for biomolecular simulation: GROMOS05. J Comput Chem. 2005;26(16):1719–1751.
17. Liu M, Wang S. MCDOCK: A Monte Carlo simulation approach to the molecular docking problem. J Comput Aided Mol Des. 1999;13(5):435–451.
18. Totrov R, Abagyan R. Flexible protein–ligand docking by global energy optimization in internal coordinates. Proteins. 1997;(Suppl):215–220.
19. Brooijmans N, Kuntz ID. Molecular recognition and docking algorithms. Annu Rev Biophys Biomol Struct. 2003;32:335–373.
20. Glover F. Future paths for integer programming and links to artificial intelligence. Computers and Operations Research. 1986;13(5):533–549.
21. Mangoni M, Roccatano D, Di Nola A. Docking of flexible ligands to flexible receptors in solution by molecular dynamics simulation. Proteins. 1999;35(2):153–162.
22. Cocohoba J, Dong BJ. Raltegravir: The first HIV integrase inhibitor. Clin Infect Dis. 2009;48(7):931–939.
23. Lengauer T, Rarey M. Computational methods for biomolecular docking. Curr Opin Struct Biol. 1996;6(3):402–406.
24. Bohacek RS, McMartin C, Guida WC. The art and practice of structure-based drug design: a molecular modeling perspective. Med Res Rev. 1996;16(1):3–50.
25. Leibowitz N, Fligelman Z, Nussinov R, et al. An automated multiple structure alignment and detection of a common substructural motif. Proteins. 2001;43:235–245.
26. Finn PW, Kavraki LE, Latombe JC, et al. RAPID: randomized pharmacophore identification for drug design. Computational Geometry. 1997;97:324–333.
27. Rigoutsos I, Platt D, Califano A. Flexible 3D-substructure matching and novel conformer derivation in very large databases of 3D-molecular information. IBM Research Division. Yorktown Heights, NY: T.J. Watson Research Center, 1996.
28. Martin YC. 3D database searching in drug design. J Med Chem. 1992;35(12):2145–2154.
29. Carlson HA, Masukawa KM, Rubins K, et al. Developing a dynamic pharmacophore model for HIV-1 integrase. J Med Chem. 2000;43(11):2100–2114.
30. Böhm HJ. LUDI: rule-based automatic design of new substituents for enzyme inhibitor leads. J Comput Aided Mol Des. 1992;6(6):593–606.
31. Rocchia W, Sridharan S, Nicholls A, et al. Rapid grid-based construction of the molecular surface and the use of induced surface charge to calculate reaction field energies: applications to the molecular systems and geometric objects. J Comput Chem. 2002;23(1):128–137.
32. Hawkins GD, Cramer CJ, Truhlar DG. Pairwise solute descreening of solute charges from a dielectric medium. Chemical Physics Letters. 1995;246:122–129.
33. Kramer B, Rarey M, Lengauer T. Evaluation of the FLEXX incremental construction algorithm for protein–ligand docking. Proteins. 1999;37(2):228–241.
34. Morris GM, Goodsell DS, Halliday RS, et al. Automated Docking Using a Lamarckian Genetic Algorithm and Empirical Binding Free Energy Function. Journal of Computational Chemistry. 1998;19(14):1639–1662.
35. Forli S, Botta M. Lennard-Jones Potential and Dummy Atom Settings to Overcome the AUTODOCK Limitation in Treating Flexible Ring Systems. J Chem Inf Model. 2007;47(4):1481–1492.
36. Böhm HJ. The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure. J Comput Aided Mol Des. 1994;8(3):243–256.
37. Eldridge MD, Murray CW, Auton TR, et al. Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput Aided Mol Des. 1997;11(5):425–445.
38. Wang R, Lai L, Wang S. Further development and validation of empirical scoring functions for structure based binding affinity prediction. J Comput Aided Mol Des. 2002;16(1):11–26.
39. Friesner RA, Murphy RB, Repasky MP, et al. Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J Med Chem. 2006;49(21):6177–6196.
40. Gohlke H, Hendlich M, Klebe G. Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol. 2000;295(2):337–356.
41. Yadava U, Singh M, Roychoudhury M. Pyrazolo[3,4-d]pyrimidines as inhibitor of anti-coagulation and inflammation activities of phospholipase A2: insight from molecular docking studies. J Biol Phys. 2013;39(3):419–438.
42. Muegge I, Martin YC. A general and fast scoring function for protein-ligand interactions: a simplified potential approach. J Med Chem. 1999;42(5):791–804.
43. Li X, Liang J. Knowledge-based energy functions for computational studies of proteins. In: Computational Methods for Protein Structure Prediction and Modeling. Xu Y, Xu D, Liang J, editors. New York: Springer.
44. Higueruelo AP, Schreyer A, Bickerton GR, et al. What Can We Learn from Molecular Recognition in Protein–Ligand Complexes for the Design of New Drugs? Angewandte Chemie International Edition. 1996;35:2588–2614.
