Basics of QSAR Modeling
Prof. Rahul D. Jawarkar,
Department of Pharmaceutical Chemistry,
Dr Rajendra Gode Institute of Pharmacy,
University Mardi Road, Amravati, Maharashtra, India (444602),
E-Mail: rahuljawarkar@gmail.com,
Contact:+91-7385178762.
Drug: A drug is a single active chemical moiety that is found
in a medicine and used for the diagnosis, prevention, treatment and
cure of a disease.
Chemotherapy: The treatment of infection or malignancy
with a specific chemical that has selective adverse
effects on the infecting organism or malignant cell, with minimal effects on host cells.
Synthetic (20%)
Natural source (80%)
Drug Discovery- Finding therapeutic actions of the
molecule, e.g. penicillin, the antiplatelet action of aspirin,
etc.
Drug Development
Drug Designing- Modifying the molecule for high
activity and better Absorption-Distribution-Metabolism-Excretion-Toxicity
(ADMET), e.g. Tamiflu, Relenza,
Dorzolamide, etc.
Drug Delivery- Developing methods for drug
administration. e.g. Gelatin, starch, etc.
Conventional Procedure for drug
discovery/designing:
Synthesis-Testing-Synthesis-Testing
Cost for designing a new drug is
about $300 million
Needs 10-15 years to launch a drug
in market.
Resources like time, chemicals, etc.
are consumed
Slower, frustrating, lower success,
etc.
QSAR is not theoretical !!!!
• Collection of experimental bioactivity like IC50, EC50,
LD50, Kd, Ki, etc.
• Use of chemical structures of reported molecules only
• Comparison of bioactivity of one molecule with another
• Finding reasons for high and low activity
• Validating analysis using Statistical techniques
• OECD guidelines
In short, the experimental work has already been done;
QSAR analysis is then applied to the experimental
data to identify the reasons for the bioactivity of a molecule.
Quantitative Structure-Activity Relationship (QSAR)
“Similar compounds behave similarly
and
Activity or Property varies with Structure.”
Activity = Lipophilicity + Steric + Electronic + Unknown
Factors
A QSAR is a multivariate, mathematical relationship
between a set of 2D- and 3D- physicochemical
properties (molecular descriptors) and a biological
activity/toxicity.
Do you agree?
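One classical way to write the additive relationship above is a Hansch-type linear free-energy equation. The form below is a sketch, not something taken from these slides. The symbols are assumptions introduced for illustration: C is the molar concentration that produces a standard biological response, log P is the octanol-water partition coefficient (lipophilicity), E_s is a steric parameter, sigma is an electronic substituent constant, and a, b, c, d are coefficients fitted by regression. The "unknown factors" end up in the residual of the fit.

```latex
\log\frac{1}{C} = a\,\log P + b\,E_s + c\,\sigma + d
```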
Important steps involved in
QSAR analysis:
• Experimental data collection
• Structure drawing and appropriate 3D-
optimization
• Molecular descriptor calculation and pruning
• Model building
• Model validation
• Model interpretation
Experimental data collection:
1. ChEMBL Database - EMBL-EBI: ChEMBL is a manually
curated database of bioactive molecules with drug-like
properties. It brings together chemical, bioactivity and
genomic data
https://www.ebi.ac.uk/chembl/
2. Binding Database: BindingDB is a public, web-accessible
database of measured binding affinities, focusing chiefly on
the interactions of proteins considered to be drug targets with
small, drug-like molecules.
http://bindingdb.org/bind/index.jsp
3. Enzyme Database – BRENDA: A comprehensive enzyme
information system. https://www.brenda-enzymes.org/
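Bioactivities collected from such databases are usually converted to a logarithmic scale before modeling. The sketch below assumes IC50 values reported in nanomolar and converts them to pIC50 = -log10(IC50 in mol/L). The function name and the compound labels are hypothetical, not part of any database API.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


# Hypothetical records, as they might look after export from a database.
records = [("cmpd_A", 8000.0), ("cmpd_B", 80.0), ("cmpd_C", 8.8)]
pic50 = {name: round(pic50_from_nm(value), 2) for name, value in records}

for name, value in pic50.items():
    print(name, value)
```

Working in pIC50 units keeps a 100-fold change in potency equal to 2 units everywhere on the scale, which suits linear models better than raw concentrations.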
Structure drawing and appropriate
3D-optimization:
Identification, Information & Description
Molecular descriptor calculation
and pruning:
1D- like MW, Number of atoms, etc.
2D- like Distance, functional group, etc.
3D- like torsional angles, etc.
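As a minimal illustration of 1D descriptors, the sketch below computes molecular weight and atom count from a plain molecular formula. It is a toy, assuming formulas without brackets or charges and a small table of atomic masses. Real descriptor software (for example RDKit or PaDEL from the list of tools) works from full structures instead.

```python
import re

# Standard atomic masses in g/mol for a few common elements only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "F": 18.998, "S": 32.06, "Cl": 35.45}


def parse_formula(formula: str) -> dict:
    """Count atoms in a simple formula such as 'C9H8O4' (no brackets)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula: str) -> float:
    """1D descriptor: sum of atomic masses (KeyError for unknown elements)."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


def atom_count(formula: str) -> int:
    """1D descriptor: total number of atoms, hydrogens included."""
    return sum(parse_formula(formula).values())


aspirin = "C9H8O4"
print(round(molecular_weight(aspirin), 2), atom_count(aspirin))
```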
Step-2: Calculation of Descriptors
Charge on atom
Dipole moment
pKa
HOMO, LUMO
Chirality
Hydrogen bond donor/acceptor
LogP
Thermodynamic………etc .
Note: At present, more than 45,000 descriptors can be calculated !!!
Step-3: Descriptor selection & Model building
Not all descriptors contain useful information.
Many descriptors provide the same information.
Use of too many descriptors results in over-fitting.
Use of improper descriptors results in poor and misleading models.
Use of many descriptors can lead to chance correlation.
Use SR, GA, MA, etc. to select best descriptors
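A simple first pruning pass, before any selection algorithm is run, is to drop constant descriptors and descriptors that are almost perfectly correlated with one already kept. The sketch below shows that idea only; the 0.95 threshold, the function names and the descriptor values are assumptions made for illustration, not the selection methods named above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def prune_descriptors(table, threshold=0.95):
    """Keep a descriptor only if it varies and |r| < threshold
    against every descriptor that has already been kept."""
    kept = []
    for name, values in table.items():
        if len(set(values)) == 1:  # constant column carries no information
            continue
        if all(abs(pearson(values, table[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table: one list of values per descriptor.
descriptors = {
    "MW": [180.2, 151.2, 206.3, 194.2, 138.1],
    "MW_kDa": [0.1802, 0.1512, 0.2063, 0.1942, 0.1381],  # same information as MW
    "nRings": [1, 1, 1, 1, 1],                           # constant
    "logP": [1.2, 0.5, 3.5, -0.1, 2.3],
}
print(prune_descriptors(descriptors))
```

Which member of a correlated pair survives depends only on column order here; in practice one would usually keep the descriptor that is easier to interpret.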
Current Methods for Model Building
A) Multiple Linear Regression (MLR)
• Best Multiple Linear Regression (BMLR),
• Heuristic Method (HM),
• Genetic Algorithm-Multiple Linear Regression (GA-MLR),
• Stepwise MLR,
• Factor Analysis MLR and so on.
B) Partial Least Squares (PLS)
• Genetic Partial Least Squares (G/PLS),
• Factor Analysis Partial Least Squares (FA-PLS),
• Orthogonal Signal Correction Partial Least Squares (OSC-PLS)
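All of the MLR variants above share the same core fit: ordinary least squares on the selected descriptors. The sketch below implements that core with the normal equations and Gaussian elimination in plain Python, so it runs without third-party packages. The data are hypothetical and constructed so that the exact coefficients are known; the function names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; prune them first")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(X, y):
    """Return [intercept, b1, b2, ...] that minimise the squared error."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of ones
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, x):
    """Predicted activity for one compound with descriptor values x."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


# Hypothetical data: activity = 1 + 2*d1 + 3*d2 exactly.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * d1 + 3 * d2 for d1, d2 in X]
coef = fit_mlr(X, y)
print([round(c, 6) for c in coef])
```

The collinearity check is where redundant descriptors show up as a numerical failure, which is one practical reason for pruning before model building.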
Step-4: Validation of model
a) Leave-one-out cross-validation
b) Leave-many-out cross-validation
c) External validation
d) Use PCA, simulated annealing, Automatic Relevance
Determination (ARD), etc.
e) Use Bayesian statistics or Gaussian processes,
since they do not require cross-validation!
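Leave-one-out cross-validation can be sketched in a few lines. The example below assumes a single-descriptor linear model for brevity and reports the fitted R² next to the cross-validated Q², computed as 1 - PRESS / SS_tot with SS_tot taken about the mean of all observed activities (one common convention). The logP and pIC50 values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Goodness of fit on the training data itself."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def q2_loo(x, y):
    """Leave-one-out Q2: each compound is predicted by a model
    that was fitted without it."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - press / ss_tot


# Hypothetical training set.
logp = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
pic50 = [5.1, 5.4, 6.1, 6.3, 7.0, 7.2]
print(round(r_squared(logp, pic50), 3), round(q2_loo(logp, pic50), 3))
```

Q² can never exceed R² for the same least-squares model, so a large gap between the two is a warning sign of over-fitting.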
Modern trends in QSAR modeling
• Currently, there is much talk about the use of artificial
intelligence (AI) in chemistry.
• AI is the superset of tasks that demonstrate characteristics
of human intelligence, while ML is a subset of AI which
accesses data, analyses trends and generates intelligent,
actionable insights.
• Many people use the term AI in the same context as ML in
many data-rich disciplines, ranging from health care to
astronomy.
• In this regard one can say that AI has been used in
chemistry since the 1960s under the name QSAR.
[Figure: two inhibitors of troponin I-interacting kinase (TNNI3K). First compound: IC50 = 8000 nM*. Second compound: AI-predicted IC50 = 7800 nM; experimental IC50 = 80 nM*.]
*Lawhorn, B. G. et al., Identification of purines and 7-deazapurines as potent and
selective type I inhibitors of troponin I-interacting kinase (TNNI3K). J. Med. Chem.
2015, 58, 7431−7448.
[Figure: two inhibitors of spleen tyrosine kinase (Syk). First compound: IC50 = 8.8 nM*. Second compound: AI-predicted IC50 = 10 nM; experimental IC50 = 0.060 nM*.]
*Ellis, J. M. et al., Overcoming mutagenicity and ion channel activity: optimization of
selective spleen tyrosine kinase inhibitors. J. Med. Chem. 2015, 58, 1929−1939.
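The size of the miss in the two kinase examples above is easier to judge on a log scale. The short calculation below uses only the predicted and experimental IC50 values quoted on the slides; the function name is illustrative.

```python
import math


def log_error(predicted_nm: float, experimental_nm: float) -> float:
    """Absolute prediction error in log10 units (the same as pIC50 units)."""
    return abs(math.log10(predicted_nm / experimental_nm))


# (AI-predicted IC50, experimental IC50), both in nM, from the slides.
cases = {"TNNI3K": (7800.0, 80.0), "Syk": (10.0, 0.060)}

for target, (pred, exp) in cases.items():
    fold = pred / exp
    print(target, round(fold, 1), round(log_error(pred, exp), 2))
```

Both predictions are off by roughly two log units, that is, about a hundred-fold or more in concentration.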
QSAR based virtual screening
• Molecular docking can rapidly identify large subsets of
molecules with desired activity from large screening
collections of compounds (10^5–10^6 compounds) using
automated methods.
• However, the hit rate ranges between 0.01% and 0.1% !!!
• Most of the screened compounds are routinely reported as
false positives.
• On the other hand, typical hit rates for QSAR-based virtual
screening range between 1% and 40% !!!!!
Reference: Neves BJ, Braga RC, Melo-Filho CC, Moreira-Filho JT, Muratov EN and
Andrade CH (2018) QSAR-Based Virtual Screening: Advances and Applications in
Drug Discovery. Frontiers in Pharmacology 9. doi: 10.3389/fphar.2018.01275
QSAR based virtual screening:
Success Stories
• In Zhang et al. (2013), a data set of 3,133 compounds reported
as active or inactive against P. falciparum was used to
develop QSAR models.
• QSAR models were applied for VS of the ChemBridge
database.
• After VS, 176 potential antimalarial compounds were
identified and submitted to experimental validation along
with 42 putative inactive compounds.
• Twenty-five compounds presented antimalarial activity in P.
falciparum.
• All 42 compounds predicted as inactives by the models were
confirmed experimentally to be inactives.
• Alves et al. (2020) used a data set of 113 compounds (40 actives
and 73 inactives) for the SARS-CoV Mpro.
• QSAR models were applied for VS of the DrugBank database
of FDA approved drugs.
• After VS, 42 potential drugs were identified but only 11 were
tested for experimental validation.
• Three compounds presented strong activity for the SARS-
CoV-2 Mpro.
1. Zhang, L. et al. (2013) Discovery of novel antimalarial compounds enabled by
QSAR-based virtual screening, J. Chem. Inf. Model. 53, 475–492. DOI:
10.1021/ci300421n
2. Alves et al. (2020) QSAR Modeling of SARS-CoV Mpro Inhibitors Identifies
Sufugolix, Cenicriviroc, Proglumetacin, and Other Drugs as Candidates for
Repurposing against SARS-CoV-2, Mol. Inform. DOI: 10.1002/minf.202000113
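The experimental hit rates implied by the two studies cited above follow directly from the counts on the slides (confirmed actives divided by compounds tested). A short check, with an illustrative function name:

```python
def hit_rate(confirmed: int, tested: int) -> float:
    """Experimental hit rate in percent."""
    if tested <= 0:
        raise ValueError("number of tested compounds must be positive")
    return 100.0 * confirmed / tested


zhang_2013 = hit_rate(25, 176)  # antimalarial screen: 25 actives of 176 tested
alves_2020 = hit_rate(3, 11)    # Mpro screen: 3 strong actives of 11 tested

print(round(zhang_2013, 1), round(alves_2020, 1))
```

Both values fall inside the 1% to 40% range quoted for QSAR-based virtual screening.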
Disadvantages of QSAR
• False correlations may arise because biological data
are subject to considerable experimental error (noisy data).
• If the training dataset is not large enough, the data collected
may not reflect the complete property space.
Consequently, many QSAR results cannot be used to
confidently predict the compounds most likely to have the best
activity.
• Features may not be reliable either. This is particularly
serious for 3D features, because the 3D structures of ligands
bound to the receptor may not be available. A common
approach is to use a minimized structure, but that may not
represent reality well.
Free Software for QSAR
1. ACD ChemSketch (www.acdlabs.com)
2. PyMOL
3. RDKit
4. ChemDraw
5. Avogadro software (https://avogadro.cc/)
6. OpenBabel (http://openbabel.org/wiki/Main_Page)
7. MMTK (http://dirac.cnrs-orleans.fr/MMTK.html)
8. PyDescriptor (available from Dr. V. H. Masand)
9. PaDEL (http://www.yapcwsoft.com/dd/padeldescriptor/)
10. BuildQSAR (https://profanderson.net/files/buildqsar.php)
11. Weka (https://www.cs.waikato.ac.nz/ml/weka/)
12. R packages such as GA-MLR, caret, etc.
Databases
1. ChEMBL Database - EMBL-EBI: ChEMBL is a
manually curated database of bioactive molecules with
drug-like properties. It brings together chemical,
bioactivity and genomic data
https://www.ebi.ac.uk/chembl/
2. Enzyme Database – BRENDA: A comprehensive
enzyme information system.
https://www.brenda-enzymes.org/