AI FOR PHARMACEUTICAL
INDUSTRY – A PRIMER
GOPI KRISHNA NUTI
VICE PRESIDENT, MUST RESEARCH
LEAD DATA SCIENTIST, AUTODESK
HTTPS://WWW.LINKEDIN.COM/IN/NGOPIKRISHNA/
ABOUT ME
• Gopi Krishna Nuti
• Education
• Proud alumnus of Andhra University College of Engineering, Computer Science Department
• MS in Data Science from State University of New York at Buffalo
• MBA from Amrita University
• Career
• Working in the IT industry for the past 20 years and in AI/ML/Data Science for nearly a decade
• Lead Data Scientist at Autodesk, Bangalore
• Vice President of MUST Research, a technology NGO working to bridge academia and industry in the field of AI. Works closely with multiple governments, NASSCOM, NITI Aayog, BIS etc.
• Author of the Amazon best seller “Machine Learning for Engineers”
• Research publications and patents
WHAT IS AI
A wide-ranging branch of computer science concerned with building smart machines
capable of performing tasks that typically require human intelligence.
Multiple names with subtle differences: AI, ML, DL, Data Science. For this primer they can be loosely considered to be the same.
Relies heavily on mathematics (Statistics and Linear Algebra)
Statistics? We already use them, don’t we?
Samples, populations, standard deviations, p-values, paired t-tests and the normal distribution are very commonly used in the pharmaceutical industry.
SO, WHAT’S NEW WITH AI?
• AI draws heavily on the same basic principles.
• However, classical statistics is typically used when data is scarce.
• Example:
• Consider a clinical trial of a drug for high-risk, advanced-stage cases among women pregnant for the first time after the age of 45.
• Clinical samples are, understandably, small.
• Statistics comes to the rescue here.
• ML/AI is useful when data is abundant.
DO I HAVE TO LEARN MATHS?
• Fortunately, NO.
• Familiarity with math is helpful. Unfamiliarity is not a deal breaker.
• Asking pharmacologists to master the math is like expecting a film star in the Celebrity Cricket League to go and play in the ICC World Cup.
• Neither wrong nor impossible.
• However, such a person is likely to be a sportsman, not an actor ☺
WHAT CAN AI DO?
• Analyse massive amounts of data, mathematically.
• Identify hidden patterns in data and extract insights which would be humanly impractical to find.
• Predict how changes to one variable impact another
• Analyse data collected over time and identify trends
• Group similar pieces of data into a single cluster along multiple dimensions
• Identify factors which are related to one another
• Learn from images and free-form text to detect specific pieces of information
• For the most part, backed by verifiable algebraic/statistical algorithms.
WHAT CAN I DO WITH AI?
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7577280/
WHAT CAN AI (NOT) DO?
• Create a race of killer robot machines which will enslave humans???
• Yeah, AI cannot do that yet.
WHAT SHOULD WE LEARN IN AI
• Supervised learning: Regression, Classification, Time series forecasting
• Unsupervised learning: Clustering, Apriori
• Deep Learning vs statistics-based approaches
SUPERVISED LEARNING - REGRESSION
• Predicting a parameter based on other parameters
• y=mx+c anyone? That’s Linear Regression.
• Simplest ML algorithm
• m and c are estimated from all the data points by least squares, not with the slope and y-intercept formulas that we studied in 8th grade
• Parameter to predict = f(remaining parameters)
• Other algorithms
• Decision Trees, Random Forests, Support Vector Regression, Polynomial Regression etc.
• Use cases:
• Predict
• Solubility, partition coefficient (logP), degree of ionization, intrinsic permeability
• Using
• Molecular descriptors, SMILES strings, electron density of the molecule in the chemical space etc.
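The y=mx+c idea can be sketched in a few lines. This is a minimal illustration, assuming a single input variable and made-up points; the function name `fit_line` and the data are hypothetical and not from the slides.

```python
def fit_line(xs, ys):
    """Least-squares estimates of m and c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Hypothetical points that lie exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, c = fit_line(xs, ys)
prediction = m * 5.0 + c  # predicted y for a new x = 5
print(m, c, prediction)
```

With real measurements the points will not sit exactly on a line, so m and c become the best compromise across all points.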
EXAMPLE FOR REGRESSION
• ESOL Dataset
• Properties of 1128 compounds have been tabulated
Question : Can you predict the solubility of CC(C)C(C(=O)OC(C#N)c1cccc(Oc2ccccc2)c1)c3ccc(OC(F)F)cc3 (Flucythrinate)
without an actual experiment?
Compound ID | smiles | Minimum Degree | Molecular Weight | Number of H-Bond Donors | Number of Rings | Number of Rotatable Bonds | Polar Surface Area | measured log solubility in mols per litre
Amigdalin | OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O | 1 | 457.432 | 7 | 3 | 7 | 202.32 | -0.77
Fenfuram | Cc1occc1C(=O)Nc2ccccc2 | 1 | 201.225 | 1 | 2 | 2 | 42.24 | -3.3
citral | CC(C)=CCCC(C)=CC(=O) | 1 | 152.237 | 0 | 0 | 4 | 17.07 | -2.06
Picene | c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43 | 2 | 278.354 | 0 | 5 | 0 | 0 | -7.87
Thiophene | c1ccsc1 | 2 | 84.143 | 0 | 1 | 0 | 0 | -1.33
benzothiazole | c2ccc1scnc1c2 | 2 | 135.191 | 0 | 2 | 0 | 12.89 | -1.5
2,2,4,6,6'-PCB | Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl | 1 | 326.437 | 0 | 2 | 1 | 0 | -7.32
Estradiol | CC12CCC3C(CCc4cc(O)ccc34)C2CCC1O | 1 | 272.388 | 2 | 4 | 0 | 40.46 | -5.03
Dieldrin | ClC4=C(Cl)C5(Cl)C3C1CC(C2OC12)C3C4(Cl)C5(Cl)Cl | 1 | 380.913 | 0 | 5 | 0 | 12.53 | -6.29
Delaney, John S. "ESOL: estimating aqueous solubility directly from molecular structure." Journal of chemical information and computer sciences 44.3 (2004): 1000-1005.
REGRESSION PROCEDURE
• Mathematical formulation
Solubility = f(Minimum Degree, Molecular Weight, Number of H-Bond Donors, Number of Rings, Number of Rotatable Bonds, Polar Surface Area)
• Using the regression formulae, the predicted solubility for Flucythrinate = -6.878
• Actual/measured solubility = -6.876
• Error of just 0.002
• Time taken for predicting this value: less than 5 minutes.
• Cost – negligible
• Historical data is all that’s needed
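A regression with several inputs can be sketched as below. This is only an illustration: it fits ordinary least squares on the nine ESOL rows shown in the example, using just two of the columns (Molecular Weight and Polar Surface Area). The helper names are hypothetical, the model behind the -6.878 figure is not described in the slides, and this sketch is not expected to reproduce it.

```python
def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_linear(rows, targets):
    """Least-squares coefficients [intercept, w1, w2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve(xtx, xty)


def predict(coeffs, row):
    """Apply the fitted formula to one row of inputs."""
    return coeffs[0] + sum(w * v for w, v in zip(coeffs[1:], row))


# (Molecular Weight, Polar Surface Area) for the nine compounds in the excerpt
features = [
    (457.432, 202.32), (201.225, 42.24), (152.237, 17.07),
    (278.354, 0.0), (84.143, 0.0), (135.191, 12.89),
    (326.437, 0.0), (272.388, 40.46), (380.913, 12.53),
]
# Measured log solubility for the same compounds
log_solubility = [-0.77, -3.3, -2.06, -7.87, -1.33, -1.5, -7.32, -5.03, -6.29]

coeffs = fit_linear(features, log_solubility)
fitted_value = predict(coeffs, features[1])  # model's value for Fenfuram
error = fitted_value - log_solubility[1]     # predicted minus measured
print(coeffs, fitted_value, error)
```

Nine rows are far too few for a trustworthy model; the full dataset of 1128 compounds and all six descriptors would be the realistic starting point.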
SOME TERMINOLOGY
• Parameter to be predicted (Solubility in the example) – Dependent variable
• Input parameters (Minimum Degree, Molecular Weight, Number of H-Bond Donors, Number of Rings, Number of Rotatable Bonds, Polar Surface Area) – Independent variables
• Difference between predicted value and actual value – Error
• R² – The proportion of the variation in the data that the model explains. 1 is the highest possible value; 0 means the model does no better than always predicting the mean.
• Error is a statistical inevitability. It can be minimized but can never be eliminated.
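Both terms are simple arithmetic. The sketch below computes the error for the Flucythrinate figures quoted earlier and R² for three measured values from the ESOL excerpt; the `predicted` list is made up purely for illustration.

```python
def r_squared(actual, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Error for the Flucythrinate example: predicted minus actual
error = -6.878 - (-6.876)

# Measured log solubility of Amigdalin, Fenfuram and citral
actual = [-0.77, -3.3, -2.06]
# Hypothetical model outputs, not from the slides
predicted = [-1.0, -3.0, -2.0]

r2 = r_squared(actual, predicted)
print(abs(error), r2)
```

Perfect predictions give R² = 1, and always predicting the mean gives R² = 0.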
SUPERVISED LEARNING - CLASSIFICATION
• Used when “y” is NOT a number but a class or category.
• Examples: Yes/No, Mild/Moderate/Severe, etc.
• Algorithms include Logistic Regression, k-Nearest Neighbours, Decision Trees, Support Vector Classification, Naïve Bayes Classification etc.
• Use cases
• Predict molecules that might respond to a given biochemical assay
• Molecular properties
• Properties of novel molecules using the properties of old molecules, the structure of old molecules and the structure of the new molecule
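As one illustration of classification, here is a minimal k-Nearest Neighbours sketch: a new item gets the majority label of its k closest known items. The two-number descriptors and the labels are hypothetical, chosen only to make the idea visible.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical molecules described by two made-up descriptors
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["non-toxic", "non-toxic", "non-toxic", "toxic", "toxic", "toxic"]

print(knn_predict(train_x, train_y, (5.1, 5.0)))
print(knn_predict(train_x, train_y, (1.0, 0.9)))
```

The output is a category, not a number, which is exactly what separates classification from regression.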
EXAMPLE FOR CLASSIFICATION
Assay measurements for 12 different toxic effects
1 – Toxic, 0 – Non-toxic, NA – Information unavailable/unknown
Compound | SR.HSE
NCGC00178831-03 | 0
NCGC00166114-03 | 0
NCGC00263563-01 | 0
NCGC00013058-02 | 1
NCGC00167516-01 | NA
NCGC00018301-05 | 1
NCGC00249897-01 | 1
NCGC00016000-18 | 1
Compound | AW | AWeight | Arto | BertzCT | Chi0 | Chi1 | Chi10 | Chi2
NCGC00178831-03 | 54367203 | 13.053 | 2.176 | 3.194 | 23.112 | 15.868 | 1.496 | 15.127
NCGC00166114-03 | 12688176.07 | 22.123 | 2.065 | 3.137 | 21.033 | 13.718 | 1.937 | 13.187
NCGC00263563-01 | 3076932.336 | 13.085 | 2.154 | 3.207 | 46.896 | 29.958 | 3.806 | 30.105
NCGC00013058-02 | 71685690.57 | 12.832 | 2.029 | 3.38 | 51.086 | 32.045 | 1.806 | 29.09
NCGC00167516-01 | 7989702.276 | 12.936 | 2.124 | 3.573 | 70.295 | 46.402 | 3.604 | 42.132
NCGC00018301-05 | 6.213 | 13.143 | 2 | 2.607 | 17.079 | 11.041 | 0.286 | 9.157
NCGC00249897-01 | 2.773 | 28.889 | 2.167 | 2.561 | 8.715 | 5.698 | 0.037 | 5.368
NCGC00016000-18 | 4.183 | 17.275 | 1.875 | 2.303 | 12.552 | 7.599 | 0 | 5.685
Chemical structures of molecules
800 properties of each molecule are available
If a new molecule’s properties are provided, can you predict whether it is toxic in the SR.HSE assay?
Example is from the Tox21 Dataset
https://www.bioinf.jku.at/research/DeepTox/tox21.html
[Mayr2016] Mayr, A., Klambauer, G., Unterthiner, T., & Hochreiter, S. (2016). DeepTox: Toxicity Prediction using Deep Learning. Frontiers in Environmental Science, 3:80.
[Huang2016] Huang, R., Xia, M., Nguyen, D. T., Zhao, T., Sakamuru, S., Zhao, J., Shahane, S., Rossoshek, A., & Simeonov, A. (2016). Tox21 Challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Frontiers in Environmental Science, 3:85.
SOME TERMINOLOGY
• Parameter to be predicted (SR.HSE) – Dependent variable
• Input parameters (AW, AWeight, Arto, BertzCT, Chi0, Chi1, Chi10, Chi2) – Independent variables
• Error is a statistical inevitability. It can be minimized but can never be eliminated.
• A numeric error cannot be calculated for categories, because an expression like Severe – Mild = Moderate is meaningless.
• Instead, counts of True Positives, True Negatives, False Positives and False Negatives are calculated.
• Metrics used are Specificity, Sensitivity, Accuracy, Precision, Recall and Area Under Curve. All of these are derived from the above four counts (Area Under Curve from how the counts change as the decision threshold varies).
• The standard example used to explain false positives to newbies actually comes from biological sciences.
Best example of a False Positive: a pregnancy kit confirms the pregnancy of a male human.
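The four counts and the metrics derived from them can be sketched as follows. The `actual` labels are the SR.HSE values from the example with the NA row dropped; the `predicted` labels are hypothetical, invented only to show the arithmetic.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, true negatives, false positives and false negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Derive the common classification metrics from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


actual = [0, 0, 0, 1, 1, 1, 1]     # SR.HSE labels from the example, NA dropped
predicted = [0, 1, 0, 1, 1, 0, 1]  # hypothetical model output

tp, tn, fp, fn = confusion_counts(actual, predicted)
scores = metrics(tp, tn, fp, fn)
print(tp, tn, fp, fn, scores)
```

Here the second molecule is a false positive (predicted toxic, actually non-toxic) and the sixth is a false negative.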
SUPERVISED LEARNING – FORECASTING
• Also called time series analysis
• Useful when behaviour patterns need to be identified over a period of time.
• Classic example: stock market price prediction. Today’s opening price is dependent on yesterday’s closing price.
• Algorithms
• ARIMA, SARIMA, Holt-Winters etc.
• Use cases
• Patient churn rate during experimentation stages. Based on historical data, can we predict when a volunteer will drop out of the sample?
• Can we identify a seasonality and/or trend in the resurgence of Dengue, Covid etc.?
IS THERE A SEASONALITY HERE?
• If so, can we predict when the
next season will begin?
• Immense Benefits to healthcare
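One simple way to check for seasonality is autocorrelation: compare the series with a shifted copy of itself and see which shift (lag) matches best. The sketch below assumes hypothetical monthly case counts that repeat every four months; the function names are illustrative.

```python
def autocorrelation(series, lag):
    """Correlation of the series with itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def best_seasonal_lag(series, max_lag):
    """Lag between 2 and max_lag with the highest autocorrelation."""
    return max(range(2, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical monthly case counts with a repeating four-month pattern
cases = [10, 20, 40, 20] * 4

season_length = best_seasonal_lag(cases, max_lag=8)
print(season_length, autocorrelation(cases, season_length))
```

Once a season length is known, the next peak can be estimated by stepping forward from the last observed peak by that many periods.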
UNSUPERVISED LEARNING
• We are not predicting anything here. We are simply trying to understand the data better, i.e. there is no dependent variable “y” as in y=mx+c
• Identify hidden patterns in the data
• Use cases
• Drug repurposing
• Therapeutic efficacy of drugs and target proteins of known and unknown pharmaceuticals
• Interpreting the molecular mechanism of chemicals
• Which molecules are similar to one another? To which cluster can I assign this molecule?
• Assumes that similar molecules react in similar ways with a protein
• Patient screening/selection
• Group patients together based on previously unknown similarities
• Precision medicine
SOME ALGORITHM DETAILS
• Clustering is a data mining technique
• used to group data based on similarities or patterns in data
• All observations in a cluster are similar to one another.
• Clusters are different from each other
• Association rule
• used to find the relationship among items in data sets
• Misleading if used on small datasets
EXAMPLE OF CLUSTERING
After applying the k-means clustering algorithm to identify 4 clusters and examining their centroids, the clusters can be described as follows:
• Cluster 1 was dominated by elderly patients, aged 71 to 97 years, who came from Lens Poly, General Poly and Refraction Poly.
• Cluster 2 was also dominated by older patients, but aged 52 to 68 years, from more diverse polys such as Lens, General, EED, Glaucoma and Refraction Poly.
• Cluster 3 was dominated by patients aged 31 to 49 years, most of them from EED Poly.
• Cluster 4 was dominated by children and adolescents aged 1 to 30 years, from diverse polys such as Pediatric Poly and EED.
https://iopscience.iop.org/article/10.1088/1742-6596/1196/1/012051/pdf
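The k-means idea can be sketched on a single variable. This is a simplified illustration, assuming hypothetical patient ages (not the data from the cited paper) and a deterministic choice of starting centroids so the result is repeatable.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters by repeatedly moving centroids to cluster means."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical patient ages
ages = [5, 12, 25, 33, 40, 47, 55, 60, 66, 72, 85, 95]

centroids, clusters = kmeans_1d(ages, k=4)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Real clustering usually works on many variables at once, with distance measured across all of them rather than on age alone.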
HYPOTHETICAL EXAMPLE OF CLUSTERING
https://iopscience.iop.org/article/10.1088/1742-6596/1196/1/012051/pdf
Patient Id | Blood group | Year of Birth | Smoking | Drinking | Narcotics | Promiscuity | Age of onset of disease | History of foreign travel | Hereditary
A | A+ | | Y | N | N | N | 72 | Y | Y
B | B- | | Y | N | Y | N | 68 | Y | N
C | O+ | | N | Y | N | Y | 45 | Y | Y
D | O+ | | N | N | N | N | 82 | N | Y
E | B+ | | Y | N | N | Y | 32 | N | N
F | Bombay blood group | | N | Y | Y | N | 65 | N | N
• Imagine a cluster being identified with the characteristics below
• Cluster of cases where Blood group = O+, Year of Birth is before 1947, Smoking = Y, Drinking = Y, Narcotics = Y, Promiscuity = Y, History of Foreign Travel = N, Hereditary = N and Age of onset > 80
• If this cluster is < 0.05% of cases, then we can statistically ignore it.
• If this cluster is > 80% of cases, then we can perhaps say that the disease is prevalent among a particular section of the population.
APRIORI
• Used to find relationships between items in a database.
• Famous example of beer and diapers in a supermarket
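The beer-and-diapers idea rests on two measures that Apriori relies on: support (how often items appear together) and confidence (how often the second item appears given the first). The sketch below computes only these two measures, not the full Apriori search, and the shopping baskets are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical supermarket baskets
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

pair_support = support(baskets, {"beer", "diapers"})
rule_confidence = confidence(baskets, {"beer"}, {"diapers"})
print(pair_support, rule_confidence)
```

A rule is usually kept only when both its support and its confidence exceed chosen thresholds, which is why small datasets can mislead.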
HYPOTHETICAL EXAMPLE FOR APRIORI
• Chemical structures of molecules are quantified and tabulated
• Solubility is measured and documented.
Question: How many items with a Minimum Degree of 3 and exactly 2 H-Bond donors have 2 rotatable bonds?
Compound ID | smiles | Minimum Degree | Molecular Weight | Number of H-Bond Donors | Number of Rings | Number of Rotatable Bonds | Polar Surface Area | measured log solubility in mols per litre
Amigdalin | OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O | 1 | 457.432 | 7 | 3 | 7 | 202.32 | -0.77
Fenfuram | Cc1occc1C(=O)Nc2ccccc2 | 1 | 201.225 | 1 | 2 | 2 | 42.24 | -3.3
citral | CC(C)=CCCC(C)=CC(=O) | 1 | 152.237 | 0 | 0 | 4 | 17.07 | -2.06
Picene | c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43 | 2 | 278.354 | 0 | 5 | 0 | 0 | -7.87
Thiophene | c1ccsc1 | 2 | 84.143 | 0 | 1 | 0 | 0 | -1.33
benzothiazole | c2ccc1scnc1c2 | 2 | 135.191 | 0 | 2 | 0 | 12.89 | -1.5
2,2,4,6,6'-PCB | Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl | 1 | 326.437 | 0 | 2 | 1 | 0 | -7.32
Estradiol | CC12CCC3C(CCc4cc(O)ccc34)C2CCC1O | 1 | 272.388 | 2 | 4 | 0 | 40.46 | -5.03
Dieldrin | ClC4=C(Cl)C5(Cl)C3C1CC(C2OC12)C3C4(Cl)C5(Cl)Cl | 1 | 380.913 | 0 | 5 | 0 | 12.53 | -6.29
DEEP LEARNING
• Performs the same activities, i.e. Regression, Classification, Forecasting etc.
• Relies on linear algebra rather than statistical methods.
• Training data requirements are much higher
• Computational complexity is much higher
• Results are often much more accurate, i.e. error is much lower, provided enough training data is available.
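The linear algebra involved can be sketched with one small layer of a neural network: each output is a weighted sum of the inputs plus a bias, passed through a simple activation (ReLU here). All the numbers are hypothetical; in a real network the weights are learned from training data.

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer with ReLU: max(0, weights · inputs + bias) per neuron."""
    outputs = []
    for row, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, weighted_sum))
    return outputs


# Hypothetical inputs (two descriptors of a molecule) and made-up weights
inputs = [1.0, 2.0]
weights = [
    [0.5, -1.0],  # neuron 1
    [1.0, 1.0],   # neuron 2
]
biases = [0.0, -1.0]

hidden = dense_layer(inputs, weights, biases)
print(hidden)
```

Deep learning stacks many such layers, which is where the higher data and computation requirements come from.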
COMPUTER VISION
• Processing images to identify information from them
• Examples
• Automatically analysing x-ray images
• Diabetic Retinopathy
• Identifying compliance with drug intake during clinical trials
NATURAL LANGUAGE PROCESSING
• Reading free-form text to identify information
• Examples:
• Automatically read prescriptions
• Analyse medical reports to extract information in an actionable manner
• Patient A reported reduced pain in 30 minutes after administering 10 mg of the drug. Patient B reported reduced pain in 15 minutes after administering 15 mg.
Name | Dosage (mg) | Time (minutes)
A | 10 | 30
B | 15 | 15
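Turning the two sentences into the table can be sketched with a regular expression. This is a stand-in for real NLP, which has to cope with far more varied wording; the pattern and field names are assumptions made for this example only.

```python
import re

# Matches sentences shaped like the two in the example above
PATTERN = re.compile(
    r"Patient (\w+) reported reduced pain in (\d+) minutes "
    r"after administering (\d+) mg"
)


def extract_rows(text):
    """Return one dict per matching sentence: patient name, dosage and time."""
    rows = []
    for name, minutes, dose in PATTERN.findall(text):
        rows.append({"name": name, "dosage_mg": int(dose), "time_min": int(minutes)})
    return rows


report = (
    "Patient A reported reduced pain in 30 minutes after administering "
    "10 mg of the drug. Patient B reported reduced pain in 15 minutes "
    "after administering 15 mg."
)

rows = extract_rows(report)
for row in rows:
    print(row["name"], row["dosage_mg"], row["time_min"])
```

Once the text is in rows and columns like this, the regression and classification techniques described earlier can be applied to it.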
PROBLEMS WITH AI
• Heavily dependent on the data that is fed to it. Garbage in, garbage out.
• In some cases, too much data is also a problem.
• Explainability is still a work in progress for AI scientists.
CAN AI REPLACE HUMANS?
Yeah, no. AI is not so intelligent.
AI is very, very good at searching for answers in data.
Human intellect lies in asking the right questions.
“AI is neither artificial nor intelligent. It is made from natural resources and it is people who are performing the tasks to make the systems appear autonomous.”
- Kate Crawford, Senior Principal Researcher at Microsoft Research
HOW DOES IT ALL WORK TOGETHER
• Data is increasingly digitized and stored in databases.
• IoT devices are capturing information in clinical studies, manufacturing, biometry data
• Once data is available in a table, ML can start working
FURTHER READING
• Should I learn mathematics?
• Should I learn programming?
• Where can I learn more?
• Machine Learning for Engineers https://www.amazon.in/dp/9389024870/ref=cm_sw_em_r_mt_dp_M4SYKWQ234BAKR800NKZ
• https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7577280/
• https://pubmed.ncbi.nlm.nih.gov/30472429/
QUESTIONS AND ANSWERS
• Thank you!
• For further discussions
• Ngopikrishna.public@gmail.com
• +91-9036005121

call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 

AI for Pharmaceutical Industry – A Primer

  • 1. AI FOR PHARMACEUTICAL INDUSTRY – A PRIMER GOPI KRISHNA NUTI VICE PRESIDENT, MUST RESEARCH LEAD DATA SCIENTIST, AUTODESK HTTPS://WWW.LINKEDIN.COM/IN/NGOPIKRISHNA/
  • 2. ABOUT ME • Gopi Krishna Nuti • Education • Proud alumnus of Andhra University College of Engineering, Computer Science Department • MS in Data Science from State University of New York at Buffalo • MBA from Amrita University • Career • Working in the IT industry for the past 20 years and in AI/ML/Data Science for nearly a decade • Lead Data Scientist at Autodesk, Bangalore • Vice President of MUST Research, a technology NGO working to bridge academia and industry in the field of AI. Working closely with multiple governments, NASSCOM, NITI Aayog, BIS, etc. • Author of the Amazon Best Seller “Machine Learning for Engineers” • Research publications and patents
  • 3. WHAT IS AI • A wide-ranging branch of computer science concerned with building smart machines capable of performing tasks that typically require human intelligence. • Goes by multiple names with subtle differences: AI, ML, DL, Data Science. They can loosely be considered the same. • Relies heavily on mathematics (Statistics and Linear Algebra). • Statistics? We already use them, don’t we? Samples, populations, standard deviations, p-values, paired t-tests and the normal distribution are very commonly used in the pharmaceutical industry.
  • 4. SO, WHAT’S NEW WITH AI? • AI draws heavily from the same basic principles. • However, Statistics is used for cases with data paucity. • Example: • Consider a clinical trial of a drug for high-risk, advanced-stage women pregnant for the first time after the age of 45. • Clinical samples are, understandably, small. • Statistics comes to the rescue here. • ML/AI is useful when data is abundant.
  • 5. DO I HAVE TO LEARN MATHS? • Fortunately, NO. • Familiarity with math will be helpful. Unfamiliarity is not a deal breaker. • Asking pharmacologists to master the math is like expecting a film star in the Celebrity Cricket League to go play in the ICC World Cup. • Neither wrong nor impossible. • However, such a person is likely to be a sportsman, not an actor ☺
  • 6. WHAT CAN AI DO? • Analyse massive amounts of data, mathematically. • Identify hidden patterns in data and extract insights that would be impractical for humans to find. • Predict how changes to one variable impact another • Analyse data collected over time and identify trends • Group similar pieces of data into a single cluster along multiple dimensions • Identify factors which are related to one another • Learn from images and free-form text to detect specific pieces of information • For the most part, backed by verifiable algebraic/statistical algorithms.
  • 7. WHAT CAN I DO WITH AI? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7577280/
  • 8.
  • 9. WHAT AI CAN (NOT) DO? • Create a race of killer robot machines which will enslave humans??? • Yeah, AI cannot do that yet.
  • 10. WHAT SHOULD WE LEARN IN AI • Supervised Learning: Regression, Classification, Time series forecasting • Unsupervised Learning: Clustering, Apriori • Deep Learning vs. statistics-based approaches
  • 11. SUPERVISED LEARNING - REGRESSION • Predicting a parameter based on other parameters • y=mx+c anyone? That’s Linear Regression. • Simplest ML algorithm • m and c are estimated from all the data points (least squares), not with the two-point slope and y-intercept formulas that we studied in 8th grade • Parameter to predict = f(remaining parameters) • Other algorithms • Decision Trees, Random Forests, Support Vector Regression, Polynomial Regression etc. • Use cases: • Predict • Solubility, Partition Coefficient (logP), degree of ionization, intrinsic permeability • Using • Molecular descriptors, SMILES strings, electron density of the molecule in the chemical space etc.
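The y=mx+c idea can be sketched in a few lines. This is a minimal illustration of ordinary least squares for a single input, not the method used in the talk; the function name `fit_line` and the five data points are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + c for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # m = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    c = mean_y - m * mean_x
    return m, c


# Made-up points that lie close to y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
m, c = fit_line(xs, ys)
print(f"m = {m:.2f}, c = {c:.2f}")  # m = 1.97, c = 1.06
```

Unlike the school formula, which needs only two points, every observation contributes to m and c here, which is why noisy measurements still give a sensible line.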
  • 12. EXAMPLE FOR REGRESSION • ESOL Dataset • Properties of 1128 compounds have been tabulated • Question: Can you predict the solubility of CC(C)C(C(=O)OC(C#N)c1cccc(Oc2ccccc2)c1)c3ccc(OC(F)F)cc3 (Flucythrinate) without an actual experiment?
    Compound ID | SMILES | Minimum Degree | Molecular Weight | Number of H-Bond Donors | Number of Rings | Number of Rotatable Bonds | Polar Surface Area | Measured log solubility (mols per litre)
    Amigdalin | OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O | 1 | 457.432 | 7 | 3 | 7 | 202.32 | -0.77
    Fenfuram | Cc1occc1C(=O)Nc2ccccc2 | 1 | 201.225 | 1 | 2 | 2 | 42.24 | -3.3
    citral | CC(C)=CCCC(C)=CC(=O) | 1 | 152.237 | 0 | 0 | 4 | 17.07 | -2.06
    Picene | c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43 | 2 | 278.354 | 0 | 5 | 0 | 0 | -7.87
    Thiophene | c1ccsc1 | 2 | 84.143 | 0 | 1 | 0 | 0 | -1.33
    benzothiazole | c2ccc1scnc1c2 | 2 | 135.191 | 0 | 2 | 0 | 12.89 | -1.5
    2,2,4,6,6'-PCB | Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl | 1 | 326.437 | 0 | 2 | 1 | 0 | -7.32
    Estradiol | CC12CCC3C(CCc4cc(O)ccc34)C2CCC1O | 1 | 272.388 | 2 | 4 | 0 | 40.46 | -5.03
    Dieldrin | ClC4=C(Cl)C5(Cl)C3C1CC(C2OC12)C3C4(Cl)C5(Cl)Cl | 1 | 380.913 | 0 | 5 | 0 | 12.53 | -6.29
    Source: Delaney, John S. "ESOL: estimating aqueous solubility directly from molecular structure." Journal of Chemical Information and Computer Sciences 44.3 (2004): 1000-1005.
  • 13. REGRESSION PROCEDURE • Mathematical formulation: Solubility = f(Minimum Degree, Molecular Weight, Number of H-Bond Donors, Number of Rings, Number of Rotatable Bonds, Polar Surface Area) • Using the regression model, the predicted solubility for Flucythrinate = -6.878 • Actual/measured solubility = -6.876 • Error of just 0.002 • Time taken for predicting this value: less than 5 minutes. • Cost – negligible • Historical data is all that’s needed
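One way to picture "Solubility = f(six descriptors)" is a multiple linear regression solved with the normal equations. The sketch below is an assumption about how such a fit could be done, not the model behind the -6.878 figure, which the slides do not specify. It uses only the nine ESOL rows shown on the earlier slide, so with six descriptors it overfits badly and is for illustration only; a real fit would use all 1128 compounds. The helper names `solve`, `fit` and `predict` are made up, and Flucythrinate is not predicted because its descriptor values are not in the excerpt.

```python
# Rows copied from the ESOL excerpt on the slide. Columns: minimum degree,
# molecular weight, H-bond donors, rings, rotatable bonds,
# polar surface area, measured log solubility (mol/L).
ROWS = [
    (1, 457.432, 7, 3, 7, 202.32, -0.77),  # Amigdalin
    (1, 201.225, 1, 2, 2, 42.24, -3.30),   # Fenfuram
    (1, 152.237, 0, 0, 4, 17.07, -2.06),   # citral
    (2, 278.354, 0, 5, 0, 0.00, -7.87),    # Picene
    (2, 84.143, 0, 1, 0, 0.00, -1.33),     # Thiophene
    (2, 135.191, 0, 2, 0, 12.89, -1.50),   # benzothiazole
    (1, 326.437, 0, 2, 1, 0.00, -7.32),    # 2,2,4,6,6'-PCB
    (1, 272.388, 2, 4, 0, 40.46, -5.03),   # Estradiol
    (1, 380.913, 0, 5, 0, 12.53, -6.29),   # Dieldrin
]


def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit(rows):
    """Least-squares coefficients [intercept, b1..b6] via the normal equations."""
    xs = [[1.0] + [float(v) for v in r[:-1]] for r in rows]
    ys = [r[-1] for r in rows]
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(coeffs, features):
    """Apply the fitted linear function to one compound's descriptors."""
    return coeffs[0] + sum(c * f for c, f in zip(coeffs[1:], features))


coeffs = fit(ROWS)
residuals = [predict(coeffs, r[:-1]) - r[-1] for r in ROWS]
for row, res in zip(ROWS, residuals):
    print(f"measured {row[-1]:6.2f}  residual {res:+.3f}")
```

Once the coefficients exist, predicting a new compound is a single call to `predict`, which is why the slide can describe the cost of a prediction as negligible.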
  • 14. SOME TERMINOLOGY • Parameter to be predicted (Solubility in the example) – Dependent Variable • Input parameters (Minimum Degree, Molecular Weight, Number of H-Bond Donors, Number of Rings, Number of Rotatable Bonds, Polar Surface Area) – Independent variables • Difference between predicted value and actual value – Error • R² – the proportion of the variation in the data that the model explains. 1 is the highest possible value; for a least-squares fit evaluated on its own training data, 0 is the lowest. • Error is a statistical inevitability. It can be minimized but can never be eliminated.
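The two terms above, error and R², are simple to compute. The sketch below uses the Flucythrinate numbers quoted on the previous slide for the error; the four-point series used for R² is made up, and the function name `r_squared` is an illustrative choice.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Error for the single prediction quoted in the regression procedure slide
predicted_solubility = -6.878
measured_solubility = -6.876
error = predicted_solubility - measured_solubility
print(f"error = {error:.3f}")  # error = -0.002

# Made-up measured vs. predicted values to illustrate R^2
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(f"R^2 = {r_squared(actual, predicted):.2f}")  # R^2 = 0.98
```

A model that always predicts the mean of the measured values scores exactly 0, and a perfect model scores 1.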
  • 15. SUPERVISED LEARNING - CLASSIFICATION • Used when “y” is not a number but a class or category. • Examples: Yes/No, Mild/Moderate/Severe, etc. • Algorithms are Logistic Regression, k-Nearest Neighbours, Decision Trees, Support Vector Classification, Naïve Bayes Classification etc. • Use cases • Predict • Molecules that might respond to a given biochemical assay • Molecular properties • Properties of novel molecules using properties of old molecules, structure of old molecules, structure of new molecule
  • 16. EXAMPLE FOR CLASSIFICATION • Assay measurements for 12 different toxic effects • 1 – Toxic, 0 – Non-toxic, NA – Information unavailable/unknown
    Compound ID | SR.HSE
    NCGC00178831-03 | 0
    NCGC00166114-03 | 0
    NCGC00263563-01 | 0
    NCGC00013058-02 | 1
    NCGC00167516-01 | NA
    NCGC00018301-05 | 1
    NCGC00249897-01 | 1
    NCGC00016000-18 | 1

    Compound ID | AW | AWeight | Arto | BertzCT | Chi0 | Chi1 | Chi10 | Chi2
    NCGC00178831-03 | 54367203 | 13.053 | 2.176 | 3.194 | 23.112 | 15.868 | 1.496 | 15.127
    NCGC00166114-03 | 12688176.07 | 22.123 | 2.065 | 3.137 | 21.033 | 13.718 | 1.937 | 13.187
    NCGC00263563-01 | 3076932.336 | 13.085 | 2.154 | 3.207 | 46.896 | 29.958 | 3.806 | 30.105
    NCGC00013058-02 | 71685690.57 | 12.832 | 2.029 | 3.38 | 51.086 | 32.045 | 1.806 | 29.09
    NCGC00167516-01 | 7989702.276 | 12.936 | 2.124 | 3.573 | 70.295 | 46.402 | 3.604 | 42.132
    NCGC00018301-05 | 6.213 | 13.143 | 2 | 2.607 | 17.079 | 11.041 | 0.286 | 9.157
    NCGC00249897-01 | 2.773 | 28.889 | 2.167 | 2.561 | 8.715 | 5.698 | 0.037 | 5.368
    NCGC00016000-18 | 4.183 | 17.275 | 1.875 | 2.303 | 12.552 | 7.599 | 0 | 5.685
    • Chemical structures of molecules • 800 properties of the molecule are available • If a new molecule’s properties are provided, can you predict whether it is toxic in the SR.HSE assay? • Example is from the Tox21 Dataset https://www.bioinf.jku.at/research/DeepTox/tox21.html
    [Mayr2016] Mayr, A., Klambauer, G., Unterthiner, T., & Hochreiter, S. (2016). DeepTox: Toxicity Prediction using Deep Learning. Frontiers in Environmental Science, 3:80.
    [Huang2016] Huang, R., Xia, M., Nguyen, D. T., Zhao, T., Sakamuru, S., Zhao, J., Shahane, S., Rossoshek, A., & Simeonov, A. (2016). Tox21Challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Frontiers in Environmental Science, 3:85.
  • 17. SOME TERMINOLOGY • Parameter to be predicted (SR.HSE) – Dependent Variable • Input parameters (AW, AWeight, Arto, BertzCT, Chi0, Chi1, Chi10, Chi2) – Independent variables • Error is a statistical inevitability. It can be minimized but can never be eliminated. • A numeric error cannot be calculated, because an expression like Severe – Mild = Moderate is meaningless • Instead, counts of True Positives, True Negatives, False Positives and False Negatives are calculated. • Metrics used are Specificity, Sensitivity, Accuracy, Precision, Recall, Area Under Curve. All of these are derived from the above four counts. • The standard example used to explain false positives to newbies actually comes from the biological sciences. Best example of a False Positive: a pregnancy kit confirms the pregnancy of a male human.
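The four counts and the metrics derived from them can be sketched as follows. The `actual` labels are the SR.HSE values from the Tox21 excerpt with the NA row dropped; the `predicted` labels are invented for the example and do not come from any real model.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Derive the usual classification metrics from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


actual = [0, 0, 0, 1, 1, 1, 1]     # SR.HSE labels from the slide, NA row removed
predicted = [0, 1, 0, 1, 1, 0, 1]  # hypothetical model output

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)  # 3 2 1 1
for name, value in metrics(tp, tn, fp, fn).items():
    print(f"{name}: {value:.2f}")
```

In the pregnancy-kit analogy, the single false positive here is the one non-toxic compound that the hypothetical model flagged as toxic.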
  • 18. SUPERVISED LEARNING – FORECASTING • Also called time series analysis • Useful when behaviour patterns must be identified over a period of time. • Classic example: stock market price prediction. Today’s opening price is dependent on yesterday’s closing price. • Algorithms • ARIMA, SARIMA, Holt-Winters etc. • Use cases • Patient churn rate during experimentation stages. Based on historical data, can we predict when a volunteer will drop out of the sample? • Can we identify a seasonality and/or trend in the resurgence of Dengue, Covid etc.?
  • 19. IS THERE A SEASONALITY HERE? • If so, can we predict when the next season will begin? • Immense Benefits to healthcare
  • 20. IS THERE A SEASONALITY HERE? • If so, can we predict when the next season will begin? • Immense Benefits to healthcare
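One simple way to ask "is there a seasonality here?" is to check how strongly a series correlates with a shifted copy of itself. The sketch below uses plain autocorrelation as a stand-in; it is not ARIMA, SARIMA or Holt-Winters, and the case counts are made up to repeat every four time steps.

```python
def autocorrelation(series, lag):
    """Correlation between the series and itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def dominant_period(series, max_lag):
    """Lag (from 2 to max_lag) at which the series best matches itself."""
    return max(range(2, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical case counts with a peak every 4th time step
cases = [2, 4, 10, 4] * 6

period = dominant_period(cases, max_lag=8)
print(period)  # 4
print(f"{autocorrelation(cases, period):.3f}")  # 0.833
```

A high autocorrelation at some lag suggests that the next peak can be expected roughly one period after the last one. Real surveillance data is noisier and usually needs trend removal first.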
  • 21. UNSUPERVISED LEARNING • We are not predicting anything here. We are simply trying to understand the data better, i.e. there is no y=mx+c like relationship in the data • Identify hidden patterns in the data • Use cases • Drug repurposing • Therapeutic efficacy of drugs and target proteins of known and unknown pharmaceuticals • Interpreting the molecular mechanism of chemicals • Which molecules are similar to one another? To which cluster can I assign this molecule? • Assumes that similar molecules react in similar ways with a protein • Patient screening/selection • Group patients together based on previously unknown similarities • Precision Medicine
  • 22. SOME ALGORITHM DETAILS • Clustering is a data mining technique • used to group data based on similarities or patterns in data • All observations in a cluster are similar to one another. • Clusters are different from each other • Association rule • used to find the relationship among items in data sets • Misleading if used on small datasets
  • 23. EXAMPLE OF CLUSTERING • After applying the k-means clustering algorithm to identify 4 clusters, the centroids were found and the clusters can be characterised as follows. • Cluster 1 was dominated by elderly patients, the youngest aged 71 years and the oldest 97 years, who came from Lens Poly, General Poly, and Refraction Poly. • Cluster 2 was also dominated by elderly patients but in a different age range, namely 52-68 years, from more diverse polys such as Lens, General, EED, Glaucoma and Refraction Poly. • Cluster 3 was dominated by patients aged 31-49 years, most of them coming from EED Poly. • Cluster 4 was dominated by patients ranging from children to adolescents, aged 1 to 30 years, who came from diverse poly areas such as Pediatric Poly and EED. https://iopscience.iop.org/article/10.1088/1742-6596/1196/1/012051/pdf
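The k-means procedure behind such a result can be sketched on a single variable. The ages and starting centroids below are made up to resemble the four age bands described above; they are not the data from the cited study, which clustered on more than age.

```python
def k_means_1d(values, centroids, iterations=20):
    """Plain k-means on one numeric variable with fixed starting centroids."""
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical patient ages and initial guesses for 4 centroids
ages = [5, 12, 24, 35, 40, 47, 55, 60, 66, 75, 83, 95]
centroids, clusters = k_means_1d(ages, centroids=[10, 40, 60, 80])

for centre, members in zip(centroids, clusters):
    print(f"centroid {centre:5.1f}: {members}")
```

With these inputs the four groups settle into children to young adults, 35 to 47, 55 to 66 and 75 and above. Production implementations normally try several random starting points, because the result depends on the initial centroids.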
  • 24. HYPOTHETICAL EXAMPLE OF CLUSTERING https://iopscience.iop.org/article/10.1088/1742-6596/1196/1/012051/pdf
    Patient Id | Blood group | Year of Birth | Smoking | Drinking | Narcotics | Promiscuity | Age of onset of disease | History of foreign travel | Hereditary
    A | A+ | | Y | N | N | N | 72 | Y | Y
    B | B- | | Y | N | Y | N | 68 | Y | N
    C | O+ | | N | Y | N | Y | 45 | Y | Y
    D | O+ | | N | N | N | N | 82 | N | Y
    E | B+ | | Y | N | N | Y | 32 | N | N
    F | Bombay blood group | | N | Y | Y | N | 65 | N | N
    • Imagine a cluster being identified with the characteristics below • Cluster of cases where Blood group = O+, Year of Birth is before 1947, Smoking = Y, Drinking = Y, Narcotics = Y, Promiscuity = Y, History of Foreign Travel = N, Hereditary = N and Age of onset > 80 • If this cluster is < 0.05% of cases, then we can statistically ignore it. • If this cluster is > 80% of cases, then we can perhaps say that the disease is prevalent among a particular section of the population.
  • 25. APRIORI • Used to find relationships between items in a database. • Famous example: beer and diapers in a supermarket
  • 26. HYPOTHETICAL EXAMPLE FOR A-PRIORI • Chemical structures of molecule are quantified and tabulated • Solubility is measured and documented. • Question: How many items with Minimum Degree of 3 of exactly 2 H-Bond donors have 2 rotatable bonds?
    Compound ID | SMILES | Minimum Degree | Molecular Weight | Number of H-Bond Donors | Number of Rings | Number of Rotatable Bonds | Polar Surface Area | Measured log solubility (mols per litre)
    Amigdalin | OCC3OC(OCC2OC(OC(C#N)c1ccccc1)C(O)C(O)C2O)C(O)C(O)C3O | 1 | 457.432 | 7 | 3 | 7 | 202.32 | -0.77
    Fenfuram | Cc1occc1C(=O)Nc2ccccc2 | 1 | 201.225 | 1 | 2 | 2 | 42.24 | -3.3
    citral | CC(C)=CCCC(C)=CC(=O) | 1 | 152.237 | 0 | 0 | 4 | 17.07 | -2.06
    Picene | c1ccc2c(c1)ccc3c2ccc4c5ccccc5ccc43 | 2 | 278.354 | 0 | 5 | 0 | 0 | -7.87
    Thiophene | c1ccsc1 | 2 | 84.143 | 0 | 1 | 0 | 0 | -1.33
    benzothiazole | c2ccc1scnc1c2 | 2 | 135.191 | 0 | 2 | 0 | 12.89 | -1.5
    2,2,4,6,6'-PCB | Clc1cc(Cl)c(c(Cl)c1)c2c(Cl)cccc2Cl | 1 | 326.437 | 0 | 2 | 1 | 0 | -7.32
    Estradiol | CC12CCC3C(CCc4cc(O)ccc34)C2CCC1O | 1 | 272.388 | 2 | 4 | 0 | 40.46 | -5.03
    Dieldrin | ClC4=C(Cl)C5(Cl)C3C1CC(C2OC12)C3C4(Cl)C5(Cl)Cl | 1 | 380.913 | 0 | 5 | 0 | 12.53 | -6.29
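The supermarket example shows the mechanics of Apriori more compactly than the molecule table does. The sketch below finds itemsets that appear in at least a chosen fraction of baskets (their support), growing them one item at a time and discarding any candidate that contains an infrequent subset. The five baskets and the 0.6 threshold are made up for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search: returns {itemset: support}."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    result = {}
    size = 1
    while current:
        # Support = fraction of transactions containing the whole itemset
        support = {c: sum(1 for t in transactions if c <= t) / n for c in current}
        frequent = {c: s for c, s in support.items() if s >= min_support}
        result.update(frequent)
        # Build candidates one item larger, keeping only those whose
        # subsets of the current size are all frequent (the Apriori pruning step)
        size += 1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        ]
    return result


# Hypothetical shopping baskets
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
    {"milk", "bread"},
]

itemsets = frequent_itemsets(baskets, min_support=0.6)
pair = frozenset({"beer", "diapers"})
# Confidence of the rule beer -> diapers = support(beer and diapers) / support(beer)
confidence = itemsets[pair] / itemsets[frozenset({"beer"})]
print(sorted(sorted(s) for s in itemsets))
print(f"confidence(beer -> diapers) = {confidence:.2f}")  # 1.00
```

With only five baskets, a confidence of 1.00 means very little, which is the point made earlier about association rules being misleading on small datasets.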
  • 27. DEEP LEARNING • Performs the same activities, i.e. Regression, Classification, Forecasting etc. • Relies on linear algebra rather than statistical methods. • Training data requirements are much higher • Computational complexity is much higher • Results are often much more accurate, i.e. error is much lower, provided enough training data is available.
  • 28. COMPUTER VISION • Processing images to identify information from them • Examples • Automatically analysing x-ray images • Diabetic Retinopathy • Identifying compliance to drug intake during clinical trials
  • 29. NATURAL LANGUAGE PROCESSING • Reading free-form text to identify information • Examples: • Automatically read prescriptions • Analyse medical reports to extract information in an actionable manner • Patient A reported reduced pain in 30 minutes after administering 10 mg of the drug. Patient B reported reduced pain in 15 minutes after administering 15 mg.
    Name | Dosage (mg) | Time (minutes)
    A | 10 | 30
    B | 15 | 15
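The text-to-table step can be sketched with a regular expression. This is pattern matching standing in for real NLP: it only works because both sentences follow the same template, and the pattern itself is an assumption written for this example. Free-form medical reports would need far more robust techniques.

```python
import re

REPORT = (
    "Patient A reported reduced pain in 30 minutes after administering "
    "10 mg of the drug. Patient B reported reduced pain in 15 minutes "
    "after administering 15 mg."
)

# Captures: patient name, minutes until pain reduced, dose in mg
PATTERN = re.compile(
    r"Patient (\w+) reported reduced pain in (\d+) minutes "
    r"after administering (\d+) mg"
)


def extract_rows(text):
    """Turn matching sentences into Name / Dosage / Time records."""
    return [
        {"Name": name, "Dosage": int(dose), "Time": int(minutes)}
        for name, minutes, dose in PATTERN.findall(text)
    ]


rows = extract_rows(REPORT)
for row in rows:
    print(row["Name"], row["Dosage"], row["Time"])
# A 10 30
# B 15 15
```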
  • 30. PROBLEMS WITH AI • Heavily dependent on the data that is fed to it. Garbage in, garbage out. • In some cases, too much data is also a problem. • Explainability is still a work in progress for AI scientists.
  • 31. CAN AI REPLACE HUMANS? • Yeah, no. AI is not so intelligent. • AI is very, very good at searching for answers in data. • Human intellect is in asking the right questions. • “AI is neither artificial nor intelligent. It is made from natural resources and it is people who are performing the tasks to make the systems appear autonomous.” - Kate Crawford, Senior Principal Researcher at Microsoft Research.
  • 32. HOW DOES IT ALL WORK TOGETHER • Data is increasingly digitized and stored in databases. • IoT devices are capturing information in clinical studies, manufacturing, and biometric data • Once data is available in a table, ML can start working
  • 33. FURTHER READING • Should I learn mathematics? • Should I learn programming? • Where can I learn more? • Machine Learning for Engineers https://www.amazon.in/dp/9389024870/ref=cm_sw_em_r_mt_dp_M4SYKWQ234BAKR800NKZ • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7577280/ • https://pubmed.ncbi.nlm.nih.gov/30472429/
  • 34. QUESTIONS AND ANSWERS • Thank You! • For further discussions • Ngopikrishna.public@gmail.com • +91-9036005121