Patent Data for Artificial Intelligence based Drug Discovery

AI Drug Discovery in Patent Space
Hanjo Kim
Principal Scientist at Standigm Inc.
hanjo.kim@standigm.com
business@standigm.com
apply@standigm.com
www.standigm.com

Disclaimer
• Statements of fact and opinions expressed in this presentation
and on the following slides are solely those of the presenter and
not necessarily those of Standigm Inc.

Standigm Inc.
2015: Founded by three researchers at Samsung Advanced Institute of Technology
• Jinhan Kim, PhD Artificial Intelligence (The University of Edinburgh)
• Sang Ok Song, PhD Chemical Engineering (Seoul National University)
• So Jeong Yun, PhD Systems Biology (POSTECH)
$23M funding raised: SK Holdings, Mirae Asset Capital, Mirae Asset Venture Investment, DSC Investment, Wonik Investment, Atinum Investment, LB Investment, Kakao Ventures
Locations: Seoul, Korea (33); Ann Arbor, Michigan (2); Cambridge, UK (1)
Standigm = a drug discovery company that generates and optimizes therapeutic lead compounds using advanced artificial intelligence, with the aim of licensing them out
Team: AI, 16; Biology, 6; Chemistry, 8; Systems Biology, 4; Advisor, 3
PhD: 20/37*
* Excluding Operations (5) and Patent attorney (1)

The AI solution
The Standigm AI solution is industrializing drug discovery: Discovery at Scale
• Pipeline stages: Disease, Target, Hit, Lead, Preclinical, Clinical, Drug
• Drug repositioning
• Platforms: Standigm BEST™, Standigm ASK™, Standigm Insight™, Standigm FIRST*
* developing
Standigm ASK™ is freely available at
https://icluenask.standigm.com

Standigm BEST Platform
• Standigm ASK: knowledge-based biology platform for novel targets, pathways, and MoA discovery
• Standigm FIRST: hit generation platform for novel and/or undruggable targets
• Standigm BEST
- Workflow: Strategy setup, Hit Generation, Hit-2-Lead
- Generative Models: Graph-based VAE; Scaffold-based conditional enumerator; Novel Molecular Representation
- Scoring Functions: Simulations; AI rescoring models; Machine learning models
- Predictive Models: ADME/Tox predictors; Novelty (patentability); Synthetic accessibility; Filters/Ranking models
- Compound Database: Known Molecules; Seed Molecules; Novel Virtual Structures; Commercial Library; Privileged Standigm Library; Public Library
- Target Database: Public data (gene, protein, function); BEST Feasibility
- External CROs: Organic synthesis; In vitro/in vivo Assays
- Outputs: Novel/Commercial Hits; Lead Series

Graph-based VAE
• Learning chemical space: Chemical space, Encoder (E), Latent space (Z), Decoder (D), Chemical space
• Training DB: ~4M
• X : original chemical space
• Z : latent space
• Y : Property/Target information (property 1, property 2, property 3)
• encoder q(z|x), decoder p(x|z), predictor q(y|z)
• Contextualizing:
- substructures
- topology
- shape
- etc
• Seed molecules passed through the decoder p(x|z):
- Analogue structure generation: functionally similar but novel scaffolds/molecules
- Lead optimization: novel molecules w/ better desired properties
• Smart library expansion
• IP generation & expansion
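
The seed-molecule workflow on this slide (encode a seed with q(z|x), perturb it in latent space, decode with p(x|z)) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the encoder and decoder are fixed random linear maps standing in for trained graph networks, the molecules are plain feature vectors, and the names `encode`, `decode`, and `sample_near_seed` are hypothetical. It is not Standigm's implementation.

```python
import math
import random

random.seed(0)
X_DIM, Z_DIM = 6, 2  # toy sizes for "chemical space" X and latent space Z

# Hypothetical stand-ins for trained networks: fixed random linear maps.
W_ENC = [[random.gauss(0, 1) for _ in range(X_DIM)] for _ in range(Z_DIM)]
W_DEC = [[random.gauss(0, 1) for _ in range(Z_DIM)] for _ in range(X_DIM)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def encode(x):
    """Toy q(z|x): mean and log-variance of a diagonal Gaussian."""
    return matvec(W_ENC, x), [-2.0] * Z_DIM


def decode(z):
    """Toy p(x|z): map a latent vector back to feature space."""
    return matvec(W_DEC, z)


def sample_near_seed(x_seed, n_samples, scale=1.0):
    """Draw latent points around the seed and decode each one."""
    mu, log_var = encode(x_seed)
    results = []
    for _ in range(n_samples):
        # Reparameterised sample: z = mu + scale * sigma * eps
        z = [m + scale * math.exp(0.5 * lv) * random.gauss(0, 1)
             for m, lv in zip(mu, log_var)]
        results.append((z, decode(z)))
    return results


x_seed = [random.gauss(0, 1) for _ in range(X_DIM)]
candidates = sample_near_seed(x_seed, n_samples=5)
print(len(candidates), "decoded candidates near the seed")
```

A small `scale` keeps candidates close to the seed (analogue generation); a larger one explores further, at the cost of decoded structures that may be less like the seed.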

Patent Space
Figure: Target A compounds in latent space, with clusters for Competitor 1, Competitor 2, and Competitor 3, and a marked "Interesting Area". Color scale: potent to weak.
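
One way to read the figure is as a filter: keep candidates that are predicted to be potent and that lie away from every competitor's compounds in latent space. The sketch below assumes 2-D latent coordinates, made-up potency scores, and arbitrary thresholds; `nearest_competitor_distance` and `interesting` are hypothetical helper names.

```python
import math

# Hypothetical 2-D latent coordinates of exemplified compounds per competitor.
competitor_points = {
    "Competitor 1": [(0.0, 0.0), (0.5, 0.2)],
    "Competitor 2": [(3.0, 0.0), (3.2, 0.4)],
    "Competitor 3": [(0.0, 3.0)],
}

# Hypothetical candidates: (latent point, predicted potency in [0, 1]).
candidates = [
    ((0.2, 0.1), 0.9),  # potent, but inside Competitor 1's cluster
    ((2.0, 2.0), 0.8),  # potent and away from every cluster
    ((5.0, 5.0), 0.1),  # far from everything, but weak
]


def nearest_competitor_distance(point):
    """Euclidean distance to the closest competitor compound."""
    return min(
        math.dist(point, known)
        for points in competitor_points.values()
        for known in points
    )


def interesting(cands, min_distance=1.0, min_potency=0.5):
    """Keep candidates that are both potent and far from known compounds."""
    return [
        (point, potency)
        for point, potency in cands
        if nearest_competitor_distance(point) >= min_distance
        and potency >= min_potency
    ]


selected = interesting(candidates)
print(selected)
```

Latent distance is only a proxy for patent coverage; a real freedom-to-operate assessment depends on the claimed Markush structures, not on distance alone.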

Chemical Space Navigation
• Chemical Space ~ Map
• Known scaffolds ~ POIs
• Information-rich space (ChEMBL, PubChem Bioassays, etc.)
• Novel scaffold ~ New POI
• El Dorado
• Patent
• Markush structure: how to protect as wide an area as possible
• Exemplified compounds: boundary stones

Using ChemCurator
• Project types
• Google Patents (most cases)
• PDF files (do not use PDF files!)
• Text files (when the Google OCR is not good)

Using ChemCurator
Google Patents

OCR (and chemical OCR)
• Lessons
• Google Patents is reliable in most cases
• It even provides the compound table, though in a very primitive form
• Professional OCR software can give better results
• Convert the PDF file to plain text with chemical names
• Complex tables
• Image (not OCRed) tables (next 3 slides)
• A chemical OCR engine helps a lot
• Text-image comparison
• Chemical OCR engines
• CLiDE (recommended, proprietary)
• OSRA (open-source, recommended on Linux machines)
• Imago (I have no experience)
• Unsupported engines (like ChemGrapher, https://pubs.acs.org/doi/10.1021/acs.jcim.0c00459)

Chemical structures in patents

Markush Structures
• Very expressive
• The same set of compounds can be written in very different forms
• Not well-validated
• ChemCurator helps
• Extracting example compounds
• Matching them to the Markush structure
• Requires manual correction
• Sentence to chemical groups
• Ambiguous/incomplete R-group definitions
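
To make the idea of matching example compounds to a Markush structure concrete, here is a minimal sketch that enumerates a core with two R-group positions and looks up whether an example falls inside the enumerated set. The core, the R-group lists, and the helper names are invented for illustration and do not come from any patent or from ChemCurator. Matching is done on raw strings, which is a deliberate simplification: real matching needs canonical structures from a cheminformatics toolkit.

```python
from itertools import product

# Hypothetical Markush-style definition: a core with two variable positions.
# The SMILES-like strings are illustrative placeholders only.
core = "c1ccc({R1})cc1C(=O)N{R2}"
r_groups = {
    "R1": ["F", "Cl", "OC"],
    "R2": ["C", "CC"],
}


def enumerate_markush(core_template, groups):
    """Yield (structure string, R-group assignment) for every combination."""
    names = sorted(groups)
    for combo in product(*(groups[name] for name in names)):
        assignment = dict(zip(names, combo))
        yield core_template.format(**assignment), assignment


library = dict(enumerate_markush(core, r_groups))


def match_example(structure):
    """Return the R-group assignment for an example, or None if not covered."""
    return library.get(structure)


covered = match_example("c1ccc(Cl)cc1C(=O)NCC")
uncovered = match_example("c1ccc(Br)cc1C(=O)NCC")
print(len(library), covered, uncovered)
```

An example compound that returns `None` points to one of the problems listed above: either the R-group definition was extracted incompletely, or the example needs manual correction.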

AI can help
• Reduction of frequent text OCR errors
• NLP techniques can correct frequent OCR errors
• The availability of a large training set is important
• Extraction of relevant data
• Biological activities
• Analytical data
• Chemical OCR can be improved
• AI can do image recognition very well
• Different drawing styles can be managed
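
As a rough illustration of correcting frequent text OCR errors, the sketch below fixes chemical name fragments with a character-confusion table plus fuzzy matching against a vocabulary. This is a simple rule-based stand-in, not an NLP model: the vocabulary, the confusion table, and the `correct_token` helper are all assumptions, and a learned approach would derive them from a large training set.

```python
import difflib

# Hypothetical vocabulary of chemical name fragments.
VOCAB = ["methyl", "ethyl", "phenyl", "pyridin", "chloro",
         "fluoro", "amino", "benzamide"]

# Common OCR character confusions (illustrative, not exhaustive).
CONFUSIONS = str.maketrans({"1": "l", "0": "o", "5": "s"})


def correct_token(token, cutoff=0.8):
    """Return the vocabulary entry a token most likely stands for.

    Tokens with no confident match are returned unchanged, so genuine
    digits such as locants ("1", "2") are not rewritten.
    """
    if token in VOCAB:
        return token
    normalized = token.translate(CONFUSIONS)
    if normalized in VOCAB:
        return normalized
    matches = difflib.get_close_matches(normalized, VOCAB, n=1, cutoff=cutoff)
    return matches[0] if matches else token


tokens = ["methy1", "ch1oro", "fluoroo", "1", "xyz"]
print([correct_token(t) for t in tokens])
```

The conservative fallback matters in patent text, where a wrongly "corrected" locant or substituent changes the compound.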

Acknowledgement
• Standigm Inc.
• Sanghyung JIN, Minkyu HA, Soyeon Kim, Sangok SONG
• T&J Tech. (Korean distributor)
• Jung-A HAN
