Presentation at Data Innovation Summit 2019 in Stockholm, Sweden.
ABSTRACT
Microscopes are capable of producing vast amounts of data, and when used in automated laboratories both the number and size of images present many challenges for storing, categorizing, analyzing, annotating, and transforming the data into actionable information that can be used for decision making, whether by humans or machines. In this presentation I will describe the informatics system we have established at the Department of Pharmaceutical Biosciences at Uppsala University, which consists of computational hardware (CPUs, GPUs, storage), middleware (Kubernetes), an imaging database (OMERO), and a workflow system (Pachyderm) to perform online prioritization of new data, as well as a continuous analytics system to automate the process from captured images to continuously updated and deployed AI models. The AI methodologies include deep learning models trained on image data and conventional machine learning models trained on features extracted from images or chemical structures. Thanks to its microservice architecture, the system is scalable and can be expanded into hybrid architectures with cloud computing resources. The informatics system serves a robotized cell profiling setup with incubators, liquid handling, and high-content microscopy. The lab is quite young and targets applications primarily in drug screening and toxicity assessment, with the aim of improving research using AI and intelligent design of experiments.
1. Automating the process of continuously prioritising
data, updating and deploying AI models in a robotised
lab for drug discovery
Ola Spjuth <ola.spjuth@farmbio.uu.se>
Department of Pharmaceutical Biosciences, Uppsala University
www.pharmb.io
Lead Scientist AI at
2. Objective: Accelerate drug discovery using
AI and intelligent design of experiments.
• Predict safety concerns (fail early)
• Explain drug mechanisms
• Screen for new drugs
3. Traditional hypothesis testing vs. predictive modeling
[Diagram: hypothesis → experiments → analysis and interpretation → insight → revised hypothesis]
Traditional hypothesis testing:
• Iterative
• Flexible
• Mostly manual
• Slow
Predictive modeling (data generation → database → modeling and prediction):
• Retrospective analysis
• Hopefully predictive
• Expensive
• Limited for hypothesis testing
[Diagram: traditional processing (data → database, queried via request/response) vs. stream processing (data → real-time analytics → results; model → prediction)]
4. Data-driven science
[Diagram: data → stream processing / real-time analytics → results; models are evaluated to yield predictions and insight; hypotheses are generated and tested against the data]
• Hard problem!
• Poor accuracy? Generate new data
5. Intelligent experimentation
Current fact finding: analyze data in motion, before it is stored (continuous analytics).
[Diagram: experiments → data → stream processing / real-time analytics → results → scientist → intelligent design of experiments, supported by the informatics system]
• What experiments should we do, and how?
• Can we reduce the search space?
• How do we store only interesting data?
• Can we replace experiments with predictions?
6. High-content cell profiling
[Diagram: genetic or chemical perturbations → experiments in multi-well plates → imaging → features → hypotheses; a convolutional neural network produces predictions directly from the images]
Bray et al. (2016). "Cell Painting, a High-Content Image-Based Assay for Morphological Profiling Using Multiplexed Fluorescent Dyes." Nature Protocols 11 (9): 1757–74.
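As a rough illustration of the convolutional-neural-network step on the slide, here is a minimal PyTorch sketch of a classifier over multi-channel microscopy images. The architecture, layer sizes, and the use of 5 input channels (Cell Painting stains 5 fluorescent channels) are illustrative assumptions, not the network actually used in the lab.

```python
import torch
import torch.nn as nn

class TinyCellCNN(nn.Module):
    """Toy CNN for multi-channel microscopy images (illustrative only)."""

    def __init__(self, in_channels=5, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling, so any image size works
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)  # (batch, 32)
        return self.classifier(h)        # (batch, num_classes)

# Example: a batch of two 5-channel 64x64 images
logits = TinyCellCNN()(torch.randn(2, 5, 64, 64))
print(logits.shape)  # torch.Size([2, 3])
```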
7. Automated cell-based experiments
[Diagram of the current setup: automated plate handling connected to the informatics system (GPU cluster, CPU server, storage); next step is to automate more protocols]
An open-source robotized cell lab.
11. Dealing with large-scale data
• High volume, high velocity
• Continuously process data, train models, serve models
• Embrace scalable virtual infrastructures (cloud) and microservices (containers)
• Intelligently prioritize what to store
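The "intelligently prioritize what to store" bullet can be sketched as an online top-k filter over the incoming stream; the interestingness score below is a made-up stand-in for a real prioritization model.

```python
import heapq
import random

def interestingness(features):
    """Hypothetical score, e.g. deviation from untreated controls.
    Here simply the mean feature value, as a stand-in."""
    return sum(features) / len(features)

def prioritize(stream, keep=100):
    """Keep only the `keep` most interesting items from an unbounded stream."""
    top = []  # min-heap of (score, id): the worst kept item sits at the root
    for item_id, features in stream:
        score = interestingness(features)
        if len(top) < keep:
            heapq.heappush(top, (score, item_id))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, item_id))
    return sorted(top, reverse=True)  # best first

random.seed(0)
stream = ((i, [random.random() for _ in range(8)]) for i in range(10_000))
kept = prioritize(stream, keep=5)
```

Memory stays bounded by `keep`, so the filter can run indefinitely against an instrument that never stops producing images.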
12. Designing a flexible and scalable informatics system
[Diagram: on-premise GPU cluster, CPU server, and storage, extendable to cloud and HPC resources]
13. Our infrastructure
[Diagram: Kubernetes clusters (administered with Rancher) with master and worker nodes on Server1 (KVM), Server2 (KVM), and a GPU server with GPU workers; separate Develop and Other clusters; a fileserver (NFS); external nodes join via VPN/IPsec; serves internal users and development as well as external users]
14. System configuration, IaC (Infrastructure as Code)
[Diagram: the same Kubernetes and OpenShift clusters (master/worker nodes on Server1 (KVM), Server2 (KVM), and the GPU server, administered with Rancher, plus the Develop and Other clusters), defined and provisioned as code]
15. Multi-cluster Kubernetes management
• Kubernetes setup and administration in the cluster
• High availability
• Kubernetes extras such as networking, Helm, and an Nginx load balancer, with tested and matched versions
17. AI/Machine Learning life cycle
• Structured processing steps
• Governance
• Versioning
• Streamline analysis on high-performance e-infrastructures
• Support reproducible data analysis
• Enable large-scale data analysis
18. DevOps for machine learning
A systematic way of working that accelerates the path from initial data to production AI services: continuous integration and continuous delivery across physical, virtual, and cloud resources.
[Pipeline: data scientist → data → pre-processing → modeling → training → testing → container → testing → model serving]
21. FAIR data and services
FAIR¹:
• Findable: semantic service discovery (JSON-LD)
• Accessible: web UI; programmatic API (OpenAPI)
• Interoperable: OpenAPI; semantic annotations (JSON-LD)
• Reproducible: published scientific workflows (Pachyderm)
Services:
• Portable (containers)
• Resilient (Kubernetes)
• Scalable (Kubernetes and cloud computing)
¹ Wilkinson, Mark D., et al. "The FAIR Guiding Principles for scientific data management and stewardship." Scientific Data 3 (2016).
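A minimal sketch of the kind of JSON-LD annotation that makes a service findable and interoperable; the schema.org terms and the service description below are illustrative assumptions, not the lab's actual annotations.

```python
import json

# Hypothetical JSON-LD description of a prediction service; the vocabulary
# and field values are illustrative only.
service = {
    "@context": {
        "schema": "https://schema.org/",
        "name": "schema:name",
        "url": "schema:url",
        "description": "schema:description",
    },
    "@type": "schema:WebAPI",
    "name": "metpred",
    "url": "https://metpred.service.pharmb.io/",
    "description": "Predicts site-of-metabolism and reaction types.",
}

doc = json.dumps(service, indent=2)
```

Because the `@context` maps plain keys to shared vocabulary terms, a crawler that knows schema.org can discover and interpret the service without any service-specific code.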
22. Examples of what we serve
• Target (safety) profiles: http://ptp.service.pharmb.io/
• Site-of-metabolism and reaction types: https://metpred.service.pharmb.io/draw/
24. Key components and takeaways
• IaC: infrastructure is portable, testable, and simpler to maintain
• Containers: components become portable and scalable
• Kubernetes: resilient container orchestration over multiple nodes
  • Logging and monitoring: profiling
  • Hybrid cloud: elasticity
• Pachyderm: workflows of containers on Kubernetes, with data versioning
• CI/CD: streamlined development and testing
• Open source: only open-source components
→ DevOps: software developers, data engineers, and data scientists working together in the same infrastructure
25. Implications: continuous analytics
• We can handle continuous data processing from instruments with robust, resilient data pipelines
• We can continuously re-train models as data is updated
• We can continuously publish data and models
→ An agile research group with different competencies:
• Scientists have access to the infrastructure
• DevOps roles rather than a dedicated sysadmin, tool developer, or data engineer
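Continuous re-training "as data is updated" can be sketched as a version-triggered loop; the content hash below is a stand-in for the commit IDs that a data-versioning system such as Pachyderm provides.

```python
import hashlib

def data_version(records):
    """Content hash of a dataset: a stand-in for a data-versioning commit ID."""
    h = hashlib.sha256()
    for r in records:
        h.update(repr(r).encode())
    return h.hexdigest()[:12]

def retrain_if_changed(records, last_version, train):
    """Re-train only when the data has actually changed."""
    version = data_version(records)
    if version == last_version:
        return last_version, None  # nothing to do
    return version, train(records)

train = lambda rs: sum(rs) / len(rs)  # toy "model": the mean of the data
v1, model = retrain_if_changed([1, 2, 3], None, train)
v2, again = retrain_if_changed([1, 2, 3], v1, train)  # unchanged: no retrain
```

Keying retraining on the data version rather than on a timer means models are exactly as fresh as the data, and no compute is wasted when nothing changed.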
26. Tools we develop and use
• Easy deployment of virtual infrastructures on IaaS
• Containerized tools and microservices orchestrated with workflow systems on top of Kubernetes and OpenShift
KubeNow (kubeadm, Terraform, Packer, kubectl, Cloudflare): https://github.com/kubenow
Capuccini et al. On-Demand Virtual Research Environments Using Microservices. https://arxiv.org/abs/1805.06180
Pachyderm, a data pipelining and data versioning layer for Kubernetes: https://www.pachyderm.io
Novella et al. Container-based bioinformatics with Pachyderm. Bioinformatics 35 (5): 839–846 (2018). DOI: 10.1093/bioinformatics/bty699
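For concreteness, a hedged sketch of what a Pachyderm-style pipeline specification could look like, built here as a Python dict; the repo, container image, and command names are hypothetical, not the lab's actual configuration.

```python
import json

# Hypothetical Pachyderm-style pipeline spec: run a (made-up)
# feature-extraction container over every new commit in an images repo.
pipeline_spec = {
    "pipeline": {"name": "extract-features"},
    "transform": {
        "image": "pharmbio/feature-extractor:latest",  # hypothetical image
        "cmd": ["python3", "/extract.py", "/pfs/images", "/pfs/out"],
    },
    # one datum per top-level entry in the repo, e.g. one per plate
    "input": {"pfs": {"repo": "images", "glob": "/*"}},
}

spec_json = json.dumps(pipeline_spec, indent=2)
# A spec like this would be submitted with pachctl, e.g.
#   pachctl create pipeline -f pipeline.json
```

Because the input is a versioned repo, every new commit of images triggers the transform, and every output is traceable back to the exact data that produced it.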