This document discusses building an informatics solution to support AI-guided cell profiling using high-content microscopy imaging. The research group aims to accelerate drug discovery using AI and intelligent experimental design. Their goal is to build pipelines of containerized services and workflows to continuously process imaging data, train models, and design new experiments in an automated and scalable manner. This will enable data-driven science through predictive modeling and closing the loop between data generation, analysis, and experimentation.
Building an AI-guided cell profiling solution with microscopy
1. Building an informatics solution to sustain AI-guided cell
profiling with high-content microscopy imaging
Ola Spjuth <ola.spjuth@farmbio.uu.se>
Department of Pharmaceutical Biosciences, Uppsala University
www.pharmb.io
2. Who are we?
• Academic research group at Uppsala University
• Background in computational pharmacology (AI/ML)
• Good at e-infrastructure, big data (data engineering)
• Setting up a high-content imaging lab for cell profiling
Research group website: http://pharmb.io
3. Objective: Accelerate drug discovery using
AI and intelligent design of experiments.
• Predict safety concerns
• Explain drug mechanisms
• Screen for new drugs
4. Hypothesis
revise
Insight
• Iterative
• Flexible
• Mostly manual
• Slow
Experiments
Analysis and interpretation
Traditional hypothesis testing
• Retrospective analysis
• Hopefully predictive
• Expensive
• Limited for hypothesis testing
more
Predictive modeling
Database
Data generation
Traditional Processing Stream Processing
Data
Data Query
request
response
Real-Time
Analytics
Data Results
Model Prediction
Modeling and prediction
5. Data-driven science
Stream Processing
Real-Time
Analytics
Data Results
Data
Models Evaluation
Prediction/
Insight
Hard problem!
Poor accuracy?
Hypothesis
Hypothesis
test
generate
Generate new data
6. Closing the loop:
Intelligent experimentation
Data
Stream Processing
Real-Time
Analytics
Data Results
Current fact finding
Analyze data in motion – before it is stored
Continuous Analytics
Results
Intelligent design of experiments
Experiments
Scientist
• What experiments should we
do and how?
• Can we reduce search space?
• How can we store only interesting
data?
• Can we replace experiments
with predictions?
Automation Informatics
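One common way to approach the questions above is uncertainty sampling: run wet-lab experiments only where the model is least sure, and rely on predictions elsewhere. The following is a minimal sketch under stated assumptions, not the group's actual method. The function name, the compound names, and the predicted probabilities are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple


def split_by_uncertainty(
    predictions: Dict[str, float], budget: int
) -> Tuple[List[str], List[str]]:
    """Split candidates into (to_test, to_trust).

    `predictions` maps a candidate to a predicted probability of being
    active. Candidates closest to 0.5 are the most uncertain, so they are
    the ones selected for a real experiment, up to `budget`.
    """
    ranked = sorted(predictions, key=lambda name: abs(predictions[name] - 0.5))
    return ranked[:budget], ranked[budget:]


# Hypothetical model output for five compounds (illustrative values only).
predicted_activity = {
    "compound_A": 0.97,
    "compound_B": 0.52,
    "compound_C": 0.08,
    "compound_D": 0.41,
    "compound_E": 0.66,
}

to_test, to_trust = split_by_uncertainty(predicted_activity, budget=2)
print("Run experiments for:", to_test)
print("Rely on predictions for:", to_trust)
```

With a budget of two plates, the two compounds nearest 0.5 go to the lab and the rest are covered by the model, which is one simple way to shrink the experimental search space.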
7. Genetic or
chemical
perturbations
Experiments
in multi-
well plates
Imaging Features Hypotheses
Convolutional Neural Network
Predictions
Cell painting: HCI with multiplexed dyes
Bray et al. (2016). “Cell Painting, a High-Content Image-Based Assay for Morphological
Profiling Using Multiplexed Fluorescent Dyes.” Nature Protocols 11 (9): 1757–74.
8. Holographic live cell imaging
• Quantitative phase-contrast microscopy
• Holographic phase-shift imaging
• Label-free, live cell imaging
• Used inside incubator
HoloMonitor system
9. Main focus area: Drug/chemical profiling
with AI modeling
Explore profiling with AI/ML
• Target identification
• Mechanism-of-Action predictions
• Pathway enrichment
actin disruption
microtubule destabilization
aurora kinase inhibition
DNA replication
Eg5 inhibition
protein degradation
cholesterol lowering
DNA damage
epithelial
kinase inhibition
protein synthesis
microtubule stabilization.
Microscopy
image
Deep Neural Network MoA profile prediction
Cell
treatment
• 2D monolayer, cell lines (U2OS,
MCF-7, A549, RKO, …)
• Integrate HCI with other data
model
10. Protein degradation Cholesterol-lowering DNA replication
Microtubule stabilizer Actin disruptor Kinase inhibitor
Classify images into biological
mechanisms
Kensert A, Harrison PJ, Spjuth O.
Transfer learning with deep convolutional neural network for classifying cellular morphological changes.
SLAS DISCOVERY: Advancing Life Sciences R&D. 24, 4 (2019)
13. Make predictions
using available
data
External data
Data warehouse
Design new
experiments
AI
Modeling
Publish data and models
Manual wet lab
Hypothesis
Verify using
external
protocol
Automated lab
Carry out
new
experiments
Analysis pipeline
Vision: Intelligent systems for
drug/chemical profiling
Hypothesis
Hypothesis
test
generate
14. Automating our cell-based lab
Fixed setup (version 1)
• ImageXpress XLS (Molecular Devices)
• Plate robot (Preciseflex)
• Plate incubator (Liconic), barcode reader
• BioMek 4000 liquid handling (Beckman
Coulter)
• Green Button Go lab automation software
(Biosero)
Observations:
• Quick to get up and running
• Suitable for fixed protocols
• Dependent on vendors to
solve problems
• Not easy to expand or
configure for us
Open source setup (under construction)
• HoloMonitor (Phase Holographic Imaging)
• OT-2 liquid handling (OpenTrons)
• Plate robots (under procurement)
• Open source lab automation (to be decided)
• More components… (to be decided)
Our priorities:
• Flexibility to expand/adapt
• Open source or good APIs
• Low cost, serviceable by us
• Configurable by us
Collaborators wanted!
15. Robotized lab
images
Automating our data processing
ImageDB Image viewer
File system
Metadata Files (images)
https://github.com/pharmbio/imagedb
Cold storage
Hot storage
Online,
intelligent
processing
Cell profiles QC workflows Interestingness models
HASTE CORE and Cell Profiler Pipeline
https://github.com/HASTE-project/cellprofiler-pipeline
Avoid storing
uninteresting data
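The idea of routing images by an interestingness score can be sketched as follows. This is an illustration under stated assumptions, not the HASTE implementation. The score used here (pixel intensity variance), the thresholds, and all names are hypothetical stand-ins for a trained interestingness model.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List


@dataclass
class Image:
    name: str
    pixels: List[float]  # flattened grayscale intensities in [0, 1]


def interestingness(image: Image) -> float:
    """Hypothetical score: intensity variance as a crude proxy for content."""
    return pvariance(image.pixels)


def route(image: Image, hot: float = 0.05, cold: float = 0.005) -> str:
    """Decide where an image goes based on its score (thresholds assumed)."""
    score = interestingness(image)
    if score >= hot:
        return "hot"      # keep on fast storage for immediate analysis
    if score >= cold:
        return "cold"     # archive on cheaper storage
    return "discard"      # avoid storing uninteresting data


# Synthetic 16-pixel "images" for illustration only.
images = [
    Image("empty_well", [0.1] * 16),
    Image("sparse_cells", [0.1] * 12 + [0.4] * 4),
    Image("dense_cells", [0.0, 1.0] * 8),
]

decisions = {img.name: route(img) for img in images}
for name, tier in decisions.items():
    print(name, "->", tier)
```

In a real pipeline the scoring function would be replaced by a learned model, and the routing decision would be made while the data is still in motion, before anything is written to disk.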
16. Robotized lab
Data scientists
Empowering our data scientists
ImageDB
File system
Metadata Files (images)
Models
CPU/GPU/HPC cloud
Notebooks
Data
Models
External
users
Services
Public services
Publish
17. Managing our software ecosystem
• Scientists require many different software tools
• Difficult and time-consuming to manage dependencies
• Software Containers
• Offer isolation at the application level while sharing the host operating system
• Portable, fast, smaller than virtual machine images
• Docker
• Microservices
• Decompose functionality into smaller, loosely coupled, on-demand
services
• Improve resilience, agile development
• Easy to scale
• Kubernetes
• Manages a cluster of machines running containers
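To make the container and Kubernetes bullets concrete, here is a minimal sketch of describing one containerized processing step as a Kubernetes Job. It only builds the manifest as a Python dictionary and serializes it to JSON. The job name, the image reference, and the command are hypothetical placeholders, not part of the group's actual setup.

```python
import json
from typing import List


def make_job_manifest(
    name: str, image: str, command: List[str], retries: int = 3
) -> dict:
    """Build a Kubernetes batch/v1 Job manifest for one container."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # How many times Kubernetes retries a failed pod.
            "backoffLimit": retries,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


manifest = make_job_manifest(
    name="feature-extraction",
    image="example.org/feature-extraction:0.1",  # hypothetical image
    command=["python", "extract.py", "--plate", "P001"],  # hypothetical command
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The JSON output could be saved to a file and submitted with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.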
18. Building pipelines of containers
• A suitable way of using containers is to
connect them into a (scientific)
workflow
• Goal: Reproducible, fault-tolerant,
scalable execution
• Lampa S et al. SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines. Gigascience. 8, 5 (2019)
• Spjuth O et al. Approaches for containerized scientific workflows in cloud environments with applications in life science. PeerJ Preprints. 6, e27141v1 (2018)
• Capuccini M et al. MaRe: Container-Based Parallel Computing with Data Locality. ArXiv. 1808.02318 (2018)
• Novella JA et al. Container-based bioinformatics with Pachyderm. Bioinformatics. 35, 5, 839-846. (2018)
• Lampa S et al. Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles. Journal of Cheminformatics. 8, 67. (2016)
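The workflow goals above (reproducible, fault-tolerant, scalable execution) can be illustrated with a small sketch. This is not the API of SciPipe, Pachyderm, or MaRe. It is a plain-Python stand-in in which each step is a function from an input file to an output file, where a real pipeline would invoke a container. The step names and file contents are hypothetical.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[Path, Path], None]]


def run_step(func: Callable[[Path, Path], None], src: Path, dst: Path,
             retries: int = 2) -> str:
    """Run one step; skip it if its output exists, retry it on failure."""
    if dst.exists():
        return "skipped"  # resumable: finished work is never redone
    tmp = dst.with_suffix(dst.suffix + ".tmp")
    for attempt in range(retries + 1):
        try:
            func(src, tmp)
            tmp.rename(dst)  # publish the output only when complete
            return "ran"
        except Exception:
            if attempt == retries:
                raise
            time.sleep(0.01)  # brief pause before retrying
    return "failed"  # not reached; kept for clarity


def run_pipeline(steps: List[Step], first_input: Path, workdir: Path) -> List[str]:
    """Chain steps so each one reads the previous step's output."""
    log, src = [], first_input
    for name, func in steps:
        dst = workdir / f"{name}.out"
        log.append(f"{name}:{run_step(func, src, dst)}")
        src = dst
    return log


# Hypothetical stand-ins for containerized tools.
def segment(src: Path, dst: Path) -> None:
    dst.write_text(src.read_text().upper())


def featurize(src: Path, dst: Path) -> None:
    dst.write_text(str(len(src.read_text())))


workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw.txt"
raw.write_text("cell image placeholder")
steps = [("segment", segment), ("featurize", featurize)]

first_run = run_pipeline(steps, raw, workdir)
second_run = run_pipeline(steps, raw, workdir)
print(first_run)
print(second_run)
```

Writing to a temporary file and renaming it afterwards means a crashed step never leaves a half-written output that a later run would mistake for a finished one.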
19. Dealing with large scale data
• High volume, relatively high velocity
• Continuously process data, train
models, serve models
• Embrace scalable virtual
infrastructures (cloud) and
microservices (containers)
GPU cluster
CPU server
Storage
Cloud
HPC
Online processing
20. AI modeling life cycle
Model Development
ML studio
ML workflow
automation
Package & Deploy Models Model Serving
Model
management
Model
serving
Monitoring
Explore Data and
Develop Models
Train at scale
Register Model
and Metadata for
Serving
Package and
Publish Run in
operations Monitor
Logging Integrate
Data
scientist
Data
Engineer
Data
Engineer
Promote
Model
Ship
Model
www.scaleoutsystems.com
In collaboration with:
21. Integrate with our other AI services
Site-of-metabolism and reaction types
http://ptp.service.pharmb.io/
https://metpred.service.pharmb.io/draw/
Target (safety) profiles
22. Implications: Continuous Analytics
• We can handle the continuous data processing from instruments with
robust, resilient data pipelines
• We can continuously re-train models as data is updated
• We can (soon) continuously publish data and models
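Continuous re-training as new data arrives can be sketched with a model that supports incremental updates. The example below uses a nearest-centroid classifier with running means. This is a deliberately simple stand-in chosen for illustration, not the deep learning models used by the group, and the feature vectors and labels are synthetic.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


class IncrementalCentroidClassifier:
    """Nearest-centroid classifier whose centroids are updated per batch."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)
        self.centroids: Dict[str, List[float]] = {}

    def partial_fit(self, batch: Sequence[Sample]) -> None:
        """Update the running mean of each class without revisiting old data."""
        for features, label in batch:
            self.counts[label] += 1
            n = self.counts[label]
            if label not in self.centroids:
                self.centroids[label] = list(features)
                continue
            centroid = self.centroids[label]
            for i, x in enumerate(features):
                centroid[i] += (x - centroid[i]) / n

    def predict(self, features: Sequence[float]) -> str:
        """Return the label of the closest centroid (squared Euclidean)."""
        def sq_dist(centroid: List[float]) -> float:
            return sum((x - c) ** 2 for x, c in zip(features, centroid))
        return min(self.centroids, key=lambda lbl: sq_dist(self.centroids[lbl]))


model = IncrementalCentroidClassifier()
query = [0.4, 0.4]

# First batch from the instrument (synthetic features).
model.partial_fit([([0.0, 0.0], "control"), ([1.0, 1.0], "treated")])
before = model.predict(query)

# A later batch shifts the class centroids, and the prediction with them.
model.partial_fit([
    ([0.2, 0.2], "control"),
    ([0.4, 0.4], "treated"),
    ([0.4, 0.4], "treated"),
])
after = model.predict(query)
print(before, "->", after)
```

The same pattern applies to larger models: each new batch triggers an update step rather than a full rebuild, so the served model tracks the data as it accumulates.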
Data
Traditional Processing Stream Processing
Data Query
request
response
Real-Time
Analytics
Data Results
Continuous Analytics
Results
Intelligent
design of
experiments
Experiments
Scientist
Agile research group of different competencies
• Scientists have access to necessary
infrastructure
• Data stored in structured databases
• DevOps roles, no dedicated sysadmin /
developer