SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Downloaden Sie, um offline zu lesen
PRESENTED BY
Serving Deep Learning Models at
Scale with RedisAI
Luca Antiga
[tensor]werk, CEO
PRESENTED BY
1 Don’t say AI until you productionize
Deep learning and challenges for production
2 Introducing RedisAI + Roadmap
Architecture, API, operation, what’s next
3 Demo
Live notebook by Chris Fragly, Pipeline.ai
Agenda:
PRESENTED BY
• Co-founder and CEO of [tensor]werk
Infrastructure for data-defined software
• Co-founder of Orobix

AI in healthcare/manufacturing/gaming/+
• PyTorch contributor in 2017-2018
• Co-author of Deep Learning with
PyTorch, Manning (with Eli Stevens)
Who am I (@lantiga)
PRESENTED BY
Don’t say AI until you productionize
PRESENTED BY
• TensorFlow
• PyTorch
• MXNet
• CNTK, Chainer,
• DyNet, DL4J,
• Flux, …
Deep learning frameworks
PRESENTED BY
• Python code behind e.g. Flask
• Execution service from cloud provider
• Runtime
• TensorFlow serving
• Clipper
• NVIDIA TensorRT inference server
• MXNet Model Server
• …
• Bespoke solutions (C++, …)
Production strategies
https://medium.com/@vikati/the-rise-of-the-model-servers-9395522b6c58
PRESENTED BY
• Must fit the technology stack
• Not just about languages, but about
semantics, scalability, guarantees
• Run anywhere, any size
• Composable building blocks
• Must try to limit the amount of moving
parts
• And the amount of moving data
• Must make best use of resources
Production requirements
PRESENTED BY
RedisAI
PRESENTED BY
• A Redis module providing
• Tensors as a data type
• and Deep Learning model execution
• on CPU and GPU
• It turns Redis into a full-fledged deep
learning runtime
• While still being Redis
What it is
PRESENTED BY
RedisAI
New data type:
Tensor
def addsq(a, b):
return (a + b)**2
TorchScript
CPU
GPU0
GPU1
…
PRESENTED BY
• Tensors: framework-agnostic
• Queue + processing thread
• Backpressure
• Redis stays responsive
• Models are kept hot in memory
• Client blocks
Architecture
PRESENTED BY
• redisai.io
• github.com/RedisAI/RedisAI
Where to get it
docker run -p 6379:6379 -it --rm redisai/redisai
PRESENTED BY
• AI.TENSORSET
• AI.TENSORGET
API: Tensor
AI.TENSORSET foo FLOAT 2 2 VALUES 1 2 3 4
AI.TENSORSET foo FLOAT 2 2 BLOB < buffer.raw
AI.TENSORGET foo BLOB
AI.TENSORGET foo VALUES
AI.TENSORGET foo META
PRESENTED BY
• based on dlpack https://github.com/dmlc/dlpack
• framework-independent
API: Tensor
typedef struct {
void* data;
DLContext ctx;
int ndim;
DLDataType dtype;
int64_t* shape;
int64_t* strides;
uint64_t byte_offset;
} DLTensor;
PRESENTED BY
• AI.MODELSET
API: Model
AI.MODELSET resnet18 TORCH GPU < foo.pt
AI.MODELSET resnet18 TF CPU INPUTS in1 OUTPUTS linear4 < foo.pt
https://www.codeproject.com/Articles/1248963/Deep-Learning-using-Python-plus-Keras-Chapter-Re
PRESENTED BY
• AI.MODELRUN
API: Model
AI.MODELRUN resnet18 INPUTS foo OUTPUTS bar
https://www.codeproject.com/Articles/1248963/Deep-Learning-using-Python-plus-Keras-Chapter-Re
PRESENTED BY
• TensorFlow (+ Keras): freeze graph
Exporting models
import tensorflow as tf
var_converter = tf.compat.v1.graph_util.convert_variables_to_constants
with tf.Session() as sess:
sess.run([tf.global_variables_initializer()])
frozen_graph = var_converter(sess, sess.graph_def, ['output'])
tf.train.write_graph(frozen_graph, '.', 'resnet50.pb', as_text=False)
https://github.com/RedisAI/redisai-examples/blob/master/models/imagenet/tensorflow/model_saver.py
PRESENTED BY
• PyTorch: JIT model
Exporting models
import torch
batch = torch.randn((1, 3, 224, 224))
traced_model = torch.jit.trace(model, batch)
torch.jit.save(traced_model, 'resnet50.pt')
https://github.com/RedisAI/redisai-examples/blob/master/models/imagenet/pytorch/model_saver.py
PRESENTED BY
• AI.SCRIPTSET
• AI.SCRIPTRUN
API: Script
AI.SCRIPTSET myadd2 GPU < addtwo.txt
AI.SCRIPTRUN myadd2 addtwo INPUTS foo OUTPUTS bar
def addtwo(a, b):
return a + b
addtwo.txt
PRESENTED BY
• SCRIPT is a TorchScript interpreter
• Python-like syntax for tensor ops
• on CPU and GPU
• Vast library of tensor operations
• Allows to prescribe computations
directly (without exporting from a
Python env, etc)
• Pre-proc, post-proc (but not only)
Scripts?
Ref: https://pytorch.org/docs/stable/jit.html
SCRIPT
PT MODEL
TF MODEL
SCRIPTin_key out_key
PRESENTED BY
• RDB supported
• AOF almost :-)
• Tensors are serialized (meta + blob)
• Models are serialized back into
protobuf
• Scripts are serialized as strings
Persistence
PRESENTED BY
• Master-replica supported for all data
types
• Right now, run cmds replicated too

(post-conf: replication of results of
computations where appropriate)
• Cluster supported, caveat: sharding
models and scripts
• For the moment, use hash tags

{foo}resnet18 {foo}input1234
Replication
PRESENTED BY
• NOTE: any Redis client works right now
• JRedisAI https://github.com/RedisAI/JRedisAI
• redisai-py https://github.com/RedisAI/redisai-py
• Coming up: NodeJS, Go, … (community?)
RedisAI client libraries
PRESENTED BY
RedisAI from NodeJS
PRESENTED BY
RedisAI from NodeJS
PRESENTED BY
• Keep the data local
• Keep stack short
• Run everywhere Redis runs
• Run multi-backend
• Stay language-independent
• Optimize use of resources
• Keep models hot
• HA with sentinel, clustering
Advantages of RedisAI today
https://github.com/RedisAI/redisai-examples
PRESENTED BY
Roadmap
PRESENTED BY
Roadmap: DAG
AI.DAGRUN
SCRIPTRUN preproc normalize img ~in~
MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label
• DAG = Direct Acyclic Graph
• Atomic operation
• Volatile keys (~x~): command-local, don’t touch keyspace
PRESENTED BY
Roadmap: DAG
AI.DAGRUNRO
TENSORSET ~img~ FLOAT 1 3 224 224 BLOB ...
SCRIPTRUN preproc normalize ~img~ ~in~
MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS ~label~
TENSORGET ~label~ VALUES
• AI.DAGRUNRO:
• if no key is written, replicas can execute
• errors if commands try to write to non-volatile keys
DAGRUN
DAGRUNRO
DAGRUNRO
DAGRUNRO
PRESENTED BY
Roadmap: DAG
AI.MODELSET resnet18a TF GPU0 ...
AI.MODELSET resnet18b TORCH GPU1 ...
AI.TENSORSET img ...
AI.DAGRUN
SCRIPTRUN preproc normalize img ~in~
MODELRUN resnet18a INPUTS ~in~ OUTPUTS ~out1~
MODELRUN resnet18b INPUTS ~in~ OUTPUTS ~out2~
SCRIPTRUN postproc ensamble INPUTS ~out1~ ~out2~ OUTPUTS ~out~
SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label
• Parallel multi-device execution
• One queue per device
normalize
GPU0 GPU1
ensamble
probstolabels
PRESENTED BY
Roadmap: DAG
AI.DAGRUNASYNC
SCRIPTRUN preproc normalize img ~in~
MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label
=> 12634
AI.DAGRUNINFO 1264
=> RUNNING [... more details …]
AI.DAGRUNINFO 1264
=> DONE
• AI.DAGRUNASYNC:
• non-blocking, returns an ID
• ID can be used to later retrieve status + keyspace notification
PRESENTED BY
• Pervasive use of streams
Roadmap: streams
AI.DAGRUN
SCRIPTRUN preproc normalize instream ~in~
MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS outstream
AI.DAGRUNASYNC
SCRIPTRUN preproc normalize instream ~in~
MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS outstream
PRESENTED BY
• Runtime from Microsoft
• ONNX
• exchange format for NN
• export from many frameworks
(MXNet, CNTK, …)
• ONNX-ML
• ONNX for machine learning
models (RandomForest, SVN, K-
means, etc)
• export from scikit-learn
Roadmap: ONNXRuntime backend
PRESENTED BY
• Auto-batching
• Transparent batching of requests
• Queue gets rearranged according
to other analogous requests in the
queue, time to response
• Only if different clients or async
• Time-to-response
• ~Predictable in DL graphs
• Can reject request if TTR >
estimated time of queue + run
Roadmap: Auto-batching
concat along 0-th dim
run batch
Run if TTR
< 800ms
analyze queue
Reject Run if TTR
< 800ms
OK
A B
PRESENTED BY
• Dynamic loading of backends:
• don’t load what you don’t need
(especially on GPU)
• Monitoring with AI.INFO:
• more granular information on
running operations, commands,
memory
Roadmap: Misc
AI.CONFIG LOADBACKEND TF CPU
AI.CONFIG LOADBACKEND TF GPU
AI.CONFIG LOADBACKEND TORCH CPU
AI.CONFIG LOADBACKEND TORCH GPU
...
AI.INFO
AI.INFO MODEL key
AI.INFO SCRIPT key
...
PRESENTED BY
• Advanced monitoring
• Health
• Performance
• Model metrics (reliability)
• A/B testing
• Module integration, e.g.
• RediSearch (FAISS)
• Anomaly detection with RedisTS
• Training/fine-tuning in-Redis
RedisAI Enterprise
PRESENTED BY
• [tensor]werk
• Sherin Thomas, Rick Izzo, Pietro Rota
• RedisLabs
• Guy Korland, Itamar Haber, Pieter
Cailliau, Meir Shpilraien, Mark
Nunberg, Ariel Madar
• Orobix
• Everyone!
• Manuela Bazzana, Daniele Ciriello,
Lisa Lozza, Simone Manini,
Alessandro Re
Acknowledgements
PRESENTED BY
Development tools for the data-
defined software era
1. Launching April 2019
2. Looking for investors, users,
contributors
Projects:
- RedisAI: enterprise-grade runtime
(with Redis Labs)
- Hangar: version control for tensor
data (mid April, BSD)
Hit me up!
luca@tensorwerk.com
tensorwerk.com
PRESENTED BY
Chris Fregly, Pipeline.ai
• End-to-End Deep Learning from Research to Production
• Any Cloud, Any Framework, Any Hardware
• Free Community Edition: https://community.pipeline.ai
Thank you!
PRESENTED BY

Weitere ähnliche Inhalte

Was ist angesagt?

Apache ActiveMQ and Apache Camel
Apache ActiveMQ and Apache CamelApache ActiveMQ and Apache Camel
Apache ActiveMQ and Apache Camel
Omi Om
 
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with HadoopBig Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
DataWorks Summit
 
RedisConf18 - Techniques for Synchronizing In-Memory Caches with Redis
RedisConf18 - Techniques for Synchronizing In-Memory Caches with RedisRedisConf18 - Techniques for Synchronizing In-Memory Caches with Redis
RedisConf18 - Techniques for Synchronizing In-Memory Caches with Redis
Redis Labs
 

Was ist angesagt? (20)

Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
 
HyperLogLog in Hive - How to count sheep efficiently?
HyperLogLog in Hive - How to count sheep efficiently?HyperLogLog in Hive - How to count sheep efficiently?
HyperLogLog in Hive - How to count sheep efficiently?
 
Apache drill
Apache drillApache drill
Apache drill
 
Apache ActiveMQ and Apache Camel
Apache ActiveMQ and Apache CamelApache ActiveMQ and Apache Camel
Apache ActiveMQ and Apache Camel
 
Securing Your Atlassian Connect Add-on With JWT
Securing Your Atlassian Connect Add-on With JWTSecuring Your Atlassian Connect Add-on With JWT
Securing Your Atlassian Connect Add-on With JWT
 
대용량 분산 아키텍쳐 설계 #5. rest
대용량 분산 아키텍쳐 설계 #5. rest대용량 분산 아키텍쳐 설계 #5. rest
대용량 분산 아키텍쳐 설계 #5. rest
 
Hadoop et son écosystème - v2
Hadoop et son écosystème - v2Hadoop et son écosystème - v2
Hadoop et son écosystème - v2
 
REST API debate: OData vs GraphQL vs ORDS
REST API debate: OData vs GraphQL vs ORDSREST API debate: OData vs GraphQL vs ORDS
REST API debate: OData vs GraphQL vs ORDS
 
Top Troubleshooting Tips and Techniques for Citrix XenServer Deployments
Top Troubleshooting Tips and Techniques for Citrix XenServer DeploymentsTop Troubleshooting Tips and Techniques for Citrix XenServer Deployments
Top Troubleshooting Tips and Techniques for Citrix XenServer Deployments
 
Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources
 
Apache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with SparkApache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with Spark
 
UNAERP - 04/11 - Digerindo dados com Apache NiFi
UNAERP - 04/11 - Digerindo dados com Apache NiFiUNAERP - 04/11 - Digerindo dados com Apache NiFi
UNAERP - 04/11 - Digerindo dados com Apache NiFi
 
Oracle DBMS vs Amazon RDS vs Amazon Aurora PostgreSQL principali similitudini...
Oracle DBMS vs Amazon RDS vs Amazon Aurora PostgreSQL principali similitudini...Oracle DBMS vs Amazon RDS vs Amazon Aurora PostgreSQL principali similitudini...
Oracle DBMS vs Amazon RDS vs Amazon Aurora PostgreSQL principali similitudini...
 
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with HadoopBig Data Business Wins: Real-time Inventory Tracking with Hadoop
Big Data Business Wins: Real-time Inventory Tracking with Hadoop
 
Alluxio: Data Orchestration on Multi-Cloud
Alluxio: Data Orchestration on Multi-CloudAlluxio: Data Orchestration on Multi-Cloud
Alluxio: Data Orchestration on Multi-Cloud
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
RedisConf18 - Techniques for Synchronizing In-Memory Caches with Redis
RedisConf18 - Techniques for Synchronizing In-Memory Caches with RedisRedisConf18 - Techniques for Synchronizing In-Memory Caches with Redis
RedisConf18 - Techniques for Synchronizing In-Memory Caches with Redis
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 

Ähnlich wie Serving Deep Learning Models At Scale With RedisAI: Luca Antiga

Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDNTech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
nvirters
 
Big Data-Driven Applications with Cassandra and Spark
Big Data-Driven Applications  with Cassandra and SparkBig Data-Driven Applications  with Cassandra and Spark
Big Data-Driven Applications with Cassandra and Spark
Artem Chebotko
 

Ähnlich wie Serving Deep Learning Models At Scale With RedisAI: Luca Antiga (20)

Build a Deep Learning App with Tensorflow & Redis by Jayesh Ahire and Sherin ...
Build a Deep Learning App with Tensorflow & Redis by Jayesh Ahire and Sherin ...Build a Deep Learning App with Tensorflow & Redis by Jayesh Ahire and Sherin ...
Build a Deep Learning App with Tensorflow & Redis by Jayesh Ahire and Sherin ...
 
Kubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd について
Kubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd についてKubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd について
Kubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd について
 
RISC V in Spacer
RISC V in SpacerRISC V in Spacer
RISC V in Spacer
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDNTech Tutorial by Vikram Dham: Let's build MPLS router using SDN
Tech Tutorial by Vikram Dham: Let's build MPLS router using SDN
 
Onnc intro
Onnc introOnnc intro
Onnc intro
 
From Zero to Hero - All you need to do serious deep learning stuff in R
From Zero to Hero - All you need to do serious deep learning stuff in R From Zero to Hero - All you need to do serious deep learning stuff in R
From Zero to Hero - All you need to do serious deep learning stuff in R
 
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2
 
Scaling 100PB Data Warehouse in Cloud
Scaling 100PB Data Warehouse in CloudScaling 100PB Data Warehouse in Cloud
Scaling 100PB Data Warehouse in Cloud
 
Big Data-Driven Applications with Cassandra and Spark
Big Data-Driven Applications  with Cassandra and SparkBig Data-Driven Applications  with Cassandra and Spark
Big Data-Driven Applications with Cassandra and Spark
 
Игорь Фесенко "Direction of C# as a High-Performance Language"
Игорь Фесенко "Direction of C# as a High-Performance Language"Игорь Фесенко "Direction of C# as a High-Performance Language"
Игорь Фесенко "Direction of C# as a High-Performance Language"
 
Cockatrice: A Hardware Design Environment with Elixir
Cockatrice: A Hardware Design Environment with ElixirCockatrice: A Hardware Design Environment with Elixir
Cockatrice: A Hardware Design Environment with Elixir
 
Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Ceph Tech Talk -- Ceph Benchmarking Tool
Ceph Tech Talk -- Ceph Benchmarking ToolCeph Tech Talk -- Ceph Benchmarking Tool
Ceph Tech Talk -- Ceph Benchmarking Tool
 
Rina p4 rina workshop
Rina p4   rina workshopRina p4   rina workshop
Rina p4 rina workshop
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyData
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the Datacenter
 
Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018Kubernetes for java developers - Tutorial at Oracle Code One 2018
Kubernetes for java developers - Tutorial at Oracle Code One 2018
 

Mehr von Redis Labs

SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
Redis Labs
 
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Redis Labs
 
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
Redis Labs
 
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
Redis Labs
 

Mehr von Redis Labs (20)

Redis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redisRedis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redis
 
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
 
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
 
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
 
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
 
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of OracleRedis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
 
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
 
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
 
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
 
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
 
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
 
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
 
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
 
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
 
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
 
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
 

Kürzlich hochgeladen

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Kürzlich hochgeladen (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Serving Deep Learning Models At Scale With RedisAI: Luca Antiga

  • 1. PRESENTED BY Serving Deep Learning Models at Scale with RedisAI Luca Antiga [tensor]werk, CEO
  • 2. PRESENTED BY 1 Don’t say AI until you productionize Deep learning and challenges for production 2 Introducing RedisAI + Roadmap Architecture, API, operation, what’s next 3 Demo Live notebook by Chris Fragly, Pipeline.ai Agenda:
  • 3. PRESENTED BY • Co-founder and CEO of [tensor]werk Infrastructure for data-defined software • Co-founder of Orobix
 AI in healthcare/manufacturing/gaming/+ • PyTorch contributor in 2017-2018 • Co-author of Deep Learning with PyTorch, Manning (with Eli Stevens) Who am I (@lantiga)
  • 4. PRESENTED BY Don’t say AI until you productionize
  • 5. PRESENTED BY • TensorFlow • PyTorch • MXNet • CNTK, Chainer, • DyNet, DL4J, • Flux, … Deep learning frameworks
  • 6. PRESENTED BY • Python code behind e.g. Flask • Execution service from cloud provider • Runtime • TensorFlow serving • Clipper • NVIDIA TensorRT inference server • MXNet Model Server • … • Bespoke solutions (C++, …) Production strategies https://medium.com/@vikati/the-rise-of-the-model-servers-9395522b6c58
  • 7. PRESENTED BY • Must fit the technology stack • Not just about languages, but about semantics, scalability, guarantees • Run anywhere, any size • Composable building blocks • Must try to limit the amount of moving parts • And the amount of moving data • Must make best use of resources Production requirements
  • 9. PRESENTED BY • A Redis module providing • Tensors as a data type • and Deep Learning model execution • on CPU and GPU • It turns Redis into a full-fledged deep learning runtime • While still being Redis What it is
  • 10. PRESENTED BY RedisAI New data type: Tensor def addsq(a, b): return (a + b)**2 TorchScript CPU GPU0 GPU1 …
  • 11. PRESENTED BY • Tensors: framework-agnostic • Queue + processing thread • Backpressure • Redis stays responsive • Models are kept hot in memory • Client blocks Architecture
  • 12. PRESENTED BY • redisai.io • github.com/RedisAI/RedisAI Where to get it docker run -p 6379:6379 -it --rm redisai/redisai
  • 13. PRESENTED BY • AI.TENSORSET • AI.TENSORGET API: Tensor AI.TENSORSET foo FLOAT 2 2 VALUES 1 2 3 4 AI.TENSORSET foo FLOAT 2 2 BLOB < buffer.raw AI.TENSORGET foo BLOB AI.TENSORGET foo VALUES AI.TENSORGET foo META
  • 14. PRESENTED BY • based on dlpack https://github.com/dmlc/dlpack • framework-independent API: Tensor typedef struct { void* data; DLContext ctx; int ndim; DLDataType dtype; int64_t* shape; int64_t* strides; uint64_t byte_offset; } DLTensor;
  • 15. PRESENTED BY • AI.MODELSET API: Model AI.MODELSET resnet18 TORCH GPU < foo.pt AI.MODELSET resnet18 TF CPU INPUTS in1 OUTPUTS linear4 < foo.pt https://www.codeproject.com/Articles/1248963/Deep-Learning-using-Python-plus-Keras-Chapter-Re
  • 16. PRESENTED BY • AI.MODELRUN API: Model AI.MODELRUN resnet18 INPUTS foo OUTPUTS bar https://www.codeproject.com/Articles/1248963/Deep-Learning-using-Python-plus-Keras-Chapter-Re
  • 17. PRESENTED BY • TensorFlow (+ Keras): freeze graph Exporting models import tensorflow as tf var_converter = tf.compat.v1.graph_util.convert_variables_to_constants with tf.Session() as sess: sess.run([tf.global_variables_initializer()]) frozen_graph = var_converter(sess, sess.graph_def, ['output']) tf.train.write_graph(frozen_graph, '.', 'resnet50.pb', as_text=False) https://github.com/RedisAI/redisai-examples/blob/master/models/imagenet/tensorflow/model_saver.py
  • 18. PRESENTED BY • PyTorch: JIT model Exporting models import torch batch = torch.randn((1, 3, 224, 224)) traced_model = torch.jit.trace(model, batch) torch.jit.save(traced_model, 'resnet50.pt') https://github.com/RedisAI/redisai-examples/blob/master/models/imagenet/pytorch/model_saver.py
  • 19. PRESENTED BY • AI.SCRIPTSET • AI.SCRIPTRUN API: Script AI.SCRIPTSET myadd2 GPU < addtwo.txt AI.SCRIPTRUN myadd2 addtwo INPUTS foo OUTPUTS bar def addtwo(a, b): return a + b addtwo.txt
  • 20. PRESENTED BY • SCRIPT is a TorchScript interpreter • Python-like syntax for tensor ops • on CPU and GPU • Vast library of tensor operations • Allows to prescribe computations directly (without exporting from a Python env, etc) • Pre-proc, post-proc (but not only) Scripts? Ref: https://pytorch.org/docs/stable/jit.html SCRIPT PT MODEL TF MODEL SCRIPTin_key out_key
  • 21. PRESENTED BY • RDB supported • AOF almost :-) • Tensors are serialized (meta + blob) • Models are serialized back into protobuf • Scripts are serialized as strings Persistence
  • 22. PRESENTED BY • Master-replica supported for all data types • Right now, run cmds replicated too
 (post-conf: replication of results of computations where appropriate) • Cluster supported, caveat: sharding models and scripts • For the moment, use hash tags
 {foo}resnet18 {foo}input1234 Replication
  • 23. PRESENTED BY • NOTE: any Redis client works right now • JRedisAI https://github.com/RedisAI/JRedisAI • redisai-py https://github.com/RedisAI/redisai-py • Coming up: NodeJS, Go, … (community?) RedisAI client libraries
  • 26. PRESENTED BY • Keep the data local • Keep stack short • Run everywhere Redis runs • Run multi-backend • Stay language-independent • Optimize use of resources • Keep models hot • HA with sentinel, clustering Advantages of RedisAI today https://github.com/RedisAI/redisai-examples
  • 28. PRESENTED BY Roadmap: DAG AI.DAGRUN SCRIPTRUN preproc normalize img ~in~ MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~ SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label • DAG = Direct Acyclic Graph • Atomic operation • Volatile keys (~x~): command-local, don’t touch keyspace
  • 29. PRESENTED BY Roadmap: DAG AI.DAGRUNRO TENSORSET ~img~ FLOAT 1 3 224 224 BLOB ... SCRIPTRUN preproc normalize ~img~ ~in~ MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~ SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS ~label~ TENSORGET ~label~ VALUES • AI.DAGRUNRO: • if no key is written, replicas can execute • errors if commands try to write to non-volatile keys DAGRUN DAGRUNRO DAGRUNRO DAGRUNRO
  • 30. PRESENTED BY Roadmap: DAG AI.MODELSET resnet18a TF GPU0 ... AI.MODELSET resnet18b TORCH GPU1 ... AI.TENSORSET img ... AI.DAGRUN SCRIPTRUN preproc normalize img ~in~ MODELRUN resnet18a INPUTS ~in~ OUTPUTS ~out1~ MODELRUN resnet18b INPUTS ~in~ OUTPUTS ~out2~ SCRIPTRUN postproc ensamble INPUTS ~out1~ ~out2~ OUTPUTS ~out~ SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label • Parallel multi-device execution • One queue per device normalize GPU0 GPU1 ensamble probstolabels
  • 31. PRESENTED BY Roadmap: DAG AI.DAGRUNASYNC SCRIPTRUN preproc normalize img ~in~ MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~ SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label => 12634 AI.DAGRUNINFO 1264 => RUNNING [... more details …] AI.DAGRUNINFO 1264 => DONE • AI.DAGRUNASYNC: • non-blocking, returns an ID • ID can be used to later retrieve status + keyspace notification
  • 32. PRESENTED BY • Pervasive use of streams Roadmap: streams AI.DAGRUN SCRIPTRUN preproc normalize instream ~in~ MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~ SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS outstream AI.DAGRUNASYNC SCRIPTRUN preproc normalize instream ~in~ MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~ SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS outstream
  • 33. PRESENTED BY • Runtime from Microsoft • ONNX • exchange format for NN • export from many frameworks (MXNet, CNTK, …) • ONNX-ML • ONNX for machine learning models (RandomForest, SVN, K- means, etc) • export from scikit-learn Roadmap: ONNXRuntime backend
  • 34. PRESENTED BY • Auto-batching • Transparent batching of requests • Queue gets rearranged according to other analogous requests in the queue, time to response • Only if different clients or async • Time-to-response • ~Predictable in DL graphs • Can reject request if TTR > estimated time of queue + run Roadmap: Auto-batching concat along 0-th dim run batch Run if TTR < 800ms analyze queue Reject Run if TTR < 800ms OK A B
  • 35. PRESENTED BY • Dynamic loading of backends: • don’t load what you don’t need (especially on GPU) • Monitoring with AI.INFO: • more granular information on running operations, commands, memory Roadmap: Misc AI.CONFIG LOADBACKEND TF CPU AI.CONFIG LOADBACKEND TF GPU AI.CONFIG LOADBACKEND TORCH CPU AI.CONFIG LOADBACKEND TORCH GPU ... AI.INFO AI.INFO MODEL key AI.INFO SCRIPT key ...
  • 36. PRESENTED BY • Advanced monitoring • Health • Performance • Model metrics (reliability) • A/B testing • Module integration, e.g. • RediSearch (FAISS) • Anomaly detection with RedisTS • Training/fine-tuning in-Redis RedisAI Enterprise
  • 37. PRESENTED BY • [tensor]werk • Sherin Thomas, Rick Izzo, Pietro Rota • RedisLabs • Guy Korland, Itamar Haber, Pieter Cailliau, Meir Shpilraien, Mark Nunberg, Ariel Madar • Orobix • Everyone! • Manuela Bazzana, Daniele Ciriello, Lisa Lozza, Simone Manini, Alessandro Re Acknowledgements
  • 38. PRESENTED BY Development tools for the data- defined software era 1. Launching April 2019 2. Looking for investors, users, contributors Projects: - RedisAI: enterprise-grade runtime (with Redis Labs) - Hangar: version control for tensor data (mid April, BSD) Hit me up! luca@tensorwerk.com tensorwerk.com
  • 39. PRESENTED BY Chris Fregly, Pipeline.ai • End-to-End Deep Learning from Research to Production • Any Cloud, Any Framework, Any Hardware • Free Community Edition: https://community.pipeline.ai