This document discusses using ScyllaDB as the data store for machine learning workflow pipelines processing IoT device data on Kubernetes. It describes SmartDeployAI's goal of creating reusable AI/ML pipelines and the challenges of previous approaches using Cassandra. ScyllaDB allows building cloud native ML pipelines that can efficiently run multiple workflows on Kubernetes and store model metadata, hyperparameters, and inference results for real-time analysis of IoT sensor data. Examples of computer vision pipelines for object detection and scene parsing are provided.
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Applications on Kubernetes with Scylla
1. Simplifying the Creation of ML Workflow Pipelines for IoT Applications on Kubernetes with ScyllaDB
Timo Mechler, Product Manager
Charles Adetiloye, ML Platform Engineer
2. Presenters
Timo Mechler, Product Manager & Architect
Timo Mechler is a Product Manager and Architect at SmartDeployAI. He has close to a
decade of financial data modeling experience working as both an analyst and
strategist in the energy commodities sector. At SmartDeployAI he now works closely
with product development and engineering teams to solve interesting data modeling
challenges.
Charles Adetiloye, ML Platform Engineer
Charles is a lead ML platforms engineer at SmartDeployAI. He has well over 15 years
of experience building large-scale distributed applications. He has always been
interested in building distributed, event-driven systems that are composable from
independent asynchronous subsystems. He has extensive experience working with
Kubernetes and NoSQL databases such as ScyllaDB and Cassandra.
3. About SmartDeployAI
At SmartDeployAI, we develop software platforms and frameworks for
running and deploying AI and ML workflows.
Our primary focus is on
- Increasing productivity and agile team release cycles
- Increasing collaboration and visibility between team members
- Shareable and reusable AI and ML workflow pipeline components
9. IoT Devices - Group or Cluster of Devices
Customer 1
Customer 2
Customer 3
Customer 4
10. Generalized IoT Pipeline for AI and ML
- Data ingestion at scale: streaming data, ingested through a secure input endpoint
- Data processing pipeline: cleans and formats the raw ingested data
- Data lake and data warehouse: where organized and cleansed data are stored
- AI and analytics: model training, deployment, analytics, and insights
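The four stages above can be sketched as composable functions. This is a minimal, illustrative stand-in (all function names, the CSV payload format, and the in-memory "datastore" are assumptions for the sketch, not the production pipeline):

```python
from typing import Any, Dict, List

def ingest(raw_events: List[str]) -> List[Dict[str, Any]]:
    """Stage 1: parse streaming payloads arriving at the ingestion endpoint."""
    parsed = []
    for line in raw_events:
        device_id, value = line.split(",")
        parsed.append({"device_id": device_id, "value": float(value)})
    return parsed

def process(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Stage 2: clean/format the raw data -- here, drop malformed readings."""
    return [e for e in events if e["value"] >= 0]

def store(events: List[Dict[str, Any]], datastore: list) -> list:
    """Stage 3: persist organized rows (stand-in for a ScyllaDB write)."""
    datastore.extend(events)
    return datastore

def analyze(datastore: list) -> Dict[str, float]:
    """Stage 4: analytics/insights -- a simple per-device average."""
    totals: Dict[str, List[float]] = {}
    for row in datastore:
        totals.setdefault(row["device_id"], []).append(row["value"])
    return {d: sum(v) / len(v) for d, v in totals.items()}

datastore: list = []
events = ingest(["dev-1,20.5", "dev-1,21.5", "dev-2,-1.0"])
store(process(events), datastore)
print(analyze(datastore))  # {'dev-1': 21.0}
```

Each stage consumes the previous stage's output, which is what lets the stages be swapped out or scaled independently once containerized.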
12. Our Goal!
- Create a workflow pipeline that abstracts the whole process of provisioning IoT pipelines
- Efficient utilization of compute resources
- Support for multi-tenant deployment of workflow pipelines on a Kubernetes cluster
- Quick instantiation of new workflow pipelines from a deployment config
- Quick access to ingested datasets for near real-time inference and model retraining
- Store model metadata and hyperparameters from training for model retraining
- Super-fast aggregation, rollup, or grouping of results over a given time window
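To make the last goal concrete, here is a minimal time-window rollup in plain Python (the windowing scheme, tuple layout, and averaging are illustrative assumptions; in practice this is the kind of query that a clustering key on timestamp makes cheap in ScyllaDB):

```python
from collections import defaultdict

def rollup(events, window_secs=60):
    """Group (timestamp, device_id, value) readings into fixed time windows
    and aggregate per (window, device)."""
    buckets = defaultdict(list)
    for ts, device_id, value in events:
        window = ts - (ts % window_secs)   # floor timestamp to window start
        buckets[(window, device_id)].append(value)
    return {k: sum(v) / len(v) for k, v in buckets.items()}

events = [
    (100, "dev-1", 2.0),
    (110, "dev-1", 4.0),   # same 60 s window as ts=100
    (200, "dev-1", 6.0),   # next window
]
print(rollup(events))  # {(60, 'dev-1'): 3.0, (180, 'dev-1'): 6.0}
```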
13. IoT Stream Ingestion Pipeline - 2014
Diagram: Ingestion → Process → Store → Analyze Data → ML Learning
14. IoT Stream Ingestion Pipeline - 2014
Pros
- Scales to support many devices
- Easy path toward ML deployment
- High write throughput
Cons
- Not easily scalable
- Very expensive setup
- We still had downtime
- Cassandra needed occasional tuning
- Bootstrapping a new environment took a while
15. IoT Stream Ingestion Pipeline- 2017
Diagram: Ingestion → Process → Store → Analyze Data → ML Learning, built on Akka Streams, with each stage running as Kubernetes pods
16. IoT Stream Ingestion Pipeline - 2017
Pros
- Scales to support many devices
- Easy path toward ML deployment
- High write throughput
- Efficient compute resource utilization
- Easily scalable
Cons
- Datastore not easily scalable
- We still had downtime with Cassandra
- Bootstrapping our Cassandra datastore was still a pain point
- Entire workflow not easily cloneable or reproducible
17. IoT Stream Ingestion Pipeline - 2017
Diagram: Ingestion → Process → Store → Analyze Data → ML Learning, built on Akka Streams, with each stage running as Kubernetes pods
18. IoT Stream Ingestion Pipeline - 2017
Pros
- Scales to support many devices
- Easy path toward ML deployment
- High write throughput
- Efficient compute resource utilization
- Easily scalable deployment pipeline
Cons
- The Cassandra JVM is extremely greedy: >= 60% of resources
- Bootstrapping Cassandra pods took over 6000 ms
- Entire workflow not easily cloneable or reproducible
20. Why did we go with ScyllaDB?
- Drop-in replacement for Cassandra
- Low memory footprint - VERY important on Kubernetes
- More than 8x faster than Cassandra
- Easy to containerize and deploy as a Kubernetes pod
- We could easily run it as part of our ML workflow pipeline
25. Scene Parsing, Object Detection and Counting Pipeline
Pipeline Workflow
- Time-lapse camera capturing an event stream onsite
- Time-stamped keyframes from the video streams are tagged and uploaded as images to the cloud
- AI models perform real-time analytics of key objects/entities in the image scene: workers onsite, trucks, cranes, etc.
Workflow Output
- Trigger a notification whenever an event of interest occurs, e.g. daily activity start time or equipment delivery
- Daily report notification generated from the AI model, delivered by email or SMS
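The trigger step above can be sketched as a simple rule over the per-frame entity counts the models emit. This is a simplified stand-in (the function name, frame tuple layout, and "first appearance" rule are assumptions, not the deployed trigger logic):

```python
def events_of_interest(frames, prev_seen=None):
    """Scan time-ordered (timestamp, entity_counts) inference results and
    emit a notification the first time each tracked entity appears --
    e.g. the first truck of the day suggests equipment was delivered."""
    notifications = []
    seen = set(prev_seen or [])
    for ts, counts in frames:
        for entity, n in counts.items():
            if n > 0 and entity not in seen:
                notifications.append((ts, f"first {entity} detected"))
                seen.add(entity)
    return notifications

frames = [
    (800, {"person": 2, "truck": 0}),
    (805, {"person": 3, "truck": 1}),
]
print(events_of_interest(frames))
# [(800, 'first person detected'), (805, 'first truck detected')]
```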
26. Scene Parsing, Object Detection and Counting Pipeline
Diagram (steps 1-6 across two pipelines):
Pipeline 1: event payload → payload processor → tagged object counting → daily analytics → trigger event notification
Pipeline 2 (model serving pipeline): model training → ML training & deployment
ScyllaDB datastore holds:
- Model metadata
- Inference metrics
- Inference results
Entities detected are stored in database tables & materialized views with columns:
- uuid
- entity_person_count
- entity_crane_count
- entity_truck_count
- location
- timestamp
27. Scene Parsing, Object Identification and Contextual Modification
Pipeline Workflow
- Ingest images of a room view: living room, bedroom, kitchen, etc.
- Use an AI model to identify the room type
- Identify the walls in the room and allow users to specify the color scheme
Workflow Output
- Modified image output with painted walls
28. Scene Parsing, Object Identification and Contextual Modification
Diagram (steps 1-5): image scene detection model → routed to a room-specific model (bedroom model, living room model, or kitchen model) → post processing, with ML training & deployment backed by the ScyllaDB datastore, which holds:
- Model metadata
- Inference metrics
- Inference results
32. Hyper-Param and ML Metadata Store
METADATA STORE (shared by component-1, component-2, component-3):
param1 => [a1, b1, ... n1]
param2 => [a2, b2, ... n2]
param3 => [a3, b3, ... n3]
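Expanding those per-parameter candidate lists into one configuration per run, so that parallel runs and A/B comparisons each get recorded run parameters, can be sketched like this (the function name and dict layout are illustrative assumptions):

```python
from itertools import product

def expand_runs(param_space):
    """Expand {param: [candidate values]} into one config dict per
    combination -- the run parameters an experiment would record per run."""
    names = sorted(param_space)
    return [dict(zip(names, combo))
            for combo in product(*(param_space[n] for n in names))]

space = {"param1": [0.01, 0.1], "param2": ["a", "b"]}
runs = expand_runs(space)
print(len(runs))   # 4 configurations
print(runs[0])     # {'param1': 0.01, 'param2': 'a'}
```

Each resulting config, together with its metrics and artifacts, is what gets written to the metadata store per run.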
33. Materialized Views of Tables to Display Relevant Info
Event Info fields:
- device_id
- reg_id
- group_id
- cust_id
- model_id
- event_id
- lat
- lng
- pay_load
- checksum
- timestamp
TABLE: device_event_tbl
CREATE TABLE indoor_sensor (
    device_id uuid,
    reg_id uuid,
    group_id uuid,
    cust_id uuid,
    model_id uuid,
    event_id uuid,
    lat float,
    lng float,
    pay_load_size bigint,
    checksum bigint,
    timestamp timestamp,
    PRIMARY KEY (device_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
VIEW: indoor_sensor_group
CREATE MATERIALIZED VIEW indoor_sensor_group AS
    SELECT device_id, group_id, timestamp, lat, lng FROM indoor_sensor
    WHERE group_id IS NOT NULL AND device_id IS NOT NULL AND timestamp IS NOT NULL
    PRIMARY KEY (group_id, device_id, timestamp);
Diagram steps 1-4: ingested events are written to the denormalized table, materialized views derive the relevant subviews, and the serialized views feed the DASHBOARD
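The write path into that table starts by flattening each raw device event, plus registry lookups, into one wide row. A minimal sketch (the registry shape, checksum scheme, and field names beyond the indoor_sensor columns are assumptions for illustration):

```python
import hashlib
import json

def denormalize(raw, registry):
    """Flatten a raw device event into one wide, denormalized row matching
    the indoor_sensor table, resolving ids via a device registry lookup."""
    meta = registry[raw["device_id"]]  # reg_id / group_id / cust_id lookup
    payload = json.dumps(raw["payload"], sort_keys=True).encode()
    return {
        "device_id": raw["device_id"],
        "reg_id": meta["reg_id"],
        "group_id": meta["group_id"],
        "cust_id": meta["cust_id"],
        "lat": raw["lat"],
        "lng": raw["lng"],
        "pay_load_size": len(payload),
        "checksum": int(hashlib.md5(payload).hexdigest()[:8], 16),
        "timestamp": raw["ts"],
    }

registry = {"d1": {"reg_id": "r1", "group_id": "g1", "cust_id": "c1"}}
row = denormalize({"device_id": "d1", "lat": 1.0, "lng": 2.0,
                   "ts": 1000, "payload": {"temp": 21}}, registry)
print(row["group_id"], row["pay_load_size"])
```

Writing rows in this shape is what lets the materialized views serve per-group dashboard queries without joins.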
34. Thank you! Stay in touch.
Any questions?
Charles Adetiloye
charles@smartdeploy.ai
@cadetiloye
Timo Mechler
timo@smartdeploy.ai
Connect with us on Slack
http://bit.ly/ai-pipelines
Editor's Notes
Simplifying the creation of ML workflow pipelines
IoT devices generate a continuous stream of time-bounded events
Ingestion process: building a time-series pipeline with Kafka, Spark, and Cassandra
Because of the pipeline abstraction we have created, each workflow's artifacts can run in a shared Kubernetes environment in different namespaces
The ScyllaDB Operator is used to instantiate a dedicated DB for each pipeline that can be scaled independently
Result: more efficient utilization of resources, and capacity planning is easy for us
Use ScyllaDB to store hyperparameters for model training
We instantiate the pipeline and create experiments with each run and their run parameters
The metadata (and artifacts) for each run are stored in the METADATA store
With this we can do quick A/B testing and multiple parallel runs
We receive events from all these devices
Transform the payload into a highly denormalized form
Write the denormalized data into ScyllaDB
Using materialized views we can create different subviews of the dataset that we serialize and use to populate the dashboards