Simplifying the Creation of ML
Workflow Pipelines for IoT Application
on Kubernetes with ScyllaDB
Timo Mechler, Product Manager
Charles Adetiloye, ML Platform Engineer
Presenters
Timo Mechler, Product Manager & Architect
Timo Mechler is a Product Manager and Architect at SmartDeployAI. He has close to a
decade of financial data modeling experience working as both an analyst and
strategist in the energy commodities sector. At SmartDeployAI he now works closely
with product development and engineering teams to solve interesting data modeling
challenges.
Charles Adetiloye, ML Platform Engineer
Charles is a Lead ML Platforms Engineer at SmartDeployAI. He has well over 15 years
of experience building large-scale distributed applications. He has always been
interested in building distributed event-driven systems that are composable from
independent asynchronous subsystems. He has extensive experience working with
Kubernetes and NoSQL databases like ScyllaDB and Cassandra.
About SmartDeployAI
At SmartDeployAI, we develop software platforms and frameworks for
running and deploying AI and ML workflows.
Our primary focus is on:
- Increasing productivity and shortening agile team release cycles
- Increasing collaboration and visibility between team members
- Shareable and reusable AI and ML workflow pipeline components
IoT Device Landscape
IoT Devices
All IoT Devices Generate Time-bounded Events!
[Diagram: a time axis with events at t1, t2, t3, ..., tn-2, tn-1, tn flowing from an Event Source to an Event Consumer]
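The idea of time-bounded events moving from a source to a consumer can be sketched in a few lines. This is only an illustration: the `DeviceEvent` shape, the `event_source` and `event_consumer` names, and the fixed 10-second interval are assumptions, not something taken from the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterator, List


@dataclass(frozen=True)
class DeviceEvent:
    """One time-bounded event emitted by a device (hypothetical shape)."""
    device_id: str
    timestamp: datetime
    payload: dict


def event_source(device_id: str, start: datetime, n: int,
                 interval: timedelta) -> Iterator[DeviceEvent]:
    """Yield n events at t1..tn, spaced by a fixed interval."""
    for i in range(n):
        yield DeviceEvent(device_id, start + i * interval, {"seq": i + 1})


def event_consumer(events: Iterator[DeviceEvent]) -> List[datetime]:
    """Consume events in arrival order and return their timestamps."""
    return [event.timestamp for event in events]


start = datetime(2020, 1, 1, tzinfo=timezone.utc)
timestamps = event_consumer(
    event_source("device1", start, n=5, interval=timedelta(seconds=10)))
```

Every event carries its own timestamp, which is what later makes windowed aggregation and time-ordered storage possible.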
IoT Devices are Ubiquitous and all around us!
[Diagram: device1, device2, device3, ..., deviceN feed an Event Processing Pipeline that delivers to an Event Consumer]
IoT Devices - Geographical & TimeZone
IoT Devices - Group or Cluster of Devices
Customer 1
Customer 2
Customer 3
Customer 4
Generalized IoT Pipeline for AI and ML
- Data ingestion at scale: streaming data, ingested through a secure input endpoint
- Data Processing Pipeline: cleans and formats the raw ingested data
- Data lake and data warehouse: data storage, where organized and cleansed data are stored
- AI and Analytics: model training, deployment, analytics and insights
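The four stages can be pictured as a chain of small functions. The sketch below is a toy, in-memory stand-in under stated assumptions: the JSON message format, the `device_id` and `value` field names, and the dictionary used as a "data store" are all hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional


def ingest(raw_messages: Iterable[str]) -> Iterator[dict]:
    """Stage 1: parse messages arriving at the input endpoint."""
    for message in raw_messages:
        try:
            parsed = json.loads(message)
        except json.JSONDecodeError:
            continue  # drop messages that are not valid JSON
        if isinstance(parsed, dict):
            yield parsed


def clean(record: dict) -> Optional[dict]:
    """Stage 2: keep only complete records and normalize their types."""
    if "device_id" not in record or "value" not in record:
        return None
    return {"device_id": str(record["device_id"]),
            "value": float(record["value"])}


def store(records: Iterable[dict], table: Dict[str, List[float]]) -> None:
    """Stage 3: append cleaned values to a per-device store."""
    for record in records:
        table.setdefault(record["device_id"], []).append(record["value"])


def analyze(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Stage 4: compute a simple insight (mean value per device)."""
    return {device: sum(values) / len(values)
            for device, values in table.items()}


raw = ['{"device_id": "d1", "value": 1}', "not json",
       '{"device_id": "d1", "value": 3}', '{"device_id": "d2"}']
table: Dict[str, List[float]] = {}
store(filter(None, map(clean, ingest(raw))), table)
insights = analyze(table)
```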
AI and ML Workflow Pipeline
for IoT Devices
Our Goal!
- Create a Workflow Pipeline that abstracts the whole process of provisioning IoT pipelines
- Efficient Utilization of Compute Resources
- Support for Multi-tenant deployment of Workflow Pipelines on a Kubernetes Cluster
- Quick instantiation of new Workflow Pipeline from Deployment Config
- Quick Access to ingested dataset for near real-time inference and model retraining
- Store Model metadata from training and Hyperparameters for Model training/re-training
- Super fast Aggregation, Rollup or Grouping of results over a given time-window
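The last goal, rolling results up over a time window, can be illustrated with a tumbling-window aggregation. This is a minimal in-memory sketch; the `(device_id, epoch_seconds, value)` tuple layout and the `rollup` name are assumptions, and in the pipeline itself this work would presumably be pushed to the datastore rather than done in Python.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rollup(events: Iterable[Tuple[str, int, float]],
           window_seconds: int) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Group events per device into fixed windows and compute count, total, mean."""
    buckets = defaultdict(lambda: {"count": 0, "total": 0.0})
    for device_id, epoch_seconds, value in events:
        # Align the timestamp to the start of its window.
        window_start = epoch_seconds - (epoch_seconds % window_seconds)
        bucket = buckets[(device_id, window_start)]
        bucket["count"] += 1
        bucket["total"] += value
    return {key: {**bucket, "mean": bucket["total"] / bucket["count"]}
            for key, bucket in buckets.items()}


windows = rollup([("d1", 0, 1.0), ("d1", 59, 3.0), ("d1", 60, 5.0)],
                 window_seconds=60)
```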
IoT Stream Ingestion Pipeline - 2014
[Diagram: pipeline stages Ingestion, Process, Store, Analyze Data, ML Learning]
IoT Stream Ingestion Pipeline - 2014
Pros
- Scales to Support several devices
- Easy path towards ML deployment
- High write throughput
Cons
- Not easily scalable
- Very expensive setup
- We still had downtime
- Cassandra needed occasional tuning
- Bootstrapping new environment took a while
IoT Stream Ingestion Pipeline- 2017
[Diagram: pipeline stages Ingestion, Process, Store, Analyze Data, ML Learning, built on Akka Streams, with the stages running as groups of pods]
IoT Stream Ingestion Pipeline - 2017
Pros
- Scales to Support several devices
- Easy path towards ML deployment
- High write throughput
- Efficient Compute Resource Utilization
- Easily Scalable
Cons
- Not easily scalable
- We still had downtime with Cassandra
- Bootstrapping our Cassandra Datastore still a pain point
- Entire workflow not easily Cloneable or Reproduced
IoT Stream Ingestion Pipeline - 2017
[Diagram: pipeline stages Ingestion, Process, Store, Analyze Data, ML Learning, built on Akka Streams, with the stages running as groups of pods on a shared Compute Resource]
IoT Stream Ingestion Pipeline - 2017
Pros
- Scales to Support several devices
- Easy path towards ML deployment
- High write throughput
- Efficient Compute Resource Utilization
- Easily Scalable Deployment Pipeline
Cons
- Cassandra JVM is extremely greedy! >= 60% of resources
- Bootstrapping Cassandra pods took over 6000ms
- Entire workflow not easily Cloneable or Reproduced
Slow Feet Don’t Eat!
Why did we go with ScyllaDB?
- Drop in replacement for Cassandra
- Low memory footprint - VERY Important on Kubernetes
- More than 8x faster than Cassandra
- Easy to containerize and deploy as Kubernetes POD
- We could easily run it as part of our ML Workflow Pipeline
Cloud Native ML Pipeline
[Diagram: Ingestion, Processing and Inference stages sharing a Transient DataStore]
Cloud Native ML Pipeline with ScyllaDB
[Diagram: Ingestion, Processing and Inference stages sharing a Transient DataStore, run as a Kubeflow Pipeline with the ScyllaDB Operator]
Cloud Native ML Pipeline with ScyllaDB
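from kfp import compiler, dsl

@dsl.pipeline(name="SmartDeploy IoT", description="IoT data stream pipeline")
def smartdeploy_training_iot_pipeline(servers, topic, auto_offset):
    # The body of each step is elided on the slide.
    dataflow_transform = ...
    persist_to_scylladb = ...
    run_inference = ...

# Compile the pipeline function into a package for Kubeflow Pipelines.
# The KFP v1 compiler accepts .tar.gz, .tgz, .zip, .yaml or .yml output paths.
compiler.Compiler().compile(smartdeploy_training_iot_pipeline, "smart-iot.tar.gz")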
Step 1: Deploy the operator -> kubectl apply -f operator.yaml
Step 2: Create the cluster -> kubectl apply -f cluster.yaml
Step 3: Check that the cluster is created
kubectl -n scylla get clusters.scylladb.com
Step 4: Scale the cluster up or down as needed
kubectl -n scylla edit clusters.scylladb.com, then edit Spec.Members
AI pipeline
Use-cases
Scene Parsing, Object Detection and Counting
Pipeline
Pipeline Workflow
- A time-lapse camera captures the event stream on site
- Time-stamped keyframes from the video streams are tagged and uploaded as
images to the cloud
- AI models perform real-time analytics of key objects/entities in the
image scene: workers on site, trucks, cranes, etc.
Workflow Output
- Trigger a notification whenever an event of interest occurs, e.g. daily activity
start time, equipment was delivered
- Daily report notification generated from the AI model, sent by email or SMS
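As an illustration of one event of interest, the daily activity start time could be derived from the per-keyframe worker counts. The sketch below assumes the counts are already available as `(timestamp, person_count)` pairs; the function name and input shape are hypothetical and the slides do not describe how the trigger is actually computed.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple


def daily_activity_start(
        observations: Iterable[Tuple[datetime, int]]) -> Dict[str, datetime]:
    """Map each date to the earliest keyframe in which a worker was detected."""
    starts: Dict[str, datetime] = {}
    for timestamp, person_count in observations:
        if person_count <= 0:
            continue  # nobody on site in this keyframe
        day = timestamp.date().isoformat()
        if day not in starts or timestamp < starts[day]:
            starts[day] = timestamp
    return starts


observations = [
    (datetime(2020, 1, 6, 6, 30), 0),
    (datetime(2020, 1, 6, 7, 15), 3),
    (datetime(2020, 1, 6, 7, 5), 1),
    (datetime(2020, 1, 7, 8, 0), 2),
]
starts = daily_activity_start(observations)
```

A notification step would then fire once per day, the first time a date appears in the result.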
Scene Parsing, Object Detection and Counting
Pipeline
[Diagram: six numbered steps across two pipelines. Pipeline 1: Model Training (ML Training & Deployment). Pipeline 2: Model Serving Pipeline, with Event Payload Processor, Tagged Object Counting, Daily Analytics and Trigger Event Notification.]
ScyllaDB Datastore
- Model MetaData
- Metrics Inference
- Inference Result
Database Tables & Materialized Views
Event Payload (Entities Detected)
- uuid
- entity_person_count
- entity_crane_count
- entity_truck_count
- location
- timestamp
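To make the event payload concrete, the sketch below builds one from a list of detected object labels. The field names follow the slide; the `build_event_payload` helper, the label strings and the `site-a` location value are assumptions for illustration only.

```python
import uuid
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable

# Entity types that have a dedicated count column in the payload.
TRACKED_ENTITIES = ("person", "crane", "truck")


def build_event_payload(labels: Iterable[str], location: str,
                        timestamp: datetime) -> dict:
    """Count tracked entities among the detected labels for one keyframe."""
    counts = Counter(labels)
    payload = {
        "uuid": str(uuid.uuid4()),
        "location": location,
        "timestamp": timestamp.isoformat(),
    }
    for entity in TRACKED_ENTITIES:
        # Counter returns 0 for labels that were never seen.
        payload[f"entity_{entity}_count"] = counts[entity]
    return payload


payload = build_event_payload(
    ["person", "person", "truck", "dog"],
    location="site-a",
    timestamp=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
```

Labels outside the tracked set (here `dog`) are simply ignored, so the payload always has the same columns.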
Scene Parsing, Object Identification and
Contextual Modification
Pipeline Workflow
- Ingest images of a room view: living room, bedroom, kitchen, etc.
- Use an AI model to identify the room type
- Identify the walls in the room and allow users to specify the color
scheme
Workflow Output
- Modified image output with painted walls
[Diagram: five numbered steps. Image Scene Detection Model; Bedroom Model, Living Room Model and Kitchen Model; Post Processing; ML Training & Deployment.]
ScyllaDB Datastore
- Model MetaData
- Metrics Inference
- Inference Result
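One way to read this diagram is as a dispatch: a scene detection model picks the room type, then a room-specific model does the wall modification. The sketch below shows only that routing pattern. The models are stand-in functions, and the `hint` field, `ROOM_MODELS` registry and `process` function are all hypothetical.

```python
from typing import Callable, Dict


def make_room_model(room_type: str) -> Callable[[dict, str], dict]:
    """Build a stand-in for a room-specific wall painting model."""
    def apply(image: dict, color: str) -> dict:
        return {**image, "room_type": room_type, "wall_color": color}
    return apply


# Registry of room-specific models, keyed by detected room type.
ROOM_MODELS: Dict[str, Callable[[dict, str], dict]] = {
    room: make_room_model(room)
    for room in ("bedroom", "living_room", "kitchen")
}


def detect_scene(image: dict) -> str:
    """Stand-in for scene detection: reads a label a real model would predict."""
    return image["hint"]


def process(image: dict, color: str) -> dict:
    """Route the image to the model registered for its room type."""
    room_type = detect_scene(image)
    model = ROOM_MODELS.get(room_type)
    if model is None:
        raise ValueError(f"no model registered for room type {room_type!r}")
    return model(image, color)


result = process({"hint": "kitchen", "pixels": []}, "sage green")
```

Adding a new room type then means registering one more entry rather than changing the routing code.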
Key Areas of ScyllaDB
Benefits in our AI Pipelines
Easily Running Multiple Pipelines on
Kubernetes!
[Diagram: Pipeline 1, Pipeline 2 and Pipeline 3 running side by side, numbered 1 to 3]
Hyper-Param and ML Metadata Store
METADATA STORE
component-1 component-2 component-3
param1 => [a1, b1, ..., n1]
param2 => [a2, b2, ..., n2]
param3 => [a3, b3, ..., n3]
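A store that maps each parameter to a list of candidate values lends itself to generating training runs by expanding the combinations. The sketch below assumes a plain dictionary as the metadata store; the parameter values and the `expand_grid` helper are hypothetical, and the slides do not say how the stored hyperparameters are consumed.

```python
from itertools import product
from typing import Dict, List


def expand_grid(search_space: Dict[str, list]) -> List[dict]:
    """Return one dict per combination of candidate parameter values."""
    names = sorted(search_space)
    value_lists = [search_space[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical contents of the metadata store for one pipeline component.
metadata_store = {
    "component-1": {"param1": [0.1, 0.01], "param2": [16, 32]},
}
runs = expand_grid(metadata_store["component-1"])
```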
Materialized View of Tables to display relevant Info
Event Info
- device_id
- reg_id
- group_id
- cust_id
- model_id
- event_id
- lat
- lng
- pay_load
- checksum
- timestamp
[Diagram: a base table with columns device_id, reg_id, group_id, cust_id, ... and a view with columns device_id, group_id, pay_load]
TABLE: device_event_tbl
CREATE TABLE indoor_sensor (
    device_id uuid,
    reg_id uuid,
    group_id uuid,
    cust_id uuid,
    model_id uuid,
    event_id uuid,
    lat float,
    lng float,
    pay_load_size bigint,
    checksum bigint,
    timestamp timestamp,
    PRIMARY KEY (device_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
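VIEW: indoor_sensor_group
-- A view's primary key must contain every primary key column of the base table,
-- and each of those columns must be restricted with IS NOT NULL.
CREATE MATERIALIZED VIEW indoor_sensor_group AS
    SELECT device_id, group_id, timestamp, lat, lng FROM indoor_sensor
    WHERE device_id IS NOT NULL AND group_id IS NOT NULL AND timestamp IS NOT NULL
    PRIMARY KEY (device_id, group_id, timestamp);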
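What the table and view definitions imply for readers can be mimicked in memory: rows are grouped by `device_id`, kept newest-first within each device, and the view exposes only a few columns for rows that have a group. This is an illustrative model only, not how ScyllaDB stores data; the helper names and the integer timestamps are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def partition_rows(rows: List[dict]) -> Dict[str, List[dict]]:
    """Group rows by device_id and order each group by timestamp, newest first."""
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        partitions[row["device_id"]].append(row)
    for device_rows in partitions.values():
        device_rows.sort(key=lambda r: r["timestamp"], reverse=True)
    return dict(partitions)


def group_view(rows: List[dict]) -> List[dict]:
    """Project a few columns for rows whose group_id is set."""
    columns = ("device_id", "group_id", "lat", "lng")
    return [{c: row[c] for c in columns}
            for row in rows if row.get("group_id") is not None]


rows = [
    {"device_id": "d1", "group_id": "g1", "lat": 1.0, "lng": 2.0, "timestamp": 1},
    {"device_id": "d1", "group_id": "g1", "lat": 1.0, "lng": 2.0, "timestamp": 3},
    {"device_id": "d1", "group_id": None, "lat": 1.0, "lng": 2.0, "timestamp": 2},
    {"device_id": "d2", "group_id": "g2", "lat": 5.0, "lng": 6.0, "timestamp": 1},
]
partitions = partition_rows(rows)
view = group_view(rows)
```

Reading the latest events for one device is then a lookup of a single group followed by taking its first few rows, which is the access pattern the descending clustering order is meant to serve.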
[Diagram: four numbered steps ending in a DASHBOARD]
Thank you
Stay in touch
Any questions?
Charles Adetiloye
charles@smartdeploy.ai
@cadetiloye
Timo Mechler
timo@smartdeploy.ai
Connect with us on Slack
http://bit.ly/ai-pipelines

NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Kürzlich hochgeladen

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 

Kürzlich hochgeladen (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Applications on Kubernetes with Scylla

  • 1. Simplifying the Creation of ML Workflow Pipelines for IoT Applications on Kubernetes with ScyllaDB. Timo Mechler, Product Manager; Charles Adetiloye, ML Platform Engineer
  • 2. Presenters. Timo Mechler, Product Manager & Architect: Timo Mechler is a Product Manager and Architect at SmartDeployAI. He has close to a decade of financial data modeling experience working as both an analyst and strategist in the energy commodities sector. At SmartDeployAI he now works closely with product development and engineering teams to solve interesting data modeling challenges. Charles Adetiloye, ML Platform Engineer: Charles is a Lead ML Platform Engineer at SmartDeployAI. He has well over 15 years of experience building large-scale distributed applications. He has always been interested in building distributed event-driven systems that are composable from independent asynchronous subsystems. He has extensive experience working with Kubernetes and NoSQL databases like ScyllaDB and Cassandra.
  • 3. About SmartDeployAI. At SmartDeployAI, we develop software platforms and frameworks for running and deploying AI and ML workflows. Our primary focus is on: - Increasing productivity and shortening agile team release cycles - Increasing collaboration and visibility between team members - Sharable and reusable AI and ML workflow pipeline components
  • 6. All IoT Devices Generate Time-bounded Events! [Figure: timeline of events t1, t2, ..., tn flowing over time from an Event Source to an Event Consumer]
  • 7. IoT Devices are Ubiquitous and all around us! [Figure: device1, device2, device3, ..., deviceN feeding an Event Consumer and an Event Processing Pipeline]
  • 8. IoT Devices - Geographical & TimeZone
  • 9. IoT Devices - Group or Cluster of Devices Customer 1 Customer 2 Customer 3 Customer 4
  • 10. Generalized IoT Pipeline for AI and ML. Data ingestion at scale: streaming data, ingested through a secure input endpoint. Data Processing Pipeline: cleans and formats the raw ingested data. Data lake and data warehouse: data storage, where organized and cleansed data are stored. AI and Analytics: model training, deployment, analytics and insights.
  • 11. AI and ML Workflow Pipeline for IoT Devices
  • 12. Our Goal! - Create a Workflow Pipeline that abstracts the whole process of provisioning IoT pipelines - Efficient utilization of compute resources - Support for multi-tenant deployment of Workflow Pipelines on a Kubernetes cluster - Quick instantiation of a new Workflow Pipeline from a deployment config - Quick access to the ingested dataset for near real-time inference and model retraining - Store model metadata from training, and hyperparameters for model training/re-training - Super fast aggregation, rollup or grouping of results over a given time window
  • 13. IoT Stream Ingestion Pipeline - 2014 [Figure: pipeline stages Ingestion, Process, Store, Analyze Data, ML Learning]
  • 14. IoT Stream Ingestion Pipeline - 2014 Pros - Scales to Support several devices - Easy path towards ML deployment - High write throughput Cons - Not easily scalable - Very expensive setup - We still had downtime - Cassandra needed occasional tuning - Bootstrapping new environment took a while
  • 15. IoT Stream Ingestion Pipeline - 2017 [Figure: pipeline stages Ingestion, Process, Store, Analyze Data, ML Learning, with Akka Streams and part of the pipeline running as pods]
  • 16. IoT Stream Ingestion Pipeline - 2017 Pros - Scales to Support several devices - Easy path towards ML deployment - High write throughput - Efficient Compute Resource Utilization - Easily Scalable Cons - Not easily scalable - We still had downtime with Cassandra - Bootstrapping our Cassandra Datastore still a pain point - Entire workflow not easily Cloneable or Reproduced Compute Resource
  • 17. IoT Stream Ingestion Pipeline - 2017 [Figure: pipeline stages Ingestion, Process, Store, Analyze Data, ML Learning, with Akka Streams and more of the pipeline running as pods]
  • 18. IoT Stream Ingestion Pipeline - 2017 Pros - Scales to Support several devices - Easy path towards ML deployment - High write throughput - Efficient Compute Resource Utilization - Easily Scalable Deployment Pipeline Cons - Cassandra JVM is extremely greedy! >= 60% of resources - Bootstrapping Cassandra pods took over 6000ms - Entire workflow not easily Cloneable or Reproduced
  • 20. Why did we go with ScyllaDB? - Drop-in replacement for Cassandra - Low memory footprint, which is VERY important on Kubernetes - More than 8x faster than Cassandra - Easy to containerize and deploy as a Kubernetes pod - We could easily run it as part of our ML Workflow Pipeline
  • 21. Cloud Native ML Pipeline [Figure: Ingestion, Processing and Inference stages around a Transient DataStore]
  • 22. Cloud Native ML Pipeline with ScyllaDB [Figure: Ingestion, Processing and Inference stages around a Transient DataStore, run as a Kubeflow Pipeline with the ScyllaDB Operator]
  • 23. Cloud Native ML Pipeline with ScyllaDB
      from kfp import compiler, dsl

      @dsl.pipeline(name="SmartDeploy IoT", description="IoT data stream pipeline")
      def smartdeploy_training_iot_pipeline(servers, topic, auto_offset):
          dataflow_transform = …
          persist_to_scylladb = …
          run_inference = …

      compiler.Compiler().compile(smartdeploy_training_iot_pipeline, "smart-iot.gz")

      Step 1: Deploy Operators -> kubectl apply -f operator.yaml
      Step 2: Create Cluster -> kubectl apply -f cluster.yaml
      Step 3: Check if Cluster is created -> kubectl -n scylla get clusters.scylladb.com
      Step 4: Scale Up or Scale Down Cluster as needed -> kubectl -n scylla edit clusters.scylladb.com and edit Spec.Members
  • 25. Scene Parsing, Object Detection and Counting Pipeline. Pipeline Workflow - A time-lapse camera captures the event stream onsite - Time-stamped keyframes from the video streams are tagged and uploaded as images to the cloud - AI models perform real-time analytics of key objects/entities in the image scene: workers onsite, trucks, cranes, etc. Workflow Output - Trigger a notification whenever an event of interest occurs, e.g. daily activity start time, equipment was delivered - A daily report notification generated from the AI model, sent by email or SMS
  • 26. Scene Parsing, Object Detection and Counting Pipeline. [Figure: numbered flow (1-6) through Event Payload, Event Payload Processor, Tagged Object Counting, Daily Analytics, Trigger Event Notification, and ML Training & Deployment] Pipeline 1: Model Training. Pipeline 2: Model Serving Pipeline. ScyllaDB Datastore: - Model MetaData - Metrics Inference - Inference Result. Entities Detected (Database Tables & Materialized Views): - uuid - entity_person_count - entity_crane_count - entity_truck_count - location - timestamp
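The Daily Analytics step on this slide can be pictured as a rollup over the stored inference-result rows. What follows is a minimal Python sketch, not the presenters' implementation. It reuses the column names listed on the slide (location, timestamp, entity_person_count, entity_crane_count, entity_truck_count). The `InferenceResult` class, the `daily_rollup` function, the sample values, and the choice to keep the peak count per day are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class InferenceResult:
    """One inference-result row; field names follow the slide's table."""
    location: str
    timestamp: datetime
    entity_person_count: int
    entity_crane_count: int
    entity_truck_count: int


COUNT_FIELDS = ("entity_person_count", "entity_crane_count", "entity_truck_count")


def daily_rollup(rows):
    """Group rows by (location, day) and keep the peak count per entity.

    Taking the maximum is an assumption; a real report might use
    averages or first/last sightings instead.
    """
    rollup = defaultdict(lambda: dict.fromkeys(COUNT_FIELDS, 0))
    for row in rows:
        key = (row.location, row.timestamp.date())
        for name in COUNT_FIELDS:
            rollup[key][name] = max(rollup[key][name], getattr(row, name))
    return dict(rollup)


# Hypothetical sample data: three keyframes from one site over two days.
rows = [
    InferenceResult("site-a", datetime(2024, 1, 5, 7, 30), 2, 0, 1),
    InferenceResult("site-a", datetime(2024, 1, 5, 12, 0), 9, 1, 3),
    InferenceResult("site-a", datetime(2024, 1, 6, 8, 0), 4, 1, 0),
]
report = daily_rollup(rows)
```

In a deployment like the one described, the same grouping would more likely be pushed down to the datastore; the in-memory version only shows the shape of the result that a daily report would be built from.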
  • 27. Scene Parsing, Object Identification and Contextual Modification. Pipeline Workflow - Ingest images of a view: living room, bedroom, kitchen, etc. - Use an AI model to identify the room type - Identify the walls in the room and allow users to specify the color scheme. Workflow Output - Modified image output with painted walls
  • 28. Scene Parsing, Object Identification and Contextual Modification. [Figure: numbered flow (1-5) through Image, Scene Detection Model, the Bedroom, Living Room and Kitchen Models, Post Processing, and ML Training & Deployment] ScyllaDB Datastore: - Model MetaData - Metrics Inference - Inference Result
  • 29. Scene Parsing, Object Identification and Contextual Modification
  • 30. Key Areas of ScyllaDB Benefits in our AI Pipelines
  • 31. Easily Running Multiple Pipelines on Kubernetes! [Figure: Pipeline 1, Pipeline 2 and Pipeline 3 running side by side]
  • 32. Hyper-Param and ML Metadata Store. METADATA STORE: component-1, component-2, component-3. param1 => [a1, b1, ..., n1]; param2 => [a2, b2, ..., n2]; param3 => [a3, b3, ..., n3]
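One way to read the parameter lists on this slide is as a grid from which individual run configurations are drawn. The sketch below is an illustration under that assumption, not the presenters' code. The `expand_runs` function, the `run_id` field, and the sample values are hypothetical, and the generated records are only built in memory rather than written to any store.

```python
import itertools
import uuid


def expand_runs(component, param_grid):
    """Yield one run record per combination of hyperparameter values."""
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[n] for n in names)):
        yield {
            "run_id": str(uuid.uuid4()),  # hypothetical run identifier
            "component": component,
            "params": dict(zip(names, values)),
        }


# Hypothetical grid: param1 and param2 have two candidates each, param3 has one.
grid = {"param1": [0.1, 0.01], "param2": [32, 64], "param3": ["adam"]}
runs = list(expand_runs("component-1", grid))
```

Each record carries everything needed to start one run, so the combinations can be launched in parallel and later compared by `run_id`.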
  • 33. Materialized View of Tables to display relevant Info
      Event Info: device_id, reg_id, group_id, cust_id, model_id, event_id, lat, lng, pay_load, checksum, timestamp

      TABLE: device_event_tbl
      CREATE TABLE indoor_sensor (
          device_id uuid,
          reg_id uuid,
          group_id uuid,
          cust_id uuid,
          model_id uuid,
          event_id uuid,
          lat float,
          lng float,
          pay_load_size bigint,
          checksum bigint,
          timestamp timestamp,
          PRIMARY KEY (device_id, timestamp)
      ) WITH CLUSTERING ORDER BY (timestamp DESC);

      VIEW: indoor_sensor_group
      CREATE MATERIALIZED VIEW indoor_sensor_group AS
          SELECT device_id, group_id, timestamp, lat, lng
          FROM indoor_sensor
          WHERE device_id IS NOT NULL AND group_id IS NOT NULL AND timestamp IS NOT NULL
          PRIMARY KEY (device_id, group_id, timestamp);

      [Figure: numbered flow (1-4) from the event info through the table and the materialized view to the DASHBOARD]
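The table on this slide holds one flat row per device event. As a rough illustration of how an incoming event could be flattened into that shape, here is a minimal Python sketch. The nested input layout (`device`, `location`, `payload`, `epoch_seconds`), the `denormalize` function, and the use of CRC32 for the checksum column are assumptions and are not taken from the slides. Only the output column names follow the table definition.

```python
import uuid
import zlib
from datetime import datetime, timezone


def denormalize(event):
    """Flatten a nested device event into one row keyed by the table's columns.

    The nested input layout is hypothetical; only the output keys follow
    the column names shown on the slide.
    """
    payload = event["payload"]  # raw bytes sent by the device
    device = event["device"]
    return {
        "device_id": device["device_id"],
        "reg_id": device["reg_id"],
        "group_id": device["group_id"],
        "cust_id": device["cust_id"],
        "model_id": device["model_id"],
        "event_id": event["event_id"],
        "lat": event["location"]["lat"],
        "lng": event["location"]["lng"],
        "pay_load_size": len(payload),
        "checksum": zlib.crc32(payload),  # CRC32 is an assumed checksum choice
        "timestamp": datetime.fromtimestamp(event["epoch_seconds"], tz=timezone.utc),
    }


# Hypothetical sample event.
event = {
    "event_id": uuid.uuid4(),
    "epoch_seconds": 1704067200,
    "payload": b"hello",
    "location": {"lat": 41.88, "lng": -87.63},
    "device": {
        "device_id": uuid.uuid4(),
        "reg_id": uuid.uuid4(),
        "group_id": uuid.uuid4(),
        "cust_id": uuid.uuid4(),
        "model_id": uuid.uuid4(),
    },
}
row = denormalize(event)
```

A row in this form maps one-to-one onto the columns of an INSERT statement, which keeps the write path simple and leaves regrouping to materialized views.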
  • 34. Thank you Stay in touch Any questions? Charles Adetiloye charles@smartdeploy.ai @cadetiloye Timo Mechler timo@smartdeploy.ai Connect with us on Slack http://bit.ly/ai-pipelines

Editor's Notes

  1. Simplifying the creation of ML workflows
  2-6. IoT devices generate a continuous stream of time-bounded events
  7-21. Ingestion process: building a time-series pipeline with Kafka, Spark and Cassandra
  22. Because of the pipeline abstraction we have created, each workflow's artifacts can run in a shared Kubernetes environment, in different namespaces. The ScyllaDB Operator is used to instantiate a dedicated database for each pipeline, which can be scaled independently. Result: more efficient utilization of resources, and easier capacity planning for us.
  23. We use ScyllaDB to store hyperparameters for model training. We instantiate the pipeline and create experiments, with each run and its run parameters. The metadata (and artifacts) for each run are stored in the METADATA store. With this we can do quick A/B testing and multiple PARALLEL runs.
  24. We receive events from all these devices, transform the payload into a highly denormalized form, and write the denormalized data into ScyllaDB. Using materialized views, we can create different subviews of the dataset, which we serialize and use to populate the dashboards.