SlideShare ist ein Scribd-Unternehmen logo
1 von 64
Downloaden Sie, um offline zu lesen
MATS stack (MLFlow, Airflow, Tensorflow,
Spark) for cross-system orchestration of
machine learning pipelines
João Da Silva & Yury Kasimov
Intro
Yury Kasimov
Data engineer at Avast with background in
Machine Learning and Network security, tennis
player on even days, chess on odd days
Intro
Yury Kasimov
Data engineer at Avast with background in
Machine Learning and Network security, tennis
player on even days, chess on odd days
João Da Silva
Scala & FP enthusiast, Lead Data Engineer @avast,
DJ @sonuz, capoeirista and co-organizer of
Prague @functional_jvm meetup
Agenda
● Intro: The saga begins
● Problems: Clone wars
● Goals: Insidious plan
● Solutions: Spark of a rebellion
● Challenges: Technologies strike back
● Successes: A new hope
Avast
Avast is dedicated to creating a world
that provides safety and privacy for all,
no matter who you are, where you are,
or how you connect.
Intro: The saga begins
Intro: The saga begins
Intro: The saga begins
Intro: The saga begins
Intro: The saga begins
Problems: Clone wars
Problems: Clone wars
● A lot of duplicated effort between different teams
Problems: Clone wars
● A lot of duplicated effort between different teams
● No overview of different experiments in one place
Problems: Clone wars
● A lot of duplicated effort between different teams
● No overview of different experiments in one place
● No automated process for moving from experiments to production
Problems: Clone wars
● A lot of duplicated effort between different teams
● No overview of different experiments in one place
● No automated process for moving from experiments to production
● Scaling and monitoring of deployed models
Goals: Insidious plan
Goals:Insidiousplan
Goals:Insidiousplan
Goals:Insidiousplan
Goals:Insidiousplan
Goals: Insidious plan
● Define a common ground for data science team and data engineering
team
● Structured, fast and reproducible experiments
● Cross-system orchestration/scheduling
● Automated model serving
Solutions: Spark of a rebellion
Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
ML Project
Lifecycle, Design
and Structure
Solutions: Spark of a rebellion
Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
○ Data: Data Engineering Stages
Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
○ Data: Data Engineering Stages
○ Model: Machine Learning Stages
Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
○ Data: Data Engineering Stages
○ Model: Machine Learning Stages
○ Code: CI/CD
Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
○ Standard repository structure
Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
○ Standard repository structure
○ Standard ML Development at Avast
Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
○ Standard repository structure
○ Standard ML Development at Avast
○ Standard Tooling
Solutions: MATS Stack
● MLFlow
● Airflow
● Tensorflow
● Spark
MATS
Solutions: Spark of a rebellion
● MLFlow for experiment tracking and Model management
Solutions: Spark of a rebellion
● MLFlow for experiment tracking and Model management
○ Open Source ML Platform
○ Easy experiment tracking
○ Model packaging, storage, version management and deployment
○ Rich API and CLI which can be used by any language or ML Library
Solutions: Spark of a rebellion
Solutions: Spark of a rebellion
Solutions: Spark of a rebellion
Solutions: Spark of a rebellion
Solutions: Spark of a rebellion
● Airflow for cross-system scheduling
Solutions: Spark of a rebellion
● Airflow for cross-system scheduling
○ Message driven architecture
○ It’s extensible, it’s Python ;-)
○ Templating, default_args and connections removes boilerplate
Solutions: Spark of a rebellion
Kubernetes
GPU/Tensorflow
Yarn/Spark
data_dump
Spark / HDFS
Solutions: Spark of a rebellion
Kubernetes
GPU/Tensorflow
Yarn/Spark
data_dump
Spark / HDFS
Solutions: Spark of a rebellion
Kubernetes
GPU/Tensorflow
Yarn/Spark
data_dump
Spark / HDFS
Solutions: Spark of a rebellion
Kubernetes
GPU/Tensorflow
Yarn/Spark
data_dump
Spark / HDFS
Solutions: Spark of a rebellion
Spark / HDFS
Kubernetes
GPU/Tensorflow
Yarn/Spark
data_dump
Solutions: Spark of a rebellion
● Tensorflow for high performance training
Solutions: Spark of a rebellion
● Tensorflow for high performance training
○ TFRecords
○ TensorFlow Serving
○ Rich ecosystem
Solutions: Spark of a rebellion
● Spark for distributed big data processing
Solutions: Spark of a rebellion
● Spark for distributed big data processing
○ Extensive usage and knowledge at Avast
○ Really, Spark it’s king for big data processing ;-)
Solutions: MATS Stack
MATS
Challenges: Technologies strike back
Challenges: Technologies strike back
▪ Lack of event based notifications for model registry changes
▪ https://github.com/mlflow/mlflow/issues/2740
Challenges: Technologies strike back
▪ Lack of event based notifications for model registry changes
▪ https://github.com/mlflow/mlflow/issues/3015
▪ Lack of support Tensorflow ModelServer for serving
▪ MLFlow does not support tensorflow model logging in saved_model format
▪ https://github.com/mlflow/mlflow/issues/2740
Challenges: Technologies strike back
▪ Lack of event based notifications for model registry changes
▪ https://github.com/mlflow/mlflow/issues/2740
▪ Lack of support Tensorflow ModelServer for serving
▪ MLFlow does not support tensorflow model logging in saved_model format
▪ https://github.com/mlflow/mlflow/issues/2740
▪ Airflow deployment, security and quirks
Successes: A new hope
Successes: A new hope
● Delivered Angler ML pipeline for url phishing classifier
Successes: A new hope
● Delivered Angler ML pipeline for url phishing classifier
● Established processes for faster productization of ML Models
Successes: A new hope
● Delivered Angler ML pipeline for url phishing classifier
● Established processes for faster productization of ML Models
● Interest from other teams to adopt our solution
Successes: A new hope
● Delivered Angler ML pipeline for url phishing classifier
● Established processes for faster productization of ML Models
● Interest from other teams to adopt our solutions
● MATS Stack
We would like to thank
● Tomas Trnka – our first “customer” and the creator of Angler projects
● Vojtech Tuma – our manager for guiding and supporting us
● Our colleagues for their help and suggestions
● All of you that attended this presentation
Reach out
Yury Kasimov
@LunaticInHall
João Da Silva
@imjsilva
Q&A
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.

Weitere ähnliche Inhalte

Was ist angesagt?

MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model ServingDatabricks
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflowDatabricks
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOpsDatabricks
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...Databricks
 
"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"Databricks
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use CasesMax De Marzi
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkDatabricks
 
MLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumMLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumSasha Rosenbaum
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleDatabricks
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
MLOps with Azure DevOps
MLOps with Azure DevOpsMLOps with Azure DevOps
MLOps with Azure DevOpsMarco Parenzan
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLJordan Birdsell
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesNeo4j
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro sessionAvinash Patil
 
GSK: How Knowledge Graphs Improve Clinical Reporting Workflows
GSK: How Knowledge Graphs Improve Clinical Reporting WorkflowsGSK: How Knowledge Graphs Improve Clinical Reporting Workflows
GSK: How Knowledge Graphs Improve Clinical Reporting WorkflowsNeo4j
 
Apache Kafka and Blockchain - Comparison and a Kafka-native Implementation
Apache Kafka and Blockchain - Comparison and a Kafka-native ImplementationApache Kafka and Blockchain - Comparison and a Kafka-native Implementation
Apache Kafka and Blockchain - Comparison and a Kafka-native ImplementationKai Wähner
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.Knoldus Inc.
 

Was ist angesagt? (20)

MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
 
"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache Spark
 
MLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumMLOps by Sasha Rosenbaum
MLOps by Sasha Rosenbaum
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
MLOps with Azure DevOps
MLOps with Azure DevOpsMLOps with Azure DevOps
MLOps with Azure DevOps
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro session
 
GSK: How Knowledge Graphs Improve Clinical Reporting Workflows
GSK: How Knowledge Graphs Improve Clinical Reporting WorkflowsGSK: How Knowledge Graphs Improve Clinical Reporting Workflows
GSK: How Knowledge Graphs Improve Clinical Reporting Workflows
 
Apache Kafka and Blockchain - Comparison and a Kafka-native Implementation
Apache Kafka and Blockchain - Comparison and a Kafka-native ImplementationApache Kafka and Blockchain - Comparison and a Kafka-native Implementation
Apache Kafka and Blockchain - Comparison and a Kafka-native Implementation
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.
 

Ähnlich wie MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestration of Machine Learning Pipelines

Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusRed Hat Developers
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016MLconf
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Nikhil Garg
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...All Things Open
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)dtz001
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform Seldon
 
Programming with Semantic Broad Data
Programming with Semantic Broad DataProgramming with Semantic Broad Data
Programming with Semantic Broad DataSteffen Staab
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorchgeetachauhan
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowKaxil Naik
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowDatabricks
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemDatabricks
 
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...Gabriel Moreira
 
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...Gabriel Moreira
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataDatabricks
 
FlinkML - Big data application meetup
FlinkML - Big data application meetupFlinkML - Big data application meetup
FlinkML - Big data application meetupTheodoros Vasiloudis
 
Architecting next generation big data platform
Architecting next generation big data platformArchitecting next generation big data platform
Architecting next generation big data platformhadooparchbook
 

Ähnlich wie MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestration of Machine Learning Pipelines (20)

Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkus
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
 
AML_service.pptx
AML_service.pptxAML_service.pptx
AML_service.pptx
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
Programming with Semantic Broad Data
Programming with Semantic Broad DataProgramming with Semantic Broad Data
Programming with Semantic Broad Data
 
Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorch
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflow
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving System
 
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...
 
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...
PAPIs LATAM 2019 - Training and deploying ML models with Kubeflow and TensorF...
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
FlinkML - Big data application meetup
FlinkML - Big data application meetupFlinkML - Big data application meetup
FlinkML - Big data application meetup
 
Architecting next generation big data platform
Architecting next generation big data platformArchitecting next generation big data platform
Architecting next generation big data platform
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mehr von Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Kürzlich hochgeladen

Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 

Kürzlich hochgeladen (20)

Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 

MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestration of Machine Learning Pipelines

  • 1. MATS stack (MLFlow, Airflow, Tensorflow, Spark) for cross-system orchestration of machine learning pipelines João Da Silva & Yury Kasimov
  • 2. Intro Yury Kasimov Data engineer at Avast with background in Machine Learning and Network security, tennis player on even days, chess on odd days
  • 3. Intro Yury Kasimov Data engineer at Avast with background in Machine Learning and Network security, tennis player on even days, chess on odd days João Da Silva Scala & FP enthusiast, Lead Data Engineer @avast, DJ @sonuz, capoeirista and co-organizer of Prague @functional_jvm meetup
  • 4. Agenda ● Intro: The saga begins ● Problems: Clone wars ● Goals: Insidious plan ● Solutions: Spark of a rebellion ● Challenges: Technologies strike back ● Successes: A new hope
  • 5. Avast Avast is dedicated to creating a world that provides safety and privacy for all, no matter who you are, where you are, or how you connect.
  • 10. Intro: The saga begins
  • 12. Problems: Clone wars ● A lot of duplicated effort between different teams
  • 13. Problems: Clone wars ● A lot of duplicated effort between different teams ● No overview of different experiments in one place
  • 14. Problems: Clone wars ● A lot of duplicated effort between different teams ● No overview of different experiments in one place ● No automated process for moving from experiments to production
  • 15. Problems: Clone wars ● A lot of duplicated effort between different teams ● No overview of different experiments in one place ● No automated process for moving from experiments to production ● Scaling and monitoring of deployed models
  • 16.
  • 22. Goals: Insidious plan ● Define a common ground for data science team and data engineering team ● Structured, fast and reproducible experiments ● Cross-system orchestration/scheduling ● Automated model serving
  • 23. Solutions: Spark of a rebellion
  • 24. Solutions: Spark of a rebellion ● ML Project Lifecycle, Design and Structure
  • 25. ML Project Lifecycle, Design and Structure Solutions: Spark of a rebellion
  • 26. Solutions: Spark of a rebellion ● ML Project Lifecycle, Design and Structure
  • 27. Solutions: Spark of a rebellion ● ML Project Lifecycle, Design and Structure ○ Data: Data Engineering Stages
  • 28. Solutions: Spark of a rebellion ● ML Project Lifecycle, Design and Structure ○ Data: Data Engineering Stages ○ Model: Machine Learning Stages
  • 29. Solutions: Spark of a rebellion ● ML Project Lifecycle, Design and Structure ○ Data: Data Engineering Stages ○ Model: Machine Learning Stages ○ Code: CI/CD
  • 30. Solutions: Spark of a rebellion ● ML Project Lifecycle, Design and Structure ○ Standard repository structure
  • 31. Solutions: Spark of a rebellion ● ML Project Lifecycle, Design and Structure ○ Standard repository structure ○ Standard ML Development at Avast
  • 32. Solutions: Spark of a rebellion ● ML Project Lifecycle, Design and Structure ○ Standard repository structure ○ Standard ML Development at Avast ○ Standard Tooling
  • 33. Solutions: MATS Stack ● MLFlow ● Airflow ● Tensorflow ● Spark MATS
  • 34. Solutions: Spark of a rebellion ● MLFlow for experiment tracking and Model management
  • 35. Solutions: Spark of a rebellion ● MLFlow for experiment tracking and Model management ○ Open Source ML Platform ○ Easy experiment tracking ○ Model packaging, storage, version management and deployment ○ Rich API and CLI which can be used by any language or ML Library
  • 36. Solutions: Spark of a rebellion
  • 37. Solutions: Spark of a rebellion
  • 38. Solutions: Spark of a rebellion
  • 39. Solutions: Spark of a rebellion
  • 40. Solutions: Spark of a rebellion ● Airflow for cross-system scheduling
  • 41. Solutions: Spark of a rebellion ● Airflow for cross-system scheduling ○ Message driven architecture ○ It’s extensible, it’s Python ;-) ○ Templating, default_args and connections removes boilerplate
  • 42. Solutions: Spark of a rebellion Kubernetes GPU/Tensorflow Yarn/Spark data_dump Spark / HDFS
  • 43. Solutions: Spark of a rebellion Kubernetes GPU/Tensorflow Yarn/Spark data_dump Spark / HDFS
  • 44. Solutions: Spark of a rebellion Kubernetes GPU/Tensorflow Yarn/Spark data_dump Spark / HDFS
  • 45. Solutions: Spark of a rebellion Kubernetes GPU/Tensorflow Yarn/Spark data_dump Spark / HDFS
  • 46. Solutions: Spark of a rebellion Spark / HDFS Kubernetes GPU/Tensorflow Yarn/Spark data_dump
  • 47. Solutions: Spark of a rebellion ● Tensorflow for high performance training
  • 48. Solutions: Spark of a rebellion ● Tensorflow for high performance training ○ TFRecords ○ TensorFlow Serving ○ Rich ecosystem
  • 49. Solutions: Spark of a rebellion ● Spark for distributed big data processing
  • 50. Solutions: Spark of a rebellion ● Spark for distributed big data processing ○ Extensive usage and knowledge at Avast ○ Really, Spark it’s king for big data processing ;-)
  • 53. Challenges: Technologies strike back ▪ Lack of event based notifications for model registry changes ▪ https://github.com/mlflow/mlflow/issues/2740
  • 54. Challenges: Technologies strike back ▪ Lack of event based notifications for model registry changes ▪ https://github.com/mlflow/mlflow/issues/3015 ▪ Lack of support Tensorflow ModelServer for serving ▪ MLFlow does not support tensorflow model logging in saved_model format ▪ https://github.com/mlflow/mlflow/issues/2740
  • 55. Challenges: Technologies strike back ▪ Lack of event based notifications for model registry changes ▪ https://github.com/mlflow/mlflow/issues/2740 ▪ Lack of support Tensorflow ModelServer for serving ▪ MLFlow does not support tensorflow model logging in saved_model format ▪ https://github.com/mlflow/mlflow/issues/2740 ▪ Airflow deployment, security and quirks
  • 57. Successes: A new hope ● Delivered Angler ML pipeline for url phishing classifier
  • 58. Successes: A new hope ● Delivered Angler ML pipeline for url phishing classifier ● Established processes for faster productization of ML Models
  • 59. Successes: A new hope ● Delivered Angler ML pipeline for url phishing classifier ● Established processes for faster productization of ML Models ● Interest from other teams to adopt our solution
  • 60. Successes: A new hope ● Delivered Angler ML pipeline for url phishing classifier ● Established processes for faster productization of ML Models ● Interest from other teams to adopt our solutions ● MATS Stack
  • 61. We would like to thank ● Tomas Trnka – our first “customer” and the creator of Angler projects ● Vojtech Tuma – our manager for guiding and supporting us ● Our colleagues for their help and suggestions ● All of you that attended this presentation
  • 63. Q&A
  • 64. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.