At Avast we complete over 17 million phishing detections a day, providing crucial online protection against this type of attack.
In this talk Joao Da Silva and Yury Kasimov will present the MATS stack for productionisation of Machine Learning and their journey into integrating model tracking, storage, cross-system orchestration and model deployments for a complete and modern machine learning pipeline.
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestration of Machine Learning Pipelines
1. MATS stack (MLFlow, Airflow, Tensorflow, Spark) for cross-system orchestration of machine learning pipelines
João Da Silva & Yury Kasimov
3. Intro
Yury Kasimov
Data engineer at Avast with background in
Machine Learning and Network security, tennis
player on even days, chess on odd days
João Da Silva
Scala & FP enthusiast, Lead Data Engineer @avast,
DJ @sonuz, capoeirista and co-organizer of
Prague @functional_jvm meetup
4. Agenda
● Intro: The saga begins
● Problems: Clone wars
● Goals: Insidious plan
● Solutions: Spark of a rebellion
● Challenges: Technologies strike back
● Successes: A new hope
5. Avast
Avast is dedicated to creating a world
that provides safety and privacy for all,
no matter who you are, where you are,
or how you connect.
15. Problems: Clone wars
● A lot of duplicated effort between different teams
● No overview of different experiments in one place
● No automated process for moving from experiments to production
● Scaling and monitoring of deployed models
22. Goals: Insidious plan
● Define common ground between the data science and data engineering teams
● Structured, fast and reproducible experiments
● Cross-system orchestration/scheduling
● Automated model serving
29. Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
○ Data: Data Engineering Stages
○ Model: Machine Learning Stages
○ Code: CI/CD
32. Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
○ Standard repository structure
○ Standard ML Development at Avast
○ Standard Tooling
35. Solutions: Spark of a rebellion
● MLFlow for experiment tracking and Model management
○ Open Source ML Platform
○ Easy experiment tracking
○ Model packaging, storage, version management and deployment
○ Rich API and CLI which can be used by any language or ML Library
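As a rough illustration of the experiment-tracking bullet above, the following minimal sketch logs one run's parameters and metrics through the MLflow Python API. The helper names (`flatten_params`, `track_run`), the experiment name and the config keys are hypothetical and not taken from the talk; only the `mlflow.*` calls are part of the library.

```python
from typing import Dict, Optional


def flatten_params(config: dict, prefix: str = "") -> Dict[str, str]:
    """Flatten a nested config into dotted keys (MLflow params are flat key/value pairs)."""
    flat: Dict[str, str] = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = str(value)
    return flat


def track_run(experiment: str, config: dict, metrics: Dict[str, float],
              tracking_uri: Optional[str] = None) -> str:
    """Log one run to MLflow and return its run ID (hypothetical helper)."""
    # Imported lazily so flatten_params can be used without MLflow installed.
    import mlflow

    if tracking_uri:
        mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment)
    with mlflow.start_run() as run:
        mlflow.log_params(flatten_params(config))
        mlflow.log_metrics(metrics)
        return run.info.run_id
```

A call such as `track_run("url-phishing-classifier", {"model": {"layers": 3}}, {"auc": 0.97})` (all values made up) would then show up as one row in the MLflow UI, which is what gives different teams a single overview of experiments.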
50. Solutions: Spark of a rebellion
● Spark for distributed big data processing
○ Extensive usage and knowledge at Avast
○ Really, Spark is king for big data processing ;-)
54. Challenges: Technologies strike back
▪ Lack of event based notifications for model registry changes
▪ https://github.com/mlflow/mlflow/issues/3015
▪ Lack of support for Tensorflow ModelServer for serving
▪ MLFlow does not support logging Tensorflow models in saved_model format
▪ https://github.com/mlflow/mlflow/issues/2740
55. Challenges: Technologies strike back
▪ Lack of event based notifications for model registry changes
▪ https://github.com/mlflow/mlflow/issues/2740
▪ Lack of support for Tensorflow ModelServer for serving
▪ MLFlow does not support logging Tensorflow models in saved_model format
▪ https://github.com/mlflow/mlflow/issues/2740
▪ Airflow deployment, security and quirks
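The slides do not say how the missing registry notifications were worked around. One common option, sketched here purely as an assumption, is to poll the registry on a schedule and diff consecutive snapshots. The `Snapshot` shape, the `StageChange` type and the `diff_snapshots` function are hypothetical; in practice the snapshots would be built from the registry API by a scheduled task.

```python
from typing import Dict, List, NamedTuple, Tuple


class StageChange(NamedTuple):
    model: str
    version: str
    old_stage: str  # "" when the version was absent from the previous snapshot
    new_stage: str


# Hypothetical snapshot shape: (model name, version) -> stage.
Snapshot = Dict[Tuple[str, str], str]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[StageChange]:
    """Return versions that are new or whose stage changed since the last poll."""
    changes: List[StageChange] = []
    for (model, version), stage in sorted(current.items()):
        old_stage = previous.get((model, version), "")
        if old_stage != stage:
            changes.append(StageChange(model, version, old_stage, stage))
    return changes
```

Each returned `StageChange` could then trigger a downstream step, for example redeploying a model whose `new_stage` is `"Production"`. Polling trades latency for simplicity: changes are only noticed once per polling interval.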
60. Successes: A new hope
● Delivered Angler ML pipeline for URL phishing classifier
● Established processes for faster productization of ML Models
● Interest from other teams to adopt our solutions
● MATS Stack
61. We would like to thank
● Tomas Trnka – our first “customer” and the creator of Angler projects
● Vojtech Tuma – our manager for guiding and supporting us
● Our colleagues for their help and suggestions
● All of you who attended this presentation