SlideShare ist ein Scribd-Unternehmen logo
1 von 23
Downloaden Sie, um offline zu lesen
Introduce MLflow with
Databricks
Liangjun Jiang, 10/03/20189
A tutorial to help you use Mlflow in your work
Problems of Machine Learning Workflow
DIFFICULT TO KEEP TRACK OF
EXPERIMENTS
DIFFICULT TO REPRODUCE CODE NO STANDARD WAY TO PACKAGE
AND DEPLOY MODELS
Outline
MLFLOW OVERVIEW MLFLOW WITH
DATABRICKS
CICD WITH DATABRICKS
Part I: Mlflow Overview
Open source platform for the
machine learning lifecycle
https://github.com/mlflow/mlflow
https://mlflow.org/
Mlflow Components
Scalability and Big Data
MLfow supports scaling in three dimensions
1. An individual MLflow run can execute on a distributed cluster, for example,
using Apache Spark. You can launch runs on the distributed infrastructure of your
choice and report results to a Tracking Server to compare them. MLflow includes a
built-in API to launch runs on Databricks.
2. MLflow supports launching multiple runs in parallel with different parameters, for
example, for hyperparameter tuning. You can simply use the Projects API to start
multiple runs and the Tracking API to track them.
3. MLflow Projects can take input from, and write output to, distributed storage
systems such as AWS S3 and DBFS. MLflow can automatically download such files
locally for projects that can only run on local files, or give the project a distributed
storage URI if it supports that. This means that you can write projects that build
large datasets, such as featurizing a 100 TB file.
MLflow Components - tracking
• - is an API and UI for logging parameters,
code versions, metrics, and output files when
running your machine learning code and for
later visualizing the results. You can use
MLflow Tracking in any environment (for
example, a standalone script or a notebook)
to log results to local files or to a server, then
compare multiple runs. Teams can also use it
to compare results from different users.
Tracking – cont’d
Multisteps Tracking
• A typical flow
• Step 1: download data from a url
• Step 2: transform and load the downloaded dataset to another place
• Step 3: use Spark (Databricks) to train your model
• Step 4: share your model with application developers
You can use MLFlow Tracking API to track information of each steps
MLflow Components - Projects
• are a standard format for packaging reusable data science code.
Each project is simply a directory with code or a Git repository,
and uses a descriptor file or simply convention to specify its
dependencies and how to run the code. For example, projects
can contain a conda.yaml file for specifying a
Python Conda environment. When you use the MLflow Tracking
API in a Project, MLflow automatically remembers the project
version (for example, Git commit) and any parameters. You can
easily run existing MLflow Projects from GitHub or your own Git
repository, and chain them into multi-step workflows.
mlflow run https://github.com/mlflow/mlflow-
example.git -P alpha=0.4
MLflow
Components
- models
• offer a convention for packaging machine learning models in multiple
flavors, and a variety of tools to help you deploy them. Each Model is
saved as a directory containing arbitrary files and a descriptor file
that lists several “flavors” the model can be used in. For example, a
TensorFlow model can be loaded as a TensorFlow DAG, or as a
Python function to apply to input data. MLflow provides tools to
deploy many common model types to diverse platforms: for
example, any model supporting the “Python function” flavor can be
deployed to a Docker-based REST server, to cloud platforms such as
Azure ML and AWS SageMaker, and as a user-defined function in
Apache Spark for batch and streaming inference. If you output
MLflow Models using the Tracking API, MLflow also automatically
remembers which Project and run they came from.
MLflow Models
• Storage Format
• MLflow defines several “standard” flavors that all of its built-in
deployment tools support, such as a “Python function” flavor that
describes how to run the model as a Python function.
• However, libraries can also define and use other flavors. For example,
MLflow’s mlflow.sklearn library allows loading models back as a scikit-
learn Pipeline object for use in code that is aware of scikit-learn, or as
a generic Python function for use in tools that just need to apply the
model (for example, the mlflow sagemaker tool for deploying models
to Amazon SageMaker).
Built-in Model Flavors
• Python Function
• R Funtion
• H2O
• Keras
• Mleap
• PyTorch
• Scikit-learn
• Spark Mlib
• TensorFlow
• ONNX
Saving & Serving Models
• MLflow includes a generic MLmodel format for saving models from a
variety of tools in diverse flavors. For example, many models can be
served as Python functions, so an MLmodel file can declare how each
model should be interpreted as a Python function in order to let
various tools serve it. MLflow also includes tools for running such
models locally and exporting them to Docker containers or
commercial serving platforms.
• for example, batch inference on Apache Spark and real-time serving
through a REST API
mlflow models serve -m runs:/<RUN_ID>/model
• A local restful service demo
Modeling wine preferences by data mining from physicochemical pro
perties
• mlflow models serve -m runs:/ea388ca349964193a17f2823480fc6bf/model --
port 5001
• curl -d '{"columns":["x"], "data":[[1], [-1]]}' -H 'Content-Type: application/json;
format=pandas-split' -X POST localhost:5001/invocations
Modeling Learning Model as a Service
More about model service
• Restful Service for Batch or low latency Inference with or w/o
databricks (Spark)
• Deploy with Azure ML
• The mlflow.azureml module can package python_function models into Azure
ML container images.
• Example workflow using the Python API
• https://www.mlflow.org/docs/latest/models.html#built-in-deployment-tools
Summary: Use Cases
• Individual Data Scientists can use MLflow Tracking to track experiments locally on their machine, organize code in projects for
future reuse, and output models that production engineers can then deploy using MLflow’s deployment tools. MLflow Tracking
just reads and writes files to the local file system by default, so there is no need to deploy a server.
• Data Science Teams can deploy an MLflow Tracking server to log and compare results across multiple users working on the same
problem. By setting up a convention for naming their parameters and metrics, they can try different algorithms to tackle the same
problem and then run the same algorithms again on new data to compare models in the future. Moreover, anyone can download
and run another model.
• Large Organizations can share projects, models, and results using MLflow. Any team can run another team’s code using MLflow
Projects, so organizations can package useful training and data preparation steps that other teams can use, or compare results
from many teams on the same task. Moreover, engineering teams can easily move workflows from R&D to staging to production.
• Production Engineers can deploy models from diverse ML libraries in the same way, store the models as files in a management
system of their choice, and track which run a model came from.
• Researchers and Open Source Developers can publish code to GitHub in the MLflow Project format, making it easy for anyone to
run their code using the mlflow run github.com/... command.
• ML Library Developers can output models in the MLflow Model format to have them automatically support deployment using
MLflow’s built-in tools. In addition, deployment tool developers (for example, a cloud vendor building a serving platform) can
automatically support a large variety of models.
Part II: Working
with Databricks
• MLflow on Databricks integrates with the complete
Databricks Unified Analytics Platform, including
Notebooks, Jobs, Databricks Delta, and the Databricks
security model, enabling you to run your existing MLflow
jobs at scale in a secure, production-ready manner.
mlflow run git@github.com:mlflow/mlflow-example.git -P
alpha=0.5 -b databricks --backend-config json-cluster-spec.json
Step by step guide: https://medium.com/@liangjunjiang/install-
mlflow-on-databricks-55b11bc023fa
Extra: CI/CD on Databricks
1. https://thedataguy.blog/ci-cd-with-databricks-and-azure-devops/
2. https://databricks.com/blog/2017/10/30/continuous-integration-continuous-delivery-databricks.html
• Databricks doesn’t support
Enteprise Github (yet)
• Azure DevOps can do per file git
tracking
• Coding with Databricks needs
supports of 1. your code. 2.
libraries you used 3.
machine(cluster) you will run
CI/CD with Databricks - Development
• A two-step process for development
1. Use git manage your project (as usual) locally . Your project should have data,
your notebook, the library you will use, etc
2. Copy your project content to Databricks Workspace
3. Once you are done with your notebook coding, export from Databricks
Workspace to your local
4. Use git command to push to the remote repo.
CI/CD with Databricks – Unit Test or
Integration Test
• Solution 1:
• Rewrite your notebook code to Java/Scala Classes or Python Packages using
IDE
• Write unit test of those classes with IDE
• Remember to split the core logic and your library
• Import your library to databricks and let your notebooks to interact with it
• In the end, your code are two parts: libraries and notebooks
• Solution 2:
• Everything are in a package, use Spark run it
CI/CD with Databricks – Build
• Similar to the development process
• Assign dedicated production cluster
Alternative to this Presentation
• You Probably should just watch this Mlflow Spark
+ AI Summit Keynote video
• https://vimeo.com/274266886#t=33s

Weitere ähnliche Inhalte

Was ist angesagt?

Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with DatabricksLiangjun Jiang
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOpsDatabricks
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Daniel Toomey
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022Kai Wähner
 
MLOps with Azure DevOps
MLOps with Azure DevOpsMLOps with Azure DevOps
MLOps with Azure DevOpsMarco Parenzan
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Apache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentApache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentHostedbyConfluent
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Pythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowPythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowFernando Ortega Gallego
 
Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflakeSunil Gurav
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 

Was ist angesagt? (20)

Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
 
Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)Azure Databricks - An Introduction (by Kris Bock)
Azure Databricks - An Introduction (by Kris Bock)
 
KFServing and Feast
KFServing and FeastKFServing and Feast
KFServing and Feast
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
 
MLOps with Azure DevOps
MLOps with Azure DevOpsMLOps with Azure DevOps
MLOps with Azure DevOps
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Apache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentApache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, Confluent
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
Pythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowPythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlow
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Ml ops on AWS
Ml ops on AWSMl ops on AWS
Ml ops on AWS
 
Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflake
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Azure logic app
Azure logic appAzure logic app
Azure logic app
 

Ähnlich wie Mlflow with databricks

MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleDatabricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
 
"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"Databricks
 
MLFlow 1.0 Meetup
MLFlow 1.0 Meetup MLFlow 1.0 Meetup
MLFlow 1.0 Meetup Databricks
 
Use MLflow to manage and deploy Machine Learning model on Spark
Use MLflow to manage and deploy Machine Learning model on Spark Use MLflow to manage and deploy Machine Learning model on Spark
Use MLflow to manage and deploy Machine Learning model on Spark Herman Wu
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Sotrender
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionFabian Hadiji
 
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model ServingDAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model Servingamesar0
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsMárton Kodok
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle Databricks
 
Training and deploying ML models with Google Cloud Platform
Training and deploying ML models with Google Cloud PlatformTraining and deploying ML models with Google Cloud Platform
Training and deploying ML models with Google Cloud PlatformSotrender
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsGianmario Spacagna
 
MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021amesar0
 
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and KubeflowKostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and KubeflowIT Arena
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsMárton Kodok
 
Why is dev ops for machine learning so different
Why is dev ops for machine learning so differentWhy is dev ops for machine learning so different
Why is dev ops for machine learning so differentRyan Dawson
 
Do you know what your Drupal is doing Observe it! (DrupalCon Prague 2022)
Do you know what your Drupal is doing Observe it! (DrupalCon Prague 2022)Do you know what your Drupal is doing Observe it! (DrupalCon Prague 2022)
Do you know what your Drupal is doing Observe it! (DrupalCon Prague 2022)sparkfabrik
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on DatabricksDataScienceConferenc1
 

Ähnlich wie Mlflow with databricks (20)

MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"
 
MLFlow 1.0 Meetup
MLFlow 1.0 Meetup MLFlow 1.0 Meetup
MLFlow 1.0 Meetup
 
Use MLflow to manage and deploy Machine Learning model on Spark
Use MLflow to manage and deploy Machine Learning model on Spark Use MLflow to manage and deploy Machine Learning model on Spark
Use MLflow to manage and deploy Machine Learning model on Spark
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to production
 
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model ServingDAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
 
MLlib with MLFlow.pdf
MLlib with MLFlow.pdfMLlib with MLFlow.pdf
MLlib with MLFlow.pdf
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
Training and deploying ML models with Google Cloud Platform
Training and deploying ML models with Google Cloud PlatformTraining and deploying ML models with Google Cloud Platform
Training and deploying ML models with Google Cloud Platform
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning products
 
MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021MLflow Model Serving - DAIS 2021
MLflow Model Serving - DAIS 2021
 
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and KubeflowKostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
 
Why is dev ops for machine learning so different
Why is dev ops for machine learning so differentWhy is dev ops for machine learning so different
Why is dev ops for machine learning so different
 
Do you know what your Drupal is doing Observe it! (DrupalCon Prague 2022)
Do you know what your Drupal is doing Observe it! (DrupalCon Prague 2022)Do you know what your Drupal is doing Observe it! (DrupalCon Prague 2022)
Do you know what your Drupal is doing Observe it! (DrupalCon Prague 2022)
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
 

Kürzlich hochgeladen

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Mlflow with databricks

  • 1. Introduce MLflow with Databricks Liangjun Jiang, 10/03/20189 A tutorial to help you use Mlflow in your work
  • 2. Problems of Machine Learning Workflow DIFFICULT TO KEEP TRACK OF EXPERIMENTS DIFFICULT TO REPRODUCE CODE NO STANDARD WAY TO PACKAGE AND DEPLOY MODELS
  • 3. Outline MLFLOW OVERVIEW MLFLOW WITH DATABRICKS CICD WITH DATABRICKS
  • 4. Part I: Mlflow Overview Open source platform for the machine learning lifecycle https://github.com/mlflow/mlflow https://mlflow.org/
  • 6. Scalability and Big Data MLfow supports scaling in three dimensions 1. An individual MLflow run can execute on a distributed cluster, for example, using Apache Spark. You can launch runs on the distributed infrastructure of your choice and report results to a Tracking Server to compare them. MLflow includes a built-in API to launch runs on Databricks. 2. MLflow supports launching multiple runs in parallel with different parameters, for example, for hyperparameter tuning. You can simply use the Projects API to start multiple runs and the Tracking API to track them. 3. MLflow Projects can take input from, and write output to, distributed storage systems such as AWS S3 and DBFS. MLflow can automatically download such files locally for projects that can only run on local files, or give the project a distributed storage URI if it supports that. This means that you can write projects that build large datasets, such as featurizing a 100 TB file.
  • 7. MLflow Components - tracking • - is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results. You can use MLflow Tracking in any environment (for example, a standalone script or a notebook) to log results to local files or to a server, then compare multiple runs. Teams can also use it to compare results from different users.
  • 9. Multisteps Tracking • A typical flow • Step 1: download data from a url • Step 2: transform and load the downloaded dataset to another place • Step 3: use Spark (Databricks) to train your model • Step 4: share your model with application developers You can use MLFlow Tracking API to track information of each steps
  • 10. MLflow Components - Projects • are a standard format for packaging reusable data science code. Each project is simply a directory with code or a Git repository, and uses a descriptor file or simply convention to specify its dependencies and how to run the code. For example, projects can contain a conda.yaml file for specifying a Python Conda environment. When you use the MLflow Tracking API in a Project, MLflow automatically remembers the project version (for example, Git commit) and any parameters. You can easily run existing MLflow Projects from GitHub or your own Git repository, and chain them into multi-step workflows. mlflow run https://github.com/mlflow/mlflow- example.git -P alpha=0.4
  • 11. MLflow Components - models • offer a convention for packaging machine learning models in multiple flavors, and a variety of tools to help you deploy them. Each Model is saved as a directory containing arbitrary files and a descriptor file that lists several “flavors” the model can be used in. For example, a TensorFlow model can be loaded as a TensorFlow DAG, or as a Python function to apply to input data. MLflow provides tools to deploy many common model types to diverse platforms: for example, any model supporting the “Python function” flavor can be deployed to a Docker-based REST server, to cloud platforms such as Azure ML and AWS SageMaker, and as a user-defined function in Apache Spark for batch and streaming inference. If you output MLflow Models using the Tracking API, MLflow also automatically remembers which Project and run they came from.
  • 12. MLflow Models • Storage Format • MLflow defines several “standard” flavors that all of its built-in deployment tools support, such as a “Python function” flavor that describes how to run the model as a Python function. • However, libraries can also define and use other flavors. For example, MLflow’s mlflow.sklearn library allows loading models back as a scikit- learn Pipeline object for use in code that is aware of scikit-learn, or as a generic Python function for use in tools that just need to apply the model (for example, the mlflow sagemaker tool for deploying models to Amazon SageMaker).
  • 13. Built-in Model Flavors • Python Function • R Funtion • H2O • Keras • Mleap • PyTorch • Scikit-learn • Spark Mlib • TensorFlow • ONNX
  • 14. Saving & Serving Models • MLflow includes a generic MLmodel format for saving models from a variety of tools in diverse flavors. For example, many models can be served as Python functions, so an MLmodel file can declare how each model should be interpreted as a Python function in order to let various tools serve it. MLflow also includes tools for running such models locally and exporting them to Docker containers or commercial serving platforms. • for example, batch inference on Apache Spark and real-time serving through a REST API mlflow models serve -m runs:/<RUN_ID>/model
  • 15. • A local restful service demo Modeling wine preferences by data mining from physicochemical pro perties • mlflow models serve -m runs:/ea388ca349964193a17f2823480fc6bf/model -- port 5001 • curl -d '{"columns":["x"], "data":[[1], [-1]]}' -H 'Content-Type: application/json; format=pandas-split' -X POST localhost:5001/invocations Modeling Learning Model as a Service
  • 16. More about model service • Restful Service for Batch or low latency Inference with or w/o databricks (Spark) • Deploy with Azure ML • The mlflow.azureml module can package python_function models into Azure ML container images. • Example workflow using the Python API • https://www.mlflow.org/docs/latest/models.html#built-in-deployment-tools
  • 17. Summary: Use Cases • Individual Data Scientists can use MLflow Tracking to track experiments locally on their machine, organize code in projects for future reuse, and output models that production engineers can then deploy using MLflow’s deployment tools. MLflow Tracking just reads and writes files to the local file system by default, so there is no need to deploy a server. • Data Science Teams can deploy an MLflow Tracking server to log and compare results across multiple users working on the same problem. By setting up a convention for naming their parameters and metrics, they can try different algorithms to tackle the same problem and then run the same algorithms again on new data to compare models in the future. Moreover, anyone can download and run another model. • Large Organizations can share projects, models, and results using MLflow. Any team can run another team’s code using MLflow Projects, so organizations can package useful training and data preparation steps that other teams can use, or compare results from many teams on the same task. Moreover, engineering teams can easily move workflows from R&D to staging to production. • Production Engineers can deploy models from diverse ML libraries in the same way, store the models as files in a management system of their choice, and track which run a model came from. • Researchers and Open Source Developers can publish code to GitHub in the MLflow Project format, making it easy for anyone to run their code using the mlflow run github.com/... command. • ML Library Developers can output models in the MLflow Model format to have them automatically support deployment using MLflow’s built-in tools. In addition, deployment tool developers (for example, a cloud vendor building a serving platform) can automatically support a large variety of models.
  • 18. Part II: Working with Databricks • MLflow on Databricks integrates with the complete Databricks Unified Analytics Platform, including Notebooks, Jobs, Databricks Delta, and the Databricks security model, enabling you to run your existing MLflow jobs at scale in a secure, production-ready manner. mlflow run git@github.com:mlflow/mlflow-example.git -P alpha=0.5 -b databricks --backend-config json-cluster-spec.json Step by step guide: https://medium.com/@liangjunjiang/install- mlflow-on-databricks-55b11bc023fa
  • 19. Extra: CI/CD on Databricks 1. https://thedataguy.blog/ci-cd-with-databricks-and-azure-devops/ 2. https://databricks.com/blog/2017/10/30/continuous-integration-continuous-delivery-databricks.html • Databricks doesn’t support Enteprise Github (yet) • Azure DevOps can do per file git tracking • Coding with Databricks needs supports of 1. your code. 2. libraries you used 3. machine(cluster) you will run
  • 20. CI/CD with Databricks - Development • A two-step process for development 1. Use git manage your project (as usual) locally . Your project should have data, your notebook, the library you will use, etc 2. Copy your project content to Databricks Workspace 3. Once you are done with your notebook coding, export from Databricks Workspace to your local 4. Use git command to push to the remote repo.
  • 21. CI/CD with Databricks – Unit Test or Integration Test • Solution 1: • Rewrite your notebook code to Java/Scala Classes or Python Packages using IDE • Write unit test of those classes with IDE • Remember to split the core logic and your library • Import your library to databricks and let your notebooks to interact with it • In the end, your code are two parts: libraries and notebooks • Solution 2: • Everything are in a package, use Spark run it
  • 22. CI/CD with Databricks – Build • Similar to the development process • Assign dedicated production cluster
  • 23. Alternative to this Presentation • You Probably should just watch this Mlflow Spark + AI Summit Keynote video • https://vimeo.com/274266886#t=33s