SlideShare ist ein Scribd-Unternehmen logo
1 von 14
Andre Mesarovic
Sr. Specialist Solutions Architect
23 November 2022
Object Relationships
Databricks
● Databricks MLflow objects (runs, experiments, registered models and their
versions, notebooks) form a complex web of relationships.
● Objects live in different places: workspace objects, DBFS (cloud) and MySQL.
○ A run’s metadata lives in MySQL, its artifacts in cloud and its notebook in the workspace and/or git.
● Experiments have zero or more runs.
● Registered models have 0 or more versions that point to a run’s MLflow model.
● Code that generated a run’s MLflow model:
○ MLflow runs have pointers to a notebook revision that generated the model.
○ Runs will/should have pointers to the git version of a notebook that generated the model.
Overview
● Model is an overloaded term with three meanings:
○ Native model artifact - this is the lowest level and is simply the native flavor’s serialized format. For
sklearn it’s a pickle file, for Keras it’s a directory with TensorFlow’s native SaveModel format files.
○ MLflow model - a wrapper around the native model artifact with metadata in the MLmodel file and
environment information in conda.yaml and requirements.txt files.
○ Registered model - a bucket of model versions. A model version contains one MLflow model that is
cached in the model repository. A version has the following links (expressed as tags):
■ run_id - points to the run that generated the version’s model.
■ source - points to the path of MLflow model in the run that corresponds to the version’s model.
■ workspace_uri - currently missing. Needed if using shared model registry. ML-19472.
Model terminology
Model relationships
● Runs
○ Contains one or more MLflow models
● Experiments
○ Notebook experiments
○ Workspace experiments
● Registered models
○ A registered model contains versions
○ A version points to one run’s MLflow model
○ Native model artifacts - the actual bits that execute predictions that are part of the MLflow model
● Notebooks
Databricks MLflow object relationships
Databricks MLflow objects relationships
● Diagram uses the UML modeling language.
○ *: indicates a many relationship
○ 1: indicates a required one relationship.
○ 0..1: indicates an optional one relationship.
● This is a logical diagram. Not all nuances are captured for simplification.
● The diagram represents a notebook experiment.
● A workspace experiment is not represented in the diagram.
Diagram legend
● A registered model is a bucket for model versions.
● A version has one MLflow model which is linked to the run that generated it.
● The production and staging stage have one "latest" version.
● Registered model versions are cached in the model registry.
● This is a clone of the run's MLflow model that the version points to.
● If source run is in a different workspace we have a lineage reachability problem.
See ML-19472 - Add workspace URI field in ModelVersion for a registered
model to make run reachable.
Registered models
● An experiment has zero or more runs.
● Two types of experiments:
○ Notebook experiment
■ Relationship of experiment to notebook is one-to-one.
■ Workspace path of the experiment is the same as its notebook.
○ Workspace experiment
■ Relationship of experiment to notebook is one-to-many.
■ Explicitly specify the experiment path with set_experiment method.
■ Different notebooks can create runs in the same experiment.
Experiments
● A run belongs to only one experiment.
● A run is linked to one notebook revision. MLflow notebook tags:
○ mlflow.databricks.notebookRevisionID
○ mlflow.databricks.notebookID
○ mlflow.databricks.notebookPath
● Optionally a run’s notebook can be linked to a git reference.
○ See discussion on Notebook below for details.
● A run can have one or more MLflow models (flavors) such as Sklearn and ONNX.
● Every run has a default Pyfunc flavor which is wrapper around the native model.
Runs
MLflow Run Details
● An MLflow Run has three basic components
○ Metadata (params, metrics, tags) residing in a MySQL database.
○ MLflow model artifact which lives in DBFS (cloud). Note you can also have arbitrary customer
artifacts.
○ Link to code:
■ For Databricks, the run points to either:
● Workspace notebook revision
● Repos notebook a pointer to git.
■ For open source the link points to git.
MLflow Run Details Legend
● A notebook has many revisions.
● Optionally, a notebook revision can be checked into git with Databricks Repos.
● Need to capture git reference analogous to the MLflow open source tags:
○ mlflow.source.git.commit
○ mlflow.source.git.repoURL
○ mlflow.gitRepoURL
● See ML-19473 - Add git reference tags to Databricks run if its notebook is synced with
Repos
● Two sources of truth for a notebook snapshot that can be confusing:
○ Databricks notebook revision
○ Git version
Notebooks
journey!
Happy

Weitere ähnliche Inhalte

Ähnlich wie Databricks MLflow Object Relationships

Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflowDatabricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Wattpad - Spark Stories
Wattpad - Spark StoriesWattpad - Spark Stories
Wattpad - Spark StoriesRylan Halteman
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowDatabricks
 
Single Responsibility Principle
Single Responsibility PrincipleSingle Responsibility Principle
Single Responsibility PrincipleBADR
 
Design and Implementation of the Security Graph Language
Design and Implementation of the Security Graph LanguageDesign and Implementation of the Security Graph Language
Design and Implementation of the Security Graph LanguageAsankhaya Sharma
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBKnoldus Inc.
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Aaron Saray
 
Pythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowPythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowFernando Ortega Gallego
 
MLFlow 1.0 Meetup
MLFlow 1.0 Meetup MLFlow 1.0 Meetup
MLFlow 1.0 Meetup Databricks
 
Stefan Richter - Writing simple, readable and robust code: Examples in Java, ...
Stefan Richter - Writing simple, readable and robust code: Examples in Java, ...Stefan Richter - Writing simple, readable and robust code: Examples in Java, ...
Stefan Richter - Writing simple, readable and robust code: Examples in Java, ...AboutYouGmbH
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Collaborative modeling with sirius
Collaborative modeling with siriusCollaborative modeling with sirius
Collaborative modeling with siriuspcdavid_
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark MLdatamantra
 
New c sharp3_features_(linq)_part_iv
New c sharp3_features_(linq)_part_ivNew c sharp3_features_(linq)_part_iv
New c sharp3_features_(linq)_part_ivNico Ludwig
 
Open source data_warehousing_overview
Open source data_warehousing_overviewOpen source data_warehousing_overview
Open source data_warehousing_overviewAlex Meadows
 
Design Patterns In Scala
Design Patterns In ScalaDesign Patterns In Scala
Design Patterns In ScalaKnoldus Inc.
 
Complete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesComplete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesnicolascombin1
 

Ähnlich wie Databricks MLflow Object Relationships (20)

Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Wattpad - Spark Stories
Wattpad - Spark StoriesWattpad - Spark Stories
Wattpad - Spark Stories
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
 
Single Responsibility Principle
Single Responsibility PrincipleSingle Responsibility Principle
Single Responsibility Principle
 
Design and Implementation of the Security Graph Language
Design and Implementation of the Security Graph LanguageDesign and Implementation of the Security Graph Language
Design and Implementation of the Security Graph Language
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
 
Pythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowPythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlow
 
MLFlow 1.0 Meetup
MLFlow 1.0 Meetup MLFlow 1.0 Meetup
MLFlow 1.0 Meetup
 
Mongo db
Mongo dbMongo db
Mongo db
 
Stefan Richter - Writing simple, readable and robust code: Examples in Java, ...
Stefan Richter - Writing simple, readable and robust code: Examples in Java, ...Stefan Richter - Writing simple, readable and robust code: Examples in Java, ...
Stefan Richter - Writing simple, readable and robust code: Examples in Java, ...
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Collaborative modeling with sirius
Collaborative modeling with siriusCollaborative modeling with sirius
Collaborative modeling with sirius
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
 
New c sharp3_features_(linq)_part_iv
New c sharp3_features_(linq)_part_ivNew c sharp3_features_(linq)_part_iv
New c sharp3_features_(linq)_part_iv
 
Open source data_warehousing_overview
Open source data_warehousing_overviewOpen source data_warehousing_overview
Open source data_warehousing_overview
 
Design Patterns In Scala
Design Patterns In ScalaDesign Patterns In Scala
Design Patterns In Scala
 
Complete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesComplete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examples
 

Kürzlich hochgeladen

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 

Kürzlich hochgeladen (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Databricks MLflow Object Relationships

  • 1. Andre Mesarovic Sr. Specialist Solutions Architect 23 November 2022 Object Relationships Databricks
  • 2. ● Databricks MLflow objects (runs, experiments, registered models and their versions, notebooks) form a complex web of relationships. ● Objects live in different places: workspace objects, DBFS (cloud) and MySQL. ○ A run’s metadata lives in MySQL, its artifacts in cloud and its notebook in the workspace and/or git. ● Experiments have zero or more runs. ● Registered models have 0 or more versions that point to a run’s MLflow model. ● Code that generated a run’s MLflow model: ○ MLflow runs have pointers to a notebook revision that generated the model. ○ Runs will/should have pointers to the git version of a notebook that generated the model. Overview
  • 3. ● Model is an overloaded term with three meanings: ○ Native model artifact - this is the lowest level and is simply the native flavor’s serialized format. For sklearn it’s a pickle file, for Keras it’s a directory with TensorFlow’s native SaveModel format files. ○ MLflow model - a wrapper around the native model artifact with metadata in the MLmodel file and environment information in conda.yaml and requirements.txt files. ○ Registered model - a bucket of model versions. A model version contains one MLflow model that is cached in the model repository. A version has the following links (expressed as tags): ■ run_id - points to the run that generated the version’s model. ■ source - points to the path of MLflow model in the run that corresponds to the version’s model. ■ workspace_uri - currently missing. Needed if using shared model registry. ML-19472. Model terminology
  • 5. ● Runs ○ Contains one or more MLflow models ● Experiments ○ Notebook experiments ○ Workspace experiments ● Registered models ○ A registered model contains versions ○ A version points to one run’s MLflow model ○ Native model artifacts - the actual bits that execute predictions that are part of the MLflow model ● Notebooks Databricks MLflow object relationships
  • 7. ● Diagram uses the UML modeling language. ○ *: indicates a many relationship ○ 1: indicates a required one relationship. ○ 0..1: indicates an optional one relationship. ● This is a logical diagram. Not all nuances are captured for simplification. ● The diagram represents a notebook experiment. ● A workspace experiment is not represented in the diagram. Diagram legend
  • 8. ● A registered model is a bucket for model versions. ● A version has one MLflow model which is linked to the run that generated it. ● The production and staging stage have one "latest" version. ● Registered model versions are cached in the model registry. ● This is a clone of the run's MLflow model that the version points to. ● If source run is in a different workspace we have a lineage reachability problem. See ML-19472 - Add workspace URI field in ModelVersion for a registered model to make run reachable. Registered models
  • 9. ● An experiment has zero or more runs. ● Two types of experiments: ○ Notebook experiment ■ Relationship of experiment to notebook is one-to-one. ■ Workspace path of the experiment is the same as its notebook. ○ Workspace experiment ■ Relationship of experiment to notebook is one-to-many. ■ Explicitly specify the experiment path with set_experiment method. ■ Different notebooks can create runs in the same experiment. Experiments
  • 10. ● A run belongs to only one experiment. ● A run is linked to one notebook revision. MLflow notebook tags: ○ mlflow.databricks.notebookRevisionID ○ mlflow.databricks.notebookID ○ mlflow.databricks.notebookPath ● Optionally a run’s notebook can be linked to a git reference. ○ See discussion on Notebook below for details. ● A run can have one or more MLflow models (flavors) such as Sklearn and ONNX. ● Every run has a default Pyfunc flavor which is wrapper around the native model. Runs
  • 12. ● An MLflow Run has three basic components ○ Metadata (params, metrics, tags) residing in a MySQL database. ○ MLflow model artifact which lives in DBFS (cloud). Note you can also have arbitrary customer artifacts. ○ Link to code: ■ For Databricks, the run points to either: ● Workspace notebook revision ● Repos notebook a pointer to git. ■ For open source the link points to git. MLflow Run Details Legend
  • 13. ● A notebook has many revisions. ● Optionally, a notebook revision can be checked into git with Databricks Repos. ● Need to capture git reference analogous to the MLflow open source tags: ○ mlflow.source.git.commit ○ mlflow.source.git.repoURL ○ mlflow.gitRepoURL ● See ML-19473 - Add git reference tags to Databricks run if its notebook is synced with Repos ● Two sources of truth for a notebook snapshot that can be confusing: ○ Databricks notebook revision ○ Git version Notebooks

Hinweis der Redaktion

  1. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  2. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  3. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  4. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  5. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  6. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  7. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  8. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  9. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  10. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  11. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?
  12. What did they do with us? what are they trying to do? recommendation? content curation? how does that work? How come Delta and Spark and those things can help with that thing (recommendation, or whatever they do)?