
MLFlow: Platform for Complete Machine Learning Lifecycle

Description
Data Science and ML development bring many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work.

MLflow addresses some of these challenges during an ML model development cycle.

Abstract
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure.

In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

The session closes with a short demo walking through a complete ML model lifecycle. You will walk away with:
  • MLflow concepts and abstractions for models, experiments, and projects
  • How to get started with MLflow
  • Using the tracking Python APIs during model training
  • Using the MLflow UI to visually compare and contrast experiment runs with different tuning parameters, and to evaluate metrics

Published in: Software

MLFlow: Platform for Complete Machine Learning Lifecycle

  1. MLflow: Platform for Complete Machine Learning Lifecycle. Jules S. Damji, PyData Miami, FL, Jan 10, 2019. @2twitme
  2. Apache Spark Developer & Community Advocate @ Databricks. Developer Advocate @ Hortonworks. Software engineering @ Sun Microsystems, Netscape, @Home, Excite@Home, VeriSign, Scalix, Centrify, LoudCloud/Opsware, ProQuest. Program Chair, Spark + AI Summit. https://www.linkedin.com/in/dmatrix @2twitme
  3. Outline: Overview of ML development challenges; How MLflow tackles these; MLflow components; Developer experience & demo; Ongoing roadmap; Q & A
  4. Machine Learning Development is Complex
  5. ML Lifecycle (diagram): Raw Data → Data Prep (Delta) → Training (tuning μ, λ, θ; scale) → Deploy (scale), with Model Exchange and Governance spanning the lifecycle.
  6. Example: “I build 100s of models/day to lift revenue, using any library: MLlib, PyTorch, R, etc. There’s no easy way to see what data went into a model from a week ago, tune it, and rebuild it.” -- Chief scientist at an ad tech firm
  7. Custom ML Platforms: Facebook FBLearner, Uber Michelangelo, Google TFX. + They standardize the data prep / training / deploy loop: if you work with the platform, you get these benefits. – Limited to a few algorithms or frameworks. – Tied to one company’s infrastructure. – Out of luck if you leave the company. Can we provide similar benefits in an open manner?
  8. Introducing MLflow, an open machine learning platform. • Works with any ML library & language • Runs the same way anywhere (e.g., any cloud) • Designed to be useful for 1-person or 1000+ person orgs • Simple, easy-to-use developer experience; easy to get started
  9. MLflow Design Philosophy. 1. “API-first”, open platform. • Allow submitting runs, models, etc. from any library & language. • Example: a “model” can just be a lambda function that MLflow can then deploy in many places (Docker, Azure ML, Spark UDF, …). Key enabler: built around REST APIs and a CLI.
  10. MLflow Design Philosophy. 2. Modular design. • Let people use different components individually (e.g., use MLflow’s project format but not its deployment tools). • Not monolithic: distinct, selectively usable pieces. Key enabler: distinct components (Tracking / Projects / Models).
  11. MLflow Components. Tracking: record and query experiments (code, configs, results, etc.). Projects: packaging format for reproducible runs on any platform. Models: general model format that supports diverse deployment tools.
  12. Model Development without MLflow:

     data = load_text(file)
     ngrams = extract_ngrams(data, N=n)
     model = train_model(ngrams, learning_rate=lr)
     score = compute_accuracy(model)
     print("For n=%d, lr=%f: accuracy=%f" % (n, lr, score))
     pickle.dump(model, open("model.pkl", "wb"))

     For n=2, lr=0.1: accuracy=0.71
     For n=2, lr=0.2: accuracy=0.79
     For n=2, lr=0.5: accuracy=0.83
     For n=2, lr=0.9: accuracy=0.79
     For n=3, lr=0.1: accuracy=0.83
     For n=3, lr=0.2: accuracy=0.82
     For n=4, lr=0.5: accuracy=0.75
     ...

     What if I expand the input data? What if I tune this other parameter? What if I upgrade my ML library? What version of my code was this result from?
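The ad-hoc bookkeeping the slide describes can be sketched end to end with the standard library alone. Everything here is a hypothetical stand-in (`train_model` just fabricates an accuracy); the point is that the "tracking system" is a hand-rolled JSON log that records none of the context needed to reproduce a result:

```python
import itertools
import json
import random


def train_model(n, lr):
    """Hypothetical stand-in for real training: returns a fake accuracy."""
    random.seed(hash((n, lr)) % 10_000)
    return round(random.uniform(0.7, 0.85), 2)


# Manual experiment loop: every run's inputs and outputs must be
# written down by hand, or the result cannot be reproduced later.
log = []
for n, lr in itertools.product([2, 3], [0.1, 0.2, 0.5]):
    score = train_model(n, lr)
    log.append({"params": {"n": n, "lr": lr}, "metrics": {"accuracy": score}})

# The "tracking system" is just a JSON dump; nothing records the code
# version, the input data file, or the library versions that produced it.
print(json.dumps(log[0]))
```

Answering "what version of my code was this result from?" is impossible from this log alone, which is exactly the gap MLflow Tracking fills.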
  13. Key Concepts in Tracking. Parameters: key-value inputs to your code. Metrics: numeric values (can update over time). Tags and notes: information about a run. Artifacts: files, data, and models. Source: what code ran? Version: which version of the code?
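As a toy illustration only (this is not MLflow's actual data model or API), the six concepts map naturally onto a small record type:

```python
from dataclasses import dataclass, field


# Illustrative stand-in mirroring MLflow's tracking concepts,
# not the real mlflow client or its storage schema.
@dataclass
class Run:
    params: dict = field(default_factory=dict)     # key-value inputs
    metrics: dict = field(default_factory=dict)    # name -> history of values
    tags: dict = field(default_factory=dict)       # notes about the run
    artifacts: list = field(default_factory=list)  # file paths: data, models, plots
    source: str = ""                               # what code ran
    version: str = ""                              # which version of the code

    def log_metric(self, name, value):
        # Metrics can update over time, so each name keeps a history.
        self.metrics.setdefault(name, []).append(value)


run = Run(source="train.py", version="abc123")
run.params["lr"] = 0.1
run.log_metric("loss", 0.9)
run.log_metric("loss", 0.4)
```

The one structural choice worth noting: metrics are lists, not scalars, because a training loop logs the same metric repeatedly and the UI plots that history.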
  14. MLflow Tracking API: Simple! Tracking records and queries experiments: code, configs, results, etc.

     import mlflow

     # log model's tuning parameters
     with mlflow.start_run():
         mlflow.log_param("layers", layers)
         mlflow.log_param("alpha", alpha)

         # log model's metrics
         mlflow.log_metric("mse", model.mse())
         mlflow.log_artifact("plot", model.plot(test_df))
         mlflow.tensorflow.log_model(model)
  15. Model Development with MLflow is Simple!

     data = load_text(file)
     ngrams = extract_ngrams(data, N=n)
     model = train_model(ngrams, learning_rate=lr)
     score = compute_accuracy(model)
     mlflow.log_param("data_file", file)
     mlflow.log_param("n", n)
     mlflow.log_param("learning_rate", lr)
     mlflow.log_metric("score", score)
     mlflow.sklearn.log_model(model)

     Track parameters, metrics, output files & code version. Search using the UI or API:

     $ mlflow ui
  16. MLflow Tracking architecture (diagram): notebooks, local apps, and cloud jobs log to a Tracking Server (UI + API) via the Python, Java, R, or REST API.

     $ export MLFLOW_TRACKING_URI=<URI>   (e.g., http://mlflow.tracking.com/mlflow)
     mlflow.set_tracking_uri(URI)
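A client typically resolves the tracking destination from the environment first and otherwise falls back to a local file store. The helper below is our own sketch of that lookup, not MLflow code, and the exact default string is an assumption (MLflow's local default is a `./mlruns` directory):

```python
import os


def resolve_tracking_uri(env=os.environ):
    # Sketch of client-side resolution (helper name is ours, not MLflow's):
    # MLFLOW_TRACKING_URI, when set, points runs at a remote tracking server;
    # otherwise fall back to a local ./mlruns file store.
    return env.get("MLFLOW_TRACKING_URI", "file:./mlruns")


# With no environment override, runs land in the local file store.
local = resolve_tracking_uri({})

# With the variable exported (as on the slide), runs go to the server.
remote = resolve_tracking_uri(
    {"MLFLOW_TRACKING_URI": "http://mlflow.tracking.com/mlflow"}
)
```

This is why the same training script can track locally during development and remotely in production without a code change: only the environment differs.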
  17. MLflow Projects: Motivation. A diverse set of tools and a diverse set of environments. Result: it is difficult to productionize and share work. Projects are a packaging format for reproducible runs on any platform.
  18. MLflow Projects (diagram): a Project Spec (code, data, config) drives both local execution and remote execution.
  19. Example MLflow Project

     my_project/
     ├── MLproject
     ├── conda.yaml
     ├── main.py
     └── model.py

     # MLproject
     conda_env: conda.yaml
     entry_points:
       main:
         parameters:
           training_data: path
           lambda: {type: float, default: 0.1}
         command: python main.py {training_data} {lambda}

     $ mlflow run git://<my_project> -P lambda=0.2
     mlflow.run("git://<my_project>", ...)
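Spelled out in full, a project like the one on this slide ships two small files. The `MLproject` fields below are taken from the slide; the `conda.yaml` contents are illustrative (the slide does not show them, so the dependency list is our assumption):

```yaml
# MLproject (fields as shown on the slide)
name: my_project
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_data: path
      lambda: {type: float, default: 0.1}
    command: "python main.py {training_data} {lambda}"
```

```yaml
# conda.yaml (illustrative contents; not shown on the slide)
name: my_project
dependencies:
  - python=3.6
  - scikit-learn
  - pip:
      - mlflow
```

Because the environment is declared alongside the entry point, `mlflow run git://<my_project> -P lambda=0.2` can recreate the conda environment and run the exact same command on any machine, which is the reproducibility claim Projects make.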
  20. MLflow Models: a standard for ML models (diagram). ML frameworks write a model in a standard Model Format with flavors (ONNX flavor, Python flavor, …); the model can then be consumed by inference code, batch & stream scoring, and serving tools.
  21. Example MLflow Model

     my_model/
     ├── MLmodel
     └── estimator/
         ├── saved_model.pb
         └── variables/

     # MLmodel
     run_id: 769915006efd4c4bbd662461
     time_created: 2018-06-28T12:34
     flavors:
       tensorflow:
         saved_model_dir: estimator
         signature_def_key: predict
       python_function:
         loader_module: mlflow.tensorflow

     The tensorflow flavor is usable by tools that understand the TensorFlow model format; the python_function flavor is usable by any tool that can run Python (Docker, Spark, etc.!).

     >>> mlflow.tensorflow.log_model(...)
  22. Developer Experience & Demo. Binary classification of IMDB movie reviews using a Keras neural network model. https://dbricks.co/keras_imdb https://github.com/dmatrix/jsd-mlflow-examples/
  23. MLflow Baseline & Experiment Models

     Model         Units  Epochs  Loss Function        Hidden Layers
     Base          16     20      binary_crossentropy  1
     Experiment-1  32     30      binary_crossentropy  3
     Experiment-2  32     20      mse                  3
  24. MLflow Baseline & Experiment Models (UI screenshot comparing runs)
  25. Ongoing MLflow Roadmap. • TensorFlow, Keras, PyTorch, H2O, MLeap, MLlib integrations ✔ • Java and R MLflow client language APIs ✔ • Multi-step workflows ✔ • Hyperparameter tuning ✔ • Integration with Databricks Tracking Server ✔ • Support for a data store (e.g., MySQL) ✔ • Stabilize MLflow APIs for 1.0 • Model metadata, management & registry • Hosted MLflow. Just released, v0.8.1: • Faster & improved UI • Extended Python model as Spark UDF • Persist model dependencies as a Conda environment
  26. Learning More About MLflow. • pip install mlflow to get started • Find docs & examples at mlflow.org • https://github.com/mlflow/mlflow • tinyurl.com/mlflow-slack • https://dbricks.co/mlflow-survey2019
  27. What Did We Talk About? Workflow tools can greatly simplify the ML lifecycle. • Simplify lifecycle development • Lightweight, open platform that integrates easily • Available APIs: Python, Java & R • Easy to install and use • Develop locally and track locally or remotely • Deploy locally, in the cloud, or on premise • Visualize experiments
  28. 15% Discount Code: PyDataMiami
  29. Thank You! jules@databricks.com @2twitme https://www.linkedin.com/in/dmatrix/
