How do you manage the complexity and reproducibility of machine learning projects? What requirements and tools does it take, and how do you apply them in your company and projects? Let's start with data and model version control, with a review of Data Version Control (DVC), MLflow and other tools.
Start with version control and experiments management in ML: reproducible experiments
1. Start with version control and experiments management in ML: reproducible experiments
Data Fest3, Minsk, 2019
Mikhail Rozhkov
2. Workflow of ML project and artefacts
[Diagram: the workflow of an ML project in four phases]
1. Analyze & Plan: Problem Statement, MVP design
2. Prototype (solution development): Get data, Prepare data, Train model, Evaluate model
3. Productionize: Test & Integrate, Serve / Predict
4. Monitor & Maintain: Monitor
Inspired by Uber’s workflow of a machine learning project diagram. Scaling Machine Learning at Uber with Michelangelo https://eng.uber.com/scaling-michelangelo/
3. Experiment: pipelines, configs and artifacts
[Diagram: anatomy of an experiment. ETL tasks produce the train and test datasets; the train step combines the algorithm, data and hyperparameters into a model; the evaluate step scores the model with an evaluation measure. Everything hangs off the experiment config, and the legend distinguishes the four ingredients: artifacts, pipelines, code and configs.]
4. ML reproducibility is a dimension of quality
What is Reproducibility? Using the original methods applied to the original data to produce the original results [Gardner].
Why should you care?
● Trust
● Consistent Results
● Versioned History
● Team Performance
● Painless Production
Josh Gardner, Yuming Yang, Ryan S. Baker, Christopher Brooks. Enabling End-To-End Machine Learning Replicability: A Case Study in Educational Data Mining.
6. ML Reproducibility
1. Automated pipelines
2. Control run params
3. Control execution DAG
4. Code version control
5. Artifacts version control (models, datasets, etc.)
6. Use shared/cloud storage for artifacts
7. Environment dependencies control
7. How to start?
[Chart: across steps 1 to 4, manual work falls from 100% to roughly 10% while automated work rises to roughly 90%, and the time left for the data science task itself grows accordingly.]
8. Start with artifacts versioning!
[Diagram: the experiment anatomy from slide 3 again (ETL tasks, train/test datasets, train and evaluate steps, model, evaluation measure, experiment config), with the artifacts as the first thing to put under version control. A layout sketch follows below.]
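As a rough layout sketch (directory names are illustrative, not from the talk): code and configs are small and diff-friendly, so they live in git; datasets and models are large binaries, so they go under DVC, which keeps only small pointer files in git.

    ├── src/      # pipeline code: versioned with git
    ├── config/   # experiment configs: versioned with git
    ├── data/     # datasets: tracked by DVC (pointer .dvc files in git)
    └── models/   # trained models: tracked by DVC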
9. Use Case: dogs and cats classifier
● Project
○ Classify dogs and cats by photo
○ Data
■ objects: cats, dogs
■ dogs: 12500
■ cats: 12500
○ Metrics: accuracy, ROC-AUC
● Team
○ > 2 members
○ different machines/servers
○ different OS
○ git-flow dev process
○ run on one machine
11. ML Reproducibility checklist
1. Automated pipelines
2. Control run params
3. Control execution DAG
4. Code version control
5. Artifacts version control (models, datasets, etc.)
6. Use shared/cloud storage for artifacts
7. Environment dependencies control
8. Experiment results tracking
12. Step 2: build pipelines
● move common code into .py modules
● build pipelines (a sketch follows below)
● everything in Docker
● run experiments in terminal or Jupyter Notebook
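A minimal sketch of what these pipelines can look like at this step (script names and the --config flag are illustrative, not from the talk): each stage is a .py module run from the terminal, so a whole experiment is one repeatable command sequence.

    # hypothetical stage scripts sharing one config file
    python src/prepare.py  --config config/pipeline.yaml
    python src/split.py    --config config/pipeline.yaml
    python src/train.py    --config config/pipeline.yaml
    python src/evaluate.py --config config/pipeline.yaml

The same sequence runs identically inside Docker or from a Jupyter Notebook cell prefixed with !.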
14. ML Reproducibility checklist
1. Automated pipelines
2. Control run params
3. Control execution DAG
4. Code version control
5. Artifacts version control (models, datasets, etc.)
6. Use shared/cloud storage for artifacts
7. Environment dependencies control
8. Experiment results tracking
15. Step 3: add version control for artifacts
● add models/data/configs under DVC control (command sketch below)
● same code in .py modules
● same pipelines
● everything in Docker
● run experiments in terminal or Jupyter Notebook
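A sketch of putting artifacts under DVC control (paths and the remote URL are illustrative; the commands themselves are standard DVC CLI):

    # initialize DVC inside the existing git repo
    dvc init
    # track large artifacts; DVC writes small .dvc pointer files
    dvc add data/raw models/model.pkl
    git add data/raw.dvc models/model.pkl.dvc data/.gitignore models/.gitignore
    git commit -m "track data and model with DVC"
    # shared/cloud storage so every machine pulls identical artifacts
    dvc remote add -d storage s3://my-bucket/dvc-storage
    dvc push

Teammates on other machines then run git pull and dvc pull to get exactly the same data and model.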
16. ML Reproducibility checklist
1. Automated pipelines
2. Control run params
3. Control execution DAG
4. Code version control
5. Artifacts version control (models, datasets, etc.)
6. Use shared/cloud storage for artifacts
7. Environment dependencies control
8. Experiment results tracking
17. Step 4: add execution DAG control
● add pipeline dependencies under DVC control (sketch below)
● models/data/configs under DVC control
● same code in .py modules
● same pipelines
● everything in Docker
● run experiments in terminal or Jupyter Notebook
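In the DVC of that era, the execution DAG was declared with dvc run, which records a stage's command, dependencies and outputs in a .dvc stage file; a sketch (paths are illustrative):

    # -d declares dependencies, -o declares cached outputs
    dvc run -f train.dvc \
        -d src/train.py -d data/train \
        -o models/model.pkl \
        python src/train.py --config config/train.yaml
    # re-run only the stages whose dependencies changed
    dvc repro train.dvc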
18. Setup pipelines
[Diagram: the experiment pipeline as a DAG. The experiment config carries prepare, split, train and eval configs. The split stage takes data and its config and emits an index; the train stage takes data, its config and the index and emits the model and a train report; the evaluate stage takes data, its config and the model and emits a test report. A command-level sketch of the same DAG follows below.]
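Wiring that DAG stage by stage with dvc run might look like this (stage names, paths and report files are illustrative):

    dvc run -f split.dvc -d src/split.py -d data/raw \
        -o data/index \
        python src/split.py --config config/split.yaml
    dvc run -f train.dvc -d src/train.py -d data/raw -d data/index \
        -o models/model.pkl -M reports/train.json \
        python src/train.py --config config/train.yaml
    dvc run -f evaluate.dvc -d src/evaluate.py -d models/model.pkl \
        -M reports/test.json \
        python src/evaluate.py --config config/eval.yaml

dvc repro evaluate.dvc then walks the whole chain and re-executes only what changed.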
19. ML Reproducibility checklist
1. Automated pipelines
2. Control run params
3. Control execution DAG
4. Code version control
5. Artifacts version control (models, datasets, etc.)
6. Use shared/cloud storage for artifacts
7. Environment dependencies control
8. Experiment results tracking
20. Step 5: add experiments control
● add an experiments benchmark (DVC, mlflow)
● pipeline dependencies under DVC control
● models/data/configs under DVC control
● same code in .py modules
● same pipelines
● everything in Docker
● run experiments in terminal or Jupyter Notebook
21. Metrics tracking in mlflow UI

    from mlflow import log_artifact, log_metric, log_param

    # log the experiment config file, a run param and the metrics;
    # mlflow starts a run automatically if none is active
    log_artifact(args.config)
    log_param('batch_size', config['batch_size'])
    log_metric('f1', f1)
    log_metric('roc_auc', roc_auc)
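On the DVC side, metric files declared with -M (as in the pipeline sketch above) give a simple experiments benchmark straight from the terminal:

    # show metric values for the current workspace
    dvc metrics show reports/test.json
    # compare metrics across all branches/tags, one experiment per branch
    dvc metrics show -a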
23. ML Reproducibility checklist
1. Automated pipelines
2. Control run params
3. Control execution DAG
4. Code version control
5. Artifacts version control (models, datasets, etc.)
6. Use shared/cloud storage for artifacts
7. Environment dependencies control
8. Experiment results tracking
24. Conclusions
1. Pipelines are not difficult.
2. Start where you detect a “copy-paste” pattern.
3. Artifacts version control is a MUST.
4. Discipline in the team is important.
5. The benefits grow with project complexity and team size.