Reproducibility and Versioning
of ML Systems
ŠPELA POKLUKAR | MACHINE LEARNING CONSULTANT
DSC 2022 // © COPYRIGHT 2022 ENDAVA 2
"Špela is an experienced machine learning consultant who has worked mostly
in SW engineering services and the energy sector. She has successfully led
projects in various domains such as manufacturing, finance, robotics,
energy, and IT services. She is currently employed as a data discipline lead
at Endava Slovenia and is an active member of innovation and gender
balance communities. Her background is in mathematics, philosophy, and
theology."
Spela.poklukar@endava.com
+386 40 545 898
Špela Poklukar
MACHINE LEARNING CONSULTANT
Agenda
1. MOTIVATION
2. MODULARITY
3. VERSIONING
4. DOCUMENTATION
1
Motivation
WHY WE NEED REPRODUCIBILITY ANYWAY
Reproducibility:
Two Sides of the Same Coin
REPRODUCIBILITY OF
ML Research
Results
REPRODUCIBILITY OF
ML Systems
Reproducibility and Versioning of ML Systems - 1. Motivation
Reproducibility of ML research results means being able to recreate someone else's ML workflow and reach the same or similar conclusions as the original work.
Reproducibility of an ML system means being able to repeatedly run an ML workflow and reach the same or similar results on each run.
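The second definition can be made concrete with a small sketch. The toy workflow below is an assumption for illustration only (the function `run_workflow` and its data are hypothetical, not from the talk): it shuffles and splits a dataset, "trains" a stand-in model, and shows that fixing the random seed makes repeated runs agree.

```python
import random


def run_workflow(seed: int) -> dict:
    """Toy ML workflow: shuffle, split, and 'train' (here: average the training part)."""
    rng = random.Random(seed)            # local generator, no hidden global state
    data = list(range(100))
    rng.shuffle(data)                    # dataset shuffling
    train, test = data[:80], data[80:]   # dataset split
    model = sum(train) / len(train)      # stand-in for model training
    error = sum(abs(x - model) for x in test) / len(test)
    return {"train_ids": train, "model": model, "error": error}


first = run_workflow(seed=42)
second = run_workflow(seed=42)
print(first == second)  # True: same seed, same split, same result
```

Running the workflow with a different seed would generally give a different split and a different error, which is why the seed belongs among the tracked parameters.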
Why Reproducibility?
EVIDENCE OF SIGNIFICANCE
To ensure the obtained results are accurate and significant.
ABLATION
To ensure that a claimed gain really comes from the intended change and is not random.
COST ESTIMATION
To inform potential consumers about computational complexity.
SCALING
To be able to scale the machine learning system by replicating its parts.
INFERENCE
To ensure the selected model is the same one used for inference.
FAULT TOLERANCE
To reduce the risk of errors by consistently obtaining the same results.
MODEL ROLLBACK
To allow for model rollback in case the new model does not perform as expected.
TRUST
To build trust in, and the credibility of, the machine learning product.
REGULATION
To comply with increasingly strict regulatory constraints.
2
Modularity
ADOPTION OF PIPELINE MENTALITY
Development Pipeline: Data Preprocessing, Feature Engineering, Model Training, Prediction Service, Model Evaluation
Training Pipeline: Data Preprocessing, Feature Engineering, Model Training
Inference Pipeline: Data Preprocessing, Feature Engineering, Prediction Service
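The point of the pipeline view is that the training and inference pipelines share the same preprocessing and feature engineering components. A minimal sketch of that idea follows; all function names, the `state` dictionary, and the toy model are assumptions made for illustration, not part of the talk.

```python
from typing import Callable, Sequence

Step = Callable[[dict], dict]


def preprocess(state: dict) -> dict:
    # Data preprocessing: drop rows with a missing input value.
    rows = [r for r in state["rows"] if r["x"] is not None]
    return {**state, "rows": rows}


def engineer_features(state: dict) -> dict:
    # Feature engineering: add a derived feature to every row.
    rows = [{**r, "x_squared": r["x"] ** 2} for r in state["rows"]]
    return {**state, "rows": rows}


def train_model(state: dict) -> dict:
    # Model training: least-squares weight for y = weight * x_squared.
    num = sum(r["x_squared"] * r["y"] for r in state["rows"])
    den = sum(r["x_squared"] ** 2 for r in state["rows"])
    return {**state, "weight": num / den}


def predict(state: dict) -> dict:
    # Prediction service: apply the trained weight to new rows.
    preds = [state["weight"] * r["x_squared"] for r in state["rows"]]
    return {**state, "predictions": preds}


def run_pipeline(steps: Sequence[Step], state: dict) -> dict:
    for step in steps:
        state = step(state)
    return state


# Both pipelines reuse the very same first two components.
TRAINING_PIPELINE = [preprocess, engineer_features, train_model]
INFERENCE_PIPELINE = [preprocess, engineer_features, predict]

training_rows = [{"x": 1, "y": 2}, {"x": 2, "y": 8}, {"x": None, "y": 0}]
trained = run_pipeline(TRAINING_PIPELINE, {"rows": training_rows})
served = run_pipeline(
    INFERENCE_PIPELINE, {"rows": [{"x": 3}], "weight": trained["weight"]}
)
print(served["predictions"])  # [18.0]
```

Because each step is a separate component, it can be versioned, tested, and replaced on its own, and a change to feature engineering automatically reaches both training and serving.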
3
Versioning
TRACKING THE CHANGES IN AN ML SYSTEM
Reproducibility can be achieved by tracking and versioning every change in the ML system.
Changes to Track
Data: Dataset
‣ Dataset version
‣ Data availability timestamp
‣ Dataset split
‣ Dataset shuffling
Data: Preprocessing
‣ Preprocessing parameters
‣ Target variable transformation
Data: Features
‣ Feature computation parameters
‣ Feature selection
Model: Model Parameters
‣ Model type
‣ Model hyperparameters
‣ Weights initialization
‣ Evaluation parameters
‣ Dropout
System: Source Code
‣ Components source code
‣ Pipeline definition
System: Environment
‣ Dependencies
‣ Environment variables
‣ Infrastructure
‣ Floating point calculation
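One way to make such a list of tracked changes actionable is to collect the values in a single configuration object and derive a fingerprint from it, so that any change yields a new identifier. The sketch below is an assumption for illustration: the `RunConfig` class, its fields, and the example values are hypothetical and cover only part of the list.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for a subset of the tracked items."""
    dataset_version: str
    split_seed: int
    preprocessing: dict = field(default_factory=dict)
    selected_features: tuple = ()
    model_type: str = "linear"
    hyperparameters: dict = field(default_factory=dict)
    code_version: str = "unknown"
    dependencies: tuple = ()

    def fingerprint(self) -> str:
        # Serialize with sorted keys so equal configs give equal hashes.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = RunConfig(
    dataset_version="v1",                      # placeholder values
    split_seed=42,
    preprocessing={"scaling": "standard"},
    selected_features=("age", "income"),
    hyperparameters={"learning_rate": 0.1},
    dependencies=("example-lib==1.0",),
)
changed = replace(base, hyperparameters={"learning_rate": 0.01})

print(base.fingerprint() == replace(base).fingerprint())  # True
print(base.fingerprint() == changed.fingerprint())        # False
```

The fingerprint does not replace proper versioning tools; it only illustrates that a run is identified by the full set of tracked values, not by the code alone.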
Experiment Tracking
Dataset Versioning
Feature Versioning – Feature Store
The feature store is a central location where features are stored and organized for the explicit purpose of being used either to train models or to make predictions. Features are computed when new data becomes available and are stored in the feature store, as opposed to being computed on the fly by the training and serving services.
A feature store should provide:
‣ An up-to-date list of feature consumers
‣ Point-in-time lookup
Benefits of using a feature store:
‣ Consistent feature engineering for model development, training, and serving
‣ Bridges the gap between data scientists and data & ML engineers
‣ Discovery and reuse of available feature sets, which avoids similar features with different definitions
‣ Point-in-time lookup to prevent data leakage
‣ Faster ML innovation
‣ Reproducibility of ML experiments
‣ Lets legal and compliance teams ensure compliant use of data
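Point-in-time lookup means returning the feature value as it was known at a given moment, so that a training example never sees values written later. A minimal in-memory sketch is shown below; the `FeatureHistory` class and the integer timestamps are assumptions for illustration and do not reflect the API of any particular feature store product.

```python
from bisect import bisect_right
from typing import Optional


class FeatureHistory:
    """Hypothetical history of one feature for one entity, ordered by timestamp."""

    def __init__(self) -> None:
        self._timestamps: list = []
        self._values: list = []

    def write(self, timestamp: int, value: float) -> None:
        # Keep both lists sorted by timestamp, even for late-arriving writes.
        i = bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def as_of(self, timestamp: int) -> Optional[float]:
        # Latest value written at or before `timestamp`; None if nothing was known yet.
        i = bisect_right(self._timestamps, timestamp)
        return self._values[i - 1] if i else None


history = FeatureHistory()
history.write(10, 1.0)
history.write(30, 3.0)
history.write(20, 2.0)   # out-of-order write is placed correctly

print(history.as_of(25))  # 2.0, the value at t=30 is not visible yet
print(history.as_of(5))   # None
```

Building a training set with `as_of(label_timestamp)` instead of "latest value" is what prevents leakage of future information into the features.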
Model Versioning – Model Registry
A model registry is a service that stores multiple model artifacts and tracks and governs models at different stages of the ML lifecycle.
The model registry provides:
‣ Centralized storage for all types of models
‣ A collaborative unit for model lifecycle management
‣ A basis for assessing model risks and for model governance
‣ Fast and seamless model roll-out and roll-back
The model registry should keep track of:
‣ Model name
‣ Model architecture
‣ Model hyperparameters
‣ Trained model/model weights
‣ Model metrics
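The tracked fields and the roll-out/roll-back behaviour can be sketched with a small in-memory registry. Everything here is an assumption for illustration: `ModelVersion`, `ModelRegistry`, the model name, and the metric values are hypothetical placeholders, not the interface of a real registry service.

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    name: str
    version: int
    architecture: str
    hyperparameters: dict
    weights: bytes
    metrics: dict


class ModelRegistry:
    def __init__(self) -> None:
        self._versions: dict = {}   # name -> list of ModelVersion
        self._deployed: dict = {}   # name -> history of deployed version numbers

    def register(self, name, architecture, hyperparameters, weights, metrics):
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, architecture,
                             hyperparameters, weights, metrics)
        versions.append(entry)
        return entry

    def get(self, name: str, version: int) -> ModelVersion:
        versions = self._versions[name]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]

    def roll_out(self, name: str, version: int) -> None:
        self.get(name, version)  # fail early if the version is unknown
        self._deployed.setdefault(name, []).append(version)

    def roll_back(self, name: str) -> int:
        history = self._deployed[name]
        if len(history) < 2:
            raise ValueError("no earlier deployment to roll back to")
        history.pop()
        return history[-1]

    def deployed(self, name: str) -> ModelVersion:
        return self.get(name, self._deployed[name][-1])


registry = ModelRegistry()
registry.register("churn", "logistic", {"C": 1.0}, b"weights-1", {"f1": 0.5})
registry.register("churn", "logistic", {"C": 0.1}, b"weights-2", {"f1": 0.4})
registry.roll_out("churn", 1)
registry.roll_out("churn", 2)
registry.roll_back("churn")          # the new model underperforms
print(registry.deployed("churn").version)  # 1
```

Roll-back is cheap here only because every version keeps its weights and hyperparameters; the same holds for a real registry.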
Environment Versioning – Container Registry
Pipeline Versioning – Workflow Orchestration
Infrastructure Versioning – IaC
Infrastructure as code (IaC) means provisioning, configuring, and managing infrastructure with machine-readable definition files.
Benefits:
‣ Ensures infrastructure consistency and eliminates configuration drift
‣ Reduces cost
‣ Speeds up deployments
‣ Improves scalability and availability
‣ Fosters collaboration
‣ Standardizes the deployment workflow
‣ Reduces the risk of errors
Version Versioning – Metadata Store
The metadata store is a central place that holds and connects all parameters of the ML system.
It may hold, for example:
‣ Data version: reference to the dataset, MD5 hash, and a dataset sample, to know which data was used to train the model
‣ Environment configuration: Docker image ID, requirements.txt, conda.yml, Dockerfile, Makefile, to know how to recreate the environment in which the model was trained
‣ Code version: Git SHA of a commit or an actual snapshot of the code, to know what code was used to build the model
‣ Model version: model ID and the configuration of the feature preprocessing steps of the pipeline, model training, and inference, to reproduce the process if needed
‣ Model performance metrics: experiment ID, F1, accuracy, and ROC on the test and validation sets, to know how the model performs
‣ Hardware metrics: CPU, GPU, TPU, and memory usage, to see how much the model consumes during training/inference
‣ Performance visualizations: ROC curve, confusion matrix, PR curve, to understand the errors in depth
‣ Model predictions: to see the actual predictions and understand model performance beyond metrics
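A metadata record that connects data version, code version, environment, and metrics might look like the sketch below. It is an illustration under stated assumptions: `build_record`, the in-memory `metadata_store` dictionary, the model ID, the Git SHA placeholder, and the metric value are all hypothetical, and a real project would obtain the SHA and metrics from its own tooling.

```python
import hashlib
import json
import platform
import sys


def dataset_md5(raw: bytes) -> str:
    # MD5 is used here only as a content fingerprint, not for security.
    return hashlib.md5(raw).hexdigest()


def build_record(model_id: str, dataset_bytes: bytes, git_sha: str, metrics: dict) -> dict:
    return {
        "model_id": model_id,
        "data_version": {"md5": dataset_md5(dataset_bytes)},
        "code_version": {"git_sha": git_sha},
        "environment": {
            "python": platform.python_version(),
            "platform": sys.platform,
        },
        "metrics": metrics,
    }


metadata_store: dict = {}   # stand-in for a real metadata store

record = build_record(
    model_id="churn-model-1",                # hypothetical ID
    dataset_bytes=b"id,label\n1,0\n2,1\n",   # tiny inline dataset for the example
    git_sha="<git-sha-placeholder>",         # would come from the repository
    metrics={"f1": 0.5},                     # placeholder, not a real result
)
metadata_store[record["model_id"]] = record

print(json.dumps(metadata_store["churn-model-1"], indent=2))
```

If a single byte of the dataset changes, the stored hash no longer matches, which is the property that lets the record answer "which data was used to train the model".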
[Diagram: end-to-end ML system. Stages: data engineering; experimenting and model development; ML pipeline CI/CD (build, test, package, deploy); continuous model training; model CD; prediction service; continuous monitoring. Supporting components: source code, experiment tracking, feature store, model registry, metadata store.]
4
Documentation
THE ONLY DIFFERENCE BETWEEN SCIENCE AND FOOLING AROUND IS WRITING IT DOWN
Document as you go.
Start from day 1.
Thank You!
Q&A

Weitere ähnliche Inhalte

Ähnlich wie [DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar

Confluent Partner Tech Talk with Reply
Confluent Partner Tech Talk with ReplyConfluent Partner Tech Talk with Reply
Confluent Partner Tech Talk with Replyconfluent
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningEdunomica
 
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...Altair
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in productionAntoine Sauray
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflowDatabricks
 
Unlocking MLOps Potential: Streamlining Machine Learning Lifecycle with Datab...
Unlocking MLOps Potential: Streamlining Machine Learning Lifecycle with Datab...Unlocking MLOps Potential: Streamlining Machine Learning Lifecycle with Datab...
Unlocking MLOps Potential: Streamlining Machine Learning Lifecycle with Datab...AbishekSubramanian2
 
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdfDagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdfHong Ong
 
An Integrated Simulation Tool Framework for Process Data Management
An Integrated Simulation Tool Framework for Process Data ManagementAn Integrated Simulation Tool Framework for Process Data Management
An Integrated Simulation Tool Framework for Process Data ManagementCognizant
 
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in PracticeGDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in PracticeJames Anderson
 
Tool-Driven Technology Transfer in Software Engineering
Tool-Driven Technology Transfer in Software EngineeringTool-Driven Technology Transfer in Software Engineering
Tool-Driven Technology Transfer in Software EngineeringHeiko Koziolek
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...Databricks
 
Legion - AI Runtime Platform
Legion -  AI Runtime PlatformLegion -  AI Runtime Platform
Legion - AI Runtime PlatformAlexey Kharlamov
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Timothy Spann
 
Notes on Deploying Machine-learning Models at Scale
Notes on Deploying Machine-learning Models at ScaleNotes on Deploying Machine-learning Models at Scale
Notes on Deploying Machine-learning Models at ScaleDeep Kayal
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...Edge AI and Vision Alliance
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
A Software Factory Integrating Rational & WebSphere Tools
A Software Factory Integrating Rational & WebSphere ToolsA Software Factory Integrating Rational & WebSphere Tools
A Software Factory Integrating Rational & WebSphere Toolsghodgkinson
 
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019VMware Tanzu
 

Ähnlich wie [DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar (20)

Confluent Partner Tech Talk with Reply
Confluent Partner Tech Talk with ReplyConfluent Partner Tech Talk with Reply
Confluent Partner Tech Talk with Reply
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
 
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
 
Pitfalls of machine learning in production
Pitfalls of machine learning in productionPitfalls of machine learning in production
Pitfalls of machine learning in production
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
Unlocking MLOps Potential: Streamlining Machine Learning Lifecycle with Datab...
Unlocking MLOps Potential: Streamlining Machine Learning Lifecycle with Datab...Unlocking MLOps Potential: Streamlining Machine Learning Lifecycle with Datab...
Unlocking MLOps Potential: Streamlining Machine Learning Lifecycle with Datab...
 
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdfDagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
 
An Integrated Simulation Tool Framework for Process Data Management
An Integrated Simulation Tool Framework for Process Data ManagementAn Integrated Simulation Tool Framework for Process Data Management
An Integrated Simulation Tool Framework for Process Data Management
 
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in PracticeGDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
 
Tool-Driven Technology Transfer in Software Engineering
Tool-Driven Technology Transfer in Software EngineeringTool-Driven Technology Transfer in Software Engineering
Tool-Driven Technology Transfer in Software Engineering
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
Deploying and Monitoring Heterogeneous Machine Learning Applications with Cli...
 
Legion - AI Runtime Platform
Legion -  AI Runtime PlatformLegion -  AI Runtime Platform
Legion - AI Runtime Platform
 
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
 
Introducing MLOps.pdf
Introducing MLOps.pdfIntroducing MLOps.pdf
Introducing MLOps.pdf
 
Notes on Deploying Machine-learning Models at Scale
Notes on Deploying Machine-learning Models at ScaleNotes on Deploying Machine-learning Models at Scale
Notes on Deploying Machine-learning Models at Scale
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
A Software Factory Integrating Rational & WebSphere Tools
A Software Factory Integrating Rational & WebSphere ToolsA Software Factory Integrating Rational & WebSphere Tools
A Software Factory Integrating Rational & WebSphere Tools
 
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019
Operationalizing AI at scale using MADlib Flow - Greenplum Summit 2019
 

Mehr von DataScienceConferenc1

[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF
[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF
[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDFDataScienceConferenc1
 
[DSC Europe 23] Rania Wazir - Mathematician jokes, cute cat photos, offensiv...
[DSC Europe 23] Rania Wazir -  Mathematician jokes, cute cat photos, offensiv...[DSC Europe 23] Rania Wazir -  Mathematician jokes, cute cat photos, offensiv...
[DSC Europe 23] Rania Wazir - Mathematician jokes, cute cat photos, offensiv...DataScienceConferenc1
 
[DSC Europe 23] Irena Cerovic - AI in International Development.pdf
[DSC Europe 23] Irena Cerovic - AI in International Development.pdf[DSC Europe 23] Irena Cerovic - AI in International Development.pdf
[DSC Europe 23] Irena Cerovic - AI in International Development.pdfDataScienceConferenc1
 
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...DataScienceConferenc1
 
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptxDataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Goran Dumic - Data-Driven Approach In Treatments
[DSC Europe 23][DigiHealth]  Goran Dumic -  Data-Driven Approach In Treatments[DSC Europe 23][DigiHealth]  Goran Dumic -  Data-Driven Approach In Treatments
[DSC Europe 23][DigiHealth] Goran Dumic - Data-Driven Approach In TreatmentsDataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Milos Todorovic - Bridging the Gap-Innovating Ag...
[DSC Europe 23][DigiHealth]  Milos Todorovic - Bridging the Gap-Innovating Ag...[DSC Europe 23][DigiHealth]  Milos Todorovic - Bridging the Gap-Innovating Ag...
[DSC Europe 23][DigiHealth] Milos Todorovic - Bridging the Gap-Innovating Ag...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Vladimir Brusic - SMART HEALTH HOME: Technology,...
[DSC Europe 23][DigiHealth]  Vladimir Brusic - SMART HEALTH HOME: Technology,...[DSC Europe 23][DigiHealth]  Vladimir Brusic - SMART HEALTH HOME: Technology,...
[DSC Europe 23][DigiHealth] Vladimir Brusic - SMART HEALTH HOME: Technology,...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Dimitar Penkov Grid Search Optimization of Novel...
[DSC Europe 23][DigiHealth]  Dimitar Penkov Grid Search Optimization of Novel...[DSC Europe 23][DigiHealth]  Dimitar Penkov Grid Search Optimization of Novel...
[DSC Europe 23][DigiHealth] Dimitar Penkov Grid Search Optimization of Novel...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMEDDataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...DataScienceConferenc1
 
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...DataScienceConferenc1
 
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with SeifDataScienceConferenc1
 
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...DataScienceConferenc1
 
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...DataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help youDataScienceConferenc1
 
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...DataScienceConferenc1
 

Mehr von DataScienceConferenc1 (20)

[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF
[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF
[DSC Europe 23] Luciano Catani - AI in Diplomacy.PDF
 
[DSC Europe 23] Rania Wazir - Mathematician jokes, cute cat photos, offensiv...
[DSC Europe 23] Rania Wazir -  Mathematician jokes, cute cat photos, offensiv...[DSC Europe 23] Rania Wazir -  Mathematician jokes, cute cat photos, offensiv...
[DSC Europe 23] Rania Wazir - Mathematician jokes, cute cat photos, offensiv...
 
[DSC Europe 23] Irena Cerovic - AI in International Development.pdf
[DSC Europe 23] Irena Cerovic - AI in International Development.pdf[DSC Europe 23] Irena Cerovic - AI in International Development.pdf
[DSC Europe 23] Irena Cerovic - AI in International Development.pdf
 
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...
[DSC Europe 23] Ilija Duni - How Foursquare Builds Meaningful Bridges Between...
 
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx
[DSC Europe 23] Branka Panic - Peace in the age of artificial intelligence.pptx
 
[DSC Europe 23][DigiHealth] Goran Dumic - Data-Driven Approach In Treatments
[DSC Europe 23][DigiHealth]  Goran Dumic -  Data-Driven Approach In Treatments[DSC Europe 23][DigiHealth]  Goran Dumic -  Data-Driven Approach In Treatments
[DSC Europe 23][DigiHealth] Goran Dumic - Data-Driven Approach In Treatments
 
[DSC Europe 23][DigiHealth] Milos Todorovic - Bridging the Gap-Innovating Ag...
[DSC Europe 23][DigiHealth]  Milos Todorovic - Bridging the Gap-Innovating Ag...[DSC Europe 23][DigiHealth]  Milos Todorovic - Bridging the Gap-Innovating Ag...
[DSC Europe 23][DigiHealth] Milos Todorovic - Bridging the Gap-Innovating Ag...
 
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
[DSC Europe 23][DigiHealth] Urosh VIlimanovich Clinical Data Management and C...
 
[DSC Europe 23][DigiHealth] Vladimir Brusic - SMART HEALTH HOME: Technology,...
[DSC Europe 23][DigiHealth]  Vladimir Brusic - SMART HEALTH HOME: Technology,...[DSC Europe 23][DigiHealth]  Vladimir Brusic - SMART HEALTH HOME: Technology,...
[DSC Europe 23][DigiHealth] Vladimir Brusic - SMART HEALTH HOME: Technology,...
 
[DSC Europe 23][DigiHealth] Dimitar Penkov Grid Search Optimization of Novel...
[DSC Europe 23][DigiHealth]  Dimitar Penkov Grid Search Optimization of Novel...[DSC Europe 23][DigiHealth]  Dimitar Penkov Grid Search Optimization of Novel...
[DSC Europe 23][DigiHealth] Dimitar Penkov Grid Search Optimization of Novel...
 
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED
[DSC Europe 23][DigiHealth] Tomislav Krizan - AIMED
 
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...
[DSC Europe 23][DigiHealth] Katarina Vucicevic - Navigating theKinetics of Dr...
 
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...
[DSC Europe 23][DigiHealth] Anja Baresic 0- Croatian digital Healthcare ecosy...
 
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
 
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif
[DSC Europe 23][AI:CSI] Uros Arsenijevic Unlocking Cybersecurity with Seif
 
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...
[DSC Europe 23][AI:CSI] Goran Gvozden Improving Cybersecurity Posture with an...
 
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
 
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
[DSC Europe 23][DigiHealth] Muthu Ramachandran AI and Blockchain Framework fo...
 
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you
[DSC Europe 23][DigiHealth] Ligia Kornowska-How_may AI help you
 
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...
[DSC Europe 23][DigiHealth] Ilya Zakharov - NETWORK NEUROSCIENCE WHERE THE BR...
 

[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar

  • 1. Reproducibility and Versioning of ML Systems ŠPELA POKLUKAR | MACHINE LEARNING CONSULTANT
  • 2. DSC 2022 // © COPYRIGHT 2022 ENDAVA. "Špela is an experienced machine learning consultant whose experience is mostly in software engineering services and the energy sector. She has successfully led projects in various domains such as manufacturing, finance, robotics, energy, and IT services. She is currently employed as a data discipline lead at Endava Slovenia and is an active member of innovation and gender balance communities. Her background is in mathematics, philosophy, and theology." Spela.poklukar@endava.com, +386 40 545 898. Špela Poklukar, MACHINE LEARNING CONSULTANT
  • 3. Agenda: 1. MOTIVATION 2. MODULARITY 3. VERSIONING 4. DOCUMENTATION
  • 4. 1 Motivation: WHY WE NEED REPRODUCIBILITY ANYWAY
  • 5. Reproducibility: Two Sides of the Same Coin
      ‣ REPRODUCIBILITY OF ML RESEARCH RESULTS: being able to recreate someone else's ML workflow and reach the same or similar conclusions as the original work.
      ‣ REPRODUCIBILITY OF ML SYSTEMS: being able to run an ML workflow repeatedly and reach the same or similar results on each run.
  • 6. Why Reproducibility?
      ‣ EVIDENCE OF SIGNIFICANCE: To ensure the obtained results are accurate and significant.
      ‣ ABLATION: To ensure that claimed gain really comes from the intended change and is not random.
      ‣ COST ESTIMATION: To inform potential consumers about computational complexity.
  • 7. Why Reproducibility?
      ‣ SCALING: To be able to scale the machine learning system by replicating its parts.
      ‣ INFERENCE: To ensure selected model is the same one used for inference.
      ‣ FAULT TOLERANCE: To reduce the risk of errors by consistently obtaining the same results.
      ‣ MODEL ROLLBACK: To allow for model rollback in case the new model is not performing as expected.
      ‣ TRUST: To create trust and credibility of the machine learning product.
      ‣ REGULATION: To adhere to the increasing regulation constraints.
  • 8. 2 Modularity: ADOPTION OF PIPELINE MENTALITY
  • 9. Diagram of three pipelines and their stages:
      ‣ Development Pipeline: Feature Engineering, Data Preprocessing, Model Training, Prediction Service, Model Evaluation
      ‣ Training Pipeline: Feature Engineering, Data Preprocessing, Model Training
      ‣ Inference Pipeline: Feature Engineering, Data Preprocessing, Prediction Service
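
The pipeline mentality on the slide above can be illustrated with a minimal sketch in which the training and inference pipelines reuse the same preprocessing and feature-engineering steps. All function names, the toy "model", and the sample data are assumptions made for illustration; they do not come from the talk.

```python
from functools import partial


def preprocess(rows):
    """Drop missing values (a hypothetical preprocessing rule)."""
    return [r for r in rows if r is not None]


def engineer_features(rows):
    """Turn each raw value into a (value, value squared) feature pair."""
    return [(r, r * r) for r in rows]


def train_model(features):
    """Toy 'model': the mean of the first feature."""
    return sum(f[0] for f in features) / len(features)


def predict(features, model):
    """Toy prediction: is the first feature above the learned mean?"""
    return [f[0] > model for f in features]


def run_pipeline(steps, data):
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        data = step(data)
    return data


# Both pipelines share exactly the same data steps, so features are
# computed identically at training time and at inference time.
SHARED_STEPS = [preprocess, engineer_features]

model = run_pipeline(SHARED_STEPS + [train_model], [1.0, None, 2.0, 3.0])
predictions = run_pipeline(
    SHARED_STEPS + [partial(predict, model=model)], [0.5, None, 4.0]
)
```

Because the shared steps are defined once, a change to preprocessing only has to be versioned in one place.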
  • 10. 3 Versioning: TRACKING THE CHANGES IN ML SYSTEM
  • 11. Reproducibility can be achieved by tracking and versioning every change in the ML system.
  • 12. Changes to Track (grouped under Data, Model, and System):
      ‣ Dataset: Dataset version; Data availability timestamp; Dataset split; Dataset shuffling
      ‣ Preprocessing: Preprocessing parameters; Target variable transformation
      ‣ Features: Feature computation parameters; Feature selection
      ‣ Model Parameters: Model type; Model hyperparameters; Weights initialization; Evaluation parameters; Dropout
      ‣ Source Code: Components source code; Pipeline definition
      ‣ Environment: Dependencies; Environment variables; Infrastructure; Floating point calculation
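
Two of the items listed above, dataset split and dataset shuffling, become reproducible once the random seed is recorded with the run. The following is a minimal sketch under that assumption; `split_dataset`, the `run_config` keys, and the toy data are hypothetical names chosen for illustration.

```python
import hashlib
import json
import random


def split_dataset(rows, test_fraction, seed):
    """Shuffle with a dedicated, seeded generator and split into train/test."""
    rng = random.Random(seed)  # local generator, independent of global state
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical run configuration: everything needed to repeat the split.
run_config = {"dataset_version": "v1", "test_fraction": 0.2, "shuffle_seed": 42}

rows = list(range(10))
train_a, test_a = split_dataset(
    rows, run_config["test_fraction"], run_config["shuffle_seed"]
)
train_b, test_b = split_dataset(
    rows, run_config["test_fraction"], run_config["shuffle_seed"]
)

# A stable fingerprint of the configuration can be stored next to the results.
config_fingerprint = hashlib.sha256(
    json.dumps(run_config, sort_keys=True).encode("utf-8")
).hexdigest()
```

Running the split twice with the same recorded seed yields the same partition, which is the property the slide asks to track.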
  • 14. Experiment Tracking
  • 15. Dataset Versioning
  • 16. Feature Versioning – Feature Store. The feature store is a central location where features are stored and organized for the explicit purpose of being used either to train models or to make predictions. Features are computed when new data becomes available and are stored in the feature store, as opposed to being computed on the fly by training and serving services.
      A feature store should provide:
      ‣ Updated list of feature consumers
      ‣ Point-in-time lookup
      Benefits of using a feature store:
      ‣ Consistent feature engineering for model development, training and serving
      ‣ Bridging the gap between data scientists and data & ML engineers
      ‣ Discover and reuse available feature sets, avoid having similar features with different definitions
      ‣ Point-in-time lookup to prevent data leakage
      ‣ Accelerate ML innovation
      ‣ Reproducibility of ML experiments
      ‣ Empower legal and compliance teams to ensure compliant use of data
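
Point-in-time lookup, mentioned above as a guard against data leakage, can be sketched as "return the latest feature value written at or before the requested time". This is a simplified in-memory illustration, not the API of any particular feature store; the `FeatureHistory` class, the integer day timestamps, and the sample feature are assumptions.

```python
from bisect import bisect_right


class FeatureHistory:
    """Append-only history of one feature's values, ordered by timestamp."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def write(self, timestamp, value):
        # Assumption: values are written in non-decreasing timestamp order.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("timestamps must be written in order")
        self._timestamps.append(timestamp)
        self._values.append(value)

    def lookup(self, as_of):
        """Return the latest value written at or before `as_of`, else None."""
        index = bisect_right(self._timestamps, as_of)
        return self._values[index - 1] if index else None


# Hypothetical feature: a customer's average basket value, keyed by day number.
avg_basket = FeatureHistory()
avg_basket.write(1, 10.0)
avg_basket.write(5, 12.5)
avg_basket.write(9, 20.0)

# A training label observed on day 6 may only see the value known on day 5.
value_for_day_6 = avg_basket.lookup(6)
value_before_any_write = avg_basket.lookup(0)
```

The lookup never returns a value from the future relative to `as_of`, so a training example built for day 6 cannot see the day 9 value.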
  • 17. Model Versioning – Model Registry. A model registry is a service that manages multiple model artifacts and tracks and governs models at different stages of the ML lifecycle.
      The model registry provides:
      ‣ Centralized storage for all types of models
      ‣ Collaborative unit for model lifecycle management
      ‣ Basis for assessing model risks and model governance
      ‣ Fast and seamless model roll-out and roll-back
      The model registry should keep track of:
      ‣ Model name
      ‣ Model architecture
      ‣ Model hyperparameters
      ‣ Trained model/model weights
      ‣ Model metrics
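
The fields and the roll-out and roll-back behaviour listed above can be sketched as a small in-memory registry. This is an illustration only, not the interface of an existing registry product; the class names, method names, model name, and weight paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """One registered model, with the fields the slide asks to track."""
    name: str
    version: int
    architecture: str
    hyperparameters: dict
    weights_uri: str  # where the trained weights are stored
    metrics: dict


class ModelRegistry:
    def __init__(self):
        self._versions = {}  # model name -> list of ModelVersion
        self._deployed = {}  # model name -> history of deployed version numbers

    def register(self, name, architecture, hyperparameters, weights_uri, metrics):
        versions = self._versions.setdefault(name, [])
        model = ModelVersion(
            name, len(versions) + 1, architecture,
            hyperparameters, weights_uri, metrics,
        )
        versions.append(model)
        return model

    def get(self, name, version):
        versions = self._versions.get(name, [])
        if not 1 <= version <= len(versions):
            raise KeyError(f"unknown version {version} of model {name!r}")
        return versions[version - 1]

    def roll_out(self, name, version):
        self.get(name, version)  # validate before deploying
        self._deployed.setdefault(name, []).append(version)

    def roll_back(self, name):
        history = self._deployed.get(name, [])
        if len(history) < 2:
            raise RuntimeError("no earlier deployment to roll back to")
        history.pop()
        return self.current(name)

    def current(self, name):
        return self.get(name, self._deployed[name][-1])


registry = ModelRegistry()
registry.register("churn", "logistic_regression", {"C": 1.0},
                  "models/churn/1.bin", {"f1": 0.80})
registry.register("churn", "gradient_boosting", {"n_estimators": 100},
                  "models/churn/2.bin", {"f1": 0.78})
registry.roll_out("churn", 1)
registry.roll_out("churn", 2)
deployed_after_rollout = registry.current("churn").version
deployed_after_rollback = registry.roll_back("churn").version
```

Keeping the deployment history separate from the version list is one possible design; it lets roll-back restore whichever version was live before, rather than simply the previous version number.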
  • 18. Environment Versioning – Container Registry
  • 19. Pipeline Versioning – Workflow Orchestration
  • 20. Infrastructure Versioning – IaC. Provisioning, configuring and managing infrastructure with machine-readable definition files.
      Benefits:
      ‣ Ensures infrastructure consistency and eliminates configuration drift
      ‣ Cost reduction
      ‣ Increase in speed of deployments
      ‣ Scalability and availability
      ‣ Fosters collaboration
      ‣ Standardizes deployment workflow
      ‣ Error risk reduction
  • 21. Version Versioning – Metadata Store. A metadata store is a central place that holds and connects all parameters of the ML system. It may hold, for example:
      ‣ Data version: reference to the dataset, md5 hash, or dataset sample, to know which data was used to train the model
      ‣ Environment configuration: Docker image ID, requirements.txt, conda.yml, Dockerfile, Makefile, to know how to recreate the environment where the model was trained
      ‣ Code version: Git SHA of a commit or an actual snapshot of the code, to know what code was used to build the model
      ‣ Model version: model ID and configuration of the feature preprocessing steps of the pipeline, model training, and inference, to reproduce the process if needed
      ‣ Model performance metrics: experiment ID, F1, accuracy, ROC on test and validation sets, to know how the model performs
      ‣ Hardware metrics: CPU, GPU, TPU, memory, to see how much the model consumes during training/inference
      ‣ Performance visualizations: ROC curve, confusion matrix, PR curve, to understand the errors in depth
      ‣ Model predictions: to see the actual predictions and understand model performance beyond metrics
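
A few of the entries listed above (data hash, code version, environment, model ID, metrics) can be tied together in one record per run. The sketch below is a simplified in-memory stand-in for a metadata store; the `RunMetadata` fields, the `MetadataStore` class, and every sample value (including the Git SHA and image ID strings) are placeholders invented for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class RunMetadata:
    experiment_id: str
    data_md5: str         # hash of the training data
    code_git_sha: str     # commit the model was built from
    docker_image_id: str  # environment the model was trained in
    model_id: str
    metrics: dict


def md5_of_bytes(data):
    """Hash raw dataset bytes so the exact data version can be recognised."""
    return hashlib.md5(data).hexdigest()


class MetadataStore:
    """Keeps one JSON document per experiment, keyed by experiment ID."""

    def __init__(self):
        self._runs = {}

    def log(self, run):
        self._runs[run.experiment_id] = json.dumps(asdict(run), sort_keys=True)

    def get(self, experiment_id):
        return RunMetadata(**json.loads(self._runs[experiment_id]))

    def runs_using_data(self, data_md5):
        """List experiment IDs that were trained on the given data version."""
        return sorted(
            exp_id for exp_id, doc in self._runs.items()
            if json.loads(doc)["data_md5"] == data_md5
        )


dataset_bytes = b"id,label\n1,0\n2,1\n"  # toy stand-in for a dataset file
record = RunMetadata(
    experiment_id="exp-001",
    data_md5=md5_of_bytes(dataset_bytes),
    code_git_sha="placeholder-git-sha",
    docker_image_id="placeholder-image-id",
    model_id="churn-v1",
    metrics={"f1": 0.91, "accuracy": 0.93},
)

store = MetadataStore()
store.log(record)
restored = store.get("exp-001")
```

Given only the experiment ID, the restored record points back to the data, code, and environment needed to repeat the run.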
  • 22. Diagram of the ML system components: EXPERIMENT TRACKING, SOURCE CODE, FEATURE STORE, MODEL REGISTRY, METADATA STORE, EXPERIMENTING AND MODEL DEVELOPMENT, ML PIPELINE CI/CD (BUILD, TEST, PACKAGE, DEPLOY), DATA ENGINEERING, CONTINUOUS MODEL TRAINING, MODEL CD, PREDICTION SERVICE, CONTINUOUS MONITORING
  • 23. 4 Documentation: THE ONLY DIFFERENCE BETWEEN SCIENCE AND FOOLING AROUND IS WRITING IT DOWN
  • 25. Document as you go. Start from day 1.
  • 26. MLOps – New Kid on the Block - Thank You!