SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Downloaden Sie, um offline zu lesen
Machine Learning
Systems for Engineers
Where Data Science Meets
Engineering
Who Am I?
• I’m Cameron!
• LinkedIn - https://www.linkedin.com/in/cameron-joannidis/
• Twitter - @CamJo89
• Consult across a range of areas and have built many big data
and machine learning systems
• Buzz Word Bingo
• Big Data
• Machine Learning
• Functional Programming
Agenda
• Data
• Deployment
• Metrics
• Big Data Iteration Speed
Data
Example Use Case: Churn Prediction
We want to predict which users are likely to leave our service
soon so that we can try and give them reasons to stay
Training Data Creation
• Historical Data (need actual
churn events as examples)
• We know the labels at train time
• Produce Features to try and
predict the label

Train Our Model
• Minimise our loss function to
best predict our labels (Churn/
No Churn)
Prediction Time
• Jason’s red feature value > 30
Prediction Time
• Jason’s red feature value > 30
• Jason’s yellow feature value != 7
Prediction Time
• Jason’s red feature value > 30
• Jason’s yellow feature value != 7
• We predict Jason will churn
Moving to Production
Training
Pipeline
Training
Pipeline
Scoring
Pipeline
Data Issues
• Data ingestion lags (systematic) or failures (random)
• Data is incorrect
Data Issues
• Data ingestion lags (systematic) or failures (random)
• Data is incorrect
Before We Change the System
• Fix the data source if that’s and option
• Measure the importance of the feature in the model to
quantify the cost/effort
Naive Best Effort
• Use most recent data for all
features
• Inconsistent customer view
• Retrain model with data lag in
mind
• Tightly couple model
Consistently Lagged
• Get a consistent snapshot at the
time of the most lagged data
source
• Predictions will be outdated
equal to the slowest data
source lag
Imputation
• Fill in missing values with median
for numerical, mode for
continuous
• Every users experience is the
median experience? Not useful.
• Contextual Imputing (e.g. Median
male height for men, median
female height for women)
• Lots of custom model specific
code necessary
Graceful Model Degradation
• Model Specific fallback to give a
best guess from the distribution
given the current inputs
• Doesn't come out of the box in
most cases
Deployment
Data to Model Deployments
• Containerised model exposing
scoring API
• Clean and simple model
management semantics
• Send your data to your models
• Network shuffle costs can be
substantial for larger datasets
Model to Data Deployment
• Distributed processing
framework performs scoring (e.g.
Spark)
• Send your models to your data
• Efficient but less portable
• Model lifecycle more difficult to
manage

Future Solutions
• Spark on Kubernetes
• Models and Executors colocated in pods with data locality.
• Model lifecycle management through Kubernetes
• Rollouts
• Rollbacks
• Canary Testing
• A/B testing
• Managed cloud solutions
• Complexity hidden from users

Metrics
A Few ML System Metrics
• Data Distribution
• Effectiveness in Market
Data Distribution
Your training data will have some
distribution of labels
Data Distribution
•In production, your data distribution
may be significantly different
•This can happen over time as these
systems tend to be dynamic

Possible Causes
• Changes to the domain you're modelling
• Seasonality or external effects
• Changes to the customers themselves or the way the
customers are using your service
• Problems with the data collection pipelines (corrupted
data feeds etc)
Effectiveness in Market
• Production is the first real test
• Need to capture metrics to measure the effect of the
model for its intended purpose
• Paves the road towards;
• Effective A/B testing
• Incremental model improvement
• Measurability of ROI
Big Data Iteration Speed
Training Models on Big Data is Slow
• Not all algorithms scale linearly as data/model
complexity increases
• Hit computation/memory bottlenecks
• Number of hypothesis we can test is reduced
• Generating new features can become prohibitively
expensive
Stratified Sampling
Sampling History
Customer Subset Sampling
Know Where to Spend Your Time
• Bad performance on training data = Bias Problem
• Improve features
• More complex model
• Train longer
• Good performance on training data and bad
performance on test set = Variance Problem
• Get more data for training
• Regularisation
Choice of Framework / Technology
• Modelling in R/Python and rewriting in production in
Scala/Spark is an expensive process
• Choose a tech stack that allows engineers and data
scientists to work together and productionise things
quickly. Leads to faster feedback loops
What We’ve Covered
• Data issues can be be a central issue to ML systems are
require a lot of up front design thought
• There are several modes of deployment, each with
their own tradeoffs for different scenarios
• Production is not the end of the process for ML
models. Metrics are a fundamental part of enabling
improvement and growth.
• Ways to improve iteration speed on ML projects
Thank You
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Agile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender SystemsAgile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender SystemsJohann Schleier-Smith
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeIdo Shilon
 
Best Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkBest Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkDatabricks
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerProvectus
 
Using machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversionUsing machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversionTammy Everts
 
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...Luca Zavarella
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflowDatabricks
 
SparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time BiddingSparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time BiddingDatabricks
 
Walk through of azure machine learning studio new features
Walk through of azure machine learning studio new featuresWalk through of azure machine learning studio new features
Walk through of azure machine learning studio new featuresLuca Zavarella
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101QuantUniversity
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedLaurenz Wuttke
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
 
Building Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemMLBuilding Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemMLsparktc
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learningsafa cimenli
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Entity framework advanced
Entity framework advancedEntity framework advanced
Entity framework advancedUsama Nada
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)Julien SIMON
 

Was ist angesagt? (20)

Agile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender SystemsAgile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender Systems
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
Best Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkBest Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache Spark
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
 
Using machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversionUsing machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversion
 
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
SparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time BiddingSparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time Bidding
 
Walk through of azure machine learning studio new features
Walk through of azure machine learning studio new featuresWalk through of azure machine learning studio new features
Walk through of azure machine learning studio new features
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons Learned
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
ODSC East 2018
ODSC East 2018ODSC East 2018
ODSC East 2018
 
Optimizing Java Performance
Optimizing Java PerformanceOptimizing Java Performance
Optimizing Java Performance
 
Building Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemMLBuilding Custom
Machine Learning Algorithms
with Apache SystemML
Building Custom
Machine Learning Algorithms
with Apache SystemML
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Entity framework advanced
Entity framework advancedEntity framework advanced
Entity framework advanced
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)
 

Ähnlich wie Machine learning systems for engineers

Alex mang patterns for scalability in microsoft azure application
Alex mang   patterns for scalability in microsoft azure applicationAlex mang   patterns for scalability in microsoft azure application
Alex mang patterns for scalability in microsoft azure applicationCodecamp Romania
 
Adventures in Azure Machine Learning from NE Bytes
Adventures in Azure Machine Learning from NE BytesAdventures in Azure Machine Learning from NE Bytes
Adventures in Azure Machine Learning from NE BytesDerek Graham
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning ModelsTash Bickley
 
What is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PMWhat is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PMProduct School
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabszekeLabs Technologies
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
The Fine Art of Combining Capacity Management with Machine Learning
The Fine Art of Combining Capacity Management with Machine LearningThe Fine Art of Combining Capacity Management with Machine Learning
The Fine Art of Combining Capacity Management with Machine LearningPrecisely
 
Using AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceUsing AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceChristian Beedgen
 
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsTraditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsGanesan Narayanasamy
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
 
PAC 2019 virtual Alexander Podelko
PAC 2019 virtual Alexander Podelko PAC 2019 virtual Alexander Podelko
PAC 2019 virtual Alexander Podelko Neotys
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Spark Summit
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Productioniguazio
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-Systeminside-BigData.com
 
Building an Experimentation Platform in Clojure
Building an Experimentation Platform in ClojureBuilding an Experimentation Platform in Clojure
Building an Experimentation Platform in ClojureSrihari Sriraman
 
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...DataScienceConferenc1
 
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...Precisely
 

Ähnlich wie Machine learning systems for engineers (20)

Machine learning
Machine learningMachine learning
Machine learning
 
Alex mang patterns for scalability in microsoft azure application
Alex mang   patterns for scalability in microsoft azure applicationAlex mang   patterns for scalability in microsoft azure application
Alex mang patterns for scalability in microsoft azure application
 
Adventures in Azure Machine Learning from NE Bytes
Adventures in Azure Machine Learning from NE BytesAdventures in Azure Machine Learning from NE Bytes
Adventures in Azure Machine Learning from NE Bytes
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
What is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PMWhat is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PM
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
The Fine Art of Combining Capacity Management with Machine Learning
The Fine Art of Combining Capacity Management with Machine LearningThe Fine Art of Combining Capacity Management with Machine Learning
The Fine Art of Combining Capacity Management with Machine Learning
 
Using AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceUsing AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics Service
 
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsTraditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
PAC 2019 virtual Alexander Podelko
PAC 2019 virtual Alexander Podelko PAC 2019 virtual Alexander Podelko
PAC 2019 virtual Alexander Podelko
 
Ml2 production
Ml2 productionMl2 production
Ml2 production
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
Building an Experimentation Platform in Clojure
Building an Experimentation Platform in ClojureBuilding an Experimentation Platform in Clojure
Building an Experimentation Platform in Clojure
 
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...
 
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...
Engineering Machine Learning Data Pipelines Series: Tracking Data Lineage fro...
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 

Machine learning systems for engineers

  • 1. Machine Learning Systems for Engineers Where Data Science Meets Engineering
  • 2. Who Am I? • I’m Cameron! • LinkedIn - https://www.linkedin.com/in/cameron-joannidis/ • Twitter - @CamJo89 • Consult across a range of areas and have built many big data and machine learning systems • Buzz Word Bingo • Big Data • Machine Learning • Functional Programming
  • 3.
  • 4. Agenda • Data • Deployment • Metrics • Big Data Iteration Speed
  • 6. Example Use Case: Churn Prediction We want to predict which users are likely to leave our service soon so that we can try and give them reasons to stay
  • 7. Training Data Creation • Historical Data (need actual churn events as examples) • We know the labels at train time • Produce Features to try and predict the label

  • 8. Train Our Model • Minimise our loss function to best predict our labels (Churn/ No Churn)
  • 9. Prediction Time • Jason’s red feature value > 30
  • 10. Prediction Time • Jason’s red feature value > 30 • Jason’s yellow feature value != 7
  • 11. Prediction Time • Jason’s red feature value > 30 • Jason’s yellow feature value != 7 • We predict Jason will churn
  • 15. Data Issues • Data ingestion lags (systematic) or failures (random) • Data is incorrect
  • 16. Data Issues • Data ingestion lags (systematic) or failures (random) • Data is incorrect
  • 17. Before We Change the System • Fix the data source if that’s and option • Measure the importance of the feature in the model to quantify the cost/effort
  • 18. Naive Best Effort • Use most recent data for all features • Inconsistent customer view • Retrain model with data lag in mind • Tightly couple model
  • 19. Consistently Lagged • Get a consistent snapshot at the time of the most lagged data source • Predictions will be outdated equal to the slowest data source lag
  • 20. Imputation • Fill in missing values with median for numerical, mode for continuous • Every users experience is the median experience? Not useful. • Contextual Imputing (e.g. Median male height for men, median female height for women) • Lots of custom model specific code necessary
  • 21. Graceful Model Degradation • Model Specific fallback to give a best guess from the distribution given the current inputs • Doesn't come out of the box in most cases
  • 23. Data to Model Deployments • Containerised model exposing scoring API • Clean and simple model management semantics • Send your data to your models • Network shuffle costs can be substantial for larger datasets
  • 24. Model to Data Deployment • Distributed processing framework performs scoring (e.g. Spark) • Send your models to your data • Efficient but less portable • Model lifecycle more difficult to manage

  • 25. Future Solutions • Spark on Kubernetes • Models and Executors colocated in pods with data locality. • Model lifecycle management through Kubernetes • Rollouts • Rollbacks • Canary Testing • A/B testing • Managed cloud solutions • Complexity hidden from users

  • 27. A Few ML System Metrics • Data Distribution • Effectiveness in Market
  • 28. Data Distribution Your training data will have some distribution of labels
  • 29. Data Distribution •In production, your data distribution may be significantly different •This can happen over time as these systems tend to be dynamic

  • 30. Possible Causes • Changes to the domain you're modelling • Seasonality or external effects • Changes to the customers themselves or the way the customers are using your service • Problems with the data collection pipelines (corrupted data feeds etc)
  • 31. Effectiveness in Market • Production is the first real test • Need to capture metrics to measure the effect of the model for its intended purpose • Paves the road towards; • Effective A/B testing • Incremental model improvement • Measurability of ROI
  • 33. Training Models on Big Data is Slow • Not all algorithms scale linearly as data/model complexity increases • Hit computation/memory bottlenecks • Number of hypothesis we can test is reduced • Generating new features can become prohibitively expensive
  • 37. Know Where to Spend Your Time • Bad performance on training data = Bias Problem • Improve features • More complex model • Train longer • Good performance on training data and bad performance on test set = Variance Problem • Get more data for training • Regularisation
  • 38. Choice of Framework / Technology • Modelling in R/Python and rewriting in production in Scala/Spark is an expensive process • Choose a tech stack that allows engineers and data scientists to work together and productionise things quickly. Leads to faster feedback loops
  • 39. What We’ve Covered • Data issues can be be a central issue to ML systems are require a lot of up front design thought • There are several modes of deployment, each with their own tradeoffs for different scenarios • Production is not the end of the process for ML models. Metrics are a fundamental part of enabling improvement and growth. • Ways to improve iteration speed on ML projects