SlideShare a Scribd company logo
1 of 21
Download to read offline
Erlangen
Artificial Intelligence &
Machine Learning Meetup
presents
Jonas Hausruckinger
HOW TO STRUCTURE
YOUR MACHINE
LEARNING CODE
EFFECTIVELY
Data Scientist – GfK
Erlangen AI & ML Meetup – September 24th 2019
A little bit of context
• Working at GfK as a Data Scientist since 2015
• Main project: FutureOps Matching Automation
• ML models in production:
• Product group detection & Item proposals (2016)
• Brand detection (2017)
• Auto-matching (2018)
Why should I care?
How do we get from here… … to here?
Chapter One
Where to start?
Bad news first: there is no silver bullet!
• ML coding best practices have not stabilized (yet?)
• Some good examples are out there
(AllenNLP, pytext, mlflow, skorch, Palladium, …)
• Tradeoff between simplicity and generalizability!
Differences between „regular“ and ML code (examples)
• Mathematical concepts and notation
• Often involves randomness (use seeds!)
• Hard to test effectively (requires training/test data, difficult to verify
success, lots of edge cases)
Think about your requirements
• How many models do you want to deploy?
• Do you need automatic model retraining?
• What infrastructure will your models run on?
• …
• Doubles the costs of all
further updates
• Requires extensive testing to
ensure correctness
Avoid rewrites!
https://news.artnet.com/app/news-upload/2014/04/Ecce-Homo.jpg
Instead:
Use one model
pipeline for research
AND production Developer Data Scientist
https://www.bp.com/content/dam/bp/business-sites/en/global/corporate/images-jpg-png/what-we-
do/worldwide/worldwide-oman-slide4.jpg.img.1920.medium.jpg
Chapter Two
How to proceed?
Quick reminder: Programming principles
• Keep it simple, stupid (KISS)
• You aren‘t gonna need it (YAGNI)
• Don‘t repeat yourself (DRY)
The goal: a clear path from research to production
• Every team is different: act according to your strengths!
• Production code should be owned by the whole team
àmore eyes = better code & lower risk of bugs
àenable team members to contribute!
• Define workflow (e.g. git-based) and responsibilities
Software architecture:
Thinking in layers
Science
core
Service wrapper
DevOps
infrastructure
build
deploy
train
predict
Flow of control
• Clear interfaces between
layers
• Inner circles should not have
dependencies to outer
circles!
Partition A
Partition B
Partition C
Partitioner
Model A
Model B
Model C
Dataset
ModelStore
Our solution: goo framework
[See demo notebook]
Chapter Three
Final thoughts
Key takeaways
• Use your available time wisely (research vs. putting into production)
• Look for references (e.g. Google papers, Open Source frameworks)
• Work together to find the best solution for your project
TH
AN
KS
Best practices for structuring Machine Learning code
Appendix: References
• https://research.google.com/pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf
• https://research.google.com/pubs/archive/46178.pdf
• https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-
AL4ffI/preview
• https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/69383
• https://github.com/allenai/allennlp
• https://github.com/mlflow/mlflow/
• https://github.com/ottogroup/palladium
To learn more about the meetup, click the Link
https://www.meetup.com/Erlangen-Artificial-Intelligence-Machine-Learning-Meetup
Erlangen
Artificial Intelligence &
Machine Learning Meetup

More Related Content

What's hot

ML-Ops: Philosophy, Best-Practices and Tools
ML-Ops:Philosophy, Best-Practices and ToolsML-Ops:Philosophy, Best-Practices and Tools
ML-Ops: Philosophy, Best-Practices and ToolsJorge Davila-Chacon
 
Nasscom ml ops webinar
Nasscom ml ops webinarNasscom ml ops webinar
Nasscom ml ops webinarSameer Mahajan
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerProvectus
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudMárton Kodok
 
WSO2Con EU 2015: API Readiness: Is Your API Ready for Primetime?
WSO2Con EU 2015: API Readiness: Is Your API Ready for Primetime?WSO2Con EU 2015: API Readiness: Is Your API Ready for Primetime?
WSO2Con EU 2015: API Readiness: Is Your API Ready for Primetime?WSO2
 
API Readiness: Visualizing and Virtualizing
API Readiness: Visualizing and VirtualizingAPI Readiness: Visualizing and Virtualizing
API Readiness: Visualizing and VirtualizingLorinda Brandon
 
Build Better Apps through API Virtualization
Build Better Apps through API VirtualizationBuild Better Apps through API Virtualization
Build Better Apps through API VirtualizationLorinda Brandon
 
Managers guide to effective building of machine learning products
Managers guide to effective building of machine learning productsManagers guide to effective building of machine learning products
Managers guide to effective building of machine learning productsGianmario Spacagna
 
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowContinuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowDatabricks
 
Patrick Hall, H2O.ai - Human Friendly Machine Learning - H2O World San Francisco
Patrick Hall, H2O.ai - Human Friendly Machine Learning - H2O World San FranciscoPatrick Hall, H2O.ai - Human Friendly Machine Learning - H2O World San Francisco
Patrick Hall, H2O.ai - Human Friendly Machine Learning - H2O World San FranciscoSri Ambati
 
Provenance in Production-Grade Machine Learning
Provenance in Production-Grade Machine LearningProvenance in Production-Grade Machine Learning
Provenance in Production-Grade Machine LearningAnand Sampat
 
Resume_FengWang
Resume_FengWangResume_FengWang
Resume_FengWang峰 王
 
Making your app - from the idea to your hand
Making your app - from the idea to your handMaking your app - from the idea to your hand
Making your app - from the idea to your handinmediatum.com
 
Myth vs Reality: Understanding AI/ML for QA Automation - w/ Jonathan Lipps
Myth vs Reality: Understanding AI/ML for QA Automation - w/ Jonathan LippsMyth vs Reality: Understanding AI/ML for QA Automation - w/ Jonathan Lipps
Myth vs Reality: Understanding AI/ML for QA Automation - w/ Jonathan LippsApplitools
 

What's hot (20)

ML-Ops: Philosophy, Best-Practices and Tools
ML-Ops:Philosophy, Best-Practices and ToolsML-Ops:Philosophy, Best-Practices and Tools
ML-Ops: Philosophy, Best-Practices and Tools
 
Nasscom ml ops webinar
Nasscom ml ops webinarNasscom ml ops webinar
Nasscom ml ops webinar
 
Fundamental MLOps
Fundamental MLOpsFundamental MLOps
Fundamental MLOps
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
 
WSO2Con EU 2015: API Readiness: Is Your API Ready for Primetime?
WSO2Con EU 2015: API Readiness: Is Your API Ready for Primetime?WSO2Con EU 2015: API Readiness: Is Your API Ready for Primetime?
WSO2Con EU 2015: API Readiness: Is Your API Ready for Primetime?
 
API Readiness: Visualizing and Virtualizing
API Readiness: Visualizing and VirtualizingAPI Readiness: Visualizing and Virtualizing
API Readiness: Visualizing and Virtualizing
 
Build Better Apps through API Virtualization
Build Better Apps through API VirtualizationBuild Better Apps through API Virtualization
Build Better Apps through API Virtualization
 
Managers guide to effective building of machine learning products
Managers guide to effective building of machine learning productsManagers guide to effective building of machine learning products
Managers guide to effective building of machine learning products
 
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowContinuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
 
Patrick Hall, H2O.ai - Human Friendly Machine Learning - H2O World San Francisco
Patrick Hall, H2O.ai - Human Friendly Machine Learning - H2O World San FranciscoPatrick Hall, H2O.ai - Human Friendly Machine Learning - H2O World San Francisco
Patrick Hall, H2O.ai - Human Friendly Machine Learning - H2O World San Francisco
 
Api First Design
Api First DesignApi First Design
Api First Design
 
Provenance in Production-Grade Machine Learning
Provenance in Production-Grade Machine LearningProvenance in Production-Grade Machine Learning
Provenance in Production-Grade Machine Learning
 
CV
CVCV
CV
 
Polyglot engineering
Polyglot engineeringPolyglot engineering
Polyglot engineering
 
Resume_FengWang
Resume_FengWangResume_FengWang
Resume_FengWang
 
Making your app - from the idea to your hand
Making your app - from the idea to your handMaking your app - from the idea to your hand
Making your app - from the idea to your hand
 
Api readiness ss
Api readiness ssApi readiness ss
Api readiness ss
 
Myth vs Reality: Understanding AI/ML for QA Automation - w/ Jonathan Lipps
Myth vs Reality: Understanding AI/ML for QA Automation - w/ Jonathan LippsMyth vs Reality: Understanding AI/ML for QA Automation - w/ Jonathan Lipps
Myth vs Reality: Understanding AI/ML for QA Automation - w/ Jonathan Lipps
 

Similar to Best practices for structuring Machine Learning code

Dev Learn Handout - Session 604
Dev Learn Handout - Session 604Dev Learn Handout - Session 604
Dev Learn Handout - Session 604Chad Udell
 
Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning ProductsAndrew Musselman
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesNick Pentreath
 
What would Jesus Developer do?
What would Jesus Developer do?What would Jesus Developer do?
What would Jesus Developer do?Lukáš Čech
 
What the heck is Eclipse Modeling and why should you care !
What the heck is Eclipse Modeling and why should you care !What the heck is Eclipse Modeling and why should you care !
What the heck is Eclipse Modeling and why should you care !Cédric Brun
 
Survive the Chaos - S4H151 - SAP TechED Barcelona 2017 - Lecture
Survive the Chaos - S4H151 - SAP TechED Barcelona 2017 - LectureSurvive the Chaos - S4H151 - SAP TechED Barcelona 2017 - Lecture
Survive the Chaos - S4H151 - SAP TechED Barcelona 2017 - LectureRainer Winkler
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTrivadis
 
Data science tools of the trade
Data science tools of the tradeData science tools of the trade
Data science tools of the tradeFangda Wang
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
 
Enterprise Frameworks: Java & .NET
Enterprise Frameworks: Java & .NETEnterprise Frameworks: Java & .NET
Enterprise Frameworks: Java & .NETAnant Corporation
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Matt Stubbs
 
Continuous Intelligence Workshop
Continuous Intelligence WorkshopContinuous Intelligence Workshop
Continuous Intelligence WorkshopDavid Tan
 
Agile Methodologies And Extreme Programming - Svetlin Nakov
Agile Methodologies And Extreme Programming - Svetlin NakovAgile Methodologies And Extreme Programming - Svetlin Nakov
Agile Methodologies And Extreme Programming - Svetlin NakovSvetlin Nakov
 
Software Engineering in Startups
Software Engineering in StartupsSoftware Engineering in Startups
Software Engineering in StartupsDusan Omercevic
 
Architecting a Large Software Project - Lessons Learned
Architecting a Large Software Project - Lessons LearnedArchitecting a Large Software Project - Lessons Learned
Architecting a Large Software Project - Lessons LearnedJoão Pedro Martins
 
Stefan Geissler kairntech - SDC Nice Apr 2019
Stefan Geissler kairntech - SDC Nice Apr 2019 Stefan Geissler kairntech - SDC Nice Apr 2019
Stefan Geissler kairntech - SDC Nice Apr 2019 Stefan Geißler
 

Similar to Best practices for structuring Machine Learning code (20)

Dev Learn Handout - Session 604
Dev Learn Handout - Session 604Dev Learn Handout - Session 604
Dev Learn Handout - Session 604
 
Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning Products
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
 
What would Jesus Developer do?
What would Jesus Developer do?What would Jesus Developer do?
What would Jesus Developer do?
 
What the heck is Eclipse Modeling and why should you care !
What the heck is Eclipse Modeling and why should you care !What the heck is Eclipse Modeling and why should you care !
What the heck is Eclipse Modeling and why should you care !
 
Survive the Chaos - S4H151 - SAP TechED Barcelona 2017 - Lecture
Survive the Chaos - S4H151 - SAP TechED Barcelona 2017 - LectureSurvive the Chaos - S4H151 - SAP TechED Barcelona 2017 - Lecture
Survive the Chaos - S4H151 - SAP TechED Barcelona 2017 - Lecture
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
Data science tools of the trade
Data science tools of the tradeData science tools of the trade
Data science tools of the trade
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
DevOps Days Rockies MLOps
DevOps Days Rockies MLOpsDevOps Days Rockies MLOps
DevOps Days Rockies MLOps
 
Enterprise Frameworks: Java & .NET
Enterprise Frameworks: Java & .NETEnterprise Frameworks: Java & .NET
Enterprise Frameworks: Java & .NET
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
 
Continuous Intelligence Workshop
Continuous Intelligence WorkshopContinuous Intelligence Workshop
Continuous Intelligence Workshop
 
Agile Methodologies And Extreme Programming - Svetlin Nakov
Agile Methodologies And Extreme Programming - Svetlin NakovAgile Methodologies And Extreme Programming - Svetlin Nakov
Agile Methodologies And Extreme Programming - Svetlin Nakov
 
Software Engineering in Startups
Software Engineering in StartupsSoftware Engineering in Startups
Software Engineering in Startups
 
Architecting a Large Software Project - Lessons Learned
Architecting a Large Software Project - Lessons LearnedArchitecting a Large Software Project - Lessons Learned
Architecting a Large Software Project - Lessons Learned
 
Shuzworld Analysis
Shuzworld AnalysisShuzworld Analysis
Shuzworld Analysis
 
Stefan Geissler kairntech - SDC Nice Apr 2019
Stefan Geissler kairntech - SDC Nice Apr 2019 Stefan Geissler kairntech - SDC Nice Apr 2019
Stefan Geissler kairntech - SDC Nice Apr 2019
 
From open source labs to ceo methods and advice by sysfera
From open source labs to ceo methods and advice by sysferaFrom open source labs to ceo methods and advice by sysfera
From open source labs to ceo methods and advice by sysfera
 

More from Erlangen Artificial Intelligence & Machine Learning Meetup (6)

NLP@DATEV: Setting up a domain specific language model, Dr. Jonas Rende & Tho...
NLP@DATEV: Setting up a domain specific language model, Dr. Jonas Rende & Tho...NLP@DATEV: Setting up a domain specific language model, Dr. Jonas Rende & Tho...
NLP@DATEV: Setting up a domain specific language model, Dr. Jonas Rende & Tho...
 
Knowledge Graphs, Daria Stepanova, Bosch Center for Artificial Intelligence
Knowledge Graphs, Daria Stepanova, Bosch Center for Artificial IntelligenceKnowledge Graphs, Daria Stepanova, Bosch Center for Artificial Intelligence
Knowledge Graphs, Daria Stepanova, Bosch Center for Artificial Intelligence
 
AI applications in education, Pascal Zoleko, Flexudy
AI applications in education, Pascal Zoleko, FlexudyAI applications in education, Pascal Zoleko, Flexudy
AI applications in education, Pascal Zoleko, Flexudy
 
Learning global pooling operators in deep neural networks for image retrieval...
Learning global pooling operators in deep neural networks for image retrieval...Learning global pooling operators in deep neural networks for image retrieval...
Learning global pooling operators in deep neural networks for image retrieval...
 
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
 
Ai for cultural history
Ai for cultural historyAi for cultural history
Ai for cultural history
 

Recently uploaded

Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
IEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
IEEE Computer Society’s Strategic Activities and Products including SWEBOK GuideIEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
IEEE Computer Society’s Strategic Activities and Products including SWEBOK GuideHironori Washizaki
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdfPaige Cruz
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"
UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"
UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"DianaGray10
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 

Recently uploaded (20)

Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
IEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
IEEE Computer Society’s Strategic Activities and Products including SWEBOK GuideIEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
IEEE Computer Society’s Strategic Activities and Products including SWEBOK Guide
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"
UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"
UiPath Clipboard AI: "A TIME Magazine Best Invention of 2023 Unveiled"
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 

Best practices for structuring Machine Learning code

  • 2. Jonas Hausruckinger HOW TO STRUCTURE YOUR MACHINE LEARNING CODE EFFECTIVELY Data Scientist – GfK Erlangen AI & ML Meetup – September 24th 2019
  • 3. A little bit of context • Working at GfK as a Data Scientist since 2015 • Main project: FutureOps Matching Automation • ML models in production: • Product group detection & Item proposals (2016) • Brand detection (2017) • Auto-matching (2018)
  • 4. Why should I care? How do we get from here… … to here?
  • 6. Bad news first: there is no silver bullet! • ML coding best practices have not stabilized (yet?) • Some good examples are out there (AllenNLP, pytext, mlflow, skorch, Palladium, …) • Tradeoff between simplicity and generalizability!
  • 7. Differences between „regular“ and ML code (examples) • Mathematical concepts and notation • Often involves randomness (use seeds!) • Hard to test effectively (requires training/test data, difficult to verify success, lots of edge cases)
  • 8. Think about your requirements • How many models do you want to deploy? • Do you need automatic model retraining? • What infrastructure will your models run on? • …
  • 9. • Doubles the costs of all further updates • Requires extensive testing to ensure correctness Avoid rewrites! https://news.artnet.com/app/news-upload/2014/04/Ecce-Homo.jpg
  • 10. Instead: Use one model pipeline for research AND production Developer Data Scientist https://www.bp.com/content/dam/bp/business-sites/en/global/corporate/images-jpg-png/what-we- do/worldwide/worldwide-oman-slide4.jpg.img.1920.medium.jpg
  • 11. Chapter Two How to proceed?
  • 12. Quick reminder: Programming principles • Keep it simple, stupid (KISS) • You aren‘t gonna need it (YAGNI) • Don‘t repeat yourself (DRY)
  • 13. The goal: a clear path from research to production • Every team is different: act according to your strengths! • Production code should be owned by the whole team àmore eyes = better code & lower risk of bugs àenable team members to contribute! • Define workflow (e.g. git-based) and responsibilities
  • 14. Software architecture: Thinking in layers Science core Service wrapper DevOps infrastructure build deploy train predict Flow of control • Clear interfaces between layers • Inner circles should not have dependencies to outer circles!
  • 15. Partition A Partition B Partition C Partitioner Model A Model B Model C Dataset ModelStore Our solution: goo framework [See demo notebook]
  • 17. Key takeaways • Use your available time wisely (research vs. putting into production) • Look for references (e.g. Google papers, Open Source frameworks) • Work together to find the best solution for your project
  • 20. Appendix: References • https://research.google.com/pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf • https://research.google.com/pubs/archive/46178.pdf • https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP- AL4ffI/preview • https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/detail/69383 • https://github.com/allenai/allennlp • https://github.com/mlflow/mlflow/ • https://github.com/ottogroup/palladium
  • 21. To learn more about the meetup, click the Link https://www.meetup.com/Erlangen-Artificial-Intelligence-Machine-Learning-Meetup Erlangen Artificial Intelligence & Machine Learning Meetup