The structure of a Machine Learning code base can have a large impact on effective collaboration and time to production.
In this talk I will present our solution developed for the FutureOps Matching Automation project and talk about lessons learned and best practices.
2. How to Structure Your Machine Learning Code Effectively
Jonas Hausruckinger, Data Scientist – GfK
Erlangen AI & ML Meetup – September 24th 2019
3. A little bit of context
• Working at GfK as a Data Scientist since 2015
• Main project: FutureOps Matching Automation
• ML models in production:
• Product group detection & Item proposals (2016)
• Brand detection (2017)
• Auto-matching (2018)
4. Why should I care?
How do we get from here… … to here?
6. Bad news first: there is no silver bullet!
• ML coding best practices have not stabilized (yet?)
• Some good examples are out there
(AllenNLP, pytext, mlflow, skorch, Palladium, …)
• Tradeoff between simplicity and generalizability!
7. Differences between "regular" and ML code (examples)
• Mathematical concepts and notation
• Often involves randomness (use seeds!)
• Hard to test effectively (requires training/test data, difficult to verify success, lots of edge cases)
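The "use seeds!" advice and the testing difficulty are connected: once the randomness is seeded, a test can at least assert that a run is reproducible. A minimal sketch using only the Python standard library; the function name `train_test_split` and its parameters are illustrative assumptions, not code from the project.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of `items` with a seeded RNG and split it in two."""
    rng = random.Random(seed)  # local RNG, so no hidden global state is touched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))
train_a, test_a = train_test_split(data, seed=7)
train_b, test_b = train_test_split(data, seed=7)

# Same seed, same split: this is what makes the step testable.
assert (train_a, test_a) == (train_b, test_b)
# No item is lost or duplicated by the split.
assert sorted(train_a + test_a) == data
```

Passing the seed as a parameter, rather than seeding the global generator, keeps the function deterministic even when other code also draws random numbers.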
8. Think about your requirements
• How many models do you want to deploy?
• Do you need automatic model retraining?
• What infrastructure will your models run on?
• …
9. Avoid rewrites!
• Doubles the costs of all further updates
• Requires extensive testing to ensure correctness
https://news.artnet.com/app/news-upload/2014/04/Ecce-Homo.jpg
10. Instead: Use one model pipeline for research AND production
[Image labels: Developer, Data Scientist]
https://www.bp.com/content/dam/bp/business-sites/en/global/corporate/images-jpg-png/what-we-do/worldwide/worldwide-oman-slide4.jpg.img.1920.medium.jpg
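One way to read "one model pipeline for research AND production" is that the experiment and the service both call the same preprocessing and model code, so nothing has to be rewritten for deployment. The sketch below is a toy illustration under that assumption; `TokenVoteModel`, `normalize`, and `handle_request` are hypothetical names, not part of the project.

```python
from collections import Counter, defaultdict


def normalize(text):
    """Preprocessing shared by training and prediction."""
    return text.lower().split()


class TokenVoteModel:
    """Toy classifier: each token votes for the labels it was seen with."""

    def fit(self, texts, labels):
        self.votes = defaultdict(Counter)
        self.default = Counter(labels).most_common(1)[0][0]
        for text, label in zip(texts, labels):
            for token in normalize(text):
                self.votes[token][label] += 1
        return self

    def predict(self, texts):
        results = []
        for text in texts:
            tally = Counter()
            for token in normalize(text):
                tally.update(self.votes.get(token, {}))
            # Fall back to the most frequent training label for unseen tokens.
            results.append(tally.most_common(1)[0][0] if tally else self.default)
        return results


# Research: fit and inspect the model, e.g. in a notebook.
model = TokenVoteModel().fit(
    ["Red Apple", "Green apple", "Blue Jeans"],
    ["food", "food", "clothing"],
)


# Production: the request handler reuses the very same object and code path.
def handle_request(payload, model=model):
    return {"prediction": model.predict([payload["text"]])[0]}
```

Because `normalize` lives in one place, a change to the preprocessing automatically applies to both training and serving.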
12. Quick reminder: Programming principles
• Keep it simple, stupid (KISS)
• You aren't gonna need it (YAGNI)
• Don't repeat yourself (DRY)
13. The goal: a clear path from research to production
• Every team is different: act according to your strengths!
• Production code should be owned by the whole team
  → more eyes = better code & lower risk of bugs
  → enable team members to contribute!
• Define workflow (e.g. git-based) and responsibilities
14. Software architecture: Thinking in layers
• Clear interfaces between layers
• Inner circles should not have dependencies on outer circles!
[Diagram labels: Science core, Service wrapper, DevOps infrastructure; build, deploy, train, predict; Flow of control]
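The dependency rule can be sketched in a few lines: the core knows nothing about JSON or deployment, and the wrapper imports the core, never the reverse. This is an illustration only; the assumption that `train` and `predict` belong to the science core, and the name `PredictionService`, are mine rather than the talk's.

```python
import json


# --- Science core: plain Python, no knowledge of JSON, HTTP, or deployment ---
def train(values):
    """Fit a trivial model: remember the mean of the training values."""
    return {"mean": sum(values) / len(values)}


def predict(model, value):
    """Flag values that lie above the training mean."""
    return value > model["mean"]


# --- Service wrapper: depends on the core, never the other way round ---
class PredictionService:
    def __init__(self, model):
        self._model = model

    def handle(self, request_body: str) -> str:
        """Translate a JSON request into a core call and back into JSON."""
        payload = json.loads(request_body)
        result = predict(self._model, payload["value"])
        return json.dumps({"above_mean": result})


service = PredictionService(train([1.0, 2.0, 3.0]))
response = service.handle('{"value": 2.5}')
```

Since the core has no outward dependencies, it can be unit-tested and used in research without starting any service.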
15. Our solution: goo framework
[Diagram labels: Dataset, Partitioner, Partition A, Partition B, Partition C, Model A, Model B, Model C, ModelStore]
[See demo notebook]
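The slide only names the components, so the following is a guess at how they might fit together, not the goo framework's actual API: a partitioner splits the dataset, one model is trained per partition, and a model store holds the results. Every class, function, and field name here is hypothetical.

```python
from collections import defaultdict


class MeanModel:
    """Toy model: predicts the mean target value of its partition."""

    def fit(self, rows):
        self.mean = sum(row["y"] for row in rows) / len(rows)
        return self

    def predict(self, row):
        return self.mean


def partition(dataset, key):
    """Partitioner: group rows by the value found under `key`."""
    parts = defaultdict(list)
    for row in dataset:
        parts[row[key]].append(row)
    return dict(parts)


class ModelStore:
    """In-memory stand-in for a persistent model store."""

    def __init__(self):
        self._models = {}

    def save(self, name, model):
        self._models[name] = model

    def load(self, name):
        return self._models[name]


dataset = [
    {"group": "A", "y": 1.0},
    {"group": "A", "y": 3.0},
    {"group": "B", "y": 10.0},
]

# Train one model per partition and register it under the partition name.
store = ModelStore()
for name, rows in partition(dataset, "group").items():
    store.save(name, MeanModel().fit(rows))


def predict(row):
    """Route a row to the model that was trained on its partition."""
    return store.load(row["group"]).predict(row)
```

Keeping the partitioner, the models, and the store behind small interfaces like these would let each part be swapped independently, for example an in-memory store in tests and a persistent one in production.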
17. Key takeaways
• Use your available time wisely (research vs. putting into production)
• Look for references (e.g. Google papers, Open Source frameworks)
• Work together to find the best solution for your project
21. To learn more about the meetup, follow the link:
https://www.meetup.com/Erlangen-Artificial-Intelligence-Machine-Learning-Meetup
Erlangen Artificial Intelligence & Machine Learning Meetup