A talk by Arif Wider & Emily Gorcenski presented at NDC Porto '20
Abstract:
It is already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, Continuous Delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months. Nevertheless, in the data science world Continuous Delivery is rarely been applied holistically.
This is partly due to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as-is to machine learning projects. Learn how we used our expertise in both fields to adapt practices and tools to allow for Continuous Intelligence–the practice of delivering AI applications continuously.
2. 7000+ technologists with 43 offices in 14 countries
Partner for technology driven business transformation
Barcelona - Madrid - London - Manchester - Berlin - Hamburg - Munich - Cologne
15. NEW CHALLENGES
15
→ How to version control training data?
→ Training data and prediction models don’t fit into Git :-(
→ Model re-training slows down the entire continuous delivery server!
→ Data scientists want to evaluate several solutions at the same time...
→ ...and they use analytics notebooks which are hard to version control!
→ How to unit test data science code that is tied to changing data?
→ How to prevent behaviour changes of the model to break the application?
17. Continuous Delivery is the ability to get
changes of all types — including new features,
configuration changes, bug fixes and
experiments — into production, or into the
hands of users, safely and quickly in a
sustainable way.
Jez Humble & Dave Farley