The popular version of applying deep learning is that you take an open-source or research model, train it on raw data and deploy the resulting model as a fully self-contained artefact. However, the reality is far more complex. For the training phase, users face an array of challenges including handling varied deep learning frameworks, hardware requirements and configurations, not to mention code quality, consistency and packaging. For the deployment phase, they face another set of challenges ranging from custom requirements for data pre- and post-processing, to inconsistencies across frameworks, to lack of standardization in serving APIs. The goal of the IBM Code Model Asset eXchange (MAX) is to remove these barriers to entry for developers to obtain, train and deploy open-source deep learning models for their enterprise applications. In building the exchange, we encountered all these challenges and more.
For the training phase, we aim to leverage the Fabric for Deep Learning (FfDL: https://github.com/IBM/FfDL), an open-source project providing framework-independent training of deep learning models on Kubernetes. For the deployment phase, MAX provides container-based, fully self-contained model artefacts, encompassing the end-to-end deep learning predictive pipeline and exposing a standardized REST API.
This talk explores the process of building MAX, the challenges and problems encountered, the solutions developed, the lessons learned along the way and the future and best practices for cross-framework, standardized deep learning model training and deployment.