Those of us who use TensorFlow often focus on building the model that's most predictive, not the one that's most deployable. So how do you put that hard work to work? In this talk, we'll walk through a strategy for taking your machine learning models from Jupyter Notebook into production and beyond.
2. Goals
● Explain some common problems that arise when developing
machine learning products
● Describe some of the solutions that are leveraged to solve those problems, along with their limitations
● Open source a library
3. Background
● BS Biomed Eng/Bioinformatics
● DNA stuff
○ Lots of data (wrangling)
○ Data pipelines
○ Data management
○ Data Science aspirant
● Currently
○ ML Engineering @ ByteCubed
○ Some frontend work
○ Putting models in the client’s hands
● Since always
○ Workflow optimization for ML/DS
6. Takeaways
● To make great products
○ Repeatability matters
○ Ease of deployment matters
○ Quick iteration is key
○ 80-20 rule
Ultimately, reduce friction between R&D and the rest of the Software Practice
7. Takeaways
● To make great products
○ “Do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.” [Google’s best practices for ML]
○ Provide DS with the right tooling, and the entire organization will benefit
8. TensorFlow Serving
A high-performance serving system for machine learning models, designed for production environments
Capabilities
● High-performance inference
● Model discovery
● RPC interface (gRPC and REST)
● Much more: tensorflow.org/serving
9. How?
Capabilities
● Keeping models loaded without needing to restore the dataflow graph and session
● gRPC interface for high-performance, low-latency inference
● Low-level control over how models are loaded
○ C++ API
10. Limitations
● Relies on a C++ API for low-level control
● Undocumented Python API
○ Small subset of features implemented
● gRPC API only
○ REST API (for inference only) introduced in August
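The REST endpoint mentioned above accepts a JSON request body. A minimal sketch of building one, following the `{"signature_name": …, "instances": […]}` layout documented for TensorFlow Serving's predict API (the five-feature input row here is made up for illustration):

```python
import json

def make_predict_request(instances, signature_name="serving_default"):
    """Build the JSON body for POST /v1/models/<model-name>:predict."""
    return json.dumps({
        "signature_name": signature_name,
        "instances": instances,  # one list of feature values per example
    })

# A single example with five features, matching a Dense(input_dim=5) model.
body = make_predict_request([[1.0, 2.0, 3.0, 4.0, 5.0]])
```

The same payload can be sent with any HTTP client; no gRPC stubs or protobuf definitions are needed, which is exactly why the REST interface lowers the integration barrier.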
11. Needs
● Pythonic API ✅
● JSON + REST ✅
● Test different models, architectures, configurations ✅
● Track models over time ✅
12. Current state
Model deployment frameworks
Existing solutions
● Clipper.ai → great, but not very active (last commit in June)
● P/SaaS platforms (Algorithmia, Rekognition, etc.)
● Mlflow.org → R&D, tracking of model performance
● Custom solutions of varied quality
13. Introducing racket
Minimalistic framework for ML model deployment & management
● Access to most of the TensorFlow Serving functionality in a Pythonic way
● Model exploration and deployment, consolidated
14. racket
Motivation
● Automation
○ Quick iteration
● Reduce boilerplate
○ Versioning & serving
● Exploration & visibility
○ Loading a different version
○ Ability to track & query model performance
● Flexibility
○ Multiple models can be loaded and are accessible through a simple API
15. racket
Features
● Automated model versioning
● RESTful interface with rich capabilities
○ Loading a different model
○ Loading a different version
○ Ability to track & query model performance
● Automatic API documentation with Swagger
● Train, explore, and deploy with a single tool
● CLI access
● Static typing
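To illustrate the kind of automated version management listed above: given the latest stored semantic version, a tool can derive the next patch release on its own. This `bump_patch` helper is a hypothetical sketch, not racket's actual API:

```python
def bump_patch(version: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"

# Storing a model with VERSION = '1.1.1' would be followed by '1.1.2'.
next_version = bump_patch("1.1.1")
```

Automating this removes one piece of boilerplate per training run and guarantees that no two stored models share a version.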
16. ML For Humans
Enabling integration with non-ML experts:
Developer perspective
● REST access
○ Ease of integration
● Ease of deployment
○ Containers not just an afterthought
● Visibility: discover shapes of inputs needed, shapes of outputs
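On discovering input/output shapes: TensorFlow Serving exposes a metadata endpoint (`GET /v1/models/<name>/metadata`) whose response describes each input's tensor shape. The sample response below is abbreviated and illustrative; consult tensorflow.org/serving for the full format:

```python
# Abbreviated, illustrative metadata response for a model whose single
# input takes a variable-size batch of 5-feature rows.
sample_metadata = {
    "metadata": {
        "signature_def": {
            "signature_def": {
                "serving_default": {
                    "inputs": {
                        "dense_input": {
                            "tensor_shape": {
                                "dim": [{"size": "-1"}, {"size": "5"}]
                            }
                        }
                    }
                }
            }
        }
    }
}

def input_shapes(metadata: dict) -> dict:
    """Map each input tensor name to its shape (-1 = variable batch dim)."""
    sig = metadata["metadata"]["signature_def"]["signature_def"]["serving_default"]
    return {
        name: [int(d["size"]) for d in spec["tensor_shape"]["dim"]]
        for name, spec in sig["inputs"].items()
    }

shapes = input_shapes(sample_metadata)
```

A developer integrating with the model can read off the expected shapes without asking the data scientist, which is the visibility point above.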
17. ML For Humans
Enabling integration with non-ML experts:
Data Scientist perspective
from racket import KerasLearner
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

class KerasModel(KerasLearner):
    VERSION = '1.1.1'
    MODEL_TYPE = 'regression'
    MODEL_NAME = 'keras-simple-regression'

    def build_model(self):
        optimizer = tf.train.RMSPropOptimizer(0.001)
        model = Sequential()
        model.add(Dense(24, input_dim=5, kernel_initializer='normal', activation='relu'))
        model.add(Dense(48, kernel_initializer='normal', activation='relu'))
        model.add(Dense(1, kernel_initializer='normal'))
        model.compile(loss='mean_absolute_error', optimizer=optimizer)
        return model

    def fit(self, x, y, x_val=None, y_val=None, epochs=2, batch_size=20):
        self.model.fit(x, y, epochs=epochs, batch_size=batch_size,
                       verbose=0, validation_data=(x_val, y_val))

if __name__ == '__main__':
    # x, y: feature matrix and targets, loaded elsewhere
    X_train, X_test, y_train, y_test = train_test_split(x, y)
    kf = KerasModel()
    kf.fit(X_train, y_train, x_val=X_test, y_val=y_test)
    kf.store(autoload=True)
● Define model, rest is taken care of
● Scoring done automatically
○ Including user-defined/multiple metrics
● Access to previous runs’ data
● Multiple models & versions
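The automatic scoring mentioned above boils down to evaluating held-out predictions against one or more metrics. A minimal, pure-Python sketch of two common regression metrics (racket's own metric names and storage are not shown here):

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy held-out targets and model predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

scores = {
    "mae": mean_absolute_error(y_true, y_pred),
    "mse": mean_squared_error(y_true, y_pred),
}
```

Computing and persisting such scores on every `store()` is what makes "access to previous runs' data" useful: versions can be compared on the same metrics over time.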