The advent of modern deep learning techniques has given organizations new tools to understand, query, and structure their data. However, maintaining complex pipelines, versioning models, and tracking accuracy regressions over time remain ongoing struggles for even the most advanced data engineering teams. This talk presents a simple architecture for deploying machine learning at scale and offers suggestions for how companies can get their feet wet with open-source technologies they already deploy.
38.
1. data collection & cleaning: more clean data, the better
2. model training & selection: anywhere from 8 hrs to 2 wks
3. serving in production: with real-time or batch requests
∞. rinse & repeat: keep models fresh with new data
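A minimal sketch of this loop in Python (every function here is a hypothetical stand-in, not code from the talk):

# Hypothetical skeleton of the four pipeline steps above.

def collect_and_clean():
    """Step 1: gather raw examples and filter out noisy/mislabeled data."""
    return [([0.0, 1.0], 0), ([1.0, 0.0], 1)]  # (features, label) pairs

def train_and_select(dataset):
    """Step 2: train candidate models, keep the best by validation score."""
    candidates = [lambda x: int(x[0] > x[1]), lambda x: 0]
    return max(candidates, key=lambda m: sum(m(f) == y for f, y in dataset))

def serve(model, request):
    """Step 3: answer a real-time (or batch) prediction request."""
    return model(request)

# Step ∞: rinse & repeat as new data arrives.
while True:
    data = collect_and_clean()
    model = train_and_select(data)
    print(serve(model, [0.9, 0.1]))
    break  # in production this loop runs on a retraining schedule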
51. get your feet wet
• TensorFlow MNIST Walkthrough: bit.ly/pavlovtensor
• Andrej Karpathy’s CS231n: bit.ly/pavlov231
Andrej Karpathy’s CS231n
bit.ly/pavlov231
52. suggested technologies
• Neural Network Libraries
• Caffe & CaffeOnSpark
• TensorFlow
• Torch
• Keras
• Hyperparameter Optimization
• MOE
• hyperopt
• Spearmint
• Infrastructure and Hardware
• Apache Spark & HDFS
• NVIDIA CUDA
• Amazon Web Services G2 instances
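To give a feel for these hyperparameter optimizers, here is a minimal hyperopt sketch; the objective is a toy stand-in for what would really be a training-and-validation run:

from hyperopt import fmin, tpe, hp, STATUS_OK

def objective(params):
    # Toy stand-in: pretend the best settings are lr=0.01, units=128.
    # A real objective would train a network and return validation error.
    err = (params["lr"] - 0.01) ** 2 + ((params["units"] - 128) / 512) ** 2
    return {"loss": err, "status": STATUS_OK}

space = {
    "lr": hp.loguniform("lr", -7, 0),            # roughly 1e-3 .. 1
    "units": hp.quniform("units", 32, 512, 32),  # hidden layer width
}

# Tree-structured Parzen Estimator search over 50 trials.
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
print(best)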
53. references
• icons by John Caserta, Liau Jian Jie, Garrett Knoll, Luboš Volkov, Noe Araujo from the Noun Project
• images from Andrej Karpathy
about me
• Alex Kern, co-founder & CTO of Pavlov
• we help you structure image & video w/ deep learning
• @KernCanCode on Twitter • @kern on GitHub
in summary
• deep learning is great for many kinds of media
• you can scale a deep learning system on Spark & AWS (see the sketch below)
• get started @ bit.ly/pavlovtensor & bit.ly/pavlov231
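To make the Spark point concrete, here is a minimal PySpark batch-scoring sketch; StubModel is hypothetical and stands in for loading a real trained network (Keras, Caffe, Torch, ...) on each worker:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-scoring").getOrCreate()
sc = spark.sparkContext

class StubModel:
    """Hypothetical stand-in for a trained network loaded on a worker."""
    def predict(self, features):
        return sum(features)  # placeholder score

def score_partition(rows):
    model = StubModel()  # load the model once per partition, not per record
    for record_id, features in rows:
        yield record_id, model.predict(features)

# Toy dataset of (id, feature vector) pairs; in practice this would be
# read from HDFS.
data = sc.parallelize([(i, [0.1 * i, 0.2 * i]) for i in range(1000)])
scores = data.mapPartitions(score_partition)
print(scores.take(5))

spark.stop()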