The ever-increasing interest in deep learning and neural networks has led to a proliferation of processing frameworks like TensorFlow and PyTorch. These libraries are built around the idea of a computational graph that models the dataflow of individual units. Because tensors are their basic computational unit, these frameworks can run efficiently on hardware accelerators (e.g., GPUs). Traditional machine learning (ML) models such as linear regressions and decision trees in scikit-learn cannot currently run on GPUs, missing out on the acceleration that deep learning and neural networks enjoy.
In this talk, we’ll show how you can use Hummingbird to achieve up to 1000x speedups in GPU inference by converting your traditional ML models to tensor-based models (PyTorch and TVM). https://github.com/microsoft/hummingbird
This talk is for intermediate audiences that use traditional machine learning and want to speed up inference with these models. After watching the talk, the audience should be able to use ~5 lines of code to convert their traditional models to tensor-based models and try them out on GPUs.
Outline:
Introduction of what ML inference is (and why it’s different from training)
Motivation: Tensor-based DNN frameworks allow inference on GPU, but “traditional” ML frameworks do not
Why “traditional” ML methods are important
Introduction of what Hummingbird does and main benefits
Deep dive on how traditional ML models are built
Brief intro on how the Hummingbird converter works
Example of how Hummingbird can convert a tree model into a tensor-based model
Other models
Demo
Status
Q&A
5. Machine Learning Prediction Serving
1. Learn: models are learned from data (Data → Training → Model)
2. Deploy: models are deployed and served together (Model → Server ← Users: prediction serving)
8. Model Serving
Specialized systems have been developed
Focus: Deep Learning (DL)
Support for traditional ML methods is largely overlooked
9. Traditional ML Models
2019 Kaggle Survey: The State of Data Science & Machine Learning
Data Science through the Looking Glass: https://arxiv.org/abs/1912.09536
10. Problem: Lack of Optimizations for Traditional ML Serving
Systems for training traditional ML models are not optimized for serving
Traditional ML models are expressed using imperative code in an ad-hoc fashion, not using a shared logical abstraction
Traditional ML models cannot natively exploit hardware acceleration
11. How do “Traditional ML Models” look inside?
<Example: Binary Classification>
Traditional ML Model: Tokenizer → {Char Ngram, Word Ngram} → Concat → Logistic Regression → 0 vs 1
Input row: A=a, B=0.1, C=c, D=0.5
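The text pipeline on this slide can be sketched in scikit-learn; the feature names and toy data below are illustrative, not from the talk. `FeatureUnion` plays the role of the Concat operator, merging the char-ngram and word-ngram feature blocks:

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ("features", FeatureUnion([               # Concat: merges the two feature blocks
        ("char_ngram", CountVectorizer(analyzer="char", ngram_range=(2, 3))),
        ("word_ngram", CountVectorizer(analyzer="word", ngram_range=(1, 2))),
    ])),
    ("clf", LogisticRegression()),            # binary predictor: 0 vs 1
])

texts = ["good movie", "bad movie", "great film", "awful film"]
labels = [1, 0, 1, 0]
pipeline.fit(texts, labels)
preds = pipeline.predict(["good film"])
```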
12. How do “Traditional ML Models” look inside?
<Example: Binary Classification>
DAG of Operators (aka pipeline): Split → {Scaler, OneHot} → Concat → Logistic Regression → 0 vs 1
Input row: A=a, B=0.1, C=c, D=0.5
13-14. How do “Traditional ML Models” look inside?
<Example: Binary Classification>
DAG of Operators (aka pipeline): Split → {Scaler, OneHot} → Concat → Logistic Regression → 0 vs 1
Featurizers: Split, Scaler, OneHot, Concat
Predictor: Logistic Regression
Input row: A=a, B=0.1, C=c, D=0.5
15-19. How do “Traditional ML Models” look inside?
<Example: Binary Classification>
DAG of Operators (aka pipeline): Split → {Scaler, OneHot} → Concat → Logistic Regression → 0 vs 1
Split: split the input into categorical (cat) and numerical (num) columns
Scaler: normalize num
OneHot: one-hot encode cat
Concat: merge the two vectors
Logistic Regression: compute the final score
Input row: A=a, B=0.1, C=c, D=0.5
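The tabular pipeline above maps directly onto scikit-learn building blocks (the column names and toy data here are illustrative): `ColumnTransformer` acts as the Split operator and implicitly Concats the transformed blocks before the predictor.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer          # Split: routes columns to sub-transformers
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({"A": ["a", "b", "a", "c"],
                   "B": [0.1, 0.2, 0.3, 0.4],
                   "C": ["c", "d", "c", "d"],
                   "D": [0.5, 0.6, 0.7, 0.8]})
y = [0, 1, 0, 1]

pre = ColumnTransformer([
    ("num", StandardScaler(), ["B", "D"]),             # normalize numerical columns
    ("cat", OneHotEncoder(), ["A", "C"]),              # one-hot encode categorical columns
])                                                     # outputs are concatenated (Concat)
model = Pipeline([("featurize", pre),
                  ("predict", LogisticRegression())])  # Predictor: 0 vs 1
model.fit(df, y)
preds = model.predict(df)
```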
21. Deep Learning
Primarily relies on the abstraction of tensors
DL models are expressed as a DAG of tensor operators:
User input X → MatMul(w1) → Add(b1) → ReLU → MatMul(w2) → Add(b2) → Sigmoid
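The DAG above can be written out directly as tensor operations. A minimal sketch in numpy (shapes are illustrative; in PyTorch the same chain would use `torch.matmul`, `torch.relu`, and `torch.sigmoid`):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))                  # user input: batch of 8, 4 features
w1, b1 = rng.standard_normal((4, 16)), rng.standard_normal(16)
w2, b2 = rng.standard_normal((16, 1)), rng.standard_normal(1)

h = relu(X @ w1 + b1)                            # MatMul -> Add -> ReLU
out = sigmoid(h @ w2 + b2)                       # MatMul -> Add -> Sigmoid
```

Every step is a uniform tensor operation, which is exactly what lets DL runtimes map the whole DAG onto a GPU.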
22. Systems for DL Prediction Serving
Exploit the abstraction of tensor operations (MatMul, Add, ReLU, …) to support multiple DL frameworks on multiple target environments
Benefits:
✔ Efficient implementations
✔ Declarative
✔ Seamless hardware acceleration
✔ Reduced engineering efforts
24. Converting ML Operators into Tensor Operations
Observation: pipelines are composed of two classes of operators
Algebraic operations, e.g., linear regression: Y = wX + b
Algorithmic operations, e.g., RandomForest, OneHotEncoder: complex data access patterns and control-flow patterns!
Our solution: make data access patterns and control flow uniform for all inputs
This introduces redundancies, both computational and storage
Depending on the level of redundancy introduced, there can be more than one potential compilation approach
Hummingbird picks the one that works best given pipeline statistics
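To make the "uniform data access" idea concrete, here is a sketch of how an algorithmic operator like one-hot encoding can be recast as pure tensor operations (the categories and inputs are made up for the example): instead of looking each value up in a dictionary, compare every input against every category at once, so every input executes the same branch-free computation.

```python
import numpy as np

categories = np.array([1, 5, 9])     # learned categories (illustrative)
x = np.array([5, 9, 1, 5])           # integer-encoded input column

# Broadcasted comparison: same ops for every input, no per-row control flow.
onehot = (x[:, None] == categories[None, :]).astype(np.float32)
print(onehot)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```

The redundancy (comparing against every category, not just the matching one) is the price paid for a computation that maps cleanly onto tensor runtimes and GPUs.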
34. Compiling Decision Tree-based Models
The above approach (the GEMM approach) essentially evaluates all paths in a decision tree model: computational redundancy.
It works surprisingly well on modern hardware in many cases!
There are two other tree-traversal-based methods that exploit the tree structure:
one for tall trees (e.g., LightGBM) and one for bushy trees (e.g., XGBoost)
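A minimal sketch of the GEMM idea on a hand-built depth-2 tree (the tree, matrices, and inputs are made up for illustration): all node conditions are evaluated at once with a matrix multiply, and a second multiply checks which leaf's path is fully satisfied.

```python
import numpy as np

# Example tree:
#   node0: x[0] < 0.5 ? -> leaf L0 : node1
#   node1: x[1] < 0.3 ? -> leaf L1 : leaf L2
A = np.array([[1., 0.],     # feature 0 is tested by internal node 0
              [0., 1.]])    # feature 1 is tested by internal node 1
b = np.array([0.5, 0.3])    # thresholds for the two internal nodes

# C[i, j]: +1 if leaf j's path needs node i's condition true,
#          -1 if it needs it false, 0 if node i is off the path.
C = np.array([[ 1., -1., -1.],
              [ 0.,  1., -1.]])
d = np.array([1., 1., 0.])  # count of "must be true" conditions per leaf

X = np.array([[0.2, 0.9],   # -> L0
              [0.8, 0.1],   # -> L1
              [0.8, 0.9]])  # -> L2

I = (X @ A < b).astype(np.float64)    # evaluate ALL node conditions at once
leaf = np.argmax(I @ C == d, axis=1)  # the leaf whose path is fully satisfied
print(leaf)  # [0 1 2]
```

Every input pays for every node in the tree, but the computation is two dense matrix products, which GPUs execute extremely well.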
36. Tree Traversal Method
Initial node index: 0; repeat while depth < max tree depth:
1. Gather the feature ids for the current nodes
2. Gather the corresponding feature values from the input X
3. Gather the thresholds for the current nodes
4. Compare: cond = (feature value < threshold)
5. Where(cond, Lefts, Rights): step to the left child if true, the right child if false
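The gather/where loop above can be sketched in numpy on a hand-built tree (the array layout is illustrative): the tree is stored as flat arrays indexed by node id, and leaves point to themselves so extra iterations are no-ops.

```python
import numpy as np

# Example tree: node0 (root, x[0] < 0.5), node1 = leaf L0,
# node2 (x[1] < 0.3), node3 = leaf L1, node4 = leaf L2.
feature   = np.array([0, 0, 1, 0, 0])
threshold = np.array([0.5, np.inf, 0.3, np.inf, np.inf])  # inf: leaves always self-loop
lefts     = np.array([1, 1, 3, 3, 4])
rights    = np.array([2, 1, 4, 3, 4])
values    = np.array([-1, 0, -1, 1, 2])   # class stored at each leaf (-1 = internal node)

X = np.array([[0.2, 0.9], [0.8, 0.1], [0.8, 0.9]])
idx = np.zeros(len(X), dtype=int)          # every row starts at the root
for _ in range(2):                         # max tree depth
    f = feature[idx]                       # Gather feature ids
    v = X[np.arange(len(X)), f]            # Gather feature values
    t = threshold[idx]                     # Gather thresholds
    idx = np.where(v < t, lefts[idx], rights[idx])  # Where(cond, Lefts, Rights)
preds = values[idx]
print(preds)  # [0 1 2]
```

The loop runs a fixed number of iterations for every input, so the control flow is uniform even though each row follows a different path through the tree.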
44. Hummingbird Updates
• Hummingbird has reached > 21K PyPI downloads and 2.4K GitHub stars
• Demoed at Microsoft Ignite
• Integrated with the ONNX converter tools
• OSDI paper
• New features include:
  • Pandas DataFrames
  • PySpark ML support
  • TVM support
• Looking for new users/contributors!