Speed up your Machine Learning workflows with built-in algorithms - Tel Aviv Summit 2018
1. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Julien Simon
Principal AI/ML Evangelist, Amazon Web Services
Speed up your Machine Learning
workflows with built-in algorithms
@julsimon
2.
Amazon SageMaker
Build
• Pre-built notebook instances
• Highly-optimized machine learning algorithms
Train
• One-click training for ML, DL, and custom algorithms
• Easier training with hyperparameter optimization
Deploy
• Deployment without engineering effort
• Fully-managed hosting at scale
3.
Amazon SageMaker: model options
Where the training code comes from:
• Amazon provided Algorithms: Matrix Factorization, Regression, Principal Component Analysis, K-Means Clustering, Gradient Boosted Trees, and more
• Bring Your Own Script
• Bring Your Own Container
• SageMaker Estimators in Apache Spark
4.
Amazon SageMaker: 10x better algorithms
• Streaming datasets, for cheaper training
• Train faster, in a single pass
• Greater reliability on extremely large datasets
• Choice of several ML algorithms
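To make "streaming" and "single pass" concrete, here is a minimal sketch of a linear model trained with stochastic gradient descent on examples that arrive one at a time. It is an illustration only, not SageMaker's implementation; the synthetic data generator, the function names, and the learning rate are all assumptions made for the example.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Example = Tuple[List[float], float]


def stream_examples(n: int, seed: int = 0) -> Iterator[Example]:
    """Yield synthetic (features, label) pairs one at a time.

    Hypothetical data: y = 2*x0 - 3*x1 + 1, with no noise.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        yield x, 2.0 * x[0] - 3.0 * x[1] + 1.0


def train_single_pass(examples: Iterable[Example], dim: int,
                      lr: float = 0.05) -> Tuple[List[float], float]:
    """Fit weights and bias while seeing each example exactly once.

    Memory use depends on `dim`, not on the number of examples.
    """
    w = [0.0] * dim
    b = 0.0
    for x, y in examples:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = pred - y
        # Gradient step for the squared error on this single example.
        for i in range(dim):
            w[i] -= lr * err * x[i]
        b -= lr * err
    return w, b


weights, bias = train_single_pass(stream_examples(20000), dim=2)
print(weights, bias)
```

Because the model state is a fixed-size vector, the dataset never has to fit in memory, which is the property the slide associates with cheaper training.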
5.
Infinitely scalable algorithms
6.
Streaming
[Diagram: a single GPU holding the State]
7.
Streaming
[Charts: Memory vs. Data Size; Time/Cost vs. Data Size]
8.
Distributed
[Diagram: three GPUs, each holding its own State]
9.
Shared State
[Diagram: three GPUs, each with a Local State, plus a Shared State]
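The local state / shared state pattern can be sketched as follows. The slides do not say what the state contains, so this example assumes the simplest useful case: each worker streams over its own shard and keeps running statistics (count, mean, sum of squared deviations), and the local states are then merged into one shared state. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class State:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Welford's streaming update for one value."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "State") -> "State":
        """Combine two states without revisiting the data (Chan et al.)."""
        n = self.count + other.count
        if n == 0:
            return State()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return State(n, mean, m2)


def run_worker(shard: Iterable[float]) -> State:
    """Each worker builds its local state in one pass over its shard."""
    local = State()
    for x in shard:
        local.update(x)
    return local


data = [float(i) for i in range(1, 101)]
shards = [data[i::3] for i in range(3)]  # three workers, as in the diagram

shared = State()
for shard in shards:
    shared = shared.merge(run_worker(shard))

variance = shared.m2 / shared.count
print(shared.count, shared.mean, variance)
```

The merged result matches what a single machine would compute over the full dataset, which is what lets the work be split across machines.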
10.
Cost vs. Time
[Chart: cost ($ to $$$$) vs. time (Minutes, Hours, Days, Weeks, Months); series: Best Alternative, Amazon SageMaker]
11.
Linear Learner
Regression (mean squared error)
SageMaker  Other
1.02       1.06
1.09       1.02
0.332      0.183
0.086      0.129
83.3       84.5
Classification (F1 Score)
SageMaker  Other
0.980      0.981
0.870      0.930
0.997      0.997
0.978      0.964
0.914      0.859
0.470      0.472
0.903      0.908
0.508      0.508
[Chart: Cost in Dollars (0 to 1.2) vs. Billable time in Minutes (0 to 30), on 30 GB datasets for web-spam and web-url classification; series: sagemaker-url, sagemaker-spam, other-url, other-spam]
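For reference, the two metrics reported on this slide can be computed as in the sketch below. These are the standard textbook definitions; the input values are made up for illustration and are not the benchmark data.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences (lower is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary labels (higher is better)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical values, chosen only to exercise the functions.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
f1 = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(mse, f1)
```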
12.
Factorization Machines
                 Log_loss  F1 Score  Seconds
SageMaker        0.494     0.277     820
Other (10 Iter)  0.516     0.190     650
Other (20 Iter)  0.507     0.254     1300
Other (50 Iter)  0.481     0.313     3250
Click Prediction: 1 TB advertising dataset, m4.4xlarge machines, perfect scaling.
[Chart: Cost in Dollars ($0 to $200) vs. Billable Time in Hours (1 to 8); points for 10, 20, 30, 40, and 50 machines]
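As background, a second-order factorization machine scores an input x as w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. The sketch below shows this standard formulation in two ways: the direct double loop over feature pairs, and the well-known rewrite that costs O(k * n) instead of O(k * n^2). It illustrates the model only, not SageMaker's training code, and all parameter values are hypothetical.

```python
from typing import List


def fm_predict_naive(x: List[float], w0: float, w: List[float],
                     v: List[List[float]]) -> float:
    """Direct evaluation: loops over every pair of features."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(v[i], v[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x: List[float], w0: float, w: List[float],
                    v: List[List[float]]) -> float:
    """Same result using 0.5 * ((sum v_if x_i)^2 - sum (v_if x_i)^2) per factor."""
    n = len(x)
    k = len(v[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


# Hypothetical parameters: 3 features, 2 latent factors.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
v = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]

naive = fm_predict_naive(x, w0, w, v)
fast = fm_predict_fast(x, w0, w, v)
print(naive, fast)
```

The fast form matters for sparse, high-dimensional inputs such as click logs, where only the nonzero features need to be visited.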
13.
Demo: building a movie recommender with Factorization Machines
https://medium.com/@julsimon/building-a-movie-recommender-with-factorization-machines-on-amazon-sagemaker-cedbfc8c93d8
14.
K-Means Clustering
Dataset      Size    k    SageMaker  Other
Text         1.2GB   10   1.18E3     1.18E3
                     100  1.00E3     9.77E2
                     500  9.18E2     9.03E2
Images       9GB     10   3.29E2     3.28E2
                     100  2.72E2     2.71E2
                     500  2.17E2     Failed
Videos       27GB    10   2.19E2     2.18E2
                     100  2.03E2     2.02E2
                     500  1.86E2     1.85E2
Advertising  127GB   10   1.72E7     Failed
                     100  1.30E7     Failed
                     500  1.03E7     Failed
Synthetic    1100GB  10   3.81E7     Failed
                     100  3.51E7     Failed
                     500  2.81E7     Failed
Running Time vs. Number of Clusters
[Chart: Billable Time in Minutes (0 to 8) vs. Number of Clusters (10, 100, 500); series: sagemaker, other]
~10x Faster!
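One way k-means can run in a single pass is the classic sequential (online) update, sketched below: each incoming point moves its nearest center by 1/count. This is an assumption-laden illustration of the streaming idea, not a description of the algorithm SageMaker actually ships; the tiny dataset and the seeding rule (first k points become centers) are hypothetical.

```python
from typing import Iterable, List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def online_kmeans(points: Iterable[Point], k: int) -> List[List[float]]:
    """Single pass: seed with the first k points, then nudge the nearest center."""
    centers: List[List[float]] = []
    counts: List[int] = []
    for p in points:
        if len(centers) < k:
            centers.append(list(p))
            counts.append(1)
            continue
        j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
        counts[j] += 1
        step = 1.0 / counts[j]
        centers[j] = [c + step * (x - c) for c, x in zip(centers[j], p)]
    return centers


def clustering_cost(points: Iterable[Point], centers: List[List[float]]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical data: two well-separated groups, interleaved as a stream.
group_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
group_b = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
stream = [p for pair in zip(group_a, group_b) for p in pair]

centers = online_kmeans(stream, k=2)
cost = clustering_cost(stream, centers)
print(centers, cost)
```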
15.
Principal Component Analysis (PCA)
More than 10x faster at a fraction of the cost!
Cost vs. Time
[Chart: Cost in Dollars (0 to 5) vs. Billable time in Minutes (0 to 50); series: other, sagemaker-deterministic, sagemaker-randomized]
Throughput and Scalability
[Chart: Mb/Sec/Machine (0 to 120) vs. Number of Machines (8, 10, 20); series: other, sagemaker-deterministic, sagemaker-randomized]
16.
Neural Topic Modeling
[Diagram: Input term counts vector; Encoder: feedforward net; Document Posterior; Sampled Document Representation; Decoder: Softmax; Output term counts vector]
Perplexity vs. Number of Topics
[Chart: Perplexity (0 to 12000) vs. Number of Topics (0 to 200); series: NTM, Other]
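Perplexity, the metric on the chart, is the exponential of the average negative log-likelihood per word, so lower is better. A minimal sketch of that standard definition follows; the probabilities are hypothetical and would normally come from the topic model's output distribution.

```python
import math
from typing import Sequence


def perplexity(word_probs: Sequence[float]) -> float:
    """exp(-(1/N) * sum(log p(w_i))) over the N observed words."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)


# Hypothetical case: the model assigns probability 0.25 to every observed word,
# which is as uncertain as a uniform choice among 4 words.
uniform_ppl = perplexity([0.25] * 10)
print(uniform_ppl)
```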
17.
DeepAR: Time Series Forecasting
                                                                  Mean absolute percentage error  P90 Loss
Dataset          Description                                      DeepAR  R                       DeepAR  R
traffic          Hourly occupancy rate of 963 Bay Area freeways   0.14    0.27                    0.13    0.24
electricity      Electricity use of 370 homes over time           0.07    0.11                    0.08    0.09
pageviews, 10k   Page view hits of websites                       0.32    0.32                    0.44    0.31
pageviews, 180k  Page view hits of websites                       0.32    0.34                    0.29    NA
One hour on p2.xlarge, $1
[Diagram: Input, Network]
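The two column headings can be read as follows. Mean absolute percentage error compares a point forecast with the actual value, while a P90 loss scores a 90th-percentile forecast with the quantile (pinball) loss. The sketch below uses the standard definitions; the slide does not state whether its P90 figures are normalized, so the unnormalized mean is an assumption, and all numbers are made up.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean of |actual - forecast| / |actual|; actual values must be nonzero."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def quantile_loss(actual: Sequence[float], forecast_q: Sequence[float],
                  q: float) -> float:
    """Mean pinball loss for quantile q (q = 0.9 for a P90 forecast).

    Under-forecasts are weighted by q, over-forecasts by (1 - q).
    """
    total = 0.0
    for a, f in zip(actual, forecast_q):
        diff = a - f
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(actual)


# Hypothetical values for illustration only.
actual = [100.0, 200.0, 400.0]
point_forecast = [110.0, 180.0, 400.0]
p90_forecast = [120.0, 190.0, 420.0]

mape_value = mape(actual, point_forecast)
p90_value = quantile_loss(actual, p90_forecast, q=0.9)
print(mape_value, p90_value)
```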
18.
DeepAR
https://arxiv.org/abs/1704.04110
19.
Demo: predicting world temperature with DeepAR
https://medium.com/@julsimon/predicting-world-temperature-with-time-series-and-deepar-on-amazon-sagemaker-e371cf94ddb5
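For readers preparing their own data, DeepAR reads training data as JSON Lines, one time series per line, with a "start" timestamp and a "target" array. The sketch below writes that shape using only the standard library. The helper name and the temperature values are hypothetical, and the field names reflect the algorithm's documented input format as best understood here, so check them against the current SageMaker documentation.

```python
import json
from typing import List, Tuple


def to_deepar_jsonl(series: List[Tuple[str, List[float]]]) -> str:
    """Serialize (start timestamp, values) pairs, one JSON object per line."""
    lines = [json.dumps({"start": start, "target": target})
             for start, target in series]
    return "\n".join(lines)


# Hypothetical yearly temperature series, for illustration only.
example_series = [
    ("1900-01-01 00:00:00", [13.2, 13.5, 14.1]),
    ("1950-01-01 00:00:00", [13.9, 14.0]),
]

jsonl = to_deepar_jsonl(example_series)
print(jsonl)
```

Series may have different lengths and start dates, since each line is parsed independently.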
20.
More built-in algorithms
21.
Spectral LDA
Training Time vs. Number of Topics
[Chart: Training Time in Minutes (0 to 250) vs. Number of Topics (0 to 100); series: lda-data-a, lda-data-b, other-data-a, other-data-b]
22.
Boosted Decision Trees
XGBoost is one of the most commonly used classifiers.
https://github.com/dmlc/xgboost
Throughput vs. Number of Machines
[Chart: Throughput in MB/Sec (0 to 1400) vs. Number of Machines (C4.8xLarge) (0 to 70)]
23.
Sequence to Sequence
• Based on Sockeye and Apache MXNet.
• Multi-GPU.
• Can be used for Neural Machine Translation.
• Supports both RNN/CNN as encoder/decoder.
English-German Translation
[Chart: BLEU Score (0 to 25) vs. Billable Time in Hours (0 to 25); series: P2.16x, P2.8x, P2.x]
Best known result!
24.
Sockeye
https://arxiv.org/abs/1712.05690
https://github.com/awslabs/sockeye
25.
Image Classification
• ResNet implementation with Apache MXNet.
• More networks to come.
• Transfer learning: begin with a model already trained on ImageNet!
Linear Speedup with Horizontal Scaling
[Chart: Speedup (0 to 3.5) vs. Number of Machines (P2) (0 to 5)]
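"Linear speedup" has a simple arithmetic meaning: with n machines the job takes 1/n of the time, so the total billed machine-hours (and therefore the cost) stay flat while the wait shrinks. The sketch below works through that arithmetic. The timings and the hourly price are hypothetical placeholders, not values read from the chart.

```python
def speedup(t_single: float, t_multi: float) -> float:
    """How many times faster the multi-machine run is."""
    return t_single / t_multi


def scaling_efficiency(t_single: float, t_multi: float, machines: int) -> float:
    """1.0 means perfectly linear scaling."""
    return speedup(t_single, t_multi) / machines


def job_cost(hours: float, machines: int, price_per_hour: float) -> float:
    """Billed cost: every machine is charged for the full duration."""
    return hours * machines * price_per_hour


# Hypothetical run: 8 hours on 1 machine, 2 hours on 4 machines,
# at a placeholder price of $1.00 per machine-hour.
s = speedup(8.0, 2.0)
eff = scaling_efficiency(8.0, 2.0, machines=4)
cost_one = job_cost(8.0, 1, 1.0)
cost_four = job_cost(2.0, 4, 1.0)
print(s, eff, cost_one, cost_four)
```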
26.
Demo: fine-tuning an image classification model
https://medium.com/@julsimon/image-classification-on-amazon-sagemaker-9b66193c8b54
27.
Latest addition: BlazingText
https://dl.acm.org/citation.cfm?id=3146354
28.
Resources
https://aws.amazon.com/machine-learning
https://aws.amazon.com/blogs/ai
https://aws.amazon.com/sagemaker (free tier available)
https://github.com/awslabs/amazon-sagemaker-examples
An overview of Amazon SageMaker https://www.youtube.com/watch?v=ym7NEYEx9x4
https://medium.com/@julsimon
29.
Thank you!
Julien Simon
Principal AI/ML Evangelist, Amazon Web Services
@julsimon