Build, train and deploy your ML models with Amazon SageMaker
AWS User Group Bengaluru
1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
BENGALURU
Build, Train and Deploy your ML
models with Amazon SageMaker
Sangeetha Krishnan, Member of Technical Staff | Adobe
ABOUT MYSELF
● Software Development Engineer at Adobe Systems.
● Part of the CloudTech & Adobe I/O Events development team.
● Areas of interest:
○ Machine Learning
○ Natural Language Processing
○ Music + Travel!
WHY AMAZON SAGEMAKER?
★ It is a fully managed machine learning service
★ Quick and easy to build, train and deploy your ML models
★ Integrated Jupyter notebooks
★ Several built-in machine learning algorithms provided by Amazon SageMaker
★ Automatic model tuning to find the best-performing hyperparameters
★ Easy deployment to production
★ Automatic scaling for production variants
AGENDA
● Getting started with Amazon SageMaker
● Built-in ML Algorithms
● Hyperparameter tuning
● Accessing the model endpoints
● Blue/Green Deployments
● Security and Best Practices
Getting Started with Amazon SageMaker
Storage and deployment dependencies:
1. S3 bucket
2. EC2 Container Registry
3. Notebook instance
Setting up the prerequisites and Notebook instance
● Create an S3 bucket (preferably with a name starting with sagemaker-*)
● Create a notebook instance: https://console.aws.amazon.com/sagemaker/
● Create an IAM role that grants access to the S3 bucket you created
● Create a Jupyter notebook
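The bucket step above can be sketched in Python. This is a minimal sketch, not the speaker's code: the helper name `bucket_request` and the example bucket name are hypothetical, and the dictionary it builds is meant to be passed to boto3's `create_bucket`. Only the `sagemaker-` prefix convention comes from the slide.

```python
import re

# Basic S3 naming rules: 3-63 characters, lowercase letters, digits, dots and
# hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_request(name, region):
    """Build keyword arguments for s3_client.create_bucket (hypothetical helper)."""
    if not name.startswith("sagemaker-"):
        raise ValueError("bucket name should start with 'sagemaker-'")
    if not _BUCKET_NAME.fullmatch(name):
        raise ValueError("not a valid S3 bucket name: " + name)
    request = {"Bucket": name}
    # us-east-1 is the default location and must not be passed as a constraint.
    if region != "us-east-1":
        request["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return request


# Example with a made-up bucket name. To create the bucket for real:
#   boto3.client("s3", region_name="ap-southeast-2").create_bucket(**request)
request = bucket_request("sagemaker-sms-spam-demo", "ap-southeast-2")
print(request)
```

Keeping the request construction separate from the AWS call makes the naming rule easy to check before anything is created in the account.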
SMS Spam Detection
● Problem Statement: Given an SMS text, determine whether the message is spam or ham (non-spam)
● Kaggle dataset: https://www.kaggle.com/uciml/sms-spam-collection-dataset
● The first column of the dataset is the label (spam/ham); the second column is the SMS text.
● 5574 messages in the dataset: 87% ham (4850), 13% spam (724)
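The class balance quoted above can be reproduced with a few lines of Python. This is a sketch under assumptions: the function name `label_counts` is hypothetical, the header row `v1,v2` is assumed, and the three sample messages are made up for illustration rather than taken from the dataset.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file):
    """Count labels in a CSV whose first column is the label (spam/ham)."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumed to be present)
    return Counter(row[0] for row in reader if row)


# Made-up sample in the same two-column layout as the dataset.
sample = io.StringIO(
    "v1,v2\n"
    'ham,"Ok, see you at 5"\n'
    'spam,"WINNER! Claim your free prize now"\n'
    "ham,On my way\n"
)
counts = label_counts(sample)
total = sum(counts.values())
for label, n in counts.items():
    print(f"{label}: {n} ({n / total:.0%})")
```

For the real file, pass an open file handle instead of the `StringIO` sample; the encoding of the downloaded CSV may need to be specified explicitly.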
Built-in Machine Learning Algorithms
Different Categories of Algorithms
Supervised Learning
● DeepAR Forecasting
● Factorization Machines
● Linear Learner
● XGBoost
● K-Nearest Neighbours
Unsupervised Learning
● K-Means
● PCA
● Random Cut Forest
Image and Object Focused
● Image Classification (CNN-based)
● Object Detection
Text Related
● BlazingText
● LDA
● Sequence2Sequence
● Neural Topic Model
Problem categories
Factorization Machines
● Recommendation systems
● Ad-click predictions
XGBoost
● Fraud predictions
DeepAR Forecasting
● Traffic, electricity, pageviews
Random Cut Forests
● Detecting anomalous data points
K-Means algorithm and KNN
● Document clustering
● Identifying related articles
Image and Object Detection and Classification
BlazingText and Seq2Seq
● Sentiment Analysis
● Named Entity Recognition
● Machine Translation
Advantages of using Built-in Algorithms
● Designed for solving issues in a commercial setting
● General-purpose algorithms
● Designed for training on huge datasets
● Can support terabytes of data
● Greater reliability
● Faster training and streaming of the datasets
● Training on multiple instances that share their state
BlazingText Algorithm
● Provides highly optimized implementations of the Word2vec and text classification algorithms.
● Is used in problems like text classification, named entity recognition, machine translation, etc.
● Word2vec generates word embeddings
● Can generate meaningful vectors for out-of-vocabulary words
● Captures semantic relationships between words
● Useful for many NLP problems
● Can be trained on huge datasets in a couple of minutes
● Supports multi-core CPU and single GPU modes for training
Data format
● Supervised Learning
● Binary classification
Text Classification:
● Training and Validation set (text file), one example per line:
__label__spam sms_text1
__label__ham sms_text2
● Test Data (JSON request)
{"instances": ["sms_text_1", "sms_text_2", ... ]}
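The two data formats above can be produced with a short helper each. This is a minimal sketch: the function names are hypothetical, and lowercasing plus whitespace tokenization is an assumed preprocessing choice, not something stated in the talk.

```python
import json


def to_blazingtext_line(label, text):
    """Format one training example as '__label__<label> <space-separated tokens>'."""
    # Assumption: lowercase and split on whitespace; a real pipeline may use
    # a proper tokenizer instead.
    tokens = text.lower().split()
    return "__label__" + label + " " + " ".join(tokens)


def to_inference_request(messages):
    """Build the JSON request body expected at inference time."""
    return json.dumps({"instances": list(messages)})


print(to_blazingtext_line("spam", "Free  entry NOW"))
print(to_inference_request(["sms_text_1", "sms_text_2"]))
```

Writing one such line per message to a text file gives the training and validation sets; the JSON body is what gets sent to the deployed endpoint.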
Hyperparameter Tuning
Tunable parameters for BlazingText Classification Model
Parameter        Range
buckets          [1000000, 10000000]
epochs           [5, 15]
learning_rate    [0.005, 0.05]
min_count        [0, 100]
mode             ['supervised']
vector_dim       [32, 300]
word_ngrams      [1, 3]
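One way to use this table is to check a planned search space against it before launching a tuning job. The sketch below is illustrative only: `LISTED_RANGES` copies the numeric rows of the table, the helper `outside_listed_ranges` is hypothetical, and the requested ranges are the ones used in the tuning code later in the talk.

```python
# Numeric ranges copied from the table of tunable parameters.
LISTED_RANGES = {
    "buckets": (1_000_000, 10_000_000),
    "epochs": (5, 15),
    "learning_rate": (0.005, 0.05),
    "min_count": (0, 100),
    "vector_dim": (32, 300),
    "word_ngrams": (1, 3),
}


def outside_listed_ranges(requested):
    """Return {parameter: message} for requested (low, high) ranges that exceed the table."""
    problems = {}
    for name, (low, high) in requested.items():
        if name not in LISTED_RANGES:
            problems[name] = "not a listed tunable parameter"
            continue
        listed_low, listed_high = LISTED_RANGES[name]
        if low < listed_low or high > listed_high:
            problems[name] = f"[{low}, {high}] extends beyond [{listed_low}, {listed_high}]"
    return problems


requested = {
    "learning_rate": (0.01, 0.05),
    "vector_dim": (20, 50),
    "word_ngrams": (1, 4),
}
for name, message in outside_listed_ranges(requested).items():
    print(name, message)
```

A flagged parameter is not necessarily an error; the check only reports where a search space goes beyond the ranges listed here.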
import sagemaker
from sagemaker.tuner import HyperparameterTuner, IntegerParameter, ContinuousParameter

# container, role, s3_output_location and sess are assumed to be defined
# earlier in the notebook.
# Argument names follow SageMaker Python SDK v1; v2 renames them
# (e.g. train_instance_count -> instance_count).
bt_model = sagemaker.estimator.Estimator(container,
                                         role,
                                         train_instance_count=1,
                                         train_instance_type='ml.m4.xlarge',
                                         train_volume_size=30,    # GB
                                         train_max_run=360000,    # seconds
                                         input_mode='File',
                                         output_path=s3_output_location,
                                         sagemaker_session=sess)

# Fixed hyperparameters shared by every tuning job.
bt_model.set_hyperparameters(mode="supervised",
                             epochs=10,
                             min_count=2,
                             early_stopping=True,
                             patience=4,
                             min_epochs=5)

# Search space. Note that vector_dim and word_ngrams extend beyond the
# ranges listed above for the tunable parameters (32-300 and 1-3).
hyperparameter_ranges = {'learning_rate': ContinuousParameter(0.01, 0.05),
                         'vector_dim': IntegerParameter(20, 50),
                         'word_ngrams': IntegerParameter(1, 4)}

objective_metric_name = 'validation:accuracy'
objective_type = 'Maximize'

# Run 4 training jobs in total, at most 2 at a time.
tuner = HyperparameterTuner(estimator=bt_model,
                            objective_metric_name=objective_metric_name,
                            objective_type=objective_type,
                            hyperparameter_ranges=hyperparameter_ranges,
                            max_jobs=4,
                            max_parallel_jobs=2)
Hyperparameter Tuning Jobs
blazingtext-181002-1157-001-073948fb
blazingtext-181002-1157-002-fba7f0b8
blazingtext-181002-1157-003-f7407aa1
blazingtext-181002-1157-004-3a4b4863
Maximize/Minimize Objective Metric
● In this example, the objective is to maximize the 'validation:accuracy' metric
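How the objective type decides the winning job can be shown with a small selection function. This is a sketch only: the function `best_job`, the job names and the metric values are all hypothetical and are not results from the talk.

```python
def best_job(results, objective_type="Maximize"):
    """Pick the job with the best objective metric value."""
    if objective_type not in ("Maximize", "Minimize"):
        raise ValueError("objective_type must be 'Maximize' or 'Minimize'")
    pick = max if objective_type == "Maximize" else min
    return pick(results, key=lambda job: job["metric"])


# Made-up results for illustration only.
results = [
    {"name": "job-a", "metric": 0.91},
    {"name": "job-b", "metric": 0.97},
    {"name": "job-c", "metric": 0.94},
]
print(best_job(results, "Maximize")["name"])
print(best_job(results, "Minimize")["name"])
```

With the SageMaker Python SDK, `HyperparameterTuner.best_training_job()` returns the name of the winning job once tuning has finished, so this selection does not need to be done by hand.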
Accessing the Model Endpoints
Accessing the Model Endpoints via Postman
● Get the invocation endpoint from the model details in the Amazon SageMaker console
Accessing the Model Endpoints via Postman
● Generate the access key ID and secret access key from your security credentials. These are used to create the Authorization token in Postman
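The pieces that go into the Postman request can be collected in one place. This is a sketch under assumptions: the endpoint name `spam-detector` is made up, the dictionary keys are just labels for this example (not a Postman file format), and the URL follows the usual SageMaker runtime pattern for invoking an endpoint.

```python
from urllib.parse import quote


def invocation_url(region, endpoint_name):
    """Build the HTTPS URL used to invoke a SageMaker endpoint."""
    return (
        "https://runtime.sagemaker." + region + ".amazonaws.com/endpoints/"
        + quote(endpoint_name, safe="") + "/invocations"
    )


def postman_settings(region, endpoint_name):
    """Summarize what to enter in Postman (keys are illustrative labels)."""
    return {
        "method": "POST",
        "url": invocation_url(region, endpoint_name),
        "auth_type": "AWS Signature",   # uses the access key ID and secret access key
        "aws_region": region,
        "service_name": "sagemaker",
        "headers": {"Content-Type": "application/json"},
    }


settings = postman_settings("ap-southeast-2", "spam-detector")
for key, value in settings.items():
    print(key, "=", value)
```

The request body is the JSON document with the `instances` list; the access key ID and secret access key go into the AWS Signature authorization fields rather than into the URL.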
Accessing the Model Endpoints via Lambda Functions
Setting up a Lambda function to access the SageMaker endpoint and return the response
Setting up the API Gateway that
1. Accepts the client request
2. Forwards the request parameters to the Lambda function and waits for the response
3. Forwards the response to the client
Lambda Function
import os
import json
import boto3

# Name of the deployed SageMaker endpoint, supplied as a Lambda environment variable.
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
runtime = boto3.Session().client(service_name='sagemaker-runtime', region_name='ap-southeast-2')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))
    # event is the SMS text; json.dumps takes care of quoting and escaping.
    payload = json.dumps({"instances": [event]})
    print(payload)
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=payload)
    predictions = response['Body'].read().decode('utf8')
    print(predictions)
    return predictions
API Gateway Configuration
Curl request:
curl -X POST https://los599anje.execute-api.ap-southeast-2.amazonaws.com/test/spamdetection -d '"Hello from Airtel. For 1 months free access call 9113851022"'
Response:
"[{\"prob\": [0.880291223526001], \"label\": [\"__label__spam\"]}]"
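On the client side, the response shown above can be reduced to a label and a probability. This is a minimal sketch with a hypothetical function name; it assumes the body is either the JSON list itself or that list wrapped once more as a JSON string, which can happen when the Lambda function returns a string.

```python
import json


def parse_prediction(body):
    """Return (label, probability) for the first message in the response."""
    data = json.loads(body)
    # If the list arrived wrapped as a JSON string, decode it a second time.
    if isinstance(data, str):
        data = json.loads(data)
    first = data[0]
    label = first["label"][0]
    prefix = "__label__"
    if label.startswith(prefix):
        label = label[len(prefix):]
    return label, first["prob"][0]


body = '[{"prob": [0.880291223526001], "label": ["__label__spam"]}]'
print(parse_prediction(body))
```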
Blue/Green Deployments using Amazon SageMaker
Deploying multiple models to the same endpoint
● Particularly useful in blue/green deployments
● Blue deployment -> current deployment
● Green deployment -> the new model that is to be tested in production
● Test the green deployment by diverting a small share of traffic to it, using ProductionVariant in SageMaker.
● Easy rollback to the blue deployment if the green one fails.
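The variant setup described above can be sketched as the `ProductionVariants` list that boto3's `create_endpoint_config` accepts. The helper names, model names, variant names, instance type and weights below are assumptions for illustration, not values from the talk.

```python
def production_variants(blue_model, green_model, blue_weight, green_weight,
                        instance_type="ml.m4.xlarge", instance_count=1):
    """Build a ProductionVariants list with one blue and one green variant."""
    def variant(name, model, weight):
        return {
            "VariantName": name,
            "ModelName": model,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
            "InitialVariantWeight": weight,
        }
    return [variant("blue", blue_model, blue_weight),
            variant("green", green_model, green_weight)]


def traffic_share(variants):
    """Weights are relative: each variant receives weight / sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical model names; 90% of traffic stays on blue, 10% goes to green.
variants = production_variants("spam-model-v1", "spam-model-v2", 90, 10)
print(traffic_share(variants))
# To apply (requires AWS credentials):
#   boto3.client("sagemaker").create_endpoint_config(
#       EndpointConfigName="...", ProductionVariants=variants)
```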
Steps in Blue/Green Deployment
[Figure: traffic from the user through the SageMaker endpoint to variant A (blue) and variant B (green), shown in stages: A 100%; A 90% and B 10%; A 0% and B 100%; B 100%]
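The stages in the figure map onto a sequence of weight updates. The sketch below builds the keyword arguments for boto3's `update_endpoint_weights_and_capacities`, one request per stage; the helper name and the endpoint name are hypothetical, and the variant names A and B follow the figure.

```python
# (weight for A, weight for B) at each stage of the rollout.
STAGES = [(100, 0), (90, 10), (0, 100)]


def shift_requests(endpoint_name, stages, blue="A", green="B"):
    """Build one update request per stage, in rollout order."""
    requests = []
    for blue_weight, green_weight in stages:
        requests.append({
            "EndpointName": endpoint_name,
            "DesiredWeightsAndCapacities": [
                {"VariantName": blue, "DesiredWeight": blue_weight},
                {"VariantName": green, "DesiredWeight": green_weight},
            ],
        })
    return requests


requests = shift_requests("spam-detector", STAGES)
for request in requests:
    print(request["DesiredWeightsAndCapacities"])
# To apply a stage (requires AWS credentials):
#   boto3.client("sagemaker").update_endpoint_weights_and_capacities(**request)
```

Rolling back is the same call with an earlier stage's weights, which is what makes the blue deployment easy to restore if the green one misbehaves.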
Switching to the new Model
Production Variant Deployment
Invocation Metrics
Security and Best Practices
Deployment Recommendations
● Deploy multiple instances for each production endpoint
● Deploy in a VPC with more than one subnet, spread across multiple Availability Zones
Security Practices
● Specify a Virtual Private Cloud (VPC) for your notebook instance, with outbound connections via Network Address Translation (NAT)
● Exercise judgement when granting individuals access to notebook
instances that are attached to a VPC that contains sensitive
information.
Securing Training jobs and Endpoints
● Run training jobs in a private VPC.
● Create a VPC endpoint to access S3
● Configure a custom policy that allows access to S3 only from your private VPC
● To deny access to certain resources, add explicit deny statements to the custom policy
● Configure a security group rule that allows inbound communication between members of the same security group
● Configure a NAT gateway that allows only outbound connections from the private VPC.
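The custom S3 policy mentioned above can be sketched as a bucket policy that denies any request not arriving through a given VPC endpoint. This is an illustration under assumptions: the helper name, bucket name and endpoint ID are made up, and the policy uses the `aws:sourceVpce` condition key.

```python
import json


def vpce_only_bucket_policy(bucket, vpc_endpoint_id):
    """Deny all S3 actions on the bucket unless they come via the VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyAccessOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::" + bucket,
                "arn:aws:s3:::" + bucket + "/*",
            ],
            "Condition": {"StringNotEquals": {"aws:sourceVpce": vpc_endpoint_id}},
        }],
    }


# Placeholder bucket name and endpoint ID for illustration.
policy = vpce_only_bucket_policy("sagemaker-sms-spam-demo", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

A deny statement like this also blocks access from the AWS console and from any machine outside the VPC, so it is worth testing on a non-critical bucket first.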
BENGALURU
THANK YOU