SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building Machine Learning inference
pipelines at scale
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Problem statement
• Real-life Machine Learning applications require more than predicting with a
single model.
• Data may need pre-processing: normalization, feature engineering,
dimensionality reduction, etc.
• Predictions may need post-processing: filtering, sorting, combining, etc.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Build and deploy ML pipelines with minimal infrastructure drama!
1. Spark (on Amazon EMR)
2. Spark + Amazon SageMaker
3. Amazon SageMaker, aka Inference Pipelines
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Apache Spark
• Open-source, distributed processing system.
• In-memory caching and optimized execution for fast performance (typically
100x faster than Hadoop).
• Batch processing, streaming analytics, machine learning, graph databases
and ad hoc queries.
• API for Java, Scala, Python, R, and SQL.
https://spark.apache.org/
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Apache Spark – DataFrame
• Distributed collection of data organized into named columns
• Conceptually equivalent to a table in a relational database
• Wide array of sources: structured files, databases
• Wide array of formats: text, CSV, JSON, Avro, ORC, Parquet
{"name": "Jeff"}
{"name": "Boaz", "age":72}
{"name": "Julien", "age":12}
df = spark.read.json("people.json")
df.show()
+----+-------+
| age| name |
+----+-------+
|null| Jeff |
| 72 | Boaz |
| 12 | Julien|
+----+-------+
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
MLlib – Machine learning library
• Algorithms: classification, regression, clustering, collaborative filtering.
• Featurization: feature extraction, transformation, dimensionality reduction.
• Tools for constructing, evaluating and tuning pipelines
• Transformer – a transform function that maps a DataFrame into a new one
• Adding a column, changing the rows of a specific column, etc.
• Predicting the label based on the feature vector
• Estimator – an algorithm that trains on data
• Consists of a fit() function that maps a DataFrame into a Model
https://spark.apache.org/docs/latest/ml-guide.html
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Example: binary classification for text samples
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/PipelineExample.scala
// Prepare training documents from a list of (id, text, label) tuples.
val training = <LOAD_TRAINING_DATA>
// Configure an ML pipeline with three stages: tokenizer, hashingTF, and lr.
val tokenizer = new Tokenizer().setInputCol("text”).setOutputCol("words")
val hashingTF = new HashingTF()
.setNumFeatures(1000).setInputCol(tokenizer.getOutputCol).setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
// Fit the pipeline to training documents.
val model = pipeline.fit(training)
// Prepare test documents, which are unlabeled (id, text) tuples.
val test = <LOAD_TEST_DATA>
// Make predictions on test documents.
model.transform(test)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Serving SparkML predictions
• Train and predict in the same application (see previous example)
• Or save the model and load it in another Spark application
• Or export the model to PMML and load it elsewhere (Java, R, etc.)
• Or export the model to MLeap
• http://mleap-docs.combust.ml/
• Lightweight runtime independent from Spark
• Interoperability between SparkML,
TensorFlow and scikit-learn
In any case, you need to build and
maintain prediction infrastructure :-/
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon SageMaker:
Build, Train, and Deploy ML Models at Scale
Collect and prepare
training data
Choose and optimize
your
ML algorithm
Train and
Tune ML Models
Set up and
manage
environments
for training
Deploy models
in production
Scale and manage
the production
environment
1
2
3
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon SageMaker SDK for Spark
• Python and Scala SDK, for Apache Spark 2.1.1 and 2.2.
• Pre-installed on EMR 5.11 and later.
• Train, import, deploy and predict with SageMaker models directly from your
Spark application.
• Standalone,
• Integration in Spark MLlib pipelines.
• DataFrames in, DataFrames out:
automatic data conversion to and from protobuf (crowd goes wild!)
https://github.com/aws/sagemaker-spark
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Reason #1 - Decouple ETL and Machine Learning
• Different workloads require different instance types
• Say, R4 for ETL, P3 for training and C5 for prediction?
• If you need GPUs for training, running ETL on GPU instances wouldn’t be the
best option…
• Size and scale them independently
• Avoid oversizing your Spark cluster.
• Avoid time-consuming resizing operations on Amazon EMR.
• Run ETL once, train many models in parallel.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Reason #2 - Run any ML algorithm in any language
• Spark MLlib is great, but you may need something else
• Other ML algorithms
• Deep Learning libraries, like TensorFlow or Apache MXNet
• Your own custom code in any language
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Reason #3 - Get the best prediction performance
• Perform ML predictions without using Spark.
• Save the overhead of the Spark framework
• Save loading your data in a DataFrame
• Amazon SageMaker can deploy MLeap models
• Improve latency for small-batch predictions.
• It can be difficult to achieve low-latency predictions with Spark ML models
• Get real-time predictions with models hosted in Amazon SageMaker
• Use optimized instances for prediction
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Sample use cases for Spark and SageMaker
• Data preparation and feature engineering + training
• Data transformation + batch prediction (model reuse)
• Data cleaning/enrichment with predictions
• Predict missing values instead of using median.
• Add new predicted features.
• Deploying a SparkML model at scale
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Inference Pipelines
• Linear sequence of 2-5 containers that process inference requests
• Feature engineering with scikit-learn or SparkML
(on AWS Glue or Amazon EMR)
• Predict with built-in or custom containers
• The pipeline is deployed as a single model
• Useful to preprocess, predict, and post-process
• Available for real-time prediction and batch transform
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Getting started
https://ml.aws
https://aws.amazon.com/sagemaker
https://github.com/awslabs/amazon-sagemaker-examples
https://medium.com/@julsimon
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thank you!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon

Weitere ähnliche Inhalte

Was ist angesagt?

AIM301 - Breaking Language Barriers With AI - Tel Aviv Summit 2019
AIM301 - Breaking Language Barriers With AI - Tel Aviv Summit 2019AIM301 - Breaking Language Barriers With AI - Tel Aviv Summit 2019
AIM301 - Breaking Language Barriers With AI - Tel Aviv Summit 2019Boaz Ziniman
 
Creare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesCreare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesAmazon Web Services
 
Progetta, crea e gestisci Modern Application per web e mobile su AWS
Progetta, crea e gestisci Modern Application per web e mobile su AWSProgetta, crea e gestisci Modern Application per web e mobile su AWS
Progetta, crea e gestisci Modern Application per web e mobile su AWSAmazon Web Services
 
Build-Train-Deploy-Machine-Learning-Models-at-Any-Scale
Build-Train-Deploy-Machine-Learning-Models-at-Any-ScaleBuild-Train-Deploy-Machine-Learning-Models-at-Any-Scale
Build-Train-Deploy-Machine-Learning-Models-at-Any-ScaleAmazon Web Services
 
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...Amazon Web Services
 
Best-Practices-for-Running-Windows-Workloads-on-AWS
Best-Practices-for-Running-Windows-Workloads-on-AWSBest-Practices-for-Running-Windows-Workloads-on-AWS
Best-Practices-for-Running-Windows-Workloads-on-AWSAmazon Web Services
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerAmazon Web Services
 
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS SummitWork with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS SummitAmazon Web Services
 
Accelerating-ML-Adoption-with-Our-New-AI-Services
Accelerating-ML-Adoption-with-Our-New-AI-ServicesAccelerating-ML-Adoption-with-Our-New-AI-Services
Accelerating-ML-Adoption-with-Our-New-AI-ServicesAmazon Web Services
 
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...Amazon Web Services
 
Architetture per l'analisi di flussi di dati in tempo reale
Architetture per l'analisi di flussi di dati in tempo realeArchitetture per l'analisi di flussi di dati in tempo reale
Architetture per l'analisi di flussi di dati in tempo realeAmazon Web Services
 
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...Amazon Web Services
 
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SIAmazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SIAmazon Web Services
 
Build accurate training datasets with Amazon SageMaker Ground Truth - AIM205 ...
Build accurate training datasets with Amazon SageMaker Ground Truth - AIM205 ...Build accurate training datasets with Amazon SageMaker Ground Truth - AIM205 ...
Build accurate training datasets with Amazon SageMaker Ground Truth - AIM205 ...Amazon Web Services
 
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019AWS Summits
 
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...Amazon Web Services
 
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...Amazon Web Services
 
Unleash the Power of ML with AWS | AWS Summit Tel Aviv 2019
Unleash the Power of ML with AWS | AWS Summit Tel Aviv 2019Unleash the Power of ML with AWS | AWS Summit Tel Aviv 2019
Unleash the Power of ML with AWS | AWS Summit Tel Aviv 2019AWS Summits
 
Scale - Amazon SageMaker Deep Dive for Builders
Scale - Amazon SageMaker Deep Dive for BuildersScale - Amazon SageMaker Deep Dive for Builders
Scale - Amazon SageMaker Deep Dive for BuildersAmazon Web Services
 
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...Amazon Web Services
 

Was ist angesagt? (20)

AIM301 - Breaking Language Barriers With AI - Tel Aviv Summit 2019
AIM301 - Breaking Language Barriers With AI - Tel Aviv Summit 2019AIM301 - Breaking Language Barriers With AI - Tel Aviv Summit 2019
AIM301 - Breaking Language Barriers With AI - Tel Aviv Summit 2019
 
Creare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesCreare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data Warehouses
 
Progetta, crea e gestisci Modern Application per web e mobile su AWS
Progetta, crea e gestisci Modern Application per web e mobile su AWSProgetta, crea e gestisci Modern Application per web e mobile su AWS
Progetta, crea e gestisci Modern Application per web e mobile su AWS
 
Build-Train-Deploy-Machine-Learning-Models-at-Any-Scale
Build-Train-Deploy-Machine-Learning-Models-at-Any-ScaleBuild-Train-Deploy-Machine-Learning-Models-at-Any-Scale
Build-Train-Deploy-Machine-Learning-Models-at-Any-Scale
 
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...
 
Best-Practices-for-Running-Windows-Workloads-on-AWS
Best-Practices-for-Running-Windows-Workloads-on-AWSBest-Practices-for-Running-Windows-Workloads-on-AWS
Best-Practices-for-Running-Windows-Workloads-on-AWS
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMaker
 
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS SummitWork with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
Work with Machine Learning in Amazon SageMaker - BDA203 - Toronto AWS Summit
 
Accelerating-ML-Adoption-with-Our-New-AI-Services
Accelerating-ML-Adoption-with-Our-New-AI-ServicesAccelerating-ML-Adoption-with-Our-New-AI-Services
Accelerating-ML-Adoption-with-Our-New-AI-Services
 
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...
 
Architetture per l'analisi di flussi di dati in tempo reale
Architetture per l'analisi di flussi di dati in tempo realeArchitetture per l'analisi di flussi di dati in tempo reale
Architetture per l'analisi di flussi di dati in tempo reale
 
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Da...
 
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SIAmazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
 
Build accurate training datasets with Amazon SageMaker Ground Truth - AIM205 ...
Build accurate training datasets with Amazon SageMaker Ground Truth - AIM205 ...Build accurate training datasets with Amazon SageMaker Ground Truth - AIM205 ...
Build accurate training datasets with Amazon SageMaker Ground Truth - AIM205 ...
 
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019
 
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...
 
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
 
Unleash the Power of ML with AWS | AWS Summit Tel Aviv 2019
Unleash the Power of ML with AWS | AWS Summit Tel Aviv 2019Unleash the Power of ML with AWS | AWS Summit Tel Aviv 2019
Unleash the Power of ML with AWS | AWS Summit Tel Aviv 2019
 
Scale - Amazon SageMaker Deep Dive for Builders
Scale - Amazon SageMaker Deep Dive for BuildersScale - Amazon SageMaker Deep Dive for Builders
Scale - Amazon SageMaker Deep Dive for Builders
 
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
 

Ähnlich wie Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv 2019

Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Julien SIMON
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Julien SIMON
 
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Amazon Web Services
 
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Amazon Web Services
 
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...Amazon Web Services
 
AWS Machine Learning Week SF: End to End Model Development Using SageMaker
AWS Machine Learning Week SF: End to End Model Development Using SageMakerAWS Machine Learning Week SF: End to End Model Development Using SageMaker
AWS Machine Learning Week SF: End to End Model Development Using SageMakerAmazon Web Services
 
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaData meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaSandesh Rao
 
WhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter BotWhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter BotRandall Hunt
 
End to End Model Development to Deployment using SageMaker
End to End Model Development to Deployment using SageMakerEnd to End Model Development to Deployment using SageMaker
End to End Model Development to Deployment using SageMakerAmazon Web Services
 
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopAWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopJulien SIMON
 
Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...
Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...
Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...Amazon Web Services
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Julien SIMON
 
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...Amazon Web Services
 
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018Amazon Web Services
 
Advanced Machine Learning with Amazon SageMaker
Advanced Machine Learning with Amazon SageMakerAdvanced Machine Learning with Amazon SageMaker
Advanced Machine Learning with Amazon SageMakerJulien SIMON
 
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Amazon Web Services
 
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Amazon Web Services
 
Amazon SageMaker workshop
Amazon SageMaker workshopAmazon SageMaker workshop
Amazon SageMaker workshopJulien SIMON
 
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...Amazon Web Services
 
Building, Training and Deploying Custom Algorithms with Amazon SageMaker
Building, Training and Deploying Custom Algorithms with Amazon SageMakerBuilding, Training and Deploying Custom Algorithms with Amazon SageMaker
Building, Training and Deploying Custom Algorithms with Amazon SageMakerAmazon Web Services
 

Ähnlich wie Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv 2019 (20)

Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)
 
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
 
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
 
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...
 
AWS Machine Learning Week SF: End to End Model Development Using SageMaker
AWS Machine Learning Week SF: End to End Model Development Using SageMakerAWS Machine Learning Week SF: End to End Model Development Using SageMaker
AWS Machine Learning Week SF: End to End Model Development Using SageMaker
 
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaData meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow India
 
WhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter BotWhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter Bot
 
End to End Model Development to Deployment using SageMaker
End to End Model Development to Deployment using SageMakerEnd to End Model Development to Deployment using SageMaker
End to End Model Development to Deployment using SageMaker
 
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopAWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
 
Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...
Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...
Build, Train, and Deploy ML Models Quickly and Easily with Amazon SageMaker, ...
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)
 
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...
 
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
 
Advanced Machine Learning with Amazon SageMaker
Advanced Machine Learning with Amazon SageMakerAdvanced Machine Learning with Amazon SageMaker
Advanced Machine Learning with Amazon SageMaker
 
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
 
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
 
Amazon SageMaker workshop
Amazon SageMaker workshopAmazon SageMaker workshop
Amazon SageMaker workshop
 
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
 
Building, Training and Deploying Custom Algorithms with Amazon SageMaker
Building, Training and Deploying Custom Algorithms with Amazon SageMakerBuilding, Training and Deploying Custom Algorithms with Amazon SageMaker
Building, Training and Deploying Custom Algorithms with Amazon SageMaker
 

Mehr von AWS Summits

AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...
AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...
AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...AWS Summits
 
AWS Summit Singapore 2019 | Bridging Start-ups and Enterprises
AWS Summit Singapore 2019 | Bridging Start-ups and EnterprisesAWS Summit Singapore 2019 | Bridging Start-ups and Enterprises
AWS Summit Singapore 2019 | Bridging Start-ups and EnterprisesAWS Summits
 
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and Tricks
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and TricksAWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and Tricks
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and TricksAWS Summits
 
AWS Summit Singapore 2019 | Five Common Technical Challenges for Startups
AWS Summit Singapore 2019 | Five Common Technical Challenges for StartupsAWS Summit Singapore 2019 | Five Common Technical Challenges for Startups
AWS Summit Singapore 2019 | Five Common Technical Challenges for StartupsAWS Summits
 
AWS Summit Singapore 2019 | A Founder's Journey to Exit
AWS Summit Singapore 2019 | A Founder's Journey to ExitAWS Summit Singapore 2019 | A Founder's Journey to Exit
AWS Summit Singapore 2019 | A Founder's Journey to ExitAWS Summits
 
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics Services
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics ServicesAWS Summit Singapore 2019 | Realising Business Value with AWS Analytics Services
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics ServicesAWS Summits
 
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No LimitsAWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No LimitsAWS Summits
 
AWS Summit Singapore 2019 | Amazon Digital User Engagement Solutions
AWS Summit Singapore 2019 | Amazon Digital User Engagement SolutionsAWS Summit Singapore 2019 | Amazon Digital User Engagement Solutions
AWS Summit Singapore 2019 | Amazon Digital User Engagement SolutionsAWS Summits
 
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWS
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWSAWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWS
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWSAWS Summits
 
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summits
 
AWS Summit Singapore 2019 | Microsoft DevOps on AWS
AWS Summit Singapore 2019 | Microsoft DevOps on AWSAWS Summit Singapore 2019 | Microsoft DevOps on AWS
AWS Summit Singapore 2019 | Microsoft DevOps on AWSAWS Summits
 
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...AWS Summits
 
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...AWS Summits
 
AWS Summit Singapore 2019 | Operating Microservices at Hyperscale
AWS Summit Singapore 2019 | Operating Microservices at HyperscaleAWS Summit Singapore 2019 | Operating Microservices at Hyperscale
AWS Summit Singapore 2019 | Operating Microservices at HyperscaleAWS Summits
 
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes WorkloadsAWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes WorkloadsAWS Summits
 
AWS Summit Singapore 2019 | Realising Business Value
AWS Summit Singapore 2019 | Realising Business ValueAWS Summit Singapore 2019 | Realising Business Value
AWS Summit Singapore 2019 | Realising Business ValueAWS Summits
 
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...AWS Summits
 
AWS Summit Singapore 2019 | Transformation Towards a Digital Native Enterprise
AWS Summit Singapore 2019 | Transformation Towards a Digital Native EnterpriseAWS Summit Singapore 2019 | Transformation Towards a Digital Native Enterprise
AWS Summit Singapore 2019 | Transformation Towards a Digital Native EnterpriseAWS Summits
 
AWS Summit Singapore 2019 | Pragmatic Container Security
AWS Summit Singapore 2019 | Pragmatic Container SecurityAWS Summit Singapore 2019 | Pragmatic Container Security
AWS Summit Singapore 2019 | Pragmatic Container SecurityAWS Summits
 
AWS Summit Singapore 2019 | Enterprise Migration Journey Roadmap
AWS Summit Singapore 2019 | Enterprise Migration Journey RoadmapAWS Summit Singapore 2019 | Enterprise Migration Journey Roadmap
AWS Summit Singapore 2019 | Enterprise Migration Journey RoadmapAWS Summits
 

Mehr von AWS Summits (20)

AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...
AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...
AWS Summit Singapore 2019 | The Smart Way to Build an AI & ML Strategy for Yo...
 
AWS Summit Singapore 2019 | Bridging Start-ups and Enterprises
AWS Summit Singapore 2019 | Bridging Start-ups and EnterprisesAWS Summit Singapore 2019 | Bridging Start-ups and Enterprises
AWS Summit Singapore 2019 | Bridging Start-ups and Enterprises
 
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and Tricks
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and TricksAWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and Tricks
AWS Summit Singapore 2019 | Hiring a Global Rock Star Team: Tips and Tricks
 
AWS Summit Singapore 2019 | Five Common Technical Challenges for Startups
AWS Summit Singapore 2019 | Five Common Technical Challenges for StartupsAWS Summit Singapore 2019 | Five Common Technical Challenges for Startups
AWS Summit Singapore 2019 | Five Common Technical Challenges for Startups
 
AWS Summit Singapore 2019 | A Founder's Journey to Exit
AWS Summit Singapore 2019 | A Founder's Journey to ExitAWS Summit Singapore 2019 | A Founder's Journey to Exit
AWS Summit Singapore 2019 | A Founder's Journey to Exit
 
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics Services
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics ServicesAWS Summit Singapore 2019 | Realising Business Value with AWS Analytics Services
AWS Summit Singapore 2019 | Realising Business Value with AWS Analytics Services
 
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No LimitsAWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
 
AWS Summit Singapore 2019 | Amazon Digital User Engagement Solutions
AWS Summit Singapore 2019 | Amazon Digital User Engagement SolutionsAWS Summit Singapore 2019 | Amazon Digital User Engagement Solutions
AWS Summit Singapore 2019 | Amazon Digital User Engagement Solutions
 
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWS
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWSAWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWS
AWS Summit Singapore 2019 | Driving Business Outcomes with Data Lake on AWS
 
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
 
AWS Summit Singapore 2019 | Microsoft DevOps on AWS
AWS Summit Singapore 2019 | Microsoft DevOps on AWSAWS Summit Singapore 2019 | Microsoft DevOps on AWS
AWS Summit Singapore 2019 | Microsoft DevOps on AWS
 
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...
AWS Summit Singapore 2019 | The Serverless Lifecycle: Development and Operati...
 
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...
AWS Summit Singapore 2019 | Accelerating Enterprise Cloud Transformation by M...
 
AWS Summit Singapore 2019 | Operating Microservices at Hyperscale
AWS Summit Singapore 2019 | Operating Microservices at HyperscaleAWS Summit Singapore 2019 | Operating Microservices at Hyperscale
AWS Summit Singapore 2019 | Operating Microservices at Hyperscale
 
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes WorkloadsAWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
AWS Summit Singapore 2019 | Autoscaling Your Kubernetes Workloads
 
AWS Summit Singapore 2019 | Realising Business Value
AWS Summit Singapore 2019 | Realising Business ValueAWS Summit Singapore 2019 | Realising Business Value
AWS Summit Singapore 2019 | Realising Business Value
 
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
 
AWS Summit Singapore 2019 | Transformation Towards a Digital Native Enterprise
AWS Summit Singapore 2019 | Transformation Towards a Digital Native EnterpriseAWS Summit Singapore 2019 | Transformation Towards a Digital Native Enterprise
AWS Summit Singapore 2019 | Transformation Towards a Digital Native Enterprise
 
AWS Summit Singapore 2019 | Pragmatic Container Security
AWS Summit Singapore 2019 | Pragmatic Container SecurityAWS Summit Singapore 2019 | Pragmatic Container Security
AWS Summit Singapore 2019 | Pragmatic Container Security
 
AWS Summit Singapore 2019 | Enterprise Migration Journey Roadmap
AWS Summit Singapore 2019 | Enterprise Migration Journey RoadmapAWS Summit Singapore 2019 | Enterprise Migration Journey Roadmap
AWS Summit Singapore 2019 | Enterprise Migration Journey Roadmap
 

Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv 2019

  • 1.
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building Machine Learning inference pipelines at scale Julien Simon Global Evangelist, AI & Machine Learning @julsimon
  • 3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Problem statement • Real-life Machine Learning applications require more than predicting with a single model. • Data may need pre-processing: normalization, feature engineering, dimensionality reduction, etc. • Predictions may need post-processing: filtering, sorting, combining, etc.
  • 4. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda Build and deploy ML pipelines with minimal infrastructure drama! 1. Spark (on Amazon EMR) 2. Spark + Amazon SageMaker 3. Amazon SageMaker, aka Inference Pipelines
  • 5. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Apache Spark • Open-source, distributed processing system. • In-memory caching and optimized execution for fast performance (typically 100x faster than Hadoop). • Batch processing, streaming analytics, machine learning, graph databases and ad hoc queries. • API for Java, Scala, Python, R, and SQL. https://spark.apache.org/
  • 7. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Apache Spark – DataFrame • Distributed collection of data organized into named columns • Conceptually equivalent to a table in a relational database • Wide array of sources: structured files, databases • Wide array of formats: text, CSV, JSON, Avro, ORC, Parquet {"name": "Jeff"} {"name": "Boaz", "age":72} {"name": "Julien", "age":12} df = spark.read.json("people.json") df.show() +----+-------+ | age| name | +----+-------+ |null| Jeff | | 72 | Boaz | | 12 | Julien| +----+-------+
  • 8. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. MLlib – Machine learning library • Algorithms: classification, regression, clustering, collaborative filtering. • Featurization: feature extraction, transformation, dimensionality reduction. • Tools for constructing, evaluating and tuning pipelines • Transformer – a transform function that maps a DataFrame into a new one • Adding a column, changing the rows of a specific column, etc. • Predicting the label based on the feature vector • Estimator – an algorithm that trains on data • Consists of a fit() function that maps a DataFrame into a Model https://spark.apache.org/docs/latest/ml-guide.html
  • 9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Example: binary classification for text samples https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/PipelineExample.scala // Prepare training documents from a list of (id, text, label) tuples. val training = <LOAD_TRAINING_DATA> // Configure an ML pipeline with three stages: tokenizer, hashingTF, and lr. val tokenizer = new Tokenizer().setInputCol("text”).setOutputCol("words") val hashingTF = new HashingTF() .setNumFeatures(1000).setInputCol(tokenizer.getOutputCol).setOutputCol("features") val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001) val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr)) // Fit the pipeline to training documents. val model = pipeline.fit(training) // Prepare test documents, which are unlabeled (id, text) tuples. val test = <LOAD_TEST_DATA> // Make predictions on test documents. model.transform(test)
  • 10. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 11. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Serving SparkML predictions • Train and predict in the same application (see previous example) • Or save the model and load it in another Spark application • Or export the model to PMML and load it elsewhere (Java, R, etc.) • Or export the model to MLeap • http://mleap-docs.combust.ml/ • Lightweight runtime independent from Spark • Interoperability between SparkML, TensorFlow and scikit-learn In any case, you need to build and maintain prediction infrastructure :-/
  • 12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 13. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon SageMaker: Build, Train, and Deploy ML Models at Scale Collect and prepare training data Choose and optimize your ML algorithm Train and Tune ML Models Set up and manage environments for training Deploy models in production Scale and manage the production environment 1 2 3
  • 14. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon SageMaker SDK for Spark • Python and Scala SDK, for Apache Spark 2.1.1 and 2.2. • Pre-installed on EMR 5.11 and later. • Train, import, deploy and predict with SageMaker models directly from your Spark application. • Standalone, • Integration in Spark MLlib pipelines. • DataFrames in, DataFrames out: automatic data conversion to and from protobuf (crowd goes wild!) https://github.com/aws/sagemaker-spark
  • 15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Reason #1 - Decouple ETL and Machine Learning • Different workloads require different instance types • Say, R4 for ETL, P3 for training and C5 for prediction? • If you need GPUs for training, running ETL on GPU instances wouldn’t be the best option… • Size and scale them independently • Avoid oversizing your Spark cluster. • Avoid time-consuming resizing operations on Amazon EMR. • Run ETL once, train many models in parallel.
  • 16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Reason #2 - Run any ML algorithm in any language • Spark MLlib is great, but you may need something else • Other ML algorithms • Deep Learning libraries, like TensorFlow or Apache MXNet • Your own custom code in any language
  • 17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Reason #3 - Get the best prediction performance • Perform ML predictions without using Spark. • Save the overhead of the Spark framework • Save loading your data in a DataFrame • Amazon SageMaker can deploy MLeap models • Improve latency for small-batch predictions. • It can be difficult to achieve low-latency predictions with Spark ML models • Get real-time predictions with models hosted in Amazon SageMaker • Use optimized instances for prediction
  • 18. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Sample use cases for Spark and SageMaker • Data preparation and feature engineering + training • Data transformation + batch prediction (model reuse) • Data cleaning/enrichment with predictions • Predict missing values instead of using median. • Add new predicted features. • Deploying a SparkML model at scale
  • 19. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 20. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 21. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Inference Pipelines • Linear sequence of 2-5 containers that process inference requests • Feature engineering with scikit-learn or SparkML (on AWS Glue or Amazon EMR) • Predict with built-in or custom containers • The pipeline is deployed as a single model • Useful to preprocess, predict, and post-process • Available for real-time prediction and batch transform
  • 22. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Getting started https://ml.aws https://aws.amazon.com/sagemaker https://github.com/awslabs/amazon-sagemaker-examples https://medium.com/@julsimon
  • 23. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 24. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Thank you! © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Julien Simon Global Evangelist, AI & Machine Learning @julsimon