Lower Costs on Amazon EMR: Auto Scaling, Spot Pricing, & Expert Strategies (ANT385) - AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Bruno Faria
Senior EMR Solutions Architect
AWS Solutions Architecture
ANT385
Amazon EMR pricing
With Amazon EMR, you pay a per-second rate for every second you use. The price is based on the instance type, the number of EC2 instances that you deploy, and the region in which you launch your cluster.
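The per-second pricing model above can be sketched as simple arithmetic. This is a minimal illustration, not an official calculator: the hourly rates are hypothetical placeholders (real rates depend on instance type and region), and it assumes the EMR charge is added on top of the EC2 charge and that a one-minute minimum applies.

```python
import math


def emr_cluster_cost(runtime_seconds, instance_count,
                     ec2_hourly_rate, emr_hourly_rate,
                     minimum_seconds=60):
    """Estimate the cost of a cluster that is billed per second.

    All rates are per instance, per hour, in the same currency.
    minimum_seconds is an assumed one-minute billing minimum.
    """
    billable_seconds = max(math.ceil(runtime_seconds), minimum_seconds)
    # Assumption: the EMR rate is charged in addition to the EC2 rate.
    hourly_rate_per_instance = ec2_hourly_rate + emr_hourly_rate
    return billable_seconds * instance_count * hourly_rate_per_instance / 3600.0


# Hypothetical example: 10 instances running for 45 minutes.
cost = emr_cluster_cost(runtime_seconds=45 * 60, instance_count=10,
                        ec2_hourly_rate=0.40, emr_hourly_rate=0.10)
print(f"Estimated cost: {cost:.2f}")
```

Because billing is per second, a job that finishes in 45 minutes costs three quarters of what a full hour would, which is what makes short-lived clusters attractive.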
Reserved, Spot, and On-Demand Instances
Spot Instances: Amazon EC2 Spot Instances offer spare compute capacity at a discount compared to On-Demand Instances.
Reserved Instances: Amazon EC2 Reserved Instances give you the option to make a payment for instances that you want to reserve, at a significant discount compared to On-Demand pricing.
On-Demand Instances: Amazon EC2 On-Demand Instances are instances that you launch and pay for by the second.
Understanding the node types in Amazon EMR
Master node: The node that manages the cluster. The master node
tracks the status of tasks and monitors the health of the cluster.
Core node: The node that runs tasks and stores data in the Hadoop
Distributed File System (HDFS) on your cluster.
Task node: The node that only runs tasks and does not store data in
HDFS. Task nodes are optional.
Lower costs with Spot and Reserved Instances
Spot for task nodes: Up to 80% off EC2 On-Demand pricing. Exceed SLA at lower cost.
On-Demand for core nodes: Standard Amazon EC2 pricing for On-Demand capacity. Meet SLA at predictable cost.
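To make the trade-off concrete, here is a rough sketch of the hourly cost of a cluster that keeps core nodes on On-Demand and puts task nodes on Spot. The node counts, the hourly rate, and the 80% discount are hypothetical inputs chosen for illustration. Actual Spot discounts vary over time and by instance type.

```python
def hourly_cluster_cost(core_count, task_count, on_demand_rate, spot_discount):
    """Hourly cost with On-Demand core nodes and Spot task nodes.

    spot_discount is a fraction between 0 and 1 (0.80 means 80% off).
    """
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    core_cost = core_count * on_demand_rate
    task_cost = task_count * on_demand_rate * (1.0 - spot_discount)
    return core_cost + task_cost


# Hypothetical cluster: 4 core nodes, 16 task nodes, 0.40 per instance-hour.
mixed = hourly_cluster_cost(4, 16, on_demand_rate=0.40, spot_discount=0.80)
all_on_demand = hourly_cluster_cost(4, 16, on_demand_rate=0.40, spot_discount=0.0)
print(f"Mixed: {mixed:.2f} per hour, all On-Demand: {all_on_demand:.2f} per hour")
```

The core nodes give a predictable baseline, while the discounted task nodes add capacity at a fraction of the On-Demand price.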
Performance and hardware
Considerations
• Transient or long running
• Instance types
• Cluster size
• Application settings
• File formats and S3 tuning
Master node: c5.2xlarge
Slave group (Core): c5.2xlarge
Slave group (Task): m5.2xlarge (EC2 Spot)
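The layout above could be expressed as an instance group list in the shape that the boto3 EMR `run_job_flow` call accepts under `Instances["InstanceGroups"]`. This is a sketch only: the function name and the node counts are assumptions, and nothing here calls AWS.

```python
def build_instance_groups(core_count, task_count):
    """Return an InstanceGroups list mirroring the slide's cluster layout.

    core_count and task_count are hypothetical sizes; the slide gives
    instance types but not counts.
    """
    return [
        {
            "Name": "Master node",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": "c5.2xlarge",
            "InstanceCount": 1,
        },
        {
            "Name": "Core group",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",  # core nodes hold HDFS data
            "InstanceType": "c5.2xlarge",
            "InstanceCount": core_count,
        },
        {
            "Name": "Task group",
            "InstanceRole": "TASK",
            "Market": "SPOT",  # task nodes store no HDFS data
            "InstanceType": "m5.2xlarge",
            "InstanceCount": task_count,
        },
    ]


groups = build_instance_groups(core_count=2, task_count=4)
for group in groups:
    print(group["InstanceRole"], group["InstanceType"], group["Market"])
```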
Advanced Spot Provisioning with Instance Fleets
Master node, core instance fleet, task instance fleet
• Provision from a list of instance types, using Spot and On-Demand capacity
• Launch in the optimal Availability Zone based on capacity and price
• Spot block support
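As a rough illustration of provisioning from a list of instance types, the sketch below builds one task fleet entry in the shape used by the boto3 EMR `run_job_flow` call under `Instances["InstanceFleets"]`. The function name, the instance type list, the capacity target, and the timeout are all assumptions for the example, and Spot block settings are left out.

```python
def build_task_fleet(instance_types, target_spot_capacity, timeout_minutes=10):
    """Return a task instance fleet definition that draws on several types.

    instance_types and target_spot_capacity are hypothetical inputs.
    """
    return {
        "Name": "Task instance fleet",
        "InstanceFleetType": "TASK",
        "TargetOnDemandCapacity": 0,
        "TargetSpotCapacity": target_spot_capacity,
        "InstanceTypeConfigs": [
            {
                "InstanceType": instance_type,
                "WeightedCapacity": 1,
                # Cap the Spot price at the On-Demand price (assumed choice).
                "BidPriceAsPercentageOfOnDemandPrice": 100,
            }
            for instance_type in instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                "TimeoutDurationMinutes": timeout_minutes,
                # Fall back to On-Demand if Spot capacity is not fulfilled in time.
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
            }
        },
    }


fleet = build_task_fleet(["m5.2xlarge", "c5.2xlarge", "r5.2xlarge"],
                         target_spot_capacity=8)
print([cfg["InstanceType"] for cfg in fleet["InstanceTypeConfigs"]])
```

Listing several instance types gives the service more Spot pools to choose from, which is the point of the first bullet above.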
Transient or long running workloads
Transient
Long running
Lower costs with Auto Scaling
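The slide gives no policy details, so the following is only a sketch of what an EMR automatic scaling policy might look like, in the shape accepted by the boto3 EMR `put_auto_scaling_policy` call. The rule names, thresholds, adjustments, and cooldowns are hypothetical, and the choice of the `YARNMemoryAvailablePercentage` metric is an assumption about the workload.

```python
def _memory_rule(name, comparison, threshold, adjustment):
    """Build one scaling rule triggered by available YARN memory."""
    return {
        "Name": name,
        "Description": f"Adjust capacity by {adjustment} when YARN memory is {comparison} {threshold}%",
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": adjustment,
                "CoolDown": 300,
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": comparison,
                "EvaluationPeriods": 1,
                "MetricName": "YARNMemoryAvailablePercentage",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": threshold,
                "Unit": "PERCENT",
            }
        },
    }


def build_scaling_policy(min_nodes, max_nodes):
    """Return a policy that scales out when memory is scarce and in when idle."""
    if not 0 < min_nodes <= max_nodes:
        raise ValueError("require 0 < min_nodes <= max_nodes")
    return {
        "Constraints": {"MinCapacity": min_nodes, "MaxCapacity": max_nodes},
        "Rules": [
            _memory_rule("ScaleOutOnLowMemory", "LESS_THAN", 15.0, 2),
            _memory_rule("ScaleInOnIdleMemory", "GREATER_THAN", 75.0, -1),
        ],
    }


policy = build_scaling_policy(min_nodes=2, max_nodes=10)
print(policy["Constraints"], [rule["Name"] for rule in policy["Rules"]])
```

The scale-in rule is where the cost saving comes from: capacity is released when the cluster has memory to spare, bounded below by `MinCapacity`.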
Use Amazon S3 as your persistent data store
• Decouple storage and compute
• Scale compute and storage up or down independently
• Run transient Amazon EMR clusters on Amazon EC2 Spot Instances
• Designed for 99.999999999% durability
• No need to pay for data replication
Amazon S3 Tips
• Partition your data to reduce the amount of data scanned
• Optimize file sizes to reduce the number of S3 requests
• Compress data sets to minimize bandwidth from S3 to EC2
• Use a columnar file format like Parquet when selecting only a subset of columns
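The partitioning tip can be sketched with Hive-style key prefixes: if data is laid out by date, a query over a date range only needs to read the matching prefixes. The bucket name, table path, and year/month/day layout below are hypothetical, chosen only to illustrate the idea.

```python
from datetime import date, timedelta


def partition_prefix(base, day):
    """Return the Hive-style partition prefix for one day."""
    return f"{base}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def prefixes_for_range(base, start, end):
    """List the partition prefixes covering start..end, inclusive."""
    day_count = (end - start).days
    return [partition_prefix(base, start + timedelta(days=offset))
            for offset in range(day_count + 1)]


# Hypothetical table location; only these three prefixes need to be scanned.
base = "s3://example-bucket/events"
for prefix in prefixes_for_range(base, date(2018, 11, 26), date(2018, 11, 28)):
    print(prefix)
```

A query limited to three days touches three prefixes instead of the whole table, which reduces both the data scanned and the number of S3 requests.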
Thank you!
Bruno Faria