In this session we will review Amazon EFS and how it delivers fully managed, petabyte-scale file storage for Amazon EC2 instances. Its large scale and consistent performance make Amazon EFS ideal for web and content serving, enterprise applications, media processing, container storage, and big data analytics use cases. Session attendees will learn how to identify appropriate applications for use with Amazon EFS, understand its performance details and security model, and hear how established customers are using it in production. The target audience is file system administrators, application developers, and application owners who operate or build file-based applications that require consistent latencies at cloud scale.
4. Amazon Elastic File System (Amazon EFS)
Provides simple, scalable, highly available, and durable file storage in the cloud
Petabyte-scale file system is distributed across an unconstrained number of storage servers in multiple Availability Zones (AZs)
Elastic capacity automatically grows and shrinks as you add and remove files
5. Data encryption at rest
[Diagram: Amazon Elastic File System integrates with AWS Key Management Service; data is encrypted at rest]
6. Transparent data encryption
• All data & metadata is encrypted before it’s written to disk
• All data & metadata is decrypted before it’s read by the client
• Requires no modification to applications
Integrates with AWS Key Management Service (AWS KMS)
for key management
Available in all EFS regions at no additional cost
7. Amazon EFS
Standard file system interface and semantics
Shared storage
Highly available and highly durable
Consistent low latency
Strong read-after-write consistency
Elastic capacity
Fully managed
8. Do you need an EFS file system?
If you have an application running on Amazon EC2 or a use case that requires a file system…
AND
• Requires multi-attach, OR
• Needs GB/s of throughput, OR
• Needs multi-AZ availability/durability, OR
• Requires automatic scaling (grow/shrink) of storage
9. What customers are using EFS for today
Web serving
Content management
Analytics
Media and entertainment workflows
Workflow management
Home directories
Container storage
Database backups
10. Shared file solutions in the cloud… before EFS
Third-party software
Do it yourself
Third-party hardware in AWS
AWS Direct Connect locations
11. Do it yourself – NFS architecture
[Diagram: three Availability Zones, each with NFS clients connecting to a standalone NFS server backed by a pair of EBS volumes]
12. Do it yourself – NFS architecture
• Launch, patch, monitor, & pay for EC2 instances
• Create, attach, monitor, & pay for provisioned Amazon EBS volumes
• Create, maintain, and monitor Auto Scaling group
• Install, patch, monitor, & pay for* file system software
• Configure, maintain, monitor, & pay for file system data intra/inter–Availability Zone replication
• IOPS for replication are still IOPS
• Configure DNS for client HA access to inter–Availability Zone NFS fleet
19. Resources for Amazon EFS
File system
• Mount targets
  • Subnet ID
  • Security groups
• Tags
  • Key-value pairs
20. Resources for Amazon EFS
File system
• Regional construct
• Default throughput limit 3 GB/s (soft)
• Metered size updates approx. every hour
• Two performance modes (General Purpose & Max I/O)
• Accessible from EC2
  • VPC, EC2-Classic via ClassicLink
• Accessible from on premises
  • AWS Direct Connect
21. Resources for Amazon EFS
File system, cont.
• Scenarios for on premises via Direct Connect
  • Bursting
  • Migration
  • Tiering
  • Backup / DR
22. Resources for Amazon EFS
Mount targets
• One or more per file system
• Create in a VPC subnet
• One per Availability Zone
• Must be in the same VPC
23. Resources for Amazon EFS
Subnet IDs
• Create mount target in one subnet per Availability Zone
• Mount target gets IP from subnet
• Automatic or static IP addresses
• IP addresses do not change
subnet-8d73b6e7
24. Resources for Amazon EFS
Security groups
• Standard VPC security group
• Same VPC as subnet
• Up to five per mount target
• Allow inbound TCP port 2049 from NFS clients
25. Resources for Amazon EFS
Tags
• Typical key-value pair
• Create & associate tag with file system
• Up to 50 tags per file system
27. Recommended mount options
-o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async
Mount using NFSv4.1 (default options)
Specify 1 MB read/write buffers
Hard mount
Timeout of 60 seconds (timeo is specified in tenths of a second)
2 retransmissions before a major timeout is reported
Allow asynchronous writes
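Putting those options together, a full mount command might look like the sketch below. The file system ID (fs-12345678), the region, and the /mnt/efs mount point are placeholder examples — substitute your own values.

```shell
# Placeholders: fs-12345678, us-east-1, and /mnt/efs are examples only.
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,async \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```

The same options can be placed in /etc/fstab to remount on boot.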
28. Security
Encrypt data at rest using keys managed in AWS KMS
Control network traffic using VPC security groups and network ACLs
Control file and directory access by using POSIX permissions
Control administrative access (API access) to file systems by using AWS Identity and Access Management (IAM) action-level and resource-level permissions
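To illustrate the POSIX-permissions point, a locked-down home directory on the file system can be set up with standard tools; the path below is a hypothetical local stand-in for a directory under an EFS mount point.

```shell
# /tmp/efs-demo stands in for an EFS mount point (hypothetical path)
mkdir -p /tmp/efs-demo/home/alice
chmod 700 /tmp/efs-demo/home/alice      # owner-only access
stat -c '%a' /tmp/efs-demo/home/alice   # prints 700
```

NFS clients enforce these permission bits for every user accessing the share.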
29. Amazon EFS is designed for a wide spectrum of performance needs
High throughput and parallel I/O
• Genomics
• Big data analytics
• Scale-out jobs
Low latency and serial I/O
• Home directories
• Content management
• Web serving
• Metadata-intensive jobs
30. Amazon EFS – distributed data storage design
[Diagram: fleets of EC2 instances in multiple Availability Zones accessing a single file system]
• File systems distributed across an unconstrained number of servers
• Avoids the bottlenecks/constraints of traditional file servers
• Enables high levels of aggregate IOPS/throughput
• Data also distributed across Availability Zones (durability, availability)
31. How to think about EFS perf relative to EBS
Performance
• Per-operation latency: Amazon EFS – low, consistent; Amazon EBS PIOPS – lowest, consistent
• Throughput scale: Amazon EFS – multiple GB per second; Amazon EBS PIOPS – single GB per second
Characteristics
• Data availability/durability: Amazon EFS – stored redundantly across multiple Availability Zones; Amazon EBS PIOPS – stored redundantly in a single Availability Zone
• Access: Amazon EFS – 1 to 1000s of EC2 instances, from multiple Availability Zones, concurrently; Amazon EBS PIOPS – single EC2 instance in a single Availability Zone
• Use cases: Amazon EFS – big data and analytics, media processing workflows, content management, web serving, home directories; Amazon EBS PIOPS – boot volumes, transactional and NoSQL databases, data warehousing, and ETL
32. Performance modes for different workloads
General Purpose (default)
• What it is for: latency-sensitive applications and general-purpose workloads
• Advantages: lowest latencies for file operations
• Tradeoffs: limit of 7K ops/sec
• When to use: best choice for most workloads
Max I/O
• What it is for: large-scale and data-heavy applications
• Advantages: virtually unlimited ability to scale out throughput/IOPS
• Tradeoffs: slightly higher latencies
• When to use: consider for large scale-out workloads
33. EFS CloudWatch Metric – PercentIOLimit
Determine whether you’re being constrained by General Purpose mode (PercentIOLimit at or near 100%)
34. Burst credit model
• Based on the size of the file system
• Starts with 2.1 TiB of burst credits
• Minimum burst throughput: 100 MiB/s
• Baseline throughput: 50 MiB/s per TiB
• Burst throughput: 100 MiB/s per TiB
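A worked example of what those numbers imply — a sketch assuming a 1 TiB file system and the initial 2.1 TiB credit balance. While bursting at 100 MiB/s the file system still earns its 50 MiB/s baseline, so credits drain at a net 50 MiB/s:

```shell
# Hypothetical 1 TiB file system:
#   burst spend 100 MiB/s - baseline earn 50 MiB/s = 50 MiB/s net drain
awk 'BEGIN {
  credits = 2.1 * 1024 * 1024            # 2.1 TiB initial balance, in MiB
  drain   = 100 - 50                     # net drain while bursting, MiB/s
  printf "%.1f hours\n", credits / drain / 3600
}'
# prints 12.2 hours
```

That is, a fresh 1 TiB file system can sustain its full burst rate for roughly half a day before settling to baseline.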
41. Maximize EFS throughput
Not all EC2 instance types are created equal
• Select the appropriate EC2 instance type for the job
• Look at vCPU, memory, network performance, EBS-optimized, etc.
• Sample of EFS throughput of the m4 instance family
• Max. throughput per EC2 instance: 250 MB/s
m4.large (moderate network): ~60 MB/s
m4.xlarge (high): ~95 MB/s
m4.2xlarge (high): ~125 MB/s
m4.4xlarge (high): ~210 MB/s
m4.10xlarge (10 gigabit): ~250 MB/s
m4.16xlarge (20 gigabit): ~250 MB/s
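A quick way to sample sequential write throughput from a given instance is a dd run — a sketch; point of= at a file under your EFS mount point to measure the file system (here /tmp is used only as a local stand-in):

```shell
# Write 64 MiB in 1 MiB blocks; conv=fsync forces the data to be
# flushed to the target before dd reports a throughput figure.
dd if=/dev/zero of=/tmp/efs-ddtest bs=1M count=64 conv=fsync
```

Repeating the run on different instance types makes the per-instance throughput differences above directly visible.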
42. Maximize EFS throughput
Not all EBS volume types are created equal
• Select the appropriate EBS volume type for the job
• Sample of max throughputs of EBS volume types
• Max. throughput per EC2 instance: 250 MB/s
gp2: 160 MiB/s
io1: 320 MiB/s
st1: 500 MiB/s
sc1: 250 MiB/s
43. Maximize EFS throughput
Not all file transfer utilities are created equal
• Select the appropriate utility for the job
44. Maximize EFS throughput
• Select the appropriate EC2 instance / EBS volume type (demo)
• Select the best transfer tool for the job (demo)
• Use multiple threads (demo)
• Use multiple instances – parallelize – scale out (demo)
45. Maximize EFS throughput
Not all file transfer utilities are created equal
• Select the appropriate utility for the job
rsync – single-threaded – Poor (very chatty)
cp – single-threaded – Good
mcp – multi-threaded – Better
fpsync – multi-threaded – Better
cp + GNU parallel – multi-threaded – Better
fpart + cpio + GNU parallel – multi-threaded – Best
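Where GNU parallel is not installed, similar multi-process parallelism can be sketched with only findutils and coreutils (`xargs -P`). The source and destination paths below are hypothetical stand-ins; on a real migration the destination would sit under the EFS mount point.

```shell
# Build a flat source tree (stand-in for data to migrate onto EFS)
mkdir -p /tmp/efs-src /tmp/efs-dst
for i in $(seq 1 100); do echo "data $i" > "/tmp/efs-src/file$i"; done

# Copy with 8 worker processes, 10 files per cp invocation.
# Parallel streams are what let a client approach EFS throughput limits.
find /tmp/efs-src -type f -print0 | xargs -0 -n 10 -P 8 cp -t /tmp/efs-dst
```

The same pattern generalizes: fpart partitions the file tree, and each partition becomes one unit of work for a parallel worker.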
46. Tools and citations
• GNU Parallel – the command-line power tool
• http://www.gnu.org/s/parallel
• Author: Ole Tange
• fpart – sort file trees & pack them into partitions
• fpsync – wraps fpart & rsync
• https://github.com/martymac/fpart
• Author: Ganaël Laplanche
• mutil | mcp – multithreaded drop-in replacement of cp
• https://github.com/pkolano/mutil
• Author: Paul Kolano (NASA)
48. EFS economics
No minimum commitments or up-front fees
No need to provision storage in advance
No other fees, charges, or billing dimensions
Price: $0.30/GB-Month (US Regions)
$0.33/GB-Month (EU Ireland)
$0.36/GB-Month (EU Frankfurt)
$0.36/GB-Month (AP Sydney)
49. EFS TCO example
Let’s say you need to store ~500 GB and require high availability and durability.
Using a shared file layer on top of EBS, you might provision 600 GB (with ~85% utilization) and fully replicate the data to a second Availability Zone for availability/durability.
Example comparative cost:
Storage (2x 600 GB EBS gp2 volumes): $120 per month
Compute (2x m4.xlarge instances): $350 per month
Inter-AZ data transfer costs (est.): $129 per month
Total: $599 per month
EFS cost is (500 GB * $0.30/GB-month) = $150 per month, with no additional charges.
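The arithmetic behind the comparison can be checked directly — a sketch using the slide's figures (2 x 600 GB gp2 at $0.10/GB-month, two m4.xlarge instances, and the estimated inter-AZ transfer):

```shell
awk 'BEGIN {
  diy = 2*600*0.10 + 350 + 129   # EBS storage + compute + inter-AZ transfer
  efs = 500 * 0.30               # metered EFS storage only, no other charges
  printf "DIY: $%.0f/month, EFS: $%.0f/month\n", diy, efs
}'
# prints DIY: $599/month, EFS: $150/month
```

Note the DIY figure also omits the operational effort of running the NFS fleet itself.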
50. Where is EFS available today?
• US West (Oregon)
• US East (N. Virginia)
• US East (Ohio)
• EU (Ireland)
• Asia Pacific (Sydney)
• EU (Frankfurt)
More coming soon!
51. Key recommendations
• Test your application!
• Use General Purpose mode for lowest latency, Max I/O for scale-out
• Use Linux kernel version 4.0 or newer, mount via NFSv4.1
• To optimize, look for opportunities to:
  • Aggregate I/O
  • Perform async operations
  • Parallelize
  • Cache
• Don’t forget to check your burst credit earn/spend rate when testing – ensure a sufficient amount of storage
52. Key recommendations
• When accessing EFS, know the perf characteristics:
• Source
• Network
• Destination
• Not all EC2 instance types are created equal
• Not all EBS volume types are created equal
• Not all file transfer utilities are created equal
• Test, test, test!