5. What to Expect from the Session
• Getting the best system performance from your EC2 instances
• How Amazon EC2 instances deliver performance while providing flexibility and agility
• How to make the most of advanced features for EC2 and companion services
7. Hiring a Server
• Servers are hired to do jobs
• Performance is measured differently depending on the job
8. Performance Factors
Resource | Performance factors | Key indicators
CPU | Sockets, number of cores, clock frequency, bursting capability | CPU utilization, run queue length
Memory | Memory capacity | Free memory, anonymous paging, thread swapping
Network interface | Max bandwidth, packet rate | Receive throughput, transmit throughput over max bandwidth
Disks | Input/output operations per second, throughput | Wait queue length, device utilization, device errors
9. Resource Utilization
• For a given level of performance, how efficiently are resources being used?
• A resource at 100% utilization can't accept any more work
• Low utilization can indicate that more resources are being purchased than needed
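The points above can be sketched as a simple classification. This is a minimal Python sketch, not something shown in the session: the function name and the 20% "low utilization" threshold are assumptions chosen for illustration.

```python
def classify_utilization(used: float, capacity: float, low_threshold: float = 0.20) -> str:
    """Classify a resource by utilization = used / capacity.

    low_threshold is a hypothetical cut-off below which the resource
    is flagged as possibly over-provisioned.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = used / capacity
    if utilization >= 1.0:
        # At 100% the resource cannot accept any more work.
        return "saturated"
    if utilization < low_threshold:
        # More capacity may be purchased than the job needs.
        return "possibly over-provisioned"
    return "ok"
```

For example, a 4-core instance that keeps only 0.4 cores busy would be flagged as possibly over-provisioned under this threshold.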
10. Example: Web Application
• MediaWiki installed on Apache with 140 pages of content
• Load increased in intervals over time
15. Instance Selection = Performance Tuning
• Picking an instance is tantamount to resource performance tuning
• Give back instances as easily as you can acquire new ones
• Find an ideal instance type and workload combination
18. Review: T2 Instances
• Lowest cost EC2 instance at $0.013 per hour
• Burstable performance
• Fixed allocation enforced with CPU credits
Model | vCPU | CPU credits / hour | Memory (GiB) | Storage
t2.micro | 1 | 6 | 1 | EBS only
t2.small | 1 | 12 | 2 | EBS only
t2.medium | 2 | 24 | 4 | EBS only
t2.large | 2 | 36 | 8 | EBS only
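The credits-per-hour column implies a baseline CPU level. Since one CPU credit is one minute of a full core (as the next slide states), earning N credits per hour sustains N/60 of a core. A minimal Python sketch of that arithmetic, using the table values; the function and dictionary names are illustrative only.

```python
# Credits earned per hour, taken from the T2 table above.
T2_CREDITS_PER_HOUR = {
    "t2.micro": 6,
    "t2.small": 12,
    "t2.medium": 24,
    "t2.large": 36,
}

def baseline_core_fraction(credits_per_hour: float) -> float:
    """Sustainable CPU level, as a fraction of one full core.

    One credit = one core-minute, and an hour has 60 minutes.
    """
    return credits_per_hour / 60.0

for model, credits in T2_CREDITS_PER_HOUR.items():
    print(f"{model}: {baseline_core_fraction(credits):.0%} of one core")
```

For the two-vCPU models the figure is a fraction of a single core, shared across both vCPUs.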
19. How Credits Work
• A CPU credit provides the performance of a full CPU core for one minute
• An instance earns CPU credits at a steady rate
• An instance consumes credits when active
• Credits expire (leak) after 24 hours
[Figure: credit balance, filled at the baseline rate and drained at the burst rate]
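The earn-and-spend behavior described on this slide can be sketched as a minute-by-minute simulation. This is a simplified model under stated assumptions, not the actual EC2 accounting: the 24-hour expiry is approximated by capping the balance at 24 hours of earnings, initial launch credits are ignored, and the function name is hypothetical.

```python
def simulate_credits(credits_per_hour, demand_per_minute, start_balance=0.0):
    """Return a list of (balance, delivered_cpu), one entry per minute.

    demand_per_minute: requested CPU for each minute, in cores
    (1.0 = one full core for that minute = one credit).
    """
    earn_rate = credits_per_hour / 60.0      # credits earned per minute
    max_balance = credits_per_hour * 24.0    # assumption: stands in for 24 h expiry
    balance = start_balance
    timeline = []
    for demand in demand_per_minute:
        balance = min(balance + earn_rate, max_balance)
        # The instance can only spend credits it has; with an empty
        # balance it is limited to what it earns, i.e. the baseline rate.
        delivered = min(demand, balance)
        balance -= delivered
        timeline.append((balance, delivered))
    return timeline

# t2.micro (6 credits/hour): idle for an hour, then request a full core.
timeline = simulate_credits(6, [0.0] * 60 + [1.0] * 10)
```

In this run the idle hour banks about 6 credits, the burst drains them within a few minutes, and delivered CPU then falls back to the baseline of 0.1 core.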
22. Announced: X1 Instances
• Largest memory instance with 2 TB of DRAM
• Quad socket, Intel E7 processors with 128 vCPUs
Model | vCPU | Memory (GiB) | Local storage
x1.32xlarge | 128 | 1952 | 2 x 1920 GB
24. Review: I2 Instances
• 16 vCPU: 3.2 TB SSD; 32 vCPU: 6.4 TB SSD
• 365K random read IOPS for the 32 vCPU instance
Model | vCPU | Memory (GiB) | Storage (GB) | Read IOPS | Write IOPS
i2.xlarge | 4 | 30.5 | 1 x 800 SSD | 35,000 | 35,000
i2.2xlarge | 8 | 61 | 2 x 800 SSD | 75,000 | 75,000
i2.4xlarge | 16 | 122 | 4 x 800 SSD | 175,000 | 155,000
i2.8xlarge | 32 | 244 | 8 x 800 SSD | 365,000 | 315,000
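To relate the IOPS figures to throughput, the other disk performance factor listed earlier, multiply by the I/O size. The slide does not state a block size, so the 4 KiB default below is an assumption, and the function name is illustrative.

```python
def iops_to_throughput_mb_s(iops: int, block_size_bytes: int = 4096) -> float:
    """Throughput in MB/s (10^6 bytes/s) for a given IOPS and I/O size.

    block_size_bytes defaults to 4 KiB, which is an assumption;
    the slide does not say what I/O size the IOPS figures use.
    """
    return iops * block_size_bytes / 1_000_000

# i2.8xlarge random read figure from the table above.
read_mb_s = iops_to_throughput_mb_s(365_000)
```

Under that assumption, 365,000 read IOPS corresponds to roughly 1.5 GB/s of random read throughput.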
25. Device Pass Through: Enhanced Networking
• SR-IOV eliminates the need for a driver domain
• The physical network device exposes a virtual function to the instance
• Requires a specialized driver, which means:
  • Your instance OS needs to know about it
  • EC2 needs to be told your instance can use it
26. Before Enhanced Networking
[Diagram: guest domains (application, sockets, frontend driver, virtual CPU, virtual memory), a driver domain (backend driver, device driver), the VMM (CPU scheduling), and hardware (physical CPU, physical memory, network device); the packet path is numbered in five steps, 1 to 5]
27. After Enhanced Networking
[Diagram: guest domains (application, sockets, NIC driver, virtual CPU, virtual memory), a driver domain, the VMM (CPU scheduling), and hardware (physical CPU, physical memory, SR-IOV network device); the packet path is numbered in three steps, 1 to 3]
28. Tip: Use Enhanced Networking
• Highest packets-per-second
• Lowest variance in latency
• Instance OS must support it
• Look for SR-IOV property of instance or image
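One way to look for the SR-IOV property of an instance is to query its `sriovNetSupport` attribute. The sketch below assumes a boto3 EC2 client is passed in; the helper name and the instance ID in the usage comment are hypothetical.

```python
def sriov_enabled(ec2_client, instance_id: str) -> bool:
    """Return True if the instance has the SR-IOV attribute set to 'simple'.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    response = ec2_client.describe_instance_attribute(
        InstanceId=instance_id,
        Attribute="sriovNetSupport",
    )
    # The attribute is empty when enhanced networking has not been enabled.
    return response.get("SriovNetSupport", {}).get("Value") == "simple"

# Usage (requires boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   print(sriov_enabled(boto3.client("ec2"), "i-0123456789abcdef0"))
```

The attribute only tells you that EC2 has been told the instance can use SR-IOV; the instance OS still needs the matching driver.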
30. Auto Recovery for Amazon EC2
• Recovers instances that have become impaired due to an underlying hardware problem
• The instance keeps its instance ID, private IP, Elastic IP, and metadata
• Configured through a CloudWatch alarm with an EC2 action
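A recovery alarm of this kind can be sketched as the parameters for CloudWatch's `put_metric_alarm` call. This is a minimal sketch: the alarm name, period, and evaluation count are illustrative choices, not values from the session.

```python
def recovery_alarm_params(instance_id: str, region: str, evaluation_periods: int = 2) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when the system status check fails and triggers
    the EC2 recover action for the given instance.
    """
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 60,                                # seconds; illustrative
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }

# Usage (requires boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   params = recovery_alarm_params("i-0123456789abcdef0", "us-east-1")
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```

Building the parameters separately from the API call keeps the alarm definition easy to inspect before anything is created.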
31. Auto Recovery for Amazon EC2
• Examples of problems causing system status checks to fail:
  • Loss of network connectivity
  • Loss of system power
  • Software issues on the physical host
  • Hardware issues on the physical host
• Only supported on:
  • C3, C4, M3, M4, R3, T2, and X1 instances
  • Instances in a VPC
  • Instances with shared tenancy
  • Instances that use EBS storage exclusively
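The support conditions above can be expressed as a single check. This sketch encodes only the list on this slide, which reflects the time of the session and may have changed since; the function name is hypothetical, and `"default"` is the tenancy value the EC2 API uses for shared tenancy.

```python
# Instance families listed on the slide as supporting auto recovery.
SUPPORTED_FAMILIES = {"c3", "c4", "m3", "m4", "r3", "t2", "x1"}

def supports_auto_recovery(instance_type: str, in_vpc: bool,
                           tenancy: str, ebs_only: bool) -> bool:
    """Check an instance against the four conditions on the slide."""
    family = instance_type.split(".")[0].lower()
    return bool(
        family in SUPPORTED_FAMILIES
        and in_vpc
        and tenancy == "default"   # shared tenancy
        and ebs_only               # no instance store volumes
    )
```

For example, an `i2.xlarge` fails the check on two counts: the I2 family is not in the list, and it uses local SSD storage.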
32. Auto Scaling – Lifecycle Hooks
• Hold an instance in the Pending:Wait or Terminating:Wait state
• Lifecycle event notifications are delivered via CloudWatch Events or SNS (and can trigger Lambda)
• Default timeout is one hour
• The result can be CONTINUE or ABANDON; set a default result using --default-result
33. Auto Scaling Lifecycle Hooks - Adding
# On launch
aws autoscaling put-lifecycle-hook \
    --lifecycle-hook-name my-hook \
    --auto-scaling-group-name my-asg \
    --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING

# On termination
aws autoscaling put-lifecycle-hook \
    --lifecycle-hook-name my-hook \
    --auto-scaling-group-name my-asg \
    --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING
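Once a hook is holding an instance, something has to release it with CONTINUE or ABANDON before the timeout. A minimal sketch of that step, assuming a boto3 Auto Scaling client is passed in; the helper name is hypothetical, and the group and hook names in the usage comment reuse the `my-asg` and `my-hook` placeholders from the commands above.

```python
def finish_lifecycle_action(autoscaling_client, group_name: str, hook_name: str,
                            instance_id: str, succeeded: bool) -> str:
    """Release an instance held by a lifecycle hook.

    autoscaling_client is expected to behave like boto3.client("autoscaling").
    Returns the result that was sent.
    """
    result = "CONTINUE" if succeeded else "ABANDON"
    autoscaling_client.complete_lifecycle_action(
        AutoScalingGroupName=group_name,
        LifecycleHookName=hook_name,
        InstanceId=instance_id,
        LifecycleActionResult=result,
    )
    return result

# Usage (requires boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   finish_lifecycle_action(boto3.client("autoscaling"),
#                           "my-asg", "my-hook", "i-0123456789abcdef0", True)
```

This would typically run at the end of whatever setup or drain work the hook was waiting on, for example in a function triggered by the lifecycle notification.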
50. Environment Manager (Trainline)
Self-service portal & API for common infrastructure tasks
• Environment abstraction over AWS
• Environment & infrastructure config management
• Deployment & toggling
• Scaling and patching
• Compare and synchronise
51. Environment Manager (Trainline)
• 1,300 servers actively managed
• 20,000 deployments since January
• 40% fewer Jira tickets
• Improved visibility & productivity