2. AMAZON WEB SERVICES (AWS)
Elastic Compute Cloud (EC2)
Virtual Machine in Cloud
Simple Storage Service (S3)
Network Share in Cloud
Elastic MapReduce (EMR)
Cluster of EC2 instances for Hadoop cluster
3. EC2 PRICE TYPES
Spot Instances
System for bidding on unused instances
Same Performance
Go away (abruptly) if outbid
On Demand
Ad Hoc starting
Reserved
Not Covered
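Because a Spot Instance can go away abruptly, long-running work is easier to resume if it records its progress as it goes. The sketch below is one possible way to do that in Python and is not taken from the project itself: the function names, the JSON checkpoint layout, and the local checkpoint path are all assumptions for illustration. In practice the checkpoint would need to live on storage that survives the instance, such as S3.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next item to process, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_item"]


def save_checkpoint(path, next_item):
    """Write the checkpoint via a temp file and rename, so a sudden stop cannot leave it half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_item": next_item}, f)
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem


def run(items, path, process, checkpoint_every=1000):
    """Process items in order, resuming from the last saved checkpoint."""
    start = load_checkpoint(path)
    for i in range(start, len(items)):
        process(items[i])
        if (i + 1) % checkpoint_every == 0:
            save_checkpoint(path, i + 1)
    save_checkpoint(path, len(items))
```

If the instance is reclaimed mid-run, at most `checkpoint_every` items are repeated when `run` is called again on a replacement instance.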
5. MILLION MONKEYS PROJECT
Randomly recreated Shakespeare
Open source
Good metric for CPU and memory
6. EC2 SPECIFICATIONS
Instance Name     Memory   EC2 Compute Units / Cores   Platform   I/O Performance
Small             1.7 GB   1 ECU on 1 Core             32-bit     Moderate
Large             7.5 GB   4 ECU on 2 Cores            64-bit     High
Extra Large       15 GB    8 ECU on 8 Cores            64-bit     High
High-CPU Medium   1.7 GB   5 ECU on 2 Cores            32-bit     Moderate
High-CPU Large    7 GB     20 ECU on 8 Cores           64-bit     High
Quad XL           23 GB    33.5 ECU on 8 Cores         64-bit     Very High
EC2 Compute Unit (ECU) – One EC2 Compute Unit (ECU) provides
the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or
2007 Xeon processor.
7. EC2 PERFORMANCE
My Core 2 Duo 2.66 GHz did 50,000,000,000 character groups
16. BREAKDOWNS
Original project would have run in 3 days 9 hours
Took 1.5 months before
20 node cluster costs $45.44 per day
5 day run cost $317
11 day run cost $528
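The figures above can be reduced to per-node and per-day rates with simple arithmetic. The sketch below uses only the numbers on this slide; the function names are illustrative. Note that the run totals are not simply the number of days times $45.44, and the slide does not say why, so the code only reports averages and does not try to explain the difference.

```python
def cost_per_node_hour(cluster_cost_per_day, nodes):
    """Dollars per node per hour for a cluster billed at a flat daily rate."""
    return cluster_cost_per_day / (nodes * 24)


def average_cost_per_day(total_cost, days):
    """Average dollars per day over a whole run."""
    return total_cost / days


# 20-node cluster at $45.44 per day (from the slide)
print(round(cost_per_node_hour(45.44, 20), 4))  # about 0.0947 dollars per node-hour

# Average daily cost of the two runs (from the slide)
print(average_cost_per_day(317, 5))
print(average_cost_per_day(528, 11))
```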
17. ENGINEERING FOR THE CLOUD
Establish whether the problem is a good fit
Test the EC2 performance
Figure out a unit or widget
Find the most cost-efficient EC2 performer by price per unit/widget
Engineer with Spot Instances in mind
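The "price per unit/widget" step can be sketched as a small ranking: benchmark each instance type, divide its hourly price by its measured throughput, and sort. The class name, field names, instance labels, and every number below are placeholders for illustration, not measurements from this project or actual AWS prices.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    widgets_per_hour: float  # throughput measured by your own benchmark
    price_per_hour: float    # dollars per instance-hour

    @property
    def price_per_widget(self):
        return self.price_per_hour / self.widgets_per_hour


def rank_by_cost_efficiency(candidates):
    """Return candidates sorted from cheapest to most expensive per widget."""
    return sorted(candidates, key=lambda c: c.price_per_widget)


# Placeholder values only; substitute your own benchmark results and current prices.
candidates = [
    Candidate("type-a", widgets_per_hour=100.0, price_per_hour=0.10),
    Candidate("type-b", widgets_per_hour=450.0, price_per_hour=0.40),
    Candidate("type-c", widgets_per_hour=900.0, price_per_hour=1.00),
]

for c in rank_by_cost_efficiency(candidates):
    print(f"{c.name}: ${c.price_per_widget:.6f} per widget")
```

With these placeholder inputs the fastest instance is not the cheapest per widget, which is the reason for measuring a unit of work before choosing an instance type.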
18. CONCLUSIONS
Spot Instance Savings
From $2.20 to $1.30 per hour
Saved $1,000 in one run
Hadoop/EMR Scalability
95% efficiency at 2-5 nodes
87% efficiency at 10 nodes
84% efficiency at 20 nodes
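The numbers on this slide can be turned into a savings percentage and an effective speedup. The sketch below assumes the usual definition of parallel efficiency (speedup divided by node count), which the slide does not state, and it treats the two hourly rates as directly comparable without knowing exactly what they cover. The function names are illustrative.

```python
def fraction_saved(on_demand_rate, spot_rate):
    """Fraction of the hourly cost saved by paying the spot rate."""
    return (on_demand_rate - spot_rate) / on_demand_rate


def effective_speedup(nodes, efficiency):
    """Speedup over one node, assuming efficiency = speedup / nodes."""
    return nodes * efficiency


# $2.20 to $1.30 per hour (from the slide)
print(f"{fraction_saved(2.20, 1.30):.1%}")  # 40.9%

# Efficiency figures from the slide
for nodes, efficiency in [(10, 0.87), (20, 0.84)]:
    print(nodes, "nodes:", round(effective_speedup(nodes, efficiency), 1), "x")
```

Under that assumption, 10 nodes behave like roughly 8.7 single nodes and 20 nodes like roughly 16.8.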