2. What is Hadoop? :) :) :)
Everybody knows that
... What is your definition?
3. What is a cloud?
Everybody knows that, but
1. Elastic resources
2. Internet delivery
3. SaaS
4. Virtualization
5. Device-enabled
6. Only (1), or all of the above?
5. You are the Hadoop programmer
... and you need tools
What are your alternatives?
● IDE - compile and run the code
● Local "cluster" - local file system
● Pseudo-distributed cluster - test outside the IDE, with all daemons on one machine
● EC2 - test on the cluster, test for scale
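The difference between the local "cluster" and the pseudo-distributed cluster mostly comes down to a couple of configuration properties. Below is a minimal sketch, assuming Hadoop 2+ property names (`fs.defaultFS`, `mapreduce.framework.name`); the host and port `localhost:9000` are an assumption, and `MODE_PROPERTIES` and `render_configuration` are hypothetical helper names. For brevity it puts both properties in one fragment, whereas a real setup splits them between core-site.xml and mapred-site.xml.

```python
from xml.sax.saxutils import escape

# Property values per run mode. Names follow Hadoop 2+ conventions;
# the host and port in the pseudo-distributed case are assumptions.
# Real setups split these across core-site.xml and mapred-site.xml.
MODE_PROPERTIES = {
    "local": {
        "fs.defaultFS": "file:///",           # plain local file system
        "mapreduce.framework.name": "local",  # jobs run in a single JVM
    },
    "pseudo-distributed": {
        "fs.defaultFS": "hdfs://localhost:9000",  # HDFS daemon on this machine
        "mapreduce.framework.name": "yarn",       # jobs go through YARN
    },
}


def render_configuration(mode):
    """Return a Hadoop-style <configuration> XML fragment for the given mode."""
    lines = ["<configuration>"]
    for name, value in MODE_PROPERTIES[mode].items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_configuration("local"))
```

Keeping the modes in one table makes it easy to see that the job code stays the same and only the configuration changes.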
6. What are your resources?
● Tom White, "Hadoop: The Definitive Guide"
● www.hadoopilluminated.com
10. Whirr limitations
● No EBS
● All or nothing
● Generates configuration artifacts
● Takes over your computer, no more local development - uses proxy
● Hard to customize
12. EMR limitations
● No choice of image
● Fixed architecture
● Hard to debug
● Hard to customize
13. You do it
Repeat the manual procedure, only automate it
Prepare
AMI, Java, Hadoop
On-the-fly
Start AMI, login, configure, start services, verify, run test jobs
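The on-the-fly sequence above can be sketched as an ordered list of steps with retries. This is only an illustration of the "repeat the manual procedure, only automate it" idea: `run_steps` and `placeholder` are hypothetical names, and the actions are stand-ins that record their own name rather than real calls to EC2 or Hadoop.

```python
import time


def run_steps(steps, retries=2, delay_seconds=0.0):
    """Run (name, callable) steps in order and return the names that completed.

    A failing step is retried up to `retries` times; if it still fails,
    the whole run stops with a RuntimeError naming the step.
    """
    completed = []
    for name, action in steps:
        for attempt in range(retries + 1):
            try:
                action()
                completed.append(name)
                break
            except Exception as error:
                if attempt == retries:
                    raise RuntimeError(f"step '{name}' failed: {error}") from error
                time.sleep(delay_seconds)
    return completed


def placeholder(name, log):
    """Build a stand-in action that only records its name (hypothetical)."""
    def action():
        log.append(name)
    return action


if __name__ == "__main__":
    log = []
    names = ["start AMI", "login", "configure",
             "start services", "verify", "run test jobs"]
    steps = [(n, placeholder(n, log)) for n in names]
    print(run_steps(steps))
```

In a real script each placeholder would be replaced by whatever you did by hand, for example an SSH command or a cloud API call, while the ordering and retry logic stay the same.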
14. You do it - advanced
On startup
Under-provision, over-provision, progress
On-the-fly
Monitor, run test jobs, watch for cluster deterioration
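One possible reading of the startup and monitoring checks, as a rough sketch: compare the number of nodes you asked for with the number actually running, and flag deterioration when the live-node count falls below its earlier peak. The function names, the `tolerance` parameter, and the idea of sampling live-node counts are assumptions made for illustration, not something the slides specify.

```python
def provisioning_status(requested, running):
    """Classify a cluster by comparing requested and running node counts."""
    if running < requested:
        return "under-provisioned"
    if running > requested:
        return "over-provisioned"
    return "ok"


def startup_progress(requested, running):
    """Return startup progress as a fraction between 0.0 and 1.0."""
    if requested <= 0:
        return 1.0
    return min(running / requested, 1.0)


def is_deteriorating(live_node_counts, tolerance=0):
    """Report whether the latest sample dropped below the peak by more than `tolerance`.

    `live_node_counts` is a list of periodic samples, oldest first.
    """
    if not live_node_counts:
        return False
    return max(live_node_counts) - live_node_counts[-1] > tolerance


if __name__ == "__main__":
    print(provisioning_status(requested=5, running=3))
    print(startup_progress(requested=5, running=3))
    print(is_deteriorating([5, 5, 4, 3], tolerance=1))
```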