3. Who am I?
Co-founder and CTO at
Architect of Karmasphere’s solutions
Have been working with Hadoop since …
Written a few compilers
Broken a few things:
› computers, security systems, bosses, etc.
4. Survey of Questions
1164 questions, categorized based on user questions and issues
[Chart: share of questions per category, axis 0% to 100%]
› How to setup Hadoop?
› How to get stuff to/from Hadoop?
› How to use Hadoop?
› How to know what the cluster is doing?
› Why does Hadoop do ….?
› How to maintain the cluster?
› Others
Source: Hadoop Users Mail-list (March 2009-June 2010)
5. Problems Past – Cluster as a Utility
Getting a cluster – it’s a utility (like electricity)
› Amazon EMR, Hadoop, Cloudera, IBM, Yahoo
Cluster versions and protocols
› Easy to switch between clusters
› Staging for faster development
› Easy to migrate data
› Talk to remote clusters
6. Karmasphere Client
Ensures Hadoop distribution and version independence
Works from Windows (unlike Hadoop Client), Mac and Linux
Supports any Hadoop environment: private, public or cloud service.
Provides:
› Job portability
› Operating system portability
› Firewall hopping and tunnelling
› Fault tolerant API
› Synchronous and Asynchronous API
› Clean Object Oriented design
Making it easy and predictable to maintain a business operation reliant on Hadoop
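The "fault tolerant" and "synchronous and asynchronous" bullets above can be illustrated with a small sketch. This is not the Karmasphere Client API, which the slides do not show. Every name here (`with_retries`, `make_flaky_submit`, `run_sync`, `run_async`) is hypothetical, and the "cluster" is a stand-in function that fails a set number of times before succeeding.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(action, attempts=3, delay_seconds=0.0):
    """Call action() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except ConnectionError as error:
            last_error = error
            time.sleep(delay_seconds)
    raise last_error


def make_flaky_submit(failures_before_success):
    """Hypothetical stand-in for a remote job submission that fails a few times."""
    state = {"calls": 0}

    def submit():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("cluster unreachable")
        return "job-finished"

    return submit


def run_sync(submit):
    """Synchronous style: block until the job result is available."""
    return with_retries(submit)


def run_async(executor, submit):
    """Asynchronous style: return a Future at once; the caller waits later."""
    return executor.submit(with_retries, submit)


if __name__ == "__main__":
    print(run_sync(make_flaky_submit(2)))
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = run_async(pool, make_flaky_submit(1))
        print(future.result())
```

The retry wrapper is shared by both styles, so the caller chooses between blocking and a Future without changing how failures are handled.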
8. Problems Present – Interact with Cluster
Getting data in
Getting data out
9. Problems Present – Interact with Cluster
Getting data in
Getting data out
…
This is the problem:
› Can’t get data out
› Have to extract information
10. Writing a MapReduce Job
Understanding MapReduce
Boilerplate is boring
Testing takes time
Debugging is difficult
What Happened?
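A minimal sketch of the MapReduce model behind this slide, assuming a word-count job. It runs in plain Python with no Hadoop dependency: the function names (`map_words`, `reduce_counts`, `run_local`) are illustrative only, and the in-memory grouping stands in for the shuffle a real cluster performs.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def run_local(lines):
    """Simulate the shuffle in memory: group map output by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_local(["Hadoop makes data big", "big data big cluster"]))
```

Keeping the map and reduce logic in plain functions like these lets it be tested locally in seconds, before any job is submitted to a cluster.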
12. Present Continuous
Why did my job fail?
› Monitoring
› Diagnostics
› Debugging
What do I need to know about my job?
› Valgrind, lint, coverity, gprof, gdb, findbugs, sparse, JSR305, ....
Why did my job do ….?
17. Traditional Approach vs. Karmasphere Approach
[Diagram: two stacks side by side, from Client Side (top) to Server Side (bottom)]
Traditional Approach:
› User
› Hive JDBC Thrift Proxy
› Thrift Server
› Hive Engine
› Hadoop Client
› Job Tracker
› Cluster (Hadoop)
› Rich communications required for Hive
› All communications ‘hampered’ through JDBC Thrift proxy
Karmasphere Approach:
› User
› Karmasphere Application Framework
› Native Hadoop Protocol
› Job Tracker
› Cluster (Hadoop)
› Rich communication supported within Karmasphere Application framework
› Debug/optimization information
18. Your time costs money
[Diagram: cycle of Theory, Experiment, Results]
19. Get Working Efficiently with Hadoop
Karmasphere Studio: Community Edition (free)
Karmasphere Studio: Professional Edition
› ($200 introductory discount for attendees)
Karmasphere Client (Enterprise license)
Karmasphere Studio: Analyst Edition
› Coming sooner than you think!