Profile hadoop apps

Profiling Hadoop Applications
Basant Verma

Agenda
• Profiling General Background
• Available Options
• Profile using Free and Open Source tools
• Profile using YourKit
• Other troubleshooting tools

What does Profiling Provide?
• Profiling runtime / CPU usage:
– what lines of code the program is spending the most
time in
– what call/invocation paths were used to get to these
lines
• naturally represented as tree structures
• Profiling memory usage:
– what kinds of objects are sitting on the heap
– where were they allocated
– who is pointing to them now
– memory leaks
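
The "tree structure" mentioned above can be sketched in a few lines of Java. This is only an illustration: CallTreeSketch and its methods are hypothetical names, not part of any profiler, and the sample paths are made up. Each sample is a call path from the outermost caller to the innermost frame, and every node counts how many samples passed through it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: folds sampled call paths into a tree with hit counts. */
final class CallTreeSketch {
    final String frame;
    int hits;
    final Map<String, CallTreeSketch> children = new TreeMap<>();

    CallTreeSketch(String frame) {
        this.frame = frame;
    }

    /** Adds one sample; the path runs from the outermost caller to the innermost frame. */
    void add(List<String> path) {
        CallTreeSketch node = this;
        node.hits++;
        for (String f : path) {
            node = node.children.computeIfAbsent(f, CallTreeSketch::new);
            node.hits++;
        }
    }

    /** Renders the tree with two spaces of indentation per call level. */
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(frame).append(" (").append(hits).append(")\n");
        for (CallTreeSketch child : children.values()) {
            child.render(sb, depth + 1);
        }
    }

    public static void main(String[] args) {
        CallTreeSketch root = new CallTreeSketch("<root>");
        // Made-up samples: two ended in tokenize, one in write.
        root.add(Arrays.asList("main", "map", "tokenize"));
        root.add(Arrays.asList("main", "map", "tokenize"));
        root.add(Arrays.asList("main", "map", "write"));
        System.out.print(root.render());
    }
}
```

A node with a high count whose children have low counts is where the time is actually spent; a node whose count is spread over its children is just a path to the hot code.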

Profiler Types and Components
• Components needed for profiling
– Profiling Agent
• Collects profiled data (samples, traces, exceptions etc.)
– Analysis Tool
• Provides an interface for analyzing profiled data and helps the user
identify potential problems
• Types of Profilers
– insertion
– sampling
– instrumenting
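
To make the sampling type concrete, here is a minimal in-process sketch, assuming it is enough to look only at the top stack frame of a single thread. TopFrameSampler is a hypothetical class written for this illustration; real sampling profilers run as an agent and record full traces.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-process sampler: tallies the top stack frame of one thread. */
final class TopFrameSampler {
    private final Map<String, Integer> hits = new HashMap<>();

    /** Records one sample; an empty stack (thread not started or already finished) is ignored. */
    void record(StackTraceElement[] stack) {
        if (stack.length == 0) {
            return;
        }
        String top = stack[0].getClassName() + "." + stack[0].getMethodName();
        hits.merge(top, 1, Integer::sum);
    }

    /** Takes the given number of samples from the target thread, pausing intervalMs between them. */
    void sample(Thread target, int samples, long intervalMs) throws InterruptedException {
        for (int i = 0; i < samples; i++) {
            record(target.getStackTrace());
            Thread.sleep(intervalMs);
        }
    }

    /** Returns how many samples had the given "Class.method" on top of the stack. */
    int hitsFor(String method) {
        return hits.getOrDefault(method, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        // A busy worker thread to observe.
        Thread worker = new Thread(() -> {
            double x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x += Math.sqrt(x + 1);
            }
        });
        worker.setDaemon(true);
        worker.start();

        TopFrameSampler sampler = new TopFrameSampler();
        sampler.sample(worker, 20, 10);
        worker.interrupt();
        System.out.println(sampler.hits);
    }
}
```

The sampler never modifies the code being observed, which is why sampling usually costs less than instrumenting; the price is that short-lived methods can be missed between samples.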

Available Options
• Sun JDK Tools
– hprof: Profiler (uses jvmti)
– jmap: Dumps the heap (memory map) of a running JVM
– jhat: Analyze memory dump
– jstack: Provide thread dump
– jvisualvm: GUI-based profile data analyzer
• Open Source
– VisualVM (same as jvisualvm but downloaded as a standalone app)
• Reads HPROF-format heap dumps. Provides a GUI for analysis of heap dumps and profiler output
– NetBeans Profiler
• Similar to VisualVM but integrated into IDE
– Eclipse MAT (Memory Analysis Tool)
• Can load .hprof files
• Commercial
– YourKit
– JProfiler
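
jstack attaches to a JVM from the outside. When you control the code, a similar thread dump can be produced from inside the process with the standard java.lang.management API. The sketch below is an illustration only; ThreadDumpSketch is a hypothetical class name, and the output format is whatever ThreadInfo.toString() produces, not jstack's format.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: an in-process thread dump, roughly what jstack shows from outside. */
final class ThreadDumpSketch {

    /** Returns one text block per live thread (name, state and the top stack frames). */
    static String dumpThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip monitor and synchronizer details, which not every JVM supports.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append(info); // ThreadInfo.toString() prints only the first few frames
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
    }
}
```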

Official hprof Documentation
usage: java -Xrunhprof:[help]|[<option>=<value>, ...]

Option Name and Value    Description                       Default
---------------------    -----------                       -------
heap=dump|sites|all      heap profiling                    all
cpu=samples|times|old    CPU usage                         off
monitor=y|n              monitor contention                n
format=a|b               text(txt) or binary output        a
file=<file>              write data to file                java.hprof[.txt]
net=<host>:<port>        send data over a socket           off
depth=<size>             stack trace depth                 4
interval=<ms>            sample interval in ms             10
cutoff=<value>           output cutoff point               0.0001
lineno=y|n               line number in traces?            y
thread=y|n               thread in traces?                 n
doe=y|n                  dump on exit?                     y
msa=y|n                  Solaris micro state accounting    n
force=y|n                force output to <file>            y
verbose=y|n              print messages about dumps        y

http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html

Sample hprof usage
• To measure CPU usage, try the following:
java -Xrunhprof:cpu=samples,depth=6,heap=dump <classname>
• Settings:
– Takes samples of CPU execution
– Records call traces that include the last 6 levels on the stack
– Dumps the heap (bigger file size but helps in finding problems)
• Creates the file java.hprof.txt in the current directory
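
Option strings like the one above are easy to mistype, and a stray space would split the argument on the command line. The following sketch assembles the string from option/value pairs and rejects whitespace or commas inside a pair. HprofOptionsSketch is a hypothetical helper, not part of the JDK; it emits the -Xrunhprof: spelling used on this slide, and the same option list can follow -agentlib:hprof= instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of the JDK or Hadoop) that assembles an hprof option string. */
final class HprofOptionsSketch {

    /** Joins option=value pairs with commas and rejects pairs containing whitespace or commas. */
    static String agentArg(Map<String, String> options) {
        StringJoiner joined = new StringJoiner(",", "-Xrunhprof:", "");
        for (Map.Entry<String, String> e : options.entrySet()) {
            String pair = e.getKey() + "=" + e.getValue();
            if (pair.matches(".*[\\s,].*")) {
                throw new IllegalArgumentException("Whitespace or comma in option: " + pair);
            }
            joined.add(pair);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>(); // keeps insertion order
        options.put("cpu", "samples");
        options.put("depth", "6");
        options.put("heap", "dump");
        // Prints: -Xrunhprof:cpu=samples,depth=6,heap=dump
        System.out.println(agentArg(options));
    }
}
```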

HPROF with Hadoop
• Hadoop uses hprof as the default profiler
• Profiling-related parameters

Purpose                               JobConf API                Command-line parameter
-------                               -----------                ----------------------
Enable profiling                      setProfileEnabled(true)    mapred.task.profile=true
Additional parameters for profiler    setProfileParams(…)        mapred.task.profile.params
Range of sampled tasks to profile     setProfileTaskRange(…)     mapred.task.profile.maps
                                                                 mapred.task.profile.reduces

Example
• Using Java API
jobConf.setProfileEnabled(true);
jobConf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites" +
    ",depth=4,thread=y,file=%s");
jobConf.setProfileTaskRange(true, "0-2");   // map tasks 0 to 2
jobConf.setProfileTaskRange(false, "0-1");  // reduce tasks 0 to 1
• Using command-line parameters
hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
    -Dmapred.task.profile=true \
    -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,file=%s \
    -Dmapred.task.profile.maps=0-2 \
    -Dmapred.task.profile.reduces=0-1 \
    input output
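
The range values such as "0-2" select task numbers inclusively, so "0-2" means tasks 0, 1 and 2. The sketch below shows one way such a specification can be interpreted. It is a simplified illustration, not Hadoop's own parser (which may accept more forms); TaskRangeSketch is a hypothetical class that handles only single numbers and closed ranges separated by commas.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified parser for range strings such as "0-2" or "0-1,5". */
final class TaskRangeSketch {
    private final List<int[]> ranges = new ArrayList<>();

    TaskRangeSketch(String spec) {
        for (String part : spec.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue;
            }
            int dash = p.indexOf('-');
            int lo;
            int hi;
            if (dash < 0) {
                lo = Integer.parseInt(p); // single task number
                hi = lo;
            } else {
                lo = Integer.parseInt(p.substring(0, dash));
                hi = Integer.parseInt(p.substring(dash + 1));
            }
            if (lo > hi) {
                throw new IllegalArgumentException("Invalid range: " + p);
            }
            ranges.add(new int[] { lo, hi });
        }
    }

    /** Returns true if the task number falls inside any range; both bounds are inclusive. */
    boolean isIncluded(int taskId) {
        for (int[] r : ranges) {
            if (taskId >= r[0] && taskId <= r[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TaskRangeSketch maps = new TaskRangeSketch("0-2");
        for (int task = 0; task < 5; task++) {
            System.out.println("map task " + task + " profiled: " + maps.isIncluded(task));
        }
    }
}
```

Keeping the range small matters in practice: every profiled task runs slower and produces its own output file.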

Collecting Profiler Output
• Hadoop JobClient automatically downloads profile logs
from all the profiled tasks
– If output format type is not specified, hprof creates profile
output in text format (format=a)
• Profiler Outputs are also available via History WebUI
• You can also download profile output using curl
– curl -o attempt_201305161037_0004_m_000000_0.hprof "http://17.115.13.191:50060/tasklog?plaintext=true&attemptid=attempt_201305161037_0004_m_000000_0&filter=profile"
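
The same download can be scripted when many attempts were profiled. The sketch below builds the URL in the shape shown in the curl example and copies the response to a file. ProfileLogFetchSketch and its method names are hypothetical; the host, port and attempt ID are the ones from the slide and will differ on your cluster.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper that mirrors the curl example for one task attempt. */
final class ProfileLogFetchSketch {

    /** Builds the tasklog URL for the profile output of one attempt. */
    static String profileUrl(String trackerHostPort, String attemptId) {
        return "http://" + trackerHostPort + "/tasklog?plaintext=true&attemptid="
                + attemptId + "&filter=profile";
    }

    /** Downloads the profile output to the target file, replacing it if it exists. */
    static void download(String trackerHostPort, String attemptId, Path target) throws IOException {
        try (InputStream in = URI.create(profileUrl(trackerHostPort, attemptId)).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) {
        String attempt = "attempt_201305161037_0004_m_000000_0";
        // Only prints the URL; call download(...) against a reachable TaskTracker to fetch the file.
        System.out.println(profileUrl("17.115.13.191:50060", attempt));
        System.out.println("would save to " + Paths.get(attempt + ".hprof"));
    }
}
```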

Analyze Profiler output
• You can use VisualVM, NetBeans profiler or
YourKit for analyzing the profiling data.
– The above tools support only the binary format of hprof
output (i.e. option format=b)
• Example
– Run the Hadoop job with the profiler writing binary output
… -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,format=b,file=%s input output
– Load the profiler output using the VisualVM menu option
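
Before loading a downloaded file into a GUI tool, it can save time to check whether it is binary or text output. The sketch below rests on an assumption about the file header that you should verify against your JDK: both formats start with "JAVA PROFILE 1.0." and a version digit, and in the binary format the next byte is NUL, while the text format continues with readable text. HprofFormatSketch is a hypothetical helper.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical check for binary hprof output, based on an assumed header layout. */
final class HprofFormatSketch {
    private static final byte[] MAGIC = "JAVA PROFILE 1.0.".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the header is the magic string, one version digit, then a NUL byte. */
    static boolean looksBinary(byte[] header) {
        int nulIndex = MAGIC.length + 1; // position right after the version digit
        if (header.length <= nulIndex) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return header[nulIndex] == 0;
    }

    /** Reads just enough bytes from the file to apply the header check. */
    static boolean looksBinary(Path file) throws IOException {
        byte[] header = new byte[MAGIC.length + 2];
        try (InputStream in = Files.newInputStream(file)) {
            int n = 0;
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) {
                    return false; // file is shorter than the header
                }
                n += r;
            }
        }
        return looksBinary(header);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + " binary: " + looksBinary(java.nio.file.Paths.get(name)));
        }
    }
}
```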

Analyze Profile Output in VisualVM

Object Query Language
• VisualVM and jhat support a special query
language (OQL) to query the Java heap.
– Example: Select all Strings with length 1K or more
select s from java.lang.String s where s.count >= 1024
• More information about OQL is available at
http://visualvm.java.net/oqlhelp.html

Analyze Profile Output in Eclipse MAT

Profiling Pig Jobs
• Use Hadoop command line parameters
pig -Dmapred.task.profile=true \
    -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,thread=y,verbose=n \
    mypigscript.pig
• More information about Pig job profiling is
available at Pig Wiki
– https://cwiki.apache.org/PIG/howtoprofile.html

Profiling Hive Queries
• Set appropriate Hadoop parameters before
submitting the queries
hive> set mapred.task.profile=true;
hive> set mapred.task.profile.params=-agentlib:hprof=heap=dump,format=b,file=%s;
hive> set mapred.task.profile.maps=0-2;
hive> set mapred.task.profile.reduces=0-0;
hive>
hive> <hive query>

YourKit Profiler - Summary
• Commercial Java Profiling Tool
– Free trial and Open Source licenses are available
• Used by many Open Source projects including
Hadoop, Pig, Hive etc.
• Features
– On-Demand Profiling
– CPU, Memory and Concurrency profiling methods
– IDE integration (Eclipse, NetBeans, IntelliJ)
– Above all, has relatively low performance overhead

Using YourKit Profiler
• You will need to install the YourKit profiler (just the profiler
library) on each TaskTracker
• Tell Hadoop to use a different profiler
… -Dmapred.task.profile.params=-agentpath:<yourkit_path>/libyjpagent.jnilib=dir=/tmp/yourkit_snapnshot,sampling,disablej2ee input output
• Theoretically, you can also use DistributedCache to
make the binaries available on TaskTracker machines
– Though I did not have success with this

Small Glitch
• Hadoop JobClient.waitforCompletion(…) will throw an error because the profile logs
are not available in the default directory.
• However, the job will continue to run successfully.
• To avoid this, you can instead use the mapred.child.java.opts option to specify
the profiling parameters

YourKit to Analyze Jobs
• Can analyze profile output from both YourKit
Profiler and hprof/jmap.

Using other Tools
• JDK Tool ‘jmap’
– Can be used for capturing heap map of a running Java
process and later used for analysis inside VisualVM or
YourKit
• $ jmap -dump:live,format=b,file=xyz.hprof <jvm-pid>
• Don’t run jmap with the -histo:live option on the JobTracker (JT) or NameNode (NN); it forces a full GC
– Java process can also be instructed to generate hprof
dump of heap map in case of OutOfMemoryError
• -XX:+HeapDumpOnOutOfMemoryError
• JDK Tool ‘jhat’
– Can read a heap dump in hprof format and provides a
lightweight web interface to analyze profiler output
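
A heap dump in the same hprof format can also be triggered from inside the process, which is handy when you want a dump at a specific point in the code. The sketch below assumes a HotSpot-based JVM, since it uses com.sun.management.HotSpotDiagnosticMXBean. HeapDumpSketch is a hypothetical helper, and the file name is made up; the .hprof extension is used because newer JDKs require it.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: writes an hprof heap dump of the current JVM (HotSpot only). */
final class HeapDumpSketch {

    /** Dumps the heap into a new file under dir; liveOnly=true dumps only reachable objects. */
    static Path dumpHeap(Path dir, boolean liveOnly) throws IOException {
        // The target must not exist yet, so the name includes a timestamp.
        Path target = dir.resolve("heap-" + System.nanoTime() + ".hprof");
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path dump = dumpHeap(dir, true);
        System.out.println(dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting file can be opened in VisualVM, YourKit, Eclipse MAT or jhat, the same as a dump taken with jmap.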

Other Tools (Cont…)
• Hadoop Vaidya (Simple Diagnostic Tool)
– Identifies common performance problems related
to Hadoop jobs (unbalanced partitioning,
granularity of tasks, combiners etc.)
– Works only at the Hadoop job level (does not
understand the specifics of Hive/Pig)

Other Recommendation
• If possible try running Hadoop (MR/Pig/Hive)
in local mode using LocalJobRunner
– LocalJobRunner runs the entire MapReduce job in
a single JVM
– It simplifies profiling and log collection
– Can also be used for attaching debugger from IDE

Resources
• Troubleshooting Java application
– http://www.oracle.com/technetwork/java/javase/toc-135973.html
• Profile Hadoop Job (Chapter 5, “Hadoop: The Definitive Guide”)
– http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/tuning-a-job/id3545664
• Profiling Pig Job
– https://cwiki.apache.org/PIG/howtoprofile.html
• ‘hprof’ Official Documentation
– http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html
• YourKit Profiler
– http://www.yourkit.com
