2. Agenda
• Profiling General Background
• Available Options
• Profile using Free and Open Source tools
• Profile using YourKit
• Other troubleshooting tools
3. What does Profiling Provide?
• Profiling runtime / CPU usage:
– which lines of code the program spends the most time in
– which call/invocation paths were used to reach these lines
• naturally represented as tree structures
• Profiling memory usage:
– what kinds of objects are sitting on the heap
– where were they allocated
– who is pointing to them now
– memory leaks
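To make the "tree structures" point concrete, here is a minimal sketch in plain Java of how sampled call paths can be merged into a call tree. The class and method names (CallTree, addSample, render) are hypothetical and exist only for illustration; real profilers keep far more data per node.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration: merges sampled call paths into a call tree. */
final class CallTree {
    final String method;
    int samples; // number of samples whose path passes through this node
    final Map<String, CallTree> children = new TreeMap<>();

    CallTree(String method) {
        this.method = method;
    }

    /** Adds one sample; the path is ordered from the outermost caller to the innermost frame. */
    void addSample(List<String> path) {
        CallTree node = this;
        node.samples++;
        for (String frame : path) {
            CallTree child = node.children.get(frame);
            if (child == null) {
                child = new CallTree(frame);
                node.children.put(frame, child);
            }
            child.samples++;
            node = child;
        }
    }

    /** Renders the tree, one node per line, indented by depth. */
    String render() {
        StringBuilder out = new StringBuilder();
        render(out, 0);
        return out.toString();
    }

    private void render(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(method).append(" (").append(samples).append(")\n");
        for (CallTree child : children.values()) {
            child.render(out, depth + 1);
        }
    }

    public static void main(String[] args) {
        CallTree root = new CallTree("<root>");
        root.addSample(Arrays.asList("main", "map", "tokenize"));
        root.addSample(Arrays.asList("main", "map", "tokenize"));
        root.addSample(Arrays.asList("main", "map", "write"));
        root.addSample(Arrays.asList("main", "sort"));
        System.out.print(root.render());
    }
}
```

Each node's count is the number of samples that passed through it, so the heaviest branch of the tree is the hottest call path.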
4. Profiler Types and Components
• Components needed for profiling
– Profiling Agent
• Collects profiled data (samples, traces, exceptions etc.)
– Analysis Tool
• Provides an interface for analyzing profiled data and helps the user
identify potential problems
• Types of Profilers
– insertion
– sampling
– instrumenting
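As a rough illustration of the sampling approach, the sketch below polls one thread's stack at a fixed interval and counts which method is on top. It is a toy under stated assumptions, not how hprof is implemented: the TinySampler class and its methods are hypothetical, and it uses only Thread.getStackTrace() from the standard library.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a sampling profiler: polls one thread's stack at a fixed interval. */
final class TinySampler {

    /** Takes up to maxSamples samples of the target thread and counts the innermost frame of each. */
    static Map<String, Integer> sample(Thread target, int maxSamples, long intervalMs) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < maxSamples && target.isAlive(); i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
        }
        return hits;
    }

    /** Deliberately CPU-heavy work so that samples have something to land on. */
    static double busyWork(long deadlineNanos) {
        double sum = 0;
        while (System.nanoTime() < deadlineNanos) {
            sum += Math.sqrt(sum + 1);
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        final long deadline = System.nanoTime() + 500_000_000L; // run for about 0.5 s
        Thread worker = new Thread(() -> busyWork(deadline), "worker");
        worker.start();
        Map<String, Integer> hits = sample(worker, 20, 10); // sample every 10 ms
        worker.join();
        hits.forEach((method, count) -> System.out.println(count + "\t" + method));
    }
}
```

Sampling only observes the program at intervals, so short-lived methods can be missed; in exchange, the overhead stays low because no code is modified.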
5. Available Options
• Sun JDK Tools
– hprof: Profiler (uses jvmti)
– jmap: Provides a memory map (heap dump)
– jhat: Analyze memory dump
– jstack: Provide thread dump
– jvisualvm: GUI-based profile data analyzer
• Open Source
– VisualVM (same as jvisualvm but downloaded as an independent app)
• Uses HPROF internally for profiling. Provides GUI for analysis of heap dump and profiler outputs
– NetBeans Profiler
• Similar to VisualVM but integrated into IDE
– Eclipse MAT (Memory Analysis Tool)
• Can load .hprof files
• Commercial
– YourKit
– JProfiler
7. Official hprof Documentation
usage: java -Xrunhprof:[help]|[<option>=<value>, ...]
Option Name and Value    Description                      Default
---------------------    -----------                      -------
heap=dump|sites|all      heap profiling                   all
cpu=samples|times|old    CPU usage                        off
monitor=y|n              monitor contention               n
format=a|b               text(txt) or binary output       a
file=<file>              write data to file               java.hprof[.txt]
depth=<size>             stack trace depth                4
interval=<ms>            sample interval in ms            10
cutoff=<value>           output cutoff point              0.0001
lineno=y|n               line number in traces?           y
thread=y|n               thread in traces?                n
doe=y|n                  dump on exit?                    y
msa=y|n                  Solaris micro state accounting   n
force=y|n                force output to <file>           y
verbose=y|n              print messages about dumps       y
http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html
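Since the options are just comma-separated name=value pairs, a small helper can assemble the agent argument and catch stray spaces or commas before they reach the command line. This is a minimal sketch; HprofArgs and build are hypothetical names and not part of the JDK or Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that assembles an hprof agent argument from option/value pairs. */
final class HprofArgs {

    /** Joins the options in insertion order, e.g. -agentlib:hprof=cpu=samples,depth=6. */
    static String build(Map<String, String> options) {
        StringJoiner joined = new StringJoiner(",", "-agentlib:hprof=", "");
        for (Map.Entry<String, String> option : options.entrySet()) {
            check(option.getKey());
            check(option.getValue());
            joined.add(option.getKey() + "=" + option.getValue());
        }
        return joined.toString();
    }

    /** Rejects tokens that would split the argument: empty strings, whitespace, commas. */
    private static void check(String token) {
        if (token.isEmpty()) {
            throw new IllegalArgumentException("empty option name or value");
        }
        for (char c : token.toCharArray()) {
            if (Character.isWhitespace(c) || c == ',') {
                throw new IllegalArgumentException("illegal character in: '" + token + "'");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("cpu", "samples");
        options.put("depth", "6");
        options.put("heap", "dump");
        System.out.println(build(options)); // -agentlib:hprof=cpu=samples,depth=6,heap=dump
    }
}
```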
8. Sample hprof usage
• To measure CPU usage, try the following:
java -Xrunhprof:cpu=samples,depth=6,heap=dump <main-class>
• Settings:
– Takes samples of CPU execution
– Records call traces that include the last 6 levels on the stack
– Dumps the heap (bigger file size but helps in finding problems)
• Creates the file java.hprof.txt in the current directory
9. HPROF with Hadoop
• Hadoop uses hprof as the default profiler
• Profiling related parameters
Purpose                               JobConf API                Command-line parameter
Enable profiling                      setProfileEnabled(true)    mapred.task.profile=true
Additional parameters for profiler    setProfileParams(…)        mapred.task.profile.params
Range of sampled tasks to profile     setProfileTaskRange(…)     mapred.task.profile.maps,
                                                                 mapred.task.profile.reduces
10. Example
• Using Java API
jobConf.setProfileEnabled(true);
jobConf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites" +
    ",depth=4,thread=y,file=%s");
jobConf.setProfileTaskRange(true, "0-2");   // map tasks 0 through 2
jobConf.setProfileTaskRange(false, "0-1");  // reduce tasks 0 and 1
• Using Command line parameters
hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
  -Dmapred.task.profile=true \
  -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,file=%s \
  -Dmapred.task.profile.maps=0-2 \
  -Dmapred.task.profile.reduces=0-1 \
  input output
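The task range values ("0-2", "0-1") select which task indexes get profiled, with both ends inclusive. The sketch below models that selection in plain Java. It is a simplified, hypothetical stand-in (TaskRange is not a Hadoop class) and supports only comma-separated single numbers and closed ranges.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of a profile task range such as "0-2" or "0-1,4". */
final class TaskRange {
    private final List<int[]> spans = new ArrayList<>();

    TaskRange(String spec) {
        for (String part : spec.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue;
            }
            int dash = token.indexOf('-');
            int low = Integer.parseInt(dash < 0 ? token : token.substring(0, dash));
            int high = dash < 0 ? low : Integer.parseInt(token.substring(dash + 1));
            spans.add(new int[] {low, high});
        }
    }

    /** Returns true if the task index falls inside any span (both ends inclusive). */
    boolean includes(int taskIndex) {
        for (int[] span : spans) {
            if (span[0] <= taskIndex && taskIndex <= span[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TaskRange maps = new TaskRange("0-2");
        for (int task = 0; task < 5; task++) {
            System.out.println("map task " + task + " profiled: " + maps.includes(task));
        }
    }
}
```

Profiling only a few tasks keeps the overhead and the volume of profile output small while still giving a representative sample of the job.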
11. Collecting Profiler Output
• Hadoop JobClient automatically downloads profile logs
from all the profiled tasks
– If the output format is not specified, hprof writes the profile
output in text format (format=a)
• Profiler outputs are also available via the History Web UI
• You can also download profile output using curl
– curl -o attempt_201305161037_0004_m_000000_0.hprof "http://17.115.13.191:50060/tasklog?plaintext=true&attemptid=attempt_201305161037_0004_m_000000_0&filter=profile"
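The same download can be scripted. The sketch below builds the tasklog URL in the form used by the curl command above and streams the response to a file. ProfileFetcher and its methods are hypothetical names; the host, port and attempt ID are the ones from this slide and need to be replaced with your own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper mirroring the curl command above. */
final class ProfileFetcher {

    /** Builds the TaskTracker tasklog URL that serves one attempt's profile output. */
    static String profileUrl(String trackerHostPort, String attemptId) {
        return "http://" + trackerHostPort + "/tasklog?plaintext=true&attemptid="
                + attemptId + "&filter=profile";
    }

    /** Streams the profile output into the given file, replacing any existing file. */
    static void download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        String attempt = "attempt_201305161037_0004_m_000000_0";
        String url = profileUrl("17.115.13.191:50060", attempt);
        System.out.println(url);
        // Pass any argument to actually download; this needs a reachable TaskTracker.
        if (args.length > 0) {
            download(url, Paths.get(attempt + ".hprof"));
        }
    }
}
```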
13. Analyze Profiler output
• You can use VisualVM, NetBeans profiler or
YourKit for analyzing the profiling data.
– These tools support only the binary format of hprof
output (i.e. option format=b)
• Example
– Run the profiler with a Hadoop job
hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
  -Dmapred.task.profile=true \
  -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,format=b,file=%s \
  input output
– Load the profiler output using the VisualVM menu option
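Before loading a downloaded profile into a GUI tool, it can help to check whether it is in text or binary form. The sketch below relies on an assumption about the file headers that you should verify against your own files: both formats start with "JAVA PROFILE " and a version, but the binary format follows the version with a NUL byte while the text format continues with ", created ...". HprofFormat and detect are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical check of an hprof file header; the header layout is an assumption. */
final class HprofFormat {
    private static final byte[] MAGIC = "JAVA PROFILE ".getBytes(StandardCharsets.US_ASCII);

    /** Returns "binary", "text" or "unknown" based on the first bytes of the file. */
    static String detect(byte[] head) {
        if (head.length < MAGIC.length) {
            return "unknown";
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return "unknown";
            }
        }
        for (int i = MAGIC.length; i < head.length; i++) {
            if (head[i] == 0) {
                return "binary"; // assumed: NUL terminates the version string (format=b)
            }
            if (head[i] == ',' || head[i] == '\n') {
                return "text"; // assumed: text header continues on the same line (format=a)
            }
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HprofFormat <profile-file>");
            return;
        }
        byte[] head = new byte[32];
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int read = in.read(head);
            System.out.println(detect(Arrays.copyOf(head, Math.max(read, 0))));
        }
    }
}
```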
15. Object Query Language
• VisualVM and jhat support a special query
language (OQL) for querying the Java heap.
– Example: select all Strings with length 1K or more
select s from java.lang.String s where s.count >= 1024
• More information about OQL is available at
http://visualvm.java.net/oqlhelp.html
17. Profiling Pig Jobs
• Use Hadoop command line parameters
pig -Dmapred.task.profile=true \
  -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,thread=y,verbose=n \
  -Dmapred.task.profile.maps=0-2 \
  -Dmapred.task.profile.reduces=0-0 \
  mypigscript.pig
• More information about Pig job profiling is
available at the Pig Wiki
– https://cwiki.apache.org/PIG/howtoprofile.html
18. Profiling Hive Queries
• Set appropriate Hadoop parameters before
submitting the queries
hive> set mapred.task.profile=true;
hive> set mapred.task.profile.params=-agentlib:hprof=heap=dump,format=b,file=%s;
hive> set mapred.task.profile.maps=0-2;
hive> set mapred.task.profile.reduces=0-0;
hive>
hive> <hive query>
20. YourKit Profiler - Summary
• Commercial Java Profiling Tool
– Free trial and Open Source licenses are available
• Used by many Open Source projects including
Hadoop, Pig, Hive etc.
• Features
– On-Demand Profiling
– CPU, Memory and Concurrency profiling methods
– IDE integration (Eclipse, NetBeans, IntelliJ)
– Above all, has relatively low performance overhead
21. Using YourKit Profiler
• You will need to install the YourKit profiler (just the profiler
lib) on each TaskTracker
• Tell Hadoop to use a different profiler
hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
  -Dmapred.task.profile=true \
  -Dmapred.task.profile.params=-agentpath:<yourkit_path>/libyjpagent.jnilib=dir=/tmp/yourkit_snapshot,sampling,disablej2ee \
  -Dmapred.task.profile.maps=0-2 \
  -Dmapred.task.profile.reduces=0-1 \
  input output
• Theoretically, you can also use DistributedCache to
make the binaries available on TaskTracker machines
– Though, I did not have success with this
22. Small Glitch
• Hadoop JobClient.waitForCompletion(…) will throw an error because the profile logs
are not available in the default directory.
• However, the job will continue to run successfully.
• To avoid this, you can instead use the mapred.child.java.opts option to specify
the profiling parameters
23. YourKit to Analyze Jobs
• Can analyze profile output from both YourKit
Profiler and hprof/jmap.
25. Using other Tools
• JDK Tool ‘jmap’
– Can be used for capturing heap map of a running Java
process and later used for analysis inside VisualVM or
YourKit
• $ jmap -dump:live,format=b,file=xyz.hprof <jvm-pid>
• Don’t run jmap with the -histo:live option on the JobTracker (JT) or NameNode (NN): the live option forces a full GC, which pauses the process
– Java process can also be instructed to generate hprof
dump of heap map in case of OutOfMemoryError
• -XX:+HeapDumpOnOutOfMemoryError
• JDK Tool ‘jhat’
– Can read a heap dump in hprof format and provides a
lightweight web interface to analyze profiler output
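A heap dump can also be triggered from inside the JVM rather than with jmap. The sketch below uses the HotSpotDiagnosticMXBean from com.sun.management, so it assumes a HotSpot-based JVM; it is an alternative to the jmap command above, not what jmap itself does. HeapDumper is a hypothetical name. The resulting file is in binary hprof format.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: dumps the heap of the current JVM (HotSpot only). */
final class HeapDumper {

    /**
     * Writes a binary hprof heap dump to the given file, which must not exist yet.
     * liveOnly=true dumps only reachable objects, similar to jmap's "live" option.
     */
    static void dump(Path file, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(file.toString(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dir.resolve("demo.hprof");
        dump(file, true);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

As with jmap, dumping only live objects implies a full GC first, so the same caution about busy master processes applies.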
26. Other Tools (Cont…)
• Hadoop Vaidya (Simple Diagnostic Tool)
– Identifies common performance problems related
to Hadoop jobs (unbalanced partitioning,
granularity of tasks, combiners etc.)
– Works only at the Hadoop job level (does not
understand the specifics of Hive/Pig)
27. Other Recommendation
• If possible try running Hadoop (MR/Pig/Hive)
in local mode using LocalJobRunner
– LocalJobRunner runs the entire MapReduce job in
a single JVM
– It simplifies profiling and log collection
– Can also be used for attaching debugger from IDE