Profiling Hadoop Applications
Basant Verma
Agenda
• Profiling General Background
• Available Options
• Profile using Free and Open Source tools
• Profile using YourKit
• Other troubleshooting tools
What does Profiling Provide?
• Profiling runtime / CPU usage:
– what lines of code the program is spending the most
time in
– what call/invocation paths were used to get to these
lines
• naturally represented as tree structures
• Profiling memory usage:
– what kinds of objects are sitting on the heap
– where were they allocated
– who is pointing to them now
– memory leaks
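To make the "tree structure" point concrete, here is a minimal Java sketch of how sampled call paths can be folded into a call tree with per-node sample counts. The class name CallTree and the frame names are hypothetical and not taken from any particular profiler.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: folds sampled call paths into a call tree. */
public class CallTree {
    public final String method;
    public int samples;  // number of samples whose call path passes through this node
    public final Map<String, CallTree> children = new LinkedHashMap<>();

    public CallTree(String method) {
        this.method = method;
    }

    /** Adds one sampled call path, outermost frame first. */
    public void add(List<String> path) {
        CallTree node = this;
        node.samples++;
        for (String frame : path) {
            node = node.children.computeIfAbsent(frame, CallTree::new);
            node.samples++;
        }
    }

    /** Renders the tree with two spaces of indentation per call depth. */
    public String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(method).append(" (").append(samples).append(")\n");
        for (CallTree child : children.values()) {
            child.render(sb, depth + 1);
        }
    }

    public static void main(String[] args) {
        CallTree root = new CallTree("<root>");
        root.add(Arrays.asList("main", "map", "tokenize"));
        root.add(Arrays.asList("main", "map", "tokenize"));
        root.add(Arrays.asList("main", "map", "write"));
        System.out.print(root.render());
    }
}
```

The hottest path is the chain of children with the largest counts, which is what the tree views in the analysis tools let you drill into.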
Profiler Types and Components
• Components needed for profiling
– Profiling Agent
• Collects profiled data (samples, traces, exceptions etc.)
– Analysis Tool
• Provides an interface for analyzing profiled data and helps the user
identify potential problems
• Types of Profilers
– insertion
– sampling
– instrumenting
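As a rough illustration of the sampling approach, the sketch below periodically captures the stacks of all live threads and counts which method is on top. It uses the standard Thread.getAllStackTraces() call; the class name StackSampler is hypothetical, and real agents such as hprof work through JVMTI rather than this API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a sampling profiler built on Thread.getAllStackTraces(). */
public class StackSampler {

    /** Samples all live threads `rounds` times, `intervalMs` apart, counting each stack's top frame. */
    public static Map<String, Integer> sample(int rounds, long intervalMs) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < rounds; i++) {
            for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                if (stack.length == 0) {
                    continue;  // no stack trace available for this thread right now
                }
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt and stop sampling
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Integer> hits = sample(20, 10);
        hits.forEach((method, count) -> System.out.println(count + "\t" + method));
    }
}
```

Sampling keeps overhead low because the program runs unmodified between samples; the price is that short-lived methods can be missed entirely.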
Available Options
• Sun JDK Tools
– hprof: Profiler (uses JVMTI)
– jmap: Dumps the heap (memory map)
– jhat: Analyzes heap dumps
– jstack: Provides thread dumps
– jvisualvm: GUI-based analyzer for profile data
• Open Source
– Visual VM (same as jvisualvm but downloaded as independent app)
• Uses HPROF internally for profiling. Provides GUI for analysis of heap dump and profiler outputs
– NetBeans Profiler
• Similar to VisualVM but integrated into IDE
– Eclipse MAT (Memory Analysis Tool)
• Can load .hprof files
• Commercial
– YourKit
– JProfiler
USING HPROF
Official hprof Documentation
usage: java -Xrunhprof:[help]|[<option>=<value>, ...]

Option Name and Value   Description                      Default
---------------------   -----------                      -------
heap=dump|sites|all     heap profiling                   all
cpu=samples|times|old   CPU usage                        off
monitor=y|n             monitor contention               n
format=a|b              text(txt) or binary output       a
file=<file>             write data to file               off
depth=<size>            stack trace depth                4
interval=<ms>           sample interval in ms            10
cutoff=<value>          output cutoff point              0.0001
lineno=y|n              line number in traces?           y
thread=y|n              thread in traces?                n
doe=y|n                 dump on exit?                    y
msa=y|n                 Solaris micro state accounting   n
force=y|n               force output to <file>           y
verbose=y|n             print messages about dumps       y

http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html
Sample hprof usage
• To measure CPU usage, try the following:
java -Xrunhprof:cpu=samples,depth=6,heap=dump
• Settings:
– Takes samples of CPU execution
– Records call traces that include the last 6 levels on the
stack
– Dumps the heap map (bigger file size but helps in
finding problems)
• Creates the file java.hprof.txt in the
current directory
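The text output can also be post-processed without a GUI. The sketch below pulls the CPU SAMPLES table out of a java.hprof.txt file, assuming the layout shown in the hprof documentation (a "CPU SAMPLES BEGIN" line, a header row, then rank / self / accum / count / trace / method columns). The class name HprofCpuSamples and the sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: extracts the CPU SAMPLES table from hprof text output. */
public class HprofCpuSamples {

    /** One row of the table: rank, self percentage, sample count, method name. */
    public static final class Row {
        public final int rank;
        public final double selfPercent;
        public final int count;
        public final String method;

        Row(int rank, double selfPercent, int count, String method) {
            this.rank = rank;
            this.selfPercent = selfPercent;
            this.count = count;
            this.method = method;
        }
    }

    /** Parses the rows between "CPU SAMPLES BEGIN" and "CPU SAMPLES END". */
    public static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        boolean inSection = false;
        for (String line : lines) {
            if (line.startsWith("CPU SAMPLES BEGIN")) {
                inSection = true;
                continue;
            }
            if (line.startsWith("CPU SAMPLES END")) {
                break;
            }
            if (!inSection) {
                continue;
            }
            String[] f = line.trim().split("\\s+");
            if (f.length < 6 || !f[0].matches("\\d+")) {
                continue;  // skips the "rank self accum count trace method" header
            }
            rows.add(new Row(Integer.parseInt(f[0]),
                    Double.parseDouble(f[1].replace("%", "")),
                    Integer.parseInt(f[3]),
                    f[5]));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data in the assumed hprof text layout.
        List<String> sample = Arrays.asList(
                "CPU SAMPLES BEGIN (total = 3) Thu Jan  1 00:00:00 1970",
                "rank   self  accum   count trace method",
                "   1 66.67% 66.67%       2 300001 demo.WordMapper.map",
                "   2 33.33% 100.00%       1 300002 demo.SumReducer.reduce",
                "CPU SAMPLES END");
        for (Row r : parse(sample)) {
            System.out.println(r.rank + " " + r.selfPercent + "% " + r.count + " " + r.method);
        }
    }
}
```

The trace column (skipped here) is the ID of a TRACE entry earlier in the same file, which holds the call stack recorded to the depth set with depth=.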
HPROF with Hadoop
• Hadoop uses hprof as the default profiler
• Profiling related parameters
Purpose                              JobConf API               Command line parameter
-------                              -----------               ----------------------
Enable profiling                     setProfileEnabled(true)   mapred.task.profile=true
Additional parameters for profiler   setProfileParams(…)       mapred.task.profile.params
Range of sampled tasks to profile    setProfileTaskRange(…)    mapred.task.profile.maps
                                                               mapred.task.profile.reduces
Example
• Using Java API
    jobConf.setProfileEnabled(true);
    jobConf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites"
        + ",depth=4,thread=y,file=%s");
    jobConf.setProfileTaskRange(true, "0-2");    // map tasks 0-2
    jobConf.setProfileTaskRange(false, "0-1");   // reduce tasks 0-1
• Using Command line parameters
    hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
      -Dmapred.task.profile=true \
      -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,file=%s \
      -Dmapred.task.profile.maps=0-2 \
      -Dmapred.task.profile.reduces=0-1 \
      input output
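When jobs are launched from a script or wrapper, the same properties can be assembled programmatically. The following is a minimal sketch in plain Java (no Hadoop dependency); the class and method names are hypothetical, and only the property names come from the slides. The agent string must not contain spaces, since the shell would split it into separate arguments, and the %s is left as-is for Hadoop to replace with the profile output file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: assembles the -D options that turn on task profiling. */
public class ProfileArgs {

    /** Builds the hprof agent string used as the value of mapred.task.profile.params. */
    public static String hprofParams(String cpu, String heap, int depth, boolean binary) {
        return "-agentlib:hprof=cpu=" + cpu
                + ",heap=" + heap
                + ",depth=" + depth
                + ",thread=y"
                + (binary ? ",format=b" : "")
                + ",file=%s";  // placeholder kept literally; Hadoop fills in the file name
    }

    /** Builds the command line options for the given agent string and task ranges. */
    public static List<String> build(String params, String maps, String reduces) {
        List<String> args = new ArrayList<>();
        args.add("-Dmapred.task.profile=true");
        args.add("-Dmapred.task.profile.params=" + params);
        args.add("-Dmapred.task.profile.maps=" + maps);
        args.add("-Dmapred.task.profile.reduces=" + reduces);
        return args;
    }

    public static void main(String[] args) {
        String params = hprofParams("samples", "sites", 4, false);
        System.out.println(String.join(" ", build(params, "0-2", "0-1")));
    }
}
```

Passing binary=true adds format=b, which is the variant needed when the output is meant for a GUI analyzer.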
Collecting Profiler Output
• Hadoop JobClient automatically downloads profile logs
from all the profiled tasks
– If output format type is not specified, hprof creates profile
output in text format (format=a)
• Profiler Outputs are also available via History WebUI
• You can also download profile output using curl
– curl -o attempt_201305161037_0004_m_000000_0.hprof "http://17.115.13.191:50060/tasklog?plaintext=true&attemptid=attempt_201305161037_0004_m_000000_0&filter=profile"
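The download can also be scripted for several task attempts. Below is a small sketch that rebuilds the tasklog URL used in the curl example and saves the response to a file. The class name TaskLogUrl is hypothetical; the host, port, and attempt ID in main are the ones from the slide and will differ on your cluster.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: fetches the profile log of one task attempt from a TaskTracker. */
public class TaskLogUrl {

    /** Builds the tasklog URL with the query parameters used in the curl example. */
    public static String profileUrl(String trackerHost, int port, String attemptId) {
        return "http://" + trackerHost + ":" + port
                + "/tasklog?plaintext=true&attemptid=" + attemptId + "&filter=profile";
    }

    /** Downloads the given URL into the target file, replacing any existing file. */
    public static void download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) {
        String attempt = "attempt_201305161037_0004_m_000000_0";
        String url = profileUrl("17.115.13.191", 50060, attempt);
        System.out.println(url);
        Path target = Paths.get(attempt + ".hprof");
        System.out.println("would save to " + target);
        // download(url, target);  // uncomment when the TaskTracker is reachable
    }
}
```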
Task User Log
Analyze Profiler output
• You can use VisualVM, NetBeans profiler or
YourKit for analyzing the profiling data.
– The above tools support only binary format of hprof
output (i.e. option format=b)
• Example
– Run profiler with Hadoop job
    hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
      -Dmapred.task.profile=true \
      -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,format=b,file=%s \
      input output
– Load Profiler output using VisualVM menu option
Analyze Profile Output in VisualVM
Object Query Language
• VisualVM and jhat support a special query
language (OQL) for querying the Java heap.
– Example: select all Strings with length 1K or more
    select s from java.lang.String s where s.count >= 1024
• More information about OQL is available at
http://visualvm.java.net/oqlhelp.html
Analyze Profile Output in Eclipse MAT
Profiling Pig Jobs
• Use Hadoop command line parameters
• More information about Pig job profiling is
available at Pig Wiki
– https://cwiki.apache.org/PIG/howtoprofile.html
pig -Dmapred.task.profile=true \
  -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,thread=y,verbose=n \
  -Dmapred.task.profile.maps=0-2 \
  -Dmapred.task.profile.reduces=0-0 \
  mypigscript.pig
Profiling Hive Queries
• Set appropriate Hadoop parameters before
submitting the queries
hive> set mapred.task.profile=true;
hive> set mapred.task.profile.params=-agentlib:hprof=heap=dump,format=b,file=%s;
hive> set mapred.task.profile.maps=0-2;
hive> set mapred.task.profile.reduces=0-0;
hive>
hive> <hive query>
USING YOURKIT
YourKit Profiler - Summary
• Commercial Java Profiling Tool
– Free trial and Open Source licenses are available
• Used by many Open Source projects including
Hadoop, Pig, Hive etc.
• Features
– On-Demand Profiling
– CPU, Memory and Concurrency profiling methods
– IDE integration (Eclipse, NetBeans, IntelliJ)
– Above all, has relatively low performance overhead
Using YourKit Profiler
• You will need to install YourKit profiler (just the profiler
lib) on to each TaskTracker
• Tell Hadoop to use a different profiler
• Theoretically, you can also use DistributedCache to
make binaries available on TaskTracker machines
– Though, I did not have success with this
hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
  -Dmapred.task.profile=true \
  -Dmapred.task.profile.params=-agentpath:<yourkit_path>/libyjpagent.jnilib=dir=/tmp/yourkit_snapshot,sampling,disablej2ee \
  -Dmapred.task.profile.maps=0-2 \
  -Dmapred.task.profile.reduces=0-1 \
  input output
Small Glitch
• Hadoop JobClient.waitForCompletion(…) throws an error because the profile logs
are not available in the default directory.
• However, the job itself continues to run and completes successfully.
• To avoid this, specify the profiling parameters through the
mapred.child.java.opts option instead
YourKit to Analyze Jobs
• Can analyze profile output from both YourKit
Profiler and hprof/jmap.
OTHER TOOLS
Using other Tools
• JDK Tool ‘jmap’
– Can be used for capturing heap map of a running Java
process and later used for analysis inside VisualVM or
YourKit
• $ jmap -dump:live,format=b,file=xyz.hprof <jvm-pid>
• Don’t run jmap with the -histo:live option on the JobTracker (JT) or NameNode (NN); the live option forces a full GC
– Java process can also be instructed to generate hprof
dump of heap map in case of OutOfMemoryError
• -XX:+HeapDumpOnOutOfMemoryError
• JDK Tool ‘jhat’
– Can read a heap dump in hprof format and provides a
lightweight web interface for analyzing it
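A heap dump can also be triggered from inside the JVM, which is handy in local-mode test runs. The sketch below uses HotSpotDiagnosticMXBean.dumpHeap, which exists on HotSpot-based JVMs only (an assumption about your runtime); the class name HeapDumper and the file location are hypothetical. The resulting file is in the same hprof binary format that jmap -dump produces.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: writes an hprof-format heap dump of the current JVM (HotSpot only). */
public class HeapDumper {

    /** Dumps the heap to `file`; `liveOnly` corresponds to jmap's "live" option. */
    public static void dump(File file, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // dumpHeap refuses to overwrite: it fails if the file already exists.
            bean.dumpHeap(file.getAbsolutePath(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"),
                "heapdumper-" + System.nanoTime() + ".hprof");
        dump(file, true);
        System.out.println("Wrote " + file + " (" + file.length() + " bytes)");
    }
}
```

As with jmap's live option, dumping only live objects triggers a full GC first, so the same caution about busy master daemons applies.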
Other Tools (Cont…)
• Hadoop Vaidya (Simple Diagnostic Tool)
– Identifies common performance problems related
to Hadoop jobs (unbalanced partitioning,
granularity of tasks, combiners etc.)
– Works only at the Hadoop job level (does not
understand the specifics of Hive/Pig)
Other Recommendation
• If possible try running Hadoop (MR/Pig/Hive)
in local mode using LocalJobRunner
– LocalJobRunner runs the entire MapReduce job in
a single JVM
– It simplifies profiling and log collection
– Can also be used for attaching debugger from IDE
Resources
• Troubleshooting Java application
– http://www.oracle.com/technetwork/java/javase/toc-135973.html
• Profile Hadoop Job (Chapter 5, “Hadoop: The Definitive Guide”)
– http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/tuning-a-job/id3545664
• Profiling Pig Job
– https://cwiki.apache.org/PIG/howtoprofile.html
• ‘hprof’ Official Documentation
– http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html
• YourKit Profiler
– http://www.yourkit.com

Profile Hadoop apps

  • 2. Agenda
      • Profiling General Background
      • Available Options
      • Profile using Free and Open Source tools
      • Profile using YourKit
      • Other troubleshooting tools
  • 3. What does Profiling Provide?
      • Profiling runtime / CPU usage:
          – which lines of code the program spends the most time in
          – which call/invocation paths were used to reach those lines
              • naturally represented as tree structures
      • Profiling memory usage:
          – what kinds of objects are sitting on the heap
          – where they were allocated
          – who is pointing to them now
          – memory leaks
  • 4. Profiler Types and Components
      • Components needed for profiling
          – Profiling Agent
              • Collects profiling data (samples, traces, exceptions, etc.)
          – Analysis Tool
              • Provides an interface for analyzing the profiling data and helps the user identify potential problems
      • Types of Profilers
          – insertion
          – sampling
          – instrumenting
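To make the "sampling" profiler type concrete, here is a minimal sketch of the idea: periodically snapshot a thread's stack and count which method is on top. It is only an illustration of the technique, not how hprof is implemented; the class name `SamplingSketch` and the `worker` thread are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sampling profiler: periodically records the top stack frame of one thread. */
public class SamplingSketch {

    /** Takes `samples` stack snapshots of `target`, `intervalMs` apart, and counts top frames. */
    public static Map<String, Integer> sample(Thread target, int samples, long intervalMs)
            throws InterruptedException {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) { // the trace is empty if the thread is not alive
                String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(frame, 1, Integer::sum);
            }
            Thread.sleep(intervalMs);
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical workload: a thread that spins until interrupted.
        Thread worker = new Thread(() -> {
            double x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x += Math.sqrt(x + 1);
            }
        }, "worker");
        worker.start();
        Map<String, Integer> hits = sample(worker, 50, 10);
        worker.interrupt();
        worker.join();
        hits.forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```

A sampling profiler trades accuracy for low overhead: methods that run briefly between samples may never show up, whereas an instrumenting profiler records every call at a higher runtime cost.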
  • 5. Available Options
      • Sun JDK Tools
          – hprof: profiler (uses JVMTI)
          – jmap: captures a memory map (heap dump)
          – jhat: analyzes a heap dump
          – jstack: produces a thread dump
          – jvisualvm: GUI-based profile data analyzer
      • Open Source
          – VisualVM (same as jvisualvm, but downloaded as an independent app)
              • Uses HPROF internally for profiling. Provides a GUI for analyzing heap dumps and profiler output
          – NetBeans Profiler
              • Similar to VisualVM, but integrated into the IDE
          – Eclipse MAT (Memory Analyzer Tool)
              • Can load .hprof files
      • Commercial
          – YourKit
          – JProfiler
  • 7. Official hprof Documentation

        usage: java -Xrunhprof:[help]|[<option>=<value>, ...]

        Option Name and Value  Description                     Default
        ---------------------  -----------                     -------
        heap=dump|sites|all    heap profiling                  all
        cpu=samples|times|old  CPU usage                       off
        monitor=y|n            monitor contention              n
        format=a|b             text(txt) or binary output      a
        file=<file>            write data to file              java.hprof[.txt]
        depth=<size>           stack trace depth               4
        interval=<ms>          sample interval in ms           10
        cutoff=<value>         output cutoff point             0.0001
        lineno=y|n             line number in traces?          y
        thread=y|n             thread in traces?               n
        doe=y|n                dump on exit?                   y
        msa=y|n                Solaris micro state accounting  n
        force=y|n              force output to <file>          y
        verbose=y|n            print messages about dumps      y

        http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html
  • 8. Sample hprof usage
      • To measure CPU usage, try the following:
            java -Xrunhprof:cpu=samples,depth=6,heap=dump
      • Settings:
          – Takes samples of CPU execution
          – Records call traces that include the last 6 levels on the stack
          – Dumps the heap map (bigger file size, but helps in finding problems)
      • Creates the file java.hprof.txt in the current directory
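When cpu=samples is used with the text format, java.hprof.txt contains a ranked "CPU SAMPLES" table. The sketch below extracts that table. It assumes the layout shown in the hprof documentation (a `CPU SAMPLES BEGIN` line, a header row with the columns `rank self accum count trace method`, then one row per method, then `CPU SAMPLES END`); the class name `HprofCpuSamples` is made up for this example.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Reads the CPU SAMPLES table from a text-format hprof file (layout assumed, see lead-in). */
public class HprofCpuSamples {

    /** One row of the CPU SAMPLES table. */
    public static final class Row {
        public final int rank;
        public final double selfPercent;
        public final int count;
        public final String method;

        Row(int rank, double selfPercent, int count, String method) {
            this.rank = rank;
            this.selfPercent = selfPercent;
            this.count = count;
            this.method = method;
        }
    }

    public static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        boolean inSection = false;
        for (String line : lines) {
            if (line.startsWith("CPU SAMPLES BEGIN")) { inSection = true; continue; }
            if (line.startsWith("CPU SAMPLES END")) { break; }
            if (!inSection) { continue; }
            // Assumed columns: rank self accum count trace method
            String[] f = line.trim().split("\\s+");
            if (f.length < 6 || !f[0].matches("\\d+")) { continue; } // skips the header row
            rows.add(new Row(Integer.parseInt(f[0]),
                    Double.parseDouble(f[1].replace("%", "")),
                    Integer.parseInt(f[3]),
                    f[5]));
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String file = args.length > 0 ? args[0] : "java.hprof.txt";
        for (Row r : parse(Files.readAllLines(Paths.get(file)))) {
            System.out.printf("%3d %6.2f%% %6d %s%n", r.rank, r.selfPercent, r.count, r.method);
        }
    }
}
```

Run it with the path to the profile file as the only argument; without an argument it looks for java.hprof.txt in the current directory.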
  • 9. HPROF with Hadoop
      • Hadoop uses hprof as the default profiler
      • Profiling related parameters

        Purpose                             JobConf API              Command line Parameter
        ----------------------------------  -----------------------  ---------------------------
        Enable Profiling                    setProfileEnabled(true)  mapred.task.profile=true
        Additional parameters for Profiler  setProfileParams(…)      mapred.task.profile.params
        Range of sampled tasks to profile   setProfileTaskRange      mapred.task.profile.maps
                                                                     mapred.task.profile.reduces
  • 10. Example
      • Using Java API
            jobConf.setProfileEnabled(true);
            jobConf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites"
                + ",depth=4,thread=y,file=%s");
            jobConf.setProfileTaskRange(true, "0-2");   // map tasks 0-2
            jobConf.setProfileTaskRange(false, "0-1");  // reduce tasks 0-1
      • Using Command line parameters
            hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
              -Dmapred.task.profile=true \
              -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,file=%s \
              -Dmapred.task.profile.maps=0-2 \
              -Dmapred.task.profile.reduces=0-1 \
              input output
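The command-line form is easy to get wrong by hand, so a small helper can assemble the -D arguments. The sketch below uses only the property names from the slides; the class `ProfileOptions` and its methods are hypothetical helpers for this example and are not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles the mapred.task.profile* settings for an hprof run. */
public class ProfileOptions {

    /** Builds the profiling properties; `hprofOptions` is e.g. "cpu=samples,heap=sites". */
    public static Map<String, String> hprofProperties(String hprofOptions, String maps, String reduces) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("mapred.task.profile", "true");
        // Hadoop replaces %s with the per-task profile output file.
        p.put("mapred.task.profile.params", "-agentlib:hprof=" + hprofOptions + ",file=%s");
        p.put("mapred.task.profile.maps", maps);
        p.put("mapred.task.profile.reduces", reduces);
        return p;
    }

    /** Renders the properties as -Dkey=value arguments separated by single spaces. */
    public static String toCommandLine(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("-D").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = hprofProperties("cpu=samples,heap=sites,depth=4,thread=y", "0-2", "0-1");
        System.out.println(toCommandLine(p));
    }
}
```

Keeping the options in one map also makes it simple to apply the same settings through the JobConf setters instead of the command line.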
  • 11. Collecting Profiler Output
      • Hadoop JobClient automatically downloads profile logs from all the profiled tasks
          – If the output format is not specified, hprof writes the profile output in text format (format=a)
      • Profiler outputs are also available via the History Web UI
      • You can also download profile output using curl
            curl -o attempt_201305161037_0004_m_000000_0.hprof \
              "http://17.115.13.191:50060/tasklog?plaintext=true&attemptid=attempt_201305161037_0004_m_000000_0&filter=profile"
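The same download can be scripted from Java. The sketch below builds the tasklog URL in the shape used by the curl example and saves the response to a file. The class name `FetchProfile` is made up; the TaskTracker host:port and the attempt id are passed as arguments and must match your own cluster.

```java
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Downloads the profile output of one task attempt from a TaskTracker's tasklog servlet. */
public class FetchProfile {

    /** Builds the URL in the same shape as the curl example (host:port and attempt id are inputs). */
    public static String profileUrl(String trackerHostPort, String attemptId) {
        return "http://" + trackerHostPort + "/tasklog?plaintext=true&attemptid="
                + attemptId + "&filter=profile";
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: FetchProfile <tasktracker-host:port> <attempt-id>");
            return;
        }
        String url = profileUrl(args[0], args[1]);
        // Save the response as <attempt-id>.hprof in the current directory.
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, Paths.get(args[1] + ".hprof"), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```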
  • 13. Analyze Profiler output
      • You can use VisualVM, NetBeans profiler or YourKit for analyzing the profiling data.
          – These tools support only the binary format of hprof output (i.e. option format=b)
      • Example
          – Run profiler with Hadoop job
                hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
                  -Dmapred.task.profile=true \
                  -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,format=b,file=%s \
                  input output
          – Load Profiler output using VisualVM menu option
  • 14. Analyze Profile Output in VisualVM
  • 15. Object Query Language
      • VisualVM and jhat support a special query language (OQL) for querying the Java heap.
          – Example: select all Strings with length 1K or more
                select s from java.lang.String s where s.count >= 1024
      • More information about OQL is available at http://visualvm.java.net/oqlhelp.html
  • 16. Analyze Profile Output in Eclipse MAT
  • 17. Profiling Pig Jobs
      • Use Hadoop command line parameters
            pig -Dmapred.task.profile=true \
              -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,thread=y,verbose=n \
              -Dmapred.task.profile.maps=0-2 \
              -Dmapred.task.profile.reduces=0-0 \
              mypigscript.pig
      • More information about Pig job profiling is available at Pig Wiki
          – https://cwiki.apache.org/PIG/howtoprofile.html
  • 18. Profiling Hive Queries
      • Set appropriate Hadoop parameters before submitting the queries
            hive> set mapred.task.profile=true;
            hive> set mapred.task.profile.params=-agentlib:hprof=heap=dump,format=b,file=%s;
            hive> set mapred.task.profile.maps=0-2;
            hive> set mapred.task.profile.reduces=0-0;
            hive> <hive query>
  • 20. YourKit Profiler - Summary
      • Commercial Java profiling tool
          – Free trial and open source licenses are available
      • Used by many open source projects, including Hadoop, Pig, Hive etc.
      • Features
          – On-demand profiling
          – CPU, memory and concurrency profiling methods
          – IDE integration (Eclipse, NetBeans, IntelliJ)
          – Above all, relatively low performance overhead
  • 21. Using YourKit Profiler
      • You will need to install the YourKit profiler (just the profiler lib) on each TaskTracker
      • Tell Hadoop to use a different profiler
            hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
              -Dmapred.task.profile=true \
              -Dmapred.task.profile.params=-agentpath:<yourkit_path>/libyjpagent.jnilib=dir=/tmp/yourkit_snapshot,sampling,disablej2ee \
              -Dmapred.task.profile.maps=0-2 \
              -Dmapred.task.profile.reduces=0-1 \
              input output
      • Theoretically, you can also use DistributedCache to make the binaries available on the TaskTracker machines
          – Though I did not have success with this
  • 22. Small Glitch
      • The Hadoop job client's waitForCompletion(…) call will throw an error, since the profile logs are not available in the default directory.
      • However, the job itself will continue to run successfully.
      • To avoid this, you can instead use the mapred.child.java.opts option to specify the profiling parameters
  • 23. YourKit to Analyze Jobs
      • Can analyze profile output from both YourKit Profiler and hprof/jmap.
  • 25. Using other Tools
      • JDK Tool ‘jmap’
          – Can be used for capturing a heap dump of a running Java process, which can later be analyzed in VisualVM or YourKit
              • $ jmap -dump:live,format=b,file=xyz.hprof <jvm-pid>
              • Don’t run jmap with the -histo:live option on the JobTracker (JT) or NameNode (NN); the live option forces a full GC, which pauses the process
          – A Java process can also be instructed to write an hprof heap dump in case of OutOfMemoryError
              • -XX:+HeapDumpOnOutOfMemoryError
      • JDK Tool ‘jhat’
          – Reads a heap dump in hprof format and provides a lightweight web interface for analyzing it
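A heap dump can also be triggered from inside the JVM, which is handy when running in local mode or from a test. The sketch below uses the HotSpot-specific `HotSpotDiagnosticMXBean`, so it assumes a HotSpot-based JVM; the class name `HeapDumpSketch` is made up for this example. Recent JDKs require the file name to end in .hprof, and the call fails if the file already exists.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

/** In-process counterpart of `jmap -dump` (HotSpot JVMs only). */
public class HeapDumpSketch {

    /** Writes an hprof-format heap dump; liveOnly=true dumps only reachable objects. */
    public static void dump(String file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "xyz.hprof";
        dump(file, true);
        System.out.println("Heap dump written to " + file);
    }
}
```

The resulting file can be opened in VisualVM, Eclipse MAT or YourKit in the same way as a dump produced by jmap.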
  • 26. Other Tools (Cont…)
      • Hadoop Vaidya (Simple Diagnostic Tool)
          – Identifies common performance problems in Hadoop jobs (unbalanced partitioning, granularity of tasks, combiners etc.)
          – Works only at the Hadoop job level (does not understand the specifics of Hive/Pig)
  • 27. Other Recommendation
      • If possible, try running Hadoop (MR/Pig/Hive) in local mode using LocalJobRunner
          – LocalJobRunner runs the entire MapReduce job in a single JVM
          – It simplifies profiling and log collection
          – Can also be used for attaching a debugger from an IDE
  • 28. Resources
      • Troubleshooting Java applications
          – http://www.oracle.com/technetwork/java/javase/toc-135973.html
      • Profiling a Hadoop job (Chapter 5 of “Hadoop: The Definitive Guide”)
          – http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/tuning-a-job/id3545664
      • Profiling Pig jobs
          – https://cwiki.apache.org/PIG/howtoprofile.html
      • Official hprof documentation
          – http://docs.oracle.com/javase/7/docs/technotes/samples/hprof.html
      • YourKit Profiler
          – http://www.yourkit.com