Advanced Performance Forensics
Uncovering the Mysteries of Performance and Scalability Incidents through Forensic Engineering

Stephen Feldman
Senior Director Performance Engineering and Architecture
stephen.feldman@blackboard.com
Session Goals

The goals of today’s session are…
• Introduce the practice of performance forensics.
• Present an argument for session level analysis.
• Discuss the difference between Resources and
  Interfaces.
• Present tools that can be used for performance
  forensics at different layers of the architectural
  stack and the client layer.
Definition of Performance Forensics

• The practice of collecting evidence, performing
  interviews and modeling for the purpose of root
  cause analysis of a performance or scalability
  problem.
  – In the context of a performance (response time) problem
  – Discussing an individual event (session experience)
• Performance problems can be classified in two
  main categories:
  – Response Time Latency
  – Queuing Latency
Performance Forensics Methodology

[Diagram: the forensics path from problem to root cause]
• Starting point (Method-R): Identify the Problem. Identify the Most
  Important Operations that Affect Your Business.
• Activities along the path: Interviewing, Collecting Evidence, Data
  Analysis, Modeling and Visualizing, Sampling and Simulating, Perform
  Session Inspection.
• Milestones: Develop a Problem Statement, Formulate a Hypothesis,
  Establish a Diagnosis, Root Cause.
• Turn the Problem Statement into a Diagnosis to Get to Root Cause.
Putting Performance Forensics in Context

• Emphasis on the user and the user’s actions and
  experiences.
  – How can this be measured?
• Capture the response time experience and the
  response time expectations of the user.
  – Put user actions into perspective, in line with the goals
    of Method-R (what’s most important to the business)
• Identify the contributors of response latency
• Everyone needs to be involved
Measuring the Session

• When should this happen?
   – When a problem statement cannot be developed from
     the data you do have (evidence or interviews) and
     more data needs to be collected.
• How should you go about this?
   – Want to minimize disruption to the production
     environment.
   – Adaptive collection: Less Intensive to More Intensive
     over time.

Basic Sampling → Continuous Collection → Profiling
Resources vs. Interfaces

• One of the most critical data points to collect
• Interfaces are critical for understanding
  throughput and queuing models.
   – Queuing is another cause of latency
   – Also a cause of time-outs
• Resources are critical for understanding the cost
  of performing a transaction.
   – Core Resources: CPU, Memory and I/O
• Response Time = Service Time + Queue Time
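
The relationship Response Time = Service Time + Queue Time can be sketched with a simple queueing model. The example below assumes a single-server M/M/1 queue, which is a modeling choice made here for illustration and is not stated on the slide. The function name and its parameters are hypothetical.

```python
def mm1_response_time(service_time_s: float, arrival_rate_per_s: float) -> dict:
    """Split mean response time into service time and queue time.

    Assumes an M/M/1 queue (one server, random arrivals), for which
    mean response time R = S / (1 - utilization).
    """
    utilization = arrival_rate_per_s * service_time_s
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    response = service_time_s / (1.0 - utilization)
    queue = response - service_time_s  # Response Time = Service Time + Queue Time
    return {
        "utilization": utilization,
        "service_s": service_time_s,
        "queue_s": queue,
        "response_s": response,
    }


if __name__ == "__main__":
    # Same service time, rising load: queue time grows while service time stays flat.
    for rate in (2.0, 5.0, 8.0, 9.5):
        print(rate, mm1_response_time(0.1, rate))
```

Under this assumption the service time (the resource cost) stays constant while the queue time (the interface effect) dominates as utilization approaches 1, which is why both need to be measured.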
The Importance of Wait Events

• Rise of Session Level Forensics
   – Underlying theme with all of these tools is that “Session” is more
     important than “System”
• Wait event tuning is used to account for latency
   – Exists in SQL Server (Waits and Queues) and Oracle (10046)
   – Other components are not yet mature enough to expose wait events
• Waits are statistical explanations of latency
• Each individual wait event might be deceiving, but
  looking at both aggregates and outliers can explain why
  a performance problem exists.
• When sampling directly, you usually have only about 1 hour
  to act on the data.
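
As a minimal sketch of looking at both aggregates and outliers, the example below groups wait samples by event name and flags individual waits that are far above the median for that event. The input shape, the function name and the outlier factor of 10 are assumptions made for illustration. They do not come from any database's tooling.

```python
from collections import defaultdict
from statistics import median


def summarize_waits(samples, outlier_factor=10.0):
    """Aggregate (event_name, wait_ms) samples and flag outliers per event.

    A wait counts as an outlier when it exceeds outlier_factor times the
    median wait for the same event (an arbitrary illustrative rule).
    """
    by_event = defaultdict(list)
    for event, wait_ms in samples:
        by_event[event].append(wait_ms)

    summary = {}
    for event, waits in by_event.items():
        mid = median(waits)
        summary[event] = {
            "count": len(waits),
            "total_ms": sum(waits),
            "median_ms": mid,
            "outliers": [w for w in waits if mid > 0 and w > outlier_factor * mid],
        }
    return summary


if __name__ == "__main__":
    demo = [
        ("db file sequential read", 5),
        ("db file sequential read", 6),
        ("db file sequential read", 4),
        ("db file sequential read", 400),
        ("log file sync", 2),
        ("log file sync", 3),
    ]
    for event, stats in summarize_waits(demo).items():
        print(event, stats)
```

The aggregate shows where the time went overall, and the outlier list points at the individual sessions worth inspecting.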
Performance Forensics Tools
Categories of Tools

• HTTP and User Experience
• JVM Instrumentation Tools
• Database Instrumentation
  – Session and Wait Event
  – Cost Execution Plans
  – Profilers
Breaking Down Latency
Fiddler2

• Fiddler 2 measures end-to-end client responsiveness of
  a web request.
• Little to no overhead (less intrusive forensics)
• Captures requests in order to present http codes, size of
  objects, sequence of loading, time to process request,
  performance by bandwidth speed.
   – Rough estimation of User Experience based on locality.
• Inspects every detail of the http request
   – Detailed session inspection
   – Breakdown of http transformation
• Other Tools in Category: YSlow/Firebug, Charles Proxy,
  LiveHTTPHeaders and IEInspector
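
A rough estimate of "performance by bandwidth speed" can be sketched from the captured object size alone. The formula below (one round trip plus size divided by bandwidth) is a simplification assumed here for illustration. It is not a description of how Fiddler calculates its figures, and the function name and link speeds are hypothetical.

```python
def estimated_transfer_seconds(size_bytes, bandwidth_bits_per_s, round_trip_s=0.0):
    """Estimate the time to fetch an object over a link of the given speed.

    Ignores TCP slow start, compression and parallel connections.
    """
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return round_trip_s + (size_bytes * 8) / bandwidth_bits_per_s


if __name__ == "__main__":
    page_bytes = 250_000  # hypothetical total size of a captured page
    for label, bps in (("56k modem", 56_000), ("1 Mbit DSL", 1_000_000), ("10 Mbit LAN", 10_000_000)):
        print(label, round(estimated_transfer_seconds(page_bytes, bps, round_trip_s=0.1), 2), "s")
```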
Coradiant Truesight

• Commercial tool used for passive user experience
  monitoring.
• Captures page, object and session level data.
• Capable of defining Service Level Thresholds and
  Automatic Incident Management.
• Used to trace back session as if you were watching over
  the user’s shoulder.
• Exceptional tool for trend analysis. (Less Intrusive)
• Primarily used in forensics as evidence for analysis.
• Other Tools in the Category: Quest User Experience and
  Citrix EdgeSight
Log Analyzers

• Both commercial and open source tools are available to
  parse and analyze http access logs.
• Provides trend data, client statistical data, http summary
  information.
• Recommend using this data to study request and
  bandwidth trends for correlation purposes with resource
  utilization graphs.
   – Such a large volume of data.
   – Recommend working within small time slices
• Post-processing tool (No Impact to Application)
• Examples: Urchin, Summary, WebTrends, SawMill,
  Surfstats and AlterWind Log Analyzer
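
Working within small time slices can be sketched as bucketing access-log entries by minute. The example assumes the log uses the Common Log Format. The regular expression and function name are written for this illustration, and real logs may need a different pattern.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident user [timestamp] "request" status size
CLF = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def requests_per_minute(lines):
    """Return (hits, bytes_out) Counters keyed by 'YYYY-MM-DD HH:MM'."""
    hits = Counter()
    bytes_out = Counter()
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        bucket = ts.strftime("%Y-%m-%d %H:%M")
        hits[bucket] += 1
        size = m.group("size")
        bytes_out[bucket] += int(size) if size.isdigit() else 0  # "-" means no body
    return hits, bytes_out
```

The per-minute request and byte counts can then be lined up against resource utilization graphs for the same time window.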
JSTAT

• Low-intrusion statistics collector that provides
   – Percentages of usage by each region
   – Frequency/Counts of collections
   – Time spent in pause state
• Can be invoked any time without restarting the JVM by
  obtaining the Process ID
   – Exception is on Windows when the JVM is run as a background
     service
• Critical for understanding windows of stall times between
  sampling
   – Assume you collect every 5 seconds and observe a 3 second
     pause time
   – Means the application could only work for 2 seconds
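
The stall-window arithmetic above can be sketched as follows. The example assumes the input is a series of cumulative GC time readings in seconds (for instance the GCT column of `jstat -gcutil`) taken at a fixed interval, and it treats the whole increase as pause time, which is an approximation. The function name is hypothetical.

```python
def working_time_per_interval(gct_samples, interval_s):
    """Turn cumulative GC time readings into paused/working time per interval."""
    result = []
    for prev, cur in zip(gct_samples, gct_samples[1:]):
        paused = cur - prev
        result.append({
            "paused_s": paused,
            "working_s": max(interval_s - paused, 0.0),
        })
    return result


if __name__ == "__main__":
    # Sampling every 5 seconds; GC time grows by 3 seconds in the first window.
    print(working_time_per_interval([10.0, 13.0, 13.5], interval_s=5.0))
```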
Process of Garbage Collection
-verbose:gc and -Xloggc

• JVM flags that enable GC logging
• Verbose JVM logging is a low-overhead
  collector (less intrusive measurement)
  – Requires a restart of the instance to run
• -XX:+PrintGCDetails is a recommended setting
  to be used with:
  – -XX:+PrintGCApplicationConcurrentTime
  – -XX:+PrintGCApplicationStoppedTime
• Provides aggregate statistics about Pause
  Times versus Working Times.
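
The pause-versus-working aggregate can be sketched by summing the lines that these two flags write to the GC log. The example assumes the classic HotSpot message wording ("Application time: ..." and "Total time for which application threads were stopped: ..."). Other JVMs and newer logging formats differ, so the patterns may need adjusting. The function name is hypothetical.

```python
import re

# Message formats assumed from classic HotSpot GC logs.
STOPPED = re.compile(r"Total time for which application threads were stopped: ([\d.]+) seconds")
RUNNING = re.compile(r"Application time: ([\d.]+) seconds")


def pause_vs_working(lines):
    """Sum stopped time and running time reported in a GC log."""
    stopped = 0.0
    running = 0.0
    for line in lines:
        m = STOPPED.search(line)
        if m:
            stopped += float(m.group(1))
            continue
        m = RUNNING.search(line)
        if m:
            running += float(m.group(1))
    total = stopped + running
    return {
        "stopped_s": stopped,
        "running_s": running,
        "stopped_pct": 100.0 * stopped / total if total else 0.0,
    }
```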
IBM Pattern Modeling Tool for Java GC

• Post-processing tool used for visualizing a -verbose:gc
  or -Xloggc log file.
• Can make analyzing a log file substantially easier.
• Represents pauses/stalls at particular times
• Has no effect on the application environment because
  it reads a log file that is dormant.
JHAT, JMAP and SAP Memory Analyzer

• jhat: Java Heap Analysis Tool takes a heap dump and
  parses the data into useful and human-digestible
  information about what's in the JVM's memory.
   – Provides browsable text and OQL views into the dump
• jmap: Java Memory Map is a JVM tool that provides
  information about what is in the heap at a given time
  and can write the heap dump that jhat reads.
• SAP Memory Analyzer will visualize the heap dump
• Should be run when a problem is occurring right now
   – When the system is unresponsive
   – When the JVM runs into continuous collections
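
A quick first look at what is in the heap can be sketched by ranking the rows of a class histogram. The example assumes text in the layout printed by `jmap -histo` (rank, instance count, bytes, class name). The function name and the sample figures used to exercise it are hypothetical.

```python
import re

# Matches histogram rows such as "   1:         50000        4000000  [C"
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def top_classes_by_bytes(histo_text, limit=5):
    """Return (class_name, instances, bytes) tuples, largest byte count first."""
    rows = []
    for line in histo_text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(3), int(m.group(1)), int(m.group(2))))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]
```

Comparing the top entries from two histograms taken a few minutes apart is a cheap way to see which classes are growing before committing to a full heap dump.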
ASH

• ASH: Active Session History
   – Samples session activity in the system every second.
   – 1 hour of history in memory for immediate access at your
     fingertips
• ASH in Memory
   – Collects active session data only
   – History: v$session_wait + v$session + extras
   – Circular buffer: 1M to 128M (~2% of SGA)
   – Flushed to disk every hour or when the buffer is 2/3 full (it
     protects itself so you can relax)
• Tools to Consider: Sesspack and Session Snapper
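
The sampling behavior described above (active sessions only, a circular in-memory buffer, a flush when the buffer is two-thirds full) can be sketched as a toy model. This is an illustration of the idea only and not Oracle's implementation. The class name, the tuple layout and the in-memory list standing in for disk are all assumptions.

```python
from collections import deque


class SessionSampleBuffer:
    """Toy circular buffer of active-session samples with a 2/3-full flush."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.flush_threshold = max(1, (2 * capacity) // 3)
        self.samples = deque(maxlen=capacity)  # oldest rows drop off when full
        self.unflushed = 0
        self.flushed = []  # stands in for the on-disk history

    def sample(self, timestamp, sessions):
        """Record one sampling pass; sessions are (session_id, state, wait_event)."""
        for session_id, state, wait_event in sessions:
            if state != "ACTIVE":
                continue  # inactive sessions are not collected
            self.samples.append((timestamp, session_id, wait_event))
            self.unflushed += 1
        if self.unflushed >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Copy rows not yet persisted to the history store."""
        if self.unflushed:
            self.flushed.extend(list(self.samples)[-self.unflushed:])
            self.unflushed = 0
```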
SQL Server Performance Dashboard

• Feature of SQL Server 2005 SP2
• Template reports that take advantage of DMVs
• Provides views into wait events
  – Doesn’t link events to SQL IDs in the report
  – Provides aggregate views of wait events
  – Built on DMVs such as sys.dm_os_wait_stats (instance-wide
    waits) and sys.dm_exec_sessions (per-session data)
• Complementary Tools: SQL Server Health and
  History Tool and Quest Spotlight for SQL Server
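
Because sys.dm_os_wait_stats reports totals accumulated since the counters were last reset, a common way to get a view for a specific time window is to difference two snapshots. The sketch below assumes each snapshot has already been fetched into a dictionary keyed by wait_type with (waiting_tasks_count, wait_time_ms) values. The function name and the dictionary shape are hypothetical.

```python
def wait_stats_delta(before, after):
    """Difference two cumulative wait-stat snapshots, largest wait time first.

    before/after: {wait_type: (waiting_tasks_count, wait_time_ms)}
    """
    delta = {}
    for wait_type, (tasks_after, ms_after) in after.items():
        tasks_before, ms_before = before.get(wait_type, (0, 0))
        tasks = tasks_after - tasks_before
        wait_ms = ms_after - ms_before
        if wait_ms > 0:  # keep only waits that grew during the window
            delta[wait_type] = {
                "tasks": tasks,
                "wait_ms": wait_ms,
                "avg_ms": wait_ms / tasks if tasks else 0.0,
            }
    return dict(sorted(delta.items(), key=lambda kv: kv[1]["wait_ms"], reverse=True))
```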
Importance of Cost Execution Plans

• Can be run on databases with low overhead
   – Do not need the literal values to run
   – Both SQL Server and Oracle can run “Estimated Cost Plans”
• Each database uses an “Optimizer” that determines the
  best path of execution of SQL
   – Calculates IO, CPU and Number of Executes (Loop Conditions)
• Understanding cost operations on a particular object can
  help change your tuning strategy (ex: TABLE ACCESS
  BY INDEX ROWID)
• Cost is time
   – Query cost refers to the estimated elapsed time, in seconds,
     required to complete a query on a specific hardware
     configuration.
RML and Profiler

• The RML utilities process SQL Server trace files and view reports
  showing how SQL Server is performing.
    – Which application, database or login is using the most resources, and
      which queries are responsible for that.
    – Whether there were any plan changes for a batch during the time when
      the trace was captured and how each of those plans performed.
    – What queries are running slower in today's data compared to a previous
      set of data
• Profiler captures statements, query counts/statistics, wait events
    – Can capture and correlate profile data to Perfmon data
• Heavy overhead with both
• Other Tools to Consider: Quest Performance Analysis for SQL
  Server
Oracle OEM and 10046

• Oracle finally delivered with OEM and its web-based
  interface.
   – Performance dashboard provides great historical and present
     overview
   – Access to ADDM and ASH simplifies job of DBA
   – SQL History
• Problems
   – licensing somewhat cost prohibitive
   – Still doesn’t provide wait events
• For 10046 still need to consider profiling on your own
  and using a profiler reader like Hotsos P4.
   – Difficult to trace and capture sessions
Want More?

• Check out my blog for postings of the
  presentation:
  http://sevenseconds.wordpress.com

• To view my resources and references for this
  presentation, visit www.scholar.com
• Simply click “Advanced Search” and search by
  sfeldman@blackboard.com and tag: ‘bbworld08’
  or ‘forensics’
