The document discusses performance forensics methodology for investigating performance issues. It begins with collecting data, either proactively through monitoring or retrospectively through simulation. Interviews are used to gather additional context, while modeling and visualization help identify root causes. The circular nature of the process allows continuously improving data collection and understanding of system behavior. Performance issues are approached as symptoms providing clues, rather than crimes with definitive causes.
3. Did you all get a chance to read that? As a public company we need to have
our disclosure statement before all presentations. If you have any questions
on what it means, please speak with our General Counsel.
Vista is the only solution built from the ground up on true enterprise
technology, allowing you to ensure that you continue to provide your faculty
and students an outstanding experience.
10. Our session isn’t about crime scene investigation. I’m not going to attempt to
talk about digital forensics from a security and fraud perspective. I’m here to
put context around performance and scalability incidents that happen in your
application environments every day, week, month and year. There’s no
reason to assume that performance problems can only be solved by “black
magic”. A rational, evidence-supported process can be used to solve a
performance problem.
The first definition of forensics assumes that “something” occurs in the
form of a crime. For all intents and purposes, a crime is considered
premeditated. Crimes such as denial of service, script injection and spam
attacks can certainly cause performance problems, but performance incidents
happen even when they are not premeditated. They happen because
of failure. Things break…it’s a fact of life.
When breakages or breakdowns occur, they are not always visibly apparent or
obvious to every set of eyes looking on. This can be problematic when
diagnosis bias is introduced or value attribution is incorrectly established.
13. Search Google and you will be amazed how little meaningful
information you get for the words “performance forensics” in the
context of computers and software. One paper by Bob Sneed from
Sun Microsystems (http://www.sun.com/blueprints/1203/817-4444.pdf)
is out there, but very little else.
So you will have to trust my primitive definition of performance
forensics. You might even offer to help make it better.
Performance forensics is like any other forensics process. It begins
with collecting evidence. If you are lucky and have a lot of tools in
place, you will have a starting point of data to sift through. More often
than not, the data is not there. You are not always lucky enough to have
the data when you need it, and it might not be in the best format for
getting to the root cause of a problem.
Evidence, as we will discuss later, can be collected after the fact.
Techniques such as discrete simulation can be used to re-enact an
incident. When that happens, you have the ability to capture all of
the data you want. You simply need to know what data to collect. It’s a
circuitous loop of sorts…mainly because you might not know what
data to collect to begin with.
It’s like when I look under the hood of my car. I have no idea what I’m
looking at…Maybe it’s that smoking gun I’m in search of. Yeah, I
guess if I see some kind of corrosion, smoke or leak it might be
painfully obvious…but it never is. Not with today’s cars…Computers,
and software specifically, are the same. Rarely is there a smoking
gun sitting in front of your face waiting to be found. Thus evidence is
something you must deliberately go out and collect.
28. Problems are not always easily identifiable. When I say that I feel off or sick, I leave the
listener desiring more information. They might infer that I have a stomach pain, a cold or a
headache. It could be that I am tired or I have a broken arm. A more relevant example that I
often hear is “my system is slow.” What defines slow? Can you show me? Can I
experience the slowness?
Is it always slow, every single day and every minute? Are all of the components that make up
the physical architecture necessarily slow? Are particular use cases experiencing latency?
Do they always experience latency, or only at specific times? Is it specific users who
experience latency? Are the users different in some way? Does the problem
happen after a particular interaction pattern? Does it happen with a particular piece of data?
Once a problem is identifiable, define a clear, intelligible problem statement. The
problem statement is used to aid the investigation so the forensics process can focus on
collecting meaningful data that leads to root cause analysis.
Narrowing down to a problem statement from the unknown can be an exhausting effort. Start
with informal questioning (not formal interviewing) in which your goal is to progressively narrow
the chasm of possibilities. Start with the “Lassie question”: Can you show me? Experiencing
the problem first hand provides basic context. If the problem can’t be reproduced, try to
gather supporting clues so that the unpredictable can become more predictable. You can’t
necessarily replicate a performance problem at will. Do you have supporting data about
your experience? Can you explain what happened to you? Do you know when it happened
(smallest time window)? Has it happened before? If so, when? Try to get down to the exact
minute if possible. Has it happened to anyone else? What were they doing? Did it happen to
them at the same time as you?
It may come off like you are asking dozens and dozens of questions, but in reality you are not.
You are gathering basic context: who, what, where and when.
Be unwilling to announce a problem statement until you have confidence in it (confidence in the
statement of the problem, not in the cause of the issue). Remember: we are not diagnosing, we
are simply collecting and announcing symptoms.
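The who/what/where/when context can be captured in a simple structure before anything is announced. Below is a minimal sketch of that idea; all names are illustrative, not part of any tool mentioned in this talk:

```python
from dataclasses import dataclass, field

@dataclass
class ProblemStatement:
    """Collected symptoms only; deliberately no 'cause' field."""
    who: str = ""    # which users or groups experience the problem
    what: str = ""   # the observable symptom, stated concretely
    where: str = ""  # component, host, or use case where it shows up
    when: str = ""   # smallest time window, ideally down to the minute
    prior_occurrences: list = field(default_factory=list)

    def is_announceable(self) -> bool:
        # Hold the statement back until every basic context question is answered.
        return all([self.who, self.what, self.where, self.when])

stmt = ProblemStatement(
    who="a subset of call-center users",
    what="order search takes over 10 seconds",
    where="order-entry application, search use case",
    when="weekdays, 09:00-09:15 local time",
)
print(stmt.is_announceable())  # True
```

Keeping the statement free of any "cause" field is the point: it forces the investigation to stay at the symptom-collection stage.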
30. I’m not the creator of this methodology. I’m quite sure that others who are far more knowledgeable on the subject
would tell you I’m possibly missing a step or that I am drawing out the process too far. A picture is truly worth a
thousand words.
I will break down each element of the methodology in subsequent slides. I’ve designed a circular visualization to
reflect the conclusion I’ve come to over the years: the process must revolve in order to arrive at root
cause analysis.
Performance forensics doesn’t necessarily begin with evidence collection. Rather, it potentially begins long before
an incident occurs. Take an abstract example: a person complains about chest pain. The person tells
their spouse that at times they have unbearable pains, but eventually the pain goes away. It doesn’t happen often enough, and
the pain isn’t severe enough, to be worth the time or the effort of going to the doctor. This process of convincing yourself
that the symptoms you are experiencing do not indicate what you really have is called diagnosis bias. I will talk about this in
greater detail later.
This pain might go on for quite some time until it progresses. Analysis could be initiated at any point. More
often than not, the complaints go unrecognized and forensics is placed on hold. It comes back later on. The question
is when. Typically when a terrible event occurs. It could be a heart attack or, sadly, a loss of life. The forensic
engineer is then tasked with tracing back why it happened, whether foul play was involved, and whether it could have been avoided.
I propose that the methodology can be initiated at any time. No major issue has to occur for performance forensics
to begin. Symptoms do not necessarily have to show up for the process to begin. You can call this what you want,
but the combination of evidence collection, interviewing, modeling/visualizing and planning for the future is most
commonly referred to as capacity planning. It’s not that much different from what we are trying to accomplish with
performance forensics. The key difference is proactive behavior versus reactive behavior.
The methodology begins with the collection of data. We can call this data evidence. Evidence is collected in two
ways: intended data collection and simulated data collection. When data is not available, we often go through the
process of putting data collectors in place. The thought behind this is that if something happened once, it’s bound
to happen again.
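The idea of putting data collectors in place, so that the next occurrence is captured, can be sketched with a rolling buffer. This is a hypothetical sketch; the `probe` callable stands in for whatever metric source you actually have:

```python
import time
from collections import deque

class EvidenceCollector:
    """Keep the last `window` samples so that when an incident recurs,
    the evidence leading up to it has already been captured."""
    def __init__(self, probe, window=1000):
        self.probe = probe                    # callable returning one measurement
        self.samples = deque(maxlen=window)   # ring buffer; old samples age out

    def sample(self):
        self.samples.append((time.time(), self.probe()))

    def between(self, start, end):
        # Extract the evidence for the incident's (smallest) time window.
        return [(t, v) for t, v in self.samples if start <= t <= end]

# Stub probe; in practice this might read a queue depth or a response time.
collector = EvidenceCollector(probe=lambda: 42, window=3)
for _ in range(5):
    collector.sample()
print(len(collector.samples))  # 3: only the most recent window is retained
```

The bounded window is the trade-off: you accept losing old data in exchange for always having the minutes leading up to the next incident.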
Interviewing is incorporated into the methodology. I will discuss techniques for interviewing. Understand that when
humans are involved and asked to participate, you run the greatest chance of diagnosis bias and value attribution
(two topics I will present in greater detail).
Next I will discuss why modeling and visualizing a problem can be critical to getting to the root cause of a
performance issue.
32. The best analogy to depict the execution model for wait events is the grocery
store checkout line. Assume the cashier is the CPU. The customer who is
currently being checked out by the cashier is the running session. The
customers who are waiting in line represent the runnable queue.
If customer1 who is being checked out requires a price check on a product,
customer1 must wait until the price check is completed. Meanwhile, the next
in line, customer2, is immediately checked out by the cashier until the price
check is completed for customer1.
When the price check is completed, the cashier can resume the check out of
customer1. This is the simplest illustration of the wait event execution
model.
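The checkout analogy can be re-enacted as a tiny simulation. This is a toy sketch (not any particular database's scheduler), in which a "price check" stands in for a wait event:

```python
from collections import deque

def checkout(customers):
    """Toy wait-event model: one cashier (the CPU), a runnable queue, and
    customers who may block on a 'price check' (a wait event).
    Each customer is (name, needs_price_check); returns the event log."""
    runnable = deque(customers)
    waiting = deque()   # sessions blocked on a wait event
    log = []
    while runnable or waiting:
        if runnable:
            name, needs_check = runnable.popleft()
            if needs_check:
                log.append(f"{name}: waits on price check")
                waiting.append((name, False))   # will resume once the check is done
            else:
                log.append(f"{name}: checked out")
        else:
            # Nothing is runnable: a pending wait event completes, session resumes.
            runnable.append(waiting.popleft())
    return log

events = checkout([("customer1", True), ("customer2", False)])
# customer1 blocks on the price check, customer2 runs in the meantime,
# then customer1 resumes and completes.
```

The useful property of the model is visible in the log: the cashier (CPU) is never idle while anyone is runnable, so time spent in `waiting` shows up as wait events rather than CPU time.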
39. JConsole on Steroids
Great presentation: http://www.javapassion.com/javase/VisualVM.pdf
Another Great Presentation:
http://weblogs.java.net/blog/mandychung/archive/VisualVM-BOF-2007.pdf
51. Great presentation highlighting differences between tools:
http://assets.en.oreilly.com/1/event/29/Website%20Performance%20Analysis%20Presentation.ppt
58. Can also consider using Microsoft VRTA:
http://www.microsoft.com/downloads/details.aspx?FamilyID=119f3477-dced-41e3-a0e7-d8b5cae893a3&displaylang=en