Alistair Croll, Interop conference faculty and Coradiant's VP of product management, gives an unbiased, top-down view of Web performance monitoring. This informative look at Web measurement business goals, operating processes, tools, and metrics will give you a solid understanding of the issues, without a product pitch. Coradiant is the leader in Web performance monitoring. The award-winning TrueSight Real-User Monitor lets organizations watch what matters to their business by delivering accurate, detailed information on the performance and integrity of Web applications in real time. Incident management, service-level management, and change-impact management are three key capabilities. TrueSight watches any Web or enterprise Web application and lets site operators identify problems sooner, isolate root causes faster, and deploy fixes more quickly than anything else on the market.
1. Best Practices in Web Performance Monitoring
Alistair A. Croll
VP Product Management and Co-Founder

So you want to monitor things.
2. But there are too many toys out there…

A top-down approach to web performance monitoring:
Business goals → Operating processes → Tools → Metrics
3. A top-down approach to web performance monitoring
Start with business goals, then simplify & interpret downward:
Business goals → Operating processes → Tools → Metrics

What goals? (in plain English)
4. Goals
• Make the application available
  – I can use it
• Ensure user satisfaction
  – It's fast & meets or exceeds my expectations
• Balance capacity with demand
  – It handles the peak loads
  – It doesn't cost too much
• Minimize MTTR
  – When it breaks, I can fix it efficiently
• Align operations tasks with business priorities
  – I work on what matters first

They can use it
5. Make the application available
• The most basic goal
• App should be reachable, responsive, and functionally correct
• Three completely different issues:
  – Can I communicate with the service?
  – Can I get end-to-end responses in a timely manner?
  – Is the application behaving properly?
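These three checks can be automated separately. A minimal Python sketch, assuming a hypothetical target URL and expected-content marker; the three probes map to reachability (TCP connect), timeliness (end-to-end response time), and correctness (expected content):

```python
import socket
import time
import urllib.request

URL = "https://example.com/"          # hypothetical target
HOST, PORT = "example.com", 443
EXPECTED_TEXT = "Example Domain"      # marker proving the app behaves properly
MAX_SECONDS = 2.0                     # timeliness threshold (assumed)

# 1. Reachability: can we communicate with the service at all?
def reachable():
    try:
        socket.create_connection((HOST, PORT), timeout=5).close()
        return True
    except OSError:
        return False

# 2. Timeliness: does an end-to-end request finish within the threshold?
# 3. Correctness: does the response contain what the app should return?
def timely_and_correct():
    start = time.monotonic()
    with urllib.request.urlopen(URL, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    elapsed = time.monotonic() - start
    return elapsed <= MAX_SECONDS, EXPECTED_TEXT in body

if __name__ == "__main__":
    print("reachable:", reachable())
    timely, correct = timely_and_correct()
    print("timely:", timely, "correct:", correct)
```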
They're happy & productive
6. Ensure user satisfaction
• How fast is fast enough?
• Depends on the task
  – Login versus reports
• Depends on user expectations
  – ATMs versus banking systems
• Depends on the user's state of mind
  – Deeply engaged versus browsing
Balance capacity with demand
• Performance degrades with demand
[Chart: end-to-end delay versus load in requests per second; maximum capacity is the load at which delay crosses the maximum acceptable delay]
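Given (load, delay) measurements, the knee of that curve can be estimated directly. A minimal sketch with made-up sample data; the numbers and the 2-second ceiling are assumptions, not figures from the talk:

```python
# (load in requests/sec, end-to-end delay in seconds) — assumed sample data
samples = [(10, 0.4), (50, 0.5), (100, 0.7), (200, 1.1), (400, 1.9), (500, 2.6)]
MAX_ACCEPTABLE_DELAY = 2.0  # seconds

# Maximum capacity: the highest measured load that still meets the delay target.
capacity = max(load for load, delay in samples if delay <= MAX_ACCEPTABLE_DELAY)
print(f"Maximum capacity ~ {capacity} requests/sec")  # 400 in this sample
```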
7. I can fix it fast
Minimize MTTR
• Fix it efficiently
• Know the costs of downtime
• Application- and business-dependent:
  – Direct (operational) costs
  – Penalties
  – Opportunity costs
  – Abandonment costs
8. Minimize MTTR
• Don't just think about lost revenue
Minimize MTTR
• And consider the whole resolution cycle:
Event occurs → IT aware → Reproduced → Diagnosed → Resolved → Deployed → Verified
(the full span is the time to recover)
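MTTR is then the mean of those end-to-end spans, from occurrence through verification, not just detection-to-fix. A minimal sketch with hypothetical incident timestamps:

```python
from datetime import datetime

# Each incident records the event time and the final verification time
# (hypothetical data; in practice these come from your ticketing system).
incidents = [
    (datetime(2008, 3, 1, 9, 0), datetime(2008, 3, 1, 10, 30)),
    (datetime(2008, 3, 5, 14, 0), datetime(2008, 3, 5, 14, 45)),
    (datetime(2008, 3, 9, 22, 15), datetime(2008, 3, 10, 1, 15)),
]

# Time to recover spans the whole cycle: occurrence through verification.
spans = [(verified - occurred).total_seconds() / 3600 for occurred, verified in incidents]
mttr_hours = sum(spans) / len(spans)
print(f"MTTR: {mttr_hours:.2f} hours")
```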
9. I worry about what matters

Align operations tasks with business priorities
• Know what the business goals are
• Fix problems, not incidents
• Know the real impact of an issue
10. Align operations tasks with business priorities
• Tackle problems, not incidents
[Diagram: incidents roll up into problems and SLM violations — "Bob from Houston had a 500 error" → "So did everyone else in Houston!" → "10% of requests are getting 500 errors, and they're all coming from Houston!" → "Houston can't use the order app" → SLA violation]
Align operations tasks with business priorities
• Know the real impact of issues
[Chart: requests over time, split into good and errored; the total impact combines the change from "normal" with the number of affected users]
11. So I have these goals…
• Make the application available
• Ensure user satisfaction
• Balance capacity with demand
• Minimize MTTR
• Align operations tasks with business priorities
• How do I make sure I meet them repeatably and predictably?

Okay, got the goals
12. But how do I make this real?

A top-down approach to web performance monitoring
Goals drive processes:
Business goals → Operating processes → Tools → Metrics
13. Processes
• Reporting & overcommunication
• Capacity planning
• SLA definition
• Problem detection
• Problem localization & resolution

Keep people informed
14. Reporting & overcommunication: Know the audience

Network operations | Network latency, throughput, retransmissions, service outages
Marketing          | Abandonment, conversion, demographics
Server operations  | Host latency, server errors, session concurrency
Security           | Anomalies, fraudulent activity
Finance            | Capacity planning, time out of SLA, IT repair costs

Different stakeholders, the same data sources.
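One way to read that table: every stakeholder report is just a different aggregation over the same request records. A minimal sketch with hypothetical fields:

```python
# Hypothetical request records from a single real-user data source.
requests = [
    {"host_ms": 120, "net_ms": 40, "status": 200, "converted": True},
    {"host_ms": 900, "net_ms": 35, "status": 500, "converted": False},
    {"host_ms": 150, "net_ms": 300, "status": 200, "converted": False},
]

# Server operations: host latency and server errors.
print("avg host latency:", sum(r["host_ms"] for r in requests) / len(requests))
print("5xx errors:", sum(1 for r in requests if r["status"] >= 500))

# Marketing: conversion rate, from the same records.
print("conversion:", sum(r["converted"] for r in requests) / len(requests))

# Network operations: network latency, again from the same records.
print("avg network latency:", sum(r["net_ms"] for r in requests) / len(requests))
```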
I have enough juice
15. Capacity planning
• Define peak load
• Define acceptable performance & availability
• Select a margin of error
  – Cost of being wrong
  – Variance and confidence in the data
• Build capacity & monitor
  – Performance versus load
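A minimal sketch of that arithmetic, with assumed numbers: size for the peak load plus a margin driven by how much the measurements vary. The two-standard-deviation margin is an illustrative choice, not a rule from the talk:

```python
import statistics

# Observed daily peak loads in requests/sec (hypothetical measurements).
daily_peaks = [820, 760, 900, 870, 950, 880, 910]

mean_peak = statistics.mean(daily_peaks)
stdev_peak = statistics.stdev(daily_peaks)

# Margin of error: provision two standard deviations above the mean peak,
# trading the cost of extra capacity against the cost of being wrong.
provisioned = mean_peak + 2 * stdev_peak
print(f"provision for ~{provisioned:.0f} requests/sec")
```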
Capacity planning
16. We all agree on what's "good enough"
SLA definition
• Select a metric
• Select an SLA target
  – One that you control
  – One that can be reliably measured
• Define how many transactions can exceed this target before being in violation
• Monitor
  – Metric, percentile
17. SLA definition
• "95% of all searches by zipcode by all HR personnel will take under 2 seconds for the network to deliver"

95%                        | Percentiles, not averages
All searches by zipcode    | Application function, not port
All HR personnel           | User-centric, actual requests
Under 2 seconds            | Performance metric
For the network to deliver | A specific element of delay
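Checking such an SLA reduces to a percentile test over the matching transactions. A minimal sketch, assuming hypothetical network-delivery times in seconds:

```python
# Network delivery times (seconds) for "search by zipcode" requests
# from HR personnel — hypothetical sample data.
times = [0.8, 1.1, 0.9, 1.7, 2.4, 1.2, 1.0, 1.9, 0.7, 3.1]

TARGET_SECONDS = 2.0
TARGET_PERCENTILE = 95  # 95% of transactions must be under the target

# Share of transactions meeting the target.
met = sum(1 for t in times if t < TARGET_SECONDS) / len(times) * 100
print(f"{met:.0f}% under {TARGET_SECONDS}s ->",
      "SLA met" if met >= TARGET_PERCENTILE else "SLA violated")
```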
I know where problems are…
18. Problem detection
• Detect incidents as soon as they affect even one user
• Is the incident part of a bigger problem?
• Prioritize problems by business impact:
  – Number of users affected
  – Dollar value lost
  – Severity of the issue
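A minimal sketch of such a prioritization, combining the three dimensions into one score; the weights are illustrative assumptions, not values from the talk:

```python
# Hypothetical open problems with their business-impact dimensions.
problems = [
    {"name": "checkout 500s", "users": 1200, "dollars_lost": 8000, "severity": 3},
    {"name": "slow search",   "users": 300,  "dollars_lost": 500,  "severity": 1},
    {"name": "login outage",  "users": 50,   "dollars_lost": 9000, "severity": 3},
]

# Illustrative impact score: weight users, dollars lost, and severity.
def impact(p):
    return p["users"] * 1.0 + p["dollars_lost"] * 0.5 + p["severity"] * 1000

# Work the highest-impact problem first.
for p in sorted(problems, key=impact, reverse=True):
    print(f'{p["name"]}: score {impact(p):.0f}')
```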
…and I can figure out what's behind them
19. Problem localization & resolution
• Reproduction of the error
  – Capture a sample incident
• Deductive reasoning
  – Check tests to see what else is failing
  – Do incidents share a common element?
  – Do incidents happen at a certain load?
  – Do incidents recur around a certain time?
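Finding a shared element is often just counting incident attributes. A minimal sketch over hypothetical incident records, checking which dimension the incidents cluster on:

```python
from collections import Counter

# Hypothetical captured incidents with a few candidate dimensions.
incidents = [
    {"region": "Houston", "server": "web-03", "hour": 14},
    {"region": "Houston", "server": "web-01", "hour": 14},
    {"region": "Houston", "server": "web-03", "hour": 15},
    {"region": "Boston",  "server": "web-03", "hour": 14},
]

# For each dimension, see how concentrated the incidents are.
for dim in ("region", "server", "hour"):
    value, count = Counter(i[dim] for i in incidents).most_common(1)[0]
    print(f"{dim}: {value} appears in {count}/{len(incidents)} incidents")
```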
Problem localization & resolution
20. Problem localization & resolution
• What do they have in common?
21. A top-down approach to web performance monitoring
Select tools that make processes work best:
Business goals → Operating processes → Tools → Metrics
Tools: The three-legged stool
• Device
• Synthetic
• Real User
22. Device monitoring: Watching the infrastructure
• Only loosely related to application availability
• Vital for troubleshooting and localization
• Will show "hard down" errors
  – But good sites are redundant anyway
• Correlation between a metric (CPU, RAM) and performance degradation shows where to add capacity
Synthetic testing: Checking it yourself
• Local or outside
• Same test each time
• Excellent for network baselining when you can't control the end-user's connection
• Use to check whether a region or function is down for everyone
• Limited usefulness for problem re-creation
23. Synthetic testing: Checking it yourself
Real User Monitoring: 2 main uses
• Tactical
  – Detect an incident as soon as one user gets it
  – Capture session forensics
• Long-term
  – Actual user service delivery
  – Performance/load relations
  – Capacity planning
24. Real user monitoring: 2 main uses
• Outlined in ITIL:

Service support     | Service delivery
Incident management | Service level management
Problem management  | Availability management
                    | Capacity planning
OK, I've got the tools. What do I look at?
25. A top-down approach to web performance monitoring
Use the right metrics for the audience & question:
Business goals → Operating processes → Tools → Metrics
Metrics
• Measure everything
  – A full performance model
• Availability
  – Can I use it?
• User satisfaction
  – What's the impact of bad performance?
• Use percentiles
  – Averages lie
26. A full performance model
• The HTTP data model
  – Redirects
  – Containers
  – Components
  – User sessions
• HTTP-specific latency
  – SSL
  – Redirect time
  – Host latency
  – Network latency
  – Idle time
  – Think time
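A minimal sketch of breaking one request's delay into components, assuming a hypothetical HTTPS endpoint; it separates DNS, TCP connect, SSL handshake, and first-byte (host) latency:

```python
import socket
import ssl
import time

HOST, PORT, PATH = "example.com", 443, "/"  # hypothetical target

t0 = time.monotonic()
addr = socket.getaddrinfo(HOST, PORT)[0][4][0]             # DNS resolution
t1 = time.monotonic()
sock = socket.create_connection((addr, PORT), timeout=10)  # TCP connect
t2 = time.monotonic()
tls = ssl.create_default_context().wrap_socket(sock, server_hostname=HOST)  # SSL
t3 = time.monotonic()
tls.sendall(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n".encode())
tls.recv(1)                                                # first byte: host latency
t4 = time.monotonic()
tls.close()

print(f"DNS {t1-t0:.3f}s, connect {t2-t1:.3f}s, "
      f"SSL {t3-t2:.3f}s, first byte {t4-t3:.3f}s")
```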
Availability
• Network errors
  – High retransmissions, DNS resolution failure
27. Availability
• Client errors
  – 404 Not Found

Availability
• Application errors
  – HTTP 500
28. Availability
• Service errors

Availability
• Content & back-end errors
  – "ODBC Error #1234"
29. Availability
• Custom errors
  – Specific to your business
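These categories can be detected mechanically from status codes and response bodies. A minimal classifier sketch; the content-error pattern comes from the "ODBC Error #1234" example above, and the custom-error pattern is a placeholder for your own business rules:

```python
import re

# Patterns for content/back-end and custom errors (illustrative only).
CONTENT_ERROR = re.compile(r"ODBC Error #\d+")
CUSTOM_ERROR = re.compile(r"out of stock", re.IGNORECASE)  # business-specific

def classify(status, body):
    if 400 <= status < 500:
        return "client error"          # e.g. 404 Not Found
    if status >= 500:
        return "application error"     # e.g. HTTP 500
    if CONTENT_ERROR.search(body):
        return "content/back-end error"
    if CUSTOM_ERROR.search(body):
        return "custom error"
    return "ok"

print(classify(200, "ODBC Error #1234 in query"))  # content/back-end error
print(classify(500, ""))                           # application error
```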
User satisfaction: Satisfied, tolerating, frustrated
[Chart: choose a metric and a function, set a target performance, and use percentile data to gauge the impact on users]
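The satisfied/tolerating/frustrated split is the basis of Apdex-style scoring. A minimal sketch, assuming a 2-second target; treating users as "tolerating" up to 4× the target is a common Apdex convention, not a figure from the talk:

```python
# Response times in seconds (hypothetical sample).
times = [0.9, 1.5, 2.8, 7.1, 1.2, 3.9, 0.7, 9.0]
T = 2.0  # target: satisfied under T, tolerating under 4T, frustrated beyond

satisfied = sum(1 for t in times if t <= T)
tolerating = sum(1 for t in times if T < t <= 4 * T)

# Apdex: satisfied count plus half the tolerating count, over all samples.
apdex = (satisfied + tolerating / 2) / len(times)
print(f"Apdex: {apdex:.2f}")  # 1.0 = everyone satisfied, 0 = everyone frustrated
```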
30. Averages lie: Use percentiles
• The average varies wildly, making it hard to set thresholds properly or to see a real slow-down.

31. Averages lie: Use percentiles
• The 80th percentile spikes only once, for a legitimate slow-down (20% of users affected).
• Setting a useful threshold on percentiles gives fewer false positives and more real alerts.
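A minimal demonstration on made-up data: one extreme outlier drags the average past a threshold while the 80th percentile correctly shows most users are fine:

```python
# Response times: most users fast, one extreme outlier (hypothetical data).
times = sorted([0.5, 0.6, 0.6, 0.7, 0.8, 0.8, 0.9, 1.0, 1.1, 30.0])

average = sum(times) / len(times)
p80 = times[int(0.8 * (len(times) - 1))]  # simple nearest-rank percentile

print(f"average: {average:.2f}s")          # 3.70s — looks like a slow-down
print(f"80th percentile: {p80:.2f}s")      # 1.00s — most users are fine
```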
32. A top-down approach to web performance monitoring
Business goals → Operating processes → Tools → Metrics
Questions?
acroll<at>coradiant.com
(514) 944-2765