Alistair Croll, Interop conference faculty and Coradiant's VP of product management, gives an unbiased, top-down view of Web performance monitoring. This informative look at Web measurement business goals, operating processes, tools, and metrics will give you a solid understanding of the issues, without a product pitch. Coradiant is the leader in Web performance monitoring. The award-winning TrueSight Real-User Monitor lets organizations watch what matters to their business by delivering accurate, detailed information on the performance and integrity of Web applications in real time. Incident management, service-level management, and change-impact management are three key capabilities. TrueSight watches any Web or enterprise Web application and lets site operators identify problems sooner, isolate root causes faster, and deploy fixes more quickly than anything else on the market.
1. Best Practices in Web Performance Monitoring
Alistair A. Croll
VP Product Management and Co-Founder

So you want to monitor things.
2. But there are too many toys out there…

A top-down approach to web performance monitoring:
Business goals → Operating processes → Tools → Metrics
3. A top-down approach to web performance monitoring
Start with business goals, then simplify & interpret downward:
Business goals → Operating processes → Tools → Metrics

What goals? (in plain English)
4. Goals
• Make the application available
  – I can use it
• Ensure user satisfaction
  – It's fast & meets or exceeds my expectations
• Balance capacity with demand
  – It handles the peak loads
  – It doesn't cost too much
• Minimize MTTR
  – When it breaks, I can fix it efficiently
• Align operations tasks with business priorities
  – I work on what matters first

They can use it
5. Make the application available
• The most basic goal
• App should be reachable, responsive, and functionally correct
• Three completely different issues:
  – Can I communicate with the service?
  – Can I get end-to-end responses in a timely manner?
  – Is the application behaving properly?
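These three checks can be automated separately. A minimal Python sketch, assuming a hypothetical target URL and expected-content marker; the three probes map to reachability (TCP connect), timeliness (end-to-end response time), and correctness (expected content):

```python
import socket
import time
import urllib.request

URL = "https://example.com/"          # hypothetical target
HOST, PORT = "example.com", 443
EXPECTED_TEXT = "Example Domain"      # marker proving the app behaves properly
MAX_SECONDS = 2.0                     # timeliness threshold (assumed)

# 1. Reachability: can we communicate with the service at all?
def reachable():
    try:
        socket.create_connection((HOST, PORT), timeout=5).close()
        return True
    except OSError:
        return False

# 2. Timeliness: does an end-to-end request finish within the threshold?
# 3. Correctness: does the response contain what the app should return?
def timely_and_correct():
    start = time.monotonic()
    with urllib.request.urlopen(URL, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    elapsed = time.monotonic() - start
    return elapsed <= MAX_SECONDS, EXPECTED_TEXT in body

if __name__ == "__main__":
    print("reachable:", reachable())
    timely, correct = timely_and_correct()
    print("timely:", timely, "correct:", correct)
```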
They're happy & productive
6. Ensure user satisfaction
• How fast is fast enough?
• Depends on the task
  – Login versus reports
• Depends on user expectations
  – ATMs versus banking systems
• Depends on the user's state of mind
  – Deeply engaged versus browsing
Balance capacity with demand
• Performance degrades with demand
[Chart: end-to-end delay versus load in requests per second; maximum capacity is the load at which delay crosses the maximum acceptable delay]
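Given (load, delay) measurements, the knee of that curve can be estimated directly. A minimal sketch with made-up sample data; the numbers and the 2-second ceiling are assumptions, not figures from the talk:

```python
# (load in requests/sec, end-to-end delay in seconds) — assumed sample data
samples = [(10, 0.4), (50, 0.5), (100, 0.7), (200, 1.1), (400, 1.9), (500, 2.6)]
MAX_ACCEPTABLE_DELAY = 2.0  # seconds

# Maximum capacity: the highest measured load that still meets the delay target.
capacity = max(load for load, delay in samples if delay <= MAX_ACCEPTABLE_DELAY)
print(f"Maximum capacity ~ {capacity} requests/sec")  # 400 in this sample
```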
7. I can fix it fast
Minimize MTTR
• Fix it efficiently
• Know the costs of downtime
• Application- and business-dependent:
  – Direct (operational) costs
  – Penalties
  – Opportunity costs
  – Abandonment costs
8. Minimize MTTR
• Don't just think about lost revenue
Minimize MTTR
• And consider the whole resolution cycle:
Event occurs → IT aware → Reproduced → Diagnosed → Resolved → Deployed → Verified
(the full span is the time to recover)
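MTTR is then the mean of those end-to-end spans, from occurrence through verification, not just detection-to-fix. A minimal sketch with hypothetical incident timestamps:

```python
from datetime import datetime

# Each incident records the event time and the final verification time
# (hypothetical data; in practice these come from your ticketing system).
incidents = [
    (datetime(2008, 3, 1, 9, 0), datetime(2008, 3, 1, 10, 30)),
    (datetime(2008, 3, 5, 14, 0), datetime(2008, 3, 5, 14, 45)),
    (datetime(2008, 3, 9, 22, 15), datetime(2008, 3, 10, 1, 15)),
]

# Time to recover spans the whole cycle: occurrence through verification.
spans = [(verified - occurred).total_seconds() / 3600 for occurred, verified in incidents]
mttr_hours = sum(spans) / len(spans)
print(f"MTTR: {mttr_hours:.2f} hours")
```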
9. I worry about what matters

Align operations tasks with business priorities
• Know what the business goals are
• Fix problems, not incidents
• Know the real impact of an issue
10. Align operations tasks with business priorities
• Tackle problems, not incidents
[Diagram: incidents roll up into problems and SLM violations — "Bob from Houston had a 500 error" → "So did everyone else in Houston!" → "10% of requests are getting 500 errors, and they're all coming from Houston!" → "Houston can't use the order app" → SLA violation]
Align operations tasks with business priorities
• Know the real impact of issues
[Chart: requests over time, split into good and errored; the total impact combines the change from "normal" with the number of affected users]
11. So I have these goals…
• Make the application available
• Ensure user satisfaction
• Balance capacity with demand
• Minimize MTTR
• Align operations tasks with business priorities
• How do I make sure I meet them repeatably and predictably?

Okay, got the goals
12. But how do I make this real?

A top-down approach to web performance monitoring
Goals drive processes:
Business goals → Operating processes → Tools → Metrics
13. Processes
• Reporting & overcommunication
• Capacity planning
• SLA definition
• Problem detection
• Problem localization & resolution

Keep people informed
14. Reporting & overcommunication: Know the audience

Network operations | Network latency, throughput, retransmissions, service outages
Marketing          | Abandonment, conversion, demographics
Server operations  | Host latency, server errors, session concurrency
Security           | Anomalies, fraudulent activity
Finance            | Capacity planning, time out of SLA, IT repair costs

Different stakeholders, the same data sources.
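One way to read that table: every stakeholder report is just a different aggregation over the same request records. A minimal sketch with hypothetical fields:

```python
# Hypothetical request records from a single real-user data source.
requests = [
    {"host_ms": 120, "net_ms": 40, "status": 200, "converted": True},
    {"host_ms": 900, "net_ms": 35, "status": 500, "converted": False},
    {"host_ms": 150, "net_ms": 300, "status": 200, "converted": False},
]

# Server operations: host latency and server errors.
print("avg host latency:", sum(r["host_ms"] for r in requests) / len(requests))
print("5xx errors:", sum(1 for r in requests if r["status"] >= 500))

# Marketing: conversion rate, from the same records.
print("conversion:", sum(r["converted"] for r in requests) / len(requests))

# Network operations: network latency, again from the same records.
print("avg network latency:", sum(r["net_ms"] for r in requests) / len(requests))
```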
I have enough juice
15. Capacity planning
• Define peak load
• Define acceptable performance & availability
• Select a margin of error
  – Cost of being wrong
  – Variance and confidence in the data
• Build capacity & monitor
  – Performance versus load
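A minimal sketch of that arithmetic, with assumed numbers: size for the peak load plus a margin driven by how much the measurements vary. The two-standard-deviation margin is an illustrative choice, not a rule from the talk:

```python
import statistics

# Observed daily peak loads in requests/sec (hypothetical measurements).
daily_peaks = [820, 760, 900, 870, 950, 880, 910]

mean_peak = statistics.mean(daily_peaks)
stdev_peak = statistics.stdev(daily_peaks)

# Margin of error: provision two standard deviations above the mean peak,
# trading the cost of extra capacity against the cost of being wrong.
provisioned = mean_peak + 2 * stdev_peak
print(f"provision for ~{provisioned:.0f} requests/sec")
```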
Capacity planning
16. We all agree on what's "good enough"
SLA definition
• Select a metric
• Select an SLA target
  – One that you control
  – One that can be reliably measured
• Define how many transactions can exceed this target before being in violation
• Monitor
  – Metric, percentile
17. SLA definition
• "95% of all searches by zipcode by all HR personnel will take under 2 seconds for the network to deliver"

95%                        | Percentiles, not averages
All searches by zipcode    | Application function, not port
All HR personnel           | User-centric, actual requests
Under 2 seconds            | Performance metric
For the network to deliver | A specific element of delay
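Checking such an SLA reduces to a percentile test over the matching transactions. A minimal sketch, assuming hypothetical network-delivery times in seconds:

```python
# Network delivery times (seconds) for "search by zipcode" requests
# from HR personnel — hypothetical sample data.
times = [0.8, 1.1, 0.9, 1.7, 2.4, 1.2, 1.0, 1.9, 0.7, 3.1]

TARGET_SECONDS = 2.0
TARGET_PERCENTILE = 95  # 95% of transactions must be under the target

# Share of transactions meeting the target.
met = sum(1 for t in times if t < TARGET_SECONDS) / len(times) * 100
print(f"{met:.0f}% under {TARGET_SECONDS}s ->",
      "SLA met" if met >= TARGET_PERCENTILE else "SLA violated")
```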
I know where problems are…
18. Problem detection
• Detect incidents as soon as they affect even one user
• Is the incident part of a bigger problem?
• Prioritize problems by business impact:
  – Number of users affected
  – Dollar value lost
  – Severity of the issue
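A minimal sketch of such a prioritization, combining the three dimensions into one score; the weights are illustrative assumptions, not values from the talk:

```python
# Hypothetical open problems with their business-impact dimensions.
problems = [
    {"name": "checkout 500s", "users": 1200, "dollars_lost": 8000, "severity": 3},
    {"name": "slow search",   "users": 300,  "dollars_lost": 500,  "severity": 1},
    {"name": "login outage",  "users": 50,   "dollars_lost": 9000, "severity": 3},
]

# Illustrative impact score: weight users, dollars lost, and severity.
def impact(p):
    return p["users"] * 1.0 + p["dollars_lost"] * 0.5 + p["severity"] * 1000

# Work the highest-impact problem first.
for p in sorted(problems, key=impact, reverse=True):
    print(f'{p["name"]}: score {impact(p):.0f}')
```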
…and I can figure out what's behind them
19. Problem localization & resolution
• Reproduction of the error
  – Capture a sample incident
• Deductive reasoning
  – Check tests to see what else is failing
  – Do incidents share a common element?
  – Do incidents happen at a certain load?
  – Do incidents recur around a certain time?
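Finding a shared element is often just counting incident attributes. A minimal sketch over hypothetical incident records, checking which dimension the incidents cluster on:

```python
from collections import Counter

# Hypothetical captured incidents with a few candidate dimensions.
incidents = [
    {"region": "Houston", "server": "web-03", "hour": 14},
    {"region": "Houston", "server": "web-01", "hour": 14},
    {"region": "Houston", "server": "web-03", "hour": 15},
    {"region": "Boston",  "server": "web-03", "hour": 14},
]

# For each dimension, see how concentrated the incidents are.
for dim in ("region", "server", "hour"):
    value, count = Counter(i[dim] for i in incidents).most_common(1)[0]
    print(f"{dim}: {value} appears in {count}/{len(incidents)} incidents")
```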
Problem localization & resolution
20. Problem localization & resolution
• What do they have in common?
21. A top-down approach to web performance monitoring
Select tools that make processes work best:
Business goals → Operating processes → Tools → Metrics
Tools: The three-legged stool
• Device
• Synthetic
• Real User
22. Device monitoring: Watching the infrastructure
• Only loosely related to application availability
• Vital for troubleshooting and localization
• Will show "hard down" errors
  – But good sites are redundant anyway
• Correlation between a metric (CPU, RAM) and performance degradation shows where to add capacity
Synthetic testing: Checking it yourself
• Local or outside
• Same test each time
• Excellent for network baselining when you can't control the end-user's connection
• Use to check whether a region or function is down for everyone
• Limited usefulness for problem re-creation
23. Synthetic testing: Checking it yourself
Real User Monitoring: 2 main uses
• Tactical
  – Detect an incident as soon as one user gets it
  – Capture session forensics
• Long-term
  – Actual user service delivery
  – Performance/load relations
  – Capacity planning
24. Real user monitoring: 2 main uses
• Outlined in ITIL:

Service support     | Service delivery
Incident management | Service level management
Problem management  | Availability management
                    | Capacity planning
OK, I've got the tools. What do I look at?
25. A top-down approach to web performance monitoring
Use the right metrics for the audience & question:
Business goals → Operating processes → Tools → Metrics
Metrics
• Measure everything
  – A full performance model
• Availability
  – Can I use it?
• User satisfaction
  – What's the impact of bad performance?
• Use percentiles
  – Averages lie
26. A full performance model
• The HTTP data model
  – Redirects
  – Containers
  – Components
  – User sessions
• HTTP-specific latency
  – SSL
  – Redirect time
  – Host latency
  – Network latency
  – Idle time
  – Think time
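A minimal sketch of breaking one request's delay into components, assuming a hypothetical HTTPS endpoint; it separates DNS, TCP connect, SSL handshake, and first-byte (host) latency:

```python
import socket
import ssl
import time

HOST, PORT, PATH = "example.com", 443, "/"  # hypothetical target

t0 = time.monotonic()
addr = socket.getaddrinfo(HOST, PORT)[0][4][0]             # DNS resolution
t1 = time.monotonic()
sock = socket.create_connection((addr, PORT), timeout=10)  # TCP connect
t2 = time.monotonic()
tls = ssl.create_default_context().wrap_socket(sock, server_hostname=HOST)  # SSL
t3 = time.monotonic()
tls.sendall(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n".encode())
tls.recv(1)                                                # first byte: host latency
t4 = time.monotonic()
tls.close()

print(f"DNS {t1-t0:.3f}s, connect {t2-t1:.3f}s, "
      f"SSL {t3-t2:.3f}s, first byte {t4-t3:.3f}s")
```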
Availability
• Network errors
  – High retransmissions, DNS resolution failure
27. Availability
• Client errors
  – 404 Not Found

Availability
• Application errors
  – HTTP 500
28. Availability
• Service errors

Availability
• Content & back-end errors
  – "ODBC Error #1234"
29. Availability
• Custom errors
  – Specific to your business
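These categories can be detected mechanically from status codes and response bodies. A minimal classifier sketch; the content-error pattern comes from the "ODBC Error #1234" example above, and the custom-error pattern is a placeholder for your own business rules:

```python
import re

# Patterns for content/back-end and custom errors (illustrative only).
CONTENT_ERROR = re.compile(r"ODBC Error #\d+")
CUSTOM_ERROR = re.compile(r"out of stock", re.IGNORECASE)  # business-specific

def classify(status, body):
    if 400 <= status < 500:
        return "client error"          # e.g. 404 Not Found
    if status >= 500:
        return "application error"     # e.g. HTTP 500
    if CONTENT_ERROR.search(body):
        return "content/back-end error"
    if CUSTOM_ERROR.search(body):
        return "custom error"
    return "ok"

print(classify(200, "ODBC Error #1234 in query"))  # content/back-end error
print(classify(500, ""))                           # application error
```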
User satisfaction: Satisfied, tolerating, frustrated
[Chart: choose a metric and a function, set a target performance, and use percentile data to gauge the impact on users]
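The satisfied/tolerating/frustrated split is the basis of Apdex-style scoring. A minimal sketch, assuming a 2-second target; treating users as "tolerating" up to 4× the target is a common Apdex convention, not a figure from the talk:

```python
# Response times in seconds (hypothetical sample).
times = [0.9, 1.5, 2.8, 7.1, 1.2, 3.9, 0.7, 9.0]
T = 2.0  # target: satisfied under T, tolerating under 4T, frustrated beyond

satisfied = sum(1 for t in times if t <= T)
tolerating = sum(1 for t in times if T < t <= 4 * T)

# Apdex: satisfied count plus half the tolerating count, over all samples.
apdex = (satisfied + tolerating / 2) / len(times)
print(f"Apdex: {apdex:.2f}")  # 1.0 = everyone satisfied, 0 = everyone frustrated
```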
30. Averages lie: Use percentiles
• The average varies wildly, making it hard to set thresholds properly or to see a real slow-down.

31. Averages lie: Use percentiles
• The 80th percentile spikes only once, for a legitimate slow-down (20% of users affected).
• Setting a useful threshold on percentiles gives fewer false positives and more real alerts.
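A minimal demonstration on made-up data: one extreme outlier drags the average past a threshold while the 80th percentile correctly shows most users are fine:

```python
# Response times: most users fast, one extreme outlier (hypothetical data).
times = sorted([0.5, 0.6, 0.6, 0.7, 0.8, 0.8, 0.9, 1.0, 1.1, 30.0])

average = sum(times) / len(times)
p80 = times[int(0.8 * (len(times) - 1))]  # simple nearest-rank percentile

print(f"average: {average:.2f}s")          # 3.70s — looks like a slow-down
print(f"80th percentile: {p80:.2f}s")      # 1.00s — most users are fine
```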
32. A top-down approach to web performance monitoring
Business goals → Operating processes → Tools → Metrics
Questions?
acroll<at>coradiant.com
(514) 944-2765