Measuring CDN performance and why you're doing it wrong

Measuring CDN Performance

Hooman Beheshti
VP Technology

Why this matters

•  Performance is one of the main reasons we use a CDN
•  Measurement is often used during the evaluation phase to compare CDNs
   –  Most of what we’ll talk about is in this context
•  Seems easy, but isn’t
•  Heavily vendor-influenced
   –  “Ok Google: define irony!”

Goals

•  What does the measurement landscape look like?
•  Share measurement experiences
•  Help guide you toward a good testing plan if/when you decide to do this

[Diagrams: Client, CDN Node, Origin; “Delivery: static/cached objects” and “Delivery: dynamic/uncached objects”]

What we’ll be focusing on

•  Only delivery, not all the other features CDNs provide
•  How we measure
•  Metrics to measure
•  What to measure
•  Some gotchas, misconceptions, and common mistakes

Measurement Techniques
(how we measure)

Measurement techniques

•  Pretend Users
   –  Synthetic tests
   –  Not actual users
•  Real Users
   –  In the browser
   –  Actual users

Synthetic testing

•  Usually a large network of test nodes all over the globe
•  Highly scalable: can run lots of tests at once
•  Many vendors use this model
   –  Examples: Catchpoint, Dynatrace (Gomez), Keynote, Pingdom, etc.

Synthetic testing

•  Built to do full performance and availability testing
   –  Lots of “monitors”, emulating what real users do
   –  DNS, Traceroute, Ping, Streaming, Mobile
   –  HTTP
      •  Object
      •  Browser
      •  Transactions/Flows
•  Tests are scheduled at some frequency to test things repeatedly
   –  Aggregates are reported
reported

Backbone nodes

•  Test machines sitting in datacenters all around the globe
•  Really good at:
   –  Availability and reachability
   –  Scale
   –  Backend problems
   –  Global reach
•  Often terrible indicators of raw performance
   –  No (last-mile) latency
   –  Practically infinite bandwidth

(Image: https://www.flickr.com/photos/stars6/4381851322/)

Last mile nodes

•  Test machines sitting behind a real, home-like internet connection
•  Much better at reporting what you can expect from users, but sometimes unreliable
•  Also not as densely deployed

RUM

•  Use JavaScript to collect timing metrics
•  Can collect lots of things through browser APIs
   –  Page metrics, asset metrics, user-defined metrics

Use test assets

•  Use this model to initiate tests in the browser (fetching dedicated test objects)
•  Some vendors:
   –  Cedexis, TurboBytes, CloudHarmony, more…
   –  Usually this isn’t their business, but the data drives their main business objectives
•  You can build this yourself too
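If you build a test-asset setup yourself, the page loads one small test object from each candidate CDN and reports the timing of each fetch. The sketch below covers only the aggregation step and is not how any of the vendors above implement it: it groups Resource Timing-style entries by hostname. The hostnames `cdn-a.example.com` and `cdn-b.example.com` and the `groupByCdn` helper are hypothetical.

```javascript
// Sketch: group test-object fetch durations by CDN.
// `entries` are PerformanceResourceTiming-like objects ({ name, duration }).
// `cdnByHost` maps each (hypothetical) test hostname to a CDN label.
function groupByCdn(entries, cdnByHost) {
  const results = {};
  for (const entry of entries) {
    const host = new URL(entry.name).hostname;
    const cdn = cdnByHost[host];
    if (!cdn) continue; // not one of the test objects
    if (!results[cdn]) results[cdn] = [];
    results[cdn].push(entry.duration);
  }
  return results;
}

// In the browser, `entries` would come from performance.getEntriesByType("resource").
const sample = groupByCdn(
  [
    { name: "https://cdn-a.example.com/test-1.gif", duration: 42 },
    { name: "https://cdn-b.example.com/test-1.gif", duration: 50 },
    { name: "https://www.example.com/app.js", duration: 5 },
    { name: "https://cdn-a.example.com/test-2.gif", duration: 38 },
  ],
  { "cdn-a.example.com": "CDN A", "cdn-b.example.com": "CDN B" }
);
// sample → CDN A: [42, 38], CDN B: [50]
```

Each CDN's list then needs a lot of samples before it says anything; a single fetch per page view is only one data point.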

Use real assets in the page

•  Collect timings from actual objects
   –  Resource Timing
•  Vendors
   –  SOASTA, New Relic, most synthetic vendors
   –  Boomerang (open source)
   –  Google Analytics User Timings
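With real assets, you read the Resource Timing entries of objects already on the page. One practical detail: when the CDN serves assets from a different hostname than the page, those entries are cross-origin, and the browser zeroes the detailed timestamps (including `requestStart` and `responseStart`) unless the response carries a `Timing-Allow-Origin` header. A minimal sketch that separates entries with full detail from those where only the overall duration is usable; `static.example.com` and the `cdnAssetTimings` helper are hypothetical.

```javascript
// Sketch: real page assets served from a CDN hostname (hypothetical here),
// with TTFB and download time where the browser exposes them.
function cdnAssetTimings(entries, cdnHost) {
  const detailed = [];
  let durationOnly = 0;
  for (const e of entries) {
    if (new URL(e.name).hostname !== cdnHost) continue;
    // Cross-origin entries without Timing-Allow-Origin report responseStart = 0;
    // only the overall duration is usable for them.
    if (e.responseStart > 0) {
      detailed.push({
        url: e.name,
        ttfb: e.responseStart - e.requestStart,
        download: e.responseEnd - e.responseStart, // TTLB - TTFB
      });
    } else {
      durationOnly += 1;
    }
  }
  return { detailed, durationOnly };
}

const report = cdnAssetTimings(
  [
    { name: "https://static.example.com/app.js", requestStart: 100, responseStart: 130, responseEnd: 150 },
    { name: "https://static.example.com/site.css", requestStart: 0, responseStart: 0, responseEnd: 210 },
    { name: "https://www.example.com/", requestStart: 5, responseStart: 50, responseEnd: 60 },
  ],
  "static.example.com"
);
// report.detailed → one entry (app.js) with ttfb 30 and download 20; report.durationOnly → 1
```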

DATA, DATA, DATA

•  For either RUM technique, we need A LOT of data
•  Too much variance
   –  Most vendors don’t use averages
   –  Medians, percentiles, and histograms
[Diagram: Client/Server sequence for fetching one object, built up step by step: DNS, TCP, (TLS), HTTP (TTFB), HTTP (Download); round trips marked as 1 x RTT]

[Timeline: DNS | TCP | (TLS) | TTFB | Download (TTLB-TTFB), along a time axis]

•  DNS: RTT to DNS server, DNS iterations, DNS caching and TTLs
•  TCP: RTT to cache server (CDN footprint & routing algorithms)
•  (TLS): RTT to cache server (or RTTs, depending on TLS False Start), efficiency of TLS engine
•  TTFB: RTT to where the object is stored + storage efficiency (different for requests to origin); lower bound = network RTT
•  TTLB-TTFB: Bandwidth, congestion avoidance algorithms (and RTT!)
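These phases map onto the timestamps of a Resource Timing entry. A minimal sketch of that mapping, assuming a PerformanceResourceTiming-like object (the field names are the standard ones; `objectPhases` is our own helper). In the spec, `connectEnd` includes the TLS handshake, so TCP is measured up to `secureConnectionStart` when TLS is present; `secureConnectionStart` is 0 for plain HTTP, and on a reused connection all the connection timestamps collapse to the same value, giving zero-length DNS/TCP/TLS.

```javascript
// Sketch: split one PerformanceResourceTiming-like entry into
// DNS, TCP, TLS, TTFB, and download phases (all in ms).
function objectPhases(e) {
  const usesTls = e.secureConnectionStart > 0;
  return {
    dns: e.domainLookupEnd - e.domainLookupStart,
    // connectEnd includes the TLS handshake, so stop TCP where TLS begins.
    tcp: (usesTls ? e.secureConnectionStart : e.connectEnd) - e.connectStart,
    tls: usesTls ? e.connectEnd - e.secureConnectionStart : 0,
    ttfb: e.responseStart - e.requestStart,   // request sent to first byte
    download: e.responseEnd - e.responseStart, // TTLB - TTFB
  };
}

// Example: a fresh HTTPS connection.
const phases = objectPhases({
  domainLookupStart: 10, domainLookupEnd: 30,
  connectStart: 30, secureConnectionStart: 50, connectEnd: 90,
  requestStart: 91, responseStart: 131, responseEnd: 181,
});
// phases → dns 20, tcp 20, tls 40, ttfb 40, download 50
```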

Core object metrics

•  Not every request experiences every metric:
   –  DNS: once per domain
   –  TCP/TLS setup: once per connection
   –  TTFB/Download: for every object (not already in the browser cache)
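You can check this on your own pages by counting how many resource entries actually incurred each phase. A rough sketch, assuming a zero-length phase means it was not incurred (cached DNS, reused connection) and that `transferSize` is 0 for objects served from the browser cache. Cross-origin entries without `Timing-Allow-Origin` also report zeros, so they would undercount; `phaseFrequency` is a hypothetical helper.

```javascript
// Sketch: how many entries incurred DNS, a new connection, a TLS handshake,
// and a network transfer. Zero-length phases count as "not incurred".
function phaseFrequency(entries) {
  const counts = { total: entries.length, dns: 0, connect: 0, tls: 0, network: 0 };
  for (const e of entries) {
    if (e.domainLookupEnd > e.domainLookupStart) counts.dns += 1;
    if (e.connectEnd > e.connectStart) counts.connect += 1;
    if (e.secureConnectionStart > 0 && e.connectEnd > e.secureConnectionStart) counts.tls += 1;
    // transferSize is 0 for browser-cache hits (and for opaque cross-origin entries).
    if (e.transferSize > 0) counts.network += 1;
  }
  return counts;
}

// One new HTTPS connection, one reuse of it, one browser-cache hit.
const freq = phaseFrequency([
  { domainLookupStart: 10, domainLookupEnd: 30, connectStart: 30, secureConnectionStart: 50, connectEnd: 90, transferSize: 1200 },
  { domainLookupStart: 100, domainLookupEnd: 100, connectStart: 100, secureConnectionStart: 100, connectEnd: 100, transferSize: 800 },
  { domainLookupStart: 120, domainLookupEnd: 120, connectStart: 120, secureConnectionStart: 120, connectEnd: 120, transferSize: 0 },
]);
// freq → total 3, dns 1, connect 1, tls 1, network 2
```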

Resource timing

http://www.w3.org/TR/resource-timing/

window.performance.getEntries()
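`getEntries()` returns every entry type on the performance timeline (navigation, paint, marks, measures, and so on), so for object-level CDN analysis you typically filter to `"resource"` entries. A minimal sketch that snapshots the fields needed for later analysis; `snapshotResources` is a hypothetical helper, and its `perf` parameter (defaulting to the page's `performance` object) exists only so it can be tried against recorded data.

```javascript
// Sketch: keep only resource entries and the fields needed for CDN analysis.
// `perf` defaults to the page's Performance object.
function snapshotResources(perf = globalThis.performance) {
  return perf.getEntriesByType("resource").map((e) => ({
    name: e.name,
    initiatorType: e.initiatorType,
    startTime: e.startTime,
    duration: e.duration,
    transferSize: e.transferSize, // 0 usually means browser cache (or no Timing-Allow-Origin)
  }));
}

// Recorded stand-in for the browser's performance object.
const recorded = {
  getEntriesByType: (type) =>
    type === "resource"
      ? [{ name: "https://static.example.com/logo.png", initiatorType: "img", startTime: 120, duration: 35, transferSize: 5400, serverTiming: [] }]
      : [],
};
const snapshot = snapshotResources(recorded);
```

In a page, the snapshot could be sent with `navigator.sendBeacon("/rum-collect", JSON.stringify(snapshotResources()))`, where `/rum-collect` stands in for whatever collector endpoint you run.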

Mistakes we make
(when evaluating)

“I’ll pick an image from my home page, use backbone synthetic tests from all over the world, and pick the CDN that has the fastest average time”

“Let’s test an asset via RUM on a million page views a day and pick the fastest CDN”

“Let’s run WebPageTest on both CDNs and go with whichever has a faster page load time”

~$ time curl -v http://…

we measure the wrong thing

Web application: objects

•  Your application should determine what you test:
   –  Objects served from the edge
   –  Objects served from origin (through the CDN)
•  If HTML is served from origin (through the CDN), we must measure it
   –  Essential to critical page metrics
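For the HTML document itself, the page's navigation entry (PerformanceNavigationTiming, which shares the Resource Timing timestamp fields) gives the same TTFB and download split. A minimal sketch; `htmlTimings` is a hypothetical helper, and the `perf` parameter defaults to the page's `performance` object.

```javascript
// Sketch: TTFB and download time for the HTML document, read from the
// Navigation Timing entry. Returns null if no navigation entry exists.
function htmlTimings(perf = globalThis.performance) {
  const [nav] = perf.getEntriesByType("navigation");
  if (!nav) return null;
  return {
    ttfb: nav.responseStart - nav.requestStart,     // includes the trip through the CDN to origin
    download: nav.responseEnd - nav.responseStart,  // TTLB - TTFB
  };
}

// Recorded stand-in for a page whose HTML came from origin through the CDN.
const html = htmlTimings({
  getEntriesByType: (type) =>
    type === "navigation" ? [{ requestStart: 120, responseStart: 300, responseEnd: 340 }] : [],
});
// html → ttfb 180, download 40
```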

Web application: object sizes

•  On any page:
   –  DNS queries only happen a small number of times
   –  6 TCP connections per domain (the typical browser limit over HTTP/1.1)
   –  1 TLS setup per connection
   –  Many, many, many HTTP fetches
•  Core metrics
   –  TTFB
   –  Download (TTLB-TTFB) if there are important large objects
   –  Should have a good idea of DNS/TCP/TLS, but less critical
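A rough count makes the imbalance concrete. The sketch below estimates, for one page's list of object URLs, an upper bound on DNS lookups, connections, and TLS setups versus HTTP fetches. It assumes HTTP/1.1 with the common 6-connections-per-host browser limit, a cold cache, and one DNS lookup per hostname; real browsers may open fewer connections, and HTTP/2 generally uses a single connection per origin. `phaseBudget` and the example URLs are hypothetical.

```javascript
// Rough per-page model (HTTP/1.1, cold cache): upper bound on how often
// each phase can occur, compared with the number of HTTP fetches.
const MAX_CONNECTIONS_PER_HOST = 6;

function phaseBudget(urls) {
  const fetchesPerOrigin = new Map();
  for (const u of urls) {
    const { protocol, host } = new URL(u);
    const origin = `${protocol}//${host}`;
    fetchesPerOrigin.set(origin, (fetchesPerOrigin.get(origin) || 0) + 1);
  }
  let connections = 0;
  let tlsSetups = 0;
  for (const [origin, fetches] of fetchesPerOrigin) {
    const conns = Math.min(fetches, MAX_CONNECTIONS_PER_HOST);
    connections += conns;
    if (origin.startsWith("https:")) tlsSetups += conns; // one handshake per connection
  }
  const hostnames = new Set([...fetchesPerOrigin.keys()].map((o) => new URL(o).hostname));
  return { dnsLookups: hostnames.size, connections, tlsSetups, httpFetches: urls.length };
}

const pageUrls = [
  "https://www.example.com/",
  ...Array.from({ length: 20 }, (_, i) => `https://static.example.com/img${i}.png`),
  "http://ads.example.net/a.js",
  "http://ads.example.net/b.js",
];
const budget = phaseBudget(pageUrls);
// budget → dnsLookups 3, connections 9, tlsSetups 7, httpFetches 23
```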

Web application

•  If the CDN is used only for static/cacheable objects:
   –  One or two representative assets
   –  TTFB, and maybe download, are most important

[Diagram: Client → CDN Node]
Web
application

•  If
CDN
also
for
whole
site
(HTML
going
through
CDN)

–  Sample
of
key
HTML
pages,
delivered
from
origin

–  TTFB
will
show
eﬃciency
of
routing
(and
connection

management)

to
origin

–  TTLB
will
show
eﬃciency
of
delivery

Web
Server
Client
CDN
Node

Web
application

•  If
CDN
also
for
whole
site
(HTML
going
through
CDN)

–  Sample
of
key
HTML
pages,
delivered
from
origin

–  TTFB
will
show
eﬃciency
of
routing
(and
connection

management)

to
origin

–  TTLB
will
show
eﬃciency
of
delivery

Web
Server
Client
CDN
Node
CDN
Node

we measure the wrong way

Backbone Nodes
(For true performance measurements)

[Chart: TCP Connect Time Histogram (BB nodes); x-axis: msec, y-axis: % of tests]

object metrics or page metrics

Test connection profile

•  Download: 15Mbps
•  Upload: 5Mbps
•  Latency: 10 ms, 25 ms

[Chart: onload, Speed Index, and Start Render at 10 msec vs. 25 msec latency]

What the…???

•  We always assume “all things equal”
•  Too many factors affect page load time
   –  3rd parties (sometimes varying), content from origin, layout, JS execution, etc.
•  Too much variance

Source: httparchive.org

To be clear…

•  Always use WebPageTest (or something like it) to understand your application’s performance profile
•  Continue to monitor application performance, and always spot check
•  Be extremely careful when using it to compare CDN performance; it can mislead you
   –  If you use RUM to measure page metrics with lots of data, things become a little more meaningful (data volume handles variance)

we overgeneralize and draw the wrong conclusions

Cache hit ratio: traditional calculation

$\text{Cache hit ratio} = 1 - \dfrac{\text{Requests to Origin}}{\text{Total Requests}}$

[Diagram: Origin and Cache, with HOT and COLD content; what counts as a cache “hit”]

Isn’t this better?

$\text{Cache hit ratio} = \dfrac{\text{Hits}}{\text{Hits} + \text{Misses}}$ (@edge)

Cache hit ratio

•  Offload: $1 - \dfrac{\text{Requests to Origin}}{\text{Total Requests}}$
•  Performance: $\dfrac{\text{Hits}}{\text{Hits} + \text{Misses}}$ @edge
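The two definitions can diverge a lot. The sketch below computes both from hypothetical counters for a CDN with an intermediate cache tier, where many edge misses are answered by that tier and never reach origin. All numbers are made up for illustration.

```javascript
// Sketch: the two "cache hit ratio" definitions, from hypothetical counters.
// Offload: how much traffic never reaches origin.
function offloadRatio({ totalRequests, originRequests }) {
  return (totalRequests - originRequests) / totalRequests; // = 1 - origin/total
}

// Performance: how often users are answered directly from the edge cache.
function edgeHitRatio({ edgeHits, edgeMisses }) {
  return edgeHits / (edgeHits + edgeMisses);
}

// Hypothetical tiered setup: most edge misses are served by an intermediate
// cache, so origin sees far fewer requests than there are edge misses.
const stats = {
  totalRequests: 1000000,
  originRequests: 20000,
  edgeHits: 850000,
  edgeMisses: 150000,
};
console.log(offloadRatio(stats)); // 0.98
console.log(edgeHitRatio(stats)); // 0.85
```

Origin sees only 2% of requests, yet 15% of users still wait on a miss at the edge; a single "hit ratio" number hides one of those two facts.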

Effect on long tail content
(long tail: cacheable but seldom fetched)

[Chart: Popular, Medium Tail (1hr), and Long tail (6hr) content]

Content      Connect (median)   Wait (median)
Popular      14msec             19msec
1hr Tail     15msec             26msec
6hr Tail     16msec             32msec

(6,400+, 77,000+, and 38,000+ measurements shown on the chart)

How much of this really matters?
(when trying to choose between multiple CDNs)

The bigger picture

•  It’s really easy to lock in on a metric
•  Performance absolutely matters
•  True performance isn’t always as easy to measure

We must ask questions…

What’s the storage model, and how does it affect long tail content?

What should I expect with cache hit ratios, for offload and for performance?

Footprint? (Is what I’m testing the same as what I’m buying?)

HTTP vs. TLS footprint?

Can I serve stale content if necessary? (stale-while-revalidate & stale-if-error)

What if I can cache something I didn’t think I could?
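Stale serving is controlled by the `stale-while-revalidate` and `stale-if-error` Cache-Control extensions (RFC 5861); whether a given CDN honors them is exactly the question to ask. The sketch below only models the RFC's decision for one cached object of a given age. It uses a deliberately minimal Cache-Control parser (no quoted values) and ignores other directives such as `no-cache`, `must-revalidate`, or `s-maxage`; `cacheDecision` and its return labels are hypothetical.

```javascript
// Minimal Cache-Control parser: "max-age=600, public" → { "max-age": 600, public: true }.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(",")) {
    const [rawKey, rawValue] = part.trim().split("=");
    if (!rawKey) continue;
    directives[rawKey.toLowerCase()] = rawValue === undefined ? true : Number(rawValue);
  }
  return directives;
}

// Simplified RFC 5861 decision for a cached response of the given age (seconds).
function cacheDecision(cacheControl, ageSeconds, originHealthy) {
  const cc = parseCacheControl(cacheControl);
  const maxAge = cc["max-age"] ?? 0;
  if (ageSeconds < maxAge) return "fresh";
  const staleFor = ageSeconds - maxAge;
  // Within the stale-while-revalidate window: answer now, refresh in the background.
  if (staleFor < (cc["stale-while-revalidate"] ?? 0)) return "serve-stale-and-revalidate";
  if (originHealthy) return "fetch-from-origin";
  // Origin failing: stale-if-error allows the stale copy for a while longer.
  return staleFor < (cc["stale-if-error"] ?? 0) ? "serve-stale-on-error" : "origin-error";
}

const policy = "max-age=600, stale-while-revalidate=30, stale-if-error=86400";
// cacheDecision(policy, 610, true)  → "serve-stale-and-revalidate"
// cacheDecision(policy, 700, false) → "serve-stale-on-error"
```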

Key takeaways

•  Everything is application-dependent
   –  Evaluate how your application works and what impacts performance the most
•  Don’t get locked into a single number/metric
•  Always know your application’s performance and bottlenecks
•  Be mindful of the bigger picture
•  Don’t stop measuring!

Thank you!

hooman@fastly.com

Office hours Friday @lunch
