What does it take to achieve sub-two-second video playback latency on the third-largest website in the world?
We will peek under the hood of the Watch page and explore the common problems being solved by
YouTube's Desktop team and the interesting solutions that had to be implemented to achieve this goal.
We will discuss how page loads are classified and what specific treatment each type requires, which tools and technologies are used in the stack, how being one of the largest image-serving websites affects our approach to thumbnails, and how we maintain and monitor our latency goals.
From nothing to a video under 2 seconds / Mikhail Sychev (YouTube)
1. From nothing to a video
under 2 seconds
Mikhail Sychev, Software Engineer Unicorn at Google
2. Who am I?
● Software Engineer at YouTube
● Evangelist of modern web technologies
● Member of the “Make YouTube Faster” group
3. What we will talk about
● Types of page load, associated challenges and our
approach to handling them
● Tools that we use to build and monitor YouTube
● Tricks we learned along the way
4. 1 Second
...users notice the short delay, they stay focused on
their current train of thought during the one-second
interval … this means that new pages must display
within 1 second for users to feel like they're
navigating freely; any slower and they feel held back
by the computer and don't click as readily.
JAKOB NIELSEN http://www.nngroup.com/articles/response-times-3-important-limits/
YouTube is a video streaming service: starting
video playback as early as possible is the most
important task of the Watch page
...yet everything else is important too
9. Users opening a YouTube page with a plain
HTTP(S) request
● Full HTML page downloaded and parsed
● Some or no static resources in cache
● Some DNS cache
● Thumbnails have to be downloaded
10. ● Various “pre-browse” techniques
○ http://www.slideshare.net/souders/prebrowsing-velocity-ny-2013
○ http://www.slideshare.net/MilanAryal/preconnect-prefetch-prerender
● Browsers are really good at rendering HTML soup
○ Because this is what most of the internet is
But that’s pretty much it...
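The pre-browse techniques linked above boil down to giving the browser resource hints before it needs the resources. A minimal sketch of injecting such hints at runtime; the hostnames and paths are placeholders, not YouTube's real ones:

```javascript
// Sketch: injecting "pre-browse" resource hints at runtime.
// The hostnames and paths below are invented for illustration.
function makeHint(rel, href) {
  return { rel, href }; // plain description of a <link> hint
}

const hints = [
  makeHint('dns-prefetch', '//example-cdn.test'), // resolve DNS early
  makeHint('preconnect', '//example-cdn.test'),   // open TCP/TLS early
  makeHint('prefetch', '/likely-next-page'),      // fetch a likely next navigation
];

// In a browser, turn each hint into a real <link> element in <head>.
if (typeof document !== 'undefined') {
  for (const { rel, href } of hints) {
    const link = document.createElement('link');
    link.rel = rel;
    link.href = href;
    document.head.appendChild(link);
  }
}
```

The same hints can of course be emitted as static `<link>` tags in the first bytes of HTML, which is cheaper than injecting them from script.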
11. Good news: we have plenty of room for
optimization (bad news: it’s up to us to do all of it)
● How fast can we send data to the browser?
● How much data should be downloaded?
● How much data should be processed?
● Do we have CPU/thread or network congestion?
12. ● HTTP/2 and SPDY (for HTTP) + QUIC (for video)
● gzip
● Image crushing
● JS compiled by the Google Closure Compiler and modularized
● Icon sprites (WebP where supported)
● Minified CSS
● CDN
https://developers.google.com/speed/pagespeed/
Basic things you would expect
13. ● Why should you care?
○ Typed JavaScript
■ Absolutely critical for large teams
○ A set of advanced optimizations
○ HUGE size savings and dead code removal
○ Kind of hard to set up, and writing annotations is
time-consuming
Closure compiler
14. ● Try the compiler online:
https://closure-compiler.appspot.com/
● Docs and examples:
https://developers.google.com/closure/compiler/docs/api-tutorial2
16. Typical request lifetime
[Diagram: GET / → INIT → RPC → data transfer; the page is rendered
from a template and sent along with static resources; browser
rendering overlaps the transfer]
● Browser performs a request
● Backend parses the request and fetches the necessary
data / calls RPCs
● Page is rendered via some templating language and
sent to the browser
● Browser starts to render while fetching the page and
downloads external resources
17. ● Browser has nothing to do while we are blocked by
RPCs on the backend
● But when we start sending the data it’s forced to
render the page while downloading and executing all
external resources
● Result: CPU/thread and bandwidth congestion
18. Chunking approach
● Render the first chunk, containing references to
external resources, and send it to the browser
● Browser starts to fetch them while the connection is
still open
● Send extra chunks of data as RPCs are completed
● Serialize data as JSON to be used later if UI is
blocking
[Diagram: GET / → INIT → T0 → data transfer in chunks; RPC1 and
RPC2 complete at T1 and T2, each followed by a render; static
resources are fetched in parallel]
19. Chunking approach
Works for client-side applications too
● Send links to application resources early
● Render the application chrome; do not wait for the
page onload event
● Append data as JSON to the end of the page
● Be careful of timing issues
○ You can’t predict whether the application initializes
first or the page finishes downloading
[Diagram: GET / → INIT → T0 → data transfer; static resources are
fetched while the RPC completes; JSON is appended at the end of the
page, followed by a render]
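The timing issue in the last bullet can be handled with a small buffering handoff installed in the first chunk of the page; the object and method names here are illustrative, not YouTube's actual code:

```javascript
// Sketch of a data handoff that works regardless of which side is
// ready first: inline scripts at the end of the page push JSON
// chunks, and the application registers a handler when initialized.
const handoff = {
  pending: [],
  handler: null,
  push(data) {                 // called by inline scripts appended to the page
    if (this.handler) this.handler(data);
    else this.pending.push(data); // app not ready yet: buffer
  },
  onReady(fn) {                // called by the application once initialized
    this.handler = fn;
    this.pending.forEach(fn);  // drain anything that arrived early
    this.pending = [];
  },
};
```

Either ordering works: data pushed before the app initializes is buffered and replayed; data pushed after is delivered directly.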
21. Player
● Player is large
○ It’s not just a video tag
○ Ads, format selection logic, UI, annotations, etc…
○ Sometimes we have to fall back to Flash
● Just executing all the necessary JS is a significant
CPU task
● We really don’t want to do this on every page load
(but there is nothing we can do for cold loads)
● Player can be blocked by the OS (video and audio init)
22. Player
● No silver bullet for cold load
● Have to carefully profile and optimize the code
● Send the player early and init early
● But the page may still be downloading/painting
● So try not to get in the way, e.g. asking the page for
the player container’s size may trigger a relayout,
blocking the browser for tens of milliseconds
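One way to avoid the relayout trap described above is to cache the measured size and refresh it off the hot path. This is a sketch of the general pattern, not YouTube's player code; `requestAnimationFrame` is stubbed so it also runs outside a browser:

```javascript
// Sketch: avoid repeated forced layouts by caching the container
// size and refreshing the cache at most once per frame.
const raf = typeof requestAnimationFrame === 'function'
  ? requestAnimationFrame
  : fn => setTimeout(fn, 16); // non-browser fallback, ~one frame

function makeSizeCache(measure) {
  let cached = null;
  let scheduled = false;
  return function getSize() {
    if (cached === null) cached = measure(); // one synchronous layout, worst case
    if (!scheduled) {
      scheduled = true;
      raf(() => { cached = measure(); scheduled = false; }); // refresh off the hot path
    }
    return cached;
  };
}

// In a browser, measure would read the real container, e.g.:
// const getPlayerSize = makeSizeCache(() =>
//   document.getElementById('player').getBoundingClientRect());
```

Repeated synchronous calls now return the cached rectangle instead of forcing a layout each time.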
23. Player
● tl;dr: Efficient video playback is HARD; we have a
whole team dedicated to making it fast
● Pick your battles and don’t try to support every
browser/platform unless you absolutely have to
● Focus on HTML5 if possible, Flash is slowly going
away
27. Thumbnails
● 10+ thumbnails above the fold on the Watch page
● Some of the pages are mostly thumbnails
● Important for users to decide what to watch; we
want thumbnails as fast as possible unless they are
in the critical path of the video
29. ● Only images above the fold are important on
initial page display, everything else can be loaded
later
● But some extra ones can still be preloaded to
prevent thumbnail popping
Delay/Lazy loading
30. ● Can’t start loading the images until JS executes,
and doing so forces a re-layout
● Not the best solution if most of the images are
above the fold
● We use a hybrid approach: do not delay-load
thumbnails that are always above the fold and
affect user behavior
Delay/Lazy loading
31. Visible to the user on page load
Invisible, but preloaded
Invisible to the user, not loaded
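The three-way split above could be sketched as a simple classification by position; the one-screenful preload threshold is an invented example, not YouTube's actual heuristic:

```javascript
// Sketch of the hybrid lazy-loading policy: images above the fold
// load eagerly, the next screenful is preloaded to avoid thumbnail
// "popping", and everything further down waits until scrolled near.
function classifyThumbnail(imageTop, viewportHeight, preloadScreens = 1) {
  if (imageTop < viewportHeight) return 'eager';   // visible on page load
  if (imageTop < viewportHeight * (1 + preloadScreens)) return 'preload';
  return 'lazy';                                   // load on scroll
}
```

In a browser, `imageTop` would come from a single batched layout read (e.g. `getBoundingClientRect().top` for all candidates in one pass) to avoid the relayout cost mentioned earlier.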
32. “WebP is a new image format that provides
lossless and lossy compression for images on
the web. WebP lossless images are 26% smaller
in size compared to PNGs. WebP lossy images
are 25-34% smaller in size...”
35. ● Chrome (+mobile)
● Opera (+mobile)
● Firefox and IE through WebPJS
● Android 4.0+
● iOS (through 3rd party libraries)
● WebKit based console applications
http://caniuse.com/webp
36. Use WebP for sprites, and for thumbnails if you can
afford the extra storage and want to serve them to
native clients.
Mileage may vary, but expect ~10% faster page load
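Serving WebP only where supported requires detection. A common sketch: decode a tiny WebP data URI on the client, or simply check the Accept header on the server (the data URI below is a widely circulated 1x1 test image; both function names are illustrative):

```javascript
// Client-side sketch: try to decode a 1x1 WebP image. If it loads
// with the expected width, the browser supports WebP.
function detectWebP(callback) {
  if (typeof Image === 'undefined') return callback(false); // non-browser environment
  const img = new Image();
  img.onload = () => callback(img.width === 1);
  img.onerror = () => callback(false);
  img.src = 'data:image/webp;base64,UklGRiIAAABXRUJQVlA4IBYAAAAwAQCdASoBAAEADsD+JaQAA3AAAAAA';
}

// Server-side sketch: the Accept header is simpler and avoids the
// client round trip entirely.
function acceptsWebP(acceptHeader) {
  return /image\/webp/.test(acceptHeader || '');
}
```

The header check is preferable when you control the image-serving backend, since the right format can be chosen on the very first response.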
39. ● Browsers already have priorities
● Sometimes we want more control
● Fetching video bytes should be more important
than fetching thumbnails
● But this requires a large codebase refactoring
○ Triggers potential race conditions and issues of
various kinds
● setInterval and setTimeout hijacking is a simple
way to introduce scheduling
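The setTimeout hijacking trick can be sketched as a wrapper that defers low-priority callbacks while critical work is in flight; the priority argument and block/unblock API are invented for illustration:

```javascript
// Sketch of setTimeout "hijacking": wrap the native function so
// low-priority work is held back while critical work (e.g. fetching
// video bytes) is in progress.
function makeScheduler(nativeSetTimeout) {
  let blocked = false;
  const deferred = [];
  return {
    setTimeout(fn, delay, priority = 'low') {
      if (blocked && priority === 'low') deferred.push([fn, delay]);
      else nativeSetTimeout(fn, delay);
    },
    block() { blocked = true; },  // e.g. while the player is initializing
    unblock() {                   // release queued low-priority work
      blocked = false;
      deferred.splice(0).forEach(([fn, delay]) => nativeSetTimeout(fn, delay));
    },
  };
}

// Installing it globally in a browser would look roughly like:
// const sched = makeScheduler(window.setTimeout.bind(window));
// window.setTimeout = sched.setTimeout.bind(sched);
```

This is why it is "a simple introduction of scheduling": existing code keeps calling `setTimeout` unmodified, but its work now flows through a queue you control.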
41. ● A navigation from one YouTube page to another in
the same tab/window
● We have full control, can use rel=”pre-whatever”
● Can we do better?
○ Only transfer JSON data?
■ Requires rewriting all backend templates
■ Browsers are actually really good at rendering
HTML soup
43. ● Lightweight alternative to rewriting all of the backend
templates
● Chunks of the page are sent from the backend as
chunked JSON
○ Some overhead on escaping (don’t mess up your
JSON)
○ Overall, JSON responses are smaller
● Player preinit on non-Watch pages and a persistent
player across page boundaries
● Fewer DOM changes on navigation
● Custom caching
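A minimal sketch of the idea behind this kind of navigation (SPF-style): the backend sends JSON describing which page fragments changed, and the client patches only those containers. The response shape here is an assumption for illustration, not the actual protocol:

```javascript
// Sketch: apply a JSON navigation response by patching only the
// fragments that changed, instead of re-rendering the whole page.
// The { title, fragments } shape is illustrative.
function applyPage(response, setTitle, setFragment) {
  if (response.title) setTitle(response.title);
  for (const [containerId, html] of Object.entries(response.fragments || {})) {
    // In a browser: document.getElementById(containerId).innerHTML = html;
    setFragment(containerId, html);
  }
}
```

Because the player's container is not listed in `fragments`, it survives the navigation untouched, which is what makes a persistent player across page boundaries possible.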
59. ● Can we do better? Cache?
○ Visited pages/history - necessary to get the
back/forward right
○ Tricky: the first-page-load problem
■ If the first page is rendered on the backend, how
do we go back?
○ How does this affect the metrics?
○ Can we do even better? Cache pages that will be
visited with high probability (the next video in
autoplay, for example)?
○ But what if the user does not go to the next page?
■ Even more metric craziness
■ More QPS
60. ● Every latency impacting change goes through A/B
testing
● Monitor both latency impact and user behavior
● Making things fast is good, but sometimes we
have to revisit experiments due to behavior
changes (especially for delay loading)
Monitoring
61. ● Regressions happen all the time
○ Some are expected, like YouTube logo doodles
○ Some are real issues
● A lot of things can change
○ Sizes of common static resources
○ Number of images on the page
○ Latency of server responses
62. ● We log timestamps of important events: server
time, AFT, QoE, etc.
● Browsers react differently, so we collect data per
browser
○ Version, background state, etc.
● http://www.webpagetest.org
● In addition, we use many in-house tools for
monitoring and notification
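The event-timestamp logging could look roughly like this; the event names and the idea of flushing via `navigator.sendBeacon` are illustrative, not the actual pipeline:

```javascript
// Sketch of latency event logging: record deltas from page start for
// named events, then serialize them for a logging endpoint.
// The injectable clock makes the logger testable.
function makeLatencyLog(now = Date.now) {
  const t0 = now();
  const events = {};
  return {
    tick(name) { events[name] = now() - t0; },  // ms since page start
    flush() { return JSON.stringify(events); }, // e.g. navigator.sendBeacon('/log', ...)
  };
}
```

Per-browser dimensions (version, background state) would be attached to the same payload so regressions can be sliced by browser, as the slide describes.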
65. ● Aim for average page load latency of 1 second or
better
● Different types of page loads may require different
approach
● Use WebP for sprites (usually) and thumbnails (if
possible)
● The Google Closure Compiler is awesome, but hard to
set up - use it if you can
● Minimize amount of work done on every load
(persistent player)
66. ● Understand how the browser works
● SPF is a reasonable alternative to replacing
everything with client side templating, saves QPS
and makes everything faster
● Chunking unblocks the browser, but requires
backend to support it
● Monitor everything, A/B testing is important,
profiling is critical