Rendering Web Content
@ 60FPS
Vangelis Kokkevis & Brian Salomon
vangelis@google.com bsalomon@google.com
Google Chrome

● Recently celebrated Chrome’s fifth anniversary!
● Hundreds of millions of active users
● Cross platform:
  ○ Windows (XP+), Mac, Linux
  ○ Chrome OS (x86 and ARM), Android, iOS (*)
● Open source: Chromium and Blink
● Rapid release cycle, four channels (canary, dev, beta, stable)
● Core Principles: Speed, Security, Stability, Simplicity

| RENDERING WEB CONTENT AT 60FPS | NOVEMBER 12, 2013 | CONFIDENTIAL
Chrome’s Multi-Process Architecture (pre-GPU)

[Diagram: User Input, Browser, Screen, and Shared Memory, plus three Renderer processes, each containing V8 (JavaScript), Blink (Web Renderer), and Skia (2D graphics)]
Why use the GPU?

● Enable new platform features:
  ○ 3D CSS, WebGL
● Speed & Responsiveness
  ○ Less jank: Smoother scrolling, 60fps CSS animations
  ○ Page “sticks to your finger”
  ○ Faster <canvas>, <video>
Accelerated Compositing

Re-rasterizing is expensive and should be avoided if possible
Caching rasterized contents into textures is an effective way to reduce raster costs.
Split the page contents into layers, use the GPU to composite them
What gets a layer?
● Content that rasters on the GPU: WebGL, 2D Canvas, Video, Flash
● Content that is expected to change infrequently:
  ○ CSS transform and opacity animations
  ○ Overflow scroll
  ○ Fixed position elements
● Content that overlaps other composited content
Compositing Layers

The Rendering Pipeline

[Diagram: User Input or Timer Event → Run Script → Re-Layout Document → Rasterize Invalidated Content → Upload New Content to Textures → Draw Textured Quads (Compositor). Annotations: "(if needed)" and "< 16ms".]
Tiling

Large content layers get tiled
● Layer split up into 256 x 256 or 512 x 512 pixel tiles
● Cache rasterized contents in manageable chunks to
  ○ Speed up scrolling
  ○ Conserve VRAM
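The tiling arithmetic can be sketched roughly as follows. This is an illustration, not Chrome's code: the function names, the 4-bytes-per-pixel (RGBA8888) assumption, and the example layer and viewport sizes are all hypothetical.

```python
import math

TILE = 256           # tile edge in pixels (the slide mentions 256 or 512)
BYTES_PER_PIXEL = 4  # assumption: RGBA8888 textures


def tile_grid(layer_w, layer_h, tile=TILE):
    """Number of tile columns and rows needed to cover a layer."""
    return math.ceil(layer_w / tile), math.ceil(layer_h / tile)


def visible_tiles(view_x, view_y, view_w, view_h, layer_w, layer_h, tile=TILE):
    """(col, row) indices of the tiles that intersect the viewport rectangle."""
    cols, rows = tile_grid(layer_w, layer_h, tile)
    c0 = max(0, view_x // tile)
    r0 = max(0, view_y // tile)
    c1 = min(cols - 1, (view_x + view_w - 1) // tile)
    r1 = min(rows - 1, (view_y + view_h - 1) // tile)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def tile_bytes(n_tiles, tile=TILE):
    """Texture memory used by n_tiles cached tiles."""
    return n_tiles * tile * tile * BYTES_PER_PIXEL


# Hypothetical page: a 1024 x 10000 pixel layer seen through a 1024 x 768 viewport.
cols, rows = tile_grid(1024, 10000)
needed = visible_tiles(0, 3000, 1024, 768, 1024, 10000)
```

With these numbers the layer needs 4 x 40 tiles, but only 16 of them intersect the viewport, so caching just the visible tiles costs about 4 MiB instead of about 40 MiB for the whole layer.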
Tiling Example

GPU Architecture

[Diagram: the Renderer process (Blink (WebGL), Skia (Canvas), Compositor) uses a GLES2 Client; three CMD ringbuffers and three Transfer buffers in Shared Memory connect it to the GLES2 Service in the GPU Process, which sits on ANGLE (GL ES -> D3D). Also shown: Browser and Screen.]
The Challenge

Ideally…

[Timeline: each 16ms frame fits JS, Layout, Rasterize, Upload, and Draw; two consecutive frames shown]
The Challenge

In practice...

[Timeline: JS, Layout, and Rasterize fill the first 16ms frame; Upload and Draw slip into the second]
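A small sketch of the frame-budget arithmetic behind these two timelines. The stage timings are made-up numbers for illustration, not measurements from the talk, and `vsync_intervals` is a hypothetical helper.

```python
import math

FRAME_MS = 1000.0 / 60.0  # about 16.67 ms per frame at 60 FPS


def vsync_intervals(stage_ms):
    """How many 60 Hz frame intervals one serial pipeline run occupies."""
    total = sum(stage_ms.values())
    return max(1, math.ceil(total / FRAME_MS))


# Hypothetical timings in milliseconds.
ideal = {"js": 3, "layout": 2, "rasterize": 6, "upload": 2, "draw": 2}
in_practice = {"js": 6, "layout": 4, "rasterize": 9, "upload": 3, "draw": 2}

ideal_frames = vsync_intervals(ideal)
practice_frames = vsync_intervals(in_practice)
```

A result of 2 means the run misses a vsync, so that frame is effectively shown at 30 FPS.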
Threaded Compositing

Solution: Move compositing to its own thread

[Timeline over two 16ms frames: the Main Thread runs JS, Layout, and Rasterize; the Compositor Thread runs Upload and Draw in every frame]
Good enough?

The devil’s in the details
● Need to aggressively pre-paint tiles to avoid running out of rasterized content in the compositor thread when scrolling.
● How many tiles to pre-paint?
  ○ Too many: VRAM pressure, possibly lots of unnecessary work
  ○ Too few: Checkerboarding
Deferred Rasterization
Less checkerboarding: Move raster out of main thread

[Timeline over two 16ms frames: the Main Thread runs JS, Layout, and Record Display List; the Compositor Thread runs Sort Tiles, Issue Raster Tasks, and Draw in each frame; the Raster Thread(s) run blocks labeled RT and UT]
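The record / sort / issue flow can be sketched with a thread pool. Everything here is an assumption for illustration: tiles are reduced to row indices, the "display list" is a list of strings, and `replay` merely stands in for real rasterization.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_priority(tile, viewport_top, viewport_bottom, tile_size=256):
    """Distance in pixels from a tile row to the viewport (0 if visible or touching)."""
    top = tile * tile_size
    bottom = top + tile_size
    if bottom <= viewport_top:
        return viewport_top - bottom
    if top >= viewport_bottom:
        return top - viewport_bottom
    return 0


def replay(display_list, tile):
    """Stand-in for rasterizing one tile by replaying the recorded commands."""
    return tile, [f"{cmd}@tile{tile}" for cmd in display_list]


def rasterize_deferred(display_list, tiles, viewport_top, viewport_bottom, workers=2):
    # Compositor side: sort tiles so the ones nearest the viewport raster first.
    ordered = sorted(tiles, key=lambda t: tile_priority(t, viewport_top, viewport_bottom))
    # Raster threads: replay the recorded display list off the main thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: replay(display_list, t), ordered))
    return ordered, dict(results)


display_list = ["drawRect", "drawText"]  # recorded once on the main thread
order, rastered = rasterize_deferred(display_list, range(8), 512, 1024)
```

The point of recording first is that the main thread only pays for capturing draw commands; the expensive pixel work happens later, in priority order, on other threads.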
Tooling

Lots of threads, lots of asynchronous tasks.
Good performance tools are a must for debugging and improving!
Tools we use when developing Chrome:
● Tracing (to monitor what each thread is doing in a timeline)
● FrameViewer (Inspect layers, tiles and rasterization)
● Telemetry (automated performance measurement framework)
Tracing

Frame-Viewer

Telemetry

Challenges

● Rasterization is a bottleneck
● The main thread is unpredictable (JS, layout, long records)
● There aren’t enough cores to go around (mobile)
● Bandwidth is at a premium
● GPU is a shared resource and can get oversubscribed
● Huge matrix of OS / GPU / CPU / Drivers
What does the future hold

More performance gains:
● Hardware accelerated rasterization
● “Zero-copy” texture uploads
● Hardware accelerated image decode
● Smarter and more efficient layers
Skia

● Portable 2D graphics/text engine
  ○ Device independent coordinates
  ○ 3x3 matrices w/ perspective
  ○ Arbitrary clipping
  ○ Transparency, anti-aliasing, dithering, filters
  ○ Extension architecture for…
● Multiple Backends
  ○ SW rasterizer
  ○ GPU (“Ganesh”)
  ○ PDF
  ○ Picture (display list)
● Open source
  ○ code.google.com/p/skia
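To make "3x3 matrices w/ perspective" concrete, here is a minimal sketch of mapping a 2D point through a row-major 3x3 matrix with a perspective divide. It is plain Python, not Skia's API; `map_point` and both example matrices are hypothetical.

```python
def map_point(m, x, y):
    """Apply a row-major 3x3 matrix to (x, y), then divide by w."""
    xp = m[0][0] * x + m[0][1] * y + m[0][2]
    yp = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return xp / w, yp / w


# Affine case: scale by 2, then translate by (10, 20). The bottom row is (0, 0, 1).
scale_translate = [[2, 0, 10],
                   [0, 2, 20],
                   [0, 0, 1]]

# Perspective case: a non-zero entry in the bottom row makes w depend on x.
perspective = [[1, 0, 0],
               [0, 1, 0],
               [0.001, 0, 1]]

affine_result = map_point(scale_translate, 5, 5)
persp_result = map_point(perspective, 1000, 500)
```

In the affine case w is always 1 and the divide is a no-op; in the perspective case the point at x = 1000 has w = 2, so it lands at half its original coordinates.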
Primitives
● Lines
● Rectangles
● Ellipses
● Rounded Corner Rectangles
● Text
● ...
● Paths
  ○ Made of contours
  ○ Contours are connected sets of Bezier curves
    ■ lines
    ■ quadratics (rational)
    ■ cubics
  ○ Can be filled or stroked
  ○ Fills are based on winding number

[Figure: fill examples labeled Non-Zero and Even/Odd]
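How the winding number drives the two fill rules can be shown with a short sketch. It uses the standard signed-crossing test on polygonal contours (curves would first be flattened to line segments); the function name and the nested-square example are assumptions made for illustration.

```python
def winding_number(point, contours):
    """Sum of signed crossings of a ray cast in +x from point, over all contours."""
    px, py = point
    wn = 0
    for contour in contours:
        n = len(contour)
        for i in range(n):
            (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
            # side > 0 means the point is to the left of the directed edge.
            side = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
            if y0 <= py < y1 and side > 0:    # upward crossing
                wn += 1
            elif y1 <= py < y0 and side < 0:  # downward crossing
                wn -= 1
    return wn


def filled(point, contours, rule):
    wn = winding_number(point, contours)
    return wn != 0 if rule == "non-zero" else (wn & 1) == 1


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]  # counter-clockwise
inner = [(3, 3), (7, 3), (7, 7), (3, 7)]      # also counter-clockwise
center = (5, 5)
```

With both squares wound the same way, the center has winding number 2: Non-Zero fills it, Even/Odd leaves a hole. Reversing the inner contour brings the winding number to 0, and both rules leave a hole.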
Pipeline Stages
SkPaint: Life of a Path

Programmable (via Subclassing)
● SkPathEffect
  ○ Path -> Path
  ○ e.g. Dashing
● SkRasterizer
  ○ Path -> Coverage Mask
  ○ e.g. ?? [considering deprecating]
● SkMaskFilter
  ○ Coverage Mask -> Coverage Mask
  ○ e.g. Blur
● SkShader
  ○ Source-Space Coordinate -> Color
  ○ e.g. Gradients, Bitmap Fill
● SkColorFilter
  ○ Color -> Color
  ○ e.g. Color Matrix, Blend with constant Color
● SkImageFilter
  ○ Src Image -> New Src Image
  ○ e.g. Color Blur, Morphology Filter
  ○ Subsume SkColorFilter?
● SkXfermode
  ○ AKA Blend
  ○ Src Color + Dst Color -> New Dst Color
  ○ e.g. Porter-Duff modes, Darken, …

Fixed Function
● Stroking (width, caps, joins)
● Text settings (typeface, pt size, …)
● AA enable/disable
● Image filtering quality level
● Alpha
● Default color if no SkShader
GPU Shaders

GPU Backend has an “effect” system for building shaders
● Effects arranged in linear order.
● Write a snippet of GLSL fragment code.
● Effect passes a vec4 “color” to the next effect.
  ○ Input to first effect is either constant or per-vertex value.
● Can insert uniforms, functions, textures.
● Internal effects can
  ○ Insert vertex shader code.
  ○ Require additional vertex attributes.

[Diagram: two effect chains. Initial Color → Color Effect 1 → Color Effect 2 → final color; Initial Coverage → Cov. Effect 1 → Cov. Effect 2 → Cov. Effect 3 → final coverage. Effects take inputs such as a texture and a matrix uniform.]

Important to keep color and fractional coverage separate.
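One way to see why the two chains stay separate is the sketch below. It is not Skia's effect API: `run_chain`, `modulate`, and `apply_with_coverage` are hypothetical helpers, colors are premultiplied RGBA tuples, and the blend is the Porter-Duff "src" mode, where the difference is easiest to see.

```python
def run_chain(effects, value):
    """Feed a vec4-like tuple through a linear chain of effects."""
    for effect in effects:
        value = effect(value)
    return value


def modulate(factor):
    """Hypothetical effect: scale every channel by a constant."""
    return lambda c: tuple(ch * factor for ch in c)


def blend_src(src, dst):
    """Porter-Duff 'src' mode: the source replaces the destination."""
    return src


def apply_with_coverage(blend, src, dst, coverage):
    """Blend first, then interpolate toward the result by fractional coverage."""
    blended = blend(src, dst)
    return tuple(coverage * b + (1 - coverage) * d for b, d in zip(blended, dst))


src = run_chain([modulate(1.0)], (1.0, 0.0, 0.0, 1.0))           # opaque red
coverage = run_chain([modulate(0.5)], (1.0, 1.0, 1.0, 1.0))[3]   # half-covered edge pixel
dst = (0.0, 0.0, 1.0, 1.0)                                       # opaque blue

separate = apply_with_coverage(blend_src, src, dst, coverage)
folded = blend_src(tuple(ch * coverage for ch in src), dst)
```

Keeping coverage separate yields an even mix of red and blue that is still opaque. Folding coverage into the color before blending yields half-transparent red and discards the destination, which is wrong for an edge pixel that is only half covered.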
Pipeline Stages and GPU Backend

● SkPathEffect
  ○ Perform on CPU
  ○ Call filterPath(), draw the resulting path
  ○ Special hooks for some dashing cases
  ○ Future: general mechanism to avoid creating intermediate path object on CPU
● SkRasterizer: ignored
  ○ No known clients use custom rasterizers.
  ○ Act as though no rasterizer installed
● SkMaskFilter:
  ○ Filter object is given a gpu “context object” and primitive’s mask
    ■ Can create intermediate textures
    ■ Performs draws using Effects
    ■ Returns new mask as a texture.
  ○ Special case for filters that can be performed inline with the draw to dst
  ○ In practice the only significant SkMaskFilter is blur
  ○ Future: Specialize blur code path for simple primitive types (e.g. rects)
Pipeline Stages and GPU Backend Continued

● SkShader
  ○ Produces an Effect object that is inserted into the draw
  ○ Implementations for bitmap shaders, various gradient types, noise shader.
● SkColorFilter:
  ○ Produces an effect that receives SkShader effect’s output.
  ○ Implementations for color matrix, color table, blend-against-const-color
● SkImageFilter:
  ○ Works the same way as SkMaskFilter but with color input/output
  ○ Implementations for
    ■ Color blur
    ■ Lighting effect
    ■ Any (color filter, shader, or xfermode) as an image filter
  ○ Graph implementation for chaining SkImageFilters together (CPU or GPU)
    ■ SVG image filter DAG
    ■ Future: Optimization pass to minimize intermediate draws.
  ○ Shortcuts for Image filters that can be done inline or are really just a matrix.
Pipeline Stages and GPU Backend Continued

● SkXfermode: Either as GL coefficients or Effect
  ○ The Porter-Duff blend modes (src-over, etc) are all expressible as GL blend coeffs
    ■ Big caveat here
  ○ Many others are not:
    ■ Luminance
    ■ Darken
    ■ Arithmetic
    ■ …
  ○ Xfermode can install an Effect
    ■ Access to the destination?
      ● Effect framework provides abstract interface for accessing the dst color
      ● GL_EXT_shader_framebuffer_fetch if available
      ● Future: GL_NV_texture_barrier
      ● Otherwise a dst-copy-to-texture is triggered
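As an illustration of "expressible as GL blend coeffs", the sketch below lists the twelve Porter-Duff modes as a (source factor, destination factor) pair for premultiplied colors, which is the shape of a `glBlendFunc(srcFactor, dstFactor)` call. The table layout and the `blend` helper are assumptions for this example, not Skia code.

```python
# (source factor, destination factor) for premultiplied colors.
# Each factor is a function of source alpha `sa` and destination alpha `da`.
PORTER_DUFF = {
    "clear":    (lambda sa, da: 0.0,      lambda sa, da: 0.0),
    "src":      (lambda sa, da: 1.0,      lambda sa, da: 0.0),
    "dst":      (lambda sa, da: 0.0,      lambda sa, da: 1.0),
    "src-over": (lambda sa, da: 1.0,      lambda sa, da: 1.0 - sa),
    "dst-over": (lambda sa, da: 1.0 - da, lambda sa, da: 1.0),
    "src-in":   (lambda sa, da: da,       lambda sa, da: 0.0),
    "dst-in":   (lambda sa, da: 0.0,      lambda sa, da: sa),
    "src-out":  (lambda sa, da: 1.0 - da, lambda sa, da: 0.0),
    "dst-out":  (lambda sa, da: 0.0,      lambda sa, da: 1.0 - sa),
    "src-atop": (lambda sa, da: da,       lambda sa, da: 1.0 - sa),
    "dst-atop": (lambda sa, da: 1.0 - da, lambda sa, da: sa),
    "xor":      (lambda sa, da: 1.0 - da, lambda sa, da: 1.0 - sa),
}


def blend(mode, src, dst):
    """result = src * src_factor + dst * dst_factor, per channel."""
    fs, fd = PORTER_DUFF[mode]
    sa, da = src[3], dst[3]
    return tuple(s * fs(sa, da) + d * fd(sa, da) for s, d in zip(src, dst))


half_red = (0.5, 0.0, 0.0, 0.5)  # 50% red, premultiplied
blue = (0.0, 0.0, 1.0, 1.0)      # opaque blue
over = blend("src-over", half_red, blue)
dst_in = blend("dst-in", half_red, blue)
```

Every mode in the table is a weighted sum of source and destination, which is why fixed-function blending can handle it. Modes such as Darken need a per-channel comparison, so they do not fit this form and need shader access to the destination color.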
Primitives: Text

● Skia sits on top of system font engine:
  ○ FreeType
  ○ CoreText
  ○ GDI
  ○ DirectWrite
● Large ALPHA8 texture used as glyph mask atlas (1024 x 2048)
  ○ Will use a second RGB(A) texture if there are “LCD” glyphs
  ○ Texture divided into 256x256 texel “plots”
● Strike: A unique combination of
  ○ Typeface
  ○ Size
  ○ Style (italic, bold, …)
● Strikes claim (multiple) plots
● Plots purged wholesale using LRU

[Diagram: atlas plots labeled with their owning strike (Strike 0 to Strike 3), with one plot marked (free)]
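The plot bookkeeping described above can be sketched as a small LRU structure. The class and method names are hypothetical, and the glyph packing inside a plot is left out; only the claim and purge policy is shown.

```python
from collections import OrderedDict

ATLAS_W, ATLAS_H, PLOT = 1024, 2048, 256
NUM_PLOTS = (ATLAS_W // PLOT) * (ATLAS_H // PLOT)  # 4 x 8 = 32 plots


class PlotAtlas:
    """Tracks which strike owns each plot; evicts whole plots in LRU order."""

    def __init__(self, num_plots=NUM_PLOTS):
        self.free = list(range(num_plots))
        self.owner = OrderedDict()  # plot index -> strike key, least recent first

    def claim(self, strike):
        """Give `strike` a plot, purging the least recently used plot if none is free."""
        purged = None
        if self.free:
            plot = self.free.pop(0)
        else:
            plot, purged = self.owner.popitem(last=False)
        self.owner[plot] = strike
        return plot, purged

    def touch(self, plot):
        """Mark a plot as recently used, for example when its glyphs are drawn."""
        self.owner.move_to_end(plot)


atlas = PlotAtlas(num_plots=2)  # tiny atlas so eviction is easy to see
a = ("Arial", 12, "regular")    # a strike: typeface, size, style
b = ("Arial", 24, "bold")
c = ("Times", 12, "italic")
first = atlas.claim(a)
second = atlas.claim(b)
atlas.touch(first[0])
third = atlas.claim(c)
```

Because strike `a` was touched most recently, the third claim purges the plot owned by `b` and hands it to `c`.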
Primitives: Text Continued

● Glyphs packed in plots using the Skyline algorithm [Jukka Jylänki http://clb.demon.fi/]
● Attempt to perform all uploads for a frame before draws
  ○ Queue GL draws
  ○ Uploads go through immediately
● Avoid flushing draws
  ○ Only flush draws to GL when a plot is purged that is referenced in currently queued draws
  ○ Matters a lot more on mobile, especially tiled architectures
● Works pretty well for scrolling
● Struggles with pinch-zoom
● Under development: distance field atlas
  ○ Same texture partitioning and replacement scheme
  ○ “Masks” are (mostly) resolution independent
Primitives: Rects

Not anti-aliased: Simple, draw a quad!
Two approaches for anti-aliasing (non-MSAA):
● Geometric
  ○ Create inner and outer offset geometry
  ○ Offset is 0.5 pixels
  ○ Use “coverage” vertex attribute
    ■ 0 at outer offset rect
    ■ 1 at inner offset rect
  ○ Handle degenerate cases
● Shader
  ○ Attributes:
    ■ W = rect.width() + 0.5, H = rect.height() + 0.5
    ■ Y = normalized y-axis of rect
    ■ C = center of rect
  ○ coverage in Y at pixel p is clamp(H - ((p - C) dot Y), 0, 1)

Geometry shaders could reduce VBO size and save CPU cycles

[Figures: offset rects with coverage c=1 at the inner rect and c=0 at the outer rect; a rect annotated with W, H, C, Y, and a pixel p]
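The shader approach can be sketched on the CPU as below. Two things here are assumptions rather than statements from the slide: W and H are treated as half-extents padded by 0.5 px, and the projection onto each axis is taken as an absolute value, so that coverage ramps from 1 to 0 across a one-pixel band centered on each edge. Multiplying the per-axis coverages is likewise an illustrative choice.

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def rect_coverage(p, center, y_axis, width, height):
    """Approximate coverage of pixel center p by a (possibly rotated) rect."""
    # Assumption: W and H are half-extents padded by 0.5 px.
    W = width / 2 + 0.5
    H = height / 2 + 0.5
    yx, yy = y_axis
    x_axis = (yy, -yx)  # perpendicular to the rect's normalized Y axis
    dx, dy = p[0] - center[0], p[1] - center[1]
    along_y = abs(dx * yx + dy * yy)
    along_x = abs(dx * x_axis[0] + dy * x_axis[1])
    cov_y = clamp(H - along_y, 0.0, 1.0)
    cov_x = clamp(W - along_x, 0.0, 1.0)
    return cov_x * cov_y


# A 10 x 4 rect centered at the origin, axis-aligned (Y = (0, 1)).
inside = rect_coverage((0, 0), (0, 0), (0, 1), 10, 4)
on_edge = rect_coverage((0, 2), (0, 0), (0, 1), 10, 4)
outside = rect_coverage((0, 3), (0, 0), (0, 1), 10, 4)
corner = rect_coverage((5, 2), (0, 0), (0, 1), 10, 4)
```

A pixel center exactly on an edge gets coverage 0.5, one on a corner gets 0.25, and anything more than half a pixel outside gets 0.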
Primitives: Misc

Adaptations for stroked rectangles
Similar shader techniques for:
● Ellipses
● Circles
● Rounded-Rectangles
Primitives: Paths

● Why are paths hard?
  ○ In the most general case, have to handle both the fill rule and anti-aliasing
  ○ After a blend, the coverage/alpha distinction is lost. Must only perform one blend in general.

[Figure callouts: “Can’t double blend in overlap!”, “Can’t anti-alias interior edge!”, “Multiple edges from different contours relevant to pixels in concavities!”]
Primitives: Paths Continued

● MSAA solves the AA problem
● Use the stencil to solve the fill rule problem
● Tessellate contours into line segments
● Pass 1:
  ○ Draw the tessellated contours as triangle fan
  ○ Disable color writes
  ○ Stencil op: +1 for front face, -1 for back face
● Pass 2:
  ○ Draw bounding geometry
  ○ Enable color writes
  ○ Stencil func
    ■ Winding: Pass if stencil is non-zero
    ■ Even/Odd: Pass if LSB is 1
● Avoid tessellating quadratic and cubic beziers:
  ○ Discard in FS if outside the curve [Kokojima et al.]
  ○ Need per sample discard or sample coverage mask
  ○ No-go on ES3 :(

[Figure: Pass 1 shows fan triangles contributing +1 and -1 to the stencil; Pass 2 shows the covered result]
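The two passes can be simulated on the CPU to show why the fan works even for concave paths. This is a sketch, not GPU code: the stencil buffer is a list of lists, one sample is taken per pixel, and the sample offset is an arbitrary choice (an assumption) that keeps samples off the fan edges; real hardware resolves such ties with its rasterization rules.

```python
SAMPLE = (0.37, 0.61)  # arbitrary in-pixel sample offset, chosen to avoid fan edges


def signed_area2(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def contains(a, b, c, p):
    """True if p lies strictly inside triangle abc, for either orientation."""
    d0, d1, d2 = signed_area2(a, b, p), signed_area2(b, c, p), signed_area2(c, a, p)
    return (d0 > 0 and d1 > 0 and d2 > 0) or (d0 < 0 and d1 < 0 and d2 < 0)


def stencil_pass(contours, width, height):
    """Pass 1: fan each contour; +1 for front-facing triangles, -1 for back-facing."""
    stencil = [[0] * width for _ in range(height)]
    for contour in contours:
        anchor = contour[0]
        for i in range(1, len(contour) - 1):
            b, c = contour[i], contour[i + 1]
            area = signed_area2(anchor, b, c)
            if area == 0:
                continue  # degenerate triangle covers nothing
            delta = 1 if area > 0 else -1
            for y in range(height):
                for x in range(width):
                    if contains(anchor, b, c, (x + SAMPLE[0], y + SAMPLE[1])):
                        stencil[y][x] += delta
    return stencil


def cover_pass(stencil, rule):
    """Pass 2: test each pixel of the bounding geometry against the stencil."""
    if rule == "winding":
        return [[v != 0 for v in row] for row in stencil]
    return [[(v & 1) == 1 for v in row] for row in stencil]  # even/odd


# A concave "U" shape on a 6 x 6 pixel grid; the notch spans x in [2, 4), y in [2, 6).
u_shape = [(0, 0), (6, 0), (6, 6), (4, 6), (4, 2), (2, 2), (2, 6), (0, 6)]
stencil = stencil_pass([u_shape], 6, 6)
mask = cover_pass(stencil, "winding")
```

Fan triangles that spill into the notch are cancelled by a back-facing triangle that covers the same pixels, so the stencil there ends at 0 and pass 2 leaves those pixels untouched.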
Primitives: Paths Continued

For AA paths without MSAA:
● Detect if path is one of the other primitive types (e.g. rounded rectangle)
● If very thin stroke draw as AA lines (and ignore double blend problem)
● If path is convex fill rule problem goes away
  ○ Fan the on-contour control points
  ○ Draw bounding hulls of curves
  ○ Compute coverage using implicit eq. approx distance to curve [LoopBlinn]
● Otherwise, SW rasterize mask and upload
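For the quadratic case, the implicit-equation idea can be sketched as follows. The curve's control points are assigned the canonical coordinates (u, v) = (0, 0), (0.5, 0), (1, 1), in which the curve is u*u - v = 0; dividing that value by the length of its gradient approximates the distance to the curve in pixels. The function name, the choice of f < 0 as the filled side, and the one-pixel ramp are assumptions for this illustration.

```python
import math


def quad_coverage(p, p0, p1, p2):
    """Approximate AA coverage at pixel p near the quadratic Bezier p0, p1, p2."""
    e1 = (p1[0] - p0[0], p1[1] - p0[1])
    e2 = (p2[0] - p0[0], p2[1] - p0[1])
    det = e1[0] * e2[1] - e1[1] * e2[0]
    dx, dy = p[0] - p0[0], p[1] - p0[1]
    # Solve p - p0 = a*e1 + b*e2, then interpolate the canonical (u, v).
    a = (dx * e2[1] - dy * e2[0]) / det
    b = (e1[0] * dy - e1[1] * dx) / det
    u, v = 0.5 * a + b, b
    # The map from pixels to (u, v) is affine, so its gradients are constants.
    grad_a = (e2[1] / det, -e2[0] / det)
    grad_b = (-e1[1] / det, e1[0] / det)
    grad_u = (0.5 * grad_a[0] + grad_b[0], 0.5 * grad_a[1] + grad_b[1])
    f = u * u - v
    gx = 2 * u * grad_u[0] - grad_b[0]
    gy = 2 * u * grad_u[1] - grad_b[1]
    dist = f / math.hypot(gx, gy)  # first-order signed distance in pixels
    # Assumption: f < 0 is the filled side; ramp coverage over one pixel.
    return max(0.0, min(1.0, 0.5 - dist))


p0, p1, p2 = (0.0, 0.0), (10.0, 0.0), (10.0, 10.0)
on_curve = quad_coverage((7.5, 2.5), p0, p1, p2)       # curve point at t = 0.5
at_control = quad_coverage(p1, p0, p1, p2)             # off-curve control point
chord_mid = quad_coverage((5.0, 5.0), p0, p1, p2)      # midpoint of the p0-p2 chord
```

A pixel centered on the curve gets coverage 0.5, the off-curve control point gets 0, and the chord midpoint (inside the region between chord and curve) gets 1.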
Questions

?

Handwritten Text Recognition for manuscripts and early printed texts
 

WT-4072, Rendering Web Content at 60fps, by Vangelis Kokkevis, Antoine Labour and Brian Salomon

  • 1. Rendering Web Content @ 60FPS
      Vangelis Kokkevis & Brian Salomon
      vangelis@google.com, bsalomon@google.com
  • 2. Google Chrome
      ● Recently celebrated Chrome’s fifth anniversary!
      ● Hundreds of millions of active users
      ● Cross platform:
          ○ Windows (XP+), Mac, Linux
          ○ Chrome OS (x86 and ARM), Android, iOS (*)
      ● Open source: Chromium and Blink
      ● Rapid release cycle, four channels (canary, dev, beta, stable)
      ● Core Principles: Speed, Security, Stability, Simplicity
  • 3. Chrome’s Multi-Process Architecture (pre-GPU)
      [Diagram: User Input, Browser, Screen, and Shared Memory; three Renderer processes, each containing V8 (JavaScript), Blink (Web Renderer), and Skia (2D graphics)]
  • 4. Why use the GPU?
      ● Enable new platform features:
          ○ 3D CSS, WebGL
      ● Speed & Responsiveness
          ○ Less jank: Smoother scrolling, 60fps CSS animations
          ○ Page “sticks to your finger”
          ○ Faster <canvas>, <video>
  • 5. Accelerated Compositing
      Re-rasterizing is expensive and should be avoided if possible.
      Caching rasterized contents into textures is an effective way to reduce raster costs.
      Split the page contents into layers, use the GPU to composite them.
      What gets a layer?
      ● Content that rasters on the GPU: WebGL, 2D Canvas, Video, Flash
      ● Content that is expected to change infrequently:
          ○ CSS transform and opacity animations
          ○ Overflow scroll
          ○ Fixed position elements
      ● Content that overlaps other composited content
  • 6. Compositing Layers
  • 7. The Rendering Pipeline
      [Diagram: User Input or Timer Event → Run Script → Re-Layout Document → Rasterize Invalidated Content → Upload New Content to Textures → Draw Textured Quads, all in < 16ms; an "(if needed)" annotation; legend marks the Compositor stages]
  • 8. Tiling
      Large content layers get tiled
      ● Layer split up into 256 x 256 or 512 x 512 pixel tiles
      ● Cache rasterized contents in manageable chunks to
          ○ Speed up scrolling
          ○ Conserve VRAM
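The tiling arithmetic on this slide can be sketched in a few lines. This is only an illustration, not Chrome's code: the function names `tile_grid` and `visible_tiles` are hypothetical, the viewport is assumed to lie inside the layer, and the 4 bytes per pixel used in the usage note is an assumed RGBA format.

```python
import math

def tile_grid(layer_w, layer_h, tile=256):
    """Return (columns, rows) of tiles needed to cover a layer."""
    return math.ceil(layer_w / tile), math.ceil(layer_h / tile)

def visible_tiles(viewport, tile=256):
    """Return (col, row) indices of the tiles a viewport rectangle touches.

    viewport is (x, y, width, height) in integer layer pixels.
    """
    x, y, w, h = viewport
    first_col, last_col = x // tile, (x + w - 1) // tile
    first_row, last_row = y // tile, (y + h - 1) // tile
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

For a 1000 x 5000 pixel layer this gives a 4 x 20 grid of 256-pixel tiles, while a 1000 x 600 viewport scrolled to y = 300 touches only 12 of them (about 3 MiB at the assumed 4 bytes per pixel), which is the sense in which tiling conserves VRAM.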
  • 9. Tiling Example
  • 10. GPU Architecture
      [Diagram: Browser, Screen, Shared Memory; Renderer process with Blink (WebGL), Skia (Canvas), and Compositor, a GLES2 Client, three CMD ringbuffers and three Transfer buffers; GPU Process with GLES2 Service and ANGLE (GL ES -> D3D)]
  • 11. The Challenge
      Ideally…
      [Timeline: two 16ms frames, each containing JS, Layout, Rasterize, Upload, Draw]
  • 12. The Challenge
      In practice...
      [Timeline: JS, Layout, and Rasterize fill the first 16ms frame; Upload and Draw fall into the second 16ms frame]
  • 13. Threaded Compositing
      Solution: Move compositing to its own thread
      [Timeline across two 16ms frames. Main Thread: JS, Layout, Rasterize. Compositor Thread: Upload, Draw in each frame]
  • 14. Good enough?
      The devil’s in the details
      ● Need to aggressively pre-paint tiles to avoid running out of rasterized content in the compositor thread when scrolling.
      ● How many tiles to pre-paint?
          ○ Too many: VRAM pressure, possibly lots of unnecessary work
          ○ Too few: Checkerboarding
  • 15. Deferred Rasterization
      Less checkerboarding: Move raster out of main thread
      [Timeline across two 16ms frames. Main Thread: JS, Layout, Record Display List. Compositor Thread: Sort Tiles, Issue Raster Tasks, Draw. Raster Thread(s): RT and UT task boxes fill the remaining rows]
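The "Sort Tiles" and "Issue Raster Tasks" steps, together with the pre-paint trade-off from the previous slide, might look roughly like the sketch below. The slides do not say how tiles are prioritized, so ordering by distance to the viewport is an assumption, as are the names `distance_to_viewport` and `pick_tiles_to_raster`, the byte budget, and the 4 bytes per pixel.

```python
def distance_to_viewport(tile, viewport, tile_size=256):
    """Pixel gap between a tile and the viewport (0 if they touch or overlap)."""
    col, row = tile
    vx, vy, vw, vh = viewport
    tx0, ty0 = col * tile_size, row * tile_size
    tx1, ty1 = tx0 + tile_size, ty0 + tile_size
    dx = max(vx - tx1, tx0 - (vx + vw), 0)
    dy = max(vy - ty1, ty0 - (vy + vh), 0)
    return max(dx, dy)

def pick_tiles_to_raster(tiles, viewport, budget_bytes,
                         tile_size=256, bytes_per_pixel=4):
    """Sort tiles nearest-first and keep as many as the memory budget allows."""
    tile_bytes = tile_size * tile_size * bytes_per_pixel
    max_tiles = budget_bytes // tile_bytes
    ordered = sorted(
        tiles,
        key=lambda t: (distance_to_viewport(t, viewport, tile_size), t))
    return ordered[:max_tiles]
```

With a one-column layer of six tiles, a viewport spanning y = 600..800, and a 1 MiB budget, the two visible tiles come first and the two nearest neighbors are pre-painted; the two farthest tiles are skipped. A larger budget pre-paints more (less checkerboarding, more VRAM), a smaller one less.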
  • 16. Tooling
      Lots of threads, lots of asynchronous tasks. Good performance tools are a must for debugging and improving!
      Tools we use when developing Chrome:
      ● Tracing (to monitor what each thread is doing in a timeline)
      ● FrameViewer (Inspect layers, tiles and rasterization)
      ● Telemetry (automated performance measurement framework)
  • 17. Tracing
  • 18. Frame-Viewer
  • 19. Telemetry
  • 20. Challenges
      ● Rasterization is a bottleneck
      ● The main thread is unpredictable (JS, layout, long records)
      ● There aren’t enough cores to go around (mobile)
      ● Bandwidth is at a premium
      ● GPU is a shared resource and can get oversubscribed
      ● Huge matrix of OS / GPU / CPU / Drivers
  • 21. What does the future hold
      More performance gains:
      ● Hardware accelerated rasterization
      ● “Zero-copy” texture uploads
      ● Hardware accelerated image decode
      ● Smarter and more efficient layers
  • 22. Skia
      ● Portable 2D graphics/text engine
          ○ Device independent coordinates
          ○ 3x3 matrices w/ perspective
          ○ Arbitrary clipping
          ○ Transparency, anti-aliasing, dithering, filters
          ○ Extension architecture for…
      ● Multiple Backends
          ○ SW rasterizer
          ○ GPU (“Ganesh”)
          ○ PDF
          ○ Picture (display list)
      ● Open source
          ○ code.google.com/p/skia
  • 23. Primitives
      ● Lines
      ● Rectangles
      ● Ellipses
      ● Rounded Corner Rectangles
      ● Text
      ● ...
      ● Paths
          ○ Made of contours
          ○ Contours are connected set of Bezier curves
              ■ lines
              ■ quadratics (rational)
              ■ cubics
          ○ Can be filled or stroked
          ○ Fills are based on winding number
      [Figure: the same path filled with the Non-Zero and Even/Odd rules]
  • 24. Pipeline Stages
      SkPaint: Life of a Path
      Programmable (via Subclassing)
      ● SkPathEffect
          ○ Path -> Path
          ○ e.g. Dashing
      ● SkRasterizer
          ○ Path -> Coverage Mask
          ○ e.g. ?? [considering deprecating]
      ● SkMaskFilter
          ○ Coverage Mask -> Coverage Mask
          ○ e.g. Blur
      ● SkShader
          ○ Source-Space Coordinate -> Color
          ○ e.g. Gradients, Bitmap Fill
      ● SkColorFilter
          ○ Color -> Color
          ○ e.g. Color Matrix, Blend with constant Color
      ● SkImageFilter
          ○ Src Image -> New Src Image
          ○ e.g. Color Blur, Morphology Filter
          ○ Subsume SkColorFilter?
      ● SkXfermode
          ○ AKA Blend
          ○ Src Color + Dst Color -> New Dst Color
          ○ e.g. Porter-Duff modes, Darken, …
      Fixed Function
      ● Stroking (width, caps, joins)
      ● Text settings (typeface, pt size, …)
      ● AA enable/disable
      ● Image filtering quality level
      ● Alpha
      ● Default color if no SkShader
  • 25. GPU Shaders
      GPU Backend has an “effect” system for building shaders
      ● Effects arranged in linear order.
      ● Write a snippet of GLSL fragment code.
      ● Effect passes a vec4 “color” to the next effect.
          ○ Input to first effect is either constant or per-vertex value.
      ● Can insert uniforms, functions, textures.
      ● Internal effects can
          ○ Insert vertex shader code.
          ○ Require additional vertex attributes.
      [Diagram: Initial Color → Color Effect 1 → Color Effect 2 → final color; Initial Coverage → Cov. Effect 1 → Cov. Effect 2 → Cov. Effect 3 → final coverage; effects take texture, matrix, and uniform inputs]
      Important to keep color and fractional coverage separate.
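A CPU-side sketch may help show what "effects arranged in linear order" means and one plausible reason for keeping coverage out of the color value. None of this is Skia's actual API: `run_chain`, `scale_color`, `scale_coverage`, and both composite functions are hypothetical, colors are assumed to be premultiplied RGBA tuples, and the "src" (replace) blend is chosen only because it makes the difference easy to see.

```python
def run_chain(effects, value):
    """Pass a value through each effect in order, like the linear effect chain."""
    for effect in effects:
        value = effect(value)
    return value

def scale_color(factor):
    """Color effect: scale every channel of a premultiplied RGBA tuple."""
    return lambda rgba: tuple(c * factor for c in rgba)

def scale_coverage(factor):
    """Coverage effect: scale a fractional coverage value."""
    return lambda cov: cov * factor

def composite_src_mode(color, coverage, dst):
    """'src' blend with coverage kept separate: interpolate toward dst."""
    return tuple(coverage * c + (1.0 - coverage) * d
                 for c, d in zip(color, dst))

def composite_folded(color, coverage, dst):
    """Same blend with coverage folded into the color first: dst is lost."""
    return tuple(coverage * c for c in color)
```

With half coverage over an opaque blue destination, `composite_src_mode` keeps half of the blue at the edge pixel while `composite_folded` discards it entirely, which would show up as a dark fringe along anti-aliased edges.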
  • 26. Pipeline Stages and GPU Backend
      ● SkPathEffect
          ○ Perform on CPU
          ○ Call filterPath(), draw the resulting path
          ○ Special hooks for some dashing cases
          ○ Future: general mechanism to avoid creating intermediate path object on CPU
      ● SkRasterizer: ignored
          ○ No known clients use custom rasterizers.
          ○ Act as though no rasterizer installed
      ● SkMaskFilter:
          ○ Filter object is given a gpu “context object” and primitive’s mask
              ■ Can create intermediate textures
              ■ Performs draws using Effects
              ■ Returns new mask as a texture.
          ○ Special case for filters that can be performed inline with the draw to dst
          ○ In practice the only significant SkMaskFilter is blur
          ○ Future: Specialize blur code path for simple primitive types (e.g. rects)
  • 27. Pipeline Stages and GPU Backend Continued
      ● SkShader
          ○ Produces an Effect object that is inserted into the draw
          ○ Implementations for bitmap shaders, various gradient types, noise shader.
      ● SkColorFilter:
          ○ Produces an effect that receives SkShader effect’s output.
          ○ Implementations for color matrix, color table, blend-against-const-color
      ● SkImageFilter:
          ○ Works the same way as SkMaskFilter but with color input/output
          ○ Implementations for
              ■ Color blur
              ■ Lighting effect
              ■ Any (color filter, shader, or xfermode) as an image filter
          ○ Graph implementation for chaining SkImageFilters together (CPU or GPU)
              ■ SVG image filter DAG
              ■ Future: Optimization pass to minimize intermediate draws.
          ○ Shortcuts for Image filters that can be done inline or are really just a matrix.
  • 28. Pipeline Stages and GPU Backend Continued
      ● SkXfermode: Either as GL coefficients or Effect
          ○ The Porter-Duff blend modes (src-over, etc) are all expressible as GL blend coeffs
              ■ Big caveat here
          ○ Many others are not:
              ■ Luminance
              ■ Darken
              ■ Arithmetic
              ■ …
          ○ Xfermode can install an Effect
              ■ Access to the destination?
                  ● Effect framework provides abstract interface for accessing the dst color
                  ● GL_EXT_shader_framebuffer_fetch if available
                  ● Future: GL_NV_texture_barrier
                  ● Otherwise a dst-copy-to-texture is triggered
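To make "expressible as GL blend coeffs" concrete, the sketch below writes several Porter-Duff modes as a (source factor, destination factor) pair, the same shape as the arguments to `glBlendFunc`, and evaluates the fixed-function blend on the CPU. It assumes premultiplied RGBA colors; the table layout and the `blend` helper are illustrative and not taken from Skia, and fractional coverage is ignored here.

```python
# GL blend factors as functions of source alpha (sa) and destination alpha (da).
FACTORS = {
    "GL_ZERO": lambda sa, da: 0.0,
    "GL_ONE": lambda sa, da: 1.0,
    "GL_SRC_ALPHA": lambda sa, da: sa,
    "GL_ONE_MINUS_SRC_ALPHA": lambda sa, da: 1.0 - sa,
    "GL_DST_ALPHA": lambda sa, da: da,
    "GL_ONE_MINUS_DST_ALPHA": lambda sa, da: 1.0 - da,
}

# Porter-Duff mode -> (source factor, destination factor), premultiplied colors.
PORTER_DUFF = {
    "clear": ("GL_ZERO", "GL_ZERO"),
    "src": ("GL_ONE", "GL_ZERO"),
    "dst": ("GL_ZERO", "GL_ONE"),
    "src-over": ("GL_ONE", "GL_ONE_MINUS_SRC_ALPHA"),
    "dst-over": ("GL_ONE_MINUS_DST_ALPHA", "GL_ONE"),
    "src-in": ("GL_DST_ALPHA", "GL_ZERO"),
    "dst-in": ("GL_ZERO", "GL_SRC_ALPHA"),
    "src-out": ("GL_ONE_MINUS_DST_ALPHA", "GL_ZERO"),
    "dst-out": ("GL_ZERO", "GL_ONE_MINUS_SRC_ALPHA"),
    "xor": ("GL_ONE_MINUS_DST_ALPHA", "GL_ONE_MINUS_SRC_ALPHA"),
}

def blend(mode, src, dst):
    """Apply result = src * src_factor + dst * dst_factor per channel."""
    src_name, dst_name = PORTER_DUFF[mode]
    sa, da = src[3], dst[3]
    fs, fd = FACTORS[src_name](sa, da), FACTORS[dst_name](sa, da)
    return tuple(fs * s + fd * d for s, d in zip(src, dst))
```

Modes such as Darken need a per-channel `min` of source and destination, which no pair of multiplicative factors can express; that is where an Effect with access to the dst color comes in.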
  • 29. Primitives: Text
      ● Skia sits on top of system font engine:
          ○ FreeType
          ○ CoreText
          ○ GDI
          ○ DirectWrite
      ● Large ALPHA8 texture used as glyph mask atlas (1024 x 2048)
          ○ Will use a second RGB(A) texture if there are “LCD” glyphs
          ○ Texture divided into 256x256 texel “plots”
      ● Strike: A unique combination of
          ○ Typeface
          ○ Size
          ○ Style (italic, bold, …)
      ● Strikes claim (multiple) plots
      ● Plots purged wholesale using LRU
      [Diagram: atlas grid whose plots are labeled Strike 0 through Strike 3, with one plot marked free]
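The plot bookkeeping described on this slide can be sketched as follows. Only the sizes (a 1024 x 2048 atlas, 256 x 256 plots) and the whole-plot LRU purge come from the slide; the `PlotAtlas` class, its method names, and the use of a (typeface, size, style) tuple as the strike key are assumptions made for illustration.

```python
from collections import OrderedDict

class PlotAtlas:
    """Tracks which strike owns each plot of a glyph atlas, with LRU purging."""

    def __init__(self, atlas_w=1024, atlas_h=2048, plot=256):
        # Origins of all plots, row by row; every plot starts out free.
        self.free = [(x, y)
                     for y in range(0, atlas_h, plot)
                     for x in range(0, atlas_w, plot)]
        # Plot origin -> strike key, ordered from least to most recently used.
        self.owner = OrderedDict()

    def claim(self, strike):
        """Give a plot to a strike, purging the least recently used if full."""
        if self.free:
            origin = self.free.pop(0)
        else:
            # Purge wholesale: every glyph in the evicted plot is dropped.
            origin, _old_strike = self.owner.popitem(last=False)
        self.owner[origin] = strike
        return origin

    def touch(self, origin):
        """Mark a plot as recently used (for example, a glyph in it was drawn)."""
        self.owner.move_to_end(origin)
```

With these sizes the atlas holds 4 x 8 = 32 plots, so the 33rd claim evicts whichever plot has gone longest without a `touch`.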
  • 30. Primitives: Text Continued
      ● Glyphs packed in plots using Skyline algorithm [Jukka Jylänki http://clb.demon.fi/]
      ● Attempt to perform all uploads for a frame before draws
          ○ Queue GL draws
          ○ Uploads go through immediately
      ● Avoid flushing draws
          ○ Only flush draws to GL when a plot is purged that is referenced in currently queued draws
          ○ Matters a lot more on mobile, especially tiled architectures
      ● Works pretty well for scrolling
      ● Struggles with pinch-zoom
      ● Under development: distance field atlas
          ○ Same texture partitioning and replacement scheme
          ○ “Masks” are (mostly) resolution independent
  • 31. Primitives: Rects
      Not anti-aliased: Simple, draw a quad!
      Two approaches for anti-aliasing (non-MSAA):
      ● Geometric
          ○ Create inner and outer offset geometry
          ○ Offset is 0.5 pixels
          ○ Use “coverage” vertex attribute
              ■ 0 at outer offset rect
              ■ 1 at inner offset rect
          ○ Handle degenerate cases
      ● Shader
          ○ Attributes:
              ■ W = rect.width() + 0.5, H = rect.height() + 0.5
              ■ Y = normalized y-axis of rect
              ■ C = center of rect
          ○ coverage in Y at pixel p is clamp(H-((p - C) dot Y), 0, 1)
          ○ Geometry shaders could reduce VBO size and save CPU cycles
      [Figure: rect with inner (c=1) and outer (c=0) offset outlines; labels W, H, C, Y, p]
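The shader approach can be tried out on the CPU. The sketch below is one reading of the slide's formula rather than the shader Skia ships: it measures the distance from the rect center along each axis, takes the absolute value, and compares against the half extent plus 0.5, whereas the slide's shorthand writes `H - ((p - C) dot Y)`. The function names and the choice to multiply the two per-axis coverages are assumptions.

```python
def clamp(value, lo, hi):
    """Limit value to the range [lo, hi]."""
    return max(lo, min(hi, value))

def rect_coverage(p, center, y_axis, width, height):
    """Approximate pixel coverage of a (possibly rotated) rect at point p.

    y_axis is the rect's unit-length y direction; the x direction is taken
    to be perpendicular to it.
    """
    dx, dy = p[0] - center[0], p[1] - center[1]
    x_axis = (y_axis[1], -y_axis[0])
    dist_y = abs(dx * y_axis[0] + dy * y_axis[1])
    dist_x = abs(dx * x_axis[0] + dy * x_axis[1])
    # Coverage ramps from 1 to 0 across a one-pixel band centered on each edge.
    cov_y = clamp(height / 2 + 0.5 - dist_y, 0, 1)
    cov_x = clamp(width / 2 + 0.5 - dist_x, 0, 1)
    return cov_x * cov_y
```

For a 4 x 2 axis-aligned rect centered at (5, 5), a pixel at the center is fully covered, one centered exactly on the top edge gets 0.5, and one centered exactly on a corner gets 0.25.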
  • 32. Primitives: Misc
      Adaptations for stroked rectangles
      Similar shader techniques for:
      ● Ellipses
      ● Circles
      ● Rounded-Rectangles
  • 33. Primitives: Paths
      ● Why are paths hard?
          ○ In most general case have to handle both the fill rule and anti-aliasing
          ○ After a blend coverage/alpha distinction is lost. Must only perform one blend in general.
      [Figure callouts: Can’t double blend in overlap! Can’t anti-alias interior edge! Multiple edges from different contours relevant to pixels in concavities!]
  • 34. Primitives: Paths Continued
      ● MSAA solves the AA problem
      ● Use the stencil to solve the fill rule problem
      ● Tessellate contours into line segments
      ● Pass 1:
          ○ Draw the tessellated contours as triangle fan
          ○ Disable color writes
          ○ Stencil op: +1 for front face, -1 for back face
      ● Pass 2:
          ○ Draw bounding geometry
          ○ Enable color writes
          ○ Stencil func
              ■ Winding: Pass if stencil is non-zero
              ■ Even/Odd: Pass if LSB is 1
      ● Avoid tessellating quadratic and cubic beziers:
          ○ Discard in FS if outside the curve [Kokojima et al.]
          ○ Need per sample discard or sample coverage mask
          ○ No-go on ES3 :(
      [Figure: Pass 1 shows fan triangles contributing +1, -1, +1 to the stencil; Pass 2 shows the filled result]
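The two-pass stencil technique can be simulated for a single sample point on the CPU. In this sketch each contour is a list of (x, y) vertices that has already been tessellated into line segments, counter-clockwise triangles are treated as front-facing, and sample points are assumed not to lie exactly on a fan edge. The function names are hypothetical.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def point_in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of triangle abc."""
    d1, d2, d3 = signed_area2(a, b, p), signed_area2(b, c, p), signed_area2(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def stencil_value(contours, p):
    """Pass 1: fan each contour from its first vertex, +1 front face, -1 back face."""
    stencil = 0
    for contour in contours:
        anchor = contour[0]
        for i in range(1, len(contour) - 1):
            b, c = contour[i], contour[i + 1]
            area = signed_area2(anchor, b, c)
            if area != 0 and point_in_triangle(p, anchor, b, c):
                stencil += 1 if area > 0 else -1
    return stencil

def passes_fill_rule(stencil, rule):
    """Pass 2 stencil test: 'winding' is non-zero, 'even-odd' checks the low bit."""
    return stencil != 0 if rule == "winding" else (stencil & 1) == 1
```

For two nested squares wound the same way, a point inside both ends up with stencil 2, so it is filled under the winding rule but not under even/odd. For a concave contour, fan triangles that overshoot the outline are cancelled by back-facing ones, leaving 0 in the notch.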
  • 35. Primitives: Paths Continued
      For AA paths without MSAA:
      ● Detect if path is one of the other primitive types (e.g. rounded rectangle)
      ● If very thin stroke draw as AA lines (and ignore double blend problem)
      ● If path is convex fill rule problem goes away
          ○ Fan the on-contour control points
          ○ Draw bounding hulls of curves
          ○ Compute coverage using implicit eq. approx distance to curve [LoopBlinn]
      ● Otherwise, SW rasterize mask and upload
  • 36. Questions?