2. What is a graphics pipeline?
[Diagram: 3D Scene → Stage → Stage → Stage → Raster Image]
● Hardware, real-time / interactive rendering
● Popular APIs : OpenGL and DirectX
3. Overview
● Basic Graphics Pipeline
● Modern Graphics Pipeline
● Beyond Pipelining
● The New Wave
4. Basic Graphics Pipeline
● Use case:
● Render a textured mesh with per-pixel lighting
● Ambient light, 1 directional light, 1 point light, no shadows
● Assume z-buffer based architecture
5. 3D Scene
● Surface
● Triangle mesh
– Vertices and indices
– Per-vertex position, normal
● Position + orientation (world matrix)
● Material
● Per-vertex uv, tangent, binormal
● Diffuse + normal maps
● Diffuse lighting (direction, colour)
● Camera (view + projection matrices)
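The matrices listed above imply a transform chain for each vertex: object space to world space (world matrix), then to view space and clip space (view and projection matrices). A minimal sketch follows, assuming column vectors and 4x4 matrices stored as lists of rows. The function names are hypothetical and are not taken from OpenGL or DirectX.

```python
def mat_mul_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Build a world matrix that only translates (no rotation or scale)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def transform_vertex(position, world, view, projection):
    """Return (world-space position, clip-space position) for one vertex."""
    p = list(position) + [1.0]            # homogeneous coordinate w = 1
    p_ws = mat_mul_vec(world, p)          # object space -> world space
    p_vs = mat_mul_vec(view, p_ws)        # world space  -> view space
    p_cs = mat_mul_vec(projection, p_vs)  # view space   -> clip space
    return p_ws, p_cs


def perspective_divide(p_cs):
    """Clip space -> normalized device coordinates (divide by w)."""
    w = p_cs[3]
    return [p_cs[0] / w, p_cs[1] / w, p_cs[2] / w]
```

The world-space position is returned alongside the clip-space one because per-pixel lighting needs it later (Position-WS).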
9. Pixel Processing
[Diagram: pixel shader inputs and outputs]
● Per-pixel inputs
– Position-WS
– Position-SS
– Normal-WS
– Tangent-WS
– Binormal-WS
– Texture UV
● Textures
– Diffuse
– Normal
● Uniform constants
– Ambient L colour
– Dir L colour
– Dir L dir
– Point L colour
– Point L pos
● Pixel shader
– Texturing
– Lighting
● Per-pixel outputs
– Colour
– Alpha
– Depth
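The pixel-processing inputs above can be combined into a diffuse-only shading function. This is a rough sketch, not the slides' actual shader: it assumes "Dir L dir" points from the surface towards the light, and it uses a made-up 1 / (1 + d²) falloff for the point light. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]


def tangent_to_world(n_ts, tangent_ws, binormal_ws, normal_ws):
    """Rotate a normal-map sample (already remapped to [-1, 1]) into world space."""
    return normalize([n_ts[0] * tangent_ws[i]
                      + n_ts[1] * binormal_ws[i]
                      + n_ts[2] * normal_ws[i] for i in range(3)])


def shade_pixel(albedo, n_ws, position_ws,
                ambient, dir_colour, dir_to_light,
                point_colour, point_pos):
    """Ambient + one directional + one point light, diffuse only, no shadows."""
    n = normalize(n_ws)
    # Directional light: Lambert term N.L, clamped at zero.
    n_dot_l_dir = max(0.0, dot(n, normalize(dir_to_light)))
    # Point light: direction and distance depend on the pixel's world position.
    to_point = [p - q for p, q in zip(point_pos, position_ws)]
    dist_sq = dot(to_point, to_point)
    n_dot_l_pt = max(0.0, dot(n, normalize(to_point)))
    attenuation = 1.0 / (1.0 + dist_sq)   # assumed falloff, for illustration only
    light = [ambient[i]
             + dir_colour[i] * n_dot_l_dir
             + point_colour[i] * n_dot_l_pt * attenuation
             for i in range(3)]
    # Texturing step: modulate the lighting by the diffuse map sample.
    return [albedo[i] * light[i] for i in range(3)]
```

Here `albedo` stands for the diffuse texture sample at the pixel's UV, and `n_ws` would normally come from `tangent_to_world` applied to the normal map sample.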
12. Modern GPU Pipeline
● Unified shader architecture
● Common shading cores shared between Vertex,
Geometry and Pixel shading units
● Scheduler distributes work
● Load balancing
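A back-of-the-envelope model shows why sharing cores helps with load balancing. It is idealized: work is assumed to divide perfectly across units, and the numbers in the usage note are invented for illustration.

```python
def fixed_split_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Dedicated units: the frame finishes when the busier unit type finishes."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_time(vertex_work, pixel_work, total_units):
    """Unified cores: the scheduler can hand any work to any core."""
    return (vertex_work + pixel_work) / total_units
```

With a pixel-heavy frame (100 units of vertex work, 900 of pixel work) and 8 cores, `fixed_split_time(100, 900, 4, 4)` gives 225.0 while `unified_time(100, 900, 8)` gives 125.0. In the fixed split the vertex units sit idle for most of the frame.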
15. Modern GPU Pipeline
● Bandwidth:
● Hierarchical Z
● PS3: Compressed Z and colour to reduce
bandwidth for MSAA reads
● X360: in-GPU EDRAM – lots of bandwidth
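Hierarchical Z saves bandwidth by keeping a coarse per-tile depth summary, so whole tiles can be rejected without reading the full-resolution z-buffer. The sketch below makes several assumptions: a "less than" depth test (smaller is closer), a 4x4 tile size, and buffer dimensions that are multiples of the tile size. Real hardware details differ and the names are hypothetical.

```python
TILE = 4  # assumed tile size in pixels


def build_hiz(depth, width, height):
    """Return a 2D list holding the farthest (max) depth of each tile.

    depth is a row-major list of width * height floats.
    """
    hiz = []
    for ty in range(height // TILE):
        row = []
        for tx in range(width // TILE):
            row.append(max(depth[(ty * TILE + y) * width + tx * TILE + x]
                           for y in range(TILE) for x in range(TILE)))
        hiz.append(row)
    return hiz


def tile_rejected(hiz, tx, ty, tri_min_depth):
    """True if no fragment of the triangle can pass the depth test in this tile.

    If the triangle's nearest depth is not closer than the farthest depth
    already stored in the tile, every fragment fails the 'less than' test.
    """
    return tri_min_depth >= hiz[ty][tx]
```

For a rejected tile, none of the 16 per-pixel depth values needs to be fetched. Only tiles that pass the coarse test fall through to the fine z-buffer.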
17. Modern GPU
● More memory, processing units
● More floating point formats, fewer usage
restrictions
● More render targets (8)
● Longer shaders
● New data structures (e.g. Texture arrays)
● Better MSAA and anisotropic filtering support
18. Beyond Pipelining
● Multi-processor
● Solution to “memory” and “power” walls
● Pipelining : multiple stages happening at once
● Parallelism : many things happening in the same
stage
● Limit of pipelining
● Small number of pipeline steps
● Some steps are much more compute intensive
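The limit of pipelining can be made concrete with a simple timing model. In steady state a pipeline emits one item per slowest-stage time, so an unbalanced pipeline gains much less than the stage count suggests. The stage costs in the usage note are invented for illustration.

```python
def unpipelined_time(stage_costs, num_items):
    """Each item goes through every stage before the next item starts."""
    return num_items * sum(stage_costs)


def pipelined_time(stage_costs, num_items):
    """First item takes the full latency; each later item follows one
    slowest-stage interval behind the previous one."""
    return sum(stage_costs) + (num_items - 1) * max(stage_costs)
```

For 100 items, balanced stages `[2, 2, 2]` take 204 time units pipelined against 600 unpipelined, close to the ideal 3x. Stages `[1, 1, 4]` have the same total cost but take 402 units, a speedup of only about 1.5x. The heavy stage needs parallelism within the stage, not more pipelining.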
19. Parallelism
● Parallelism examples:
● All components of float4 at the same time
● Multiple vertices at the same time
● Multiple triangles at the same time
20. SIMD
● e.g. GPU ALU
● Shared instruction store and control
● Compact and less expensive
● Efficient with no loops or branches
● Problem with unused processing cycles
● Unfilled quads are inefficient
● Solution : avoid small or skinny triangles (PS3)
● Not good for more complicated data structures
or algorithms
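The cost of unfilled quads can be estimated by counting the 2x2 pixel quads a triangle touches. This sketch assumes quads are aligned to even pixel coordinates and that every touched quad costs four lanes regardless of coverage. The function name is hypothetical.

```python
def quad_utilization(covered_pixels):
    """Fraction of shaded lanes that produce a visible pixel.

    covered_pixels is an iterable of (x, y) integer pixel coordinates
    covered by one triangle.
    """
    pixels = set(covered_pixels)
    if not pixels:
        return 0.0
    # Each pixel belongs to the 2x2 quad found by halving its coordinates.
    quads = {(x // 2, y // 2) for x, y in pixels}
    return len(pixels) / (4 * len(quads))
```

A 4x4 block of pixels fills its four quads completely (utilization 1.0). A one-pixel-wide column of eight pixels also touches four quads but fills only half of each (0.5). Isolated single pixels drop to 0.25, which is why small or skinny triangles are expensive.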
21. SIMT
● Still SIMD. Shared code between threads.
● Process groups of primitives (e.g. 48 quads) in
each thread
● Latency hiding:
● One thread stalls on a texture fetch
● Other threads continue execution
● Especially important due to “memory wall”
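Latency hiding can be illustrated with a toy simulation of a single core. Each thread runs a fixed number of ALU cycles, then waits on a texture fetch, and the core picks any ready thread each cycle. The cycle counts and the scheduling policy (lowest-numbered ready thread first) are assumptions for illustration, not figures for any real GPU.

```python
def simulate(num_threads, alu_cycles, fetch_latency, total_cycles):
    """Return the fraction of cycles in which the core did useful ALU work."""
    remaining = [alu_cycles] * num_threads  # ALU cycles left before next fetch
    ready_at = [0] * num_threads            # cycle at which each thread can run
    busy = 0
    for cycle in range(total_cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                remaining[t] -= 1
                busy += 1
                if remaining[t] == 0:
                    # Thread issues a texture fetch and stalls until it returns.
                    remaining[t] = alu_cycles
                    ready_at[t] = cycle + 1 + fetch_latency
                break  # only one thread executes per cycle
    return busy / total_cycles
```

With 4 ALU cycles per 12-cycle fetch, one thread keeps the core busy 25% of the time, two threads 50%, and four threads 100%. The fetch latency is unchanged, but it is covered by other threads' work.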
22. SIMT
● When branching:
● Only evaluate one branch if all primitives take that
branch
● Must evaluate both branches and mask the results
if not all primitives take the same branch
● Reduces unused processor cycles
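The branching rule above can be sketched as a function over one group of lanes. This is a software model only: on hardware the inactive lanes are masked off, whereas here both branch functions are simply evaluated and the unwanted results discarded, so they are assumed to be free of side effects. The names are hypothetical.

```python
def simt_branch(values, condition, then_fn, else_fn):
    """Evaluate a branch over a group of lanes.

    Returns (results, passes), where passes is the number of branch bodies
    the whole group had to execute.
    """
    mask = [condition(v) for v in values]
    if all(mask):
        # Coherent: every lane takes the 'then' side, so only that side runs.
        return [then_fn(v) for v in values], 1
    if not any(mask):
        # Coherent: every lane takes the 'else' side.
        return [else_fn(v) for v in values], 1
    # Divergent: run both sides for the whole group, then select per lane.
    then_results = [then_fn(v) for v in values]
    else_results = [else_fn(v) for v in values]
    results = [t if m else e
               for m, t, e in zip(mask, then_results, else_results)]
    return results, 2
```

For example, `simt_branch([1, 2, 3, 4], lambda v: v > 0, lambda v: v * 2, lambda v: -v)` runs one pass, while the input `[1, -2, 3, -4]` diverges and costs two.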
23. MIMD
● e.g. Multi-core CPUs, Cell SPEs, Larrabee
● Different code stores and control units for different processors
● More complex hardware
● More expensive
● Synchronization issues
● Can handle more complex data structures and
algorithms
25. Cell SPEs
● SPEs
● Local memory store
● Shared memory accessed via DMA
● Ring bus
26. PS3
● RSX
● Traditional GPU (z-buffer, ROP)
● SIMD data structures and processing (arrays)
● Offload GPU work to SPUs
● Micro triangle removal
● Skinning
● Post-FX
● Lighting
● Mostly rely on SIMD-friendly data structures
27. Larrabee
● Many general purpose CPU cores
● Coherent memory access from cores
● Very few fixed-function units (e.g. Texture)
● Most graphics pipeline components are
programmable
● Depth buffer
● Blending
● Invites more complex data structures and algorithms
29. Programming
● GPU programming may become more like SPU
programming
● More MIMD
● More synchronization and data buffering issues
● More attention to latency hiding