Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Efficient Rendering with DirectX 12
on Intel Graphics
Andrew Lauritzen
Michael Apodaca
 Copyright © 2015 Intel Corporation. All rights reserved.
 *Other names and brands may be claimed as the property of others.
 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,
BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH
PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS
OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR
INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
 A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S
PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS,
OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY
CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS
NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
 Intel may make changes to specifications and product descriptions at any time, without notice.
 All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
 Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized
errata are available on request.
 Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not
authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
 Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
 Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and
MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult
other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
information go to
http://www.Intel.com/performance
 Iris™ graphics is available on select systems. Consult your system manufacturer.
 Intel, Intel Inside, the Intel logo, Intel Core and Iris are trademarks of Intel Corporation in the United States and other countries.
Legal
 Intel 4th Generation Core (i3/i5/i7 4xxx)
– Code-named “Haswell”, Gen 7.5 GPU architecture
– Intel HD Graphics 4400/4600/5000
– Intel Iris Graphics 5100, Iris Pro Graphics 5200, …
 Intel 5th Generation Core (i3/i5/i7 5xxx, Core M 5xxx)
– Code-named “Broadwell”, Gen 8 GPU architecture
– Intel HD Graphics 5300/5500/6000
– Intel Iris Graphics 6100, …
Decoder Cheat-sheet
 Performance
– Improve CPU bound games
– Improve multi-core scaling
 Power
– Improve performance on power-constrained platforms
– Improve heat and battery life
 How?
– Reduce CPU overhead of rendering
Why DirectX 12?
 Most GPU vendors have complex drivers
– Do lots of fancy optimizations on the fly
– Costs CPU, but makes the GPU run faster
 That’s ok, reviews compare GPUs using fast CPUs! 
 Drivers spawn threads that conflict with application
– Driver thread often consumes an entire core by itself
– Plus another core for the game submission thread
– Minimal multithreading beyond these two threads
Graphics APIs and Overhead
 Has recently become far more serious with SoCs
– Even if not “CPU bound”, CPU/GPU share power
– More CPU load => less GPU power/performance
 Complex CPU optimizations are not a good idea…
– Tax on all applications, even well-optimized ones
– CPU work can take more power than it saves on the GPU!
– Leads to lower overall performance
CPU/GPU Optimization Tradeoff
 To address this, Intel wrote a much thinner DX11 driver
– Introduced with Haswell
 Big benefit to well-written applications
– But does far less work to make poor ones run well
– e.g. no redundant state elimination, minimal state-based shader recompiles, etc.
 Still unavoidable CPU overhead due to API design
– DirectX 12 addresses this
Thinner Intel Graphics Driver
 Already significantly lower CPU overhead
 Large increase in power efficiency
– Power saved on the CPU can be given to the GPU
– Applications can both run faster and use less power
 Additional GPU optimization opportunities
– i.e. stuff that we had to drop in the thinner driver
– Pipeline state objects give the driver more context
DirectX 12 on Intel
DirectX 12 Power and Performance
[Charts: “less power @ same performance” and “higher performance @ same power”]
DirectX 12 can significantly reduce CPU power or improve performance
 Commands and state
 Memory
 Resource binding
Agenda
Commands and State
 DirectX 11 context is “stateful”
– State grouped into moderately sized chunks
– Rasterizer, depth/stencil, blend, etc.
 Groupings do not always map perfectly to hardware
– Ex. DirectX blend state != GPU blend state
– Driver optimizations based on blend state + pixel shader
State in DirectX 11
 API functions cause one or more GPU commands to be added to the command buffer
 Some GPU commands are deferred or conditional
– Often lazily added at the next draw call
Commands in DirectX 11
[Diagram: API calls (deviceCtxt->aaa(), deviceCtxt->bbb(), deviceCtxt->ccc(), deviceCtxt->ddd()) append commands to the command buffer]
 At some point, the driver decides to commit the command buffer
– E.g. when the command buffer fills, the maximum number of buffered frames is reached, Flush() is called, etc.
 It’s passed to kernel mode and GPU addresses are patched
 Then, it’s submitted to the GPU
Commands in DirectX 11
[Diagram: command buffer, validation in the kernel-mode driver (KMD), “DMA” buffer, and the GPU ring with its head and tail pointers]
 Limited parallelism with a single context
 Deferred contexts do not address the problems
– CPU performance and cache issues with transient objects
– State mismatch and lazy state setting
– Inherited internal states
– MAP_DISCARD renaming, hazard tracking, etc.
– Non-trivial patching happens at submission time
 Result: more overhead and limited parallelism
DirectX 11 Deferred Contexts
 Each thread has its own command list and memory
– Fully independent
– Use ~1 command list/thread
 Command lists are submitted to the GPU in arbitrary order
– Minimal driver work done at submission time
– Submit all command lists in a single API call where possible
Commands in DirectX 12
[Diagram: command lists feeding the GPU ring (head and tail pointers)]
 While those commands are in flight, can record new commands
– Can reuse command lists
– Must use different memory
 When GPU finishes with memory, it can also be reused
– App handles synchronization
– Typical to put fence at frame boundaries
– Always reuse allocators!
Commands in DirectX 12
[Diagram: command lists feeding the GPU ring (head and tail pointers) while new command lists are recorded]
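The frame-boundary fence pattern from this slide can be modeled in a few lines. This is a minimal sketch in plain C++ that makes no D3D12 calls: FrameRing, kFramesInFlight and gpuReached are hypothetical names, and a depth of three frames in flight is an assumption. In a real application the fence would be an ID3D12Fence and each slot would own its command allocator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain C++ model of per-frame allocator reuse; hypothetical names throughout.
struct FrameRing {
    static constexpr std::size_t kFramesInFlight = 3;  // assumed depth

    // Fence value signaled after the last submit that used each slot.
    std::array<std::uint64_t, kFramesInFlight> fenceValue{};
    std::uint64_t nextFence = 1;       // value the CPU will signal next
    std::uint64_t completedFence = 0;  // last value the GPU has reached
    std::size_t slot = 0;              // slot recorded into this frame

    // True when the GPU has finished all work recorded from slot s,
    // so that slot's memory may be reset and recorded into again.
    bool canReuse(std::size_t s) const {
        assert(s < kFramesInFlight);
        return fenceValue[s] <= completedFence;
    }

    // Call after submitting the frame: remember which fence guards the
    // current slot, then advance to the next slot.
    void endFrame() {
        fenceValue[slot] = nextFence++;
        slot = (slot + 1) % kFramesInFlight;
    }

    // Stand-in for the GPU signaling a fence value.
    void gpuReached(std::uint64_t value) { completedFence = value; }
};
```

Before recording into a slot, the app checks canReuse(slot) and waits on the fence if it returns false; that wait is the only synchronization the model needs.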
 Immutable, monolithic pipeline state objects (PSOs)
– Single object captures as much state as possible
– Much lower chance of missing driver context
– Allows link-time optimizations on shaders
 No state inheritance between direct command lists
– No API state or internal state inheritance (renaming, etc.)
– Explicit barriers to handle hazards and resource transitions
State in DirectX 12
 Create PSOs at initialization time
– Multithread your initialization/PSO creation code!
– Use PSO “libraries”
 PSO changes are usually fairly cheap
– Minimal CPU cost, some GPU cost
 Some state sorting is still desirable
– Turning shader stages on/off can cause pipeline stalls
Pipeline State Objects
 Reusable command lists to further lower CPU overhead
 Some minimal state inheritance is allowed
– Some patching may occur at submission time
– If you don’t need to inherit something, set it (again) in the bundle
 Overhead is already very low in DirectX 12
– Need ~10+ draws to make bundles a win on Haswell/Broadwell
– Only consider bundles if you have lots of static draws that can’t reasonably be combined (via instancing or similar)
– Don’t add any GPU overhead/indirections to enable bundles!
Bundles
 DirectX 12 expands on DrawIndirect/DispatchIndirect
 Command Signature
– Indirect Argument Buffer Format
– Draw/Dispatch calls
– Resource Bindings
 Indirect Argument Buffer
– Dynamic parameters
 Count Buffer
Execute Indirect
[Diagram: two command layouts, (IB, VB, Draw) and (UAV, CBV, Draw); each command reads its IB/VB/Draw or UAV/CBV/Draw arguments from consecutive records in the indirect argument buffer]
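One way to reason about the indirect argument buffer is to add up the bytes each argument contributes per command. The sketch below is plain C++ (C++14 or later) with hypothetical names (ArgKind, argBytes, minByteStride). The sizes are taken from the public D3D12 structure layouts (16 bytes for a vertex or index buffer view, 20 bytes for indexed draw arguments, 16 for draw arguments, 12 for dispatch arguments, 8 for a root CBV/SRV/UAV address) and are worth checking against the headers before relying on them.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the argument kinds a command signature can list.
enum class ArgKind { Draw, DrawIndexed, Dispatch, VertexBufferView, IndexBufferView, RootView };

// Bytes each argument occupies in one record of the argument buffer.
constexpr std::uint32_t argBytes(ArgKind kind) {
    return kind == ArgKind::Draw             ? 16u  // 4 x 32-bit values
         : kind == ArgKind::DrawIndexed      ? 20u  // 5 x 32-bit values
         : kind == ArgKind::Dispatch         ? 12u  // 3 x 32-bit values
         : kind == ArgKind::VertexBufferView ? 16u  // address + size + stride
         : kind == ArgKind::IndexBufferView  ? 16u  // address + size + format
         :                                      8u; // root view: GPU address only
}

// Smallest tightly packed stride for one command's worth of arguments.
template <std::size_t N>
constexpr std::uint32_t minByteStride(const ArgKind (&args)[N]) {
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < N; ++i) total += argBytes(args[i]);
    return total;
}

// The two layouts pictured on the slide, assuming the first draw is indexed.
constexpr ArgKind kIndexedDraw[] = {ArgKind::IndexBufferView, ArgKind::VertexBufferView,
                                    ArgKind::DrawIndexed};
constexpr ArgKind kRootViewDraw[] = {ArgKind::RootView, ArgKind::RootView, ArgKind::Draw};

static_assert(minByteStride(kIndexedDraw) == 52, "16 + 16 + 20");
static_assert(minByteStride(kRootViewDraw) == 32, "8 + 8 + 16");
```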
 An internal compute shader patches the command list
– Compiled at CreateCommandSignature
 If there are no resource bindings, then no compute shader is needed (legacy path)
Execute Indirect on Haswell/Broadwell
 DirectX 12 exposes multiple “queues” to application
– Graphics/compute, compute-only, copy, etc.
 Graphics and compute are not simultaneous on Intel
– Using separate queues is not a performance benefit
– Consider doing both on the main queue
 There is a simultaneous copy engine
– … but it has fairly low throughput
– Driver may implement large copies using the 3D engine
Multi-engine on Haswell/Broadwell
Memory
 Previous APIs (ex. DirectX 11) hide a lot of details
– GPU physical memory residency (if applicable)
– GPU memory addressing (virtual, physical)
 OS/driver manage residency and addressing
– Ensures command buffers do not exceed hardware resources
– Track referenced allocations, ensure resident
– Allocate and patch GPU addresses
– Major source of CPU overhead!
 Applications try not to over-commit “GPU memory”
GPU Memory in WDDM 1.x
 Directly exposes control over physical residency
– Memory referenced by the GPU must be made “resident”
 No dedicated video memory on Intel processors
– “Resident” resources are allocated out of DRAM
 OS uses up to 45% of DRAM for graphics applications
– Ex. 1.8GB on a 4GB system, 3.6GB on an 8GB system, …
– Global limit across the system, not per-process
– Rest is reserved for regular CPU/OS use
GPU Memory Residency in WDDM 2.0
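The 45% figure turns into a quick budget estimate. The helper below is a hypothetical illustration in plain C++ (graphicsBudgetMiB is not an API), using integer MiB and assuming the 45% share applies to the full installed DRAM; it reproduces the 4GB and 8GB examples from the slide to within rounding.

```cpp
#include <cstdint>

// Approximate system-wide graphics budget: 45% of installed DRAM.
// Hypothetical helper; the OS reports the real budget at run time.
constexpr std::uint64_t graphicsBudgetMiB(std::uint64_t dramMiB) {
    return dramMiB * 45 / 100;
}

static_assert(graphicsBudgetMiB(4096) == 1843, "about 1.8GB on a 4GB system");
static_assert(graphicsBudgetMiB(8192) == 3686, "about 3.6GB on an 8GB system");
```

Because the limit is global rather than per-process, an application should treat this number as an upper bound it shares with everything else on the system.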
 Allocations are initially made resident
– Resource creation will fail if residency budget is exceeded
 OS will request that background apps trim residency
– Misbehaved applications will be suspended from rendering
– i.e. their GPU work will not be scheduled/make progress
 Be a good citizen; provide a good user experience
– Handle allocation failures and trim requests gracefully
– Evict idle resources, trim streaming pools, remove detailed mips, drop quality settings, etc.
Memory Residency Best Practices
 Directly exposes per-process GPU virtual addresses
– Can do pointer arithmetic, store in data structures, etc.
– GPU virtual addresses allocated at resource allocation
– Guaranteed to remain at the same address until release
– Eliminates physical address patching overhead
 Haswell has a limited GPU virtual address space (~2GB)
– Subtly different from residency
GPU Virtual Addresses in WDDM 2.0
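Stable GPU virtual addresses make simple sub-allocation possible: the address of a sub-range is just the base address plus an offset. The sketch below is plain C++ with hypothetical names (GpuVa stands in for D3D12_GPU_VIRTUAL_ADDRESS, LinearGpuAllocator is not an API). The default 256-byte alignment is the placement alignment D3D12 documents for constant buffer data; the buffer's base address is assumed to be at least that well aligned.

```cpp
#include <cassert>
#include <cstdint>

using GpuVa = std::uint64_t;  // stand-in for D3D12_GPU_VIRTUAL_ADDRESS

// Round value up to a power-of-two alignment.
constexpr std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical linear sub-allocator over one buffer whose address never moves.
struct LinearGpuAllocator {
    GpuVa base;          // GPU virtual address of the buffer's first byte
    std::uint64_t size;  // buffer size in bytes
    std::uint64_t head;  // first unused byte offset

    LinearGpuAllocator(GpuVa b, std::uint64_t s) : base(b), size(s), head(0) {}

    // Returns the GPU VA of the sub-allocation, or 0 when out of space.
    GpuVa allocate(std::uint64_t bytes, std::uint64_t alignment = 256) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uint64_t offset = alignUp(head, alignment);
        if (offset + bytes > size) return 0;
        head = offset + bytes;
        return base + offset;  // plain pointer arithmetic on the GPU VA
    }
};
```

The returned addresses can be stored in data structures or passed as root descriptors, since they stay valid until the underlying buffer is released.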
Typical Discrete GPU Memory
[Diagram: the CPU and a discrete GPU (dGPU), connected over PCI-E, each translate their own virtual addresses through their own page table; the CPU page table maps to CPU DRAM (DDR) and the GPU page table maps to GPU DRAM (GDDR). Callout: “Applications typically optimize for this”]
Haswell Memory
[Diagram: on Haswell the CPU and GPU each translate their own virtual addresses through their own page table, and both page tables map to the same DRAM (DDR). Callout: “Haswell is limited by this”]
 Not quite the same as limited GPU physical memory
– Limit on the amount of DRAM visible to the GPU at once
– All GPU-visible memory counts (upload/read-back heaps, …)
– Even non-resident memory counts
 In theory, managing only requires GPU page table edits
– But GPU virtual addresses are visible in DirectX 12
– Must reallocate/copy data
 GPU VA exhaustion will fail at resource allocation
– Again, please handle this gracefully! 
Haswell GPU Virtual Address Limit
 Good news: no longer an issue on Broadwell
– Large GPU virtual address space (same as CPU)
 Memory-related public service announcement:
– Don’t make/ship 32-bit (CPU) D3D12 applications!
– Even if it works today…
– Thank me later 
Broadwell GPU Virtual Addresses
Resource Binding
 Resource views are effectively just a small structure
– Metadata and a pointer to memory (usually ~32-64 bytes)
– Stuff like texture dimensions, format, layout, etc.
 Direct3D 12 directly exposes these “descriptors”
– Independent from the actual memory they reference
– Can be created/copied/etc. freely
– Application must ensure no dangling pointers
Resource Descriptors
 Not an API object – manipulated directly by application
– Descriptor size query-able by application
– Can be created at any time; free-threaded API call
 Descriptors are put into “heaps” (arrays)
– CBVs, SRVs and UAVs can be mixed in one heap
– Samplers in a separate heap
– Can have one or more of each type, GPU visible or CPU only
 Changing heaps is expensive (pipeline flush)
– Ideally use a single heap of each type (sampler, CBV/SRV/UAV)
– Exception: changing heaps at command list boundary is “free”
Resource Descriptors
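Since a descriptor heap is an array, finding the n-th descriptor is plain index arithmetic on the heap's start handle. The sketch below is a plain C++ model: CpuHandle stands in for D3D12_CPU_DESCRIPTOR_HANDLE and descriptorAt is a hypothetical helper. The 32-byte and 64-byte increments used in the checks are illustrative assumptions; a real application queries the increment from the device (GetDescriptorHandleIncrementSize) rather than hard-coding it.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for D3D12_CPU_DESCRIPTOR_HANDLE, which wraps a pointer-sized value.
struct CpuHandle {
    std::size_t ptr;
};

// Handle of descriptor `index` in a heap that starts at `heapStart`.
constexpr CpuHandle descriptorAt(CpuHandle heapStart, std::uint32_t index,
                                 std::uint32_t incrementSize) {
    return CpuHandle{heapStart.ptr + static_cast<std::size_t>(index) * incrementSize};
}

// Illustrative increments only; always query the real value from the device.
static_assert(descriptorAt(CpuHandle{0x1000}, 3, 32).ptr == 0x1060, "3 * 32 bytes");
static_assert(descriptorAt(CpuHandle{0x1000}, 3, 64).ptr == 0x10C0, "3 * 64 bytes");
```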
Descriptors Example
[Diagram: a descriptor heap holding UAV, CBV, SRV, CBV, SRV and SRV descriptors]
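// Views are created through the device (ID3D12Device), not the command list
D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = { ... };
device->CreateUnorderedAccessView(res, nullptr, &uavDesc, [uavHandle]); // nullptr: no counter resource
D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc = { ... }; // names the buffer by GPU virtual address
device->CreateConstantBufferView(&cbvDesc, [cbvHandle]);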
...
 Think of it like a function signature for your shader(s)
 Defines parameters and how they map to shader inputs
– Root constants (data: zero indirections)
– Root descriptors (pointer to data: one indirection)
– Descriptor tables (pointer to descriptors: two indirections)
 Each parameter can be visible to one or more shader stages
 Parameters are “versioned” by implementation/hardware
– This is the single place where the “stream” of versions is managed
– Maximum size is very small to avoid abuse
Root Signature
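The "very small" maximum can be made concrete. D3D12 documents a 64 DWORD limit on the root signature, with each 32-bit root constant costing 1 DWORD, each root descriptor 2 DWORDs, and each descriptor table 1 DWORD. The sketch below (plain C++14, hypothetical names RootParam, costInDwords, rootSignatureCost) adds up the cost of a signature made of two descriptor tables, one root SRV, and four root constants.

```cpp
#include <cstddef>
#include <cstdint>

enum class RootParamKind { Constants, Descriptor, DescriptorTable };

// Hypothetical description of one root parameter.
struct RootParam {
    RootParamKind kind;
    std::uint32_t num32BitValues;  // only meaningful for Constants
};

// Cost of one parameter in DWORDs, per the documented D3D12 limits.
constexpr std::uint32_t costInDwords(RootParam p) {
    return p.kind == RootParamKind::Constants    ? p.num32BitValues
         : p.kind == RootParamKind::Descriptor   ? 2u   // 64-bit GPU address
         :                                          1u;  // table: one offset
}

constexpr std::uint32_t kMaxRootSignatureDwords = 64;

template <std::size_t N>
constexpr std::uint32_t rootSignatureCost(const RootParam (&params)[N]) {
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < N; ++i) total += costInDwords(params[i]);
    return total;
}

// Two descriptor tables, one root SRV, four 32-bit root constants.
constexpr RootParam kExample[] = {
    {RootParamKind::DescriptorTable, 0},
    {RootParamKind::DescriptorTable, 0},
    {RootParamKind::Descriptor, 0},
    {RootParamKind::Constants, 4},
};

static_assert(rootSignatureCost(kExample) == 8, "1 + 1 + 2 + 4");
static_assert(rootSignatureCost(kExample) <= kMaxRootSignatureDwords, "fits the limit");
```

The arithmetic shows why descriptor tables are the cheapest way to bind many resources and why large blocks of root constants use up the budget quickly.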
Root Parameter Indirections
[Diagram: root signature slots reaching memory with 0 indirections (root constants), 1 indirection (root descriptor), or 2 indirections (descriptor table, through UAV/CBV descriptors in the descriptor heap)]
 Pass a small number of constants directly to shaders
– Bound to shader as a single constant buffer
 Useful for simple indirections; draw ID, material ID, etc.
– Avoids creating versioned memory, descriptor, heap, etc
– Shader can use to look up into arbitrary data structures
Root Constants
 Stores a single descriptor directly as a root parameter
– No need to burn through descriptor heap space
– Most useful for a descriptor that changes ~ every draw
 Can only reference “raw data”
– Only buffer resources (CBVs, SRVs/UAVs of buffers)
– No type conversions (i.e. only float/uint/sint components)
– i.e. it’s just a pointer to memory
– No out of bounds checking! Don’t do bad stuff 
Root Descriptors
 Maps a contiguous range of descriptors to shader slots
– Can mix SRVs, UAVs, and CBVs arbitrarily
 Multiple descriptor tables can point to disjoint ranges
– Ex. Use separate parameters for different update frequencies
– Per-scene, per-material, per-instance, per-draw, etc.
– Same idea as with constant buffers, now applied to descriptors as well
Descriptor Tables
Root Signature Example
[Diagram: root signature; parameter 0 is a descriptor table over t1, b1, t4, t5 and parameter 1 is a descriptor table over u0, b2]
// CD3DX12_* are the helper wrappers from d3dx12.h that provide Init()/InitAs...()
CD3DX12_DESCRIPTOR_RANGE Param0Ranges[3];
Param0Ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 1); // t1
Param0Ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 1); // b1
Param0Ranges[2].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 2, 4); // t4-t5
CD3DX12_DESCRIPTOR_RANGE Param1Ranges[2];
Param1Ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_UAV, 1, 0); // u0
Param1Ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 2); // b2
// Visibility to all stages allows sharing binding tables
CD3DX12_ROOT_PARAMETER Param[4]; // slots 2 and 3 are filled in on the following slides
Param[0].InitAsDescriptorTable(3, Param0Ranges, D3D12_SHADER_VISIBILITY_ALL);
Param[1].InitAsDescriptorTable(2, Param1Ranges, D3D12_SHADER_VISIBILITY_ALL);
Root Signature Example
[Diagram: root signature; 0 = descriptor table (t1, b1, t4, t5), 1 = descriptor table (u0, b2), 2 = shader resource view (t0), 3 = uint4 constant (b0)]
...
Param[2].InitAsShaderResourceView(0); // t0 (the argument is the shader register)
Param[3].InitAsConstants(4, 0); // b0 (4x32-bit constants)
Root Signature Example (HLSL)
[Diagram: root signature; 0 = descriptor table (t1, b1, t4, t5), 1 = descriptor table (u0, b2), 2 = shader resource view (t0), 3 = uint4 constant (b0)]
DescriptorTable(SRV(t1), CBV(b1), SRV(t4, numDescriptors=2)),
DescriptorTable(UAV(u0), CBV(b2)),
SRV(t0),
RootConstants(num32BitConstants=4, b0)
Binding Example
[Diagram, built up over three slides: a descriptor heap (UAV, CBV, SRV, CBV, SRV, SRV) and a root signature with 0 = descriptor table (t1, b1, t4, t5), 1 = descriptor table (u0, b2), 2 = shader resource view (t0 SRV), 3 = uint4 constant (b0 = {1, 3, 3, 7})]
cmdList->SetGraphicsRootDescriptorTable(0, [srvGPUHandle]);
cmdList->SetGraphicsRootDescriptorTable(1, [uavGPUHandle]);
cmdList->SetGraphicsRootShaderResourceView(2, [srvGPUVirtualAddress]); // a root SRV takes a GPU virtual address
const UINT constants[4] = {1, 3, 3, 7};
cmdList->SetGraphicsRoot32BitConstants(3, 4, constants, 0); // 4 values, written at offset 0
 Root constants implemented with “push constants”
– Buffer that hardware uses to prepopulate EU registers
– When EU thread launches, values are immediately available
– Can be a GPU performance win vs. loading buffer data
 Root descriptors also use push constants
– Pointers passed as constants to the shader
– Data read through general memory path
 Descriptor tables use “binding table” hardware
– Each descriptor binding requires one binding table slot
Haswell/Broadwell Resource Binding
Haswell/Broadwell Descriptor Tables
[Diagram: root signature; 0 = descriptor table (t1, b1, t4, t5), 1 = descriptor table (u0, b2). The shader compiler emits the proper BTIs; the driver runtime fills in the binding tables.]
HLSL binding:              u0 b2 t1 b1 t4 t5 … …
Binding table index (BTI): 0  1  2  3  4  5  … …
~2-12 reserved slots and render targets
Haswell/Broadwell Descriptor Tables Example
Ring of Binding Tables
…
…
UAV
CBV
…
SRV
CBV
SRV
SRV
…
64KB
User descriptors
Up to ~1 million, each
32 bytes (Gen7.5)
64 bytes (Gen8)
Surface state base address
DWORD 0
DWORD 1
DWORD 2
…
DWORD 7
…
DWORD 16376
DWORD 16377
DWORD 16378
…
DWORD 16384
51
Haswell/Broadwell Descriptor Tables Example
[Figure: same layout as the previous slide (64KB ring of binding tables; user descriptors, up to ~1 million, each 32 bytes on Gen7.5 or 64 bytes on Gen8; surface state base address). A binding table with entries for t1, b1, t4, … now occupies part of the ring, and a binding table pointer is shown.]
Haswell/Broadwell Descriptor Tables Example
[Figure: same layout as the previous slide. A second binding table with entries for t1, b1, t4, … follows the first in the ring, and a binding table pointer is shown.]
Haswell/Broadwell Descriptor Tables Example
[Figure: same layout as the previous slide. The 64KB ring of binding tables is marked “Pipeline stall!”.]
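A rough way to see why the number of descriptors per table matters is to model the ring and count how often it wraps. The sketch below is a simplified model under stated assumptions, not the driver's actual allocator: `BindingTableRing` and `countWraps` are hypothetical names, each binding table entry is taken to be one DWORD as drawn in the figures, every draw is assumed to need a fresh table, and each wrap stands for the moment the slides label as a pipeline stall.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the 64KB binding table ring from the slides.
// One binding table entry is assumed to occupy one DWORD.
class BindingTableRing {
public:
    static constexpr uint32_t kCapacityDwords = 64 * 1024 / 4;  // 16384

    // Reserve `slots` consecutive DWORDs for one binding table and return
    // the offset of its first entry. A table never straddles the end of the
    // ring: if it does not fit, allocation restarts at offset 0 and the
    // wrap is counted.
    uint32_t allocate(uint32_t slots) {
        assert(slots > 0 && slots <= kCapacityDwords);
        if (head_ + slots > kCapacityDwords) {
            head_ = 0;
            ++wraps_;
        }
        const uint32_t offset = head_;
        head_ += slots;
        return offset;
    }

    uint32_t wraps() const { return wraps_; }

private:
    uint32_t head_ = 0;   // next free DWORD in the ring
    uint32_t wraps_ = 0;  // number of times allocation restarted at 0
};

// Count the wraps caused by `draws` draw calls that each need a fresh
// binding table of `slotsPerDraw` entries.
uint32_t countWraps(uint32_t draws, uint32_t slotsPerDraw) {
    BindingTableRing ring;
    for (uint32_t i = 0; i < draws; ++i) {
        ring.allocate(slotsPerDraw);
    }
    return ring.wraps();
}
```

In this model, 10,000 draws with 8 slots each wrap the ring 4 times, while the same draws with 64 slots each wrap it 39 times.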
Resource Binding Summary
• Minimize the “types” of parameters changed in the inner loop
– Descriptor tables, samplers, root descriptors, root constants
– Cost of changing 1 of type X ≈ cost of changing all of type X
• Minimize the number of descriptors referenced by tables
– Don’t leave dangling/unused descriptors in large ranges
– Most important for root signatures used in inner loops
– On future hardware, cost will depend only on the number of tables, not the number of descriptors
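One way to check the first guideline against a draw list is to count which parameter types actually change between consecutive draws. The sketch below is a hypothetical bookkeeping helper, not part of any API: `ParamType`, `DrawState`, and `countTypeChanges` are illustrative names, and each draw is reduced to one opaque ID per parameter type.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// The four parameter "types" listed on the slide.
enum ParamType : std::size_t {
    kDescriptorTables,
    kSamplers,
    kRootDescriptors,
    kRootConstants,
    kParamTypeCount
};

// Hypothetical per-draw state: one opaque ID per parameter type. Two draws
// with the same ID for a type need no update of that type between them.
using DrawState = std::array<uint32_t, kParamTypeCount>;
using ChangeCounts = std::array<uint32_t, kParamTypeCount>;

// For each parameter type, count how many consecutive draw pairs differ in
// that type. A type whose count stays at zero is never touched in the loop.
ChangeCounts countTypeChanges(const std::vector<DrawState>& draws) {
    ChangeCounts changes{};
    for (std::size_t i = 1; i < draws.size(); ++i) {
        for (std::size_t t = 0; t < kParamTypeCount; ++t) {
            if (draws[i][t] != draws[i - 1][t]) {
                ++changes[t];
            }
        }
    }
    return changes;
}
```

Since the slide puts the cost of changing one parameter of a type close to the cost of changing all of that type, the useful signal here is how many entries of the result are non-zero, more than how large each count is.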
Static Samplers
• Define sampler parameters right in the root signature
– Or right in the shader with HLSL root signature language
• No performance advantage on Haswell/Broadwell
– Driver places static samplers in the regular sampler heap
– Same as putting them there manually
• Use them if they are convenient
– Performance should never be worse
Summary
• DirectX 12 is a great fit for Intel hardware!
– Increased performance
– Increased power efficiency
• Already supported today on Haswell and Broadwell
– Will get even better in the future
Questions?
• Follow @DirectX12 and @IntelSoftware
• https://software.intel.com/en-us/gamedev
• http://blogs.msdn.com/directx
• Working on DirectX 12 on Intel?
– andrew.t.lauritzen@intel.com, @AndrewLauritzen

Efficient Rendering with DirectX* 12 on Intel® Graphics

  • 1. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Efficient Rendering with DirectX 12 on Intel Graphics Andrew Lauritzen Michael Apodaca
  • 2. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 2  Copyright © 2015 Intel Corporation. All rights reserved.  *Other names and brands may be claimed as the property of others.  INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.  A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.  Intel may make changes to specifications and product descriptions at any time, without notice.  All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. 
 Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.  Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.  Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.  Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.Intel.com/performance  Iris™ graphics is available on select systems. Consult your system manufacturer.  Intel, Intel Inside, the Intel logo, Intel Core and Iris are trademarks of Intel Corporation in the United States and other countries. Legal
  • 3. Decoder Cheat-sheet
      - Intel 4th Generation Core (i3/i5/i7 4xxx)
          - Code-named “Haswell”, Gen 7.5 GPU architecture
          - Intel HD Graphics 4400/4600/5000
          - Intel Iris Graphics 5100, Iris Pro Graphics 5200, …
      - Intel 5th Generation Core (i3/i5/i7 5xxx, Core M 5xxx)
          - Code-named “Broadwell”, Gen 8 GPU architecture
          - Intel HD Graphics 5300/5500/6000
          - Intel Iris Graphics 6100, …
  • 4. Why DirectX 12?
      - Performance
          - Improve CPU-bound games
          - Improve multi-core scaling
      - Power
          - Improve performance on power-constrained platforms
          - Improve heat and battery life
      - How?
          - Reduce CPU overhead of rendering
  • 5. Graphics APIs and Overhead
      - Most GPU vendors have complex drivers
          - Do lots of fancy optimizations on the fly
          - Costs CPU, but makes the GPU run faster
      - That’s OK, reviews compare GPUs using fast CPUs!
      - Drivers spawn threads that conflict with the application
          - Driver thread often consumes an entire core by itself
          - Plus another core for the game submission thread
          - Minimal multithreading beyond these two threads
  • 6. CPU/GPU Optimization Tradeoff
      - The tradeoff has recently become far more serious with SoCs
          - Even if not “CPU bound”, CPU/GPU share power
          - More CPU load => less GPU power/performance
      - Complex CPU optimizations are not a good idea…
          - Tax on all applications, even well-optimized ones
          - CPU work can take more power than it saves on the GPU!
          - Leads to lower overall performance
  • 7. Thinner Intel Graphics Driver
      - To address this, Intel wrote a much thinner DX11 driver
          - Introduced with Haswell
      - Big benefit to well-written applications
          - But does far less work to make poor ones run well
          - i.e. no redundant state elimination, minimal state-based shader recompiles, etc.
      - Still unavoidable CPU overhead due to API design
          - DirectX 12 addresses this
  • 8. DirectX 12 on Intel
      - Already significantly lower CPU overhead
      - Large increase in power efficiency
          - Power saved on the CPU can be given to the GPU
          - Applications can both run faster and use less power
      - Additional GPU optimization opportunities
          - i.e. stuff that we had to drop in the thinner driver
          - Pipeline state objects give the driver more context
  • 9. DirectX 12 Power and Performance
      [Figure: two comparisons, labeled "less power @ same performance" and "higher performance @ same power"]
      DirectX 12 can significantly reduce CPU power or improve performance
  • 10. Agenda
      - Commands and state
      - Memory
      - Resource binding
  • 11. Commands and State
  • 12. State in DirectX 11
      - DirectX 11 context is “stateful”
          - State grouped into moderately sized chunks
          - Rasterizer, depth/stencil, blend, etc.
      - Groupings do not always map perfectly to hardware
          - Ex. DirectX blend state != GPU blend state
          - Driver optimizations based on blend state + pixel shader
  • 13. Commands in DirectX 11
      - API functions cause one or more GPU commands to be added to the command buffer
      - Some GPU commands are deferred or conditional
          - Often lazily added at the next draw call
      [Figure: API calls appended to the Command Buffer]
      deviceCtxt->aaa();
      deviceCtxt->bbb();
      deviceCtxt->ccc();
      deviceCtxt->ddd();
  • 14. Commands in DirectX 11
      - At some point, the driver decides to commit the command buffer
          - If the command buffer fills, max buffered frames, Flush(), etc.
      - It’s passed to kernel mode and GPU addresses are patched
      - Then, it’s submitted to the GPU
      [Figure labels: Command Buffer, “DMA” Buffer, Validate (KMD), GPU Ring (head, tail)]
  • 15. DirectX 11 Deferred Contexts
      - Limited parallelism with a single context
      - Deferred contexts do not address the problems
          - CPU performance and cache issues with transient objects
          - State mismatch and lazy state setting
          - Inherited internal states
          - MAP_DISCARD renaming, hazard tracking, etc.
          - Non-trivial patching happens at submission time
      - Result: more overhead and limited parallelism
  • 16. Commands in DirectX 12
      - Each thread has its own command list and memory
          - Fully independent
          - Use ~1 command list/thread
      - Command lists are submitted to the GPU in arbitrary order
          - Minimal driver work done at submission time
          - Submit all command lists in a single API call where possible
      [Figure labels: Command Lists, GPU Ring (head, tail)]
  • 17. Commands in DirectX 12
      - While those commands are in flight, the application can record new commands
          - Can reuse command lists
          - Must use different memory
      - When the GPU finishes with memory, it can also be reused
          - App handles synchronization
          - Typical to put a fence at frame boundaries
          - Always reuse allocators!
      [Figure labels: Command Lists, GPU Ring (head, tail)]
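The fence-at-frame-boundary pattern on this slide can be sketched without any graphics API. The block below is a minimal model, not Direct3D 12 code: `MockFence`, `FrameSlot`, and `FrameRing` are hypothetical stand-ins for a GPU fence, a per-frame command allocator, and the application's ring of allocators. It only illustrates the rule that a slot's memory is recycled once the fence value signaled after its last submission has completed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a GPU fence: the GPU reports the highest value it has completed.
struct MockFence {
    std::uint64_t completed = 0;
};

// Stand-in for one frame's command memory (hypothetical; models a command allocator).
struct FrameSlot {
    std::uint64_t fenceValue = 0;  // value signaled after this slot's last submission
    int resetCount = 0;            // how many times the memory has been recycled
};

class FrameRing {
public:
    explicit FrameRing(std::size_t framesInFlight) : slots_(framesInFlight) {
        assert(framesInFlight > 0);
    }

    // Returns the slot for the next frame, or nullptr if the GPU may still be
    // reading it. A real application would wait on the fence instead of failing.
    FrameSlot* tryBeginFrame(const MockFence& fence) {
        FrameSlot& slot = slots_[next_ % slots_.size()];
        if (slot.fenceValue > fence.completed) return nullptr;
        ++slot.resetCount;  // safe to reuse the memory
        return &slot;
    }

    // Call after submitting the frame; returns the fence value to signal.
    std::uint64_t endFrame(FrameSlot& slot) {
        slot.fenceValue = ++lastSignaled_;
        ++next_;
        return slot.fenceValue;
    }

private:
    std::vector<FrameSlot> slots_;
    std::size_t next_ = 0;
    std::uint64_t lastSignaled_ = 0;
};
```

With two slots, the third frame cannot start until the fence for the first frame has completed, which is the "must use different memory" constraint from the slide.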
  • 18. State in DirectX 12
      - Immutable, monolithic pipeline state objects (PSOs)
          - Single object captures as much state as possible
          - Much lower chance of missing driver context
          - Allows link-time optimizations on shaders
      - No state inheritance between direct command lists
          - No API state or internal state inheritance (renaming, etc.)
          - Explicit barriers to handle hazards and resource transitions
  • 19. Pipeline State Objects
      - Create PSOs at initialization time
          - Multithread your initialization/PSO creation code!
          - Use PSO “libraries”
      - PSO changes are usually fairly cheap
          - Minimal CPU cost, some GPU cost
      - Some state sorting is still desirable
          - Turning shader stages on/off can cause pipeline stalls
  • 20. Bundles
      - Reusable command lists to further lower CPU overhead
      - Some minimal state inheritance is allowed
          - Some patching may occur at submission time
          - If you don’t need to inherit something, set it (again) in the bundle
      - Overhead is already very low in DirectX 12
          - Need ~10+ draws to make bundles a win on Haswell/Broadwell
          - Only consider bundles if you have lots of static draws that can’t reasonably be combined (via instancing or similar)
          - Don’t add any GPU overhead/indirections to enable bundles!
  • 21. Execute Indirect
      - DirectX 12 expands on DrawIndirect/DispatchIndirect
      - Command Signature
          - Indirect Argument Buffer Format
          - Draw/Dispatch calls
          - Resource Bindings
      - Indirect Argument Buffer
          - Dynamic parameters
      - Count Buffer
      [Figure: a command signature listing IB, VB, Draw, UAV, CBV, Draw, and an indirect argument buffer holding repeated records of IB Args, VB Args, Draw Args, UAV Args, CBV Args, Draw Args]
  • 22. Execute Indirect on Haswell/Broadwell
      - An internal compute shader patches the command list
          - Compiled at CreateCommandSignature
      - If there are no resource bindings, no compute shader is needed (legacy)
  • 23. Multi-engine on Haswell/Broadwell
      - DirectX 12 exposes multiple “queues” to the application
          - Graphics/compute, compute-only, copy, etc.
      - Graphics and compute are not simultaneous on Intel
          - Using separate queues is not a performance benefit
          - Consider doing both on the main queue
      - There is a simultaneous copy engine
          - … but it has fairly low throughput
          - Driver may implement large copies using the 3D engine
  • 24. Memory
  • 25. GPU Memory in WDDM 1.x
      - Previous APIs (ex. DirectX 11) hide a lot of details
          - GPU physical memory residency (if applicable)
          - GPU memory addressing (virtual, physical)
      - OS/driver manage residency and addressing
          - Ensures command buffers do not exceed hardware resources
          - Track referenced allocations, ensure resident
          - Allocate and patch GPU addresses
          - Major source of CPU overhead!
      - Applications try not to over-commit “GPU memory”
  • 26. GPU Memory Residency in WDDM 2.0
      - Directly exposes control over physical residency
          - Memory referenced by the GPU must be made “resident”
      - No dedicated video memory on Intel processors
          - “Resident” resources are allocated out of DRAM
      - OS uses up to 45% of DRAM for graphics applications
          - Ex. 1.8GB on a 4GB system, 3.6GB on an 8GB system, …
          - Global limit across the system, not per-process
          - Rest is reserved for regular CPU/OS use
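The 45% figure above is easy to turn into arithmetic. The sketch below is an illustration only: `graphicsBudgetBytes` and `fitsInBudget` are hypothetical helper names, and the fixed 45% ratio is taken from the slide as an upper bound. A real application would not hard-code it; the current budget can be queried at run time, for example through DXGI's `IDXGIAdapter3::QueryVideoMemoryInfo`.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;

// Upper bound on the system-wide graphics residency budget, using the
// "up to 45% of DRAM" figure from the slide (an assumption, not a queried value).
constexpr std::uint64_t graphicsBudgetBytes(std::uint64_t dramBytes) {
    return dramBytes * 45 / 100;
}

// True if requestBytes more resident memory still fits under that bound, given
// what is already resident across all processes (the limit is global).
constexpr bool fitsInBudget(std::uint64_t dramBytes,
                            std::uint64_t residentBytes,
                            std::uint64_t requestBytes) {
    return residentBytes <= graphicsBudgetBytes(dramBytes) &&
           requestBytes <= graphicsBudgetBytes(dramBytes) - residentBytes;
}
```

For a 4 GiB system this gives 1843 MiB (about 1.8 GB) and for 8 GiB it gives 3686 MiB (about 3.6 GB), matching the examples on the slide.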
  • 27. Memory Residency Best Practices
      - Allocations are initially made resident
          - Resource creation will fail if residency budget is exceeded
      - OS will request that background apps trim residency
          - Misbehaved applications will be suspended from rendering
          - i.e. their GPU work will not be scheduled/make progress
      - Be a good citizen; provide a good user experience
          - Handle allocation failures and trim requests gracefully
          - Evict idle resources, trim streaming pools, remove detailed mips, drop quality settings, etc.
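One way to respond to a trim request is to evict the resources that have been idle longest until usage fits the new budget. The following is a minimal sketch of that policy under stated assumptions: `TrackedResource` and `trimToBudget` are hypothetical bookkeeping types, and least-recently-used ordering is just one reasonable choice. In a Direct3D 12 application, the line that clears `resident` is where a call such as `ID3D12Device::Evict` would go.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-resource bookkeeping kept by the application.
struct TrackedResource {
    int id;
    std::uint64_t bytes;
    std::uint64_t lastUsedFrame;
    bool resident;
};

// Evicts least-recently-used resident resources until usage fits budgetBytes.
// Resources used on currentFrame are never evicted. Returns bytes still resident.
std::uint64_t trimToBudget(std::vector<TrackedResource>& resources,
                           std::uint64_t budgetBytes,
                           std::uint64_t currentFrame) {
    std::uint64_t residentBytes = 0;
    std::vector<TrackedResource*> candidates;
    for (TrackedResource& r : resources) {
        if (!r.resident) continue;
        residentBytes += r.bytes;
        if (r.lastUsedFrame < currentFrame) candidates.push_back(&r);
    }
    // Oldest use first.
    std::sort(candidates.begin(), candidates.end(),
              [](const TrackedResource* a, const TrackedResource* b) {
                  return a->lastUsedFrame < b->lastUsedFrame;
              });
    for (TrackedResource* r : candidates) {
        if (residentBytes <= budgetBytes) break;
        r->resident = false;  // a real application would issue the eviction here
        residentBytes -= r->bytes;
    }
    return residentBytes;
}
```

If the return value is still above the budget, everything left is in use this frame, and the application has to fall back to the other options on the slide, such as dropping detailed mips or quality settings.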
  • 28. GPU Virtual Addresses in WDDM 2.0
      - Directly exposes per-process GPU virtual addresses
          - Can do pointer arithmetic, store in data structures, etc.
          - GPU virtual addresses allocated at resource allocation
          - Guaranteed to remain at the same address until release
          - Eliminates physical address patching overhead
      - Haswell has a limited GPU virtual address space (~2GB)
          - Subtly different than residency
  • 29. Typical Discrete GPU Memory
      [Figure: a dGPU and a CPU connected over PCI-E. GPU side: GPU virtual addresses, GPU Page Table, GPU DRAM (GDDR). CPU side: CPU virtual addresses, CPU Page Table, CPU DRAM (DDR).]
      Applications typically optimize for this
  • 30. Haswell Memory
      [Figure: GPU and CPU sharing one DRAM (DDR). GPU side: GPU virtual addresses, GPU Page Table. CPU side: CPU virtual addresses, CPU Page Table.]
      Haswell is limited by this
  • 31. Haswell GPU Virtual Address Limit
      - Not quite the same as limited GPU physical memory
          - Limit on the amount of DRAM visible to the GPU at once
          - All GPU-visible memory counts (upload/read-back heaps, …)
          - Even non-resident memory counts
      - In theory, managing only requires GPU page table edits
          - But GPU virtual addresses are visible in DirectX 12
          - Must reallocate/copy data
      - GPU VA exhaustion will fail at resource allocation
          - Again, please handle this gracefully!
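The difference between running out of GPU virtual address space and running out of residency can be shown with a toy model. Everything here is an assumption made for illustration: `GpuMemoryModel` is a hypothetical class, sizes are plain MiB counts, and the 2048 MiB limit simply mirrors the "~2GB" figure quoted for Haswell. The point is that evicting a resource frees residency but keeps its address range reserved, so allocation can fail even when nothing is resident.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (sizes in MiB): tracks GPU virtual address (VA) reservations
// separately from physical residency.
class GpuMemoryModel {
public:
    GpuMemoryModel(std::uint64_t vaLimit, std::uint64_t residencyBudget)
        : vaLimit_(vaLimit), budget_(residencyBudget) {}

    // Creating a resource reserves VA and makes it resident; either limit can fail it.
    bool allocate(std::uint64_t size) {
        if (size > vaLimit_ - vaReserved_) return false;  // VA exhausted
        if (size > budget_ - resident_) return false;     // residency budget exceeded
        vaReserved_ += size;
        resident_ += size;
        return true;
    }

    // Eviction frees physical memory but the address range stays reserved.
    void evict(std::uint64_t size) {
        assert(size <= resident_);
        resident_ -= size;
    }

    // Only releasing an (already evicted) resource returns its VA range.
    void releaseEvicted(std::uint64_t size) {
        assert(size <= vaReserved_ - resident_);
        vaReserved_ -= size;
    }

    std::uint64_t vaReserved() const { return vaReserved_; }
    std::uint64_t resident() const { return resident_; }

private:
    std::uint64_t vaLimit_;
    std::uint64_t budget_;
    std::uint64_t vaReserved_ = 0;
    std::uint64_t resident_ = 0;
};
```

In this model, two evicted 1024 MiB resources fill a 2048 MiB address space completely, and a further 1 MiB allocation fails until one of them is released.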
  • 32. Broadwell GPU Virtual Addresses
      - Good news: no longer an issue on Broadwell
          - Large GPU virtual address space (same as CPU)
      - Memory-related public service announcement:
          - Don’t make/ship 32-bit (CPU) D3D12 applications!
          - Even if it works today…
          - Thank me later
  • 33. Resource Binding
  • 34. Resource Descriptors
      - Resource views are effectively just a small structure
          - Metadata and a pointer to memory (usually ~32-64 bytes)
          - Stuff like texture dimensions, format, layout, etc.
      - Direct3D 12 directly exposes these “descriptors”
          - Independent from the actual memory they reference
          - Can be created/copied/etc. freely
          - Application must ensure no dangling pointers
  • 35. Resource Descriptors
      - Not an API object; manipulated directly by the application
          - Descriptor size query-able by application
          - Can be created at any time; free-threaded API call
      - Descriptors are put into “heaps” (arrays)
          - CBVs, SRVs and UAVs can be mixed in one heap
          - Samplers in a separate heap
          - Can have one or more of each type, GPU visible or CPU only
      - Changing heaps is expensive (pipeline flush)
          - Ideally use a single heap of each type (sampler, CBV/SRV/UAV)
          - Exception: changing heaps at command list boundary is “free”
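Because a descriptor heap is just an array of fixed-size entries, addressing a slot is plain pointer arithmetic: heap start plus index times descriptor size. The sketch below models that with hypothetical types (`CpuHandle`, `DescriptorHeapModel`) rather than the real D3D12 structures. The increment sizes used in the usage note are illustrative; in Direct3D 12 the size is queried from the device (`ID3D12Device::GetDescriptorHandleIncrementSize`) and should never be hard-coded.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a CPU descriptor handle: just an address.
struct CpuHandle {
    std::size_t ptr;
};

// Hypothetical model of a descriptor heap: an array of fixed-size slots.
struct DescriptorHeapModel {
    CpuHandle start;             // address of slot 0
    unsigned incrementSize;      // bytes per descriptor, queried at run time in practice
    unsigned capacity;           // number of slots in the heap

    // Handle of the slot at the given index.
    CpuHandle at(unsigned index) const {
        assert(index < capacity);
        return CpuHandle{start.ptr + static_cast<std::size_t>(index) * incrementSize};
    }
};
```

With a 32-byte increment, slot 3 of a heap starting at 0x1000 sits at 0x1060; with a 64-byte increment the same slot sits at 0x10C0, which is why code that assumes a fixed size breaks across hardware generations.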
  • 36. Descriptors Example
      [Figure: Descriptor Heap holding UAV, CBV, SRV, CBV, SRV, SRV; each call below writes one slot]
      D3D12_UNORDERED_ACCESS_VIEW_DESC uavDesc = { ... };
      device->CreateUnorderedAccessView(res, nullptr, &uavDesc, uavHandle); // nullptr: no counter resource
      D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc = { ... };
      device->CreateConstantBufferView(&cbvDesc, cbvHandle); // buffer address is part of cbvDesc
      ...
  • 37. Root Signature
      - Think of it like a function signature for your shader(s)
      - Defines parameters and how they map to shader inputs
          - Root constants (data: zero indirections)
          - Root descriptors (pointer to data: one indirection)
          - Descriptor tables (pointer to descriptors: two indirections)
      - Each parameter can be visible to one or more shader stages
      - Parameters are “versioned” by implementation/hardware
          - This is the single place the “stream” of versions is managed
          - Maximum size is very small to avoid abuse
  • 38. Root Parameter Indirections
      [Figure: Root Signature with parameter 0 = Root Constants, parameter 1 = Root Descriptor, parameter 2 = Descriptor Table; the root descriptor points into Memory, the descriptor table points into the Descriptor Heap (… UAV CBV …), whose descriptors point into Memory]
  • 39. Root Constants
      - Pass a small number of constants directly to shaders
          - Bound to shader as a single constant buffer
      - Useful for simple indirections; draw ID, material ID, etc.
          - Avoids creating versioned memory, descriptor, heap, etc.
          - Shader can use to look up into arbitrary data structures
  • 40. Root Descriptors
      - Stores a single descriptor directly as a root parameter
          - No need to burn through descriptor heap space
          - Most useful for a descriptor that changes ~ every draw
      - Can only reference “raw data”
          - Only buffer resources (CBVs, SRVs/UAVs of buffers)
          - No type conversions (i.e. only float/uint/sint components)
          - i.e. it’s just a pointer to memory
          - No out of bounds checking! Don’t do bad stuff
  • 41. Descriptor Tables
      - Maps a contiguous range of descriptors to shader slots
          - Can mix SRVs, UAVs, and CBVs arbitrarily
      - Multiple descriptor tables can point to disjoint ranges
          - Ex. Use separate parameters for different update frequencies
          - Per-scene, per-material, per-instance, per-draw, etc.
          - Similar to constant buffers, now for descriptors too
  • 42. Root Signature Example
      [Figure: Root Signature with parameter 0 = Descriptor Table (t1, b1, t4, t5) and parameter 1 = Descriptor Table (u0, b2)]
      CD3DX12_DESCRIPTOR_RANGE Param0Ranges[3];
      Param0Ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 1); // t1
      Param0Ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 1); // b1
      Param0Ranges[2].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 2, 4); // t4-t5
      CD3DX12_DESCRIPTOR_RANGE Param1Ranges[2];
      Param1Ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_UAV, 1, 0); // u0
      Param1Ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 2); // b2
      // Visibility to all stages allows sharing binding tables
      CD3DX12_ROOT_PARAMETER Param[4]; // entries 2 and 3 are filled in on the following slide
      Param[0].InitAsDescriptorTable(3, Param0Ranges, D3D12_SHADER_VISIBILITY_ALL);
      Param[1].InitAsDescriptorTable(2, Param1Ranges, D3D12_SHADER_VISIBILITY_ALL);
  • 43. Root Signature Example
      [Figure: Root Signature with parameter 0 = Descriptor Table (t1, b1, t4, t5), parameter 1 = Descriptor Table (u0, b2), parameter 2 = Shader Resource View (t0), parameter 3 = uint4 Constant (b0)]
      ...
      Param[2].InitAsShaderResourceView(0); // t0
      Param[3].InitAsConstants(4, 0); // b0 (4x32-bit constants)
  • 44. Root Signature Example (continued)
      [Figure: the same four-parameter root signature as on the previous slide, with t0 and b0 shown bound to parameters 2 and 3]
  • 45. Root Signature Example (HLSL)
      [Figure: Root Signature with parameter 0 = Descriptor Table (t1, b1, t4, t5), parameter 1 = Descriptor Table (u0, b2), parameter 2 = Shader Resource View (t0), parameter 3 = uint4 Constant (b0)]
      DescriptorTable(SRV(t1), CBV(b1), SRV(t4, numDescriptors=2)),
      DescriptorTable(UAV(u0), CBV(b2)),
      SRV(t0),
      RootConstants(num32BitConstants=4, b0)
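The root signature is deliberately small, so it helps to count what an example like the one above costs. The sketch below assumes the limits documented for Direct3D 12 as shipped: a 64-DWORD maximum, with a descriptor table costing 1 DWORD, a root descriptor 2 DWORDs, and each 32-bit root constant 1 DWORD. The `RootParam` type and `rootSignatureDwords` function are hypothetical names introduced for illustration, not part of any API.

```cpp
#include <cassert>
#include <vector>

constexpr unsigned kMaxRootSignatureDwords = 64;

enum class RootParamKind { DescriptorTable, RootDescriptor, RootConstants };

// Hypothetical description of one root parameter.
struct RootParam {
    RootParamKind kind;
    unsigned num32BitConstants;  // only meaningful for RootConstants
};

// Total root signature size in DWORDs under the cost rules described above.
unsigned rootSignatureDwords(const std::vector<RootParam>& params) {
    unsigned total = 0;
    for (const RootParam& p : params) {
        switch (p.kind) {
            case RootParamKind::DescriptorTable: total += 1; break;  // offset into a heap
            case RootParamKind::RootDescriptor:  total += 2; break;  // 64-bit GPU address
            case RootParamKind::RootConstants:   total += p.num32BitConstants; break;
        }
    }
    return total;
}
```

The example signature (two descriptor tables, one root SRV, four root constants) comes to 1 + 1 + 2 + 4 = 8 DWORDs, well under the limit.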
  • 46. Binding Example
      [Figure: Descriptor Heap (UAV, CBV, SRV, CBV, SRV, SRV) and Root Signature with parameter 0 = Descriptor Table (t1, b1, t4, t5), parameter 1 = Descriptor Table (u0, b2), parameter 2 = Shader Resource View (t0), parameter 3 = uint4 Constant (b0); parameters 0 and 1 point at ranges in the heap]
      cmdList->SetGraphicsRootDescriptorTable(0, srvGPUHandle);
      cmdList->SetGraphicsRootDescriptorTable(1, uavGPUHandle);
  • 47. Binding Example
      [Figure: same Descriptor Heap and Root Signature as on the previous slide; parameter 2 (t0) now references an SRV directly]
      cmdList->SetGraphicsRootDescriptorTable(0, srvGPUHandle);
      cmdList->SetGraphicsRootDescriptorTable(1, uavGPUHandle);
      cmdList->SetGraphicsRootShaderResourceView(2, srvGPUVirtualAddress); // root SRV takes a GPU virtual address
  • 48. Binding Example
      [Figure: same Descriptor Heap and Root Signature as on the previous slides; parameter 3 (b0) now holds {1, 3, 3, 7}]
      cmdList->SetGraphicsRootDescriptorTable(0, srvGPUHandle);
      cmdList->SetGraphicsRootDescriptorTable(1, uavGPUHandle);
      cmdList->SetGraphicsRootShaderResourceView(2, srvGPUVirtualAddress); // root SRV takes a GPU virtual address
      const UINT values[4] = {1, 3, 3, 7};
      cmdList->SetGraphicsRoot32BitConstants(3, 4, values, 0); // 4 values, starting at offset 0
  • 49. Haswell/Broadwell Resource Binding
      - Root constants implemented with “push constants”
          - Buffer that hardware uses to prepopulate EU registers
          - When EU thread launches, values are immediately available
          - Can be a GPU performance win vs. loading buffer data
      - Root descriptors also use push constants
          - Pointers passed as constants to the shader
          - Data read through general memory path
      - Descriptor tables use “binding table” hardware
          - Each descriptor binding requires one binding table slot
  • 50. Haswell/Broadwell Descriptor Tables
      [Figure: Root Signature with parameter 0 = Descriptor Table (t1, b1, t4, t5) and parameter 1 = Descriptor Table (u0, b2), mapped onto binding table slots]
      - HLSL binding: u0, b2, t1, b1, t4, t5, …
      - Binding table index (BTI): 0, 1, 2, 3, 4, 5, …
      - Shader compiler: emit proper BTIs
      - Driver runtime: fill in binding tables
      - ~2-12 reserved slots and render targets
  • 51. Haswell/Broadwell Descriptor Tables Example
      [Figure: a 64KB Ring of Binding Tables (entries labeled DWORD 0, DWORD 1, DWORD 2, … DWORD 7, … DWORD 16376, DWORD 16377, DWORD 16378, … DWORD 16384) placed next to the user descriptors (… UAV, CBV, … SRV, CBV, SRV, SRV, …), both relative to the surface state base address]
      - User descriptors: up to ~1 million, each 32 bytes (Gen7.5) or 64 bytes (Gen8)
  • 52. Haswell/Broadwell Descriptor Tables Example
      [Figure: same layout as the previous slide; a binding table (t1, b1, t4, …) has been written into the 64KB ring, and the binding table pointer refers to it. Its entries reference user descriptors relative to the surface state base address.]
      - User descriptors: up to ~1 million, each 32 bytes (Gen7.5) or 64 bytes (Gen8)
  • 53. Haswell/Broadwell Descriptor Tables Example
      [Figure: same layout as the previous slide; a second binding table (t1, b1, t4, …) has been written into the 64KB ring after the first, and the binding table pointer refers to it]
      - User descriptors: up to ~1 million, each 32 bytes (Gen7.5) or 64 bytes (Gen8)
  • 54. Haswell/Broadwell Descriptor Tables Example
      [Figure: the first 64KB Ring of Binding Tables is full and a second 64KB ring is in use; the switch is marked "Pipeline stall!"]
      - User descriptors: up to ~1 million, each 32 bytes (Gen7.5) or 64 bytes (Gen8)
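The cost illustrated in this build sequence can be approximated with a small simulation. The model below rests on assumptions drawn from the figure rather than from driver documentation: the ring is 64KB of one-DWORD slots, each draw writes a fresh binding table with one slot per descriptor, tables do not straddle the end of the ring, and every wrap costs one pipeline stall. `BindingTableRing` and `stallsForDraws` are hypothetical names; alignment and reserved slots are ignored.

```cpp
#include <cassert>

constexpr unsigned kRingBytes = 64 * 1024;
constexpr unsigned kRingSlots = kRingBytes / 4;  // 16384 one-DWORD slots

// Toy model of the ring of binding tables.
class BindingTableRing {
public:
    // Reserves `slots` consecutive entries for one binding table.
    // Returns true if the ring had to wrap (modeled here as a pipeline stall).
    bool allocate(unsigned slots) {
        assert(slots > 0 && slots <= kRingSlots);
        bool wrapped = false;
        if (head_ + slots > kRingSlots) {
            head_ = 0;
            wrapped = true;
        }
        head_ += slots;
        return wrapped;
    }

private:
    unsigned head_ = 0;
};

// Number of modeled stalls when every draw binds a table of slotsPerDraw entries.
unsigned stallsForDraws(unsigned draws, unsigned slotsPerDraw) {
    BindingTableRing ring;
    unsigned stalls = 0;
    for (unsigned i = 0; i < draws; ++i) {
        if (ring.allocate(slotsPerDraw)) ++stalls;
    }
    return stalls;
}
```

Under these assumptions, 4096 draws with 8 descriptors each wrap the ring once, while the same draws with 32 descriptors each wrap it seven times, which is one way to see why unused descriptors in large table ranges are worth trimming on this hardware.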
  • 55. Resource Binding Summary
      - Minimize “types” of parameters changed in inner loop
          - Descriptor tables, samplers, root descriptors, root constants
          - Cost of changing 1 of type X ~ cost of changing all of type X
      - Minimize # descriptors referenced by tables
          - Don’t leave dangling/unused descriptors in large ranges
          - Most important for root signatures used in inner loops
          - Future hardware will only cost # tables, not # descriptors
  • 56. Static Samplers
      - Define sampler parameters right in the root signature
          - Or right in the shader with HLSL root signature language
      - No performance advantage on Haswell/Broadwell
          - Driver places static samplers in the regular sampler heap
          - Same as putting them there manually
      - Use them if they are convenient
          - Performance should never be worse
  • 57. Summary
  • 58. Summary
      - DirectX 12 is a great fit for Intel hardware!
          - Increased performance
          - Increased power efficiency
      - Already supported today on Haswell and Broadwell
          - Will get even better in the future
  • 59. Questions?
      - Follow @DirectX12 and @IntelSoftware
      - https://software.intel.com/en-us/gamedev
      - http://blogs.msdn.com/directx
      - Working on DirectX 12 on Intel?
          - andrew.t.lauritzen@intel.com, @AndrewLauritzen