2. 2
Mark Kilgard
• Principal System Software Engineer
OpenGL driver and API evolution
Cg (“C for graphics”) shading language
GPU-accelerated path rendering & web browser
rendering
• OpenGL Utility Toolkit (GLUT) implementer
• Specified and implemented much of OpenGL
• Author of OpenGL for the X Window System
• Co-author of Cg Tutorial
• Worked on OpenGL for over 25 years
My Background
5. 5
OpenGL Codebase Leverage
Same driver code base supports multiple APIs
OpenGL for Embedded,
Mobile, and Web
Multi-vendor, explicit, low-level graphics
from Khronos
6. 6
NVIDIA’s Shading Compiler Even More Leveraged
Various Direct3D versions
3D APIs based on NVIDIA OpenGL driver code base
NVIDIA Shading Compiler code base
Apple’s proprietary
graphics API
Proprietary console API
7. 7
Still the One Truly Common & Open 3D API
OS X
Linux
FreeBSD
Solaris
Android
Windows
Embedded
Designs
8. 8
NVIDIA OpenGL in 2017 Provides
OpenGL’s Maximally Available Superset
OpenGL 4.6
Pascal
Extensions
2015 ARB extensions
OpenGL 4.5
Core
Maxwell
Extensions
Legacy EXT & Other
Compatibility Extensions
OpenGL Complete
Compatibility
Path Rendering
Multi-GPU.
SLI
Approaching Zero
Driver Overhead
NVIDIA Multi-generation
GPU Initiatives
DirectX inter-op
Vulkan inter-op
ES Enhancements
Full OpenGL
ES 3.2
Khronos Standard
Expected Compatibility
NVIDIA Initiatives
GPU Generation Features
9. 9
OpenGL’s Recent Advancements
2014 2015 2016
New ARB Extensions
3 standard extensions, beyond 4.5
• ARB_sparse_buffer
• ARB_pipeline_statistics_query
• ARB_transform_feedback_overflow_query
Maxwell Extensions
• Novel graphics features
• 14 new extensions
• Global Illumination &
Vector Graphics focus
13. 13
For those tracking birthdays...
Then celebrating OpenGL 4.3 Now celebrating OpenGL 4.6
14. 14
Need a Refresher on 2014, 2015, and 2016 OpenGL?
• Honestly, NVIDIA exposed lots of functionality in the last 3 years
Available @ http://www.slideshare.net/Mark_Kilgard
15. 15
Introducing OpenGL 4.6
• Big feature: SPIR-V support required
• SPIR-V = standard intermediate language for parallel compute and graphics
• Vulkan 1.0 standard requires expressing shaders in SPIR-V
• Allows content creators to simplify their shader authoring and management pipelines
• Previously this was an optional ARB extension, not required for 4.5
• Includes NEW ARB_spirv_extensions to extend SPIR-V support
• Genius of AND: OpenGL 4.6 allows either GLSL or SPIR-V, your choice
• Technically, NVIDIA’s Vulkan 1.0 allows using GLSL directly via an extension
• Additional new ARB extensions bundled in OpenGL 4.6 for
• Improving performance
• Improving rendering quality
• Resolving outstanding Intellectual Property (IP) issues
support not built-in
16. 16
OpenGL extension exposing Khronos intermediate
language for parallel compute and graphics
Khronos extension for OpenGL + SPIR-V
ARB extension announced last year
July 22, 2016
Allows compiled SPIR-V code to be passed directly to OpenGL driver
Accepts SPIR-V output from open source Glslang Khronos Reference compiler
https://github.com/KhronosGroup/glslang
Other compilers can target SPIR-V too
Khronos standard extension ARB_gl_spirv
17. 17
SPIR-V Ecosystem
• SPIR-V
• Khronos defined and controlled cross-API intermediate language
• Native support for graphics and parallel constructs
• 32-bit word stream
• Extensible and easily parsed
• Retains data object and control flow information for effective code generation and translation
[Diagram labels: GLSL, HLSL, OpenCL C, OpenCL C++, third party kernel and shader languages, LLVM, SPIR-V Validator, SPIR-V (Dis)Assembler, LLVM to SPIR-V bi-directional translator, open source C++ front-end released, other intermediate forms, IHV driver runtimes]
[Diagram legend: "Khronos has open sourced these tools and translators" / "Khronos plans to open source these tools soon"]
https://github.com/KhronosGroup/SPIR/tree/spirv-1.1
OpenGL support NEW with ARB_gl_spirv, standard in OpenGL 4.6
18. 18
NVIDIA’s SIGGRAPH
Driver Update
• NVIDIA historically releases a “developer” driver at SIGGRAPH with support for all Khronos
standards announced at SIGGRAPH
• This year too
• Monday (July 31, 2017) NVIDIA put out a new SIGGRAPH driver
• OpenGL 4.6 (beta, expected to pass 4.6 Conformance when available)
• Multi-vendor (EXT) interoperability extensions
• Finally portable interoperability between OpenGL, Vulkan, OpenCL, etc.
• Generic: EXT_memory_object, EXT_semaphore
• Windows: EXT_memory_object_win32, EXT_win32_keyed_mutex, EXT_semaphore_win32
• Unix: EXT_memory_object_fd, EXT_semaphore_fd
• Other new extensions
• NV_blend_minmax_factor, consistent with AMD_blend_minmax_factor
• Fill in missing ES functionality gaps
• EXT_clear_texture, EXT_conservative_depth, EXT_shader_group_vote, EXT_texture_compression_bptc,
EXT_texture_sRGB_R8, EXT_draw_transform_feedback, OES_viewport_array
• EXT_clip_cull_distance, ES support for clip planes & cull distances
• EXT_protected_textures (Tegra & ES only) for protected content
• For Windows and Linux operating systems
• Also Vulkan improvements
OpenGL 4.6 + Multi-vendor Interop + Vulkan Updates & More
https://developer.nvidia.com/opengl-driver
20. 20
What OpenGL 4.6 Packages Together
• OpenGL evolves by bundling extensions as a core version update
• OpenGL 4.6 = everything in 4.5 plus these extensions
• ARB_indirect_parameters
• ARB_pipeline_statistics_query
• ARB_polygon_offset_clamp
• KHR_no_error
• ARB_shader_atomic_counter_ops (just extends OpenGL Shading Language)
• ARB_shader_draw_parameters
• ARB_shader_group_vote (just extends OpenGL Shading Language)
• ARB_gl_spirv
• ARB_spirv_extensions
• ARB_texture_filter_anisotropic
• ARB_transform_feedback_overflow_query
• Now you can code for this functionality without ARB or EXT suffixing!
The one technically “brand new” extension;
other 4.6 functionality already proven & public
21. 21
ARB_indirect_parameters: Intro & Review
•Evolving capability in OpenGL 4.x
• General idea: allow the GPU to generate its own rendering work
• Part of AZDO philosophy
• AZDO = Approaching Zero Driver Overhead
• Big idea: If GPU generates its own work, the driver overhead on the CPU diminishes
• Example: compute shader generates sets of meshes; then renders those meshes
• But we don’t want the GPU to “wait” for the CPU to orchestrate this effort
•Builds on OpenGL 4.0 and 4.3’s improvements
• 4.0 added indirect draws: instanced draw call’s parameters sourced from GPU buffer
• 4.3 added multiple indirect draws: one GL command launched N indirect draws
•OpenGL 4.6’s breakthrough: ARB_indirect_parameters
• Now the count of multiple indirect draw batches itself can be sourced from the GPU
22. 22
Original Ways to Draw
• Two primary ways to draw with vertex arrays
• glDrawElements
• Accepts an array of vertex indexes
• glDrawArrays
• Accepts a sequential range of indexes
• OpenGL 3.1 added instanced versions
• glDrawElementsInstanced
• glDrawArraysInstanced
• Includes “instance count” parameter
• Repeats each draw “instance count” times, changing gl_InstanceID each iteration
24. 24
Multi Draw Arrays
• glMultiDrawArrays & glMultiDrawElements
• Same as before, but loop over glDrawArrays or glDrawElements
• Primitive count parameter says how many iterations
• Each iteration sources non-mode parameters from CPU arrays
• Fundamentally not more powerful than you writing the loop in your CPU code
• But establishes a useful pattern for the future...
25. 25
Instancing
• GPU draws the same primitive topology, N times
• Shader or vertex attribute usage can transform & shade each instance differently
• Loops to output a single set of draw indices multiple times
• Each iteration outputs a different instance
• GLSL shaders can access gl_InstanceID to behave differently per instance
• Instancing alternative to using gl_InstanceID in your shader
• glVertexAttribDivisor gives a vertex attribute array a divisor
• When divisor is non-zero, floor(instance / divisor) is used as the index into this array
• Common usage: when divisor is 1 for a vertex attribute array, the instance ID is used as the index
• Effectively enables per-instance vertex arrays
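The divisor rule above can be sketched on the CPU. This is only an illustration of the indexing, not driver code: the names attrib_element_index and fetch_attrib are hypothetical, and base-instance offsets are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: which array element a vertex attribute reads.
 * divisor == 0: the attribute advances per vertex (uses the vertex index).
 * divisor != 0: the attribute advances once every `divisor` instances. */
uint32_t attrib_element_index(uint32_t vertex, uint32_t instance, uint32_t divisor)
{
    if (divisor == 0)
        return vertex;
    return instance / divisor;  /* unsigned division is floor(instance / divisor) */
}

/* Hypothetical helper: fetch one float attribute using the rule above. */
float fetch_attrib(const float *array, size_t len,
                   uint32_t vertex, uint32_t instance, uint32_t divisor)
{
    uint32_t i = attrib_element_index(vertex, instance, divisor);
    assert(i < len);            /* the application must size the array to fit */
    return array[i];
}
```

With a divisor of 1, every vertex of instance 2 reads element 2 of the array, which is what makes per-instance colors or translations possible.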
26. 26
Power of Instancing
• Vertex arrays with a single object mesh can
render N distinct instances from a single GL
command
• Example image shows
• Hundreds of instances
• Draw from single mesh
• Each instance has its own color & translation
• Observations
• GPU reads instanced vertex attributes
• But CPU still launches the N instances
Source: In2GPU
27. 27
Draw Indirect (OpenGL 4.0)
• Conventional GL draw calls
• Require directly passing parameters to each GL draw call to find the indices to source
• Direct parameter passing means CPU supplies all the draw parameters
• Causes CPU overhead on each draw
• Solution: Draw Indirect
• Sources each batch of draw arrays or draw elements parameters from a GPU buffer
• Parameters, except for mode, accessed from GL_DRAW_INDIRECT_BUFFER binding
• Big advantage
• GPU can generate draw batches itself
• Say with compute shaders
• Means GPU can feed itself
28. 28
Draw Indirect Buffer Layout
• glDrawArraysIndirect
• Takes: (GLenum mode, const void *indirect)
• indirect is GPU buffer offset to four 32-bit words
• Mimics calling
    glDrawArraysInstanced(mode, cmd->first,
        cmd->count, cmd->primCount);
    struct DrawArraysIndirectCommand {
        GLuint count;
        GLuint primCount;
        GLuint first;
        GLuint reservedMustBeZero;
    };
• glDrawElementsIndirect
• Takes: (GLenum mode, GLenum type, const void *indirect)
• indirect is GPU buffer offset to five 32-bit words
• Mimics calling
    glDrawElementsInstancedBaseVertex(mode,
        cmd->count, type, cmd->firstIndex * sizeof-type,
        cmd->primCount, cmd->baseVertex);
    struct DrawElementsIndirectCommand {
        GLuint count;
        GLuint primCount;
        GLuint firstIndex;
        GLint baseVertex;
        GLuint reservedMustBeZero;
    };
• BUT cmd pointer indirection happens on the GPU, sourced from a GL buffer object
Important: These structures are read by the GPU from GPU buffers
29. 29
Multi Draw Indirect (OpenGL 4.3)
• Now a single GL command can launch multiple draw indirect operations
• Takes a primitive count (N) for number of draw indirects
• Performs N draw indirect operations
• Each operation’s parameters are read from draw indirect buffer binding
• Stride parameter
• glMultiDrawArraysIndirect & glMultiDrawElementsIndirect
• Single CPU command launches N draw indirect operations
• All the parameters for all the draw indirect operations sourced by GPU
• Very high leverage: tiny CPU effort can launch enormous amount of rendering
30. 30
ARB_indirect_parameters
• Yet another new buffer binding
• glBindBuffer(GL_PARAMETER_BUFFER, buffer);
• Buffer source for reading the indirect draw count
• Two new commands
• glMultiDrawArraysIndirectCount
• glMultiDrawElementsIndirectCount
• Like glMultiDraw{Arrays/Elements}Indirect except
• NEW drawcount parameter is a byte offset into the NEW current parameter buffer
• The value stored at that offset in the parameter buffer is the actual draw count
• Count clamped by maxdrawcount parameter
• What’s better about OpenGL 4.6 version?
• Free of ARB suffixes in OpenGL 4.6
31. 31
ARB_indirect_parameters Usage Scenario
• Correctly-ordered blended dynamic particle system
• Particles are semi-opaque 3D models, not just points
• OpenGL compute shader computes particle interactions & what to render
• Incrementally update particle positions & spin
• Cull particles outside current view
• Back-to-front sort of remaining viewable semi-opaque 3D models
• Write out ordered, un-culled multi draw indirect to GL_DRAW_INDIRECT_BUFFER
• Write out total of un-culled draw indirect count to GL_PARAMETER_BUFFER
• Single glMultiDrawElementsIndirectCount command draws particles
32. 32
ARB_pipeline_statistics_query
•New query types
• Shares same API initially used for occlusion queries
• glBeginQuery, glEndQuery, glGetQueryiv, glGenQueries, glDeleteQueries
• Original occlusion queries just returned samples passed
• Prior extensions added queries for transform feedback, conservative rasterization
•Now extended to return rendering statistics throughout the pipeline
• Shader invocation counts
• How many primitives pass through different points in rendering pipeline
•Useful for performance analysis
• Without this functionality, very difficult to accurately know how much
rendering work you are really creating
• Particularly for modern OpenGL usage
•Comparable to statistics available in Direct3D 11
• Compare with D3D11_QUERY_DATA_PIPELINE_STATISTICS
33. 33
Available Statistics
Query token                             Queried statistic
GL_VERTICES_SUBMITTED                   # of vertices issued to OpenGL
GL_PRIMITIVES_SUBMITTED                 # of primitives issued to OpenGL
GL_VERTEX_SHADER_INVOCATIONS            # of times a vertex shader invoked
GL_TESS_CONTROL_SHADER_PATCHES          # of patches processed by a tessellation control shader
GL_TESS_EVALUATION_SHADER_INVOCATIONS   # of times a tessellation evaluation shader invoked
GL_GEOMETRY_SHADER_INVOCATIONS          # of times a geometry shader invoked
GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED   # of primitives emitted by a geometry shader
GL_FRAGMENT_SHADER_INVOCATIONS          # of times a fragment shader invoked
GL_COMPUTE_SHADER_INVOCATIONS           # of times a compute shader invoked
GL_CLIPPING_INPUT_PRIMITIVES            # of primitives that entered primitive clipping
GL_CLIPPING_OUTPUT_PRIMITIVES           # of primitives output by primitive clipping
34. 34
Simple Example Usage
• Creating a query object
• GLuint query_object;
• glGenQueries(1, &query_object);
• Begin a query, do work, and end the query’s interval
• glBeginQuery(GL_FRAGMENT_SHADER_INVOCATIONS, query_object);
• renderLotsOfStuff(); // placeholder for your draw calls
• glEndQuery(GL_FRAGMENT_SHADER_INVOCATIONS);
• Later read back to the CPU the query object’s result
• Ideally not immediately after the rendering so retrieving the query doesn’t stall the pipeline!
• GLuint64 query_result;
• glGetQueryObjectui64v(query_object, GL_QUERY_RESULT, &query_result);
• When done with the query object
• glDeleteQueries(1, &query_object);
• Alternatively write query results into a buffer...
35. 35
• Create multiple query objects
• const GLint num_results = 2; // could be larger!
• GLuint query_object[2];
• glGenQueries(num_results, query_object);
• Create GPU buffer object for writing query results into
• GLuint query_buffer_object;
• glGenBuffers(1, &query_buffer_object);
• glBindBuffer(GL_QUERY_BUFFER, query_buffer_object);
• glBufferData(GL_QUERY_BUFFER, num_results*sizeof(GLuint64), NULL, GL_DYNAMIC_READ);
• Begin a query, do work, end the query’s interval, and write query results to query buffer offsets
• glBeginQuery(GL_FRAGMENT_SHADER_INVOCATIONS, query_object[0]);
• glBeginQuery(GL_CLIPPING_OUTPUT_PRIMITIVES, query_object[1]);
• renderLotsOfStuff(); // placeholder for your draw calls
• glEndQuery(GL_CLIPPING_OUTPUT_PRIMITIVES);
• glEndQuery(GL_FRAGMENT_SHADER_INVOCATIONS);
• glBindBuffer(GL_QUERY_BUFFER, query_buffer_object);
• glGetQueryObjectui64v(query_object[0], GL_QUERY_RESULT, (GLuint64 *)(sizeof(GLuint64)*0));
• glGetQueryObjectui64v(query_object[1], GL_QUERY_RESULT, (GLuint64 *)(sizeof(GLuint64)*1));
• Later read the query results from GPU buffer
• Ideally not immediately after the rendering so retrieving the query doesn’t stall the pipeline!
• GLuint64 query_result[2];
• glGetBufferSubData(GL_QUERY_BUFFER, 0, sizeof(query_result), query_result);
• Cleanup
• glDeleteBuffers(1, &query_buffer_object);
• glDeleteQueries(num_results, query_object);
Example Writing Query Results to GPU Buffers
40. 40
ARB_polygon_offset_clamp
• Extends OpenGL’s polygon offset feature
•Polygon offset was one of OpenGL’s first
extensions
• Standardized by OpenGL 1.1
• Biases rasterized depth (Z) by a constant bias + a bias based on the primitive’s maximum depth slope
• What’s NEW in OpenGL 4.6
•Effective depth bias clamped to a specified
maximum offset
•Used to mitigate second-order light leak
artifacts of polygon offset
•Long supported by PlayStation 3 and Direct3D
• First exposed in OpenGL as multi-vendor EXT
extension
•EXT_polygon_offset_clamp in 2014
•Adding to OpenGL 4.6 resolves IP issues
Source: Eric Lengyel, Terathon Software
41. 41
Motivation & Usage of Polygon Offset
• Motivation of polygon offset
• Depth buffers must quantize depth values
• Typically 24-bit fixed-point
• Want to rasterize depth-tested geometry
• BUT need to disambiguate nearly identical depth values
Rasterizing co-planar geometry, e.g. runway markings
Constructing shadow maps needs Z values to be “pushed back
a little” to avoid Z fighting causing self-shadowing artifacts
Shadow acne due to Z fighting
during shadow map testing
Shadow acne avoided using
polygon offset
Hidden line and silhouette rendering via polygon offset
42. 42
Polygon Offset Justified (1)
• Rasterizing triangles generates discretized depth values
• A rasterizer’s depth slope for a triangle determines how Z values vary over triangle in
pixel space
• Triangles are “snapped” to sub-pixel fractional positions
• Practical requirement, necessary for watertight rasterization
• Rasterization hardware operates with finite fixed-point precision
• Dealing with Z fighting isn’t as simple as “nudging Z values” a little closer/further
• Two triangles logically in the same plane are NOT co-planar after
• floating-point transformation
• sub-pixel transformation
• discrete depth interpolation
• geometric mesh uncertainty – those triangles may appear co-planar, but are they really??
43. 43
Polygon Offset Justified (2)
• Conceptually, think of interpolated depth as having “error bars”
• Depth rasterization error isn’t “experimental”
but rather “quantization” error
• Important: The depth slope tells the maximum the depth of a primitive will shift when moving one pixel in X or Y
• So if there is uncertainty (read: quantization!) in X & Y, a primitive’s depth slope quantifies the maximum error per pixel shift
• Hence polygon offset’s bias should be scaled by the maximum of the X & Y depth slopes
• This is what the original OpenGL 1.1 polygon offset functionality does
• Bias applied in units of minimum Z buffer precision
• Typically a bias of 1 or 2 and slope of 0.5 is enough to mitigate Z fighting
• Accounts for half a pixel of Z error
• Sounds fishy but (mostly) works!
Think of your rasterized fragments & pixels having error bars for X & Y... and Z!
44. 44
Polygon Offset Improved!
• Wait... Sounds fishy but (mostly) works!
• Mostly??
• What can go wrong?
• The depth slope can “get large” for geometry viewed edge-on
• Gradient magnitude for slope is conservative and can get too large
• So “fixing” shadow acne “exposes” light leaks
• This is the “too much of a good thing” principle at work
• Analogy: Band-aid on a band-aid:
• If the bias can sometimes get too large... then
• Clamp the maximum depth bias to some largest “reasonable” offset
45. 45
Using Polygon Offset Clamp
•Easy API just adds new maximum depth bias clamp value
• GL 1.1: glPolygonOffset(factor, units)
• GL 4.6: glPolygonOffsetClamp(factor, units, clamp);
•Changes the OpenGL specification’s equation for depth bias
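• WAS: o = m × factor + r × units
• NOW: o = m × factor + r × units if clamp is 0 or NaN; min(m × factor + r × units, clamp) if clamp > 0; max(m × factor + r × units, clamp) if clamp < 0
• Here m is the polygon’s maximum depth slope and r is the minimum resolvable depth difference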
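The clamped bias can be written out as a small function. This is a sketch assuming the formula from the polygon offset clamp extension specification, where m is the polygon's maximum depth slope and r is the minimum resolvable depth difference; polygon_offset_bias is a hypothetical name, not a GL entry point.

```c
#include <assert.h>

/* Depth bias o for one polygon.
 * clamp == 0 or NaN: unclamped GL 1.1 behavior, o = m*factor + r*units.
 * clamp  > 0: the bias is limited from above by clamp.
 * clamp  < 0: the bias is limited from below by clamp. */
double polygon_offset_bias(double m, double r,
                           double factor, double units, double clamp)
{
    double o;
    assert(r > 0.0);                    /* resolvable difference is positive */
    o = m * factor + r * units;
    if (clamp != clamp || clamp == 0.0) /* NaN compares unequal to itself */
        return o;
    if (clamp > 0.0)
        return o < clamp ? o : clamp;   /* min(o, clamp) */
    return o > clamp ? o : clamp;       /* max(o, clamp) */
}
```

For edge-on geometry m grows very large, so the unclamped result can push depth far enough to cause light leaks; a small positive clamp bounds it.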
46. 46
Examples of Light Leaks
Mitigated by Polygon Offset Clamp
BEFORE AFTER
Solid girder’s shadow shows streaks
Animates badly
Mitigated by clamping
47. 47
Examples of Light Leaks
Mitigated by Polygon Offset Clamp
Dots of light within boot’s shadow
Animates badly
Mitigated by clamping
BEFORE AFTER
48. 48
KHR_no_error
• The “no airbags” extension now part of OpenGL 4.6
• Makes OpenGL operation in the presence of GL_INVALID_VALUE,
GL_INVALID_OPERATION, etc. undefined
• GL_OUT_OF_MEMORY may still be generated but the occurrence might be delayed
• Intended to make OpenGL more efficient by obviating error checking & handling
• Hmm, not a large overhead in NVIDIA’s driver
• Typically error checks are folded into parameter handling
• Error checks are typically well-predicted “branches not taken” so cheap on modern CPUs
• You must “opt in” to the “no error” semantic at context creation
• For EGL, works with eglCreateContext
• See the EGL_KHR_create_context_no_error extension
• Query the value of GL_CONTEXT_FLAGS for the GL_CONTEXT_FLAG_NO_ERROR_BIT to
see if the “no errors” semantic is enabled for a context
• WGL_ARB_create_context_no_error and GLX_ARB_create_context_no_error provide WGL and GLX mechanisms for requesting the “no error” semantic for a context
49. 49
NVIDIA’s KHR_no_error Advice
• General advice: “Try it before you buy it”
• Not generating errors has a severe side-effect (main effect!): you’re blind to errors!
• First confirm there’s some sufficient performance benefit to offset the risk
• If you really are worried about API error detection overhead, consider Vulkan
• And before you even try it:
• Try disabling GL_DEBUG_OUTPUT_SYNCHRONOUS (part of OpenGL 4.3) first
• This still detects GL errors but avoids returning errors synchronously
• Asynchronous error and debug output helps NVIDIA’s dual-core driver to avoid app-driver synchronization events for errors and debug output
• Then OpenGL API overhead can be relegated to another CPU improving performance
• Without losing well-defined error handling
• NVIDIA’s current “no errors” behavior is to simply hide posting the OpenGL error
• So the current benefit of “no errors” is very meager
• Errors are still detected and erroneous commands are ignored
• Considerations
• Expecting your software to work for years?
• Is your application’s predictable operation important for your user base?
• If yes, then blinding yourself to errors probably isn’t a good idea...
50. 50
ARB_shader_atomic_counter_ops
• Completes OpenGL Shading Language (GLSL) support for atomic counters
• Prior ARB_shader_atomic_counters limited to increment, decrement, & query ops
• Operates on special atomic_uint variables
• New built-in functions for atomic counters
• Add & Subtract: atomicCounterAdd & atomicCounterSubtract
• Minimum & Maximum: atomicCounterMin & atomicCounterMax
• Bitwise operators (AND, OR, XOR, etc.): atomicCounterAnd, atomicCounterOr, etc.
• Exchange, Compare & Exchange: atomicCounterExchange, atomicCounterCompSwap
• NOTE: Image loads & stores support similar atomics
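The semantics of these built-ins can be approximated on the CPU with C11 atomics. This is an analogy only, assuming (as with the other GLSL atomics) that each function returns the counter's value from before the operation; counter_add, counter_max, and counter_comp_swap are hypothetical stand-ins for atomicCounterAdd, atomicCounterMax, and atomicCounterCompSwap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef atomic_uint counter_t;  /* stand-in for a GLSL atomic_uint */

/* Add data to the counter; returns the value before the add. */
unsigned counter_add(counter_t *c, unsigned data)
{
    assert(c != NULL);
    return atomic_fetch_add(c, data);
}

/* Store max(counter, data); returns the value before the operation.
 * C11 has no fetch-max, so this retries a compare-exchange. */
unsigned counter_max(counter_t *c, unsigned data)
{
    unsigned old;
    assert(c != NULL);
    old = atomic_load(c);
    while (old < data && !atomic_compare_exchange_weak(c, &old, data)) {
        /* old was refreshed with the current value; try again */
    }
    return old;
}

/* If the counter equals compare, store data; returns the prior value. */
unsigned counter_comp_swap(counter_t *c, unsigned compare, unsigned data)
{
    unsigned expected = compare;
    assert(c != NULL);
    atomic_compare_exchange_strong(c, &expected, data);
    return expected;  /* equals compare on success, current value on failure */
}
```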
51. 51
ARB_shader_draw_parameters
• Adds two new GLSL built-in variables to get base vertex and instance
• gl_BaseVertex
• gl_BaseInstance
• Useful for offsetting gl_VertexID or gl_InstanceID respectively
• Also for glMultiDraw* commands, new GLSL built-in variable
• gl_DrawID
• glMultiDrawArrays, glMultiDrawArraysIndirect, glMultiDrawArraysIndirectCount
• glMultiDrawElements, glMultiDrawElementsIndirect, glMultiDrawElementsIndirectCount
• Rationale: lets app treat draw batches programmatically from within an über
shader to better minimize state changes
52. 52
ARB_shader_group_vote
• Provides new GLSL built-in functions to compute composite of a set of boolean
conditions across a group of shader invocations
• Functions returning a boolean
• bool anyInvocation(bool value)
all threads return true if value is true for any thread, otherwise false
• bool allInvocations(bool value)
all threads return true if value is true for all threads, otherwise false
• bool allInvocationsEqual(bool value)
all threads return true if value is identical (equal) for all threads, otherwise false
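The three votes can be modeled on the CPU by treating a group as an array of per-thread boolean values. This is a sketch of the semantics only; on a GPU the group (for NVIDIA, a warp) evaluates the vote in hardware, and the function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if value is true for at least one thread in the group. */
bool any_invocation(const bool *values, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (values[i])
            return true;
    return false;
}

/* True if value is true for every thread in the group. */
bool all_invocations(const bool *values, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (!values[i])
            return false;
    return true;
}

/* True if every thread in the group holds the same value. */
bool all_invocations_equal(const bool *values, size_t n)
{
    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (values[i] != values[0])
            return false;
    return true;
}
```

Every thread in the group receives the same answer, which is what lets a shader pick one code path for the whole group.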
53. 53
ARB_shader_group_vote Rationale
• Rationale
• Implementation reality: GPUs run shader invocations using groups of threads
• NVIDIA calls these groups “warps”
• Threads run most efficiently when they share the same sequence of instructions
• This is called “converged execution” (good), instead of diverged execution (bad)
• Group votes can keep threads running converged
• Consider this an advanced optimization to your shaders
• SPMT (“single program, multiple thread”) execution means shaders run reasonably
even when divergence is possible
• Example use: Common for all threads in shader to need exactly four loop
iterations
• If all threads can agree they are in the “4 iterations” case, the shader could be
written with an unrolled loop in expectation of this common case
• Thereby avoiding the loop overhead of the general case
54. 54
ARB_gl_spirv
• This extension announced at SIGGRAPH 2016
• But was optional
• NVIDIA announced support last year
• Much more useful to have as a core part of OpenGL 4.6
• And NOW it is!
56. 56
ARB_gl_spirv Enabled
Offline Compilation of GLSL to SPIR-V
[Diagram: shader.vert, shader.geom, shader.frag are compiled offline by glslangValidator or glslc into shader.vert.spv, shader.geom.spv, shader.frag.spv, which your OpenGL app passes to the OpenGL driver]
[Diagram: OpenGL Driver contains GLSL Compiler Front-end, SPIR-V Compiler Front-end, and GPU-specific Compiler Back-end, which feeds the GPU]
57. 57
Tools to Manipulate SPIR-V
• Open source SPIR-V tools available
• glslang: glslangValidator
• Provides basic GLSL compiler that generates OpenGL friendly SPIR-V
• Use the -G option for ARB_gl_spirv SPIR-V
• https://github.com/KhronosGroup/glslang
• SPIRV-Tools: spirv-as, spirv-dis, spirv-stats, etc.
• Utilities for assembling, disassembling, or otherwise manipulating SPIR-V binaries
• https://github.com/KhronosGroup/SPIRV-Tools
• glslc
• Compiler front-end matching conventional gcc/clang command line options
• Use the --target-env=opengl_compat option
• https://github.com/google/shaderc
• Your choice:
• Build from source
• Get pre-compiled from LunarG Vulkan SDK
58. 58
API Usage Differences: Compiling GLSL vs. SPIR-V
GLSL path:
• glCreateProgram
• While more shader domains: read GLSL text from file, glCreateShader, glShaderSource, glCompileShader, glAttachShader
• glLinkProgram
• While more uniforms to introspect: glGetUniformLocation
• While more attributes to introspect: glGetAttribLocation
• glUseProgram
• glProgramUniform*
59. 59
API Usage Differences: Compiling GLSL vs. SPIR-V
SPIR-V path:
• glCreateProgram
• While more shader domains: read SPIR-V binary blob from file, glCreateShader, glShaderBinary, glSpecializeShader, glAttachShader
• glLinkProgram
• glUseProgram
• While more uniforms to initialize: glProgramUniform*
• App assumes locations assigned within the shader, obviating dynamic introspection
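Before a blob read from a file is handed to glShaderBinary, an application might sanity-check that it looks like a SPIR-V module. The sketch below is plain C with no OpenGL calls; it relies only on the SPIR-V header layout (magic number 0x07230203 followed by version, generator, ID bound, and schema words). `SpirvHeaderInfo` and `spirv_check_header` are hypothetical names, and the check assumes the words are already in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPIRV_MAGIC        0x07230203u  /* first word of every SPIR-V module */
#define SPIRV_HEADER_WORDS 5            /* magic, version, generator, bound, schema */

/* Hypothetical helper type holding the fields this sketch extracts. */
typedef struct {
    unsigned major;
    unsigned minor;
    uint32_t id_bound;
} SpirvHeaderInfo;

/* Returns true and fills *info if the buffer starts with a plausible
 * SPIR-V header in host byte order; returns false otherwise. */
bool spirv_check_header(const uint32_t *words, size_t word_count,
                        SpirvHeaderInfo *info)
{
    assert(words != NULL && info != NULL);
    if (word_count < SPIRV_HEADER_WORDS)
        return false;               /* too short to hold a header */
    if (words[0] != SPIRV_MAGIC)
        return false;               /* wrong magic or byte-swapped file */
    /* Version word layout: 0 | major | minor | 0 (one byte each). */
    info->major = (words[1] >> 16) & 0xFFu;
    info->minor = (words[1] >> 8) & 0xFFu;
    info->id_bound = words[3];
    return true;
}
```

A blob that fails this check can be rejected with a clear error before any driver call is made.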
60. 60
ARB_spirv_extensions
• Original ARB_gl_spirv extension only added support for SPIR-V 1.0 concepts that
were part of the OpenGL 4.5 Core Profile
• Many OpenGL ARB and vendor extensions not in OpenGL 4.5 Core add shading language
concepts
• BUT being defined prior to the existence of SPIR-V support in OpenGL, they lack SPIR-V
support for their additional features
• Advertising an extension + its SPIR-V extension means the SPIR-V support for that
extension is present
• So ARB_spirv_extensions adds a mechanism to advertise a driver’s supported SPIR-V
extensions:
• GLint num_spirv_extensions = 0;
glGetIntegerv(GL_NUM_SPIR_V_EXTENSIONS, &num_spirv_extensions);
• for (GLint ndx = 0; ndx < num_spirv_extensions; ndx++) {
const GLubyte *spirv_extension_name = glGetStringi(GL_SPIR_V_EXTENSIONS, ndx);
}
• Also defines several SPIR-V extensions...
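Once the names can be enumerated, an application typically wants a yes/no answer for one specific SPIR-V extension. A minimal sketch in plain C follows; to keep it runnable without an OpenGL context, the enumeration is abstracted behind a callback. `ExtensionNameFn`, `has_spirv_extension`, and `stub_extension_name` are hypothetical names; in a real application the callback would wrap glGetStringi(GL_SPIR_V_EXTENSIONS, index).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for glGetStringi(GL_SPIR_V_EXTENSIONS, index). */
typedef const char *(*ExtensionNameFn)(unsigned index);

/* Linear search over the advertised SPIR-V extension names. */
bool has_spirv_extension(ExtensionNameFn get_name, unsigned count,
                         const char *wanted)
{
    assert(get_name != NULL && wanted != NULL);
    for (unsigned i = 0; i < count; i++) {
        const char *name = get_name(i);
        if (name != NULL && strcmp(name, wanted) == 0)
            return true;
    }
    return false;
}

/* Stub list for demonstration only; a real driver reports its own set. */
const char *stub_extension_name(unsigned index)
{
    static const char *const names[] = {
        "SPV_KHR_shader_ballot",
        "SPV_KHR_subgroup_vote",
    };
    return index < 2 ? names[index] : NULL;
}
```

The exact string comparison matters: a substring search would report a false match for any extension whose name is a prefix of another.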
61. 61
First Set of SPIR-V Extensions
SPIR-V Extension Name                      Corresponding OpenGL extension or functionality
SPV_KHR_shader_ballot                      ARB_shader_ballot
SPV_KHR_shader_draw_parameters             ARB_shader_draw_parameters
SPV_KHR_subgroup_vote                      ARB_shader_group_vote
SPV_NV_stereo_view_rendering               NV_stereo_view_rendering
SPV_NV_viewport_array2                     NV_viewport_array2 or ARB_shader_viewport_layer_array
SPV_NV_geometry_shader_passthrough         NV_geometry_shader_passthrough
SPV_NV_sample_mask_override_coverage       NV_sample_mask_override_coverage
SPV_AMD_shader_explicit_vertex_parameter   AMD_shader_explicit_vertex_parameter
SPV_AMD_gpu_shader_half_float              AMD_gpu_shader_half_float
SPV_KHR_shader_atomic_counter_ops          ARB_shader_atomic_counter_ops
SPV_KHR_post_depth_coverage                ARB_post_depth_coverage
SPV_KHR_storage_buffer_storage_class       Storage buffer support
64. 64
ARB_transform_feedback_overflow_query
• Adds new query types which can be used to detect overflow of transform feedback
buffers
• GL_TRANSFORM_FEEDBACK_OVERFLOW if any stream overflows
• GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW if a particular indexed vertex stream
overflows
• These two NEW query types are also allowed for glBeginConditionalRender for
conditional rendering
• Allows the graphics pipeline to condition rendering on whether a prior vertex stream
operation overflowed
• Comparable to Direct3D 11’s D3D11_QUERY_SO_OVERFLOW_PREDICATE* stream-out
functionality
65. 65
Why OpenGL Core Updates
are Important (1)
• Not just opportunity for new functionality
• A new specification is released that reconciles all the
bundled extensions into a coherent single document
• Also gives the OpenGL Working Group an opportunity to better structure OpenGL’s specification
• Opportunity to fix typos, improve consistency of terminology, clarify ambiguities, document expected
error behavior
• Almost two dozen different minor tweaks in 4.6, largely consequential to developers
• Future extensions can then be written against a cleanly resolved 4.6 specification
• Otherwise, extensions can overlap how they amend the core specification and lead to confusion
• Ensures new functionality is covered by the Khronos Intellectual Property (IP) Framework
• This allows OpenGL implementers, developers, and
end-users to confidently depend on the functionality described
• Specifically, 4.6 resolves Intellectual Property concerns surrounding
both anisotropic texture filtering and polygon offset clamping
• Khronos maintains OpenGL, ES, and Vulkan in the
same “IP zone”—so ratifying a Khronos standard resolves issues
for related standards
Coherent Specification
Resolving IP Concerns
66. 66
Why OpenGL Core Updates are Important (2)
• Not just opportunity for new functionality
• Opportunity for a new Conformance Test Suite to be released
• New tests obviously cover NEW functionality
• But also include contributed tests for existing functionality
• Without a new core specification, it is hard to enforce stronger conformance testing
• Vendors would simply continue certifying with an older, weaker conformance test version
• A new core version is a new opportunity to raise the shared quality bar for OpenGL
• Developers adopt OpenGL features at different levels of comfort
• Many developers are happy to use the latest, greatest features
as soon as extensions are shipped in drivers
• Other developers, often those with long-term support horizons,
look for core updates to signal mature standards now ready
to be adopted
• Example: A graphics researcher and a medical device maker can
both use OpenGL, but embrace the features provided at varying
rates and at different milestones
Conformance Testing
Quality Sheriff
Developer Comfort
Levels
67. 67
Why OpenGL Core Updates are Important (3)
• Not just opportunity for new functionality
• OpenGL Shading Language (GLSL) gets accompanying revision
• So OpenGL 4.6 brings with it an updated GLSL
• Like the core API specification, the GLSL specification needs reconciliation of new
extensions, typos fixed, clarifications, etc.
• As many Vulkan applications express shaders in GLSL and compile them with glslang to
generate the SPIR-V that Vulkan expects, updating GLSL helps advance Vulkan
• OpenGL core revisions are as much about consolidating OpenGL’s associated
ecosystem support as simply adding NEW features to OpenGL
Advancing the Ecosystem
68. 68
OpenGL 4.6’s Resolving of IP Issues & New Open Sourcing of OpenGL
Conformance Suite Benefits Open Source OpenGL Implementation
• Khronos using Vulkan’s conformance approach for OpenGL now
• See https://github.com/KhronosGroup/VK-GL-CTS
• Should help Mesa keep closer to latest official standard, better for OpenGL overall
"OpenGL 4.6 will be the first OpenGL release where conformant open source
implementations based on the Mesa project will be deliverable in a reasonable
timeframe after release. The open sourcing of the OpenGL conformance test
suite and ongoing work between Khronos and X.org will also allow for non-vendor
led open source implementations to achieve conformance in the near future."
David Airlie, senior principal engineer at Red Hat, and developer on Mesa /
X.org projects
Source: Khronos OpenGL 4.6 press release
69. 69
Credit for OpenGL 4.6
• Khronos relies on its member companies to complete new OpenGL core updates
• Different companies drove different features, all free to comment and contribute
• Representatives of these companies drove the constituent features of OpenGL 4.6
See Appendix J of OpenGL 4.6 for comprehensive list of contributor companies and individuals
70. 70
GPU “Interop” Usage
•Increasingly applications want to share GPU resources and mix APIs
• Typically sophisticated applications
•APIs involved might be
• Graphics (OpenGL, Vulkan, Direct3D)
• Compute (OpenCL, CUDA)
• Video encode and decode (VDPAU, NVENC, NVDEC, Windows Media)
•Multiple motivations for cross-process GPU resource sharing
• Performance (don’t read back to CPU), latency control (VR compositing)
• Robustness (isolation)
• Security, including protecting digital media assets
•Interop = jargon for two things
• Sharing GPU resources among different APIs
• Sharing GPU resources across process boundaries
• For example, a display compositor
71. 71
Past Interop Extensions for OpenGL
•Past interoperability extensions would pair OpenGL concepts to those
of one other particular GPU API
• Often exposed as proprietary extensions
• Typically tied to platform concepts (e.g. Win32 HANDLEs)
• Simple when API concepts match (e.g. OpenGL textures to Direct3D Surfaces)
•Examples
• NV_DX_interop mixed OpenGL and Direct3D 9
• NV_DX_interop2 mixes OpenGL and Direct3D 10 & 11
• NV_vdpau_interop mixes OpenGL and Linux VDPAU video input/output surfaces
• Additionally, CUDA & OpenCL have interop to OpenGL
•Worked well as designed BUT...
72. 72
Responding to New Interop Requirements
• Addressing criticism of prior interop extensions...
• In many cases, single-vendor and proprietary extensions
• Can we strive for something multi-vendor?
• Overcoming NEW Managed vs. Explicit GPU API philosophy mismatches
• Older GPU APIs (e.g. OpenGL, Direct3D 9,10,11) manage GPU resources and their
underlying memory as one
• Older APIs have textures, buffers, and synchronization objects
• New GPU APIs (e.g. Vulkan, Direct3D 12) use lower-level mechanisms to manage
resources
• Newer explicit APIs have explicit memory allocations and semaphores
• Noticeable lack of common interop infrastructure
• Can there be some common framework for interop?
• Isolate platform-specific methods to “import” objects into platform-specific extension
• Windows uses HANDLEs, etc.
• POSIX operating systems use file descriptors
73. 73
EXT_memory_object & EXT_semaphore
• Vulkan introduces explicit memory and synchronization objects
• EXT_memory_object imports Vulkan explicit memory objects to OpenGL
• EXT_semaphore imports Vulkan semaphore objects to OpenGL
• Extra interop mechanisms are needed to share GPU objects because of this
• Platform-specific extensions specify how to import memory objects & semaphores
• For POSIX systems (e.g. Linux), use EXT_memory_object_fd & EXT_semaphore_fd
• fd = POSIX file descriptor
• For Windows, use EXT_memory_object_win32 & EXT_semaphore_win32
• Uses either Win32’s opaque HANDLE type or KMT share handle
• KMT = Kernel-Mode Thunk interface for Windows Display Driver Model (WDDM)
• Also for interoperability with Direct3D 11 & 12
• Also EXT_win32_keyed_mutex provides access to the keyed synchronization primitive
of Direct3D image objects
75. 75
EXT_memory_object
• Introduces new memory object corresponding to Vulkan concept
• Import memory objects with platform-specific API
• Then “carve out” managed OpenGL textures and buffers from a memory object
• Commands to make textures: glTexStorageMem1DEXT, glTexStorageMem2DEXT,
glTexStorageMem3DEXT, glTexStorageMem2DMultisampleEXT,
glTexStorageMem3DMultisampleEXT
• Also Direct State Access (DSA) versions: glTextureStorageMem2DEXT, etc.
• Commands to carve out a buffer: glBufferStorageMemEXT,
glNamedBufferStorageMemEXT
76. 76
OpenGL ES Parity
• Mobile developers often target OpenGL ES
• Apple’s iOS and Google’s Android use of ES made it the de facto standard graphics API for
mobile
• Moore’s Law has eliminated the need for ES on NVIDIA products
• ES 2.0/3.x is supported along with full OpenGL 4.x feature set
• Essentially an ES context “hides” the complete OpenGL 4.x feature set
• Good for compatibility and portability to other vendors’ less functional GPUs
• Unfortunately ES has been slow to adopt important GPU features
• NVIDIA makes sure developers relying on ES contexts don’t have to forgo features missing from ES
• NVIDIA works to coordinate multi-vendor EXT extensions to ES
• NVIDIA supports fully conformant ES contexts (+ extensions) even on Windows and Linux
• NVIDIA’s OpenGL in 2017 adds many ES parity extensions...
77. 77
Oh, 3D developer—you
flatter me noticing my
complete & mature
feature set
With ES parity,
what does she
have that I don’t?
OpenGL 4.6
Context
ES 3.2
Context
78. 78
ES Parity Extensions for 2017
Extension name                 Functionality
EXT_clear_texture              Clear texture images & sub-images
EXT_conservative_depth         Bound direction of fragment shader depth output
EXT_shader_group_vote          Collective decision making in shaders
EXT_texture_compression_bptc   Compressed texture formats corresponding to Direct3D’s BC6H (for HDR) and BC7 (8-bit RGB & RGBA) formats
EXT_texture_compression_rgtc   One- and two-component texture compression
EXT_texture_sRGB_R8            Single-component (red) sRGB color-space component encoding
EXT_draw_transform_feedback    Adds missing transform feedback API to ES intended for geometry shaders’ variable output vertices
EXT_clip_cull_distance         Clipping and culling planes
OES_viewport_array             Viewport index support for geometry shaders
KHR_parallel_shader_compile    Request multi-threaded GLSL shader compilation
79. 79
Still ES Lacks Much,
NVIDIA Provides What’s Missing
•The 2017 multi-vendor parity extensions
highlight what’s missing from standard ES 3.2
•Additional major items missing from standard ES 3.2
• Texture views with OES_texture_view missed ES 3.2 inclusion
• GPU-accelerated path rendering with NV_path_rendering for ES
•BUT NVIDIA’s OpenGL ES context provides these
•If ES still isn’t enough, just use an OpenGL 4.6 context
• For example, Direct State Access is not in ES contexts
80. 80
NVIDIA’s ES Parity Philosophy
• The idea of “ES Parity” is NOT to turn an ES context into an OpenGL 4.x context
• The idea is to expose
• Features NVIDIA’s ES developer base has requested
• Features that we judge other ES vendors could reasonably support
• When Khronos ES vendors broadly agree, we work towards an OES extension
– Example: OES_viewport_array
• When just a subset of Khronos ES vendors agree, we work for a multi-vendor EXT extension
– Example: EXT_clip_cull_distance
• As a last resort, when other ES vendors don’t share our interest, we go with NV
• Need a feature missing from ES? Speak up
• NVIDIA does not expose extensions broadly inconsistent with ES’s philosophy
• For example, fixed-function, immediate-mode,
and display lists aren’t candidates for ES parity
• Developers desiring such functionality are better
off with OpenGL 4.x contexts
81. 81
NVIDIA ES Parity
Enhancements
Result of NVIDIA’s ES Parity Efforts
Full OpenGL
ES 3.2
ES 3.1
ES 3.0
ES 2.0
Industry’s
most functional
and full-featured
ES driver
OSes and Architectures
Android, Linux,
Windows, FreeBSD;
x86, ARM, IBM PowerPC
82. 82
Perspective of ES Parity
from an OpenGL 4.6 Context
NVIDIA ES Parity
Enhancements
Full OpenGL
ES 3.2
ES 3.1
ES 3.0
ES 2.0
NVIDIA
OpenGL 4.6
with maximally
functional
extensions
Same
driver provides
ES and 4.6 contexts
Only
difference between
ES and 4.6 context is ES
context disables non-ES usage
83. 83
Miscellaneous NEW Extensions for 2017
• NV_blend_minmax_factor, based on AMD_blend_minmax_factor
• EXT_protected_textures (Tegra & ES only)
• Used with EGL’s EGL_EXT_protected_content
Miscellaneous
2017
84. 84
NV_blend_minmax_factor:
Modulated Min/Max Blending
• Original GL_MIN and GL_MAX blend equations limited
• Both ignore the blend source and destination blend factors from glBlendFunc
• Limitation of original SGI hardware
• Conventional min/max blend equations
• blendResult = min(sourceColor, destinationColor)
• blendResult = max(sourceColor, destinationColor)
• AMD_blend_minmax_factor extension generalizes with two new blend equations
• GL_FACTOR_MIN_AMD:
blendResult = min(sourceColor × sourceFactor, destinationColor × destinationFactor)
• GL_FACTOR_MAX_AMD:
blendResult = max(sourceColor × sourceFactor, destinationColor × destinationFactor)
• NV_blend_minmax_factor provides same capability
• Just with a few restrictions, matching blend equation advanced restrictions
• Not for use with dual-source blending
• Not for mismatched multiple draw buffers
• Single-precision floating-point blending done in half-precision
• Otherwise compatible with AMD extension (uses same token values)
85. 85
NV_blend_minmax_factor
Example
• Blend code
• blendResult = max(sourceColor,
destinationColor × (1−sourceAlpha))
• Code to configure
• glEnable(GL_BLEND);
• glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
• glBlendEquation(GL_FACTOR_MAX_AMD);
• Extension supported on Maxwell and later GPU
generations
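The factor min/max equations can be modelled on the CPU as a reference for what the blender computes per color component. This is a sketch in plain C, not driver code: `FactorBlendOp`, `blend_factor_minmax`, and `blend_max_one_minus_src_alpha` are hypothetical names, and the sketch ignores format-dependent clamping and the precision restrictions noted for the NV extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical selector for the two factor-modulated blend equations. */
typedef enum { FACTOR_MIN, FACTOR_MAX } FactorBlendOp;

/* Per-component reference:
 *   out = min or max (src * src_factor, dst * dst_factor) */
void blend_factor_minmax(FactorBlendOp op,
                         const float src[4], const float src_factor[4],
                         const float dst[4], const float dst_factor[4],
                         float out[4])
{
    assert(src && src_factor && dst && dst_factor && out);
    for (size_t i = 0; i < 4; i++) {
        const float s = src[i] * src_factor[i];
        const float d = dst[i] * dst_factor[i];
        if (op == FACTOR_MIN)
            out[i] = (s < d) ? s : d;
        else
            out[i] = (s > d) ? s : d;
    }
}

/* The example configuration: source factor ONE, destination factor
 * ONE_MINUS_SRC_ALPHA, combined with max. */
void blend_max_one_minus_src_alpha(const float src[4], const float dst[4],
                                   float out[4])
{
    const float one[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    const float inv = 1.0f - src[3];  /* 1 - sourceAlpha */
    const float inv_alpha[4] = {inv, inv, inv, inv};
    blend_factor_minmax(FACTOR_MAX, src, one, dst, inv_alpha, out);
}
```

With a source alpha of 0.25, the destination is scaled by 0.75 before the comparison, so a bright source still wins over a dimmed destination.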
Unconventional blending
Source: Visual Music Systems
86. 86
EGL_EXT_protected_content &
EXT_protected_textures (1)
•Together provide OpenGL protected access control to GPU images
• Intended for managing trust in display compositors and apps
• Designed for Android
•GL_TEXTURE_PROTECTED_EXT texture parameter
• Applies to OpenGL texture objects
• And hence applies to framebuffer objects containing texture objects
• Boolean, defaults to false (unprotected) unless explicitly specified true
•EGL_PROTECTED_CONTENT_EXT attribute
• Applies to EGL surfaces and EGLImages
• Boolean, defaults to false (unprotected) unless explicitly specified true
•Texture objects, EGL surfaces, and EGLImages are all “resources” subject
to protection
87. 87
EGL_EXT_protected_content &
EXT_protected_textures (2)
• Pipeline stages of OpenGL contexts can also be designated protected and
unprotected
• Scenario:
• display compositor uses a protected context
• while apps would use unprotected contexts
• Technically different GPU stages can be protected vs. non-protected
• General access rules
• Protected pipeline stages
• Can read any EGL surfaces and images, protected or otherwise
• BUT may NOT write non-protected EGL surfaces and images
• Non-protected contexts/stages
• Can read & write non-protected
• BUT may NOT read or write protected content
• Expectation: GPU & operating system together enforce resource protection via
protected virtual memory mappings
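The general access rules above reduce to a small truth table. The following plain C sketch encodes it; `AccessKind` and `protected_access_allowed` are hypothetical names, and the function models only the rules as stated on this slide, not any actual driver or operating system enforcement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical access kinds for this sketch. */
typedef enum { ACCESS_READ, ACCESS_WRITE } AccessKind;

/* Returns whether a pipeline stage may perform the given access on a
 * resource, following the protected-content rules described above. */
bool protected_access_allowed(bool stage_protected, bool resource_protected,
                              AccessKind kind)
{
    assert(kind == ACCESS_READ || kind == ACCESS_WRITE);
    if (stage_protected) {
        /* Protected stages read anything, but write only protected
         * resources, so protected data cannot leak out. */
        return kind == ACCESS_READ || resource_protected;
    }
    /* Non-protected stages may read or write only non-protected resources. */
    return !resource_protected;
}
```

The asymmetry is the point: data can flow from unprotected to protected resources, never the other way.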
88. 88
EGL_EXT_protected_content Scenarios
• Android 7.0’s secure texture video playback
• Allows secure GPU post-processing of protected image content
• Supports secure Digital Rights Management (DRM)
Source: Google
89. 89
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
Chart: number of OpenGL extensions proposed (y-axis, 0 to 60) by year (x-axis, 1997 to 2016)
Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions,
difficult to say exactly when an extension was proposed, product release lags extension proposal
90. 90
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
Chart: number of OpenGL extensions proposed (y-axis, 0 to 60) by year (x-axis, 1997 to 2016)
OpenGL core version updates marked on the chart: 1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3 & 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6
Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions,
difficult to say exactly when an extension was proposed, product release lags extension proposal
91. 91
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
Chart: number of OpenGL extensions proposed (y-axis, 0 to 60) by year (x-axis, 1997 to 2016)
Annotations on the chart: TNT + GeForce, run-up to DirectX 8, run-up to DirectX 9, run-up to DirectX 10, run-up to DirectX 11, run-up to DirectX 12
Despite caveats, shows how OpenGL functionality ties to rhythm of GPU architecture & API updates
Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions,
difficult to say exactly when an extension was proposed, product release lags extension proposal
92. 92
Implemented NVIDIA OpenGL Extensions
by Approximate Initial Proposal Year
Chart: number of OpenGL extensions proposed (y-axis, 0 to 60) by year (x-axis, 1997 to 2016)
Annotations on the chart: GeForce 1,2; GeForce 3,4; GeForce 5,6,7; Tesla development (GeForce 8,9,100,200,300); Fermi development (~GeForce 400); Kepler development (~GeForce 600); Maxwell development (~GeForce 700-900); Pascal development (~GeForce 10)
Despite caveats, shows how OpenGL functionality ties to rhythm of GPU architecture & API updates
Caveats: extensions vary greatly in complexity, often extensions re-prefix existing extensions,
difficult to say exactly when an extension was proposed, product release lags extension proposal
93. 93
Cumulative Implemented NVIDIA OpenGL
Extensions Over 20 Years
Chart: cumulative implemented OpenGL extensions proposed (y-axis, 0 to 500) by year (x-axis, 1997 to 2016)
Same data as prior graphs, just integrated over time
94. 94
NVIDIA OpenGL Shader Caching
• NVIDIA OpenGL driver saves GLSL shaders it compiles
• Cached compiled shaders saved to your local disk
• Next time you compile the same shader, driver loads the post-compiled cached copy
• Saves compilation time!
• Invalidated on new driver installation
• Games can “warm” cache on installation or first play to speed game loading
• Available both on Windows and Linux!
• Windows location
• %LOCALAPPDATA%\NVIDIA\GLCache
• For older drivers, used %APPDATA%\NVIDIA\GLCache
• Linux location
• $HOME/.nv/GLCache
• Or $XDG_CACHE_HOME/.nv/ if XDG_CACHE_HOME environment variable set
• Following the convention set by the XDG Base Directory Standard
• Locations subject to change with future drivers and new conventions
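A tool that wants to locate (or clear) the Linux cache could resolve the directory as follows. This is a sketch in plain C that simply mirrors the two Linux locations listed on this slide; `nv_glcache_dir` is a hypothetical helper, not an NVIDIA API, and since the locations may change with future drivers the result should be treated as a best guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes the expected Linux shader cache directory into `out`.
 * Prefers XDG_CACHE_HOME when set and non-empty, else falls back to HOME.
 * Returns 0 on success, -1 if neither variable is usable or `out` is too small. */
int nv_glcache_dir(const char *xdg_cache_home, const char *home,
                   char *out, size_t out_size)
{
    int written;
    assert(out != NULL && out_size > 0);
    if (xdg_cache_home != NULL && strlen(xdg_cache_home) > 0)
        written = snprintf(out, out_size, "%s/.nv/", xdg_cache_home);
    else if (home != NULL && strlen(home) > 0)
        written = snprintf(out, out_size, "%s/.nv/GLCache", home);
    else
        return -1;
    /* snprintf reports the length it wanted; detect truncation. */
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

A caller would typically pass getenv("XDG_CACHE_HOME") and getenv("HOME") as the first two arguments; taking them as parameters keeps the helper easy to test.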
95. 95
Linux Graphics
Open Source Efforts from NVIDIA
• NVIDIA works to improve graphics support for entire Linux ecosystem
• Examples
• GL Vendor-Neutral Dispatch (GLVND)
• arbitrates vendor-neutral access to OpenGL and EGL/GLX APIs
• Wayland support for EGL Streams
• Video Decode and Presentation API for Unix (VDPAU)
• complete solution for decoding, post-processing, compositing, and displaying compressed or
uncompressed video streams
• All open source projects
96. 96
GLVND: GL Vendor-Neutral Dispatch library
• libglvnd
• Arbitrates OpenGL API calls between multiple vendors
• Allows multiple drivers from different vendors to coexist on the same file system
• Determines which vendor to dispatch each API call to at runtime
• Both GLX and EGL are supported
• Any combination with OpenGL and OpenGL ES (1.1, 2.0, 3.x)
• NVIDIA open source contribution
• https://github.com/NVIDIA/libglvnd
97. 97
Before GLVND
NVIDIA Proprietary
Linux Driver
Mesa + Nouveau
I control OpenGL
best on NVIDIA
GPUs
But I got
here first!
Drivers driving you crazy!
I just want my
Linux window
system to start!
pre-GLVND user
99. 99
NVIDIA’s Support for Wayland
• Wayland
• Intended as simpler replacement for X Window System
• A protocol for a compositor to talk to its clients
• Plus the C library implementation of that protocol
• Depends on a compositor (e.g. Weston) that is the display server
• Supports varying window managers (e.g. Mutter for Gnome)
• Wayland is supported on NVIDIA GPUs through EGL Streams
• With the performance & quality of NVIDIA’s proprietary OpenGL driver
• Both Weston and Mutter (used by gnome-shell) currently have EGL Stream support
• Although not by default
• See https://github.com/NVIDIA/egl-wayland
• NVIDIA open source project
100. 100
Latest VDPAU Support
• Video Decode and Presentation API for Unix (VDPAU)
• Latest NVIDIA GPUs (GeForce 1080, etc.)
• Supports VDPAU Feature Set H
• Hardware-accelerated decoding of 8192x8192 (8k) H.265/HEVC video streams
• Full support of HEVC Main12 profile
101. 101
NVIDIA Codec SDK 8.0
• Two hardware acceleration interfaces:
• NVENCODE API for video encode
acceleration
• NVDECODE API for video decode
acceleration
• Integration already available in FFmpeg/libav
• New in 8.0
• 10/12-bit decoding support with
HEVC/VP9, enabling end-to-end HDR
transcoding
• Improved quality via weighted prediction
• Support for OpenGL inputs (Linux only)
Download for registered developers: https://developer.nvidia.com/designworks/video_codec_sdk/downloads/v8.0
Info: https://developer.nvidia.com/nvidia-video-codec-sdk
102. 102
Supported Video Encoding Formats by GPU Generation
* Except GM108
** Except GP100 (is limited to 4K resolution)
8k encoding for latest GPUs!
104. 104
Supported Video Decode Formats by GPU Generation
* Except GM108
** Max resolution support is limited to selected Pascal chips
*** VP8 decode support is limited to selected Pascal chips
**** VP9 10/12 bit decode support is limited to select Pascal chips
8k encoding for latest GPUs!
105. 105
NVDEC to OpenGL to NVENC
Diagram: NVDEC → decode into OpenGL texture object → rendering to framebuffer objects (OpenGL texture objects) → encode from OpenGL texture object → NVENC
Linux only for GL to NVENC
For Windows, use OpenGL interop into Direct3D surfaces to encode from Direct3D surfaces
106. 106
Proven GPU Codec Technology
•Same underlying technology powers these services
Play your PC games on your PC,
encode to the cloud
Play your PC game on your PC,
decode & play on your SHIELD TV
107. 107
GLEW Support Available NOW
GLEW = The OpenGL Extension Wrangler Library
Open source library
Pre-built distribution: http://glew.sourceforge.net/
Source code: https://github.com/nigels-com/glew
Your one-stop-shop for API support for all OpenGL extension APIs
Now released GLEW 2.1 (July 31, 2017) provides API support for
OpenGL 4.6
Multi-vendor EXT interoperability extensions
All of NVIDIA’s Maxwell & Pascal extensions
All other NVIDIA multi-GPU generation initiatives
Examples: NV_path_rendering, NV_command_list, NV_gpu_multicast
Thanks to Nigel Stewart, GLEW maintainer, for this
108. 108
NVIDIA OpenGL in 2017 Provides
OpenGL’s Maximally Available Superset
Diagram labels: OpenGL 4.6; OpenGL 4.5 Core; 2015 ARB extensions; Full OpenGL ES 3.2; ES Enhancements; OpenGL Complete Compatibility; Legacy EXT & Other Compatibility Extensions; Maxwell Extensions; Pascal Extensions; NVIDIA Multi-generation GPU Initiatives; Path Rendering; Multi-GPU, SLI; Approaching Zero Driver Overhead; DirectX inter-op; Vulkan inter-op
Legend: Khronos Standard; Expected Compatibility; NVIDIA Initiatives; GPU Generation Features
109. 109
Last Words
•Khronos announces OpenGL 4.6 today! Best OpenGL yet
•Highlights of NVIDIA’s OpenGL support in 2017
• NVIDIA has OpenGL 4.6 today, developer preview driver available NOW
• SPIR-V support standard part of OpenGL now
• Multi-vendor EXT interoperability extensions NEW this year
• “ES Parity” effort for 2017
• Miscellaneous extensions: protected content, min/max factor blending
• Open source graphics contributions from NVIDIA
• GLVND, VDPAU for video processing, and Wayland EGL Streams support
• GPU-accelerated Encode & Decode
110. 110
SIGGRAPH Paper Using OpenGL to Check Out
• How to make shaders modular without giving
up performance
• Open source on github
• Accompanied by OpenGL and Vulkan demo
• Wednesday, 2 August
• Los Angeles Convention Center, Room 150/151
• 10:45 am - 12:35 pm