This talk focuses on the newest release in RenderMan* 22.5 and its adoption at Pixar Animation Studios* for rendering future movies. With native support for Intel® Advanced Vector Extensions, Intel® Advanced Vector Extensions 2, and Intel® Advanced Vector Extensions 512, it includes enhanced library features, debugging support, and an extensive test framework.
4. Shading Network
• Multiple reusable shading nodes
• Connect nodes to define complex materials
• Production shading networks can grow very large, to hundreds or thousands of nodes
5. C++ Shader Limitations
• Lack of context at compile time
• Input parameters unknown
• Geometry being shaded unknown
• Mode of shading unknown
• Surrounding shading network unknown
• Branchy testing required
• Lack of portability
• Requires “Performance Ninjas”
Image Credit: Ninja Working AT Desk from Vector.me (by Hector Gomez)
6. Open Shading Language
• Developed by Sony Pictures Imageworks*
• C-like DSL for programmable shading
• API to connect shaders into networks
• Open source
• http://github.com/imageworks/OpenShadingLanguage
• Sci-Tech Award* in 2017
Logo owned by Academy of Motion Picture Arts and Sciences for Infobox
*Other names and brands may be claimed as the property of others.
7. Poster images (c) Sony Pictures*, Paramount*, Warner Brothers*, Disney*, Fox*, Universal*
8. Example OSL Shader
shader marble (color Cin = .5,
               float freq = 1.0,
               output color Cout = 0)
{
    float sum = 0;
    float freqVal = freq;
    point Pshad = transform ("object", P);
    for (int i = 0; i < 6; i++)
    {
        sum = sum + 1/freqVal * abs(.5 - noise( 4 * freqVal * Pshad)) ;
        freqVal = 2 * freqVal;
    }
    Cout = Cin * sum;
}
• Shader Globals (input set by renderer): P
• Library Calls: transform, abs, noise
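The octave loop in the marble shader can be sketched in plain C++ to make the arithmetic explicit. This is only an illustration: `Point`, `marble_sum`, and the caller-supplied `noise` callable are hypothetical names, and the noise function is a stand-in, not OSL's `noise`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 3D point type standing in for OSL's `point`.
struct Point { float x, y, z; };

// Sums six octaves: each octave doubles the frequency and weights
// its contribution by 1/frequency, as in the loop of the marble shader.
// `noise` is any callable taking a Point and returning a float.
template <typename NoiseFn>
float marble_sum(const Point& pshad, float freq, NoiseFn noise) {
    float sum = 0.0f;
    float freqVal = freq;
    for (int i = 0; i < 6; ++i) {
        const float s = 4.0f * freqVal;
        const Point q{s * pshad.x, s * pshad.y, s * pshad.z};
        sum += (1.0f / freqVal) * std::fabs(0.5f - noise(q));
        freqVal *= 2.0f;
    }
    return sum;
}
```

With `freq = 1`, the octave weights are 1, 1/2, 1/4, 1/8, 1/16, and 1/32, so the sum is bounded by roughly twice the largest value of `|0.5 - noise|`.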
10. OSL Framework
• oslc (offline compiler): Shader written in OSL → Intermediate OSO (Instructions + operands)
• Renderer (Pixar’s RenderMan*, Autodesk Arnold*, Blender*): Scene Management, Ray Tracing/Path Tracing, Light Integration
• OSL Runtime: Build Shading Network; Render Time Optimization With LLVM* JIT (Just In Time Compilation), producing optimized x86-64; Pre-compiled library functions; Execute Shading Network (per Point)
• Connections between renderer and runtime: callbacks, QueryOutputs
12. SIMD OSL Framework
• Renderer (Pixar’s RenderMan*, Autodesk Arnold*, Blender*): Scene Management, Ray Tracing/Path Tracing, Light Integration
• SIMD OSL Runtime: Render Time Optimization With LLVM* Wide JIT (Just In Time Compilation), producing code optimized for Intel® AVX-512, AVX2, or AVX; Execute Shading Network (per Point)
• Separate pre-compiled library functions for Intel® AVX-512, Intel® AVX2, and Intel® AVX
• Connections between renderer and runtime: callbacks, QueryOutputs
13. Components in SIMD OSL Render-time
• Render Time Optimization With LLVM* JIT (Just In Time Compilation) → Optimized x86-64
• Wide Library
Wizard Oz Castle Clipart: https://www.clipart.email/clipart/wizard-of-oz-castle-clipart-18891.html
14. SIMD OSL’s Wide Library
// Parameters carry a trailing underscore so they do not clash with the typed accessors below.
void my_callback(void *wS_, void *wM_, void *wVec_, void *wVS_, void *wVT_,
                 unsigned int mask_value)
{
    Mask mask (mask_value);
    ASSERT(mask.any_on());
    // Accessors: transparent AOS view of the SOA data
    Wide<const float> wScale (wS_);
    Wide<const Vec3> wVec (wVec_);
    Wide<const Matrix44> wMat (wM_);
    Masked<Vec3> wVT_result (wVT_, mask);
    Masked<Vec3> wVS_result (wVS_, mask);
    for(int lane = 0; lane < __OSL_WIDTH; ++lane) {
        Vec3 V = wVec[lane];
        Float F = wScale[lane];
        Matrix44 M = wMat[lane];
        wVS_result[lane] = V*F;
        wVT_result[lane] = transform(M,V);
    }
}
• Accessors: transparent AOS view of SOA
15. SIMD OSL’s Wide Library (same callback as slide 14, next annotation)
• Extract data from a lane of the SOA (the reads wVec[lane], wScale[lane], wMat[lane])
16. SIMD OSL’s Wide Library (same callback as slide 14, next annotation)
• Array subscript returns a proxy object to that lane (wVS_result[lane], wVT_result[lane])
17. SIMD OSL’s Wide Library (same callback as slide 14, next annotation)
• Skips assignment if lane masked off (the writes to wVS_result[lane] and wVT_result[lane])
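The accessor ideas annotated on slides 14 to 17 can be illustrated with a small self-contained sketch. It is not the SIMD OSL Wide Library itself: `kWidth`, `WideVec3Storage`, `WideConstVec3`, `MaskedVec3`, `LaneProxy`, and `scale_batch` are hypothetical names, and the real library's templates and data layout may differ.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWidth = 8;  // hypothetical batch width (the slides use __OSL_WIDTH)

struct Vec3 { float x, y, z; };

// Structure-of-arrays (SOA) storage for one batch of Vec3 values.
struct WideVec3Storage {
    float x[kWidth];
    float y[kWidth];
    float z[kWidth];
};

// One bit per lane; a set bit means the lane is active.
struct Mask {
    std::uint32_t bits;
    bool is_on(int lane) const { return ((bits >> lane) & 1u) != 0; }
    bool any_on() const { return bits != 0; }
};

// Read-only accessor: gathers one lane of the SOA block into an ordinary Vec3.
class WideConstVec3 {
public:
    explicit WideConstVec3(const void* p)
        : s_(static_cast<const WideVec3Storage*>(p)) {}
    Vec3 operator[](int lane) const {
        return Vec3{s_->x[lane], s_->y[lane], s_->z[lane]};
    }
private:
    const WideVec3Storage* s_;
};

// Write accessor: operator[] returns a proxy bound to a single lane.
class MaskedVec3 {
public:
    class LaneProxy {
    public:
        LaneProxy(WideVec3Storage* s, int lane, bool on)
            : s_(s), lane_(lane), on_(on) {}
        // Scatter the value into the SOA arrays, but only if the lane is active.
        void operator=(const Vec3& v) const {
            if (!on_) return;
            s_->x[lane_] = v.x;
            s_->y[lane_] = v.y;
            s_->z[lane_] = v.z;
        }
    private:
        WideVec3Storage* s_;
        int lane_;
        bool on_;
    };
    MaskedVec3(void* p, Mask m)
        : s_(static_cast<WideVec3Storage*>(p)), mask_(m) {}
    LaneProxy operator[](int lane) const {
        return LaneProxy(s_, lane, mask_.is_on(lane));
    }
private:
    WideVec3Storage* s_;
    Mask mask_;
};

// Same shape as the slide's loop: read a lane, compute, write through the mask.
inline void scale_batch(const void* wVec_, float scale, void* wOut_,
                        unsigned int mask_value) {
    Mask mask{mask_value};
    assert(mask.any_on());
    WideConstVec3 wVec(wVec_);
    MaskedVec3 wOut(wOut_, mask);
    for (int lane = 0; lane < kWidth; ++lane) {
        Vec3 v = wVec[lane];
        wOut[lane] = Vec3{v.x * scale, v.y * scale, v.z * scale};
    }
}
```

The loop body is written as if the data were an array of structures, while the storage stays in SOA form. Lanes that are masked off keep whatever value the output buffer already held.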
18. Components in SIMD OSL Render-time
• Render Time Optimization With LLVM* JIT (Just In Time Compilation) → Optimized x86-64
• Wide Library
• Divergent Control Flows
19. Divergent Control Flows
if (x > 0.5)
{
    ...
    if (y > 0.5)
    {
        …
        if (powB > 0.23)
        {
            …
        }
        else
        {
            …
        }
    } //y
} //x
• Stack of masks
• Effective mask (result of combining stack)
20 to 27. Divergent Control Flows (animation steps over the code on slide 19; each step updates the stack of masks and the effective mask, which is the result of combining the stack)
20. PUSH
21. PUSH
22. PUSH
23. POP
24. NEGATE, PUSH
25. POP
26. POP
27. POP
The sequence follows the nesting of the code: three PUSH steps for the three nested if conditions, a POP at the end of the innermost if, NEGATE and PUSH for its else branch, then one POP for each scope that closes.
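The stack-of-masks bookkeeping on the Divergent Control Flows slides can be sketched as follows. This is a minimal illustration under assumptions: `MaskStack` and `LaneMask` are hypothetical names, and the actual SIMD OSL code generator may track masks differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using LaneMask = std::uint32_t;  // one bit per lane; set bit = lane active

// Hypothetical mask stack: one entry per open conditional scope.
class MaskStack {
public:
    explicit MaskStack(LaneMask initial) : initial_(initial) {}

    // PUSH: entering a conditional with the per-lane condition results.
    void push(LaneMask condition) { stack_.push_back(condition); }

    // POP: leaving a conditional scope; returns the mask that was on top.
    LaneMask pop() {
        LaneMask top = stack_.back();
        stack_.pop_back();
        return top;
    }

    // Effective mask: the initial mask combined (ANDed) with every entry.
    LaneMask effective() const {
        LaneMask m = initial_;
        for (LaneMask c : stack_) m &= c;
        return m;
    }

private:
    LaneMask initial_;
    std::vector<LaneMask> stack_;
};
```

For an else branch, the sketch pops the if-condition mask and pushes its negation (`push(~popped)`), which matches the POP followed by NEGATE and PUSH steps on the slides. Bits set by the negation outside the batch are cleared again when the effective mask is combined with the initial mask.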
28. Components in SIMD OSL Render-time
• Render Time Optimization With LLVM* JIT (Just In Time Compilation) → Optimized x86-64
• Wide Library
• Divergent Control Flows
• Vectorized IR Generation
29. General LLVM Code Flow for OSL Operations
OSL:
• Retrieve symbols for Operands
• Emit LLVM-defined operations OR call appropriate functions
• Store Result
30. What changes in SIMD OSL
OSL:
• Retrieve symbols for Operands
• Load values
• Initialize values
• Emit LLVM-defined operations OR call appropriate functions
• Store Result
Operand/result combinations:
• Operands → Uniform, Results → Uniform
• Operands → Uniform, Results → Varying
• Operands → Varying, Results → Uniform
• Operands → Varying, Results → Varying
31. What changes in SIMD OSL
Operands → Uniform, Results → Uniform
SIMD OSL:
• Retrieve symbols for Operands
• Call uniform function
• Store Result
32. What changes in SIMD OSL
Operands → Uniform, Results → Varying
SIMD OSL:
• Retrieve symbols for Operands
• Call uniform function
• Widen Result
• Store Result
33. What changes in SIMD OSL
Operands → Varying, Results → Varying
SIMD OSL:
• Retrieve symbols for Operands
• Add address for Results to arguments
• Add effective mask to arguments
• Call varying function
34. What changes in SIMD OSL
Operands → mix of Uniform and Varying, Results → Varying
SIMD OSL:
• Retrieve symbols for Operands
• Allocate a varying temp
• Widen uniform Operands and store to varying temp
• Add address for Results to arguments
• Add effective mask to arguments
• Call varying function
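The mixed uniform and varying case can be sketched in a few lines. The names here (`WideFloat`, `widen`, `wide_mul`, `mul_uniform_by_varying`) are hypothetical and only mirror the listed steps; the real SIMD OSL code generator emits LLVM IR rather than calling helpers like these.

```cpp
#include <cassert>

constexpr int kWidth = 8;  // hypothetical batch width

// A varying float: one value per lane.
struct WideFloat { float v[kWidth]; };

// Widen: broadcast a uniform value into every lane of a varying temp.
inline WideFloat widen(float u) {
    WideFloat w;
    for (int lane = 0; lane < kWidth; ++lane) w.v[lane] = u;
    return w;
}

// Hypothetical varying function. Its arguments are the varying operands,
// then the address for the result, then the effective mask.
inline void wide_mul(const WideFloat* a, const WideFloat* b,
                     WideFloat* result, unsigned int mask) {
    for (int lane = 0; lane < kWidth; ++lane) {
        if ((mask >> lane) & 1u) {
            result->v[lane] = a->v[lane] * b->v[lane];
        }
    }
}

// Mixed case: one uniform operand, one varying operand, varying result.
inline void mul_uniform_by_varying(float uniform_a, const WideFloat& varying_b,
                                   WideFloat& result, unsigned int mask) {
    WideFloat temp = widen(uniform_a);           // allocate varying temp and widen
    wide_mul(&temp, &varying_b, &result, mask);  // call the varying function
}
```

Widening the uniform operand first means only one varying version of the function is needed, at the cost of a broadcast per call.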
35. What changes in SIMD OSL
Operands → Varying, Results → Uniform
• Unreachable
36. Components in SIMD OSL Render-time
• Render Time Optimization With LLVM* JIT (Just In Time Compilation) → Optimized x86-64
• Wide Library
• Divergent Control Flows
• Vectorized IR Generation
• “For-each-unique” algorithm
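The slides name the “For-each-unique” algorithm without showing it here. One common reading, offered only as an assumption, is: when an operation needs a uniform input but receives a varying one, loop over the distinct values among the active lanes and run the operation once per distinct value, with a mask of the lanes that hold it. The helper `for_each_unique` below is a hypothetical sketch of that idea, not the SIMD OSL implementation.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWidth = 8;  // hypothetical batch width

// Calls fn(value, lane_mask) once per distinct value found in the active lanes.
// lane_mask has a bit set for every active lane holding that value.
template <typename T, typename Fn>
void for_each_unique(const T (&values)[kWidth], std::uint32_t mask, Fn&& fn) {
    std::uint32_t remaining = mask & ((1u << kWidth) - 1u);
    while (remaining != 0) {
        // Find the first lane that has not been handled yet.
        int lead = 0;
        while (((remaining >> lead) & 1u) == 0) ++lead;
        const T& v = values[lead];

        // Collect every remaining lane that holds the same value.
        std::uint32_t match = 0;
        for (int lane = lead; lane < kWidth; ++lane) {
            if (((remaining >> lane) & 1u) && values[lane] == v) {
                match |= (1u << lane);
            }
        }
        fn(v, match);
        remaining &= ~match;  // these lanes are done
    }
}
```

If all active lanes share one value, the callback runs once for the whole batch; the worst case is one call per active lane.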
41. Components in SIMD OSL Render-time
• Render Time Optimization With LLVM* JIT (Just In Time Compilation) → Optimized x86-64
• Wide Library
• Divergent Control Flows
• Vectorized IR Generation
• “For-each-unique” algorithm
• SIMD OSL built-ins
43. OSL Microbenchmarks: Speedup of SIMD AVX-512 OSL over Scalar OSL
[Chart. Y axis: speedup, logarithmic scale from 0.125 to 16. X axis, one entry per benchmark: null, sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, atan2, sincos, log, log2, log10, logb, exp, exp2, expm1, pow, erf, erfc, radians, degrees, sqrt, inversesqrt, hypot, abs, fabs, sign, floor, ceil, round, trunc, mod, min, max, clamp, mix, isnan, isfinite, select, dot, cross, length, distance, normalize, reflect, fresnel, rotate, transform, transform_matrix, matrix_object_camera, determinant, transpose, linearstep, smooth_linearstep, noise_perlin, noise_cell, noise_simplex, noise_gabor, pnoise_perlin, pnoise_cell, pnoise_gabor, spline_bezier, spline_bspline, spline_catmull-rom, spline_hermite, spline_linear, spline_constant]
48 threads on Intel(R) Xeon(R) Platinum 8260L CPU @2.30GHz (config 2)
Average: 6.9x
Geomean: 6.14x
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
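The microbenchmark slide reports both an average and a geometric mean of the speedups. The sketch below shows how the two summaries differ; the function names are hypothetical, and the numbers in the note that follows are made up for illustration, not the benchmark data.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean: sum of the speedups divided by their count.
inline double arithmetic_mean(const std::vector<double>& speedups) {
    double sum = 0.0;
    for (double s : speedups) sum += s;
    return sum / static_cast<double>(speedups.size());
}

// Geometric mean: exp of the mean of the logs, i.e. the n-th root of the product.
inline double geometric_mean(const std::vector<double>& speedups) {
    double log_sum = 0.0;
    for (double s : speedups) log_sum += std::log(s);
    return std::exp(log_sum / static_cast<double>(speedups.size()));
}
```

For the two made-up speedups 2x and 8x, the arithmetic mean is 5x while the geometric mean is 4x. The geometric mean is never larger than the arithmetic mean, which is consistent with the slide's 6.14x against 6.9x.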
44. OSL SIMD Performance at Maximum Batch Utilization
OSL’s testshade running Intel® AVX-512 on 48 threads of Intel(R) Xeon(R) Platinum 8260L CPU @2.40 GHz (config 1)
[Chart. Y axis: speedup at max batch size, 0.00 to 16.00. Shaders: leopard, concrete, diamond, oak, marble. Data labels as extracted, not matched to shaders: 5.2x, 6x, 10x, 12x, 15x]
45. SIMD OSL Intel® AVX-512 vs. AVX2
OSL’s testshade running Intel® AVX-512 and AVX2 on 48 threads of Intel(R) Xeon(R) Platinum 8260L CPU @2.40 GHz (config 1)
[Chart. Y axis: speedup, 0.0 to 2.0. Shaders: leopard, concrete, diamond plate, oak, marble, thread, donut. Data labels as extracted, not matched to shaders: 1.6x, 1.9x, 1.1x, 1.3x, 1.3x, 1.4x, 1.8x]
46. Evolution of SIMD OSL: Proof of Concept to Production, 2016-2019
Tracks: SIMD OSL Library, SIMD OSL Framework, SIMD OSL Performance
• Intel® AVX-512, AVX2, AVX-specific libraries
• Masking and scatter-gather
• 17k+ tests
• Improved performance on built-in functions
• Compiler + platform support
• Reduction in JIT time
• Coverage for built-in function variants
• Handling treacherous control flows
• Noise functions with options
• LLVM optimization passes to improve AVX2
47. Source repositories
• Open Shading Language: https://github.com/imageworks/OpenShadingLanguage
• SIMD Open Shading Language: https://gitlab.com/intel-osl/BatchedOSL
51. 22.4’s Overall Rendering Speedup with SIMD OSL
[Chart. Y axis: speedup, 1 to 1.3. Scenes: Bonnie’s room, Fillmore, Bonnie. Data labels as extracted, not matched to scenes: 1.11x, 1.17x, 1.27x. Legend: CLX8260L (24c, 2.3GHz)]
Run on 48 threads of 24-core Intel(R) Xeon(R) Platinum 8260L CPU @ 2.30GHz (config 2)