Shadow Caster Culling for Efficient Shadow Mapping

Jiří Bittner∗        Oliver Mattausch†        Ari Silvennoinen‡        Michael Wimmer†

∗ Czech Technical University in Prague§        † Vienna University of Technology        ‡ Umbra Software

Figure 1: (left) A view of a game scene rendered with shadows. (right) Shadow map with shadow casters rendered using a naive application
of occlusion culling in the light view (gray), and shadow casters rendered using our new method (orange). The view frustum corresponding
to the left image is shown in yellow. Note that for this view our shadow caster culling provided a 3x reduction in the number of rendered
shadow casters, leading to a 1.5x increase in total frame rate. The scene is a part of the Left 4 Dead 2 game (courtesy of Valve Corp.).


Abstract

We propose a novel method for efficient construction of shadow maps by culling shadow casters which do not contribute to visible shadows. The method uses a mask of potential shadow receivers to cull shadow casters using a hierarchical occlusion culling algorithm. We propose several variants of the receiver mask implementation with different culling efficiency and computational costs. For scenes with statically focused shadow maps we designed an efficient strategy to incrementally update the shadow map, which comes close to the rendering performance for unshadowed scenes. We show that our method achieves a 3x-10x speedup for rendering large city-like scenes and a 1.5x-2x speedup for rendering an actual game scene.

CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling

Keywords: shadow maps, occlusion culling, real-time rendering

∗ e-mail: bittner@fel.cvut.cz
† e-mail: {matt|wimmer}@cg.tuwien.ac.at
‡ e-mail: ari@umbrasoftware.com
§ Faculty of Electrical Engineering

1 Introduction

Shadow mapping is a well-known technique for rendering shadows in 3D scenes [Williams 1978]. With the rapid development of graphics hardware, shadow maps became an increasingly popular technique for real-time rendering. A variety of techniques have been proposed, most of which focus on increasing the quality of the rendered image given a maximum resolution of the shadow map [Stamminger and Drettakis 2002; Wimmer et al. 2004; Lloyd et al. 2008]. Given sufficiently high resolution, these methods achieve shadows of very high quality.

For highly complex scenes, however, shadow mapping can become very slow. One problem is the rendering of the main camera view. This problem has been addressed by occlusion culling, which aims to quickly cull geometry which does not contribute to the image [Cohen-Or et al. 2003]. However, just rendering the camera view quickly does not guarantee high frame rates when shadow mapping is used. In particular, the overhead of creating the shadow map is not reduced. Thus, rendering the shadow map can easily become the bottleneck of the whole rendering pipeline, as it may require rendering a huge amount of shadow casters at a very high resolution.

A naive solution to this problem would again use occlusion culling to reduce the amount of rendered shadow casters when rendering the shadow map. However, it turns out that for common complex scenes like terrains or cities, the lights are set up so that they have a global influence on the whole scene (e.g., the sun shining over a city). For such scenes, occlusion culling from the light view will not solve the problem, as the depth complexity of the light view is rather low and thus not much geometry will be culled. Thus, even if occlusion culling is used for both the camera view and the light view, we might end up rendering practically all geometry contained in the intersection of the view frustum and the light frustum, and many rendered shadow casters will not contribute to the shadows in the final image.

In this paper, we propose a method for solving this problem by using the knowledge of visibility from the camera for culling the shadow casters. First, we use occlusion culling for the camera view to identify visible shadow receivers. Second, when rendering into the shadow map, we use only those shadow casters which cast shadows on visible shadow receivers. All other shadow casters are culled. For both the camera and the light views, we use an improved version of the coherent hierarchical culling algorithm (CHC++) [Mattausch et al. 2008], which provides efficient scheduling of occlusion queries based on temporal coherence. We show that this method brings up to an order of magnitude speedup compared to a naive application of occlusion culling to shadow mapping (see Figure 1). As a minor contribution, we propose a simple method for incrementally updating a statically focused shadow map for scenes with moving objects.

2 Related Work

Shadow mapping was originally introduced by Williams [1978], and has meanwhile become the de-facto standard shadow algorithm in real-time applications such as computer games, due to its speed and simplicity. For a discussion of different methods to increase the quality of shadow mapping, we refer to a recent survey [Scherzer et al. 2010]. In our approach in particular, we make use of light space perspective shadow mapping (LiSPSM) [Wimmer et al. 2004] in order to warp the shadow map to provide higher resolution near the viewpoint, and cascaded shadow maps [Lloyd et al. 2006], which partition the shadow map according to the view frustum with the same aim.

There has been surprisingly little work on shadow mapping for large scenes. One obvious optimization which also greatly aids shadow quality is to focus the light frustum on the receiver frustum [Brabec et al. 2002]. As a result, many objects which do not cast shadows on objects in the view frustum are culled by frustum culling in the shadow map rendering step. Lauritzen et al. [2010] have reduced the focus region to the subset of visible view samples in the context of optimizing the splitting planes for cascaded shadow maps.

Govindaraju et al. [2003] were the first to use occlusion culling to reduce the amount of shadow computations in large scenes. Their algorithm computes object-precision shadows mostly on the CPU and uses occlusion queries on a depth map to provide the tightest possible bound on the shadow polygons that have to be rendered. While their algorithm also renders shadow receivers to the stencil buffer to identify potential shadow casters, this was done after a depth map had already been created, and thus this approach is unsuitable for shadow mapping acceleration. In our algorithm, on the other hand, the creation of the depth map itself is accelerated, which is a significant benefit for many large scenes. Lloyd et al. [2004] later extended shadow caster culling by shadow volume clamping, which significantly reduced the fillrate when rendering shadow volumes. Their paper additionally describes a method for narrowing the stencil mask to shadow receiver fragments that are detected to lie in shadow. Décoret [2005] proposed a different method for shadow volume clamping as one of the applications for N-buffers. Unlike Lloyd et al. [2004], Décoret enhances the stencil mask by detecting receiver fragments which are visible from the camera. The methods for shadow volume culling and clamping have been extended by techniques allowing for more efficient GPU implementations [?; Engelhardt and Dachsbacher 2009].

While our method shares the idea of using visible receivers to cull shadow casters proposed in the above mentioned techniques, in these methods the shadow map served only as a proxy to facilitate shadow volume culling and clamping, and the efficient construction of the shadow map itself has not been addressed. When shadow mapping is used instead of the more computationally demanding shadow volumes, the bottleneck of the computation moves to the shadow map construction itself. This bottleneck, which we address in this paper, was previously left intact.

3 Algorithm Outline

The proposed algorithm consists of four main steps:

(1) Determine shadow receivers
(2) Create a mask of shadow receivers
(3) Render shadow casters using the mask for culling
(4) Compute shading

The main contribution of our paper lies in steps (2) and (3). To give a complete picture of the method, we briefly outline all four steps below.

Determine shadow receivers
The potential shadow receivers for the current frame consist of all objects visible from the camera view, since shadows on invisible objects do not contribute to the final image. Thus we first render the scene from the point of view of the camera and determine the visibility status of the scene objects in this view.

To do both of these things efficiently, we employ hierarchical occlusion culling [Bittner et al. 2004; Guthe et al. 2006; Mattausch et al. 2008]. In particular, we selected the recent CHC++ algorithm [Mattausch et al. 2008], as it is simple to implement and provides a good basis for optimizing further steps of our shadow caster culling method. CHC++ provides us with a visibility classification of all nodes of a spatial hierarchy. The potential shadow receivers correspond to the visible leaves of this hierarchy.

Note that in contrast to simple shadow mapping, our method requires a render pass of the camera view before rendering the shadow map. However, such a rendering pass is implemented in many rendering frameworks anyway. For example, deferred shading, which has become popular due to recent advances in screen-space shading effects, provides such a pass, and even standard forward renderers often utilize a depth prepass to improve pixel-level culling.

Create a mask of shadow receivers
The crucial step of our algorithm is the creation of a mask in the light view which represents the shadow receivers. For a crude approximation, this mask can be formed by rendering bounding boxes of visible shadow receivers into the stencil buffer attached to the shadow map. In the next section, we propose several increasingly sophisticated methods for building this mask, which provide different accuracy vs. complexity trade-offs.

Render shadow casters
We use hierarchical visibility culling with hardware occlusion queries to speed up the shadow map rendering pass. In addition to depth-based culling, we use the shadow receiver mask to cull shadow casters which do not contribute to visible shadows. More precisely, we set up the stencil test to discard fragments outside the receiver mask. Thus, the method will only render shadow casters that are visible from the light source and whose projected bounding box at least partly overlaps the receiver mask.

Compute shading
The shadow map is used in the final rendering pass to determine light source visibility for each shaded fragment.

4 Shadow Caster Culling

The general idea of our method is to create a mask of visible shadow receivers, i.e., those objects determined as visible in the first camera rendering pass. This mask is used in the shadow map rendering pass together with the depth information to cull shadow casters which do not contribute to the visible shadows. Culling becomes particularly efficient if it employs a spatial hierarchy to quickly cull large groups of shadow casters.
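The interplay of steps (1)-(3) can be sketched in a few lines of CPU-side C++. This is only an illustrative simulation, not the authors' GPU implementation: the light view is reduced to a 1D strip of eight shadow-map texels, camera-view visibility (step 1) is assumed to be given, and the receiver mask and the stencil-based caster test are modeled with plain arrays. All type and function names are invented for this sketch.

```cpp
#include <cassert>
#include <vector>
#include <array>

// Minimal sketch of the masking pipeline on a 1D "light space":
// a receiver mask over 8 shadow-map texels; objects cover texel ranges.
struct Object {
    int lightMin, lightMax;   // texel range covered in the light view (within [0,8))
    bool visibleFromCamera;   // result of step (1), assumed given here
};

using Mask = std::array<bool, 8>;

// Step (2): splat visible receivers into the receiver mask.
Mask buildReceiverMask(const std::vector<Object>& objects) {
    Mask mask{};  // all texels start unmarked
    for (const Object& o : objects)
        if (o.visibleFromCamera)
            for (int t = o.lightMin; t <= o.lightMax; ++t)
                mask[t] = true;
    return mask;
}

// Step (3): a caster is rendered only if its light-space footprint
// overlaps the receiver mask (the stencil test in the real algorithm).
bool castsOnVisibleReceiver(const Object& caster, const Mask& mask) {
    for (int t = caster.lightMin; t <= caster.lightMax; ++t)
        if (mask[t]) return true;
    return false;
}
```

In the actual algorithm the mask lives in a stencil buffer attached to the shadow map, and the overlap decision is made by hardware occlusion queries over a spatial hierarchy rather than per object on the CPU.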
Figure 2: Visualization of different receiver masks. On the left there is a light space view of the scene showing surfaces visible from the
camera in yellow. On the right three different receiver masks are shown. Note that the invisible object on the top right does not contribute to
any of the receiver masks.


Such hierarchical culling of whole caster groups is supported, for example, by the CHC++ algorithm.

There are various options for how the receiver mask can be implemented. We describe four noteworthy variants which differ in implementation complexity as well as in culling efficiency. The creation of the mask happens in a separate pass before rendering the shadow map, but this pass may already generate parts of the shadow map itself as well, simplifying the subsequent shadow map rendering pass.

We start our description with a simple version of the mask, which is gradually extended with more advanced culling. As we shall see later in the results section, in most cases the more accurate masks provide better results, although there might be exceptions for particular scene and hardware configurations.

4.1 Bounding Volume Mask

The most straightforward way to create the mask is to rasterize the bounding volumes of all visible shadow receivers into the stencil buffer attached to the shadow map.

Such a mask will generally lead to more shadow casters being rendered than actually necessary. The amount of overestimation depends mostly on how fine the subdivision of scene meshes into individual objects is, and how tightly their bounding volumes fit. If an individual scene object has too large a spatial extent, its projection might cover a large portion of the shadow mask even if only a small part of the mesh is actually visible. An illustration of the bounding volume mask is shown in Figure 2 (BVOL).

4.2 Geometry Mask

In order to create a tighter shadow receiver mask, we can rasterize the actual geometry of the visible shadow receivers into the stencil buffer instead of their bounding volumes.

While rendering the visible shadow receivers, we also write their depth values into the shadow map. This has the advantage that these objects have already been rendered into the shadow map and can be skipped during the subsequent shadow map rendering pass, which uses occlusion queries just for the remaining objects. An illustration of the geometry mask is shown in Figure 2 (GEOM).

In some scenes many shadow receivers also act as shadow casters, and then this method provides a more accurate mask at almost no cost. However, if most shadow receivers do not act as shadow casters (i.e., most of the scene is shadowed by objects outside of the camera view), the shadow mask creation can become rather expensive, since most parts of the shadow map will be replaced by other objects which act as the real shadow casters. Consequently, the resources spent on rendering these shadow receivers into the shadow map are wasted, as the receivers would otherwise have been culled by the occlusion culling algorithm.

4.3 Combined Geometry and Bounding Volume Mask

In order to combine the positive aspects of the two previously described methods, we propose a technique which decides whether a shadow receiver should fill the mask using its bounding box or its geometry. The decision is based on an estimation of whether the visible shadow receiver will simultaneously act as a shadow caster. Such objects are rendered using their geometry (with both stencil and depth writes, as such objects must be rendered into the shadow map anyway), while all other visible receivers are rendered using bounding boxes (with only stencil writes).

The estimation uses temporal coherence: if a shadow receiver was visible in the shadow map in the previous frame, it is likely that this receiver will stay visible in the shadow map and therefore also act as a shadow caster in the current frame.

Again, all objects that have already been rendered into the shadow map using depth writes can be skipped in the shadow map rendering pass. However, in order to estimate the receiver visibility in the subsequent frame, visibility needs to be determined even for these skipped objects. Therefore we include them in the hierarchical occlusion traversal, and thus they may have their bounding volumes rasterized during the occlusion queries.

4.4 Fragment Mask

Even the most accurate of the masks described above, the geometry mask, can lead to overly conservative receiver surface approximations. This happens when a receiver object spans a large region of the shadow map even though most of the object is hidden in the camera view. An example would be a single ground plane used for a whole city, which would always fill the whole shadow receiver mask and thus prevent any culling. While such extreme cases can be avoided by subdividing large receiver meshes into smaller objects, it is still beneficial to have a tighter mask which is completely independent of the actual object subdivision.

Fortunately, we can further refine the mask by using fragment-level receiver visibility tests for all shadow receivers where the full geometry is used for the mask.
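The fragment-level visibility test just introduced amounts to checking, for every fragment rasterized into the shadow map, whether that point on the receiver is visible in the camera view. The following CPU-side sketch illustrates the test in isolation; it is not the paper's shader. The back-projection into the camera image is abstracted to a precomputed pixel index and camera-space depth, and `LightFragment`, `visibleFromCamera`, and the 1D depth-buffer layout are invented for the example.

```cpp
#include <cassert>
#include <vector>

// Sketch of the fragment-level receiver visibility test: a fragment
// rasterized into the shadow map contributes to the receiver mask
// only if it is visible in the camera view.
struct LightFragment {
    int camPixel;     // where the fragment lands in the camera image
    float camDepth;   // its depth along the corresponding camera ray
};

bool visibleFromCamera(const LightFragment& f,
                       const std::vector<float>& camDepthBuffer,
                       float bias = 1e-3f) {
    if (f.camPixel < 0 || f.camPixel >= (int)camDepthBuffer.size())
        return false;  // outside the camera view -> cannot receive a visible shadow
    // Visible iff nothing in the camera view is closer (up to a depth bias).
    return f.camDepth <= camDepthBuffer[f.camPixel] + bias;
}
```

A fragment passing this test marks its receiver-mask entry; the shadow-map depth is written in either case, since a camera-hidden fragment may still shadow another visible receiver.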
Method             accuracy        fill rate      transform rate
     BVOL                low            high              low
    GEOM               medium         medium            medium
  GEOM+BVOL            medium       low–medium            low
     FRAG                high         medium            medium

Table 1: Summary of the masking techniques and indicators of their
main attributes. BVOL – bounding box, GEOM – geometry mask,
GEOM+BVOL – combined geometry and bounding volume mask,
FRAG – fragment mask. Note that the FRAG method either relies on
availability of direct stencil writes, or requires slightly more effort
in mask creation and culling phases as described in Section 4.4.



While creating the mask by rasterizing the geometry in light space, we project each fragment back to view space and test the projected fragment's visibility against the view-space depth buffer. If the fragment is invisible, it lies on a part of the shadow receiver that is hidden, and is thus not written to the mask; otherwise, the mask entry for this fragment is updated. The shadow map depth needs to be written in both cases, since even if the fragment is not visible in the camera view, it might still be a shadow caster that shadows another visible shadow receiver.

A similar mask construction approach was proposed by Décoret [2005] for culling and clamping the shadow volumes of shadow casters. Décoret used a pair of litmaps to obtain the minimum and maximum depths of visible receivers per texel in order to clamp shadow volumes. In his method, all scene objects are processed twice in the mask construction phase, and the method does not use the knowledge of visible shadow receivers. In contrast, we use a single binary mask of visible receiver fragments and combine the construction of the mask with the simultaneous rendering of the depths of objects which act both as shadow receivers and shadow casters, which is very important for reducing the bottleneck created by the shadow map construction.

The mask constructed using the fragment-level visibility tests is pixel accurate, i.e., only locations that can receive a visible shadow are marked in the mask (up to the bias used for the fragment depth comparison). Note that the fragment visibility test corresponds to a "reversed" shadow test, i.e., the roles of the camera and the light are swapped: instead of testing the visibility of a camera view fragment with respect to the shadow map, we test the visibility of a fragment rendered into the shadow map with respect to the camera, using the camera view depth buffer. An illustration of the fragment mask is shown in Figure 2 (FRAG).

The implementation of this approach faces the problem that the stencil mask needs to be updated based on the outcome of the shader. This functionality is currently not widely supported in hardware. Therefore, we implement the fragment receiver mask as an additional texture render target instead of using the stencil buffer. This requires an additional texture fetch and a conditional fragment discard based on the fetched value while rendering the occlusion queries in the shadow map rendering pass, which incurs a small performance penalty compared to a pure stencil test.

A summary of the different receiver masking methods is given in Table 1. The culling efficiency of the different masks is illustrated in Figure 3. Note that apart from the BVOL method, all other methods initialize the depth buffer values for the light view with the depths of visible receivers.

Figure 3: 2D illustration of the culling efficiency of different masking strategies. The figure shows an example of a scene with a shadow receiver object and different types of shadow receiver masks. Object A is always rendered as it intersects all masks, object B is culled only by the FRAG mask, C is culled by the depth test (occluded by B) and by the FRAG and GEOM masks, and finally D is culled by all types of masks.

5 Further Optimizations

This section contains several optimization techniques which can optionally complement the proposed receiver masking in further enhancing the performance of the complete rendering algorithm.

5.1 Incremental Shadow Map Updates

If the shadow map is statically focused, we can extend the approach by restricting the shadow receiver mask to places where a potential change in shadow can happen. We do so by building the shadow receiver mask only from objects involved in dynamic changes. In particular, we render all moving objects into the shadow receiver mask at their previous and current positions. This pass is restricted to dynamic objects which are visible in either this or the previous frame.

If only a fraction of the scene objects is transformed, the shadow receiver mask becomes very tight and the overhead of shadow map rendering becomes practically negligible. However, this optimization is only useful if a static shadow map brings sufficient shadow quality compared to a shadow map focused on the view frustum.

Note that in order to create the stencil mask and simultaneously clear the depth buffer in a selective fashion, we reverse the depth test (only fragments with greater depth pass) and set the fragment depth to 1 in the shader while rendering the bounding boxes of the moving objects.

5.2 Visibility-Aware Shadow Map Focusing

Shadow maps are usually "focused" on the view frustum in order to increase the available shadow map resolution. If visibility from the camera view is available, focusing can be improved further [Lauritzen et al. 2010]: the light frustum can then be computed by calculating the convex hull of the light source and the set of bounding boxes of the visible shadow receivers. Note that unlike the method of Lauritzen et al. [2010], our focusing approach does not require a view sample analysis on the GPU, but uses the readily available visibility classification from the camera rendering pass.
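As a rough illustration of visibility-aware focusing, the sketch below fits a light-space extent to the visible receivers only. It deliberately simplifies the convex-hull computation described above to a 1D axis-aligned fit, and all names (`Receiver`, `focusExtent`) are invented for the example.

```cpp
#include <cassert>
#include <vector>
#include <algorithm>
#include <limits>
#include <utility>

// Sketch of visibility-aware focusing, reduced to one light-space axis:
// the light frustum extent is fitted to the bounds of the visible
// shadow receivers only (the paper computes a convex hull with the
// light source; an axis-aligned fit suffices to illustrate the idea).
struct Receiver {
    float lightMin, lightMax;  // light-space extent of the bounding box
    bool visibleFromCamera;    // visibility classification from the camera pass
};

// Returns the focused [min,max] extent; if nothing is visible the
// interval is empty (min > max).
std::pair<float, float> focusExtent(const std::vector<Receiver>& rs) {
    float lo = std::numeric_limits<float>::max();
    float hi = std::numeric_limits<float>::lowest();
    for (const Receiver& r : rs)
        if (r.visibleFromCamera) {
            lo = std::min(lo, r.lightMin);
            hi = std::max(hi, r.lightMax);
        }
    return {lo, hi};
}
```

The tighter the fitted extent, the higher the effective shadow-map resolution over the visible portion of the scene.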
Figure 4: This figure shows a viewpoint in the Vienna scene, the corresponding light view, and a visualization of the different versions of
receiver masks where blue indicates the bounding volume mask, green the geometry mask, and red the fragment mask.


This simple method effectively increases the resolution of the             6 Results and Discussion
shadow map, as it is focused only on the visible portion of the scene. However, this technique also has a drawback: if a large visibility change occurs in the camera view, the focus of the light view changes dramatically from one frame to the next, which can be perceived as temporal aliasing or popping of the shadow. This can easily happen in scenes in which a distant object suddenly pops up on the horizon, such as a tall building, a flying aircraft, or a mountain.

5.3 CHC++ Optimizations

Our new shadow culling method works with any occlusion culling algorithm that allows taking the stencil mask into account. We use coherent hierarchical culling (CHC++ [Mattausch et al. 2008]) to implement hierarchical occlusion queries both for rendering the camera view and for rendering the shadow map. For the shadow map, occlusion queries can be set up so that they automatically take into account the shadow receiver mask stored in the stencil buffer. If the receiver mask is stored in a separate texture (as for the fragment mask), a texture lookup and a conditional fragment discard are necessary in the occlusion culling shader.

6.1 Test Setup

We implemented our algorithms in OpenGL and C++ and evaluated them using a GeForce 480 GTX GPU and a single Intel Core i7 920 CPU at 2.67 GHz. For the camera view render pass we used an 800 × 600 32-bit RGBA render target (to store color and depth) and a 16-bit RGB render target (to store the normals), respectively. For the light view render pass we used a 32-bit depth-stencil texture and an 8-bit RGB color buffer of varying sizes. The application uses a deferred shading pipeline for shadowing.

City scenes exhibit the scene characteristics which our algorithm is targeted at: large, open scenes with a high cost for shadow mapping. Hence we tested our algorithm in three city environments: a model of Manhattan, the ancient town of Pompeii, and the town of Vienna (refer to Table 3 for detailed information). All scenes were populated with various scene objects: Manhattan was populated with 1,600 cars, Pompeii with 200 animated characters, and Vienna with several street objects and trees. For Manhattan (Pompeii) we created 30² (40²) floor tiles in order to have sufficient granularity for the object-based receiver mask methods. We measured our timings with a number of predefined walkthroughs. Figure 4 shows the different receiver masks in a real scene.
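On the CPU, the effect of the masked occlusion query described in Section 5.3 can be mimicked with a small sketch. The following C++ snippet is our own illustration (the buffer layout and function names are hypothetical, not the paper's implementation): a bounding-box fragment counts toward the query result only if it passes the depth test and lands on a marked receiver texel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulated light-view buffers; in the real renderer the mask lives in
// the stencil buffer or a separate texture (hypothetical layout).
struct LightViewBuffers {
    int width = 0, height = 0;
    std::vector<float>   depth;        // current shadow map depth
    std::vector<uint8_t> receiverMask; // 1 = potential visible receiver
};

// Counts the samples a masked occlusion query would report for the
// rasterized fragments of a caster's bounding box: a fragment passes
// only if the receiver mask is set (stencil test / fragment discard)
// AND it is closer than the stored depth.
int maskedQuerySamples(const LightViewBuffers& b,
                       const std::vector<int>& fragX,
                       const std::vector<int>& fragY,
                       const std::vector<float>& fragZ)
{
    int samples = 0;
    for (std::size_t i = 0; i < fragX.size(); ++i) {
        const int idx = fragY[i] * b.width + fragX[i];
        if (b.receiverMask[idx] == 0) continue; // masked out: no receiver
        if (fragZ[i] < b.depth[idx]) ++samples; // passes the depth test
    }
    return samples; // 0 => the caster cannot affect any visible shadow
}
```

With a stencil-stored mask, this gating is obtained in OpenGL by enabling the stencil test (e.g., glStencilFunc(GL_EQUAL, 1, 0xFF)) while the query is active; with a texture-stored mask, the culling shader performs the lookup and discards unmasked fragments.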

  Abbreviation   Method
  VFC            view frustum culling
  REF            CHC++ occlusion culling (main reference)
  FOCUS          REF + visibility-aware focusing
  CHC++-OPT      optimized CHC++
  INCR           incremental shadow map updates
  UNSHAD         main render pass without shadow mapping

Table 2: Abbreviations used in the plots (also refer to Table 1 for the variants of receiver masks).

We also propose a few modifications to the CHC++ algorithm, which should be useful in most applications using CHC++. Our recent experiments indicated that for highly complex views the main performance hit of CHC++ comes from the idle time of the CPU when the algorithm has to wait for query results of previously invisible nodes. The reason is that queries for previously invisible nodes, which sometimes generate further queries leading to further wait time, can be delayed due to batching, whereas queries for previously visible nodes, whose results are only of interest in the next frame, are issued immediately whenever the algorithm is stalled waiting for a query result.
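One remedy is to reorder the query schedule: issue queries for previously invisible nodes eagerly during traversal, and hold back queries for previously visible nodes (whose results matter only in the next frame) until the traversal has finished. The following C++ toy model is our own simplification of this policy, not the paper's code:

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy model of the modified query scheduling: previously invisible
// nodes are queried as early as possible (their results steer the
// current frame), while previously visible nodes are queued and
// queried only after the hierarchy traversal.
struct Node { std::string name; bool wasVisible; };

std::vector<std::string> issueOrder(const std::vector<Node>& traversal)
{
    std::vector<std::string> issued;
    std::deque<std::string> deferred; // previously visible nodes
    for (const Node& n : traversal) {
        if (n.wasVisible)
            deferred.push_back(n.name);  // result needed only next frame
        else
            issued.push_back(n.name);    // issue immediately, avoid stalls
    }
    // After traversal the shading pass would run, hiding the latency
    // of the deferred queries issued below.
    for (const std::string& s : deferred) issued.push_back(s);
    return issued;
}
```

In the full algorithm, the shading pass with shadow map lookups runs between issuing the deferred queries and fetching their results, so their latency is hidden behind useful work.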
Therefore, it turns out to be more beneficial to use the waiting time to issue queries for previously invisible nodes instead, starting these queries as early as possible. Queries for the previously visible nodes, on the other hand, are issued after the hierarchy traversal is finished, and their results are fetched just before the traversal in the next frame. Between these two stages we apply the shading pass with shadow map lookups, which constitutes a significant amount of work, to avoid any stalls.

Another very simple CHC++ modification concerns the handling of objects that enter the view frustum. If such an object has been in the view frustum before, instead of setting it to invisible, it inherits its previous classification. This brings benefits especially for the camera view during frequent rotational movements.

We compare our methods mainly against a reference method (REF) that uses the original CHC++ algorithm for both the camera view and the light view. We also show simple view frustum culling (VFC) and a minor modification of the REF method which uses visibility-aware focusing (FOCUS). Note that apart from the FOCUS method, all other tested methods do not use visibility-aware focusing. The tested methods are summarized in Table 2.

6.2 Receiver Mask Performance and Shadow Map Resolution

Table 4 shows average frame times for the tested scenes and different shadow map resolutions. As can be seen, receiver masking is faster than REF and FOCUS for all scenes and parameter combinations. There is a tendency that our algorithm brings a slightly higher speedup for uniform shadow mapping than for LiSPSM. This has
                 Vienna        Manhattan      Pompeii
  Vertices    12,685,693    104,960,856    63,141,390
  Objects         22,464         12,836        11,097
  BVH nodes       20,240          6,471        14,945

[Plots: frame time (ms) over walkthrough frames for Vienna (LiSPSM, 1K²), New York (uniform, 4K²), and Pompeii (LiSPSM, 1K²), comparing VFC, REF, FOCUS, and FRAG or GEOM+BVOL.]

Table 3: Statistics for the scenes and characteristic walkthroughs.


two reasons: First, LiSPSM focuses much better on the visible parts of the view frustum. Second, LiSPSM trades the quality increase in the near field for a quality loss in the far-field regions (depending on the setting of the n parameter). Therefore it can happen that small objects in the background are simply not represented in the shadow map and will therefore be culled. Surprisingly, the frame times of the REF and FOCUS methods slightly decrease for increasing shadow map sizes in Manhattan. This unintuitive behavior might be connected with the heavy transform limitation of these algorithms for this particular scene configuration (i.e., due to many car instances).

  SM type             LISPSM             UNIFORM
  Shadow size      1K    2K    4K     1K    2K    4K
  Scene: Vienna
  REF             21.6  22.3  22.4   28.7  28.7  29.2
  FOCUS            9.1   9.3   9.3   10.8  11.0  11.4
  GEOM+BVOL        4.9   4.9   5.0    4.9   5.0   5.1
  FRAG             2.9   3.5   6.1    2.9   3.4   5.9
  Scene: Manhattan
  REF             36.6  35.9  35.1   44.2  43.9  41.8
  FOCUS           28.9  28.1  27.9   31.9  30.7  30.0
  GEOM+BVOL        5.6   5.7   6.6    5.6   5.7   6.5
  FRAG             4.5   5.4   9.0    4.5   5.3   8.6
  Scene: Pompeii
  REF             34.8  34.8  39.2   40.5  40.2  44.8
  FOCUS           17.4  17.4  20.6   18.2  18.3  21.2
  GEOM+BVOL       11.2  11.2  13.4   11.4  11.3  13.0
  FRAG             9.7  10.0  13.2    9.8  10.2  12.5

Table 4: Average frame times for the tested scenes (in ms). The time for the best method for the given scene and shadow map resolution is marked in bold.

Note that for shadow maps smaller than or equal to 2K², the fragment mask is consistently the fastest method in our tests. For larger shadow maps of 4K² or more, it becomes highly scene dependent whether the benefit outweighs the cost of the additional texture lookups during mask creation and occlusion culling. In this case, the GEOM+BVOL method can be used, which has a smaller overhead caused only by the stencil buffer writes and the rasterization of the additional bounding boxes. On the other hand, we expect a higher speedup for the reverse shadow testing once it is possible to write the stencil buffer in the shader. There is already a specification available called GL_ARB_STENCIL_EXPORT [corp. 2010], and we hope that it will be fully supported in hardware soon.

6.3 Walkthrough Timings

The plots in Table 3 show timings for walkthroughs of selected scene and parameter combinations. For Manhattan, we applied uniform shadow mapping with 4K² resolution and used the combined geometry and bounding volume receiver mask (GEOM+BVOL). Note that in this scene the FOCUS method is often useless, because skyscrapers are often visible in the distance, while most of the geometry in between can actually be culled by receiver masking. In Vienna and Pompeii, we applied LiSPSM shadow mapping with 1K² resolution and the fragment mask (FRAG). The plots show that receiver masking is the only method that provides stable frame times for the presented walkthroughs, which is an important property for a real-time rendering algorithm. The walkthroughs corresponding to these plots, along with visualizations of the culled objects, can be watched in the accompanying video for further analysis.

Table 5 shows the statistics and the timings for a scene taken from an actual game title (Left 4 Dead 2) using cascaded shadow mapping. This game scene was rendered using a different occlusion culling algorithm than CHC++, implemented in a different rendering engine. The receiver masking is however implemented exactly
as described in our paper. The fragment receiver mask and subsequent culling are applied separately for each of the four cascaded shadow maps using 1K² resolution. Note that in this scene many objects are actually inside the houses and hence culled already by the REF method (depth-based culling without receiver masking). Even though this might imply a limited additional gain of our algorithm compared to REF, Table 5 shows that fragment masking still provides a significant speedup. Note that during frames 1,800–2,200 only a small fraction of objects is visible in the view frustum and almost all casters in the shadow frustum throw a visible shadow in the main view; thus there is very little potential for performance gain using any shadow caster culling method.

  Left 4 Dead 2: 1,442,775 vertices, 6,359 objects, 10,675 BVH nodes

[Plot: frame time (ms) over walkthrough frames for Left 4 Dead 2 (cascaded shadow mapping, 1K²), comparing VFC, REF, and FRAG.]

Table 5: Statistics for the Left 4 Dead 2 scene and timings for a walkthrough using cascaded shadow mapping with four 1K² shadow maps.

[Plots: frame time (ms, bottom left) and rendered vertices (millions, bottom right) over frames for a Vienna walkthrough (LiSPSM, 2K²), comparing REF, CHC++-OPT, BVOL, GEOM+BVOL, and FRAG.]

Figure 5: Dependence of the frame time (bottom left) and number of vertices (bottom right) for a walkthrough of the Vienna scene and different shadow computation methods. Note that the walkthrough also includes viewpoints located above the roofs. Such viewpoints see far over the city roofs (top). In this comparison the FRAG method achieves consistently the best performance, followed by the GEOM+BVOL method.

6.4 Geometry Analysis and CHC++ Optimization

Figure 5 analyses the actual amount of geometry rendered for different methods and contrasts that with frame times. In particular, here we include a comparison of the original CHC++ algorithm used in REF with our optimized version CHC++-OPT. The different receiver masking versions all use optimized CHC++. For all algorithms we used a 2K² shadow map and LiSPSM.

Interestingly, the optimized CHC++ algorithm renders more vertices, but provides a constant speedup over the original version because of reduced CPU stalls when waiting for an occlusion query result. As can be observed from the plots, the speedup of the receiver masks corresponds closely to their level of tightness and hence to the number of rendered vertices.

[Plot: frame time (ms, log scale) over frames for Pompeii with a static 4K² shadow map, comparing REF, FRAG, INCR, and UNSHAD.]

Figure 6: Effect of incremental mask updates for a statically focused shadow map.

6.5 Incremental Shadow Map Updates

In Figure 6, we use a static 4K² shadow map for the whole Pompeii scene. Using this particular setup, REF performs over ten times worse than FRAG. Conversely, a static shadow map allows us to use the incremental shadow map update optimization (INCR), which restricts shadow map updates to the parts of the shadow map that change, i.e., to the projections of the bounding boxes of dynamic objects. As can be seen, this technique can approach the performance of unshadowed scene rendering (UNSHAD). Note that we use a log scale for this plot.

6.6 Dependence on Elevation Angle

In Figure 7, we show the dependence of the (average) frame time on the light source elevation angle for shadow mapping with resolution 1K² in a Vienna walkthrough. For uniform shadow mapping, there is a noticeable correlation between the angle and the render times for REF and FOCUS. Interestingly, there seems to be a much weaker correlation for LiSPSM shadow mapping. The influence of the elevation angle on the performance of receiver masking is minimal in either case, and our method delivers a consistent performance increase.

[Plots: average render time (ms) versus light elevation angle (90–180 deg) for Vienna with uniform (left) and LiSPSM (right) shadow mapping at 1K², comparing REF, FOCUS, and FRAG.]

Figure 7: Dependence of the (average) render time on the light elevation angle.

7 Conclusion

We proposed a method for efficient shadow caster culling for shadow mapping. In particular, our method aims at culling all shadow casters which do not contribute to visible shadows. We described several methods to create suitable receiver masks, and in particular one method which creates a practically pixel-exact mask through reverse shadow lookups.

Our method addresses the significant open problem of efficient construction of shadow maps, especially in large outdoor scenes, which tend to appear more and more often in recent entertainment applications. We demonstrate speedups of 3x-10x for large city-like scenes and about 1.5x-2x for a scene taken from a recent game title. The method is also easy to implement and integrates nicely with common rendering pipelines that already have a depth prepass.

Furthermore, we proposed a method for incremental shadow map updates in the case of statically focused shadow maps, as well as optimizations to the CHC++ occlusion culling algorithm, which are also applicable in other contexts.

In the future we want to focus on using the method in the context of different shadow mapping techniques and to study the behavior of the method in scenes with many lights.

Acknowledgements

We would like to thank Jiří Dušek for an early implementation of shadow map culling ideas; Jason Mitchell for the Left 4 Dead 2 model; and Stephen Hill and Petri Häkkinen for feedback. This work has been supported by the Austrian Science Fund (FWF) contract no. P21130-N13; the Ministry of Education, Youth and Sports of the Czech Republic under research programs LC-06008 (Center for Computer Graphics) and MEB-060906 (Kontakt OE/CZ); and the Grant Agency of the Czech Republic under research program P202/11/1883.

References

BITTNER, J., WIMMER, M., PIRINGER, H., AND PURGATHOFER, W. 2004. Coherent hierarchical culling: Hardware occlusion queries made useful. Computer Graphics Forum 23, 3 (Sept.), 615-624. Proceedings EUROGRAPHICS 2004.

BRABEC, S., ANNEN, T., AND SEIDEL, H.-P. 2002. Practical shadow mapping. Journal of Graphics Tools 7, 4, 9-18.

COHEN-OR, D., CHRYSANTHOU, Y., SILVA, C., AND DURAND, F. 2003. A survey of visibility for walkthrough applications. IEEE Transactions on Visualization and Computer Graphics 9, 3, 412-431.

CORP., N., 2010. OpenGL extension registry. http://www.opengl.org/registry, October.

DÉCORET, X., 2005. N-buffers for efficient depth map query.

ENGELHARDT, T., AND DACHSBACHER, C. 2009. Granular visibility queries on the GPU. In I3D '09: Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games, ACM, New York, NY, USA, 161-167.

GOVINDARAJU, N. K., LLOYD, B., YOON, S.-E., SUD, A., AND MANOCHA, D. 2003. Interactive shadow generation in complex environments. ACM Trans. Graph. 22, 3, 501-510.

GUTHE, M., BALÁZS, A., AND KLEIN, R. 2006. Near optimal hierarchical culling: Performance driven use of hardware occlusion queries. In Eurographics Symposium on Rendering 2006, The Eurographics Association, T. Akenine-Möller and W. Heidrich, Eds.

LAURITZEN, A., SALVI, M., AND LEFOHN, A. 2010. Sample distribution shadow maps. In Advances in Real-Time Rendering in 3D Graphics and Games II, SIGGRAPH 2010 Courses.

LLOYD, B., WENDT, J., GOVINDARAJU, N., AND MANOCHA, D. 2004. CC shadow volumes. In Proceedings of EUROGRAPHICS Symposium on Rendering 2004.

LLOYD, D. B., TUFT, D., YOON, S.-E., AND MANOCHA, D. 2006. Warping and partitioning for low error shadow maps. In Proceedings of the Eurographics Workshop/Symposium on Rendering, EGSR, Eurographics Association, Aire-la-Ville, Switzerland, T. Akenine-Möller and W. Heidrich, Eds., 215-226.

LLOYD, D. B., GOVINDARAJU, N. K., QUAMMEN, C., MOLNAR, S. E., AND MANOCHA, D. 2008. Logarithmic perspective shadow maps. ACM Trans. Graph. 27, 4, 1-32.

MATTAUSCH, O., BITTNER, J., AND WIMMER, M. 2008. CHC++: Coherent hierarchical culling revisited. Computer Graphics Forum (Proceedings of Eurographics 2008) 27, 3 (Apr.), 221-230.

SCHERZER, D., WIMMER, M., AND PURGATHOFER, W. 2010. A survey of real-time hard shadow mapping methods. In State of the Art Reports, Eurographics.

STAMMINGER, M., AND DRETTAKIS, G. 2002. Perspective shadow maps. In SIGGRAPH '02: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, ACM, New York, NY, USA, 557-562.

WILLIAMS, L. 1978. Casting curved shadows on curved surfaces. Computer Graphics (SIGGRAPH '78 Proceedings) 12, 3 (Aug.), 270-274.

WIMMER, M., SCHERZER, D., AND PURGATHOFER, W. 2004. Light space perspective shadow maps. In Rendering Techniques 2004 (Proceedings Eurographics Symposium on Rendering), Eurographics Association, A. Keller and H. W. Jensen, Eds., Eurographics, 143-151.

LLC PresentationLLC Presentation
LLC Presentation
Peter Bates
 
Yarra Valley Water and Box Hill Institute Initiative - Diploma of Sustainabl...
Yarra Valley Water and  Box Hill Institute Initiative - Diploma of Sustainabl...Yarra Valley Water and  Box Hill Institute Initiative - Diploma of Sustainabl...
Yarra Valley Water and Box Hill Institute Initiative - Diploma of Sustainabl...
NEXTDC
 
2. Parsons Brinckerhoff
2. Parsons Brinckerhoff2. Parsons Brinckerhoff
2. Parsons Brinckerhoff
NEXTDC
 
Water Infrastructure Group Advertisement
Water Infrastructure Group AdvertisementWater Infrastructure Group Advertisement
Water Infrastructure Group Advertisement
Leif Ericson
 
SWM_PrivateInspectionsMobile
SWM_PrivateInspectionsMobileSWM_PrivateInspectionsMobile
SWM_PrivateInspectionsMobile
Renee Opatz, GISP
 
B&V EMEIA Brains & Brawn
B&V EMEIA Brains & BrawnB&V EMEIA Brains & Brawn
B&V EMEIA Brains & Brawn
Paul Hart
 
City Council March 19, 2013 Hickory update
City Council March 19, 2013 Hickory updateCity Council March 19, 2013 Hickory update
City Council March 19, 2013 Hickory update
City of San Angelo Texas
 
Pro Bid Brochure
Pro Bid BrochurePro Bid Brochure
Pro Bid Brochure
Tara Quin
 

Andere mochten auch (20)

Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3
Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3
Solving Visibility and Streaming in the The Witcher 3: Wild Hunt with Umbra 3
 
Object drawing light & shadow
Object drawing light & shadowObject drawing light & shadow
Object drawing light & shadow
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
Media presentation feb 2011
Media presentation feb 2011Media presentation feb 2011
Media presentation feb 2011
 
RELIABILITY infographic
RELIABILITY infographicRELIABILITY infographic
RELIABILITY infographic
 
LLC Presentation
LLC PresentationLLC Presentation
LLC Presentation
 
Slides throughout the day (2016 EWMF)
Slides throughout the day (2016 EWMF)Slides throughout the day (2016 EWMF)
Slides throughout the day (2016 EWMF)
 
Produced Water Handling System Upgrade
Produced Water Handling System UpgradeProduced Water Handling System Upgrade
Produced Water Handling System Upgrade
 
T s2 gh3_nigel murphy
T s2 gh3_nigel murphyT s2 gh3_nigel murphy
T s2 gh3_nigel murphy
 
BlueTechValley conference 2011 - Grundfos presentation
BlueTechValley conference 2011 - Grundfos presentationBlueTechValley conference 2011 - Grundfos presentation
BlueTechValley conference 2011 - Grundfos presentation
 
Workshop E, Extending the benefits life cycle: ISO 55000 by John Heathcote
Workshop E, Extending the benefits life cycle: ISO 55000 by John HeathcoteWorkshop E, Extending the benefits life cycle: ISO 55000 by John Heathcote
Workshop E, Extending the benefits life cycle: ISO 55000 by John Heathcote
 
Yarra Valley Water and Box Hill Institute Initiative - Diploma of Sustainabl...
Yarra Valley Water and  Box Hill Institute Initiative - Diploma of Sustainabl...Yarra Valley Water and  Box Hill Institute Initiative - Diploma of Sustainabl...
Yarra Valley Water and Box Hill Institute Initiative - Diploma of Sustainabl...
 
2. Parsons Brinckerhoff
2. Parsons Brinckerhoff2. Parsons Brinckerhoff
2. Parsons Brinckerhoff
 
CFSGAM Presentation - Yarra Valley Water Breakfast series
CFSGAM Presentation - Yarra Valley Water Breakfast seriesCFSGAM Presentation - Yarra Valley Water Breakfast series
CFSGAM Presentation - Yarra Valley Water Breakfast series
 
Water Infrastructure Group Advertisement
Water Infrastructure Group AdvertisementWater Infrastructure Group Advertisement
Water Infrastructure Group Advertisement
 
SWM_PrivateInspectionsMobile
SWM_PrivateInspectionsMobileSWM_PrivateInspectionsMobile
SWM_PrivateInspectionsMobile
 
B&V EMEIA Brains & Brawn
B&V EMEIA Brains & BrawnB&V EMEIA Brains & Brawn
B&V EMEIA Brains & Brawn
 
City Council March 19, 2013 Hickory update
City Council March 19, 2013 Hickory updateCity Council March 19, 2013 Hickory update
City Council March 19, 2013 Hickory update
 
Pro Bid Brochure
Pro Bid BrochurePro Bid Brochure
Pro Bid Brochure
 
Cleantech Group 2011 Global conference - Grundfos presentation
Cleantech Group 2011 Global conference - Grundfos presentationCleantech Group 2011 Global conference - Grundfos presentation
Cleantech Group 2011 Global conference - Grundfos presentation
 

Ähnlich wie Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Oliver Mattausch, Ari Silvennoinen, Michael Wimmer)

Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
CSCJournals
 
Monocular LSD-SLAM integreation within AR System
Monocular LSD-SLAM integreation within AR SystemMonocular LSD-SLAM integreation within AR System
Monocular LSD-SLAM integreation within AR System
Markus Höll
 

Ähnlich wie Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Oliver Mattausch, Ari Silvennoinen, Michael Wimmer) (20)

06714519
0671451906714519
06714519
 
Performance analysis on color image mosaicing techniques on FPGA
Performance analysis on color image mosaicing techniques on FPGAPerformance analysis on color image mosaicing techniques on FPGA
Performance analysis on color image mosaicing techniques on FPGA
 
Isvc08
Isvc08Isvc08
Isvc08
 
SHARP OR BLUR: A FAST NO-REFERENCE QUALITY METRIC FOR REALISTIC PHOTOS
SHARP OR BLUR: A FAST NO-REFERENCE QUALITY METRIC FOR REALISTIC PHOTOSSHARP OR BLUR: A FAST NO-REFERENCE QUALITY METRIC FOR REALISTIC PHOTOS
SHARP OR BLUR: A FAST NO-REFERENCE QUALITY METRIC FOR REALISTIC PHOTOS
 
SHADOW DETECTION USING TRICOLOR ATTENUATION MODEL ENHANCED WITH ADAPTIVE HIST...
SHADOW DETECTION USING TRICOLOR ATTENUATION MODEL ENHANCED WITH ADAPTIVE HIST...SHADOW DETECTION USING TRICOLOR ATTENUATION MODEL ENHANCED WITH ADAPTIVE HIST...
SHADOW DETECTION USING TRICOLOR ATTENUATION MODEL ENHANCED WITH ADAPTIVE HIST...
 
IRJET - Dehazing of Single Nighttime Haze Image using Superpixel Method
IRJET -  	  Dehazing of Single Nighttime Haze Image using Superpixel MethodIRJET -  	  Dehazing of Single Nighttime Haze Image using Superpixel Method
IRJET - Dehazing of Single Nighttime Haze Image using Superpixel Method
 
reviewpaper
reviewpaperreviewpaper
reviewpaper
 
Structlight
StructlightStructlight
Structlight
 
Deblurring of License Plate Image using Blur Kernel Estimation
Deblurring of License Plate Image using Blur Kernel EstimationDeblurring of License Plate Image using Blur Kernel Estimation
Deblurring of License Plate Image using Blur Kernel Estimation
 
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
 
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
 
CVGIP 2010 Part 3
CVGIP 2010 Part 3CVGIP 2010 Part 3
CVGIP 2010 Part 3
 
Shadow Detection and Removal in Still Images by using Hue Properties of Color...
Shadow Detection and Removal in Still Images by using Hue Properties of Color...Shadow Detection and Removal in Still Images by using Hue Properties of Color...
Shadow Detection and Removal in Still Images by using Hue Properties of Color...
 
06 robot vision
06 robot vision06 robot vision
06 robot vision
 
Conception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdfConception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdf
 
Rendering Algorithms.pptx
Rendering Algorithms.pptxRendering Algorithms.pptx
Rendering Algorithms.pptx
 
A Survey on Single Image Dehazing Approaches
A Survey on Single Image Dehazing ApproachesA Survey on Single Image Dehazing Approaches
A Survey on Single Image Dehazing Approaches
 
IRJET- Photo Restoration using Multiresolution Texture Synthesis and Convolut...
IRJET- Photo Restoration using Multiresolution Texture Synthesis and Convolut...IRJET- Photo Restoration using Multiresolution Texture Synthesis and Convolut...
IRJET- Photo Restoration using Multiresolution Texture Synthesis and Convolut...
 
Monocular LSD-SLAM integreation within AR System
Monocular LSD-SLAM integreation within AR SystemMonocular LSD-SLAM integreation within AR System
Monocular LSD-SLAM integreation within AR System
 
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro..."High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Kürzlich hochgeladen (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Oliver Mattausch, Ari Silvennoinen, Michael Wimmer)

∗ e-mail: bittner@fel.cvut.cz   † e-mail: {matt|wimmer}@cg.tuwien.ac.at   ‡ e-mail: ari@umbrasoftware.com   § Faculty of Electrical Engineering

Abstract

We propose a novel method for efficient construction of shadow maps by culling shadow casters which do not contribute to visible shadows. The method uses a mask of potential shadow receivers to cull shadow casters using a hierarchical occlusion culling algorithm. We propose several variants of the receiver mask implementation with different culling efficiency and computational costs. For scenes with statically focused shadow maps we designed an efficient strategy to incrementally update the shadow map, which comes close to the rendering performance for unshadowed scenes. We show that our method achieves a 3x-10x speedup for rendering large city-like scenes and a 1.5x-2x speedup for rendering an actual game scene.

CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling

Keywords: shadow maps, occlusion culling, real-time rendering

1 Introduction

Shadow mapping is a well-known technique for rendering shadows in 3D scenes [Williams 1978]. With the rapid development of graphics hardware, shadow maps have become an increasingly popular technique for real-time rendering. A variety of techniques have been proposed, most of which focus on increasing the quality of the rendered image given a maximum resolution of the shadow map [Stamminger and Drettakis 2002; Wimmer et al. 2004; Lloyd et al. 2008]. Given sufficiently high resolution, these methods achieve shadows of very high quality.

For highly complex scenes, however, shadow mapping can become very slow. One problem is the rendering of the main camera view. This problem has been addressed by occlusion culling, which aims to quickly cull geometry which does not contribute to the image [Cohen-Or et al. 2003]. However, just rendering the camera view quickly does not guarantee high frame rates when shadow mapping is used. In particular, the overhead of creating the shadow map is not reduced. Thus, rendering the shadow map can easily become the bottleneck of the whole rendering pipeline, as it may require rendering a huge number of shadow casters at a very high resolution.

A naive solution to this problem would again use occlusion culling to reduce the number of rendered shadow casters when rendering the shadow map. However, it turns out that for common complex scenes like terrains or cities, the lights are set up so that they have a global influence on the whole scene (e.g., the sun shining over a city). For such scenes, occlusion culling from the light view will not solve the problem, as the depth complexity of the light view is rather low and thus not much geometry will be culled. Thus, even if occlusion culling is used for both the camera view and the light view, we might end up rendering practically all geometry contained in the intersection of the view frustum and the light frustum, and many rendered shadow casters will not contribute to the shadows in the final image.

In this paper, we propose a method for solving this problem by using the knowledge of visibility from the camera for culling the shadow casters. First, we use occlusion culling for the camera view to identify visible shadow receivers. Second, when rendering into the shadow map, we use only those shadow casters which cast shadows on visible shadow receivers. All other shadow casters are culled. For both the camera and the light views, we use an improved version of the coherent hierarchical culling algorithm (CHC++) [Mattausch et al. 2008], which provides efficient scheduling of occlusion queries based on temporal coherence. We show that this method brings up to an order of magnitude speedup compared to a naive application of occlusion culling to shadow mapping (see Figure 1). As a minor contribution, we propose a simple method for incrementally updating a statically focused shadow map for scenes with moving objects.

2 Related Work

Shadow mapping was originally introduced by Williams [1978] and has meanwhile become the de-facto standard shadow algorithm in real-time applications such as computer games, due to its speed and simplicity. For a discussion of different methods to increase the quality of shadow mapping, we refer to a recent survey [Scherzer et al. 2010]. In our approach in particular, we make use of light space perspective shadow mapping (LiSPSM) [Wimmer et al. 2004] in order to warp the shadow map to provide higher resolution near the viewpoint, and cascaded shadow maps [Lloyd et al. 2006], which partition the shadow map according to the view frustum with the same aim.

There has been surprisingly little work on shadow mapping for large scenes. One obvious optimization which also greatly aids shadow quality is to focus the light frustum on the receiver frustum [Brabec et al. 2002]. As a result, many objects which do not cast shadows on objects in the view frustum are culled by frustum culling in the shadow map rendering step. Lauritzen et al. [2010] have reduced the focus region to the subset of visible view samples in the context of optimizing the splitting planes for cascaded shadow maps.

Govindaraju et al. [2003] were the first to use occlusion culling to reduce the amount of shadow computations in large scenes. Their algorithm computes object-precision shadows mostly on the CPU and uses occlusion queries on a depth map to provide the tightest possible bound on the shadow polygons that have to be rendered. While their algorithm also renders shadow receivers to the stencil buffer to identify potential shadow casters, this was done after a depth map had already been created, and thus this approach is unsuitable for shadow mapping acceleration. In our algorithm, on the other hand, the creation of the depth map itself is accelerated, which is a significant benefit for many large scenes. Lloyd et al. [2004] later extended shadow caster culling by shadow volume clamping, which significantly reduced the fill rate when rendering shadow volumes. Their paper additionally describes a method for narrowing the stencil mask to shadow receiver fragments that are detected to lie in shadow. Décoret [2005] proposed a different method for shadow volume clamping as one of the applications of N-buffers. Unlike Lloyd et al. [2004], Décoret enhances the stencil mask by detecting receiver fragments which are visible from the camera. The methods for shadow volume culling and clamping have been extended by techniques allowing for more efficient GPU implementations [?; Engelhardt and Dachsbacher 2009].

While our method shares the idea of using visible receivers to cull shadow casters proposed in the above-mentioned techniques, in these methods the shadow map served only as a proxy to facilitate shadow volume culling and clamping, and the efficient construction of the shadow map itself has not been addressed. When shadow mapping is used instead of the more computationally demanding shadow volumes, the bottleneck of the computation moves to the shadow map construction itself. This bottleneck, which we address in this paper, was previously left intact.

3 Algorithm Outline

The proposed algorithm consists of four main steps:

(1) Determine shadow receivers
(2) Create a mask of shadow receivers
(3) Render shadow casters using the mask for culling
(4) Compute shading

The main contribution of our paper lies in steps (2) and (3). To give a complete picture of the method, we briefly outline all four steps below.

Determine shadow receivers  The potential shadow receivers for the current frame consist of all objects visible from the camera view, since shadows on invisible objects do not contribute to the final image. Thus we first render the scene from the point of view of the camera and determine the visibility status of the scene objects in this view. To do both of these things efficiently, we employ hierarchical occlusion culling [Bittner et al. 2004; Guthe et al. 2006; Mattausch et al. 2008]. In particular, we selected the recent CHC++ algorithm [Mattausch et al. 2008] as it is simple to implement and provides a good basis for optimizing the further steps of our shadow caster culling method. CHC++ provides us with a visibility classification of all nodes of a spatial hierarchy. The potential shadow receivers correspond to the visible leaves of this hierarchy.

Note that in contrast to simple shadow mapping, our method requires a render pass of the camera view before rendering the shadow map. However, such a rendering pass is implemented in many rendering frameworks anyway. For example, deferred shading, which has become popular due to recent advances in screen-space shading effects, provides such a pass, and even standard forward renderers often utilize a depth pre-pass to improve pixel-level culling.

Create a mask of shadow receivers  The crucial step of our algorithm is the creation of a mask in the light view which represents the shadow receivers. For a crude approximation, this mask can be formed by rendering bounding boxes of visible shadow receivers into the stencil buffer attached to the shadow map. In the next section, we propose several increasingly sophisticated methods for building this mask, which provide different accuracy vs. complexity trade-offs.

Render shadow casters  We use hierarchical visibility culling with hardware occlusion queries to speed up the shadow map rendering pass. In addition to depth-based culling, we use the shadow receiver mask to cull shadow casters which do not contribute to visible shadows. More precisely, we set up the stencil test to discard fragments outside the receiver mask. Thus, the method will only render shadow casters that are visible from the light source and whose projected bounding box at least partly overlaps the receiver mask.

Compute shading  The shadow map is used in the final rendering pass to determine light source visibility for each shaded fragment.

4 Shadow Caster Culling

The general idea of our method is to create a mask of visible shadow receivers, i.e., those objects determined as visible in the first camera rendering pass. This mask is used in the shadow map rendering pass together with the depth information to cull shadow casters which do not contribute to the visible shadows. Culling becomes particularly efficient if it employs a spatial hierarchy to quickly cull large groups of shadow casters, which is the case for example in the CHC++ algorithm.

Figure 2: Visualization of different receiver masks. On the left there is a light space view of the scene showing surfaces visible from the camera in yellow. On the right three different receiver masks are shown. Note that the invisible object on the top right does not contribute to any of the receiver masks.

There are various options for how the receiver mask can be implemented. We describe four noteworthy variants which differ in implementation complexity as well as in culling efficiency. The creation of the mask happens in a separate pass before rendering the shadow map, but may already generate parts of the shadow map itself as well, simplifying the subsequent shadow map rendering pass.

We start our description with a simple version of the mask, which is gradually extended with more advanced culling. As we shall see later in the results section, in most cases the more accurate mask provides better results, although there might be exceptions for particular scene and hardware configurations.

4.1 Bounding Volume Mask

The most straightforward way to create the mask is to rasterize the bounding volumes of all visible shadow receivers into the stencil buffer attached to the shadow map.

Such a mask will generally lead to more shadow casters being rendered than actually necessary. The amount of overestimation depends mostly on how fine the subdivision of scene meshes into individual objects is, and how tightly their bounding volumes fit. If an individual scene object has too large a spatial extent, its projection might cover a large portion of the shadow mask even if only a small part of the mesh is actually visible. An illustration of the bounding volume mask is shown in Figure 2 (BVOL).

4.2 Geometry Mask

In order to create a tighter shadow receiver mask, we can rasterize the actual geometry of the visible shadow receivers instead of their bounding volumes into the stencil buffer.

While rendering the visible shadow receivers, we also write the depth values of the shadow receivers into the shadow map. This has the advantage that these objects have thus already been rendered into the shadow map and can be skipped during the subsequent shadow map rendering pass, which uses occlusion queries just for the remaining objects. An illustration of the geometry mask is shown in Figure 2 (GEOM).

In some scenes many shadow receivers also act as shadow casters, and thus this method will provide a more accurate mask at almost no cost. However, if most shadow receivers do not act as shadow casters (i.e., most of the scene is shadowed by objects outside of the camera view), the shadow mask creation can become rather expensive, since most parts of the shadow map will be replaced by other objects which act as the real shadow casters. Consequently, the resources for rendering these shadow receivers into the shadow map were wasted, as they would have been culled by the occlusion culling algorithm otherwise.

4.3 Combined Geometry and Bounding Volume Mask

In order to combine the positive aspects of the two previously described methods, we propose a technique which decides whether a shadow receiver should fill the mask using its bounding box or its geometry. The decision is based on an estimation of whether the visible shadow receiver will simultaneously act as a shadow caster. Such objects are rendered using their geometry (with both stencil and depth writes, as such objects must be rendered into the shadow map anyway), while all other visible receivers are rendered using bounding boxes (with only a stencil write).

The estimation uses temporal coherence: if a shadow receiver was visible in the shadow map in the previous frame, it is likely that this receiver will stay visible in the shadow map and therefore also act as a shadow caster in the current frame.

Again, all objects that have already been rendered into the shadow map using depth writes can be skipped in the shadow map rendering pass. However, in order to estimate the receiver visibility in the subsequent frame, visibility needs to be determined even for these skipped objects. Therefore, we include them in the hierarchical occlusion traversal, and thus they may have their bounding volumes rasterized during the occlusion queries.

4.4 Fragment Mask

Even the most accurate of the masks described above, the geometry mask, can lead to overly conservative receiver surface approximations. This happens when a receiver object spans a large region of the shadow map even though most of the object is hidden in the camera view. An example would be a single ground plane used for a whole city, which would always fill the whole shadow receiver mask and thus prevent any culling. While such extreme cases can be avoided by subdividing large receiver meshes into smaller objects, it is still beneficial to have a tighter mask which is completely independent of the actual object subdivision.

Fortunately, we can further refine the mask by using fragment-level receiver visibility tests for all shadow receivers where the full geometry is used for the mask. While creating the mask by rasterizing the geometry in light space, we project each fragment back to view space and test the projected fragment's visibility against the view space depth buffer. If the fragment is invisible, it lies on a part of the shadow receiver that is hidden and is thus not written to the mask; otherwise, the mask entry for this fragment is updated. The shadow map depth needs to be written in both cases, since even if the fragment is not visible in the camera view, it might still be a shadow caster that shadows another visible shadow receiver.

The mask constructed using the fragment-level visibility tests is pixel accurate, i.e., only locations that can receive a visible shadow are marked in the mask (up to the bias used for the fragment depth comparison). Note that the fragment visibility test corresponds to a "reversed" shadow test, i.e., the roles of the camera and the light are swapped: instead of testing the visibility of a camera view fragment with respect to the shadow map, we test the visibility of a fragment rendered into the shadow map with respect to the camera, using the camera view depth buffer.

A similar mask construction approach was proposed by Décoret [2005] for culling and clamping the shadow volumes of shadow casters. Décoret used a pair of litmaps to obtain minimum and maximum depths of visible receivers per texel in order to clamp shadow volumes. In his method, all scene objects are processed twice in the mask construction phase, and the method does not use the knowledge of visible shadow receivers. In contrast, we use a single binary mask of visible receiver fragments and combine the construction of the mask with the simultaneous rendering of the depths of objects which act both as shadow receivers and shadow casters, which is very important to reduce the bottleneck created by the shadow map construction.

Table 1: Summary of the masking techniques and indicators of their main attributes. BVOL – bounding volume mask, GEOM – geometry mask, GEOM+BVOL – combined geometry and bounding volume mask, FRAG – fragment mask. Note that the FRAG method either relies on the availability of direct stencil writes, or requires slightly more effort in the mask creation and culling phases as described in Section 4.4.

Method      | accuracy | fill rate  | transform rate
BVOL        | low      | high       | low
GEOM        | medium   | medium     | medium
GEOM+BVOL   | medium   | low-medium | low
FRAG        | high     | medium     | medium

Figure 3: 2D illustration of the culling efficiency of different masking strategies. The figure shows an example of a scene with a shadow receiver object and different types of shadow receiver masks. Object A is always rendered as it intersects all masks, object B is culled only by the FRAG mask, C is culled by the depth test (occluded by B) and by the FRAG and GEOM masks, and finally D is culled by all types of masks.

5 Further Optimizations

This section contains several optimization techniques which can optionally support the proposed receiver masking in further enhancing the performance of the complete rendering algorithm.

5.1 Incremental Shadow Map Updates

If the shadow map is statically focused, we can extend the approach by restricting the shadow receiver mask only to places where a potential change in shadow can happen. We can do so by building the shadow receiver mask only from objects involved in dynamic changes. In particular, we render all moving objects into the shadow receiver mask in their previous and current positions. This pass is restricted only to dynamic objects which are either visible in this or the previous frame.

If only a fraction of the scene objects is transformed, the shadow receiver mask becomes very tight and the overhead for shadow map rendering becomes practically negligible. However, this optimization is only useful if a static shadow map brings sufficient shadow
An illustration of the bounding quality compared to a shadow map focused on the view frustum. volume mask is shown in Figure 2 (FRAG). Note that in order to create the stencil mask and simultaneously clear the depth buffer in a selective fashion, we reverse the depth The implementation of this approach faces the problem that the test (only fragments with greater depth pass) and set the fragment stencil mask needs to be updated based on the outcome of the depth to 1 in the shader while rendering the bounding boxes of mov- shader. This functionality is currently not widely supported in hard- ing objects. ware. Therefore, we implement the fragment receiver mask as an additional texture render target instead of using the stencil buffer. This requires an additional texture fetch and a conditional fragment 5.2 Visibility-Aware Shadow Map Focusing discard based on the texture fetch while rendering the occlusion queries in the shadow map rendering pass, which incurs a small Shadow maps are usually “focused” on the view frustum in order to performance penalty compared to a pure stencil test. increase the available shadow map resolution. If visibility from the camera view is available, focusing can be improved further [Lau- ritzen et al. 2010]: The light frustum can then be computed by cal- The summary of different receiver masking methods is given in Ta- culating the convex hull of the light source and the set of bounding ble 1. The illustration of the culling efficiency of different masks boxes of the visible shadow receivers. Note that unlike the method is shown in Figure 3. Note that apart from the BVOL method, all of Lauritzen et al. [2010], our focusing approach does not require other methods initiate the depth buffer values for the light view with a view sample analysis on the GPU, but uses the readily available depths of visible receivers. visibility classification from the camera rendering pass.
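The per-fragment decision of Section 4.4 can be sketched as follows. This is a minimal CPU-side illustration of the "reversed" shadow test, not the authors' GPU shader; the struct and function names are our own.

```cpp
// CPU-side illustration of the fragment-level receiver visibility test
// (Section 4.4). On the GPU this logic runs in the light-view fragment
// shader while rasterizing receiver geometry into the shadow map.
struct MaskResult {
    bool write_mask;   // mark this texel as a potentially visible receiver?
    bool write_depth;  // shadow-map depth is written unconditionally
};

// cameraDepth:       depth of the light-space fragment reprojected into
//                    the camera view
// cameraBufferDepth: value stored in the camera depth buffer at the
//                    projected position
// bias:              depth-comparison bias
inline MaskResult fragmentMaskTest(float cameraDepth,
                                   float cameraBufferDepth,
                                   float bias) {
    MaskResult r;
    // Visible in the camera view => this location can receive a visible
    // shadow, so it is written to the receiver mask.
    r.write_mask = cameraDepth <= cameraBufferDepth + bias;
    // Depth is written in both cases: even a hidden receiver fragment may
    // still cast a shadow onto another visible receiver.
    r.write_depth = true;
    return r;
}
```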
Figure 4: A viewpoint in the Vienna scene, the corresponding light view, and a visualization of the different versions of the receiver masks, where blue indicates the bounding volume mask, green the geometry mask, and red the fragment mask.

This simple method effectively increases the resolution of the shadow map, as it is focused only on the visible portion of the scene. However, this technique also has a drawback: if a large visibility change occurs in the camera view, the focus of the light view changes dramatically from one frame to the next, which can be perceived as temporal aliasing or popping of the shadow. This can easily happen in scenes in which a distant object, such as a tall building, a flying aircraft, or a mountain, suddenly appears on the horizon.

5.3 CHC++ Optimizations

Our new shadow culling method works with any occlusion culling algorithm that allows taking the stencil mask into account. We use coherent hierarchical culling (CHC++ [Mattausch et al. 2008]) to implement hierarchical occlusion queries both for rendering the camera view and for rendering the shadow map. For the shadow map, the occlusion queries can be set up so that they automatically take the shadow receiver mask stored in the stencil buffer into account. If the receiver mask is stored in a separate texture (as for the fragment mask), a texture lookup and a conditional fragment discard are necessary in the occlusion culling shader.

We also propose a few modifications to the CHC++ algorithm, which should be useful in most applications using CHC++. Our recent experiments indicated that for highly complex views the main performance hit of CHC++ comes from the idle time of the CPU when the algorithm has to wait for the query results of previously invisible nodes. The reason is that previously invisible nodes, which sometimes generate further queries leading to further wait time, can be delayed due to batching, whereas queries for previously visible nodes, whose results are only of interest in the next frame, are issued immediately whenever the algorithm is stalled waiting for a query result.

Therefore, it turns out to be more beneficial to use the waiting time to issue queries for previously invisible nodes instead, starting these queries as early as possible. Queries for the previously visible nodes, on the other hand, are issued after the hierarchy traversal is finished, and their results are fetched just before the traversal in the next frame. Between these two stages we apply the shading pass with shadow map lookups, which constitutes a significant amount of work, to avoid any stalls.

Another very simple CHC++ modification concerns the handling of objects that enter the view frustum. If such an object has been in the view frustum before, instead of setting it to invisible, it inherits its previous classification. This brings benefits especially for the camera view during frequent rotational movements.

6 Results and Discussion

6.1 Test Setup

We implemented our algorithms in OpenGL and C++ and evaluated them using a GeForce GTX 480 GPU and a single Intel Core i7 920 CPU at 2.67 GHz. For the camera view render pass we used an 800 × 600 32-bit RGBA render target (to store color and depth) and a 16-bit RGB render target (to store the normals). For the light view render pass we used a 32-bit depth-stencil texture and an 8-bit RGB color buffer of varying size. The application uses a deferred shading pipeline for shadowing.

City scenes exhibit the scene characteristics which our algorithm targets: large, open scenes with a high cost for shadow mapping. Hence we tested our algorithm in three city environments: a model of Manhattan, the ancient town of Pompeii, and the town of Vienna (refer to Table 3 for detailed information). All scenes were populated with various scene objects: Manhattan was populated with 1,600 cars, Pompeii with 200 animated characters, and Vienna with several street objects and trees. For Manhattan (Pompeii) we created 30² (40²) floor tiles in order to have sufficient granularity for the object-based receiver mask methods. We measured our timings with a number of predefined walkthroughs. Figure 4 shows the different receiver masks in a real scene.

    Abbreviation   Method
    VFC            view frustum culling
    REF            CHC++ occlusion culling (main reference)
    FOCUS          REF + visibility-aware focusing
    CHC++-OPT      optimized CHC++
    INCR           incremental shadow map updates
    UNSHAD         main render pass without shadow mapping

Table 2: Abbreviations used in the plots (also refer to Table 1 for the variants of receiver masks).

We compare our methods mainly against a reference method (REF) that uses the original CHC++ algorithm for both the camera view and the light view. We also show simple view frustum culling (VFC) and a minor modification of the REF method which uses visibility-aware focusing (FOCUS). Note that apart from the FOCUS method, none of the other tested methods uses visibility-aware focusing. The tested methods are summarized in Table 2.

6.2 Receiver Mask Performance and Shadow Map Resolution

Table 4 shows average frame times for the tested scenes and different shadow map resolutions. As can be seen, receiver masking is faster than REF and FOCUS for all scenes and parameter combinations. There is a tendency that our algorithm brings a slightly higher speedup for uniform shadow mapping than for LiSPSM.
This has two reasons: First, LiSPSM focuses much better on the visible parts of the view frustum. Second, LiSPSM trades the quality increase in the near field for a quality loss in the far-field regions (depending on the setting of the n parameter). Therefore it can happen that small objects in the background are simply not represented in the shadow map and will therefore be culled. Surprisingly, the frame times of the REF and FOCUS methods slightly decrease with increasing shadow map size in Manhattan. This unintuitive behavior might be connected with the heavy transform limitation of these algorithms for this particular scene configuration (i.e., due to the many car instances).

                 Vienna        Manhattan       Pompeii
    Vertices     12,685,693    104,960,856     63,141,390
    Objects      22,464        12,836          11,097
    BVH nodes    20,240        6,471           14,945

[Plots: frame time (ms) over the walkthrough frames for Vienna (LiSPSM, 1K²), New York (uniform, 4K²), and Pompeii (LiSPSM, 1K²), comparing VFC, REF, FOCUS, and FRAG or GEOM+BVOL.]

Table 3: Statistics for the scenes and characteristic walkthroughs.

    SM type        LISPSM               UNIFORM
    Shadow size    1K²   2K²   4K²     1K²   2K²   4K²
    Scene Vienna
    REF            21.6  22.3  22.4    28.7  28.7  29.2
    FOCUS           9.1   9.3   9.3    10.8  11.0  11.4
    GEOM+BVOL       4.9   4.9   5.0     4.9   5.0   5.1
    FRAG            2.9   3.5   6.1     2.9   3.4   5.9
    Scene Manhattan
    REF            36.6  35.9  35.1    44.2  43.9  41.8
    FOCUS          28.9  28.1  27.9    31.9  30.7  30.0
    GEOM+BVOL       5.6   5.7   6.6     5.6   5.7   6.5
    FRAG            4.5   5.4   9.0     4.5   5.3   8.6
    Scene Pompeii
    REF            34.8  34.8  39.2    40.5  40.2  44.8
    FOCUS          17.4  17.4  20.6    18.2  18.3  21.2
    GEOM+BVOL      11.2  11.2  13.4    11.4  11.3  13.0
    FRAG            9.7  10.0  13.2     9.8  10.2  12.5

Table 4: Average frame times for the tested scenes (in ms). The time for the best method for the given scene and shadow map resolution is marked in bold.

Note that for shadow maps smaller than or equal to 2K², the fragment mask is consistently the fastest method in our tests. For larger shadow maps of 4K² or more, it becomes highly scene-dependent whether the benefit outweighs the cost of the additional texture lookups during mask creation and occlusion culling. In this case, the GEOM+BVOL method can be used, which has a smaller overhead, caused only by the stencil buffer writes and the rasterization of the additional bounding boxes. On the other hand, we expect a higher speedup for the reverse shadow testing once it is possible to write the stencil buffer in the shader. There is already a specification available, GL_ARB_shader_stencil_export [corp. 2010], and we hope that it will be fully supported in hardware soon.

6.3 Walkthrough Timings

The plots in Table 3 show timings for walkthroughs of selected scene and parameter combinations. For Manhattan, we applied uniform shadow mapping with 4K² resolution and used the combined geometry and bounding volume receiver mask (GEOM+BVOL). Note that in this scene, the FOCUS method is often useless because skyscrapers are often visible in the distance, while most of the geometry in between can actually be culled by receiver masking. In Vienna and Pompeii, we applied LiSPSM shadow mapping with 1K² resolution and the fragment mask (FRAG). The plots show that receiver masking is the only method that provides stable frame times for the presented walkthroughs, which is an important property for a real-time rendering algorithm. The walkthroughs corresponding to these plots, along with visualizations of the culled objects, can be watched in the accompanying video for further analysis.

Table 5 shows the statistics and the timings for a scene taken from an actual game title (Left 4 Dead 2) using cascaded shadow mapping. This game scene was rendered using a different occlusion culling algorithm than CHC++, implemented in a different rendering engine. The receiver masking is, however, implemented exactly as described in our paper.
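The resolution dependence observed in Section 6.2 can be encoded as a simple selection heuristic. The function below is our own encoding of the paper's measurements, not part of the authors' system.

```cpp
#include <string>

// Heuristic derived from the measurements in Section 6.2 (Table 4):
// the fragment mask (FRAG) was consistently fastest up to 2K^2 shadow
// maps, while for 4K^2 and above the cheaper GEOM+BVOL mask avoids the
// extra texture lookups of FRAG during mask creation and culling.
inline std::string chooseReceiverMask(int shadowMapSideLength) {
    // shadowMapSideLength is the side length of the square shadow map
    return (shadowMapSideLength <= 2048) ? "FRAG" : "GEOM+BVOL";
}
```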
The fragment receiver mask and the subsequent culling are applied separately for each of the four cascaded shadow maps using 1K² resolution. Note that in this scene many objects are actually inside the houses and are hence culled already by the REF method (depth-based culling without receiver masking). Even though this might imply a limited additional gain of our algorithm compared to REF, Table 5 shows that fragment masking still provides a significant speedup. Note that during frames 1,800–2,200, only a small fraction of the objects is visible in the view frustum and almost all casters in the shadow frustum throw a visible shadow in the main view, and thus there is very little potential for performance gain using any shadow caster culling method.

    Left 4 Dead 2
    1,442,775 vertices
    6,359 objects
    10,675 BVH nodes

[Plot: frame time (ms) over the walkthrough frames for Left 4 Dead 2 (cascaded, 1K²), comparing VFC, REF, and FRAG.]

Table 5: Statistics for the Left 4 Dead 2 scene and timings for a walkthrough using cascaded shadow mapping with four 1K² shadow maps.

6.4 Geometry Analysis and CHC++ Optimization

Figure 5 analyses the actual amount of geometry rendered for the different methods and contrasts that with the frame times. In particular, here we include a comparison of the original CHC++ algorithm used in REF with our optimized version CHC++-OPT. The different receiver masking versions all use the optimized CHC++. For all algorithms we used a 2K² shadow map and LiSPSM.

[Plots: frame time (ms) and number of rendered vertices (millions) over the walkthrough frames for Vienna (LiSPSM, 2K²), comparing REF, CHC++-OPT, BVOL, GEOM+BVOL, and FRAG.]

Figure 5: Dependence of the frame time (bottom left) and the number of vertices (bottom right) for a walkthrough of the Vienna scene and different shadow computation methods. Note that the walkthrough also includes viewpoints located above the roofs. Such viewpoints see far over the city roofs (top). In this comparison the FRAG method achieves consistently the best performance, followed by the GEOM+BVOL method.

Interestingly, the optimized CHC++ algorithm renders more vertices, but provides a constant speedup over the original version because of the reduced CPU stalls when waiting for an occlusion query result. As can be observed from the plots, the speedup of the receiver masks corresponds closely to their level of tightness and hence to the number of rendered vertices.

6.5 Incremental Shadow Map Updates

In Figure 6, we use a static 4K² shadow map for the whole Pompeii scene. Using this particular setup, REF performs over ten times worse than FRAG. Conversely, a static shadow map allows us to use the incremental shadow map update optimization (INCR), which restricts shadow map updates to the parts of the shadow map that change, i.e., to the projections of the bounding boxes of dynamic objects. As can be seen, this technique can approach the performance of unshadowed scene rendering (UNSHAD). Note that we use a log scale for this plot.

[Plot (log scale): frame time (ms) over the walkthrough frames for Pompeii with a static 4K² shadow map, comparing REF, FRAG, INCR, and UNSHAD.]

Figure 6: Effect of incremental mask updates for a statically focused shadow map.

6.6 Dependence on Elevation Angle

In Figure 7, we show the dependence of the (average) frame time on the light source elevation angle for shadow mapping with 1K² resolution in a Vienna walkthrough. For uniform shadow mapping, there is a noticeable correlation between the angle and the render times for REF and FOCUS. Interestingly, there seems to be a much weaker correlation for LiSPSM shadow mapping. The influence of the elevation angle on the performance of receiver masking is minimal in either case, and our method delivers a consistent performance increase.
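The object selection behind the incremental updates of Sections 5.1 and 6.5 can be sketched as follows. The data layout and names are our own simplification of the approach, not the paper's code.

```cpp
#include <cstddef>
#include <vector>

// Sketch of object selection for incremental shadow map updates
// (Sections 5.1 and 6.5): the receiver mask is rebuilt only from dynamic
// objects visible in the current or the previous frame; each selected
// object is rasterized at both its previous and its current position.
struct SceneObject {
    bool dynamic;      // involved in dynamic changes this frame
    bool visiblePrev;  // visible in the previous frame
    bool visibleCur;   // visible in the current frame
};

inline std::vector<std::size_t>
selectIncrementalMaskObjects(const std::vector<SceneObject>& objects) {
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        const SceneObject& o = objects[i];
        if (o.dynamic && (o.visiblePrev || o.visibleCur))
            selected.push_back(i);  // render at previous and current position
    }
    return selected;
}
```

Static objects never enter the mask, which is why the overhead of shadow map rendering becomes practically negligible when only a fraction of the scene moves.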
[Plots: average render time (ms) vs. light elevation angle (90°–180°) for Vienna with uniform and LiSPSM shadow mapping at 1K², comparing REF, FOCUS, and FRAG.]

Figure 7: Dependence of the (average) render time on the light elevation angle.

7 Conclusion

We proposed a method for efficient shadow caster culling for shadow mapping. In particular, our method aims at culling all shadow casters which do not contribute to visible shadows. We described several methods to create suitable receiver masks, in particular one method which creates a practically pixel-exact mask through reverse shadow lookups.

Our method addresses the significant open problem of efficient construction of shadow maps, especially in large outdoor scenes, which tend to appear more and more often in recent entertainment applications. We demonstrate speedups of 3x–10x for large city-like scenes and about 1.5x–2x for a scene taken from a recent game title. The method is also easy to implement and integrates nicely with common rendering pipelines that already have a depth prepass.

Furthermore, we proposed a method for incremental shadow map updates in the case of statically focused shadow maps, as well as optimizations to the CHC++ occlusion culling algorithm, which are also applicable in other contexts.

In the future we want to focus on using the method in the context of different shadow mapping techniques and on studying the behavior of the method in scenes with many lights.

Acknowledgements

We would like to thank Jiří Dušek for an early implementation of the shadow map culling ideas; Jason Mitchell for the Left 4 Dead 2 model; and Stephen Hill and Petri Häkkinen for feedback. This work has been supported by the Austrian Science Fund (FWF) contract no. P21130-N13; the Ministry of Education, Youth and Sports of the Czech Republic under research programs LC-06008 (Center for Computer Graphics) and MEB-060906 (Kontakt OE/CZ); and the Grant Agency of the Czech Republic under research program P202/11/1883.

References

BITTNER, J., WIMMER, M., PIRINGER, H., AND PURGATHOFER, W. 2004. Coherent hierarchical culling: Hardware occlusion queries made useful. Computer Graphics Forum 23, 3 (Sept.), 615–624. Proceedings EUROGRAPHICS 2004.

BRABEC, S., ANNEN, T., AND SEIDEL, H.-P. 2002. Practical shadow mapping. Journal of Graphics Tools 7, 4, 9–18.

COHEN-OR, D., CHRYSANTHOU, Y., SILVA, C., AND DURAND, F. 2003. A survey of visibility for walkthrough applications. IEEE Transactions on Visualization and Computer Graphics 9, 3, 412–431.

CORP., N. 2010. OpenGL extension registry. http://www.opengl.org/registry, October.

DÉCORET, X. 2005. N-buffers for efficient depth map query.

ENGELHARDT, T., AND DACHSBACHER, C. 2009. Granular visibility queries on the GPU. In I3D '09: Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games, ACM, New York, NY, USA, 161–167.

GOVINDARAJU, N. K., LLOYD, B., YOON, S.-E., SUD, A., AND MANOCHA, D. 2003. Interactive shadow generation in complex environments. ACM Trans. Graph. 22, 3, 501–510.

GUTHE, M., BALÁZS, A., AND KLEIN, R. 2006. Near optimal hierarchical culling: Performance driven use of hardware occlusion queries. In Eurographics Symposium on Rendering 2006, The Eurographics Association, T. Akenine-Möller and W. Heidrich, Eds.

LAURITZEN, A., SALVI, M., AND LEFOHN, A. 2010. Sample distribution shadow maps. In Advances in Real-Time Rendering in 3D Graphics and Games II, SIGGRAPH 2010 Courses.

LLOYD, B., WENDT, J., GOVINDARAJU, N., AND MANOCHA, D. 2004. CC shadow volumes. In Proceedings of the EUROGRAPHICS Symposium on Rendering 2004.

LLOYD, D. B., TUFT, D., YOON, S.-E., AND MANOCHA, D. 2006. Warping and partitioning for low error shadow maps. In Proceedings of the Eurographics Workshop/Symposium on Rendering, EGSR, Eurographics Association, Aire-la-Ville, Switzerland, T. Akenine-Möller and W. Heidrich, Eds., 215–226.

LLOYD, D. B., GOVINDARAJU, N. K., QUAMMEN, C., MOLNAR, S. E., AND MANOCHA, D. 2008. Logarithmic perspective shadow maps. ACM Trans. Graph. 27, 4, 1–32.

MATTAUSCH, O., BITTNER, J., AND WIMMER, M. 2008. CHC++: Coherent hierarchical culling revisited. Computer Graphics Forum (Proceedings of Eurographics 2008) 27, 3 (Apr.), 221–230.

SCHERZER, D., WIMMER, M., AND PURGATHOFER, W. 2010. A survey of real-time hard shadow mapping methods. In State of the Art Reports, Eurographics.

STAMMINGER, M., AND DRETTAKIS, G. 2002. Perspective shadow maps. In SIGGRAPH '02: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, ACM, New York, NY, USA, 557–562.

WILLIAMS, L. 1978. Casting curved shadows on curved surfaces. Computer Graphics (SIGGRAPH '78 Proceedings) 12, 3 (Aug.), 270–274.

WIMMER, M., SCHERZER, D., AND PURGATHOFER, W. 2004. Light space perspective shadow maps. In Rendering Techniques 2004 (Proceedings of the Eurographics Symposium on Rendering), Eurographics Association, A. Keller and H. W. Jensen, Eds., 143–151.