Memory-Aware and Efficient Ray-Casting Algorithm

Saulo Ribeiro¹, André Maximo¹, Cristiana Bentes², Antônio Oliveira¹ and Ricardo Farias¹

¹ COPPE - Systems Engineering Program
  Federal University of Rio de Janeiro - Brazil
  Cidade Universitária - CT - Bloco H - 21941-972
  {saulo, andre, oliveira, rfarias}@lcg.ufrj.br

² Department of Systems Engineering
  State University of Rio de Janeiro - Brazil
  Rua São Francisco Xavier, 524 - 20550-900
  cris@eng.uerj.br


Abstract

Ray-casting implementations require that the connectivity between the cells of the dataset be
explicitly computed and kept in memory. This constitutes a huge obstacle for obtaining real-time
rendering for very large models. In this paper, we address this problem by introducing a new
implementation of the ray-casting algorithm for irregular datasets. Our implementation optimizes
the memory usage of past implementations by exploiting ray coherence. The idea is to keep in main
memory the information of the faces traversed by the ray cast through every pixel under the
projection of a visible face. Our results show that exploiting pixel coherence considerably reduces
the memory usage, while keeping the performance of our algorithm competitive with the fastest
previous ones.

1. Introduction

Direct volume rendering has become a popular technique for visualizing volumetric data from
sources such as scientific simulations, analytic functions, and medical scanners (MRI and CT,
among others). The main advantage of direct volume rendering is that it allows the investigation
of the interior of the data volume, exhibiting its inner structures.

Several algorithms and methods have been proposed for direct volume rendering; the most popular
one is the ray-casting algorithm. In this algorithm, a ray is cast through each pixel of the image
from the viewpoint. The trace of the ray determines which cells of the volume each ray intersects.
Every pair of intersections is used to compute the cell contribution for the pixel color and
opacity. The ray stops when it reaches full opacity or when it leaves the volume.

Compared to other direct volume rendering methods, the great advantages of ray-casting methods
are: the computation for each pixel is independent of all other pixels; and the traveling of a ray
throughout the mesh is guided by the connectivity of the cells of the mesh, avoiding the need to
sort the cells. The disadvantage is that cell connectivity has to be explicitly computed and kept
in memory. In other words, the amount of memory used by ray-casting methods is a huge obstacle for
handling very large models.

Recent efforts towards improving ray-casting performance have focused on reducing the execution
time by exploiting graphics hardware, or GPUs (Graphics Processing Units). This approach, however,
only provides interactive rendering time when the whole dataset fits into the GPU memory. Once the
data size exceeds the onboard memory, expensive transfers from main memory to the GPU have great
impact on the rendering speed.

Therefore, the problem of rendering large datasets (that do not fit in GPU memory) must be
addressed by software solutions, due to their inherently larger memory capacity [6, 11, 7, 12].
Only a few software solutions, however, deal with irregular meshes. Garrity [5] proposed the first
method for ray-casting irregular meshes using the connectivity of cells. Bunyk et al. [2] later
improved Garrity's work, providing a faster algorithm. Pina et al. [13] improved Bunyk's approach
in three respects: memory consumption; complete handling of degenerate cases; and support for both
tetrahedral and hexahedral meshes. By using new data structures, Pina et al. significantly reduced
the memory requirements of the Bunyk approach, but the algorithms proposed were still slower than
Bunyk.

In this work, we propose a novel ray-casting algorithm based on the approaches of Pina et al.,
reducing the memory consumption even further and improving the execution time. Our approach is
based on the fact that the decision of how to store the face information is the key for memory
consumption and execution time. The more information is kept in memory, the faster the algorithm
computes the next intersection of the ray. In this work, we tackle this tradeoff by exploiting ray
coherence in order to maintain in memory only the face information of the most recent traversals.
Our idea is to use each visible face computed in the preprocessing step to guide the creation and
destruction of internal face data in memory. Our algorithm, called Visible Faces Ray-casting
(VF-Ray), obtained consistent and significant gains in memory usage over previous approaches. We
also reduced the execution time of the approaches of Pina et al. by making better use of the
cache. When compared to the fast and memory-expensive approach by Bunyk et al., we obtained
comparable performance, spending only from 1/3 to 1/6 of the memory.

The remainder of this paper is organized as follows. In the next section we discuss direct volume
rendering of irregular meshes. Section 3 briefly describes Bunyk's and Pina's approaches. Section 4
describes our ray-casting algorithm and the improvements we made over previous approaches. In
section 5, we present the results of our most important experiments. Finally, in section 7, we
present our conclusions and proposals for future work.

2. Related Work

Within the research area of accelerating ray-casting methods, two main research streams have
emerged: software approaches and graphics hardware approaches. The very first implementation of
the ray-casting algorithm for rendering irregular meshes was proposed in software by Garrity in
[5]. His approach used the connectivity of cells to compute the entry and exit of each ray. This
work was further improved by Bunyk et al. [2], by computing for each pixel a list of intersections
on visible faces, and easily determining the correct order of the entry points for the ray. The
rendering process follows Garrity's method, but when a ray exits the mesh, the algorithm can easily
determine in which cell the ray will re-enter the mesh. This approach is simpler and more efficient
than Garrity's; however, it keeps some large auxiliary data structures. The recent work by Pina et
al. [13] proposes two new ray-casting algorithms, ME-Raycast and EME-Raycast, that presented
consistent and significant gains in memory usage and in the correctness of the final image over
Bunyk's approach. Their algorithms, however, are slower than Bunyk.

In terms of the acceleration techniques used in this work, ray coherence has been used earlier for
accelerating ray tracing of traditional surface models [14, 15], and later used in ray-casting for
skipping over empty spaces [8].

With the advent of programmable graphics hardware, several volume rendering algorithms have been
developed taking advantage of this feature to achieve high performance. Graphics hardware based
solutions usually provide real-time performance and high quality images. Weiler et al. [16] present
a hardware-based ray-casting algorithm based on the work of Garrity. They find the initial ray
entry point by rendering front faces, and then traverse through cells using the fragment program
by storing the cells and connectivity graph in textures. Their method, however, works only on
convex unstructured data. Bernardon et al. [1] improve Weiler's approach, by using Bunyk's
algorithm as the basis for the hardware implementation and depth peeling to correctly render
non-convex irregular meshes. Espinha and Celes [3] propose an improvement on Weiler's work. They
use partial pre-integration and provide interactive modifications of the transfer function. They
also use an alternative data structure, but as their work is based on Weiler's work, the memory
consumption is still high. Recently, Weiler et al. [17] proposed an improvement over their previous
work, where they deal with the problem of storing the whole dataset in GPU memory by using a
compressed form, tetrahedral strips.

In terms of direct volume rendering algorithms other than ray-casting, it is important to mention
the cell projection approach. In this approach each polyhedral cell of the volumetric data is
projected onto the screen, avoiding the need to maintain the data connectivity in memory, but
requiring the cells to be first sorted in visibility order and then composited to generate their
color and opacity in the final image. In this scope, there are some software implementations, e.g.,
[10] and [4], that provide flexibility and easy parallelization. The GPU implementations of
projection algorithms, however, are more popular [16, 9].

3. Previous Approaches

In this section we briefly describe the ray-casting algorithm proposed by Bunyk et al. in [2], that
we simply call Bunyk, and the ones proposed by Pina et al. in [13], called ME-Ray (Memory Efficient
Ray-cast) and EME-Ray (Enhanced Memory Efficient Ray-cast).

The data structures used by these algorithms to store the face data determine their memory usage
and performance. The face data are stored in a structure that we call face and correspond to the
information about the geometry and the parameters of the face. The geometry represents the
coordinates of the points that define the face. The parameters are costly to compute: they comprise
the constants of the equation of the plane defined by the face, which are used to compute the
intersection between the ray and the face, and the parameters used to interpolate the scalar value
inside the face. The whole face data is the most memory-consuming data structure in ray-casting.

3.1. Bunyk

In a preprocessing step, all points and tetrahedra of the input dataset are read, and a list of all
vertices and a list of cells are created. In addition, for each input tetrahedron, all four
of its faces are kept in a list. Bunyk also maintains, for each vertex, a list of all faces that
use that vertex, called the referredBy list. After reading the dataset, Bunyk determines which
faces belong to the visible side of the scene boundary, the visible faces. The algorithm projects
these visible faces on the screen, creating for each pixel a list of the intersections that will be
used as the entry points of its ray. After the entry points are stored, the actual ray-casting
begins. For every pixel, the first intersection is its entry point into the data, and each next
intersection is obtained by inspecting the intersection between the ray and all other faces of the
current cell. Each pair of intersections is used to compute the contribution of the cell to the
color and opacity of the pixel. The search for the next intersection in Bunyk is quite fast, since
it is performed among the other three faces of the tetrahedron. However, the list of all faces and
the referredBy lists result in a very high memory consumption.

3.2. ME-Ray

ME-Ray starts with a preprocessing step, creating a list of all vertices, a list of all cells, for
each cell the list of cells that share a face with it, and the Use Set list for each vertex. The
Use Set lists substitute the referredBy lists used in Bunyk; each Use Set contains a list of all
cells incident on the vertex. In the rendering algorithm, ME-Ray creates a list of visible faces,
and determines the entry point of each ray, in the same way as Bunyk. The algorithm proceeds
looking for the next intersection, by scanning through the neighbor cells. ME-Ray creates a list of
faces on demand, as each face is intersected by the ray. Faces that are never intersected by any
ray will not be created. After each pair of intersections is found, the contribution to the final
color and opacity is computed. ME-Ray can save about 40% of the memory used by Bunyk, but is
slower, as it has to create the faces for the first time during the rendering.

3.3. EME-Ray

EME-Ray was developed as an optimization of ME-Ray in terms of memory usage. They have similar
behavior. However, EME-Ray does not store the faces in memory, since the face list was one of the
most memory-expensive structures in ME-Ray. Therefore, EME-Ray has to recalculate the face
parameters for the computation of the intersections every time a new face is found. With this
optimization, EME-Ray uses only about 1/4 of the memory used by Bunyk. The significant reduction in
memory usage comes with an increase in the execution time: EME-Ray usually doubles Bunyk's
execution time. It is also important to notice that the gains of ME-Ray and EME-Ray over Bunyk are
not only in memory usage, but also in the correctness of the final image. Their data structures
allow them to deal with all degenerate situations that Bunyk is not able to handle.

4. Visible Face Driven Ray-Casting

Comparing the three ray-casting approaches described in the previous section, we can observe that
the more face data are kept in memory, the faster the algorithm computes the next intersection of
the ray. For this reason, Bunyk is still the fastest software implementation of ray-casting.

Our algorithm, called Visible Face Driven Ray-Casting (VF-Ray), addresses the problem of
maintaining information about the faces in memory, while minimally degrading the rendering
performance. The basic idea behind VF-Ray is to keep in main memory only the faces of the most
recent traversals. To do so, we exploit ray coherence, using the visible face information to guide
the creation and destruction of faces in memory.

4.1. Exploring Ray Coherence

VF-Ray is based on ray coherence, which comes from the fact that the sets of faces intersected by
two rays cast from neighboring pixels will contain almost the same faces. Our algorithm takes
advantage of ray coherence by keeping in memory only the faces of nearby rays.

One important issue in exploiting ray coherence is the determination of the set of rays that are
likely to have the same set of intersected faces. We use the visible faces information to do so.
The set of rays that may reuse the same list of faces are the ones that correspond to the set of
pixels under the projection of a certain visible face. This set of pixels will be called here the
visible set. Figure 1 shows a 2D example of this idea. The rays that are cast through the visible
face, presented in the figure, tend to intersect the same internal faces. Therefore, the internal
faces are created and stored once, for the first ray intersection. After that, they are reused
until all the pixels of the visible set are computed. This process is repeated for all visible
faces. To guarantee correctness, all visible faces are ordered in depth order.

[Figure 1: 2D diagram showing the screen, the projection of a visible face, and the internal faces
crossed by the rays cast through that projection.]

                         Figure 1. Visible face rays coherence.
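
The create-once, reuse, then discard policy described in Section 4.1 can be sketched in C++ as
follows. This is only a minimal illustration under assumptions that are not taken from the paper:
the names `Face`, `FaceCache`, `getOrCreate` and `processVisibleSet` are hypothetical, faces are
identified by a plain integer id, each ray is given directly as the list of face ids it crosses,
and the costly plane-parameter computation is replaced by a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical face record: an id plus plane constants (a*x + b*y + c*z + d = 0).
struct Face {
    int id;
    double a, b, c, d;
};

// Hypothetical cache holding the internal faces of the current visible set only.
class FaceCache {
public:
    explicit FaceCache(std::size_t faceCount) : slot_(faceCount, -1) {}

    // Returns the cached face, creating it the first time a ray reaches it.
    // The slot table gives constant-time lookup and insertion.
    const Face& getOrCreate(int faceId) {
        assert(faceId >= 0 && static_cast<std::size_t>(faceId) < slot_.size());
        if (slot_[faceId] < 0) {
            slot_[faceId] = static_cast<int>(faces_.size());
            faces_.push_back(makeFace(faceId));
        }
        return faces_[slot_[faceId]];
    }

    // Discards every face created for the current visible set.
    void clear() {
        for (const Face& f : faces_) slot_[f.id] = -1;
        faces_.clear();
    }

    std::size_t size() const { return faces_.size(); }

private:
    // Placeholder for the expensive plane-parameter computation (assumption).
    static Face makeFace(int faceId) {
        return Face{faceId, 0.0, 0.0, 1.0, -static_cast<double>(faceId)};
    }

    std::vector<int> slot_;    // face id -> position in faces_, or -1 if absent
    std::vector<Face> faces_;  // faces created for the current visible set
};

// Processes all rays of one visible set and then discards the faces.
// Returns how many distinct faces had to be created for the whole set.
std::size_t processVisibleSet(FaceCache& cache,
                              const std::vector<std::vector<int>>& rays) {
    for (const auto& ray : rays)
        for (int faceId : ray)
            cache.getOrCreate(faceId);  // created once, reused by later rays
    const std::size_t facesHeld = cache.size();
    cache.clear();
    return facesHeld;
}
```

For two neighbouring rays that cross faces {1, 2, 3} and {1, 2, 4}, `processVisibleSet` creates
four faces rather than six, and the cache is empty again before the next visible face is handled.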
Our results show that exploiting pixel coherence considerably reduces the memory usage, while
keeping the VF-Ray performance competitive with the fastest previous algorithms because the reuse
of face data improves cache performance. In fact, for all datasets we tested, more than half of the
faces are created at most two times for medium size images.

4.2. Data structures

The data structures used by VF-Ray are similar to those of EME-Ray: an array of vertices, an array
of cells, and the Use Set of each vertex. Besides these structures, VF-Ray also has a list, called
computedFaces, that holds the internal faces throughout the model. Note that this list is cleared
after the ray-casting computation of all pixels of a visible set. In this way, we do not need to
store all the internal faces, as done by the Bunyk and ME-Ray algorithms.

Furthermore, just like ME-Ray and EME-Ray, our algorithm can handle any type of convex cell. For
tetrahedral meshes, the intersections are computed between the line defined by the ray path and the
plane defined by the three vertices of a triangular face. For hexahedral meshes, where each face is
quadrangular, the intersection is found by splitting these quadrangular faces into two triangular
faces, and the same calculation is performed for each one.

4.3. Handling Degeneracies

The degenerate situations, which can be found during the ray traversal, are treated in the same way
as in Pina et al. [13], by investigating the Use Set of each vertex. So, when a ray hits a vertex
or an edge, the algorithm can determine the next intersection correctly.

4.4. Algorithm

The VF-Ray algorithm starts by reading the input dataset and creating three lists: a list of
vertices, a list of cells and the Use Set list for each vertex. In the rendering process, VF-Ray
creates a list of all visible faces, which are the faces whose normal makes an angle greater than
90° with the viewing direction. For each visible face, the algorithm follows the steps presented in
the pseudo-code of Figure 2.

Initially, the visible face, vface_i, is projected onto the screen, generating the visible set of
vface_i. For each pixel p in the visible set, the algorithm first computes the intersection between
the ray cast from p and vface_i. After this first intersection is found, the ray traversal begins,
and the algorithm computes the intersections of the ray with the internal faces. The next face
intersected, face_j, is found on another face of the current cell. If face_j was not created before
(this is the first intersection in face_j), the face is created and inserted into the computedFaces
list. Otherwise, the face_j data are loaded from the list. The insertion and loading operations in
the list are done in constant time, since in the cell we store the index of the position of each
face in the list. The computation of the intersections is done using the face parameters computed
when the face is created. For each intersection, the scalar value is interpolated between the three
vertices of each face, and these values are used in the illumination integral proposed by Max [10].

    For each visible face vface_i do
      Project vface_i;                               // Determine visible set
      For each pixel p of visible set do
        Intersect vface_i;                           // Interpolate scalar value
        Do
          Find next internal face face_j;            // Intersection
          If face_j not in computedFaces then
            Create face_j and Insert in computedFaces;
          else
            Load face_j from computedFaces;
          Ray integrate;                             // Optical model
        While exist next face face_j;
      End For
      Clear computedFaces;                           // Clear list
    End For

                         Figure 2. The pseudo-code of VF-Ray.

5. Results

In this section we evaluate the performance and memory usage of VF-Ray when compared to ME-Ray,
EME-Ray and Bunyk. We also show a comparison between VF-Ray and a cell projection algorithm, called
ZSweep [4]. This last experiment was done in order to put our results in perspective with respect
to another rendering paradigm. The main idea of the ZSweep algorithm is the sweeping of the data
with a plane parallel to the viewing plane XY, towards the positive z direction. The sweeping
process is performed by ordering the vertices by their increasing z coordinate values, and then
retrieving them one by one from this data structure. For each vertex swept by the plane, the
algorithm projects onto the screen all faces that are incident to it. During face projection,
ZSweep stores a list of intersections for each pixel and uses delayed compositing. The
intersections in the pixel lists are composited and the lists are flushed each time the plane
reaches the target-z. The target-z is the maximum z-coordinate among the vertices adjacent to the
vertex intersected by the sweeping plane.
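
The two ordering rules in the ZSweep description above (sweep order and target-z) can be
illustrated with a short C++ sketch. It is not ZSweep's actual code: the names `Vertex`,
`sweepOrder` and `targetZ` are hypothetical, adjacency is passed in as a plain list of vertex
indices, and the fallback for a vertex without neighbours is an assumption made only so the
function is total.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct Vertex {
    double x, y, z;
};

// Indices of the vertices in the order the sweep plane reaches them,
// i.e. sorted by increasing z coordinate.
std::vector<std::size_t> sweepOrder(const std::vector<Vertex>& verts) {
    std::vector<std::size_t> order(verts.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&verts](std::size_t a, std::size_t b) {
                         return verts[a].z < verts[b].z;
                     });
    return order;
}

// Target-z for the vertex `current`: the largest z among its adjacent vertices.
// Assumption: a vertex without neighbours uses its own z as the target.
double targetZ(const std::vector<Vertex>& verts, std::size_t current,
               const std::vector<std::size_t>& adjacent) {
    if (adjacent.empty()) return verts[current].z;
    double target = verts[adjacent.front()].z;
    for (std::size_t v : adjacent) target = std::max(target, verts[v].z);
    return target;
}
```

In a full implementation the per-pixel intersection lists would be composited and flushed whenever
the sweep plane passes the value returned by `targetZ`; that part is omitted here.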
5.1. Testing Configuration

The VF-Ray algorithm was written in ANSI C++ without using any particular graphics libraries. Our
experiments were conducted on an Intel Pentium IV 3.6 GHz with 2 GB RAM and 1 MB of L2 cache,
running Linux Fedora Core 5.

We have used four different tetrahedral datasets: Blunt Fin, SPX, Liquid Oxygen Post, and Delta
Wing. Table 1 shows the number of vertices, faces, boundary faces and cells for each dataset. We
varied the image sizes from 512 × 512 to 8192 × 8192 pixels. The models rendered with VF-Ray are
shown in Figures 3 to 6. The images generated by VF-Ray, ME-Ray, and EME-Ray are identical, and
better than the images generated by Bunyk. There are two reasons for this difference: VF-Ray,
ME-Ray, and EME-Ray use double precision variables to compute ray-face intersections, while Bunyk
uses float precision, causing artifacts in the images; and Bunyk does not correctly handle all
degenerate cases, generating some incorrect pixels.

  Dataset   # Verts   # Faces   # Tets   # Boundary Faces
  Blunt      41 K      381 K    187 K         13 K
  SPX       149 K      1.6 M    827 K         44 K
  Post      109 K        1 M    513 K         27 K
  Delta     211 K        2 M      1 M         41 K

                         Table 1. Dataset sizes.

5.2. Memory Consumption

Table 2 shows the memory used by VF-Ray to render one frame (in MBytes) for Blunt, SPX, Post and
Delta, and the relative memory usage results of ME-Ray, EME-Ray, Bunyk and ZSweep when compared to
VF-Ray usage. The percentages presented in this table correspond to the ratio of the other
algorithms' memory usage over VF-Ray memory consumption. In other words, we consider VF-Ray results
as 100% and present how much the other algorithms increase or decrease this result.

As we can observe in Table 2, in most of the cases, VF-Ray uses less memory than the three other
algorithms, and its memory usage increases linearly with the resolution. When compared to Bunyk,
VF-Ray uses from 1/3 to 1/6 of the memory spent by Bunyk. These gains are impressive. For Blunt,
for example, VF-Ray renders a 4096 × 4096 image using the same amount of memory that Bunyk uses to
render a 512 × 512 image. For the largest dataset, Delta, Bunyk uses more memory to render a
512 × 512 image than VF-Ray uses to render an 8192 × 8192 image.

When compared with the two memory-aware ray-casting algorithms, ME-Ray and EME-Ray, we can observe
that ME-Ray uses from 1.5 to 6 times more memory than VF-Ray. For EME-Ray, we notice an interesting
result. Although EME-Ray does not store the faces in memory, for high resolution images, bigger
than 1024 × 1024, EME-Ray uses more memory than VF-Ray. This result can be explained by the fact
that EME-Ray, just like Bunyk and ME-Ray, maintains for each pixel a list containing the entry
faces for the pixel. The number of lists increases with the image resolution. VF-Ray, on the other
hand, does not require this kind of list, since it computes the entry point of each ray on demand,
for each visible face.

When compared to ZSweep, we can observe that except for SPX with a 512 × 512 image, VF-Ray uses
much less memory than ZSweep. For the higher resolution images, ZSweep can use 8 or 9 times more
memory than VF-Ray. To render SPX with an 8192 × 8192 image, for example, ZSweep spends about 2 GB
of memory. The ZSweep memory consumption is directly proportional to the final image resolution,
since it creates pixel lists to store the intersections found for each pixel.

5.3. Rendering Performance

Table 3 shows the rendering time (in seconds) of VF-Ray for Blunt, SPX, Post and Delta, and the
relative timing results of ME-Ray, EME-Ray, Bunyk and ZSweep when compared to VF-Ray execution.
Just like the previous table, the percentages presented in this table correspond to the ratio of
the other algorithms' execution time over VF-Ray execution time. The time results correspond to the
rendering of one frame and do not include preprocessing time.

As we can observe in Table 3, VF-Ray outperforms ME-Ray, EME-Ray and ZSweep for all datasets and
all image resolutions. For Blunt, Post and Delta, VF-Ray is about 3 times faster than EME-Ray and
ZSweep, and uses about 3/4 of the time used by ME-Ray to render the image. For SPX, the gains of
VF-Ray against EME-Ray and ZSweep are smaller, but still high. When compared to Bunyk, for Blunt
and Delta, VF-Ray presents almost the same execution time, since the difference is only about 10%.
For SPX and Post, VF-Ray is only about 22% slower than Bunyk.

Observing the increase in the rendering time of VF-Ray for different image resolutions, we notice
that as the image resolution increases, the rendering time increases in the same proportion. The
gains over the other algorithms are almost the same for different resolutions.

6. Discussion

VF-Ray obtained consistent and significant gains in memory usage over Bunyk, providing almost the
same performance (the difference in their execution time is only about 16%). As the image
resolution increases, our gains
  Dataset   Resolution     VF-Ray memory (MB)   ME-Ray   EME-Ray   Bunyk   ZSweep
  Blunt      512 × 512             14            404%     166%     528%     208%
            1024 × 1024            30            240%     129%     297%     177%
            2048 × 2048            39            333%     251%     377%     369%
            4096 × 4096            75            495%     451%     517%     689%
            8192 × 8192           219            610%     655%     617%     917%
  SPX        512 × 512             96            215%      74%     299%      87%
            1024 × 1024            99            234%      88%     305%     107%
            2048 × 2048           108            271%     141%     337%     188%
            4096 × 4096           144            383%     284%     428%     407%
            8192 × 8192           288            533%     498%     569%     711%
  Post       512 × 512             63            165%      74%     290%      97%
            1024 × 1024            66            203%      95%     301%     129%
            2048 × 2048            74            281%     172%     353%     238%
            4096 × 4096           110            424%     349%     470%     499%
            8192 × 8192           256            601%     560%     603%     800%
  Delta      512 × 512            116            167%      74%     303%      97%
            1024 × 1024           119            200%      84%     308%     116%
            2048 × 2048           129            246%     124%     330%     177%
            4096 × 4096           165            346%     247%     403%     375%
            8192 × 8192           307            500%     467%     534%     704%

      Table 2. Memory usage results for VF-Ray compared to ME-Ray, EME-Ray, Bunyk and ZSweep.

in memory usage over Bunyk are even more pronounced, but the difference in the rendering
time remains the same. For 8192 × 8192 images, Bunyk uses about 6 times more memory than
VF-Ray with a performance difference of only about 18%. In our experiments, however, we
only compare executions where the whole dataset fits in main memory for all algorithms.
Our experimental platform has 2 GB of main memory, which can store all the data structures
used by our workload. Once memory usage exceeds main memory, rendering would have to rely
on the virtual memory mechanisms of the operating system, which would have a great
influence on the overall execution time.
   The small increase in execution time, when compared to Bunyk, comes from the fact that
Bunyk computes all internal faces in the preprocessing step, before the ray-casting loop,
while VF-Ray does it on demand, when a face is intersected for the first time. In addition,
in VF-Ray, if a face is intersected by rays of two different visible faces, its data is
computed twice.
   The comparison of VF-Ray with ME-Ray and EME-Ray provided interesting results. ME-Ray
and EME-Ray are memory-aware algorithms designed to reduce the great memory usage of Bunyk.
VF-Ray was designed to reduce the memory usage of ME-Ray to a level close to that of
EME-Ray, while running much faster than EME-Ray. Our results showed, however, that for
high-resolution images our algorithm is not only faster than ME-Ray but also uses less
memory than EME-Ray. The gains of VF-Ray over EME-Ray in memory usage come from the
elimination of the lists of entry faces for each pixel. The gains of VF-Ray over ME-Ray in
execution time come from the good cache performance of VF-Ray. We ran experiments to
evaluate the cache misses of VF-Ray and ME-Ray: VF-Ray showed a very low cache miss ratio,
near 0%, while the cache miss ratio of ME-Ray is about 1.9% for modest-size images. This
result arises because our algorithm reuses the faces for the rays in the same visible face.
Since rays are shot one after the other, the closer neighboring rays are to each other, the
higher the probability that they reuse the cached data.
   The comparison of VF-Ray with the cell projection approach, implemented by ZSweep,
showed that VF-Ray is faster and uses less memory than ZSweep. The gains are substantial.
For the Delta dataset, ZSweep uses about 7 times more memory and runs about 3.5 times
slower than VF-Ray. The pixel list maintained by the ZSweep approach grows for as long as
the target-Z has not been reached. As this list grows, insertion becomes more expensive,
since the pixel list must be kept ordered. The data structures of the VF-Ray algorithm, on
the other hand, do not depend on ordering. The difference in memory usage of ZSweep is due
to the growth of these pixel lists. For ZSweep, as the image size increases, each projected
face inserts intersection units into more pixel lists. If the target-Z is not appropriate,
the pixel lists will grow considerably.

Dataset   Resolution     Time (sec)   ME-Ray   EME-Ray   Bunyk   ZSweep
Blunt      512 × 512            1.9     139%      296%    105%     253%
          1024 × 1024           7.0     143%      317%     94%     431%
          2048 × 2048          27.2     146%      337%     88%     481%
          4096 × 4096         107.4     147%      328%     88%     516%
          8192 × 8192         426.7     154%      341%     91%     382%
SPX        512 × 512            4.1     111%      161%     76%     287%
          1024 × 1024          13.0     103%      203%     81%     202%
          2048 × 2048          46.6     103%      225%     83%     219%
          4096 × 4096         177.1     105%      234%     82%     226%
          8192 × 8192         696.3     105%      238%     81%     260%
Post       512 × 512            5.0     138%      251%     80%     201%
          1024 × 1024          19.5     136%      254%     76%     167%
          2048 × 2048          78.6     133%      258%     74%     194%
          4096 × 4096         309.9     135%      257%     74%     227%
          8192 × 8192        1246.3     135%      263%     72%     246%
Delta      512 × 512            3.1     130%      242%     91%     465%
          1024 × 1024          11.2     122%      270%     88%     271%
          2048 × 2048          41.9     121%      280%     87%     312%
          4096 × 4096         162.7     120%      290%     84%     341%
          8192 × 8192         640.3     123%      290%     83%     369%

            Table 3. Time results for VF-Ray compared to ME-Ray, EME-Ray, Bunyk and ZSweep.

   In terms of the dataset characteristics, the performance of VF-Ray increases as the
number of pixels in each visible face increases. Datasets with a small number of faces have
a greater number of pixels per visible face. Therefore, datasets such as Blunt, which have
a small number of faces, tend to present better performance results for VF-Ray.
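The effect of reusing face data within a visible face can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the VF-Ray implementation: representing a ray as a list of internal face ids, and the names `compute_face_data` and `count_face_computations`, are hypothetical choices made for illustration. Face data is computed on demand at the first intersection and discarded when the next visible face is processed, so a face crossed by rays of two visible faces is computed twice.

```python
from typing import Dict, List, Tuple

Ray = List[int]            # hypothetical: ids of the internal faces a ray crosses
VisibleFace = List[Ray]    # hypothetical: all rays entering through one visible face


def compute_face_data(face_id: int) -> Tuple[int, float]:
    """Stand-in for the per-face parameters used in intersection tests."""
    return (face_id, float(face_id))


def count_face_computations(visible_faces: List[VisibleFace]) -> Tuple[int, int]:
    """Return (computations, crossings) for an on-demand, per-visible-face cache."""
    computations = 0
    crossings = 0
    for rays in visible_faces:
        # Face data is kept only while this visible face is being processed.
        cache: Dict[int, Tuple[int, float]] = {}
        for ray in rays:
            for face_id in ray:
                crossings += 1
                if face_id not in cache:
                    # First intersection within this visible face: compute on demand.
                    cache[face_id] = compute_face_data(face_id)
                    computations += 1
    return computations, crossings


# Face 7 is crossed by rays of both visible faces, so its data is computed twice.
scene = [
    [[1, 2], [1, 2, 7]],
    [[7, 3], [7, 3]],
]
computations, crossings = count_face_computations(scene)
print(computations, crossings)
```

In this model, the more rays share a visible face, the larger the share of face crossings served from the cache, which is in line with the observation that datasets with more pixels per visible face favor VF-Ray.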
   Finally, it is important to keep in mind that the reductions in memory usage will allow
the implementation of the algorithm in graphics hardware. Furthermore, these reductions
will also allow the use of double precision in the parameters used to calculate the
intersection between a ray and a face, raising the chance of obtaining more precise images.
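To give a feel for the precision argument, the sketch below computes a ray parameter once in double precision and once with every intermediate result rounded to single precision. It is an illustration under assumptions: the text does not give the intersection formula, so a plain ray-plane test is used as a stand-in, and the helper names (`to_single`, `ray_plane_t`) and the input values are hypothetical.

```python
import struct


def to_single(x: float) -> float:
    """Round a double to the nearest single-precision (32-bit) value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def to_double(x: float) -> float:
    """Python floats are already doubles, so no rounding is needed."""
    return x


def dot(a, b, rnd):
    """Dot product with the rounding function applied after every operation."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = rnd(acc + rnd(x * y))
    return acc


def ray_plane_t(origin, direction, point, normal, rnd=to_double):
    """Ray parameter t where origin + t * direction meets the plane of a face."""
    origin = [rnd(v) for v in origin]
    direction = [rnd(v) for v in direction]
    point = [rnd(v) for v in point]
    normal = [rnd(v) for v in normal]
    diff = [rnd(p - o) for p, o in zip(point, origin)]
    denom = dot(normal, direction, rnd)
    if denom == 0.0:
        raise ValueError("ray is parallel to the face plane")
    return rnd(dot(normal, diff, rnd) / denom)


# Hypothetical input: a ray along +z and a face plane through (0.1, 0.2, 0.3).
args = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.1, 0.2, 0.3), (0.0, 0.0, 1.0))
t_double = ray_plane_t(*args)
t_single = ray_plane_t(*args, rnd=to_single)
print(t_double, t_single, abs(t_double - t_single))
```

The two results differ slightly even for this simple case. Each stored parameter takes twice the space in double precision, which is why the memory savings matter for this choice.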




Figure 3. Blunt image generated by VF-Ray.

Figure 4. SPX image generated by VF-Ray.

Figure 5. Post image generated by VF-Ray.

Figure 6. Delta image generated by VF-Ray.

7. Conclusions

   In this work, we have proposed a novel memory-aware and efficient software ray-casting
algorithm for volume rendering, called VF-Ray. The algorithm is suitable for tetrahedral or
hexahedral datasets, including non-convex and disconnected meshes, even with holes.
   Our idea was to exploit ray coherence in order to reduce the memory requirements for
storing face data while achieving good cache performance. We compared VF-Ray memory usage
and performance with previous ray-casting approaches, and with a cell projection algorithm.
When compared to the fastest but memory-expensive approach, by Bunyk, VF-Ray obtained
comparable performance, using only 1/3 to 1/6 of the memory. When compared to the
memory-aware algorithms of Pina et al., VF-Ray obtained consistent and significant gains in
memory usage and in execution time. When compared to the cell projection approach, VF-Ray
also significantly reduced the memory consumption and the execution time.
   The reductions in memory usage make VF-Ray suitable for using double precision in the
parameters to calculate the intersection between a ray and a face. Using double precision
avoids the artifacts that appear when float precision is used. We conclude that VF-Ray is
an efficient ray-casting algorithm that allows the in-core rendering of big datasets. As
future work, we intend to optimize VF-Ray by keeping face data for neighboring visible
faces, and to parallelize VF-Ray in order to run on a cluster of PCs.

8. Acknowledgments

   We acknowledge the grant of the first and second authors provided by Brazilian agencies
CAPES and CNPq.

References

 [1] F. Bernardon, C. Pagot, J. Comba, and C. Silva. GPU-based tiled ray casting using
     depth peeling. Journal of Graphics Tools, To appear.
 [2] P. Bunyk, A. Kaufman, and C. Silva. Simple, fast, and robust ray casting of irregular
     grids. Advances in Volume Visualization, ACM SIGGRAPH, 1(24), July 1998.
 [3] R. Espinha and W. Celes. High-quality hardware-based ray-casting volume rendering
     using partial pre-integration. In SIBGRAPI '05: Proc. of the XVIII Brazilian Symposium
     on Computer Graphics and Image Processing, October 2005.
 [4] R. Farias, J. Mitchell, and C. Silva. ZSweep: An efficient and exact projection
     algorithm for unstructured volume rendering. In 2000 Volume Visualization Symposium,
     pages 91–99, October 2000.
 [5] M. P. Garrity. Raytracing irregular volume data. In VVS '90: Proceedings of the 1990
     workshop on Volume visualization, pages 35–40. ACM Press, 1990.
 [6] L. Hong and A. Kaufman. Accelerated ray-casting for curvilinear volumes. In VIS '98:
     Proceedings of the conference on Visualization '98, pages 247–253, 1998.
 [7] K. Koyamada. Fast ray-casting for irregular volumes. In ISHPC '00: Proc. of the Third
     International Symposium on High Performance Computing, pages 557–572, 2000.
 [8] S. Lakare and A. Kaufman. Light weight space leaping using ray coherence. In Proc.
     IEEE Visualization 2004, pages 19–26, 2004.
 [9] R. Marroquim, A. Maximo, R. Farias, and C. Esperanca. GPU-based cell projection for
     interactive volume rendering. In SIBGRAPI '06: Proc. of the Brazilian Symposium on
     Computer Graphics and Image Processing, October 2006.
[10] N. Max. Optical models for direct volume rendering. IEEE Transactions on Visualization
     and Computer Graphics, 1(2):99–108, 1995.
[11] M. Meißner, J. Huang, D. Bartz, K. Mueller, and R. Crawfis. A practical evaluation of
     popular volume rendering algorithms. In VVS '00: Proceedings of the 2000 IEEE
     symposium on Volume visualization, pages 81–90, 2000.
[12] A. Neubauer, L. Mroz, H. Hauser, and R. Wegenkittl. Cell-based first-hit ray casting.
     In VISSYM '02: Proceedings of the symposium on Data Visualisation 2002, pages 77–ff,
     2002.
[13] A. Pina, C. Bentes, and R. Farias. Memory efficient and robust software implementation
     of the raycast algorithm. In WSCG'07: The 15th Int. Conf. in Central Europe on
     Computer Graphics, Visualization and Computer Vision, 2007.
[14] A. Reshetov, A. Soupikov, and J. Hurley. Multi-level ray tracing algorithm. In
     SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, pages 1176–1185, 2005.
[15] I. Wald, P. Slusallek, C. Benthin, and M. Wagner. Interactive rendering with coherent
     ray tracing. In Eurographics 01 Proceedings, pages 153–164, 2001.
[16] M. Weiler, M. Kraus, M. Merz, and T. Ertl. Hardware-based ray casting for tetrahedral
     meshes. In Proceedings of the 14th IEEE conference on Visualization '03, pages
     333–340, 2003.
[17] M. Weiler, P. Mallón, M. Kraus, and T. Ertl. Texture-encoded tetrahedral strips. In
     2004 IEEE Symposium on Volume Visualization and Graphics (VolVis 2004), pages 71–78,
     October 2004.

Weitere ähnliche Inhalte

Was ist angesagt?

COMPRESSION ALGORITHM SELECTION FOR MULTISPECTRAL MASTCAM IMAGES
COMPRESSION ALGORITHM SELECTION FOR MULTISPECTRAL MASTCAM IMAGESCOMPRESSION ALGORITHM SELECTION FOR MULTISPECTRAL MASTCAM IMAGES
COMPRESSION ALGORITHM SELECTION FOR MULTISPECTRAL MASTCAM IMAGESsipij
 
Development of 3D convolutional neural network to recognize human activities ...
Development of 3D convolutional neural network to recognize human activities ...Development of 3D convolutional neural network to recognize human activities ...
Development of 3D convolutional neural network to recognize human activities ...journalBEEI
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Reversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approachReversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approachIAEME Publication
 
Emerging 3D Scanning Technologies for PropTech
Emerging 3D Scanning Technologies for PropTechEmerging 3D Scanning Technologies for PropTech
Emerging 3D Scanning Technologies for PropTechPetteriTeikariPhD
 
Sparse Sampling in Digital Image Processing
Sparse Sampling in Digital Image ProcessingSparse Sampling in Digital Image Processing
Sparse Sampling in Digital Image ProcessingEswar Publications
 
TRANSFER LEARNING WITH CONVOLUTIONAL NEURAL NETWORKS FOR IRIS RECOGNITION
TRANSFER LEARNING WITH CONVOLUTIONAL NEURAL NETWORKS FOR IRIS RECOGNITIONTRANSFER LEARNING WITH CONVOLUTIONAL NEURAL NETWORKS FOR IRIS RECOGNITION
TRANSFER LEARNING WITH CONVOLUTIONAL NEURAL NETWORKS FOR IRIS RECOGNITIONijaia
 

Was ist angesagt? (13)

COMPRESSION ALGORITHM SELECTION FOR MULTISPECTRAL MASTCAM IMAGES
COMPRESSION ALGORITHM SELECTION FOR MULTISPECTRAL MASTCAM IMAGESCOMPRESSION ALGORITHM SELECTION FOR MULTISPECTRAL MASTCAM IMAGES
COMPRESSION ALGORITHM SELECTION FOR MULTISPECTRAL MASTCAM IMAGES
 
Development of 3D convolutional neural network to recognize human activities ...
Development of 3D convolutional neural network to recognize human activities ...Development of 3D convolutional neural network to recognize human activities ...
Development of 3D convolutional neural network to recognize human activities ...
 
L0351007379
L0351007379L0351007379
L0351007379
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
M017427985
M017427985M017427985
M017427985
 
Reversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approachReversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approach
 
Jl2516751681
Jl2516751681Jl2516751681
Jl2516751681
 
Emerging 3D Scanning Technologies for PropTech
Emerging 3D Scanning Technologies for PropTechEmerging 3D Scanning Technologies for PropTech
Emerging 3D Scanning Technologies for PropTech
 
I017425763
I017425763I017425763
I017425763
 
Geometric Deep Learning
Geometric Deep Learning Geometric Deep Learning
Geometric Deep Learning
 
Sparse Sampling in Digital Image Processing
Sparse Sampling in Digital Image ProcessingSparse Sampling in Digital Image Processing
Sparse Sampling in Digital Image Processing
 
92 97
92 9792 97
92 97
 
TRANSFER LEARNING WITH CONVOLUTIONAL NEURAL NETWORKS FOR IRIS RECOGNITION
TRANSFER LEARNING WITH CONVOLUTIONAL NEURAL NETWORKS FOR IRIS RECOGNITIONTRANSFER LEARNING WITH CONVOLUTIONAL NEURAL NETWORKS FOR IRIS RECOGNITION
TRANSFER LEARNING WITH CONVOLUTIONAL NEURAL NETWORKS FOR IRIS RECOGNITION
 

Andere mochten auch

Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionIn a Rocket
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanPost Planner
 

Andere mochten auch (6)

Cudaray
CudarayCudaray
Cudaray
 
Saulo sib04
Saulo sib04Saulo sib04
Saulo sib04
 
Loop Snakes
Loop SnakesLoop Snakes
Loop Snakes
 
Loop snakesiv05
Loop snakesiv05Loop snakesiv05
Loop snakesiv05
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming Convention
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media Plan
 

Ähnlich wie Ribeiro visfaces07

Analog signal processing approach for coarse and fine depth estimation
Analog signal processing approach for coarse and fine depth estimationAnalog signal processing approach for coarse and fine depth estimation
Analog signal processing approach for coarse and fine depth estimationsipij
 
Analog signal processing approach for coarse and fine depth estimation
Analog signal processing approach for coarse and fine depth estimationAnalog signal processing approach for coarse and fine depth estimation
Analog signal processing approach for coarse and fine depth estimationsipij
 
Analog signal processing solution
Analog signal processing solutionAnalog signal processing solution
Analog signal processing solutioncsandit
 
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONMEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONcscpconf
 
Median based parallel steering kernel regression for image reconstruction
Median based parallel steering kernel regression for image reconstructionMedian based parallel steering kernel regression for image reconstruction
Median based parallel steering kernel regression for image reconstructioncsandit
 
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONMEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONcsandit
 
Image compression and reconstruction using a new approach by artificial neura...
Image compression and reconstruction using a new approach by artificial neura...Image compression and reconstruction using a new approach by artificial neura...
Image compression and reconstruction using a new approach by artificial neura...Hưng Đặng
 
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction NetworkEDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction Networkgerogepatton
 
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction NetworkEDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction Networkgerogepatton
 
s41598-023-28094-1.pdf
s41598-023-28094-1.pdfs41598-023-28094-1.pdf
s41598-023-28094-1.pdfarchurssu
 
06-08 ppt.pptx
06-08 ppt.pptx06-08 ppt.pptx
06-08 ppt.pptxFarah Naaz
 
IRJET- A Survey on Medical Image Interpretation for Predicting Pneumonia
IRJET- A Survey on Medical Image Interpretation for Predicting PneumoniaIRJET- A Survey on Medical Image Interpretation for Predicting Pneumonia
IRJET- A Survey on Medical Image Interpretation for Predicting PneumoniaIRJET Journal
 
Poster Segmentation Chain
Poster Segmentation ChainPoster Segmentation Chain
Poster Segmentation ChainRMwebsite
 
A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...
A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...
A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...sipij
 
IRJET- Exploring Image Super Resolution Techniques
IRJET- Exploring Image Super Resolution TechniquesIRJET- Exploring Image Super Resolution Techniques
IRJET- Exploring Image Super Resolution TechniquesIRJET Journal
 
An Investigation towards Effectiveness in Image Enhancement Process in MPSoC
An Investigation towards Effectiveness in Image Enhancement Process in MPSoC An Investigation towards Effectiveness in Image Enhancement Process in MPSoC
An Investigation towards Effectiveness in Image Enhancement Process in MPSoC IJECEIAES
 
Thesis on Image compression by Manish Myst
Thesis on Image compression by Manish MystThesis on Image compression by Manish Myst
Thesis on Image compression by Manish MystManish Myst
 

Ähnlich wie Ribeiro visfaces07 (20)

Analog signal processing approach for coarse and fine depth estimation
Analog signal processing approach for coarse and fine depth estimationAnalog signal processing approach for coarse and fine depth estimation
Analog signal processing approach for coarse and fine depth estimation
 
Analog signal processing approach for coarse and fine depth estimation
Analog signal processing approach for coarse and fine depth estimationAnalog signal processing approach for coarse and fine depth estimation
Analog signal processing approach for coarse and fine depth estimation
 
Analog signal processing solution
Analog signal processing solutionAnalog signal processing solution
Analog signal processing solution
 
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONMEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
 
Median based parallel steering kernel regression for image reconstruction
Median based parallel steering kernel regression for image reconstructionMedian based parallel steering kernel regression for image reconstruction
Median based parallel steering kernel regression for image reconstruction
 
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONMEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
 
Image compression and reconstruction using a new approach by artificial neura...
Image compression and reconstruction using a new approach by artificial neura...Image compression and reconstruction using a new approach by artificial neura...
Image compression and reconstruction using a new approach by artificial neura...
 
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction NetworkEDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
 
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction NetworkEDGE-Net: Efficient Deep-learning Gradients Extraction Network
EDGE-Net: Efficient Deep-learning Gradients Extraction Network
 
s41598-023-28094-1.pdf
s41598-023-28094-1.pdfs41598-023-28094-1.pdf
s41598-023-28094-1.pdf
 
V01 i010405
V01 i010405V01 i010405
V01 i010405
 
06-08 ppt.pptx
06-08 ppt.pptx06-08 ppt.pptx
06-08 ppt.pptx
 
Y4301127131
Y4301127131Y4301127131
Y4301127131
 
IRJET- A Survey on Medical Image Interpretation for Predicting Pneumonia
IRJET- A Survey on Medical Image Interpretation for Predicting PneumoniaIRJET- A Survey on Medical Image Interpretation for Predicting Pneumonia
IRJET- A Survey on Medical Image Interpretation for Predicting Pneumonia
 
Poster Segmentation Chain
Poster Segmentation ChainPoster Segmentation Chain
Poster Segmentation Chain
 
A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...
A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...
A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...
 
Gi3511181122
Gi3511181122Gi3511181122
Gi3511181122
 
IRJET- Exploring Image Super Resolution Techniques
IRJET- Exploring Image Super Resolution TechniquesIRJET- Exploring Image Super Resolution Techniques
IRJET- Exploring Image Super Resolution Techniques
 
An Investigation towards Effectiveness in Image Enhancement Process in MPSoC
An Investigation towards Effectiveness in Image Enhancement Process in MPSoC An Investigation towards Effectiveness in Image Enhancement Process in MPSoC
An Investigation towards Effectiveness in Image Enhancement Process in MPSoC
 
Thesis on Image compression by Manish Myst
Thesis on Image compression by Manish MystThesis on Image compression by Manish Myst
Thesis on Image compression by Manish Myst
 

Kürzlich hochgeladen

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
Ribeiro visfaces07

  • 1. Memory-Aware and Efficient Ray-Casting Algorithm

Saulo Ribeiro1, André Maximo1, Cristiana Bentes2, Antônio Oliveira1 and Ricardo Farias1

1 COPPE - Systems Engineering Program, Federal University of Rio de Janeiro - Brazil, Cidade Universitária - CT - Bloco H - 21941-972, {saulo, andre, oliveira, rfarias}@lcg.ufrj.br
2 Department of Systems Engineering, State University of Rio de Janeiro - Brazil, Rua São Francisco Xavier, 524 - 20550-900, cris@eng.uerj.br

Abstract

Ray-casting implementations require that the connectivity between the cells of the dataset be explicitly computed and kept in memory. This constitutes a huge obstacle for obtaining real-time rendering for very large models. In this paper, we address this problem by introducing a new implementation of the ray-casting algorithm for irregular datasets. Our implementation optimizes the memory usage of past implementations by exploiting ray coherence. The idea is to keep in main memory the information of the faces traversed by the ray cast through every pixel under the projection of a visible face. Our results show that exploiting pixel coherence considerably reduces the memory usage, while keeping the performance of our algorithm competitive with the fastest previous ones.

1. Introduction

Direct volume rendering has become a popular technique for visualizing volumetric data from sources such as scientific simulations, analytic functions, and medical scanners, such as MRI and CT, among others. The main advantage of direct volume rendering is that it allows the investigation of the interior of the data volume, exhibiting its inner structures.

Several algorithms and methods have been proposed for direct volume rendering; the most popular one is the ray-casting algorithm. In this algorithm, a ray is cast through each pixel of the image from the viewpoint. The trace of the ray determines which cells of the volume each ray intersects. Every pair of intersections is used to compute the cell contribution for the pixel color and opacity. The ray stops when it reaches full opacity or when it leaves the volume.

Compared to other direct volume rendering methods, the great advantages of ray-casting methods are: the computation for each pixel is independent of all other pixels; and the traveling of a ray throughout the mesh is guided by the connectivity of the cells of the mesh, avoiding the need to sort the cells. The disadvantage is that cell connectivity has to be explicitly computed and kept in memory. In other words, the amount of memory used by ray-casting methods is a huge obstacle for handling very large models.

Recent efforts towards improving ray-casting performance have focused on reducing the execution time by exploiting graphics hardware, or GPUs (Graphics Processing Units). This approach, however, only provides interactive rendering time when the whole dataset fits into the GPU memory. Once the data size exceeds the onboard memory, expensive transfers from main memory to the GPU have great impact on the rendering speed.

Therefore, the problem of rendering large datasets (that do not fit in GPU memory) must be addressed by software solutions, due to the inherently larger memory capacity [6, 11, 7, 12]. Only a few software solutions, however, deal with irregular meshes. Garrity [5] proposed the first method for ray-casting irregular meshes using the connectivity of cells. Bunyk et al. [2] later improved Garrity's work, providing a faster algorithm. Pina et al. [13] improved Bunyk's approach in: memory consumption; completely handling degenerate cases; and dealing with both tetrahedral and/or hexahedral meshes. By using new data structures, Pina et al. significantly reduced the memory requirements of the Bunyk approach, but the algorithms proposed were still slower than Bunyk.

In this work, we propose a novel ray-casting algorithm based on the approaches of Pina et al., reducing the memory consumption even more and improving the execution time. Our approach is based on the fact that the decision of how to store the face information is the key for memory consumption and execution time. The more information is kept in memory, the faster the algorithm computes the next intersection of the ray. In this work, we tackle this tradeoff by exploiting ray coherence in order to maintain in memory only the face information of the most recent traversals.
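The per-ray loop described in the introduction (accumulate each cell's contribution, stop at full opacity or when the ray leaves the volume) can be illustrated with a minimal C++ sketch. It is not the authors' code: it assumes a plain front-to-back "over" compositing rule rather than the optical model used in the paper, and the names `Segment`, `Pixel`, `compositeRay` and the 0.999 opacity threshold are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical contribution of one cell crossed by the ray, already
// reduced to a single gray level and an opacity in [0, 1].
struct Segment { double color; double alpha; };

struct Pixel { double color = 0.0; double alpha = 0.0; };

// Front-to-back compositing of the segments met by one ray.
// The loop ends when the segment list is exhausted (the ray left the
// volume) or when the accumulated opacity reaches `opaque`.
// `visited`, if given, receives the number of segments actually used.
Pixel compositeRay(const std::vector<Segment>& segments,
                   double opaque = 0.999,          // assumed threshold
                   std::size_t* visited = nullptr) {
    Pixel p;
    std::size_t n = 0;
    for (const Segment& s : segments) {
        if (p.alpha >= opaque) break;              // early ray termination
        const double w = (1.0 - p.alpha) * s.alpha; // remaining transparency
        p.color += w * s.color;
        p.alpha += w;
        ++n;
    }
    if (visited) *visited = n;
    return p;
}
```

Because each pixel only reads its own segment list, calls to `compositeRay` for different pixels are independent, which is the first advantage of ray-casting listed above.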
  • 2. Our idea is to use each visible face computed in the preprocessing step to guide the creation and destruction of internal face data in memory. Our algorithm, called Visible Faces Ray-casting (VF-Ray), obtained consistent and significant gains in memory usage over previous approaches. We also reduced the execution time of the approaches of Pina et al. by making better use of the cache. When compared to the fast and memory-expensive approach by Bunyk et al., we obtained comparable performance, spending only from 1/3 to 1/6 of the memory.

The remainder of this paper is organized as follows. In the next section we discuss direct volume rendering of irregular meshes. Section 3 briefly describes Bunyk's and Pina's approaches. Section 4 describes our ray-casting algorithm and the improvements we made over previous approaches. In section 5, we present the results of our most important experiments. Finally, in section 7, we present our conclusions and proposals for future work.

2. Related Work

Within the research area of accelerating ray-casting methods, two main research streams have emerged: software approaches and graphics hardware approaches.

The very first implementation of the ray-casting algorithm for rendering irregular meshes was proposed in software by Garrity in [5]. His approach used the connectivity of cells to compute the entry and exit of each ray. This work was further improved by Bunyk et al. [2], by computing for each pixel a list of intersections on visible faces, and easily determining the correct order of the entry points for the ray. The rendering process follows Garrity's method, but when a ray exits the mesh, the algorithm can easily determine in which cell the ray will re-enter the mesh. This approach is simpler and more efficient than the one Garrity proposed; however, it keeps some large auxiliary data structures. The recent work by Pina et al. [13] proposes two new ray-casting algorithms, ME-Raycast and EME-Raycast, that presented consistent and significant gains in memory usage and in the correctness of the final image over Bunyk's approach. Their algorithms, however, are slower than Bunyk.

In terms of the acceleration techniques used in this work, ray coherence has been used earlier for accelerating ray tracing of traditional surface models [14, 15], and later used in ray-casting for skipping over empty spaces [8].

With the advent of programmable graphics hardware, several volume rendering algorithms have been developed taking advantage of this feature to achieve high performance. Graphics hardware based solutions usually provide real-time performance and high quality images. Weiler et al. [16] present a hardware-based ray-casting algorithm based on the work of Garrity. They find the initial ray entry point by rendering front faces, and then traverse through cells using the fragment program by storing the cells and connectivity graph in textures. Their method, however, works only on convex unstructured data. Bernardon et al. [1] improve Weiler's approach by using Bunyk's algorithm as the basis for the hardware implementation and depth peeling to correctly render non-convex irregular meshes. Espinha and Celes [3] propose an improvement on Weiler's work. They use partial pre-integration and provide interactive modifications of the transfer function. They also use an alternative data structure, but as their work is based on Weiler's work, the memory consumption is still high. Recently, Weiler et al. [17] proposed an improvement over their previous work, where they deal with the problem of storing the whole dataset in GPU memory by using a compressed form, tetrahedral strips.

In terms of direct volume rendering algorithms other than ray-casting, it is important to mention the cell projection approach. In this approach each polyhedral cell of the volumetric data is projected onto the screen, avoiding the need to maintain the data connectivity in memory, but requiring the cells to be first sorted in visibility ordering and then composed to generate their color and opacity in the final image. In this scope, there are some software implementations, e.g., [10] and [4], that provide flexibility and easy parallelization. The GPU implementations of projection algorithms, however, are more popular [16, 9].

3. Previous Approaches

In this section we briefly describe the ray-casting algorithm proposed by Bunyk et al. in [2], which we simply call Bunyk, and the ones proposed by Pina et al. in [13], called ME-Ray (Memory Efficient Ray-cast) and EME-Ray (Enhanced Memory Efficient Ray-cast).

The data structures used by these algorithms to store the face data guide their memory usage and performance. The face data are stored in a structure that we call face and correspond to the information about the geometry and the parameters of the face. The geometry represents the coordinates of the points that define the face. The parameters are costly to compute and represent the constants of the equation of the plane defined by the face, which are used to compute the intersection between the ray and the face; the parameters are also used to interpolate the scalar value inside the face. The whole face data is the most memory-consuming data structure in ray-casting.

3.1. Bunyk

In a preprocessing step, all points and tetrahedra of the input dataset are read, and a list of all vertices and a list of cells are created. In addition, for each input tetrahedron, all four
  • 3. of its faces are kept in a list. Bunyk also maintains, for each vertex, a list of all faces that use that vertex, called the referredBy list. After reading the dataset, Bunyk determines which faces belong to the visible side of the scene boundary, the visible faces. The algorithm projects these visible faces on the screen, creating for each pixel a list of the intersections, which will be used as the entry points of its ray. After the entry points are stored, the actual ray-casting begins. For every pixel, the first intersection is its entry point into the data, and each next intersection is obtained by inspecting the intersection between the ray and all other faces of the current cell. Each pair of intersections is used to compute the contribution of the cell to the color and opacity of the pixel. The search for the next intersection in Bunyk is quite fast, since it is performed among the other three faces of the tetrahedron. However, the list of all faces and the referredBy lists result in a very high memory consumption.

3.2. ME-Ray

ME-Ray starts with a preprocessing step, creating a list of all vertices, a list of all cells, for each cell the list of cells that share a face with it, and the Use Set list for each vertex. The Use Set lists substitute the referredBy lists used in Bunyk; each Use Set contains a list of all cells incident on the vertex. In the rendering algorithm, ME-Ray creates a list of visible faces, and determines the entry point of each ray, in the same way as Bunyk. The algorithm proceeds looking for the next intersection by scanning through the neighbor cells. ME-Ray creates a list of faces on demand, as each face is intersected by the ray. Faces that are never intersected by any ray will not be created. After each pair of intersections is found, the contribution to the final color and opacity is computed. ME-Ray can save about 40% of the memory used by Bunyk, but is slower, as it has to create the faces for the first time during the rendering.

3.3. EME-Ray

EME-Ray was developed as an optimization of ME-Ray in terms of memory usage. They have similar behavior. However, EME-Ray does not store the faces in memory, since this was one of the most memory-expensive structures in ME-Ray. Therefore, EME-Ray has to recalculate the face parameters for the computation of the intersections every time a new face is found. With this optimization, EME-Ray uses only about 1/4 of the memory used by Bunyk. The significant reduction in memory usage comes with an increase in the execution time: EME-Ray usually doubles Bunyk's execution time. It is also important to notice that the gains of ME-Ray and EME-Ray over Bunyk are not only in memory usage, but also in the correctness of the final image. Their data structures allow them to deal with all degenerate situations that Bunyk is not able to handle.

4. Visible Face Driven Ray-Casting

Comparing the three ray-casting approaches described in the previous section, we can observe that the more face data are kept in memory, the faster the algorithm computes the next intersection of the ray. For this reason, Bunyk is still the fastest software implementation of ray-casting.

Our algorithm, called Visible Face Driven Ray-Casting (VF-Ray), addresses the problem of maintaining information about the faces in memory, minimally degrading the rendering performance. The basic idea behind VF-Ray is to keep in main memory only the faces of the most recent traversals. To do so, we exploit ray coherence, using the visible face information to guide the creation and destruction of faces in memory.

4.1. Exploring Ray Coherence

VF-Ray is based on ray coherence, which comes from the fact that the sets of faces intersected by two rays cast from neighboring pixels will contain almost the same faces. Our algorithm takes advantage of ray coherence by keeping in memory only the faces of nearby rays.

One important issue in exploiting ray coherence is the determination of the set of rays that are likely to have the same set of intersected faces. We use the visible face information to do so. The set of rays that may reuse the same list of faces are the ones that correspond to the set of pixels under the projection of a certain visible face. This set of pixels will be called here the visible set. Figure 1 shows a 2D example of this idea. The rays that are cast through the visible face, presented in the figure, tend to intersect the same internal faces. Therefore, the internal faces are created and stored once, for the first ray intersection. After that, they are reused until all the pixels of the visible set are computed. This process is repeated for all visible faces. To guarantee correctness, all visible faces are ordered in depth order.

[Figure 1. Visible face rays coherence. Labels: screen, Visible Face Projection, Internal Faces.]
  • 4. Our results show that exploiting pixel coherence considerably reduces the memory usage, while keeping the VF-Ray performance competitive with the fastest previous algorithms because the reuse of face data improves cache performance. In fact, for all datasets we tested, more than half of the faces are created at most two times for medium size images.

4.2. Data structures

The data structures used by VF-Ray are similar to those of EME-Ray: an array of vertices, an array of cells, and the Use Set of each vertex. Besides these structures, VF-Ray also has a list, called computedFaces, that holds the internal faces throughout the model. Note that this list is cleaned up after the ray-casting computation of all pixels of a visible set. In this way, we do not need to store all the internal faces, as done by the Bunyk and ME-Ray algorithms.

Furthermore, just like ME-Ray and EME-Ray, our algorithm can handle any type of convex cell. For tetrahedral meshes, the intersections are computed between the line defined by the ray path and the plane defined by the three vertices of a triangular face. For hexahedral meshes, where each face is quadrangular, the intersection is found by splitting these quadrangular faces into two triangular faces, and the same calculation is performed for each one.

4.3. Handling Degeneracies

The degenerate situations that can be found during the ray traversal are treated in the same way as in Pina et al. [13], by investigating the Use Set of each vertex. So, when a ray hits a vertex or an edge, the algorithm can determine the next intersection correctly.

4.4. Algorithm

The VF-Ray algorithm starts by reading the input dataset and creating three lists: a list of vertices, a list of cells and the Use Set list for each vertex. In the rendering process, VF-Ray creates a list of all visible faces, which are the faces whose normal makes an angle greater than 90° with the viewing direction. For each visible face, the algorithm follows the steps presented in the pseudo-code of Figure 2.

Initially, the visible face, vface_i, is projected onto the screen, generating the visible set of vface_i. For each pixel p in the visible set, the algorithm first computes the intersection between the ray cast from p and vface_i. After this first intersection is found, the ray traversal begins, in which the algorithm computes the intersections of the ray with the internal faces. The next face intersected, face_j, is found on another face of the current cell. If face_j was not created before (this is the first intersection in face_j), the face is created and inserted into the computedFaces list. Otherwise, the face_j data are loaded from the list. The insertion and loading operations in the list are done in constant time, since in the cell we store the index of the position of each face in the list. The computation of the intersections is done using the face parameters computed when the face is created. For each intersection, the scalar value is interpolated between the three vertices of each face, and the interpolated values are used in the illumination integral proposed by Max [10].

    For each visible face vface_i do
        Project vface_i;                      // Determine visible set
        For each pixel p of visible set do
            Intersect vface_i;                // Interpolate scalar value
            Do
                Find next internal face face_j;   // Intersection
                If face_j not in computedFaces then
                    Create face_j and Insert in computedFaces;
                else
                    Load face_j from computedFaces;
                Ray integrate;                // Optical model
            While exist next face face_j;
        End For
        Clear computedFaces;                  // Clear list
    End For

Figure 2. The pseudo-code of VF-Ray.

5. Results

In this section we evaluate the performance and memory usage of VF-Ray when compared to ME-Ray, EME-Ray and Bunyk. We also show a comparison between VF-Ray and a cell projection algorithm, called ZSweep [4]. This last experiment was done in order to put our results in perspective with respect to another rendering paradigm. The main idea of the ZSweep algorithm is the sweeping of the data with a plane parallel to the viewing plane XY, towards the positive z direction. The sweeping process is performed by ordering the vertices by increasing z coordinate, and then retrieving them one by one from this data structure. For each vertex swept by the plane, the algorithm projects onto the screen all faces that are incident to it. During face projection, ZSweep stores a list of intersections for each pixel and uses delayed compositing. The intersections in the pixel lists are composed and the lists are flushed each time the plane reaches the target-z. The target-z is the maximum z-coordinate among the vertices adjacent to the vertex intersected by the sweeping plane.
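Returning to the traversal loop of Figure 2, the computedFaces bookkeeping can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Vec3`, `Face`, `makeFace`, `intersect` and `FaceCache` are hypothetical names, the face "parameters" are reduced to the plane equation n · p = d of a triangular face, and the per-cell face index mentioned in Section 4.4 is approximated by one slot per global face id.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Plane parameters of a triangular face: all points p with n . p = d.
struct Face { Vec3 n; double d; };

Face makeFace(Vec3 a, Vec3 b, Vec3 c) {
    const Vec3 n = cross(sub(b, a), sub(c, a));
    return {n, dot(n, a)};
}

// Solves origin + t * dir on the face plane; false if the ray is parallel.
bool intersect(const Face& f, Vec3 origin, Vec3 dir, double& t) {
    const double denom = dot(f.n, dir);
    if (std::fabs(denom) < 1e-12) return false;
    t = (f.d - dot(f.n, origin)) / denom;
    return true;
}

// Faces are created on first use and dropped after each visible set.
class FaceCache {
public:
    explicit FaceCache(std::size_t numFaces) : slot_(numFaces, -1) {}

    // Constant-time lookup through the stored slot index; the face
    // parameters are computed only if the slot is still empty.
    Face get(std::size_t faceId, Vec3 a, Vec3 b, Vec3 c) {
        if (slot_[faceId] < 0) {
            slot_[faceId] = static_cast<int>(faces_.size());
            faces_.push_back(makeFace(a, b, c));
            used_.push_back(faceId);
            ++created_;
        }
        return faces_[static_cast<std::size_t>(slot_[faceId])];
    }

    // "Clear computedFaces": resets only the slots that were touched.
    void clear() {
        for (std::size_t id : used_) slot_[id] = -1;
        faces_.clear();
        used_.clear();
    }

    std::size_t size() const { return faces_.size(); }
    std::size_t created() const { return created_; }  // total creations so far

private:
    std::vector<int> slot_;           // face id -> position in faces_, or -1
    std::vector<Face> faces_;         // the computedFaces list
    std::vector<std::size_t> used_;   // ids to reset on clear()
    std::size_t created_ = 0;
};
```

In this sketch a renderer would call `get` for every face a ray crosses inside one visible set and `clear` before moving to the next visible face, so a face shared by two visible sets is built twice; the `created` counter makes that recomputation cost measurable.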
  • 5. 5.1. Testing Configuration

The VF-Ray algorithm was written in ANSI C++ without using any particular graphics libraries. Our experiments were conducted on an Intel Pentium IV 3.6 GHz with 2 GB RAM and 1 MB of L2 cache, running Linux Fedora Core 5.

We have used four different tetrahedral datasets: Blunt Fin, SPX, Liquid Oxygen Post, and Delta Wing. Table 1 shows the number of vertices, faces, boundary faces and cells for each dataset. We varied the image sizes from 512 × 512 to 8192 × 8192 pixels. The models rendered with VF-Ray are shown in Figures 3 to 6. The images generated by VF-Ray, ME-Ray, and EME-Ray are identical, and better than the images generated by Bunyk. There are two reasons for this difference: VF-Ray, ME-Ray, and EME-Ray use double precision variables to compute ray-face intersections, while Bunyk uses float precision, causing artifacts in the images; and Bunyk does not correctly handle all degenerate cases, generating some incorrect pixels.

    Dataset   # Verts   # Faces   # Tets   # Boundary
    Blunt     41 K      381 K     187 K    13 K
    SPX       149 K     1.6 M     827 K    44 K
    Post      109 K     1 M       513 K    27 K
    Delta     211 K     2 M       1 M      41 K

Table 1. Dataset sizes.

5.2. Memory Consumption

Table 2 shows the memory used by VF-Ray to render one frame (in MBytes) for Blunt, SPX, Post and Delta, and the relative memory usage results of ME-Ray, EME-Ray, Bunyk and ZSweep when compared to VF-Ray usage. The percentages presented in this table correspond to the ratio of the other algorithms' memory usage over VF-Ray memory consumption. In other words, we consider VF-Ray results as 100% and present how much the other algorithms increase or decrease this result.

As we can observe in Table 2, in most of the cases, VF-Ray uses less memory than the other algorithms, and its memory usage increases linearly with the resolution. When compared to Bunyk, VF-Ray uses from 1/3 to 1/6 of the memory spent by Bunyk. These gains are impressive. For Blunt, for example, VF-Ray renders a 4096 × 4096 image using the same amount of memory that Bunyk uses to render a 512 × 512 image. For the largest dataset, Delta, Bunyk uses more memory to render a 512 × 512 image than VF-Ray uses to render an 8192 × 8192 image.

When compared with the two memory-aware ray-casting algorithms, ME-Ray and EME-Ray, we can observe that ME-Ray uses from 1.5 to 6 times more memory than VF-Ray. For EME-Ray, we notice an interesting result. Although EME-Ray does not store the faces in memory, for high resolution images, bigger than 1024 × 1024, EME-Ray uses more memory than VF-Ray. This result can be explained by the fact that EME-Ray, just like Bunyk and ME-Ray, maintains for each pixel a list containing the entry faces for the pixel. The number of lists increases with the increase in image resolution. VF-Ray, on the other hand, does not require this kind of list, since it computes the entry point of each ray on demand, for each visible face.

When compared to ZSweep, we can observe that except for SPX with a 512 × 512 image, VF-Ray uses much less memory than ZSweep. For the higher resolution images, ZSweep can use 8 or 9 times more memory than VF-Ray. To render SPX with an 8192 × 8192 image, for example, ZSweep spends about 2 GB of memory. The ZSweep memory consumption is directly proportional to the final image resolution, since it creates pixel lists to store the intersections found for each pixel.

5.3. Rendering Performance

Table 3 shows the rendering time (in seconds) of VF-Ray for Blunt, SPX, Post and Delta, and the relative timing results of ME-Ray, EME-Ray, Bunyk and ZSweep when compared to VF-Ray execution. Just like the previous table, the percentages presented in this table correspond to the ratio of the other algorithms' execution time over VF-Ray execution time. The time results correspond to the rendering of one frame and do not include preprocessing time.

As we can observe in Table 3, VF-Ray outperforms ME-Ray, EME-Ray and ZSweep for all datasets and all image resolutions. For Blunt, Post and Delta, VF-Ray is about 3 times faster than EME-Ray and ZSweep, and uses about 3/4 of the time used by ME-Ray to render the image. For SPX, the gains of VF-Ray against EME-Ray and ZSweep are smaller, but still high. When compared to Bunyk, for Blunt and Delta, VF-Ray presents almost the same execution time, since the difference is only about 10%. For SPX and Post, VF-Ray is only about 22% slower than Bunyk.

Observing the increase in the rendering time of VF-Ray for different image resolutions, we notice that as the image resolution increases, the rendering time increases in the same proportion. The gains over the other algorithms are almost the same for different resolutions.

6. Discussion

VF-Ray obtained consistent and significant gains in memory usage over Bunyk, providing almost the same performance (the difference in their execution time is only about 16%). As the image resolution increases, our gains
in memory usage over Bunyk are even more pronounced, but the difference in rendering time remains the same. For 8192 × 8192 images, Bunyk uses about 6 times more memory than VF-Ray, with a performance difference of only about 18%. In our experiments, however, we only compare executions where the whole dataset fits in main memory for all algorithms. Our experimental platform has 2 GB of main memory, which can store all the data structures used by our workload. As memory usage increases, the rendering will need to use the virtual memory mechanisms of the operating system, which would greatly influence the overall execution time.

Dataset  Resolution    VF-Ray memory (MB)  ME-Ray  EME-Ray  Bunyk  ZSweep
Blunt    512 × 512                     14    404%     166%   528%    208%
         1024 × 1024                   30    240%     129%   297%    177%
         2048 × 2048                   39    333%     251%   377%    369%
         4096 × 4096                   75    495%     451%   517%    689%
         8192 × 8192                  219    610%     655%   617%    917%
SPX      512 × 512                     96    215%      74%   299%     87%
         1024 × 1024                   99    234%      88%   305%    107%
         2048 × 2048                  108    271%     141%   337%    188%
         4096 × 4096                  144    383%     284%   428%    407%
         8192 × 8192                  288    533%     498%   569%    711%
Post     512 × 512                     63    165%      74%   290%     97%
         1024 × 1024                   66    203%      95%   301%    129%
         2048 × 2048                   74    281%     172%   353%    238%
         4096 × 4096                  110    424%     349%   470%    499%
         8192 × 8192                  256    601%     560%   603%    800%
Delta    512 × 512                    116    167%      74%   303%     97%
         1024 × 1024                  119    200%      84%   308%    116%
         2048 × 2048                  129    246%     124%   330%    177%
         4096 × 4096                  165    346%     247%   403%    375%
         8192 × 8192                  307    500%     467%   534%    704%

Table 2. Memory usage results for VF-Ray compared to ME-Ray, EME-Ray, Bunyk and ZSweep.

The small increase in execution time, when compared to Bunyk, comes from the fact that Bunyk computes all internal faces in the preprocessing step, before the ray-casting loop, while VF-Ray does it on demand, when a face is intersected for the first time. In addition, in VF-Ray, if a face is intersected by rays of two different visible faces, its data is computed twice.

The comparison of VF-Ray with ME-Ray and EME-Ray provided interesting results. ME-Ray and EME-Ray are memory-aware algorithms designed to reduce the large memory usage of Bunyk. VF-Ray was designed to reduce the memory usage of ME-Ray to a level close to that of EME-Ray, while running much faster than EME-Ray. Our results showed, however, that for high resolution images, our algorithm is not only faster than ME-Ray but also uses less memory than EME-Ray. The gains of VF-Ray over EME-Ray in memory usage come from the elimination of the lists of entry faces for each pixel. The gains of VF-Ray over ME-Ray in execution time come from the good cache performance of VF-Ray. We ran some experiments to evaluate the cache misses of VF-Ray and ME-Ray and obtained a very low cache miss ratio for VF-Ray, near 0%, while the cache miss ratio of ME-Ray is about 1.9% for modest size images. This result was obtained because our algorithm reuses the faces for the rays in the same visible face. Given that rays are shot one after the other, the closer the neighboring rays are to each other, the higher the probability that they reuse the cached data.

The comparison of VF-Ray with the cell projection approach, implemented by ZSweep, showed that VF-Ray is faster and uses less memory than ZSweep. The gains are large: for the Delta dataset, ZSweep uses 7 times more memory and runs about 3.5 times slower than VF-Ray. The pixel list maintained by the ZSweep approach grows as long as the target-Z has not been reached. As this list grows, insertion becomes more expensive, since the pixel list must be kept ordered. On the other hand, the data structures of the VF-Ray algorithm do not depend on ordering. The difference in memory usage of ZSweep is due to the growth of these pixel lists. For ZSweep, as the image size increases, each projected face will insert intersection units into more pixel lists. If the target-Z is not appropriate, the pixel lists will grow considerably.

In terms of the dataset characteristics, the performance of VF-Ray increases as the number of pixels in each visible face increases. Datasets with a small number of faces have a greater number of pixels per visible face. Therefore, datasets like Blunt, which have a small number of faces, tend
to present better performance results for VF-Ray.

Finally, it is important to keep in mind that the reductions in memory usage will allow the implementation of the algorithm in graphics hardware. Furthermore, these reductions will also allow the use of double precision in the parameters used to calculate the intersection between a ray and a face, raising the chance of obtaining more precise images.

Dataset  Resolution    VF-Ray time (sec)  ME-Ray  EME-Ray  Bunyk  ZSweep
Blunt    512 × 512                   1.9    139%     296%   105%    253%
         1024 × 1024                 7.0    143%     317%    94%    431%
         2048 × 2048                27.2    146%     337%    88%    481%
         4096 × 4096               107.4    147%     328%    88%    516%
         8192 × 8192               426.7    154%     341%    91%    382%
SPX      512 × 512                   4.1    111%     161%    76%    287%
         1024 × 1024                13.0    103%     203%    81%    202%
         2048 × 2048                46.6    103%     225%    83%    219%
         4096 × 4096               177.1    105%     234%    82%    226%
         8192 × 8192               696.3    105%     238%    81%    260%
Post     512 × 512                   5.0    138%     251%    80%    201%
         1024 × 1024                19.5    136%     254%    76%    167%
         2048 × 2048                78.6    133%     258%    74%    194%
         4096 × 4096               309.9    135%     257%    74%    227%
         8192 × 8192              1246.3    135%     263%    72%    246%
Delta    512 × 512                   3.1    130%     242%    91%    465%
         1024 × 1024                11.2    122%     270%    88%    271%
         2048 × 2048                41.9    121%     280%    87%    312%
         4096 × 4096               162.7    120%     290%    84%    341%
         8192 × 8192               640.3    123%     290%    83%    369%

Table 3. Time results for VF-Ray compared to ME-Ray, EME-Ray, Bunyk and ZSweep.

Figure 3. Blunt image generated by VF-Ray.

Figure 4. SPX image generated by VF-Ray.

7. Conclusions

In this work, we have proposed a novel memory-aware and efficient software ray-casting algorithm for volume rendering, called VF-Ray. The algorithm is suitable for tetrahedral or hexahedral datasets, including non-convex and disconnected meshes, even with holes. Our idea was to exploit ray coherence in order to reduce the memory required to store face data while achieving good cache performance. We compared VF-Ray memory usage and performance with previous ray-casting approaches and with a cell projection algorithm. Compared to the fastest but memory-expensive approach, by Bunyk, VF-Ray obtained comparable performance while spending only 1/3 to 1/6 of the memory. Compared to the memory-aware algorithms of Pina et al., VF-Ray obtained consistent and significant gains in memory usage and in execution time. Compared to the cell projection approach, VF-Ray also significantly reduced the memory consumption and the execution time.

The reductions in memory usage make VF-Ray suitable for using double precision in the parameters used to calculate the intersection between a ray and a face. Using double precision avoids the artifacts that appear when float precision is used. We conclude that VF-Ray is an efficient ray-casting algorithm that allows the in-core rendering of big datasets. As future work, we intend to optimize VF-Ray by keeping face data for neighboring visible faces, and to parallelize VF-Ray in order to run on a cluster of PCs.

Figure 5. Post image generated by VF-Ray.

Figure 6. Delta image generated by VF-Ray.

8. Acknowledgments

We acknowledge the grants provided to the first and second authors by the Brazilian agencies CAPES and CNPq.

References

[1] F. Bernardon, C. Pagot, J. Comba, and C. Silva. GPU-based tiled ray casting using depth peeling. Journal of Graphics Tools, to appear.
[2] P. Bunyk, A. Kaufman, and C. Silva. Simple, fast, and robust ray casting of irregular grids. Advances in Volume Visualization, ACM SIGGRAPH, 1(24), July 1998.
[3] R. Espinha and W. Celes. High-quality hardware-based ray-casting volume rendering using partial pre-integration. In SIBGRAPI ’05: Proc. of the XVIII Brazilian Symposium on Computer Graphics and Image Processing, October 2005.
[4] R. Farias, J. Mitchell, and C. Silva. ZSweep: An efficient and exact projection algorithm for unstructured volume rendering. In 2000 Volume Visualization Symposium, pages 91–99, October 2000.
[5] M. P. Garrity. Raytracing irregular volume data. In VVS ’90: Proceedings of the 1990 workshop on Volume visualization, pages 35–40. ACM Press, 1990.
[6] L. Hong and A. Kaufman. Accelerated ray-casting for curvilinear volumes. In VIS ’98: Proceedings of the conference on Visualization ’98, pages 247–253, 1998.
[7] K. Koyamada. Fast ray-casting for irregular volumes. In ISHPC ’00: Proc. of the Third International Symposium on High Performance Computing, pages 557–572, 2000.
[8] S. Lakare and A. Kaufman. Light weight space leaping using ray coherence. In Proc. IEEE Visualization 2004, pages 19–26, 2004.
[9] R. Marroquim, A. Maximo, R. Farias, and C. Esperanca. GPU-based cell projection for interactive volume rendering. In SIBGRAPI ’06: Proc. of the Brazilian Symposium on Computer Graphics and Image Processing, October 2006.
[10] N. Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995.
[11] M. Meißner, J. Huang, D. Bartz, K. Mueller, and R. Crawfis. A practical evaluation of popular volume rendering algorithms. In VVS ’00: Proceedings of the 2000 IEEE symposium on Volume visualization, pages 81–90, 2000.
[12] A. Neubauer, L. Mroz, H. Hauser, and R. Wegenkittl. Cell-based first-hit ray casting. In VISSYM ’02: Proceedings of the symposium on Data Visualisation 2002, pages 77–ff, 2002.
[13] A. Pina, C. Bentes, and R. Farias. Memory efficient and robust software implementation of the raycast algorithm. In WSCG’07: The 15th Int. Conf. in Central Europe on Computer Graphics, Visualization and Computer Vision, 2007.
[14] A. Reshetov, A. Soupikov, and J. Hurley. Multi-level ray tracing algorithm. In SIGGRAPH ’05: ACM SIGGRAPH 2005 Papers, pages 1176–1185, 2005.
[15] I. Wald, P. Slusallek, C. Benthin, and M. Wagner. Interactive rendering with coherent ray tracing. In Eurographics 01 Proceedings, pages 153–164, 2001.
[16] M. Weiler, M. Kraus, M. Merz, and T. Ertl. Hardware-based ray casting for tetrahedral meshes. In Proceedings of the 14th IEEE conference on Visualization ’03, pages 333–340, 2003.
[17] M. Weiler, P. Mallón, M. Kraus, and T. Ertl. Texture-encoded tetrahedral strips. In 2004 IEEE Symposium on Volume Visualization and Graphics (VolVis 2004), pages 71–78, October 2004.
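As a supplementary worked check of the memory comparison with Bunyk reported in Table 2, the following Python sketch converts the relative figures into absolute memory. It is only an illustration, not part of the original experiments. It assumes that each percentage in Table 2 expresses the other algorithm's memory relative to VF-Ray's (VF-Ray = 100%); the dictionary and the helper names `vf_ray_mb` and `bunyk_mb` are hypothetical and were introduced here.

```python
# Values copied from Table 2: (dataset, image side) -> (VF-Ray MB, Bunyk %).
# Assumption: the percentage is Bunyk's memory relative to VF-Ray's (VF-Ray = 100%).
TABLE2_BUNYK = {
    ("Blunt", 512): (14, 528),
    ("Blunt", 4096): (75, 517),
    ("Delta", 512): (116, 303),
    ("Delta", 8192): (307, 534),
}


def vf_ray_mb(dataset, side):
    """VF-Ray memory in MB, read directly from the table."""
    return TABLE2_BUNYK[(dataset, side)][0]


def bunyk_mb(dataset, side):
    """Bunyk memory in MB, derived from the relative percentage."""
    vf_mb, bunyk_pct = TABLE2_BUNYK[(dataset, side)]
    return vf_mb * bunyk_pct / 100.0


if __name__ == "__main__":
    # Blunt: Bunyk at 512 x 512 versus VF-Ray at 4096 x 4096.
    print(f"Blunt  Bunyk 512: {bunyk_mb('Blunt', 512):.1f} MB, "
          f"VF-Ray 4096: {vf_ray_mb('Blunt', 4096)} MB")
    # Delta: Bunyk at 512 x 512 versus VF-Ray at 8192 x 8192.
    print(f"Delta  Bunyk 512: {bunyk_mb('Delta', 512):.1f} MB, "
          f"VF-Ray 8192: {vf_ray_mb('Delta', 8192)} MB")
```

Under this reading of the table, Bunyk needs about 74 MB for a 512 × 512 image of Blunt, which matches the 75 MB VF-Ray uses at 4096 × 4096, and about 351 MB for a 512 × 512 image of Delta, which exceeds the 307 MB VF-Ray uses at 8192 × 8192.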