PhD defense talk (portfolio of my expertise)
1. GPU Data Structures
for Graphics and Vision
Promotionskolloquium, May 6th 2011
Gernot Ziegler
Dept. of Computer Graphics
(3D Video and Vision-Based Graphics Group)
2. Outline
Graphics Hardware:
Original Purpose and Recent Development
Classical Usage in Visual Computing
Free Viewpoint Video Compression
Color and Depth Reprojection
Hierarchical Image Processing
General Data Processing
Data Compaction with the HistoPyramid
Quadtree and Octree Generation
Data Expansion with the HistoPyramid
Conclusion
3. Graphics Hardware: Original Purpose
Graphics hardware accelerates typical data operations of
computer graphics (pixel moves, triangle rasterization)
GPU is simpler in design than CPU, but massively parallel.
5. Graphics Hardware: Capabilities
Data can now be anything (floating point & integer)
General Purpose Computing on GPU = GPGPU
"Classical Usage" in Visual Computing (still graphics-related)
– Computer Vision
– Video processing
– Volume analysis
General Data Processing
– PDE / ODE solver
– Spatial Data Structure Generation
– Database Ops
– Etc…
Game of Life
(Early GPGPU by S. Green)
7. Free Viewpoint Video Compression
(Chapter 3)
Map video footage into texture domain via proxy 3D model
8. Free Viewpoint Video Compression
(Chapter 3)
Obtain texture surface masking via shadow mapping
9. Free Viewpoint Video Compression: Publications
G. Ziegler, H. Lensch, N. Ahmed, M. Magnor, H.-P. Seidel.
Multi-Video Compression in Texture Space.
11th IEEE Intl Conference on Image Processing (ICIP 2004),
Singapore, pp. 2467-2470, 2004.
G. Ziegler, H. Lensch, M. Magnor, H.-P. Seidel. Multi-Video
Compression in Texture Space using 4D SPIHT.
6th IEEE Workshop on Multimedia Signal Processing, Siena,
Italy, pp. 39-42, 2004.
10. Color and Depth Reprojection
(Chapter 4)
Depth-map "Projection" via proxy mesh & vertex shader
Novel View reconstruction
from partial depth camera views
11. Color and Depth Reprojection
(Chapter 4)
Blending by View Angle
Our per-pixel approach (Purple: Blended areas)
18. Graphics Hardware: Limitations
GPU is connected to CPU via a narrow data bus
(bandwidth bottleneck, approx. 4 GB/s)
GPU is a stream processor:
– A workload of about 10K threads is needed to keep hundreds of cores busy
(data parallelization!)
– Thread switching lightweight, but synchronization expensive!
– Each thread can only write at a fixed position
Algorithms must be redesigned for GPU!
20. Data-Parallel Algorithm Challenges
Example from Computer Vision: List of all black pixels in an image
Step 1: Detect black pixels:
Step 2: Create a list of detected pixels
21. Previous approach to feature list generation
Step 2 (List generation) was not possible on GPU!
2a: GPU marks local features (e.g. thresholding, filtering)
2b: CPU searches image and generates feature list
But: Bus transfers are expensive, so the
GPU is useful only for complex feature isolation
(e.g. large filter convolution & thresholding).
22. Our approach: Feature list generation on GPU
We generate feature lists on the GPU using data compaction.
Pixel/voxel/feature input is abstracted as a "data element stream".
Compaction keeps only elements deemed relevant for output.
1D example (keep all elements that are blue):
Data flow:
Massive speedup due to strongly reduced bus dataflow!
(Figure: input stream A B C D E F with per-element keep flags; compacted output A B D E)
23. Data Compaction: Problem task in 1D
Keep a subset of the input elements, selected by a classifier:
Implementation is trivial on CPU, single-thread.
On GPU: Need to parallelize into 10k threads!
First count number of output elements
using data-parallel reduction!
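A minimal CPU sketch of the two ideas on this slide, not the talk's GPU code: the trivial single-threaded compaction, and counting the output elements with a pairwise tree reduction. The function names and the classifier are illustrative assumptions; the example stream A..F with output A B D E follows the slide's 1D example.

```python
def compact_sequential(elements, keep):
    """Single-threaded compaction: append every kept element in input order."""
    return [e for e in elements if keep(e)]


def count_by_reduction(flags):
    """Count set flags with a pairwise (2:1) tree reduction.

    All sums inside one pass are independent of each other, so on a GPU
    each of them could be computed by its own thread.
    """
    level = list(flags)
    while len(level) > 1:
        if len(level) % 2:          # pad odd-length levels with a zero
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


elements = ["A", "B", "C", "D", "E", "F"]
keep = lambda e: e in {"A", "B", "D", "E"}   # hypothetical classifier
flags = [1 if keep(e) else 0 for e in elements]

print(compact_sequential(elements, keep))    # kept elements, in order
print(count_by_reduction(flags))             # number of output elements
```

The count is what lets the output buffer be allocated at its exact size before any element is written.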
24. Data Compaction via HistoPyramid: Buildup
First, count number of output elements,
e.g. 4:1 data-parallel reduction
(Note: the reduction pyramid is retained; this is the HistoPyramid)
Can now allocate compact output, no spill.
But how are output elements generated?
Histogram pyramid /
HistoPyramid
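The buildup can be sketched on the CPU as follows. This is an illustration under assumptions, not the original shader code: it uses a 2:1 reduction for the 1D case (the slide mentions 4:1 as one option), pads the base level to a power of two, and the name `build_histopyramid` is hypothetical.

```python
def build_histopyramid(flags):
    """Return all reduction levels, from the base (flags) up to the top.

    Every level is kept, not just the final sum: the retained levels are
    what the later traversal reads.
    """
    size = 1
    while size < len(flags):
        size *= 2
    base = list(flags) + [0] * (size - len(flags))   # pad to a power of two
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


levels = build_histopyramid([1, 1, 0, 1, 1, 0])
for level in levels:
    print(level)
total_outputs = levels[-1][0]   # top element = number of output elements
```

The single top element gives the exact output size, so the compact output can be allocated without spill.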
25. Data Compaction via HistoPyramid: Traversal
Output generation: Start one thread per output element
Each output thread traverses reduction pyramid (read-only)
No read/write hazards = Data-parallel output writing!
As many threads as output elements
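A CPU sketch of the traversal, assuming a 2:1 reduction pyramid in 1D and hypothetical function names. Each call to `traverse` plays the role of one output thread: it only reads the pyramid, descends from the top, and ends at the input position of "its" element.

```python
def build_histopyramid(flags):
    """Reduction levels from base to top (2:1, base padded to a power of two)."""
    size = 1
    while size < len(flags):
        size *= 2
    levels = [list(flags) + [0] * (size - len(flags))]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def traverse(levels, key):
    """Return the base index of the key-th (0-based) kept element."""
    index = 0
    for level in reversed(levels[:-1]):   # from just below the top down to the base
        index *= 2                        # move to the left child
        left = level[index]
        if key >= left:                   # key lies in the right child's range
            key -= left
            index += 1
    return index


def compact(elements, flags):
    levels = build_histopyramid(flags)
    total = levels[-1][0]
    # One independent traversal per output element; no writes to shared state.
    return [elements[traverse(levels, k)] for k in range(total)]


elements = ["A", "B", "C", "D", "E", "F"]
flags = [1, 1, 0, 1, 1, 0]
print(compact(elements, flags))
```

Because every output element computes its own source position, the write location of each thread is fixed in advance, which is exactly what the stream programming model requires.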
26. HistoPyramid: 2D Data Compaction
The 1D case was for illustration; the actual implementation is 2D!
Dataflow diagram:
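A CPU sketch of the 2D case with a 4:1 reduction, assuming a square mask whose side is a power of two. The names and the child visiting order (top-left, top-right, bottom-left, bottom-right) are assumptions made for this example.

```python
def build_histopyramid_2d(mask):
    """Reduction levels from the base mask to the 1x1 top, summing 2x2 blocks."""
    levels = [[row[:] for row in mask]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
             + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]
             for x in range(half)]
            for y in range(half)
        ])
    return levels


def traverse_2d(levels, key):
    """Return (x, y) of the key-th marked pixel for a valid key."""
    x = y = 0
    for level in reversed(levels[:-1]):
        x *= 2
        y *= 2
        for dy, dx in ((0, 0), (0, 1), (1, 0), (1, 1)):
            count = level[y + dy][x + dx]
            if key < count:        # key falls into this child
                x += dx
                y += dy
                break
            key -= count           # skip this child's elements
    return x, y


mask = [
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
levels = build_histopyramid_2d(mask)
points = [traverse_2d(levels, k) for k in range(levels[-1][0][0])]
print(points)
```

Note that the output list follows the pyramid's block order, not scanline order.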
27. GPU Data Compaction: Publications
Data Compaction fast enough for real-time volume analysis
First application: Mesh-to-volume-to-point cloud in real-time!
G. Ziegler, A. Tevs, C. Theobalt, H.-P. Seidel
On-the-fly Point Clouds through Histogram Pyramids
11th International Fall Workshop on Vision, Modeling and
Visualization 2006 (VMV2006), 2006, pp. 137-144.
29. GPU Data Compaction: Publications
Vector Field Contours: View-dependent vector field analysis to
visualize contour lines throughout the volume
Data Compaction delivers seed points for contour lines within milliseconds!
T. Annen, H. Theisel, C. Rössl, G. Ziegler, H.-P. Seidel
Vector Field Contours
Graphics Interface 2008, Windsor/Canada, 2008, pp. 97-105
31. GPU Quadtrees: Introduction
2D Reduction follows a quadtree-like reduction pattern.
By tracking feature similarity in reduction,
quadtrees can be created from the reduction pyramid!
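One way to picture this, as a simplified CPU sketch and not necessarily the published method: during the 4:1 reduction, each parent records the common value of its four children, or `None` if they differ. Uniform nodes whose parent is not uniform are then the quadtree leaves. All names are hypothetical, and pixel values are assumed never to be `None`.

```python
def build_uniformity_pyramid(image):
    """Each node holds its region's shared value, or None for a mixed region."""
    levels = [[row[:] for row in image]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        nxt = []
        for y in range(half):
            row = []
            for x in range(half):
                kids = {prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                        prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1]}
                # A mixed child (None) makes the parent mixed as well.
                row.append(kids.pop() if len(kids) == 1 else None)
            nxt.append(row)
        levels.append(nxt)
    return levels


def quadtree_leaves(levels):
    """Return (x, y, size, value) for every uniform node with a non-uniform parent."""
    leaves = []
    top = len(levels) - 1
    for lvl, grid in enumerate(levels):
        size = 2 ** lvl
        for y, row in enumerate(grid):
            for x, value in enumerate(row):
                if value is None:
                    continue
                parent_uniform = (lvl < top
                                  and levels[lvl + 1][y // 2][x // 2] is not None)
                if not parent_uniform:
                    leaves.append((x * size, y * size, size, value))
    return leaves


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
leaves = quadtree_leaves(build_uniformity_pyramid(image))
print(leaves)
```

The leaves tile the image exactly once: three 2x2 blocks and four single pixels in this example.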
32. GPU QuadTree: Publications
Millisecond-scale speed enables real-time quadtree processing of video!
e.g. for compression, vision, ...
G. Ziegler, R. Dimitrov, C. Theobalt, H-P. Seidel.
Real-time Quadtree Analysis using HistoPyramids.
SPIE Electronic Imaging conference, San Jose/USA, 2007.
33. GPU Octree
(Chapter 9)
Feature Clustering extended to 3D volumes
Octrees from Volume Data
New algorithm, pointer octrees
(e.g. for spatial data structures)
Real-time creation of
high-resolution octrees
from meshes possible!
35. Data Expansion via HistoPyramid:
Problem task
We have a predicate function that determines
how many output copies to create from each input element:
Implementation is trivial on CPU
GPU: Input can be divided amongst threads, but:
Where shall each thread write its output?
Insight:
HistoPyramid traversal works even here!
36. Data Expansion via HistoPyramid:
HP Buildup
First, count number of output elements, e.g. via 4:1 reduction:
37. Data Expansion via HistoPyramid:
HP Traversal (single output copy)
Traversal for single output elements:
Exactly like data compaction, but: Mind local key index kL
38. Data Expansion via HistoPyramid:
HP Traversal (multiple output copies)
Traversal for multiple output elements:
kL determines the index of the copy. Still: one thread for each copy!
39. Data Expansion via HistoPyramid:
HP Traversal (multiple output copies)
Traversal for multiple output elements:
kL determines the index of the copy.
Observation: Thread can modify input before write-out!
Thus: Output can be modified version of input based on kL.
e.g. Geometry Creation:
(Generic algorithm…)
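A CPU sketch of data expansion, assuming a 2:1 reduction in 1D and hypothetical names. The base level now holds output counts instead of 0/1 flags; the traversal is the same as for compaction, and the key remainder left at the base is the local key index kL, which the output function can use to modify the element before write-out.

```python
def build_histopyramid(counts):
    """Reduction levels from base (per-element output counts) to the top total."""
    size = 1
    while size < len(counts):
        size *= 2
    levels = [list(counts) + [0] * (size - len(counts))]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def traverse(levels, key):
    """Return (input index, local key index kL) for a global output key."""
    index = 0
    for level in reversed(levels[:-1]):
        index *= 2
        left = level[index]
        if key >= left:
            key -= left
            index += 1
    return index, key        # remaining key = which copy of this element


def expand(elements, counts, make_output):
    levels = build_histopyramid(counts)
    out = []
    for key in range(levels[-1][0]):          # one "thread" per output copy
        index, k_local = traverse(levels, key)
        out.append(make_output(elements[index], k_local))
    return out


elements = ["a", "b", "c", "d"]
counts = [2, 0, 3, 1]                          # output copies per input element
result = expand(elements, counts, lambda e, k: f"{e}{k}")
print(result)
```

A count of zero drops the element, so compaction is the special case where every count is 0 or 1.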
40. Data Expansion: Eikonal Rendering (Publication I)
Compute light transport through volume objects with spatially varying refractive index
Both real-time rendering and precomputed lighting simulation
Lighting simulation requires adaptive light wavefront simulation
I. Ihrke, G. Ziegler, A. Tevs, C. Theobalt, M. Magnor, H.-P. Seidel
Eikonal Rendering: Efficient Light Transport in Refractive Objects
ACM Transactions on Graphics 26 (3): 59-1 - 59-8, 2007
http://www.mpi-inf.mpg.de/resources/EikonalRendering/
41. Eikonal Rendering: Lighting Simulation
For given light-object position, precompute lighting inside
the volumetric object for real-time novel view rendering.
42. Eikonal Rendering: Wavefront Propagation
Lighting simulation implements numerical ODE solver on GPU.
Subdivide light's wavefront into a set of patches
Patch corners move as GPU particle system
– Each particle follows ray optics
During update, some patches:
– weaken too much (discard)
– leave volume (discard)
– grow too large (tessellate)
Since patch list is on GPU:
– Discard: Data Compaction
– Tessellate: Data Expansion
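The patch update can be read as a single per-patch output count: 0 for discarded patches, 1 for patches kept as they are, and more than 1 for tessellated ones. The sketch below emulates this sequentially on the CPU. The `Patch` fields, both thresholds, the four-way split, and the way energy and area are divided among sub-patches are placeholders invented for illustration, not values or rules from the paper.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    energy: float   # hypothetical: remaining light energy of the patch
    area: float     # hypothetical: current patch size
    inside: bool    # hypothetical: still within the volume?


MIN_ENERGY = 0.01   # hypothetical discard threshold
MAX_AREA = 4.0      # hypothetical tessellation threshold


def output_count(patch):
    """0 = discard (compaction), 1 = keep, 4 = tessellate (expansion)."""
    if patch.energy < MIN_ENERGY or not patch.inside:
        return 0
    if patch.area > MAX_AREA:
        return 4
    return 1


def update(patches):
    """Sequential stand-in for compaction plus expansion on the patch list."""
    out = []
    for patch in patches:
        n = output_count(patch)
        for _ in range(n):
            if n == 1:
                out.append(patch)
            else:
                # Placeholder subdivision: split energy and area evenly.
                out.append(Patch(patch.energy / n, patch.area / n, patch.inside))
    return out


patches = [
    Patch(1.0, 1.0, True),     # kept
    Patch(0.001, 1.0, True),   # too weak: discarded
    Patch(1.0, 8.0, True),     # too large: tessellated into 4
    Patch(1.0, 1.0, False),    # left the volume: discarded
]
new_patches = update(patches)
print(len(new_patches))
```

On the GPU, the per-patch counts would form the base level of a HistoPyramid, so the new patch list can be written in parallel without a CPU round trip.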
45. Data Expansion: Marching Cubes (Publication II)
Marching Cubes algorithm extracts iso-surfaces from volumes
Reformulate: Stream of voxels ...
– is first compacted to the relevant iso-surface voxels
– then expanded, becoming a stream of triangle vertices
C. Dyken, G. Ziegler, C. Theobalt, and H.-P. Seidel
High-speed Marching Cubes using HistoPyramids
Computer Graphics Forum 27 (8): 2028-2039, 2008
http://www.sintef.no/hpmc
46. Performance of OpenGL approach (2007):
Geometry shaders (GS), e.g. on NVIDIA GeForce 8, enabled
hardware data compaction & expansion for geometry.
This should have made HistoPyramids obsolete, but HP-MC outperforms
the geometry shader (HP-GS)!
In 2007, HP-MC was the fastest known MC algorithm.
(frames per second)
48. Conclusion and Outlook
GPUs increasingly useful in general data processing
Programming Model Restrictions not always bad
– Force programmer to change thought model
– E.g.: Fixed Output Location created HistoPyramid traversal concept
– Can be more efficient, even on more capable hardware!
(atomic counters and geometry shaders perform worse)
Data-Parallel Algorithm Design is hard
– But once done, parallelizable over any number of available cores
(if sufficient data available)
– Hard to imagine that auto-parallelization can achieve this
Future work
– Connected components, distance transforms, SATs…
– Accelerate further using CUDA C and OpenCL
49. Other work based on presented algorithms
Quadtree
C. N. Vasconcelos, A. Sá, P. C. Carvalho, M. Gattass.
QuadN4tree: A GPU-Friendly Quadtree Leaves Neighborhood Structure.
Proc. of Computer Graphics International Conference (CGI) 2008.
C. N. Vasconcelos, A. Sá, P. C. Carvalho, M. Gattass.
Using Quadtrees for Energy Minimization Via Graph Cuts.
Proc. of VMV - 12th Vision, Modeling, and Visualization Workshop, pp. 71-80.
Data Expansion
C. Dyken, M. Reimers, J. Seland.
Real-time GPU Silhouette Refinement using adaptively blended Bézier patches.
Computer Graphics Forum, Volume 27, number 1, pp. 1-12, 2007.
Data Compaction (Implementation)
J. Fung, S. Mann.
OpenVIDIA: parallel GPU computer vision.
Proc. of 13th annual ACM international conference on Multimedia, pp. 849 - 852.
http://openvidia.sf.net
52. San Jose (CA) | September 23rd, 2010
Christopher Dyken, SINTEF Norway
Gernot Ziegler, NVIDIA UK
GPU-accelerated data expansion
for the Marching Cubes algorithm
53. HistoPyramid performance
Accelerated HistoPyramids using CUDA C
HistoPyramid BuildUp
— Reduce 5-to-1, but store only first four sums!
— Build several levels via on-GPU shared memory
(fewer video memory transactions)
Marching Cubes specific
— Share scalar input data amongst neighbouring MC cells
(through shared memory)