GPU Data Structures
for Graphics and Vision
PhD defense colloquium (Promotionskolloquium), May 6th, 2011
Gernot Ziegler
Dept. of Computer Graphics
(3D Video and Vision-Based Graphics Group)
Outline
Graphics Hardware:
Original Purpose and Recent Development
Classical Usage in Visual Computing
 Free Viewpoint Video Compression
 Color and Depth Reprojection
 Hierarchical Image Processing
General Data Processing
 Data Compaction with the HistoPyramid
 Quadtree and Octree Generation
 Data Expansion with the HistoPyramid
Conclusion
Graphics Hardware: Original Purpose
 Graphics hardware accelerates typical data operations of
computer graphics (pixel moves, triangle rasterization)
 GPU is simpler in design than CPU, but massively parallel.
Graphics Hardware: Capabilities
 ~2003: Graphics Hardware becomes programmable:
GPU (Graphics Processing Unit)
Graphics Hardware: Capabilities
 Data can now be anything (floating point & integer)
 General Purpose Computing on GPU = GPGPU
"Classical Usage" in Visual Computing (still graphics-related)
– Computer Vision
– Video processing
– Volume analysis
General Data Processing
– PDE / ODE solver
– Spatial Data Structure Generation
– Database Ops
– Etc…
Game of Life
(Early GPGPU by S. Green)
Classical Usage in Visual Computing
Free Viewpoint Video Compression
(Chapter 3)
 Map video footage into texture domain via proxy 3D model
Free Viewpoint Video Compression
(Chapter 3)
 Obtain texture surface masking via shadow mapping
Free Viewpoint Video Compression: Publications
 G. Ziegler, H. Lensch, N. Ahmed, M. Magnor, H.-P. Seidel.
Multi-Video Compression in Texture Space.
11th IEEE Intl Conference on Image Processing (ICIP 2004),
Singapore, pp. 2467-2470, 2004.
 G. Ziegler, H. Lensch, M. Magnor, H.-P. Seidel. Multi-Video
Compression in Texture Space using 4D SPIHT.
6th IEEE Workshop on Multimedia Signal Processing, Siena,
Italy, pp. 39-42, 2004.
Color and Depth Reprojection
(Chapter 4)
 Depth-map "Projection" via proxy mesh & vertex shader
Novel View reconstruction
from partial depth camera views
Color and Depth Reprojection
(Chapter 4)
(Figure panels: "Blending by View Angle" and "Our per-pixel approach"; purple: blended areas)
Hierarchical Image Processing: Stereo reconstruction
(Chapter 5.1)
 Projective texturing in plane-sweep (GPU feedback, coarse-to-fine)
Hierarchical Image Processing: Reduction
(Chapter 5.2)
 Mipmap-like reduction: Dominant feature region, noise reduction
Hierarchical Image Processing: Reduction
(Thesis Chapter 5.3)
 Histogram of local gradients guides lens warp compensation
General Data Processing
Graphics Hardware: Capabilities
 GPU has massive computation and memory throughput
Graphics Hardware: Limitations
 GPU is connected to the CPU via a narrow data bus
(bandwidth bottleneck, approx. 4 GB/s)
 GPU is a stream processor:
– A workload of roughly 10k threads is needed to keep hundreds of cores busy
(data parallelization!)
– Thread switching lightweight, but synchronization expensive!
– Each thread can only write at a fixed position
 Algorithms must be redesigned for GPU!
General Data Processing
Data Compaction
(Chapter 6)
Data-Parallel Algorithm Challenges
 Example from Computer Vision: List of all black pixels in an image
 Step 1: Detect black pixels:
 Step 2: Create a list of detected pixels
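A minimal sequential sketch of the two steps, assuming a tiny made-up 4x4 image in which 0 means black; the function names are illustrative and not taken from the thesis. Step 1 is independent per pixel, whereas in step 2 the write position of each feature depends on how many features precede it, which is what makes list generation hard to parallelize.

```python
# Hypothetical 4x4 grayscale image; 0 means black (values are made up for illustration).
image = [
    [255,   0, 255, 255],
    [  0,   0, 255, 255],
    [255, 255, 255,   0],
    [255, 255, 255, 255],
]

def detect_black(img, threshold=0):
    """Step 1: per-pixel classification; every pixel is independent of the others."""
    return [[1 if px <= threshold else 0 for px in row] for row in img]

def list_features(mask):
    """Step 2: sequential list generation; the write position depends on all earlier pixels."""
    coords = []
    for y, row in enumerate(mask):
        for x, flag in enumerate(row):
            if flag:
                coords.append((x, y))  # output index = number of features found so far
    return coords

mask = detect_black(image)
features = list_features(mask)
print(features)  # [(1, 0), (0, 1), (1, 1), (3, 2)]
```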
Previous approach to feature list generation
 Step 2 (List generation) was not possible on GPU!
 2a: GPU marks local features (e.g. thresholding, filtering)
 2b: CPU searches image and generates feature list
 But: Bus transfers expensive:
GPU useful only for complex feature isolation.
(e.g. large filter convolution & thresholding)
Our approach: Feature list generation on GPU
 We generate feature lists on the GPU using data compaction.
 Pixel/voxel/feature input is abstracted as a "data element stream".
 Compaction keeps only elements deemed relevant for output.
1D example (keep all elements that are blue):
(Diagram: input stream A B C D E F is compacted to output stream A B D E)
 Data flow:
Massive speedup due to strongly reduced bus dataflow!
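A rough back-of-the-envelope estimate of why a compacted list reduces bus traffic, using the approx. 4 GB/s bus figure quoted on the limitations slide. The image size, bytes per pixel, and feature count are assumptions chosen only for illustration, and bus latency is ignored.

```python
BUS_BYTES_PER_S = 4e9  # approx. 4 GB/s, as quoted for the CPU-GPU bus

def transfer_ms(num_bytes, bandwidth=BUS_BYTES_PER_S):
    """Time in milliseconds to move num_bytes across the bus (latency ignored)."""
    return num_bytes / bandwidth * 1e3

full_image_bytes = 1024 * 1024 * 4    # assumed: 1024x1024 feature mask, 4 bytes per pixel
feature_list_bytes = 10_000 * 2 * 4   # assumed: 10,000 features, two 4-byte coordinates each

print(f"full image:   {transfer_ms(full_image_bytes):.3f} ms")    # about 1.049 ms
print(f"feature list: {transfer_ms(feature_list_bytes):.3f} ms")  # 0.020 ms
```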
Data Compaction: Problem task in 1D
 Keep a subset of the input elements, selected by a classifier:
 Implementation is trivial on CPU, single-thread.
 On GPU: Need to parallelize into 10k threads!
 First count number of output elements
using data-parallel reduction!
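The counting step can be sketched as a pairwise tree reduction. This is plain Python standing in for GPU passes: within one pass every addition is independent, so each could run in its own thread. The classifier below is a hypothetical stand-in that drops elements C and F.

```python
def classify(element):
    """Hypothetical classifier: keep every element except 'C' and 'F'."""
    return 0 if element in ("C", "F") else 1

def parallel_count(flags):
    """Pairwise tree reduction; all additions within one pass are independent."""
    level = list(flags)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-length levels with a neutral element
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0

stream = ["A", "B", "C", "D", "E", "F"]
flags = [classify(e) for e in stream]
count = parallel_count(flags)
print(flags, count)  # [1, 1, 0, 1, 1, 0] 4
```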
Data Compaction via HistoPyramid: Buildup
 First, count number of output elements,
e.g. 4:1 data-parallel reduction
 (Note the reduction pyramid, it is retained - HistoPyramid)
 Can now allocate compact output, no spill.
 But how are output elements generated?
Histogram pyramid /
HistoPyramid
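A small sketch of the buildup, assuming a 1D input and a configurable fan-out (2:1 by default, 4:1 as mentioned on the slide). Unlike a plain reduction, every intermediate level is kept; the function name and the zero padding are illustrative choices.

```python
def build_histopyramid(flags, fanout=2):
    """Return levels [base, ..., top]; each cell counts the kept elements below it."""
    size = 1
    while size < len(flags):
        size *= fanout
    base = list(flags) + [0] * (size - len(flags))  # pad to a power of the fan-out
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sum(prev[i:i + fanout]) for i in range(0, len(prev), fanout)])
    return levels

flags = [1, 1, 0, 1, 1, 0, 0, 0]
levels = build_histopyramid(flags)
for level in levels:
    print(level)
# [1, 1, 0, 1, 1, 0, 0, 0]
# [2, 1, 1, 0]
# [3, 1]
# [4]
output_size = levels[-1][0]  # the top cell gives the exact size of the compact output
```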
Data Compaction via HistoPyramid: Traversal
 Output generation: start one thread per output element
 Each output thread traverses reduction pyramid (read-only)
 No read/write hazards = Data-parallel output writing!
 As many threads as output elements
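The traversal can be sketched as follows, assuming a 2:1 pyramid over a 1D stream. Each call to `traverse` plays the role of one output thread: it only reads the pyramid and returns the input position of its element. The helper names are illustrative.

```python
def build_histopyramid(flags):
    """2:1 reduction pyramid, levels ordered [base, ..., top]."""
    size = 1
    while size < len(flags):
        size *= 2
    levels = [list(flags) + [0] * (size - len(flags))]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

def traverse(levels, key):
    """Return the base index of the key-th kept element (0-based); read-only."""
    index = 0
    for level in reversed(levels[:-1]):  # from just below the top down to the base
        index *= 2
        left = level[index]
        if key >= left:                  # target lies in the right child
            key -= left
            index += 1
    return index

stream = ["A", "B", "C", "D", "E", "F"]
flags = [1, 1, 0, 1, 1, 0]
levels = build_histopyramid(flags)
output = [stream[traverse(levels, k)] for k in range(levels[-1][0])]
print(output)  # ['A', 'B', 'D', 'E']
```

Because every output element is computed independently, the list comprehension could be replaced by one thread per output index without any write conflicts.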
HistoPyramid: 2D Data Compaction
 The 1D case was a tutorial; the actual implementation is 2D!
 Dataflow diagram:
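A sketch of the 2D case, assuming a square mask with power-of-two side length and a 4:1 reduction over 2x2 blocks. The child visiting order inside a block is an assumption; any fixed order works as long as buildup and traversal agree.

```python
def build_pyramid_2d(mask):
    """4:1 reduction over 2x2 blocks; levels ordered [base, ..., top]."""
    levels = [mask]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
             + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]
             for x in range(half)]
            for y in range(half)
        ])
    return levels

def traverse_2d(levels, key):
    """Return the (x, y) position of the key-th feature; key must be below the top count."""
    x = y = 0
    for level in reversed(levels[:-1]):
        x, y = 2 * x, 2 * y
        for dx, dy in ((0, 0), (1, 0), (0, 1), (1, 1)):  # fixed child order (assumed)
            count = level[y + dy][x + dx]
            if key < count:
                x, y = x + dx, y + dy
                break
            key -= count
    return x, y

mask = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
levels = build_pyramid_2d(mask)
points = [traverse_2d(levels, k) for k in range(levels[-1][0][0])]
print(points)  # [(1, 0), (0, 1), (1, 1), (3, 2)]
```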
GPU Data Compaction: Publications
 Data Compaction fast enough for real-time volume analysis
 First application: Mesh-to-volume-to-point cloud in real-time!
 G. Ziegler, A. Tevs, C. Theobalt, H.-P. Seidel
On-the-fly Point Clouds through Histogram Pyramids
11th International Fall Workshop on Vision, Modeling and
Visualization 2006 (VMV2006), 2006, pp. 137-144.
GPU Data Compaction: Publications
 Vector Field Contours: View-dependent vector field analysis to
visualize contour lines throughout the volume
 Data Compaction delivers seed points for contour lines in ms!
 T. Annen, H. Theisel, C. Rössl, G. Ziegler, H.-P. Seidel
Vector Field Contours
Graphics Interface 2008, Windsor/Canada, 2008, pp. 97-105
General Data Processing
Quadtree and Octree Generation
(Chapter 8 and 9)
GPU Quadtrees: Introduction
 2D Reduction follows a quadtree-like reduction pattern.
 By tracking feature similarity in reduction,
quadtrees can be created from the reduction pyramid!
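A simplified sketch of the idea, assuming that "feature similarity" means exact equality of the four child values. A reduced cell stays uniform only if all four children agree; uniform cells under a mixed parent become quadtree leaves. The sequential leaf scan stands in for the GPU extraction, and all names are illustrative.

```python
MIXED = None  # marker for cells whose children disagree (pixel values must not be None)

def build_uniformity_pyramid(img):
    """2x2 reduction that keeps a value only where all four children agree."""
    levels = [img]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        level = []
        for y in range(half):
            row = []
            for x in range(half):
                kids = {prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                        prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1]}
                row.append(kids.pop() if len(kids) == 1 else MIXED)
            level.append(row)
        levels.append(level)
    return levels

def quadtree_leaves(levels):
    """Return (x, y, size, value) for every uniform cell whose parent is mixed."""
    leaves = []
    top = len(levels) - 1
    for lvl, level in enumerate(levels):
        size = 2 ** lvl
        for y, row in enumerate(level):
            for x, value in enumerate(row):
                if value is MIXED:
                    continue
                if lvl == top or levels[lvl + 1][y // 2][x // 2] is MIXED:
                    leaves.append((x * size, y * size, size, value))
    return leaves

img = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
leaves = quadtree_leaves(build_uniformity_pyramid(img))
print(len(leaves))  # 7 (four 1x1 leaves and three 2x2 leaves)
```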
GPU QuadTree: Publications
 Speed (ms) enables real-time quadtree processing from video!
e.g. for compression, vision, ...
 G. Ziegler, R. Dimitrov, C. Theobalt, H.-P. Seidel.
Real-time Quadtree Analysis using HistoPyramids.
SPIE Electronic Imaging conference, San Jose/USA, 2007.
GPU Octree
(Chapter 9)
 Feature Clustering extended to 3D volumes
 Octrees from Volume Data
 New algorithm, pointer octrees
(e.g. for spatial data structures)
 Real-time creation of
high-resolution octrees
from meshes possible!
General Data Processing
Data Expansion
(Chapter 7)
Data Expansion via HistoPyramid:
Problem task
 We have a predicate function that determines
how many output copies to create from each input element:
 Implementation is trivial on CPU
 GPU: Input can be divided amongst threads, but:
Where shall each thread write its output?
 Insight:
HistoPyramid traversal works even here!
Data Expansion via HistoPyramid:
HP Buildup
 First, count number of output elements, e.g. via 4:1 reduction:
Data Expansion via HistoPyramid:
HP Traversal (single output copy)
 Traversal for single output elements:
 Exactly like data compaction, but: Mind local key index kL
Data Expansion via HistoPyramid:
HP Traversal (multiple output copies)
 Traversal for multiple output elements:
 kL determines which copy a thread produces. Still: one thread for each copy!
Data Expansion via HistoPyramid:
HP Traversal (multiple output copies)
 Traversal for multiple output elements:
 kL determines which copy a thread produces.
 Observation: Thread can modify input before write-out!
 Thus: Output can be modified version of input based on kL.
 e.g. Geometry Creation:
 (Generic algorithm…)
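A sketch of expansion with a 2:1 pyramid, assuming the base level holds per-element output counts instead of 0/1 flags. The key that remains after traversal is the local key index kL, which tells the thread which copy it is producing. The example data and the way the output is modified (appending kL to the element name) are made up for illustration.

```python
def build_histopyramid(counts):
    """2:1 reduction pyramid over output counts; levels ordered [base, ..., top]."""
    size = 1
    while size < len(counts):
        size *= 2
    levels = [list(counts) + [0] * (size - len(counts))]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

def traverse(levels, key):
    """Return (input index, local key kL) for output element number `key`."""
    index = 0
    for level in reversed(levels[:-1]):
        index *= 2
        left = level[index]
        if key >= left:
            key -= left
            index += 1
    return index, key  # the leftover key is the local index kL

inputs = ["a", "b", "c", "d"]
counts = [2, 0, 3, 1]  # hypothetical predicate result: copies to emit per input element
levels = build_histopyramid(counts)

output = []
for k in range(levels[-1][0]):     # one "thread" per output copy
    i, kL = traverse(levels, k)
    output.append(f"{inputs[i]}{kL}")  # output is a modified version of the input, based on kL
print(output)  # ['a0', 'a1', 'c0', 'c1', 'c2', 'd0']
```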
Data Expansion: Eikonal Rendering (Publication I)
 Compute light transport through volume objects of varying refraction
 Both real-time rendering and precomputed lighting simulation
 Lighting simulation requires adaptive light wavefront simulation
 I. Ihrke, G. Ziegler, A. Tevs, C. Theobalt, M. Magnor, H.-P. Seidel
Eikonal Rendering: Efficient Light Transport in Refractive Objects
ACM Transactions on Graphics 26 (3): 59-1 - 59-8, 2007
http://www.mpi-inf.mpg.de/resources/EikonalRendering/
Eikonal Rendering: Lighting Simulation
 For given light-object position, precompute lighting inside
the volumetric object for real-time novel view rendering.
 Lighting simulation implements numerical ODE solver on GPU.
 Subdivide light's wavefront into a set of patches
 Patch corners move as GPU particle system
– Each particle follows ray optics
 During update, some patches:
– weaken too much (discard)
– leave volume (discard)
– grow too large (tessellate)
 Since patch list is on GPU:
– Discard: Data Compaction
– Tessellate: Data Expansion
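The patch update can be read as a single predicate that returns an output count per patch: 0 for discard, 1 for keep, more than 1 for tessellation. The sketch below assumes a hypothetical patch record, made-up thresholds, and a split into four sub-patches; the sequential loop stands in for the HistoPyramid-based write-out on the GPU.

```python
from dataclasses import dataclass

@dataclass
class Patch:           # hypothetical patch record; field names are assumptions
    energy: float
    inside: bool       # whether the patch is still inside the volume
    size: float

MIN_ENERGY = 0.01      # assumed thresholds, not taken from the talk
MAX_SIZE = 1.0

def output_count(p):
    """Number of patches this patch contributes to the next wavefront."""
    if p.energy < MIN_ENERGY or not p.inside:
        return 0       # weakened too much or left the volume: discard (compaction)
    if p.size > MAX_SIZE:
        return 4       # grown too large: tessellate (expansion; 4 children is an assumption)
    return 1           # keep unchanged

def update(patches):
    out = []
    for p in patches:
        n = output_count(p)
        for _ in range(n):
            # Assumed split rule: children share the energy and halve the edge length.
            out.append(p if n == 1 else Patch(p.energy / n, p.inside, p.size / 2))
    return out

patches = [Patch(0.5, True, 0.4), Patch(0.001, True, 0.4),
           Patch(0.5, False, 0.4), Patch(0.8, True, 2.0)]
print([output_count(p) for p in patches])  # [1, 0, 0, 4]
print(len(update(patches)))                # 5
```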
Eikonal Rendering: Wavefront Propagation
Eikonal Rendering: Short Demo
Data Expansion: Marching Cubes (Publication II)
 Marching Cubes algorithm extracts iso-surfaces from volumes
 Reformulate: Stream of voxels ...
– is first compacted to the relevant iso-surface voxels
– then expanded, becoming a stream of triangle vertices
 C. Dyken, G. Ziegler, C. Theobalt, and H.-P. Seidel
High-speed Marching Cubes using HistoPyramids
Computer Graphics Forum 27 (8): 2028-2039, 2008
http://www.sintef.no/hpmc
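To illustrate the reformulation without the 256-entry Marching Cubes table, the sketch below uses the 2D analogue (marching squares), which is an explicit substitution. Each cell is classified by its corner values and mapped to a vertex count; cells with count 0 are the ones compaction removes, and the remaining counts are what expansion would turn into vertices. Corners are assumed to be listed in cyclic order around the cell.

```python
def case_index(corners, iso):
    """4-bit code: bit i is set when corner i lies above the iso-value."""
    return sum(1 << i for i, v in enumerate(corners) if v > iso)

def vertex_count(case):
    """Line-segment vertices emitted by a marching-squares cell."""
    if case in (0, 15):
        return 0   # entirely below or above the iso-value: compacted away
    if case in (5, 10):
        return 4   # diagonal (saddle) cases: two line segments
    return 2       # one line segment

def cells(field):
    """Yield corner tuples (cyclic order) for every cell of a 2D scalar field."""
    for y in range(len(field) - 1):
        for x in range(len(field[0]) - 1):
            yield (field[y][x], field[y][x + 1], field[y + 1][x + 1], field[y + 1][x])

field = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
counts = [vertex_count(case_index(c, 0.5)) for c in cells(field)]
print(counts, sum(counts))  # [2, 2, 2, 2] 8
```

The list `counts` plays the role of the HistoPyramid base level: its sum is the size of the output vertex stream.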
Performance of OpenGL approach (2007):
 The geometry shader (GS), e.g. on NVIDIA GeForce 8, enabled
hardware data compaction & expansion for geometry.
This should have made HistoPyramids obsolete, but HP-MC outperforms
the geometry shader (HP-GS)!
In 2007, HP-MC was the fastest known MC algorithm.
(Chart: performance in frames per second)
Conclusion and Outlook
(Chapter 10)
Conclusion and Outlook
 GPUs increasingly useful in general data processing
 Programming model restrictions are not always bad
– They force the programmer to change their thought model
– E.g.: the fixed output location led to the HistoPyramid traversal concept
– The result can be more efficient, even on more capable hardware!
(atomic counters and geometry shaders perform worse)
 Data-Parallel Algorithm Design is hard
– But once done, parallelizable over any number of available cores
(if sufficient data available)
– Hard to imagine that auto-parallelization can achieve this
 Future work
– Connected components, distance transforms, SATs…
– Accelerate further using CUDA C and OpenCL
Other work based on presented algorithms
Quadtree
 C. N. Vasconcelos, A. Sá, P. C. Carvalho, M. Gattass.
QuadN4tree: A GPU-Friendly Quadtree Leaves Neighborhood Structure.
Proc. of Computer Graphics International Conference (CGI) 2008.
 C. N. Vasconcelos, A. Sá, P. C. Carvalho, M. Gattass.
Using Quadtrees for Energy Minimization Via Graph Cuts.
Proc. of VMV - 12th Vision, Modeling, and Visualization Workshop, pp. 71-80.
Data Expansion
 C. Dyken, M. Reimers, J. Seland.
Real-time GPU Silhouette Refinement using adaptively blended Bézier patches.
Computer Graphics Forum, Volume 27, number 1, pp. 1-12, 2007.
Data Compaction (Implementation)
 J. Fung, S. Mann.
OpenVIDIA: parallel GPU computer vision.
Proc. of 13th annual ACM international conference on Multimedia, pp. 849 - 852.
http://openvidia.sf.net
End of Presentation
Recent Work
San Jose (CA) | September 23rd, 2010
Christopher Dyken, SINTEF Norway
Gernot Ziegler, NVIDIA UK
GPU-accelerated data expansion
for the Marching Cubes algorithm
HistoPyramid performance
 Accelerated HistoPyramids using CUDA C
 HistoPyramid BuildUp
– Reduce 5-to-1, but store only the first four sums!
– Build several levels via on-GPU shared memory
(fewer video memory transactions)
 Marching Cubes specific
– Share scalar input data amongst neighbouring MC cells
(through shared memory)
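One plausible reading of the 5-to-1 bullet above, offered as an interpretation rather than the authors' implementation: each group of five values passes its total upward, but only the first four values are stored. Traversal never needs the fifth, because a valid key that is not covered by the first four children must belong to the fifth. All names below are illustrative.

```python
def build_hp5(counts):
    """Return (stored_levels, total); stored_levels[l][g] holds four of group g's five values."""
    size = 5
    while size < len(counts):
        size *= 5
    values = list(counts) + [0] * (size - len(counts))
    stored = []
    while len(values) > 1:
        groups = [values[i:i + 5] for i in range(0, len(values), 5)]
        stored.append([g[:4] for g in groups])  # the fifth value is never written
        values = [sum(g) for g in groups]       # 5-to-1 reduction feeds the next level
    return stored, values[0]

def traverse_hp5(stored, key):
    """Return (input index, local key) for output element `key`; key must be below total."""
    index = 0
    for level in reversed(stored):
        child = 4                               # fifth child unless an earlier one matches
        for c, count in enumerate(level[index]):
            if key < count:
                child = c
                break
            key -= count
        index = index * 5 + child
    return index, key

counts = [1, 0, 2, 0, 3, 0, 0, 0, 0, 1]  # made-up per-cell output counts
stored, total = build_hp5(counts)
result = [traverse_hp5(stored, k) for k in range(total)]
print(total)   # 7
print(result)  # [(0, 0), (2, 0), (2, 1), (4, 0), (4, 1), (4, 2), (9, 0)]
```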
512³-ish 16-bit performance

                      Backpack             Head aneurysm        Christmas tree
Source                www.volvis.org       www.volvis.org       TU Wien
Iso-value             0.4                  0.4                  0.05
Size                  512x512x373          512x512x512          512x499x512
Data volume           187 MB               256 MB               250 MB
Triangles             3 745 320            583 610              5 629 532
Triangles per cell    0.039                0.004                0.043
OpenGL HP4MC          13 fps (1291 mvps)   15 fps (2034 mvps)   10 fps (1358 mvps)
CUDA-OpenGL HP5MC     43 fps (4129 mvps)   78 fps (10399 mvps)  28 fps (3704 mvps)
Speedup               3.2x                 5.1x                 2.7x
End of Presentation

Weitere ähnliche Inhalte

Was ist angesagt?

The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
Johan Andersson
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
Edge AI and Vision Alliance
 
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Takahiro Harada
 

Was ist angesagt? (20)

Background Subtraction Algorithm for Moving Object Detection Using Denoising ...
Background Subtraction Algorithm for Moving Object Detection Using Denoising ...Background Subtraction Algorithm for Moving Object Detection Using Denoising ...
Background Subtraction Algorithm for Moving Object Detection Using Denoising ...
 
Parallel implementation of geodesic distance transform with application in su...
Parallel implementation of geodesic distance transform with application in su...Parallel implementation of geodesic distance transform with application in su...
Parallel implementation of geodesic distance transform with application in su...
 
B Eng Final Year Project Presentation
B Eng Final Year Project PresentationB Eng Final Year Project Presentation
B Eng Final Year Project Presentation
 
Point cloud mesh-investigation_report-lihang
Point cloud mesh-investigation_report-lihangPoint cloud mesh-investigation_report-lihang
Point cloud mesh-investigation_report-lihang
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
Adaptive lifting based image compression scheme using interactive artificial ...
Adaptive lifting based image compression scheme using interactive artificial ...Adaptive lifting based image compression scheme using interactive artificial ...
Adaptive lifting based image compression scheme using interactive artificial ...
 
An35225228
An35225228An35225228
An35225228
 
Digital Image Processing: An Introduction
Digital Image Processing: An IntroductionDigital Image Processing: An Introduction
Digital Image Processing: An Introduction
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
 
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
 
Real Time Face Detection on GPU Using OPENCL
Real Time Face Detection on GPU Using OPENCLReal Time Face Detection on GPU Using OPENCL
Real Time Face Detection on GPU Using OPENCL
 
Frostbite on Mobile
Frostbite on MobileFrostbite on Mobile
Frostbite on Mobile
 
Sobel Edge Detection Using FPGA
Sobel Edge Detection Using FPGASobel Edge Detection Using FPGA
Sobel Edge Detection Using FPGA
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
 
第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
IRJET-ASIC Implementation for SOBEL Accelerator
IRJET-ASIC Implementation for SOBEL AcceleratorIRJET-ASIC Implementation for SOBEL Accelerator
IRJET-ASIC Implementation for SOBEL Accelerator
 
A Novel Background Subtraction Algorithm for Dynamic Texture Scenes
A Novel Background Subtraction Algorithm for Dynamic Texture ScenesA Novel Background Subtraction Algorithm for Dynamic Texture Scenes
A Novel Background Subtraction Algorithm for Dynamic Texture Scenes
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
Using GPUs for Collision detection, Recent Advances in Real-Time Collision an...
 

Ähnlich wie PhD defense talk (portfolio of my expertise)

VisionizeBeforeVisulaize_IEVC_Final
VisionizeBeforeVisulaize_IEVC_FinalVisionizeBeforeVisulaize_IEVC_Final
VisionizeBeforeVisulaize_IEVC_Final
Masatsugu HASHIMOTO
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
Arka Ghosh
 

Ähnlich wie PhD defense talk (portfolio of my expertise) (20)

Unite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platformsUnite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platforms
 
FIR filter on GPU
FIR filter on GPUFIR filter on GPU
FIR filter on GPU
 
VisionizeBeforeVisulaize_IEVC_Final
VisionizeBeforeVisulaize_IEVC_FinalVisionizeBeforeVisulaize_IEVC_Final
VisionizeBeforeVisulaize_IEVC_Final
 
Image Processing Application on Graphics processors
Image Processing Application on Graphics processorsImage Processing Application on Graphics processors
Image Processing Application on Graphics processors
 
graphics processing unit ppt
graphics processing unit pptgraphics processing unit ppt
graphics processing unit ppt
 
Graphics pipelining
Graphics pipeliningGraphics pipelining
Graphics pipelining
 
Realtime 3D Visualization without GPU
Realtime 3D Visualization without GPURealtime 3D Visualization without GPU
Realtime 3D Visualization without GPU
 
Mod 2 hardware_graphics.pdf
Mod 2 hardware_graphics.pdfMod 2 hardware_graphics.pdf
Mod 2 hardware_graphics.pdf
 
Gpu
GpuGpu
Gpu
 
FGS 2011: Making A Game With Molehill: Zombie Tycoon
FGS 2011: Making A Game With Molehill: Zombie TycoonFGS 2011: Making A Game With Molehill: Zombie Tycoon
FGS 2011: Making A Game With Molehill: Zombie Tycoon
 
Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)Nvidia (History, GPU Architecture and New Pascal Architecture)
Nvidia (History, GPU Architecture and New Pascal Architecture)
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
 
GPU Computing
GPU ComputingGPU Computing
GPU Computing
 
GPU - Basic Working
GPU - Basic WorkingGPU - Basic Working
GPU - Basic Working
 
Ch7 031102
Ch7 031102Ch7 031102
Ch7 031102
 
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONSA SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
 
Praseed Pai
Praseed PaiPraseed Pai
Praseed Pai
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Haskell Accelerate
Haskell  AccelerateHaskell  Accelerate
Haskell Accelerate
 

Kürzlich hochgeladen

POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
Silpa
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
Silpa
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
ANSARKHAN96
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

PhD defense talk (portfolio of my expertise)

  • 1. GPU Data Structures for Graphics and Vision Promotionskolloquium, May 6th 2011 Gernot Ziegler Dept. of Computer Graphics (3D Video and Vision-Based Graphics Group)
  • 2. Outline Graphics Hardware: Original Purpose and Recent Development Classical Usage in Visual Computing  Free Viewpoint Video Compression  Color and Depth Reprojection  Hierarchical Image Processing General Data Processing  Data Compaction with the HistoPyramid  Quadtree and Octree Generation  Data Expansion with the HistoPyramid Conclusion
  • 3. Graphics Hardware: Original Purpose  Graphics hardware accelerates typical data operations of computer graphics (pixel moves, triangle rasterization)  GPU is simpler in design than CPU, but massively parallel.
  • 4. Graphics Hardware: Capabilities  ~2003: Graphics Hardware becomes programmable: GPU (Graphics Processing Unit)
  • 5. Graphics Hardware: Capabilities  Data can now be anything (floating point & integer)  General Purpose Computing on GPU = GPGPU "Classical Usage" in Visual Computing (still graphics-related) – Computer Vision – Video processing – Volume analysis General Data Processing – PDE / ODE solver – Spatial Data Structure Generation – Database Ops – Etc… Game of Life (Early GPGPU by S. Green)
  • 6. Classical Usage in Visual Computing
  • 7. Free Viewpoint Video Compression (Chapter 3)  Map video footage into texture domain via proxy 3D model
  • 8. Free Viewpoint Video Compression (Chapter 3)  Obtain texture surface masking via shadow mapping
  • 9. Free Viewpoint Video Compression: Publications  G. Ziegler, H. Lensch, N. Ahmed, M. Magnor, H.-P. Seidel. Multi-Video Compression in Texture Space. 11th IEEE Intl Conference on Image Processing (ICIP 2004), Singapore, pp. 2467-2470, 2004.  G. Ziegler, H. Lensch, M. Magnor, H.-P. Seidel. Multi-Video Compression in Texture Space using 4D SPIHT. 6th IEEE Workshop on Multimedia Signal Processing, Siena, Italy, pp. 39-42, 2004.
  • 10. Color and Depth Reprojection (Chapter 4)  Depth-map "Projection" via proxy mesh & vertex shader Novel View reconstruction from partial depth camera views
  • 11. Color and Depth Reprojection (Chapter 4) Blending by View Angle Our per-pixel approach (Purple: Blended areas)
  • 12. Hierarchical Image Processing: Stereo reconstruction (Chapter 5.1)  Projective texturing in plane-sweep (GPU feedback, coarse-to-fine)
  • 13. Hierarchical Image Processing: Stereo reconstruction (Chapter 5.1)  Projective texturing in plane-sweep (GPU feedback, coarse-to-fine)
  • 14. Hierarchical Image Processing: Reduction (Chapter 5.2)  Mipmap-like reduction: Dominant feature region, noise reduction
  • 15. Hierarchical Image Processing: Reduction (Thesis Chapter 5.3)  Histogram of local gradients guides lens warp compensation
  • 17. Graphics Hardware: Capabilities  GPU has massive computation and memory throughput
  • 18. Graphics Hardware: Limitations  GPU is connected with CPU via narrow databus (bandwidth bottleneck, approx. 4 GB/s)  GPU is a Stream processor: – 10K thread workload necessary to keep 100s cores busy (data parallelization!) – Thread switching lightweight, but synchronization expensive! – Each thread can only write at a fixed position  Algorithms must be redesigned for GPU!
  • 19. General Data Processing Data Compaction (Chapter 6)
  • 20. Data-Parallel Algorithm Challenges  Example from Computer Vision: List of all black pixels in an image  Step 1: Detect black pixels:  Step 2: Create a list of detected pixels
  • 21. Previous approach to feature list generation  Step 2 (List generation) was not possible on GPU!  2a: GPU marks local features (e.g. thresholding, filtering)  2b: CPU searches image and generates feature list  But: Bus transfers expensive: GPU useful only for complex feature isolation. (e.g. large filter convolution & thresholding)
  • 22. Our approach: Feature list generation on GPU  We generate feature lists on the GPU using data compaction.  Pixel/Voxels/Feature input is abstracted "data element stream”.  Compaction keeps only elements deemed relevant for output. 1D example (keep all elements that are blue):  Data flow: Massive speedup due to strongly reduced bus dataflow! 1 1 0 1A B C D E F 1 1 1A B D E
  • 23. Data Compaction: Problem task in 1D  Keep number of elements from input, based on a Classifier:  Implementation is trivial on CPU, single-thread.  On GPU: Need to parallelize into 10k threads!  First count number of output elements using data-parallel reduction!
  • 24. Data Compaction via HistoPyramid:Buildup  First, count number of output elements, e.g. 4:1 data-parallel reduction  (Note the reduction pyramid, it is retained - HistoPyramid)  Can now allocate compact output, no spill.  But how are output elements generated? Histogram pyramid / HistoPyramid
  • 25. Data Compaction via HistoPyramid: Traversal  Output generate: Start one thread per output element  Each output thread traverses reduction pyramid (read-only)  No read/write hazards = Data-parallel output writing!  As many threads as output elements
  • 26. HistoPyramid: 2D Data Compaction
      • The 1D case was a tutorial; the actual implementation is 2D!
      • [Figure: dataflow diagram]
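A possible CPU sketch of the 2D case is shown below, assuming a square mask whose side length is a power of two and a 2x2 (4:1) reduction. The function names and the child visiting order are assumptions made for illustration, not details confirmed by the slides.

```python
def build_histopyramid_2d(mask):
    """Build a 4:1 pyramid from a square 0/1 mask with power-of-two side length."""
    levels = [[row[:] for row in mask]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
             + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]
             for x in range(half)]
            for y in range(half)
        ])
    return levels


def traverse_2d(levels, key):
    """Return (x, y) of the key-th marked cell; one call per output thread."""
    x = y = 0
    for level in reversed(levels[:-1]):
        x *= 2
        y *= 2
        # Visit the four children in a fixed order and skip whole subtrees
        # whose element count is not larger than the remaining key.
        for dy, dx in ((0, 0), (0, 1), (1, 0), (1, 1)):
            count = level[y + dy][x + dx]
            if key < count:
                x += dx
                y += dy
                break
            key -= count
    return x, y


mask = [
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
levels = build_histopyramid_2d(mask)
points = [traverse_2d(levels, key) for key in range(levels[-1][0][0])]
# points == [(1, 0), (3, 1), (0, 2), (2, 3)]
```

Note that the output order follows the pyramid's child order (a Z-order-like pattern) rather than raster order.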
  • 27. GPU Data Compaction: Publications
      • Data compaction is fast enough for real-time volume analysis
      • First application: mesh-to-volume-to-point cloud in real time!
      • G. Ziegler, A. Tevs, C. Theobalt, H.-P. Seidel. On-the-fly Point Clouds through Histogram Pyramids. 11th International Fall Workshop on Vision, Modeling and Visualization 2006 (VMV2006), 2006, pp. 137-144.
  • 29. GPU Data Compaction: Publications
      • Vector Field Contours: view-dependent vector field analysis to visualize contour lines throughout the volume
      • Data compaction delivers seed points for the contour lines within milliseconds!
      • T. Annen, H. Theisel, C. Rössl, G. Ziegler, H.-P. Seidel. Vector Field Contours. Graphics Interface 2008, Windsor, Canada, 2008, pp. 97-105.
  • 30. General Data Processing: Quadtree and Octree Generation (Chapters 8 and 9)
  • 31. GPU Quadtrees: Introduction
      • The 2D reduction follows a quadtree-like pattern.
      • By tracking feature similarity during the reduction, quadtrees can be created from the reduction pyramid!
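One way to picture "tracking feature similarity in the reduction" is the simplified sketch below. It is an assumption-laden stand-in, not the published algorithm: similarity is reduced to exact equality of pixel values, a pyramid cell stores either the common value of its four children or a `MIXED` marker, and all names are hypothetical.

```python
MIXED = None  # marker for a cell whose children differ (pixel values must not be None)


def build_uniformity_pyramid(image):
    """Bottom-up 4:1 reduction over a square image with power-of-two side length."""
    levels = [[row[:] for row in image]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        level = []
        for y in range(half):
            row = []
            for x in range(half):
                children = {prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                            prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1]}
                # Uniform only if all four children share one non-mixed value.
                row.append(children.pop() if len(children) == 1 else MIXED)
            level.append(row)
        levels.append(level)
    return levels


def quadtree_leaves(levels):
    """Return (x, y, size, value) per leaf: uniform cells whose parent is mixed."""
    leaves = []
    top = len(levels) - 1
    for depth, level in enumerate(levels):
        size = 2 ** depth
        for y, row in enumerate(level):
            for x, value in enumerate(row):
                if value is MIXED:
                    continue
                if depth == top or levels[depth + 1][y // 2][x // 2] is MIXED:
                    leaves.append((x * size, y * size, size, value))
    return leaves


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
leaves = quadtree_leaves(build_uniformity_pyramid(image))
```

For this image, three 2x2 blocks are uniform and become leaves, while the top-right block is mixed and contributes four single-pixel leaves.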
  • 32. GPU Quadtree: Publications
      • Millisecond-range speed enables real-time quadtree processing of video, e.g. for compression, vision, ...
      • G. Ziegler, R. Dimitrov, C. Theobalt, H.-P. Seidel. Real-time Quadtree Analysis using HistoPyramids. SPIE Electronic Imaging conference, San Jose, USA, 2007.
  • 33. GPU Octree (Chapter 9)
      • Feature clustering extended to 3D volumes
      • Octrees from volume data
      • New algorithm: pointer octrees (e.g. for spatial data structures)
      • Real-time creation of high-resolution octrees from meshes is possible!
  • 34. General Data Processing: Data Expansion (Chapter 7)
  • 35. Data Expansion via HistoPyramid: Problem task
      • We have a predicate function that determines how many output copies to create from each input element
      • Implementation is trivial on the CPU
      • GPU: input can be divided amongst threads, but where shall each thread write its output?
      • Insight: HistoPyramid traversal works even here!
  • 36. Data Expansion via HistoPyramid: HP Buildup
      • First, count the number of output elements, e.g. via a 4:1 reduction
  • 37. Data Expansion via HistoPyramid: HP Traversal (single output copy)
      • Traversal for single output elements
      • Exactly like data compaction, but: mind the local key index kL
  • 38. Data Expansion via HistoPyramid: HP Traversal (multiple output copies)
      • Traversal for multiple output elements
      • kL identifies which copy is being written. Still: one thread for each copy!
  • 39. Data Expansion via HistoPyramid: HP Traversal (multiple output copies)
      • Traversal for multiple output elements
      • kL identifies which copy is being written.
      • Observation: a thread can modify its input before write-out!
      • Thus: the output can be a modified version of the input, based on kL.
      • E.g. geometry creation
      • (Generic algorithm…)
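The expansion traversal can be sketched as below. It is a sketch under assumptions: a 2:1 reduction over a 1D stream, a hypothetical `count_of` predicate giving the number of copies per input element, and a hypothetical `emit` callback that derives each output from the input element and its local key index (`k_local`, the kL of the slides).

```python
def build_histopyramid(counts):
    """2:1 reduction over per-element output counts; levels[-1][0] is the total."""
    size = 1
    while size < len(counts):
        size *= 2
    levels = [list(counts) + [0] * (size - len(counts))]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def traverse(levels, key):
    """Map an output key to (input index, local key index kL)."""
    index = 0
    for level in reversed(levels[:-1]):
        index *= 2
        left = level[index]
        if key >= left:  # skip the left subtree and all outputs it produces
            key -= left
            index += 1
    return index, key  # the remaining key is the copy number within the element


def expand(elements, count_of, emit):
    levels = build_histopyramid([count_of(e) for e in elements])
    output = []
    for key in range(levels[-1][0]):  # conceptually: one GPU thread per output
        index, k_local = traverse(levels, key)
        output.append(emit(elements[index], k_local))
    return output


copies = {"a": 2, "b": 0, "c": 3, "d": 1}  # illustrative predicate values
expanded = expand(["a", "b", "c", "d"], copies.get, lambda e, k: f"{e}{k}")
# expanded == ['a0', 'a1', 'c0', 'c1', 'c2', 'd0']
```

An element with count 0 simply produces no output, so compaction falls out as the special case where every count is 0 or 1.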
  • 40. Data Expansion: Eikonal Rendering (Publication I)
      • Compute light transport through volume objects of varying refraction
      • Both real-time rendering and precomputed lighting simulation
      • Lighting simulation requires adaptive simulation of the light wavefront
      • I. Ihrke, G. Ziegler, A. Tevs, C. Theobalt, M. Magnor, H.-P. Seidel. Eikonal Rendering: Efficient Light Transport in Refractive Objects. ACM Transactions on Graphics 26 (3): 59-1 - 59-8, 2007. http://www.mpi-inf.mpg.de/resources/EikonalRendering/
  • 41. Eikonal Rendering: Lighting Simulation
      • For a given light-object position, precompute the lighting inside the volumetric object for real-time novel-view rendering.
  • 42. Eikonal Rendering: Wavefront Propagation
      • Lighting simulation implements a numerical ODE solver on the GPU.
      • Subdivide the light's wavefront into a set of patches
      • Patch corners move as a GPU particle system
          – Each particle follows ray optics
      • During the update, some patches:
          – weaken too much (discard)
          – leave the volume (discard)
          – grow too large (tessellate)
      • Since the patch list is on the GPU:
          – Discard: data compaction
          – Tessellate: data expansion
  • 45. Data Expansion: Marching Cubes (Publication II)
      • The Marching Cubes algorithm extracts iso-surfaces from volumes
      • Reformulated: the stream of voxels ...
          – is first compacted to the relevant iso-surface voxels
          – then expanded, becoming a stream of triangle vertices
      • C. Dyken, G. Ziegler, C. Theobalt, H.-P. Seidel. High-speed Marching Cubes using HistoPyramids. Computer Graphics Forum 27 (8): 2028-2039, 2008. http://www.sintef.no/hpmc
  • 46. Performance of OpenGL approach (2007)
      • The geometry shader (GS), e.g. on NVIDIA GeForce 8, enabled hardware data compaction & expansion for geometry. This should have made HistoPyramids obsolete, but HP-MC outperforms the geometry shader (HP-GS)!
      • In 2007, HP-MC was the fastest known MC algorithm.
      • [Figure: performance chart, in frames per second]
  • 48. Conclusion and Outlook
      • GPUs are increasingly useful in general data processing
      • Programming model restrictions are not always bad
          – They force the programmer to change their thought model
          – E.g. the fixed output location led to the HistoPyramid traversal concept
          – Can be more efficient, even on more capable hardware! (Atomic counters and geometry shaders deliver less performance)
      • Data-parallel algorithm design is hard
          – But once done, it is parallelizable over any number of available cores (if sufficient data is available)
          – Hard to imagine that auto-parallelization can achieve this
      • Future work
          – Connected components, distance transforms, SATs…
          – Accelerate further using CUDA C and OpenCL
  • 49. Other work based on presented algorithms
      • Quadtree
          – C. N. Vasconcelos, A. Sá, P. C. Carvalho, M. Gattass. QuadN4tree: A GPU-Friendly Quadtree Leaves Neighborhood Structure. Proc. of Computer Graphics International Conference (CGI) 2008.
          – C. N. Vasconcelos, A. Sá, P. C. Carvalho, M. Gattass. Using Quadtrees for Energy Minimization Via Graph Cuts. Proc. of VMV - 12th Vision, Modeling, and Visualization Workshop, pp. 71-80.
      • Data Expansion
          – C. Dyken, M. Reimers, J. Seland. Real-time GPU Silhouette Refinement using adaptively blended Bézier patches. Computer Graphics Forum, Volume 27, number 1, pp. 1-12, 2007.
      • Data Compaction (Implementation)
          – J. Fung, S. Mann. OpenVIDIA: parallel GPU computer vision. Proc. of 13th annual ACM international conference on Multimedia, pp. 849 - 852. http://openvidia.sf.net
  • 52. GPU-accelerated data expansion for the Marching Cubes algorithm
      • Christopher Dyken, SINTEF Norway; Gernot Ziegler, NVIDIA UK
      • San Jose (CA) | September 23rd, 2010
  • 53. HistoPyramid performance
      • Accelerated HistoPyramids using CUDA C
      • HistoPyramid buildup
          – Reduce 5-to-1, but store only the first four sums!
          – Build several levels via on-GPU shared memory (fewer video memory transactions)
      • Marching Cubes specific
          – Share scalar input data amongst neighbouring MC cells (through shared memory)
  • 54. 512^3-ish 16-bit performance
      Dataset                    | Source          | Size (memory)         | Triangles (tris/cell) | OpenGL HP4MC        | CUDA-OpenGL HP5MC    | Speedup
      Backpack (iso=0.4)         | www.volvis.org  | 512x512x373 (187 MB)  | 3 745 320 (0.039)     | 13 fps (1291 mvps)  | 43 fps (4129 mvps)   | 3.2x
      Head aneurysm (iso=0.4)    | www.volvis.org  | 512x512x512 (256 MB)  | 583 610 (0.004)       | 15 fps (2034 mvps)  | 78 fps (10399 mvps)  | 5.1x
      Christmas tree (iso=0.05)  | TU Wien         | 512x499x512 (250 MB)  | 5 629 532 (0.043)     | 10 fps (1358 mvps)  | 28 fps (3704 mvps)   | 2.7x
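The listed speedups do not match the ratios of the rounded frame rates (43/13 is about 3.3, not 3.2), but they do match the ratios of the mvps figures. The short check below makes that arithmetic explicit. Reading mvps as a throughput measure (presumably million voxels per second) is an assumption; the numbers themselves are copied from the slide.

```python
# (OpenGL HP4MC mvps, CUDA-OpenGL HP5MC mvps), as listed on the slide
throughput = {
    "Backpack": (1291, 4129),
    "Head aneurysm": (2034, 10399),
    "Christmas tree": (1358, 3704),
}

# Speedup as the ratio of the two throughput figures, rounded to one decimal
speedups = {
    name: round(cuda / opengl, 1)
    for name, (opengl, cuda) in throughput.items()
}
# speedups == {'Backpack': 3.2, 'Head aneurysm': 5.1, 'Christmas tree': 2.7}
```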