2010/2/25
GPGPU
Ked
Result
Computation of Normal Vector
Image resolution: 640 x 480
CPU: 625 clock time
GPU: 125 clock time
Result
Computation of Normal Vector
Image resolution: 1280 x 1024
CPU: 2500 clock time
GPU: 172 clock time
OK, what is a GPU?
• A graphics accelerator incorporates custom microchips that implement special mathematical operations commonly used in graphics rendering.
GPGPU
• General-purpose computing on the GPU
  • GPGPU
  • GPGP
  • GP2
(boring RD -.-||)
hi, I am R2-D2
Why faster?
CPU                GPU
General purpose    Specialized hardware
Serial execution   Parallel execution
Minimum latency    Maximum throughput
Development tools:
Focus on GPGPU
• CUDA:
  • Compute Unified Device Architecture
  • Developed by NVIDIA
  • C-like language
  • Full development environment
    • Compiler
    • Debugger
    • Math libraries
Development tools:
Focus on GPGPU
• Advantages:
  • Shared memory among threads
    • 16 KB
  • Faster downloads and readbacks to and from the GPU
  • Full support for integer and bitwise operations
Development tools:
Shader programming
• ARB low-level assembly language
• OpenGL Shading Language (GLSL)
• Cg programming language
• DirectX High-Level Shader Language (HLSL)
Development tools:
Shader programming
• Development tools for GLSL:
Pipeline of GPU processing
Shader programming
Vertex shader
Fragment shader
Geometry shader
RenderMan shading language
• Developed by Pixar, with uncompromising image quality as its fundamental goal
• Light shader
• Displacement shader
• Surface shader
• Volume shader
• Imager shader
Vertex shader Fragment shader
Streaming of fragment shader
• Stream processing is a computer programming paradigm, related to SIMD, that allows some applications to more easily exploit a limited form of parallel processing. Such applications can use multiple computational units, such as the floating-point units on a GPU, without explicitly managing allocation, synchronization, or communication among those units.
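The stream-processing idea above can be sketched on the CPU. This is only an illustration of the programming model, not of GPU performance: `run_kernel` and `scale_and_bias` are hypothetical names, and a Python thread pool stands in for the GPU's computational units.

```python
from concurrent.futures import ThreadPoolExecutor


def run_kernel(kernel, stream, units=4):
    """Apply `kernel` to every stream element; the pool assigns elements to workers."""
    with ThreadPoolExecutor(max_workers=units) as pool:
        # pool.map() returns results in input order, so the caller never
        # allocates workers or synchronizes them by hand.
        return list(pool.map(kernel, stream))


def scale_and_bias(x):
    """Hypothetical per-element kernel: reads one input, produces one output."""
    return 2.0 * x + 1.0


if __name__ == "__main__":
    print(run_kernel(scale_and_bias, [0.0, 1.0, 2.0, 3.0]))
```

The kernel sees one element at a time and shares no state with other invocations, which is the restriction that lets the runtime schedule the work freely.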
Branching in the fragment shader
Concept of GPGPU
• Textures => Computing arrays
• Vertex coordinates => Computational range
• Fragment programs => Computation
• Read from framebuffer => Get result
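The four mappings above can be emulated in plain Python to make the roles concrete. This is a CPU-side sketch, not OpenGL code: `render_pass` and `square` are hypothetical names, the nested list plays the texture, and the loop bounds play the quad's vertex coordinates.

```python
def render_pass(texture, width, height, fragment_program):
    """Emulate drawing a quad: run the fragment program once per covered pixel."""
    framebuffer = [[0.0] * width for _ in range(height)]
    for y in range(height):      # vertex coordinates => computational range
        for x in range(width):
            # fragment program => computation for one output element
            framebuffer[y][x] = fragment_program(texture, x, y)
    return framebuffer           # read from framebuffer => get result


def square(texture, x, y):
    """Hypothetical fragment program: fetch one texel and square it."""
    return texture[y][x] ** 2


if __name__ == "__main__":
    texture = [[1.0, 2.0], [3.0, 4.0]]   # texture => computing array
    print(render_pass(texture, 2, 2, square))
```

On a real GPU the two loops disappear: the rasterizer generates one fragment per pixel and the fragment program runs for all of them in parallel.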
Case study:
Computation of normal vector
• Normal(V0) = [ normal(F401) + normal(F102) + normal(F203) + normal(F304) ] / 4
• Normal(F102) = cross(v1v0, v2v0)
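A minimal sketch of the two formulas above, under stated assumptions: F102 is read as the triangle (v1, v0, v2), and "v1v0" as the edge vector between v1 and v0. The sign convention does not change the result, because negating both operands leaves a cross product unchanged. The face normals are averaged without normalization, as the slide shows no normalization step. All function names are hypothetical.

```python
def sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def face_normal(va, v0, vb):
    """Normal of face F(a,0,b): cross of the two edges that meet at v0."""
    return cross(sub(va, v0), sub(vb, v0))


def vertex_normal(v0, v1, v2, v3, v4):
    """Average the normals of the four faces F401, F102, F203, F304 around v0."""
    faces = [(v4, v1), (v1, v2), (v2, v3), (v3, v4)]
    total = [0.0, 0.0, 0.0]
    for va, vb in faces:
        n = face_normal(va, v0, vb)
        for i in range(3):
            total[i] += n[i]
    return tuple(c / 4 for c in total)


if __name__ == "__main__":
    # Flat patch in the z = 0 plane: the normal should point along +z.
    print(vertex_normal((0, 0, 0), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)))
```

In the GPGPU version, each fragment would run this computation for one pixel, fetching the four neighbours from a texture.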
Prepare:
Choose a graphics card
Prepare:
Test the graphics card
need
Use GLSL in the BCB environment:
Call the GLee library (other choice: GLEW)
Install shader:
Run-time building
Texture:
Computing array
Vertex coordinate:
Computational range
Fragment program:
Computation
Read from framebuffer:
Get result
A framebuffer object (FBO) is a better choice
Trivia:
Ghost in numerical computing
Review the result
Image resolution: 640 x 480
CPU: 625 clock time
GPU: 125 clock time
Image resolution: 1280 x 1024
CPU: 2500 clock time
GPU: 172 clock time
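The speedup implied by these timings is simple arithmetic; the sketch below uses only the four numbers reported above (the `speedup` helper is a hypothetical name, and the timing unit cancels out in the ratio).

```python
# Timings copied from the result slides, in the slides' own "clock time" unit.
timings = {
    (640, 480): {"cpu": 625, "gpu": 125},
    (1280, 1024): {"cpu": 2500, "gpu": 172},
}


def speedup(cpu, gpu):
    """Ratio of CPU time to GPU time; values above 1 mean the GPU was faster."""
    return cpu / gpu


if __name__ == "__main__":
    for (width, height), t in timings.items():
        print(f"{width} x {height}: {speedup(t['cpu'], t['gpu']):.1f}x")
```

The ratio is 5x at 640 x 480 and about 14.5x at 1280 x 1024, so the GPU's advantage grows with the image size in these measurements.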
Thx.

 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

GPU Accelerates Normal Vector Computation by 5x to About 14.5x Compared With CPU

  • 1. 2010/2/25
    GPGPU
    Ked

    Result: Computation of Normal Vector
      Image resolution: 640 x 480
      CPU: 625 clock time
      GPU: 125 clock time

    Result: Computation of Normal Vector
      Image resolution: 1280 x 1024
      CPU: 2500 clock time
      GPU: 172 clock time

    OK, What is GPU
      • A graphics accelerator incorporates custom microchips which contain special mathematical operations commonly used in graphics rendering.

    GPGPU
      • General purpose computing on GPU
      • GPGPU
      • GPGP
      • GP2 (boring RD -.-||)
    hi, I am R2-D2

    Why faster
  • 2. Why faster
      CPU                 GPU
      General purpose     Specialized hardware
      Serial execution    Parallel execution
      Minimum latency     Maximum throughput

    Development tools: Focus on GPGPU
      • CUDA:
        • Compute Unified Device Architecture
        • Developed by NVIDIA
        • C-like language
        • Full development environment
          • Compiler
          • Debugger
          • Math libraries

    Development tools: Focus on GPGPU
      • Advantage:
        • Shared memory amongst threads
          • 16 KB
        • Faster downloads and readbacks to and from GPU
        • Full support for integer and bitwise operations

    Development tools: Shader programming
      • ARB low-level assembly language
      • OpenGL shading language
      • Cg programming language
      • DirectX high-level shader language
  • 3. Development tools: Shader programming
      • Development tools for GLSL:

    Pipeline of GPU processing

    Shader programming
      Vertex shader
      Fragment shader
      Geometry shader

    RenderMan shading language
      • Developed by Pixar, with uncompromising image quality as its fundamental goal
        • Light shader
        • Displacement shader
        • Surface shader
        • Volume shader
        • Imager shader

    Vertex shader
    Fragment shader
  • 4. Streaming of fragment shader
      • Stream processing is a computer programming paradigm, related to SIMD, that allows some applications to more easily exploit a limited form of parallel processing. Such applications can use multiple computational units, such as the floating point units on a GPU, without explicitly managing allocation, synchronization, or communication among those units.

    Branching in the fragment shader

    Concept of GPGPU
      • Textures => Computing arrays
      • Vertex coordinates => Computational range
      • Fragment programs => Computation
      • Read from framebuffer => Get result

    Case study: Computation of normal vector
      • Normal(V0) = [ normal(F401) + normal(F102) + normal(F203) + normal(F304) ] / 4
      • Normal(F102) = cross(v1v0, v2v0)

    Prepare: Choose graphics card
    Prepare: Test the graphics card requirements
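The case-study formula can be checked on the CPU with a minimal Python sketch. It rests on assumptions the slides do not spell out: `v1v0` is read as the edge vector `v1 - v0`, the neighbours `v1..v4` are ordered around `v0` so that the four faces are F401, F102, F203, and F304, and the face normals are averaged without being normalized first, exactly as the formula is written. The helper names `sub`, `cross`, `face_normal`, and `vertex_normal` are illustrative only.

```python
def sub(a, b):
    """Component-wise a - b for 3D tuples."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    """Cross product of two 3D tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def face_normal(v0, va, vb):
    """Normal of the face (va, v0, vb): cross(va - v0, vb - v0)."""
    return cross(sub(va, v0), sub(vb, v0))


def vertex_normal(v0, v1, v2, v3, v4):
    """Average of the four face normals that share v0 (not normalized)."""
    faces = [
        face_normal(v0, v4, v1),  # F401
        face_normal(v0, v1, v2),  # F102
        face_normal(v0, v2, v3),  # F203
        face_normal(v0, v3, v4),  # F304
    ]
    return tuple(sum(n[i] for n in faces) / 4 for i in range(3))


# A flat patch in the z = 0 plane: every face normal points along +z.
print(vertex_normal((0, 0, 0), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)))
# prints (0.0, 0.0, 1.0)
```

In the GPGPU version this per-vertex function would be the body of the fragment program, evaluated once per pixel of the input image.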
  • 5. Use GLSL in BCB environment:
      Call GLee library
      Other choice: GLEW

    Install shader: Run-time building

    Texture: Computing array
    Vertex coordinate: Computational range
    Fragment program: Computation
    Read from framebuffer: Get result
      Framebuffer Object is a better choice
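The four-step mapping (texture, vertex coordinate, fragment program, framebuffer readback) can be imitated on the CPU to make the roles concrete. This is a conceptual sketch only, not OpenGL code: `render_pass` and `add_scaled` are hypothetical names, the "textures" are plain nested lists, and the loop runs serially, whereas a GPU would evaluate the fragment program for many pixels in parallel.

```python
def render_pass(textures, width, height, fragment_program):
    """Emulate one GPGPU pass.

    textures         -> the input computing arrays
    width, height    -> the computational range (the quad that is drawn)
    fragment_program -> the computation, called once per output pixel
    return value     -> the framebuffer contents that would be read back
    """
    framebuffer = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            framebuffer[y][x] = fragment_program(textures, x, y)
    return framebuffer


def add_scaled(textures, x, y):
    """Example fragment program: a + 2 * b at the current pixel."""
    a, b = textures
    return a[y][x] + 2.0 * b[y][x]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[10.0, 20.0], [30.0, 40.0]]
result = render_pass((a, b), 2, 2, add_scaled)
print(result)  # prints [[21.0, 42.0], [63.0, 84.0]]
```

Each output pixel depends only on the input textures and its own coordinates, which is what lets the real hardware run the fragment program for all pixels independently.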
  • 6. Trivia: Ghost in numerical computing

    Review the result
      Image resolution: 640 x 480
        CPU: 625 clock time
        GPU: 125 clock time
      Image resolution: 1280 x 1024
        CPU: 2500 clock time
        GPU: 172 clock time

    Reference
      • GPU Gems 2
      • OpenGL Shading Language
      • OpenGL Programming Guide
      • Dominik Göddeke -- GPGPU::Basic Math Tutorial (website)
      • GPGPU: SIGGRAPH 2004 course
      • Batch, batch, batch: what does it really mean?

    Thx.
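The speedup implied by the reviewed timings can be worked out with a short Python sketch. The numbers are the ones reported on the slides; the `timings` table and the `speedup` helper are illustrative names, and the unit ("clock time") is taken as given.

```python
# Timings as reported on the slides, keyed by image resolution.
timings = {
    (640, 480): {"cpu": 625, "gpu": 125},
    (1280, 1024): {"cpu": 2500, "gpu": 172},
}


def speedup(cpu_time, gpu_time):
    """How many times faster the GPU run is than the CPU run."""
    return cpu_time / gpu_time


for (width, height), t in timings.items():
    print(f"{width} x {height}: {speedup(t['cpu'], t['gpu']):.1f}x")
# prints:
# 640 x 480: 5.0x
# 1280 x 1024: 14.5x
```

The larger image has about 4.3 times as many pixels, and the CPU time grows by 4x while the GPU time grows by only about 1.4x, so the GPU advantage widens as the workload grows.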