
GPU Ray Tracing with CUDA
By Tom Pitkin

Bill Clark, PhD
Stu Steiner, MS, PhC
Objectives


Develop a sequential CPU and parallel GPU ray tracer



Illustrate the difference in rendering speed and design between a CPU and a GPU ray tracer

Outline


Introduction to Ray Tracing



CUDA



Parallelization with CUDA / Results



Future Work



Questions

What is Ray Tracing?


Rendering technique used in computer graphics



Simulates the behavior of light



Can produce advanced optical effects

Light in the Physical World

[Figure: pinhole camera diagram with the labels Light Source, Object with Red Reflectivity, Pinhole, and Film]

The Virtual Camera Model


Eye Position – camera location in 3D space



Reference Point – point in 3D space where the camera is pointing



Orientation Vectors (u, v, n) – camera orientation in 3D space



Image Plane – projected plane of the camera’s field of view

[Figure: virtual camera showing the Eye Position, the Reference Point, and the orientation vectors u, v (Up Vector), and n]
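
The camera parameters above can be turned into an orthonormal frame. The following is a minimal C++ sketch, not the author's code. It assumes a common convention in which n points from the reference point back toward the eye, u points to the camera's right, and v is the corrected up vector. The names Vec3, CameraBasis, makeCameraBasis, and upHint are hypothetical.

```cpp
#include <cmath>

// Minimal 3D vector type used only for this sketch.
struct Vec3 { double x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    double len = std::sqrt(dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}

// Orthonormal camera frame (u, v, n).
struct CameraBasis { Vec3 u, v, n; };

// Assumed convention: n points from the reference point toward the eye,
// u points to the camera's right, v is the up vector made perpendicular to n.
// upHint must not be parallel to the viewing direction.
CameraBasis makeCameraBasis(Vec3 eye, Vec3 reference, Vec3 upHint) {
    Vec3 n = normalize(eye - reference);
    Vec3 u = normalize(cross(upHint, n));
    Vec3 v = cross(n, u);  // unit length because n and u are orthonormal
    return {u, v, n};
}
```

For an eye at (0, 0, 5) looking at the origin with an up hint of (0, 1, 0), this yields u = (1, 0, 0), v = (0, 1, 0), and n = (0, 0, 1).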

Ray Generation


Map the physical screen to the image plane



Divide the image plane into a uniform grid of pixel locations




Send a ray through the center of each pixel location

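[Figure: a ray from the Eye Position through a Pixel on the image plane. The pixel is labeled with the ratios Image Plane Width / Screen Width and Image Plane Height / Screen Height]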
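
The three steps above can be sketched in C++ as follows. This is an illustrative sketch under stated assumptions rather than the code from the thesis: the image plane is assumed to sit at a distance planeDistance in front of the eye along -n, row 0 is assumed to be the top row, and the names Camera, Ray, and generateRay are hypothetical.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

struct Camera {
    Vec3 eye;              // eye position
    Vec3 u, v, n;          // orthonormal orientation vectors
    double planeWidth;     // image plane width in world units
    double planeHeight;    // image plane height in world units
    double planeDistance;  // eye-to-image-plane distance (assumed parameter)
};

struct Ray { Vec3 origin, direction; };

// Builds the ray through the center of pixel (col, row) on a
// screenWidth x screenHeight grid laid over the image plane.
Ray generateRay(const Camera& cam, int col, int row, int screenWidth, int screenHeight) {
    // Size of one pixel on the image plane.
    double pixelWidth = cam.planeWidth / screenWidth;
    double pixelHeight = cam.planeHeight / screenHeight;
    // Offset of the pixel center from the center of the image plane.
    double px = -cam.planeWidth / 2.0 + (col + 0.5) * pixelWidth;
    double py = cam.planeHeight / 2.0 - (row + 0.5) * pixelHeight;
    // Direction in world space: px along u, py along v, forward along -n.
    Vec3 d = {px * cam.u.x + py * cam.v.x - cam.planeDistance * cam.n.x,
              px * cam.u.y + py * cam.v.y - cam.planeDistance * cam.n.y,
              px * cam.u.z + py * cam.v.z - cam.planeDistance * cam.n.z};
    double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return {cam.eye, {d.x / len, d.y / len, d.z / len}};
}
```

With a 3 x 3 screen, the ray for the middle pixel (1, 1) points straight along -n, because that pixel's center coincides with the center of the image plane.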
Ray Intersection Testing


Ray – Sphere Intersection



Ray – Triangle Intersection
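
As an illustration of the ray-sphere case, the sketch below solves the standard quadratic for the ray parameter t. The slides do not show the author's implementation, so the function name intersectSphere, its signature, and the tolerance eps are assumptions.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Substituting origin + t * dir into |p - center|^2 = radius^2 gives
// a*t^2 + b*t + c = 0. Returns true and writes the nearest hit distance
// in front of the origin to t; returns false if the ray misses.
bool intersectSphere(Vec3 origin, Vec3 dir, Vec3 center, double radius, double& t) {
    const double eps = 1e-9;  // rejects hits at or behind the ray origin
    Vec3 oc = {origin.x - center.x, origin.y - center.y, origin.z - center.z};
    double a = dot(dir, dir);
    double b = 2.0 * dot(oc, dir);
    double c = dot(oc, oc) - radius * radius;
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return false;  // no real roots: the ray misses
    double root = std::sqrt(disc);
    double t0 = (-b - root) / (2.0 * a);  // nearer root
    double t1 = (-b + root) / (2.0 * a);  // farther root
    if (t0 > eps) { t = t0; return true; }
    if (t1 > eps) { t = t1; return true; }  // origin is inside the sphere
    return false;
}
```

A ray starting at (0, 0, 5) and travelling along (0, 0, -1) hits a unit sphere at the origin at t = 4.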

Phong Reflection Model

Ambient + Diffuse + Specular = Phong Reflection
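
The sum of the three terms can be sketched for a single light and a single color channel as below. The coefficient names ka, kd, ks, and shininess follow the usual textbook form of the Phong model and are assumptions here, as are the names Material and phong. All direction vectors are assumed to be unit length.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Reflection coefficients of a surface (hypothetical layout).
struct Material { double ka, kd, ks, shininess; };

// n: surface normal, l: direction to the light, v: direction to the viewer.
// Returns ambient + diffuse + specular intensity for one color channel.
double phong(Vec3 n, Vec3 l, Vec3 v, const Material& m,
             double ambientLight, double lightIntensity) {
    double ambient = m.ka * ambientLight;
    double nDotL = dot(n, l);
    if (nDotL <= 0.0) return ambient;  // light is behind the surface
    double diffuse = m.kd * nDotL * lightIntensity;
    // r is l mirrored about n.
    Vec3 r = {2.0 * nDotL * n.x - l.x, 2.0 * nDotL * n.y - l.y, 2.0 * nDotL * n.z - l.z};
    double rDotV = std::max(0.0, dot(r, v));
    double specular = m.ks * std::pow(rDotV, m.shininess) * lightIntensity;
    return ambient + diffuse + specular;
}
```

A color image would evaluate this once per channel, with per-channel coefficients.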
Specular Reflection


Recursive Ray Tracing

What is CUDA?


Compute Unified Device Architecture (CUDA)



Parallel computing platform



Developed by Nvidia

Kernel Functions


Specifies the code to be executed in parallel



Single Program, Multiple Data (SPMD)

Kernel Execution


Grids



Blocks



Threads

Memory Model


Global Memory



Constant Memory



Texture Memory



Registers



Local Memory



Shared Memory

Thread Organization


2D array of blocks



2D array of threads



Each thread represents a ray

[Figure: the Image Plane divided into a 3 x 2 grid of blocks, Block (0, 0) through Block (2, 1)]
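
The mapping from a thread to its pixel can be emulated on the CPU as in the sketch below. It mirrors the standard CUDA expression blockIdx * blockDim + threadIdx in each dimension, but it is plain C++ rather than a kernel. The 16 x 16 block size used in the example is an assumption and is not stated in the slides, and the names Dim2, blocksNeeded, pixelForThread, and countRays are hypothetical.

```cpp
struct Dim2 { int x, y; };

// Number of blocks needed to cover `pixels` with blocks of `blockSize`
// threads, rounding up so the image edge is always covered.
int blocksNeeded(int pixels, int blockSize) {
    return (pixels + blockSize - 1) / blockSize;
}

// Pixel handled by one thread: blockIdx * blockDim + threadIdx per dimension.
Dim2 pixelForThread(Dim2 blockIdx, Dim2 blockDim, Dim2 threadIdx) {
    return {blockIdx.x * blockDim.x + threadIdx.x,
            blockIdx.y * blockDim.y + threadIdx.y};
}

// Visits every thread of every block the way a 2D launch would and
// counts the threads whose pixel lies inside the image (one ray each).
int countRays(int width, int height, Dim2 blockDim) {
    Dim2 gridDim = {blocksNeeded(width, blockDim.x), blocksNeeded(height, blockDim.y)};
    int rays = 0;
    for (int by = 0; by < gridDim.y; ++by)
        for (int bx = 0; bx < gridDim.x; ++bx)
            for (int ty = 0; ty < blockDim.y; ++ty)
                for (int tx = 0; tx < blockDim.x; ++tx) {
                    Dim2 p = pixelForThread({bx, by}, blockDim, {tx, ty});
                    if (p.x < width && p.y < height) ++rays;  // threads past the edge do nothing
                }
    return rays;
}
```

With 16 x 16 blocks, a 48 x 32 image needs a 3 x 2 grid of blocks. When the image size is not a multiple of the block size, the bounds check keeps the extra threads idle.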
Testing Environment


OS – Ubuntu Gnome Remix 13.04

CPU – Core i7-920
  Core Clock – 2.66 GHz

GPU – Nvidia GTX 570
  Core Clock – 742 MHz
  CUDA Cores – 480
  Memory Clock – 3800 MHz
  Video Memory – GDDR5, 1280 MB

Test Objects


Teapot
  Surfaces: 1
  Triangles: 992

Al
  Surfaces: 174
  Triangles: 7,124

Crocodile
  Surfaces: 6
  Triangles: 34,404

Single Kernel

Render time in milliseconds (chart plotted on a logarithmic axis from 1 to 10,000,000)

Model       Surfaces   Triangles   Single Kernel       Single Thread
Teapot      1          992         160 (0.16 sec)      23,003 (23 sec)
Al          174        7,124       411 (0.41 sec)      55,260 (55.26 sec)
Crocodile   6          34,404      5,867 (5.87 sec)    1,617,160 (26.95 min)

Kernel Complexity and Size


Driver timeout



Register Spilling

Replacing Recursion


Iterative Loop

Layer-based stack
  Layers store color values returned from rays
  Final image from convex combination of layers

Multi-Kernel

Render time in milliseconds (chart plotted on a logarithmic axis from 1 to 100,000)

Model       Surfaces   Triangles   Multi-Kernel          Single Kernel (Previous Kernel)
Teapot      1          992         381 (0.38 sec)        160 (0.16 sec)
Al          174        7,124       967 (0.97 sec)        411 (0.41 sec)
Crocodile   6          34,404      13,217 (13.22 sec)    5,867 (5.87 sec)

Multi-Kernel with Single-Precision Floating Points

Render time in milliseconds (chart plotted on a logarithmic axis from 1 to 100,000)

Model       Surfaces   Triangles   Multi-Kernel with Single-Precision Floating Points   Multi-Kernel (Previous Kernel)
Teapot      1          992         46 (0.05 sec)                                        381 (0.38 sec)
Al          174        7,124       118 (0.12 sec)                                       967 (0.97 sec)
Crocodile   6          34,404      1,556 (1.56 sec)                                     13,217 (13.22 sec)

Caching Surface Data


Object’s surface data stored in shared memory

All threads in the same block have access to the cached surface data

Removes duplicate memory requests

Data reuse

Multi-Kernel with Surface Caching

Render time in milliseconds (chart plotted on a logarithmic axis from 1 to 10,000)

Model       Surfaces   Triangles   Multi-Kernel with Surface Caching   Multi-Kernel with Single-Precision Floating Points (Previous Kernel)
Teapot      1          992         30 (0.03 sec)                       46 (0.05 sec)
Al          174        7,124       133 (0.13 sec)                      118 (0.12 sec)
Crocodile   6          34,404      1,007 (1.01 sec)                    1,556 (1.56 sec)

Simplifying Mesh Data

Triangle data originally stored as three points (vertices)

Optimize data by storing triangles as one point (vertex) and two edges
  Calculate edges on host before kernel call

[Figure: example triangle labeled with the coordinate pairs (0.5, 1), (0, 0), (0.5, 1), and (1, 0)]
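
A vertex-plus-two-edges layout pairs naturally with the Möller–Trumbore intersection test, which needs exactly those two edge vectors. The slides do not say which triangle test the author used, so Möller–Trumbore is swapped in here purely as an illustration. The names EdgeTriangle, makeEdgeTriangle, and intersectTriangle are hypothetical.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Triangle stored as one vertex plus two edge vectors.
struct EdgeTriangle { Vec3 v0, edge1, edge2; };

// Host-side conversion from three vertices, done once before rendering.
EdgeTriangle makeEdgeTriangle(Vec3 a, Vec3 b, Vec3 c) { return {a, b - a, c - a}; }

// Moller-Trumbore ray-triangle test. Returns true and writes the hit
// distance to t when the ray hits the triangle in front of its origin.
bool intersectTriangle(Vec3 origin, Vec3 dir, const EdgeTriangle& tri, double& t) {
    const double eps = 1e-9;
    Vec3 p = cross(dir, tri.edge2);
    double det = dot(tri.edge1, p);
    if (std::fabs(det) < eps) return false;  // ray is parallel to the triangle
    double inv = 1.0 / det;
    Vec3 s = origin - tri.v0;
    double u = dot(s, p) * inv;              // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return false;
    Vec3 q = cross(s, tri.edge1);
    double v = dot(dir, q) * inv;            // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return false;
    t = dot(tri.edge2, q) * inv;
    return t > eps;
}
```

Because the edges are computed once up front, the per-ray test skips two vector subtractions for every triangle it visits. For the triangle (0, 0, 0), (1, 0, 0), (0.5, 1, 0), the stored edges are (1, 0, 0) and (0.5, 1, 0).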
Multi-Kernel with Mesh Optimization

Render time in milliseconds (chart plotted on a logarithmic axis from 1 to 10,000)

Model       Surfaces   Triangles   Multi-Kernel with Mesh Optimization   Multi-Kernel with Surface Caching (Previous Kernel)
Teapot      1          992         27 (0.03 sec)                         30 (0.03 sec)
Al          174        7,124       127 (0.13 sec)                        133 (0.13 sec)
Crocodile   6          34,404      873 (0.87 sec)                        1,007 (1.01 sec)

Final Results

Render time in milliseconds (chart plotted on a logarithmic axis from 1 to 10,000,000)

Model       Surfaces   Triangles   Multi-Kernel with Intersection Optimization   Single Thread
Teapot      1          992         27 (0.03 sec)                                 23,003 (23 sec)
Al          174        7,124       127 (0.13 sec)                                55,260 (55.26 sec)
Crocodile   6          34,404      873 (0.87 sec)                                1,617,160 (26.95 min)
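
The final timings can be read as speedup ratios, single-thread time divided by GPU time. The short sketch below only does that arithmetic on the numbers reported above. The names Result, kFinalResults, speedup, and printSpeedups are hypothetical.

```cpp
#include <cstdio>

struct Result {
    const char* model;
    double singleThreadMs;  // single-threaded CPU render time
    double gpuMs;           // final GPU kernel render time
};

// Timings copied from the Final Results slide.
const Result kFinalResults[] = {
    {"Teapot", 23003.0, 27.0},
    {"Al", 55260.0, 127.0},
    {"Crocodile", 1617160.0, 873.0},
};

// Speedup of the GPU version over the single-threaded CPU version.
double speedup(const Result& r) { return r.singleThreadMs / r.gpuMs; }

// Prints one line per model with the speedup rounded to a whole number.
void printSpeedups() {
    for (const Result& r : kFinalResults)
        std::printf("%-10s %6.0fx\n", r.model, speedup(r));
}
```

The ratios work out to roughly 852x for the Teapot, 435x for Al, and 1,852x for the Crocodile.
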
Future Work


Spatial partitioning



Multiple GPUs



Optimize code for different GPUs

Questions?

Computer Science Thesis Defense

  • 1. 1 GPU Ray Tracing with CUDA BY TOM PITKIN Bill Clark, PhD Stu Steiner, MS, PhC
  • 2. Objectives  Develop a sequential CPU and parallel GPU ray tracer  Illustrate the difference in rendering speed and design of a CPU and GPU ray tracer 2
  • 3. Outline  Introduction to Ray Tracing  CUDA  Parallelization with CUDA / Results  Future Work  Questions 3
  • 4. What is Ray Tracing?  Rendering technique used in computer graphics  Simulates the behavior of light  Can produce advanced optical effects 4
  • 5. Light in the Physical World 5 Light Source Film Object with Red Reflectivity Pinhole
  • 6. The Virtual Camera Model  Eye Position – camera location in 3D space  Reference Point – point in 3D space where the camera is pointing  Orientation Vectors (u, v, n) – camera orientation in 3D space  Image Plane – projected plane of the camera’s field of view Reference Point v (Up Vector) n u Eye Position 6
  • 7. Ray Generation  Map the physical screen to the image plane  Divide the image plane into a uniform grid of pixel locations  7 Send a ray through the center of each pixel location 𝐼𝑚𝑎𝑔𝑒 𝑃𝑙𝑎𝑛𝑒 𝐻𝑒𝑖𝑔ℎ𝑡 𝑆𝑐𝑟𝑒𝑒𝑛 𝐻𝑒𝑖𝑔ℎ𝑡 Pixel Eye Position 𝐼𝑚𝑎𝑔𝑒 𝑃𝑙𝑎𝑛𝑒 𝑊𝑖𝑑𝑡ℎ 𝑆𝑐𝑟𝑒𝑒𝑛 𝑊𝑖𝑑𝑡ℎ
  • 8. Ray Intersection Testing: Ray-sphere intersection; ray-triangle intersection
  • 11. Outline: Introduction to Ray Tracing; CUDA; Parallelization with CUDA / Results; Future Work; Questions
  • 12. What is CUDA? Compute Unified Device Architecture (CUDA); a parallel computing platform; developed by Nvidia.
  • 13. Kernel Functions: Specify the code to be executed in parallel; Single Program, Multiple Data (SPMD)
  • 15. Memory Model: Global Memory; Constant Memory; Texture Memory; Registers; Local Memory; Shared Memory
  • 16. Outline: Introduction to Ray Tracing; CUDA; Parallelization with CUDA / Results; Future Work; Questions
  • 17. Thread Organization: 2D array of blocks; 2D array of threads; each thread represents a ray. [Figure: image plane divided into Block (0, 0), Block (1, 0), Block (2, 0), Block (0, 1), Block (1, 1), Block (2, 1)]
  • 18. Testing Environment: OS: Ubuntu Gnome Remix 13.04; CPU: Core i7-920 (core clock 2.66 GHz); GPU: Nvidia GTX 570 (core clock 742 MHz, 480 CUDA cores, memory clock 3800 MHz, 1280 MB GDDR5 video memory)
  • 19. Test Objects: Teapot (Surfaces: 1, Triangles: 992); Al (Surfaces: 174, Triangles: 7,124); Crocodile (Surfaces: 6, Triangles: 34,404)
  • 20. Single Kernel. Time in milliseconds (logarithmic axis), Single Kernel vs. Single Thread: Teapot (Surfaces: 1, Triangles: 992): 160 (0.16 sec) vs. 23,003 (23 sec); Al (Surfaces: 174, Triangles: 7,124): 411 (0.41 sec) vs. 55,260 (55.26 sec); Crocodile (Surfaces: 6, Triangles: 34,404): 5,867 (5.87 sec) vs. 1,617,160 (26.95 min)
  • 21. Kernel Complexity and Size: Driver timeout; register spilling
  • 22. Replacing Recursion: Iterative loop; layer-based stack, where layers store the color values returned from rays; final image formed from a convex combination of the layers
  • 23. Multi-Kernel. Time in milliseconds (logarithmic axis), Multi-Kernel vs. Single Kernel (previous kernel): Teapot (Surfaces: 1, Triangles: 992): 381 (0.38 sec) vs. 160 (0.16 sec); Al (Surfaces: 174, Triangles: 7,124): 967 (0.97 sec) vs. 411 (0.41 sec); Crocodile (Surfaces: 6, Triangles: 34,404): 13,217 (13.22 sec) vs. 5,867 (5.87 sec)
  • 24. Multi-Kernel with Single-Precision Floating Points. Time in milliseconds (logarithmic axis), Multi-Kernel with Single-Precision Floating Points vs. Multi-Kernel (previous kernel): Teapot (Surfaces: 1, Triangles: 992): 46 (0.05 sec) vs. 381 (0.38 sec); Al (Surfaces: 174, Triangles: 7,124): 118 (0.12 sec) vs. 967 (0.97 sec); Crocodile (Surfaces: 6, Triangles: 34,404): 1,556 (1.56 sec) vs. 13,217 (13.22 sec)
  • 25. Caching Surface Data: An object's surface data is stored in shared memory; all threads in the same block have access to the cached surface data; removes duplicate memory requests; data reuse
  • 26. Multi-Kernel with Surface Caching. Time in milliseconds (logarithmic axis), Multi-Kernel with Surface Caching vs. Multi-Kernel with Single-Precision Floating Points (previous kernel): Teapot (Surfaces: 1, Triangles: 992): 30 (0.03 sec) vs. 46 (0.05 sec); Al (Surfaces: 174, Triangles: 7,124): 133 (0.13 sec) vs. 118 (0.12 sec); Crocodile (Surfaces: 6, Triangles: 34,404): 1,007 (1.01 sec) vs. 1,556 (1.56 sec)
  • 27. Simplifying Mesh Data: Triangle data originally stored as three points (vertices); optimize the data by storing each triangle as one point (vertex) and two edges; calculate the edges on the host before the kernel call. [Figure: example triangle with the coordinate labels (0.5, 1), (0, 0), (0.5, 1), (1, 0)]
  • 28. Multi-Kernel with Mesh Optimization. Time in milliseconds (logarithmic axis), Multi-Kernel with Mesh Optimization vs. Multi-Kernel with Surface Caching (previous kernel): Teapot (Surfaces: 1, Triangles: 992): 27 (0.03 sec) vs. 30 (0.03 sec); Al (Surfaces: 174, Triangles: 7,124): 127 (0.13 sec) vs. 133 (0.13 sec); Crocodile (Surfaces: 6, Triangles: 34,404): 873 (0.87 sec) vs. 1,007 (1.01 sec)
  • 29. Final Results. Time in milliseconds (logarithmic axis), Multi-Kernel with Intersection Optimization vs. Single Thread: Teapot (Surfaces: 1, Triangles: 992): 27 (0.03 sec) vs. 23,003 (23 sec); Al (Surfaces: 174, Triangles: 7,124): 127 (0.13 sec) vs. 55,260 (55.26 sec); Crocodile (Surfaces: 6, Triangles: 34,404): 873 (0.87 sec) vs. 1,617,160 (26.95 min)
  • 30. Outline: Introduction to Ray Tracing; CUDA; Parallelization with CUDA / Results; Future Work; Questions
  • 31. Future Work: Spatial partitioning; multiple GPUs; optimize code for different GPUs
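
The mesh simplification on slide 27 (one vertex plus two edges per triangle) fits an edge-based ray-triangle test well. The C++ sketch below assumes the Möller-Trumbore algorithm, which the slides do not name, so it should be read as one plausible way to use the precomputed edges rather than as the author's kernel. The names `Vec3`, `Triangle`, `makeTriangle`, and `intersect` are hypothetical, and single-precision floats are used in line with slide 24.

```cpp
#include <cmath>

// Hypothetical 3D vector type; not taken from the slides.
struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Triangle stored as one vertex plus two edge vectors (slide 27).
struct Triangle { Vec3 v0, e1, e2; };

// Run once on the host, before the kernel call: converts three vertices
// into the vertex-plus-edges form.
inline Triangle makeTriangle(Vec3 a, Vec3 b, Vec3 c) {
    return {a, sub(b, a), sub(c, a)};
}

// Assumed Möller-Trumbore test. Returns true on a hit in front of the ray
// origin and writes the distance along the ray to t.
inline bool intersect(Vec3 origin, Vec3 dir, const Triangle& tri, float& t) {
    const float kEps = 1e-6f;
    Vec3 p = cross(dir, tri.e2);
    float det = dot(tri.e1, p);
    if (std::fabs(det) < kEps) return false;   // ray parallel to the triangle plane
    float inv = 1.0f / det;
    Vec3 s = sub(origin, tri.v0);
    float u = dot(s, p) * inv;                 // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, tri.e1);
    float v = dot(dir, q) * inv;               // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(tri.e2, q) * inv;
    return t > kEps;                           // reject hits behind the origin
}
```

Because `e1` and `e2` are computed once on the host, the per-ray test performs no vertex subtractions for the edges, which is presumably part of the saving the mesh optimization aims for.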

Editor's Notes

  1. Used C++ and CUDA
  2. Forward Ray Tracing; Backward Ray Tracing
  3. Pixel – picture element that represents one point on an image. Consists of a single color.
  4. Don’t forget to mention what happens if a ray misses completely
  5. Ambient Light – indirect light reflected off of other objects in the scene; Diffuse Light – direct light reflected off the surface in all directions; Specular Light – direct light reflected off the surface in a single direction
  6. Blocks and threads have unique identifiers
  7. Register Memory – 50x faster than Global Memory; L2 Cache – LRU (Least Recently Used); L1 Cache – spatial locality (quick access to memory near the current memory reference), caches the per-thread stack and other local data structures
  8. Logarithmic Scale; Single Pass, 640 x 480
  9. 1852X speedup!
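
The 1852X figure in note 9 matches the crocodile timings on the Final Results slide (1,617,160 ms single-threaded vs. 873 ms for the final kernel). A minimal C++ sketch of that arithmetic follows; `speedup` and `printSpeedups` are illustrative names, not part of the original project.

```cpp
#include <cassert>
#include <cstdio>

// Speedup is the baseline time divided by the optimized time (same units).
inline double speedup(double baselineMs, double optimizedMs) {
    assert(optimizedMs > 0.0);
    return baselineMs / optimizedMs;
}

// Prints the speedup for each test object using the slide 29 timings.
inline void printSpeedups() {
    std::printf("Teapot:    %.0fx\n", speedup(23003.0, 27.0));
    std::printf("Al:        %.0fx\n", speedup(55260.0, 127.0));
    std::printf("Crocodile: %.0fx\n", speedup(1617160.0, 873.0));
}
```

With those timings the ratios come out to roughly 852x for the teapot, 435x for Al, and 1852x for the crocodile, so the headline number refers to the largest model.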