Bartłomiej Filipek
www.bfilipek.com
mail@bfilipek.com
   How does this work?
   General architecture
   Advice
   Tools


The lecture does not cover the technical details of the GPU; it only gives the overview needed to understand current technologies and standards.
[Diagram: CPU side: application → 3D API (DirectX/OpenGL) → driver; commands, textures, vertices, shaders and data… travel over the bus to the GPU; GPU side: vertex processing, fragment processing, framebuffer in memory → display]
[Diagram: earlier GPU architecture with vertex units (vertex processing), pixel units (fragment processing), and memory/textures]
As we can see, earlier architectures followed a fixed vertex/fragment chain: all data was first processed in the vertex units and then passed on to the fragment units.
   SISD – Single Instruction Single Data
     The standard approach: one instruction operates on one data element.

   SIMD – Single Instruction Multiple Data
     One instruction operates on several data elements at once, e.g. on one 4D vector (128 bits).

   MIMD – Multiple Instructions Multiple Data
     Parallel processing!
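The SISD/SIMD difference can be made concrete by counting issued instructions. Below is a minimal Python model. It assumes a 128-bit register holding four 32-bit floats (`LANES = 4`). The helper names are hypothetical, and Python does not actually execute SIMD here. Each 4-lane chunk is simply counted as one instruction.

```python
LANES = 4  # assumption: one 128-bit register = four 32-bit floats (a 4D vector)

def sisd_add(a, b):
    """SISD: one add instruction per element pair; returns (result, instructions issued)."""
    result = []
    for x, y in zip(a, b):
        result.append(x + y)  # one instruction, one data element
    return result, len(result)

def simd_add(a, b):
    """SIMD model: one instruction adds a whole 4-lane chunk; returns (result, instructions issued)."""
    result, instructions = [], 0
    for start in range(0, len(a), LANES):
        chunk_a, chunk_b = a[start:start + LANES], b[start:start + LANES]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))  # all lanes in one step
        instructions += 1
    return result, instructions

# Usage: 10 additions take 10 SISD instructions but only 3 SIMD instructions (4 + 4 + 2 lanes).
a = [float(i) for i in range(10)]
b = [1.0] * 10
```

Both functions produce the same sums; only the number of issue steps differs. The last chunk uses just 2 of its 4 lanes, which is the kind of waste the scalar design discussed later avoids.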
[Diagram: fixed vs. dynamic task division. Fixed: an effect that uses a lot of vertex processing leaves fragment units unused, and an effect that uses a lot of fragment processing leaves vertex units unused. Dynamic: the units are split between vertex and fragment work as each effect requires.]
Vertex and fragment units used to be fixed in number: there were N vertex processors and M fragment processors. The unified architecture instead has K units that can process both vertices and fragments; there is no difference between them.
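A back-of-the-envelope model shows why the unified design pays off. This is a sketch under strong simplifying assumptions: work is measured in abstract units, each unit processes one work unit per time step, and the dependency between the vertex and fragment stages is ignored. The split N = 8, M = 24, K = 32 is hypothetical.

```python
def fixed_division_time(vertex_work, fragment_work, n_vertex_units, m_fragment_units):
    """Each pool only runs its own stage, so the busier pool bounds the frame time."""
    return max(vertex_work / n_vertex_units, fragment_work / m_fragment_units)

def unified_time(vertex_work, fragment_work, k_units):
    """Any unit can take any job, so the total work is spread over all K units."""
    return (vertex_work + fragment_work) / k_units

def utilization(vertex_work, fragment_work, total_units, time):
    """Fraction of the available unit-time that did useful work."""
    return (vertex_work + fragment_work) / (total_units * time)

# Usage: a vertex-heavy effect (hypothetical numbers) on 8 + 24 fixed units vs. 32 unified units.
t_fixed = fixed_division_time(800, 200, 8, 24)   # 100.0: vertex units are the bottleneck
t_unified = unified_time(800, 200, 32)            # 31.25
```

With the fixed split, about 69% of the units sit idle for the vertex-heavy effect (utilization 0.3125). The unified pool stays fully busy in this idealized model.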
[Diagram: unified architecture with a controller, stream processors, and shared memory]
As we can see, there are no separate vertex/fragment units any more. Instead, stream processors handle both vertices and fragments, and even more.
   Scalars… not vectors!
     A stream processor (SP) operates on only one scalar value per instruction.
       But there are a lot of SPs!
       Scalar SPs give far greater flexibility.
       GPGPU
       SIMT – Single Instruction Multiple Threads
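One way to see the flexibility of scalar SPs is lane utilization. Below is a simplified Python model, assuming a classic vec4 ALU where every operation occupies one 4-wide issue slot regardless of how many components it actually uses. The operation mix is hypothetical.

```python
VEC_WIDTH = 4  # assumed width of a classic vector ALU: one 4D vector (128 bits)

def vec4_lane_utilization(op_widths):
    """Each op (width 1..4) takes one vec4 issue slot; unused lanes are wasted."""
    if any(w < 1 or w > VEC_WIDTH for w in op_widths):
        raise ValueError("operation widths must be between 1 and 4")
    return sum(op_widths) / (len(op_widths) * VEC_WIDTH)

def scalar_instruction_count(op_widths):
    """On scalar stream processors each component is its own instruction, so no lane idles."""
    return sum(op_widths)

# Usage: a float4 multiply, a float3 add and two scalar ops (hypothetical shader mix).
ops = [4, 3, 1, 1]
```

For this mix, the vec4 unit fills only 9 of 16 lanes (0.5625). The scalar design issues exactly 9 useful instructions, and the many SPs make up for each one doing less per instruction.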
   New architecture – NV (NVIDIA)
   DX11, OpenCL
   Multithreaded rendering
     Rendering commands can be issued from different threads

   3 000 000 000 transistors!
   End of 2009? End of winter 2010? Never?

   Double-precision calculations cost twice as much as single-precision (float) ones, not ten times as before!
   Debugging – GPU code can be debugged directly from Visual Studio
[Diagram: Vertex Shader, Geometry Shader and Fragment Shader → Unified Shader → CUDA, OpenCL, DirectX Compute, ATI Stream]
   GPGPU – General-purpose computing on graphics processing units
   Kernels – code that is executed on the GPU
   Not only graphics, but also:
     Physics
        ▪ Fluids
        ▪ Collisions
        ▪ N-body simulations…
     Finance
     Speech/pattern recognition
     Modelling of phenomena, e.g. weather…
     Neural nets
     AI
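To illustrate what a kernel is, here is a sketch of one of the physics examples above, an N-body step, written as a per-thread kernel. Several details are assumptions: the names are hypothetical, G = 1, and the `launch` helper simply loops over thread indices sequentially where a GPU would run them in parallel. The `softening` term is a common trick to avoid division by zero for very close bodies.

```python
import math

def nbody_accel_kernel(i, positions, masses, softening, out):
    """Body of one GPU thread: acceleration of body i due to all other bodies (G = 1)."""
    xi, yi, zi = positions[i]
    ax = ay = az = 0.0
    for j, (xj, yj, zj) in enumerate(positions):
        if j == i:
            continue  # skip self-interaction
        dx, dy, dz = xj - xi, yj - yi, zj - zi
        dist_sq = dx * dx + dy * dy + dz * dz + softening * softening
        inv_dist3 = 1.0 / (dist_sq * math.sqrt(dist_sq))
        scale = masses[j] * inv_dist3
        ax += dx * scale
        ay += dy * scale
        az += dz * scale
    out[i] = (ax, ay, az)

def launch(kernel, n_threads, *args):
    """Sequential stand-in for a GPU launch: the same kernel runs once per thread index."""
    for thread_id in range(n_threads):
        kernel(thread_id, *args)

# Usage: two unit masses one unit apart pull on each other with acceleration 1.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
masses = [1.0, 1.0]
accelerations = [None] * len(positions)
launch(nbody_accel_kernel, len(positions), positions, masses, 0.0, accelerations)
```

Each thread writes only its own `out[i]`, so the threads never conflict. This independence is what makes the problem a good fit for the GPU.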
   Keep these to a minimum:
       Calculations
       Huge textures – use mipmaps instead
       Interpolators
       Data
       Rendering state changes
       Dynamic vertex buffers
       Textures… maybe use texture atlases
       Texture fetches
   Use more:
     Batches
     Triangle strips
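Triangle strips save index data because each new vertex after the first two forms a new triangle: n triangles need n + 2 indices instead of 3n. A minimal sketch of expanding a strip into a triangle list follows. The helper name is hypothetical. Swapping the first two indices of every odd triangle is one common way to keep the winding order consistent, and degenerate triangles (often used to stitch strips together) are skipped.

```python
def strip_to_triangles(strip):
    """Expand a triangle strip into a triangle list with consistent winding.

    Triangle k uses vertices k, k+1, k+2; every odd triangle swaps its first two
    indices so that all triangles face the same way.
    """
    triangles = []
    for k in range(len(strip) - 2):
        a, b, c = strip[k], strip[k + 1], strip[k + 2]
        if a == b or b == c or a == c:
            continue  # degenerate triangle, draws nothing
        triangles.append((b, a, c) if k % 2 else (a, b, c))
    return triangles

# Usage: 5 strip indices describe 3 triangles (9 indices as a plain list).
triangles = strip_to_triangles([0, 1, 2, 3, 4])
```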
   Use maths
         Uniform sphere:
         $p = \sqrt{R_x^2 + R_y^2 + (R_z + 1)^2} = \sqrt{R_x^2 + R_y^2 + R_z^2 + 2R_z + 1}$
         R is normalized, so $R_x^2 + R_y^2 + R_z^2 = 1$, hence:
         $p = \sqrt{2(R_z + 1)} \approx 1.414\,\sqrt{R_z + 1}$
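The simplification can be checked numerically. This Python sketch compares the direct formula with the reduced one for a unit vector. The function names are hypothetical. The reduced form is only valid when R is normalized, as stated above.

```python
import math

SQRT2 = math.sqrt(2.0)  # folded once, like the 1.414 on the slide

def p_full(rx, ry, rz):
    """Direct formula: sqrt(Rx^2 + Ry^2 + (Rz + 1)^2)."""
    return math.sqrt(rx * rx + ry * ry + (rz + 1.0) ** 2)

def p_simplified(rz):
    """Reduced formula; only valid when (Rx, Ry, Rz) has unit length."""
    return SQRT2 * math.sqrt(rz + 1.0)

# Usage: R = (1, 2, 2) / 3 is a unit vector; both results equal sqrt(10/3).
r = (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)
```

The reduced form needs one square root of a single scalar plus a multiply by a constant, instead of three squares, two additions and a square root.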

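   Reduce calculations on uniform variables!
         half4 main(float2 diffuse : TEXCOORD0,
                    uniform sampler2D diffuseTex,
                    uniform half4 g_OverbrightColor) : COLOR {
             // g_OverbrightColor * 3.0 is the same for every pixel,
             // yet it is recomputed for each fragment.
             return tex2D(diffuseTex, diffuse) * g_OverbrightColor * 3.0;
         }
   Calculate this before it is sent to the GPU: upload g_OverbrightColor already multiplied by 3.0 and drop the "* 3.0" from the shader.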

   Normalize
dot(normalize(N), normalize(L)) uses two square roots, but:
$\frac{N}{|N|} \cdot \frac{L}{|L|} = \frac{N \cdot L}{|N|\,|L|} = \frac{N \cdot L}{\sqrt{(N \cdot N)(L \cdot L)}} = (N \cdot L)\,\mathrm{rsq}\big((N \cdot N)(L \cdot L)\big)$
Now there is only one (reciprocal) square root; the three dot products are much cheaper than a square root.
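The identity is easy to verify on the CPU. Below is a small Python check of both forms. The helper names, including `rsq` (named after the shader instruction), are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rsq(x):
    """Reciprocal square root, the operation behind the shader rsq instruction."""
    return 1.0 / math.sqrt(x)

def cos_two_normalizes(n, l):
    """dot(normalize(N), normalize(L)): two square roots."""
    n_len = math.sqrt(dot(n, n))
    l_len = math.sqrt(dot(l, l))
    return dot([x / n_len for x in n], [y / l_len for y in l])

def cos_one_rsq(n, l):
    """(N . L) * rsq((N . N) * (L . L)): one reciprocal square root."""
    return dot(n, l) * rsq(dot(n, n) * dot(l, l))

# Usage: |N| = 2, |L| = 5, N . L = 8, so the cosine is 0.8 either way.
n, l = (0.0, 0.0, 2.0), (0.0, 3.0, 4.0)
```

The two results agree up to floating-point rounding. The second form also avoids scaling both vectors component by component.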
   Texture lookups:
     ~10:1 (ALU : sampler)
     Normalization cube map
     A single dot product is not worth a texture lookup…
     But evaluating NormalDistribution is… YES!
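The idea behind trading ALU work for a lookup is to bake an expensive function into a 1D texture once and then sample it. Below is a Python sketch of that pattern. It assumes "NormalDistribution" means the Gaussian probability density, and it assumes linear filtering with clamp-to-edge addressing. All names are hypothetical.

```python
import math

def gaussian(x, sigma=1.0):
    """Normal distribution density, standing in for an 'expensive' shader function."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bake_lut(fn, x_min, x_max, size):
    """Sample fn at `size` evenly spaced points, like filling a 1D texture once on the CPU."""
    step = (x_max - x_min) / (size - 1)
    return [fn(x_min + i * step) for i in range(size)]

def sample_lut(lut, x_min, x_max, x):
    """Linear filtering between the two nearest texels, with clamp-to-edge addressing."""
    t = (x - x_min) / (x_max - x_min) * (len(lut) - 1)
    t = min(max(t, 0.0), len(lut) - 1.0)
    i = min(int(t), len(lut) - 2)
    frac = t - i
    return lut[i] * (1.0 - frac) + lut[i + 1] * frac

# Usage: a 256-texel table over [-4, 4].
lut = bake_lut(gaussian, -4.0, 4.0, 256)
```

Per pixel, the table costs one fetch plus an interpolation, no matter how expensive `fn` is. For a single dot product, the fetch would cost more than the arithmetic it replaces.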


   Early Z-test
     Render depth only first, then render the full scene a second time
   Reduce the number of attributes: pack them where possible.
     float4 myData is better than:
      ▪ float3 myDataOne;
      ▪ float1 myDataTwo;

   But do not pack interpolators
     Use as few interpolated scalars as possible
     When vectors are packed, the compiler cannot perform these optimizations

   What do you really need?
     Normal, binormal, tangent… no! You need only two of them!
     Binormal = cross(Normal, Tangent)
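Dropping the binormal from the vertex format saves three floats per vertex: 6 instead of 9. The missing vector is rebuilt with one cross product. Below is a Python sketch. The `handedness` parameter is an assumption beyond the slide. Some pipelines store an extra ±1 per vertex for mirrored texture coordinates, and with the default of 1.0 the function is exactly the formula above.

```python
def cross(a, b):
    """3D cross product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def reconstruct_binormal(normal, tangent, handedness=1.0):
    """Binormal = cross(Normal, Tangent), optionally flipped by an assumed handedness sign."""
    bx, by, bz = cross(normal, tangent)
    return (bx * handedness, by * handedness, bz * handedness)

# Usage: normal along +Z and tangent along +X give a binormal along +Y.
binormal = reconstruct_binormal((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
```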
PerfKit
   • Mostly for DirectX
   • Limited support for OpenGL, via glExpert

PIX for Windows
• Shows everything! But only for Windows and DirectX…


AMD GPU Perf
• Similar to PIX, but for OpenGL… $800 ;(
GLIntercept
• OpenGL
• free
• logs every OpenGL call
• edits shaders in real time
• although it is fairly simple, it is very useful for debugging…
GPU ShaderAnalyzer
• free, from AMD!
• GLSL/HLSL
• shows the number of assembly instructions
• ALU, TEX instructions, etc.
• shows bottlenecks
FXComposer, by NVIDIA
ShaderDesigner, by TyphoonLabs
RenderMonkey, by AMD/ATI
   PPAM slides – Parallel Processing and Applied Mathematics, Wrocław 2009
   developer.nvidia.com
   glintercept.nutty.org
   developer.amd.com
   NVIDIA GeForce GTX 260/280 review
GPU - how can we use it?

Weitere ähnliche Inhalte

Was ist angesagt?

NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityMark Kilgard
 
OTOY Presentation - 2016 NVIDIA GPU Technology Conference - April 5 2016
OTOY Presentation - 2016 NVIDIA GPU Technology Conference - April 5 2016 OTOY Presentation - 2016 NVIDIA GPU Technology Conference - April 5 2016
OTOY Presentation - 2016 NVIDIA GPU Technology Conference - April 5 2016 otoyinc
 
GFX Part 4 - Introduction to Texturing in OpenGL ES
GFX Part 4 - Introduction to Texturing in OpenGL ESGFX Part 4 - Introduction to Texturing in OpenGL ES
GFX Part 4 - Introduction to Texturing in OpenGL ESPrabindh Sundareson
 
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas Trudel
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas TrudelGDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas Trudel
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas TrudelUmbra Software
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
GTC 2012: NVIDIA OpenGL in 2012
GTC 2012: NVIDIA OpenGL in 2012GTC 2012: NVIDIA OpenGL in 2012
GTC 2012: NVIDIA OpenGL in 2012Mark Kilgard
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barré-Brisebois
 
Rendering Techniques in Rise of the Tomb Raider
Rendering Techniques in Rise of the Tomb RaiderRendering Techniques in Rise of the Tomb Raider
Rendering Techniques in Rise of the Tomb RaiderEidos-Montréal
 
Shadow Volumes on Programmable Graphics Hardware
Shadow Volumes on Programmable Graphics HardwareShadow Volumes on Programmable Graphics Hardware
Shadow Volumes on Programmable Graphics Hardwarestefan_b
 
Clustered defered and forward shading
Clustered defered and forward shadingClustered defered and forward shading
Clustered defered and forward shadingWuBinbo
 
Implementing a modern, RenderMan compliant, REYES renderer
Implementing a modern, RenderMan compliant, REYES rendererImplementing a modern, RenderMan compliant, REYES renderer
Implementing a modern, RenderMan compliant, REYES rendererDavide Pasca
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And EffectsThomas Goddard
 
CS 354 GPU Architecture
CS 354 GPU ArchitectureCS 354 GPU Architecture
CS 354 GPU ArchitectureMark Kilgard
 
Optimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeOptimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeWuBinbo
 
CS 354 Texture Mapping
CS 354 Texture MappingCS 354 Texture Mapping
CS 354 Texture MappingMark Kilgard
 
OTOY GTC17 Presentation Slides: "The Future of GPU Rendering"
OTOY GTC17 Presentation Slides: "The Future of GPU Rendering"OTOY GTC17 Presentation Slides: "The Future of GPU Rendering"
OTOY GTC17 Presentation Slides: "The Future of GPU Rendering"OTOY Inc.
 
OTOY Presentation - 2015 NVIDIA GPU Technology Conference - March 17 2015
OTOY Presentation - 2015 NVIDIA GPU Technology Conference - March 17 2015OTOY Presentation - 2015 NVIDIA GPU Technology Conference - March 17 2015
OTOY Presentation - 2015 NVIDIA GPU Technology Conference - March 17 2015otoyinc
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingElectronic Arts / DICE
 

Was ist angesagt? (20)

NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL Functionality
 
OTOY Presentation - 2016 NVIDIA GPU Technology Conference - April 5 2016
OTOY Presentation - 2016 NVIDIA GPU Technology Conference - April 5 2016 OTOY Presentation - 2016 NVIDIA GPU Technology Conference - April 5 2016
OTOY Presentation - 2016 NVIDIA GPU Technology Conference - April 5 2016
 
GFX Part 4 - Introduction to Texturing in OpenGL ES
GFX Part 4 - Introduction to Texturing in OpenGL ESGFX Part 4 - Introduction to Texturing in OpenGL ES
GFX Part 4 - Introduction to Texturing in OpenGL ES
 
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas Trudel
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas TrudelGDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas Trudel
GDC16: Improving geometry culling for Deus Ex: Mankind Divided by Nicolas Trudel
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
GTC 2012: NVIDIA OpenGL in 2012
GTC 2012: NVIDIA OpenGL in 2012GTC 2012: NVIDIA OpenGL in 2012
GTC 2012: NVIDIA OpenGL in 2012
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Skip RNN: Learning to Skip State Updates in RNNs (ICLR 2018)
Skip RNN: Learning to Skip State Updates in RNNs (ICLR 2018)Skip RNN: Learning to Skip State Updates in RNNs (ICLR 2018)
Skip RNN: Learning to Skip State Updates in RNNs (ICLR 2018)
 
Rendering Techniques in Rise of the Tomb Raider
Rendering Techniques in Rise of the Tomb RaiderRendering Techniques in Rise of the Tomb Raider
Rendering Techniques in Rise of the Tomb Raider
 
Shadow Volumes on Programmable Graphics Hardware
Shadow Volumes on Programmable Graphics HardwareShadow Volumes on Programmable Graphics Hardware
Shadow Volumes on Programmable Graphics Hardware
 
Clustered defered and forward shading
Clustered defered and forward shadingClustered defered and forward shading
Clustered defered and forward shading
 
Implementing a modern, RenderMan compliant, REYES renderer
Implementing a modern, RenderMan compliant, REYES rendererImplementing a modern, RenderMan compliant, REYES renderer
Implementing a modern, RenderMan compliant, REYES renderer
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
 
CS 354 GPU Architecture
CS 354 GPU ArchitectureCS 354 GPU Architecture
CS 354 GPU Architecture
 
Optimizing the graphics pipeline with compute
Optimizing the graphics pipeline with computeOptimizing the graphics pipeline with compute
Optimizing the graphics pipeline with compute
 
CS 354 Texture Mapping
CS 354 Texture MappingCS 354 Texture Mapping
CS 354 Texture Mapping
 
OTOY GTC17 Presentation Slides: "The Future of GPU Rendering"
OTOY GTC17 Presentation Slides: "The Future of GPU Rendering"OTOY GTC17 Presentation Slides: "The Future of GPU Rendering"
OTOY GTC17 Presentation Slides: "The Future of GPU Rendering"
 
OTOY Presentation - 2015 NVIDIA GPU Technology Conference - March 17 2015
OTOY Presentation - 2015 NVIDIA GPU Technology Conference - March 17 2015OTOY Presentation - 2015 NVIDIA GPU Technology Conference - March 17 2015
OTOY Presentation - 2015 NVIDIA GPU Technology Conference - March 17 2015
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
 

Ähnlich wie GPU - how can we use it?

Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Johan Andersson
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Johan Andersson
 
Jeff Johnson, Research Engineer, Facebook at MLconf NYC
Jeff Johnson, Research Engineer, Facebook at MLconf NYCJeff Johnson, Research Engineer, Facebook at MLconf NYC
Jeff Johnson, Research Engineer, Facebook at MLconf NYCMLconf
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
Commandlistsiggraphasia2014 141204005310-conversion-gate02
Commandlistsiggraphasia2014 141204005310-conversion-gate02Commandlistsiggraphasia2014 141204005310-conversion-gate02
Commandlistsiggraphasia2014 141204005310-conversion-gate02RubnCuesta2
 
SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012Mark Kilgard
 
Optimizing unity games (Google IO 2014)
Optimizing unity games (Google IO 2014)Optimizing unity games (Google IO 2014)
Optimizing unity games (Google IO 2014)Alexander Dolbilov
 
Realtime 3D Visualization without GPU
Realtime 3D Visualization without GPURealtime 3D Visualization without GPU
Realtime 3D Visualization without GPUTobias G
 
Hardware Shaders
Hardware ShadersHardware Shaders
Hardware Shadersgueste52f1b
 
General Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics HardwareGeneral Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics HardwareDaniel Blezek
 
PhD defense talk (portfolio of my expertise)
PhD defense talk (portfolio of my expertise)PhD defense talk (portfolio of my expertise)
PhD defense talk (portfolio of my expertise)Gernot Ziegler
 
DIANNE - A distributed deep learning framework on OSGi - Tim Verbelen
DIANNE - A distributed deep learning framework on OSGi - Tim VerbelenDIANNE - A distributed deep learning framework on OSGi - Tim Verbelen
DIANNE - A distributed deep learning framework on OSGi - Tim Verbelenmfrancis
 
Rendering basics
Rendering basicsRendering basics
Rendering basicsicedmaster
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineNarann29
 

Ähnlich wie GPU - how can we use it? (20)

Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
NvFX GTC 2013
NvFX GTC 2013NvFX GTC 2013
NvFX GTC 2013
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
 
Jeff Johnson, Research Engineer, Facebook at MLconf NYC
Jeff Johnson, Research Engineer, Facebook at MLconf NYCJeff Johnson, Research Engineer, Facebook at MLconf NYC
Jeff Johnson, Research Engineer, Facebook at MLconf NYC
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
Commandlistsiggraphasia2014 141204005310-conversion-gate02
Commandlistsiggraphasia2014 141204005310-conversion-gate02Commandlistsiggraphasia2014 141204005310-conversion-gate02
Commandlistsiggraphasia2014 141204005310-conversion-gate02
 
SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012SIGGRAPH 2012: NVIDIA OpenGL for 2012
SIGGRAPH 2012: NVIDIA OpenGL for 2012
 
Optimizing unity games (Google IO 2014)
Optimizing unity games (Google IO 2014)Optimizing unity games (Google IO 2014)
Optimizing unity games (Google IO 2014)
 
Realtime 3D Visualization without GPU
Realtime 3D Visualization without GPURealtime 3D Visualization without GPU
Realtime 3D Visualization without GPU
 
Hardware Shaders
Hardware ShadersHardware Shaders
Hardware Shaders
 
General Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics HardwareGeneral Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics Hardware
 
PhD defense talk (portfolio of my expertise)
PhD defense talk (portfolio of my expertise)PhD defense talk (portfolio of my expertise)
PhD defense talk (portfolio of my expertise)
 
Beyond porting
Beyond portingBeyond porting
Beyond porting
 
DIANNE - A distributed deep learning framework on OSGi - Tim Verbelen
DIANNE - A distributed deep learning framework on OSGi - Tim VerbelenDIANNE - A distributed deep learning framework on OSGi - Tim Verbelen
DIANNE - A distributed deep learning framework on OSGi - Tim Verbelen
 
Rendering basics
Rendering basicsRendering basics
Rendering basics
 
Advanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering PipelineAdvanced Scenegraph Rendering Pipeline
Advanced Scenegraph Rendering Pipeline
 

Mehr von Bartlomiej Filipek

Empty Base Class Optimisation, [[no_unique_address]] and other C++20 Attributes
Empty Base Class Optimisation, [[no_unique_address]] and other C++20 AttributesEmpty Base Class Optimisation, [[no_unique_address]] and other C++20 Attributes
Empty Base Class Optimisation, [[no_unique_address]] and other C++20 AttributesBartlomiej Filipek
 
C++17 std::filesystem - Overview
C++17 std::filesystem - OverviewC++17 std::filesystem - Overview
C++17 std::filesystem - OverviewBartlomiej Filipek
 
Let's talks about string operations in C++17
Let's talks about string operations in C++17Let's talks about string operations in C++17
Let's talks about string operations in C++17Bartlomiej Filipek
 
Recent c++ goodies (March 2018)
Recent c++ goodies (March 2018)Recent c++ goodies (March 2018)
Recent c++ goodies (March 2018)Bartlomiej Filipek
 
WPF - the future of GUI is near
WPF - the future of GUI is nearWPF - the future of GUI is near
WPF - the future of GUI is nearBartlomiej Filipek
 

Mehr von Bartlomiej Filipek (8)

Empty Base Class Optimisation, [[no_unique_address]] and other C++20 Attributes
Empty Base Class Optimisation, [[no_unique_address]] and other C++20 AttributesEmpty Base Class Optimisation, [[no_unique_address]] and other C++20 Attributes
Empty Base Class Optimisation, [[no_unique_address]] and other C++20 Attributes
 
Vocabulary Types in C++17
Vocabulary Types in C++17Vocabulary Types in C++17
Vocabulary Types in C++17
 
C++17 std::filesystem - Overview
C++17 std::filesystem - OverviewC++17 std::filesystem - Overview
C++17 std::filesystem - Overview
 
Let's talks about string operations in C++17
Let's talks about string operations in C++17Let's talks about string operations in C++17
Let's talks about string operations in C++17
 
Recent c++ goodies (March 2018)
Recent c++ goodies (March 2018)Recent c++ goodies (March 2018)
Recent c++ goodies (March 2018)
 
Summary of C++17 features
Summary of C++17 featuresSummary of C++17 features
Summary of C++17 features
 
WPF - the future of GUI is near
WPF - the future of GUI is nearWPF - the future of GUI is near
WPF - the future of GUI is near
 
3D User Interface
3D User Interface3D User Interface
3D User Interface
 

Kürzlich hochgeladen

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
GPU - how can we use it?

  • 2. How does this work? General architecture. Advice. Tools. The lecture does not cover the technical details of the GPU; it gives only the overview needed to understand current technologies and standards.
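  • 3. [Diagram: on the CPU side, the application, the 3D API (DirectX/OpenGL) and the driver; commands, textures, vertices, shaders and other data go over the bus to the GPU, which contains vertex processing, fragment processing, memory and the framebuffer that feeds the display.]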
  • 4. [Diagram: vertex units (vertex processing) and pixel units (fragment processing), both connected to memory/textures.] As we can see, earlier architectures followed a fixed vertex → fragment chain: all data was first processed in the vertex units and then moved on to the fragment (pixel) units.
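The fixed chain can be mimicked in software. The C++ sketch below is a toy model built on our own assumptions (one triangle, sampling at pixel centres, a constant colour). It does not describe real hardware. Every vertex goes through `vertexStage` first, and only after that are the covered pixels handed to `fragmentStage`. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the fixed vertex -> fragment chain (illustrative only).
struct Vec2 { float x, y; };

// Vertex stage: runs once per vertex; here it only translates the vertex.
Vec2 vertexStage(Vec2 v, Vec2 offset) { return {v.x + offset.x, v.y + offset.y}; }

// Fragment stage: runs once per covered pixel; here it returns a constant colour.
unsigned fragmentStage() { return 0xFFFFFFFFu; }

// Edge function: twice the signed area of the triangle (a, b, p).
float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

struct Framebuffer {
    int width, height;
    std::vector<unsigned> pixels;
    Framebuffer(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0u) {}
};

// Draws one triangle and returns how many fragments reached the fragment stage.
int drawTriangle(Framebuffer& fb, const Vec2 (&tri)[3], Vec2 offset) {
    assert(fb.width > 0 && fb.height > 0);
    // Step 1: vertex processing for all three vertices.
    Vec2 v[3];
    for (int i = 0; i < 3; ++i) v[i] = vertexStage(tri[i], offset);
    // Step 2: rasterization (pixel-centre test) followed by fragment processing.
    int shaded = 0;
    for (int y = 0; y < fb.height; ++y) {
        for (int x = 0; x < fb.width; ++x) {
            Vec2 p{x + 0.5f, y + 0.5f};
            float w0 = edge(v[1], v[2], p);
            float w1 = edge(v[2], v[0], p);
            float w2 = edge(v[0], v[1], p);
            bool inside = (w0 >= 0 && w1 >= 0 && w2 >= 0) || (w0 <= 0 && w1 <= 0 && w2 <= 0);
            if (inside) {
                fb.pixels[y * fb.width + x] = fragmentStage();
                ++shaded;
            }
        }
    }
    return shaded;
}
```

For the triangle (0,0), (4,0), (0,4) on a 4×4 framebuffer with no offset, `drawTriangle` shades 10 fragments.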
  • 5. SISD – Single Instruction Single Data: the standard way; one instruction is executed on a single data item. SIMD – Single Instruction Multiple Data: one instruction is executed on several data items at once, e.g. on one 4D vector (4 × 32-bit floats = 128 bits). MIMD – Multiple Instructions Multiple Data: parallel processing!
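The three models can be contrasted in plain C++. This sketch makes some simplifying assumptions. `Float4` and the function names are hypothetical. The SIMD add is written portably, whereas real code would use an SSE intrinsic such as `_mm_add_ps` on a 128-bit register. MIMD is modelled as two `std::thread`s that each run different code.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// SISD: one instruction, one data item per step (a plain scalar loop).
void addSisd(const float* a, const float* b, float* out, std::size_t n) {
    assert(a && b && out);
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// SIMD: one operation applied to a whole 4D vector of four 32-bit floats
// (128 bits). Portable stand-in for a single vector instruction.
struct alignas(16) Float4 { float v[4]; };
static_assert(sizeof(Float4) == 16, "one 4D float vector is 128 bits");

Float4 addSimd(const Float4& a, const Float4& b) {
    return {{a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2], a.v[3] + b.v[3]}};
}

// MIMD: independent threads run *different* instruction streams at the same time.
struct MimdResult { float sum; float product; };

MimdResult runMimd(const std::vector<float>& data) {
    MimdResult r{0.0f, 1.0f};
    // Each thread writes only its own field, so there is no data race.
    std::thread summer([&] { for (float x : data) r.sum += x; });         // stream 1
    std::thread multiplier([&] { for (float x : data) r.product *= x; }); // stream 2
    summer.join();
    multiplier.join();
    return r;
}
```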
  • 6. [Figure: fixed vs. dynamic task division of vertex and fragment units, for an effect that uses a lot of vertex processing and for an effect that uses a lot of fragment processing; with fixed division, some units stay unused.] The vertex and fragment units, and how many of each there were, used to be fixed: we had N vertex processors and M fragment processors. Now we have a unified architecture. That means we have K units that can process both vertices and fragments, with no difference between them.
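A back-of-the-envelope model shows why unified units help. It rests on our own assumption, not the slide's: every unit finishes one work item per step, and a frame is done when the busiest group of units is done. The unit counts and workloads below are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Integer ceil(a / b) for positive b.
int ceilDiv(int a, int b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

// Fixed task division: N vertex units can only take vertex work and
// M fragment units only fragment work; the slower group decides.
int stepsFixed(int vertexWork, int fragmentWork, int n, int m) {
    return std::max(ceilDiv(vertexWork, n), ceilDiv(fragmentWork, m));
}

// Unified architecture: K identical units share all of the work.
int stepsUnified(int vertexWork, int fragmentWork, int k) {
    return ceilDiv(vertexWork + fragmentWork, k);
}
```

With N = 8 and M = 24, a vertex-heavy effect (800 vertex items, 240 fragment items) takes `stepsFixed(800, 240, 8, 24)` = 100 steps, and the fragment units sit idle for most of them. The same work on K = 32 unified units takes `stepsUnified(800, 240, 32)` = 33 steps. A fragment-heavy effect (80 and 2400 items) takes 100 steps with fixed division and 78 with unified units.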
  • 7. [Diagram: a controller, stream processors and shared memory.] As we can see, there are no separate vertex/fragment units. Instead there are stream processors that can handle both vertices and fragments, and even more.
  • 8. Scalars… not vectors! A stream processor (SP) works on only one data item per instruction, but there are a lot of SPs. SPs give far greater flexibility: GPGPU, SIMT – Single Instruction Multiple Threads.
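SIMT can be pictured as many scalar threads that share one instruction stream. The toy model below is a sketch built on assumptions, not vendor documentation. A group ("warp") of 32 lanes, which is the group size NVIDIA uses, runs a hypothetical scalar kernel `x >= 0 ? 2x : -x` in lockstep. When the lanes disagree on a branch, both paths are issued, and a per-lane mask decides which lanes keep each result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;  // lanes sharing one instruction stream
using Lanes = std::array<float, kWarpSize>;

// Runs the scalar kernel "out = x >= 0 ? x * 2 : -x" on every lane in lockstep.
// Returns how many instructions the warp had to issue.
int runWarp(const Lanes& x, Lanes& out) {
    int issued = 0;

    // Instruction 1: the comparison, issued once and evaluated by every lane.
    std::array<bool, kWarpSize> takeIf{};
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) takeIf[lane] = x[lane] >= 0.0f;
    ++issued;

    bool anyIf = false, anyElse = false;
    for (bool t : takeIf) {
        anyIf = anyIf || t;
        anyElse = anyElse || !t;
    }
    assert(anyIf || anyElse);

    // "if" path: issued only if at least one lane needs it; masked lanes idle.
    if (anyIf) {
        for (std::size_t lane = 0; lane < kWarpSize; ++lane)
            if (takeIf[lane]) out[lane] = x[lane] * 2.0f;
        ++issued;
    }
    // "else" path with the inverted mask: when both paths run, the warp diverged.
    if (anyElse) {
        for (std::size_t lane = 0; lane < kWarpSize; ++lane)
            if (!takeIf[lane]) out[lane] = -x[lane];
        ++issued;
    }
    return issued;
}
```

When all lanes agree, the warp issues 2 instructions. A single negative input forces both paths to be issued, which costs 3.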
  • 9. New architecture from NVIDIA: DX11, OpenCL. Multithreaded rendering: rendering commands can be issued from different threads. 3 000 000 000 transistors! End of 2009? End of winter 2010? Never? Double-precision calculations cost only twice as much as single-precision (float) ones, not ten times as much as before! Debugging: the GPU can be debugged directly from Visual Studio.
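The multithreaded-rendering idea can be sketched as follows. Worker threads only record commands into their own lists. A single thread then submits the finished lists in a fixed order. Direct3D 11 deferred contexts follow this pattern, but the `Command` and `CommandList` types and the functions below are hypothetical stand-ins, not the D3D11 API.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical command and command-list types (not a real graphics API).
struct Command { std::string name; int objectId; };
using CommandList = std::vector<Command>;

// Recording: touches only the thread's own list, so no locking is needed.
CommandList recordObjects(int firstId, int count) {
    CommandList list;
    for (int id = firstId; id < firstId + count; ++id) {
        list.push_back({"SetState", id});
        list.push_back({"Draw", id});
    }
    return list;
}

// Records one list per thread in parallel, then submits them in thread order
// on the calling thread (standing in for the immediate context / GPU queue).
CommandList renderFrame(int threads, int objectsPerThread) {
    assert(threads > 0 && objectsPerThread >= 0);
    std::vector<CommandList> lists(threads);  // one slot per thread, never resized
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&lists, t, objectsPerThread] {
            lists[t] = recordObjects(t * objectsPerThread, objectsPerThread);
        });
    for (auto& w : workers) w.join();

    CommandList submitted;
    for (const auto& l : lists) submitted.insert(submitted.end(), l.begin(), l.end());
    return submitted;
}
```

The submission order is fixed, so the final command stream does not depend on how the recording threads were scheduled.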
  • 10. [Diagram: vertex shader, fragment shader, geometry shader; unified shader; CUDA, OpenCL, DirectX Compute, ATI Stream.]
  • 11. GPGPU: general-purpose computing on graphics processing units. Kernels: code that is executed on the GPU. Not only graphics, but also: physics (fluids, collisions, N-body simulations…), finance, speech/pattern recognition, modelling of phenomena (weather…), neural nets, AI.
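To show what "kernel" means, here is one N-body time step written in kernel style. This is plain C++, not CUDA or OpenCL code. `bodyKernel` handles exactly one body, identified by its index `i`, and the "launch" calls it once per body. On a GPU those invocations would run in parallel. The gravitational constant, the softening term and the integrator (semi-implicit Euler) are our own illustrative choices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { float x, y, vx, vy, mass; };

constexpr float kG = 1.0f;            // gravitational constant in simulation units (assumed)
constexpr float kSoftening2 = 1e-4f;  // keeps r^2 > 0 for very close bodies (assumed)

// "Kernel": computes the next state of exactly one body, index i.
// It reads only `in` and writes only out[i], so all invocations are independent.
void bodyKernel(std::size_t i, const std::vector<Body>& in, std::vector<Body>& out, float dt) {
    assert(i < in.size() && out.size() == in.size());
    float ax = 0.0f, ay = 0.0f;
    for (std::size_t j = 0; j < in.size(); ++j) {
        if (j == i) continue;
        float dx = in[j].x - in[i].x;
        float dy = in[j].y - in[i].y;
        float r2 = dx * dx + dy * dy + kSoftening2;
        float invR3 = 1.0f / (r2 * std::sqrt(r2));
        ax += kG * in[j].mass * dx * invR3;
        ay += kG * in[j].mass * dy * invR3;
    }
    Body b = in[i];
    b.vx += ax * dt;  // semi-implicit Euler: update velocity first,
    b.vy += ay * dt;
    b.x += b.vx * dt;  // then position using the new velocity
    b.y += b.vy * dt;
    out[i] = b;
}

// Emulated launch: one kernel invocation per body (sequential here).
std::vector<Body> step(const std::vector<Body>& bodies, float dt) {
    std::vector<Body> next(bodies.size());
    for (std::size_t i = 0; i < bodies.size(); ++i) bodyKernel(i, bodies, next, dt);
    return next;
}
```

Reading from one buffer and writing to another ("double buffering") is the design choice that makes the order of invocations irrelevant.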
  • 12. Use as little as possible of: calculations; huge textures (mipmaps instead); interpolators; data; rendering state changes; dynamic vertex buffers; textures (maybe use texture atlases); texture fetches. Use more: batches; triangle strips.
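Triangle strips help because a strip of n triangles needs n + 2 indices, while a plain triangle list needs 3n. The sketch below expands a strip into a list with a hypothetical helper, `stripToList`. Consecutive triangles in a strip alternate orientation, so the helper swaps the first two vertices of every odd triangle to keep one consistent winding.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Triangle = std::array<unsigned, 3>;

// Index counts for n triangles drawn as a strip vs. as a list.
std::size_t stripIndexCount(std::size_t n) {
    assert(n > 0);
    return n + 2;
}
std::size_t listIndexCount(std::size_t n) { return 3 * n; }

// Expands strip indices into independent triangles with consistent winding.
std::vector<Triangle> stripToList(const std::vector<unsigned>& strip) {
    std::vector<Triangle> tris;
    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        if (i % 2 == 0)
            tris.push_back({strip[i], strip[i + 1], strip[i + 2]});
        else  // odd triangle: swap the first two vertices to restore the winding
            tris.push_back({strip[i + 1], strip[i], strip[i + 2]});
    }
    return tris;
}
```

The strip {0, 1, 2, 3, 4} yields the triangles (0,1,2), (2,1,3) and (2,3,4). That is 5 indices instead of 9.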
  • 13. Use maths.
      Uniform sphere: p = sqrt(Rx^2 + Ry^2 + (Rz + 1)^2) = sqrt(Rx^2 + Ry^2 + Rz^2 + 2Rz + 1). The vector R is normalized, so Rx^2 + Ry^2 + Rz^2 = 1 and therefore p = sqrt(2 * (Rz + 1)) = 1.414 * sqrt(Rz + 1). Calculate this before it is sent to the GPU!
      Reduce calculations on uniform variables! In
          half4 main(float2 diffuse : TEXCOORD0,
                     uniform sampler2D diffuseTex,
                     uniform half4 g_OverbrightColor)
          {
              return tex2D(diffuseTex, diffuse) * g_OverbrightColor * 3.0;
          }
      the product g_OverbrightColor * 3.0 is the same for every pixel, so it should be computed once on the CPU and passed in as the uniform.
      Normalize: dot(normalize(N), normalize(L)) uses two square roots! But (N/|N|) dot (L/|L|) = (N dot L) / (|N| * |L|) = (N dot L) / sqrt((N dot N) * (L dot L)) = (N dot L) * rsq((N dot N) * (L dot L)). Now there is only one square root; the three dot products are much cheaper than a square root.
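The rewrites on this slide can be checked numerically on the CPU. This C++ sketch is our own. C++ has no `rsq`, so it is written as `1.0f / std::sqrt(...)`. `foldOverbright` is a hypothetical helper that does the "* 3.0" once per draw instead of once per pixel.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Naive: normalize both vectors first (two square roots), then take the dot product.
float cosTwoSqrt(Vec3 n, Vec3 l) {
    float invN = 1.0f / std::sqrt(dot(n, n));
    float invL = 1.0f / std::sqrt(dot(l, l));
    return dot({n.x * invN, n.y * invN, n.z * invN}, {l.x * invL, l.y * invL, l.z * invL});
}

// Slide's form: (N dot L) * rsq((N dot N) * (L dot L)), a single square root.
float cosOneSqrt(Vec3 n, Vec3 l) {
    float nn = dot(n, n), ll = dot(l, l);
    assert(nn > 0.0f && ll > 0.0f);
    return dot(n, l) * (1.0f / std::sqrt(nn * ll));
}

// Uniform sphere: full expression vs. simplified form (valid for a normalized R).
float sphereFull(Vec3 r) {
    return std::sqrt(r.x * r.x + r.y * r.y + (r.z + 1.0f) * (r.z + 1.0f));
}
float sphereSimplified(Vec3 r) { return std::sqrt(2.0f * (r.z + 1.0f)); }

// CPU side, once per draw call: fold the constant 3.0 into the uniform so the
// shader only computes tex2D(diffuseTex, diffuse) * g_OverbrightColor.
std::array<float, 4> foldOverbright(std::array<float, 4> overbright) {
    for (float& c : overbright) c *= 3.0f;
    return overbright;
}
```

For N = (1, 2, 2) and L = (0, 3, 4), both cosine functions give 14/15.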
  • 14. Texture lookups: the ALU:sampler ratio is about 10:1. Normalization cube map: a single dot product is not worth a texture lookup… but calculating a NormalDistribution is, YES! Early Z-test: depth-only rendering first, then the full scene (a second time).
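The effect of a depth-only pre-pass can be counted with a toy model. In this sketch the "scene" is just a list of fragments, each with a pixel index and a depth, where a smaller depth is closer. We count how often the expensive fragment shader would run. We assume that no two fragments of the same pixel share exactly the same depth; otherwise the EQUAL test in the second pass would shade both.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Fragment { int pixel; float depth; };  // smaller depth = closer to the camera

constexpr float kFar = std::numeric_limits<float>::infinity();

// One pass with a LESS depth test: every fragment closer than the stored depth
// gets shaded, so drawing back to front also shades fragments that end up hidden.
int shadeSinglePass(const std::vector<Fragment>& frags, int pixelCount) {
    std::vector<float> depth(static_cast<std::size_t>(pixelCount), kFar);
    int shaded = 0;
    for (const Fragment& f : frags) {
        assert(f.pixel >= 0 && f.pixel < pixelCount);
        if (f.depth < depth[f.pixel]) {
            depth[f.pixel] = f.depth;
            ++shaded;  // the expensive fragment shader would run here
        }
    }
    return shaded;
}

// Pass 1: depth only, no shading. Pass 2: the full scene again with an EQUAL
// test, so only the visible fragment of each pixel is shaded.
int shadeWithDepthPrepass(const std::vector<Fragment>& frags, int pixelCount) {
    std::vector<float> depth(static_cast<std::size_t>(pixelCount), kFar);
    for (const Fragment& f : frags) {
        assert(f.pixel >= 0 && f.pixel < pixelCount);
        if (f.depth < depth[f.pixel]) depth[f.pixel] = f.depth;
    }
    int shaded = 0;
    for (const Fragment& f : frags)
        if (f.depth == depth[f.pixel]) ++shaded;
    return shaded;
}
```

Three fragments drawn back to front on one pixel, plus one fragment on a second pixel, cost 4 shader runs in a single pass but only 2 with the pre-pass.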
  • 15. Reduce the number of attributes: "pack" them as much as possible. float4 myData; is better than float3 myDataOne; float1 myDataTwo;. But do not pack in interpolators. Use as few scalars as possible. When vectors are packed, no optimizations can be performed. What do you really need? Normal, binormal, tangent… no! You need only two of them: binormal = normal × tangent (cross product).
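Both tips can be sketched in C++. `Vec3`, `Float4`, `binormalFrom` and `pack` are hypothetical names. In a real shader the cross product would be the built-in `cross`. The argument order follows the slide (normal × tangent). Some pipelines use the opposite order, depending on their handedness convention, which is an assumption noted in the code.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Cross product a x b.
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Only the normal and tangent are stored per vertex; the binormal is rebuilt
// as normal x tangent. (Other pipelines may use tangent x normal, depending on
// their handedness convention.)
Vec3 binormalFrom(Vec3 normal, Vec3 tangent) {
    assert(std::fabs(dot(normal, tangent)) < 1e-3f);  // expects an orthogonal pair
    return cross(normal, tangent);
}

// Packing: a float3 and a float1 travel together as one float4 attribute.
Float4 pack(Vec3 v, float s) { return {v.x, v.y, v.z, s}; }
```

With normal (0, 0, 1) and tangent (1, 0, 0), the rebuilt binormal is (0, 1, 0).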
  • 16. PerfKit (NVIDIA): mostly for DirectX; little support for OpenGL, via glExpert. PIX for Windows: shows everything! But only on Windows, for DirectX… AMD GPU Perf: similar to PIX, but for OpenGL… $800 ;(
  • 17. GLIntercept: OpenGL; free; logs every OpenGL call; lets you edit shaders in real time. Although it is fairly simple, it has a powerful impact on debugging…
  • 18. GPU ShaderAnalyzer: free, from AMD! GLSL/HLSL; shows the number of assembly instructions (ALU, TEX instructions, etc.) and bottlenecks.
  • 19. FXComposer by NVIDIA; ShaderDesigner by TyphoonLabs; RenderMonkey by AMD/ATI.
  • 20. PPAM slides: Parallel Processing and Applied Mathematics, Wrocław 2009; developer.nvidia.com; glintercept.nutty.org; developer.amd.com; NVIDIA GeForce GTX 260/280 review.