A Scalable Tridiagonal Solver for GPUs. Team: WenMin Xiao & ChaoQun Li, Institute of Information Science and Technology, Hunan University
Outline
What is a tridiagonal system?
What is it used for?
Two Applications on GPU: depth of field blur (Michael Kass et al., 2006; OpenGL and shader language; cyclic reduction) and shallow water simulation (2007; CUDA; cyclic reduction)
A Classic Serial Algorithm. Phase 1: Forward Reduction. Phase 2: Backward Substitution. Elimination steps? 2(n-1)+1 = 2n-1. Complexity? O(n).
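The two phases on this slide can be sketched as follows. This is a minimal illustration, assuming the classic serial algorithm meant here is the Thomas algorithm named later in the deck. The function name `thomas_solve` and the array layout (`a` sub-diagonal, `b` main diagonal, `c` super-diagonal, `d` right-hand side, with `a[0]` and `c[-1]` unused) are assumptions, not taken from the slides. No pivoting is done, so the sketch assumes a well-conditioned (for example diagonally dominant) system.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system serially (hypothetical helper).

    Assumed layout: a = sub-diagonal, b = diagonal, c = super-diagonal,
    d = right-hand side; a[0] and c[-1] are ignored.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Phase 1: forward reduction, eliminating the sub-diagonal row by row.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Phase 2: backward substitution, starting from the last unknown.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Each loop depends on the result of the previous iteration, which is why this algorithm is O(n) but strictly sequential within one system.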
Parallel Algorithms: a set of equations mapped to one thread; a single equation mapped to one thread
Cyclic Reduction: 2-4 threads working. Forward Reduction: 8-unknown system, 4-unknown system, 2-unknown system. Backward Substitution: solve 2 unknowns, solve the remaining 2 unknowns, solve the remaining 4 unknowns. 2*log2(8)-1 = 2*3-1 = 5 steps
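The reduction from 8 to 4 to 2 unknowns, followed by the back substitution, can be sketched recursively. This is an illustrative serial sketch, not the authors' GPU kernel: the name `cyclic_reduction` and the array layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) are assumptions, and the inner loops stand in for the work that would be spread across threads. No pivoting is performed.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic reduction (hypothetical helper)."""
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    if n == 2:
        # Solve the 2-unknown system directly (Cramer's rule).
        det = b[0] * b[1] - c[0] * a[1]
        return [(d[0] * b[1] - c[0] * d[1]) / det,
                (b[0] * d[1] - a[1] * d[0]) / det]

    # Forward reduction: keep odd-indexed equations and eliminate their
    # even-indexed neighbours, halving the system size.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):  # on a GPU, one thread per kept equation
        k1 = a[i] / b[i - 1]
        na = -a[i - 1] * k1
        nb = b[i] - c[i - 1] * k1
        nc = 0.0
        nd = d[i] - d[i - 1] * k1
        if i + 1 < n:  # right neighbour exists
            k2 = c[i] / b[i + 1]
            nb -= a[i + 1] * k2
            nc = -c[i + 1] * k2
            nd -= d[i + 1] * k2
        ra.append(na)
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)

    # Solve the half-size system, then place its unknowns at odd indices.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Backward substitution: recover the even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

For 8 unknowns this performs two reduction levels, one direct 2-unknown solve, and two substitution levels, matching the 5 steps counted on the slide. The number of active equations halves at each level, which is why only a few threads are busy near the middle of the algorithm.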
Parallel Cyclic Reduction (PCR): Forward Reduction, no Backward Substitution. One 8-unknown system, two 4-unknown systems, four 2-unknown systems, solve all unknowns. 4 threads working. log2(8) = 3 steps
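A serial sketch of PCR is given below. It is an illustration under stated assumptions: the name `pcr_solve` and the array layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) are not from the slides, and the loop over `i` stands in for one GPU thread per equation. Unlike the slide, which solves the 2-unknown systems directly in the last step, this variant simply keeps reducing until every equation is decoupled; the step count for 8 unknowns is still 3.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction.

    Hypothetical helper; returns (solution, number_of_reduction_steps).
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0   # no coupling to the left of the first equation
    c[-1] = 0.0  # no coupling to the right of the last equation

    stride, steps = 1, 0
    while stride < n:
        na, nb, nc, nd = a[:], b[:], c[:], d[:]
        for i in range(n):  # every equation is reduced in every step
            lo, hi = i - stride, i + stride
            if lo >= 0:  # eliminate the unknown at distance -stride
                k = a[i] / b[lo]
                na[i] = -a[lo] * k
                nb[i] -= c[lo] * k
                nd[i] -= d[lo] * k
            if hi < n:  # eliminate the unknown at distance +stride
                k = c[i] / b[hi]
                nc[i] = -c[hi] * k
                nb[i] -= a[hi] * k
                nd[i] -= d[hi] * k
        a, b, c, d = na, nb, nc, nd
        stride *= 2
        steps += 1

    # All off-diagonal couplings are now zero, so no back substitution.
    return [d[i] / b[i] for i in range(n)], steps
```

All equations stay active in every step, so PCR does more total work than cyclic reduction but needs fewer steps and no backward substitution phase.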
Advantages of Previous Algorithms
Hybrid Algorithm: one 8-unknown system, one PCR step, parallel Thomas
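The combination on this slide can be sketched as follows: a few PCR steps split one system into several independent interleaved subsystems, and each subsystem is then solved with the serial Thomas algorithm (on a GPU, one thread per subsystem). The names `pcr_step`, `thomas`, and `hybrid_solve`, the `pcr_steps` parameter, and the array layout are assumptions for illustration, and the loops run serially here.

```python
def pcr_step(a, b, c, d, stride):
    """One PCR reduction step at the given stride (hypothetical helper)."""
    n = len(b)
    na, nb, nc, nd = a[:], b[:], c[:], d[:]
    for i in range(n):
        lo, hi = i - stride, i + stride
        if lo >= 0:
            k = a[i] / b[lo]
            na[i] = -a[lo] * k
            nb[i] -= c[lo] * k
            nd[i] -= d[lo] * k
        if hi < n:
            k = c[i] / b[hi]
            nc[i] = -c[hi] * k
            nb[i] -= a[hi] * k
            nd[i] -= d[hi] * k
    return na, nb, nc, nd


def thomas(a, b, c, d):
    """Serial Thomas solve; a[0] and c[-1] are ignored."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def hybrid_solve(a, b, c, d, pcr_steps=1):
    """PCR for `pcr_steps` steps, then Thomas on each interleaved subsystem."""
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0
    c[-1] = 0.0
    stride = 1
    for _ in range(pcr_steps):
        a, b, c, d = pcr_step(a, b, c, d, stride)
        stride *= 2

    x = [0.0] * n
    for start in range(min(stride, n)):  # one thread per subsystem on a GPU
        idx = list(range(start, n, stride))
        sub = thomas([a[i] for i in idx], [b[i] for i in idx],
                     [c[i] for i in idx], [d[i] for i in idx])
        for i, xi in zip(idx, sub):
            x[i] = xi
    return x
```

With `pcr_steps=1` an 8-unknown system becomes two 4-unknown systems (even and odd indices), as on the slide. Raising `pcr_steps` trades extra PCR work for more independent subsystems.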
GPU Implementation
Tiled PCR: redundancy of naive tiling of PCR
Dependency & Parallelism: How to reduce redundancy? Solution 1: redundancy still exists!
Dependency & Parallelism (cont.): fine-grained tiling. Solution 2: without redundancy, sequential computation
Cache Design: Buffered Sliding Window. Illustration of the buffered sliding window: 1. Intermediate results are cached. 2. Tiles are processed in parallel. 3. Each tile has multiple sub-tiles. 4. Sub-tiles are processed sequentially using the cache.
Components of Buffered Sliding Window
Example
Advantages of TPCR
Thread-level Parallel Thomas Algorithm: 64B aligned segment; 128B aligned segment
Performance Evaluation: Test Platform
Performance Results. Parameters M and N: number of systems and system size. 8.3x and 49x speedups; 5x and 30x speedups
Performance Analysis
Summary
References
Questions? Thanks
