Parallel Computing: Perspectives for more efficient hydrological modeling

Grigorios Anagnostopoulos

Internal Seminar, 11.10.2011
What is parallel computing?

Simultaneous use of multiple computing resources to solve a single
computational problem.
The computing resources can be:
A single computer with multiple processors.
A number of computers connected to a network.
A combination of both.

Benefits of parallel computing:
The computational load is broken into discrete pieces of work that can be processed simultaneously.
The total simulation time is much shorter when multiple computing resources are used.

Parallel Computer Classification

Flynn's taxonomy: a widely used classification.
Classify along two independent dimensions: Instruction and Data.
Each dimension can have two possible states: Single or Multiple.

SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data
MIMD: Multiple Instruction, Multiple Data

The most common type of parallel computer (most modern parallel computers fall into this category).
Consists of a collection of fully independent processing units or cores, each with its own control unit and its own ALU.
Execution can be synchronous or asynchronous, as the processors can operate at their own pace.

Figure 2.3: A shared-memory system (several CPUs connected to a common memory through an interconnect).
Figure 2.4: A distributed-memory system (each CPU has its own memory; the CPU-memory pairs communicate through an interconnect).
Parallelism: An everyday example

Task parallelism: the ability to execute different tasks within a problem at the same time.
Data parallelism: the ability to execute parts of the same task on different data at the same time.

As an analogy, think about a farmer who hires workers to pick apples from an orchard of trees:
Worker = hardware (processing element).
Trees = tasks.
Apples = data.

Parallelism: Sequential approach

The sequential approach would be to have one worker pick all of the apples from each tree.

Parallelism: More workers

Data parallel hardware: working on the same tree, which allows each task to be completed quicker.
How many workers should work per tree?
What if some trees have few apples, while others have many?

Parallelism: More workers

Task parallelism: each worker picks apples from a different tree.
Although each task takes the same time as in the sequential version, many tasks are accomplished in parallel.
What if there are only a few densely populated trees?

Algorithm Decomposition

Most engineering problems are non-trivial, and it is crucial to have more formal concepts for determining parallelism.
Task decomposition: dividing the algorithm into individual tasks, which are functionally independent.
Tasks may have dependencies on other tasks: if the input of task B depends on the output of task A, then task B is dependent on task A.
Tasks that don't have dependencies (or whose dependencies are completed) can be executed at any time to achieve parallelism.
Task dependency graphs are used to describe the relationships between tasks (e.g. A -> B means that B is dependent on A; if both A and B feed into C, then A and B are independent of each other while C is dependent on both).
Data decomposition: dividing a data set into discrete chunks that can be processed in parallel.

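To make the idea of data decomposition concrete, here is a minimal sketch (not from the talk) that splits an array of cell values into contiguous chunks of roughly equal size; in a parallel version each chunk would be handed to its own worker (thread, process or GPU block). The names cells and update_chunk are illustrative.

#include <cstdio>
#include <vector>

// Stand-in for the real per-cell update over one chunk [begin, end).
void update_chunk(std::vector<double>& cells, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        cells[i] += 1.0;
}

int main() {
    const std::size_t n_cells = 1000, n_workers = 4;
    std::vector<double> cells(n_cells, 0.0);

    // Data decomposition: contiguous chunks of (roughly) equal size.
    const std::size_t chunk = (n_cells + n_workers - 1) / n_workers;
    for (std::size_t w = 0; w < n_workers; ++w) {
        std::size_t begin = w * chunk;
        std::size_t end = (begin + chunk < n_cells) ? begin + chunk : n_cells;
        update_chunk(cells, begin, end);  // in a parallel version, each call runs on its own worker
    }
    std::printf("cells[0] = %f\n", cells[0]);
    return 0;
}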
Why GPU Programming?

A quiet revolution and potential build-up:
Calculation: TFLOPS (GPU) vs. 100 GFLOPS (CPU).
Memory bandwidth: ~10x.
An enlarging performance gap between many-core GPUs and multi-core CPUs.
Figure 1.1: GPU in every PC - massive volume and potential impact (courtesy: John Owens).

Parallel programming is easier than ever because it can be done on relatively low-end PCs.
Cards such as the Nvidia Tesla C1060 and GT200 contain 240 cores, each of which is highly multithreaded.
GPU vs CPU

GPU: few instructions but very fast execution. Uses very fast GDDR3 RAM. Most die area is used for ALUs and the caches are relatively small.
CPU: lots of instructions but slower execution. Uses slower DDR2 or DDR3 RAM (but it has direct access to more memory than GPUs). Most die area is used for memory cache and there are relatively few transistors for ALUs.

GPU is fast

CUDA: Compute Unified Device Architecture

CUDA program: consists of phases that are executed on either the host (CPU) or a device (GPU).
No data parallelism: the code is executed on the host.
Data parallelism: the code is executed on the device.
Data-parallel portions of an application are expressed as device kernels which run on the device.

Arrays of Parallel Threads

GPU kernels are written using the Single Program Multiple Data (SPMD) programming model: multiple instances of the same program execute independently, and each instance works on a different portion of the data.
A CUDA kernel is executed by an array of threads. All threads run the same code; each thread has an ID (threadID = 0, 1, 2, ...) that it uses to compute memory addresses and make control decisions:

float x = input[threadID];
float y = func(x);
output[threadID] = y;

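To make the kernel snippet above concrete, the following is a minimal, self-contained CUDA sketch (assumed for illustration, not code from the talk): a kernel that squares every element of an array, plus the host code that allocates memory, copies data and launches an array of parallel threads. The function and variable names are hypothetical.

#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element; its global ID is built from block and thread indices.
__global__ void square_kernel(const float* input, float* output, int n) {
    int threadID = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadID < n)                                  // guard threads beyond the array end
        output[threadID] = input[threadID] * input[threadID];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h_in = new float[n], *h_out = new float[n];
    for (int i = 0; i < n; ++i) h_in[i] = float(i);

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    const int block = 256, grid = (n + block - 1) / block;
    square_kernel<<<grid, block>>>(d_in, d_out, n);    // launch grid * block parallel threads
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    std::printf("h_out[3] = %f\n", h_out[3]);          // expected: 9.0
    cudaFree(d_in); cudaFree(d_out);
    delete[] h_in; delete[] h_out;
    return 0;
}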
CUDA: Compute Unified Device Architecture

A CUDA kernel is executed by an array of threads.
Each thread has an ID, which is used to compute memory addresses and make control decisions.
CUDA threads are organized into multiple blocks.
Threads within a block cooperate via shared memory, atomic operations and barrier synchronization.

Figure 2-1: Grid of thread blocks (a 2D grid of blocks, each containing a 2D array of threads).
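As a hedged illustration of this organization (again not code from the talk), the fragment below builds a 2D global index from blockIdx, blockDim and threadIdx, and uses shared memory plus a barrier so that the threads of one block cooperate on a tile of the domain. The tile size and names are assumptions.

#define TILE 16  // illustrative 16 x 16 thread block

// Each block stages one tile of a width x height grid in shared memory,
// synchronizes, and writes it back: a minimal pattern of block-level cooperation.
__global__ void tile_copy(const float* in, float* out, int width, int height) {
    __shared__ float tile[TILE][TILE];                 // visible to all threads of this block

    int col = blockIdx.x * blockDim.x + threadIdx.x;   // global 2D index of this thread
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    if (row < height && col < width)
        tile[threadIdx.y][threadIdx.x] = in[row * width + col];
    __syncthreads();                                   // barrier: the whole tile is loaded

    if (row < height && col < width)
        out[row * width + col] = tile[threadIdx.y][threadIdx.x];
}

// Host-side launch: a 2D grid of 2D blocks covering the whole domain.
// dim3 block(TILE, TILE);
// dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
// tile_copy<<<grid, block>>>(d_in, d_out, width, height);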
CUDA memory types

Global memory: low bandwidth (relative to the on-chip memories) but large space. Fastest read/write calls if they are coalesced.
Texture memory: cache optimized for 2D spatial access patterns.
Constant memory: slow, but with a cache (8 KB).
Shared memory: fast, but it can be used only by the threads of the same block.
Registers: 32,768 32-bit registers per multiprocessor.

Figure 4-2: Hardware model - a set of SIMT multiprocessors with on-chip shared memory (each with registers, a constant cache and a texture cache, accessing the device memory).

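As a small sketch of how two of these memory spaces are used in practice (an assumed example, not the implementation from the talk): simulation constants can live in the cached constant memory, copied once from the host with cudaMemcpyToSymbol, while per-block scratch data sits in shared memory. All names are hypothetical.

#include <cuda_runtime.h>

// Hypothetical simulation constants held in cached, read-only constant memory.
__constant__ float c_params[4];  // e.g. time step, grid spacing, tolerance, porosity

__global__ void scale_cells(const float* in, float* out, int n) {
    __shared__ float scratch[256];                    // fast, per-block shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        scratch[threadIdx.x] = in[i] * c_params[0];   // every thread reads the cached constant
        out[i] = scratch[threadIdx.x];
    }
}

// Host side: copy the constants once, before any kernel launch.
// float h_params[4] = {0.1f, 0.05f, 1e-6f, 0.4f};
// cudaMemcpyToSymbol(c_params, h_params, sizeof(h_params));
// scale_cells<<<(n + 255) / 256, 256>>>(d_in, d_out, n);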
CA Parallel implementation

A parallel version of the Cellular Automata algorithm for variably saturated flow in soils was developed with the CUDA API.
The infiltration experiment of Vauclin et al. (1979) was chosen as a benchmark test for the accuracy and the speed of the algorithm.

Figure: simulated profiles at t = 2, 3, 4 and 8 hrs compared with the experimental data (water depth in m vs. distance in m).

Why is parallel code important?

In real-case scenarios, where the 3-D simulation of large areas is needed, the grid sizes are excessively large.
In natural hazard assessment the simulations must be fast in order to be useful (the prediction should come before the actual event!).
Fast simulations allow us to calibrate the model parameters more easily and to investigate the physical phenomena more efficiently.

The natural parallelism inherent in the CA concept makes the parallel implementation of the algorithm easier.

Technical details

Difficulties
The most challenging issue was the irregular geometry of the domain, which made it more difficult to exploit locality in the thread computations and to use the shared memory.
The cell values were stored in a 1D array, and for each cell the indices of its neighbouring cells were also stored.

Code structure
Simulation constants are stored in the constant memory.
Soil properties for each soil class are stored in the texture memory.
Atomic operations are used in order to check for convergence at
every iteration.
The shared memory is used to accelerate the atomic operations and
the block’s memory accesses.
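The fragment below is a hedged sketch of the kind of convergence check described above, not the author's actual code: each block first counts its non-converged cells in shared memory, and only one atomic update per block then touches the global counter. The array names d_val, d_val_old and the tolerance are assumptions.

// Sketch of a block-level convergence check: shared memory keeps the expensive
// global atomics down to one per block instead of one per cell.
__global__ void count_unconverged(const float* d_val, const float* d_val_old,
                                  int n_cells, float tol, int* d_count) {
    __shared__ int block_count;
    if (threadIdx.x == 0) block_count = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_cells && fabsf(d_val[i] - d_val_old[i]) > tol)
        atomicAdd(&block_count, 1);                    // cheap shared-memory atomic
    __syncthreads();

    if (threadIdx.x == 0 && block_count > 0)
        atomicAdd(d_count, block_count);               // one global atomic per block
}

// Host side (illustrative): reset the counter, run the check after each iteration,
// and stop iterating once no cell exceeds the tolerance.
// cudaMemset(d_count, 0, sizeof(int));
// count_unconverged<<<grid, block>>>(d_val, d_val_old, n_cells, 1e-6f, d_count);
// cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);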
Results of the numerical tests

Nvidia Quadro 2000:
192 CUDA cores.
1 GB of GDDR5 RAM.

Figures: computational speed (cells/sec) of the CPU and GPU implementations, and the resulting speed-up, plotted against the number of cells (from 1,000 to 10,000,000).

Thanks for your attention!
