High Performance & High Throughput Computing
Giuseppe Fiameni – g.Fiameni@cineca.it
http://hackaday.com/2016/07/13/nirvana-like-youve-never-heard-them-before/
Agenda
• Computational Problem
• System architecture
• Serial vs Parallel
• Levels of parallelism
• Models for parallel programming
• HPC and HTC examples
EUDAT Summer School 2017 - High Performance/Throughput Computing 2
What makes a problem relevant from a computational point of view?
Size of computational applications
Computational dimension:
number of operations needed to solve the problem; in general it is a function of the size of the data structures involved (n, n^2, n^3, n log n, etc.)
flop - floating-point operation
indicates one arithmetic floating-point operation.
flop/s - floating-point operations per second
is a unit to measure the speed of a computer.
Computational problems today: 10^15 – 10^22 flop
One year has about 3 x 10^7 seconds!
The most powerful computers today reach a sustained performance of the order of Tflop/s to Pflop/s (10^12 - 10^15 flop/s).
Example: Weather Prediction
Forecasts on a global scale:
• 3D grid to represent the Earth
– Earth's circumference: ≈ 40000 km
– radius: ≈ 6370 km
– Earth's surface: ≈ 4πr^2 ≈ 5 x 10^8 km^2
• 6 variables:
– temperature
– pressure
– humidity
– wind speed in the 3 Cartesian directions
• cells of 1 km on each side
• 100 slices to see how the variables evolve on the different levels of the atmosphere
• a 30-second time step is required for a simulation at this resolution
• each cell requires about 1000 operations per time step (Navier-Stokes turbulence and various phenomena)
In physics, the Navier–Stokes
equations /nævˈjeɪ stoʊks/,
named after Claude-Louis
Navier and George Gabriel
Stokes, describe the motion
of viscous fluid substances.
Example: Weather Prediction (cont.)
Grid: 5 x 10^8 x 100 = 5 x 10^10 cells
• each variable of a cell is stored in 8 bytes
• Memory space:
– (6 var) x (8 bytes) x (5 x 10^10 cells) ≈ 2 x 10^12 bytes = 2 TB
A 24-hour forecast needs:
• 24 x 60 x 2 ≈ 3 x 10^3 time steps
• (5 x 10^10 cells) x (10^3 oper.) x (3 x 10^3 time steps) = 1.5 x 10^17 operations!
A computer with a power of 1 Tflop/s will take 1.5 x 10^5 sec.
A 24-hour forecast will need about 2 days to run ... but we shall obtain a very accurate forecast!
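The estimate above can be reproduced with a few lines of Python. This is only a sketch: the constants are the figures quoted on the slides, while the function and variable names are chosen here for illustration.

```python
# Back-of-the-envelope cost of the global weather forecast example.
# All inputs are the figures quoted on the slides; names are illustrative.

SURFACE_KM2 = 5e8          # Earth's surface, about 5 x 10^8 km^2
LEVELS = 100               # vertical slices of the atmosphere
VARIABLES = 6              # temperature, pressure, humidity, 3 wind components
BYTES_PER_VALUE = 8        # bytes used to store one variable of one cell
OPS_PER_CELL_STEP = 1000   # operations per cell per time step
TIME_STEP_S = 30           # seconds of simulated time per step
MACHINE_FLOPS = 1e12       # sustained 1 Tflop/s


def forecast_cost(hours=24):
    cells = SURFACE_KM2 * LEVELS                     # 1 km x 1 km cells on each level
    memory_bytes = VARIABLES * BYTES_PER_VALUE * cells
    steps = hours * 3600 / TIME_STEP_S
    operations = cells * OPS_PER_CELL_STEP * steps
    run_time_s = operations / MACHINE_FLOPS
    return cells, memory_bytes, steps, operations, run_time_s


cells, memory_bytes, steps, operations, run_time_s = forecast_cost()
print(f"cells:      {cells:.1e}")
print(f"memory:     {memory_bytes / 1e12:.1f} TB")
print(f"time steps: {steps:.0f}")
print(f"operations: {operations:.2e}")
print(f"run time:   {run_time_s / 86400:.1f} days at 1 Tflop/s")
```

Without the rounding used on the slide (2880 time steps instead of 3 x 10^3), the totals come out slightly lower, at roughly 1.4 x 10^17 operations and a little under two days.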
John Von Neumann
https://en.wikipedia.org/wiki/Von_Neumann_architecture
What if you want to increase performance?
• Increase the clock rate
– number of clock cycles per second (pacemaker)
• Increase the instruction rate
– number of basic instructions that can be executed per clock cycle
– Pipelining, vectorization
• Reduce access time to data
– Keeping frequently used data in the processor caches significantly improves computational performance
Speed of Processors: Clock Cycle and Frequency
The clock cycle τ is defined as the time between two adjacent pulses of the oscillator that sets the time of the processor.
The number of these pulses per second is known as clock speed or clock frequency, generally measured in GHz (gigahertz, or billions of pulses per second).
The clock cycle controls the synchronization of operations in a computer: all the operations inside the processor last a multiple of τ.

Processor        τ (ns)   freq (MHz unless noted)
CDC 6600         100      10
Cyber 76         27.5     36.3
IBM ES 9000      9        111
Cray Y-MP C90    4.1      244
Intel i860       20       50
PC Pentium       < 0.5    > 2 GHz
Power PC         1.17     850
IBM Power 5      0.52     1.9 GHz
IBM Power 6      0.21     4.7 GHz

Increasing the clock frequency:
• The speed of light sets an upper limit to the speed with which electronic components can operate.
Propagation velocity of a signal in a vacuum: 300,000 km/s = 30 cm/ns
• Heat dissipation problems inside the processor.
• Quantum tunneling is also expected to become important.
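As a rough illustration of the relation between the two table columns and of the speed-of-light limit, the sketch below converts a frequency into a clock cycle and into the distance a signal could cover in one cycle. It assumes propagation at 30 cm/ns, the vacuum figure quoted above, which is an upper bound; the function names are chosen here for illustration.

```python
# Clock period and the distance a signal can cover in one cycle.
# Assumes propagation at the vacuum speed of light (30 cm/ns), which is
# an upper bound; signals in real circuits travel more slowly.

LIGHT_CM_PER_NS = 30.0


def cycle_time_ns(freq_mhz):
    """Clock cycle tau in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def max_distance_cm(freq_mhz):
    """Upper bound on the distance a signal travels in one clock cycle."""
    return LIGHT_CM_PER_NS * cycle_time_ns(freq_mhz)


for name, freq_mhz in [("CDC 6600", 10), ("Cray Y-MP C90", 244), ("IBM Power 6", 4700)]:
    print(f"{name:14s} tau = {cycle_time_ns(freq_mhz):7.2f} ns, "
          f"at most {max_distance_cm(freq_mhz):7.1f} cm per cycle")
```

At 4.7 GHz the bound is only a few centimetres per cycle, which gives a feeling for why the physical size of a chip matters at high clock rates.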
Memory hierarchies
Memory access time: the time required by the processor to read data from or write data to memory
The hierarchy exists because:
• fast memory is expensive and small
• slow memory is cheap and big
Latency
• how long do I have to wait for the data?
• (cannot do anything while waiting)
Throughput
• how many bytes per second (not important while still waiting for the data)
Total time = latency + (amount of data / throughput)
Time to run code = clock cycles running code + clock cycles waiting for memory
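The total-time formula can be tried out with a short sketch. The latency and throughput values below are hypothetical round numbers picked only for illustration; they are not measurements of any particular memory system.

```python
def access_time_s(n_bytes, latency_s, throughput_bytes_per_s):
    """Total time = latency + (amount of data / throughput)."""
    return latency_s + n_bytes / throughput_bytes_per_s


# Hypothetical figures for illustration: 100 ns latency, 10 GB/s throughput.
LATENCY_S = 100e-9
THROUGHPUT = 10e9

for n_bytes in (8, 64, 1_000_000, 1_000_000_000):
    total = access_time_s(n_bytes, LATENCY_S, THROUGHPUT)
    share = LATENCY_S / total
    print(f"{n_bytes:>13,d} bytes: {total:.3e} s, latency is {share:6.1%} of the total")
```

With these numbers, small transfers are dominated by latency and large transfers by throughput, which is the distinction the slide draws.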
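Loop unrolling
/* before: one multiply-add per iteration */
for (i = 0; i < N; i++)
    s += a[i] * b[i];
/* after (replaces the loop above): unrolled by 2 */
for (i = 0; i + 1 < N; i += 2)
    s += a[i] * b[i] + a[i+1] * b[i+1];
if (i < N)  /* leftover element when N is odd */
    s += a[i] * b[i];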
Why Have Cache?
[Figure: bandwidth comparison. CPU: 351 GB/sec; 253 GB/sec (72%); 10.66 GB/sec (3%)]
Parallelism
• Serial process:
– A process whose sub-processes happen sequentially in time.
– Speed depends only on the rate at which each sub-process occurs (e.g. processing unit clock speed).
• Parallel process:
– A process in which multiple sub-processes can be active simultaneously.
– Speed depends on the execution rate of each sub-process AND on how many sub-processes can be made to occur simultaneously.
Parallel for Capacity
• Example: searching a database of web sites for some text (e.g. Google)
• Searching sequentially through large volumes of text is too time-consuming
– Multiple servers hold different pages
– Each server can report the result of each individual search
– The more servers you add, the quicker the search is
Parallel for Capability
• Example: large-scale weather simulation
• A detailed description of the atmosphere is too large to run on today's desktops
– Multiple servers are needed to hold all grid data in memory
– Servers need to communicate quickly to synchronise work over the entire grid
– Communication between servers can become a bottleneck. Latency really matters!
Capability and capacity
• Parallel for Capability => High Performance Computing
– deals with one large problem
– problems that are hard to parallelise, not embarrassingly parallel
– single processor performance is important
• Parallel for Capacity => High Throughput Computing
– appropriate when the problem can be decomposed into many (very many) smaller problems that are essentially independent
– problems that are easy to parallelise, e.g. parameter sweeping
– can be adapted to very large numbers of processors
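A parameter sweep, the typical capacity workload, can be sketched as below. The `simulate` function is a hypothetical stand-in for one independent job, and a thread pool from the Python standard library is used only to keep the example short; on a real HTC system each task would more likely be submitted as a separate batch job.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(parameter):
    """Stand-in for one independent job; a real task would run a full model."""
    return parameter, sum(i * parameter for i in range(1000))


def parameter_sweep(parameters, workers=4):
    # Each task depends only on its own parameter, so the tasks can run in
    # any order and on any worker without communicating with each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(simulate, parameters))


results = parameter_sweep([1, 2, 3, 4, 5])
print(results)
```

Because no task needs the result of another, adding workers or re-running a failed task does not require any change to the tasks themselves.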
Systems main architecture
• Single processor, single core
• Dual processor, single core
• Single processor, dual core
Multi core architecture
• Modern architectures can have up to 34 cores per processor and 2 or 4 processors per node
Flynn Taxonomy
• A computer architecture is categorized by the
multiplicity of hardware used to manipulate streams of
instructions (sequence of instructions executed by the
computer) and streams of data (sequence of data used
to execute a stream of instructions).
SIMD Systems
Synchronous parallelism
• SIMD systems have a single control unit
• A single instruction operates simultaneously on multiple data.
• Array processors and vector systems fall in this class
[Figure: SIMD block diagram with one control unit (CU), one instruction stream (IS), processing units PU1 ... PUn, data streams DS1 ... DSn and memory modules MM1 ... MMn]
MIMD Systems
Asynchronous parallelism
• Multiple processors execute different instructions operating on different data.
• Represents the multiprocessor version of the SIMD class.
• Wide class ranging from multi-core systems to large MPP systems.
[Figure: MIMD block diagram with control units CU1 ... CUn, instruction streams IS1 ... ISn, processing units PU1 ... PUn, data streams DS1 ... DSn and memory modules MM1 ... MMn]
Moore's Law
• Empirical law which states that the complexity of devices (number of transistors per square inch in microprocessors) doubles approximately every 18 months (the commonly quoted period). Gordon Moore, Intel co-founder, 1965
• Moore's Law is expected to keep holding in the near future, but applied to the number of cores per processor
The limits of parallelism - Amdahl’s Law
s – time spent in a serial processor on serial parts of the code
p – time spent in a serial processor on parts that could be executed in parallel
If we have N processors:

$$\text{Speedup} = \frac{s + p}{s + p/N}$$

Taking s as the fraction of the time spent in the sequential part of the program (s + p = 1):

$$\text{Speedup} = \frac{1}{s + (1 - s)/N} \;\to\; \frac{1}{s} \quad \text{as } N \to \infty$$

Amdahl, G.M., Validity of single-processor approach to achieving large scale computing capability, Proc. AFIPS Conf., Reston, VA, 1967, pp. 483-485
les.robertson@cern.ch
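Amdahl's formula is easy to evaluate numerically. The sketch below uses the normalised form (s + p = 1); the function names and the sample values of s and N are chosen here for illustration.

```python
def amdahl_speedup(s, n):
    """Speedup on n processors when a fraction s of the run time is serial."""
    return 1.0 / (s + (1.0 - s) / n)


def max_speedup(s):
    """Limit of the speedup as the number of processors grows without bound."""
    return 1.0 / s


for percent in (1, 2, 5, 10):
    s = percent / 100
    print(f"serial {percent:2d}%: N=16 -> {amdahl_speedup(s, 16):5.2f}, "
          f"N=1024 -> {amdahl_speedup(s, 1024):6.2f}, limit {max_speedup(s):5.1f}")
```

Even a serial fraction of a few percent caps the achievable speedup well below the number of processors, however many are added.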
Amdahl’s Law - maximum speedup
[Figure: max speedup (axis from 0 to 120) versus sequential part as percentage of total time (1% to 10%)]
les.robertson@cern.ch
The importance of the network
• Latency – the round-trip time (RTT) to communicate between two processors
• For fine-grained parallel programs the problem is latency, not bandwidth
[Figure: timeline t of a parallel run divided into sequential, communications overhead and parallelisable parts]
Communications overhead:

$$\text{Speedup} = \frac{s + p}{s + c + p/N}, \qquad c = \text{latency} + \text{data\_transfer\_time}$$
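The effect of the overhead term can be explored with a short sketch. The values of s, p and c below are hypothetical, and c is treated as a constant for simplicity, although in a real program it may itself depend on N.

```python
def speedup_with_overhead(s, p, c, n):
    """Speedup = (s + p) / (s + c + p / n), with c the communications overhead."""
    return (s + p) / (s + c + p / n)


# Hypothetical run: 1% serial, 99% parallelisable, overhead equal to 2% of
# the serial run time.
S, P, C = 0.01, 0.99, 0.02

for n in (8, 64, 512):
    ideal = speedup_with_overhead(S, P, 0.0, n)
    real = speedup_with_overhead(S, P, C, n)
    print(f"N = {n:3d}: no overhead {ideal:6.2f}, with overhead {real:6.2f}")
```

With c = 0 the expression reduces to Amdahl's formula; with c > 0 the speedup is bounded by (s + p) / (s + c) instead of (s + p) / s.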
les.robertson@cern.ch
Aspects of parallelism
• It has long been recognized that continued performance improvements cannot be obtained just by increasing factors such as processor clock speed; parallelism is needed.
• Parallelism can be present at many levels:
– Functional parallelism within the CPU
– Pipelining and vectorization
– Multi-processor and multi-core
– Accelerators
– Parallel I/O
Pipelining
• A technique in which multiple instructions belonging to a sequential instruction stream overlap their execution
• This technique improves the performance of a single processor
• The concept of pipelining is similar to that of an assembly line in a factory, where the elements are assembled in a continuous flow along a line (pipe) of assembly stations.
• All the assembly stations must operate at the same processing speed, otherwise the slowest station becomes the bottleneck of the entire pipe.
Instruction Pipelining
The stages are
1 Fetch
2 D1 (main decode)
3 D2 (secondary decode, also called translate)
4 EX (execute)
5 WB (write back to registers and memory)
http://www.gamedev.net/page/resources/_/technical/general-programming/a-journey-through-the-cpu-pipeline-r3115
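The benefit of overlapping the five stages can be illustrated with a small cycle count. The sketch assumes an idealised pipeline in which every stage takes exactly one clock cycle and there are no stalls; real processors deviate from this, and the function names are chosen here for illustration.

```python
STAGES = ["Fetch", "D1", "D2", "EX", "WB"]


def cycles_without_pipeline(n_instructions, n_stages=len(STAGES)):
    # Each instruction passes through every stage before the next one starts.
    return n_instructions * n_stages


def cycles_with_pipeline(n_instructions, n_stages=len(STAGES)):
    # The first instruction needs n_stages cycles to fill the pipe; after
    # that, one instruction completes per cycle.
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def pipeline_diagram(n_instructions):
    """Text diagram: row i shows instruction i entering the pipe at cycle i."""
    rows = []
    for i in range(n_instructions):
        rows.append(" " * 6 * i + "".join(f"{stage:<6s}" for stage in STAGES))
    return "\n".join(rows)


print(pipeline_diagram(4))
print(cycles_without_pipeline(4), "cycles without pipelining vs",
      cycles_with_pipeline(4), "with pipelining")
```

For long instruction streams the ratio between the two counts approaches the number of stages, which is the ideal gain of this technique.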
CPU Vector units
• Vectorization is performed by dedicated hardware on the chip
• The compiler generates vector instructions, when it can, from the programmer’s code
• Important optimization which can lead to 4x or 8x speedups, depending on the “size” of the vector unit (e.g. 256 bit)
CPU vs GPU
• A CPU is designed to handle complex tasks, while GPUs do only one thing well: handling billions of repetitive low-level tasks, originally the rendering of triangles in 3D graphics
• Using GPUs for scientific computing was originally called GPGPU (General Purpose GPU programming), and it required mapping scientific code to the matrix operations for manipulating triangles.
• This was insanely difficult to do and took a lot of dedication. However, with the advent of CUDA and OpenCL, high-level languages targeting the GPU, GPU programming is rapidly becoming mainstream in the scientific community.
HPC vs HTC
High Performance
• Granularity largely defined by the algorithm, limitations in the hardware
• Load balancing difficult
• Reliability is all important
– if one part fails the calculation stops (maybe even aborts!)
– check-pointing essential
– hard to dynamically reconfigure for smaller number of processors
High Throughput
• Granularity can be selected to fit the environment
• Load balancing easy
• Mixing workloads is easy
• Sustained throughput is the key goal
– the order in which the individual tasks execute is (usually) not important
– if some equipment goes down the work can be re-run later
– easy to re-schedule dynamically the workload to different configurations
Granularity
• fine-grained parallelism – the independent bits are
small, need to exchange information, synchronize
often
– large number of more expensive processors
expensively interconnected
• coarse-grained – the problem can be decomposed
into large chunks that can be processed
independently
– large number of inexpensive processors,
inexpensively interconnected
[Figure: classification of parallel applications]
(1) Map Only: BLAST Analysis, Local Machine Learning, Pleasingly Parallel
(2) Classic MapReduce: High Energy Physics (HEP) Histograms, Distributed search, Recommender Engines
(3) Iterative Map Reduce or Map-Collective: Expectation maximization, Clustering e.g. K-means, Linear Algebra, PageRank
(4) Point to Point or Map-Communication: Classic MPI, PDE Solvers and Particle Dynamics, Graph Problems
MapReduce and Iterative Extensions (Spark, Twister); MPI, Giraph
Integrated Systems such as Hadoop + Harp with Compute and Communication model separated
Courtesy of Prof. Geoffrey Charles Fox – Indiana University
A TYPICAL HPC ALGORITHM
Matrix multiplication
• Basic matrix multiplication on a 2-D grid
• Matrix multiplication is an important application in
HPC and appears in many areas (linear algebra)
• C = A * B where A, B, and C are matrices (two-
dimensional arrays)
• A restricted case is when B has only one column,
matrix-vector product, which appears in
representation of linear equations and partial
differential equations
C = A x B
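A minimal sketch of the product, written as a plain triple loop over lists of rows, is given below. It is a serial reference version only; the function name is chosen here for illustration, and production HPC codes would normally rely on an optimised library instead.

```python
def matmul(a, b):
    """Return C = A * B for matrices stored as lists of rows."""
    n, m, q = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    c = [[0.0] * q for _ in range(n)]
    for i in range(n):          # every row of C ...
        for j in range(q):      # ... and every column can be computed independently
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))            # [[19.0, 22.0], [43.0, 50.0]]
print(matmul(a, [[1], [1]]))   # matrix-vector product: [[3.0], [7.0]]
```

Since each element of C depends only on one row of A and one column of B, blocks of C can be assigned to different processors, for example arranged on a 2-D grid, with communication needed only to distribute the rows and columns each block requires.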
A TYPICAL HTC ALGORITHM
Monte Carlo Methods
• Monte Carlo simulation is a broad term for methods that use random numbers and statistical sampling to solve problems, rather than exact modeling.
• Due to the sampling, the result will have some uncertainty, but the statistical ‘law of large numbers’ will ensure that the uncertainty goes down as the number of samples grows.
• The statistical law that underlies this is as follows:
– if N independent observations are made of a quantity with standard deviation σ, then the standard deviation of the mean is σ/√N. This means that more observations will lead to more accuracy
• What makes Monte Carlo methods interesting is that the gain in accuracy is not related to the dimensionality of the original problem.
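A classic illustration, not taken from the slides, is the estimation of π from random points in the unit square. The sketch below uses only the Python standard library; the function name and the seeds are arbitrary choices made here.

```python
import math
import random


def estimate_pi(n_samples, seed):
    """Estimate pi from the fraction of random points inside the unit quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


# Independent batches with different seeds could run as separate HTC jobs;
# averaging equally sized batches is equivalent to one larger sample.
for n in (100, 10_000, 100_000):
    estimate = estimate_pi(n, seed=n)
    print(f"N = {n:>7,d}  estimate = {estimate:.4f}  error = {abs(estimate - math.pi):.4f}")
```

The error of a single run fluctuates, but its typical size shrinks in proportion to 1/√N, so a tenfold gain in accuracy needs about a hundred times more samples.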
Agia Pelagia gulf area
Hyper-Threading Technology
• Hyper-Threading Technology brings the simultaneous multi-threading approach to the Intel architecture.
• Hyper-Threading Technology makes a single physical processor appear as two or more logical processors
• Hyper-Threading Technology provides thread-level parallelism (TLP) on each processor, resulting in increased utilization of processor and execution resources
• Each logical processor maintains one copy of the architecture state
• While HT improves some performance benchmarks by up to 30%, its benefits are strongly application dependent. For HPC codes, the consequences of HT also depend on inter-node communication topology and load balance.
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 

High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe Fiameni, CINECA)

  • 1. High Performance & High Throughput Computing Giuseppe Fiameni – g.Fiameni@cineca.it http://hackaday.com/2016/07/13/nirvana-like-youve-never-heard-them-before/
  • 2. Agenda • Computational Problem • System architecture • Serial vs Parallel • Levels of parallelism • Models for parallel programming • HPC and HTC examples EUDAT Summer School 2017 - High Performance/Throughput Computing 2
  • 3. What makes a problem relevant from a computational point of view?
  • 4. Size of computational applications
      - Computational dimension: the number of operations needed to solve the problem. In general it is a function of the size of the data structures involved ($n$, $n^2$, $n^3$, $n \log n$, etc.).
      - flop: a floating-point operation, i.e. a single arithmetic operation on floating-point numbers.
      - flop/s: floating-point operations per second, the unit used to measure the speed of a computer.
      - Computational problems today require $10^{15}$ to $10^{22}$ flop, while one year has only about $3 \times 10^7$ seconds.
      - The most powerful computers today reach a sustained performance of the order of Tflop/s to Pflop/s ($10^{12}$ to $10^{15}$ flop/s).
  • 5. Example: Weather Prediction. Forecasts on a global scale:
      - 3D grid to represent the Earth
          - Earth's circumference: ≈ 40000 km
          - radius: ≈ 6370 km
          - Earth's surface: ≈ $4\pi r^2$ ≈ $5 \times 10^8$ km²
      - 6 variables: temperature, pressure, humidity, and wind speed in the 3 Cartesian directions
      - cells of 1 km on each side
      - 100 slices to see how the variables evolve on the different levels of the atmosphere
      - a 30-second time step is required for the simulation at this resolution
      - each cell requires about 1000 operations per time step (Navier-Stokes turbulence and various phenomena)
      Note: in physics, the Navier–Stokes equations /nævˈjeɪ stoʊks/, named after Claude-Louis Navier and George Gabriel Stokes, describe the motion of viscous fluid substances.
  • 6. Example: Weather Prediction (cont.)
      - Grid: $5 \times 10^8 \times 100 = 5 \times 10^{10}$ cells
      - Each variable in a cell is stored in 8 bytes
      - Memory space: (6 variables) x (8 bytes) x ($5 \times 10^{10}$ cells) ≈ $2 \times 10^{12}$ bytes = 2 TB
      - A 24-hour forecast needs:
          - 24 x 60 x 2 ≈ $3 \times 10^3$ time steps
          - ($5 \times 10^{10}$ cells) x ($10^3$ operations) x ($3 \times 10^3$ time steps) = $1.5 \times 10^{17}$ operations
      - A computer sustaining 1 Tflop/s will take $1.5 \times 10^5$ s: a 24-hour forecast will need about 2 days to run ... but we shall obtain a very accurate forecast!
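The estimate on slides 5 and 6 can be reproduced in a few lines. This is only a sketch of the slide's arithmetic: the function name `forecast_cost` and its parameter names are made up for illustration, and the exact (unrounded) Earth surface is used, so the results differ slightly from the rounded figures on the slide.

```python
import math

EARTH_RADIUS_KM = 6370.0  # value quoted on the slide


def forecast_cost(cell_km=1.0, levels=100, variables=6, bytes_per_value=8,
                  ops_per_cell_step=1000, time_step_s=30, forecast_hours=24,
                  machine_flops=1e12):
    """Back-of-the-envelope cost of the global forecast (hypothetical helper)."""
    surface_km2 = 4 * math.pi * EARTH_RADIUS_KM ** 2       # about 5 x 10^8 km^2
    cells = surface_km2 / cell_km ** 2 * levels            # about 5 x 10^10
    memory_bytes = variables * bytes_per_value * cells     # about 2 x 10^12
    steps = forecast_hours * 3600 // time_step_s           # 2880, about 3 x 10^3
    operations = cells * ops_per_cell_step * steps         # about 1.5 x 10^17
    runtime_s = operations / machine_flops                 # about 1.5 x 10^5 s
    return {"cells": cells, "memory_bytes": memory_bytes, "steps": steps,
            "operations": operations, "runtime_s": runtime_s}


if __name__ == "__main__":
    cost = forecast_cost()
    print(f"cells      : {cost['cells']:.2e}")
    print(f"memory     : {cost['memory_bytes'] / 1e12:.2f} TB")
    print(f"time steps : {cost['steps']}")
    print(f"operations : {cost['operations']:.2e}")
    print(f"run time   : {cost['runtime_s'] / 86400:.1f} days at 1 Tflop/s")
```

Changing `machine_flops` to `1e15` shows why Pflop/s-class machines matter: the same forecast would then take minutes rather than days.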
  • 7. John Von Neumann https://en.wikipedia.org/wiki/Von_Neumann_architecture
  • 8. What if you want to increase performance?
      - Increase the clock rate: the number of clock cycles per second (the processor's pacemaker)
      - Increase the instruction rate: the number of basic instructions that can be executed per clock cycle (pipelining, vectorization)
      - Reduce access time to data: keeping frequently used data in the processor caches significantly improves computational performance
  • 9. Speed of Processors: Clock Cycle and Frequency
      - The clock cycle τ is defined as the time between two adjacent pulses of the oscillator that sets the pace of the processor.
      - The number of these pulses per second is known as the clock speed or clock frequency, generally measured in GHz (gigahertz, or billions of pulses per second).
      - The clock cycle controls the synchronization of operations in a computer: every operation inside the processor lasts a multiple of τ.
      Processor          τ (ns)    Frequency
      CDC 6600           100       10 MHz
      Cyber 76           27.5      36.3 MHz
      IBM ES 9000        9         111 MHz
      Cray Y-MP C90      4.1       244 MHz
      Intel i860         20        50 MHz
      PC Pentium         < 0.5     > 2 GHz
      Power PC           1.17      850 MHz
      IBM Power 5        0.52      1.9 GHz
      IBM Power 6        0.21      4.7 GHz
      - Limits to increasing the clock frequency:
          - The speed of light sets an upper limit to the speed at which electronic components can operate. Propagation velocity of a signal in a vacuum: 300,000 km/s = 30 cm/ns.
          - Heat dissipation problems inside the processor.
          - Quantum tunneling is also expected to become important.
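The relation between the two table columns is simply τ = 1 / frequency. The sketch below, with hypothetical helper names, converts a frequency into a clock period and also shows how far a signal travelling at the vacuum speed of light (30 cm/ns, as quoted on the slide) can get in one cycle.

```python
def clock_period_ns(freq_mhz):
    """Clock cycle tau in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def distance_per_cycle_cm(freq_mhz, signal_speed_cm_per_ns=30.0):
    """Upper bound on the distance a signal can cover in one clock cycle."""
    return clock_period_ns(freq_mhz) * signal_speed_cm_per_ns


if __name__ == "__main__":
    # A few rows of the table above, as (name, frequency in MHz).
    for name, mhz in [("CDC 6600", 10), ("Cray Y-MP C90", 244), ("IBM Power 6", 4700)]:
        print(f"{name:14s} tau = {clock_period_ns(mhz):7.2f} ns, "
              f"light travels at most {distance_per_cycle_cm(mhz):7.1f} cm per cycle")
```

At several GHz the signal can cover only a few centimetres per cycle, which is one way to read the slide's remark that the speed of light limits the clock frequency.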
  • 10. Memory hierarchies
      - Memory access time: the time the processor needs to read data from, or write data to, memory.
      - The hierarchy exists because fast memory is expensive and small, while slow memory is cheap and big.
      - Latency: how long do I have to wait for the data? (Nothing else can be done while waiting.)
      - Throughput: how many bytes per second can be transferred. It matters little while the processor is still waiting for the data to start arriving.
      - Total time = latency + (amount of data / throughput)
      - Time to run code = clock cycles running code + clock cycles waiting for memory
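The "total time" formula above can be tried with concrete numbers. The latency and throughput values below are assumptions chosen only for illustration (they are not taken from the slides), and `transfer_time_s` is a hypothetical helper name.

```python
def transfer_time_s(n_bytes, latency_s, throughput_bytes_per_s):
    """Total time = latency + (amount of data / throughput)."""
    return latency_s + n_bytes / throughput_bytes_per_s


# Assumed memory level: 100 ns latency, 10 GB/s throughput (illustrative only).
LATENCY_S = 100e-9
THROUGHPUT = 10e9

small = transfer_time_s(8, LATENCY_S, THROUGHPUT)          # one 8-byte value
large = transfer_time_s(1_000_000, LATENCY_S, THROUGHPUT)  # a 1 MB block

if __name__ == "__main__":
    print(f"8 B : {small:.3e} s, latency share {LATENCY_S / small:.1%}")
    print(f"1 MB: {large:.3e} s, latency share {LATENCY_S / large:.1%}")
```

With these assumed numbers a single small access is almost entirely latency, while a large contiguous transfer is almost entirely throughput, which is why caches and contiguous access patterns pay off.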
  • 11. Loop unrolling
      - Original loop: for (i = 0; i < N; i++) s += a[i]*b[i];
      - Unrolled by a factor of 2: for (i = 0; i + 1 < N; i += 2) s += a[i]*b[i] + a[i+1]*b[i+1]; if (i < N) s += a[i]*b[i]; /* leftover element when N is odd */
  • 12. Why Have Cache? [Figure: memory bandwidth diagram; surviving labels: CPU 351 GB/sec, 10.66 GB/sec (3%), 253 GB/sec (72%)]
  • 13. Parallelism
      - Serial process:
          - A process in which the sub-processes happen sequentially in time.
          - Speed depends only on the rate at which each sub-process occurs (e.g. processing unit clock speed).
      - Parallel process:
          - A process in which multiple sub-processes can be active simultaneously.
          - Speed depends on the execution rate of each sub-process AND on how many sub-processes can be made to occur simultaneously.
  • 14. Parallel for Capacity
      - Example: searching a database of web sites for some text (e.g. Google)
      - Searching sequentially through large volumes of text is too time-consuming
          - Multiple servers hold different pages
          - Each server can report the result of each individual search
          - The more servers you add, the quicker the search is
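A toy version of this search pattern is sketched below. Each "server" is modelled as a dictionary of pages (a shard) and the shards are searched independently; the names `search_shard` and `parallel_search`, and the example URLs, are invented for illustration. Threads are used only to show the structure: in a real deployment each shard would live on a separate machine.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(pages, term):
    """Search one server's pages (a dict of url -> text) for a term."""
    return [url for url, text in pages.items() if term in text]


def parallel_search(shards, term, max_workers=4):
    """Query every shard independently and merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(search_shard, shard, term) for shard in shards]
        hits = [url for future in futures for url in future.result()]
    return sorted(hits)


if __name__ == "__main__":
    shards = [
        {"a.example/1": "parallel computing", "a.example/2": "cooking recipes"},
        {"b.example/1": "high throughput computing"},
    ]
    print(parallel_search(shards, "computing"))
```

No shard needs to talk to any other shard, so adding servers shortens the search without adding communication between them.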
  • 15. Parallel for Capability
      - Example: large-scale weather simulation
      - A detailed description of the atmosphere is too large to run on today's desktop
          - Multiple servers are needed to hold all grid data in memory
          - Servers need to communicate quickly to synchronise work over the entire grid
          - Communication between servers can become a bottleneck. Latency really matters!
  • 16. Capability and capacity
      - Parallel for Capability => High Performance Computing
          - deals with one large problem
          - problems that are hard to parallelise, not embarrassingly parallel
          - single-processor performance is important
      - Parallel for Capacity => High Throughput Computing
          - appropriate when the problem can be decomposed into many (very many) smaller problems that are essentially independent
          - problems that are easy to parallelise, e.g. parameter sweeps
          - can be adapted to very large numbers of processors
  • 17. Systems main architecture
      - Single processor, single core
      - Dual processor, single core
      - Single processor, dual core
  • 18. Multi core architecture
      - Modern architectures can reach up to 34 cores per processor and 2 or 4 processors per node
  • 19. Flynn Taxonomy • A computer architecture is categorized by the multiplicity of hardware used to manipulate streams of instructions (sequence of instructions executed by the computer) and streams of data (sequence of data used to execute a stream of instructions).
  • 20. SIMD Systems: synchronous parallelism
      - SIMD systems present a single control unit.
      - A single instruction operates simultaneously on multiple data.
      - Array processors and vector systems fall in this class.
      [Diagram labels: CU, IS, PU1 ... PUn, DS1 ... DSn, MM1 ... MMn. Legend: CU Control Unit, PU Processing Unit, MM Memory Module, DS Data Stream, IS Instruction Stream]
  • 21. MIMD Systems: asynchronous parallelism
      - Multiple processors execute different instructions operating on different data.
      - Represents the multiprocessor version of the SIMD class.
      - Wide class ranging from multi-core systems to large MPP systems.
      [Diagram labels: CU1 ... CUn, IS1 ... ISn, PU1 ... PUn, DS1 ... DSn, MM1 ... MMn. Legend: CU Control Unit, PU Processing Unit, MM Memory Module, DS Data Stream, IS Instruction Stream]
  • 22. Moore's Law
      - Empirical law which states that the complexity of devices (number of transistors per square inch in microprocessors) doubles every 18 months (Gordon Moore, Intel co-founder, 1965).
      - Moore's Law is expected to keep holding in the near future, but applied to the number of cores per processor.
  • 23. The limits of parallelism - Amdahl's Law
      - s: time spent in a serial processor on serial parts of the code
      - p: time spent in a serial processor on parts that could be executed in parallel
      - If we have N processors, taking s as the fraction of the time spent in the sequential part of the program (s + p = 1):
        $\text{Speedup} = \frac{s + p}{s + p/N} = \frac{1}{s + (1 - s)/N} \to \frac{1}{s}$ as $N \to \infty$
      - Amdahl, G.M., Validity of single-processor approach to achieving large scale computing capability, Proc. AFIPS Conf., Reston, VA, 1967, pp. 483-485
      (les.robertson@cern.ch)
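Amdahl's formula is easy to evaluate numerically. The following sketch uses hypothetical function names and tabulates the speedup for a few serial fractions and processor counts, together with the limit 1/s.

```python
def amdahl_speedup(s, n):
    """Speedup on n processors when a fraction s of the run time is serial."""
    return 1.0 / (s + (1.0 - s) / n)


def max_speedup(s):
    """Limit of the speedup for an unlimited number of processors."""
    return 1.0 / s


if __name__ == "__main__":
    print("serial part   N=10    N=100   N=1000   limit")
    for s in (0.01, 0.05, 0.10):
        row = "  ".join(f"{amdahl_speedup(s, n):6.1f}" for n in (10, 100, 1000))
        print(f"{s:10.0%}   {row}   {max_speedup(s):6.1f}")
```

Even a 1% serial part caps the speedup at 100, however many processors are added.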
  • 24. Amdahl's Law - maximum speedup [Plot: maximum speedup (vertical axis, 0 to 120) against the sequential part as a percentage of total time (horizontal axis, 1% to 10%)] (les.robertson@cern.ch)
  • 25. The importance of the network
      - Latency: the round-trip time (RTT) to communicate between two processors
      - Communications overhead: for fine-grained parallel programs the problem is latency, not bandwidth
      - $\text{Speedup} = \frac{s + p}{s + c + p/N}$, where $c = \text{latency} + \text{data\_transfer\_time}$
      [Diagram: timeline t divided into sequential, communications overhead and parallelisable parts]
      (les.robertson@cern.ch)
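The effect of the communication term c can be seen by plugging numbers into the formula. In this sketch all quantities are expressed as fractions of the serial run time, the overhead values are assumptions picked for illustration, and `speedup_with_comm` is a hypothetical name.

```python
def speedup_with_comm(s, p, n, latency, data_transfer_time):
    """Speedup = (s + p) / (s + c + p / n), with c = latency + data_transfer_time."""
    c = latency + data_transfer_time
    return (s + p) / (s + c + p / n)


if __name__ == "__main__":
    s, p = 0.05, 0.95  # serial and parallelisable fractions (s + p = 1)
    for n in (10, 100, 1000):
        ideal = speedup_with_comm(s, p, n, 0.0, 0.0)
        real = speedup_with_comm(s, p, n, 0.005, 0.005)  # assumed 1% overhead
        print(f"N={n:5d}  no overhead: {ideal:6.2f}   with overhead: {real:6.2f}")
```

Because c does not shrink with N, it behaves like additional serial time and lowers the achievable speedup limit from 1/s to 1/(s + c).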
  • 26. Aspects of parallelism
      - It has been recognized for a long time that constant performance improvements cannot be obtained just by increasing factors such as processor clock speed: parallelism is needed.
      - Parallelism can be present at many levels:
          - Functional parallelism within the CPU
          - Pipelining and vectorization
          - Multi-processor and multi-core
          - Accelerators
          - Parallel I/O
  • 27. Pipelining
      - A technique in which multiple instructions, belonging to a stream of sequential execution, overlap their execution.
      - This technique improves the performance of a single processor.
      - The concept of pipelining is similar to that of an assembly line in a factory, where the elements are assembled in a continuous flow along a line (pipe) of assembly stations.
      - All the assembly stations must operate at the same processing speed, otherwise the slowest station becomes the bottleneck of the entire pipe.
  • 28. Instruction Pipelining
      The stages are:
      1. Fetch
      2. D1 (main decode)
      3. D2 (secondary decode, also called translate)
      4. EX (execute)
      5. WB (write back to registers and memory)
      http://www.gamedev.net/page/resources/_/technical/general-programming/a-journey-through-the-cpu-pipeline-r3115
  • 29. CPU Vector units
      - Vectorization is performed by dedicated hardware on the chip.
      - The compiler generates vector instructions, when it can, from the programmer's code.
      - An important optimization, which can lead to 4x or 8x speedups depending on the size of the vector unit (e.g. 256 bit).
  • 30. CPU vs GPU
      - A CPU is designed to handle complex tasks, while GPUs only do one thing well: handle billions of repetitive low-level tasks, originally the rendering of triangles in 3D graphics.
      - Using GPUs for scientific computing was originally called GPGPU (General Purpose GPU programming), and it required mapping scientific code to the matrix operations for manipulating triangles.
      - This was insanely difficult to do and took a lot of dedication. However, with the advent of CUDA and OpenCL, high-level languages targeting the GPU, GPU programming is rapidly becoming mainstream in the scientific community.
  • 31. HPC vs HTC
      High Performance
      - Granularity largely defined by the algorithm and by limitations in the hardware
      - Load balancing difficult
      - Reliability is all important
          - if one part fails the calculation stops (maybe even aborts!)
          - check-pointing essential
          - hard to dynamically reconfigure for a smaller number of processors
      High Throughput
      - Granularity can be selected to fit the environment
      - Load balancing easy
      - Mixing workloads is easy
      - Sustained throughput is the key goal
          - the order in which the individual tasks execute is (usually) not important
          - if some equipment goes down the work can be re-run later
          - easy to dynamically re-schedule the workload to different configurations
  • 32. Granularity
      - fine-grained parallelism
          - the independent bits are small, need to exchange information and synchronize often
          - large number of more expensive processors, expensively interconnected
      - coarse-grained parallelism
          - the problem can be decomposed into large chunks that can be processed independently
          - large number of inexpensive processors, inexpensively interconnected
  • 33. [Figure: four classes of parallel applications. Courtesy of Prof. Geoffrey Charles Fox, Indiana University]
      - (1) Map Only (Pleasingly Parallel): BLAST Analysis, Local Machine Learning
      - (2) Classic MapReduce: High Energy Physics (HEP) Histograms, Distributed search, Recommender Engines
      - (3) Iterative Map Reduce or Map-Collective: Expectation maximization, Clustering (e.g. K-means), Linear Algebra, PageRank
      - (4) Point to Point or Map-Communication: Classic MPI, PDE Solvers and Particle Dynamics, Graph Problems
      - Programming systems: MapReduce and Iterative Extensions (Spark, Twister); MPI, Giraph; Integrated Systems such as Hadoop + Harp with Compute and Communication model separated
      [Other diagram labels: Input, map, reduce, Iterations, Output, Local, Graph]
  • 34. A TYPICAL HPC ALGORITHM
  • 35. Matrix multiplication
      - Basic matrix multiplication on a 2-D grid
      - Matrix multiplication is an important application in HPC and appears in many areas (linear algebra)
      - C = A * B, where A, B, and C are matrices (two-dimensional arrays)
      - A restricted case is when B has only one column: the matrix-vector product, which appears in the representation of linear equations and partial differential equations
  • 36. C = A x B
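As a concrete reference for C = A x B, here is a minimal pure-Python sketch. It is not the 2-D grid algorithm from the slide: `matmul` is the textbook triple loop, and `matmul_by_row_blocks` (a hypothetical helper) only illustrates how the rows of A could be split into blocks that different workers would compute, here evaluated one after another.

```python
def matmul(a, b):
    """Textbook triple-loop product of two matrices stored as lists of rows."""
    k, m = len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0] * m for _ in a]
    for i, row in enumerate(a):
        for j in range(m):
            for t in range(k):
                c[i][j] += row[t] * b[t][j]
    return c


def matmul_by_row_blocks(a, b, n_blocks):
    """Split the rows of A into blocks; each block needs all of B.

    The blocks are independent, so each could go to a different worker.
    Here they are simply computed in sequence and concatenated.
    """
    n = len(a)
    bounds = [n * i // n_blocks for i in range(n_blocks + 1)]
    c = []
    for lo, hi in zip(bounds, bounds[1:]):
        c.extend(matmul(a[lo:hi], b))
    return c


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul(A, B))
    print(matmul(A, [[1], [1]]))  # matrix-vector product: B has one column
```

In a distributed setting the interesting part is the data movement this sketch hides: every worker needs access to B (or to the right pieces of it), which is where communication cost enters.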
  • 37. A TYPICAL HTC ALGORITHM
  • 38. Monte Carlo Methods
      - Monte Carlo simulation is a broad term for methods that use random numbers and statistical sampling to solve problems, rather than exact modeling.
      - Due to the sampling, the result will have some uncertainty, but the statistical "law of large numbers" ensures that the uncertainty goes down as the number of samples grows.
      - The statistical law that underlies this is as follows: if N independent observations are made of a quantity with standard deviation $\sigma$, then the standard deviation of the mean is $\sigma/\sqrt{N}$. This means that more observations lead to more accuracy.
      - What makes Monte Carlo methods interesting is that the gain in accuracy is not related to the dimensionality of the original problem.
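A classic small illustration, not taken from the slides, is estimating π by sampling random points in the unit square. The function names are hypothetical. Independent batches with different seeds stand in for independent HTC jobs: they never communicate, and their results are simply averaged at the end.

```python
import math
import random


def estimate_pi(n_samples, seed):
    """Estimate pi from the fraction of random points inside the unit quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def standard_error(sigma, n):
    """Standard deviation of the mean of n independent observations."""
    return sigma / math.sqrt(n)


def combine_batches(n_batches, samples_per_batch):
    """Average independent batches, as separate jobs would be combined."""
    estimates = [estimate_pi(samples_per_batch, seed) for seed in range(n_batches)]
    return sum(estimates) / n_batches


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(f"N={n:8d}  estimate={estimate_pi(n, seed=1):.4f}")
    print("8 batches combined:", combine_batches(8, 50_000))
```

Multiplying the number of samples by 100 reduces the expected error only by a factor of 10, as the $\sigma/\sqrt{N}$ law predicts.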
  • 39. Agia Pelagia gulf area [Figure]
  • 40. Hyper-Threading Technology
      - Hyper-Threading Technology brings the simultaneous multi-threading approach to the Intel architecture.
      - It makes a single physical processor appear as two or more logical processors.
      - It provides thread-level parallelism (TLP) on each processor, resulting in increased utilization of processor and execution resources.
      - Each logical processor maintains one copy of the architecture state.
      - While HT improves some performance benchmarks by up to 30%, its benefits are strongly application dependent. For HPC codes, the consequences of HT also depend on inter-node communication topology and load balance.
  • 42. Reference