Parallelization of
molecular dynamics
July 11, 2019
Jaewoon Jung
(RIKEN Center for Computational Science)
計算科学技術特論A (Special Lectures on Computational Science and Technology A)
Overview of MD
Molecular Dynamics (MD)
1. Energies and forces are described by a classical molecular mechanics force field.
2. The state is updated according to the equations of motion.
Long time MD trajectories are important to obtain
thermodynamic quantities of target systems.
(Diagram: equation of motion → integration → long-time MD trajectory → ensemble generation.)
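As a concrete illustration of the integration step, here is a minimal, self-contained velocity Verlet sketch (a toy 1D harmonic oscillator in Fortran, not GENESIS code; in a real MD code the force comes from the force field on the next slide):

program verlet_demo
  implicit none
  real(8), parameter :: dt = 0.01d0, k = 1.0d0, m = 1.0d0
  real(8) :: x, v, f
  integer :: step
  x = 1.0d0; v = 0.0d0
  f = -k*x                      ! initial force, here U = 0.5*k*x**2
  do step = 1, 1000
    v = v + 0.5d0*dt*f/m        ! half-kick with the current force
    x = x + dt*v                ! drift
    f = -k*x                    ! recompute the force at the new position
    v = v + 0.5d0*dt*f/m        ! half-kick with the new force
  end do
  print '(a,2f12.6)', 'x, v after 1000 steps:', x, v
end program verlet_demo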
Potential energy in MD
The total potential energy in a standard (AMBER/CHARMM-type) force field is

E_total = \sum_{\mathrm{bonds}} k_b (b - b_0)^2
        + \sum_{\mathrm{angles}} k_a (\theta - \theta_0)^2
        + \sum_{\mathrm{dihedrals}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \delta) \right]
        + \sum_{i=1}^{N-1} \sum_{j>i}^{N} \left[ \varepsilon_{ij} \left( \left( \frac{r_{ij}^0}{r_{ij}} \right)^{12} - 2 \left( \frac{r_{ij}^0}{r_{ij}} \right)^{6} \right) + \frac{q_i q_j}{\epsilon r_{ij}} \right]

The bonded terms (bonds, angles, dihedrals) each scale as O(N); the non-bonded double sum scales as O(N^2) and is the main bottleneck in MD.

Splitting the non-bonded sum with the Ewald method gives

E_nonbond = \sum_{(i,j) \in R} \left[ \varepsilon_{ij} \left( \left( \frac{r_{ij}^0}{r_{ij}} \right)^{12} - 2 \left( \frac{r_{ij}^0}{r_{ij}} \right)^{6} \right) + \frac{q_i q_j \, \mathrm{erfc}(\alpha r_{ij})}{r_{ij}} \right]
          + \frac{1}{2\pi V} \sum_{\mathbf{k} \ne 0} \frac{\exp(-|\mathbf{k}|^2 / 4\alpha^2)}{|\mathbf{k}|^2} \left| \mathrm{FFT}(Q)(\mathbf{k}) \right|^2

Real space: O(CN), where R is the set of pairs within the cutoff and C the resulting constant per particle. Reciprocal space: O(N log N). N is the total number of particles.
Non-bonded interaction
1. The non-bonded energy calculation is reduced by introducing a cutoff.
2. The electrostatic energy beyond the cutoff is evaluated in reciprocal space with FFT.
3. It can be reduced further by distributing the work properly over parallel processes, in particular with a good domain decomposition scheme.

(Equation: the Ewald decomposition into real part, reciprocal part, and self energy; with the cutoff, the real-space part drops from O(N^2) to O(N).)
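A minimal sketch of the real-space part under a cutoff, assuming the Ewald screening parameter alpha and a cutoff radius rc (hypothetical names; a production code would also use a neighbor list instead of testing all O(N^2) pairs):

subroutine real_space_energy(n, x, q, alpha, rc, e)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: x(3,n), q(n), alpha, rc
  real(8), intent(out) :: e
  integer :: i, j
  real(8) :: rij
  e = 0.0d0
  do i = 1, n-1
    do j = i+1, n
      rij = sqrt(sum((x(:,i) - x(:,j))**2))
      if (rij < rc) then                        ! skip pairs beyond the cutoff
        e = e + q(i)*q(j)*erfc(alpha*rij)/rij   ! screened term decays fast with r
      end if
    end do
  end do
end subroutine real_space_energy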
Difficulty in performing long-time MD simulations
1. The time step (Δt) is limited to 1-2 fs by the fastest vibrations.
2. On the other hand, biologically meaningful events occur on the time scale of milliseconds or longer.

(Timescale diagram, fs to sec: vibrations, sidechain motions, mainchain motions, folding, protein global motions.)
How to accelerate MD simulations?
=> Parallelization
(Diagram: a serial run on 1 CPU versus a parallel run on 16 CPUs, ideally 16 times faster.)

Good parallelization requires:
1) a small amount of computation in each process, and
2) a small communication cost.
Parallelization
Shared memory parallelization (OpenMP)
(Diagram: processes P1 ... PP all attached to one shared memory.)

• All processes share the data in memory.
• For efficient parallelization, processes should not access the same memory address.
• It is only available among the processors within a single physical node.
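A minimal OpenMP sketch of the rule above, where each thread writes a disjoint range of a shared array so that no two threads touch the same address:

program omp_shared
  implicit none
  integer, parameter :: n = 1000000
  real(8) :: a(n)
  integer :: i
  !$omp parallel do
  do i = 1, n
    a(i) = dble(i)      ! disjoint writes: safe without synchronization
  end do
  !$omp end parallel do
  print *, a(n)
end program omp_shared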
Distributed memory parallelization (MPI)
(Diagram: processes P1 ... PP, each with its own memory M1 ... MP.)

• Processes do not share data in memory.
• Data must be sent/received via explicit communication.
• For efficient parallelization, the amount of communicated data should be minimized.
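A minimal MPI sketch of explicit data exchange on a periodic ring of processes (the buffer size of 100 is an arbitrary assumption): each rank sends its local data to the right neighbor and receives the left neighbor's data.

program mpi_ring
  use mpi
  implicit none
  integer :: ierr, rank, nproc, right, left
  real(8) :: mine(100), halo(100)
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
  mine = dble(rank)                      ! this rank's local data
  right = mod(rank+1, nproc)
  left  = mod(rank-1+nproc, nproc)
  ! send to the right neighbor, receive from the left neighbor
  call MPI_Sendrecv(mine, 100, MPI_DOUBLE_PRECISION, right, 0, &
                    halo, 100, MPI_DOUBLE_PRECISION, left,  0, &
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
  call MPI_Finalize(ierr)
end program mpi_ring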
Hybrid parallelization (MPI+OpenMP)
(Diagram: processes P1 ... PP with memories M1 ... MP; OpenMP threads share memory within each process.)

• A combination of shared memory and distributed memory parallelization.
• It is useful for minimizing the communication cost with a very large number of processors.
SIMD (Single instruction, multiple data)
• The same operation is performed on multiple data points simultaneously.
• It is usually applicable to uniform tasks such as adjusting a graphic image or an audio volume.
• In most MD programs, SIMD has become one of the important topics for increasing performance.

(Diagram: without SIMD, four elements take four instructions; with 4-wide SIMD, one instruction: 4 times faster.)
SIMT (Single instruction, multiple threads)
• SIMD combined with multithreading.
• The SIMT execution model is usually implemented on GPUs and is related to GPGPU (General-Purpose computing on Graphics Processing Units).
• Currently, CUDA executes 32 threads in SIMT lockstep (warp size = 32).

(Diagram: a CUDA grid of thread blocks; each block, e.g. block (1,0), contains a 2D array of threads (0,0) ... (3,2).)
MPI Parallelization of MD
(non-bonded interaction in real space)
Parallelization scheme 1: Replicated data approach
1. Each process has a copy of all particle data.
2. Each process performs only its share of the total work, assigned by striding the do loops.

Serial:
do i = 1, N
  do j = i+1, N
    energy(i,j)
    force(i,j)
  end do
end do

MPI parallel (round-robin over the outer loop):
my_rank = MPI rank
proc    = total number of MPI processes
do i = my_rank+1, N, proc
  do j = i+1, N
    energy(i,j)
    force(i,j)
  end do
end do
MPI reduction (energy, force)
(Diagram: the computational cost of outer index i is proportional to N - i, so with simple striding, MPI ranks 0-3 receive unequal amounts of work over atom indices 1-8.)

Perfect load balance is not guaranteed in this parallelization scheme.
Hybrid (MPI+OpenMP) parallelization of the
Replicated data approach
1. The work is distributed over MPI processes and OpenMP threads.
2. Scalability improves because fewer MPI processes take part in communication.

MPI only:
my_rank = MPI rank
proc    = total number of MPI processes
do i = my_rank+1, N, proc
  do j = i+1, N
    energy(i,j)
    force(i,j)
  end do
end do
MPI reduction (energy, force)

MPI + OpenMP:
my_rank = MPI rank
proc    = total number of MPI processes
nthread = number of OpenMP threads
!$omp parallel
id    = OpenMP thread id
my_id = my_rank*nthread + id
do i = my_id+1, N, proc*nthread
  do j = i+1, N
    energy(i,j)
    force(i,j)
  end do
end do
OpenMP reduction
!$omp end parallel
MPI reduction (energy, force)
Pros and Cons of the Replicated data approach
1. Pros: easy to implement.
2. Cons:
  • Parallel efficiency is not good:
    • no perfect load balance
    • the communication cost is not reduced by increasing the number of processes
    • only the energy/force calculation is parallelized (with MPI, parallelizing the integration is not very efficient)
  • It needs a lot of memory because of the global (replicated) data.
Parallelization scheme 2: Domain decomposition
1. The simulation space is divided into subdomains according to the MPI processes (different colors for different MPI ranks).
2. Each MPI process handles only its own subdomain.
3. MPI communication occurs only among neighboring processes.

(Diagram: a grid of colored subdomains with communication between neighboring processors.)
Parallelization scheme 2: Domain decomposition
1. For interactions between different subdomains, each subdomain needs the data of a buffer region around it.
2. The size of the buffer region depends on the cutoff value.
3. Interactions between particles in different subdomains must be handled very carefully.
4. The sizes of the subdomain and the buffer region decrease as the number of processes increases.

(Diagram: a subdomain surrounded by a buffer region whose width is the cutoff distance r_c.)
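A tiny sketch of how a particle could be mapped to its subdomain, assuming a box of size (Lx, Ly, Lz), coordinates already wrapped into [0, L), and an npx x npy x npz process grid (hypothetical names, not the actual GENESIS rule):

pure function owner_rank(x, y, z, Lx, Ly, Lz, npx, npy, npz) result(rank)
  implicit none
  real(8), intent(in) :: x, y, z, Lx, Ly, Lz
  integer, intent(in) :: npx, npy, npz
  integer :: ix, iy, iz, rank
  ix = min(int(x/Lx*dble(npx)), npx-1)   ! subdomain index along x
  iy = min(int(y/Ly*dble(npy)), npy-1)
  iz = min(int(z/Lz*dble(npz)), npz-1)
  rank = ix + npx*(iy + npy*iz)          ! linearized MPI rank
end function owner_rank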
Pros and Cons of the domain decomposition approach
1. Pros
  • Good parallel efficiency.
  • The computational cost per process decreases as the number of processes increases.
  • Not only the energy but also the integration can easily be parallelized.
  • Huge systems become feasible.
  • The data size per process decreases as the number of processes increases.
2. Cons
  • Implementation is not easy.
  • The decomposition scheme depends strongly on the potential energy type, the cutoff, and so on.
  • Good performance cannot be obtained for nonuniform particle distributions.
Special treatment for nonuniform distributions
Tree method: subdomain sizes are adjusted so that each subdomain holds the same number of particles.
Hilbert space-filling curve: a map that relates multi-dimensional space to a one-dimensional curve.
(Figure: Hilbert curve, from Wikipedia: https://en.wikipedia.org/wiki/Hilbert_curve)
Comparison of the two parallelization schemes

Scheme               | Computation | Communication  | Memory
Replicated data      | O(N/P)      | O(N)           | O(N)
Domain decomposition | O(N/P)      | O((N/P)^(2/3)) | O(N/P)

N: system size, P: number of processes
MPI Parallelization of MD
(reciprocal space)
Smooth particle mesh Ewald method
The Ewald sum splits the electrostatic energy into a real part, a reciprocal part, and a self energy:

E = \frac{1}{2} \sum_{\mathbf{n}}{}' \sum_{i,j=1}^{N} q_i q_j \frac{\mathrm{erfc}(\alpha |\mathbf{r}_i - \mathbf{r}_j + \mathbf{n}|)}{|\mathbf{r}_i - \mathbf{r}_j + \mathbf{n}|}
  + \frac{1}{2\pi V} \sum_{\mathbf{k} \ne 0} \frac{\exp(-\pi^2 |\mathbf{k}|^2 / \alpha^2)}{|\mathbf{k}|^2} S(\mathbf{k}) S(-\mathbf{k})
  - \frac{\alpha}{\sqrt{\pi}} \sum_{i} q_i^2

(the primed sum over lattice vectors \mathbf{n} omits i = j when \mathbf{n} = 0; in practice only pairs with r_{ij} < r_c contribute to the real part)

The structure factor S(\mathbf{k}) in the reciprocal part is approximated, using cardinal B-splines of order n, by the Fourier transform of the gridded charge Q:

S(k_1, k_2, k_3) \approx b_1(k_1) b_2(k_2) b_3(k_3) F(Q)(k_1, k_2, k_3)
It is important to parallelize the Fast Fourier transform efficiently in PME!!
Ref : U. Essmann et al, J. Chem. Phys. 103, 8577 (1995)
Overall procedure of reciprocal space calculation
PME_Pre: q_i (charge in real space) → Q (charge on the grid)
FFT: F(Q)
Evaluate the reciprocal energy E from F(Q)
Inverse FFT
PME_Post: force calculation from the back-transformed grid
A simple note on MPI_alltoall communication

(Diagram: before MPI_alltoall, Proc1 holds A1-A4, Proc2 holds B1-B4, Proc3 holds C1-C4, Proc4 holds D1-D4; afterwards, Proc1 holds A1, B1, C1, D1, Proc2 holds A2, B2, C2, D2, and so on.)

MPI_alltoall is the same as a matrix transpose!!
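A minimal runnable sketch of that statement: each rank starts with one row of a P x P matrix and, after MPI_Alltoall, holds one column (the element values are an arbitrary example):

program alltoall_transpose
  use mpi
  implicit none
  integer :: ierr, rank, nproc, q
  real(8), allocatable :: sendbuf(:), recvbuf(:)
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
  allocate(sendbuf(nproc), recvbuf(nproc))
  do q = 1, nproc
    sendbuf(q) = dble(rank*nproc + q)    ! row "rank" of the matrix
  end do
  call MPI_Alltoall(sendbuf, 1, MPI_DOUBLE_PRECISION, &
                    recvbuf, 1, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, ierr)
  ! recvbuf(q) now holds the entry sent by rank q-1 to this rank,
  ! i.e. this rank owns one column of the matrix: a transpose.
  call MPI_Finalize(ierr)
end program alltoall_transpose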
Parallel 3D FFT – slab (1D) decomposition
1. Each process is assigned a slab of size (N_x / P) x N_y x N_z for computing the FFT of an N_x x N_y x N_z grid on P processes.
2. The scalability is limited by the number of grid planes: P cannot exceed N_x.
3. N_x should be divisible by P.

(Diagram: the grid divided into four slabs, one per process Proc 1 - Proc 4.)
Parallel 3D FFT – slab (1D) decomposition
(Diagram: the FFTs along the two local dimensions need no communication; the transpose before the third FFT requires MPI_Alltoall communication among all processors.)
Parallel 3D FFT – slab (1D) decomposition (continued)
1. Slab decomposition of a 3D FFT has three steps:
  • a 2D FFT (or two 1D FFTs) along the two local dimensions
  • a global transpose (communication)
  • a 1D FFT along the third dimension
2. Pros
  • Fast for a small number of processes.
3. Cons
  • The number of usable processes is limited.
Parallel 3D FFT – 2D decomposition
Each process is assigned a pencil of size N_x x (N_y / P_1) x (N_z / P_2) for computing the FFT of an N_x x N_y x N_z grid on P = P_1 x P_2 processes.

(Diagram: at each of the two transpose steps, MPI_Alltoall communication takes place only among the processes of the same color, i.e. within one subgroup of a row or column; in the pictured 3 x 3 process grid, among 3 processes.)
Parallel 3D FFT – 2D decomposition (continued)
1. 2D decomposition of a 3D FFT has five steps:
  • a 1D FFT along the local dimension
  • a global transpose
  • a 1D FFT along the second dimension
  • a global transpose
  • a 1D FFT along the third dimension
2. Each global transpose requires communication only within subgroups of all nodes.
3. Cons: slower than 1D decomposition for a small number of processes.
4. Pros: the maximum degree of parallelization is increased.
Pseudo code of reciprocal space calculation with
2D decomposition of 3D FFT
! compute the charge grid Q
do i = 1, natom/P
  compute Q_orig
end do
call mpi_alltoall(Q_orig, Q_new, ...)
accumulate Q from Q_new

! forward FFT: F(Q)
do iz = 1, zgrid(local)            ! 1D FFT along x (local dimension)
  do iy = 1, ygrid(local)
    work_local = Q(my_rank)
    call fft(work_local)
    Q(my_rank) = work_local
  end do
end do
call mpi_alltoall(Q, Q_new, ...)   ! global transpose: make y local
do iz = 1, zgrid(local)            ! 1D FFT along y
  do ix = 1, xgrid(local)
    work_local = Q_new(my_rank)
    call fft(work_local)
    Q(my_rank) = work_local
  end do
end do
call mpi_alltoall(Q, Q_new, ...)   ! global transpose: make z local
do iy = 1, ygrid(local)            ! 1D FFT along z
  do ix = 1, xgrid(local)
    work_local = Q(my_rank)
    call fft(work_local)
    Q(my_rank) = work_local
  end do
end do

! compute energy and virial
do iz = 1, zgrid
  do iy = 1, ygrid(local)
    do ix = 1, xgrid(local)
      energy = energy + sum(Th*Q)
      virial = virial + ...
    end do
  end do
end do

! X = F^-1(Th) * F^-1(Q): transform back, reversing the three stages
! FFT (F(X))
do iy = 1, ygrid(local)            ! 1D FFT along z
  do ix = 1, xgrid(local)
    work_local = Q(my_rank)
    call fft(work_local)
    Q(my_rank) = work_local
  end do
end do
call mpi_alltoall(Q, Q_new, ...)   ! global transpose
do iz = 1, zgrid(local)            ! 1D FFT along y
  do ix = 1, xgrid(local)
    work_local = Q(my_rank)
    call fft(work_local)
    Q(my_rank) = work_local
  end do
end do
call mpi_alltoall(Q, Q_new, ...)   ! global transpose
do iz = 1, zgrid(local)            ! 1D FFT along x
  do iy = 1, ygrid(local)
    work_local = Q(my_rank)
    call fft(work_local)
    Q(my_rank) = work_local
  end do
end do

! compute forces
Parallel 3D FFT – 3D decomposition
• More communication steps than the slab and pencil FFTs.
• MPI_Alltoall communication occurs only within one-dimensional subgroups.
• The communication cost is reduced for a large number of processes.

Ref: J. Jung et al., Comput. Phys. Comm. 200, 57-65 (2016)
Case Study:
Parallelization in GENESIS MD software
Domain decomposition in GENESIS SPDYN
1. The simulation space is divided into subdomains according to the number of MPI processes.
2. Each subdomain is further divided into unit domains (called cells).
3. Particle data are grouped together cell by cell.
4. Communication takes place only between neighboring processes.
5. Within each subdomain, OpenMP parallelization is used.

(Diagram: subdomains subdivided into cells; the buffer width is the cutoff distance r_c.)
Efficient shared memory calculation
=> Cell-wise particle data
1. Each cell contains an array with the data of the particles that reside within it.
2. This improves the locality of the particle data for operations on individual cells or on pairs of cells.

(Diagram: traditional cell lists point from each cell into a global particle array; cell-wise arrays store the particle data itself contiguously per cell. Ref: P. Gonnet, JCC 33, 76-81 (2012))
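A hypothetical sketch of such cell-wise storage (not the actual GENESIS data structures): the particles of cell icell sit contiguously in coord(:,:,icell), so a kernel over a cell or a cell pair streams through memory.

module cell_data
  implicit none
  integer, parameter   :: maxatom = 128      ! assumed per-cell capacity
  integer              :: ncell              ! cells in this subdomain
  integer, allocatable :: natom_cell(:)      ! particles currently in each cell
  real(8), allocatable :: coord(:,:,:)       ! coord(1:3, 1:maxatom, 1:ncell)
  real(8), allocatable :: charge(:,:)        ! charge(1:maxatom, 1:ncell)
end module cell_data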
Midpoint cell method
• For every cell pair, a midpoint cell is determined.
• If the midpoint cell is not uniquely determined, it is chosen with load balance in mind.
• For every particle pair, the subdomain that computes the interaction is determined from the midpoint cell.
• With this scheme, each subdomain only needs to communicate the cells adjacent to it.
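A hypothetical sketch of the midpoint rule for two cells given as 3D integer indices (not the actual GENESIS code): integer division simply picks the lower cell when the midpoint is not unique, whereas the real scheme breaks such ties by load balance.

pure function midpoint_cell(ci, cj) result(cm)
  implicit none
  integer, intent(in) :: ci(3), cj(3)
  integer :: cm(3)
  cm = (ci + cj) / 2     ! on a tie (odd sum), this rounds toward the lower cell
end function midpoint_cell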
Subdomain structure with midpoint cell method
1. Each MPI process has a local subdomain surrounded by boundary cells.
2. The local subdomain has at least two cells in each direction.

(Diagram: the local subdomain and its surrounding boundary layer.)
Communication pattern (example of coordinates)
1. Each subdomain can evaluate its interactions after obtaining the boundary-region data from the other processes.
2. The communication pattern shown below minimizes the number of communication steps.

① Receive from the upper process in the x dimension
② Receive from the lower process in the x dimension
③ Receive from the upper process in the y dimension
④ Receive from the lower process in the y dimension
Communication pattern (example of forces)
1. The communication pattern for forces is the reverse of the pattern for coordinates.
2. Each process sends the force data accumulated in its boundary cells.

① Send to the lower process in the y dimension
② Send to the upper process in the y dimension
③ Send to the lower process in the x dimension
④ Send to the upper process in the x dimension
OpenMP parallelization in GENESIS
1. In the integrator, the cell indices are divided among the threads according to the thread id.
2. For the non-bonded interaction, the cell-pair lists are first built and then distributed over the threads (a sketch follows below).

(Diagram: a 4x4 grid of cells numbered 1-16; the cell-pair list (1,2), (1,5), (1,6), (2,3), (2,5), (2,6), (2,7), (3,4), ... is dealt out to threads 1-4 in round-robin order.)
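A minimal sketch of dealing the cell-pair list out to OpenMP threads in round-robin order (hypothetical names: pairlist holds the cell pairs, and a compute_cell_pair routine is assumed to evaluate one pair of cells):

subroutine nonbond_omp(npair, pairlist)
  use omp_lib
  implicit none
  integer, intent(in) :: npair
  integer, intent(in) :: pairlist(2, npair)
  integer :: id, nthread, ip
  !$omp parallel private(id, nthread, ip)
  id = omp_get_thread_num()
  nthread = omp_get_num_threads()
  do ip = id + 1, npair, nthread   ! thread id takes pairs id+1, id+1+nthread, ...
    call compute_cell_pair(pairlist(1, ip), pairlist(2, ip))
  end do
  !$omp end parallel
end subroutine nonbond_omp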
Parallelization of FFT in GENESIS
FFT in GENESIS (2 dimensional view)
(Diagram: GENESIS uses the identical domain decomposition for real space and reciprocal space, whereas NAMD and Gromacs use different decompositions for the two spaces.)
Parallelization based on CPU/GPU calculation
The computationally expensive parts run on the GPU; the communication-intensive parts remain on the CPU.
SIMD in GENESIS (developer version)
• Array of Structures (AoS): x1 y1 z1 | x2 y2 z2 | ... | xN yN zN
  In GENESIS this is expressed as coord_pbc(1:3,1:natom,1:ncell).
• Structure of Arrays (SoA): x1 x2 x3 x4 ... zN-2 zN-1 zN
  In the updated GENESIS source code for KNL it will be expressed as coord_pbc(1:natom,1:3,1:ncell).
• On Haswell/Broadwell/KNL machines, SoA shows better performance than AoS due to more efficient vectorization (SIMD).
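A small sketch of why the SoA layout vectorizes well, assuming the coord_pbc(1:natom,1:3,1:ncell) layout above (hypothetical subroutine): the innermost loop reads the x coordinates of one cell with unit stride, which maps directly onto SIMD loads, whereas the AoS layout would access memory with stride 3.

subroutine dx_from_soa(natom, ncell, icell, xref, coord_pbc, dx)
  implicit none
  integer, intent(in)  :: natom, ncell, icell
  real(8), intent(in)  :: xref, coord_pbc(natom, 3, ncell)
  real(8), intent(out) :: dx(natom)
  integer :: i
  do i = 1, natom
    dx(i) = coord_pbc(i, 1, icell) - xref   ! unit stride: SIMD-friendly
  end do
end subroutine dx_from_soa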
GENESIS developments on KNL
Simple pair-list scheme with better performance
• The pair-list scheme in GENESIS 1.X does not perform well because of non-contiguous memory access (Figure (a)).
GENESIS developments on KNL
                  | Memory usage | Number of interactions | CPU time (pair-list) | CPU time (non-bonded) | CPU time (total)
Algorithm 1 (AoS) | 305.88 MiB   | 47364770               | 9.95                 | 36.42                 | 46.37
Algorithm 1 (SoA) | 305.88 MiB   | 47364770               | 6.60                 | 33.71                 | 40.31
Algorithm 2 (AoS) | 48.4 MiB     | 122404436              | 4.91                 | 102.46                | 107.37
Algorithm 2 (SoA) | 48.4 MiB     | 122404436              | 4.15                 | 27.86                 | 32.01

• Algorithm 1: the pair-list algorithm used in GENESIS 1.X
• Algorithm 2: the new pair-list scheme
• Target system: ApoA1 (about 90,000 atoms, 2,000 time steps)
Benchmark performance of GENESIS
GENESIS 1.0 performance on K (1)
(Plots: time per step (sec) and simulation speed (ns/day) versus number of cores, from 8,192 to 262,144 cores.)

• number of atoms: 11,737,298
• macromolecules: 216 (43 types)
• metabolites: 4,212 (76 types)
• ions: 23,049 (Na+, Cl-, Mg2+, K+)
• water molecules: 2,944,143
• PBC size: 480 x 480 x 480 Å³
GENESIS 1.0 performance on K (2)
(Plots: time per step (sec) and simulation speed (ns/day) versus number of cores, from 16,384 to 262,144 cores. Scale note from the slide: 100 nm = 1/10000 mm.)
GENESIS performance on KNL (Oakforest-PACS)
GENESIS performance on KNL (LANL Trinity)
Efficient parallelization enables us to perform MD simulations of the world's largest systems.
Summary
• Parallelization: MPI and OpenMP
– MPI: Distributed memory parallelization
– OpenMP: Shared memory parallelization
• Parallelization of MD: mainly by domain decomposition with FFT
parallelization
• Key issue in parallelization: minimizing the communication cost to maximize the parallel efficiency