2. Content
• Measurement of speedup of matrix multiplication using SkeTo and HTA
• Early knowledge for developing systematically efficient HTA algorithmics
3. Parallel algorithm of inner-product-based matrix multiplication, from the paper:
“A Compositional Framework for Developing Parallel Programs on Two-Dimensional Arrays”, Kento Emoto, Zhenjiang Hu, et al.
4. Inner product based MM
method
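A and B are two column vectors of length n, with elements a_k and b_k.
The inner product (IProd) of A and B is the sum of the products of corresponding elements:
$$\mathrm{IProd}(A, B) = \sum_{k=1}^{n} a_k \, b_k$$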
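A minimal Python sketch of the inner product of two equal-length vectors. The function name `iprod` is an assumption chosen to echo IProd; it is not taken from SkeTo or from the paper.

```python
from typing import Sequence


def iprod(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product: sum of a[k] * b[k] over all positions k."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))
```

For example, `iprod([1, 2, 3], [4, 5, 6])` multiplies element by element and adds the three products.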
5. Inner product based MM
method
all_redr operator
all_redc operator
Zipwithiprod operator
6. Alg 1: Parallel Algorithm for
SkeTo
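$$
A = \begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n,1} & a_{n,2} & \cdots & a_{n,n}
\end{pmatrix}
\;\xrightarrow{\ \mathrm{all\_redr}\ }\;
\begin{pmatrix}
A_1 & A_1 & \cdots & A_1 \\
A_2 & A_2 & \cdots & A_2 \\
\vdots & \vdots & \ddots & \vdots \\
A_n & A_n & \cdots & A_n
\end{pmatrix}
$$

$$
B = \begin{pmatrix}
b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\
b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n,1} & b_{n,2} & \cdots & b_{n,n}
\end{pmatrix}
\;\xrightarrow{\ \mathrm{all\_redc}\ }\;
\begin{pmatrix}
B_1 & B_2 & \cdots & B_n \\
B_1 & B_2 & \cdots & B_n \\
\vdots & \vdots & \ddots & \vdots \\
B_1 & B_2 & \cdots & B_n
\end{pmatrix}
$$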
Where:
$$A_i = (a_{i,1}, a_{i,2}, a_{i,3}, \ldots, a_{i,n}), \quad i = 1, 2, \ldots, n$$
$$B_j = (b_{1,j}, b_{2,j}, b_{3,j}, \ldots, b_{n,j}), \quad j = 1, 2, \ldots, n$$
all_redr = scanr(<<, >>) o scan(>>, BSD) o map |.|
all_redc = scanr(>>, <<) o scan(ABV, >>) o map |.|
(a) << (b) = (a)
(a) >> (b) = (b)
(a) BSD (b) = (a, b)
(a) ABV (b) = (a, b)
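The effect of the two broadcast operators can be modelled sequentially in Python. This sketch only reproduces the result (every cell holds its whole row, or its whole column); it does not implement the scan-based composition given above, and it is not SkeTo code. Matrices are assumed to be lists of rows.

```python
from typing import List, Sequence, Tuple


def all_redr(m: Sequence[Sequence[float]]) -> List[List[Tuple[float, ...]]]:
    """Replace every element with a tuple holding its entire row."""
    return [[tuple(row) for _ in row] for row in m]


def all_redc(m: Sequence[Sequence[float]]) -> List[List[Tuple[float, ...]]]:
    """Replace every element with a tuple holding its entire column."""
    cols = [tuple(col) for col in zip(*m)]  # column j of m
    return [list(cols) for _ in m]          # every output row is (B1, B2, ..., Bn)
```

With this model, position (i, j) of `all_redr(A)` holds row i of A, and position (i, j) of `all_redc(B)` holds column j of B, which is exactly the pairing the inner product needs.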
7. Alg 1: Parallel Algorithm for
SkeTo
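$$
A * B = \mathrm{Zipwithiprod}\left(
\begin{pmatrix}
A_1 & A_1 & \cdots & A_1 \\
A_2 & A_2 & \cdots & A_2 \\
\vdots & \vdots & \ddots & \vdots \\
A_n & A_n & \cdots & A_n
\end{pmatrix},
\begin{pmatrix}
B_1 & B_2 & \cdots & B_n \\
B_1 & B_2 & \cdots & B_n \\
\vdots & \vdots & \ddots & \vdots \\
B_1 & B_2 & \cdots & B_n
\end{pmatrix}
\right)
= \begin{pmatrix}
(A_1 \cdot B_1) & (A_1 \cdot B_2) & \cdots & (A_1 \cdot B_n) \\
(A_2 \cdot B_1) & (A_2 \cdot B_2) & \cdots & (A_2 \cdot B_n) \\
\vdots & \vdots & \ddots & \vdots \\
(A_n \cdot B_1) & (A_n \cdot B_2) & \cdots & (A_n \cdot B_n)
\end{pmatrix}
$$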
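The whole of Alg 1 can be sketched sequentially in Python for square matrices stored as lists of rows. All function names here (`iprod`, `all_redr`, `all_redc`, `zipwith2d`, `mm`) are illustrative assumptions, and the broadcast steps are written directly rather than as scans, so this shows the data flow only, not the parallel SkeTo implementation.

```python
def iprod(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def all_redr(m):
    """Every element becomes its entire row."""
    return [[tuple(row) for _ in row] for row in m]


def all_redc(m):
    """Every element becomes its entire column."""
    cols = [tuple(col) for col in zip(*m)]
    return [list(cols) for _ in m]


def zipwith2d(f, x, y):
    """Apply f to corresponding elements of two equally shaped 2-D arrays."""
    return [[f(p, q) for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def mm(a, b):
    """Matrix product of two n-by-n matrices via broadcast + zipwith."""
    return zipwith2d(iprod, all_redr(a), all_redc(b))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = mm(a, b)  # [[19, 22], [43, 50]]
```

The sketch keeps n copies of every row and column in memory, which is the inefficiency of composing skeletons naively; it is meant to mirror the slide, not to be fast.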
11. Early knowledge for developing systematically
efficient HTA algorithmics
• Formalize the HTA data structure
• Use functional-language notation
• A fixed set of parallel skeletons
• Map, Reduce, MapReduce, Scan, ...
• A systematic programming methodology
• Develop efficient and correct parallel programs
• An automatic optimization mechanism
• Eliminate inefficiencies caused by compositional and nested uses of skeletons
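The skeletons named in the list can be illustrated with sequential Python stand-ins. The names `map_skel`, `reduce_skel`, `scan_skel` and `map_reduce` are assumptions made for this sketch; a real skeleton library would run the same operations in parallel, which is why the operator passed to reduce and scan is expected to be associative.

```python
from functools import reduce
from itertools import accumulate
import operator


def map_skel(f, xs):
    """Map: apply f to every element independently."""
    return [f(x) for x in xs]


def reduce_skel(op, xs):
    """Reduce: combine all elements of a non-empty list with op."""
    return reduce(op, xs)


def scan_skel(op, xs):
    """Scan: running (prefix) results of op, same length as xs."""
    return list(accumulate(xs, op))


def map_reduce(f, op, xs):
    """MapReduce: a map followed by a reduce."""
    return reduce_skel(op, map_skel(f, xs))


prefix_sums = scan_skel(operator.add, [1, 2, 3, 4])                   # [1, 3, 6, 10]
sum_of_squares = map_reduce(lambda x: x * x, operator.add, [1, 2, 3])  # 14
```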
12. Formal Definition of HTA
• Constructive Algorithmics
• List, Matrix: not recursive
• Tree: may be recursive
• HTA is a recursive data structure
• How to construct it recursively?
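One possible way to write down a recursive construction, reading HTA as a hierarchically tiled array: a value is either a plain 2-D block or a grid whose cells are themselves HTAs. The class names `Leaf` and `Tiled` and the helper functions are hypothetical, introduced only for this sketch; they are not part of any HTA library.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    data: List[List[float]]    # an ordinary 2-D block of numbers


@dataclass
class Tiled:
    tiles: List[List["HTA"]]   # a grid whose cells are again HTAs


HTA = Union[Leaf, Tiled]


def depth(h: HTA) -> int:
    """Number of tiling levels above the leaf blocks."""
    if isinstance(h, Leaf):
        return 0
    return 1 + max(depth(t) for row in h.tiles for t in row)


def total(h: HTA) -> float:
    """Sum of all numbers, computed by recursion over the structure."""
    if isinstance(h, Leaf):
        return sum(sum(row) for row in h.data)
    return sum(total(t) for row in h.tiles for t in row)
```

Functions such as `total` follow the shape of the type, which is the property constructive algorithmics relies on when deriving programs from a data structure's constructors.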
13. Systematic programming
methodology
• Extend the results for two-dimensional arrays.
• Almost homomorphisms and the accumulative parallel skeleton are very useful.
• The paper: “An Accumulative Parallel Skeleton for All”
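As a small illustration of an almost homomorphism (not of the accumulative skeleton itself): the mean of a list is not a list homomorphism, but it is a projection of one that computes the pair (sum, count). The names `hom`, `combine` and `mean` are assumptions for this Python sketch.

```python
from functools import reduce


def hom(f, op, xs):
    """List homomorphism on a non-empty list: map f, then reduce with associative op."""
    return reduce(op, (f(x) for x in xs))


def combine(p, q):
    """Associative combination of two (sum, count) pairs."""
    return (p[0] + q[0], p[1] + q[1])


def mean(xs):
    """Almost homomorphism: a homomorphism to (sum, count), then a projection."""
    s, n = hom(lambda x: (x, 1), combine, xs)
    return s / n
```

Because `combine` is associative, the two halves of a list can be processed independently and merged afterwards, which is what makes the homomorphic part parallelizable.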
14. Automatic optimization
mechanism
• Tupling
• ???
• Fusion
• Fuse several skeletons into one.
• Eliminate unnecessary intermediate data
structures passed between skeletons.
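The simplest instance of fusion is merging two consecutive maps into one, which removes the intermediate list between them. The Python sketch below uses the hypothetical names `unfused` and `fused` and only demonstrates that both forms give the same result; it says nothing about how an automatic optimizer would find the rewrite.

```python
def unfused(f, g, xs):
    """map f after map g, with an explicit intermediate list."""
    tmp = [g(x) for x in xs]      # intermediate data structure
    return [f(y) for y in tmp]


def fused(f, g, xs):
    """One map with the composed function; no intermediate list is built."""
    return [f(g(x)) for x in xs]


def double(x):
    return 2 * x


def inc(x):
    return x + 1


data = [1, 2, 3]
same = unfused(inc, double, data) == fused(inc, double, data)  # True
```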