Introduction to Polyhedral Compilation
Akihiro Hayashi, Jun Shirako
Rice University

Outline
- High-level summary
- Theory
- Compilers and tools
HIGH LEVEL SUMMARY
Parallel Computing
- The first priority is "performance"
[Figure: supercomputers, personal computers, smartphones, and embedded systems. Pictures borrowed from commons.wikimedia.org, www.hirt-japan.info]
Parallel programming is hard…
- Exploiting SIMD
- Scheduling tasks on CPUs
- Optimizing data locality
- Utilizing accelerators
[Figure: memory hierarchy of multi-core CPUs (DRAM, an L3 cache, L2 caches, per-core L1 caches and SIMD units; DRAM is the slowest level, registers the fastest) and of many-core GPUs (many cores sharing an L2 cache and DRAM)]
A gap between domain experts and hardware
[Figure: a stack from the application domain (domain experts) through programming languages, compilers, and the runtime down to the hardware (concurrency experts)]
- Domain experts want significant performance improvements easily (performance portability)
- It is hard to exploit the full capability of the hardware
- We believe languages and compilers are very important!
A review of literature
- Automatic parallelizing compilers
  - IBM XL Compilers, Intel Compilers, OSCAR, Pluto, Polly, Polaris, R-Stream, SUIF, …
- Parallel languages
  - Language-based:
    - Cilk, CUDA, OpenCL, C++AMP, Java, Habanero C/Java, PGAS, …
  - Directive-based:
    - OpenMP, OpenACC, OmpSs, …
  - Library-based:
    - Charm++, TBB, Thrust, RAJA, Kokkos, UPC++, HJLib, …
From the perspective of compilers…
- Compilers are among the most complicated pieces of software ☹
  - Pointer analysis
  - Scalar optimizations
  - Loop transformations
  - Vectorization/SIMDization
  - Scheduling
  - Exploiting accelerators
  - …
Credits: dragon by Cassie McKown from the Noun Project, crossed swords by anbileru adaleru from the Noun Project, https://en.wikipedia.org/
What are compilers doing?
Programs -> (parsing) -> intermediate representation (e.g. AST) -> (optimizations) -> "optimized" code

Program:
x = a + b;
y = a + b;
z = x + y;

"Optimized" code:
x = a + b;
y = x;
z = x + y;
What are compilers doing? (cont'd)
- A compiler may modify a program (e.g. change the execution order of statements) as long as it preserves the program's semantics
  - e.g. moving z = x + y; above x = a + b; is not allowed, because z must read the value of x computed by x = a + b;
Examples of optimizations: scalar optimizations
- Common subexpression elimination (CSE)
  x = a + b; y = a + b; z = x + y;   =>   x = a + b; y = x; z = x + y;
- Constant propagation
  a = 0; if (a) { … }   =>   a = 0; if (0) { … }
- Dead code elimination
  a = 0; if (0) { … }   =>   a = 0;
Examples of optimizations: loop permutation (interchange)
Original (strided access, slower on CPUs):
for (j = 0; j < N; j++) {
  for (i = 0; i < M; i++) {
    b[i][j] = a[i][j];
  }
}
Interchanged (contiguous access, faster on CPUs):
for (i = 0; i < M; i++) {
  for (j = 0; j < N; j++) {
    b[i][j] = a[i][j];
  }
}
(C arrays are row-major, so making j the inner loop walks consecutive elements.)
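As a rough illustration of why the loop order matters for C's row-major arrays, the sketch below (not part of the original slides) replays both loop orders and counts how many consecutive accesses touch adjacent elements. The sizes M and N and the function names are assumptions made for this example; it models the access order only, not actual timing.

```c
#include <assert.h>

#define M 4 /* rows of b (illustrative size, assumed) */
#define N 5 /* columns of b (illustrative size, assumed) */

/* Element offset of b[i][j] from b[0][0]: C arrays are row-major. */
static int offset(int i, int j) { return i * N + j; }

/* Replays the loop nest (i outer if i_outer != 0, else j outer) and counts
   how many consecutive accesses move to the adjacent element (+1). */
int unit_stride_steps(int i_outer) {
    int outer = i_outer ? M : N, inner = i_outer ? N : M;
    int prev = 0, steps = 0;
    for (int o = 0; o < outer; o++) {
        for (int in = 0; in < inner; in++) {
            int i = i_outer ? o : in;
            int j = i_outer ? in : o;
            int cur = offset(i, j);
            if ((o > 0 || in > 0) && cur - prev == 1)
                steps++;
            prev = cur;
        }
    }
    return steps;
}
```

With M = 4 and N = 5, `unit_stride_steps(1)` (i outer) returns 19, i.e. every step moves to the next element, while `unit_stride_steps(0)` (j outer) returns 0, since every step jumps by N or wraps to the next column.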
Examples of optimizations: loop fusion/distribution
Fused (better temporal locality on CPUs):
for (i = 0; i < N; i++) {
  a[i] = b[i] + c[i];
  d[i] = a[i] + e[i];
}
Distributed (good for vectorization on CPUs):
for (i = 0; i < N; i++) {
  a[i] = b[i] + c[i];
}
for (i = 0; i < N; i++) {
  d[i] = a[i] + e[i];
}
Which version is faster depends on the loop size N.
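Both versions compute the same values here because the only dependence, the write of a[i] followed by its read in d[i] = a[i] + e[i], keeps its source before its target in either form. A minimal C sketch that checks this on small inputs; LEN, the input values, and the function names are illustrative assumptions.

```c
#include <assert.h>

#define LEN 8 /* illustrative upper bound for the loop size N (assumed) */

/* Fused: both statements share one loop over i. */
void fused(int n, const int *b, const int *c, const int *e, int *a, int *d) {
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + c[i];
        d[i] = a[i] + e[i];
    }
}

/* Distributed: one statement per loop. */
void distributed(int n, const int *b, const int *c, const int *e, int *a, int *d) {
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
    for (int i = 0; i < n; i++)
        d[i] = a[i] + e[i];
}

/* Runs both versions on the same inputs and compares a[] and d[]. */
int same_result(int n) {
    int b[LEN], c[LEN], e[LEN], a1[LEN], d1[LEN], a2[LEN], d2[LEN];
    assert(n >= 0 && n <= LEN);
    for (int i = 0; i < n; i++) { b[i] = i; c[i] = 2 * i + 1; e[i] = 3 - i; }
    fused(n, b, c, e, a1, d1);
    distributed(n, b, c, e, a2, d2);
    for (int i = 0; i < n; i++)
        if (a1[i] != a2[i] || d1[i] != d2[i]) return 0;
    return 1;
}
```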
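The phase-ordering problem
- Which order is better?
  - Dead code elimination first, then constant propagation:
    a = 0; if (a) { … }  --DCE-->  a = 0; if (a) { … }  --constant propagation-->  a = 0; if (0) { … }
  - Constant propagation first, then dead code elimination:
    a = 0; if (a) { … }  --constant propagation-->  a = 0; if (0) { … }  --DCE-->  a = 0;
- Only the second order removes the dead branch: DCE cannot prove the branch dead until the constant has been propagated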
AST vs. The Polyhedral Model
- AST-based compilation:
  Programs (x = a + b; y = a + b; z = x + y;) -> AST -> "Optimized" code (x = a + b; y = x; z = x + y;)
- Polyhedral compilation (today's topic):
  Programs -> polyhedron (affine inequalities, e.g. i >= 0; i < N; …) -> "Synthesized" code
Why Polyhedral Model?
- One solution for tackling the phase-ordering problem: a whole sequence of loop transformations is expressed as a single schedule
- Good for performing a set of loop transformations
  - Loop permutation
  - Loop fusion/distribution
  - Loop tiling
  - …
"The Polyhedral Model is a convenient alternative representation which combines analysis power, expressiveness and high flexibility"
- OpenScop Specification and Library
THEORY
The polyhedral model in a nutshell
- The polyhedral transformation = "scheduling" (determining the execution order of statements)
- 3 important things:
  - Domain: the set of instances of a statement
  - Scattering (scheduling): an instance -> a timestamp
  - Access: an instance -> array element(s)
- Limitation: in general, only applicable to Static Control Parts (SCoPs)
  - Loop bounds and conditionals must be affine functions of the surrounding loop iterators (and parameters)
[Figure: the overall flow]
  Program: for (i=1; …) { S1; for (j=1; …) S2; }
  -> Inequalities: $1 \le i_{S1} \le 2;\ 1 \le i_{S2} \le 2;\ 1 \le j_{S2} \le 3;\ i_{S1} = i_{S2}$
  -> ILP: constraints $C_i - C_j \ge 0$; cost function $\delta_e(\vec{s},\vec{t}) = \phi_{S_j}(\vec{t}) - \phi_{S_i}(\vec{s})$
  -> "Synthesized" code: for (i=1; …) { S1; } for (i=1; …) { …; }
Representation of "Domain"
- Observations:
  - S1 is executed 30 times (30 instances)
  - Each instance is associated with (i,j)
for (i=1; i <= 5; i++)
  for (j=1; j <= 6; j++)
    S1;
"The key aspect of the polyhedral model is to consider statement instances."
- OpenScop Specification and Library
Iteration Domain
- A set of constraints that represents the instances of a statement
  - Expressed over the iteration vector (i,j)
  - If those constraints are affine, the set is a polyhedron
for (i=1; i <= 5; i++)
  for (j=1; j <= 6; j++)
    S1;
1 ≤ i ≤ 5, 1 ≤ j ≤ 6;
$D_{S1} = \left\{ (i,j) \;\middle|\; \begin{pmatrix} 1 & 0 & -1 \\ -1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & -1 & 6 \end{pmatrix} \begin{pmatrix} i \\ j \\ 1 \end{pmatrix} \ge \vec{0} \right\}$
Credits: Clint (https://www.ozinenko.com/clint)
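The constraint matrix can be read directly as a membership test. The brute-force sketch below is only an illustration (libraries such as ISL manipulate these constraints symbolically instead of scanning): it evaluates each row of D_S1 on (i, j, 1) and counts the integer points, recovering the 30 instances of S1. The scan box [-10, 10] x [-10, 10] is an assumption that happens to contain the domain, and the function names are hypothetical.

```c
#include <assert.h>

/* Rows of D_S1 from the slide: row (a, b, c) encodes a*i + b*j + c >= 0. */
static const int D_S1[4][3] = {
    { 1,  0, -1},  /* i - 1 >= 0 */
    {-1,  0,  5},  /* 5 - i >= 0 */
    { 0,  1, -1},  /* j - 1 >= 0 */
    { 0, -1,  6},  /* 6 - j >= 0 */
};

/* 1 if the iteration vector (i, j) satisfies every row of D_S1. */
int in_domain(int i, int j) {
    for (int r = 0; r < 4; r++)
        if (D_S1[r][0] * i + D_S1[r][1] * j + D_S1[r][2] < 0)
            return 0;
    return 1;
}

/* Counts the statement instances (integer points of D_S1) by scanning a
   bounding box assumed large enough to contain the whole polyhedron. */
int count_instances(void) {
    int count = 0;
    for (int i = -10; i <= 10; i++)
        for (int j = -10; j <= 10; j++)
            count += in_domain(i, j);
    return count;
}
```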
Representation of "Scheduling": 1-dimensional schedules
- Function T returns the logical date of each statement
x = a + b; // S1
y = a + b; // S2
z = x + y; // S3
T_S1 = 0;
T_S2 = 1;
T_S3 = 2;
[Figure: logical time axis with S1 at T=0, S2 at T=1, S3 at T=2]
Representation of "Scheduling": multi-dimensional schedules
- Function T returns the logical date of each statement
- Logical dates may be multi-dimensional
  - Compared in lexicographic order
  - Cf. clocks (days, hours, minutes, seconds)
x = a + b; // S1
for (i = 0; i < 2; i++) {
  a[i] = x; // S2
}
z = x + y; // S3
T_S1 = (0);
T_S2(0) = (1, 0);
T_S2(1) = (1, 1);
T_S3 = (2);
$T_{S1} \prec T_{S2} \prec T_{S3} \iff (0) \prec (1, i) \prec (2)$
[Figure: logical time axis: S1 at T=0; S2 at T=1 (i=0, then i=1); S3 at T=2]
Representation of "Scheduling": multi-dimensional schedules (cont'd)
- The schedule of S2 can be parameterized by its iterator (recall the iteration domain 0 ≤ i < 2):
T_S1 = (0);
T_S2(i) = (1, i);
T_S3 = (2);
$T_{S1} \prec T_{S2} \prec T_{S3} \iff (0) \prec (1, i) \prec (2)$
Representation of "Scheduling": multi-dimensional schedules (cont'd)
x = a + b; // S1
for (i = 0; i < 2; i++) {
  a[i] = x; // S2
}
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] += a[i]; // S3
  }
}
T_S1 = (0);
T_S2(i) = (1, i);
T_S3(i,j) = (2, i, j);
[Figure: logical time axis: S1 at T=0; S2 at T=1 (i=0, 1); S3 at T=2 (i=0, 1; j=0, 1, 2)]
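Lexicographic order is all that is needed to turn these dates back into an execution order. The sketch below lists the instances of S1, S2, and S3 in a scrambled order, sorts them by the dates above with `qsort`, and recovers the program order. The `Instance` struct and the function names are hypothetical helpers for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* An instance S<stmt>(i, j) (unused iterators are 0) and its logical date. */
typedef struct { int stmt, i, j, t[3], len; } Instance;

/* Lexicographic comparison of dates, like comparing clock readings. */
int lex_cmp(const int *a, int la, const int *b, int lb) {
    int n = la < lb ? la : lb;
    for (int k = 0; k < n; k++)
        if (a[k] != b[k]) return a[k] < b[k] ? -1 : 1;
    return (la > lb) - (la < lb); /* a proper prefix sorts first */
}

static int by_date(const void *x, const void *y) {
    const Instance *a = x, *b = y;
    return lex_cmp(a->t, a->len, b->t, b->len);
}

/* Creates the 9 instances in a deliberately scrambled order, then sorts them
   by T_S1 = (0), T_S2(i) = (1, i), T_S3(i, j) = (2, i, j). */
int build_and_sort(Instance v[9]) {
    int n = 0;
    for (int i = 1; i >= 0; i--)
        for (int j = 2; j >= 0; j--)
            v[n++] = (Instance){3, i, j, {2, i, j}, 3};
    for (int i = 1; i >= 0; i--)
        v[n++] = (Instance){2, i, 0, {1, i, 0}, 2};
    v[n++] = (Instance){1, 0, 0, {0, 0, 0}, 1};
    qsort(v, (size_t)n, sizeof v[0], by_date);
    return n;
}
```

After sorting, the array starts with S1, then S2(0), S2(1), then S3(0,0) through S3(1,2), which is exactly the order of the source program.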
Loop transformations with schedules
Original and new code (identical here):
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New schedule: T_S1(i, j) = (i, j);
$T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ j \end{pmatrix}$
(new schedule = transformation matrix × iteration vector)
Loop transformations with schedules: Loop Reversal
Original:
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New schedule: T_S1(i, j) = (-i, j);
$T_{S1}(i,j) = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} -i \\ j \end{pmatrix}$
New:
for (i = -1; i <= 0; i++) {
  for (j = 0; j < 3; j++) {
    b[-i][j] = ...; // S1
  }
}
$i_{new} = -i_{old}$, so $i_{old} \to -i_{new}$ in the loop body.
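A schedule given as a matrix fixes the execution order: enumerate the domain, compute the new date of each instance, and sort lexicographically. The sketch below (hypothetical helper names; loop bounds taken from the slide) checks that sorting by T_S1(i, j) = (-i, j) gives the same order in which the generated loop (i from -1 to 0, body on b[-i][j]) visits the original instances. The same helper applied to the matrix ((0, 1), (1, 0)) gives the j-outer order of loop permutation.

```c
#include <assert.h>
#include <stdlib.h>

#define NI 2 /* i in [0, 2), as on the slide */
#define NJ 3 /* j in [0, 3) */

/* An instance S1(i, j) (original iterators) and its new date (t0, t1). */
typedef struct { int i, j, t0, t1; } Point;

static int by_date(const void *x, const void *y) {
    const Point *a = x, *b = y;
    if (a->t0 != b->t0) return a->t0 < b->t0 ? -1 : 1;
    if (a->t1 != b->t1) return a->t1 < b->t1 ? -1 : 1;
    return 0;
}

/* Execution order implied by T_S1(i, j) = C (i, j)^T for a 2x2 matrix C. */
void order_by_schedule(const int C[2][2], Point out[NI * NJ]) {
    int n = 0;
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++)
            out[n++] = (Point){i, j, C[0][0] * i + C[0][1] * j,
                                     C[1][0] * i + C[1][1] * j};
    qsort(out, NI * NJ, sizeof out[0], by_date);
}

/* Order in which the generated reversed loop touches b[-i][j]. */
void order_of_reversed_loop(Point out[NI * NJ]) {
    int n = 0;
    for (int i = -1; i <= 0; i++)
        for (int j = 0; j < NJ; j++)
            out[n++] = (Point){-i, j, 0, 0}; /* original iterators are (-i, j) */
}
```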
Loop transformations with schedules: Loop Permutation
Original:
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++) {
    b[i][j] = ...; // S1
  }
}
New:
for (j = 0; j < 3; j++) {
  for (i = 0; i < 2; i++) {
    b[i][j] = ...; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New schedule: T_S1(i, j) = (j, i);
$T_{S1}(i,j) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} j \\ i \end{pmatrix}$
Loop transformations with schedules: Loop Skewing
Original:
for (i = 1; i <= 5; i++) {
  for (j = 1; j <= 5; j++) {
    a[i][j] = a[i-1][j+1]; // S1
  }
}
Original schedule: T_S1(i, j) = (i, j);
New schedule: T_S1(i, j) = (i, i+j);
$T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ i + j \end{pmatrix}$
New:
for (i = 1; i <= 5; i++) {
  for (j = i+1; j <= i+5; j++) {
    a[i][j-i] = a[i-1][j-i+1]; // S1
  }
}
$j_{new} = i + j_{old}$, so $j_{old} \to j_{new} - i$ in the loop body.
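Skewing only relabels the inner iterator, so the generated code must perform the same updates in the same order. The dependence from S1(i-1, j+1) to S1(i, j) has distance (1, -1) under the original schedule and (1, 0) under (i, i + j), so it is still carried by the outer loop. The sketch below runs both versions on a small array and compares the results; the array size, initial values, and function names are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

/* The loops touch rows 0..5 and columns 1..6, so a 6 x 7 array suffices. */
typedef int Grid[6][7];

static void init(Grid a) {
    for (int i = 0; i < 6; i++)
        for (int j = 0; j < 7; j++)
            a[i][j] = 10 * i + j;
}

/* Original loop nest, schedule T_S1(i, j) = (i, j). */
static void original(Grid a) {
    for (int i = 1; i <= 5; i++)
        for (int j = 1; j <= 5; j++)
            a[i][j] = a[i-1][j+1];
}

/* Code for the skewed schedule T_S1(i, j) = (i, i + j): the new inner
   iterator is i + j_old, so every j_old in the body becomes j - i. */
static void skewed(Grid a) {
    for (int i = 1; i <= 5; i++)
        for (int j = i + 1; j <= i + 5; j++)
            a[i][j-i] = a[i-1][j-i+1];
}

/* 1 if both versions leave identical array contents. */
int skew_preserves_result(void) {
    Grid x, y;
    init(x);
    init(y);
    original(x);
    skewed(y);
    return memcmp(x, y, sizeof x) == 0;
}
```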
Loop transformations with schedules: Loop Skewing (Cont'd)
$T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix}$
Execution order under the original schedule, (i, j) = (1,1); (1,2); (1,3); (1,4); (1,5); (2,1); (2,2); (2,3); (2,4); (2,5); (3,1); (3,2); (3,3); (3,4); (3,5); …
Execution order under the new schedule, (i, i+j) = (1,2); (1,3); (1,4); (1,5); (1,6); (2,3); (2,4); (2,5); (2,6); (2,7); (3,4); (3,5); (3,6); (3,7); (3,8); (4,5); …
[Figure: dependences and execution order in the original and skewed iteration spaces]
Credits: Clint (https://www.ozinenko.com/clint)
Scalar Dimensions in schedules
- 2d+1 format: d loop dimensions interleaved with d+1 scalar dimensions (d = loop depth)
- Can represent/transform imperfectly nested loops
  - e.g., loop fusion/distribution
for (i = 0; i < 2; i++) {
  s[i] = ...; // S1
  for (j = 0; j < 3; j++)
    a[i][j] = ...; // S2
}
for (i = 0; i < 2; i++)
  for (j = 0; j < 3; j++)
    b[i] = ...; // S3
T_S1(i) = (0, i, 0);
T_S2(i,j) = (0, i, 1, j, 0);
T_S3(i,j) = (1, i, 0, j, 0);
Loop transformations with schedules: loop fusion w/ scalar dimensions
Original:
for (i = 0; i < 2; i++)
  for (j = 0; j < 3; j++)
    a[i] = ...; // S1
for (i = 0; i < 2; i++)
  for (j = 0; j < 3; j++)
    b[i] = ...; // S2
New:
for (i = 0; i < 2; i++) {
  for (j = 0; j < 3; j++)
    a[i] = ...; // S1
  for (j = 0; j < 3; j++)
    b[i] = ...; // S2
}
Original schedule: T_S1(i,j) = (0, i, 0, j); T_S2(i,j) = (1, i, 0, j);
New schedule: T_S1(i,j) = (0, i, 0, j); T_S2(i,j) = (0, i, 1, j);
$T_{S2}(i,j) = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ i \\ 1 \\ j \end{pmatrix}$
(the matrix transforms the iteration vector; the constant vector holds the scalar dimensions)
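Fusion changes only a scalar dimension of T_S2. Sorting the instances by their dates shows the effect: with (1, i, 0, j) all S1 instances come first, while with (0, i, 1, j) the S1 and S2 iterations alternate per i, matching the fused loop. A minimal sketch with hypothetical names, using the bounds from the slide:

```c
#include <assert.h>
#include <stdlib.h>

/* An instance of S1 or S2 with its 4-dimensional date (c0, i, c2, j). */
typedef struct { int stmt, i, j, t[4]; } Inst;

static int lex4(const void *x, const void *y) {
    const Inst *a = x, *b = y;
    for (int k = 0; k < 4; k++)
        if (a->t[k] != b->t[k]) return a->t[k] < b->t[k] ? -1 : 1;
    return 0;
}

/* Sorts the 12 instances (i < 2, j < 3) by T_S1(i,j) = (0, i, 0, j) and by
   T_S2(i,j) = (0, i, 1, j) if fused, else the original (1, i, 0, j). */
void order(int fused, Inst v[12]) {
    int n = 0;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 3; j++) {
            v[n++] = (Inst){1, i, j, {0, i, 0, j}};
            v[n++] = fused ? (Inst){2, i, j, {0, i, 1, j}}
                           : (Inst){2, i, j, {1, i, 0, j}};
        }
    qsort(v, 12, sizeof v[0], lex4);
}
```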
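Schedules in general
$T_S(\vec{i}) = \begin{pmatrix} \phi^S_1(\vec{i}) \\ \phi^S_2(\vec{i}) \\ \phi^S_3(\vec{i}) \\ \phi^S_4(\vec{i}) \\ \vdots \\ \phi^S_d(\vec{i}) \end{pmatrix} = \begin{pmatrix} C^S_{11} & C^S_{12} & C^S_{13} & C^S_{14} & \cdots & C^S_{1m_S} \\ C^S_{21} & C^S_{22} & C^S_{23} & C^S_{24} & \cdots & C^S_{2m_S} \\ C^S_{31} & C^S_{32} & C^S_{33} & C^S_{34} & \cdots & C^S_{3m_S} \\ C^S_{41} & C^S_{42} & C^S_{43} & C^S_{44} & \cdots & C^S_{4m_S} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ C^S_{d1} & C^S_{d2} & C^S_{d3} & C^S_{d4} & \cdots & C^S_{dm_S} \end{pmatrix} \vec{i} + \begin{pmatrix} C^S_{10} \\ C^S_{20} \\ C^S_{30} \\ C^S_{40} \\ \vdots \\ C^S_{d0} \end{pmatrix}$
- The d × m_S matrix is a transformation of the iteration vector; the d × 1 vector holds the scalar dimensions
- Schedules, e.g., (0, i, 0, j)
- d = 2m_S + 1, where m_S = the size of the iteration vector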
Schedules in general (cont'd)
- Goal: compute the coefficients and offsets for each statement
Legality of transformations
- Are all transformations valid? No!
Original:
for (i = 1; i <= 10; i++) {
  s[i] = ...; // S1
  for (j = 0; j < 3; j++)
    a[i][j] = s[i]; // S2
}
T_S1(i) = (0, i, 0);
T_S2(i,j) = (0, i, 1, j, 0);
Transformation ->
New:
for (i = 1; i <= 10; i++) {
  for (j = 0; j < 3; j++)
    a[i][j] = s[i]; // S2
  s[i] = ...; // S1
}
T_S2(i,j) = (0, i, 0, j, 0);
T_S1(i) = (0, i, 1);
- The new order is invalid: S2 now reads s[i] before S1 writes it
Dependences
- Three types of dependence:
  - Read-After-Write (RAW, flow): a = 1; then b = a;
  - Write-After-Read (WAR, anti): b = a; then a = 1;
  - Write-After-Write (WAW, output): a = 1; then a = 2;
- Dependences are computed from the domains, accesses, and schedules
  - Transformation = finding a new schedule that satisfies all dependences
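Dependence polyhedron
- Dependence polyhedron $P_e$: a set of (in)equalities
  - A general and accurate representation of instance-wise dependences
for (i = 1; i <= 10; i++) {
  s[i] = ...; // S1
  for (j = 0; j < 3; j++)
    a[i][j] = s[i]; // S2
}
Constraints (from D_S1, D_S2, and the accesses to s[i]):
$i_{S1} = i_{S2};\ 1 \le i_{S1} \le 10,\ 1 \le i_{S2} \le 10;\ 0 \le j_{S2} < 3$
$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 10 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \begin{matrix} = \\ \ge \\ \ge \\ \ge \\ \ge \end{matrix} 0$
(first row: equality; remaining rows: inequalities)
$i_{S1} = i_{S2} \Rightarrow i_{S1} - i_{S2} \ge 0 \wedge i_{S2} - i_{S1} \ge 0$
Credits: Clint (https://www.ozinenko.com/clint)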
Legality of transformations
- Dependence polyhedron: $P_e$
- Legality:
  - $\forall \vec{s}, \vec{t} \in P_e\ (\vec{s} \in D_{S_i}, \vec{t} \in D_{S_j}):\ T_{S_i}(\vec{s}) \prec T_{S_j}(\vec{t})$
  - If the "source" instance must happen before the "target" instance in the original program, the transformed program must preserve this property (it must satisfy the dependence)
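For the s[i] example, the legality condition can be checked by brute force: enumerate the integer points of P_e and compare the source and target dates lexicographically. The sketch below does this for the RAW dependence on s[i]. Parameterizing the schedules by the two scalar positions, and treating prefix-equal dates conservatively as "not before", are assumptions of this illustration, not part of the Pluto formulation.

```c
#include <assert.h>

/* 1 if date a strictly precedes date b lexicographically. If one date is a
   prefix of the other, this sketch conservatively reports "not before". */
int lex_before(const int *a, int la, const int *b, int lb) {
    int n = la < lb ? la : lb;
    for (int k = 0; k < n; k++)
        if (a[k] != b[k]) return a[k] < b[k];
    return 0;
}

/* Checks the RAW dependence S1(i) -> S2(i, j) on s[i] of the example.
   The loops enumerate P_e: i_S1 = i_S2 = i, 1 <= i <= 10, 0 <= j_S2 < 3.
   s1_pos / s2_pos are the scalar dimensions that order S1 and S2 inside the
   i loop: T_S1(i) = (0, i, s1_pos), T_S2(i, j) = (0, i, s2_pos, j, 0). */
int is_legal(int s1_pos, int s2_pos) {
    for (int i = 1; i <= 10; i++)
        for (int j = 0; j < 3; j++) {
            int src[3] = {0, i, s1_pos};
            int tgt[5] = {0, i, s2_pos, j, 0};
            if (!lex_before(src, 3, tgt, 5))
                return 0; /* target would not run after its source */
        }
    return 1;
}
```

`is_legal(0, 1)` corresponds to the original schedules and holds; `is_legal(1, 0)` corresponds to the transformed schedules (S2 before S1 in each i iteration) and fails.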
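Putting it all together
- Goal: compute all coefficients and offsets such that the legality condition holds for every point of the dependence polyhedron
Schedules:
$T_{S1}(i) = \begin{pmatrix} C^{S1}_{11} \\ C^{S1}_{21} \\ C^{S1}_{31} \end{pmatrix} (i_{S1}) + \begin{pmatrix} C^{S1}_{10} \\ C^{S1}_{20} \\ C^{S1}_{30} \end{pmatrix}$
$T_{S2}(i,j) = \begin{pmatrix} C^{S2}_{11} & C^{S2}_{12} \\ C^{S2}_{21} & C^{S2}_{22} \\ C^{S2}_{31} & C^{S2}_{32} \\ C^{S2}_{41} & C^{S2}_{42} \\ C^{S2}_{51} & C^{S2}_{52} \end{pmatrix} \begin{pmatrix} i_{S2} \\ j_{S2} \end{pmatrix} + \begin{pmatrix} C^{S2}_{10} \\ C^{S2}_{20} \\ C^{S2}_{30} \\ C^{S2}_{40} \\ C^{S2}_{50} \end{pmatrix}$
Dependence polyhedron $P_e$ ($i_{S1} = i_{S2};\ 1 \le i_{S1} \le 10,\ 1 \le i_{S2} \le 10;\ 0 \le j_{S2} < 3$):
$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 10 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \begin{matrix} = \\ \ge \\ \ge \\ \ge \\ \ge \end{matrix} 0$
Legality: $\forall \vec{s}, \vec{t} \in P_e\ (\vec{s} \in D_{S1}, \vec{t} \in D_{S2}):\ T_{S1}(\vec{s}) \prec T_{S2}(\vec{t})$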
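Linearizing the legality condition (The Pluto Algorithm)
- The legality condition (for iteration vectors):
  $\delta(\vec{s},\vec{t}) = (c^{S_j}_1, c^{S_j}_2, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, c^{S_i}_2, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s} \ge 0, \quad \vec{s},\vec{t} \in P_e$
- Uniform dependences: the distance between two dependent iterations is a constant (e.g. $i \to i + 1$, so $\delta(\vec{s},\vec{t})$ is a constant)
- Non-uniform dependences: the distance between two dependent iterations varies (e.g. $i \to i + j$, so $\delta(\vec{s},\vec{t})$ is a function of $j$)
  - Apply the Farkas lemma, where $P_e^k$ is the k-th of the $m_e$ inequalities of the dependence polyhedron:
  $(c^{S_j}_1, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s} \ge 0,\ \vec{s},\vec{t} \in P_e \iff (c^{S_j}_1, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s} \equiv \lambda_{e0} + \sum_{k=1}^{m_e} \lambda_{ek} P_e^k, \quad \lambda_{e0} \ge 0,\ \lambda_{ek} \ge 0$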
Cost Function & Objective Function (The Pluto Algorithm)
- Compute all coefficients and offsets under the legality condition: solve an ILP problem
- Cost function = transformation policy
  - Pluto's cost function = the dependence distance
    $\delta(\vec{s},\vec{t}) = (c^{S_j}_1, c^{S_j}_2, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, c^{S_i}_2, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s}, \quad \vec{s},\vec{t} \in P_e$
    - Fuse loops as much as possible
    - Push loops that carry dependences to inner levels
  - Also used in ISL (Polly, PPCG, …)
- Objective function:
  - The dependence distance is bounded by an affine function of the parameter N: $\delta(\vec{s},\vec{t}) \le u_1 N + w$
  - Iteratively find linearly independent solutions of $\text{minimize}_{\prec}\ (u_1, w, c^{S_j}_1, c^{S_j}_2)$
Step-by-step example
for (i = 0; i < N; i++) {
  for (j = 1; j < N; j++) {
    a[i][j] = a[j][i] + a[i][j-1]; // S1
  }
}
Instances in execution order:
a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
a[0][3] = a[3][0] + a[0][2]; // S1(0,3)
...
a[1][1] = a[1][1] + a[1][0]; // S1(1,1)
a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
a[1][3] = a[3][1] + a[1][2]; // S1(1,3)
...
a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
a[2][2] = a[2][2] + a[2][1]; // S1(2,2)
a[2][3] = a[3][2] + a[2][2]; // S1(2,3)
...
a[3][1] = a[1][3] + a[3][0]; // S1(3,1)
- Dependence 1 (RAW): e.g. S1(0,1) writes a[0][1], which S1(0,2) reads
- Dependence 2 (RAW): e.g. S1(1,2) writes a[1][2], which S1(2,1) reads
- Dependence 3 (WAR): e.g. S1(1,2) reads a[2][1] before S1(2,1) overwrites it
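The three dependences can be confirmed by brute force on a small N: for every pair of instances, check whether the earlier one writes what the later one reads (RAW), reads what the later one writes (WAR), or writes what it writes (WAW). The sketch below computes memory-based dependences with hypothetical helper names and a cell encoding that assumes N ≤ 64. It also checks whole schedules T(i, j) = C·(i, j): the original order and the schedule (i + j, i) derived at the end of this example respect every dependence, while plain interchange (j, i) does not.

```c
#include <assert.h>

/* Cells of a[][] encoded as row * 64 + column (assumes N <= 64). */
static int cell(int r, int c) { return r * 64 + c; }

/* S1(i, j): a[i][j] = a[j][i] + a[i][j-1]; with 0 <= i < N, 1 <= j < N. */
static int writes(int i, int j) { return cell(i, j); }
static int reads(int i, int j, int k) { return k == 0 ? cell(j, i) : cell(i, j - 1); }

enum { RAW, WAR, WAW };

/* 1 if s = S1(si, sj) runs before t = S1(ti, tj) in the original program and
   the two instances touch the same cell in the given (memory-based) way. */
int depends(int kind, int si, int sj, int ti, int tj) {
    int before = si < ti || (si == ti && sj < tj);
    if (!before) return 0;
    if (kind == RAW)
        return writes(si, sj) == reads(ti, tj, 0) || writes(si, sj) == reads(ti, tj, 1);
    if (kind == WAR)
        return reads(si, sj, 0) == writes(ti, tj) || reads(si, sj, 1) == writes(ti, tj);
    return writes(si, sj) == writes(ti, tj); /* WAW */
}

/* 1 if T(i, j) = C (i, j)^T strictly orders every dependent pair (N = n). */
int respects_all(int n, const int C[2][2]) {
    for (int si = 0; si < n; si++)
      for (int sj = 1; sj < n; sj++)
        for (int ti = 0; ti < n; ti++)
          for (int tj = 1; tj < n; tj++) {
            if (!depends(RAW, si, sj, ti, tj) && !depends(WAR, si, sj, ti, tj) &&
                !depends(WAW, si, sj, ti, tj))
                continue;
            int s0 = C[0][0] * si + C[0][1] * sj, s1 = C[1][0] * si + C[1][1] * sj;
            int t0 = C[0][0] * ti + C[0][1] * tj, t1 = C[1][0] * ti + C[1][1] * tj;
            if (!(s0 < t0 || (s0 == t0 && s1 < t1)))
                return 0; /* this dependence would be violated */
          }
    return 1;
}
```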
Step-by-step example: Legality Constraints 1 (The Pluto Algorithm)
- Dependence 1: RAW (flow dependence), $(i_s, j_s) \to (i_t, j_t)$
  Source: a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
  Target: a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
  ...
- Dependence polyhedron $P_{e1}$: $i_s = i_t,\ j_s = j_t - 1,\ 0 \le i_t \le N - 1,\ 2 \le j_t \le N - 1$
- Legality constraints: $\delta(\vec{s},\vec{t}) = (c^{S_j}_1, c^{S_j}_2, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, c^{S_i}_2, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s} \ge 0, \quad \vec{s},\vec{t} \in P_e$
$(c^{S1}_1, c^{S1}_2)\begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c^{S1}_1, c^{S1}_2)\begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0, \quad (i_s, j_s, i_t, j_t) \in P_{e1}$
$\Rightarrow c^{S1}_1 i_t + c^{S1}_2 j_t - (c^{S1}_1 i_s + c^{S1}_2 j_s) = c^{S1}_1 i_t + c^{S1}_2 j_t - (c^{S1}_1 i_t + c^{S1}_2 (j_t - 1)) \ge 0$
$\Rightarrow c^{S1}_2 \ge 0$
Step-by-step example: Legality Constraints 2 (The Pluto Algorithm)
- Dependence 2: RAW (flow dependence), $(i_s, j_s) \to (i_t, j_t)$
  Source: a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
  Target: a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
  ...
- Dependence polyhedron $P_{e2}$: $i_s = j_t,\ j_s = i_t,\ 1 \le j_t,\ i_t \le N - 1,\ i_t - j_t \ge 1$
- Legality constraints: $\delta(\vec{s},\vec{t}) = (c^{S_j}_1, c^{S_j}_2, \ldots, c^{S_j}_{m_{S_j}})\,\vec{t} - (c^{S_i}_1, c^{S_i}_2, \ldots, c^{S_i}_{m_{S_i}})\,\vec{s} \ge 0, \quad \vec{s},\vec{t} \in P_e$
$(c^{S1}_1, c^{S1}_2)\begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c^{S1}_1, c^{S1}_2)\begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0, \quad (i_s, j_s, i_t, j_t) \in P_{e2}$
$\Rightarrow c^{S1}_1 i_t + c^{S1}_2 j_t - (c^{S1}_1 i_s + c^{S1}_2 j_s) = c^{S1}_1 i_t + c^{S1}_2 j_t - (c^{S1}_1 j_t + c^{S1}_2 i_t) \ge 0$
$\Rightarrow (c^{S1}_1 - c^{S1}_2)\, i_t + (c^{S1}_2 - c^{S1}_1)\, j_t \ge 0, \quad 1 \le j_t,\ i_t \le N - 1,\ i_t - j_t \ge 1$
Farkas lemma + Fourier-Motzkin elimination $\Rightarrow c^{S1}_1 - c^{S1}_2 \ge 0$
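As a worked instance of the Farkas step for dependence 2, here is one way to see why it yields $c^{S1}_1 - c^{S1}_2 \ge 0$. With $i_s = j_t$ and $j_s = i_t$ the distance is $(c^{S1}_1 - c^{S1}_2)(i_t - j_t)$. The derivation below uses only the constraint $i_t - j_t - 1 \ge 0$ of $P_{e2}$ and gives every other constraint $P_{e2}^k$ a multiplier of 0; it is a hand derivation, not the exact elimination sequence Pluto performs.

```latex
\begin{aligned}
\delta(\vec{s},\vec{t}) &= (c^{S1}_1 - c^{S1}_2)\,(i_t - j_t)
  \equiv \lambda_0 + \lambda_1\,(i_t - j_t - 1) + \sum_{k \ge 2} \lambda_k P_{e2}^k,
  \qquad \lambda_0, \lambda_1, \lambda_k \ge 0 \\
\text{with } \lambda_k = 0\ (k \ge 2):&\quad
  \text{coeff. of } i_t \text{ and } j_t:\ c^{S1}_1 - c^{S1}_2 = \lambda_1, \qquad
  \text{constant: } 0 = \lambda_0 - \lambda_1 \\
\Rightarrow&\quad \lambda_0 = \lambda_1 = c^{S1}_1 - c^{S1}_2 \ge 0
\end{aligned}
```

Conversely, $P_{e2}$ contains points with $i_t - j_t = 1$ (e.g. S1(1,2) to S1(2,1)), where $\delta = c^{S1}_1 - c^{S1}_2$, so the condition is also necessary.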
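Step-by-step example: Putting it all together (The Pluto Algorithm)
- Dependence 1: $c^{S1}_2 \ge 0,\ w \ge c^{S1}_2$
- Dependences 2 & 3: $c^{S1}_1 - c^{S1}_2 \ge 0,\ u_1 \ge 0,\ u_1 \ge c^{S1}_1 - c^{S1}_2,\ 3u_1 + w \ge c^{S1}_1 - c^{S1}_2$
  (the constraints on $u_1$ and $w$ use the parameter N to bound the dependence distances)
- Avoiding the zero vector: $c^{S1}_1 + c^{S1}_2 \ge 1$
- Objective function: $\text{minimize}_{\prec}\ (u_1, w, c^{S1}_1, c^{S1}_2) \to (0, 1, 1, 1)$
- Find a linearly independent answer for the next dimension:
$T_{S1}(i,j) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i + j \\ i \end{pmatrix}$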
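The slides obtain (0, 1, 1, 1) by solving an ILP. For a problem this small, a brute-force search reproduces it. The sketch below is a stand-in for the solver, not Pluto's implementation: it scans each unknown over [0, R] in lexicographic order and returns the first feasible point. R = 4 is an arbitrary bound; the constraints above already force all four unknowns to be non-negative, so the only assumption is that the minimum lies inside the box. For the second row it merely requires linear independence from the first, a simplification of Pluto's orthogonality constraints.

```c
#include <assert.h>

#define R 4 /* search bound for each unknown (an arbitrary choice for this sketch) */

/* Constraints listed on the slide for (u1, w, c1, c2). */
static int feasible(int u1, int w, int c1, int c2) {
    return c2 >= 0 && w >= c2                       /* dependence 1 */
        && c1 - c2 >= 0 && u1 >= 0                  /* dependences 2 & 3 */
        && u1 >= c1 - c2 && 3 * u1 + w >= c1 - c2
        && c1 + c2 >= 1;                            /* avoid the zero vector */
}

/* Lexicographic minimum of (u1, w, c1, c2) over [0, R]^4. The loops visit
   candidates in lexicographic order, so the first feasible one is the minimum.
   If (p1, p2) != (0, 0), (c1, c2) must also be linearly independent of it. */
int lexmin(int p1, int p2, int out[4]) {
    for (int u1 = 0; u1 <= R; u1++)
        for (int w = 0; w <= R; w++)
            for (int c1 = 0; c1 <= R; c1++)
                for (int c2 = 0; c2 <= R; c2++) {
                    if (!feasible(u1, w, c1, c2)) continue;
                    if ((p1 != 0 || p2 != 0) && p1 * c2 - p2 * c1 == 0) continue;
                    out[0] = u1; out[1] = w; out[2] = c1; out[3] = c2;
                    return 1;
                }
    return 0; /* nothing feasible inside the box */
}
```

`lexmin(0, 0, first)` yields (u1, w, c1, c2) = (0, 1, 1, 1); requiring independence from (1, 1) then yields (1, 0, 1, 0), i.e. the rows (1, 1) and (1, 0) of T_S1 above.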
Summary
- The polyhedral transformation = "scheduling" (determining the execution order of statements)
- 3 important things:
  - Domain: the set of instances of a statement
  - Scattering (scheduling): an instance -> a timestamp
  - Access: an instance -> array element(s)
- Limitation: in general, only applicable to Static Control Parts (SCoPs)
  - Loop bounds and conditionals must be affine functions of the surrounding loop iterators (and parameters)
[Figure: Program -> Inequalities -> ILP (constraints and cost function) -> "Synthesized" code]
COMPILERS AND TOOLS
Polyhedral Compilers & Tools
- PoCC (The Polyhedral Compiler Collection)
  - http://web.cs.ucla.edu/~pouchet/software/pocc/
  - Clan: extracts a polyhedral IR from the source code
  - Candl: a dependence analyzer
  - LetSee: a legal transformation space explorer
  - PLuTo: an automatic parallelizer and locality optimizer
  - CLooG: code generation from the polyhedral IR
Polyhedral Compilers & Tools
- Polly
  - http://polly.llvm.org/
  - ISL: Integer Set Library (including a code generator)
- Clay/Chrole/Clint
  - https://www.ozinenko.com/projects
  - Clay: "Chunky Loop Alteration wizardrY"
  - Chrole: "Recovering high-level syntactic description of the automatically computed polyhedral optimization"
  - Clint: "Interactive graphical interface to the manual and compiler-assisted program restructuring in the polyhedral model"
Clint
Further readings
- Fundamentals
  - OpenScop Specification
    - http://icps.u-strasbg.fr/people/bastoul/public_html/development/openscop/docs/openscop.html
  - ISL
    - https://lirias.kuleuven.be/bitstream/123456789/270231/1/icms2010verdoolaege.pdf
- Pluto algorithm
  - U. Bondhugula, "Effective Automatic Parallelization and Locality Optimization Using The Polyhedral Model" (PhD Dissertation, 2010)
  - U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan, "A Practical Automatic Polyhedral Parallelizer and Locality Optimizer" [PLDI'08]
- Polly
  - T. Grosser, S. Verdoolaege, A. Cohen, "Polyhedral AST generation is more than scanning polyhedra" [ACM TOPLAS 2015]
- Polyhedral model + AST-based cost function
  - J. Shirako, L.N. Pouchet, V. Sarkar, "Oil and Water Can Mix: An Integration of Polyhedral and AST-based Transformations" [SC'14]
- GPU code generation
  - S. Verdoolaege, J.C. Juega, A. Cohen, J.I. Gomez, C. Tenllado, F. Catthoor, "Polyhedral parallel code generation for CUDA" [ACM TACO 2013]
  - J. Shirako, A. Hayashi, V. Sarkar, "Optimized Two-level Parallelization for GPU Accelerators using the Polyhedral Model" [CC'17]

Profiling and optimization
 

Mehr von Akihiro Hayashi

GPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU PlatformsGPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU PlatformsAkihiro Hayashi
 
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Akihiro Hayashi
 
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesChapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesAkihiro Hayashi
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
LLVM-based Communication Optimizations for PGAS Programs
LLVM-based Communication Optimizations for PGAS ProgramsLLVM-based Communication Optimizations for PGAS Programs
LLVM-based Communication Optimizations for PGAS ProgramsAkihiro Hayashi
 
Machine-learning based performance heuristics for Runtime CPU/GPU Selection i...
Machine-learning based performance heuristics for Runtime CPU/GPU Selection i...Machine-learning based performance heuristics for Runtime CPU/GPU Selection i...
Machine-learning based performance heuristics for Runtime CPU/GPU Selection i...Akihiro Hayashi
 
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionMachine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionAkihiro Hayashi
 
Studies on Automatic Parallelization for Heterogeneous and Homogeneous Multi...
Studies on Automatic Parallelization for Heterogeneous and Homogeneous Multi...Studies on Automatic Parallelization for Heterogeneous and Homogeneous Multi...
Studies on Automatic Parallelization for Heterogeneous and Homogeneous Multi...Akihiro Hayashi
 
LLVM Optimizations for PGAS Programs -Case Study: LLVM Wide Optimization in C...
LLVM Optimizations for PGAS Programs -Case Study: LLVM Wide Optimization in C...LLVM Optimizations for PGAS Programs -Case Study: LLVM Wide Optimization in C...
LLVM Optimizations for PGAS Programs -Case Study: LLVM Wide Optimization in C...Akihiro Hayashi
 
Speculative Execution of Parallel Programs with Precise Exception Semantics ...
Speculative Execution of Parallel Programs with Precise Exception Semantics ...Speculative Execution of Parallel Programs with Precise Exception Semantics ...
Speculative Execution of Parallel Programs with Precise Exception Semantics ...Akihiro Hayashi
 
Accelerating Habanero-Java Program with OpenCL Generation
Accelerating Habanero-Java Program with OpenCL GenerationAccelerating Habanero-Java Program with OpenCL Generation
Accelerating Habanero-Java Program with OpenCL GenerationAkihiro Hayashi
 

Mehr von Akihiro Hayashi (11)

GPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU PlatformsGPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU Platforms
 
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
 
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesChapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
LLVM-based Communication Optimizations for PGAS Programs
LLVM-based Communication Optimizations for PGAS ProgramsLLVM-based Communication Optimizations for PGAS Programs
LLVM-based Communication Optimizations for PGAS Programs
 
Machine-learning based performance heuristics for Runtime CPU/GPU Selection i...
Machine-learning based performance heuristics for Runtime CPU/GPU Selection i...Machine-learning based performance heuristics for Runtime CPU/GPU Selection i...
Machine-learning based performance heuristics for Runtime CPU/GPU Selection i...
 
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionMachine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
 
Studies on Automatic Parallelization for Heterogeneous and Homogeneous Multi...
Studies on Automatic Parallelization for Heterogeneous and Homogeneous Multi...Studies on Automatic Parallelization for Heterogeneous and Homogeneous Multi...
Studies on Automatic Parallelization for Heterogeneous and Homogeneous Multi...
 
LLVM Optimizations for PGAS Programs -Case Study: LLVM Wide Optimization in C...
LLVM Optimizations for PGAS Programs -Case Study: LLVM Wide Optimization in C...LLVM Optimizations for PGAS Programs -Case Study: LLVM Wide Optimization in C...
LLVM Optimizations for PGAS Programs -Case Study: LLVM Wide Optimization in C...
 
Speculative Execution of Parallel Programs with Precise Exception Semantics ...
Speculative Execution of Parallel Programs with Precise Exception Semantics ...Speculative Execution of Parallel Programs with Precise Exception Semantics ...
Speculative Execution of Parallel Programs with Precise Exception Semantics ...
 
Accelerating Habanero-Java Program with OpenCL Generation
Accelerating Habanero-Java Program with OpenCL GenerationAccelerating Habanero-Java Program with OpenCL Generation
Accelerating Habanero-Java Program with OpenCL Generation
 

Kürzlich hochgeladen

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Kürzlich hochgeladen (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Introduction to Polyhedral Compilation

  • 1. Introduction to Polyhedral Compilation. Akihiro Hayashi, Jun Shirako, Rice University
  • 3. HIGH LEVEL SUMMARY
  • 4. The first priority is “performance”: parallel computing matters everywhere, from supercomputers and personal computers to smartphones and embedded systems. (Pictures borrowed from commons.wikimedia.org, www.hirt-japan.info)
  • 5. Parallel programming is hard… [Figure: memory hierarchies of multi-core CPUs (cores with SIMD units and private L1$, shared L2 and L3 caches, DRAM) and many-core GPUs (many cores, L2 cache, DRAM), ordered from DRAM (slowest) to registers (fastest).] Challenges: exploiting SIMD, scheduling tasks on CPUs, optimizing data locality, and utilizing accelerators.
  • 6. A gap between domain experts and hardware. Domain experts (the application domain) want significant performance improvements easily (performance portability). Exploiting the full capability of the hardware is hard, and that is the realm of concurrency experts. Programming languages, compilers, and runtimes sit in between, and we believe languages and compilers are very important!
  • 7. A review of literature
    - Automatic parallelizing compilers: IBM XL Compilers, Intel Compilers, OSCAR, Pluto, Polly, Polaris, R-Stream, SUIF, …
    - Parallel languages:
      - Language-based: Cilk, CUDA, OpenCL, C++AMP, Java, Habanero C/Java, PGAS, …
      - Directive-based: OpenMP, OpenACC, OmpSs, …
      - Library-based: Charm++, TBB, Thrust, RAJA, Kokkos, UPC++, HJLib, …
  • 8. From the perspective of compilers… Compilers are among the most complicated pieces of software: pointer analysis, scalar optimizations, loop transformations, vectorization/SIMDization, scheduling, exploiting accelerators, … (Credits: dragon by Cassie McKown from the Noun Project, crossed swords by anbileru adaleru from the Noun Project, https://en.wikipedia.org/)
  • 9. What are compilers doing? A compiler parses a program into an intermediate representation (e.g. an AST), applies optimizations to it, and emits “optimized” code. For example, x = a + b; y = a + b; z = x + y; becomes x = a + b; y = x; z = x + y;
  • 10. What are compilers doing? A compiler may modify a program (e.g. change the execution order of statements) as long as it preserves the program's semantics. Rewriting x = a + b; y = a + b; z = x + y; into x = a + b; y = x; z = x + y; is such a change. Moving z = x + y; before x = a + b; is not, because z would then read x before x is computed.
  • 11. Examples of optimizations: scalar optimizations. Common subexpression elimination (CSE): x = a + b; y = a + b; z = x + y; → x = a + b; y = x; z = x + y;. Constant propagation: a = 0; if (a) { … } → a = 0; if (0) { … }. Dead code elimination: a = 0; if (0) { … } → a = 0;
  • 12. Examples of optimizations: loop permutation (interchange). Original: for (i = 0; i < M; i++) { for (j = 0; j < N; j++) { b[i][j] = a[i][j]; } } Interchanged: for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { b[i][j] = a[i][j]; } } C arrays are row-major, so the nest with j innermost walks memory contiguously (faster on CPUs). The nest with i innermost uses strided access (slower on CPUs).
  • 13. Examples of optimizations: loop fusion/distribution. Fused: for (i = 0; i < N; i++) { a[i] = b[i] + c[i]; d[i] = a[i] + e[i]; } gives better temporal locality on CPUs. Distributed: for (i = 0; i < N; i++) { a[i] = b[i] + c[i]; } for (i = 0; i < N; i++) { d[i] = a[i] + e[i]; } is good for vectorization on CPUs. Which one is better depends on the loop size N.
  • 14. The phase-ordering problem. Which order is better? Start from a = 0; if (a) { … }. Running dead code elimination first changes nothing, because the branch is not yet provably dead. Constant propagation then yields a = 0; if (0) { … }. Running constant propagation first yields a = 0; if (0) { … }, after which dead code elimination removes the branch, leaving a = 0;
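  • 15. AST vs. the polyhedral model. [Figure: in the AST path, programs (x = a + b; y = a + b; z = x + y;) are parsed into an AST and turned into “optimized” code (x = a + b; y = x; z = x + y;). In the polyhedral path (TODAY's topic), programs are represented as polyhedra (affine inequalities such as i >= 0; i < N; …), from which “synthesized” code is generated.]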
  • 16. Why the polyhedral model? It is one solution for tackling the phase-ordering problem. It is also good for performing a set of loop transformations: loop permutation, loop fusion/distribution, loop tiling, … “The Polyhedral Model is a convenient alternative representation which combines analysis power, expressiveness and high flexibility” - OpenScop Specification and Library
  • 18. The polyhedral model in a nutshell
    - A polyhedral transformation is “scheduling”: determining the execution order of statement instances.
    - Three important things:
      - Domain: the set of instances of a statement
      - Scattering (scheduling): instance → time stamp
      - Access: instance → array element(s)
    - Limitation: in general, the model applies only to Static Control Parts (SCoPs), where loop bounds and conditionals are affine functions of the surrounding loop iterators.
    - [Figure: program (for (i=1; …){ S1; for (j=1; …) S2; }) → inequalities ($1 \le i_{S1} \le 2$; $1 \le i_{S2} \le 2$; $1 \le j_{S2} \le 3$; $i_{S1} = i_{S2}$) → constraints and a cost function solved as an ILP (e.g. $\delta_e(\vec{s},\vec{t}) = \phi_{S_j}(\vec{t}) - \phi_{S_i}(\vec{s})$, $C_i - C_j \ge 0$) → “synthesized” code (for (i=1; …){ S1; } for (i=1; …) { …; })]
  • 19. Representation of “Domain”. For for (i = 1; i <= 5; i++) for (j = 1; j <= 6; j++) S1; observe that S1 is executed 30 times (30 instances) and that each instance is associated with its (i, j). “The key aspect of the polyhedral model is to consider statement instances.” - OpenScop Specification and Library
  • 20. Iteration domain. An iteration domain is a set of constraints over the iteration vector (i, j) that represents the instances of a statement. If the constraints are affine, the set is a polyhedron. For for (i = 1; i <= 5; i++) for (j = 1; j <= 6; j++) S1; the constraints are $1 \le i \le 5,\ 1 \le j \le 6$, i.e. $D_{S1}:\ \begin{pmatrix} 1 & 0 & -1 \\ -1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & -1 & 6 \end{pmatrix} \begin{pmatrix} i \\ j \\ 1 \end{pmatrix} \ge \vec{0}$ (Credits: Clint, https://www.ozinenko.com/clint)
  • 21. Representation of “Scheduling”: 1-dimensional schedules. A function T returns the logical date of each statement. For x = a + b; // S1 y = a + b; // S2 z = x + y; // S3 the schedule is T_S1 = 0; T_S2 = 1; T_S3 = 2 (logical times T=0, T=1, T=2).
  • 22. Representation of “Scheduling”: multi-dimensional schedules. For x = a + b; // S1 for (i = 0; i < 2; i++) { a[i] = x; // S2 } z = x + y; // S3 the schedule is T_S1 = (0); T_S2(0) = (1, 0); T_S2(1) = (1, 1); T_S3 = (2). A function T returns the logical date of each statement instance. Logical dates may be multi-dimensional (cf. clocks: days, hours, minutes, seconds) and are compared in lexicographic order: $T_{S1} \prec T_{S2} \prec T_{S3} \Leftrightarrow (0) \prec (1, i) \prec (2)$.
  • 23. Representation of “Scheduling”: multi-dimensional schedules (parameterized). For the same program as slide 22, the schedule of S2 can be written once for all its instances: T_S1 = (0); T_S2(i) = (1, i); T_S3 = (2), where i ranges over the iteration domain 0 ≤ i < 2. The lexicographic order is unchanged: $T_{S1} \prec T_{S2} \prec T_{S3} \Leftrightarrow (0) \prec (1, i) \prec (2)$.
  • 24. Representation of “Scheduling”: multi-dimensional schedules. For x = a + b; // S1 for (i = 0; i < 2; i++) { a[i] = x; // S2 } for (i = 0; i < 2; i++) { for (j = 0; j < 3; j++) { b[i][j] += a[i]; // S3 } } the schedule is T_S1 = (0); T_S2(i) = (1, i); T_S3(i, j) = (2, i, j).
  • 25. Loop transformations with schedules. Original and new code: for (i = 0; i < 2; i++) { for (j = 0; j < 3; j++) { b[i][j] = ...; // S1 } }. The original schedule T_S1(i, j) = (i, j) and the new schedule T_S1(i, j) = (i, j) are related by the identity transformation: $T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ j \end{pmatrix}$ (new schedule = transformation × iteration vector).
  • 26. Loop transformations with schedules: loop reversal. Original: for (i = 0; i < 2; i++) { for (j = 0; j < 3; j++) { b[i][j] = ...; // S1 } } with T_S1(i, j) = (i, j). New schedule: T_S1(i, j) = (-i, j), i.e. $T_{S1}(i,j) = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} -i \\ j \end{pmatrix}$ (new schedule = transformation × iteration vector). Since $i_{new} = -i_{old}$, every $i_{old}$ is replaced by $-i_{new}$: for (i = -1; i <= 0; i++) { for (j = 0; j < 3; j++) { b[-i][j] = ...; // S1 } }
  • 27. Loop transformations with schedules: loop permutation. Original: for (i = 0; i < 2; i++) { for (j = 0; j < 3; j++) { b[i][j] = ...; // S1 } } with T_S1(i, j) = (i, j). New: for (j = 0; j < 3; j++) { for (i = 0; i < 2; i++) { b[i][j] = ...; // S1 } } with T_S1(i, j) = (j, i), i.e. $T_{S1}(i,j) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} j \\ i \end{pmatrix}$ (new schedule = transformation × iteration vector).
  • 28. Loop transformations with schedules: loop skewing. Original: for (i = 1; i <= 5; i++) { for (j = 1; j <= 5; j++) { a[i][j] = a[i-1][j+1]; // S1 } } with T_S1(i, j) = (i, j). New: for (i = 1; i <= 5; i++) { for (j = i+1; j <= i+5; j++) { a[i][j-i] = a[i-1][j-i+1]; // S1 } } with T_S1(i, j) = (i, i+j), i.e. $T_{S1}(i,j) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} = \begin{pmatrix} i \\ i + j \end{pmatrix}$ (new schedule = transformation × iteration vector). Since $j_{new} = i + j_{old}$, every $j_{old}$ is replaced by $j_{new} - i$.
  • 29. Loop transformations with schedules: loop skewing (cont’d), $T_{S1} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix}$. [Figure (Credits: Clint, https://www.ozinenko.com/clint): iteration space with dependences and execution order, before and after skewing.] Original points in execution order (i, j): (1,1), (1,2), (1,3), (1,4), (1,5); (2,1), …, (2,5); (3,1), …, (3,5); … Skewed points (i, i+j): (1,2), …, (1,6); (2,3), …, (2,7); (3,4), …, (3,8); (4,5), …
  • 30. Scalar dimensions in schedules. The 2d+1 format interleaves d loop dimensions with d+1 scalar dimensions. It can represent and transform imperfectly nested loops, e.g. for loop fusion/distribution. For for (i = 0; i < 2; i++) { s[i] = ...; // S1 for (j = 0; j < 3; j++) a[i][j] = ...; // S2 } for (i = 0; i < 2; i++) for (j = 0; j < 3; j++) b[i] = ...; // S3 the schedule is T_S1(i) = (0, i, 0); T_S2(i, j) = (0, i, 1, j, 0); T_S3(i, j) = (1, i, 0, j, 0).
  • 31. Loop transformations with schedules: loop fusion with scalar dimensions. Original: for (i = 0; i < 2; i++) for (j = 0; j < 3; j++) a[i] = ...; // S1 for (i = 0; i < 2; i++) for (j = 0; j < 3; j++) b[i] = ...; // S2 with T_S1(i, j) = (0, i, 0, j); T_S2(i, j) = (1, i, 0, j). New (fused): for (i = 0; i < 2; i++) { for (j = 0; j < 3; j++) a[i] = ...; // S1 for (j = 0; j < 3; j++) b[i] = ...; // S2 } with T_S1(i, j) = (0, i, 0, j); T_S2(i, j) = (0, i, 1, j). The new S2 schedule is a transformation plus scalar dimensions: $T_{S2}(i,j) = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ i \\ 1 \\ j \end{pmatrix}$
  • 32. Schedules in general. $T_S(\vec{i}) = \begin{pmatrix} \phi^S_1(\vec{i}) \\ \phi^S_2(\vec{i}) \\ \vdots \\ \phi^S_d(\vec{i}) \end{pmatrix} = \begin{pmatrix} C^S_{11} & C^S_{12} & \cdots & C^S_{1m_S} \\ C^S_{21} & C^S_{22} & \cdots & C^S_{2m_S} \\ \vdots & \vdots & \ddots & \vdots \\ C^S_{d1} & C^S_{d2} & \cdots & C^S_{dm_S} \end{pmatrix} \vec{i} + \begin{pmatrix} C^S_{10} \\ C^S_{20} \\ \vdots \\ C^S_{d0} \end{pmatrix}$. The $d \times m_S$ matrix is a transformation of the iteration vector, and the $d \times 1$ offset vector carries the scalar dimensions, e.g. (0, i, 0, j). Here $d = 2m_S + 1$, where $m_S$ is the size of the iteration vector.
  • 33. Schedules in general (cont’d): $T_S(\vec{i})$ has the same form as on slide 32, a $d \times m_S$ transformation of the iteration vector plus a $d \times 1$ vector of offsets (scalar dimensions), with $d = 2m_S + 1$. Goal: compute the coefficients and offsets for each statement.
  • 34. Legality of transformations. Are all transformations valid? NO! Original: for (i = 1; i <= 10; i++) { s[i] = ...; // S1 for (j = 0; j < 3; j++) a[i][j] = s[i]; // S2 } with T_S1(i) = (0, i, 0); T_S2(i, j) = (0, i, 1, j, 0). New (transformation): for (i = 1; i <= 10; i++) { for (j = 0; j < 3; j++) a[i][j] = s[i]; // S2 s[i] = ...; // S1 } with T_S2(i, j) = (0, i, 0, j, 0); T_S1(i) = (0, i, 1). In the new order, S2 reads s[i] before S1 writes it, which breaks the read-after-write dependence.
  • 35. Dependences. There are three types of dependence: Read-After-Write (a = 1; then b = a;), Write-After-Read (b = a; then a = 1;), and Write-After-Write (a = 1; then a = 2;). Dependences are computed from the domain, access, and schedule. A transformation amounts to finding a new schedule that satisfies all dependences.
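  • 36. Dependence polyhedron. A dependence polyhedron $P_e$ is a set of (in)equalities. It is a general and accurate representation of instance-wise dependences. For for (i = 1; i <= 10; i++) { s[i] = ...; // S1 for (j = 0; j < 3; j++) a[i][j] = s[i]; // S2 } the dependence from S1 to S2 is given by $i_{S1} = i_{S2},\ 1 \le i_{S1} \le 10,\ 1 \le i_{S2} \le 10,\ 0 \le j_{S2} < 3$. In matrix form (rows from $D_{S1}$, $D_{S2}$, and the access condition): $\begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 10 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} i_{S1} \\ i_{S2} \\ j_{S2} \\ 1 \end{pmatrix} \begin{matrix} = 0 \\ \ge 0 \\ \ge 0 \\ \ge 0 \\ \ge 0 \end{matrix}$ The equality can also be written as two inequalities: $i_{S1} = i_{S2} \Rightarrow i_{S1} - i_{S2} \ge 0 \wedge i_{S2} - i_{S1} \ge 0$. (Credits: Clint, https://www.ozinenko.com/clint)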
  • 37. Legality of transformations. Given the dependence polyhedron $P_e$, legality requires $\forall \langle \vec{s}, \vec{t} \rangle \in P_e\ (\vec{s} \in D_{S_i},\ \vec{t} \in D_{S_j}):\ T_{S_i}(\vec{s}) \prec T_{S_j}(\vec{t})$. If a “source” instance must happen before a “target” instance in the original program, the transformed program must preserve this property (it must satisfy the dependence).
  • 38. Putting it all together. Goal: compute all coefficients and offsets of $T_{S2}(i,j) = \begin{pmatrix} C^{S2}_{11} & C^{S2}_{12} \\ C^{S2}_{21} & C^{S2}_{22} \\ C^{S2}_{31} & C^{S2}_{32} \\ C^{S2}_{41} & C^{S2}_{42} \\ C^{S2}_{51} & C^{S2}_{52} \end{pmatrix} \begin{pmatrix} i_{S2} \\ j_{S2} \end{pmatrix} + \begin{pmatrix} C^{S2}_{10} \\ C^{S2}_{20} \\ C^{S2}_{30} \\ C^{S2}_{40} \\ C^{S2}_{50} \end{pmatrix}$ and $T_{S1}(i) = \begin{pmatrix} C^{S1}_{11} \\ C^{S1}_{21} \\ C^{S1}_{31} \end{pmatrix} (i_{S1}) + \begin{pmatrix} C^{S1}_{10} \\ C^{S1}_{20} \\ C^{S1}_{30} \end{pmatrix}$. They must satisfy $\forall \langle \vec{s}, \vec{t} \rangle \in P_e\ (\vec{s} \in D_{S1},\ \vec{t} \in D_{S2}):\ T_{S1}(\vec{s}) \prec T_{S2}(\vec{t})$, where $P_e$ is the dependence polyhedron of slide 36: $i_{S1} = i_{S2},\ 1 \le i_{S1} \le 10,\ 1 \le i_{S2} \le 10,\ 0 \le j_{S2} < 3$.
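  • 39. Linearizing the legality condition (The Pluto Algorithm)
q The legality condition (for iteration vectors):
$$\delta(\vec s, \vec t) = (c_1^{S_j}, c_2^{S_j}, \dots, c_{m_{S_j}}^{S_j}) \cdot \vec t - (c_1^{S_i}, c_2^{S_i}, \dots, c_{m_{S_i}}^{S_i}) \cdot \vec s \ge 0, \quad \langle \vec s, \vec t \rangle \in P_e$$
q Uniform dependences: the distance between two dependent iterations is a constant (e.g., $i \to i + 1$ ⇒ $\delta(\vec s, \vec t)$ is a constant)
q Non-uniform dependences: the distance between two dependent iterations varies (e.g., $i \to i + j$ ⇒ $\delta(\vec s, \vec t)$ is a function of $j$)
§ The condition is then a product of unknown coefficients and iteration variables; apply the (affine form of the) Farkas lemma to turn it into linear constraints on the coefficients:
$$\delta(\vec s, \vec t) \ge 0, \; \langle \vec s, \vec t \rangle \in P_e \iff \delta(\vec s, \vec t) \equiv \lambda_{e0} + \sum_{k=1}^{m_e} \lambda_{ek} P_e^k, \quad \lambda_{e0}, \lambda_{ek} \ge 0$$
§ where each $P_e^k \ge 0$ is one inequality of the dependence polyhedron; the multipliers λ are then eliminated (e.g., with Fourier-Motzkin)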
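The lemma only asks for nonnegative multipliers that make the identity hold coefficient by coefficient. The sketch below (hypothetical names) verifies a given certificate; it does not search for one or eliminate the multipliers as Pluto does. The example checks the form $i_t - j_t$ over the single face $i_t - j_t - 1 \ge 0$, the shape that appears in Dependence 2 of the step-by-step example:

```c
#include <assert.h>

#define NV 2 /* number of variables; here (i_t, j_t) */

/* Affine form f(x) = coef[0]*x[0] + ... + coef[NV-1]*x[NV-1] + konst. */
typedef struct {
  int coef[NV];
  int konst;
} Affine;

/* Returns 1 iff f equals lambda0 + sum_k lambda[k] * faces[k] coefficient by
   coefficient and all multipliers are >= 0, i.e. (lambda0, lambda) is a
   Farkas certificate that f >= 0 on {x | faces[k](x) >= 0 for all k}. */
int is_farkas_certificate(Affine f, const Affine faces[], int nfaces,
                          int lambda0, const int lambda[]) {
  if (lambda0 < 0) return 0;
  Affine sum = {{0}, lambda0};
  for (int k = 0; k < nfaces; k++) {
    if (lambda[k] < 0) return 0;
    for (int v = 0; v < NV; v++)
      sum.coef[v] += lambda[k] * faces[k].coef[v];
    sum.konst += lambda[k] * faces[k].konst;
  }
  for (int v = 0; v < NV; v++)
    if (sum.coef[v] != f.coef[v]) return 0;
  return sum.konst == f.konst;
}

/* Example: f = i_t - j_t over the single face i_t - j_t - 1 >= 0. */
int check_example(int lambda0, int lambda1) {
  Affine f = {{1, -1}, 0};
  Affine faces[1] = {{{1, -1}, -1}};
  int lambda[1] = {lambda1};
  return is_farkas_certificate(f, faces, 1, lambda0, lambda);
}
```

`check_example(1, 1)` returns 1 because $1 + 1 \cdot (i_t - j_t - 1) = i_t - j_t$; any other pair of multipliers fails the comparison.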
  • 40. Cost Function & Objective Function (The Pluto Algorithm)
q Compute all coefficients and offsets under the legality condition: solve an ILP problem
q Cost function = transformation policy
§ Pluto's cost function = dependence distance:
$$\delta(\vec s, \vec t) = (c_1^{S_j}, c_2^{S_j}, \dots, c_{m_{S_j}}^{S_j}) \cdot \vec t - (c_1^{S_i}, c_2^{S_i}, \dots, c_{m_{S_i}}^{S_i}) \cdot \vec s, \quad \langle \vec s, \vec t \rangle \in P_e$$
ü Fuse loops as much as possible
ü Push loops carrying dependences to inner levels
§ Also used in ISL (Polly, PPCG, …)
q Objective function:
$$\operatorname{minimize}_{\prec} \; (u_1, w, c_1^{S_j}, c_2^{S_j})$$
§ u1 and w bound the dependence distances: δ(s,t) ≤ u1·N + w for every dependence (N: a program parameter)
§ Iteratively find linearly independent solutions
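To see why dependence distance works as a cost, the sketch below (the function name is hypothetical) takes the loop nest of the step-by-step example that follows and computes, by brute force for a fixed N, the largest distance that a candidate hyperplane $(c_1, c_2)$ produces over its three dependences:

```c
#include <assert.h>

/* Largest distance delta(s,t) = h.t - h.s of the hyperplane h = (c1, c2)
   over the dependences of
     for (i = 0; i < N; i++) for (j = 1; j < N; j++)
       a[i][j] = a[j][i] + a[i][j-1];   // S1
   Returns -1 if some distance is negative (h violates a dependence). */
int max_distance(int c1, int c2, int N) {
  assert(N >= 3);
  int best = 0;
  for (int it = 0; it < N; it++)
    for (int jt = 1; jt < N; jt++) {
      if (jt >= 2) { /* Dependence 1: (it, jt-1) -> (it, jt) */
        int d = (c1 * it + c2 * jt) - (c1 * it + c2 * (jt - 1));
        if (d < 0) return -1;
        if (d > best) best = d;
      }
      if (it - jt >= 1) { /* Dependences 2 & 3: (jt, it) -> (it, jt) */
        int d = (c1 * it + c2 * jt) - (c1 * jt + c2 * it);
        if (d < 0) return -1;
        if (d > best) best = d;
      }
    }
  return best;
}
```

For N = 10, (1, 1) keeps every distance at most 1, (1, 0) reaches N − 2 = 8, and (0, 1) is rejected because it makes some distance negative. Pluto bounds each distance by $u_1 N + w$ and minimizes $u_1$ first, so a hyperplane whose distances grow with N is avoided whenever possible.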
  • 41. Step-by-step example
  for (i = 0; i < N; i++) {
    for (j = 1; j < N; j++) {
      a[i][j] = a[j][i] + a[i][j-1]; // S1
    }
  }
Executed instances:
  a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
  a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
  a[0][3] = a[3][0] + a[0][2]; // S1(0,3)
  ...
  a[1][1] = a[1][1] + a[1][0]; // S1(1,1)
  a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
  a[1][3] = a[3][1] + a[1][2]; // S1(1,3)
  ...
  a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
  a[2][2] = a[2][2] + a[2][1]; // S1(2,2)
  a[2][3] = a[3][2] + a[2][2]; // S1(2,3)
  ...
  a[3][1] = a[1][3] + a[3][0]; // S1(3,1)
q Dependence 1 (RAW): S1(i,j) reads a[i][j-1], written by the previous instance S1(i,j-1)
q Dependence 2 (RAW): S1(i,j) reads a[j][i], written earlier by S1(j,i) when j < i (e.g., S1(1,2) → S1(2,1) on a[1][2])
q Dependence 3 (WAR): S1(i,j) reads a[j][i] before S1(j,i) overwrites it when i < j (e.g., S1(1,2) → S1(2,1) on a[2][1])
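The three dependences can be recovered by brute force. The sketch below (hypothetical helper names, a fixed small N) records the accesses of every instance in execution order, with the reads of an instance placed before its write, and classifies each pair of accesses to the same element made by different instances:

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 8
#define MAXACC (3 * MAXN * MAXN) /* 2 reads + 1 write per instance */

/* One access of instance S1(i, j) to element a[row][col]. */
typedef struct {
  int i, j;
  int row, col;
  bool is_write;
} Access;

static Access trace[MAXACC];
static int ntrace;

static void record(int i, int j, int row, int col, bool is_write) {
  trace[ntrace++] = (Access){i, j, row, col, is_write};
}

/* out[0] = RAW, out[1] = WAR, out[2] = WAW pairs between distinct instances. */
void count_deps(int N, int out[3]) {
  assert(N >= 2 && N <= MAXN);
  ntrace = 0;
  for (int i = 0; i < N; i++)
    for (int j = 1; j < N; j++) {
      record(i, j, j, i, false);     /* read  a[j][i]   */
      record(i, j, i, j - 1, false); /* read  a[i][j-1] */
      record(i, j, i, j, true);      /* write a[i][j]   */
    }
  out[0] = out[1] = out[2] = 0;
  for (int p = 0; p < ntrace; p++)
    for (int q = p + 1; q < ntrace; q++) {
      const Access *s = &trace[p], *t = &trace[q];
      if (s->row != t->row || s->col != t->col) continue; /* different element */
      if (s->i == t->i && s->j == t->j) continue;         /* same instance */
      if (s->is_write && !t->is_write) out[0]++;
      else if (!s->is_write && t->is_write) out[1]++;
      else if (s->is_write && t->is_write) out[2]++;
    }
}

/* Convenience wrapper: kind 0 = RAW, 1 = WAR, 2 = WAW. */
int count_kind(int N, int kind) {
  int out[3];
  count_deps(N, out);
  return out[kind];
}
```

For N = 4 it reports 11 RAW pairs (8 from Dependence 1, 3 from Dependence 2), 3 WAR pairs (Dependence 3), and no WAW pairs, since every element of `a` is written at most once.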
  • 42. Step-by-step example: Legality Constraints 1 (The Pluto Algorithm)
q Dependence 1: RAW (flow dependence), $(i_s, j_s) \to (i_t, j_t)$
  Source: a[0][1] = a[1][0] + a[0][0]; // S1(0,1)
  Target: a[0][2] = a[2][0] + a[0][1]; // S1(0,2)
  ...
q Dependence polyhedron $P_{e1}$: $i_s = i_t, \; j_s = j_t - 1, \; 0 \le i_t \le N - 1, \; 2 \le j_t \le N - 1$
q Legality constraints:
$$\delta(\vec s, \vec t) = (c_1^{S_j}, c_2^{S_j}, \dots, c_{m_{S_j}}^{S_j}) \cdot \vec t - (c_1^{S_i}, c_2^{S_i}, \dots, c_{m_{S_i}}^{S_i}) \cdot \vec s \ge 0, \quad \langle \vec s, \vec t \rangle \in P_e$$
$$(c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0, \quad (i_s, j_s, i_t, j_t) \in P_{e1}$$
$$\Rightarrow c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} i_s + c_2^{S1} j_s) = c_1^{S1} i_t + c_2^{S1} j_t - \big(c_1^{S1} i_t + c_2^{S1} (j_t - 1)\big) \ge 0 \;\Rightarrow\; c_2^{S1} \ge 0$$
  • 43. Step-by-step example: Legality Constraints 2 (The Pluto Algorithm)
q Dependence 2: RAW (flow dependence), $(i_s, j_s) \to (i_t, j_t)$
  Source: a[1][2] = a[2][1] + a[1][1]; // S1(1,2)
  Target: a[2][1] = a[1][2] + a[2][0]; // S1(2,1)
  ...
q Dependence polyhedron $P_{e2}$: $i_s = j_t, \; j_s = i_t, \; 1 \le i_t \le N - 1, \; 1 \le j_t \le N - 1, \; i_t - j_t \ge 1$
q Legality constraints:
$$\delta(\vec s, \vec t) = (c_1^{S_j}, c_2^{S_j}, \dots, c_{m_{S_j}}^{S_j}) \cdot \vec t - (c_1^{S_i}, c_2^{S_i}, \dots, c_{m_{S_i}}^{S_i}) \cdot \vec s \ge 0, \quad \langle \vec s, \vec t \rangle \in P_e$$
$$(c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_t \\ j_t \end{pmatrix} - (c_1^{S1}, c_2^{S1}) \begin{pmatrix} i_s \\ j_s \end{pmatrix} \ge 0, \quad (i_s, j_s, i_t, j_t) \in P_{e2}$$
$$\Rightarrow c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} i_s + c_2^{S1} j_s) = c_1^{S1} i_t + c_2^{S1} j_t - (c_1^{S1} j_t + c_2^{S1} i_t) \ge 0$$
$$\Rightarrow (c_1^{S1} - c_2^{S1})\, i_t + (c_2^{S1} - c_1^{S1})\, j_t \ge 0, \quad 1 \le i_t \le N - 1, \; 1 \le j_t \le N - 1, \; i_t - j_t \ge 1$$
q Farkas lemma + Fourier-Motzkin elimination ⇒ $c_1^{S1} - c_2^{S1} \ge 0$
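A worked instance of the Farkas step, under the simplifying assumption that only the face $i_t - j_t - 1 \ge 0$ receives a nonzero multiplier (the multipliers of the bound constraints are set to 0). Matching the coefficients of $i_t$, $j_t$ and the constant term, then eliminating the multipliers, reproduces the slide's constraint:

$$(c_1^{S1} - c_2^{S1})(i_t - j_t) \equiv \lambda_0 + \lambda_1 (i_t - j_t - 1), \qquad \lambda_0, \lambda_1 \ge 0$$
$$i_t: \; c_1^{S1} - c_2^{S1} = \lambda_1, \qquad j_t: \; -(c_1^{S1} - c_2^{S1}) = -\lambda_1, \qquad 1: \; 0 = \lambda_0 - \lambda_1$$
$$\Rightarrow \; \lambda_0 = \lambda_1 = c_1^{S1} - c_2^{S1} \ge 0$$

The constraint is also necessary: the instance pair S1(1,2) → S1(2,1), i.e. $(i_t, j_t) = (2, 1)$, is a real dependence of the loop whenever N ≥ 3, and it has $\delta = c_1^{S1} - c_2^{S1}$.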
  • 44. Step-by-step example: Putting it all together (The Pluto Algorithm)
q Dependence 1: $c_2^{S1} \ge 0, \; w \ge c_2^{S1}$
q Dependences 2 & 3: $c_1^{S1} - c_2^{S1} \ge 0, \; u_1 \ge 0, \; u_1 \ge c_1^{S1} - c_2^{S1}, \; 3u_1 + w \ge c_1^{S1} - c_2^{S1}$
§ The constraints on $u_1$ and $w$ use the parameter N to bound the dependence distances
q Avoiding the zero vector: $c_1^{S1} + c_2^{S1} \ge 1$
q Objective function: $\operatorname{minimize}_{\prec} (u_1, w, c_1^{S1}, c_2^{S1}) \to (0, 1, 1, 1)$
q Then find a linearly independent answer for the next row:
$$T_{S1}(i, j) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \end{pmatrix}$$
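Because the constraints are small, the lexicographic minimum can be found by scanning a box of nonnegative integers in lexicographic order: the first feasible point found is the lexmin inside the box. This is only a sketch: the box bound, the function names, and the independence constraint $c_1^{S1} - c_2^{S1} \ge 1$ used for the second row are simplifying assumptions, not Pluto's exact formulation.

```c
#include <assert.h>

/* Candidate solution, in the order of the lexicographic objective. */
typedef struct { int u1, w, c1, c2; } Sol;

/* Constraints copied from the slide. If `second` is set, the hypothetical
   independence constraint c1 - c2 >= 1 is added for the next row
   (Dependence 1 is kept for simplicity; dropping it gives the same result). */
static int feasible(int u1, int w, int c1, int c2, int second) {
  if (!(c2 >= 0 && w >= c2)) return 0;                /* Dependence 1 */
  if (!(c1 - c2 >= 0 && u1 >= 0 && u1 >= c1 - c2 &&
        3 * u1 + w >= c1 - c2)) return 0;             /* Dependences 2 & 3 */
  if (c1 + c2 < 1) return 0;                          /* avoid zero vector */
  if (second && c1 - c2 < 1) return 0;                /* assumed independence */
  return 1;
}

/* Scans 0..BOUND for every unknown in lexicographic order, so the first
   feasible point is the lexicographic minimum inside the box. */
Sol lexmin(int second) {
  enum { BOUND = 4 };
  for (int u1 = 0; u1 <= BOUND; u1++)
    for (int w = 0; w <= BOUND; w++)
      for (int c1 = 0; c1 <= BOUND; c1++)
        for (int c2 = 0; c2 <= BOUND; c2++)
          if (feasible(u1, w, c1, c2, second))
            return (Sol){u1, w, c1, c2};
  return (Sol){-1, -1, -1, -1}; /* nothing feasible in the box */
}
```

The first call yields (0, 1, 1, 1), i.e. the row (1, 1); the second yields (1, 0, 1, 0), i.e. the row (1, 0). Together they give the schedule on the slide.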
  • 45. Summary
q The polyhedral transformation = "scheduling" (determining the execution order of statement instances)
q 3 important things:
§ Domain: the set of instances of a statement
§ Scattering (scheduling): an instance → a time stamp
§ Access: an instance → array element(s)
q Limitation: in general, only applicable to Static Control Parts (SCoPs)
§ Loop bounds and conditionals must be affine functions of the surrounding loop iterators (and of parameters such as N)
[Figure: Program (for (i=1; …){ S1; for (j=1; …) S2; …) → Constraints (1 ≤ iS1 ≤ 2; 1 ≤ iS2 ≤ 2; 1 ≤ jS2 ≤ 3; iS1 = iS2) → Cost function δe(s,t) = φSj(t) − φSi(s) → ILP → "Synthesized" code (for (i=1; …){ S1; } for (i=1; …) { …; …)]
  • 46. COMPILERS AND TOOLS
  • 47. Polyhedral Compilers & Tools
q PoCC (The Polyhedral Compiler Collection)
§ http://web.cs.ucla.edu/~pouchet/software/pocc/
§ Clan: extracts a polyhedral IR from the source code
§ Candl: a dependence analyzer
§ LetSee: a legal transformation space explorer
§ PLuTo: an automatic parallelizer and locality optimizer
§ CLooG: code generation from the polyhedral IR
  • 48. Polyhedral Compilers & Tools
q Polly
§ http://polly.llvm.org/
§ ISL: Integer Set Library (including a code generator)
q Clay/Chrole/Clint
§ https://www.ozinenko.com/projects
§ Clay: "Chunky Loop Alteration wizardrY"
§ Chrole: "Recovering high-level syntactic description of the automatically computed polyhedral optimization"
§ Clint: "Interactive graphical interface to the manual and compiler-assisted program restructuring in the polyhedral model"
  • 50. Further readings
q Fundamentals
§ OpenScop Specification
ü http://icps.u-strasbg.fr/people/bastoul/public_html/development/openscop/docs/openscop.html
§ ISL
ü https://lirias.kuleuven.be/bitstream/123456789/270231/1/icms2010verdoolaege.pdf
q Pluto algorithm
§ U. Bondhugula, "Effective Automatic Parallelization and Locality Optimization Using The Polyhedral Model" (PhD Dissertation, 2010)
§ U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan, "A Practical Automatic Polyhedral Parallelizer and Locality Optimizer" [PLDI'08]
q Polly
§ T. Grosser, S. Verdoolaege, A. Cohen, "Polyhedral AST generation is more than scanning polyhedra" [ACM TOPLAS 2015]
q Polyhedral model + AST-based cost function
§ J. Shirako, L.N. Pouchet, V. Sarkar, "Oil and Water Can Mix: An Integration of Polyhedral and AST-based Transformations" [SC'14]
q GPU code generation
§ S. Verdoolaege, J.C. Juega, A. Cohen, J.I. Gomez, C. Tenllado, F. Catthoor, "Polyhedral parallel code generation for CUDA" [ACM TACO 2013]
§ J. Shirako, A. Hayashi, V. Sarkar, "Optimized Two-level Parallelization for GPU Accelerators using the Polyhedral Model" [CC'17]