A Brief Look at Compiler Optimization Techniques
6. Compilation Flow
Undergraduate compiler courses usually cover only the parser
and a bare-bones code generator.
But the really fun, almost magical part of a compiler
is actually the optimization.
Through optimization,
a program can become both smaller and faster!
[1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
12. Background Crash Course
• Basic Block
• Control Flow Graph
• Static Single Assignment Form
13. Basic Block
• A stretch of code with a single entry point and a single exit point
• http://en.wikipedia.org/wiki/Basic_block
14. Control Flow Graph
• CFG for short; put simply, it is the program's flowchart
• http://en.wikipedia.org/wiki/Control_flow_graph
15. Basic Block
int foo (int n)
{
  int ret;
  if (n > 10)
    ret = n * 2;
  else
    ret = n + 2;
  return ret;
}
Basic blocks:
  B1: int ret; if (n > 10)
  B2: ret = n * 2;
  B3: ret = n + 2;
  B4: return ret;
18. CFG
Edges: B1 -> B2, B1 -> B3, B2 -> B4, B3 -> B4
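As a rough sketch of how a compiler might split code into basic blocks, the C snippet below marks "leaders" (the first instruction of each block) in a flat instruction list. The `insn` encoding, `find_leaders`, and `count_blocks_in_foo` are hypothetical names invented for this illustration; they are not part of LLVM or of the original slides.

```c
/* Hypothetical instruction kinds, invented for this sketch. */
enum kind { OP_PLAIN, OP_JUMP, OP_BRANCH, OP_RET };

struct insn {
    enum kind kind;
    int target;   /* index of the jump/branch target; unused otherwise */
};

/* Sets leader[i] = 1 when instruction i starts a basic block and
   returns the number of basic blocks. Assumes every target is a
   valid index in [0, n). */
int find_leaders(const struct insn *code, int n, int *leader)
{
    int i, count = 0;

    for (i = 0; i < n; ++i)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                   /* rule 1: the first instruction */
    for (i = 0; i < n; ++i) {
        if (code[i].kind == OP_JUMP || code[i].kind == OP_BRANCH)
            leader[code[i].target] = 1;  /* rule 2: every jump target */
        if (code[i].kind != OP_PLAIN && i + 1 < n)
            leader[i + 1] = 1;           /* rule 3: whatever follows a jump, branch or return */
    }
    for (i = 0; i < n; ++i)
        count += leader[i];
    return count;
}

/* The if/else function foo from the slide, lowered by hand. */
int count_blocks_in_foo(void)
{
    const struct insn code[6] = {
        { OP_PLAIN,  0 },  /* 0: t = n > 10     */
        { OP_BRANCH, 4 },  /* 1: if (!t) goto 4 */
        { OP_PLAIN,  0 },  /* 2: ret = n * 2    */
        { OP_JUMP,   5 },  /* 3: goto 5         */
        { OP_PLAIN,  0 },  /* 4: ret = n + 2    */
        { OP_RET,    0 },  /* 5: return ret     */
    };
    int leader[6];

    return find_leaders(code, 6, leader);
}
```

For the hand-lowered `foo`, the leaders are instructions 0, 2, 4 and 5, which gives the same four blocks as the slide.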
19. Basic Block
int sum (int n)
{
  int ret = 0;
  int i;
  for (i = 0; i < n; ++i)
    ret += i;
  return ret;
}
Basic blocks:
  B1: int ret = 0; int i; i = 0;
  B2: i < n;
  B3: ret += i;
  B4: ++i
  B5: return ret
22. CFG
Edges: B1 -> B2, B2 -> B3 (condition true), B2 -> B5 (condition false), B3 -> B4, B4 -> B2
24. SSA
int foo ()
{
  int ret;
  ret = 10;
  ret = 20;
  return ret;
}
In SSA form:
int foo ()
{
  int ret;
  ret1 = 10;
  ret2 = 20;
  return ret2;
}
Every assignment gets its own version number.
Once the versions are labeled, you can tell at a glance
which expression's result each use refers to.
27. SSA
int foo (int n)
{
  int ret;
  if (n > 10)
    ret = n * 2;
  else
    ret = n + 2;
  return ret;
}
int foo (int n)
{
  int ret;
  if (n > 10)
    ret1 = n * 2;
  else
    ret2 = n + 2;
  return ret?;
}
Where diverging paths in the program merge again,
we cannot tell which definition the value came from.
int foo (int n)
{
  int ret;
  if (n > 10)
    ret1 = n * 2;
  else
    ret2 = n + 2;
  ret3 = Φ (ret1, ret2)
  return ret3;
}
This is where Φ comes in:
it says that the definition of the value
is decided by control flow,
and it gives the result a new version number.
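C has no Φ operator, so the following is only an emulation under an assumption: a hypothetical flag `from_then` records which branch ran, and the merge point picks `ret1` or `ret2` accordingly. Real SSA-based compilers resolve Φ from the incoming CFG edge rather than from a runtime flag.

```c
/* The original function from the slide. */
int foo(int n)
{
    int ret;
    if (n > 10)
        ret = n * 2;
    else
        ret = n + 2;
    return ret;
}

/* SSA-style rewrite: each name is assigned exactly once. */
int foo_ssa(int n)
{
    int ret1, ret2, ret3;
    int from_then;                   /* hypothetical: which predecessor block ran */

    if (n > 10) {
        ret1 = n * 2;
        from_then = 1;
    } else {
        ret2 = n + 2;
        from_then = 0;
    }
    ret3 = from_then ? ret1 : ret2;  /* plays the role of ret3 = Φ(ret1, ret2) */
    return ret3;
}
```

Only the value defined on the path actually taken is ever read, which is exactly the guarantee Φ expresses.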
31. LLVM
• A handy, fun compiler that has been very popular lately. To install it:
  – sudo apt-get install llvm clang xdot
  – sudo yum install llvm clang python-xdot python-setuptools
    xdot is for viewing the graphs.
    As for python-setuptools... Fedora's packaging does not declare its dependencies properly; it is a package xdot depends on.
  – Not using apt-get or yum? Then we will assume you are an expert who can figure it out XD
  – Windows!? Word has it the official site has an installer?
  – Building it yourself is recommended; otherwise some debugging features will be missing
37. LLVM IR
• v = operation type op1, op2, opn...
  – %sum = add i32 %op1, %op2
    %sum: the result
    add: the operator
    i32: the type
    %op1, %op2: the operands
42. LLVM IR
• SSA-Based IR
  – %sum = add i32 %op1, %op2
  – %sum = mul i32 %op1, %op2
  – error: multiple definition of local value named 'sum'
43. SSA!?
• SSA form is very friendly to compilers, but writing SSA form by hand is not very intuitive for ordinary people...
  – Those used to functional programming excepted... XD
• Inserting PHI nodes by hand is even more of a chore
46. alloca
• Used to create local variables
  – The allocated space lives on the stack
• Usage looks a bit like malloc in C, but the concept is somewhat different
47. alloca
define void @foo() {
  %var = alloca i32
  ret void
}
%var is the resulting address; its type can be regarded as i32*
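48. alloca
• Every access must go through load/store
  – During optimization, though, the variable is promoted to a register when memory is not really needed (via the mem2reg pass)
• If it is an array, or its address has to be taken, it may not be possible to promote it to a register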
49. alloca/store
define void @foo() {
  %var = alloca i32
  store i32 10, i32* %var
  ret void
}
i32 10: the value to store and its type
i32* %var: the type and address of the store destination
50. alloca/load
define void @foo() {
  %var = alloca i32
  store i32 10, i32* %var
  %t0 = load i32* %var
  ret void
}
%t0: the value read back
i32* %var: the type and address to load from
51. LLVM/Clang
• Today's talk uses only the following two tools:
  – clang: turns C into LLVM IR
  – opt: the tool for running optimizations and inspecting the results
52. View CFG by LLVM
• clang foo.c -S -emit-llvm
• opt foo.ll -view-cfg
int foo(int a, int b)
{
if (a > b)
return a;
else
return b;
}
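54. View CFG by LLVM
There are quite a few junk instructions,
but turning optimization on while we are still observing
would get in the way of learning.
opt foo.ll -O1 -view-cfg
With optimization on, only three instructions in a single BB are left...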
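55. Notes on using opt (1/3)
• The position of each argument matters!!
  opt foo.ll -view-cfg -O1
    shows the CFG first, then optimizes
  opt foo.ll -O1 -view-cfg
    optimizes first, then shows the CFG
56. Notes on using opt (2/3)
• Arguments can be given more than once
  opt foo.ll -view-cfg -O1 -view-cfg
    shows the CFG first,
    then optimizes,
    and finally shows the CFG once more
57. Notes on using opt (3/3)
• Arguments can be given more than once, and optimizations can be run repeatedly too
  opt foo.ll -O1 -view-cfg -O1 -view-cfg
    optimizes,
    then optimizes again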
58. mem2reg
• mem2reg: removes unnecessary alloca and load/store instructions
• It also brings the program closer to proper SSA form
62. Constant Propagation
int foo(int a)
{
  int magic_num = 10;
  return a + magic_num;
}
After constant propagation:
int foo(int a)
{
  int magic_num = 10;
  return a + 10;
}
opt foo.ll -mem2reg -view-cfg
This optimization is so basic
that it gets done along the way during mem2reg.
Never assume that writing the second version by hand
is faster and then fill your code with
damned magic numbers!!!!
68. Constant Folding
• Constant Folding: fold the constants!
  – If every operand is a constant, compute the result ahead of time!
• a = 123 + 456
  – a = 579
• A program does not necessarily contain many such constant expressions, but they gradually appear after Constant Propagation
72. Constant Folding
a = 10
b = 100 + a
Constant Propagation:
a = 10
b = 100 + 10
Constant Folding:
a = 10
b = 110
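A minimal sketch of how propagation and folding feed each other, assuming straight-line code made only of additions. The `operand` and `add_insn` types, `MAX_VARS`, and the function names are hypothetical, invented for this example; integer overflow is ignored.

```c
#include <stdbool.h>

#define MAX_VARS 8   /* hypothetical limit for this sketch */

/* An operand is either a constant or a variable index. */
struct operand {
    bool is_const;
    int  value;      /* the constant, or the variable's index */
};

/* dst = lhs + rhs; a plain "dst = c" is written as c + 0. */
struct add_insn {
    int dst;
    struct operand lhs, rhs;
};

/* Runs propagation and folding over straight-line code in one pass.
   Returns how many instructions were folded to a constant. */
int propagate_and_fold(struct add_insn *code, int n)
{
    bool known[MAX_VARS] = { false };
    int  value[MAX_VARS] = { 0 };
    int  folded = 0;

    for (int i = 0; i < n; ++i) {
        struct operand *ops[2] = { &code[i].lhs, &code[i].rhs };

        for (int k = 0; k < 2; ++k) {
            /* Constant propagation: replace a variable whose value is known. */
            if (!ops[k]->is_const && known[ops[k]->value]) {
                ops[k]->value = value[ops[k]->value];
                ops[k]->is_const = true;
            }
        }
        if (code[i].lhs.is_const && code[i].rhs.is_const) {
            /* Constant folding: both operands are constants, so add them now. */
            int v = code[i].lhs.value + code[i].rhs.value;
            code[i].lhs.value = v;
            code[i].rhs.value = 0;
            known[code[i].dst] = true;
            value[code[i].dst] = v;
            ++folded;
        } else {
            known[code[i].dst] = false;   /* result is not a compile-time constant */
        }
    }
    return folded;
}

/* The slide's example: a = 10; b = 100 + a. Returns b's folded value. */
int fold_example(void)
{
    struct add_insn code[2] = {
        { 0, { true, 10 },  { true, 0 } },   /* a = 10      */
        { 1, { true, 100 }, { false, 0 } },  /* b = 100 + a */
    };

    propagate_and_fold(code, 2);
    return code[1].lhs.value;
}
```

Here `fold_example` ends with `b = 110 + 0`, matching the last step on the slide.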
75. Observing Constant Folding
• In LLVM, Constant Folding is handled inside the Constant Propagation pass
define i32 @folding() {
  %t = add i32 10, 20
  ret i32 %t
}
opt -S cfolding.ll -constprop
define i32 @folding() {
  ret i32 30
}
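76. Function Inline
• Inline: rendered in Chinese as 行內函數 or 內嵌函數
• The idea is to copy the body of the function into the call site
• This saves the cost of the call and exposes more optimization opportunities!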
78. Inline + Propagation
int add(int a, int b)
{
  return a + b;
}
int foo(int n){
  int sum = 0;
  int i, t;
  for (i = 0; i < n; ++i) {
    t = add(10, 20);
    sum = add(sum, i);
    sum = add(sum, t);
  }
  return sum;
}
clang -emit-llvm -S inline.c
opt inline.ll -mem2reg -S
define i32 @add(i32 %a, i32 %b) {
  %1 = add i32 %a, %b
  ret i32 %1
}
define i32 @foo(i32 %n) {
  br label %1
; <label>:1
  %sum.0 = phi i32 [ 0, %0 ], [ %6, %7 ]
  %i.0 = phi i32 [ 0, %0 ], [ %8, %7 ]
  %2 = icmp slt i32 %i.0, %n
  br i1 %2, label %3, label %9
; <label>:3
  %4 = call i32 @add(i32 10, i32 20)
  %5 = call i32 @add(i32 %sum.0, i32 %i.0)
  %6 = call i32 @add(i32 %5, i32 %4)
  br label %7
; <label>:7
  %8 = add i32 %i.0, 1
  br label %1
; <label>:9
  ret i32 %sum.0
}
opt inline.ll -mem2reg -inline -S
define i32 @foo(i32 %n) {
  br label %1
; <label>:1
  %sum.0 = phi i32 [ 0, %0 ], [ %5, %6 ]
  %i.0 = phi i32 [ 0, %0 ], [ %7, %6 ]
  %2 = icmp slt i32 %i.0, %n
  br i1 %2, label %3, label %8
; <label>:3
  %4 = add i32 %sum.0, %i.0
  %5 = add i32 %4, 30
  br label %6
; <label>:6
  %7 = add i32 %i.0, 1
  br label %1
; <label>:8
  ret i32 %sum.0
}
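To read the inlined IR at the source level, here is a hand-written C equivalent. `foo_inlined` is a hypothetical name for this illustration; it is what the loop looks like once the three calls to `add` are replaced by their bodies and `add(10, 20)` is folded to 30.

```c
/* Original shape from the slide: every addition goes through a call. */
static int add(int a, int b)
{
    return a + b;
}

int foo(int n)
{
    int sum = 0;
    int i, t;

    for (i = 0; i < n; ++i) {
        t = add(10, 20);
        sum = add(sum, i);
        sum = add(sum, t);
    }
    return sum;
}

/* Hand-inlined version, mirroring the two add instructions in the IR. */
int foo_inlined(int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; ++i) {
        sum = sum + i;    /* was add(sum, i) */
        sum = sum + 30;   /* was add(sum, add(10, 20)), with 10 + 20 folded */
    }
    return sum;
}
```

Both functions compute the same result; the inlined one simply has no call overhead left in the loop body.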
81. DCE
• DCE: Dead Code Elimination
• After the optimizations introduced so far, redundant code gradually shows up, along with branch conditions that can obviously never be true
82. DCE
int foo()
{
  int a, b;
  a = 5;
  if (a > 10)
    b = 10;
  else
    b = 20;
  return b;
}
Constant Propagation:
int foo()
{
  int a, b;
  a = 5;
  if (5 > 10)
    b = 10;
  else
    b = 20;
  return b;
}
Constant Folding:
int foo()
{
  int a, b;
  a = 5;
  if (false)
    b = 10;
  else
    b = 20;
  return b;
}
DCE:
int foo()
{
  int b;
  b = 20;
  return b;
}
Constant Propagation:
int foo()
{
  return 20;
}
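The instruction-level half of DCE can be sketched as a backward liveness sweep. This is a simplified illustration, assuming straight-line SSA code with no side effects; the `insn` layout, `MAX_VALUES`, and the function names are hypothetical. Removing dead branches is a separate job of CFG simplification.

```c
#include <stdbool.h>

#define MAX_VALUES 16   /* hypothetical limit for this sketch */

/* dst = f(src1, src2); -1 means "no operand" (for example a constant). */
struct insn {
    int dst, src1, src2;
};

/* Drops every instruction whose result never reaches ret_value.
   Compacts code in place and returns the new instruction count. */
int eliminate_dead(struct insn *code, int n, int ret_value)
{
    bool live[MAX_VALUES] = { false };
    int i, out = 0;

    live[ret_value] = true;               /* the returned value is always needed */
    for (i = n - 1; i >= 0; --i) {        /* walk backwards: uses come after defs */
        if (live[code[i].dst]) {
            if (code[i].src1 >= 0) live[code[i].src1] = true;
            if (code[i].src2 >= 0) live[code[i].src2] = true;
        } else {
            code[i].dst = -1;             /* mark as dead */
        }
    }
    for (i = 0; i < n; ++i)               /* keep only the live instructions */
        if (code[i].dst >= 0)
            code[out++] = code[i];
    return out;
}

int dce_example(void)
{
    struct insn code[3] = {
        { 0, -1, -1 },  /* v0 = 5        (only feeds v1) */
        { 1,  0,  0 },  /* v1 = v0 + v0  (never used)    */
        { 2, -1, -1 },  /* v2 = 20       (returned)      */
    };

    return eliminate_dead(code, 3, 2);
}
```

Because the sweep runs backwards, `v0` is removed too: its only user `v1` has already been found dead by the time `v0` is visited.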
87. Observing DCE with LLVM (1/5)
int foo()
{
int a;
int b;
a = 5;
if (a > 10)
b = a + 10;
else
b = a + 20;
return b;
}
clang -S -emit-llvm dce.c
define i32 @foo() {
entry:
%a = alloca i32
%b = alloca i32
store i32 5, i32* %a
%0 = load i32* %a
%cmp = icmp sgt i32 %0, 10
br i1 %cmp, label %if.then, label %if.else
if.then:
%1 = load i32* %a
%add = add i32 %1, 10
store i32 %add, i32* %b
br label %if.end
if.else:
%2 = load i32* %a
%add1 = add i32 %2, 20
store i32 %add1, i32* %b
br label %if.end
if.end:
%3 = load i32* %b
ret i32 %3
}
88. Observing DCE with LLVM (2/5)
opt dce.ll -mem2reg -S
define i32 @foo() {
entry:
%cmp = icmp sgt i32 5, 10
br i1 %cmp, label %if.then, label %if.else
if.then:
%add = add i32 5, 10
br label %if.end
if.else:
%add1 = add i32 5, 20
br label %if.end
if.end:
%b.0 = phi i32 [ %add, %if.then ],
[ %add1, %if.else ]
ret i32 %b.0
}
89. Observing DCE with LLVM (3/5)
opt dce.ll -mem2reg -constprop -S
define i32 @foo() {
entry:
br i1 false, label %if.then,
label %if.else
if.then:
br label %if.end
if.else:
br label %if.end
if.end:
%b.0 = phi i32 [ 15, %if.then ],
[ 25, %if.else ]
ret i32 %b.0
}
90. Observing DCE with LLVM (4/5)
opt dce.ll -mem2reg -constprop -dce -S
define i32 @foo() {
entry:
  br i1 false, label %if.then, label %if.else
if.then:
  br label %if.end
if.else:
  br label %if.end
if.end:
  %b.0 = phi i32 [ 15, %if.then ], [ 25, %if.else ]
  ret i32 %b.0
}
Looks like nothing changed??
LLVM leaves CFG simplification to the -simplifycfg pass.
93. Observing DCE with LLVM (5/5)
opt dce.ll -mem2reg -constprop -simplifycfg -S
define i32 @foo() {
entry:
ret i32 25
}
94. Observing DCE with LLVM - 2 (1/2)
opt dce.ll -mem2reg -simplifycfg -S
define i32 @foo() {
entry:
  %cmp = icmp sgt i32 5, 10
  %add = add i32 5, 10
  %add1 = add i32 5, 20
  %b.0 = select i1 %cmp, i32 %add, i32 %add1
  ret i32 %b.0
}
95. Observing DCE with LLVM - 2 (2/2)
opt dce.ll -mem2reg -simplifycfg -constprop -S
define i32 @foo() {
entry:
ret i32 25
}
96. CSE
• CSE: Common Subexpression Elimination
  – Share whatever can be shared!
a = b * c + g;
d = b * c * e;
After CSE:
t = b * c;
a = t + g;
d = t * e;
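One simple way to find such shared subexpressions is to compare each instruction against the ones already kept. The sketch below does that for straight-line SSA code; `binop`, `MAX_VALUES`, `local_cse`, and `cse_example` are hypothetical names for this illustration, and commutativity (`b * c` versus `c * b`) is deliberately not handled.

```c
#define MAX_VALUES 16   /* hypothetical limit for this sketch */

/* dst = lhs op rhs, where dst/lhs/rhs are SSA value numbers. */
struct binop {
    char op;
    int  dst, lhs, rhs;
};

/* Removes instructions that recompute an earlier (op, lhs, rhs) triple.
   Compacts code in place and returns the new instruction count. */
int local_cse(struct binop *code, int n)
{
    int alias[MAX_VALUES];
    int i, j, out = 0;

    for (i = 0; i < MAX_VALUES; ++i)
        alias[i] = i;                     /* every value starts as itself */

    for (i = 0; i < n; ++i) {
        struct binop cur = code[i];
        int found = -1;

        cur.lhs = alias[cur.lhs];         /* use the surviving copy of each operand */
        cur.rhs = alias[cur.rhs];
        for (j = 0; j < out; ++j) {
            if (code[j].op == cur.op &&
                code[j].lhs == cur.lhs && code[j].rhs == cur.rhs) {
                found = j;
                break;
            }
        }
        if (found >= 0)
            alias[cur.dst] = code[found].dst;   /* reuse the earlier result */
        else
            code[out++] = cur;                  /* first occurrence: keep it */
    }
    return out;
}

/* Values: b=0, c=1, g=2, e=3, then the temporaries 4..8. */
int cse_example(void)
{
    struct binop code[5] = {
        { '*', 4, 0, 1 },   /* mul  = b * c      */
        { '+', 5, 4, 2 },   /* add  = mul + g    */
        { '*', 6, 0, 1 },   /* mul1 = b * c      (duplicate of mul) */
        { '*', 7, 6, 3 },   /* mul2 = mul1 * e   */
        { '+', 8, 5, 7 },   /* add3 = add + mul2 */
    };

    return local_cse(code, 5);
}
```

In `cse_example` the second `b * c` is dropped and `mul2` is rewritten to use `mul`, leaving four instructions.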
98. Observing CSE with LLVM (1/2)
int foo(int b, int c, int g, int e)
{
  int a = b * c + g;
  int d = b * c * e;
  return a + d;
}
clang -emit-llvm -S cse.c
opt cse.ll -mem2reg -S
define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) {
entry:
  %mul = mul i32 %b, %c
  %add = add i32 %mul, %g
  %mul1 = mul i32 %b, %c
  %mul2 = mul i32 %mul1, %e
  %add3 = add i32 %add, %mul2
  ret i32 %add3
}
99. Observing CSE with LLVM (2/2)
opt cse.ll -mem2reg -early-cse -S
define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) {
entry:
  %mul = mul i32 %b, %c
  %add = add i32 %mul, %g
  %mul2 = mul i32 %mul, %e
  %add3 = add i32 %add, %mul2
  ret i32 %add3
}
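100. Loop Unroll
• Loop Unroll: unrolling a loop
  – On most architectures a jump instruction costs more than an ordinary arithmetic instruction
  – After unrolling, the loop index may turn from a variable into a constant
sum = 0;
for (i = 0; i < 3; ++i)
  sum = sum + i;
After unrolling:
sum = 0;
sum = sum + 0;
sum = sum + 1;
sum = sum + 2;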
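The same example as two complete C functions, so the equivalence can be checked directly. `sum_rolled` and `sum_unrolled` are hypothetical names for this illustration.

```c
/* The loop as written: one compare and one jump per iteration. */
int sum_rolled(void)
{
    int sum = 0;
    int i;

    for (i = 0; i < 3; ++i)
        sum = sum + i;
    return sum;
}

/* Fully unrolled: no jumps, and the loop index has become a constant
   in every statement, which opens the door to constant folding. */
int sum_unrolled(void)
{
    int sum = 0;

    sum = sum + 0;
    sum = sum + 1;
    sum = sum + 2;
    return sum;
}
```

Once the index is a constant, propagation and folding can reduce the whole body to a single constant.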
101. Observing Loop Unroll with LLVM (1/8)
int add(int a, int b)
{
  return a + b;
}
int foo()
{
  int sum = 0;
  int i;
  for (i = 0; i < 3; ++i)
    sum = add(sum, i);
  return sum;
}
clang -emit-llvm -S for.c
opt for.ll -mem2reg -S
define i32 @add(i32 %a, i32 %b) {
entry:
%add = add i32 %a, %b
ret i32 %add
}
define i32 @foo() {
entry:
br label %for.cond
for.cond:
%i.0 = phi i32 [ 0, %entry ],
[ %inc, %for.inc ]
%sum.0 = phi i32 [ 0, %entry ],
[ %call, %for.inc ]
%cmp = icmp slt i32 %i.0, 3
br i1 %cmp, label %for.body, label %for.end
for.body:
%call = call i32 @add(i32 %sum.0, i32 %i.0)
br label %for.inc
for.inc:
%inc = add i32 %i.0, 1
br label %for.cond
for.end:
ret i32 %sum.0
}
102. Observing Loop Unroll with LLVM (2/8)
opt for.ll -mem2reg -loop-unroll -S
define i32 @foo() {
entry:
  br label %for.cond
for.cond:
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ]
  %cmp = icmp slt i32 %i.0, 3
  br i1 %cmp, label %for.body, label %for.end
for.body:
  %call = call i32 @add(i32 %sum.0, i32 %i.0)
  br label %for.inc
for.inc:
  %inc = add i32 %i.0, 1
  br label %for.cond
for.end:
  %sum.0.lcssa = phi i32 [ %sum.0, %for.cond ]
  ret i32 %sum.0.lcssa
}
The loop does not seem to get unrolled????
104. Observing Loop Unroll with LLVM (3/8)
$ opt -mem2reg -S for.ll -loop-unroll -debug
Args: opt -mem2reg -S for.ll -loop-unroll -debug
Loop Unroll: F[foo] Loop %for.cond
  Loop Size = 8
  Can't unroll; loop not terminated by a conditional branch.
With -debug, opt complains about this loop:
the Loop Unroll pass cannot recognize it!?
105. Observing Loop Unroll with LLVM (4/8)
Rotate, loop, rotate!
opt for.ll -mem2reg -loop-rotate -S
define i32 @foo() {
entry:
  br label %for.body
for.body:
  %sum.02 = phi i32 [ 0, %entry ], [ %call, %for.inc ]
  %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %call = call i32 @add(i32 %sum.02, i32 %i.01)
  br label %for.inc
for.inc:
  %inc = add i32 %i.01, 1
  %cmp = icmp slt i32 %inc, 3
  br i1 %cmp, label %for.body, label %for.end
for.end:
  %sum.0.lcssa = phi i32 [ %call, %for.inc ]
  ret i32 %sum.0.lcssa
}
108. Observing Loop Unroll with LLVM (7/8)
define i32 @foo() {
entry:
  br label %for.body
for.body:
  %call = call i32 @add(i32 0, i32 0)
  br label %for.inc
for.inc:
  %call.1 = call i32 @add(i32 %call, i32 1)
  br label %for.inc.1
for.inc.1:
  %call.2 = call i32 @add(i32 %call.1, i32 2)
  br label %for.inc.2
for.inc.2:
  ret i32 %call.2
}
opt for.ll -mem2reg -loop-rotate -loop-unroll -simplifycfg -view-cfg -S
define i32 @foo() {
entry:
  %call = call i32 @add(i32 0, i32 0)
  %call.1 = call i32 @add(i32 %call, i32 1)
  %call.2 = call i32 @add(i32 %call.1, i32 2)
  ret i32 %call.2
}
109. Observing Loop Unroll with LLVM (8/8)
define i32 @add(i32 %a, i32 %b) {
entry:
  %add = add i32 %a, %b
  ret i32 %add
}
define i32 @foo() {
entry:
  %call = call i32 @add(i32 0, i32 0)
  %call.1 = call i32 @add(i32 %call, i32 1)
  %call.2 = call i32 @add(i32 %call.1, i32 2)
  ret i32 %call.2
}
opt for.ll -mem2reg -loop-rotate -loop-unroll -simplifycfg -inline -constprop -S
After -inline:
define i32 @foo() {
entry:
  %add.i = add i32 1, 2
  ret i32 %add.i
}
After -constprop:
define i32 @foo() {
entry:
  ret i32 3
}
114. Overview of GCC Optimization Pass
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3
$ ls a.c.*
a.c.001t.tu a.c.059t.forwprop2 a.c.093t.sink a.c.138t.copyrename4 a.c.204r.outof_cfglayout
a.c.003t.original a.c.060t.objsz1 a.c.096t.loop a.c.139t.crited2 a.c.205r.split1
a.c.004t.gimple a.c.061t.alias a.c.097t.loopinit a.c.141t.uncprop1 a.c.206r.subreg2
a.c.006t.omplower a.c.062t.retslot a.c.098t.lim1 a.c.142t.local-pure-const2 a.c.208r.mode_sw
a.c.007t.lower a.c.063t.fre2 a.c.099t.copyprop5 a.c.168t.nrv a.c.209r.asmcons
a.c.010t.eh a.c.064t.copyprop3 a.c.100t.dceloop1 a.c.169t.optimized a.c.213r.ira
a.c.011t.cfg a.c.065t.mergephi2 a.c.101t.unswitch a.c.170r.expand a.c.214r.reload
a.c.015t.ssa a.c.066t.vrp1 a.c.102t.sccp a.c.171r.vregs a.c.215r.postreload
a.c.017t.inline_param1 a.c.067t.dce1 a.c.104t.ldist a.c.172r.into_cfglayout a.c.216r.gcse2
a.c.018t.einline a.c.068t.cdce a.c.105t.copyprop6 a.c.173r.jump a.c.217r.split2
a.c.019t.early_optimizations a.c.069t.cselim a.c.111t.ivcanon a.c.174r.subreg1 a.c.218r.ree
a.c.020t.copyrename1 a.c.070t.ifcombine a.c.113t.ifcvt a.c.175r.dfinit a.c.221r.pro_and_epilogue
a.c.021t.ccp1 a.c.071t.phiopt1 a.c.114t.vect a.c.176r.cse1 a.c.222r.dse2
a.c.022t.forwprop1 a.c.072t.tailr2 a.c.115t.dceloop3 a.c.177r.fwprop1 a.c.223r.csa
a.c.023t.ealias a.c.073t.ch a.c.116t.pcom a.c.178r.cprop1 a.c.224r.jump2
a.c.024t.esra a.c.075t.cplxlower1 a.c.117t.cunroll a.c.179r.pre a.c.225r.peephole2
a.c.025t.fre1 a.c.076t.sra a.c.118t.slp a.c.181r.cprop2 a.c.226r.ce3
a.c.026t.copyprop1 a.c.077t.copyrename3 a.c.120t.ivopts a.c.184r.ce1 a.c.228r.cprop_hardreg
a.c.027t.mergephi1 a.c.078t.dom1 a.c.121t.lim3 a.c.185r.reginfo a.c.229r.rtl_dce
a.c.028t.cddce1 a.c.079t.isolate-paths a.c.122t.loopdone a.c.186r.loop2 a.c.230r.compgotos
a.c.029t.eipa_sra a.c.080t.phicprop1 a.c.123t.veclower21 a.c.187r.loop2_init a.c.231r.bbro
a.c.030t.tailr1 a.c.081t.dse1 a.c.125t.reassoc2 a.c.188r.loop2_invariant a.c.233r.split4
a.c.031t.switchconv a.c.082t.reassoc1 a.c.126t.slsr a.c.189r.loop2_unswitch a.c.234r.sched2
a.c.033t.profile_estimate a.c.083t.dce2 a.c.127t.dom2 a.c.192r.loop2_done a.c.236r.stack
a.c.034t.local-pure-const1 a.c.084t.forwprop3 a.c.128t.phicprop2 a.c.194r.cprop3 a.c.237r.alignments
a.c.035t.fnsplit a.c.085t.phiopt2 a.c.129t.vrp2 a.c.195r.cse2 a.c.239r.mach
a.c.036t.release_ssa a.c.086t.strlen a.c.130t.cddce2 a.c.196r.dse1 a.c.240r.barriers
a.c.037t.inline_param2 a.c.087t.ccp3 a.c.132t.dse2 a.c.197r.fwprop2 a.c.244r.shorten
a.c.054t.copyrename2 a.c.088t.copyprop4 a.c.133t.forwprop4 a.c.199r.init-regs a.c.245r.nothrow
a.c.055t.ccp2 a.c.089t.sincos a.c.134t.phiopt3 a.c.200r.ud_dce a.c.246r.dwarf2
a.c.056t.copyprop2 a.c.090t.bswap a.c.135t.fab1 a.c.201r.combine a.c.247r.final
a.c.057t.cunrolli a.c.091t.crited1 a.c.136t.widening_mul a.c.202r.ce2 a.c.248r.dfinish
a.c.058t.phiprop a.c.092t.pre a.c.137t.tailc a.c.203r.bbpart a.c.249t.statistics
That is 165 pass dump files in total!
115. Propagation
28 of the 165 passes deal with propagation!
116. Inline
3 of the 165 passes deal with inlining!
117. DCE
13 of the 165 passes deal with DCE!
118. CSE
118
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3
$ ls a.c.*
4 of the 165 passes are CSE!
a.c.001t.tu a.c.059t.forwprop2 a.c.093t.sink a.c.138t.copyrename4 a.c.204r.outof_cfglayout
a.c.003t.original a.c.060t.objsz1 a.c.096t.loop a.c.139t.crited2 a.c.205r.split1
a.c.004t.gimple a.c.061t.alias a.c.097t.loopinit a.c.141t.uncprop1 a.c.206r.subreg2
a.c.006t.omplower a.c.062t.retslot a.c.098t.lim1 a.c.142t.local-pure-const2 a.c.208r.mode_sw
a.c.007t.lower a.c.063t.fre2 a.c.099t.copyprop5 a.c.168t.nrv a.c.209r.asmcons
a.c.010t.eh a.c.064t.copyprop3 a.c.100t.dceloop1 a.c.169t.optimized a.c.213r.ira
a.c.011t.cfg a.c.065t.mergephi2 a.c.101t.unswitch a.c.170r.expand a.c.214r.reload
a.c.015t.ssa a.c.066t.vrp1 a.c.102t.sccp a.c.171r.vregs a.c.215r.postreload
a.c.017t.inline_param1 a.c.067t.dce1 a.c.104t.ldist a.c.172r.into_cfglayout a.c.216r.gcse2
a.c.018t.einline a.c.068t.cdce a.c.105t.copyprop6 a.c.173r.jump a.c.217r.split2
a.c.019t.early_optimizations a.c.069t.cselim a.c.111t.ivcanon a.c.174r.subreg1 a.c.218r.ree
a.c.020t.copyrename1 a.c.070t.ifcombine a.c.113t.ifcvt a.c.175r.dfinit a.c.221r.pro_and_epilogue
a.c.021t.ccp1 a.c.071t.phiopt1 a.c.114t.vect a.c.176r.cse1 a.c.222r.dse2
a.c.022t.forwprop1 a.c.072t.tailr2 a.c.115t.dceloop3 a.c.177r.fwprop1 a.c.223r.csa
a.c.023t.ealias a.c.073t.ch a.c.116t.pcom a.c.178r.cprop1 a.c.224r.jump2
a.c.024t.esra a.c.075t.cplxlower1 a.c.117t.cunroll a.c.179r.pre a.c.225r.peephole2
a.c.025t.fre1 a.c.076t.sra a.c.118t.slp a.c.181r.cprop2 a.c.226r.ce3
a.c.026t.copyprop1 a.c.077t.copyrename3 a.c.120t.ivopts a.c.184r.ce1 a.c.228r.cprop_hardreg
a.c.027t.mergephi1 a.c.078t.dom1 a.c.121t.lim3 a.c.185r.reginfo a.c.229r.rtl_dce
a.c.028t.cddce1 a.c.079t.isolate-paths a.c.122t.loopdone a.c.186r.loop2 a.c.230r.compgotos
a.c.029t.eipa_sra a.c.080t.phicprop1 a.c.123t.veclower21 a.c.187r.loop2_init a.c.231r.bbro
a.c.030t.tailr1 a.c.081t.dse1 a.c.125t.reassoc2 a.c.188r.loop2_invariant a.c.233r.split4
a.c.031t.switchconv a.c.082t.reassoc1 a.c.126t.slsr a.c.189r.loop2_unswitch a.c.234r.sched2
a.c.033t.profile_estimate a.c.083t.dce2 a.c.127t.dom2 a.c.192r.loop2_done a.c.236r.stack
a.c.034t.local-pure-const1 a.c.084t.forwprop3 a.c.128t.phicprop2 a.c.194r.cprop3 a.c.237r.alignments
a.c.035t.fnsplit a.c.085t.phiopt2 a.c.129t.vrp2 a.c.195r.cse2 a.c.239r.mach
a.c.036t.release_ssa a.c.086t.strlen a.c.130t.cddce2 a.c.196r.dse1 a.c.240r.barriers
a.c.037t.inline_param2 a.c.087t.ccp3 a.c.132t.dse2 a.c.197r.fwprop2 a.c.244r.shorten
a.c.054t.copyrename2 a.c.088t.copyprop4 a.c.133t.forwprop4 a.c.199r.init-regs a.c.245r.nothrow
a.c.055t.ccp2 a.c.089t.sincos a.c.134t.phiopt3 a.c.200r.ud_dce a.c.246r.dwarf2
a.c.056t.copyprop2 a.c.090t.bswap a.c.135t.fab1 a.c.201r.combine a.c.247r.final
a.c.057t.cunrolli a.c.091t.crited1 a.c.136t.widening_mul a.c.202r.ce2 a.c.248r.dfinish
a.c.058t.phiprop a.c.092t.pre a.c.137t.tailc a.c.203r.bbpart a.c.249t.statistics
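As a quick reminder of what a CSE pass is after, here is a minimal hand-written sketch. The names `before_cse`, `after_cse`, and `check_cse_equivalence` are made up for illustration, and `after_cse` only shows the idea of the transformation at source level; it is not actual GCC output.

```c
#include <assert.h>

/* Before: the subexpression (a * b) is evaluated twice. */
int before_cse(int a, int b, int c)
{
    int x = a * b + c;
    int y = a * b - c;
    return x * y;
}

/* After: the common subexpression is evaluated once and reused. */
int after_cse(int a, int b, int c)
{
    int t = a * b;   /* shared value, computed a single time */
    int x = t + c;
    int y = t - c;
    return x * y;
}

/* Both versions must compute the same result. */
void check_cse_equivalence(void)
{
    assert(before_cse(3, 4, 5) == after_cse(3, 4, 5));
    assert(before_cse(-2, 7, 1) == after_cse(-2, 7, 1));
}
```

The rewrite is only valid because `a * b` has no side effects and neither operand changes between the two uses.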
119. Unroll
119
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3
$ ls a.c.*
2 of the 165 passes are loop unrolling!
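For reference, the effect of unrolling can be sketched by hand at source level. This is an illustrative assumption, not what GCC emits: the function names `sum_rolled` and `sum_unrolled` and the unroll factor of 4 are chosen only for the example.

```c
/* Original loop: one add plus one compare-and-branch per element. */
int sum_rolled(const int *a, int n)
{
    int ret = 0;
    for (int i = 0; i < n; ++i)
        ret += a[i];
    return ret;
}

/* Unrolled by 4: the loop test runs once per four elements. */
int sum_unrolled(const int *a, int n)
{
    int ret = 0;
    int i = 0;

    /* Main body handles four elements per iteration. */
    for (; i + 3 < n; i += 4) {
        ret += a[i];
        ret += a[i + 1];
        ret += a[i + 2];
        ret += a[i + 3];
    }

    /* Remainder loop handles the last n % 4 elements. */
    for (; i < n; ++i)
        ret += a[i];

    return ret;
}
```

The trade-off is larger code in exchange for less loop overhead, which is why a compiler decides case by case whether unrolling pays off.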
120. Propagation + DCE + CSE + Inline + Unroll
120
50 of the 165 passes!
121. Propagation + DCE + CSE + Inline + Unroll
121
50 of the 165 passes!
Having sat through this talk, you already more or less understand about a third of GCC!!!
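126. Summary
126
• Compiler optimization is a lot of fun, but be sure to read up on some basic theory before you start playing with it
• LLVM, for its part, is a very good bridge between theory and implementation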