GCC compiles C/C++ code into executable programs in several stages:
1. The preprocessor handles #include, #define, and other preprocessor directives.
2. The front-end parses the code into an abstract syntax tree (AST) and performs type checking and semantic analysis.
3. The middle-end converts the AST into the GIMPLE intermediate representation and performs optimizations like dead code elimination and constant propagation before generating register transfer language (RTL).
4. The back-end selects target-specific instructions, allocates registers, schedules instructions, and outputs assembly code. The assembler turns that code into an object file, and the linker combines it with other object files into the final executable.
4. GCC - compilation controller
Why GCC?
Because we use it
Multiple languages: C, C++, Fortran, Java, Mercury, …
Multiple architectures: ARM, MN10300, PDP-10, AVR32, …
5. Before we go
What happens when developers design a logo?
"Do what you do best and outsource the rest"
6. GCC - compilation controller
cc1 - preprocessor and compiler
Output → AT&T/Intel assembler file (*.s)
Use the -E flag to preprocess only
Use the -S flag to preprocess and compile
as - assembler (from binutils)
Output → object file (*.o)
Use the -c flag to skip the linker
collect2 - linker (a wrapper that runs ld)
Output → shared object/ELF executable (*.so, *)
7. The preprocessor
Entry point
Almost no safety
The C standard (C99 and later, 5.2.4.1) defines interesting minimum translation limits
Min. #include nesting levels - 15
Min. number of macros in one translation unit - 4095
Min. number of characters in a logical source line - 4095
GCC's preprocessor is limited mostly by memory (#include nesting is capped at 200 levels)
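"Almost no safety" is easy to demonstrate: the preprocessor substitutes tokens without knowing anything about expressions or types. The sketch below uses hypothetical macro names; the only point is how textual substitution interacts with operator precedence.

```c
/* Naive macro: the argument is pasted in as raw tokens. */
#define SQUARE_UNSAFE(x) x * x

/* Defensive macro: parenthesize the argument and the whole expansion. */
#define SQUARE_SAFE(x) ((x) * (x))

/* SQUARE_UNSAFE(1 + 2) expands to 1 + 2 * 1 + 2, which is 5, not 9. */
int square_unsafe_demo(void) { return SQUARE_UNSAFE(1 + 2); }

/* SQUARE_SAFE(1 + 2) expands to ((1 + 2) * (1 + 2)), which is 9. */
int square_safe_demo(void) { return SQUARE_SAFE(1 + 2); }

/* Even the "safe" macro evaluates its argument twice, so SQUARE_SAFE(i++)
   would modify i twice without sequencing (undefined behavior).
   A static inline function has neither problem. */
static inline int square(int x) { return x * x; }
```

Running `gcc -E` on such a file shows the expanded text exactly as cc1 sees it.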
8. Preprocessor on steroids
People use the preprocessor to do a variety of things
Usually, it is just a bad habit
Some people use more than one preprocessor :-)
@Gynvael Coldwind
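```
float fast_sin(int deg) {
    static const float sin_table[] = {<?php
        // PHP's sin() expects radians, so convert each degree first
        for ($i = 0; $i < 359; $i++)
            echo sin(deg2rad($i)) . ",";
        echo sin(deg2rad($i));   // entry 359, no trailing comma
    ?>};
    /* Wrap into [0, 359]; plain deg % 360 is negative for negative deg */
    return sin_table[((deg % 360) + 360) % 360];
}
```
php my.c | gcc -x c -c -o my.o -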
Hmm... good idea, but kind of naïve. Surely we can do better!
12. cc1 - From input to output
IN → Front-end → Middle-end → Back-end → OUT
13. Frontend overview
C/C++ → AST → GENERIC
It all starts with the lexer & parser
Intermediate representation - AST
At the end - language-independent
14. Parsing
Simple example:
```c
x = a + b * 3;
```
Basic lexers are based on regular expressions
Statements are tokenized
x can be mapped to {id, 1}, where 1 is an index in the symbol table
a, b → {id, 2}, {id, 3}
+, * can be mapped to a token table
3 can be mapped to a constant table
The lexer does not define any order
It's just tokenization
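A minimal sketch of the tokenization step for `x = a + b * 3;`. It is a hand-written scanner rather than a regex-generated one, and the token kinds, the fixed-size symbol table, and the function names are hypothetical, not GCC's. Identifiers come out as `{id, n}` pairs with a 1-based symbol-table index, as on the slide; no precedence or grammar is applied.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds for this sketch. */
typedef enum { TOK_ID, TOK_OP, TOK_CONST } TokKind;

typedef struct {
    TokKind kind;
    int value;  /* TOK_ID: 1-based symbol-table index; TOK_CONST: the number; TOK_OP: the character */
} Token;

#define MAX_SYMS 16
static char symtab[MAX_SYMS][32];
static int nsyms = 0;

/* Return the 1-based index of name, inserting it if unseen (-1 if full or too long). */
static int intern(const char *name, size_t len) {
    for (int i = 0; i < nsyms; i++)
        if (strlen(symtab[i]) == len && strncmp(symtab[i], name, len) == 0)
            return i + 1;
    if (nsyms == MAX_SYMS || len >= sizeof symtab[0])
        return -1;
    memcpy(symtab[nsyms], name, len);
    symtab[nsyms][len] = '\0';
    return ++nsyms;
}

/* Split src into at most max tokens: no grammar, no precedence, just tokenization. */
int tokenize(const char *src, Token *out, int max) {
    int n = 0;
    const char *p = src;
    while (*p != '\0' && n < max) {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            out[n++] = (Token){TOK_ID, intern(start, (size_t)(p - start))};
        } else if (isdigit((unsigned char)*p)) {
            int v = 0;
            while (isdigit((unsigned char)*p))
                v = v * 10 + (*p++ - '0');
            out[n++] = (Token){TOK_CONST, v};
        } else {
            out[n++] = (Token){TOK_OP, *p++};
        }
    }
    return n;
}
```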
31. Semantic analysis
The compiler needs to check the syntax tree against the language definition
This analysis saves type information in the symbol table
Type checking is also performed (e.g. array[1.f] is ill-formed)
Implicit conversions are likely to be inserted here
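A small illustration of implicit conversions that semantic analysis has to make explicit in the tree. The function names are hypothetical; the behavior follows the usual arithmetic conversions of C.

```c
/* Both operands of / are int, so integer division happens first;
   the int result is then implicitly converted to the double return type. */
double average_truncated(int a, int b) { return (a + b) / 2; }

/* 2.0 is a double, so (a + b) is implicitly converted to double before dividing. */
double average_exact(int a, int b) { return (a + b) / 2.0; }

/* By contrast, a floating-point subscript is a type error that GCC rejects:
   int array[4]; int v = array[1.f];   (does not compile) */
```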
32. Symbol table
GCC must record variables in a so-called symbol table
It contains information about type, storage, scope, etc.
It is built incrementally by the analysis phases
Scopes are very important
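To show why scopes matter, here is a minimal sketch of a scoped symbol table. The layout and names are hypothetical and far simpler than GCC's: declarations go on a stack, each scope remembers where it began, and lookup searches innermost-first so inner declarations shadow outer ones.

```c
#include <string.h>

#define MAX_SYMBOLS 64
#define MAX_SCOPES 16

typedef struct {
    const char *name;
    const char *type;  /* e.g. "int"; a real table also records storage class etc. */
    int level;         /* scope nesting level of the declaration */
} Symbol;

static Symbol symbols[MAX_SYMBOLS];
static int nsymbols = 0;
static int scope_base[MAX_SCOPES];
static int level = 0;

int enter_scope(void) {
    if (level == MAX_SCOPES)
        return 0;
    scope_base[level++] = nsymbols;
    return 1;
}

/* Leaving a scope discards every declaration made inside it. */
void leave_scope(void) {
    if (level > 0)
        nsymbols = scope_base[--level];
}

int declare(const char *name, const char *type) {
    if (nsymbols == MAX_SYMBOLS)
        return 0;
    symbols[nsymbols++] = (Symbol){name, type, level};
    return 1;
}

/* Search innermost-first, so inner declarations shadow outer ones. */
const Symbol *lookup(const char *name) {
    for (int i = nsymbols - 1; i >= 0; i--)
        if (strcmp(symbols[i].name, name) == 0)
            return &symbols[i];
    return NULL;
}
```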
33. GENERIC
The code is correct with regard to syntax & language semantics
It is also stored as an AST
Although the AST is abstract, it is not generic enough
Language-specific AST nodes are replaced
From now on, the middle-end kicks in
35. GIMPLE
Modified GENERIC form
At most 3 operands per expression
Why 3? Three-address instructions
Function calls are an exception
No nested function calls
Some control structures are represented with ifs and gotos
36. GIMPLE
Overly complex expressions are broken down into expression temporaries
Example:
a = b + c + d
becomes
T1 = b + c
a = T1 + d
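The same idea applied by hand to a loop gives a feel for the GIMPLE shape: one operator per statement, compiler-style temporaries, and the loop expressed with ifs and gotos. This is a hand-written, GIMPLE-like sketch in plain C, not GCC's actual output; the real form can be inspected with `-fdump-tree-gimple`.

```c
/* Source form: nested expression inside a structured loop. */
int sum_source(const int *v, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s = s + v[i] * 2 + 1;
    return s;
}

/* Hand-lowered form: at most one operator per statement,
   and the for loop is replaced by an if and gotos. */
int sum_lowered(const int *v, int n) {
    int s = 0, i = 0, t1, t2, t3;
loop:
    if (i >= n) goto done;
    t1 = v[i];
    t2 = t1 * 2;
    t3 = s + t2;
    s = t3 + 1;
    i = i + 1;
    goto loop;
done:
    return s;
}
```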
39. Static Single Assignment (SSA)
Every variable is assigned only once
It can be used as a read-only value multiple times
Where control flow merges (e.g. after an if statement), a PHI function selects the value coming from the path actually taken
GCC performs over 20 optimizations on the SSA form
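C has no PHI node, but the shape of SSA can be sketched in plain C: every name is assigned exactly once, and the merge after the if becomes `x3 = PHI(x1, x2)`, modelled here with `?:`. The function names are hypothetical. Unlike real SSA, both incoming values are computed eagerly, which is only acceptable because neither has side effects.

```c
/* Conventional form: x is assigned on two different paths. */
int abs_diff(int a, int b) {
    int x;
    if (a > b)
        x = a - b;
    else
        x = b - a;
    return x;
}

/* SSA-style rewrite: const enforces the single assignment of each name. */
int abs_diff_ssa(int a, int b) {
    const int c1 = a > b;
    const int x1 = a - b;          /* value of x arriving from the "then" edge */
    const int x2 = b - a;          /* value of x arriving from the "else" edge */
    const int x3 = c1 ? x1 : x2;   /* stands in for x3 = PHI(x1, x2) */
    return x3;
}
```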
47. Inlining
Based on memory-space/time costs
Not possible when:
-fno-inline switch is used
conflicting __attribute__s
Forbidden when:
call to alloca, setjmp, or longjmp
non-local goto instruction
recursion
variadic argument list
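The attributes that steer inlining look like this in practice. The function names are hypothetical. `always_inline` functions are inlined even under `-fno-inline` (GCC reports an error if it cannot inline one), `noinline` keeps a real call, and `-Winline` warns when a function declared inline could not be inlined. The variadic function below falls into the "forbidden" category from the list above.

```c
#include <stdarg.h>

/* always_inline: expanded at every call site, even with -fno-inline. */
static inline __attribute__((always_inline)) int square(int x) { return x * x; }

/* noinline: keeps a real call, e.g. so the function stays visible in a profile. */
__attribute__((noinline)) int cube(int x) { return x * square(x); }

/* Variadic and uses va_start, so it is not an inlining candidate. */
int sum_ints(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int s = 0;
    for (int i = 0; i < count; i++)
        s += va_arg(ap, int);
    va_end(ap);
    return s;
}
```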
48. Vectorization
One of GCC's parallelism models (SIMD)
The compiler uses SSE, SSE2, SSE3, … to make programs faster
Enabled by -O3 or -ftree-vectorize
There are more than 25 cases where vectorization can be done
e.g. backward access, multidimensional arrays, conditions, nested loops, …
With the -ftree-vectorizer-verbose=N switch, vectorization can be debugged (newer GCC releases replace it with -fopt-info-vec)
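Two loops of the kind the vectorizer targets. The function names are hypothetical. Compiling with something like `gcc -O3 -fopt-info-vec -c file.c` asks GCC to report which loops it vectorized; whether these are vectorized depends on the target, options, and GCC version, so treat this as a sketch rather than a guarantee.

```c
#include <stddef.h>

/* Independent iterations, unit stride, and restrict (promising no overlap)
   make this a textbook candidate for SIMD code. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Backward access: reads src from the end. GCC may still vectorize this
   by loading a vector and reversing its lanes. */
void reverse_copy(size_t n, const int *restrict src, int *restrict dst) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
}
```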
57. RTL Objects
There are multiple types of RTL objects:
Expressions
Integers, wide integers
Strings
Vectors
58. RTL Classes
There are several categories of RTL expressions:
RTX_UNARY: NOT, SQRT, ABS
RTX_OBJ: MEM, REG, VALUE
RTX_COMPARE: GE, LT
RTX_COMM_COMPARE: EQ, NE
RTX_COMM_ARITH: PLUS, MULT
…
59. Register allocation
The task: ensure that machine resources (registers) are used optimally.
There are two types of register allocators:
Local Register Allocator
Global Register Allocator
Since GCC 4.8, the messy reload pass (reload.c) has been replaced by LRA, the Local Register Allocator (initially on x86)
60. Register allocation
The problem: interference graph coloring
Colors == registers
Assign registers (colors) to temporaries
Finding a k-coloring of a graph is NP-complete, so GCC uses a heuristic method
In case of failure, some variables are stored in memory (spilled)
Two variables can share a register only when they are never live at the same point of the program
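A minimal sketch of the coloring idea, using a simple greedy heuristic that is much simpler than GCC's allocator (IRA). `adj[u][v]` marks temporaries that are live at the same time; each temporary receives the lowest register not already taken by a colored neighbor, or is spilled (color -1) when all k registers are taken. The function name and fixed sizes are hypothetical.

```c
#include <stdbool.h>

#define MAXV 8

/* Requires n <= MAXV and k <= MAXV. */
void color_graph(int n, bool adj[MAXV][MAXV], int k, int color[MAXV]) {
    for (int v = 0; v < n; v++) {
        bool taken[MAXV] = {false};
        /* Registers already used by interfering, colored temporaries. */
        for (int u = 0; u < v; u++)
            if (adj[v][u] && color[u] >= 0)
                taken[color[u]] = true;
        color[v] = -1;                      /* spill unless a free register is found */
        for (int r = 0; r < k; r++)
            if (!taken[r]) {
                color[v] = r;
                break;
            }
    }
}
```

With three mutually interfering temporaries and only two registers, the third one is spilled; with three registers, all of them fit.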
61. Register allocation - example
```
Instructions      Live variables
                  a
b = a + 2
                  b, a
c = b * b
                  a, c
b = c + 1
                  a, b
return a * b
```
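The live sets in the example come from a backward dataflow pass: a variable is live before an instruction if the instruction reads it, or if it is live afterwards and the instruction does not overwrite it. A minimal sketch for straight-line code, with the example encoded as bit sets (the encoding and names are hypothetical):

```c
/* Variables of the example as bits. */
enum { VAR_A = 1 << 0, VAR_B = 1 << 1, VAR_C = 1 << 2 };

typedef struct {
    unsigned def;  /* variables written by the instruction */
    unsigned use;  /* variables read by the instruction */
} Insn;

/* b = a + 2; c = b * b; b = c + 1; return a * b; */
static const Insn example[] = {
    { VAR_B, VAR_A },          /* b = a + 2    */
    { VAR_C, VAR_B },          /* c = b * b    */
    { VAR_B, VAR_C },          /* b = c + 1    */
    { 0,     VAR_A | VAR_B },  /* return a * b */
};

/* live_in(i) = use(i) | (live_out(i) & ~def(i)), with live_out(i) = live_in(i + 1).
   live[i] receives the set live just before instruction i; live[n] is empty. */
void liveness(const Insn *code, int n, unsigned *live) {
    live[n] = 0;
    for (int i = n - 1; i >= 0; i--)
        live[i] = code[i].use | (live[i + 1] & ~code[i].def);
}
```

Since b and c are never live at the same time, they do not interfere and can share one register; a interferes with both, so two registers suffice for this example.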
62. We can mess with the compiler
```c
register int variable asm("rbx");
```
However… this is not a good idea (unless you have a very good reason)
The variable can still be optimized
The register can still be used by other variables
63. Instruction scheduling
Goal: minimize the length of the critical path
Goal: maximize parallelism opportunities
How does it work?
1. Build the data dependence graph
2. Calculate priorities for each instruction
3. Iteratively schedule ready instructions
Used before and after register allocation
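The three steps above describe list scheduling. Below is a minimal sketch for a hypothetical single-issue machine, far simpler than GCC's scheduler. The dependence graph (step 1) is given as input, with instruction indices assumed to already be in a valid order; priorities (step 2) are latency-weighted longest paths to the end; each cycle (step 3) the ready instruction with the highest priority is issued.

```c
#define MAXI 8

/* dep[i][j] != 0: instruction j uses the result of i.
   latency[i]: cycles before i's result can be used.
   Writes the issue order to order[]; returns the cycle in which the
   last result becomes available. */
int list_schedule(int n, int dep[MAXI][MAXI], int latency[MAXI], int order[MAXI]) {
    int prio[MAXI], issue[MAXI];

    /* Step 2: how much of the critical path starts at each instruction. */
    for (int i = n - 1; i >= 0; i--) {
        int longest = 0;
        for (int j = i + 1; j < n; j++)
            if (dep[i][j] && prio[j] > longest)
                longest = prio[j];
        prio[i] = latency[i] + longest;
        issue[i] = -1;                       /* not scheduled yet */
    }

    /* Step 3: every cycle, issue the ready instruction with the highest priority. */
    int scheduled = 0, finish = 0;
    for (int cycle = 0; scheduled < n; cycle++) {
        int pick = -1;
        for (int i = 0; i < n; i++) {
            if (issue[i] >= 0)
                continue;
            int ready = 1;                   /* all operands must be available */
            for (int p = 0; p < n; p++)
                if (dep[p][i] && (issue[p] < 0 || issue[p] + latency[p] > cycle)) {
                    ready = 0;
                    break;
                }
            if (ready && (pick < 0 || prio[i] > prio[pick]))
                pick = i;
        }
        if (pick >= 0) {                     /* otherwise this cycle is a stall */
            issue[pick] = cycle;
            order[scheduled++] = pick;
            if (cycle + latency[pick] > finish)
                finish = cycle + latency[pick];
        }
    }
    return finish;
}
```

For four instructions where 1 depends on a 3-cycle instruction 0 and 2 and 3 are independent, issuing in program order would stall 1 for two cycles and finish at cycle 6; the scheduler fills those slots with 2 and 3 and finishes at cycle 4.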
64. Instruction scheduling
Works well for unrelated expressions
```c
a = x + 1;
b = y + 2;
c = z + 3;
```
```
IF RF EX ME WB
   IF RF EX ME WB
      IF RF EX ME WB
```
Software pipelining
65. Instruction selection
GCC picks instructions from the set available for the given target
Each instruction has a cost
The addressing mode is also selected
68. Rematerialization
Recompute the value of a particular variable multiple times
Lower register pressure, more CPU work
Should happen only when the computation is cheaper than a load
The expression must not have side effects
Experimental results show a 1-6% execution performance improvement
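A source-level sketch of what rematerialization does to register pressure. The compiler performs this on RTL, not on C; `busy_work` is a hypothetical stand-in for code that needs many registers. Because `x` has to stay live across the call anyway, recomputing `t` from it afterwards frees the register `t` would otherwise occupy.

```c
/* Hypothetical stand-in for code that needs many registers. */
static int busy_work(int v) { return v * 3 - 1; }

/* Before: t and x are both live across busy_work, occupying two registers. */
int keep_live(int x) {
    int t = x * 2 + 1;
    int r = busy_work(x);
    return r + t + x;
}

/* After rematerialization: only x is live across the call, and t is recomputed
   afterwards. This pays off because x * 2 + 1 is cheap and has no side
   effects; otherwise spilling and reloading t would be the better choice. */
int rematerialized(int x) {
    int r = busy_work(x);
    int t = x * 2 + 1;
    return r + t + x;
}
```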
69. Common Subexpression Elimination
Finds subexpressions that occur in multiple places
Decides whether an additional temporary would make the program faster
Example:
```c
k = i + j + 10;
r = i + j + 30;
```
Becomes:
```asm
movl  8(%rsp), %esi
addl  12(%rsp), %esi
xorl  %eax, %eax
leal  30(%rsi), %edx
addl  $10, %esi
```
CSE also works with function calls (e.g. to functions marked pure or const)
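The same transformation written back as C, which is roughly what the assembly for the example computes: `%esi` holds `i + j` once and both results are derived from it. The function names are hypothetical; GCC performs CSE on its internal representation, not on source.

```c
/* Before CSE: i + j is evaluated twice. */
void before_cse(int i, int j, int *k, int *r) {
    *k = i + j + 10;
    *r = i + j + 30;
}

/* After CSE: the shared subexpression is computed once into a temporary. */
void after_cse(int i, int j, int *k, int *r) {
    int t = i + j;
    *k = t + 10;
    *r = t + 30;
}
```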
70. Loop-invariant code motion
Move computations that do not depend on the loop outside its body
Benefits: fewer calculations & constants in registers
Example:
```c
for (int i = 0; i < n; i++) {
    x = y + z;
    a[i] = 6 * i + x * x;
}
```
Becomes:
```c
x = y + z;
t1 = x * x;
for (int i = 0; i < n; i++) {
    a[i] = 6 * i + t1;
}
```
Can introduce high register pressure → rematerialization
73. Link time optimizations
Without LTO, GCC optimizations are constrained to a single translation unit
When LTO is enabled (-flto), object files include GIMPLE trees
Local optimizations are applied globally:
Dead code elimination
Constant propagation
…
74. GCC test suites
GCC is tested by over 19k tests
Test suites employ DejaGnu, Tcl, and Expect
Each test is a C file with special comments
Test results are:
PASS: the test passed as expected
XPASS: the test unexpectedly passed
FAIL: the test unexpectedly failed
XFAIL: the test failed as expected
ERROR: the test suite detected an error
WARNING: the test suite detected a possible problem
UNSUPPORTED: the test is not supported on this platform
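Roughly what such a test file looks like, modelled on GCC's gcc.dg tests. The function and variable names are hypothetical. `dg-do`, `dg-options`, and `dg-warning` are DejaGnu directives used by GCC's testsuite. Here the test PASSes if GCC emits a warning matching "unused variable" on the line carrying the `dg-warning` comment, and FAILs otherwise.

```c
/* { dg-do compile } */
/* { dg-options "-Wall" } */

int f (int x)
{
  int unused; /* { dg-warning "unused variable" } */
  return x;
}
```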
79. Auxiliary tools
Tools every developer should be aware of…
nm - examines symbols in object files
objdump - displays information from object files
c++filt - demangles C++ symbols
addr2line - converts addresses to file names and line numbers
…, see binutils
80. Bonus slide
Which came first, the chicken or the egg?
The first compilers were written in… assembly
It was challenging because of poor hardware resources
It is believed that the first compiler was A-0, created by Grace Hopper
The first complete compiler - FORTRAN, IBM, 1957
The first multi-architecture compiler - COBOL, 1960