Phases of the Compiler - Systems Programming
UNIT III: COMPILERS
3.2 PHASES OF THE COMPILER
A compiler is software that accepts a program written in a high-level language and produces its machine-language equivalent. Compilation takes place in several phases, which are shown below.
[Flowchart: Source Program → Lexical Analyzer → Syntax Analyzer → Intermediate Code Generation & Semantic Analyzer → Optimization (optional) → Code Generation → Machine Code; the Symbol Table is consulted throughout.]
Fig 1. The Compilation Process
Lexical Analysis Phase:
This is the first phase of a compiler; it is also called the scanning phase. The compiler scans the source code from left to right, character by character, and groups these characters into tokens. Each token represents a logically cohesive sequence of characters, such as a variable, a keyword, or a multi-character operator (e.g., :=, >=, ==). The main functions of this phase are summarised below:
(i) Identify the lexical units in a source statement.
(ii) Classify units into different lexical classes (e.g., constants, reserved words) and enter them in different tables.
(iii) Build a descriptor (called a token) for each lexical unit.
(iv) Ignore comments in the source program.
(v) Detect tokens which are not a part of the language.
The output of the lexical analysis phase goes to the syntax phase.
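The functions above can be sketched as a small scanner. This is a minimal illustration, not a real compiler front end: the token classes, patterns, and keyword set are assumptions chosen for the example.

```python
import re

# Token classes and patterns are illustrative; COMMENT must precede OP so that
# "//" is recognised as a comment rather than two division operators.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("COMMENT", r"//[^\n]*"),
    ("ASSIGN",  r":=|=="),
    ("RELOP",   r">=|<=|[<>]"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

KEYWORDS = {"if", "else", "while", "int"}   # illustrative reserved words

def tokenize(source):
    """Scan left to right and group characters into (class, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:                        # a character outside the language
            raise SyntaxError(f"illegal character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"                 # reserved words get their own class
        if kind not in ("SKIP", "COMMENT"):  # whitespace and comments are ignored
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

# Example: tokenize("x := y + 10") yields
# [("ID","x"), ("ASSIGN",":="), ("ID","y"), ("OP","+"), ("NUMBER","10")]
```

Note how the sketch covers each listed function: classification into lexical classes, building a (class, lexeme) token descriptor, ignoring comments, and reporting characters that belong to no token class.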
mukeshtekwani@hotmail.com
Page 1 of 6
Prof. Mukesh N. Tekwani
Syntax Analysis Phase:
This phase is also called the parsing phase. The following operations are performed in
this phase:
(i) Obtain tokens from the lexical analyzer.
(ii) Check whether the expression is syntactically correct.
(iii) Report syntax errors, if any.
(iv) Determine the statement class, i.e., is it an assignment statement, a conditional statement (if statement), etc.
(v) Group tokens into statements.
(vi) Construct hierarchical structures called parse trees. These parse trees represent the syntactic structure of the program.
Consider the statement X = Y + Z. It is represented by the parse tree as shown below:
          =
         / \
        X   +
           / \
          Y   Z
Fig 2. A Parse Tree
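The tree of Fig 2 can be represented directly as a data structure. A minimal sketch, in which each interior node is an (operator, left, right) tuple and leaves are identifier names (the representation is an assumption for illustration):

```python
# The parse tree of Fig 2 as nested tuples: (operator, left, right).
tree = ("=", "X", ("+", "Y", "Z"))

def infix(node):
    """Reconstruct the source text from the tree by an in-order walk."""
    if isinstance(node, str):        # a leaf: an identifier name
        return node
    op, left, right = node
    return f"{infix(left)} {op} {infix(right)}"

# infix(tree) gives back the original statement text "X = Y + Z"
```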
Intermediate Code Generation and Semantic Analysis Phase:
This phase produces a program in an intermediate language, at a level between the source code and the machine code; intermediate languages are sometimes assembly languages. Generating an intermediate code offers the following advantages:
(i) Flexibility: a single lexical analyzer/parser can be used to generate code for several different machines by providing separate back-ends that translate a common intermediate language to a machine-specific assembly language.
(ii) Intermediate code is used in interpretation: the intermediate code is executed directly rather than being translated into binary code and stored.
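To make the idea concrete, here is a sketch that turns the parse tree of X = Y + Z into three-address intermediate code. The tuple tree shape and the t1, t2, ... temporary-naming scheme are assumptions for the example, not a prescribed IR:

```python
counter = 0

def new_temp():
    """Generate a fresh temporary name t1, t2, ... (naming scheme assumed)."""
    global counter
    counter += 1
    return f"t{counter}"

def gen(node, code):
    """Return the 'address' holding node's value, appending instructions to code."""
    if isinstance(node, str):
        return node                          # identifiers are their own address
    op, left, right = node
    if op == "=":
        addr = gen(right, code)
        code.append(f"{left} = {addr}")
        return left
    l, r = gen(left, code), gen(right, code)
    t = new_temp()                           # each operation gets a temporary
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
gen(("=", "X", ("+", "Y", "Z")), code)
# code is now ["t1 = Y + Z", "X = t1"]
```

A separate back-end could then map each such three-address instruction to the assembly language of a particular machine, which is exactly the flexibility described in point (i).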
Semantic Phase:
The semantic phase has the following functions:
(i) Check phrases for semantic errors, e.g., type checking. In a C program, int x = 10.5 should be detected as a semantic error.
(ii) The semantic analyzer keeps track of the types of identifiers and expressions, to verify their consistent usage.
(iii) The semantic analyzer maintains the symbol table. The symbol table contains information about each identifier in a program, such as the identifier's type, the scope of the identifier, etc.
(iv) Using the symbol table, the semantic analyzer enforces a large number of rules, such as:
a. every identifier is declared before it is used;
b. no identifier is used in an inappropriate context (e.g., adding a string to an integer);
c. subroutine or function calls have the correct number and types of arguments;
d. every function contains at least one statement that specifies a return value.
These rules are checked at compile time, hence they are called static semantics.
(v) Certain semantic rules are checked at run time; these are called dynamic semantics. Examples of these are:
a. array subscript expressions lie within the bounds of the array;
b. variables are never used in an expression unless they have been given a value.
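One static check and one dynamic check from the lists above can be sketched as follows. The function names and error messages are illustrative, not part of any real compiler's API:

```python
def check_declaration(declared_type, value):
    """Static semantics: reject e.g. int x = 10.5 at compile time."""
    if declared_type == "int" and isinstance(value, float) and value != int(value):
        return "semantic error: fractional value assigned to int"
    return "ok"

def checked_index(array, i):
    """Dynamic semantics: verify the subscript lies within the array bounds."""
    if not 0 <= i < len(array):
        raise IndexError(f"subscript {i} out of bounds 0..{len(array) - 1}")
    return array[i]

# check_declaration("int", 10.5) reports a semantic error at compile time;
# checked_index([1, 2, 3], 5) raises an error only when the program runs.
```

The contrast is the point: the first check needs only the declaration itself, so it can run at compile time; the second depends on a value computed during execution, so it must run at run time.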
Symbol Table:
The symbol table is built and maintained by the semantic analysis phase. It maps each identifier to the information known about it. This information includes the identifier's type (int, char, float, etc.), internal structure (if any), and scope (the portion of the program in which it is valid). Using the symbol table, the semantic analyzer enforces a large variety of rules, e.g., it ensures that:
(i) a variable is declared before it is used;
(ii) no identifier is used in an inappropriate context (e.g., adding a string to an integer);
(iii) subroutine calls provide the correct number and types of arguments;
(iv) labels on the arms of a case statement are distinct constants;
(v) every function contains at least one statement that specifies a return value.
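A symbol table with nested scopes can be sketched as a stack of dictionaries. The field names (type, depth) follow the description above but are assumptions of this sketch:

```python
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]               # a stack of scopes; index 0 is global

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()                # names in the closed scope disappear

    def declare(self, name, typ):
        self.scopes[-1][name] = {"type": typ, "depth": len(self.scopes) - 1}

    def lookup(self, name):
        """Search from the innermost scope outward; None means 'used before declared'."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

st = SymbolTable()
st.declare("x", "int")                   # global x
st.enter_scope()
st.declare("y", "float")                 # y is valid only in the inner scope
# st.lookup("x") finds the global int; st.lookup("z") is None (undeclared)
```

Rule (i) above then becomes a one-line check: an identifier whose lookup returns None is being used before it is declared.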
Code Generation Phase:
The code generated depends upon the architecture of the target machine. Knowledge of the instructions and addressing modes of the target computer is necessary for the code generation process.
One of the important aspects of code generation is the efficient utilization of machine resources. A number of assumptions may be made, such as:
a) the instruction types in the target machine,
b) the commutative property of operators in an expression,
c) proper usage of syntax for syntax-directed translation.
Code Optimization Phase:
Optimization improves programs by making them smaller or faster, or both. The goal of code optimization is to translate a program into a new version that computes the same result more efficiently, i.e., taking less time, memory, or other system resources. For example, the C compiler Turbo C permits the programmer to optimize code for speed or for size.
Code optimization is achieved in two ways:
a) rearranging computations in a program to make them execute more efficiently,
b) eliminating redundancies in a program.
Code optimization should not change the meaning of the program. Code optimization tries to improve the program; the underlying algorithm is not affected. Thus, code optimization cannot replace an inefficient algorithm with a more efficient one. Code optimization also does not exploit the instruction set of a particular architecture. Thus, code optimization is independent of the target machine and the programming language.
An optimizing compiler is shown below. Note the presence of the optimization phase.
[Block diagram: Source Program → Front End → Optimization Phase → Back End → Target Program]
Optimizing Transformations:
An optimizing transformation is a rule for rewriting a segment of a program to improve its execution efficiency without affecting its meaning. The two types of optimizing transformations are:
a) local transformations (applied over small segments of a program), and
b) global transformations (applied over larger segments consisting of loops or function bodies).
Some of the optimizing transformations used by compilers are:
i. Compile-time evaluation: Certain actions specified in a program can be performed during the compilation stage itself. This eliminates the need to perform them during the execution stage. The main optimization of this type is constant folding: if all the operands in an expression are constants, the operation can be performed at compile time itself. The result of the operation, itself a constant, then replaces the original expression. E.g., an assignment of the type a := 2.718/2 can be replaced by a := 1.359. This eliminates a division operation at run time.
ii. Dead code elimination:
Code which can be omitted from a program without affecting its results is called dead code. If a variable is assigned a value which is never used subsequently in the program, then that assignment is dead code.
E.g., j = 30;
Turbo C can point out such instances by way of a warning message.
iii. Elimination of common sub-expressions:
Expressions which yield the same value are called common sub-expressions or equivalent expressions. Consider the following code segment before and after the transformation:
Before:
a := b * c;
:
x := b * c + 3.5;
After:
t := b * c;
a := t;
:
x := t + 3.5;
Here the second evaluation of b * c was eliminated by storing the result in a variable t.
iv. Frequency reduction:
Execution time of a program can be reduced by moving code from a part of a program which is executed very frequently to another part of the program which is executed fewer times. E.g., the transformation of loop optimization involves moving loop-invariant code out of a loop.
v. Strength reduction:
This optimization replaces an operation with a more efficient operation, or with a series of operations, that yields the same result in fewer machine clock cycles. For example, multiplication by a power of two is replaced by a left shift, which executes faster on most machines.
a = b * 4; becomes
a = b + b + b + b;
or
a = b << 2;
Similarly, division by a power of two is an expensive operation and can be replaced with a right shift:
c = d / 2; becomes
c = d >> 1;
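Three of the transformations above can be sketched on a tiny instruction form. The (dest, op, arg1, arg2) tuple shape is an assumed mini-IR for illustration, not the representation of any particular compiler:

```python
def constant_fold(inst):
    """Compile-time evaluation: fold an operation whose operands are constants."""
    dest, op, a, b = inst
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if op == "+":
            val = a + b
        elif op == "*":
            val = a * b
        elif op == "/":
            val = a / b
        else:
            return inst
        return (dest, "const", val, None)
    return inst

def strength_reduce(inst):
    """Replace multiplication by 4 with a left shift by 2 (same result, cheaper)."""
    dest, op, a, b = inst
    if op == "*" and b == 4:
        return (dest, "<<", a, 2)
    return inst

def eliminate_dead(code, live):
    """Dead code elimination: drop assignments whose destination is never used."""
    return [inst for inst in code if inst[0] in live]

folded = constant_fold(("a", "/", 2.718, 2))    # a := 2.718/2 becomes a := 1.359
reduced = strength_reduce(("a", "*", "b", 4))   # a = b * 4 becomes a = b << 2
```

Each function leaves instructions it cannot improve unchanged, which matches the requirement that a transformation must never alter the meaning of the program.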
Local and Global optimization:
a) Local transformations are applied over small segments of a program. This is a
preparatory phase for global optimization. It can be performed by the front end while
converting a source code into the IR. Local optimization provides limited benefits at
a low cost. The scope of this type of optimization is a basic block which is a
sequential set of instructions. Loop optimization cannot be performed by local
optimization.
b) Global transformations are applied over larger segments consisting of loops or function bodies. They require more analysis effort to determine the feasibility of an optimization. The techniques of control-flow analysis and data-flow analysis are used to achieve global optimization.
Passes of a compiler:
Several phases of compilation are usually grouped into one pass consisting of reading an
input file and writing an output file.
For example, the following stages can be grouped into one pass:
i. lexical analysis,
ii. syntax analysis,
iii. semantic analysis, and
iv. intermediate code generation.
It is desirable to have relatively few passes, since it takes time to read and write
intermediate files. On the other hand, if we group several phases into one pass, we may be
forced to keep the entire program in memory.
Difference between a phase and a pass of a compilation
Compilation proceeds through a set of well-defined phases; these are the lexical analysis,
syntax analysis, semantic analysis, intermediate code generation, machine independent code
optimization, code generation, and machine dependent code generation. Each phase
discovers information of use to later phases, or transforms the program into a form that is
more useful to the subsequent phases.
A pass is a phase or a set of phases that is serialized with respect to the rest of the
compilation. A pass does not start until the previous phases have been completed and it
finishes before any subsequent phases start.
REVIEW QUESTIONS
1. List the principal phases of compilation and describe the work performed by each phase.
2. Explain the syntax analysis phase. What operations are carried out in this phase?
3. Explain the semantic analysis phase. What operations are carried out in this phase?
4. What is the difference between a phase and a pass of a compilation?
5. What is the purpose of the compiler's symbol table?
6. What is the difference between static and dynamic semantics?
7. What is meant by the term "compiler pass"? How does it differ from a phase of compilation?