Compiler Construction
Phases of a compiler
Analysis and synthesis phases
-------------------
-> Compilation Issues
-> Phases of compilation
-> Structure of compiler
-> Code Analysis
2. » Phases of a compiler
» Analysis and synthesis phases
Jeena Thomas, Asst Professor, CSE, SJCET Palai
3. » A compiler is a kind of translator.
» TRANSLATOR: software that accepts text in a certain
language (the SOURCE LANGUAGE) and produces text in
another language (the TARGET/OBJECT LANGUAGE),
preserving the meaning of the text.
4. » A translator is a generalization of a compiler.
» When the object language is a low-level language,
such a translator is called a compiler.
» This conversion process is essential for the hardware
to interpret and carry out the semantics of the input
program.
» As an important part of this translation process, the
compiler reports to its user the presence of errors in
the source program.
6. » A compiler is a program that reads a program written
in a source language and translates it into an equivalent
program in a target language.
7. » Source code
» a=(b+c)*(b+c)*2
» Target code
MOV b,R2
ADD R2,c
MUL R2,R2
MUL R2, #2.0
MOV R2,a
8. » FORTRAN compilers of the late 1950s
» 18 person-years to build
9. » Writing a compiler gives a student experience with large-
scale applications development. Your compiler program may
be the largest program you write as a student. Experience
working with really big data structures and complex
interactions between algorithms will help you out on your
next big programming project.
» Compiler writing is one of the shining triumphs of CS theory.
It demonstrates the value of theory over the impulse to just
"hack up" a solution.
» Compiler writing is a basic element of programming
language research. Many language researchers write
compilers for the languages they design.
» Many applications have properties similar to one or more
phases of a compiler, so compiler expertise and tools can
help an application programmer working on projects other
than compilers.
10. » Throughout the 1950s, compilers were considered
difficult programs to write.
» The first Fortran compiler took 18 staff-years to
implement.
» Good implementation languages, programming
environments, and software tools have since been
developed, along with systematic techniques for
handling many of the important tasks that occur
during compilation.
» With these advances, a substantial compiler can be
implemented even as a student project in a one-
semester compiler-design course.
11. » Compiler technology is more broadly applicable and
has been employed in rather unexpected areas:
» Text-formatting languages, preprocessor
packages
» Silicon compiler for the creation of VLSI circuits
» Command languages of OS
» Query languages of Database systems
12. » Maintaining the hierarchy of operations
- to determine the correct order of evaluation of
expressions.
» Maintaining data type integrity
- each part of a complex expression can be of a
different type.
» The compiler needs prior knowledge about the nature of
user-defined data types
- struct, enum, union.
» Appropriate storage mappings for data structures
- allocation of memory for data.
13. » The compiler must resolve each occurrence of a
variable name in a program to determine the name
space to which the referenced variable belongs
(symbol table).
» The compiler should have facilities to handle different
control structures such as "if-then-else", "for", and
"while". For loops, it should generate the code that
increments the loop variable and terminates the
loop.
14. » Because the process of compilation is highly complex,
it is split into a series of subprocesses called phases.
» A phase is a logically cohesive operation that takes
as input one representation of the source program and
produces as output another representation.
» The activities of compilation are split into two parts:
1) Analysis part
2) Synthesis part
16. » Analysis of the source program
» is done by the front end of the compiler.
» It determines the meaning of the source string.
» Synthesis of the target program
» is done by the back end of the compiler.
» An equivalent target string is constructed from
the output given by the front end of the compiler.
17. » In compiling, analysis has three phases:
» Linear analysis: stream of characters read from
left-to-right and grouped into tokens; known as
lexical analysis or scanning
» Hierarchical analysis: tokens grouped
hierarchically with collective meaning; known
as parsing or syntax analysis
» Semantic analysis: check if the program
components fit together meaningfully
18. » Optimization of code
» Allocation of memory
» Generation of code
21. » Lexical analysis performs the linear analysis on the
source program.
» It reads the stream of characters making up the
source program from left to right and groups them
into tokens.
» A token is a sequence of characters that has a
collective meaning.
» For each token identified, this phase also
determines the category of the token (identifier,
constant, or reserved word) and an attribute that
identifies the symbol's position in the symbol table.
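The grouping of characters into tokens can be sketched with a small regular-expression scanner. This is only an illustration: the category names, the patterns in `TOKEN_SPEC`, and the `tokenize` function are assumptions made for this example, and the numbering of attributes (#1, #2, ...) follows the order of first appearance.

```python
import re

# Hypothetical token categories and patterns, chosen for this sketch.
TOKEN_SPEC = [
    ("CONSTANT",   r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[=+\-*/()\[\]]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan left to right; return (category, lexeme, attribute) triples."""
    symbol_table = {}   # lexeme -> position for identifiers and constants
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # A lexical error: no token starts with this character.
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind in ("IDENTIFIER", "CONSTANT"):
            # Reuse the existing position if the lexeme was seen before.
            attribute = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
        else:
            attribute = None
        tokens.append((kind, lexeme, attribute))
    return tokens, symbol_table

tokens, table = tokenize("a=(b+c)*(b+c)*2")
```

For the statement `a=(b+c)*(b+c)*2`, the scanner yields 15 tokens and assigns positions 1 to 4 to `a`, `b`, `c`, and `2`.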
23. » Identifiers are names of variables, constants,
functions, data types, etc.
» The symbol table stores the information associated
with identifiers.
» The information stored can differ between types of
identifiers.
» For variables: name, type, address, size (for
arrays), etc.
» For functions: name, type of return value,
parameters, address, etc.
24. » Consider the following statement:
» a=(b+c)*(b+c)*2    (1)
25. Symbol   Category     Attribute
    a        Identifier   #1
    =        Operator     Assignment (1)
    b        Identifier   #2
    +        Operator     Arithmetic (1)
    c        Identifier   #3
    *        Operator     Arithmetic (2)
    (        Operator     Open parenthesis (1)
    )        Operator     Closed parenthesis (1)
    2        Constant     #4
26. » This phase (syntax analysis) performs hierarchical
analysis on the source program.
» Here, the tokens are grouped into hierarchically
nested collections with collective meaning, called
expressions or statements.
» It determines the structure of the source program.
» This structure is checked against the grammar (syntax)
of the language.
» The grammatical phrases are represented in the
form of a parse tree.
27. » A parse tree describes the syntactic structure of
the input.
» The terminal nodes represent the tokens and the
interior nodes represent the expressions.
28. » Syntactic structures are also represented using
syntax trees.
» A syntax tree is a compressed representation of
the parse tree, in which the operators appear as
interior nodes and the operands of each operator
appear as its children.
29. » A syntax tree is a compressed representation of a
parse tree.
» An interior node of a syntax tree represents an
operator, whereas an interior node of a parse
tree represents an expression.
» A leaf node of a syntax tree represents an
operand, whereas a leaf node of a parse tree
represents a token.
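To make the idea of a syntax tree concrete, here is a minimal recursive-descent sketch that turns an assignment statement into nested tuples, with operators as interior nodes and operands as leaves. The grammar, the function name `parse_assignment`, and the tuple layout are assumptions made for this example, not part of the slides.

```python
import re

def parse_assignment(source):
    """Parse `name = expression` into nested tuples (operator, left, right)."""
    # Characters outside this pattern are silently skipped in this sketch.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+\-*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'an operand'}, found {tok!r}")
        pos += 1
        return tok

    def factor():                      # factor -> name | number | ( expr )
        if peek() == "(":
            advance("(")
            node = expr()
            advance(")")
            return node
        tok = advance()
        if not (tok[0].isalnum() or tok[0] == "_"):
            raise SyntaxError(f"expected an operand, found {tok!r}")
        return tok                     # leaf node: an operand

    def term():                        # term -> factor { (*|/) factor }
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():                        # expr -> term { (+|-) term }
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    target = advance()
    if not target.isidentifier():
        raise SyntaxError(f"cannot assign to {target!r}")
    advance("=")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

tree = parse_assignment("a=(b+c)*(b+c)*2")
```

A missing parenthesis, as in `a=(b+c`, makes the parser raise `SyntaxError`, which is the kind of error this phase reports.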
30. » Goal: to determine the meaning of the source
string.
» It checks the source program for semantic errors
and gathers type information that can be
used in subsequent phases of compilation.
» Type checking of operations is also performed
during this phase.
» Output: annotated tree
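A rough sketch of how type checking can produce an annotated tree follows. It assumes a syntax tree of nested tuples, only two types ("int" and "real"), and a hypothetical `inttoreal` conversion node; the `types` dictionary stands in for the symbol table.

```python
def annotate(node, types):
    """Return (annotated_node, type) for a nested-tuple syntax tree."""
    if isinstance(node, str):                      # leaf: constant or name
        if node.isdigit():
            return ("const", node, "int"), "int"
        if node not in types:
            raise NameError(f"undeclared identifier {node!r}")
        return ("id", node, types[node]), types[node]

    op, left, right = node
    l_node, l_type = annotate(left, types)
    r_node, r_type = annotate(right, types)

    if op == "=":
        # The right-hand side must fit the type of the assignment target.
        if l_type == "int" and r_type == "real":
            raise TypeError("cannot assign a real value to an int variable")
        if l_type == "real" and r_type == "int":
            r_node = ("inttoreal", r_node, "real")
        return (op, l_node, r_node, l_type), l_type

    if l_type != r_type:
        # Mixed int/real operands: widen the int operand to real.
        if l_type == "int":
            l_node = ("inttoreal", l_node, "real")
        else:
            r_node = ("inttoreal", r_node, "real")
        result = "real"
    else:
        result = l_type
    return (op, l_node, r_node, result), result

tree = ("=", "a", ("*", ("*", ("+", "b", "c"), ("+", "b", "c")), "2"))
annotated, result_type = annotate(tree, {"a": "real", "b": "real", "c": "real"})
```

With `a`, `b`, and `c` declared real, the constant `2` is wrapped in an `inttoreal` node, which is consistent with the `#2.0` operand in the target code.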
31. » Intermediate code generation is a part of the
synthesis process of the compiler.
» The intermediate code is a representation of the
program for an abstract machine.
» Using the intermediate code, optimization and
code generation can be performed.
32. » Intermediate code should be easy to generate from
the semantic representation of the source program.
» It should be easy to translate the intermediate
code to target language.
» It should be capable of holding the values
computed during translation.
» It should maintain precedence ordering of the
source language.
» It should be capable of holding the correct number
of operands of the instruction.
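One common intermediate form with these properties is three-address code. The sketch below flattens a nested-tuple syntax tree into such code; the tree layout, the function name `to_three_address`, and the temporary names `t1`, `t2`, ... are assumptions made for this example.

```python
from itertools import count

def to_three_address(tree):
    """Flatten a nested-tuple syntax tree into three-address instructions."""
    code = []
    temps = count(1)                    # numbering for temporaries t1, t2, ...

    def emit(node):
        if isinstance(node, str):       # leaf: a name or a constant
            return node
        op, left, right = node
        l = emit(left)                  # operands are evaluated first,
        r = emit(right)                 # so precedence ordering is preserved
        if op == "=":
            code.append(f"{l} = {r}")
            return l
        t = f"t{next(temps)}"           # temporary holding the computed value
        code.append(f"{t} = {l} {op} {r}")
        return t

    emit(tree)
    return code

tree = ("=", "a", ("*", ("*", ("+", "b", "c"), ("+", "b", "c")), "2"))
tac = to_three_address(tree)
```

For the running example this produces `t1 = b + c`, `t2 = b + c`, `t3 = t1 * t2`, `t4 = t3 * 2`, `a = t4`. Each instruction has at most one operator and two operands.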
34. » The main aim of code optimization is to improve the
intermediate code so that the generated code runs
faster and/or occupies less space.
» It involves a trade-off between compilation speed
and execution speed.
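As one illustration of such an improvement, the sketch below removes a repeated computation (common subexpression elimination) from a straight-line block of three-address code. It is not taken from the slides: the tuple format `(dest, op, arg1, arg2)`, the function names, and the assumption that temporaries are named `t1`, `t2`, ... and assigned exactly once are all choices made for this example.

```python
def is_temp(name):
    """Assumption: compiler temporaries look like t1, t2, ..."""
    return name[0] == "t" and name[1:].isdigit()

def eliminate_common_subexpressions(block):
    """block: list of (dest, op, arg1, arg2); copies use op '=' and arg2 None."""
    available = {}   # (op, arg1, arg2) -> temporary already holding that value
    alias = {}       # eliminated temporary -> temporary it duplicates
    out = []
    for dest, op, a1, a2 in block:
        # Rewrite operands that refer to eliminated temporaries.
        a1, a2 = alias.get(a1, a1), alias.get(a2, a2)
        key = (op, a1, a2)
        if is_temp(dest) and op != "=" and key in available:
            alias[dest] = available[key]        # reuse the earlier result
            continue
        out.append((dest, op, a1, a2))
        # dest now has a new value: expressions reading it are stale.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2])}
        if is_temp(dest) and op != "=":
            available[key] = dest
    return out

block = [
    ("t1", "+", "b", "c"),
    ("t2", "+", "b", "c"),
    ("t3", "*", "t1", "t2"),
    ("t4", "*", "t3", "2"),
    ("a", "=", "t4", None),
]
optimized = eliminate_common_subexpressions(block)
```

The second `b + c` disappears and `t3` becomes `t1 * t1`, which corresponds to the single `ADD` followed by `MUL R2,R2` in the target code for this statement.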
36. » The main aim of code generation is to allocate
storage and generate relocatable machine code or
assembly code.
» Memory locations and registers are allocated for
variables.
» The instructions in intermediate code format are
converted into machine instructions.
37. » MOV b, R2
» ADD R2,c
» MUL R2,R2
» MUL R2, #2.0
» MOV R2, a
38. » The compiler also attempts to improve the target
code produced by the code generator by choosing
proper addressing modes, replacing slow
instructions with fast ones, and eliminating
redundant instructions.
» MUL R2, #2.0 -> SHL R2 (shift-left instruction)
39. » MOV b, R2
» ADD R2,c
» MUL R2,R2
» SHL R2
» MOV R2, a
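The replacement of a slow instruction by a faster one can be sketched as a small peephole pass over the instruction list. The `peephole` function and its pattern are assumptions made for this example; it simply mirrors the rewrite shown in the slides. Note that on real hardware a shift doubles only integer values, so a production compiler would apply this rewrite to integer operands only.

```python
import re

# Matches "MUL Rn, #2" or "MUL Rn, #2.0" and captures the register name.
MUL_BY_TWO = re.compile(r"MUL\s+(R\d+)\s*,\s*#2(?:\.0)?\s*$")

def peephole(instructions):
    """Rewrite a multiplication by two as SHL; leave everything else alone."""
    optimized = []
    for ins in instructions:
        ins = ins.strip()
        m = MUL_BY_TWO.match(ins)
        optimized.append(f"SHL {m.group(1)}" if m else ins)
    return optimized

target = ["MOV b, R2", "ADD R2, c", "MUL R2, R2", "MUL R2, #2.0", "MOV R2, a"]
improved = peephole(target)
```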
41. » A symbol table is a data structure that contains a
record for each identifier, with fields for the
attributes of the identifier.
» This data structure has facilities to
manipulate (add/delete) the elements in it.
» An identifier is detected during the lexical analysis
phase and entered into the symbol table; its type
information is usually filled in by later phases.
» This information is used during the intermediate
code generation and code generation phases of the
compiler to verify type information.
42. Address   Symbol   Attribute   Memory Location
    1         A        id, real    1000
    2         B        id, real    1100
    3         C        id, real    1110
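A symbol table with add, lookup, and delete facilities can be sketched as a dictionary of records. The class and field names, the `size` parameter, and the address allocation scheme below are assumptions made for this example; the addresses it produces are not meant to reproduce the memory locations in the table above.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    category: str   # e.g. "id"
    type: str       # e.g. "real"
    address: int    # memory location assigned to the identifier

class SymbolTable:
    def __init__(self, base_address=1000):
        self._entries = {}
        self._next_address = base_address

    def add(self, name, type_, size, category="id"):
        """Insert a new identifier; a second declaration is an error."""
        if name in self._entries:
            raise KeyError(f"identifier {name!r} declared more than once")
        entry = Entry(name, category, type_, self._next_address)
        self._entries[name] = entry
        self._next_address += size      # next identifier goes after this one
        return entry

    def lookup(self, name):
        """Return the record for name, or None if it is not declared."""
        return self._entries.get(name)

    def delete(self, name):
        self._entries.pop(name, None)

table = SymbolTable()
table.add("A", "real", size=8)
table.add("B", "real", size=8)
```

Here `table.lookup("B").address` is 1008, and declaring `A` a second time raises `KeyError`, which corresponds to the "multiply declared identifier" error.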
43. » A literal table maintains the details of the
constants and strings used in the program.
» It reduces the size of a program in memory by
allowing reuse of constants and strings.
» It is also needed by the code generator to
construct symbolic addresses for literals.
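The reuse of constants and strings can be sketched as follows. The class name `LiteralTable`, the `intern` method, and the symbolic address format `L0`, `L1`, ... are assumptions made for this example.

```python
class LiteralTable:
    def __init__(self):
        self._index = {}    # (type name, value) -> slot number
        self.values = []    # slot number -> literal value

    def intern(self, value):
        """Return a symbolic address, storing the literal only once."""
        # The type is part of the key so that 2 and 2.0 stay distinct.
        key = (type(value).__name__, value)
        if key not in self._index:
            self._index[key] = len(self.values)
            self.values.append(value)
        return f"L{self._index[key]}"

literals = LiteralTable()
first = literals.intern("hello")
second = literals.intern(2.0)
again = literals.intern("hello")    # reuses the slot of the first "hello"
```

Both occurrences of `"hello"` map to the same symbolic address, so the string is stored once.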
45. » Each phase can encounter errors.
» After detecting an error, a phase must deal with
it so that compilation can continue.
46. » 1. Lexical analyzer: misspelled tokens
» 2. Syntax analyzer: syntax errors such as a missing
parenthesis
» 3. Intermediate code generator: incompatible
operands for an operator
» 4. Code optimizer: unreachable statements
» 5. Code generator: memory restrictions on storing a
variable, for example when the value of an
integer variable exceeds its size
» 6. Symbol table: multiply declared identifiers
47. » Show the output of all the phases of the
compiler for the following line of code:
» A[index]=4+2+index
51. » Scanner generators
» They generate lexical analyzers automatically from
language specifications written as regular
expressions.
» The generator builds a finite automaton that
recognizes the regular expressions.
» Example: lex
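The kind of finite automaton a scanner generator produces can be illustrated with a small hand-written, table-driven recognizer. The states, character classes, and the `recognize` function are assumptions made for this example; a tool such as lex derives a comparable table automatically from the regular expressions.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# Transition table: state -> {character class -> next state}
TRANSITIONS = {
    "start":  {"letter": "ident", "digit": "number"},
    "ident":  {"letter": "ident", "digit": "ident"},
    "number": {"digit": "number"},
}
# Accepting states and the token category each one reports.
ACCEPTING = {"ident": "IDENTIFIER", "number": "CONSTANT"}

def recognize(text):
    """Run the automaton over text; return the token category or None."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get(state, {}).get(char_class(ch))
        if state is None:          # no transition: the input is rejected
            return None
    return ACCEPTING.get(state)
```

For example, `recognize("index")` reports an identifier, `recognize("42")` a constant, and `recognize("4x")` is rejected.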
52. » Parser generators
» They produce syntax analyzers from a context-free
grammar (CFG).
» Because the syntax analysis phase is complex and
writing a parser by hand is time-consuming, parser
generators are highly useful.
» Example: yacc
53. » Syntax-directed translation engines
» These engines have routines to traverse the
parse tree and produce intermediate code.
» The basic idea is that one or more translations
are associated with each node of parse tree.
54. » Automatic code generators
» These tools convert the intermediate language
into machine language for the target machine
using a collection of rules.
» A template-matching process is used.
» Each intermediate language statement is replaced
by its equivalent machine language statements.
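Template matching can be sketched as a table that maps each intermediate operator to a fixed instruction sequence. The templates, the tuple format `(dest, op, arg1, arg2)`, and the default register are assumptions made for this example; the instruction style follows the target code used earlier in these slides.

```python
# One template per intermediate operator; {a}, {b}, {d}, {r} are placeholders
# for the two operands, the destination, and the working register.
TEMPLATES = {
    "+": ["MOV {a}, {r}", "ADD {r}, {b}", "MOV {r}, {d}"],
    "*": ["MOV {a}, {r}", "MUL {r}, {b}", "MOV {r}, {d}"],
    "=": ["MOV {a}, {r}", "MOV {r}, {d}"],
}

def generate(tac, register="R2"):
    """Replace each three-address statement by its instruction template."""
    out = []
    for dest, op, a, b in tac:
        if op not in TEMPLATES:
            raise ValueError(f"no template for operator {op!r}")
        for line in TEMPLATES[op]:
            out.append(line.format(a=a, b=b, d=dest, r=register))
    return out

machine_code = generate([("t1", "+", "b", "c"), ("a", "=", "t1", None)])
```

Because each statement is expanded independently, the output contains redundant loads and stores (here `MOV R2, t1` followed by `MOV t1, R2`), which later optimization is expected to remove.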
55. » Data-flow engines
» They are used in code optimization.
» These tools support code optimization through
"data-flow analysis", which gathers information
about how values flow from one part of the
program to another.
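One simple data-flow analysis is liveness: working backwards, it records which names still hold a value that will be read later. The sketch below handles only a straight-line block; the tuple format `(dest, operands)` and the function name are assumptions made for this example, and a real data-flow engine would iterate over a control-flow graph with branches and loops.

```python
def live_before_each(block, live_out):
    """block: list of (dest, operands). Return the live set before each one."""
    live = set(live_out)            # names needed after the block ends
    result = []
    for dest, operands in reversed(block):
        # dest is overwritten here, so its old value is not needed above;
        # the operands are read here, so they must be live above.
        live = (live - {dest}) | set(operands)
        result.append(set(live))
    result.reverse()
    return result

block = [
    ("t1", ("b", "c")),     # t1 = b + c
    ("t3", ("t1", "t1")),   # t3 = t1 * t1
    ("t4", ("t3",)),        # t4 = t3 * 2
    ("a",  ("t4",)),        # a  = t4
]
live_sets = live_before_each(block, live_out={"a"})
```

At every point only one temporary is live, which suggests why a single register is enough for this statement.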