A computer system is made of hardware and software. The hardware understands instructions only in the form of electronic charge, which corresponds to binary language in software programming. Programs written in a high-level language are therefore fed through a series of tools and OS components to produce the desired machine language. This chain is known as the language processing system.
2. AGENDA
● LANGUAGE PROCESSING SYSTEM
● WHAT IS A COMPILER?
● WHY USE A COMPILER?
● TYPES OF COMPILER
● COMPILER DESIGN
● PROS AND CONS
3. LANGUAGE PROCESSING SYSTEM
⮚ A computer system is made of hardware and software.
⮚ The hardware understands instructions only in the form of electronic charge,
which corresponds to binary language in software programming.
⮚ Programs written in a high-level language are therefore fed through a series of
tools and OS components to produce the desired machine language.
⮚ This chain is known as the language processing system.
5. PRE-PROCESSOR
● A special piece of system software
● Performs preprocessing of high-level language source code
● First step in the language processing system
● The preprocessor mainly performs these tasks:
i) Removing comments
ii) File inclusion
iii) Macro expansion.
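The three tasks above can be sketched in a few lines of Python. This is a toy illustration, not how a production C preprocessor works: the function name `preprocess`, the in-memory `files` dictionary standing in for the file system, and the restriction to object-like `#define` macros are all assumptions made for the example. It also ignores the fact that comment markers can appear inside string literals.

```python
import re

def preprocess(source, files=None):
    """Toy preprocessor: file inclusion, comment removal, macro expansion."""
    files = files or {}  # hypothetical in-memory "file system": name -> text

    # File inclusion: replace each  #include "name"  line with that file's text.
    source = re.sub(r'^[ \t]*#include\s+"([^"]+)"[ \t]*$',
                    lambda m: files[m.group(1)], source, flags=re.M)

    # Comment removal: block comments first, then line comments.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)
    source = re.sub(r"//[^\n]*", "", source)

    # Macro expansion: record object-like #define lines, substitute elsewhere.
    macros, kept = {}, []
    for line in source.splitlines():
        definition = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if definition:
            macros[definition.group(1)] = definition.group(2).strip()
            continue
        for name, body in macros.items():
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
        kept.append(line)
    return "\n".join(kept)

headers = {"limits.h": "#define MAX 10"}
text = '#include "limits.h"\nint n = MAX; // upper bound\n/* block */int m = MAX + 1;'
print(preprocess(text, headers))
```

The output no longer contains the comments, the `#include` line, or the macro name `MAX`.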
6. ASSEMBLER
● Assembly language is neither binary nor high level. It is an intermediate
form that combines machine instructions with other data needed for
execution.
● An assembler is a program that converts assembly language into machine code.
● The output of an assembler is an object file.
● An object file combines machine instructions with the data required to
place those instructions in memory.
● Relocatable machine code can be loaded at any address and run. Addresses
within the program are expressed so that they can be adjusted when the
program is moved.
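A minimal two-pass assembler sketch shows what "machine instructions plus the data needed to place them" can look like. Everything here is hypothetical: the five-instruction set in `OPCODES`, the encoding (opcode times 256 plus operand), and the dictionary used as the object file. The `relocations` list records which instructions hold an address that must be adjusted if the code is loaded somewhere other than address 0.

```python
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3, "JMP": 4}  # hypothetical ISA

def assemble(text):
    """Translate toy assembly into a relocatable object (a plain dict)."""
    lines = [line.split(";")[0].strip() for line in text.splitlines()]
    lines = [line for line in lines if line]

    # Pass 1: record the address of every label.
    symbols, instructions = {}, []
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = len(instructions)
        else:
            instructions.append(line.split())

    # Pass 2: encode instructions and note which operands are addresses.
    code, relocations = [], []
    for address, parts in enumerate(instructions):
        operand = 0
        if len(parts) > 1:
            if parts[1] in symbols:
                operand = symbols[parts[1]]
                relocations.append(address)  # must be patched when moved
            else:
                operand = int(parts[1])
        code.append(OPCODES[parts[0]] * 256 + operand)
    return {"code": code, "relocations": relocations}

obj = assemble("""
start:
    LOAD 7      ; immediate operand, not an address
    JMP start   ; address operand, relocatable
    HALT
""")
print(obj)
```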
7. LINKER
● A linker is a utility program that takes one or more object files
generated by the compiler or assembler and combines them into a single
executable file.
● Linking can be performed at compile time (static linking) or at load time
or run time (dynamic linking).
● It is the last step in building a program.
● Linking is of two types:
i) Static Linking
ii) Dynamic Linking
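Static linking can be sketched as two passes over a list of object files. The object format below is invented for illustration: `code` is a list of words, `defines` maps a symbol to its offset inside that object, and `references` lists `(index, symbol)` pairs whose placeholder word must be replaced by the symbol's final address. Real object formats such as ELF carry far more information.

```python
def link(objects):
    """Statically link toy object files into one code image."""
    # Pass 1: give each object a base offset and build the global symbol table.
    base, bases, table = 0, [], {}
    for obj in objects:
        bases.append(base)
        for name, offset in obj["defines"].items():
            if name in table:
                raise ValueError(f"duplicate symbol: {name}")
            table[name] = base + offset
        base += len(obj["code"])

    # Pass 2: copy the code and patch every reference with its final address.
    image = []
    for obj, start in zip(objects, bases):
        image.extend(obj["code"])
        for index, name in obj["references"]:
            if name not in table:
                raise ValueError(f"undefined symbol: {name}")
            image[start + index] = table[name]
    return image, table

main_obj = {"code": [10, 0, 99], "defines": {"main": 0}, "references": [(1, "helper")]}
lib_obj = {"code": [20, 21], "defines": {"helper": 1}, "references": []}
image, table = link([main_obj, lib_obj])
print(image, table)
```

Dynamic linking defers the second pass: the reference to `helper` would stay unresolved in the executable and be patched when the program is loaded or when the symbol is first used.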
8. LOADER
● The loader is the part of the OS responsible for loading executable files into
memory and starting them. It calculates the size of the program and creates
memory space for it.
● It initializes various registers to initiate execution.
● It converts relocatable code into absolute code and tries to run the
program, resulting in a running program, an error message, or sometimes
both.
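The conversion from relocatable to absolute code can be sketched as follows. The object layout (`code`, `relocations`, `entry`) and the flat list used as memory are assumptions for the example; a real loader also maps segments, sets up the stack and handles permissions.

```python
def load(obj, base, memory):
    """Copy a relocatable toy object into memory at `base`; return the start address."""
    code = list(obj["code"])
    if base + len(code) > len(memory):
        raise MemoryError("program does not fit in memory")

    # Relocatable -> absolute: add the load address to every address word.
    for index in obj["relocations"]:
        code[index] += base

    memory[base:base + len(code)] = code
    return base + obj["entry"]  # value for the program counter register

memory = [0] * 16
program = {"code": [5, 2, 7], "relocations": [1], "entry": 0}
pc = load(program, 8, memory)
print(pc, memory[8:11])
```

The word at index 1 held the relative address 2; after loading at base 8 it holds the absolute address 10.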
9. WHAT IS A COMPILER
● A compiler is a program that transforms source code written by a developer
in a high-level programming language into low-level object code (binary
code) in machine language, which the processor understands.
● The process of converting a high-level program into machine language is
known as compilation.
● A compiler operates as a sequence of phases, each of which transforms the
source program from one intermediate representation to another.
10. WHY USE A COMPILER?
● Checks the entire program for syntax and semantic errors before it is run.
● The compiler optimizes the executable file, so it executes faster.
● Builds an internal representation of the program in memory.
● Allows the program to be machine independent if required (i.e., the program
need not be executed on the same machine it was built on).
● Helps in better understanding of language semantics and in handling language
performance issues.
● Translates the entire program into another language and helps check for syntax
errors and data types.
11. Compiler vs. Interpreter
• Output
⮚ Compiler: generates intermediate object code, which is later turned into an executable.
⮚ Interpreter: executes the program directly and does not generate a separate object file.
• Unit of translation
⮚ Compiler: reads the entire program before translating it.
⮚ Interpreter: reads a single statement at a time.
• Error reporting
⮚ Compiler: displays all errors and warnings at once; the program cannot be executed until every error is fixed.
⮚ Interpreter: displays one error at a time; the error must be fixed before the next statement is interpreted.
• Memory
⮚ Compiler: needs more memory, because object code is generated every time the program is compiled.
⮚ Interpreter: typically needs less memory, because no object code is stored.
• Typical languages
⮚ Compiler: C, C++.
⮚ Interpreter: Python, Ruby.
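The difference in error reporting can be imitated with Python's built-in `compile` and `exec`. This is only an analogy: translating the whole program in one call stands in for a compiler, and translating one statement at a time stands in for a classic statement-by-statement interpreter. The function names are made up for the example.

```python
program = ["x = 1", "y = x + 1", "z = = y"]  # the third statement is malformed

def run_whole_program(lines):
    """Compiler-like: translate everything first; run nothing if translation fails."""
    env = {}
    try:
        code = compile("\n".join(lines), "<program>", "exec")
    except SyntaxError as err:
        return env, f"syntax error on line {err.lineno}"
    exec(code, env)
    return env, None

def run_statement_by_statement(lines):
    """Interpreter-like: translate and run one statement at a time."""
    env = {}
    for number, line in enumerate(lines, start=1):
        try:
            exec(compile(line, "<statement>", "exec"), env)
        except SyntaxError:
            return env, f"syntax error on line {number}"
    return env, None

env_c, err_c = run_whole_program(program)
env_i, err_i = run_statement_by_statement(program)
print("x" in env_c, err_c)   # nothing was executed
print(env_i["y"], err_i)     # the first two statements already ran
```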
12. Types of Compiler
• Native code compiler
• Cross compiler
• Source to source compiler
• One pass compiler
• Threaded code compiler
• Incremental compiler
• Source compiler
13.
• Native code compiler:
⮚ Compiles source code for the same type of platform only.
⮚ The output can run only on the same type of computer system and OS that
the compiler itself runs on.
• Cross compiler:
⮚ Compiles source code for a platform different from the one the compiler runs on.
⮚ Used in making software for embedded systems and for multiple
platforms.
• Source to source compiler:
⮚ A transcompiler or transpiler takes the source code of a
program written in one programming language as its input and produces
equivalent source code in another programming language.
14.
• One pass compiler:
⮚ Processes the source code in only one pass.
• Threaded code compiler:
⮚ Simply replaces a string with the appropriate binary code.
• Incremental compiler:
⮚ Compiles only the changed lines of the source code and updates the
object code.
• Source compiler:
⮚ Converts high-level language source code into assembly language only.
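The idea behind an incremental compiler can be sketched with a cache keyed by a hash of the source text. For simplicity this sketch works on named units (for example functions or files) rather than individual lines, and `compile_unit` is a placeholder for whatever real translation step would be used; both are assumptions of the example.

```python
import hashlib

def incremental_compile(units, cache, compile_unit):
    """Recompile only the units whose source changed since the last build."""
    recompiled = []
    for name, source in units.items():
        digest = hashlib.sha256(source.encode()).hexdigest()
        entry = cache.get(name)
        if entry is None or entry[0] != digest:
            cache[name] = (digest, compile_unit(source))  # update the object code
            recompiled.append(name)
    return recompiled

cache = {}
units = {"a": "x=1", "b": "y=2"}
print(incremental_compile(units, cache, str.upper))  # first build: everything
units["b"] = "y=3"
print(incremental_compile(units, cache, str.upper))  # second build: only "b"
```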
15. STRUCTURE OF A COMPILER
Two-Phase Compiler
• Earlier, we depicted a compiler as a simple box that translates a source program into a target
program.
• As the single-box model suggests, a compiler must both understand the source program that it
takes as input and map its functionality to the target machine.
• A two-phase design decomposes compilation into two major pieces: a front end and a back end.
• The front end focuses on understanding the source-language program. The back end focuses on
mapping programs to the target machine.
• The intermediate representation (IR) becomes the compiler's definitive representation for the code it
is translating.
16. Three-Phase Compiler
• The compiler writer can insert a third phase between the front end and the back end.
• This middle section, or optimizer, takes an IR program as its input and produces a
semantically equivalent IR program as its output.
• This leads to a compiler structure termed a three-phase compiler.
• The optimizer may rewrite the IR in a way that is likely to produce a faster or smaller
target program from the back end.
• It may have other objectives, such as a program that produces fewer page faults or uses less
energy.
17. PHASES OF A COMPILER
• Any compiler must perform two major tasks:
⮚ Analysis of the source program: an intermediate representation is created
from the given source code.
⮚ Synthesis of a machine-language program: an equivalent target program is
created from the intermediate representation.
19. Lexical Analysis
• Lexical analysis is the first phase of a compiler and is also termed
scanning. This phase reads the source code as a stream of characters and
groups it into lexemes, producing one token per lexeme.
• Each token has the form <token-name, attribute-value>.
• It deletes blank spaces and comments. When an identifier token is generated, the
corresponding entry is made in the symbol table.
• Input: stream of characters
• Output: tokens
• Example:
c=a+b*5;
<id,1> <=> <id,2> <+> <id,3> <*> <5> <;>
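A small scanner for the example statement can be written with regular expressions. This is a sketch under stated assumptions: the token classes in `TOKEN_SPEC`, the tuple representation of tokens, and the list used as a symbol table are choices made for the illustration, not a fixed standard.

```python
import re

TOKEN_SPEC = [
    ("skip", r"\s+|/\*.*?\*/"),      # blanks and block comments are dropped
    ("num", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+\-*/;()]"),
]
PATTERN = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.S)

def tokenize(source):
    """Return (tokens, symbol_table); identifiers become ("id", table position)."""
    symbol_table, tokens, pos = [], [], 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "id":
            if text not in symbol_table:
                symbol_table.append(text)  # new entry in the symbol table
            tokens.append(("id", symbol_table.index(text) + 1))
        elif kind == "num":
            tokens.append(("num", text))
        else:
            tokens.append((text, None))
    return tokens, symbol_table

tokens, table = tokenize("c=a+b*5;")
print(tokens)
print(table)
```

For `c=a+b*5;` the identifiers `c`, `a` and `b` receive symbol-table positions 1, 2 and 3, matching the token stream in the example.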
20. Syntax Analysis
• Syntax analysis is the second phase of a compiler and is also called parsing.
• The parser converts the tokens produced by the lexical analyser into a tree-like
representation called a parse tree.
• A parse tree describes the syntactic structure of the input.
• It follows operator precedence. Interior nodes hold the operators and
their child nodes hold the operands.
• Input: tokens
• Output: syntax tree
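A recursive-descent parser is one common way to build such a tree while respecting operator precedence. The sketch below assumes a tiny grammar (one assignment, the four arithmetic operators, no parentheses), takes tokens as plain strings, and represents each tree node as a tuple `(operator, left, right)`; these are simplifications for illustration.

```python
OPERATORS = {"=", "+", "-", "*", "/"}

def parse_assignment(tokens):
    """Parse  id = expr  into nested tuples (operator, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():  # an operand: identifier or number
        if peek() is None or peek() in OPERATORS:
            raise SyntaxError(f"expected an operand at token {pos}")
        return advance()

    def term():  # '*' and '/' bind tighter, so they sit lower in the tree
        node = factor()
        while peek() in ("*", "/"):
            node = (advance(), node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (advance(), node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError("expected '='")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assignment(["c", "=", "a", "+", "b", "*", "5"]))
```

Because multiplication is parsed one level deeper than addition, `b * 5` becomes a subtree of the `+` node.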
21. Semantic Analysis
• Checks the program for semantic consistency.
• Type information is gathered and stored in the symbol table or in the syntax
tree.
• Performs type checking.
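Type checking over a syntax tree can be sketched as a recursive walk that returns the type of each node. The rules used here are assumptions of a toy language with only `int` and `float`: mixed operands are widened by inserting an `inttofloat` node, and assigning a float to an int variable is rejected. The tree is built from tuples `(operator, left, right)` with strings as leaves.

```python
def check(node, types):
    """Return (annotated_node, type); `types` maps declared names to their type."""
    if isinstance(node, str):
        if node in types:
            return node, types[node]
        if node.isdigit():
            return node, "int"  # integer literal
        raise NameError(f"undeclared identifier: {node}")

    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)

    if op == "=":
        if left_type == "int" and right_type == "float":
            raise TypeError("cannot assign float to int")  # rule of this toy language
        if left_type == "float" and right_type == "int":
            right = ("inttofloat", right)
        return (op, left, right), left_type

    if left_type != right_type:  # widen the int operand to float
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right), "float"
    return (op, left, right), left_type

declared = {"c": "float", "a": "float", "b": "float"}
tree = ("=", "c", ("+", "a", ("*", "b", "5")))
print(check(tree, declared))
```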
22. Intermediate Code Generation
• The compiler generates an intermediate code from the source code on the way to the
target machine.
• The intermediate code is abstract: it sits between the high-level language and
the machine language.
• It should be generated in such a way that it is
easy to translate into the target machine code.
• It uses three-address code and some temporary variables.
t1 = inttofloat(5)
t2 = <id,3> * t1
t3 = <id,2> + t2
<id,1> = t3
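Three-address code can be produced by a post-order walk of the syntax tree that creates a fresh temporary for every operator node. The tuple-based tree, the names `id1` to `id3` standing for the symbol-table entries, and the function name are assumptions made for this sketch.

```python
def to_three_address(tree):
    """Flatten an assignment tree into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def emit(node):
        if isinstance(node, str):  # leaf: identifier or literal
            return node
        if node[0] == "inttofloat":
            argument = emit(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({argument})")
            return temp
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = tree  # top level is the assignment itself
    result = emit(value)
    code.append(f"{target} = {result}")
    return code

tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", "5"))))
for line in to_three_address(tree):
    print(line)
```

Each instruction has at most one operator on its right-hand side, which is what makes the later translation to machine instructions straightforward.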
23. Code Optimization
• The code optimization phase produces optimized intermediate code as output, which results in
faster-running machine code.
• It can be done by reducing the number of instructions in the program.
• Optimization must not change the result of the program.
• To improve code generation, optimization involves
⮚ Detection and removal of dead code (unreachable code).
⮚ Calculation of constants in expressions and terms.
⮚ Collapsing of repeated expressions into a temporary variable.
⮚ Moving loop-invariant code outside the loop.
⮚ Removal of unwanted temporary variables.
t1 = <id,3> * 5.0
<id,1> = <id,2> + t1
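Two of the listed optimizations, calculating constants and removing unwanted temporaries, can be sketched on a list of instruction tuples `(destination, operator, argument1, argument2)`. The tuple format, the `copy` pseudo-operator and the convention that temporaries are named `t<number>` and used only once are assumptions of this example. The sketch does not renumber the surviving temporaries.

```python
def optimize(code):
    """Fold inttofloat of integer literals and remove a trailing copy of a temporary."""
    # Step 1: compute constant conversions at compile time and propagate them.
    constants, folded = {}, []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttofloat" and a.isdigit():
            constants[dest] = f"{a}.0"  # the conversion is done now, not at run time
            continue
        folded.append((dest, op, a, b))

    # Step 2: "x = t" directly after "t = ..." becomes "x = ..." (t is dropped).
    result = []
    for dest, op, a, b in folded:
        if op == "copy" and result and result[-1][0] == a and a.startswith("t"):
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))
    return result

before = [
    ("t1", "inttofloat", "5", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
for instruction in optimize(before):
    print(instruction)
```

Four instructions become two, and the value stored in `id1` is unchanged.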
24. Code Generation
• It takes its input from the code optimization phase and produces the target code or object code as its result.
• Intermediate instructions are translated into a sequence of machine instructions that perform
the same task.
• Code generation involves
o Allocation of registers and memory.
o Generation of correct references.
o Generation of correct data types.
o Generation of missing code.
MOV R1, <id,3>
MUL R1, #5.0
MOV R2, <id,2>
ADD R1, R2
MOV <id,1>, R1
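A naive code generator for the two optimized instructions can be sketched as follows. The register names, the mnemonics, the `#` prefix for immediate values and the very simple allocation policy (take the next free register, never spill) are assumptions for illustration; `id1` to `id3` stand for the memory locations of the symbol-table entries, and each temporary is assumed to be used only once.

```python
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(code):
    """Translate (dest, op, left, right) tuples into two-operand assembly text."""
    asm, location = [], {}  # location: temporary -> register holding it
    registers = (f"R{n}" for n in range(1, 9))  # assumes 8 registers are enough

    def load(name):
        if name in location:
            return location[name]
        register = next(registers)
        asm.append(f"MOV {register},{name}")
        return register

    for dest, op, left, right in code:
        # For commutative operators, keep working in the register that already
        # holds a temporary instead of copying it.
        if right in location and left not in location and op in ("+", "*"):
            left, right = right, left
        target = load(left)
        if right in location:
            source = location[right]
        elif right[0].isdigit():
            source = f"#{right}"  # immediate constant
        else:
            source = load(right)
        asm.append(f"{MNEMONIC[op]} {target},{source}")
        if dest.startswith("t"):
            location[dest] = target  # result stays in a register
        else:
            asm.append(f"MOV {dest},{target}")  # store to the variable
    return asm

optimized = [("t1", "*", "id3", "5.0"), ("id1", "+", "id2", "t1")]
print("\n".join(generate(optimized)))
```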
25. Symbol Table Management
• The symbol table is a data structure containing a record for each variable name, with fields
for the attributes of the name.
• The data structure should be designed to allow the compiler to find the record for each
name quickly and to store or retrieve data from that record quickly.
• It is built during the lexical and syntax analysis phases.
• The information is collected by the analysis phases of the compiler and is used by the synthesis
phases to generate code.
• The compiler uses it to achieve compile-time efficiency.
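A hash table is a common choice for fast lookup by name. The class below is a minimal sketch built on a Python dictionary; the method names and the `index` field are assumptions for the example, and scopes are left out entirely.

```python
class SymbolTable:
    """One record per name, stored in a dict for average constant-time lookup."""

    def __init__(self):
        self._records = {}

    def declare(self, name, **attributes):
        """Add a record (typically done while scanning or parsing)."""
        if name in self._records:
            raise KeyError(f"{name} is already declared")
        self._records[name] = {"index": len(self._records) + 1, **attributes}
        return self._records[name]["index"]

    def update(self, name, **attributes):
        """Attach information found later, e.g. the type from semantic analysis."""
        self._records[name].update(attributes)

    def lookup(self, name):
        """Return the record for `name`, or None if it was never declared."""
        return self._records.get(name)

table = SymbolTable()
for identifier in ("c", "a", "b"):
    table.declare(identifier)
table.update("a", type="float")
print(table.lookup("a"))
print(table.lookup("missing"))
```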
31. Pros and Cons of C for compiler development
• A compiler relies on components such as the lexical analyzer on the way from
source text to executable binary code. When these components are developed in C, one can
construct a specialized and potentially more efficient processor for
the task.
• However, additional overhead is required to generate and debug
the lexer tables and tokens.
• Many lexical analysis tools are developed in C, for example lex.
32.
• Syntax analysis finds all types of syntax errors and the position at which each
has occurred.
• The main features of C are a simple set of keywords, a simple syntax and a clean
style, which make it suitable for this phase.
• The main drawback of the syntax analysis phase is that it cannot determine
whether a token is valid or not.
• Code optimization is an approach for enhancing the performance of
the code. When a program is developed in C, the compiler optimizes the code
for faster execution.
• A C compiler produces machine code very fast compared to compilers for many
other languages.
33.
• A C compiler can compile around 1000 lines of code in a second or two.
• C gives low-level access to memory.
• Tail call optimization is not guaranteed by the C language. It is a method that avoids
creating a new stack frame for a called function when the calling function simply
returns the value it gets from the called function.
• C does not have automatic garbage collection, so code
optimization may suffer a little as a result.
• C has no built-in exception handling, so efficient exception handling is difficult.
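The tail call optimization mentioned above can be illustrated by writing the same computation twice: once with a call in tail position, and once as the loop that an optimizer would effectively turn it into. The example is in Python only for brevity; the standard CPython interpreter does not perform this optimization either, so the recursive version uses one stack frame per call while the loop reuses a single frame. The function names are made up for the sketch.

```python
def sum_to(n, accumulator=0):
    """Tail-recursive: the recursive call is the last thing the function does."""
    if n == 0:
        return accumulator
    return sum_to(n - 1, accumulator + n)  # tail call: result is returned unchanged

def sum_to_loop(n, accumulator=0):
    """What tail call optimization amounts to: reuse the frame, jump to the top."""
    while n != 0:
        n, accumulator = n - 1, accumulator + n
    return accumulator

print(sum_to(10), sum_to_loop(10))
print(sum_to_loop(100000))  # the recursive version would exceed the default recursion limit here
```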