SlideShare ist ein Scribd-Unternehmen logo
1 von 38
1
Lexical Analysis
2
Structure
of a
Compiler
Source Language
Target Language
Semantic Analyzer
Syntax Analyzer
Lexical Analyzer
Front
End
Code Optimizer
Target Code Generator
Back
End
Int. Code Generator
Intermediate Code
3
Today!
Source Language
Target Language
Semantic Analyzer
Syntax Analyzer
Lexical Analyzer
Front
End
Code Optimizer
Target Code Generator
Back
End
Int. Code Generator
Intermediate Code
4
The Role of Lexical Analyzer
The lexical analyzer is the first phase of a
compiler.The main task of lexical
Analyzer(Scanner) is to read a stream of
characters as an input and produce a sequence
of tokens that the parser(Syntax Analyzer) uses
for syntax analysis.
5
Read Character Token
Symbol
Table
ParserLexical
Analyzer
input
Push back
character
Get Next Token
The Role of Lexical Analyzer (cont’d)
6
When a lexical analyzer is inserted b/w the parser and
input stream ,it interacts with two in the manner shown in
the above fig.It reads characters from the input ,groups
them into lexeme and passes the tokens formed by the
lexemes,together with their attribute value,to the
parser.In some situations ,the lexical analyzer has to
read some characters ahead before it can decide on the
token to be returned to the parser.
The Role of Lexical Analyzer (cont’d)
7
The Role of Lexical Analyzer (cont’d)
For example,a lexical analyzer for Pascal
must read ahead after it sees the character
>.If the next character is = ,then the
character sequence >= is the lexeme forming
the token for the “Greater than or equal to ”
operator.Other wise > is the lexeme forming
“Greater than ” operator ,and the lexical
analyzer has read one character too many.
8
The Role of Lexical Analyzer (cont’d)
The extra character has to be pushed back
on to the input,because it can be the
beginning of the next lexeme in the input.The
lexical analyzer and parser form a producer-
Consumer pair. The lexical analyzer
produces tokens and the parser consumes
them.Produced tokens can be held in a token
buffer until they are consumed.
9
The Role of Lexical Analyzer (cont’d)
The interaction b/w the two is constrained only by the
size of the buffer,because the lexical analyzer can
not proceed when the buffer is full and the parser
can not proceed when the buffer is empty.Commonly
,the buffer holds just one token.In this case,the
interaction can be implemented simply by making
the lexical analyzer be a procedure called by the
parser,returning tokens on demand.
10
The Role of Lexical Analyzer (cont’d)
The implementation of reading and pushing
back character is usually done by setting up
an input buffer.A block of character is read
into the buffer at a time; a pointer keeps track
of the portion of the input that has been
analyzed.Pushing back a character is
implemented by moving the pointer.
11
The Role of Lexical Analyzer (cont’d)
Some times lexical analyzer are divided into
two phases,the first is called Scanning and
the second is called Lexical Analysis.The
scanning is responsible for doing simple
tasks ,while the lexical analyzer does the
more complex operations.For example, a
Fortran might use a scanner to eliminate
blanks from the input.
12
Issues in Lexical Analysis
There are several reasons for separating the
analysis phase of compiling into lexical analysis
and parsing.
 Simpler design is perhaps the most important
consideration.The separation of lexical analysis
from syntax analysis often allows us to simply one
or the other of these phases.For example,a parser
embodying the conventions for comments and
white space is significantly more complex---
13
Issues in Lexical Analysis (cont’d)
--then one that can assume comments and
white space have already been removed by
a lexical analyzer.If we are designing a new
language,separating the lexical and
syntactic conventions can lead to a cleaner
over all language design.
14
Issues in Lexical Analysis (cont’d)
2) Compiler efficiency is improved.
A separate lexical analyzer allows us to construct a
specialized and potentially more efficient processor
for the task.A large amount of time is spent
reading the source program and partitioning it into
tokens.Specialized buffering techniques for
reading in put characters and processing tokens
can significantly speed up the performance of a
compiler.
15
Issues in Lexical Analysis (cont’d)
3) Compiler portability is enhanced.
Input alphabet peculiarities and other
device-specific anomalies can be restricted
to the lexical analyzer.The representation
of special or non- standard symbols,such
as ↑ in Pascal ,can be isolated in the lexical
analyzer.
16
Issues in Lexical Analysis (cont’d)
Specialized tools have been designed to help
automate the construction of lexical
analyzers and parsers when they are
separated.
17
What exactly is lexing?
Consider the code:
if (i==j);
z=1;
else;
z=0;
endif;
This is really nothing more than a string of characters:
i f _ ( i = =j);ntz=1;nelse;ntz=0;nendif;
During our lexical analysis phase we must divide this string into meaningful sub-
strings.
18
Tokens
 The output of our lexical analysis phase is a streams of tokens.
 A token is a syntactic category.
 In English this would be types of words or punctuation, such as
a “noun”, “verb”, “adjective” or “end-mark”.
 In a program, this could be an “identifier”, a “floating-point
number”, a “math symbol”, a “keyword”, etc…
19
Identifying Tokens
 A sub-string that represents an instance of a token is called a
lexeme.
 The class of all possible lexemes in a token is described by the
use of a pattern.
 For example, the pattern to describe an identifier (a variable) is
a string of letters, numbers, or underscores, beginning with a
non-number.
 Patterns are typically described using regular expressions.
20
Implementation
A lexical analyzer must be able to do three things:
1. Remove all whitespace and comments.
2. Identify tokens within a string.
3. Return the lexeme of a found token, as well as the
line number it was found on.
21
Example
if_ ( i = = j);ntz=1;nelse;ntz=0;nendif;
 Line Token Lexeme
 1 BLOCK_COMMAND if
 1 OPEN_PAREN (
 1 ID i
 1 OP_RELATION ==
 1 ID j
 1 CLOSE_PAREN )
 1 ENDLINE ;
 2 ID z
 2 ASSIGN =
 2 NUMBER 1
 2 ENDLINE ;
 3 BLOCK_COMMAND else
 Etc…
22
Lookahead
 Lookahead will typically be important to a lexical analyzer.
 Tokens are typically read in from left-to-right, recognized one at
a time from the input string.
 It is not always possible to instantly decide if a token is finished
without looking ahead at the next character. For example…
 Is “i” a variable, or the first character of “if”?
 Is “=” an assignment or the beginning of “==”?
23
TOKENS
The output of our lexical analysis phase is a streams of tokens.A
token is a syntactic category.
“A name for a set of input strings with related structure”
Example: “ID,” “NUM”, “RELATION“,”IF”
In English this would be types of words or punctuation, such as
a “noun”, “verb”, “adjective” or “end-mark”.
In a program, this could be an “identifier”, a “floating-point
number”, a “math symbol”, a “keyword”, etc…
24
Tokens (cont’d)
As an example,consider the following line of
code,which could be part of a “ C ” program.
a [index] = 4 + 2
25
Tokens (cont’d)
a ID
[ Left bracket
index ID
] Right bracket
= Assign
4 Num
+ plus sign
2 Num
26
Tokens (cont’d)
Each token consists of one or more
characters that are collected into a unit
before further processing takes place.
A scanner may perform other operations
along with the recognition of tokens.For
example,it may enter identifiers into the
symbol table.
27
Attributes For Tokens
The lexical analyzer collects information
about tokens into their associated
attributes.As a practical matter ,a token has
usually only a single attribute, a pointer to the
symbol-table entry in which the information
about the token is kept;the pointer becomes
the attribute for the token.
28
Attributes For Tokens (cont’d)
Let num be the token representing an integer.
when a sequence of digits appears in the input
stream,the lexical analyzer will pass num to the
parser.The value of the integer will be passed
along as an attribute of the token num.Logically,
the lexical analyzer passes both the token and the
attribute to the parser.
29
Attributes For Tokens (cont’d)
If we write a token and its attribute as a tuple
enclosed b/w < >, the input
33 + 89 – 60
is transformed into the sequence of tuples
< num, 33 > <+, > <num, 89 > <-, > <num,
60>
30
Attributes For Tokens (cont’d)
The token “+” has no attribute ,the second
components of the tuples ,the attribute ,play
no role during parsing,but are needed during
translation.
31
Attributes For Tokens (cont’d)
The token and associated attribute values in the “ C ”
statement.
E = M + C * 2
Tell me the token and their attribute value ?
32
Attributes For Tokens (cont’d)
< ID , pointer to symbol-table entry for E >
< assign_op , >
< ID , pointer to symbol entry for M >
< add_op , >
< ID , pointer to symbol entry for C >
< mult_op , >
< num ,integer value 2 >
33
Pattern
“The set of strings described by a rule called a pattern
associated with the token.”
The pattern is said to match each string in the set.
For example, the pattern to describe an identifier (a
variable) is a string of letters, numbers, or underscores,
beginning with a non-number.
The pattern for identifier token, ID, is
ID → letter (letter | digit)*
Patterns are typically described using regular
expressions.(RE)
34
Lexeme
“A lexeme is a sequence of characters in the
source program that is matched by the
pattern for a token”
For example, the pattern for the Relation_op
token contains six lexemes ( =, < >, <, < =, >,
>=) so the lexical analyzer should return a
relation_op token to parser whenever it sees
any one of the six.
35
Example
Input: count = 123
Tokens:
identifier : Rule: “letter followed by …”
Lexeme: count
assg_op : Rule: =
Lexeme: =
integer_const : Rule: “digit followed by …”
Lexeme: 123
36
Tokens,Patterns.Lexemes
3.1416, 0, 6.02E23
Token Sample Lexemes Informal Description of Pattern
const
if
relation
id
num
const
if
<, <=, =, < >, >, >=
pi, count, D2
const
if
< or <= or = or < > or >= or >
letter followed by letters and digits
any numeric constant
Classifies
Pattern
37
Tokens,Patterns.Lexemes (cont’d)
Example:
Const pi = 3.1416;
In the example above ,when the character
sequence “pi” appears in the source
program,a token representing an identifier is
retuned to the parser(ID).The returning of a
token is often implemented by passing an
integer corresponding to the token.
38
THANKS

Weitere ähnliche Inhalte

Was ist angesagt?

Three Address code
Three Address code Three Address code
Three Address code Pooja Dixit
 
Three address code In Compiler Design
Three address code In Compiler DesignThree address code In Compiler Design
Three address code In Compiler DesignShine Raj
 
Asymptotic notations
Asymptotic notationsAsymptotic notations
Asymptotic notationsNikhil Sharma
 
Symbol table in compiler Design
Symbol table in compiler DesignSymbol table in compiler Design
Symbol table in compiler DesignKuppusamy P
 
Compiler Design Lecture Notes
Compiler Design Lecture NotesCompiler Design Lecture Notes
Compiler Design Lecture NotesFellowBuddy.com
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design MAHASREEM
 
Algorithm analysis
Algorithm analysisAlgorithm analysis
Algorithm analysissumitbardhan
 
Bottom - Up Parsing
Bottom - Up ParsingBottom - Up Parsing
Bottom - Up Parsingkunj desai
 
Semantics analysis
Semantics analysisSemantics analysis
Semantics analysisBilalzafar22
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler DesignKuppusamy P
 

Was ist angesagt? (20)

Three Address code
Three Address code Three Address code
Three Address code
 
1.Role lexical Analyzer
1.Role lexical Analyzer1.Role lexical Analyzer
1.Role lexical Analyzer
 
LL(1) parsing
LL(1) parsingLL(1) parsing
LL(1) parsing
 
Phases of Compiler
Phases of CompilerPhases of Compiler
Phases of Compiler
 
Specification-of-tokens
Specification-of-tokensSpecification-of-tokens
Specification-of-tokens
 
Asymptotic Notation
Asymptotic NotationAsymptotic Notation
Asymptotic Notation
 
Three address code In Compiler Design
Three address code In Compiler DesignThree address code In Compiler Design
Three address code In Compiler Design
 
Role-of-lexical-analysis
Role-of-lexical-analysisRole-of-lexical-analysis
Role-of-lexical-analysis
 
Asymptotic notations
Asymptotic notationsAsymptotic notations
Asymptotic notations
 
Chapter 5 Syntax Directed Translation
Chapter 5   Syntax Directed TranslationChapter 5   Syntax Directed Translation
Chapter 5 Syntax Directed Translation
 
Unit 3 sp assembler
Unit 3 sp assemblerUnit 3 sp assembler
Unit 3 sp assembler
 
Symbol table in compiler Design
Symbol table in compiler DesignSymbol table in compiler Design
Symbol table in compiler Design
 
Compiler Design Lecture Notes
Compiler Design Lecture NotesCompiler Design Lecture Notes
Compiler Design Lecture Notes
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design
 
Parsing
ParsingParsing
Parsing
 
Algorithm analysis
Algorithm analysisAlgorithm analysis
Algorithm analysis
 
Bottom - Up Parsing
Bottom - Up ParsingBottom - Up Parsing
Bottom - Up Parsing
 
Compilers
CompilersCompilers
Compilers
 
Semantics analysis
Semantics analysisSemantics analysis
Semantics analysis
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler Design
 

Andere mochten auch

Programming Languages / Translators
Programming Languages / TranslatorsProgramming Languages / Translators
Programming Languages / TranslatorsProject Student
 
Programming languages,compiler,interpreter,softwares
Programming languages,compiler,interpreter,softwaresProgramming languages,compiler,interpreter,softwares
Programming languages,compiler,interpreter,softwaresNisarg Amin
 
Minimization of DFA
Minimization of DFAMinimization of DFA
Minimization of DFAkunj desai
 
NFA or Non deterministic finite automata
NFA or Non deterministic finite automataNFA or Non deterministic finite automata
NFA or Non deterministic finite automatadeepinderbedi
 
Intermediate code- generation
Intermediate code- generationIntermediate code- generation
Intermediate code- generationrawan_z
 
Intermediate code generation
Intermediate code generationIntermediate code generation
Intermediate code generationRamchandraRegmi
 
Language translator
Language translatorLanguage translator
Language translatorasmakh89
 

Andere mochten auch (17)

Dfa vs nfa
Dfa vs nfaDfa vs nfa
Dfa vs nfa
 
Nfa vs dfa
Nfa vs dfaNfa vs dfa
Nfa vs dfa
 
optimization of DFA
optimization of DFAoptimization of DFA
optimization of DFA
 
Programming Languages / Translators
Programming Languages / TranslatorsProgramming Languages / Translators
Programming Languages / Translators
 
Optimization of dfa
Optimization of dfaOptimization of dfa
Optimization of dfa
 
NFA to DFA
NFA to DFANFA to DFA
NFA to DFA
 
DFA Minimization
DFA MinimizationDFA Minimization
DFA Minimization
 
Programming languages,compiler,interpreter,softwares
Programming languages,compiler,interpreter,softwaresProgramming languages,compiler,interpreter,softwares
Programming languages,compiler,interpreter,softwares
 
Minimization of DFA
Minimization of DFAMinimization of DFA
Minimization of DFA
 
NFA or Non deterministic finite automata
NFA or Non deterministic finite automataNFA or Non deterministic finite automata
NFA or Non deterministic finite automata
 
Intermediate code- generation
Intermediate code- generationIntermediate code- generation
Intermediate code- generation
 
Intermediate code generation
Intermediate code generationIntermediate code generation
Intermediate code generation
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Translators(Compiler, Assembler) and interpreter
Translators(Compiler, Assembler) and interpreterTranslators(Compiler, Assembler) and interpreter
Translators(Compiler, Assembler) and interpreter
 
Language translator
Language translatorLanguage translator
Language translator
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Code generation
Code generationCode generation
Code generation
 

Ähnlich wie Lexical Analysis: Breaking Down Source Code into Tokens (39

COMPILER DESIGN.pdf
COMPILER DESIGN.pdfCOMPILER DESIGN.pdf
COMPILER DESIGN.pdfManishBej3
 
A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical AnalyzerArchana Gopinath
 
Cd ch2 - lexical analysis
Cd   ch2 - lexical analysisCd   ch2 - lexical analysis
Cd ch2 - lexical analysismengistu23
 
Token, Pattern and Lexeme
Token, Pattern and LexemeToken, Pattern and Lexeme
Token, Pattern and LexemeA. S. M. Shafi
 
Data design and analysis of computing tools
Data design and analysis of computing toolsData design and analysis of computing tools
Data design and analysis of computing toolsKamranAli649587
 
Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Aman Sharma
 
Structure of the compiler
Structure of the compilerStructure of the compiler
Structure of the compilerSudhaa Ravi
 
Using Static Analysis in Program Development
Using Static Analysis in Program DevelopmentUsing Static Analysis in Program Development
Using Static Analysis in Program DevelopmentPVS-Studio
 
3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdfTANZINTANZINA
 
automata theroy and compiler designc.pptx
automata theroy and compiler designc.pptxautomata theroy and compiler designc.pptx
automata theroy and compiler designc.pptxYashaswiniYashu9555
 

Ähnlich wie Lexical Analysis: Breaking Down Source Code into Tokens (39 (20)

COMPILER DESIGN.pdf
COMPILER DESIGN.pdfCOMPILER DESIGN.pdf
COMPILER DESIGN.pdf
 
Assignment4
Assignment4Assignment4
Assignment4
 
A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical Analyzer
 
Cd ch2 - lexical analysis
Cd   ch2 - lexical analysisCd   ch2 - lexical analysis
Cd ch2 - lexical analysis
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
Token, Pattern and Lexeme
Token, Pattern and LexemeToken, Pattern and Lexeme
Token, Pattern and Lexeme
 
Data design and analysis of computing tools
Data design and analysis of computing toolsData design and analysis of computing tools
Data design and analysis of computing tools
 
Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Lexical Analysis - Compiler design
Lexical Analysis - Compiler design
 
Pcd question bank
Pcd question bank Pcd question bank
Pcd question bank
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
Structure of the compiler
Structure of the compilerStructure of the compiler
Structure of the compiler
 
Using Static Analysis in Program Development
Using Static Analysis in Program DevelopmentUsing Static Analysis in Program Development
Using Static Analysis in Program Development
 
3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf
 
automata theroy and compiler designc.pptx
automata theroy and compiler designc.pptxautomata theroy and compiler designc.pptx
automata theroy and compiler designc.pptx
 
Module 2
Module 2 Module 2
Module 2
 
SS & CD Module 3
SS & CD Module 3 SS & CD Module 3
SS & CD Module 3
 
Lexical Analysis.pdf
Lexical Analysis.pdfLexical Analysis.pdf
Lexical Analysis.pdf
 
Plc part 2
Plc  part 2Plc  part 2
Plc part 2
 
11700220036.pdf
11700220036.pdf11700220036.pdf
11700220036.pdf
 
Compiler lec 8
Compiler lec 8Compiler lec 8
Compiler lec 8
 

Kürzlich hochgeladen

Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 

Kürzlich hochgeladen (20)

Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 

Lexical Analysis: Breaking Down Source Code into Tokens (39

  • 2. 2 Structure of a Compiler Source Language Target Language Semantic Analyzer Syntax Analyzer Lexical Analyzer Front End Code Optimizer Target Code Generator Back End Int. Code Generator Intermediate Code
  • 3. 3 Today! Source Language Target Language Semantic Analyzer Syntax Analyzer Lexical Analyzer Front End Code Optimizer Target Code Generator Back End Int. Code Generator Intermediate Code
  • 4. 4 The Role of Lexical Analyzer The lexical analyzer is the first phase of a compiler.The main task of lexical Analyzer(Scanner) is to read a stream of characters as an input and produce a sequence of tokens that the parser(Syntax Analyzer) uses for syntax analysis.
  • 5. 5 Read Character Token Symbol Table ParserLexical Analyzer input Push back character Get Next Token The Role of Lexical Analyzer (cont’d)
  • 6. 6 When a lexical analyzer is inserted b/w the parser and input stream ,it interacts with two in the manner shown in the above fig.It reads characters from the input ,groups them into lexeme and passes the tokens formed by the lexemes,together with their attribute value,to the parser.In some situations ,the lexical analyzer has to read some characters ahead before it can decide on the token to be returned to the parser. The Role of Lexical Analyzer (cont’d)
  • 7. 7 The Role of Lexical Analyzer (cont’d) For example,a lexical analyzer for Pascal must read ahead after it sees the character >.If the next character is = ,then the character sequence >= is the lexeme forming the token for the “Greater than or equal to ” operator.Other wise > is the lexeme forming “Greater than ” operator ,and the lexical analyzer has read one character too many.
  • 8. 8 The Role of Lexical Analyzer (cont’d) The extra character has to be pushed back on to the input,because it can be the beginning of the next lexeme in the input.The lexical analyzer and parser form a producer- Consumer pair. The lexical analyzer produces tokens and the parser consumes them.Produced tokens can be held in a token buffer until they are consumed.
  • 9. 9 The Role of Lexical Analyzer (cont’d) The interaction b/w the two is constrained only by the size of the buffer,because the lexical analyzer can not proceed when the buffer is full and the parser can not proceed when the buffer is empty.Commonly ,the buffer holds just one token.In this case,the interaction can be implemented simply by making the lexical analyzer be a procedure called by the parser,returning tokens on demand.
  • 10. 10 The Role of Lexical Analyzer (cont’d) The implementation of reading and pushing back character is usually done by setting up an input buffer.A block of character is read into the buffer at a time; a pointer keeps track of the portion of the input that has been analyzed.Pushing back a character is implemented by moving the pointer.
  • 11. 11 The Role of Lexical Analyzer (cont’d) Some times lexical analyzer are divided into two phases,the first is called Scanning and the second is called Lexical Analysis.The scanning is responsible for doing simple tasks ,while the lexical analyzer does the more complex operations.For example, a Fortran might use a scanner to eliminate blanks from the input.
  • 12. 12 Issues in Lexical Analysis There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing.  Simpler design is perhaps the most important consideration.The separation of lexical analysis from syntax analysis often allows us to simply one or the other of these phases.For example,a parser embodying the conventions for comments and white space is significantly more complex---
  • 13. 13 Issues in Lexical Analysis (cont’d) --then one that can assume comments and white space have already been removed by a lexical analyzer.If we are designing a new language,separating the lexical and syntactic conventions can lead to a cleaner over all language design.
  • 14. 14 Issues in Lexical Analysis (cont’d) 2) Compiler efficiency is improved. A separate lexical analyzer allows us to construct a specialized and potentially more efficient processor for the task.A large amount of time is spent reading the source program and partitioning it into tokens.Specialized buffering techniques for reading in put characters and processing tokens can significantly speed up the performance of a compiler.
  • 15. 15 Issues in Lexical Analysis (cont’d) 3) Compiler portability is enhanced. Input alphabet peculiarities and other device-specific anomalies can be restricted to the lexical analyzer.The representation of special or non- standard symbols,such as ↑ in Pascal ,can be isolated in the lexical analyzer.
  • 16. 16 Issues in Lexical Analysis (cont’d) Specialized tools have been designed to help automate the construction of lexical analyzers and parsers when they are separated.
  • 17. 17 What exactly is lexing? Consider the code: if (i==j); z=1; else; z=0; endif; This is really nothing more than a string of characters: i f _ ( i = =j);ntz=1;nelse;ntz=0;nendif; During our lexical analysis phase we must divide this string into meaningful sub- strings.
  • 18. 18 Tokens  The output of our lexical analysis phase is a streams of tokens.  A token is a syntactic category.  In English this would be types of words or punctuation, such as a “noun”, “verb”, “adjective” or “end-mark”.  In a program, this could be an “identifier”, a “floating-point number”, a “math symbol”, a “keyword”, etc…
  • 19. 19 Identifying Tokens  A sub-string that represents an instance of a token is called a lexeme.  The class of all possible lexemes in a token is described by the use of a pattern.  For example, the pattern to describe an identifier (a variable) is a string of letters, numbers, or underscores, beginning with a non-number.  Patterns are typically described using regular expressions.
  • 20. 20 Implementation A lexical analyzer must be able to do three things: 1. Remove all whitespace and comments. 2. Identify tokens within a string. 3. Return the lexeme of a found token, as well as the line number it was found on.
  • 21. 21 Example if_ ( i = = j);ntz=1;nelse;ntz=0;nendif;  Line Token Lexeme  1 BLOCK_COMMAND if  1 OPEN_PAREN (  1 ID i  1 OP_RELATION ==  1 ID j  1 CLOSE_PAREN )  1 ENDLINE ;  2 ID z  2 ASSIGN =  2 NUMBER 1  2 ENDLINE ;  3 BLOCK_COMMAND else  Etc…
  • 22. 22 Lookahead  Lookahead will typically be important to a lexical analyzer.  Tokens are typically read in from left-to-right, recognized one at a time from the input string.  It is not always possible to instantly decide if a token is finished without looking ahead at the next character. For example…  Is “i” a variable, or the first character of “if”?  Is “=” an assignment or the beginning of “==”?
  • 23. 23 TOKENS The output of our lexical analysis phase is a streams of tokens.A token is a syntactic category. “A name for a set of input strings with related structure” Example: “ID,” “NUM”, “RELATION“,”IF” In English this would be types of words or punctuation, such as a “noun”, “verb”, “adjective” or “end-mark”. In a program, this could be an “identifier”, a “floating-point number”, a “math symbol”, a “keyword”, etc…
  • 24. 24 Tokens (cont’d) As an example,consider the following line of code,which could be part of a “ C ” program. a [index] = 4 + 2
  • 25. 25 Tokens (cont’d) a ID [ Left bracket index ID ] Right bracket = Assign 4 Num + plus sign 2 Num
  • 26. 26 Tokens (cont’d) Each token consists of one or more characters that are collected into a unit before further processing takes place. A scanner may perform other operations along with the recognition of tokens.For example,it may enter identifiers into the symbol table.
  • 27. 27 Attributes For Tokens The lexical analyzer collects information about tokens into their associated attributes.As a practical matter ,a token has usually only a single attribute, a pointer to the symbol-table entry in which the information about the token is kept;the pointer becomes the attribute for the token.
  • 28. 28 Attributes For Tokens (cont’d) Let num be the token representing an integer. when a sequence of digits appears in the input stream,the lexical analyzer will pass num to the parser.The value of the integer will be passed along as an attribute of the token num.Logically, the lexical analyzer passes both the token and the attribute to the parser.
  • 29. 29 Attributes For Tokens (cont’d) If we write a token and its attribute as a tuple enclosed b/w < >, the input 33 + 89 – 60 is transformed into the sequence of tuples < num, 33 > <+, > <num, 89 > <-, > <num, 60>
  • 30. 30 Attributes For Tokens (cont’d) The token “+” has no attribute ,the second components of the tuples ,the attribute ,play no role during parsing,but are needed during translation.
  • 31. 31 Attributes For Tokens (cont’d) The token and associated attribute values in the “ C ” statement. E = M + C * 2 Tell me the token and their attribute value ?
  • 32. 32 Attributes For Tokens (cont’d) < ID , pointer to symbol-table entry for E > < assign_op , > < ID , pointer to symbol entry for M > < add_op , > < ID , pointer to symbol entry for C > < mult_op , > < num ,integer value 2 >
  • 33. 33 Pattern “The set of strings described by a rule called a pattern associated with the token.” The pattern is said to match each string in the set. For example, the pattern to describe an identifier (a variable) is a string of letters, numbers, or underscores, beginning with a non-number. The pattern for identifier token, ID, is ID → letter (letter | digit)* Patterns are typically described using regular expressions.(RE)
  • 34. 34 Lexeme “A lexeme is a sequence of characters in the source program that is matched by the pattern for a token” For example, the pattern for the Relation_op token contains six lexemes ( =, < >, <, < =, >, >=) so the lexical analyzer should return a relation_op token to parser whenever it sees any one of the six.
  • 35. 35 Example Input: count = 123 Tokens: identifier : Rule: “letter followed by …” Lexeme: count assg_op : Rule: = Lexeme: = integer_const : Rule: “digit followed by …” Lexeme: 123
  • 36. 36 Tokens,Patterns.Lexemes 3.1416, 0, 6.02E23 Token Sample Lexemes Informal Description of Pattern const if relation id num const if <, <=, =, < >, >, >= pi, count, D2 const if < or <= or = or < > or >= or > letter followed by letters and digits any numeric constant Classifies Pattern
  • 37. 37 Tokens,Patterns.Lexemes (cont’d) Example: Const pi = 3.1416; In the example above ,when the character sequence “pi” appears in the source program,a token representing an identifier is retuned to the parser(ID).The returning of a token is often implemented by passing an integer corresponding to the token.