SlideShare ist ein Scribd-Unternehmen logo
1 von 47
1
Lexical Analysis
• Basic Concepts & Regular Expressions
• What does a Lexical Analyzer do?
• How does it Work?
• Formalizing Token Definition & Recognition
• Reviewing Finite Automata Concepts
• Non-Deterministic and Deterministic FA
• Conversion Process
• Regular Expressions to NFA
• NFA to DFA
• Relating NFAs/DFAs /Conversion to Lexical Analysis
2
Lexical Analyzer in Perspective
lexical
analyzer parser
symbol
table
source
program
token
get next
token
Important Issue:
• What are Responsibilities of each Box ?
• Focus on Lexical Analyzer and Parser.
3
Lexical Analyzer in Perspective
• LEXICAL ANALYZER
• Scan Input
• Remove WS, NL, …
• Identify Tokens
• Create Symbol Table
• Insert Tokens into ST
• Generate Errors
• Send Tokens to Parser
• PARSER
• Perform Syntax Analysis
• Actions Dictated by Token
Order
• Update Symbol Table Entries
• Create Abstract Rep. of
Source
• Generate Errors
• And More…. (We’ll see later)
4
What Factors Have Influenced the
Functional Division of Labor ?
• Separation of Lexical Analysis From Parsing Presents
a Simpler Conceptual Model
• A parser embodying the conventions for comments and white space is
significantly more complex that one that can assume comments and
white space have already been removed by lexical analyzer.
• Separation Increases Compiler Efficiency
• Specialized buffering techniques for reading input characters and
processing tokens…
• Separation Promotes Portability.
• Input alphabet peculiarities and other device-specific anomalies can be
restricted to the lexical analyzer.
5
Introducing Basic Terminology
• What are Major Terms for Lexical Analysis?
• TOKEN
• A pair consisting of a token name and an optional attribute value.
• A particular keyword, or a sequence of input characters denoting
identifier.
• PATTERN
• A description of a form that the lexemes of a token may take.
• For keywords, the pattern is just a sequence of characters that
form keywords.
• LEXEME
• Actual sequence of characters that matches pattern and is
classified by a token
6
Introducing Basic Terminology
Token Sample Lexemes Informal Description of Pattern
const
if
relation
id
num
literal
const
if
<, <=, =, < >, >, >=
pi, count, D2
3.1416, 0, 6.02E23
“core dumped”
const
characters of i, f
< or <= or = or < > or >= or >
letter followed by letters and digits
any numeric constant
any characters between “ and “
except “
Classifies
Pattern
Actual values are critical. Info is :
1. Stored in symbol table
2. Returned to parser 7
Attributes for Tokens
• When more than one lexeme can match a pattern, a lexical
analyzer must provide the compiler additional information
about that lexeme matched.
• In formation about identifiers, its lexeme, type and location
at which it was first found is kept in symbol table.
• The appropriate attribute value for an identifier is a pointer
to the symbol table entry for that identifier.
8
Attributes for Tokens
Tokens influence parsing decision;
The attributes influence the translation of tokens.
Example: E = M * C ** 2
<id, pointer to symbol-table entry for E>
<assign_op, >
<id, pointer to symbol-table entry for M>
<mult_op, >
<id, pointer to symbol-table entry for C>
<exp_op, >
<num, integer value 2>
9
Handling Lexical Errors
• Its hard for lexical analyzer without the aid of other
components, that there is a source-code error.
• If the statement fi is encountered for the first time in a C
program it can not tell whether fi is misspelling of if statement
or a undeclared literal.
• Probably the parser in this case will be able to handle this.
• Error Handling is very localized, with Respect to Input Source
• For example: whil ( x = 0 ) do
generates no lexical errors in PASCAL
10
Handling Lexical Errors
• In what Situations do Errors Occur?
• Lexical analyzer is unable to proceed because none of the
patterns for tokens matches a prefix of remaining input.
• Panic mode Recovery
• Delete successive characters from the remaining input until
the analyzer can find a well-formed token.
• May confuse the parser – creating syntax error
• Possible error recovery actions:
• Deleting or Inserting Input Characters
• Replacing or Transposing Characters
11
Buffer Pairs
• Lexical analyzer needs to look ahead several characters
beyond the lexeme for a pattern before a match can be
announced.
• Use a function ungetc to push look-ahead characters back
into the input stream.
• Large amount of time can be consumed moving characters.
Special Buffering Technique
Use a buffer divided into two N-character halves
N = Number of characters on one disk block
One system command read N characters
Fewer than N character => eof
12
Buffer Pairs (2)
• Two pointers lexeme beginning and forward to the input buffer are
maintained.
• The string of characters between the pointers is the current lexeme.
• Initially both pointers point to first character of the next lexeme to be
found. Forward pointer scans ahead until a match for a pattern is
found
• Once the next lexeme is determined, the forward pointer is set to the
character at its right end.
• After the lexeme is processed both pointers are set to the character
immediately past the lexeme
Lexeme_beginning forward
Comments and white space can be treated as patterns that yield no token
M=E eof2**C*
13
Code to advance forward pointer
1. This buffering scheme works quite well most of the time
but with it amount of lookahead is limited.
2. Limited lookahead makes it impossible to recognize tokens
in situations where the distance, forward pointer must
travel is more than the length of buffer.
Pitfalls:
14
if forward at the end of first half then begin
reload second half ;
forward : = forward + 1;
end
else if forward at end of second half then begin
reload first half ;
move forward to beginning of first half
end
else forward : = forward + 1;
Specification of Tokens
15
Regular expressions are an important notation for specifying
lexeme patterns
An alphabet is a finite set of symbols.
• Typical example of symbols are letters, digits and punctuation etc.
• The set {0, 1} is the binary alphabet.
A string over an alphabet is a finite sequence of symbols drawn from that
alphabet.
• The length is string s is denoted as |s|
• Empty string is denoted by ε
Prefix: ban, banana, ε, etc are the prefixes of banana
Suffix: nana, banana, ε, etc are suffixes of banana
Kleene or closure of a language L, denoted by L*.
• L*: concatenation of L zero or more times
• L0: concatenation of L zero times
• L+: concatenation of L one or more times
Kleene closure
L* denotes “zero or more concatenations of” L
16
Example
Let: L = { a, b, c, ..., z }
D = { 0, 1, 2, ..., 9 }
D+ = “The set of strings with one or more digits”
L  D = “The set of all letters and digits (alphanumeric characters)”
LD = “The set of strings consisting of a letter followed by a digit”
L* = “The set of all strings of letters, including , the empty string”
( L  D )* = “Sequences of zero or more letters and digits”
L ( ( L  D )* ) = “Set of strings that start with a letter, followed by zero or
more letters and digits.”
17
Rules for specifying Regular
Expressions
Regular expressions over alphabet 
1.  is a regular expression that denotes {}.
2. If a is a symbol (i.e., if a ), then a is a regular expression
that denotes {a}.
3. Suppose r and s are regular expressions denoting the
languages L(r) and L(s). Then
a) (r) | (s) is a regular expression denoting L(r) U L(s).
b) (r)(s) is a regular expression denoting L(r)L(s).
c) (r)* is a regular expression denoting (L(r))*.
d) (r) is a regular expression denoting L(r). 18
How to “Parse” Regular Expressions
• Precedence:
• * has highest precedence.
• Concatenation as middle precedence.
• | has lowest precedence.
• Use parentheses to override these rules.
• Examples:
• a b* = a (b*)
• If you want (a b)* you must use parentheses.
• a | b c = a | (b c)
• If you want (a | b) c you must use parentheses.
• Concatenation and | are associative.
• (a b) c = a (b c) = a b c
• (a | b) | c = a | (b | c) = a | b | c
• Example:
• b d | e f * | g a = (b d) | (e (f *)) | (g a)
19
Example
• Let  = {a, b}
• The regular expression a | b denotes the set {a, b}
• The regular expression (a|b)(a|b) denotes {aa, ab, ba, bb}
• The regular expression a* denotes the set of all strings of
zero or more a’s. i.e., {, a, aa, aaa, ….. }
• The regular expression (a|b)* denotes the set containing
zero or more instances of an a or b.
• The regular expression a|a*b denotes the set containing
the string a and all strings consisting of zero or more a’s
followed by one b.
20
Regular Definition
• If Σ is an alphabet of basic symbols then a regular
definition is a sequence of the following form:
d1r1
d2r2
……..
dnrn
where
• Each di is a new symbol such that di  Σ and di dj where
j < I
• Each ri is a regular expression over Σ  {d1,d2,…,di-1) 21
Regular Definition
22
Addition Notation / Shorthand
23
UnsignedNumber 1240, 39.45, 6.33E15, or 1.578E-41
digit  0 | 1 | 2 | … | 9
digits  digit digit*
optional_fraction  . digits | 
optional_exponent  ( E ( + | -| ) digits) | 
num  digits optional_fraction optional_exponent
digit  0 | 1 | 2 | … | 9
digits  digit+
optional_fraction  (. digits ) ?
optional_exponent  ( E ( + | -) ? digits) ?
num  digits optional_fraction optional_exponent
Shorthand
24
Token Recognition
How can we use concepts developed so far to assist in
recognizing tokens of a source language ?
Assume Following Tokens:
if, then, else, relop, id, num
Given Tokens, What are Patterns ?
if  if
then  then
else  else
relop  < | <= | > | >= | = | <>
id  letter ( letter | digit )*
num  digit + (. digit + ) ? ( E(+ | -) ? digit + ) ?
Grammar:
stmt  |if expr then stmt
|if expr then stmt else stmt
|
expr  term relop term | term
term  id | num
26
What Else Does Lexical Analyzer Do?
Scan away blanks, new lines, tabs
Can we Define Tokens For These?
blank  blank
tab  tab
newline  newline
delim  blank | tab | newline
ws  delim +
In these cases no token is returned to parser
27
Overall
ws
if
then
else
id
num
<
<=
=
< >
>
>=
-
if
then
else
id
num
relop
relop
relop
relop
relop
relop
-
-
-
-
pointer to table entry
Exact value
LT
LE
EQ
NE
GT
GE
Note: Each token has a unique token identifier to define category of lexemes
28
Constructing Transition Diagrams for
Tokens
• Transition Diagrams (TD) are used to represent the tokens
• As characters are read, the relevant TDs are used to attempt to
match lexeme to a pattern
• Each TD has:
• States : Represented by Circles
• Actions : Represented by Arrows between states
• Start State : Beginning of a pattern (Arrowhead)
• Final State(s) : End of pattern (Concentric Circles)
• Edges: arrows connecting the states
• Each TD is Deterministic (assume) - No need to choose
between 2 different actions !
29
Example TDs
start
other
=>
0 6 7
8
* RTN(GT)
RTN(GE)> = :
We’ve accepted “>” and have read one extra char that must be
unread. 30
Example : All RELOPs
start <
0
other
=
6 7
8
return(relop, LE)
5
4
>
=
1 2
3
other
>
=
*
*
return(relop, NE)
return(relop, LT)
return(relop, EQ)
return(relop, GE)
return(relop, GT)
31
Example TDs : id and delim
id :
delim :
start delim
28
other
3029
delim
*
return( get_token(), install_id())
start letter
9
other
1110
letter or digit
*
Either returns ptr or “0” if reserved
32
Example TDs : Unsigned #s
1912 1413 1615 1817
start otherdigit . digit E + | - digit
digit
digit
digit
E
digit
*
start digit
25
other
2726
digit
*
start digit
20
* .
21
digit
24
other
23
digit
digit
22
*
Questions: Is ordering important for unsigned #s ?
Why are there no TDs for then, else, if ?
return(num, install_num())
33
QUESTION :
What would the transition diagram
(TD) for strings containing each
vowel, in their strict lexicographical
order, look like?
34
Answer
cons  B | C | D | F | G | H | J | … | N | P | … | T | V | .. | Z
string  cons* A cons* E cons* I cons* O cons* U cons*
otherUOIEA
consconsconsconsconscons
start
error
accept
Note: The error path is taken
if the character is other than
a cons or the vowel in the lex
order.
35
Capturing Multiple Tokens
Capturing keyword “begin”
Capturing variable names
What if both need to happen at the same time?
b e g i n WS
WS – white space
A – alphabetic
AN – alphanumericA
AN
WS
start
start
36
Capturing Multiple Tokens
b e g i n WS
WS – white space
A – alphabetic
AN – alphanumeric
A-b
AN
WS
AN
Machine is much more complicated – just for these two tokens!
start
37
Finite State Automata (FSAs)
• “Finite State Machines”, “Finite Automata”, “FA”
• A recognizer for a language is a program that takes
as input a string x and answers “yes” if x is a
sentence of the language and “no” otherwise.
• The regular expression is compiled into a
recognizer by constructing a generalized transition
diagram called a finite automaton.
• Each state is labeled with a state name
• Directed edges, labeled with symbols
• Two types
• Deterministic (DFA)
• Non-deterministic (NFA)
38
Nondeterministic Finite Automata
A nondeterministic finite automaton (NFA) is a
mathematical model that consists of
1. A set of states S
2. A set of input symbols 
3. A transition function that maps state/symbol
pairs to a set of states
4. A special state s0 called the start state
5. A set of states F (subset of S) of final states
INPUT: string
OUTPUT: yes or no
39
Example – NFA : (a|b)*abb
S = { 0, 1, 2, 3 }
s0 = 0
F = { 3 }
 = { a, b }
start
0 3b21 ba
a
b
s
t
a
t
e
i n p u t
0
1
2
a b
{ 0, 1 }
-- { 2 }
-- { 3 }
{ 0 }
 (null) moves possible
ji 
Switch state but do not
use any input symbol
Transition Table
40
How Does An NFA Work ?
start
0 3b21 ba
a
b • Given an input string, we trace moves
• If no more input & in final state, ACCEPT
EXAMPLE:
Input: ababb
move(0, a) = 1
move(1, b) = 2
move(2, a) = ? (undefined)
REJECT !
move(0, a) = 0
move(0, b) = 0
move(0, a) = 1
move(1, b) = 2
move(2, b) = 3
ACCEPT !
-OR-
41
Handling Undefined Transitions
We can handle undefined transitions by defining one more
state, a “death” state, and transitioning all previously
undefined transition to this death state.
start
0 3b21 ba
a
b
4
a, b
a
a
 42
Other Concepts
start
0 3b21 ba
a
b
Not all paths may result in acceptance.
aabb is accepted along path : 0  0  1  2  3
BUT… it is not accepted along the valid path:
0  0  0  0  0
43
Deterministic Finite Automata
A DFA is an NFA with the following restrictions:
•  moves are not allowed
• For every state s S, there is one and only one path from s
for every input symbol a  .
Since transition tables don’t have any alternative options, DFAs
are easily simulated via an algorithm.
s  s0
c  nextchar;
while c  eof do
s  move(s,c);
c  nextchar;
end;
if s is in F then return “yes”
else return “no”
44
Example – DFA : (a|b)*abb
start
0 3b21 ba
a
b
start
0 3b21 ba
b
a
b
a
a
What Language is Accepted?
Recall the original NFA:
45
Relation between RE, NFA and DFA
1. There is an algorithm for converting any RE into an NFA.
2. There is an algorithm for converting any NFA to a DFA.
3. There is an algorithm for converting any DFA to a RE.
These facts tell us that REs, NFAs and DFAs have equivalent
expressive power.
All three describe the class of regular languages.
46
NFA vs DFA
• An NFA may be simulated by algorithm, when NFA is constructed
from the R.E
• Algorithm run time is proportional to |N| * |x| where |N| is the
number of states and |x| is the length of input
• Alternatively, we can construct DFA from NFA and uses it to
recognize input
• The space requirement of a DFA can be large. The RE
(a+b)*a(a+b)(a+b)….(a+b) [n-1 (a+b) at the end] has no DFA
with less than 2n states. Fortunately, such RE in practice does
not occur often
space
required
O(|r|) O(|r|*|x|)
O(|x|)O(2|r|)DFA
NFA
time to
simulate
where |r| is the length of the regular expression.
47
Thank You
Any Questions?
48

Weitere ähnliche Inhalte

Was ist angesagt?

A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical AnalyzerArchana Gopinath
 
Parsing in Compiler Design
Parsing in Compiler DesignParsing in Compiler Design
Parsing in Compiler DesignAkhil Kaushik
 
Theory of automata and formal language
Theory of automata and formal languageTheory of automata and formal language
Theory of automata and formal languageRabia Khalid
 
Church Turing Thesis
Church Turing ThesisChurch Turing Thesis
Church Turing ThesisHemant Sharma
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design MAHASREEM
 
Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Aman Sharma
 
Lexical Analysis - Compiler Design
Lexical Analysis - Compiler DesignLexical Analysis - Compiler Design
Lexical Analysis - Compiler DesignAkhil Kaushik
 
Syntax analyzer
Syntax analyzerSyntax analyzer
Syntax analyzerahmed51236
 
Regular expressions
Regular expressionsRegular expressions
Regular expressionsShiraz316
 
LR(1) and SLR(1) parsing
LR(1) and SLR(1) parsingLR(1) and SLR(1) parsing
LR(1) and SLR(1) parsingR Islam
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical AnalysisMunni28
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical AnalysisNayemid4676
 
Type checking in compiler design
Type checking in compiler designType checking in compiler design
Type checking in compiler designSudip Singh
 

Was ist angesagt? (20)

A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical Analyzer
 
Parsing in Compiler Design
Parsing in Compiler DesignParsing in Compiler Design
Parsing in Compiler Design
 
Theory of automata and formal language
Theory of automata and formal languageTheory of automata and formal language
Theory of automata and formal language
 
Turing machine-TOC
Turing machine-TOCTuring machine-TOC
Turing machine-TOC
 
Compiler Chapter 1
Compiler Chapter 1Compiler Chapter 1
Compiler Chapter 1
 
Church Turing Thesis
Church Turing ThesisChurch Turing Thesis
Church Turing Thesis
 
Role-of-lexical-analysis
Role-of-lexical-analysisRole-of-lexical-analysis
Role-of-lexical-analysis
 
Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design Syntax Analysis in Compiler Design
Syntax Analysis in Compiler Design
 
Lexical Analysis - Compiler design
Lexical Analysis - Compiler design Lexical Analysis - Compiler design
Lexical Analysis - Compiler design
 
Lexical Analysis - Compiler Design
Lexical Analysis - Compiler DesignLexical Analysis - Compiler Design
Lexical Analysis - Compiler Design
 
Syntax analyzer
Syntax analyzerSyntax analyzer
Syntax analyzer
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
LR Parsing
LR ParsingLR Parsing
LR Parsing
 
Predictive parser
Predictive parserPredictive parser
Predictive parser
 
LR(1) and SLR(1) parsing
LR(1) and SLR(1) parsingLR(1) and SLR(1) parsing
LR(1) and SLR(1) parsing
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
Parsing LL(1), SLR, LR(1)
Parsing LL(1), SLR, LR(1)Parsing LL(1), SLR, LR(1)
Parsing LL(1), SLR, LR(1)
 
Chapter 5 Syntax Directed Translation
Chapter 5   Syntax Directed TranslationChapter 5   Syntax Directed Translation
Chapter 5 Syntax Directed Translation
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
Type checking in compiler design
Type checking in compiler designType checking in compiler design
Type checking in compiler design
 

Andere mochten auch

Compiler Design - Introduction to Compiler
Compiler Design - Introduction to CompilerCompiler Design - Introduction to Compiler
Compiler Design - Introduction to CompilerIffat Anjum
 
Lecture 01 introduction to compiler
Lecture 01 introduction to compilerLecture 01 introduction to compiler
Lecture 01 introduction to compilerIffat Anjum
 
Fog computing ( foggy cloud)
Fog computing  ( foggy cloud)Fog computing  ( foggy cloud)
Fog computing ( foggy cloud)Iffat Anjum
 
About Tokens and Lexemes
About Tokens and LexemesAbout Tokens and Lexemes
About Tokens and LexemesBen Scholzen
 
Lecture 05 syntax analysis 2
Lecture 05 syntax analysis 2Lecture 05 syntax analysis 2
Lecture 05 syntax analysis 2Iffat Anjum
 
Lex (lexical analyzer)
Lex (lexical analyzer)Lex (lexical analyzer)
Lex (lexical analyzer)Sami Said
 
Single linked list
Single linked listSingle linked list
Single linked listSayantan Sur
 
Lecture 13 intermediate code generation 2.pptx
Lecture 13 intermediate code generation 2.pptxLecture 13 intermediate code generation 2.pptx
Lecture 13 intermediate code generation 2.pptxIffat Anjum
 
Cognitive radio network_MS_defense_presentation
Cognitive radio network_MS_defense_presentationCognitive radio network_MS_defense_presentation
Cognitive radio network_MS_defense_presentationIffat Anjum
 
Compiler Construction
Compiler ConstructionCompiler Construction
Compiler ConstructionSarmad Ali
 
Introduction to Parse
Introduction to ParseIntroduction to Parse
Introduction to Parseabeymm
 
Distributed contention based mac protocol for cognitive radio
Distributed contention based mac protocol for cognitive radioDistributed contention based mac protocol for cognitive radio
Distributed contention based mac protocol for cognitive radioIffat Anjum
 
Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Iffat Anjum
 
Top down parsing(sid) (1)
Top down parsing(sid) (1)Top down parsing(sid) (1)
Top down parsing(sid) (1)Siddhesh Pange
 

Andere mochten auch (20)

Compiler Design - Introduction to Compiler
Compiler Design - Introduction to CompilerCompiler Design - Introduction to Compiler
Compiler Design - Introduction to Compiler
 
Lecture 01 introduction to compiler
Lecture 01 introduction to compilerLecture 01 introduction to compiler
Lecture 01 introduction to compiler
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Fog computing ( foggy cloud)
Fog computing  ( foggy cloud)Fog computing  ( foggy cloud)
Fog computing ( foggy cloud)
 
Lecture4 lexical analysis2
Lecture4 lexical analysis2Lecture4 lexical analysis2
Lecture4 lexical analysis2
 
About Tokens and Lexemes
About Tokens and LexemesAbout Tokens and Lexemes
About Tokens and Lexemes
 
Lecture 05 syntax analysis 2
Lecture 05 syntax analysis 2Lecture 05 syntax analysis 2
Lecture 05 syntax analysis 2
 
Lex (lexical analyzer)
Lex (lexical analyzer)Lex (lexical analyzer)
Lex (lexical analyzer)
 
Single linked list
Single linked listSingle linked list
Single linked list
 
Lexical
LexicalLexical
Lexical
 
Lecture3 lexical analysis
Lecture3 lexical analysisLecture3 lexical analysis
Lecture3 lexical analysis
 
Lecture 13 intermediate code generation 2.pptx
Lecture 13 intermediate code generation 2.pptxLecture 13 intermediate code generation 2.pptx
Lecture 13 intermediate code generation 2.pptx
 
Cognitive radio network_MS_defense_presentation
Cognitive radio network_MS_defense_presentationCognitive radio network_MS_defense_presentation
Cognitive radio network_MS_defense_presentation
 
Cs419 lec8 top-down parsing
Cs419 lec8    top-down parsingCs419 lec8    top-down parsing
Cs419 lec8 top-down parsing
 
Compiler Construction
Compiler ConstructionCompiler Construction
Compiler Construction
 
Introduction to Parse
Introduction to ParseIntroduction to Parse
Introduction to Parse
 
Distributed contention based mac protocol for cognitive radio
Distributed contention based mac protocol for cognitive radioDistributed contention based mac protocol for cognitive radio
Distributed contention based mac protocol for cognitive radio
 
Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2
 
Top down parsing(sid) (1)
Top down parsing(sid) (1)Top down parsing(sid) (1)
Top down parsing(sid) (1)
 
07 top-down-parsing
07 top-down-parsing07 top-down-parsing
07 top-down-parsing
 

Ähnlich wie Lecture 02 lexical analysis

3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdfTANZINTANZINA
 
Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01riddhi viradiya
 
Lecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.pptLecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.pptNderituGichuki1
 
Chapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdfChapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdfDrIsikoIsaac
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)bolovv
 
Lecture 04 syntax analysis
Lecture 04 syntax analysisLecture 04 syntax analysis
Lecture 04 syntax analysisIffat Anjum
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzersArchana Gopinath
 
Language for specifying lexical Analyzer
Language for specifying lexical AnalyzerLanguage for specifying lexical Analyzer
Language for specifying lexical AnalyzerArchana Gopinath
 
New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfeliasabdi2024
 
Compiler Designs
Compiler DesignsCompiler Designs
Compiler Designswasim liam
 
Learn C LANGUAGE at ASIT
Learn C LANGUAGE at ASITLearn C LANGUAGE at ASIT
Learn C LANGUAGE at ASITASIT
 
System Programming Unit IV
System Programming Unit IVSystem Programming Unit IV
System Programming Unit IVManoj Patil
 
Regular expressions h1
Regular expressions h1Regular expressions h1
Regular expressions h1Rajendran
 
COMPILER DESIGN.pdf
COMPILER DESIGN.pdfCOMPILER DESIGN.pdf
COMPILER DESIGN.pdfManishBej3
 

Ähnlich wie Lecture 02 lexical analysis (20)

Syntax
SyntaxSyntax
Syntax
 
3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf
 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
 
Lexicalanalyzer
LexicalanalyzerLexicalanalyzer
Lexicalanalyzer
 
7645347.ppt
7645347.ppt7645347.ppt
7645347.ppt
 
Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01Unitiv 111206005201-phpapp01
Unitiv 111206005201-phpapp01
 
Lecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.pptLecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.ppt
 
Chapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdfChapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdf
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)
 
Lecture 04 syntax analysis
Lecture 04 syntax analysisLecture 04 syntax analysis
Lecture 04 syntax analysis
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzers
 
Language for specifying lexical Analyzer
Language for specifying lexical AnalyzerLanguage for specifying lexical Analyzer
Language for specifying lexical Analyzer
 
New compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdfNew compiler design 101 April 13 2024.pdf
New compiler design 101 April 13 2024.pdf
 
Compiler Designs
Compiler DesignsCompiler Designs
Compiler Designs
 
Learn C LANGUAGE at ASIT
Learn C LANGUAGE at ASITLearn C LANGUAGE at ASIT
Learn C LANGUAGE at ASIT
 
System Programming Unit IV
System Programming Unit IVSystem Programming Unit IV
System Programming Unit IV
 
Regular expressions h1
Regular expressions h1Regular expressions h1
Regular expressions h1
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
COMPILER DESIGN.pdf
COMPILER DESIGN.pdfCOMPILER DESIGN.pdf
COMPILER DESIGN.pdf
 

Mehr von Iffat Anjum

Lecture 16 17 code-generation
Lecture 16 17 code-generationLecture 16 17 code-generation
Lecture 16 17 code-generationIffat Anjum
 
Lecture 14 run time environment
Lecture 14 run time environmentLecture 14 run time environment
Lecture 14 run time environmentIffat Anjum
 
Lecture 12 intermediate code generation
Lecture 12 intermediate code generationLecture 12 intermediate code generation
Lecture 12 intermediate code generationIffat Anjum
 
Lecture 11 semantic analysis 2
Lecture 11 semantic analysis 2Lecture 11 semantic analysis 2
Lecture 11 semantic analysis 2Iffat Anjum
 
Lecture 09 syntax analysis 05
Lecture 09 syntax analysis 05Lecture 09 syntax analysis 05
Lecture 09 syntax analysis 05Iffat Anjum
 
Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Iffat Anjum
 
Lecture 07 08 syntax analysis-4
Lecture 07 08 syntax analysis-4Lecture 07 08 syntax analysis-4
Lecture 07 08 syntax analysis-4Iffat Anjum
 
Lecture 06 syntax analysis 3
Lecture 06 syntax analysis 3Lecture 06 syntax analysis 3
Lecture 06 syntax analysis 3Iffat Anjum
 
Lecture 03 lexical analysis
Lecture 03 lexical analysisLecture 03 lexical analysis
Lecture 03 lexical analysisIffat Anjum
 
On qo s provisioning in context aware wireless sensor networks for healthcare
On qo s provisioning in context aware wireless sensor networks for healthcareOn qo s provisioning in context aware wireless sensor networks for healthcare
On qo s provisioning in context aware wireless sensor networks for healthcareIffat Anjum
 
Data link control
Data link controlData link control
Data link controlIffat Anjum
 
Pnp mac preemptive slot allocation and non preemptive transmission for provid...
Pnp mac preemptive slot allocation and non preemptive transmission for provid...Pnp mac preemptive slot allocation and non preemptive transmission for provid...
Pnp mac preemptive slot allocation and non preemptive transmission for provid...Iffat Anjum
 
Qo s based mac protocol for medical wireless body area sensor networks
Qo s based mac protocol for medical wireless body area sensor networksQo s based mac protocol for medical wireless body area sensor networks
Qo s based mac protocol for medical wireless body area sensor networksIffat Anjum
 
A reinforcement learning based routing protocol with qo s support for biomedi...
A reinforcement learning based routing protocol with qo s support for biomedi...A reinforcement learning based routing protocol with qo s support for biomedi...
A reinforcement learning based routing protocol with qo s support for biomedi...Iffat Anjum
 
Data centric multiobjective qo s-aware routing protocol (dm-qos) for body are...
Data centric multiobjective qo s-aware routing protocol (dm-qos) for body are...Data centric multiobjective qo s-aware routing protocol (dm-qos) for body are...
Data centric multiobjective qo s-aware routing protocol (dm-qos) for body are...Iffat Anjum
 
Quality of service aware mac protocol for body sensor networks
Quality of service aware mac protocol for body sensor networksQuality of service aware mac protocol for body sensor networks
Quality of service aware mac protocol for body sensor networksIffat Anjum
 
Multicastingand multicast routing protocols
Multicastingand multicast routing protocolsMulticastingand multicast routing protocols
Multicastingand multicast routing protocolsIffat Anjum
 
Fpga(field programmable gate array)
Fpga(field programmable gate array) Fpga(field programmable gate array)
Fpga(field programmable gate array) Iffat Anjum
 
Multicastingand multicast routing protocols
Multicastingand multicast routing protocolsMulticastingand multicast routing protocols
Multicastingand multicast routing protocolsIffat Anjum
 

Mehr von Iffat Anjum (20)

Lecture 16 17 code-generation
Lecture 16 17 code-generationLecture 16 17 code-generation
Lecture 16 17 code-generation
 
Lecture 14 run time environment
Lecture 14 run time environmentLecture 14 run time environment
Lecture 14 run time environment
 
Lecture 12 intermediate code generation
Lecture 12 intermediate code generationLecture 12 intermediate code generation
Lecture 12 intermediate code generation
 
Lecture 11 semantic analysis 2
Lecture 11 semantic analysis 2Lecture 11 semantic analysis 2
Lecture 11 semantic analysis 2
 
Lecture 09 syntax analysis 05
Lecture 09 syntax analysis 05Lecture 09 syntax analysis 05
Lecture 09 syntax analysis 05
 
Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01
 
Lecture 07 08 syntax analysis-4
Lecture 07 08 syntax analysis-4Lecture 07 08 syntax analysis-4
Lecture 07 08 syntax analysis-4
 
Lecture 06 syntax analysis 3
Lecture 06 syntax analysis 3Lecture 06 syntax analysis 3
Lecture 06 syntax analysis 3
 
Lecture 03 lexical analysis
Lecture 03 lexical analysisLecture 03 lexical analysis
Lecture 03 lexical analysis
 
On qo s provisioning in context aware wireless sensor networks for healthcare
On qo s provisioning in context aware wireless sensor networks for healthcareOn qo s provisioning in context aware wireless sensor networks for healthcare
On qo s provisioning in context aware wireless sensor networks for healthcare
 
Data link control
Data link controlData link control
Data link control
 
Pnp mac preemptive slot allocation and non preemptive transmission for provid...
Pnp mac preemptive slot allocation and non preemptive transmission for provid...Pnp mac preemptive slot allocation and non preemptive transmission for provid...
Pnp mac preemptive slot allocation and non preemptive transmission for provid...
 
Qo s based mac protocol for medical wireless body area sensor networks
Qo s based mac protocol for medical wireless body area sensor networksQo s based mac protocol for medical wireless body area sensor networks
Qo s based mac protocol for medical wireless body area sensor networks
 
A reinforcement learning based routing protocol with qo s support for biomedi...
A reinforcement learning based routing protocol with qo s support for biomedi...A reinforcement learning based routing protocol with qo s support for biomedi...
A reinforcement learning based routing protocol with qo s support for biomedi...
 
Data centric multiobjective qo s-aware routing protocol (dm-qos) for body are...
Data centric multiobjective qo s-aware routing protocol (dm-qos) for body are...Data centric multiobjective qo s-aware routing protocol (dm-qos) for body are...
Data centric multiobjective qo s-aware routing protocol (dm-qos) for body are...
 
Quality of service aware mac protocol for body sensor networks
Quality of service aware mac protocol for body sensor networksQuality of service aware mac protocol for body sensor networks
Quality of service aware mac protocol for body sensor networks
 
Library system
Library systemLibrary system
Library system
 
Multicastingand multicast routing protocols
Multicastingand multicast routing protocolsMulticastingand multicast routing protocols
Multicastingand multicast routing protocols
 
Fpga(field programmable gate array)
Fpga(field programmable gate array) Fpga(field programmable gate array)
Fpga(field programmable gate array)
 
Multicastingand multicast routing protocols
Multicastingand multicast routing protocolsMulticastingand multicast routing protocols
Multicastingand multicast routing protocols
 

Kürzlich hochgeladen

How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17Celine George
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxAnupam32727
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesVijayaLaxmi84
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxAneriPatwari
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 

Kürzlich hochgeladen (20)

How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptx
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 

Lecture 02 lexical analysis

  • 1. 1
  • 2. Lexical Analysis • Basic Concepts & Regular Expressions • What does a Lexical Analyzer do? • How does it Work? • Formalizing Token Definition & Recognition • Reviewing Finite Automata Concepts • Non-Deterministic and Deterministic FA • Conversion Process • Regular Expressions to NFA • NFA to DFA • Relating NFAs/DFAs /Conversion to Lexical Analysis 2
  • 3. Lexical Analyzer in Perspective lexical analyzer parser symbol table source program token get next token Important Issue: • What are Responsibilities of each Box ? • Focus on Lexical Analyzer and Parser. 3
  • 4. Lexical Analyzer in Perspective • LEXICAL ANALYZER • Scan Input • Remove WS, NL, … • Identify Tokens • Create Symbol Table • Insert Tokens into ST • Generate Errors • Send Tokens to Parser • PARSER • Perform Syntax Analysis • Actions Dictated by Token Order • Update Symbol Table Entries • Create Abstract Rep. of Source • Generate Errors • And More…. (We’ll see later) 4
  • 5. What Factors Have Influenced the Functional Division of Labor ? • Separation of Lexical Analysis From Parsing Presents a Simpler Conceptual Model • A parser embodying the conventions for comments and white space is significantly more complex that one that can assume comments and white space have already been removed by lexical analyzer. • Separation Increases Compiler Efficiency • Specialized buffering techniques for reading input characters and processing tokens… • Separation Promotes Portability. • Input alphabet peculiarities and other device-specific anomalies can be restricted to the lexical analyzer. 5
  • 6. Introducing Basic Terminology • What are Major Terms for Lexical Analysis? • TOKEN • A pair consisting of a token name and an optional attribute value. • A particular keyword, or a sequence of input characters denoting identifier. • PATTERN • A description of a form that the lexemes of a token may take. • For keywords, the pattern is just a sequence of characters that form keywords. • LEXEME • Actual sequence of characters that matches pattern and is classified by a token 6
  • 7. Introducing Basic Terminology Token Sample Lexemes Informal Description of Pattern const if relation id num literal const if <, <=, =, < >, >, >= pi, count, D2 3.1416, 0, 6.02E23 “core dumped” const characters of i, f < or <= or = or < > or >= or > letter followed by letters and digits any numeric constant any characters between “ and “ except “ Classifies Pattern Actual values are critical. Info is : 1. Stored in symbol table 2. Returned to parser 7
  • 8. Attributes for Tokens • When more than one lexeme can match a pattern, a lexical analyzer must provide the compiler additional information about that lexeme matched. • In formation about identifiers, its lexeme, type and location at which it was first found is kept in symbol table. • The appropriate attribute value for an identifier is a pointer to the symbol table entry for that identifier. 8
  • 9. Attributes for Tokens Tokens influence parsing decision; The attributes influence the translation of tokens. Example: E = M * C ** 2 <id, pointer to symbol-table entry for E> <assign_op, > <id, pointer to symbol-table entry for M> <mult_op, > <id, pointer to symbol-table entry for C> <exp_op, > <num, integer value 2> 9
  • 10. Handling Lexical Errors • Its hard for lexical analyzer without the aid of other components, that there is a source-code error. • If the statement fi is encountered for the first time in a C program it can not tell whether fi is misspelling of if statement or a undeclared literal. • Probably the parser in this case will be able to handle this. • Error Handling is very localized, with Respect to Input Source • For example: whil ( x = 0 ) do generates no lexical errors in PASCAL 10
  • 11. Handling Lexical Errors • In what Situations do Errors Occur? • Lexical analyzer is unable to proceed because none of the patterns for tokens matches a prefix of remaining input. • Panic mode Recovery • Delete successive characters from the remaining input until the analyzer can find a well-formed token. • May confuse the parser – creating syntax error • Possible error recovery actions: • Deleting or Inserting Input Characters • Replacing or Transposing Characters 11
  • 12. Buffer Pairs • Lexical analyzer needs to look ahead several characters beyond the lexeme for a pattern before a match can be announced. • Use a function ungetc to push look-ahead characters back into the input stream. • Large amount of time can be consumed moving characters. Special Buffering Technique Use a buffer divided into two N-character halves N = Number of characters on one disk block One system command read N characters Fewer than N character => eof 12
  • 13. Buffer Pairs (2) • Two pointers lexeme beginning and forward to the input buffer are maintained. • The string of characters between the pointers is the current lexeme. • Initially both pointers point to first character of the next lexeme to be found. Forward pointer scans ahead until a match for a pattern is found • Once the next lexeme is determined, the forward pointer is set to the character at its right end. • After the lexeme is processed both pointers are set to the character immediately past the lexeme Lexeme_beginning forward Comments and white space can be treated as patterns that yield no token M=E eof2**C* 13
  • 14. Code to advance forward pointer 1. This buffering scheme works quite well most of the time but with it amount of lookahead is limited. 2. Limited lookahead makes it impossible to recognize tokens in situations where the distance, forward pointer must travel is more than the length of buffer. Pitfalls: 14 if forward at the end of first half then begin reload second half ; forward : = forward + 1; end else if forward at end of second half then begin reload first half ; move forward to beginning of first half end else forward : = forward + 1;
  • 15. Specification of Tokens 15 Regular expressions are an important notation for specifying lexeme patterns An alphabet is a finite set of symbols. • Typical example of symbols are letters, digits and punctuation etc. • The set {0, 1} is the binary alphabet. A string over an alphabet is a finite sequence of symbols drawn from that alphabet. • The length is string s is denoted as |s| • Empty string is denoted by ε Prefix: ban, banana, ε, etc are the prefixes of banana Suffix: nana, banana, ε, etc are suffixes of banana Kleene or closure of a language L, denoted by L*. • L*: concatenation of L zero or more times • L0: concatenation of L zero times • L+: concatenation of L one or more times
  • 16. Kleene closure L* denotes “zero or more concatenations of” L 16
  • 17. Example Let: L = { a, b, c, ..., z } D = { 0, 1, 2, ..., 9 } D+ = “The set of strings with one or more digits” L  D = “The set of all letters and digits (alphanumeric characters)” LD = “The set of strings consisting of a letter followed by a digit” L* = “The set of all strings of letters, including , the empty string” ( L  D )* = “Sequences of zero or more letters and digits” L ( ( L  D )* ) = “Set of strings that start with a letter, followed by zero or more letters and digits.” 17
  • 18. Rules for specifying Regular Expressions Regular expressions over alphabet  1.  is a regular expression that denotes {}. 2. If a is a symbol (i.e., if a ), then a is a regular expression that denotes {a}. 3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then a) (r) | (s) is a regular expression denoting L(r) U L(s). b) (r)(s) is a regular expression denoting L(r)L(s). c) (r)* is a regular expression denoting (L(r))*. d) (r) is a regular expression denoting L(r). 18
  • 19. How to “Parse” Regular Expressions • Precedence: • * has highest precedence. • Concatenation as middle precedence. • | has lowest precedence. • Use parentheses to override these rules. • Examples: • a b* = a (b*) • If you want (a b)* you must use parentheses. • a | b c = a | (b c) • If you want (a | b) c you must use parentheses. • Concatenation and | are associative. • (a b) c = a (b c) = a b c • (a | b) | c = a | (b | c) = a | b | c • Example: • b d | e f * | g a = (b d) | (e (f *)) | (g a) 19
  • 20. Example • Let  = {a, b} • The regular expression a | b denotes the set {a, b} • The regular expression (a|b)(a|b) denotes {aa, ab, ba, bb} • The regular expression a* denotes the set of all strings of zero or more a’s. i.e., {, a, aa, aaa, ….. } • The regular expression (a|b)* denotes the set containing zero or more instances of an a or b. • The regular expression a|a*b denotes the set containing the string a and all strings consisting of zero or more a’s followed by one b. 20
  • 21. Regular Definition • If Σ is an alphabet of basic symbols then a regular definition is a sequence of the following form: d1r1 d2r2 …….. dnrn where • Each di is a new symbol such that di  Σ and di dj where j < I • Each ri is a regular expression over Σ  {d1,d2,…,di-1) 21
  • 23. Addition Notation / Shorthand 23
  • 24. UnsignedNumber 1240, 39.45, 6.33E15, or 1.578E-41 digit  0 | 1 | 2 | … | 9 digits  digit digit* optional_fraction  . digits |  optional_exponent  ( E ( + | -| ) digits) |  num  digits optional_fraction optional_exponent digit  0 | 1 | 2 | … | 9 digits  digit+ optional_fraction  (. digits ) ? optional_exponent  ( E ( + | -) ? digits) ? num  digits optional_fraction optional_exponent Shorthand 24
  • 25. Token Recognition How can we use concepts developed so far to assist in recognizing tokens of a source language ? Assume Following Tokens: if, then, else, relop, id, num Given Tokens, What are Patterns ? if  if then  then else  else relop  < | <= | > | >= | = | <> id  letter ( letter | digit )* num  digit + (. digit + ) ? ( E(+ | -) ? digit + ) ? Grammar: stmt  |if expr then stmt |if expr then stmt else stmt | expr  term relop term | term term  id | num 26
  • 26. What Else Does Lexical Analyzer Do? Scan away blanks, new lines, tabs Can we Define Tokens For These? blank  blank tab  tab newline  newline delim  blank | tab | newline ws  delim + In these cases no token is returned to parser 27
  • 27. Overall ws if then else id num < <= = < > > >= - if then else id num relop relop relop relop relop relop - - - - pointer to table entry Exact value LT LE EQ NE GT GE Note: Each token has a unique token identifier to define category of lexemes 28
  • 28. Constructing Transition Diagrams for Tokens • Transition Diagrams (TD) are used to represent the tokens • As characters are read, the relevant TDs are used to attempt to match lexeme to a pattern • Each TD has: • States : Represented by Circles • Actions : Represented by Arrows between states • Start State : Beginning of a pattern (Arrowhead) • Final State(s) : End of pattern (Concentric Circles) • Edges: arrows connecting the states • Each TD is Deterministic (assume) - No need to choose between 2 different actions ! 29
  • 29. Example TDs start other => 0 6 7 8 * RTN(GT) RTN(GE)> = : We’ve accepted “>” and have read one extra char that must be unread. 30
  • 30. Example : All RELOPs start < 0 other = 6 7 8 return(relop, LE) 5 4 > = 1 2 3 other > = * * return(relop, NE) return(relop, LT) return(relop, EQ) return(relop, GE) return(relop, GT) 31
  • 31. Example TDs : id and delim id : delim : start delim 28 other 3029 delim * return( get_token(), install_id()) start letter 9 other 1110 letter or digit * Either returns ptr or “0” if reserved 32
  • 32. Example TDs : Unsigned #s 1912 1413 1615 1817 start otherdigit . digit E + | - digit digit digit digit E digit * start digit 25 other 2726 digit * start digit 20 * . 21 digit 24 other 23 digit digit 22 * Questions: Is ordering important for unsigned #s ? Why are there no TDs for then, else, if ? return(num, install_num()) 33
  • 33. QUESTION : What would the transition diagram (TD) for strings containing each vowel, in their strict lexicographical order, look like? 34
  • 34. Answer cons  B | C | D | F | G | H | J | … | N | P | … | T | V | .. | Z string  cons* A cons* E cons* I cons* O cons* U cons* otherUOIEA consconsconsconsconscons start error accept Note: The error path is taken if the character is other than a cons or the vowel in the lex order. 35
  • 35. Capturing Multiple Tokens Capturing keyword “begin” Capturing variable names What if both need to happen at the same time? b e g i n WS WS – white space A – alphabetic AN – alphanumericA AN WS start start 36
  • 36. Capturing Multiple Tokens b e g i n WS WS – white space A – alphabetic AN – alphanumeric A-b AN WS AN Machine is much more complicated – just for these two tokens! start 37
  • 37. Finite State Automata (FSAs) • “Finite State Machines”, “Finite Automata”, “FA” • A recognizer for a language is a program that takes as input a string x and answers “yes” if x is a sentence of the language and “no” otherwise. • The regular expression is compiled into a recognizer by constructing a generalized transition diagram called a finite automaton. • Each state is labeled with a state name • Directed edges, labeled with symbols • Two types • Deterministic (DFA) • Non-deterministic (NFA) 38
  • 38. Nondeterministic Finite Automata A nondeterministic finite automaton (NFA) is a mathematical model that consists of 1. A set of states S 2. A set of input symbols  3. A transition function that maps state/symbol pairs to a set of states 4. A special state s0 called the start state 5. A set of states F (subset of S) of final states INPUT: string OUTPUT: yes or no 39
  • 39. Example – NFA : (a|b)*abb S = { 0, 1, 2, 3 } s0 = 0 F = { 3 }  = { a, b } start 0 3b21 ba a b s t a t e i n p u t 0 1 2 a b { 0, 1 } -- { 2 } -- { 3 } { 0 }  (null) moves possible ji  Switch state but do not use any input symbol Transition Table 40
  • 40. How Does An NFA Work ? start 0 3b21 ba a b • Given an input string, we trace moves • If no more input & in final state, ACCEPT EXAMPLE: Input: ababb move(0, a) = 1 move(1, b) = 2 move(2, a) = ? (undefined) REJECT ! move(0, a) = 0 move(0, b) = 0 move(0, a) = 1 move(1, b) = 2 move(2, b) = 3 ACCEPT ! -OR- 41
  • 41. Handling Undefined Transitions We can handle undefined transitions by defining one more state, a “death” state, and transitioning all previously undefined transition to this death state. start 0 3b21 ba a b 4 a, b a a  42
  • 42. Other Concepts start 0 3b21 ba a b Not all paths may result in acceptance. aabb is accepted along path : 0  0  1  2  3 BUT… it is not accepted along the valid path: 0  0  0  0  0 43
  • 43. Deterministic Finite Automata A DFA is an NFA with the following restrictions: •  moves are not allowed • For every state s S, there is one and only one path from s for every input symbol a  . Since transition tables don’t have any alternative options, DFAs are easily simulated via an algorithm. s  s0 c  nextchar; while c  eof do s  move(s,c); c  nextchar; end; if s is in F then return “yes” else return “no” 44
  • 44. Example – DFA : (a|b)*abb start 0 3b21 ba a b start 0 3b21 ba b a b a a What Language is Accepted? Recall the original NFA: 45
  • 45. Relation between RE, NFA and DFA 1. There is an algorithm for converting any RE into an NFA. 2. There is an algorithm for converting any NFA to a DFA. 3. There is an algorithm for converting any DFA to a RE. These facts tell us that REs, NFAs and DFAs have equivalent expressive power. All three describe the class of regular languages. 46
  • 46. NFA vs DFA • An NFA may be simulated by algorithm, when NFA is constructed from the R.E • Algorithm run time is proportional to |N| * |x| where |N| is the number of states and |x| is the length of input • Alternatively, we can construct DFA from NFA and uses it to recognize input • The space requirement of a DFA can be large. The RE (a+b)*a(a+b)(a+b)….(a+b) [n-1 (a+b) at the end] has no DFA with less than 2n states. Fortunately, such RE in practice does not occur often space required O(|r|) O(|r|*|x|) O(|x|)O(2|r|)DFA NFA time to simulate where |r| is the length of the regular expression. 47