SlideShare ist ein Scribd-Unternehmen logo
1 von 116
Downloaden Sie, um offline zu lesen
Department of CE
Prof. Happy Chapla
Unit no : 2
Lexical Analyser
LexicalAnalyser
CD:COMPILER DESIGN
Outline :
Introduction to Lexical Analyser
Tokens, Lexemes, and Patterns
Specification of Tokens
Regular expression and Regular Definition
Transition Diagram
Hard coding and automatic generation of lexical analysers
Finite Automata
Regular expression to NFA using Thompson’s rule
NFA to DFA conversion using subset construction method
DFA Optimization
Regular expression to DFA conversion
Department of CE
Unit no : 2
Lexical Analyser
Prof. Happy Chapla
Role of
Lexical
Analyser
 Remove comments and white spaces in the form of
blanks, tabs, and newline characters (aka scanning)
 Macros expansion
 Read input characters from the source program
 Group them into lexemes
 Produce as output a sequence of tokens
 Interact with the symbol table
 Correlate error messages generated by the compiler
with the source program
Scanner –
Parser
Interaction
 After receiving a “Get next token” command from
parser, the lexical analyzer reads the input character
until it can identify the next token.
Why
separating
Lexical and
Syntactic?
 Simplicity of design
 Improved compiler efficiency
 Allows us to use specialized technique for lexer, not
suitable for parser
 Higher portability
 Input-device-specific peculiarities restricted to lexer
Tokens,
Lexemes and
Patterns
 Lexeme:
A lexeme is a sequence of characters in the source
program that is matched by the pattern for a token.
 Pattern:
A set of strings in the input for which the same token is
produced as output. This set of strings is described by a
rule called a pattern associated with the token.
 Token:
Token is a sequence of characters that can be treated as
a single logical entity. Typical tokens
are,
1) Identifiers 2) keywords 3) operators 4) special
symbols 5)constants
Example
Token lexeme pattern
else else characters e, l, s, e
if if characters i, f
comparision <=, < >, >=, > < or <= or = or < > or >=
id pi, s1, Mj5 letter
followed by letters & digit
num 3.14, 0,
3.09e17
any numeric constant
literal "core" any character b/w “and
“except"
Token
Classes
We have different token classes are possible:
 One token per keyword
 Tokens for the operators
 One token representing all identifiers
 Tokens representing constants (e.g. numbers)
 Tokens for punctuation symbols
Example
In C program, the variable declaration line as:
int value = 100;
 int (keyword)
 value(identifier)
 = (operator)
 100 (constant)
 ; (symbol)
Exercise:
if(y <= t)
y = y – 3;
Example
Total = Ans + 30
 Tokens:
 Total : Identifier 1
 = : Operator 1
 Ans : Identifier 2
 + : Operator 2
 30 : Constant 1
 Lexems:
 Lexems of identifiers : Total, Ans
 Lexems of operators : =, +
 Lexems of constant : 30
Dealing with
errors
How lexical analyser deals with errors?
 Lexical analyser unable to proceed: no pattern
matches
 Panic mode recovery: delete successive characters
from remaining input until token found
 Insert missing character
 Delete a character
 Replace character by another
 Transpose two adjacent characters
Specification of tokens
Specification
of tokens
 There are 3 specifications of tokens:
 Strings
 Language
 Regular expression
 Strings and Languages:
 An alphabet or character class is a finite set of
symbols.
 A string over an alphabet is a finite sequence of
symbols drawn from that alphabet.
 A language is any countable set of strings over
some fixed alphabet.
Operations of
Strings
 A prefix of string s is any string obtained by removing
zero or more symbols from the end of string s. For
example, ban is a prefix of banana.
 A suffix of string s is any string obtained by removing zero
or more symbols from the beginning of s. For example,
nana is a suffix of banana.
 A substring of s is obtained by deleting any prefix and any
suffix from s. For example, nan is a substring of banana.
 The proper prefixes, suffixes, and substrings of a string
s are those prefixes, suffixes, and substrings, respectively
of s that are not ε or not equal to s itself.
 A subsequence of s is any string formed by deleting zero
or more not necessarily consecutive positions of s. For
example, baan is a subsequence of banana.
Exercise
 Write prefix, suffix, substring, proper prefix, proper
suffix and subsequence for following strings:
 Example
 Revolution
Operations of
Languages
The following are the operations that can be applied to
languages:
(Applying these operations on L and S)
 Union (L U S) : {𝑡 | 𝑡 𝑖𝑠 𝑖𝑛 𝐿 𝑜𝑟 𝑡 𝑖𝑠 𝑖𝑛 𝑆 }
 Concatenation (LS) : {𝑡𝑧 | 𝑡 𝑖𝑠 𝑖𝑛 𝐿 𝑎𝑛𝑑 𝑧 𝑖𝑠 𝑖𝑛 𝑆 }
 Kleene Closure (L*) :
𝐿
∗
𝑑𝑒𝑛𝑜𝑡𝑒𝑠 “𝑧𝑒𝑟𝑜 𝑜𝑟 𝑚𝑜𝑟𝑒 𝑐𝑜𝑛𝑐𝑎𝑡𝑒𝑛𝑎𝑡𝑖𝑜𝑛 𝑜𝑓” 𝐿.
 Positive Closure (L+) :
𝐿
+
𝑑𝑒𝑛𝑜𝑡𝑒𝑠 “𝑜𝑛𝑒 𝑜𝑟 𝑚𝑜𝑟𝑒 𝑐𝑜𝑛𝑐𝑎𝑡𝑒𝑛𝑎𝑡𝑖𝑜𝑛 𝑜𝑓” 𝐿.
Example
Let L = {0, 1} and S = {a, b, c}
 Union : L ∪ S = {0, 1, a, b, c}
 Concatenation : L . S = {0a, 1a, 0b, 1b, 0c, 1c}
 Kleene Closure : L* = {ε, 0, 1, 00,……}
 Positive Closure : L+ = {0, 1, 00,……}
Language:
A language is a set of strings.
Example: Even numbers
Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} (All possible symbols for given language)
L = {0, 2, 4, 6, 8, 10, 12, 14, … } (Elements present in language)
Example: Variable name in C language
Σ = ASCII characters
L = {a, b, c, …, A, B, C, …, _, aa, ab, … }
Regular Language:
• A subset of all languages that can be defined by regular expressions.
• Any character is a regular expression matching itself. (a is a regular expression for
character a)
• ε is a regular expression matching the empty string.
Operations on Regular Language:
If R1 and R2 are two regular expressions, then:
• R1R2 is a regular expression matching the concatenation of the languages.
• R1 | R2: is a regular expression matching the disjunction of the languages.
• R1*: is a regular expression matching the Kleene closure of the language (0 or more
occurrences ).
• (R): is a regular expression matching R.
Example (Let Σ = {a, b})
• The regular expression a|b denotes the language _________.
Ans: {a, b}
• (a|b)(a|b) denotes ___________
Ans: {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Σ.
Another regular expression for the same language is aa|ab|ba|bb.
• a* denotes the language consisting of ______________
Ans: all strings of zero or more a's, that is, {ε,a,aa,aaa,...}
Example (Let Σ = {0, 1})
• (0|1)* denotes the set of all strings _____________
Ans: Strings containing zero or more instances of 0 or 1, that is, all strings of 0's and 1's:
{e, 0, 1, 00, 01, 10, 11,...}.
Another regular expression for the same language is (0*1*)*
• 0|0*1 denotes the language __________
Ans: {0, 1, 01, 001, 0001,...}, that is, the string 0 and all strings consisting of zero or more
0's and ending with 1.
Regular Expression
&
Regular Definition
Regular
Expression
Regular Expression is a sequence of characters that
define a pattern.
Notations:
 One or more instances : +
 Zero or more instances: *
 Zero or one instance: ?
 Alphabets : Σ
 Regular expression r and regular language for it is L(r)
 (abc): “abc” occurred together in a regular expression
 [abc]: a, b, c any one of these or all of these are present in
regular expression
Regular
Expression
Rules to define Regular Expression:
 Φ is a regular expression for empty set.
 ε is a regular expression, and L(ε) is { ε }, that is, the
language whose sole member is the empty string.
 If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression, and
L(a) = {a}.
 Suppose r and s are regular expressions denoting the
languages L(r) and L(s). Then,
 (r)|(s) is a regular expression denoting the language L(r) U
L(s).
 (r)(s) is a regular expression denoting the language L(r)L(s).
c) (r)* is a regular expression denoting (L(r))*.
 (r) is a regular expression denoting L(r).
Regular
Expression
 L = Zero or more occurences of a = a*
 a* = {ε, a, aa, aaa, aaaa,……} (Infinite elements)
 L = One or more occurences of a = a+
 a+ = {a, aa, aaa, aaaa,……} (Infinite elements)
Regular
Expression
Algebraic or identity rules of regular expressions:
(R and S are two regular expressions)
 R + S = S + R + or | is commutative
 (R + S) + T = R + (S + T) + or | is associative
 (RS)T = R(ST) concatenation is associative
 (R + S)T = RT + ST concatenation distributes over |
 ϵ.R = R.ϵ = R ϵ is identity element for concatenation
 R* = (ϵ + R)* relation between * and ϵ
 R** = R* * is idempotent
 RR* = R+
 R? = (R | ϵ) 0 or 1 occurrence
 [a – z] (a|b|…|z) 1 character from the given range
 [acdgj] (a|c|d|g|) 1 of the given characters
Exercise: (For given RE define its meaning or list down which are the valid string
for that RE)
• [012]+
Ans: (0|1|2)+ String can have one or more instances of 0 or 1 or 2. All strings of 0’s and
1’s and 2’s: {0, 1, 2, 00, 01, 02, 10, 11, 12,….}
• [0 – 9]+
Ans: All possible combinations of strings containing elements from 0 to 9. Atleast one
instance is required.
• [1 – 9][0 – 9]+
Ans: Starting symbol must be any number from 1 to 9. After that it can have any number
of instances of any number from 0 to 9.
Resultant set will have strings of length at least 2.
• [a – z A – Z][a – z A – Z 0 – 9]*
Ans: Starting with alphabets and string of length 1 {a, z, A, C} are also valid strings here.
Precedence
and
Associativity
 The unary operator * has highest precedence and is
left associative.
 Concatenation has second highest precedence and is
left associative.
 | has lowest precedence and is left associative.
Examples (Regular Expression)
Language String Regular Expression
0 or 1 0, 1 0 | 1
1 or 10 or 111 1, 10, 111 1 | 10 | 111
Strings having one or more 0 0, 00, 000, 0000,…… 0+
All possible binary strings
over Σ = {0, 1}
0, 1, 00, 01, 10, 11,
000,……
(0 | 1)+
All possible strings of length 3
over Σ = {a, b, c}
aaa, aba, abc, abb,…… (a|b|c) (a|b|c) (a|b|c)
Language String Regular Expression
One or more occurrences of
0 or 1 or both
0, 1, 00, 01, 10, 11, 111,
101,……
(0 | 1)+
Binary string ending with 0 0, 10, 100, 110, 00, 010,…… (0 | 1)* 0
Binary string starting with 1 1, 10, 100, 110, 101,
1101,……
1 (0 | 1)*
Binary string starting with 1
and ending with 0
10, 110, 100, 110, 1100,
1110, 1000,……
1 (1|0)* 0
String starting and ending
with same character for Σ =
{0, 1}
00, 11, 010, 000, 101, 1101,
0110, 1011,……
1 (1|0)* 1 or 0 (1|0)* 0
String ending with 01 for Σ
= {0, 1}
01, 101, 001, 1001,
1101,……
(0|1)*01
Language String Regular Expression
Language consisting of
exactly two 0’s for Σ = {0, 1}
00, 010, 000, 001, 0101,…… 1*01*01*
All binary strings with
length at least 3 for Σ = {0,
1}
000, 010, 110, 1111,
1011,……
(0|1) (0|1) (0|1) (0|1)*
All binary strings where 2nd
symbol from starting is 0
for Σ = {0, 1}
00, 10, 101, 100,…… (0|1)0(0|1)*
Any number of a’s followed
by any number of b’s
followed by any number of
c’s
ε, abc, aaabc, abbbbbc,
abccc, ab, accc,……
a*b*c*
Exercise
Write regular expression for language specified
Over Σ = {0, 1}
 Strings having even length
Ans: RE: ((0|1)(0|1))*
 String containing exactly three 1’s
Ans: RE: 0*10*10*10*
 String starting with 0 and having odd length
Ans: RE: 0((0|1)(0|1))*
 String starting or ending with 01 or 111
Ans: RE: (01|111)(0|1)* | (0|1)*(01|111)
Regular
Definition
 For notational convenience, we may give names to certain
regular expressions and use those names in subsequent
expressions, as if the names were themselves symbols.
 These names are known as regular definition.
 Regular definition is a sequence of definitions of the form:
𝑑1 → 𝑟1
𝑑2 → 𝑟2
……
𝑑𝑛 → 𝑟𝑛
Where 𝑑𝑖 is a distinct name & 𝑟𝑖 is a regular expression.
Example
Regular definition for identifier
 letter → A|B|C|………..|Z|a|b|………..|z
 digit → 0|1|…….|9
 id → letter (letter | digit)*
Regular Definition for Even Numbers
 (+|-|ε) (0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)
 Sign → + | -
 OptSign → Sign | ε //In other words (Sign ?)
 Digit → [0 – 9] //In other words (0 | 1 | ….. | 9)
 EvenDigit → [02468] //In other words (0 | 2 | 4 | 6 | 8)
 EvenNumber → OptSign Digit* EvenDigit
Example
Regular Definition for Unsigned Numbers
 digit → 0 | 1 | ..... | 9
 digits → digit digit*
 optionalFraction → .digits | ε
 optionalExponent →( E ( + | - | ε ) digits ) | ε
 number → digits optionalFraction optionalExponent
This can be simplified as:
 digit → [0-9]
 digits → digit+
 number → digits (. digits)? ( E [+ | -]? digits )?
Transition Diagram
Transition
Diagram
 Transition diagram is a special kind of flowchart for
language analysis.
 In transition diagram the boxes of flowchart are
drawn as circle and called as states.
 States are connected by arrows called as edges.
 The label or weight on edge indicates the input
character that can appear after that state
Transition
Diagram
 Symbolized representation of transition diagram uses:
is a state
is a transition
is a start state
is a final state
<
0
6
1 2
3
4
5
8
7
=
other
>
=
other
=
>
Return (relop,LE)
Return (relop,NE)
Return (relop,LT)
Return (relop,GE)
Return (relop,GT)
Return (relop,EQ)
Transition Diagram : Example
Transition Diagram for Relational Operators
8
7
4
3
2
5
1 2 8
other
digit
3 4 5 6 7
digit
digit
digit
+or -
digit
digit
E
.
start
3
5280
39.37
1.894 E - 4
2.56 E + 7
45 E + 6
96 E 2
Transition Diagram : Example
Transition Diagram for Unsigned Numbers
E digit
8
Hard Coding
and
Automatic Generation
of
Lexical Analyser
 Lexical analyser helps in identifying the pattern from the input.
 Transition diagram is constructed to recognize the patterns. It is known as hard
coding lexical analyzer.
Example:
 To represent identifier in ‘C’, the first character must be letter and other characters
are either letter or digits. To recognize this pattern, hard coding lexical analyzer
works with this transition diagram:
 Lex and flex are compiler tools which takes regular expression as an input and finds
out the pattern, matching to that expression.
2 3
Start
Letter or digit
Letter
1
Finite Automata
Implementing Regular Expression:
• Regular expressions can be implemented using finite
automata.
• There are two kinds of finite automata:
• NFAs (nondeterministic finite automata)
• DFAs (deterministic finite automata)
The step of implementing the lexical analyzer
1. Lexical Specification
2. Regular Expression
3. NFA
4. DFA
5. Table-Driven DFA
Finite State
Automaton
Finite State
Automaton
A finite set of states present in FSM
• One marked as initial state
• One or more marked as final states
• States sometimes labeled or numbered
A set of transitions from one state to another
• Each labeled with symbol from Σ (possible symbols), or ε
Operate by reading input symbols
• Transition can be taken if labelled with current symbol
• ε-transition can be taken at any time
Accept when final state reached & no more input
Reject if no transition possible, or no more input and not in
final state (DFA)
Finite
Automata
 We call the recognizer of the tokens as a finite
automaton.
 FA results in “yes” or “no” based on each input string.
 FA consist of:
 S : Set of states
 𝜮 : Set of input symbol
 move : A transition function
 S0 : Initial state
 F : Accepting state (Final state)
Finite
Automata
 A finite automaton can be: deterministic (DFA) or
non-deterministic (NFA)
 Both deterministic and non-deterministic finite
automaton recognize regular sets.
 Deterministic – faster recognizer, but it may take
more space
 Non-deterministic – slower, but it may take less space
 Deterministic automatons are widely used lexical
analysers.
 Deterministic finite automata (DFA): From each state exactly one edge leaving out
(for each symbol).
 Nondeterministic finite automata (NFA): There are no restrictions on the edges
leaving a state. There can be several with the same symbol as label and some edges
can be labeled with 𝜖.
1 2 3 4
a b b
a
b
1 2 3 4
a b b
a
a
a
b
DFA NFA
b
Regular Expression to NFA
(Thompson’s Rule)
Base Cases
(For NFA)
Construction
for s|t
Construction
for st
Construction
for s*
Example
(RE to NFA)
1 4
𝜖 𝜖
𝜖
𝜖
2 3
𝑎
1 2 3
a b
1
2
5
3
4
6
a
b
𝜖
𝜖 𝜖
𝜖
a*
ab
(a|b)
Example
(RE to NFA)
1 4
𝜖 𝜖
𝜖
𝜖
2 3
𝑎
1 4
𝜖 𝜖
𝜖
𝜖
2 3
𝑏
6
5
𝑎 𝑏
5
𝑏
a*b
b*ab
Example
(RE to NFA)
1
2
5
3
6
𝜖
𝜖 𝜖
𝜖
4
d
c
𝜖
𝜖
𝜖
0 7
𝜖
(c|d)*
Exercise
 abab
 a*|b*
 abb(a)*ba
 (ab + b)*ba
 (a + b)*aa(a+b)
NFA to DFA conversion
(Using subset construction method)
Subset
Construction
Algorithm
 Input : NFA (N)
 Output : DFA (D) (Accepting the same language)
 Method : Apply Algorithm, Make Tansition Table, Dtran for
DFA .
Operations to perform:
Operation Description
Є – closure(s) Set of NFA States reachable from NFA State s on Є –
transition alone.
Є – closure(T) Set of NFA States reachable from some NFA State s in T on Є
–transition alone.
Move (T,a) Set of NFA states to which there is a transition on input
symbol a from some NFA state s in T.
Subset
Construction
Algorithm
Initially Є –closure (s0) be the only state in Dstates and it is
unmarked;
While there is unmarked states in T in Dstates do begin
Mark T;
for each input symbol a do begin
U = Є –closure (move (T,a));
If U is not in Dstates then
add U as unmarked state to Dstates;
Dtran [ T, a ] = U
end
end
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
Є closure (0) = {0,1,2,4,7}  A
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
Є closure (0) = {0,1,2,4,7}  A
State a b
A = {0,1,2,4,7}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
A = {0,1,2,4,7}
Move (A,a) = {3,8}
State a b
A = {0,1,2,4,7}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
A = {0,1,2,4,7}
Move (A,a) = {3,8}
State a b
A = {0,1,2,4,7}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
A = {0,1,2,4,7}
Move (A,a) = {3,8}
Є closure (Move(A,a)) = {3,6,7,1,2,4,8}
= {1,2,3,4,6,7,8}  B
State a b
A = {0,1,2,4,7} B
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
A = {0,1,2,4,7}
Move (A,b) = {5}
State a b
A = {0,1,2,4,7} B
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
A = {0,1,2,4,7}
Move (A,b) = {5}
Є closure (Move(A,b)) = {5,6,7,1,2,4}
= {1,2,4,5,6,7}
State a b
A = {0,1,2,4,7} B
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
A = {0,1,2,4,7}
Move (A,b) = {5}
Є closure (Move(A,b)) = {5,6,7,1,2,4}
= {1,2,4,5,6,7}  C
State a b
A = {0,1,2,4,7} B C
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
B = {1,2,3,4,6,7,8}
Move (B,a) = {3,8}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
B = {1,2,3,4,6,7,8}
Move (B,a) = {3,8}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
B = {1,2,3,4,6,7,8}
Move (B,a) = {3,8}
Є Closure (Move (B,a)) = { 3,6,7,1,2,4,8}
= {1,2,3,4,6,7,8}  B
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
B = {1,2,3,4,6,7,8}
Move (B,b) = {5,9}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
B = {1,2,3,4,6,7,8}
Move (B,b) = {5,9}
Є Closure (Move (B,b)) = {5,6,7,1,2,4,9}
= {1,2,4,5,6,7,9}  D
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
C = {1,2,4,5,6,7}
Move (C,a) = {3,8}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
C = {1,2,4,5,6,7}
Move (C,a) = {3,8}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
C = {1,2,4,5,6,7}
Move (C,a) = {3,8}
Є Closure (Move (c,a)) = {3,6,7,1,2,4,8}
= {1,2,3,4,6,7,8}  B
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
C = {1,2,4,5,6,7}
Move (C,b) = {5}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
C = {1,2,4,5,6,7}
Move (C,b) = {5}
Є Closure (Move (C,b)) = {5,6,7,1,2,4}
= {1,2,4,5,6,7}  C
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
C = {1,2,4,5,6,7}
Move (C,b) = {5}
Є Closure (Move (C,b)) = {5,6,7,1,2,4}
= {1,2,4,5,6,7}  C
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
D = {1,2,4,5,6,7,9}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9}
Move (D,a) = {3,8}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
D = {1,2,4,5,6,7,9}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9}
Move (D,a) = {3,8}
Є Closure (Move (D,a)) = {3,6,7,1,2,4,8}
= {1,2,3,4,6,7,8}  B
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
D = {1,2,4,5,6,7,9}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B
Move (D,b) = {5,10}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
D = {1,2,4,5,6,7,9}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B
Move (D,b) = {5,10}
Є Closure (Move (D,b)) = {5,6,7,1,2,4,10}
= {1,2,4,5,6,7,10}  E
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
D = {1,2,4,5,6,7,9}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B E
Move (D,b) = {5,10}
Є Closure (Move (D,b)) = {5,6,7,1,2,4,10}
= {1,2,4,5,6,7,10}  E
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
E = {1,2,4,5,6,7,10}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B E
E = {1,2,4,5,6,7,10}
Move (E,a) = {3,8}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
E = {1,2,4,5,6,7,10}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B E
E = {1,2,4,5,6,7,10}
Move (E,a) = {3,8}
Є Closure (Move (E,a)) = {3,6,7,1,2,4,8}
= {1,2,3,4,6,7,8}  B
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
E = {1,2,4,5,6,7,10}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B E
E = {1,2,4,5,6,7,10} B
Move (E,b) = {5}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
E = {1,2,4,5,6,7,10}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B E
E = {1,2,4,5,6,7,10} B
Move (E,b) = {5}
Example : (a|b)*abb
0 1
2 3
4 5
6 7 8 9 10
Є
Є
Є
Є
Є
Є
a
b
a b b
Є
Є
E = {1,2,4,5,6,7,10}
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B E
E = {1,2,4,5,6,7,10} B C
Move (E,b) = {5}
Example : (a|b)*abb
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B E
E = {1,2,4,5,6,7,10} B C
Convert NFA to DFA : (a|b)*abb
Є –closure (0) = {0,1,2,4,7}  Let A
Move (A,a) = {3,8}
Є –closure ((A,a)) = {1,2,3,4,6,7,8}  Let B
Move (A,b) = {5}
Є –closure ((A,b)) = {1,2,4,5,6,7}  Let C
Move (B,a) = {3,8}
Є –closure ((B,a)) = {1,2,3,4,6,7,8}  Let B
Move (B,b) = {5,9}
Є –closure ((B,b)) = {1,2,4,5,6,7,9}  Let D
Move (C,a) = {3,8}
Є –closure ((C,a)) = {1,2,3,4,6,7,8}  Let B
Move (C,b) = {5}
Є –closure ((C,b)) = {1,2,4,5,6,7}  Let C
Convert NFA to DFA : (a|b)*abb (Cont…)
Move (D,a) = {3,8}
Є –closure ((D,a)) = {1,2,3,4,6,7,8}  Let B
Move (D,b) = {5,10}
Є –closure ((D,b)) = {1,2,4,5,6,7,10}  Let E
Move (E,a) = {3,8}
Є –closure ((E,a)) = {1,2,3,4,6,7,8}  Let B
Move (E,b) = {5}
Є –closure ((E,b)) = {1,2,4,5,6,7}  Let C
Convert NFA to DFA : (a|b)*abb (Cont…)
State a b
A = {0,1,2,4,7} B C
B= {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B E
E = {1,2,4,5,6,7,10} B C
Convert NFA to DFA : (a|b)*abb (Cont…)
Example
 Convert following regular expression to DFA using
subset construction method:
 (0 + 1)*1(0 + 1)
 (0 + 1)*01*
DFA Optimization
DFA
Optimization
 The procedure can also be known as minimization of DFA.
 Minimization/optimization refers to the detection of those
states of a DFA, whose presence or absence in a DFA does not
affect the language accepted by the automata.
 The states that can be eliminated from automata, without
affecting the language accepted by automata, are:
 Unreachable or inaccessible states
 Dead states
 Non-distinguishable or indistinguishable state or equivalent
states.
 Partitioning algorithm helps in DFA optimization process.
Partitioning
Algorithm
1. Remove all the states that are unreachable from initial
state via any set of transition of DFA.
2. Draw the transition table for all pair of states.
3. Now, split the transition table into two tables T1 and T2.
1. T1 contains all final states
2. T2 contains non-final states
4. Find similar rows from T1 such that;
𝛿(q, a) = p
𝛿(r, a) = p
i.e. find the two states which have same value of a and b and
remove one of them
Continued…
5. Repeat step 3 until we find no similar rows available in T1
6. Repeat step 3 and step 4 for table T2 also.
7. Now combine the reduced T1 and T2 tables.
i.e. the final transition table of minimized DFA.
DFA Optimization
A B C
B B D
C B C
D B E
E B C
States a b
{𝐴, 𝐵, 𝐶, 𝐷, 𝐸}
Nonaccepting States
{𝐴, 𝐵, 𝐶, 𝐷}
Accepting States
{𝐸}
{𝐴, 𝐵, 𝐶} {𝐷}
{𝐴, 𝐶} {𝐵}
• Now no more splitting is possible.
• If we chose A as the representative for group
(AC), then we obtain reduced transition
table
A B A
B B D
D B E
E B A
States a b
Optimized
TransitionTable
Conversion from regular
expression to DFA
Function
computed
from syntax
tree
 nullable (n): Is true for * node and node labeled with Ɛ. For
other nodes it is false.
 firstpos (n): Set of positions at node ti that corresponds to
the first symbol of the sub-expression rooted at n.
 lastpos (n): Set of positions at node ti that corresponds to
the last symbol of the sub-expression rooted at n.
 followpos (i): Set of positions that follows given position by
matching the first or last symbol of a string generated by
sub-expression of the given regular expression.
Rules to compute nullable, firstpos, lastpos
Node n nullable(n) firstpos(n) lastpos(n)
A leaf labeled by  true ∅ ∅
A leaf with position 𝒊 false {𝑖} {𝑖}
nullable(c1)
or
nullable(c2)
firstpos(c1)

firstpos(c2)
lastpos(c1)

lastpos(c2)
|
n
c1
nullable(c1)
and
nullable(c2)
if (nullable(c1))
thenfirstpos(c1) 
firstpos(c2)
else firstpos(c1)
if (nullable(c2)) then
lastpos(c1)  lastpos(c2)
else lastpos(c2)
n
true firstpos(c1) lastpos(c1)
∗
n
c2
.
c1 c2
c1
Computation
of followpos
The position of regular expression can follow another in the
following ways:
 If n is a cat node with left child c1 and right child c2, then for
every position i in lastpos(c1), all positions in firstpos(c2) are
in followpos(i).
 For cat node, for each position i in lastpos of its left
child, the firstpos of its right child will be in followpos(i).
 If n is a star node and i is a position in lastpos(n), then all
positions in firstpos(n) are in followpos(i).
 For star node, the firstpos of that node is in f ollowpos of all
positions in lastpos of that node.
Conversion from regular expression to DFA
𝑎 𝑏
|
∗
.
𝟏 𝟐
𝑎
.
.
.
𝑏
𝑏
#
(a|b)*abb
𝟒
𝟑
𝟓
𝟔
#
Step 2: Nullable node
Here, * is only nullable node
Step 1: Construct SyntaxTree
Conversion from regular expression to DFA
𝑎 𝑏
|
∗
.
{1} {2}
{1,2}
{1,2} 𝑎
{3}
{1,2,3}
.
{4}
{1,2,3}
.
{5}
{1,2,3}
.
{6}
{1,2,3}
𝑏
𝑏
#
Step 3: Calculate firstpos
Firstpos
A leaf with position 𝒊 = {𝒊}
|
n
c1 c2
firstpos(c1)  firstpos(c2)
∗
n
c1
firstpos(c1)
if (nullable(c1))
firstpos(c1)  firstpos(c2)
else firstpos(c1)
.
n
c1 c2
𝟏 𝟐
𝟒
𝟑
𝟓
𝟔
Conversion from regular expression to DFA
𝑎 𝑏
|
∗
.
{1} {1} {2} {2}
{1,2} {1,2}
{1,2} {1,2} 𝑎
{3} {3}
{1,2,3} {3}
.
{4} {4}
{1,2,3} {4}
.
{5} {5}
{1,2,3} {5}
.
{6} {6}
{1,2,3} {6}
𝑏
𝑏
#
Step 3: Calculate lastpos
𝟏 𝟐
𝟒
𝟑
𝟓
𝟔
Lastpos
A leaf with position 𝒊 = {𝒊}
|
n
c1 c2
lastpos(c1)  lastpos(c2)
∗
n
c1
lastpos(c1)
if (nullable(c2))
lastpos(c1)  lastpos(c2)
else lastpos(c2)
.
n
c1 c2
Conversion from regular expression to DFA
Position followpos
𝑎 𝑏
|
∗
.
{1} {1} {2} {2}
{1,2} {1,2}
{1,2} {1,2} 𝑎
{3} {3}
{1,2,3} {3}
.
{4} {4}
{1,2,3} {4}
.
{5} {5}
{1,2,3} {5}
.
{6} {6}
{1,2,3} {6}
𝑏
𝑏
#
Step 4: Calculate followpos
𝟏 𝟐
𝟒
𝟑
𝟓
𝟔
5 6
{1,2,3} {5}
.
{6} {6}
𝒄𝟏 𝒄𝟐
𝑖 = 𝑙𝑎𝑠𝑡𝑝𝑜𝑠(𝑐1) = {5}
𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 𝑐2 = 6
𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 5 = 6
Firstpos
Lastpos
Conversion from regular expression to DFA
Position followpos
𝑎 𝑏
|
∗
.
{1} {1} {2} {2}
{1,2} {1,2}
{1,2} {1,2} 𝑎
{3} {3}
{1,2,3} {3}
.
{4} {4}
{1,2,3} {4}
.
{5} {5}
{1,2,3} {5}
.
{6} {6}
{1,2,3} {6}
𝑏
𝑏
#
Step 4: Calculate followpos
𝟏 𝟐
𝟒
𝟑
𝟓
𝟔
5 6
{1,2,3} {4}
.
{5} {5}
𝒄𝟏 𝒄𝟐
𝑖 = 𝑙𝑎𝑠𝑡𝑝𝑜𝑠(𝑐1) = {4}
𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 𝑐2 = 5
𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 4 = 5
4 5
Firstpos
Lastpos
Conversion from regular expression to DFA
Position followpos
𝑎 𝑏
|
∗
.
{1} {1} {2} {2}
{1,2} {1,2}
{1,2} {1,2} 𝑎
{3} {3}
{1,2,3} {3}
.
{4} {4}
{1,2,3} {4}
.
{5} {5}
{1,2,3} {5}
.
{6} {6}
{1,2,3} {6}
𝑏
𝑏
#
Step 4: Calculate followpos
𝟏 𝟐
𝟒
𝟑
𝟓
𝟔
5 6
{1,2,3} {3}
.
{4} {4}
𝒄𝟏 𝒄𝟐
𝑖 = 𝑙𝑎𝑠𝑡𝑝𝑜𝑠(𝑐1) = {3}
𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 𝑐2 = 4
𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 3 = 4
4 5
3 4
Firstpos
Lastpos
Conversion from regular expression to DFA
Position followpos
𝑎 𝑏
|
∗
.
{1} {1} {2} {2}
{1,2} {1,2}
{1,2} {1,2} 𝑎
{3} {3}
{1,2,3} {3}
.
{4} {4}
{1,2,3} {4}
.
{5} {5}
{1,2,3} {5}
.
{6} {6}
{1,2,3} {6}
𝑏
𝑏
#
Step 4: Calculate followpos
𝟏 𝟐
𝟒
𝟑
𝟓
𝟔
5 6
{1,2} {1,2}
.
{3} {3}
𝒄𝟏 𝒄𝟐
𝑖 = 𝑙𝑎𝑠𝑡𝑝𝑜𝑠(𝑐1) = {1,2}
𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 𝑐2 = 3
𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 1 = 3
𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 2 = 3
4 5
3 4
2 3
1 3
Firstpos
Lastpos
Conversion from regular expression to DFA
Position followpos
𝑎 𝑏
|
∗
.
{1} {1} {2} {2}
{1,2} {1,2}
{1,2} {1,2} 𝑎
{3} {3}
{1,2,3} {3}
.
{4} {4}
{1,2,3} {4}
.
{5} {5}
{1,2,3} {5}
.
{6} {6}
{1,2,3} {6}
𝑏
𝑏
#
Step 4: Calculate followpos
𝟏 𝟐
𝟒
𝟑
𝟓
𝟔
5 6
𝒏
𝑖 = 𝑙𝑎𝑠𝑡𝑝𝑜𝑠(𝑛) = {1,2}
𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 𝑛 = 1,2
𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 1 = 1,2
𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 2 = 1,2
4 5
3 4
2 3
1 3
{1,2} {1,2}
*
1,2,
1,2,
Firstpos
Lastpos
Construct DFA
Initial state = 𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 of root = {1,2,3} ----- A
State A
δ( (1,2,3),a) = followpos(1) U followpos(3)
=(1,2,3) U (4) = {1,2,3,4} ----- B
δ( (1,2,3),b) = followpos(2)
=(1,2,3) ----- A
Position followpos
5 6
4 5
3 4
2 1,2,3
1 1,2,3
States a b
A={1,2,3} B A
B={1,2,3,4}
Construct DFA
State B
δ( (1,2,3,4),a) = followpos(1) U followpos(3)
=(1,2,3) U (4) = {1,2,3,4} ----- B
δ( (1,2,3,4),b) = followpos(2) U followpos(4)
=(1,2,3) U (5) = {1,2,3,5} ----- C
State C
δ( (1,2,3,5),a) = followpos(1) U followpos(3)
=(1,2,3) U (4) = {1,2,3,4} ----- B
δ( (1,2,3,5),b) = followpos(2) U followpos(5)
=(1,2,3) U (6) = {1,2,3,6} ----- D
Position followpos
5 6
4 5
3 4
2 1,2,3
1 1,2,3
States a b
A={1,2,3} B A
B={1,2,3,4} B C
C={1,2,3,5} B D
D={1,2,3,6}
Construct DFA
State D
δ( (1,2,3,6),a) = followpos(1) U followpos(3)
=(1,2,3) U (4) = {1,2,3,4} ----- B
δ( (1,2,3,6),b) = followpos(2)
=(1,2,3) ----- A
Position followpos
5 6
4 5
3 4
2 1,2,3
1 1,2,3
States a b
A={1,2,3} B A
B={1,2,3,4} B C
C={1,2,3,5} B D
D={1,2,3,6} B A
A B C D
a b b
b
a
a
b
a
DFA
Construct DFA
Position followpos
5 6
4 5
3 4
2 1,2,3
1 1,2,3
States a b
A={1,2,3} B A
B={1,2,3,4} B C
C={1,2,3,5} B D
D={1,2,3,6} B A
A B C D
a b b
b
a
a
b
a
DFA
Note: Elements of E contains state 10 that is acceptance
state in NFA. So, State E is acceptance state
Thanks
Prof. Happy Chapla

Weitere ähnliche Inhalte

Was ist angesagt?

Compiler Design Lecture Notes
Compiler Design Lecture NotesCompiler Design Lecture Notes
Compiler Design Lecture NotesFellowBuddy.com
 
3.6 &amp; 7. pumping lemma for cfl &amp; problems based on pl
3.6 &amp; 7. pumping lemma for cfl &amp; problems based on pl3.6 &amp; 7. pumping lemma for cfl &amp; problems based on pl
3.6 &amp; 7. pumping lemma for cfl &amp; problems based on plSampath Kumar S
 
SOC Verification using SystemVerilog
SOC Verification using SystemVerilog SOC Verification using SystemVerilog
SOC Verification using SystemVerilog Ramdas Mozhikunnath
 
String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.Malek Sumaiya
 
Lecture-18(11-02-22)Stochastics POS Tagging.pdf
Lecture-18(11-02-22)Stochastics POS Tagging.pdfLecture-18(11-02-22)Stochastics POS Tagging.pdf
Lecture-18(11-02-22)Stochastics POS Tagging.pdfNiraliRajeshAroraAut
 
Regular expressions
Regular expressionsRegular expressions
Regular expressionsRaj Gupta
 
Context free languages
Context free languagesContext free languages
Context free languagesJahurul Islam
 
Automata theory - CFG and normal forms
Automata theory - CFG and normal formsAutomata theory - CFG and normal forms
Automata theory - CFG and normal formsAkila Krishnamoorthy
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free GrammarsMarina Santini
 
Formal Languages and Automata Theory unit 2
Formal Languages and Automata Theory unit 2Formal Languages and Automata Theory unit 2
Formal Languages and Automata Theory unit 2Srimatre K
 
6. Integrity and Security in DBMS
6. Integrity and Security in DBMS6. Integrity and Security in DBMS
6. Integrity and Security in DBMSkoolkampus
 

Was ist angesagt? (20)

Flat unit 3
Flat unit 3Flat unit 3
Flat unit 3
 
Compiler Design Lecture Notes
Compiler Design Lecture NotesCompiler Design Lecture Notes
Compiler Design Lecture Notes
 
3.6 &amp; 7. pumping lemma for cfl &amp; problems based on pl
3.6 &amp; 7. pumping lemma for cfl &amp; problems based on pl3.6 &amp; 7. pumping lemma for cfl &amp; problems based on pl
3.6 &amp; 7. pumping lemma for cfl &amp; problems based on pl
 
Role-of-lexical-analysis
Role-of-lexical-analysisRole-of-lexical-analysis
Role-of-lexical-analysis
 
SOC Verification using SystemVerilog
SOC Verification using SystemVerilog SOC Verification using SystemVerilog
SOC Verification using SystemVerilog
 
String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.
 
Lecture-18(11-02-22)Stochastics POS Tagging.pdf
Lecture-18(11-02-22)Stochastics POS Tagging.pdfLecture-18(11-02-22)Stochastics POS Tagging.pdf
Lecture-18(11-02-22)Stochastics POS Tagging.pdf
 
Unit 4
Unit 4Unit 4
Unit 4
 
1.Role lexical Analyzer
1.Role lexical Analyzer1.Role lexical Analyzer
1.Role lexical Analyzer
 
Relational algebra in dbms
Relational algebra in dbmsRelational algebra in dbms
Relational algebra in dbms
 
Specification-of-tokens
Specification-of-tokensSpecification-of-tokens
Specification-of-tokens
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Lecture 7
Lecture 7Lecture 7
Lecture 7
 
Introduction to Compiler design
Introduction to Compiler design Introduction to Compiler design
Introduction to Compiler design
 
Parse Tree
Parse TreeParse Tree
Parse Tree
 
Context free languages
Context free languagesContext free languages
Context free languages
 
Automata theory - CFG and normal forms
Automata theory - CFG and normal formsAutomata theory - CFG and normal forms
Automata theory - CFG and normal forms
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free Grammars
 
Formal Languages and Automata Theory unit 2
Formal Languages and Automata Theory unit 2Formal Languages and Automata Theory unit 2
Formal Languages and Automata Theory unit 2
 
6. Integrity and Security in DBMS
6. Integrity and Security in DBMS6. Integrity and Security in DBMS
6. Integrity and Security in DBMS
 

Ähnlich wie Chapter2CDpdf__2021_11_26_09_19_08.pdf

compiler Design course material chapter 2
compiler Design course material chapter 2compiler Design course material chapter 2
compiler Design course material chapter 2gadisaAdamu
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)bolovv
 
2_2Specification of Tokens.ppt
2_2Specification of Tokens.ppt2_2Specification of Tokens.ppt
2_2Specification of Tokens.pptRatnakar Mikkili
 
Automata
AutomataAutomata
AutomataGaditek
 
Automata
AutomataAutomata
AutomataGaditek
 
Theory of computing
Theory of computingTheory of computing
Theory of computingRanjan Kumar
 
Theory of Automata ___ Basis ...........
Theory of Automata ___ Basis ...........Theory of Automata ___ Basis ...........
Theory of Automata ___ Basis ...........NaumanAli215439
 
Lecture 02 lexical analysis
Lecture 02 lexical analysisLecture 02 lexical analysis
Lecture 02 lexical analysisIffat Anjum
 
theory of computation lecture 02
theory of computation lecture 02theory of computation lecture 02
theory of computation lecture 028threspecter
 
Regular expression (compiler)
Regular expression (compiler)Regular expression (compiler)
Regular expression (compiler)Jagjit Wilku
 
Regular Expression in Compiler design
Regular Expression in Compiler designRegular Expression in Compiler design
Regular Expression in Compiler designRiazul Islam
 
The Theory of Finite Automata.pptx
The Theory of Finite Automata.pptxThe Theory of Finite Automata.pptx
The Theory of Finite Automata.pptxssuser039bf6
 

Ähnlich wie Chapter2CDpdf__2021_11_26_09_19_08.pdf (20)

Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
compiler Design course material chapter 2
compiler Design course material chapter 2compiler Design course material chapter 2
compiler Design course material chapter 2
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)
 
Regular Expression
Regular ExpressionRegular Expression
Regular Expression
 
2_2Specification of Tokens.ppt
2_2Specification of Tokens.ppt2_2Specification of Tokens.ppt
2_2Specification of Tokens.ppt
 
Automata
AutomataAutomata
Automata
 
Automata
AutomataAutomata
Automata
 
Ch2 automata.pptx
Ch2 automata.pptxCh2 automata.pptx
Ch2 automata.pptx
 
Regular expression (compiler)
Regular expression (compiler)Regular expression (compiler)
Regular expression (compiler)
 
Theory of computing
Theory of computingTheory of computing
Theory of computing
 
Theory of Automata ___ Basis ...........
Theory of Automata ___ Basis ...........Theory of Automata ___ Basis ...........
Theory of Automata ___ Basis ...........
 
Lecture 02 lexical analysis
Lecture 02 lexical analysisLecture 02 lexical analysis
Lecture 02 lexical analysis
 
7645347.ppt
7645347.ppt7645347.ppt
7645347.ppt
 
theory of computation lecture 02
theory of computation lecture 02theory of computation lecture 02
theory of computation lecture 02
 
Flat unit 1
Flat unit 1Flat unit 1
Flat unit 1
 
Regular expression (compiler)
Regular expression (compiler)Regular expression (compiler)
Regular expression (compiler)
 
Ch3
Ch3Ch3
Ch3
 
SS UI Lecture 4
SS UI Lecture 4SS UI Lecture 4
SS UI Lecture 4
 
Regular Expression in Compiler design
Regular Expression in Compiler designRegular Expression in Compiler design
Regular Expression in Compiler design
 
The Theory of Finite Automata.pptx
The Theory of Finite Automata.pptxThe Theory of Finite Automata.pptx
The Theory of Finite Automata.pptx
 

Kürzlich hochgeladen

Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectGayathriM270621
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfalene1
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfisabel213075
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptJohnWilliam111370
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTSneha Padhiar
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSsandhya757531
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESkarthi keyan
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.elesangwon
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork
 
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHTEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHSneha Padhiar
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Romil Mishra
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionSneha Padhiar
 

Kürzlich hochgeladen (20)

Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subject
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdf
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
 
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHTEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based question
 

Chapter2CDpdf__2021_11_26_09_19_08.pdf

  • 1. Department of CE Prof. Happy Chapla Unit no : 2 Lexical Analyser LexicalAnalyser CD:COMPILER DESIGN
  • 2. Outline : Introduction to Lexical Analyser Tokens, Lexemes, and Patterns Specification of Tokens Regular expression and Regular Definition Transition Diagram Hard coding and automatic generation of lexical analysers Finite Automata Regular expression to NFA using Thompson’s rule NFA to DFA conversion using subset construction method DFA Optimization Regular expression to DFA conversion Department of CE Unit no : 2 Lexical Analyser Prof. Happy Chapla
  • 3. Role of Lexical Analyser  Remove comments and white spaces in the form of blanks, tabs, and newline characters (aka scanning)  Macros expansion  Read input characters from the source program  Group them into lexemes  Produce as output a sequence of tokens  Interact with the symbol table  Correlate error messages generated by the compiler with the source program
  • 4. Scanner – Parser Interaction  After receiving a “Get next token” command from parser, the lexical analyzer reads the input character until it can identify the next token.
  • 5. Why separating Lexical and Syntactic?  Simplicity of design  Improved compiler efficiency  Allows us to use specialized technique for lexer, not suitable for parser  Higher portability  Input-device-specific peculiarities restricted to lexer
  • 6. Tokens, Lexemes and Patterns  Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.  Pattern: A set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token.  Token: Token is a sequence of characters that can be treated as a single logical entity. Typical tokens are, 1) Identifiers 2) keywords 3) operators 4) special symbols 5)constants
  • 7. Example Token lexeme pattern else else characters e, l, s, e if if characters i, f comparision <=, < >, >=, > < or <= or = or < > or >= id pi, s1, Mj5 letter followed by letters & digit num 3.14, 0, 3.09e17 any numeric constant literal "core" any character b/w “and “except"
  • 8. Token Classes We have different token classes are possible:  One token per keyword  Tokens for the operators  One token representing all identifiers  Tokens representing constants (e.g. numbers)  Tokens for punctuation symbols
  • 9. Example In C program, the variable declaration line as: int value = 100;  int (keyword)  value(identifier)  = (operator)  100 (constant)  ; (symbol) Exercise: if(y <= t) y = y – 3;
  • 10. Example Total = Ans + 30  Tokens:  Total : Identifier 1  = : Operator 1  Ans : Identifier 2  + : Operator 2  30 : Constant 1  Lexems:  Lexems of identifiers : Total, Ans  Lexems of operators : =, +  Lexems of constant : 30
  • 11. Dealing with errors How lexical analyser deals with errors?  Lexical analyser unable to proceed: no pattern matches  Panic mode recovery: delete successive characters from remaining input until token found  Insert missing character  Delete a character  Replace character by another  Transpose two adjacent characters
  • 13. Specification of tokens  There are 3 specifications of tokens:  Strings  Language  Regular expression  Strings and Languages:  An alphabet or character class is a finite set of symbols.  A string over an alphabet is a finite sequence of symbols drawn from that alphabet.  A language is any countable set of strings over some fixed alphabet.
  • 14. Operations of Strings  A prefix of string s is any string obtained by removing zero or more symbols from the end of string s. For example, ban is a prefix of banana.  A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. For example, nana is a suffix of banana.  A substring of s is obtained by deleting any prefix and any suffix from s. For example, nan is a substring of banana.  The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively of s that are not ε or not equal to s itself.  A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s. For example, baan is a subsequence of banana.
  • 15. Exercise  Write prefix, suffix, substring, proper prefix, proper suffix and subsequence for following strings:  Example  Revolution
  • 16. Operations of Languages The following are the operations that can be applied to languages: (Applying these operations on L and S)  Union (L U S) : {𝑡 | 𝑡 𝑖𝑠 𝑖𝑛 𝐿 𝑜𝑟 𝑡 𝑖𝑠 𝑖𝑛 𝑆 }  Concatenation (LS) : {𝑡𝑧 | 𝑡 𝑖𝑠 𝑖𝑛 𝐿 𝑎𝑛𝑑 𝑧 𝑖𝑠 𝑖𝑛 𝑆 }  Kleene Closure (L*) : 𝐿 ∗ 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 “𝑧𝑒𝑟𝑜 𝑜𝑟 𝑚𝑜𝑟𝑒 𝑐𝑜𝑛𝑐𝑎𝑡𝑒𝑛𝑎𝑡𝑖𝑜𝑛 𝑜𝑓” 𝐿.  Positive Closure (L+) : 𝐿 + 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 “𝑜𝑛𝑒 𝑜𝑟 𝑚𝑜𝑟𝑒 𝑐𝑜𝑛𝑐𝑎𝑡𝑒𝑛𝑎𝑡𝑖𝑜𝑛 𝑜𝑓” 𝐿.
  • 17. Example Let L = {0, 1} and S = {a, b, c}  Union : L ∪ S = {0, 1, a, b, c}  Concatenation : L . S = {0a, 1a, 0b, 1b, 0c, 1c}  Kleene Closure : L* = {ε, 0, 1, 00,……}  Positive Closure : L+ = {0, 1, 00,……}
  • 18. Language: A language is a set of strings. Example: Even numbers Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} (All possible symbols for given language) L = {0, 2, 4, 6, 8, 10, 12, 14, … } (Elements present in language) Example: Variable name in C language Σ = ASCII characters L = {a, b, c, …, A, B, C, …, _, aa, ab, … } Regular Language: • A subset of all languages that can be defined by regular expressions. • Any character is a regular expression matching itself. (a is a regular expression for character a) • ε is a regular expression matching the empty string.
  • 19. Operations on Regular Language: If R1 and R2 are two regular expressions, then: • R1R2 is a regular expression matching the concatenation of the languages. • R1 | R2: is a regular expression matching the disjunction of the languages. • R1*: is a regular expression matching the Kleene closure of the language (0 or more occurrences ). • (R): is a regular expression matching R. Example (Let Σ = {a, b}) • The regular expression a|b denotes the language _________. Ans: {a, b} • (a|b)(a|b) denotes ___________ Ans: {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Σ. Another regular expression for the same language is aa|ab|ba|bb. • a* denotes the language consisting of ______________ Ans: all strings of zero or more a's, that is, {ε,a,aa,aaa,...}
  • 20. Example (Let Σ = {0, 1}) • (0|1)* denotes the set of all strings _____________ Ans: Strings containing zero or more instances of 0 or 1, that is, all strings of 0's and 1's: {e, 0, 1, 00, 01, 10, 11,...}. Another regular expression for the same language is (0*1*)* • 0|0*1 denotes the language __________ Ans: {0, 1, 01, 001, 0001,...}, that is, the string 0 and all strings consisting of zero or more 0's and ending with 1.
  • 22. Regular Expression Regular Expression is a sequence of characters that define a pattern. Notations:  One or more instances : +  Zero or more instances: *  Zero or one instance: ?  Alphabets : Σ  Regular expression r and regular language for it is L(r)  (abc): “abc” occurred together in a regular expression  [abc]: a, b, c any one of these or all of these are present in regular expression
  • 23. Regular Expression Rules to define Regular Expression:  Φ is a regular expression for empty set.  ε is a regular expression, and L(ε) is { ε }, that is, the language whose sole member is the empty string.  If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression, and L(a) = {a}.  Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then,  (r)|(s) is a regular expression denoting the language L(r) U L(s).  (r)(s) is a regular expression denoting the language L(r)L(s). c) (r)* is a regular expression denoting (L(r))*.  (r) is a regular expression denoting L(r).
  • 24. Regular Expression  L = Zero or more occurences of a = a*  a* = {ε, a, aa, aaa, aaaa,……} (Infinite elements)  L = One or more occurences of a = a+  a+ = {a, aa, aaa, aaaa,……} (Infinite elements)
  • 25. Regular Expression Algebraic or identity rules of regular expressions: (R and S are two regular expressions)  R + S = S + R + or | is commutative  (R + S) + T = R + (S + T) + or | is associative  (RS)T = R(ST) concatenation is associative  (R + S)T = RT + ST concatenation distributes over |  ϵ.R = R.ϵ = R ϵ is identity element for concatenation  R* = (ϵ + R)* relation between * and ϵ  R** = R* * is idempotent  RR* = R+  R? = (R | ϵ) 0 or 1 occurrence  [a – z] (a|b|…|z) 1 character from the given range  [acdgj] (a|c|d|g|) 1 of the given characters
  • 26. Exercise: (For given RE define its meaning or list down which are the valid string for that RE) • [012]+ Ans: (0|1|2)+ String can have one or more instances of 0 or 1 or 2. All strings of 0’s and 1’s and 2’s: {0, 1, 2, 00, 01, 02, 10, 11, 12,….} • [0 – 9]+ Ans: All possible combinations of strings containing elements from 0 to 9. Atleast one instance is required. • [1 – 9][0 – 9]+ Ans: Starting symbol must be any number from 1 to 9. After that it can have any number of instances of any number from 0 to 9. Resultant set will have strings of length at least 2. • [a – z A – Z][a – z A – Z 0 – 9]* Ans: Starting with alphabets and string of length 1 {a, z, A, C} are also valid strings here.
  • 27. Precedence and Associativity  The unary operator * has highest precedence and is left associative.  Concatenation has second highest precedence and is left associative.  | has lowest precedence and is left associative.
  • 28. Examples (Regular Expression) Language String Regular Expression 0 or 1 0, 1 0 | 1 1 or 10 or 111 1, 10, 111 1 | 10 | 111 Strings having one or more 0 0, 00, 000, 0000,…… 0+ All possible binary strings over Σ = {0, 1} 0, 1, 00, 01, 10, 11, 000,…… (0 | 1)+ All possible strings of length 3 over Σ = {a, b, c} aaa, aba, abc, abb,…… (a|b|c) (a|b|c) (a|b|c)
  • 29. Language String Regular Expression One or more occurrences of 0 or 1 or both 0, 1, 00, 01, 10, 11, 111, 101,…… (0 | 1)+ Binary string ending with 0 0, 10, 100, 110, 00, 010,…… (0 | 1)* 0 Binary string starting with 1 1, 10, 100, 110, 101, 1101,…… 1 (0 | 1)* Binary string starting with 1 and ending with 0 10, 110, 100, 110, 1100, 1110, 1000,…… 1 (1|0)* 0 String starting and ending with same character for Σ = {0, 1} 00, 11, 010, 000, 101, 1101, 0110, 1011,…… 1 (1|0)* 1 or 0 (1|0)* 0 String ending with 01 for Σ = {0, 1} 01, 101, 001, 1001, 1101,…… (0|1)*01
  • 30. Language String Regular Expression Language consisting of exactly two 0’s for Σ = {0, 1} 00, 010, 000, 001, 0101,…… 1*01*01* All binary strings with length at least 3 for Σ = {0, 1} 000, 010, 110, 1111, 1011,…… (0|1) (0|1) (0|1) (0|1)* All binary strings where 2nd symbol from starting is 0 for Σ = {0, 1} 00, 10, 101, 100,…… (0|1)0(0|1)* Any number of a’s followed by any number of b’s followed by any number of c’s ε, abc, aaabc, abbbbbc, abccc, ab, accc,…… a*b*c*
  • 31. Exercise Write regular expression for language specified Over Σ = {0, 1}  Strings having even length Ans: RE: ((0|1)(0|1))*  String containing exactly three 1’s Ans: RE: 0*10*10*10*  String starting with 0 and having odd length Ans: RE: 0((0|1)(0|1))*  String starting or ending with 01 or 111 Ans: RE: (01|111)(0|1)* | (0|1)*(01|111)
  • 32. Regular Definition  For notational convenience, we may give names to certain regular expressions and use those names in subsequent expressions, as if the names were themselves symbols.  These names are known as regular definition.  Regular definition is a sequence of definitions of the form: 𝑑1 → 𝑟1 𝑑2 → 𝑟2 …… 𝑑𝑛 → 𝑟𝑛 Where 𝑑𝑖 is a distinct name & 𝑟𝑖 is a regular expression.
  • 33. Example Regular definition for identifier  letter → A|B|C|………..|Z|a|b|………..|z  digit → 0|1|…….|9  id → letter (letter | digit)* Regular Definition for Even Numbers  (+|-|ε) (0|1|2|3|4|5|6|7|8|9)*(0|2|4|6|8)  Sign → + | -  OptSign → Sign | ε //In other words (Sign ?)  Digit → [0 – 9] //In other words (0 | 1 | ….. | 9)  EvenDigit → [02468] //In other words (0 | 2 | 4 | 6 | 8)  EvenNumber → OptSign Digit* EvenDigit
  • 34. Example Regular Definition for Unsigned Numbers  digit → 0 | 1 | ..... | 9  digits → digit digit*  optionalFraction → .digits | ε  optionalExponent →( E ( + | - | ε ) digits ) | ε  number → digits optionalFraction optionalExponent This can be simplified as:  digit → [0-9]  digits → digit+  number → digits (. digits)? ( E [+ | -]? digits )?
  • 36. Transition Diagram  Transition diagram is a special kind of flowchart for language analysis.  In transition diagram the boxes of flowchart are drawn as circle and called as states.  States are connected by arrows called as edges.  The label or weight on edge indicates the input character that can appear after that state
  • 37. Transition Diagram  Symbolized representation of transition diagram uses: is a state is a transition is a start state is a final state
  • 38. < 0 6 1 2 3 4 5 8 7 = other > = other = > Return (relop,LE) Return (relop,NE) Return (relop,LT) Return (relop,GE) Return (relop,GT) Return (relop,EQ) Transition Diagram : Example Transition Diagram for Relational Operators 8 7 4 3 2 5
  • 39. 1 2 8 other digit 3 4 5 6 7 digit digit digit +or - digit digit E . start 3 5280 39.37 1.894 E - 4 2.56 E + 7 45 E + 6 96 E 2 Transition Diagram : Example Transition Diagram for Unsigned Numbers E digit 8
  • 41.  Lexical analyser helps in identifying the pattern from the input.  Transition diagram is constructed to recognize the patterns. It is known as hard coding lexical analyzer. Example:  To represent identifier in ‘C’, the first character must be letter and other characters are either letter or digits. To recognize this pattern, hard coding lexical analyzer works with this transition diagram:  Lex and flex are compiler tools which takes regular expression as an input and finds out the pattern, matching to that expression. 2 3 Start Letter or digit Letter 1
  • 43. Implementing Regular Expression: • Regular expressions can be implemented using finite automata. • There are two kinds of finite automata: • NFAs (nondeterministic finite automata) • DFAs (deterministic finite automata) The step of implementing the lexical analyzer 1. Lexical Specification 2. Regular Expression 3. NFA 4. DFA 5. Table-Driven DFA Finite State Automaton
  • 44. Finite State Automaton A finite set of states present in FSM • One marked as initial state • One or more marked as final states • States sometimes labeled or numbered A set of transitions from one state to another • Each labeled with symbol from Σ (possible symbols), or ε Operate by reading input symbols • Transition can be taken if labelled with current symbol • ε-transition can be taken at any time Accept when final state reached & no more input Reject if no transition possible, or no more input and not in final state (DFA)
  • 45. Finite Automata  We call the recognizer of the tokens as a finite automaton.  FA results in “yes” or “no” based on each input string.  FA consist of:  S : Set of states  𝜮 : Set of input symbol  move : A transition function  S0 : Initial state  F : Accepting state (Final state)
  • 46. Finite Automata  A finite automaton can be: deterministic (DFA) or non-deterministic (NFA)  Both deterministic and non-deterministic finite automaton recognize regular sets.  Deterministic – faster recognizer, but it may take more space  Non-deterministic – slower, but it may take less space  Deterministic automatons are widely used lexical analysers.
  • 47.  Deterministic finite automata (DFA): From each state exactly one edge leaving out (for each symbol).  Nondeterministic finite automata (NFA): There are no restrictions on the edges leaving a state. There can be several with the same symbol as label and some edges can be labeled with 𝜖. 1 2 3 4 a b b a b 1 2 3 4 a b b a a a b DFA NFA b
  • 48. Regular Expression to NFA (Thompson’s Rule)
  • 53. Example (RE to NFA) 1 4 𝜖 𝜖 𝜖 𝜖 2 3 𝑎 1 2 3 a b 1 2 5 3 4 6 a b 𝜖 𝜖 𝜖 𝜖 a* ab (a|b)
  • 54. Example (RE to NFA) 1 4 𝜖 𝜖 𝜖 𝜖 2 3 𝑎 1 4 𝜖 𝜖 𝜖 𝜖 2 3 𝑏 6 5 𝑎 𝑏 5 𝑏 a*b b*ab
  • 55. Example (RE to NFA) 1 2 5 3 6 𝜖 𝜖 𝜖 𝜖 4 d c 𝜖 𝜖 𝜖 0 7 𝜖 (c|d)*
  • 56. Exercise  abab  a*|b*  abb(a)*ba  (ab + b)*ba  (a + b)*aa(a+b)
  • 57. NFA to DFA conversion (Using subset construction method)
  • 58. Subset Construction Algorithm  Input : NFA (N)  Output : DFA (D) (Accepting the same language)  Method : Apply Algorithm, Make Tansition Table, Dtran for DFA . Operations to perform: Operation Description Є – closure(s) Set of NFA States reachable from NFA State s on Є – transition alone. Є – closure(T) Set of NFA States reachable from some NFA State s in T on Є –transition alone. Move (T,a) Set of NFA states to which there is a transition on input symbol a from some NFA state s in T.
  • 59. Subset Construction Algorithm Initially Є –closure (s0) be the only state in Dstates and it is unmarked; While there is unmarked states in T in Dstates do begin Mark T; for each input symbol a do begin U = Є –closure (move (T,a)); If U is not in Dstates then add U as unmarked state to Dstates; Dtran [ T, a ] = U end end
  • 60. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є Є closure (0) = {0,1,2,4,7}  A
  • 61. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є Є closure (0) = {0,1,2,4,7}  A State a b A = {0,1,2,4,7}
  • 62. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,a) = {3,8} State a b A = {0,1,2,4,7}
  • 63. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,a) = {3,8} State a b A = {0,1,2,4,7}
  • 64. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,a) = {3,8} Є closure (Move(A,a)) = {3,6,7,1,2,4,8} = {1,2,3,4,6,7,8}  B State a b A = {0,1,2,4,7} B
  • 65. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,b) = {5} State a b A = {0,1,2,4,7} B
  • 66. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,b) = {5} Є closure (Move(A,b)) = {5,6,7,1,2,4} = {1,2,4,5,6,7} State a b A = {0,1,2,4,7} B
  • 67. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є A = {0,1,2,4,7} Move (A,b) = {5} Є closure (Move(A,b)) = {5,6,7,1,2,4} = {1,2,4,5,6,7}  C State a b A = {0,1,2,4,7} B C
  • 68. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є B = {1,2,3,4,6,7,8} Move (B,a) = {3,8} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8}
  • 69. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є B = {1,2,3,4,6,7,8} Move (B,a) = {3,8} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8}
  • 70. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є B = {1,2,3,4,6,7,8} Move (B,a) = {3,8} Є Closure (Move (B,a)) = { 3,6,7,1,2,4,8} = {1,2,3,4,6,7,8}  B State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8}
  • 71. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є B = {1,2,3,4,6,7,8} Move (B,b) = {5,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B
  • 72. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є B = {1,2,3,4,6,7,8} Move (B,b) = {5,9} Є Closure (Move (B,b)) = {5,6,7,1,2,4,9} = {1,2,4,5,6,7,9}  D State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D
  • 73. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,a) = {3,8} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7}
  • 74. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,a) = {3,8} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7}
  • 75. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,a) = {3,8} Є Closure (Move (c,a)) = {3,6,7,1,2,4,8} = {1,2,3,4,6,7,8}  B State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7}
  • 76. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,b) = {5} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B
  • 77. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,b) = {5} Є Closure (Move (C,b)) = {5,6,7,1,2,4} = {1,2,4,5,6,7}  C State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B
  • 78. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є C = {1,2,4,5,6,7} Move (C,b) = {5} Є Closure (Move (C,b)) = {5,6,7,1,2,4} = {1,2,4,5,6,7}  C State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C
  • 79. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є D = {1,2,4,5,6,7,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} Move (D,a) = {3,8}
  • 80. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є D = {1,2,4,5,6,7,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} Move (D,a) = {3,8} Є Closure (Move (D,a)) = {3,6,7,1,2,4,8} = {1,2,3,4,6,7,8}  B
  • 81. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є D = {1,2,4,5,6,7,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B Move (D,b) = {5,10}
  • 82. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є D = {1,2,4,5,6,7,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B Move (D,b) = {5,10} Є Closure (Move (D,b)) = {5,6,7,1,2,4,10} = {1,2,4,5,6,7,10}  E
  • 83. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є D = {1,2,4,5,6,7,9} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E Move (D,b) = {5,10} Є Closure (Move (D,b)) = {5,6,7,1,2,4,10} = {1,2,4,5,6,7,10}  E
  • 84. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є E = {1,2,4,5,6,7,10} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} Move (E,a) = {3,8}
  • 85. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є E = {1,2,4,5,6,7,10} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} Move (E,a) = {3,8} Є Closure (Move (E,a)) = {3,6,7,1,2,4,8} = {1,2,3,4,6,7,8}  B
  • 86. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є E = {1,2,4,5,6,7,10} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} B Move (E,b) = {5}
  • 87. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є E = {1,2,4,5,6,7,10} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} B Move (E,b) = {5}
  • 88. Example : (a|b)*abb 0 1 2 3 4 5 6 7 8 9 10 Є Є Є Є Є Є a b a b b Є Є E = {1,2,4,5,6,7,10} State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} B C Move (E,b) = {5}
  • 89. Example : (a|b)*abb State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} B C
  • 90. Convert NFA to DFA : (a|b)*abb Є –closure (0) = {0,1,2,4,7}  Let A Move (A,a) = {3,8} Є –closure ((A,a)) = {1,2,3,4,6,7,8}  Let B Move (A,b) = {5} Є –closure ((A,b)) = {1,2,4,5,6,7}  Let C
  • 91. Move (B,a) = {3,8} Є –closure ((B,a)) = {1,2,3,4,6,7,8}  Let B Move (B,b) = {5,9} Є –closure ((B,b)) = {1,2,4,5,6,7,9}  Let D Move (C,a) = {3,8} Є –closure ((C,a)) = {1,2,3,4,6,7,8}  Let B Move (C,b) = {5} Є –closure ((C,b)) = {1,2,4,5,6,7}  Let C Convert NFA to DFA : (a|b)*abb (Cont…)
  • 92. Move (D,a) = {3,8} Є –closure ((D,a)) = {1,2,3,4,6,7,8}  Let B Move (D,b) = {5,10} Є –closure ((D,b)) = {1,2,4,5,6,7,10}  Let E Move (E,a) = {3,8} Є –closure ((E,a)) = {1,2,3,4,6,7,8}  Let B Move (E,b) = {5} Є –closure ((E,b)) = {1,2,4,5,6,7}  Let C Convert NFA to DFA : (a|b)*abb (Cont…)
  • 93. State a b A = {0,1,2,4,7} B C B= {1,2,3,4,6,7,8} B D C = {1,2,4,5,6,7} B C D = {1,2,4,5,6,7,9} B E E = {1,2,4,5,6,7,10} B C Convert NFA to DFA : (a|b)*abb (Cont…)
  • 94. Example  Convert following regular expression to DFA using subset construction method:  (0 + 1)*1(0 + 1)  (0 + 1)*01*
  • 96. DFA Optimization  The procedure can also be known as minimization of DFA.  Minimization/optimization refers to the detection of those states of a DFA, whose presence or absence in a DFA does not affect the language accepted by the automata.  The states that can be eliminated from automata, without affecting the language accepted by automata, are:  Unreachable or inaccessible states  Dead states  Non-distinguishable or indistinguishable state or equivalent states.  Partitioning algorithm helps in DFA optimization process.
  • 97. Partitioning Algorithm 1. Remove all the states that are unreachable from initial state via any set of transition of DFA. 2. Draw the transition table for all pair of states. 3. Now, split the transition table into two tables T1 and T2. 1. T1 contains all final states 2. T2 contains non-final states 4. Find similar rows from T1 such that; 𝛿(q, a) = p 𝛿(r, a) = p i.e. find the two states which have same value of a and b and remove one of them
  • 98. Continued… 5. Repeat step 3 until we find no similar rows available in T1 6. Repeat step 3 and step 4 for table T2 also. 7. Now combine the reduced T1 and T2 tables. i.e. the final transition table of minimized DFA.
  • 99. DFA Optimization A B C B B D C B C D B E E B C States a b {𝐴, 𝐵, 𝐶, 𝐷, 𝐸} Nonaccepting States {𝐴, 𝐵, 𝐶, 𝐷} Accepting States {𝐸} {𝐴, 𝐵, 𝐶} {𝐷} {𝐴, 𝐶} {𝐵} • Now no more splitting is possible. • If we chose A as the representative for group (AC), then we obtain reduced transition table A B A B B D D B E E B A States a b Optimized TransitionTable
  • 101. Function computed from syntax tree  nullable (n): Is true for * node and node labeled with Ɛ. For other nodes it is false.  firstpos (n): Set of positions at node ti that corresponds to the first symbol of the sub-expression rooted at n.  lastpos (n): Set of positions at node ti that corresponds to the last symbol of the sub-expression rooted at n.  followpos (i): Set of positions that follows given position by matching the first or last symbol of a string generated by sub-expression of the given regular expression.
  • 102. Rules to compute nullable, firstpos, lastpos Node n nullable(n) firstpos(n) lastpos(n) A leaf labeled by  true ∅ ∅ A leaf with position 𝒊 false {𝑖} {𝑖} nullable(c1) or nullable(c2) firstpos(c1)  firstpos(c2) lastpos(c1)  lastpos(c2) | n c1 nullable(c1) and nullable(c2) if (nullable(c1)) thenfirstpos(c1)  firstpos(c2) else firstpos(c1) if (nullable(c2)) then lastpos(c1)  lastpos(c2) else lastpos(c2) n true firstpos(c1) lastpos(c1) ∗ n c2 . c1 c2 c1
  • 103. Computation of followpos The position of regular expression can follow another in the following ways:  If n is a cat node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).  For cat node, for each position i in lastpos of its left child, the firstpos of its right child will be in followpos(i).  If n is a star node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).  For star node, the firstpos of that node is in f ollowpos of all positions in lastpos of that node.
  • 104. Conversion from regular expression to DFA 𝑎 𝑏 | ∗ . 𝟏 𝟐 𝑎 . . . 𝑏 𝑏 # (a|b)*abb 𝟒 𝟑 𝟓 𝟔 # Step 2: Nullable node Here, * is only nullable node Step 1: Construct SyntaxTree
  • 105. Conversion from regular expression to DFA 𝑎 𝑏 | ∗ . {1} {2} {1,2} {1,2} 𝑎 {3} {1,2,3} . {4} {1,2,3} . {5} {1,2,3} . {6} {1,2,3} 𝑏 𝑏 # Step 3: Calculate firstpos Firstpos A leaf with position 𝒊 = {𝒊} | n c1 c2 firstpos(c1)  firstpos(c2) ∗ n c1 firstpos(c1) if (nullable(c1)) firstpos(c1)  firstpos(c2) else firstpos(c1) . n c1 c2 𝟏 𝟐 𝟒 𝟑 𝟓 𝟔
  • 106. Conversion from regular expression to DFA 𝑎 𝑏 | ∗ . {1} {1} {2} {2} {1,2} {1,2} {1,2} {1,2} 𝑎 {3} {3} {1,2,3} {3} . {4} {4} {1,2,3} {4} . {5} {5} {1,2,3} {5} . {6} {6} {1,2,3} {6} 𝑏 𝑏 # Step 3: Calculate lastpos 𝟏 𝟐 𝟒 𝟑 𝟓 𝟔 Lastpos A leaf with position 𝒊 = {𝒊} | n c1 c2 lastpos(c1)  lastpos(c2) ∗ n c1 lastpos(c1) if (nullable(c2)) lastpos(c1)  lastpos(c2) else lastpos(c2) . n c1 c2
  • 107. Conversion from regular expression to DFA Position followpos 𝑎 𝑏 | ∗ . {1} {1} {2} {2} {1,2} {1,2} {1,2} {1,2} 𝑎 {3} {3} {1,2,3} {3} . {4} {4} {1,2,3} {4} . {5} {5} {1,2,3} {5} . {6} {6} {1,2,3} {6} 𝑏 𝑏 # Step 4: Calculate followpos 𝟏 𝟐 𝟒 𝟑 𝟓 𝟔 5 6 {1,2,3} {5} . {6} {6} 𝒄𝟏 𝒄𝟐 𝑖 = 𝑙𝑎𝑠𝑡𝑝𝑜𝑠(𝑐1) = {5} 𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 𝑐2 = 6 𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 5 = 6 Firstpos Lastpos
  • 108. Conversion from regular expression to DFA Position followpos 𝑎 𝑏 | ∗ . {1} {1} {2} {2} {1,2} {1,2} {1,2} {1,2} 𝑎 {3} {3} {1,2,3} {3} . {4} {4} {1,2,3} {4} . {5} {5} {1,2,3} {5} . {6} {6} {1,2,3} {6} 𝑏 𝑏 # Step 4: Calculate followpos 𝟏 𝟐 𝟒 𝟑 𝟓 𝟔 5 6 {1,2,3} {4} . {5} {5} 𝒄𝟏 𝒄𝟐 𝑖 = 𝑙𝑎𝑠𝑡𝑝𝑜𝑠(𝑐1) = {4} 𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 𝑐2 = 5 𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 4 = 5 4 5 Firstpos Lastpos
  • 109. Conversion from regular expression to DFA Position followpos 𝑎 𝑏 | ∗ . {1} {1} {2} {2} {1,2} {1,2} {1,2} {1,2} 𝑎 {3} {3} {1,2,3} {3} . {4} {4} {1,2,3} {4} . {5} {5} {1,2,3} {5} . {6} {6} {1,2,3} {6} 𝑏 𝑏 # Step 4: Calculate followpos 𝟏 𝟐 𝟒 𝟑 𝟓 𝟔 5 6 {1,2,3} {3} . {4} {4} 𝒄𝟏 𝒄𝟐 𝑖 = 𝑙𝑎𝑠𝑡𝑝𝑜𝑠(𝑐1) = {3} 𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 𝑐2 = 4 𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 3 = 4 4 5 3 4 Firstpos Lastpos
  • 110. Conversion from regular expression to DFA Position followpos 𝑎 𝑏 | ∗ . {1} {1} {2} {2} {1,2} {1,2} {1,2} {1,2} 𝑎 {3} {3} {1,2,3} {3} . {4} {4} {1,2,3} {4} . {5} {5} {1,2,3} {5} . {6} {6} {1,2,3} {6} 𝑏 𝑏 # Step 4: Calculate followpos 𝟏 𝟐 𝟒 𝟑 𝟓 𝟔 5 6 {1,2} {1,2} . {3} {3} 𝒄𝟏 𝒄𝟐 𝑖 = 𝑙𝑎𝑠𝑡𝑝𝑜𝑠(𝑐1) = {1,2} 𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 𝑐2 = 3 𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 1 = 3 𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 2 = 3 4 5 3 4 2 3 1 3 Firstpos Lastpos
  • 111. Conversion from regular expression to DFA Position followpos 𝑎 𝑏 | ∗ . {1} {1} {2} {2} {1,2} {1,2} {1,2} {1,2} 𝑎 {3} {3} {1,2,3} {3} . {4} {4} {1,2,3} {4} . {5} {5} {1,2,3} {5} . {6} {6} {1,2,3} {6} 𝑏 𝑏 # Step 4: Calculate followpos 𝟏 𝟐 𝟒 𝟑 𝟓 𝟔 5 6 𝒏 𝑖 = 𝑙𝑎𝑠𝑡𝑝𝑜𝑠(𝑛) = {1,2} 𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 𝑛 = 1,2 𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 1 = 1,2 𝑓𝑜𝑙𝑙𝑜𝑤𝑝𝑜𝑠 2 = 1,2 4 5 3 4 2 3 1 3 {1,2} {1,2} * 1,2, 1,2, Firstpos Lastpos
  • 112. Construct DFA Initial state = 𝑓𝑖𝑟𝑠𝑡𝑝𝑜𝑠 of root = {1,2,3} ----- A State A δ( (1,2,3),a) = followpos(1) U followpos(3) =(1,2,3) U (4) = {1,2,3,4} ----- B δ( (1,2,3),b) = followpos(2) =(1,2,3) ----- A Position followpos 5 6 4 5 3 4 2 1,2,3 1 1,2,3 States a b A={1,2,3} B A B={1,2,3,4}
  • 113. Construct DFA State B δ( (1,2,3,4),a) = followpos(1) U followpos(3) =(1,2,3) U (4) = {1,2,3,4} ----- B δ( (1,2,3,4),b) = followpos(2) U followpos(4) =(1,2,3) U (5) = {1,2,3,5} ----- C State C δ( (1,2,3,5),a) = followpos(1) U followpos(3) =(1,2,3) U (4) = {1,2,3,4} ----- B δ( (1,2,3,5),b) = followpos(2) U followpos(5) =(1,2,3) U (6) = {1,2,3,6} ----- D Position followpos 5 6 4 5 3 4 2 1,2,3 1 1,2,3 States a b A={1,2,3} B A B={1,2,3,4} B C C={1,2,3,5} B D D={1,2,3,6}
  • 114. Construct DFA State D δ( (1,2,3,6),a) = followpos(1) U followpos(3) =(1,2,3) U (4) = {1,2,3,4} ----- B δ( (1,2,3,6),b) = followpos(2) =(1,2,3) ----- A Position followpos 5 6 4 5 3 4 2 1,2,3 1 1,2,3 States a b A={1,2,3} B A B={1,2,3,4} B C C={1,2,3,5} B D D={1,2,3,6} B A A B C D a b b b a a b a DFA
  • 115. Construct DFA Position followpos 5 6 4 5 3 4 2 1,2,3 1 1,2,3 States a b A={1,2,3} B A B={1,2,3,4} B C C={1,2,3,5} B D D={1,2,3,6} B A A B C D a b b b a a b a DFA Note: Elements of E contains state 10 that is acceptance state in NFA. So, State E is acceptance state