Notation, Regular Expressions in Lexical Specification, Error Handling, Finite Automata State Graphs, Epsilon Moves, Deterministic and Non-Deterministic Automata, Table Implementation of a DFA
Implement Lexical Analyzer Using Finite Automata
1. COMPILER DESIGN
Archana R
Assistant Professor,
Department of Computer Science,
SACWC.
IMPLEMENTATION OF LEXICAL ANALYZER
4. Notation
• For convenience, we use a variation (allowing user-defined abbreviations) in regular expression notation.
• Union: A + B ≡ A | B
• Option: A + ε ≡ A?
• Range: 'a' + 'b' + … + 'z' ≡ [a-z]
• Excluded range: complement of [a-z] ≡ [^a-z]
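The abbreviations above map closely onto the syntax of common regular expression libraries. A minimal sketch in Python's `re` module follows; the sample patterns and the helper name `matches` are made up for illustration and are not part of the slides.

```python
import re

# Slide notation on the right, Python `re` syntax on the left.
UNION = re.compile(r"a|b")        # A + B        is written  A | B
OPTION = re.compile(r"ab?")       # 'a' followed by (b + epsilon), written b?
RANGE = re.compile(r"[a-z]")      # 'a' + 'b' + ... + 'z'
EXCLUDED = re.compile(r"[^a-z]")  # complement of [a-z]

def matches(pattern, s):
    """Return True when the whole string s is in the language of pattern."""
    return pattern.fullmatch(s) is not None

if __name__ == "__main__":
    for name, pattern, sample in [
        ("union", UNION, "b"),
        ("option", OPTION, "a"),
        ("range", RANGE, "q"),
        ("excluded range", EXCLUDED, "Q"),
    ]:
        print(name, repr(sample), matches(pattern, sample))
```

`fullmatch` is used so that the check corresponds to membership of the whole string in the language, rather than a match somewhere inside it.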
5. Regular Expressions in Lexical Specification
• Last lecture: a specification for the predicate s ∈ L(R)
• But a yes/no answer is not enough!
• Instead: partition the input into tokens.
• We will adapt regular expressions to this goal.
6. Regular Expressions ⇒ Lexical Spec. (1)
1. Select a set of tokens
• Integer, Keyword, Identifier, OpenPar, ...
2. Write a regular expression (pattern) for the lexemes of each token
• Integer = digit+
• Keyword = 'if' + 'else' + …
• Identifier = letter (letter + digit)*
• OpenPar = '('
• …
7. Regular Expressions ⇒ Lexical Spec. (2)
3. Construct R, matching all lexemes for all tokens
R = Keyword + Identifier + Integer + …
= R1 + R2 + R3 + …
Facts: If s ∈ L(R) then s is a lexeme
- Furthermore s ∈ L(Ri) for some "i"
- This "i" determines the token that is reported
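Steps 1 to 3 can be sketched in Python as follows. The concrete patterns for letters and digits, the list name `RULES`, and the function `token_of` are assumptions made for this illustration; only the token names come from the slides.

```python
import re

# Assumed rules R1, R2, R3 (one pattern per token), in the order they are listed.
RULES = [
    ("Keyword", r"if|else"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Integer", r"[0-9]+"),
]

# R = R1 + R2 + R3: the union of all token patterns.
R = re.compile("|".join(f"(?:{pattern})" for _, pattern in RULES))

def token_of(s):
    """Return the name of the first rule Ri with s in L(Ri), or None if s is not in L(R)."""
    if R.fullmatch(s) is None:
        return None              # s is not a lexeme of any token
    for name, pattern in RULES:
        if re.fullmatch(pattern, s):
            return name          # this "i" determines the reported token
    return None

if __name__ == "__main__":
    for s in ["if", "ifx", "42", "4x"]:
        print(repr(s), token_of(s))
```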
8. Regular Expressions ⇒ Lexical Spec. (3)
4. Let input be x1…xn
• (x1 ... xn are characters)
• For 1 ≤ i ≤ n check
x1…xi ∈ L(R) ?
5. It must be that
x1…xi ∈ L(Rj) for some j
(if there is a choice, pick the smallest such j)
6. Remove x1…xi from input and go back to step 4
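The prefix check in step 4 can be sketched directly. The pattern `R` below is an assumed stand-in for the union of all token rules, and `matching_prefix_lengths` is a hypothetical helper name.

```python
import re

# Assumed R = Keyword + Identifier + Integer for this sketch.
R = re.compile(r"if|else|[A-Za-z][A-Za-z0-9]*|[0-9]+")

def matching_prefix_lengths(x):
    """Return every i (1 <= i <= n) such that the prefix x1...xi is in L(R)."""
    return [i for i in range(1, len(x) + 1) if R.fullmatch(x[:i])]

if __name__ == "__main__":
    print(matching_prefix_lengths("if3 x"))
    print(matching_prefix_lengths("+"))
```

For an input such as `"if3 x"` more than one prefix qualifies, so the procedure still needs a rule for choosing among them.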
9. How to Handle Spaces and Comments?
1. We could create a token Whitespace
Whitespace = (' ' + '\n' + '\t')+
- We could also add comments in there
- An input " \t\n 5555 " is transformed into Whitespace Integer Whitespace
2. Lexer skips spaces (preferred)
• Modify step 5 from before as follows:
It must be that xk ... xi ∈ L(Rj) for some j such that x1 ... xk-1 ∈ L(Whitespace)
• Parser is not bothered with spaces
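Both options can be compared in a small sketch. It assumes only two rules (Whitespace and Integer), and the function name `tokens` and its `skip_whitespace` flag are invented for the illustration.

```python
import re

WHITESPACE = re.compile(r"[ \n\t]+")  # Whitespace = (' ' + '\n' + '\t')+
INTEGER = re.compile(r"[0-9]+")       # Integer = digit+

def tokens(text, skip_whitespace):
    """Return token names for text; whitespace is either reported or skipped."""
    out, pos = [], 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            if not skip_whitespace:
                out.append("Whitespace")   # option 1: whitespace is a token
            pos = ws.end()                 # option 2: silently consume it
            continue
        m = INTEGER.match(text, pos)
        if m is None:
            raise ValueError(f"no rule matches at position {pos}")
        out.append("Integer")
        pos = m.end()
    return out

if __name__ == "__main__":
    print(tokens(" \t\n 5555 ", skip_whitespace=False))
    print(tokens(" \t\n 5555 ", skip_whitespace=True))
```

With skipping enabled the consumer of the token stream sees only `Integer`, which is the reason the slides call this option preferred.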
10. Ambiguities (1)
• There are ambiguities in the algorithm
• How much input is used? What if
• x1…xi ∈ L(R) and also
• x1…xK ∈ L(R)
- Rule: Pick the longest possible substring
- The "maximal munch"
11. Ambiguities (2)
• Which token is used? What if
• x1…xi ∈ L(Rj) and also
• x1…xi ∈ L(Rk)
- Rule: use rule listed first (j if j < k)
• Example:
- R1 = Keyword and R2 = Identifier
- 'if' matches both
- Treats 'if' as a keyword, not an identifier
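The two disambiguation rules (longest match first, then the rule listed first) can be combined in one function. This is a sketch under assumptions: the rule patterns and the name `next_token` are illustrative, and it relies on each pattern being written so that `match` already returns that rule's longest match.

```python
import re

# Assumed rules in priority order: Keyword is listed before Identifier.
RULES = [
    ("Keyword", re.compile(r"if|else")),
    ("Identifier", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("Integer", re.compile(r"[0-9]+")),
    ("OpenPar", re.compile(r"\(")),
]

def next_token(text, pos):
    """Return (name, lexeme) for the longest match at pos; ties go to the earlier rule."""
    best_name, best_end = None, pos
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strict '>' keeps the earlier rule when two rules match the same length.
        if m and m.end() > best_end:
            best_name, best_end = name, m.end()
    if best_name is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best_name, text[pos:best_end]

if __name__ == "__main__":
    print(next_token("if", 0))    # both Keyword and Identifier match two characters
    print(next_token("iffy", 0))  # Identifier matches a longer prefix than Keyword
```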
12. Error Handling
• What if no rule matches a prefix of the input?
• Problem: Can't just get stuck …
• Solution:
- Write a rule matching all "bad" strings
- Put it last
• Lexer tools allow the writing of:
R = R1 + ... + Rn + Error
- Token Error matches if nothing else matches
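One way to sketch the Error rule is a catch-all pattern for a single character, placed last. The single-character choice is an assumption of this example (tools differ in how much input an error rule consumes), and the names `RULES` and `tokenize` are illustrative.

```python
import re

RULES = [
    ("Integer", re.compile(r"[0-9]+")),
    ("Identifier", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    # Assumed catch-all: any one character, listed last so every other rule wins ties.
    ("Error", re.compile(r".", re.DOTALL)),
]

def tokenize(text):
    """Split text into (name, lexeme) pairs; unmatched characters become Error tokens."""
    out, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        # Error always matches one character, so the loop always advances.
        out.append((best_name, text[pos:best_end]))
        pos = best_end
    return out

if __name__ == "__main__":
    print(tokenize("12#ab"))
```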
13. Regular Languages & Finite Automata
Basic formal language theory result:
Regular expressions and finite automata both define the class of regular languages.
Thus, we are going to use:
• Regular expressions for specification
• Finite automata for implementation (automatic generation of lexical analyzers)
14. Finite Automata
• A finite automaton is a recognizer for the strings of a regular language
A finite automaton consists of
- A finite input alphabet Σ
- A set of states S
- A start state n
- A set of accepting states F ⊆ S
- A set of transitions state --input--> state
15. • Transition
s1 --a--> s2
• Is read
In state s1 on input 'a' go to state s2
• If end of input
- If in accepting state ⇒ accept
- Otherwise ⇒ reject
• If no transition is possible ⇒ reject
16. Finite Automata State Graphs
• A state
• The start state
• An accepting state
• A transition (labeled with its input symbol, e.g. a)
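17. A Simple Example
• A finite automaton that accepts only '1'
[Figure: state graph with one transition labeled 1]
Another Simple Example
• A finite automaton accepting any number of 1's followed by a single 0
• Alphabet: {0,1}
[Figure: state graph with transitions labeled 1 and 0]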
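The two example automata can be written down as data and run with a short function. The state names A and B and the dictionary layout are assumptions of this sketch, since the original state graphs are not reproduced here.

```python
def accepts(transitions, start, accepting, text):
    """Run a deterministic automaton: one transition per input symbol."""
    state = start
    for symbol in text:
        if (state, symbol) not in transitions:
            return False               # no transition possible: reject
        state = transitions[(state, symbol)]
    return state in accepting          # end of input: accept iff in an accepting state

# Accepts only the string "1" (state names are made up for this sketch).
ONLY_ONE = ({("A", "1"): "B"}, "A", {"B"})

# Accepts any number of 1's followed by a single 0.
ONES_THEN_ZERO = ({("A", "1"): "A", ("A", "0"): "B"}, "A", {"B"})

if __name__ == "__main__":
    print(accepts(*ONLY_ONE, "1"), accepts(*ONLY_ONE, "11"))
    print(accepts(*ONES_THEN_ZERO, "1110"), accepts(*ONES_THEN_ZERO, "100"))
```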
18. And Another Example
• Alphabet {0,1}
• What language does this recognize?
[Figure: state graph with transitions labeled 0 and 1]
• Alphabet still { 0, 1 }
[Figure: state graph with two transitions labeled 1]
• The operation of the automaton is not completely defined by the input
- On input '11' the automaton could be in either state
19. Epsilon Moves
• Another kind of transition: ε-moves
A --ε--> B
• Machine can move from state A to state B without reading input
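Because ε-moves consume no input, an implementation usually needs the set of all states reachable through them. A minimal sketch follows; the dictionary `eps` (state to set of ε-successors) and the function name are assumptions of this example.

```python
def epsilon_closure(states, eps):
    """Return all states reachable from `states` using only epsilon-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:      # visit each state once, so cycles terminate
                closure.add(t)
                stack.append(t)
    return closure

if __name__ == "__main__":
    eps = {"A": {"B"}, "B": {"C"}}    # A --eps--> B --eps--> C
    print(sorted(epsilon_closure({"A"}, eps)))
```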
20. Deterministic and Non-Deterministic Automata
• Deterministic Finite Automata (DFA)
- One transition per input per state
- No ε-moves
• Non-deterministic Finite Automata (NFA)
- Can have multiple transitions for one input in a given state
- Can have ε-moves
• Finite automata have finite memory
- Enough to only encode the current state
21. Execution of Finite Automata
• A DFA can take only one path through the state graph
- Completely determined by input
• NFAs can choose
- Whether to make ε-moves
- Which of multiple transitions for a single input to take.
22. Acceptance of NFAs
• An NFA can get into multiple states
[Figure: NFA state graph with transitions labeled 0 and 1]
• Input: 1 0 1
• Rule: NFA accepts an input if it can get in a final state
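The acceptance rule can be sketched by tracking the set of states the NFA could be in after each symbol. The NFA used here (strings over {0,1} ending in 1) is a hypothetical example, not the automaton from the slide's figure; ε-moves are stored under the symbol `None`, which is a convention of this sketch.

```python
def closure(states, delta):
    """Extend `states` with everything reachable through epsilon-moves (symbol None)."""
    result, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), None), ()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return result

def nfa_accepts(delta, start, accepting, text):
    """Accept iff at least one possible run ends in an accepting state."""
    current = closure({start}, delta)
    for symbol in text:
        moved = set()
        for s in current:
            moved.update(delta.get((s, symbol), ()))
        current = closure(moved, delta)
    return bool(current & accepting)

# Hypothetical NFA: A loops on 0 and 1, and may also move to B on 1.
NFA_DELTA = {("A", "0"): {"A"}, ("A", "1"): {"A", "B"}}

if __name__ == "__main__":
    print(nfa_accepts(NFA_DELTA, "A", {"B"}, "101"))
    print(nfa_accepts(NFA_DELTA, "A", {"B"}, "10"))
```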
23. NFA vs. DFA (1)
• NFAs and DFAs recognize the same set of languages (regular languages)
• DFAs are easier to implement
- There are no choices to consider
24. NFA vs. DFA (2)
• For a given language the NFA can be simpler than the DFA
[Figure: an NFA and a DFA for the same language, with transitions labeled 0 and 1]
• DFA can be exponentially larger than NFA
25. Regular Expressions to Finite Automata
• High-level sketch
Lexical Specification → Regular expressions → NFA → DFA → Table-driven Implementation of DFA
26. Regular Expressions to NFA (1)
• For each kind of reg. expr, define an NFA
- Notation: NFA for regular expression M
[Figure: NFA drawn as a box labeled M]
• For ε
[Figure: transition labeled ε]
• For input a
[Figure: transition labeled a]
29. The Trick of NFA to DFA
• Simulate the NFA
• Each state of DFA
= a non-empty subset of states of the NFA
• Start state
= the set of NFA states reachable through ε-moves from NFA start state
• Add a transition S --a--> S' to DFA iff
- S' is the set of NFA states reachable from any state in S after seeing the input a
  • considering ε-moves as well
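The construction described above can be sketched as a worklist algorithm. The representation (a dictionary from `(state, symbol)` to a set of states, with `None` as the ε symbol) and all names are assumptions of this example; the sample NFA is hypothetical.

```python
def closure(states, delta):
    """Extend `states` with everything reachable through epsilon-moves (symbol None)."""
    result, stack = set(states), list(states)
    while stack:
        for t in delta.get((stack.pop(), None), ()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return result

def subset_construction(delta, start, accepting, alphabet):
    """Return (dfa_table, dfa_start, dfa_accepting); DFA states are frozensets of NFA states."""
    dfa_start = frozenset(closure({start}, delta))
    table, todo, seen = {}, [dfa_start], {dfa_start}
    while todo:
        S = todo.pop()
        for a in alphabet:
            moved = set()
            for s in S:
                moved.update(delta.get((s, a), ()))
            S2 = frozenset(closure(moved, delta))
            if not S2:
                continue              # only non-empty subsets become DFA states
            table[(S, a)] = S2
            if S2 not in seen:
                seen.add(S2)
                todo.append(S2)
    dfa_accepting = {S for S in seen if S & accepting}
    return table, dfa_start, dfa_accepting

# Hypothetical NFA for strings over {0,1} that end in 1.
NFA_DELTA = {("A", "0"): {"A"}, ("A", "1"): {"A", "B"}}

if __name__ == "__main__":
    table, start, accepting = subset_construction(NFA_DELTA, "A", {"B"}, ["0", "1"])
    for (S, a), S2 in table.items():
        print(sorted(S), a, sorted(S2))
```

Only subsets reachable from the start state are generated, which in practice is often far fewer than all possible subsets.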
30. IMPLEMENTATION
• A DFA can be implemented by a 2D table T
- One dimension is "states"
- Other dimension is "input symbols"
- For every transition Si --a--> Sk define T[i,a] = k
• DFA "execution"
- If in state Si and input a, read T[i,a] = k and move to state Sk
- Very efficient
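A table-driven DFA might look like the following sketch. The table encodes "any number of 1's followed by a single 0"; the `NO_MOVE` marker, the column mapping `SYMBOLS`, and the function name `run` are assumptions made for the illustration.

```python
SYMBOLS = {"0": 0, "1": 1}   # column index for each input symbol
NO_MOVE = -1                 # assumed marker for "no transition"

# Rows are states S0 and S1; T[i][a] = k encodes the transition Si --a--> Sk.
T = [
    [1, 0],                  # S0: on 0 go to S1, on 1 stay in S0
    [NO_MOVE, NO_MOVE],      # S1: no outgoing transitions
]
ACCEPTING = {1}

def run(text, start=0):
    """Execute the DFA with one table lookup per input character."""
    state = start
    for ch in text:
        col = SYMBOLS.get(ch)
        if col is None:
            return False     # character outside the alphabet
        state = T[state][col]
        if state == NO_MOVE:
            return False     # no transition possible: reject
    return state in ACCEPTING

if __name__ == "__main__":
    print(run("110"), run("01"))
```

Each step is a single indexed lookup, which is where the efficiency noted above comes from.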
32. Implementation (Cont.)
• NFA → DFA conversion is at the heart of tools such as lex, ML-Lex or flex.
• But DFAs can be huge.
• In practice, lex/ML-Lex/flex-like tools trade off speed for space in the choice of NFA and DFA representations.
33. DFA and Lexer
Two differences:
• DFAs recognize lexemes. A lexer must return a type of acceptance (token type) rather than simply an accept/reject indication.
• DFAs consume the complete string and accept or reject it. A lexer must find the end of the lexeme in the input stream and then find the next one, etc.
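Both differences can be addressed by labeling accepting states with a token type and remembering the last accepting state seen while scanning. The sketch below uses a small hand-written DFA for Integer and Identifier only; the table, the state numbering, and the function names are assumptions of this example and not the output of any particular tool.

```python
def char_class(ch):
    """Map a character to the input symbol used by the DFA table."""
    if "0" <= ch <= "9":
        return "digit"
    if "a" <= ch <= "z" or "A" <= ch <= "Z":
        return "letter"
    return "other"

# Hypothetical DFA: state 0 is the start, state 1 accepts Integer, state 2 accepts Identifier.
TABLE = {
    (0, "digit"): 1, (1, "digit"): 1,
    (0, "letter"): 2, (2, "letter"): 2, (2, "digit"): 2,
}
TOKEN_OF_STATE = {1: "Integer", 2: "Identifier"}

def next_token(text, pos):
    """Return (token_type, lexeme, new_pos) using the last accepting state seen."""
    state, last, i = 0, None, pos
    while i < len(text) and (state, char_class(text[i])) in TABLE:
        state = TABLE[(state, char_class(text[i]))]
        i += 1
        if state in TOKEN_OF_STATE:
            last = (TOKEN_OF_STATE[state], i)   # longest lexeme found so far
    if last is None:
        raise ValueError(f"no token starts at position {pos}")
    token_type, end = last
    return token_type, text[pos:end], end

def tokenize(text):
    """Repeatedly find the end of the next lexeme; spaces, tabs and newlines are skipped."""
    out, pos = [], 0
    while pos < len(text):
        if text[pos] in " \t\n":
            pos += 1
            continue
        token_type, lexeme, pos = next_token(text, pos)
        out.append((token_type, lexeme))
    return out

if __name__ == "__main__":
    print(tokenize("x1 42 y"))
```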