SPECIFICATION OF TOKENS
Strings and Languages
• Regular Expressions are an important notation for specifying patterns.
• Alphabet – any finite set of symbols
e.g. ASCII, the binary alphabet {0, 1}, Unicode, EBCDIC, Latin-1
• String – A finite sequence of symbols drawn from an alphabet
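– e.g. banana is a string over the ASCII alphabet
– The length of a string s, written |s|, is the number of symbols in s (|banana| = 6)
– The empty string, written ε, is the string of length 0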
• Other terms relating to strings: prefix; suffix; substring; proper prefix,
suffix, or substring (non-empty, not entire string); subsequence
• Language – A set of strings over a fixed alphabet
Languages
• A language, L, is simply any set of strings over a
fixed alphabet.
Alphabet                                  Languages
{0,1}                                     {0, 10, 100, 1000, 10000, …}
                                          {0, 1, 00, 11, 000, 111, …}
{a,b,c}                                   {abc, aabbcc, aaabbbccc, …}
{A,…,Z}                                   {FOR, WHILE, GOTO, …}
{A,…,Z,a,…,z,0,…,9,+,-,…,<,>,…}           {all legal PASCAL programs}
Special languages:
– ∅, the EMPTY LANGUAGE, contains no strings at all
– {ε} contains only the empty string ε
String operations
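• Given string: banana
• Prefix (remove zero or more symbols from the end): ε, ban, banana
• Suffix (remove zero or more symbols from the start): ε, ana, banana
• Substring (remove any prefix and any suffix): nan, ban, ana, banana
• Subsequence (remove any symbols, not necessarily consecutive): bnan, nn
• Proper prefix or suffix: a prefix or suffix other than ε and banana itself, e.g. ban, ana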
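The string terms above can be checked mechanically. The following is a minimal Python sketch (the function names are illustrative, not from the slides), using Python's empty string "" to play the role of ε:

```python
def prefixes(s: str) -> list[str]:
    """All prefixes of s, from ε (the empty string "") up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s: str) -> list[str]:
    """All suffixes of s, from s itself down to ε."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s: str) -> set[str]:
    """All substrings of s, including ε: remove any prefix and any suffix."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def is_subsequence(t: str, s: str) -> bool:
    """True if t is obtained by deleting zero or more symbols of s,
    not necessarily consecutive ones."""
    rest = iter(s)
    # `ch in rest` advances the iterator past the first match, so the
    # symbols of t must be found in s in left-to-right order.
    return all(ch in rest for ch in t)


def proper(items: list[str], s: str) -> list[str]:
    """Keep only the proper items: drop ε and s itself."""
    return [x for x in items if x not in ("", s)]


s = "banana"
print(prefixes(s))               # ['', 'b', 'ba', 'ban', 'bana', 'banan', 'banana']
print(proper(suffixes(s), s))    # ['anana', 'nana', 'ana', 'na', 'a']
print("nan" in substrings(s))    # True
print(is_subsequence("bnan", s), is_subsequence("nn", s), is_subsequence("nab", s))
# True True False
```

Note that every substring is also a subsequence, but not the other way round: bnan is a subsequence of banana, yet not a substring.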
String Operations
• Concatenation
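– xy is the string formed by appending y to x; $\epsilon s = s\epsilon = s$, so ε is the identity for concatenation
– Exponentiation: $s^0 = \epsilon$, and for i > 0, $s^i = s^{i-1}s$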
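In Python, string concatenation is `+` and the empty string "" behaves as ε. A small sketch (the helper name `power` and the sample strings are illustrative) that follows the recursive definition of exponentiation directly:

```python
def power(s: str, i: int) -> str:
    """s^i by the recursive definition: s^0 = ε and s^i = s^(i-1) s for i > 0."""
    if i == 0:
        return ""                   # ε, the identity for concatenation
    return power(s, i - 1) + s


x, y = "dog", "house"
print(x + y)                        # doghouse  (the concatenation xy)
print("" + x == x + "" == x)        # True: εs = sε = s
print(power("ab", 3))               # ababab
print(power("ab", 0) == "")         # True: s^0 = ε
```

Python's built-in `s * i` computes the same string; the recursive version is shown only because it mirrors the definition.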
Operations on Languages
OPERATION                                  DEFINITION
union of L and M, written L ∪ M            L ∪ M = {s | s is in L or s is in M}
concatenation of L and M, written LM       LM = {st | s is in L and t is in M}
Kleene closure of L, written L*            $L^* = \bigcup_{i=0}^{\infty} L^i$; L* denotes "zero or more concatenations of" L
positive closure of L, written L+          $L^+ = \bigcup_{i=1}^{\infty} L^i$; L+ denotes "one or more concatenations of" L
Exponentiation: $L^0 = \{\epsilon\}$, $L^1 = L$, $L^2 = LL$, and in general $L^i = L^{i-1}L$
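Because L* and L+ are usually infinite, a program can only list a finite part of them. The Python sketch below (all function names are hypothetical) implements union and concatenation exactly, builds $L^i$ by the recursive definition, and approximates the closures by stopping at $L^n$:

```python
from itertools import product


def union(L: set[str], M: set[str]) -> set[str]:
    """L ∪ M = {s | s is in L or s is in M}."""
    return L | M


def concat(L: set[str], M: set[str]) -> set[str]:
    """LM = {st | s is in L and t is in M}."""
    return {s + t for s, t in product(L, M)}


def lang_power(L: set[str], i: int) -> set[str]:
    """L^0 = {ε} and L^i = L^(i-1) L."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def kleene_upto(L: set[str], n: int) -> set[str]:
    """Finite approximation of L*: the union L^0 ∪ L^1 ∪ ... ∪ L^n."""
    result: set[str] = set()
    for i in range(n + 1):
        result |= lang_power(L, i)
    return result


def positive_upto(L: set[str], n: int) -> set[str]:
    """Finite approximation of L+: the union L^1 ∪ ... ∪ L^n."""
    result: set[str] = set()
    for i in range(1, n + 1):
        result |= lang_power(L, i)
    return result


L = {"a", "bc"}
print(sorted(kleene_upto(L, 2)))   # ['', 'a', 'aa', 'abc', 'bc', 'bca', 'bcbc']
print(sorted(positive_upto(L, 2))) # ['a', 'aa', 'abc', 'bc', 'bca', 'bcbc']
```

The only difference between the two closures is the starting index, which is exactly why ε is in L* but need not be in L+.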
Operations on Languages
• Let L be the set of letters {A, …, Z, a, …, z} and D the set of digits {0, …, 9}
• L ∪ D is the set of letters and digits
• LD is the set of strings consisting of a letter followed by a digit
• $L^4$ is the set of all four-letter strings
• L* is the set of all strings of letters, including ε
• D+ is the set of strings of one or more digits
Say What?
L = {A, B, C, D}, D = {1, 2, 3}
• L ∪ D
{A, B, C, D, 1, 2, 3}
• LD
{A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3}
• $L^2$
{AA, AB, AC, AD, BA, BB, BC, BD, CA, …, DD} (all 16 two-letter strings)
• L*
{all possible strings of symbols from L, plus ε}
• L+
L* − {ε} (since ε is not in L)
• L(L ∪ D)+
Valid: {A1, AA2, B321, CD12} Invalid: {321, 3A2}
• L(L ∪ D)*
Valid: {A, A1, A23, D3, DA3, …} Invalid: {31}
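The results on this slide can be reproduced with Python sets. A sketch, assuming exactly the slide's L = {A, B, C, D} and D = {1, 2, 3}; the membership test for L(L ∪ D)* relies on every string in L and D being a single character:

```python
from itertools import product

L = {"A", "B", "C", "D"}   # letters, as on the slide
D = {"1", "2", "3"}        # digits, as on the slide

L_union_D = L | D
LD = {s + t for s, t in product(L, D)}
L2 = {s + t for s, t in product(L, L)}


def in_L_LD_star(w: str) -> bool:
    """Membership in L(L ∪ D)*: a letter followed by zero or more letters
    or digits. Every string in L and D has length 1, so we check per symbol."""
    return len(w) >= 1 and w[0] in L and all(ch in L_union_D for ch in w[1:])


print(len(L_union_D), len(LD), len(L2))   # 7 12 16
print(sorted(LD)[:3])                     # ['A1', 'A2', 'A3']
print([in_L_LD_star(w) for w in ["A", "A1", "AA2", "321", "31"]])
# [True, True, True, False, False]
```

The sizes follow from the definitions: |L ∪ D| = 4 + 3 because L and D are disjoint, |LD| = 4 · 3, and $|L^2| = 4 \cdot 4$.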
Regular Expressions
• A regular expression is a set of rules for constructing sequences of symbols (strings) from an alphabet.
• Let Σ be an alphabet and r a regular expression over Σ. Then L(r) is the language denoted by r: the set of strings characterized by the rules of r.
Regular Expressions
• Defined over an alphabet Σ
• ε is a regular expression denoting {ε}, the set containing the empty string
• If a is a symbol in Σ, then a is a regular expression
denoting {a}, the set containing the string a
• If r and s are regular expressions denoting the
languages L(r) and L(s), then:
– (r)|(s) is a regular expression denoting L(r)U L(s)
– (r)(s) is a regular expression denoting L(r)L(s)
– (r)* is a regular expression denoting (L(r))*
– (r) is a regular expression denoting L(r)
• Precedence: * binds tightest, then concatenation, then |; all three are left associative. For example, a|bc* means a|(b(c*))
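The inductive definition translates directly into a recursive function. In this sketch a regular expression is a small tuple tree (a hypothetical representation chosen for illustration), and `lang(r, n)` returns the strings of L(r) of length at most n, since L(r) itself may be infinite:

```python
# Hypothetical AST for regular expressions over single-character symbols:
#   ("eps",)        denotes {ε}
#   ("sym", a)      denotes {a}
#   ("alt", r, s)   (r)|(s) denotes L(r) ∪ L(s)
#   ("cat", r, s)   (r)(s)  denotes L(r)L(s)
#   ("star", r)     (r)*    denotes (L(r))*
Regex = tuple


def lang(r: Regex, n: int) -> set[str]:
    """The strings of L(r) whose length is at most n."""
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if len(r[1]) <= n else set()
    if kind == "alt":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":
        left, right = lang(r[1], n), lang(r[2], n)
        return {s + t for s in left for t in right if len(s) + len(t) <= n}
    if kind == "star":
        base = lang(r[1], n) - {""}          # ε adds nothing new to r*
        result, frontier = {""}, {""}
        while frontier:                      # append one more copy of r each round
            frontier = {s + t for s in frontier for t in base
                        if len(s) + len(t) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regex node: {kind}")


def by_length(w: str) -> tuple[int, str]:
    return (len(w), w)


a, b = ("sym", "a"), ("sym", "b")
two = ("cat", ("alt", a, b), ("alt", a, b))    # (a|b)(a|b)
ex5 = ("alt", a, ("cat", ("star", a), b))      # a|a*b
print(sorted(lang(two, 5), key=by_length))     # ['aa', 'ab', 'ba', 'bb']
print(sorted(lang(ex5, 3), key=by_length))     # ['a', 'b', 'ab', 'aab']
```

The star case terminates because every piece taken from `base` is non-empty and lengths are capped at n.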
Regular Expressions
Alphabet Σ = {a, b}
1. a|b denotes {a, b}
2. (a|b)(a|b) denotes {aa, ab, ba, bb}
3. a* denotes {ε, a, aa, aaa, …}
4. (a|b)* denotes the set of all strings of a's and b's, including the empty string ε
5. a|a*b denotes {a, b, ab, aab, …}: the string a, together with every string of zero or more a's followed by b
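These five examples can be checked with Python's standard `re` module, whose `|`, `*`, and parentheses follow the same precedence rules as the slides. The sketch below (helper names are illustrative) lists every string over {a, b} of length at most 3 that each pattern matches in full:

```python
import re
from itertools import product


def strings_upto(alphabet: str, n: int):
    """Yield every string over `alphabet` of length 0..n, shortest first."""
    for k in range(n + 1):
        for tup in product(alphabet, repeat=k):
            yield "".join(tup)


def matches(pattern: str, n: int = 3) -> list[str]:
    """Strings over {a, b} of length <= n that the whole pattern matches."""
    return [w for w in strings_upto("ab", n) if re.fullmatch(pattern, w)]


for p in ["a|b", "(a|b)(a|b)", "a*", "(a|b)*", "a|a*b"]:
    print(f"{p:12} {matches(p)}")
```

For a|a*b it reports a, b, ab and aab, which agrees with item 5. `re.fullmatch` is used rather than `re.match` so that the pattern must cover the whole string, as membership in L(r) requires.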
Algebraic Properties of Regular Expressions
AXIOM                           DESCRIPTION
r | s = s | r                   | is commutative
r | (s | t) = (r | s) | t       | is associative
(r s) t = r (s t)               concatenation is associative
r (s | t) = r s | r t           concatenation distributes over |
(s | t) r = s r | t r
εr = r                          ε is the identity element for concatenation
rε = r
r* = (r | ε)*                   relation between * and ε
r** = r*                        * is idempotent
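One way to gain confidence in these axioms is to test instances of them. The sketch below picks r = a, s = b, t = ab (illustrative choices) and compares both sides with Python's `re` on every string over {a, b} up to length 5. Agreement on finitely many strings is evidence, not a proof:

```python
import re
from itertools import product


def strings_upto(alphabet: str, n: int):
    """Yield every string over `alphabet` of length 0..n."""
    for k in range(n + 1):
        for tup in product(alphabet, repeat=k):
            yield "".join(tup)


def agree_upto(p: str, q: str, n: int = 5) -> bool:
    """True if p and q fully match the same strings over {a, b} of length <= n."""
    return all(bool(re.fullmatch(p, w)) == bool(re.fullmatch(q, w))
               for w in strings_upto("ab", n))


# r = a, s = b, t = ab. (?:...) groups without capturing; an empty
# alternative (?:x|) stands for x|ε, and an empty group (?:) for ε alone.
axioms = {
    "r|s = s|r":         ("a|b", "b|a"),
    "r|(s|t) = (r|s)|t": ("a|(?:b|ab)", "(?:a|b)|ab"),
    "(rs)t = r(st)":     ("(?:ab)(?:ab)", "a(?:bab)"),
    "r(s|t) = rs|rt":    ("a(?:b|ab)", "ab|aab"),
    "(s|t)r = sr|tr":    ("(?:b|ab)a", "ba|aba"),
    "εr = r":            ("(?:)a", "a"),
    "rε = r":            ("a(?:)", "a"),
    "r* = (r|ε)*":       ("a*", "(?:a|)*"),
    "r** = r*":          ("(?:a*)*", "a*"),  # plain "a**" is rejected by re
}
for name, (lhs, rhs) in axioms.items():
    print(f"{name:18} {agree_upto(lhs, rhs)}")

# Concatenation is not commutative, so this comparison fails:
print(agree_upto("ab", "ba"))   # False
```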
Regular Definitions
• Names may be given to regular expressions, and these names can then be used like symbols
• Let Σ be an alphabet of basic symbols. A regular definition is a sequence of definitions of the form
d1 → r1
d2 → r2
. . .
dn → rn
where each di is a distinct name, and each ri is a regular expression over the symbols in $\Sigma \cup \{d_1, d_2, \ldots, d_{i-1}\}$, so each ri may use only names defined before it
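Because each ri may mention only d1, …, d(i−1), a regular definition can be turned into ordinary regular expressions by a single left-to-right pass of substitution. A sketch in Python, assuming a hypothetical convention where a name is referenced as {name} and the right-hand sides use Python `re` syntax (with the character-class shorthand [A-Za-z] for A|B|…|z):

```python
import re


def expand(definitions: list[tuple[str, str]]) -> dict[str, str]:
    """Expand d1 -> r1, ..., dn -> rn into plain regexes, in order.

    References are written {name}; a reference to a name that is not yet
    defined (undefined or defined later) is rejected.
    """
    expanded: dict[str, str] = {}
    for name, body in definitions:
        def substitute(m: re.Match) -> str:
            ref = m.group(1)
            if ref not in expanded:
                raise ValueError(f"{name} refers to undefined or later name {ref}")
            return f"(?:{expanded[ref]})"   # parenthesize to keep precedence
        expanded[name] = re.sub(r"\{([A-Za-z_]\w*)\}", substitute, body)
    return expanded


defs = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "{letter}(?:{letter}|{digit})*"),
]
ids = expand(defs)
print(ids["id"])   # (?:[A-Za-z])(?:(?:[A-Za-z])|(?:[0-9]))*
print([w for w in ["x", "count2", "2x", "a_b"] if re.fullmatch(ids["id"], w)])
# ['x', 'count2']
```

Wrapping each substituted body in a group matters: without it, a name defined as x|y would change meaning when placed next to other symbols.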
Regular Definitions
• Example 1:
– letter → A|B|…|Z|a|b|…|z
– digit → 0|1|…|9
– id → letter (letter | digit)*
• Example 2
– digit → 0 | 1 | 2 | … | 9
– digits → digit digit*
– optional_fraction → . digits | ε
– optional_exponent → ( E ( + | - | ε ) digits ) | ε
– num → digits optional_fraction optional_exponent
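Example 2 can be transcribed piece by piece into Python's `re` syntax, reusing Python variables in place of the names. In this sketch ε is written as an empty alternative `(?:x|)` and the decimal point is escaped as `\.`; the test strings are illustrative:

```python
import re

digit = r"(?:0|1|2|3|4|5|6|7|8|9)"
digits = rf"(?:{digit}{digit}*)"
optional_fraction = rf"(?:\.{digits}|)"           # . digits | ε
optional_exponent = rf"(?:E(?:\+|-|){digits}|)"   # ( E ( + | - | ε ) digits ) | ε
num = rf"{digits}{optional_fraction}{optional_exponent}"

for w in ["5280", "0.01234", "6.336E4", "1.89E-4", "1.", ".5", "1E"]:
    print(f"{w:8} {bool(re.fullmatch(num, w))}")
```

The first four strings are accepted; 1. and .5 are rejected because a fraction needs digits on both sides of the point, and 1E is rejected because E must be followed by digits.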
Regular Definitions
• Shorthand
– One or more instances: r+ denotes rr*
– Zero or one instance: r? denotes r|ε
– Character classes: [a-z] denotes a|b|…|z
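Python's `re` module supports all three shorthands, so each can be compared with its long form. This sketch (helper names are illustrative) checks r+ against rr* and r? against r|ε for r = ab, and [a-z] against a|b|…|z, on every string over a small alphabet up to a fixed length:

```python
import re
import string
from itertools import product


def strings_upto(alphabet: str, n: int):
    """Yield every string over `alphabet` of length 0..n."""
    for k in range(n + 1):
        for tup in product(alphabet, repeat=k):
            yield "".join(tup)


def same_language(p: str, q: str, alphabet: str, n: int) -> bool:
    """True if p and q fully match exactly the same strings of length <= n."""
    return all(bool(re.fullmatch(p, w)) == bool(re.fullmatch(q, w))
               for w in strings_upto(alphabet, n))


every_lower = "|".join(string.ascii_lowercase)                 # "a|b|...|z"
print(same_language("(?:ab)+", "(?:ab)(?:ab)*", "ab", 6))      # True: r+ vs rr*
print(same_language("(?:ab)?", "(?:ab|)", "ab", 6))            # True: r? vs r|ε
print(same_language("[a-z]", f"(?:{every_lower})",
                    string.ascii_lowercase + "A0", 2))         # True: [a-z] vs a|...|z
```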
Example
• digit → 0 | 1 | 2 | … | 9
• digits → digit+
• optional_fraction → (. digits)?
• optional_exponent → (E (+ | -)? digits)?
• num → digits optional_fraction optional_exponent
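The shorthand form of num should denote exactly the same language as the long form of Example 2. A sketch that writes both in Python `re` syntax (ε as an empty alternative) and compares them on every string of length at most 5 over the small alphabet {0, 1, ., E, +, -}; this is a bounded check, not a proof:

```python
import re
from itertools import product

# Shorthand version: "+" for one or more, "?" for zero or one, [0-9] for digit.
num_short = r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"

# Long form, with every shorthand expanded and ε written as an empty alternative.
digit = r"(?:0|1|2|3|4|5|6|7|8|9)"
num_long = rf"{digit}{digit}*(?:\.{digit}{digit}*|)(?:E(?:\+|-|){digit}{digit}*|)"


def agree(n: int, alphabet: str = "01.E+-") -> bool:
    """Check both patterns on every string over `alphabet` of length <= n."""
    for k in range(n + 1):
        for tup in product(alphabet, repeat=k):
            w = "".join(tup)
            if bool(re.fullmatch(num_short, w)) != bool(re.fullmatch(num_long, w)):
                return False
    return True


print(agree(5))   # True
```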
Limitations of Regular Expressions
• Some languages cannot be described by any regular
expression
• Cannot describe balanced or nested constructs
– Example: the set of all strings of balanced parentheses
– This can be done with a context-free grammar (CFG)
• Cannot describe repeated strings
– Example: {wcw | w is a string of a's and b's}
– This cannot be done with a CFG either; the language is not context-free
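• Regular expressions can denote only a fixed number of repetitions (e.g. aaa) or an unspecified number (e.g. a*); they cannot tie one count to another, as in $\{a^n b^n \mid n \ge 0\}$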
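Informally, recognizing balanced parentheses requires remembering the current nesting depth, which can grow without bound, while a regular expression (equivalently, a finite automaton) has only finitely many states. A recognizer with one unbounded counter, sketched below in Python (the function name is illustrative), handles it easily; one CFG for the same language is S → ( S ) S | ε.

```python
def balanced(s: str) -> bool:
    """Balanced parentheses via an unbounded depth counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a ')' with no matching '('
                return False
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return depth == 0              # every '(' was closed


print([balanced(s) for s in ["", "()", "(())()", "(()", ")("]])
# [True, True, True, False, False]
```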

Uploaded to SlideShare by Ratnakar Mikkili as "2_2Specification of Tokens.ppt".