Parsing Expression Grammars:
A Recognition-Based Syntactic Foundation
Bryan Ford
Massachusetts Institute of Technology
January 14, 2004
Designing a Language Syntax
Textbook Method
1. Formalize syntax via context-free grammar
2. Write a YACC parser specification
3. Hack on grammar until “near-LALR(1)”
4. Use generated parser
Pragmatic Method
1. Specify syntax informally
2. Write a recursive descent parser

What exactly does a CFG describe?
Short answer:
a rule system to generate language strings
Example CFG:
S → aaS
S → ε
Start symbol: S
Derivation: S ⇒ aaS ⇒ aaaaS ⇒ ...
Output strings: aa, aaaa, ...
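The generative reading of this CFG can be sketched in a few lines of Python. This is only an illustration: the function name `derive` and the `max_steps` bound are assumptions made for the sketch and do not come from the talk.

```python
from typing import Iterator


def derive(max_steps: int) -> Iterator[str]:
    """Yield the strings generated from S using at most max_steps uses of S -> aaS."""
    sentential = "S"  # every derivation starts from the start symbol
    for _ in range(max_steps + 1):
        # Applying S -> ε ends the derivation and produces an output string.
        yield sentential.replace("S", "")
        # Applying S -> aaS keeps the derivation going.
        sentential = sentential.replace("S", "aaS")


strings = list(derive(2))
```

Here `strings` holds the empty string, `aa` and `aaaa`: the grammar is run "forwards" to produce members of the language, with no input string involved.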
What exactly do we want to describe?
Proposed answer:
a rule system to recognize language strings
Parsing Expression Grammar (PEG)
models recursive descent parsing practice
Example PEG:
S ← aaS / ε
Input string: a a a a
Derived structure: S[a a S[a a S[ε]]]
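Read as a recognizer, the rule S ← aaS / ε corresponds to one recursive descent function. The sketch below is a minimal illustration; the name `parse_S` and the nested-list tree encoding are assumptions, not something the slides prescribe.

```python
from typing import Tuple


def parse_S(text: str, pos: int = 0) -> Tuple[list, int]:
    """Recognize S <- 'a' 'a' S / ε at pos; return (derived structure, end position)."""
    # First alternative: two 'a' terminals followed by S.
    if text.startswith("aa", pos):
        # S itself can never fail (it can always fall back to ε),
        # so once "aa" matches, this alternative succeeds.
        subtree, end = parse_S(text, pos + 2)
        return ["a", "a", subtree], end
    # Second alternative: ε, which matches without consuming input.
    return [], pos


tree, end = parse_S("aaaa")
```

For the input `aaaa` the function consumes all four characters and returns three nested S nodes. For `aaa` it consumes only the prefix `aa`, because a parsing expression matches a prefix of its input rather than the whole string.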
Take-Home Points
Key benefits of PEGs:
● Simplicity, formalism, analyzability of CFGs
● Closer match to syntax practices
– More expressive than deterministic CFGs (LL/LR)
– More of the “right kind” of expressiveness:
prioritized choice, greedy rules, syntactic predicates
– Unlimited lookahead, backtracking
● Linear-time parsing for any PEG
What kind of recursive descent parsing?
Key assumptions:
● Parsing functions are stateless:
depend only on input string
● Parsing functions make decisions locally:
return at most one result (success/failure)
Parsing Expression Grammars
Consists of: (Σ, N, R, e_S)
– Σ: finite set of terminals (character set)
– N: finite set of nonterminals
– R: finite set of rules of the form “A ← e”,
where A ∈ N, e is a parsing expression.
– e_S: a parsing expression called the start expression.
Parsing Expressions
ε             the empty string
a             terminal (a ∈ Σ)
A             nonterminal (A ∈ N)
e1 e2         a sequence of parsing expressions
e1 / e2       prioritized choice between alternatives
e?, e*, e+    optional, zero-or-more, one-or-more
&e, !e        syntactic predicates
How PEGs Express Languages
Given input string s, a parsing expression either:
– Matches and consumes a prefix s' of s.
– Fails on s.
Example:
S ← bad
S matches “badder”
S matches “baddest”
S fails on “abad”
S fails on “babe”
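The expression forms and the match-or-fail behaviour described above can be sketched as a small interpreter. Everything named here (`Empty`, `Term`, `NonTerm`, `Seq`, `Choice`, `Star`, `Not`, `match`, `literal`) is a hypothetical encoding chosen for this illustration, not an API from the talk. The guard against empty matches inside `Star` is also an assumption, added so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Empty:            # ε
    pass


@dataclass(frozen=True)
class Term:             # a single terminal
    char: str


@dataclass(frozen=True)
class NonTerm:          # reference to a rule A <- e
    name: str


@dataclass(frozen=True)
class Seq:              # e1 e2
    first: "Expr"
    second: "Expr"


@dataclass(frozen=True)
class Choice:           # e1 / e2
    first: "Expr"
    second: "Expr"


@dataclass(frozen=True)
class Star:             # e*
    body: "Expr"


@dataclass(frozen=True)
class Not:              # !e
    body: "Expr"


Expr = Union[Empty, Term, NonTerm, Seq, Choice, Star, Not]


def match(e: Expr, rules: Dict[str, Expr], s: str, i: int = 0) -> Optional[int]:
    """Return the position just past the matched prefix of s[i:], or None on failure."""
    if isinstance(e, Empty):
        return i
    if isinstance(e, Term):
        return i + 1 if s.startswith(e.char, i) else None
    if isinstance(e, NonTerm):
        return match(rules[e.name], rules, s, i)
    if isinstance(e, Seq):
        mid = match(e.first, rules, s, i)
        return None if mid is None else match(e.second, rules, s, mid)
    if isinstance(e, Choice):
        # Prioritized choice: the second alternative is tried only if the first fails.
        first = match(e.first, rules, s, i)
        return first if first is not None else match(e.second, rules, s, i)
    if isinstance(e, Star):
        while True:
            nxt = match(e.body, rules, s, i)
            if nxt is None or nxt == i:  # stop on failure or on an empty match
                return i
            i = nxt
    if isinstance(e, Not):
        # Predicate: succeeds without consuming input exactly when the body fails.
        return i if match(e.body, rules, s, i) is None else None
    raise TypeError(f"unknown parsing expression: {e!r}")


def literal(word: str) -> Expr:
    """Build the sequence of terminals that spells out word."""
    expr: Expr = Empty()
    for ch in reversed(word):
        expr = Seq(Term(ch), expr)
    return expr


RULES = {"S": literal("bad")}   # S <- bad
START = NonTerm("S")
```

With these definitions, `match(START, RULES, "badder")` returns 3 (the prefix `bad` is consumed), while `match(START, RULES, "abad")` returns `None`. The remaining operators can be treated as shorthand in this encoding: `e?` as `Choice(e, Empty())`, `e+` as `Seq(e, Star(e))`, and `&e` as `Not(Not(e))`.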
Prioritized Choice with Backtracking
S ← A / B means:
“To parse an S, first try to parse an A.
If A fails, then backtrack and try to parse a B.”
Example:
S ← if C then S else S /
    if C then S
S matches “if C then S foo”
S matches “if C then S1 else S2”
S fails on “if C else S”
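The backtracking behaviour of this example can be sketched over a list of tokens. The slide's rule has no base case, so the sketch assumes two things that are not in the talk: the condition is the single token `C`, and any token that is not a keyword counts as an atomic statement. The function names are likewise hypothetical.

```python
from typing import List, Optional

KEYWORDS = {"if", "then", "else"}


def tok(tokens: List[str], pos: int, word: str) -> Optional[int]:
    """Match one specific token."""
    return pos + 1 if pos < len(tokens) and tokens[pos] == word else None


def if_then(tokens: List[str], pos: int) -> Optional[int]:
    """Shared prefix of both alternatives: 'if' C 'then' S."""
    p = tok(tokens, pos, "if")
    if p is not None:
        p = tok(tokens, p, "C")      # assumed: the condition is the token "C"
    if p is not None:
        p = tok(tokens, p, "then")
    if p is not None:
        p = stmt(tokens, p)
    return p


def stmt(tokens: List[str], pos: int) -> Optional[int]:
    # Alternative 1: if C then S else S
    p = if_then(tokens, pos)
    if p is not None:
        p = tok(tokens, p, "else")
    if p is not None:
        p = stmt(tokens, p)
    if p is not None:
        return p
    # Alternative 2: backtrack to pos and try if C then S
    p = if_then(tokens, pos)
    if p is not None:
        return p
    # Assumed alternative 3: an atomic statement (any non-keyword token)
    if pos < len(tokens) and tokens[pos] not in KEYWORDS:
        return pos + 1
    return None


consumed = stmt("if C then S foo".split(), 0)
```

On `if C then S foo` the first alternative fails at the missing `else`, so the parser backtracks and the second alternative consumes four tokens, leaving `foo` unread. Because the longer alternative is tried first, an `else` attaches to the nearest `if`.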
Example from the C++ standard:
“An expression-statement ... can be indistinguishable
from a declaration ... In those cases the statement is a
declaration.”
statement ← declaration /
            expression-statement
Greedy Option and Repetition
A ← e? equivalent to A ← e / ε
A ← e* equivalent to A ← e A / ε
A ← e+ equivalent to A ← e e*
Example:
I ← L+
L ← a / b / c / ...
I matches “foobar”
I matches “foo(bar)”
I fails on “123”
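A minimal sketch of the greedy operators as Python combinators follows. The helper names (`letter`, `star`, `plus`, `seq`) are assumptions for illustration, and `L` is assumed to mean the lowercase ASCII letters, since the slide elides the full list.

```python
from typing import Callable, Optional

Parser = Callable[[str, int], Optional[int]]


def letter(s: str, i: int) -> Optional[int]:
    """L <- a / b / c / ... (assumed: lowercase ASCII letters)."""
    return i + 1 if i < len(s) and "a" <= s[i] <= "z" else None


def star(e: Parser) -> Parser:
    """e*: apply e for as long as it succeeds; never fails and never gives input back."""
    def parse(s: str, i: int) -> Optional[int]:
        while True:
            nxt = e(s, i)
            if nxt is None or nxt == i:
                return i
            i = nxt
    return parse


def plus(e: Parser) -> Parser:
    """e+ is e followed by e*."""
    rest = star(e)

    def parse(s: str, i: int) -> Optional[int]:
        first = e(s, i)
        return None if first is None else rest(s, first)
    return parse


def seq(first: Parser, second: Parser) -> Parser:
    """e1 e2"""
    def parse(s: str, i: int) -> Optional[int]:
        mid = first(s, i)
        return None if mid is None else second(s, mid)
    return parse


identifier = plus(letter)                   # I <- L+
greedy_trap = seq(star(letter), letter)     # L* L
```

`identifier` consumes all six characters of `foobar`, only `foo` from `foo(bar)`, and fails on `123`. The `greedy_trap` expression shows what "greedy" implies: `L*` consumes every letter and never gives one back, so the trailing `L` can never match and the whole expression fails even on `abc`.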
Syntactic Predicates
And-predicate: &e succeeds whenever e does,
but consumes no input [Parr '94, '95]
Not-predicate: !e succeeds whenever e fails
Example:
A ← foo &(bar)
B ← foo !(bar)
A matches “foobar”
A fails on “foobie”
B matches “foobie”
B fails on “foobar”
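The two predicates can be sketched as combinators that run their operand and then discard whatever it consumed. The names `lit`, `seq`, `and_pred` and `not_pred` are hypothetical helpers for this illustration.

```python
from typing import Callable, Optional

Parser = Callable[[str, int], Optional[int]]


def lit(word: str) -> Parser:
    """Match a fixed string of terminals."""
    def parse(s: str, i: int) -> Optional[int]:
        return i + len(word) if s.startswith(word, i) else None
    return parse


def seq(first: Parser, second: Parser) -> Parser:
    """e1 e2"""
    def parse(s: str, i: int) -> Optional[int]:
        mid = first(s, i)
        return None if mid is None else second(s, mid)
    return parse


def and_pred(e: Parser) -> Parser:
    """&e: succeed where e succeeds, but stay at the current position."""
    def parse(s: str, i: int) -> Optional[int]:
        return i if e(s, i) is not None else None
    return parse


def not_pred(e: Parser) -> Parser:
    """!e: succeed where e fails, again without consuming input."""
    def parse(s: str, i: int) -> Optional[int]:
        return i if e(s, i) is None else None
    return parse


A = seq(lit("foo"), and_pred(lit("bar")))   # A <- foo &(bar)
B = seq(lit("foo"), not_pred(lit("bar")))   # B <- foo !(bar)
```

Both `A` on `foobar` and `B` on `foobie` return position 3: the predicate inspects the text after `foo` but leaves it unconsumed.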
Example:
C ← B I* E
    B: begin marker; I*: internal elements; E: end marker
I ← !E (C / T)
    Only if an end marker doesn't start here,
    consume a nested comment, or else consume any single character.
B ← (*
E ← *)
T ← [any terminal]
C matches “(*ab*)cd”
C matches “(*a(*b*)c*)”
C fails on “(*a(*b*)”
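The nested-comment grammar translates almost line for line into recursive descent functions. The sketch below uses hypothetical function names (`begin`, `end`, `any_char`, `inner`, `comment`), one per rule.

```python
from typing import Optional


def begin(s: str, i: int) -> Optional[int]:
    """B <- (*"""
    return i + 2 if s.startswith("(*", i) else None


def end(s: str, i: int) -> Optional[int]:
    """E <- *)"""
    return i + 2 if s.startswith("*)", i) else None


def any_char(s: str, i: int) -> Optional[int]:
    """T <- [any terminal]"""
    return i + 1 if i < len(s) else None


def inner(s: str, i: int) -> Optional[int]:
    """I <- !E (C / T)"""
    if end(s, i) is not None:      # !E: give up if an end marker starts here
        return None
    nested = comment(s, i)         # first choice: a nested comment
    return nested if nested is not None else any_char(s, i)


def comment(s: str, i: int = 0) -> Optional[int]:
    """C <- B I* E"""
    pos = begin(s, i)
    if pos is None:
        return None
    while True:                    # I*: greedy repetition
        nxt = inner(s, pos)
        if nxt is None:
            break
        pos = nxt
    return end(s, pos)


result = comment("(*a(*b*)c*)")
```

For `(*a(*b*)c*)` the whole string is consumed. For `(*a(*b*)` the inner comment closes but the outer one never does, so `comment` returns `None`.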
Unified Grammars
PEGs can express both lexical and hierarchical
syntax of realistic languages in one grammar
● Example (in paper):
Complete self-describing PEG in 2/3 column
● Example (on web):
Unified PEG for Java language
Lexical/Hierarchical Interplay
Unified grammars create new design opportunities
Example:
To get Unicode “∀”,
instead of “\u2200”,
write “\(0x2200)”
or “\(8704)”
or “\(FOR_ALL)”
E ← S / ( E ) / ...      General-purpose expression syntax
S ← " C* "               String literals
C ← \( E ) /             Quotable characters
    !" !\ T
T ← [any terminal]
Formal Properties of PEGs
● Express all deterministic languages - LR(k)
● Closed under union, intersection, complement
● Some non-context-free languages, e.g., a^n b^n c^n
● Undecidable whether L(G) = ∅
● Predicate operators can be eliminated
– ...but the process is non-trivial!
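To make the a^n b^n c^n claim concrete, here is a sketch of one commonly cited PEG for that language, D ← &(A !b) a* B !. with A ← a A b / ε and B ← b B c / ε, written as plain Python functions. The grammar and the rule names A, B, D are not on the slide and should be read as an assumption of this example. The and-predicate checks that the a's and b's balance, and the rest of the rule checks that the b's and c's balance.

```python
def rule_A(s: str, i: int) -> int:
    """A <- a A b / ε  (never fails, thanks to the ε alternative)."""
    if s.startswith("a", i):
        after_inner = rule_A(s, i + 1)
        if s.startswith("b", after_inner):
            return after_inner + 1
    return i  # fall back to ε, consuming nothing


def rule_B(s: str, i: int) -> int:
    """B <- b B c / ε  (never fails, thanks to the ε alternative)."""
    if s.startswith("b", i):
        after_inner = rule_B(s, i + 1)
        if s.startswith("c", after_inner):
            return after_inner + 1
    return i  # fall back to ε, consuming nothing


def rule_D(s: str) -> bool:
    """D <- &(A !b) a* B !.   True when s is a^n b^n c^n for some n >= 0."""
    # &(A !b): look ahead for balanced a's and b's not followed by another b.
    after_A = rule_A(s, 0)
    if s.startswith("b", after_A):
        return False
    # The predicate consumed nothing, so matching resumes at position 0.
    i = 0
    while s.startswith("a", i):    # a*
        i += 1
    i = rule_B(s, i)               # B: balanced b's and c's
    return i == len(s)             # !. : no input may remain


accepted = rule_D("aabbcc")
```

`rule_D` accepts `aabbcc` and rejects strings such as `aabbc` or `aabbbccc`, where one of the two balance checks fails.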
Minimalist Forms
Predicate-free PEG
⇩
TS [Birman '70/'73], TDPL [Aho '72]
A ← ε
A ← a
A ← f
A ← BC / D
Any PEG
⇩
gTS [Birman '70/'73], GTDPL [Aho '72]
A ← ε
A ← a
A ← f
A ← B[C, D]
TS/TDPL ⇦⇨ gTS/GTDPL
Formal Contributions
● Generalize TDPL/GTDPL with more expressive
structured parsing expression syntax
● Negative syntactic predicate - !e
● Predicate elimination transformation
– Intermediate stages depend on
generalized parsing expressions
● Proof of equivalence of TDPL and GTDPL
What can't PEGs express directly?
● Ambiguous languages
That's what CFGs were designed for!
● Globally disambiguated languages?
– {a,b}^n a {a,b}^n ?
● State- or semantic-dependent syntax
– C, C++ typedef symbol tables
– Python, Haskell, ML layout
Generating Parsers from PEGs
Recursive-descent parsing
☞ Simple & direct, but exponential-time if not careful
Packrat parsing [Birman '70/'73, Ford '02]
☞ Linear-time, but can consume substantial storage
Classic LL/LR algorithms?
☞ Grammar restrictions, but both time- & space-efficient
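The gap between plain backtracking and packrat parsing can be illustrated by memoizing each (rule, position) result. The grammar E ← T '+' E / T, T ← '(' E ')' / 'n' and the `Parser` class below are assumptions invented for this sketch, not taken from the talk. The grammar was chosen because, without memoization, every nesting level re-parses T twice.

```python
from typing import Dict, Optional, Tuple


class Parser:
    """Recursive descent for E <- T '+' E / T and T <- '(' E ')' / 'n'."""

    def __init__(self, text: str, memoize: bool) -> None:
        self.text = text
        self.memoize = memoize
        self.memo: Dict[Tuple[str, int], Optional[int]] = {}
        self.calls = 0  # number of rule bodies actually evaluated

    def apply(self, rule: str, pos: int) -> Optional[int]:
        """Evaluate a rule at pos, reusing a stored result when memoizing."""
        key = (rule, pos)
        if self.memoize and key in self.memo:
            return self.memo[key]
        self.calls += 1
        result = getattr(self, rule)(pos)
        if self.memoize:
            self.memo[key] = result
        return result

    def lit(self, ch: str, pos: int) -> Optional[int]:
        return pos + 1 if self.text.startswith(ch, pos) else None

    def expr(self, pos: int) -> Optional[int]:
        # Alternative 1: T '+' E
        end = self.apply("term", pos)
        if end is not None:
            plus = self.lit("+", end)
            if plus is not None:
                rest = self.apply("expr", plus)
                if rest is not None:
                    return rest
        # Alternative 2: backtrack and parse T again at the same position.
        return self.apply("term", pos)

    def term(self, pos: int) -> Optional[int]:
        # Alternative 1: '(' E ')'
        inner = self.lit("(", pos)
        if inner is not None:
            end = self.apply("expr", inner)
            if end is not None:
                close = self.lit(")", end)
                if close is not None:
                    return close
        # Alternative 2: 'n'
        return self.lit("n", pos)


text = "(" * 10 + "n" + ")" * 10
naive = Parser(text, memoize=False)
packrat = Parser(text, memoize=True)
naive_end = naive.apply("expr", 0)
packrat_end = packrat.apply("expr", 0)
```

Both parsers accept the input, but `naive.calls` roughly doubles with each extra level of parentheses, whereas `packrat.calls` grows linearly with the input length. The cost is the `memo` table, which holds one entry per rule and position evaluated, which is the storage trade-off noted above.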
Conclusion
PEGs model common parsing practices
– Prioritized choice, greedy rules, syntactic predicates
PEGs naturally complement CFGs
– CFG: generative system, for ambiguous languages
– PEG: recognition-based, for unambiguous languages
For more info:
http://pdos.lcs.mit.edu/~baford/packrat
(or Google for “Packrat Parsing”)
