ANTLR – Writing Parsers the Easy Way
Michael Yarichuk
What do these have in common?
• Parse text logs (or any structured data) to make it searchable
• Parse a custom (and complex!) configuration format
• Allow users to query your data
• Adjust or refactor incoming structured user queries
• Implement a DSL
• Parse a custom data format (no, not with REGEX!!)
Parsers!
And no, in case you were wondering, regex is NOT an alternative to parsers!
So, let's dive deep!
First, what are parsers anyway?
We will talk about the differences a bit later…
Regardless of type, parsers are "magic"
Magic!
Source Code
Abstract Syntax Tree
Magic? No, rules!
group ::= '(' expression ')'
factor ::= integer | group
term ::= factor (('*' factor) | ('/' factor))*
expression ::= term (('+' term) | ('-' term))*
Extended Backus-Naur Form (EBNF) grammar!
Parsing Process
Tokenize → Apply grammar rules → Build AST
LPAREN NUMBER PLUS_OP NUMBER RPAREN MULT_OP NUMBER
Tokenization (Lexing)
(1 + 2) * 3
Token Stream
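The tokenization step can be sketched in a few lines of Python. This is only an illustration, not what ANTLR generates: the token names come from the slide, while MINUS_OP, DIV_OP, WS and all of the regex patterns are assumptions made for the example.

```python
import re

# Token names follow the slide; MINUS_OP, DIV_OP, WS and the patterns are assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS_OP", r"\+"),
    ("MINUS_OP", r"-"),
    ("MULT_OP", r"\*"),
    ("DIV_OP", r"/"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WS", r"\s+"),
]
# One alternation of named groups; the group that matched names the token.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn the input text into a list of (token_name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "WS":  # whitespace is recognized but not emitted
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print([name for name, _ in tokenize("(1 + 2) * 3")])
# ['LPAREN', 'NUMBER', 'PLUS_OP', 'NUMBER', 'RPAREN', 'MULT_OP', 'NUMBER']
```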
Grammar rules = state machine!
Grammar for algebraic expressions
Notice the recursion!
Tokens
Abstract Syntax Tree
(1 + 2) * 3
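One possible in-memory shape for this tree, sketched in Python. The nested-tuple representation and the evaluate helper are assumptions for illustration; ANTLR produces its own parse-tree classes. Note that the parentheses do not appear in the tree, because the grouping is encoded by nesting.

```python
import operator

# AST for "(1 + 2) * 3" as nested tuples: (operator, left, right).
AST = ("*", ("+", 1, 2), 3)

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node):
    """Evaluate the tree bottom-up: leaves are integers, inner nodes are operators."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


print(evaluate(AST))  # 9
```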
LL Parser
• Predict: based on the current token and lookahead, decide which rule to try to apply
• Match: apply the grammar rule and add the result to the AST
• Top-down parsing!
• Backtrack if predict step is wrong
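A hand-written recursive-descent parser is the simplest form of top-down (LL) parsing. The sketch below follows the expression/term/factor/group grammar shown earlier, with one method per rule. It is an illustrative assumption, not ANTLR output: the class and method names are made up, and because one token of lookahead is enough for this grammar, the predict step is never wrong and no backtracking is needed.

```python
import re


def tokenize(text):
    # Integers and single-character operators; anything else is silently skipped.
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Lookahead: the current token, or None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def expression(self):
        # expression ::= term (('+' term) | ('-' term))*
        node = self.term()
        while self.peek() in ("+", "-"):  # predict
            op = self.peek()
            self.match(op)                # match
            node = (op, node, self.term())
        return node

    def term(self):
        # term ::= factor (('*' factor) | ('/' factor))*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.match(op)
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor ::= integer | group ; group ::= '(' expression ')'
        tok = self.peek()
        if tok == "(":
            self.match("(")
            node = self.expression()
            self.match(")")
            return node
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.peek()!r}")
    return tree


print(parse("(1 + 2) * 3"))  # ('*', ('+', 1, 2), 3)
```

The loops in expression() and term() make the operators left-associative, so "8 - 3 - 2" groups as (8 - 3) - 2.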
LR Parsers
• Shift: push the next token onto the buffer (a stack)
• Reduce: apply a grammar rule to the tokens on top of the buffer
• Bottom-up parsing!
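The shift/reduce loop can be illustrated with a deliberately tiny grammar, sum ::= sum '+' NUMBER | NUMBER. This is a naive sketch under stated assumptions: the grammar, the function name and the trace format are invented for the example, and a real LR parser decides between shifting and reducing from a generated parse table rather than by comparing the top of the stack against hard-coded right-hand sides.

```python
def shift_reduce(tokens):
    """Recognize sum ::= sum '+' NUMBER | NUMBER; return (accepted, trace)."""
    stack = []
    trace = []
    for tok in tokens:
        stack.append(tok)  # shift: push the next token
        trace.append(f"shift {tok}")
        # reduce: while a rule's right-hand side sits on top of the stack,
        # replace it with the rule's left-hand side
        while True:
            if stack[-3:] == ["sum", "+", "NUMBER"]:
                del stack[-3:]
                stack.append("sum")
                trace.append("reduce sum ::= sum '+' NUMBER")
            elif stack == ["NUMBER"]:
                stack[:] = ["sum"]
                trace.append("reduce sum ::= NUMBER")
            else:
                break
    # Input is accepted when everything has been reduced to the start symbol.
    return stack == ["sum"], trace


accepted, trace = shift_reduce(["NUMBER", "+", "NUMBER"])
print(accepted)  # True
for step in trace:
    print(step)
```

The tree is built from the leaves upward: tokens are collected first, and a rule is recognized only once its whole right-hand side has been seen.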
LL Parsers vs LR Parsers
• Differ in their ambiguity-resolving capabilities
• Error handling is better in LL (better error context!)
• LL implementations are easier to understand
• Pretty much equal in results
• In most cases, performance is not an issue!
Note: other types of parsers are less useful in practice
https://core.ac.uk/download/pdf/62921535.pdf
Why ANTLR?
• LL(*) parser generator (ANTLR4 uses the adaptive variant, ALL(*))
• Can generate parsers in MANY languages
• (C#, Java, C++, JavaScript and more)
• Can parse pretty much any useful grammar
• Can handle regular and context-free languages in the Chomsky hierarchy
• Resolves ambiguities with programmatic predicates
ANTLR4 Grammar
• Combined grammar (lexer + parser in the same file)
• Separate grammars (lexer and parser in different files)
• Can span multiple files with "includes" (grammar imports)
Typical workflow
antlr4 [options] grammar.g4
Lexer
• “Rules” define how to parse each token
• “Rule” definitions are a variant of regex
• Can define “fragments” – composable and re-usable definitions
• Lexing is “greedy”: the longest match wins
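Greedy lexing and fragments can be imitated in plain Python. The sketch below assumes a small hypothetical rule set (IF, ID, FLOAT, INT, LE, LT, ASSIGN, WS). DIGIT and LETTER play the role of fragments: they are reused inside other rules but never produce tokens themselves. At each position the lexer takes the longest match, and on a tie the rule listed first wins, which mirrors how ANTLR lexers resolve such conflicts.

```python
import re

# "Fragments": reusable building blocks, never emitted as tokens.
DIGIT = r"[0-9]"
LETTER = r"[a-zA-Z_]"

# Hypothetical rules in priority order (earlier rules win ties).
RULES = [
    ("IF", r"if"),
    ("ID", rf"{LETTER}({LETTER}|{DIGIT})*"),
    ("FLOAT", rf"{DIGIT}+\.{DIGIT}+"),
    ("INT", rf"{DIGIT}+"),
    ("LE", r"<="),
    ("LT", r"<"),
    ("ASSIGN", r"="),
    ("WS", r"\s+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def lex(text):
    pos, out = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # Strictly longer only, so the earlier rule keeps a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if best_name != "WS":
            out.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return out


print(lex("if"))         # [('IF', 'if')]
print(lex("iffy"))       # [('ID', 'iffy')]
print(lex("x <= 3.14"))  # [('ID', 'x'), ('LE', '<='), ('FLOAT', '3.14')]
```

"iffy" becomes a single ID rather than IF followed by ID because the longer match wins, while "if" alone matches both IF and ID at the same length and goes to IF, the earlier rule.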
Parsing
• Parsing “rules” represent a state machine
• Parsing “rules” may use either tokens or other rules
• ANTLR4 supports direct left recursion
Interpreting: ANTLR4 Visitor vs Listener
Visitor
• Needs explicit Visit() calls
• Call Visit() for each rule
Listener
• Methods called by ANTLR
• Traversal “events”
• EnterXYZ()
• VisitTerminal() (for tokens)
• ExitXYZ()
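The difference between the two styles can be shown without ANTLR at all. In this sketch the tuple-based tree, the walk function and the class names are assumptions made for illustration; ANTLR instead generates per-rule Visit/Enter/Exit methods and ships its own tree walker. The visitor drives the traversal itself and returns values, while the listener only reacts to events fired by a walker and therefore keeps its intermediate results on a stack.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Hypothetical parse tree for "(1 + 2) * 3": ("num", value) or (op, left, right).
TREE = ("*", ("+", ("num", 1), ("num", 2)), ("num", 3))


class EvalVisitor:
    """Visitor: our code decides which children to visit; each visit returns a value."""

    def visit(self, node):
        if node[0] == "num":
            return node[1]
        op, left, right = node
        return OPS[op](self.visit(left), self.visit(right))  # explicit visit calls


def walk(listener, node):
    """Walker: owns the traversal and fires enter/exit events on the listener."""
    listener.enter(node)
    if node[0] != "num":
        walk(listener, node[1])
        walk(listener, node[2])
    listener.exit(node)


class EvalListener:
    """Listener: methods are called for us; results are accumulated on a stack."""

    def __init__(self):
        self.stack = []

    def enter(self, node):
        pass  # nothing to do on the way down in this example

    def exit(self, node):
        if node[0] == "num":
            self.stack.append(node[1])
        else:
            right = self.stack.pop()
            left = self.stack.pop()
            self.stack.append(OPS[node[0]](left, right))


print(EvalVisitor().visit(TREE))  # 9

listener = EvalListener()
walk(listener, TREE)
print(listener.stack)  # [9]
```

A visitor is usually the more natural fit when each rule must compute and return a value, and a listener when the code only needs to react to parts of the tree as they go by.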
ANTLR4 demo
What we didn't cover
• Syntax ambiguity
• Parser performance (multiple ways to write the same thing!)
• Error handling (ANTLR4)
• Island grammars (ANTLR4)
• Actions and attributes (ANTLR4)
• Semantic predicates (ANTLR4)
Questions?
• michael.yarichuk@gmail.com
• @myarichuk
• https://github.com/myarichuk/AlgebraExpressionEvaluator
• https://github.com/myarichuk/RavenQueryParser
