TABLE OF CONTENT

 1. Table of Content
 2. Introduction
       1. Summary
       2. About The Author
       3. Before We Begin
 3. Overview
       1. The Four Parts of a Language
       2. Meet Awesome: Our Toy Language
 4. Lexer
       1. Lex (Flex)
       2. Ragel
       3. Python Style Indentation For Awesome
       4. Do It Yourself I
 5. Parser
       1. Bison (Yacc)
       2. Lemon
       3. ANTLR
       4. PEGs
       5. Operator Precedence
       6. Connecting The Lexer and Parser in Awesome
       7. Do It Yourself II
 6. Runtime Model
       1. Procedural
       2. Class-based
       3. Prototype-based
       4. Functional
       5. Our Awesome Runtime
       6. Do It Yourself III
 7. Interpreter
       1. Do It Yourself IV
 8. Compilation
       1. Using LLVM from Ruby
       2. Compiling Awesome to Machine Code
 9. Virtual Machine
       1. Byte-code
       2. Types of VM
       3. Prototyping a VM in Ruby
10. Going Further
       1. Homoiconicity
       2. Self-Hosting
       3. What’s Missing?
11. Resources
       1. Books & Papers
       2. Events
       3. Forums and Blogs
       4. Interesting Languages
12. Solutions to Do It Yourself
       1. Solutions to Do It Yourself I
       2. Solutions to Do It Yourself II
       3. Solutions to Do It Yourself III
       4. Solutions to Do It Yourself IV
13. Appendix: Mio, a minimalist homoiconic language
       1. Homoicowhat?
       2. Messages all the way down
       3. The Runtime
       4. Implementing Mio in Mio
       5. But it’s ugly
14. Farewell!
Published November 2011.

Cover background image © Asja Boros

Content of this book is © Marc-André Cournoyer. All rights reserved. This eBook copy
is for a single user. You may not share it in any way unless you have written permission
of the author.
This is a sample chapter.
Buy the full book online at
   createyourproglang.com
PARSER

    By themselves, the tokens output by the lexer are just building blocks. The parser
    contextualizes them by organizing them into a structure. The lexer produces an array of
    tokens; the parser produces a tree of nodes.

    Let's take the tokens from the previous section:


    [IDENTIFIER print] [STRING "I ate"] [COMMA]
                       [NUMBER 3] [COMMA]
                       [IDENTIFIER pies]



    The most common parser output is an Abstract Syntax Tree, or AST. It’s a tree of
    nodes that represents what the code means to the language. The previous lexer
    tokens will produce the following:


    [<Call name=print,
           arguments=[<String value="I ate">,
                      <Number value=3>,
                      <Local name=pies>]
    >]



    Or as a visual tree:

    Figure 2
    The parser found that print was a method call and that the following tokens are its
    arguments.
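
    The step from tokens to tree can be sketched by hand in Ruby. This is only an illustration for the single statement above: the node classes and the `parse_call` and `parse_argument` helpers are hypothetical names chosen to mirror the AST listing, not the book's actual code, and the tokens are assumed to arrive as [type, value] pairs.

```ruby
# Hypothetical node classes; the names mirror the AST printed above,
# but the real classes of the Awesome language may differ.
CallNode   = Struct.new(:name, :arguments)
StringNode = Struct.new(:value)
NumberNode = Struct.new(:value)
LocalNode  = Struct.new(:name)

# Tokens as [type, value] pairs (an assumed lexer output format).
TOKENS = [
  [:IDENTIFIER, "print"], [:STRING, "I ate"], [:COMMA, ","],
  [:NUMBER, 3], [:COMMA, ","],
  [:IDENTIFIER, "pies"]
].freeze

# Turn one argument token into a leaf node.
def parse_argument(token)
  type, value = token
  case type
  when :STRING     then StringNode.new(value)
  when :NUMBER     then NumberNode.new(value)
  when :IDENTIFIER then LocalNode.new(value)
  else raise ArgumentError, "unexpected token #{type.inspect}"
  end
end

# Parse `IDENTIFIER [arg (COMMA arg)*]` into a CallNode.
def parse_call(tokens)
  tokens = tokens.dup # work on a copy so the caller's array is untouched
  type, name = tokens.shift
  raise ArgumentError, "expected IDENTIFIER" unless type == :IDENTIFIER

  arguments = []
  unless tokens.empty?
    arguments << parse_argument(tokens.shift)
    until tokens.empty?
      separator_type, = tokens.shift
      raise ArgumentError, "expected COMMA" unless separator_type == :COMMA
      raise ArgumentError, "expected argument after COMMA" if tokens.empty?
      arguments << parse_argument(tokens.shift)
    end
  end
  CallNode.new(name, arguments)
end

AST = parse_call(TOKENS)
```

    Here `AST.name` is "print" and `AST.arguments` holds a StringNode, a NumberNode and a LocalNode, in that order. A parser generator produces this kind of code for a whole grammar instead of one statement shape.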

    Parser generators are commonly used to accomplish the otherwise tedious task of
    building a parser. Much like the English language, a programming language needs a
    grammar to define its rules. The parser generator will convert this grammar into a parser
    that will compile lexer tokens into AST nodes.


    BISON (YACC)

    Bison is a modern version of Yacc, the most widely used parser generator. Yacc stands
    for Yet Another Compiler Compiler, because it compiles the grammar into a compiler of
    tokens. It's used by several mainstream languages, such as Ruby. It is most often paired
    with Lex and has been ported to several target languages:

             Racc for Ruby
             PLY for Python
             JavaCC for Java

    Like Lex, from the previous chapter, Yacc compiles a grammar into a parser. Here’s how
    a Yacc grammar rule is defined:


    Call: /* Name of the rule */
        Expression '.' IDENTIFIER                     { $$ = CallNode_new($1, $3, NULL); }
    |   Expression '.' IDENTIFIER '(' ArgList ')'     { $$ = CallNode_new($1, $3, $5); }
    /*      $1      $2     $3      $4    $5    $6     <= values from the rule are stored in
                                                         these variables. */
    ;



    On the left is the definition of how the rule can be matched, using tokens and other
    rules. On the right, between curly braces, is the action to execute when the rule
    matches. In that block, we can reference the values being matched using $1, $2, etc.
    Finally, we store the result in $$.
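
    To make the role of $1, $3 and $$ concrete, the two actions can be modelled in plain Ruby as functions from the matched values to a result. This is a sketch, not generated parser code: the three-field `CallNode` struct stands in for `CallNode_new`, and the lambda names and sample values are made up for illustration. In Racc, the Ruby port used later in this chapter, actions read the matched values from a `val` array and assign to `result` in much the same way.

```ruby
# Hypothetical stand-in for the C function CallNode_new(receiver, name, args).
CallNode = Struct.new(:receiver, :method_name, :arguments)

# Each action receives the values matched by its rule. Yacc's $1, $2, $3...
# correspond to val[0], val[1], val[2]... and the return value plays the
# role of $$.

# Expression '.' IDENTIFIER
CALL_WITHOUT_ARGS = ->(val) { CallNode.new(val[0], val[2], nil) }

# Expression '.' IDENTIFIER '(' ArgList ')'
CALL_WITH_ARGS = ->(val) { CallNode.new(val[0], val[2], val[4]) }

# Sample values a parser might have collected for: user.greet("hi")
# (:user_expression is a placeholder for an already-built Expression node).
matched = [:user_expression, ".", "greet", "(", ["hi"], ")"]
node = CALL_WITH_ARGS.call(matched)
puts node.method_name # prints "greet"
```

    The punctuation tokens ('.', '(' and ')') are matched but ignored by the actions, which is why only positions 1, 3 and 5 appear in the grammar rule.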


LEMON

Lemon is quite similar to Yacc, with a few differences. From its website:

                    Using a different grammar syntax which is less prone to programming
                    errors.

                    The parser generated by Lemon is both re-entrant and thread-safe.

                    Lemon includes the concept of a non-terminal destructor, which
                    makes it much easier to write a parser that does not leak memory.


For more information, refer to the manual or check real examples inside Potion.


ANTLR

ANTLR is another parsing tool. This one lets you declare lexing and parsing rules in
the same grammar. It has been ported to several target languages.


PEGS

Parsing Expression Grammars, or PEGs, are very powerful at parsing complex
languages. I've used a PEG parser generated by peg/leg in tinyrb to parse Ruby's
infamous syntax with encouraging results (tinyrb's grammar).

Treetop is an interesting Ruby tool for creating PEG parsers.


OPERATOR PRECEDENCE

One of the common pitfalls of language parsing is operator precedence. Parsing
x + y * z should not produce the same result as (x + y) * z, and the same goes for all
other operators. Each language has an operator precedence table, often based on the
mathematical order of operations. Several ways to handle this exist. Yacc-based
parsers let you give a precedence level to each kind of operator, in the spirit of the
Shunting Yard algorithm; the generated parser uses those levels to decide between
shifting and reducing. Operators are declared in Bison and Yacc with the %left and
%right declarations. Read more in Bison's manual.

    Here’s the operator precedence table for our language, based on the C language
    operator precedence:


    left   '.'
    right  '!'
    left   '*' '/'
    left   '+' '-'
    left   '>' '>=' '<' '<='
    left   '==' '!='
    left   '&&'
    left   '||'
    right  '='
    left   ','



    The higher the precedence (top is higher), the sooner the operator will be parsed. If
    the line a + b * c is being parsed, the part b * c will be parsed first since *
    has higher precedence than +. Now, if several operators with the same
    precedence are competing to be parsed all at once, the conflict is resolved using
    associativity, declared with the left and right keywords before the token. For
    example, take the expression a = b = c. Since = has right-to-left associativity,
    parsing starts from the right with b = c, resulting in a = (b = c).
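
    The effect of precedence and associativity can be tried out with a small Ruby sketch. Note the swapped-in technique: this uses precedence climbing, a hand-written approach, and is not what Yacc or Racc generate internally. The `PRECEDENCE` table and `parse_expression` are hypothetical names; the table covers only the binary operators listed above (the unary '!' and the '.' call operator are left out), and operands are assumed to be single tokens.

```ruby
# Binary operators from the table above. A higher level binds tighter.
PRECEDENCE = {
  ","  => [1, :left],
  "="  => [2, :right],
  "||" => [3, :left],
  "&&" => [4, :left],
  "==" => [5, :left], "!=" => [5, :left],
  ">"  => [6, :left], ">=" => [6, :left], "<" => [6, :left], "<=" => [6, :left],
  "+"  => [7, :left], "-"  => [7, :left],
  "*"  => [8, :left], "/"  => [8, :left]
}.freeze

# Parse a flat token array such as %w[a + b * c] into nested arrays of
# the form [operator, left, right]. The token array is consumed.
def parse_expression(tokens, min_level = 1)
  left = tokens.shift
  while (entry = PRECEDENCE[tokens.first]) && entry[0] >= min_level
    level, associativity = entry
    operator = tokens.shift
    # Left-associative: the right side may only contain tighter operators.
    # Right-associative: the right side may contain the same level again.
    next_min = associativity == :left ? level + 1 : level
    right = parse_expression(tokens, next_min)
    left = [operator, left, right]
  end
  left
end

p parse_expression(%w[a + b * c]) # b * c is grouped first
p parse_expression(%w[a = b = c]) # right-to-left: a = (b = c)
p parse_expression(%w[a - b - c]) # left-to-right: (a - b) - c
```

    The only difference between the left and right cases is whether the recursive call accepts operators of the same level, which is exactly the choice the left and right keywords express in the table.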

    For other types of parsers (ANTLR and PEG) a simpler but less efficient alternative
    can be used. Simply declaring the grammar rules in the right order will produce the
    desired result:

    expression:      equality
    equality:        additive ( ( '==' | '!=' ) additive )*
    additive:        multiplicative ( ( '+' | '-' ) multiplicative )*
    multiplicative:  primary ( ( '*' | '/' ) primary )*
    primary:         '(' expression ')' | NUMBER | VARIABLE | '-' primary



    The parser will try to match rules recursively, starting from expression and
    finding its way to primary. Since multiplicative is the last operator rule reached
    before primary, * and / bind more tightly than the operators in the rules above it.
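
    The grammar above maps almost line for line onto a recursive-descent parser, which may make the ordering argument easier to see. The following Ruby sketch is an illustration under a few assumptions: `ExpressionParser` and its `binary` helper are hypothetical names, tokens are plain Ruby values (Integers for NUMBER, lowercase identifier strings for VARIABLE, strings for operators and parentheses), and the output is nested arrays rather than real AST nodes.

```ruby
# Recursive-descent sketch of the grammar above. Each rule is one method.
class ExpressionParser
  def initialize(tokens)
    @tokens = tokens.dup
  end

  # expression: equality
  def expression
    equality
  end

  # equality: additive ( ( '==' | '!=' ) additive )*
  def equality
    binary(%w[== !=]) { additive }
  end

  # additive: multiplicative ( ( '+' | '-' ) multiplicative )*
  def additive
    binary(%w[+ -]) { multiplicative }
  end

  # multiplicative: primary ( ( '*' | '/' ) primary )*
  def multiplicative
    binary(%w[* /]) { primary }
  end

  # primary: '(' expression ')' | NUMBER | VARIABLE | '-' primary
  def primary
    token = @tokens.shift
    case token
    when "("
      node = expression
      raise ArgumentError, "expected )" unless @tokens.shift == ")"
      node
    when "-"             then ["neg", primary]
    when Integer         then token # NUMBER
    when /\A[a-z_]\w*\z/ then token # VARIABLE
    else raise ArgumentError, "unexpected token #{token.inspect}"
    end
  end

  private

  # Shared loop for `operand ( operator operand )*`. The block parses one
  # operand; results are combined into a left-associative tree.
  def binary(operators)
    left = yield
    while operators.include?(@tokens.first)
      operator = @tokens.shift
      left = [operator, left, yield]
    end
    left
  end
end

p ExpressionParser.new([1, "+", 2, "*", 3]).expression
```

    Because additive can only obtain its operands by calling multiplicative, every * or / is folded into an operand before + or - sees it. This sketch does not check for leftover tokens after the expression, which a real parser would report as an error.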


    CONNECTING THE LEXER AND PARSER IN AWESOME

    For our Awesome parser we’ll use Racc, the Ruby version of Yacc. It’s much harder
    to build a parser from scratch than it is to create a lexer. However, many language
    implementations end up with a hand-written parser because the result is faster and
    provides better error reporting.

    The input file you supply to Racc contains the grammar of your language and is very
    similar to a Yacc grammar.


    # grammar.y
    class Parser

    # Declare tokens produced by the lexer
    token IF ELSE
    token DEF
    token CLASS
    token NEWLINE
    token NUMBER
    token STRING
    token TRUE FALSE NIL
    token IDENTIFIER
    token CONSTANT
    token INDENT DEDENT
