SlideShare ist ein Scribd-Unternehmen logo
1 von 64
Lex Yacc tutorial 
Kun-Yuan Hsieh 
kyshieh@pllab.cs.nthu.edu.tw 
Programming Language Lab., NTHU 
PLLab, NTHU,Cs2403 Programming 
Languages 
1
PLLab, NTHU,Cs2403 Programming 
Languages 
2 
Overview 
take a glance at Lex!
Compilation Sequence 
PLLab, NTHU,Cs2403 Programming 
Languages 
3
PLLab, NTHU,Cs2403 Programming 
Languages 
4 
What is Lex? 
• The main job of a lexical analyzer 
(scanner) is to break up an input stream 
into more usable elements (tokens) 
a = b + c * d; 
ID ASSIGN ID PLUS ID MULT ID SEMI 
• Lex is an utility to help you rapidly 
generate your scanners
Lex – Lexical Analyzer 
• Lexical analyzers tokenize input streams 
• Tokens are the terminals of a language 
– English 
PLLab, NTHU,Cs2403 Programming 
Languages 
5 
• words, punctuation marks, … 
– Programming language 
• Identifiers, operators, keywords, … 
• Regular expressions define 
terminals/tokens
Lex Source Program 
PLLab, NTHU,Cs2403 Programming 
Languages 
6 
• Lex source is a table of 
– regular expressions and 
– corresponding program fragments 
digit [0-9] 
letter [a-zA-Z] 
%% 
{letter}({letter}|{digit})* printf(“id: %sn”, yytext); 
n printf(“new linen”); 
%% 
main() { 
yylex(); 
}
Lex Source to C Program 
• The table is translated to a C program 
(lex.yy.c) which 
– reads an input stream 
– partitioning the input into strings which 
match the given expressions and 
– copying it to an output stream if necessary 
PLLab, NTHU,Cs2403 Programming 
Languages 
7
An Overview of Lex 
PLLab, NTHU,Cs2403 Programming 
Languages 
8 
Lex 
C compiler 
a.out 
Lex source 
program 
lex.yy.c 
input 
lex.yy.c 
a.out 
tokens
(required) 
(optional) 
{definitions} 
%% 
{transition rules} 
%% 
{user subroutines} 
PLLab, NTHU,Cs2403 Programming 
Languages 
9 
Lex Source 
• Lex source is separated into three sections by % 
% delimiters 
• The general format of Lex source is 
• The %% 
absolute minimum Lex program is thus
PLLab, NTHU,Cs2403 Programming 
Languages 
10 
Lex v.s. Yacc 
• Lex 
– Lex generates C code for a lexical analyzer, or 
scanner 
– Lex uses patterns that match strings in the input 
and converts the strings to tokens 
• Yacc 
– Yacc generates C code for syntax analyzer, or 
parser. 
– Yacc uses grammar rules that allow it to analyze 
tokens from Lex and create a syntax tree.
Lex source 
(Lexical Rules) 
Yacc source 
(Grammar Rules) 
call 
Input Parsed 
PLLab, NTHU,Cs2403 Programming 
Languages 
11 
Lex with Yacc 
Lex Yacc 
yylex() yyparse() 
Input 
lex.yy.c y.tab.c 
return token
Regular Expressions 
PLLab, NTHU,Cs2403 Programming 
Languages 
12
Lex Regular Expressions 
(Extended Regular Expressions) 
• A regular expression matches a set of strings 
• Regular expression 
PLLab, NTHU,Cs2403 Programming 
Languages 
13 
– Operators 
– Character classes 
– Arbitrary character 
– Optional expressions 
– Alternation and grouping 
– Context sensitivity 
– Repetitions and definitions
PLLab, NTHU,Cs2403 Programming 
Languages 
14 
Operators 
“  [ ] ^ - ? . * + | ( ) $ / { } % < > 
• If they are to be used as text characters, an 
escape should be used 
$ = “$” 
 = “” 
• Every character but blank, tab (t), newline (n) 
and the list above is always a text character
Character Classes [] 
• [abc] matches a single character, which may 
be a, b, or c 
• Every operator meaning is ignored except  - 
and ^ 
• e.g. 
[ab] => a or b 
[a-z] => a or b or c or … or z 
[-+0-9] => all the digits and the two signs 
[^a-zA-Z] => any character which is not a 
PLLab, NTHU,Cs2403 Programming 
Languages 
15 
letter
Arbitrary Character . 
• To match almost character, the 
operator character . is the class of all 
characters except newline 
• [40-176] matches all printable 
characters in the ASCII character set, 
from octal 40 (blank) to octal 176 
(tilde~) 
PLLab, NTHU,Cs2403 Programming 
Languages 
16
Optional & Repeated 
PLLab, NTHU,Cs2403 Programming 
Languages 
17 
Expressions 
• a? => zero or one instance of a 
• a* => zero or more instances of a 
• a+ => one or more instances of a 
• E.g. 
ab?c => ac or abc 
[a-z]+ => all strings of lower case letters 
[a-zA-Z][a-zA-Z0-9]* => all 
alphanumeric strings with a leading 
alphabetic character
Precedence of Operators 
• Level of precedence 
– Kleene closure (*), ?, + 
– concatenation 
– alternation (|) 
• All operators are left associative. 
• Ex: a*b|cd* = ((a*)b)|(c(d*)) 
PLLab, NTHU,Cs2403 Programming 
Languages 
18
Pattern Matching Primitives 
Metacharacter Matches 
. any character except newline 
n newline 
* zero or more copies of the preceding expression 
+ one or more copies of the preceding expression 
? zero or one copy of the preceding expression 
^ beginning of line / complement 
$ end of line 
a|b a or b 
(ab)+ one or more copies of ab (grouping) 
[ab] a or b 
a{3} 3 instances of a 
“a+b” literal “a+b” (C escapes still work) 
PLLab, NTHU,Cs2403 Programming 
Languages 
19
Recall: Lex Source 
a = b + c; 
a operator: ASSIGNMENT b + c; 
PLLab, NTHU,Cs2403 Programming 
Languages 
20 
• Lex source is a table of 
– regular expressions and 
– corresponding program fragments (actions) 
… 
%% 
<regexp> <action> 
<regexp> <action> 
… 
%% 
%% 
“=“ printf(“operator: ASSIGNMENT”);
Transition Rules 
• regexp <one or more blanks> action (C code); 
• regexp <one or more blanks> { actions (C code) } 
• A null statement ; will ignore the input (no 
actions) 
[ tn] ; 
– Causes the three spacing characters to be ignored a = b + c; 
PLLab, NTHU,Cs2403 Programming 
Languages 
21 
d = b * c; 
↓ ↓ 
a=b+c;d=b*c;
Transition Rules (cont’d) 
• Four special options for actions: 
|, ECHO;, BEGIN, and REJECT; 
• | indicates that the action for this rule is from 
the action for the next rule 
PLLab, NTHU,Cs2403 Programming 
Languages 
22 
– [ tn] ; 
– “ “ | 
“t” | 
“n” ; 
• The unmatched token is using a default 
action that ECHO from the input to the output
Transition Rules (cont’d) 
PLLab, NTHU,Cs2403 Programming 
Languages 
23 
• REJECT 
– Go do the next alternative 
… 
%% 
pink {npink++; REJECT;} 
ink {nink++; REJECT;} 
pin {npin++; REJECT;} 
. | 
n ; 
%% 
…
Lex Predefined Variables 
PLLab, NTHU,Cs2403 Programming 
Languages 
24 
• yytext -- a string containing the lexeme 
• yyleng -- the length of the lexeme 
• yyin -- the input stream pointer 
– the default input of default main() is stdin 
• yyout -- the output stream pointer 
– the default output of default main() is stdout. 
• cs20: %./a.out < inputfile > outfile 
• E.g. 
[a-z]+ printf(“%s”, yytext); 
[a-z]+ ECHO; 
[a-zA-Z]+ {words++; chars += yyleng;}
Lex Library Routines 
PLLab, NTHU,Cs2403 Programming 
Languages 
25 
• yylex() 
– The default main() contains a call of yylex() 
• yymore() 
– return the next token 
• yyless(n) 
– retain the first n characters in yytext 
• yywarp() 
– is called whenever Lex reaches an end-of-file 
– The default yywarp() always returns 1
Review of Lex Predefined 
PLLab, NTHU,Cs2403 Programming 
Languages 
26 
Variables 
Name Function 
char *yytext pointer to matched string 
int yyleng length of matched string 
FILE *yyin input stream pointer 
FILE *yyout output stream pointer 
int yylex(void) call to invoke lexer, returns token 
char* yymore(void) return the next token 
int yyless(int n) retain the first n characters in yytext 
int yywrap(void) wrapup, return 1 if done, 0 if not done 
ECHO write matched string 
REJECT go to the next alternative rule 
INITAL initial start condition 
BEGIN condition switch start condition
User Subroutines Section 
• You can use your Lex routines in the same 
ways you use routines in other programming 
languages. 
PLLab, NTHU,Cs2403 Programming 
Languages 
27 
%{ 
void foo(); 
%} 
letter [a-zA-Z] 
%% 
{letter}+ foo(); 
%% 
… 
void foo() { 
… 
}
User Subroutines Section 
PLLab, NTHU,Cs2403 Programming 
Languages 
28 
(cont’d) 
• The section where main() is placed 
%{ 
int counter = 0; 
%} 
letter [a-zA-Z] 
%% 
{letter}+ {printf(“a wordn”); counter++;} 
%% 
main() { 
yylex(); 
printf(“There are total %d wordsn”, counter); 
}
PLLab, NTHU,Cs2403 Programming 
Languages 
29 
Usage 
• To run Lex on a source file, type 
lex scanner.l 
• It produces a file named lex.yy.c which is 
a C program for the lexical analyzer. 
• To compile lex.yy.c, type 
cc lex.yy.c –ll 
• To run the lexical analyzer program, type 
./a.out < inputfile
PLLab, NTHU,Cs2403 Programming 
Languages 
30 
Versions of Lex 
• AT&T -- lex 
http://www.combo.org/lex_yacc_page/lex.html 
• GNU -- flex 
http://www.gnu.org/manual/flex-2.5.4/flex.html 
• a Win32 version of flex : 
http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html 
or Cygwin : 
http://sources.redhat.com/cygwin/ 
• Lex on different machines is not created equal.
Yacc - Yet Another Compiler- 
PLLab, NTHU,Cs2403 Programming 
Languages 
31 
Compiler
PLLab, NTHU,Cs2403 Programming 
Languages 
32 
Introduction 
• What is YACC ? 
– Tool which will produce a parser for a 
given grammar. 
– YACC (Yet Another Compiler Compiler) 
is a program designed to compile a 
LALR(1) grammar and to produce the 
source code of the syntactic analyzer of 
the language produced by this grammar.
How YACC Works 
PLLab, NTHU,Cs2403 Programming 
Languages 
33 
a.out 
File containing desired 
grammar in yacc format 
yyaacccc pprrooggrraamm 
C source program created by yacc 
CC ccoommppiilleerr 
Executable program that will parse 
grammar given in gram.y 
gram.y 
yacc 
y.tab.c 
cc 
or gcc
How YACC Works 
y.output 
y.tab.c a.out 
PLLab, NTHU,Cs2403 Programming 
Languages 
34 
yacc 
(1) Parser generation time 
YACC source (*.y) 
y.tab.h 
y.tab.c 
C compiler/linker 
(2) Compile time 
a.out 
(3) Run time 
Token stream 
Abstract 
Syntax 
Tree
An YACC File Example 
PLLab, NTHU,Cs2403 Programming 
Languages 
35 
%{ 
#include <stdio.h> 
%} 
%token NAME NUMBER 
%% 
statement: NAME '=' expression 
| expression { printf("= %dn", $1); } 
; 
expression: expression '+' NUMBER { $$ = $1 + $3; } 
| expression '-' NUMBER { $$ = $1 - $3; } 
| NUMBER { $$ = $1; } 
; 
%% 
int yyerror(char *s) 
{ 
fprintf(stderr, "%sn", s); 
return 0; 
} 
int main(void) 
{ 
yyparse(); 
return 0; 
}
PLLab, NTHU,Cs2403 Programming 
Languages 
36 
Works with Lex 
How to 
work ?
PLLab, NTHU,Cs2403 Programming 
Languages 
37 
Works with Lex 
call yylex() [0-9]+ 
next token is NUM 
NUM ‘+’ NUM
YACC File Format 
PLLab, NTHU,Cs2403 Programming 
Languages 
38 
%{ 
C declarations 
%} 
yacc declarations 
%% 
Grammar rules 
%% 
Additional C code 
– Comments enclosed in /* ... */ may appear in 
any of the sections.
Definitions Section 
PLLab, NTHU,Cs2403 Programming 
Languages 
39 
%{ 
#include <stdio.h> 
#include <stdlib.h> 
%} 
%token ID NUM 
%start expr 
It is a terminal 
由 expr 開始 
parse
PLLab, NTHU,Cs2403 Programming 
Languages 
40 
Start Symbol 
• The first non-terminal specified in the 
grammar specification section. 
• To overwrite it with %start declaraction. 
%start non-terminal
PLLab, NTHU,Cs2403 Programming 
Languages 
41 
Rules Section 
• This section defines grammar 
• Example 
expr : expr '+' term | term; 
term : term '*' factor | factor; 
factor : '(' expr ')' | ID | NUM;
PLLab, NTHU,Cs2403 Programming 
Languages 
42 
Rules Section 
• Normally written like this 
• Example: 
expr : expr '+' term 
| term 
; 
term : term '*' factor 
| factor 
; 
factor : '(' expr ')' 
| ID 
| NUM 
;
The Position of Rules 
expr : expr '+' term { $$ = $1 + $3; } 
| term { $$ = $1; } 
; 
term : term '*' factor { $$ = $1 * $3; } 
| factor { $$ = $1; } 
; 
factor : '(' expr ')' { $$ = $2; } 
PLLab, NTHU,Cs2403 Programming 
Languages 
43 
| ID 
| NUM 
;
The Position of Rules 
expr : eexxpprr '+' term { $$ = $1 + $3; } 
| tteerrmm { $$ = $1; } 
; 
term : tteerrmm '*' factor { $$ = $1 * $3; } 
| ffaaccttoorr { $$ = $1; } 
; 
factor : '((' expr ')' { $$ = $2; } 
PLLab, NTHU,Cs2403 Programming 
Languages 
44 
| ID 
| NUM 
; 
$$11
The Position of Rules 
expr : expr '++' term { $$ = $1 + $3; } 
| term { $$ = $1; } 
; 
term : term '**' factor { $$ = $1 * $3; } 
| factor { $$ = $1; } 
; 
factor : '(' eexxpprr ')' { $$ = $2; } 
PLLab, NTHU,Cs2403 Programming 
Languages 
45 
| ID 
| NUM 
; $$22
The Position of Rules 
expr : expr '+' tteerrmm { $$ = $1 + $3; } 
| term { $$ = $1; } 
; 
term : term '*' ffaaccttoorr { $$ = $1 * $3; } 
| factor { $$ = $1; } 
; 
factor : '(' expr '))' { $$ = $2; } 
| ID 
| NUM 
; $$33 DDeeffaauulltt:: $$$$ == $$11;; 
PLLab, NTHU,Cs2403 Programming 
Languages 
46
Communication between LEX and 
PLLab, NTHU,Cs2403 Programming 
Languages 
47 
YACC 
call yylex() [0-9]+ 
next token is NUM 
NUM ‘+’ NUM 
LEX and YACC需要一套方法確認token的身份
Communication between LEX 
PLLab, NTHU,Cs2403 Programming 
Languages 
48 
and YACC 
yacc -d gram.y 
Will produce: 
y.tab.h 
• Use enumeration ( 列舉 ) / 
define 
• 由一方產生,另一方 
include 
• YACC 產生 y.tab.h 
• LEX include y.tab.h
Communication between LEX 
scanner.l 
PLLab, NTHU,Cs2403 Programming 
Languages 
49 
and YACC 
%{ 
#include <stdio.h> 
#include "y.tab.h" 
%} 
id [_a-zA-Z][_a-zA-Z0-9]* 
%% 
int { return INT; } 
char { return CHAR; } 
float { return FLOAT; } 
{id} { return ID;} 
%{ 
#include <stdio.h> 
#include <stdlib.h> 
%} 
%token CHAR, FLOAT, ID, INT 
%% 
yacc -d xxx.y 
Produced 
y.tab.h: 
# define CHAR 258 
# define FLOAT 259 
# define ID 260 
# define INT 261 
parser.y
Phrase -> cart_animal AND CART 
| work_animal AND PLOW… 
PLLab, NTHU,Cs2403 Programming 
Languages 
50 
YACC 
• Rules may be recursive 
• Rules may be ambiguous* 
• Uses bottom up Shift/Reduce parsing 
– Get a token 
– Push onto stack 
– Can it reduced (How do we know?) 
• If yes: Reduce using a rule 
• If no: Get another token 
• Yacc cannot look ahead more than one token
PLLab, NTHU,Cs2403 Programming 
Languages 
51 
Yacc Example 
• Taken from Lex & Yacc 
• Simple calculator 
a = 4 + 6 
a 
a=10 
b = 7 
c = a + b 
c 
c = 17 
$
PLLab, NTHU,Cs2403 Programming 
Languages 
52 
Grammar 
expression ::= expression '+' term | 
expression '-' term | 
term 
term ::= term '*' factor | 
term '/' factor | 
factor 
factor ::= '(' expression ')' | 
'-' factor | 
NUMBER | 
NAME
PLLab, NTHU,Cs2403 Programming 
Languages 
53 
statement_list: statement 'n' 
| statement_list statement 'n' 
; 
statement: NAME '=' expression { $1->value = $3; } 
| expression { printf("= %gn", $1); } 
; 
expression: expression '+' term { $$ = $1 + $3; } 
| expression '-' term { $$ = $1 - $3; } 
| term 
; 
parser.y 
Parser (cont’d)
Parser (cont’d) 
PLLab, NTHU,Cs2403 Programming 
Languages 
54 
term: term '*' factor { $$ = $1 * $3; } 
| term '/' factor { if ($3 == 0.0) 
yyerror("divide by zero"); 
else 
$$ = $1 / $3; 
} 
| factor 
; 
factor: '(' expression ')' { $$ = $2; } 
| '-' factor { $$ = -$2; } 
| NUMBER { $$ = $1; } 
| NAME { $$ = $1->value; } 
; 
%% parser.y
Scanner 
%{ 
#include "y.tab.h" 
#include "parser.h" 
#include <math.h> 
%} 
%% 
([0-9]+|([0-9]*.[0-9]+)([eE][-+]?[0-9]+)?) { 
PLLab, NTHU,Cs2403 Programming 
Languages 
55 
yylval.dval = atof(yytext); 
return NUMBER; 
} 
[ t] ; /* ignore white space */ 
scanner.l
Scanner (cont’d) 
[A-Za-z][A-Za-z0-9]* { /* return symbol pointer */ 
yylval.symp = symlook(yytext); 
return NAME; 
PLLab, NTHU,Cs2403 Programming 
Languages 
56 
} 
"$" { return 0; /* end of input */ } 
n|”=“|”+”|”-”|”*”|”/” return yytext[0]; 
%% 
scanner.l
YACC Command 
PLLab, NTHU,Cs2403 Programming 
Languages 
57 
• Yacc (AT&T) 
– yacc –d xxx.y 
• Bison (GNU) 
– bison –d –y xxx.y 
產生y.tab.c, 與yacc相同 
不然會產生xxx.tab.c
(1) 1 – 2 - 3 
(2) 1 – 2 * 3 
PLLab, NTHU,Cs2403 Programming 
Languages 
58 
Precedence / 
Association 
1. 1-2-3 = (1-2)-3? or 1-(2-3)? 
Define ‘-’ operator is left-association. 
2. 1-2*3 = 1-(2*3) 
Define “*” operator is precedent to “-” 
operator
PLLab, NTHU,Cs2403 Programming 
Languages 
59 
Precedence / 
Association 
%right ‘=‘ 
%left '<' '>' NE LE GE 
%left '+' '-‘ 
%left '*' '/' 
highest precedence
%left '+' '-' 
%left '*' '/' 
%noassoc UMINUS 
PLLab, NTHU,Cs2403 Programming 
Languages 
60 
Precedence / 
Association 
expr : expr ‘+’ expr { $$ = $1 + $3; } 
| expr ‘-’ expr { $$ = $1 - $3; } 
| expr ‘*’ expr { $$ = $1 * $3; } 
| expr ‘/’ expr 
{ 
if($3==0) 
yyerror(“divide 0”); 
else 
$$ = $1 / $3; 
} 
| ‘-’ expr %prec UMINUS {$$ = -$2; }
Shift/Reduce Conflicts 
• shift/reduce conflict 
– occurs when a grammar is written in 
such a way that a decision between 
shifting and reducing can not be made. 
– ex: IF-ELSE ambigious. 
• To resolve this conflict, yacc will 
choose to shift. 
PLLab, NTHU,Cs2403 Programming 
Languages 
61
YACC Declaration 
PLLab, NTHU,Cs2403 Programming 
Languages 
62 
Summary `%start' 
Specify the grammar's start symbol 
`%union' 
Declare the collection of data types that semantic values may 
have 
`%token' 
Declare a terminal symbol (token type name) with no 
precedence or 
associativity specified 
`%type' 
Declare the type of semantic values for a nonterminal symbol
YACC Declaration 
PLLab, NTHU,Cs2403 Programming 
Languages 
63 
`%right' Summary 
Declare a terminal symbol (token type name) that is 
right-associative 
`%left' 
Declare a terminal symbol (token type name) that is left-associative 
`%nonassoc' 
Declare a terminal symbol (token type name) that is 
nonassociative 
(using it in a way that would be associative is a syntax error, 
ex: x op. y op. z is syntax error)
Reference Books 
PLLab, NTHU,Cs2403 Programming 
Languages 
64 
• lex & yacc, 2nd Edition 
– by John R.Levine, Tony Mason & Doug 
Brown 
– O’Reilly 
– ISBN: 1-56592-000-7 
• Mastering Regular Expressions 
– by Jeffrey E.F. Friedl 
– O’Reilly 
– ISBN: 1-56592-257-3

Weitere ähnliche Inhalte

Was ist angesagt?

Graph representation
Graph representationGraph representation
Graph representation
Tech_MX
 

Was ist angesagt? (20)

Intermediate code generation (Compiler Design)
Intermediate code generation (Compiler Design)   Intermediate code generation (Compiler Design)
Intermediate code generation (Compiler Design)
 
LL(1) parsing
LL(1) parsingLL(1) parsing
LL(1) parsing
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Graph coloring using backtracking
Graph coloring using backtrackingGraph coloring using backtracking
Graph coloring using backtracking
 
LEX & YACC
LEX & YACCLEX & YACC
LEX & YACC
 
Top down parsing
Top down parsingTop down parsing
Top down parsing
 
4 pillars of OOPS CONCEPT
4 pillars of OOPS CONCEPT4 pillars of OOPS CONCEPT
4 pillars of OOPS CONCEPT
 
Unit 7. Functions
Unit 7. FunctionsUnit 7. Functions
Unit 7. Functions
 
Graph representation
Graph representationGraph representation
Graph representation
 
Asymptotic Notation and Complexity
Asymptotic Notation and ComplexityAsymptotic Notation and Complexity
Asymptotic Notation and Complexity
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with R
 
Specification-of-tokens
Specification-of-tokensSpecification-of-tokens
Specification-of-tokens
 
Applications of stack
Applications of stackApplications of stack
Applications of stack
 
Two pass Assembler
Two pass AssemblerTwo pass Assembler
Two pass Assembler
 
Strings in python
Strings in pythonStrings in python
Strings in python
 
Asymptotic Notation
Asymptotic NotationAsymptotic Notation
Asymptotic Notation
 
finding Min and max element from given array using divide & conquer
finding Min and max element from given array using  divide & conquer finding Min and max element from given array using  divide & conquer
finding Min and max element from given array using divide & conquer
 
Ll(1) Parser in Compilers
Ll(1) Parser in CompilersLl(1) Parser in Compilers
Ll(1) Parser in Compilers
 
Code optimization in compiler design
Code optimization in compiler designCode optimization in compiler design
Code optimization in compiler design
 
Lecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.pptLecture 1 - Lexical Analysis.ppt
Lecture 1 - Lexical Analysis.ppt
 

Ähnlich wie Lex and Yacc ppt

system software
system software system software
system software
randhirlpu
 
Lex tool manual
Lex tool manualLex tool manual
Lex tool manual
Sami Said
 

Ähnlich wie Lex and Yacc ppt (20)

lex and yacc.pdf
lex and yacc.pdflex and yacc.pdf
lex and yacc.pdf
 
system software
system software system software
system software
 
Lexical
LexicalLexical
Lexical
 
Module4 lex and yacc.ppt
Module4 lex and yacc.pptModule4 lex and yacc.ppt
Module4 lex and yacc.ppt
 
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex ToolCompiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
 
Compiler design and lexical analyser
Compiler design and lexical analyserCompiler design and lexical analyser
Compiler design and lexical analyser
 
CD U1-5.pptx
CD U1-5.pptxCD U1-5.pptx
CD U1-5.pptx
 
Unit1.ppt
Unit1.pptUnit1.ppt
Unit1.ppt
 
Lex and Yacc Tool M1.ppt
Lex and Yacc Tool M1.pptLex and Yacc Tool M1.ppt
Lex and Yacc Tool M1.ppt
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler Design
 
Lex tool manual
Lex tool manualLex tool manual
Lex tool manual
 
module 4.pptx
module 4.pptxmodule 4.pptx
module 4.pptx
 
Python Programming Basics for begginners
Python Programming Basics for begginnersPython Programming Basics for begginners
Python Programming Basics for begginners
 
lecture_lex.pdf
lecture_lex.pdflecture_lex.pdf
lecture_lex.pdf
 
Assembly language programming_fundamentals 8086
Assembly language programming_fundamentals 8086Assembly language programming_fundamentals 8086
Assembly language programming_fundamentals 8086
 
Compilers Design
Compilers DesignCompilers Design
Compilers Design
 
Lexical
LexicalLexical
Lexical
 
Lex programming
Lex programmingLex programming
Lex programming
 
Ch 2.pptx
Ch 2.pptxCh 2.pptx
Ch 2.pptx
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 

Kürzlich hochgeladen

Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
chumtiyababu
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 

Kürzlich hochgeladen (20)

Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 

Lex and Yacc ppt

  • 1. Lex Yacc tutorial Kun-Yuan Hsieh kyshieh@pllab.cs.nthu.edu.tw Programming Language Lab., NTHU PLLab, NTHU,Cs2403 Programming Languages 1
  • 2. PLLab, NTHU,Cs2403 Programming Languages 2 Overview take a glance at Lex!
  • 3. Compilation Sequence PLLab, NTHU,Cs2403 Programming Languages 3
  • 4. PLLab, NTHU,Cs2403 Programming Languages 4 What is Lex? • The main job of a lexical analyzer (scanner) is to break up an input stream into more usable elements (tokens) a = b + c * d; ID ASSIGN ID PLUS ID MULT ID SEMI • Lex is an utility to help you rapidly generate your scanners
  • 5. Lex – Lexical Analyzer • Lexical analyzers tokenize input streams • Tokens are the terminals of a language – English PLLab, NTHU,Cs2403 Programming Languages 5 • words, punctuation marks, … – Programming language • Identifiers, operators, keywords, … • Regular expressions define terminals/tokens
  • 6. Lex Source Program PLLab, NTHU,Cs2403 Programming Languages 6 • Lex source is a table of – regular expressions and – corresponding program fragments digit [0-9] letter [a-zA-Z] %% {letter}({letter}|{digit})* printf(“id: %sn”, yytext); n printf(“new linen”); %% main() { yylex(); }
  • 7. Lex Source to C Program • The table is translated to a C program (lex.yy.c) which – reads an input stream – partitioning the input into strings which match the given expressions and – copying it to an output stream if necessary PLLab, NTHU,Cs2403 Programming Languages 7
  • 8. An Overview of Lex PLLab, NTHU,Cs2403 Programming Languages 8 Lex C compiler a.out Lex source program lex.yy.c input lex.yy.c a.out tokens
  • 9. (required) (optional) {definitions} %% {transition rules} %% {user subroutines} PLLab, NTHU,Cs2403 Programming Languages 9 Lex Source • Lex source is separated into three sections by % % delimiters • The general format of Lex source is • The %% absolute minimum Lex program is thus
  • 10. PLLab, NTHU,Cs2403 Programming Languages 10 Lex v.s. Yacc • Lex – Lex generates C code for a lexical analyzer, or scanner – Lex uses patterns that match strings in the input and converts the strings to tokens • Yacc – Yacc generates C code for syntax analyzer, or parser. – Yacc uses grammar rules that allow it to analyze tokens from Lex and create a syntax tree.
  • 11. Lex source (Lexical Rules) Yacc source (Grammar Rules) call Input Parsed PLLab, NTHU,Cs2403 Programming Languages 11 Lex with Yacc Lex Yacc yylex() yyparse() Input lex.yy.c y.tab.c return token
  • 12. Regular Expressions PLLab, NTHU,Cs2403 Programming Languages 12
  • 13. Lex Regular Expressions (Extended Regular Expressions) • A regular expression matches a set of strings • Regular expression PLLab, NTHU,Cs2403 Programming Languages 13 – Operators – Character classes – Arbitrary character – Optional expressions – Alternation and grouping – Context sensitivity – Repetitions and definitions
  • 14. PLLab, NTHU,Cs2403 Programming Languages 14 Operators “ [ ] ^ - ? . * + | ( ) $ / { } % < > • If they are to be used as text characters, an escape should be used $ = “$” = “” • Every character but blank, tab (t), newline (n) and the list above is always a text character
  • 15. Character Classes [] • [abc] matches a single character, which may be a, b, or c • Every operator meaning is ignored except - and ^ • e.g. [ab] => a or b [a-z] => a or b or c or … or z [-+0-9] => all the digits and the two signs [^a-zA-Z] => any character which is not a PLLab, NTHU,Cs2403 Programming Languages 15 letter
  • 16. Arbitrary Character . • To match almost character, the operator character . is the class of all characters except newline • [40-176] matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176 (tilde~) PLLab, NTHU,Cs2403 Programming Languages 16
  • 17. Optional & Repeated PLLab, NTHU,Cs2403 Programming Languages 17 Expressions • a? => zero or one instance of a • a* => zero or more instances of a • a+ => one or more instances of a • E.g. ab?c => ac or abc [a-z]+ => all strings of lower case letters [a-zA-Z][a-zA-Z0-9]* => all alphanumeric strings with a leading alphabetic character
  • 18. Precedence of Operators • Level of precedence – Kleene closure (*), ?, + – concatenation – alternation (|) • All operators are left associative. • Ex: a*b|cd* = ((a*)b)|(c(d*)) PLLab, NTHU,Cs2403 Programming Languages 18
  • 19. Pattern Matching Primitives Metacharacter Matches . any character except newline n newline * zero or more copies of the preceding expression + one or more copies of the preceding expression ? zero or one copy of the preceding expression ^ beginning of line / complement $ end of line a|b a or b (ab)+ one or more copies of ab (grouping) [ab] a or b a{3} 3 instances of a “a+b” literal “a+b” (C escapes still work) PLLab, NTHU,Cs2403 Programming Languages 19
  • 20. Recall: Lex Source a = b + c; a operator: ASSIGNMENT b + c; PLLab, NTHU,Cs2403 Programming Languages 20 • Lex source is a table of – regular expressions and – corresponding program fragments (actions) … %% <regexp> <action> <regexp> <action> … %% %% “=“ printf(“operator: ASSIGNMENT”);
  • 21. Transition Rules • regexp <one or more blanks> action (C code); • regexp <one or more blanks> { actions (C code) } • A null statement ; will ignore the input (no actions) [ tn] ; – Causes the three spacing characters to be ignored a = b + c; PLLab, NTHU,Cs2403 Programming Languages 21 d = b * c; ↓ ↓ a=b+c;d=b*c;
  • 22. Transition Rules (cont’d) • Four special options for actions: |, ECHO;, BEGIN, and REJECT; • | indicates that the action for this rule is from the action for the next rule PLLab, NTHU,Cs2403 Programming Languages 22 – [ tn] ; – “ “ | “t” | “n” ; • The unmatched token is using a default action that ECHO from the input to the output
  • 23. Transition Rules (cont’d) PLLab, NTHU,Cs2403 Programming Languages 23 • REJECT – Go do the next alternative … %% pink {npink++; REJECT;} ink {nink++; REJECT;} pin {npin++; REJECT;} . | n ; %% …
  • 24. Lex Predefined Variables PLLab, NTHU,Cs2403 Programming Languages 24 • yytext -- a string containing the lexeme • yyleng -- the length of the lexeme • yyin -- the input stream pointer – the default input of default main() is stdin • yyout -- the output stream pointer – the default output of default main() is stdout. • cs20: %./a.out < inputfile > outfile • E.g. [a-z]+ printf(“%s”, yytext); [a-z]+ ECHO; [a-zA-Z]+ {words++; chars += yyleng;}
  • 25. Lex Library Routines PLLab, NTHU,Cs2403 Programming Languages 25 • yylex() – The default main() contains a call of yylex() • yymore() – return the next token • yyless(n) – retain the first n characters in yytext • yywarp() – is called whenever Lex reaches an end-of-file – The default yywarp() always returns 1
  • 26. Review of Lex Predefined PLLab, NTHU,Cs2403 Programming Languages 26 Variables Name Function char *yytext pointer to matched string int yyleng length of matched string FILE *yyin input stream pointer FILE *yyout output stream pointer int yylex(void) call to invoke lexer, returns token char* yymore(void) return the next token int yyless(int n) retain the first n characters in yytext int yywrap(void) wrapup, return 1 if done, 0 if not done ECHO write matched string REJECT go to the next alternative rule INITAL initial start condition BEGIN condition switch start condition
  • 27. User Subroutines Section • You can use your Lex routines in the same ways you use routines in other programming languages. PLLab, NTHU,Cs2403 Programming Languages 27 %{ void foo(); %} letter [a-zA-Z] %% {letter}+ foo(); %% … void foo() { … }
  • 28. User Subroutines Section PLLab, NTHU,Cs2403 Programming Languages 28 (cont’d) • The section where main() is placed %{ int counter = 0; %} letter [a-zA-Z] %% {letter}+ {printf(“a wordn”); counter++;} %% main() { yylex(); printf(“There are total %d wordsn”, counter); }
  • 29. PLLab, NTHU,Cs2403 Programming Languages 29 Usage • To run Lex on a source file, type lex scanner.l • It produces a file named lex.yy.c which is a C program for the lexical analyzer. • To compile lex.yy.c, type cc lex.yy.c –ll • To run the lexical analyzer program, type ./a.out < inputfile
  • 30. PLLab, NTHU,Cs2403 Programming Languages 30 Versions of Lex • AT&T -- lex http://www.combo.org/lex_yacc_page/lex.html • GNU -- flex http://www.gnu.org/manual/flex-2.5.4/flex.html • a Win32 version of flex : http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html or Cygwin : http://sources.redhat.com/cygwin/ • Lex on different machines is not created equal.
  • 31. Yacc - Yet Another Compiler- PLLab, NTHU,Cs2403 Programming Languages 31 Compiler
  • 32. PLLab, NTHU,Cs2403 Programming Languages 32 Introduction • What is YACC ? – Tool which will produce a parser for a given grammar. – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar and to produce the source code of the syntactic analyzer of the language produced by this grammar.
  • 33. How YACC Works PLLab, NTHU,Cs2403 Programming Languages 33 a.out File containing desired grammar in yacc format yyaacccc pprrooggrraamm C source program created by yacc CC ccoommppiilleerr Executable program that will parse grammar given in gram.y gram.y yacc y.tab.c cc or gcc
  • 34. How YACC Works y.output y.tab.c a.out PLLab, NTHU,Cs2403 Programming Languages 34 yacc (1) Parser generation time YACC source (*.y) y.tab.h y.tab.c C compiler/linker (2) Compile time a.out (3) Run time Token stream Abstract Syntax Tree
  • 35. An YACC File Example PLLab, NTHU,Cs2403 Programming Languages 35 %{ #include <stdio.h> %} %token NAME NUMBER %% statement: NAME '=' expression | expression { printf("= %dn", $1); } ; expression: expression '+' NUMBER { $$ = $1 + $3; } | expression '-' NUMBER { $$ = $1 - $3; } | NUMBER { $$ = $1; } ; %% int yyerror(char *s) { fprintf(stderr, "%sn", s); return 0; } int main(void) { yyparse(); return 0; }
  • 36. PLLab, NTHU,Cs2403 Programming Languages 36 Works with Lex How to work ?
  • 37. PLLab, NTHU,Cs2403 Programming Languages 37 Works with Lex call yylex() [0-9]+ next token is NUM NUM ‘+’ NUM
  • 38. YACC File Format PLLab, NTHU,Cs2403 Programming Languages 38 %{ C declarations %} yacc declarations %% Grammar rules %% Additional C code – Comments enclosed in /* ... */ may appear in any of the sections.
  • 39. Definitions Section PLLab, NTHU,Cs2403 Programming Languages 39 %{ #include <stdio.h> #include <stdlib.h> %} %token ID NUM %start expr It is a terminal 由 expr 開始 parse
  • 40. PLLab, NTHU,Cs2403 Programming Languages 40 Start Symbol • The first non-terminal specified in the grammar specification section. • To overwrite it with %start declaraction. %start non-terminal
  • 41. PLLab, NTHU,Cs2403 Programming Languages 41 Rules Section • This section defines grammar • Example expr : expr '+' term | term; term : term '*' factor | factor; factor : '(' expr ')' | ID | NUM;
  • 42. PLLab, NTHU,Cs2403 Programming Languages 42 Rules Section • Normally written like this • Example: expr : expr '+' term | term ; term : term '*' factor | factor ; factor : '(' expr ')' | ID | NUM ;
  • 43. The Position of Rules expr : expr '+' term { $$ = $1 + $3; } | term { $$ = $1; } ; term : term '*' factor { $$ = $1 * $3; } | factor { $$ = $1; } ; factor : '(' expr ')' { $$ = $2; } PLLab, NTHU,Cs2403 Programming Languages 43 | ID | NUM ;
  • 44. The Position of Rules expr : eexxpprr '+' term { $$ = $1 + $3; } | tteerrmm { $$ = $1; } ; term : tteerrmm '*' factor { $$ = $1 * $3; } | ffaaccttoorr { $$ = $1; } ; factor : '((' expr ')' { $$ = $2; } PLLab, NTHU,Cs2403 Programming Languages 44 | ID | NUM ; $$11
  • 45. The Position of Rules expr : expr '++' term { $$ = $1 + $3; } | term { $$ = $1; } ; term : term '**' factor { $$ = $1 * $3; } | factor { $$ = $1; } ; factor : '(' eexxpprr ')' { $$ = $2; } PLLab, NTHU,Cs2403 Programming Languages 45 | ID | NUM ; $$22
  • 46. The Position of Rules expr : expr '+' tteerrmm { $$ = $1 + $3; } | term { $$ = $1; } ; term : term '*' ffaaccttoorr { $$ = $1 * $3; } | factor { $$ = $1; } ; factor : '(' expr '))' { $$ = $2; } | ID | NUM ; $$33 DDeeffaauulltt:: $$$$ == $$11;; PLLab, NTHU,Cs2403 Programming Languages 46
  • 47. Communication between LEX and PLLab, NTHU,Cs2403 Programming Languages 47 YACC call yylex() [0-9]+ next token is NUM NUM ‘+’ NUM LEX and YACC需要一套方法確認token的身份
  • 48. Communication between LEX PLLab, NTHU,Cs2403 Programming Languages 48 and YACC yacc -d gram.y Will produce: y.tab.h • Use enumeration ( 列舉 ) / define • 由一方產生,另一方 include • YACC 產生 y.tab.h • LEX include y.tab.h
  • 49. Communication between LEX scanner.l PLLab, NTHU,Cs2403 Programming Languages 49 and YACC %{ #include <stdio.h> #include "y.tab.h" %} id [_a-zA-Z][_a-zA-Z0-9]* %% int { return INT; } char { return CHAR; } float { return FLOAT; } {id} { return ID;} %{ #include <stdio.h> #include <stdlib.h> %} %token CHAR, FLOAT, ID, INT %% yacc -d xxx.y Produced y.tab.h: # define CHAR 258 # define FLOAT 259 # define ID 260 # define INT 261 parser.y
  • 50. Phrase -> cart_animal AND CART | work_animal AND PLOW… PLLab, NTHU,Cs2403 Programming Languages 50 YACC • Rules may be recursive • Rules may be ambiguous* • Uses bottom up Shift/Reduce parsing – Get a token – Push onto stack – Can it reduced (How do we know?) • If yes: Reduce using a rule • If no: Get another token • Yacc cannot look ahead more than one token
  • 51. PLLab, NTHU,Cs2403 Programming Languages 51 Yacc Example • Taken from Lex & Yacc • Simple calculator a = 4 + 6 a a=10 b = 7 c = a + b c c = 17 $
  • 52. PLLab, NTHU,Cs2403 Programming Languages 52 Grammar expression ::= expression '+' term | expression '-' term | term term ::= term '*' factor | term '/' factor | factor factor ::= '(' expression ')' | '-' factor | NUMBER | NAME
  • 53. PLLab, NTHU,Cs2403 Programming Languages 53 statement_list: statement 'n' | statement_list statement 'n' ; statement: NAME '=' expression { $1->value = $3; } | expression { printf("= %gn", $1); } ; expression: expression '+' term { $$ = $1 + $3; } | expression '-' term { $$ = $1 - $3; } | term ; parser.y Parser (cont’d)
  • 54. Parser (cont’d) PLLab, NTHU,Cs2403 Programming Languages 54 term: term '*' factor { $$ = $1 * $3; } | term '/' factor { if ($3 == 0.0) yyerror("divide by zero"); else $$ = $1 / $3; } | factor ; factor: '(' expression ')' { $$ = $2; } | '-' factor { $$ = -$2; } | NUMBER { $$ = $1; } | NAME { $$ = $1->value; } ; %% parser.y
  • 55. Scanner %{ #include "y.tab.h" #include "parser.h" #include <math.h> %} %% ([0-9]+|([0-9]*.[0-9]+)([eE][-+]?[0-9]+)?) { PLLab, NTHU,Cs2403 Programming Languages 55 yylval.dval = atof(yytext); return NUMBER; } [ t] ; /* ignore white space */ scanner.l
  • 56. Scanner (cont’d) [A-Za-z][A-Za-z0-9]* { /* return symbol pointer */ yylval.symp = symlook(yytext); return NAME; PLLab, NTHU,Cs2403 Programming Languages 56 } "$" { return 0; /* end of input */ } n|”=“|”+”|”-”|”*”|”/” return yytext[0]; %% scanner.l
  • 57. YACC Command PLLab, NTHU,Cs2403 Programming Languages 57 • Yacc (AT&T) – yacc –d xxx.y • Bison (GNU) – bison –d –y xxx.y 產生y.tab.c, 與yacc相同 不然會產生xxx.tab.c
  • 58. (1) 1 – 2 - 3 (2) 1 – 2 * 3 PLLab, NTHU,Cs2403 Programming Languages 58 Precedence / Association 1. 1-2-3 = (1-2)-3? or 1-(2-3)? Define ‘-’ operator is left-association. 2. 1-2*3 = 1-(2*3) Define “*” operator is precedent to “-” operator
  • 59. PLLab, NTHU,Cs2403 Programming Languages 59 Precedence / Association %right ‘=‘ %left '<' '>' NE LE GE %left '+' '-‘ %left '*' '/' highest precedence
  • 60. %left '+' '-' %left '*' '/' %noassoc UMINUS PLLab, NTHU,Cs2403 Programming Languages 60 Precedence / Association expr : expr ‘+’ expr { $$ = $1 + $3; } | expr ‘-’ expr { $$ = $1 - $3; } | expr ‘*’ expr { $$ = $1 * $3; } | expr ‘/’ expr { if($3==0) yyerror(“divide 0”); else $$ = $1 / $3; } | ‘-’ expr %prec UMINUS {$$ = -$2; }
  • 61. Shift/Reduce Conflicts • shift/reduce conflict – occurs when a grammar is written in such a way that a decision between shifting and reducing can not be made. – ex: IF-ELSE ambigious. • To resolve this conflict, yacc will choose to shift. PLLab, NTHU,Cs2403 Programming Languages 61
  • 62. YACC Declaration PLLab, NTHU,Cs2403 Programming Languages 62 Summary `%start' Specify the grammar's start symbol `%union' Declare the collection of data types that semantic values may have `%token' Declare a terminal symbol (token type name) with no precedence or associativity specified `%type' Declare the type of semantic values for a nonterminal symbol
  • 63. YACC Declaration PLLab, NTHU,Cs2403 Programming Languages 63 `%right' Summary Declare a terminal symbol (token type name) that is right-associative `%left' Declare a terminal symbol (token type name) that is left-associative `%nonassoc' Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be associative is a syntax error, ex: x op. y op. z is syntax error)
  • 64. Reference Books PLLab, NTHU,Cs2403 Programming Languages 64 • lex & yacc, 2nd Edition – by John R.Levine, Tony Mason & Doug Brown – O’Reilly – ISBN: 1-56592-000-7 • Mastering Regular Expressions – by Jeffrey E.F. Friedl – O’Reilly – ISBN: 1-56592-257-3