An Assignment
On
LEX & YACC
Submitted By
Mahbubur Rahman
Dept. of CSE
Jagannath University, Dhaka-1100
Date: 16-01-2019
LEX
Lex - A Lexical Analyzer Generator
Lex is a computer program that generates lexical analyzers ("scanners" or "lexers").
Lex is commonly used with the yacc parser generator. Lex, originally written by Mike
Lesk and Eric Schmidt and described in 1975, is the standard analyzer generator on
many Unix systems, and an equivalent tool is specified as part of the POSIX standard.
Lex reads an input stream specifying the lexical analyzer and outputs source
code implementing the lexer in the C programming language.
Though originally distributed as proprietary software, some versions of Lex are
now open source. Open-source versions of Lex, based on the original AT&T code, are
distributed as part of open-source operating systems such as OpenSolaris
and Plan 9 from Bell Labs. One popular open-source version of Lex,
called Flex, or the "fast lexical analyzer", is not derived from the proprietary code.
The structure of a Lex file is intentionally similar to that of a yacc file: files are
divided into three sections, separated by lines that contain only two percent signs,
as follows:
Definition section
%%
Rules section
%%
C code section
The definition section defines macros and imports header files written in C. It is
also possible to write any C code here, which will be copied verbatim into the
generated source file.
The rules section associates regular expression patterns with C statements. When
the lexer sees text in the input matching a given pattern, it will execute the
associated C code.
The C code section contains C statements and functions that are copied verbatim
to the generated source file. These statements typically contain code called
by the rules in the rules section. In large programs it is more convenient to place
this code in a separate file linked in at compile time.
Flex, A fast scanner generator
Flex is a tool for generating scanners: programs which recognize lexical patterns in
text. flex reads the given input files, or its standard input if no file names are given,
for a description of a scanner to generate. The description is in the form of pairs of
regular expressions and C code, called rules. flex generates as output a C source file,
`lex.yy.c', which defines a routine `yylex()'. This file is compiled and linked with the
`-lfl' library to produce an executable. When the executable is run, it analyzes its input
for occurrences of the regular expressions. Whenever it finds one, it executes the
corresponding C code.
The following is an example Lex file for the flex version of Lex. It recognizes strings
of numbers (positive integers) in the input, and simply prints them out.
/*** Definition section ***/
%{
/* C code to be copied verbatim */
#include <stdio.h>
%}
/* This tells flex to read only one input file */
%option noyywrap
%%
    /*** Rules section ***/
    /* Comments in the rules section must be indented. */
    /* [0-9]+ matches a string of one or more digits */
[0-9]+  {
            /* yytext is a string containing the matched text. */
            printf("Saw an integer: %s\n", yytext);
        }
.|\n    {   /* Ignore all other characters. */   }
%%
/*** C Code section ***/
int main(void) {
    /* Call the lexer, then quit. */
    yylex();
    return 0;
}
If this input is given to flex, it will be converted into a C file, lex.yy.c. This can be
compiled into an executable which matches and outputs strings of integers. For
example, given the input:
abc123z.!&*2gj6
the program will print:
Saw an integer: 123
Saw an integer: 2
Saw an integer: 6
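For comparison, the behaviour of the two rules above can be sketched by hand in plain C. This is only an illustration of what the scanner does, not the table-driven code that flex actually generates. The function name `scan_integers` and the fixed token size `MAX_TOKEN_LEN` are assumptions made for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 32  /* assumed limit for this sketch, including the '\0' */

/*
 * Hand-written equivalent of the rules [0-9]+ and .|\n:
 * copy each maximal run of digits in `input` into `tokens`
 * and skip every other character.
 * Returns the number of tokens stored (at most `max_tokens`).
 */
size_t scan_integers(const char *input, char tokens[][MAX_TOKEN_LEN],
                     size_t max_tokens)
{
    size_t count = 0;
    size_t i = 0;

    assert(input != NULL && tokens != NULL);

    while (input[i] != '\0' && count < max_tokens) {
        if (isdigit((unsigned char)input[i])) {
            size_t start = i;
            size_t len;

            /* Like [0-9]+, take the longest possible run of digits. */
            while (isdigit((unsigned char)input[i])) {
                i++;
            }
            len = i - start;
            if (len >= MAX_TOKEN_LEN) {
                len = MAX_TOKEN_LEN - 1;  /* truncate overlong runs */
            }
            memcpy(tokens[count], input + start, len);
            tokens[count][len] = '\0';
            count++;
        } else {
            i++;  /* like the .|\n rule: ignore the character */
        }
    }
    return count;
}
```

Called on the string "abc123z.!&*2gj6", this function stores the three tokens 123, 2 and 6, matching the output of the flex-generated program shown above.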
YACC
Yacc: Yet Another Compiler-Compiler
Yacc is a computer program for the Unix operating system developed by Stephen C.
Johnson. It is a Look-Ahead Left-to-Right (LALR) parser generator: from a grammar
written in a notation similar to Backus–Naur Form (BNF), it generates an LALR
parser, the part of a compiler that tries to make syntactic sense of the source code.
Yacc is supplied as a standard utility on BSD and AT&T Unix. GNU-based Linux
distributions include Bison, a forward-compatible Yacc replacement.
The input to Yacc is a grammar with snippets of C code (called "actions") attached to
its rules. Its output is a shift-reduce parser in C that executes the C snippets associated
with each rule as soon as the rule is recognized. Typical actions involve the
construction of parse trees. Using an example from Johnson, if the call node(label,
left, right) constructs a binary parse tree node with the specified label and children,
then the rule
expr: expr '+' expr { $$ = node('+', $1, $3); }
recognizes summation expressions and constructs nodes for them. The special
identifiers $$, $1 and $3 refer to items on the parser's stack: $1 and $3 are the
values of the first and third symbols on the right-hand side of the rule, and $$ is
the value assigned to the rule's result.
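The constructor used in that action could be written along the following lines. This is a minimal sketch: the example only names the call node(label, left, right), so the `struct Node` layout and the helpers `is_leaf` and `free_tree` are assumptions introduced here for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type: a label (such as '+') and two children. */
struct Node {
    int label;
    struct Node *left;
    struct Node *right;
};

/* Build a binary parse tree node with the given label and children.
   Returns NULL if memory cannot be allocated. */
struct Node *node(int label, struct Node *left, struct Node *right)
{
    struct Node *n = malloc(sizeof *n);
    if (n == NULL) {
        return NULL;
    }
    n->label = label;
    n->left = left;
    n->right = right;
    return n;
}

/* A node with no children is a leaf (for example, an operand). */
int is_leaf(const struct Node *n)
{
    assert(n != NULL);
    return n->left == NULL && n->right == NULL;
}

/* Release a tree built with node(), children first. */
void free_tree(struct Node *n)
{
    if (n == NULL) {
        return;
    }
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

With these definitions, the action { $$ = node('+', $1, $3); } would hang the subtrees for the two operands under a new node labelled '+'.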
Yacc produces only a parser (phrase analyzer). Full syntactic analysis also requires
an external lexical analyzer to perform the first stage, tokenization (word analysis),
which is then followed by the parsing stage proper. Lexical analyzer generators, such
as Lex or Flex, are widely available. The IEEE POSIX P1003.2 standard defines the
functionality and requirements for both Lex and Yacc.
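The division of labour between the two stages can be sketched in plain C. Here `next_token` plays the role that yylex() plays for a Yacc parser, and `parse_sum` plays the role of the parser, accepting inputs of the form NUMBER ('+' NUMBER)*. All names are hypothetical, and the parser is a simple hand-written loop rather than the LALR parser that Yacc would generate. Integer overflow is not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum Token { TOK_END, TOK_NUMBER, TOK_PLUS, TOK_ERROR };

/* Scanner state: the remaining input and the value of the last number. */
struct Scanner {
    const char *pos;
    long value;
};

/* Stage 1, tokenization: return the next token from the input. */
enum Token next_token(struct Scanner *s)
{
    assert(s != NULL && s->pos != NULL);

    while (*s->pos == ' ') {
        s->pos++;                      /* skip blanks between tokens */
    }
    if (*s->pos == '\0') {
        return TOK_END;
    }
    if (*s->pos == '+') {
        s->pos++;
        return TOK_PLUS;
    }
    if (isdigit((unsigned char)*s->pos)) {
        s->value = 0;
        while (isdigit((unsigned char)*s->pos)) {
            s->value = s->value * 10 + (*s->pos - '0');
            s->pos++;
        }
        return TOK_NUMBER;
    }
    return TOK_ERROR;                  /* any other character */
}

/* Stage 2, parsing: accept NUMBER ('+' NUMBER)* and add the numbers up.
   Returns 1 and stores the sum in *result on success, 0 on a syntax error. */
int parse_sum(const char *input, long *result)
{
    struct Scanner s = { input, 0 };
    long total;

    if (next_token(&s) != TOK_NUMBER) {
        return 0;
    }
    total = s.value;
    for (;;) {
        enum Token t = next_token(&s);
        if (t == TOK_END) {
            break;
        }
        if (t != TOK_PLUS || next_token(&s) != TOK_NUMBER) {
            return 0;
        }
        total += s.value;
    }
    *result = total;
    return 1;
}
```

The parser never looks at individual characters: it only asks the scanner for the next token. A Yacc-generated parser relies on its lexical analyzer in the same way.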
Bison, The YACC-compatible Parser Generator
Bison is a general-purpose parser generator that converts a grammar description for
an LALR(1) context-free grammar into a C program to parse that grammar. Once
you are proficient with Bison, you may use it to develop a wide range of language
parsers, from those used in simple desk calculators to complex programming
languages.
Bison is upward compatible with Yacc: all properly-written Yacc grammars ought to
work with Bison with no change. Anyone familiar with Yacc should be able to use
Bison with little trouble.