COMPILER CONSTRUCTION
                    Principles and Practice

                         Kenneth C. Louden
                       San Jose State University

          1. INTRODUCTION

          2.   SCANNING
          3.   CONTEXT-FREE GRAMMARS AND PARSING
          4.   TOP-DOWN PARSING
          5.   BOTTOM-UP PARSING

          6. SEMANTIC ANALYSIS
          7. RUNTIME ENVIRONMENT
          8. CODE GENERATION

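Main References:
  (1) Compiler Construction: Principles and Practice (《编译原理及实践》), Kenneth C. Louden (USA);
      translated by Feng Boqin, Feng Lan, et al.; China Machine Press.
  (2) Principles and Techniques of Compilers (《编译程序原理与技术》), Li Gansheng and Wang Huamin;
      Tsinghua University Press.
  (3) Programming Languages: Compiler Principles (《程序设计语言:编译原理》), Chen Huowang et al.;
      National Defense Industry Press.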




Chapter 1     Introduction


        Emphasis: History of the compiler
                  Description of programs related to compilers
                  The translation process
                  Major data structures of a compiler
                  Other related issues
                  Bootstrapping and porting


What Are Compilers, and Why Study Them?

Compilers: computer programs that translate one language into another,
           from a source language (input) to a target language (output)

        Source program  -->  compiler  -->  Target program

        Source language: a high-level language such as C or C++
        Target language: object code, i.e. machine code (machine instructions)

Purposes of learning about compilers:
        1. Basic knowledge (theoretical techniques such as automata theory)
        2. Tools and practical experience to design and program an actual compiler

Additional uses of compiling techniques:
        Developing command interpreters and interface programs

TINY:       the language used for discussion in the text
C-Minus:    a small but sufficiently complex subset of C;
            it is more extensive than TINY and suitable for a class project.




1.1 A brief history of compilers
1. In the late 1940s, the stored-program computer was pioneered by John von Neumann.
        Programs were written in machine language, for example (Intel 8x86 in IBM PCs):
               C7 06 0000 0002
        which means: move the number 2 to the location 0000 (hexadecimal).

2. Assembly language: numeric codes were replaced by symbolic forms.
             MOV X, 2
       Assembler: translates the symbolic codes and memory locations of assembly
       language into the corresponding numeric codes.
       Defects of assembly language:
             difficult to read, write, and understand;
             dependent on the particular machine.

3. FORTRAN language and its compiler: developed between 1954 and 1957 by
   a team at IBM led by John Backus.
              This effort produced the first compiler.

4. The structure of natural language was studied by Noam Chomsky.
       He classified languages according to the complexity of their grammars
       and the power of the algorithms needed to recognize them.

    Four levels of grammars (the Chomsky hierarchy): type 0, type 1, type 2, type 3
    Type 0: unrestricted grammar, recognized by a Turing machine
    Type 1: context-sensitive grammar
    Type 2: context-free grammar, the most useful for programming languages
    Type 3: right-linear (regular) grammar, equivalent to regular expressions and finite automata

5. Parsing problems: studied in the 1960s and 1970s.
       Code improvement techniques (optimization techniques): improve the efficiency
   of the generated code.
       Compiler-compilers (more accurately, parser generators): automate only one part
   of the compilation process.
          Yacc was written in 1975 by Steve Johnson for the UNIX system.
          Lex was written in 1975 by Mike Lesk.

6. Recent advances in compiler design:
       Application of more sophisticated algorithms for inferring and/or simplifying
   the information contained in a program
          (together with the development of more sophisticated programming languages
          that allow this kind of analysis).
       Development of standard windowing environments:
          compilers as part of an interactive development environment (IDE).

1.2 Programs related to compilers
1. Interpreters: another kind of language translator.
    An interpreter executes the source program immediately.
    Whether an interpreter or a compiler is used depends on the language
    and the situation.
    Interpreters: common for BASIC, LISP, and so on.
    Compilers: preferred when speed of execution is a primary consideration.

2. Assemblers
    A translator that turns assembly language into object code.

3. Linkers
    Collect code separately compiled or assembled in different object files into a single file.
    Connect the code for standard library functions.
    Connect resources supplied by the operating system of the computer.

4. Loaders
    Relocatable code: code whose memory addresses are not completely fixed.
    A loader resolves all relocatable addresses relative to a given starting address.

5. Preprocessors
    Preprocessors: delete comments, include other files, perform macro substitutions.

6. Editors
    Produce the source program as a standard file; some are structure-based editors.

7. Debuggers
    Determine execution errors in a compiled program.

8. Profilers
    Collect statistics on the behavior of an object program during execution.
    Statistics: the number of times each procedure is called, the percentage of
    execution time spent in each procedure.

9. Project managers
    Coordinate the files being worked on by different people.
    sccs (source code control system) and rcs (revision control system) are
    project manager programs on Unix systems.




1.3     The translation process
      The phases of a compiler:

        Source code
             |
          scanner
             |  tokens
          parser
             |  syntax tree
        semantic analyzer
             |  annotated tree
        source code optimizer
             |  intermediate code
        code generator
             |  target code
        target code optimizer
             |
        Target code

      Auxiliary components used across the phases:
        literal table, symbol table, error handler




1. The scanner
   Lexical analysis: reads a stream of characters and outputs tokens.
   a[index] = 4 + 2
   Tokens: a, [, index, ], =, 4, +, 2

   The task of the scanner: recognize tokens; it may also enter identifiers into
   the symbol table and literals into the literal table.
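
A minimal sketch of such a scanner, written in Python for brevity. The token class names (NUM, ID, SYMBOL) and the use of plain lists for the two tables are assumptions made for illustration; they are not taken from the text.

```python
import re

# Token classes for the small expression fragment above (names are illustrative).
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("SYMBOL", r"[\[\]=+]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Return (tokens, symbol_table, literal_table) for one line of source."""
    tokens, symbols, literals = [], [], []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue                      # whitespace produces no token
        tokens.append((kind, text))
        if kind == "ID" and text not in symbols:
            symbols.append(text)          # identifiers go to the symbol table
        elif kind == "NUM" and text not in literals:
            literals.append(text)         # literals go to the literal table
    return tokens, symbols, literals

tokens, symbols, literals = scan("a[index] = 4 + 2")
print([text for _, text in tokens])   # ['a', '[', 'index', ']', '=', '4', '+', '2']
```

Here `symbols` ends up holding `a` and `index`, and `literals` holds `4` and `2`, which mirrors the scanner's secondary task described above.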

2. The parser
   Determines the structure of the program.
   Input: the sequence of tokens
   Output: a parse tree or a syntax tree
     A syntax tree is a condensation of the information contained in the parse tree.

   Parse tree for a[index] = 4 + 2:

   expression
     assign-expression
       expression
         subscript-expression
           expression
             identifier a
           [
           expression
             identifier index
           ]
       =
       expression
         additive-expression
           expression
             number 4
           +
           expression
             number 2
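
As a rough illustration of how a parser condenses this structure into a syntax tree, here is a small recursive-descent sketch in Python. The grammar it accepts (one assignment, subscripts, and `+`) and the tuple-based node shapes are assumptions chosen to fit the example, not the grammar used in the text.

```python
def parse(tokens):
    """Parse a token list such as ['a','[','index',']','=','4','+','2'] into a syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(text):
        nonlocal pos
        if peek() != text:
            raise SyntaxError(f"expected {text!r}, found {peek()!r}")
        pos += 1

    def primary():
        # primary -> NUMBER | IDENTIFIER [ '[' additive ']' ]
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        if token.isdigit():
            return ("num", int(token))
        if not token.isidentifier():
            raise SyntaxError(f"unexpected token {token!r}")
        node = ("id", token)
        if peek() == "[":
            expect("[")
            index = additive()
            expect("]")
            node = ("subscript", node, index)
        return node

    def additive():
        # additive -> primary { '+' primary }
        node = primary()
        while peek() == "+":
            expect("+")
            node = ("add", node, primary())
        return node

    def assignment():
        # assignment -> primary '=' additive, where the target must not be a number
        target = primary()
        if target[0] == "num":
            raise SyntaxError("cannot assign to a number")
        expect("=")
        return ("assign", target, additive())

    tree = assignment()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree

tree = parse(["a", "[", "index", "]", "=", "4", "+", "2"])
# tree == ("assign",
#          ("subscript", ("id", "a"), ("id", "index")),
#          ("add", ("num", 4), ("num", 2)))
```

The resulting tuple keeps only the assign, subscript, and add nodes with their operands; the intermediate `expression` nodes and the bracket tokens of the parse tree are gone.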




3. The semantic analyzer
     Static semantics: properties that cannot be conveniently expressed as syntax and
analyzed by the parser, but that can be determined prior to execution.
          For example: declarations and type checking (data types).

   Dynamic semantics: properties that can only be determined by executing the program;
they cannot be determined by a compiler.

   Annotated syntax tree for a[index] = 4 + 2:

   assign-expression
     subscript-expression : integer
       identifier a : array of integer
       identifier index : integer
     additive-expression : integer
       number 4 : integer
       number 2 : integer
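
The annotation step can be sketched as a recursive walk that computes a type for every node. In this Python sketch the tuple node shapes, the type names as strings, and the `symbol_types` dictionary standing in for declarations are all assumptions made for illustration.

```python
def type_of(node, symbol_types):
    """Return the static type of a syntax-tree node, raising TypeError on a mismatch."""
    kind = node[0]
    if kind == "num":
        return "integer"
    if kind == "id":
        return symbol_types[node[1]]            # KeyError means an undeclared identifier
    if kind == "subscript":
        if type_of(node[1], symbol_types) != "array of integer":
            raise TypeError("subscripted value is not an array of integer")
        if type_of(node[2], symbol_types) != "integer":
            raise TypeError("array index is not an integer")
        return "integer"
    if kind == "add":
        if any(type_of(child, symbol_types) != "integer" for child in node[1:]):
            raise TypeError("operands of + must be integers")
        return "integer"
    if kind == "assign":
        left = type_of(node[1], symbol_types)
        right = type_of(node[2], symbol_types)
        if left != right:
            raise TypeError(f"cannot assign {right} to {left}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")

tree = ("assign",
        ("subscript", ("id", "a"), ("id", "index")),
        ("add", ("num", 4), ("num", 2)))
symbol_types = {"a": "array of integer", "index": "integer"}
print(type_of(tree, symbol_types))   # integer
```

A real semantic analyzer would store each computed type in the node itself (producing the annotated tree); this sketch only returns it.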

4. The source code optimizer
   Source-level optimization:
                4 + 2  =>  6   (constant folding)

  Three-address code:
      t = 4 + 2
      a[index] = t

  Two-phase optimization:
        1. Fold the constant:
             t = 6
             a[index] = t
        2. Replace t by its value:
             a[index] = 6

    Intermediate code: any internal representation of the source code used by the
                      compiler (syntax tree, three-address code, four-address code, and so on).
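
The two ideas above, constant folding and three-address code, can be sketched together in Python. The tuple node shapes and the temporary names `t1`, `t2`, ... are assumptions for illustration (the slide uses a single temporary `t`), and keeping `a[index]` as one operand is a simplification.

```python
def fold(node):
    """Constant folding: replace an addition of two numbers by its value."""
    kind = node[0]
    if kind in ("num", "id"):
        return node
    children = [fold(child) for child in node[1:]]
    if kind == "add" and children[0][0] == "num" and children[1][0] == "num":
        return ("num", children[0][1] + children[1][1])
    return (kind, *children)

def to_three_address(tree):
    """Emit three-address code for an assignment tree, one instruction per string."""
    code = []
    counter = 0

    def operand(node):
        nonlocal counter
        kind = node[0]
        if kind in ("num", "id"):
            return str(node[1])
        if kind == "subscript":
            return f"{operand(node[1])}[{operand(node[2])}]"
        if kind == "add":
            left, right = operand(node[1]), operand(node[2])
            counter += 1
            temp = f"t{counter}"              # fresh temporary for the sum
            code.append(f"{temp} = {left} + {right}")
            return temp
        raise ValueError(f"unknown node kind {kind!r}")

    _, target, value = tree                   # assumes the root is an assignment
    right_side = operand(value)
    code.append(f"{operand(target)} = {right_side}")
    return code

tree = ("assign",
        ("subscript", ("id", "a"), ("id", "index")),
        ("add", ("num", 4), ("num", 2)))
print(to_three_address(tree))         # ['t1 = 4 + 2', 'a[index] = t1']
print(to_three_address(fold(tree)))   # ['a[index] = 6']
```

Folding on the tree before generating code reaches the same result as the two-phase optimization on the three-address code shown above.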




5. The code generator
   Input: intermediate code or IR
   Output: machine code, i.e. code for the target machine

6. The target code optimizer
   Improves the target code generated by the code generator.
   Tasks: choosing addressing modes to improve performance
      Replacing slow instructions by faster ones
      Eliminating redundant or unnecessary operations

        Code generator output:           After target code optimization:
        MOV   R0, index                  MOV R0, index
        MUL   R0, 2                      SHL R0
        MOV   R1, &a                     MOV &a[R0], 6
        ADD   R1, R0
        MOV   *R1, 6
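
One of these improvements, replacing a slow multiply by a faster shift, can be sketched as a simple peephole pass in Python. The instruction strings follow the pseudo-assembly above; the function name is hypothetical, and the sketch deliberately leaves out the second improvement (folding the address computation into the indexed addressing mode).

```python
def peephole(instructions):
    """Replace `MUL reg, 2` with the cheaper shift `SHL reg`; leave everything else alone."""
    optimized = []
    for instruction in instructions:
        opcode, _, operands = instruction.partition(" ")
        parts = [part.strip() for part in operands.split(",")]
        if opcode == "MUL" and len(parts) == 2 and parts[1] == "2":
            optimized.append(f"SHL {parts[0]}")
        else:
            optimized.append(instruction)
    return optimized

generated = [
    "MOV R0, index",
    "MUL R0, 2",
    "MOV R1, &a",
    "ADD R1, R0",
    "MOV *R1, 6",
]
for line in peephole(generated):
    print(line)
```

Only the second instruction changes (to `SHL R0`); the other four pass through untouched.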




1.4   Major data structures in a compiler
   1. Tokens:
         each token is a value of an enumerated data type representing the set of tokens
   2. The syntax tree:
         each node is a record whose fields represent the information
         collected by the parser and semantic analyzer
   3. The symbol table:
         information associated with identifiers:
               functions, variables, constants, and data types.
         The symbol table interacts with the scanner, the parser, the semantic
         analyzer, the optimization phases, and code generation.
         Operations: insertion, deletion, access
   4. The literal table:
         stores constants and strings
         needs quick insertion and lookup, need not allow deletions
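
The different requirements of the two tables can be sketched in Python as follows. The class and method names are hypothetical, and a dictionary is only one possible way to get quick insertion and lookup.

```python
class SymbolTable:
    """Identifier attributes with insertion, access, and deletion."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, **attributes):
        # Later phases may add attributes to an existing entry.
        self._entries.setdefault(name, {}).update(attributes)

    def lookup(self, name):
        return self._entries.get(name)        # None if the name is unknown

    def delete(self, name):
        self._entries.pop(name, None)         # e.g. when a scope is left


class LiteralTable:
    """Constants and strings: insertion and lookup only, no deletion."""

    def __init__(self):
        self._positions = {}
        self._values = []

    def insert(self, value):
        # Each distinct literal is stored once; its position identifies it.
        if value not in self._positions:
            self._positions[value] = len(self._values)
            self._values.append(value)
        return self._positions[value]

    def lookup(self, position):
        return self._values[position]


symbols = SymbolTable()
symbols.insert("a", kind="variable")
symbols.insert("a", type="array of integer")   # added later by the semantic analyzer
literals = LiteralTable()
first = literals.insert("hello")
again = literals.insert("hello")               # same position as the first insertion
```

Storing each literal only once means repeated constants and strings share a single table entry.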
   5. Intermediate code:
         may be kept as an array of text strings, a temporary text file, or a
         linked list of structures.
   6. Temporary files:
         hold the products of intermediate steps
         For example: backpatching addresses during code generation
               if x = 0 then ... else ...
            Code:  CMP x, 0
                   JNE NEXT
                   ...
            NEXT:
                   ...
         The address of NEXT is not yet known when JNE is generated, so a blank is
         left for the jump target and filled in (backpatched) once NEXT is reached.
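
A minimal Python sketch of backpatching, assuming instructions are kept in a list and jump targets are list indices. The `JMP` instruction that skips the else-part and the `MOV y, ...` bodies are hypothetical additions needed to make the example complete; they do not appear in the slide.

```python
def generate_if(then_code, else_code):
    """Emit code for `if x = 0 then ... else ...` using numeric jump targets."""
    code = ["CMP x, 0"]
    jne_at = len(code)
    code.append(None)                     # blank: target of JNE is not known yet
    code.extend(then_code)
    jmp_at = len(code)
    code.append(None)                     # blank: jump over the else-part
    else_start = len(code)
    code.extend(else_code)
    end = len(code)
    code[jne_at] = f"JNE {else_start}"    # backpatch now that the addresses are known
    code[jmp_at] = f"JMP {end}"
    return code

for address, instruction in enumerate(generate_if(["MOV y, 1"], ["MOV y, 2"])):
    print(address, instruction)
```

When the code is written to a file rather than held in memory, the same effect requires a temporary file or a second pass, which is why backpatching appears under temporary files.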




1.5 Other issues in compiler structure
    Viewing the compiler's structure from different angles:
1. Analysis and synthesis
    Analysis:
         lexical analysis, syntax analysis, semantic analysis (optimization)
    Synthesis:
         code generation (optimization)

2. Front end and back end
     Separated according to whether an operation depends on the source language
     or on the target language.
     The front end:
         the scanner, parser, semantic analyzer, intermediate code synthesis
     The back end:
         the code generator, some optimization

     Source code  --> [Front end] -->  Intermediate code  --> [Back end] -->  Target code

    Advantage: portability

3. Passes
     Passes: the compiler processes the entire source program several times.
     The initial pass: constructs a syntax tree or intermediate code from the source.

    A pass may consist of several phases.

    One compiler with three passes:
       scanning and parsing;
       semantic analysis and source-level optimization;
       code generation and target-level optimization.

4. Language definition and compilers
     Relation between the language definition and the compiler

    Formal definition in mathematical terms of the language's semantics;
          one common method: denotational semantics.

    The structure and behavior of the runtime environment of the language affect
  compiler construction.

5. Compiler options and interfaces
     Interfaces with the operating system
     Options provided to the user for various purposes




1.6     Bootstrapping and porting
      Host language: the language in which the compiler itself is written.

 Compiler for language A    -->    Existing compiler    -->    Running compiler
 written in language B             for language B              for language A


      Consider the following situations:
      (1) The existing compiler for B runs on the target machine.
      (2) The existing compiler for B runs on a machine different from the target
          machine.
      (3) How the first compilers were written when no compilers existed yet.
          At first, compilers were written in machine language.
          Today, a compiler is written in another language.

T-diagram:
                             S  ---->  T
                                  H            (H is expected to be the same as T)

      A compiler written in language H that translates language S into language T.

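Combining T-diagrams in two ways:

   (1)  Compiler A -> B, running on H
          followed by compiler B -> C, running on H
          gives a compiler A -> C, running on H

        On the same machine H, a compiler from A to C can be obtained by combining
the compiler from A to B with the compiler from B to C.

   (2)  Compiler A -> B, written in H
          compiled by a compiler H -> K, running on M
          gives a compiler A -> B, running on K

         A compiler from H to K is used to translate the implementation language of
another compiler from H to K.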
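
The two combination rules can be checked mechanically by modelling each T-diagram as a (source, target, host) triple. This Python sketch is only a bookkeeping aid; the names `Compiler`, `chain`, and `compile_with` are hypothetical.

```python
from typing import NamedTuple

class Compiler(NamedTuple):
    source: str   # language translated from
    target: str   # language translated to
    host: str     # language the compiler is written in (or the machine it runs on)

def chain(first, second):
    """Rule 1: A -> B and B -> C on the same host H give A -> C on H."""
    if first.target != second.source or first.host != second.host:
        raise ValueError("compilers cannot be chained")
    return Compiler(first.source, second.target, first.host)

def compile_with(program, translator):
    """Rule 2: translate the implementation language of `program` using `translator`."""
    if program.host != translator.source:
        raise ValueError("translator does not accept the implementation language")
    return Compiler(program.source, program.target, translator.target)

print(chain(Compiler("A", "B", "H"), Compiler("B", "C", "H")))
# Compiler(source='A', target='C', host='H')
print(compile_with(Compiler("A", "B", "H"), Compiler("H", "K", "M")))
# Compiler(source='A', target='B', host='K')
```

Note that in rule 2 the host M of the translating compiler does not appear in the result: it only determines where the translation step itself has to be run.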




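The solution to the first situation mentioned above:

        Compiler A -> H, written in B
          compiled by the existing compiler B -> H, running on H
          gives a compiler A -> H, running on H

The solution to the second situation mentioned above:

        Compiler A -> H, written in B
          compiled by the existing compiler B -> K, running on K
          gives a compiler A -> H, running on K (a cross compiler)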



The apparent blunder of circularity:

        Compiler S -> T, written in S

         It is common to write a compiler in its own source language.

The solution to the third situation mentioned above: bootstrapping

    Step 1
        Compiler A -> H, written in its own language A
          compiled by a compiler for A written in machine language (A -> H, running on H)
          gives a running but inefficient compiler A -> H, running on H

    Step 2
        Compiler A -> H, written in its own language A
          compiled by the running but inefficient compiler (A -> H, running on H)
          gives the final version of the compiler A -> H, running on H

Solution to porting:
     To port the compiler from the old host H to the new host K, use the old
compiler to produce a cross compiler, then recompile the compiler with it to generate
the new one.

    Step 1
        Compiler source code retargeted to K (A -> K, written in A)
          compiled by the original compiler (A -> H, running on H)
          gives a cross compiler (A -> K, running on H)

    Step 2
        Compiler source code retargeted to K (A -> K, written in A)
          compiled by the cross compiler (A -> K, running on H)
          gives the retargeted compiler (A -> K, running on K)




1.7     The TINY sample language and compiler
      Language TINY: used as a running example (as the source language)
      Target language: assembly language for the TM machine


1.7.1 The TINY language


The features of a program in TINY:
   1.   a sequence of statements separated by semicolons
   2.   no procedures, no declarations
   3.   all variables are integer variables
   4.   two control statements: if-else and repeat
   5.   read and write statements
   6.   comments in curly brackets, which cannot be nested
   7.   expressions are Boolean and integer arithmetic expressions: Boolean
        expressions compare two arithmetic expressions using < or =; arithmetic
        expressions use +, -, *, /, parentheses, constants, and variables.
        Boolean expressions may appear only as tests in control statements.

One sample program in TINY: factorial function
   read x; { input an integer }
   if 0 < x then { don't compute if x <= 0 }
        fact := 1;
        repeat
            fact := fact * x;
            x := x - 1
        until x = 0;
        write fact { output factorial of x }
   end
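
To make the control flow of the sample program concrete, here is a rough Python rendering. It assumes that `repeat ... until` executes its body at least once before testing the condition; the function name and the use of a return value in place of `write` are illustrative choices.

```python
def tiny_factorial(x):
    """Mirror the TINY sample program; return what `write fact` would output, or None."""
    if 0 < x:                      # if 0 < x then
        fact = 1                   #   fact := 1;
        while True:                #   repeat (body runs before the test)
            fact = fact * x        #     fact := fact * x;
            x = x - 1              #     x := x - 1
            if x == 0:             #   until x = 0;
                break
        return fact                #   write fact
    return None                    # nothing is written when x <= 0

print(tiny_factorial(5))   # 120
```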




1.7.2 The TINY compiler


    C files: globals.h, util.h, scan.h, parse.h, symtab.h, analyze.h, code.h, cgen.h
             main.c, util.c, scan.c, parse.c, symtab.c, analyze.c, code.c, cgen.c

    Four passes: 1. the scanner and the parser
                 2. semantic analysis: constructing the symbol table
                 3. semantic analysis: type checking
                 4. the code generator

    main.c drives these passes. The central code is as follows:
        syntaxTree = parse();
        buildSymtab(syntaxTree);
        typeCheck(syntaxTree);
        codeGen(syntaxTree, codefile);

1.7.3 The TM Machine


The target language: the assembly language of the TM machine
 The TM machine has some of the properties of Reduced Instruction Set Computers (RISC):
   1. all arithmetic and testing must take place in registers;
   2. the addressing modes are extremely limited.

The simulator of the TM machine can directly execute the assembly files.
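
The two RISC-like properties can be illustrated with a toy register-machine simulator in Python. This is not the TM simulator from the text: the instruction names, the tuple encoding, and the eight-register layout are assumptions chosen for illustration, and only a handful of instructions are modelled.

```python
def run_tm(program, memory):
    """Run instruction tuples on a toy register machine; return the values written by OUT."""
    reg = [0] * 8                          # registers, all initially zero
    output = []
    pc = 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "HALT":
            return output
        if op == "LDC":                    # load constant: reg[r] = value
            r, value = args
            reg[r] = value
        elif op == "LD":                   # load: reg[r] = memory[offset + reg[base]]
            r, offset, base = args
            reg[r] = memory[offset + reg[base]]
        elif op == "ST":                   # store: memory[offset + reg[base]] = reg[r]
            r, offset, base = args
            memory[offset + reg[base]] = reg[r]
        elif op == "ADD":                  # arithmetic works on registers only
            r, s, t = args
            reg[r] = reg[s] + reg[t]
        elif op == "OUT":
            output.append(reg[args[0]])
        else:
            raise ValueError(f"unknown instruction {op!r}")

memory = [0] * 16
memory[0] = 3                  # index is stored at address 0; array a starts at address 4
program = [
    ("LD", 0, 0, 1),           # reg0 = index (reg1 is still 0)
    ("LDC", 1, 6),             # reg1 = 6
    ("ST", 1, 4, 0),           # a[index] = 6, i.e. memory[4 + reg0] = reg1
    ("LD", 2, 4, 0),           # reg2 = a[index]
    ("ADD", 2, 2, 1),          # reg2 = reg2 + reg1
    ("OUT", 2),
    ("HALT",),
]
print(run_tm(program, memory))   # [12]
```

The only way to reach memory is the single offset-plus-register addressing mode of `LD` and `ST`, and `ADD` never touches memory directly.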




Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptx
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 

Compiler Construction Principles and Practice

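  • 1. COMPILER CONSTRUCTION: Principles and Practice. Kenneth C. Louden, San Jose State University.
    1. INTRODUCTION 2. SCANNING 3. CONTEXT-FREE GRAMMARS AND PARSING 4. TOP-DOWN PARSING 5. BOTTOM-UP PARSING 6. SEMANTIC ANALYSIS 7. RUNTIME ENVIRONMENT 8. CODE GENERATION
    Main References:
    (1) Kenneth C. Louden, Compiler Construction: Principles and Practice (《编译原理及实践》), translated by Feng Boqin, Feng Lan, et al., China Machine Press.
    (2) Li Gansheng and Wang Huamin, Principles and Techniques of Compilers (《编译程序原理与技术》), Tsinghua University Press.
    (3) Chen Huowang et al., Programming Languages: Principles of Compilation (《程序设计语言:编译原理》), National Defense Industry Press.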
  • 2. Chapter 1 Introduction
    Emphasis: history of the compiler; description of programs related to compilers; the translation process; major data structures of a compiler; other related issues; bootstrapping and porting.
    What are compilers, and why study them?
    Compilers: computer programs that translate one language into another, from a source language (input) to a target language (output).
      Source program -> compiler -> Target program
    Source language: a high-level language such as C or C++.
    Target language: object code, machine code (machine instructions).
    Purposes of learning compilers:
      1. Basic knowledge (theoretical techniques such as automata theory).
      2. Tools and practical experience to design and program an actual compiler.
    Additional uses of compiling techniques: developing command interpreters and interface programs.
    TINY: the language used for discussion in the text.
    C-Minus: consists of a small but sufficiently complex subset of C. It is more extensive than TINY and suitable for a class project.
  • 3. 1.1 A brief history of compilers
    1. In the late 1940s, the stored-program computer was pioneered by John von Neumann. Programs were written in machine language. For example, on the Intel 8x86 processors used in IBM PCs, C7 06 0000 0002 means: move the number 2 to location 0000.
    2. Assembly language: numeric codes were replaced by symbolic forms, for example MOV X, 2.
       Assembler: translates the symbolic codes and memory locations of assembly language into the corresponding numeric codes.
       Defects of assembly language: difficult to read, write, and understand; dependent on the particular machine.
    3. The FORTRAN language and its compiler were developed between 1954 and 1957 by a team at IBM led by John Backus. The first compiler was developed in this period.
    4. The structure of natural language was studied by Noam Chomsky, who classified languages according to the complexity of their grammars and the power of the algorithms needed to recognize them. Four levels of grammars (the Chomsky hierarchy):
       Type 0: unrestricted grammar, recognized by a Turing machine
       Type 1: context-sensitive grammar
       Type 2: context-free grammar, the most useful for programming languages
       Type 3: right-linear (regular) grammar, corresponding to regular expressions and finite automata
    5. Parsing problems were studied in the 1960s and 1970s.
       Code improvement techniques (optimization techniques) improve the efficiency of the code a compiler generates.
       Compiler-compilers are more properly called parser generators, since they automate only one part of the compilation process. Yacc was written in 1975 by Steve Johnson for the UNIX system. Lex was written in 1975 by Mike Lesk.
    6. Recent advances in compiler design:
       Application of more sophisticated algorithms for inferring and/or simplifying the information contained in a program (together with the development of more sophisticated programming languages that allow this kind of analysis).
       Development of standard windowing environments (the interactive development environment, IDE).
  • 4. 1.2 Programs related to compilers
    1. Interpreters: another kind of language translator. An interpreter executes the source program immediately. Whether an interpreter or a compiler is used depends on the language and the situation. Interpreted languages: BASIC, LISP, and so on. Compilers are preferred when execution speed matters.
    2. Assemblers: a translator that translates assembly language into object code.
    3. Linkers: collect code separately compiled or assembled in different object files into a single file; connect the code for standard library functions; connect resources supplied by the operating system of the computer.
    4. Loaders: relocatable code is code whose addresses are not completely fixed. A loader resolves all relocatable addresses relative to the starting address.
    5. Preprocessors: delete comments, include other files, perform macro substitutions.
    6. Editors: produce a standard file (structure-based editors).
    7. Debuggers: determine execution errors in a compiled program.
    8. Profilers: collect statistics on the behavior of an object program during execution, such as the number of times each procedure is called and the percentage of execution time spent in each procedure.
    9. Project managers: coordinate the files being worked on by different people. sccs (source code control system) and rcs (revision control system) are project manager programs on Unix systems.
  • 5. 1.3 The translation process
    The phases of a compiler:
      Source code -> scanner -> tokens -> parser -> syntax tree -> semantic analyzer -> annotated tree -> source code optimizer -> intermediate code -> code generator -> target code -> target code optimizer -> target code
    Auxiliary components that interact with the phases: literal table, symbol table, error handler.
  • 6. 1. The scanner
    Lexical analysis: the input is a stream of characters, the output is a sequence of tokens.
      a[index] = 4 + 2
      Tokens: a, [, index, ], =, 4, +, 2
    The task of the scanner: recognize tokens, enter identifiers into the symbol table, and enter literals into the literal table.
    2. The parser
    Determines the structure of the program.
      Input: the sequence of tokens
      Output: a parse tree or a syntax tree. A syntax tree is a condensation of the information contained in the parse tree.
    Parse tree for a[index] = 4 + 2:
      expression
        assign-expression
          expression
            subscript-expression
              expression: identifier a
              [
              expression: identifier index
              ]
          =
          expression
            additive-expression
              expression: number 4
              +
              expression: number 2
  • 7. 3. The semantic analyzer
    Static semantics: properties that cannot be conveniently expressed as syntax and analyzed by the parser, but can be determined prior to execution. For example: declarations, type checking, data types.
    Dynamic semantics: properties that can only be determined by executing the program, and so cannot be determined by a compiler.
    Annotated tree for a[index] = 4 + 2:
      assign-expression
        subscript-expression: integer
          identifier a: array of integer
          identifier index: integer
        additive-expression: integer
          number 4: integer
          number 2: integer
    4. The source code optimizer
    Source-level optimization: 4 + 2 is replaced by 6 (constant folding).
    Intermediate code: any internal representation of the source code used by the compiler (syntax tree, three-address code, four-address code, and so on).
    Three-address code:
      t = 4 + 2
      a[index] = t
    A two-phase optimizer produces:
      1. t = 6
         a[index] = t
      2. a[index] = 6
  • 8. 5. The code generator
    Input: intermediate code or IR
    Output: machine code, that is, code for the target machine
    6. The target code optimizer
    Improves the target code generated by the code generator.
    Tasks: choosing addressing modes to improve performance; replacing slow instructions by faster ones; eliminating redundant or unnecessary operations.
    Code before optimization:
      MOV R0, index
      MUL R0, 2
      MOV R1, &a
      ADD R1, R0
      MOV *R1, 6
    Code after optimization:
      MOV R0, index
      SHL R0
      MOV &a[R0], 6
  • 9. 1.4 Major data structures in a compiler
    1. Tokens: a token is represented as a value of an enumerated data type whose values are the set of tokens.
    2. The syntax tree: each node is a record whose fields represent the information collected by the parser and the semantic analyzer.
    3. The symbol table: holds information associated with identifiers: functions, variables, constants, and data types. The symbol table interacts with the scanner, the parser, the semantic analyzer, and the optimization and code generation phases, which perform insertion, deletion, and access operations on it.
    4. The literal table: stores constants and strings. It needs quick insertion and lookup, but need not allow deletions.
    5. Intermediate code: this code may be kept as an array of text strings, a temporary text file, or a linked list of structures.
    6. Temporary files: used to hold the products of intermediate steps. For example, addresses must be backpatched during code generation. For the statement
         if x = 0 then ... else ...
       the generated code is
         CMP x, 0
         JNE NEXT
         ...
         NEXT: ...
       and the address of NEXT is not known when the JNE instruction is generated.
  • 10. 1.5 Other issues in compiler structure
    Viewing the compiler's structure from different angles:
    1. Analysis and synthesis
       Analysis: lexical analysis, syntax analysis, semantic analysis (and optimization).
       Synthesis: code generation (and optimization).
    2. Front end and back end
       Phases are separated according to whether they depend on the source language or on the target language.
       The front end: the scanner, parser, semantic analyzer, and intermediate code synthesis.
       The back end: the code generator and some optimization.
         Source code -> Front end -> Intermediate code -> Back end -> Target code
       Advantage: portability.
    3. Passes
       Passes: the compiler processes the entire source program several times.
       The initial pass constructs a syntax tree or intermediate code from the source.
       A pass may consist of several phases. One compiler with three passes: scanning and parsing; semantic analysis and source-level optimization; code generation and target-level optimization.
    4. Language definition and compilers
       The language definition and the compiler are closely related.
       A formal definition gives the language's semantics in mathematical terms; one common method is denotational semantics.
       The structure and behavior of the runtime environment of the language affect compiler construction.
    5. Compiler options and interfaces
       The compiler provides interfaces with the operating system.
  • 11. The compiler also provides options to the user for various purposes.
  • 12. 1.6 Bootstrapping and porting
    Host language: the language in which the compiler itself is written.
      Compiler for language A written in language B + existing compiler for language B -> running compiler for language A
    Consider the following situations:
      (1) The existing compiler for B runs on the target machine.
      (2) The existing compiler for B runs on a machine different from the target machine.
      (3) How were the first compilers written, when no compilers existed yet? At first, compilers were written in machine language. Today, a compiler is written in another language.
    T-diagram: a compiler written in language H that translates language S into language T is drawn as a T with S on the left arm, T on the right arm, and H at the base (H is expected to be the same as T). It is written below as (S -> T, in H).
    T-diagrams can be combined in two ways:
      1. (A -> B, in H) followed by (B -> C, in H) gives (A -> C, in H). On the same machine H, a compiler from A to C can be obtained by combining the compiler from A to B with the compiler from B to C.
      2. (A -> B, in H) compiled by (H -> K, in M) gives (A -> B, in K). A compiler from H to K is used to translate the implementation language of another compiler from H to K.
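  • 13. The solution to the first situation mentioned above:
      (A -> H, in B) compiled by (B -> H, in H) gives (A -> H, in H).
    The solution to the second situation mentioned above:
      (A -> H, in B) compiled by (B -> K, in K) gives (A -> H, in K), a cross compiler.
    The issue of apparent circularity: (S -> T, in S). It is common to write the compiler in its own source language.
    The solution to the third situation mentioned above is bootstrapping:
      Step 1: (A -> H, in A), the compiler written in its own language A, is compiled by (A -> H, in H), a compiler in machine language. The result is (A -> H, in H), a running but inefficient compiler.
      Step 2: (A -> H, in A), the compiler written in its own language A, is compiled by the running but inefficient compiler (A -> H, in H). The result is (A -> H, in H), the final version of the compiler.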
  • 14. Solution to porting: to port the compiler from the old host H to the new host K, use the old compiler to produce a cross compiler, then recompile the compiler to generate the new one.
      Step 1: (A -> K, in A), the compiler source code retargeted to K, is compiled by (A -> H, in H), the original compiler. The result is (A -> K, in H), a cross compiler.
      Step 2: (A -> K, in A), the compiler source code retargeted to K, is compiled by the cross compiler (A -> K, in H). The result is (A -> K, in K), the retargeted compiler.
  • 15. 1.7 The TINY sample language and compiler
    The language TINY is used as a running example (as a source language).
    Target language: assembly language for the TM machine.
    1.7.1 The TINY language
    The features of a program in TINY:
      1. A program is a sequence of statements separated by semicolons.
      2. There are no procedures and no declarations.
      3. All variables are integers.
      4. There are two control statements: if-else and repeat.
      5. There are read and write statements.
      6. Comments are enclosed in curly brackets and cannot be nested.
      7. Expressions are Boolean and integer arithmetic expressions (Boolean: < and =; arithmetic: +, -, *, /, parentheses, constants, variables). Boolean expressions appear only as tests in control statements.
    One sample program in TINY, the factorial function:
      read x; { input an integer }
      if 0 < x then { don't compute if x <= 0 }
        fact := 1;
        repeat
          fact := fact * x;
          x := x - 1
        until x = 0;
        write fact { output factorial of x }
      end
  • 16. 1.7.2 The TINY compiler
    C files:
      globals.h, util.h, scan.h, parse.h, symtab.h, analyze.h, code.h, cgen.h
      main.c, util.c, scan.c, parse.c, symtab.c, analyze.c, code.c, cgen.c
    Four passes:
      1. The scanner and the parser
      2. Semantic analysis: constructing the symbol table
      3. Semantic analysis: type checking
      4. The code generator
    main.c drives these passes. The central code is as follows:
      syntaxTree = parse();
      buildSymtab(syntaxTree);
      typeCheck(syntaxTree);
      codeGen(syntaxTree, codefile);
    1.7.3 The TM machine
    The target language is the assembly language of the TM machine.
    The TM machine has some of the properties of Reduced Instruction Set Computers (RISC):
      1. All arithmetic and testing must take place in registers.
      2. The addressing modes are extremely limited.
    The simulator of the TM machine can directly execute the assembly files.