


  1. Compilers Design and Construction Lecture 1 Chapter 1
  2. Course Aims • Any program written in a programming language must be translated before it can be executed. • This translation is typically accomplished by a software system called a compiler. • This course aims to introduce the principles and techniques used to perform this translation and the issues that arise in the construction of a compiler.
  3. Our course talks about:
  4. Plan (Week: Subject, Reading) • Week 1: Introduction (Ch 1) • Week 2: Lexical analysis (Ch 2-3) • Weeks 3, 4: Syntax analysis (Ch 4-5) • Week 5: Syntax and semantic analysis (Ch 4-5, Ch 6) • Week 6: Semantic analysis (Ch 6) • Week 7: Midterm exam • Week 8: Intermediate code (Ch 7-8) • Weeks 9, 10, 11: Control flow, code generation (Ch 9-10) • Week 12: Code optimization (Ch 10)
  5. Assessments (Topic: Mark) • Lab: 20% • Midterm exam: 15% • Attendance: 5% • Final exam: 60%
  6. Learning Outcomes • A student successfully completing this course should be able to: • understand the principles governing all phases of the compilation process. • understand the role of each of the basic components of a standard compiler. • show awareness of the problems of, and the methods and techniques applied to, each phase of the compilation process. • apply standard techniques to solve basic problems that arise in compiler construction. • understand how the compiler can take advantage of particular processor characteristics to generate good code.
  7. References • Class textbook • Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman • Other useful books • Advanced Compiler Design & Implementation, Steven Muchnick • Building an Optimizing Compiler, Robert Morgan • Modern Compiler Implementation in Java, Andrew Appel
  8. Software Categories • System SW • Programs written for computer systems • Compilers, operating systems, … • Application SW • Programs written for computer users • Word-processors, spreadsheets, & other application packages
  9. A Layered View of a Computer from the Perspective of a Compiler • Application Programs: Word-Processors, Spreadsheets, Database Software, IDEs, etc. • System Software: Compilers, Interpreters, Preprocessors, etc.; Operating System, Device Drivers • Machine with all its hardware
  10. Programs • Any program can be written in any programming language • A programming language (PL) is: • A set of rules and symbols used to construct a computer program • A language used to interact with the computer
  11. Why study Compilation Technology? • Success stories (one of the earliest branches in CS) • Applying theory to practice (scanning, parsing, static analysis) • Ideas from different parts of computer science are involved: • AI: heuristic search techniques; greedy algorithms • Algorithms: graph algorithms • Theory: pattern matching • Also: systems, architecture • Compiler construction can be challenging and fun: • new architectures always create new challenges • success requires mastery of complex interactions • results are useful • opportunity to achieve performance
  12. [Diagram: a problem that needs to be solved • Simple User: solves it manually • Programmer: makes SW to solve a specific problem automatically • CS Expert: makes SW to compile any program]
  13. Principles of Compilation The compiler must: • preserve the meaning of the program being compiled. • “improve” the source code in some way. • Space (size of compiled code) • Feedback (information provided to the user) • Debugging • Compilation time efficiency (fast or slow compiler?)
  14. Introduction chapter 1
  15. Compilers • “Compilation” • Translation of a program written in a source language into a semantically equivalent program written in a target language • [Diagram: Source Program -> Compiler -> Target Program, with the compiler also emitting error messages; Input -> Target Program -> Output] • Target program: an executable machine-language program.
  16. Interpreters • “Interpretation” • Performing the operations implied by the source program • [Diagram: Source Program and Input -> Interpreter -> Output, with the interpreter also emitting error messages]
  17. History • IBM developed the 704 in 1954. All programming was done in assembly language. The cost of software development far exceeded the cost of hardware. Productivity was low. • Speedcoding interpreter: programs ran about 10 times slower than hand-written assembly code • John Backus (in 1954): proposed a program that translated high-level expressions into native machine code. Skepticism all around; most people thought it was impossible • Fortran I project (1954-1957): the first compiler was released
  18. Fortran I • The first compiler had a huge impact on programming languages and computer science. The whole new field of compiler design was started. • More than half of all programmers were using Fortran by 1958. • Development time was cut in half. • Modern compilers preserve the basic structure of the Fortran I compiler!
  19. Computer Languages • Machine Language • Uses binary code • Machine-dependent • Not portable • Assembly Language • Uses mnemonics (short, easy-to-remember words) • Machine-dependent • Not usually portable • High-Level Language (HLL) • Uses English-like language • Portable (but must be compiled for different platforms) • Examples: Pascal, C, C++, Java, Fortran, ...
  20. Machine Language • The representation of a computer program that is actually read and understood by the computer. • A program in machine code consists of a sequence of machine instructions. • Instructions: • Machine instructions are in binary code • Instructions specify operations and the memory cells involved in the operation • Example (Operation | Address): • 0010 | 0000 0000 0100 • 0100 | 0000 0000 0101 • 0011 | 0000 0000 0110
  21. Assembly Language • A symbolic representation of the machine language of a specific processor. • Is converted to machine code by an assembler. • Each line of assembly code produces one machine instruction (one-to-one correspondence). • Programming in assembly language is slow and error-prone but is more efficient in terms of hardware performance. • Mnemonic representation of the instructions and data • Example: Load Price; Add Tax; Store Cost
  22. High-level language • A programming language that uses statements consisting of English-like keywords such as "FOR", "PRINT", or "IF". • Each statement corresponds to several machine language instructions (one-to-many correspondence). • Much easier to program in than assembly language. • Operations can be described using familiar symbols • Example: Cost = Price + Tax
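The one-to-one correspondence between assembly lines and machine instructions can be sketched as a tiny assembler. The slides do not say which bit pattern belongs to which mnemonic, so the opcode table and the addresses of Price, Tax, and Cost below are assumptions, chosen so that the output lines up with the machine-language example on slide 20.

```python
# Hypothetical opcode and address tables: the slides never state this mapping.
OPCODES = {"Load": 0b0010, "Add": 0b0100, "Store": 0b0011}
ADDRESSES = {"Price": 4, "Tax": 5, "Cost": 6}

def assemble(line):
    """Translate one assembly line into one 16-bit word (4-bit opcode, 12-bit address)."""
    mnemonic, operand = line.split()
    return (OPCODES[mnemonic] << 12) | ADDRESSES[operand]

def show(word):
    """Format a 16-bit word as four groups of four bits."""
    bits = format(word, "016b")
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))

# Cost = Price + Tax, written as three assembly lines.
for line in ["Load Price", "Add Tax", "Store Cost"]:
    print(f"{line:<12}{show(assemble(line))}")
```

Each assembly line yields exactly one word, while the single high-level statement `Cost = Price + Tax` needed three lines: that is the one-to-many correspondence described on slide 22.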
  23. Compilers: The Big picture
  24. Editors, Preprocessors, Linkers & Loaders • Editors • Compilers are often bundled together with an editor and other programs into an integrated development environment (IDE) • May include some operations of a compiler, such as reporting some errors • Preprocessors • Delete comments, include other files, and perform macro substitutions • Linkers • Collect separate object files into a directly executable file • Connect an object program to the code for standard library functions and to resources supplied by the OS • Loaders • Resolve all relocatable addresses relative to a given base • Make executable code more flexible
  25. Compiling and running C programs • [Diagram: Editor -> source code (file.c) -> Compiler -> object code (file.obj) -> Linker, together with Libraries -> executable code (file.exe)]
  26. Debuggers • Used to locate execution errors in a compiled program • Keep track of most or all of the source code information • Stop execution at pre-specified locations called breakpoints
  27. Debugging program errors • [Diagram: the Editor -> Compiler -> Linker pipeline (file.c -> file.obj -> file.exe, with Libraries), annotated with Syntactic Errors and Semantic Errors]
  28. Interpreters • Execute the source program immediately rather than generating object code • Examples: BASIC, LISP, used often in educational or development situations • Speed of execution is slower than compiled code • Share many of their operations with compilers
  29. How to translate? • Direct translation is difficult. Why? • Source code and machine code mismatch in level of abstraction: – Variables vs memory locations/registers – Functions vs jump/return – Parameter passing – Structs • Some languages are farther from machine code than others: – For example, languages supporting the object-oriented paradigm
  30. How to translate easily? • Translate in steps. Each step handles a reasonably simple, logical, and well-defined task • Design a series of program representations • Intermediate representations should be amenable to program manipulation of various kinds (type checking, optimization, code generation, etc.) • Representations become more machine-specific and less language-specific as the translation proceeds
  31. The first few steps • The first few steps can be understood by analogy to how humans comprehend a natural language • The first step is recognizing the alphabet of a language. For example: – English text consists of lower- and upper-case letters, digits, punctuation, and white space – Written programs consist of characters from the ASCII character set (normally 9-13, 32-126)
  32. The first few steps • The next step in understanding the sentence is recognizing words – How to recognize English words? – Words are found in standard dictionaries – Dictionaries are updated regularly
  33. The first few steps • How to recognize words in a programming language? – a dictionary (of keywords etc.) – rules for constructing words (identifiers, numbers etc.) • This is called lexical analysis • Recognizing words is not completely trivial. For example: w hat ist his se nte nce?
  34. Lexical Analysis: Challenges • We must know what the word separators are • The language must define rules for breaking a sentence into a sequence of words. • Normally white space and punctuation are word separators in languages.
  35. Lexical Analysis: Challenges • In programming languages a character from a different class may also be treated as a word separator. • The lexical analyzer breaks a sentence into a sequence of words or tokens: – if a == b then a = 1 ; else a = 2 ; – Sequence of words (total 14 words): if a == b then a = 1 ; else a = 2 ;
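A minimal sketch of the word-splitting idea, assuming a toy token pattern that is not taken from the slides: a word is an identifier or keyword, an integer, the two-character operator `==`, or any other single non-blank symbol. Because a change of character class ends a word, the input does not need spaces between tokens.

```python
import re

# Hypothetical token pattern: identifier/keyword, integer, '==', or any single symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|\S")

def tokenize(text):
    """Break a sentence into its sequence of words (tokens)."""
    return TOKEN.findall(text)

# Same statement as on the slide, written without optional spaces.
tokens = tokenize("if a==b then a=1; else a=2;")
print(len(tokens), tokens)
```

The alternative `==` is listed before the single-symbol fallback so that the comparison operator is not split into two `=` tokens; the result is the same 14 words counted on the slide.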
  36. The next step • Once the words are understood, the next step is to understand the structure of the sentence • The process is known as syntax checking or parsing
  37. Parsing • Parsing a program is exactly the same process as shown in the previous slide. • Consider an expression: if x == y then z = 1 else z = 2
  38. Understanding the meaning • Once the sentence structure is understood we try to understand the meaning of the sentence (semantic analysis) • A challenging task • Example: Prateek said Nitin left his assignment at home • What does his refer to? Prateek or Nitin?
  39. Understanding the meaning • Worse case: Amit said Amit left his assignment at home • Even worse: Amit said Amit left Amit’s assignment at home • How many Amits are there? Which one left the assignment? Whose assignment got left?
  40. Semantic Analysis • Too hard for compilers. They do not have capabilities similar to human understanding • However, compilers do perform analysis to understand the meaning and catch inconsistencies • Programming languages define strict rules to avoid such ambiguities • { int Amit = 3; { int Amit = 4; cout << Amit; } }
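The nested-block example can be modeled with a stack of scopes, one common way to implement such binding rules. This is a sketch only; the class and method names are hypothetical and not taken from the slides.

```python
class Scopes:
    """A stack of dictionaries; the innermost block is at the top."""

    def __init__(self):
        self.stack = [{}]

    def enter(self):
        self.stack.append({})      # corresponds to '{'

    def leave(self):
        self.stack.pop()           # corresponds to '}'

    def declare(self, name, value):
        self.stack[-1][name] = value

    def lookup(self, name):
        # Search from the innermost block outwards; the first match wins.
        for scope in reversed(self.stack):
            if name in scope:
                return scope[name]
        raise NameError(name)

scopes = Scopes()
scopes.declare("Amit", 3)          # { int Amit = 3;
scopes.enter()
scopes.declare("Amit", 4)          #   { int Amit = 4;
inner = scopes.lookup("Amit")      #     cout << Amit;  refers to the inner Amit
scopes.leave()                     #   }
outer = scopes.lookup("Amit")      # the outer Amit is visible again
print(inner, outer)
```

Under this rule there is never any doubt about which Amit is meant: the innermost declaration in scope is used.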
  41. More on Semantic Analysis • Compilers perform many other checks besides variable bindings • Type checking: Amit left her work at home • There is a type mismatch between her and Amit. Presumably Amit is male, so her and Amit do not refer to the same person.
  42. Code Optimization • No strong counterpart in English, but it is similar to editing/précis writing • Automatically modify programs so that they: – Run faster – Use fewer resources (memory, registers, space, fewer fetches, etc.)
  43. Code Optimization • Some common optimizations: – Common sub-expression elimination – Copy propagation – Dead code elimination – Code motion – Strength reduction – Constant folding • Example: x = 15 * 3 is transformed to x = 45
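Constant folding, the last optimization in the list, can be sketched on top of Python's standard `ast` module. This borrows Python's own parser purely for illustration; it is not the compiler discussed in the slides, it handles only integer `+`, `-`, and `*`, and it assumes Python 3.9 or later for `ast.unparse`.

```python
import ast
import operator

# Operators this sketch knows how to evaluate at compile time.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def is_int(node):
    return isinstance(node, ast.Constant) and isinstance(node.value, int)

class Folder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)   # fold the operands first
        op = OPS.get(type(node.op))
        if op and is_int(node.left) and is_int(node.right):
            folded = ast.Constant(op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node

def fold(source):
    """Return the source text with constant integer sub-expressions evaluated."""
    tree = Folder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(fold("x = 15 * 3"))   # x = 45
```

Expressions that involve variables are left alone, so `fold("y = x + 2 * 3")` only replaces the constant part and gives `y = x + 6`.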
  44. Compiler
  45. Compilers • Analysis of the source program. • Synthesis into a machine-language program.
  46. Parts of Compilers • Analysis (Front End): 1. Lexical Analysis 2. Syntax Analysis 3. Semantic Analysis • Synthesis (Back End): 4. Code Generation 5. Optimization
  47. Compilers • Analysis (Front End) • Split the source code into its constituent pieces (tokens). • Group the pieces according to grammatical rules (parsing). • Report errors. • Synthesis (Back End) • Produce intermediate code • Optimize the intermediate code • Generate target code (machine-language code)
  48. Structure of a Compiler • Front end: analysis • Read the source program and understand its structure and meaning • Back end: synthesis • Generate an equivalent target-language program • [Diagram: Source -> Front End -> Back End -> Target]
  49. Phases of a Compiler • Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Intermediate Code Generator -> Code Optimizer -> Code Generator -> Target Program • The Symbol Table Manager and the Error Handler interact with all phases
  50. The Structure of a Compiler • Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Optimizer -> Code Generator -> Target machine code • Symbol and Attribute Tables (used by all phases of the compiler) • Analysis phase: Scanner, Parser, Semantic Routines • Synthesis phase: Optimizer, Code Generator
  51. Analysis (by Neng-Fa Zhou) • source program -> lexical analyzer -> tokens -> syntax analyzer -> parse trees -> semantic analyzer -> parse trees
  52. Scanner (Lexical Analyzer) • The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens) • Puts information about identifiers into the symbol table. • Regular expressions are used to describe tokens (lexical constructs). • A (Deterministic) Finite State Automaton can be used in the implementation of a lexical analyzer.
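To illustrate the last point, here is a small table-driven deterministic finite automaton that recognizes two kinds of lexemes, identifiers and numbers. The state names, character classes, and transition table are assumptions made for this sketch; a real scanner would cover every token of its language.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# (current state, input class) -> next state; missing entries mean "reject".
TRANSITIONS = {
    ("start", "letter"): "identifier",
    ("start", "digit"): "number",
    ("identifier", "letter"): "identifier",
    ("identifier", "digit"): "identifier",
    ("number", "digit"): "number",
}
ACCEPTING = {"identifier", "number"}

def classify(lexeme):
    """Run the automaton over the lexeme; return its token class or None."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return state if state in ACCEPTING else None

for lexeme in ["newval", "12", "12a"]:
    print(lexeme, classify(lexeme))
```

The automaton reads each character exactly once and never backtracks, which is why this style of implementation suits scanners.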
  53. Scanner (Lexical Analyzer) • Ex: newval = oldval + 12 • Tokens: • newval: identifier • =: assignment operator • oldval: identifier • +: add operator • 12: a number
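The token list above can be reproduced with a regular-expression-driven scanner. The specification below covers only the four token classes that appear in this one example; the patterns are assumptions, not a full language definition.

```python
import re

# (token class, regular expression) pairs; order matters when patterns overlap.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assignment operator", r"="),
    ("add operator", r"\+"),
]
# One capturing group per token class, combined into a single pattern.
SCANNER = re.compile("|".join(f"({pattern})" for _, pattern in TOKEN_SPEC))

def scan(text):
    """Return (lexeme, token class) pairs for the input text."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = SCANNER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = TOKEN_SPEC[match.lastindex - 1][0]
        tokens.append((match.group(), kind))
        pos = match.end()
    return tokens

for lexeme, kind in scan("newval = oldval + 12"):
    print(f"{lexeme:<8}{kind}")
```

Characters that match no pattern raise an error, which is the scanner's share of the error reporting done by the front end.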
  54. Parser (Syntax Analyzer) • Given a formal syntax specification (typically as a context-free grammar [CFG]), the parser reads tokens and groups them into units as specified by the productions of the CFG being used. • As syntactic structure is recognized, the parser either calls corresponding semantic routines directly or builds a syntax tree. • CFG (Context-Free Grammar) • BNF (Backus-Naur Form) • GAA (Grammar Analysis Algorithms)
  55. Parser (Syntax Analyzer) • A syntax analyzer creates the syntactic structure (generally a parse tree) of the given program. • A syntax analyzer is also called a parser. • A parse tree describes a syntactic structure.
  56. Parser (Syntax Analyzer (CFG)) • The syntax of a language is specified by a context-free grammar (CFG). • The rules in a CFG are mostly recursive. • A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not. • If it does, the syntax analyzer creates a parse tree for the given program. • Ex: We use BNF (Backus-Naur Form) to specify a CFG: • assgstmt -> identifier := expression • expression -> identifier • expression -> number • expression -> expression + expression
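A minimal recursive-descent sketch for this grammar. The rule `expression -> expression + expression` is left-recursive and ambiguous, so the sketch parses the equivalent form "operand followed by zero or more `+ operand`" and groups additions to the left; that rewrite, the function names, and the tuple-shaped trees are assumptions of this example, not something the slides prescribe.

```python
def parse_operand(tokens):
    """expression -> identifier | number"""
    if tokens and (tokens[0].isidentifier() or tokens[0].isdigit()):
        return tokens[0], tokens[1:]
    raise SyntaxError("expected an identifier or a number")

def parse_expression(tokens):
    """expression -> operand ('+' operand)*, grouped to the left."""
    tree, rest = parse_operand(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_operand(rest[1:])
        tree = ("+", tree, right)
    return tree, rest

def parse_assignment(tokens):
    """assgstmt -> identifier := expression"""
    if len(tokens) < 3 or not tokens[0].isidentifier() or tokens[1] != ":=":
        raise SyntaxError("expected: identifier := expression")
    tree, rest = parse_expression(tokens[2:])
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return (":=", tokens[0], tree)

# The token list would normally come from the scanner; split() stands in for it here.
print(parse_assignment("newval := oldval + 12".split()))
```

A program that violates the grammar, such as `newval := + 12`, raises `SyntaxError` instead of producing a tree.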
  57. Syntax Analyzer versus Lexical Analyzer • Which constructs of a program should be recognized by the lexical analyzer, and which ones by the syntax analyzer? • Both of them do similar things, but the lexical analyzer deals with simple non-recursive constructs of the language. • The syntax analyzer deals with recursive constructs of the language. • The lexical analyzer simplifies the job of the syntax analyzer. • The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program. • The syntax analyzer works on those tokens to recognize meaningful structures in the programming language.
  58. Semantic Routines • Perform two functions: • Check the static semantics of each construct • Do the actual translation • The heart of a compiler • Result is: Syntax-Directed Translation • Semantic Processing Techniques • Ex: newval = oldval + 12: the type of the identifier newval must match the type of the expression (oldval + 12)
  59. Semantic Analysis • Type checking • Type conversion
  60. Symbol Table • There is a record for each identifier • The attributes include name, type, location, etc.
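A sketch of a symbol table with one record per identifier, used to type-check the running example `newval = oldval + 12`. The record fields follow the attributes listed above; the concrete types, the locations, and the rule that both operands of `+` must have the same type are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Symbol:
    name: str
    type: str
    location: int   # hypothetical memory offset

# One record per identifier.
symbol_table = {
    "newval": Symbol("newval", "int", 0),
    "oldval": Symbol("oldval", "int", 4),
}

def type_of(expr):
    """expr is a literal, an identifier name, or a tuple ('+', left, right)."""
    if isinstance(expr, tuple):
        _, left, right = expr
        left_type, right_type = type_of(left), type_of(right)
        if left_type != right_type:
            raise TypeError(f"operands of + differ: {left_type} vs {right_type}")
        return left_type
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, float):
        return "float"
    return symbol_table[expr].type

def check_assignment(target, expr):
    """The type of the target must match the type of the expression."""
    target_type, expr_type = symbol_table[target].type, type_of(expr)
    if target_type != expr_type:
        raise TypeError(f"cannot assign {expr_type} to {target_type} {target}")
    return target_type

print(check_assignment("newval", ("+", "oldval", 12)))
```

A compiler that supports type conversion would insert a conversion at the point where this sketch raises `TypeError`, for example when an `int` is added to a `float`.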
  61. Synthesis of Object Code • parse tree & symbol table -> intermediate code generator -> intermediate code -> code optimizer -> optimized intermediate code -> code generator -> target program
  62. Intermediate Code Generation • A compiler may produce explicit intermediate code representing the source program. • This intermediate code is generally machine (architecture) independent, but its level is close to the level of machine code.
  63. Intermediate Code Generation
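One widely used intermediate form is three-address code, in which every instruction has at most one operator and intermediate results go into temporaries. The slides do not name a specific form, so this choice, the tuple-shaped input tree, and the temporary names `t1`, `t2`, ... are assumptions of the sketch.

```python
from itertools import count

def generate(tree):
    """Return three-address instructions for a tree of the form (':=', target, expr)."""
    code, temps = [], count(1)

    def walk(expr):
        # Identifiers and numbers are already usable as operands.
        if not isinstance(expr, tuple):
            return str(expr)
        op, left, right = expr
        left_name, right_name = walk(left), walk(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree
    result = walk(expr)
    code.append(f"{target} = {result}")
    return code

for line in generate((":=", "newval", ("+", "oldval", 12))):
    print(line)
```

Nothing in the output depends on a particular processor, yet each line is simple enough to map onto a few machine instructions.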
  64. Optimizer • The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code • This phase can be very complex and slow • Techniques include peephole optimization, loop optimization, register allocation, and code scheduling
  65. Code Optimization
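Peephole optimization looks at a small window of adjacent instructions and replaces wasteful patterns. The sketch below assumes a hypothetical one-accumulator machine with `Load`, `Add`, and `Store` instructions and straight-line code with no jumps into the window; it removes a `Load X` that directly follows `Store X`, because the accumulator still holds that value.

```python
def peephole(asm):
    """Drop 'Load X' when it immediately follows 'Store X'."""
    out = []
    for instr in asm:
        if out and instr.startswith("Load ") and out[-1] == "Store " + instr[5:]:
            continue   # the accumulator already holds X
        out.append(instr)
    return out

before = ["Load oldval", "Add 12", "Store t1", "Load t1", "Store newval"]
after = peephole(before)
print(after)
```

The `Store t1` is kept because this window cannot tell whether `t1` is read again later; removing it would need a separate analysis of which values are still in use.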
  66. Code Generator • Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Optimizer -> Code Generator -> Target machine code • Produces the target language for a specific architecture. • The target program is normally a relocatable object file containing the machine code.
  67. Code Generation
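A sketch of the final step, assuming the input is three-address code written as text (`dest = a` or `dest = a + b`) and the target is a hypothetical one-accumulator machine with the `Load`, `Add`, and `Store` mnemonics from the assembly-language slide. Only `+` is supported.

```python
def emit(three_address):
    """Translate three-address instructions into accumulator-machine assembly."""
    asm = []
    for instr in three_address:
        dest, _, *rhs = instr.split()        # "Cost = Price + Tax" -> Cost, =, [Price, +, Tax]
        asm.append(f"Load {rhs[0]}")
        if len(rhs) == 3:
            if rhs[1] != "+":
                raise ValueError(f"unsupported operator {rhs[1]!r}")
            asm.append(f"Add {rhs[2]}")
        asm.append(f"Store {dest}")
    return asm

for line in emit(["Cost = Price + Tax"]):
    print(line)
```

For `Cost = Price + Tax` this produces `Load Price`, `Add Tax`, `Store Cost`, the same three lines used as the assembly-language example earlier in the lecture.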
  68. The Structure of a Compiler • Scanner [Lexical Analyzer] -> Tokens -> Parser [Syntax Analyzer] -> Parse tree -> Semantic Process [Semantic Analyzer] -> Abstract Syntax Tree w/ Attributes -> Code Generator [Intermediate Code Generator] -> Non-optimized Intermediate Code -> Code Optimizer -> Optimized Intermediate Code -> Code Generator -> Target machine code