
ICS 415 Introduction

Language
Humans use natural languages (e.g. English, Kiswahili) to communicate with each other, but they use programming languages (e.g. Java, Pascal, C++) to communicate with a computer.

What is a Compiler?
A computer program is a set of instructions that the computer can understand and execute. In reality, computers don't understand the instructions; they simply process data. Computer languages need to be unambiguous and have exactly defined syntax and semantics (unlike natural language). High-level programming languages have been developed for human convenience and readability. How?

A compiler is therefore:
• A program that translates a high-level language program into a functionally equivalent low-level language program.
• A translator whose source language (i.e., the language to be translated) is the high-level language and whose target language is a low-level language.
• The means of implementing a high-level language on a computer.

Why a compiler is needed:
• High-level languages are more suitable for humans to work with.
• Computers execute instructions in machine language.
• Programs written in high-level languages must therefore be converted to a format that machines can execute: machine language.

Put another way, a compiler is:
• A program that reads the high-level input program and converts it into machine code.
• System software that converts a source language program into an equivalent target language program, ensuring that the input program conforms to the source language specification.

Reasons for Using High-Level Languages
1. Compared to machine language, the notation used by programming languages is closer to the way humans think about problems.
2. The compiler can spot some obvious programming mistakes.
3. Programs written in a high-level language tend to be shorter than equivalent programs written in machine language.
4. The same program can be compiled to many different machine languages and, hence, be brought to run on many different machines.

Reasons for Studying Compiler Construction
1. It is considered a topic that you should know in order to be well-cultured in computer science.
2. A good craftsman should know his tools, and compilers are important tools for programmers and computer scientists.
3. The techniques used for constructing a compiler are useful for other purposes as well.
4. There is a good chance that a programmer or computer scientist will need to write a compiler or interpreter for a domain-specific language.

What is a Compiler?

Program text -> Compiler -> Machine code (the compiler also reports errors)

What is Language?
Program text is expressed in a programming (computer) language. A language comprises:
• Alphabet, e.g. A-Z, 0-9, and special symbols such as _, +, *
• Words or tokens, e.g. if, {, elsif
• Phrases, e.g. if (x<y) then x++;
• Rules that describe the major language elements:
  • Syntax determines what phrases there are in the language.
  • Semantics determines what a phrase means.
How can we specify tokens or words? What of language structure?

Compilation
Compilation refers to the compiler's process of translating a high-level language program into a low-level language program. This process is very complex; hence, from the logical as well as an implementation point of view, it is customary to partition the compilation process into several phases. These phases are nothing more than logically cohesive operations that take as input one representation of a source program, and output another representation.

Phases of a Compilation

Compilers are large, complicated programs that can only convert programs that conform to the syntax and semantic rules of a particular language. Compilation can be broken into two stages:
• Analysis (Front End)
  • Lexical Analysis
  • Syntax Analysis
  • Semantic Analysis
  • Intermediate Code Generation
• Synthesis (Back End)
  • Code Optimization
  • Code Generation

Compilation Front-End
The analyzer (front-end):
• Recognises legal constructs
• Reports errors
• Produces the intermediate language
• Generates a preliminary storage map

Compilation Back-End
The synthesizer (back-end):
• Translates the intermediate language (IL) into target machine code
• Chooses the instructions required for each IL operation
• Decides what information to keep in the processor registers
• Ensures that the resulting program uses the target system efficiently

Compiler Front-End

Lexical Analysis (Scanner)


This is the initial part of reading and analysing the program text. The text is read and divided into tokens, each of which corresponds to a symbol in the programming language, e.g., a variable name, keyword or number.

The scanner:
• Reads characters from the input stream (the program code) to identify tokens
• Returns the next token from the input stream to the parser (keywords, variables and symbols are usually converted to integers for simplicity in identification)
• Strips out comments and white space
• Performs case conversion, e.g. lower case to upper case
• Correlates error messages with the source (e.g. provides line numbers)
• Copies the source program with embedded error messages
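The scanner's job can be sketched in a few lines of Python. This is only a minimal illustration, not a production scanner: the token names (ID, INT, ASSIGN, ADD, MULT) follow the worked example later in these notes, while `TOKEN_SPEC`, `MASTER` and `tokenize` are hypothetical names chosen for this sketch. Identifiers are numbered in order of first appearance, which stands in for a symbol table.

```python
import re

# Token patterns for a tiny assignment language (illustrative, not a full language).
TOKEN_SPEC = [
    ("INT",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("ADD",    r"\+"),
    ("MULT",   r"\*"),
    ("SKIP",   r"\s+"),   # white space is stripped, not returned
    ("ERROR",  r"."),     # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (tokens, symbols); identifiers are numbered in order of first use."""
    symbols = {}   # identifier name -> index in the symbol table
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            column = match.start() + 1
            raise SyntaxError(f"unexpected character {lexeme!r} at column {column}")
        if kind == "ID":
            index = symbols.setdefault(lexeme, len(symbols) + 1)
            tokens.append(f"ID({index})")
        elif kind == "INT":
            tokens.append(f"INT({lexeme})")
        else:
            tokens.append(kind)
    return tokens, symbols


tokens, symbols = tokenize("cur_time = start_time + cycles * 60")
print(" ".join(tokens))   # ID(1) ASSIGN ID(2) ADD ID(3) MULT INT(60)
```

The error branch reports a column number, which is one simple way of correlating error messages with the source text.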

Syntactic Analysis
This phase takes the list of tokens produced by the lexical analysis phase and arranges these in a tree-structure (called the syntax tree) that reflects the structure of the program. This phase is often called parsing.

The Syntactic Analyzer (or Parser) analyzes groups of related tokens ("words") that form larger constructs (phrases). These include arithmetic expressions and statements such as:
  while expression do statement ;
  x := a + b * 7;
It converts the linear string of tokens into structured representations such as expression trees and program flow graphs.
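As a rough sketch of how a parser turns a linear token string into a tree, the following recursive-descent parser handles the statement `x := a + b * 7;` from the text. The small grammar in the comments, the `Parser` class and the tuple representation of tree nodes are assumptions made for this illustration; they are not taken from the notes.

```python
import re


def tokenize(text):
    """Split the input into identifiers, integers, ':=' and the symbols + * ;
    (characters outside this set are silently skipped in this sketch)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|:=|[+*;]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def assignment(self):
        # assignment -> id ':=' expression ';'
        target = self.peek()
        if target is None or not re.fullmatch(r"[A-Za-z_]\w*", target):
            raise SyntaxError(f"expected identifier, found {target!r}")
        self.pos += 1
        self.expect(":=")
        value = self.expression()
        self.expect(";")
        return (":=", target, value)

    def expression(self):
        # expression -> term { '+' term }
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        # term -> operand { '*' operand }
        node = self.operand()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.operand())
        return node

    def operand(self):
        # operand -> id | int
        token = self.peek()
        if token is None or not re.fullmatch(r"\w+", token):
            raise SyntaxError(f"expected identifier or number, found {token!r}")
        self.pos += 1
        return token


tree = Parser(tokenize("x := a + b * 7;")).assignment()
print(tree)   # (':=', 'x', ('+', 'a', ('*', 'b', '7')))
```

Because `term` is called from inside `expression`, the multiplication ends up deeper in the tree than the addition, which is how the tree records that `*` binds tighter than `+`.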

Semantic Analysis
This phase is also referred to as type checking. It analyses the syntax tree to determine whether the program violates certain consistency requirements, e.g., if a variable is used but not declared, or if it is used in a context that does not make sense given the type of the variable, such as trying to use a boolean value as a function pointer.
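A minimal sketch of such a check is shown below. It walks an expression tree, rejects undeclared variables, and inserts an `int2real` conversion where an integer meets a real operand. The tuple tree format, the `declared` table and the assumption that `start_time` and `cycles` are real-valued are all choices made for this illustration.

```python
class SemanticError(Exception):
    pass


def check(node, declared):
    """Return (typed_node, type) for an expression tree, or raise SemanticError.

    Leaves are identifier names or integer literals (as strings); interior
    nodes are tuples (operator, left, right). `declared` maps each declared
    variable name to "int" or "real".
    """
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"
        if node not in declared:
            raise SemanticError(f"variable {node!r} is used but not declared")
        return node, declared[node]
    op, left, right = node
    left, left_type = check(left, declared)
    right, right_type = check(right, declared)
    if left_type != right_type:
        # Mixed int/real arithmetic: convert the int operand to real.
        if left_type == "int":
            left = ("int2real", left)
        else:
            right = ("int2real", right)
        return (op, left, right), "real"
    return (op, left, right), left_type


# Assumption for this sketch: both variables were declared as real.
declared = {"start_time": "real", "cycles": "real"}
typed, result_type = check(("+", "start_time", ("*", "cycles", "60")), declared)
print(typed, result_type)
```

Calling `check` on a tree that mentions a name missing from `declared` raises `SemanticError`, which corresponds to the "used but not declared" case described above.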

Intermediate Code Generation


The program is translated to a simple machine-independent intermediate language.
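One common machine-independent form is three-address code, in which each instruction has at most one operator and intermediate results go into temporaries. The sketch below assumes the tree is given as nested tuples `(operator, left, right)`; the functions `generate` and `translate` and the `tempN` naming are hypothetical and only meant to show the idea.

```python
import itertools


def generate(node, code, temps):
    """Emit three-address instructions for an expression tree.

    Returns the name (variable, literal or temporary) that holds the value.
    """
    if isinstance(node, str):   # identifier or literal: nothing to compute
        return node
    op, left, right = node
    left_name = generate(left, code, temps)
    right_name = generate(right, code, temps)
    temp = f"temp{next(temps)}"
    code.append(f"{temp} = {left_name} {op} {right_name}")
    return temp


def translate(assignment):
    """Translate (':=', target, expression) into a list of instructions."""
    _, target, expression = assignment
    code = []
    result = generate(expression, code, itertools.count(1))
    code.append(f"{target} = {result}")
    return code


for line in translate((":=", "x", ("+", "a", ("*", "b", "7")))):
    print(line)
# temp1 = b * 7
# temp2 = a + temp1
# x = temp2
```

The final copy `x = temp2` is deliberately naive; removing such copies is left to a later optimization phase.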

Compiler Back-end
• Register allocation: The symbolic variable names used in the intermediate code are translated to numbers, each of which corresponds to a register in the target machine code.
• Machine-code generation: The intermediate language is translated to assembly language (a textual representation of machine code) for a specific machine architecture.
• Assembly and linking: The assembly-language code is translated into binary representation, and addresses of variables, functions, etc., are determined.

Structure of a Compiler

Source Language -> Front End -> Intermediate Code -> Back End -> Target Language

Front End: Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Intermediate Code Generator
Back End: Code Optimizer -> Target Code Generator

Example Compilation

Source Code:
  cur_time = start_time + cycles * 60

Lexical Analysis:
  ID(1) ASSIGN ID(2) ADD ID(3) MULT INT(60)

Syntax Analysis (syntax tree, children indented under their parent):
  ASSIGN
    ID(1)
    ADD
      ID(2)
      MULT
        ID(3)
        INT(60)

Semantic Analysis (an int2real conversion is inserted above the integer constant):
  ASSIGN
    ID(1)
    ADD
      ID(2)
      MULT
        ID(3)
        int2real
          INT(60)

Intermediate Code:
  temp1 = int2real(60)
  temp2 = id3 * temp1
  temp3 = id2 + temp2
  id1 = temp3

Optimized Code (step 0, unchanged intermediate code):
  temp1 = int2real(60)
  temp2 = id3 * temp1
  temp3 = id2 + temp2
  id1 = temp3

Optimized Code (step 1, the conversion is evaluated at compile time):
  temp1 = 60.0
  temp2 = id3 * temp1
  temp3 = id2 + temp2
  id1 = temp3

Optimized Code (step 2, the constant replaces temp1):
  temp2 = id3 * 60.0
  temp3 = id2 + temp2
  id1 = temp3

Optimized Code (step 3, the copy through temp3 is removed):
  temp2 = id3 * 60.0
  id1 = id2 + temp2

Optimized Code (final, temporaries renumbered):
  temp1 = id3 * 60.0
  id1 = id2 + temp1

Target Code:
  MOVF id3, R2
  MULF #60.0, R2
  MOVF id2, R1
  ADDF R2, R1
  MOVF R1, id1
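The optimization steps in the example can be mimicked with a few small passes over the intermediate code. The sketch below is one possible way to reproduce the numbered steps and is not how any particular compiler does it. Instructions are modelled as `(destination, operands)` pairs, all function names are hypothetical, and the passes assume that each temporary is defined once and used once, as in this example.

```python
def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def fold_conversions(code):
    """Step 1: evaluate int2real on integer literals at compile time."""
    result = []
    for dest, rhs in code:
        if len(rhs) == 2 and rhs[0] == "int2real" and rhs[1].isdigit():
            rhs = [str(float(rhs[1]))]
        result.append((dest, rhs))
    return result


def propagate_constants(code):
    """Step 2: replace temporaries that hold a constant by the constant."""
    constants, result = {}, []
    for dest, rhs in code:
        rhs = [constants.get(operand, operand) for operand in rhs]
        if dest.startswith("temp") and len(rhs) == 1 and is_number(rhs[0]):
            constants[dest] = rhs[0]   # remember the value, drop the instruction
        else:
            result.append((dest, rhs))
    return result


def remove_copies(code):
    """Step 3: merge 'temp = expr' followed directly by 'x = temp'."""
    result = []
    for dest, rhs in code:
        if result and rhs == [result[-1][0]] and rhs[0].startswith("temp"):
            result[-1] = (dest, result[-1][1])
        else:
            result.append((dest, rhs))
    return result


def renumber_temps(code):
    """Final step: rename the remaining temporaries temp1, temp2, ..."""
    names, result = {}, []
    for dest, rhs in code:
        rhs = [names.get(operand, operand) for operand in rhs]
        if dest.startswith("temp"):
            names[dest] = f"temp{len(names) + 1}"
            dest = names[dest]
        result.append((dest, rhs))
    return result


def show(code):
    return [f"{dest} = {' '.join(rhs)}" for dest, rhs in code]


code = [
    ("temp1", ["int2real", "60"]),
    ("temp2", ["id3", "*", "temp1"]),
    ("temp3", ["id2", "+", "temp2"]),
    ("id1", ["temp3"]),
]
for step in (fold_conversions, propagate_constants, remove_copies, renumber_temps):
    code = step(code)
print("\n".join(show(code)))
# temp1 = id3 * 60.0
# id1 = id2 + temp1
```

Running the passes in this order yields the same two instructions as the final optimized code in the example.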

PASCAL Example
• Grammar (specified in BNF)
  • A BNF grammar contains a set of rules that define the syntax of the constructs in the programming language.
  • ::= means "is defined to be"
  • < > encloses non-terminal symbols (constructs defined in the grammar)
  • Symbols without brackets are terminal symbols
• Sample program
• Lexical analysis
• Syntax analysis
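The BNF conventions above are regular enough to be checked mechanically. The following sketch classifies the symbols in three expression rules from the simplified Pascal grammar in these notes. The `RULES` string layout (one rule per line, symbols separated by spaces) and the `classify` function are assumptions made for this illustration.

```python
import re

RULES = """
<exp>    ::= <term> | <exp> + <term> | <exp> - <term>
<term>   ::= <factor> | <term> * <factor> | <term> DIV <factor>
<factor> ::= id | int | ( <exp> )
"""


def classify(rules):
    """Return (nonterminals, terminals) appearing in a block of BNF rules."""
    nonterminals, terminals = set(), set()
    for line in rules.strip().splitlines():
        head, body = line.split("::=")
        nonterminals.add(head.strip())
        for symbol in body.split():
            if symbol == "|":   # alternative separator, not a grammar symbol
                continue
            if re.fullmatch(r"<[^<>]+>", symbol):
                nonterminals.add(symbol)
            else:
                terminals.add(symbol)
    return nonterminals, terminals


nonterminals, terminals = classify(RULES)
print(sorted(nonterminals))
print(sorted(terminals))
```

Every bracketed name ends up in `nonterminals`, and everything else (operators, parentheses, `DIV`, `id`, `int`) ends up in `terminals`.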

SIMPLIFIED PASCAL GRAMMAR

<program>   ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.
<prog-name> ::= id
<dec-list>  ::= <dec> | <dec-list> ; <dec>
<dec>       ::= <id-list> : <type>
<type>      ::= INTEGER
<id-list>   ::= id | <id-list> , id
<stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
<stmt>      ::= <assign> | <read> | <write> | <for>
<assign>    ::= id := <exp>
<exp>       ::= <term> | <exp> + <term> | <exp> - <term>
<term>      ::= <factor> | <term> * <factor> | <term> DIV <factor>
<factor>    ::= id | int | ( <exp> )
<read>      ::= READ ( <id-list> )
<write>     ::= WRITE ( <id-list> )
<for>       ::= FOR <index-exp> DO <body>
<index-exp> ::= id := <exp> TO <exp>
<body>      ::= <stmt> | BEGIN <stmt-list> END

Example of a PASCAL program


PROGRAM STATS
VAR
  SUM, SUMSQ, I, VALUE, MEAN, VARIANCE : INTEGER
BEGIN
  SUM := 0;
  SUMSQ := 0;
  FOR I := 1 TO 100 DO
    BEGIN
      READ (VALUE);
      SUM := SUM + VALUE;
      SUMSQ := SUMSQ + VALUE * VALUE
    END;
  MEAN := SUM DIV 100;
  VARIANCE := SUMSQ DIV 100 - MEAN * MEAN;
  WRITE (MEAN, VARIANCE)
END.

Exercise
Draw the parse tree for the example Pascal program, using the specified BNF Pascal grammar.
