
Spring 2012

Master of Computer Application (MCA) Semester III MC0073 Systems Programming 4 Credits (Book ID: B0811)
Assignment Set 1 (60 Marks) - Answer all questions

1. Consider the following C Language program and list out the outcomes of:

main()
{
    int a, b, c, d;
    printf("enter the value of a", &a);
    printf("enter the value of a", &b);
    if (a > b) {
        c = a + b;
        printf("%d %d", c);
    } else {
        d = a + b;
        printf("%d %d", d);
    }
}

2. What is the limitation of conventional pass-1 pass-2 compilation? How do you overcome it? (5 Marks)

3. Identify the following notations and define them with examples: (10 Marks)
L, Σ, T, NT, and G

The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set. We will use lower case letters a, b, c, etc. to denote symbols in Σ. A symbol in the alphabet is known as a terminal symbol (T) of L. The alphabet can be represented using the mathematical notation of a set, e.g. Σ = {a, b, ..., z, 0, 1, ..., 9}.

Here the symbols '{', ',' and '}' are part of the notation. We call them metasymbols to differentiate them from terminal symbols. Throughout this discussion we assume that metasymbols are distinct from the terminal symbols. If this is not the case, i.e. if a terminal symbol and a metasymbol are identical, we enclose the terminal symbol in quotes to differentiate it from the metasymbol. For example, the set of punctuation symbols of English can be defined as {:, ;, ','}, where ',' denotes the terminal symbol comma. A nonterminal symbol (NT) is the name of a syntax category of a language, e.g. noun, verb, etc. An NT is written as a single capital letter, or as a name enclosed between <>, e.g. A or <Noun>. During grammatical analysis, a nonterminal symbol represents an instance of the category. Thus, <Noun> represents a noun.

Each grammar G defines a language L(G). G contains an NT called the distinguished symbol or the start NT of G. Unless otherwise specified, we use the symbol S as the distinguished symbol of G.

Identify the basic elements of Grammar G

A valid string of L(G) is obtained by using the following procedure:
1. Let α = S.
2. While α is not a string of terminal symbols:
   (a) Select an NT appearing in α, say X.
   (b) Replace X by a string appearing on the RHS of a production of X.
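As an illustration of this procedure, consider a hypothetical grammar G with the single NT S and the productions S → aSb and S → ab (the grammar and the derived string are invented for this example). The string aabb of L(G) is obtained as follows, showing α at each step:

```latex
\alpha = S
       \Rightarrow aSb   % replace S using S \rightarrow aSb
       \Rightarrow aabb  % replace S using S \rightarrow ab; \alpha now contains only terminals
```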

4. Classify and define Grammars. Which Grammar is best suitable for Programming Languages and why? (5 Marks)

5. How many characters can be represented by the ASCII-8 data format? What is the limitation of the ASCII-7 format? (5 Marks)

The code called ASCII (pronounced "AS-key"), which stands for American Standard Code for Information Interchange, uses 7 bits for each character. Since there are exactly 128 unique combinations of 7 bits, this 7-bit code can represent only 128 characters. A more common version is ASCII-8, also called extended ASCII, which uses 8 bits per character and can represent 256 different characters. For example, the letter A is represented by 01000001. The ASCII representation has been adopted as a standard by the U.S. government and is found in a variety of computers, particularly minicomputers and microcomputers. Note that the byte 01000011 represents the character 'C'.

Ascii7 is a Unicode-to-ASCII conversion module for programmers. It converts any Unicode string to 7-bit ASCII while preserving information. Available as a source code module, Ascii7 is an easy way to support good Unicode-to-ASCII conversion in your own applications. Key features: convert Unicode strings to 7-bit US-ASCII; drop diacritics; remove accents and umlauts; replace special symbols with pure ASCII; convert Cyrillic and Greek letters to their Latin equivalents; get rid of garbage conversion. Available as source code for Visual Basic 6.0, Visual Basic .NET and Visual Basic for Applications; other languages can be arranged on demand.

The problem: today's applications support a large range of Unicode characters, but compatibility often requires the use of 7-bit ASCII, so character values must be forced into the 0 to 127 range. What is the best way to convert Unicode text to ASCII? Programming environments such as Visual Basic and the .NET framework lack support for proper conversion. Even where available, conversion loses some non-ASCII characters and converts them to question marks (?). The result is loss of information and garbage text.

The solution: Ascii7 converts Unicode text to its ASCII representation. Instead of turning non-ASCII characters into garbage, it provides a meaningful conversion. It does this by dropping diacritics from Latin letters and finding the closest ASCII equivalent for a wide range of characters. Where an exact match is not possible, a reasonable equivalent is used, so the text stays as intelligible as possible for a human reader. Suggested uses: enforce ASCII filenames for generated files; produce standards-compliant file formats (common formats requiring 7-bit ASCII include the GIF file comment field, MHT file header lines and email headers). With Ascii7 you convert national characters to an international format that is guaranteed to work everywhere.

6. Compare RISC Architecture with CISC Architecture? What was the necessity to move to RISC architecture? (5 Marks)
The simplest way to examine the advantages and disadvantages of RISC architecture is by contrasting it with its predecessor: CISC (Complex Instruction Set Computers) architecture.

Multiplying Two Numbers in Memory
Consider the storage scheme of a generic computer: the main memory is divided into locations numbered from (row) 1: (column) 1 to (row) 6: (column) 4. The execution unit is responsible for carrying out all computations. However, the execution unit can only operate on data that has been loaded into one of the six registers (A, B, C, D, E, or F). Let's say we want to find the product of two numbers, one stored in location 2:3 and another stored in location 5:2, and then store the product back in location 2:3.

The CISC Approach
The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible. This is achieved by building processor hardware that is capable of understanding and executing a series of operations. For this particular task, a CISC processor would come prepared with a specific instruction (we'll call it "MULT"). When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers can be completed with one instruction:

MULT 2:3, 5:2

MULT is what is known as a "complex instruction." It operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions. It closely resembles a command in a higher level language. For instance, if we let "a" represent the value of 2:3 and "b" represent the value of 5:2, then this command is identical to the C statement "a = a * b." One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly.
Because the length of the code is relatively short, very little RAM is required to store instructions. The emphasis is put on building complex instructions directly into the hardware.

The RISC Approach
RISC processors only use simple instructions that can be executed within one clock cycle. Thus, the "MULT" command described above could be divided into three separate commands: "LOAD," which moves data from the memory bank to a register, "PROD," which finds the product of two operands located within the registers, and "STORE," which moves data from a register to the memory banks. In order to perform the exact series of steps described in the CISC approach, a programmer would need to code four lines of assembly:

LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A

At first, this may seem like a much less efficient way of completing the operation. Because there are more lines of code, more RAM is needed to store the assembly level instructions. The compiler must also perform more work to convert a high-level language statement into code of this form. However, the RISC strategy also brings some very important advantages. Because each instruction requires only one clock cycle to execute, the entire program will execute in approximately the same amount of time as the multi-cycle "MULT" command. These RISC "reduced instructions" require less transistor hardware space than the complex instructions, leaving more room for general purpose registers. Because all of the instructions execute in a uniform amount of time (i.e. one clock), pipelining is possible.

CISC                                              RISC
Emphasis on hardware                              Emphasis on software
Includes multi-clock complex instructions         Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE"              Register-to-register: "LOAD" and "STORE"
incorporated in instructions                      are independent instructions
Small code sizes, high cycles per second          Low cycles per second, large code sizes
Transistors used for storing                      Spends more transistors
complex instructions                              on memory registers

Separating the "LOAD" and "STORE" instructions actually reduces the amount of work that the computer must perform. After a CISC-style "MULT" command is executed, the processor automatically erases the registers. If one of the operands needs to be used for another computation, the processor must re-load the data from the memory bank into a register. In RISC, the operand will remain in the register until another value is loaded in its place.

7. Discuss Addressing Modes of Intel 80X86 with suitable examples. (10 Marks)

ADDRESSING MODES OF 8086

Addressing mode indicates a way of locating data or operands. Depending upon the data types used in the instruction and the memory addressing modes, any instruction may belong to one or more addressing modes, or some instruction may not belong to any of the addressing modes. Thus the addressing modes describe the types of operands and the way they are accessed for executing an instruction. Here, we will present the addressing modes of the instructions depending upon their types.

According to the flow of instruction execution, the instructions may be categorized as (i) sequential control flow instructions and (ii) control transfer instructions. Sequential control flow instructions are the instructions which, after execution, transfer control to the next instruction appearing immediately after it (in the sequence) in the program. For example, the arithmetic, logical, data transfer and processor control instructions are sequential control flow instructions. The control transfer instructions, on the other hand, transfer control to some predefined address somehow specified in the instruction after their execution. For example,

INT, CALL, RET and JMP instructions fall under this category. The addressing modes for sequential control transfer instructions are explained as follows:

1. Immediate: In this type of addressing, immediate data is a part of the instruction, and appears in the form of a successive byte or bytes.
Example: MOV AX, 0005H
In the above example, 0005H is the immediate data. The immediate data may be 8-bit or 16-bit in size.

2. Direct: In the direct addressing mode, a 16-bit memory address (offset) is directly specified in the instruction as a part of it.
Example: MOV AX, [5000H]
Here, data resides in a memory location in the data segment, whose effective address may be computed using 5000H as the offset address and the content of DS as the segment address. The effective address, here, is 10H*DS + 5000H.

3. Register: In register addressing mode, the data is stored in a register and it is referred to using the particular register. All the registers, except IP, may be used in this mode.
Example: MOV BX, AX

4. Register Indirect: Sometimes, the address of the memory location which contains the data or operand is determined in an indirect way, using the offset registers. This mode of addressing is known as register indirect mode. In this addressing mode, the offset address of the data is in either the BX, SI or DI register. The default segment is either DS or ES. The data is supposed to be available at the address pointed to by the content of any of the above registers in the default data segment.
Example: MOV AX, [BX]
Here, data is present in a memory location in DS whose offset address is in BX. The effective address of the data is given as 10H*DS + [BX].

5. Indexed: In this addressing mode, the offset of the operand is stored in one of the index registers. DS and ES are the default segments for index registers SI and DI respectively. This mode is a special case of the register indirect addressing mode discussed above.
Example: MOV AX, [SI]
Here, data is available at an offset address stored in SI in DS. The effective address, in this case, is computed as 10H*DS + [SI].

6. Register Relative: In this addressing mode, the data is available at an effective address formed by adding an 8-bit or 16-bit displacement to the content of any one of the registers BX, BP, SI and DI in the default (either DS or ES) segment. The example given below explains this mode.

Example: MOV AX, 50H[BX]
Here, the effective address is given as 10H*DS + 50H + [BX].

7. Based Indexed: The effective address of the data is formed, in this addressing mode, by adding the content of a base register (BX or BP) to the content of an index register (SI or DI). The default segment register may be ES or DS.
Example: MOV AX, [BX][SI]
Here, BX is the base register and SI is the index register. The effective address is computed as 10H*DS + [BX] + [SI].

8. Relative Based Indexed: The effective address is formed by adding an 8-bit or 16-bit displacement to the sum of the contents of any one of the base registers (BX or BP) and any one of the index registers, in a default segment.
Example: MOV AX, 50H[BX][SI]
Here, 50H is an immediate displacement, BX is a base register and SI is an index register. The effective address of the data is computed as 10H*DS + [BX] + [SI] + 50H.

For the control transfer instructions, the addressing modes depend upon whether the destination location is within the same segment or a different one. It also depends upon the method of passing the destination address to the processor. Basically, there are two addressing modes for the control transfer instructions, viz. inter-segment and intra-segment addressing modes. If the location to which the control is to be transferred lies in a segment other than the current one, the mode is called inter-segment mode. If the destination location lies in the same segment, the mode is called intra-segment.

Addressing modes for control transfer instructions:
- Inter-segment: Inter-segment Direct, Inter-segment Indirect
- Intra-segment: Intra-segment Direct, Intra-segment Indirect

9. Intra-segment Direct Mode: In this mode, the address to which the control is to be transferred lies in the same segment in which the control transfer instruction lies, and appears directly in the instruction as an immediate displacement value. In this addressing mode, the displacement is computed relative to the content of the instruction pointer IP. The effective address to which the control will be transferred is given by the sum of the 8- or 16-bit displacement and the current content of IP. In the case of a jump instruction, if the signed displacement (d) is of 8 bits (i.e. -128 < d < +128), we term it a short jump, and if it is of 16 bits (i.e. -32768 < d < +32768), it is termed a long jump.

10. Intra-segment Indirect Mode: In this mode, the displacement to which the control is to be transferred is in the same segment in which the control transfer instruction lies, but it is passed to the instruction indirectly. Here, the branch address is found as the content of a register or a memory location. This addressing mode may be used in unconditional branch instructions.

11. Inter-segment Direct Mode: In this mode, the address to which the control is to be transferred is in a different segment. This addressing mode provides a means of branching from one code segment to another code segment. Here, the CS and IP of the destination address are specified directly in the instruction.

12. Inter-segment Indirect Mode: In this mode, the address to which the control is to be transferred lies in a different segment and it is passed to the instruction indirectly, i.e. as the contents of a memory block containing four bytes: IP (LSB), IP (MSB), CS (LSB) and CS (MSB), sequentially. The starting address of the memory block may be referred to using any of the addressing modes, except immediate mode.

8086 INSTRUCTION FORMAT
The 8086 instruction sizes vary from one to six bytes.

8. List out the pass-1 data structures and pass-2 data structures. ( 5 Marks)

Two-Pass Assembler

Pass 1 (define symbols):
- Assign addresses to all statements
- Save addresses assigned to all labels
- Process assembler directives (e.g., WORD, RESB, ...)

Pass 2 (assemble instructions and write output files):
- Translate operation codes
- Assemble operands
- Generate data values for BYTE and WORD directives
- Process START and END directives
- Write to object and listing files

Algorithm:

Pass 1:
BEGIN
  initialize Scnt, Locctr, ENDval, and Errorflag to 0
  WHILE Sourceline[Scnt] is a comment
    increment Scnt
  END {while}
  Breakup Sourceline[Scnt]
  IF Opcode = 'START' THEN
    convert Operand from hex and save in Locctr and ENDval
    IF Label not NULL THEN
      Insert (Label, Locctr) into Symtab
    ENDIF
    increment Scnt
    Breakup Sourceline[Scnt]
  ENDIF
  WHILE Opcode <> 'END'
    IF Sourceline[Scnt] is not a comment THEN
      IF Label not NULL THEN
        Xsearch Symtab for Label
        IF not found
          Insert (Label, Locctr) into Symtab
        ELSE
          set errors flag in Errors[Scnt]
        ENDIF
      ENDIF
      Xsearch Opcodetab for Opcode
      IF found THEN
        DO CASE
          1. Opcode is 'RESW' or 'RESB':
             increment Locctr by Storageincr
             IF error THEN set errors flag in Errors[Scnt] ENDIF
          2. Opcode is 'WORD' or 'BYTE':
             increment Locctr by Storageincr
             IF error THEN set errors flag in Errors[Scnt] ENDIF
          3. OTHERWISE:
             increment Locctr by Opcodeincr
             IF error THEN set errors flag in Errors[Scnt] ENDIF
        ENDCASE
      ELSE
        /* directives such as BASE handled here */
        set errors flag in Errors[Scnt]
      ENDIF
    ENDIF
    increment Scnt
    Breakup Sourceline[Scnt]
  END {while}
  IF Label not NULL THEN
    Xsearch Symtab for Label
    IF not found
      Insert (Label, Locctr) into Symtab
    ELSE
      set errors flag in Errors[Scnt]
    ENDIF
  ENDIF
  IF Operand not NULL
    Xsearch Symtab for Operand
    IF found install in ENDval ENDIF
  ENDIF
END {of Pass 1}

Pass 2:
BEGIN
  initialize Scnt, Locctr, Skip, and Errorflag to 0
  write assembler report headings
  WHILE Sourceline[Scnt] is a comment
    append to assembler report
    increment Scnt
  END {while}
  Breakup Sourceline[Scnt]
  IF Opcode = 'START' THEN
    convert Operand from hex and save in Locctr
    append to assembler report
    increment Scnt
    Breakup Sourceline[Scnt]
  ENDIF
  format and place the load point on object code array
  format and place ENDval on object code array, index ENDloc
  WHILE Opcode <> 'END'
    IF Sourceline[Scnt] is not a comment THEN
      Xsearch Opcodetab for Opcode
      IF found THEN
        DO CASE
          1. Opcode is 'RESW' or 'RESB':
             increment Locctr by Storageincr
             place '!' on object code array
             replace the value at index ENDloc with loader address
             format and place Locctr on object code array
             format and place ENDval on object code array, index ENDloc
             set Skip to 1
          2. Opcode is 'WORD' or 'BYTE':
             increment Locctr by Storageincr
             Dostorage to get Objline
             IF error THEN set errors flag in Errors[Scnt] ENDIF
          3. OTHERWISE:
             increment Locctr by Opcodeincr
             Doinstruct to get Objline
             IF error THEN set errors flag in Errors[Scnt] ENDIF
        ENDCASE
      ELSE
        /* directives such as BASE handled here */
        set errors flag in Errors[Scnt]
      ENDIF
    ENDIF
    append to assembler report
    IF Errors[Scnt] <> 0 THEN
      set Errorflag to 1
      append error report to assembler report
    ENDIF
    IF Errorflag = 0 and Skip = 0 THEN
      place Objline on object code array
    ENDIF
    IF Skip = 1 THEN set Skip to 0 ENDIF
    increment Scnt
    Breakup Sourceline[Scnt]
  END {while}
  place '!' on object code array
  IF Errorflag = 0 THEN
    transfer object code array to file
  ENDIF
END {of Pass 2}

Data Structures

1) OPTAB (operation code table)
Contains the mnemonic, machine code (instruction format, length), etc. A static table, organized as an array or hash table, easy to search.
Contents:
- Mnemonic codes of all instructions
- Machine language op code
- Other information (for architectures with more than one length of instruction format): length of instruction, instruction formats
In pass 1:
- Validate op codes
- Compute instruction length (in SIC/XE)
In pass 2:
- Translate op codes to machine language
Organized as a hash table; static (no updates).

2) SYMTAB (symbol table)
Contains label name, value, flag, (type, length), etc. A dynamic table (insert, delete, search); a hash table with non-random keys, so the hashing function matters.
Contents:
- Name of symbol
- Address (value)
- Error flags (from pass 1)
- Other information: attributes of the data or instruction labeled
In pass 1:
- Enter labels as they are encountered in the source program
- Enter addresses from LOCCTR
In pass 2:
- Look up operand labels for addresses
Also organized as a hash table; entries only (no deletions); non-random keys, so watch the hashing function.

3) Location Counter (LOCCTR)
Counted in bytes. Initialized to the address specified in the START directive. The length of each assembled instruction is added to LOCCTR. LOCCTR points to the starting address of each statement in the program.

Intermediate Data (between Passes 1 and 2)
For each statement in the source program:
- Address
- Error flags
- Pointers to OPTAB and SYMTAB
- Other (more complicated languages): results of processing of operation and operand fields

9. Define Macro. Write a C program with a macro to find out biggest of two numbers. ( 5 Marks)
A macro is a fragment of code which has been given a name. Whenever the name is used, it is replaced by the contents of the macro. There are two kinds of macros, which differ mostly in what they look like when they are used: object-like macros resemble data objects when used, while function-like macros resemble function calls. You may define any valid identifier as a macro, even if it is a C keyword; the preprocessor does not know anything about keywords.

#include <stdio.h>

#define Greatest(X, Y) ((X) > (Y) ? (X) : (Y))

int main()
{
    int x, y;
    scanf("%d %d", &x, &y);
    printf("%d\n", Greatest(x, y));
    return 0;
}

Spring 2012

Master of Computer Application (MCA) Semester III MC0073 Systems Programming 4 Credits (Book ID: B0811)
Assignment Set 2 (60 Marks) - Answer All Questions

1. Define Bootstrapping. Distinguish between Software Bootstrapping and Compiler Bootstrapping. (5 Marks)

Bootstrap loading
The discussions of loading up to this point have all presumed that there is already an operating system or at least a program loader resident in the computer to load the program of interest. The chain of programs being loaded by other programs has to start somewhere, so the obvious question is: how is the first program loaded into the computer? In modern computers, the first program the computer runs after a hardware reset is invariably stored in a ROM known as the bootstrap ROM, as in "pulling oneself up by the bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of the system's address space. The bootstrap ROM occupies the top 64K of the address space, and the ROM code then starts up the computer. On IBM-compatible x86 systems, the boot ROM code reads the first block of the floppy disk into memory, or if that fails, the first block of the first hard disk, into memory location zero and jumps to location zero. The program in block zero in turn loads a slightly larger operating system boot program from a known place on the disk into memory, and jumps to that program, which in turn loads in the operating system and starts it. (There can be even more steps, e.g., a boot manager that decides from which disk partition to read the operating system boot program, but the sequence of increasingly capable loaders remains.) Why not just load the operating system directly? Because you can't fit an operating system loader into 512 bytes. The first level loader typically is only able to load a single-segment program from a file with a fixed name in the top-level directory of the boot disk.
The operating system loader contains more sophisticated code that can read and interpret a configuration file, uncompress a compressed operating system executable, and address large amounts of memory (on an x86 the loader usually runs in real mode, which means that it's tricky to address more than 1MB of memory). The full operating system can turn on the virtual memory system, load the drivers it needs, and then proceed to run user-level programs. Many Unix systems use a similar bootstrap process to get user-mode programs running. The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long, into that process. The tiny program executes a system call that runs /etc/init, the user mode initialization program that in turn runs configuration files and starts the daemons and login programs that a running system needs. None of this matters much to the application level programmer, but it becomes more interesting if you want to write programs that run on the bare hardware of the machine, since then you need to arrange to intercept the bootstrap sequence somewhere and run your program rather than the usual operating system. Some systems make this quite easy (just stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for example), others make it nearly impossible. It also presents opportunities for customized systems. For example, a single-application system could be built over a Unix kernel by naming the application /etc/init.

5.2.2 Software Bootstrapping
Bootstrapping can also refer to the development of successively more complex, faster programming environments. The simplest environment will be, perhaps, a very basic text editor (e.g. ed) and an assembler program. Using these tools, one can write a more complex text editor, and a simple compiler for a higher-level language, and so on, until one can have a graphical IDE and an extremely high-level programming language.

5.2.3 Compiler Bootstrapping
In compiler design, a bootstrap or bootstrapping compiler is a compiler that is written in the target language, or a subset of the language, that it compiles. Examples include gcc, GHC, OCaml, BASIC, PL/I and, more recently, the Mono C# compiler.

2. What is the function of the following Intel x86 registers? (5 Marks)

Each register name is really an acronym. This is true even for the "alphabetical" registers AX, BX, CX, and DX. The following list shows the register names and their meanings:

AX - Accumulator Register
BX - Base Register
CX - Counter Register
DX - Data Register
SI - Source Index
DI - Destination Index
BP - Base Pointer
SP - Stack Pointer

Purpose of Intel x86 registers:

AX: All major calculations take place in AX (EAX in 32-bit mode), making it similar to a dedicated accumulator register.
DX: The data register is an extension to the accumulator. It is most useful for storing data related to the accumulator's current calculation.
CX: Like the variable i in high-level languages, the count register is the universal loop counter.
DI: Every loop must store its result somewhere, and the destination index points to that place. With a single-byte STOS instruction to write data out of the accumulator, this register makes data operations much more size-efficient.
SI: In loops that process data, the source index holds the location of the input data stream. Like the destination index, the source index has a convenient one-byte instruction (LODS) for loading data out of memory into the accumulator.
SP: SP is the sacred stack pointer. With the important PUSH, POP, CALL, and RET instructions requiring its value, there is never a good reason to use the stack pointer for anything else.
BP: In functions that store parameters or variables on the stack, the base pointer holds the location of the current stack frame. In other situations, however, BP is a free data-storage register.
BX: In 16-bit mode, the base register is useful as a pointer; otherwise it is completely free for extra storage space.

3. Write the algorithm of Boot Strap Loader. (5 Marks)

4. Identify Lexemes and Tokens in the following statements: ( 5 Marks)

1. A5=B+9;

Tokenized in the following table:

Lexeme    Token Type
A5        Identifier
=         Assignment Operator
B         Identifier
+         Addition Operator
9         Number
;         End of Statement

2. x=(y+z)/10;

Tokenized in the following table:

Lexeme    Token Type
x         Identifier
=         Assignment Operator
(         Opening Parenthesis
y         Identifier
+         Addition Operator
z         Identifier
)         Closing Parenthesis
/         Division Operator
10        Number
;         End of Statement

Tokens are frequently defined by regular expressions, which are understood by a lexical analyzer generator such as lex. The lexical analyzer (either generated automatically by a tool like lex, or hand-crafted) reads in a stream of characters, identifies the lexemes in the stream, and categorizes them into tokens. This is called "tokenizing." If the lexer finds an invalid token, it will report an error. Following tokenizing is parsing. From there, the interpreted data may be loaded into data structures for general use, interpretation, or compiling.

5. Given the following Grammar: (10 Marks)

E -> E + T
E -> T
T -> T * F
T -> F
F -> id

Parse the input string id+id*id by the Bottom-up Parsing or Shift-Reduce Parsing method.

Answer:

Parse Stack    Remaining Input    Parser Action
$              id+id*id$          Shift (push next token from input on stack, advance input)
$id            +id*id$            Reduce: F -> id
$F             +id*id$            Reduce: T -> F
$T             +id*id$            Reduce: E -> T
$E             +id*id$            Shift
$E+            id*id$             Shift
$E+id          *id$               Reduce: F -> id
$E+F           *id$               Reduce: T -> F
$E+T           *id$               Shift
$E+T*          id$                Shift
$E+T*id        $                  Reduce: F -> id
$E+T*F         $                  Reduce: T -> T*F
$E+T           $                  Reduce: E -> E+T
$E             $                  Accept

6. With a neat Block Diagram Explain the Phases of Compiler. ( 10 Marks)


Phases of Compilation:
A compiler is a computer program (or set of programs) that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form

known as object code). The most common reason for wanting to transform source code is to create an executable program. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language or machine code). A program that translates from a low level language to a higher level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source to source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language. A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization. Program faults caused by incorrect compiler behavior can be very difficult to track down and work around and compiler implementors invest a lot of time ensuring the correctness of their software

7. Write short notes on: Compiler-Writing Tools (10 Marks)


Compiler writing tools:
A compiler writer can use specialized tools that generate the analysis phases of a compiler from formal specifications rather than hand-coding them. A well-known example is the Purdue Compiler-Construction Tool Set (PCCTS), a highly integrated lexical-analyser generator and parser generator by Terence J. Parr, Will E. Cohen, and Henry G. Dietz of Purdue University. ANTLR (ANother Tool for Language Recognition) corresponds to YACC, and DLG (DFA-based Lexical analyser Generator) functions like LEX. PCCTS has many additional features which make it easier to use for a wide range of translation problems. PCCTS grammars contain specifications for lexical and syntactic analysis with selective backtracking ("infinite lookahead"), semantic predicates, intermediate-form construction, and error reporting. Rules may employ Extended BNF (EBNF) grammar constructs and may define parameters, return values, and local variables.

Languages described in PCCTS are recognised via LL(k) parsers constructed in pure, human-readable C code. Selective backtracking is available to handle non-LL(k) constructs, and PCCTS parsers may be compiled with a C++ compiler. PCCTS also includes the SORCERER tree-parser generator. The latest version, 1.10, runs under Unix, MS-DOS, OS/2, and Macintosh, and is very portable. If you are thinking of creating your own programming language, writing a compiler or interpreter, adding a scripting facility to your application, or even creating a documentation-parsing facility, tools of this kind are designed to ease the task. These compiler construction kits, parser generators, lexical analyzer (lexer) generators, and code-optimizer generators provide the facility where you define your language and let the tools generate the source code for your software.
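For illustration, the following is the kind of pure, human-readable recursive-descent (LL(1)) parser that such a tool might emit, written here by hand in Python for the expression grammar E -> T {+ T}, T -> F {* F}, F -> id | ( E ); the class and method names are illustrative, not PCCTS output:

```python
# A hand-written recursive-descent (LL(1)) parser: one method per
# nonterminal, with a single token of lookahead driving each choice.
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def eat(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok}, got {self.peek()}")
        self.pos += 1

    def parse_E(self):              # E -> T { + T }
        node = self.parse_T()
        while self.peek() == "+":
            self.eat("+")
            node = ("+", node, self.parse_T())
        return node

    def parse_T(self):              # T -> F { * F }
        node = self.parse_F()
        while self.peek() == "*":
            self.eat("*")
            node = ("*", node, self.parse_F())
        return node

    def parse_F(self):              # F -> id | ( E )
        if self.peek() == "(":
            self.eat("(")
            node = self.parse_E()
            self.eat(")")
            return node
        tok = self.peek()
        if tok in ("+", "*", ")", "$"):
            raise SyntaxError(f"unexpected {tok}")
        self.eat(tok)
        return tok

print(Parser(["id", "+", "id", "*", "id"]).parse_E())
```

Note that the grammar is first rewritten with EBNF repetition to remove the left recursion of E -> E+T, since a top-down parser cannot handle left-recursive rules directly.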

8. Define: Finite State Automaton, Deterministic Finite state Automaton and Non-Deterministic Finite State Automaton with suitable examples. ( 6 Marks)

A deterministic finite automaton (DFA), also known as a deterministic finite state machine, is a finite state machine accepting finite strings of symbols. For each state, there is a transition arrow leading out to a next state for each symbol. Upon reading a symbol, a DFA jumps deterministically from one state to another by following the transition arrow. "Deterministic" means that there is only one outcome (i.e., move to the next state when the symbol matches (S0 -> S1), or stay in the same state (S0 -> S0)). A DFA has a start state (denoted graphically by an arrow coming in from nowhere) where computations begin, and a set of accept states (denoted graphically by a double circle) which help define when a computation is successful. DFAs recognize exactly the set of regular languages, which are, among other things, useful for lexical analysis and pattern matching. A DFA can be used either in an accepting mode, to verify that an input string is indeed part of the language it represents, or in a generating mode, to create a list of all the strings in the language. A DFA is defined as an abstract mathematical concept, but due to its deterministic nature it is implementable in hardware and software for solving various specific problems. For example, a software state machine that decides whether or not online user input such as phone numbers and email addresses is valid can be modeled as a DFA. Another example in hardware is the digital logic circuitry that controls whether an automatic door is open or closed, using input from motion sensors or pressure pads to decide whether or not to perform a state transition.

Formal definition: A deterministic finite automaton M is a 5-tuple (Q, Σ, δ, q0, F), consisting of

- a finite set of states Q
- a finite set of input symbols called the alphabet Σ
- a transition function δ : Q × Σ → Q
- a start state q0 ∈ Q
- a set of accept states F ⊆ Q

Let w = a1a2 ... an be a string over the alphabet Σ. The automaton M accepts the string w if a sequence of states r0, r1, ..., rn exists in Q with the following conditions:
1. r0 = q0
2. ri+1 = δ(ri, ai+1), for i = 0, ..., n−1
3. rn ∈ F

In words, the first condition says that the machine starts in the start state q0. The second condition says that, given each character of string w, the machine will transition from state to state according to the transition function δ. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states; otherwise, the automaton rejects the string. The set of strings M accepts is the language recognized by M, and this language is denoted by L(M). A deterministic finite automaton without accept states and without a starting state is known as a transition system or semiautomaton. For a more comprehensive introduction to the formal definition, see automata theory.

DFAs can be built from nondeterministic finite automata through the powerset construction.
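The powerset (subset) construction can be sketched directly. In the following Python sketch the NFA is assumed ε-free and given as a dictionary of moves; the toy NFA accepting strings over {a, b} that end in "ab", and all names, are illustrative:

```python
def nfa_to_dfa(alphabet, delta, start, accept):
    """Powerset construction: each DFA state is a frozenset of NFA
    states; explore only the subsets reachable from the start set."""
    start_set = frozenset([start])
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        S = todo.pop()
        for a in alphabet:
            # union of NFA moves from every state in S on symbol a
            T = frozenset(q2 for q in S for q2 in delta.get((q, a), set()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T); todo.append(T)
    dfa_accept = {S for S in seen if S & set(accept)}
    return seen, dfa_delta, start_set, dfa_accept

# Toy NFA over {a, b} accepting strings ending in "ab":
delta = {("p", "a"): {"p", "q"}, ("p", "b"): {"p"}, ("q", "b"): {"r"}}
states, dfa_delta, s0, acc = nfa_to_dfa("ab", delta, "p", {"r"})
print(len(states))  # -> 3
```

Only three of the 2^3 possible subsets are reachable here, which is why the construction explores lazily instead of enumerating the full power set.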

An example of a Deterministic Finite Automaton that accepts only binary numbers that are multiples of 3. The state S0 is both the start state and an accept state.

Example The following example is of a DFA M, with a binary alphabet, which requires that the input contains an even number of 0s.

The state diagram for M

M = (Q, Σ, δ, q0, F) where

- Q = {S1, S2}
- Σ = {0, 1}
- q0 = S1
- F = {S1}
- δ is defined by the following state transition table:

        0    1
  S1    S2   S1
  S2    S1   S2

The state S1 represents that there has been an even number of 0s in the input so far, while S2 signifies an odd number. A 1 in the input does not change the state of the automaton. When the input ends, the state will show whether the input contained an even number of 0s or not. If the input did contain an even number of 0s, M will finish in state S1, an accepting state, so the input string will be accepted. The language recognized by M is the regular language given by the regular expression 1*( 0 (1*) 0 (1*) )*, where "*" is the Kleene star, e.g., 1* denotes any nonnegative number (possibly zero) of symbols "1".
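The example M translates almost line for line into code. A minimal Python sketch, written directly from the 5-tuple above:

```python
# The example DFA M (even number of 0s), written from its 5-tuple.
Q = {"S1", "S2"}
Sigma = {"0", "1"}
delta = {("S1", "0"): "S2", ("S1", "1"): "S1",
         ("S2", "0"): "S1", ("S2", "1"): "S2"}
q0, F = "S1", {"S1"}

def accepts(w):
    """Run M on string w; accept iff the final state is in F."""
    state = q0
    for symbol in w:
        state = delta[(state, symbol)]
    return state in F

print(accepts("1010"))   # two 0s  -> True
print(accepts("0111"))   # one 0   -> False
```

Note that the empty string is also accepted, since zero occurrences of 0 is an even number and q0 itself is an accept state.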

A nondeterministic finite automaton (NFA), or nondeterministic finite state machine, is a finite state machine where, from each state and a given input symbol, the automaton may jump into several possible next states. This distinguishes it from the deterministic finite automaton (DFA), where the next possible state is uniquely determined. Although the DFA and NFA have distinct definitions, an NFA can be translated to an equivalent DFA using the powerset construction, i.e., the constructed DFA and the NFA recognize the same formal language. Both types of automata recognize only regular languages. Nondeterministic finite automata were introduced in 1959 by Michael O. Rabin and Dana Scott,[1] who also showed their equivalence to deterministic finite automata. Non-deterministic finite state machines are sometimes studied under the name subshifts of finite type, and are generalized by probabilistic automata, which assign a probability to each state transition.

Formal definition: An NFA is represented formally by a 5-tuple (Q, Σ, δ, q0, F), consisting of

- a finite set of states Q
- a finite set of input symbols Σ
- a transition function δ : Q × Σ → P(Q)
- an initial (or start) state q0 ∈ Q
- a set of states F distinguished as accepting (or final) states, F ⊆ Q

Here, P(Q) denotes the power set of Q. Let w = a1a2 ... an be a word over the alphabet Σ. The automaton M accepts the word w if a sequence of states r0, r1, ..., rn exists in Q with the following conditions:
1. r0 = q0
2. ri+1 ∈ δ(ri, ai+1), for i = 0, ..., n−1
3. rn ∈ F

In words, the first condition says that the machine starts in the start state q0. The second condition says that, given each character of string w, the machine will transition from state to state according to the transition relation δ. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states; otherwise, the automaton rejects the string. The set of strings M accepts is the language recognized by M, and this language is denoted by L(M). For a more comprehensive introduction to the formal definition, see automata theory.
NFA-ε

The NFA-ε (also sometimes called NFA-λ or NFA with epsilon moves) replaces the transition function with one that allows the empty string ε as a possible input, so that one has instead

δ : Q × (Σ ∪ {ε}) → P(Q).

It can be shown that ordinary NFAs and NFA-ε are equivalent: given either one, one can construct the other, and the two recognize the same language.

Example

The state diagram for M

Let M be an NFA-ε, with a binary alphabet, that determines whether the input contains an even number of 0s or an even number of 1s. Note that 0 occurrences is an even number of occurrences as well. In formal notation, let M = ({s0, s1, s2, s3, s4}, {0, 1}, δ, s0, {s1, s3}), where the transition function δ is defined by this state transition table:

        0      1      ε
  S0    {}     {}     {S1, S3}
  S1    {S2}   {S1}   {}
  S2    {S1}   {S2}   {}
  S3    {S3}   {S4}   {}
  S4    {S4}   {S3}   {}

M can be viewed as the union of two DFAs: one with states {S1, S2} and the other with states {S3, S4}. The language of M can be described by the regular language given by the regular expression (1*(01*01*)*) ∪ (0*(10*10*)*). We defined M using ε-moves, but M can also be defined without using ε-moves.
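The acceptance test for this NFA-ε can be sketched by computing the ε-closure of the start state and then simulating sets of states (writing "eps" for ε; the dictionary encoding and function names are illustrative):

```python
# Simulation of the example NFA-eps M: take the epsilon-closure of the
# start state, then track the set of all reachable states per symbol,
# accepting if any reachable state is an accept state.
delta = {
    ("s0", "eps"): {"s1", "s3"},
    ("s1", "0"): {"s2"}, ("s1", "1"): {"s1"},
    ("s2", "0"): {"s1"}, ("s2", "1"): {"s2"},
    ("s3", "0"): {"s3"}, ("s3", "1"): {"s4"},
    ("s4", "0"): {"s4"}, ("s4", "1"): {"s3"},
}
accept = {"s1", "s3"}

def eps_closure(states):
    """All states reachable from `states` via any number of eps moves."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for q2 in delta.get((q, "eps"), set()):
            if q2 not in closure:
                closure.add(q2); stack.append(q2)
    return closure

def accepts(w):
    current = eps_closure({"s0"})
    for symbol in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = eps_closure(moved)
    return bool(current & accept)

print(accepts("1010"))  # even 0s and even 1s -> True
print(accepts("01"))    # odd 0s and odd 1s   -> False
```

This set-of-states simulation is the powerset construction performed on the fly, which is why it runs in time linear in the input length rather than exploring nondeterministic branches separately.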

9. Bring out the Basic functions of Loader. ( 4 Marks)
