Você está na página 1de 26

COMPILER DESIGN Rohit Goswami-160850107027

Practical : 1

Aim: To study about LEX, YACC and FLEX

LEX a Lexer Generator

LEX is officially known as a "Lexical Analyzer". Its main job is to break up an input stream
into more usable elements. Or in, other words, to identify the "interesting bits" in a text file. For
example, if you are writing a compiler for the C programming language, the symbols { } ( ) ; all have
significance on their own. The letter a usually appears as part of a keyword or variable name, and is
not interesting on its own. Instead, we are interested in the whole word. Spaces and newlines are
completely uninteresting, and we want to ignore them completely, unless they appear within quotes
"like this". All of these things are handled by the Lexical Analyzer.

During the first phase the compiler reads the input and converts strings in the source to tokens.
With regular expressions we can specify patterns to lex so it can generate a code that will allow it to
scan and match strings in the input. Each pattern specified in the input to lex has an associated action.
Typically an action returns a token that represents the matched string for subsequent use by the use
the parser. Initially we will simply print the matched string rather than return a token value.

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 1


COMPILER DESIGN Rohit Goswami-160850107027

The following represents a simple pattern,composed of a regular expression, that scan for
identifiers. Lex will read this pattern and procedure C code for a lexical analyzer that scans for
identifiers.

letter(letter | digit)*

This pattern matches a string of characters that begins with a single letter followed by zero
or more letters or digits.
Form of a LEX specification file:

Definitions
%%
Rules
%%
User code
The definitions section: "name definition"
The rules section: "pattern action"
The user code section: "yylex() routine"

Metacharacter Matches
. Any character except new line.
\n New Line
Zero or more copies of the preceding
* expression
One or more copies of the preceding
+ expression
? Zero or one copy of the preceding expression
^ Beginning of line.
$ End of line
a|b a or b
( ab )+ One or more copies of ab (grouping)

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 2


COMPILER DESIGN Rohit Goswami-160850107027

Name Function
intyylex(void) Call to invoke lexer, returns token
char*yytext Pointer to matched string
Yyleng Length of matched string
Yylval Value associated with token
Return 1 if done, return 0 if more
processing is required. Called by LEX
intyywrap(void) when input is exhausted
FILE*yyout Output file
FILE*yyin Input file
INITIAL Initial start condition
BEGIN Condition switch start condition

ECHO Write matched string

Yacc a Parser Generator


Yacc is officially known as a "parser". Its job is to analyze the structure of the input stream.
In the course of its normal work, the parser also verifies that the input is syntactically sound.
Consider again the example of a C-compiler. In the C-language, a word can be a function name or
a variable, depending on whether it is followed by a ( or a = There should be exactly one } for each
{ in the program. YACC stands for "Yet Another Compiler Compiler. For example, a C program
may contain something like:
{
intint;
int = 33;
printf("int: %d\n",int);
}

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 3


COMPILER DESIGN Rohit Goswami-160850107027

In this case, the lexical analyzer would have broken the input stream into a series of "tokens", like
this:
{
int
int
;
int
=
33
;
printf
(
"int: %d\n"
,
int
)
;
}
Note that the lexical analyzer has already determined that where the keyword int appears
within quotes, it is really just part of a literal string. It is up to the parser to decide if the token int
is being used as a keyword or variable. Or it may choose to reject the use of the name int as a
variable name. The parser also ensures that each statement ends with a ; and that the brackets
balance.

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 4


COMPILER DESIGN Rohit Goswami-160850107027

Form of a yacc specification file:

Declaration
%%
Grammar rules
%%
Programs

Yacc : Grammar Rules

Terminals (tokens) :

 Names must be declared:


 %token name 1 name 2 …
 Any name not declared as a token in the declarations section is assumed to be a nonterminal.

Start Symbol :

 may be declared, via : %start name


 if not declared explicitly, defaults to the nonterminal on the LHS of the first grammar rule
listed.
Production :

A grammar production A->B1|B2|Bn is written as :

A : B1|B2|Bn
Example:
stmt : KEYWORD_IF(' expr ') stmt;

Communication between Scanner and Parser :

Pattern in bellow figure is a file you create with a text editor. Lex will read your patterns
and generate C code for a lexical analyzer or scanner. The lexical analyzer matches strings in input,
based on your patterns, and converts the strings to tokens. Tokens are numerical representations
of strings, and simplify processing.

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 5


COMPILER DESIGN Rohit Goswami-160850107027

When the lexical analyzer finds identifiers in the input stream it enters them in a symbol
table. The symbol table may also contain other information such as data type (integer to real) and
location of each variable in memory. All subsequent references to identifiers refer to the
appropriate symbol table index.

The grammar in the below diagram is a text file you create with a text editor. Yacc will read your
grammar and generate C code for a syntax analyzer or parser. The syntax analyzer uses grammar rules that
allow it to analyze tokens from the lexical analyzer and create a syntax tree.

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 6


COMPILER DESIGN Rohit Goswami-160850107027

Previous figure illustrates the file naming conventions used by lex and yacc. We'll assume our goal
is to write a BASIC compiler. First, we need to specify all pattern matching rules for lex(bas.l)
and grammar rules for yacc (bas.y). Commands to create our compiler, bas.exe, are listedbelow:

yacc -d bas.y # create y.tab.h, y.tab.c

lexbas.l # create lex.yy.c

cc lex.yy.cy.tab.c -obas.exe # compile/link

Yacc reads the grammar descriptions in bas.y and generates a syntax analyzer (parser), that
includes function yyparse, in file y.tab.c. Included in file bas.y are token declarations. The -d
option causes yacc to generate definitions for tokens and place them in file y.tab.h. Lex reads the
pattern descriptions in bas.l, includes file y.tab.h, and generates a lexical analyzer, that includes
function yylex, in file lex.yy.c.

Finally, the lexer and parser are compiled and linked together to create executable bas.exe.
From main we call yyparse to run the compiler. Function yyparse automatically calls yylex to
obtain each token.
FLEX
FLEX is an open source version of LEX. Flex is an open source program designed to
automatically and quickly generate scanners, also known as tokenizers, which recognize lexical
patterns in text. Flex is an acronym that stands for "fast lexical analyzer generator. It is a free
alternative to Lex, the standard lexical analyzer generator in Unix-based systems. Flex was
originally written in the C programming language by Vern Paxson in 1987. Lex is proprietary but
versions based on the original code are available as open source. FLEX (Fast LEXical analyzer
generator) is a tool for generating scanners. First, FLEX reads a specification of a scanner either
from an input file *.lex, or from standard input, and it generates as output a C source file lex.yy.c.
Then, lex.yy.c is compiled and linked with the "-lfl" library to produce an executable a.out. Finally,
a.out analyzes its input stream and transforms it into a sequence of tokens.

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 7


COMPILER DESIGN Rohit Goswami-160850107027

Practical : 2
Aim: Lex program to count the number of comment and eliminate them from
output.

Lex File:-
%{
int count=0;
%}
%%
“//” {count++;}
“*/”((.)*(\n)*)+”*/” {count++;}
%%
int main()
{
yyin=fopen(“pr2.c”,”r”);
yylex();
printf(“comments = %d\n”,count);
return 0;
}

C FILE:
//something missing
Hello world

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 8


COMPILER DESIGN Rohit Goswami-160850107027

Output :-

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 9


COMPILER DESIGN Rohit Goswami-160850107027

Practical : 3
Aim: LEX program to count no of Lines, Words, Letters and special
characters from given input.
%{
intlines,words,letters,special;
%}
%%
[\n] {lines++,words++;}
[a-zA-Z] {letters++;}
[ ]+ {words++;}
. {special++;}
%%
int main()
{
yyin=fopen(“p4.txt”,”r”);
yylex();
printf(“Number of lines: %d\n”,lines);
printf(“Number of words: %d\n”,words);
printf(“Number of letters: %d\n”,letters);
printf(“Number of special characters: %d\n”,special);
return 0;
}

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 10


COMPILER DESIGN Rohit Goswami-160850107027

Output:-

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 11


COMPILER DESIGN Rohit Goswami-160850107027

Practical : 4
Aim: C program to check given input is identifier, keywords, operators or
digits.
#include<stdio.h>
#include<ctype.h>
#include<string.h>
int main()
{
char n[10],c;
printf(“Enter an input:\n);
scanf(“%s”,n);
c=n[0];
if(isalpha(c))
{
if(strcmp(n,”do”)==0 || strcmp(n,”then”)==0 || strcmp(n,”if”)==0)
{
printf(“Given input is a keyword”);
}
else
{
printf(“Given input is an identifier”);
}
}
else if(strcmp(n,”+”)==0 || strcmp(n,”-“)==0)
{
printf(“Given input is an operator”);
}
else if(isdigit(c))
{
printf(“Given input is a digit”);
}
else
{

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 12


COMPILER DESIGN Rohit Goswami-160850107027

printf(“INVALID INPUT”);
}
return 0;
}

Output:-

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 13


COMPILER DESIGN Rohit Goswami-160850107027

Practical : 5
Aim: C program to remove Left recursion from given grammar.

#include<stdio.h>
#include<string.h>
#define SIZE 10
int main () {
char non_terminal;
char beta,alpha;
int num;
char production[10][SIZE];
int index=3;

printf("Enter Number of Production : ");


scanf("%d",&num);
printf("Enter the grammar :-\n");
for(int i=0;i<num;i++){
scanf("%s",production[i]);
}
for(int i=0;i<num;i++){
printf("\n Given GRAMMAR %s",production[i]);
non_terminal=production[i][0];
if(non_terminal==production[i][index]) {
alpha=production[i][index+1];
printf(" is left recursive.\n");
while(production[i][index]!=0 && production[i][index]!='|')
index++;
if(production[i][index]!=0) {
beta=production[i][index+1];

printf("%c->%c%c\'",non_terminal,beta,non_terminal);
printf("\n%c\'->%c%c\'|E\n",non_terminal,alpha,non_terminal);
}
else
printf(" can't be reduced\n");
}
else
printf(" is not left recursive.\n");
index=3;
}
}

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 14


COMPILER DESIGN Rohit Goswami-160850107027

Output:-

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 15


COMPILER DESIGN Rohit Goswami-160850107027

Practical : 6
Aim: C program to left factor the given grammar

#include<stdio.h>
#include<string.h>
#include<ctype.h>

int main()
{
char a[30],alpha[20],bita1[10],bita2[10],m[30],n[30];
inti,j,k,l,y,count=3,count1=0;
printf(“Enter a production: “);
scanf(“%s”,a);

for(i=3;a[i]!=’|’;i++)
{
count++;
}

for(j=3,i=0;j<count;j++,i++)
{
n[i]=a[j];
}
n[i]=’\0’;

for(i=count+1,j=0;a[i]!=’\0’;i++,j++)
{
m[j]=a[i];
}
m[j]=’\0’;

for(i=0,k=0;n[i]==m[i];i++,k++)
{

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 16


COMPILER DESIGN Rohit Goswami-160850107027

alpha[k]=m[i];
count1++;
}

alpha[k]=’\0’;

for(i=count1,l=0;n[i]!=’\0’;i++,l++)
{
bita1[l]=n[i];
}
bita1[l]=’\0’;

for(i=count1,y=0;m[i]!=’\0’;i++,y++)
{
bita2[y]=m[i];
}

bita2[y]=’\0’;
printf(“%c -> %s%c’”,a[0],alpha,a[0]);
printf(“\n%c’ -> %s|%s”,a[0],bita1,bita2);

return 0;

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 17


COMPILER DESIGN Rohit Goswami-160850107027

Output:-

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 18


COMPILER DESIGN Rohit Goswami-160850107027

Practical : 7
Aim: YACC program to check valid identifier.

Lex file:-
%{
#include “y.tab.h”
%}
%%
[a-zA-Z]+ {return LETTER;}
[0-9]+ {return DIGIT;}
%%

Yacc file:-
%{
#include<stdio.h>
%}
%token LETTER DIGIT
%%
Variable: LETTER|LETTER rest;
rest: LETTER rest|DIGITrest|LETTER|DIGIT;
%%
int main()
{
printf(“Enter your identifier\n”);
yyparse();
printf(“Valid identifier\n”);
return 0;
}
intyyerror(char *s)
{
printf(“Invalid identifier\n”);
exit(0);
}

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 19


COMPILER DESIGN Rohit Goswami-160850107027

Output:-

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 20


COMPILER DESIGN Rohit Goswami-160850107027

Practical : 8
Aim: YACC file to check validity of arithmetic expression.
Lex File:-
%{
#include “y.tab.h”
%}
%%
[a-zA-Z]+ {return LETTER;}
[0-9]+ {return DIGIT;}
[+|-|*|/] {return OP;}
. {returnyytext[0];}
%%
YACC File:-
%{
#include<stdio.h>
%}
%token LETTER DIGIT OP
%%
variable: LETTER rest LETTER|DIGIT rest DIGIT;
rest: OP;
%%
int main()
{
printf(“Enter an Arithmetic Expression\n”);
yyparse();
printf(“Valid\n”);
return 0;
}
intyyerror(char *s)
{
printf(“Invalid\n”);
exit(0);
}

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 21


COMPILER DESIGN Rohit Goswami-160850107027

Output:-

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 22


COMPILER DESIGN Rohit Goswami-160850107027

Practical : 9
Aim: YACC program to check given string belongs to given
grammar or not.
Lex file:
%{
#include “y.tab.h”
%}
%%
[a] {return FIRST;}
[b] {return SECOND;}
%%
Yacc File:
%{
#include<stdio.h>
%}
%%
Variable: FIRST Variable SECOND|FIRST SECOND|
;
%%
int main()
{
printf(“Enter a String\n”);
yyparse();
printf(“Above String belongs to grammar\n”);
return 0;
}
intyyerror(char *s)
{
printf(“Above String does not belongs to grammar\n”);
exit(0);
}

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 23


COMPILER DESIGN Rohit Goswami-160850107027

Output:-

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 24


COMPILER DESIGN Rohit Goswami-160850107027

Practical : 10
Aim: YACC program to implement Desk Calculator
Lex File:-
%{
#include “y.tab.h”
#include<stdio.h>
%}
%%
[0-9]+ {yylval=atoi(yytext); return DIGIT;}
. {returnyytext[0];}
%%
Yacc File:-
%{
#include<stdio.h>
#include<stdlib.h>
%}
%token DIGIT
%left ‘+’’-‘’*’’/’’%’
%%
Variable: expr{printf(“Answer:%d\n”,$$);};
expr: DIGIT|expr’+’expr{$$=$1+$3;}|expr’-‘expr{$$=$1-
$3;}|expr’*’expr{$$=$1*$3;}|expr’/’expr{$$=$1/$3;}|expr’%’expr{$$=$1%$3;};
%%
int main()
{ printf(“Enter Digit\n”);
yyparse();
return 0;
}
intyyerror(char *s)
{
printf(“INVALID INPUT\n”);
exit(0);
}

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 25


COMPILER DESIGN Rohit Goswami-160850107027

Output:-

HJD INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH 26

Você também pode gostar