Lex is a tool for writing lexical analyzers. Yacc is a tool for constructing parsers.
Example
Cool program text: if x = y then 1 else 2 fi
Parser input (tokens): IF ID = ID THEN INT ELSE INT FI
Parser output (tree): IF-THEN-ELSE with three children: = (over ID and ID), INT, INT
Slide credit: Wes Weimer
The parser must distinguish between valid and invalid sequences of tokens
We need context free grammars.
ABSTRACT
Computer program input generally has some structure; in fact, every computer program which does input can be thought of as defining an input language which it accepts. The input languages may be as complex as a programming language, or as simple as a sequence of numbers. Unfortunately, standard input facilities are restricted, difficult to use and change, and do not completely check their inputs for validity.
Yacc provides a general tool for controlling the input to a computer program. The Yacc user describes the structures of his input, together with code which is to be invoked when each such structure is recognized. Yacc turns such a specification into a subroutine which may be invoked to handle the input process; frequently, it is convenient and appropriate to have most of the flow of control in the user's application handled by this subroutine.
The input subroutine produced by Yacc calls a user supplied routine to return the next basic input item. Thus, the user can specify his input in terms of individual input characters, or, if he wishes, in terms of higher level constructs such as names and numbers. The user supplied routine may also handle idiomatic features such as comment and continuation conventions, which typically defy easy specification.
Yacc is written in C[7], and runs under UNIX. The subroutine which is output may be in C or in Ratfor[4], at the user's choice; Ratfor permits translation of the output subroutine into portable Fortran[5]. The class of specifications accepted is a very general one, called LALR(1) grammars with disambiguating rules. The theory behind Yacc has been described elsewhere[1,2,3].
Yacc was originally designed to help produce the front end of compilers; in addition to this use, it has been successfully used in many application programs, including a phototypesetter language, a document retrieval system, a Fortran debugging system, and the Ratfor compiler.
1970
Yacc
A Yacc specification describes a context-free grammar (CFG) that can be used to generate a parser.
Elements of a CFG:
1. Terminals: tokens and literal characters
2. Variables (nonterminals): syntactical elements
3. Production rules
4. Start rule
Yacc
Format of a production rule:
symbol: definition {action} ;
Example: A → Bc is written in yacc as
a: b 'c';
Yacc Format
Format of a yacc specification file:
declarations
%%
grammar rules and associated actions
%%
C programs
Declarations
To define tokens and their characteristics:
%token: declare names of tokens
%left: define left-associative operators
%right: define right-associative operators
%nonassoc: define operators that may not associate with themselves
%type: declare the type of variables
Declarations
%union: declare multiple data types for semantic values
%start: declare the start symbol (default is the first variable in rules)
%prec: assign precedence to a rule
%{ C declarations directly copied to the resulting C program %} (e.g., variables, types, macros)
Printing messages
If the input stream does not match start, the default message "syntax error" is printed and the program terminates.
However, customized error messages can be generated.
/* anbn1.y */
%token A B
%%
start : anbn '\n' { printf(" is in anbn\n"); return 0; }
      ;
anbn  : A B
      | A anbn B
      ;
%%
#include "lex.yy.c"
/* called by the parser with the message to report */
int yyerror(char *s)
{
    printf("%s, it is not in anbn\n", s);
    return 0;
}
Example Output
$ anbn
aabb
 is in anbn
$ anbn
acadbefbg
Syntax error, it is not in anbn
Positional assignment of values for items
$$: left-hand side
$1: first item in the right-hand side
$n: nth item in the right-hand side
Example continued
$ print-int
7
=7
007
=7
zippy
syntax error
Reenter: _
Recursive Rules
Although right-recursive rules can be used in yacc, left-recursive rules are preferred, and, in general, generate more efficient parsers.
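One way to see why left recursion is preferred is to count how many symbols sit on the parse stack at once. The sketch below is a hand-written model, not output from yacc: it assumes a list of n ITEM tokens (a hypothetical token name) and the two rule shapes named in the comments, and it simply counts shifts and reductions. With the left-recursive rules each item is reduced as soon as it is shifted; with the right-recursive rules every item must be shifted before the first reduction can happen.

```c
#include <assert.h>

/* Model of the maximum parse-stack depth (in symbols) for n ITEM tokens
 * under the left-recursive rules
 *     list : ITEM | list ITEM ;
 * This is a simplified count, not a measurement of a generated parser. */
int max_depth_left(int n)
{
    int depth = 0, max = 0, i;

    assert(n >= 1);
    for (i = 0; i < n; i++) {
        depth++;                    /* shift ITEM */
        if (depth > max)
            max = depth;
        if (i > 0)
            depth--;                /* reduce list : list ITEM (2 symbols -> 1) */
        /* for i == 0 the reduction is list : ITEM (1 symbol -> 1) */
    }
    return max;
}

/* Same model for the right-recursive rules
 *     list : ITEM | ITEM list ;
 * Nothing can be reduced until the last ITEM has been shifted. */
int max_depth_right(int n)
{
    int depth = 0, max = 0, i;

    assert(n >= 1);
    for (i = 0; i < n; i++) {
        depth++;                    /* shift ITEM */
        if (depth > max)
            max = depth;
    }
    /* reduce list : ITEM once, then list : ITEM list (2 -> 1) n-1 times */
    for (i = 1; i < n; i++)
        depth--;
    return max;
}
```

In this model max_depth_left(n) never exceeds 2, while max_depth_right(n) grows with n, so a long right-recursive list can exhaust a fixed-size parser stack.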
yylval
The yylex() function returns an integer, the token number, representing the kind of token read. If there is a value associated with that token, it should be assigned to the external variable yylval.
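The contract described here can be sketched with a small hand-written scanner. This is only an illustration under assumptions: the token number NUMBER, the input buffer, and the helper set_input() are hypothetical names introduced for the example, and yylval is declared here by hand although a generated parser normally defines it.

```c
#include <assert.h>
#include <ctype.h>

#define NUMBER 258      /* hypothetical token number; yacc normally assigns these */

int yylval;             /* value of the most recent token (int by default) */

static const char *input = "";   /* hypothetical input buffer for this sketch */

/* Point the scanner at a new string (helper invented for the example). */
void set_input(const char *s)
{
    assert(s != 0);
    input = s;
}

/* Return the token number of the next token; store its value in yylval. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t')
        input++;                            /* skip blanks */

    if (*input == '\0')
        return 0;                           /* 0 tells the parser the input has ended */

    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;                      /* kind of token; its value is in yylval */
    }

    return (unsigned char)*input++;         /* a literal character is its own token number */
}
```

After set_input("0012 +"), successive calls return NUMBER (with yylval set to 12), then '+', then 0.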
yylval
The type of yylval is int by default. To change the type of yylval, define the macro YYSTYPE in the declarations section of a yacc specification file.
%{
#define YYSTYPE double
%}
If there is more than one data type for token values, yylval is declared as a union.
yylval
Example with three possible types for yylval:
%union{
    double real;      /* real value */
    int integer;      /* integer value */
    char str[30];     /* string value */
}
Example:
yytext = 0012, type of yylval: int, value of yylval: 12
yytext = +1.70, type of yylval: double (member real), value of yylval: 1.7
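The two yytext examples can be reproduced with a short C sketch. It assumes the union above is available as a YYSTYPE typedef, and it invents the token numbers INTEGER, REAL and STRING and the helper classify() purely for illustration; a real scanner generated by lex would normally decide the token kind from its patterns instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hand-written stand-in for the type that %union would generate. */
typedef union {
    double real;        /* real value */
    int integer;        /* integer value */
    char str[30];       /* string value */
} YYSTYPE;

enum { INTEGER = 258, REAL, STRING };   /* hypothetical token numbers */

YYSTYPE yylval;

/* Store the value of one lexeme in the matching union member and
 * return the token number that tells the parser which member is valid. */
int classify(const char *yytext)
{
    char *end;
    long i;
    double d;

    assert(yytext != NULL);

    i = strtol(yytext, &end, 10);
    if (end != yytext && *end == '\0') {    /* whole lexeme is an integer */
        yylval.integer = (int)i;
        return INTEGER;
    }

    d = strtod(yytext, &end);
    if (end != yytext && *end == '\0') {    /* whole lexeme is a real number */
        yylval.real = d;
        return REAL;
    }

    /* anything else is kept as text, truncated to fit the 30-byte member */
    strncpy(yylval.str, yytext, sizeof yylval.str - 1);
    yylval.str[sizeof yylval.str - 1] = '\0';
    return STRING;
}
```

Only the member that matches the returned token number holds a meaningful value, which is why each token is tied to one member of the union.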
Token types
The type of the values associated with tokens can be specified by %token as
%token <real> REAL
%token <integer> INTEGER
%token <str> IDENTIFIER STRING
The type of variables can be defined by %type as
%type <real> real_expr
%type <integer> integer_expr
Operator Precedence
All of the tokens on the same line are assumed to have the same precedence level and associativity; the lines are listed in order of increasing precedence or binding strength.
%left '+' '-'
%left '*' '/'
describes the precedence and associativity of the four arithmetic operators. Plus and minus are left associative, and have lower precedence than star and slash, which are also left associative.
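The effect of these two declarations on evaluation order can be checked with a small hand-written evaluator. Note that this sketch uses precedence climbing in a recursive-descent style, which is a different technique from the LALR(1) tables yacc builds; it only mirrors the same precedence levels and left associativity. The names prec(), expr() and eval() are invented for the example, and the input is assumed to contain unsigned integers and the four operators with no blanks.

```c
#include <assert.h>
#include <ctype.h>

static const char *p;       /* cursor into the expression being evaluated */

/* Precedence levels in the same order as the %left lines: 1 is lowest. */
static int prec(char op)
{
    if (op == '+' || op == '-') return 1;
    if (op == '*' || op == '/') return 2;
    return 0;               /* not an operator */
}

static int number(void)
{
    int v = 0;

    assert(isdigit((unsigned char)*p));
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

/* Parse and evaluate operators whose precedence is at least min_prec. */
static int expr(int min_prec)
{
    int lhs = number();

    while (prec(*p) >= min_prec) {
        char op = *p++;
        /* asking for strictly higher precedence on the right makes the
         * operator left associative: 8-3-2 groups as (8-3)-2 */
        int rhs = expr(prec(op) + 1);

        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': assert(rhs != 0); lhs /= rhs; break;
        }
    }
    return lhs;
}

int eval(const char *s)
{
    p = s;
    return expr(1);
}
```

With these levels, eval("2+3*4") multiplies first, and eval("8-3-2") subtracts from left to right.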
lookahead token
being processed or left alone
Shift action
<lookahead_token> shift <state>
e.g. IF shift 34
When the lookahead token is IF:
push down the current state on the stack
put state 34 onto the stack and make it the current state
clear the lookahead symbol
Reduce Action
When the parser has seen the right-hand side of a grammar rule and is prepared to announce that it has seen an instance of the rule, it replaces the right-hand side by the left-hand side.
Example: . reduce 18 => means reduce by grammar rule 18
Example: A : x y z; => pop the top three states off the stack (one for each symbol on the RHS), then perform the goto on A
Example: A goto 20 => causes state 20 to be pushed onto the stack and become the current state
Example
%token DING DONG DELL
%%
rhyme : sound place ;
sound : DING DONG ;
place : DELL ;

$ yacc -v filename.y
produces a file named y.output, a human-readable description of the parser.
Example
state 0
    $accept : _rhyme $end
    DING shift 3
    . error
    rhyme goto 1
    sound goto 2
state 1
    $accept : rhyme_$end
    $end accept
    . error
state 2
    rhyme : sound_place
    DELL shift 5
    . error
    place goto 4
state 3
    sound : DING_DONG
    DONG shift 6
    . error
state 4
    rhyme : sound place_ (1)
    . reduce 1
state 5
    place : DELL_ (3)
    . reduce 3
state 6
    sound : DING DONG_ (2)
    . reduce 2
Current stack : 0 3 6
In state 6 the only action is reduce 2 (sound : DING DONG)
There are two symbols on the right
Pop off two states from the stack
Uncover state 0
Current stack : 0
In state 0, look for a goto on sound
Push state 2 onto the stack
State 2 becomes the current state
Current stack : 0 2
The next token is DELL
The action in state 2 on token DELL is shift 5
Push state 5 onto the stack
Make it the current state
Clear the lookahead symbol
Current stack : 0 2 5
In state 5 the only action is reduce 3 (place : DELL)
There is one symbol on the right
Pop off one state from the stack
Uncover state 2
Current stack : 0 2
In state 2, the goto on place pushes state 4 onto the stack
Current stack : 0 2 4
In state 4 the only action is reduce 1
There are two symbols on the right
Pop off two states from the stack
Uncover state 0
Current stack : 0
In state 0, the goto on rhyme causes the parser to enter state 1
Push state 1 onto the stack
Make state 1 the current state
Current stack : 0 1
In state 1 the input is read and the endmarker ($end) is obtained
The action is accept
The parse ends successfully
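The walk-through can be replayed with a small table-driven driver written by hand from the y.output listing. This is a sketch under assumptions, not the code yacc generates: the enum values, the table encoding (positive = shift, negative = reduce), and the function names are all invented for illustration, and END stands in for $end.

```c
#include <assert.h>
#include <stdio.h>

enum { END, DING, DONG, DELL, NTOK };   /* terminals; END plays the role of $end */
enum { RHYME, SOUND, PLACE, NNT };      /* nonterminals */
enum { ERR = 0, ACC = 1000 };           /* error and accept markers */
#define SHIFT(s)  (s)                   /* positive entry: shift and go to state s */
#define REDUCE(r) (-(r))                /* negative entry: reduce by rule r */

static const int action[7][NTOK] = {
    /*          END        DING       DONG       DELL     */
    /* 0 */ { ERR,       SHIFT(3),  ERR,       ERR       },
    /* 1 */ { ACC,       ERR,       ERR,       ERR       },
    /* 2 */ { ERR,       ERR,       ERR,       SHIFT(5)  },
    /* 3 */ { ERR,       ERR,       SHIFT(6),  ERR       },
    /* 4 */ { REDUCE(1), REDUCE(1), REDUCE(1), REDUCE(1) },
    /* 5 */ { REDUCE(3), REDUCE(3), REDUCE(3), REDUCE(3) },
    /* 6 */ { REDUCE(2), REDUCE(2), REDUCE(2), REDUCE(2) },
};

static const int go_to[7][NNT] = {
    /* 0 */ { 1, 2, 0 },                /* rhyme goto 1, sound goto 2 */
    /* 1 */ { 0, 0, 0 },
    /* 2 */ { 0, 0, 4 },                /* place goto 4 */
};

static const int rule_len[4] = { 0, 2, 2, 1 };             /* symbols on each RHS */
static const int rule_lhs[4] = { 0, RHYME, SOUND, PLACE }; /* LHS of rules 1..3 */

static void print_stack(const int *stack, int top)
{
    int i;

    printf("Current stack :");
    for (i = 0; i <= top; i++)
        printf(" %d", stack[i]);
    printf("\n");
}

/* tokens must end with END; returns 1 on accept, 0 on syntax error */
int parse(const int *tokens)
{
    int stack[16], top = 0, pos = 0;

    stack[0] = 0;                                   /* start in state 0 */
    for (;;) {
        int tok = tokens[pos], act;

        assert(tok >= 0 && tok < NTOK);
        act = action[stack[top]][tok];
        if (act == ACC)
            return 1;
        if (act == ERR)
            return 0;
        if (act > 0) {                              /* shift */
            assert(top + 1 < 16);
            stack[++top] = act;                     /* push the new state */
            pos++;                                  /* clear the lookahead */
        } else {                                    /* reduce */
            int r = -act, exposed;

            top -= rule_len[r];                     /* pop one state per RHS symbol */
            assert(top >= 0);
            exposed = stack[top];                   /* the uncovered state */
            stack[++top] = go_to[exposed][rule_lhs[r]];
        }
        print_stack(stack, top);
    }
}
```

Calling parse() on the token sequence DING, DONG, DELL, END passes through the same stacks as the slides (0 3 6, then 0 2, 0 2 5, 0 2 4 and finally 0 1) before accepting.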
Pointer Model
A pointer moves (right) on the RHS of a rule while input tokens and variables are processed
%token A B C
%%
start : A B C
/* after reading A: start : A ^ B C   (^ marks the pointer) */
When all elements on the RHS are processed (the pointer reaches the end of a rule), the rule is reduced. If a rule reduces, the pointer returns to the rule from which it was called.
Conflicts
There is a conflict if a rule is reduced when there is more than one pointer. Conflicts are yacc's way of detecting ambiguities. Yacc detects conflicts when it is attempting to build the parser. It looks one token ahead to see whether the number of pointers reduces to one before declaring a conflict.
Conflicts
Example:
%token A B C D E F
%%
start : x | y;
x : A B C D;
y : A B E F;
After tokens A and B, either one of the pointers or both will disappear. If the next token is E, the first will disappear; if the next token is C, the second will disappear. If the next token is anything other than C or E, both will disappear. Therefore, there is no conflict.
Conflicts
The other way for pointers to disappear is for them to merge in a common subrule.
Example:
%token A B C D E F
%%
start : x | y;
x : A B z E;
y : A B z F;
z : C D;
Conflicts
Initially there are two pointers. After reading A and B these two pointers remain. Then the two pointers merge in the z rule. The state after reading token C is (^ marks the pointer):
%token A B C D E F
%%
start : x | y;
x : A B z E;
y : A B z F;
z : C ^ D;
Conflicts
However, after reading A B C D this pointer splits again into two pointers (^ marks a pointer):
%token A B C D E F
%%
start : x | y;
x : A B z ^ E;
y : A B z ^ F;
z : C D;
Note that yacc looks one token ahead before declaring any conflict. Since one of the pointers will disappear depending on the next token, yacc does not declare any conflict.
Reduce-Reduce Conflict
Conflict example
%token A B
%%
start : x B | y B ;
x : A ;    /* reduce */
y : A ;    /* reduce */
reduce/reduce conflict on B
After A there are two pointers. Both rules (x and y) want to reduce at the same time. If the next token is B, there will still be two pointers. Such conflicts are called reduce/reduce conflicts.
Shift-Reduce Conflict
Another type of conflict occurs when one rule reduces while the other shifts. Such conflicts are called shift/reduce conflicts
%token A R
%%
start : x | y R ;
x : A R ;    /* shift */
y : A ;      /* reduce */
shift/reduce conflict on R
After A, the y rule reduces and the x rule shifts. The next token in both cases is R.
Conflict Example
%token A
%%
start : x | y ;
x : A ;    /* reduce */
y : A ;    /* reduce */
At the end of each string there is a $end token. Therefore yacc declares reduce/reduce conflict on $end for the grammar above.
Conflicts
Empty rules
%token A B
%%
start : empty A A | A B;
empty : ;    /* without any tokens */
shift/reduce conflict on A
If the next token is A, the empty rule will reduce and the second rule (of start) will shift. Therefore yacc declares a shift/reduce conflict on A.
Debugging Yacc
$ yacc -v filename.y
produces a file named y.output for debugging purposes.
Makefile
parser: y.tab.c
	gcc -o parser y.tab.c -ly -ll
y.tab.c: parser.y lex.yy.c
	yacc parser.y
lex.yy.c: scanner.l
	lex scanner.l