
Lex and Yacc

CSE 4100

Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155
steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818

Material for course thanks to: Laurent Michel Aggelos Kiayias Robert LeBarre
LandY.1

Lex and Yacc

Two compiler-writing tools used to specify:
  Lexical tokens and their order of processing (Lex)
  A context-free grammar for an LALR(1) parser (Yacc)
Both Lex and Yacc have a long history in computing:
  Lex and Yacc: earliest days of Unix minicomputers
  Flex and Bison: from GNU
  JFlex: fast scanner generator for Java
  BYacc/J: Berkeley Yacc for Java
  CUP, ANTLR, and PCLEX and PCYACC from Abacus

Lex: A Lexical Analyzer Generator

A Unix utility from the early 1970s
A compiler that takes as source a specification for:
  Tokens/patterns of a language
Generates a C lexical analyzer program
Pictorially:

  Lex source program (lex.l) --> Lex compiler --> lex.yy.c
  lex.yy.c --> C compiler --> a.out
  Input stream --> a.out --> Sequence of tokens

Format of a Lexical Specification: 3 Parts

Declarations:
  Defs, constants, types, #includes, etc. that can occur in a C program
  Regular definitions (expressions)
Translation rules:
  Pairs of (regular expression, action)
  Inform the lexical analyzer of the action to take when a pattern is recognized
Auxiliary procedures:
  Designer-defined C code
  Can replace system calls

lex.l file format:

  DECLARATIONS
  %%
  TRANSLATION RULES
  %%
  AUXILIARY PROCEDURES

Example lex.l File

User-defined values for each token (otherwise they come from the Yacc-generated y.tab.h):

    %{
    #define T_IDENTIFIER 300
    #define T_INTEGER    301
    #define T_REAL       302
    #define T_STRING     303
    #define T_ASSIGN     304
    #define T_ELSE       305
    #define T_IF         306
    #define T_THEN       307
    #define T_EQ         308
    #define T_LT         309
    #define T_NE         310
    #define T_GE         311
    #define T_GT         312
    %}

Regular expression rules for later token definitions:

    letter   [a-zA-Z]
    digit    [0-9]
    ws       [ \t\n]+
    id       [A-Za-z][A-Za-z0-9]*
    comment  "(*"([^*]|\n|"*"+[^)])*"*"+")"
    integer  [0-9]+/([^0-9]|"..")
    real     [0-9]+"."[0-9]*([0-9]|"E"[+-]?[0-9]+)
    string   \'([^']|\'\')*\'
    %%

Token definitions:

    ":="     {printf(" %s ", yytext);return(T_ASSIGN);}
    "else"   {printf(" %s ", yytext);return(T_ELSE);}

Example lex.l File (continued)

Token definitions, including a conditionally compiled action for "then":

    "then"     {
               #ifdef PRNTFLG
               printf(" %s ", yytext);
               #endif
               return(T_THEN);
               }
    "<="       {printf(" %s ", yytext);return(T_EQ);}
    "<"        {printf(" %s ", yytext);return(T_LT);}
    "<>"       {printf(" %s ", yytext);return(T_NE);}
    ">="       {printf(" %s ", yytext);return(T_GE);}
    ">"        {printf(" %s ", yytext);return(T_GT);}
    {id}       {printf(" %s ", yytext);return(T_IDENTIFIER);}
    {integer}  {printf(" %s ", yytext);return(T_INTEGER);}
    {real}     {printf(" %s ", yytext);return(T_REAL);}
    {string}   {printf(" %s ", yytext);return(T_STRING);}
    {comment}  {/* T_COMMENT: discard */}
    {ws}       {/* spaces, tabs, newlines: discard */}
    %%
    /* Called at EOF for input; returning 1 means there is no more input */
    int yywrap(void) { return 1; }

    int main(void) {
      int i;
      do {
        i = yylex();
      } while (i != 0);   /* yylex() returns 0 at end of input */
      return 0;
    }

Three variables:
  yytext = "currenttoken"
  yyleng = 12
  yylval = 300

What is Wrong with the Following?

    letter   [a-zA-Z]
    digit    [0-9]
    ws       [ \t\n]+
    id       [A-Za-z][A-Za-z0-9]*
    comment  "(*"([^*]|\n|"*"+[^)])*"*"+")"
    integer  [0-9]+/([^0-9]|"..")
    real     [0-9]+"."[0-9]*([0-9]|"E"[+-]?[0-9]+)
    string   \'([^']|\'\')*\'
    %%
    {id}       {printf(" %s ", yytext);return(T_IDENTIFIER);}
    {integer}  {printf(" %s ", yytext);return(T_INTEGER);}
    {real}     {printf(" %s ", yytext);return(T_REAL);}
    {string}   {printf(" %s ", yytext);return(T_STRING);}
    {comment}  {/* T_COMMENT */}
    {ws}       {/* spaces, tabs, newlines */}
    ":="       {printf(" %s ", yytext);return(T_ASSIGN);}
    "else"     {printf(" %s ", yytext);return(T_ELSE);}
    "then"     {printf(" %s ", yytext);return(T_THEN);}
    "<="       {printf(" %s ", yytext);return(T_EQ);}
    "<"        {printf(" %s ", yytext);return(T_LT);}
    "<>"       {printf(" %s ", yytext);return(T_NE);}
    ">="       {printf(" %s ", yytext);return(T_GE);}
    ">"        {printf(" %s ", yytext);return(T_GT);}
    %%
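One way to see the problem is to simulate how lex chooses among rules: the longest match wins, and on a tie the rule listed first wins. The sketch below is a hand-written C approximation, not lex itself. It models only two rules ({id} and "else"), and the names match_len, pick_rule, RULE_ID and RULE_ELSE are made up for this illustration.

```c
#include <ctype.h>
#include <string.h>

enum { RULE_ID, RULE_ELSE };   /* hypothetical rule labels for this sketch */

/* Length of the prefix of s matched by a rule (0 means no match). */
static size_t match_len(int rule, const char *s)
{
    size_t n = 0;

    if (rule == RULE_ELSE)                 /* the literal "else" */
        return strncmp(s, "else", 4) == 0 ? 4 : 0;

    /* RULE_ID: [A-Za-z][A-Za-z0-9]* */
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Pick a rule the way lex does: longest match wins, and on a tie the rule
   listed first in order[] wins. Returns the rule, or -1 if nothing matches. */
int pick_rule(const int *order, int count, const char *s)
{
    int i, best = -1;
    size_t best_len = 0;

    for (i = 0; i < count; i++) {
        size_t len = match_len(order[i], s);
        if (len > best_len) {   /* strict '>' keeps the earlier rule on ties */
            best_len = len;
            best = order[i];
        }
    }
    return best;
}
```

With {id} listed before "else", the input else ties at four characters and the earlier rule wins, so the keyword is reported as an identifier. Listing the keyword rules first, as in the earlier example file, avoids this.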

Other Possible Actions

    %%
    ":="       {return(T_ASSIGN);}
    "else"     {return(T_ELSE);}
    "then"     {return(T_THEN);}
    "<="       {yylval = T_EQ; return(T_EQ);}
    ... etc. ...
    ">"        {yylval = T_GT; return(T_GT);}
    {id}       {yylval = install_id(); return(T_IDENTIFIER);}
    {integer}  {yylval = install_int(); return(T_INTEGER);}
    {real}     {yylval = install_real(); return(T_REAL);}
    {comment}  {/* T_COMMENT */}
    {ws}       {/* spaces, tabs, newlines */}
    %%
    install_id() {
      /* A procedure to install the lexeme, whose first character is pointed
         to by yytext and whose length is yyleng, into the symbol table and
         return a pointer */
    }

    install_int() {
      /* Similar, but installs an integer lexeme into the symbol table */
    }

    install_real() {
      /* Similar, but installs a real lexeme into the symbol table */
    }
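The install_id() body is left as a comment on the slide. A minimal sketch of what such a routine might do follows, under several assumptions made only for illustration: the table is a fixed-size array, the routine returns a table index rather than a pointer, and the lexeme is passed in as parameters (standing in for yytext and yyleng). The names install_lexeme, symbols and MAX_SYMBOLS are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SYMBOLS 256   /* assumed fixed capacity for this sketch */

static char *symbols[MAX_SYMBOLS];
static int   symbol_count = 0;

/* Install the lexeme text[0..len-1] and return its table index.
   An existing entry is reused; -1 signals a full table or no memory. */
int install_lexeme(const char *text, int len)
{
    int i;
    char *copy;

    /* Linear search: return the existing index if the lexeme is present. */
    for (i = 0; i < symbol_count; i++)
        if ((int)strlen(symbols[i]) == len &&
            strncmp(symbols[i], text, (size_t)len) == 0)
            return i;

    if (symbol_count == MAX_SYMBOLS)
        return -1;

    /* Copy the lexeme, since the scanner reuses its buffer for each token. */
    copy = malloc((size_t)len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, (size_t)len);
    copy[len] = '\0';

    symbols[symbol_count] = copy;
    return symbol_count++;
}
```

In a lex action this could be called as install_lexeme(yytext, yyleng). The copy matters because yytext is only valid until the next token is scanned.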

Revisiting Internal Variables in Lex

char *yytext;   Pointer to the current lexeme, terminated by \0
int yyleng;     Number of characters in yytext, not counting the \0
yylval:         Global variable through which the token value can be returned to Yacc
The parser (Yacc) can access yylval, yyleng, and yytext
How are these used? Consider integer tokens:
  yylval = ascii_to_integer(yytext);
  Conversion from string to actual integer value
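The slide does not show ascii_to_integer. A minimal sketch in C follows, assuming the lexeme is a plain decimal string. The two-argument signature and the 0/-1 return codes are assumptions made here so that errors can be reported; the slide's version takes one argument and returns the value directly.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert a decimal lexeme (such as the text held in yytext) to an int.
   Returns 0 and stores the value in *out on success; returns -1 if the
   lexeme is empty, has trailing non-digits, or does not fit in an int. */
int ascii_to_integer(const char *lexeme, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(lexeme, &end, 10);

    if (end == lexeme || *end != '\0')   /* no digits, or trailing junk */
        return -1;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -1;

    *out = (int)value;
    return 0;
}
```

Inside an {integer} action this might be used as ascii_to_integer(yytext, &yylval), assuming yylval is declared as an int.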

Using the lex Compiler

Important highlights: Unix Lex defaults with respect to:
  Single rule size (2048 bytes)
  All actions (20480 bytes)
  DFA states (512)
  NFA states (254)
Command line:
  lex myfile.l      Generates lex.yy.c
  pclex myfile.l    Generates myfile.c
  -v flag           Includes statistics on the state machine, etc.

Highlights of the Generated lex.yy.c File

    # define output(c) putc(c,yyout)
    # define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10?(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
    # define unput(c) {yytchar= (c);if(yytchar=='\n')yylineno--;*yysptr++=yytchar;}

    FILE *yyin = {stdin}, *yyout = {stdout};

    yyinput() { return(input()); }
    yyoutput(c) int c; { output(c); }
    yyunput(c) int c; { unput(c); }

Compilation at the Unix command line:
    lex lexfile.l      (creates lex.yy.c)
    cc lex.yy.c -ll    (links the lex library)

Full lex.yy.c File

    # include "stdio.h"
    # define U(x) x
    # define NLSTATE yyprevious=YYNEWLINE
    # define BEGIN yybgin = yysvec + 1 +
    # define INITIAL 0
    # define YYLERR yysvec
    # define YYSTATE (yyestate-yysvec-1)
    # define YYOPTIM 1
    # define YYLMAX BUFSIZ
    # define output(c) putc(c,yyout)
    # define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10?(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
    # define unput(c) {yytchar= (c);if(yytchar=='\n')yylineno--;*yysptr++=yytchar;}
    # define yymore() (yymorfg=1)
    # define ECHO fprintf(yyout, "%s",yytext)
    # define REJECT { nstr = yyreject(); goto yyfussy;}
    int yyleng; extern char yytext[];
    int yymorfg;
    extern char *yysptr, yysbuf[];
    int yytchar;
    FILE *yyin = {stdin}, *yyout = {stdout};
    extern int yylineno;
    struct yysvf {
        struct yywork *yystoff;
        struct yysvf *yyother;
        int *yystops;};
    struct yysvf *yyestate;
    extern struct yysvf yysvec[], *yybgin;
    #define T_IDENTIFIER 300
    #define T_INTEGER 301
    #define T_REAL 302
    #define T_STRING 303
    #define T_ASSIGN 304
    #define T_ELSE 305
    #define T_IF 306
    #define T_THEN 307
    #define T_EQ 308
    #define T_LT 309
    #define T_NE 310
    #define T_GE 311
    #define T_GT 312
    # define YYNEWLINE 10
    yylex(){
    int nstr; extern int yyprevious;
    while((nstr = yylook()) >= 0)
    yyfussy: switch(nstr){
    case 0:
    if(yywrap()) return(0); break;
    case 1:
    {printf(" %s ", yytext);return(T_ASSIGN);}
    break;
    case 2:
    {printf(" %s ", yytext);return(T_ELSE);}
    break;
    case 3:
    {printf(" %s ", yytext);return(T_IF);}
    break;
    case 4:
    {
    #ifdef PRNTFLG
    printf(" %s ", yytext);
    #endif
    return(T_THEN);
    }
    break;
    case 5:
    {printf(" %s ", yytext);return(T_EQ);}
    break;
    case 6:
    {printf(" %s ", yytext);return(T_LT);}
    break;
    case 7:
    {printf(" %s ", yytext);return(T_NE);}
    break;
    case 8:
    {printf(" %s ", yytext);return(T_GE);}
    break;
    case 9:
    {printf(" %s ", yytext);return(T_GT);}
    break;
    case 10:
    {printf(" %s ", yytext);return(T_IDENTIFIER);}
    break;
    case 11:
    {printf(" %s ", yytext);return(T_INTEGER);}
    break;
    case 12:
    {printf(" %s ", yytext);return(T_REAL);}
    break;
    case 13:
    {printf(" %s ", yytext);return(T_STRING);}
    break;
    case 14:
    {/* T_COMMENT */}
    break;
    case 15:
    {/* spaces, tabs, newlines */}
    break;
    case -1:
    break;
    default:
    fprintf(yyout,"bad switch yylook %d",nstr);
    } return(0); }
    /* end of yylex */
    yywrap(){}
    main() { int i; do { i = yylex(); } while (i!=0); }

A Pascal lex.l

    %{
    #include "y.tab.h"
    %}
    letter   [a-zA-Z]
    digit    [0-9]
    ws       [ \t\n]+
    id       [A-Za-z][A-Za-z0-9]*
    comment  "(*"([^*]|\n|"*"+[^)])*"*"+")"
    integer  [0-9]+/([^0-9]|"..")
    real     [0-9]+"."[0-9]*([0-9]|"E"[+-]?[0-9]+)
    string   \'([^']|\'\')*\'
    %%
    ":="         {return(T_ASSIGN);}
    ":"          {return(T_COLON);}
    "array"      {return(T_ARRAY);}
    "begin"      {return(T_BEGIN);}
    "case"       {return(T_CASE);}
    "const"      {return(T_CONST);}
    "downto"     {return(T_DOWNTO);}
    "do"         {return(T_DO);}
    "else"       {return(T_ELSE);}
    "end"        {return(T_END);}
    "file"       {return(T_FILE);}
    "for"        {return(T_FOR);}
    "function"   {return(T_FUNCTION);}
    /* "goto"    {return(T_GOTO);} */
    "if"         {return(T_IF);}
    "label"      {return(T_LABEL);}
    "nil"        {return(T_NIL);}
    "not"        {return(T_NOT);}
    "of"         {return(T_OF);}
    /* "packed"  {return(T_PACKED);} */
    "procedure"  {return(T_PROCEDURE);}
    "end"        {return(T_END);}
    "program"    {return(T_PROGRAM);}
    "record"     {return(T_RECORD);}
    "repeat"     {return(T_REPEAT);}
    "set"        {return(T_SET);}
    "then"       {return(T_THEN);}
    "to"         {return(T_TO);}
    "type"       {return(T_TYPE);}
    "until"      {return(T_UNTIL);}
    "var"        {return(T_VAR);}
    "while"      {return(T_WHILE);}
    /* "with"    {return(T_WITH);} */
    "+"          {return(T_PLUS);}
    "-"          {return(T_MINUS);}
    "or"         {return(T_OR);}
    "and"        {return(T_AND);}
    "div"        {return(T_DIV);}
    "mod"        {return(T_MOD);}
    "/"          {return(T_RDIV);}
    "*"          {return(T_MULT);}
    "("          {return(T_LPAREN);}
    ")"          {return(T_RPAREN);}
    "="          {return(T_EQ);}
    ","          {return(T_COMMA);}
    ".."         {return(T_RANGE);}
    "."          {return(T_PERIOD);}
    "["          {return(T_LBRACK);}
    "]"          {return(T_RBRACK);}
    "<="         {return(T_EQ);}
    "<"          {return(T_LT);}
    "<>"         {return(T_NE);}
    ">="         {return(T_GE);}
    ">"          {return(T_GT);}
    "in"         {return(T_IN);}
    "^"          {return(T_UPARROW);}
    ";"          {return(T_SEMI);}
    {id}         {return(T_IDENTIFIER);}
    {integer}    {return(T_INTEGER);}
    {real}       {return(T_REAL);}
    {string}     {return(T_STRING);}
    {comment}    {/* T_COMMENT */}
    {ws}         {/* spaces, tabs, newlines */}

Project Part 1 Fall 2011

What is Latex?
  Text processing language
  Embed commands into an ASCII file
  Opposite of Word's WYSIWYG
  Geared towards publishing, particularly prior to newer versions of Word
  Very powerful text formatting language
  Built on TeX, which was invented by computer scientist Donald Knuth
    http://www-cs-faculty.stanford.edu/~uno/
    http://www-cs-faculty.stanford.edu/~uno/abcde.html
  Knuth is famous for: The Art of Computer Programming
    http://www-cs-faculty.stanford.edu/~uno/taocp.html

Project Part 1 Has Three Tasks

Task 1 (Oct 5): Design and implement a lexical analyzer using the flex generator on the Linux boxes that is able to identify all lexical tokens for the Latex subset.
Task 2 (Oct 12): Design and develop a context-free grammar (CFG) for a subset of Latex.
Task 3 (Oct 17): Calculate FIRST and FOLLOW for a grammar provided after deliverable part 1b.

latex.all.txt

BASIC LATEX COMMANDS/OPTIONS

The following discusses the Latex commands and options which will be supported in our text processor. TEXT THAT IS SHOWN IN ALL CAPITAL LETTERS CORRESPONDS TO TOKENS WHICH HAVE MANY DIFFERENT OPTIONS.

1. Section, Subsections, and Table of Contents

Commands                 Examples or Explanation
\section{STRING}         \section{Introduction}
\subsection{STRING}      \subsection{A Text Processor}
                         \subsection{Legal Latex Commands}
                         \section{Using Latex}
\tableofcontents         Generate a table of contents with page numbers

Specifically, it would generate:
  1 Introduction
    1.1 A Text Processor
    1.2 Legal Latex Commands
  2 Using Latex

latex.all.txt

2. Formatting Commands That Affect the Overall Document

Commands                                   Examples or Explanation
\renewcommand{\baselinestretch}{INTEGER}   Establish the spacing: 1 is single, 2 is double, etc.
\pagenumbering{STYLE}                      STYLE is either arabic, roman, alph, Roman, or Alph
                                             arabic numbers pages using 1, 2, 3, ... etc.
                                             roman numbers pages using i, ii, iii, ... etc.
                                             alph numbers pages using a, b, c, ... etc.
                                             Roman numbers pages using I, II, III, ... etc.
                                             Alph numbers pages using A, B, C, ... etc.
\arabic{COUNTER}                           COUNTER indicates the initial value of page numbers
\roman{COUNTER}                            COUNTER indicates the initial value of page numbers
\alph{COUNTER}                             COUNTER indicates the initial value of page numbers
                                             In this case, counter must be <= 26.
\Roman{COUNTER}                            COUNTER indicates the initial value of page numbers
\Alph{COUNTER}                             COUNTER indicates the initial value of page numbers
                                             In this case, counter must be <= 26.
\vspace{INTEGER}                           Insert an INTEGER number of blank lines
\hspace{INTEGER}                           Insert an INTEGER number of blank spaces
\rm                                        Change the font to roman
\it                                        Change the font to italics or underline

When the \rm or \it commands are used within curly braces, i.e., {\it The Huskies win again!}, only the text within the braces is affected. Otherwise, the command switches the mode of printing from that point on in the text.

latex.all.txt

3. Using Backslash to Indicate a Character Rather Than a Command

The backslash character (\) is used to tell Latex that the next character should be treated as a character and not as a command. The backslash is used with the following characters:

&

Without the backslash, each character has a special meaning, i.e., % is for a comment that is ignored during text processing, & divides column entries of tables, etc. With a backslash, i.e., \%, the character is interpreted as itself.

latex.all.txt

4. Begin/End Blocks - Centering and Verbatim

Begin/end blocks are used within Latex to identify a scope over which a given command applies. They are best illustrated with examples.

    \begin{verbatim} Four Score and Seven Years Ago Our Forefathers \end{verbatim}

The verbatim option displays the text exactly as it appears within the input file.

    \begin{center}
    Four Score and\\
    Seven Years\\
    Ago Our Forefathers
    \end{center}

The center option centers the entire block of text as a single unit. The \\ are used to signal the end of a line.

This produces the output:

          Four Score and
           Seven Years
       Ago Our Forefathers

Without the second \\, after Seven Years, the output would be:

          Four Score and
    Seven Years Ago Our Forefathers

Commands can be combined, such as:

    \begin{center} \begin{verbatim} Four Score and Seven Years Ago Our Forefathers \end{verbatim} \end{center}

This combination centers the entire block, exactly as it appears, without changing the indentation within each line.

The output in this case would be: Four Score and Seven Years Ago Our Forefathers

latex.all.txt

5. Begin/End Blocks - single and Lists

Begin/end blocks can also be utilized to construct lists of items automatically. For example, the following input and commands:

    \begin{single}
    \begin{itemize}
    \item Lexical Analyzer uses DFAs and NFAs
    \item Parsing using CFGs
    \item Code Generation uses templates and also makes extensive use of syntax-directed translation via attribute grammars
    \end{itemize}
    \end{single}
    \noindent These are some of the phases for compilation that we'll study over the course of the semester.

Produces the output:

    - Lexical Analyzer uses DFAs and NFAs
    - Parsing uses CFGs
    - Code Generation uses templates and also makes extensive use of syntax-directed translation via attribute grammars.
    These are some of the phases for compilation that we'll study over the course of the semester.

The command \noindent is used to make sure that a new paragraph is not started after the list has completed, which would occur as a default.

The enumerate option is similar, but generates numbers for each item:

    \begin{enumerate}
    \item Lexical Analyzer uses DFAs and NFAs
    \item Parsing using CFGs
    \item Code Generation uses templates and also makes extensive use of syntax-directed translation via attribute grammars
    \end{enumerate}

Notice that without the single begin/end block, the following output is produced:

    1. Lexical Analyzer uses DFAs and NFAs
    2. Parsing uses CFGs
    3. Code Generation uses templates and also makes extensive use of syntax-directed translation via attribute grammars.

A Sample Latex Input File latex.in.tex


CSE 4100 \begin{document} \pagenumbering{arabic} \arabic{5} \renewcommand{\baselinestretch}{2} \tableofcontents

\section{Introduction}
This is an example of text that would be transformed into a paragraph in latex. Blank lines between text in the input cause a new paragraph to be generated. \vspace{10} \it When the blank line occurs after a section, no indentation of the paragraph is performed. However, \hspace{20} all other blanks, would result in a 5 space indent of the paragraph. \rm \subsection{A Text Processor} A {\it text processor} is a very useful tool, since it allows us to develop formatted documents that are easy to read.

LandY.28

A Sample Latex Input File latex.in.tex


\subsection{Legal Latex Commands} CSE 4100 We have seen that there are many different Latex commands, that can be used in many different ways. However, sometimes, we wish to use a character to mean itself, and override its Latex interpretation. For example, to use curly braces, we employ the backslash \{ a set of integers \}. \section{Using Latex} Finally, there are many other useful commands that involve begin/end blocks, that establish an environment. These blocks behave in a similar fashion to begin/end blocks in a programming language, since they set a scope. We have discussed a number of examples: \begin{single} \begin{enumerate} \item single is for single spacing \item verbatim allows text that matches the what you see is what you get mode \item itemize uses ticks to indicate items \item center allows a block to be centered \end{enumerate} \end{single} \noindent It is important to note, even at this early stage, that lists may be created within lists, allowing the nesting of blocks and environments. \end{document}
LandY.29

Notes

Not all of my Latex works in MikTeX, since my Latex is based on an older version of Latex
To get the sample file on the prior two slides to work you need to:
  Add \documentstyle{article} as the first line
  Add pt to the vspace and hspace:
    \vspace{10pt} \hspace{20pt}
  Delete the \arabic{5}
Web page has: latex.in.miktex.tex, the file with these changes

Latex Extensions - Tables And Automatic Numbering

Latex is extended to support the definition of tables and their automatic numbering. As an example, consider the following:

    This is how tables are used in Latex. First a reference to a table, say for a table of Latex commands, must be given. Table \ref{latexcmds} is shown below.
    \begin{table}[h]
    \begin{center}
    \begin{tabular}{rcl}
    No.& Command & Explanation \\
    1 & center & allows centering of text \\
    2 & it & used for italics \\
    3 & item & used to identify items in a list \\
    \end{tabular}
    \end{center}
    \caption{A Table of Latex Commands!!}
    \label{latexcmds}
    \end{table}

Latex Extensions - Tables And Automatic Numbering

This example produces the following output:

    This is how tables are used in Latex. First a reference to a table, say for a table of Latex commands, must be given. Table 1 is shown below.

        No.  Command  Explanation
          1  center   allows centering of text
          2    it     used for italics
          3   item    used to identify items in a list

        Table 1. A Table of Latex Commands!!

Notice that the table has been centered and the first column is right justified, the second column is centered, and the third column is left justified.

Latex Extensions - Tables And Automatic Numbering

Now, a brief explanation of the options:

\begin{tabular}{column-spec}      where column-spec is any sequence of one or more r (right), l (left), or c (center) options.
... & ... & ... \\                where & separates columns and \\ ends a row.
\end{tabular}                     which signals the end of the table.
\begin{table}[location-options]   indicates the start of the table environment, where location-options indicates where to place a table and may be either h (for here), t (for float to top of next page), or b (for float to bottom of current or next page).
\caption{STRING}                  which indicates the table's caption
\label{WORD}                      which labels the caption/table with a word
\end{table}                       used to finish the table environment

Then, when \ref{WORD} appears in the text, the label is searched for and the automatic number assigned to the table is inserted.

Other Sample Latex Files: Cent.tst

    \begin{document}
    \pagenumbering{arabic}
    \arabic{5}
    \renewcommand{\baselinestretch}{2}
    A Basic file that checks to see if the centering command works correctly. Note that the double backslash should indicate what should be centered and how it is centered.
    \begin{center}
    Single is for Single spacing\\
    Verbatim allows text produced as is\\
    Itemize uses ticks to indicate items\\
    Center allows a block to be centered\\
    \end{center}
    \end{document}

I will send out an email with a zip file of tests
Your lexical analyzer should recognize all of these!


Working Flex for Project Part 1 Fall 2011

    %{
    /* THIS IS LATEX.L */
    #include <stdio.h>
    #define TBEGIN    200
    #define TEND      201
    #define TDOCUMENT 202
    #define TWORD     203
    #define TBACKSL   204
    #define TLCURLYB  205
    #define TRCURLYB  206
    %}
    ws    [ \t\n]+
    word  ([a-zA-Z0-9])*
    %%

Recognize the following tokens in order. Note "\\" to recognize \

    "\\"        {printf(" Val: %d\t; Lexeme: %s \n", TBACKSL, yytext);return(TBACKSL);}
    "{"         {printf(" Val: %d\t; Lexeme: %s \n", TLCURLYB, yytext);return(TLCURLYB);}
    "}"         {printf(" Val: %d\t; Lexeme: %s \n", TRCURLYB, yytext);return(TRCURLYB);}
    "begin"     {printf(" Val: %d\t; Lexeme: %s \n", TBEGIN, yytext);return(TBEGIN);}
    "document"  {printf(" Val: %d\t; Lexeme: %s \n", TDOCUMENT, yytext);return(TDOCUMENT);}
    "end"       {printf(" Val: %d\t; Lexeme: %s \n", TEND, yytext);return(TEND);}
    {word}      {printf(" Val: %d\t; Lexeme: %s \n", TWORD, yytext);return(TWORD);}
    {ws}        { /* DO NOTHING */ }
    %%
Flex for Project Part 1 Fall 2011

Remaining code:

    /* need main routine at bottom */
    int yywrap(void) { return 1; }   /* 1 = no more input after EOF */

    int main(void) {
      int i;
      do {
        i = yylex();
        printf("i is: %d ****\n", i);
      } while (i != 0);   /* yylex() returns 0 at end of input */
      return 0;
    }

Building lex.yy.c and compiling/executing:
  ssh to an Engineering Linux box
  flex latex.l
  gcc lex.yy.c -lfl
  a.out < latex.l
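To make the behavior of latex.l concrete, here is a hand-written C sketch that approximates what the generated scanner does for these rules. It is not flex output. The function name next_token and its parameters are made up for this illustration, and returning -1 for a character that no rule covers is an assumption (flex would echo such a character by default).

```c
#include <ctype.h>
#include <string.h>

/* Token codes copied from the latex.l definitions section. */
#define TBEGIN    200
#define TEND      201
#define TDOCUMENT 202
#define TWORD     203
#define TBACKSL   204
#define TLCURLYB  205
#define TRCURLYB  206

/* Scan one token starting at *input, copy its lexeme into lexeme[]
   (capacity cap, which must be at least 1), advance *input, and return
   the token code. Returns 0 at end of input. */
int next_token(const char **input, char *lexeme, size_t cap)
{
    const char *p = *input;
    size_t n = 0;

    while (*p == ' ' || *p == '\t' || *p == '\n')   /* {ws}: do nothing */
        p++;
    if (*p == '\0') {
        *input = p;
        return 0;
    }

    if (isalnum((unsigned char)*p)) {
        /* Longest run of letters and digits, as {word} would match. */
        while (isalnum((unsigned char)*p)) {
            if (n + 1 < cap)
                lexeme[n++] = *p;
            p++;
        }
    } else {
        /* Any other character is taken on its own. */
        if (n + 1 < cap)
            lexeme[n++] = *p;
        p++;
    }
    lexeme[n] = '\0';
    *input = p;

    /* Same order as the rules: literals first, {word} last. */
    if (strcmp(lexeme, "\\") == 0)       return TBACKSL;
    if (strcmp(lexeme, "{") == 0)        return TLCURLYB;
    if (strcmp(lexeme, "}") == 0)        return TRCURLYB;
    if (strcmp(lexeme, "begin") == 0)    return TBEGIN;
    if (strcmp(lexeme, "document") == 0) return TDOCUMENT;
    if (strcmp(lexeme, "end") == 0)      return TEND;
    if (isalnum((unsigned char)lexeme[0])) return TWORD;
    return -1;   /* assumed code for a character no rule covers */
}
```

Calling next_token repeatedly until it returns 0 plays the role of the do/while loop around yylex(). A misspelled keyword simply falls through to TWORD, because only an exact match of the whole lexeme counts as a keyword.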

Lex.yy.c File

    #line 3 "lex.yy.c"

    #define YY_INT_ALIGNED short int

    /* A lexical scanner generated by flex */

    #define FLEX_SCANNER
    #define YY_FLEX_MAJOR_VERSION 2
    #define YY_FLEX_MINOR_VERSION 5
    #define YY_FLEX_SUBMINOR_VERSION 34
    #if YY_FLEX_SUBMINOR_VERSION > 0
    #define FLEX_BETA
    #endif

    /* First, we deal with platform-specific or compiler-specific issues. */

    /* begin standard C headers. */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <stdlib.h>

    /* end standard C headers. */

    /* THOUSAND LINES OF CODE MISSING */

    void yyfree (void * ptr )
    {
        free( (char *) ptr ); /* see yyrealloc() for (char *) cast */
    }

    #define YYTABLES_NAME "yytables"

    #line 29 "latex.l"

    /* need main routine at bottom */
    int yywrap(void) { return 1; }   /* 1 = no more input after EOF */

    int main(void) {
      int i;
      do {
        i = yylex();
        printf("i is: %d ****\n", i);
      } while (i != 0);   /* yylex() returns 0 at end of input */
      return 0;
    }

Sample Latex Input File doc.tex and Output

doc.tex:

    \begin{document}
    Hello world
    Does this work even on
    multiple lines
    \end{document}

a.out < doc.tex

     Val: 204 ; Lexeme: \
    i is: 204 ****
     Val: 203 ; Lexeme: begiin
    i is: 203 ****
     Val: 205 ; Lexeme: {
    i is: 205 ****
     Val: 202 ; Lexeme: document
    i is: 202 ****
     Val: 206 ; Lexeme: }
    i is: 206 ****
     Val: 203 ; Lexeme: Hello
    i is: 203 ****
     Val: 203 ; Lexeme: world
    i is: 203 ****
     Val: 203 ; Lexeme: Does
    i is: 203 ****
     Val: 203 ; Lexeme: this
    i is: 203 ****
     Val: 203 ; Lexeme: work
    i is: 203 ****
     Val: 203 ; Lexeme: even
    i is: 203 ****
     Val: 203 ; Lexeme: on
    i is: 203 ****
     Val: 203 ; Lexeme: multiple
    i is: 203 ****
     Val: 203 ; Lexeme: lines
    i is: 203 ****
     Val: 204 ; Lexeme: \
    i is: 204 ****
     Val: 201 ; Lexeme: end
    i is: 201 ****
     Val: 205 ; Lexeme: {
    i is: 205 ****
     Val: 202 ; Lexeme: document
    i is: 202 ****
     Val: 206 ; Lexeme: }
    i is: 206 ****

Latexv2.l and docv2.tex

Latexv2.l:

    tablespec  \[(h|t|b)\]
    colspec    (c|l|r)+

    %%
    "\\"         {printf(" Val: %d\t; Lexeme: %s \n", TBACKSL, yytext);return(TBACKSL);}
    "{"          {printf(" Val: %d\t; Lexeme: %s \n", TLCURLYB, yytext);return(TLCURLYB);}
    "}"          {printf(" Val: %d\t; Lexeme: %s \n", TRCURLYB, yytext);return(TRCURLYB);}
    "begin"      {printf(" Val: %d\t; Lexeme: %s \n", TBEGIN, yytext);return(TBEGIN);}
    "document"   {printf(" Val: %d\t; Lexeme: %s \n", TDOCUMENT, yytext);return(TDOCUMENT);}
    "end"        {printf(" Val: %d\t; Lexeme: %s \n", TEND, yytext);return(TEND);}
    {tablespec}  {printf(" Val: %d\t; Lexeme: %s \n", TTABLESPEC, yytext);return(TTABLESPEC);}
    {colspec}    {printf(" Val: %d\t; Lexeme: %s \n", TCOLSPEC, yytext);return(TCOLSPEC);}
    {word}       {printf(" Val: %d\t; Lexeme: %s \n", TWORD, yytext);return(TWORD);}
    {ws}         { /* DO NOTHING */ }

docv2.tex:

    \begiin{document}
    Hello world
    Does [b] this work even on cclcrr [h]
    multiple ccc lrcll lines [t]
    \end{document}

Latexv3.l and docv2.tex

    tablespec  \[(h|t|b)\]
    colspec    (c|l|r)+

    %%
    "\\"            {printf(" Val: %d\t; Lexeme: %s \n", TBACKSL, yytext);return(TBACKSL);}
    "{"             {printf(" Val: %d\t; Lexeme: %s \n", TLCURLYB, yytext);return(TLCURLYB);}
    "}"             {printf(" Val: %d\t; Lexeme: %s \n", TRCURLYB, yytext);return(TRCURLYB);}
    "\\begin"       {printf(" Val: %d\t; Lexeme: %s \n", TBEGIN, yytext);return(TBEGIN);}
    "\{document\}"  {printf(" Val: %d\t; Lexeme: %s \n", TDOCUMENT, yytext);return(TDOCUMENT);}
    "\\end"         {printf(" Val: %d\t; Lexeme: %s \n", TEND, yytext);return(TEND);}
    {tablespec}     {printf(" Val: %d\t; Lexeme: %s \n", TTABLESPEC, yytext);return(TTABLESPEC);}
    {colspec}       {printf(" Val: %d\t; Lexeme: %s \n", TCOLSPEC, yytext);return(TCOLSPEC);}
    {word}          {printf(" Val: %d\t; Lexeme: %s \n", TWORD, yytext);return(TWORD);}
    {ws}            { /* DO NOTHING */ }

Latexv3.l and docv2.tex Output

     Val: 204 ; Lexeme: \
    i is: 204 ****
     Val: 203 ; Lexeme: begiin
    i is: 203 ****
     Val: 202 ; Lexeme: {document}
    i is: 202 ****
     Val: 203 ; Lexeme: Hello
    i is: 203 ****
     Val: 203 ; Lexeme: world
    i is: 203 ****
     Val: 203 ; Lexeme: Does
    i is: 203 ****
     Val: 207 ; Lexeme: [b]
    i is: 207 ****
     Val: 203 ; Lexeme: this
    i is: 203 ****
     Val: 203 ; Lexeme: work
    i is: 203 ****
     Val: 203 ; Lexeme: even
    i is: 203 ****
     Val: 203 ; Lexeme: on
    i is: 203 ****
     Val: 208 ; Lexeme: cclcrr
    i is: 208 ****
     Val: 207 ; Lexeme: [h]
    i is: 207 ****
     Val: 203 ; Lexeme: multiple
    i is: 203 ****
     Val: 208 ; Lexeme: ccc
    i is: 208 ****
     Val: 208 ; Lexeme: lrcll
    i is: 208 ****
     Val: 203 ; Lexeme: lines
    i is: 203 ****
     Val: 207 ; Lexeme: [t]
    i is: 207 ****
     Val: 201 ; Lexeme: \end
    i is: 201 ****
     Val: 202 ; Lexeme: {document}
    i is: 202 ****

Project 1 Task 2 Fall 2011

Task 1 (Oct 5): Design and implement a lexical analyzer using the flex generator on the Linux boxes that is able to identify all lexical tokens for the Latex subset.
Task 2 (Oct 12): Design and develop a context-free grammar (CFG) for a subset of Latex.
Task 3 (Oct 17): Calculate FIRST and FOLLOW for a grammar provided after deliverable part 1b.

Design a CFG for the project that allows Latex programs (e.g., text to be formatted) to be recognized. This will provide you with important language design experience.
How do you get started? Let's consider the initial grammar in the Project 1 spec.

CFG For Latex

A Latex program is defined by start_doc, end_doc, and main_body
main_body is left recursive with multiple main_options
main_option is either text_option or latex_options

    latex_statement ---> start_doc main_body end_doc
    start_doc       ---> "\" "begin" "{" "document" "}"
    end_doc         ---> "\" "end" "{" "document" "}"
    main_body       ---> main_body main_option
                     |   main_option
    main_option     ---> text_option
                     |   latex_options

CFG For Latex

text_option is a sequence of words
latex_options starts with either:
  A backslash \
  A left curly brace {
backs_options can be:
  Begin/end blocks
  Sections
  Etc.

    text_option   ---> text_option "word"
                   |   "word"
    latex_options ---> "\" backs_options
                   |   "{" curlyb_options
    backs_options ---> begin_end_opts
                   |   section_options
                   |   etc.. YOU NEED TO COMPLETE THIS!!

CFG For Latex

    begin_end_opts  ---> begin_options begin_block end_options
    begin_options   ---> "begin" "{" begin_end_cmds "}" table_options
    end_options     ---> "end" "{" begin_end_cmds "}"
    begin_block     ---> WHAT ARE THE POSSIBILITIES???
    begin_end_cmds  ---> "center" | "verbatim" | etc...
    table_options   ---> "[" position "]"
                     |   epsilon
    position        ---> "h" | "t" | "b"
    section_options ---> "section" "{" text_option "}"
                     |   "subsection" "{" text_option "}"

    ETC... TO BE COMPLETED BY YOU!!!!

CFG For Latex

How would we write one of the begin_blocks, say for an itemize list?

    itemize_list ---> itemize_list item
                  |   item

    What does item go to?

What are some of the curlyb_options?

    curlyb_options ---> roman
                    |   italics
    roman          ---> \ rm text_option }

What are other simple backs_options?

    backs_options ---> backs_roman
                   |   backs_italics
    backs_roman   ---> rm text_option ???
                   |   rm latex_options ???

Key Issue

Need to re-examine and reanalyze latex.all.txt and all of the various test cases (emailed)
Look for the required structure:
  What are the different blocks?
  What are the options within blocks?
  How are nested blocks supported?
  What are the backslash and curly brace options?
You need to make sure that your grammar can parse any of the sample test cases
You check this by doing a derivation for the test case or for a portion of Latex

Project 1 Task 3 Fall 2011

Task 1 (Oct 5): Design and implement a lexical analyzer using the flex generator on the Linux boxes that is able to identify all lexical tokens for the Latex subset.
Task 2 (Oct 12): Design and develop a context-free grammar (CFG) for a subset of Latex.
Task 3 (Oct 17): Calculate FIRST and FOLLOW for a grammar provided after deliverable part 1b.

We will use Yacc notation for the grammar (see the following slides)
Notice that : replaces the arrow and | still means an alternate rule.
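As a warm-up for Task 3, the FIRST computation can be sketched in C on a small expression grammar (line, expr, term, fact). This is not the project grammar. The one-character symbol encoding and the names productions, compute_first and first_of are assumptions made for this sketch. It also relies on the grammar having no epsilon productions, so only the first right-hand-side symbol of each production matters; a grammar with epsilon rules needs more work.

```c
#include <string.h>

/* One character per symbol (an encoding assumed for this sketch):
   L=line E=expr T=term F=fact d=DIGIT n=newline. Format: "lhs=rhs". */
static const char *productions[] = {
    "L=En", "E=E+T", "E=T", "T=T*F", "T=F", "F=(E)", "F=d"
};
#define N_PRODUCTIONS (sizeof productions / sizeof productions[0])

static char first_sets[128][16];   /* first_sets['E'] is a string of terminals */

static int is_nonterminal(char c) { return c >= 'A' && c <= 'Z'; }

/* Add terminal t to FIRST(x); return 1 if the set grew. */
static int add_terminal(char x, char t)
{
    char *set = first_sets[(unsigned char)x];
    size_t n = strlen(set);

    if (strchr(set, t) != NULL || n + 1 >= sizeof first_sets[0])
        return 0;
    set[n] = t;
    set[n + 1] = '\0';
    return 1;
}

/* Repeat until no FIRST set changes (a fixed point). */
void compute_first(void)
{
    int changed = 1;
    size_t i;

    memset(first_sets, 0, sizeof first_sets);
    while (changed) {
        changed = 0;
        for (i = 0; i < N_PRODUCTIONS; i++) {
            char lhs = productions[i][0];
            char head = productions[i][2];   /* first symbol of the rhs */

            if (!is_nonterminal(head)) {
                changed |= add_terminal(lhs, head);
            } else {
                /* FIRST(lhs) includes FIRST(head); copy first, because
                   lhs and head may be the same symbol (E=E+T). */
                char snapshot[16];
                const char *p;

                strcpy(snapshot, first_sets[(unsigned char)head]);
                for (p = snapshot; *p != '\0'; p++)
                    changed |= add_terminal(lhs, *p);
            }
        }
    }
}

/* FIRST set of a nonterminal as a string of terminal characters. */
const char *first_of(char nonterminal)
{
    return first_sets[(unsigned char)nonterminal];
}
```

After compute_first(), first_of('E') holds the two terminals ( and d, and the same set results for L, T and F, since every derivation in this grammar must begin with a parenthesis or a digit.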

Yacc For Latex

    %{
    #include <stdio.h>
    #include <ctype.h>
    %}
    %start latexstatement
    %token BACKSL   LBEGIN     LCURLYB    DOCUMENT  RCURLYB   END
    %token WORD     WSWORD     SPECCHAR   CENTER    VERBATIM  SINGLE
    %token ITEMIZE  ENUMERATE  TABULAR    TABLE     LSQRB     RSQRB
    %token H        T          B          R         C         L
    %token CAPTION  LABEL      DBLBS      ITEM      SECTION   SUBSEC
    %token TABOCON  RENEW      BASELINES  INTEGER   PAGENUM   ARABIC
    %token LROMAN   CROMAN     LALPH      CALPH     VSPACE    HSPACE
    %token RM       IT         NOINDENT   REF

    %%
    latexstatement : startdoc mainbody enddoc
                   ;
    startdoc       : BACKSL LBEGIN LCURLYB DOCUMENT RCURLYB
                   ;
    enddoc         : BACKSL END LCURLYB DOCUMENT RCURLYB
                   ;
    mainbody       : mainbody mainoption
                   | mainoption
                   ;
    mainoption     : textoption
                   | commentoption
                   | latexoptions
                   ;
    textoption     : textoption WORD
                   | WORD
                   ;
    wstextoption   : wstextoption WSWORD
                   | WSWORD
                   ;
    commentoption  : SPECCHAR textoption
                   ;
    latexoptions   : BACKSL backsoptions
                   | LCURLYB curlyboptions RCURLYB
                   ;
    curlyboptions  : BACKSL fonts textoption

Bison

Compiler-writing tool that generates an LALR(1) parser
Grammar rules (BNF) can be modified/augmented with semantic actions via code segments
Can work in conjunction with Lex or separately
Three major parts of a Bison specification:

    Declarations
    %%
    Grammar Rules
    %%
    User Supplied Programs

A First Example
CSE 4100

%{
/* Includes and Global Variables here */
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *msg);
%}
%start line
%token DIGIT
%%
/* Grammar Rules */
line : expr '\n'
     ;
expr : expr '+' term
     | term
     ;
term : term '*' fact
     | fact
     ;
fact : '(' expr ')'
     | DIGIT
     ;
%%
/* Define own yylex */
int yylex(void) {
    int c;
    c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';
        return DIGIT;
    }
    return c;
}
/* Error Routine */
void yyerror(const char *msg) { (void)msg; }
/* yyparse calls yylex */
int main(void) { return yyparse(); }
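The hand-written yylex above reads one character at a time from standard input. As a minimal sketch of the same idea that can be exercised in isolation, the version below scans an in-memory string instead. The names demo_set_input, demo_yylex, and demo_yylval, and the token number 257, are assumptions made for illustration only; a generated parser supplies its own yylval and token numbers.

```c
#include <ctype.h>

#define DIGIT 257                  /* hypothetical token number for illustration */

static const char *demo_src = "";  /* hypothetical input buffer */
int demo_yylval;                   /* stands in for yylval */

/* Point the scanner at a new NUL-terminated input string. */
void demo_set_input(const char *s) { demo_src = s; }

/* Return DIGIT for a decimal digit (its value goes in demo_yylval),
   0 at end of input, and the character itself for anything else. */
int demo_yylex(void) {
    unsigned char c = (unsigned char)*demo_src;
    if (c == '\0')
        return 0;
    demo_src++;
    if (isdigit(c)) {
        demo_yylval = c - '0';
        return DIGIT;
    }
    return c;
}
```

Single-character operators are returned as their own character codes, which is why the grammar can refer to them as '+' and '*' without declaring tokens for them.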

LandY.57

How Do Grammar Rules Fire?


CSE 4100

Follow RM Derivation in Reverse!
Input: 5 + 3 * 8

E
E + T
E + T * F
E + T * DIGIT
E + F * DIGIT
E + DIGIT * DIGIT
T + DIGIT * DIGIT
F + DIGIT * DIGIT
DIGIT + DIGIT * DIGIT

line : expr '\n'
expr : expr '+' term | term
term : term '*' fact | fact
fact : '(' expr ')' | DIGIT

LandY.58

Stack Performs RM Derivation in Reverse


CSE 4100

E + T
E + T * F
E + T * DIGIT
E + F * DIGIT
E + DIGIT * DIGIT
T + DIGIT * DIGIT
F + DIGIT * DIGIT
DIGIT + DIGIT * DIGIT

Stack contents after each step (top of stack on the left):
DIGIT
F
T
E
+ E
DIGIT + E
F + E
T + E
* T + E
DIGIT * T + E
F * T + E
T + E
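The stack behavior above can be sketched in plain C. This is not the table-driven parser that Yacc or Bison generates; it is a hand-coded shift-reduce loop for this one grammar that decides between shifting and reducing by looking at the top of the stack and one lookahead character. The names sr_eval, try_reduce, and reduce, and the single-letter symbol codes, are hypothetical and chosen for illustration.

```c
#include <ctype.h>
#include <string.h>

#define MAXDEPTH 64

static char sym[MAXDEPTH];  /* symbols: d = DIGIT, F, T, E, + * ( ) */
static int  val[MAXDEPTH];  /* semantic value stored beside each symbol */
static int  top;            /* number of entries currently on the stack */

/* Replace the top n entries by nonterminal lhs carrying value v. */
static void reduce(int n, char lhs, int v) {
    top -= n;
    sym[top] = lhs;
    val[top] = v;
    top++;
}

/* Attempt one reduction given lookahead la; return 1 if one was made. */
static int try_reduce(char la) {
    if (top >= 1 && sym[top-1] == 'd') {                  /* fact : DIGIT */
        reduce(1, 'F', val[top-1]); return 1;
    }
    if (top >= 3 && sym[top-1] == 'F' && sym[top-2] == '*' && sym[top-3] == 'T') {
        reduce(3, 'T', val[top-3] * val[top-1]); return 1; /* term : term '*' fact */
    }
    if (top >= 1 && sym[top-1] == 'F') {                  /* term : fact */
        reduce(1, 'T', val[top-1]); return 1;
    }
    if (top >= 1 && sym[top-1] == 'T' && la != '*') {
        if (top >= 3 && sym[top-2] == '+' && sym[top-3] == 'E')
            reduce(3, 'E', val[top-3] + val[top-1]);      /* expr : expr '+' term */
        else
            reduce(1, 'E', val[top-1]);                   /* expr : term */
        return 1;
    }
    if (top >= 3 && sym[top-1] == ')' && sym[top-2] == 'E' && sym[top-3] == '(') {
        reduce(3, 'F', val[top-2]); return 1;             /* fact : '(' expr ')' */
    }
    return 0;
}

/* Parse single-digit expressions; return 0 and set *result on success,
   or -1 on a syntax error. */
int sr_eval(const char *input, int *result) {
    size_t i = 0, n = strlen(input);
    top = 0;
    for (;;) {
        while (i < n && input[i] == ' ') i++;
        char la = (i < n) ? input[i] : '\0';
        if (try_reduce(la)) continue;        /* reduce whenever possible */
        if (la == '\0') break;               /* nothing left to shift */
        if (top >= MAXDEPTH) return -1;
        if (isdigit((unsigned char)la)) { sym[top] = 'd'; val[top] = la - '0'; }
        else if (strchr("+*()", la))    { sym[top] = la;  val[top] = 0; }
        else return -1;
        top++; i++;                          /* shift */
    }
    if (top == 1 && sym[0] == 'E') { *result = val[0]; return 0; }
    return -1;
}
```

For the input 5+3*8 the symbol stack passes through the same configurations listed above, and the value carried by the final E is 29.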

LandY.59

LALR State Machine

CSE 4100

(bison -v *.y) Generates y.output


state 0
    $accept : _line $end
    DIGIT shift 6
    ( shift 5
    . error
    line goto 1
    expr goto 2
    term goto 3
    fact goto 4
state 1
    $accept : line_$end
    $end accept
    . error
state 2
    line : expr_        (1)
    expr : expr_+ term
    + shift 7
    . reduce 1
state 3
    expr : term_        (3)
    term : term_* fact
    * shift 8
    . reduce 3
state 4
    term : fact_        (5)
    . reduce 5
state 5
    fact : (_expr )
    DIGIT shift 6
    ( shift 5
    . error
    expr goto 9
    term goto 3
    fact goto 4
state 6
    fact : DIGIT_       (7)
    . reduce 7
state 7
    expr : expr +_term
    DIGIT shift 6
    ( shift 5
    . error
    term goto 10
    fact goto 4
state 8
    term : term *_fact
    DIGIT shift 6
    ( shift 5
    . error
    fact goto 11


LandY.60

LALR State Machine


CSE 4100
state 9
    expr : expr_+ term
    fact : ( expr_)
    + shift 7
    ) shift 12
    . error
state 10
    expr : expr + term_ (2)
    term : term_* fact
    * shift 8
    . reduce 2
state 11
    term : term * fact_ (4)
    . reduce 4
state 12
    fact : ( expr )_    (6)
    . reduce 6

7/300 terminals, 4/300 nonterminals
8/600 grammar rules, 13/1000 states
0 shift/reduce, 0 reduce/reduce conflicts reported
8/350 working sets used
memory: states,etc. 69/24000, parser 9/12000
9/600 distinct lookahead sets
4 extra closures
13 shift entries, 1 exceptions
7 goto entries
3 entries saved by goto default
Optimizer space used: input 38/24000, output 218/12000
218 table entries, 205 zero
maximum spread: 257, maximum offset: 43
LandY.61

Defining Precedence
CSE 4100

%token NUMBER
%left '+' '-'       /* Left associative and Equal precedence */
%left '*' '/'
%right UMINUS       /* UMINUS: Highest precedence of all */
%%
expr : expr '+' expr          {$$ = $1 + $3;}
     | expr '-' expr          {$$ = $1 - $3;}
     | expr '*' expr          {$$ = $1 * $3;}
     | expr '/' expr          {$$ = $1 / $3;}
     | '(' expr ')'           {$$ = $2; }
     | '-' expr %prec UMINUS  {$$ = - $2; }
     | NUMBER
     ;
LandY.62

Automatic Ambiguity Resolution


CSE 4100

Input Grammar May be Ambiguous
Bison (and others) have Default Disambiguating Rules
    In a Shift/Reduce Conflict, the Shift is Chosen
    In a Reduce/Reduce Conflict, the Reduction is to Reduce by the earlier rule (listed from top-down)
Can't Control S/R Conflict Resolution
However, for R/R Resolution
    Reorder Rules to Force a Different Reduction
    Rewrite the Grammar to Remove Ambiguity
Other Error is: Rule Not Reduced
    If S/R Picks Shift, and Rule Never Reduced Elsewhere
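The default rules above, combined with the %left and %right declarations from the Defining Precedence slide, can be summarized as a small decision function. The sketch below is an illustration of the documented behavior, not code taken from Bison; the names resolve_shift_reduce and resolve_reduce_reduce and the convention that precedence 0 means "none declared" are assumptions, and %nonassoc is not modeled.

```c
enum assoc  { ASSOC_NONE, ASSOC_LEFT, ASSOC_RIGHT };
enum action { ACT_SHIFT, ACT_REDUCE };

/* Decide a shift/reduce conflict between a rule and a lookahead token.
   Precedence levels are positive; 0 means no precedence was declared. */
enum action resolve_shift_reduce(int rule_prec, int token_prec,
                                 enum assoc token_assoc) {
    if (rule_prec == 0 || token_prec == 0)
        return ACT_SHIFT;                 /* default rule: shift is chosen */
    if (rule_prec > token_prec)
        return ACT_REDUCE;                /* rule binds tighter */
    if (rule_prec < token_prec)
        return ACT_SHIFT;                 /* token binds tighter */
    /* equal precedence: associativity decides */
    return token_assoc == ASSOC_LEFT ? ACT_REDUCE : ACT_SHIFT;
}

/* Decide a reduce/reduce conflict: the rule listed earlier wins. */
int resolve_reduce_reduce(int rule_a, int rule_b) {
    return rule_a < rule_b ? rule_a : rule_b;
}
```

With '+' at level 1 and '*' at level 2, seeing '*' after expr '+' expr gives a shift, while seeing '+' after expr '+' expr gives a reduce because '+' is left associative.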

LandY.63

y.output as Generated by Bison


CSE 4100

State 3 contains 1 shift/reduce conflict.

Grammar
rule 1    statement -> if_then opt_else
rule 2    statement -> assign_stmt
rule 3    if_then -> T_IF rel_expr T_THEN statement
rule 4    opt_else -> /* empty */
rule 5    opt_else -> T_ELSE statement
rule 6    assign_stmt -> T_IDENTIFIER T_ASSIGN value
rule 7    value -> T_INTEGER
rule 8    value -> T_REAL
rule 9    value -> T_STRING
rule 10   rel_expr -> compare rel_op compare
rule 11   compare -> T_IDENTIFIER
rule 12   compare -> value
rule 13   rel_op -> T_EQ
rule 14   rel_op -> T_LT
rule 15   rel_op -> T_NE
rule 16   rel_op -> T_GE
rule 17   rel_op -> T_GT
LandY.64

y.output as Generated by Bison


CSE 4100

Terminals, with rules where they appear
$ (-1)
error (256)
T_IF (258) 3
T_THEN (259) 3
T_ELSE (260) 5
T_IDENTIFIER (261) 6 11
T_ASSIGN (262) 6
T_INTEGER (263) 7
T_REAL (264) 8
T_STRING (265) 9
T_EQ (266) 13
T_LT (267) 14
T_NE (268) 15
T_GE (269) 16
T_GT (270) 17

LandY.65

y.output as Generated by Bison


CSE 4100

Nonterminals, with rules where they appear
statement (16)    on left: 1 2, on right: 3 5
if_then (17)      on left: 3, on right: 1
opt_else (18)     on left: 4 5, on right: 1
assign_stmt (19)  on left: 6, on right: 2
value (20)        on left: 7 8 9, on right: 6 12
rel_expr (21)     on left: 10, on right: 3
compare (22)      on left: 11 12, on right: 10
rel_op (23)       on left: 13 14 15 16 17, on right: 10

LandY.66

y.output as Generated by Bison


CSE 4100

state 0
    T_IF          shift, and go to state 1
    T_IDENTIFIER  shift, and go to state 2
    statement     go to state 26
    if_then       go to state 3
    assign_stmt   go to state 4
state 1
    if_then -> T_IF . rel_expr T_THEN statement (rule 3)
    T_IDENTIFIER  shift, and go to state 5
    T_INTEGER     shift, and go to state 6
    T_REAL        shift, and go to state 7
    T_STRING      shift, and go to state 8
    value         go to state 9
    rel_expr      go to state 10
    compare       go to state 11
state 2
    assign_stmt -> T_IDENTIFIER . T_ASSIGN value (rule 6)
    T_ASSIGN      shift, and go to state 12

LandY.67

y.output as Generated by Bison


CSE 4100

state 3
    statement -> if_then . opt_else (rule 1)
    T_ELSE    shift, and go to state 13
    T_ELSE    [reduce using rule 4 (opt_else)]
    $default  reduce using rule 4 (opt_else)
    opt_else  go to state 14
... etc ...
state 25
    rel_expr -> compare rel_op compare . (rule 10)
    $default  reduce using rule 10 (rel_expr)
state 26
    $         go to state 27
state 27
    $         go to state 28
state 28
    $default  accept

LandY.68

Hints for Writing Yacc Specifications


CSE 4100

Use All Capital Letters for Token Names and All Lower Case for Non-Terminals (Helps Debugging)
Put Grammar Rules and Actions on Separate Lines (Makes Moving them Easier)
Put all Rules with Same Left Hand Side Together and Utilize Vertical Bar for Alternatives
Put a Semicolon After the Very Last Alternative for Each Left Hand Side and on a Separate Line
Yacc Encourages Left Recursion
LALR Discourages Right Recursion!

LandY.69

Project Part 2 Bison Fall 2011


CSE 4100

Two Tasks:

Note that when I last gave this project, I put intentional errors in both latex.in and latex.l. I think I took them all out of latex.l, but am not sure about latex.in.

LandY.70

Project Part 2 Bison Fall 2011


CSE 4100

Files on the Web Page:


latex.in        : A sample input file.
latex.l         : A sample latex flex file.
latexp2.y       : Contains a bison specification
latexp2clean.y  : Equivalent specification no fprintfs.
projp2.tex      : This file - sample latex input.
projp2.pdf      : PDF version of Project, part 2.
output          : Generated file with parsing rules fired
typescript      : Output of latex.l (tokens recognized)
b2conflicts.txt : S/R and R/R Conflicts from Bison
bison.debug.txt : Short Overview of Bison -v Output
latexp2.output  : Complete Bison -v Output
latexp2.tab.c   : Parser Generated by Bison
lex.yy.c        : Lexical Analyzer Generated by Flex

LandY.71

Project Part 2 Latex.l Spec Fall 2011


%{
/* THIS IS LATEX.L */
%}
ws       [ \t\n]+
integer  [0-9]+
punc     (\.|\,|\!|\?|\:|\;)
word     ({punc}|[a-zA-Z0-9])*
special  (\%|\_|\&|\$|\#)
%%
"\\\\"            {printf(" %s \n", yytext);fflush(stdout); return(DBLBS);}
"\\"              {printf(" %s \n", yytext);fflush(stdout); return(BACKSL);}
"{"               {printf(" %s \n", yytext);fflush(stdout); return(LCURLYB);}
"}"               {printf(" %s \n", yytext);fflush(stdout); return(RCURLYB);}
{special}         {printf(" %s \n", yytext);fflush(stdout); return(SPECCHAR);}
"["               {printf(" %s \n", yytext);fflush(stdout); return(LSQRB);}
"]"               {printf(" %s \n", yytext);fflush(stdout); return(RSQRB);}
"alph"            {printf(" %s \n", yytext);fflush(stdout); return(LALPH );}
"Alph"            {printf(" %s \n", yytext);fflush(stdout); return(CALPH);}
"arabic"          {printf(" %s \n", yytext);fflush(stdout); return(ARABIC);}
"baselinestretch" {printf(" %s \n", yytext);fflush(stdout);return(BASELINES);}
"begin"           {printf(" %s \n", yytext);fflush(stdout); return(LBEGIN);}
"caption"         {printf(" %s \n", yytext);fflush(stdout); return(CAPTION);}
"center"          {printf(" %s \n", yytext);fflush(stdout); return(CENTER );}
"document"        {printf(" %s \n", yytext);fflush(stdout); return(DOCUMENT);}
"end"             {printf(" %s \n", yytext);fflush(stdout); return(END);}
"enumerate"       {printf(" %s \n", yytext);fflush(stdout); return(ENUMERATE);}

LandY.72

Project Part 2 Latex.l Spec Fall 2011


"hspace"          {printf(" %s \n", yytext);fflush(stdout); return(HSPACE);}
"itemize"         {printf(" %s \n", yytext);fflush(stdout); return(ITEMIZE);}
"item"            {printf(" %s \n", yytext);fflush(stdout); return(ITEM);}
"it"              {printf(" %s \n", yytext);fflush(stdout); return(IT);}
"label"           {printf(" %s \n", yytext);fflush(stdout); return(LABEL);}
"noindent"        {printf(" %s \n", yytext);fflush(stdout); return(NOINDENT);}
"pagenumbering"   {printf(" %s \n", yytext);fflush(stdout); return(PAGENUM);}
"ref"             {printf(" %s \n", yytext);fflush(stdout); return(REF);}
"renewcommand"    {printf(" %s \n", yytext);fflush(stdout); return(RENEW);}
"roman"           {printf(" %s \n", yytext);fflush(stdout); return(LROMAN);}
"Roman"           {printf(" %s \n", yytext);fflush(stdout); return(CROMAN);}
"rm"              {printf(" %s \n", yytext);fflush(stdout); return(RM);}
"section"         {printf(" %s \n", yytext);fflush(stdout); return(SECTION);}
"single"          {printf(" %s \n", yytext);fflush(stdout); return(SINGLE);}
"subsection"      {printf(" %s \n", yytext);fflush(stdout); return(SUBSEC);}
"tableofcontents" {printf(" %s \n", yytext);fflush(stdout); return(TABOCON);}
"table"           {printf(" %s \n", yytext);fflush(stdout); return(TABLE);}
"tablular"        {printf(" %s \n", yytext);fflush(stdout); return(TABULAR);}
"verbatim"        {printf(" %s \n", yytext);fflush(stdout); return(VERBATIM);}
"vspace"          {printf(" %s \n", yytext);fflush(stdout); return(VSPACE);}
"b"               {printf(" %s \n", yytext);fflush(stdout); return(B);}
"c"               {printf(" %s \n", yytext);fflush(stdout); return(C);}
"h"               {printf(" %s \n", yytext);fflush(stdout); return(H);}
"l"               {printf(" %s \n", yytext);fflush(stdout); return(L);}
"r"               {printf(" %s \n", yytext);fflush(stdout); return(R);}
"t"               {printf(" %s \n", yytext);fflush(stdout); return(T);}
{integer}         {printf(" %s \n", yytext);fflush(stdout); return(INTEGER);}
{word}            {printf(" %s \n", yytext);fflush(stdout); return(WORD);}
{ws}              { /* DO NOTHING */ }

LandY.73

Project Part 2 Bison Fall 2011


CSE 4100

%{
/* A VERSION OF YACC WITH FPRINTS FOR PROJECT, PART 2 */
#include <stdio.h>
#include <ctype.h>

/* Define Global Vars */
FILE *fp;
%}
%start latexstatement
%token BACKSL   LBEGIN    LCURLYB   DOCUMENT RCURLYB  END
%token WORD     WSWORD    SPECCHAR  CENTER   VERBATIM SINGLE
%token ITEMIZE  ENUMERATE TABULAR   TABLE    LSQRB    RSQRB
%token H        T         B         R        C        L
%token CAPTION  LABEL     DBLBS     ITEM     SECTION  SUBSEC
%token TABOCON  RENEW     BASELINES INTEGER  PAGENUM  ARABIC
%token LROMAN   CROMAN    LALPH     CALPH    VSPACE   HSPACE
%token RM       IT        NOINDENT  REF
%%
latexstatement : startdoc mainbody enddoc
                 {fprintf(fp,"after latexstatement\n");}
               ;


LandY.74

Project Part 2 Bison Fall 2011


startdoc   : BACKSL LBEGIN LCURLYB DOCUMENT RCURLYB
             {fprintf(fp,"after startdoc\n");}
           ;
enddoc     : BACKSL END LCURLYB DOCUMENT RCURLYB
             {fprintf(fp,"after enddoc\n");}
           ;
mainbody   : mainbody mainoption
             {fprintf(fp,"after mainbody1\n");}
           | mainoption
             {fprintf(fp,"after mainbody2\n");}
           ;
mainoption : textoption
             {fprintf(fp,"after mainoption1\n");}
           | commentoption
             {fprintf(fp,"after mainoption2\n");}
           | latexoptions
             {fprintf(fp,"after mainoption3\n");}
           ;
textoption : textoption WORD
             {fprintf(fp,"after textoption1\n");}
           | WORD
             {fprintf(fp,"after textoption2\n");}
           ;
LandY.75

Project Part 2 Bison Fall 2011


/* LOTS OF STUFF MISSING */
fonts       : RM
              {fprintf(fp,"after fonts1\n");}
            | IT
              {fprintf(fp,"after fonts2\n");}
            ;
specialchar : SPECCHAR
            | LCURLYB
            | RCURLYB
            ;
nonewpara   : NOINDENT
            ;
reference   : REF LCURLYB WORD RCURLYB
            ;
%%
#include "lex.yy.c"
yyerror(){}
main() {
    fp = fopen("output","w");
    yyparse();
}

LandY.76

Project Part 2 Bison Fall 2011


CSE 4100

Remove Following from end of guiho.l File

yywrap(){}

main() {
    int i;
    do {
        i = yylex();
    } while (i!=0);
}

Building lex.yy.c and Compiling/Executing:
    ssh to Linux
    flex latex.l
    bison -v latexp2.y
    gcc latexp2.tab.c -lfl
    a.out < latex.in
LandY.77

What Does Bison -v Generate?


Rules never reduced
    35 beginblock: listblock
    56 optcaption: /* empty */
    58 optlabel: /* empty */

State 12 conflicts: 1 shift/reduce
State 53 conflicts: 1 shift/reduce
State 67 conflicts: 1 shift/reduce
State 70 conflicts: 1 shift/reduce
State 73 conflicts: 1 shift/reduce
State 102 conflicts: 1 shift/reduce

Grammar
    0 $accept: latexstatement $end
    1 latexstatement: startdoc mainbody enddoc
    2 startdoc: BACKSL LBEGIN LCURLYB DOCUMENT RCURLYB
    3 enddoc: BACKSL END LCURLYB DOCUMENT RCURLYB
    4 mainbody: mainbody mainoption
    5         | mainoption
    ... Missing Rules ...
    89 nonewpara: NOINDENT
    90 reference: REF LCURLYB WORD RCURLYB
LandY.78

What Does Bison -v Generate?


Terminals, with rules where they appear
    $end (0) 0
    error (256)
    BACKSL (258) 2 3 14 16 30 57 59 65 73
    LBEGIN (259) 2 29
    LCURLYB (260) 2 3 15 29 30 45 57 59 70 71 73 74 80 81 87 90
    ... Missing Terminals ...
    NOINDENT (302) 89
    REF (303) 90

Nonterminals, with rules where they appear
    $accept (49)        on left: 0
    latexstatement (50) on left: 1, on right: 0
    startdoc (51)       on left: 2, on right: 1
    ... Missing Nonterminals ...
    specialchar (88)    on left: 86 87 88, on right: 25
    nonewpara (89)      on left: 89, on right: 26
    reference (90)      on left: 90, on right: 27

LandY.79

These are the Item Sets!


state 0
    0 $accept: . latexstatement $end
    BACKSL          shift, and go to state 1
    latexstatement  go to state 2
    startdoc        go to state 3
state 1
    2 startdoc: BACKSL . LBEGIN LCURLYB DOCUMENT RCURLYB
    LBEGIN          shift, and go to state 4
state 2
    0 $accept: latexstatement . $end
    $end            shift, and go to state 5
state 3
    1 latexstatement: startdoc . mainbody enddoc
    BACKSL          shift, and go to state 6
    LCURLYB         shift, and go to state 7
    WORD            shift, and go to state 8
    SPECCHAR        shift, and go to state 9
    mainbody        go to state 10
    mainoption      go to state 11
    textoption      go to state 12
    commentoption   go to state 13
    latexoptions    go to state 14

LandY.80

What are S/R Errors? (b2conflicts.txt)


State 12
    6 mainoption: textoption .
    9 textoption: textoption . WORD
    WORD      shift, and go to state 57
    WORD      [reduce using rule 6 (mainoption)]
    $default  reduce using rule 6 (mainoption)
state 53
    9 textoption: textoption . WORD
    13 commentoption: SPECCHAR textoption .
    WORD      shift, and go to state 57
    WORD      [reduce using rule 13 (commentoption)]
    $default  reduce using rule 13 (commentoption)

LandY.81

What are S/R Errors? (b2conflicts.txt)


state 67
    9 textoption: textoption . WORD
    32 beginblock: textoption .
    61 centerblock: textoption . DBLBS
    69 tableentry: textoption .
    WORD      shift, and go to state 57
    DBLBS     shift, and go to state 97
    SPECCHAR  reduce using rule 69 (tableentry)
    DBLBS     [reduce using rule 69 (tableentry)]
    $default  reduce using rule 32 (beginblock)
state 70
    28 beginendopts: beginoptions beginblock . endoptions
    BACKSL        shift, and go to state 99
    BACKSL        [reduce using rule 56 (optcaption)]
    endoptions    go to state 100
    endtableopts  go to state 101
    optcaption    go to state 102

LandY.82

What are S/R Errors? (b2conflicts.txt)


state 73
    35 beginblock: listblock .
    63 listblock: listblock . anitem
    BACKSL    shift, and go to state 65
    BACKSL    [reduce using rule 35 (beginblock)]
    anitem    go to state 104
state 102
    55 endtableopts: optcaption . optlabel
    BACKSL    shift, and go to state 122
    BACKSL    [reduce using rule 58 (optlabel)]
    optlabel  go to state 123

LandY.83

Project Part 2 Bison Fall 2011


CSE 4100

What is Actually Occurring (see Part 2 Spec)
Three Rules Never Used:
    35 beginblock: listblock
    56 optcaption: /* empty */
    58 optlabel: /* empty */
    Certain Grammar Combos Can't Occur
Six Shift/Reduce Errors
    Explore the Item Set and the Involved Grammar Rules
    Shift Always Picked
    What is the Grammar Behavior (rule that is Fired) based on that?
    What are the Options to Fix the Problem?

LandY.84

Project Part 2 Bison Fall 2011


CSE 4100

What's the Solution?
Need to Rework the Grammar so that All of the S/R Errors that Cause Problems and the Rules Not Reduced are Rectified
Try Rewriting the Grammar Rules
OK to Introduce S/R and R/R Conflicts as Long as the Program Still Parses and there are no Rules Not Reduced
This means: Does it Run on All Test Cases!

LandY.85

What Tools Do We Have? typescript File


THIS SHOWS WHERE THE TOKENS STOPPED BEING PROCESSED:
Script started on Sun 23 Oct 2011 11:38:50 AM EDT
steve@icarus2:~/LP2# a.out < latex.in
\ begin { document } \ pagenumbering { arabic } \ arabic { 5
ETC.
to be centered \ end
steve@icarus2:~/LP2# exit
LandY.86

What Tools Do We Have? output File


CREATED BY FPRINTS SHOWS THE GRAMMAR RULES FIRED after AND WHERE IT FAILED textoption1 CSE after startdoc after mainoption3 after anitem 4100 after pagenumbers after mainbody1 after listblock1 after backsoptions5 after textoption2 after textoption2 after latexoptions1 after textoption1 after textoption1 after mainoption3 after textoption1 after textoption1 after mainbody2 after textoption1 after textoption1 after pagenuminit after textoption1 after textoption1 after backsoptions6 after textoption1 after textoption1 after latexoptions1 after textoption1 after anitem after mainoption3 after textoption1 after listblock1 after mainbody1 after textoption1 after textoption2 after linespacing after textoption1 after textoption1 after backsoptions4 after textoption1 after textoption1 after latexoptions1 after textoption1 after textoption1 after mainoption3 after textoption1 after textoption1 after mainbody1 after textoption1 after textoption1 after backsoptions3 after textoption1 after textoption1 after latexoptions1 after textoption1 after anitem after mainoption3 after textoption1 after listblock1 after mainbody1 after textoption1 after textoption2 after textoption1 FAILED AT THIS POINT after sectionoptions1 after textoption1 IN CONJUNCTION WITH after backsoptions2 after textoption1 TOKEN FROM TYPESCRIPT after latexoptions1 ETC

LandY.87

Project Part 2 Bison Fall 2011


CSE 4100

Recall Two Tasks:

Task 1 Involves Fixing S/R and Rules Not Reduced Errors
    Generate Revised latexp2.y
Task 2 Involves Separate Activity to Support Nested Blocks and Verbatim
    May Require Grammar and Perhaps flex Changes
    Need to Recognize white space for Verbatim
LandY.88

Project Part 2 Bison Fall 2011


CSE 4100

Hand in Requirements:
Log File for Grammar Changes to eliminate the shift/reduce errors and other problems for Task 1
    Track Original Grammar Segments and Revisions
    Hand in Revised Grammar for Task 1
Log File for Grammar Changes to support Nested Blocks and Verbatim for Task 2
    Track Original Grammar Segments and Revisions
    Hand in Revised Grammar for Task 2
Test Cases for both Tasks (own Test Cases)
Compilation Instructions if Different from Default

LandY.89

Advice re. Project 2 Task 1


CSE 4100

Need to Focus on Rules not Reduced
See Proj2Advice.doc that was Emailed (We'll Briefly Review)
Note that out of the six S/R errors, two do not need to be fixed
    For those two, need to Examine the State, the involved Rules, and the Shift/Reduce Conflict
    The Reduction May not Occur in that State, but if it Occurs in Another State it May be OK
    Test a Sample Input Associated with the Grammar Rules that are Involved

LandY.90

Optcaption and optlabel Grammar Rules


beginoptions : LBEGIN LCURLYB begendcmds RCURLYB begtableopts
             ;
endoptions   : endtableopts BACKSL END LCURLYB begendcmds RCURLYB
             ;
begtableopts : LSQRB position RSQRB
             | LCURLYB tablespec RCURLYB
             | /* epsilon move */
             ;
position     : H | T | B
             ;
tablespec    : tablespec colspec
             | colspec
             ;
colspec      : R | C | L
             ;
endtableopts : optcaption optlabel
             ;
optcaption   : /* epsilon move */
             | BACKSL CAPTION LCURLYB textoption RCURLYB
             ;
optlabel     : /* epsilon move */
             | BACKSL LABEL LCURLYB WORD RCURLYB
             ;
LandY.91

Optcaption and optlabel Grammar Rules


CSE 4100

What are the four possibilities?
    1. neither is present
    2. optcaption only
    3. optlabel only
    4. both present
Can you rewrite the grammar rules above to precisely cover these four options more explicitly?
Can you alleviate the epsilon-epsilon possibility in endtableopts?
You still want that option, but if you can get the other three non-empty options (2, 3, and 4) recognized then you will likely also be able to recognize the epsilon-epsilon case.

LandY.92

Rule Not Reduced and State 73


CSE 4100

Rule not reduced: 35 beginblock: listblock
state 73
    35 beginblock: listblock .
    63 listblock: listblock . anitem
    BACKSL    shift, and go to state 65
    BACKSL    [reduce using rule 35 (beginblock)]
    anitem    go to state 104

This always SHIFTS when seeing anitem that starts with a BACKSL!
State 65 is Processing the anitem Rule

LandY.93

Rule Not Reduced and State 73


CSE 4100

\begin{document}
\begin{itemize}
\item Single is for Single spacing
\item Hello again
\end{itemize}
\end{document}
\
begin { document } \ begin { itemize } after item after after Single after is for after after Single after spacing after \ item after after Hello after again after \ end after after after startdoc begendcmds4 begtableopts3 beginoptions textoption2 textoption1 textoption1 textoption1 textoption1 anitem listblock2 textoption2 textoption1 anitem listblock1

Issue: Thinks \ will be Followed by item while Instead it is followed By end

LandY.94

Revisiting First Example via Attr. Grammars


CSE 4100

%{
/* Includes and Global Variables here */
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *msg);
%}
%start line
%token DIGIT
%%
/* Grammar Rules */
line : expr '\n'
     ;
expr : expr '+' term
     | term
     ;
term : term '*' fact
     | fact
     ;
fact : '(' expr ')'
     | DIGIT
     ;
%%
/* Define own yylex */
int yylex(void) {
    int c;
    c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';
        return DIGIT;
    }
    return c;
}
/* Error Routine */
void yyerror(const char *msg) { (void)msg; }
/* yyparse calls yylex */
int main(void) { return yyparse(); }

LandY.95

How Do Grammar Rules Fire?


CSE 4100

Just like Attribute Grammars!
Input: 5 + 3 * 8

E
E + T
E + T * F
E + T * DIGIT
E + F * DIGIT
E + DIGIT * DIGIT
T + DIGIT * DIGIT
F + DIGIT * DIGIT
DIGIT + DIGIT * DIGIT

line : expr '\n'
expr : expr '+' term | term
term : term '*' fact | fact
fact : '(' expr ')' | DIGIT

LandY.96

Stack Performs RM Derivation in Reverse


CSE 4100

E + T
E + T * F
E + T * DIGIT
E + F * DIGIT
E + DIGIT * DIGIT
T + DIGIT * DIGIT
F + DIGIT * DIGIT
DIGIT + DIGIT * DIGIT

Stack contents after each step (top of stack on the left):
DIGIT
F
T
E
+ E
DIGIT + E
F + E
T + E
* T + E
DIGIT * T + E
F * T + E
T + E

LandY.97

Corresponding Attribute Grammar


CSE 4100

val is a synthesized attribute


line : expr             {line.val = expr.val}
     ;
expr : expr1 '+' term   {expr.val = expr1.val + term.val}
     | term             {expr.val = term.val}
term : term1 '*' fact   {term.val = term1.val * fact.val}
     | fact             {term.val = fact.val}
fact : '(' expr ')'     {fact.val = expr.val}
     | DIGIT            {fact.val = DIGIT.lexval}
LandY.98

How Does this Transition into Bison?


CSE 4100

Bison (in y.tab.c) Maintains User-Accessible Parsing Stack Defined as:


#ifndef YYSTYPE
#define YYSTYPE int
#endif
YYSTYPE yylval, yyval;
YYSTYPE yyv[YYMAXDEPTH];

yyv

$3 $2 $1

Consider Grammar Rule S -> A B C Eventually, A B C on Stack to be Replaced by S in Reduction For that Rule, Offsets into Parsing Stack are Defined as: $1 = A, $2 = B, $3 = C
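The offset idea can be sketched directly in C. The code below is an illustration of how $1, $2, and $3 map onto the top entries of a value stack at the moment S -> A B C is reduced; it is not the code that Bison emits. The helper names push_value, dollar, reduce_S, and stack_depth are hypothetical, and YYMAXDEPTH is given an arbitrary size for the example.

```c
#define YYMAXDEPTH 150          /* arbitrary size chosen for this sketch */
typedef int YYSTYPE;

static YYSTYPE yyv[YYMAXDEPTH]; /* value stack, named as on the slide */
static int depth = 0;           /* hypothetical: number of values on the stack */

/* Push one semantic value (ignored if the stack is full). */
void push_value(YYSTYPE v) {
    if (depth < YYMAXDEPTH)
        yyv[depth++] = v;
}

/* Value of $i for a rule whose right-hand side has n symbols (1 <= i <= n).
   $n is the top of the stack and $1 sits n-1 entries below it. */
YYSTYPE dollar(int n, int i) {
    return yyv[depth - n + i - 1];
}

/* Reduce by S -> A B C with the action $$ = $1 + $2 + $3:
   pop the three right-hand-side values and push the result for S. */
YYSTYPE reduce_S(void) {
    YYSTYPE result = dollar(3, 1) + dollar(3, 2) + dollar(3, 3);
    depth -= 3;
    push_value(result);
    return result;
}

/* Report how many values are on the stack. */
int stack_depth(void) { return depth; }
```

After pushing 5, 7, and 9 for A, B, and C, the reduction leaves a single entry holding 21, which is the value a later rule would see as the $n for S.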
LandY.99

How Does this Transition into Yacc?


CSE 4100

yyv

$3 $2 $1

Consider Grammar Rule S -> A B C (all are nonterminals) Eventually, A B C on Stack to be Replaced by S in Reduction For that Rule, Offsets into Parsing Stack are Defined as: $1 = A, $2 = B, $3 = C

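S : A {$1 = 5;}
    B {$2 = 7;}
    C {$3 = 9; $$ = $1 + $2 + $3;}
  ;

Note: the numbering above is schematic. In Yacc/Bison, each embedded (mid-rule) action also counts as a component of the rule, so with the two embedded actions shown, B and C would actually be $3 and $5.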


LandY.100

Revisiting the Attribute Grammar


CSE 4100

line : expr             {line.val = expr.val}                 $$ = $1
expr : expr1 '+' term   {expr.val = expr1.val + term.val}     $$ = $1 + $3
     | term             {expr.val = term.val}                 $$ = $1
term : term1 '*' fact   {term.val = term1.val * fact.val}     $$ = $1 * $3
     | fact             {term.val = fact.val}                 $$ = $1
fact : '(' expr ')'     {fact.val = expr.val}                 $$ = $2
     | DIGIT            {fact.val = DIGIT.lexval}             $$ = char_to_int(yytext)
LandY.101

Interactions Between Lex and Yacc


CSE 4100

IN LEX:
    char yytext[YYLMAX];
    int yyleng;
    yytext: globally passes lexeme to parser
    yylval: Set in lexical analyzer
        Returns Token value
        What is placed in stack yyv

IN YACC:
    #ifndef YYSTYPE
    #define YYSTYPE int
    #endif
    YYSTYPE yylval, yyval;
    YYSTYPE yyv[YYMAXDEPTH];

    For rule S -> A B C: $$ is S, and $1, $2, $3 are A, B, C on stack yyv

LandY.102

Pascal to C Conversion
CSE 4100

Utilize a Limited Subset of Pascal
    If-Then-Else and Assignment Statements
    Relational (Boolean) Expressions and Operators
Conversions of Note:
    If-Then-Else goes to If-Else (no then in C)
    = Goes to ==
    <> Goes to !=
    := Goes to =
Key Issues
    Define String Variables to Hold Concatenated Program
    Bottom Up Construction Utilizes Current Lexeme (yytext) Concatenated with Appropriate Conversions
    Information Passes Up the Grammar
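The operator conversions listed above amount to a small lookup table. As a minimal sketch, the function below maps a Pascal operator lexeme to its C spelling; the name pascal_op_to_c is hypothetical, and in the Yacc specification itself this mapping is done by the rel_op and assign_stmt actions rather than by a helper like this.

```c
#include <string.h>
#include <stddef.h>

/* Map a Pascal operator lexeme to its C spelling.
   Returns NULL when the lexeme is not one of the handled operators. */
const char *pascal_op_to_c(const char *op) {
    static const char *const table[][2] = {
        { ":=", "="  },   /* assignment */
        { "=",  "==" },   /* equality */
        { "<>", "!=" },   /* inequality */
        { "<",  "<"  },
        { "<=", "<=" },
        { ">",  ">"  },
        { ">=", ">=" },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(op, table[i][0]) == 0)
            return table[i][1];
    }
    return NULL;
}
```

Only the first three entries change spelling; the remaining relational operators are the same in both languages and are listed so that every operator token has an entry.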
LandY.103

Pascal to C Conversion
CSE 4100

%{
#include <stdio.h>
#include <ctype.h>
#include <string.h>
char strans[100], atrans[100], itrans[100], etrans[100],
     vtrans[100], retrans[100], ctrans[100], rtrans[100];
%}
%start statement
%token T_IF T_THEN T_ELSE T_IDENTIFIER T_ASSIGN T_INTEGER T_REAL
%token T_STRING T_EQ T_LT T_LE T_NE T_GE T_GT
%%
statement : if_then opt_else
              {strcpy(strans, itrans); strcat(strans, etrans);
               printf("%s\n", strans);}
          | assign_stmt
              {strcpy(strans, atrans); printf("%s\n", strans);}
          ;
if_then   : T_IF rel_expr
              {strcpy(itrans, "if "); strcat(itrans, retrans);}
            T_THEN assign_stmt
              {strcat(itrans, atrans);}
          ;

LandY.104

Pascal to C Conversion
CSE 4100

opt_else    : /* the empty case */
                {strcpy(etrans, "");}
            | T_ELSE assign_stmt
                {strcpy(etrans, " else "); strcat(etrans, atrans);}
            ;
assign_stmt : T_IDENTIFIER {strcpy(atrans, yytext);}
              T_ASSIGN     {strcat(atrans, "=");}
              value        {strcat(atrans, vtrans);}
            ;
value       : T_INTEGER {strcpy(vtrans, yytext);}
            | T_REAL    {strcpy(vtrans, yytext);}
            | T_STRING  {strcpy(vtrans, yytext);}
            ;
rel_expr    : compare {strcpy(retrans, ctrans);}
              rel_op  {strcat(retrans, rtrans);}
              compare {strcat(retrans, ctrans);}
            ;
LandY.105

Pascal to C Conversion
CSE 4100

compare : T_IDENTIFIER {strcpy(ctrans, yytext);}
        | value        {strcpy(ctrans, yytext);}
        ;
rel_op  : T_EQ {strcpy(rtrans, "==");}
        | T_LT {strcpy(rtrans, "<");}
        | T_LE {strcpy(rtrans, "<=");}
        | T_NE {strcpy(rtrans, "!=");}
        | T_GE {strcpy(rtrans, ">=");}
        | T_GT {strcpy(rtrans, ">");}
        ;
%%
#include "lex.yy.c"
yyerror(){}
main() { yyparse(); }

LandY.106

What would Pascal to C Generate?


CSE 4100

/* SAMPLE INPUT ... */
procedure MAIN is
    X, Y: INTEGER;
    A, B, C: FLOAT;
    D, E: CHARACTER;
begin
    if (X = Y) and (Z /= W) then
        Z:= X;
        if (A <= B) then
            A := B;
        end if;
        X := X + 1;
    else
        Y:=Y+1;
    end if;
    A :=B +C * D;
    A :=B * C / D;
end MAIN;

LandY.107

What would Pascal to C Generate?


CSE 4100

/* AND OUTPUT */
TYPE BEING CONVERTED TO: int
TYPE BEING CONVERTED TO: float
TYPE BEING CONVERTED TO: char
assign_stmt*** Z = X ;
assign stmt*** A = B ;
if stmt*** if ( A <= B { A = B ; }
assign stmt*** X = X + 1 ;
assign_stmt*** Y = Y + 1 ;
if stmt*** if ( X == Y && Z != W { Z =- X ; if ( A <= B { A = B ; } X = X + 1 ; } else { Y = Y + 1; }
assign_stmt*** A = B + C * D ;
assign_stmt*** A = B * C / D ;

LandY.108

Redefine Parsing Stack


CSE 4100

%{
#include <stdio.h>
#include <ctype.h>
typedef char *stype;
#define YYSTYPE stype
char strans[100], atrans[100], itrans[100], etrans[100],
     vtrans[100], retrans[100], ctrans[100], rtrans[100];
%}
. . . Etc . . .
%%
statement : if_then opt_else
              {strcat(itrans, etrans); $$ = itrans; printf("%s\n", $$);}
          | assign_stmt
              {$$ = atrans; printf("%s\n", $$);}
          ;

IN Y.TAB.C, THIS REDEFINES THE CONTENTS OF THE PARSING STACK:
#ifndef YYSTYPE
#define YYSTYPE int
#endif
YYSTYPE yylval, yyval;
YYSTYPE yyv[YYMAXDEPTH];
LandY.109

Utilizing Unions to Redefine Parsing Stack


CSE 4100

Unions Define Ability of Data Structure to be of Multiple Types (one or other attribute active)
Consider the C Union Definition:

union EITHEROR   /* Union Type Name */
{
    char trans[100];
    int XYZ;
} EOR;           /* Variable Name */

EOR.trans is a string (use strcpy, strcat, etc.)
EOR.XYZ is an int (use assignment, boolean expr, etc.)

Only trans or XYZ has a value but NOT both!
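The either/or behavior can be seen with a few lines of plain C. This is a small sketch using the same union layout as above; the helper names eor_set_string and eor_set_int are hypothetical and exist only to make the two ways of using the union explicit.

```c
#include <string.h>

union EITHEROR {
    char trans[100];
    int  XYZ;
};

/* Store a translation string in the union (truncated to fit) and
   return the length of what was stored. */
size_t eor_set_string(union EITHEROR *e, const char *s) {
    strncpy(e->trans, s, sizeof e->trans - 1);
    e->trans[sizeof e->trans - 1] = '\0';
    return strlen(e->trans);
}

/* Store an int in the union and return it. Both members share the same
   storage, so this overwrites the first bytes of trans. */
int eor_set_int(union EITHEROR *e, int v) {
    e->XYZ = v;
    return e->XYZ;
}
```

Because both members occupy the same memory, the union is at least as large as its biggest member (the 100-byte array here), and whichever member was written last is the only one that holds a meaningful value.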

LandY.110

Utilizing Unions to Redefine Parsing Stack


CSE 4100

%{
#include <stdio.h>
#include <ctype.h>
%}
%start statement
%union {                 /* Union Definition */
    char trans[100];
    int XYZ;
}
%token T_IF T_THEN T_ELSE T_IDENTIFIER
%token T_STRING T_ASSIGN T_INTEGER T_REAL
%token T_EQ T_LT T_LE T_NE T_GE T_GT

/* Redefines nonterminals of type <trans> to allow them to be that part of the union */
%type <trans> statement if_then opt_else
%type <trans> assign_stmt value compare
%type <trans> rel_op variable rel_expr

/* ALSO, types and tokens for XYZ are possible */
%%


LandY.111

Utilizing Unions to Redefine Parsing Stack


CSE 4100

What Does the Parsing Stack now Contain


IN YACC:
    YYSTYPE yylval, yyval;
    YYSTYPE yyv[YYMAXDEPTH];

IN LEX:
    char yytext[YYLMAX];
    int yyleng;

THIS EFFECTIVELY REPLACES YYSTYPE:
    %union {
        char trans[100];
        int XYZ;
    }

For rule S -> A B C, the stack yyv holds $1, $2, $3 (and $$ for S), and each entry is accessed through a union member:
    $$.trans
    $1.XYZ
    $2.trans
    Etc.

LandY.112

Unions for Pascal to C Conversion


CSE 4100

statement : if_then opt_else
              {strcpy($$, $1); strcat($$, $2); printf("%s\n", $$);}
          | assign_stmt
              {strcpy($$, $1); printf("%s\n", $$);}
          ;
if_then   : T_IF rel_expr T_THEN assign_stmt
              {strcpy($$, " if ");
               strcat($$, $2);
               strcat($$, $4);}
          ;
opt_else  : /* the empty case */
              {strcpy($$, "");}
          | T_ELSE assign_stmt
              {strcpy($$, " else "); strcat($$, $2);}
          ;
LandY.113

Unions for Pascal to C Conversion


CSE 4100

assign_stmt : variable T_ASSIGN value
                {strcpy($$, $1); strcat($$, " = "); strcat($$, $3);}
            ;
value       : T_INTEGER {strcpy($$, yytext);}
            | T_REAL    {strcpy($$, yytext);}
            | T_STRING  {strcpy($$, yytext);}
            ;
rel_expr    : compare rel_op compare
                {strcpy($$, $1); strcat($$, $2); strcat($$, $3);}
            ;
compare     : T_IDENTIFIER {strcpy($$, yytext);}
            | value        {strcpy($$, yytext);}
            ;
LandY.114

Unions for Pascal to C Conversion


CSE 4100

variable : T_IDENTIFIER {strcpy($$, yytext);}
         ;
rel_op   : T_EQ {strcpy($$, " == ");}
         | T_LT {strcpy($$, " < ");}
         | T_LE {strcpy($$, " <= ");}
         | T_NE {strcpy($$, " != ");}
         | T_GE {strcpy($$, " >= ");}
         | T_GT {strcpy($$, " > ");}
         ;
%%
#include "lex.yy.c"
yyerror(){}
yywrap(){}
main() { yyparse(); }

LandY.115

Also Possible to Redefine Tokens


CSE 4100

%{
#include <stdio.h>
#include <ctype.h>
%}
%start statement
%union {
    char trans[100];
    int XYZ;
}
%token T_IF T_THEN T_ELSE T_IDENTIFIER
%token T_STRING T_ASSIGN T_INTEGER T_REAL
%token T_EQ T_LT T_LE T_NE T_GE T_GT

%type <trans> T_IDENTIFIER T_ASSIGN etc . . .
%type <trans> statement if_then opt_else
%type <trans> assign_stmt value compare
%type <trans> rel_op variable rel_expr
/* ALSO, types and tokens for XYZ are possible */
%%
LandY.116

Also Possible to Redefine Tokens


CSE 4100

assign_stmt : T_IDENTIFIER T_ASSIGN value
                {strcpy($$, $1); strcat($$, " = "); strcat($$, $3);}
            ;
value       : T_INTEGER {strcpy($$, yytext);}
            | T_REAL    {strcpy($$, yytext);}
            | T_STRING  {strcpy($$, yytext);}
            ;

LandY.117

Project Part 3 Bison Fall 2011


CSE 4100

Using Bison for Syntax Directed Translation
Implementation of Attribute Grammar
Given Input Latex File:
    Basic Text Processing Capabilities
    Advanced Text Processing Capabilities
    Nested Blocks in Single Environment
    Full Blown Verbatim
    Type checking for
        Begin/End Blocks
        Combinations of Blocks
        Tabular Specification
Documentation (written using your Latex Syntax Directed Translator and Document Generator)
LandY.118

Files on Web Page


CSE 4100

latex.l         : Common lexical analyzer specification
latexp3c.y      : Yacc file with nested blocks, WS, and verbatim along with basic code generation
latexp3c.output : S/R and R/R Conflicts - Are all OK?
generate.c      : Basic routines for formatted text generation
util.c          : Utility routines
latex.input.txt : Sample input
latexout.txt    : Generated output for sample (with errors!)
latextoc.txt    : Generated table of contents for sample
proj3gs.doc     : Grading Sheet - place initials next to which parts each person on the team was primarily responsible for.

LandY.119

Project Part 3 Bison Final Project Requirements

Your Revised latexp3c.y file
    You may have multiple versions for each of the major Document Generation Capabilities
Documentation of your Solution in Latex Using your Syntax Directed Translator/Generator
Assumptions
Log file with Major design decisions, problems, etc.
Test Cases and Test Results (to be supplied)
Zip File (lastnames.zip)
42 Students, 21 Teams of 2!
Email me your Teams by Nov 11th!

Project Part 3 Grading Sheet


Student Name:

Basic Text Processing Capabilities (35 points total)           _____
    Section/Subsection/Table of Contents    (5 points)         _____
    Line Spacing/Single-Double-Triple       (5 points)         _____
    Page Numbering/Styles                   (2.5 points)       _____
    Vertical Spacing                        (2.5 points)       _____
    Italics/Roman Fonts                     (2.5 points)       _____
    Paragraphs/Noindent                     (2.5 points)       _____
    Right Justification                     (10 points)        _____
    Begin/End Single Blocks                 (5 points)         _____
    _____ Testing - latex.tst file

Advanced Text Processing Capabilities (55 points total)        _____
    ___ item.tst  Itemize Blocks                   (5 points)  _____
    ___ enum.tst  Enumerate Blocks                 (5 points)  _____
    ___ cent.tst  Center Blocks                    (5 points)  _____
    ___ verb.tst  Verbatim Blocks                  (5 points)  _____
    ___ tab.tst   Tabular Blocks                   (10 points) _____
    ___ cent.tst  Table Blocks with Refs/Captions  (5 points)  _____
    ___ sing.tst  Relevant Combinations of Blocks  (20 points) _____
    ___ nest.tst      Single around Itemize/Enum/Center
                      All Combos of Itemize/Enum
                      Center around Tabular/Verbatim

Documentation, Log, Testing (10 points total)                  _____
    - 5 points if documentation not Latex executable
    - Testing of three project files
    Testing: _____ projp1.tst  _____ projp2.tst  _____ projp3.tst

Nested Blocks within Single Environment (15 points total)      _____
    Grammar Changes                         (5 points)
    Implementation and Testing              (10 points)
    Testing: _____ nblocks.tst

Full Blown Verbatim (15 points total)                          _____
    Grammar Changes                         (5 points)
    Implementation and Testing              (10 points)
    Testing: _____ fverbat.tst

Type/Error Checking (20 points total)                          _____
    Begin/End Blocks - Matching             (5 points)
    Adv Begin/End Blocks - Limited Combos   (5 points)
    Tabular Specifications Cols vs. Entries (5 points)
    Testing: ____ tcbe.tst  ____ tcabe.tst  ____ tcts.tst

SUBTOTAL(150):                                                 _____

Standard Deductions (At most 10 points)
    No Directory Location/Compilation Instr. (up to 5 points)  _____
    Lack of Comments                         (up to 5 points)  _____
    Other                                    (up to 5 points)  _____

TOTAL(150):                                                    _____

The flex File latex.l


/* THIS IS latex.l */
%{
/* A LEX FOR PART 3 OF THE PROJECT WHERE VERBATIM WORKS */
%}
ws       [ \t\n]+
punc     (\.|\,|\!|\?)
word     ({punc}|[a-zA-Z0-9])*
special  (\%|\_|\&|\$|\#)
cols     (r|l|c)*
%%
"\\\\"               {printf(" %s \n", yytext);return(DBLBS);}
{special}            {printf(" %s \n", yytext);return(SPECCHAR);}
"["                  {printf(" %s \n", yytext);return(LSQRB);}
"]"                  {printf(" %s \n", yytext);return(RSQRB);}
"\\alph"             {printf(" %s \n", yytext);return(LALPH1);}
"{alph}"             {printf(" %s \n", yytext);return(LALPH2);}
"\\Alph"             {printf(" %s \n", yytext);return(CALPH1);}
"{Alph}"             {printf(" %s \n", yytext);return(CALPH2);}
"\\arabic"           {printf(" %s \n", yytext);return(ARABIC1);}
"{arabic}"           {printf(" %s \n", yytext);return(ARABIC2);}
"\\baselinestretch"  {printf(" %s \n", yytext);return(BASELINES);}
"\\begin"            {printf(" %s \n", yytext);return(LBEGIN);}
"\\caption"          {printf(" %s \n", yytext);return(CAPTION);}
"{center}"           {printf(" %s \n", yytext);return(CENTER);}

The flex File latex.l


"{document}"         {printf(" %s \n", yytext);return(DOCUMENT);}
"\\end"              {printf(" %s \n", yytext);return(END);}
"{enumerate}"        {printf(" %s \n", yytext);return(ENUMERATE);}
"\\hspace"           {printf(" %s \n", yytext);return(HSPACE);}
"{itemize}"          {printf(" %s \n", yytext);return(ITEMIZE);}
"\\item"             {printf(" %s \n", yytext);return(ITEM);}
"\\it"               {printf(" %s \n", yytext);return(IT);}
"\\label"            {printf(" %s \n", yytext);return(LABEL);}
"\\noindent"         {printf(" %s \n", yytext);return(NOINDENT);}
"\\pagenumbering"    {printf(" %s \n", yytext);return(PAGENUM);}
"\\ref"              {printf(" %s \n", yytext);return(REF);}
"\\renewcommand"     {printf(" %s \n", yytext);return(RENEW);}
"\\roman"            {printf(" %s \n", yytext);return(LROMAN1);}
"{roman}"            {printf(" %s \n", yytext);return(LROMAN2);}
"\\Roman"            {printf(" %s \n", yytext);return(CROMAN1);}
"{Roman}"            {printf(" %s \n", yytext);return(CROMAN2);}
"\\rm"               {printf(" %s \n", yytext);return(RM);}
"\\section"          {printf(" %s \n", yytext);return(SECTION);}
"{single}"           {printf(" %s \n", yytext);return(SINGLE);}
"\\subsection"       {printf(" %s \n", yytext);return(SUBSEC);}
"\\tableofcontents"  {printf(" %s \n", yytext);return(TABOCON);}
"{table}"            {printf(" %s \n", yytext);return(TABLE);}
"{tabular}"          {printf(" %s \n", yytext);return(TABULAR);}
"{verbatim}"         {printf(" %s \n", yytext);return(VERBATIM);}
"\\vspace"           {printf(" %s \n", yytext);return(VSPACE);}

The flex File latex.l


"b"                  {printf(" %s \n", yytext);return(B);}
"h"                  {printf(" %s \n", yytext);return(H);}
"t"                  {printf(" %s \n", yytext);return(T);}
{cols}               {printf(" %s \n", yytext);return(COLS);}
"{"                  {printf(" %s \n", yytext);return(LCURLYB);}
"}"                  {printf(" %s \n", yytext);return(RCURLYB);}
{word}               {printf(" %s \n", yytext);return(WORD);}
{ws}                 {printf("ws--%s--ws\n", yytext);
                      if ((strcmp(yytext, "\n\n") == 0) && (ws_flag == 0))
                          return(WS);
                      else if (ws_flag == 1)
                          return(WS);}
%%
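The {ws} action decides whether whitespace reaches the parser at all: inside a verbatim block (ws_flag set by the grammar) every run of whitespace is returned as WS, while elsewhere only a run that is exactly two newlines is returned and everything else is skipped. A minimal sketch of that decision as a standalone function, with the hypothetical name ws_is_token:

```c
#include <assert.h>
#include <string.h>

/* Mirrors the decision in the {ws} action: returns 1 when the scanner
   would return WS to the parser, 0 when the whitespace is skipped. */
int ws_is_token(const char *text, int ws_flag)
{
    assert(text != NULL);
    if (ws_flag == 1)                  /* inside verbatim: keep all whitespace */
        return 1;
    return strcmp(text, "\n\n") == 0;  /* outside: only an exact blank line */
}
```

Note that the comparison is exact, so a blank line that also contains spaces or a third newline would not count as a paragraph break under this rule.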

The Bison File latexp3c.y


/* THIS IS latexp3code.y */
%{ /* A YACC FOR PART 3 OF THE PROJECT WHERE VERBATIM AND NESTING WORKS */
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#define BUF_SIZE 512
int ws_flag = 0;
#include "lex.yy.c"
#include "util.c"
#include "generate.c"
%}

%union {
    char trans[BUF_SIZE+1];
    int  val;
}

%start latexstatement

The Bison File latexp3c.y


%token BACKSL   LBEGIN    LCURLYB   DOCUMENT RCURLYB  END
%token WORD     WS        SPECCHAR  CENTER   VERBATIM SINGLE
%token ITEMIZE  ENUMERATE TABULAR   TABLE    LSQRB    RSQRB
%token H        T         B         COLS     SECTION  SUBSEC
%token CAPTION  LABEL     DBLBS     ITEM     INTEGER
%token TABOCON  RENEW     BASELINES PAGENUM  VSPACE   HSPACE
%token LROMAN1  CROMAN1   LALPH1    CALPH1   ARABIC1
%token RM       IT        NOINDENT  REF
%token ARABIC2  LROMAN2   CROMAN2   LALPH2   CALPH2

%type <trans> textoption wsorword
%type <val>   style2 ARABIC2 LROMAN2 CROMAN2 LALPH2 CALPH2

%%

NOTE: YOU NEED TO ADD %type for ALL NON-TERMINALS and TOKENS for which you wish to use the $$, $1, $2, etc. notation and the redefined parsing stack.

The Bison File latexp3c.y


latexstatement : startdoc mainbody enddoc
               ;
startdoc       : LBEGIN DOCUMENT
               ;
enddoc         : END DOCUMENT
               ;
mainbody       : mainbody mainoption
               | mainoption
               ;
mainoption     : textoption
                 { generate_formatted_text($1); }
               | commentoption
               | latexoptions
               ;

The Bison File latexp3c.y


textoption     : textoption wsorword
                 { strcat($$, " "); strcat($$, $2); }
               | wsorword
                 { strcpy($$, $1); }
               ;
wsorword       : WS
                 { strcpy($$, yytext); }
               | WORD
                 { strcpy($$, yytext); }
               ;
commentoption  : SPECCHAR textoption
               ;
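The left-recursive textoption rule only appends $2 because Yacc typically initializes $$ with a copy of $1 before the action runs, so the text gathered so far is already in place. A minimal sketch of the same accumulation outside the parser follows; accumulate_words is a hypothetical name, and acc stands in for the $$ slot.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 512

/* Simulates reducing  textoption : wsorword  once and then
   textoption : textoption wsorword  for each remaining word.
   acc plays the role of $$ (which starts as a copy of $1). */
void accumulate_words(char acc[BUF_SIZE + 1], const char *words[], int n)
{
    int i;

    assert(n > 0 && strlen(words[0]) <= BUF_SIZE);
    strcpy(acc, words[0]);               /* wsorword { strcpy($$, $1); } */
    for (i = 1; i < n; i++) {
        /* $$ already holds $1 (the text so far), so only $2 is appended. */
        assert(strlen(acc) + 1 + strlen(words[i]) <= BUF_SIZE);
        strcat(acc, " ");
        strcat(acc, words[i]);
    }
}
```

The asserts mark the limit that the grammar actions do not check: a paragraph longer than BUF_SIZE characters would overflow trans.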

The Bison File latexp3c.y


latexoptions   : backsoptions
               | LCURLYB curlyboptions RCURLYB
               ;
curlyboptions  : fonts textoption
               ;
backsoptions   : beginendopts
               | sectionoptions
               | tableofcont
               | linespacing
               | pagenumbers
               | pagenuminit
               | spacing
               | fonts
               | specialchar
               | nonewpara
               | reference
               ;
beginendopts   : LBEGIN begcmds beginblock endbegin
               ;

The Bison File latexp3c.y


begcmds        : CENTER
               | VERBATIM {ws_flag=1;}
               | SINGLE
               | ITEMIZE
               | ENUMERATE
               | TABLE begtableopts
               | TABULAR begtabularopts
               ;
endbegin       : END endcmds
               | endtableopts TABLE
               ;
endcmds        : CENTER
               | VERBATIM {ws_flag=0;}
               | SINGLE
               | ITEMIZE
               | ENUMERATE
               | TABULAR
               ;

The Bison File latexp3c.y


beginblock     : beginendopts
               | textoption   /* FOR single or verbatim */
                 {printf("single or verb\n");}
               | entrylist    /* FOR center and tabular */
                 {printf("center or tabular\n");}
               | listblock    /* FOR item and enumerate */
                 {printf("item or enumerate\n");}
               ;
listblock      : listblock anitem
                 {printf("listblockA\n");}
               | anitem
                 {printf("listblockB\n");}
               ;
anitem         : ITEM textoption
               | beginendopts
               ;
entrylist      : entrylist anentry
                 {printf("entrylistA\n");}
               | anentry
                 {printf("entrylistB\n");}
               ;

The Bison File latexp3c.y


anentry        : entry DBLBS
                 {printf("anentryA\n");}
               | beginendopts
                 {printf("anentryB\n");}
               ;
entry          : entry SPECCHAR textoption
                 {printf("entryA\n");}
               | textoption
                 {printf("entryB\n");}
               ;
begtableopts   : LSQRB position RSQRB
               ;
begtabularopts : LCURLYB COLS RCURLYB
               ;
position       : H
               | T
               | B
               ;

The Bison File latexp3c.y


endtableopts   : END
               | CAPTION LCURLYB textoption RCURLYB captionrest
               | labelrest
               ;
captionrest    : END
               | labelrest
               ;
labelrest      : LABEL LCURLYB WORD RCURLYB END
               ;
sectionoptions : SECTION LCURLYB textoption RCURLYB
                 { generate_sec_header(get_sec_ctr(), $3);
                   incr_sec_ctr(); }
               | SUBSEC LCURLYB textoption RCURLYB
                 { generate_subsec_header(get_sec_ctr(), get_subsec_ctr(), $3);
                   incr_subsec_ctr(); }
               ;

The Bison File latexp3c.y


tableofcont    : TABOCON
                 { set_gen_toc(); }
               ;
linespacing    : RENEW LCURLYB BASELINES RCURLYB LCURLYB WORD RCURLYB
               ;
pagenumbers    : PAGENUM style2
                 { set_page_style($2); }
               ;
style2         : ARABIC2
               | LROMAN2
               | CROMAN2
               | LALPH2
               | CALPH2
               ;

The Bison File latexp3c.y


pagenuminit    : style1 LCURLYB WORD
                 { set_page_no(yytext[0]); }
                 RCURLYB
               ;
style1         : ARABIC1
               | LROMAN1
               | CROMAN1
               | LALPH1
               | CALPH1
               ;
spacing        : horvert LCURLYB WORD RCURLYB
               ;
horvert        : VSPACE
               | HSPACE
               ;
fonts          : RM
               | IT
               ;

The Bison File latexp3c.y


specialchar    : SPECCHAR
               | LCURLYB
               | RCURLYB
               ;
nonewpara      : NOINDENT
               ;
reference      : REF LCURLYB WORD RCURLYB
               ;
%%
yyerror(){}

main()
{
    fpout = fopen("latexout","w");
    fptoc = fopen("latextoc","w");
    init_lines_so_far();
    init_sec_ctr();
    init_output_page();
    yyparse();
}

Latex.input.txt
\begin{document}
\pagenumbering{arabic}
\arabic{5}
\renewcommand{\baselinestretch}{2}
\tableofcontents
\section{Introduction}
This is an example of text that would be transformed into a paragraph in latex. Blank lines between text in the input cause a new paragraph to be generated. When the blank line occurs after a section, no indentation of the paragraph is performed. However, all other blanks, would result in a five space indent of the paragraph.
\subsection{A Text Processor}
A text processor is a very useful tool, since it allows us to develop formatted documents that are easy to read.
\subsection{Legal Latex Commands}
We have seen that there are many different Latex commands, that can be used in many different ways. However, sometimes, we wish to use a character to mean itself, and override its Latex interpretation. For example, to use curly braces, we employ the backslash a set of integers.
\section{Using Latex}
Finally, there are many other useful commands that involve begin/end blocks, that establish an environment. These blocks behave in a similar fashion to begin/end blocks in a programming language, since they set a scope. We have discussed a number of examples. It is important to note, even at this early stage, that lists may be created within lists, allowing the nesting of blocks and environments.
\end{document}a

latexout.txt
1 Introduction

This is an example of text that would be transformed into a paragraph in latex. Blank lines between text in the input cause a new paragraph to be generated. When the blank line occurs after a section, no indentation of the paragraph is performed. However, all other blanks, would result in a five

2.1 A Text Processor

A text processor is a very useful tool, since it allows us to develop formatted documents that are easy to

2.2 Legal Latex Commands

We have seen that there are many different Latex commands, that can be used in many different ways. However, sometimes, we wish to use a character to mean itself, and override its Latex interpretation. For example, to use curly braces, we employ the backslash a

2 Using Latex

Finally, there are many other useful commands that involve begin end blocks, that establish an environment. These blocks behave in a similar fashion to begin end blocks in a programming language, since they set a scope. We have discussed a number of examples. It is important to note, even at this early stage, that lists may be created within lists, allowing the nesting of

WHY DOESN'T IT PRINT IT ALL OUT?

latextoc.txt
1 Introduction ---------- PAGE 5
2.1 A Text Processor ---------- PAGE 5
2.2 Legal Latex Commands ---------- PAGE 5
2 Using Latex ---------- PAGE 5

The util.c File


FILE *fpout;
FILE *fptoc;

#define OUT_WIDTH       40
#define SPACE_LEFT      5
#define LINES_PER_PAGE  40
#define TOC_ON          1

char line[OUT_WIDTH + 1];
int  lines_so_far;

void init_lines_so_far()
{
    lines_so_far = 0;
}

void incr_lines_so_far()
{
    lines_so_far++;
}

int check_done_page()
{
    if (lines_so_far < LINES_PER_PAGE)
        return 1;
    else
        return 0;
}

The util.c File


struct doc_symtab {
    int page_no_counter;
    int page_style;
    int line_spacing;
    int current_font;
    int generate_toc;
    int section_counter;
    int subsect_counter;
};

struct doc_symtab DST;

void init_sec_ctr()
{
    DST.section_counter = 1;
    DST.subsect_counter = 1;
}

void incr_sec_ctr()
{
    DST.section_counter++;
    DST.subsect_counter = 1;
}

The util.c File


void incr_subsec_ctr()
{
    DST.subsect_counter++;
}

int get_sec_ctr()
{
    return DST.section_counter;
}

int get_subsec_ctr()
{
    return DST.subsect_counter;
}

int get_gen_toc()
{
    return DST.generate_toc;
}

void set_gen_toc()
{
    DST.generate_toc = 1;
}
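These counters account for the numbering in the sample output (1, 2.1, 2.2, 2): the SECTION action calls incr_sec_ctr() right after printing its header, so a later SUBSEC action reads the already incremented section number. The sketch below reproduces that sequence with plain variables. All function names are hypothetical, and number_subsection_fixed is just one possible correction, not the course solution.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int section_counter = 1;   /* as set by init_sec_ctr() */
static int subsect_counter = 1;

/* Formats a section number the way the SECTION action does:
   use the counter, then increment it and reset the subsection counter. */
void number_section(char *out, size_t size)
{
    snprintf(out, size, "%d", section_counter);
    section_counter++;
    subsect_counter = 1;
}

/* As in the SUBSEC action: reads the already incremented section counter. */
void number_subsection(char *out, size_t size)
{
    snprintf(out, size, "%d.%d", section_counter, subsect_counter);
    subsect_counter++;
}

/* One possible fix (an assumption): the enclosing section is the one
   that was numbered before the increment. */
void number_subsection_fixed(char *out, size_t size)
{
    assert(section_counter > 1);   /* at least one section was seen */
    snprintf(out, size, "%d.%d", section_counter - 1, subsect_counter);
    subsect_counter++;
}
```

An alternative with the same effect would be to start the section counter at 0 and increment it before generating the header rather than after.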

The util.c File


void set_page_no(p)
char p;
{
    DST.page_no_counter = p - '0';
}

int get_page_no()
{
    return DST.page_no_counter;
}

int inc_page_no()
{
    DST.page_no_counter++;
    return (DST.page_no_counter - 1);
}

void set_page_style(s)
int s;
{
    DST.page_style = s;
}
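set_page_no() receives only yytext[0] and subtracts '0', so it handles a single digit such as the 5 in \arabic{5} but not a starting page of 12. A minimal sketch of a multi-digit variant using strtol is given below; parse_page_no is a hypothetical helper, and the upper bound of 99999 is an arbitrary assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical replacement for the p - '0' conversion: parses the whole
   token text so that multi-digit page numbers work. Returns -1 if the
   text is not a non-negative decimal integer in range. */
int parse_page_no(const char *text)
{
    char *end;
    long value;

    assert(text != NULL);
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || value < 0 || value > 99999)
        return -1;
    return (int)value;
}
```

In the grammar, the action after WORD could then pass yytext itself instead of yytext[0].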

The generate.c File


/* THIS IS THE generate.c FILE */
init_output_page()
{
    fprintf(fpout, "\n\n\n\n\n");
    fflush(fpout);
}

void generate_sec_header(i, s)
int i;
char *s;
{
    fprintf(fpout, "\n\n%d %s\n", i, s);
    fflush(fpout);
    if (get_gen_toc() == TOC_ON)
        fprintf(fptoc, "\n%d %s ---------- PAGE %d\n", i, s, get_page_no());
}

The generate.c File


void generate_subsec_header(i, j, s)
int i,j;
char *s;
{
    fprintf(fpout, "\n\n%d.%d %s\n", i, j, s);
    fflush(fpout);
    if (get_gen_toc() == TOC_ON)
        fprintf(fptoc, "\n%d.%d %s ---------- PAGE %d\n", i, j, s, get_page_no());
}

The generate.c File


void generate_formatted_text(s)
char *s;
{
    int slen = strlen(s);
    int i, j, k, r;
    int llen;

    for (i = 0; i <= slen; ) {
        for (j = 0; ((j < OUT_WIDTH) && (i <= slen)); i++, j++)
            line[j] = s[i];
        if (i <= slen) {
            if ((line[j-1] != ' ') && (s[i] != ' ')) {
                for (k = j-1; line[k] != ' '; k--)
                    ;
                i = i - (j - k - 1);
                j = k;
            }
            for ( ; s[i] == ' '; i++)
                ;
        }
        line[j] = '\0';
        llen = strlen(line);
        if (i <= slen) {
            fprintf(fpout, "\n%s", line);
            fflush(fpout);
        }
        else {
            for (r = 0; r <= llen; r++)
                s[r] = line[r];   /* includes backslash 0 */
        }
    }
}
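generate_formatted_text() fills lines of at most OUT_WIDTH characters and backs up to the previous blank when a word would be split. It also appears to copy the final partial line back into s instead of printing it, which would be consistent with the truncated paragraphs in the sample output. The sketch below shows the same greedy wrapping idea in a simpler form that emits every line, including the last one; wrap_text is a hypothetical function, not the course code.

```c
#include <assert.h>
#include <string.h>

/* Greedy word wrap: copies s into out, putting '\n' instead of a space
   before any word that would push a line past width. Words longer than
   width are left unbroken. out must have room for strlen(s) + 1 bytes. */
void wrap_text(const char *s, int width, char *out)
{
    int line_len = 0;          /* characters on the current output line */
    size_t o = 0;

    assert(s != NULL && out != NULL && width > 0);
    while (*s != '\0') {
        size_t word_len;

        while (*s == ' ')      /* skip the blanks between words */
            s++;
        word_len = strcspn(s, " ");
        if (word_len == 0)
            break;             /* only trailing blanks were left */
        if (line_len > 0) {
            /* Break the line if "space + word" does not fit. */
            if (line_len + 1 + (int)word_len > width) {
                out[o++] = '\n';
                line_len = 0;
            } else {
                out[o++] = ' ';
                line_len++;
            }
        }
        memcpy(out + o, s, word_len);
        o += word_len;
        line_len += (int)word_len;
        s += word_len;
    }
    out[o] = '\0';
}
```

Unlike the original, this version writes into a caller-supplied buffer rather than to fpout, which keeps it easy to test in isolation.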

Using Structures in %union


#define BUF_SIZE 512
struct symtabtest {
    int  a, b;
    char c[BUF_SIZE];
    char d[BUF_SIZE];
};
%}
%start latexstatement
%union {
    struct symtabtest st;
    int val;
}

%token ETC...
%type <st> entrylist entry DBLBS listblock anitem
%type <st> textoption wsorword WORD WS ITEM

%%
ETC...

Using Structures in %union


mainoption     : textoption
                 { fprintf(fp, "%d %d %s %s\n", $1.a, $1.b, $1.c, $1.d); }
               | commentoption
               | latexoptions
               ;
textoption     : textoption wsorword
                 { $$.a = 5; }
               | wsorword
                 { $$.b = 10; }
               ;
wsorword       : WS
                 { strcpy($$.c, yytext); }
               | WORD
                 { strcpy($$.d, yytext); }
               ;
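With a struct in the %union, every entry on the value stack carries its own copy of all four fields, and a reduction typically starts $$ as a copy of $1 only. The sketch below simulates one reduction of textoption : textoption wsorword to show that fields set in $2 are not merged in unless the action copies them; union value and reduce_textoption are hypothetical stand-ins for YYSTYPE and the generated parser code.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 512

struct symtabtest {
    int  a, b;
    char c[BUF_SIZE];
    char d[BUF_SIZE];
};

/* Stand-in for YYSTYPE as declared by the %union in the slide. */
union value {
    struct symtabtest st;
    int val;
};

/* Simulates  textoption : textoption wsorword { $$.a = 5; }
   $$ starts as a copy of $1, so the fields of $2 are not merged in. */
union value reduce_textoption(union value v1, union value v2)
{
    union value result = v1;   /* default: $$ = $1 */

    (void)v2;                  /* $2 is dropped unless the action copies it */
    result.st.a = 5;
    return result;
}
```

Each stack entry here is over 1 KB, so a large struct in the union also makes every shift and reduce copy that much data.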

Additional Lex/Yacc Examples


Consider Ada9X (the working name for Ada95, since revised as Ada2005)
    A Package Based, OO Programming Language
    Builds Upon the Original Ada Language
        Extension of Pascal
        Developed as a Language for DoD
Named After Ada Lovelace (1815-1852)
    Worked on Charles Babbage's Early Mechanical General Purpose Computer/Analytical Engine
    The World's First Programmer
    Wrote the World's First Computer Program on Bernoulli Numbers

Ada9X Lex
%{
/******* A "lex"-style lexer for Ada 9X ****************************/
/* Copyright (C) Intermetrics, Inc. 1994 Cambridge, MA USA         */
/* Copying permitted if accompanied by this statement.             */
/* Derivative works are permitted if accompanied by this statement.*/
/* This lexer is known to be only approximately correct, but it is */
/* more than adequate for most uses (the lexing of apostrophe is   */
/* not as sophisticated as it needs to be to be "perfect").        */
/* As usual there is *no warranty* but we hope it is useful.       */
/*******************************************************************/
int error_count;
%}
DIGIT            [0-9]
EXTENDED_DIGIT   [0-9a-zA-Z]
INTEGER          ({DIGIT}(_?{DIGIT})*)
EXPONENT         ([eE](\+?|-){INTEGER})
DECIMAL_LITERAL  {INTEGER}(\.?{INTEGER})?{EXPONENT}?
BASE             {INTEGER}
BASED_INTEGER    {EXTENDED_DIGIT}(_?{EXTENDED_DIGIT})*
BASED_LITERAL    {BASE}#{BASED_INTEGER}(\.{BASED_INTEGER})?#{EXPONENT}?

Ada9X Lex
%%
"."     return('.');
"<"     return('<');
"("     return('(');
"+"     return('+');
"|"     return('|');
"&"     return('&');
"*"     return('*');
")"     return(')');
";"     return(';');
"-"     return('-');
"/"     return('/');
","     return(',');
">"     return('>');
":"     return(':');
"="     return('=');
"'"     return(TIC);
".."    return(DOT_DOT);
"<<"    return(LT_LT);
"<>"    return(BOX);
"<="    return(LT_EQ);
"**"    return(EXPON);
"/="    return(NE);
">>"    return(GT_GT);
">="    return(GE);
":="    return(IS_ASSIGNED);
"=>"    return(RIGHT_SHAFT);

Ada9X Lex
[a-zA-Z](_?[a-zA-Z0-9])*   { return(lk_keyword(yytext)); }
"'"."'"                    return(char_lit);
\"(\"\"|[^\n\"])*\"        return(char_string);
{DECIMAL_LITERAL}          return(numeric_lit);
{BASED_LITERAL}            return(numeric_lit);
--.*\n                     ;
[ \t\n\f]                  ;
.                          {fprintf(stderr, " Illegal character:%c: on line %d\n", *yytext, yylineno);
                            error_count++;}
%%
/*
 * Keywords stored in alpha order
 */
typedef struct {
    char * kw;
    int kwv;
} KEY_TABLE;

/* Reserved keyword list and Token values
 * as defined in y.tab.h
 */
# define NUM_KEYWORDS 69

Ada9X Lex
KEY_TABLE key_tab[NUM_KEYWORDS] = {
    {"ABORT", ABORT}, {"ABS", ABS}, {"ABSTRACT", ABSTRACT},
    {"ACCEPT", ACCEPT}, {"ACCESS", ACCESS}, {"ALIASED", ALIASED},
    {"ALL", ALL}, {"AND", AND}, {"ARRAY", ARRAY}, {"AT", AT},
    {"BEGIN", BEGiN}, {"BODY", BODY},
    {"CASE", CASE}, {"CONSTANT", CONSTANT},
    {"DECLARE", DECLARE}, {"DELAY", DELAY}, {"DELTA", DELTA},
    {"DIGITS", DIGITS}, {"DO", DO},
    {"ELSE", ELSE}, {"ELSIF", ELSIF}, {"END", END}, {"ENTRY", ENTRY},
    {"EXCEPTION", EXCEPTION}, {"EXIT", EXIT},
    {"FOR", FOR}, {"FUNCTION", FUNCTION},
    {"GENERIC", GENERIC}, {"GOTO", GOTO},
    {"IF", IF}, {"IN", IN}, {"IS", IS},
    {"LIMITED", LIMITED}, {"LOOP", LOOP},
    {"MOD", MOD},
    {"NEW", NEW}, {"NOT", NOT}, {"NULL", NuLL},
    {"OF", OF}, {"OR", OR}, {"OTHERS", OTHERS}, {"OUT", OUT},
    {"PACKAGE", PACKAGE}, {"PRAGMA", PRAGMA}, {"PRIVATE", PRIVATE},
    {"PROCEDURE", PROCEDURE}, {"PROTECTED", PROTECTED},
    {"RAISE", RAISE}, {"RANGE", RANGE}, {"RECORD", RECORD}, {"REM", REM},
    {"RENAMES", RENAMES}, {"REQUEUE", REQUEUE}, {"RETURN", RETURN},
    {"REVERSE", REVERSE},
    {"SELECT", SELECT}, {"SEPARATE", SEPARATE}, {"SUBTYPE", SUBTYPE},
    {"TAGGED", TAGGED}, {"TASK", TASK}, {"TERMINATE", TERMINATE},
    {"THEN", THEN}, {"TYPE", TYPE},
    {"UNTIL", UNTIL}, {"USE", USE},
    {"WHEN", WHEN}, {"WHILE", WHILE}, {"WITH", WITH},
    {"XOR", XOR}
};

Ada9X Lex
to_upper(str)
char *str;
{
    char * cp;
    for (cp=str; *cp; cp++) {
        if (islower(*cp)) *cp -= ('a' - 'A') ;
    }
}

lk_keyword(str)
char *str;
{
    int min;
    int max;
    int guess, compare;

    min = 0;
    max = NUM_KEYWORDS-1;
    guess = (min + max) / 2;
    to_upper(str);

    for (guess=(min+max)/2; min<=max; guess=(min+max)/2) {
        if ((compare = strcmp(key_tab[guess].kw, str)) < 0) {
            min = guess + 1;
        } else if (compare > 0) {
            max = guess - 1;
        } else {
            return key_tab[guess].kwv;
        }
    }
    return identifier;
}
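lk_keyword() is a binary search over the alphabetically sorted keyword table, after upper-casing the token in place. A minimal self-contained sketch of the same technique follows; it uses a three-entry table with hypothetical token values (TOK_BEGIN and so on) and, unlike the original, upper-cases a local copy so the caller's text is left unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token values, standing in for those in y.tab.h. */
enum { TOK_IDENTIFIER = 0, TOK_BEGIN = 1, TOK_END = 2, TOK_IF = 3 };

struct keyword { const char *kw; int kwv; };

/* Must stay sorted: the binary search depends on it. */
static const struct keyword key_tab[] = {
    { "BEGIN", TOK_BEGIN }, { "END", TOK_END }, { "IF", TOK_IF },
};

int lookup_keyword(const char *str)
{
    char upper[32];
    size_t i, len;
    int min = 0;
    int max = (int)(sizeof key_tab / sizeof key_tab[0]) - 1;

    assert(str != NULL);
    len = strlen(str);
    if (len >= sizeof upper)
        return TOK_IDENTIFIER;          /* longer than any keyword */
    for (i = 0; i <= len; i++)          /* copy, including the '\0' */
        upper[i] = (char)toupper((unsigned char)str[i]);

    while (min <= max) {
        int guess = min + (max - min) / 2;
        int cmp = strcmp(key_tab[guess].kw, upper);

        if (cmp < 0)
            min = guess + 1;
        else if (cmp > 0)
            max = guess - 1;
        else
            return key_tab[guess].kwv;
    }
    return TOK_IDENTIFIER;
}
```

Deriving the table size from sizeof, rather than a separate NUM_KEYWORDS constant, keeps the search bounds in step with the initializer list.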

Ada9X Lex
yyerror(s)
char *s;
{
    extern int yychar;

    error_count++;
    fprintf(stderr," %s", s);
    if (yylineno)
        fprintf(stderr,", on line %d,", yylineno);
    fprintf(stderr," on input: ");
    if (yychar >= 0400) {
        if ((yychar >= ABORT) && (yychar <= XOR)) {
            fprintf(stderr, "(token) %s #%d\n", key_tab[yychar-ABORT].kw, yychar);
        } else switch (yychar) {
            case char_lit    : fprintf(stderr, "character literal\n"); break;
            case identifier  : fprintf(stderr, "identifier\n"); break;
            case char_string : fprintf(stderr, "string\n"); break;
            case numeric_lit : fprintf(stderr, "numeric literal\n"); break;
            case TIC         : fprintf(stderr, "single-quote\n"); break;
            case DOT_DOT     : fprintf(stderr, "..\n"); break;
            case LT_LT       : fprintf(stderr, "<<\n"); break;
            case BOX         : fprintf(stderr, "<>\n"); break;
            case LT_EQ       : fprintf(stderr, "<=\n"); break;
            case EXPON       : fprintf(stderr, "**\n"); break;
            case NE          : fprintf(stderr, "/=\n"); break;
            case GT_GT       : fprintf(stderr, ">>\n"); break;
            case GE          : fprintf(stderr, ">=\n"); break;
            case IS_ASSIGNED : fprintf(stderr, ":=\n"); break;
            case RIGHT_SHAFT : fprintf(stderr, "=>\n"); break;
            default          : fprintf(stderr, "(token) %d\n", yychar);
        }
    } else {
        switch (yychar) {
            case '\t': fprintf(stderr,"horizontal-tab\n"); return;
            case '\n': fprintf(stderr,"newline\n"); return;
            case '\0': fprintf(stderr,"$end\n"); return;
            case ' ' : fprintf(stderr, "(blank)"); return;
            default  : fprintf(stderr,"(char) %c\n", yychar); return;
        }
    }
}

Ada9X Yacc
/******* A YACC grammar for Ada 9X *********************************/
/* Copyright (C) Intermetrics, Inc. 1994 Cambridge, MA USA         */
/* Copying permitted if accompanied by this statement.             */
/* Derivative works are permitted if accompanied by this statement.*/
/* This grammar is thought to be correct as of May 1, 1994         */
/* but as usual there is *no warranty* to that effect.             */
/*******************************************************************/
%{
#include <stdio.h>
#include <ctype.h>
#include <strings.h>
#define BUF_SIZE 512
%}
%union {
    char trans[BUF_SIZE+1];
    int  val;
}
%token TIC DOT_DOT LT_LT BOX LT_EQ EXPON NE GT_GT GE IS_ASSIGNED RIGHT_SHAFT
%token ABORT ABS ABSTRACT ACCEPT ACCESS ALIASED ALL AND ARRAY AT
%token BEGiN BODY CASE CONSTANT DECLARE DELAY DELTA DIGITS DO
%token ELSE ELSIF END ENTRY EXCEPTION EXIT FOR FUNCTION
%token GENERIC GOTO IF IN IS LIMITED LOOP MOD NEW NOT NuLL
%token OF OR OTHERS OUT PACKAGE PRAGMA PRIVATE PROCEDURE PROTECTED
%token RAISE RANGE RECORD REM RENAMES REQUEUE RETURN REVERSE
%token SELECT SEPARATE SUBTYPE TAGGED TASK TERMINATE THEN TYPE
%token UNTIL USE WHEN WHILE WITH XOR
%token char_lit identifier
%token char_string numeric_lit

Ada9X Yacc
%type <trans> access_opt access_type adding address_spec
%type <trans> aliased_opt align_opt allocator alternative
%type <trans> alternative_s array_type assign_stmt attrib_def
%type <trans> attribute_id basic_loop block block_body
%type <trans> block_decl body body_opt body_stub
%type <trans> c_id_opt c_name_list case_hdr case_stmt
%type <trans> choice choice_s code_stmt comp_assoc
%type <trans> comp_decl comp_decl_s comp_list comp_loc_s
%type <trans> comp_unit compilation component_subtype_def compound_name
%type <trans> compound_stmt cond_clause cond_clause_s cond_part
%type <trans> condition constr_array_type context_spec decl
%type <trans> decl_item decl_item_or_body decl_item_or_body_s1 decl_item_s
%type <trans> decl_item_s1 decl_part def_id def_id_s
%type <trans> derived_type designator discrete_range discrete_with_range
%type <trans> discrim_part discrim_part_opt discrim_spec discrim_spec_s
%type <trans> else_opt exit_stmt expression factor
%type <trans> fixed_type float_type formal_part formal_part_opt
%type <trans> generic_decl generic_derived_type generic_discrim_part_opt generic_formal
%type <trans> generic_formal_part generic_inst generic_pkg_inst generic_subp_inst
%type <trans> generic_type_def goal_symbol goto_stmt id_opt
%type <trans> if_stmt index index_s init_opt
%type <trans> integer_type iter_discrete_range_s iter_index_constraint iter_part
%type <trans> iteration label label_opt limited_opt
%type <trans> literal logical loop_stmt mark
%type <trans> mode multiplying name name_opt
%type <trans> name_s null_stmt number_decl object_decl

Ada9X Yacc
%type <trans> object_qualifier_opt object_subtype_def param param_s
%type <trans> paren_expression pkg_body pkg_decl pkg_spec
%type <trans> primary private_opt private_part private_type
%type <trans> procedure_call prot_body prot_decl prot_def
%type <trans> prot_elem_decl prot_elem_decl_s prot_op_body prot_op_body_s
%type <trans> prot_op_decl prot_op_decl_s prot_opt prot_private_opt
%type <trans> prot_spec qualified range range_constr_opt
%type <trans> range_constraint range_spec range_spec_opt real_type
%type <trans> record_def record_type record_type_spec relation
%type <trans> relational rep_spec return_stmt reverse_opt
%type <trans> short_circuit simple_expression simple_stmt statement
%type <trans> statement_s subp_default subprog_body subprog_decl
%type <trans> subprog_spec subprog_spec_is_push subunit subunit_body
%type <trans> tagged_opt term type_completion type_decl
%type <trans> type_def unary unconstr_array_type unit
%type <trans> unlabeled use_clause use_clause_opt value
%type <trans> value_s value_s_2 variant variant_part
%type <trans> variant_s when_opt with_clause
%type <trans> my_identifier error epsilon
%type <trans> ALIASED CONSTANT IS_ASSIGNED TYPE IS '(' NEW ABSTRACT
%type <trans> RANGE MOD DIGITS DELTA NOT ARRAY ACCESS
%type <trans> CASE WHEN OTHERS NuLL TAGGED RECORD PROTECTED AND OR my_char_lit
%type <trans> '=' NE '<' LT_EQ '>' GE '+' '-' '*' '/' ':' LT_LT
%type <trans> IF ELSE CASE WHEN WHILE FOR REVERSE LOOP
%type <trans> DECLARE BEGiN EXIT RETURN GOTO PROCEDURE FUNCTION IN OUT
%type <trans> PACKAGE PRIVATE LIMITED USE WITH SEPARATE GENERIC FOR AT
%type <trans> my_char_string my_numeric_lit

Ada9X Yacc
%%
goal_symbol : compilation
            ;
decl : object_decl
     | number_decl
     | type_decl
     | subprog_decl
     | pkg_decl
     | prot_decl
     | generic_decl
     | body_stub
     | error ';'
     ;
object_decl : def_id_s ':' object_qualifier_opt object_subtype_def init_opt ';'
            ;
def_id_s : def_id
         | def_id_s ',' def_id
         ;
def_id : my_identifier {strcpy($$, $1);}
       ;
object_qualifier_opt : epsilon
                     | ALIASED
                     | CONSTANT
                     | ALIASED CONSTANT
                     ;

Ada9X Yacc
object_subtype_def : name
                   | array_type
                   ;
init_opt : epsilon
         | IS_ASSIGNED expression
         ;
number_decl : def_id_s ':' CONSTANT IS_ASSIGNED expression ';'
            ;
type_decl : TYPE my_identifier discrim_part_opt type_completion ';'
          ;
discrim_part_opt : epsilon
                 | discrim_part
                 | '(' BOX ')'
                 ;
type_completion : epsilon
                | IS type_def
                ;
type_def : integer_type
         | real_type
         | array_type
         | record_type
         | access_type
         | derived_type
         | private_type
         ;

ETC See Full Yacc on web page

Ada9X Yacc
REMAINING NON GRAMMAR CODE AT END OF YACC FILE

%%
mystrcat(s, t)
char s[], t[];
{
    int i, j;

    i = j = 0;
    while (s[i] != '\0')
        i++;
    s[i] = ' ';
    i++;
    while ((s[i++] = t[j++]) != '\0')
        ;
}

Ada9X Yacc
/* To build this, run it through lex, compile it, and link it with */
/* the result of yacc'ing and cc'ing grammar9x.y, plus "-ly"       */
FILE *fp;
#include "lex.yy.c"

main(argc, argv)
int argc;
char *argv[];
{
    /* Simple Ada 9X syntax checker */
    /* Checks standard input if no arguments */
    /* Checks files if one or more arguments */
    extern int error_count;
    extern int yyparse();
    extern int yylineno;
    FILE *flptr;
    int i;

    fp = fopen("output","w");
    if (argc == 1) {
        yyparse();
    } else {
        for (i = 1; i < argc; i++) {
            if ((flptr = freopen(argv[i], "r",stdin)) == NULL) {
                fprintf(stderr, "%s: Can't open %s", argv[0], argv[i]);
            } else {
                if (argc > 2)
                    fprintf(stderr, "%s:\n", argv[i]);
                yylineno = 1;
                yyparse();
            }
        }
    }
    if (error_count) {
        fprintf(stderr, "%d syntax error%s detected\n", error_count,
                error_count == 1? "": "s");
        exit(-1);
    } else {
        fprintf(stderr, "No syntax errors detected\n");
    }
}

yywrap() {return 1;}
