This can be necessary if the clearing of the erroneous entities is done by an action which handles itself the input file.
We illustrate below this error treatment with an interactive parser. The inputs are composed of a name followed by an integer on one line. In outputs 17
the data is re-arranged in the following manner: the integer, colon and the name. Thus the inputs looks like:
name1 integer1
name2 integer2
and the output looks like:
integer1:name1
integer2:name2
The Bison specification file is:
%{#include <stdio.h>
extern FILE* out_file;
%}
%token NUMBER NAME ENDOFLINE;
%%
init
: entry {printf("The End"\n");}
entry
: line
| entry line
;
line
: NAME NUMBER ENDOFLINE
{fprintf(out_file, "%4d:%s\n", $2, name);}
| error ENDOFLINE
{yyerrok;
printf("error : start again ");}
line
;
%%
#include "lex.yy.c"
Whenever a line is not correct, the parser displays an appropriate message and re-start processing the line entirely. The outputs are in the output file out_file declared and opened in the main program before the call to yyparse (not shown).
The variable name is declared and initialised in the lexical analyser by the string read, yylval. Here is the speicifcation for flex:
char name[20];
%%
[a-z]+
{strcpy(name, yytext); return(NAME);}
[0-9]+
{yylval = atoi(yytext); return(NUMBER);}
\n
return(ENDOFLINE);
.
;
18
3.9
Passing Data from the Lexical Analyser to the Parser
The values returned by the actions and the lexical entities (The $is and the
%tokens) are by default integers, but Bison can handle other types (floats, strings etc.) To do this we must change the specification file by:
• Declaring the set of types used. The elements of the Bison’s stack must be an union of the types. This is done in the declaration part using the directive %union, for example:
%union {int
aninteger ;
float
afloat ;
other
another;/* defined by a typedef beforehand */
}
• Adding the type of each lexical entity in the following format:
%token <aninteger> NUMBER
%token <afloat> FLOAT
• Declaring the type of each non-terminal in the following format:
%type <aninteger> integer_expression
/* rule integer_expression */
%type <another>
another_expression
/* rule another_expression */
• In the actions use the type of the value returned in the following format: another_expression : t1 ’+’ t2 {$<other> = $1 + $3}
/* t1 and t2 are declared of type other */
• if the type is a C structure, the name of the field must of course be included as in:
expTT : tt1 ’?’ tt2 {$<type3>.a = $1.a + $3.a;
$<type3>.b = $1.b - $3.b;}
/* type3 is a struture with fields a and b */
3.10
Conflicts
If you write your own grammar or modify an existing one then you are bound to get conflicts in your grammar which are detected by Bison. These must be resolved for your parser to be correct. Conflicts arise in the grammar because it is ambiguous: i.e. there are more than one way of parsing a legal sequence of entities. There are two kinds of conflicts:
• Shift/Reduce conflict
• Reduce/Reduce conflict
19
To resolve conflicts it is necessary to understand a little bit about the Bison parser algorithm. As already mentionned the automaton generated by Bison uses a stack to keep track of the entities that have been recognised but which do not yet as a whole form a rule. Traditionnaly pushing a lexical entity on the stack is called shifting. Whenever the entities on the top of the stack matches a rule, the entities concerned are poped from the stack, this is traditionally called a reduction.
For example, if our infix calculator’s parser stack contains: 1+5*3 and the next input token is a newline character, then the last three entities are reduced to 15 via the rule:
exp : term ’*’ factor
Then the stack contains just three elements: 1+15, at which point a further reduction is performed: the stack grows and shrinks as the text under analysis is being parsed.
When a lexical entity is read, it is not immediately shifted, first it becomes the look ahead token which is not on the stack.
3.10.1
Shift/Reduce Conflicts
The classic situation when this occurs is the conditional statement in programming languages:
if_stmt : IF expr THEN stmt
| IF expr THEN stmt ELSE stmt
;
With a rule like this, whenever the ELSE entity is read and becomes the look ahead entity, the automaton is in an ambigous mode because a reduction is possible (according to the first part of the rule) and a shift is also possible (according to the second part of the rule), hence we have a Shift/Reduce conflict.
The default behaviour is to shift, however it is strongly recommended that this warning be removed by re-writting the rule in a non ambiguous manner as in: if_stmt : IF expr THEN stmt else_opt
;
else_opt : /* nothing */
| ELSE stmt
;
Another classical situation is with dealing with lists of entities:
list_of_vars : var
| var ’,’ list_of_vars
here after a single var the automation is in an ambiguous state. Here the rule should re-written to be left recursive:
20
list_of_vars : var
| list_of_vars ’,’ var
Shift/Reduce conflicts may also occur if the priority of operators in expression is not self implied in the rules or if the preceedence directives are not used.
3.10.2
Reduce/Reduce Conflicts
A Reduce/Reduce conflict occurs if there are two or more rule that apply to the same sequence of entities. For example in:
sequence : /* nothing */
| sequence types
| sequence subtypes
;
types
: /* nothing */
| types type
;
subtypes : /* nothing */
| subtypes subtype
;
which is meant to express a sequence of type or subtype entities intertwinned.
|