
Yacc/Bison - what semantic actions to take on a parse error

Started by: James Harris <james.harris.1@gmail.com>
First post: 2012-05-23 04:19 -0700
Last post: 2012-05-30 14:41 -0400
Articles: 4 (2 participants)



Contents

  Yacc/Bison - what semantic actions to take on a parse error James Harris <james.harris.1@gmail.com> - 2012-05-23 04:19 -0700
    Re: Yacc/Bison - what semantic actions to take on a parse error James Harris <james.harris.1@gmail.com> - 2012-05-24 12:05 -0700
      Re: Yacc/Bison - what semantic actions to take on a parse error James Harris <james.harris.1@gmail.com> - 2012-05-24 22:49 -0700
        Re: Yacc/Bison - what semantic actions to take on a parse error Chris F Clark <cfc@shell01.TheWorld.com> - 2012-05-30 14:41 -0400

#643 — Yacc/Bison - what semantic actions to take on a parse error

From: James Harris <james.harris.1@gmail.com>
Date: 2012-05-23 04:19 -0700
Subject: Yacc/Bison - what semantic actions to take on a parse error
Message-ID: <12-05-014@comp.compilers>
Yacc etc allow the special "error" keyword to be used in rules to aid
error recovery. Where those rules are there to generate a node of a
tree and there has been a parse error what should one tell Yacc to do?
Sometimes there's nothing valid one can build a node from and I can't
find a good way to communicate the situation to Yacc.

I've looked at various options. Some are OK in certain cases but none
seem right in the general case. I'll post more details if interested
but there may be a simple answer.

Anyone know of an easy or a standard answer or can provide some
recommendations?

James
[My standard answer is that the error token is mostly useful for
resynchronizing to try to find some more syntax errors, but that it's
a losing battle to try to do much with what you've parsed.  Since the
recovery process pops stuff on the stack and throws it away, you're
going to lose some partially parsed subtrees and get a storage leak
unless you hack on the parser skeleton to do something with the stuff
that's popped off. Bison added %destructor to let you do that. -John]



#650

From: James Harris <james.harris.1@gmail.com>
Date: 2012-05-24 12:05 -0700
Message-ID: <12-05-021@comp.compilers>
In reply to: #643
On May 23, 12:19 pm, James Harris <james.harri...@gmail.com> wrote:

> Yacc etc allow the special "error" keyword to be used in rules to aid
> error recovery. Where those rules are there to generate a node of a
> tree and there has been a parse error what should one tell Yacc to do?

...

> [My standard answer is that the error token is mostly useful for
> resynchronizing to try to find some more syntax errors, but that it's
> a losing battle to try to do much with what you've parsed.

I wasn't thinking about using the parse tree after the parse phase so
much as just completing the parse.

An example may help illustrate. Say we're defining a node type X where
there is nothing special about that node type. We might have a grammar
construct something like the following. I'll use quotes "..." to
indicate descriptive text.

%type <X_type> X
X
  : "a normal X" ';' { $$ = Xnode("specific data"); }
  | error ';' { ACTION; }
;

The Xnode call constructs a node. The X production expects $$ to be
set to a node of the given type.

The issue is that the error production cannot create a meaningful node
so what actions to replace ACTION are appropriate? Here are some
options.

* Create an X node with dummy values. That would satisfy the type
checking.
* Set $<err_msg>$ = "invalid X node"
* Braces but no action, i.e. {}
* No action clause so default to $$ = $1;
* Some combination of YYERROR; and yyerror();

John, you'll know but for anyone who isn't aware of these options
those for Bison are shown at

  http://www.gnu.org/software/bison/manual/html_node/Action-Features.html

So, which option is 'best'? Or should we just ignore a type mismatch
error?

James
[You already know you ran into a syntax error, so I'd think that type
checking is more likely to produce an error cascade than something
useful. -John]



#651

From: James Harris <james.harris.1@gmail.com>
Date: 2012-05-24 22:49 -0700
Message-ID: <12-05-022@comp.compilers>
In reply to: #650
On May 24, 8:05 pm, James Harris <james.harri...@gmail.com> wrote:

...

> The issue is that the error production cannot create a meaningful node
> so what actions to replace ACTION are appropriate? Here are some
> options.
>
> * Create an X node with dummy values. That would satisfy the type
> checking.
> * Set $<err_msg>$ = "invalid X node"
> * Braces but no action, i.e. {}
> * No action clause so default to $$ = $1;
> * Some combination of YYERROR; and yyerror();

...

>  http://www.gnu.org/software/bison/manual/html_node/Action-Features.html
>
> So, which option is 'best'? Or should we just ignore a type mismatch
> error?

> [You already know you ran into a syntax error, so I'd think that type
> checking is more likely to produce an error cascade than something
> useful. -John]

It sounds like you might be referring to type checking in the
compiler's later semantic analysis phase. I was asking about the
checking for a type match that the Yacc/Bison build process carries
out.

For example, if we leave the default action $$ = $1 in an error
production the parser creation process might report something like

  $1 of 'X' has no declared type

or, if we set $$ to a dummy

  error: cannot convert dummy_class* to X_class* in assignment

James
[Ah, that, right.  If it were going to try to use the parse tree, I'd
create a dummy node, otherwise { $$ = NULL; }  -John]



#660

From: Chris F Clark <cfc@shell01.TheWorld.com>
Date: 2012-05-30 14:41 -0400
Message-ID: <12-05-031@comp.compilers>
In reply to: #651
On May 24, 8:05 pm, James Harris <james.harri...@gmail.com> wrote:

> * Create an X node with dummy values. That would satisfy the type
> checking.

Around 1987 some folks at Unisys faced the same problem.  Their
solution was "plastic error nodes", nodes that represent errors but
which can convert to any other type (be used in any other context).
The error nodes could also carry the information about what the error
was, e.g. what correct things were found, and what seemed to be
missing.  That seems to be what you are trying to recreate.
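
One way to approximate that idea in modern C++ is sketched below. It is only an illustration, not the Unisys design, whose details are not given in this thread. The names ErrorInfo, NodeSlot and describe are hypothetical, and the sketch assumes C++17 for std::variant.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical description of what went wrong at the error site.
struct ErrorInfo {
    std::string found;    // what was recognized before the error
    std::string missing;  // what the parser seemed to expect
};

// Hypothetical typed node kinds.
struct Xnode { std::string data; };
struct Ynode { int value; };

// A slot that holds either a real node of type T or an ErrorInfo.
// Because the template accepts any T, the same ErrorInfo can stand in
// for an X, a Y, or any other node type.
template <typename T>
class NodeSlot {
public:
    NodeSlot(std::unique_ptr<T> node) : state_(std::move(node)) {}
    NodeSlot(ErrorInfo err) : state_(std::move(err)) {}

    bool is_error() const { return std::holds_alternative<ErrorInfo>(state_); }
    const T& node() const { return *std::get<std::unique_ptr<T>>(state_); }
    const ErrorInfo& error() const { return std::get<ErrorInfo>(state_); }

private:
    std::variant<std::unique_ptr<T>, ErrorInfo> state_;
};

// Returns a printable summary without caring which node type was expected.
template <typename T>
std::string describe(const NodeSlot<T>& slot) {
    if (slot.is_error())
        return "error: found " + slot.error().found +
               ", missing " + slot.error().missing;
    return "ok";
}
```

The point of the sketch is that the error value fits in any typed slot and still carries the details gathered at the error site, which a bare null pointer cannot do.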

The issue you seem to have is type checking of the node types.

The moderator's solution of using null pointers solves that problem.
As long as you use pointers (rather than references) in C++, a null
value will convert to any pointer type. Of course, then you have the
issue of checking for nulls at every dereference site, which may be a
good idea anyway.  You also have the issue that a null pointer cannot
carry any other information, i.e. you lose the ability to carry any
useful information you could gather at the error site.
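
A minimal sketch of the null-pointer approach and its cost, assuming hypothetical node types Xnode and Listnode and hypothetical helpers x_from_error and x_text (none of these come from the thread):

```cpp
#include <cassert>
#include <string>

// Hypothetical node types; any pointer type accepts nullptr.
struct Xnode { std::string data; };
struct Listnode { const Xnode* first; };

// Stand-in for the action of the error production: { $$ = NULL; }
inline Xnode* x_from_error() { return nullptr; }

// Every consumer has to guard the dereference.
inline std::string x_text(const Xnode* x) {
    if (x == nullptr)
        return "<invalid X>";  // the null itself says nothing about the cause
    return x->data;
}
```

The null fits any pointer-typed slot, but the guard in x_text has to be repeated wherever a node is dereferenced, and the only thing it can report is that something was invalid.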

Alternately, you could follow the Java model of having an "object"
type that everything can convert to/from. That's actually the default
model in yacc (i.e. if you don't try typing your nodes in the
grammar). The parse stack in an LR parser by default needs to be
untyped because it needs to hold different parts of different phrases
being parsed at different times.

The last solution is to have a different node created at each error
location.  The type of node should be a special error node whose type
is derived from the type on the left-hand-side of the
production. Something like the following snippet:

%type <X_type> X
X
  : "a normal X" ';' { $$ = Xnode("specific data"); }
  | error ';' { class XerrorNode : public Xnode {}; $$ = XerrorNode(); }
;
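
Outside the grammar file, the supporting classes for that snippet might look roughly like the sketch below. It assumes Xnode is a class with a virtual destructor and that X_type is a pointer to it. The is_error and message members and the make_x and make_x_error helpers are hypothetical. std::unique_ptr is used only to keep the sketch leak-free; a value placed in a yacc %union would normally be a raw pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base node; the thread's Xnode("...") is treated as a class here.
class Xnode {
public:
    explicit Xnode(std::string data) : data_(std::move(data)) {}
    virtual ~Xnode() = default;
    virtual bool is_error() const { return false; }
    const std::string& data() const { return data_; }
private:
    std::string data_;
};

// Error variant: still an Xnode, so it fits wherever the grammar expects X,
// but it also records what went wrong.
class XerrorNode : public Xnode {
public:
    explicit XerrorNode(std::string message)
        : Xnode("<error>"), message_(std::move(message)) {}
    bool is_error() const override { return true; }
    const std::string& message() const { return message_; }
private:
    std::string message_;
};

// What the two actions might call, assuming X_type is a pointer to Xnode.
inline std::unique_ptr<Xnode> make_x(std::string data) {
    return std::make_unique<Xnode>(std::move(data));
}
inline std::unique_ptr<Xnode> make_x_error(std::string message) {
    return std::make_unique<XerrorNode>(std::move(message));
}
```

Because XerrorNode derives from Xnode, the assignment to $$ type-checks, and later passes can ask is_error() instead of testing for null.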

As the moderator has mentioned, when you get to an error production,
yacc (or bison) has already started mangling the stack, discarding
things that were pushed on before but were part of the erroneous
(non-sentential) phrase that the error is synchronizing past.  None of
this is going to help with that.

Finally, error productions are a kind of last-ditch effort at error
processing.  They are there so that you can continue parsing at a
relatively sane place, but....  They are a blunt instrument.  They
don't help you create good error messages for simple and important
cases.  They can easily lead you into spurious cascading error
situations.

Hope this helps,
-Chris

******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------
[I wouldn't say that LR parse stacks need to be untyped, although they do need
to hold different types.  Yacc and bison parse stacks are generally an array
of C unions. -John]
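
As an illustration of the moderator's point, a value stack built from unions might look like the sketch below. SemanticValue and ValueStack are hypothetical names, and this is not the code that yacc or bison actually generates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node types.
struct Xnode { int id; };
struct Ynode { double weight; };

// Roughly what a %union declaration turns into: one member per value type.
union SemanticValue {
    Xnode* x;
    Ynode* y;
    const char* err_msg;
};

// A value stack as an array of unions. Nothing records which member is
// live in a given slot; the grammar's %type declarations supply that knowledge.
class ValueStack {
public:
    void push(SemanticValue v) { slots_.push_back(v); }
    SemanticValue pop() {
        SemanticValue v = slots_.back();
        slots_.pop_back();
        return v;
    }
    std::size_t size() const { return slots_.size(); }
private:
    std::vector<SemanticValue> slots_;
};
```

Each slot can hold a different type, yet every access goes through a named member, so the stack is typed by the grammar rather than untyped.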
