Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!news.dougwise.org!nntpfeed.proxad.net!proxad.net!feeder1-2.proxad.net!news.glorb.com!usenet.stanford.edu!news.iecc.com!nerds-end
From: torbenm@diku.dk (Torben Ægidius Mogensen)
Newsgroups: comp.compilers
Subject: Re: Maintaining scope while parsing C with a YACC grammar
Date: Tue, 03 May 2011 09:51:14 +0200
Organization: SunSITE.dk - Supporting Open source
Lines: 44
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <11-05-008@comp.compilers>
References: <11-04-036@comp.compilers> <11-04-038@comp.compilers> <11-05-003@comp.compilers>
NNTP-Posting-Host: news.iecc.com
X-Trace: gal.iecc.com 1304531589 3573 64.57.183.58 (4 May 2011 17:53:09 GMT)
X-Complaints-To: abuse@iecc.com
NNTP-Posting-Date: Wed, 4 May 2011 17:53:09 +0000 (UTC)
Keywords: C, parse
Posted-Date: 04 May 2011 13:53:09 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: x330-a1.tempe.blueboxinc.net comp.compilers:117

eliben <eliben@gmail.com> writes:
> Since it's parsing of C I'm talking about, this approach will have to
> somehow handle ambiguity of this kind:
>
> T * x;
>
> This can be either a declaration or a multiplication, depending on
> earlier symbol table information (whether T is a type or not).

One technique for handling this is to let the lexer access the symbol
table and determine if T is a type name or not and generate different
tokens for these.  The grammar would then have productions somewhat
like

Declaration -> Type non-type-id
             | ...

Type -> type-id
      | Type *
      | ...

Expression -> Expression * Expression
            | non-type-id
            | ...

It becomes much more complicated for real C, but the idea should be
clear enough.

This requires the parser to keep a symbol table for the current scope
available to the lexer.  This table needs not contain full information
for each identifier, just enough to distinguish type names from other
names.

That said, I consider this kind of ambiguity bad language design, as
it is not only hard for a parser to handle, but also hard for a human
reader.  Possible fixes are to make declarations and expressions /
statements non-overlapping syntactically (as in Pascal) or to keep
type names syntactically distinct from variable names, e.g. by making
type names start with upper case letter and variable names start with
lower case letters (as in Haskell).

	Torben
[As Dennis said, "the ice is thin here." -John]