Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!news.dougwise.org!nntpfeed.proxad.net!proxad.net!feeder1-2.proxad.net!news.glorb.com!usenet.stanford.edu!news.iecc.com!nerds-end From: torbenm@diku.dk (Torben Ægidius Mogensen) Newsgroups: comp.compilers Subject: Re: Maintaining scope while parsing C with a YACC grammar Date: Tue, 03 May 2011 09:51:14 +0200 Organization: SunSITE.dk - Supporting Open source Lines: 44 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <11-05-008@comp.compilers> References: <11-04-036@comp.compilers> <11-04-038@comp.compilers> <11-05-003@comp.compilers> NNTP-Posting-Host: news.iecc.com X-Trace: gal.iecc.com 1304531589 3573 64.57.183.58 (4 May 2011 17:53:09 GMT) X-Complaints-To: abuse@iecc.com NNTP-Posting-Date: Wed, 4 May 2011 17:53:09 +0000 (UTC) Keywords: C, parse Posted-Date: 04 May 2011 13:53:09 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: x330-a1.tempe.blueboxinc.net comp.compilers:117 eliben writes: > Since it's parsing of C I'm talking about, this approach will have to > somehow handle ambiguity of this kind: > > T * x; > > This can be either a declaration or a multiplication, depending on > earlier symbol table information (whether T is a type or not). One technique for handling this is to let the lexer access the symbol table and determine if T is a type name or not and generate different tokens for these. The grammar would then have productions somewhat like Declaration -> Type non-type-id | ... Type -> type-id | Type * | ... Expression -> Expression * Expression | non-type-id | ... It becomes much more complicated for real C, but the idea should be clear enough. This requires the parser to keep a symbol table for the current scope available to the lexer. This table needs not contain full information for each identifier, just enough to distinguish type names from other names. That said, I consider this kind of ambiguity bad language design, as it is not only hard for a parser to handle, but also hard for a human reader. Possible fixes are to make declarations and expressions / statements non-overlapping syntactically (as in Pascal) or to keep type names syntactically distinct from variable names, e.g. by making type names start with upper case letter and variable names start with lower case letters (as in Haskell). Torben [As Dennis said, "the ice is thin here." -John]