Path: csiph.com!x330-a1.tempe.blueboxinc.net!feeder1.hal-mli.net!news.glorb.com!news-out.readnews.com!news-xxxfer.readnews.com!news.misty.com!news.iecc.com!nerds-end From: "Ira Baxter" Newsgroups: comp.compilers Subject: Re: Maintaining scope while parsing C with a YACC grammar Date: Fri, 13 May 2011 17:46:39 -0500 Organization: Compilers Central Lines: 47 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <11-05-011@comp.compilers> References: <11-04-036@comp.compilers> <11-04-038@comp.compilers> <11-05-003@comp.compilers> <11-05-007@comp.compilers> NNTP-Posting-Host: news.iecc.com X-Trace: gal.iecc.com 1305408850 83880 64.57.183.58 (14 May 2011 21:34:10 GMT) X-Complaints-To: abuse@iecc.com NNTP-Posting-Date: Sat, 14 May 2011 21:34:10 +0000 (UTC) Keywords: C, parse, types Posted-Date: 14 May 2011 17:34:10 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com X-RFC2646: Format=Flowed; Original Xref: x330-a1.tempe.blueboxinc.net comp.compilers:120 "Robert A Duff" wrote in message > No, sorry. I'd be interested to hear if anybody built a C parser that > didn't use any feedback into the parser from a symbol table to deal > with the typedef issue. Our DMS Software Reengineering Toolkit does exactly this. (www.semanticdesigns.com/Products/DMS/DMSToolkit.html). We use GLR parsing machinery, which means we can decouple parsing from semantic hackery. The grammar for this is just as you'd expect, with productions for expression statements and declarations. The GLR parser builds an ambiguous tree with maximal sharing of the subtrees. At the end of the parse, you have a parse DAG with ambiguity nodes. Name resolution is handled by an attribute grammar evaluator with some special effects. The AGE is pretty general and we use it for lots of tree-structured analyses; this is pretty nice for nested scopes, building control flow graphs, etc. For name and type resolution, the attributes passed up and down the tree are, as you might expect, reference to symbol scopes and/or type declarations. The special effects of interest here are the ability to declare an aborted attribute evaluation at any tree node; this will kill off any dependent attribute computation and delete the offending tree node and its otherwise-unreferenced children (remember, there may be sharing, so a node may have more than one reference). What we do to handle the T* x issue is check for consistency of the types; if T is a type, then T*x as an expression makes no sense and that part of the AGE dies off. If T is numeric type, then T* x makes no sense as a declartion, and that part of the AGE dies off. Destroying the "bad" subtree removes the incorrect parse from the tree, and thus the final tree has no ambiguity nodes and a completed name/type resoultion with symbol tables that is correct. The C parser is question has been used in anger to carry out static analysis of software systems with usome 19,000 compilation units. We think it is quite robust. This scheme works really nicely for C, C++, COBOL and other languages, and it means we can build grammars without worrying about semantic constraints. The grammars are thus nice and clean, and the AGEs are pretty clean too because of the separation of concerns. You can do this by mixing symbol lookup and parsing, but then it generally gets a lot messier to build both. -- Ira Baxter, CTO www.semanticdesigns.com