From: "Ben L. Titzer"
Newsgroups: comp.compilers
Subject: Re: Parsing C#-like generics
Date: Wed, 13 Jul 2011 10:19:32 -0700 (PDT)
Message-ID: <11-07-023@comp.compilers>
References: <11-07-019@comp.compilers>
Keywords: parse, syntax

On Jul 11, 11:22 am, "Harold Aptroot" wrote:
> I'm having some trouble parsing generics when mixed with comparisons. The
> way I try to do it, there is an ambiguity between LessThan and a "list of
> types between angle brackets".
> For example, x(x to a function call x with a type parameter list < x > and a single argument
> which is the expression x make it clearer, the actual result should just be an AST).
> My parser generator (GOLD parsing system) complains about a shift-reduce
> error, and the parser it produces doesn't want to parse any expression with
> a LessThan in it because it believes that to be an incomplete type list
> (lacking a closing > )
>
> I know it is actually inherently ambiguous, because t<t2>(t3) could mean
> two things:
> - LessThan(t, BiggerThan(t2, t3))
> - invoke t with argument t3
> In that case I want to pick option two.
> For t<t2>t3 I want to pick option one, not report "missing ( "
>
> Can this be done with an LALR parser at all? If so, how?
One trick I've used in the past is to lex the '<' that introduces a type parameter list as part of the identifier:

  "foo"   lexes as a single IDENT token,
  "foo<"  lexes as a single PARAMETERIZED_IDENT token, and
  "foo <" lexes as IDENT followed by LESS_THAN.

You can then use the IDENT and PARAMETERIZED_IDENT tokens in various places in the grammar, with PARAMETERIZED_IDENT always followed by a type list and a '>' token.

This then requires any use of the '<' operator that follows an identifier to have intervening whitespace. It also requires that any parameterization of an identifier have no intervening whitespace. I think it's a decent tradeoff if you are defining the language yourself, but it won't work for languages with more complex rules for resolving the ambiguity.
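A minimal Python sketch of this lexing trick, assuming a toy expression language with only identifiers, '<', '>', parentheses and commas. IDENT, PARAMETERIZED_IDENT and LESS_THAN are the token names from the post; GREATER_THAN, LPAREN, RPAREN, COMMA, WS, EOF, the tuple AST shapes and the Parser class are hypothetical. For brevity the parser is hand-written recursive descent rather than LALR; the point is that the decision is made in the lexer, so a grammar written over these tokens never has to guess whether a '<' starts a comparison or a type list.

```python
import re

# IDENT, PARAMETERIZED_IDENT and LESS_THAN follow the post; the other
# token names are hypothetical names chosen for this sketch.
TOKEN_RE = re.compile(r"""
    (?P<PARAMETERIZED_IDENT>[A-Za-z_]\w*<)  # identifier glued to '<': starts a type list
  | (?P<IDENT>[A-Za-z_]\w*)                 # plain identifier
  | (?P<LESS_THAN><)                        # '<' not glued to an identifier
  | (?P<GREATER_THAN>>)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<COMMA>,)
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(src):
    """Return a list of (kind, text) pairs ending with ("EOF", "")."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        kind, text, pos = m.lastgroup, m.group(), m.end()
        if kind == "WS":
            continue  # whitespace only matters for how '<' was lexed
        if kind == "PARAMETERIZED_IDENT":
            text = text[:-1]  # keep just the name; the kind records the '<'
        tokens.append((kind, text))
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    """Hypothetical recursive-descent parser over the tokens above.

    AST shapes: ("var", name), ("generic", name, types),
    ("call", callee, args), ("lt", l, r), ("gt", l, r).
    """

    def __init__(self, src):
        self.toks = tokenize(src)
        self.i = 0

    def peek(self):
        return self.toks[self.i][0]

    def take(self, kind):
        tok_kind, text = self.toks[self.i]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind}")
        self.i += 1
        return text

    def parse(self):
        e = self.expr()
        self.take("EOF")
        return e

    def expr(self):
        # Comparisons are left-associative here (an assumption).
        left = self.primary()
        while self.peek() in ("LESS_THAN", "GREATER_THAN"):
            op = "lt" if self.peek() == "LESS_THAN" else "gt"
            self.i += 1
            left = (op, left, self.primary())
        return left

    def primary(self):
        if self.peek() == "LPAREN":
            self.take("LPAREN")
            e = self.expr()
            self.take("RPAREN")
            return e
        if self.peek() == "PARAMETERIZED_IDENT":
            node = ("generic", self.take("PARAMETERIZED_IDENT"), self.type_list())
        else:
            node = ("var", self.take("IDENT"))
        if self.peek() == "LPAREN":
            node = ("call", node, self.args())
        return node

    def type_list(self):
        # The opening '<' was already consumed as part of PARAMETERIZED_IDENT.
        types = [self.type_()]
        while self.peek() == "COMMA":
            self.take("COMMA")
            types.append(self.type_())
        self.take("GREATER_THAN")
        return types

    def type_(self):
        if self.peek() == "PARAMETERIZED_IDENT":
            return ("generic", self.take("PARAMETERIZED_IDENT"), self.type_list())
        return self.take("IDENT")

    def args(self):
        self.take("LPAREN")
        out = []
        if self.peek() != "RPAREN":
            out.append(self.expr())
            while self.peek() == "COMMA":
                self.take("COMMA")
                out.append(self.expr())
        self.take("RPAREN")
        return out
```

With this lexer, `t<t2>(t3)` parses as a call of `t<t2>`, while `t < t2 > t3` (with spaces) parses as two comparisons. Written without spaces, `t<t2>t3` is rejected with a SyntaxError, which is the whitespace tradeoff described above: the programmer, not the grammar, resolves the ambiguity.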