From: BGB
Newsgroups: comp.compilers
Subject: Re: Parsing C#-like generics
Date: Thu, 14 Jul 2011 13:13:50 -0700
Organization: albasani.net
Message-ID: <11-07-024@comp.compilers>
References: <11-07-019@comp.compilers> <11-07-021@comp.compilers>
Keywords: parse, syntax

On 7/12/2011 5:25 AM, Hans-Peter Diettrich wrote:
> Harold Aptroot wrote:
> IMO you should better separate declarations from code (statements,
> expressions). Then the parser will "know" from that context, that a
> declaration can contain type lists, but not x
> Above example should parse better as
> x{x where the C style braces around statement blocks allow for better
> disambiguation of the < token.

the problem, though, is that there are often good reasons to allow these kinds of things to appear in related contexts.

for example, if one uses a C-like declaration syntax, but without the aid of having all types and declarations known up front, one has to deal with the potential ambiguity during parsing as to which kind of construct is present, and may need some level of back-tracking to work this out.

part of the issue is that, in statement context, one of 3 major kinds of element may appear: a declaration, a plain statement, or an expression.
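to make the back-tracking point concrete, here is a minimal Python sketch over a hypothetical toy grammar (the names `Parser`, `TypeRef`, `Decl`, `BinOp` are made up for illustration, and this is not the parser of any real compiler). it shows the classic generics ambiguity: "a < b > c ;" can be read either as a declaration of `c` with type `a<b>`, or as the comparison "(a < b) > c". the "try the declaration first, rewind on failure" policy is an assumption chosen for the sketch.

```python
from dataclasses import dataclass
from typing import List


class ParseError(Exception):
    pass


@dataclass
class TypeRef:            # e.g. a<b>
    name: str
    args: List["TypeRef"]


@dataclass
class Decl:               # e.g. a<b> c;
    ty: TypeRef
    name: str


@dataclass
class BinOp:              # e.g. a < b
    op: str
    lhs: object
    rhs: object


class Parser:
    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def next(self):
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, want):
        got = self.next()
        if got != want:
            raise ParseError(f"expected {want!r}, got {got!r}")

    def ident(self):
        tok = self.next()
        if not tok.isidentifier():
            raise ParseError(f"expected identifier, got {tok!r}")
        return tok

    # statement := declaration | relational ';'
    # assumed policy: try the declaration reading first; if it fails
    # anywhere, rewind to the saved position and re-parse as an expression
    def statement(self):
        saved = self.pos
        try:
            return self.declaration()
        except ParseError:
            self.pos = saved  # back-track
        expr = self.relational()
        self.expect(";")
        return expr

    # declaration := type IDENT ';'
    def declaration(self):
        ty = self.type_ref()
        name = self.ident()
        self.expect(";")
        return Decl(ty, name)

    # type := IDENT [ '<' type { ',' type } '>' ]
    def type_ref(self):
        name = self.ident()
        args = []
        if self.peek() == "<":
            self.next()
            args.append(self.type_ref())
            while self.peek() == ",":
                self.next()
                args.append(self.type_ref())
            self.expect(">")
        return TypeRef(name, args)

    # relational := additive { ('<' | '>') additive }, left-associative
    def relational(self):
        lhs = self.additive()
        while self.peek() in ("<", ">"):
            op = self.next()
            lhs = BinOp(op, lhs, self.additive())
        return lhs

    # additive := IDENT { '+' IDENT }
    def additive(self):
        lhs = self.ident()
        while self.peek() == "+":
            self.next()
            lhs = BinOp("+", lhs, self.ident())
        return lhs
```

with this policy, "a < b > c ;" comes back as a `Decl`, while "a < b > c + d ;" gets as far as `c` in the declaration attempt, fails on `+`, and is re-parsed from the start as `BinOp(">", BinOp("<", "a", "b"), BinOp("+", "c", "d"))`. the cost is visible even here: the tokens of the failed attempt get scanned twice, and with longer constructs that re-scanning can grow large.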
given that any of these may appear, and it may not be possible to know up-front which one is present, one has to tread carefully WRT ambiguities between them, as an otherwise innocent-seeming piece of syntax may lead to misparses elsewhere in the language. allowing too many such cases may frustrate programmers, as otherwise valid-seeming code steps on a syntactic edge case and is parsed as something unintended.

better, IMO, is to treat declarations, statements, and expressions effectively as a unified whole (basically, a giant expression tower which also includes statements and declarations at its lower end, essentially as precedence levels below the comma operator). as well, one can try to avoid introducing syntactic ambiguities wherever possible.

doing this may also, in some cases, allow a much more compact syntax, as extra typing which would otherwise be required to disambiguate the syntax can be left out.

consider as a contrived example:

    Foo foo(x)fun(x)new Foo(x);(x);

(never mind that its meaning may not be entirely obvious; code like this may be written in my own language, which otherwise has a mostly C-family style syntax.)

if the above were translated into a purer ActionScript-style syntax, it would look more like:

    function foo(x):Foo
    {
        return(function(x){return(new Foo(x));}(x));
    }

but, in my language, a few of these constructions can be left off (the ActionScript-style version is also valid syntax in my language, and is more-or-less equivalent). all this is left as a matter of style.

side note: "foo(x)fun(x)new Foo(x);(x);", although only trivially different, is not valid in my language, as the parser now has no way to know that it is looking at a function declaration.

however, as a tradeoff, I ended up having to omit C/Java-style casts, and ended up using a slightly nasty-looking syntax for attributes, both because they created ambiguities with other parts of the syntax.
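the "expression tower" idea can be sketched as a single recursive-descent chain in which statements and declarations are just the lowest precedence level, below the comma operator. this is a minimal Python sketch under assumptions: the toy grammar, the tuple node shapes, and the rule that "an expression directly followed by a name is a declaration" are all hypothetical and chosen for illustration, not taken from any real implementation. the point it shows is that no table of previously declared types is consulted.

```python
import re


def tokenize(src):
    # identifiers, integer literals, and single-character punctuation
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)


class TowerParser:
    def __init__(self, src):
        self.toks = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self, want=None):
        cur = self.peek()
        if cur is None or (want is not None and cur != want):
            raise SyntaxError(f"expected {want!r}, got {cur!r}")
        self.pos += 1
        return cur

    def is_name(self, tok):
        return tok is not None and (tok[0].isalpha() or tok[0] == "_") and tok != "return"

    # level 0 (lowest): statements and declarations, below the comma operator
    def stmt(self):
        if self.peek() == "return":
            self.take()
            return ("return", self.comma())
        lhs = self.comma()
        # juxtaposition: "expr name" is read as "type name", i.e. a declaration
        if self.is_name(self.peek()):
            name = self.take()
            init = None
            if self.peek() == "=":
                self.take()
                init = self.assign()
            return ("decl", lhs, name, init)
        return ("expr", lhs)

    # level 1: comma operator
    def comma(self):
        items = [self.assign()]
        while self.peek() == ",":
            self.take()
            items.append(self.assign())
        return items[0] if len(items) == 1 else ("comma", *items)

    # level 2: assignment (right-associative)
    def assign(self):
        lhs = self.additive()
        if self.peek() == "=":
            self.take()
            return ("=", lhs, self.assign())
        return lhs

    # level 3: addition (left-associative)
    def additive(self):
        lhs = self.postfix()
        while self.peek() == "+":
            self.take()
            lhs = ("+", lhs, self.postfix())
        return lhs

    # level 4: calls; arguments are parsed at assignment level so that
    # commas inside the parentheses separate arguments
    def postfix(self):
        expr = self.primary()
        while self.peek() == "(":
            self.take()
            args = []
            if self.peek() != ")":
                args.append(self.assign())
                while self.peek() == ",":
                    self.take()
                    args.append(self.assign())
            self.take(")")
            expr = ("call", expr, tuple(args))
        return expr

    def primary(self):
        if self.peek() == "(":
            self.take()
            inner = self.comma()
            self.take(")")
            return inner
        tok = self.take()
        if not (self.is_name(tok) or tok.isdigit()):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok


def parse(src):
    p = TowerParser(src)
    result = p.stmt()
    if p.peek() == ";":
        p.take()
    if p.peek() is not None:
        raise SyntaxError(f"trailing input at {p.peek()!r}")
    return result
```

for instance, "x = y + 1;" yields `("expr", ("=", "x", ("+", "y", "1")))`, while "Foo x = f(1, 2);" yields `("decl", "Foo", "x", ("call", "f", ("1", "2")))` without `Foo` ever having been declared. the price of this particular choice is that juxtaposition is reserved for declarations: any other construct that puts a name right after an expression would collide with it, which is the kind of tradeoff described above.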
"x=(int)y;" is not valid; it would instead need to be written as "x=y as! int;" ("as" and "as!" are both casts, but differ in how they handle cast failures).

similarly, "$[foo]" or "$[foo(bar)]" is the syntax for attributes, mostly because I was initially using C#-style "[foo(bar)]" attributes, but these clashed in an annoying way with the current array syntax, and the originally planned disambiguation rules would have been a little nasty: unambiguous parsing would have depended on the syntax that followed, and I prefer to be able to tell within a few tokens which syntactic form is present, rather than potentially parsing a large chunk of code only to discover that the wrong path had been followed. note that "@foo(bar)" probably would also have worked, but "$[...]" was what I decided on.

there is other "weird" syntax as well:
"[1,2,3]SB" for a 3-element signed byte array, mostly as I lacked any good way to put it in prefix position ("#SB[1,2,3]" wouldn't have worked for other reasons);
"[1,2,3]:sbyte" is equivalent to the above;
...

this is a major downside, though: the more features one tries to allow through a compact syntax, the more hair tends to appear, and the greater the risk of constructions that are just plain nasty-looking.

it is also made more difficult if one avoids using prior declarations as context (as is frequently done for disambiguation in C and C++), though IMO relying on them has a number of drawbacks (it creates dependency issues, can slow down the parser, ...).

also preferably avoided are contextual semantic dependencies, where a given expression may have very different semantics depending on the context in which it is used. these can complicate the compiler and potentially also confuse the user.
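the "know within a few tokens" goal can be illustrated with a small Python sketch. everything here is hypothetical (toy tokenizer, class name `AttrParser`, tuple node shapes); only the surface forms "$[foo(bar)]", "[1,2,3]SB", "[1,2,3]:sbyte", "as" and "as!" come from the text above. at most two tokens of lookahead pick between an attribute and an expression, and the typed-array suffix and the casts are handled after the operand, so nothing needs to be re-parsed. the difference in cast-failure handling between "as" and "as!" is not modelled; both just produce a cast node.

```python
import re

# suffix table: only "SB" -> "sbyte" comes from the post; nothing else is assumed
ARRAY_SUFFIXES = {"SB": "sbyte"}


def tokenize(src):
    # "as!" as one token, then '$', identifiers, integers, single punctuation
    return re.findall(r"as!|\$|[A-Za-z_]\w*|\d+|\S", src)


class AttrParser:
    def __init__(self, src):
        self.toks = tokenize(src)
        self.pos = 0

    def peek(self, k=0):
        i = self.pos + k
        return self.toks[i] if i < len(self.toks) else None

    def take(self, want=None):
        cur = self.peek()
        if cur is None or (want is not None and cur != want):
            raise SyntaxError(f"expected {want!r}, got {cur!r}")
        self.pos += 1
        return cur

    def item(self):
        # two tokens of lookahead decide the form; no back-tracking
        if self.peek() == "$" and self.peek(1) == "[":
            return self.attribute()
        return self.expr()

    # attribute := '$' '[' NAME [ '(' args ')' ] ']'
    def attribute(self):
        self.take("$")
        self.take("[")
        name = self.take()
        args = []
        if self.peek() == "(":
            self.take()
            while self.peek() != ")":
                args.append(self.take())
                if self.peek() == ",":
                    self.take()
            self.take(")")
        self.take("]")
        return ("attribute", name, tuple(args))

    # expr := primary { ('as' | 'as!') TYPE }
    def expr(self):
        e = self.primary()
        while self.peek() in ("as", "as!"):
            op = self.take()
            e = ("cast", op, e, self.take())
        return e

    def primary(self):
        if self.peek() == "[":
            return self.array_literal()
        return self.take()

    # a plain '[' always starts an array literal, optionally followed by
    # an element-type suffix: "SB" or ":sbyte"
    def array_literal(self):
        self.take("[")
        elems = []
        while self.peek() != "]":
            elems.append(self.take())
            if self.peek() == ",":
                self.take()
        self.take("]")
        if self.peek() in ARRAY_SUFFIXES:
            return ("array", ARRAY_SUFFIXES[self.take()], tuple(elems))
        if self.peek() == ":":
            self.take()
            return ("array", self.take(), tuple(elems))
        return ("array", None, tuple(elems))
```

here "$[foo(bar)]" gives `("attribute", "foo", ("bar",))`, "[1,2,3]SB" and "[1,2,3]:sbyte" both give `("array", "sbyte", ("1", "2", "3"))`, and "y as! int" gives `("cast", "as!", "y", "int")`. with C#-style "[foo(bar)]" attributes, the leading '[' alone would not say which form is present, which is exactly the case the '$' prefix avoids.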
with a plainer syntax, say plain JavaScript style, one will not have so many of these issues, as pretty much everything in statement context is either a plain expression or uses a keyword to indicate what it is (the 'function' or 'var' keywords disambiguate these sorts of things). there are merits to this route as well, as having most things indicated explicitly via keywords makes the parser a good deal simpler.

ActionScript adds a few things to the basic JavaScript-style syntax, notably modifiers and explicit types, but most of these are relatively straightforward (the modifiers are themselves keywords, and several other special cases are introduced mostly by adding keywords in certain contexts, ...).

or such...