From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: syntax complexity
Date: Tue, 21 Feb 2023 08:14:10 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-02-055@comp.compilers>
References: <23-02-045@comp.compilers> <23-02-047@comp.compilers> <23-02-050@comp.compilers> <29156_1676600565_63EEE4F4_29156_1009_1_23-02-051@comp.compilers> <23-02-052@comp.compilers> <23-02-053@comp.compilers>
Keywords: syntax, lex, comment
Posted-Date: 21 Feb 2023 12:38:32 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

Our moderator writes:
>[There are definitely things that are hard to say in BNF, even though
>they're intuitively simple.

An example is the solution to the dangling-else problem that we
discussed some time ago.  At least one of the language standards I
looked at during this discussion specified an ambiguous grammar for
the IF-statement, with the disambiguation coming from prose, rather
than specifying an unambiguous grammar.  Of course, better than either
solution is to design the language to require that an IF-statement is
terminated with, e.g., fi (Algol 68) or END (Modula-2).

>Another is floating point numbers with
>optional "." and "E" but you need at least one of them. -John]

Regular expression syntax is missing an operator that signifies the
intersection of the sets recognized by the operand regexps.  Let's
call this operator "&".
Then this requirement for an FP number can be expressed as:

([0-9.]*&.*[0-9].*)(E[0-9]+)?&.*[.E].*

Here the top-level "&" binds more loosely than concatenation.
"[0-9.]*" specifies a lenient form of the mantissa part; ".*[0-9].*"
specifies a string that has at least one digit; so
"([0-9.]*&.*[0-9].*)" says that the mantissa part can contain digits
and "." and must contain at least one digit.  "(E[0-9]+)?" specifies
the optional exponent part.  So "([0-9.]*&.*[0-9].*)(E[0-9]+)?" is a
lenient form of an FP number that may miss both "." and "E".
".*[.E].*" specifies the requirement stated above: The number must
contain a "." or an "E".  Again the "&" combines these requirements.

You can translate regexps with & to a DFA, and you don't lose any
performance there.  For an NFA-based implementation, the only way I
see is that you process both sub-NFAs and check if they both accept
the string, so it's somewhat slower; I guess that this is a major
reason why such an operator has not been added to widely-used regexp
syntaxes.

Has anybody implemented such an operator at all?

- anton
-- 
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[Wouldn't that pattern allow 1.2.3 ? -John]
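
To make the "&" pattern above concrete, here is a minimal Python sketch
of a matcher with an intersection operator.  It uses Brzozowski
derivatives, which is a technique swapped in for illustration; the post
itself only mentions DFA and NFA implementations.  All names (both,
matches, fp, and so on) are hypothetical, and the pattern is built with
helper functions instead of being parsed from regexp text.

```python
# Sketch only: regexps are nested tuples; "&" is the hypothetical both().
EMPTY = ("empty",)   # matches no string at all
EPS = ("eps",)       # matches only the empty string
ANY = ("any",)       # matches any single character, like "."

def chars(s):
    return ("chars", frozenset(s))

def cat(a, b):
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("cat", a, b)

def alt(a, b):
    if a == EMPTY:
        return b
    if b == EMPTY or a == b:
        return a
    return ("alt", a, b)

def both(a, b):
    """Intersection: the "&" operator from the post."""
    if a == EMPTY or b == EMPTY:
        return EMPTY
    return a if a == b else ("and", a, b)

def star(a):
    return ("star", a)

def opt(a):
    return alt(EPS, a)

def plus(a):
    return cat(a, star(a))

def nullable(r):
    """True if r accepts the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chars", "any"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])   # "cat" and "and"

def deriv(r, c):
    """Regexp for what may follow c in strings accepted by r."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "any":
        return EPS
    if tag == "chars":
        return EPS if c in r[1] else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "and":   # both operands must accept the rest
        return both(deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return cat(deriv(r[1], c), r)
    first = cat(deriv(r[1], c), r[2])          # "cat"
    return alt(first, deriv(r[2], c)) if nullable(r[1]) else first

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

# ([0-9.]*&.*[0-9].*)(E[0-9]+)?&.*[.E].*
DIGIT = chars("0123456789")
ANYTHING = star(ANY)
mantissa = both(star(chars("0123456789.")),
                cat(ANYTHING, cat(DIGIT, ANYTHING)))
exponent = opt(cat(chars("E"), plus(DIGIT)))
fp = both(cat(mantissa, exponent),
          cat(ANYTHING, cat(chars(".E"), ANYTHING)))

if __name__ == "__main__":
    for text in ["1.5", "1E10", "15", "E5", "1.2.3"]:
        print(text, matches(fp, text))
```

In this sketch the derivative of an intersection is just the
intersection of the derivatives, so "&" costs one extra case in
deriv().  Because "[0-9.]*" places no limit on the number of "."
characters, the pattern as written does accept "1.2.3".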