Groups | Search | Server Info | Keyboard shortcuts | Login | Register


Groups > comp.compilers > #3394

Re: syntax complexity

From anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups comp.compilers
Subject Re: syntax complexity
Date 2023-02-21 08:14 +0000
Organization Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID <23-02-055@comp.compilers> (permalink)
References (2 earlier) <23-02-047@comp.compilers> <23-02-050@comp.compilers> <29156_1676600565_63EEE4F4_29156_1009_1_23-02-051@comp.compilers> <23-02-052@comp.compilers> <23-02-053@comp.compilers>

Show all headers | View raw


Our moderator writes:

>[There are definitely things that are hard to say in BNF, even though
>they're intuitively simple.

An example is the solution to the dangling-else problem that we
discussed some time ago.  At least one of the language standards I
looked at during this discussion specified an ambiguous grammar for
the IF-statement, with the disambiguation coming from prose, rather
than specifying an unambiguous grammar.

Of course, better than either solution is to design the language to
require that an IF-statement is terminated with, e.g., fi (Algol 68) or END
(Modula-2).

>Another is floating point numbers with
>optional "." and "E" but you need at least one of them. -John]

Regular expression syntax is missing an operator that signifies the
intersection of the sets recognized by the operand regexps.  Let's
call this operator "&".  Then this requirement for an FP number can be
expressed as:

([0-9.]*&.*[0-9].*)(E[0-9]+)?&.*[.E].*

"[0-9.]*" specifies a lenient form of the mantissa part; ".*[0-9].*"
specifies a string that has at least one digit; so
"([0-9.]*&.*[0-9].*)" says that the mantissa part can contain digits
and "." and must contain one digit.  "(E[0-9]+)?" specifies the
optional exponent part.  So "([0-9.]*&.*[0-9].*)(E[0-9]+)" is a
lenient form of an FP number that may miss both "." and "E".
".*[.E].*" specifies the requirement stated above: The number must
contain a "." or an "E".  Again the "&" combines these requirements.

You can translate regexps with & to a DFA, and don't lose any
performance there.  For an NFA-based implementation, the only way I
see is that you process both sub-NFAs and check if they both accept
the string, so it's somewhat slower; I guess that this is a major
reason why such an operator has not been added to widely-used regexp
syntaxes.

Has anybody implemented such an operator at all?

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[Wouldn't that pattern allow 1.2.3 ? -John]

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

syntax complexity gah4 <gah4@u.washington.edu> - 2023-02-15 15:08 -0800
  Re: syntax complexity Thomas Koenig <tkoenig@netcologne.de> - 2023-02-16 06:32 +0000
  Re: syntax complexity Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2023-02-16 12:03 +0100
    Re: syntax complexity gah4 <gah4@u.washington.edu> - 2023-02-16 11:33 -0800
      Re: syntax complexity gah4 <gah4@u.washington.edu> - 2023-02-16 16:08 -0800
      Re: syntax complexity Roger L Costello <costello@mitre.org> - 2023-02-20 15:09 +0000
        Re: syntax complexity gah4 <gah4@u.washington.edu> - 2023-02-20 09:57 -0800
          Re: syntax complexity anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2023-02-21 08:14 +0000
            Re: syntax complexity anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2023-02-21 18:39 +0000
              Re: ireegular expressions, syntax complexity anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2023-02-22 10:55 +0000
            Re: irregular expressions, syntax complexity arnold@freefriends.org (Aharon Robbins) - 2023-02-22 08:53 +0000
            Re: irregular expressions, syntax complexity Kaz Kylheku <864-117-4973@kylheku.com> - 2023-02-23 00:34 +0000
        Re: syntax complexity George Neuner <gneuner2@comcast.net> - 2023-02-20 13:49 -0500
      syntax complexity Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2023-02-21 20:54 +0200

csiph-web