GNU Awk's types of regular expressions

From Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups comp.lang.awk
Subject GNU Awk's types of regular expressions
Date 2024-11-28 19:18 +0100
Organization A noiseless patient Spider
Message-ID <viac5m$l8oh$1@dont-email.me>


GNU Awk currently has three types of regular expressions. In
addition to the standard regexp constants (/regex/) and the dynamic
regexps ("regex", or variables containing "regex"), newer versions
also support first-class regexp objects (@/regex/, "Strongly Typed
Regexp Constants").
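
For illustration, here is a minimal sketch of the three forms side by
side. The variable names (str, dyn, re) are only placeholders, and the
script assumes a gawk recent enough to provide typeof() and @/.../
(4.2 or later); save it to a file and run it with gawk -f.

```awk
# Sketch only: requires GNU Awk 4.2 or later (typeof() and @/.../ are
# gawk extensions). Variable names are illustrative.
BEGIN {
    str = "don't panic"

    # 1. Regexp constant: the pattern is fixed in the program text.
    if (str ~ /pan.c/)
        print "regexp constant matches"

    # 2. Dynamic regexp: an ordinary string used as a pattern.
    dyn = "pan.c"
    if (str ~ dyn)
        print "dynamic regexp matches"

    # 3. Strongly typed regexp constant: a regexp value held in a variable.
    re = @/pan.c/
    if (str ~ re)
        print "typed regexp matches"

    # The two variables differ in type although they hold the same text.
    print typeof(dyn), typeof(re)
}
```

The last line is what distinguishes the second and third form: dyn is
an ordinary string, re is a regexp-typed value.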

One principal advantage of regexp constants is that the matching
engine for the regexp can be built in advance, while a dynamic regexp
may be constructed at runtime (from strings) and needs an explicit
runtime step to build the engine before any matching can be done.
I had assumed that @/regex-const/ would behave like /regex-const/ in
that respect ... until I found this text in the GNU Awk manual:

|
| Thus, if you have something like this:
|
|   re = @/don't panic/
|   sub(/don't/, "do", re)
|   print typeof(re), re
|
| then re retains its type, but now attempts to match the string ‘do
| panic’. This provides a (very indirect) way to create regexp-typed
| variables at runtime.
|
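
The quoted fragment can be turned into a complete program to watch the
type and the text before and after the sub() call. This is only a
sketch wrapped around the manual's own three lines; the extra print and
the final match test are additions for illustration, and it assumes a
gawk version whose manual contains the passage above.

```awk
# Sketch only: the manual's fragment inside a BEGIN block, with extra
# output added for illustration. Requires typed-regexp support in gawk.
BEGIN {
    re = @/don't panic/
    print typeof(re), re        # type and text before the edit

    sub(/don't/, "do", re)      # edit the regexp's text in place
    print typeof(re), re        # type and text after the edit

    # Use the modified value as a pattern against the new text.
    if ("do panic" ~ re)
        print "modified regexp matches"
}
```

According to the manual text, both print statements should report the
type regexp, with only the text differing.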

(I'm astonished that first-class regexp objects can be changed
dynamically. But that is not my point here; I'm interested in the
potential precompilation of regexp constants...)

This would imply that first-class regexp constants can be changed
like dynamic regexps and that no regexp precompilation is involved.
It would also raise the suspicion that the "normal" regexp constants
are probably not precompiled either.

So constant regexps (both forms) have (only?) the advantage that the
regexp syntax can be checked up front, while awk parses the program, e.g.,

 	re = @/don't panic[/
 	     ^ unterminated regexp

Dynamic regexps, and first-class regexps that got changed (e.g.
by code like

  sub(/don't/, "do[", re)

in the sample snippet above), would both produce runtime errors, e.g.

  error: Unmatched [, [^, [:, [., or [=: /do[ panic/
  fatal: could not make typed regex

(as all ill-formed regexp-types will produce a runtime error).

Janis


