
Re: Experiences with match() subexpressions?

From Kaz Kylheku <643-408-1753@kylheku.com>
Newsgroups comp.lang.awk
Subject Re: Experiences with match() subexpressions?
Date 2025-04-11 08:22 +0000
Organization A noiseless patient Spider
Message-ID <20250411004239.533@kylheku.com>
References (1 earlier) <vt8bit$2uiq5$1@dont-email.me> <vt8j5u$1gmdg$1@news.xmission.com> <vt9dre$3t3po$1@dont-email.me> <67f8b7af$0$705$14726298@news.sunsite.dk> <vtafa1$vfhn$1@dont-email.me>



On 2025-04-11, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> On 11.04.2025 08:33, Aharon Robbins wrote:
>> In article <vt9dre$3t3po$1@dont-email.me>,
>> Janis Papanagnou  <janis_papanagnou+ng@hotmail.com> wrote:
>>> The feature can be very useful,
>>> but not for the case I was looking for. - Actually, it could have
>>> provided the functionality I was seeking, but since GNU Awk relies
>>> on the GNU regexp functions as they are implemented I cannot expect
>>> that any provided features gets extended by Awk. - If GNU Awk would
>>> have an own RE implementation then we could think about using, e.g.,
>>> another array dimension to store the (now only temporary existing,
>>> and generally unavailable) subexpressions.
>> 
>> Actually, this is not so trivial.  The data structures at the C level
>> as mandated by POSIX are one dimensional; the submatches in parentheses
>> are counted from left to right. There's no way to represent the
>> subexpressions that are under control of interval expressions, which
>> would essentially require a two-dimensional data structure.
>
> Yes, that's why I had thought about a 2-dimensional array [on GNU
> Awk level] so that arr[n][i] for i=1..z would contain the patterns.
> This is what I actually tried with GNU Awk (before I had asked you)
> to see whether there's some undocumented feature.

I solved this problem 15 years ago in the TXR Pattern Language:

$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e'
r[0]="r1"
r[1]="r2"
r[2]="r3"
e="e"

We can eval this output in Bash and get a ${r[@]} array.

We can see the captured variables in a Lisp format:

$ echo 'R=r1,R=r2,R=r3,E=e' | txr -l -c '@(coll)R=@r,@(until)E@(end)E=@e'
(r "r1" "r2" "r3")
(e . "e")

The matches occurring in repetition constructs like @(coll), or its
vertical, line-oriented counterpart @(collect), are automatically
tabulated into lists.
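For contrast with the POSIX limitation Aharon describes, here is a
minimal Python sketch (not TXR; the function names are hypothetical).
Python's re module, like POSIX regexec(), keeps one slot per
parenthesized group, so a group under a repetition operator reports
only its last iteration. Getting every iteration means scanning item
by item and appending each capture to a list, which is roughly what
@(coll) does for you:

```python
import re


def last_only(text):
    """One regex with a repeated group: each group has a single slot,
    so the repeated R= group reports only its last iteration."""
    m = re.fullmatch(r"(?:R=([^,]*),)*E=(.*)", text)
    return (m.group(1), m.group(2)) if m else None


def collect_all(text):
    """Rough emulation of @(coll)R=@r,@(until)E@(end)E=@e: gather every
    R= value into a list, stopping at the first E= item."""
    r = []
    for item in text.split(","):
        if item.startswith("E="):
            return r, item[2:]
        m = re.fullmatch(r"R=(.*)", item)
        if m:
            r.append(m.group(1))
    return r, None


text = "R=r1,R=r2,R=r3,E=e"
print(last_only(text))    # ('r3', 'e')
print(collect_all(text))  # (['r1', 'r2', 'r3'], 'e')
```

The single regex loses r1 and r2; the loop keeps them because each
capture is stored as it happens rather than in a fixed group slot.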

We can see that the "e" variable wasn't tabulated; it is string-valued
rather than list-valued.

To combine r and e into one list, one possibility is to use the
@(merge dest {sources}*) directive, which examines the different
nesting depths of its operands and intelligently combines them.
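The effect of @(merge) in this particular case can be approximated in
Python as below. The helper merge is hypothetical, and it only handles
the one-level case shown here (a list plus a string); it does not
reproduce the general nesting-depth logic of the real directive:

```python
def merge(*sources):
    """Hypothetical, simplified analogue of @(merge dest {sources}*):
    a scalar source is promoted to a one-element list, then all
    sources are concatenated in order."""
    out = []
    for s in sources:
        if isinstance(s, list):
            out.extend(s)
        else:
            out.append(s)
    return out


r = ["r1", "r2", "r3"]
e = "e"
x = merge(r, e)
print(x)  # ['r1', 'r2', 'r3', 'e']
```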

$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e
@(merge x r e)'
r[0]="r1"
r[1]="r2"
r[2]="r3"
e="e"
x[0]="r1"
x[1]="r2"
x[2]="r3"
x[3]="e"

$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e
@(merge x r e)
@(forget r e)'
x[0]="r1"
x[1]="r2"
x[2]="r3"
x[3]="e"

A plethora of techniques are possible.

In Lisp, split the data along commas, then again on "=":

1> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ","))
("R=r1" "R=r2" "R=r3" "E=e")
2> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ",")
     (map (op spl "=")))
(("R" "r1") ("R" "r2") ("R" "r3") ("E" "e"))
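For comparison, the same two-level split in Python; this is only a
sketch, with str.split on a literal separator playing roughly the role
of spl:

```python
text = "R=r1,R=r2,R=r3,E=e"

# First level: split the record on commas.
fields = text.split(",")
# Second level: split each field again on "=".
pairs = [field.split("=") for field in fields]

print(fields)  # ['R=r1', 'R=r2', 'R=r3', 'E=e']
print(pairs)   # [['R', 'r1'], ['R', 'r2'], ['R', 'r3'], ['E', 'e']]
```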

Or pattern match the comma splits:

3> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ",")
     (map (do match `@key=@val` @1 (list key val))))
(("R" "r1") ("R" "r2") ("R" "r3") ("E" "e"))

Just the R's, please:

4> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ",")
     (map (do if-match `R=@val` @1 val)))
("r1" "r2" "r3" nil)

Splice out the nils:

8> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ",")
     (mappend (do if-match `R=@val` @1 (list val))))
("r1" "r2" "r3")

Or remove them:

9> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ",")
     (map (do if-match `R=@val` @1 val))
     (remq nil))
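The same select-then-filter steps as a Python sketch; r_value is a
hypothetical helper, and None stands in for nil:

```python
import re

text = "R=r1,R=r2,R=r3,E=e"


def r_value(item):
    """Return the value of an R=... item, or None when it doesn't match."""
    m = re.fullmatch(r"R=(.*)", item)
    return m.group(1) if m else None


# Like the map step: one result per field, None where the match fails.
with_nones = [r_value(item) for item in text.split(",")]
# Like (remq nil): drop the failed matches.
only_rs = [v for v in with_nones if v is not None]

print(with_nones)  # ['r1', 'r2', 'r3', None]
print(only_rs)     # ['r1', 'r2', 'r3']
```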

Heck, use a Lispified Awk. The variable f holds
the fields. When we assign f to itself, that
forces the record, variable rec, to be recalculated
using ofs:

10> (awk (:inputs '("R=r1,R=r2,R=r3,E=e"))
         (:set fs "," ofs ":")
         (t (set f f) (prn)))
R=r1:R=r2:R=r3:E=e
nil
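In standard Awk the corresponding trick is the idiom $1 = $1, which
makes Awk rebuild $0 from the fields joined with OFS. A minimal Python
sketch of that rebuild step follows; rebuild_record is a hypothetical
name, and it treats FS as a literal string rather than a regex:

```python
def rebuild_record(rec, fs=",", ofs=":"):
    """Split rec on fs, then rejoin the fields with ofs, mimicking what
    Awk does to the record after a field is assigned."""
    fields = rec.split(fs)
    return ofs.join(fields)


print(rebuild_record("R=r1,R=r2,R=r3,E=e"))  # R=r1:R=r2:R=r3:E=e
```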

Use two Awks, nested inside each other: the inner Awk
processes the fields f produced by the outer Awk:

11> (awk (:inputs '("R=r1,R=r2,R=r3,E=e"))
         (:set fs "," ofs ":")
         (t (awk (:inputs f)
                 (:set fs "=")
                 (t (prn [f 1])))))
r1
r2
r3
e
nil
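The nested Awks amount to two nested loops. Here is a Python sketch;
nested_split is a hypothetical name, and fields without an "=" are
simply skipped, which may differ from what the TXR awk macro does in
that case:

```python
def nested_split(rec, outer_fs=",", inner_fs="="):
    """Outer pass: split rec into fields. Inner pass: treat each field
    as its own record, split it on inner_fs, and keep the second
    sub-field (the one [f 1] selects, since TXR indexes from zero)."""
    values = []
    for field in rec.split(outer_fs):
        sub = field.split(inner_fs)
        if len(sub) > 1:  # this sketch skips fields that contain no "="
            values.append(sub[1])
    return values


print(nested_split("R=r1,R=r2,R=r3,E=e"))  # ['r1', 'r2', 'r3', 'e']
```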

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



Thread

Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-10 09:06 +0200
  Re: Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-10 09:09 +0200
    Re: Experiences with match() subexpressions? gazelle@shell.xmission.com (Kenny McCormack) - 2025-04-10 11:08 +0000
      Re: Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-10 13:55 +0200
        Re: Experiences with match() subexpressions? gazelle@shell.xmission.com (Kenny McCormack) - 2025-04-10 14:04 +0000
          Re: Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-10 23:39 +0200
            Re: Experiences with match() subexpressions? arnold@freefriends.org (Aharon Robbins) - 2025-04-11 06:33 +0000
              Re: Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-11 09:10 +0200
                Re: Experiences with match() subexpressions? Kaz Kylheku <643-408-1753@kylheku.com> - 2025-04-11 08:22 +0000
                Re: Experiences with match() subexpressions? Manuel Collado <mcollado2011@gmail.com> - 2025-04-18 12:03 +0200
                Re: Experiences with match() subexpressions? gazelle@shell.xmission.com (Kenny McCormack) - 2025-04-18 12:01 +0000
                Re: Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-18 14:24 +0200
              Re: Experiences with match() subexpressions? Kaz Kylheku <643-408-1753@kylheku.com> - 2025-04-11 07:40 +0000
              The new matcher (Was: Experiences with match() subexpressions?) gazelle@shell.xmission.com (Kenny McCormack) - 2025-04-11 08:57 +0000
                Re: The new matcher (Was: Experiences with match() subexpressions?) Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-11 15:50 +0200
              Re: Experiences with match() subexpressions? Kaz Kylheku <643-408-1753@kylheku.com> - 2025-04-11 17:54 +0000
    Re: Experiences with match() subexpressions? Ed Morton <mortonspam@gmail.com> - 2025-04-10 20:07 -0500
      Re: Experiences with match() subexpressions? Ed Morton <mortonspam@gmail.com> - 2025-04-13 12:52 -0500
        Nitpicking the code (Was: Experiences with match() subexpressions?) gazelle@shell.xmission.com (Kenny McCormack) - 2025-04-14 18:20 +0000
          Re: Nitpicking the code (Was: Experiences with match() subexpressions?) Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-14 20:53 +0200
            Re: Nitpicking the code (Was: Experiences with match() subexpressions?) Ed Morton <mortonspam@gmail.com> - 2025-04-14 18:55 -0500
              Re: Nitpicking the code (Was: Experiences with match() subexpressions?) Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-15 05:35 +0200
