| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Newsgroups | comp.lang.awk |
| Subject | Re: Experiences with match() subexpressions? |
| Date | 2025-04-11 08:22 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <20250411004239.533@kylheku.com> |
| References | (1 earlier) <vt8bit$2uiq5$1@dont-email.me> <vt8j5u$1gmdg$1@news.xmission.com> <vt9dre$3t3po$1@dont-email.me> <67f8b7af$0$705$14726298@news.sunsite.dk> <vtafa1$vfhn$1@dont-email.me> |
On 2025-04-11, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> On 11.04.2025 08:33, Aharon Robbins wrote:
>> In article <vt9dre$3t3po$1@dont-email.me>,
>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>>> The feature can be very useful,
>>> but not for the case I was looking for. - Actually, it could have
>>> provided the functionality I was seeking, but since GNU Awk relies
>>> on the GNU regexp functions as they are implemented I cannot expect
>>> that any provided feature gets extended by Awk. - If GNU Awk had
>>> its own RE implementation then we could think about using, e.g.,
>>> another array dimension to store the (now only temporary existing,
>>> and generally unavailable) subexpressions.
>>
>> Actually, this is not so trivial. The data structures at the C level
>> as mandated by POSIX are one dimensional; the submatches in parentheses
>> are counted from left to right. There's no way to represent the
>> subexpressions that are under control of interval expressions, which
>> would essentially require a two-dimensional data structure.
>
> Yes, that's why I had thought about a 2-dimensional array [on GNU
> Awk level] so that arr[n][i] for i=1..z would contain the patterns.
> This is what I actually tried with GNU Awk (before I had asked you)
> to see whether there's some undocumented feature.
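Here is a minimal Python sketch of the limitation Arnold describes, for readers outside Awk. The function names are hypothetical. Python's re module behaves like POSIX regexec here: a group inside a repetition reports only its last iteration. The second function builds the arr[n][i] layout Janis suggests by re-scanning the repeated run. That is a workaround, not a feature of either engine, and its lists are 0-based rather than i=1..z.

```python
import re

# Group 1: the whole repeated run; group 2: an R value inside the repetition;
# group 3: the trailing E value.
PATTERN = re.compile(r"((?:R=([^,]*),)*)E=(.*)")

def last_iteration_only(text):
    """One match object: group 2 keeps only its final iteration."""
    m = PATTERN.fullmatch(text)
    return m.group(2), m.group(3)

def two_dimensional(text):
    """Hypothetical arr[n][i]: every iteration of group 2 is kept.

    The earlier iterations are gone from the match object, so the
    repeated run (group 1) is scanned again with findall.
    """
    m = PATTERN.fullmatch(text)
    return {2: re.findall(r"R=([^,]*),", m.group(1)), 3: [m.group(3)]}

print(last_iteration_only("R=r1,R=r2,R=r3,E=e"))  # ('r3', 'e')
print(two_dimensional("R=r1,R=r2,R=r3,E=e"))      # {2: ['r1', 'r2', 'r3'], 3: ['e']}
```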
I solved this problem 15 years ago in the TXR Pattern Language:
$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e'
r[0]="r1"
r[1]="r2"
r[2]="r3"
e="e"
We can eval this output in Bash to get a ${r[@]} array.
We can see the captured variables in a Lisp format:
$ echo 'R=r1,R=r2,R=r3,E=e' | txr -l -c '@(coll)R=@r,@(until)E@(end)E=@e'
(r "r1" "r2" "r3")
(e . "e")
The matches occurring in repetition constructs like @(coll), or its
vertical, line-oriented counterpart @(collect), are automatically
tabulated into lists.
We can see that the "e" variable wasn't collected into a list: it is
string-valued rather than list-valued, because E=@e is matched only
once, outside the @(coll).
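For comparison, here is a rough Python analogue of @(coll)R=@r,@(until)E@(end)E=@e. It is an assumption for illustration, not how TXR works internally, and it is simplified: the R= items must be contiguous, whereas @(coll) can skip intervening text. Matches from the repeated part accumulate in a list, and the single trailing match stays a plain string.

```python
import re

def coll_until(line):
    """Hypothetical sketch: collect R=<value>, items until a piece starts
    with E, then bind the text after E= as a scalar."""
    item = re.compile(r"R=([^,]*),")
    bindings = {"r": []}
    pos = 0
    while not line.startswith("E", pos):  # the @(until)E condition
        m = item.match(line, pos)
        if m is None:
            return None  # neither an R= item nor the terminator
        bindings["r"].append(m.group(1))  # repeated match: list-valued
        pos = m.end()
    tail = re.compile(r"E=(.*)").fullmatch(line, pos)
    if tail is None:
        return None
    bindings["e"] = tail.group(1)  # matched once: string-valued
    return bindings

print(coll_until("R=r1,R=r2,R=r3,E=e"))  # {'r': ['r1', 'r2', 'r3'], 'e': 'e'}
```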
To combine the two into a single list, one possibility is the
@(merge dest {sources}*) directive, which examines the different
nesting depths of its operands and combines them intelligently.
$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e
@(merge x r e)'
r[0]="r1"
r[1]="r2"
r[2]="r3"
e="e"
x[0]="r1"
x[1]="r2"
x[2]="r3"
x[3]="e"
$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e
@(merge x r e)
@(forget r e)'
x[0]="r1"
x[1]="r2"
x[2]="r3"
x[3]="e"
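The merge-then-forget sequence above can be approximated in Python for this particular case. The merge function below is a hypothetical stand-in and handles only scalars (depth 0) and flat lists (depth 1). The TXR directive is described as reconciling differing depths more generally.

```python
def merge(*sources):
    """Hypothetical stand-in for @(merge x r e): lift a scalar to a
    one-element list, then concatenate everything in argument order."""
    merged = []
    for src in sources:
        merged.extend(src if isinstance(src, list) else [src])
    return merged

bindings = {"r": ["r1", "r2", "r3"], "e": "e"}
bindings["x"] = merge(bindings["r"], bindings["e"])

# Analogue of @(forget r e): drop the original bindings, keep only x.
for name in ("r", "e"):
    del bindings[name]

print(bindings)  # {'x': ['r1', 'r2', 'r3', 'e']}
```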
A plethora of techniques are possible.
In TXR Lisp, split the data on commas, then again on "=":
1> (flow "R=r1,R=r2,R=r3,E=e"
(spl ","))
("R=r1" "R=r2" "R=r3" "E=e")
2> (flow "R=r1,R=r2,R=r3,E=e"
(spl ",")
(map (op spl "=")))
(("R" "r1") ("R" "r2") ("R" "r3") ("E" "e"))
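For readers more at home in Python, the same two-stage split looks like this. It is a sketch that assumes plain string separators, which is all this input needs.

```python
record = "R=r1,R=r2,R=r3,E=e"

# Stage 1: split on commas; stage 2: split each piece on "=".
pieces = record.split(",")
pairs = [piece.split("=") for piece in pieces]

print(pieces)  # ['R=r1', 'R=r2', 'R=r3', 'E=e']
print(pairs)   # [['R', 'r1'], ['R', 'r2'], ['R', 'r3'], ['E', 'e']]
```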
Or pattern-match the comma-separated pieces:
3> (flow "R=r1,R=r2,R=r3,E=e"
(spl ",")
(map (do match `@key=@val` @1 (list key val))))
(("R" "r1") ("R" "r2") ("R" "r3") ("E" "e"))
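The `@key=@val` pattern can be mimicked in Python with a regex that has named groups, applied to each piece. In this sketch the key stops at the first "=", and a piece without "=" raises ValueError. Both behaviors are choices made for the sketch, not claims about TXR's matcher.

```python
import re

KEY_VAL = re.compile(r"(?P<key>.*?)=(?P<val>.*)")

def match_pairs(record):
    """Split on commas, then destructure each piece as key=val."""
    result = []
    for piece in record.split(","):
        m = KEY_VAL.fullmatch(piece)
        if m is None:
            raise ValueError(f"no match: {piece!r}")
        result.append([m["key"], m["val"]])
    return result

print(match_pairs("R=r1,R=r2,R=r3,E=e"))
# [['R', 'r1'], ['R', 'r2'], ['R', 'r3'], ['E', 'e']]
```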
Just the R's, please:
4> (flow "R=r1,R=r2,R=r3,E=e"
(spl ",")
(map (do if-match `R=@val` @1 val)))
("r1" "r2" "r3" nil)
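In Python terms, if-match without an else branch behaves like a helper that returns None where TXR returns nil. The helper name below is hypothetical.

```python
import re

def r_value(piece):
    """Return the value of an R=... piece, or None (TXR's nil) otherwise."""
    m = re.fullmatch(r"R=(.*)", piece)
    return m.group(1) if m else None

record = "R=r1,R=r2,R=r3,E=e"
mapped = [r_value(p) for p in record.split(",")]
print(mapped)  # ['r1', 'r2', 'r3', None]
```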
Splice out the nils:
8> (flow "R=r1,R=r2,R=r3,E=e"
(spl ",")
(mappend (do if-match `R=@val` @1 (list val))))
("r1" "r2" "r3")
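This works because in TXR nil doubles as the empty list, so a failed if-match contributes nothing to mappend. Python has no such pun. The sketch below therefore returns an explicit empty list on a mismatch and concatenates with itertools.chain (the helper name is hypothetical).

```python
import re
from itertools import chain

def r_value_as_list(piece):
    """Like (list val): a one-element list on a match, else an empty list."""
    m = re.fullmatch(r"R=(.*)", piece)
    return [m.group(1)] if m else []

record = "R=r1,R=r2,R=r3,E=e"
# mappend analogue: map each piece to a list, then concatenate the lists.
spliced = list(chain.from_iterable(map(r_value_as_list, record.split(","))))
print(spliced)  # ['r1', 'r2', 'r3']
```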
Or remove them:
9> (flow "R=r1,R=r2,R=r3,E=e"
(spl ",")
(map (do if-match `R=@val` @1 val))
(remq nil))
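The remove-afterwards variant corresponds to a filter in Python. remq compares elements with eq (identity), which maps naturally onto `is not None`. The helper is the same hypothetical one as before, repeated so the block stands alone.

```python
import re

def r_value(piece):
    """Return the value of an R=... piece, or None (TXR's nil) otherwise."""
    m = re.fullmatch(r"R=(.*)", piece)
    return m.group(1) if m else None

record = "R=r1,R=r2,R=r3,E=e"
values = [r_value(p) for p in record.split(",")]
# remq analogue: drop the elements identical to None.
values = [v for v in values if v is not None]
print(values)  # ['r1', 'r2', 'r3']
```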
Heck, use a Lispified Awk. The variable f holds
the fields. When we assign f to itself, that
forces rec to be recalculated, joining the fields
with ofs:
10> (awk (:inputs '("R=r1,R=r2,R=r3,E=e"))
(:set fs "," ofs ":")
(t (set f f) (prn)))
R=r1:R=r2:R=r3:E=e
nil
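This is the same effect classic Awk programmers get with `$1 = $1`, which rebuilds `$0` using OFS. The hypothetical Python function below sketches the idea with literal separators. It does not model Awk's regex FS semantics.

```python
def rebuild_record(rec, fs=",", ofs=":"):
    """Split rec into fields on fs, then rejoin the fields with ofs."""
    fields = rec.split(fs)
    return ofs.join(fields)

print(rebuild_record("R=r1,R=r2,R=r3,E=e"))  # R=r1:R=r2:R=r3:E=e
```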
Use two Awks, one nested inside the other: the inner Awk
processes the fields f produced by the outer Awk:
11> (awk (:inputs '("R=r1,R=r2,R=r3,E=e"))
(:set fs "," ofs ":")
(t (awk (:inputs f)
(:set fs "=")
(t (prn [f 1])))))
r1
r2
r3
e
nil
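The nested-Awk structure amounts to two levels of field splitting. In this Python sketch the outer split yields the fields, the inner split re-splits each one, and index 1 picks the second sub-field, matching the zero-based [f 1]. The function name is hypothetical, and the sketch assumes every field contains "=".

```python
def inner_second_fields(rec, outer_fs=",", inner_fs="="):
    """Outer pass: split rec into fields. Inner pass: split each field
    again and keep sub-field 1 (zero-based, like [f 1])."""
    return [field.split(inner_fs)[1] for field in rec.split(outer_fs)]

for value in inner_second_fields("R=r1,R=r2,R=r3,E=e"):
    print(value)  # r1, r2, r3, e on separate lines
```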
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-10 09:06 +0200
Re: Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-10 09:09 +0200
Re: Experiences with match() subexpressions? gazelle@shell.xmission.com (Kenny McCormack) - 2025-04-10 11:08 +0000
Re: Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-10 13:55 +0200
Re: Experiences with match() subexpressions? gazelle@shell.xmission.com (Kenny McCormack) - 2025-04-10 14:04 +0000
Re: Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-10 23:39 +0200
Re: Experiences with match() subexpressions? arnold@freefriends.org (Aharon Robbins) - 2025-04-11 06:33 +0000
Re: Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-11 09:10 +0200
Re: Experiences with match() subexpressions? Kaz Kylheku <643-408-1753@kylheku.com> - 2025-04-11 08:22 +0000
Re: Experiences with match() subexpressions? Manuel Collado <mcollado2011@gmail.com> - 2025-04-18 12:03 +0200
Re: Experiences with match() subexpressions? gazelle@shell.xmission.com (Kenny McCormack) - 2025-04-18 12:01 +0000
Re: Experiences with match() subexpressions? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-18 14:24 +0200
Re: Experiences with match() subexpressions? Kaz Kylheku <643-408-1753@kylheku.com> - 2025-04-11 07:40 +0000
The new matcher (Was: Experiences with match() subexpressions?) gazelle@shell.xmission.com (Kenny McCormack) - 2025-04-11 08:57 +0000
Re: The new matcher (Was: Experiences with match() subexpressions?) Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-11 15:50 +0200
Re: Experiences with match() subexpressions? Kaz Kylheku <643-408-1753@kylheku.com> - 2025-04-11 17:54 +0000
Re: Experiences with match() subexpressions? Ed Morton <mortonspam@gmail.com> - 2025-04-10 20:07 -0500
Re: Experiences with match() subexpressions? Ed Morton <mortonspam@gmail.com> - 2025-04-13 12:52 -0500
Nitpicking the code (Was: Experiences with match() subexpressions?) gazelle@shell.xmission.com (Kenny McCormack) - 2025-04-14 18:20 +0000
Re: Nitpicking the code (Was: Experiences with match() subexpressions?) Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-14 20:53 +0200
Re: Nitpicking the code (Was: Experiences with match() subexpressions?) Ed Morton <mortonspam@gmail.com> - 2025-04-14 18:55 -0500
Re: Nitpicking the code (Was: Experiences with match() subexpressions?) Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-04-15 05:35 +0200