Re: Experiences with match() subexpressions?

From Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups comp.lang.awk
Subject Re: Experiences with match() subexpressions?
Date 2025-04-10 13:55 +0200
Organization A noiseless patient Spider
Message-ID <vt8bit$2uiq5$1@dont-email.me>
References <vt7qlq$2ge70$1@dont-email.me> <vt7qs4$2gior$1@dont-email.me> <vt88s7$1ghd2$1@news.xmission.com>


On 10.04.2025 13:08, Kenny McCormack wrote:
> In article <vt7qs4$2gior$1@dont-email.me>,
> Janis Papanagnou  <janis_papanagnou+ng@hotmail.com> wrote:
>> On 10.04.2025 09:06, Janis Papanagnou wrote:
>>> I'm looking for subexpressions of regexp-matches using GNU Awk's
>>> third parameter of match(). For example
>>>
>>>   data = "R=r1,R=r2,R=r3,E=e"
>>>   match (data, /^(R=([^,]+),){2,5}E=(.+)$/, arr)
>>>
>>> The result stored in 'arr' seems to be determined by the static
>>> parenthesis structure, so with the pattern repetition {2,5} only
>>> the last matched data in the subexpression (r3) seems to persist
>>> in arr. - I suppose there's no cute way to achieve what I wanted?
>>
>> To clarify; what I wanted is access of the values "r1", "r2", "r3",
>> and "e" through 'arr'.
> 
> I have to admit that I (still) don't really understand how this match third
> arg stuff works. 

I've never used that before but it seems to be quite simple; for every
parenthesized group in the regexp it provides (statically, in the order
the parentheses are written, from left to right) an array element
holding the text matched by that subexpression.

> I.e., I can never predict what will happen, so I always
> just dump out the array and try to reverse-engineer it each time I need to
> use it.
> 
> I adapted your code into the following test script:
> 
> --- Cut Here ---
> #!/bin/sh
> gawk 'BEGIN {
>     data = "R=r1,R=r2,R=r3,E=e"
>     match (data, /^(R=([^,]+),){2,5}E=(.+)$/, arr)
>     for (i in arr) print i,arr[i]
>     }'
> 
> # To clarify; what I wanted is access of the values "r1", "r2", "r3",
> # and "e" through 'arr'.
> --- Cut Here ---
> 
> The output I get is:
> 
> --- Cut Here ---
> 0start 1
> 0length 18
> 3start 18
> 1start 11
> 2start 13
> 3length 1
> 2length 2
> 1length 5

The output above appears because 'arr' also stores additional elements
holding the match positions (start and length of each subexpression).

I don't need those, so I'm only interested in the data elements below,
and I iterate over them with an index-counted loop...

> 0 R=r1,R=r2,R=r3,E=e

the whole expression

> 1 R=r3,

the expression in the first parenthesis

> 2 r3

the expression in the second, embedded parenthesis

> 3 e

the expression in the final parenthesis

> --- Cut Here ---
> 
> After playing around a bit, I could not come up with any sensible way of
> getting what you want to get.

Yeah, Arnold just told me the same; that it's impossible because the
underlying GNU regexp library doesn't support what I'm looking for.

What I considered a possible workaround (in this case) is to spell out
the (...){2,5} repetition as a sequence of separate groups, the
optional ones written as (...)? expressions. (But in the general
case, for larger ranges than 2-5, that's neither feasible nor sensible
any more.)

> 
> As an alternative, it sounds like you could just split the
> string on the comma; that would get you:

Yes, that was also how I did such things in the past. Only when I saw
that "third argument" to match() did I hope the two-level parsing could
be simplified into one step. The reason was that I thought I had seen
other languages (Perl, maybe?) that supported such a feature.

> 
>     R=r1
>     R=r2
>     R=r3
>     E=e
> 
> Or, for finer control, you could use patsplit().

I think I'll do the parsing the straightforward two-step way as I did
before the GNU Awk specific functions were available; it's probably
also the clearest way to program that functionality.
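
A minimal sketch of that two-step way, using only standard awk
functions; the function name two_step is illustrative. Note that it
does not check the 2-5 repetition count, and it assumes the values
themselves contain no commas:

```shell
#!/bin/sh
# two_step is a hypothetical helper name.
two_step() {
    awk -v data="$1" 'BEGIN {
        # Step 1: split the data into "key=value" fields on commas.
        n = split(data, field, ",")
        for (i = 1; i <= n; i++) {
            # Step 2: split each field at its first "=".
            eq = index(field[i], "=")
            if (eq == 0)
                continue            # not a key=value field; ignore it
            key = substr(field[i], 1, eq - 1)
            val = substr(field[i], eq + 1)
            print key, val
        }
    }'
}

two_step "R=r1,R=r2,R=r3,E=e"
```

For the sample data this prints one "key value" pair per line, so
"r1", "r2", "r3" and "e" are all accessible.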

Janis
