Re: Experiences with match() subexpressions?

From Ed Morton <mortonspam@gmail.com>
Newsgroups comp.lang.awk
Subject Re: Experiences with match() subexpressions?
Date 2025-04-13 12:52 -0500
Organization A noiseless patient Spider
Message-ID <vtgtkr$3br8e$1@dont-email.me>
References <vt7qlq$2ge70$1@dont-email.me> <vt7qs4$2gior$1@dont-email.me> <vt9q0n$70fm$1@dont-email.me>


On 4/10/2025 8:07 PM, Ed Morton wrote:
> On 4/10/2025 2:09 AM, Janis Papanagnou wrote:
>> On 10.04.2025 09:06, Janis Papanagnou wrote:
>>> I'm looking for subexpressions of regexp-matches using GNU Awk's
>>> third parameter of match(). For example
>>>
>>>    data = "R=r1,R=r2,R=r3,E=e"
>>>    match (data, /^(R=([^,]+),){2,5}E=(.+)$/, arr)
>>>
>>> The result stored in 'arr' seems to be determined by the static
>>> parenthesis structure, so with the pattern repetition {2,5} only
>>> the last matched data in the subexpression (r3) seems to persist
>>> in arr. - I suppose there's no cute way to achieve what I wanted?
>>
>> To clarify; what I wanted is access of the values "r1", "r2", "r3",
>> and "e" through 'arr'.
> 
> Correct, you can't do what you want using just `match()`, it's simply 
> matching a regexp with capture groups against a string, just like sed does.
> 
> There are, of course, several other ways to get `arr[]` populated the 
> way you want. e.g split(), patsplit(), while(match()), or dynamically 
> generating the regexp. The best one to choose will depend on the real 
> values that r1, etc. can have, for example it'd be hard to use split() 
> if `r1` can be a quoted string that might itself contain similar 
> substrings such as `data = "R=\"R=r1,R=r2\",R=r2,R=r3,E=e"`.

FWIW, probably more for the benefit of any awk newcomers reading this: 
if your data really could have quoted fields (otherwise a simple 
`split(data,arr,",")` is all you need) then, assuming they follow the same 
quoting rules as for CSVs, I'd use either of these or similar. With GNU 
awk (for `patsplit()`):

     data = "R=\"R=r1,R=r2\",R=r2,R=r3,E=e"
     # each arr[i] is one whole field, e.g. R=r2
     nf = patsplit(data, arr, /[RE]=([^,]*|"([^"]|"")*")/)
     for ( i in arr ) {
         sub(/[^=]+=/, "", arr[i])    # strip the leading R= or E=
     }

or any awk:

     data = "R=\"R=r1,R=r2\",R=r2,R=r3,E=e"
     nf = 0
     delete arr
     # 2-argument match() is portable; the array argument is gawk-only
     while ( match(data, /[RE]=([^,]*|"([^"]|"")*")/) ) {
         arr[++nf] = substr(data, RSTART+2, RLENGTH-2)    # skip R= or E=
         data = substr(data, RSTART+RLENGTH)
     }

either of which would populate `arr[]` with:

     "R=r1,R=r2"
     r2
     r3
     e

and set `nf` to the number of entries in `arr[]`.
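
For anyone who wants to try this from the command line, here is a minimal sketch of the portable loop wrapped in a shell function. The name `extract_vals` and the numbered output format are made up for this illustration, and it assumes one record per input line with only `R=` and `E=` keys:

```shell
# extract_vals is a hypothetical wrapper name, not part of any awk or library.
# It takes one record as its argument and prints one extracted value per line.
extract_vals() {
    printf '%s\n' "$1" | awk '
    {
        data = $0
        nf = 0
        delete arr
        # Leftmost-longest matching picks the quoted alternative when
        # a value starts with a double quote and contains commas.
        while ( match(data, /[RE]=([^,]*|"([^"]|"")*")/) ) {
            arr[++nf] = substr(data, RSTART+2, RLENGTH-2)   # skip R= or E=
            data = substr(data, RSTART+RLENGTH)             # drop what was consumed
        }
        for ( i = 1; i <= nf; i++ ) {
            print i ": " arr[i]
        }
    }'
}

extract_vals 'R="R=r1,R=r2",R=r2,R=r3,E=e'
```

The output loop runs from 1 to `nf` rather than using `for ( i in arr )` because the latter visits the indices in an unspecified order.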

Regards,

     Ed.
