| Path | csiph.com!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail |
|---|---|
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
| Newsgroups | comp.lang.awk |
| Subject | Re: Experiences with match() subexpressions? |
| Date | Fri, 11 Apr 2025 17:54:07 -0000 (UTC) |
| Organization | A noiseless patient Spider |
| Lines | 99 |
| Message-ID | <20250411102342.782@kylheku.com> |
| References | <vt7qlq$2ge70$1@dont-email.me> <vt8bit$2uiq5$1@dont-email.me> <vt8j5u$1gmdg$1@news.xmission.com> <vt9dre$3t3po$1@dont-email.me> <67f8b7af$0$705$14726298@news.sunsite.dk> |
| Injection-Date | Fri, 11 Apr 2025 19:54:08 +0200 (CEST) |
| Injection-Info | dont-email.me; posting-host="83c4f9938a9c8a0ea420cd848f641865"; logging-data="2362972"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19470OI3V/DSu9NyjL6v/Y110ikbGp57p4=" |
| User-Agent | slrn/pre1.0.4-9 (Linux) |
| Cancel-Lock | sha1:nhuWrQmw0cFvl6ZSNNHukM2xaOQ= |
| Xref | csiph.com comp.lang.awk:9946 |
On 2025-04-11, Aharon Robbins <arnold@freefriends.org> wrote:
> In article <vt9dre$3t3po$1@dont-email.me>,
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>>The feature can be very useful,
>>but not for the case I was looking for. - Actually, it could have
>>provided the functionality I was seeking, but since GNU Awk relies
>>on the GNU regexp functions as they are implemented I cannot expect
>>that any provided features get extended by Awk. - If GNU Awk had
>>its own RE implementation then we could think about using, e.g.,
>>another array dimension to store the (now only temporary existing,
>>and generally unavailable) subexpressions.
>
> Actually, this is not so trivial. The data structures at the C level
> as mandated by POSIX are one dimensional; the submatches in parentheses
> are counted from left to right. There's no way to represent the
> subexpressions that are under control of interval expressions, which
> would essentially require a two-dimensional data structure.
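Here is a small sketch of the one-dimensional model described above. It uses Python's re module only as a convenient stand-in for the POSIX regcomp()/regexec() interface. This is an assumption: Python's engine is not POSIX, but it numbers and stores groups in the same flat way. Groups are numbered by their opening parenthesis, and a group inside a repetition keeps only its last iteration:

```python
import re

# Groups are numbered left to right by their opening parenthesis:
# group 1 is "((\w)-)", group 2 is the inner "(\w)".
pat = re.compile(r"((\w)-)+")
m = pat.fullmatch("a-b-c-")

# There is one slot per group, so a repeated group reports only its
# final iteration; the captures "a-"/"a" and "b-"/"b" are discarded.
print(pat.groups)              # 2
print(m.group(1), m.group(2))  # c- c
```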

Here is what I believe is the right requirement, if you want repeatedly
visited subexpressions to capture all their iterations.

The dimensionality has to be such that the entire array of matches is
versioned as a whole.

In other words, abstractly, we have

    matches[history][register]

where history counts from 0, with [0] holding the latest matches.
The register index also counts from zero: [0] is the match for the
entire expression, [1] for subexpression 1, and so on.

Any time there is a repetition in any subexpression, matches[0]
is duplicated and pushed into the history.

We can imagine each row matches[h][0..(n-1)] as giving a trace of the
matches through the tree of subexpressions, from root to leaf.
Each time something is matched, the entire trace is recorded
in the history, so everything is consistent.
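Below is a minimal sketch of the versioned matches[history][register] bookkeeping described above, written in Python for illustration. The class name CaptureHistory and its methods are hypothetical. It models only the bookkeeping: it duplicates matches[0] on each repetition and writes registers into the newest snapshot. It is not a regex engine, and it ignores backtracking, which a real matcher would have to undo:

```python
class CaptureHistory:
    """Hypothetical model of matches[history][register].

    history[0] is the newest snapshot; each snapshot holds one entry per
    register, with register 0 reserved for the whole match.
    """

    def __init__(self, nregs):
        self.history = [[None] * nregs]

    def repeat(self):
        # A subexpression repeats: copy matches[0] and push the copy on
        # the front, so every older snapshot moves to a higher index.
        self.history.insert(0, list(self.history[0]))

    def set(self, reg, text):
        # Captures are only ever written into the newest snapshot.
        self.history[0][reg] = text

    def __getitem__(self, h):
        return self.history[h]


# Simulate the first group-1 iteration of "key=v1,v2,v3", in which
# group 2 repeats three times (register 0 is left unset in this toy run).
m = CaptureHistory(3)
m.set(1, "key=v1,v2,v3")
m.set(2, "v1,")
m.repeat()
m.set(2, "v2,")
m.repeat()
m.set(2, "v3")
print(m[0][2], m[1][2], m[2][2])  # v3 v2, v1,
```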

Say we want to parse the syntax

    key=v1,v2,v3 foo=a,b

using something like:

    ([^ =]+=([^ ,]*,?)* *)*
    1       2

Then we have subgroups 1 and 2 (each digit marks the opening
parenthesis of its group). We would like to end up with a
two-dimensional match array like this:

matches[hist][reg] =

| hist | reg 0 | reg 1 | reg 2 |
|---|---|---|---|
| 0 | key=v1,v2,v3 foo=a,b | foo=a,b | b |
| 1 | key=v1,v2,v3 foo=a,b | foo=a,b | a, |
| 2 | key=v1,v2,v3 foo=a,b | key=v1,v2,v3 | v3 |
| 3 | key=v1,v2,v3 foo=a,b | key=v1,v2,v3 | v2, |
| 4 | key=v1,v2,v3 foo=a,b | key=v1,v2,v3 | v1, |
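Python's re, like POSIX, cannot produce a table like this directly. The sketch below therefore emulates it for this one pattern. It enumerates the iterations of group 1 and group 2 with re.finditer() and lists them newest first. The function name capture_table is hypothetical. The inner pattern [^ ,]+,? is a simplification that skips empty iterations. Register 0 is taken to be the whole input, which holds for this example:

```python
import re

def capture_table(s):
    """Emulate matches[hist][reg] for ([^ =]+=([^ ,]*,?)* *)* on s.

    Returns rows [reg0, reg1, reg2], newest first. Assumes the whole of
    s matches the pattern, so register 0 is simply s.
    """
    rows = []
    # Each hit is one iteration of group 1 (its trailing blank included).
    for g1 in re.finditer(r"[^ =]+=(?:[^ ,]*,?)* *", s):
        field = g1.group(0).rstrip(" ")  # drop the blank for display
        values = field.split("=", 1)[1]
        # Each hit is one non-empty iteration of group 2.
        for g2 in re.finditer(r"[^ ,]+,?", values):
            rows.append([s, field, g2.group(0)])
    rows.reverse()  # history 0 is the most recent match
    return rows


for hist, row in enumerate(capture_table("key=v1,v2,v3 foo=a,b")):
    print(hist, row)
```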

This gives us the raw trace snapshot data from which a tree could be
built using a simple algorithm (say, still ordering children so that
the leftmost one is the more recent match):

                    "key=v1,v2,v3 foo=a,b"
                        /             \
               "foo=a,b"            "key=v1,v2,v3"
                 /   \                /    |    \
              "b"    "a,"         "v3"   "v2,"   "v1,"

This structure provides more logical access to the captured pieces.
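Here is one possible version of such a simple algorithm, sketched in Python. The function build_tree and the nested (label, children) tuples are assumptions for illustration, not part of any existing library. Consecutive history rows that share a register-1 value are folded under one node. History order is kept, so the leftmost child is the most recent:

```python
from itertools import groupby

# The history table from the example: [reg0, reg1, reg2], newest first.
ROWS = [
    ["key=v1,v2,v3 foo=a,b", "foo=a,b", "b"],
    ["key=v1,v2,v3 foo=a,b", "foo=a,b", "a,"],
    ["key=v1,v2,v3 foo=a,b", "key=v1,v2,v3", "v3"],
    ["key=v1,v2,v3 foo=a,b", "key=v1,v2,v3", "v2,"],
    ["key=v1,v2,v3 foo=a,b", "key=v1,v2,v3", "v1,"],
]

def build_tree(rows):
    """Fold history rows into a (label, children) tree, newest child first."""
    root_label = rows[0][0]
    children = []
    # Adjacent rows with the same register-1 text came from one group-1
    # iteration; their register-2 captures become that node's leaves.
    for g1, group in groupby(rows, key=lambda r: r[1]):
        children.append((g1, [(r[2], []) for r in group]))
    return (root_label, children)

def show(node, depth=0):
    label, kids = node
    print("  " * depth + f'"{label}"')
    for kid in kids:
        show(kid, depth + 1)

show(build_tree(ROWS))
```

Grouping by text is a shortcut. Two adjacent group-1 iterations that captured identical text would be merged, so a real implementation would key on match offsets instead.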

Anyway, I feel this problem is better solved using approaches
that avoid regexes, or that use regexes only for some low-level
tokenizing.
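Here is a Python sketch of the second approach; parse_pairs is a hypothetical name. A regex only tokenizes each key=value-list field, and ordinary string splitting handles the comma-separated repetition. It assumes keys are unique and values contain no blanks or commas:

```python
import re

# One token per "key=v1,v2,..." field; no capture groups inside repetitions.
FIELD = re.compile(r"([^ =]+)=([^ ]*)")

def parse_pairs(s):
    """Map each key to its list of values, e.g. 'k=a,b' -> {'k': ['a', 'b']}."""
    result = {}
    for m in FIELD.finditer(s):
        key, values = m.group(1), m.group(2)
        # Plain splitting handles the repetition; the commas are treated
        # as separators, so none of them end up inside the items.
        result[key] = values.split(",") if values else []
    return result


print(parse_pairs("key=v1,v2,v3 foo=a,b"))
# {'key': ['v1', 'v2', 'v3'], 'foo': ['a', 'b']}
```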

With my regex above, there are stray commas in the items,
because they had to be included in the repetition, and there
is no nice way to exclude them without adding another level
of parentheses.
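The parenthesis problem can be made concrete in Python, again purely as an illustration. Wrapping the value in its own group, so that the comma falls outside it, adds a third group. The bare value moves to group 3, while group 2 still includes the comma:

```python
import re

# The pattern from the example: group 2 is a value together with its comma.
with_commas = re.compile(r"([^ =]+=([^ ,]*,?)* *)*")

# Excluding the comma needs one more level of parentheses: group 3 is now
# the bare value, and group 2 still contains the comma.
without_commas = re.compile(r"([^ =]+=(([^ ,]*),?)* *)*")

print(with_commas.groups, without_commas.groups)  # 2 3
```

In the matches[hist][reg] picture, that means an extra register column in every row.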

Each time we play with the parentheses, we radically change
the structure and size of the output.

It just ends up being a wrongheaded academic exercise.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca