Re: substr() - copying or not copying, that is here the question.

Path csiph.com!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups comp.lang.awk
Subject Re: substr() - copying or not copying, that is here the question.
Date Sun, 1 Jun 2025 00:16:58 +0200
Organization A noiseless patient Spider
Lines 59
Message-ID <101fv4s$1g5c8$1@dont-email.me>
References <101f9oo$18edp$1@dont-email.me> <683b5389$0$683$14726298@news.sunsite.dk>
MIME-Version 1.0
Content-Type text/plain; charset=utf-8
Content-Transfer-Encoding 7bit
Injection-Date Sun, 01 Jun 2025 00:17:00 +0200 (CEST)
Injection-Info dont-email.me; posting-host="5efe03dbd7af97f43c3764a2772b692a"; logging-data="1578376"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Y9hgbcaxVsT7y3eJ/bl4u"
User-Agent Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
Cancel-Lock sha1:qIosECmLx/g2wrsQFrb3dx/A5fg=
In-Reply-To <683b5389$0$683$14726298@news.sunsite.dk>
X-Enigmail-Draft-Status N1110
Xref csiph.com comp.lang.awk:9979

On 31.05.2025 21:07, Mack The Knife wrote:
> In article <101f9oo$18edp$1@dont-email.me>,
> Janis Papanagnou  <janis_papanagnou+ng@hotmail.com> wrote:
>> In the context   p=index(substr(t,s),r)
>> it would not be necessary to copy the substr(t,s),
>> the index() function could operate on the original
>> using some access "descriptor" (say, a pointer and
>> a length) in read-only mode.
>>
>> Will (GNU) Awk make a copy of the data value, or does
>> it use read-only descriptor access to the already
>> existing substring of variable "t"?
>>
>> Currently I'm playing with some huge data, and copies
>> of MB-sized data are costly (if they're made repeatedly
>> with various substr() subscripts).
> 
> substr() makes a copy. This is clear in the code.

Okay. Thanks for checking that!
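For illustration, here is a minimal C sketch of the two strategies in question: a copying substr() versus a read-only (pointer, length) descriptor that an index()-like function can scan in place. The type strview and the functions sv_tail(), tail_copy() and sv_index() are hypothetical names made up for this sketch. They are not taken from the GNU Awk sources, which, as checked, make a copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical read-only descriptor: a pointer into existing data plus
 * a length. It owns nothing, so there is nothing to free. */
typedef struct {
    const char *ptr;
    size_t len;
} strview;

/* Describe the tail of t from 1-based position s, like substr(t, s),
 * but without copying any bytes. */
strview sv_tail(strview t, size_t s)
{
    if (s == 0)
        s = 1;                  /* treat positions below 1 as 1 */
    if (s > t.len + 1)
        s = t.len + 1;          /* past the end: empty view */
    strview v = { t.ptr + (s - 1), t.len - (s - 1) };
    return v;
}

/* Copying variant for comparison: allocates and duplicates the tail.
 * The caller must free() the result. */
char *tail_copy(strview t, size_t s)
{
    strview v = sv_tail(t, s);
    char *out = malloc(v.len + 1);
    if (out != NULL) {
        memcpy(out, v.ptr, v.len);
        out[v.len] = '\0';
    }
    return out;
}

/* Awk-style index() on a view: 1-based position of r, or 0 if absent.
 * This sketch returns 0 for an empty r. Naive O(n*m) scan. */
size_t sv_index(strview hay, const char *r)
{
    assert(r != NULL);
    size_t m = strlen(r);
    if (m == 0 || m > hay.len)
        return 0;
    for (size_t i = 0; i + m <= hay.len; i++)
        if (memcmp(hay.ptr + i, r, m) == 0)
            return i + 1;
    return 0;
}
```

With these, p=index(substr(t,s),r) corresponds to sv_index(sv_tail(t, s), r). The result is relative to s, as in Awk, and no bytes of t are duplicated.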

> 
> It's almost impossible to do this via read-only descriptor.
> Consider something like
> 
> 	x = substr($0, 10, 15)
> 	getline
> 	print x

Well, it would be possible to do that with a descriptor
if GNU Awk implemented delayed (lazy) evaluation. (Of
course, a copy becomes necessary before 'getline'
invalidates $0.)
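A minimal C sketch of that lazy idea. It assumes that whatever owns the record buffer calls a hook just before overwriting it, for example before getline replaces $0. The names lazy_sub, lazy_substr(), lazy_detach(), lazy_ptr() and lazy_free() are hypothetical, and this is not a description of how GNU Awk works.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lazy substring: borrows from a source buffer (think $0)
 * and makes a private copy only when that buffer is about to change. */
typedef struct {
    const char *src;   /* current bytes: borrowed, or == owned after detach */
    size_t len;
    char *owned;       /* private copy, NULL until lazy_detach() */
} lazy_sub;

/* Like substr(buf, start, len) with a 1-based start; no copy is made.
 * Sketch assumption: the caller passes an in-range start and len. */
lazy_sub lazy_substr(const char *buf, size_t buflen, size_t start, size_t len)
{
    assert(start >= 1 && start - 1 + len <= buflen);
    lazy_sub s = { buf + (start - 1), len, NULL };
    return s;
}

/* Call before the source buffer is modified (the "getline" moment).
 * Copies the bytes at most once; returns 0 if allocation fails. */
int lazy_detach(lazy_sub *s)
{
    if (s->owned != NULL)
        return 1;                  /* already independent */
    s->owned = malloc(s->len + 1);
    if (s->owned == NULL)
        return 0;
    memcpy(s->owned, s->src, s->len);
    s->owned[s->len] = '\0';
    s->src = s->owned;             /* stop referring to the old buffer */
    return 1;
}

/* Pointer to the substring's s->len bytes (not NUL-terminated while
 * still borrowed). */
const char *lazy_ptr(const lazy_sub *s)
{
    return s->src;
}

void lazy_free(lazy_sub *s)
{
    free(s->owned);
    s->owned = NULL;
    s->src = NULL;
    s->len = 0;
}
```

This is the usual copy-on-write trade-off: reads cost nothing, and the copy is paid at most once, only for substrings that outlive their source record.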

(It's been reported that GNU Awk implements some
optimizations, so that could also have been the case
here. That's why I'm asking.)

> 
> Gawk manages the storage such that for something like
> your example the copy will be released after index()
> returns a value.

As I said, I'm working on a huge string of data. What
other options are there to work efficiently on parts
(substrings) of the data? Given the result of your code
check, I don't see a way to achieve that with GNU Awk,
or maybe with any Awk, using only standard functionality.

Okay, maybe I could write an extension that works on
memory-mapped files (the data originally comes from a
file) and seeks/reads through "C" mechanisms. (But that's
a huge effort compared to some natively available
function. And then I'd probably be better off
implementing it directly in "C" instead of using Awk in
the first place, since I'd have to write the GNU Awk
extension in "C" anyway.)
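To give an idea of the "C" route, here is a minimal sketch in plain C. It is not a GNU Awk extension and does not use the gawk extension API. It maps the file read-only with POSIX mmap() and searches it from a given offset, so no substring is ever copied. The function name mapped_index() and its return conventions are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: find needle in the file open on fd, starting at
 * 1-based byte position start, without reading the file into a string.
 * Returns the 1-based position in the file, 0 if not found, -1 on error. */
long mapped_index(int fd, size_t start, const char *needle)
{
    struct stat st;

    assert(needle != NULL);
    size_t m = strlen(needle);
    if (start == 0 || m == 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        return -1;
    }
    if (st.st_size == 0)
        return 0;                       /* mmap() rejects a zero length */

    size_t size = (size_t)st.st_size;
    void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    /* Scan the mapped bytes in place; pages are read in on demand. */
    const char *base = map;
    long pos = 0;
    for (size_t i = start - 1; i + m <= size; i++) {
        if (memcmp(base + i, needle, m) == 0) {
            pos = (long)(i + 1);
            break;
        }
    }
    munmap(map, size);
    return pos;
}
```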

Janis

Thread

substr() - copying or not copying, that is here the question. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-05-31 18:12 +0200
  Re: substr() - copying or not copying, that is here the question. mack@the-knife.org (Mack The Knife) - 2025-05-31 19:07 +0000
    Re: substr() - copying or not copying, that is here the question. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-06-01 00:16 +0200
      Re: substr() - copying or not copying, that is here the question. Ben Bacarisse <ben@bsb.me.uk> - 2025-06-01 11:42 +0100
        Re: substr() - copying or not copying, that is here the question. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-06-01 13:43 +0200
          Re: substr() - copying or not copying, that is here the question. gazelle@shell.xmission.com (Kenny McCormack) - 2025-06-01 12:06 +0000
            Re: substr() - copying or not copying, that is here the question. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-06-01 15:27 +0200
        Re: substr() - copying or not copying, that is here the question. gazelle@shell.xmission.com (Kenny McCormack) - 2025-06-01 11:53 +0000
          Re: substr() - copying or not copying, that is here the question. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-06-01 15:47 +0200
            Re: substr() - copying or not copying, that is here the question. gazelle@shell.xmission.com (Kenny McCormack) - 2025-06-01 14:17 +0000
              Re: substr() - copying or not copying, that is here the question. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-06-08 00:01 +0200
      Re: substr() - copying or not copying, that is here the question. mack@the-knife.org (Mack The Knife) - 2025-06-03 06:56 +0000
        Re: substr() - copying or not copying, that is here the question. gazelle@shell.xmission.com (Kenny McCormack) - 2025-06-03 11:04 +0000
        Re: substr() - copying or not copying, that is here the question. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-06-08 00:05 +0200
          Re: substr() - copying or not copying, that is here the question. mack@the-knife.org (Mack The Knife) - 2025-06-08 12:35 +0000
            Re: substr() - copying or not copying, that is here the question. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-06-11 11:07 +0200
              Meta chat (Was: substr() - copying or not copying, that is here the question.) gazelle@shell.xmission.com (Kenny McCormack) - 2025-06-11 12:11 +0000
    Re: substr() - copying or not copying, that is here the question. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-06-01 00:01 +0000