From: Janis Papanagnou
Newsgroups: comp.lang.awk
Subject: Re: substr() - copying or not copying, that is here the question.
Date: Sun, 1 Jun 2025 15:47:38 +0200
Message-ID: <101hlls$24vmc$1@dont-email.me>
References: <101f9oo$18edp$1@dont-email.me> <683b5389$0$683$14726298@news.sunsite.dk> <101fv4s$1g5c8$1@dont-email.me> <87h60zrbea.fsf@bsb.me.uk> <101hevp$2qrh$1@news.xmission.com>

On 01.06.2025 13:53, Kenny McCormack wrote:
> In article <87h60zrbea.fsf@bsb.me.uk>, Ben Bacarisse wrote:
> ...
>> An alternative (depending on the context) would be to consider an
>> extension that provides an index function with a third argument giving
>> the initial offset. I've not looked at how extensions get access to
>> GAWK strings, so this may not be as easy as it sounds, but I would
>> guess that it might be relatively simple to do.
>
> The thing about writing GAWK extensions is that the first one is hard,
> because it is all new stuff to learn (and you have to establish your own
> conventions for how your extensions are going to look, code-wise). [...]

You are describing the practical learning curve of writing one's own
extensions. Okay. My point was a different one.

IMO it's a problem if everyone writes their own index() extension, each
in their own version and code quality. We see a proliferation of private
versions in all areas of IT, and I don't see that as a desirable goal.
I wouldn't want two (or three) 'match' functions (match(), match_a(),
match_ae()) or similar.[*] If a function performs basically the same
task, and its behavior is to be controlled (like program options[**])
by optional arguments, that is what I'd consider the right way.

> By the way, if you find the substring at position 900005 (i.e., the 5th (*)
> char of the searched string), should the function return 5 or 900005?
>
> (*) Or 6th; I'm not sure of my exact notation at this point.

I don't think the decision is crucial since (I think) you can derive
one value from the other, given the arguments passed. The behavior
should probably be discussed to find the advantages of one option or
the other.

Yet I'm not sure whether the usefulness of such a match() function
extension is commonly accepted. (Anyway. Changes to the core language
are discouraged, so it won't happen.)

Janis

[*] I already find it a bit, umm, strange to have three different
substitution functions in GNU Awk (two historic/standard and one
somewhat generalized and extended).

[**] Do we need grep, egrep, fgrep (on my system they are not even
hardlinks), or should grep be used with options: grep -E, grep -F?
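
On deriving one return value from the other: a minimal sketch, assuming
a hypothetical user-defined index3(s, t, start). That name and its
signature are made up for illustration; it is not a gawk built-in or an
existing extension. It is written with the standard substr() and
index(), so it sidesteps the copying question of this thread rather
than answering it. The shell function offset_index_demo is also just a
wrapper for this demo.

```sh
# Demo wrapper (hypothetical name); runs a small awk program.
offset_index_demo() {
    awk '
    # index3 is a hypothetical helper, not part of any awk.
    # Search for t in s from 1-based position start.
    # Returns the absolute position in s, or 0 if t is not found.
    function index3(s, t, start,    rel) {
        rel = index(substr(s, start), t)   # position relative to start
        return rel ? start + rel - 1 : 0   # convert to absolute
    }
    BEGIN {
        # Build 900000 filler characters by doubling, then trim.
        pad = "-"
        while (length(pad) < 900000) pad = pad pad
        s = substr(pad, 1, 900000) "abcdXYZ"

        start = 900001
        abs = index3(s, "XYZ", start)      # absolute position
        rel = abs - start + 1              # relative position, derived
        print rel, abs
    }'
}

offset_index_demo    # prints: 5 900005
```

With the start offset known to the caller, either convention can be
turned into the other with one addition or subtraction; the only case
needing care is the "not found" result, which has to stay 0 under both.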