printing words without newlines?

Started by	David Chmelik <dchmelik@gmail.com>
First post	2024-05-12 04:57 +0000
Last post	2024-05-16 19:40 -0500
Articles	19 — 6 participants

  printing words without newlines? David Chmelik <dchmelik@gmail.com> - 2024-05-12 04:57 +0000
    Re: printing words without newlines? Bruce Horrocks <07.013@scorecrow.com> - 2024-05-12 09:52 +0100
      Re: printing words without newlines? Bruce Horrocks <07.013@scorecrow.com> - 2024-05-12 09:55 +0100
      Re: printing words without newlines? gazelle@shell.xmission.com (Kenny McCormack) - 2024-05-12 12:11 +0000
        Re: printing words without newlines? David Chmelik <dchmelik@gmail.com> - 2024-05-13 02:04 +0000
        Re: printing words without newlines? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-05-13 16:49 +0000
    Re: printing words without newlines? gazelle@shell.xmission.com (Kenny McCormack) - 2024-05-13 06:56 +0000
      Re: printing words without newlines? gazelle@shell.xmission.com (Kenny McCormack) - 2024-05-13 14:53 +0000
        Resurrecting an old thread (Was: printing words without newlines?) gazelle@shell.xmission.com (Kenny McCormack) - 2024-07-15 18:10 +0000
    Re: printing words without newlines? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-05-13 10:18 +0200
    Re: printing words without newlines? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-05-13 17:17 +0000
      Re: printing words without newlines? gazelle@shell.xmission.com (Kenny McCormack) - 2024-05-13 17:26 +0000
        Re: printing words without newlines? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-05-13 23:33 +0000
          Array indices are small integers? (Was: printing words without newlines?) gazelle@shell.xmission.com (Kenny McCormack) - 2024-05-14 13:40 +0000
    Re: printing words without newlines? Ed Morton <mortonspam@gmail.com> - 2024-05-16 08:11 -0500
      Re: printing words without newlines? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-05-16 15:55 +0200
        Once upon a time... (Was: printing words without newlines?) gazelle@shell.xmission.com (Kenny McCormack) - 2024-05-16 14:15 +0000
          Re: Once upon a time... (Was: printing words without newlines?) gazelle@shell.xmission.com (Kenny McCormack) - 2024-05-16 15:17 +0000
        Re: printing words without newlines? Ed Morton <mortonspam@gmail.com> - 2024-05-16 19:40 -0500

#2 — printing words without newlines?

From	David Chmelik <dchmelik@gmail.com>
Date	2024-05-12 04:57 +0000
Subject	printing words without newlines?
Message-ID	<v1pi7c$2b87j$1@dont-email.me>

I'm learning more AWK basics and wrote function to read file, sort, 
print.  I use GNU AWK (gawk) and its sort but printing is harder to get 
working than anything... separate lines work, but when I use printf() or 
set ORS then use print (for words one line) all awk outputs (on FreeBSD 
UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline 
before shell prompt)... is this normal (and I made mistake?) or am I 
approaching it wrong?  I recall BASIC prints new lines, but as I learned 
basic C and some derivatives, I'm used to newlines only being specified...
------------------------------------------------------------------------
# print_file_words.awk
# pass filename to function
BEGIN { print_file_words("data.txt"); }

# read two-column array from file and sort lines and print
function print_file_words(file) {
# set record separator then use print
# ORS=" "
  while(getline<file) arr[$1]=$0
  PROCINFO["sorted_in"]="@ind_num_asc"
  for(i in arr) 
  {
    split(arr[i],arr2)
    # output all words or on one line with ORS
    print arr2[2]
    # output all words on one line without needing ORS
    #printf("%s ",arr2[2])
  }
}
------------------------------------------------------------------------
# sample data.txt
2 your
1 all
3 base
5 belong
4 are
7 us
6 to

#3

From	Bruce Horrocks <07.013@scorecrow.com>
Date	2024-05-12 09:52 +0100
Message-ID	<e0be0c38-e14e-45ba-ac87-5e2e4bd4f5cd@scorecrow.com>
In reply to	#2

On 12/05/2024 05:57, David Chmelik wrote:
> I'm learning more AWK basics and wrote function to read file, sort,
> print.  I use GNU AWK (gawk) and its sort but printing is harder to get
> working than anything... separate lines work, but when I use printf() or
> set ORS then use print (for words one line) all awk outputs (on FreeBSD
> UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
> before shell prompt)... is this normal (and I made mistake?) or am I
> approaching it wrong?  I recall BASIC prints new lines, but as I learned
> basic C and some derivatives, I'm used to newlines only being specified...
> ------------------------------------------------------------------------
> # print_file_words.awk
> # pass filename to function
> BEGIN { print_file_words("data.txt"); }
> 
> # read two-column array from file and sort lines and print
> function print_file_words(file) {
> # set record separator then use print
> # ORS=" "
>    while(getline<file) arr[$1]=$0
>    PROCINFO["sorted_in"]="@ind_num_asc"
>    for(i in arr)
>    {
>      split(arr[i],arr2)
>      # output all words or on one line with ORS
>      print arr2[2]
>      # output all words on one line without needing ORS
>      #printf("%s ",arr2[2])
>    }
> }
> ------------------------------------------------------------------------
> # sample data.txt
> 2 your
> 1 all
> 3 base
> 5 belong
> 4 are
> 7 us
> 6 to

You need to set ORS in the BEGIN { } section (or on the command line).

See 
<https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html> 
for an example - just replace the "\n\n" in the example with " " to see 
the effect you are looking for.

-- 
Bruce Horrocks
Surrey, England

#4

From	Bruce Horrocks <07.013@scorecrow.com>
Date	2024-05-12 09:55 +0100
Message-ID	<8f790f04-5f16-4f07-8e0b-261b628baa7a@scorecrow.com>
In reply to	#3

On 12/05/2024 09:52, Bruce Horrocks wrote:
> On 12/05/2024 05:57, David Chmelik wrote:
>> I'm learning more AWK basics and wrote function to read file, sort,
>> print.  I use GNU AWK (gawk) and its sort but printing is harder to get
>> working than anything... separate lines work, but when I use printf() or
>> set ORS then use print (for words one line) all awk outputs (on FreeBSD
>> UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
>> before shell prompt)... is this normal (and I made mistake?) or am I
>> approaching it wrong?  I recall BASIC prints new lines, but as I learned
>> basic C and some derivatives, I'm used to newlines only being 
>> specified...
>> ------------------------------------------------------------------------
>> # print_file_words.awk
>> # pass filename to function
>> BEGIN { print_file_words("data.txt"); }
>>
>> # read two-column array from file and sort lines and print
>> function print_file_words(file) {
>> # set record separator then use print
>> # ORS=" "
>>    while(getline<file) arr[$1]=$0
>>    PROCINFO["sorted_in"]="@ind_num_asc"
>>    for(i in arr)
>>    {
>>      split(arr[i],arr2)
>>      # output all words or on one line with ORS
>>      print arr2[2]
>>      # output all words on one line without needing ORS
>>      #printf("%s ",arr2[2])
>>    }
>> }
>> ------------------------------------------------------------------------
>> # sample data.txt
>> 2 your
>> 1 all
>> 3 base
>> 5 belong
>> 4 are
>> 7 us
>> 6 to
> 
> You need to set ORS in the BEGIN { } section (or on the command line).
> 
> See 
> <https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html> for an example - just replace the "\n\n" in the example with " " to see the effect you are looking for.
> 

Let me re-phrase that: it would be better to set ORS in the BEGIN {} 
section. I'm not sure why yours is not working but with some commented 
out code and some not, your example is unclear.

If what I have suggested doesn't work for you then please re-post your 
exact code.

-- 
Bruce Horrocks
Surrey, England

#5

From	gazelle@shell.xmission.com (Kenny McCormack)
Date	2024-05-12 12:11 +0000
Message-ID	<v1qblf$solc$1@news.xmission.com>
In reply to	#3

In article <e0be0c38-e14e-45ba-ac87-5e2e4bd4f5cd@scorecrow.com>,
Bruce Horrocks  <07.013@scorecrow.com> wrote:
...
>You need to set ORS in the BEGIN { } section (or on the command line).

This is demonstrably false.  You can set ORS whenever/wherever you want.
Whatever value it has when a plain "print" statement is executed, is what
will be used.  You are probably thinking about the various variables
that affect input parsing. These variables clearly must be set prior to the
reading of the input, which usually means they need to be set in BEGIN (or
via something like -F or -v on the command line).

One of my favorite idioms (and one that might actually be useful to OP) is:

# Print every 3 input lines as a single output line
# Yes, this single line is the whole program!
ORS = NR % 3 ? " " : "\n"
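
A minimal sketch for trying the idiom above from a shell. The six-line
sample input and the `out` variable are illustrative assumptions, not part
of the original post. The assignment is used as a pattern, so it runs for
every record and its (non-empty) value makes the default print happen:

```shell
# Feed six lines in; ORS becomes " " except on every third record,
# where it becomes "\n", so three inputs share one output line.
out=$(printf '%s\n' 1 2 3 4 5 6 | awk 'ORS = NR % 3 ? " " : "\n"')
printf '%s\n' "$out"
# prints:
# 1 2 3
# 4 5 6
```

If the number of input lines is not a multiple of 3, the last output line
ends in a space rather than a newline.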

>See 
><https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html> 
>for an example - just replace the "\n\n" in the example with " " to see 
>the effect you are looking for.

Of course, the whole point of this thread is that none of us has any idea
what OP is talking about or what his actual problem is.  We can only guess...

-- 
"It does a lot of things half well and it's just a garbage heap of ideas that are
mutually exclusive."

	- Ken Thompson, on C++ -

#8

From	David Chmelik <dchmelik@gmail.com>
Date	2024-05-13 02:04 +0000
Message-ID	<v1rsg2$37eqd$1@dont-email.me>
In reply to	#5

On Sun, 12 May 2024 12:11:27 -0000 (UTC), Kenny McCormack wrote:
> Of course, the whole point of this thread is that none of us has any
> idea what OP is talking about or what his actual problem is.  We can
> only guess...

Not the point.  I stated I'm trying AWK... problem is in subject line.  
Surprisingly, after rebooting PC, it all works now (un)commenting  
particular parts (ORS or commenting out print and uncommenting printf).

> On 12/05/2024 09:52, Bruce Horrocks wrote:
> Let me re-phrase that: it would be better to set ORS in the BEGIN {}
> section. I'm not sure why yours is not working but with some commented
> out code and some not, your example is unclear.

Okay.  What I posted works to read file, sort, print lines; I commented
out two versions that (initially) didn't work to print all on one line
(ORS or commenting out print and uncommenting printf).  After rebooting
(maybe just needed to restart shell?) those worked as expected... with ORS
in BEGIN but alternatively in function I wrote.  I guess as Mr McCormack
explained, one might have reasons to change ORS in different functions.

#14

From	Kaz Kylheku <643-408-1753@kylheku.com>
Date	2024-05-13 16:49 +0000
Message-ID	<20240513093736.709@kylheku.com>
In reply to	#5

On 2024-05-12, Kenny McCormack <gazelle@shell.xmission.com> wrote:
> In article <e0be0c38-e14e-45ba-ac87-5e2e4bd4f5cd@scorecrow.com>,
> Bruce Horrocks  <07.013@scorecrow.com> wrote:
> ...
>>You need to set ORS in the BEGIN { } section (or on the command line).
>
> This is demonstrably false.  You can set ORS whenever/wherever you want.
> Whatever value it has when a plain "print" statement is executed, is what
> will be used.  You are probably thinking about the various variables
> that affect input parsing. These variables clearly must be set prior to the
> reading of the input, which usually means they need to be set in BEGIN (or
> via something like -F or -v on the command line).
>
> One of my favorite idioms (and one that might actually be useful to OP) is:
>
> # Print every 3 input lines as a single output line
> # Yes, this single line is the whole program!
> ORS = NR % 3 ? " " : "\n"
>
>>See 
>><https://www.gnu.org/software/gawk/manual/html_node/Output-Separators.html> 
>>for an example - just replace the "\n\n" in the example with " " to see 
>>the effect you are looking for.
>
> Of course, the whole point of this thread is that none of us has any idea
> what OP is talking about or what his actual problem is.  We can only guess...

The problem seems to be that there is a file of words preceded by
unique integer ranks which indicate the order. They are to be reproduced
in rank order, on one line.

This is the TXR Lisp interactive listener of TXR 294.
Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
Self-assembly keeps TXR costs low; but ask about our installation service!
1> (flow "data.txt"
      file-get-lines
      (mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
      transpose
      (select (second @1) (first @1))
      (join-with " ")
      put-line)
all your base are belong to us

We can insert prints into the pipeline to see the transformations:

2> (flow "data.txt"
      prinl
      file-get-lines
      prinl
      (mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
      prinl
      transpose
      prinl
      (select (second @1) (first @1))
      prinl
      (join-with " ")
      prinl
      put-line)
"data.txt"
("2 your" "1 all" "3 base" "5 belong" "4 are" "7 us" "6 to")
(#(1 "your") #(0 "all") #(2 "base") #(4 "belong") #(3 "are") #(6 "us")
 #(5 "to"))
#(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))
#("all" "your" "base" "are" "belong" "to" "us")
"all your base are belong to us"
all your base are belong to us
t

That is tedious; say, why not make a macro dflow (debug flow) which inserts
those prinl's for us?

3> (defmacro dflow (. args)
     ^(flow ,*(interpose 'prinl args)))
dflow

Sanity check: is it inserting prinls?

4> (macroexpand-1 '(dflow a b c d))
(flow a prinl
  b prinl c prinl
  d)

Use dflow:

5> (dflow "data.txt"
      file-get-lines
      (mapcar (do match `@a @b` @1 (vec (pred (toint a)) b)))
      transpose
      (select (second @1) (first @1))
      (join-with " ")
      put-line)
"data.txt"
("2 your" "1 all" "3 base" "5 belong" "4 are" "7 us" "6 to")
(#(1 "your") #(0 "all") #(2 "base") #(4 "belong") #(3 "are") #(6 "us")
 #(5 "to"))
#(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))
#("all" "your" "base" "are" "belong" "to" "us")
"all your base are belong to us"
all your base are belong to us
t

After file-get-lines we have a list of strings like "2 your".

We map those through an anonymous function which matches the
string pattern `@a @b` to capture the space-separated text pieces.
A is converted to integer and mapped to its predecessor
(because we want to use it as an index, and indexing is zero based).
We map each string to a two element vector consisting of the
zero-based index as an integer type, and a string, so now we have:

(#(1 "your") #(0 "all") ...)

#(a b c) is a vector notation.

Then we want to transpose rows to columns to get the integer
column as a vector, and the values as a vector.

#(#(1 0 2 4 3 6 5) #("your" "all" "base" "belong" "are" "us" "to"))

Now we use the built-in function select which selects elements out
of a sequence, based on indices supplied in another sequence.

Now we have the vector of words in the right order; we just
join with a space.

#11

From	gazelle@shell.xmission.com (Kenny McCormack)
Date	2024-05-13 06:56 +0000
Message-ID	<v1sdji$tofu$2@news.xmission.com>
In reply to	#2

In article <v1pi7c$2b87j$1@dont-email.me>,
David Chmelik  <dchmelik@gmail.com> wrote:
...
># print_file_words.awk
># pass filename to function
>BEGIN { print_file_words("data.txt"); }
>
># read two-column array from file and sort lines and print
>function print_file_words(file) {
># set record separator then use print
># ORS=" "
>  while(getline<file) arr[$1]=$0
>  PROCINFO["sorted_in"]="@ind_num_asc"
>  for(i in arr) 
>  {
>    split(arr[i],arr2)
>    # output all words or on one line with ORS
>    print arr2[2]
>    # output all words on one line without needing ORS
>    #printf("%s ",arr2[2])
>  }
>}
>------------------------------------------------------------------------
># sample data.txt
>2 your
>1 all
>3 base
>5 belong
>4 are
>7 us
>6 to

I guess this is what you actually want:

{ A[$1] = $2 }
END {
    len = length(A)
    for (i=1; i<=len; i++)
	printf("%s%s",A[i],i<len ? " " : "\n")
    }

-- 
The randomly chosen signature file that would have appeared here is more than 4
lines long.  As such, it violates one or more Usenet RFCs.  In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
	http://user.xmission.com/~gazelle/Sigs/Noam

#13

From	gazelle@shell.xmission.com (Kenny McCormack)
Date	2024-05-13 14:53 +0000
Message-ID	<v1t9hi$u4lh$1@news.xmission.com>
In reply to	#11

In article <v1sdji$tofu$2@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
...
>I guess this is what you actually want:
>
>{ A[$1] = $2 }
>END {
>    len = length(A)
>    for (i=1; i<=len; i++)
>	printf("%s%s",A[i],i<len ? " " : "\n")
>    }

Improved version:

{ A[$1] = $2 }
END {
    for (i=1; i<=NR; i++)
	printf("%s%s",A[i],i<NR ? " " : "\n")
    }

Note that the value of NR in END is sort of a gray area, but it works as
expected in GAWK, which is really all we care about.

-- 
[Donald] Trump didn't have it all handed to him by his parents,
like Hillary Clinton did.

	- Some dumb cluck in Ohio; featured in Michael Moore's "Trumpland" -

#24 — Resurrecting an old thread (Was: printing words without newlines?)

From	gazelle@shell.xmission.com (Kenny McCormack)
Date	2024-07-15 18:10 +0000
Subject	Resurrecting an old thread (Was: printing words without newlines?)
Message-ID	<v73ong$3gdp5$1@news.xmission.com>
In reply to	#13

In article <v1t9hi$u4lh$1@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
...
>Improved version:
>
>{ A[$1] = $2 }
>END {
>    for (i=1; i<=NR; i++)
>	printf("%s%s",A[i],i<NR ? " " : "\n")
>    }
>
>Note that the value of NR in END is sort of a gray area, but it works as
>expected in GAWK, which is really all we care about.

Here's an even tighter version.  Saves about 20 bytes of code.
Yes, I know this code makes a lot of assumptions, but all the assumptions
are valid in the instant case (and that's all that matters):

{ A[$1] = $2 }
END {
    for (i=1; i<=NR; i++) $i = A[i]
    print
    }
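
A small sketch of why the version above works: assigning to $i makes awk
rebuild $0 from the fields joined by OFS, so a plain print then emits the
whole sentence. The `$0 = ""` line is an added assumption (not in the
posted code) so that fields left over from the last input record cannot
leak into the output; the `out` variable is illustrative only:

```shell
out=$(printf '%s\n' '2 your' '1 all' '3 base' '5 belong' '4 are' '7 us' '6 to' |
awk '{ A[$1] = $2 }
END {
    $0 = ""                               # assumption: start from an empty record
    for (i = 1; i <= NR; i++) $i = A[i]   # each assignment rebuilds $0 using OFS
    print
}')
printf '%s\n' "$out"
# prints: all your base are belong to us
```

This still assumes the ranks run from 1 to NR with no gaps or duplicates.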

-- 
Joni Ernst (2014): Obama should be impeached because 2 people have died of Ebola.
Joni Ernst (2020): Trump is doing great things, because only 65,000 times as many people have died of COVID-19.

Josef Stalin (1947): When one person dies, it is a tragedy; when a million die, it is merely statistics.

#12

From	Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Date	2024-05-13 10:18 +0200
Message-ID	<v1sid1$3bsil$1@dont-email.me>
In reply to	#2

On 12.05.2024 06:57, David Chmelik wrote:
> I'm learning more AWK basics and wrote function to read file, sort, 
> print.  I use GNU AWK (gawk) and its sort but printing is harder to get 
> working than anything... separate lines work, but when I use printf() or 
> set ORS then use print (for words one line) all awk outputs (on FreeBSD 
> UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline 
> before shell prompt)... is this normal (and I made mistake?) or am I 
> approaching it wrong?  I recall BASIC prints new lines, but as I learned 
> basic C and some derivatives, I'm used to newlines only being specified...

IIUC you meanwhile have your script running, and probably code similar
to

    BEGIN { print_file_words("data.txt"); }

    function print_file_words(file) {
        while (getline <file >0)
            arr[$1] = $0
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (i in arr) {
            split (arr[i], arr2)
            printf "%s ", arr2[2]
        }
        printf "\n"
    }

I suggest adding the '>0' test to your code, and also printing a final
"\n" so that your command line prompt doesn't overwrite your output.
Note also that printf (like print) is a command, not a function. Adding
local variable declarations is also sensible, to avoid problems if
you use your code in other source code contexts.
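
To illustrate the point about local variables, here is a minimal sketch.
The function names `leaky` and `tidy` and the `out` variable are
hypothetical, chosen only for this example. In awk, a variable is local to
a function only if it appears in the parameter list, conventionally after
extra spaces:

```shell
out=$(awk '
function leaky()     { for (i = 1; i <= 3; i++) ; }   # i is the global i
function tidy(    i) { for (i = 1; i <= 3; i++) ; }   # extra parameter makes i local
BEGIN {
    i = 10
    tidy();  print i    # 10: the global i is untouched
    leaky(); print i    # 4: the loop overwrote the global i
}')
printf '%s\n' "$out"
```

Applied to print_file_words, that would mean declaring i, arr and arr2 as
extra parameters so that a caller's variables of the same name are safe.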

Janis

> ------------------------------------------------------------------------
> # print_file_words.awk
> # pass filename to function
> BEGIN { print_file_words("data.txt"); }
> 
> # read two-column array from file and sort lines and print
> function print_file_words(file) {
> # set record separator then use print
> # ORS=" "
>   while(getline<file) arr[$1]=$0
>   PROCINFO["sorted_in"]="@ind_num_asc"
>   for(i in arr) 
>   {
>     split(arr[i],arr2)
>     # output all words or on one line with ORS
>     print arr2[2]
>     # output all words on one line without needing ORS
>     #printf("%s ",arr2[2])
>   }
> }
> ------------------------------------------------------------------------
> # sample data.txt
> 2 your
> 1 all
> 3 base
> 5 belong
> 4 are
> 7 us
> 6 to
>

#15

From	Kaz Kylheku <643-408-1753@kylheku.com>
Date	2024-05-13 17:17 +0000
Message-ID	<20240513100418.652@kylheku.com>
In reply to	#2

On 2024-05-12, David Chmelik <dchmelik@gmail.com> wrote:
> # sample data.txt
> 2 your
> 1 all
> 3 base
> 5 belong
> 4 are
> 7 us
> 6 to

$ awk '{
  if ($1 > max) max = $1;
  rank[$1] = $2
}

END {
  for (i = 1; i <= max; i++)
    if (i in rank) {
      printf("%s%s", sep, rank[i]);
      sep = " "
    }
  print ""
}' data.txt
all your base are belong to us

We do not perform any sort, and so we don't require GNU extensions. Sorting is
silly, because data is already sorted: we are given the positional rank of
every word, which is a way of capturing order. All we have to do is visit the
words in that order.

We can do that by iterating an index i from 1 to the highest index
we have seen.  If there is a rank[i] entry, then we print it.
(We do this "(i in rank)" check in case there are gaps in the rank
sequence.)

After we print one word, we start using the " " separator before all
subsequent words.
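
A quick sketch of the gap handling described above, using a made-up
three-line input whose ranks are 1, 3 and 5 (the input and the `out`
variable are assumptions for illustration). The missing ranks 2 and 4 are
skipped by the `(i in rank)` test, and no stray separators appear:

```shell
out=$(printf '%s\n' '5 to' '1 all' '3 base' |
awk '{ if ($1 > max) max = $1; rank[$1] = $2 }
END {
    for (i = 1; i <= max; i++)
        if (i in rank) {                  # skip ranks that never occurred
            printf("%s%s", sep, rank[i])
            sep = " "                     # empty before the first word only
        }
    print ""
}')
printf '%s\n' "$out"
# prints: all base to
```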

If we must sort, there is the sort utility:

$ sort -n data.txt | awk '{ printf("%s%s", sep, $2); sep = " " }' && echo
all your base are belong to us

Also, if we can suffer a spurious trailing space:

$ sort -n data.txt | awk '{ print $2 }' | tr '\n' ' ' && echo
all your base are belong to us

#16

From	gazelle@shell.xmission.com (Kenny McCormack)
Date	2024-05-13 17:26 +0000
Message-ID	<v1tih0$u8kt$1@news.xmission.com>
In reply to	#15

In article <20240513100418.652@kylheku.com>,
Kaz Kylheku  <643-408-1753@kylheku.com> wrote:
...
(This version is more complicated than it needs to be, but is essentially
the same as what I posted earlier.)
>$ awk '{
>  if ($1 > max) max = $1;
>  rank[$1] = $2
>}
>
>END {
>  for (i = 1; i <= max; i++)
>    if (i in rank) {
>      printf("%s%s", sep, rank[i]);
>      sep = " "
>    }
>  print ""
>}' data.txt
>all your base are belong to us
>
>We do not perform any sort, and so we don't require GNU extensions. Sorting is

But GNU extensions are good - especially since OP specifically mentioned
using GAWK.  And much more on-topic than Lisp (et al).

Final note: In fact, it has been established (on this newsgroup as well as
empirically by me and others) that if the indices are small integers, you
get sorting for free (in GAWK, which, as noted, is all we care about).  So,
you don't even really need to mess with PROCINFO[]...

And, one more note about sorting.  Some responders on this thread have
gotten confused about what is to be sorted.  They assumed that OP wanted
the words sorted (alphabetically), when, in fact, he just wants them sorted
(numerically) by the position number (the first field in the data line).

-- 
The randomly chosen signature file that would have appeared here is more than 4
lines long.  As such, it violates one or more Usenet RFCs.  In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
	http://user.xmission.com/~gazelle/Sigs/Mandela

#17

From	Kaz Kylheku <643-408-1753@kylheku.com>
Date	2024-05-13 23:33 +0000
Message-ID	<20240513162301.128@kylheku.com>
In reply to	#16

On 2024-05-13, Kenny McCormack <gazelle@shell.xmission.com> wrote:
> In article <20240513100418.652@kylheku.com>,
> Kaz Kylheku  <643-408-1753@kylheku.com> wrote:
> ...
> (This version more complicated than it needs to be, but essentially the
> same as what I posted earlier)
>>$ awk '{
>>  if ($1 > max) max = $1;
>>  rank[$1] = $2
>>}
>>
>>END {
>>  for (i = 1; i <= max; i++)
>>    if (i in rank) {
>>      printf("%s%s", sep, rank[i]);
>>      sep = " "
>>    }
>>  print ""
>>}' data.txt
>>all your base are belong to us
>>
>>We do not perform any sort, and so we don't require GNU extensions. Sorting is
>
> But GNU extensions are good - especially since OP specifically mentioned
> using GAWK.  And much more on-topic than Lisp (et al).

The above performs O(N) steps, whereas sorting is O(N log N),
and sometimes worse due to degenerate cases in some algorithms.

Why use an extension that only makes the program more verbose and brings
in an unnecessary algorithm?

> Final note: In fact, it has been established (on this newsgroup as well as
> empirically by me and others) that if the indices are small integers, you
> get sorting for free (in GAWK, which, as noted, is all we care about).  So,
> you don't even really need to mess with PROCINFO[]...

Are you referring to the idea of just replacing the above for + if
structure with:

  for (i in rank) {

  }

and relying on the small integer indices being hashed in order?

Where is that documented? The manual reiterates that this is not
specified: "By default, the order in which a ‘for (indx in array)’ loop
scans an array is not defined; it is generally based upon the internal
implementation of arrays inside awk."

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

#18 — Array indices are small integers? (Was: printing words without newlines?)

From	gazelle@shell.xmission.com (Kenny McCormack)
Date	2024-05-14 13:40 +0000
Subject	Array indices are small integers? (Was: printing words without newlines?)
Message-ID	<v1vpk5$vbji$1@news.xmission.com>
In reply to	#17

In article <20240513162301.128@kylheku.com>,
Kaz Kylheku  <643-408-1753@kylheku.com> wrote:
...
>> Final note: In fact, it has been established (on this newsgroup as well as
>> empirically by me and others) that if the indices are small integers, you
>> get sorting for free (in GAWK, which, as noted, is all we care about).  So,
>> you don't even really need to mess with PROCINFO[]...
>
>Are you referring to the idea of just replacing the above for + if
>structure with:
>
>  for (i in rank) {
>
>  }
>
>and relying on the small integer indices being hashed in order?

Yes.

>Where is that documented? The manual reiterates that this is not
>specified: "By default, the order in which a for (indx in array) loop
>scans an array is not defined; it is generally based upon the internal
>implementation of arrays inside awk."

It is documented in this newsgroup (Google is your friend).
And assented to by one or both of the GAWK insiders who are known to post here.
It seems to be an attribute (i.e., quirk) of the particular hashing
algorithm used.

Now, of course it isn't guaranteed and could disappear in some future
version of GAWK - and, of course, one wouldn't rely on it in production
code, since it is so easy to make it right by including the line (shown in
this thread's OP) that sets PROCINFO[].

But it is true, nonetheless.

-- 
The key difference between faith and science is that in science, evidence that
doesn't fit the theory tends to weaken the theory (that is, make it less likely to
be believed), whereas in faith, contrary evidence just makes faith stronger (on
the assumption that Satan is testing you - trying to make you abandon your faith).

#19

From	Ed Morton <mortonspam@gmail.com>
Date	2024-05-16 08:11 -0500
Message-ID	<v250m9$1j3gp$1@dont-email.me>
In reply to	#2

On 5/11/2024 11:57 PM, David Chmelik wrote:
> I'm learning more AWK basics and wrote function to read file, sort,
> print.  I use GNU AWK (gawk) and its sort but printing is harder to get
> working than anything... separate lines work, but when I use printf() or
> set ORS then use print (for words one line) all awk outputs (on FreeBSD
> UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
> before shell prompt)... 

Your input file probably has DOS line endings, see 
https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it 
for what that means and how to deal with them but basically either run 
`dos2unix` on your file before calling awk or add `sub(/\r$/,"")` as I 
show below*.

> is this normal (and I made mistake?) or am I
> approaching it wrong?  I recall BASIC prints new lines, but as I learned
> basic C and some derivatives, I'm used to newlines only being specified...
> ------------------------------------------------------------------------
> # print_file_words.awk
> # pass filename to function
> BEGIN { print_file_words("data.txt"); }
> 
> # read two-column array from file and sort lines and print
> function print_file_words(file) {
> # set record separator then use print
> # ORS=" "

Move the above to a BEGIN section so it is executed once total instead 
of once per input line.

>    while(getline<file) arr[$1]=$0

The above would spin off into an infinite loop if getline failed since 
in that case it'd return a negative number which would still evaluate to 
"true" when tested as a condition. It needs to be:

     while ( (getline < file) > 0 ) arr[$1] = $0

See http://awk.freeshell.org/AllAboutGetline for that and more info on 
using getline.
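
A minimal sketch of the failure case, assuming the path
/nonexistent/data.txt does not exist on the machine running it. The
function name `count_lines` and the `out` variable are hypothetical.
getline returns -1 when the file cannot be opened, so the `> 0` test ends
the loop at once instead of spinning:

```shell
out=$(awk '
function count_lines(file,    line, n) {
    while ((getline line < file) > 0) n++   # stops on 0 (EOF) and on -1 (error)
    close(file)
    return n + 0                            # force a number even if n was never set
}
BEGIN { print count_lines("/nonexistent/data.txt") }')
printf '%s\n' "$out"
# prints: 0
```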

*This is where you'd strip CRs from the end of input lines. Do either of 
these, the first uses a non-POSIX extension function gensub() (which 
gawk has), the second would work in any awk:

     a) while ( (getline < file) > 0 ) arr[$1] = gensub(/\r$/,"",1)

     b) while ( (getline < file) > 0 ) { sub(/\r$/,""); arr[$1] = $0 }
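
To see the effect of stripping the CR, here is a small sketch that feeds
two CRLF-terminated lines through awk and prints the length of the second
field before and after the sub() call. The sample input and the `before`
and `out` names are assumptions for illustration:

```shell
out=$(printf '2 your\r\n1 all\r\n' |
awk '{
    before = length($2)        # counts the trailing CR as part of the word
    sub(/\r$/, "")             # modifies $0, so the fields are split again
    print before, length($2)
}')
printf '%s\n' "$out"
# prints:
# 5 4
# 4 3
```

The extra character is what makes a terminal jump back to the start of the
line when the words are printed side by side.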

>    PROCINFO["sorted_in"]="@ind_num_asc"
>    for(i in arr)
>    {
>      split(arr[i],arr2)
>      # output all words or on one line with ORS
>      print arr2[2]
>      # output all words on one line without needing ORS
>      #printf("%s ",arr2[2])
>    }

Add `printf "\n"` after the loop if you had set ORS to a blank (a plain 
`print RS` would append ORS again after the newline) so the output ends 
in a newline and is therefore a valid POSIX text file, otherwise YMMV 
with what subsequent text processing tools can do with it.
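
A short sketch of ending space-separated output with exactly one newline.
It uses printf for the terminator, because print always appends the
current ORS after whatever it prints. The three-word input and the names
`awk_join`, `out` and `newlines` are illustrative assumptions:

```shell
# Words are joined by ORS; the END rule writes the single closing newline.
awk_join='BEGIN { ORS = " " } { print } END { printf "\n" }'
out=$(printf '%s\n' all your base | awk "$awk_join")
newlines=$(printf '%s\n' all your base | awk "$awk_join" | wc -l)
printf '[%s] newlines=%d\n' "$out" "$newlines"
# prints: [all your base ] newlines=1
```

The trailing space before the newline remains; the separator-variable
approach shown elsewhere in the thread avoids it.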

     Ed.

> }
> ------------------------------------------------------------------------
> # sample data.txt
> 2 your
> 1 all
> 3 base
> 5 belong
> 4 are
> 7 us
> 6 to

#20

From	Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Date	2024-05-16 15:55 +0200
Message-ID	<v2538p$1jmvm$1@dont-email.me>
In reply to	#19

On 16.05.2024 15:11, Ed Morton wrote:
> On 5/11/2024 11:57 PM, David Chmelik wrote:
>> I'm learning more AWK basics and wrote function to read file, sort,
>> print.  I use GNU AWK (gawk) and its sort but printing is harder to get
>> working than anything... separate lines work, but when I use printf() or
>> set ORS then use print (for words one line) all awk outputs (on FreeBSD
>> UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
>> before shell prompt)... 
> 
> [...]
>> ------------------------------------------------------------------------
>> # print_file_words.awk
>> # pass filename to function
>> BEGIN { print_file_words("data.txt"); }
>>
>> # read two-column array from file and sort lines and print
>> function print_file_words(file) {
>> # set record separator then use print
>> # ORS=" "
> 
> Move the above to a BEGIN section so it is executed once total instead
> of once per input line.

A function definition called once from the BEGIN section isn't
called "once per input line".

Janis

> 
>> [...]


#21 — Once upon a time... (Was: printing words without newlines?)

From	gazelle@shell.xmission.com (Kenny McCormack)
Date	2024-05-16 14:15 +0000
Subject	Once upon a time... (Was: printing words without newlines?)
Message-ID	<v254ev$125p2$1@news.xmission.com>
In reply to	#20

In article <v2538p$1jmvm$1@dont-email.me>,
Janis Papanagnou  <janis_papanagnou+ng@hotmail.com> wrote:
...
>A function definition called once from the BEGIN section isn't
>called "once per input line".

Especially since it is commented out, so it executes exactly zero times.

Actually setting ORS (or any other similar variable) inside a function
definition is not such a bad idea, in terms of modularity.

-- 
To all the people worried about how bad it would look to have a public trial of a
former president (and all the usual verbiage that we heard in 1974), I say this to DJT:
    Just plead guilty, take your medicine, do your time, just fade away.
    For the good of the country.  Do the right thing.


#22 — Re: Once upon a time... (Was: printing words without newlines?)

From	gazelle@shell.xmission.com (Kenny McCormack)
Date	2024-05-16 15:17 +0000
Subject	Re: Once upon a time... (Was: printing words without newlines?)
Message-ID	<v2582m$125p2$2@news.xmission.com>
In reply to	#21

In article <v254ev$125p2$1@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
>In article <v2538p$1jmvm$1@dont-email.me>,
>Janis Papanagnou  <janis_papanagnou+ng@hotmail.com> wrote:
>...
>>A function definition called once from the BEGIN section isn't
>>called "once per input line".
>
>Especially since it is commented out, so it executes exactly zero times.
>
>Actually setting ORS (or any other similar variable) inside a function
>definition is not such a bad idea, in terms of modularity.

In fact, I'd like to expand on that.  It is commonly held that a
well-written function that changes the values of "special variables" should
save and restore them.  I.e.:

    function foo(arg1, arg2, ...) {
	oldORS = ORS
	ORS = new value
	...
	ORS = oldORS
	}

But in fact, in practice, this can get tricky - due to vagaries of the AWK
language.  What would really be nice is if you could declare special
variables in the parameter list - which would give them the "local
variable" treatment.  I.e.:

    function foo(arg1, arg2, ..., ORS) {
	ORS = new value
	...
	}

Now, ORS would be magically restored to its previous value w/o the function
having to deal with it (**).  Unfortunately, neither GAWK nor TAWK allows this.
GAWK gives an error message saying you can't use special variables in arg
lists.  TAWK just silently ignores the attempt.
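
For reference, the manual save-and-restore pattern can be sketched in portable awk as follows. The function name `print_words` and the variable `old_ors` are hypothetical; `old_ors` is declared as an extra parameter (the usual awk idiom for a local) so the saved value cannot collide with a global of the same name. One way this gets tricky, for instance, is that any early `return` added later would have to repeat the restore.

```sh
out=$(awk '
# Extra parameters after the gap (old_ors, n, i, words) act as locals.
function print_words(line,    old_ors, n, i, words) {
    old_ors = ORS               # save the caller setting
    ORS = " "
    n = split(line, words, " ")
    for (i = 1; i <= n; i++) print words[i]
    ORS = old_ors               # restore before returning
    print ""                    # end the line using the restored ORS
}
BEGIN {
    print_words("all your base")
    print "done"                # ORS is a newline again here
}')
printf '%s\n' "$out"
```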

What would be even better is if this happened magically w/o needing to do
the above parameter trick.  An argument can be made that changes to special
variables should, by default, be local to functions.  Now, as it happens,
this would break one of my functions - which I call "setsort", which sets
PROCINFO["sorted_in"] for me.  Basically, I can never remember the special
names of the internal sorting functions (e.g., @ind_whatever), so I wrote a
function setsort() and can now just do: setsort(1) to get the most commonly
used sorting functionality.  I find it easier to remember the numbers than
to remember the exact spelling of those names.
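
A rough sketch of what such a helper might look like; this is a guess, not the actual setsort(). The number-to-name mapping is invented for illustration, including the assumption that 1 means "@ind_num_asc". The names themselves are gawk's predefined orderings, and only gawk acts on PROCINFO["sorted_in"]; in other awks the assignment is harmless but has no effect on traversal order.

```sh
out=$(awk '
# Hypothetical mapping from small numbers to gawk ordering names.
function setsort(n) {
    if (n == 1)      PROCINFO["sorted_in"] = "@ind_num_asc"
    else if (n == 2) PROCINFO["sorted_in"] = "@ind_str_asc"
    else if (n == 3) PROCINFO["sorted_in"] = "@val_num_asc"
    else             PROCINFO["sorted_in"] = "@unsorted"
}
BEGIN {
    setsort(1)
    # The change is visible to the caller because PROCINFO is global.
    print PROCINFO["sorted_in"]
}')
printf '%s\n' "$out"
```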

This, in turn, could be fixed if there was a "global" statement that would
make a selected variable global rather than local (*).  This is, in part,
inspired by Tcl syntax, where everything is local by default and you have
to explicitly use "global var" to make "var" global.  I've often thought
that, if it could be done all over again, AWK might be better if it had
followed the Tcl model for function variables.  Of course, it can't be
changed now.

(*) So, in my setsort() function, I would write: global PROCINFO
and that would make changes to PROCINFO visible to the caller.

(**) Or, you could even pass a value for ORS in as part of the function call.

-- 
The randomly chosen signature file that would have appeared here is more than 4
lines long.  As such, it violates one or more Usenet RFCs.  In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
	http://user.xmission.com/~gazelle/Sigs/PennJillette


#23

From	Ed Morton <mortonspam@gmail.com>
Date	2024-05-16 19:40 -0500
Message-ID	<v2691k$1r4un$1@dont-email.me>
In reply to	#20

On 5/16/2024 8:55 AM, Janis Papanagnou wrote:
> On 16.05.2024 15:11, Ed Morton wrote:
>> On 5/11/2024 11:57 PM, David Chmelik wrote:
>>> I'm learning more AWK basics and wrote function to read file, sort,
>>> print.  I use GNU AWK (gawk) and its sort but printing is harder to get
>>> working than anything... separate lines work, but when I use printf() or
>>> set ORS then use print (for words one line) all awk outputs (on FreeBSD
>>> UNIX 14 and Slackware GNU/Linux 15) is a space (and not even newline
>>> before shell prompt)...
>>
>> [...]
>>> ------------------------------------------------------------------------
>>> # print_file_words.awk
>>> # pass filename to function
>>> BEGIN { print_file_words("data.txt"); }
>>>
>>> # read two-column array from file and sort lines and print
>>> function print_file_words(file) {
>>> # set record separator then use print
>>> # ORS=" "
>>
>> Move the above to a BEGIN section so it is executed once total instead
>> of once per input line.
> 
> A function definition called once from the BEGIN section isn't
> called "once per input line".

I didn't notice the function keyword nestled in the preceding comments 
and didn't give it much thought; thanks for pointing that out.

	Ed.
