Re: Can I output a binary file from an AWK program?

From Kaz Kylheku <046-301-5902@kylheku.com>
Newsgroups comp.lang.awk
Subject Re: Can I output a binary file from an AWK program?
Date 2026-04-17 00:25 +0000
Organization A noiseless patient Spider
Message-ID <20260416171309.949@kylheku.com> (permalink)
References <10q99ai$ph7$1@news.muc.de> <20260329163603.959@kylheku.com> <87a4vquqm5.fsf@example.invalid>


On 2026-03-30, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Kaz Kylheku <046-301-5902@kylheku.com> writes:
>> On 2026-03-28, Alan Mackenzie <acm@muc.de> wrote:
>>> I think the Subject: line says it all.  I have a text source file to
>>> convert into a binary output file.  For example, I want to be able to
>>> output a 32-bit integer as a four byte little-endian binary integer.
>>
>> Firstly, Awk's printf has a %c specifier which will output any byte:
>>
>> $ awk 'BEGIN { printf("%c%c", 0x41, 0x0A) }'
>> A
>> $
> [...]
>
> The behavior seems to depend on the current locale.  I haven't
> investigated it thoroughly.  I don't know whether there's a way to
> force binary output.
>
> For example, the cent sign '¢' is U+00a2, represented in UTF-8
> as the two-byte sequence 0xc2, 0xa2.
>
> I have LANG=en_US.UTF-8 in my environment.
>
> $ gawk --version | head -n 1
> GNU Awk 5.2.1, API 3.2, PMA Avon 8-g1, (GNU MPFR 4.2.1, GNU MP 6.3.0)
> $ gawk 'BEGIN { printf("%c\n", 0xa2) }'
> ¢
> $ LANG=C gawk 'BEGIN { printf("%c\n", 0xa2) }' | hd
> 00000000  a2 0a                                             |..|
> 00000002
> $

I just read your response now, having returned from some travels.

On a whim, informed by something unrelated to GNU Awk, I tried this:

$ gawk 'BEGIN { printf("%c\n", 0xa2) }'
¢
$ gawk 'BEGIN { printf("%c\n", 0xa2) }' | hd
00000000  c2 a2 0a                                          |...|
00000003
$ gawk 'BEGIN { printf("%c\n", 0xdca2) }'
�
$ gawk 'BEGIN { printf("%c\n", 0xdca2) }' | hd
00000000  a2 0a                                             |..|
00000002

I.e. we are mapping the byte A2 into the low-surrogate range, as DCA2.
When we do this, Awk seems to put out the raw A2 byte that we want
(and not the UTF-8 encoding of the U+DCA2 code point, which would
be wrong).

This is a "thing out there", and I know about it from having implemented
it in the TXR project's UTF-8 handling also: the concept of mapping
invalid bytes in UTF-8 input to DCxx code points, and then mapping DCxx
code points back to bytes on output. (This is in contrast to strict
handling, such as throwing exceptions on invalid bytes in UTF-8.)
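
For comparison, the same round trip can be sketched with Python's
built-in "surrogateescape" error handler (PEP 383), which maps each
undecodable byte 0xNN to U+DCNN on input and back to the byte on output.
This only illustrates the concept; it is not how GNU Awk or TXR
implement it, and the variable names are made up for the example.

```python
# Sketch of the DCxx round trip using Python's "surrogateescape" handler.
# Illustration of the concept only, not GNU Awk's or TXR's code.

raw = b"A\xa2\n"  # a lone 0xA2 byte is not valid UTF-8

# Decode: the undecodable byte 0xA2 becomes the code point U+DCA2.
text = raw.decode("utf-8", errors="surrogateescape")
print([hex(ord(ch)) for ch in text])

# Encode: U+DCA2 is written back as the single byte 0xA2.
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == raw)

# Strict handling refuses the lone surrogate instead of emitting a byte.
try:
    "\udca2".encode("utf-8")
except UnicodeEncodeError:
    print("strict encoder rejects U+DCA2")

# Valid text is unaffected: U+00A2 is still encoded as C2 A2.
cent = "\u00a2".encode("utf-8", errors="surrogateescape")
print(cent.hex())
```

The last line is the counterpart of the first gawk example above:
a real U+00A2 goes out as two bytes, while only the DCxx code point
goes out as the bare byte.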

It seems that GNU Awk implements at least some aspect of this idea.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

