| From | Kaz Kylheku <046-301-5902@kylheku.com> |
|---|---|
| Newsgroups | comp.lang.awk |
| Subject | Re: Can I output a binary file from an AWK program? |
| Date | 2026-04-17 00:25 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <20260416171309.949@kylheku.com> |
| References | <10q99ai$ph7$1@news.muc.de> <20260329163603.959@kylheku.com> <87a4vquqm5.fsf@example.invalid> |
On 2026-03-30, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Kaz Kylheku <046-301-5902@kylheku.com> writes:
>> On 2026-03-28, Alan Mackenzie <acm@muc.de> wrote:
>>> I think the Subject: line says it all. I have a text source file to
>>> convert into a binary output file. For example, I want to be able to
>>> output a 32-bit integer as a four byte little-endian binary integer.
>>
>> Firstly, Awk's printf has a %c specifier which will output any byte:
>>
>> $ awk 'BEGIN { printf("%c%c", 0x41, 0x0A) }'
>> A
>> $
> [...]
>
> The behavior seems to depend on the current locale. I haven't
> investigated it thoroughly. I don't know whether there's a way to
> force binary output.
>
> For example, the cent sign '¢' is U+00a2, represented in UTF-8
> as the two-byte sequence 0xc2, 0xa2.
>
> I have LANG=en_US.UTF-8 in my environment.
>
> $ gawk --version | head -n 1
> GNU Awk 5.2.1, API 3.2, PMA Avon 8-g1, (GNU MPFR 4.2.1, GNU MP 6.3.0)
> $ gawk 'BEGIN { printf("%c\n", 0xa2) }'
> ¢
> $ LANG=C gawk 'BEGIN { printf("%c\n", 0xa2) }' | hd
> 00000000 a2 0a |..|
> 00000002
> $
I just read your response now, having returned from some travels.
On a whim, informed by something unrelated to GNU Awk, I tried this:
$ gawk 'BEGIN { printf("%c\n", 0xa2) }'
¢
$ gawk 'BEGIN { printf("%c\n", 0xa2) }' | hd
00000000 c2 a2 0a |...|
00000003
$ gawk 'BEGIN { printf("%c\n", 0xdca2) }'
�
$ gawk 'BEGIN { printf("%c\n", 0xdca2) }' | hd
00000000 a2 0a |..|
00000002
I.e., we are mapping the byte A2 into the low-surrogate range DCxx.
When we do this, Awk seems to put out the raw A2 byte
that we want (and not the UTF-8 encoding of the U+DCA2 code
point, which would be wrong).
This is a "thing out there", and I know about it from having implemented
it in the TXR project's UTF-8 handling also: the concept of mapping
invalid bytes in UTF-8 input to DCxx code points, and then mapping DCxx
code points back to bytes on output. (This is in contrast to strict
handling, such as throwing exceptions on invalid bytes in UTF-8.)
It seems that GNU Awk implements at least some aspect of this idea.
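For comparison, the same byte-to-DCxx idea is exposed in Python as the "surrogateescape" error handler (PEP 383). The sketch below only illustrates the concept with that handler; it says nothing about how GNU Awk or TXR implement it internally, and the variable names are made up for the example.

```python
# Illustration only: Python's "surrogateescape" error handler (PEP 383),
# not gawk's or TXR's actual implementation.

raw = b"\xa2\n"  # a lone A2 byte is not valid UTF-8

# Decoding: an undecodable byte 0xNN (0x80..0xFF) becomes code point U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
assert text == "\udca2\n"

# Encoding: U+DCNN is mapped back to the original byte 0xNN.
assert text.encode("utf-8", errors="surrogateescape") == raw

# A valid character is unaffected: U+00A2 still encodes as C2 A2.
cent = "\u00a2".encode("utf-8", errors="surrogateescape")
assert cent == b"\xc2\xa2"

# Applying the plain UTF-8 bit layout to U+DCA2 (here via "surrogatepass")
# yields three bytes, ED B2 A2, which is the output we do not want.
naive = "\udca2".encode("utf-8", errors="surrogatepass")
assert naive == b"\xed\xb2\xa2"
```

This mirrors the gawk session above: 0xa2 comes out as C2 A2, while 0xdca2 comes out as the single byte A2.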
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca