


Re: Hex string literals (was Re: C23 thoughts and opinions)

From Keith Thompson <Keith.S.Thompson+u@gmail.com>
Newsgroups comp.lang.c
Subject Re: Hex string literals (was Re: C23 thoughts and opinions)
Date 2024-06-17 19:20 -0700
Organization None to speak of
Message-ID <874j9r0z6s.fsf@nosuchdomain.example.com>
References (16 earlier) <874jahznzt.fsf@nosuchdomain.example.com> <v36nf9$12bei$1@dont-email.me> <87v82b43h6.fsf@nosuchdomain.example.com> <87iky830v7.fsf_-_@nosuchdomain.example.com> <v4pd8t$m52o$1@dont-email.me>



bart <bc@freeuk.com> writes:
> On 17/06/2024 00:48, Keith Thompson wrote:
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>> [...]
>>> uc"..." string literals might be made even simpler, for example allowing
>>> only hex digits and not requiring \x (uc"01020304" rather than
>>> uc"\x01\x02\x03\x04").  That's probably overkill.  uc"..."  literals
>>> could be useful in other contexts, and programmers will want
>>> flexibility.  Maybe something like hex"01020304" (embedded spaces could
>>> be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
>> [...]
>> *If* hexadecimal string literals were to be added to a future version
>> of the language, I think I have a syntax that I like better than
>> what I suggested.
>>
>> Inspired by the existing syntax for integer and floating-point
>> hex constants, I propose using a "0x" prefix.  0x"deadbeef" is an
>> expression of type `const unsigned char[4]` (assuming CHAR_BIT==8),
>> with values 0xde, 0xad, 0xbe, 0xef in that order.  Byte order is
>> irrelevant; we're specifying byte values in order, not bytes of
>> the representation of some larger type.  memcpy()ing 0x"deadbeef"
>> to a uint32_t might yield either 0xdeadbeef or 0xefbeadde (or other
>> more exotic possibilities).
>
> Some points:
>
> * Can the hex string span multiple lines? (You say space is the only
>   white-space allowed)

No.

> * If not, would adjacent hex strings be concatenated, as happens with
>   ordinary strings? Since hex data for one char array can be large.

Yes (ordinary string literals, not strings).  C23 says, "If a sequence
of adjacent string literal tokens includes prefixed string literal
tokens, the prefixed tokens shall all have the same prefix.", so you
can't concatenate hex string literals with ordinary string literals.
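For comparison, this is how adjacent ordinary string literals are concatenated in current C. The 0x"..." form is only a proposal and no compiler accepts it, so this minimal sketch can show only the existing rule it would build on; the name `joined` is illustrative.

```c
#include <assert.h>

/* Adjacent string literal tokens are joined in translation phase 6,
   before the terminating null is appended in phase 7, so the result
   is a single array with a single null. */
static const char joined[] = "dead" "beef";

static_assert(sizeof joined == 9, "8 characters plus one terminating null");
```

Under the proposal, 0x"dead" 0x"beef" would presumably concatenate the same way but yield 4 bytes with no null at all.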

A use case I have in mind is having #embed, perhaps with a new
parameter, expand to a sequence of one or more hex string literals.
I'd expect
    #embed "big_file.dat" hex(true)
to expand to something like:
    0x"29db6210971cc5bafc17292cb047c143db305eabbee23a4d6ccd63eee37344cc"
    0x"735a83327d056b87d167ecd8626f0059556735a66b7a1761d8ffe94ff5b1dc38"
    ...
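The hex(true) parameter is hypothetical, so no real preprocessor produces this output today. As a rough sketch of what such an expansion (or an external generator) might emit, the function below formats a byte buffer as one 0x"..." token per line. It assumes CHAR_BIT == 8 and lowercase digits; the names `format_hex_literals`, `struct sink` and `put`, and the snprintf-like buffer convention, are illustrative choices, not part of any proposal.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

static_assert(CHAR_BIT == 8, "this sketch emits two hex digits per byte");

/* Output buffer that counts every character but stores only what fits. */
struct sink { char *out; size_t cap; size_t len; };

static void put(struct sink *s, char c)
{
    if (s->len + 1 < s->cap)      /* leave room for the final null */
        s->out[s->len] = c;
    s->len++;
}

/* Format n bytes as proposed 0x"..." tokens, per_line bytes per token,
   one token per line.  Like snprintf, returns the length the complete
   output needs; the text in out is truncated if cap is too small. */
static size_t format_hex_literals(char *out, size_t cap,
                                  const unsigned char *data, size_t n,
                                  size_t per_line)
{
    static const char digits[] = "0123456789abcdef";
    struct sink s = { out, cap, 0 };

    assert(per_line > 0);
    for (size_t i = 0; i < n; i += per_line) {
        put(&s, '0'); put(&s, 'x'); put(&s, '"');
        for (size_t j = i; j < n && j < i + per_line; j++) {
            put(&s, digits[data[j] >> 4]);     /* high nibble */
            put(&s, digits[data[j] & 0x0f]);   /* low nibble */
        }
        put(&s, '"'); put(&s, '\n');
    }
    if (cap > 0)
        out[s.len < cap ? s.len : cap - 1] = '\0';
    return s.len;
}
```

For example, the five bytes de ad be ef 01 with per_line == 4 produce the two lines 0x"deadbeef" and 0x"01".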

> * Your examples use only digits a-f but I assume A-F will work too.

Certainly, and the prefix can be either 0x or 0X, as for hex numeric
constants.

> * Can individual byte values end early, so allowing B to mean 0B? (My
>   scheme requires hex digits to be in pairs.)

Yes, the end of a byte is indicated either by having enough hex digits
(2 if CHAR_BIT==8, more otherwise) or by a space.  0x"1 2" is equivalent
to 0x"01 02" or 0x"0102" if CHAR_BIT==8.
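A sketch of the byte-splitting rule, assuming the rules as described in this thread: a byte ends after (CHAR_BIT+3)/4 hex digits or at a space, only hex digits and spaces are allowed, and an empty literal is an error. The parameter `char_bit` stands in for CHAR_BIT so the different cases can be tried on an ordinary host. The names `hexval` and `parse_hex_body` are hypothetical, and treating digits that overflow a byte (possible when CHAR_BIT isn't a multiple of 4) as an error is an assumption, not something stated in the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Split the text between the quotes of a proposed 0x"..." literal into
   byte values.  char_bit (1..64) plays the role of CHAR_BIT.  Returns the
   number of bytes stored in out[0..max-1], or -1 for a syntax error (or
   if out is too small). */
static long parse_hex_body(const char *body, unsigned char_bit,
                           unsigned long long *out, size_t max)
{
    const unsigned per_byte = (char_bit + 3) / 4;
    const unsigned long long limit =
        char_bit >= 64 ? ~0ULL : (1ULL << char_bit) - 1;
    unsigned long long acc = 0;    /* value of the byte being built */
    unsigned ndigits = 0;          /* hex digits in that byte so far */
    size_t count = 0;

    assert(char_bit >= 1 && char_bit <= 64);
    for (const char *p = body; ; p++) {
        int byte_done;
        if (*p == '\0' || *p == ' ') {
            byte_done = ndigits > 0;            /* space or end closes a short byte */
        } else {
            int v = hexval(*p);
            if (v < 0)
                return -1;                      /* only hex digits and spaces allowed */
            acc = acc * 16 + (unsigned)v;
            byte_done = ++ndigits == per_byte;  /* byte has all its digits */
        }
        if (byte_done) {
            if (acc > limit || count == max)
                return -1;                      /* assumed: overflow is an error */
            out[count++] = acc;
            acc = 0;
            ndigits = 0;
        }
        if (*p == '\0')
            break;
    }
    return count > 0 ? (long)count : -1;        /* 0x"" is an error */
}
```

With char_bit == 8, "1 2" yields 0x01, 0x02 and "deadbeef" yields four bytes; with char_bit == 12 the same "deadbeef" yields 0xdea, 0xdbe, 0xef, and with char_bit == 16, "12 34 56 78" yields four 16-bit values 0x0012 through 0x0078.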

>> Again, unlike other string literals, there is no implicit terminating
>> null byte.  And I suggest making them const, since there's no
>> existing code to break.
>>
>> If CHAR_BIT==8, each byte is represented by two hex digits.  More
>> generally, each byte is represented by (CHAR_BIT+3)/4 hex digits in
>> the absence of whitespace.  Added whitespace marks the end of a byte.
>> 0x"deadbeef" is 1, 2, 3, or 4 bytes if CHAR_BIT is 32, 16, 12, or 8
>> respectively, but 0x"de ad be ef" is 4 bytes regardless of CHAR_BIT.
>>
>> 0x"" is a syntax error, since C doesn't support zero-length arrays.
>> Anything between the quotes other than hex digits and spaces is a
>> syntax error.
>>
>> 0x"dead beef" is still 4 bytes if CHAR_BIT==8; the space forces the
>> end of a byte, but the usage of spaces doesn't have to be consistent.
>
> Here it gets confusing. But first, I understand that CHAR_BIT could be
> 64, where hex literals get long enough that they could do with 
> separators. But spaces now are significant in marking the early end of
> a 64-bit value.

Yes, CHAR_BIT could be 64, but that's vanishingly rare (it might be the
case for some DSP implementations).

> What I have in mind is that somebody might write 0x"12 34 56 78" to
> designate 4 8-bit values totalling 32 bits, and wants the spaces for 
> readability. Compiled for a machine with 16-bit characters, it will
> now represent (in little-endian) the 64-bit value 0x0078005600340012
> instead of 0x78563412.

Right.  Clarity for systems with CHAR_BIT>8 is not a high priority
IMHO.  I'm trying to specify the syntax and semantics to allow
for such systems, but not to make things a little easier for such
exotic systems at the expense of complexity.  And I'd expect most
long hex string literals to be automatically generated, whether by
#embed or some other mechanism, so legibility isn't hugely important.

C23 adds optional digit separators for numeric constants so you can
write 123'456'789 or 0xde'ad'be'ef .  I tend to think that allowing
purely decorative separators *and* semantically significant blanks
would be too confusing, but I wouldn't strongly object to it.

> I assume the hex string can only be used to initialise a char[] array?
> (The feature I presented elsewhere, 'data-strings', could be used to 
> initialise any array type, just like #embed IIUC.)

In the design I have in mind, no, a hex string literal is of type
const unsigned char[N].  The standard defines the *representation*
of a value as the contents of an object of type unsigned char[N].
If you want to initialize something other than array of unsigned
char, you can use some kind of type-punning.
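Since 0x"..." doesn't exist yet, the nearest equivalent in current C is an unsigned char array initialized from a string literal of exactly the array's length; C (unlike C++) then simply doesn't store the terminating null. A minimal sketch of both ways to get at the bytes as a larger type, assuming an implementation that provides uint32_t: memcpy-based type-punning, whose result depends on byte order, and assembling the value with shifts when one particular value is wanted. The names `deadbeef`, `bytes_as_u32` and `bytes_big_endian_u32` are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the proposed 0x"deadbeef": the literal has 4 characters
   plus a null, but the array has room for only 4, so no null is stored. */
static const unsigned char deadbeef[4] = "\xde\xad\xbe\xef";

static_assert(sizeof deadbeef == 4, "no terminating null stored");

/* Type-punning by memcpy: reinterpret the 4 bytes as a uint32_t.
   The result depends on the implementation's byte order. */
static uint32_t bytes_as_u32(const unsigned char *bytes)
{
    uint32_t u;
    memcpy(&u, bytes, sizeof u);
    return u;
}

/* Byte-order-independent alternative: treat bytes[0] as most significant. */
static uint32_t bytes_big_endian_u32(const unsigned char *b)
{
    return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
           (uint32_t)b[2] << 8  | (uint32_t)b[3];
}
```

On a little-endian machine bytes_as_u32(deadbeef) gives 0xefbeadde and on a big-endian one 0xdeadbeef, while bytes_big_endian_u32(deadbeef) gives 0xdeadbeef on both.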

>> This could be made more flexible by allowing various backslash
>> escapes, but I'm not inclined to complicate it too much.
>>
>> Note that the value of a (proposed) hex string literal is not a
>> string unless it happens to end in zero.  I still use the term
>> "string literal" because it's closely tied to existing string
>> literal syntax, and existing string literals don't necessarily
>> represent strings anyway ("embedded\0null\0characters").
>>
>> Binary string literals 0b"11001001" might also be worth
>> considering (that's of type `const unsigned char[1]`).
>
> You mean, values that can only be one byte long? I don't get it. How
> many use-cases are there for char-arrays that are only a byte long?

No, that was just an example.  *If* binary string literals were
allowed they could be arbitrarily long.  0b"11001001 00110110"
would be of type `const unsigned char[2]`.  (I'm not strongly in
favor of supporting binary or octal string literals; they're just
something to be considered if we want to be overly consistent.
Hex alone is probably enough.)

[...]

>> What I'm trying to design here is a more straightforward way to
>> represent raw (unsigned char[]) data in C code, largely but not
>> exclusively for use by #embed.
>
> Sorry, I thought this was an alternative to #embed, for smaller
> amounts of data directly written in source code.

It's not *just* an alternative to #embed.  It's intended partly as
an alternative for what #embed can expand to.  If we were earlier
in the C23 process, I might have proposed this as the default way
#embed works.  Since it's probably too late for such a change, I'd
suggest a new #embed parameter so that `#embed "filename" hex(true)`
expands to one or more hex string literals.  (A compiler could
implement this as an extension even if the standard isn't changed.)

And of course hex string literals could also be used without #embed.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
