Re: u8"" c11 c23

From Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups comp.lang.c
Subject Re: u8"" c11 c23
Date Mon, 15 Dec 2025 11:13:21 -0800
Organization A noiseless patient Spider
Lines 86
Message-ID <86h5trtv72.fsf@linuxsc.com> (permalink)
References <10d5vck$3kufd$1@dont-email.me> <875xc9p674.fsf@example.invalid>



Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> Thiago Adams <thiago.adams@gmail.com> writes:
>
>> speaking on signed x unsigned,
>>
>> u8"a"  in C11 had the type char [N]. Normally char is signed
>
> I would have said "commonly" rather than "normally".  Not an
> important point.
>
>> in C23 it is char8_t [N], where char8_t is unsigned char.
>>
>> when converting code from c11 to c23 we have an error here
>> const char* s = u8"";
>>
>>
>> I generally cast "char*" to "unsigned char*" when handling
>> something with utf8.  I am not using u8""; I just use " " with
>> utf8 encoded source code and assume const char* is utf8.
>
> That raises another issue.
>
> The <uchar.h> header was introduced in C11.  In C11 and C17,
> that header defines char16_t and char32_t.  C23 introduces char8_t.
>
> There doesn't seem to be any way, other than checking the value of
> __STDC_VERSION__, to determine whether char8_t is defined or not.
> There are no *_MIN or *_MAX macros for these types, either in
> <uchar.h> or in <limits.h>.  A test program I just wrote would have
> been a little simpler if I could have used `#ifdef CHAR8_MAX`.
>
> Here's the test program:
>
> #include <stdio.h>
> #include <uchar.h>
>
> #define TYPEOF(x) \
>     (_Generic(x, \
>         char:  "char", \
>         signed char:  "signed char", \
>         unsigned char:  "unsigned char", \
>         short:  "short", \
>         unsigned short:  "unsigned short", \
>         int:  "int", \
>         unsigned int:  "unsigned int", \
>         long:  "long", \
>         unsigned long:  "unsigned long", \
>         long long:  "long long", \
>         unsigned long long:  "unsigned long long"))
>
> int main(void) {
>     printf("__STDC_VERSION__ = %ldL\n", __STDC_VERSION__);
>     printf("u8\"a\"[0] is of type %s\n",
>            TYPEOF(u8"a"[0]));
> #if __STDC_VERSION__ >= 202311L
>     printf("char8_t is %s\n", TYPEOF((char8_t)0));
> #endif
>     printf("char16_t is %s\n", TYPEOF((char16_t)0));
>     printf("char32_t is %s\n", TYPEOF((char32_t)0));
> }
>
> Its output with `gcc -std=c17`:
>
> __STDC_VERSION__ = 201710L
> u8"a"[0] is of type char
> char16_t is unsigned short
> char32_t is unsigned int
>
> Its output with `gcc -std=c23`:
>
> __STDC_VERSION__ = 202311L
> u8"a"[0] is of type unsigned char
> char8_t is unsigned char
> char16_t is unsigned short
> char32_t is unsigned int

Since C23 defines char8_t to be the same type as unsigned char,
it seems better to just define it when it isn't there:

    #include <limits.h>

    #if CHAR_BIT == 8 && __STDC_VERSION__ < 202311
    typedef unsigned char char8_t;
    #endif
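
A rough sketch of how that typedef might be used so the same source
builds as C11/C17 and as C23.  The U8 macro and utf8_count function
are hypothetical names made up for illustration; they are not from
the thread or from any library.  The cast is needed before C23,
where u8"..." still has type char [N], and is harmless in C23.  The
sketch assumes CHAR_BIT == 8; on other pre-C23 targets char8_t stays
undefined and it will not compile.

```c
#include <assert.h>   /* static_assert (a macro in C11/C17, a keyword in C23) */
#include <limits.h>
#include <stddef.h>
#include <uchar.h>

/* Fallback from the post: before C23, <uchar.h> does not define char8_t. */
#if CHAR_BIT == 8 && __STDC_VERSION__ < 202311L
typedef unsigned char char8_t;
#endif

static_assert(sizeof(char8_t) == 1, "char8_t is expected to be one byte");

/* Hypothetical helper macro: gives a const char8_t * for a u8 literal,
   whether the literal has type char [N] (C11/C17) or char8_t [N] (C23). */
#define U8(lit) ((const char8_t *)(u8"" lit))

/* Hypothetical helper: counts code points in a NUL-terminated UTF-8
   string by skipping continuation bytes, which look like 10xxxxxx.
   It assumes well-formed input and does no validation. */
static size_t utf8_count(const char8_t *s)
{
    size_t n = 0;
    for (; *s != 0; ++s) {
        if ((*s & 0xC0u) != 0x80u)
            ++n;
    }
    return n;
}
```

For example, utf8_count(U8("caf\u00e9")) evaluates to 4 under either
standard, although the literal occupies five bytes before the
terminating null.  Working through unsigned char also sidesteps the
sign problem mentioned upthread: bytes >= 0x80 never show up as
negative values.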
