


Re: multi bytes character - how to make it defined behavior?

From Ben Bacarisse <ben@bsb.me.uk>
Newsgroups comp.lang.c
Subject Re: multi bytes character - how to make it defined behavior?
Date 2024-08-14 01:32 +0100
Organization A noiseless patient Spider
Message-ID <874j7ot04x.fsf@bsb.me.uk>
References <v9frim$3u7qi$1@dont-email.me>



Thiago Adams <thiago.adams@gmail.com> writes:

> static_assert('×' == 50071);

static_assert(U'×' == 215);

works, but then I don't know what you were trying to do.

> GCC -  warning multi byte
> CLANG - error character too large
>
> I think instead of "multi bytes" we need "multi characters" - not
> bytes.
>
> We decode utf8 then we have the character to decide if it is multi char or
> not.

These terms can be confusing and I don't know exactly how you are using
them.  Basically I simply don't know what that second sentence is
saying.

> decoding '×' would consume bytes 195 and 151 the result is the decoded
> Unicode value of 215.

Yes, Unicode 215 is UTF-8 encoded as two bytes with values 195 and 151.
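
As a rough illustration of that arithmetic, here is a minimal sketch of a UTF-8 encoder.  The name utf8_encode is hypothetical (it is not a standard library function), and the static_assert assumes an implementation where U'...' constants are UTF-32, as is the case with gcc and clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: char32_t constants hold UTF-32 code points (true for gcc and clang). */
static_assert(U'\u00D7' == 215, "U+00D7 is code point 215");

/* Hypothetical helper: encode one code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 if cp is not encodable. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

Calling utf8_encode(215, buf) takes the two-byte branch: 0xC0 | (215 >> 6) gives 195 and 0x80 | (215 & 0x3F) gives 151.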

> It is not multi byte : 256*195 + 151 = 50071

If that × is UTF-8 encoded then it might look, to the compiler, like an
old-fashioned multi-character character constant, just as 'ab' does.
Then again, it might not.  gcc and clang take different views on the
matter.

You can get clang to take the same view as gcc by writing

  static_assert('\xC3\x97' == 50071);

instead.  Now both gcc and clang see it for what it is: an old-fashioned
multi-character character constant.
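
The value of such a constant is implementation-defined, so the following is only a sketch of the shift-and-or scheme that produces 50071 here.  The function name multichar_value is hypothetical, and the code assumes 8-bit bytes and constants of at most sizeof(unsigned) characters.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The arithmetic below matches 256*195 + 151 only when bytes are 8 bits wide. */
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

/* Hypothetical helper: fold the bytes of a multi-character constant into one
   value by shifting the running value left one byte and or-ing in the next. */
unsigned multichar_value(const unsigned char *bytes, size_t n)
{
    unsigned v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << CHAR_BIT) | bytes[i];
    return v;
}
```

With the bytes 0xC3 and 0x97 this gives 256*195 + 151 = 50071, and with 'a' and 'b' in ASCII it gives 256*97 + 98 = 24930.  Another implementation is free to choose a different mapping.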

> O the other hand 'ab' is "multi character" resulting

The term for these things used to be "multi-byte character constant" and
they were highly non-portable.  The trouble is that the term "multi-byte
character" now refers to highly portable encodings like UTF-8.  Maybe
that's why gcc seems to have changed its warning from what you gave to:

  warning: multi-character character constant [-Wmultichar]

> 256 * 'a' + 'b' = 256*97+98= 24930
>
> One consequence is that
>
> 'ab' == '𤤰'
>
> But I don't think this is a problem. At least everything is defined.
>

-- 
Ben.


