Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.c > #172433

Re: signed/unsigned - what will fail

From Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups comp.lang.c
Subject Re: signed/unsigned - what will fail
Date 2023-08-16 20:12 -0700
Organization A noiseless patient Spider
Message-ID <86bkf65pm4.fsf@linuxsc.com> (permalink)
References (4 earlier) <c7bd3b8e-a9d1-4e38-a4ce-f7697bc2e777n@googlegroups.com> <ubhrot$37a4c$1@dont-email.me> <aedc8f63-5638-4d24-a116-594dc63c8fa4n@googlegroups.com> <ubiahg$394g8$4@dont-email.me> <87zg2qfyy8.fsf@bsb.me.uk>

Show all headers | View raw


Ben Bacarisse <ben.usenet@bsb.me.uk> writes:

> David Brown <david.brown@hesbynett.no> writes:
>
>> On 16/08/2023 09:02, Malcolm McLean wrote:
>>
>>> On Wednesday, 16 August 2023 at 07:53:32 UTC+1, David Brown wrote:
>>>
>>>> It is, of course, a terrible idea to do any kind of arithmetic or
>>>> hold numbers in plain char - use them for 7-bit ASCII characters
>>>> only.
>>>
>>> I pass about UTF-8 as char *s in Baby X. But of course it is
>>> converted to unsigned char for the actual UTF-8 manipulations.
>
> I think too much is often made of this.  What is it that you are
> worried about?  A lot of UTF-8 fiddling is masking values that will
> have been promoted to int.  The masking can be more portable than
> the conversion.

I'm not sure the question is so clear cut.  First the pointer
conversion (from char * to unsigned char *) is absolutely
guaranteed to work, and accesses through the unsigned char * are
guaranteed to work.  The only question is what bits do you get.
Of course if the implementation is two's complement then it
doesn't matter.  But if it isn't, where did the bits come from?
That matters because values that go through <stdio.h> functions
may have been converted to -- or re-interpreted as, it isn't
clear which -- unsigned char.  If a file is being read that was
produced under a different implementation, reading the bytes as
char's rather than unsigned char's could result in incorrect
values.  Or, unfortunately, vice versa.

Speaking for myself normally I would prefer to do UTF8-style
processing through unsigned char pointers.  My reasoning is it's
easier to think about and (probably) more likely to work in the
unusual cases.  Also, now that I think of it, safer, because
unsigned char cannot have trap representations.  Also if there is
some sort of encoding problem I have more confidence in solving
the problem working directly on unsigned char values than in
reasoning through what will happen when working with the signed
values.  Conversely if I were reading code that was doing UTF8
processing and using plain char, I think I would need to work
harder to understand how it works.  So FWIW there is a personal
view.

Back to comp.lang.c | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 06:50 -0700
  Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 06:59 -0700
    Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 07:03 -0700
    Re: signed/unsigned - what will fail Öö Tiib <ootiib@hot.ee> - 2023-08-15 07:44 -0700
      Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 08:01 -0700
        Re: signed/unsigned - what will fail Öö Tiib <ootiib@hot.ee> - 2023-08-15 09:48 -0700
          Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 08:53 +0200
            Re: signed/unsigned - what will fail Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-16 00:02 -0700
              Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 13:05 +0200
                Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-16 22:40 +0100
                Re: signed/unsigned - what will fail Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 20:12 -0700
            Re: signed/unsigned - what will fail Bart <bc@freeuk.com> - 2023-08-16 13:31 +0100
              Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 15:31 +0200
                Re: signed/unsigned - what will fail scott@slp53.sl.home (Scott Lurndal) - 2023-08-16 14:05 +0000
                Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 16:20 +0200
                Re: signed/unsigned - what will fail scott@slp53.sl.home (Scott Lurndal) - 2023-08-16 14:43 +0000
                Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 19:16 +0200
                Re: signed/unsigned - what will fail scott@slp53.sl.home (Scott Lurndal) - 2023-08-16 17:50 +0000
                Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-17 16:05 +0200
                Re: signed/unsigned - what will fail Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-16 07:35 -0700
                Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 19:21 +0200
                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 10:30 -0700
                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 10:33 -0700
                Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-16 22:01 +0100
                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 14:09 -0700
                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 14:29 -0700
                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 14:14 -0700
                Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-17 00:52 +0100
                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 17:07 -0700
                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 12:52 -0700
                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 13:13 -0700
                Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-16 21:52 +0100
                Re: signed/unsigned - what will fail Bart <bc@freeuk.com> - 2023-08-16 18:25 +0100
                Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-17 16:15 +0200
                Re: signed/unsigned - what will fail Phil Carmody <pc+usenet@asdf.org> - 2023-08-17 10:44 +0300
                Re: signed/unsigned - what will fail Spiros Bousbouras <spibou@gmail.com> - 2023-08-17 08:17 +0000
                Re: signed/unsigned - what will fail Spiros Bousbouras <spibou@gmail.com> - 2023-08-17 08:51 +0000
                Re: signed/unsigned - what will fail Phil Carmody <pc+usenet@asdf.org> - 2023-08-17 15:11 +0300
                Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-17 21:20 +0200
              Re: signed/unsigned - what will fail Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-16 13:26 -0700
                Re: signed/unsigned - what will fail Bart <bc@freeuk.com> - 2023-08-16 21:51 +0100
                Re: signed/unsigned - what will fail Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-16 15:35 -0700
            Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 08:14 -0700
              Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 08:34 -0700
                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 08:48 -0700

csiph-web