

Groups > comp.lang.c > #172278 > unrolled thread

signed/unsigned - what will fail

Started by: fir <profesor.fir@gmail.com>
First post: 2023-08-15 06:50 -0700
Last post: 2023-08-16 08:48 -0700
Articles 20 on this page of 45 — 11 participants



Contents

  signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 06:50 -0700
    Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 06:59 -0700
      Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 07:03 -0700
      Re: signed/unsigned - what will fail Öö Tiib <ootiib@hot.ee> - 2023-08-15 07:44 -0700
        Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 08:01 -0700
          Re: signed/unsigned - what will fail Öö Tiib <ootiib@hot.ee> - 2023-08-15 09:48 -0700
            Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 08:53 +0200
              Re: signed/unsigned - what will fail Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-16 00:02 -0700
                Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 13:05 +0200
                  Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-16 22:40 +0100
                    Re: signed/unsigned - what will fail Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 20:12 -0700
              Re: signed/unsigned - what will fail Bart <bc@freeuk.com> - 2023-08-16 13:31 +0100
                Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 15:31 +0200
                  Re: signed/unsigned - what will fail scott@slp53.sl.home (Scott Lurndal) - 2023-08-16 14:05 +0000
                    Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 16:20 +0200
                      Re: signed/unsigned - what will fail scott@slp53.sl.home (Scott Lurndal) - 2023-08-16 14:43 +0000
                        Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 19:16 +0200
                          Re: signed/unsigned - what will fail scott@slp53.sl.home (Scott Lurndal) - 2023-08-16 17:50 +0000
                            Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-17 16:05 +0200
                  Re: signed/unsigned - what will fail Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-16 07:35 -0700
                    Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 19:21 +0200
                      Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 10:30 -0700
                        Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 10:33 -0700
                          Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-16 22:01 +0100
                            Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 14:09 -0700
                              Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 14:29 -0700
                            Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 14:14 -0700
                              Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-17 00:52 +0100
                                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 17:07 -0700
                        Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 12:52 -0700
                          Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 13:13 -0700
                      Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-16 21:52 +0100
                  Re: signed/unsigned - what will fail Bart <bc@freeuk.com> - 2023-08-16 18:25 +0100
                    Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-17 16:15 +0200
                  Re: signed/unsigned - what will fail Phil Carmody <pc+usenet@asdf.org> - 2023-08-17 10:44 +0300
                    Re: signed/unsigned - what will fail Spiros Bousbouras <spibou@gmail.com> - 2023-08-17 08:17 +0000
                      Re: signed/unsigned - what will fail Spiros Bousbouras <spibou@gmail.com> - 2023-08-17 08:51 +0000
                        Re: signed/unsigned - what will fail Phil Carmody <pc+usenet@asdf.org> - 2023-08-17 15:11 +0300
                        Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-17 21:20 +0200
                Re: signed/unsigned - what will fail Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-16 13:26 -0700
                  Re: signed/unsigned - what will fail Bart <bc@freeuk.com> - 2023-08-16 21:51 +0100
                    Re: signed/unsigned - what will fail Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-16 15:35 -0700
              Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 08:14 -0700
                Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 08:34 -0700
                  Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 08:48 -0700



#172278 — signed/unsigned - what will fail

From: fir <profesor.fir@gmail.com>
Date: 2023-08-15 06:50 -0700
Subject: signed/unsigned - what will fail
Message-ID: <21d1ef97-8620-4115-b412-7279e0ef4d6bn@googlegroups.com>

I'm not sure I'm a fan of the signed/unsigned distinction.
For example, there may be cases where you don't care, and in
such a case you treat a given int as both signed and unsigned
(as it kind of is).

I'm not proposing, however, to define three states (signed, unsigned,
and don't-care), or to just remove signed/unsigned, or anything else,
because I don't know; I don't understand it well enough.

But assume I use don't-care (and always write just "int"):
what exactly will fail?



#172280

From: fir <profesor.fir@gmail.com>
Date: 2023-08-15 06:59 -0700
Message-ID: <7ffec8c7-1b4c-4c3c-9342-daed7af19dabn@googlegroups.com>
In reply to: #172278

On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> I'm not sure I'm a fan of the signed/unsigned distinction.
> For example, there may be cases where you don't care, and in
> such a case you treat a given int as both signed and unsigned
> (as it kind of is).
>
> I'm not proposing, however, to define three states (signed, unsigned,
> and don't-care), or to just remove signed/unsigned, or anything else,
> because I don't know; I don't understand it well enough.
>
> But assume I use don't-care (and always write just "int"):
> what exactly will fail?

I'm not sure, but I could probably assume that additions and subtractions
are safe. Say:

char a = 200, b = 200;

int c = a+b; //is it 400?

int d = a*b; //what with that?



#172281

From: fir <profesor.fir@gmail.com>
Date: 2023-08-15 07:03 -0700
Message-ID: <58c59847-321e-4174-9bd0-255a3159d220n@googlegroups.com>
In reply to: #172280

On Tuesday, 15 August 2023 at 15:59:15 UTC+2, fir wrote:
> On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> > I'm not sure I'm a fan of the signed/unsigned distinction.
> > For example, there may be cases where you don't care, and in
> > such a case you treat a given int as both signed and unsigned
> > (as it kind of is).
> >
> > I'm not proposing, however, to define three states (signed, unsigned,
> > and don't-care), or to just remove signed/unsigned, or anything else,
> > because I don't know; I don't understand it well enough.
> >
> > But assume I use don't-care (and always write just "int"):
> > what exactly will fail?
> I'm not sure, but I could probably assume that additions and subtractions
> are safe. Say:
>
> char a = 200, b = 200;
> 
> int c = a+b; //is it 400? 
> 
> int d = a*b; //what with that?

I would also like to put forward the thesis that a programming language
like C should perhaps be designed so that this kind of expression works
as if int and char were of a "don't care" class, that is, possibly giving
proper results for a=b=-20 and a=b=200 (and maybe even a=b=-200), but I
haven't examined it all.



#172287

From: Öö Tiib <ootiib@hot.ee>
Date: 2023-08-15 07:44 -0700
Message-ID: <89f530cb-dd82-46f9-9567-a1f81e55d239n@googlegroups.com>
In reply to: #172280

On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
> On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> > I'm not sure I'm a fan of the signed/unsigned distinction.
> > For example, there may be cases where you don't care, and in
> > such a case you treat a given int as both signed and unsigned
> > (as it kind of is).
> >
> > I'm not proposing, however, to define three states (signed, unsigned,
> > and don't-care), or to just remove signed/unsigned, or anything else,
> > because I don't know; I don't understand it well enough.
> >
> > But assume I use don't-care (and always write just "int"):
> > what exactly will fail?
> I'm not sure, but I could probably assume that additions and subtractions
> are safe. Say:
>
> char a = 200, b = 200;
> 
You contradict your "and always write just "int"" I see clearly char there.

Do you turn off warnings on your compilers? With gcc you get most likely
something like: "warning: overflow in conversion from 'int' to 'char' changes
value from '200' to '-56' [-Woverflow]"

> int c = a+b; //is it 400? 

Can be, but it is more likely -112. 

> 
> int d = a*b; //what with that?

Can be 40000, 3136, -25536, what is your point?
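
A minimal sketch of why 400 is possible at all for the addition: operands narrower than int are promoted to int before the arithmetic, so the sum is only truncated if it is stored back into a narrow type. The function names `sum_as_int` and `sum_as_uchar` are made up for illustration, and the value 144 assumes 8-bit bytes.

```c
#include <assert.h>
#include <limits.h>

/* Both operands are promoted to int before the addition, so the result
   is computed in int and is not cut down to 8 bits. */
int sum_as_int(unsigned char a, unsigned char b)
{
    /* Promotion to int (rather than unsigned int) needs int to be able to
       hold every unsigned char value, which is the case on common platforms. */
    assert(UCHAR_MAX <= INT_MAX);
    return a + b;
}

/* Converting the sum back to unsigned char reduces it modulo UCHAR_MAX + 1. */
unsigned char sum_as_uchar(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```

With a = b = 200, `sum_as_int` gives 400, while `sum_as_uchar` gives 144 on a platform with 8-bit bytes (400 modulo 256).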



#172290

From: fir <profesor.fir@gmail.com>
Date: 2023-08-15 08:01 -0700
Message-ID: <6fcfcd10-82e7-4ecb-8bec-e6292ff73322n@googlegroups.com>
In reply to: #172287

On Tuesday, 15 August 2023 at 16:45:07 UTC+2, Öö Tiib wrote:
> On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
> > On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> > > I'm not sure I'm a fan of the signed/unsigned distinction.
> > > For example, there may be cases where you don't care, and in
> > > such a case you treat a given int as both signed and unsigned
> > > (as it kind of is).
> > >
> > > I'm not proposing, however, to define three states (signed, unsigned,
> > > and don't-care), or to just remove signed/unsigned, or anything else,
> > > because I don't know; I don't understand it well enough.
> > >
> > > But assume I use don't-care (and always write just "int"):
> > > what exactly will fail?
> > I'm not sure, but I could probably assume that additions and subtractions
> > are safe. Say:
> >
> > char a = 200, b = 200;
> >
> You contradict your "and always write just "int"" I see clearly char there. 
> 
> Do you turn off warnings on your compilers? With gcc you get most likely 
> something like: "warning: overflow in conversion from 'int' to 'char' changes 
> value from '200' to '-56' [-Woverflow]"
> > int c = a+b; //is it 400?
> Can be, but it is more likely -112.
> > 
> > int d = a*b; //what with that?
> Can be 40000, 3136, -25536, what is your point?

It can't be two or three of them at once; it is one of them.

There may be many points, but I'm close to thinking that such an
"expression" should generally 'carry' the real value, so that with

int a = -200, b = -200;
int c = a*b;

c should be 40000, and so on.

I think classes like signed and unsigned should perhaps be used more
for storage arrays, and not so much in normal calculations.



#172315

From: Öö Tiib <ootiib@hot.ee>
Date: 2023-08-15 09:48 -0700
Message-ID: <c7bd3b8e-a9d1-4e38-a4ce-f7697bc2e777n@googlegroups.com>
In reply to: #172290

On Tuesday, 15 August 2023 at 18:01:23 UTC+3, fir wrote:
> On Tuesday, 15 August 2023 at 16:45:07 UTC+2, Öö Tiib wrote:
> > On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
> > > On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> > > > I'm not sure I'm a fan of the signed/unsigned distinction.
> > > > For example, there may be cases where you don't care, and in
> > > > such a case you treat a given int as both signed and unsigned
> > > > (as it kind of is).
> > > >
> > > > I'm not proposing, however, to define three states (signed, unsigned,
> > > > and don't-care), or to just remove signed/unsigned, or anything else,
> > > > because I don't know; I don't understand it well enough.
> > > >
> > > > But assume I use don't-care (and always write just "int"):
> > > > what exactly will fail?
> > > I'm not sure, but I could probably assume that additions and subtractions
> > > are safe. Say:
> > >
> > > char a = 200, b = 200;
> > > 
> > You contradict your "and always write just "int"" I see clearly char there. 
> > 
> > Do you turn off warnings on your compilers? With gcc you get most likely 
> > something like: "warning: overflow in conversion from 'int' to 'char' changes 
> > value from '200' to '-56' [-Woverflow]" 
> > > int c = a+b; //is it 400? 
> > Can be, but it is more likely -112. 
> > > 
> > > int d = a*b; //what with that? 
> > Can be 40000, 3136, -25536, what is your point?
> 
> It can't be two or three of them at once; it is one of them.
> 
It may cause signed integer overflow on an 8-bit or 16-bit embedded system,
and that is undefined behavior unless something defines it for said system.

>
> There may be many points, but I'm close to thinking that such an
> "expression" should generally 'carry' the real value, so that with
>
> int a = -200, b = -200;
> int c = a*b;
>
> c should be 40000, and so on.
>
> I think classes like signed and unsigned should perhaps be used more
> for storage arrays, and not so much in normal calculations.

How so? If int on an 8-bit microcontroller has 16 bits of storage with
a value range of -32768 to 32767, then there is no way it can be 40000.
It is simply impossible for c to have that value.
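
The range limit can be made concrete with a small sketch. It assumes the platform provides `int16_t` and `int32_t`; the helper name `mul_fits_int16` is hypothetical. The product is computed in a wider type and only stored if it fits the 16-bit range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multiplies two 16-bit values in a 32-bit type, where the product cannot
   overflow, and reports whether it fits back into 16 bits.
   On success the product is stored in *out. */
bool mul_fits_int16(int16_t a, int16_t b, int16_t *out)
{
    int32_t wide = (int32_t)a * (int32_t)b;  /* magnitude is at most 2^30 */

    if (wide < INT16_MIN || wide > INT16_MAX)
        return false;                        /* e.g. -200 * -200 = 40000 */

    *out = (int16_t)wide;
    return true;
}
```

For a = b = -200 the function returns false, because 40000 is above 32767; for a = b = -100 it stores 10000.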



#172352

From: David Brown <david.brown@hesbynett.no>
Date: 2023-08-16 08:53 +0200
Message-ID: <ubhrot$37a4c$1@dont-email.me>
In reply to: #172315

On 15/08/2023 18:48, Öö Tiib wrote:
> On Tuesday, 15 August 2023 at 18:01:23 UTC+3, fir wrote:
>> On Tuesday, 15 August 2023 at 16:45:07 UTC+2, Öö Tiib wrote:
>>> On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
>>>> On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
>>>>> I'm not sure I'm a fan of the signed/unsigned distinction.
>>>>> For example, there may be cases where you don't care, and in
>>>>> such a case you treat a given int as both signed and unsigned
>>>>> (as it kind of is).
>>>>>
>>>>> I'm not proposing, however, to define three states (signed, unsigned,
>>>>> and don't-care), or to just remove signed/unsigned, or anything else,
>>>>> because I don't know; I don't understand it well enough.
>>>>>
>>>>> But assume I use don't-care (and always write just "int"):
>>>>> what exactly will fail?
>>>> I'm not sure, but I could probably assume that additions and subtractions
>>>> are safe. Say:
>>>>
>>>> char a = 200, b = 200;
>>>>
>>> You contradict your "and always write just "int"" I see clearly char there.
>>>
>>> Do you turn off warnings on your compilers? With gcc you get most likely
>>> something like: "warning: overflow in conversion from 'int' to 'char' changes
>>> value from '200' to '-56' [-Woverflow]"
>>>> int c = a+b; //is it 400?
>>> Can be, but it is more likely -112.

Just to be clear about this in case anyone wonders why you wrote that - 
plain "char" can be signed or unsigned, depending on the platform. 
Older platforms (including x86) tend to have plain char as signed, while 
more modern ABIs are usually unsigned.

It is, of course, a terrible idea to do any kind of arithmetic or hold 
numbers in plain char - use them for 7-bit ASCII characters only.  For 
anything with numbers, use "signed char", "unsigned char", or (better, 
IMHO) appropriate <stdint.h> types when you need a small integer type.

So after "char a = 200;", the value in "a" depends on the target's ABI. 
(Enabling warnings on the compiler is always a good idea!)
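
A minimal sketch of how a program can find out which case applies on its own target, using only `<limits.h>`. The function names are made up for illustration; the cast in `value_after_storing_200` stands in for the implicit conversion in "char a = 200;", and its result is implementation-defined when plain char is signed.

```c
#include <limits.h>
#include <stdbool.h>

/* CHAR_MIN is 0 when plain char is unsigned and negative when it is signed. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Stores 200 in a plain char and reads it back as an int.
   With unsigned plain char the value is 200; with signed 8-bit plain char
   the conversion is implementation-defined (commonly -56). */
int value_after_storing_200(void)
{
    char a = (char)200;
    return a;
}
```

Printing both results on a given compiler shows which of the cases discussed here that target falls into.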

>>>>
>>>> int d = a*b; //what with that?
>>> Can be 40000, 3136, -25536, what is your point?
>>
>> It can't be two or three of them at once; it is one of them.

If plain char is signed on the target, then "d" will always be 3136.

If plain char is unsigned and int is bigger than 16 bit (generally 32 
bit) on the target, then "d" will always be 40000.

If plain char is unsigned and int is 16-bit, then there is an arithmetic 
overflow in the signed int multiplication - that's undefined behaviour, 
and there's no limit to what can go wrong, including the compiler 
generating code that treats the result as 2 or 3 of these values. 
Often, it will appear to be -25536 - but that is not guaranteed or 
reliable in any way (without specific compiler flags or documentation).

(I know you, Öö, know this - but fir may be less sure.)
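
The two well-defined cases can be reproduced on any platform by spelling out the signedness instead of relying on plain char. This is only a sketch: the function names are hypothetical, -56 is the value a signed 8-bit two's complement char holds for the bit pattern of 200, and the operands are widened to long so that the multiplication itself cannot overflow even where int is 16 bits.

```c
/* Signed interpretation of the bit pattern: -56 * -56. */
long mul_signed_chars(signed char a, signed char b)
{
    return (long)a * (long)b;
}

/* Unsigned interpretation of the bit pattern: 200 * 200. */
long mul_unsigned_chars(unsigned char a, unsigned char b)
{
    return (long)a * (long)b;
}
```

`mul_signed_chars(-56, -56)` yields 3136 and `mul_unsigned_chars(200, 200)` yields 40000. The third case, a 16-bit int product that overflows, has no value that a portable program can rely on.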

>>
> It may cause signed integer overflow on an 8-bit or 16-bit embedded system,
> and that is undefined behavior unless something defines it for said system.
>
>>
>> There may be many points, but I'm close to thinking that such an
>> "expression" should generally 'carry' the real value, so that with
>>
>> int a = -200, b = -200;
>> int c = a*b;
>>
>> c should be 40000, and so on.
>>
>> I think classes like signed and unsigned should perhaps be used more
>> for storage arrays, and not so much in normal calculations.
>
> How so? If int on an 8-bit microcontroller has 16 bits of storage with
> a value range of -32768 to 32767, then there is no way it can be 40000.
> It is simply impossible for c to have that value.



#172353

From: Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date: 2023-08-16 00:02 -0700
Message-ID: <aedc8f63-5638-4d24-a116-594dc63c8fa4n@googlegroups.com>
In reply to: #172352

On Wednesday, 16 August 2023 at 07:53:32 UTC+1, David Brown wrote:
> On 15/08/2023 18:48, Öö Tiib wrote: 
> > On Tuesday, 15 August 2023 at 18:01:23 UTC+3, fir wrote: 
> >> On Tuesday, 15 August 2023 at 16:45:07 UTC+2, Öö Tiib wrote:
> >>> On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
> >>>> On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> >>>>> I'm not sure I'm a fan of the signed/unsigned distinction.
> >>>>> For example, there may be cases where you don't care, and in
> >>>>> such a case you treat a given int as both signed and unsigned
> >>>>> (as it kind of is).
> >>>>>
> >>>>> I'm not proposing, however, to define three states (signed, unsigned,
> >>>>> and don't-care), or to just remove signed/unsigned, or anything else,
> >>>>> because I don't know; I don't understand it well enough.
> >>>>>
> >>>>> But assume I use don't-care (and always write just "int"):
> >>>>> what exactly will fail?
> >>>> I'm not sure, but I could probably assume that additions and subtractions
> >>>> are safe. Say:
> >>>>
> >>>> char a = 200, b = 200;
> >>>> 
> >>> You contradict your "and always write just "int"" I see clearly char there. 
> >>> 
> >>> Do you turn off warnings on your compilers? With gcc you get most likely 
> >>> something like: "warning: overflow in conversion from 'int' to 'char' changes 
> >>> value from '200' to '-56' [-Woverflow]" 
> >>>> int c = a+b; //is it 400? 
> >>> Can be, but it is more likely -112.
> Just to be clear about this in case anyone wonders why you wrote that - 
> plain "char" can be signed or unsigned, depending on the platform. 
> Older platforms (including x86) tend to have plain char as signed, while 
> more modern ABIs are usually unsigned. 
> 
> It is, of course, a terrible idea to do any kind of arithmetic or hold 
> numbers in plain char - use them for 7-bit ASCII characters only. 
>
I pass about UTF-8 as char *s in Baby X. But of course it is converted to unsigned
char for the actual UTF-8 manipulations. 



#172365

From: David Brown <david.brown@hesbynett.no>
Date: 2023-08-16 13:05 +0200
Message-ID: <ubiahg$394g8$4@dont-email.me>
In reply to: #172353

On 16/08/2023 09:02, Malcolm McLean wrote:
> On Wednesday, 16 August 2023 at 07:53:32 UTC+1, David Brown wrote:
>> On 15/08/2023 18:48, Öö Tiib wrote:
>>> On Tuesday, 15 August 2023 at 18:01:23 UTC+3, fir wrote:
>>>> On Tuesday, 15 August 2023 at 16:45:07 UTC+2, Öö Tiib wrote:
>>>>> On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
>>>>>> On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
>>>>>>> I'm not sure I'm a fan of the signed/unsigned distinction.
>>>>>>> For example, there may be cases where you don't care, and in
>>>>>>> such a case you treat a given int as both signed and unsigned
>>>>>>> (as it kind of is).
>>>>>>>
>>>>>>> I'm not proposing, however, to define three states (signed, unsigned,
>>>>>>> and don't-care), or to just remove signed/unsigned, or anything else,
>>>>>>> because I don't know; I don't understand it well enough.
>>>>>>>
>>>>>>> But assume I use don't-care (and always write just "int"):
>>>>>>> what exactly will fail?
>>>>>> I'm not sure, but I could probably assume that additions and subtractions
>>>>>> are safe. Say:
>>>>>>
>>>>>> char a = 200, b = 200;
>>>>>>
>>>>> You contradict your "and always write just "int"" I see clearly char there.
>>>>>
>>>>> Do you turn off warnings on your compilers? With gcc you get most likely
>>>>> something like: "warning: overflow in conversion from 'int' to 'char' changes
>>>>> value from '200' to '-56' [-Woverflow]"
>>>>>> int c = a+b; //is it 400?
>>>>> Can be, but it is more likely -112.
>> Just to be clear about this in case anyone wonders why you wrote that -
>> plain "char" can be signed or unsigned, depending on the platform.
>> Older platforms (including x86) tend to have plain char as signed, while
>> more modern ABIs are usually unsigned.
>>
>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>> numbers in plain char - use them for 7-bit ASCII characters only.
>>
> I pass about UTF-8 as char *s in Baby X. But of course it is converted to unsigned
> char for the actual UTF-8 manipulations.

Conversion from plain char to unsigned char will always be safe and 
well-defined, regardless of the signedness of plain char.  And it is 
always safe to access plain char data through an unsigned char pointer. 
Given that string literals are treated as char array (unless you have an 
encoding prefix for wide characters of some kind), using plain char 
seems entirely reasonable.  But "const char *" would be vastly better.

(The conversion from unsigned char to plain char is implementation 
dependent, but I know of no platforms where it will not work in the 
simple and obvious way.)
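
As a sketch of the pattern described here, the following takes a string as `const char *` and inspects its bytes through a `const unsigned char *`. The function name `count_non_ascii` is made up for illustration; it simply counts bytes with the top bit set, which in UTF-8 text are the bytes belonging to multi-byte sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the bytes of a NUL-terminated string whose value is 0x80 or above.
   Reading through unsigned char makes the comparison independent of
   whether plain char is signed on the target. */
size_t count_non_ascii(const char *s)
{
    assert(s != NULL);

    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    for (; *p != '\0'; ++p) {
        if (*p >= 0x80)
            ++n;
    }
    return n;
}
```

For example, the UTF-8 encoding of "é" is the two bytes 0xC3 0xA9, so a string containing only that character gives a count of 2.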




#172417

From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Date: 2023-08-16 22:40 +0100
Message-ID: <87zg2qfyy8.fsf@bsb.me.uk>
In reply to: #172365

David Brown <david.brown@hesbynett.no> writes:

> On 16/08/2023 09:02, Malcolm McLean wrote:
>> On Wednesday, 16 August 2023 at 07:53:32 UTC+1, David Brown wrote:

>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>> numbers in plain char - use them for 7-bit ASCII characters only.
>>>
>> I pass about UTF-8 as char *s in Baby X. But of course it is
>> converted to unsigned char for the actual UTF-8 manipulations.

I think too much is often made of this.  What is it that you are worried
about?  A lot of UTF-8 fiddling is masking values that will have been
promoted to int.  The masking can be more portable than the conversion.

> Conversion from plain char to unsigned char will always be safe and
> well-defined, regardless of the signedness of plain char.

It's not UB in the dreaded C sense, but it's not defined in a way that
will help you for UTF-8.  Let's say you want to see if *cp (a plain
pointer to signed char) points to a UTF-8 continuation character.  I'd
write

  (*cp & 0xC0) == 0x80

and the value of six bits represented is *cp & 0x3F, i.e. 2.

Will this fail because char is signed?  No.  Will it fail if char is
signed and the implementation uses sign-and-magnitude?  I don't think so
because UTF-8 mandates the bit pattern not the value.  (If anyone has
ever seen UTF-8 on a non-two's complement machine with signed char, do
let me know!)
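
The masking test can be wrapped up as a small sketch that works directly on plain char. The function names are hypothetical, and the results below assume a two's complement representation, which is what current implementations use. The byte 0x82 is used as the example because its low six bits are 2.

```c
#include <stdbool.h>

/* True if c has the bit pattern 10xxxxxx of a UTF-8 continuation byte.
   c is promoted to int before masking; on two's complement the low
   eight bits of the promoted value match the stored byte. */
bool is_utf8_continuation(char c)
{
    return (c & 0xC0) == 0x80;
}

/* The six payload bits carried by a continuation byte. */
int utf8_payload(char c)
{
    return c & 0x3F;
}
```

With the byte 0x82, `is_utf8_continuation` is true and `utf8_payload` is 2, whether plain char is signed or unsigned on the target.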

> And it is always
> safe to access plain char data through an unsigned char pointer.

But what happens if I convert to unsigned char:

  const unsigned char *ucp = (const unsigned char *)cp;

Now what is *cp & 0x3F on a sign-and-magnitude machine?  It's 62 not 2.

Admittedly, anything but two's complement is now relegated to C's
history even if the machines still exist, so maybe we should not care
anymore.

-- 
Ben.



#172433

From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Date: 2023-08-16 20:12 -0700
Message-ID: <86bkf65pm4.fsf@linuxsc.com>
In reply to: #172417

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:

> David Brown <david.brown@hesbynett.no> writes:
>
>> On 16/08/2023 09:02, Malcolm McLean wrote:
>>
>>> On Wednesday, 16 August 2023 at 07:53:32 UTC+1, David Brown wrote:
>>>
>>>> It is, of course, a terrible idea to do any kind of arithmetic or
>>>> hold numbers in plain char - use them for 7-bit ASCII characters
>>>> only.
>>>
>>> I pass about UTF-8 as char *s in Baby X. But of course it is
>>> converted to unsigned char for the actual UTF-8 manipulations.
>
> I think too much is often made of this.  What is it that you are
> worried about?  A lot of UTF-8 fiddling is masking values that will
> have been promoted to int.  The masking can be more portable than
> the conversion.

I'm not sure the question is so clear cut.  First the pointer
conversion (from char * to unsigned char *) is absolutely
guaranteed to work, and accesses through the unsigned char * are
guaranteed to work.  The only question is what bits do you get.
Of course if the implementation is two's complement then it
doesn't matter.  But if it isn't, where did the bits come from?
That matters because values that go through <stdio.h> functions
may have been converted to -- or re-interpreted as, it isn't
clear which -- unsigned char.  If a file is being read that was
produced under a different implementation, reading the bytes as
char's rather than unsigned char's could result in incorrect
values.  Or, unfortunately, vice versa.

Speaking for myself normally I would prefer to do UTF8-style
processing through unsigned char pointers.  My reasoning is it's
easier to think about and (probably) more likely to work in the
unusual cases.  Also, now that I think of it, safer, because
unsigned char cannot have trap representations.  Also if there is
some sort of encoding problem I have more confidence in solving
the problem working directly on unsigned char values than in
reasoning through what will happen when working with the signed
values.  Conversely if I were reading code that was doing UTF8
processing and using plain char, I think I would need to work
harder to understand how it works.  So FWIW there is a personal
view.
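
To illustrate the unsigned char style of UTF-8 processing, here is a minimal sketch that decodes a single two-byte sequence. The function name `decode_utf8_2byte` and the choice of -1 as the error value are assumptions made for this example, and it deliberately handles only the two-byte form.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes the two-byte UTF-8 sequence at the start of the NUL-terminated
   string s. Returns the code point (0x80..0x7FF), or -1 if s does not
   start with a valid two-byte sequence. */
long decode_utf8_2byte(const char *s)
{
    assert(s != NULL);

    const unsigned char *u = (const unsigned char *)s;

    if ((u[0] & 0xE0) != 0xC0)   /* lead byte must be 110xxxxx */
        return -1;
    if ((u[1] & 0xC0) != 0x80)   /* next byte must be 10xxxxxx; a NUL fails here */
        return -1;

    long cp = ((long)(u[0] & 0x1F) << 6) | (long)(u[1] & 0x3F);

    return cp >= 0x80 ? cp : -1; /* reject overlong encodings */
}
```

All values involved are non-negative, so there is no sign extension to reason about. The bytes 0xC3 0xA9 decode to 0xE9 (U+00E9).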



#172372

From: Bart <bc@freeuk.com>
Date: 2023-08-16 13:31 +0100
Message-ID: <ubifj9$39v7p$1@dont-email.me>
In reply to: #172352

On 16/08/2023 07:53, David Brown wrote:

> Just to be clear about this in case anyone wonders why you wrote that - 
> plain "char" can be signed or unsigned, depending on the platform. Older 
> platforms (including x86) tend to have plain char as signed, while more 
> modern ABIs are usually unsigned.
> 
> It is, of course, a terrible idea to do any kind of arithmetic or hold 
> numbers in plain char - use them for 7-bit ASCII characters only.  For 
> anything with numbers, use "signed char", "unsigned char", or (better, 
> IMHO) appropriate <stdint.h> types when you need a small integer type.

Unfortunately, in C, string literals have type char*, and char* strings 
are encountered everywhere, where they can store ASCII, extended ASCII 
of various kinds, or UTF8.

But you frequently need to manipulate individual character codes.



#172375

From: David Brown <david.brown@hesbynett.no>
Date: 2023-08-16 15:31 +0200
Message-ID: <ubij3o$3afvd$1@dont-email.me>
In reply to: #172372

On 16/08/2023 14:31, Bart wrote:
> On 16/08/2023 07:53, David Brown wrote:
> 
>> Just to be clear about this in case anyone wonders why you wrote that 
>> - plain "char" can be signed or unsigned, depending on the platform. 
>> Older platforms (including x86) tend to have plain char as signed, 
>> while more modern ABIs are usually unsigned.
>>
>> It is, of course, a terrible idea to do any kind of arithmetic or hold 
>> numbers in plain char - use them for 7-bit ASCII characters only.  For 
>> anything with numbers, use "signed char", "unsigned char", or (better, 
>> IMHO) appropriate <stdint.h> types when you need a small integer type.
> 
> Unfortunately, in C, string literals have type char*, and char* strings 
> are encountered everywhere, where they can store ASCII, extended ASCII 
> of various kinds, or UTF8.
> 
> But you frequently need to manipulate individual character codes.

It's quite rare to have to manipulate characters individually, other 
than perhaps comparing them to character constants.  Certainly general 
arithmetic is very rare on characters, with the exception of something 
like UTF8 code point extraction.  Do you have examples where people 
might reasonably want to perform arithmetic on plain char, that you 
think occur frequently?

I think it is entirely appropriate to use "char" for characters (and 
const char* for immutable strings).  But I don't think it is appropriate 
for any kind of arithmetic.



#172376

From: scott@slp53.sl.home (Scott Lurndal)
Date: 2023-08-16 14:05 +0000
Message-ID: <HI4DM.141794$ftCb.112835@fx34.iad>
In reply to: #172375

David Brown <david.brown@hesbynett.no> writes:
>On 16/08/2023 14:31, Bart wrote:
>> On 16/08/2023 07:53, David Brown wrote:
>> 
>>> Just to be clear about this in case anyone wonders why you wrote that 
>>> - plain "char" can be signed or unsigned, depending on the platform. 
>>> Older platforms (including x86) tend to have plain char as signed, 
>>> while more modern ABIs are usually unsigned.
>>>
>>> It is, of course, a terrible idea to do any kind of arithmetic or hold 
>>> numbers in plain char - use them for 7-bit ASCII characters only.  For 
>>> anything with numbers, use "signed char", "unsigned char", or (better, 
>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>> 
>> Unfortunately, in C, string literals have type char*, and char* strings 
>> are encountered everywhere, where they can store ASCII, extended ASCII 
>> of various kinds, or UTF8.
>> 
>> But you frequently need to manipulate individual character codes.
>
>It's quite rare to have to manipulate characters individually, other 
>than perhaps comparing them to character constants.  Certainly general 
>arithmetic is very rare on characters, with the exception of something 
>like UTF8 code point extraction. 

toupper() and tolower() were often implemented using arithmetic
on characters in the C locale, but the point stands.



#172378

From: David Brown <david.brown@hesbynett.no>
Date: 2023-08-16 16:20 +0200
Message-ID: <ubilva$3as8c$1@dont-email.me>
In reply to: #172376

On 16/08/2023 16:05, Scott Lurndal wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 16/08/2023 14:31, Bart wrote:
>>> On 16/08/2023 07:53, David Brown wrote:
>>>
>>>> Just to be clear about this in case anyone wonders why you wrote that
>>>> - plain "char" can be signed or unsigned, depending on the platform.
>>>> Older platforms (including x86) tend to have plain char as signed,
>>>> while more modern ABIs are usually unsigned.
>>>>
>>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>>> numbers in plain char - use them for 7-bit ASCII characters only.  For
>>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>>
>>> Unfortunately, in C, string literals have type char*, and char* strings
>>> are encountered everywhere, where they can store ASCII, extended ASCII
>>> of various kinds, or UTF8.
>>>
>>> But you frequently need to manipulate individual character codes.
>>
>> It's quite rare to have to manipulate characters individually, other
>> than perhaps comparing them to character constants.  Certainly general
>> arithmetic is very rare on characters, with the exception of something
>> like UTF8 code point extraction.
> 
> toupper() and tolower() were often implemented using arithmetic
> on characters in the C locale, but the point stands.
> 

Surely that would be logical instructions (x &= ~0x20, or x |= 0x20) 
rather than arithmetic instructions?  But I wouldn't do logical 
operations on plain char either!  (The C library implementation can, of 
course, do whatever it likes.)
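
A minimal sketch of the masking idea, assuming ASCII (where upper and lower case letters differ only in bit 0x20). The helper names are hypothetical, and the range checks are an addition so that non-letters pass through unchanged:

```c
#include <stdbool.h>

/* Hypothetical ASCII-only helpers.  They take unsigned char, not plain
   char, so the signedness of plain char never enters the picture. */
static bool ascii_is_lower(unsigned char c) { return c >= 'a' && c <= 'z'; }
static bool ascii_is_upper(unsigned char c) { return c >= 'A' && c <= 'Z'; }

/* Clear bit 0x20 to get the upper case letter (ASCII assumption). */
static unsigned char ascii_to_upper(unsigned char c)
{
    return ascii_is_lower(c) ? (unsigned char)(c & ~0x20u) : c;
}

/* Set bit 0x20 to get the lower case letter (ASCII assumption). */
static unsigned char ascii_to_lower(unsigned char c)
{
    return ascii_is_upper(c) ? (unsigned char)(c | 0x20u) : c;
}
```

A caller holding a plain char would convert first, for example ascii_to_upper((unsigned char)s[i]).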



#172380

From: scott@slp53.sl.home (Scott Lurndal)
Date: 2023-08-16 14:43 +0000
Message-ID: <%f5DM.670467$AsA.656909@fx18.iad>
In reply to: #172378
David Brown <david.brown@hesbynett.no> writes:
>On 16/08/2023 16:05, Scott Lurndal wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>>> On 16/08/2023 14:31, Bart wrote:
>>>> On 16/08/2023 07:53, David Brown wrote:
>>>>
>>>>> Just to be clear about this in case anyone wonders why you wrote that
>>>>> - plain "char" can be signed or unsigned, depending on the platform.
>>>>> Older platforms (including x86) tend to have plain char as signed,
>>>>> while more modern ABIs are usually unsigned.
>>>>>
>>>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>>>> numbers in plain char - use them for 7-bit ASCII characters only.  For
>>>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>>>
>>>> Unfortunately, in C, string literals have type char*, and char* strings
>>>> are encountered everywhere, where they can store ASCII, extended ASCII
>>>> of various kinds, or UTF8.
>>>>
>>>> But you frequently need to manipulate individual character codes.
>>>
>>> It's quite rare to have to manipulate characters individually, other
>>> than perhaps comparing them to character constants.  Certainly general
>>> arithmetic is very rare on characters, with the exception of something
>>> like UTF8 code point extraction.
>> 
>> toupper() and tolower() were often implemented using arithmetic
>> on characters in the C locale, but the point stands.
>> 
>
>Surely that would be logical instructions (x &= ~0x20, or x |= 0x20) 
>rather than arithmetic instructions?  But I wouldn't do logical 
>operations on plain char either!  (The C library implementation can, of 
>course, do whatever it likes.)

Unix v7:
#define isalpha(c)      ((_ctype_+1)[c]&(_U|_L))
#define isupper(c)      ((_ctype_+1)[c]&_U)
#define islower(c)      ((_ctype_+1)[c]&_L)
#define isdigit(c)      ((_ctype_+1)[c]&_N)
#define isxdigit(c)     ((_ctype_+1)[c]&(_N|_X))
#define isspace(c)      ((_ctype_+1)[c]&_S)
#define ispunct(c)      ((_ctype_+1)[c]&_P)
#define isalnum(c)      ((_ctype_+1)[c]&(_U|_L|_N))
#define isprint(c)      ((_ctype_+1)[c]&(_P|_U|_L|_N))
#define iscntrl(c)      ((_ctype_+1)[c]&_C)
#define isascii(c)      ((unsigned)(c)<=0177)
#define toupper(c)      ((c)-'a'+'A')
#define tolower(c)      ((c)-'A'+'a')
#define toascii(c)      ((c)&0177)


Unixware 2.01:

#define isascii(c)      (!((c) & ~0177))
#define toascii(c)      ((c) & 0177)

#if defined(_XOPEN_SOURCE) || (__STDC__ == 0 \
        && !defined(_POSIX_SOURCE) && !defined(_POSIX_C_SOURCE))
#define _toupper(c)     ((__ctype + 258)[c])
#define _tolower(c)     ((__ctype + 258)[c])
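
For readers unfamiliar with the table-lookup style, here is a rough sketch of the same idea. It is not the historical code: the names (my_ctype, MY_U, MY_L, MY_N, my_isalpha) are hypothetical, the table is filled assuming ASCII, and EOF is assumed to be -1. The extra leading slot plays the role of the "+1" in the macros above, so that EOF is a valid index.

```c
#include <limits.h>

/* Hypothetical flag bits, loosely modelled on _U, _L and _N above. */
enum { MY_U = 1, MY_L = 2, MY_N = 4 };

/* One slot per unsigned char value, plus a leading slot for EOF (-1). */
static unsigned char my_ctype[UCHAR_MAX + 2];
static int my_ctype_ready;

static void my_ctype_init(void)
{
    /* Assumes ASCII, where each letter range is contiguous. */
    for (int c = 'A'; c <= 'Z'; c++) my_ctype[c + 1] |= MY_U;
    for (int c = 'a'; c <= 'z'; c++) my_ctype[c + 1] |= MY_L;
    for (int c = '0'; c <= '9'; c++) my_ctype[c + 1] |= MY_N;
    my_ctype_ready = 1;
}

/* c must be -1 (assumed EOF) or a value representable as unsigned char.
   Any other negative value would index before the start of the table. */
static int my_isalpha(int c)
{
    if (!my_ctype_ready)
        my_ctype_init();
    return (my_ctype + 1)[c] & (MY_U | MY_L);
}
```

This is where the signedness of plain char bites: on a platform where plain char is signed, a byte such as 0xE9 becomes a negative index, so the caller has to write my_isalpha((unsigned char)s[i]).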

>



#172391

From: David Brown <david.brown@hesbynett.no>
Date: 2023-08-16 19:16 +0200
Message-ID: <ubj09u$3ckl4$1@dont-email.me>
In reply to: #172380
On 16/08/2023 16:43, Scott Lurndal wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 16/08/2023 16:05, Scott Lurndal wrote:
>>> David Brown <david.brown@hesbynett.no> writes:
>>>> On 16/08/2023 14:31, Bart wrote:
>>>>> On 16/08/2023 07:53, David Brown wrote:
>>>>>
>>>>>> Just to be clear about this in case anyone wonders why you wrote that
>>>>>> - plain "char" can be signed or unsigned, depending on the platform.
>>>>>> Older platforms (including x86) tend to have plain char as signed,
>>>>>> while more modern ABIs are usually unsigned.
>>>>>>
>>>>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>>>>> numbers in plain char - use them for 7-bit ASCII characters only.  For
>>>>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>>>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>>>>
>>>>> Unfortunately, in C, string literals have type char*, and char* strings
>>>>> are encountered everywhere, where they can store ASCII, extended ASCII
>>>>> of various kinds, or UTF8.
>>>>>
>>>>> But you frequently need to manipulate individual character codes.
>>>>
>>>> It's quite rare to have to manipulate characters individually, other
>>>> than perhaps comparing them to character constants.  Certainly general
>>>> arithmetic is very rare on characters, with the exception of something
>>>> like UTF8 code point extraction.
>>>
>>> toupper() and tolower() were often implemented using arithmetic
>>> on characters in the C locale, but the point stands.
>>>
>>
>> Surely that would be logical instructions (x &= ~0x20, or x |= 0x20)
>> rather than arithmetic instructions?  But I wouldn't do logical
>> operations on plain char either!  (The C library implementation can, of
>> course, do whatever it likes.)
> 
> Unix v7:
> #define isalpha(c)      ((_ctype_+1)[c]&(_U|_L))
> #define isupper(c)      ((_ctype_+1)[c]&_U)
> #define islower(c)      ((_ctype_+1)[c]&_L)
> #define isdigit(c)      ((_ctype_+1)[c]&_N)
> #define isxdigit(c)     ((_ctype_+1)[c]&(_N|_X))
> #define isspace(c)      ((_ctype_+1)[c]&_S)
> #define ispunct(c)      ((_ctype_+1)[c]&_P)
> #define isalnum(c)      ((_ctype_+1)[c]&(_U|_L|_N))
> #define isprint(c)      ((_ctype_+1)[c]&(_P|_U|_L|_N))
> #define iscntrl(c)      ((_ctype_+1)[c]&_C)
> #define isascii(c)      ((unsigned)(c)<=0177)
> #define toupper(c)      ((c)-'a'+'A')
> #define tolower(c)      ((c)-'A'+'a')
> #define toascii(c)      ((c)&0177)
> 

Those "toupper" and "tolower" macros are going to give the wrong result 
for any argument that is not a lower case or upper case (respectively) 
ASCII letter.  The C standards require the macros to return the argument 
unchanged except when the character can be made upper-case or 
lower-case.  Maybe these macros were from pre-C90 C, but they are pretty 
useless as they stand.  (C99 says they are functions, not macros, but 
maybe that too has changed.  The functions take an "int" argument, so 
there is no attempt at arithmetic on (promoted) chars.)
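
To make that concrete, here is a small sketch that copies the two V7 definitions under different names (v7_toupper and v7_tolower are renamed here only to avoid clashing with <ctype.h>) and compares them with the standard functions. The helper v7_toupper_differs is hypothetical.

```c
#include <ctype.h>

/* The V7 definitions quoted above, renamed to avoid clashing with <ctype.h>. */
#define v7_toupper(c)  ((c)-'a'+'A')
#define v7_tolower(c)  ((c)-'A'+'a')

/* Hypothetical helper: nonzero when the unguarded V7 macro disagrees with
   the standard toupper() for c (an unsigned char value, "C" locale). */
static int v7_toupper_differs(int c)
{
    return v7_toupper(c) != toupper(c);
}
```

In ASCII, v7_toupper('A') is 65 - 97 + 65 = 33, which is '!', and v7_toupper('1') is 49 - 32 = 17, a control character. The standard toupper() returns both arguments unchanged.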

> 
> Unixware 2.01:
> 
> #define isascii(c)      (!((c) & ~0177))
> #define toascii(c)      ((c) & 0177)
> 
> #if defined(_XOPEN_SOURCE) || (__STDC__ == 0 \
>          && !defined(_POSIX_SOURCE) && !defined(_POSIX_C_SOURCE))
> #define _toupper(c)     ((__ctype + 258)[c])
> #define _tolower(c)     ((__ctype + 258)[c])
> 
>>



#172401

From: scott@slp53.sl.home (Scott Lurndal)
Date: 2023-08-16 17:50 +0000
Message-ID: <j%7DM.428704$U3w1.122961@fx09.iad>
In reply to: #172391
David Brown <david.brown@hesbynett.no> writes:
>On 16/08/2023 16:43, Scott Lurndal wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>>> On 16/08/2023 16:05, Scott Lurndal wrote:
>>>> David Brown <david.brown@hesbynett.no> writes:
>>>>> On 16/08/2023 14:31, Bart wrote:
>>>>>> On 16/08/2023 07:53, David Brown wrote:
>>>>>>
>>>>>>> Just to be clear about this in case anyone wonders why you wrote that
>>>>>>> - plain "char" can be signed or unsigned, depending on the platform.
>>>>>>> Older platforms (including x86) tend to have plain char as signed,
>>>>>>> while more modern ABIs are usually unsigned.
>>>>>>>
>>>>>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>>>>>> numbers in plain char - use them for 7-bit ASCII characters only.  For
>>>>>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>>>>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>>>>>
>>>>>> Unfortunately, in C, string literals have type char*, and char* strings
>>>>>> are encountered everywhere, where they can store ASCII, extended ASCII
>>>>>> of various kinds, or UTF8.
>>>>>>
>>>>>> But you frequently need to manipulate individual character codes.
>>>>>
>>>>> It's quite rare to have to manipulate characters individually, other
>>>>> than perhaps comparing them to character constants.  Certainly general
>>>>> arithmetic is very rare on characters, with the exception of something
>>>>> like UTF8 code point extraction.
>>>>
>>>> toupper() and tolower() were often implemented using arithmetic
>>>> on characters in the C locale, but the point stands.
>>>>
>>>
>>> Surely that would be logical instructions (x &= ~0x20, or x |= 0x20)
>>> rather than arithmetic instructions?  But I wouldn't do logical
>>> operations on plain char either!  (The C library implementation can, of
>>> course, do whatever it likes.)
>> 
>> Unix v7:
>> #define isalpha(c)      ((_ctype_+1)[c]&(_U|_L))
>> #define isupper(c)      ((_ctype_+1)[c]&_U)
>> #define islower(c)      ((_ctype_+1)[c]&_L)
>> #define isdigit(c)      ((_ctype_+1)[c]&_N)
>> #define isxdigit(c)     ((_ctype_+1)[c]&(_N|_X))
>> #define isspace(c)      ((_ctype_+1)[c]&_S)
>> #define ispunct(c)      ((_ctype_+1)[c]&_P)
>> #define isalnum(c)      ((_ctype_+1)[c]&(_U|_L|_N))
>> #define isprint(c)      ((_ctype_+1)[c]&(_P|_U|_L|_N))
>> #define iscntrl(c)      ((_ctype_+1)[c]&_C)
>> #define isascii(c)      ((unsigned)(c)<=0177)
>> #define toupper(c)      ((c)-'a'+'A')
>> #define tolower(c)      ((c)-'A'+'a')
>> #define toascii(c)      ((c)&0177)
>> 
>
>Those "toupper" and "tolower" macros are going to give the wrong result 
>for any argument that is not a lower case or upper case (respectively) 
>ASCII letter.

Which, in 1979 when Unix V7 was in the wild, is why isalpha was used to qualify toupper/tolower.

>  The C standards require the macros to return the argument 

The above header file predated the standard by a number of years.



#172456

From: David Brown <david.brown@hesbynett.no>
Date: 2023-08-17 16:05 +0200
Message-ID: <ubl9fv$3prfv$4@dont-email.me>
In reply to: #172401
On 16/08/2023 19:50, Scott Lurndal wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 16/08/2023 16:43, Scott Lurndal wrote:
>>> David Brown <david.brown@hesbynett.no> writes:
>>>> On 16/08/2023 16:05, Scott Lurndal wrote:
>>>>> David Brown <david.brown@hesbynett.no> writes:
>>>>>> On 16/08/2023 14:31, Bart wrote:
>>>>>>> On 16/08/2023 07:53, David Brown wrote:
>>>>>>>
>>>>>>>> Just to be clear about this in case anyone wonders why you wrote that
>>>>>>>> - plain "char" can be signed or unsigned, depending on the platform.
>>>>>>>> Older platforms (including x86) tend to have plain char as signed,
>>>>>>>> while more modern ABIs are usually unsigned.
>>>>>>>>
>>>>>>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>>>>>>> numbers in plain char - use them for 7-bit ASCII characters only.  For
>>>>>>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>>>>>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>>>>>>
>>>>>>> Unfortunately, in C, string literals have type char*, and char* strings
>>>>>>> are encountered everywhere, where they can store ASCII, extended ASCII
>>>>>>> of various kinds, or UTF8.
>>>>>>>
>>>>>>> But you frequently need to manipulate individual character codes.
>>>>>>
>>>>>> It's quite rare to have to manipulate characters individually, other
>>>>>> than perhaps comparing them to character constants.  Certainly general
>>>>>> arithmetic is very rare on characters, with the exception of something
>>>>>> like UTF8 code point extraction.
>>>>>
>>>>> toupper() and tolower() were often implemented using arithmetic
>>>>> on characters in the C locale, but the point stands.
>>>>>
>>>>
>>>> Surely that would be logical instructions (x &= ~0x20, or x |= 0x20)
>>>> rather than arithmetic instructions?  But I wouldn't do logical
>>>> operations on plain char either!  (The C library implementation can, of
>>>> course, do whatever it likes.)
>>>
>>> Unix v7:
>>> #define isalpha(c)      ((_ctype_+1)[c]&(_U|_L))
>>> #define isupper(c)      ((_ctype_+1)[c]&_U)
>>> #define islower(c)      ((_ctype_+1)[c]&_L)
>>> #define isdigit(c)      ((_ctype_+1)[c]&_N)
>>> #define isxdigit(c)     ((_ctype_+1)[c]&(_N|_X))
>>> #define isspace(c)      ((_ctype_+1)[c]&_S)
>>> #define ispunct(c)      ((_ctype_+1)[c]&_P)
>>> #define isalnum(c)      ((_ctype_+1)[c]&(_U|_L|_N))
>>> #define isprint(c)      ((_ctype_+1)[c]&(_P|_U|_L|_N))
>>> #define iscntrl(c)      ((_ctype_+1)[c]&_C)
>>> #define isascii(c)      ((unsigned)(c)<=0177)
>>> #define toupper(c)      ((c)-'a'+'A')
>>> #define tolower(c)      ((c)-'A'+'a')
>>> #define toascii(c)      ((c)&0177)
>>>
>>
>> Those "toupper" and "tolower" macros are going to give the wrong result
>> for any argument that is not a lower case or upper case (respectively)
>> ASCII letter.
> 
> Which, in 1979 when Unix V7 was in the wild, is why isalpha was used to qualify toupper/tolower.

That would not suffice - you'd need isupper() or islower():

#define useful_tolower(c) (isupper((c)) ? tolower((c)) : (c))
#define useful_toupper(c) (islower((c)) ? toupper((c)) : (c))

> 
>>   The C standards require the macros to return the argument
> 
> The above header file predated the standard by a number of years.
> 

Fair enough.  Some things have got better over time!



#172379

From: Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date: 2023-08-16 07:35 -0700
Message-ID: <9d8b343f-c61b-4bc6-b73e-a2df1ece7a63n@googlegroups.com>
In reply to: #172375
On Wednesday, 16 August 2023 at 14:31:51 UTC+1, David Brown wrote:
> On 16/08/2023 14:31, Bart wrote: 
> > On 16/08/2023 07:53, David Brown wrote: 
> > 
> >> Just to be clear about this in case anyone wonders why you wrote that 
> >> - plain "char" can be signed or unsigned, depending on the platform. 
> >> Older platforms (including x86) tend to have plain char as signed, 
> >> while more modern ABIs are usually unsigned. 
> >> 
> >> It is, of course, a terrible idea to do any kind of arithmetic or hold 
> >> numbers in plain char - use them for 7-bit ASCII characters only.  For 
> >> anything with numbers, use "signed char", "unsigned char", or (better, 
> >> IMHO) appropriate <stdint.h> types when you need a small integer type. 
> > 
> > Unfortunately, in C, string literals have type char*, and char* strings 
> > are encountered everywhere, where they can store ASCII, extended ASCII 
> > of various kinds, or UTF8. 
> > 
> > But you frequently need to manipulate individual character codes.
> It's quite rare to have to manipulate characters individually, other 
> than perhaps comparing them to character constants. Certainly general 
> arithmetic is very rare on characters, with the exception of something 
> like UTF8 code point extraction. Do you have examples where people 
> might reasonably want to perform arithmetic on plain char, that you 
> think occur frequently? 
> 
> I think it is entirely appropriate to use "char" for characters (and 
> const char* for immutable strings). But I don't think it is appropriate 
> for any kind of arithmetic.
>
Theoretically, atoi() should be implemented with

strchr("0123456789", ch);

to convert from character to digit. But people like efficiency.
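
A sketch of the two approaches side by side, with hypothetical function names. The subtraction form is portable for digits, since the C standard guarantees that '0' through '9' have consecutive values. The strchr() form needs a guard because strchr() also matches the terminating null character.

```c
#include <string.h>

/* Hypothetical helper: digit value by table search, -1 if ch is not a digit. */
static int digit_value_strchr(char ch)
{
    static const char digits[] = "0123456789";
    /* strchr() would find the terminator for ch == '\0', so exclude it. */
    const char *p = (ch != '\0') ? strchr(digits, ch) : NULL;
    return p ? (int)(p - digits) : -1;
}

/* Hypothetical helper: the same result by arithmetic on the character. */
static int digit_value_sub(char ch)
{
    return (ch >= '0' && ch <= '9') ? ch - '0' : -1;
}
```

Both return 7 for '7' and -1 for 'x'. The arithmetic version avoids the search, which is presumably the efficiency people like.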

 


