comp.lang.c: thread #172278 (unrolled)
| Started by | fir <profesor.fir@gmail.com> |
|---|---|
| First post | 2023-08-15 06:50 -0700 |
| Last post | 2023-08-16 08:48 -0700 |
| Articles | 45 (20 on this page), 11 participants |
signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 06:50 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 06:59 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 07:03 -0700
Re: signed/unsigned - what will fail Öö Tiib <ootiib@hot.ee> - 2023-08-15 07:44 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-15 08:01 -0700
Re: signed/unsigned - what will fail Öö Tiib <ootiib@hot.ee> - 2023-08-15 09:48 -0700
Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 08:53 +0200
Re: signed/unsigned - what will fail Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-16 00:02 -0700
Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 13:05 +0200
Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-16 22:40 +0100
Re: signed/unsigned - what will fail Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 20:12 -0700
Re: signed/unsigned - what will fail Bart <bc@freeuk.com> - 2023-08-16 13:31 +0100
Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 15:31 +0200
Re: signed/unsigned - what will fail scott@slp53.sl.home (Scott Lurndal) - 2023-08-16 14:05 +0000
Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 16:20 +0200
Re: signed/unsigned - what will fail scott@slp53.sl.home (Scott Lurndal) - 2023-08-16 14:43 +0000
Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 19:16 +0200
Re: signed/unsigned - what will fail scott@slp53.sl.home (Scott Lurndal) - 2023-08-16 17:50 +0000
Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-17 16:05 +0200
Re: signed/unsigned - what will fail Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2023-08-16 07:35 -0700
Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-16 19:21 +0200
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 10:30 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 10:33 -0700
Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-16 22:01 +0100
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 14:09 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 14:29 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 14:14 -0700
Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-17 00:52 +0100
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 17:07 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 12:52 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 13:13 -0700
Re: signed/unsigned - what will fail Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-08-16 21:52 +0100
Re: signed/unsigned - what will fail Bart <bc@freeuk.com> - 2023-08-16 18:25 +0100
Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-17 16:15 +0200
Re: signed/unsigned - what will fail Phil Carmody <pc+usenet@asdf.org> - 2023-08-17 10:44 +0300
Re: signed/unsigned - what will fail Spiros Bousbouras <spibou@gmail.com> - 2023-08-17 08:17 +0000
Re: signed/unsigned - what will fail Spiros Bousbouras <spibou@gmail.com> - 2023-08-17 08:51 +0000
Re: signed/unsigned - what will fail Phil Carmody <pc+usenet@asdf.org> - 2023-08-17 15:11 +0300
Re: signed/unsigned - what will fail David Brown <david.brown@hesbynett.no> - 2023-08-17 21:20 +0200
Re: signed/unsigned - what will fail Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-16 13:26 -0700
Re: signed/unsigned - what will fail Bart <bc@freeuk.com> - 2023-08-16 21:51 +0100
Re: signed/unsigned - what will fail Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-16 15:35 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 08:14 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 08:34 -0700
Re: signed/unsigned - what will fail fir <profesor.fir@gmail.com> - 2023-08-16 08:48 -0700
| From | fir <profesor.fir@gmail.com> |
|---|---|
| Date | 2023-08-15 06:50 -0700 |
| Subject | signed/unsigned - what will fail |
| Message-ID | <21d1ef97-8620-4115-b412-7279e0ef4d6bn@googlegroups.com> |
I'm not sure I'm a fan of the signed/unsigned distinction
(for example, maybe there are cases where you don't care,
and in such cases you could treat a given int as both signed and unsigned,
as it kind of is).

I'm not proposing, however, to define three states (signed, unsigned, and
don't-care), or to just remove signed/unsigned, or anything else, because I
don't know; I don't understand it well enough.

But assume I use don't-care (and always write just "int"):
what exactly will fail?
| From | fir <profesor.fir@gmail.com> |
|---|---|
| Date | 2023-08-15 06:59 -0700 |
| Message-ID | <7ffec8c7-1b4c-4c3c-9342-daed7af19dabn@googlegroups.com> |
| In reply to | #172278 |
On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> I'm not sure I'm a fan of the signed/unsigned distinction
> (for example, maybe there are cases where you don't care,
> and in such cases you could treat a given int as both signed and unsigned,
> as it kind of is).
>
> I'm not proposing, however, to define three states (signed, unsigned, and
> don't-care), or to just remove signed/unsigned, or anything else, because I
> don't know; I don't understand it well enough.
>
> But assume I use don't-care (and always write just "int"):
> what exactly will fail?

I'm not sure, but I could probably assume that additions and subtractions are safe,
but I'm not sure:

say

char a = 200, b = 200;

int c = a+b; // is it 400?

int d = a*b; // what about that?
| From | fir <profesor.fir@gmail.com> |
|---|---|
| Date | 2023-08-15 07:03 -0700 |
| Message-ID | <58c59847-321e-4174-9bd0-255a3159d220n@googlegroups.com> |
| In reply to | #172280 |
On Tuesday, 15 August 2023 at 15:59:15 UTC+2, fir wrote:
> On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> > I'm not sure I'm a fan of the signed/unsigned distinction
> > (for example, maybe there are cases where you don't care,
> > and in such cases you could treat a given int as both signed and unsigned,
> > as it kind of is).
> >
> > I'm not proposing, however, to define three states (signed, unsigned, and
> > don't-care), or to just remove signed/unsigned, or anything else, because I
> > don't know; I don't understand it well enough.
> >
> > But assume I use don't-care (and always write just "int"):
> > what exactly will fail?
> I'm not sure, but I could probably assume that additions and subtractions are safe,
> but I'm not sure:
>
> say
>
> char a = 200, b = 200;
>
> int c = a+b; // is it 400?
>
> int d = a*b; // what about that?

I would also like to put forward the thesis that a programming language like C
should perhaps be designed so that this kind of expression works as if int and
char were of a "don't-care" class, I mean possibly giving proper results for
a=b=-20 and a=b=200 (and maybe even a=b=-200), but I haven't inspected it all.
| From | Öö Tiib <ootiib@hot.ee> |
|---|---|
| Date | 2023-08-15 07:44 -0700 |
| Message-ID | <89f530cb-dd82-46f9-9567-a1f81e55d239n@googlegroups.com> |
| In reply to | #172280 |
On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
> On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> > I'm not sure I'm a fan of the signed/unsigned distinction
> > (for example, maybe there are cases where you don't care,
> > and in such cases you could treat a given int as both signed and unsigned,
> > as it kind of is).
> >
> > I'm not proposing, however, to define three states (signed, unsigned, and
> > don't-care), or to just remove signed/unsigned, or anything else, because I
> > don't know; I don't understand it well enough.
> >
> > But assume I use don't-care (and always write just "int"):
> > what exactly will fail?
> I'm not sure, but I could probably assume that additions and subtractions are safe,
> but I'm not sure:
>
> say
>
> char a = 200, b = 200;
>
You contradict your "and always write just "int"": I clearly see char there.

Do you turn off warnings on your compilers? With gcc you will most likely get
something like: "warning: overflow in conversion from 'int' to 'char' changes
value from '200' to '-56' [-Woverflow]"
> int c = a+b; // is it 400?
It can be, but it is more likely -112.
>
> int d = a*b; // what about that?
It can be 40000, 3136, or -25536. What is your point?
| From | fir <profesor.fir@gmail.com> |
|---|---|
| Date | 2023-08-15 08:01 -0700 |
| Message-ID | <6fcfcd10-82e7-4ecb-8bec-e6292ff73322n@googlegroups.com> |
| In reply to | #172287 |
On Tuesday, 15 August 2023 at 16:45:07 UTC+2, Öö Tiib wrote:
> On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
> > On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> > > I'm not sure I'm a fan of the signed/unsigned distinction
> > > (for example, maybe there are cases where you don't care,
> > > and in such cases you could treat a given int as both signed and unsigned,
> > > as it kind of is).
> > >
> > > I'm not proposing, however, to define three states (signed, unsigned, and
> > > don't-care), or to just remove signed/unsigned, or anything else, because I
> > > don't know; I don't understand it well enough.
> > >
> > > But assume I use don't-care (and always write just "int"):
> > > what exactly will fail?
> > I'm not sure, but I could probably assume that additions and subtractions are safe,
> > but I'm not sure:
> >
> > say
> >
> > char a = 200, b = 200;
> >
> You contradict your "and always write just "int"": I clearly see char there.
>
> Do you turn off warnings on your compilers? With gcc you will most likely get
> something like: "warning: overflow in conversion from 'int' to 'char' changes
> value from '200' to '-56' [-Woverflow]"
> > int c = a+b; // is it 400?
> It can be, but it is more likely -112.
> >
> > int d = a*b; // what about that?
> It can be 40000, 3136, or -25536. What is your point?

It can't be 2 or 3 of them at once; it is one of them.

There may be many points, but I'm close to thinking that such an "expression"
should generally 'carry' a real value, so that if

int a = -200, b = -200;
int c = a*b;   // c should be 40000, and so on

I think classes like signed/unsigned should perhaps be used more for storage
arrays, not so much in normal calculations.
| From | Öö Tiib <ootiib@hot.ee> |
|---|---|
| Date | 2023-08-15 09:48 -0700 |
| Message-ID | <c7bd3b8e-a9d1-4e38-a4ce-f7697bc2e777n@googlegroups.com> |
| In reply to | #172290 |
On Tuesday, 15 August 2023 at 18:01:23 UTC+3, fir wrote:
> On Tuesday, 15 August 2023 at 16:45:07 UTC+2, Öö Tiib wrote:
> > On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
> > > On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> > > > I'm not sure I'm a fan of the signed/unsigned distinction
> > > > (for example, maybe there are cases where you don't care,
> > > > and in such cases you could treat a given int as both signed and unsigned,
> > > > as it kind of is).
> > > >
> > > > I'm not proposing, however, to define three states (signed, unsigned, and
> > > > don't-care), or to just remove signed/unsigned, or anything else, because I
> > > > don't know; I don't understand it well enough.
> > > >
> > > > But assume I use don't-care (and always write just "int"):
> > > > what exactly will fail?
> > > I'm not sure, but I could probably assume that additions and subtractions are safe,
> > > but I'm not sure:
> > >
> > > say
> > >
> > > char a = 200, b = 200;
> > >
> > You contradict your "and always write just "int"": I clearly see char there.
> >
> > Do you turn off warnings on your compilers? With gcc you will most likely get
> > something like: "warning: overflow in conversion from 'int' to 'char' changes
> > value from '200' to '-56' [-Woverflow]"
> > > int c = a+b; // is it 400?
> > It can be, but it is more likely -112.
> > >
> > > int d = a*b; // what about that?
> > It can be 40000, 3136, or -25536. What is your point?
>
> It can't be 2 or 3 of them at once; it is one of them.
>
It may cause signed integer overflow on an 8-bit or 16-bit embedded system,
and that is undefined behavior unless something defines it for that system.

>
> There may be many points, but I'm close to thinking that such an "expression"
> should generally 'carry' a real value, so that if
>
> int a = -200, b = -200;
> int c = a*b;   // c should be 40000, and so on
>
> I think classes like signed/unsigned should perhaps be used more for storage
> arrays, not so much in normal calculations.

How so?
If int on an 8-bit microcontroller has 16 bits of storage, with a value
range of -32768 to 32767, then there is no way it can be 40000.
It is simply impossible for c to have that value.
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2023-08-16 08:53 +0200 |
| Message-ID | <ubhrot$37a4c$1@dont-email.me> |
| In reply to | #172315 |
On 15/08/2023 18:48, Öö Tiib wrote:
> On Tuesday, 15 August 2023 at 18:01:23 UTC+3, fir wrote:
>> On Tuesday, 15 August 2023 at 16:45:07 UTC+2, Öö Tiib wrote:
>>> On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
>>>> On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
>>>>> I'm not sure I'm a fan of the signed/unsigned distinction
>>>>> (for example, maybe there are cases where you don't care,
>>>>> and in such cases you could treat a given int as both signed and unsigned,
>>>>> as it kind of is).
>>>>>
>>>>> I'm not proposing, however, to define three states (signed, unsigned, and
>>>>> don't-care), or to just remove signed/unsigned, or anything else, because I
>>>>> don't know; I don't understand it well enough.
>>>>>
>>>>> But assume I use don't-care (and always write just "int"):
>>>>> what exactly will fail?
>>>> I'm not sure, but I could probably assume that additions and subtractions are safe,
>>>> but I'm not sure:
>>>>
>>>> say
>>>>
>>>> char a = 200, b = 200;
>>>>
>>> You contradict your "and always write just "int"": I clearly see char there.
>>>
>>> Do you turn off warnings on your compilers? With gcc you will most likely get
>>> something like: "warning: overflow in conversion from 'int' to 'char' changes
>>> value from '200' to '-56' [-Woverflow]"
>>>> int c = a+b; // is it 400?
>>> It can be, but it is more likely -112.

Just to be clear about this in case anyone wonders why you wrote that -
plain "char" can be signed or unsigned, depending on the platform.
Older platforms (including x86) tend to have plain char as signed, while
more modern ABIs are usually unsigned.

It is, of course, a terrible idea to do any kind of arithmetic or hold
numbers in plain char - use them for 7-bit ASCII characters only. For
anything with numbers, use "signed char", "unsigned char", or (better,
IMHO) appropriate <stdint.h> types when you need a small integer type.

So after "char a = 200;", the value in "a" depends on the target's ABI.
(Enabling warnings on the compiler is always a good idea!)

>>>>
>>>> int d = a*b; // what about that?
>>> It can be 40000, 3136, or -25536. What is your point?
>>
>> It can't be 2 or 3 of them at once; it is one of them.

If plain char is signed on the target, then "d" will always be 3136. If
plain char is unsigned and int is bigger than 16 bits (generally 32 bits)
on the target, then "d" will always be 40000. If plain char is unsigned
and int is 16-bit, then there is an arithmetic overflow in the signed int
multiplication - that's undefined behaviour, and there's no limit to what
can go wrong, including the compiler generating code that treats the
result as 2 or 3 of these values. Often, it will appear to be -25536 -
but that is not guaranteed or reliable in any way (without specific
compiler flags or documentation).

(I know you, Öö, know this - but fir may be less sure.)

>>
> It may cause signed integer overflow on an 8-bit or 16-bit embedded system,
> and that is undefined behavior unless something defines it for that system.
>
>>
>> There may be many points, but I'm close to thinking that such an "expression"
>> should generally 'carry' a real value, so that if
>>
>> int a = -200, b = -200;
>> int c = a*b;   // c should be 40000, and so on
>>
>> I think classes like signed/unsigned should perhaps be used more for storage
>> arrays, not so much in normal calculations.
>
> How so? If int on an 8-bit microcontroller has 16 bits of storage, with a value
> range of -32768 to 32767, then there is no way it can be 40000.
> It is simply impossible for c to have that value.
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2023-08-16 00:02 -0700 |
| Message-ID | <aedc8f63-5638-4d24-a116-594dc63c8fa4n@googlegroups.com> |
| In reply to | #172352 |
On Wednesday, 16 August 2023 at 07:53:32 UTC+1, David Brown wrote:
> On 15/08/2023 18:48, Öö Tiib wrote:
> > On Tuesday, 15 August 2023 at 18:01:23 UTC+3, fir wrote:
> >> On Tuesday, 15 August 2023 at 16:45:07 UTC+2, Öö Tiib wrote:
> >>> On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
> >>>> On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
> >>>>> I'm not sure I'm a fan of the signed/unsigned distinction
> >>>>> (for example, maybe there are cases where you don't care,
> >>>>> and in such cases you could treat a given int as both signed and unsigned,
> >>>>> as it kind of is).
> >>>>>
> >>>>> I'm not proposing, however, to define three states (signed, unsigned, and
> >>>>> don't-care), or to just remove signed/unsigned, or anything else, because I
> >>>>> don't know; I don't understand it well enough.
> >>>>>
> >>>>> But assume I use don't-care (and always write just "int"):
> >>>>> what exactly will fail?
> >>>> I'm not sure, but I could probably assume that additions and subtractions are safe,
> >>>> but I'm not sure:
> >>>>
> >>>> say
> >>>>
> >>>> char a = 200, b = 200;
> >>>>
> >>> You contradict your "and always write just "int"": I clearly see char there.
> >>>
> >>> Do you turn off warnings on your compilers? With gcc you will most likely get
> >>> something like: "warning: overflow in conversion from 'int' to 'char' changes
> >>> value from '200' to '-56' [-Woverflow]"
> >>>> int c = a+b; // is it 400?
> >>> It can be, but it is more likely -112.
> Just to be clear about this in case anyone wonders why you wrote that -
> plain "char" can be signed or unsigned, depending on the platform.
> Older platforms (including x86) tend to have plain char as signed, while
> more modern ABIs are usually unsigned.
>
> It is, of course, a terrible idea to do any kind of arithmetic or hold
> numbers in plain char - use them for 7-bit ASCII characters only.
>
I pass UTF-8 around as char *s in Baby X. But of course it is converted to unsigned
char for the actual UTF-8 manipulations.
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2023-08-16 13:05 +0200 |
| Message-ID | <ubiahg$394g8$4@dont-email.me> |
| In reply to | #172353 |
On 16/08/2023 09:02, Malcolm McLean wrote:
> On Wednesday, 16 August 2023 at 07:53:32 UTC+1, David Brown wrote:
>> On 15/08/2023 18:48, Öö Tiib wrote:
>>> On Tuesday, 15 August 2023 at 18:01:23 UTC+3, fir wrote:
>>>> On Tuesday, 15 August 2023 at 16:45:07 UTC+2, Öö Tiib wrote:
>>>>> On Tuesday, 15 August 2023 at 16:59:15 UTC+3, fir wrote:
>>>>>> On Tuesday, 15 August 2023 at 15:51:07 UTC+2, fir wrote:
>>>>>>> I'm not sure I'm a fan of the signed/unsigned distinction
>>>>>>> (for example, maybe there are cases where you don't care,
>>>>>>> and in such cases you could treat a given int as both signed and unsigned,
>>>>>>> as it kind of is).
>>>>>>>
>>>>>>> I'm not proposing, however, to define three states (signed, unsigned, and
>>>>>>> don't-care), or to just remove signed/unsigned, or anything else, because I
>>>>>>> don't know; I don't understand it well enough.
>>>>>>>
>>>>>>> But assume I use don't-care (and always write just "int"):
>>>>>>> what exactly will fail?
>>>>>> I'm not sure, but I could probably assume that additions and subtractions are safe,
>>>>>> but I'm not sure:
>>>>>>
>>>>>> say
>>>>>>
>>>>>> char a = 200, b = 200;
>>>>>>
>>>>> You contradict your "and always write just "int"": I clearly see char there.
>>>>>
>>>>> Do you turn off warnings on your compilers? With gcc you will most likely get
>>>>> something like: "warning: overflow in conversion from 'int' to 'char' changes
>>>>> value from '200' to '-56' [-Woverflow]"
>>>>>> int c = a+b; // is it 400?
>>>>> It can be, but it is more likely -112.
>> Just to be clear about this in case anyone wonders why you wrote that -
>> plain "char" can be signed or unsigned, depending on the platform.
>> Older platforms (including x86) tend to have plain char as signed, while
>> more modern ABIs are usually unsigned.
>>
>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>> numbers in plain char - use them for 7-bit ASCII characters only.
>>
> I pass UTF-8 around as char *s in Baby X. But of course it is converted to unsigned
> char for the actual UTF-8 manipulations.

Conversion from plain char to unsigned char will always be safe and
well-defined, regardless of the signedness of plain char. And it is always
safe to access plain char data through an unsigned char pointer.

Given that string literals are treated as char arrays (unless you have an
encoding prefix for wide characters of some kind), using plain char seems
entirely reasonable. But "const char *" would be vastly better.

(The conversion from unsigned char to plain char is implementation-dependent,
but I know of no platforms where it will not work in the simple and obvious way.)
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-08-16 22:40 +0100 |
| Message-ID | <87zg2qfyy8.fsf@bsb.me.uk> |
| In reply to | #172365 |
David Brown <david.brown@hesbynett.no> writes:

> On 16/08/2023 09:02, Malcolm McLean wrote:
>> On Wednesday, 16 August 2023 at 07:53:32 UTC+1, David Brown wrote:
>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>> numbers in plain char - use them for 7-bit ASCII characters only.
>>>
>> I pass UTF-8 around as char *s in Baby X. But of course it is
>> converted to unsigned char for the actual UTF-8 manipulations.

I think too much is often made of this. What is it that you are worried
about? A lot of UTF-8 fiddling is masking values that will have been
promoted to int. The masking can be more portable than the conversion.

> Conversion from plain char to unsigned char will always be safe and
> well-defined, regardless of the signedness of plain char.

It's not UB in the dreaded C sense, but it's not defined in a way that
will help you for UTF-8. Let's say you want to see if *cp (where cp is a
plain char pointer and plain char is signed) points to a UTF-8
continuation character. I'd write

  (*cp & 0xC0) == 0x80

and the value of the six bits represented is *cp & 0x3F, i.e. 2 for a
byte holding 0x82. Will this fail because char is signed? No. Will it
fail if char is signed and the implementation uses sign-and-magnitude?
I don't think so, because UTF-8 mandates the bit pattern, not the value.
(If anyone has ever seen UTF-8 on a non-two's complement machine with
signed char, do let me know!)

> And it is always
> safe to access plain char data through an unsigned char pointer.

But what happens if I convert to unsigned char:

  const unsigned char *ucp = cp;

Now what is *cp & 0x3F on a sign-and-magnitude machine? It's 62, not 2.

Admittedly, anything but two's complement is now relegated to C's history
even if the machines still exist, so maybe we should not care anymore.

-- 
Ben.
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-08-16 20:12 -0700 |
| Message-ID | <86bkf65pm4.fsf@linuxsc.com> |
| In reply to | #172417 |
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:

> David Brown <david.brown@hesbynett.no> writes:
>
>> On 16/08/2023 09:02, Malcolm McLean wrote:
>>
>>> On Wednesday, 16 August 2023 at 07:53:32 UTC+1, David Brown wrote:
>>>
>>>> It is, of course, a terrible idea to do any kind of arithmetic or
>>>> hold numbers in plain char - use them for 7-bit ASCII characters
>>>> only.
>>>
>>> I pass UTF-8 around as char *s in Baby X. But of course it is
>>> converted to unsigned char for the actual UTF-8 manipulations.
>
> I think too much is often made of this. What is it that you are
> worried about? A lot of UTF-8 fiddling is masking values that will
> have been promoted to int. The masking can be more portable than
> the conversion.

I'm not sure the question is so clear cut. First, the pointer
conversion (from char * to unsigned char *) is absolutely guaranteed
to work, and accesses through the unsigned char * are guaranteed to
work. The only question is what bits you get. Of course, if the
implementation is two's complement then it doesn't matter. But if it
isn't, where did the bits come from? That matters because values that
go through <stdio.h> functions may have been converted to (or
re-interpreted as; it isn't clear which) unsigned char. If a file is
being read that was produced under a different implementation, reading
the bytes as chars rather than unsigned chars could result in incorrect
values. Or, unfortunately, vice versa.

Speaking for myself, normally I would prefer to do UTF-8-style
processing through unsigned char pointers. My reasoning is that it's
easier to think about and (probably) more likely to work in the unusual
cases. Also, now that I think of it, it's safer, because unsigned char
cannot have trap representations. Also, if there is some sort of
encoding problem, I have more confidence in solving the problem working
directly on unsigned char values than in reasoning through what will
happen when working with the signed values.
Conversely, if I were reading code that did UTF-8 processing using plain
char, I think I would need to work harder to understand how it works.
So, FWIW, there is a personal view.
| From | Bart <bc@freeuk.com> |
|---|---|
| Date | 2023-08-16 13:31 +0100 |
| Message-ID | <ubifj9$39v7p$1@dont-email.me> |
| In reply to | #172352 |
On 16/08/2023 07:53, David Brown wrote:

> Just to be clear about this in case anyone wonders why you wrote that -
> plain "char" can be signed or unsigned, depending on the platform. Older
> platforms (including x86) tend to have plain char as signed, while more
> modern ABIs are usually unsigned.
>
> It is, of course, a terrible idea to do any kind of arithmetic or hold
> numbers in plain char - use them for 7-bit ASCII characters only. For
> anything with numbers, use "signed char", "unsigned char", or (better,
> IMHO) appropriate <stdint.h> types when you need a small integer type.

Unfortunately, in C, string literals have type char*, and char* strings
are encountered everywhere, where they can store ASCII, extended ASCII
of various kinds, or UTF8.

But you frequently need to manipulate individual character codes.
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2023-08-16 15:31 +0200 |
| Message-ID | <ubij3o$3afvd$1@dont-email.me> |
| In reply to | #172372 |
On 16/08/2023 14:31, Bart wrote:
> On 16/08/2023 07:53, David Brown wrote:
>
>> Just to be clear about this in case anyone wonders why you wrote that
>> - plain "char" can be signed or unsigned, depending on the platform.
>> Older platforms (including x86) tend to have plain char as signed,
>> while more modern ABIs are usually unsigned.
>>
>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>> numbers in plain char - use them for 7-bit ASCII characters only. For
>> anything with numbers, use "signed char", "unsigned char", or (better,
>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>
> Unfortunately, in C, string literals have type char*, and char* strings
> are encountered everywhere, where they can store ASCII, extended ASCII
> of various kinds, or UTF8.
>
> But you frequently need to manipulate individual character codes.

It's quite rare to have to manipulate characters individually, other
than perhaps comparing them to character constants. Certainly general
arithmetic is very rare on characters, with the exception of something
like UTF8 code point extraction.

Do you have examples where people might reasonably want to perform
arithmetic on plain char, that you think occur frequently?

I think it is entirely appropriate to use "char" for characters (and
const char* for immutable strings). But I don't think it is appropriate
for any kind of arithmetic.
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2023-08-16 14:05 +0000 |
| Message-ID | <HI4DM.141794$ftCb.112835@fx34.iad> |
| In reply to | #172375 |
David Brown <david.brown@hesbynett.no> writes:
>On 16/08/2023 14:31, Bart wrote:
>> On 16/08/2023 07:53, David Brown wrote:
>>
>>> Just to be clear about this in case anyone wonders why you wrote that
>>> - plain "char" can be signed or unsigned, depending on the platform.
>>> Older platforms (including x86) tend to have plain char as signed,
>>> while more modern ABIs are usually unsigned.
>>>
>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>> numbers in plain char - use them for 7-bit ASCII characters only. For
>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>
>> Unfortunately, in C, string literals have type char*, and char* strings
>> are encountered everywhere, where they can store ASCII, extended ASCII
>> of various kinds, or UTF8.
>>
>> But you frequently need to manipulate individual character codes.
>
>It's quite rare to have to manipulate characters individually, other
>than perhaps comparing them to character constants. Certainly general
>arithmetic is very rare on characters, with the exception of something
>like UTF8 code point extraction.

toupper() and tolower() were often implemented using arithmetic
on characters in the C locale, but the point stands.
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2023-08-16 16:20 +0200 |
| Message-ID | <ubilva$3as8c$1@dont-email.me> |
| In reply to | #172376 |
On 16/08/2023 16:05, Scott Lurndal wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 16/08/2023 14:31, Bart wrote:
>>> On 16/08/2023 07:53, David Brown wrote:
>>>
>>>> Just to be clear about this in case anyone wonders why you wrote that
>>>> - plain "char" can be signed or unsigned, depending on the platform.
>>>> Older platforms (including x86) tend to have plain char as signed,
>>>> while more modern ABIs are usually unsigned.
>>>>
>>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>>> numbers in plain char - use them for 7-bit ASCII characters only. For
>>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>>
>>> Unfortunately, in C, string literals have type char*, and char* strings
>>> are encountered everywhere, where they can store ASCII, extended ASCII
>>> of various kinds, or UTF8.
>>>
>>> But you frequently need to manipulate individual character codes.
>>
>> It's quite rare to have to manipulate characters individually, other
>> than perhaps comparing them to character constants. Certainly general
>> arithmetic is very rare on characters, with the exception of something
>> like UTF8 code point extraction.
>
> toupper() and tolower() were often implemented using arithmetic
> on characters in the C locale, but the point stands.
>

Surely that would be logical instructions (x &= ~0x20, or x |= 0x20)
rather than arithmetic instructions? But I wouldn't do logical
operations on plain char either! (The C library implementation can, of
course, do whatever it likes.)
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2023-08-16 14:43 +0000 |
| Message-ID | <%f5DM.670467$AsA.656909@fx18.iad> |
| In reply to | #172378 |
David Brown <david.brown@hesbynett.no> writes:
>On 16/08/2023 16:05, Scott Lurndal wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>>> On 16/08/2023 14:31, Bart wrote:
>>>> On 16/08/2023 07:53, David Brown wrote:
>>>>
>>>>> Just to be clear about this in case anyone wonders why you wrote that
>>>>> - plain "char" can be signed or unsigned, depending on the platform.
>>>>> Older platforms (including x86) tend to have plain char as signed,
>>>>> while more modern ABIs are usually unsigned.
>>>>>
>>>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>>>> numbers in plain char - use them for 7-bit ASCII characters only. For
>>>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>>>
>>>> Unfortunately, in C, string literals have type char*, and char* strings
>>>> are encountered everywhere, where they can store ASCII, extended ASCII
>>>> of various kinds, or UTF8.
>>>>
>>>> But you frequently need to manipulate individual character codes.
>>>
>>> It's quite rare to have to manipulate characters individually, other
>>> than perhaps comparing them to character constants. Certainly general
>>> arithmetic is very rare on characters, with the exception of something
>>> like UTF8 code point extraction.
>>
>> toupper() and tolower() were often implemented using arithmetic
>> on characters in the C locale, but the point stands.
>>
>
>Surely that would be logical instructions (x &= ~0x20, or x |= 0x20)
>rather than arithmetic instructions? But I wouldn't do logical
>operations on plain char either! (The C library implementation can, of
>course, do whatever it likes.)
Unix v7:
#define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
#define isupper(c) ((_ctype_+1)[c]&_U)
#define islower(c) ((_ctype_+1)[c]&_L)
#define isdigit(c) ((_ctype_+1)[c]&_N)
#define isxdigit(c) ((_ctype_+1)[c]&(_N|_X))
#define isspace(c) ((_ctype_+1)[c]&_S)
#define ispunct(c) ((_ctype_+1)[c]&_P)
#define isalnum(c) ((_ctype_+1)[c]&(_U|_L|_N))
#define isprint(c) ((_ctype_+1)[c]&(_P|_U|_L|_N))
#define iscntrl(c) ((_ctype_+1)[c]&_C)
#define isascii(c) ((unsigned)(c)<=0177)
#define toupper(c) ((c)-'a'+'A')
#define tolower(c) ((c)-'A'+'a')
#define toascii(c) ((c)&0177)
Unixware 2.01:
#define isascii(c) (!((c) & ~0177))
#define toascii(c) ((c) & 0177)
#if defined(_XOPEN_SOURCE) || (__STDC__ == 0 \
&& !defined(_POSIX_SOURCE) && !defined(_POSIX_C_SOURCE))
#define _toupper(c) ((__ctype + 258)[c])
#define _tolower(c) ((__ctype + 258)[c])
>
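The v7 classification macros index a table of flag bits at `_ctype_+1`, so that EOF (-1) lands on entry 0 and every unsigned char value has its own entry. A reduced sketch of that technique, with hypothetical names (`my_ctype_tab`, `MY_U` and so on), covering only ASCII letters and digits and filled at run time instead of by the static initialiser a real C library would use:

```c
#include <stdio.h>   /* EOF */

/* Hypothetical flag bits, in the spirit of v7's _U, _L, _N. */
enum { MY_U = 01, MY_L = 02, MY_N = 04 };

/* One slot for EOF plus 256 slots for unsigned char values
   (assumes CHAR_BIT == 8). */
static unsigned char my_ctype_tab[1 + 256];

/* Offset by one so that EOF (-1) indexes slot 0, as in (_ctype_+1)[c]. */
#define my_ctype (my_ctype_tab + 1)

#define my_isupper(c) (my_ctype[c] & MY_U)
#define my_islower(c) (my_ctype[c] & MY_L)
#define my_isdigit(c) (my_ctype[c] & MY_N)
#define my_isalpha(c) (my_ctype[c] & (MY_U | MY_L))

/* Fill the table for ASCII (assumes contiguous 'A'..'Z' and 'a'..'z');
   every other slot, including the EOF slot, stays 0. */
void my_ctype_init(void)
{
    int c;
    for (c = 'A'; c <= 'Z'; ++c) my_ctype[c] |= MY_U;
    for (c = 'a'; c <= 'z'; ++c) my_ctype[c] |= MY_L;
    for (c = '0'; c <= '9'; ++c) my_ctype[c] |= MY_N;
}
```

Each classification is then a single indexed load and a mask, which is why the argument must already be in the range EOF..UCHAR_MAX.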
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2023-08-16 19:16 +0200 |
| Message-ID | <ubj09u$3ckl4$1@dont-email.me> |
| In reply to | #172380 |
On 16/08/2023 16:43, Scott Lurndal wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 16/08/2023 16:05, Scott Lurndal wrote:
>>> David Brown <david.brown@hesbynett.no> writes:
>>>> On 16/08/2023 14:31, Bart wrote:
>>>>> On 16/08/2023 07:53, David Brown wrote:
>>>>>
>>>>>> Just to be clear about this in case anyone wonders why you wrote that
>>>>>> - plain "char" can be signed or unsigned, depending on the platform.
>>>>>> Older platforms (including x86) tend to have plain char as signed,
>>>>>> while more modern ABIs are usually unsigned.
>>>>>>
>>>>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>>>>> numbers in plain char - use them for 7-bit ASCII characters only. For
>>>>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>>>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>>>>
>>>>> Unfortunately, in C, string literals have type char*, and char* strings
>>>>> are encountered everywhere, where they can store ASCII, extended ASCII
>>>>> of various kinds, or UTF8.
>>>>>
>>>>> But you frequently need to manipulate individual character codes.
>>>>
>>>> It's quite rare to have to manipulate characters individually, other
>>>> than perhaps comparing them to character constants. Certainly general
>>>> arithmetic is very rare on characters, with the exception of something
>>>> like UTF8 code point extraction.
>>>
>>> toupper() and tolower() were often implemented using arithmetic
>>> on characters in the C locale, but the point stands.
>>>
>>
>> Surely that would be logical instructions (x &= ~0x20, or x |= 0x20)
>> rather than arithmetic instructions? But I wouldn't do logical
>> operations on plain char either! (The C library implementation can, of
>> course, do whatever it likes.)
>
> Unix v7:
> #define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
> #define isupper(c) ((_ctype_+1)[c]&_U)
> #define islower(c) ((_ctype_+1)[c]&_L)
> #define isdigit(c) ((_ctype_+1)[c]&_N)
> #define isxdigit(c) ((_ctype_+1)[c]&(_N|_X))
> #define isspace(c) ((_ctype_+1)[c]&_S)
> #define ispunct(c) ((_ctype_+1)[c]&_P)
> #define isalnum(c) ((_ctype_+1)[c]&(_U|_L|_N))
> #define isprint(c) ((_ctype_+1)[c]&(_P|_U|_L|_N))
> #define iscntrl(c) ((_ctype_+1)[c]&_C)
> #define isascii(c) ((unsigned)(c)<=0177)
> #define toupper(c) ((c)-'a'+'A')
> #define tolower(c) ((c)-'A'+'a')
> #define toascii(c) ((c)&0177)
>

Those "toupper" and "tolower" macros are going to give the wrong result
for any argument that is not a lower case or upper case (respectively)
ASCII letter. The C standards require the macros to return the argument
unchanged except when the character can be made upper-case or
lower-case. Maybe these macros were from pre-C90 C, but they are pretty
useless as they stand.

(C99 says they are functions, not macros, but maybe that too has
changed. The functions take an "int" argument, so there is no attempt at
arithmetic on (promoted) chars.)

>
> Unixware 2.01:
>
> #define isascii(c) (!((c) & ~0177))
> #define toascii(c) ((c) & 0177)
>
> #if defined(_XOPEN_SOURCE) || (__STDC__ == 0 \
> && !defined(_POSIX_SOURCE) && !defined(_POSIX_C_SOURCE))
> #define _toupper(c) ((__ctype + 258)[c])
> #define _tolower(c) ((__ctype + 258)[c])
>
>>
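David's objection is easy to check. A sketch comparing the v7 definition (renamed `v7_toupper` here so it does not clash with <ctype.h>) against the standard function, assuming ASCII and the default "C" locale:

```c
#include <ctype.h>

/* The Unix v7 definition, renamed; correct only for 'a'..'z'. */
#define v7_toupper(c) ((c)-'a'+'A')

/* Returns nonzero if the v7 macro and the standard toupper() agree for c.
   c must be representable as unsigned char, as toupper() requires. */
int v7_agrees_with_standard(int c)
{
    return v7_toupper(c) == toupper(c);
}
```

Under ASCII the macro maps '1' (0x31) to 0x11, a control character, and maps an upper-case 'Q' to '1', while `toupper()` returns both arguments unchanged.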
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2023-08-16 17:50 +0000 |
| Message-ID | <j%7DM.428704$U3w1.122961@fx09.iad> |
| In reply to | #172391 |
David Brown <david.brown@hesbynett.no> writes:
>On 16/08/2023 16:43, Scott Lurndal wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>>> On 16/08/2023 16:05, Scott Lurndal wrote:
>>>> David Brown <david.brown@hesbynett.no> writes:
>>>>> On 16/08/2023 14:31, Bart wrote:
>>>>>> On 16/08/2023 07:53, David Brown wrote:
>>>>>>
>>>>>>> Just to be clear about this in case anyone wonders why you wrote that
>>>>>>> - plain "char" can be signed or unsigned, depending on the platform.
>>>>>>> Older platforms (including x86) tend to have plain char as signed,
>>>>>>> while more modern ABIs are usually unsigned.
>>>>>>>
>>>>>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>>>>>> numbers in plain char - use them for 7-bit ASCII characters only. For
>>>>>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>>>>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>>>>>
>>>>>> Unfortunately, in C, string literals have type char*, and char* strings
>>>>>> are encountered everywhere, where they can store ASCII, extended ASCII
>>>>>> of various kinds, or UTF8.
>>>>>>
>>>>>> But you frequently need to manipulate individual character codes.
>>>>>
>>>>> It's quite rare to have to manipulate characters individually, other
>>>>> than perhaps comparing them to character constants. Certainly general
>>>>> arithmetic is very rare on characters, with the exception of something
>>>>> like UTF8 code point extraction.
>>>>
>>>> toupper() and tolower() were often implemented using arithmetic
>>>> on characters in the C locale, but the point stands.
>>>>
>>>
>>> Surely that would be logical instructions (x &= ~0x20, or x |= 0x20)
>>> rather than arithmetic instructions? But I wouldn't do logical
>>> operations on plain char either! (The C library implementation can, of
>>> course, do whatever it likes.)
>>
>> Unix v7:
>> #define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
>> #define isupper(c) ((_ctype_+1)[c]&_U)
>> #define islower(c) ((_ctype_+1)[c]&_L)
>> #define isdigit(c) ((_ctype_+1)[c]&_N)
>> #define isxdigit(c) ((_ctype_+1)[c]&(_N|_X))
>> #define isspace(c) ((_ctype_+1)[c]&_S)
>> #define ispunct(c) ((_ctype_+1)[c]&_P)
>> #define isalnum(c) ((_ctype_+1)[c]&(_U|_L|_N))
>> #define isprint(c) ((_ctype_+1)[c]&(_P|_U|_L|_N))
>> #define iscntrl(c) ((_ctype_+1)[c]&_C)
>> #define isascii(c) ((unsigned)(c)<=0177)
>> #define toupper(c) ((c)-'a'+'A')
>> #define tolower(c) ((c)-'A'+'a')
>> #define toascii(c) ((c)&0177)
>>
>
>Those "toupper" and "tolower" macros are going to give the wrong result
>for any argument that is not a lower case or upper case (respectively)
>ASCII letter.

Which, in 1979 when Unix V7 was in the wild, is why isalpha was used to qualify toupper/tolower.

> The C standards require the macros to return the argument

The above header file predated the standard by a number of years.
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2023-08-17 16:05 +0200 |
| Message-ID | <ubl9fv$3prfv$4@dont-email.me> |
| In reply to | #172401 |
On 16/08/2023 19:50, Scott Lurndal wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 16/08/2023 16:43, Scott Lurndal wrote:
>>> David Brown <david.brown@hesbynett.no> writes:
>>>> On 16/08/2023 16:05, Scott Lurndal wrote:
>>>>> David Brown <david.brown@hesbynett.no> writes:
>>>>>> On 16/08/2023 14:31, Bart wrote:
>>>>>>> On 16/08/2023 07:53, David Brown wrote:
>>>>>>>
>>>>>>>> Just to be clear about this in case anyone wonders why you wrote that
>>>>>>>> - plain "char" can be signed or unsigned, depending on the platform.
>>>>>>>> Older platforms (including x86) tend to have plain char as signed,
>>>>>>>> while more modern ABIs are usually unsigned.
>>>>>>>>
>>>>>>>> It is, of course, a terrible idea to do any kind of arithmetic or hold
>>>>>>>> numbers in plain char - use them for 7-bit ASCII characters only. For
>>>>>>>> anything with numbers, use "signed char", "unsigned char", or (better,
>>>>>>>> IMHO) appropriate <stdint.h> types when you need a small integer type.
>>>>>>>
>>>>>>> Unfortunately, in C, string literals have type char*, and char* strings
>>>>>>> are encountered everywhere, where they can store ASCII, extended ASCII
>>>>>>> of various kinds, or UTF8.
>>>>>>>
>>>>>>> But you frequently need to manipulate individual character codes.
>>>>>>
>>>>>> It's quite rare to have to manipulate characters individually, other
>>>>>> than perhaps comparing them to character constants. Certainly general
>>>>>> arithmetic is very rare on characters, with the exception of something
>>>>>> like UTF8 code point extraction.
>>>>>
>>>>> toupper() and tolower() were often implemented using arithmetic
>>>>> on characters in the C locale, but the point stands.
>>>>>
>>>>
>>>> Surely that would be logical instructions (x &= ~0x20, or x |= 0x20)
>>>> rather than arithmetic instructions? But I wouldn't do logical
>>>> operations on plain char either! (The C library implementation can, of
>>>> course, do whatever it likes.)
>>>
>>> Unix v7:
>>> #define isalpha(c) ((_ctype_+1)[c]&(_U|_L))
>>> #define isupper(c) ((_ctype_+1)[c]&_U)
>>> #define islower(c) ((_ctype_+1)[c]&_L)
>>> #define isdigit(c) ((_ctype_+1)[c]&_N)
>>> #define isxdigit(c) ((_ctype_+1)[c]&(_N|_X))
>>> #define isspace(c) ((_ctype_+1)[c]&_S)
>>> #define ispunct(c) ((_ctype_+1)[c]&_P)
>>> #define isalnum(c) ((_ctype_+1)[c]&(_U|_L|_N))
>>> #define isprint(c) ((_ctype_+1)[c]&(_P|_U|_L|_N))
>>> #define iscntrl(c) ((_ctype_+1)[c]&_C)
>>> #define isascii(c) ((unsigned)(c)<=0177)
>>> #define toupper(c) ((c)-'a'+'A')
>>> #define tolower(c) ((c)-'A'+'a')
>>> #define toascii(c) ((c)&0177)
>>>
>>
>> Those "toupper" and "tolower" macros are going to give the wrong result
>> for any argument that is not a lower case or upper case (respectively)
>> ASCII letter.
>
> Which, in 1979 when Unix V7 was in the wild, is why isalpha was used to qualify toupper/tolower.

That would not suffice - you'd need isupper() or islower():

#define useful_tolower(c) (isupper((c)) ? tolower((c)) : (c))
#define useful_toupper(c) (islower((c)) ? toupper((c)) : (c))

>
>> The C standards require the macros to return the argument
>
> The above header file predated the standard by a number of years.
>

Fair enough. Some things have got better over time!
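The macros above evaluate `c` twice (once in the test and once in whichever branch is chosen), so an argument such as `*p++` would be incremented twice. A sketch of the same logic as functions, with hypothetical names, assuming ASCII, and using plain range checks in place of the table-based isupper()/islower():

```c
/* Unconditional v7-style conversions; only correct for ASCII letters. */
static int raw_tolower(int c) { return c - 'A' + 'a'; }
static int raw_toupper(int c) { return c - 'a' + 'A'; }

/* ASCII range checks standing in for isupper()/islower(). */
static int is_ascii_upper(int c) { return c >= 'A' && c <= 'Z'; }
static int is_ascii_lower(int c) { return c >= 'a' && c <= 'z'; }

/* Function versions of useful_tolower/useful_toupper: the argument is
   evaluated exactly once, and anything that is not a letter of the
   other case is returned unchanged. */
static inline int useful_tolower(int c)
{
    return is_ascii_upper(c) ? raw_tolower(c) : c;
}

static inline int useful_toupper(int c)
{
    return is_ascii_lower(c) ? raw_toupper(c) : c;
}
```

With these, `useful_toupper(*p++)` advances `p` exactly once, whatever the character.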
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2023-08-16 07:35 -0700 |
| Message-ID | <9d8b343f-c61b-4bc6-b73e-a2df1ece7a63n@googlegroups.com> |
| In reply to | #172375 |
On Wednesday, 16 August 2023 at 14:31:51 UTC+1, David Brown wrote:
> On 16/08/2023 14:31, Bart wrote:
> > On 16/08/2023 07:53, David Brown wrote:
> >
> >> Just to be clear about this in case anyone wonders why you wrote that
> >> - plain "char" can be signed or unsigned, depending on the platform.
> >> Older platforms (including x86) tend to have plain char as signed,
> >> while more modern ABIs are usually unsigned.
> >>
> >> It is, of course, a terrible idea to do any kind of arithmetic or hold
> >> numbers in plain char - use them for 7-bit ASCII characters only. For
> >> anything with numbers, use "signed char", "unsigned char", or (better,
> >> IMHO) appropriate <stdint.h> types when you need a small integer type.
> >
> > Unfortunately, in C, string literals have type char*, and char* strings
> > are encountered everywhere, where they can store ASCII, extended ASCII
> > of various kinds, or UTF8.
> >
> > But you frequently need to manipulate individual character codes.
> It's quite rare to have to manipulate characters individually, other
> than perhaps comparing them to character constants. Certainly general
> arithmetic is very rare on characters, with the exception of something
> like UTF8 code point extraction. Do you have examples where people
> might reasonably want to perform arithmetic on plain char, that you
> think occur frequently?
>
> I think it is entirely appropriate to use "char" for characters (and
> const char* for immutable strings). But I don't think it is appropriate
> for any kind of arithmetic.
>
Theoretically an atoi() should be implemented with
strchr("0123456789", ch);
to convert from character to digit. But people like efficiency.
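The strchr() call returns a pointer, so the digit value is its offset from the start of the lookup string; strchr() also matches the terminating '\0', which has to be excluded. A sketch of both approaches with hypothetical names. The arithmetic version is portable because the C standard requires '0' to '9' to be contiguous and increasing in every execution character set (there is no such guarantee for letters):

```c
#include <string.h>

/* Digit value via lookup: returns -1 if ch is not a decimal digit.
   The ch != '\0' test is needed because strchr() also finds the terminator. */
int digit_value_strchr(char ch)
{
    static const char digits[] = "0123456789";
    const char *p = strchr(digits, ch);
    return (ch != '\0' && p != NULL) ? (int)(p - digits) : -1;
}

/* Digit value via arithmetic: '0'..'9' are guaranteed contiguous. */
int digit_value_arith(char ch)
{
    return (ch >= '0' && ch <= '9') ? ch - '0' : -1;
}

/* Minimal parser for a run of leading decimal digits, in the spirit of
   atoi(); no sign, whitespace or overflow handling (a simplification). */
int parse_decimal(const char *s)
{
    int value = 0, d;
    while ((d = digit_value_arith(*s)) >= 0) {
        value = value * 10 + d;
        ++s;
    }
    return value;
}
```

`parse_decimal` is deliberately minimal, so it is not a drop-in replacement for atoi().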