| Started by | Robbie Hatley <see.my.signature@for.my.contact.info> |
|---|---|
| First post | 2020-09-10 20:37 -0700 |
| Last post | 2020-11-06 20:35 -0800 |
| Articles | 13 — 9 participants |
Puzzling "array subscript has type ‘char’" warning. Robbie Hatley <see.my.signature@for.my.contact.info> - 2020-09-10 20:37 -0700
Re: Puzzling "array subscript has type ‘char’" warning. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-09-10 20:58 -0700
Re: Puzzling "array subscript has type ‘char’" warning. Robbie Hatley <see.my.signature@for.my.contact.info> - 2020-09-10 22:49 -0700
Re: Puzzling "array subscript has type ‘char’" warning. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-09-11 01:10 -0700
Re: Puzzling "array subscript has type ‘char’" warning. gazelle@shell.xmission.com (Kenny McCormack) - 2020-09-11 13:09 +0000
Re: Puzzling "array subscript has type ‘char’" warning. Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2020-09-11 06:14 -0700
Re: Puzzling "array subscript has type ‘char’" warning. Richard Damon <Richard@Damon-Family.org> - 2020-09-11 08:09 -0400
Re: Puzzling "array subscript has type ‘char’" warning. James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-09-11 09:01 -0400
Re: Puzzling "array subscript has type ‘char’" warning. Kaz Kylheku <793-849-0957@kylheku.com> - 2020-09-11 16:16 +0000
Re: Puzzling "array subscript has type char" warning. dave_thompson_2@comcast.net - 2020-10-11 16:12 -0400
Re: Puzzling "array subscript has type char" warning. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 17:28 -0700
Re: Puzzling "array subscript has type char" warning. dave_thompson_2@comcast.net - 2020-11-01 23:07 -0500
Re: Puzzling "array subscript has type ‘char’" warning. jenniferjeson35@gmail.com - 2020-11-06 20:35 -0800
| From | Robbie Hatley <see.my.signature@for.my.contact.info> |
|---|---|
| Date | 2020-09-10 20:37 -0700 |
| Subject | Puzzling "array subscript has type ‘char’" warning. |
| Message-ID | <e7Cdncm5EpqKbMfCnZ2dnUU7_83NnZ2d@giganews.com> |
Ahoy, group. I just ran into a warning I don't understand.
The code that generated the warning was this:
   else if (isdigit(*Ptr2))
   {
      *Ptr1++ = *Ptr2;
      FirstDigit = 1;
   }
and the warning (from GCC) was this:
integer-from-string.c:39:24: warning: array subscript has type ‘char’ [-Wchar-subscripts]
   39 |       else if (isdigit(*Ptr2))
But there are no arrays being used! Ptr1 and Ptr2 are simple
pointers-to-char. So why would GCC think that *Ptr2 is an
array subscript?
As I understand it, isdigit() (from "ctype.h") doesn't treat
its argument as an array-subscript, but as an int character
ordinal, and returns 1 if the character is a digit, otherwise
returns 0.
I was able to make the warning go away with a typecast to int:
   else if (isdigit((int)(*Ptr2)))
   {
      *Ptr1++ = *Ptr2;
      FirstDigit = 1;
   }
But oddly, the code works fine with EITHER version. So I'm not
understanding where the "array subscript" warning is coming
from. Anyone have any ideas?
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -E 'say "\154o\156e\167o\154f\100w\145ll\56c\157m";'
https://www.facebook.com/robbie.hatley
https://people.well.com/user/lonewolf/
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2020-09-10 20:58 -0700 |
| Message-ID | <87lfhhge1m.fsf@nosuchdomain.example.com> |
| In reply to | #154849 |
Robbie Hatley <see.my.signature@for.my.contact.info> writes:
> Ahoy, group. I just ran into a warning I don't understand.
> The code that generated the warning was this:
>
> else if (isdigit(*Ptr2))
> {
> *Ptr1++ = *Ptr2;
> FirstDigit = 1;
> }
>
> and the warning (from GCC) was this:
>
> integer-from-string.c:39:24: warning: array subscript has
> type ‘char’ [-Wchar-subscripts] 39 | else if (isdigit(*Ptr2))
>
> But there are no arrays being used! Ptr1 and Ptr2 are simple
> pointers-to-char. So why would GCC think that *Ptr2 is an
> array subscript?
>
> As I understand it, isdigit() (from "ctype.h") doesn't treat
> its argument as an array-subscript, but as an int character
> ordinal, and returns 1 if the character is a digit, otherwise
> returns 0.
isdigit is very likely a macro in the implementation you're using.
The macro presumably works by subscripting into an array object.
> I was able to make the warning go away with a typecast to int:
>
> else if (isdigit((int)(*Ptr2)))
> {
> *Ptr1++ = *Ptr2;
> FirstDigit = 1;
> }
>
> But oddly, the code works fine with EITHER version. So I'm not
> understanding where the "array subscript" warning is coming
> from. Anyone have any ideas?
The <ctype.h> functions require an argument of type int whose value must
be either EOF (almost certainly -1) or a value representable as an
unsigned char. For any other value, the behavior is undefined.
A typical implementation might define an array of small integers indexed
from 0 to UCHAR_MAX+1, with the index value computed by subtracting 1
from the argument.
Given that Ptr2 is of type char*, if plain char is signed and *Ptr2
happens to be negative (and not equal to EOF), then evaluating
isdigit(*Ptr2) has undefined behavior. Converting to int merely
masks the problem. Your code "works fine" if *Ptr2 happens to
be non-negative.
Cast the argument to unsigned char:
isdigit((unsigned char)*Ptr2)
(Yes, it's annoying that you need to do this, and I consider it a
design flaw in the <ctype.h> functions, but there it is.)
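A minimal sketch of that cast in context. The helper name count_digits is made up for illustration and is not from the original program; the point is only that every char is converted to unsigned char before it reaches isdigit().

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not from the original program: counts the
   decimal digits in a NUL-terminated string. */
size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Convert to unsigned char first, so the value passed to
           isdigit() is always in the range 0..UCHAR_MAX, even when
           plain char is signed and *s is negative. */
        if (isdigit((unsigned char)*s))
            ++n;
    }
    return n;
}
```

Calling count_digits("a1b22c") counts three digits, and a string containing bytes above 127 is handled without passing a negative value to isdigit().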
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
| From | Robbie Hatley <see.my.signature@for.my.contact.info> |
|---|---|
| Date | 2020-09-10 22:49 -0700 |
| Message-ID | <1eidnfKeDO6cjcbCnZ2dnUU7_8zNnZ2d@giganews.com> |
| In reply to | #154851 |
On 9/10/2020 8:58 PM, Keith Thompson wrote:
> isdigit is very likely a macro in the implementation you're using.
> The macro presumably works by subscripting into an array object.
Ah. Well, that would explain the "array subscript is char" warning.
But as to why anyone would implement it using arrays, I'm not seeing.
Wouldn't it be faster and easier to do the following? This should
work for both ASCII and UTF-8:
int isdigit (int x)
{
   if ( x > 47 && x < 58 )
      return 1;
   else
      return 0;
}
But perhaps there's some clever way to do it with a macro and
an array that I'm not seeing. OK, I'm intrigued. Let's google
"newlib isdigit source code" and see.
Interesting! Uses an array and a bitwise-and:
#undef isdigit
int
_DEFUN(isdigit,(c),int c)
{
   return(__ctype_ptr__[c+1] & _N);
}
> The <ctype.h> functions require an argument of type int whose value must
> be either EOF (almost certainly -1) or a value representable as an
> unsigned char. For any other value, the behavior is undefined.
>
> A typical implementation might define an array of small integers indexed
> from 0 to UCHAR_MAX+1, with the index value computed by subtracting 1
> from the argument.
>
> Given that Ptr2 is of type char*, if plain char is signed and *Ptr2
> happens to be negative (and not equal to EOF), then evaluating
> isdigit(*Ptr2) has undefined behavior. Converting to int merely
> masks the problem. Your code "works fine" if *Ptr2 happens to
> be non-negative.
>
> Cast the argument to unsigned char:
>
> isdigit((unsigned char)*Ptr2)
>
> (Yes, it's annoying that you need to do this, and I consider it a
> design flaw in the <ctype.h> functions, but there it is.)
Fascinating. Yes, that's safer, because it forces the number to be
in the [0,127] range. Thanks for the tip!
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -E 'say "\154o\156e\167o\154f\100w\145ll\56c\157m";'
https://www.facebook.com/robbie.hatley
https://people.well.com/user/lonewolf/
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2020-09-11 01:10 -0700 |
| Message-ID | <87h7s4hgyb.fsf@nosuchdomain.example.com> |
| In reply to | #154852 |
Robbie Hatley <see.my.signature@for.my.contact.info> writes:
> On 9/10/2020 8:58 PM, Keith Thompson wrote:
[...]
>> Cast the argument to unsigned char:
>>
>> isdigit((unsigned char)*Ptr2)
>>
>> (Yes, it's annoying that you need to do this, and I consider it a
>> design flaw in the <ctype.h> functions, but there it is.)
>
> Fascinating. Yes, that's safer, because it forces the number to be
> in the [0,127] range. Thanks for the tip!
No, it forces it to be in the [0,255] range (assuming CHAR_BIT==8).
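To make the range concrete, here is a small sketch. Both function names are invented for illustration. With 8-bit char, a plain char holding -23 (the byte 0xE9) comes out of the cast as 233, which is outside [0,127] but inside [0,255].

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* Hypothetical helper: the int value that reaches a <ctype.h>
   function once the argument has been cast to unsigned char. */
int as_ctype_arg(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: nonzero if v is in the domain the <ctype.h>
   functions accept, i.e. EOF or 0..UCHAR_MAX. */
int is_valid_ctype_arg(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}
```

The cast never produces EOF or any other negative value, so its result always passes is_valid_ctype_arg(), whereas a raw negative char (other than one equal to EOF) does not.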
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
| From | gazelle@shell.xmission.com (Kenny McCormack) |
|---|---|
| Date | 2020-09-11 13:09 +0000 |
| Message-ID | <rjfstq$2s0t$1@news.xmission.com> |
| In reply to | #154855 |
In article <87h7s4hgyb.fsf@nosuchdomain.example.com>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>Robbie Hatley <see.my.signature@for.my.contact.info> writes:
>> On 9/10/2020 8:58 PM, Keith Thompson wrote:
>[...]
>>> Cast the argument to unsigned char:
>>>
>>> isdigit((unsigned char)*Ptr2)
>>>
>>> (Yes, it's annoying that you need to do this, and I consider it a
>>> design flaw in the <ctype.h> functions, but there it is.)
>>
>> Fascinating. Yes, that's safer, because it forces the number to be
>> in the [0,127] range. Thanks for the tip!
>
>No, it forces it to be in the [0,255] range (assuming CHAR_BIT==8).
Presumably, Robbie was assuming that CHAR_BIT is 7.
I don't know if any such machines exist.
--
Donald Drumpf claims to be "the least racist person you'll ever meet".
This would be true if the only other person you've ever met was David Duke.
| From | Malcolm McLean <malcolm.arthur.mclean@gmail.com> |
|---|---|
| Date | 2020-09-11 06:14 -0700 |
| Message-ID | <4bd9068a-68cd-42d1-bc0e-f0bcd5fff7b8n@googlegroups.com> |
| In reply to | #154862 |
On Friday, 11 September 2020 at 14:09:31 UTC+1, Kenny McCormack wrote:
> In article <87h7s4h...@nosuchdomain.example.com>,
> Keith Thompson <Keith.S.T...@gmail.com> wrote:
> >Robbie Hatley <see.my.s...@for.my.contact.info> writes:
> >> On 9/10/2020 8:58 PM, Keith Thompson wrote:
> >[...]
> >>> Cast the argument to unsigned char:
> >>>
> >>> isdigit((unsigned char)*Ptr2)
> >>>
> >>> (Yes, it's annoying that you need to do this, and I consider it a
> >>> design flaw in the <ctype.h> functions, but there it is.)
> >>
> >> Fascinating. Yes, that's safer, because it forces the number to be
> >> in the [0,127] range. Thanks for the tip!
> >
> >No, it forces it to be in the [0,255] range (assuming CHAR_BIT==8).
> Presumably, Robbie was assuming that CHAR_BIT is 7.
>
> I don't know if any such machines exist.
>
They're not conforming. Of course hardware with 7-bit bytes could be built,
and probably has been in the dim and distant past. But C insists that
CHAR_BIT be at least 8, so you couldn't easily implement ANSI C on such
machines.
| From | Richard Damon <Richard@Damon-Family.org> |
|---|---|
| Date | 2020-09-11 08:09 -0400 |
| Message-ID | <NNJ6H.456501$AN2.205383@fx46.iad> |
| In reply to | #154852 |
On 9/11/20 1:49 AM, Robbie Hatley wrote:
>
> On 9/10/2020 8:58 PM, Keith Thompson wrote:
>
>> isdigit is very likely a macro in the implementation you're using.
>> The macro presumably works by subscripting into an array object.
>
> Ah. Well, that would explain the "array subscript is char" warning.
>
> But as to why anyone would implement it using arrays, I'm not seeing.
> Wouldn't it be faster and easier to do the following? This should
> work for both ASCII and UTF-8:
>
> int isdigit (int x)
> {
> if ( x > 47 && x < 58 )
> return 1;
> else
> return 0;
> }
>
> But perhaps there's some clever way to do it with a macro and
> an array that I'm not seeing. OK, I'm intrigued. Let's google
> "newlib isdigit source code" and see.
>
> Interesting! Uses an array and a bitwise-and:
>
> #undef isdigit
> int
> _DEFUN(isdigit,(c),int c)
> {
> return(__ctype_ptr__[c+1] & _N);
> }
>
>> The <ctype.h> functions require an argument of type int whose value must
>> be either EOF (almost certainly -1) or a value representable as an
>> unsigned char. For any other value, the behavior is undefined.
>>
>> A typical implementation might define an array of small integers indexed
>> from 0 to UCHAR_MAX+1, with the index value computed by subtracting 1
>> from the argument.
>>
>> Given that Ptr2 is of type char*, if plain char is signed and *Ptr2
>> happens to be negative (and not equal to EOF), then evaluating
>> isdigit(*Ptr2) has undefined behavior. Converting to int merely
>> masks the problem. Your code "works fine" if *Ptr2 happens to
>> be non-negative.
>>
>> Cast the argument to unsigned char:
>>
>> isdigit((unsigned char)*Ptr2)
>>
>> (Yes, it's annoying that you need to do this, and I consider it a
>> design flaw in the <ctype.h> functions, but there it is.)
>
> Fascinating. Yes, that's safer, because it forces the number to be
> in the [0,127] range. Thanks for the tip!
>
>
The big problem is that at the time the standard was forming, 'Unicode'
wasn't really a thing, and other languages were being handled with
things like code-pages and sometimes wide character sets. With
code-pages, arrays make a lot of sense for handling the classifications,
particularly for things like islower.
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2020-09-11 09:01 -0400 |
| Message-ID | <rjfseo$iks$1@dont-email.me> |
| In reply to | #154852 |
On 9/11/20 1:49 AM, Robbie Hatley wrote:
>
> On 9/10/2020 8:58 PM, Keith Thompson wrote:
>
>> isdigit is very likely a macro in the implementation you're using.
>> The macro presumably works by subscripting into an array object.
>
> Ah. Well, that would explain the "array subscript is char" warning.
>
> But as to why anyone would implement it using arrays, I'm not seeing.
> Wouldn't it be faster and easier to do the following? This should
> work for both ASCII and UTF-8:
>
> int isdigit (int x)
> {
> if ( x > 47 && x < 58 )
> return 1;
> else
> return 0;
> }
>
> But perhaps there's some clever way to do it with a macro and
> an array that I'm not seeing. OK, I'm intrigued. Let's google
> "newlib isdigit source code" and see.
>
> Interesting! Uses an array and a bitwise-and:
>
> #undef isdigit
> int
> _DEFUN(isdigit,(c),int c)
> {
> return(__ctype_ptr__[c+1] & _N);
> }
The key advantage of this approach is that a single array can be used
for all of the ctype functions, using a different value for _N for each
function. The mask used for isalnum can be the bit-wise or of the masks
for isalpha and isdigit. The mask used for isalpha can be the bit-wise
or of the masks for isupper and islower.
Furthermore, __ctype_ptr__ can be a pointer to an array. There's a
different array for each locale, and setlocale() simply changes the
value of that pointer.
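The scheme described above can be sketched roughly as follows. All names here (CT_UPPER, ct_table, my_isalnum and so on) are invented for illustration and are not newlib's. The sketch assumes EOF is -1 and an ASCII-style contiguous alphabet, fills the table for letters and digits only, and leaves out the per-locale pointer swap.

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* One bit per basic class (hypothetical values). */
enum { CT_UPPER = 1, CT_LOWER = 2, CT_DIGIT = 4 };

/* Slot 0 is for EOF (assumed to be -1); slot c+1 is for character c. */
static unsigned char ct_table[UCHAR_MAX + 2];
static int ct_ready = 0;

static void ct_init(void)
{
    int c;
    /* The letter loops assume 'A'..'Z' and 'a'..'z' are contiguous. */
    for (c = 'A'; c <= 'Z'; ++c) ct_table[c + 1] |= CT_UPPER;
    for (c = 'a'; c <= 'z'; ++c) ct_table[c + 1] |= CT_LOWER;
    for (c = '0'; c <= '9'; ++c) ct_table[c + 1] |= CT_DIGIT;
    ct_ready = 1;
}

/* c must be EOF or a value representable as unsigned char. */
static int ct_lookup(int c, int mask)
{
    if (!ct_ready)
        ct_init();
    return ct_table[c + 1] & mask;
}

/* Each classifier is the same lookup with a different mask; the
   composite classes just OR the basic masks together. */
int my_isdigit(int c) { return ct_lookup(c, CT_DIGIT); }
int my_isalpha(int c) { return ct_lookup(c, CT_UPPER | CT_LOWER); }
int my_isalnum(int c) { return ct_lookup(c, CT_UPPER | CT_LOWER | CT_DIGIT); }
```

Note that these return the masked bits, so a "true" result may be 1, 2 or 4 rather than 1. Supporting locales would mean replacing the fixed ct_table with a pointer that is redirected to a different table.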
| From | Kaz Kylheku <793-849-0957@kylheku.com> |
|---|---|
| Date | 2020-09-11 16:16 +0000 |
| Message-ID | <20200911090559.896@kylheku.com> |
| In reply to | #154852 |
On 2020-09-11, Robbie Hatley <see.my.signature@for.my.contact.info> wrote:
>
> On 9/10/2020 8:58 PM, Keith Thompson wrote:
>
>> isdigit is very likely a macro in the implementation you're using.
>> The macro presumably works by subscripting into an array object.
>
> Ah. Well, that would explain the "array subscript is char" warning.
>
> But as to why anyone would implement it using arrays, I'm not seeing.
> Wouldn't it be faster and easier to do the following? This should
> work for both ASCII and UTF-8:
The single-byte-character is*() routines are not concerned
at all with multi-byte encodings like UTF-8.
Most of these functions have locale-dependent behavior, so that
is one reason for table lookup. However, isdigit and isxdigit
are the exceptions; they are locale-independent.
> int isdigit (int x)
> {
> if ( x > 47 && x < 58 )
> return 1;
> else
> return 0;
> }
That is better from a "cache footprint" point of view.
On a machine that has a single instruction for testing whether
a value lies in a fixed range, it could translate to optimal code.
On other machines it will have some branching and testing requiring
branch prediction. (Pick your poison: failed cache lookups or failed
branch predictions.)
Note that there is a requirement which might not have been
mentioned which is that these functions also handle the value EOF,
which the above does.
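For what it's worth, a range test like the one quoted above is often written as a single unsigned comparison. A sketch, with the made-up name isdigit_range; it relies only on the C guarantee that '0'..'9' are contiguous, and it yields 0 for EOF or any other negative value.

```c
/* Hypothetical name; an illustration of the range test, not a
   replacement for the library's isdigit(). */
int isdigit_range(int c)
{
    /* If c is below '0' (including negative values such as EOF), the
       unsigned subtraction wraps around to a huge number, so a single
       comparison covers both ends of the range. */
    return (unsigned)c - (unsigned)'0' < 10u;
}
```

Whether a compiler turns this into better code than the two-sided test is implementation-specific; many optimizers already perform this transformation themselves.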
> #undef isdigit
> int
> _DEFUN(isdigit,(c),int c)
> {
> return(__ctype_ptr__[c+1] & _N);
> }
That's why you have this c+1 here; EOF is -1 on that implementation.
| From | dave_thompson_2@comcast.net |
|---|---|
| Date | 2020-10-11 16:12 -0400 |
| Subject | Re: Puzzling "array subscript has type char" warning. |
| Message-ID | <mlp6ofhh2jdc0ojt7qi6j3os6rr45pniel@4ax.com> |
| In reply to | #154851 |
On Thu, 10 Sep 2020 20:58:29 -0700, Keith Thompson
<Keith.S.Thompson+u@gmail.com> wrote:
...
> isdigit is very likely a macro in the implementation you're using.
> The macro presumably works by subscripting into an array object.
...
> The <ctype.h> functions require an argument of type int whose value must
> be either EOF (almost certainly -1) or a value representable as an
> unsigned char. For any other value, the behavior is undefined.
>
> A typical implementation might define an array of small integers indexed
> from 0 to UCHAR_MAX+1, with the index value computed by subtracting 1
> from the argument.
>
ADDING 1 to the argument (which is -1..UCHAR_MAX)
and then testing a bit, or sometimes a few bits, of the array element
which, more interestingly, is okay because isxx() are only required to
return nonzero or zero, not specifically one or zero as for the
builtin comparison/equality and logical operators.
(e-s made me change the subject, sorry)
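A practical consequence of the nonzero-or-zero rule: code that compares the result of isdigit() against 1 may fail on a table-based implementation. A small sketch, with the made-up helper name is_digit_bool, that normalizes the result to exactly 0 or 1:

```c
#include <ctype.h>

/* Hypothetical helper: collapses isdigit()'s "nonzero" result into
   exactly 1, and applies the unsigned char cast along the way. */
int is_digit_bool(char ch)
{
    return isdigit((unsigned char)ch) != 0;
}
```

Writing `if (isdigit(x))` is always fine; it is only `isdigit(x) == 1` and similar comparisons that depend on the exact return value.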
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2020-10-11 17:28 -0700 |
| Subject | Re: Puzzling "array subscript has type char" warning. |
| Message-ID | <87wnzw5lui.fsf@nosuchdomain.example.com> |
| In reply to | #155534 |
dave_thompson_2@comcast.net writes:
> On Thu, 10 Sep 2020 20:58:29 -0700, Keith Thompson
> <Keith.S.Thompson+u@gmail.com> wrote:
> ...
>> isdigit is very likely a macro in the implementation you're using.
>> The macro presumably works by subscripting into an array object.
> ...
>> The <ctype.h> functions require an argument of type int whose value must
>> be either EOF (almost certainly -1) or a value representable as an
>> unsigned char. For any other value, the behavior is undefined.
>>
>> A typical implementation might define an array of small integers indexed
>> from 0 to UCHAR_MAX+1, with the index value computed by subtracting 1
>> from the argument.
>>
> ADDING 1 to the argument (which is -1..UCHAR_MAX)
Yes, adding, thanks.
> and then testing a bit, or sometimes a few bits, of the array element
>
> which, more interestingly, is okay because isxx() are only required to
> return nonzero or zero, not specifically one or zero as for the
> builtin comparison/equality and logical operators.
>
> (e-s made me change the subject, sorry)
What is "e-s"? (The previous subject had a couple of non-ASCII
characters.)
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
| From | dave_thompson_2@comcast.net |
|---|---|
| Date | 2020-11-01 23:07 -0500 |
| Subject | Re: Puzzling "array subscript has type char" warning. |
| Message-ID | <b91vpfpmvvolngheki9k3qgnorso3d3m9c@4ax.com> |
| In reply to | #155549 |
On Sun, 11 Oct 2020 17:28:21 -0700, Keith Thompson
<Keith.S.Thompson+u@gmail.com> wrote:
> dave_thompson_2@comcast.net writes:
...
> > (e-s made me change the subject, sorry)
>
> What is "e-s"? (The previous subject had a couple of non-ASCII
> characters.)
The newsserver I use, eternal-september.org
which obviously detected what you point out, even though I failed to
notice at the time :-)
| From | jenniferjeson35@gmail.com |
|---|---|
| Date | 2020-11-06 20:35 -0800 |
| Message-ID | <56b885e0-3486-45c8-8550-4c96ef88fa8bo@googlegroups.com> |
| In reply to | #154849 |
On Friday, September 11, 2020 at 9:08:17 AM UTC+5:30, Robbie Hatley wrote:
> Ahoy, group. I just ran into a warning I don't understand.
> The code that generated the warning was this:
>
> else if (isdigit(*Ptr2))
> {
> *Ptr1++ = *Ptr2;
> FirstDigit = 1;
> }
>
> and the warning (from GCC) was this:
>
> integer-from-string.c:39:24: warning: array subscript has
> type ‘char’ [-Wchar-subscripts] 39 | else if (isdigit(*Ptr2))
>
> But there are no arrays being used! Ptr1 and Ptr2 are simple
> pointers-to-char. So why would GCC think that *Ptr2 is an
> array subscript?
>
> As I understand it, isdigit() (from "ctype.h") doesn't treat
> its argument as an array-subscript, but as an int character
> ordinal, and returns 1 if the character is a digit, otherwise
> returns 0.
>
> I was able to make the warning go away with a typecast to int:
>
> else if (isdigit((int)(*Ptr2)))
> {
> *Ptr1++ = *Ptr2;
> FirstDigit = 1;
> }
>
> But oddly, the code works fine with EITHER version. So I'm not
> understanding where the "array subscript" warning is coming
> from. Anyone have any ideas?
>
>
> --
> Cheers,
> Robbie Hatley
> Midway City, CA, USA
> perl -E 'say "\154o\156e\167o\154f\100w\145ll\56c\157m";'
> https://www.facebook.com/robbie.hatley
> https://people.well.com/user/lonewolf/