Groups > comp.lang.c > #154849 > unrolled thread

Puzzling "array subscript has type ‘char’" warning.

Started by: Robbie Hatley <see.my.signature@for.my.contact.info>
First post: 2020-09-10 20:37 -0700
Last post: 2020-11-06 20:35 -0800
Articles: 13 (9 participants)



Contents

  Puzzling "array subscript has type ‘char’" warning. Robbie Hatley <see.my.signature@for.my.contact.info> - 2020-09-10 20:37 -0700
    Re: Puzzling "array subscript has type ‘char’" warning. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-09-10 20:58 -0700
      Re: Puzzling "array subscript has type ‘char’" warning. Robbie Hatley <see.my.signature@for.my.contact.info> - 2020-09-10 22:49 -0700
        Re: Puzzling "array subscript has type ‘char’" warning. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-09-11 01:10 -0700
          Re: Puzzling "array subscript has type ‘char’" warning. gazelle@shell.xmission.com (Kenny McCormack) - 2020-09-11 13:09 +0000
            Re: Puzzling "array subscript has type ‘char’" warning. Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2020-09-11 06:14 -0700
        Re: Puzzling "array subscript has type ‘char’" warning. Richard Damon <Richard@Damon-Family.org> - 2020-09-11 08:09 -0400
        Re: Puzzling "array subscript has type ‘char’" warning. James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-09-11 09:01 -0400
        Re: Puzzling "array subscript has type ‘char’" warning. Kaz Kylheku <793-849-0957@kylheku.com> - 2020-09-11 16:16 +0000
      Re: Puzzling "array subscript has type char" warning. dave_thompson_2@comcast.net - 2020-10-11 16:12 -0400
        Re: Puzzling "array subscript has type char" warning. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 17:28 -0700
          Re: Puzzling "array subscript has type char" warning. dave_thompson_2@comcast.net - 2020-11-01 23:07 -0500
    Re: Puzzling "array subscript has type ‘char’" warning. jenniferjeson35@gmail.com - 2020-11-06 20:35 -0800

#154849 — Puzzling "array subscript has type ‘char’" warning.

From: Robbie Hatley <see.my.signature@for.my.contact.info>
Date: 2020-09-10 20:37 -0700
Subject: Puzzling "array subscript has type ‘char’" warning.
Message-ID: <e7Cdncm5EpqKbMfCnZ2dnUU7_83NnZ2d@giganews.com>

Ahoy, group. I just ran into a warning I don't understand.
The code that generated the warning was this:

else if (isdigit(*Ptr2))
{
   *Ptr1++ = *Ptr2;
   FirstDigit = 1;
}

and the warning (from GCC) was this:

integer-from-string.c:39:24: warning: array subscript has type ‘char’ [-Wchar-subscripts]
   39 | else if (isdigit(*Ptr2))

But there are no arrays being used! Ptr1 and Ptr2 are simple
pointers-to-char. So why would GCC think that *Ptr2 is an
array subscript?

As I understand it, isdigit() (from "ctype.h") doesn't treat
its argument as an array-subscript, but as an int character
ordinal, and returns 1 if the character is a digit, otherwise
returns 0.

I was able to make the warning go away with a typecast to int:

else if (isdigit((int)(*Ptr2)))
{
   *Ptr1++ = *Ptr2;
   FirstDigit = 1;
}

But oddly, the code works fine with EITHER version. So I'm not
understanding where the "array subscript" warning is coming
from. Anyone have any ideas?


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -E 'say "\154o\156e\167o\154f\100w\145ll\56c\157m";'
https://www.facebook.com/robbie.hatley
https://people.well.com/user/lonewolf/



#154851

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2020-09-10 20:58 -0700
Message-ID: <87lfhhge1m.fsf@nosuchdomain.example.com>
In reply to: #154849

Robbie Hatley <see.my.signature@for.my.contact.info> writes:
> Ahoy, group. I just ran into a warning I don't understand.
> The code that generated the warning was this:
>
> else if (isdigit(*Ptr2))
> {
>    *Ptr1++ = *Ptr2;
>    FirstDigit = 1;
> }
>
> and the warning (from GCC) was this:
>
> integer-from-string.c:39:24: warning: array subscript has
> type ‘char’ [-Wchar-subscripts] 39 | else if (isdigit(*Ptr2))
>
> But there are no arrays being used! Ptr1 and Ptr2 are simple
> pointers-to-char. So why would GCC think that *Ptr2 is an
> array subscript?
>
> As I understand it, isdigit() (from "ctype.h") doesn't treat
> its argument as an array-subscript, but as an int character
> ordinal, and returns 1 if the character is a digit, otherwise
> returns 0.

isdigit is very likely a macro in the implementation you're using.
The macro presumably works by subscripting into an array object.

> I was able to make the warning go away with a typecast to int:
>
> else if (isdigit((int)(*Ptr2)))
> {
>    *Ptr1++ = *Ptr2;
>    FirstDigit = 1;
> }
>
> But oddly, the code works fine with EITHER version. So I'm not
> understanding where the "array subscript" warning is coming
> from. Anyone have any ideas?

The <ctype.h> functions require an argument of type int whose value must
be either EOF (almost certainly -1) or a value representable as an
unsigned char.  For any other value, the behavior is undefined.

A typical implementation might define an array of small integers indexed
from 0 to UCHAR_MAX+1, with the index value computed by subtracting 1
from the argument.

Given that Ptr2 is of type char*, if plain char is signed and *Ptr2
happens to be negative (and not equal to EOF), then evaluating
isdigit(*Ptr2) has undefined behavior.  Converting to int merely
masks the problem.  Your code "works fine" if *Ptr2 happens to
be non-negative.

Cast the argument to unsigned char:

    isdigit((unsigned char)*Ptr2)

(Yes, it's annoying that you need to do this, and I consider it a
design flaw in the <ctype.h> functions, but there it is.)
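
A minimal sketch of that cast in context, assuming a hypothetical helper
named count_digits (it is not part of the original code):

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the decimal digits in a NUL-terminated
   string.  The cast to unsigned char keeps the argument inside the range
   that isdigit() accepts, even where plain char is signed and *s is
   negative. */
size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (isdigit((unsigned char)*s))
            ++n;
    }
    return n;
}
```

Calling count_digits on a string that contains bytes above 127 is then
well defined, whichever signedness plain char has.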

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */



#154852

From: Robbie Hatley <see.my.signature@for.my.contact.info>
Date: 2020-09-10 22:49 -0700
Message-ID: <1eidnfKeDO6cjcbCnZ2dnUU7_8zNnZ2d@giganews.com>
In reply to: #154851

On 9/10/2020 8:58 PM, Keith Thompson wrote:

> isdigit is very likely a macro in the implementation you're using.
> The macro presumably works by subscripting into an array object.

Ah. Well, that would explain the "array subscript is char" warning.

But as to why anyone would implement it using arrays, I'm not seeing.
Wouldn't it be faster and easier to do the following? This should
work for both ASCII and UTF-8:

int isdigit (int x)
{
   if ( x > 47 && x < 58 )
      return 1;
   else
      return 0;
}

But perhaps there's some clever way to do it with a macro and
an array that I'm not seeing. OK, I'm intrigued. Let's google
"newlib isdigit source code" and see.

Interesting! Uses an array and a bitwise-and:

#undef isdigit
int
_DEFUN(isdigit,(c),int c)
{
   return(__ctype_ptr__[c+1] & _N);
}

> The <ctype.h> functions require an argument of type int whose value must
> be either EOF (almost certainly -1) or a value representable as an
> unsigned char.  For any other value, the behavior is undefined.
> 
> A typical implementation might define an array of small integers indexed
> from 0 to UCHAR_MAX+1, with the index value computed by subtracting 1
> from the argument.
> 
> Given that Ptr2 is of type char*, if plain char is signed and *Ptr2
> happens to be negative (and not equal to EOF), then evaluating
> isdigit(*Ptr2) has undefined behavior.  Converting to int merely
> masks the problem.  Your code "works fine" if *Ptr2 happens to
> be non-negative.
> 
> Cast the argument to unsigned char:
> 
>     isdigit((unsigned char)*Ptr2)
> 
> (Yes, it's annoying that you need to do this, and I consider it a
> design flaw in the <ctype.h> functions, but there it is.)

Fascinating. Yes, that's safer, because it forces the number to be
in the [0,127] range. Thanks for the tip!


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -E 'say "\154o\156e\167o\154f\100w\145ll\56c\157m";'
https://www.facebook.com/robbie.hatley
https://people.well.com/user/lonewolf/



#154855

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2020-09-11 01:10 -0700
Message-ID: <87h7s4hgyb.fsf@nosuchdomain.example.com>
In reply to: #154852

Robbie Hatley <see.my.signature@for.my.contact.info> writes:
> On 9/10/2020 8:58 PM, Keith Thompson wrote:
[...]
>> Cast the argument to unsigned char:
>> 
>>     isdigit((unsigned char)*Ptr2)
>> 
>> (Yes, it's annoying that you need to do this, and I consider it a
>> design flaw in the <ctype.h> functions, but there it is.)
>
> Fascinating. Yes, that's safer, because it forces the number to be
> in the [0,127] range. Thanks for the tip!

No, it forces it to be in the [0,255] range (assuming CHAR_BIT==8).
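
As a small sketch of what the conversion does (as_ctype_arg is a
hypothetical name, used here only for illustration):

```c
#include <limits.h>

/* Converting to unsigned char reduces the value modulo UCHAR_MAX + 1, so
   the result always lies in [0, UCHAR_MAX], which is [0, 255] when
   CHAR_BIT is 8.  A negative char such as -23 comes out as 233. */
int as_ctype_arg(char c)
{
    return (unsigned char)c;
}
```

Values that were already non-negative pass through unchanged; only the
negative ones are moved into the upper half of the range.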

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */



#154862

From: gazelle@shell.xmission.com (Kenny McCormack)
Date: 2020-09-11 13:09 +0000
Message-ID: <rjfstq$2s0t$1@news.xmission.com>
In reply to: #154855

In article <87h7s4hgyb.fsf@nosuchdomain.example.com>,
Keith Thompson  <Keith.S.Thompson+u@gmail.com> wrote:
>Robbie Hatley <see.my.signature@for.my.contact.info> writes:
>> On 9/10/2020 8:58 PM, Keith Thompson wrote:
>[...]
>>> Cast the argument to unsigned char:
>>> 
>>>     isdigit((unsigned char)*Ptr2)
>>> 
>>> (Yes, it's annoying that you need to do this, and I consider it a
>>> design flaw in the <ctype.h> functions, but there it is.)
>>
>> Fascinating. Yes, that's safer, because it forces the number to be
>> in the [0,127] range. Thanks for the tip!
>
>No, it forces it to be in the [0,255] range (assuming CHAR_BIT==8).

Presumably, Robbie was assuming that CHAR_BIT is 7.

I don't know if any such machines exist.

-- 
Donald Drumpf claims to be "the least racist person you'll ever meet".

This would be true if the only other person you've ever met was David Duke.



#154863

From: Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date: 2020-09-11 06:14 -0700
Message-ID: <4bd9068a-68cd-42d1-bc0e-f0bcd5fff7b8n@googlegroups.com>
In reply to: #154862

On Friday, 11 September 2020 at 14:09:31 UTC+1, Kenny McCormack wrote:
> In article <87h7s4h...@nosuchdomain.example.com>,
> Keith Thompson <Keith.S.T...@gmail.com> wrote: 
> >Robbie Hatley <see.my.s...@for.my.contact.info> writes: 
> >> On 9/10/2020 8:58 PM, Keith Thompson wrote: 
> >[...] 
> >>> Cast the argument to unsigned char: 
> >>> 
> >>> isdigit((unsigned char)*Ptr2)
> >>> 
> >>> (Yes, it's annoying that you need to do this, and I consider it a
> >>> design flaw in the <ctype.h> functions, but there it is.)
> >> 
> >> Fascinating. Yes, that's safer, because it forces the number to be 
> >> in the [0,127] range. Thanks for the tip! 
> > 
> >No, it forces it to be in the [0,255] range (assuming CHAR_BIT==8).
> Presumably, Robbie was assuming that CHAR_BIT is 7. 
> 
> I don't know if any such machines exist. 
> 
They're not conforming. Of course hardware with 7 bit bytes could be built,
probably has been in the dim and distant past. But C insists that CHAR_BIT
be at least 8, so you couldn't easily implement ANSI C on such machines.



#154860

From: Richard Damon <Richard@Damon-Family.org>
Date: 2020-09-11 08:09 -0400
Message-ID: <NNJ6H.456501$AN2.205383@fx46.iad>
In reply to: #154852

On 9/11/20 1:49 AM, Robbie Hatley wrote:
> 
> On 9/10/2020 8:58 PM, Keith Thompson wrote:
> 
>> isdigit is very likely a macro in the implementation you're using.
>> The macro presumably works by subscripting into an array object.
> 
> Ah. Well, that would explain the "array subscript is char" warning.
> 
> But as to why anyone would implement it using arrays, I'm not seeing.
> Wouldn't it be faster and easier to do the following? This should
> work for both ASCII and UTF-8:
> 
> int isdigit (int x)
> {
>    if ( x > 47 && x < 58 )
>       return 1;
>    else
>       return 0;
> }
> 
> But perhaps there's some clever way to do it with a macro and
> an array that I'm not seeing. OK, I'm intrigued. Let's google
> "newlib isdigit source code" and see.
> 
> Interesting! Uses an array and a bitwise-and:
> 
> #undef isdigit
> int
> _DEFUN(isdigit,(c),int c)
> {
>    return(__ctype_ptr__[c+1] & _N);
> }
> 
>> The <ctype.h> functions require an argument of type int whose value must
>> be either EOF (almost certainly -1) or a value representable as an
>> unsigned char.  For any other value, the behavior is undefined.
>>
>> A typical implementation might define an array of small integers indexed
>> from 0 to UCHAR_MAX+1, with the index value computed by subtracting 1
>> from the argument.
>>
>> Given that Ptr2 is of type char*, if plain char is signed and *Ptr2
>> happens to be negative (and not equal to EOF), then evaluating
>> isdigit(*Ptr2) has undefined behavior.  Converting to int merely
>> masks the problem.  Your code "works fine" if *Ptr2 happens to
>> be non-negative.
>>
>> Cast the argument to unsigned char:
>>
>>     isdigit((unsigned char)*Ptr2)
>>
>> (Yes, it's annoying that you need to do this, and I consider it a
>> design flaw in the <ctype.h> functions, but there it is.)
> 
> Fascinating. Yes, that's safer, because it forces the number to be
> in the [0,127] range. Thanks for the tip!
> 
> 
The big problem is that at the time the standard was forming, 'Unicode'
wasn't really a thing, and other languages were being handled with
things like code-pages and sometimes wide character sets. With
code-pages, arrays make a lot of sense for handling the classifications,
particularly for things like islower.



#154861

From: James Kuyper <jameskuyper@alumni.caltech.edu>
Date: 2020-09-11 09:01 -0400
Message-ID: <rjfseo$iks$1@dont-email.me>
In reply to: #154852

On 9/11/20 1:49 AM, Robbie Hatley wrote:
> 
> On 9/10/2020 8:58 PM, Keith Thompson wrote:
> 
>> isdigit is very likely a macro in the implementation you're using.
>> The macro presumably works by subscripting into an array object.
> 
> Ah. Well, that would explain the "array subscript is char" warning.
> 
> But as to why anyone would implement it using arrays, I'm not seeing.
> Wouldn't it be faster and easier to do the following? This should
> work for both ASCII and UTF-8:
> 
> int isdigit (int x)
> {
>    if ( x > 47 && x < 58 )
>       return 1;
>    else
>       return 0;
> }
> 
> But perhaps there's some clever way to do it with a macro and
> an array that I'm not seeing. OK, I'm intrigued. Let's google
> "newlib isdigit source code" and see.
> 
> Interesting! Uses an array and a bitwise-and:
> 
> #undef isdigit
> int
> _DEFUN(isdigit,(c),int c)
> {
>    return(__ctype_ptr__[c+1] & _N);
> }

The key advantage of this approach is that a single array can be used
for all of the ctype functions, using a different value for _N for each
function. The mask used for isalnum can be the bit-wise or of the masks
for isalpha and isdigit. The mask used for isalpha can be the bit-wise
or of the masks for isupper and islower.

Furthermore, __ctype_ptr__ can be a pointer to an array. There's a
different array for each locale, and setlocale() simply changes the
value of that pointer.
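
A rough sketch of that scheme, with made-up names: my_c_table,
my_ctype_ptr, my_ctype_init, the MY_* masks and the my_is* functions are
all hypothetical, and EOF is assumed to be -1. This is not newlib's actual
code, only an illustration of the idea:

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* Hypothetical class bits; a real library picks its own names and values. */
enum { MY_UPPER = 1, MY_LOWER = 2, MY_DIGIT = 4 };
#define MY_ALPHA (MY_UPPER | MY_LOWER)
#define MY_ALNUM (MY_ALPHA | MY_DIGIT)

/* Slot 0 is for EOF (assumed to be -1); slots 1..UCHAR_MAX+1 are for
   character values 0..UCHAR_MAX. */
static unsigned char my_c_table[UCHAR_MAX + 2];

/* A locale switch would only need to repoint this at another table. */
static const unsigned char *my_ctype_ptr = my_c_table;

/* Fills the table for the digits and the 26 basic Latin letters. */
void my_ctype_init(void)
{
    const char *p;
    for (p = "0123456789"; *p != '\0'; ++p)
        my_c_table[(unsigned char)*p + 1] |= MY_DIGIT;
    for (p = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; *p != '\0'; ++p)
        my_c_table[(unsigned char)*p + 1] |= MY_UPPER;
    for (p = "abcdefghijklmnopqrstuvwxyz"; *p != '\0'; ++p)
        my_c_table[(unsigned char)*p + 1] |= MY_LOWER;
}

/* Each classifier is the same lookup with a different mask. */
int my_isdigit(int c) { return my_ctype_ptr[c + 1] & MY_DIGIT; }
int my_isalpha(int c) { return my_ctype_ptr[c + 1] & MY_ALPHA; }
int my_isalnum(int c) { return my_ctype_ptr[c + 1] & MY_ALNUM; }
```

After a call to my_ctype_init(), every classifier costs one load and one
bitwise AND, and the EOF slot stays zero so EOF belongs to no class.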



#154869

From: Kaz Kylheku <793-849-0957@kylheku.com>
Date: 2020-09-11 16:16 +0000
Message-ID: <20200911090559.896@kylheku.com>
In reply to: #154852

On 2020-09-11, Robbie Hatley <see.my.signature@for.my.contact.info> wrote:
>
> On 9/10/2020 8:58 PM, Keith Thompson wrote:
>
>> isdigit is very likely a macro in the implementation you're using.
>> The macro presumably works by subscripting into an array object.
>
> Ah. Well, that would explain the "array subscript is char" warning.
>
> But as to why anyone would implement it using arrays, I'm not seeing.
> Wouldn't it be faster and easier to do the following? This should
> work for both ASCII and UTF-8:

The single-byte-character is*() routines are not concerned
at all with multi-byte encodings like UTF-8.

Most of these functions have locale-dependent behavior, so that
is one reason for table lookup.  However, isdigit and isxdigit
are the exceptions; they are locale-independent.

> int isdigit (int x)
> {
>    if ( x > 47 && x < 58 )
>       return 1;
>    else
>       return 0;
> }

That is better from a "cache footprint" point of view.

On a machine that has a single instruction for testing whether
a value lies in a fixed range, it could translate to optimal code.

On other machines it will have some branching and testing requiring
branch prediction. (Pick your poison: failed cache lookups or failed
branch predictions.)

Note that there is a requirement which might not have been
mentioned which is that these functions also handle the value EOF,
which the above does.
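
A portable spelling of the same range check, as a sketch (range_isdigit is
a hypothetical name). It relies only on the C guarantee that '0' through
'9' have consecutive values, and any negative value such as EOF simply
fails the first comparison:

```c
#include <stdio.h>  /* EOF */

/* Range-check variant: no table lookup.  Character constants replace the
   magic numbers 47 and 58, so the test does not depend on ASCII. */
int range_isdigit(int c)
{
    return c >= '0' && c <= '9';
}
```

Unlike the table version, this one happens to return exactly 1 or 0,
because that is what the && operator yields.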

> #undef isdigit
> int
> _DEFUN(isdigit,(c),int c)
> {
>    return(__ctype_ptr__[c+1] & _N);
> }

That's why you have this c+1 here; EOF is -1 on that implementation.



#155534 — Re: Puzzling "array subscript has type char" warning.

From: dave_thompson_2@comcast.net
Date: 2020-10-11 16:12 -0400
Subject: Re: Puzzling "array subscript has type char" warning.
Message-ID: <mlp6ofhh2jdc0ojt7qi6j3os6rr45pniel@4ax.com>
In reply to: #154851

On Thu, 10 Sep 2020 20:58:29 -0700, Keith Thompson
<Keith.S.Thompson+u@gmail.com> wrote:
...
> isdigit is very likely a macro in the implementation you're using.
> The macro presumably works by subscripting into an array object.
...
> The <ctype.h> functions require an argument of type int whose value must
> be either EOF (almost certainly -1) or a value representable as an
> unsigned char.  For any other value, the behavior is undefined.
> 
> A typical implementation might define an array of small integers indexed
> from 0 to UCHAR_MAX+1, with the index value computed by subtracting 1
> from the argument.
> 
ADDING 1 to the argument (which is -1..UCHAR_MAX)

and then testing a bit, or sometimes a few bits, of the array element

which, more interestingly, is okay because isxx() are only required to
return nonzero or zero, not specifically one or zero as for the
builtin comparison/equality and logical operators.
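
For code that really does need exactly one or zero, a sketch of the usual
normalization (is_digit01 is a hypothetical name):

```c
#include <ctype.h>

/* isdigit() may return any nonzero value for a digit, for example a mask
   bit taken straight from the table.  Comparing against 0 turns that into
   exactly 1 or 0. */
int is_digit01(char c)
{
    return isdigit((unsigned char)c) != 0;
}
```

Writing isdigit(x) == 1 instead would be a bug on any implementation that
returns the raw mask bit.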

(e-s made me change the subject, sorry)



#155549 — Re: Puzzling "array subscript has type char" warning.

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2020-10-11 17:28 -0700
Subject: Re: Puzzling "array subscript has type char" warning.
Message-ID: <87wnzw5lui.fsf@nosuchdomain.example.com>
In reply to: #155534

dave_thompson_2@comcast.net writes:
> On Thu, 10 Sep 2020 20:58:29 -0700, Keith Thompson
> <Keith.S.Thompson+u@gmail.com> wrote:
> ...
>> isdigit is very likely a macro in the implementation you're using.
>> The macro presumably works by subscripting into an array object.
> ...
>> The <ctype.h> functions require an argument of type int whose value must
>> be either EOF (almost certainly -1) or a value representable as an
>> unsigned char.  For any other value, the behavior is undefined.
>> 
>> A typical implementation might define an array of small integers indexed
>> from 0 to UCHAR_MAX+1, with the index value computed by subtracting 1
>> from the argument.
>> 
> ADDING 1 to the argument (which is -1..UCHAR_MAX)

Yes, adding, thanks.

> and then testing a bit, or sometimes a few bits, of the array element
>
> which, more interestingly, is okay because isxx() are only required to
> return nonzero or zero, not specifically one or zero as for the
> builtin comparison/equality and logical operators.
>
> (e-s made me change the subject, sorry)

What is "e-s"?  (The previous subject had a couple of non-ASCII
characters.)

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */



#156340 — Re: Puzzling "array subscript has type char" warning.

From: dave_thompson_2@comcast.net
Date: 2020-11-01 23:07 -0500
Subject: Re: Puzzling "array subscript has type char" warning.
Message-ID: <b91vpfpmvvolngheki9k3qgnorso3d3m9c@4ax.com>
In reply to: #155549

On Sun, 11 Oct 2020 17:28:21 -0700, Keith Thompson
<Keith.S.Thompson+u@gmail.com> wrote:

> dave_thompson_2@comcast.net writes:
...
> > (e-s made me change the subject, sorry)
> 
> What is "e-s"?  (The previous subject had a couple of non-ASCII
> characters.)

The newsserver I use, eternal-september.org 

which obviously detected what you point out, even though I failed to
notice at the time :-)



#156534

From: jenniferjeson35@gmail.com
Date: 2020-11-06 20:35 -0800
Message-ID: <56b885e0-3486-45c8-8550-4c96ef88fa8bo@googlegroups.com>
In reply to: #154849

On Friday, September 11, 2020 at 9:08:17 AM UTC+5:30, Robbie Hatley wrote:
> Ahoy, group. I just ran into a warning I don't understand.
> The code that generated the warning was this:
> 
> else if (isdigit(*Ptr2))
> {
>    *Ptr1++ = *Ptr2;
>    FirstDigit = 1;
> }
> 
> and the warning (from GCC) was this:
> 
> integer-from-string.c:39:24: warning: array subscript has
> type ‘char’ [-Wchar-subscripts] 39 | else if (isdigit(*Ptr2))
> 
> But there are no arrays being used! Ptr1 and Ptr2 are simple
> pointers-to-char. So why would GCC think that *Ptr2 is an
> array subscript?
> 
> As I understand it, isdigit() (from "ctype.h") doesn't treat
> its argument as an array-subscript, but as an int character
> ordinal, and returns 1 if the character is a digit, otherwise
> returns 0.
> 
> I was able to make the warning go away with a typecast to int:
> 
> else if (isdigit((int)(*Ptr2)))
> {
>    *Ptr1++ = *Ptr2;
>    FirstDigit = 1;
> }
> 
> But oddly, the code works fine with EITHER version. So I'm not
> understanding where the "array subscript" warning is coming
> from. Anyone have any ideas?
> 
> 
> -- 
> Cheers,
> Robbie Hatley
> Midway City, CA, USA
> perl -E 'say "\154o\156e\167o\154f\100w\145ll\56c\157m";'
> https://www.facebook.com/robbie.hatley
> https://people.well.com/user/lonewolf/


