
Can we lie to memchr?

Started by: Kaz Kylheku <864-117-4973@kylheku.com>
First post: 2023-09-03 17:59 +0000
Last post: 2023-09-03 19:30 +0100
Articles: 4 (3 participants)



Contents

  Can we lie to memchr? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-09-03 17:59 +0000
    Re: Can we lie to memchr? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-03 11:22 -0700
      Re: Can we lie to memchr? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-09-03 18:58 +0000
    Re: Can we lie to memchr? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-09-03 19:30 +0100

#173802 — Can we lie to memchr?

From: Kaz Kylheku <864-117-4973@kylheku.com>
Date: 2023-09-03 17:59 +0000
Subject: Can we lie to memchr?
Message-ID: <20230903104255.310@kylheku.com>

You would think that memchr can be used to test whether a string is
longer than N without traversing it. For instance we can take a
gigabyte-long character string and efficiently test whether it is
shorter than 10 characters:

  memchr(gigastr, 0, 10) == 0

If no null is found within the first 10 bytes, the expression is true and
the length is 10 or more.
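
As a minimal sketch of that idiom, wrapped in a helper (the name
shorter_than is hypothetical, chosen only for illustration):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if the string s has fewer than n
   characters, i.e. a terminating null lies within the first n bytes.
   Assumption: this sketch takes for granted that memchr may be handed
   a count n larger than the object holding s, which is exactly the
   point in question. */
static bool shorter_than(const char *s, size_t n)
{
    return memchr(s, 0, n) != NULL;
}
```

With a buffer of at least n bytes there is no doubt about it: for
char buf[16] = "hello", shorter_than(buf, 10) is true.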

But suppose a 7 byte string is passed (length 6).

That *object* is smaller than n; it does not have an "initial sequence
of n characters" for memchr to search.

ISO C doesn't say that bytes of the initial sequence which are
beyond the sought-after value shall not be accessed by memchr.

For instance, for shits and giggles, memchr could perform a
right-to-left scan, and report the most recently found, hence
leftmost, occurrence of the value.

Or it could assume it can load an 8 byte word from the start of the object
(even if unaligned), since that lies within 10. Yet that 8 could extend
into an unmapped page.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#173803

From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Date: 2023-09-03 11:22 -0700
Message-ID: <86r0nfqfql.fsf@linuxsc.com>
In reply to: #173802

Kaz Kylheku <864-117-4973@kylheku.com> writes:

> You would think that memchr can be used to test whether a string is
> longer than N without traversing it.  For instance we can take a
> gigabyte-long character string and efficiently test whether it is
> shorter than 10 characters:
>
>   memchr(gigastr, 0, 10) == 0
>
> If no null is found within the first 10 bytes, the expression is true and
> the length is 10 or more.
>
> But suppose a 7 byte string is passed (length 6).
>
> That *object* is smaller than n;  it does not have an "initial sequence
> of n characters" for memchr to search.
>
> ISO C doesn't say that bytes of the initial sequence which are
> beyond the sought-after value shall not be accessed by memchr.  [...]

It does, and has for more than 10 years.



#173810

From: Kaz Kylheku <864-117-4973@kylheku.com>
Date: 2023-09-03 18:58 +0000
Message-ID: <20230903115634.349@kylheku.com>
In reply to: #173803

On 2023-09-03, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> Kaz Kylheku <864-117-4973@kylheku.com> writes:
>> ISO C doesn't say that bytes of the initial sequence which are
>> beyond the sought-after value shall not be accessed by memchr.  [...]
>
> It does, and has for more than 10 years.

Thanks, Tim, and also Ben.

I looked in the wrong tab of the PDF reader, where I have C99 open!




#173804

From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Date: 2023-09-03 19:30 +0100
Message-ID: <87fs3vw1n7.fsf@bsb.me.uk>
In reply to: #173802

Kaz Kylheku <864-117-4973@kylheku.com> writes:

> You would think that memchr can be used to test whether a string is
> longer than N without traversing it. For instance we can take a
> gigabyte-long character string and efficiently test whether it is
> shorter than 10 characters:
>
>   memchr(gigastr, 0, 10) == 0
>
> If no null is found within the first 10 bytes, the expression is true and
> the length is 10 or more.
>
> But suppose a 7 byte string is passed (length 6).
>
> That *object* is smaller than n; it does not have an "initial sequence
> of n characters" for memchr to search.
>
> ISO C doesn't say that bytes of the initial sequence which are
> beyond the sought-after value shall not be accessed by memchr.

Well, it does say that

  "The implementation shall behave as if it reads the characters sequentially
  and stops as soon as a matching character is found."

> For instance, for shits and giggles, memchr could perform a
> right-to-left scan, and report the most recently found, hence
> leftmost, occurrence of the value.
>
> Or it could assume it can load an 8 byte word from the start of the object
> (even if unaligned), since that lies within 10. Yet that 8 could extend
> into an unmapped page.

Only if the behaviour is consistent with the above quote, so anything
going wrong as a result of looking beyond the first occurrence is, I
think, ruled out.
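
Put differently, as I read the quoted sentence, memchr(s, 0, 10) on a
7-byte array holding "abcdef" has to behave as if it stopped at s[6].
For an implementation that only promises C99, a plain loop gives the
same answer without leaning on that sentence. A sketch, with a
hypothetical helper name that is not from the standard or this thread:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the string s has fewer than n
   characters. It reads one byte at a time, left to right, and stops at
   the terminating null, so it never touches bytes beyond the string
   even when the object is smaller than n bytes. */
static bool shorter_than_loop(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '\0')
            return true;
    }
    return false;
}
```

For char s7[7] = "abcdef", shorter_than_loop(s7, 10) returns true after
reading only the 7 bytes of the array. The price is giving up whatever
optimised scan the library's memchr might use.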

-- 
Ben.


