How many wide characters may mbstowcs store?

Started by	Philipp Klaus Krause <pkk@spth.de>
First post	2020-05-11 13:30 +0200
Last post	2020-06-28 06:32 -0700
Articles	16 on this page of 76 — 16 participants

  How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 13:30 +0200
    Re: How many wide characters may mbstowcs store? Manfred <noname@add.invalid> - 2020-05-11 13:55 +0200
      Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 14:01 +0200
        Re: How many wide characters may mbstowcs store? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-05-11 09:08 -0400
          Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 15:19 +0200
          Re: How many wide characters may mbstowcs store? Manfred <noname@add.invalid> - 2020-05-11 18:32 +0200
          Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 18:59 +0200
            Re: How many wide characters may mbstowcs store? scott@slp53.sl.home (Scott Lurndal) - 2020-05-11 17:42 +0000
              Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 20:30 +0200
              Re: How many wide characters may mbstowcs store? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-05-12 00:29 -0400
                Re: How many wide characters may mbstowcs store? Manfred <noname@add.invalid> - 2020-05-12 14:41 +0200
                Re: How many wide characters may mbstowcs store? Bonita Montero <Bonita.Montero@gmail.com> - 2020-05-12 16:19 +0200
    Re: How many wide characters may mbstowcs store? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2020-05-11 13:03 +0100
      Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 14:07 +0200
    Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 15:20 +0200
      Re: How many wide characters may mbstowcs store? Bonita Montero <Bonita.Montero@gmail.com> - 2020-05-11 16:31 +0200
      Re: How many wide characters may mbstowcs store? Barry Schwarz <schwarzb@delq.com> - 2020-05-11 10:06 -0700
    Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 15:58 +0200
      Re: How many wide characters may mbstowcs store? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-05-11 10:24 -0400
      Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 16:52 +0200
        Re: How many wide characters may mbstowcs store? Manfred <noname@add.invalid> - 2020-05-11 18:55 +0200
      Re: How many wide characters may mbstowcs store? richard@cogsci.ed.ac.uk (Richard Tobin) - 2020-05-11 15:51 +0000
        Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 19:01 +0200
          Re: How many wide characters may mbstowcs store? scott@slp53.sl.home (Scott Lurndal) - 2020-05-11 17:33 +0000
            Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 20:57 +0200
              Re: How many wide characters may mbstowcs store? scott@slp53.sl.home (Scott Lurndal) - 2020-05-11 19:17 +0000
                Re: How many wide characters may mbstowcs store? richard@cogsci.ed.ac.uk (Richard Tobin) - 2020-05-11 19:41 +0000
                  Re: How many wide characters may mbstowcs store? scott@slp53.sl.home (Scott Lurndal) - 2020-05-11 20:01 +0000
              Re: How many wide characters may mbstowcs store? scott@slp53.sl.home (Scott Lurndal) - 2020-05-11 19:19 +0000
      Re: How many wide characters may mbstowcs store? Florian Weimer <fw@deneb.enyo.de> - 2020-05-11 20:24 +0200
        Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 20:59 +0200
          Re: How many wide characters may mbstowcs store? scott@slp53.sl.home (Scott Lurndal) - 2020-05-11 19:17 +0000
            Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 21:24 +0200
          Re: How many wide characters may mbstowcs store? Florian Weimer <fw@deneb.enyo.de> - 2020-05-11 22:30 +0200
    Re: How many wide characters may mbstowcs store? Bonita Montero <Bonita.Montero@gmail.com> - 2020-05-11 16:44 +0200
      Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 16:54 +0200
        Re: How many wide characters may mbstowcs store? Bonita Montero <Bonita.Montero@gmail.com> - 2020-05-11 16:57 +0200
          Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 17:07 +0200
            Re: How many wide characters may mbstowcs store? Bonita Montero <Bonita.Montero@gmail.com> - 2020-05-11 17:08 +0200
            Re: How many wide characters may mbstowcs store? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-05-11 11:25 -0400
        Re: How many wide characters may mbstowcs store? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-05-11 09:06 -0700
          Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 19:05 +0200
            Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 19:19 +0200
              Re: How many wide characters may mbstowcs store? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-05-23 07:51 -0700
                Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-23 20:27 +0200
                  Re: How many wide characters may mbstowcs store? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-05-23 14:25 -0700
                    Re: How many wide characters may mbstowcs store? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-05-26 07:09 -0700
                  Re: How many wide characters may mbstowcs store? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-05-26 07:14 -0700
                    Re: How many wide characters may mbstowcs store? Spiros Bousbouras <spibou@gmail.com> - 2020-05-26 16:00 +0000
                      Re: How many wide characters may mbstowcs store? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-05-29 21:23 -0700
                        Re: How many wide characters may mbstowcs store? Spiros Bousbouras <spibou@gmail.com> - 2020-05-30 20:08 +0000
                          Re: How many wide characters may mbstowcs store? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-06-03 08:46 -0700
                            Re: How many wide characters may mbstowcs store? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-06-03 10:18 -0700
                              Re: How many wide characters may mbstowcs store? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-06-23 05:35 -0700
                                Re: How many wide characters may mbstowcs store? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-06-26 06:32 -0700
                                  Re: How many wide characters may mbstowcs store? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-09-02 09:19 -0700
                                    Re: How many wide characters may mbstowcs store? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-09-02 19:50 -0700
                            Re: How many wide characters may mbstowcs store? raltbos@xs4all.nl (Richard Bos) - 2020-06-04 20:34 +0000
          Re: How many wide characters may mbstowcs store? scott@slp53.sl.home (Scott Lurndal) - 2020-05-11 17:39 +0000
            Re: How many wide characters may mbstowcs store? Autist <autist69@gmail.com> - 2020-05-11 19:42 +0200
            Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-11 20:28 +0200
              Re: How many wide characters may mbstowcs store? scott@slp53.sl.home (Scott Lurndal) - 2020-05-11 18:37 +0000
                Re: How many wide characters may mbstowcs store? Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2020-05-11 11:50 -0700
                  Re: How many wide characters may mbstowcs store? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2020-05-12 20:02 +0100
                    Re: How many wide characters may mbstowcs store? Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2020-05-12 13:12 -0700
            Re: How many wide characters may mbstowcs store? richard@cogsci.ed.ac.uk (Richard Tobin) - 2020-05-11 19:56 +0000
            Re: How many wide characters may mbstowcs store? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-05-24 16:49 -0700
        Re: How many wide characters may mbstowcs store? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-05-11 14:19 -0700
          Re: How many wide characters may mbstowcs store? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-05-11 14:22 -0700
            Re: How many wide characters may mbstowcs store? Philipp Klaus Krause <pkk@spth.de> - 2020-05-12 09:17 +0200
          Re: How many wide characters may mbstowcs store? raltbos@xs4all.nl (Richard Bos) - 2020-05-24 16:12 +0000
            Re: How many wide characters may mbstowcs store? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-05-24 15:10 -0700
            Re: How many wide characters may mbstowcs store? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-05-24 22:58 -0400
    Re: How many wide characters may mbstowcs store? richard@cogsci.ed.ac.uk (Richard Tobin) - 2020-05-11 20:07 +0000
    Re: How many wide characters may mbstowcs store? Andrey Tarasevich <andreytarasevich@hotmail.com> - 2020-06-25 21:42 -0700
      Re: How many wide characters may mbstowcs store? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-06-28 06:32 -0700

#152225

From	Philipp Klaus Krause <pkk@spth.de>
Date	2020-05-11 20:28 +0200
Message-ID	<r9c5ha$db2$1@solani.org>
In reply to	#152220

Am 11.05.20 um 19:39 schrieb Scott Lurndal:
> 
> It's not a bug.  The phrase
> 
>     "No characters that follow a null byte (which is converted into
>      a wide-character code with value 0) shall be examined or converted."
> 
> Is there to ensure that no bytes beyond the null are _read_ from the source
> string (thus ensuring that no page fault, for example, occurs because a byte
> beyond the nul is on the next (unallocated) page).   It has no bearing on
> whether the function is allowed to write element 'n-1' of the destination operand
> which is allowed explicitly by the standard regardless of the length of
> the input string.

I can see how you could assume that would be allowed implicitly, but
explicitly?
Then, by your reasoning, would strcpy() be allowed to write arbitrary
amounts of data to the destination?

#152227

From	scott@slp53.sl.home (Scott Lurndal)
Date	2020-05-11 18:37 +0000
Message-ID	<7YguG.179866$2U3.164806@fx04.iad>
In reply to	#152225

Philipp Klaus Krause <pkk@spth.de> writes:
>Am 11.05.20 um 19:39 schrieb Scott Lurndal:
>> 
>> It's not a bug.  The phrase
>> 
>>     "No characters that follow a null byte (which is converted into
>>      a wide-character code with value 0) shall be examined or converted."
>> 
>> Is there to ensure that no bytes beyond the null are _read_ from the source
>> string (thus ensuring that no page fault, for example, occurs because a byte
>> beyond the nul is on the next (unallocated) page).   It has no bearing on
>> whether the function is allowed to write element 'n-1' of the destination operand
>> which is allowed explicitly by the standard regardless of the length of
>> the input string.
>
>I can see how you could assume that would be allowed implicitly, but
>explicitly?

The purpose of the standard is to provide a contract between the
application and the implementation.

The requirements in the standard describe what the implementation is allowed
to do.  Explicitly.  It cannot write beyond the 'n-1'th element of the
destination, and cannot read beyond the nul-byte in the source.

>Then, by your reasoning, would strcpy() be allowed to write arbitrary
>amounts of data to the destination?

Arbitrary in the sense that it can continue to store bytes into the
destination until it processes a null byte, yes.   It's not analogous
to the interfaces you're discussing since strcpy's destination buffer
isn't explicitly bounded by the API.   Consider strncpy, for example, where the
implementation must pad the destination with nul-bytes up to
the 'n-1'th element of the destination buffer if a nul-byte is
encountered in the source string before the destination buffer
is exhausted.  This is a much better analogy to the behavior
of the bounded wide-string functions.

#152229

From	Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date	2020-05-11 11:50 -0700
Message-ID	<0464d036-5ce1-4a13-b92b-a4ff66aa1af4@googlegroups.com>
In reply to	#152227

On Monday, 11 May 2020 19:38:04 UTC+1, Scott Lurndal  wrote:
> Philipp Klaus Krause <pkk@spth.de> writes:
> >Am 11.05.20 um 19:39 schrieb Scott Lurndal:
> >> 
> >> It's not a bug.  The phrase
> >> 
> >>     "No characters that follow a null byte (which is converted into
> >>      a wide-character code with value 0) shall be examined or converted."
> >> 
> >> Is there to ensure that no bytes beyond the null are _read_ from the source
> >> string (thus ensuring that no page fault, for example, occurs because a byte
> >> beyond the nul is on the next (unallocated) page).   It has no bearing on
> >> whether the function is allowed to write element 'n-1' of the destination operand
> >> which is allowed explicitly by the standard regardless of the length of
> >> the input string.
> >
> >I can see how you could assume that would be allowed implicitly, but
> >explicitly?
> 
> The purpose of the standard is to provide a contract between the
> application and the implementation.
> 
> The requirements in the standard describe what the implementation is allowed
> to do.  Explicitly.  It cannot write beyond the 'n-1'th element of the
> destination, and cannot read beyond the nul-byte in the source.
> 
> >Then, by your reasoning, would strcpy() be allowed to write arbitrary
> >amounts of data to the destination?
> 
> Arbitrary in the sense that it can continue to store bytes into the
> destination until it processes a null byte, yes.   It's not analogous
> to the interfaces you're discussing since strcpy's destination buffer
> isn't explicitly bounded by the API.   Consider strncpy, for example, where the
> implementation must pad the destination with nul-bytes up to
> the 'n-1'th element of the destination buffer if a nul-byte is
> encountered in the source string before the destination buffer
> is exhausted.  This is a much better analogy to the behavior
> of the bounded wide-string functions.
>
Note that if we know that the buffers are correctly aligned and the size
is a multiple of the natural word size, probably 8 bytes, we can implement
the function in such a way as to always read and write multiples of 8 bytes,
and construct the write in registers. As memory access is usually the
rate limiting operation, this might well be faster than trying to
detect the exact output buffer end.

#152263

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2020-05-12 20:02 +0100
Message-ID	<87ftc5dlif.fsf@bsb.me.uk>
In reply to	#152229

Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

> On Monday, 11 May 2020 19:38:04 UTC+1, Scott Lurndal  wrote:
<cut>
>> The purpose of the standard is to provide a contract between the
>> application and the implementation.
>> 
>> The requirements in the standard describe what the implementation is allowed
>> to do.  Explicitly.  It cannot write beyond the 'n-1'th element of the
>> destination, and cannot read beyond the nul-byte in the source.

If the implementation writes beyond any converted wide null it must
pretend that it didn't because the return result counts the number of
modified locations.  As a result, it can only write the value that was
there before (that's technically a modification in C terms, but it's one
the implementation can lie about).

<cut>
> Note that if we know that the buffers are correctly aligned and the size
> is a multiple of the natural word size, probably 8 bytes, we can implement
> the function in such a way as to always read and write multiples of 8 bytes,
> and construct the write in registers. As memory access is usually the
> rate limiting operation, this might well be faster than trying to
> detect the exact output buffer end.

Is there a way that can be useful when the "extra" entries -- those
after any converted wide null -- must remain unchanged?  Seems unlikely,
but it's not my area of expertise.

-- 
Ben.

#152265

From	Malcolm McLean <malcolm.arthur.mclean@gmail.com>
Date	2020-05-12 13:12 -0700
Message-ID	<fdcc2e17-60d3-426c-8883-6324ef0882b8@googlegroups.com>
In reply to	#152263

On Tuesday, 12 May 2020 20:03:00 UTC+1, Ben Bacarisse  wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
> 
> > On Monday, 11 May 2020 19:38:04 UTC+1, Scott Lurndal  wrote:
> <cut>
> >> The purpose of the standard is to provide a contract between the
> >> application and the implementation.
> >> 
> >> The requirements in the standard describe what the implementation is allowed
> >> to do.  Explicitly.  It cannot write beyond the 'n-1'th element of the
> >> destination, and cannot read beyond the nul-byte in the source.
> 
> If the implementation writes beyond any converted wide null it must
> pretend that it didn't because the return result counts the number of
> modified locations.  As a result, it can only write the value that was
> there before (that's technically a modification in C terms, but it's one
> the implementation can lie about).
> 
> <cut>
> > Note that if we know that the buffers are correctly aligned and the size
> > is a multiple of the natural word size, probably 8 bytes, we can implement
> > the function in such a way as to always read and write multiples of 8 bytes,
> > and construct the write in registers. As memory access is usually the
> > rate limiting operation, this might well be faster than trying to
> > detect the exact output buffer end.
> 
> Is there a way that can be useful when the "extra" entries -- those
> after any converted wide null -- must remain unchanged?  Seems unlikely,
> but it's not my area of expertise.
> 
Yes. That additional restriction makes life very messy, because you've
got to have special logic to handle the end case. But you can still do 
your reads and writes in 64 bits. However you need an extra read of
the destination buffer at data end, if your 16 bit output is not a 
multiple of four.

The idea is that you read and write 64 bits at a time, and keep the
intermediate information in registers.

I haven't actually tried to implement this, much less test it for speed.
However it's maybe a worthwhile little test project.

#152237

From	richard@cogsci.ed.ac.uk (Richard Tobin)
Date	2020-05-11 19:56 +0000
Message-ID	<r9cam6$29n8$2@macpro.inf.ed.ac.uk>
In reply to	#152220

In article <n5guG.283633$Xk.216585@fx46.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

>Is there to ensure that no bytes beyond the null are _read_ from the source
>string (thus ensuring that no page fault, for example, occurs because a byte
>beyond the nul is on the next (unallocated) page).   It has no bearing on
>whether the function is allowed to write element 'n-1' of the
>destination operand
>which is allowed explicitly by the standard regardless of the length of
>the input string.

What?  It can overwrite the contents of the destination array beyond
the length required for converting the input?  Where does it
explicitly say that?

-- Richard

#152455

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2020-05-24 16:49 -0700
Message-ID	<86ftbouc4k.fsf@linuxsc.com>
In reply to	#152220

scott@slp53.sl.home (Scott Lurndal) writes:

> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>
>> Philipp Klaus Krause <pkk@spth.de> writes:
>>
>>> Am 11.05.20 um 16:44 schrieb Bonita Montero:
>>>
>>>> There's a POSIX-extension that if you pass nullptr for s, you get
>>>> the size of the buffer needed for s.  Maybe this will help you.
>>>> Otherwise: multibyte-characters are usually UTF-8-characters and
>>>> it should be easy to find code to convert these characters into
>>>> wide-characters; but it should be also easy to write this yourself
>>>> in 20min.
>>>
>>> At the moment I want to figure out what to do about the problem.  File a
>>> bug against GCC in Ubuntu?  File a defect report / clarification request
>>> with WG14?
>>
>> File a gcc bug report.  The gnu/gcc folks have misunderstood the
>> standard, and they are shooting their users in the foot.  Your
>> support/regression tests deserve thanks, and have provided a
>> public service.
>
> It's not a bug.  The phrase
>
>     "No characters that follow a null byte (which is converted into
>      a wide-character code with value 0) shall be examined or converted."
>
> Is there to ensure that no bytes beyond the null are _read_ from the
> source string (thus ensuring that no page fault, for example, occurs
> because a byte beyond the nul is on the next (unallocated) page).
> It has no bearing on whether the function is allowed to write
> element 'n-1' of the destination operand which is allowed explicitly
> by the standard regardless of the length of the input string.

I don't agree with your interpretation.  First you are misquoting
the description given in 7.22.8.1 p2.  Second the statements that
"[mbstowcs] stores not more than n wide characters into the
array" and that "No multibyte characters that follow a null
character [...] will be examined or converted" do not constitute
explicit permission to do anything.  Just the opposite:  they
give an explicit restriction NOT to do something.  Third the
interpretation you suggest is not consistent with the other
string library functions in the Standard:  none of them does
anything past the final character explicitly processed, unless
there is some sort of explicit statement like "under such and
such circumstances the contents of the array is indeterminate."
There is no such explicit statement here.

By the way, the function I was talking about is wcstombs, not
mbstowcs.

> An application that provides an 'n' parameter larger than the
> allocated space of the destination buffer is _BROKEN_.

If you want to say it's not a good programming practice, I have
no problem with that.  But all the evidence I have found supports
the conclusion that the behavior gcc exhibits here does not
conform to what the Standard is meant to require.

#152241

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-05-11 14:19 -0700
Message-ID	<877dxitbjo.fsf@nosuchdomain.example.com>
In reply to	#152201

Philipp Klaus Krause <pkk@spth.de> writes:
> Am 11.05.20 um 16:44 schrieb Bonita Montero:
>> There's a POSIX-extension that if you pass nullptr for s, you get
>> the size of the buffer needed for s. Maybe this will help you.
>> Otherwise: multibyte-characters are usually UTF-8-characters and
>> it should be easy to find code to convert these characters into
>> wide-characters; but it should be also easy to write this yourself
>> in 20min.
>
> At the moment I want to figure out what to do about the problem. File a
> bug against GCC in Ubuntu? File a defect report / clarification request
> with WG14?

Why would you file a bug report against gcc?  wcstombs is implemented by
the library, not by the compiler.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

#152242

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-05-11 14:22 -0700
Message-ID	<873686tbei.fsf@nosuchdomain.example.com>
In reply to	#152241

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> Philipp Klaus Krause <pkk@spth.de> writes:
>> Am 11.05.20 um 16:44 schrieb Bonita Montero:
>>> There's a POSIX-extension that if you pass nullptr for s, you get
>>> the size of the buffer needed for s. Maybe this will help you.
>>> Otherwise: multibyte-characters are usually UTF-8-characters and
>>> it should be easy to find code to convert these characters into
>>> wide-characters; but it should be also easy to write this yourself
>>> in 20min.
>>
>> At the moment I want to figure out what to do about the problem. File a
>> bug against GCC in Ubuntu? File a defect report / clarification request
>> with WG14?
>
> Why would you file a bug report against gcc?  wcstombs is implemented by
> the library, not by the compiler.

(Unless it's a result of incorrect optimization by gcc.)

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

#152244

From	Philipp Klaus Krause <pkk@spth.de>
Date	2020-05-12 09:17 +0200
Message-ID	<r9dihj$d6r$1@solani.org>
In reply to	#152242

Am 11.05.20 um 23:22 schrieb Keith Thompson:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>> Philipp Klaus Krause <pkk@spth.de> writes:
>>> Am 11.05.20 um 16:44 schrieb Bonita Montero:
>>>> There's a POSIX-extension that if you pass nullptr for s, you get
>>>> the size of the buffer needed for s. Maybe this will help you.
>>>> Otherwise: multibyte-characters are usually UTF-8-characters and
>>>> it should be easy to find code to convert these characters into
>>>> wide-characters; but it should be also easy to write this yourself
>>>> in 20min.
>>>
>>> At the moment I want to figure out what to do about the problem. File a
>>> bug against GCC in Ubuntu? File a defect report / clarification request
>>> with WG14?
>>
>> Why would you file a bug report against gcc?  wcstombs is implemented by
>> the library, not by the compiler.
> 
> (Unless it's a result of incorrect optimization by gcc.)
> 

I had assumed it to be a compiler issue since I was able to observe the
problem with GCC, but not LLVM on Ubuntu.
And indeed the problem is apparently in the gcc package for Ubuntu:
Their patch to the upstream Debian gcc package predefines
_FORTIFY_SOURCE to 2, which makes glibc non-compliant.

#152452

From	raltbos@xs4all.nl (Richard Bos)
Date	2020-05-24 16:12 +0000
Message-ID	<5eca9c8f.21221265@news.xs4all.nl>
In reply to	#152241

Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

> Philipp Klaus Krause <pkk@spth.de> writes:
> > Am 11.05.20 um 16:44 schrieb Bonita Montero:
> >> There's a POSIX-extension that if you pass nullptr for s, you get
> >> the size of the buffer needed for s. Maybe this will help you.
> >> Otherwise: multibyte-characters are usually UTF-8-characters and
> >> it should be easy to find code to convert these characters into
> >> wide-characters; but it should be also easy to write this yourself
> >> in 20min.
> >
> > At the moment I want to figure out what to do about the problem. File a
> > bug against GCC in Ubuntu? File a defect report / clarification request
> > with WG14?
> 
> Why would you file a bug report against gcc?  wcstombs is implemented by
> the library, not by the compiler.

*Yawn*

Same thing, more or less same team.

(Imagine someone pulling that excuse against Microsoft C? Or Python, or
Forth, or even Basic?)

Pull up your panties already.

Richard

#152454

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-05-24 15:10 -0700
Message-ID	<87pnatq91c.fsf@nosuchdomain.example.com>
In reply to	#152452

raltbos@xs4all.nl (Richard Bos) writes:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
[...]
>> Why would you file a bug report against gcc?  wcstombs is implemented by
>> the library, not by the compiler.
>
> *Yawn*
>
> Same thing, more or less same team.

Different projects, different teams, different bug reporting systems.
If you file a bug report against gcc for a problem in glibc, you're
just wasting time.  They might respond and tell you where to file it.
They *might* redirect it for you, but I wouldn't count on that.

However, as I acknowledged in a followup, this particular problem
may be an issue with gcc's optimizations, which of course implies
that filing a bug report against gcc would be appropriate.

> (Imagine someone pulling that excuse against Microsoft C? Or Python, or
> Forth, or even Basic?)

Imagine paying attention to whether a compiler and runtime library
share a bug reporting system or not.

gcc is often used with libraries other than glibc, and glibc is
often used with compilers other than gcc.  This is less true of
the systems you mention (though Microsoft's C library is commonly
used with compilers other than Microsoft's).  I haven't looked into
Microsoft's bug reporting system(s).  If I wanted to report a bug
in their C implementation, I would do so first.

> Pull up your panties already.

Be less rude.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

#152461

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2020-05-24 22:58 -0400
Message-ID	<rafc89$5hk$1@dont-email.me>
In reply to	#152452

On 5/24/20 12:12 PM, Richard Bos wrote:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
...
>> Why would you file a bug report against gcc?  wcstombs is implemented by
>> the library, not by the compiler.
> 
> *Yawn*
> 
> Same thing, more or less same team.

Not really - they're very different teams. A bug report filed with the
wrong team stands a good chance of being ignored, or at least, dismissed
with instructions to file it in the right location.

> (Imagine someone pulling that excuse against Microsoft C? Or Python, or
> Forth, or even Basic?)

How many of those are compilers that are routinely used with a standard
library provided by a different vendor? How many of those have standard
library implementations that are routinely used with a compiler provided
by a different vendor?

#152239

From	richard@cogsci.ed.ac.uk (Richard Tobin)
Date	2020-05-11 20:07 +0000
Message-ID	<r9cb9v$2aea$1@macpro.inf.ed.ac.uk>
In reply to	#152174

In article <r9bd16$nbt$1@solani.org>,
Philipp Klaus Krause  <pkk@spth.de> wrote:

>"size_t mbstowcs(wchar_t * restrict pwcs, const char *restrict s, size_t n);
>
>The mbstowcs function converts a sequence of multibyte characters that
>begins in the initial shift state from the array pointed to by s into a
>sequence of corresponding wide characters and stores not more than n
>wide characters into the array pointed to by pwcs. No multibyte
>characters that follow a null character (which is converted into a null
>wide character) will be examined or converted. Each multibyte character
>is converted as if by a call to the mbtowc function, except that the
>conversion state of the mbtowc function is not affected.
>
>No more than n elements will be modified in the array pointed to by
>pwcs. If copying takes place between objects that overlap, the behavior
>is undefined."

This description seems to be full of holes.  It doesn't even say that
the characters it writes into the destination must be the ones it
converted from the source, unlike the much better description of
wcstombs.

-- Richard

#152900

From	Andrey Tarasevich <andreytarasevich@hotmail.com>
Date	2020-06-25 21:42 -0700
Message-ID	<rd3uc8$sqv$1@dont-email.me>
In reply to	#152174

On 5/11/2020 4:30 AM, Philipp Klaus Krause wrote:
>
> For wcstombs, the wording seems clear to state that it will stop at a
> terminating 0, but for the mbstowcs it seems unclear to me.
>

The issue is fairly similar to the one described here

https://trust-in-soft.com/blog/2015/12/21/memcmp-requires-pointers-to-fully-valid-buffers/

The question is whether `memcmp` is allowed to read beyond the first 
differing byte, while still within the specified buffer size.

-- 
Best regards,
Andrey Tarasevich

#152950

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2020-06-28 06:32 -0700
Message-ID	<864kqviae5.fsf@linuxsc.com>
In reply to	#152900

Andrey Tarasevich <andreytarasevich@hotmail.com> writes:

> On 5/11/2020 4:30 AM, Philipp Klaus Krause wrote:
>
>> For wcstombs, the wording seems clear to state that it will stop at a
>> terminating 0, but for the mbstowcs it seems unclear to me.
>
> The issue is fairly similar to the one described here
>
> https://trust-in-soft.com/blog/2015/12/21/memcmp-requires-pointers-to-fully-valid-buffers/
>
> The question is whether `memcmp` is allowed to read beyond the first
> differing byte, while still within the specified buffer size.

The question is similar.  The answer isn't.
