


relearning C: why does an in-place change to a char* segfault?

Started by: Mark Summerfield <mark@qtrac.eu>
First post: 2024-08-01 08:06 +0000
Last post: 2024-08-13 17:43 -0700
Articles: 20 shown of 107; 21 participants



Contents

  relearning C: why does an in-place change to a char* segfault? Mark Summerfield <mark@qtrac.eu> - 2024-08-01 08:06 +0000
    Re: relearning C: why does an in-place change to a char* segfault? Mark Summerfield <mark@qtrac.eu> - 2024-08-01 08:24 +0000
      Re: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-01 11:53 +0100
    Re: relearning C: why does an in-place change to a char* segfault? Richard Harnden <richard.nospam@gmail.invalid> - 2024-08-01 09:38 +0100
      Re: relearning C: why does an in-place change to a char* segfault? Mark Summerfield <mark@qtrac.eu> - 2024-08-01 08:54 +0000
      Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-01 11:12 +0100
        Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-01 13:59 -0700
          Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-01 22:07 +0100
            Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-01 14:28 -0700
            Re: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-01 20:20 -0400
            Re: relearning C: why does an in-place change to a char* segfault? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-08-02 01:06 +0000
              Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-02 10:43 +0100
                Re: relearning C: why does an in-place change to a char* segfault? Richard Damon <richard@damon-family.org> - 2024-08-02 11:03 -0400
                Re: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-02 14:19 -0400
                  Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-02 19:33 +0100
                  Re: relearning C: why does an in-place change to a char* segfault? Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-08-03 01:31 +0000
                    Re: relearning C: why does an in-place change to a char* segfault? Richard Damon <richard@damon-family.org> - 2024-08-02 22:01 -0400
                      Re: relearning C: why does an in-place change to a char* segfault? Joe Pfeiffer <pfeiffer@cs.nmsu.edu> - 2024-08-03 08:32 -0600
                      Re: relearning C: why does an in-place change to a char* segfault? Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-08-04 01:05 +0000
                      Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 02:52 -0700
                  Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-13 17:46 -0700
                    Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-13 18:44 -0700
                      Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-15 16:00 -0700
                        Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-15 16:27 -0700
                          Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-09-27 17:33 -0700
                            Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-09-27 20:34 -0700
                              Re: relearning C: why does an in-place change to a char* segfault? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-09-28 07:22 +0200
                                Re: relearning C: why does an in-place change to a char* segfault? Phillip Frabott <nntp@fulltermprivacy.com> - 2024-09-28 17:57 +0000
                                  Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-09-28 13:42 -0700
                                    Re: relearning C: why does an in-place change to a char* segfault? Phillip Frabott <nntp@fulltermprivacy.com> - 2024-09-28 22:05 +0000
                                      Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-09-28 15:17 -0700
                    Re: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-14 10:33 -0400
                      Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-15 16:05 -0700
            Re: relearning C: why does an in-place change to a char* segfault? Bonita Montero <Bonita.Montero@gmail.com> - 2024-08-04 15:52 +0200
          Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 14:11 -0700
            Re: relearning C: why does an in-place change to a char* segfault? Vir Campestris <vir.campestris@invalid.invalid> - 2024-08-13 15:34 +0100
              Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-13 13:08 -0700
                Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-13 17:41 -0700
                Re: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-14 10:40 +0200
              Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-13 17:40 -0700
                Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-13 18:47 -0700
                  Re: relearning C: why does an in-place change to a char* segfault? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-08-14 03:16 +0000
                    Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-13 20:49 -0700
    Re: relearning C: why does an in-place change to a char* segfault? scott@slp53.sl.home (Scott Lurndal) - 2024-08-01 13:28 +0000
    No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Michael S <already5chosen@yahoo.com> - 2024-08-01 17:40 +0300
      Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-01 19:56 +0200
        Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? candycanearter07 <candycanearter07@candycanearter07.nomail.afraid> - 2024-08-02 05:30 +0000
          Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-02 03:02 -0700
            Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Richard Harnden <richard.nospam@gmail.invalid> - 2024-08-02 13:04 +0100
              Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-02 09:59 -0400
              Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-02 11:24 -0700
                Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Richard Damon <richard@damon-family.org> - 2024-08-02 14:42 -0400
                  Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-02 14:58 -0400
                    Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Richard Damon <richard@damon-family.org> - 2024-08-02 15:11 -0400
                      Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 08:32 -0700
                  Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 08:27 -0700
                Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-02 12:27 -0700
                  Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-02 23:29 +0100
                    Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-02 16:11 -0700
                      Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-05 02:06 +0100
                        Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-04 19:37 -0700
                          Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-04 19:38 -0700
                          Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-05 12:03 +0100
                            Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-05 13:35 -0700
                              Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-05 21:54 +0100
                                Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-05 15:39 -0700
                                  Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-06 12:29 +0100
                                    Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-06 12:48 -0700
                                      Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-06 23:59 +0100
                                        Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-12 16:18 -0700
                                Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-05 15:44 -0700
                Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 14:38 -0700
                  Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-12 14:55 -0700
                    Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-09-03 06:11 -0700
              Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? dave_thompson_2@comcast.net - 2024-08-25 16:52 -0400
                Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-25 14:26 -0700
            Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 14:33 -0700
              Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-12 14:45 -0700
                Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 16:05 -0700
                  Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-13 13:08 +0200
                    Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-13 13:00 -0700
          Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-03 19:54 +0200
    Re: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-01 12:02 -0400
    Re: relearning C: why does an in-place change to a char* segfault? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-08-01 19:39 +0000
      Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-01 21:42 +0100
        Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-01 14:13 -0700
        Re: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-01 22:40 +0100
        Re: relearning C: why does an in-place change to a char* segfault? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-08-02 00:37 +0000
          Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-02 11:36 +0100
          Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 13:47 -0700
        Re: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-03 00:14 +0200
          Re: relearning C: why does an in-place change to a char* segfault? scott@slp53.sl.home (Scott Lurndal) - 2024-08-03 17:07 +0000
            Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-03 17:11 -0700
          Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-03 17:07 -0700
            Re: relearning C: why does an in-place change to a char* segfault? Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-08-04 01:08 +0000
              Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-03 19:58 -0700
                Re: relearning C: why does an in-place change to a char* segfault? Richard Damon <richard@damon-family.org> - 2024-08-04 07:22 -0400
                  Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 02:55 -0700
                Re: relearning C: why does an in-place change to a char* segfault? Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-08-05 06:33 +0000
                  Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-04 23:38 -0700
                    Re: relearning C: why does an in-place change to a char* segfault? Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-08-05 21:27 +0000
                      Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-05 15:40 -0700
                        Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-06 16:57 +0100
                          Re: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-06 20:40 +0200
            Re: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-04 17:20 +0200
      Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-01 14:06 -0700
      Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-13 17:43 -0700



#387546 — Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault?

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-08-13 13:00 -0700
Subject: Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault?
Message-ID: <877cckgplx.fsf@nosuchdomain.example.com>
In reply to: #387540

David Brown <david.brown@hesbynett.no> writes:
> On 13/08/2024 01:05, Tim Rentsch wrote:
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>>>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>>>> candycanearter07 <candycanearter07@candycanearter07.nomail.afraid>
>>>>> writes:
>>>>>> David Brown <david.brown@hesbynett.no> wrote at 17:56 this Thursday (GMT):
>>>>> [...]
>>>>>
>>>>>>> gcc has the option "-Wwrite-strings" that makes string literals in
>>>>>>> C have "const char" array type, and thus give errors when you try
>>>>>>> to assign to a non-const char * pointer.  But the option has to be
>>>>>>> specified explicitly (it is not in -Wall) because it changes the
>>>>>>> meaning of the code and can cause compatibility issues with
>>>>>>> existing correct code.
>>>>>>
>>>>>> -Wwrite-strings is included in -Wpedantic.
>>>>>
>>>>> No it isn't, nor is it included in -Wall -- and it wouldn't make
>>>>> sense to do so.
>>>>>
>>>>> The -Wpedantic option is intended to produce all required
>>>>> diagnostics for the specified C standard.  -Wwrite-strings
>>>>> gives string literals the type `const char[LENGTH]`, which
>>>>> enables useful diagnostics but is *non-conforming*.
>>>>
>>>> As long as the -Wwrite-strings diagnostics are only warnings the
>>>> result is still conforming.
>>>
>>> It's not just about diagnostics.  This program:
>>>
>>> #include <stdio.h>
>>> int main(void) {
>>>      puts(_Generic("hello",
>>>                    char*:  "char*",
>>>                    const char*:  "const char*",
>>>                    default: "?"));
>>> }
>>>
>>> must print "char*" in a conforming implementation.  With
>>> (gcc|clang) -Wwrite-strings, it prints "const char*".
>> Good point.  I hadn't considered such cases.
>> 
>>> And something as simple as:
>>>
>>>      char *p = "hello";
>>>
>>> is rejected with a fatal error with "-Wwrite-strings -pedantic-errors".
>> That violates the "As long as the -Wwrite-strings diagnostics are
>> only warnings" condition.
>
> Indeed.
>
> I personally think it is nice to have an option to make string
> literals "const" in C, even though it is non-conforming.  I also think
> it is very useful to have a warning on attempts to write to string
> literals.  But I think gcc has made a mistake here by conflating the
> two.  I'd rather see the warning being enabled by default (or at least
> in -Wall), while the "make string literals const" option should
> require an explicit flag and be a "-f" flag rather than a "-W" flag.
> The current situation seems to be a quick-and-dirty way to get the
> warning.
>
> Other people may have different opinions, of course :-)

I agree.  An alternative way to implement "-Wwrite-strings" might have
been to invent a new attribute that can be applied to string literal
objects.  With the current "-Wwrite-strings", gcc marks string literal
objects as const, with all the non-conforming consequences that implies.
Instead, they could have added an attribute like say, "unwritable" that
triggers warnings but no other changes in semantics and no fatal errors
(unless you use -Werror, but then you're literally asking for it).

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */



#387306 — Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault?

From: David Brown <david.brown@hesbynett.no>
Date: 2024-08-03 19:54 +0200
Subject: Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault?
Message-ID: <v8lqsc$3i80j$1@dont-email.me>
In reply to: #387252

On 02/08/2024 07:30, candycanearter07 wrote:
> David Brown <david.brown@hesbynett.no> wrote at 17:56 this Thursday (GMT):
>> On 01/08/2024 16:40, Michael S wrote:
>>> On Thu, 01 Aug 2024 08:06:57 +0000
>>> Mark Summerfield <mark@qtrac.eu> wrote:
>>>
>>>> This program segfaults at the commented line:
>>>>
>>>> #include <ctype.h>
>>>> #include <stdio.h>
>>>>
>>>> void uppercase_ascii(char *s) {
>>>>       while (*s) {
>>>>           *s = toupper(*s); // SEGFAULT
>>>>           s++;
>>>>       }
>>>> }
>>>>
>>>> int main() {
>>>>       char* text = "this is a test";
>>>>       printf("before [%s]\n", text);
>>>>       uppercase_ascii(text);
>>>>       printf("after  [%s]\n", text);
>>>> }
>>>>
>>>
>>> The answers to your question are already given above, so I'd talk about
>>> something else. Sorry about it.
>>>
>>> To my surprise, none of the 3 major compilers that I tried issued the
>>> warning at this line:
>>>     char* text = "this is a test";
>>> If implicit conversion of 'const char*' to 'char*' does not warrant
>>> compiler warning then I don't know what does.
>>> Is there something in the Standard that explicitly forbids diagnostic
>>> for this sort of conversion?
>>>
>>> BTW, all 3 compilers issue reasonable warnings when I write it slightly
>>> differently:
>>>     const char* ctext = "this is a test";
>>>     char* text = ctext;
>>>
>>> I am starting to suspect that compilers (and the Standard?) consider
>>> string literals as being of type 'char*' rather than 'const char*'.
>>>
>>
>> Your suspicions are correct - in C, string literals are used to
>> initialise an array of char (or wide char, or other appropriate
>> character type).  Perhaps you are thinking of C++, where the type is
>> "const char" (or other const character type).
>>
>> So in C, when a string literal is used in an expression it is converted
>> to a "char *" pointer.  You can, of course, assign that to a "const char
>> *" pointer.  But it does not make sense to have a warning when assigning
>> it to a non-const "char *" pointer.  This is despite it being undefined
>> behaviour (explicitly stated in the standards) to attempt to write to a
>> string literal.
>>
>> The reason string literals are not const in C is backwards compatibility
>> - they existed before C had "const", and making string literals into
>> "const char" arrays would mean that existing code that assigned them to
>> non-const pointers would then be in error.  C++ was able to do the right
>> thing and make them arrays of const char because it had "const" from the
>> beginning.
>>
>> gcc has the option "-Wwrite-strings" that makes string literals in C
>> have "const char" array type, and thus give errors when you try to
>> assign to a non-const char * pointer.  But the option has to be
>> specified explicitly (it is not in -Wall) because it changes the meaning
>> of the code and can cause compatibility issues with existing correct code.
> 
> 
> -Wwrite-strings is included in -Wpedantic.

No, it is not - which is a good thing, because -Wpedantic should not 
include features that change the semantics of the language!  (IMHO the 
flag should not be called -Wwrite-strings, but -fconst-string-literals 
or similar.  It's not really a normal warning option.)

For C++, -pedantic-errors includes the -Wwrite-strings flag which then 
makes implicit conversion of string literal expressions to non-const 
char* pointers an error.  But that's C++, not C.



#387238

From: James Kuyper <jameskuyper@alumni.caltech.edu>
Date: 2024-08-01 12:02 -0400
Message-ID: <v8gbim$283gj$1@dont-email.me>
In reply to: #387229

On 8/1/24 04:06, Mark Summerfield wrote:
> This program segfaults at the commented line:
>
> #include <ctype.h>
> #include <stdio.h>
>
> void uppercase_ascii(char *s) {
>     while (*s) {
>         *s = toupper(*s); // SEGFAULT
>         s++;
>     }
> }
>
> int main() {
>     char* text = "this is a test";

"In translation phase 7, a byte or code of value zero is appended to
each multibyte character sequence that results from a string literal or
literals. 89) The multibyte character sequence is then used to
initialize an array of static storage duration and length just
sufficient to contain the sequence. ..." (6.4.5p6)

"... If the program attempts to modify such an array, the behavior is
undefined." (6.4.5p7).

This gives the implementation the freedom, for instance, to store that array
in read-only memory, though they don't have to do so. The segfault you
got suggests that the implementation you're using did so. On other
platforms, writes to read-only memory might be silently ignored. On a
platform where it is possible to write to such memory, the
implementation is still free to optimize the code on the assumption that
you won't. That could produce bizarrely unexpected behavior if you
actually do modify it.

What you want to do is initialize an array with the string literal:

char text[] = "this is a test";

Nominally, such an array is initialized by copying from the string
literal's array. However, there's no way for strictly conforming code to
determine whether or not there are two such arrays. If the "text" array
has static storage duration, the string literal's array is likely to be
optimized away.



#387240

From: Kaz Kylheku <643-408-1753@kylheku.com>
Date: 2024-08-01 19:39 +0000
Message-ID: <20240801114615.906@kylheku.com>
In reply to: #387229

On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
> This program segfaults at the commented line:
>
> #include <ctype.h>
> #include <stdio.h>
>
> void uppercase_ascii(char *s) {
>     while (*s) {
>         *s = toupper(*s); // SEGFAULT
>         s++;
>     }
> }
>
> int main() {
>     char* text = "this is a test";

The "this is a test" object is a literal. It is part of the program's image.
When you try to change it, you're making your program self-modifying.

The ISO C language standard doesn't require implementations to support
self-modifying programs; the behavior is left undefined.

It could work in some documented, reliable way, in a given
implementation.

It's the same with any other constant in the program. Say you have
a malloc(1024) somewhere in the program. That 1024 number is encoded
into the program's image somehow, and in principle you could write code
to somehow get at that number and change it to 256. Long before you got
that far, you would be in undefined behavior territory.  If it worked,
it could have surprising effects. For instance, there could be another
call to malloc(1024) in the program and, surprisingly, *that* one also
changes to malloc(256).

A literal like "this is a test" is similar to that 1024, except
that it's very easy to get at it. The language defines it as an object
with an address, and to get that address all we have to do is evaluate
that expression itself. A minimal piece of code that requests the
undefined consequences of modifying a string literal is as easy
as "a"[0] = 0.

> Program received signal SIGSEGV, Segmentation fault.
> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test") 
> at inplace.c:6
> 6	        *s = toupper(*s);

On Linux, the string literals of a C executable are located together
with the program text. They are interspersed among the machine
instructions which reference them. The program text is mapped
read-only, so an attempted modification is an access violation trapped
by the OS, turned into a SIGSEGV signal.

GCC used to have a -fwritable-strings option, but it has been removed
for quite some time now.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#387241

From: Bart <bc@freeuk.com>
Date: 2024-08-01 21:42 +0100
Message-ID: <v8gs06$2ceis$1@dont-email.me>
In reply to: #387240

On 01/08/2024 20:39, Kaz Kylheku wrote:
> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>> This program segfaults at the commented line:
>>
>> #include <ctype.h>
>> #include <stdio.h>
>>
>> void uppercase_ascii(char *s) {
>>      while (*s) {
>>          *s = toupper(*s); // SEGFAULT
>>          s++;
>>      }
>> }
>>
>> int main() {
>>      char* text = "this is a test";
> 
> The "this is a test" object is a literal. It is part of the program's image.

So is the text here:

   char text[]="this is a test";

But this can be changed without making the program self-modifying.

I guess it depends on what is classed as the program's 'image'.

I'd say the image in the state it is in just after loading or just 
before execution starts (since certain fixups are needed). But some 
sections will be writable during execution, some not.

> When you try to change it, you're making your program self-modifying.

>> Program received signal SIGSEGV, Segmentation fault.
>> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
>> at inplace.c:6
>> 6	        *s = toupper(*s);
> 
> On Linux, the string literals of a C executable are located together
> with the program text. They are interspersed among the machine
> instructions which reference them. The program text is mapped
> read-only, so an attempted modification is an access violation trapped
> by the OS, turned into a SIGSEGV signal.

Does it really do that? That's the method I've used for read-only 
strings, to put them into the code-segment (since I neglected to support 
a dedicated read-only data section, and it's too much work now).

But I don't like it since the code section is also executable; you could 
inadvertently execute code within a string (which might happen to 
contain machine code for other purposes).

The dangers are small, but there must be reasons why a dedicated
section is normally used. gcc on Windows creates up to 19 sections, so
it would be odd for literal strings to share with code.



#387245

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-08-01 14:13 -0700
Message-ID: <87bk2cgd4z.fsf@nosuchdomain.example.com>
In reply to: #387241

Bart <bc@freeuk.com> writes:
> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>>      while (*s) {
>>>          *s = toupper(*s); // SEGFAULT
>>>          s++;
>>>      }
>>> }
>>>
>>> int main() {
>>>      char* text = "this is a test";
>> The "this is a test" object is a literal. It is part of the
>> program's image.
>
> So is the text here:
>
>   char text[]="this is a test";
>
> But this can be changed without making the program self-modifying.

Incorrect.  The string literal results in the creation of an array
object.  Any attempt to modify that array object would have undefined
behavior -- but there's no way to modify it because its address isn't
available to the code.

`text` is a distinct object.  At execution time (assuming it's defined
at block scope), that object is initialized by copying from the string
literal object.  (This is what happens in the abstract machine; there
are opportunities for optimization that might result in the string
literal object not existing in the generated code.)

> I guess it depends on what is classed as the program's 'image'.

Not really.

Given:

    int n = 42;

you can't modify 42, but you can modify n.  There's no need to consider
the idea of self-modifying code.  You're just trying to make it seem
more confusing than it really is.

[...]

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */



#387247

From: Ben Bacarisse <ben@bsb.me.uk>
Date: 2024-08-01 22:40 +0100
Message-ID: <8734nnexbs.fsf@bsb.me.uk>
In reply to: #387241

Bart <bc@freeuk.com> writes:

> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>>      while (*s) {
>>>          *s = toupper(*s); // SEGFAULT
>>>          s++;
>>>      }
>>> }
>>>
>>> int main() {
>>>      char* text = "this is a test";
>> The "this is a test" object is a literal. It is part of the program's
>> image.
>
> So is the text here:
>
>   char text[]="this is a test";
>
> But this can be changed without making the program self-modifying.

Different "this".  The array generated by the string can't be modified
without UB.  The "this" that can be changed in the corrected version is
a plain, automatically allocated array of char, initialised with the
values from the string.

> I guess it depends on what is classed as the program's 'image'.

The self-modifying remark is a bit of a red herring, but altering the
value of named automatic objects can't be classed as altering the
program's image in any reasonable way at all.

> I'd say the image in the state it is in just after loading or just before
> execution starts (since certain fixups are needed). But some sections will
> be writable during execution, some not.
>
>> When you try to change it, you're making your program self-modifying.
>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
>>> at inplace.c:6
>>> 6	        *s = toupper(*s);
>> On Linux, the string literals of a C executable are located together
>> with the program text. They are interspersed among the machine
>> instructions which reference them. The program text is mapped
>> read-only, so an attempted modification is an access violation trapped
>> by the OS, turned into a SIGSEGV signal.
>
> Does it really do that?

Linux does not really have much to do with it; the C implementation
decides, though the OS will influence what choices make more or less
sense.

For example, with my gcc (13.2.0) on Ubuntu the string is put into a
section called .rodata, but tcc on the same Linux box puts it in .data.
As a result the tcc compiled program runs without any issues and outputs

before [this is a test]
after  [THIS IS A TEST]

Some C implementations, on some Linux systems, might put strings in the
text segment, but I've not seen a system that does that for decades.
(Mind you, "Linux" refers to a huge class of systems ranging from top-end
servers to tiny embedded devices.)

-- 
Ben.



#387249

From: Kaz Kylheku <643-408-1753@kylheku.com>
Date: 2024-08-02 00:37 +0000
Message-ID: <20240801172148.200@kylheku.com>
In reply to: #387241

On 2024-08-01, Bart <bc@freeuk.com> wrote:
> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>>      while (*s) {
>>>          *s = toupper(*s); // SEGFAULT
>>>          s++;
>>>      }
>>> }
>>>
>>> int main() {
>>>      char* text = "this is a test";
>> 
>> The "this is a test" object is a literal. It is part of the program's image.
>
> So is the text here:
>
>    char text[]="this is a test";
>
> But this can be changed without making the program self-modifying.

The array which is initialized by the literal is what can be
changed.

In this situation, the literal is just initializer syntax,
not required to be an object with an address.

But there could well be such an object in the program image,
especially if the array is automatic, and thus instantiated
many times. 

If the program tries to search for that object and modify it,
it will run into UB.

> I guess it depends on what is classed as the program's 'image'.
>
> I'd say the image in the state it is in just after loading or just 
> before execution starts (since certain fixups are needed). But some 
> sections will be writable during execution, some not.

Programs can self-modify in ways designed into the run time.
The toaster has certain internal receptacles that can take
certain forks, according to some rules, which do not affect
the user operating the toaster according to the manual.

> The dangers are small, but there must be reasons why a dedicated
> section is normally used. gcc on Windows creates up to 19 sections, so
> it would be odd for literal strings to share with code.

One reason is that PC-relative addressing can be used by code to
find its literals. Since that usually has a limited range, it helps
to keep the literals with the code. Combining sections also reduces
size. The addressing is also relocatable, which is useful in shared
libs.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#387255

From: Bart <bc@freeuk.com>
Date: 2024-08-02 11:36 +0100
Message-ID: <v8icrj$2paum$1@dont-email.me>
In reply to: #387249

On 02/08/2024 01:37, Kaz Kylheku wrote:
> On 2024-08-01, Bart <bc@freeuk.com> wrote:
>> On 01/08/2024 20:39, Kaz Kylheku wrote:
>>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>>> This program segfaults at the commented line:
>>>>
>>>> #include <ctype.h>
>>>> #include <stdio.h>
>>>>
>>>> void uppercase_ascii(char *s) {
>>>>       while (*s) {
>>>>           *s = toupper(*s); // SEGFAULT
>>>>           s++;
>>>>       }
>>>> }
>>>>
>>>> int main() {
>>>>       char* text = "this is a test";
>>>
>>> The "this is a test" object is a literal. It is part of the program's image.
>>
>> So is the text here:
>>
>>     char text[]="this is a test";
>>
>> But this can be changed without making the program self-modifying.
> 
> The array which is initialized by the literal is what can be
> changed.
> 
> In this situation, the literal is just initializer syntax,
> not required to be an object with an address.

I don't spot the 'int main() {' part of your example; my version of it 
was meant to be static. (My A, B examples explicitly used 'static'.)



>> I guess it depends on what is classed as the program's 'image'.
>>
>> I'd say the image in the state it is in just after loading or just
>> before execution starts (since certain fixups are needed). But some
>> sections will be writable during execution, some not.
> 
> Programs can self-modify in ways designed into the run time.
> The toaster has certain internal receptacles that can take
> certain forks, according to some rules, which do not affect
> the user operating the toaster according to the manual.
> 
>> The dangers are small, but there must be reasons why a dedicated
>> section is normally used. gcc on Windows creates up to 19 sections, so
>> it would be odd for literal strings to share with code.
> 
> One reason is that PC-relative addressing can be used by code to
> find its literals. Since that usually has a limited range, it helps
> to keep the literals with the code. Combining sections also reduces
> size. The addressing is also relocatable, which is useful in shared
> libs.

You must be talking about ARM then, with its limited address 
displacement (I think 12 bits or +/- 2KB).

On x64, PC-relative uses a 32-bit offset so the range is +/- 2GB; enough 
to have string literals located in their own read-only section of memory.

I'm sure you can do that on ARM too, I can think of several ways (and
there are loads more registers to play with and keep as bases to tables of
such data). But I don't know what real code does.



#387523

From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Date: 2024-08-12 13:47 -0700
Message-ID: <86a5hh4gft.fsf@linuxsc.com>
In reply to: #387249

Kaz Kylheku <643-408-1753@kylheku.com> writes:

> On 2024-08-01, Bart <bc@freeuk.com> wrote:
>
>> On 01/08/2024 20:39, Kaz Kylheku wrote:
>>
>>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>>
>>>> This program segfaults at the commented line:
>>>>
>>>> #include <ctype.h>
>>>> #include <stdio.h>
>>>>
>>>> void uppercase_ascii(char *s) {
>>>>      while (*s) {
>>>>          *s = toupper(*s); // SEGFAULT
>>>>          s++;
>>>>      }
>>>> }
>>>>
>>>> int main() {
>>>>      char* text = "this is a test";
>>>
>>> The "this is a test" object is a literal.  It is part of the
>>> program's image.
>>
>> So is the text here:
>>
>>    char text[]="this is a test";
>>
>> But this can be changed without making the program self-modifying.
>
> The array which is initialized by the literal is what can be
> changed.
>
> In this situation, the literal is just initializer syntax,
> not required to be an object with an address.

In the abstract machine I believe the initializing string
literal is required to be an object with an address.  The
discussion of string literals in 6.4.5 says there is such
an object for every string literal, and I don't see any
text in 6.7.9, covering Initialization, that overrules or
contradicts that.



#387275

From: David Brown <david.brown@hesbynett.no>
Date: 2024-08-03 00:14 +0200
Message-ID: <v8jlnk$31hqf$1@dont-email.me>
In reply to: #387241

On 01/08/2024 22:42, Bart wrote:
> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>>      while (*s) {
>>>          *s = toupper(*s); // SEGFAULT
>>>          s++;
>>>      }
>>> }
>>>
>>> int main() {
>>>      char* text = "this is a test";
>>
>> The "this is a test" object is a literal. It is part of the program's 
>> image.
> 
> So is the text here:
> 
>    char text[]="this is a test";
> 
> But this can be changed without making the program self-modifying.

"this is a test" is a string literal, and is typically part of the 
program's image.  (There are some C implementations that do things 
differently, like storing such initialisation data in a compressed format.)

The array "char text[]", however, is a normal variable of type array of 
char.  It is most definitely not part of the program image - it is in 
ram (statically allocated or on the stack, depending on the context) and 
is initialised by copying the characters from the string literal (prior 
to main(), or at each entry to its scope if it is a local variable).

The string literal initialisation data cannot be changed without 
self-modifying code or other undefined behaviour.  The variable "text" 
is just a normal array and can be changed at will.

> 
> I guess it depends on what is classed as the program's 'image'.
> 

No, it depends on understanding what the C means and not trying to 
confuse yourself and others.

> I'd say the image in the state it is in just after loading or just 
> before execution starts (since certain fixups are needed). But some 
> sections will be writable during execution, some not.
> 

That is a poor definition because you are not considering initialised 
data, and you are not clear about what you mean by "before execution 
starts".  A C program typically has an entry point that clears the 
zero-initialised program-lifetime data, initialises the initialised 
program-lifetime data by copying from a block in the program image, then 
sets up things like stdin, heap support, argc/argv, and various other 
run-time setup features.  Then it calls main().  The initialised data 
section and zero-initialised data section are certainly part of the 
state of the program at the start of the execution from C's viewpoint - 
entry to main().  They are equally certainly not part of the program image.

One reasonable definition of "program image" would be "the file on the 
disk" (on general-purpose OS's) or "the binary data in flash" on typical 
embedded systems.  Another might be the read-only data sections set up 
by the OS loader just before jumping to the entry point of the C 
run-time code (long before main() is called and the C code itself starts).

>> When you try to change it, you're making your program self-modifying.
> 
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
>>> at inplace.c:6
>>> 6            *s = toupper(*s);
>>
>> On Linux, the string literals of a C executable are located together
>> with the program text. They are interspersed among the machine
>> instructions which reference them. The program text is mapped
>> read-only, so an attempted modification is an access violation trapped
>> by the OS, turned into a SIGSEGV signal.
> 
> Does it really do that? That's the method I've used for read-only 
> strings, to put them into the code-segment (since I neglected to support 
> a dedicated read-only data section, and it's too much work now).
>

No, Linux systems don't have read-only data or string literals 
interspersed with code.  They have such data in separate segments, for 
better cache efficiency and to allow different section attributes 
(read-only data can't be executed).

> But I don't like it since the code section is also executable; you could 
> inadvertently execute code within a string (which might happen to 
> contain machine code for other purposes).
> 

That's why code and read-only data are rarely interspersed.

> The dangers are small, but there must be reasons why a dedicated
> section is normally used. gcc on Windows creates up to 19 sections, so
> it would be odd for literal strings to share with code.
> 
> 



#387300

From: scott@slp53.sl.home (Scott Lurndal)
Date: 2024-08-03 17:07 +0000
Message-ID: <PttrO.6301$UJj9.4591@fx33.iad>
In reply to: #387275

David Brown <david.brown@hesbynett.no> writes:
>On 01/08/2024 22:42, Bart wrote:

>>    char text[]="this is a test";
>> 
>> But this can be changed without making the program self-modifying.
>
>"this is a test" is a string literal, and is typically part of the 
>program's image.  (There are some C implementations that do things 
>differently, like storing such initialisation data in a compressed format.)
>
>The array "char text[]", however, is a normal variable of type array of 
>char.  It is most definitely not part of the program image - it is in 
>ram (statically allocated or on the stack, depending on the context) and 
>is initialised by copying the characters from the string literal (prior 
>to main(), or at each entry to its scope if it is a local variable).

Linux (ELF):

A file-scope static declaration of char text[] will emit the string
literal into the .data section and that data section will be loaded
into memory by the ELF loader.  There is no copy made at runtime
before main().

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

char text1[] = "This is a test of a static-scope string";

int
main(int argc, const char **argv)
{
    char text2[] = "This is a test of a function-scope string";


    fprintf(stdout, "%p %s\n", (void *)text1, text1);  /* %p expects a void * */
    fprintf(stdout, "%s\n", text2);

    return 0;
}

$ /tmp/a
0x601060 This is a test of a static-scope string
This is a test of a function-scope string

$ objdump -p /tmp/a

/tmp/a:     file format elf64-x86-64

Program Header:
    PHDR off    0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
         filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x
  INTERP off    0x0000000000000238 vaddr 0x0000000000400238 paddr 0x0000000000400238 align 2**0
         filesz 0x000000000000001c memsz 0x000000000000001c flags r--
    LOAD off    0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
         filesz 0x00000000000007dc memsz 0x00000000000007dc flags r-x
    LOAD off    0x0000000000000e10 vaddr 0x0000000000600e10 paddr 0x0000000000600e10 align 2**21
         filesz 0x0000000000000278 memsz 0x0000000000000290 flags rw-

.data section:

0000e00: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000e10: 5005 4000 0000 0000 3005 4000 0000 0000  P.@.....0.@.....
0000e20: 0000 0000 0000 0000 0100 0000 0000 0000  ................
0000e30: 0100 0000 0000 0000 0c00 0000 0000 0000  ................
0000e40: 2804 4000 0000 0000 0d00 0000 0000 0000  (.@.............
0000e50: a406 4000 0000 0000 1900 0000 0000 0000  ..@.............
0000e60: 100e 6000 0000 0000 1b00 0000 0000 0000  ..`.............
0000e70: 0800 0000 0000 0000 1a00 0000 0000 0000  ................
0000e80: 180e 6000 0000 0000 1c00 0000 0000 0000  ..`.............
0000e90: 0800 0000 0000 0000 f5fe ff6f 0000 0000  ...........o....
0000ea0: 9802 4000 0000 0000 0500 0000 0000 0000  ..@.............
0000eb0: 3803 4000 0000 0000 0600 0000 0000 0000  8.@.............
0000ec0: c002 4000 0000 0000 0a00 0000 0000 0000  ..@.............
0000ed0: 4700 0000 0000 0000 0b00 0000 0000 0000  G...............
0000ee0: 1800 0000 0000 0000 1500 0000 0000 0000  ................
0000ef0: 0000 0000 0000 0000 0300 0000 0000 0000  ................
0000f00: 0010 6000 0000 0000 0200 0000 0000 0000  ..`.............
0000f10: 4800 0000 0000 0000 1400 0000 0000 0000  H...............
0000f20: 0700 0000 0000 0000 1700 0000 0000 0000  ................
0000f30: e003 4000 0000 0000 0700 0000 0000 0000  ..@.............
0000f40: b003 4000 0000 0000 0800 0000 0000 0000  ..@.............
0000f50: 3000 0000 0000 0000 0900 0000 0000 0000  0...............
0000f60: 1800 0000 0000 0000 feff ff6f 0000 0000  ...........o....
0000f70: 9003 4000 0000 0000 ffff ff6f 0000 0000  ..@........o....
0000f80: 0100 0000 0000 0000 f0ff ff6f 0000 0000  ...........o....
0000f90: 8003 4000 0000 0000 0000 0000 0000 0000  ..@.............
0000fa0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000fb0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000fc0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000fd0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000fe0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000ff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0001000: 280e 6000 0000 0000 0000 0000 0000 0000  (.`.............
0001010: 0000 0000 0000 0000 6604 4000 0000 0000  ........f.@.....
0001020: 7604 4000 0000 0000 8604 4000 0000 0000  v.@.......@.....
0001030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0001040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0001050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0001060: 5468 6973 2069 7320 6120 7465 7374 206f  This is a test o
0001070: 6620 6120 7374 6174 6963 2d73 636f 7065  f a static-scope
0001080: 2073 7472 696e 6700 4743 433a 2028 474e   string.GCC: (GN

$ printf "0x%x\n" $(( 0x601060 - 0x0000000000600e10 ))
0x250



#387314

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-08-03 17:11 -0700
Message-ID: <877ccxdu44.fsf@nosuchdomain.example.com>
In reply to: #387300

scott@slp53.sl.home (Scott Lurndal) writes:
> David Brown <david.brown@hesbynett.no> writes:
>>On 01/08/2024 22:42, Bart wrote:
>>>    char text[]="this is a test";
>>> 
>>> But this can be changed without making the program self-modifying.
>>
>>"this is a test" is a string literal, and is typically part of the 
>>program's image.  (There are some C implementations that do things 
>>differently, like storing such initialisation data in a compressed format.)
>>
>>The array "char text[]", however, is a normal variable of type array of 
>>char.  It is most definitely not part of the program image - it is in 
>>ram (statically allocated or on the stack, depending on the context) and 
>>is initialised by copying the characters from the string literal (prior 
>>to main(), or at each entry to its scope if it is a local variable).
>
> Linux (ELF):
>
> A file-scope static declaration of char text[] will emit the string
> literal into the .data section and that data section will be loaded
> into memory by the ELF loader.  There is no copy made at runtime
> before main().
>
> #include <stdint.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> char text1[] = "This is a test of a static-scope string";

In the abstract machine, there's an anonymous array object corresponding
to the string literal, and `text1` is a distinct object, also with static
storage duration.  The compiler optimizes the anonymous array away and
only stores the data in `text1`.

> int
> main(int argc, const char **argv)
> {
>     char text2[] = "This is a test of a function-scope string";

Since the second string literal is identical, the compiler is permitted
to store them in the same place (it's unspecified, so the implementation
doesn't have to document this).  Presumably there's code to copy from
the static array into `text2`, executed within `main`.

>
>     fprintf(stdout, "%p %s\n", &text1, text1);
>     fprintf(stdout, "%s\n", text2);
>
>     return 0;
> }
>
> $ /tmp/a
> 0x601060 This is a test of a static-scope string
> This is a test of a function-scope string
>
> $ objdump -p /tmp/a
>
> /tmp/a:     file format elf64-x86-64
>
> Program Header:
>     PHDR off    0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
>          filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x
>   INTERP off    0x0000000000000238 vaddr 0x0000000000400238 paddr 0x0000000000400238 align 2**0
>          filesz 0x000000000000001c memsz 0x000000000000001c flags r--
>     LOAD off    0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
>          filesz 0x00000000000007dc memsz 0x00000000000007dc flags r-x
>     LOAD off    0x0000000000000e10 vaddr 0x0000000000600e10 paddr 0x0000000000600e10 align 2**21
>          filesz 0x0000000000000278 memsz 0x0000000000000290 flags rw-
>
> .data section:
>
> 0000e00: 0000 0000 0000 0000 0000 0000 0000 0000  ................
[36 lines deleted]
> 0001050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0001060: 5468 6973 2069 7320 6120 7465 7374 206f  This is a test o
> 0001070: 6620 6120 7374 6174 6963 2d73 636f 7065  f a static-scope
> 0001080: 2073 7472 696e 6700 4743 433a 2028 474e   string.GCC: (GN
>
> $ printf "0x%x\n" $(( 0x601060 - 0x0000000000600e10 ))
> 0x250

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */



#387313

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-08-03 17:07 -0700
Message-ID: <87bk29duba.fsf@nosuchdomain.example.com>
In reply to: #387275

David Brown <david.brown@hesbynett.no> writes:
[...]
> "this is a test" is a string literal, and is typically part of the
> program's image.  (There are some C implementations that do things
> differently, like storing such initialisation data in a compressed
> format.)
[...]

What implementations do that?  Typically data that's all zeros isn't
stored in the image, but general compression isn't something I've seen
(not that I've paid much attention).  It would save space in the image,
but it would require decompression at load time and wouldn't save any
space at run time.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */



#387317

From: Lawrence D'Oliveiro <ldo@nz.invalid>
Date: 2024-08-04 01:08 +0000
Message-ID: <v8mkao$3n2rq$5@dont-email.me>
In reply to: #387313

On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:

> ... general compression isn't something I've seen ...

I recall Apple had a patent on some aspects of the “PEF” executable format 
that they created for their PowerPC machines running old MacOS. This had 
to do with some clever instruction encodings for loading stuff into 
memory.



#387320

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-08-03 19:58 -0700
Message-ID: <8734nldmea.fsf@nosuchdomain.example.com>
In reply to: #387317

Lawrence D'Oliveiro <ldo@nz.invalid> writes:
> On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
>
>> ... general compression isn't something I've seen ...
>
> I recall Apple had a patent on some aspects of the “PEF” executable format 
> that they created for their PowerPC machines running old MacOS. This had 
> to do with some clever instruction encodings for loading stuff into 
> memory.

Is that relevant to what I asked about?

What I had in mind is something that, given this:

    static int buf[] = { 1, 1, 1, ..., 1 }; // say, 1000 elements

would store something less than 1000*sizeof(int) bytes in the executable
file.  It wouldn't be hard to do, but I'm not convinced it would be
worthwhile.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */



#387322

From: Richard Damon <richard@damon-family.org>
Date: 2024-08-04 07:22 -0400
Message-ID: <6a0172b8fe12171901a97032b8a91641308fdc50@i2pn2.org>
In reply to: #387320

On 8/3/24 10:58 PM, Keith Thompson wrote:
> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>> On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
>>
>>> ... general compression isn't something I've seen ...
>>
>> I recall Apple had a patent on some aspects of the “PEF” executable format
>> that they created for their PowerPC machines running old MacOS. This had
>> to do with some clever instruction encodings for loading stuff into
>> memory.
> 
> Is that relevant to what I asked about?
> 
> What I had in mind is something that, given this:
> 
>      static int buf[] = { 1, 1, 1, ..., 1 }; // say, 1000 elements
> 
> would store something less than 1000*sizeof(int) bytes in the executable
> file.  It wouldn't be hard to do, but I'm not convinced it would be
> worthwhile.
> 

I vaguely seem to remember an embedded format that did something like
this. The .init segment that was "copied" to the .data segment had a
simple run-length encoding option. For non-repetitive data, it just
encoded 1 copy of length n. But it could also encode repeats like your
example. When EPROM was a scarce commodity, squeezing out a bit of size
for the .init segment was useful.

My guess is that since it didn't persist, it didn't actually help that much.



#387507

From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Date: 2024-08-12 02:55 -0700
Message-ID: <86le129ibu.fsf@linuxsc.com>
In reply to: #387322

Richard Damon <richard@damon-family.org> writes:

> On 8/3/24 10:58 PM, Keith Thompson wrote:
>
>> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>>
>>> On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
>>>
>>>> ... general compression isn't something I've seen ...
>>>
>>> I recall Apple had a patent on some aspects of the “PEF”
>>> executable format that they created for their PowerPC machines
>>> running old MacOS.  This had to do with some clever instruction
>>> encodings for loading stuff into memory.
>>
>> Is that relevant to what I asked about?
>>
>> What I had in mind is something that, given this:
>>
>>      static int buf[] = { 1, 1, 1, ..., 1 }; // say, 1000 elements
>>
>> would store something less than 1000*sizeof(int) bytes in the
>> executable file.  It wouldn't be hard to do, but I'm not convinced
>> it would be worthwhile.
>
> I vaguely seem to remember an embedded format that did something like
> this.  The .init segment that was "copied" to the .data segment had
> a simple run-length encoding option.  For non-repetitive data, it
> just encoded 1 copy of length n.  But it could also encode repeats
> like your example.  When EPROM was a scarce commodity, squeezing out a
> bit of size for the .init segment was useful.
>
> My guess is that since it didn't persist, it didn't actually help that
> much.

Or maybe it helped back in the day, but since then technology has
changed and it doesn't help any more.



#387343

From: Lawrence D'Oliveiro <ldo@nz.invalid>
Date: 2024-08-05 06:33 +0000
Message-ID: <v8prni$hmng$1@dont-email.me>
In reply to: #387320

On Sat, 03 Aug 2024 19:58:37 -0700, Keith Thompson wrote:

> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>
>> On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
>>
>>> ... general compression isn't something I've seen ...
>>
>> I recall Apple had a patent on some aspects of the “PEF” executable
>> format that they created for their PowerPC machines running old MacOS.
>> This had to do with some clever instruction encodings for loading stuff
>> into memory.
> 
> Is that relevant to what I asked about?

“Compression”



#387344

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-08-04 23:38 -0700
Message-ID: <871q33cw4p.fsf@nosuchdomain.example.com>
In reply to: #387343

Lawrence D'Oliveiro <ldo@nz.invalid> writes:
> On Sat, 03 Aug 2024 19:58:37 -0700, Keith Thompson wrote:
>> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>>> On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
>>>
>>>> ... general compression isn't something I've seen ...
>>>
>>> I recall Apple had a patent on some aspects of the “PEF” executable
>>> format that they created for their PowerPC machines running old MacOS.
>>> This had to do with some clever instruction encodings for loading stuff
>>> into memory.
>> 
>> Is that relevant to what I asked about?
>
> “Compression”

Was that intended to be responsive?

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


