| Started by | Mark Summerfield <mark@qtrac.eu> |
|---|---|
| First post | 2024-08-01 08:06 +0000 |
| Last post | 2024-08-13 17:43 -0700 |
| Articles | 20 on this page of 107 — 21 participants |
relearning C: why does an in-place change to a char* segfault? Mark Summerfield <mark@qtrac.eu> - 2024-08-01 08:06 +0000
Re: relearning C: why does an in-place change to a char* segfault? Mark Summerfield <mark@qtrac.eu> - 2024-08-01 08:24 +0000
Re: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-01 11:53 +0100
Re: relearning C: why does an in-place change to a char* segfault? Richard Harnden <richard.nospam@gmail.invalid> - 2024-08-01 09:38 +0100
Re: relearning C: why does an in-place change to a char* segfault? Mark Summerfield <mark@qtrac.eu> - 2024-08-01 08:54 +0000
Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-01 11:12 +0100
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-01 13:59 -0700
Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-01 22:07 +0100
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-01 14:28 -0700
Re: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-01 20:20 -0400
Re: relearning C: why does an in-place change to a char* segfault? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-08-02 01:06 +0000
Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-02 10:43 +0100
Re: relearning C: why does an in-place change to a char* segfault? Richard Damon <richard@damon-family.org> - 2024-08-02 11:03 -0400
Re: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-02 14:19 -0400
Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-02 19:33 +0100
Re: relearning C: why does an in-place change to a char* segfault? Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-08-03 01:31 +0000
Re: relearning C: why does an in-place change to a char* segfault? Richard Damon <richard@damon-family.org> - 2024-08-02 22:01 -0400
Re: relearning C: why does an in-place change to a char* segfault? Joe Pfeiffer <pfeiffer@cs.nmsu.edu> - 2024-08-03 08:32 -0600
Re: relearning C: why does an in-place change to a char* segfault? Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-08-04 01:05 +0000
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 02:52 -0700
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-13 17:46 -0700
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-13 18:44 -0700
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-15 16:00 -0700
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-15 16:27 -0700
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-09-27 17:33 -0700
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-09-27 20:34 -0700
Re: relearning C: why does an in-place change to a char* segfault? Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-09-28 07:22 +0200
Re: relearning C: why does an in-place change to a char* segfault? Phillip Frabott <nntp@fulltermprivacy.com> - 2024-09-28 17:57 +0000
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-09-28 13:42 -0700
Re: relearning C: why does an in-place change to a char* segfault? Phillip Frabott <nntp@fulltermprivacy.com> - 2024-09-28 22:05 +0000
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-09-28 15:17 -0700
Re: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-14 10:33 -0400
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-15 16:05 -0700
Re: relearning C: why does an in-place change to a char* segfault? Bonita Montero <Bonita.Montero@gmail.com> - 2024-08-04 15:52 +0200
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 14:11 -0700
Re: relearning C: why does an in-place change to a char* segfault? Vir Campestris <vir.campestris@invalid.invalid> - 2024-08-13 15:34 +0100
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-13 13:08 -0700
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-13 17:41 -0700
Re: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-14 10:40 +0200
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-13 17:40 -0700
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-13 18:47 -0700
Re: relearning C: why does an in-place change to a char* segfault? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-08-14 03:16 +0000
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-13 20:49 -0700
Re: relearning C: why does an in-place change to a char* segfault? scott@slp53.sl.home (Scott Lurndal) - 2024-08-01 13:28 +0000
No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Michael S <already5chosen@yahoo.com> - 2024-08-01 17:40 +0300
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-01 19:56 +0200
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? candycanearter07 <candycanearter07@candycanearter07.nomail.afraid> - 2024-08-02 05:30 +0000
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-02 03:02 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Richard Harnden <richard.nospam@gmail.invalid> - 2024-08-02 13:04 +0100
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-02 09:59 -0400
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-02 11:24 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Richard Damon <richard@damon-family.org> - 2024-08-02 14:42 -0400
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-02 14:58 -0400
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Richard Damon <richard@damon-family.org> - 2024-08-02 15:11 -0400
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 08:32 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 08:27 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-02 12:27 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-02 23:29 +0100
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-02 16:11 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-05 02:06 +0100
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-04 19:37 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-04 19:38 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-05 12:03 +0100
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-05 13:35 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-05 21:54 +0100
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-05 15:39 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-06 12:29 +0100
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-06 12:48 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-06 23:59 +0100
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-12 16:18 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-08-05 15:44 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 14:38 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-12 14:55 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-09-03 06:11 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? dave_thompson_2@comcast.net - 2024-08-25 16:52 -0400
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-25 14:26 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 14:33 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-12 14:45 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 16:05 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-13 13:08 +0200
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-13 13:00 -0700
Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-03 19:54 +0200
Re: relearning C: why does an in-place change to a char* segfault? James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-08-01 12:02 -0400
Re: relearning C: why does an in-place change to a char* segfault? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-08-01 19:39 +0000
Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-01 21:42 +0100
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-01 14:13 -0700
Re: relearning C: why does an in-place change to a char* segfault? Ben Bacarisse <ben@bsb.me.uk> - 2024-08-01 22:40 +0100
Re: relearning C: why does an in-place change to a char* segfault? Kaz Kylheku <643-408-1753@kylheku.com> - 2024-08-02 00:37 +0000
Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-02 11:36 +0100
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 13:47 -0700
Re: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-03 00:14 +0200
Re: relearning C: why does an in-place change to a char* segfault? scott@slp53.sl.home (Scott Lurndal) - 2024-08-03 17:07 +0000
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-03 17:11 -0700
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-03 17:07 -0700
Re: relearning C: why does an in-place change to a char* segfault? Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-08-04 01:08 +0000
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-03 19:58 -0700
Re: relearning C: why does an in-place change to a char* segfault? Richard Damon <richard@damon-family.org> - 2024-08-04 07:22 -0400
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-12 02:55 -0700
Re: relearning C: why does an in-place change to a char* segfault? Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-08-05 06:33 +0000
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-04 23:38 -0700
Re: relearning C: why does an in-place change to a char* segfault? Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-08-05 21:27 +0000
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-05 15:40 -0700
Re: relearning C: why does an in-place change to a char* segfault? Bart <bc@freeuk.com> - 2024-08-06 16:57 +0100
Re: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-06 20:40 +0200
Re: relearning C: why does an in-place change to a char* segfault? David Brown <david.brown@hesbynett.no> - 2024-08-04 17:20 +0200
Re: relearning C: why does an in-place change to a char* segfault? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-08-01 14:06 -0700
Re: relearning C: why does an in-place change to a char* segfault? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-08-13 17:43 -0700
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-08-13 13:00 -0700 |
| Subject | Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? |
| Message-ID | <877cckgplx.fsf@nosuchdomain.example.com> |
| In reply to | #387540 |
David Brown <david.brown@hesbynett.no> writes:
> On 13/08/2024 01:05, Tim Rentsch wrote:
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>>>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>>>> candycanearter07 <candycanearter07@candycanearter07.nomail.afraid>
>>>>> writes:
>>>>>> David Brown <david.brown@hesbynett.no> wrote at 17:56 this Thursday (GMT):
>>>>> [...]
>>>>>
>>>>>>> gcc has the option "-Wwrite-strings" that makes string literals in
>>>>>>> C have "const char" array type, and thus give errors when you try
>>>>>>> to assign to a non-const char * pointer. But the option has to be
>>>>>>> specified explicitly (it is not in -Wall) because it changes the
>>>>>>> meaning of the code and can cause compatibility issues with
>>>>>>> existing correct code.
>>>>>>
>>>>>> -Wwrite-strings is included in -Wpedantic.
>>>>>
>>>>> No it isn't, nor is it included in -Wall -- and it wouldn't make
>>>>> sense to do so.
>>>>>
>>>>> The -Wpedantic option is intended to produce all required
>>>>> diagnostics for the specified C standard. -Wwrite-strings
>>>>> gives string literals the type `const char[LENGTH]`, which
>>>>> enables useful diagnostics but is *non-conforming*.
>>>>
>>>> As long as the -Wwrite-strings diagnostics are only warnings the
>>>> result is still conforming.
>>>
>>> It's not just about diagnostics. This program:
>>>
>>> #include <stdio.h>
>>> int main(void) {
>>>     puts(_Generic("hello",
>>>                   char*: "char*",
>>>                   const char*: "const char*",
>>>                   default: "?"));
>>> }
>>>
>>> must print "char*" in a conforming implementation. With
>>> (gcc|clang) -Wwrite-strings, it prints "const char*".
>> Good point. I hadn't considered such cases.
>>
>>> And something as simple as:
>>>
>>> char *p = "hello";
>>>
>>> is rejected with a fatal error with "-Wwrite-strings -pedantic-errors".
>> That violates the "As long as the -Wwrite-strings diagnostics are
>> only warnings" condition.
>
> Indeed.
>
> I personally think it is nice to have an option to make string
> literals "const" in C, even though it is non-conforming. I also think
> it is very useful to have a warning on attempts to write to string
> literals. But I think gcc has made a mistake here by conflating the
> two. I'd rather see the warning being enabled by default (or at least
> in -Wall), while the "make string literals const" option should
> require an explicit flag and be a "-f" flag rather than a "-W" flag.
> The current situation seems to be a quick-and-dirty way to get the
> warning.
>
> Other people may have different opinions, of course :-)
I agree. An alternative way to implement "-Wwrite-strings" might have
been to invent a new attribute that can be applied to string literal
objects. With the current "-Wwrite-strings", gcc marks string literal
objects as const, with all the non-conforming consequences that implies.
Instead, they could have added an attribute like, say, "unwritable" that
triggers warnings but no other changes in semantics and no fatal errors
(unless you use -Werror, but then you're literally asking for it).
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2024-08-03 19:54 +0200 |
| Subject | Re: No warning at implicit removal of const. Was: relearning C: why does an in-place change to a char* segfault? |
| Message-ID | <v8lqsc$3i80j$1@dont-email.me> |
| In reply to | #387252 |
On 02/08/2024 07:30, candycanearter07 wrote:
> David Brown <david.brown@hesbynett.no> wrote at 17:56 this Thursday (GMT):
>> On 01/08/2024 16:40, Michael S wrote:
>>> On Thu, 01 Aug 2024 08:06:57 +0000
>>> Mark Summerfield <mark@qtrac.eu> wrote:
>>>
>>>> This program segfaults at the commented line:
>>>>
>>>> #include <ctype.h>
>>>> #include <stdio.h>
>>>>
>>>> void uppercase_ascii(char *s) {
>>>>     while (*s) {
>>>>         *s = toupper(*s); // SEGFAULT
>>>>         s++;
>>>>     }
>>>> }
>>>>
>>>> int main() {
>>>>     char* text = "this is a test";
>>>>     printf("before [%s]\n", text);
>>>>     uppercase_ascii(text);
>>>>     printf("after [%s]\n", text);
>>>> }
>>>>
>>>
>>> The answers to your question are already given above, so I'd talk about
>>> something else. Sorry about it.
>>>
>>> To my surprise, none of the 3 major compilers that I tried issued the
>>> warning at this line:
>>> char* text = "this is a test";
>>> If implicit conversion of 'const char*' to 'char*' does not warrant a
>>> compiler warning then I don't know what does.
>>> Is there something in the Standard that explicitly forbids diagnostic
>>> for this sort of conversion?
>>>
>>> BTW, all 3 compilers issue reasonable warnings when I write it slightly
>>> differently:
>>> const char* ctext = "this is a test";
>>> char* text = ctext;
>>>
>>> I am starting to suspect that compilers (and the Standard?) consider
>>> string literals as being of type 'char*' rather than 'const char*'.
>>>
>>
>> Your suspicions are correct - in C, string literals are used to
>> initialise an array of char (or wide char, or other appropriate
>> character type). Perhaps you are thinking of C++, where the type is
>> "const char" (or other const character type).
>>
>> So in C, when a string literal is used in an expression it is converted
>> to a "char *" pointer. You can, of course, assign that to a "const char
>> *" pointer. But it does not make sense to have a warning when assigning
>> it to a non-const "char *" pointer. This is despite it being undefined
>> behaviour (explicitly stated in the standards) to attempt to write to a
>> string literal.
>>
>> The reason string literals are not const in C is backwards compatibility
>> - they existed before C had "const", and making string literals into
>> "const char" arrays would mean that existing code that assigned them to
>> non-const pointers would then be in error. C++ was able to do the right
>> thing and make them arrays of const char because it had "const" from the
>> beginning.
>>
>> gcc has the option "-Wwrite-strings" that makes string literals in C
>> have "const char" array type, and thus give errors when you try to
>> assign to a non-const char * pointer. But the option has to be
>> specified explicitly (it is not in -Wall) because it changes the meaning
>> of the code and can cause compatibility issues with existing correct code.
>
>
> -Wwrite-strings is included in -Wpedantic.
No, it is not - which is a good thing, because -Wpedantic should not
include features that change the semantics of the language! (IMHO the
flag should not be called -Wwrite-strings, but -fconst-string-literals
or similar. It's not really a normal warning option.)
For C++, -pedantic-errors includes the -Wwrite-strings flag which then
makes implicit conversion of string literal expressions to non-const
char* pointers an error. But that's C++, not C.
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2024-08-01 12:02 -0400 |
| Message-ID | <v8gbim$283gj$1@dont-email.me> |
| In reply to | #387229 |
On 8/1/24 04:06, Mark Summerfield wrote:
> This program segfaults at the commented line:
>
> #include <ctype.h>
> #include <stdio.h>
>
> void uppercase_ascii(char *s) {
>     while (*s) {
>         *s = toupper(*s); // SEGFAULT
>         s++;
>     }
> }
>
> int main() {
>     char* text = "this is a test";
"In translation phase 7, a byte or code of value zero is appended to
each multibyte character sequence that results from a string literal or
literals. 89) The multibyte character sequence is then used to
initialize an array of static storage duration and length just
sufficient to contain the sequence. ..." (6.4.5p6)
"... If the program attempts to modify such an array, the behavior is
undefined." (6.4.5p7).
This gives implementations the freedom, for instance, to store that array
in read-only memory, though they don't have to do so. The segfault you
got suggests that the implementation you're using did so. On other
platforms, writes to read-only memory might be silently ignored. On a
platform where it is possible to write to such memory, the
implementation is still free to optimize the code on the assumption that
you won't. That could produce bizarrely unexpected behavior if you
actually do modify it.
What you want to do is initialize an array with the string literal:
char text[] = "this is a test";
Nominally, such an array is initialized by copying from the string
literal's array. However, there's no way for strictly conforming code to
determine whether or not there are two such arrays. If the "text" array
has static storage duration, the string literal's array is likely to be
optimized away.
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2024-08-01 19:39 +0000 |
| Message-ID | <20240801114615.906@kylheku.com> |
| In reply to | #387229 |
On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
> This program segfaults at the commented line:
>
> #include <ctype.h>
> #include <stdio.h>
>
> void uppercase_ascii(char *s) {
>     while (*s) {
>         *s = toupper(*s); // SEGFAULT
>         s++;
>     }
> }
>
> int main() {
>     char* text = "this is a test";
The "this is a test" object is a literal. It is part of the program's image.
When you try to change it, you're making your program self-modifying.
The ISO C language standard doesn't require implementations to support
self-modifying programs; the behavior is left undefined.
It could work in some documented, reliable way, in a given
implementation.
It's the same with any other constant in the program. Say you have
a malloc(1024) somewhere in the program. That 1024 number is encoded
into the program's image somehow, and in principle you could write code
to somehow get at that number and change it to 256. Long before you got
that far, you would be in undefined behavior territory. If it worked,
it could have surprising effects. For instance, there could be another
call to malloc(1024) in the program and, surprisingly, *that* one also
changes to malloc(256).
A literal like "this is a test" is similar to that 1024, except
that it's very easy to get at it. The language defines it as an object
with an address, and to get that address all we have to do is evaluate
that expression itself. A minimal piece of code that requests the
undefined consequences of modifying a string literal is as easy
as "a"[0] = 0.
> Program received signal SIGSEGV, Segmentation fault.
> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
> at inplace.c:6
> 6 *s = toupper(*s);
On Linux, the string literals of a C executable are located together
with the program text. They are interspersed among the machine
instructions which reference them. The program text is mapped
read-only, so an attempted modification is an access violation trapped
by the OS, turned into a SIGSEGV signal.
GCC used to have a -fwritable-strings option, but it has been removed
for quite some time now.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
| From | Bart <bc@freeuk.com> |
|---|---|
| Date | 2024-08-01 21:42 +0100 |
| Message-ID | <v8gs06$2ceis$1@dont-email.me> |
| In reply to | #387240 |
On 01/08/2024 20:39, Kaz Kylheku wrote:
> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>> This program segfaults at the commented line:
>>
>> #include <ctype.h>
>> #include <stdio.h>
>>
>> void uppercase_ascii(char *s) {
>>     while (*s) {
>>         *s = toupper(*s); // SEGFAULT
>>         s++;
>>     }
>> }
>>
>> int main() {
>>     char* text = "this is a test";
>
> The "this is a test" object is a literal. It is part of the program's image.
So is the text here:
char text[]="this is a test";
But this can be changed without making the program self-modifying.
I guess it depends on what is classed as the program's 'image'.
I'd say the image in the state it is in just after loading or just
before execution starts (since certain fixups are needed). But some
sections will be writable during execution, some not.
> When you try to change it, you're making your program self-modifying.
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
>> at inplace.c:6
>> 6 *s = toupper(*s);
>
> On Linux, the string literals of a C executable are located together
> with the program text. They are interspersed among the machine
> instructions which reference them. The program text is mapped
> read-only, so an attempted modification is an access violation trapped
> by the OS, turned into a SIGSEGV signal.
Does it really do that? That's the method I've used for read-only
strings, to put them into the code-segment (since I neglected to support
a dedicated read-only data section, and it's too much work now).
But I don't like it since the code section is also executable; you could
inadvertently execute code within a string (which might happen to
contain machine code for other purposes).
The dangers are small, but there must be reasons why a dedicated
section is normally used. gcc on Windows creates up to 19 sections, so
it would be odd for literal strings to share with code.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-08-01 14:13 -0700 |
| Message-ID | <87bk2cgd4z.fsf@nosuchdomain.example.com> |
| In reply to | #387241 |
Bart <bc@freeuk.com> writes:
> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>>     while (*s) {
>>>         *s = toupper(*s); // SEGFAULT
>>>         s++;
>>>     }
>>> }
>>>
>>> int main() {
>>>     char* text = "this is a test";
>> The "this is a test" object is a literal. It is part of the
>> program's image.
>
> So is the text here:
>
> char text[]="this is a test";
>
> But this can be changed without making the program self-modifying.
Incorrect. The string literal results in the creation of an array
object. Any attempt to modify that array object would have undefined
behavior -- but there's no way to modify it because its address isn't
available to the code.
`text` is a distinct object. At execution time (assuming it's defined
at block scope), that object is initialized by copying from the string
literal object. (This is what happens in the abstract machine; there
are opportunities for optimization that might result in the string
literal object not existing in the generated code.)
> I guess it depends on what is classed as the program's 'image'.
Not really.
Given:
int n = 42;
you can't modify 42, but you can modify n. There's no need to consider
the idea of self-modifying code. You're just trying to make it seem
more confusing than it really is.
[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
| From | Ben Bacarisse <ben@bsb.me.uk> |
|---|---|
| Date | 2024-08-01 22:40 +0100 |
| Message-ID | <8734nnexbs.fsf@bsb.me.uk> |
| In reply to | #387241 |
Bart <bc@freeuk.com> writes:
> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>> while (*s) {
>>> *s = toupper(*s); // SEGFAULT
>>> s++;
>>> }
>>> }
>>>
>>> int main() {
>>> char* text = "this is a test";
>> The "this is a test" object is a literal. It is part of the program's
>> image.
>
> So is the text here:
>
> char text[]="this is a test";
>
> But this can be changed without making the program self-modifying.
Different "this". The array generated by the string can't be modified
without UB. The "this" that can be changed in the corrected version is
a plain, automatically allocated array of char, initialised with the
values from the string.
> I guess it depends on what is classed as the program's 'image'.
The self-modifying remark is a bit of a red herring, but altering the
value of named automatic objects can't be classed as altering the
program's image in any reasonable way.
> I'd say the image in the state it is in just after loading or just before
> execution starts (since certain fixups are needed). But some sections will
> be writable during execution, some not.
>
>> When you try to change it, you're making your program self-modifying.
>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
>>> at inplace.c:6
>>> 6 *s = toupper(*s);
>> On Linux, the string literals of a C executable are located together
>> with the program text. They are interspersed among the machine
>> instructions which reference them. The program text is mapped
>> read-only, so an attempted modification is an access violation trapped
>> by the OS, turned into a SIGSEGV signal.
>
> Does it really do that?
Linux does not really have much to do with it; the C implementation
decides, though the OS will influence what choices make more or less
sense.
For example, with my gcc (13.2.0) on Ubuntu the string is put into a
section called .rodata, but tcc on the same Linux box puts it in .data.
As a result, the tcc-compiled program runs without any issues and outputs:
before [this is a test]
after [THIS IS A TEST]
Some C implementations, on some Linux systems, might put strings in the
text segment, but I've not seen a system that does that for decades.
(Mind you, "Linux" refers to a huge class of systems ranging from top-end
servers to tiny embedded devices.)
--
Ben.
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2024-08-02 00:37 +0000 |
| Message-ID | <20240801172148.200@kylheku.com> |
| In reply to | #387241 |
On 2024-08-01, Bart <bc@freeuk.com> wrote:
> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>> while (*s) {
>>> *s = toupper(*s); // SEGFAULT
>>> s++;
>>> }
>>> }
>>>
>>> int main() {
>>> char* text = "this is a test";
>>
>> The "this is a test" object is a literal. It is part of the program's image.
>
> So is the text here:
>
> char text[]="this is a test";
>
> But this can be changed without making the program self-modifying.
The array which is initialized by the literal is what can be
changed.
In this situation, the literal is just initializer syntax,
not required to be an object with an address.
But there could well be such an object in the program image,
especially if the array is automatic, and thus instantiated
many times.
If the program tries to search for that object and modify it,
it will run into UB.
> I guess it depends on what is classed as the program's 'image'.
>
> I'd say the image in the state it is in just after loading or just
> before execution starts (since certain fixups are needed). But some
> sections will be writable during execution, some not.
Programs can self-modify in ways designed into the run time.
The toaster has certain internal receptacles that can take
certain forks, according to some rules, which do not affect
the user operating the toaster according to the manual.
> The dangers are small, but there must be reasons why a dedicated
> section is normally used. gcc on Windows creates up to 19 sections, so
> it would be odd for literal strings to share with code.
One reason is that PC-relative addressing can be used by code to
find its literals. Since that usually has a limited range, it helps
to keep the literals with the code. Combining sections also reduces
size. The addressing is also relocatable, which is useful in shared
libs.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
| From | Bart <bc@freeuk.com> |
|---|---|
| Date | 2024-08-02 11:36 +0100 |
| Message-ID | <v8icrj$2paum$1@dont-email.me> |
| In reply to | #387249 |
On 02/08/2024 01:37, Kaz Kylheku wrote:
> On 2024-08-01, Bart <bc@freeuk.com> wrote:
>> On 01/08/2024 20:39, Kaz Kylheku wrote:
>>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>>> This program segfaults at the commented line:
>>>>
>>>> #include <ctype.h>
>>>> #include <stdio.h>
>>>>
>>>> void uppercase_ascii(char *s) {
>>>> while (*s) {
>>>> *s = toupper(*s); // SEGFAULT
>>>> s++;
>>>> }
>>>> }
>>>>
>>>> int main() {
>>>> char* text = "this is a test";
>>>
>>> The "this is a test" object is a literal. It is part of the program's image.
>>
>> So is the text here:
>>
>> char text[]="this is a test";
>>
>> But this can be changed without making the program self-modifying.
>
> The array which is initialized by the literal is what can be
> changed.
>
> In this situation, the literal is just initializer syntax,
> not required to be an object with an address.
I don't spot the 'int main() {' part of your example; my version of it
was meant to be static. (My A, B examples explicitly used 'static'.)
>> I guess it depends on what is classed as the program's 'image'.
>>
>> I'd say the image in the state it is in just after loading or just
>> before execution starts (since certain fixups are needed). But some
>> sections will be writable during execution, some not.
>
> Programs can self-modify in ways designed into the run time.
> The toaster has certain internal receptacles that can take
> certain forks, according to some rules, which do not affect
> the user operating the toaster according to the manual.
>
>> The dangers are small, but there must be reasons why a dedicated
>> section is normally used. gcc on Windows creates up to 19 sections, so
>> it would be odd for literal strings to share with code.
>
> One reason is that PC-relative addressing can be used by code to
> find its literals. Since that usually has a limited range, it helps
> to keep the literals with the code. Combining sections also reduces
> size. The addressing is also relocatable, which is useful in shared
> libs.
You must be talking about ARM then, with its limited address
displacement (I think 12 bits or +/- 2KB).
On x64, PC-relative uses a 32-bit offset so the range is +/- 2GB; enough
to have string literals located in their own read-only section of memory.
I'm sure you can do that on ARM too; I can think of several ways (and
there are loads more registers to play with and keep as bases to tables
of such data). But I don't know what real code does.
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2024-08-12 13:47 -0700 |
| Message-ID | <86a5hh4gft.fsf@linuxsc.com> |
| In reply to | #387249 |
Kaz Kylheku <643-408-1753@kylheku.com> writes:
> On 2024-08-01, Bart <bc@freeuk.com> wrote:
>
>> On 01/08/2024 20:39, Kaz Kylheku wrote:
>>
>>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>>
>>>> This program segfaults at the commented line:
>>>>
>>>> #include <ctype.h>
>>>> #include <stdio.h>
>>>>
>>>> void uppercase_ascii(char *s) {
>>>> while (*s) {
>>>> *s = toupper(*s); // SEGFAULT
>>>> s++;
>>>> }
>>>> }
>>>>
>>>> int main() {
>>>> char* text = "this is a test";
>>>
>>> The "this is a test" object is a literal. It is part of the
>>> program's image.
>>
>> So is the text here:
>>
>> char text[]="this is a test";
>>
>> But this can be changed without making the program self-modifying.
>
> The array which is initialized by the literal is what can be
> changed.
>
> In this situation, the literal is just initializer syntax,
> not required to be an object with an address.
In the abstract machine I believe the initializing string
literal is required to be an object with an address. The
discussion of string literals in 6.4.5 says there is such
an object for every string literal, and I don't see any
text in 6.7.9, covering Initialization, that overrules or
contradicts that.
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2024-08-03 00:14 +0200 |
| Message-ID | <v8jlnk$31hqf$1@dont-email.me> |
| In reply to | #387241 |
On 01/08/2024 22:42, Bart wrote:
> On 01/08/2024 20:39, Kaz Kylheku wrote:
>> On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
>>> This program segfaults at the commented line:
>>>
>>> #include <ctype.h>
>>> #include <stdio.h>
>>>
>>> void uppercase_ascii(char *s) {
>>> while (*s) {
>>> *s = toupper(*s); // SEGFAULT
>>> s++;
>>> }
>>> }
>>>
>>> int main() {
>>> char* text = "this is a test";
>>
>> The "this is a test" object is a literal. It is part of the program's
>> image.
>
> So is the text here:
>
> char text[]="this is a test";
>
> But this can be changed without making the program self-modifying.
"this is a test" is a string literal, and is typically part of the
program's image. (There are some C implementations that do things
differently, like storing such initialisation data in a compressed format.)
The array "char text[]", however, is a normal variable of type array of
char. It is most definitely not part of the program image - it is in
ram (statically allocated or on the stack, depending on the context) and
is initialised by copying the characters from the string literal (prior
to main(), or at each entry to its scope if it is a local variable).
The string literal initialisation data cannot be changed without
self-modifying code or other undefined behaviour. The variable "text"
is just a normal array and can be changed at will.
>
> I guess it depends on what is classed as the program's 'image'.
>
No, it depends on understanding what the C code means and not trying to
confuse yourself and others.
> I'd say the image in the state it is in just after loading or just
> before execution starts (since certain fixups are needed). But some
> sections will be writable during execution, some not.
>
That is a poor definition because you are not considering initialised
data, and you are not clear about what you mean by "before execution
starts". A C program typically has an entry point that clears the
zero-initialised program-lifetime data, initialises the initialised
program-lifetime data by copying from a block in the program image, then
sets up things like stdin, heap support, argc/argv, and various other
run-time setup features. Then it calls main(). The initialised data
section and zero-initialised data section are certainly part of the
state of the program at the start of the execution from C's viewpoint -
entry to main(). They are equally certainly not part of the program image.
One reasonable definition of "program image" would be "the file on the
disk" (on general-purpose OS's) or "the binary data in flash" on typical
embedded systems. Another might be the read-only data sections set up
by the OS loader just before jumping to the entry point of the C
run-time code (long before main() is called and the C code itself starts).
>> When you try to change it, you're making your program self-modifying.
>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
>>> at inplace.c:6
>>> 6 *s = toupper(*s);
>>
>> On Linux, the string literals of a C executable are located together
>> with the program text. They are interspersed among the machine
>> instructions which reference them. The program text is mapped
>> read-only, so an attempted modification is an access violation trapped
>> by the OS, turned into a SIGSEGV signal.
>
> Does it really do that? That's the method I've used for read-only
> strings, to put them into the code-segment (since I neglected to support
> a dedicated read-only data section, and it's too much work now).
>
No, Linux systems don't have read-only data or string literals
interspersed with code. They have such data in separate segments, for
better cache efficiency and to allow different section attributes
(read-only data can't be executed).
> But I don't like it since the code section is also executable; you could
> inadvertently execute code within a string (which might happen to
> contain machine code for other purposes).
>
That's why code and read-only data are rarely interspersed.
> The dangers are small, but there must be reasons why a dedicated
> section is normally used. gcc on Windows creates up to 19 sections, so
> it would be odd for literal strings to share with code.
>
>
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2024-08-03 17:07 +0000 |
| Message-ID | <PttrO.6301$UJj9.4591@fx33.iad> |
| In reply to | #387275 |
David Brown <david.brown@hesbynett.no> writes:
>On 01/08/2024 22:42, Bart wrote:
>> char text[]="this is a test";
>>
>> But this can be changed without making the program self-modifying.
>
>"this is a test" is a string literal, and is typically part of the
>program's image. (There are some C implementations that do things
>differently, like storing such initialisation data in a compressed format.)
>
>The array "char text[]", however, is a normal variable of type array of
>char. It is most definitely not part of the program image - it is in
>ram (statically allocated or on the stack, depending on the context) and
>is initialised by copying the characters from the string literal (prior
>to main(), or at each entry to its scope if it is a local variable).
Linux (ELF):
A file-scope static declaration of char text[] will emit the string
literal into the .data section and that data section will be loaded
into memory by the ELF loader. There is no copy made at runtime
before main().
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
char text1[] = "This is a test of a static-scope string";
int
main(int argc, const char **argv)
{
    char text2[] = "This is a test of a function-scope string";
    /* %p expects a void * argument, so cast explicitly */
    fprintf(stdout, "%p %s\n", (void *)&text1, text1);
    fprintf(stdout, "%s\n", text2);
    return 0;
}
$ /tmp/a
0x601060 This is a test of a static-scope string
This is a test of a function-scope string
$ objdump -p /tmp/a
/tmp/a: file format elf64-x86-64
Program Header:
PHDR off 0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x
INTERP off 0x0000000000000238 vaddr 0x0000000000400238 paddr 0x0000000000400238 align 2**0
filesz 0x000000000000001c memsz 0x000000000000001c flags r--
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
filesz 0x00000000000007dc memsz 0x00000000000007dc flags r-x
LOAD off 0x0000000000000e10 vaddr 0x0000000000600e10 paddr 0x0000000000600e10 align 2**21
filesz 0x0000000000000278 memsz 0x0000000000000290 flags rw-
.data section:
0000e00: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000e10: 5005 4000 0000 0000 3005 4000 0000 0000 P.@.....0.@.....
0000e20: 0000 0000 0000 0000 0100 0000 0000 0000 ................
0000e30: 0100 0000 0000 0000 0c00 0000 0000 0000 ................
0000e40: 2804 4000 0000 0000 0d00 0000 0000 0000 (.@.............
0000e50: a406 4000 0000 0000 1900 0000 0000 0000 ..@.............
0000e60: 100e 6000 0000 0000 1b00 0000 0000 0000 ..`.............
0000e70: 0800 0000 0000 0000 1a00 0000 0000 0000 ................
0000e80: 180e 6000 0000 0000 1c00 0000 0000 0000 ..`.............
0000e90: 0800 0000 0000 0000 f5fe ff6f 0000 0000 ...........o....
0000ea0: 9802 4000 0000 0000 0500 0000 0000 0000 ..@.............
0000eb0: 3803 4000 0000 0000 0600 0000 0000 0000 8.@.............
0000ec0: c002 4000 0000 0000 0a00 0000 0000 0000 ..@.............
0000ed0: 4700 0000 0000 0000 0b00 0000 0000 0000 G...............
0000ee0: 1800 0000 0000 0000 1500 0000 0000 0000 ................
0000ef0: 0000 0000 0000 0000 0300 0000 0000 0000 ................
0000f00: 0010 6000 0000 0000 0200 0000 0000 0000 ..`.............
0000f10: 4800 0000 0000 0000 1400 0000 0000 0000 H...............
0000f20: 0700 0000 0000 0000 1700 0000 0000 0000 ................
0000f30: e003 4000 0000 0000 0700 0000 0000 0000 ..@.............
0000f40: b003 4000 0000 0000 0800 0000 0000 0000 ..@.............
0000f50: 3000 0000 0000 0000 0900 0000 0000 0000 0...............
0000f60: 1800 0000 0000 0000 feff ff6f 0000 0000 ...........o....
0000f70: 9003 4000 0000 0000 ffff ff6f 0000 0000 ..@........o....
0000f80: 0100 0000 0000 0000 f0ff ff6f 0000 0000 ...........o....
0000f90: 8003 4000 0000 0000 0000 0000 0000 0000 ..@.............
0000fa0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000fb0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000fc0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000fd0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000fe0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0001000: 280e 6000 0000 0000 0000 0000 0000 0000 (.`.............
0001010: 0000 0000 0000 0000 6604 4000 0000 0000 ........f.@.....
0001020: 7604 4000 0000 0000 8604 4000 0000 0000 v.@.......@.....
0001030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0001040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0001050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0001060: 5468 6973 2069 7320 6120 7465 7374 206f This is a test o
0001070: 6620 6120 7374 6174 6963 2d73 636f 7065 f a static-scope
0001080: 2073 7472 696e 6700 4743 433a 2028 474e string.GCC: (GN
$ printf "0x%x\n" $(( 0x601060 - 0x0000000000600e10 ))
0x250
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-08-03 17:11 -0700 |
| Message-ID | <877ccxdu44.fsf@nosuchdomain.example.com> |
| In reply to | #387300 |
scott@slp53.sl.home (Scott Lurndal) writes:
> David Brown <david.brown@hesbynett.no> writes:
>>On 01/08/2024 22:42, Bart wrote:
>>> char text[]="this is a test";
>>>
>>> But this can be changed without making the program self-modifying.
>>
>>"this is a test" is a string literal, and is typically part of the
>>program's image. (There are some C implementations that do things
>>differently, like storing such initialisation data in a compressed format.)
>>
>>The array "char text[]", however, is a normal variable of type array of
>>char. It is most definitely not part of the program image - it is in
>>ram (statically allocated or on the stack, depending on the context) and
>>is initialised by copying the characters from the string literal (prior
>>to main(), or at each entry to its scope if it is a local variable).
>
> Linux (ELF):
>
> A file-scope static declaration of char text[] will emit the string
> literal into the .data section and that data section will be loaded
> into memory by the ELF loader. There is no copy made at runtime
> before main().
>
> #include <stdint.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> char text1[] = "This is a test of a static-scope string";
In the abstract machine, there's an anonymous array object corresponding
to the string literal, and `text1` is a distinct object, also with static
storage duration. The compiler optimizes the anonymous array away and
stores the data only in `text1`.
> int
> main(int argc, const char **argv)
> {
> char text2[] = "This is a test of a function-scope string";
Since the second string literal is identical, the compiler is permitted
to store them in the same place (it's unspecified, so the implementation
doesn't have to document this). Presumably there's code to copy from
the static array into `text2`, executed within `main`.
>
> fprintf(stdout, "%p %s\n", &text1, text1);
> fprintf(stdout, "%s\n", text2);
>
> return 0;
> }
>
> $ /tmp/a
> 0x601060 This is a test of a static-scope string
> This is a test of a function-scope string
>
> $ objdump -p /tmp/a
>
> /tmp/a: file format elf64-x86-64
>
> Program Header:
> PHDR off 0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
> filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x
> INTERP off 0x0000000000000238 vaddr 0x0000000000400238 paddr 0x0000000000400238 align 2**0
> filesz 0x000000000000001c memsz 0x000000000000001c flags r--
> LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
> filesz 0x00000000000007dc memsz 0x00000000000007dc flags r-x
> LOAD off 0x0000000000000e10 vaddr 0x0000000000600e10 paddr 0x0000000000600e10 align 2**21
> filesz 0x0000000000000278 memsz 0x0000000000000290 flags rw-
>
> .data section:
>
> 0000e00: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[36 lines deleted]
> 0001050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 0001060: 5468 6973 2069 7320 6120 7465 7374 206f This is a test o
> 0001070: 6620 6120 7374 6174 6963 2d73 636f 7065 f a static-scope
> 0001080: 2073 7472 696e 6700 4743 433a 2028 474e string.GCC: (GN
>
> $ printf "0x%x\n" $(( 0x601060 - 0x0000000000600e10 ))
> 0x250
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-08-03 17:07 -0700 |
| Message-ID | <87bk29duba.fsf@nosuchdomain.example.com> |
| In reply to | #387275 |
David Brown <david.brown@hesbynett.no> writes:
[...]
> "this is a test" is a string literal, and is typically part of the
> program's image. (There are some C implementations that do things
> differently, like storing such initialisation data in a compressed
> format.)
[...]
What implementations do that? Typically data that's all zeros isn't
stored in the image, but general compression isn't something I've seen
(not that I've paid much attention). It would save space in the image,
but it would require decompression at load time and wouldn't save any
space at run time.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
| From | Lawrence D'Oliveiro <ldo@nz.invalid> |
|---|---|
| Date | 2024-08-04 01:08 +0000 |
| Message-ID | <v8mkao$3n2rq$5@dont-email.me> |
| In reply to | #387313 |
On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
> ... general compression isn't something I've seen ...
I recall Apple had a patent on some aspects of the “PEF” executable format
that they created for their PowerPC machines running old MacOS. This had
to do with some clever instruction encodings for loading stuff into
memory.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-08-03 19:58 -0700 |
| Message-ID | <8734nldmea.fsf@nosuchdomain.example.com> |
| In reply to | #387317 |
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
> On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
>
>> ... general compression isn't something I've seen ...
>
> I recall Apple had a patent on some aspects of the “PEF” executable format
> that they created for their PowerPC machines running old MacOS. This had
> to do with some clever instruction encodings for loading stuff into
> memory.
Is that relevant to what I asked about?
What I had in mind is something that, given this:
static int buf[] = { 1, 1, 1, ..., 1 }; // say, 1000 elements
would store something less than 1000*sizeof(int) bytes in the executable
file. It wouldn't be hard to do, but I'm not convinced it would be
worthwhile.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
| From | Richard Damon <richard@damon-family.org> |
|---|---|
| Date | 2024-08-04 07:22 -0400 |
| Message-ID | <6a0172b8fe12171901a97032b8a91641308fdc50@i2pn2.org> |
| In reply to | #387320 |
On 8/3/24 10:58 PM, Keith Thompson wrote:
> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>> On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
>>
>>> ... general compression isn't something I've seen ...
>>
>> I recall Apple had a patent on some aspects of the “PEF” executable format
>> that they created for their PowerPC machines running old MacOS. This had
>> to do with some clever instruction encodings for loading stuff into
>> memory.
>
> Is that relevant to what I asked about?
>
> What I had in mind is something that, given this:
>
> static int buf[] = { 1, 1, 1, ..., 1 }; // say, 1000 elements
>
> would store something less than 1000*sizeof(int) bytes in the executable
> file. It wouldn't be hard to do, but I'm not convinced it would be
> worthwhile.
>
I vaguely seem to remember an embedded format that did something like
this. The .init segment that was "copied" to the .data segment had a
simple run-length encoding option. For non-repetitive data, it just
encoded 1 copy of length n. But it could also encode repeats like your
example. When EPROM was a scarce commodity, squeezing out a bit of size
for the .init segment was useful.
My guess is that since it didn't persist, it didn't actually help that much.
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2024-08-12 02:55 -0700 |
| Message-ID | <86le129ibu.fsf@linuxsc.com> |
| In reply to | #387322 |
Richard Damon <richard@damon-family.org> writes:
> On 8/3/24 10:58 PM, Keith Thompson wrote:
>
>> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>>
>>> On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
>>>
>>>> ... general compression isn't something I've seen ...
>>>
>>> I recall Apple had a patent on some aspects of the “PEF”
>>> executable format that they created for their PowerPC machines
>>> running old MacOS. This had to do with some clever instruction
>>> encodings for loading stuff into memory.
>>
>> Is that relevant to what I asked about?
>>
>> What I had in mind is something that, given this:
>>
>> static int buf[] = { 1, 1, 1, ..., 1 }; // say, 1000 elements
>>
>> would store something less than 1000*sizeof(int) bytes in the
>> executable file. It wouldn't be hard to do, but I'm not convinced
>> it would be worthwhile.
>
> I vaguely seem to remember an embedded format that did something like
> this. The .init segment that was "copied" to the .data segment had
> a simple run-length encoding option. For non-repetitive data, it
> just encoded 1 copy of length n. But it could also encode repeats
> like your example. When EPROM was a scarce commodity, squeezing out a
> bit of size for the .init segment was useful.
>
> My guess is that since it didn't persist, it didn't actually help that
> much.
Or maybe it helped back in the day, but since then technology has
changed and it doesn't help any more.
| From | Lawrence D'Oliveiro <ldo@nz.invalid> |
|---|---|
| Date | 2024-08-05 06:33 +0000 |
| Message-ID | <v8prni$hmng$1@dont-email.me> |
| In reply to | #387320 |
On Sat, 03 Aug 2024 19:58:37 -0700, Keith Thompson wrote:
> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>
>> On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
>>
>>> ... general compression isn't something I've seen ...
>>
>> I recall Apple had a patent on some aspects of the “PEF” executable
>> format that they created for their PowerPC machines running old MacOS.
>> This had to do with some clever instruction encodings for loading stuff
>> into memory.
>
> Is that relevant to what I asked about?
“Compression”
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-08-04 23:38 -0700 |
| Message-ID | <871q33cw4p.fsf@nosuchdomain.example.com> |
| In reply to | #387343 |
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
> On Sat, 03 Aug 2024 19:58:37 -0700, Keith Thompson wrote:
>> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>>> On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
>>>
>>>> ... general compression isn't something I've seen ...
>>>
>>> I recall Apple had a patent on some aspects of the “PEF” executable
>>> format that they created for their PowerPC machines running old MacOS.
>>> This had to do with some clever instruction encodings for loading stuff
>>> into memory.
>>
>> Is that relevant to what I asked about?
>
> “Compression”
Was that intended to be responsive?
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */