


Re: A Famous Security Bug

Started by: Kaz Kylheku <433-929-6894@kylheku.com>
First post: 2024-03-20 18:54 +0000
Last post: 2024-03-28 05:52 -0400
Articles: 20 on this page of 116; 13 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-20 18:54 +0000
    Re: A Famous Security Bug scott@slp53.sl.home (Scott Lurndal) - 2024-03-20 19:38 +0000
      Re: A Famous Security Bug Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-20 14:20 -0700
        Re: A Famous Security Bug Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-20 14:23 -0700
    Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-21 16:13 +0100
      Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-21 17:41 +0000
        Re: A Famous Security Bug "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-03-21 12:37 -0700
          Re: A Famous Security Bug scott@slp53.sl.home (Scott Lurndal) - 2024-03-21 20:21 +0000
            Re: A Famous Security Bug "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-03-21 14:31 -0700
              Re: A Famous Security Bug scott@slp53.sl.home (Scott Lurndal) - 2024-03-21 23:19 +0000
                Re: A Famous Security Bug "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-03-21 17:38 -0700
                  Re: A Famous Security Bug "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-03-22 12:39 -0700
        Re: A Famous Security Bug Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-21 13:46 -0700
          Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-22 15:50 +0000
            Re: A Famous Security Bug Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-22 09:31 -0700
              Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-22 17:20 +0000
                Re: A Famous Security Bug James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-03-22 13:38 -0400
                  Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-22 19:27 +0000
                Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-22 19:13 +0100
                Re: A Famous Security Bug Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-22 11:21 -0700
                  Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-22 19:43 +0000
                    Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-23 16:36 +0100
                      Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-23 16:07 +0000
                        Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-23 18:58 +0100
                          Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-24 01:23 +0000
                        Re: A Famous Security Bug James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-03-23 12:51 -0400
                          Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-24 05:50 +0000
                            Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-24 14:21 +0100
                              Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-24 16:02 +0000
                                Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-24 17:27 +0100
                                  Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-27 21:06 +0000
                                    Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-28 19:07 +0100
                                Re: A Famous Security Bug "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-03-24 12:45 -0700
            Re: A Famous Security Bug James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-03-22 13:05 -0400
            Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-22 18:42 +0100
              Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-22 18:55 +0000
                Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-22 21:26 +0100
          Re: A Famous Security Bug James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-03-22 12:35 -0400
            Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-22 17:28 +0000
              Re: A Famous Security Bug James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-03-22 13:38 -0400
        Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-22 13:51 +0100
    Re: A Famous Security Bug Anton Shepelev <anton.txt@gmail.moc> - 2024-03-21 21:13 +0300
      Re: A Famous Security Bug Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-21 12:42 -0700
      Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-21 20:21 +0000
        Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-22 14:38 +0100
          Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-22 15:33 +0000
            Re: A Famous Security Bug James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-03-22 13:15 -0400
            Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-22 18:50 +0100
              Re: A Famous Security Bug Richard Kettlewell <invalid@invalid.invalid> - 2024-03-23 09:20 +0000
                Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-23 16:06 +0000
                Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-23 17:08 +0100
                  Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-23 16:56 +0000
                Re: A Famous Security Bug Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-03-24 09:45 -0700
                  Re: A Famous Security Bug Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-03-24 17:53 +0000
                    Re: A Famous Security Bug Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-04-17 12:10 -0700
                      Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-04-18 10:20 +0200
                      Re: A Famous Security Bug Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-04-18 14:26 -0700
        Re: A Famous Security Bug Anton Shepelev <anton.txt@g{oogle}mail.com> - 2024-03-28 12:23 +0300
          Re: A Famous Security Bug scott@slp53.sl.home (Scott Lurndal) - 2024-03-28 14:12 +0000
      Re: A Famous Security Bug Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-03-22 07:50 -0700
      Re: A Famous Security Bug James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-03-22 13:14 -0400
        Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-22 21:41 +0000
          Re: A Famous Security Bug Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-22 16:30 -0700
            Re: A Famous Security Bug Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-23 00:09 +0000
              Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-23 17:25 +0100
                Re: A Famous Security Bug scott@slp53.sl.home (Scott Lurndal) - 2024-03-23 16:51 +0000
                Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-23 19:58 +0000
                  Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-24 14:42 +0100
          Re: A Famous Security Bug James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-03-23 03:26 -0400
            Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-23 11:26 +0000
              Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-23 17:51 +0100
                Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-23 21:21 +0000
                  Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-24 15:52 +0100
                    Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-24 19:56 +0000
                      Re: A Famous Security Bug Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-24 13:49 -0700
                        Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-24 23:38 +0100
                        Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-25 01:42 +0300
                          Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-25 09:37 +0100
                          Re: A Famous Security Bug Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-25 08:54 -0700
                        Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-24 23:07 +0000
                          Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-25 01:39 +0200
                            Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-25 02:12 +0000
                              Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-25 09:58 +0100
                                Re: A Famous Security Bug Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-03-25 13:26 +0000
                                  Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-25 15:43 +0200
                              Re: A Famous Security Bug scott@slp53.sl.home (Scott Lurndal) - 2024-03-25 17:21 +0000
                            Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-25 09:53 +0100
                              Re: A Famous Security Bug scott@slp53.sl.home (Scott Lurndal) - 2024-03-25 17:24 +0000
                      Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-24 23:43 +0100
                        Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-25 13:16 +0200
                          Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-25 13:26 +0100
                            Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-25 15:11 +0200
                              Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-25 16:30 +0100
                              Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-25 16:39 +0000
                            Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-25 16:06 +0000
                              Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-25 18:51 +0200
                                Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-25 18:10 +0000
                                  Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-25 21:01 +0100
                                    Re: A Famous Security Bug scott@slp53.sl.home (Scott Lurndal) - 2024-03-25 20:28 +0000
                                  Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-25 23:05 +0200
                                    Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-25 21:25 +0000
                                      Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-26 01:31 +0200
                                        Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-26 00:34 +0000
                              Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-25 19:07 +0100
                  Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-24 18:53 +0300
                    Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-24 18:58 +0000
                      Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-25 13:04 +0200
                        Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-25 13:24 +0200
                      Re: A Famous Security Bug David Brown <david.brown@hesbynett.no> - 2024-03-25 16:17 +0100
                      Re: A Famous Security Bug James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-03-28 06:14 -0400
              Re: A Famous Security Bug Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-03-23 11:44 -0700
              Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-24 17:22 +0300
              Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-24 17:26 +0300
                Re: A Famous Security Bug bart <bc@freeuk.com> - 2024-03-24 19:12 +0000
                  Re: A Famous Security Bug Michael S <already5chosen@yahoo.com> - 2024-03-24 22:33 +0300
              Re: A Famous Security Bug James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-03-28 05:52 -0400



#383809 — Re: A Famous Security Bug

From: Kaz Kylheku <433-929-6894@kylheku.com>
Date: 2024-03-20 18:54 +0000
Subject: Re: A Famous Security Bug
Message-ID: <20240320114218.151@kylheku.com>
On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>   A "famous security bug":
>
> void f( void )
> { char buffer[ MAX ];
>   /* . . . */
>   memset( buffer, 0, sizeof( buffer )); }
>
>   . Can you see what the bug is?

I don't know about "the bug", but conditions can be identified under
which that would have a problem executing, like MAX being in excess
of available automatic storage.

If the /*...*/ comment represents the elision of some security sensitive
code, where the memset is intended to obliterate secret information,
of course, that obliteration is not required to work.

After the memset, the buffer has no next use, so all the assignments
performed by memset to the bytes of buffer are dead assignments that can
be elided.

To securely clear memory, you have to use a function for that purpose
that is not susceptible to optimization.

If you're not doing anything stupid, like link time optimization, an
external function in another translation unit (a function that the
compiler doesn't recognize as being an alias or wrapper for memset)
ought to suffice.
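
A minimal sketch of a clearing function of that kind, assuming the
common idiom of storing through a volatile-qualified pointer. The name
secure_zero is hypothetical. Mainstream compilers keep such stores, but
whether the standard strictly requires that for an object not itself
declared volatile is debated, so take it as an idiom, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: zero n bytes at p using volatile stores, which
   the compiler is expected to treat as observable and not elide. */
void secure_zero(void *p, size_t n)
{
    assert(p != NULL || n == 0);

    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}
```

In the function from the original post, the final statement would then
read secure_zero(buffer, sizeof(buffer)); in place of the memset call.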

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#383812

From: scott@slp53.sl.home (Scott Lurndal)
Date: 2024-03-20 19:38 +0000
Message-ID: <lXGKN.156286$t8cc.2924@fx06.iad>
In reply to: #383809
Kaz Kylheku <433-929-6894@kylheku.com> writes:
>On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>   A "famous security bug":
>>
>> void f( void )
>> { char buffer[ MAX ];
>>   /* . . . */
>>   memset( buffer, 0, sizeof( buffer )); }
>>
>>   . Can you see what the bug is?
>
>I don't know about "the bug", but conditions can be identified under
>which that would have a problem executing, like MAX being in excess
>of available automatic storage.

Perhaps Stefan is under the mistaken assumption that
'buffer' devolves to a type of 'char *' when used
with the sizeof operator.
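
A short sketch of why that assumption would be mistaken: inside the
function that declares it, the operand of sizeof has array type, so the
result is the full array size. Only after the array has been passed to
another function does sizeof see a pointer. MAX is given a hypothetical
value here and both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX 64  /* hypothetical; the original post leaves MAX undefined */

/* In the declaring scope, sizeof(buffer) is the size of the array. */
size_t size_seen_in_scope(void)
{
    char buffer[MAX];
    return sizeof(buffer);      /* MAX * sizeof(char), i.e. MAX */
}

/* After a call, only a pointer arrives, so sizeof gives pointer size. */
size_t size_seen_through_pointer(char *p)
{
    assert(p != NULL);
    return sizeof(p);           /* sizeof(char *) */
}
```

So the memset in the original function does cover the whole buffer.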



#383819

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-03-20 14:20 -0700
Message-ID: <87zfus1txp.fsf@nosuchdomain.example.com>
In reply to: #383819
scott@slp53.sl.home (Scott Lurndal) writes:
> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>   A "famous security bug":
>>>
>>> void f( void )
>>> { char buffer[ MAX ];
>>>   /* . . . */
>>>   memset( buffer, 0, sizeof( buffer )); }
>>>
>>>   . Can you see what the bug is?
>>
>>I don't know about "the bug", but conditions can be identified under
>>which that would have a problem executing, like MAX being in excess
>>of available automatic storage.
>
> Perhaps Stephan is under the mistaken assumption that
> 'buffer' devolves to a type of 'char *' when used
> with the sizeof operator.

That was my first thought, but I think the idea (not clearly stated) is
that the /* . . . */ code stores sensitive information in buffer, and
the memset call is intended to clobber that information, but may be
elided since buffer is not explicitly used later.  A malicious process
with access to the program's memory might be able to read that
information after f() has returned.

C23 adds memset_explicit() for this purpose.
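
A hedged sketch of how that might be used. Library support for
memset_explicit() still varies, so this assumes a hypothetical
build-system macro HAVE_MEMSET_EXPLICIT to opt in. The fallback calls
memset through a volatile function pointer, a widely used workaround
that is not itself guaranteed by the standard. The names wipe and
memset_ptr are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef HAVE_MEMSET_EXPLICIT  /* hypothetical macro set by the build */

/* C23: void *memset_explicit(void *s, int c, size_t n); */
void wipe(void *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n != 0)
        memset_explicit(p, 0, n);
}

#else

/* Fallback: the pointer is volatile, so the compiler has to load it at
   run time and cannot assume the callee is the memset it knows. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

void wipe(void *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n != 0)
        memset_ptr(p, 0, n);
}

#endif
```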

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */



#383820

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-03-20 14:23 -0700
Message-ID: <87v85g1tsn.fsf@nosuchdomain.example.com>
In reply to: #383819
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> scott@slp53.sl.home (Scott Lurndal) writes:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>>On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>>   A "famous security bug":
>>>>
>>>> void f( void )
>>>> { char buffer[ MAX ];
>>>>   /* . . . */
>>>>   memset( buffer, 0, sizeof( buffer )); }
>>>>
>>>>   . Can you see what the bug is?
>>>
>>>I don't know about "the bug", but conditions can be identified under
>>>which that would have a problem executing, like MAX being in excess
>>>of available automatic storage.
>>
>> Perhaps Stephan is under the mistaken assumption that
>> 'buffer' devolves to a type of 'char *' when used
>> with the sizeof operator.
>
> That was my first thought, but I think the idea (not clearly stated) is
> that the /* . . . */ code stores sensitive information in buffer, and
> the memset call is intended to clobber that information, but may be
> elided since buffer is not explicitly used later.  A malicious process
> with access to the program's memory might be able to read that
> information after f() has returned.

And I should acknowledge that Kaz mentioned that before I did.

> C23 adds memset_explicit() for this purpose.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */



#383835

From: David Brown <david.brown@hesbynett.no>
Date: 2024-03-21 16:13 +0100
Message-ID: <uthirj$29aoc$1@dont-email.me>
In reply to: #383809
On 20/03/2024 19:54, Kaz Kylheku wrote:
> On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>    A "famous security bug":
>>
>> void f( void )
>> { char buffer[ MAX ];
>>    /* . . . */
>>    memset( buffer, 0, sizeof( buffer )); }
>>
>>    . Can you see what the bug is?
> 
> I don't know about "the bug", but conditions can be identified under
> which that would have a problem executing, like MAX being in excess
> of available automatic storage.
> 
> If the /*...*/ comment represents the elision of some security sensitive
> code, where the memset is intended to obliterate secret information,
> of course, that obliteration is not required to work.
> 
> After the memset, the buffer has no next use, so the all the assignments
> performed by memset to the bytes of buffer are dead assignments that can
> be elided.
> 
> To securely clear memory, you have to use a function for that purpose
> that is not susceptible to optimization.
> 
> If you're not doing anything stupid, like link time optimization, an
> external function in another translation unit (a function that the
> compiler doesn't recognize as being an alias or wrapper for memset)
> ought to suffice.
> 

Using LTO is not "stupid".  Relying on people /not/ using LTO, or not 
using other valid optimisations, is "stupid".

/Really/ dealing with this kind of potential data leakage is a 
multi-faceted problem.  Does it matter if you zero out this buffer, if 
that is just in the cache and the original password data is still in the 
RAM ready to be read with a Rowhammer attack?  Or if there is a copy 
somewhere else in code that used it?

Of course it is important to deal with possible security issues at every 
practical point, so zeroing out this buffer is part of the solution - as 
long as no one thinks that using C23's memset_explicit(), or a similar 
function, is somehow a complete fix.

Calling an external function in another translation unit, however, is 
not a way to guarantee particular effects.  Using volatile accesses is 
better (if you don't have a suitable function with the right guarantees 
for your target OS and/or C version).  There are also compiler-specific 
ways, such as adding "__asm__("" : "+m" (buffer));" after the memset.
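
A minimal sketch of the compiler-specific variant, assuming GCC or
Clang extended asm. The empty asm statement names buffer as a
read-write memory operand, so the compiler has to assume the zeroed
contents are read and cannot treat the memset as a dead store. The
sketch adds __volatile__ so that the asm statement itself is not
discarded for having an unused output; that is a deliberate variation
on the one-liner above. The function name and MAX value are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if !defined(__GNUC__)
#error "this sketch assumes GCC-compatible extended asm"
#endif

#define MAX 64  /* hypothetical buffer size */

/* Copies a secret into a local buffer, uses it, then wipes the buffer. */
int checksum_then_wipe(const char *secret)
{
    char buffer[MAX];
    size_t i, len = strlen(secret);
    int sum = 0;

    assert(len < MAX);
    memcpy(buffer, secret, len + 1);
    for (i = 0; i < len; i++)
        sum += (unsigned char)buffer[i];

    memset(buffer, 0, sizeof(buffer));
    /* Empty asm that claims to read and write buffer's memory. */
    __asm__ __volatile__("" : "+m"(buffer));
    return sum;
}
```

This only addresses the dead-store elimination; it does nothing about
copies of the secret in registers, caches, or other buffers.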




#383841

From: Kaz Kylheku <433-929-6894@kylheku.com>
Date: 2024-03-21 17:41 +0000
Message-ID: <20240321092738.111@kylheku.com>
In reply to: #383835
On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
> On 20/03/2024 19:54, Kaz Kylheku wrote:
>> On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>    A "famous security bug":
>>>
>>> void f( void )
>>> { char buffer[ MAX ];
>>>    /* . . . */
>>>    memset( buffer, 0, sizeof( buffer )); }
>>>
>>>    . Can you see what the bug is?
>> 
>> I don't know about "the bug", but conditions can be identified under
>> which that would have a problem executing, like MAX being in excess
>> of available automatic storage.
>> 
>> If the /*...*/ comment represents the elision of some security sensitive
>> code, where the memset is intended to obliterate secret information,
>> of course, that obliteration is not required to work.
>> 
>> After the memset, the buffer has no next use, so the all the assignments
>> performed by memset to the bytes of buffer are dead assignments that can
>> be elided.
>> 
>> To securely clear memory, you have to use a function for that purpose
>> that is not susceptible to optimization.
>> 
>> If you're not doing anything stupid, like link time optimization, an
>> external function in another translation unit (a function that the
>> compiler doesn't recognize as being an alias or wrapper for memset)
>> ought to suffice.
>
> Using LTO is not "stupid".  Relying on people /not/ using LTO, or not 
> using other valid optimisations, is "stupid".

LTO is a nonconforming optimization. It destroys the concept that
when a translation unit is translated, the semantic analysis is
complete, such that the only remaining activity is resolution of
external references (linkage), and that the semantic analysis of one
translation unit does not use information about another translation
unit.

This has not yet changed in the April 2023 N3096 draft, where
translation phases 7 and 8 are:

  7. White-space characters separating tokens are no longer significant.
     Each preprocessing token is converted into a token. The resulting
     tokens are syntactically and semantically analyzed and translated
     as a translation unit.

  8. All external object and function references are resolved. Library
     components are linked to satisfy external references to functions
     and objects not defined in the current translation. All such
     translator output is collected into a program image which contains
     information needed for execution in its execution environment.

and before that, the Program Structure section says:

  The separate translation units of a program communicate by (for
  example) calls to functions whose identifiers have external linkage,
  manipulation of objects whose identifiers have external linkage, or
  manipulation of data files. Translation units may be separately
  translated and then later linked to produce an executable program.

LTO deviates from the model that translation units are separate,
and the conceptual steps of phases 7 and 8.

The translation unit that is prepared for LTO is not fully cooked.  You
have no idea what its code will turn into when the interrupted
compilation is resumed during linkage, under the influence of other
translation units it is combined with.

So in fact, the language allows us to take it for granted that, given

  my_memset(array, 0, sizeof(array)); }

at the end of a function, and my_memset is an external definition
provided by another translation unit, the call may not be elided.

The one who may be acting recklessly is he who turns on nonconforming
optimizations that are not documented as supported by the code base.

Another example would be something like gcc's -ffast-math.
You wouldn't unleash that on numerical code written by experts,
and expect the same correct results.
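
To make the -ffast-math comparison concrete, a sketch using Kahan
(compensated) summation, which is an illustrative choice and not
something taken from any code base discussed here. The correction term
is algebraically zero, so an optimizer allowed to reassociate
floating-point arithmetic may simplify it away and reduce the loop to a
plain sum. Whether a given compiler version actually does so depends
on its flags and heuristics.

```c
#include <assert.h>
#include <stddef.h>

/* Kahan summation: c tracks the rounding error of each addition. */
double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0, c = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++) {
        double y = x[i] - c;
        double t = sum + y;
        c = (t - sum) - y;  /* zero in real arithmetic, not in IEEE 754 */
        sum = t;
    }
    return sum;
}

/* Plain left-to-right summation, for comparison. */
double naive_sum(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```

Compiled with strict IEEE semantics, kahan_sum is expected to be at
least as accurate as naive_sum on long inputs; that property is what a
reassociating optimization can silently remove.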


-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#383846

From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Date: 2024-03-21 12:37 -0700
Message-ID: <uti2am$2d2ts$1@dont-email.me>
In reply to: #383841
On 3/21/2024 10:41 AM, Kaz Kylheku wrote:
> On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
>> On 20/03/2024 19:54, Kaz Kylheku wrote:
>>> On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>>     A "famous security bug":
>>>>
>>>> void f( void )
>>>> { char buffer[ MAX ];
>>>>     /* . . . */
>>>>     memset( buffer, 0, sizeof( buffer )); }
>>>>
>>>>     . Can you see what the bug is?
>>>
>>> I don't know about "the bug", but conditions can be identified under
>>> which that would have a problem executing, like MAX being in excess
>>> of available automatic storage.
>>>
>>> If the /*...*/ comment represents the elision of some security sensitive
>>> code, where the memset is intended to obliterate secret information,
>>> of course, that obliteration is not required to work.
>>>
>>> After the memset, the buffer has no next use, so the all the assignments
>>> performed by memset to the bytes of buffer are dead assignments that can
>>> be elided.
>>>
>>> To securely clear memory, you have to use a function for that purpose
>>> that is not susceptible to optimization.
>>>
>>> If you're not doing anything stupid, like link time optimization, an
>>> external function in another translation unit (a function that the
>>> compiler doesn't recognize as being an alias or wrapper for memset)
>>> ought to suffice.
>>
>> Using LTO is not "stupid".  Relying on people /not/ using LTO, or not
>> using other valid optimisations, is "stupid".
> 
> LTO is a nonconforming optimization. It destroys the concept that
> when a translation unit is translated, the semantic analysis is
> complete, such that the only remaining activity is resolution of
> external references (linkage), and that the semantic analysis of one
> translation unit deos not use information about another translation
> unit.
>[...]

Side note:

Actually, way back (pre-C11/C++11), I was worried about LTO messing up my 
custom, highly sensitive sync code.

https://web.archive.org/web/20070509044340/http://appcore.home.comcast.net/

Notice the externally assembled functions comment?

"All of its “critical-sequences” are contained in externally assembled 
functions ( read all ) in order to prevent a rouge C compiler from 
reordering anything that would corrupt the data-structure. The queue 
allocates its nodes from a three-level cache"

If a damn "rogue" compiler can mess with my custom ASM then things are 
going to be broken...

;^)



#383852

From: scott@slp53.sl.home (Scott Lurndal)
Date: 2024-03-21 20:21 +0000
Message-ID: <ZE0LN.84950$_a1e.38190@fx16.iad>
In reply to: #383846
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

>"All of its “critical-sequences” are contained in externally assembled 
>functions ( read all ) in order to prevent a rouge C compiler from 

As opposed to a viridian C compiler?



#383860

From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Date: 2024-03-21 14:31 -0700
Message-ID: <uti8ve$2ekbr$1@dont-email.me>
In reply to: #383852
On 3/21/2024 1:21 PM, Scott Lurndal wrote:
> "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
> 
>> "All of its “critical-sequences” are contained in externally assembled
>> functions ( read all ) in order to prevent a rouge C compiler from
> 
> As opposed to a viridian C compiler?

I was worried about "overly aggressive" LTO messing around with my ASM.



#383863

From: scott@slp53.sl.home (Scott Lurndal)
Date: 2024-03-21 23:19 +0000
Message-ID: <Hf3LN.544352$Ama9.472059@fx12.iad>
In reply to: #383860
"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>On 3/21/2024 1:21 PM, Scott Lurndal wrote:
>> "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>> 
>>> "All of its “critical-sequences” are contained in externally assembled
>>> functions ( read all ) in order to prevent a rouge C compiler from
>> 
>> As opposed to a viridian C compiler?
>
>I was worried about "overly aggressive" LTO messing around with my ASM.

And you missed the oblique reference to the misspelling of 'rogue' as 'rouge'.



#383864

From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Date: 2024-03-21 17:38 -0700
Message-ID: <utijv2$2h3up$1@dont-email.me>
In reply to: #383863
On 3/21/2024 4:19 PM, Scott Lurndal wrote:
> "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>> On 3/21/2024 1:21 PM, Scott Lurndal wrote:
>>> "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>>>
>>>> "All of its “critical-sequences” are contained in externally assembled
>>>> functions ( read all ) in order to prevent a rouge C compiler from
>>>
>>> As opposed to a viridian C compiler?
>>
>> I was worried about "overly aggressive" LTO messing around with my ASM.
> 
> And you missed the oblique reference to the mispelling of 'rogue' as 'rouge'.

Yup! I sure did. I have red on my face!



#383896

From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Date: 2024-03-22 12:39 -0700
Message-ID: <utkmpk$33puj$1@dont-email.me>
In reply to: #383864
On 3/21/2024 5:38 PM, Chris M. Thomasson wrote:
> On 3/21/2024 4:19 PM, Scott Lurndal wrote:
>> "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>>> On 3/21/2024 1:21 PM, Scott Lurndal wrote:
>>>> "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>>>>
>>>>> "All of its “critical-sequences” are contained in externally assembled
>>>>> functions ( read all ) in order to prevent a rouge C compiler from
>>>>
>>>> As opposed to a viridian C compiler?
>>>
>>> I was worried about "overly aggressive" LTO messing around with my ASM.
>>
>> And you missed the oblique reference to the mispelling of 'rogue' as 
>> 'rouge'.
> 
> Yup! I sure did. I have red on my face!

I wonder if I have a bit of dyslexia. Sometimes when I am typing along 
without looking at the keyboard, I can make a mistake that is backwards 
wrt two letters.

For instance, spelling the word "careful" as "carfeul", car fuel? lol... 
The mistake I made with rogue vs rouge is that same swapping error as 
well. This is a "bad" one because a spell checker does not flag it.

It's strange because when I look at the keyboard while I am typing, 
well, that does not occur.



#383856

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-03-21 13:46 -0700
Message-ID: <87a5mr1ffp.fsf@nosuchdomain.example.com>
In reply to: #383841
Kaz Kylheku <433-929-6894@kylheku.com> writes:
> On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
>> On 20/03/2024 19:54, Kaz Kylheku wrote:
>>> On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>>    A "famous security bug":
>>>>
>>>> void f( void )
>>>> { char buffer[ MAX ];
>>>>    /* . . . */
>>>>    memset( buffer, 0, sizeof( buffer )); }
>>>>
>>>>    . Can you see what the bug is?
>>> 
>>> I don't know about "the bug", but conditions can be identified under
>>> which that would have a problem executing, like MAX being in excess
>>> of available automatic storage.
>>> 
>>> If the /*...*/ comment represents the elision of some security sensitive
>>> code, where the memset is intended to obliterate secret information,
>>> of course, that obliteration is not required to work.
>>> 
>>> After the memset, the buffer has no next use, so all the assignments
>>> performed by memset to the bytes of buffer are dead assignments that can
>>> be elided.
>>> 
>>> To securely clear memory, you have to use a function for that purpose
>>> that is not susceptible to optimization.
>>> 
>>> If you're not doing anything stupid, like link time optimization, an
>>> external function in another translation unit (a function that the
>>> compiler doesn't recognize as being an alias or wrapper for memset)
>>> ought to suffice.
>>
>> Using LTO is not "stupid".  Relying on people /not/ using LTO, or not 
>> using other valid optimisations, is "stupid".
>
> LTO is a nonconforming optimization. It destroys the concept that
> when a translation unit is translated, the semantic analysis is
> complete, such that the only remaining activity is resolution of
> external references (linkage), and that the semantic analysis of one
> translation unit does not use information about another translation
> unit.
>
> This has not yet changed in last April's N3096 draft, where
> translation phases 7 and 8 are:
>
>   7. White-space characters separating tokens are no longer significant.
>      Each preprocessing token is converted into a token. The resulting
>      tokens are syntactically and semantically analyzed and translated
>      as a translation unit.
>
>   8. All external object and function references are resolved. Library
>      components are linked to satisfy external references to functions
>      and objects not defined in the current translation. All such
>      translator output is collected into a program image which contains
>      information needed for execution in its execution environment.
>
> and before that, the Program Structure section says:
>
>   The separate translation units of a program communicate by (for
>   example) calls to functions whose identifiers have external linkage,
>   manipulation of objects whose identifiers have external linkage, or
>   manipulation of data files. Translation units may be separately
>   translated and then later linked to produce an executable program.
>
> LTO deviates from the model that translation units are separate,
> and the conceptual steps of phases 7 and 8.
[...]

Link time optimization is as valid as cross-function optimization *as
long as* it doesn't change the defined behavior of the program.

Say I have a call to foo in main, and the definition of foo is in
another translation unit.  In the absence of LTO, the compiler will have
to generate a call to foo.  If LTO is able to determine that foo doesn't
do anything, it can remove the code for the function call, and the
resulting behavior of the linked program is unchanged.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */



#383877

From: Kaz Kylheku <433-929-6894@kylheku.com>
Date: 2024-03-22 15:50 +0000
Message-ID: <20240322083648.539@kylheku.com>
In reply to: #383856
On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>> On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
>>> On 20/03/2024 19:54, Kaz Kylheku wrote:
>>>> On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>>>    A "famous security bug":
>>>>>
>>>>> void f( void )
>>>>> { char buffer[ MAX ];
>>>>>    /* . . . */
>>>>>    memset( buffer, 0, sizeof( buffer )); }
>>>>>
>>>>>    . Can you see what the bug is?
>>>> 
>>>> I don't know about "the bug", but conditions can be identified under
>>>> which that would have a problem executing, like MAX being in excess
>>>> of available automatic storage.
>>>> 
>>>> If the /*...*/ comment represents the elision of some security sensitive
>>>> code, where the memset is intended to obliterate secret information,
>>>> of course, that obliteration is not required to work.
>>>> 
>>>> After the memset, the buffer has no next use, so all the assignments
>>>> performed by memset to the bytes of buffer are dead assignments that can
>>>> be elided.
>>>> 
>>>> To securely clear memory, you have to use a function for that purpose
>>>> that is not susceptible to optimization.
>>>> 
>>>> If you're not doing anything stupid, like link time optimization, an
>>>> external function in another translation unit (a function that the
>>>> compiler doesn't recognize as being an alias or wrapper for memset)
>>>> ought to suffice.
>>>
>>> Using LTO is not "stupid".  Relying on people /not/ using LTO, or not 
>>> using other valid optimisations, is "stupid".
>>
>> LTO is a nonconforming optimization. It destroys the concept that
>> when a translation unit is translated, the semantic analysis is
>> complete, such that the only remaining activity is resolution of
>> external references (linkage), and that the semantic analysis of one
>> translation unit does not use information about another translation
>> unit.
>>
>> This has not yet changed in last April's N3096 draft, where
>> translation phases 7 and 8 are:
>>
>>   7. White-space characters separating tokens are no longer significant.
>>      Each preprocessing token is converted into a token. The resulting
>>      tokens are syntactically and semantically analyzed and translated
>>      as a translation unit.
>>
>>   8. All external object and function references are resolved. Library
>>      components are linked to satisfy external references to functions
>>      and objects not defined in the current translation. All such
>>      translator output is collected into a program image which contains
>>      information needed for execution in its execution environment.
>>
>> and before that, the Program Structure section says:
>>
>>   The separate translation units of a program communicate by (for
>>   example) calls to functions whose identifiers have external linkage,
>>   manipulation of objects whose identifiers have external linkage, or
>>   manipulation of data files. Translation units may be separately
>>   translated and then later linked to produce an executable program.
>>
>> LTO deviates from the model that translation units are separate,
>> and the conceptual steps of phases 7 and 8.
> [...]
>
> Link time optimization is as valid as cross-function optimization *as
> long as* it doesn't change the defined behavior of the program.

It always does; the interaction of a translation unit with another
is an externally visible aspect of the C program. (That can be inferred
from the rules which forbid semantic analysis across translation
units, only linkage.)

That's why we can have a real world security issue caused by zeroing
being optimized away.
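
For illustration, one commonly used workaround is to write the zeros through a volatile-qualified pointer, so that each store is treated as a volatile access. This is only a sketch: the name secure_zero is hypothetical, and because the buffer itself is not declared volatile, how strongly the standard guarantees the stores is itself a matter of debate. Where available, a library facility designed for this purpose (for example memset_explicit in C23) is probably the better choice.

```c
#include <stddef.h>

/* Hypothetical helper: zero n bytes at p through a volatile lvalue,
   so the compiler is expected not to treat the stores as dead. */
void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = p;

    while (n--)
        *vp++ = 0;
}
```

A caller would use secure_zero(buffer, sizeof buffer) in place of the final memset in f().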

The rules spelled out in ISO C allow us to unit test a translation
unit by linking it to some harness, and be sure it has exactly the
same behaviors when linked to the production program.

If I have some translation unit in which there is a function foo, such
that when I call foo, it then calls an external function bar, that's
observable. I can link that unit to a program which supplies bar,
containing a printf call, then call foo and verify that the printf call
is executed.
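
A minimal sketch of that harness idea, collapsed into one file for brevity. The split into unit.c and harness.c, the counter bar_calls, and the accessor bar_call_count are all hypothetical names chosen for illustration; in a real setup the two halves would be compiled separately and linked.

```c
#include <stdio.h>

/* --- would live in unit.c: the translation unit under test --- */
void bar(void);                 /* external reference, resolved at link time */

void foo(void)
{
    bar();
}

/* --- would live in harness.c: a test double supplying bar --- */
static int bar_calls;           /* how many times bar has been entered */

void bar(void)
{
    bar_calls++;
    printf("bar called\n");     /* output the harness can check */
}

int bar_call_count(void)
{
    return bar_calls;
}
```

The harness then calls foo() and checks either the printed line or bar_call_count().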

Since ISO C says that the semantic analysis has been done (that
unit having gone through phase 7), we can take it for granted as a
done-and-dusted property of that translation unit that it calls bar
whenever its foo is invoked.

> Say I have a call to foo in main, and the definition of foo is in
> another translation unit.  In the absence of LTO, the compiler will have
> to generate a call to foo.  If LTO is able to determine that foo doesn't
> do anything, it can remove the code for the function call, and the
> resulting behavior of the linked program is unchanged.

There are always situations in which optimizations that have been forbidden
don't cause a problem, and are even desirable.

If you have LTO turned on, you might be programming in GNU C or Clang C
or whatever, not standard C.

Sometimes programs have the same interpretation in GNU C and standard
C, or the same interpretation to someone who doesn't care about certain
differences.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#383880

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-03-22 09:31 -0700
Message-ID: <87le6az0s8.fsf@nosuchdomain.example.com>
In reply to: #383877
Kaz Kylheku <433-929-6894@kylheku.com> writes:
> On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>> On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
>>>> On 20/03/2024 19:54, Kaz Kylheku wrote:
>>>>> On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>>>>>>    A "famous security bug":
>>>>>>
>>>>>> void f( void )
>>>>>> { char buffer[ MAX ];
>>>>>>    /* . . . */
>>>>>>    memset( buffer, 0, sizeof( buffer )); }
>>>>>>
>>>>>>    . Can you see what the bug is?
>>>>> 
>>>>> I don't know about "the bug", but conditions can be identified under
>>>>> which that would have a problem executing, like MAX being in excess
>>>>> of available automatic storage.
>>>>> 
>>>>> If the /*...*/ comment represents the elision of some security sensitive
>>>>> code, where the memset is intended to obliterate secret information,
>>>>> of course, that obliteration is not required to work.
>>>>> 
>>>>> After the memset, the buffer has no next use, so all the assignments
>>>>> performed by memset to the bytes of buffer are dead assignments that can
>>>>> be elided.
>>>>> 
>>>>> To securely clear memory, you have to use a function for that purpose
>>>>> that is not susceptible to optimization.
>>>>> 
>>>>> If you're not doing anything stupid, like link time optimization, an
>>>>> external function in another translation unit (a function that the
>>>>> compiler doesn't recognize as being an alias or wrapper for memset)
>>>>> ought to suffice.
>>>>
>>>> Using LTO is not "stupid".  Relying on people /not/ using LTO, or not 
>>>> using other valid optimisations, is "stupid".
>>>
>>> LTO is a nonconforming optimization. It destroys the concept that
>>> when a translation unit is translated, the semantic analysis is
>>> complete, such that the only remaining activity is resolution of
>>> external references (linkage), and that the semantic analysis of one
>>> translation unit does not use information about another translation
>>> unit.
>>>
>>> This has not yet changed in last April's N3096 draft, where
>>> translation phases 7 and 8 are:
>>>
>>>   7. White-space characters separating tokens are no longer significant.
>>>      Each preprocessing token is converted into a token. The resulting
>>>      tokens are syntactically and semantically analyzed and translated
>>>      as a translation unit.
>>>
>>>   8. All external object and function references are resolved. Library
>>>      components are linked to satisfy external references to functions
>>>      and objects not defined in the current translation. All such
>>>      translator output is collected into a program image which contains
>>>      information needed for execution in its execution environment.
>>>
>>> and before that, the Program Structure section says:
>>>
>>>   The separate translation units of a program communicate by (for
>>>   example) calls to functions whose identifiers have external linkage,
>>>   manipulation of objects whose identifiers have external linkage, or
>>>   manipulation of data files. Translation units may be separately
>>>   translated and then later linked to produce an executable program.
>>>
>>> LTO deviates from the model that translation units are separate,
>>> and the conceptual steps of phases 7 and 8.
>> [...]
>>
>> Link time optimization is as valid as cross-function optimization *as
>> long as* it doesn't change the defined behavior of the program.
>
> It always does; the interaction of a translation unit with another
> is an externally visible aspect of the C program. (That can be inferred
> from the rules which forbid semantic analysis across translation
> units, only linkage.)
>
> That's why we can have a real world security issue caused by zeroing
> being optimized away.
>
> The rules spelled out in ISO C allow us to unit test a translation
> unit by linking it to some harness, and be sure it has exactly the
> same behaviors when linked to the production program.
>
> If I have some translation unit in which there is a function foo, such
> that when I call foo, it then calls an external function bar, that's
> observable. I can link that unit to a program which supplies bar,
> containing a printf call, then call foo and verify that the printf call
> is executed.
>
> Since ISO C says that the semantic analysis has been done (that
> unit having gone through phase 7), we can take it for granted as a
> done-and-dusted property of that translation unit that it calls bar
> whenever its foo is invoked.

We can take it for granted that the output performed by the printf call
will be performed, because output is observable behavior.  If the
external function bar is modified, the LTO step has to be redone.

>> Say I have a call to foo in main, and the definition of foo is in
>> another translation unit.  In the absence of LTO, the compiler will have
>> to generate a call to foo.  If LTO is able to determine that foo doesn't
>> do anything, it can remove the code for the function call, and the
>> resulting behavior of the linked program is unchanged.
>
> There are always situations in which optimizations that have been forbidden
> don't cause a problem, and are even desirable.
>
> If you have LTO turned on, you might be programming in GNU C or Clang C
> or whatever, not standard C.
>
> Sometimes programs have the same interpretation in GNU C and standard
> C, or the same interpretation to someone who doesn't care about certain
> differences.

Are you claiming that a function call is observable behavior?

Consider:

main.c:
#include "foo.h"
int main(void) {
    foo();
}


foo.h:
#ifndef FOO_H
#define FOO_H
void foo(void);
#endif


foo.c:
void foo(void) {
    // do nothing
}


Are you saying that the "call" instruction generated for the function
call is *observable behavior*?  If an implementation doesn't generate
that "call" instruction because it's able to determine at link time that
the call does nothing, that optimization is forbidden?

I presume you'd agree that omitting the "call" instruction is allowed if
the call and the function definition are in the same translation unit.
What wording in the standard requires a "call" instruction to be
generated if they're in different translation units?

That's a trivial example, but other link time optimizations that don't
change a program's observable behavior (insert weasel words about
unspecified behavior) are also allowed.

In phase 8:
    All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

I don't see anything about required CPU instructions.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */



#383886

From: Kaz Kylheku <433-929-6894@kylheku.com>
Date: 2024-03-22 17:20 +0000
Message-ID: <20240322094449.555@kylheku.com>
In reply to: #383880
On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>> Since ISO C says that the semantic analysis has been done (that
>> unit having gone through phase 7), we can take it for granted as a
>> done-and-dusted property of that translation unit that it calls bar
>> whenever its foo is invoked.
>
> We can take it for granted that the output performed by the printf call
> will be performed, because output is observable behavior.  If the
> external function bar is modified, the LTO step has to be redone.

That's what undeniably has to be done in the LTO world. Nothing that
is done brings that world into conformance, though.

>>> Say I have a call to foo in main, and the definition of foo is in
>>> another translation unit.  In the absence of LTO, the compiler will have
>>> to generate a call to foo.  If LTO is able to determine that foo doesn't
>>> do anything, it can remove the code for the function call, and the
>>> resulting behavior of the linked program is unchanged.
>>
>> There are always situations in which optimizations that have been forbidden
>> don't cause a problem, and are even desirable.
>>
>> If you have LTO turned on, you might be programming in GNU C or Clang C
>> or whatever, not standard C.
>>
>> Sometimes programs have the same interpretation in GNU C and standard
>> C, or the same interpretation to someone who doesn't care about certain
>> differences.
>
> Are you claiming that a function call is observable behavior?

Yes. It is the observable behavior of an unlinked translation unit.

It can be observed by linking a harness to it, with a main() function
and all else that is required to make it a complete program.

That harness becomes an instrument for observation.

> Consider:
>
> main.c:
> #include "foo.h"
> int main(void) {
>     foo();
> }
>
>
> foo.h:
> #ifndef FOO_H
> #define FOO_H
> void foo(void);
> #endif
>
>
> foo.c:
> void foo(void) {
>     // do nothing
> }
>
>
> Are you saying that the "call" instruction generated for the function
> call is *observable behavior*?

Of course; it can be observed externally, without doing any reverse
engineering on the translated unit.

External linkage is called "external" for a reason!

> If an implementation doesn't generate
> that "call" instruction because it's able to determine at link time that
> the call does nothing, that optimization is forbidden?

The text says so. Translation units are separate; semantic analysis is
finished in translation phase 7; linking in 8.

Out of translation phases 1-7 we get a concrete artifact: the translated
unit. That has externally visible features, like what symbols it
requires. Its behavior with regard to those symbols can be empirically
observed, validated by tests and expected to hold thereafter.

Since semantic analysis is complete, any observable behavior can be
taken to be a fact about that translated unit, a property of it, which
will not change when it is subject to linkage. The truth cannot be
clawed back, according to the way things are defined in the standard,
and this is a good thing.

> I presume you'd agree that omitting the "call" instruction is allowed if
> the call and the function definition are in the same translation unit.

Yes.

And that's a way to get the effect of LTO portably, in a conforming
way, in any implementation going back decades. Instead of linkage use
#include "foo.c", #include "bar.c" (taking steps to ensure your internal
names don't clash).

LTO is more convenient in that you don't have to use an unusual
program structure, and keeps your internal linkage scopes separate.
Just don't pretend it's conforming to standard C, any more than
-ffast-math.

LTO is "voodoo" though. The translation units contain intermediate
code, not target code. The intermediate code continues to be subject
to compiler passes when the translation units are brought together.
Thus translation is going on, but the units are gone.

> What wording in the standard requires a "call" instruction to be
> generated if they're in different translation units?
>
> That's a trivial example, but other link time optimizations that don't
> change a program's observable behavior (insert weasel words about
> unspecified behavior) are also allowed.

An example would be the removal of material that is not referenced,
like functions not called anywhere, or entire translation units
whose external names are not referenced. That can cause issues too,
and I've run into them, but I can't call that nonconforming.
Nothing is semantically analyzed across translation units, only the
linkage graph itself, which may be found to be disconnected.

> In phase 8:
>     All external object and function references are resolved. Library
>     components are linked to satisfy external references to functions
>     and objects not defined in the current translation. All such
>     translator output is collected into a program image which contains
>     information needed for execution in its execution environment.
>
> I don't see anything about required CPU instructions.

I don't see anything about /removing/ instructions that have to be
there according to the semantic analysis performed in order to
translate those units from phases 1 - 7, and that can be confirmed
to be present with a test harness.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#383888

From: James Kuyper <jameskuyper@alumni.caltech.edu>
Date: 2024-03-22 13:38 -0400
Message-ID: <utkfm6$311sb$4@dont-email.me>
In reply to: #383886
On 3/22/24 13:20, Kaz Kylheku wrote:
> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
...
>> Are you claiming that a function call is observable behavior?
>
> Yes. It is the observable behavior of an unlinked translation unit.

In the context of the C standard, "observable behavior" is a term with a
precisely specified meaning which is NOT "behavior which can be
observed". That definition does not cover function calls, not even those
with external linkage. What the standard says about what optimizations
are permitted is in terms of "observable behavior", NOT "behavior which
can be observed".
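
As a rough illustration of that distinction (a sketch only; helper, status_flag and report are hypothetical names): under 5.1.2.3, accesses to volatile objects and data written to files count as observable behavior, while the mere act of calling a function does not.

```c
#include <stdio.h>

/* Hypothetical names throughout; this only illustrates the categories. */
static void helper(void)
{
    /* no side effects: the call itself is not observable behavior */
}

volatile int status_flag;       /* accesses to this object are observable */

int report(void)
{
    helper();                   /* an implementation may omit this call */
    status_flag = 1;            /* volatile store: has to be performed */
    return puts("done");        /* output: has to appear */
}
```

An implementation that drops the call to helper() but still performs the volatile store and the output produces the same observable behavior as one that keeps it.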

>> Are you saying that the "call" instruction generated for the function
>> call is *observable behavior*?
>
> Of course; it can be observed externally, without doing any reverse
> engineering on the translated unit.

And the C standard imposes no requirement that such behavior occur as
described by the abstract semantics. Only actual observable behavior, as
that term is defined by the C standard, must occur as if those semantics
were followed - whether or not they actually were.

...
>> If an implementation doesn't generate
>> that "call" instruction because it's able to determine at link time that
>> the call does nothing, that optimization is forbidden?
>
> The text says so. Translation units are separate; semantic analysis is
> finished in translation phase 7; linking in 8.

Translation phases are specified solely for the purpose of expressing
the precedence of the corresponding semantic rules. The standard
explicitly allows for the phases to be intermingled or even done out of
order, so long as the observable behavior is behavior that would be
permitted if they had been done in the order specified.

> Out of translation phases 1-7 we get a concrete artifact: the translated
> unit. That has externally visible features, like what symbols it
> requires. Its behavior with regard to those symbols can be empirically
> observed, validated by tests and expected to hold thereafter.

And the standard imposes no requirements on those externally visible
features, only on some (but not ALL) of the behavior that results from
executing the program.



#383895

From: Kaz Kylheku <433-929-6894@kylheku.com>
Date: 2024-03-22 19:27 +0000
Message-ID: <20240322115519.204@kylheku.com>
In reply to: #383888
On 2024-03-22, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> And the C standard imposes no requirement that such behavior occur as
> described by the abstract semantics. Only actual observable behavior, as
> that term is defined by the C standard, must occur as if those semantics
> were followed - whether or not they actually were.

But there is something. Though not normative text, EXAMPLE 1 gives
the range of possibilities for optimization:

  EXAMPLE 1 An implementation might define a one-to-one correspondence
  between abstract and actual semantics: at every sequence point, the
  values of the actual objects would agree with those specified by the
  abstract semantics. The keyword volatile would then be redundant.

  Alternatively, an implementation might perform various optimizations
  within each translation unit, such that the actual semantics would agree
  with the abstract semantics only when making function calls across
  translation unit boundaries.

I believe the intent of this example is to give the two extremes
representing the full range of what is envisioned as permissible.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#383892

From: David Brown <david.brown@hesbynett.no>
Date: 2024-03-22 19:13 +0100
Message-ID: <utkho0$32p2g$1@dont-email.me>
In reply to: #383886
On 22/03/2024 18:20, Kaz Kylheku wrote:
> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:

>>
>> Are you claiming that a function call is observable behavior?
> 
> Yes. It is the observable behavior of an unlinked translation unit.
> 
> It can be observed by linking a harness to it, with a main() function
> and all else that is required to make it a complete program.
> 
> That harness becomes an instrument for observation.

That is "observable" in the same sense that the size of a compiled 
object file is "observable" by executing "ls -l".  It is not "observable 
behaviour" as defined by the C standards.

C defines "observable behaviour" for /programs/.  Not for translation 
units, or translated translation units (what one might call an "object 
file" - be it assembly, machine code, or internal compiler-specific 
formats).

For C, it makes no sense to talk about "observable behaviour" for a 
unit.  It is only by linking the unit to your test harness that you get 
a "program", which then has "observable behaviour".


>>
>>
>> Are you saying that the "call" instruction generated for the function
>> call is *observable behavior*?
> 
> Of course; it can be observed externally, without doing any reverse
> engineering on the translated unit.

The contents of an object file - or the instructions used in a complete 
program - are not "observable behaviour" in C.  Again, I refer you to 
5.1.2.3p6.

> 
>> If an implementation doesn't generate
>> that "call" instruction because it's able to determine at link time that
>> the call does nothing, that optimization is forbidden?
> 
> The text says so. Translation units are separate; semantic analysis is
> finished in translation phase 7; linking in 8.

The text also says (in footnotes) that the phases are for conceptual 
description only, and in practice they are typically folded together.


>> What wording in the standard requires a "call" instruction to be
>> generated if they're in different translation units?
>>
>> That's a trivial example, but other link time optimizations that don't
>> change a program's observable behavior (insert weasel words about
>> unspecified behavior) are also allowed.
> 
> An example would be the removal of material that is not referenced,
> like functions not called anywhere, or entire translation units
> whose external names are not referenced. That can cause issues too,
> and I've run into them, but I can't call that nonconforming.
> Nothing is semantically analyzed across translation units, only the
> linkage graph itself, which may be found to be disconnected.
> 

Removal of unreferenced material at link time is very common.  In some 
fields, it is standard practice to use compiler and linker flags geared 
at making this easier.  It is not really any different than using static 
libraries - the linker will load all requested static libraries, then 
throw out all parts that are not transitively reachable from non-library 
code.

The inclusion or not of material in the program image is not directly 
observable behaviour in C - there is no way to write portable C code to 
determine if the function "foo" has been included in the image despite 
never being referenced.  (You can, of course, have the linker include 
information about the image inside the image itself and read that with 
volatile accesses from within the program.)

In small-systems embedded programming, "-ffunction-sections" and 
"-fdata-sections", along with "-Wl,--gc-sections", are almost invariably 
used for gcc to reduce the size of the final image.  It makes it much 
more practical to write re-usable code even if not all functions are 
used in any given application.  I have never heard of it "causing 
issues", and I cannot see how it might be non-conforming.  (And if it is 
not a conformance issue, how is it relevant here?)
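
A small sketch of the kind of code this helps with, assuming GCC and a GNU-style linker. The file names in the comment and the two function names are hypothetical; the flags are the ones named above.

```c
/* Hypothetical build, shown as a comment:
 *
 *   gcc -Os -ffunction-sections -fdata-sections -c lib.c app.c
 *   gcc -Wl,--gc-sections -o app lib.o app.o
 *
 * With -ffunction-sections each function is placed in its own section,
 * so the linker can discard sections that nothing references.
 */

/* Called by the application, so its section is expected to be kept. */
int used_helper(int x)
{
    return x + 1;
}

/* Never referenced by the application: a candidate for removal at link
   time. Portable C code has no way to detect whether it was kept. */
int unused_helper(int x)
{
    return x * 2;
}
```

Whether unused_helper actually disappears depends on the toolchain and its options; a linker map file is one way to check.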

>> In phase 8:
>>      All external object and function references are resolved. Library
>>      components are linked to satisfy external references to functions
>>      and objects not defined in the current translation. All such
>>      translator output is collected into a program image which contains
>>      information needed for execution in its execution environment.
>>
>> I don't see anything about required CPU instructions.
> 
> I don't see anything about /removing/ instructions that have to be
> there according to the semantic analysis performed in order to
> translate those units from phases 1 - 7, and that can be confirmed
> to be present with a test harness.
> 

The C standard doesn't deal with CPU instructions.  It does not have a 
concept of "running" a translated translation unit - you can only run a 
complete program, at which point there is no distinction between the 
translation units that are "collected" into the program image.  It's all 
fused together into one big lump, with one set of observable behaviours.



#383893

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2024-03-22 11:21 -0700
Message-ID: <87cyrmyvnv.fsf@nosuchdomain.example.com>
In reply to: #383886
Kaz Kylheku <433-929-6894@kylheku.com> writes:
> On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Kaz Kylheku <433-929-6894@kylheku.com> writes:
>>> Since ISO C says that the semantic analysis has been done (that
>>> unit having gone through phase 7), we can take it for granted as a
>>> done-and-dusted property of that translation unit that it calls bar
>>> whenever its foo is invoked.
>>
>> We can take it for granted that the output performed by the printf call
>> will be performed, because output is observable behavior.  If the
>> external function bar is modified, the LTO step has to be redone.
>
> That's what undeniably has to be done in the LTO world. Nothing that
> is done brings that world into conformance, though.
>
>>>> Say I have a call to foo in main, and the definition of foo is in
>>>> another translation unit.  In the absence of LTO, the compiler will have
>>>> to generate a call to foo.  If LTO is able to determine that foo doesn't
>>>> do anything, it can remove the code for the function call, and the
>>>> resulting behavior of the linked program is unchanged.
>>>
>>> There are always situations in which optimizations that have been forbidden
>>> don't cause a problem, and are even desirable.
>>>
>>> If you have LTO turned on, you might be programming in GNU C or Clang C
>>> or whatever, not standard C.
>>>
>>> Sometimes programs have the same interpretation in GNU C and standard
>>> C, or the same interpretation to someone who doesn't care about certain
>>> differences.
>>
>> Are you claiming that a function call is observable behavior?
>
> Yes. It is the observable behavior of an unlinked translation unit.

An unlinked translation unit has no observable behavior in the way that
term is defined by the standard.

> It can be observed by linking a harness to it, with a main() function
> and all else that is required to make it a complete program.
> 
> That harness becomes an instrument for observation.

And a "call" instruction in a program consisting of a single translation
unit can be observed in a variety of ways.  That doesn't make it
"observable behavior".

Are you using the phrase "observable behavior" in a sense other than
what's defined in N1570 5.1.2.3?

[...]

>> Are you saying that the "call" instruction generated for the function
>> call is *observable behavior*?
>
> Of course; it can be observed externally, without doing any reverse
> engineering on the translated unit.

Is the "call" instruction *observable behavior* as defined in 5.1.2.3?

[...]

>> In phase 8:
>>     All external object and function references are resolved. Library
>>     components are linked to satisfy external references to functions
>>     and objects not defined in the current translation. All such
>>     translator output is collected into a program image which contains
>>     information needed for execution in its execution environment.
>>
>> I don't see anything about required CPU instructions.
>
> I don't see anything about /removing/ instructions that have to be
> there according to the semantic analysis performed in order to
> translate those units from phases 1 - 7, and that can be confirmed
> to be present with a test harness.

The standard doesn't mention either adding or removing instructions.

Running a program under a test harness is effectively running a
different program.  Of course it can yield information about the
original program, but in effect you're linking the program with a
different set of libraries.
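
A minimal sketch of the kind of harness under discussion, assuming
hypothetical names throughout (bar_calls, harness_count_bar_calls and
harness_check are invented for illustration; foo and bar stand in for
the functions mentioned in the thread, shown in one file for brevity
although the thread places them in separate translation units):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical harness state: counts how often bar() is entered. */
static int bar_calls = 0;

/* Harness-supplied replacement for the external function bar. */
void bar(void)
{
    bar_calls++;
}

/* The function under discussion: calls bar, then performs output. */
void foo(void)
{
    bar();
    printf("foo called\n");
}

/* Runs foo() once and reports how many times bar() was entered. */
int harness_count_bar_calls(void)
{
    bar_calls = 0;
    foo();
    return bar_calls;
}

/* Self-check a harness might perform on its own instrumented bar. */
void harness_check(void)
{
    assert(harness_count_bar_calls() == 1);
}
```

The count is reliable here only because this bar updates an object
that the harness later reads, so the call affects what this particular
program computes.  That says something about the program made of foo
plus the harness, not necessarily about a program that links foo with
a different bar.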

I can use a test harness to observe whether a program uses an add or inc
instruction to evaluate `i++` (assuming the CPU has both instructions).
The standard doesn't care how the increment happens, as long as the
result is correct.  It doesn't care *whether* the increment happens
unless the result affects the program's *observable behavior*.
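
The point about `i++` can be sketched with two small functions.  The
names (next_value, ignore_increment, check_increment_examples) are
hypothetical, and whether a given compiler actually drops the second
increment depends on the compiler and its options:

```c
#include <assert.h>

/* The incremented value is returned, so it has to be computed somehow;
   the standard does not say which instruction is used for that. */
int next_value(int i)
{
    i++;
    return i;
}

/* The incremented value is never used afterwards.  An implementation
   may omit the increment entirely, since nothing depends on it. */
int ignore_increment(int i)
{
    int unused = i;
    unused++;
    return i;
}

/* Checks only the results, which is all the standard constrains. */
void check_increment_examples(void)
{
    assert(next_value(41) == 42);
    assert(ignore_increment(41) == 41);
}
```

Both checks hold whether or not any increment instruction appears in
the generated code for ignore_increment.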

What in the description of translation phases 7 and 8 makes
behavior-preserving optimizations valid in phase 7 and forbidden in
phase 8?  (Again, insert weasel words about unspecified behavior.)
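
For reference, the case from earlier in the thread (a call to a
function that does nothing) might look like the sketch below.  The
names do_nothing and run_program are hypothetical, and both functions
are shown in one file for brevity; in the thread the callee is defined
in another translation unit:

```c
#include <stdio.h>

/* Stand-in for a function defined elsewhere that does nothing. */
void do_nothing(void)
{
}

/* The call has no effect on output, so the program behaves the same
   with or without a call instruction here.  The printf output is
   observable behavior and must be produced either way. */
int run_program(void)
{
    do_nothing();
    return printf("hello\n");   /* returns the number of characters written */
}
```

Within a single translation unit, eliding the call to do_nothing is
the ordinary as-if rule at work; the question above asks what in the
text of the standard would make the same elision invalid when it is
done at link time.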

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
