


Re: for or against equality, was Why are ambiguous grammars usually a bad idea?

Started by: Martin Ward <martin@gkc.org.uk>
First post: 2022-01-05 10:25 +0000
Last post: 2022-01-11 22:01 -0500
Articles: 20 on this page of 26, 9 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: for or against equality, was Why are ambiguous grammars usually a bad idea? Martin Ward <martin@gkc.org.uk> - 2022-01-05 10:25 +0000
    Re: for or against equality, was Why are ambiguous grammars usually a bad idea? David Brown <david.brown@hesbynett.no> - 2022-01-06 09:11 +0100
      Re: what is defined, was for or against equality Thomas Koenig <tkoenig@netcologne.de> - 2022-01-06 16:43 +0000
        Re: what is defined, was for or against equality David Brown <david.brown@hesbynett.no> - 2022-01-07 12:06 +0100
        Re: what is defined, was for or against equality Spiros Bousbouras <spibou@gmail.com> - 2022-01-07 13:21 +0000
          Re: what is defined, was for or against equality Thomas Koenig <tkoenig@netcologne.de> - 2022-01-08 09:31 +0000
            Re: what is defined, was for or against equality Spiros Bousbouras <spibou@gmail.com> - 2022-01-08 22:28 +0000
              Re: what is defined, was for or against equality Thomas Koenig <tkoenig@netcologne.de> - 2022-01-09 00:09 +0000
                Re: what is defined, was for or against equality Spiros Bousbouras <spibou@gmail.com> - 2022-01-09 21:30 +0000
            Re: what is defined, was for or against equality David Brown <david.brown@hesbynett.no> - 2022-01-09 23:00 +0100
              Re: what is defined, was for or against equality Thomas Koenig <tkoenig@netcologne.de> - 2022-01-10 12:04 +0000
                Re: what is defined, was for or against equality David Brown <david.brown@hesbynett.no> - 2022-01-11 18:16 +0100
                  Re: what is defined, was for or against equality Kaz Kylheku <480-992-1380@kylheku.com> - 2022-01-11 19:19 +0000
                    Re: what is defined, was for or against equality gah4 <gah4@u.washington.edu> - 2022-01-11 14:18 -0800
                      Re: what is defined, was for or against equality Thomas Koenig <tkoenig@netcologne.de> - 2022-01-12 19:02 +0000
                        Re: what is defined, was for or against equality David Brown <david.brown@hesbynett.no> - 2022-01-13 08:24 +0100
                        Re: what is defined, was for or against equality Thomas Koenig <tkoenig@netcologne.de> - 2022-01-13 11:17 +0000
            Re: what is defined, was for or against equality gah4 <gah4@u.washington.edu> - 2022-01-10 16:58 -0800
      Re: for or against equality, was Why are ambiguous grammars usually a bad idea? Robert Prins <robert@prino.org> - 2022-01-06 19:07 +0000
      Undefined behaviour, was: for or against equality Martin Ward <martin@gkc.org.uk> - 2022-01-07 14:02 +0000
        Re: Undefined behaviour, was: for or against equality Spiros Bousbouras <spibou@gmail.com> - 2022-01-08 03:41 +0000
      Re: Undefined behaviour, was: for or against equality David Brown <david.brown@hesbynett.no> - 2022-01-07 15:56 +0100
        Re: Undefined behaviour, was: for or against equality anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-01-08 17:52 +0000
          Re: Undefined behaviour, was: for or against equality David Brown <david.brown@hesbynett.no> - 2022-01-09 23:53 +0100
          Re: Undefined behaviour, was: for or against equality Kaz Kylheku <480-992-1380@kylheku.com> - 2022-01-11 16:55 +0000
            Re: Undefined behaviour, was: for or against equality George Neuner <gneuner2@comcast.net> - 2022-01-11 22:01 -0500



#2795 — Re: for or against equality, was Why are ambiguous grammars usually a bad idea?

From: Martin Ward <martin@gkc.org.uk>
Date: 2022-01-05 10:25 +0000
Subject: Re: for or against equality, was Why are ambiguous grammars usually a bad idea?
Message-ID: <22-01-016@comp.compilers>

On 04/01/2022 21:26, gah4 wrote:
> Stories are that COBOL programmers always
> keep the list of reserved words nearby, to avoid using them.

Our esteemed moderator claims:
> [COBOL doesn't have that many reserved words

I count 510 reserved words in IBM COBOL. Adding a few other dialects
can push the total to 700 or more. By comparison, C has about 32
reserved words.

The story I heard was of a COBOL shop where it was mandatory to
include a hyphen in every data name: in effect, *every* unhyphenated
word was treated as a reserved word. The slightly more manageable list
of *hyphenated* reserved words (149 in IBM COBOL, but 46 of these are
of the form COMP-0, COMP-1, COMP-2 etc) was printed out and posted on
the wall.

I just noticed that if you include a digit in the part of the name
before the first hyphen, you can guarantee to avoid all
the reserved words!

PL/I went to the other extreme of no reserved words in reaction
to COBOL. Also, the aim of PL/I was to be a language which does
everything: business programming (like COBOL) and scientific
programming (like FORTRAN). In theory, if you only wanted
to do, say, business programming, you only needed to learn
part of the language and you would not get tripped up by keywords
from the other part of the language that you didn't know about yet.

Using a language that you don't know in its entirety might seem
dangerous, but everybody seems to do it these days:
how many C programmers have read the entire 500+ pages of
the latest C standard and memorised the 200+ varieties
of "undefined behaviour" so that they can avoid all of them
in every line of code that they write?
--
			Martin

Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
[IBM hoped everyone would switch from Fortran and COBOL to PL/I and
it was obvious Fortran programmers would not put up with reserved
words, particularly ones unrelated to scientific programming.
As far as the size of languages, that seems a matter of point of
view.  Python is a large language if you consider the standard
library to be part of the language, a very small one if you don't.
-John]



#2796

From: David Brown <david.brown@hesbynett.no>
Date: 2022-01-06 09:11 +0100
Message-ID: <22-01-018@comp.compilers>
In reply to: #2795

On 05/01/2022 11:25, Martin Ward wrote:

> Using a language that you don't know in its entirety might seem
> dangerous, but everybody seems to do it these days:
> how many C programmers have read the entire 500+ pages of
> the latest C standard and memorised the 200+ varieties
> of "undefined behaviour" so that they can avoid all of them
> in every line of code that they write?

I think it is normal not to know everything about the language you use.
 And if you include the language's standard library, then there are very
few currently used languages where it would even be possible to learn it
all.  By the time you learned all of the language and default libraries
of C++, Java, Python, etc., there would be a new version out and you'd
have more to learn.

The important things for writing code are to know enough to be able to
write the kind of code you are doing, and to avoid accidentally doing
things you didn't intend. Static warning tools are vital here - from
syntax-highlighting and check-as-you-type editors and IDEs, through
compiler warning flags, to stand-alone checkers. Your tools should
tell you if you are accidentally using a reserved word as an
identifier.

There is no need to memorize undefined behaviours for a language -
indeed, such a thing is impossible since everything not defined by a
language standard is, by definition, undefined behaviour. (C and C++
are not special here - the unusual thing is just that their standards
say this explicitly.)

The trick is to memorize the /defined/ behaviours, and stick to them.
You generally don't need to know if a language leaves (1 / 0) as
undefined, or gives a specific value, or prints an error message -
usually it is sufficient to know the values for which (x / y) /is/
defined, and stick to those values.
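
A minimal sketch of that advice in C (not from the original post; the
helper name checked_div is hypothetical): the division is performed only
for operands where the language defines the result, and every other case
is reported to the caller instead of being evaluated.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: computes x / y only when C defines the result.
   Returns false, leaving *quotient untouched, for the undefined cases. */
bool checked_div(int x, int y, int *quotient)
{
    if (y == 0)
        return false;   /* division by zero is undefined */
    if (y == -1 && x < -INT_MAX)
        return false;   /* INT_MIN / -1 does not fit in int (two's complement) */
    *quotient = x / y;  /* defined: truncates toward zero since C99 */
    return true;
}
```

A caller checks the return value before reading the quotient, so the
question of what (1 / 0) "does" never comes up.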

Basically, trying to execute undefined behaviour is no more and no less
than a bug in the program - whether it is "undefined" in terms of the
language, the library, the code you wrote yourself, the customer's
specification, or anything else.  People program primarily by trying to
write correct code - not by trying to think of all the ways they could
write incorrect code!


The real challenge from big languages and big standard libraries is not
/writing/ code, it is /reading/ it.  It doesn't really matter if a C
programmer, when writing some code, does not know what the syntax "void
foo(int a[static 10]);"  means.  (Most C programmers don't know it, and
never miss it.)  But it can be a problem if they have to read and
understand code that uses something they don't know.
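
For readers who have not met that syntax, a small sketch (an
illustration added here, not part of the original post; the function
name sum_first_ten is hypothetical): in C99 and later, "static 10" inside
an array parameter is a promise by the caller that the argument points to
at least 10 elements.

```c
#include <stddef.h>

/* "static 10" tells the compiler that a points to the first element of
   an array with at least 10 elements. Passing a shorter array or a null
   pointer is undefined behaviour; some compilers warn when they can see it. */
int sum_first_ten(int a[static 10])
{
    int total = 0;
    for (size_t i = 0; i < 10; i++)
        total += a[i];   /* indices 0..9 are guaranteed valid by the caller */
    return total;
}
```

Passing a longer array is fine; only the first 10 elements are read.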



#2797 — Re: what is defined, was for or against equality

From: Thomas Koenig <tkoenig@netcologne.de>
Date: 2022-01-06 16:43 +0000
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-020@comp.compilers>
In reply to: #2796

David Brown <david.brown@hesbynett.no> schrieb:

> There is no need to memorize undefined behaviours for a language -
> indeed, such a thing is impossible since everything not defined by a
> language standard is, by definition, undefined behaviour. (C and C++
> are not special here - the unusual thing is just that their standards
> say this explicitly.)

This is a rather C-centric view of things.  The Fortran standard
uses a different model.

There are constraints, which are numbered.  Any violation of such
a constraint needs to be reported by the compiler ("processor",
in Fortran parlance).  If it fails to do so, this is a bug in
the compiler.

There are also phrases which have "shall" or "shall not".  If this
is violated, this is an error in the program.  Catching such a
violation is a good thing from quality of implementation standpoint,
but is not required.  Many run-time errors such as array overruns
fall into this category.

[...]

> The real challenge from big languages and big standard libraries is not
> /writing/ code, it is /reading/ it.  It doesn't really matter if a C
> programmer, when writing some code, does not know what the syntax "void
> foo(int a[static 10]);"  means.  (Most C programmers don't know it, and
> never miss it.)  But it can be a problem if they have to read and
> understand code that uses something they don't know.

Agreed.



#2802 — Re: what is defined, was for or against equality

From: David Brown <david.brown@hesbynett.no>
Date: 2022-01-07 12:06 +0100
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-026@comp.compilers>
In reply to: #2797

On 06/01/2022 17:43, Thomas Koenig wrote:
> David Brown <david.brown@hesbynett.no> schrieb:
>
>> There is no need to memorize undefined behaviours for a language -
>> indeed, such a thing is impossible since everything not defined by a
>> language standard is, by definition, undefined behaviour. (C and C++
>> are not special here - the unusual thing is just that their standards
>> say this explicitly.)
>
> This is a rather C-centric view of things.  The Fortran standard
> uses a different model.
>
> There are constraints, which are numbered.  Any violation of such
> a constraint needs to be reported by the compiler ("processor",
> in Fortran parlance).  If it fails to do so, this is a bug in
> the compiler.

C has basically the same concept.

(IIRC, C++ has a few constraints, such as the "one definition rule", where
the standard says no diagnostics are necessary, because
identifying the error would mean the compiler has to see multiple
translation units at once.  Compilers often diagnose these if they have
some kind of link-time optimisation or program-at-once mode.)

>
> There are also phrases which have "shall" or "shall not".  If this
> is violated, this is an error in the program.  Catching such a
> violation is a good thing from quality of implementation standpoint,
> but is not required.  Many run-time errors such as array overruns
> fall into this category.

That is the same in C.  From 4.2 "Conformance" :

"""
If a “shall” or “shall not” requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is undefined.
Undefined behavior is otherwise indicated in this International Standard
by the words “undefined behavior” or by the omission of any explicit
definition of behavior.  There is no difference in emphasis among these
three; they all describe “behavior that is undefined”.
"""

The only difference I see from what you describe of Fortran (I have not
read any Fortran standards) is that the C standards also note that
behaviour that is not defined in the standards is undefined behaviour as
far as the standards are concerned.  That is a tautology, of course, and
applies equally to Fortran and any other language.


It is quite possible that the details of which behaviours are defined or
not vary between the languages - things like division by 0,
out-of-bounds array access, etc., may be different.  As I understand it,
passing aliased pointers or array references as different parameters to
the same function can lead to undefined behaviour in Fortran, whereas it
is defined in C (unless you use "restrict").
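
The C side of that comparison can be sketched as follows (an
illustration, not from the original post; both function names are
hypothetical). The two functions do the same work, but only the second
lets the compiler assume the arrays do not overlap.

```c
#include <stddef.h>

/* No restrict: dst and src may refer to the same array, so a call such
   as scale_may_alias(a, a, n, k) is well defined. */
void scale_may_alias(double *dst, const double *src, size_t n, double k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}

/* With restrict: the caller promises that the elements written through
   dst are not also accessed through src. Calling this with overlapping
   arrays (and n > 0) is undefined behaviour. */
void scale_no_alias(double *restrict dst, const double *restrict src,
                    size_t n, double k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}
```

The restrict version corresponds roughly to what is described above as
the default for Fortran arguments.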


> [...]
>
>> The real challenge from big languages and big standard libraries is not
>> /writing/ code, it is /reading/ it.  It doesn't really matter if a C
>> programmer, when writing some code, does not know what the syntax "void
>> foo(int a[static 10]);"  means.  (Most C programmers don't know it, and
>> never miss it.)  But it can be a problem if they have to read and
>> understand code that uses something they don't know.
>
> Agreed.



#2803 — Re: what is defined, was for or against equality

From: Spiros Bousbouras <spibou@gmail.com>
Date: 2022-01-07 13:21 +0000
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-027@comp.compilers>
In reply to: #2797

On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
> David Brown <david.brown@hesbynett.no> schrieb:
>
> > There is no need to memorize undefined behaviours for a language -
> > indeed, such a thing is impossible since everything not defined by a
> > language standard is, by definition, undefined behaviour. (C and C++
> > are not special here - the unusual thing is just that their standards
> > say this explicitly.)
>
> This is a rather C-centric view of things.  The Fortran standard
> uses a different model.
>
> There are constraints, which are numbered.  Any violation of such
> a constraint needs to be reported by the compiler ("processor",
> in Fortran parlance).  If it fails to do so, this is a bug in
> the compiler.
>
> There are also phrases which have "shall" or "shall not".  If this
> is violated, this is an error in the program.  Catching such a
> violation is a good thing from quality of implementation standpoint,
> but is not required.  Many run-time errors such as array overruns
> fall into this category.

This seems to me exactly like the C model. What difference do you see ?

Regarding the more general issue, it seems to me that undefined behaviour is
a red herring (which I think is the point David was making). Every time one
writes code in any language , one must have an expectation on how the code is
supposed to behave and some reasoning on why the code they wrote will behave
according to their expectations. The reasoning will be based (apart from
general rules from logic and mathematics) on what the standard of the
programming language specifies (if the language has a standard) , what the
translator/compiler documentation specifies , what the documentation of any
libraries they use specifies and so forth.

For example, let's say that I write in C

int a = INT_MAX + 1 ;

with the expectation that  a  will get the value INT_MIN. The onus is on me
to provide a reasoning why the code above will meet my expectation. If I
cannot provide such a reasoning then from my point of view the code is
already undefined. The fact that the C standard also says that the code is
undefined is irrelevant. Even if the C standard specified for example that
signed integer arithmetic uses wraparound, unless I could point to the place
in the standard where it said so, the code is still undefined from my point
of view so I should not use it.

But let's say that I have the above code and I intend to compile it with
GCC using the  -fwrapv   flag. Then my expectation is actually justified
based on the GCC documentation for what  -fwrapv  means and the parts
of the C standard which define what the various symbols in

int a = INT_MAX + 1 ;

mean. I'm not going to provide a proof because it should be obvious. But
any such proof would not need to cite any part of the C standard which
explicitly mentions undefined behaviour.
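
When no such flag can be relied on, the same kind of reasoning can be
carried by the code itself. A minimal sketch (not from the original
post; the helper name checked_add is hypothetical) that only ever
evaluates additions whose result the C standard defines:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *sum and returns true only when
   the mathematical result fits in int. The range tests themselves cannot
   overflow, so no undefined signed overflow is ever evaluated. */
bool checked_add(int a, int b, int *sum)
{
    if (b > 0 && a > INT_MAX - b)
        return false;   /* a + b would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;   /* a + b would fall below INT_MIN */
    *sum = a + b;
    return true;
}
```

Here the justification for the result rests only on the definitions of
INT_MAX, INT_MIN and in-range integer arithmetic, with no appeal to
compiler-specific behaviour.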


The only occasion where an explicit mention of undefined behaviour would be
relevant would be if the C standard (or any standard) were contradictory i.e.
it said in some place that some construct has a certain defined behaviour and
it said in some other place that the same construct has undefined behaviour.
But with a popular language like C , if such contradictions existed , they
would be caught early and corrected.



#2808 — Re: what is defined, was for or against equality

From: Thomas Koenig <tkoenig@netcologne.de>
Date: 2022-01-08 09:31 +0000
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-032@comp.compilers>
In reply to: #2803

Spiros Bousbouras <spibou@gmail.com> schrieb:
> On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
> Thomas Koenig <tkoenig@netcologne.de> wrote:
>> David Brown <david.brown@hesbynett.no> schrieb:
>>
>> > There is no need to memorize undefined behaviours for a language -
>> > indeed, such a thing is impossible since everything not defined by a
>> > language standard is, by definition, undefined behaviour. (C and C++
>> > are not special here - the unusual thing is just that their standards
>> > say this explicitly.)
>>
>> This is a rather C-centric view of things.  The Fortran standard
>> uses a different model.
>>
>> There are constraints, which are numbered.  Any violation of such
>> a constraint needs to be reported by the compiler ("processor",
>> in Fortran parlance).  If it fails to do so, this is a bug in
>> the compiler.
>>
>> There are also phrases which have "shall" or "shall not".  If this
>> is violated, this is an error in the program.  Catching such a
>> violation is a good thing from quality of implementation standpoint,
>> but is not required.  Many run-time errors such as array overruns
>> fall into this category.
>
> This seems to me exactly like the C model. What difference do you see ?

First, I see a difference in result.  Highly intelligent and
knowledgeable people argue vehemently over whether a program should
be able to use undefined behavior, and a lot of vitriol is directed
against compiler writers whose compilers optimize on the assumption
that undefined behavior cannot happen, especially if it turns out
that existing code was broken and no longer works after a compiler
upgrade (just read a few of Linus Torvalds's comments on that
matter).

I see C conflating two separate concepts:  program errors and
behavior that is outside the standard.  "Undefined behavior is
always a programming error" does not work; that would make

#include <unistd.h>
#include <string.h>

int main()
{
  char a[] = "Hello, world!\n";
  write (1, a, strlen(a));
  return 0;
}

not more and not less erroneous than

int main()
{
  int *p = 0;
  *p = 42;
}

whereas I would argue that there is an important difference between
the two.

If the C standard replaced "the behavior is undefined" with "the
program is in error, and the subsequent behavior is undefined"
or something along those lines, the discussion would be much
muted.

(Somebody may point out to me that this is what the standard is
actually saying.  If so, that would sort of reinforce my argument
that it should be clearer :-)



#2810 — Re: what is defined, was for or against equality

From: Spiros Bousbouras <spibou@gmail.com>
Date: 2022-01-08 22:28 +0000
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-034@comp.compilers>
In reply to: #2808

On Sat, 8 Jan 2022 09:31:06 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
> Spiros Bousbouras <spibou@gmail.com> schrieb:
> > On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
> > Thomas Koenig <tkoenig@netcologne.de> wrote:
> >> This is a rather C-centric view of things.  The Fortran standard
> >> uses a different model.
> >>
> >> There are constraints, which are numbered.  Any violation of such
> >> a constraint needs to be reported by the compiler ("processor",
> >> in Fortran parlance).  If it fails to do so, this is a bug in
> >> the compiler.
> >>
> >> There are also phrases which have "shall" or "shall not".  If this
> >> is violated, this is an error in the program.  Catching such a
> >> violation is a good thing from quality of implementation standpoint,
> >> but is not required.  Many run-time errors such as array overruns
> >> fall into this category.
> >
> > This seems to me exactly like the C model. What difference do you see ?
>
> First, I see a difference in result.  Highly intelligent and
> knowledgeable people argue vehemently over whether a program should
> be able to use undefined behavior, and a lot of vitriol is directed
> against compiler writers whose compilers optimize on the assumption
> that undefined behavior cannot happen, especially if it turns out
> that existing code was broken and no longer works after a compiler
> upgrade (just read a few of Linus Torvalds's comments on that
> matter).
>
> I see C conflating two separate concepts:  program errors and
> behavior that is outside the standard.  "Undefined behavior is
> always a programming error" does not work; that would make

The C standard is in no position to say that some programme is in
error. This would require near omniscience from the standard
writers.

> #include <unistd.h>
> #include <string.h>
>
> int main()
> {
>   char a[] = "Hello, world!\n";
>   write (1, a, strlen(a));
>   return 0;
> }
>
> not more and not less erroneous than
>
> int main()
> {
>   int *p = 0;
>   *p = 42;
> }
>
> whereas I would argue that there is an important difference between
> the two.

The only difference I see between the two is that the first is defined
by POSIX and the second is not. According to POSIX the first is required
to print something on stdout. I cannot imagine any extension which
would make the second programme do something useful and a conforming
implementation may well compile it as essentially a no-op.

But with something like

int main(void) {
    int *p = 0 ;
    *p = 42 ;
    /* ... do other stuff ... */
    return 0 ;
}

the C standard allows for a conforming implementation to do something
useful like perhaps store 42 to address 0.

> If the C standard replaced "the behavior is undefined" with "the
> program is in error, and the subsequent behavior is undefined"
> or something along those lines, the discussion would be much
> muted.
>
> (Somebody may point out to me that this is what the standard is
> actually saying.  If so, that would sort of reinforce my argument
> that it should be clearer :-)

No , it most definitely does not say that nor could it possibly say
that.



#2811 — Re: what is defined, was for or against equality

From: Thomas Koenig <tkoenig@netcologne.de>
Date: 2022-01-09 00:09 +0000
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-035@comp.compilers>
In reply to: #2810

Spiros Bousbouras <spibou@gmail.com> schrieb:
> On Sat, 8 Jan 2022 09:31:06 -0000 (UTC)
> Thomas Koenig <tkoenig@netcologne.de> wrote:
>> Spiros Bousbouras <spibou@gmail.com> schrieb:
>> > On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
>> > Thomas Koenig <tkoenig@netcologne.de> wrote:
>> >> This is a rather C-centric view of things.  The Fortran standard
>> >> uses a different model.
>> >>
>> >> There are constraints, which are numbered.  Any violation of such
>> >> a constraint needs to be reported by the compiler ("processor",
>> >> in Fortran parlance).  If it fails to do so, this is a bug in
>> >> the compiler.
>> >>
>> >> There are also phrases which have "shall" or "shall not".  If this
>> >> is violated, this is an error in the program.  Catching such a
>> >> violation is a good thing from quality of implementation standpoint,
>> >> but is not required.  Many run-time errors such as array overruns
>> >> fall into this category.
>> >
>> > This seems to me exactly like the C model. What difference do you see ?
>>
>> First, I see a difference in result.  Highly intelligent and
>> knowledgeable people argue vehemently over whether a program should
>> be able to use undefined behavior, and a lot of vitriol is directed
>> against compiler writers whose compilers optimize on the assumption
>> that undefined behavior cannot happen, especially if it turns out
>> that existing code was broken and no longer works after a compiler
>> upgrade (just read a few of Linus Torvalds's comments on that
>> matter).
>>
>> I see C conflating two separate concepts:  program errors and
>> behavior that is outside the standard.  "Undefined behavior is
>> always a programming error" does not work; that would make

> The C standard is in no position to say that some programme is in
> error. This would require near omniscience from the standard
> writers.

A standard (or other specification document) is certainly able to
state that some construct is in error.  To grab an often-quoted
example:

J3/18-007r1, the Fortran 2018 interpretation document, states in
subclause 9.5.3, "Array elements and array sections",

# The value of a subscript in an array element shall be within the
# bounds for its dimension.

No omniscience required to write or understand that sentence.

This puts the burden on the programmer.  The compiler might catch
such an error and abort the program, or other unpredictable
things such as overwriting an unrelated variable might also happen.

Reading a language standard can be hard.  Quite often, information
is scattered throughout the text and needs to be pieced together
to find the necessary information, especially definition of terms
which are crucial to understanding.  Most programmers do not
read standards (at least final committee drafts can usually be
found these days on the Internet), but compiler writers should at
least be familiar with what they are implementing.

Programmers often rely on books, but these can also get things wrong.

Because programmers are human, they also can get ticked off when being
told that a construct they have used for years has been illegal
for decades :-|

Having a good standard is crucial to being able to write good compilers.



#2813 — Re: what is defined, was for or against equality

From: Spiros Bousbouras <spibou@gmail.com>
Date: 2022-01-09 21:30 +0000
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-037@comp.compilers>
In reply to: #2811

On Sun, 9 Jan 2022 00:09:19 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
> Spiros Bousbouras <spibou@gmail.com> schrieb:
> > On Sat, 8 Jan 2022 09:31:06 -0000 (UTC)
> > Thomas Koenig <tkoenig@netcologne.de> wrote:
> >> I see C conflating two separate concepts:  program errors and
> >> behavior that is outside the standard.  "Undefined behavior is
> >> always a programming error" does not work; that would make
>
> > The C standard is in no position to say that some programme is in
> > error. This would require near omniscience from the standard
> > writers.
>
> A standard (or other specification document) is certainly able to
> state that some construct is in error.  To grab an often-quoted
> example:
>
> J3/18-007r1, the Fortran 2018 interpretation document, states in
> subclause 9.5.3, "Array elements and array sections",
>
> # The value of a subscript in an array element shall be within the
> # bounds for its dimension.
>
> No omniscience required to write or understand that sentence.
>
> This puts the burden on the programmer.  The compiler might catch
> such an error and abort the program, or other unpredictable
> things such as overwriting an unrelated variable might also happen.

I haven't read any Fortran standards so I can only go by the above quote.
Only the programmer knows what their requirements are and why they think that
the code they wrote will meet those requirements. My idea of error is that
either the code does not meet the requirements or it does so only by accident
and the programmer does not have a correct reasoning as to why their code
will meet those requirements. You seem to be reading the quote as saying

    No matter what the programmer requirements and no matter what extensions
    their Fortran implementation offers , the programmer requirements will
    not be justifiably met if they use an array subscript outside the bounds
    for its dimension.

Perhaps some Fortran implementation gives information as to the layout of
distinct variables so that one knows what will be overwritten by writing off
the bounds of some array and it will be overwritten in the way the programmer
wants. Unlikely (especially for Fortran) but it cannot be excluded. I can
imagine a C implementation for small embedded systems which does provide such
information and a programmer using it to reduce the number of instructions to
achieve a desired result. A more realistic example is the following :

#include <stdio.h>

int main(void) {
    int a = 12 , b = 14 ;
    printf("%2$d %1$d\n" , a , b) ;
    return 0 ;
}

The above code has undefined behaviour according to the C standard. It is
defined according to POSIX. Whether it is in error depends on whether the
programmer really wanted to print
14 12

and no standards committee can possibly know this. So I still think that your
reading requires omniscience from the Fortran standard writers. But perhaps
there are other parts of the standard which justify your reading. For example
some parts of the Common Lisp standard do state that an implementation must
not extend some construct to provide useful functionality beyond what the
standard specifies. I don't remember precisely how it states it and I can't
find those parts now.

> Reading a language standard can be hard.  Quite often, information
> is scattered throughout the text and needs to be pieced together
> to find the necessary information, especially definition of terms
> which are crucial to understanding.  Most programmers do not
> read standards (at least final committee drafts can usually be
> found these days on the Internet), but compiler writers should at
> least be familiar with what they are implementing.
>
> Programmers often rely on books, but these can also get things wrong.

C books at least usually don't go into the fine details of undefined
behaviour. To hone one's instincts in this area one should spend a few
months systematically reading  comp.lang.c  while consulting a draft
of the standard !

> Because programmers are human, they also can get ticked off when being
> told that a construct they have used for years has been illegal
> for decades :-|

This may happen but my impression with C is that the strongest complaints
come from people who

- have read the C standard (or at least the relevant parts of it)

- know that their code has undefined behaviour and know what the term means

- do not rely on any compiler extensions

yet still feel certain (dare I say "entitled" ?) that their code ought to
behave in a certain way. For an extreme example see Robert M. Hyatt of
crafty fame (a chess programme which has won awards in the past) :
http://www.open-chess.org/viewtopic.php?f=5&t=2519 .
[Fortran used to require that arrays were stored in column major order, that
double precision took twice the space of real and integer, and you were allowed
to use EQUIVALENCE and adjustable dimensions in argument arrays to do overlaying
assuming that layout.  Dunno how much more modern Fortran has deprecated it. -John]



#2814 — Re: what is defined, was for or against equality

From: David Brown <david.brown@hesbynett.no>
Date: 2022-01-09 23:00 +0100
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-038@comp.compilers>
In reply to: #2808

On 08/01/2022 10:31, Thomas Koenig wrote:
> Spiros Bousbouras <spibou@gmail.com> schrieb:

>> This seems to me exactly like the C model. What difference do you see ?
>
> First, I see a difference in result.  Highly intelligent and
> knowledgeable people argue vehemently if a program should be able
> to use undefined behavior or not, and a lot of vitriol is directed
> against compiler writers who use the assumption that undefined
> behavior cannot happen in their compilers for optimization,
> especially if it turns out that existing code was broken and no
> longer works after a compiler upgrade (Just read a few of Linus
> Torvalds's comments on that matter).

People want compilers to do what the programmer meant, not what he or
she wrote.  And in particular, if a compiler did one thing once, they
want it to continue to do the same thing with the same code - as long as
they got what they wanted the first time round.

This is, of course, entirely natural for humans.  But it is not natural
for computer programs like compilers.

Linus Torvalds is known for blowing his top on matters that he either
does not understand, or when he has mixed his personal opinions with
facts, or while only looking at a small part of the big picture.  (He is
also known as an incredible programmer, a world-class project leader,
and a charismatic visionary who revolutionised the software world - but
that's beside the point here!).

A key example of his complaints in this area revolves around a function
that was something equivalent to :

int foo(int * p) {
	int x = *p;
	if (!p) return -1;
	return x;
}

His complaint was that the compiler saw that "*p" was accessed, and
therefore assumed "p" could not be zero and optimised away the test.
The compiler did exactly what it was asked to do - the optimisation is
perfectly valid according to the C standards and additional definitions
given by the compiler.  But it was not what the programmer wanted, and
not what older versions of the compiler had done.

Of course, when a new optimisation simply makes object code more
efficient, programmers want that - they don't /always/ want the compiler
to handle things the way older versions did.  They want the compiler to
read their minds and see what they meant to write, and generate optimal
code for that.


None of this is helped by the fact that C code often has to work
efficiently on a variety of targets and compilers, and some compilers
give extra guarantees about how they interpret code beyond the
definitions given in the C standards.  Many more compilers can be relied
upon in practice to work in particular ways, though they don't guarantee
or document it, and this means the most efficient code that works in
practice on one compiler may be wrong and give incorrect results on
another compiler.  You can write C code that is correct and widely
portable, but you can't write C code that is correct, optimally
efficient, and widely portable.



The big question here, is why do you think Fortran is any different?  In
theory, there isn't a difference - nothing you have said here convinces
me that there is any fundamental difference between Fortran and C in
regards to undefined behaviour.  (And there's no difference in the
implementations - the most commonly used Fortran compilers also handle
C, C++, and perhaps other languages.)

I believe it is a matter of who writes Fortran programs, and what these
programs do.  Now, I don't know or use Fortran myself, so I might be
wrong here.  However, it seems to me that Fortran is typically used by
experienced professional programmers and for scientific or numerical
programming.  C is used by a much wider range of programmers, for a much
wider range of programming tasks.  I think it is inevitable that you'll
get more people programming in C when they are not fully sure of what
they are doing, more code where subtle mistakes can be made, more people
using C when other languages would have been better choices, and more C
programmers who are likely to blame their tools for their own mistakes.



>
> I see C conflating two separate concepts:  Program errors and
> behavior that is outside the standard.  "Undefined behavior is
> always a programming error" does not work; that would make
>
> #include <unistd.h>
> #include <string.h>
>
> int main()
> {
>   char a[] = "Hello, world!\n";
>   write (1, a, strlen(a));
>   return 0;
> }
>

C does not have a "write" function in the standard library.  So the
behaviour of "write" is not defined by the C standards - but that does
not mean the behaviour is undefined.  It just means it is defined
elsewhere, not in the C standards.  If the programmer doesn't know what
the "write" function does or how it is specified, then it might be
undefined behaviour - certainly it is bad programming.


> not more and not less erroneous than
>
> int main()
> {
>   int *p = 0;
>   *p = 42;
> }
>
> whereas I would argue that there is an important difference between
> the two.
>

There is no fundamental difference - if you know the behaviour is
defined, it is defined.  (The program is then correct or incorrect
depending on how that definition matches your requirements.)  If not, it
is undefined (and incorrect).  In neither case is the behaviour defined
by the C standard, but the behaviour could be defined by something else
(library documentation or external definition of "write", or a C
compiler that specifically says it defines the behaviour of
dereferencing null pointers).

> If the C standard replaced "the behavior is undefined" with "the
> program is in error, and the subsequent behavior is undefined"
> or something along those lines, the discussion would be much
> muted.
>

That sounds like you dislike the "time travel" aspect of C's undefined
behaviour.  Many would agree with that - they don't like the idea that
undefined behaviour later in the program can be used to change the
behaviour of code earlier on.  The C standard considers undefined
behaviour to be program-wide - if you execute something that has
undefined behaviour (remembering that this means there is no definition
/anywhere/ of what will happen), the whole program is wrong and you
can't expect anything from it.

People often find this disturbing.  They think perhaps it is fair enough
that dereferencing a null pointer can crash a program, but it shouldn't
affect things that came before it.

However, there are two key points to think about.  First, the standards'
handling of undefined behaviour means that a compiler /can/ use UB to
change the object code generated for earlier source code, not that it
/must/ do so.  A compiler always balances efficient code generation with
ease-of-use and ease-of-debugging.  The ideal balance point will depend
on the programmer writing the code, so compiler flags are used to tune
it, but surprises can still happen.

The other point is to consider how the standards could say anything
else.  If the standards required observable behaviour to be completed
before undefined behaviour occurred, the results would be terrible.
Dereferencing a null pointer or dividing by zero could cause a complete
crash (remember the "Windows for Warships" affair?  A single divide by
zero brought the whole ship network down, leaving it dead in the water
for hours).  That means the compiler would need to make sure any
volatile writes had hit main memory before reading a pointer.  It would
have to ensure all file stream buffers were flushed to disk before doing
a division.  You can be sure Linus Torvalds would have a thing or two to
say about such a compiler.

> (Somebody may point out to me that this is what the standard is
> actually saying.  If so, that would sort of reinforce my argument
> that it should be clearer :-)
[Fortran has in principle historically allowed rather aggressive optimization,
e.g., A*B+A*C can turn into A*(B+C).  On the other hand, in the real world,
when IBM improved their optimizing compiler Fortran H into Fortran X, the
developers said any new optimization had to produce bit identical results
to what the old compiler did.  So this is not a new issue. -John]



#2817 — Re: what is defined, was for or against equality

From: Thomas Koenig <tkoenig@netcologne.de>
Date: 2022-01-10 12:04 +0000
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-041@comp.compilers>
In reply to: #2814

David Brown <david.brown@hesbynett.no> schrieb:

> The big question here, is why do you think Fortran is any different?  In
> theory, there isn't a difference - nothing you have said here convinces
> me that there is any fundamental difference between Fortran and C in
> regards to undefined behaviour.

I am not sure how to better explain it.  I will try a bit, but
this will be my last reply to you in this thread.  We seem to have
a fundamental difference in our understanding, and seem to be
unable to resolve it.

> (And there's no difference in the
> implementations - the most commonly used Fortran compilers also handle
> C, C++, and perhaps other languages.)

Sort of.

At the risk of boring most readers of this group, a very short, but
(hopefully) pertinent  introduction of how modern compilers work:

A front end translates the source to an abstract syntax tree (which
you can view with gfortran with -fdump-fortran-original) and from
that into an intermediate representation (which you can view with
gfortran, or with gcc in general, with -fdump-tree-original).
This intermediate representation is then optimized, in
an architecture-independent way (usually using SSA) and then
translated into assembler or directly to object code using a
"back end", of which many compilers also have several.

An example:  The program

  print *,"Hello, world"
end

is translated into (code only)

  WRITE UNIT=6 FMT=-1
  TRANSFER 'Hello, world'
  DT_END

and then into the intermediate representation:

MAIN__ ()
{
  {
    struct __st_parameter_dt dt_parm.0;

    dt_parm.0.common.filename = &"hello.f90"[1]{lb: 1 sz: 1};
    dt_parm.0.common.line = 2;
    dt_parm.0.common.flags = 128;
    dt_parm.0.common.unit = 6;
    _gfortran_st_write (&dt_parm.0);
    _gfortran_transfer_character_write (&dt_parm.0, &"Hello, world"[1]{lb: 1 sz: 1}, 12);
    _gfortran_st_write_done (&dt_parm.0);
  }
}

There is no compiler (if you mean a single binary) that handles both
C and Fortran.  They are separate front ends to common middle
and back ends.

And there are certainly differences in the code that the front
ends hand to the middle end, so saying that there is "no
difference in the implementations" is not correct.

>> I see C conflating two separate concepts:  Program errors and
>> behavior that is outside the standard.  "Undefined behavior is
>> always a programming error" does not work; that would make
>>
>> #include <unistd.h>
>> #include <string.h>
>>
>> int main()
>> {
>>   char a[] = "Hello, world!\n";
>>   write (1, a, strlen(a));
>>   return 0;
>> }
>>
>
> C does not have a "write" function in the standard library.  So the
> behaviour of "write" is not defined by the C standards - but that does
> not mean the behaviour is undefined.

When interpreting a language standard, you _must_ follow the
definitions in the standards if they exist, you cannot use everyday
interpretations.

Subclause 3.4.3 (N2596) defines

# undefined behavior

# behavior, upon use of a nonportable or erroneous program
# construct or of erroneous data, for which this document imposes
# no requirements

write() is nonportable and the C standard imposes no requirements
on it.  Therefore, the program above invokes undefined behavior.



> It just means it is defined
> elsewhere, not in the C standards.

Nope, see above.

(If you replaced every occurrence of "undefined behavior" in the C
standard with "WRTLPFMFT behavior" and "the behavior is undefined"
with "the behavior is WRTLPFMFT", the meaning of the standard
would not change.)
[It seems like nitpicking here.  Yes, the C and POSIX standards are
different things, but we all know how common it is to use them
together. -John]



#2820 — Re: what is defined, was for or against equality

From: David Brown <david.brown@hesbynett.no>
Date: 2022-01-11 18:16 +0100
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-044@comp.compilers>
In reply to: #2817

On 10/01/2022 13:04, Thomas Koenig wrote:
> David Brown <david.brown@hesbynett.no> schrieb:
>
>> The big question here, is why do you think Fortran is any different?  In
>> theory, there isn't a difference - nothing you have said here convinces
>> me that there is any fundamental difference between Fortran and C in
>> regards to undefined behaviour.
>
> I am not sure how to better explain it.  I will try a bit, but
> this will be my last reply to you in this thread.  We seem to have
> a fundamental difference in our understanding, and seem to be
> unable to resolve it.
>

Fair enough.  Maybe in a future discussion, one of us will have an
"Aha!" moment and understand the other's viewpoint, and progress will be
made - until then, there's no point in going around in circles.  I'll
snip bits of your post here, and try to minimise new points (unless I
get that "Aha!") - but be sure I am reading and appreciating your entire
post.

>> (And there's no difference in the
>> implementations - the most commonly used Fortran compilers also handle
>> C, C++, and perhaps other languages.)
>
> Sort of.
>
> At the risk of boring most readers of this group, a very short, but
> (hopefully) pertinent  introduction of how modern compilers work:
>
>
> There is no compiler (if you mean a single binary) that handles both
> C and Fortran.  They are separate front ends to common middle
> and back ends.

Yes.  But it is the middle end that handles most of the optimisations,
including those based on undefined behaviour.  The front end determines
whether code can have undefined behaviour and in what circumstances.

>> C does not have a "write" function in the standard library.  So the
>> behaviour of "write" is not defined by the C standards - but that does
>> not mean the behaviour is undefined.
>
> When interpreting a language standard, you _must_ follow the
> definitions in the standards if they exist, you cannot use everyday
> interpretations.
>
> Subclause 3.4.3 (N2596) defines
>
> # undefined behavior
>
> # behavior, upon use of a nonportable or erroneous program
> # construct or of erroneous data, for which this document imposes
> # no requirements
>
> write() is nonportable and the C standard imposes no requirements
> on it.  Therefore, the program above invokes undefined behavior.

No.  (As always, this is based on my interpretation of the standards -
consider everything to have "IMHO" attached.)  The implementation of
"write" is outside the scope of the standards, and is therefore
undefined as far as the standards are concerned.  That does not make it
undefined behaviour in the program - it just means the standards don't
say what "write" should do.



#2821 — Re: what is defined, was for or against equality

From: Kaz Kylheku <480-992-1380@kylheku.com>
Date: 2022-01-11 19:19 +0000
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-045@comp.compilers>
In reply to: #2820

On 2022-01-11, David Brown <david.brown@hesbynett.no> wrote:
> On 10/01/2022 13:04, Thomas Koenig wrote:
>> David Brown <david.brown@hesbynett.no> schrieb:
>>
>>> The big question here, is why do you think Fortran is any different?  In
>>> theory, there isn't a difference - nothing you have said here convinces
>>> me that there is any fundamental difference between Fortran and C in
>>> regards to undefined behaviour.
>>
>> I am not sure how to better explain it.  I will try a bit, but
>> this will be my last reply to you in this thread.  We seem to have
>> a fundamental difference in our understanding, and seem to be
>> unable to resolve it.
>
> Fair enough.  Maybe in a future discussion, one of us will have an
> "Aha!" moment and understand the other's viewpoint, and progress will be
> made - until then, there's no point in going around in circles.  I'll
> snip bits of your post here, and try to minimise new points (unless I
> get that "Aha!") - but be sure I am reading and appreciating your entire
> post.
>
>>> (And there's no difference in the
>>> implementations - the most commonly used Fortran compilers also handle
>>> C, C++, and perhaps other languages.)
>>
>> Sort of.
>>
>> At the risk of boring most readers of this group, a very short, but
>> (hopefully) pertinent  introduction of how modern compilers work:
>>
>>
>> There is no compiler (if you mean a single binary) that handles both
>> C and Fortran.  They are separate front ends to common middle
>> and back ends.
>
> Yes.  But it is the middle end that handles most of the optimisations,
> including those based on undefined behaviour.  The front end determines
> whether code can have undefined behaviour and in what circumstances.

More precisely, optimizations are based on the absence of undefined
behavior: the assumption that contracts are being upheld.

More precisely, that contracts are being upheld in the face of the
inability to determine and diagnose statically whether they are
violated; i.e. there is a "blind trust". (Though there do exist
situations in which, in principle, undefined behavior is easily
deducible at translation time, without a requirement to do so.)

Front-ends for different languages are written to the respective
requirements of those languages. Their first aim is to handle
well-defined constructs and situations.  They target the intermediate
language of the compiler middle. That language has its own contracts.
The front end for each respective language has to ensure that every
situation in which behavior is defined (contract is upheld) is
translated to reliable intermediate code whose contract is upheld.
Care has to be taken that the intermediate code is expressed in the
right way so that it will not change behavior in invalid ways due to
optimizations.

This leaves a lot of room for Fortran and C to have entirely different
defined/undefined behaviors.

Even the front end for one single language can have a lot of switches
affecting what is defined or not.

There could be a switch which says that overflowing integer addition has
two's complement wrapping behavior. In that case, the compiler then
selects the intermediate instructions which provide that behavior
reliably (possibly simulating signed arithmetic with unsigned), and
also disables any inferences in the front end that might be based on the
assumption that overflow has not occurred.

>>> C does not have a "write" function in the standard library.  So the
>>> behaviour of "write" is not defined by the C standards - but that does
>>> not mean the behaviour is undefined.
>>
>> When interpreting a language standard, you _must_ follow the
>> definitions in the standards if they exist, you cannot use everyday
>> interpretations.
>>
>> Subclause 3.4.3 (N2596) defines
>>
>> # undefined behavior
>>
>> # behavior, upon use of a nonportable or erroneous program
>> # construct or of erroneous data, for which this document imposes
>> # no requirements
>>
>> write() is nonportable and the C standard imposes no requirements
>> on it.  Therefore, the program above invokes undefined behavior.
>
> No.  (As always, this is based on my interpretation of the standards -

Yes; using any function that is not in the C program, or in the
standard, is ISO C undefined behavior.

A program which includes <unistd.h> is not required to  compile
according to ISO C; it can fail with an error message about the
header not being defined. Or, #include <unistd.h> is allowed, in
a conforming implementation, to bring in tokens which have nothing
to do with POSIX.

Furthermore, a program which calls write, and does not provide such a
function itself, is not required to successfully link.  If it does link,
there is no requirement that this symbol is a function described by
POSIX.

POSIX implementations have to go out of their way to allow C programs
to use write as an external name, which ISO C allows.

For instance, the GNU C Library defines write as a weak symbol for
some identifier which resembles __libc_write: the "strong" symbol.

The C library internally uses only that __libc_write: it never calls
write, because user code could replace it:

  int write(char *x) { ... }

  double write = 42.0;

When the application defines the external name write, the weak symbol
coming from glibc yields; it is suppressed in favor of the program's
definition.

> consider everything to have "IMHO" attached.)  The implementation of
> "write" is outside the scope of the standards, and is therefore
> undefined as far as the standards are concerned.  That does not make it
> undefined behaviour in the program - it just means the standards don't
> say what "write" should do.

Right; it's "ISO C formal undefined behavior", not "behavior that is
not defined by any party whatsoever" ... though it could well be.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal



#2822 — Re: what is defined, was for or against equality

From: gah4 <gah4@u.washington.edu>
Date: 2022-01-11 14:18 -0800
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-046@comp.compilers>
In reply to: #2821

On Tuesday, January 11, 2022 at 11:47:26 AM UTC-8, Kaz Kylheku wrote:

(big snip)

> This leaves a lot of room for Fortran and C to have entirely different
> defined/undefined behaviors.

> Even the front end for one single language can have a lot of switches
> affecting what is defined or not.

I suppose so.  But more usual, the compiler works to the least
common denominator.

For one, C requires static variables, and especially external ones, to
initialize to zero, but Fortran doesn't.  Fortran compilers that use C
compiler middle and back ends, tend to zero such variables.

I suspect that there are many more that I don't know about.
As long as the cost is small, and it satisfies both standards,
not much reason not to do it.

Fortran has stricter rules on aliasing than C.  I don't actually know
about any effect on C programs, though, but it might be that
compilers do the same for C.

One that is not C or Fortran, but IEEE 754, is the effect of
relational operators with NaN.  Comparisons with NaN,
except for "not equal", return false.  That means that compilers
have to be careful optimizing such, and especially that
"greater than or equal" is not the logical complement of "less than".
(I haven't looked at how compilers handle this, or, even more,
how the hardware handles it.)



#2824 — Re: what is defined, was for or against equality

From: Thomas Koenig <tkoenig@netcologne.de>
Date: 2022-01-12 19:02 +0000
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-048@comp.compilers>
In reply to: #2822

gah4 <gah4@u.washington.edu> schrieb:
> On Tuesday, January 11, 2022 at 11:47:26 AM UTC-8, Kaz Kylheku wrote:
>
> (big snip)
>
>> This leaves a lot of room for Fortran and C to have entirely different
>> defined/undefined behaviors.
>
>> Even the front end for one single language can have a lot of switches
>> affecting what is defined or not.
>
> I suppose so.  But more usual, the compiler works to the least
> common denominator.
>
> For one, C requires static variables, and especially external ones, to
> initialize to zero, but Fortran doesn't.  Fortran compilers that use C
> compiler middle and back ends, tend to zero such variables.

This is more a matter of operating system and linker conventions
than of compilers.

Looking at the ELF standard, one finds

.bss

This section holds uninitialized data that contribute to the program's
memory image. By definition, the system initializes the data with zeros
when the program begins to run. The section occupies no file space, as
indicated by the section type, SHT_NOBITS.

which, unsurprisingly, matches exactly what C is doing.

Anybody who writes a Fortran compiler for an ELF system will
use .bss for COMMON blocks, because it is easiest.  Initialization
with zeros then happens automatically.

> I suspect that there are many more that I don't know about.
> As long as the cost is small, and it satisfies both standards,
> not much reason not to do it.
>
> Fortran has stricter rules on aliasing than C.  I don't actually know
> about any effect on C programs, though, but it might be that
> compilers do the same for C.

The rules are different, and unless C is the intermediate language,
a good compiler will hand the corresponding hints to the middle end.
[I have used Fortran systems that initialized otherwise undefined data to a value that would
trap, to help find use-before-set errors.  -John]



#2826 — Re: what is defined, was for or against equality

From: David Brown <david.brown@hesbynett.no>
Date: 2022-01-13 08:24 +0100
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-050@comp.compilers>
In reply to: #2824

On 12/01/2022 20:02, Thomas Koenig wrote:
> gah4 <gah4@u.washington.edu> schrieb:
>> On Tuesday, January 11, 2022 at 11:47:26 AM UTC-8, Kaz Kylheku wrote:

>> For one, C requires static variables, and especially external ones, to
>> initialize to zero, but Fortran doesn't.  Fortran compilers that use C
>> compiler middle and back ends, tend to zero such variables.
>
> This is more a matter of operating system and linker conventions
> than of compilers.
>
> Looking at the ELF standard, one finds
>
> .bss
>
> This section holds uninitialized data that contribute to the program's
> memory image. By definition, the system initializes the data with zeros
> when the program begins to run. The section occupies no file space, as
> indicated by the section type, SHT_NOBITS.
>
> which, unsurprisingly, matches exactly what C is doing.
>
> Anybody who writes a Fortran compiler for an ELF system will
> use .bss for COMMON blocks, because it is easiest.  Initialization
> with zeros then happens automatically.

I was under the impression that FORTRAN compilers typically put data in
the ".common" section of object files.  A key difference between .common
and .bss is that (with standard linker setup) duplicate symbols in .bss
are an error, while duplicate symbols in .common are merged.  But in C
startup code, .common is also zeroed (FORTRAN may have different startup
code here - with no experience of the language, I don't know such details).

The use of ".common" by C compilers such as gcc was common practice
precisely to improve compatibility with FORTRAN in the early days, and
it let people write "int global_x;" in headers and have everything work,
rather than the correct practice of "extern int global_x;" in headers
and a single "int global_x;" in one object file.  The big disadvantages
are that if you have "int local_x;" in two files, and don't use static,
they'll be merged with no error, and if you have "int global_x;" in one
file and "double global_x;" in another, it's a mess.  Modern gcc (since
GCC 10) defaults to "-fno-common" to avoid this.

>
>> I suspect that there are many more that I don't know about.
>> As long as the cost is small, and it satisfies both standards,
>> not much reason not to do it.
>>
>> Fortran has stricter rules on aliasing than C.  I don't actually know
>> about any effect on C programs, though, but it might be that
>> compilers do the same for C.
>
> The rules are different, and unless C is the intermediate language,
> a good compiler will hand the corresponding hints to the middle end.

AFAIUI the difference in aliasing rules is that in FORTRAN, pointer or
array parameters are assumed not to alias, while in C the compiler must
assume that they might alias, unless you use "restrict".  Are there
other differences?



#2827 — Re: what is defined, was for or against equality

From: Thomas Koenig <tkoenig@netcologne.de>
Date: 2022-01-13 11:17 +0000
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-051@comp.compilers>
In reply to: #2824

Thomas Koenig <tkoenig@netcologne.de> schrieb:

> [I have used Fortran systems that initialized otherwise undefined
> data to a value that would trap, to help find use-before-set errors.
> -John]

That usually is still available, but optional.  A short example:

$ cat a.f90
program main
  print *,a
end program main
$ gfortran -g -ffpe-trap=invalid -finit-real=snan a.f90
$ ./a.out

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

with a backtrace pointing to the offending line.

It does not necessarily work on COMMON blocks, though.



#2818 — Re: what is defined, was for or against equality

From: gah4 <gah4@u.washington.edu>
Date: 2022-01-10 16:58 -0800
Subject: Re: what is defined, was for or against equality
Message-ID: <22-01-042@comp.compilers>
In reply to: #2808

On Saturday, January 8, 2022 at 10:11:55 AM UTC-8, Thomas Koenig wrote:

(snip)

> I see C conflating two separate concepts: Program errors and
> behavior that is outside the standard. "Undefined behavior is
> always a programming error" does not work; that would make

> #include <unistd.h>
> #include <string.h>
> int main()
> {
> char a[] = "Hello, world!\n";
> write (1, a, strlen(a));
> return 0;
> }

Without the:

#include <unistd.h>

I agree that this would be undefined behavior.  But with the include file,
you are agreeing to use whatever standard the include file belongs to.

The include file defines the arguments to write(), but even more indicates
that you either supply (in another file), or use an otherwise supplied library
defining write().



#2798

From: Robert Prins <robert@prino.org>
Date: 2022-01-06 19:07 +0000
Message-ID: <22-01-021@comp.compilers>
In reply to: #2796

On 2022-01-06 08:11, David Brown wrote:
> On 05/01/2022 11:25, Martin Ward wrote:
>
> Your tools should tell you if you are accidentally using a reserved word as an
> identifier.

Your language should not have reserved words, if PL/I (AD 1964) could already do
without them...

'nuff said!

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html
[Just because it's possible to do something doesn't mean it is a good idea.  A
lot of us think a reasonable number of reserved words are fine and make it less
likely that a typo will silently change the meaning of a program. -John]



#2804 — Undefined behaviour, was: for or against equality

From: Martin Ward <martin@gkc.org.uk>
Date: 2022-01-07 14:02 +0000
Subject: Undefined behaviour, was: for or against equality
Message-ID: <22-01-028@comp.compilers>
In reply to: #2796

On 06/01/2022 08:11, David Brown wrote:
> The trick is to memorize the /defined/ behaviours, and stick to them.

Isn't the set of defined behaviours bigger than the set
of undefined behaviours? How do you know what is defined
if you don't know what is undefined?

For example, a = b + c is precisely defined in C and C++ for
floating point variables, but the result can be "undefined behaviour"
for ordinary 32 bit signed integer values.

If you want to stick to defined behaviours then you need
to add extra code. For example, CERT recommends:

   if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
       ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
     /* Handle error */
   } else {
     sum = si_a + si_b;
   }

--
			Martin

Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4


