
Calling conventions (particularly 32-bit ARM)

Started by	David Brown <david.brown@hesbynett.no>
First post	2025-01-06 14:57 +0100
Last post	2025-01-12 14:59 +0100
Articles	20 on this page of 97 — 17 participants


  Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-06 14:57 +0100
    Re: Calling conventions (particularly 32-bit ARM) Theo <theom+news@chiark.greenend.org.uk> - 2025-01-06 15:23 +0000
      Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-07 09:22 +0100
    Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-06 15:32 +0000
      Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-06 20:19 +0000
        Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-07 10:09 +0100
          Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-07 23:23 +0000
            Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-07 23:35 +0000
              Re: Calling conventions (particularly 32-bit ARM) "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-01-07 15:42 -0800
                Re: Calling conventions (particularly 32-bit ARM) "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-01-07 20:01 -0800
              Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-08 01:38 +0000
      Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-07 09:49 +0100
    Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-06 20:10 +0000
      Re: Calling conventions (particularly 32-bit ARM) antispam@fricas.org (Waldek Hebisch) - 2025-01-07 02:11 +0000
        Re: Calling conventions (particularly 32-bit ARM) Lawrence D'Oliveiro <ldo@nz.invalid> - 2025-01-07 06:53 +0000
        Re: Calling conventions (particularly 32-bit ARM) Stephen Fuld <sfuld@alumni.cmu.edu.invalid> - 2025-01-12 12:10 -0800
          Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-12 20:41 +0000
          Re: Calling conventions (particularly 32-bit ARM) antispam@fricas.org (Waldek Hebisch) - 2025-01-13 01:20 +0000
            Re: Calling conventions (particularly 32-bit ARM) Stephen Fuld <sfuld@alumni.cmu.edu.invalid> - 2025-01-14 09:40 -0800
              Re: Calling conventions (particularly 32-bit ARM) Terje Mathisen <terje.mathisen@tmsw.no> - 2025-01-14 19:18 +0100
                Re: Calling conventions (particularly 32-bit ARM) Michael S <already5chosen@yahoo.com> - 2025-01-14 23:48 +0200
                  Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 23:27 +0000
                    Re: Calling conventions (particularly 32-bit ARM) John Levine <johnl@taugh.com> - 2025-01-15 03:31 +0000
                      Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-15 16:50 +0000
                        Re: Calling conventions (particularly 32-bit ARM) John Levine <johnl@taugh.com> - 2025-01-15 22:03 +0000
                    Re: Calling conventions (particularly 32-bit ARM) antispam@fricas.org (Waldek Hebisch) - 2025-01-16 03:02 +0000
                      Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-16 15:08 +0000
                        Re: Calling conventions (particularly 32-bit ARM) antispam@fricas.org (Waldek Hebisch) - 2025-01-16 16:24 +0000
          Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-13 21:33 +0000
            Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-14 06:48 +0000
              Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 18:03 +0000
      Re: Calling conventions (particularly 32-bit ARM) George Neuner <gneuner2@comcast.net> - 2025-01-07 16:52 -0500
        Re: Calling conventions (particularly 32-bit ARM) Stefan Monnier <monnier@iro.umontreal.ca> - 2025-01-08 12:20 -0500
          Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-09 08:38 +0000
            Re: Calling conventions (particularly 32-bit ARM) Stefan Monnier <monnier@iro.umontreal.ca> - 2025-01-13 10:55 -0500
        Re: Calling conventions (particularly 32-bit ARM) Tim Rentsch <tr.17687@z991.linuxsc.com> - 2025-01-27 17:09 -0800
          Re: Calling conventions (particularly 32-bit ARM) George Neuner <gneuner2@comcast.net> - 2025-01-28 22:53 -0500
            Re: Calling conventions (particularly 32-bit ARM) Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-02-14 20:40 -0800
              Re: Calling conventions (particularly 32-bit ARM) George Neuner <gneuner2@comcast.net> - 2026-02-17 15:35 -0500
                Re: Calling conventions (particularly 32-bit ARM) Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-03-14 09:34 -0700
                  Re: Calling conventions (particularly 32-bit ARM) George Neuner <gneuner2@comcast.net> - 2026-03-24 17:20 -0400
      Re: Calling conventions (particularly 32-bit ARM) Stefan Monnier <monnier@iro.umontreal.ca> - 2025-01-08 12:34 -0500
        Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-08 20:19 +0000
        Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-08 22:08 +0000
          Re: Calling conventions (particularly 32-bit ARM) Stefan Monnier <monnier@iro.umontreal.ca> - 2025-01-08 18:20 -0500
            Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-09 00:11 +0000
            Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-09 07:23 +0000
              Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-09 10:07 +0000
                Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-10 08:24 +0000
              Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-09 20:48 +0000
                Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-09 21:23 +0000
                  Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-10 01:08 +0000
                    Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-10 09:19 +0000
              Re: Calling conventions (particularly 32-bit ARM) antispam@fricas.org (Waldek Hebisch) - 2025-01-10 08:33 +0000
                Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-10 10:25 +0000
                  Re: Calling conventions (particularly 32-bit ARM) John Levine <johnl@taugh.com> - 2025-01-10 15:17 +0000
                  Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-13 02:10 +0000
                    Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-13 14:19 +0000
                    Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-13 18:02 +0000
                      Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-13 19:00 +0000
                        Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-13 21:53 +0000
                          Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-13 22:02 +0000
                            Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-13 22:40 +0000
                              Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 02:32 +0000
                                Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-14 15:08 +0100
                                Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 14:22 +0000
                                  Re: Calling conventions (particularly 32-bit ARM) Michael S <already5chosen@yahoo.com> - 2025-01-14 16:41 +0200
                                    Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-14 18:02 +0000
                                  Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 18:15 +0000
                                  Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-14 18:19 +0000
                                    Re: Calling conventions (particularly 32-bit ARM) Terje Mathisen <terje.mathisen@tmsw.no> - 2025-01-14 19:39 +0100
                                      Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 19:14 +0000
                                      Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-14 20:01 +0000
                                        Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 22:05 +0000
                                          Re: Calling conventions (particularly 32-bit ARM) Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-01-14 15:23 -0800
                                            Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 23:39 +0000
                                              Re: Calling conventions (particularly 32-bit ARM) Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-01-14 16:59 -0800
                                            Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 23:40 +0000
                                              Re: Calling conventions (particularly 32-bit ARM) Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-01-14 17:11 -0800
                                      Re: Calling conventions (particularly 32-bit ARM) Michael S <already5chosen@yahoo.com> - 2025-01-15 00:09 +0200
                                    Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 19:08 +0000
                                    Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 19:24 +0000
                                      Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-14 20:31 +0000
                                        Re: Calling conventions (particularly 32-bit ARM) Michael S <already5chosen@yahoo.com> - 2025-01-14 23:13 +0200
                                          Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-15 00:47 +0000
                              Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-14 06:20 +0000
                              Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-14 15:05 +0100
                                Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 14:39 +0000
                                  Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-14 16:50 +0100
                                    Re: Calling conventions (particularly 32-bit ARM) Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-01-14 15:32 -0800
                          Re: Calling conventions (particularly 32-bit ARM) Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-01-14 00:14 -0800
                Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-10 14:43 +0000
              Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-10 18:39 +0100
                Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-10 18:39 +0000
                  Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-12 14:55 +0100
                Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-10 19:19 +0000
                  Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-12 14:59 +0100


#115541

From	George Neuner <gneuner2@comcast.net>
Date	2026-03-24 17:20 -0400
Message-ID	<ojv5sktvj8oj7dv61nc3ntipia1gqoit92@4ax.com>
In reply to	#115330

On Sat, 14 Mar 2026 09:34:55 -0700, Tim Rentsch
<tr.17687@z991.linuxsc.com> wrote:

>George Neuner <gneuner2@comcast.net> writes:
>
>> On Sat, 14 Feb 2026 20:40:26 -0800, Tim Rentsch
>> <tr.17687@z991.linuxsc.com> wrote:

>Returning to the main area of discussion - it occurs to me that we
>may have different operational assumptions about the situations
>where nested functions (or lambda expressions) are called.  In
>particular, there is an important distinction between a call where
>it is known which nested function/lambda is being called, and a call
>where that is not known, for example by virtue of having been passed
>as a function argument, with the call being done through a formal
>parameter.  

I make no assumptions about whether a closure can outlive the call
chain that created it - unless the function is so simple as to be able
to be /compiled/ inline, that pretty much is a given.

But there is an operational difference between a 1st class closure
which is stored or passed upward and may be called from anywhere, and
a 2nd class closure which, if stored or passed at all, will be called
only at or below the scope that defined it.

And similarly, the ways in which the compiler can treat the closure
are different.

I know you [Tim] know this, but for sake of others following along,
there are two general methods for compiling a closure: 

The first is "closure conversion" in which the function's stack based
free variables are relocated into a heap allocated structure.  The
function is modified to take an additional "environment" pointer, and
to access its free variables from the structure rather than the stack.
All the call sites then are modified to pass the address of the
"environment" structure.

The second is "lambda lifting" which modifies the function such that
its free variables become explicit arguments. All of its call sites
are modified to pass those variables.
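
As a rough illustration of the two methods in C: the sketch below
assumes a nested function  f(x) = x * scale + offset  whose free
variables are scale and offset.  All names (struct env, f_converted,
f_lifted, sum_converted, sum_lifted) are hypothetical and not taken
from any particular compiler; a real compiler does this on its
intermediate representation, not on C source.

```c
#include <stddef.h>
#include <stdlib.h>

/* Closure conversion: the free variables live in a heap record,
 * and the function receives a pointer to that record. */
struct env {
    int scale;
    int offset;
};

int f_converted(const struct env *env, int x)
{
    return x * env->scale + env->offset;
}

/* Lambda lifting: the free variables become explicit arguments. */
int f_lifted(int scale, int offset, int x)
{
    return x * scale + offset;
}

/* Enclosing function after closure conversion: it builds the
 * environment once and passes its address at every call site. */
int sum_converted(const int *v, size_t n, int scale, int offset)
{
    struct env *env = malloc(sizeof *env);
    int sum = 0;

    if (env == NULL)
        abort();
    env->scale = scale;
    env->offset = offset;
    for (size_t i = 0; i < n; i++)
        sum += f_converted(env, v[i]);
    free(env);
    return sum;
}

/* Enclosing function after lambda lifting: every call site passes
 * the free variables directly; nothing is relocated to the heap. */
int sum_lifted(const int *v, size_t n, int scale, int offset)
{
    int sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += f_lifted(scale, offset, v[i]);
    return sum;
}
```

In this sketch the environment is freed before sum_converted returns
only because the closure never escapes; a closure that is stored for
later use would need the record to outlive the call.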

The question of which method makes more sense depends on what use will
be made of the closure.  If the closure's "name" [symbol or pointer]
can be stored into a data structure for some unknown use later, then a
variant of closure conversion is the only /safe/ way to do it.

However, there are many cases where - even with anonymous functions -
the use is delimited and guaranteed to be at or below the scope that
defined the closure.  This is quite common in FP languages, and even
in C++[*].  If the free variables are guaranteed to be in scope, there
is no need to relocate them.

[*] C++ closures are objects that can be stored and passed around, but
witness what happens if you bind some free data by reference and it is
no longer available when the closure is invoked.  BOOM!

>This distinction is analogous to calling an outside
>function using its name versus calling a function through pointer
>to function, where it isn't known what the pointer might point to.
>When calling a function using the function's name, it is possible to
>expand the function body inline.  (The inline expansion might be
>done at link time, but conceptually that is no different.)
>
>Conversely, when calling through a pointer-to-function, doing an
>inline expansion isn't feasible in general, because we don't know
>what function is going to be called, and also there might be more
>than one.  The same sort of distinction occurs with calls to nested
>functions (or equivalently lambdas).

Agreed WRT inlining - it generally is not possible to inline a closure
function at the call site.

But WRT calling a function through a pointer, there is little
difference vs calling it by name: in either case the compiler (or
linker) simply needs to know the function's signature [or a dumb
C-like calling convention].  Compiling a call can involve syntactic
swizzling and/or passing hidden arguments:  witness object method
calls in C++.

In Lisp or Scheme every function is anonymous. If you want to "save"
it for use later, you bind it to a name.

When you "defun" or "define" the mental picture is of creating a named
function, but in reality[+] you are creating two objects: a named
symbol, and an anonymous function which then is bound to the symbol.

E.g. in Scheme, there is no difference [apart from syntax] between

   (define (foo x y) ...) 
and
   (define foo (lambda (x y) ...))

In either case you call it by  (foo x_arg y_arg)  and refer to it by
foo  whenever you need it, e.g.,  (map foo ... )

Additionally you can  (define fie foo)  or  (let ((fie foo)) ... )
either of which makes fie refer to the same object as foo.  
[Functions are objects. This is analogous to copying a pointer.]

Yet there are Schemes that can inline function uses given the proper
circumstances.

[+] In both Lisp and Scheme, local functions (closures or simple) that
only ever are passed downward can be bound to compiler generated
"internal" names rather than programmer named symbols.  This is an
implementation detail.

Also Lisp can't quite manage the simple use shown above because Lisp
allows dual binding both a data object and a function object to the
same symbol.  In context, when the symbol itself is data, you need to
use a different syntax to specify that you want to use its function
binding rather than its data binding.
Not particularly relevant here, but included for completeness.

>The key point is that when packaging up a nested function plus data
>into a closure, they all have to look the same:  one pointer to
>function and one pointer to data.  That's because down the line the
>ultimate caller doesn't know where the closure came from, so all
>closures have to be structurally identical.

All uses of the function must call the function in the same way.  This
is true regardless of whether it is a closure or simple function.

In the case of the closure, however, there is no requirement that
there be only "one pointer to [environment] data".  That is just one
way of doing it.
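
For readers who want to see the "one pointer to function and one
pointer to data" shape concretely, here is a minimal C sketch.  It
shows only that one layout, which as said is not the only possible
one.  The names (struct closure, add_code, affine_code, apply_sum)
are made up for the example.

```c
#include <stddef.h>

/* Every closure has the same shape, whatever it captured. */
struct closure {
    int (*code)(void *env, int x);  /* pointer to function */
    void *env;                      /* pointer to data     */
};

/* Two different environments behind the same closure shape. */
struct add_env    { int n; };
struct affine_env { int scale; int offset; };

int add_code(void *env, int x)
{
    const struct add_env *e = env;
    return x + e->n;
}

int affine_code(void *env, int x)
{
    const struct affine_env *e = env;
    return x * e->scale + e->offset;
}

/* The ultimate caller: it receives the closure through a formal
 * parameter and cannot tell which function or environment it got. */
int apply_sum(struct closure f, const int *v, size_t n)
{
    int sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += f.code(f.env, v[i]);
    return sum;
}
```

apply_sum works unchanged for both closures because the call sequence
f.code(f.env, x) is identical; only the code behind the pointer knows
the layout of its own environment.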

Note that lambda lifting can be done partially, and it also can be
combined with closure conversion by creating the closure as a
trampoline which calls the lifted function passing environment data as
arguments.  

Trampoline? Two calls instead of one?  Why would you do this?  

Well, it might make sense if the function works with, e.g., multiple
(or multi-dimensional) arrays, and the computation performed is such
that the code will be more performant if the function is given
pointers to the data rather than having to figure them out every time.

This, of course, depends on many things such as: source language, CPU
addressing capabilities, available runtime information (dope vectors?
limits, etc.), and on the compiler's optimization (or not) of address
computations.

Analogous arguments can be made for any complex data structures that
could end up in a closure's environment [doesn't matter read-only or
read-write].  What is best to do is, at best, an ad hoc metric ...
there are no rules.
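
A hedged C sketch of the trampoline arrangement described above: the
closure's entry point resolves the array addresses once from its
environment and then calls the lifted function with plain pointers.
The matrix layout (row-major, contiguous) and all names (struct
matrix, row_env, row_dot_lifted, row_dot_trampoline, call_closure)
are assumptions made for the illustration.

```c
#include <stddef.h>

struct matrix {
    double *data;   /* row-major, rows * cols elements */
    size_t rows;
    size_t cols;
};

/* Lifted function: its former free variables arrive as explicit
 * arguments, already reduced to a raw pointer and a length. */
double row_dot_lifted(const double *row, size_t cols, const double *x)
{
    double acc = 0.0;

    for (size_t j = 0; j < cols; j++)
        acc += row[j] * x[j];
    return acc;
}

/* Environment captured by the closure. */
struct row_env {
    const struct matrix *m;
    size_t row;
};

struct closure {
    double (*code)(void *env, const double *x);
    void *env;
};

/* Trampoline: the closure's entry point.  It does the address
 * computation once, then makes the second call to the lifted code. */
double row_dot_trampoline(void *env, const double *x)
{
    const struct row_env *e = env;
    const double *row = e->m->data + e->row * e->m->cols;

    return row_dot_lifted(row, e->m->cols, x);
}

/* A caller that knows only the closure shape. */
double call_closure(struct closure c, const double *x)
{
    return c.code(c.env, x);
}
```

Whether the extra call pays for itself depends on the factors listed
above; a compiler may also inline the lifted function into the
trampoline, in which case only the hoisted address computation
remains.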

>When I talk about closures, my built-in assumption is that what is
>being talked about is the don't-know-who-is-being-called case.  Any
>direct call to a nested function doesn't need a closure (there might
>be one as a matter of convenience, but there doesn't have to be), so
>any idea of adjusting the call sites doesn't apply when we're
>talking about closures (again, to be clear, in the way I use the
>term closure).
>
>My sense now is that you are talking (mostly? exclusively?) about
>the direct call case.  Is that a fair read of your comments?

Yes, sort of.  

My initial response (to Mitch) mainly was concerned with the block
structured, delimited 2nd class case where it is possible to know
whether free variables would still be in scope at the call sites.

It is good that you brought up the 1st class case.  It is important
that people understand the whole picture rather than just a piece of
it.

Yes, closure conversion provably handles any use case. Just realize
that it may not be the most performant way to do it.  ;-)


#110446

From	Stefan Monnier <monnier@iro.umontreal.ca>
Date	2025-01-08 12:34 -0500
Message-ID	<jwv34htql17.fsf-monnier+comp.arch@gnu.org>
In reply to	#110413

> I looked high and low for codes using more than 8 arguments and
> returning aggregates larger than 8 double words, and about the
> only things I found were a handful of []print[]() calls.

For languages where the type system ensures that the max number of
arguments is known (and the same) when compiling the function and when
compiling the calls to it, you could adjust the number of caller-saved
argument registers according to the actual number of arguments of the
function, thus making it "cheap" to allow, say, 13 argument registers
for those functions that take 13 arguments, since it doesn't impact the
other functions.

But in any case, I suspect there are also diminishing returns at some
point: how much faster is it in practice to pass/return 13 values in
registers instead of 8 of them in registers and the remaining 5 on
the stack?  I expect a 13-arg function to perform an amount
of work that will dwarf the extra work of going through the stack.


        Stefan


#110449

From	mitchalsup@aol.com (MitchAlsup1)
Date	2025-01-08 20:19 +0000
Message-ID	<fce2bc7116c6e905e624819e7c8bdf8d@www.novabbs.org>
In reply to	#110446

On Wed, 8 Jan 2025 17:34:30 +0000, Stefan Monnier wrote:

>> I looked high and low for codes using more than 8 arguments and
>> returning aggregates larger than 8 double words, and about the
>> only things I found were a handful of []print[]() calls.
>
> For languages where the type system ensures that the max number of
> arguments is known (and the same) when compiling the function and when
> compiling the calls to it, you could adjust the number of caller-saved
> argument registers according to the actual number of arguments of the
> function, thus making it "cheap" to allow, say, 13 argument registers
> for those functions that take 13 arguments, since it doesn't impact the
> other functions.

The counter argument is that there are too few subroutines wanting
this amount of register argument passing. So, even if you allowed
for this, it probably does not show up on the bottom line.

> But in any case, I suspect there are also diminishing returns at some
> point: how much faster is it in practice to pass/return 13 values in
> registers instead of 8 of them in registers and the remaining 5 on
> the stack?  I expect a 13-arg function to perform an amount
> of work that will dwarf the extra work of going through the stack.

Then there is the issue of what is IN the structure passed in
registers??

If it is a series of bytes, then it is better passed by reference
so the bytes can be LDed (1 instruction) rather than extracted
(2 instructions in most ISAs); or STed (1 instruction) rather
than inserted (3 instructions in most ISAs).

If, instead, the structure is comprised of bit-fields, then it is
almost always wise to pass in registers--since extraction and
insertion are always reg->reg.
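
To make the instruction counts concrete, here is a small C model,
assuming an 8-byte structure that fits one 64-bit register (modeled
as a uint64_t).  The function names are hypothetical, and the actual
instruction counts depend on the ISA and the compiler; the comments
show only the operations a typical RISC would need.

```c
#include <stdint.h>

/* Structure passed by value in a register: reading byte i needs a
 * shift and a mask. */
uint8_t extract_byte(uint64_t w, unsigned i)
{
    return (uint8_t)((w >> (8 * i)) & 0xFF);
}

/* Writing byte i in a register: clear the old field, shift the new
 * value into position, merge. */
uint64_t insert_byte(uint64_t w, unsigned i, uint8_t b)
{
    uint64_t mask = (uint64_t)0xFF << (8 * i);

    return (w & ~mask) | ((uint64_t)b << (8 * i));
}

/* Structure passed by reference: a single byte load or byte store. */
uint8_t load_byte(const uint8_t *p, unsigned i)
{
    return p[i];
}

void store_byte(uint8_t *p, unsigned i, uint8_t b)
{
    p[i] = b;
}
```

ISAs with bit-field extract/insert instructions can narrow the gap
for the register case.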

Also note: If the structure is written deep within the subroutine,
many (many) instructions before return, then it is often wiser
to perform this stuff into a memory area, and reload just prior
to return.

>
>
>         Stefan


#110450

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2025-01-08 22:08 +0000
Message-ID	<2025Jan8.230846@mips.complang.tuwien.ac.at>
In reply to	#110446

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>For languages where the type system ensures that the max number of
>arguments is known (and the same) when compiling the function and when
>compiling the calls to it, you could adjust the number of caller-saved
>argument registers according to the actual number of arguments of the
>function, thus making it "cheap" to allow, say, 13 argument registers
>for those functions that take 13 arguments, since it doesn't impact the
>other functions.

ABI calling conventions tend to be designed to support at least C,
including varargs and often also tolerant of differences between the
number of arguments in the caller and callee.

Language-private calling conventions can be a good idea, but then, if
you want to call C code (or be called by C code), you need to handle
ABI calling conventions in addition.

>But in any case, I suspect there are also diminishing returns at some
>point: how much faster is it in practice to pass/return 13 values in
>registers instead of 8 of them in registers and the remaining 5 on
>the stack?  I expect a 13-arg function to perform an amount
>of work that will dwarf the extra work of going through the stack.

I certainly have a use for as many arguments as the ABI provides, for
functions that typically contain only a few payload instructions: You
can implement a direct-threaded VM interpreter using tail-call
optimization, along the lines of

void add(VMinst *ip, long *sp, long sp_top)
{
  /* payload start */
  sp_top += *sp++;
  /* payload end */
  /* invoke the next VM instruction */
  (*ip)(ip+1,sp,sp_top);
}

30 years ago gcc could not tail-call-optimize this; in the meantime it
can (and clang can do it, too).  However, typical VMs have more than
just these three VM registers (Gforth has ip, sp, rp, fp, lp, up,
fp_top (usually mapped to a real-machine FP register) and registers
for as many sp stack items as practical; we intend to cache rp_top in
a register, too), and ideally you can pass them all as arguments; so
we could make good use of 10+ arguments.  If there are not enough
arguments in registers, you have to use explicit register vars (a GNU
C extension) in addition, but that is more architecture-specific.
Some preliminary testing on AMD64 resulted in gcc apparently
supporting a lot of explicit registers on AMD64, and clang/LLVM only
one.
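
A self-contained variant of the sketch above, for readers who want
something that compiles.  The post does not show how VMinst is
declared, so the cell type here is an assumption: a union (VMcell)
holding either a handler pointer or an inline operand.  The handlers
vm_lit and vm_halt and the driver run_demo are likewise hypothetical.
The handlers return a long so the result can be handed back through
the chain of tail calls.

```c
#include <stddef.h>

typedef union vm_cell VMcell;
typedef long (*vm_fn)(const VMcell *ip, long *sp, long sp_top);

/* One cell of threaded code: a handler address or an operand. */
union vm_cell {
    vm_fn fn;
    long operand;
};

/* Push a literal: spill the cached top of stack, load the operand
 * that follows the opcode cell, then dispatch the next instruction. */
static long vm_lit(const VMcell *ip, long *sp, long sp_top)
{
    *--sp = sp_top;
    sp_top = ip->operand;
    ip++;
    return ip->fn(ip + 1, sp, sp_top);   /* tail call */
}

/* Same payload as the add() shown above. */
static long vm_add(const VMcell *ip, long *sp, long sp_top)
{
    sp_top += *sp++;
    return ip->fn(ip + 1, sp, sp_top);   /* tail call */
}

/* Stop the VM and hand back the top of stack. */
static long vm_halt(const VMcell *ip, long *sp, long sp_top)
{
    (void)ip;
    (void)sp;
    return sp_top;
}

/* Runs the program "lit 2, lit 40, add, halt". */
long run_demo(void)
{
    long stack[16];
    const VMcell program[] = {
        { .fn = vm_lit }, { .operand = 2 },
        { .fn = vm_lit }, { .operand = 40 },
        { .fn = vm_add },
        { .fn = vm_halt },
    };

    /* The stack grows downward; sp_top caches the top item. */
    return program[0].fn(program + 1, stack + 16, 0);
}
```

Without optimization the dispatches are ordinary calls and the C
stack grows with every VM instruction executed; the scheme relies on
the compiler turning them into jumps.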

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>


#110451

From	Stefan Monnier <monnier@iro.umontreal.ca>
Date	2025-01-08 18:20 -0500
Message-ID	<jwvr05cq4tx.fsf-monnier+comp.arch@gnu.org>
In reply to	#110450

> ABI calling conventions tend to be designed to support at least C,
> including varargs and often also tolerant of differences between the
> number of arguments in the caller and callee.

I can agree that it's important to support those use-cases (varargs
obviously, mismatched arg numbers less so), but I think the focus of
optimization of the ABI should be calls to functions known to take the
exact same number of arguments (after all, even in C we normally know
the prototype of the called function; only sloppy ancient C calls
functions without proper declarations), even if it comes at the cost of
using different calling conventions for the two cases.

>>But in any case, I suspect there are also diminishing returns at some
>>point: how much faster is it in practice to pass/return 13 values in
>>registers instead of 8 of them in registers and the remaining 5 on
>>the stack?  I expect a 13-arg function to perform an amount
>>of work that will dwarf the extra work of going through the stack.
> I certainly have a use for as many arguments as the ABI provides,

Ah, yes, machine-generated code can always defy intuitions about what
is "typical".  🙂


        Stefan


#110452

From	mitchalsup@aol.com (MitchAlsup1)
Date	2025-01-09 00:11 +0000
Message-ID	<a318d8af91ca17885939aede2007871c@www.novabbs.org>
In reply to	#110451

On Wed, 8 Jan 2025 23:20:43 +0000, Stefan Monnier wrote:

>> ABI calling conventions tend to be designed to support at least C,
>> including varargs and often also tolerant of differences between the
>> number of arguments in the caller and callee.

My 66000 ABI was designed for C, but is compatible with Fortran and
C++ {and I suspect most languages--under the assumption that those
languages have to clean up their own messes*}.

(*) C++ has to drop "stuff" on the stack so that it can properly
deallocate new structures when Try-Throw-Catch is performing walk
backs, and to utilize that "stack stuff" when searching for the
right exception block.

When C calls Fortran and Fortran is expecting an array, C has
to build the dope vector used by Fortran in accessing said array.
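
What such a dope vector contains is compiler specific.  The C sketch
below is only a plausible shape (base address, element size, rank,
and per-dimension lower bound, extent and stride), not the layout of
any actual Fortran compiler; all names are hypothetical.

```c
#include <stddef.h>

#define DOPE_MAX_RANK 7

struct dope_dim {
    ptrdiff_t lower;    /* lower bound of the index       */
    ptrdiff_t extent;   /* number of elements             */
    ptrdiff_t stride;   /* distance between elements, in elements */
};

struct dope_vector {
    void *base;
    size_t elem_size;
    int rank;
    struct dope_dim dim[DOPE_MAX_RANK];
};

/* What the C caller would build for a contiguous, column-major
 * array of doubles declared as a(1:rows, 1:cols). */
struct dope_vector describe_matrix(double *base, ptrdiff_t rows,
                                   ptrdiff_t cols)
{
    struct dope_vector dv = {
        .base = base, .elem_size = sizeof(double), .rank = 2
    };

    dv.dim[0] = (struct dope_dim){ .lower = 1, .extent = rows, .stride = 1 };
    dv.dim[1] = (struct dope_dim){ .lower = 1, .extent = cols, .stride = rows };
    return dv;
}

/* How the callee would locate a(i, j) from the descriptor alone. */
double *element_2d(const struct dope_vector *dv, ptrdiff_t i, ptrdiff_t j)
{
    ptrdiff_t off = (i - dv->dim[0].lower) * dv->dim[0].stride
                  + (j - dv->dim[1].lower) * dv->dim[1].stride;

    return (double *)dv->base + off;
}
```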

Any calling convention is pressed on both sides--more argument registers
and more callee-save registers--but the number of registers is fixed.

> I can agree that it's important to support those use-cases (varargs
> obviously, mismatched arg numbers less so), but I think the focus of
> optimization of the ABI should be calls to functions known to take the
> exact same number of arguments (after all, even in C we normally know
> the prototype of the called function; only sloppy ancient C calls
> functions without proper declarations), even if it comes at the cost of
> using different calling conventions for the two cases.

In My 66000 ABI varargs takes one more Prologue instruction than
a non-varargs subroutine and creates a vector of DW arguments
which can be picked off with va_list = SP; va_start = 0,
and va_arg(va_list,arg) = LD  Rd,[va_list,Rarg<<3];

One of the key reasons to have a unified register model.
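
A C model of that access pattern, as a sketch only: it imitates "a
vector of doublewords indexed by argument number" in portable C and
is not My 66000 code.  The names (struct va_model, va_model_start,
va_model_arg, va_model_arg_double, double_bits) are invented for the
example, and it assumes a 64-bit double.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t),
               "model assumes 64-bit doubles");

/* va_list modeled as a base address plus a running argument index. */
struct va_model {
    const uint64_t *base;
    size_t index;
};

/* va_start: point at the vector, begin with argument 0. */
struct va_model va_model_start(const uint64_t *slots)
{
    return (struct va_model){ slots, 0 };
}

/* va_arg: one indexed load from base + (index << 3). */
uint64_t va_model_arg(struct va_model *ap)
{
    return ap->base[ap->index++];
}

/* With integer and FP values in the same slots, a double is fetched
 * by the same load; only the interpretation of the bits differs. */
double va_model_arg_double(struct va_model *ap)
{
    uint64_t bits = va_model_arg(ap);
    double d;

    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Helper for building a slot vector that contains a double. */
uint64_t double_bits(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```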

>>>But in any case, I suspect there are also diminishing returns at some
>>>point: how much faster is it in practice to pass/return 13 values in
>>>registers instead of 8 of them in registers and the remaining 5 on
>>>the stack?

Back when we looked at this in mid 1990s, using more registers for
arguments (than the 8 we were using) was "well down" the low hanging
fruit.


#110453

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2025-01-09 07:23 +0000
Message-ID	<2025Jan9.082357@mips.complang.tuwien.ac.at>
In reply to	#110451

Stefan Monnier <monnier@iro.umontreal.ca> writes:
[Someone wrote:]
>> ABI calling conventions tend to be designed to support at least C,
>> including varargs and often also tolerant of differences between the
>> number of arguments in the caller and callee.
>
>I can agree that it's important to support those use-cases (varargs
>obviously, mismatched arg numbers less so),

You are head of a group of people who design a new architecture (say,
it's 2010 and you design ARM A64, or it's 2014 and you design RISC-V).
Your ABI designer comes to you and tells you that his life would be
easier if it was ok that programs with mismatched arguments don't need
to work.  Would you tell him that they don't need to work?

If yes, a few years down the road your prospective customers have to
decide whether to go for your newfangled architecture or one of the
established ones.  They learn that a number of programs work
everywhere else, but not on your architecture.  How many of them will
be placated by your reasoning that these programs are not strictly
conforming standard programs?  How many will be alarmed by your
admission that you find it ok that such programs
don't work on your architecture?  After all, hardly any program is a
strictly conforming standard program.

>only sloppy ancient C calls
>functions without proper declarations)

You find it ok to design a calling convention such that ancient C
programs do not work?

What benefit do you expect from such a calling convention?  To allow
the use of registers as arguments (and not callee-saved) that would
otherwise preferably be used as callee-saved registers?

However, I wonder why, e.g., RISC-V does not allow the use of all
caller-saved registers as arguments.  In addition to the 8 argument
registers (a0-a7=x10-x17), RISC-V has 7 additional caller-saved
registers: t0-t6(=x5-x7,x28-x31); for FP registers it's even more
extreme: 8 argument registers fa0-fa7=f10-f17, and 12 additional
caller-saved registers ft0-ft11=f0-f7,f28-f31.

>even if it comes at the cost of
>using different calling conventions for the two cases.

That would mean that you find it ok that existing programs that use
vararg functions like printf but do not declare them before use don't
work on your newfangled architecture.  Looking at
<https://pdos.csail.mit.edu/6.828/2023/readings/riscv-calling.pdf>,
the RISC-V people find that acceptable:

|If argument i < 8 is a floating-point type, it is passed in
|floating-point register fai; [...]  Additionally, floating-point
|arguments to variadic functions (except those that are explicitly
|named in the parameter list) are passed in integer registers.

So if I 'printf("%f",1.0)' without first declaring printf, the program
won't work.  I just tried out compiling the following program on
RISC-V with gcc 10.3.1:

int main()
{
  printf("%f\n",1.0);
}

int xxx()
{
  yyy("%f\n",1.0,2);
}

Note that there is no "#include <stdio.h>" or any declaration of
printf() or yyy().  Yet 1.0 is passed to printf() in a1, while it is
passed to yyy() in fa0, and 2 is passed to yyy() in a1.
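
For contrast, here is a sketch of the same call with a prototype in scope, so that caller and callee agree on the variadic convention. The body of yyy() is an assumption made up for illustration; the original program never defines it.

```c
#include <stdarg.h>

/* Hypothetical definition of yyy(): with this prototype visible, the
   compiler knows at the call site that 1.0 and 2 are unnamed variadic
   arguments and passes them the way va_arg expects to find them. */
double yyy(const char *fmt, ...)
{
    va_list ap;
    (void)fmt;                      /* format string unused in this sketch */

    va_start(ap, fmt);
    double d = va_arg(ap, double);  /* the 1.0 */
    int i = va_arg(ap, int);        /* the 2 */
    va_end(ap);
    return d + i;
}

double call_yyy(void)
{
    return yyy("%f\n", 1.0, 2);
}
```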

And gcc works around the varargs decision by using the varargs calling
convention for some well-known vararg functions like printf, while
other undeclared functions use the non-varargs calling convention.
Apparently the fallout of that decision by the RISC-V people hit a
"relevant" program.

[1] Apparently they stuck with the decision to deal differently with
varargs, and then decided to change the rest of the calling convention
to benefit from that decision by not leaving holes in the FP argument
registers for integers and vice versa.  I don't find this clearly
expressed in
<https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc>.
The only thing that points in that direction is:

|Values are passed in floating-point registers whenever possible,
|whether or not the integer registers have been exhausted.

But this does not talk about how the integer argument register
numbering is changed by the "Hardware Floating-point Calling
Convention".

>> I certainly have a use for as many arguments as the ABI provides,
>
>Ah, yes, machine-generated code can always defy intuitions about what
>is "typical".

While I use a generator for my interpreter engines, many other people
hand-code them.  They would probably use macros for the function
declaration and the tail-call, though.  Or maybe a macro that wraps
the whole payload so that one can easily switch between this technique
and one of the others.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>


#110455

From	Thomas Koenig <tkoenig@netcologne.de>
Date	2025-01-09 10:07 +0000
Message-ID	<vlo758$3a54k$1@dont-email.me>
In reply to	#110453

Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

> That would mean that you find it ok that existing programs that use
> vararg functions like printf but do not declare them before use don't
> work on your newfangled architecture.

Interestingly, tail call optimization (which I believe you like)
can cause bugs with mismatched arguments when different functions
disagree about the stack size.  Here is a nasty case with sibling
calls:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90329

So, if you want to allow mismatched declarations, better
disable tail calls, to be on the safe side.


#110461

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2025-01-10 08:24 +0000
Message-ID	<2025Jan10.092430@mips.complang.tuwien.ac.at>
In reply to	#110455

Thomas Koenig <tkoenig@netcologne.de> writes:
>Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
>
>> That would mean that you find it ok that existing programs that use
>> vararg functions like printf but do not declare them before use don't
>> work on your newfangled architecture.
>
>Interestingly, tail call optimization (which I believe you like)
>can cause bugs with mismatched arguments when different functions
>disagree about the stack size.

I have a use case for tail-call optimization.  When I first looked
into that around 1994, I found that gcc does not perform tail-call
optimization, and I was surprised, because it had been written by a
Lisp programmer.

When I looked into the reasons, I found that in C calling conventions
typically the caller is responsible for allocating stack space for
arguments and for deallocating that stack space.  The reason for that
is varargs and the fact that in old C there was no requirement to
define a prototype of a function (including vararg functions).  If the
caller has to put deallocation code between a call and the return that
follows it, the call is not a tail-call and therefore cannot be
tail-call optimized.

So I thought that with C calling conventions (necessitated by the
properties of the C language), tail-call optimization is not possible,
but Mark Probst, a student in our group, actually managed to deal with
the tail-recursion case
<https://www.complang.tuwien.ac.at/schani/diplarb.ps>.

A few years later sibling call optimization (more restrictive than
general tail-call optimization, but less restrictive than
tail-recursion elimination) appeared in gcc.  The gcc manual
apparently does not say what a sibling call is, but
<https://stackoverflow.com/questions/22037261/what-does-sibling-calls-mean>
says "where caller function and callee function do not need to be
same, but they have compatible stack footprint.".  Given the bug you
point out, that's obviously not restrictive enough to be correct in
all cases.

Concerning my use case, for me it's good enough if tail-calls are
optimized when the caller and the callee have the same argument types
and return type, and the arguments fit in registers.  So if in your
buggy case gcc decided not to optimize the call as sibling call, my
use case would not be affected.

Moreover, I need a guarantee that a call is actually
tail-call-optimized (and if not, compilation should ideally error out,
saving me the need to validate that property afterwards), and I would
be willing to put some text in the source code that indicates that
intent.  E.g., something along the lines of

void add(VMinst *ip, long *sp, long sp_top)
{
  /* payload start */
  sp_top += *sp++;
  /* payload end */
  /* invoke the next VM instruction */
  (*ip)(ip+1,sp,sp_top) __attribute__("tail-call optimized");
}

Existing code would be unaffected by such an approach to tail-call
optimization.
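
Something close to this can be sketched with the musttail statement attribute that Clang documents (and, to my knowledge, recent GCC releases accept as well). The names vm_inst, vm_fn, vm_add, vm_halt and run_demo are made up for this sketch, and the macro falls back to an ordinary call, without any guarantee, on compilers that lack the attribute.

```c
/* Use the musttail attribute where the compiler reports it; otherwise
   MUSTTAIL expands to nothing and the call is merely a candidate for
   sibling-call optimization. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

struct vm_inst;
typedef long vm_fn(const struct vm_inst *ip, long *sp, long sp_top);
struct vm_inst { vm_fn *fn; };      /* one VM instruction: just a handler */

static long vm_add(const struct vm_inst *ip, long *sp, long sp_top)
{
    sp_top += *sp++;                             /* payload */
    MUSTTAIL return ip->fn(ip + 1, sp, sp_top);  /* next VM instruction */
}

static long vm_halt(const struct vm_inst *ip, long *sp, long sp_top)
{
    (void)ip;
    (void)sp;
    return sp_top;                  /* end of the threaded code */
}

long run_demo(void)
{
    long stack[] = { 2, 3 };
    const struct vm_inst prog[] = { { vm_add }, { vm_add }, { vm_halt } };
    return prog[0].fn(&prog[1], stack, 1);       /* computes 1 + 2 + 3 */
}
```

All handlers share one signature and pass everything in three arguments, which matches the restricted case described above: same argument types, same return type, arguments in registers.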

Beyond my use case: tail-call optimization has to be applied like
every other optimization: It must preserve the behaviour of existing,
working programs, i.e., the result must be equivalent.  In the bug you
mention, this obviously was not the case, and one way out would be not
to apply tail-call optimization in this case and similar cases (maybe
in all cases where arguments are in memory).  That looks like a simple
way to fix the bug.  Maybe there's a less restrictive one.

Sure one can wish that C was different (e.g., like the fantasy that
all C programs are strictly conforming to some particular C standard
that turns some desired transformation into a correct optimization),
but existing, working programs are far more relevant than the wishes
for some transformation IMO; there are a lot of people who see this
differently, but it seems to me that these people not only wish that
the old C programs vanish, but they don't care much about new C
programs (apart from a few benchmarks), either.  After all, they don't
program in C, but in C++, Fortran, Rust, or something else.

Actually, concerning the fantasy mentioned above, gcc already offers
options such as -std=c23 and -pedantic which would allow the user to
tell gcc that the compiled program actually lives in this fantasy
world, but if the user did not ask for pain, a compiler should not
provide it.

>So, if you want to allow mismatched declarations, better
>disable tail calls, to be on the safe side.

That would be a way of dealing with the problem.  It matches the
general pattern of people defending transformations that do not
preserve program equivalence (i.e., are buggy when intended as
optimizations) by putting up a straw man that disables correct
optimizations in addition to transformations that do not preserve
program equivalence.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>


#110456

From	mitchalsup@aol.com (MitchAlsup1)
Date	2025-01-09 20:48 +0000
Message-ID	<b8744b4868c5f6b7ef6f629f0fbbf492@www.novabbs.org>
In reply to	#110453

On Thu, 9 Jan 2025 7:23:57 +0000, Anton Ertl wrote:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> [Someone wrote:]
>>> ABI calling conventions tend to be designed to support at least C,
>>> including varargs and often also tolerant of differences between the
>>> number of arguments in the caller and callee.
>>
>>I can agree that it's important to support those use-cases (varargs
>>obviously, mismatched arg numbers less so),
>
> You are head of a group of people who design a new architecture (say,
> it's 2010 and you design ARM A64, or it's 2014 and you design RISC-V).
> Your ABI designer comes to you and tells you that his life would be
> easier if it was ok that programs with mismatched arguments don't need
> to work.  Would you tell him that they don't need to work?

No, I would stand my ground and mandate that they do work.

> If yes, a few years down the road your prospective customers have to
> decide whether to go for your newfangled architecture or one of the
> established ones.  They learn that a number of programs work
> everywhere else, but not on your architecture.  How many of them will
> be placated by your reasoning that these programs are not strictly
> conforming standard programs?

One of the salient points that allowed C to overtake PASCAL is that
you can write printf() in C while you cannot write write() in PASCAL.
Do not break this assumption on any architecture.

>                               How many will be alarmed by your
> admission that you find it ok that such programs
> don't work on your architecture?  After all, hardly any program is a
> strictly conforming standard program.
>
>>only sloppy ancient C calls
>>functions without proper declarations)
>
> You find it ok to design a calling convention such that ancient C
> programs do not work?

I went the other way: I made an ABI that makes varargs EASY !!
and does so in such a way that the caller does not need to know
that the callee is varargs.

> What benefit do you expect from such a calling convention?  To allow
> the use of registers as arguments (and not callee-saved) that would
> otherwise preferably be used as callee-saved registers?

I found no particular problem in passing a fixed number of arguments
in registers and the rest on a stack. The varargs prologue dumps the
registers onto the stack to form a vector of arguments in memory, and
va_start initializes the pointer to where the registers were stored on
the stack.
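
A toy model of that layout in plain C, for illustration only. The struct, the slot counts and the function names are assumptions of this sketch, not the actual My 66000 prologue.

```c
#include <stdint.h>
#include <string.h>

enum { NUM_ARG_REGS = 8, MAX_STACK_ARGS = 4 };

/* Toy model: the register arguments are spilled directly below the
   stack-passed arguments, so all arguments form one contiguous vector. */
struct arg_area {
    uint64_t slots[NUM_ARG_REGS + MAX_STACK_ARGS];
};

/* Models the extra prologue step: copy the 8 argument registers to memory. */
void spill_arg_regs(struct arg_area *area, const uint64_t regs[NUM_ARG_REGS])
{
    memcpy(area->slots, regs, NUM_ARG_REGS * sizeof regs[0]);
}

/* Models va_arg: argument n is simply slot n, wherever it was passed. */
uint64_t nth_arg(const struct arg_area *area, unsigned n)
{
    return area->slots[n];
}

uint64_t demo_tenth_arg(void)
{
    uint64_t regs[NUM_ARG_REGS] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    struct arg_area area = { { 0 } };

    area.slots[NUM_ARG_REGS] = 9;       /* 9th argument, stored by the caller */
    area.slots[NUM_ARG_REGS + 1] = 10;  /* 10th argument, stored by the caller */
    spill_arg_regs(&area, regs);        /* callee prologue */
    return nth_arg(&area, 9);           /* zero-based index 9: the 10th argument */
}
```

In this model the callee never needs to know which arguments arrived in registers, and the caller never needs to know the callee is varargs.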

> However, I wonder why, e.g., RISC-V does not allow the use of all
> caller-saved registers as arguments.

A) we need some registers for passing of arguments, and some
..for returning results.
B) we need some temporary registers so short leaf subroutines
..do not need stack space in order to compute with the given
..arguments
C) we need some registers for holding onto caller's state while
..processing callee operations
D) there is generally a register holding the return address.

Generally (A) and (B) have a sliding window. The fewer arguments
and results, the more temporary registers.

(C) includes FP and SP as callee preserved registers--that is
..when control returns to caller R16..R31 contain the same
..values as when the CALL was performed.

In looking at code out of My 66000 LLVM compiler, there are so
few subroutines with "that many" arguments and results, that
mandating more than 8 arguments or results go through memory
is not really a performance burden.

Also: More callee save registers (preserved) causes more stack
space to be allocated for the 'temporary' registers. Say you want
all the registers (except return address register, and return
result register) to be preserved across a subroutine call: So, a
small subroutine needing 3 registers to perform its calculations;
now has 3 STs and 3 LDs to preserve caller registers, whereas
with temporary registers there is no overhead.

> In addition to the 8 argument
> registers (a0-a7=x10-x17), RISC-V has 7 additional caller-saved
> registers: t0-t6(=x5-x7,x28-x31); for FP registers it's even more
> extreme: 8 argument registers fa0-fa7=f10-f17, and 12 additional
> caller-saved registers ft0-ft11=f0-f7,f28-f31.
>
>>even if it comes at the cost of
>>using different calling conventions for the two cases.
>
> That would mean that you find it ok that existing programs that use
> vararg functions like printf but do not declare them before use don't
> work on your newfangled architecture.  Looking at
> <https://pdos.csail.mit.edu/6.828/2023/readings/riscv-calling.pdf>,
> the RISC-V people find that acceptable:
>
> |If argument i < 8 is a floating-point type, it is passed in
> |floating-point register fai; [...]  Additionally, floating-point
> |arguments to variadic functions (except those that are explicitly
> |named in the parameter list) are passed in integer registers.
>
> So if I 'printf("%f",1.0)' without first declaring printf, the program
> won't work.  I just tried out compiling the following program on
> RISC-V with gcc 10.3.1:
>
> int main()
> {
>   printf("%f\n",1.0);
> }
>
> int xxx()
> {
>   yyy("%f\n",1.0,2);
> }
>
> Note that there is no "#include <stdio.h>" or any declaration of
> printf() or yyy().  Yet 1.0 is passed to printf() in a1, while it is
> passed to yyy() in fa0, and 2 is passed to yyy() in a1.

This is bad...not horrible, but bad.

> And gcc works around the varargs decision by using the varargs calling
> convention for some well-known vararg functions like printf, while
> other undeclared functions use the non-varargs calling convention.
> Apparently the fallout of that decision by the RISC-V people hit a
> "relevant" program.

A good ABI does not need these distinctions.

It also leaves open the possibility that code compiled partially by GCC
and linked with code compiled by LLVM will have interoperability issues
on mundane calls.

> [1] Apparently they stuck with the decision to deal differently with
> varargs, and then decided to change the rest of the calling convention
> to benefit from that decision by not leaving holes in the FP argument
> registers for integers and vice versa.  I don't find this clearly
> expressed in
> <https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc>.
> The only thing that points in that direction is:
>
> |Values are passed in floating-point registers whenever possible,
> |whether or not the integer registers have been exhausted.
>
> But this does not talk about how the integer argument register
> numbering is changed by the "Hardware Floating-point Calling
> Convention".
>
>>> I certainly have a use for as many arguments as the ABI provides,
>>
>>Ah, yes, machine-generated code can always defy intuitions about what
>>is "typical".
>
> While I use a generator for my interpreter engines, many other people
> hand-code them.  They would probably use macros for the function
> declaration and the tail-call, though.  Or maybe a macro that wraps
> the whole payload so that one can easily switch between this technique
> and one of the others.
>
> - anton


#110457

From	Thomas Koenig <tkoenig@netcologne.de>
Date	2025-01-09 21:23 +0000
Message-ID	<vlpeoi$3hjos$1@dont-email.me>
In reply to	#110456

MitchAlsup1 <mitchalsup@aol.com> schrieb:
> On Thu, 9 Jan 2025 7:23:57 +0000, Anton Ertl wrote:
>
>> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> [Someone wrote:]
>>>> ABI calling conventions tend to be designed to support at least C,
>>>> including varargs and often also tolerant of differences between the
>>>> number of arguments in the caller and callee.
>>>
>>>I can agree that it's important to support those use-cases (varargs
>>>obviously, mismatched arg numbers less so),
>>
>> You are head of a group of people who design a new architecture (say,
>> it's 2010 and you design ARM A64, or it's 2014 and you design RISC-V).
>> Your ABI designer comes to you and tells you that his life would be
>> easier if it was ok that programs with mismatched arguments don't need
>> to work.  Would you tell him that they don't need to work?
>
> No, I would stand my ground and mandate that they do work.

That can be tricky.  You can read

https://blog.r-project.org/2019/05/15/gfortran-issues-with-lapack/index.html

and its sequel

https://blog.r-project.org/2019/09/25/gfortran-issues-with-lapack-ii/

as a cautionary tale.

To cut this a little shorter:  Assume eight arguments are passed in
registers, like for My 66000.

Caller calls

  foo (a1, a2, a3, a4, a5, a6, a7, a8);

Callee side:

  foo (a1, a2, a3, a4, a5, a6, a7, a8, a9)

Foo ends with

  bar (b1, b2, b3, b4, b5, b6, b7, b8, b9);

and wants to save stack space, so it stores the value of b9 into
the space where it was supposed to be, and then branches to bar.
Result: Stack corruption.

What would you tell your ABI designer in that case?  Don't do tail
calls, it is better to use more stack space, with all the effects on
stack sizes and locality that would have?
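
The failure can be modelled in a few lines of C. This is a toy simulation under assumed conventions (downward-growing stack, the ninth argument expected in the word at the caller's stack pointer); toy_stack and both functions are hypothetical names.

```c
#include <stddef.h>

enum { STACK_WORDS = 16 };

struct toy_stack {
    long words[STACK_WORDS];
    size_t sp;                  /* index of the lowest word in use */
};

/* foo believes it received nine arguments, so it treats the word at the
   caller's sp as its own a9 slot and reuses it for b9 before branching
   to bar. */
static void foo_tail_call_setup(struct toy_stack *st, long b9)
{
    st->words[st->sp] = b9;
}

long caller_local_after_call(void)
{
    struct toy_stack st = { { 0 }, STACK_WORDS };

    st.sp -= 1;                 /* the caller allocates one local ... */
    st.words[st.sp] = 1234;     /* ... and stores a value in it */

    /* The caller passes a1..a8 in registers and reserves no slot for a9. */
    foo_tail_call_setup(&st, 99);

    return st.words[st.sp];     /* the local now holds 99, not 1234 */
}
```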


#110458

From	mitchalsup@aol.com (MitchAlsup1)
Date	2025-01-10 01:08 +0000
Message-ID	<bf799b48fb92f919049b454dff4080b8@www.novabbs.org>
In reply to	#110457

On Thu, 9 Jan 2025 21:23:30 +0000, Thomas Koenig wrote:

> MitchAlsup1 <mitchalsup@aol.com> schrieb:
>> On Thu, 9 Jan 2025 7:23:57 +0000, Anton Ertl wrote:
>>
>>> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>> [Someone wrote:]
>>>>> ABI calling conventions tend to be designed to support at least C,
>>>>> including varargs and often also tolerant of differences between the
>>>>> number of arguments in the caller and callee.
>>>>
>>>>I can agree that it's important to support those use-cases (varargs
>>>>obviously, mismatched arg numbers less so),
>>>
>>> You are head of a group of people who design a new architecture (say,
>>> it's 2010 and you design ARM A64, or it's 2014 and you design RISC-V).
>>> Your ABI designer comes to you and tells you that his life would be
>>> easier if it was ok that programs with mismatched arguments don't need
>>> to work.  Would you tell him that they don't need to work?
>>
>> No, I would stand my ground and mandate that they do work.
>
> That can be tricky.  You can read
>
> https://blog.r-project.org/2019/05/15/gfortran-issues-with-lapack/index.html
>
> and its sequel
>
> https://blog.r-project.org/2019/09/25/gfortran-issues-with-lapack-ii/
>
> as a cautionary tale.

Yes, I had to make a nasty ABI work on the HEP (Denelcor)

> To cut this a little shorter:  Assume eight arguments are passed in
> registers, like for My 66000.
>
> Caller calls
>
>   foo (a1, a2, a3, a4, a5, a6, a7, a8);
>
> Callee side:
>
>   foo (a1, a2, a3, a4, a5, a6, a7, a8, a9)
>
> Foo ends with
>
>   bar (b1, b2, b3, b4, b5, b6, b7, b8, b9);
>
> and wants to save stack space, so it stores the value of b9 into
> the space where it was supposed to be, and then branches to bar.
> Result: Stack corruption.
>
> What would you tell your ABI designer in that case?  Don't do tail
> calls, it is better to use more stack space, with all the effects on
> stack sizes and locality that would have?

Same response I would give to::

     printf( "%d %d %d %d %d/r", a[i] );

"They deserve what they get".

You will notice that no ISA has ever had a "go jump in the lake"
instruction. For had there been, computers would not have survived
to the present--they would all be in the lake...


#110460

From	Thomas Koenig <tkoenig@netcologne.de>
Date	2025-01-10 09:19 +0000
Message-ID	<vlqomv$3tp3o$1@dont-email.me>
In reply to	#110458

MitchAlsup1 <mitchalsup@aol.com> schrieb:
> On Thu, 9 Jan 2025 21:23:30 +0000, Thomas Koenig wrote:
>
>> MitchAlsup1 <mitchalsup@aol.com> schrieb:
>>> On Thu, 9 Jan 2025 7:23:57 +0000, Anton Ertl wrote:
>>>
>>>> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>>> [Someone wrote:]
>>>>>> ABI calling conventions tend to be designed to support at least C,
>>>>>> including varargs and often also tolerant of differences between the
>>>>>> number of arguments in the caller and callee.
>>>>>
>>>>>I can agree that it's important to support those use-cases (varargs
>>>>>obviously, mismatched arg numbers less so),
>>>>
>>>> You are head of a group of people who design a new architecture (say,
>>>> it's 2010 and you design ARM A64, or it's 2014 and you design RISC-V).
>>>> Your ABI designer comes to you and tells you that his life would be
>>>> easier if it was ok that programs with mismatched arguments don't need
>>>> to work.  Would you tell him that they don't need to work?
>>>
>>> No, I would stand my ground and mandate that they do work.
>>
>> That can be tricky.  You can read
>>
>> https://blog.r-project.org/2019/05/15/gfortran-issues-with-lapack/index.html
>>
>> and its sequel
>>
>> https://blog.r-project.org/2019/09/25/gfortran-issues-with-lapack-ii/
>>
>> as a cautionary tale.
>
> Yes, I had to make a nasty ABI work on the HEP (Denelcor)
>
>> To cut this a little shorter:  Assume eight arguments are passed in
>> registers, like for My 66000.
>>
>> Caller calls
>>
>>   foo (a1, a2, a3, a4, a5, a6, a7, a8);
>>
>> Callee side:
>>
>>   foo (a1, a2, a3, a4, a5, a6, a7, a8, a9)
>>
>> Foo ends with
>>
>>   bar (b1, b2, b3, b4, b5, b6, b7, b8, b9);
>>
>> and wants to save stack space, so it stores the value of b9 into
>> the space where it was supposed to be, and then branches to bar.
>> Result: Stack corruption.
>>
>> What would you tell your ABI designer in that case?  Don't do tail
>> calls, it is better to use more stack space, with all the effects on
>> stack sizes and locality that would have?
>
> Same response I would give to::
>
>      printf( "%d %d %d %d %d/r", a[i] );
>
> "They deserve what they get".

So, mismatched arguments don't need to work?  We're in agreement, then.

> You will notice that no ISA has ever had a "go jump in the lake"
> instruction. For had there been, computers would not have survived
> to the present--they would all be in the lake...

I don't find it in

https://paws.kettering.edu/~jhuggins/humor/opcodes.html so I guess
it does not exist.  (That list is old; it was floating around when
/pub directories were still open on ftp servers...)


#110459

From	antispam@fricas.org (Waldek Hebisch)
Date	2025-01-10 08:33 +0000
Message-ID	<vlqm0d$27bfb$1@paganini.bofh.team>
In reply to	#110453

Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
> [Someone wrote:]
>>> ABI calling conventions tend to be designed to support at least C,
>>> including varargs and often also tolerant of differences between the
>>> number of arguments in the caller and callee.
>>
>>I can agree that it's important to support those use-cases (varargs
>>obviously, mismatched arg numbers less so),
> 
> You are head of a group of people who design a new architecture (say,
> it's 2010 and you design ARM A64, or it's 2014 and you design RISC-V).
> Your ABI designer comes to you and tells you that his life would be
> easier if it was ok that programs with mismatched arguments don't need
> to work.  Would you tell him that they don't need to work?
> 
> If yes, a few years down the road your prospective customers have to
> decide whether to go for your newfangled architecture or one of the
> established ones.  They learn that a number of programs work
> everywhere else, but not on your architecture.  How many of them will
> be placated by your reasoning that these programs are not strictly
> conforming standard programs?  How many will be alarmed by your
> admission that you find it ok that such programs
> don't work on your architecture?  After all, hardly any program is a
> strictly conforming standard program.

Such things happened many times in the past.  AFAIK the standard
setup on a VAX was that accessing data at address 0 gave you 0.
A lot of VAX programs needed fixes to run on different machines.
I remember an issue with writing to strings: early C compilers
put literal strings in writable memory and programs assumed that
they could change strings.  C 'errno' was made more abstract due
to multithreading, which broke some programs.  Concerning varargs,
PowerPC and later AMD64 used calling conventions incompatible
with popular expectations.

Concerning customers, they will tolerate a lot of things, as long
as there are benefits (faster or cheaper machines, better security,
etc.) and fixes require a reasonable amount of work.  So that
really is a question of cost/benefit ratio.

-- 
                              Waldek Hebisch


#110462

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2025-01-10 10:25 +0000
Message-ID	<2025Jan10.112523@mips.complang.tuwien.ac.at>
In reply to	#110459

antispam@fricas.org (Waldek Hebisch) writes:
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> If yes, a few years down the road your prospective customers have to
>> decide whether to go for your newfangled architecture or one of the
>> established ones.  They learn that a number of programs work
>> everywhere else, but not on your architecture.  How many of them will
>> be placated by your reasoning that these programs are not strictly
>> conforming standard programs?  How many will be alarmed by your
>> admission that you find it ok that such programs
>> don't work on your architecture?  After all, hardly any program is a
>> strictly conforming standard program.
>
>Such things happened many times in the past.  AFAIK the standard
>setup on a VAX was that accessing data at address 0 gave you 0.
>A lot of VAX programs needed fixes to run on different machines.

That case is interesting.  It's certainly a benefit to programmers if
most uses of NULL produce a SIGSEGV, but for existing programs a
mapping that leaves memory in page 0 accessible is an advantage.  So how
did we get from there to where we are now?

First, my guess is that the VAX is only called out because it was so
popular, and it was one of the first Unix machines where doing it
differently was possible.  I am sure that earlier Unix targets without
virtual memory used memory starting with address 0 because they would
otherwise have wasted precious memory.

Anyway, once we had virtual memory, whether to use the start of the
address space is not an issue of the ABI (which is hard to change),
but could be determined by programmers on linking.  I guess that at
first they used explicit options for making the first page
inaccessible, and these options soon became the defaults.  By the time
I started with Unix in the late 1980s, that battle was over; I
certainly never experienced it as an issue, and only read about it in
papers on VAXocentrism.

>I remember an issue with writing to strings: early C compilers
>put literal strings in writable memory and programs assumed that
>they could change strings.

gcc definitely had an option for that.  Again not an ABI issue, but
one that can be controlled by programmers on compilation.
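
The source-level fix for such programs is small. A minimal sketch with a hypothetical function name: initialize a char array from the literal instead of writing through a pointer to the literal itself.

```c
#include <string.h>

/* Hypothetical example: builds "Hello" from the literal "hello".
   The array `greeting` is a private, writable copy of the literal, so
   modifying it is fine even when literals live in read-only memory. */
int capitalize_demo(char *out, size_t out_size)
{
    char greeting[] = "hello";          /* copy, not a pointer to the literal */

    greeting[0] = 'H';
    if (out_size < sizeof greeting)
        return -1;                      /* destination too small */
    memcpy(out, greeting, sizeof greeting);  /* includes the trailing '\0' */
    return 0;
}
```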

>C 'errno' was made more abstract due
>to multithreading, which broke some programs.

That's pretty similar to an ABI issue (not sure if errno is in the
ABIs or not).  And the really perverse thing is that raw Unix and
Linux system calls have been thread-safe from the start.  It's only
the limitation of the C language in early times (no struct returns,
bringing us back to the topic of the thread) that gave us the errno
variable in the C wrappers of these system calls that turned out not
to be thread-safe and led to problems later.
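
With struct returns, a wrapper can hand back the result and the error code together. The sketch below wraps strtol() purely as a stand-in for a system-call wrapper; parse_result and parse_long are hypothetical names.

```c
#include <errno.h>
#include <stdlib.h>

/* Result and error travel together in the return value, so the caller
   never has to look at a shared errno variable. */
struct parse_result {
    long value;
    int error;      /* 0 on success, otherwise an errno-style code */
};

struct parse_result parse_long(const char *text)
{
    struct parse_result r;
    char *end;

    errno = 0;                          /* strtol reports overflow via errno */
    r.value = strtol(text, &end, 10);
    r.error = errno;
    if (r.error == 0 && end == text)
        r.error = EINVAL;               /* no digits were consumed */
    return r;
}
```

A caller then writes struct parse_result r = parse_long("42"); and checks r.error instead of a global.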

>Concerning varargs,
>PowerPC and later AMD64 used calling conventions incompatible
>with popular expectations.

I did not experience calling convention problems on PowerPC in my
software, so apparently it was compatible with my expectations.

Still, Power(PC) is very niche.  I recently talked to someone who
worked a lot on Power while he was at IBM (now he no longer works for
IBM); I asked him why people are buying Power, and he said something
along the lines that IBM is satisfying a base of established
customers.  Maybe Power would be more popular if it had had a calling
convention compatible with popular expectations, probably not.

As for AMD64, whatever popular expectation they may have been
incompatible with (again I experienced no problems), the user could
fall back to the IA-32 calling convention (i.e., compile the program
as a 32-bit program, or just run the existing 32-bit binary),
providing an easy workaround for ABI problems for existing, working
programs.

>Concerning customers, they will tolerate a lot of things, as long
>as there are benefits (faster

Didn't work out for Alpha.

>or cheaper machines,

People are abandoning PCs in favour of Raspis?  Does not look that way
to me.

>better security,

Oh, really?  Which machine became a success because of better security?

>etc.) and fixes require a reasonable amount of work.

Many customers expect a machine that's compatible with their legacy
software, and are not willing (or at all able) to "fix" it.  Many even
require machines that are officially supported by the software vendor.
And for a software vendor, the need for one fix is probably a sign
that the platform is not as compatible as they would like, and that
qualifying that platform requires more work, and they will charge that
work to the platform's customers.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>


#110464

From	John Levine <johnl@taugh.com>
Date	2025-01-10 15:17 +0000
Message-ID	<vlrdm9$964$1@gal.iecc.com>
In reply to	#110462

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>>Such things happened many times in the past.  AFAIK the standard
>>setup on a VAX was that accessing data at address 0 gave you 0.
>>A lot of VAX programs needed fixes to run on different machines.
>
>That case is interesting.  It's certainly a benefit to programmers if
>most uses of NULL produce a SIGSEGV, but for existing programs a
>mapping that makes memory in page 0 accessible is an advantage.  So how
>did we get from there to where we are now?
>
>First, my guess is that the VAX is only called out because it was so
>popular, and it was one of the first Unix machines where doing it
>differently was possible.  I am sure that earlier Unix targets without
>virtual memory used memory starting with address 0 because they would
>otherwise have wasted precious memory.

It was a bug. As I recall, the first thing in the address space in Berkeley Unix
was a register save mask where the low byte happened to be zero, and a lot of
sloppy programs written by students accidentally depended on it, e.g.

  if(*p == 0) /* no string */

For a while ports to 68K and other architectures ensured there was a zero byte at
location zero so the Berkeley programs wouldn't crash, but eventually people fixed
the code.

Location 0 on the PDP-11 had nothing memorable and we did our string tests correctly.
You could dereference a null pointer but you got a string of junk.




-- 
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly


#110479

From	mitchalsup@aol.com (MitchAlsup1)
Date	2025-01-13 02:10 +0000
Message-ID	<6be6d207cf7386fb66d47f2fe619df71@www.novabbs.org>
In reply to	#110462

On Fri, 10 Jan 2025 10:25:23 +0000, Anton Ertl wrote:

> antispam@fricas.org (Waldek Hebisch) writes:
>>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>> If yes, a few years down the road your prospective customers have to
>>> decide whether to go for your newfangled architecture or one of the
>>> established ones.  They learn that a number of programs work
>>> everywhere else, but not on your architecture.  How many of them will
>>> be placated by your reasoning that these programs are not strictly
>>> conforming standard programs?  How many will be alarmed by your
>>> admission that you find it ok that such programs
>>> don't work on your architecture?  After all, hardly any program is a
>>> strictly conforming standard program.
>>
>>Such things happened many times in the past.  AFAIK standard
>>setup on a VAX was that accessing data at address 0 gave you 0.
>>A lot of VAX programs needed fixes to run on different machines.
>
> That case is interesting.  It's certainly a benefit to programmers if
> most uses of NULL produce a SIGSEGV, but for existing programs a
> mapping that makes memory in page 0 accessible is an advantage.  So how
> did we get from there to where we are now?

The blame goes to defining NULL as a pointer that is not pointing at
anything. We have no integer that has the property of one value that
is not an integer--we COULD have had such a value (NEG_MAX on 2's
complement, -0 on 1's complement), but no..........

>>C 'errno' was made more abstract due
>>to multithreading, it broke some programs.

errno is an atrocity all by itself; single-handedly preventing
direct use of SIN(), COS(), TAN(), ATAN(), exp(), ln(), pow()
as instructions.

Lobbing errno over into Thread Local Store just makes the problems
worse.

> That's pretty similar to an ABI issue (not sure if errno is in the
> ABIs or not).

errno is not ABI, errno is part of subroutine definitions within
a library. That errno can be set from different libraries, and
that errno got dropped in TLS makes it doubly idiotic.


#110480

From	scott@slp53.sl.home (Scott Lurndal)
Date	2025-01-13 14:19 +0000
Message-ID	<3i9hP.109125$zX7.43937@fx37.iad>
In reply to	#110479

mitchalsup@aol.com (MitchAlsup1) writes:
>On Fri, 10 Jan 2025 10:25:23 +0000, Anton Ertl wrote:
>
>> antispam@fricas.org (Waldek Hebisch) writes:
>>>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>>> If yes, a few years down the road your prospective customers have to
>>>> decide whether to go for your newfangled architecture or one of the
>>>> established ones.  They learn that a number of programs work
>>>> everywhere else, but not on your architecture.  How many of them will
>>>> be placated by your reasoning that these programs are not strictly
>>>> conforming standard programs?  How many will be alarmed by your
>>>> admission that you find it ok that such programs
>>>> don't work on your architecture?  After all, hardly any program is a
>>>> strictly conforming standard program.
>>>
>>>Such things happened many times in the past.  AFAIK standard
>>>setup on a VAX was that accessing data at address 0 gave you 0.
>>>A lot of VAX programs needed fixes to run on different machines.
>>
>> That case is interesting.  It's certainly a benefit to programmers if
>> most uses of NULL produce a SIGSEGV, but for existing programs a
>> mapping that makes memory in page 0 accessible is an advantage.  So how
>> did we get from there to where we are now?
>
>The blame goes to defining NULL as a pointer that is not pointing at
>anything. We have no integer that has the property of one value that
>is not an integer--we COULD have had such a value (NEG_MAX on 2's
>complement, -0 on 1's complement), but no..........

One of the advantages of BCD systems - we could define a NULL
pointer value that was non-zero, non-numeric, and didn't point
to anything.

(0xc0eeeeee).


#110482

From	Thomas Koenig <tkoenig@netcologne.de>
Date	2025-01-13 18:02 +0000
Message-ID	<vm3kf2$1t0s1$1@dont-email.me>
In reply to	#110479

MitchAlsup1 <mitchalsup@aol.com> schrieb:

> errno is an atrocity all by itself; single-handedly preventing
> direct use of SIN(), COS(), TAN(), ATAN(), exp(), ln(), pow()
> as instructions.

Fortunately, the C standard does not require errno to be set
for these functions.  Apple, for example, does not do so.


#110483

From	mitchalsup@aol.com (MitchAlsup1)
Date	2025-01-13 19:00 +0000
Message-ID	<6248473300a9fc0fd964c635510f510d@www.novabbs.org>
In reply to	#110482

On Mon, 13 Jan 2025 18:02:10 +0000, Thomas Koenig wrote:

> MitchAlsup1 <mitchalsup@aol.com> schrieb:
>
>> errno is an atrocity all by itself; single-handedly preventing
>> direct use of SIN(), COS(), TAN(), ATAN(), exp(), ln(), pow()
>> as instructions.
>
> Fortunately, the C standard does not require errno to be set
> for these functions.  Apple, for example, does not do so.

Nor will I.
