| Started by | David Brown <david.brown@hesbynett.no> |
|---|---|
| First post | 2025-01-06 14:57 +0100 |
| Last post | 2025-01-12 14:59 +0100 |
| Articles | 20 on this page of 97 — 17 participants |
Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-06 14:57 +0100
Re: Calling conventions (particularly 32-bit ARM) Theo <theom+news@chiark.greenend.org.uk> - 2025-01-06 15:23 +0000
Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-07 09:22 +0100
Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-06 15:32 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-06 20:19 +0000
Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-07 10:09 +0100
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-07 23:23 +0000
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-07 23:35 +0000
Re: Calling conventions (particularly 32-bit ARM) "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-01-07 15:42 -0800
Re: Calling conventions (particularly 32-bit ARM) "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-01-07 20:01 -0800
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-08 01:38 +0000
Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-07 09:49 +0100
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-06 20:10 +0000
Re: Calling conventions (particularly 32-bit ARM) antispam@fricas.org (Waldek Hebisch) - 2025-01-07 02:11 +0000
Re: Calling conventions (particularly 32-bit ARM) Lawrence D'Oliveiro <ldo@nz.invalid> - 2025-01-07 06:53 +0000
Re: Calling conventions (particularly 32-bit ARM) Stephen Fuld <sfuld@alumni.cmu.edu.invalid> - 2025-01-12 12:10 -0800
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-12 20:41 +0000
Re: Calling conventions (particularly 32-bit ARM) antispam@fricas.org (Waldek Hebisch) - 2025-01-13 01:20 +0000
Re: Calling conventions (particularly 32-bit ARM) Stephen Fuld <sfuld@alumni.cmu.edu.invalid> - 2025-01-14 09:40 -0800
Re: Calling conventions (particularly 32-bit ARM) Terje Mathisen <terje.mathisen@tmsw.no> - 2025-01-14 19:18 +0100
Re: Calling conventions (particularly 32-bit ARM) Michael S <already5chosen@yahoo.com> - 2025-01-14 23:48 +0200
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 23:27 +0000
Re: Calling conventions (particularly 32-bit ARM) John Levine <johnl@taugh.com> - 2025-01-15 03:31 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-15 16:50 +0000
Re: Calling conventions (particularly 32-bit ARM) John Levine <johnl@taugh.com> - 2025-01-15 22:03 +0000
Re: Calling conventions (particularly 32-bit ARM) antispam@fricas.org (Waldek Hebisch) - 2025-01-16 03:02 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-16 15:08 +0000
Re: Calling conventions (particularly 32-bit ARM) antispam@fricas.org (Waldek Hebisch) - 2025-01-16 16:24 +0000
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-13 21:33 +0000
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-14 06:48 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 18:03 +0000
Re: Calling conventions (particularly 32-bit ARM) George Neuner <gneuner2@comcast.net> - 2025-01-07 16:52 -0500
Re: Calling conventions (particularly 32-bit ARM) Stefan Monnier <monnier@iro.umontreal.ca> - 2025-01-08 12:20 -0500
Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-09 08:38 +0000
Re: Calling conventions (particularly 32-bit ARM) Stefan Monnier <monnier@iro.umontreal.ca> - 2025-01-13 10:55 -0500
Re: Calling conventions (particularly 32-bit ARM) Tim Rentsch <tr.17687@z991.linuxsc.com> - 2025-01-27 17:09 -0800
Re: Calling conventions (particularly 32-bit ARM) George Neuner <gneuner2@comcast.net> - 2025-01-28 22:53 -0500
Re: Calling conventions (particularly 32-bit ARM) Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-02-14 20:40 -0800
Re: Calling conventions (particularly 32-bit ARM) George Neuner <gneuner2@comcast.net> - 2026-02-17 15:35 -0500
Re: Calling conventions (particularly 32-bit ARM) Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-03-14 09:34 -0700
Re: Calling conventions (particularly 32-bit ARM) George Neuner <gneuner2@comcast.net> - 2026-03-24 17:20 -0400
Re: Calling conventions (particularly 32-bit ARM) Stefan Monnier <monnier@iro.umontreal.ca> - 2025-01-08 12:34 -0500
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-08 20:19 +0000
Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-08 22:08 +0000
Re: Calling conventions (particularly 32-bit ARM) Stefan Monnier <monnier@iro.umontreal.ca> - 2025-01-08 18:20 -0500
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-09 00:11 +0000
Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-09 07:23 +0000
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-09 10:07 +0000
Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-10 08:24 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-09 20:48 +0000
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-09 21:23 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-10 01:08 +0000
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-10 09:19 +0000
Re: Calling conventions (particularly 32-bit ARM) antispam@fricas.org (Waldek Hebisch) - 2025-01-10 08:33 +0000
Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-10 10:25 +0000
Re: Calling conventions (particularly 32-bit ARM) John Levine <johnl@taugh.com> - 2025-01-10 15:17 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-13 02:10 +0000
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-13 14:19 +0000
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-13 18:02 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-13 19:00 +0000
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-13 21:53 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-13 22:02 +0000
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-13 22:40 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 02:32 +0000
Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-14 15:08 +0100
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 14:22 +0000
Re: Calling conventions (particularly 32-bit ARM) Michael S <already5chosen@yahoo.com> - 2025-01-14 16:41 +0200
Re: Calling conventions (particularly 32-bit ARM) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2025-01-14 18:02 +0000
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 18:15 +0000
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-14 18:19 +0000
Re: Calling conventions (particularly 32-bit ARM) Terje Mathisen <terje.mathisen@tmsw.no> - 2025-01-14 19:39 +0100
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 19:14 +0000
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-14 20:01 +0000
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 22:05 +0000
Re: Calling conventions (particularly 32-bit ARM) Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-01-14 15:23 -0800
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 23:39 +0000
Re: Calling conventions (particularly 32-bit ARM) Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-01-14 16:59 -0800
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 23:40 +0000
Re: Calling conventions (particularly 32-bit ARM) Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-01-14 17:11 -0800
Re: Calling conventions (particularly 32-bit ARM) Michael S <already5chosen@yahoo.com> - 2025-01-15 00:09 +0200
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-14 19:08 +0000
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 19:24 +0000
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-14 20:31 +0000
Re: Calling conventions (particularly 32-bit ARM) Michael S <already5chosen@yahoo.com> - 2025-01-14 23:13 +0200
Re: Calling conventions (particularly 32-bit ARM) mitchalsup@aol.com (MitchAlsup1) - 2025-01-15 00:47 +0000
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-14 06:20 +0000
Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-14 15:05 +0100
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-14 14:39 +0000
Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-14 16:50 +0100
Re: Calling conventions (particularly 32-bit ARM) Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-01-14 15:32 -0800
Re: Calling conventions (particularly 32-bit ARM) Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-01-14 00:14 -0800
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-10 14:43 +0000
Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-10 18:39 +0100
Re: Calling conventions (particularly 32-bit ARM) scott@slp53.sl.home (Scott Lurndal) - 2025-01-10 18:39 +0000
Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-12 14:55 +0100
Re: Calling conventions (particularly 32-bit ARM) Thomas Koenig <tkoenig@netcologne.de> - 2025-01-10 19:19 +0000
Re: Calling conventions (particularly 32-bit ARM) David Brown <david.brown@hesbynett.no> - 2025-01-12 14:59 +0100
| From | Michael S <already5chosen@yahoo.com> |
|---|---|
| Date | 2025-01-14 23:48 +0200 |
| Message-ID | <20250114234819.000012ac@yahoo.com> |
| In reply to | #110502 |
On Tue, 14 Jan 2025 19:18:27 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:

> Stephen Fuld wrote:
> > On 1/12/2025 5:20 PM, Waldek Hebisch wrote:
> >> You are implicitely assuming that passing large number of
> >> arguments is expensive.
> >
> > I guess. I am actually assuming that passing arguments in memory
> > is more expensive than passing them in registers. I don't think
> > that is controversial.
>
> Usually true, except for recursive functions where you have to store
> most stuff on the stack anyway, so going directly there can sometimes
> generate more compact code.
>
> Terje
>

I would think that for Fortran (==everything passed by reference)
memory would beat registers most of the time. Maybe, except for
functions with 0-4 parameters.
Do common Fortran compilers even bother with passing in registers?
It would require replacing the natural by-reference "pointer in
register points to value in memory" calling sequence with something like
copy-in/copy-out, right?
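A minimal C sketch of the two interface styles being compared. The function names are hypothetical, and whether the addresses or the values actually travel in registers depends on the target ABI. The point is only that the by-reference form still passes something (an address) per argument, and the callee then has to load through it.

```c
/* Fortran-style by-reference interface: every argument is an address.
   The names here are illustrative, not taken from any library. */
void axpy_by_ref(const double *alpha, const double *x, double *y)
{
    /* The callee must load through each pointer before it can compute,
       and store the result back through the output pointer. */
    *y = *alpha * *x + *y;
}

/* By-value interface: the scalars themselves are the arguments, so an
   ABI that passes arguments in registers needs no memory traffic here. */
double axpy_by_val(double alpha, double x, double y)
{
    return alpha * x + y;
}
```

With `double a = 2, x = 3, y = 1;`, a call `axpy_by_ref(&a, &x, &y)` leaves `y` equal to what `axpy_by_val(2, 3, 1)` returns.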
| From | mitchalsup@aol.com (MitchAlsup1) |
|---|---|
| Date | 2025-01-14 23:27 +0000 |
| Message-ID | <c7d39917db81f33fd90a8b57e7d0f85d@www.novabbs.org> |
| In reply to | #110511 |
On Tue, 14 Jan 2025 21:48:19 +0000, Michael S wrote:

> On Tue, 14 Jan 2025 19:18:27 +0100
> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>
>> Stephen Fuld wrote:
>>> On 1/12/2025 5:20 PM, Waldek Hebisch wrote:
>>>> You are implicitely assuming that passing large number of
>>>> arguments is expensive.
>>>
>>> I guess. I am actually assuming that passing arguments in memory
>>> is more expensive than passing them in registers. I don't think
>>> that is controversial.
>>
>> Usually true, except for recursive functions where you have to store
>> most stuff on the stack anyway, so going directly there can sometimes
>> generate more compact code.
>>
>> Terje
>>
>
> I would think that for Fortran (==everything passed by reference)
> memory would beat registers most of the time.

Pass by COMMON block was even faster.

> May be, except for
> functions with 0-4 parameters.
> Do common Fortarn compilers even bother with passing in register?

Fortran compilers are given an ABI (leaning towards C, C++) and
are required to "do something reasonable" in mapping Fortran
conventions into C conventions. C subroutines on the called side,
then, have to have a data structure identical to what Fortran
compiler would have produced (Dope Vector). C callers will have
to use those kinds of structures to successfully call Fortran
entry points.

> It would require replacement of natural by-reference "pointer in
> register points to value in memory" calling sequence to something like
> copy-in/copy-out, right?

No, Fortran will pass dope vectors to called subroutines. The
called subroutine needs to understand the dope vector.
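For readers unfamiliar with the term, a dope vector can be sketched in C roughly as follows. The struct name, field names, and layout are assumptions made for illustration. They do not reproduce the array descriptor of any particular Fortran compiler, which a real C caller or callee would have to match exactly.

```c
#include <stddef.h>

/* Hypothetical rank-2 array descriptor ("dope vector"). The layout is
   illustrative only and matches no specific compiler's ABI. */
typedef struct {
    double   *base;       /* address of the first element */
    size_t    extent[2];  /* number of elements in each dimension */
    ptrdiff_t stride[2];  /* distance between neighbours, in elements */
} dope2d;

/* Fetch element (i, j), zero-based, through the descriptor. */
double dope2d_get(const dope2d *d, size_t i, size_t j)
{
    return d->base[(ptrdiff_t)i * d->stride[0] + (ptrdiff_t)j * d->stride[1]];
}

/* Sum every element the descriptor describes; the callee learns the
   shape from the descriptor, not from extra scalar arguments. */
double dope2d_sum(const dope2d *d)
{
    double s = 0.0;
    for (size_t j = 0; j < d->extent[1]; ++j)
        for (size_t i = 0; i < d->extent[0]; ++i)
            s += dope2d_get(d, i, j);
    return s;
}
```

A caller passes one pointer to the descriptor. For a column-major 2x3 array stored in six contiguous doubles, the strides would be 1 and 2.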
| From | John Levine <johnl@taugh.com> |
|---|---|
| Date | 2025-01-15 03:31 +0000 |
| Message-ID | <vm7a73$qm2$1@gal.iecc.com> |
| In reply to | #110515 |
According to MitchAlsup1 <mitchalsup@aol.com>:
>> I would think that for Fortran (==everything passed by reference)
>> memory would beat registers most of the time.
>
>Pass by COMMON block was even faster.

Sometimes. On machines that don't have direct addressing, such as S/360,
the code needs to load a pointer to the data either way so it's a wash.

Even when you do have direct addressing, if code is compiled to be
position independent, the common block wouldn't be in the same module
as the code that references it so it still needs to load a pointer
from the GOT or whatever its equivalent is.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
| From | mitchalsup@aol.com (MitchAlsup1) |
|---|---|
| Date | 2025-01-15 16:50 +0000 |
| Message-ID | <4c878826966f3f8d5f3241250b544473@www.novabbs.org> |
| In reply to | #110523 |
On Wed, 15 Jan 2025 3:31:47 +0000, John Levine wrote:

> According to MitchAlsup1 <mitchalsup@aol.com>:
>>> I would think that for Fortran (==everything passed by reference)
>>> memory would beat registers most of the time.
>>
>>Pass by COMMON block was even faster.
>
> Sometimes. On machines that don't have direct addressing, such as S/360,
> the code needs to load a pointer to the data either way so it's a wash.
>
> Even when you do have direct addressing, if code is compiled to be
> position indepedent, the common block wouldn't be in the same module
> as the code that references it so it still needs to load a pointer
> from the GOT or whatever its equivalent is.

Pass by COMMON block allows one to pass hundreds of data values in a
single call.

You are treating the common block as if it had but one data container.
| From | John Levine <johnl@taugh.com> |
|---|---|
| Date | 2025-01-15 22:03 +0000 |
| Message-ID | <vm9bca$1ns0$1@gal.iecc.com> |
| In reply to | #110527 |
According to MitchAlsup1 <mitchalsup@aol.com>:
>On Wed, 15 Jan 2025 3:31:47 +0000, John Levine wrote:
>
>> According to MitchAlsup1 <mitchalsup@aol.com>:
>>>Pass by COMMON block was even faster.
>>
>> Sometimes. On machines that don't have direct addressing, such as S/360,
>> the code needs to load a pointer to the data either way so it's a wash.
>>
>> Even when you do have direct addressing, if code is compiled to be
>> position indepedent, the common block wouldn't be in the same module
>> as the code that references it so it still needs to load a pointer
>> from the GOT or whatever its equivalent is.
>
>Pass by COMMON block allows one to pass hundreds of data values in a
>single call.
>
>You are treating the common block as if it had but one data container.

If I were that kind of programmer, I could use EQUIVALENCE to glue a
bunch of local variables and arrays together and pass that as a
subroutine argument.

Also remember that on machines without direct addressing there's extra
code if a block, of whatever kind, is larger than the offset field of
an instruction can reach: 12 bits on S/360 and usually 16 on z.

It's really a matter of taste and programming style more than efficiency.

R's,
John
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
| From | antispam@fricas.org (Waldek Hebisch) |
|---|---|
| Date | 2025-01-16 03:02 +0000 |
| Message-ID | <vm9ssi$3so92$1@paganini.bofh.team> |
| In reply to | #110515 |
MitchAlsup1 <mitchalsup@aol.com> wrote:
> On Tue, 14 Jan 2025 21:48:19 +0000, Michael S wrote:
>
>> On Tue, 14 Jan 2025 19:18:27 +0100
>> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>>
>>> Stephen Fuld wrote:
>>>> On 1/12/2025 5:20 PM, Waldek Hebisch wrote:
>>>>> You are implicitely assuming that passing large number of
>>>>> arguments is expensive.
>>>>
>>>> I guess. I am actually assuming that passing arguments in memory
>>>> is more expensive than passing them in registers. I don't think
>>>> that is controversial.
>>>
>>> Usually true, except for recursive functions where you have to store
>>> most stuff on the stack anyway, so going directly there can sometimes
>>> generate more compact code.
>>>
>>> Terje
>>>
>>
>> I would think that for Fortran (==everything passed by reference)
>> memory would beat registers most of the time.
One still needs to pass the _values_ of the addresses. Doing it in
registers (assuming that enough are available) is likely to
be more efficient than storing addresses in memory and
re-fetching them later. The _relative_ difference between
passing in registers and passing in memory is smaller, as
there are memory references to access the arguments, but registers
are likely to be a plus (unless there is excessive spilling and
the called routine needs to write addresses to memory and load
them later).
> Pass by COMMON block was even faster.
I do not think so. In LAPACK-like cases there are array arguments.
The normal calling convention needs to store and later read parameters
and pass addresses. COMMON would force copying of entire arrays,
which is much less efficient than handling parameters.
In a complicated program there could be many COMMON blocks, leading
to worse locality than stack use (not relevant for a cacheless
machine or one with very big caches, but it could make a difference
for machines with small caches).
>> It would require replacement of natural by-reference "pointer in
>> register points to value in memory" calling sequence to something like
>> copy-in/copy-out, right?
>
> No, Fortran will pass dope vectors to called subroutines. The
> called subroutine needs to understand the dope vector.
I would not say this. AFAIK in Fortran 77 the caller passes enough
information that the called routine can construct its own dope
vector (if desired). IIUC that is very similar to VMT-s in C99.
I think PL/I, Ada, Extended Pascal and probably Fortran 90 use
dope vectors.
--
Waldek Hebisch
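The C99 comparison can be sketched as follows, assuming "VMT-s" refers to variably modified parameter types and that the compiler supports variable-length arrays. The function name is hypothetical. The caller passes the extents as ordinary arguments plus a bare pointer, and the callee derives the indexing arithmetic from them, with no descriptor object in memory.

```c
#include <stddef.h>

/* The extents n and m arrive as plain arguments ahead of the array.
   The parameter a has a variably modified type, so the compiler
   computes row offsets from m at run time. */
double sum2d(size_t n, size_t m, double a[n][m])
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            s += a[i][j];
    return s;
}
```

A caller holding `double a[2][3]` would write `sum2d(2, 3, a)`. Only three scalar-sized arguments cross the call boundary.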
| From | mitchalsup@aol.com (MitchAlsup1) |
|---|---|
| Date | 2025-01-16 15:08 +0000 |
| Message-ID | <9ee003d6b85267ba8ae18aa96618278b@www.novabbs.org> |
| In reply to | #110534 |
On Thu, 16 Jan 2025 3:02:44 +0000, Waldek Hebisch wrote:
> MitchAlsup1 <mitchalsup@aol.com> wrote:
>>
>
>> Pass by COMMON block was even faster.
>
> I do not think so. I LAPACK-like cases there are array arguments.
> Normal calling convention needs to store and later read parameters
> and pass addresses. COMMON would force copying of entire arrays,
> much less efficienct than handling parameters.
      SUBROUTINE FOO
      COMMON /ALPHA/ i,j,k,a(100),b(100),c(100,100)

See: no arguments, passed directly by common-block, no copying of
data, no dope vectors needed.
| From | antispam@fricas.org (Waldek Hebisch) |
|---|---|
| Date | 2025-01-16 16:24 +0000 |
| Message-ID | <vmbbs4$3v6su$2@paganini.bofh.team> |
| In reply to | #110541 |
MitchAlsup1 <mitchalsup@aol.com> wrote:
> On Thu, 16 Jan 2025 3:02:44 +0000, Waldek Hebisch wrote:
>
>> MitchAlsup1 <mitchalsup@aol.com> wrote:
>>>
>>
>>> Pass by COMMON block was even faster.
>>
>> I do not think so. I LAPACK-like cases there are array arguments.
>> Normal calling convention needs to store and later read parameters
>> and pass addresses. COMMON would force copying of entire arrays,
>> much less efficienct than handling parameters.
>
> SUBROUTINE FOO
> COMMON /ALPHA/ i,j,k,a[100],b[100],c[100,100]
>
> See no arguments, passed directly by common-block, no copying of
> data, no dope vectors needed.
No copy only if there is a single set of arguments. If there are
different arguments, then one needs to pass them, that is, copy
them.
--
Waldek Hebisch
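Both sides of this exchange can be illustrated with a rough C analogue, where a global struct stands in for the COMMON block. All names and the fixed size `N` are assumptions for the sketch. A routine that reads the shared block takes no arguments, but applying it to data that lives elsewhere requires a copy into the block first, whereas the conventional interface only needs an address and a length.

```c
#include <string.h>

#define N 4  /* illustrative fixed capacity of the shared block */

/* Stand-in for a Fortran COMMON block: one statically allocated
   struct that caller and callee both name directly. */
struct alpha_block {
    int    n;
    double a[N];
};
struct alpha_block alpha;

/* No parameters: the routine reads its operands from the shared block. */
double sum_common(void)
{
    double s = 0.0;
    for (int i = 0; i < alpha.n; ++i)
        s += alpha.a[i];
    return s;
}

/* Conventional interface: the caller passes an address and a length. */
double sum_args(const double *a, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Applying sum_common to data held elsewhere needs a copy into the
   block first. The caller must ensure 0 <= n <= N. */
double sum_via_common(const double *a, int n)
{
    alpha.n = n;
    memcpy(alpha.a, a, (size_t)n * sizeof a[0]);
    return sum_common();
}
```

If the program only ever operates on the one set of data that already lives in `alpha`, `sum_common` involves no copying. The copy in `sum_via_common` appears once a second, separately stored array has to go through the same routine.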
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2025-01-13 21:33 +0000 |
| Message-ID | <vm40rc$21r0r$1@dont-email.me> |
| In reply to | #110476 |
Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:

> Has Lapack (and the other old style Fortran numeric
> code that Waldek mentioned) lost its/their importance as a major user of
> CPU cycles?

It's less than it used to be in the days when supercomputers
roamed the computer centers, but for those applications where
it matters, it can be significant.

> Or do these subroutines consume so many CPU cycles that the
> overhead of the large number of parameters is lost in the noise?

If you have many small matrices to multiply, startup overhead
can be quite significant. Not on a 2000*2000 matrix, though.

> Or is
> there some other explanation for Mitch not considering their importance?

I think eight arguments, passed by reference in registers, is not
too bad.
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2025-01-14 06:48 +0000 |
| Message-ID | <vm51cd$2ac32$1@dont-email.me> |
| In reply to | #110484 |
I wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:
>> Has Lapack (and the other old style Fortran numeric
>> code that Waldek mentioned) lost its/their importance as a major user of
>> CPU cycles?
>
> It's less than it used to be in the days when supercomputers
> roamed the computer centers, but for these applications where
> it matters, it can be significant.
>
>> Or do these subroutines consume so many CPU cycles that the
>> overhead of the large number of parameters is lost in the noise?
>
> If you have many small matrices to multiply, startup overhead
> can be quite significant. Not on a 2000*2000 matrix, though.
>
>> Or is
>> there some other explanation for Mitch not considering their importance?
>
> I think eight arguments, passed by reference in registers, is not
> too bad.

... when the rest can be passed on the stack.
| From | mitchalsup@aol.com (MitchAlsup1) |
|---|---|
| Date | 2025-01-14 18:03 +0000 |
| Message-ID | <a2b9c4c04c26bff6682d7b11c939e347@www.novabbs.org> |
| In reply to | #110490 |
On Tue, 14 Jan 2025 6:48:45 +0000, Thomas Koenig wrote:

> I wrote:
>> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:
>>> Has Lapack (and the other old style Fortran numeric
>>> code that Waldek mentioned) lost its/their importance as a major user of
>>> CPU cycles?
>>
>> It's less than it used to be in the days when supercomputers
>> roamed the computer centers, but for these applications where
>> it matters, it can be significant.
>>
>>> Or do these subroutines consume so many CPU cycles that the
>>> overhead of the large number of parameters is lost in the noise?
>>
>> If you have many small matrices to multiply, startup overhead
>> can be quite significant. Not on a 2000*2000 matrix, though.
>>
>>> Or is
>>> there some other explanation for Mitch not considering their importance?
>>
>> I think eight arguments, passed by reference in registers, is not
>> too bad.
>
> .... when the rest can be passed on the stack.

And those passed in registers can be stored into memory adjacent to the
memory arguments easily.
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2025-01-07 16:52 -0500 |
| Message-ID | <ve4rnjpipquiugm3j1831m0a7v0va4lu03@4ax.com> |
| In reply to | #110413 |
On Mon, 6 Jan 2025 20:10:13 +0000, mitchalsup@aol.com (MitchAlsup1)
wrote:

>I looked high and low for codes using more than 8 arguments and
>returning aggregates larger than 8 double words, and about the
>only things I found were a handful of []print[]() calls.

Large numbers of parameters may be generated either by closure
conversion or by lambda lifting. These are FP language
transformations that are analogous to, but potentially more complex
than, the rewriting of object methods and their call sites to pass the
current object in an OO language.

[The difference between closure conversion and lambda lifting is the
scope of the transformation: conversion limits code transformations to
within the defining call chain, whereas lifting pulls the closure to
top level making it (at least potentially) globally available.]

In either case the original function is rewritten such that non-local
variables can be passed as parameters. The function's code must be
altered to access the non-locals - either directly as explicit
individual parameters, or by indexing from a pointer to an environment
data structure.

While in a simple case this could look exactly like the OO method
transformation, recall that a general closure may require access to
non-local variables spread through multiple environments. Even if
whole environments are passed via single pointers, there still may
need to be multiple parameters added.

Where exactly the line is drawn between passing individual variables
from an environment vs passing the whole environment is a heuristic that
is tied to the CPU's argument passing convention.

YMMV.
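The two ways of reaching non-locals described above can be sketched in C, treating a hypothetical source-level lambda `x -> k*x + b` with free variables `k` and `b` as already rewritten by hand. All names are illustrative. No particular compiler is claimed to emit exactly this.

```c
#include <stddef.h>

/* Strategy 1: each non-local (k, b) becomes an explicit parameter. */
double affine_params(double x, double k, double b)
{
    return k * x + b;
}

/* Strategy 2: the non-locals live in an environment record and the
   function indexes them from a single pointer. */
struct affine_env {
    double k;
    double b;
};

double affine_envptr(const struct affine_env *env, double x)
{
    return env->k * x + env->b;
}

/* Call site for strategy 1: the non-locals are passed on every call,
   so the argument count grows with the number of free variables. */
void map_params(double *xs, size_t n, double k, double b)
{
    for (size_t i = 0; i < n; ++i)
        xs[i] = affine_params(xs[i], k, b);
}

/* Call site for strategy 2: the environment is built once and only
   its address is passed, at the cost of loads inside the callee. */
void map_envptr(double *xs, size_t n, double k, double b)
{
    struct affine_env env = { k, b };
    for (size_t i = 0; i < n; ++i)
        xs[i] = affine_envptr(&env, xs[i]);
}
```

With two free variables the difference is small. With a dozen, the first strategy runs past the register arguments of most calling conventions, which is where the heuristic mentioned above comes in.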
| From | Stefan Monnier <monnier@iro.umontreal.ca> |
|---|---|
| Date | 2025-01-08 12:20 -0500 |
| Message-ID | <jwv8qrlql4p.fsf-monnier+comp.arch@gnu.org> |
| In reply to | #110436 |
>>I looked high and low for codes using more than 8 arguments and
>>returning aggregates larger than 8 double words, and about the
>>only things I found were a handful of []print[]() calls.
> Large numbers of parameters may be generated either by closure
> conversion or by lambda lifting.
AFAIK in these cases the same compiler generates the code for the
function and for the calls, so it should be pretty much free to use any
calling convention it likes.
Stefan
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2025-01-09 08:38 +0000 |
| Message-ID | <2025Jan9.093832@mips.complang.tuwien.ac.at> |
| In reply to | #110445 |
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>AFAIK in these cases the same compiler generates the code for the
>function and for the calls, so it should be pretty much free to use any
>calling convention it likes.

With separate compilation, the compiler does not know which other
compiler generates the code for the caller of a function or the callee
of a function. ABI calling conventions exist in order to make code by
different compilers (whether the same language or a different one)
interoperable.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
| From | Stefan Monnier <monnier@iro.umontreal.ca> |
|---|---|
| Date | 2025-01-13 10:55 -0500 |
| Message-ID | <jwvldvepvh5.fsf-monnier+comp.arch@gnu.org> |
| In reply to | #110454 |
Anton Ertl [2025-01-09 08:38:32] wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>AFAIK in these cases the same compiler generates the code for the
>>function and for the calls, so it should be pretty much free to use any
>>calling convention it likes.
> With separate compilation, the compiler does not know which other
> compiler generates the code for the caller of a function or the callee
> of a function.
My reply was to:
Large numbers of parameters may be generated either by closure
conversion or by lambda lifting.
Can you show me an example where that happens and where the caller and
the callee can be generated by different compilers?
Stefan
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2025-01-27 17:09 -0800 |
| Message-ID | <86lduv3frs.fsf@linuxsc.com> |
| In reply to | #110436 |
George Neuner <gneuner2@comcast.net> writes:

> On Mon, 6 Jan 2025 20:10:13 +0000, mitchalsup@aol.com (MitchAlsup1)
> wrote:
>
>> I looked high and low for codes using more than 8 arguments and
>> returning aggregates larger than 8 double words, and about the
>> only things I found were a handful of []print[]() calls.
>
> Large numbers of parameters may be generated either by closure
> conversion or by lambda lifting. These are FP language
> transformations that are analogous to, but potentially more complex
> than, the rewriting of object methods and their call sites to pass the
> current object in an OO language.
>
> [The difference between closure conversion and lambda lifting is the
> scope of the tranformation: conversion limits code transformations to
> within the defining call chain, whereas lifting pulls the closure to
> top level making it (at least potentially) globally available.]
>
> In either case the original function is rewritten such that non-local
> variables can be passed as parameters. The function's code must be
> altered to access the non-locals - either directly as explicit
> individual parameters, or by indexing from a pointer to an environment
> data structure.
>
> While in a simple case this could look exactly like the OO method
> transformation, recall that a general closure may require access to
> non-local variables spread through multiple environments. Even if
> whole environments are passed via single pointers, there still may
> need to be multiple parameters added.

Isn't it the case that access to all of the enclosing environments
can be provided by passing a single pointer? I'm pretty sure it
is.
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2025-01-28 22:53 -0500 |
| Message-ID | <uh8jpj1qarqbauqdv3lk4foikocgbv7fq9@4ax.com> |
| In reply to | #110640 |
On Mon, 27 Jan 2025 17:09:59 -0800, Tim Rentsch
<tr.17687@z991.linuxsc.com> wrote:

>George Neuner <gneuner2@comcast.net> writes:
>
>> On Mon, 6 Jan 2025 20:10:13 +0000, mitchalsup@aol.com (MitchAlsup1)
>> wrote:
>>
>>> I looked high and low for codes using more than 8 arguments and
>>> returning aggregates larger than 8 double words, and about the
>>> only things I found were a handful of []print[]() calls.
>>
>> Large numbers of parameters may be generated either by closure
>> conversion or by lambda lifting. These are FP language
>> transformations that are analogous to, but potentially more complex
>> than, the rewriting of object methods and their call sites to pass the
>> current object in an OO language.
>>
>> [The difference between closure conversion and lambda lifting is the
>> scope of the transformation: conversion limits code transformations to
>> within the defining call chain, whereas lifting pulls the closure to
>> top level making it (at least potentially) globally available.]
>>
>> In either case the original function is rewritten such that non-local
>> variables can be passed as parameters. The function's code must be
>> altered to access the non-locals - either directly as explicit
>> individual parameters, or by indexing from a pointer to an environment
>> data structure.
>>
>> While in a simple case this could look exactly like the OO method
>> transformation, recall that a general closure may require access to
>> non-local variables spread through multiple environments. Even if
>> whole environments are passed via single pointers, there still may
>> need to be multiple parameters added.
>
>Isn't it the case that access to all of the enclosing environments
>can be provided by passing a single pointer? I'm pretty sure it
>is.

Certainly, if the enclosing environments somehow are chained together.
In real code though, in many instances such a chain will not already
exist when the closure is constructed. The compiler would have to
install pointers to the needed environments (or, alternatively,
pointers directly to the needed values) into the new closure's
immediate environment.
[essentially this creates a private "display" for the closure.]

Completely doable: it is simply that, if there are enough registers,
passing the pointers as parameters will tend to be more performant.
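
The two alternatives can be put side by side in C. This is a rough sketch only, assuming two needed environments; the names (`env_a`, `env_b`, `display`, `body_params`, `body_display`) are hypothetical and not taken from any particular compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical, unrelated environments the closure body needs. */
struct env_a { int x; };
struct env_b { int y; };

/* Variant 1: each needed environment is passed as its own parameter.
   If the calling convention has enough argument registers, both
   pointers can arrive in registers. */
int body_params(const struct env_a *a, const struct env_b *b, int arg)
{
    return a->x + b->y + arg;
}

/* Variant 2: a private "display" stored in the closure's immediate
   environment, so only one pointer is passed. */
struct display {
    const struct env_a *a;
    const struct env_b *b;
};

int body_display(const struct display *d, int arg)
{
    assert(d != NULL);
    /* Each environment pointer is first loaded from the display. */
    return d->a->x + d->b->y + arg;
}

int demo_params(void)
{
    struct env_a a = { 10 };
    struct env_b b = { 20 };
    return body_params(&a, &b, 1);
}

int demo_display(void)
{
    struct env_a a = { 10 };
    struct env_b b = { 20 };
    struct display d = { &a, &b };   /* built when the closure is made */
    return body_display(&d, 1);
}
```

Both variants compute the same result. The difference is where the environment pointers live at the call: in argument registers (variant 1) or in memory behind one pointer (variant 2).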
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2026-02-14 20:40 -0800 |
| Message-ID | <86o6lqhb5x.fsf@linuxsc.com> |
| In reply to | #110645 |
George Neuner <gneuner2@comcast.net> writes:

> On Mon, 27 Jan 2025 17:09:59 -0800, Tim Rentsch
> <tr.17687@z991.linuxsc.com> wrote:
>
>> George Neuner <gneuner2@comcast.net> writes:
>>
>>> On Mon, 6 Jan 2025 20:10:13 +0000, mitchalsup@aol.com (MitchAlsup1)
>>> wrote:
>>>
>>>> I looked high and low for codes using more than 8 arguments and
>>>> returning aggregates larger than 8 double words, and about the
>>>> only things I found were a handful of []print[]() calls.
>>>
>>> Large numbers of parameters may be generated either by closure
>>> conversion or by lambda lifting. These are FP language
>>> transformations that are analogous to, but potentially more complex
>>> than, the rewriting of object methods and their call sites to pass the
>>> current object in an OO language.
>>>
>>> [The difference between closure conversion and lambda lifting is the
>>> scope of the transformation: conversion limits code transformations to
>>> within the defining call chain, whereas lifting pulls the closure to
>>> top level making it (at least potentially) globally available.]
>>>
>>> In either case the original function is rewritten such that non-local
>>> variables can be passed as parameters. The function's code must be
>>> altered to access the non-locals - either directly as explicit
>>> individual parameters, or by indexing from a pointer to an environment
>>> data structure.
>>>
>>> While in a simple case this could look exactly like the OO method
>>> transformation, recall that a general closure may require access to
>>> non-local variables spread through multiple environments. Even if
>>> whole environments are passed via single pointers, there still may
>>> need to be multiple parameters added.
>>
>> Isn't it the case that access to all of the enclosing environments
>> can be provided by passing a single pointer? I'm pretty sure it
>> is.
>
> Certainly, if the enclosing environments somehow are chained together.
> In real code though, in many instances such a chain will not already
> exist when the closure is constructed. The compiler would have to
> install pointers to the needed environments (or, alternatively,
> pointers directly to the needed values) into the new closure's
> immediate environment.
> [essentially this creates a private "display" for the closure.]
>
> Completely doable: it is simply that, if there are enough registers,
> passing the pointers as parameters will tend to be more performant.

Sounds like you're saying that you agree that passing
just one value is always feasible. Also that, depending
on individual circumstances, either approach might have
better performance.
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2026-02-17 15:35 -0500 |
| Message-ID | <tqf9pk9kfchiaib67sfgchjm84sspnl5oe@4ax.com> |
| In reply to | #114976 |
Hi Tim,

On Sat, 14 Feb 2026 20:40:26 -0800, Tim Rentsch
<tr.17687@z991.linuxsc.com> wrote:

>George Neuner <gneuner2@comcast.net> writes:
>
>> On Mon, 27 Jan 2025 17:09:59 -0800, Tim Rentsch
>> <tr.17687@z991.linuxsc.com> wrote:
>>
>>> George Neuner <gneuner2@comcast.net> writes:
>>>
>>>> On Mon, 6 Jan 2025 20:10:13 +0000, mitchalsup@aol.com (MitchAlsup1)
>>>> wrote:
>>>>
>>>>> I looked high and low for codes using more than 8 arguments and
>>>>> returning aggregates larger than 8 double words, and about the
>>>>> only things I found were a handful of []print[]() calls.
>>>>
>>>> Large numbers of parameters may be generated either by closure
>>>> conversion or by lambda lifting. These are FP language
>>>> transformations that are analogous to, but potentially more complex
>>>> than, the rewriting of object methods and their call sites to pass the
>>>> current object in an OO language.
>>>>
>>>> [The difference between closure conversion and lambda lifting is the
>>>> scope of the transformation: conversion limits code transformations to
>>>> within the defining call chain, whereas lifting pulls the closure to
>>>> top level making it (at least potentially) globally available.]
>>>>
>>>> In either case the original function is rewritten such that non-local
>>>> variables can be passed as parameters. The function's code must be
>>>> altered to access the non-locals - either directly as explicit
>>>> individual parameters, or by indexing from a pointer to an environment
>>>> data structure.
>>>>
>>>> While in a simple case this could look exactly like the OO method
>>>> transformation, recall that a general closure may require access to
>>>> non-local variables spread through multiple environments. Even if
>>>> whole environments are passed via single pointers, there still may
>>>> need to be multiple parameters added.
>>>
>>> Isn't it the case that access to all of the enclosing environments
>>> can be provided by passing a single pointer? I'm pretty sure it
>>> is.
>>
>> Certainly, if the enclosing environments somehow are chained together.
>> In real code though, in many instances such a chain will not already
>> exist when the closure is constructed. The compiler would have to
>> install pointers to the needed environments (or, alternatively,
>> pointers directly to the needed values) into the new closure's
>> immediate environment.
>> [essentially this creates a private "display" for the closure.]
>>
>> Completely doable: it is simply that, if there are enough registers,
>> passing the pointers as parameters will tend to be more performant.
>
>Sounds like you're saying that you agree that passing
>just one value is always feasible. Also that, depending
>on individual circumstances, either approach might have
>better performance.

You are correct ... it always is possible to pass the closure
environment to the function using a single pointer.

But you may not want to do it that way.


My point was about the structure of closure environments. In general,
you want to minimize what data needs to be persisted - particularly in
a program that generates lots of /related/ closures - while also
keeping in mind that data may need to be shared among multiple
closures [not just among multiple functions in a common closure].

It may be necessary, e.g., to pull data out of a stack context and
heap allocate it instead. That requires changing the stack context to
be a pointer rather than a value, rewriting any functions that expect
the value to use the pointer instead, and constructing new persistent
"environment" structures that can find the relocated data.

This can require a lot of effort by the compiler.


OTOH, if the structure of the program is such that the closure's
non-local data is guaranteed to be in scope when the closure is
invoked, it often is simpler just to rewrite closure functions to
access that data via a pointer parameter, and change the call sites to
pass the required pointer(s).

The closure may still need a persistent environment, but this method
reduces or eliminates the need for /chained/ environments, and having
to rewrite other non-closure functions that happen to use the data.

This also can require a lot of effort by the compiler, but the effort
can be more focused on the closures, and less on "regular" code.


What you really don't want in any case is to have to preserve entire
stacks just to support creating closures. It doesn't matter whether
the stack is linear or a chain of heap allocated structures[*]. Some
rewriting and data relocation (out of the stack) will be necessary in
any case.

[*] yes, this actually is done in some GC'd language implementations.
When the stack shrinks, discarded contexts are cleaned up by the GC.
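
The "pull data out of a stack context and heap allocate it" step can be pictured with a small C sketch. It assumes a single captured counter shared by two closure bodies; the names (`counter_box`, `make_counter`, `increment`, `read_count`) are hypothetical, and a real compiler would of course generate this rather than have it written by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* A captured variable relocated from the stack frame to the heap. */
struct counter_box { int value; };

/* Two hypothetical closure bodies that share the same box. */
void increment(struct counter_box *box)        { box->value += 1; }
int  read_count(const struct counter_box *box) { return box->value; }

/* The rewritten defining function: what used to be a plain local
   variable is now reached through a pointer held in the frame. */
struct counter_box *make_counter(void)
{
    struct counter_box *box = malloc(sizeof *box);
    assert(box != NULL);   /* sketch only: no real error handling */
    box->value = 0;
    return box;            /* the box outlives the defining frame */
}

int boxed_demo(void)
{
    struct counter_box *box = make_counter();
    increment(box);        /* both closure bodies see the same data */
    increment(box);
    int result = read_count(box);
    free(box);             /* manual release; a GC'd runtime would reclaim it */
    return result;
}
```

Only the one shared variable is moved to the heap here, not the whole frame, which is the minimization the post argues for.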
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2026-03-14 09:34 -0700 |
| Message-ID | <86a4wa74k0.fsf@linuxsc.com> |
| In reply to | #115023 |
George Neuner <gneuner2@comcast.net> writes:

> Hi Tim,

Hi. :)

> On Sat, 14 Feb 2026 20:40:26 -0800, Tim Rentsch
> <tr.17687@z991.linuxsc.com> wrote:
>
>> George Neuner <gneuner2@comcast.net> writes:
>>
>>> On Mon, 27 Jan 2025 17:09:59 -0800, Tim Rentsch
>>> <tr.17687@z991.linuxsc.com> wrote:
>>>
>>>> George Neuner <gneuner2@comcast.net> writes:
>>>>
>>>>> On Mon, 6 Jan 2025 20:10:13 +0000, mitchalsup@aol.com (MitchAlsup1)
>>>>> wrote:
>>>>>
>>>>>> I looked high and low for codes using more than 8 arguments and
>>>>>> returning aggregates larger than 8 double words, and about the
>>>>>> only things I found were a handful of []print[]() calls.
>>>>>
>>>>> Large numbers of parameters may be generated either by closure
>>>>> conversion or by lambda lifting. These are FP language
>>>>> transformations that are analogous to, but potentially more complex
>>>>> than, the rewriting of object methods and their call sites to pass the
>>>>> current object in an OO language.
>>>>>
>>>>> [The difference between closure conversion and lambda lifting is the
>>>>> scope of the transformation: conversion limits code transformations to
>>>>> within the defining call chain, whereas lifting pulls the closure to
>>>>> top level making it (at least potentially) globally available.]
>>>>>
>>>>> In either case the original function is rewritten such that non-local
>>>>> variables can be passed as parameters. The function's code must be
>>>>> altered to access the non-locals - either directly as explicit
>>>>> individual parameters, or by indexing from a pointer to an environment
>>>>> data structure.
>>>>>
>>>>> While in a simple case this could look exactly like the OO method
>>>>> transformation, recall that a general closure may require access to
>>>>> non-local variables spread through multiple environments. Even if
>>>>> whole environments are passed via single pointers, there still may
>>>>> need to be multiple parameters added.
>>>>
>>>> Isn't it the case that access to all of the enclosing environments
>>>> can be provided by passing a single pointer? I'm pretty sure it
>>>> is.
>>>
>>> Certainly, if the enclosing environments somehow are chained together.
>>> In real code though, in many instances such a chain will not already
>>> exist when the closure is constructed. The compiler would have to
>>> install pointers to the needed environments (or, alternatively,
>>> pointers directly to the needed values) into the new closure's
>>> immediate environment.
>>> [essentially this creates a private "display" for the closure.]
>>>
>>> Completely doable: it is simply that, if there are enough registers,
>>> passing the pointers as parameters will tend to be more performant.
>>
>> Sounds like you're saying that you agree that passing
>> just one value is always feasible. Also that, depending
>> on individual circumstances, either approach might have
>> better performance.
>
> You are correct ... it always is possible to pass the closure
> environment to the function using a single pointer.
>
> But you may not want to do it that way.
>
>
> My point was about the structure of closure environments. In general,
> you want to minimize what data needs to be persisted - particularly in
> a program that generates lots of /related/ closures - while also
> keeping in mind that data may need to be shared among multiple
> closures [not just among multiple functions in a common closure].
>
> It may be necessary, e.g., to pull data out of a stack context and
> heap allocate it instead. That requires changing the stack context to
> be a pointer rather than a value, rewriting any functions that expect
> the value to use the pointer instead, and constructing new persistent
> "environment" structures that can find the relocated data.
>
> This can require a lot of effort by the compiler.
>
>
> OTOH, if the structure of the program is such that the closure's
> non-local data is guaranteed to be in scope when the closure is
> invoked, it often is simpler just to rewrite closure functions to
> access that data via a pointer parameter, and change the call sites to
> pass the required pointer(s).
>
> The closure may still need a persistent environment, but this method
> reduces or eliminates the need for /chained/ environments, and having
> to rewrite other non-closure functions that happen to use the data.
>
> This also can require a lot of effort by the compiler, but the effort
> can be more focused on the closures, and less on "regular" code.
>
>
> What you really don't want in any case is to have to preserve entire
> stacks just to support creating closures. It doesn't matter whether
> the stack is linear or a chain of heap allocated structures[*]. Some
> rewriting and data relocation (out of the stack) will be necessary in
> any case.
>
> [*] yes, this actually is done in some GC'd language implementations.
> When the stack shrinks, discarded contexts are cleaned up by the GC.

To address the last point first - I'm familiar with circumstances
where "stack" data might be held in heap-allocated parcels. I
think that possibility is orthogonal to the main issue, but in any
case I'm aware of the need to use extra-stack memory in some cases.

Returning to the main area of discussion - it occurs to me that we
may have different operational assumptions about the situations
where nested functions (or lambda expressions) are called. In
particular, there is an important distinction between a call where
it is known which nested function/lambda is being called, and a
call where that is not known, for example by virtue of having been
passed as a function argument, with the call being done through a
formal parameter. This distinction is analogous to calling an
outside function using its name versus calling a function through
a pointer to function, where it isn't known what the pointer might
point to.

When calling a function using the function's name, it is possible
to expand the function body inline. (The inline expansion might
be done at link time, but conceptually that is no different.)
Conversely, when calling through a pointer-to-function, doing an
inline expansion isn't feasible in general, because we don't know
what function is going to be called, and also there might be more
than one.

The same sort of distinction occurs with calls to nested functions
(or equivalently lambdas). The key point is that when packaging
up a nested function plus data into a closure, they all have to
look the same: one pointer to function and one pointer to data.
That's because down the line the ultimate caller doesn't know
where the closure came from, so all closures have to be
structurally identical.

When I talk about closures, my built-in assumption is that what is
being talked about is the don't-know-who-is-being-called case.
Any direct call to a nested function doesn't need a closure (there
might be one as a matter of convenience, but there doesn't have to
be), so any idea of adjusting the call sites doesn't apply when
we're talking about closures (again, to be clear, in the way I use
the term closure).

My sense now is that you are talking (mostly? exclusively?) about
the direct call case. Is that a fair read of your comments?
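
The "one pointer to function and one pointer to data" shape can be written down directly in C. What follows is a minimal sketch under that assumption; `struct closure`, `call_closure`, `apply_twice` and the two example bodies are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Every closure has the same shape, whatever it captured:
   one pointer to code and one pointer to data. */
struct closure {
    int (*code)(void *env, int arg);
    void *env;
};

/* The ultimate caller knows nothing about where the closure came from. */
int call_closure(struct closure c, int arg)
{
    assert(c.code != NULL);
    return c.code(c.env, arg);
}

/* Two hypothetical closure bodies with differently shaped environments. */
struct adder_env { int n; };
int add_n(void *env, int arg)
{
    const struct adder_env *e = env;
    return e->n + arg;
}

struct scale_env { int factor; int offset; };
int scale(void *env, int arg)
{
    const struct scale_env *e = env;
    return e->factor * arg + e->offset;
}

/* A higher-order function: the call goes through a formal parameter. */
int apply_twice(struct closure c, int arg)
{
    return call_closure(c, call_closure(c, arg));
}

int closure_demo(void)
{
    struct adder_env a = { 5 };
    struct scale_env s = { 3, 1 };
    struct closure add5 = { add_n, &a };
    struct closure lin  = { scale, &s };
    return apply_twice(add5, 1) + apply_twice(lin, 1);
}
```

`apply_twice` cannot adjust its call site per callee, since it sees only the uniform pair. A direct call to `add_n` or `scale`, by contrast, could pass the captured values however the compiler likes.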