| Started by | Rock Brentwood <rockbrentwood@gmail.com> |
|---|---|
| First post | 2022-05-16 12:27 -0700 |
| Last post | 2022-05-21 17:24 +0000 |
| Articles | 5 — 4 participants |
Fortran to C/C++ translation: a running example. Rock Brentwood <rockbrentwood@gmail.com> - 2022-05-16 12:27 -0700
Re: Fortran to C/C++ translation: a running example. Thomas Koenig <tkoenig@netcologne.de> - 2022-05-17 14:59 +0000
Re: Fortran to C/C++ translation: a running example. Lydia Marie Williamson <lydiamariewilliamson@gmail.com> - 2022-05-20 16:34 -0700
Re: Fortran to C/C++ translation: a running example. gah4 <gah4@u.washington.edu> - 2022-05-21 09:31 -0700
Re: Fortran to C/C++ translation: a running example. Thomas Koenig <tkoenig@netcologne.de> - 2022-05-21 17:24 +0000
| From | Rock Brentwood <rockbrentwood@gmail.com> |
|---|---|
| Date | 2022-05-16 12:27 -0700 |
| Subject | Fortran to C/C++ translation: a running example. |
| Message-ID | <22-05-032@comp.compilers> |
The classic text-based computer game Zork / dungeon was originally devised on MIT computers in a LISP offshoot (MDL), and translated to Fortran 77 by an "Anonymous" author. Some time later an enterprising soul converted a version of the Fortran edition of Zork into C ... pre-ANSI C ... with the aid of an earlier version of "f2c", but left no detailed paper trail behind on the actual translation process and stages.

I think this is the kind of project our moderator would really like. It's been retranslated from Fortran (with the aid of a later version of "f2c") here:

https://github.com/LydiaMarieWilliamson/zork-fortran

Every intermediate stage of the process is archived in the history log and commit history. This was carried out in tandem with a revision of the Fortran source itself (as Fortran 2018 no longer supports all of Fortran 77), and an upward revision of the 1991 translation into C99. Both the newer C translation, from 2021, and the 2021 revision of the older 1991 C translation have converged on the same result.

A key issue that arises, which led to later revision in the Fortran standard, is the lack of information required to distinguish between parameters that are input-only, output-only, or input/output. That has to be inferred, which requires either transparency of library functions (here: the functions in the f2c library or whatever is written in its place) or I/O specifications in the library functions. So, a "strength reduction" step is required to lift input/output parameters (the default) to input-only or output-only.

A similar issue arises with locals, which are "static", by default, in Fortran (or the Fortran equivalent of "static"). A "strength reduction" step is required to lift non-static locals to bona fide "auto" locals.

Another key issue is the aliasing that goes on with "equivalence" constructs. There is no good uniform translation for this into C ... it actually better fits C++, where you have reference types available. There's really no good reason why those have been left out of C, when other things which appeared first in C++, like "const", "bool" or function prototypes, found their way into C.

However, a substantial chunk of use-cases for equivalence constructs can be carved out as "enum" types, so there was a strength reduction step for this, too.

Perhaps the moderator will have more to say about the intricacies of Fortran translation. In the meanwhile, another project has already been staged for conversion to C++ - LAPACK

https://github.com/LydiaMarieWilliamson/lapack

but is in a holding pattern for now. This one will more heavily involve the synthesis of "template" types. To date, ongoing attempts, elsewhere, have been mostly limited to creating C or C++ shells for the Fortran core, rather than a conversion of the core itself.

[It's been at least 20 years since I've done any sort of Fortran translation so for this maze of twisty little passages, I'm afraid you're on your own. I'm always surprised in translation exercises how many ways that languages that look superficially the same are different in ways that make the translation much harder. -John]
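To make the input-only / output-only / input-output distinction concrete, here is a minimal sketch in C. The routine and its names (`scale_by_pointer`, `scale_by_intent`) are hypothetical and not taken from the Zork sources or from actual f2c output; they only illustrate the shape of the change once the intent of each argument has been inferred.

```c
#include <assert.h>

/* Pointer-everywhere shape, as a mechanical translation would produce:
   the signature alone does not say which arguments are read or written. */
void scale_by_pointer(int *n, double *factor, double *x, double *sum)
{
    *sum = 0.0;
    for (int i = 0; i < *n; i++) {
        x[i] *= *factor;
        *sum += x[i];
    }
}

/* After inferring intent (an assumption of this sketch):
   n and factor are input-only, so they are passed by value;
   x is input/output, so it stays a pointer;
   sum is output-only, so it becomes the return value. */
double scale_by_intent(int n, double factor, double *x)
{
    double sum = 0.0;
    assert(n >= 0 && x != 0);
    for (int i = 0; i < n; i++) {
        x[i] *= factor;
        sum += x[i];
    }
    return sum;
}
```

Both functions compute the same thing; the second simply records in its signature what had to be inferred from the body of the first.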
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2022-05-17 14:59 +0000 |
| Message-ID | <22-05-036@comp.compilers> |
| In reply to | #3006 |
Rock Brentwood <rockbrentwood@gmail.com> wrote:
[...]
> A key issue that arise, which led to later revision in the Fortran standard,
> is the lack of information required to distinguish between parameters that are
> input-only, output-only, input/output.

Nit: In Fortran, "parameters" are what you would call "constants" in another language. Arguments to functions or subroutines are called "dummy arguments", which are then associated with "actual arguments" on the caller's side.

> That has to be inferred, which requires
> either transparency of library functions (here: the functions in the f2c
> library or whatever is written in its place) or I/O specifications in the
> library functions. So, a "strength reduction" step is required to lift
> input/output parameters (the default) to input-only or output-only.

"Strength reduction" is a term normally used for something else, for example when replacing multiplication (as in a loop for array processing) by addition.

It's a question of the semantics of the code. For something like (C side)

    aux_var = 5;
    foo (&aux_var);

you can almost certainly rewrite foo to take a value argument.

> A similar issue arises with locals, which are "static", by default, in Fortran
> (or the Fortran equivalent of "static"). A "strength reduction" step is
> required to lift non-static locals to bona fide "auto" locals.

The FORTRAN language never guaranteed that variables would keep their data unless SAVE was specified, but many compilers did it anyway, so the code may indeed assume so.

Some experimentation on the Fortran side can help there. Compiling the code with -frecursive and/or with one of the -finit-integer and -finit-real options (I'm talking gfortran options here, but other compilers have similar ones) will help you find trouble spots.

If you happen to have access to nagfor, they have a -C=all option which will find very many bugs in code that people think correct, even more with -C=undefined.

> Another key issue the aliasing that goes on with "equivalence" constructs.
> There is no good uniform translation for this into C ...

The question is - what is equivalence used for? Something sane?

Generally, C's unions are a good match for Fortran's equivalence, with the same problem with undefined behavior if the unions are used for type punning.

> it actually better
> fits C++, where you have reference types available. There's really no good
> reason why those have been left out of C, when other things which appeared
> first in C++, like "const", "bool" or function prototypes, found their way
> into C.
>
> However, a substantial chunk of use-cases for equivalence constructs can be
> carved out as "enum" types, so there was a strength reduction step for this,
> too.
>
> Perhaps the moderator will have more to say about the intricacies of Fortran
> translation. In the meanwhile, another project has already been staged for
> conversion to C++ - LAPACK
>
> https://github.com/LydiaMarieWilliamson/lapack
>
> but is in a holding pattern for now. This one will more heavily involve the
> synthesis of "template" types. To date, ongoing attempts, elsewhere, have been
> mostly limited to creating C or C++ shells for the Fortran core, rather than a
> conversion of the core, itself.

Fortran has guarantees on the semantics which are quite well tuned for optimization. Converting it into C or C++ may well lose execution speed.
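On the question of static versus automatic locals raised in the post above, the following C sketch shows the two cases a translator has to tell apart. Both routines are hypothetical illustrations, not code from the Zork translation: one depends on its local surviving between calls (the SAVE-like case), the other always assigns its local before reading it.

```c
#include <assert.h>

/* Relies on the local keeping its value between calls, so it has to
   stay static (this is the behaviour SAVE guarantees in Fortran). */
int next_ticket(void)
{
    static int counter = 0;   /* value survives across calls */
    counter += 1;
    return counter;
}

/* The local is always assigned before it is read, so nothing depends
   on its previous value and it can become an automatic variable. */
int sum_to(int n)
{
    int total = 0;            /* re-initialized on every call */
    assert(n >= 0);
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Deciding which case applies is a property of the routine's data flow, which is why checks on the Fortran side, such as the compiler options mentioned above, are useful before demoting a local from static to automatic.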
| From | Lydia Marie Williamson <lydiamariewilliamson@gmail.com> |
|---|---|
| Date | 2022-05-20 16:34 -0700 |
| Message-ID | <22-05-038@comp.compilers> |
| In reply to | #3006 |
On Monday, May 16, 2022 at 2:53:09 PM UTC-5, Rock Brentwood wrote:
> Another key issue the aliasing that goes on with "equivalence" constructs.
> There is no good uniform translation for this into C ... it actually better
> fits C++, where you have reference types available. There's really no good
> reason why those have been left out of C, when other things which appeared
> first in C++, like "const", "bool" or function prototypes, found their way
> into C.
>
> However, a substantial chunk of use-cases for equivalence constructs can be
> carved out as "enum" types, so there was a strength reduction step for this,
> too.

This is not exactly correct. It's "common blocks" that were handled in this way.

In the Fortran source of Zork/dungeon, the "equivalence" statements and "common blocks" were used together, so it's easy to get the issue confused. I don't know if their being used together is something that always happened in Fortran, or if it was just particular to this program.

> In the meanwhile, another project has already been staged for
> conversion to C++ - LAPACK
>
> https://github.com/LydiaMarieWilliamson/lapack
>
> but is in a holding pattern for now.

There were several stages to the translation, one of which involved regularizing and normalizing the Fortran itself. This is also on the local machines here. But while that was happening, LAPACK came back alive, and is out on GitHub and being actively maintained again. Originally, it was (mostly) inert.

> [It's been at least 20 years since I've done any sort of Fortran translation
> so for this maze of twisty little passages, I'm afraid you're on your own.
> I'm always surprised in translation exercises how many ways that languages
> that look superficially the same are different in ways that make the translation much harder. -John]

Things would be easier going into C++, instead of C, since it already has aliasing, operator overloading, redefinable array indexing, and call-by-reference. This inclusion of more Fortran-friendly features into C++ was apparently done intentionally.

[It was not unusual to use common and equivalence together, particularly when memory was tight. But equivalence is like a union, not an enum. -John]
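As an illustration of the "equivalence is like a union" remark, here is a minimal C sketch of the memory-saving use: two work arrays that are never needed at the same time share one block of storage. The names (`scratch`, `iwork`, `rwork`) and the two phases are assumptions made up for this example; in Fortran the sharing would come from something like an EQUIVALENCE between two work arrays.

```c
#include <assert.h>

/* Two views of the same storage, used at different times. */
union scratch {
    int   iwork[8];
    float rwork[8];
};

/* Phase 1: only the integer view is written and read. */
int count_phase(union scratch *s)
{
    int sum = 0;
    assert(s != 0);
    for (int i = 0; i < 8; i++) {
        s->iwork[i] = i;
        sum += s->iwork[i];
    }
    return sum;
}

/* Phase 2: the same bytes are reused as floats. Every element is
   written before it is read, so no value stored as int is ever
   reinterpreted as float (no type punning takes place). */
float average_phase(union scratch *s)
{
    float sum = 0.0f;
    assert(s != 0);
    for (int i = 0; i < 8; i++) {
        s->rwork[i] = (float)i;
        sum += s->rwork[i];
    }
    return sum / 8.0f;
}
```

The sketch deliberately stays on the safe side: each phase treats the union purely as reusable storage, which sidesteps the type-punning concerns raised earlier in the thread.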
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2022-05-21 09:31 -0700 |
| Message-ID | <22-05-041@comp.compilers> |
| In reply to | #3011 |
On Saturday, May 21, 2022 at 8:54:47 AM UTC-7, Lydia Marie Williamson wrote:

(snip on COMMON and EQUIVALENCE)

> This is not exactly correct. It's "common blocks" that were handled in this
> way.
> In the Fortran source of Zork/dungeon, the "equivalence" statements and
> "common blocks" were used together, so it's easy to get the issue confused. I
> don't know if their being used together is something that always happened in
> Fortran, or if it was just particular to this program.

COMMON and EQUIVALENCE are closely related in the Fortran standard, and in the implementation by compilers. A variable equivalenced to a variable in common is also in common. Such a variable can extend the length of the common block, but only at the end, not the beginning.

It used to be that compilers would print out a variable map, with the address, or offset, of each variable, and its length and type. That was often useful to be sure that the compiler did what you thought it did. Also, it would include the length of each common block, again good to check to be sure they agree with what you expect.

The Fortran standard has a C interoperability feature that explains how Fortran features and C features work together.
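A rough C counterpart of such a variable map can be produced with `offsetof` and `sizeof`, which is handy for checking that a struct standing in for a common block is laid out as expected. The block below is a sketch under assumptions: `struct blk` models a hypothetical common block with members A, B(3) and C, and the function name `print_variable_map` is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical common block /BLK/ A, B(3), C modelled as a struct.
   Note that a C compiler may insert padding between members of mixed
   types, whereas Fortran prescribes the storage sequence. */
struct blk {
    int a;
    int b[3];
    int c;
};

/* Print one row: member name, byte offset, size in bytes. */
#define MAP_ROW(out, member) \
    fprintf((out), "%-6s %6zu %6zu\n", #member, \
            offsetof(struct blk, member), \
            sizeof(((struct blk *)0)->member))

/* Print the whole map and return the total length of the block. */
size_t print_variable_map(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "%-6s %6s %6s\n", "name", "offset", "bytes");
    MAP_ROW(out, a);
    MAP_ROW(out, b);
    MAP_ROW(out, c);
    fprintf(out, "%-6s %6s %6zu\n", "total", "", sizeof(struct blk));
    return sizeof(struct blk);
}
```

Calling `print_variable_map(stdout)` lists each member with its offset and size, followed by the total length, which can then be compared against the length expected from the Fortran declaration.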
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2022-05-21 17:24 +0000 |
| Message-ID | <22-05-042@comp.compilers> |
| In reply to | #3011 |
Lydia Marie Williamson <lydiamariewilliamson@gmail.com> wrote:
> On Monday, May 16, 2022 at 2:53:09 PM UTC-5, Rock Brentwood wrote:
>> Another key issue the aliasing that goes on with "equivalence" constructs.
>> There is no good uniform translation for this into C ... it actually better
>> fits C++, where you have reference types available. There's really no good
>> reason why those have been left out of C, when other things which appeared
>> first in C++, like "const", "bool" or function prototypes, found their way
>> into C.
>>
>> However, a substantial chunk of use-cases for equivalence constructs can be
>> carved out as "enum" types, so there was a strength reduction step for this,
>> too.
>
> This is not exactly correct. It's "common blocks" that were handled in this way.
>
> In the Fortran source of Zork/dungeon, the "equivalence" statements and
> "common blocks" were used together, so it's easy to get the issue confused. I
> don't know if their being used together is something that always happened in
> Fortran, or if it was just particular to this program.
Fortran has the concept of storage association - under certain
circumstances, the ordering of variables is prescribed by the
standard.
COMMON blocks are one example of this. Taking an example from the
original Fortran source code:
      COMMON /SYNTAX/ VFLAG,DOBJ,DFL1,DFL2,DFW1,DFW2,
     &  IOBJ,IFL1,IFL2,IFW1,IFW2
This declares a common block /SYNTAX/ with 11 named variables
(all of them integers due to an IMPLICIT INTEGER (A-Z) earlier in
all files), which have to be contiguous in memory.
The next line
      INTEGER SYN(11)
declares an integer array with 11 elements.
Finally, the statement
      EQUIVALENCE (VFLAG, SYN)
tells the compiler that the address of the (first element of) SYN
and VFLAG are the same.
So, you can now use SYN(1) to refer to VFLAG, SYN(2) to DOBJ and so on.
Why is this done? I see only one use case, in np3.for
      DO 10 I=1,11
C                                       !CLEAR SYNTAX.
        SYN(I)=0
10    CONTINUE
simply to create a shortcut for clearing the syntax.
This is a benign (and standard-conforming) way of using COMMON
and EQUIVALENCE. Equivalent C code might create a 'struct syntax'
and clear it with a memset, or have 11 individual variables and
zero them individually.
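A minimal sketch of the 'struct syntax' variant described above, assuming lowercase member names that mirror the 11 integers of COMMON /SYNTAX/ (the C names and the helper `clear_syntax` are this example's choice, not taken from the existing C translation):

```c
#include <assert.h>
#include <string.h>

/* One member per variable in COMMON /SYNTAX/, in the same order. */
struct syntax {
    int vflag, dobj, dfl1, dfl2, dfw1, dfw2;
    int iobj, ifl1, ifl2, ifw1, ifw2;
};

/* Counterpart of the DO 10 loop over SYN(1..11): clear every field
   in one call instead of going through an aliased array. */
void clear_syntax(struct syntax *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
}
```

With this shape the SYN array and the EQUIVALENCE disappear entirely, since their only job in the Fortran source was to provide the loop with something to index.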