comp.lang.c, thread #394488 (unrolled)

Nice way of allocating flexible struct.

Started by	Kaz Kylheku <643-408-1753@kylheku.com>
First post	2025-10-08 06:35 +0000
Last post	2025-12-14 22:48 -0800
Articles	20 on this page of 73 — 14 participants


  Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-08 06:35 +0000
    Re: Nice way of allocating flexible struct. pozz <pozzugno@gmail.com> - 2025-10-08 09:09 +0200
      Re: Nice way of allocating flexible struct. richard@cogsci.ed.ac.uk (Richard Tobin) - 2025-10-08 12:01 +0000
      Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-08 15:23 +0000
        Re: Nice way of allocating flexible struct. Michael S <already5chosen@yahoo.com> - 2025-10-08 19:04 +0300
          Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-08 20:05 +0000
      Re: Nice way of allocating flexible struct. Michael S <already5chosen@yahoo.com> - 2025-10-08 18:52 +0300
      Re: Nice way of allocating flexible struct. Tim Rentsch <tr.17687@z991.linuxsc.com> - 2025-12-14 23:55 -0800
    Re: Nice way of allocating flexible struct. Bonita Montero <Bonita.Montero@gmail.com> - 2025-10-08 11:09 +0200
      Re: Nice way of allocating flexible struct. Bonita Montero <Bonita.Montero@gmail.com> - 2025-10-08 11:23 +0200
        Re: Nice way of allocating flexible struct. Michael S <already5chosen@yahoo.com> - 2025-10-08 12:53 +0300
          Re: Nice way of allocating flexible struct. Bonita Montero <Bonita.Montero@gmail.com> - 2025-10-08 12:09 +0200
            Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-08 15:59 +0200
              Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-08 12:29 -0500
                Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-08 21:04 +0200
                  Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-08 22:49 -0500
                    Re: Nice way of allocating flexible struct. bart <bc@freeuk.com> - 2025-10-10 01:13 +0100
                      Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-10 01:54 +0000
                        Re: Nice way of allocating flexible struct. "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-10-09 19:43 -0700
                        Re: Nice way of allocating flexible struct. bart <bc@freeuk.com> - 2025-10-10 11:25 +0100
                      Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-09 22:50 -0500
                      Re: Nice way of allocating flexible struct. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-10-09 20:59 -0700
                        Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-10 01:27 -0500
                          Re: Nice way of allocating flexible struct. David Brown <david.brown@hesbynett.no> - 2025-10-10 12:06 +0200
                            Re: Nice way of allocating flexible struct. Michael S <already5chosen@yahoo.com> - 2025-10-10 17:28 +0300
                              Re: Nice way of allocating flexible struct. David Brown <david.brown@hesbynett.no> - 2025-10-10 17:47 +0200
                                Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-10 16:32 -0500
                                  Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-11 00:02 +0000
                                    Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-13 06:20 +0200
                            Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-10 15:01 -0500
                            Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-10 23:45 +0000
                        Re: Nice way of allocating flexible struct. Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-01-06 18:24 -0800
                    Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-14 06:29 +0200
                      Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-14 20:13 -0500
                        Re: Nice way of allocating flexible struct. bart <bc@freeuk.com> - 2025-10-15 11:26 +0100
                          Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-15 13:00 -0500
                            Re: Nice way of allocating flexible struct. bart <bc@freeuk.com> - 2025-10-17 22:07 +0100
                              Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-17 17:44 -0500
                            Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-20 10:02 +0200
                              Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-20 04:42 -0500
                                Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-21 03:40 +0200
                          Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-16 06:45 +0200
                        Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-16 06:37 +0200
                          Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-16 04:43 -0500
                            Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-20 09:58 +0200
                              Re: Nice way of allocating flexible struct. rbowman <bowman@montana.com> - 2025-10-20 18:36 +0000
                                Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-20 16:44 -0500
                                  Re: Nice way of allocating flexible struct. rbowman <bowman@montana.com> - 2025-10-21 01:33 +0000
                                  Language-design, tradeoffs (was Re: Nice way of allocating flexible struct.) Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-21 04:19 +0200
                                    Re: Language-design, tradeoffs (was Re: Nice way of allocating flexible struct.) BGB <cr88192@gmail.com> - 2025-10-21 04:27 -0500
                                      Re: Language-design, tradeoffs (was Re: Nice way of allocating flexible struct.) Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-22 02:30 +0200
                                        Re: Language-design, tradeoffs (was Re: Nice way of allocating flexible struct.) BGB <cr88192@gmail.com> - 2025-10-22 02:10 -0500
                                Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-21 03:51 +0200
                                  Re: Nice way of allocating flexible struct. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-10-20 19:21 -0700
                                    Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-21 04:53 +0200
                                    Re: Nice way of allocating flexible struct. rbowman <bowman@montana.com> - 2025-10-21 18:21 +0000
                                      Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-21 13:42 -0500
                                  Re: Nice way of allocating flexible struct. James Kuyper <jameskuyper@alumni.caltech.edu> - 2025-10-21 06:17 -0400
                                    Re: Nice way of allocating flexible struct. rbowman <bowman@montana.com> - 2025-10-21 18:41 +0000
                                  Re: Nice way of allocating flexible struct. rbowman <bowman@montana.com> - 2025-10-21 18:12 +0000
              Re: Nice way of allocating flexible struct. Bonita Montero <Bonita.Montero@gmail.com> - 2025-10-08 19:36 +0200
                Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-08 19:51 +0200
      Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-08 15:29 +0000
    Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-08 11:33 -0500
      Re: Nice way of allocating flexible struct. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-10-08 14:57 -0700
        Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-09 01:39 +0000
          Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-08 22:25 -0500
            Re: Nice way of allocating flexible struct. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-10-09 19:50 -0700
              Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-10 04:20 +0000
        Re: Nice way of allocating flexible struct. Tim Rentsch <tr.17687@z991.linuxsc.com> - 2025-12-15 11:24 -0800
    Re: Nice way of allocating flexible struct. "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-10-08 13:35 -0700
      Re: Nice way of allocating flexible struct. "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-10-08 13:36 -0700
    Re: Nice way of allocating flexible struct. Tim Rentsch <tr.17687@z991.linuxsc.com> - 2025-12-14 22:48 -0800


#394523

From	BGB <cr88192@gmail.com>
Date	2025-10-09 22:50 -0500
Message-ID	<10c9vsr$3p286$1@dont-email.me>
In reply to	#394519

On 10/9/2025 7:13 PM, bart wrote:
> On 09/10/2025 04:49, BGB wrote:
>> On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
>>> On 08.10.2025 19:29, BGB wrote:
> 
>> Though, similar was often a problem in my other language design 
>> attempts: The most efficient way to do things was often also the C way.
>>
>>
>>
>> The only real exception I have found to this rule basically being in 
>> relation to some features I have borrowed from languages like GLSL and 
>> Verilog. But, some of this stuff isn't so much making the language 
>> "higher level" as much as "being easier to map to ISA features and 
>> optimize".
>>
>> Say:
>>    vd[62:52]=vs[20:10];
>> Being easier to optimize than, say:
>>    vd=(vd&(~(2047ULL<<52)))|(((vs>>10)&2047ULL)<<52);
> 
> Using special bit-features makes it easier to generate decent code for a 
> simple compiler.
> 
> But gcc for example has no trouble optimising that masking/shifting 
> version.
> 

BGBCC is not so clever...

Granted, its code footprint is tiny vs GCC, and it can do a full rebuild 
in a few seconds (with effectively the entire "compiler toolchain" in a 
single binary).

Like, GCC and LLVM are both very large (over 10M lines).

By contrast, BGBCC is still in kLOC territory.

Granted, it is still not that small; it is still pretty big compared with 
something like Doom. Alas, I haven't really been able to fit a C 
compiler into a Doom-like code footprint (say, keeping a C 
compiler under 30k lines).

I did start making an attempt at one point, but ended up dropping the 
effort after I had already exceeded a Doom-like code footprint and it 
still wasn't very close to being done.

It is a little easier with an interpreter, but if one wants sensible native 
code generation, doing it within a small footprint is difficult.

So, as-is, I have a compiler that is roughly around the size of the 
Quake 2 engine...

> (It can do it in four x64 instructions, whereas I need nine working from 
> vd.[62..52] := vs.[20..10]. It could be improved though; I don't need to 
> extract the data to bits 10..0 first for example.)
> 
> The main advantage is that it is a LOT easier to write, read and 
> understand. The C would need macros to make it practical.
> 

Shifts, masks, and macros are more traditional, but as noted, with my 
compiler the explicit bit notation is easier to optimize, as well as to 
read and write.
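For comparison, the shift-and-mask form can be wrapped in small C helpers so that `vd[62:52]=vs[20:10]` reads almost as compactly. This is a minimal sketch, not BGBCC code: the names `bits_get`, `bits_set`, and `copy_field` are hypothetical, and it assumes 64-bit unsigned operands with constant, in-range bounds (`lo <= hi < 64`).

```c
#include <stdint.h>

/* Hypothetical helper: extract bits [hi:lo] of v (assumes lo <= hi < 64). */
static inline uint64_t bits_get(uint64_t v, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    /* Avoid the undefined 1ULL << 64 when the full word is requested. */
    uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
    return (v >> lo) & mask;
}

/* Hypothetical helper: dst with bits [hi:lo] replaced by the low bits of val. */
static inline uint64_t bits_set(uint64_t dst, unsigned hi, unsigned lo,
                                uint64_t val)
{
    unsigned width = hi - lo + 1;
    uint64_t mask = ((width >= 64) ? ~0ULL : ((1ULL << width) - 1)) << lo;
    return (dst & ~mask) | ((val << lo) & mask);
}

/* Equivalent of vd[62:52]=vs[20:10]. With constant bounds, an optimizing
   compiler is free to fold this into the same masks as the hand-written
   (vd&(~(2047ULL<<52)))|(((vs>>10)&2047ULL)<<52) form. */
static inline uint64_t copy_field(uint64_t vd, uint64_t vs)
{
    return bits_set(vd, 62, 52, bits_get(vs, 20, 10));
}
```

Whether a simpler compiler gets the same code out of this depends on how well it inlines and constant-folds, which is the point being made above.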

> 
>> Though, Verilog itself, not so much... Works well in an ASIC or FPGA, 
>> not so much on a CPU.
>>
>> Though, as can be noted:
>>    Bit-ranges are required to be constant at compile time;
>>    When used with normal integer types, both bounds are required.
> 
> I can handle some variable elements, but it gets rapidly complicated. At 
> some point it needs to use library functions to do the work.
> 

In my case, I only allowed constant ranges here.

If runtime calls were used, they would eat any possible savings; the 
ability to generate efficient code here falls on its face if the ranges 
are non-constant.

>>
>> OTOH, GLSL offers nice and efficient ways to deal with SIMD.
>> Well, and also having some types for bit-preserving casts.
>> Or ability to specify endianess and alignment for individual struct 
>> members.
>> ...
>>
>>
>>>>
>>>> Granted, a few of my own language design attempts ended up with a
>>>> different mess:  [...]
>>>
>>> A sensibly defined language isn't something easily to create
>>> or obtain! - Personally I'd have appreciated it more if more
>>> designers of "own languages" have oriented their designs on
>>> sensible existing and proven concepts. - There may be a
>>> "market" for all these "own languages", I don't know, but I
>>> also don't care much, given what I've seen or heard of yet.
>>> (This isn't meant to be offensive, just to be clear, only
>>> that I don't care much. As compiler writers don't care much
>>> what I think.)
>>>
>>
>> Yeah.
>>
>> They have either tended to not amount to much, or converged towards 
>> more conventional languages.
>>
>>
>>
>>>> [ attempt for a discussion on features of "own language"
>>>>    snipped; not my business ]
> 
> (There are those who can devise and use their own languages, and those 
> who can't.)
> 
>> Some amount of my stuff recently has involved various niche stuff.
>>    Interfacing with hardware;
>>    Motor controls;
>>    Implementing things like an OpenGL back-end or similar;
>>    Being used for a Boot ROM and OS kernel;
>>    Sometimes neural nets.
> 
> Some impressive stuff.
> 

Yes, and mostly in the C domain.

I have my own experimental ISA which was partly designed with the 
intention of using it for a mix of motor controls and computer vision.

Mostly I have just ended up using it for running ports of '90s-era games 
(and otherwise mostly using ye olde ARM). Partly, one is still 
hard-pressed to get an FPGA to be performance-competitive with something 
like a RasPi, and a RasPi is cheaper.

But, for my own ISA, I did end up writing the firmware and OS (including 
OpenGL) using my own compiler, although mostly in an extended C dialect.

And I used my own C compiler because:
I could throw it together for my own ISA and design experiments;
GCC or Clang would have been too much of an uphill battle;
LCC would still have left me needing to do most of the relevant work myself;
...

Compared against GCC targeting RV64, my stuff gets better performance, 
with a caveat: BGBCC typically loses if both compilers are limited to 
plain RV64G, but my compiler targeting my ISA can beat GCC when GCC is 
limited to RV64G.

Though, with a few carefully selected extensions, RISC-V can be brought 
into a similar performance profile as my own ISAs:
   Indexed Load and Store;
   Load/Store Pair (load or store pairs of 2 registers at a time);
   Jumbo Prefixes (can expand immediate values from 12 to 33 bits).

In some programs, this combination can get a 40-60% speedup over plain 
RV64G.
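To put the immediate-size point in concrete terms: a signed 12-bit field only covers -2048..2047, so larger constants cost extra instructions in plain RV64G (typically a LUI/ADDI pair for 32-bit values, more for wider ones), while a signed 33-bit field covers -2^32..2^32-1. The helper below is a hypothetical sketch, not part of any toolchain; it only checks whether a constant fits a signed N-bit immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does v fit in a signed immediate of 'bits' bits?
   Assumes 1 <= bits <= 63. */
static bool fits_simm(int64_t v, unsigned bits)
{
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    return v >= min && v <= max;
}
```

For example, a 4096-byte stack adjustment already misses the 12-bit range but fits comfortably in 33 bits.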

A lot of other things are possible, but the gains are generally a lot 
smaller.

As I can note, the list of ISAs supported by my compiler looks kinda like:
   SuperH: SH4
     BJX1 (an extended variant of SH4).
       (Split into several variants)
     ( Not currently maintained )
   BJX2 (Current ISA family)
     XG1: Original form of the ISA
     XG2: Intermediate form
     XG3: Reworked to coexist better with RISC-V.
   RISC-V
     RV64G/RV64GC
     Various optional extensions.

While RISC-V exists and is popular, I have not fully jumped over to RV: 
in its basic form, its performance is a little weak (partly due to weak 
areas and "foot guns"). Its performance can be improved, but there are 
limits.

The XG3 variant is promising, but is essentially XG2 and RV64G just sort 
of awkwardly hot-glued together. I don't expect it will see widespread 
adoption (even if it does get reasonably good performance). Like, an ISA 
design that is two unrelated ISAs glued together isn't necessarily the 
most elegant solution (even if XG3's encoding scheme was able to clean 
up some of the dog chew in XG2).

Pretty much all of the normal RV64G (or RV64I) encodings are still 
usable in XG3, just with trade-offs (like, RV64 encodings have split X/F 
registers whereas XG3 encodings have a unified register space, ...).

> 
>> Some features are useful in some contexts but not others:
>> For example, "__int128" is very helpful when writing FPU-emulation 
>> code for Binary128 handling, but has a lot fewer use-cases much beyond 
>> this.
>>
>> Or, like:
>>    exp=vala[126:112];  //extract exponent
>>    fra=(_BitInt(128)) { 0x0001i16, vala[111:0]};  //extract fraction
> 
> I had i128/u128 types at one point (quite a nice implementation too; it 
> was only missing full 128-bit divide, I had only 128/64.)
> 
> But the only place they got used was implementing 128-bit support in the 
> self-hosted compiler and its library! So they were dropped.
> 

I have them in my compiler, and to some extent in my ISA, but one of the 
main use cases I have for them is implementing Binary128 support code 
(or, "long double").
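As a rough illustration of that use case, here are the two lines from the quoted example (`vala[126:112]` and `{0x0001i16, vala[111:0]}`) in C using GCC/Clang's `unsigned __int128`, which is a compiler extension rather than ISO C. The function names are hypothetical, and the significand helper assumes a normal number (nonzero exponent), since it inserts the implicit leading 1.

```c
#include <stdint.h>

/* GCC/Clang extension, not ISO C. */
typedef unsigned __int128 u128;

/* Equivalent of exp=vala[126:112]: the 15-bit biased exponent. */
static inline unsigned bin128_exponent(u128 v)
{
    return (unsigned)((v >> 112) & 0x7FFF);
}

/* Equivalent of {0x0001i16, vala[111:0]}: the 112-bit fraction with the
   implicit leading 1 at bit 112 (valid for normal numbers only). */
static inline u128 bin128_significand(u128 v)
{
    const u128 frac_mask = (((u128)1) << 112) - 1;
    return (((u128)1) << 112) | (v & frac_mask);
}
```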

RISC-V has no native support for "__int128", and when it is used, many of 
the operations end up as runtime calls. It is more awkward in RISC-V as 
well, as there also isn't really a good or efficient way to implement 
128-bit math using the available 64-bit instructions.

I ended up adding a modified form of the 'Q' instructions from RISC-V, 
on the reasoning that:
"long double" is rarely used enough that the cost of handling it with 
emulation traps is acceptable;
But, common enough that you don't want it to be too horribly slow;
My ISA has access to some 128-bit integer operations;
On the RISC-V side, the cost of doing everything with 64-bit integer 
math is slow enough to offset the cost of the emulation traps.

It is not quite the same as the Q extension, though: it uses register 
pairs rather than 128-bit registers. My sentiment here is that for 
low-traffic uses (128-bit integer and floating-point stuff in general), 
pairs of 64-bit registers are preferable.

In the underlying hardware, supporting 128-bit integer math was a 
cheaper option if compared with 128-bit FPU hardware; and int128 also 
sees a little more traffic.

As for ISA level support for int128:
   ADD/SUB, Native (ALU chaining)
     Some control bits are needed to merge Carry-Select across the units;
     Carry-select scales well, so not too much added latency cost.
   Shift, Native (shift units ganged)
     Can use two 64-bit funnel shifters in parallel with some trickery.
   AND/OR/XOR: 2x ALU in parallel; No special handling needed.
   ...
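The ADD and shift rows above map naturally onto pairs of 64-bit registers. A minimal C sketch of the same data flow, assuming a hypothetical `pair128` layout; it models what the chained units compute, not the actual hardware or BGBCC's code generation.

```c
#include <stdint.h>

/* A 128-bit integer held as a pair of 64-bit "registers" (assumed layout). */
typedef struct { uint64_t lo, hi; } pair128;

/* ADD: add the low halves, then feed the carry into the high halves,
   like two 64-bit ALUs chained by a carry. */
static pair128 add128(pair128 a, pair128 b)
{
    pair128 r;
    r.lo = a.lo + b.lo;
    uint64_t carry = (r.lo < a.lo);   /* unsigned wraparound means carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Left shift by n (0..127) using only 64-bit shifts, like two funnel
   shifters working in parallel. */
static pair128 shl128(pair128 a, unsigned n)
{
    pair128 r;
    n &= 127;
    if (n == 0) {
        r = a;
    } else if (n >= 64) {
        r.hi = a.lo << (n - 64);
        r.lo = 0;
    } else {
        /* bits shifted out of lo are funneled into hi */
        r.hi = (a.hi << n) | (a.lo >> (64 - n));
        r.lo = a.lo << n;
    }
    return r;
}
```

AND/OR/XOR need no such plumbing: each half is processed independently.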

This leaves MUL, DIV, etc., for which there is no viable way to handle 
them directly or efficiently in hardware. Currently, the fastest way to 
do a 128-bit multiply in this case is to build it from 32-bit 
widening-multiply instructions.

And, for Binary128 FPU multiply, one needs the high 128-bits of a 
128*128 -> 256 bit widening multiply.
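A sketch of what "building it from 32-bit widening multiplies" means, written as portable C schoolbook multiplication over four 32-bit limbs. The `u128_limbs` type and function names are hypothetical; only the low 128 bits of the product are kept.

```c
#include <stdint.h>

/* 128-bit value as four 32-bit limbs, least significant first (assumed). */
typedef struct { uint32_t w[4]; } u128_limbs;

static u128_limbs u128_from_u64(uint64_t hi, uint64_t lo)
{
    u128_limbs r = { { (uint32_t)lo, (uint32_t)(lo >> 32),
                       (uint32_t)hi, (uint32_t)(hi >> 32) } };
    return r;
}

/* Low 128 bits of a*b using only 32x32->64 widening multiplies.
   Partial products landing at or above bit 128 are dropped. */
static u128_limbs mul128_lo(u128_limbs a, u128_limbs b)
{
    u128_limbs r = { { 0, 0, 0, 0 } };
    for (int i = 0; i < 4; i++) {
        uint64_t carry = 0;
        for (int j = 0; i + j < 4; j++) {
            /* max: (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so no overflow */
            uint64_t t = (uint64_t)a.w[i] * b.w[j] + r.w[i + j] + carry;
            r.w[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
    }
    return r;
}
```

The low half needs 10 widening multiplies. For Binary128, where the high half is what matters, keeping all eight result limbs (letting `i + j` run up to 7 and storing each final carry) gives the full 256-bit product at a cost of 16 multiplies.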

>> Unless maybe something can come along that is a better C than C...
> 
> There are lots of new products, mostly too ambitious, too big and too 
> complex. But C is already ensconced everywhere.
> 

Yeah.

Note that trying to beat C at its own game is not about having the 
biggest possible feature list.

A feature which might help in one context might be actively detrimental 
in another.

Maybe OOP is itself optional, so a C-like subset basically has close to 
a 1:1 feature-set with C (and remains well suited for procedural code).

Likewise, one shouldn't need to pay for things like RTTI or exceptions 
when they are not used.

Also, ideally "don't pay for what you don't use".
Some compiler or language features exist, but if programmers don't use 
them, ideally they shouldn't need to pay for them.

Like, say, a language could have optional dynamic types; but if writing 
a small Boot ROM, then this sort of thing is strictly off-limits.

Like, say, if you have 32K of ROM space, you can't justify wasting it on 
non-essential features.

But, if a C alternative language is basically just C with a slightly 
different syntax, not necessarily all that compelling either.

...

>> Would likely simplify or eliminate some infrequently used features in C.
>>
>> Possibly:
>>    Preprocessor, still exists, but its role is reduced.
>>      Its role can be partly replaced by compiler metadata.
>>    Trigraphs and digraphs: Gone;
>>    K&R style declarations, also gone;
>>    Parser should not depend on previous declarations;
>>    Non trivial types and declarator syntax: Eliminate;
>>    ...
>>
>> Possibly:
>> Pointers and arrays can be specified on the type rather than 
>> declarator (so, more like C# here)
>> ...
>>
>> But, as I see it, drastically changing the syntax (like in Go or Rust) 
>> is undesirable. Contrast, say, C# style syntax was more conservative.
> 
> Nobody cares about C syntax. Learning all its ins and outs seems be a 
> rite of passage.
> 
> The trouble is that C-style is so dominant, few people would know what a 
> decent syntax looks like. Or, more, likely, they associate a clean, 
> well-designed syntax with toy or scripting languages, and can't take it 
> seriously.
> 
> But if it looks as hairy as C++ then it must be the business!
> 

Comparably, C# style syntax is simplified if compared with C or C++, but 
retains many similar properties (and isn't quite as verbose or as 
awkward as Java).

>> Though, the harder problem here isn't necessarily that of designing or 
>> implementing it, but more in how to make its use preferable to jus 
>> staying with C.
>>
>>
>> One merit is if code can be copy-pasted, but if one has to change all 
>> instances of:
>>    char *s0, *s1;
>> To:
>>    char* s0, s1;
>>
>> Well, this is likely to get old, unless it still uses, or allows C 
>> style declaration syntax in this case.
> 
> That one's been fixed (50 years late): you instead write:
> 
>   typeof(char*) s0, s1;
> 
> But you will need an extension if it's not part of C23.
> 

The issue is that if the language uses a C#-style syntax, even with 
"unsafe", directly copy-pasting from C would require some amount of editing.

But, either way, to be useful as a C alternative:
Would need to be able to do all of the same stuff as C in a roughly 
similar way;
Should have similar or better performance;
...

Though, one thing is:
The language should not try to ram OOP down everyone's throat (a problem 
that existed in both C# and Java).

Ideally, one could still do things using a C like procedural style.


#394524

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2025-10-09 20:59 -0700
Message-ID	<87qzvbh0f5.fsf@example.invalid>
In reply to	#394519

bart <bc@freeuk.com> writes:
> On 09/10/2025 04:49, BGB wrote:
[...]

> Nobody cares about C syntax.

That is so manifestly untrue that I can't imagine what you actually
meant.

Many of us, myself included, don't particularly like some aspects of C
syntax, but that's not the same as not caring about it.

>                              Learning all its ins and outs seems be a
> rite of passage.

Perhaps.  It's also necessary if you want to work with the language.

> The trouble is that C-style is so dominant, few people would know what
> a decent syntax looks like. Or, more, likely, they associate a clean,
> well-designed syntax with toy or scripting languages, and can't take
> it seriously.
>
> But if it looks as hairy as C++ then it must be the business!

C syntax has survived and been propagated to other languages because
it's well known, not, I think, because anybody really likes it.

[...]

>> One merit is if code can be copy-pasted, but if one has to change
>> all instances of:
>>    char *s0, *s1;
>> To:
>>    char* s0, s1;
>> Well, this is likely to get old, unless it still uses, or allows C
>> style declaration syntax in this case.
>
> That one's been fixed (50 years late): you instead write:
>
>  typeof(char*) s0, s1;
>
> But you will need an extension if it's not part of C23.

Yes, that will work in C23, but it would never occur to me to
write that.  I'd just write `char *s0, *s1;` or, far more likely,
define s0 and s1 on separate lines.  Using typeof that way triggers
my WTF filter.
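To make the declarator issue concrete: in C the `*` binds to each declarator, not to the base type, so `char* s0, s1;` gives `s1` type `char`. A small sketch; the `TYPEOF` macro is a hypothetical portability shim that uses C23 `typeof` when the compiler reports C23, and otherwise falls back to the GCC/Clang `__typeof__` extension.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define TYPEOF(x) typeof(x)       /* standard in C23 */
#else
#define TYPEOF(x) __typeof__(x)   /* GCC/Clang extension before C23 */
#endif

char *p0, *p1;          /* both char *: the usual C spelling */
char* q0, q1;           /* q0 is char *, but q1 is plain char */
TYPEOF(char *) t0, t1;  /* both char *: typeof names a complete type */
```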

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


#394526

From	BGB <cr88192@gmail.com>
Date	2025-10-10 01:27 -0500
Message-ID	<10ca933$3rssv$1@dont-email.me>
In reply to	#394524

On 10/9/2025 10:59 PM, Keith Thompson wrote:
> bart <bc@freeuk.com> writes:
>> On 09/10/2025 04:49, BGB wrote:
> [...]
> 
>> Nobody cares about C syntax.
> 
> That is so manifestly untrue that I can't imagine what you actually
> meant.
> 
> Many of us, myself included, don't particularly like some aspects of C
> syntax, but that's not the same as not caring about it.
> 

Yes.

>>                               Learning all its ins and outs seems be a
>> rite of passage.
> 
> Perhaps.  It's also necessary if you want to work with the language.
> 
>> The trouble is that C-style is so dominant, few people would know what
>> a decent syntax looks like. Or, more, likely, they associate a clean,
>> well-designed syntax with toy or scripting languages, and can't take
>> it seriously.
>>
>> But if it looks as hairy as C++ then it must be the business!
> 
> C syntax has survived and been propagated to other languages because
> it's well known, not, I think, because anybody really likes it.
> 

I would gladly pick C style syntax over PASCAL, FORTRAN, or COBOL.

> [...]
> 
>>> One merit is if code can be copy-pasted, but if one has to change
>>> all instances of:
>>>     char *s0, *s1;
>>> To:
>>>     char* s0, s1;
>>> Well, this is likely to get old, unless it still uses, or allows C
>>> style declaration syntax in this case.
>>
>> That one's been fixed (50 years late): you instead write:
>>
>>   typeof(char*) s0, s1;
>>
>> But you will need an extension if it's not part of C23.
> 
> Yes, that will work in C23, but it would never occur to me to
> write that.  I'd just write `char *s0, *s1;` or, far more likely,
> define s0 and s1 on separate lines.  Using typeof that way triggers
> my WTF filter.
> 

Agreed.

I think it can be contrasted with C#-style syntax (with "unsafe"), where 
one would write:
   char* s0, s1;
Though, this is imagining a world where char is probably an unsigned 
byte, so that UTF-8 makes sense.

So, say, if we had types (for a hypothetical language) like:
   sbyte, ubyte: 8-bits, signed/unsigned
   byte: 8-bits, probably unsigned
   char: 8-bits, probably unsigned (UTF-8)
   wchar: 16-bits, unsigned (UTF-16)
   short: 16-bits, signed
   ushort: 16-bits, unsigned
   int: 32-bits, signed
   uint: 32-bits, unsigned
   long: 64-bits, signed
   ulong: 64-bits, unsigned

Maybe some more, but with explicit bit sizes.
     int8/int16/int32/int64/int128
     uint8/uint16/uint32/uint64/uint128
   But, no separate "unsigned".
   Core type name is always a single identifier, unlike C.
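For comparison, most of the proposed fixed-size names already have close C99 equivalents in `<stdint.h>`. The `hl_` typedef names below are hypothetical, prefixed only to avoid clashing with C keywords such as `long`; there is no standard 128-bit typedef, so `int128`/`uint128` are left out.

```c
#include <stdint.h>

/* Hypothetical names for the proposed types, mapped onto <stdint.h>. */
typedef int8_t   hl_sbyte;   /* 8-bit signed */
typedef uint8_t  hl_ubyte;   /* 8-bit unsigned */
typedef uint8_t  hl_char;    /* 8-bit unsigned, for UTF-8 code units */
typedef uint16_t hl_wchar;   /* 16-bit unsigned, for UTF-16 code units */
typedef int16_t  hl_short;
typedef uint16_t hl_ushort;
typedef int32_t  hl_int;
typedef uint32_t hl_uint;
typedef int64_t  hl_long;    /* unlike C's long, always 64 bits */
typedef uint64_t hl_ulong;

/* Compile-time checks that the widths match the proposal. */
_Static_assert(sizeof(hl_wchar) == 2, "wchar must be 16 bits");
_Static_assert(sizeof(hl_long) == 8, "long must be 64 bits");
```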

And, special types:
   string: String, UTF-8
   wstring: String, UTF-16
   ...
While a string is (effectively) a pointer to the first character, the 
type can be seen as distinct from that of 'char*'. Nominal 
representation would be as a series of codepoints terminated with a NUL 
byte.

Default string type can be UTF-8 because, most of the time, UTF-16 would 
be a waste of memory (but, can be kept for "those that have that 
preference").
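One consequence of a NUL-terminated UTF-8 `string` is that byte length and codepoint count differ. A minimal C sketch that counts codepoints by skipping continuation bytes (those of the form 10xxxxxx); the function name is hypothetical, and it assumes well-formed UTF-8 without validating it.

```c
#include <stddef.h>

/* Count codepoints in a NUL-terminated UTF-8 string (no validation). */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) != 0x80)   /* ASCII or lead byte, not a continuation */
            n++;
    }
    return n;
}
```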

And, floating point:
   float:   Binary32
   double:  Binary64
   half:    Binary16
Maybe explicit sized names:
   float16/float32/float64/float128

Maybe vectors (optional):
   vec2f   2x float
   vec4f   4x float
   vec4h   4x half
   vec2d   2x double
   quatf   quaternion (float)
   quath   quaternion (half)

Untyped bit-blobs:
   m8, m16, m32, m64, m128
     Say:
       ulong li;
       double f;
       li=(m64)f;  //cast double->ulong as bit pattern
       f=(m64)li;  //cast ulong->double as bit pattern
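In standard C, the closest equivalent of the proposed `(m64)` bit-preserving cast is copying the object representation with `memcpy` (a union also works in C). A minimal sketch with hypothetical function names, assuming `double` is a 64-bit IEEE 754 binary64:

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "assumes a 64-bit double");

/* li=(m64)f;  copy the bits of a double into a 64-bit integer */
static uint64_t bits_from_double(double f)
{
    uint64_t li;
    memcpy(&li, &f, sizeof li);
    return li;
}

/* f=(m64)li;  reinterpret a 64-bit integer's bits as a double */
static double double_from_bits(uint64_t li)
{
    double f;
    memcpy(&f, &li, sizeof f);
    return f;
}
```

Mainstream optimizing compilers generally reduce such fixed-size `memcpy` calls to a plain register move, so the verbosity is mostly a readability cost.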

With basic decl syntax like:
   int      i;    //normal variable
   int[16]  ia;   //fixed array (inline)
   int*     pi;   //pointer
   int[]    ia2;  //flexible array (reference, hosted only)
   int*[16] pia;   //fixed array of pointers

This would apply to primitive types and structs.

Object types having different suffixes (hosted only):
   Foo      obj1;  //basic object
   Foo!     obj2;  //automatic
   Foo^     obj3;  //refcount
   Foo(Z)   obj4;  //zone
   Foo[]    aobj1;  //flexible array of Foo
   Foo[16]  aobj2;  //fixed array of Foo
   ...

Where struct and class implicitly declare a type, so:
   struct Str1 {
     int x, y;
   }
   class Foo:Bar {  //class Foo, extends Bar
     ...
   }
   interface IBaz { //interface
     ...
   }

Where, likely flexible arrays, classes, and interfaces, only exist:
   If the implementation is hosted;
   If a non-hosted implementation provides a full memory manager.

Would eliminate some more obscure patterns from C syntax, like:
   int (*arr[16])[16];  //like, WUT?...
Just sorta say, stuff like this doesn't exist.
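For reference, that declaration makes `arr` an array of 16 pointers, each pointing to an array of 16 `int`. In existing C the usual way to tame it is with typedefs, which read roughly in the left-to-right order proposed above; the typedef names here are hypothetical.

```c
int (*arr[16])[16];       /* the original: array of 16 pointers to int[16] */

typedef int Row16[16];    /* int[16] */
typedef Row16 *Row16Ptr;  /* pointer to int[16] */
Row16Ptr arr2[16];        /* same type as arr, spelled in steps */

int *pia[16];             /* C spelling of the proposed "int*[16] pia;" */
```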

Might consider doing function pointers like:
   delegate void FuncT();
   ...
   FuncT fun;

This is in contrast to some of my own languages, which had used:
   typedef void FuncT();
for this, with 'delegate' as a scope modifier (where identifier lookups 
may look into the variable). But nothing stops using delegate for both 
purposes (based on whether it is followed by a prototype or an object 
variable declaration).
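For comparison, C can already express the `delegate void FuncT();` idea with a typedef of a function type, which keeps the `*` visible where the variable is declared. A minimal sketch with hypothetical names (`say_hello`, `call_count`); `(void)` is spelled out because before C23 an empty `()` means "unspecified parameters".

```c
typedef void FuncT(void);       /* a function type, not a pointer type */

static int call_count;

static void say_hello(void)
{
    call_count++;               /* stand-in for real work */
}

static FuncT *fun = say_hello;  /* pointer to FuncT, like "FuncT fun;" */
```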

Scoping could look like:
   global top-level (behaves like C top-level);
   namespaces.

Probably, things like function overloading are only allowed inside 
classes or namespaces. At the global top level, no overloading is allowed 
(so, it is like C++ 'extern "C"' by default).

So, say:
   namespace foo {
     using c.stdio;

     //why not?...
     int func(int x, int y)
     {
         printf("yeah...\n");  //via stdio
         return x+y;
     }
   }

Where, say, we don't need a bunch of wrapper classes for file IO and 
printing, because C's "stdio.h" already does this.

Here, there can be some magic behind the scenes, where the compiler can 
use namespaces and metadata rather than bulk textual inclusion.

One option is that when the compiler compiles stuff, like the runtime 
library, it also generates "manifests" that the compiler can use to find 
declarations. Likely, the manifest files could exist like a sort of 
hierarchical database partly mapped onto a virtual filesystem, with 
search paths and similar (sorta like the class path in the JVM). Just, 
this metadata will only exist for compiling stuff.

Why not textual inclusion?: Because it wastes a lot of CPU time and RAM 
to generate and parse a mountain of random stuff for every translation 
unit (the amount of text pulled in from headers is typically several 
orders of magnitude larger than the actual code for the translation unit 
itself). Also precompiled headers are a poor solution to this.

May or may not do varargs differently.
   One possible interpretation could be, say:
   void vafunc(char* s, va...)
   {
     char* s1;
     long x, y;
     x=va[0];  //first variable argument
     y=va[1];  //second variable argument
     s1=(char*)va[2];  //third argument (string)
   }
Here, the exact element type depends on the target, but probably 'long' 
or something (or 'int' on a 32-bit target).

ABI rule then would be:
If the ABI would otherwise distinguish arguments by type, 
vararg functions will receive all arguments as native machine words (int 
or long or similar). If they are passed in registers, they will be 
spilled to memory, and this memory will be presented as the argument list.

Then, say, C style va_list could be faked as, say:
   long* vap=va;
   x=*vap++;
   y=*vap++;
   s1=(char*)(*vap++);
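The "every variable argument is a machine word" model can be approximated in current C, which may help judge whether it is worth doing. This sketch assumes the caller spills the arguments into an array itself (via a C99 compound literal) and uses `intptr_t` instead of `long`, so that a `char *` survives the round trip (`intptr_t` is optional in the standard but present on common platforms). `vafunc_words`, `VACALL`, `VA_WORDS`, `VA_COUNT` and `VA_PTR` are hypothetical names, not part of the proposed language.

```c
#include <stddef.h>
#include <stdint.h>

/* One machine word per variable argument; intptr_t so pointers fit. */
typedef intptr_t va_word;

/* Hypothetical callee: 'va' plays the role of the proposed va[] array,
   'nva' is the number of words passed. Returns x + y, or -1 if too few. */
static long vafunc_words(const char *s, const va_word *va, size_t nva,
                         const char **s1_out)
{
    (void)s;                                  /* fixed argument, unused here */
    if (nva < 3)
        return -1;
    const va_word *vap = va;                  /* the faked "va_list" */
    long x = (long)*vap++;                    /* first variable argument  */
    long y = (long)*vap++;                    /* second variable argument */
    *s1_out = (const char *)(void *)*vap++;   /* third argument (a string) */
    return x + y;
}

/* Caller side: spill the arguments into a temporary word array. The
   compound literal inside sizeof is not evaluated, so there is no
   double evaluation of the arguments. */
#define VA_PTR(p)     ((va_word)(void *)(p))
#define VA_WORDS(...) ((const va_word[]){ __VA_ARGS__ })
#define VA_COUNT(...) (sizeof(VA_WORDS(__VA_ARGS__)) / sizeof(va_word))
#define VACALL(f, s, out, ...) \
    f((s), VA_WORDS(__VA_ARGS__), VA_COUNT(__VA_ARGS__), (out))
```

For example, `VACALL(vafunc_words, "fmt", &s1, 2, 40, VA_PTR("hello"))` passes three words; the callee can index them directly (`va[0]`, `va[1]`) or walk them with a pointer, as in the faked `va_list` above.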

...

As for whether or not this offers enough to actually be worth bothering 
with, vs "just use C", dunno...

I am seemingly one of the few people considering ideas for a 
hypothetical C replacement that actually likes C.

...

#394527

From	David Brown <david.brown@hesbynett.no>
Date	2025-10-10 12:06 +0200
Message-ID	<10calqi$3v6ha$1@dont-email.me>
In reply to	#394526

On 10/10/2025 08:27, BGB wrote:
> On 10/9/2025 10:59 PM, Keith Thompson wrote:
>> bart <bc@freeuk.com> writes:

>>
>>>> One merit is if code can be copy-pasted, but if one has to change
>>>> all instances of:
>>>>     char *s0, *s1;
>>>> To:
>>>>     char* s0, s1;
>>>> Well, this is likely to get old, unless it still uses, or allows C
>>>> style declaration syntax in this case.
>>>
>>> That one's been fixed (50 years late): you instead write:
>>>
>>>   typeof(char*) s0, s1;
>>>
>>> But you will need an extension if it's not part of C23.
>>
>> Yes, that will work in C23, but it would never occur to me to
>> write that.  I'd just write `char *s0, *s1;` or, far more likely,
>> define s0 and s1 on separate lines.  Using typeof that way triggers
>> my WTF filter.
>>
> 
> Agreed.
> 
> 
> 
> I think it can be contrast with C# style syntax (with "unsafe") where 
> one would write:
>    char* s0, s1;

Does C# treat s1 as "char*" in this case?  That sounds like an 
extraordinarily bad design decision - having a syntax that is very like 
the dominant C syntax yet subtly different.

Issues like this have been "solved" for decades - in the sense that 
people who care about their code don't make mistakes from mixups of 
"char" and "char*" declarations.  There are a dozen different ways to be 
sure it is not an issue.  Simplest of all is a style rule - never 
declare identifiers of different types in the same declaration.  I'd 
have preferred that to be a rule baked into the language from the start, 
but we all have things we dislike about the C syntax.
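To spell out what the question above turns on, this is how C itself parses the same line (a small sketch; the `IS_CHAR_PTR` helper macro is invented here and uses C11 `_Generic`). The `typeof` lines show bart's C23 form from earlier in the thread; on a compiler that is neither C23 nor GNU-compatible they are simply skipped.

```c
/* In C, '*' belongs to the declarator, not to the type specifier:
   s0 is "pointer to char", but s1 is a plain "char". */
char* s0, s1;

/* The style rule: one declarator per declaration, so nothing is hidden. */
char *p0;
char *p1;

/* C23 typeof (or the GNU __typeof__ extension) puts the whole type in
   the specifier, so both t0 and t1 are "pointer to char". */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
typeof(char *) t0, t1;
#elif defined(__GNUC__)
__typeof__(char *) t0, t1;
#endif

/* 1 if the expression has type char *, else 0 (C11 _Generic). */
#define IS_CHAR_PTR(e) _Generic((e), char *: 1, default: 0)
```

Here `IS_CHAR_PTR(s0)` is 1 and `IS_CHAR_PTR(s1)` is 0, which is exactly the trap that the one-declarator-per-declaration rule avoids.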

#394529

From	Michael S <already5chosen@yahoo.com>
Date	2025-10-10 17:28 +0300
Message-ID	<20251010172802.00005bdd@yahoo.com>
In reply to	#394527

On Fri, 10 Oct 2025 12:06:10 +0200
David Brown <david.brown@hesbynett.no> wrote:

> On 10/10/2025 08:27, BGB wrote:
> > On 10/9/2025 10:59 PM, Keith Thompson wrote:  
> >> bart <bc@freeuk.com> writes:  
> 
> >>  
> >>>> One merit is if code can be copy-pasted, but if one has to change
> >>>> all instances of:
> >>>>     char *s0, *s1;
> >>>> To:
> >>>>     char* s0, s1;
> >>>> Well, this is likely to get old, unless it still uses, or allows
> >>>> C style declaration syntax in this case.  
> >>>
> >>> That one's been fixed (50 years late): you instead write:
> >>>
> >>>   typeof(char*) s0, s1;
> >>>
> >>> But you will need an extension if it's not part of C23.  
> >>
> >> Yes, that will work in C23, but it would never occur to me to
> >> write that.  I'd just write `char *s0, *s1;` or, far more likely,
> >> define s0 and s1 on separate lines.  Using typeof that way triggers
> >> my WTF filter.
> >>  
> > 
> > Agreed.
> > 
> > 
> > 
> > I think it can be contrast with C# style syntax (with "unsafe")
> > where one would write:
> >    char* s0, s1;  
> 
> Does C# treat s1 as "char*" in this case?  That sounds like an 
> extraordinarily bad design decision - having a syntax that is very
> like the dominant C syntax yet subtly different.
> 

Generally, I disagree with your rule. Not that it makes no sense at
all, but sometimes violating it makes more sense. For example, I strongly
prefer otherwise C-like languages to parse the literal 011 as decimal
11 rather than 9.

In this particular case it's more subtle.
What makes it a non-issue in practice is the fact that pointers in C# are
a very rarely used expert-level feature, especially since the language
got slices (Span<T>) 7 or 8 years ago.
A person who decides to use C# pointers has to understand at least
half a dozen things more arcane than this one.
Also, if somebody did make such a mistake, it's very unlikely that the
code would pass compilation. After all, we're talking about C# here, not
something like Python.

#394530

From	David Brown <david.brown@hesbynett.no>
Date	2025-10-10 17:47 +0200
Message-ID	<10cb9rd$5f2i$2@dont-email.me>
In reply to	#394529

On 10/10/2025 16:28, Michael S wrote:
> On Fri, 10 Oct 2025 12:06:10 +0200
> David Brown <david.brown@hesbynett.no> wrote:
> 
>> On 10/10/2025 08:27, BGB wrote:
>>> On 10/9/2025 10:59 PM, Keith Thompson wrote:
>>>> bart <bc@freeuk.com> writes:
>>
>>>>   
>>>>>> One merit is if code can be copy-pasted, but if one has to change
>>>>>> all instances of:
>>>>>>      char *s0, *s1;
>>>>>> To:
>>>>>>      char* s0, s1;
>>>>>> Well, this is likely to get old, unless it still uses, or allows
>>>>>> C style declaration syntax in this case.
>>>>>
>>>>> That one's been fixed (50 years late): you instead write:
>>>>>
>>>>>    typeof(char*) s0, s1;
>>>>>
>>>>> But you will need an extension if it's not part of C23.
>>>>
>>>> Yes, that will work in C23, but it would never occur to me to
>>>> write that.  I'd just write `char *s0, *s1;` or, far more likely,
>>>> define s0 and s1 on separate lines.  Using typeof that way triggers
>>>> my WTF filter.
>>>>   
>>>
>>> Agreed.
>>>
>>>
>>>
>>> I think it can be contrast with C# style syntax (with "unsafe")
>>> where one would write:
>>>     char* s0, s1;
>>
>> Does C# treat s1 as "char*" in this case?  That sounds like an
>> extraordinarily bad design decision - having a syntax that is very
>> like the dominant C syntax yet subtly different.
>>
> 
> Generally, I disagree with your rule. Not that it makes no sense at
> all, but sometimes a violation has more sense. For example, I strongly
> prefer for otherwise C-like languages to parse 011 literal as decimal
> 11 rather than 9.

I did not intend to describe a general rule (and I agree with you in 
regard to octal).

> 
> In this particular case it's more subtle.
> What makes it a non-issue in practice is the fact that pointers is C# is
> very rarely used expert-level feature, especially so after 7 or 8
> years ago the language got slices (Span<T>).
> A person that decides to use C# pointers has to understand at least
> half a dozen of more arcane things than this one.
> Also it's very unlikely in case somebody made such mistake that his
> code will pass compilation. After all, we're talking about C# here, not
> something like Python.
> 

Sure.

It would seem to me, however, that it would have been better for the C# 
designers to pick a different syntax here rather than something that 
looks like C, but has subtle differences that are going to cause newbies 
confusion when they try to google for explanations for their problems. 
For example, if raw pointers are rarely used, then they should perhaps 
be accessible using a more verbose syntax than a punctuation mark - 
"ptr<char> s0, s1;" might work.

However, I have no experience with C#, and don't know the reasons for 
its syntax choices.

#394533

From	BGB <cr88192@gmail.com>
Date	2025-10-10 16:32 -0500
Message-ID	<10cbu37$bhar$1@dont-email.me>
In reply to	#394530

On 10/10/2025 10:47 AM, David Brown wrote:
> On 10/10/2025 16:28, Michael S wrote:
>> On Fri, 10 Oct 2025 12:06:10 +0200
>> David Brown <david.brown@hesbynett.no> wrote:
>>
>>> On 10/10/2025 08:27, BGB wrote:
>>>> On 10/9/2025 10:59 PM, Keith Thompson wrote:
>>>>> bart <bc@freeuk.com> writes:
>>>
>>>>>>> One merit is if code can be copy-pasted, but if one has to change
>>>>>>> all instances of:
>>>>>>>      char *s0, *s1;
>>>>>>> To:
>>>>>>>      char* s0, s1;
>>>>>>> Well, this is likely to get old, unless it still uses, or allows
>>>>>>> C style declaration syntax in this case.
>>>>>>
>>>>>> That one's been fixed (50 years late): you instead write:
>>>>>>
>>>>>>    typeof(char*) s0, s1;
>>>>>>
>>>>>> But you will need an extension if it's not part of C23.
>>>>>
>>>>> Yes, that will work in C23, but it would never occur to me to
>>>>> write that.  I'd just write `char *s0, *s1;` or, far more likely,
>>>>> define s0 and s1 on separate lines.  Using typeof that way triggers
>>>>> my WTF filter.
>>>>
>>>> Agreed.
>>>>
>>>>
>>>>
>>>> I think it can be contrast with C# style syntax (with "unsafe")
>>>> where one would write:
>>>>     char* s0, s1;
>>>
>>> Does C# treat s1 as "char*" in this case?  That sounds like an
>>> extraordinarily bad design decision - having a syntax that is very
>>> like the dominant C syntax yet subtly different.
>>>
>>
>> Generally, I disagree with your rule. Not that it makes no sense at
>> all, but sometimes a violation has more sense. For example, I strongly
>> prefer for otherwise C-like languages to parse 011 literal as decimal
>> 11 rather than 9.
> 
> I did not intend to describe a general rule (and I agree with you in 
> regard to octal).
> 

Yeah, '0' by itself indicating octal is weird, so I might agree here.
   123    //decimal
   0123   //maybe reinterpret as decimal?
   0o123  //octal
   0x123  //hexadecimal
   0b101  //binary
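For comparison with the list above, this is what C does with the forms it has. C has no `0o` prefix (as of C23), and `0b` literals are standard only from C23 (GCC and Clang accept them earlier as an extension), so both are left out. The two helper functions (hypothetical names) show the library-level version of the same choice: `strtol` with base 0 follows the C literal rules, including the leading-zero octal one, while base 10 simply reads the digits as decimal.

```c
#include <stdlib.h>

/* Integer constants as C spells them today. */
enum {
    LIT_DEC = 123,    /* decimal                                    */
    LIT_OCT = 0123,   /* leading 0 means octal: 1*64 + 2*8 + 3 = 83 */
    LIT_HEX = 0x123   /* hexadecimal: 1*256 + 2*16 + 3 = 291        */
};

/* Parse text the way C source would: "0x" => hex, leading "0" => octal. */
static long parse_c_style(const char *s)
{
    return strtol(s, NULL, 0);
}

/* Parse text as decimal regardless of leading zeros ("0123" => 123). */
static long parse_decimal(const char *s)
{
    return strtol(s, NULL, 10);
}
```

So `parse_c_style("011")` gives 9 while `parse_decimal("011")` gives 11, which is the difference Michael S was objecting to.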

In BGBCC, I had defined some additional handling for suffixes:
   iNN, where NN is an integer, specifies a number of bits.
   uNN or uiNN, specifies a number of bits, but unsigned.
   Types could specify non-power-of-2 widths (understood as _BitInt).
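A sketch of what such a literal's value would be, assuming the suffix means "the value an N-bit integer of that signedness would hold". BGBCC's exact rules are not given here; for comparison, C23 spells these types `_BitInt(N)` / `unsigned _BitInt(N)`, with `wb` / `uwb` literal suffixes. The function names are hypothetical, and widths are limited to 1..64.

```c
#include <stdint.h>

/* Keep only the low n bits of v: the value an unsigned n-bit integer
   (e.g. a "u12"-suffixed literal) would hold. Requires 1 <= n <= 64. */
static uint64_t trunc_unsigned(uint64_t v, unsigned n)
{
    return n >= 64 ? v : v & ((UINT64_C(1) << n) - 1);
}

/* Interpret the low n bits of v as a two's-complement n-bit value
   (e.g. "i12"): truncate, then sign-extend from bit n-1.
   Requires 1 <= n <= 64. */
static int64_t trunc_signed(uint64_t v, unsigned n)
{
    uint64_t u = trunc_unsigned(v, n);
    uint64_t sign = UINT64_C(1) << (n - 1);
    /* Flipping the sign bit and then subtracting it sign-extends using
       only unsigned arithmetic; the final conversion to int64_t assumes
       the usual modulo-2^64 behaviour of two's-complement targets. */
    return (int64_t)((u ^ sign) - sign);
}
```

With this reading, 0xFFF as a 12-bit signed value is -1, while as a 12-bit unsigned value it stays 4095.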

Though, there was also the wonk that these literals could also allow X 
and Z in place of bits or hex digits, but this was more a side-effect of 
a fizzled effort to try to add Verilog support to BGBCC (which was also 
sort of where the bit notation came from).

Generally, though, X and Z have no real purpose in C code (and 
may not exist in actual integer values), so they would be little more 
than a curiosity (some of this was more about testing functionality 
being added for the sake of supporting Verilog).
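For readers unfamiliar with X and Z: one common way to carry Verilog-style 4-state bits in C is two bit-planes per value, as the Verilog VPI does with its `aval`/`bval` pair, where (aval,bval) = (0,0) is 0, (1,0) is 1, (0,1) is z and (1,1) is x. The parser below is only an illustrative sketch of decoding a binary literal body containing `x`/`z` under that encoding; it is not how BGBCC handles such literals, and the names are hypothetical.

```c
#include <stdint.h>

/* Two bit-planes: per bit, (aval,bval) = (0,0) 0, (1,0) 1, (0,1) z, (1,1) x. */
typedef struct {
    uint64_t aval;
    uint64_t bval;
} FourState;

/* Parse binary digits 0/1/x/z (MSB first; '_' separators ignored).
   Returns 0 on success, -1 on a bad digit or more than 64 digits. */
static int parse_4state_bin(const char *s, FourState *out)
{
    uint64_t a = 0, b = 0;
    int ndig = 0;
    for (; *s; s++) {
        unsigned abit, bbit;
        switch (*s) {
        case '0':           abit = 0; bbit = 0; break;
        case '1':           abit = 1; bbit = 0; break;
        case 'z': case 'Z': abit = 0; bbit = 1; break;
        case 'x': case 'X': abit = 1; bbit = 1; break;
        case '_':           continue;   /* digit separator */
        default:            return -1;
        }
        if (++ndig > 64)
            return -1;
        a = (a << 1) | abit;
        b = (b << 1) | bbit;
    }
    out->aval = a;
    out->bval = b;
    return 0;
}

/* The value is an ordinary C integer (aval) only if no bit is x or z. */
static int is_known(FourState v)
{
    return v.bval == 0;
}
```

This also shows why X and Z "may not exist in actual integer values": once `bval` is non-zero, there is no single C integer that represents the value.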

But, as noted, in a few cases, the Verilog mechanisms can offer a 
performance advantage over traditional C constructs. In other cases, not 
so much....

This was being worked on at one point as I sometimes face frustration at 
the almost non-existent debugging features in Verilator (you basically 
have to do a more awkward form of printf debugging; would kinda be nice 
sometimes if one could set breakpoints and inspect variables, ...).

But, what passes for control-flow in Verilog doesn't really map over so 
well (basically need to update stuff based on "sensitivity graph" mostly 
driven by clock signals and similar).

>>
>> In this particular case it's more subtle.
>> What makes it a non-issue in practice is the fact that pointers is C# is
>> very rarely used expert-level feature, especially so after 7 or 8
>> years ago the language got slices (Span<T>).
>> A person that decides to use C# pointers has to understand at least
>> half a dozen of more arcane things than this one.
>> Also it's very unlikely in case somebody made such mistake that his
>> code will pass compilation. After all, we're talking about C# here, not
>> something like Python.
>>
> 
> Sure.
> 
> It would seem to me, however, that it would have been better for the C# 
> designers to pick a different syntax here rather than something that 
> looks like C, but has subtle differences that are going to cause newbies 
> confusion when they try to google for explanations for their problems. 
> For example, if raw pointers are rarely used, then they should perhaps 
> be accessible using a more verbose syntax than a punctuation mark - 
> "ptr<char> s0, s1;" might work.
> 
> However, I have no experience with C#, and don't know the reasons for 
> its syntax choices.
> 

Early on, it didn't have generics and so wouldn't use that syntax.

Unlike C++, it doesn't have templates, so "ptr<char>" would not make so 
much sense, and then 'ptr' would be limited to being a class instance, 
which are always by-reference. Also early on, no operator overloading 
either (as with generics, this part was added later).

Also the language discouraged pointers anyway, so you had to opt in by 
using the 'unsafe' keyword before the compiler would allow them (and 
then, only for 'trusted' executables).

Though, for a hybrid language, would likely drop the concept of trusted 
executables (or, allow it, maybe with the added constraint that object 
lifetimes be statically-provable; maybe asking too much though).

The concept of trusted executables doesn't make as much sense with 
native-code compilation.

>

#394535

From	Kaz Kylheku <643-408-1753@kylheku.com>
Date	2025-10-11 00:02 +0000
Message-ID	<20251010170008.460@kylheku.com>
In reply to	#394533

On 2025-10-10, BGB <cr88192@gmail.com> wrote:
> Yeah, '0' by itself indicating octal is weird, so I might agree here.
>    123    //decimal
>    0123   //maybe reinterpret as decimal?
>    0o123  //octal
>    0x123  //hexadecimal
>    0b101  //binary

Lisp people worked this out before the end of the 80s:

[1]> 777
777
[2]> 00777
777
[3]> #o777
511
[4]> #x777
1911
[5]> #b1001
9

Leading zeros changing base is really a sneaky stupidity, and causes
problems in shell scripts also, from time to time.

$ printf "%d\n" 0777
511
$
$ echo $(( 0777 + 0 ))
511

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

#394539

From	Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Date	2025-10-13 06:20 +0200
Message-ID	<10chumb$1v98o$1@dont-email.me>
In reply to	#394535

On 11.10.2025 02:02, Kaz Kylheku wrote:
> On 2025-10-10, BGB <cr88192@gmail.com> wrote:
>> Yeah, '0' by itself indicating octal is weird, so I might agree here.
>>    123    //decimal
>>    0123   //maybe reinterpret as decimal?
>>    0o123  //octal
>>    0x123  //hexadecimal
>>    0b101  //binary
> 
> Lisp people worked this out before the end of the 80s:
> 
> [1]> 777
> 777
> [2]> 00777
> 777
> [3]> #o777
> 511
> [4]> #x777
> 1911
> [5]> #b1001
> 9
> 
> Leading zeros changing base is really a sneaky stupidity, and causes
> problems in shell scripts also, from time to time.
> 
> $ printf "%d\n" 0777
> 511
> $
> $ echo $(( 0777 + 0 ))
> 511
> 

Yes, indeed. And behavior between shells and versions differs as well.

$ dash -c 'printf "%d\n" 077'
63
$ ksh93u -c 'printf "%d\n" 077'
63
$ ksh93u+ -c 'printf "%d\n" 077'
77

Now, is that good that ksh has fixed that? (I have my doubts.)

Also if you get actual values from variable expansion (as opposed to
constant literals) you may get surprises.

At some point I often used explicit "base specifiers" [in Kornshell]
(which are not generally available in shells): base#number

$ ksh93u -c 'printf "%d\n" 10#077'
77

This is especially sensible with variables containing arbitrary number
formats, where the leading zero is hidden in 'var':

$ ksh93u -c 'var=077; printf "%d\n" 10#$var $var'
77
63
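The `base#number` notation (like Lisp's `#o`/`#x`/`#b`) makes the base explicit instead of inferring it from a leading zero. A small C sketch of parsing that notation with `strtol`, assuming bases 2 to 36 (the range `strtol` supports) and treating a string without `#` as plain decimal. The function name is hypothetical, and this is not ksh's own implementation.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse "base#digits" (e.g. "8#777", "16#ff", "10#077"); a string without
   '#' is read as decimal, so a leading zero never means octal.
   Returns 0 on success, -1 on malformed input or overflow. */
static int parse_based(const char *s, long *out)
{
    const char *hash = strchr(s, '#');
    const char *digits = s;
    long base = 10;
    char *end;

    if (hash) {
        base = strtol(s, &end, 10);
        if (end != hash || base < 2 || base > 36)
            return -1;                 /* missing or invalid base */
        digits = hash + 1;
    }
    if (*digits == '\0')
        return -1;                     /* no digits at all */
    errno = 0;
    long v = strtol(digits, &end, (int)base);
    if (*end != '\0' || errno == ERANGE)
        return -1;                     /* stray characters or overflow */
    *out = v;
    return 0;
}
```

With it, `"8#777"` gives 511, `"16#777"` gives 1911 and `"2#1001"` gives 9 (the same values as Kaz's Lisp examples), while both `"10#077"` and a bare `"077"` give 77.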

Leading-zero octals are yet another badly designed language feature,
and not only in the shell language - but that ship has sailed...

Janis

#394532

From	BGB <cr88192@gmail.com>
Date	2025-10-10 15:01 -0500
Message-ID	<10cboq0$a8ak$1@dont-email.me>
In reply to	#394527

On 10/10/2025 5:06 AM, David Brown wrote:
> On 10/10/2025 08:27, BGB wrote:
>> On 10/9/2025 10:59 PM, Keith Thompson wrote:
>>> bart <bc@freeuk.com> writes:
> 
>>>
>>>>> One merit is if code can be copy-pasted, but if one has to change
>>>>> all instances of:
>>>>>     char *s0, *s1;
>>>>> To:
>>>>>     char* s0, s1;
>>>>> Well, this is likely to get old, unless it still uses, or allows C
>>>>> style declaration syntax in this case.
>>>>
>>>> That one's been fixed (50 years late): you instead write:
>>>>
>>>>   typeof(char*) s0, s1;
>>>>
>>>> But you will need an extension if it's not part of C23.
>>>
>>> Yes, that will work in C23, but it would never occur to me to
>>> write that.  I'd just write `char *s0, *s1;` or, far more likely,
>>> define s0 and s1 on separate lines.  Using typeof that way triggers
>>> my WTF filter.
>>>
>>
>> Agreed.
>>
>>
>>
>> I think it can be contrast with C# style syntax (with "unsafe") where 
>> one would write:
>>    char* s0, s1;
> 
> Does C# treat s1 as "char*" in this case?  That sounds like an 
> extraordinarily bad design decision - having a syntax that is very like 
> the dominant C syntax yet subtly different.
> 

Yes. In this case, things like "*" or "[]" are associated with the type 
rather than the declarator.

> Issues like this have been "solved" for decades - in the sense that 
> people who care about their code don't make mistakes from mixups of 
> "char" and "char*" declarations.  There are a dozen different ways to be 
> sure it is not an issue.  Simplest of all is a style rule - never 
> declare identifiers of different types in the same declaration.  I'd 
> have preferred that to be a rule baked into the language from the start, 
> but we all have things we dislike about the C syntax.
> 

Part of the reason for some of the differences is that it allows a 
parser that does not need to know about previous typedefs and declarations.

In C, you need to know prior typedefs to parse correctly.
   In C++, you also need to know previous template declarations, etc.
   With classes/structs/etc adding implicit typedefs.
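The classic illustration of why a C parser must track typedefs is that the same tokens `T * x;` are either a declaration or an expression, depending on what `T` means at that point. A small compilable sketch (the function names are made up for the example):

```c
typedef int T;

/* Here T names a type, so "T * x;" declares x as a pointer to int. */
static int as_declaration(void)
{
    T * x;
    int n = 5;
    x = &n;
    return *x;
}

/* Here an inner variable hides the typedef, so the very same tokens
   "T * x;" form an expression statement: a multiplication whose result
   is discarded. The parser can only tell the two cases apart by knowing
   whether T currently names a typedef. */
static int as_expression(void)
{
    int T = 6, x = 7;
    T * x;          /* expression; compilers may warn that it has no effect */
    return T * x;
}
```

A grammar where `*` always attaches to the type (as in C#), or where declarations start with a keyword, avoids needing this symbol-table feedback during parsing.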

Avoiding the need to know typedefs in advance allows for a parser where 
there either is no preprocessor (Java), or the preprocessor still exists 
but its use is far more limited in scope and mostly unused (C#).

Also typically, things like the type-system are handled later in the 
pipeline (in .NET, it was closer to what would be considered the linker 
stage in a traditional compiler).

In effect, the front-end process works with relatively incomplete 
information, producing IL bytecode that specifies where to look for 
things and what to look for, but not the complete information. When an 
EXE or DLL is produced, it would resolve things for what exists within 
the current "assembly" (roughly equal to the EXE or DLL being compiled), 
with the ".NET runtime" needing to sort out the rest (typically AOT 
compiling the binaries into some internal form).

However, I would assume not having a "runtime" here, meaning the linker 
would need to produce native code binaries.

FWIW: BGBCC also generally uses a bytecode representation internally, 
and then produces native binaries as output. Though, the way the 
bytecode is structured and works differs from that of .NET bytecode. 
However, in both cases, they are using implicitly-typed stack machines 
at the IL stage. In BGBCC, for the backend stage, the bytecode IL is 
translated into "Three-Address-Code" roughly in "SSA Form" (though, not 
exactly the same as in LLVM; as it typically uses a combination of 
variable-ID and sequence-number, rather than creating a new "register" 
every time; also typically the "phi" operations are implicit).
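To illustrate the stack-IL-to-3AC step, here is a toy translator in C. It is not BGBCC's IL or .NET's CIL: the opcodes, the operand encoding and the function name are all hypothetical. The idea it shows is that the compiler simulates the operand stack at translation time, so loads and constants generate no code and every arithmetic operation writes a fresh temporary that is assigned exactly once. This is SSA-like only for temporaries; variables are not renamed and there are no phi nodes.

```c
#include <stddef.h>

/* A toy stack IL. 'arg' is a variable index (LOAD/STORE) or a constant. */
typedef enum { IL_LOAD, IL_CONST, IL_ADD, IL_MUL, IL_STORE } IlOp;
typedef struct { IlOp op; int arg; } IlInsn;

/* Three-address-code operands and instructions. */
typedef enum { OPND_VAR, OPND_TEMP, OPND_CONST } OpndKind;
typedef struct { OpndKind kind; int val; } Opnd;
typedef struct { char op; Opnd dst, a, b; } Tac;  /* '+', '*', or '=' (copy, b unused) */

#define MAX_STACK 32

/* Returns the number of Tac written, or -1 on stack underflow/overflow,
   an unknown opcode, or if 'out' is too small. */
static int il_to_tac(const IlInsn *il, size_t n, Tac *out, size_t outcap)
{
    Opnd stack[MAX_STACK];
    size_t sp = 0, nout = 0;
    int next_temp = 0;

    for (size_t i = 0; i < n; i++) {
        switch (il[i].op) {
        case IL_LOAD:
        case IL_CONST:                 /* no code: just remember the operand */
            if (sp == MAX_STACK)
                return -1;
            stack[sp].kind = (il[i].op == IL_LOAD) ? OPND_VAR : OPND_CONST;
            stack[sp].val = il[i].arg;
            sp++;
            break;
        case IL_ADD:
        case IL_MUL:                   /* pop two, write a fresh temporary */
            if (sp < 2 || nout == outcap)
                return -1;
            out[nout].op = (il[i].op == IL_ADD) ? '+' : '*';
            out[nout].b = stack[--sp];
            out[nout].a = stack[--sp];
            out[nout].dst = (Opnd){ OPND_TEMP, next_temp++ };
            stack[sp++] = out[nout].dst;
            nout++;
            break;
        case IL_STORE:                 /* pop one, copy into the variable */
            if (sp < 1 || nout == outcap)
                return -1;
            out[nout].op = '=';
            out[nout].a = stack[--sp];
            out[nout].b = (Opnd){ OPND_CONST, 0 };
            out[nout].dst = (Opnd){ OPND_VAR, il[i].arg };
            nout++;
            break;
        default:
            return -1;
        }
    }
    return (int)nout;
}
```

For `c = (a + b) * 2`, compiled to `LOAD a; LOAD b; ADD; CONST 2; MUL; STORE c` (with a, b, c as variables 0, 1, 2), the output is `t0 = a + b`, `t1 = t0 * 2`, `c = t1`. A backend can then run register allocation and other optimizations over this explicit-operand form, which is generally easier than doing so directly on stack code.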

Can note that it does support ASM, but the handling is generally that 
any ASM code is preprocessed and then passed through the IL stage as 
string blobs (then assembled in the backend stage).

Note, while it is possible to go more directly from a stack IL to native 
code (without going through 3AC/SSA), the generated code is garbage.

Also, while it is possible to have a compiler that uses SSA as an 
on-disk IR format (like Clang), IMO this creates a lot of pain and 
exposes too much of the backend machinery (it would be very much a pain 
to use LLVM bitcode in anything other than LLVM).

So, seemingly, a stack-oriented bytecode is the "least pain" option.
Well... Unless they do it like WASM and find other creative ways to 
screw it up...

Can note that in the case of a language like C#, the visibility of types 
and similar comes through the use of namespaces (which partly take on a 
similar role to headers in C or C++, or packages in Java or ActionScript).

Where, say:
   namespace foo { using bar.baz; } //C# style
   namespace foo { using namespace bar::baz; } //C++ style
   package foo { import bar.baz; } //ActionScript style

Though:
   import bar.baz.*;  //Java

But, Java differs here in that the code structure (and packages) are 
directly tied to organization of files in the filesystem (typically with 
one class per file).

Contrast, .NET and C# used "assemblies" as the organizing principle; or, 
generally, everything that is being compiled together to become a given 
EXE or DLL is lumped into a single unit.

Though, one option could be to organize code instead by namespace, with 
the toplevel tied to each location in the search path.

Though, with such a compiler, rather than specifying a list of 
individual source files, one might specify directories and the compiler 
figures things out on its own (basically compiling everything in a given 
directory).

One way to handle things like static libraries would be to build a blob 
of intermediate bytecode (and/or native-code COFF or ELF objects) along 
with a manifest database. The bytecode blob would contain all of the IR 
for the library (or machine-code if native), and the manifest would only 
contain declarations (preferably in a semi-compact form that is 
reasonably efficient to search). The manifests could then partly be used 
for knowing about declarations, and also for which objects or libraries 
to pull into the program being compiled (rather than giving them 
individually on the command-line).

This approach would differ from .NET which embeds all of the metadata 
into the object-files and distributable binaries. But, here I am 
assuming that the final binary is a bare native EXE or DLL image here; 
meaning that any manifest data for a DLL would need to be handled more 
like an "import library".

In .NET, generally the EXE or DLL was merely being used as an external 
packaging scheme for holding the VM's IR image (typically with no actual 
machine code in the EXE/DLL; or, for EXEs, merely a stub to try to launch 
the .NET runtime).

Can note that JVM uses JAR files that repurpose the ZIP format, but ZIP 
is a high-overhead format when used in this way. For my own uses, I 
typically used a variant of the WAD2 format, or a custom format I called 
WAD4, which also has lower overhead than ZIP.

Can note:
   WAD2: Originated with Quake
     Typically has 16-byte names, no directories.
     I have a variant that adds directories,
       but names drop to 14 chars if non-root.
       Encodes directories by each entry encoding its parent directory.
     Names are typically stored without file extensions.
   WAD4: Custom, used in my project some for data packaging and small VFS
     Similar to WAD2, but with directory trees;
     Name size is expanded to 32 bytes;
     Also had some amount of Unix style file metadata.
       Mostly for when used as a VFS.
       Namely: UID/GID/Mode.
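For reference, the Quake WAD2 layout as declared in Quake's released source is a 12-byte header (`"WAD2"`, lump count, directory offset) followed by 32-byte directory entries (`filepos`, `disksize`, `size`, `type`, `compression`, two pad bytes, a 16-byte name), all integers little-endian. The lookup below is a minimal sketch working on an in-memory image. It compares names exactly (case-sensitively), and the type and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAD2_HEADER_SIZE 12
#define WAD2_ENTRY_SIZE  32

typedef struct {
    uint32_t filepos;           /* offset of the lump data in the file */
    uint32_t disksize;          /* stored size */
    uint32_t size;              /* uncompressed size */
    unsigned char type;
    unsigned char compression;
    char name[17];              /* 16 bytes on disk, plus a terminator */
} Wad2Lump;

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Find a lump by name in an in-memory WAD2 image.
   Returns 1 and fills *out if found, 0 if not found, -1 if malformed. */
static int wad2_find(const unsigned char *buf, size_t len,
                     const char *name, Wad2Lump *out)
{
    if (len < WAD2_HEADER_SIZE || memcmp(buf, "WAD2", 4) != 0)
        return -1;
    uint32_t count = read_le32(buf + 4);
    uint32_t dirofs = read_le32(buf + 8);
    if (dirofs > len || count > (len - dirofs) / WAD2_ENTRY_SIZE)
        return -1;                              /* directory out of bounds */
    for (uint32_t i = 0; i < count; i++) {
        const unsigned char *e = buf + dirofs + (size_t)i * WAD2_ENTRY_SIZE;
        if (strncmp((const char *)e + 16, name, 16) != 0)
            continue;
        out->filepos = read_le32(e);
        out->disksize = read_le32(e + 4);
        out->size = read_le32(e + 8);
        out->type = e[12];
        out->compression = e[13];
        memcpy(out->name, e + 16, 16);
        out->name[16] = '\0';
        return 1;
    }
    return 0;
}
```

The whole directory is one flat array of fixed-size records, which is why such a format is cheap to load into a RAM buffer and scan, compared with ZIP's per-file headers.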

Contrast this with an actual filesystem, which typically has a more complex 
structure. But, a format like WAD2 or WAD4 can keep overhead low (and 
both are more versatile than the IWAD/PWAD format used in Doom; which 
only stored a flat list of 8-byte names).

I decided to leave out a bunch of stuff (for conciseness), but will note 
that the most likely option ATM would be a further modified form of 
WAD2 for manifests, where:
  If payload data is small enough, it may be stored inline in the 
  directory entry;
  If the name is too large for the name field, it is stored externally 
  (similar to payload data).
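A possible shape for such an entry, as a hypothetical sketch only (the field layout, flag values and function names are invented here, not taken from BGBCC): a 32-byte entry like WAD2's, where unions let the same bytes hold either a small payload or its file offset, and either a short name or a reference to a longer name stored elsewhere.

```c
#include <stdint.h>
#include <string.h>

enum { ENT_INLINE_DATA = 1, ENT_EXTERNAL_NAME = 2 };

typedef struct {
    uint8_t type;
    uint8_t flags;               /* ENT_INLINE_DATA, ENT_EXTERNAL_NAME */
    uint8_t inline_len;          /* payload length when ENT_INLINE_DATA */
    uint8_t pad;
    union {
        struct { uint32_t offset, size, reserved; } ext;  /* payload elsewhere */
        unsigned char bytes[12];                          /* small payload inline */
    } data;
    union {
        char text[16];                                    /* short name inline */
        struct { uint32_t offset, length; } ext;          /* long name elsewhere */
    } name;
} ManifestEntry;                 /* 4 + 12 + 16 = 32 bytes on common ABIs */

/* Store a payload inline if it fits; returns 0 on success, -1 if too big
   (the writer would then append it to the file and fill in data.ext). */
static int entry_set_inline(ManifestEntry *e, const void *p, size_t n)
{
    if (n > sizeof e->data.bytes)
        return -1;
    memcpy(e->data.bytes, p, n);
    e->inline_len = (uint8_t)n;
    e->flags |= ENT_INLINE_DATA;
    return 0;
}

/* Locate the payload: inside the entry, or at an offset into the image. */
static const unsigned char *entry_payload(const ManifestEntry *e,
                                          const unsigned char *image,
                                          size_t *len)
{
    if (e->flags & ENT_INLINE_DATA) {
        *len = e->inline_len;
        return e->data.bytes;
    }
    *len = e->data.ext.size;
    return image + e->data.ext.offset;
}
```

Since many manifest values (a type code, a small signature) are only a few bytes, storing them inline avoids a second lookup and keeps the directory the only thing that has to be read.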

Metadata would likely be structured in a way that is superficially 
similar to the "Windows System Registry". Within my compiler, I had 
already used a similar system, though in the past the metadata had been 
expressed in a textual form (based on something similar to the REG format 
used by Windows, usually when installing stuff; itself derived from the 
INI format). In this way, the WAD lumps would be used more as key/value 
pairs or data blobs than as files.

Can note that my considered format is somewhat different from the "hive" 
format used by Windows, but that format would be needlessly bulky if 
used for compiler metadata (and more suited for HDD based access, not so 
much for blobs to be read into RAM buffers).

All this stuff could likely differ from one compiler implementation to 
another though (similar to how compilers may differ as to which format 
they use for object files and static libraries, and mostly no one 
notices or cares).

...

#394534

From	Kaz Kylheku <643-408-1753@kylheku.com>
Date	2025-10-10 23:45 +0000
Message-ID	<20251010162821.357@kylheku.com>
In reply to	#394527

On 2025-10-10, David Brown <david.brown@hesbynett.no> wrote:
> On 10/10/2025 08:27, BGB wrote:
>> On 10/9/2025 10:59 PM, Keith Thompson wrote:
>>> bart <bc@freeuk.com> writes:
>
>>>
>>>>> One merit is if code can be copy-pasted, but if one has to change
>>>>> all instances of:
>>>>>     char *s0, *s1;
>>>>> To:
>>>>>     char* s0, s1;
>>>>> Well, this is likely to get old, unless it still uses, or allows C
>>>>> style declaration syntax in this case.
>>>>
>>>> That one's been fixed (50 years late): you instead write:
>>>>
>>>>   typeof(char*) s0, s1;
>>>>
>>>> But you will need an extension if it's not part of C23.
>>>
>>> Yes, that will work in C23, but it would never occur to me to
>>> write that.  I'd just write `char *s0, *s1;` or, far more likely,
>>> define s0 and s1 on separate lines.  Using typeof that way triggers
>>> my WTF filter.
>>>
>> 
>> Agreed.
>> 
>> 
>> 
>> I think it can be contrast with C# style syntax (with "unsafe") where 
>> one would write:
>>    char* s0, s1;
>
> Does C# treat s1 as "char*" in this case?  That sounds like an 
> extraordinarily bad design decision - having a syntax that is very like 
> the dominant C syntax yet subtly different.

The detailed properties of C syntax do not have that much mind share
in the kind of development done in C# and its ilk.

Only a minority of developers within a minority moving between
C and C# would suffer from confusion.

>
> Issues like this have been "solved" for decades - in the sense that 
> people who care about their code don't make mistakes from mixups of 
> "char" and "char*" declarations.  There are a dozen different ways to be 
> sure it is not an issue.  Simplest of all is a style rule - never 
> declare identifiers of different types in the same declaration.  I'd
> have preferred that to be a rule baked into the language from the start, 
> but we all have things we dislike about the C syntax.

But the C syntax lets us factor out a common, complex part of the type
between two declared entities into the stem, so that we then highlight
what is different between them, without using a typedef alias.  And the
fact that they are in the same declaration shows that they are related.

  struct foo {
    /* lotsa members */

  } x[42], *px = x, *px_end = x + 42;
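A compilable version of the example above, with placeholder struct members. Note that, as in the `char *s0, *s1;` case discussed upthread, the end pointer needs its own `*`: without it, `px_end` would be a `struct foo`, which cannot be initialized from the pointer `x + 42`.

```c
#include <stddef.h>

/* One declaration shares the struct type among an array, a pointer to
   its first element and a one-past-the-end pointer. Each pointer needs
   its own '*', since '*' binds to the declarator, not the struct type. */
struct foo {
    int id;
    double weight;               /* stand-ins for "lotsa members" */
} x[42], *px = x, *px_end = x + 42;

/* Walk the array between the two pointers. */
static size_t count_foo(void)
{
    size_t n = 0;
    for (struct foo *p = px; p != px_end; p++)
        n++;
    return n;
}
```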

I /think/ that Java goes further in that you can factor out array
derivation into the stem:

  int[3] a[4], b; // don't quote me on it

But something occurs to me. typedef shouldn't be a storage class;
that is silly. typedef should be something you can derive in a
declarator. Then you could do this:

  struct {
    /* lotsa members */

  } typedef(foo), x[42], *px = x, *px_end = x + 42;

How about a two-argument variant of typedef for use in
any part of a declarator:

  int typedef(typedef(*, ptr_t)foo[42], array_t);

This is just 

  int *foo[42];

in which the pointer to int is typedefed as ptr_t, the array of 42
of those as array_t,  and foo is declared as an object of that
array type.
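In today's C, the effect of the proposed `typedef(...)` derivation takes separate typedefs, one per derived type. A sketch of the equivalent declarations; the names come from the example above, and `foo_plain` is added only for comparison.

```c
/* Standard-C spelling: name each derived type, then declare foo. */
typedef int *ptr_t;          /* pointer to int               */
typedef ptr_t array_t[42];   /* array of 42 pointers to int  */
array_t foo;                 /* same type as: int *foo[42];  */

int *foo_plain[42];          /* the original declaration, for comparison */
```

What the proposal saves is repeating the derivation: here the pointer and array steps are written once in the typedefs and the declarator of `foo` is just a name, whereas the inline form would capture both typedefs while writing `int *foo[42]` once.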

Maybe :typedef syntax could be better.

  struct {
    /* lotsa members */

  } foo : typedef, x[42], *px = x, *px_end = x + 42;

and

  int *:typedef(ptr_t) foo[42]:typedef(array_t);

Same thing: when the pointer is derived via *, the :typedef(name) syntax
takes a snapshot of that type and stores it into the scope under that
typedef name.  Same thing with :typedef(arg) after the array declarator.

:)

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

#396245

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2026-01-06 18:24 -0800
Message-ID	<86h5syqhwk.fsf@linuxsc.com>
In reply to	#394524

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> bart <bc@freeuk.com> writes:
>
>> On 09/10/2025 04:49, BGB wrote:
>
> [...]
>
>> Nobody cares about C syntax.
>
> That is so manifestly untrue that I can't imagine what you actually
> meant.
>
> Many of us, myself included, don't particularly like some aspects of
> C syntax, but that's not the same as not caring about it.
>
>>                              Learning all its ins and outs seems be
>> a rite of passage.
>
> Perhaps.  It's also necessary if you want to work with the language.
>
>> The trouble is that C-style is so dominant, few people would know
>> what a decent syntax looks like.  Or, more, likely, they associate
>> a clean, well-designed syntax with toy or scripting languages, and
>> can't take it seriously.
>>
>> But if it looks as hairy as C++ then it must be the business!
>
> C syntax has survived and been propagated to other languages because
> it's well known, not, I think, because anybody really likes it.

I like C syntax, most of it anyway.  Even the parts I don't
especially like aren't horrendous, which is more than I can
say for some other popular languages.

#394544

From	Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Date	2025-10-14 06:29 +0200
Message-ID	<10ckjjf$2mg8q$1@dont-email.me>
In reply to	#394517

(Sorry for the delayed reply; your ~450-line post was too long for
me to consider a timely reply.)

On 09.10.2025 05:49, BGB wrote:
> On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
>> On 08.10.2025 19:29, BGB wrote:
>>> On 10/8/2025 8:59 AM, Janis Papanagnou wrote:

> 
> Throughout much of my life, C++ has been around, but using it has often
> turned into a footgun. Early on the code had a bad habit of breaking
> from one compiler version to another, or the ability to compile C++ code
> in general would be broken (primarily with Cygwin and MinGW; where
> whether or not "g++" worked on a given install attempt, or with a given
> program, was very hit or miss).

I used it early on, on various Unix platforms; all differed in some
details - like how templates were handled in the development
environment - but nothing was really an issue; as with current
configuration settings, this was covered and handled by the build
system.

It doesn't astonish me in the least that you've faced specific problems
on the Windows platforms.

> [...]
> 
> In most cases, it left C as a more preferable option.
> C can be made to do the same stuff at similar performance, with often
> only minimal difference in expressive power.

The problem is, IMO, rather that "C", in the first place, doesn't
compare to C++ in its level of "expressive power".

> 
> And, the main "powerful" tool of C++, templates,

(IMO, the main powerful tools were primarily classes, polymorphism,
and [real] references.)

> tending to do bad
> things to build times and result in excessive code bloat.

I recall that initially we had issues with code bloat, but I don't
recall it being a real problem; we handled that (but,
after that long time, don't ask me how).

> 
> And, if one tries to avoid C++'s drawbacks, the result was mostly code
> that still looks mostly like C.

(That sounds as if you haven't used OO designs, reference parameters,
overloading, and so on, obviously.)

> 
> Though, similar was often a problem in my other language design
> attempts: The most efficient way to do things was often also the C way.

IME, *writing* software in "C" requires much more time than in C++;
presuming that is what you meant by "most efficient way to do things".

(Saving a few seconds in "C" compared to C++ programs can hardly be
relevant, I'd say; unless you were not really familiar with C++?
Or have special application areas, as I read further below in the post.)

> [...]
> 
> Some amount of my stuff recently has involved various niche stuff.
>   Interfacing with hardware;
>   Motor controls;
>   Implementing things like an OpenGL back-end or similar;
>   Being used for a Boot ROM and OS kernel;
>   Sometimes neural nets.

(Nice. - I did neural net simulations with C++ back in those days.)

> 
> Few traditional languages other than C work well at a lot of this.
> 
> 
> A usual argued weakness of C is that it requires manual memory
> management. But, OTOH, you *really* don't want a GC in motor controls or
> an OS kernel or similar.
> 
> Like, if the GC triggers, and an interrupt handler happens at a bad
> time, then you have a problem.
> 
> Or, if you have a 1us timing tolerance for motor controls and this gets
> blown because the GC takes 75ms, etc...

Sure, you should know where to use static memory, dynamic memory you
manage yourself, or "I-don't-want-to-care" GC-managed memory, or a
sensible deliberate mixture of these (if the language allows).

(I've never used GC with C++; is that possible nowadays?)

> [...]
> 
> Maybe C will be around indefinitely for all I know.

Not unlikely.

> 
> Like, the passage of time still hasn't totally eliminated FORTRAN and
> COBOL.

There's obviously some demand. *shrug* - I don't care much. - My last
"contact" with FORTRAN was when one of my children was asked to handle
some legacy library code; my suggestion was to get rid of that task.

> And, C is far more commonly used than either.
> 
> Unless maybe something can come along that is a better C than C...

There are so many languages now - frankly, there were already a lot
back then, four decades ago! - so I don't think the proliferation will
stop; and I don't think that evolution is a good thing. It seems that
often the inventors have their own agenda, and the success of languages
depends mainly on the marketing efforts and the number of fans
triggered by newly invented buzzwords and their own invented
terminology [for already existing old concepts]!

> [...]
> 
>> I certainly agree to what a "clean language" can be.
>>
>> My opinion on that is, though, that the "C" base of C++ is part of
>> the problem. Which doesn't let it appear to me "C" to be "better"
>> than C++, but that the "C" base is part of C++'s problem. (Here
>> I'm not speaking about "C++"'s own problems that probably entered
>> about with C++0x/C++11, IMO. - Mileages certainly vary.)
>>
> 
> Possibly.
> 
> 
> A new C-like language need not necessarily be strictly C based.

(There are a couple of things I like in "C". But if I had to invent a
language it would certainly not be "C-like". I'd take a higher-level
[better designed] language as a model and support the "C" features I
like, if not already present in that language.)

> 
> My thinking would be likely keeping a similar basic syntax though,
> though likely more syntactically similar to C#,

(But the syntax is one of the problems of C and its descendants, IMO. - Part
of what was described in existing "C-like" languages is either the
less-desired elements or deviations, but the latter will probably
just add to confusion if details are subtle. It's already bad enough
with subtle differences between different "C" standards it seems.)

> but retaining more in
> terms of implementation with C and C++.

(But weren't exactly these languages already [partly] invented with
such an agenda?)

> 
> Would likely simplify or eliminate some infrequently used features in C.
> 
> Possibly:
>   Preprocessor, still exists, but its role is reduced.
>     Its role can be partly replaced by compiler metadata.
>   Trigraphs and digraphs: Gone;
>   K&R style declarations, also gone;
>   Parser should not depend on previous declarations;
>   Non trivial types and declarator syntax: Eliminate;
>   ...

Sounds all reasonable to me.

> 
> Possibly:
> Pointers and arrays can be specified on the type rather than declarator
> (so, more like C# here)

(Yeah, but mind the comments on effects of "subtle differences".)

> [...]
> 
> Though, the harder problem here isn't necessarily that of designing or
> implementing it, but more in how to make its use preferable to jus
> staying with C.

Well, as formulated, that's an individual thing. Nowadays I have the
freedom to use what I like in my recreational activities, but if we
consider professional projects there are conditions and requirements
to take into account.

> 
> One merit is if code can be copy-pasted, but if one has to change all
> instances of:
>   char *s0, *s1;
> To:
>   char* s0, s1;

Such changes would be annoying. (And I say that with a strong aversion
to C's declaration syntax.) - For me, "C" is not a good base; neither
to keep its bad syntax nor to change it in subtle ways.

My style is different anyway: [mostly] separate declarations, each
initialized, as in

  char * s0 = some_alloc (...);
  char * s1 = 0;

More important is that such declarations may appear anywhere, not just
at the beginning of a block. (I'm still traumatized by K&R, I suppose.)

> [...]
> 
> Java and C# had made 'char' 16-bit, but I now suspect this may have been
> a mistake. It may be preferable instead keep 'char' as 8 bits and make
> UTF-8 the default string format. In the vast majority of cases, strings
> hold primarily or entirely ASCII characters.

I think we should be careful here! A Unicode "character" may even
need a 32-bit type, but UTF-8 is just an "encoding" (in units of an
octet). If we want a sensible type system we should be aware of that
difference. The question is: what shall be expressed by a 'char' type,
the semantic entity or the transfer syntax? (This question is similar
to the Unix file system, also based on octets; that made it possible
to represent any international multi-octet characters. Some layer is
necessary to get from the "transfer syntax" (the encoding) to the
representation.) - What will, say, a "C" user expect from 'char': just
to move it around, or to represent it on some output (or input) medium?
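To make the encoding-versus-character distinction concrete: in C, strlen() counts octets (UTF-8 code units), not characters. A minimal sketch, assuming well-formed UTF-8 input, that counts code points by skipping continuation bytes (bit pattern 10xxxxxx); utf8_count_codepoints is a hypothetical helper name.

```c
#include <stddef.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string.
 * Assumes well-formed input: every byte that is not a continuation
 * byte (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_count_codepoints(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) != 0x80)   /* not a continuation byte */
            n++;
    }
    return n;
}
```

For "h\xc3\xa9llo" ("héllo" in UTF-8) there are 6 octets but utf8_count_codepoints() returns 5.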

> 
> Also, can probably have a string type:
>   string str="Some String";
> But, then allow that string is freely cast to "char*", ...

(Wasn't that so in C++? - And in addition there's the corresponding
template classes, IIRC. - But I don't recall all the gory details.)

> Well, and that the underlying representation of a string is still as a
> pointer into a string-table or similar.

> Also the design of the standard library should remain conservative and
> not add piles of needless wrappers or cruft.

Not sure what you have in mind here.

Personally, despite some resentment about some of the complex syntax
and constructs necessary, I liked the C++ STL; its orthogonality
and concepts in principle. (And especially if compared to some
other languages' ad hoc "tool-chest" libraries I stumbled across.)

>>> [...]
>>>
>>> Like, one can throw out the whole mess that is dealing with
>>> Multiple-Inheritance
>>
>> Well, when I started with C++ there wasn't multiple-inheritance
>> available. Personally thinking its omission would be a mistake;
>> I missed it back these day.
>>
>> I'm not sure what "mess" you have in mind. - Explicit qualification
>> isn't a hindrance. Weakening the independence of classes in complex
>> multi-level class-topologies is something under control of the
>> program designer. - So it's fine to have it with all design options
>> it opens.
> 
> There is both implementation complexity of MI, and also some added
> complexity with using it. The complexity gets messy.

(Okay, if that's what you took from it, I of course accept it.
But I'd rather have expected you to dislike some STL parts than
[multiple] inheritance.)

> 
> 
> The SI + Interfaces model can reduce both.

I've used classes with only "pure virtual" functions to achieve
the interface abstraction; since I could easily design what I
needed with standard features and practically no overhead I thus
wasn't missing the 'interface' feature.

(But of course I can see the implementation argument you make.)

> Granted, these can grow their own warts (like default methods or
> similar), but arguably still not as bad as MI.

(Well, I appreciated it to have that feature available in C++,
even though my first OO language, Simula, didn't support it, so
I was used to not having it when I got into C++ and liked it.)

> 
> I am more thinking from the perspective of implementing a compiler.

Hah! Yeah. - Recently in another NG someone disliked a feature
because he had suffered from troubles implementing it. (It was
not MI but formatted I/O in that case.) - I'm not implementing
complex languages, so I guess I can feel lucky if someone else
did the language implementation job and I can just use it.

> [ implementation issues snipped and gracefully skipped ]

> [...]
> Virtual inheritance still means one can't just call the copy logic for
> each parent class when copying a derived class;

(I don't think I agree here. - Or are you still talking of the
implementers' challenges? - But never mind. Programming in C++
I could model everything I liked. That was really nice.)

Janis

> [...]


#394548

From	BGB <cr88192@gmail.com>
Date	2025-10-14 20:13 -0500
Message-ID	<10cmsjg$3bdrj$1@dont-email.me>
In reply to	#394544

On 10/13/2025 11:29 PM, Janis Papanagnou wrote:
> (Sorry for the delayed reply; your ~450 lines post was too long for
> me to consider a timely reply.)
> 
> On 09.10.2025 05:49, BGB wrote:
>> On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
>>> On 08.10.2025 19:29, BGB wrote:
>>>> On 10/8/2025 8:59 AM, Janis Papanagnou wrote:
> 
>>
>> Throughout much of my life, C++ has been around, but using it has often
>> turned into a footgun. Early on the code had a bad habit of breaking
>> from one compiler version to another, or the ability to compile C++ code
>> in general would be broken (primarily with Cygwin and MinGW; where
>> whether or not "g++" worked on a given install attempt, or with a given
>> program, was very hit or miss).
> 
> I used it early on on various Unix platforms; all had some details
> different - like the way how templates worked in the development
> environment - but nothing was really an issue; as with current
> configuration settings this was covered and handled by the build
> system.
> 
> It doesn't astonish me the least if you've faced specific problems
> on the Windows platforms.
> 

It was pretty variable, but the usual thing was that trying to build any 
kind of C++ code (even a trivial "Hello World") would, on some installs 
of these compilers, simply die in a storm of error messages.

Well, and for a given Cygwin install attempt, whether or not "g++" would 
work, etc, was a bit like playing roulette.

After switching to MSVC, things were a little more stable here. But by 
then there were other issues.

>> [...]
>>
>> In most cases, it left C as a more preferable option.
>> C can be made to do the same stuff at similar performance, with often
>> only minimal difference in expressive power.
> 
> The problem is, IMO, rather that "C", in the first place, doesn't
> compare to C++ in its level of "expressive power".
> 

?...

I have yet to find much that can be expressed in C++ but is not also 
expressible in C.

The main things that are fundamentally different are things like 
Exceptions and RTTI, but even in C++, these don't come free.

Though, if exceptions are implemented using an approach similar to VEH 
in the Windows X64 ABI, the cost is at least modest.

>>
>> And, the main "powerful" tool of C++, templates,
> 
> (IMO, the main powerful tool was primarily classes, polymorphisms,
> also [real] references.)
> 

These can be done in C via manually written vtables, and passing the 
address of a variable.

>> tending to do bad
>> things to build times and result in excessive code bloat.
> 
> I recall that initially we had issues with code bloat, but I don't
> recall that it would have been a problem; we handled that (but,
> after that long time, don't ask me how).
> 
>>
>> And, if one tries to avoid C++'s drawbacks, the result was mostly code
>> that still looks mostly like C.
> 
> (That sounds as if you haven't used OO designs, reference parameters,
> overloading, and so on, obviously.)
> 

We can do OO, just using a different approach, say:
   typedef struct FooObj_s FooObj;
   typedef struct FooObj_vt_s FooObj_vt;
   struct FooObj_vt_s {
     void *resv1;
     void *resv2;
     void *resv3;
     void *resv4;
     int (*Method1)(FooObj *self, int x, int y);
     int (*Method2)(FooObj *self, int x, int y, int z);
     /* ... further methods ... */
   };
   struct FooObj_s {
     FooObj_vt *vt;   /* points at the class's shared vtable */
     int w;
     /* ... further fields ... */
   };

And, references as:
   int someFunction(int *rvar);
   ...
   someFunction(&somevar);

It all works, and doesn't require significantly more LOC than it would 
have in C++.
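Filling in the sketch above, a minimal runnable version might look like the following. The reserved vtable slots are left out, and FooImpl_*, FooObj_Init and FooObj_GetW (plus the method bodies) are hypothetical, only there to show dispatch through the vtable and a pointer used as a "reference" parameter.

```c
#include <stddef.h>

typedef struct FooObj_s FooObj;
typedef struct FooObj_vt_s FooObj_vt;

/* Vtable: one function pointer per virtual method. */
struct FooObj_vt_s {
    int (*Method1)(FooObj *self, int x, int y);
    int (*Method2)(FooObj *self, int x, int y, int z);
};

/* Object: first member points at the (shared) vtable. */
struct FooObj_s {
    const FooObj_vt *vt;
    int w;
};

/* Hypothetical method implementations for one "class". */
static int FooImpl_Method1(FooObj *self, int x, int y)
{
    return self->w * (x + y);
}

static int FooImpl_Method2(FooObj *self, int x, int y, int z)
{
    return self->w + x + y + z;
}

/* One static vtable instance shared by all objects of this class. */
static const FooObj_vt FooImpl_vt = { FooImpl_Method1, FooImpl_Method2 };

/* "Constructor": install the vtable and initialize fields. */
void FooObj_Init(FooObj *obj, int w)
{
    obj->vt = &FooImpl_vt;
    obj->w = w;
}

/* "Reference" parameter: the callee writes through the pointer. */
void FooObj_GetW(const FooObj *obj, int *rvar)
{
    *rvar = obj->w;
}
```

A call then looks like `obj.vt->Method1(&obj, 1, 2)`, which is what a C++ compiler would generate for a virtual call, just written out by hand.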

>>
>> Though, similar was often a problem in my other language design
>> attempts: The most efficient way to do things was often also the C way.
> 
> IME, *writing* software in "C" requires much more time than in C++;
> presuming you meant that with "most efficient way to do things".
> 
> (Saving a few seconds in "C" compared to C++ programs can hardly be
> relevant, I'd say; unless you were not really familiar with C++ ?
> Or have special application areas, as I read below in the post.)
> 

The main limiting factor at present is that writing a non-trivial C++ 
compiler is a harder problem.

I could write C++ code, but then it isn't really portable outside 
running on my PC or similar.

Though, I have a mostly usable C compiler at least.
   At least, usable for porting single programs.
   Trying to port something like the Linux userland, not so much.
     Too much stuff here is written to assume GCC.

Some simple programs worked with "./configure" scripts, after getting the 
compiler to mimic GCC closely enough that configure would try to use it as 
a cross compiler; but programs invariably break when trying to use various 
GCC'isms, or when relying on glibc-specific stuff, other Linux-specific 
headers, and so on.

So, yeah, nowhere near the level needed to port "bash" and "coreutils" 
and similar.

But I was able to experimentally port things like "Quake 3 Arena" and 
similar, though Q3A is a little impractical on a 50MHz CPU; but Doom 
runs well.

Granted, a new language would not really address any of the "make 
existing software work" issues.

>> [...]
>>
>> Some amount of my stuff recently has involved various niche stuff.
>>    Interfacing with hardware;
>>    Motor controls;
>>    Implementing things like an OpenGL back-end or similar;
>>    Being used for a Boot ROM and OS kernel;
>>    Sometimes neural nets.
> 
> "Nice. - I've done Neural Net simulations with C++ back these days.)
> 

I have experimented with some, but in this case mostly using a lot of SIMD.

I had noted that in some cases, like primarily SIMD heavy NN code, my 
50MHz FPGA soft-processor could compete surprisingly well with an early 
2000s laptop.

But then again, I also noted this when benchmarking said laptop:
   memcpy: ~450 MB/sec;
   x87 multiply-accumulate: ~60 MFLOP/s;
   CPU speed: 1400 MHz, 32-bit x86.
     Has MMX and similar, but was not using MMX.
     Had noted the process is mostly bandwidth limited.

The SIMD unit on my soft-processor has a theoretical hard limit of 200 
MFLOP/s, but when using compact formats (mostly FP8 for storage, FP16 
internally) and careful pipelining, it can approach performance similar 
to the laptop's at this task.

I had experimented some with more compact encodings for weights, for 
example:
   FP8U A/B (E4.M4)
   3-bit per value: S,FF (S=Sign)
     A<=B: Interpolated: A, B, (5/8)A+(3/8)B, (5/8)B+(3/8)A
     A> B: Similar, but '111' encodes 0, ...
With the interpolation done on the bytes, the results are unpacked into 
vectors of 4x Binary16 (with 4 weight vectors in 64 bits).

Could also be used as:
   Monochrome HDR format with 16 texels per block;
   A color HDR format with 4 texels per block.
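One way to read the weight encoding above: 8 bits for A + 8 bits for B + 16 values x 3 bits = 64 bits, i.e. 16 weights, or 4 vectors of 4. The decoder below is a sketch under several assumptions not stated in the post: the bit layout, the mapping of FF to the four palette entries, and truncating integer division in the byte interpolation. The other A > B differences elided by the "..." are not modeled; decode_weight is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical block layout (an assumption, not from the post):
 *   bits  0..7  : A  (FP8U, E4.M4)
 *   bits  8..15 : B  (FP8U, E4.M4)
 *   bits 16..63 : 16 codes of 3 bits, code i at bit 16+3*i,
 *                 S in bit 2 of the code, FF in bits 1..0.
 * Returns the selected FP8 byte, negated if S is set, or 0 for the
 * '111' code when A > B. Interpolation is done on the raw bytes. */
int decode_weight(uint64_t blk, int i)
{
    unsigned a = (unsigned)(blk & 0xFF);
    unsigned b = (unsigned)((blk >> 8) & 0xFF);
    unsigned code = (unsigned)((blk >> (16 + 3 * i)) & 7);
    unsigned s = code >> 2, ff = code & 3;
    unsigned pal[4];

    if (a > b && code == 7)
        return 0;                      /* '111' encodes zero when A > B */

    pal[0] = a;
    pal[1] = b;
    pal[2] = (5 * a + 3 * b) >> 3;     /* (5/8)A + (3/8)B, truncated */
    pal[3] = (3 * a + 5 * b) >> 3;     /* (5/8)B + (3/8)A, truncated */
    return s ? -(int)pal[ff] : (int)pal[ff];
}
```

Interpolating the raw FP8 bytes rather than the decoded values behaves roughly like interpolating in the log domain, since the exponent sits in the high bits.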

In this case, there is a big gap in main RAM bandwidth, which seems to 
be a big issue with this task. But, the RAM bandwidth gap is reduced by 
using more compact storage for values (with special-purpose 
instructions, like a Load that also performs the 4xFP8 to 4xFP16 
conversion, ...).
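The 4xFP8 to 4xFP16 step that such a Load performs can be sketched in plain C. This assumes "E4.M4" means an unsigned byte with a 4-bit exponent (bias 7, an assumption) and a 4-bit mantissa. It treats 0x00 as zero and every other byte as a normal number, ignoring denormals, Inf and NaN. The function names are hypothetical.

```c
#include <stdint.h>

/* Convert one unsigned FP8 (E4.M4, assumed bias 7) to IEEE binary16 bits.
 * Binary16 has a 5-bit exponent (bias 15) and a 10-bit mantissa, so each
 * value maps exactly: rebias the exponent, left-align the mantissa. */
uint16_t fp8u_to_fp16(uint8_t v)
{
    unsigned e = v >> 4, m = v & 0x0F;
    if (v == 0)
        return 0;                            /* treat 0x00 as +0.0 */
    return (uint16_t)(((e + 15 - 7) << 10) | (m << 6));
}

/* Unpack four FP8 bytes (byte 0 in the low bits) into four binary16
 * lanes of a 64-bit vector (lane 0 in the low 16 bits). */
uint64_t unpack_4xfp8_to_4xfp16(uint32_t packed)
{
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(packed >> (8 * i));
        out |= (uint64_t)fp8u_to_fp16(b) << (16 * i);
    }
    return out;
}
```

For example, 0x70 (exponent 7, mantissa 0) becomes 0x3C00, which is 1.0 in binary16.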

But, yeah, getting "even anywhere close" is kinda notable given the 
laptop has a 28x clock-speed advantage.

Though, the laptop is far more powerful at running Quake and similar (no 
contest regarding Quake performance).

The FPGA soft-processor could maybe compete better if it could do 
Binary16 SIMD multiply-accumulate operations, but the latency would be 
too high and I couldn't pipeline it.

Getting this result requires writing ASM and manually scheduling the 
pipeline though.

>>
>> Few traditional languages other than C work well at a lot of this.
>>
>>
>> A usual argued weakness of C is that it requires manual memory
>> management. But, OTOH, you *really* don't want a GC in motor controls or
>> an OS kernel or similar.
>>
>> Like, if the GC triggers, and an interrupt handler happens at a bad
>> time, then you have a problem.
>>
>> Or, if you have a 1us timing tolerance for motor controls and this gets
>> blown because the GC takes 75ms, etc...
> 
> Sure, you should know where to use static memory, dynamic management
> organized yourself, or "I-don't-want-to-care" and use GC management,
> or a sensible deliberate mixture of that (if the language allows).
> 
> (I've never used GC with C++; is that meanwhile possible?)
> 

It is possible to use conservative mark/sweep collectors in C.
But performance leaves something to be desired.

Younger me tried to do this, but even for things like 3D engines, I 
ended up trying more to find ways to avoid needing to run the GC.

>> [...]
>>
>> Maybe C will be around indefinitely for all I know.
> 
> Not unlikely.
> 
>>
>> Like, the passage of time still hasn't totally eliminated FORTRAN and
>> COBOL.
> 
> There's obviously some demand. *shrug* - I don't care much. - My last
> "contact" with FORTRAN was when one of my children was asked to handle
> some legacy library code; my suggestion was to get rid of that task.
> 

In my case, I don't have any descendants.

Apparently they still exist in some places, mostly as languages that no 
one uses.

Seemingly a lot of businesses made a migration from COBOL to Java.

>> And, C is far more commonly used than either.
>>
>> Unless maybe something can come along that is a better C than C...
> 
> There's so many languages meanwhile - frankly, there were already a
> lot back then, four decades ago! - so I don't think the proliferation
> will stop; I don't think that evolution is a good thing. It seems that
> often the inventors have their own agenda and the success of languages
> depends mainly on the marketing efforts and the number of fan-people
> that got triggered by newly invented buzzwords, and an own invented
> terminology [for already existing old concepts]!
> 

Apparently the languages people are trying to push as C replacements are 
mostly Rust, Zig, and Go.

None of these particularly compel me though.
   They seem more like needless deviations from C than a true successor.

I guess the older generations mostly had Pascal and Ada.

There was ALGOL, but both C and Pascal descended from ALGOL.

>> [...]
>>
>>> I certainly agree to what a "clean language" can be.
>>>
>>> My opinion on that is, though, that the "C" base of C++ is part of
>>> the problem. Which doesn't let it appear to me "C" to be "better"
>>> than C++, but that the "C" base is part of C++'s problem. (Here
>>> I'm not speaking about "C++"'s own problems that probably entered
>>> about with C++0x/C++11, IMO. - Mileages certainly vary.)
>>>
>>
>> Possibly.
>>
>>
>> A new C-like language need not necessarily be strictly C based.
> 
> (There's a couple things I like in "C". But if I'd have to invent a
> language it would certainly not be "C-like". I'd took a higher-level
> [better designed] language as paragon and support the "C" features I
> like, if not already present in that language.)
> 

I would think some major goals might be:
   Allowing for a compiler with a smaller code footprint.
     Though, the backend is often a big source of pain here.
   Language is reasonably clean and orthogonal;
   Is amenable to efficient code generation;
   Low requirements for implementation overhead.
     Should aim for similar hard constraints to C.
     Should still be usable for bare-metal and firmware.
     And for OS kernel programming.

Sadly, cleaning up the frontend language won't do as much to simplify the 
backend.

Cleaning up the backend mostly means needing to limit complexity in 
areas that affect code generation:
   Corner cases in data representation and the type-system;
   Corner cases in the native ABI;
   ...

My preference is to keep a C family syntax, sorta like C# or GLSL.

As noted elsewhere, my thinking is partly that the pipeline would look like:
   Preprocessor (basic or optional, C like)
   Parser (Context-independent, generates ASTs)
   Front end compiler: Compiles ASTs to a stack IL.
     With front-end semi-type-aware.

Core language should only require frontend to understand primitive types 
(like in C# and Java; with complex types offloaded to backend). Would 
aim to eliminate headers mostly because headers add considerable bulk to 
the ASTs (far more time and memory often spent dealing with header stuff 
than the actual code in the translation units).

Backend:
   IL -> 3AC/SSA;
   Does code generation and similar.
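To illustrate the "IL -> 3AC" step: a stack IL can be turned into three-address code by simulating the operand stack at compile time, with each stack slot holding a temporary's number instead of a value. A minimal sketch for straight-line code only, with no stack overflow/underflow checks; the opcode set, struct layouts and names are hypothetical and not BGBCC's actual IL.

```c
#include <stdio.h>

/* Hypothetical stack-IL opcodes (straight-line code only). */
enum { IL_CONST, IL_LOAD, IL_ADD, IL_MUL, IL_STORE };
typedef struct { int op; int arg; } IlOp;   /* arg: constant or variable id */

/* 3AC operands are temporaries (tN), variables (vN) or constants (#N). */
typedef enum { OPD_TEMP, OPD_VAR, OPD_CONST } OpdKind;
typedef struct { OpdKind kind; int val; } Operand;
typedef struct { Operand dst; char op; Operand a, b; } Tac;  /* op '=' : copy */

static Operand mk(OpdKind k, int v) { Operand o = { k, v }; return o; }

/* Translate n IL ops into 3AC; returns the number of 3AC instructions. */
int il_to_tac(const IlOp *il, int n, Tac *out)
{
    int stack[64], sp = 0, ntemp = 0, nout = 0;
    for (int i = 0; i < n; i++) {
        Tac t;
        t.op = '=';
        t.a = mk(OPD_CONST, 0);
        t.b = mk(OPD_CONST, 0);
        switch (il[i].op) {
        case IL_CONST: t.a = mk(OPD_CONST, il[i].arg); break;
        case IL_LOAD:  t.a = mk(OPD_VAR, il[i].arg);   break;
        case IL_ADD:
        case IL_MUL:
            t.b = mk(OPD_TEMP, stack[--sp]);     /* right operand on top */
            t.a = mk(OPD_TEMP, stack[--sp]);
            t.op = (il[i].op == IL_ADD) ? '+' : '*';
            break;
        case IL_STORE:
            t.dst = mk(OPD_VAR, il[i].arg);
            t.a = mk(OPD_TEMP, stack[--sp]);
            out[nout++] = t;
            continue;                            /* a store pushes nothing */
        default:
            continue;                            /* ignore unknown ops */
        }
        t.dst = mk(OPD_TEMP, ntemp);             /* each temp assigned once */
        stack[sp++] = ntemp++;
        out[nout++] = t;
    }
    return nout;
}

static void print_opd(Operand o)
{
    printf("%c%d", o.kind == OPD_TEMP ? 't' : o.kind == OPD_VAR ? 'v' : '#', o.val);
}

void print_tac(const Tac *t)
{
    print_opd(t->dst);
    printf(" = ");
    print_opd(t->a);
    if (t->op != '=') {
        printf(" %c ", t->op);
        print_opd(t->b);
    }
    printf("\n");
}
```

For v2 = (v0 + 2) * v1, i.e. the IL LOAD 0, CONST 2, ADD, LOAD 1, MUL, STORE 2, printing the result with print_tac gives t0 = v0, t1 = #2, t2 = t0 + t1, t3 = v1, t4 = t2 * t3, v2 = t4. Since every temporary is assigned exactly once, the temporaries are already in SSA form for straight-line code.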

Likely, most packaging (for IL object files and static libraries) would 
be based around a variation of the WAD format (probably WAD2 based; 
though in simple cases the Doom IWAD/PWAD format works well).

One of my past (stalled) attempts at doing a smaller C compiler had been 
using a modified WAD in place of the COFF format, though in this case it 
is debatable how much was really saved by using WAD in place of COFF 
here (and some of the tables partly derived from ELF as well).
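For reference, the plain Doom PWAD layout is small: a 12-byte header ("IWAD" or "PWAD", little-endian int32 lump count, int32 directory offset), the lump data, and a directory of 16-byte entries (int32 file offset, int32 size, 8-byte NUL-padded name). A minimal sketch of writing and looking up lumps in memory; the helper names and the "CODE"/"SYMTAB" lump names are hypothetical, and wad_find does not validate the header.

```c
#include <stdint.h>
#include <string.h>

/* WAD integer fields are little-endian 32-bit. */
static void put32(unsigned char *p, uint32_t v)
{
    p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF;
    p[2] = (v >> 16) & 0xFF; p[3] = v >> 24;
}

static uint32_t get32(const unsigned char *p)
{
    return p[0] | (p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct { const char *name; const void *data; uint32_t size; } LumpIn;

/* Write a PWAD image into buf (caller ensures it is large enough):
 * header, then lump data, then the directory. Returns the total size. */
size_t wad_write(unsigned char *buf, const LumpIn *lumps, int n)
{
    uint32_t pos = 12, dpos = 12;
    memcpy(buf, "PWAD", 4);
    for (int i = 0; i < n; i++) {
        memcpy(buf + pos, lumps[i].data, lumps[i].size);
        pos += lumps[i].size;
    }
    put32(buf + 4, (uint32_t)n);
    put32(buf + 8, pos);                         /* directory offset */
    for (int i = 0; i < n; i++) {
        unsigned char *d = buf + pos + 16 * i;
        size_t len = strlen(lumps[i].name);
        put32(d, dpos);
        put32(d + 4, lumps[i].size);
        memset(d + 8, 0, 8);                     /* name is NUL padded */
        memcpy(d + 8, lumps[i].name, len > 8 ? 8 : len);
        dpos += lumps[i].size;
    }
    return pos + 16 * (size_t)n;
}

/* Find a lump by name; returns its data and sets *size, or NULL. */
const unsigned char *wad_find(const unsigned char *wad, const char *name,
                              uint32_t *size)
{
    uint32_t n = get32(wad + 4), dir = get32(wad + 8);
    for (uint32_t i = 0; i < n; i++) {
        const unsigned char *d = wad + dir + 16 * i;
        if (strncmp((const char *)d + 8, name, 8) == 0) {
            *size = get32(d + 4);
            return wad + get32(d);
        }
    }
    return NULL;
}
```

The central directory is what makes WAD convenient for object files: a reader can jump straight to a named lump without scanning the data.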

There are pros and cons between a modified WAD and RIFF-style TLV formats:
RIFF is more traditional;
But WAD sometimes fits use patterns better, and can be adapted to 
different contexts.

One use is to add an additional magic to encode the use-case of the header.
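For contrast, a RIFF file is a "RIFF" tag, a little-endian 32-bit size, a 4-byte form type, and then a sequence of chunks (FourCC, 32-bit size, payload, plus a pad byte when the size is odd). Without a central directory, finding a chunk means walking the chain, as in this minimal sketch (riff_find is a hypothetical name):

```c
#include <stdint.h>
#include <string.h>

static uint32_t rd32le(const unsigned char *p)
{
    return p[0] | (p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Search the top-level chunks of a RIFF form for the first chunk with
 * the given FourCC. Returns a pointer to its payload and sets *size,
 * or returns NULL if not found or the data is truncated. */
const unsigned char *riff_find(const unsigned char *buf, size_t len,
                               const char *fourcc, uint32_t *size)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0)
        return NULL;
    size_t end = 8 + (size_t)rd32le(buf + 4);    /* size excludes first 8 bytes */
    if (end > len)
        end = len;
    size_t pos = 12;                             /* skip header + form type */
    while (pos + 8 <= end) {
        uint32_t csize = rd32le(buf + pos + 4);
        if (memcmp(buf + pos, fourcc, 4) == 0) {
            if (pos + 8 + csize > end)
                return NULL;                     /* truncated chunk */
            *size = csize;
            return buf + pos + 8;
        }
        pos += 8 + (size_t)csize + (csize & 1);  /* odd sizes are padded */
    }
    return NULL;
}
```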

Would be nice if I could manage the thing of fitting a full-featured 
compiler in under 40k lines.

>>
>> My thinking would be likely keeping a similar basic syntax though,
>> though likely more syntactically similar to C#,
> 
> (But the syntax is one of C's and descendants' problem, IMO. - Part
> of what was described in existing "C-like" languages is either the
> less-desired elements or deviations, but the latter will probably
> just add to confusion if details are subtle. It's already bad enough
> with subtle differences between different "C" standards it seems.)
> 

Some simplification is possible, particularly regarding things like 
declarations, without drastically changing the look of the language.

So, the language may still look like C, but be a little easier to parse.

Keeping the general syntax intact helps with familiarity and ease of 
writing code for those who already know similar languages. Though, yes, 
looking mostly similar to C, but not exactly, could annoy some people.

Though, mostly, the syntax could follow similar patterns to C# and Java. 
Would differ from Java mostly in the avoidance of needless verbosity; 
and allowing a more free-form program structure.

It is also possible to allow for a subset of code that is valid in both 
languages.

>> but retaining more in
>> terms of implementation with C and C++.
> 
> (But weren't exactly these languages already [partly] invented with
> such an agenda?)
> 

?...

I am imagining something that basically does similar stuff to what C 
already does, and can ideally be used in a similar context.

The main downside is that C and C++ are more complicated than ideal in 
many areas. This has a detrimental effect on compilers.

Not so much intending to make a language that tries to be more intuitive 
or hand-holding though. However, if it is possible to make provisions 
for things like static-analysis or bounds-checked arrays (in a way that 
ideally doesn't adversely affect performance), this can be nice.

In some cases, one can try to pass some compile-time metadata through the 
type-system, but this has the downside of adding complexity for the 
compiler. Though, it could be allowed in cases where mostly relevant for 
static analysis but without actually adding new requirements for the 
implementation.

Say, for example:
   "int*" and "int[]" are equivalent for the ABI and for a minimal 
implementation, but a more advanced implementation is allowed to 
constrain the allowed semantics for "int[]" in ways that would not 
necessarily be valid for "int*".

Well, and you could "have your cake and eat it too", say, by having 
"int[]" and friends allow for very aggressive TBAA (type-based alias 
analysis) but "int*" is assumed to alias readily.

Also, "int[]" can be assumed to potentially convey implicit array 
bounds, whereas "int*" can be assumed to not convey array bounds (even 
if the compiler represents both, at the machine level, as a bare pointer 
to the first element of the array).
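For comparison with current C: the non-alias promise proposed here for "int[]" is roughly what C99's restrict qualifier expresses per pointer, while bounds have to travel separately as an explicit length (C99's "int a[static N]" parameter form can state a minimum element count, but nothing more). Unlike the proposed type-level rule, restrict is a promise the compiler does not check. A minimal sketch; scale_into is a hypothetical name.

```c
#include <stddef.h>

/* dst and src are promised not to overlap (restrict), so the compiler
 * may assume a store to dst[i] never changes any src element, and can
 * keep values in registers or vectorize the loop. The array bounds are
 * not part of the pointer type; they travel separately as n. */
void scale_into(size_t n, int *restrict dst, const int *restrict src, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```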

So, for example:
   Foo obj1;
   Bar obj2;
Where obj1 and obj2 may only alias in the case of a subclass/superclass 
relationship (if the two types cannot refer to the same class instance, 
it can be assumed that no alias is possible).

Yeah, I am aware there is the "provenance" thing, but personally I fail 
to understand how exactly the provenance model is supposed to work (so 
it makes more sense to me to operate within the limits of more 
conventional aliasing semantics; and defining rules for when it is safe, 
and when it is not safe, to assume non-alias based on types or similar).

Well, and personally I feel "assume TBAA may wreck your day; so just use 
memcpy()" to be a crappy solution. I also sympathize with the desire to 
not ask people to put "restrict" or similar all over the place (and to 
be able to optimize stuff by assuming that non-aliasing things don't alias).
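For readers unfamiliar with the idiom being criticized: under C's effective-type rules, reading a float's storage through a uint32_t pointer is undefined behavior. The portable way to get at the bits is therefore to copy them with memcpy(), which GCC and Clang typically reduce to a plain register move. A minimal sketch, assuming float is a 32-bit IEEE binary32 type (float_bits is a hypothetical name):

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");

/* Return the bit pattern of f.
 * *(uint32_t *)&f would violate the strict aliasing rules; memcpy does not. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```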

Realistically, asking the compiler to infer any value flow outside of a 
single frame is asking too much. So my model would likely assume that 
each function exists as its own island in terms of pointer aliasing.

Well, and a simpler model (used by BGBCC) where taking the address of 
something, etc, effectively nukes the ability to assume non-alias.

>>
>> Would likely simplify or eliminate some infrequently used features in C.
>>
>> Possibly:
>>    Preprocessor, still exists, but its role is reduced.
>>      Its role can be partly replaced by compiler metadata.
>>    Trigraphs and digraphs: Gone;
>>    K&R style declarations, also gone;
>>    Parser should not depend on previous declarations;
>>    Non trivial types and declarator syntax: Eliminate;
>>    ...
> 
> Sounds all reasonable to me.
> 
>>
>> Possibly:
>> Pointers and arrays can be specified on the type rather than declarator
>> (so, more like C# here)
> 
> (Yeah, but mind the comments on effects of "subtle differences".)
> 

Possible, though if there is a type mismatch here, most likely the 
compiler will error-out.

>> [...]
>>
>> Though, the harder problem here isn't necessarily that of designing or
>> implementing it, but more in how to make its use preferable to jus
>> staying with C.
> 
> Well, as formulated, that's an individual thing. Meanwhile I have the
> freedom to use what I like in my recreational activities, but if we
> consider professional projects there's conditions and requirements to
> take into account.
> 

Probably.

I would like something to be a "good" alternative to C, while:
   Allowing cheap/simple compiler;
   Rules tuned to make static analysis less of a pain;
   Doesn't overly hinder a memory-safe implementation;
   But, also can be used for machine-level development;
   Shouldn't be overly unfamiliar to those who know similar languages.

>>
>> One merit is if code can be copy-pasted, but if one has to change all
>> instances of:
>>    char *s0, *s1;
>> To:
>>    char* s0, s1;
> 
> Such changes would be annoying. (And I say that with a strong aversion
> of C's declaration syntax.) - For me, "C" is not a good base; neither
> to keep its bad syntax nor to have to change it alike in subtle ways.
> 
> My style is anyway another; [mostly] separate declarations, and those
> initialized, as in
> 
>    char * s0 = some_alloc (...);
>    char * s1 = 0;
> 
> More important is that such declarations may appear anywhere not just
> at the begin of a block. (I'm still traumatized by K&R, I suppose.)
> 

Yeah.

I would assume declarations are allowed anywhere.

>> [...]
>>
>> Java and C# had made 'char' 16-bit, but I now suspect this may have been
>> a mistake. It may be preferable instead keep 'char' as 8 bits and make
>> UTF-8 the default string format. In the vast majority of cases, strings
>> hold primarily or entirely ASCII characters.
> 
> I think we should be careful here! An Unicode "character" may require
> even 32 bit, but UTF-8 is just an "encoding" (in units of an octet).
> If we want a sensible type system defined we should be aware of that
> difference. The question is; what shall be expressed by a 'char' type;
> the semantic entity or the transfer syntax. (This question is similar
> to the Unix file system, also based on octets; that made it possible
> to represent any international multi-octet characters. There's some
> layer necessary to get from the "transfer-syntax" (the encoding) to
> the representation.) - What will, say, a "C" user expect from 'char';
> just move it around or represent it on some output (or input) medium.
> 

It is a tradeoff.
But, if "char*" can point to a string, then "char" needs to be the same 
size as an item in memory (thus, probably a byte).

Otherwise, it would make sense to have "char" as an alias to "int" and 
require "ubyte*" for use as strings. For consistency with C, makes more 
sense to assume char to be a byte.

>>
>> Also, can probably have a string type:
>>    string str="Some String";
>> But, then allow that string is freely cast to "char*", ...
> 
> (Wasn't that so in C++? - And in addition there's the corresponding
> template classes, IIRC. - But I don't recall all the gory details.)
> 

C++ string seemingly assumes some sort of object representation (that 
could be cast to a pointer).

I am more assuming that it is an implementation type which would be 
represented as basically the equivalent of "const char *restrict".

But, with the compiler able to assume that it is a string type, so one 
of either:
A pointer to a string literal in some presumably read-only memory area;
A character array or buffer that was interned into a string table.

But not "a pointer to a modifiable character buffer".

In the latter case, "char*" or "char[]" would be considered the correct 
types to use.

Nominally, "string" would likely not allow pointer arithmetic, but could 
decay into "const char *" or similar, which would allow pointer arithmetic.
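A rough sketch of the "interned into a string table" case: interning returns one canonical pointer per distinct string, so equality between interned strings reduces to a pointer comparison. This is a deliberately naive linear-search table with a fixed capacity that never frees its entries; intern() and INTERN_MAX are hypothetical, and a real implementation would likely use a hash table.

```c
#include <stdlib.h>
#include <string.h>

#define INTERN_MAX 256

static const char *intern_tab[INTERN_MAX];
static int intern_cnt;

/* Return the canonical copy of s, adding it to the table if new.
 * Returns NULL if the table is full or allocation fails. */
const char *intern(const char *s)
{
    for (int i = 0; i < intern_cnt; i++)
        if (strcmp(intern_tab[i], s) == 0)
            return intern_tab[i];            /* already interned */
    if (intern_cnt == INTERN_MAX)
        return NULL;
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (!copy)
        return NULL;
    memcpy(copy, s, len);
    intern_tab[intern_cnt++] = copy;
    return copy;
}
```

Two interned "hello" strings, even from different buffers, come back as the same pointer.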

While object-based strings are a perennial feature in many languages, 
having them be anything much more complex than a pointer into a string 
table adds overhead.

One can argue that one merit of object-based representations is that 
then you don't have to use a generic "strlen()"; but for constant 
strings and string tables, there is a workaround that I have used in 
some of my own past languages:
   Look at the preceding byte:
     00: Raw string, you will need to "strlen()" it;
     01..BF: We are not looking at the start of a string.
     C0..EF: Length-prefix present
       Encoded as a byte-transposed UTF-8 value.

The prefix can also be used to encode the character encoding, but for 
this case I will assume it is always UTF-8.
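A minimal C sketch of the preceding-byte check. It assumes that "byte-transposed UTF-8" means the UTF-8 bytes of the length are stored in reverse order, so the lead byte (C0..EF) sits directly before the string and its continuation bytes (80..BF) come before that. Both function names and the writer helper are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Length of s under the "look at the preceding byte" scheme:
     s[-1] == 0x00        raw string: fall back to strlen()
     s[-1] in 0xC0..0xDF  2-byte length prefix (11 bits)
     s[-1] in 0xE0..0xEF  3-byte length prefix (16 bits)
     anything else        not the start of a string: returns (size_t)-1
   Assumption: prefix bytes are stored reversed, lead byte last.
   The caller must guarantee the bytes before s are readable. */
static size_t prefixed_strlen(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned char lead = p[-1];

    if (lead == 0x00)
        return strlen(s);
    if (lead >= 0xC0 && lead <= 0xDF)
        return ((size_t)(lead & 0x1F) << 6) | (size_t)(p[-2] & 0x3F);
    if (lead >= 0xE0 && lead <= 0xEF)
        return ((size_t)(lead & 0x0F) << 12) | ((size_t)(p[-2] & 0x3F) << 6)
             | (size_t)(p[-3] & 0x3F);
    return (size_t)-1;
}

/* Hypothetical writer: stores prefix + string + NUL at buf and returns a
   pointer to the string. buf needs strlen(str) + 4 bytes; assumes the
   length is at most 0xFFFF. */
static const char *put_prefixed(unsigned char *buf, const char *str)
{
    size_t len = strlen(str);
    unsigned char *s;

    if (len < 0x800) {                   /* 2-byte form, overlong if < 128 */
        buf[0] = (unsigned char)(0x80 | (len & 0x3F));
        buf[1] = (unsigned char)(0xC0 | (len >> 6));
        s = buf + 2;
    } else {                             /* 3-byte form */
        buf[0] = (unsigned char)(0x80 | (len & 0x3F));
        buf[1] = (unsigned char)(0x80 | ((len >> 6) & 0x3F));
        buf[2] = (unsigned char)(0xE0 | (len >> 12));
        s = buf + 3;
    }
    memcpy(s, str, len + 1);
    return (const char *)s;
}
```

Short lengths have to use the overlong C0/C1 form here, since a single 00..7F byte in front of the string already means "raw" or "not a string start".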

>> Well, and that the underlying representation of a string is still as a
>> pointer into a string-table or similar.
> 
>> Also the design of the standard library should remain conservative and
>> not add piles of needless wrappers or cruft.
> 
> Not sure what you have in mind here.
> 
> Personally, despite some resentment on some of the complex syntax
> and constructs necessary, I liked the C++ STL; its orthogonality
> and concepts in principle. (And especially if compared to some
> other languages' ad hoc "tool-chest" libraries I stumbled across.)
> 

I was primarily thinking of Java and its excessive piles of wrapper 
classes. Like, C gives you the stdio functions, which are basic but 
effective.

Java has:
WhateverInputStream, WhateverOutputStream, 
WhateverRandomAccessWhateverStream, etc.

We don't need this. Java just sort of ran with it, creating piles of 
random wrapper classes whose existence serves almost no practical 
purpose (and would have been much better served, say, by simply 
providing a File class that holds a mock-up of C's stdio interface; 
which is, ironically, closer to the approach C# had taken here).

The great sin here of C++ is mostly things like iostream.

In any case, I would avoid following Java's pattern of "overly 
bureaucratic boilerplate". Or, for that matter, its assumption that 
programmers can't think for themselves and will just look through a 
word-salad list until they find whatever class has the combination of 
words describing the specific task they intend to do.

>>>> [...]
>>>>
>>>> Like, one can throw out the whole mess that is dealing with
>>>> Multiple-Inheritance
>>>
>>> Well, when I started with C++ there wasn't multiple-inheritance
>>> available. Personally thinking its omission would be a mistake;
>>> I missed it back these day.
>>>
>>> I'm not sure what "mess" you have in mind. - Explicit qualification
>>> isn't a hindrance. Weakening the independence of classes in complex
>>> multi-level class-topologies is something under control of the
>>> program designer. - So it's fine to have it with all design options
>>> it opens.
>>
>> There is both implementation complexity of MI, and also some added
>> complexity with using it. The complexity gets messy.
> 
> (Okay, if that's what you took from it, I of course accept it.
> But I'd have more expected that you might have dislike of some
> STL parts than [multiple] inheritance.)
> 

Not exactly a fan of STL either, but these are different.

As noted, my concern here is more for compiler complexity, and MI is 
a big thorn in the side of anyone who wants to write their own 
compiler.

>>
>>
>> The SI + Interfaces model can reduce both.
> 
> I've used classes with only "pure virtual" functions to achieve
> the interface abstraction; since I could easily design what I
> needed with standard features and practically no overhead I thus
> wasn't missing the 'interface' feature.
> 
> (But of course I can see the implementation argument you make.)
> 

Yeah.

An "abstract base class" can be inferred to be an interface.

So, one could also end up with a C++ implementation that allows:
   single inheritance;
   abstract base classes.

But doesn't allow true MI.
Ironically this is closer to how BGBCC's attempt at C++ turned out.

>> Granted, these can grow their own warts (like default methods or
>> similar), but arguably still not as bad as MI.
> 
> (Well, I appreciated it to have that feature available in C++,
> even though my first OO language, Simula, didn't support it, so
> I was used to not having it when I got into C++ and liked it.)
> 
>>
>> I am more thinking from the perspective of implementing a compiler.
> 
> Hah! Yeah. - Recently in another NG someone disliked a feature
> because he had suffered from troubles implementing it. (It was
> not MI but formatted I/O in that case.) - I'm not implementing
> complex languages, so I guess I can feel lucky if someone else
> did the language implementation job and I can just use it.
> 

I am writing from the POV of someone who did start making an attempt to 
implement C++ support, and mostly gave up at roughly an early 1990s 
feature level.

If you dropped MI, templates, and pretty much everything following from 
these, stuff would be a lot easier.

>> [ implementation issues snipped and gracefully skipped ]
> 
>> [...]
>> Virtual inheritance still means one can't just call the copy logic for
>> each parent class when copying a derived class;
> 
> (I don't think I agree here. - Or are you still talking of the
> implementers' challenges? - But never mind. Programming in C++
> I could model everything I liked. That was really nice.)
> 

Still implementation.

Theoretically, any combination of features that is allowed in the 
language should also be allowed by the compiler.

With simpler "POD" classes, copying is mostly just a "memcpy()" internally.

With things like virtual inheritance and non-trivial inheritance 
patterns, "all hell breaks loose".

It solves "the diamond inheritance problem" from the perspective of the 
user, but creates a new problem for the implementation:
Now the in-memory layout of the parent classes depends on how they are 
used within the derived class.

Combined with the ability to assign class objects by value, as far as 
compiler implementation goes, now you have a mess on your hands.
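A toy C model of why this gets messy. For a plain struct, copying is just `*dst = *src`. With a virtual base, though, the position of the shared base relative to a parent part depends on the complete object. This sketch assumes each parent part stores a byte offset to the shared base; that is a simplification (real ABIs such as the Itanium C++ ABI keep virtual-base offsets in the vtable), and all type and function names are hypothetical.

```c
#include <stddef.h>

struct A { int a; };                              /* shared "virtual" base */

/* Parent parts that derive virtually from A. a_off is the byte offset
   from this part to the one A subobject; it depends on the complete object. */
struct B_part { ptrdiff_t a_off; int b; };
struct C_part { ptrdiff_t a_off; int c; };

/* Complete B: A follows B's own fields. */
struct B_obj { struct B_part b; struct A a; };

/* Complete D (derives from B and C): a single A shared by both, at the end. */
struct D_obj { struct B_part b; struct C_part c; int d; struct A a; };

static struct A *B_get_A(struct B_part *bp)
{
    return (struct A *)((char *)bp + bp->a_off);
}

static struct A *C_get_A(struct C_part *cp)
{
    return (struct A *)((char *)cp + cp->a_off);
}

static void B_obj_init(struct B_obj *o)
{
    o->b.a_off = (ptrdiff_t)(offsetof(struct B_obj, a) - offsetof(struct B_obj, b));
    o->b.b = 0;
    o->a.a = 0;
}

static void D_obj_init(struct D_obj *o)
{
    o->b.a_off = (ptrdiff_t)(offsetof(struct D_obj, a) - offsetof(struct D_obj, b));
    o->c.a_off = (ptrdiff_t)(offsetof(struct D_obj, a) - offsetof(struct D_obj, c));
    o->b.b = o->c.c = o->d = 0;
    o->a.a = 0;
}

/* Copy a B (possibly a subobject of a D) into a complete B. A memcpy of
   the B part would also copy src's a_off, which is only valid for src's
   layout; so copy the fields, reach A through src's own offset, and keep
   dst's offset unchanged. */
static void B_copy(struct B_obj *dst, struct B_part *src)
{
    dst->b.b = src->b;
    dst->a = *B_get_A(src);
}
```

The same `B_part` needs a different `a_off` depending on whether it lives in a `B_obj` or a `D_obj`, which is exactly what makes per-parent copy logic insufficient.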

> Janis
> 
>> [...]
> 
>


#394556

From	bart <bc@freeuk.com>
Date	2025-10-15 11:26 +0100
Message-ID	<10cnstb$3irl3$1@dont-email.me>
In reply to	#394548

On 15/10/2025 02:13, BGB wrote:

> Apparently the languages people are trying to push as C replacements are 
> mostly Rust, Zig, and Go.
> 
> None of these particularly compel me though.
>    They seem more like needless deviations from C than a true successor.

So what would a true successor look like?

> 
> 
> I guess the older generations mostly had Pascal and Ada.
> 
> There was ALGOL, but both C and Pascal descended from ALGOL.

I've heard before that C was somehow derived from Algol and even 
Algol 68.

But it is so utterly unlike either of those, that if it's from the same 
family, then it must have been adopted.

> As noted elsewhere, my thinking is partly that pipeline looks like:
>    Preprocessor (basic or optional, C like)
>    Parser (Context-independent, generates ASTs)
>    Front end compiler: Compiles ASTs to a stack IL.

> Backend:
>    IL -> 3AC/SSA;

That's odd: you're going from a stack IL to a 3AC non-stack IR/IL?

Why not go straight to 3AC?

(I've tried both stack and 3AC ILs, but not both in the same compiler! I 
finally decided to stay with stack; 3AC code *always* got too fiddly to 
deal with.

So stack IL is directly translated to register-based, unoptimised native 
code, which is reasonably efficient. Performance is usually somewhere in 
between Tiny C and gcc-O2.)


#394560

From	BGB <cr88192@gmail.com>
Date	2025-10-15 13:00 -0500
Message-ID	<10conk5$3rhj2$1@dont-email.me>
In reply to	#394556

On 10/15/2025 5:26 AM, bart wrote:
> On 15/10/2025 02:13, BGB wrote:
> 
>> Apparently the languages people are trying to push as C replacements 
>> are mostly Rust, Zig, and Go.
>>
>> None of these particularly compel me though.
>>    They seem more like needless deviations from C than a true successor.
> 
> So what would a true successor look like?
> 

Probably sorta like C with a few vaguely C++ like features, but with a 
cleaner and simpler design.

Should ideally be usable for similar stuff to C.
   Not drastically or needlessly different.

Looking around, the CMU C0 and C1 teaching languages also seem to be in 
the same general area design-wise, though they exist more as limited 
C-like subset languages intended for introductory programming in CS 
courses.

Could make sense to have some C++ style functionality, but with an aim 
of not going down the rabbit hole of adding excessive implementation 
complexity.

>>
>>
>> I guess the older generations mostly had Pascal and Ada.
>>
>> There was ALGOL, but both C and Pascal descended from ALGOL.
> 
> I've heard before that C was somehow derived from Algol and even 
> Algol 68.
> 
> But it is so utterly unlike either of those, that if it's from the same 
> family, then it must have been adopted.
> 

Idea is that it went ALGOL -> BCPL -> B -> C.
   Going the other way, ALGOL was derived from FORTRAN.

ALGOL was also the ancestor of Pascal and Ada, so there was a bit of 
mutation there.

> 
>> As noted elsewhere, my thinking is partly that pipeline looks like:
>>    Preprocessor (basic or optional, C like)
>>    Parser (Context-independent, generates ASTs)
>>    Front end compiler: Compiles ASTs to a stack IL.
> 
>> Backend:
>>    IL -> 3AC/SSA;
> 
> That's odd: you're going from a stack IL to a 3AC non-stack IR/IL?
> 
> Why not go straight to 3AC?
> 
> (I've tried both stack and 3AC ILs, but not both in the same compiler! I 
> finally decided to stay with stack; 3AC code *always* got too fiddly to 
> deal with.
> 

Well, the downside of 3AC (as an IL) is that it tends to be fiddly and 
often is much more specific to the design choices of the frontend and 
backend that produced it.

Also, going from a Stack IL to 3AC is fairly easy, and generally less of 
a mess than dealing with a 3AC IL here. Also with 3AC one has to decide 
on things like whether or not it is in SSA form, as SSA vs non-SSA 
follow different rules.

Downside is that a stack IL is often further from the code you "actually 
want to generate" than a 3AC IL would have been (and to generate more 
efficient 3AC you may need to generate less-concise stack code, such as 
my having the frontend manually use temporary variables, partly negating 
some of the conceptual benefits of a stack IR, but alas).

But, on the positive side, the stack manipulations/etc map readily to 
SSA form.

A stack IL that makes sense for a compiler might look like:
   Stack ops for each major operator;
   No explicit types in most instructions.
     Type can be carried along the stack.
     The .NET IL also did this.
   Control flow is via labels and conditional branches.
   Typically no items on the stack during a branch.
   May make sense to combine common stack-ops with storing to a variable.
     Say: "ADD; STORE n" => "ADD_ST n"
     Rationale being that this is less work for the backend.
   Types can be identified by signature strings.
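The "ADD; STORE n" => "ADD_ST n" fusion can be done as a simple peephole pass. A minimal C sketch, assuming a hypothetical instruction struct and opcode set. It relies on the stack being non-empty between the ADD and the STORE, so (with no items on the stack across branches) no label can land between them.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack IL. */
enum op { OP_LOAD, OP_ADD, OP_STORE, OP_ADD_ST };

struct insn { enum op op; int arg; };   /* arg: variable number, if any */

/* Fuse each "ADD; STORE n" pair into a single "ADD_ST n", in place.
   Returns the new instruction count. */
static size_t fuse_add_store(struct insn *code, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (code[i].op == OP_ADD && i + 1 < n && code[i + 1].op == OP_STORE) {
            code[out].op = OP_ADD_ST;
            code[out].arg = code[i + 1].arg;
            out++;
            i++;                          /* skip the STORE folded in */
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

Since `out` never gets ahead of `i`, rewriting in place is safe.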

Granted, one can note that a stack IL typically needs around 70% more 
operations than you would need for a 3AC, but most of these operations 
will disappear in the conversion process.
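To show how most stack operations vanish, here is a minimal C sketch of a stack-to-3AC step for straight-line code. It assumes a hypothetical toy IL (LOAD, CONST, ADD, MUL, STORE). The stack is simulated at compile time and holds operands rather than values, so loads emit nothing and a STORE retargets the instruction that produced the value. For `x = a + b*c`, six stack ops become two 3AC ops.

```c
#include <stddef.h>

/* Toy stack IL (hypothetical opcodes). */
enum sop { S_LOAD, S_CONST, S_ADD, S_MUL, S_STORE };
struct sinsn { enum sop op; int arg; };

/* 3AC operands: a named variable, a compiler temporary, or a constant. */
enum okind { O_VAR, O_TEMP, O_CONST };
struct opnd { enum okind kind; int val; };

/* dst = a OP b, or dst = a when op is '=' (b unused). */
struct tac { char op; struct opnd dst, a, b; };

/* Assumes well-formed input (no underflow, depth <= 64); out needs room
   for n entries. Returns the number of 3AC instructions written. */
static size_t stack_to_3ac(const struct sinsn *in, size_t n, struct tac *out)
{
    struct opnd stk[64];
    size_t sp = 0, nout = 0;
    int ntemp = 0;

    for (size_t i = 0; i < n; i++) {
        switch (in[i].op) {
        case S_LOAD:                     /* no code: remember where it lives */
            stk[sp++] = (struct opnd){ O_VAR, in[i].arg };
            break;
        case S_CONST:
            stk[sp++] = (struct opnd){ O_CONST, in[i].arg };
            break;
        case S_ADD:
        case S_MUL: {                    /* one 3AC op into a fresh temp */
            struct opnd b = stk[--sp];
            struct opnd a = stk[--sp];
            struct opnd t = { O_TEMP, ntemp++ };
            out[nout++] = (struct tac){ in[i].op == S_ADD ? '+' : '*', t, a, b };
            stk[sp++] = t;
            break;
        }
        case S_STORE: {
            struct opnd v = stk[--sp];
            struct opnd dst = { O_VAR, in[i].arg };
            /* If v is the temp the previous 3AC op just produced,
               retarget that op instead of emitting a copy. */
            if (v.kind == O_TEMP && nout > 0 &&
                out[nout - 1].dst.kind == O_TEMP && out[nout - 1].dst.val == v.val)
                out[nout - 1].dst = dst;
            else
                out[nout++] = (struct tac){ '=', dst, v, { O_CONST, 0 } };
            break;
        }
        }
    }
    return nout;
}
```

The retargeting is safe here because each stack temporary is popped exactly once; an IL with a DUP-style op would need a use count first.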

One semi-unresolved design issue is whether it is better to have a 
single unified numbering space for local variables, like in the JVM and 
similar, or several different numbering spaces (arguments, locals, and 
temporary variables). In my ILs, I have often ended up going for the latter.

Say, for example, you can encode the "name"/"symbol" for Load/Store/Etc 
as a VLN, say:
   0xxxxxxx: 0..127
   10xxxxxx xxxxxxxx 128..16383
   110xxxxx ...: 16384..2M
   ...
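The VLN layout above can be written as a small encoder/decoder. A minimal C sketch of the 1..3 byte forms only (the longer "..." forms are left out), assuming the high-order bits are stored first, which the post does not specify; function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 0xxxxxxx                    0..127
   10xxxxxx xxxxxxxx           128..16383
   110xxxxx xxxxxxxx xxxxxxxx  16384..2097151
   Returns bytes written, or 0 if v needs a longer form. */
static size_t vln_encode(uint32_t v, unsigned char *out)
{
    if (v < 0x80) {
        out[0] = (unsigned char)v;
        return 1;
    }
    if (v < 0x4000) {
        out[0] = (unsigned char)(0x80 | (v >> 8));
        out[1] = (unsigned char)v;
        return 2;
    }
    if (v < 0x200000) {
        out[0] = (unsigned char)(0xC0 | (v >> 16));
        out[1] = (unsigned char)(v >> 8);
        out[2] = (unsigned char)v;
        return 3;
    }
    return 0;
}

/* Decodes one value and stores its encoded length in *len (0 if the
   prefix is a longer form this sketch does not handle). */
static uint32_t vln_decode(const unsigned char *in, size_t *len)
{
    if (in[0] < 0x80) {
        *len = 1;
        return in[0];
    }
    if (in[0] < 0xC0) {
        *len = 2;
        return ((uint32_t)(in[0] & 0x3F) << 8) | in[1];
    }
    if (in[0] < 0xE0) {
        *len = 3;
        return ((uint32_t)(in[0] & 0x1F) << 16) | ((uint32_t)in[1] << 8) | in[2];
    }
    *len = 0;
    return 0;
}
```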
And then use a tagging scheme to encode variable IDs, say:
   ...xxxx00  Local
   ...xxxx10  Temporary
   ...xxx001  Argument
   ...xxx101  Int32 Literal
   ...xx0011  Global Variable
   ...xx1011  String Literal

Where Locals and Temporaries are given the shortest code as these are 
more common and preferably have shorter (single byte) encodings when 
possible (so, for example, the first 32 local variables can be single 
byte, etc).

For integer literals, one can additionally use a zigzag coding 
(0,-1,1,-2,2, ...). String literals can be encoded as an offset into a 
string table.
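The tag bits and the zigzag literal coding can be sketched together in C. The enum and function names are hypothetical. The IDs are 64-bit here (an assumption) so a full zigzag-coded int32 fits above the 3 tag bits. The low-bit patterns follow the table above; `...x111` is left unassigned because the post does not give it a meaning.

```c
#include <stdint.h>

enum ref_kind { R_LOCAL, R_TEMP, R_ARG, R_INT32, R_GLOBAL, R_STRING, R_BAD };

/* Zigzag: 0,-1,1,-2,2,... -> 0,1,2,3,4,... (no signed overflow). */
static uint32_t zigzag_enc(int32_t n)
{
    return n < 0 ? ((uint32_t)(-(n + 1)) << 1) | 1u : (uint32_t)n << 1;
}

static int32_t zigzag_dec(uint32_t z)
{
    return (z & 1u) ? -(int32_t)(z >> 1) - 1 : (int32_t)(z >> 1);
}

/* Place v (index, zigzag literal, or string-table offset) above the tag. */
static uint64_t ref_make(enum ref_kind k, uint64_t v)
{
    switch (k) {
    case R_LOCAL:  return v << 2;          /* ...xxxx00 */
    case R_TEMP:   return (v << 2) | 0x2;  /* ...xxxx10 */
    case R_ARG:    return (v << 3) | 0x1;  /* ...xxx001 */
    case R_INT32:  return (v << 3) | 0x5;  /* ...xxx101 */
    case R_GLOBAL: return (v << 4) | 0x3;  /* ...xx0011 */
    case R_STRING: return (v << 4) | 0xB;  /* ...xx1011 */
    default:       return 0;
    }
}

/* Classify an ID by its low bits and extract the payload into *v. */
static enum ref_kind ref_kind_of(uint64_t id, uint64_t *v)
{
    if ((id & 0x3) == 0x0) { *v = id >> 2; return R_LOCAL; }
    if ((id & 0x3) == 0x2) { *v = id >> 2; return R_TEMP; }
    if ((id & 0x7) == 0x1) { *v = id >> 3; return R_ARG; }
    if ((id & 0x7) == 0x5) { *v = id >> 3; return R_INT32; }
    if ((id & 0xF) == 0x3) { *v = id >> 4; return R_GLOBAL; }
    if ((id & 0xF) == 0xB) { *v = id >> 4; return R_STRING; }
    *v = 0;
    return R_BAD;                           /* ...x111: unassigned */
}
```

With 2 tag bits, local IDs 0..31 stay below 128 and so fit in a single VLN byte.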

For something like a typecast operator, you might encode an offset into 
a string table for a type-signature string.

...

Well, sorta, the IL used in BGBCC isn't quite so clean.
It instead encodes strings and symbols inline, and uses a sliding table 
to refer back to them when they repeat. This also works, but is more 
ugly than encoding IDs and using a string table might have been.

But, string tables make more sense for an externally-structured format.

Ironically, came up with a possible format for manifest files (loosely 
WAD based) that could also make sense as an IL packaging format.

Ended up going back and forth between having it be WAD2 or WAD4 based, 
and instead settled on a compromise of supporting mixed 32- and 64-byte 
entries. It would have a tree structure similar to WAD4, but with the 
downside that for the 32-byte entries names are reduced to 10 bytes (vs 
32 bytes for the 64-byte entries, or 16 bytes in the original WAD2 format).

But, can debate whether or not this would make sense in a 
space-efficiency sense. The design is more focused on semi-efficient 
random access rather than compactness (whereas typically bytecode IL 
packaging is more focused on being compact).

Though, compactness may not matter as much for things like object-files 
which are less likely to be used to actually distribute code.

Though, one merit is that it could more easily allow for a compiler that 
decodes stack-IR into 3AC one function at a time, or demand-loads parts 
of the image, rather than needing to load everything for the whole 
program in advance (and burning a lot of RAM this way).

Annoyingly, even a simple format like IWAD would still end up needing 16 
bytes per entry.
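For reference, an IWAD/PWAD directory entry is a 4-byte file offset, a 4-byte size, and an 8-byte name, all little-endian. A RIFF chunk header, by comparison, is a 4-byte FourCC plus a 4-byte size. A minimal C sketch of reading one 16-byte directory entry; the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* In-memory form of one 16-byte WAD directory entry. */
struct wad_entry {
    uint32_t filepos;   /* offset of the lump data in the file */
    uint32_t size;      /* lump size in bytes */
    char name[9];       /* 8 bytes on disk, +1 so it is always NUL-terminated */
};

static uint32_t rd_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse a raw on-disk entry (byte order independent of the host). */
static void wad_entry_parse(const unsigned char raw[16], struct wad_entry *e)
{
    e->filepos = rd_le32(raw);
    e->size = rd_le32(raw + 4);
    memcpy(e->name, raw + 8, 8);
    e->name[8] = '\0';
}
```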

But, it can offer more flexibility (and not needing an additional 
mechanism to look things up by QName), say, if compared with a format 
like RIFF (which has an 8-byte minimum overhead per lump). Also, the 
scheme as-is allows lumps with <= 12 bytes of payload to be stored 
inline in a 32-byte entry, so it might not be too far behind; it may just 
make sense to use it, and then LZ compress it if it needs to be more 
compact.

> So stack IL is directly translated to register-based, unoptimised native 
> code, which is reasonably efficient. Performance is usually somewhere in 
> between Tiny C and gcc-O2.)
> 

I usually go stack -> 3AC, and then 3AC -> Native.
In BGBCC, as-is, there was no separate assembler step, but I now realize 
this probably isn't ideal (and I still end up needing an assembler 
anyways, just now it is a little more of a mess as it isn't really 
cleanly separated from the rest of the 3AC->Native backend).

Long ago, when I originally wrote it, there was an x86 backend, which 
didn't use an assembler. For the SuperH backend, I initially skipped 
having an assembler, and my current backend (targeting my BJX2 ISA and 
also RISC-V) was derived from a fork off the SuperH backend.

So it continued in not having an assembler, and generating instructions 
with big "switch()" blocks, which also scales poorly, but sorta makes 
sense when the ISA is smallish.

The stalled out new compiler would have used an assembler, and more so, 
an assembler driven by an instruction listing table.

It stalled out, though, because its code footprint quickly exceeded my 
original target while it still wasn't complete enough to be useful.

I was also trying for a more traditional compiler design (one 
translation unit at a time, producing native object files as an 
intermediate step, with a native-code linker). I suspect now this may 
have been a mistake, and Frontend->IL + IL->Native may be a better option.


#394578

From	bart <bc@freeuk.com>
Date	2025-10-17 22:07 +0100
Message-ID	<10cub5m$1ch30$1@dont-email.me>
In reply to	#394560

On 15/10/2025 19:00, BGB wrote:
> On 10/15/2025 5:26 AM, bart wrote:
>> On 15/10/2025 02:13, BGB wrote:
>>
>>> Apparently the languages people are trying to push as C replacements 
>>> are mostly Rust, Zig, and Go.
>>>
>>> None of these particularly compel me though.
>>>    They seem more like needless deviations from C than a true successor.
>>
>> So what would a true successor look like?
>>
> 
> Probably sorta like C with a few vaguely C++ like features, but with a 
> cleaner and simpler design.
> 
> Should ideally be usable for similar stuff to C.
>    Not drastically or needlessly different.

Well, my own language is somewhere at the level of C, yet it looks very 
different.

It was a rather crude affair in 1981/82, it has now evolved some modern 
conveniences. But I deliberately keep it low-level.

This is a C program to print some square roots:

   #include <stdio.h>
   #include <math.h>

   int main() {
        for (int i=1; i<=10; ++i)
            printf("%d %f\n", i, sqrt(i));
   }

This is it in my language (all are complete programs):

   proc main=
       for i to 10 do
           println i, sqrt i
       od
   end

Finally, this is it in Zig:

   const std = @import("std");

   pub fn main() void {
       for (1..11) |i| {
           std.debug.print("{} {}\n", .{i, @sqrt(@as(f64, @floatFromInt(i)))});
       }
   }

This one is interesting in making the C version look remarkably clean 
and uncluttered by comparison!

But, yeah, I NEED it to look different from C as I find C syntax 
unreadable; it's a kind of 'C-blindness' where source looks more like a 
block of MIME-encoded text.

>> But it is so utterly unlike either of those, that if it's from the 
>> same family, then it must have been adopted.
>>
> 
> Idea is that it went ALGOL -> BCPL -> B -> C.
>    Going the other way, ALGOL was derived from FORTRAN.

BCPL also looks quite unlike Algol. If C's progression from Algol was 
via BCPL, which lost some features, then the claim that it came from 
Algol makes even less sense.

I've also heard that there were influences on C from Algol68. That was a 
language that used simple left-to-right type syntax, defaulted to 
1-based indexing, was (largely) case-insensitive, and used := for 
assignment and = for equality.

Each one of these is the exact opposite of C!

My language was genuinely inspired by Algol68, and as such, still has 
those exact characteristics.

So I think I'd take such claims of C being inspired by or based on Algol 
/or/ Algol68 with a pinch of salt.


#394579

From	BGB <cr88192@gmail.com>
Date	2025-10-17 17:44 -0500
Message-ID	<10cuh13$1enaj$1@dont-email.me>
In reply to	#394578

On 10/17/2025 4:07 PM, bart wrote:
> On 15/10/2025 19:00, BGB wrote:
>> On 10/15/2025 5:26 AM, bart wrote:
>>> On 15/10/2025 02:13, BGB wrote:
>>>
>>>> Apparently the languages people are trying to push as C replacements 
>>>> are mostly Rust, Zig, and Go.
>>>>
>>>> None of these particularly compel me though.
>>>>    They seem more like needless deviations from C than a true 
>>>> successor.
>>>
>>> So what would a true successor look like?
>>>
>>
>> Probably sorta like C with a few vaguely C++ like features, but with a 
>> cleaner and simpler design.
>>
>> Should ideally be usable for similar stuff to C.
>>    Not drastically or needlessly different.
> 
> Well, my own language is somewhere at the level of C, yet it looks very 
> different.
> 
> It was a rather crude affair in 1981/82, it has now evolved some modern 
> conveniences. But I deliberately keep it low-level.
> 

As for me, in 1982: I didn't exist yet.

So, alas, my journey mostly starts in the 90s (childhood) and 2000s 
(teens / early 20s). But, alas, I have now exceeded 4 decades of existence.

> 
> This is a C program to print some square roots:
> 
>    #include <stdio.h>
>    #include <math.h>
> 
>    int main() {
>         for (int i=1; i<=10; ++i)
>             printf("%d %f\n", i, sqrt(i));
>    }
> 

In the language I was imagining as a hypothetical, probably:
    using stdio, math;
    int main() {
         for (int i=1; i<=10; ++i)
             printf("%d %f\n", i, sqrt(i));
    }
Doesn't really match my personal style, but there is little here that 
actually needs to change in the language as-imagined.

> 
> This is it in my language (all are complete programs):
> 
>    proc main=
>        for i to 10 do
>            println i, sqrt i
>        od
>    end
> 

Yeah.

In my first script language:
   for(var i=0; i<=10; i++)
     printf("%d %f\n", i, sqrt(i));

Generally there was no "main", but would instead basically execute 
everything at the toplevel from top to bottom.

Function syntax was basically:
   function foo(x, y)
     { return x+y; }
But, did allow a shorthand:
   function foo(x, y) x+y;
Where, no braces meant that the expression was interpreted as the entire 
body.
Alternatively:
   function foo(x, y) { x+y }
Where, if a function is terminated by an expression with no semicolon, 
then it behaved as if there was an implicit return statement.

Initially, it was dynamically typed, but later gained static types:
   function foo(x:int, y:int):int { x+y }
Which did basically the same thing, but with the types as 32-bit integers.

Say:
   var arr1 = [1,2,3,4,5];        //array, dynamically typed
   var arr2:int[] = [1,2,3,4,5];  //array of int
   var arr3       = [1,2,3,4,5]i; //array is int[], but arr3 is not.
   var obj1 = {x:3, y:4};  //ex-nihilo object with 2 members
   ...

A later successor language had switched to a more Java like syntax and 
structure. It was then in an awkward middle ground of being neither 
great as a free-form scripting language; nor great as an implementation 
language. Where it competed with C, but still lacked good integration 
with C; as I had also switched to a simpler FFI modeled after C#'s 
P/Invoke (its predecessor had used header-mining and auto-glue).

A more recent project had gone the complete opposite direction, and used 
a language modeled off of 1980s era BASIC.

But, as noted, it ended up diverging some from BASIC, gaining the use of 
dynamic scoping and "RETURN expr" + "x = GOSUB label", ...

Arguments could be provided, but also wonky:
Rather than being defined by the function, they are provided caller 
side, mostly as names that are bound within the newly created dynamic 
frame of the called label.

In this latter case, no separate parser, no bytecode, ...
Instead, the program was basically parsed into an array of token IDs 
(grouped into lines), and the interpreter walked the tokens (with all 
the keywords and special tokens defined first so that they got 
statically-assigned numbers).

So, there was basically an array of tokens, and an array mapping lines 
to indices in the token array, and an array to map labels to line 
indices, ...
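A minimal C sketch of that layout: a flat token array, a line table pointing into it, and a label table mapping identifier IDs to lines. It assumes the tokenizer has already turned a `name:` label line into a hypothetical `T_LABEL` token followed by the name's ID. The token numbering, table sizes, and function names are all illustrative, and it only covers indexing and GOTO resolution, not the interpreter loop.

```c
#include <stddef.h>

/* Hypothetical token IDs: keywords/special tokens get fixed small numbers,
   identifiers are numbered from T_USER upward. */
enum { T_EOL = 0, T_LABEL, T_GOTO, T_PRINT, T_USER = 64 };

#define MAX_LINES 256
#define MAX_IDS   512

struct program {
    const int *tok;               /* flat token array; each line ends in T_EOL */
    size_t ntok;
    size_t line_start[MAX_LINES]; /* line index -> index of its first token */
    size_t nlines;
    int label_line[MAX_IDS];      /* identifier ID -> line index, or -1 */
};

/* Build the line table and the label table in one pass over the tokens.
   A line consisting of "T_LABEL name" defines a jump target. */
static void program_index(struct program *p)
{
    size_t start = 0;

    p->nlines = 0;
    for (size_t i = 0; i < MAX_IDS; i++)
        p->label_line[i] = -1;
    for (size_t i = 0; i < p->ntok && p->nlines < MAX_LINES; i++) {
        if (p->tok[i] != T_EOL)
            continue;
        if (i - start == 2 && p->tok[start] == T_LABEL &&
            p->tok[start + 1] >= 0 && p->tok[start + 1] < MAX_IDS)
            p->label_line[p->tok[start + 1]] = (int)p->nlines;
        p->line_start[p->nlines++] = start;
        start = i + 1;
    }
}

/* If the given line is "GOTO name", return the target line, else -1. */
static int goto_target(const struct program *p, size_t line)
{
    size_t i = p->line_start[line];

    if (p->tok[i] != T_GOTO || p->tok[i + 1] < 0 || p->tok[i + 1] >= MAX_IDS)
        return -1;
    return p->label_line[p->tok[i + 1]];
}
```

A GOTO then just sets the current line index, and execution resumes at `line_start[target]` in the token array.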

In this case, we are back to starting by walking down from the top of a 
file. The parser design and syntax was also not very flexible, but 
sufficient for the use-case.

I went this route mostly because it allows for a smaller/simpler 
interpreter. So, whole language design was based mostly on how I could 
do an interpreter within a relatively modest line-count (initial goal 
being to fit everything in under 1000 lines of C).

Adding the CSG stuff basically blew out this limit though (could be 
sub-1000 lines, but this mostly excludes things like vector math or CSG 
solids; which also broke consistency with the "1980s BASIC" design 
influence).

Where, had wanted something for some basic 3D modeling and animation 
tasks, but: OpenSCAD + exporting STL didn't fit my needs.
Implementing the SCAD language, to extend with new features, would have 
been more code.

In some ways, programmatic CSG is preferable to traditional 3D modeling 
and animation tools (even if pretty much everyone else seems to prefer 
triangle mesh models and GUI-driven 3D model tools over CSG and 
describing geometry directly). Though, granted, it is nicer to have 
higher-level primitives and operators, rather than having to do 
everything manually with planes and "Brush Solids". Though, internally, 
the primitives get turned into "brush solids" based on collections of 
planes, with geometry mostly being clipped between solids using the 
planes (so, you don't have true spheres, rather a sphere becomes a brush 
with a large number of planes, ...).

Granted, all the code for the CSG proper is not counted in the line 
count for the BASIC interpreter, otherwise the number would be 
significantly higher.

But, in this language, one could have something like:
   let i=0
   lbl0:
   print i
   i=i+1
   if i<=10 goto lbl0
Where, "THEN" is optional if "GOTO" or similar is used.
   Though, "THEN" is required if you want an "ELSE" clause.
   Note that both clauses need to go on the same line.

Note that in this case, the language was case-insensitive as the 
keywords and identifiers were case normalized.
   L0:
   ...
   GOTO L0
Is equivalent to:
   l0:
   ...
   goto l0

Traditionally, BASIC is written all upper case though, but then one 
needs to SHIFT or CAPS-LOCK the whole time.

> 
> Finally, this is it in Zig:
> 
>    const std = @import("std");
> 
>    pub fn main() void {
>        for (1..11) |i| {
>            std.debug.print("{} {}\n", .{i, @sqrt(@as(f64, @floatFromInt(i)))});
>        }
>    }
> 
> 
> This one is interesting in making the C version look remarkably clean 
> and uncluttered by comparison!
> 
> But, yeah, I NEED it to look different from C as I find C syntax 
> unreadable; it's a kind of 'C-blindness' where source looks more like a 
> block of MIME-encoded text.
> 

OK.

> 
>>> But it is so utterly unlike either of those, that if it's from the 
>>> same family, then it must have been adopted.
>>>
>>
>> Idea is that it went ALGOL -> BCPL -> B -> C.
>>    Going the other way, ALGOL was derived from FORTRAN.
> 
> BCPL also looks quite unlike Algol. If C's progression from Algol was 
> via BCPL, which lost some features, then the claim that it came from 
> Algol makes even less sense.
> 
> I've also heard that there were influences on C from Algol68. That was a 
> language that used simple left-to-right type syntax, defaulted to 
> 1-based indexing, was (largely) case-insensitive, and used := for 
> assignment and = for equality.
> 
> Each one of these is the exact opposite of C!
> 
> My language was genuinely inspired by Algol68, and as such, still has 
> those exact characteristics.
> 
> So I think I'd take such claims of C being inspired by or based on 
> Algol /or/ Algol68 with a pinch of salt.
> 
I think it is more the claim that the path that led to C was via ALGOL 
60, but either way...


#394589

From	Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Date	2025-10-20 10:02 +0200
Message-ID	<10d4q9q$32gi5$1@dont-email.me>
In reply to	#394560

On 15.10.2025 20:00, BGB wrote:
> On 10/15/2025 5:26 AM, bart wrote:
>> On 15/10/2025 02:13, BGB wrote:
>>
>>> There was ALGOL, but both C and Pascal descended from ALGOL.
>>
>> I've heard before that C was somehow derived from Algol and even
>> Algol 68.
>>
>> But it is so utterly unlike either of those, that if it's from the
>> same family, then it must have been adopted.
> 
> Idea is that it went ALGOL -> BCPL -> B -> C.
>   Going the other way, ALGOL was derived from FORTRAN.

In which way do you think that Algol 60 was "derived from FORTRAN"?

Janis


#394590

From	BGB <cr88192@gmail.com>
Date	2025-10-20 04:42 -0500
Message-ID	<10d50b6$341dj$1@dont-email.me>
In reply to	#394589

On 10/20/2025 3:02 AM, Janis Papanagnou wrote:
> On 15.10.2025 20:00, BGB wrote:
>> On 10/15/2025 5:26 AM, bart wrote:
>>> On 15/10/2025 02:13, BGB wrote:
>>>
>>>> There was ALGOL, but both C and Pascal descended from ALGOL.
>>>
>>> I've heard before that C was somehow derived from Algol and even
>>> Algol 68.
>>>
>>> But it is so utterly unlike either of those, that if it's from the
>>> same family, then it must have been adopted.
>>
>> Idea is that it went ALGOL -> BCPL -> B -> C.
>>    Going the other way, ALGOL was derived from FORTRAN.
> 
> In which way do you think that Algol 60 was "derived from FORTRAN"?
> 

Stuff online seems to say the backwards path went:
   ALGOL 60 <- ALGOL 58 <- FORTRAN and Plankalkul

Granted, FORTRAN is a somewhat different language from ALGOL.


Though, one can note that a lot of the early languages, like early forms of 
FORTRAN and COBOL, were line-oriented and tended to use line numbers and 
require stuff to be in certain columns, ...

Though, ALGOL and its descendants generally went over to a more 
free-form syntax.


Arguably, FORTRAN and BASIC were some of the easier ways to approach 
designing a language syntax.


Say, if ranking languages by minimal complexity, maybe:
   Level 1:
     Limited forms of FORTH and G-Code
     Can read line-by-line and evaluate tokens left to right;
     Logic mostly driven directly by tokens.
       Forth: Tokens drive a stack;
       G-Code: Tokens set internal state/parameters.
         Any command initiates at the end of the line.
   Level 2:
     Older forms of FORTRAN and BASIC
     Can work line by line and match fixed patterns;
     Can use fairly simple rules to split up tokens;
     Can use basic pattern matching for an interpreter.
   Level 3:
     LISP, QBasic and Visual Basic, ALGOL, ...
     Typically requires building an AST, etc.
       Though, Lisp syntax can map directly to an AST.
     Can be supported easily by a tree-walk interpreter.
   Level 4:
     JavaScript and similar;
     Requires a full syntax tree and more advanced parser;
     An interpreter can use a tree-walk or basic IR.
   Level 5:
     Languages like C;
     Generally requires a full-featured type-system;
     Language structure and parsing may depend on types;
     Direct interpretation is not usually feasible;
     ...
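As a rough illustration of the Level 1 Forth case, here is a minimal C sketch of a one-line evaluator. Tokens are read left to right with no parse tree; numbers are pushed and each word acts on the stack directly. The word set (`+`, `-`, `*`, `dup`), the stack depth, and the function name are assumptions for the example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define RPN_DEPTH 32

/* Evaluate one line; returns 0 and stores the top of stack in *result,
   or -1 on stack underflow/overflow or an unknown word. */
static int rpn_eval_line(const char *line, long *result)
{
    long stk[RPN_DEPTH];
    int sp = 0;
    const char *p = line;

    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        const char *start = p;            /* token is [start, p) */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);
        char *end;
        long v = strtol(start, &end, 10);

        if (end == p) {                   /* whole token is a number */
            if (sp == RPN_DEPTH)
                return -1;
            stk[sp++] = v;
        } else if (len == 1 && strchr("+-*", *start) != NULL) {
            if (sp < 2)
                return -1;
            long b = stk[--sp];
            long a = stk[--sp];
            stk[sp++] = *start == '+' ? a + b : *start == '-' ? a - b : a * b;
        } else if (len == 3 && strncmp(start, "dup", 3) == 0) {
            if (sp < 1 || sp == RPN_DEPTH)
                return -1;
            stk[sp] = stk[sp - 1];
            sp++;
        } else {
            return -1;                    /* unknown word */
        }
    }
    if (sp < 1)
        return -1;
    *result = stk[sp - 1];
    return 0;
}
```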



> Janis
>
