Groups > comp.lang.c > #394488 > unrolled thread
| Started by | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| First post | 2025-10-08 06:35 +0000 |
| Last post | 2025-12-14 22:48 -0800 |
| Articles | 20 on this page of 73 — 14 participants |
Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-08 06:35 +0000
Re: Nice way of allocating flexible struct. pozz <pozzugno@gmail.com> - 2025-10-08 09:09 +0200
Re: Nice way of allocating flexible struct. richard@cogsci.ed.ac.uk (Richard Tobin) - 2025-10-08 12:01 +0000
Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-08 15:23 +0000
Re: Nice way of allocating flexible struct. Michael S <already5chosen@yahoo.com> - 2025-10-08 19:04 +0300
Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-08 20:05 +0000
Re: Nice way of allocating flexible struct. Michael S <already5chosen@yahoo.com> - 2025-10-08 18:52 +0300
Re: Nice way of allocating flexible struct. Tim Rentsch <tr.17687@z991.linuxsc.com> - 2025-12-14 23:55 -0800
Re: Nice way of allocating flexible struct. Bonita Montero <Bonita.Montero@gmail.com> - 2025-10-08 11:09 +0200
Re: Nice way of allocating flexible struct. Bonita Montero <Bonita.Montero@gmail.com> - 2025-10-08 11:23 +0200
Re: Nice way of allocating flexible struct. Michael S <already5chosen@yahoo.com> - 2025-10-08 12:53 +0300
Re: Nice way of allocating flexible struct. Bonita Montero <Bonita.Montero@gmail.com> - 2025-10-08 12:09 +0200
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-08 15:59 +0200
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-08 12:29 -0500
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-08 21:04 +0200
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-08 22:49 -0500
Re: Nice way of allocating flexible struct. bart <bc@freeuk.com> - 2025-10-10 01:13 +0100
Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-10 01:54 +0000
Re: Nice way of allocating flexible struct. "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-10-09 19:43 -0700
Re: Nice way of allocating flexible struct. bart <bc@freeuk.com> - 2025-10-10 11:25 +0100
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-09 22:50 -0500
Re: Nice way of allocating flexible struct. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-10-09 20:59 -0700
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-10 01:27 -0500
Re: Nice way of allocating flexible struct. David Brown <david.brown@hesbynett.no> - 2025-10-10 12:06 +0200
Re: Nice way of allocating flexible struct. Michael S <already5chosen@yahoo.com> - 2025-10-10 17:28 +0300
Re: Nice way of allocating flexible struct. David Brown <david.brown@hesbynett.no> - 2025-10-10 17:47 +0200
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-10 16:32 -0500
Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-11 00:02 +0000
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-13 06:20 +0200
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-10 15:01 -0500
Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-10 23:45 +0000
Re: Nice way of allocating flexible struct. Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-01-06 18:24 -0800
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-14 06:29 +0200
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-14 20:13 -0500
Re: Nice way of allocating flexible struct. bart <bc@freeuk.com> - 2025-10-15 11:26 +0100
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-15 13:00 -0500
Re: Nice way of allocating flexible struct. bart <bc@freeuk.com> - 2025-10-17 22:07 +0100
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-17 17:44 -0500
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-20 10:02 +0200
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-20 04:42 -0500
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-21 03:40 +0200
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-16 06:45 +0200
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-16 06:37 +0200
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-16 04:43 -0500
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-20 09:58 +0200
Re: Nice way of allocating flexible struct. rbowman <bowman@montana.com> - 2025-10-20 18:36 +0000
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-20 16:44 -0500
Re: Nice way of allocating flexible struct. rbowman <bowman@montana.com> - 2025-10-21 01:33 +0000
Language-design, tradeoffs (was Re: Nice way of allocating flexible struct.) Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-21 04:19 +0200
Re: Language-design, tradeoffs (was Re: Nice way of allocating flexible struct.) BGB <cr88192@gmail.com> - 2025-10-21 04:27 -0500
Re: Language-design, tradeoffs (was Re: Nice way of allocating flexible struct.) Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-22 02:30 +0200
Re: Language-design, tradeoffs (was Re: Nice way of allocating flexible struct.) BGB <cr88192@gmail.com> - 2025-10-22 02:10 -0500
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-21 03:51 +0200
Re: Nice way of allocating flexible struct. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-10-20 19:21 -0700
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-21 04:53 +0200
Re: Nice way of allocating flexible struct. rbowman <bowman@montana.com> - 2025-10-21 18:21 +0000
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-21 13:42 -0500
Re: Nice way of allocating flexible struct. James Kuyper <jameskuyper@alumni.caltech.edu> - 2025-10-21 06:17 -0400
Re: Nice way of allocating flexible struct. rbowman <bowman@montana.com> - 2025-10-21 18:41 +0000
Re: Nice way of allocating flexible struct. rbowman <bowman@montana.com> - 2025-10-21 18:12 +0000
Re: Nice way of allocating flexible struct. Bonita Montero <Bonita.Montero@gmail.com> - 2025-10-08 19:36 +0200
Re: Nice way of allocating flexible struct. Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2025-10-08 19:51 +0200
Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-08 15:29 +0000
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-08 11:33 -0500
Re: Nice way of allocating flexible struct. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-10-08 14:57 -0700
Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-09 01:39 +0000
Re: Nice way of allocating flexible struct. BGB <cr88192@gmail.com> - 2025-10-08 22:25 -0500
Re: Nice way of allocating flexible struct. Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-10-09 19:50 -0700
Re: Nice way of allocating flexible struct. Kaz Kylheku <643-408-1753@kylheku.com> - 2025-10-10 04:20 +0000
Re: Nice way of allocating flexible struct. Tim Rentsch <tr.17687@z991.linuxsc.com> - 2025-12-15 11:24 -0800
Re: Nice way of allocating flexible struct. "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-10-08 13:35 -0700
Re: Nice way of allocating flexible struct. "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-10-08 13:36 -0700
Re: Nice way of allocating flexible struct. Tim Rentsch <tr.17687@z991.linuxsc.com> - 2025-12-14 22:48 -0800
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2025-10-09 22:50 -0500 |
| Message-ID | <10c9vsr$3p286$1@dont-email.me> |
| In reply to | #394519 |
On 10/9/2025 7:13 PM, bart wrote:
> On 09/10/2025 04:49, BGB wrote:
>> On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
>>> On 08.10.2025 19:29, BGB wrote:
>
>> Though, similar was often a problem in my other language design
>> attempts: The most efficient way to do things was often also the C way.
>>
>>
>>
>> The only real exception I have found to this rule basically being in
>> relation to some features I have borrowed from languages like GLSL and
>> Verilog. But, some of this stuff isn't so much making the language
>> "higher level" as much as "being easier to map to ISA features and
>> optimize".
>>
>> Say:
>> vd[62:52]=vs[20:10];
>> Being easier to optimize than, say:
>> vd=(vd&(~(2047ULL<<52)))|(((vs>>10)&2047ULL)<<52);
>
> Using special bit-features makes it easier to generate decent code for a
> simple compiler.
>
> But gcc for example has no trouble optimising that masking/shifting
> version.
>
BGBCC is not so clever...
Granted, its code footprint is tiny vs GCC, and it can do a full rebuild
in a few seconds (with effectively the entire "compiler toolchain" in a
single binary).
Like, GCC and LLVM are both very large (over 10M lines).
By contrast, BGBCC is still in kLOC territory.
Granted, it is still not that small; it is still pretty big compared with
something like Doom. But alas, I haven't really been able to fit a C
compiler into a Doom-like code footprint (say, keeping a C compiler under
30k lines).
I did start making an attempt at one point, but ended up dropping the
effort after I had already exceeded a Doom-like code footprint and it
still wasn't very close to being done.
It is a little easier with an interpreter, but if one wants sensible
native code generation, doing it within a small footprint is difficult.
So, as-is, I have a compiler that is roughly around the size of the
Quake 2 engine...
> (It can do it in four x64 instructions, whereas I need nine working from
> vd.[62..52] := vs.[20..10]. It could be improved though; I don't need to
> extract the data to bits 10..0 first for example.)
>
> The main advantage is that it is a LOT easier to write, read and
> understand. The C would need macros to make it practical.
>
Shifts, masks, and macros are more traditional, but as noted, with my
compiler the explicit bit notation is easier to optimize, as well as to
read and write.
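For comparison with the vd[62:52]=vs[20:10] example quoted above, here is a minimal sketch of how the same field copy might look in plain C with helper macros (the "macros to make it practical" bart mentions). The names BITS_MASK, BITS_GET, BITS_SET and copy_field are hypothetical, and the sketch assumes constant bounds with 0 <= lo <= hi <= 63 and a field narrower than 64 bits:

```c
#include <stdint.h>

/* Hypothetical helpers emulating Verilog-style bit ranges v[hi:lo] on a
   uint64_t. Assumes 0 <= lo <= hi <= 63 and a width below 64, so the shift
   in BITS_MASK never reaches 64 (which would be undefined behavior). */
#define BITS_MASK(hi, lo)   ((UINT64_C(1) << ((hi) - (lo) + 1)) - 1)
#define BITS_GET(v, hi, lo) (((uint64_t)(v) >> (lo)) & BITS_MASK(hi, lo))
#define BITS_SET(d, hi, lo, x) \
    (((uint64_t)(d) & ~(BITS_MASK(hi, lo) << (lo))) | \
     (((uint64_t)(x) & BITS_MASK(hi, lo)) << (lo)))

/* Equivalent of: vd[62:52] = vs[20:10]; */
static uint64_t copy_field(uint64_t vd, uint64_t vs)
{
    return BITS_SET(vd, 62, 52, BITS_GET(vs, 20, 10));
}

/* The hand-written shift/mask form from the thread, for comparison. */
static uint64_t copy_field_manual(uint64_t vd, uint64_t vs)
{
    return (vd & ~(2047ULL << 52)) | (((vs >> 10) & 2047ULL) << 52);
}
```

With constant arguments an optimizing compiler should fold BITS_MASK to a literal, so both functions ought to end up as similar code; how well a simpler compiler copes with the expanded expression is exactly the question being discussed.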
>
>> Though, Verilog itself, not so much... Works well in an ASIC or FPGA,
>> not so much on a CPU.
>>
>> Though, as can be noted:
>> Bit-ranges are required to be constant at compile time;
>> When used with normal integer types, both bounds are required.
>
> I can handle some variable elements, but it gets rapidly complicated. At
> some point it needs to use library functions to do the work.
>
In my case, I only allow constant ranges here.
If runtime calls were needed, they would eat any possible savings; the
ability to generate efficient code here falls on its face once the bounds
are non-constant.
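To show why non-constant ranges are harder to compile well, here is a minimal C sketch of v[hi:lo] with runtime bounds; the function name bits_get_var is hypothetical, and it assumes 0 <= lo <= hi <= 63. The mask now has to be computed at run time, and the full-width case needs a branch because shifting a 64-bit value by 64 is undefined behavior in C:

```c
#include <stdint.h>

/* Runtime-bounds version of v[hi:lo]. Assumes 0 <= lo <= hi <= 63.
   The mask is computed on every call, and width 64 is special-cased
   because (1 << 64) is undefined for a 64-bit operand. */
static uint64_t bits_get_var(uint64_t v, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    uint64_t mask = (width >= 64) ? ~UINT64_C(0)
                                  : (UINT64_C(1) << width) - 1;
    return (v >> lo) & mask;
}
```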
>>
>> OTOH, GLSL offers nice and efficient ways to deal with SIMD.
>> Well, and also having some types for bit-preserving casts.
>> Or ability to specify endianess and alignment for individual struct
>> members.
>> ...
>>
>>
>>>>
>>>> Granted, a few of my own language design attempts ended up with a
>>>> different mess: [...]
>>>
>>> A sensibly defined language isn't something easily to create
>>> or obtain! - Personally I'd have appreciated it more if more
>>> designers of "own languages" have oriented their designs on
>>> sensible existing and proven concepts. - There may be a
>>> "market" for all these "own languages", I don't know, but I
>>> also don't care much, given what I've seen or heard of yet.
>>> (This isn't meant to be offensive, just to be clear, only
>>> that I don't care much. As compiler writers don't care much
>>> what I think.)
>>>
>>
>> Yeah.
>>
>> They have either tended to not amount to much, or converged towards
>> more conventional languages.
>>
>>
>>
>>>> [ attempt for a discussion on features of "own language"
>>>> snipped; not my business ]
>
> (There are those who can devise and use their own languages, and those
> who can't.)
>
>> Some amount of my stuff recently has involved various niche stuff.
>> Interfacing with hardware;
>> Motor controls;
>> Implementing things like an OpenGL back-end or similar;
>> Being used for a Boot ROM and OS kernel;
>> Sometimes neural nets.
>
> Some impressive stuff.
>
Yes, and mostly C domain.
I have my own experimental ISA which was partly designed with the
intention of using it for a mix of motor controls and computer vision.
Mostly I have just ended up using it for running ports of 90s-era games
(and otherwise mostly using ye olde ARM). Partly, one is still
hard-pressed to get an FPGA to be performance-competitive with something
like a RasPi; and a RasPi is cheaper.
But, for my own ISA, I did end up writing the firmware and OS (including
OpenGL) using my own compiler, although mostly in an extended C dialect.
And I used my own C compiler because:
I could throw it together for my own ISA and design experiments;
GCC or Clang would have been too much of an uphill battle;
LCC would still have left me needing to do most of the relevant work myself;
...
Compared against GCC targeting RV64, my stuff gets better performance,
with a caveat: if both compilers are limited to plain RV64G, BGBCC
typically loses; but my compiler targeting my ISA can beat GCC limited
to RV64G.
Though, with a few carefully selected extensions, RISC-V can be brought
into a similar performance profile as my own ISAs:
Indexed Load and Store;
Load/Store Pair (load or store pairs of 2 registers at a time);
Jumbo Prefixes (can expand immediate values from 12 to 33 bits).
In some programs, this combination can get a 40-60% speedup over plain
RV64G.
A lot of other things are possible, but the gains are generally a lot
smaller.
For reference, the list of ISAs supported by my compiler looks kinda like:
  SuperH: SH4
  BJX1 (an extended variant of SH4)
    (Split into several variants)
    (Not currently maintained)
  BJX2 (current ISA family)
    XG1: Original form of the ISA
    XG2: Intermediate form
    XG3: Reworked to coexist better with RISC-V
  RISC-V
    RV64G/RV64GC
    Various optional extensions
While RISC-V exists and is popular, I have not fully jumped over to RV,
as in its basic form its performance is a little weak (partly due to
weak areas and "foot guns"). Its performance can be improved, but there
are limits.
The XG3 variant is promising, but is essentially XG2 and RV64G just sort
of awkwardly hot-glued together. I don't expect it will see widespread
adoption (even if it does get reasonably good performance). Like, an ISA
design that is two unrelated ISAs glued together isn't necessarily the
most elegant solution (even if XG3's encoding scheme was able to clean
up some of the dog chew in XG2).
Pretty much all of the normal RV64G (or RV64I) encodings are still
usable in XG3, just with trade-offs (like, RV64 encodings have split X/F
registers whereas XG3 encodings have a unified register space, ...).
>
>> Some features are useful in some contexts but not others:
>> For example, "__int128" is very helpful when writing FPU-emulation
>> code for Binary128 handling, but has a lot fewer use-cases much beyond
>> this.
>>
>> Or, like:
>> exp=vala[126:112]; //extract exponent
>> fra=(_BitInt(128)) { 0x0001i16, vala[111:0]}; //extract fraction
>
> I had i128/u128 types at one point (quite a nice implementation too; it
> was only missing full 128-bit divide, I had only 128/64.)
>
> But the only place they got used was implementing 128-bit support in the
> self-hosted compiler and its library! So they were dropped.
>
I have them in my compiler, and to some extent in my ISA, but one of the
main use cases I have for them is implementing Binary128 support code
(or, "long double").
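As a minimal sketch of what the quoted exp=vala[126:112] and fra={1, vala[111:0]} lines do, assuming a Binary128 value is held as two 64-bit halves (the f128_bits type and the function names are hypothetical), the fields can be extracted in standard C like this:

```c
#include <stdint.h>

/* Hypothetical representation of a Binary128 value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } f128_bits;

/* exp = vala[126:112]: the 15-bit biased exponent (bit 127 is the sign). */
static unsigned f128_exponent(f128_bits a)
{
    return (unsigned)((a.hi >> 48) & 0x7FFF);
}

/* fra = {1, vala[111:0]}: the 112 stored fraction bits plus the implicit
   leading one at bit 112 (only meaningful for normal numbers). */
static f128_bits f128_fraction(f128_bits a)
{
    f128_bits f;
    f.lo = a.lo;
    f.hi = (a.hi & UINT64_C(0x0000FFFFFFFFFFFF)) | (UINT64_C(1) << 48);
    return f;
}
```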
RISC-V has no native "__int128" operations, so (if it is used) many of
the operations end up as runtime calls. It is also more awkward in
RISC-V, as there isn't really a good/efficient way to implement 128-bit
math using the available 64-bit instructions.
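A minimal illustration of that awkwardness: RISC-V has no carry flag, so a 128-bit add built from 64-bit operations has to recover the carry with an extra unsigned compare (what an SLTU instruction computes). The u128 type and u128_add are hypothetical names for this sketch:

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

/* 128-bit add from 64-bit operations. With no carry flag, the carry out of
   the low half is detected by an unsigned compare: if the 64-bit sum
   wrapped around, it is smaller than either input. */
static u128 u128_add(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    uint64_t carry = r.lo < a.lo;   /* 1 if the low half overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```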
I ended up adding a modified form of the 'Q' instructions from RISC-V,
where:
"long double" is rarely used enough that the cost of handling it with
emulation traps is acceptable;
But, common enough that you don't want it to be too horribly slow;
My ISA has access to some 128-bit integer operations;
On the RISC-V side, the cost of doing everything with 64-bit integer
math is slow enough to offset the cost of the emulation traps.
Though not quite the same as the Q extension:
Uses register pairs rather than 128-bit registers;
But, my sentiment here is that for low-traffic uses (128-bit integer and
floating-point stuff in general), the use of pairs of 64-bit registers
is preferable.
In the underlying hardware, supporting 128-bit integer math was a
cheaper option if compared with 128-bit FPU hardware; and int128 also
sees a little more traffic.
As for ISA level support for int128:
  ADD/SUB, Native (ALU chaining)
    Some control bits are needed to merge Carry-Select across the units;
    Carry-select scales well, so not too much added latency cost.
  Shift, Native (shift units ganged)
    Can use two 64-bit funnel shifters in parallel with some trickery.
  AND/OR/XOR: 2x ALU in parallel; No special handling needed.
  ...
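The ganged-shifter trick has a straightforward software analog: each 64-bit half of a 128-bit left shift is a funnel shift of the pair (hi:lo). This C sketch uses hypothetical names, assumes shift counts below 128, and only illustrates the arithmetic, not the hardware described above:

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128_pair;

/* Logical left shift of a 128-bit value held as two 64-bit halves.
   For 0 < n < 64, the new high half is a funnel shift of (hi:lo) and the
   new low half is a plain shift; both could be computed in parallel.
   Assumes 0 <= n < 128. */
static u128_pair u128_shl(u128_pair a, unsigned n)
{
    u128_pair r;
    if (n == 0) {
        r = a;                      /* avoid the undefined shift by 64 */
    } else if (n < 64) {
        r.hi = (a.hi << n) | (a.lo >> (64 - n));
        r.lo = a.lo << n;
    } else {
        r.hi = a.lo << (n - 64);    /* low half moves entirely into hi */
        r.lo = 0;
    }
    return r;
}
```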
This leaves MUL/DIV/etc., for which there is no viable way to handle
them directly or efficiently in hardware. Currently, the fastest way to
do a 128-bit multiply in this case is to build it from 32-bit
widening-multiply instructions.
And, for a Binary128 FPU multiply, one needs the high 128 bits of a
128*128 -> 256 bit widening multiply.
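Building a wide multiply from narrower widening multiplies can be sketched in C at a smaller scale: a 64x64 -> 128-bit product from four 32x32 -> 64-bit partial products. The function name is hypothetical; a full 128-bit, or 128x128 -> 256-bit, multiply would follow the same column-summing pattern with more partial products:

```c
#include <stdint.h>

/* 64x64 -> 128-bit unsigned multiply from four 32x32 -> 64-bit widening
   multiplies (schoolbook method). Results go to *hi and *lo. */
static void mul64x64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Middle column: at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```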
>> Unless maybe something can come along that is a better C than C...
>
> There are lots of new products, mostly too ambitious, too big and too
> complex. But C is already ensconced everywhere.
>
Yeah.
Note that trying to beat C at its own game is not about having the
biggest possible feature list.
A feature which might help in one context might be actively detrimental
in another.
Maybe OOP is itself optional, so a C-like subset basically has close to
a 1:1 feature-set with C (and remains well suited for procedural code).
Likewise, one shouldn't need to pay for things like RTTI or exceptions
when they are not used.
More generally, ideally "don't pay for what you don't use": some
compiler or language features exist, but if programmers don't use them,
they shouldn't need to pay for them.
Like, say, a language could have optional dynamic types; but if writing
a small Boot ROM, then this sort of thing is strictly off-limits.
If you have 32K of ROM space, you can't justify wasting it on
non-essential features.
But, if a C alternative language is basically just C with a slightly
different syntax, not necessarily all that compelling either.
...
>> Would likely simplify or eliminate some infrequently used features in C.
>>
>> Possibly:
>> Preprocessor, still exists, but its role is reduced.
>> Its role can be partly replaced by compiler metadata.
>> Trigraphs and digraphs: Gone;
>> K&R style declarations, also gone;
>> Parser should not depend on previous declarations;
>> Non trivial types and declarator syntax: Eliminate;
>> ...
>>
>> Possibly:
>> Pointers and arrays can be specified on the type rather than
>> declarator (so, more like C# here)
>> ...
>>
>> But, as I see it, drastically changing the syntax (like in Go or Rust)
>> is undesirable. Contrast, say, C# style syntax was more conservative.
>
> Nobody cares about C syntax. Learning all its ins and outs seems be a
> rite of passage.
>
> The trouble is that C-style is so dominant, few people would know what a
> decent syntax looks like. Or, more, likely, they associate a clean,
> well-designed syntax with toy or scripting languages, and can't take it
> seriously.
>
> But if it looks as hairy as C++ then it must be the business!
>
By comparison, C#-style syntax is simplified relative to C or C++, but
retains many similar properties (and isn't quite as verbose or as
awkward as Java).
>> Though, the harder problem here isn't necessarily that of designing or
>> implementing it, but more in how to make its use preferable to jus
>> staying with C.
>>
>>
>> One merit is if code can be copy-pasted, but if one has to change all
>> instances of:
>> char *s0, *s1;
>> To:
>> char* s0, s1;
>>
>> Well, this is likely to get old, unless it still uses, or allows C
>> style declaration syntax in this case.
>
> That one's been fixed (50 years late): you instead write:
>
> typeof(char*) s0, s1;
>
> But you will need an extension if it's not part of C23.
>
The issue is that if the language uses a C#-style syntax, even with
"unsafe", directly copy-pasting from C would require some amount of editing.
But, either way, to be useful as a C alternative:
Would need to be able to do all of the same stuff as C in a roughly
similar way;
Should have similar or better performance;
...
Though, one thing is:
The language should not try to ram OOP down everyone's throat (a problem
that existed in both C# and Java).
Ideally, one could still do things using a C-like procedural style.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2025-10-09 20:59 -0700 |
| Message-ID | <87qzvbh0f5.fsf@example.invalid> |
| In reply to | #394519 |
bart <bc@freeuk.com> writes:
> On 09/10/2025 04:49, BGB wrote:
[...]
> Nobody cares about C syntax.
That is so manifestly untrue that I can't imagine what you actually
meant.
Many of us, myself included, don't particularly like some aspects of C
syntax, but that's not the same as not caring about it.
> Learning all its ins and outs seems be a
> rite of passage.
Perhaps. It's also necessary if you want to work with the language.
> The trouble is that C-style is so dominant, few people would know what
> a decent syntax looks like. Or, more, likely, they associate a clean,
> well-designed syntax with toy or scripting languages, and can't take
> it seriously.
>
> But if it looks as hairy as C++ then it must be the business!
C syntax has survived and been propagated to other languages because
it's well known, not, I think, because anybody really likes it.
[...]
>> One merit is if code can be copy-pasted, but if one has to change
>> all instances of:
>> char *s0, *s1;
>> To:
>> char* s0, s1;
>> Well, this is likely to get old, unless it still uses, or allows C
>> style declaration syntax in this case.
>
> That one's been fixed (50 years late): you instead write:
>
> typeof(char*) s0, s1;
>
> But you will need an extension if it's not part of C23.
Yes, that will work in C23, but it would never occur to me to
write that. I'd just write `char *s0, *s1;` or, far more likely,
define s0 and s1 on separate lines. Using typeof that way triggers
my WTF filter.
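A small C check of the declarator behavior being discussed, assuming a C11 compiler with _Generic. The TYPEOF and IS_CHAR_PTR macros are hypothetical helpers; TYPEOF uses the C23 typeof keyword when available and otherwise falls back to the __typeof__ extension of GCC and Clang:

```c
/* TYPEOF: the C23 keyword when available, else the GCC/Clang extension. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define TYPEOF(T) typeof(T)
#else
#define TYPEOF(T) __typeof__(T)
#endif

/* Evaluates to 1 if the expression has type char*, otherwise 0. */
#define IS_CHAR_PTR(x) _Generic((x), char *: 1, default: 0)

char *a0, *a1;          /* the * belongs to each declarator: both char* */
char* b0, b1;           /* b0 is char*, but b1 is a plain char          */
TYPEOF(char *) c0, c1;  /* typeof names the whole type: both are char*  */
```

Keith's preference of one declaration per line sidesteps the question entirely.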
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2025-10-10 01:27 -0500 |
| Message-ID | <10ca933$3rssv$1@dont-email.me> |
| In reply to | #394524 |
On 10/9/2025 10:59 PM, Keith Thompson wrote:
> bart <bc@freeuk.com> writes:
>> On 09/10/2025 04:49, BGB wrote:
> [...]
>
>> Nobody cares about C syntax.
>
> That is so manifestly untrue that I can't imagine what you actually
> meant.
>
> Many of us, myself included, don't particularly like some aspects of C
> syntax, but that's not the same as not caring about it.
>
Yes.
>> Learning all its ins and outs seems be a
>> rite of passage.
>
> Perhaps. It's also necessary if you want to work with the language.
>
>> The trouble is that C-style is so dominant, few people would know what
>> a decent syntax looks like. Or, more, likely, they associate a clean,
>> well-designed syntax with toy or scripting languages, and can't take
>> it seriously.
>>
>> But if it looks as hairy as C++ then it must be the business!
>
> C syntax has survived and been propagated to other languages because
> it's well known, not, I think, because anybody really likes it.
>
I would gladly pick C style syntax over PASCAL, FORTRAN, or COBOL.
> [...]
>
>>> One merit is if code can be copy-pasted, but if one has to change
>>> all instances of:
>>> char *s0, *s1;
>>> To:
>>> char* s0, s1;
>>> Well, this is likely to get old, unless it still uses, or allows C
>>> style declaration syntax in this case.
>>
>> That one's been fixed (50 years late): you instead write:
>>
>> typeof(char*) s0, s1;
>>
>> But you will need an extension if it's not part of C23.
>
> Yes, that will work in C23, but it would never occur to me to
> write that. I'd just write `char *s0, *s1;` or, far more likely,
> define s0 and s1 on separate lines. Using typeof that way triggers
> my WTF filter.
>
Agreed.
I think it can be contrasted with C#-style syntax (with "unsafe"), where
one would write:
char* s0, s1;
Though, I am imagining a world where char is probably an unsigned byte,
so that UTF-8 makes sense.
So, say, if we had types (for a hypothetical language) like:
sbyte, ubyte: 8-bits, signed/unsigned
byte: 8-bits, probably unsigned
char: 8-bits, probably unsigned (UTF-8)
wchar: 16-bits, unsigned (UTF-16)
short: 16-bits, signed
ushort: 16-bits, unsigned
int: 32-bits, signed
uint: 32-bits, unsigned
long: 64-bits, signed
ulong: 64-bits, unsigned
Maybe some more, but with explicit bit sizes.
int8/int16/int32/int64/int128
uint8/uint16/uint32/uint64/uint128
But, no separate "unsigned".
Core type name is always a single identifier, unlike C.
And, special types:
string: String, UTF-8
wstring: String, UTF-16
...
While a string is (effectively) a pointer to the first character, the
type can be seen as distinct from that of 'char*'. Nominal
representation would be as a series of codepoints terminated with a NUL
byte.
Default string type can be UTF-8 because, most of the time, UTF-16 would
be a waste of memory (but, can be kept for "those that have that
preference").
And, floating point:
float: Binary32
double: Binary64
half: Binary16
Maybe explicit sized names:
float16/float32/float64/float128
Maybe vectors (optional):
vec2f 2x float
vec4f 4x float
vec4h 4x half
vec2d 2x double
quatf quaternion (float)
quath quaternion (half)
Untyped bit-blobs:
m8, m16, m32, m64, m128
Say:
ulong li;
double f;
li=(m64)f; //cast double->ulong as bit pattern
f=(m64)li; //cast ulong->double as bit pattern
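In standard C, the nearest equivalent to the proposed (m64) bit-pattern casts is copying the bytes with memcpy. This sketch assumes double is an IEEE 754 binary64 type; double_bits and bits_double are hypothetical names:

```c
#include <stdint.h>
#include <string.h>

/* double -> uint64_t as a bit pattern (like li=(m64)f;). */
static uint64_t double_bits(double f)
{
    uint64_t u;
    _Static_assert(sizeof u == sizeof f, "double must be 64 bits");
    memcpy(&u, &f, sizeof u);   /* well-defined byte copy, no aliasing issue */
    return u;
}

/* uint64_t -> double as a bit pattern (like f=(m64)li;). */
static double bits_double(uint64_t u)
{
    double f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Compilers typically reduce these memcpy calls to a single register move, so the portable form need not cost more than a dedicated cast; the proposed syntax mainly buys readability.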
With basic decl syntax like:
int i; //normal variable
int[16] ia; //fixed array (inline)
int* pi; //pointer
int[] ia2; //flexible array (reference, hosted only)
int*[16] pia; //fixed array of pointers
This would apply to primitive types and structs.
Object types having different suffixes (hosted only):
Foo obj1; //basic object
Foo! obj2; //automatic
Foo^ obj3; //refcount
Foo(Z) obj4; //zone
Foo[] aobj1; //flexible array of Foo
Foo[16] aobj2; //fixed array of Foo
...
Where, struct and class implicitly declare type, so:
struct Str1 {
int x, y;
}
class Foo:Bar { //class Foo, extends Bar
...
}
interface IBaz { //interface
...
}
Where, likely flexible arrays, classes, and interfaces, only exist:
If the implementation is hosted;
If a non-hosted implementation provides a full memory manager.
Would eliminate some more obscure patterns from C syntax, like:
int (*arr[16])[16]; //like, WUT?...
Just sorta say, stuff like this doesn't exist.
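For reference, int (*arr[16])[16]; declares an array of 16 pointers, each pointing to an array of 16 ints. A typedef (the name Row16 here is hypothetical) gives the same type a flatter spelling in today's C, which is roughly what a type-first declaration syntax would provide:

```c
typedef int Row16[16];      /* hypothetical name: an array of 16 int     */

int (*arr[16])[16];         /* the declarator from the post              */
Row16 *arr2[16];            /* the same type, spelled via the typedef    */

static int row_storage[16];

static void demo_rows(void)
{
    arr[0] = &row_storage;  /* each element is a pointer to int[16]      */
    arr2[0] = arr[0];       /* compatible types: no cast needed          */
    (*arr2[0])[3] = 42;     /* writes row_storage[3]                     */
}
```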
Might consider doing function pointers like:
delegate void FuncT();
...
FuncT fun;
Well, this is in contrast to some of my own languages, which had used:
typedef void FuncT();
for this, and 'delegate' as a scope modifier (where identifier lookups
may look into the variable). But nothing stops using delegate for both
purposes (based on whether it is followed by a prototype or an object
variable declaration).
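For comparison, C can already name a function type with a typedef and then declare pointers to it, which is close to what the proposed delegate declaration would name (minus any C#-style binding to an object). A minimal sketch with hypothetical names:

```c
/* A function type, written with (void) to declare no parameters. */
typedef void FuncT(void);

static int calls;

static void hello(void)
{
    calls++;
}

/* A pointer to a function of type FuncT, like "FuncT fun;" in the proposal. */
static FuncT *fun = hello;
```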
Scoping could look like:
global top-level (behaves like C top-level);
namespaces.
Probably, things like function overloading are only allowed inside
classes or namespaces. At the global top level, no overloading is allowed
(so, it is like C++ 'extern "C"' by default).
So, say:
namespace foo {
using c.stdio;
//why not?...
int func(int x, int y)
{
printf("yeah...\n"); //via stdio
return x+y;
}
}
Where, say, we don't need a bunch of wrapper classes for file IO and
printing, because C's "stdio.h" already does this.
Here, there can be some magic behind the scenes, where the compiler can
use namespaces and metadata rather than bulk textual inclusion.
One option is that when the compiler compiles stuff, like the runtime
library, it also generates "manifests" that the compiler can use to find
declarations. Likely, the manifest files could exist like a sort of
hierarchical database partly mapped onto a virtual filesystem, with
search paths and similar (sorta like the class path in the JVM). Just,
this metadata will only exist for compiling stuff.
Why not textual inclusion?: Because it wastes a lot of CPU time and RAM
to generate and parse a mountain of random stuff for every translation
unit (the amount of text pulled in from headers is typically several
orders of magnitude larger than the actual code for the translation unit
itself). Also precompiled headers are a poor solution to this.
May or may not do varargs differently.
One possible interpretation could be, say:
void vafunc(char* s, va...)
{
char* s1;
long x, y;
x=va[0]; //first variable argument
y=va[1]; //second variable argument
s1=(char*)va[2]; //third argument (string)
}
Where, the exact element type depends on the target, but probably 'long'
or something (or 'int' on a 32-bit target).
ABI rule then would be:
If the ABI would otherwise distinguish based on argument types, vararg
functions will receive all arguments as native machine words (int or
long or similar). If they are passed in registers, they will be spilled
to memory, and this memory will be exposed to the function as the
argument list.
Then, say, C style va_list could be faked as, say:
long* vap=va;
x=*vap++;
y=*vap++;
s1=(char*)(*vap++);
...
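For comparison, here is the same vafunc written with standard C's <stdarg.h>. The vafunc_result struct is a hypothetical addition so the fetched values can be inspected. Unlike the proposed ABI, each va_arg call must name the (promoted) type the caller actually passed, so callers need to pass 1L rather than 1 for the long arguments:

```c
#include <stdarg.h>

/* Hypothetical result type so the fetched arguments can be inspected. */
struct vafunc_result { long x, y; char *s1; };

static struct vafunc_result vafunc(char *s, ...)
{
    struct vafunc_result r;
    va_list ap;
    va_start(ap, s);              /* start after the last named parameter */
    r.x  = va_arg(ap, long);      /* first variable argument              */
    r.y  = va_arg(ap, long);      /* second variable argument             */
    r.s1 = va_arg(ap, char *);    /* third argument (string)              */
    va_end(ap);
    (void)s;                      /* unused in this sketch                */
    return r;
}
```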
As for whether or not this offers enough to actually be worth bothering
with, vs "just use C", dunno...
I am seemingly one of the few people considering ideas for a
hypothetical C replacement that actually likes C.
...
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2025-10-10 12:06 +0200 |
| Message-ID | <10calqi$3v6ha$1@dont-email.me> |
| In reply to | #394526 |
On 10/10/2025 08:27, BGB wrote:
> On 10/9/2025 10:59 PM, Keith Thompson wrote:
>> bart <bc@freeuk.com> writes:
>>
>>>> One merit is if code can be copy-pasted, but if one has to change
>>>> all instances of:
>>>> char *s0, *s1;
>>>> To:
>>>> char* s0, s1;
>>>> Well, this is likely to get old, unless it still uses, or allows C
>>>> style declaration syntax in this case.
>>>
>>> That one's been fixed (50 years late): you instead write:
>>>
>>> typeof(char*) s0, s1;
>>>
>>> But you will need an extension if it's not part of C23.
>>
>> Yes, that will work in C23, but it would never occur to me to
>> write that. I'd just write `char *s0, *s1;` or, far more likely,
>> define s0 and s1 on separate lines. Using typeof that way triggers
>> my WTF filter.
>>
>
> Agreed.
>
> I think it can be contrast with C# style syntax (with "unsafe") where
> one would write:
> char* s0, s1;

Does C# treat s1 as "char*" in this case? That sounds like an
extraordinarily bad design decision - having a syntax that is very
like the dominant C syntax yet subtly different.

Issues like this have been "solved" for decades - in the sense that
people who care about their code don't make mistakes from mixups of
"char" and "char*" declarations. There are a dozen different ways to be
sure it is not an issue. Simplest of all is a style rule - never
declare identifiers of different types in the same declaration. I'd
have preferred that to be a rule baked into the language from the
start, but we all have things we dislike about the C syntax.
| From | Michael S <already5chosen@yahoo.com> |
|---|---|
| Date | 2025-10-10 17:28 +0300 |
| Message-ID | <20251010172802.00005bdd@yahoo.com> |
| In reply to | #394527 |
On Fri, 10 Oct 2025 12:06:10 +0200
David Brown <david.brown@hesbynett.no> wrote:

> On 10/10/2025 08:27, BGB wrote:
> > On 10/9/2025 10:59 PM, Keith Thompson wrote:
> >> bart <bc@freeuk.com> writes:
> >
> >>
> >>>> One merit is if code can be copy-pasted, but if one has to change
> >>>> all instances of:
> >>>> char *s0, *s1;
> >>>> To:
> >>>> char* s0, s1;
> >>>> Well, this is likely to get old, unless it still uses, or allows
> >>>> C style declaration syntax in this case.
> >>>
> >>> That one's been fixed (50 years late): you instead write:
> >>>
> >>> typeof(char*) s0, s1;
> >>>
> >>> But you will need an extension if it's not part of C23.
> >>
> >> Yes, that will work in C23, but it would never occur to me to
> >> write that. I'd just write `char *s0, *s1;` or, far more likely,
> >> define s0 and s1 on separate lines. Using typeof that way triggers
> >> my WTF filter.
> >>
> >
> > Agreed.
> >
> > I think it can be contrast with C# style syntax (with "unsafe")
> > where one would write:
> > char* s0, s1;
>
> Does C# treat s1 as "char*" in this case? That sounds like an
> extraordinarily bad design decision - having a syntax that is very
> like the dominant C syntax yet subtly different.
>

Generally, I disagree with your rule. Not that it makes no sense at
all, but sometimes a violation makes more sense. For example, I strongly
prefer for otherwise C-like languages to parse the literal 011 as
decimal 11 rather than 9.

In this particular case it's more subtle.
What makes it a non-issue in practice is the fact that pointers in C#
are a very rarely used, expert-level feature, especially since the
language got slices (Span<T>) 7 or 8 years ago.
A person who decides to use C# pointers has to understand at least
half a dozen things more arcane than this one.
Also, if somebody made such a mistake, it's very unlikely that his code
would pass compilation. After all, we're talking about C# here, not
something like Python.
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2025-10-10 17:47 +0200 |
| Message-ID | <10cb9rd$5f2i$2@dont-email.me> |
| In reply to | #394529 |
On 10/10/2025 16:28, Michael S wrote:
> On Fri, 10 Oct 2025 12:06:10 +0200
> David Brown <david.brown@hesbynett.no> wrote:
>
>> On 10/10/2025 08:27, BGB wrote:
>>> On 10/9/2025 10:59 PM, Keith Thompson wrote:
>>>> bart <bc@freeuk.com> writes:
>>
>>>>
>>>>>> One merit is if code can be copy-pasted, but if one has to change
>>>>>> all instances of:
>>>>>> char *s0, *s1;
>>>>>> To:
>>>>>> char* s0, s1;
>>>>>> Well, this is likely to get old, unless it still uses, or allows
>>>>>> C style declaration syntax in this case.
>>>>>
>>>>> That one's been fixed (50 years late): you instead write:
>>>>>
>>>>> typeof(char*) s0, s1;
>>>>>
>>>>> But you will need an extension if it's not part of C23.
>>>>
>>>> Yes, that will work in C23, but it would never occur to me to
>>>> write that. I'd just write `char *s0, *s1;` or, far more likely,
>>>> define s0 and s1 on separate lines. Using typeof that way triggers
>>>> my WTF filter.
>>>>
>>>
>>> Agreed.
>>>
>>> I think it can be contrast with C# style syntax (with "unsafe")
>>> where one would write:
>>> char* s0, s1;
>>
>> Does C# treat s1 as "char*" in this case? That sounds like an
>> extraordinarily bad design decision - having a syntax that is very
>> like the dominant C syntax yet subtly different.
>>
>
> Generally, I disagree with your rule. Not that it makes no sense at
> all, but sometimes a violation has more sense. For example, I strongly
> prefer for otherwise C-like languages to parse 011 literal as decimal
> 11 rather than 9.

I did not intend to describe a general rule (and I agree with you in
regard to octal).

>
> In this particular case it's more subtle.
> What makes it a non-issue in practice is the fact that pointers is C# is
> very rarely used expert-level feature, especially so after 7 or 8
> years ago the language got slices (Span<T>).
> A person that decides to use C# pointers has to understand at least
> half a dozen of more arcane things than this one.
> Also it's very unlikely in case somebody made such mistake that his
> code will pass compilation. After all, we're talking about C# here, not
> something like Python.
>

Sure.

It would seem to me, however, that it would have been better for the C#
designers to pick a different syntax here rather than something that
looks like C, but has subtle differences that are going to cause newbies
confusion when they try to google for explanations for their problems.
For example, if raw pointers are rarely used, then they should perhaps
be accessible using a more verbose syntax than a punctuation mark -
"ptr<char> s0, s1;" might work.

However, I have no experience with C#, and don't know the reasons for
its syntax choices.
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2025-10-10 16:32 -0500 |
| Message-ID | <10cbu37$bhar$1@dont-email.me> |
| In reply to | #394530 |
On 10/10/2025 10:47 AM, David Brown wrote:
> On 10/10/2025 16:28, Michael S wrote:
>> On Fri, 10 Oct 2025 12:06:10 +0200
>> David Brown <david.brown@hesbynett.no> wrote:
>>
>>> On 10/10/2025 08:27, BGB wrote:
>>>> On 10/9/2025 10:59 PM, Keith Thompson wrote:
>>>>> bart <bc@freeuk.com> writes:
>>>
>>>>>>> One merit is if code can be copy-pasted, but if one has to change
>>>>>>> all instances of:
>>>>>>> char *s0, *s1;
>>>>>>> To:
>>>>>>> char* s0, s1;
>>>>>>> Well, this is likely to get old, unless it still uses, or allows
>>>>>>> C style declaration syntax in this case.
>>>>>>
>>>>>> That one's been fixed (50 years late): you instead write:
>>>>>>
>>>>>> typeof(char*) s0, s1;
>>>>>>
>>>>>> But you will need an extension if it's not part of C23.
>>>>>
>>>>> Yes, that will work in C23, but it would never occur to me to
>>>>> write that. I'd just write `char *s0, *s1;` or, far more likely,
>>>>> define s0 and s1 on separate lines. Using typeof that way triggers
>>>>> my WTF filter.
>>>>
>>>> Agreed.
>>>>
>>>> I think it can be contrast with C# style syntax (with "unsafe")
>>>> where one would write:
>>>> char* s0, s1;
>>>
>>> Does C# treat s1 as "char*" in this case? That sounds like an
>>> extraordinarily bad design decision - having a syntax that is very
>>> like the dominant C syntax yet subtly different.
>>>
>>
>> Generally, I disagree with your rule. Not that it makes no sense at
>> all, but sometimes a violation has more sense. For example, I strongly
>> prefer for otherwise C-like languages to parse 011 literal as decimal
>> 11 rather than 9.
>
> I did not intend to describe a general rule (and I agree with you in
> regard to octal).
>

Yeah, '0' by itself indicating octal is weird, so I might agree here.
123 //decimal
0123 //maybe reinterpret as decimal?
0o123 //octal
0x123 //hexadecimal
0b101 //binary

In BGBCC, I had defined some additional handling for suffixes:
iNN, where NN is an integer, specifies a number of bits.
uNN or uiNN, specifies a number of bits, but unsigned.
Types could specify non-power-of-2 widths (understood as _BitInt).

Though, there was also the wonk that these literals could also allow X
and Z in place of bits or hex digits, but this was more a side-effect of
a fizzled effort to try to add Verilog support to BGBCC (which was also
sort of where the bit notation came from).

Though, generally, X and Z have no real purpose in C code (and may not
exist in actual integer values), so they would be little more than a
curiosity (some of this was more intended to test out functionality
being added for the sake of trying to support Verilog).

But, as noted, in a few cases, the Verilog mechanisms can offer a
performance advantage over traditional C constructs. In other cases, not
so much....

This was being worked on at one point as I sometimes face frustration at
the almost non-existent debugging features in Verilator (you basically
have to do a more awkward form of printf debugging; it would kinda be
nice sometimes if one could set breakpoints and inspect variables, ...).

But, what passes for control-flow in Verilog doesn't really map over so
well (one basically needs to update stuff based on a "sensitivity graph"
mostly driven by clock signals and similar).

>>
>> In this particular case it's more subtle.
>> What makes it a non-issue in practice is the fact that pointers is C# is
>> very rarely used expert-level feature, especially so after 7 or 8
>> years ago the language got slices (Span<T>).
>> A person that decides to use C# pointers has to understand at least
>> half a dozen of more arcane things than this one.
>> Also it's very unlikely in case somebody made such mistake that his
>> code will pass compilation. After all, we're talking about C# here, not
>> something like Python.
>>
>
> Sure.
>
> It would seem to me, however, that it would have been better for the C#
> designers to pick a different syntax here rather than something that
> looks like C, but has subtle differences that are going to cause newbies
> confusion when they try to google for explanations for their problems.
> For example, if raw pointers are rarely used, then they should perhaps
> be accessible using a more verbose syntax than a punctuation mark -
> "ptr<char> s0, s1;" might work.
>
> However, I have no experience with C#, and don't know the reasons for
> its syntax choices.
>

Early on, it didn't have generics and so wouldn't use that syntax.
Unlike C++, it doesn't have templates, so "ptr<char>" would not make so
much sense, and 'ptr' would then be limited to being a class instance,
and class instances are always by-reference. Also early on, no operator
overloading either (as with generics, this part was added later).

Also the language discouraged pointers anyways, so you had to opt in by
using the 'unsafe' keyword before the compiler would allow them (and
then, only for 'trusted' executables).

Though, for a hybrid language, I would likely drop the concept of
trusted executables (or allow it, maybe with the added constraint that
object lifetimes be statically provable; maybe asking too much though).
The concept of trusted executables doesn't make as much sense with
native-code compilation.
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2025-10-11 00:02 +0000 |
| Message-ID | <20251010170008.460@kylheku.com> |
| In reply to | #394533 |
On 2025-10-10, BGB <cr88192@gmail.com> wrote:
> Yeah, '0' by itself indicating octal is weird, so I might agree here.
> 123 //decimal
> 0123 //maybe reinterpret as decimal?
> 0o123 //octal
> 0x123 //hexadecimal
> 0b101 //binary

Lisp people worked this out before the end of the 80s:

[1]> 777
777
[2]> 00777
777
[3]> #o777
511
[4]> #x777
1911
[5]> #b1001
9

Leading zeros changing base is really a sneaky stupidity, and causes
problems in shell scripts also, from time to time.

$ printf "%d\n" 0777
511
$
$ echo $(( 0777 + 0 ))
511

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
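C applies the same leading-zero rule that the shell examples trip over. A small sketch with the values from the Lisp session written as C integer constants; the variable names are arbitrary, and binary `0b` constants are left out because they are standard only from C23.

```c
#include <assert.h>

/* In C, a leading 0 makes an integer constant octal and 0x makes it hex. */
static const int dec_777 = 777;   /* decimal: 777 */
static const int oct_777 = 0777;  /* octal: 7*64 + 7*8 + 7 = 511 */
static const int hex_777 = 0x777; /* hex: 7*256 + 7*16 + 7 = 1911 */
static const int oh_11   = 011;   /* octal: 9, not 11 */
```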
| From | Janis Papanagnou <janis_papanagnou+ng@hotmail.com> |
|---|---|
| Date | 2025-10-13 06:20 +0200 |
| Message-ID | <10chumb$1v98o$1@dont-email.me> |
| In reply to | #394535 |
On 11.10.2025 02:02, Kaz Kylheku wrote:
> On 2025-10-10, BGB <cr88192@gmail.com> wrote:
>> Yeah, '0' by itself indicating octal is weird, so I might agree here.
>> 123 //decimal
>> 0123 //maybe reinterpret as decimal?
>> 0o123 //octal
>> 0x123 //hexadecimal
>> 0b101 //binary
>
> Lisp people worked this out before the end of the 80s:
>
> [1]> 777
> 777
> [2]> 00777
> 777
> [3]> #o777
> 511
> [4]> #x777
> 1911
> [5]> #b1001
> 9
>
> Leading zeros changing base is really a sneaky stupidity, and causes
> problems in shell scripts also, from time to time.
>
> $ printf "%d\n" 0777
> 511
> $
> $ echo $(( 0777 + 0 ))
> 511
>

Yes, indeed. And behavior between shells and versions differs as well.

$ dash -c 'printf "%d\n" 077'
63
$ ksh93u -c 'printf "%d\n" 077'
63
$ ksh93u+ -c 'printf "%d\n" 077'
77

Now, is it good that ksh has fixed that? (I have my doubts.) Also if you
get actual values from variable expansion (as opposed to constant
literals) you may get surprises.

At some point I often used [in Kornshell] explicit "base specifiers"
(which are not generally available in shells), base#number

$ ksh93u -c 'printf "%d\n" 10#077'
77

This is especially sensible if used with variables containing arbitrary
number formats, where the leading zero is hidden in 'var'.

$ ksh93u -c 'var=077; printf "%d\n" 10#$var $var'
77
63

Leading-zero octals are yet another badly designed language feature,
and not only in the shell language - but that ship has sailed...

Janis
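The C library analogue of ksh's `10#` prefix is the base argument of `strtol`: with base 0 the string's prefix picks the base (a leading `0` means octal), while an explicit base 10 treats leading zeros as ordinary digits. A minimal sketch; the two wrapper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Base 0: "0x"/"0X" prefix means hex, a leading "0" means octal,
   otherwise decimal, the same surprise as 077 in the shell. */
static long parse_auto(const char *s)
{
    return strtol(s, NULL, 0);
}

/* Explicit base 10: leading zeros are just digits, like ksh's 10#$var. */
static long parse_decimal(const char *s)
{
    return strtol(s, NULL, 10);
}
```

With input such as "077" coming from a variable or a file, `parse_auto` yields 63 and `parse_decimal` yields 77.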
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2025-10-10 15:01 -0500 |
| Message-ID | <10cboq0$a8ak$1@dont-email.me> |
| In reply to | #394527 |
On 10/10/2025 5:06 AM, David Brown wrote:
> On 10/10/2025 08:27, BGB wrote:
>> On 10/9/2025 10:59 PM, Keith Thompson wrote:
>>> bart <bc@freeuk.com> writes:
>
>>>
>>>>> One merit is if code can be copy-pasted, but if one has to change
>>>>> all instances of:
>>>>> char *s0, *s1;
>>>>> To:
>>>>> char* s0, s1;
>>>>> Well, this is likely to get old, unless it still uses, or allows C
>>>>> style declaration syntax in this case.
>>>>
>>>> That one's been fixed (50 years late): you instead write:
>>>>
>>>> typeof(char*) s0, s1;
>>>>
>>>> But you will need an extension if it's not part of C23.
>>>
>>> Yes, that will work in C23, but it would never occur to me to
>>> write that. I'd just write `char *s0, *s1;` or, far more likely,
>>> define s0 and s1 on separate lines. Using typeof that way triggers
>>> my WTF filter.
>>>
>>
>> Agreed.
>>
>>
>>
>> I think it can be contrast with C# style syntax (with "unsafe") where
>> one would write:
>> char* s0, s1;
>
> Does C# treat s1 as "char*" in this case? That sounds like an
> extraordinarily bad design decision - having a syntax that is very like
> the dominant C syntax yet subtly different.
>
Yes. In this case, things like "*" or "[]" are associated with the type
rather than the declarator.
> Issues like this have been "solved" for decades - in the sense that
> people who care about their code don't make mistakes from mixups of
> "char" and "char*" declarations. There are a dozen different ways to be
> sure it is not an issue. Simplest of all is a style rule - never
> declare identifiers of different types in the same declaration. I'd
> have preferred that to be a rule baked into the language from the start,
> but we all have things we dislike about the C syntax.
>
Part of the reason for some of the differences is that it allows a
parser that does not need to know about previous typedefs and declarations.
In C, you need to know prior typedefs to parse correctly.
In C++, you also need to know previous template declarations, etc.,
with classes/structs/etc. adding implicit typedefs.
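A minimal illustration of why a C parser needs to know prior typedefs: the same token sequence `T * x` declares a pointer when `T` names a type, but is a multiplication when an ordinary identifier `T` hides the typedef. The names here are arbitrary.

```c
#include <assert.h>

typedef int T;   /* from here on, T names a type */

/* Here "T * p" is a declaration: p is a pointer to int. */
static int declares_pointer(void)
{
    int value = 5;
    T * p = &value;
    return *p;
}

/* Here a block-scope variable named T hides the typedef, so the
   same tokens "T * x" form a multiplication expression. */
static int multiplies(void)
{
    int T = 6, x = 7;
    return T * x;
}
```

A parser that sees only the tokens cannot tell these apart without a symbol table tracking which identifiers currently name types.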
Avoiding the need to know typedefs in advance allows for a parser where
there either is no preprocessor (Java), or the preprocessor still exists
but its use is far more limited in scope and mostly unused (C#).
Also typically, things like the type-system are handled later in the
pipeline (in .NET, it was closer to what would be considered the linker
stage in a traditional compiler).
In effect, the front-end process works with relatively incomplete
information, producing IL bytecode that specifies where to look for
things and what to look for, but not the complete information. When an
EXE or DLL is produced, it would resolve things for what exists within
the current "assembly" (roughly equal to the EXE or DLL being compiled),
with the ".NET runtime" needing to sort out the rest (typically AOT
compiling the binaries into some internal form).
However, I would assume there is no "runtime" here, meaning the linker
would need to produce native-code binaries.
FWIW: BGBCC also generally uses a bytecode representation internally,
and then produces native binaries as output. Though, the way the
bytecode is structured and works differs from that of .NET bytecode.
However, in both cases, they are using implicitly-typed stack machines
at the IL stage. In BGBCC, for the backend stage, the bytecode IL is
translated into "Three-Address-Code" roughly in "SSA Form" (though, not
exactly the same as in LLVM; as it typically uses a combination of
variable-ID and sequence-number, rather than creating a new "register"
every time; also typically the "phi" operations are implicit).
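To make the stack-IL-to-3AC step concrete, a toy translator in C. The opcode set and names are hypothetical (not BGBCC's or .NET's actual IL), and it only shows the basic idea: simulate the operand stack at compile time, where each slot holds the name of a variable, constant or temporary, and emit one three-address line per operation. It does no SSA renaming of named variables and no phi handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stack IL opcodes. */
typedef enum { OP_LOAD, OP_CONST, OP_ADD, OP_MUL, OP_STORE } Op;
typedef struct { Op op; const char *name; int value; } Insn;

#define MAX_STACK   16
#define OPERAND_LEN 16
#define LINE_LEN    64

/* Translates n stack instructions into 3AC lines written to out.
   Returns the number of lines written, or -1 on malformed input. */
static int stack_to_3ac(const Insn *code, int n, char out[][LINE_LEN], int max_out)
{
    char stack[MAX_STACK][OPERAND_LEN];  /* compile-time operand stack */
    int sp = 0, temp = 0, lines = 0;

    for (int i = 0; i < n; i++) {
        const Insn *in = &code[i];
        switch (in->op) {
        case OP_LOAD:
        case OP_CONST:
            if (sp == MAX_STACK) return -1;
            if (in->op == OP_LOAD)
                snprintf(stack[sp], OPERAND_LEN, "%s", in->name);
            else
                snprintf(stack[sp], OPERAND_LEN, "%d", in->value);
            sp++;
            break;
        case OP_ADD:
        case OP_MUL: {
            char lhs[OPERAND_LEN], rhs[OPERAND_LEN];
            if (sp < 2 || lines == max_out) return -1;
            strcpy(rhs, stack[--sp]);    /* top of stack is the right operand */
            strcpy(lhs, stack[--sp]);
            snprintf(stack[sp], OPERAND_LEN, "t%d", temp++);  /* fresh temp */
            snprintf(out[lines++], LINE_LEN, "%s = %s %c %s",
                     stack[sp], lhs, in->op == OP_ADD ? '+' : '*', rhs);
            sp++;                        /* the temp replaces both operands */
            break;
        }
        case OP_STORE:
            if (sp < 1 || lines == max_out) return -1;
            snprintf(out[lines++], LINE_LEN, "%s = %s", in->name, stack[--sp]);
            break;
        }
    }
    return lines;
}
```

For `x = a * b + c`, the sequence LOAD a, LOAD b, MUL, LOAD c, ADD, STORE x becomes `t0 = a * b`, `t1 = t0 + c`, `x = t1`.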
Can note that it does support ASM, but the handling is generally that
any ASM code is preprocessed and then passed through the IL stage as
string blobs (then assembled in the backend stage).
Note, while it is possible to go more directly from a stack IL to native
code (without going through 3AC/SSA), the generated code is garbage.
Also, while it is possible to have a compiler that uses SSA as an
on-disk IR format (like Clang), IMO this creates a lot of pain and
exposes too much of the backend machinery (it would be very much a pain
to use LLVM bitcode in anything other than LLVM).
So, seemingly, a stack-oriented bytecode is the "least pain" option.
Well... Unless they do it like WASM and find other creative ways to
screw it up...
Can note that in the case of a language like C#, the visibility of types
and similar comes through the use of namespaces (which partly take on a
similar role to headers in C or C++, or packages in Java or ActionScript).
Where, say:
namespace foo { using bar.baz; } //C# style
namespace foo { using namespace bar::baz; } //C++ style
package foo { import bar.baz; } //ActionScript style
Though:
import bar.baz.*; //Java
But, Java differs here in that the code structure (and packages) are
directly tied to organization of files in the filesystem (typically with
one class per file).
By contrast, .NET and C# used "assemblies" as the organizing principle; or,
generally, everything that is being compiled together to become a given
EXE or DLL is lumped into a single unit.
Though, one option could be to organize code instead by namespace, with
the toplevel tied to each location in the search path.
Though, with such a compiler, rather than specifying a list of
individual source files, one might specify directories and the compiler
figures things out on its own (basically compiling everything in a given
directory).
One way to handle things like static libraries would be to build a blob
of intermediate bytecode (and/or native-code COFF or ELF objects) along
with a manifest database. The bytecode blob would contain all of the IR
for the library (or machine-code if native), and the manifest would only
contain declarations (preferably in a semi-compact form that is
reasonably efficient to search). The manifests could then partly be used
for knowing about declarations, and also for which objects or libraries
to pull into the program being compiled (rather than giving them
individually on the command-line).
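One way such a manifest could be made "reasonably efficient to search", sketched in C under the assumption that it is simply an array of (symbol, provider) pairs sorted by symbol and searched with the standard `bsearch`. The entry layout and names are hypothetical, not a description of any existing compiler's format.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical manifest entry: a declared symbol and the object or
   library that provides it. The array must be sorted by symbol. */
typedef struct {
    const char *symbol;
    const char *provider;
} ManifestEntry;

/* bsearch passes the key first and the array element second. */
static int cmp_entry(const void *key, const void *elem)
{
    const char *name = key;
    const ManifestEntry *e = elem;
    return strcmp(name, e->symbol);
}

/* O(log n) lookup; returns the provider, or NULL if the symbol is unknown. */
static const char *manifest_lookup(const ManifestEntry *m, size_t n,
                                   const char *symbol)
{
    const ManifestEntry *e = bsearch(symbol, m, n, sizeof *m, cmp_entry);
    return e ? e->provider : NULL;
}
```

A linker-side driver could then call `manifest_lookup` for each unresolved symbol to decide which objects or libraries to pull in.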
This approach would differ from .NET which embeds all of the metadata
into the object-files and distributable binaries. But, here I am
assuming that the final binary is a bare native EXE or DLL image here;
meaning that any manifest data for a DLL would need to be handled more
like an "import library".
In .NET, generally the EXE or DLL was merely being used as an external
packaging scheme for holding the VM's IR image (typically with no actual
machine code in the EXE/DLL; or, for EXEs, merely a stub to try to launch
the .NET runtime).
Can note that JVM uses JAR files that repurpose the ZIP format, but ZIP
is a high-overhead format when used in this way. For my own uses, I
typically used a variant of the WAD2 format, or a custom format I called
WAD4, which also has lower overheads if compared with ZIP.
Can note:
WAD2: Originated with Quake
Typically has 16-byte names, no directories.
I have a variant that adds directories,
but names drop to 14 chars if non-root.
Encodes directories by each entry encoding its parent directory.
Names are typically stored without file extensions.
WAD4: Custom, used in my project some for data packaging and small VFS
Similar to WAD2, but with directory trees;
Name size is expanded to 32 bytes;
Also had some amount of Unix style file metadata.
Mostly for when used as a VFS.
Namely: UID/GID/Mode.
By contrast, an actual filesystem typically has a more complex
structure. But, a format like WAD2 or WAD4 can keep overhead low (and
both are more versatile than the IWAD/PWAD format used in Doom, which
only stored a flat list of 8-byte names).
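For concreteness, the on-disk directory layouts being compared, following the released Quake (`wad.h`) and Doom (`w_wad.h`) sources, with `int32_t` substituted for the original `int`. This sketch does not model the directory-tree variant or WAD4 described above; the `wad2_find` helper is a hypothetical addition that assumes a little-endian host and a directory already read into memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* WAD2 (Quake): header, then a directory of 32-byte entries. */
typedef struct {
    char    identification[4];  /* "WAD2" */
    int32_t numlumps;           /* number of directory entries */
    int32_t infotableofs;       /* file offset of the directory */
} wadinfo_t;

typedef struct {
    int32_t filepos;            /* offset of the lump data */
    int32_t disksize;           /* size as stored */
    int32_t size;               /* uncompressed size */
    char    type;
    char    compression;
    char    pad1, pad2;
    char    name[16];           /* NUL-padded, no directories */
} lumpinfo_t;

/* Doom IWAD/PWAD: a flat list of 16-byte entries with 8-char names. */
typedef struct {
    int32_t filepos;
    int32_t size;
    char    name[8];            /* NUL-padded, not always terminated */
} filelump_t;

_Static_assert(sizeof(lumpinfo_t) == 32, "WAD2 entries are 32 bytes");
_Static_assert(sizeof(filelump_t) == 16, "Doom entries are 16 bytes");

/* Linear search of an in-memory WAD2 directory by lump name. */
static const lumpinfo_t *wad2_find(const lumpinfo_t *dir, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strncmp(dir[i].name, name, sizeof dir[i].name) == 0)
            return &dir[i];
    return NULL;
}
```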
Decided to leave out a bunch of stuff (for conciseness) and note that
the most likely option ATM could be to use a further modified form of
WAD2 for manifests, where:
If payload data is small enough, it may be stored inline in the
directory entry;
If the name is too large for the name field, it is stored externally
(similar to payload data).
Metadata would likely be structured in a way that is superficially
similar to the "Windows System Registry". Within my compiler, I had
already used a similar system, though in the past the metadata had been
expressed in a textual form (similar to the REG format used by
Windows, usually when installing stuff; itself derived from the INI
format). In this way, the WAD lumps are used more as key/value pairs or
data blobs rather than in a file-like way.
Can note that my considered format is somewhat different from the "hive"
format used by Windows, but that format would be needlessly bulky if
used for compiler metadata (and more suited for HDD based access, not so
much for blobs to be read into RAM buffers).
All this stuff could likely differ from one compiler implementation to
another though (similar to how compilers may differ as to which format
they use for object files and static libraries, and mostly no one
notices or cares).
...
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2025-10-10 23:45 +0000 |
| Message-ID | <20251010162821.357@kylheku.com> |
| In reply to | #394527 |
On 2025-10-10, David Brown <david.brown@hesbynett.no> wrote:
> On 10/10/2025 08:27, BGB wrote:
>> On 10/9/2025 10:59 PM, Keith Thompson wrote:
>>> bart <bc@freeuk.com> writes:
>
>>>
>>>>> One merit is if code can be copy-pasted, but if one has to change
>>>>> all instances of:
>>>>> char *s0, *s1;
>>>>> To:
>>>>> char* s0, s1;
>>>>> Well, this is likely to get old, unless it still uses, or allows C
>>>>> style declaration syntax in this case.
>>>>
>>>> That one's been fixed (50 years late): you instead write:
>>>>
>>>> typeof(char*) s0, s1;
>>>>
>>>> But you will need an extension if it's not part of C23.
>>>
>>> Yes, that will work in C23, but it would never occur to me to
>>> write that. I'd just write `char *s0, *s1;` or, far more likely,
>>> define s0 and s1 on separate lines. Using typeof that way triggers
>>> my WTF filter.
>>>
>>
>> Agreed.
>>
>>
>>
>> I think it can be contrast with C# style syntax (with "unsafe") where
>> one would write:
>> char* s0, s1;
>
> Does C# treat s1 as "char*" in this case? That sounds like an
> extraordinarily bad design decision - having a syntax that is very like
> the dominant C syntax yet subtly different.
The detailed properties of C syntax do not have that much mind share
in the kind of development done in C# and its ilk.
Only a minority of developers within a minority moving between
C and C# would suffer from confusion.
>
> Issues like this have been "solved" for decades - in the sense that
> people who care about their code don't make mistakes from mixups of
> "char" and "char*" declarations. There are a dozen different ways to be
> sure it is not an issue. Simplest of all is a style rule - never
> declare identifiers of different types in the same declaration. I'd
> have preferred that to be a rule baked into the language from the start,
> but we all have things we dislike about the C syntax.
But the C syntax lets us factor out a common, complex part of the type
between two declared entities into the stem, so that we then highlight
what is different between them, without using a typedef alias. And the
fact that they are in the same declaration shows they are related.
struct foo {
/* lotsa members */
} x[42], *px = x, *px_end = x + 42;
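A compilable version of that declaration, with placeholder members (`id`, `weight`) and a made-up helper. Note that `px_end` needs its own `*`: written as `px_end = x + 42` it would declare a `struct foo` object initialized from a pointer, which compilers reject, so it is the very declarator-binding trap discussed upthread.

```c
#include <assert.h>
#include <stddef.h>

/* One declaration shares the struct type among an array and two
   pointers; only the declarators say what differs. */
struct foo {
    int id;
    double weight;
} x[42], *px = x, *px_end = x + 42;

/* Walks the array through the two pointers declared alongside it. */
static size_t count_foo(void)
{
    size_t n = 0;
    for (struct foo *p = px; p != px_end; p++)
        n++;
    return n;
}
```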
I /think/ that Java goes further in that you can factor out array
derivation into the stem:
int[3] a[4], b; // don't quote me on it
But something occurs to me. typedef shouldn't be a storage class;
that is silly. typedef should be something you can derive in a
declarator. Then you could do this:
struct {
/* lotsa members */
} typedef(foo), x[42], *px = x, *px_end = x + 42;
How about a two-argument variant of typedef for use in
any part of a declarator:
int typedef(typedef(*, ptr_t)foo[42], array_t);
This is just
int *foo[42];
in which the pointer to int is typedefed as ptr_t, the array of 42
of those as array_t, and foo is declared as an object of that
array type.
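In today's C, the same intermediate types have to be named with separate typedefs before the object is declared. A minimal C11 sketch checking that `array_t foo;` really has the type of `int *foo[42];`; the names follow the post, and `foo_plain` is added only for comparison.

```c
#include <assert.h>

/* Standard C: name each derived type with its own typedef. */
typedef int *ptr_t;          /* pointer to int */
typedef ptr_t array_t[42];   /* array of 42 pointers to int */

array_t foo;                 /* same type as: int *foo[42]; */
int *foo_plain[42];

/* &foo must be a pointer to an array of 42 pointers to int. */
_Static_assert(_Generic(&foo, int *(*)[42]: 1, default: 0),
               "array_t is int *[42]");
_Static_assert(sizeof foo == sizeof foo_plain, "same size");
```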
Maybe :typedef syntax could be better.
struct {
/* lotsa members */
} foo : typedef, x[42], *px = x, *px_end = x + 42;
and
int *:typedef(ptr_t) foo[42]:typedef(array_t);
Same thing: when the pointer is derived via *, the :typedef(name) syntax
takes a snapshot of that type and stores it into the scope under that
typedef name. Same thing with :typedef(arg) after the array declarator.
:)
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2026-01-06 18:24 -0800 |
| Message-ID | <86h5syqhwk.fsf@linuxsc.com> |
| In reply to | #394524 |
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> bart <bc@freeuk.com> writes:
>
>> On 09/10/2025 04:49, BGB wrote:
>
> [...]
>
>> Nobody cares about C syntax.
>
> That is so manifestly untrue that I can't imagine what you actually
> meant.
>
> Many of us, myself included, don't particularly like some aspects of
> C syntax, but that's not the same as not caring about it.
>
>> Learning all its ins and outs seems be
>> a rite of passage.
>
> Perhaps. It's also necessary if you want to work with the language.
>
>> The trouble is that C-style is so dominant, few people would know
>> what a decent syntax looks like. Or, more, likely, they associate
>> a clean, well-designed syntax with toy or scripting languages, and
>> can't take it seriously.
>>
>> But if it looks as hairy as C++ then it must be the business!
>
> C syntax has survived and been propagated to other languages because
> it's well known, not, I think, because anybody really likes it.

I like C syntax, most of it anyway. Even the parts I don't
especially like aren't horrendous, which is more than I can say
for some other popular languages.
| From | Janis Papanagnou <janis_papanagnou+ng@hotmail.com> |
|---|---|
| Date | 2025-10-14 06:29 +0200 |
| Message-ID | <10ckjjf$2mg8q$1@dont-email.me> |
| In reply to | #394517 |
(Sorry for the delayed reply; your ~450-line post was too long for me
to consider a timely reply.)

On 09.10.2025 05:49, BGB wrote:
> On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
>> On 08.10.2025 19:29, BGB wrote:
>>> On 10/8/2025 8:59 AM, Janis Papanagnou wrote:
>
> Throughout much of my life, C++ has been around, but using it has often
> turned into a footgun. Early on the code had a bad habit of breaking
> from one compiler version to another, or the ability to compile C++ code
> in general would be broken (primarily with Cygwin and MinGW; where
> whether or not "g++" worked on a given install attempt, or with a given
> program, was very hit or miss).

I used it early on, on various Unix platforms; all had some details
different - like the way templates worked in the development
environment - but nothing was really an issue; as with current
configuration settings, this was covered and handled by the build
system. It doesn't astonish me in the least if you've faced specific
problems on the Windows platforms.

> [...]
>
> In most cases, it left C as a more preferable option.
> C can be made to do the same stuff at similar performance, with often
> only minimal difference in expressive power.

The problem is, IMO, rather that "C", in the first place, doesn't
compare to C++ in its level of "expressive power".

>
> And, the main "powerful" tool of C++, templates,

(IMO, the main powerful tools were primarily classes, polymorphism,
and also [real] references.)

> tending to do bad
> things to build times and result in excessive code bloat.

I recall that initially we had issues with code bloat, but I don't
recall it being a real problem; we handled that (but, after such a
long time, don't ask me how).

>
> And, if one tries to avoid C++'s drawbacks, the result was mostly code
> that still looks mostly like C.

(That sounds as if you obviously haven't used OO designs, reference
parameters, overloading, and so on.)
>
> Though, similar was often a problem in my other language design
> attempts: The most efficient way to do things was often also the C way.

IME, *writing* software in "C" requires much more time than in C++;
presuming that is what you meant by "most efficient way to do things".
(Saving a few seconds in "C" compared to C++ programs can hardly be
relevant, I'd say; unless you were not really familiar with C++? Or you
have special application areas, as I read below in the post.)

> [...]
>
> Some amount of my stuff recently has involved various niche stuff.
> Interfacing with hardware;
> Motor controls;
> Implementing things like an OpenGL back-end or similar;
> Being used for a Boot ROM and OS kernel;
> Sometimes neural nets.

(Nice. - I've done Neural Net simulations with C++ back in those days.)

>
> Few traditional languages other than C work well at a lot of this.
>
>
> A usual argued weakness of C is that it requires manual memory
> management. But, OTOH, you *really* don't want a GC in motor controls or
> an OS kernel or similar.
>
> Like, if the GC triggers, and an interrupt handler happens at a bad
> time, then you have a problem.
>
> Or, if you have a 1us timing tolerance for motor controls and this gets
> blown because the GC takes 75ms, etc...

Sure, you should know where to use static memory, dynamic memory
management organized yourself, or "I-don't-want-to-care" GC
management, or a sensible, deliberate mixture of these (if the language
allows). (I've never used GC with C++; is that possible nowadays?)

> [...]
>
> Maybe C will be around indefinitely for all I know.

Not unlikely.

>
> Like, the passage of time still hasn't totally eliminated FORTRAN and
> COBOL.

There's obviously some demand. *shrug* - I don't care much. - My last
"contact" with FORTRAN was when one of my children was asked to handle
some legacy library code; my suggestion was to get rid of that task.

> And, C is far more commonly used than either.
>
> Unless maybe something can come along that is a better C than C...

There are so many languages nowadays - frankly, there were already a
lot back then, four decades ago! - so I don't think the proliferation
will stop; I don't think that evolution is a good thing. It seems that
often the inventors have their own agenda, and the success of languages
depends mainly on the marketing efforts and the number of fans that got
triggered by newly invented buzzwords, and a newly invented terminology
[for already existing old concepts]!

> [...]
>
>> I certainly agree to what a "clean language" can be.
>>
>> My opinion on that is, though, that the "C" base of C++ is part of
>> the problem. Which doesn't let it appear to me "C" to be "better"
>> than C++, but that the "C" base is part of C++'s problem. (Here
>> I'm not speaking about "C++"'s own problems that probably entered
>> about with C++0x/C++11, IMO. - Mileages certainly vary.)
>>
>
> Possibly.
>
>
> A new C-like language need not necessarily be strictly C based.

(There are a couple of things I like in "C". But if I had to invent a
language it would certainly not be "C-like". I'd take a higher-level
[better designed] language as a paragon and support the "C" features I
like, if not already present in that language.)

>
> My thinking would be likely keeping a similar basic syntax though,
> though likely more syntactically similar to C#,

(But the syntax is one of the problems of C and its descendants, IMO. -
Part of what was described in existing "C-like" languages is either the
less-desired elements or deviations, but the latter will probably just
add to confusion if details are subtle. It's already bad enough with
subtle differences between different "C" standards, it seems.)

> but retaining more in
> terms of implementation with C and C++.

(But weren't exactly these languages already [partly] invented with
such an agenda?)

>
> Would likely simplify or eliminate some infrequently used features in C.
>
> Possibly:
> Preprocessor, still exists, but its role is reduced.
> Its role can be partly replaced by compiler metadata.
> Trigraphs and digraphs: Gone;
> K&R style declarations, also gone;
> Parser should not depend on previous declarations;
> Non trivial types and declarator syntax: Eliminate;
> ...

Sounds all reasonable to me.

>
> Possibly:
> Pointers and arrays can be specified on the type rather than declarator
> (so, more like C# here)

(Yeah, but mind the comments on effects of "subtle differences".)

> [...]
>
> Though, the harder problem here isn't necessarily that of designing or
> implementing it, but more in how to make its use preferable to jus
> staying with C.

Well, as formulated, that's an individual thing. Meanwhile I have the
freedom to use what I like in my recreational activities, but if we
consider professional projects there's conditions and requirements to
take into account.

>
> One merit is if code can be copy-pasted, but if one has to change all
> instances of:
> char *s0, *s1;
> To:
> char* s0, s1;

Such changes would be annoying. (And I say that with a strong aversion
of C's declaration syntax.) - For me, "C" is not a good base; neither
to keep its bad syntax nor to have to change it alike in subtle ways.

My style is anyway another; [mostly] separate declarations, and those
initialized, as in

  char * s0 = some_alloc (...);
  char * s1 = 0;

More important is that such declarations may appear anywhere not just
at the begin of a block. (I'm still traumatized by K&R, I suppose.)

> [...]
>
> Java and C# had made 'char' 16-bit, but I now suspect this may have been
> a mistake. It may be preferable instead keep 'char' as 8 bits and make
> UTF-8 the default string format. In the vast majority of cases, strings
> hold primarily or entirely ASCII characters.

I think we should be careful here! A Unicode "character" may require
even 32 bits, but UTF-8 is just an "encoding" (in units of an octet).
If we want a sensible type system defined we should be aware of that
difference. The question is; what shall be expressed by a 'char' type;
the semantic entity or the transfer syntax. (This question is similar
to the Unix file system, also based on octets; that made it possible
to represent any international multi-octet characters. There's some
layer necessary to get from the "transfer-syntax" (the encoding) to
the representation.) - What will, say, a "C" user expect from 'char';
just move it around or represent it on some output (or input) medium.

>
> Also, can probably have a string type:
> string str="Some String";
> But, then allow that string is freely cast to "char*", ...

(Wasn't that so in C++? - And in addition there's the corresponding
template classes, IIRC. - But I don't recall all the gory details.)

> Well, and that the underlying representation of a string is still as a
> pointer into a string-table or similar.

> Also the design of the standard library should remain conservative and
> not add piles of needless wrappers or cruft.

Not sure what you have in mind here.

Personally, despite some resentment on some of the complex syntax
and constructs necessary, I liked the C++ STL; its orthogonality
and concepts in principle. (And especially if compared to some
other languages' ad hoc "tool-chest" libraries I stumbled across.)

>>> [...]
>>>
>>> Like, one can throw out the whole mess that is dealing with
>>> Multiple-Inheritance
>>
>> Well, when I started with C++ there wasn't multiple-inheritance
>> available. Personally thinking its omission would be a mistake;
>> I missed it back these day.
>>
>> I'm not sure what "mess" you have in mind. - Explicit qualification
>> isn't a hindrance. Weakening the independence of classes in complex
>> multi-level class-topologies is something under control of the
>> program designer. - So it's fine to have it with all design options
>> it opens.
>
> There is both implementation complexity of MI, and also some added
> complexity with using it. The complexity gets messy.

(Okay, if that's what you took from it, I of course accept it.
But I'd have more expected that you might have dislike of some
STL parts than [multiple] inheritance.)

>
>
> The SI + Interfaces model can reduce both.

I've used classes with only "pure virtual" functions to achieve
the interface abstraction; since I could easily design what I
needed with standard features and practically no overhead I thus
wasn't missing the 'interface' feature.

(But of course I can see the implementation argument you make.)

> Granted, these can grow their own warts (like default methods or
> similar), but arguably still not as bad as MI.

(Well, I appreciated it to have that feature available in C++,
even though my first OO language, Simula, didn't support it, so
I was used to not having it when I got into C++ and liked it.)

>
> I am more thinking from the perspective of implementing a compiler.

Hah! Yeah. - Recently in another NG someone disliked a feature
because he had suffered from troubles implementing it. (It was
not MI but formatted I/O in that case.) - I'm not implementing
complex languages, so I guess I can feel lucky if someone else
did the language implementation job and I can just use it.

> [ implementation issues snipped and gracefully skipped ]

> [...]
> Virtual inheritance still means one can't just call the copy logic for
> each parent class when copying a derived class;

(I don't think I agree here. - Or are you still talking of the
implementers' challenges? - But never mind. Programming in C++
I could model everything I liked. That was really nice.)

Janis

> [...]
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2025-10-14 20:13 -0500 |
| Message-ID | <10cmsjg$3bdrj$1@dont-email.me> |
| In reply to | #394544 |
On 10/13/2025 11:29 PM, Janis Papanagnou wrote:
> (Sorry for the delayed reply; your ~450 lines post was too long for
> me to consider a timely reply.)
>
> On 09.10.2025 05:49, BGB wrote:
>> On 10/8/2025 2:04 PM, Janis Papanagnou wrote:
>>> On 08.10.2025 19:29, BGB wrote:
>>>> On 10/8/2025 8:59 AM, Janis Papanagnou wrote:
>
>>
>> Throughout much of my life, C++ has been around, but using it has often
>> turned into a footgun. Early on the code had a bad habit of breaking
>> from one compiler version to another, or the ability to compile C++ code
>> in general would be broken (primarily with Cygwin and MinGW; where
>> whether or not "g++" worked on a given install attempt, or with a given
>> program, was very hit or miss).
>
> I used it early on on various Unix platforms; all had some details
> different - like the way how templates worked in the development
> environment - but nothing was really an issue; as with current
> configuration settings this was covered and handled by the build
> system.
>
> It doesn't astonish me the least if you've faced specific problems
> on the Windows platforms.
>
It was pretty variable, but the usual thing was that trying to build any
kind of C++ code (even a trivial "Hello World") would, on some installs
of these compilers, simply die in a storm of error messages.
Well, and for a given Cygwin install attempt, whether or not "g++" would
work, etc, was a bit like playing roulette.
After switching to MSVC, things were a little more stable here.
But, by then there were other issues.
>> [...]
>>
>> In most cases, it left C as a more preferable option.
>> C can be made to do the same stuff at similar performance, with often
>> only minimal difference in expressive power.
>
> The problem is, IMO, rather that "C", in the first place, doesn't
> compare to C++ in its level of "expressive power".
>
?...
I have yet to find much that can be expressed in C++ but is not also
expressible in C.
The main things that are fundamentally different are things like
exceptions and RTTI, but even in C++ these don't come for free.
Though, if exceptions are implemented using an approach similar to VEH
in the Windows x64 ABI, the overhead is at least modest.
>>
>> And, the main "powerful" tool of C++, templates,
>
> (IMO, the main powerful tool was primarily classes, polymorphisms,
> also [real] references.)
>
These can be done in C via manually written vtables, and passing the
address of a variable.
>> tending to do bad
>> things to build times and result in excessive code bloat.
>
> I recall that initially we had issues with code bloat, but I don't
> recall that it would have been a problem; we handled that (but,
> after that long time, don't ask me how).
>
>>
>> And, if one tries to avoid C++'s drawbacks, the result was mostly code
>> that still looks mostly like C.
>
> (That sounds as if you haven't used OO designs, reference parameters,
> overloading, and so on, obviously.)
>
We can do OO, just using a different approach, say:
typedef struct FooObj_s FooObj;
typedef struct FooObj_vt_s FooObj_vt;
struct FooObj_vt_s {
    void *resv1;
    void *resv2;
    void *resv3;
    void *resv4;
    int (*Method1)(FooObj *self, int x, int y);
    int (*Method2)(FooObj *self, int x, int y, int z);
    /* ... */
};
struct FooObj_s {
    FooObj_vt *vt;
    int w;
    /* ... */
};
And, references as:
int someFunction(int *rvar);
...
someFunction(&somevar);
It all works, and doesn't require significantly more LOC than it would
have in C++.
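For illustration, here is a minimal runnable version of the vtable pattern above. It is a sketch, not BGB's actual code. The reserved slots are left out, and the two "classes" (Scaled_* and Offset_*) and the FooObj_Init/FooObj_Method1 helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct FooObj_s FooObj;
typedef struct FooObj_vt_s FooObj_vt;

/* Method table shared by all instances of one "class". */
struct FooObj_vt_s {
    int (*Method1)(FooObj *self, int x, int y);
    int (*Method2)(FooObj *self, int x, int y, int z);
};

/* Every object starts with a pointer to its class's method table. */
struct FooObj_s {
    const FooObj_vt *vt;
    int w;
};

/* Hypothetical class 1: multiplies the argument sum by w. */
static int Scaled_Method1(FooObj *self, int x, int y) { return self->w * (x + y); }
static int Scaled_Method2(FooObj *self, int x, int y, int z) { return self->w * (x + y + z); }
static const FooObj_vt Scaled_vt = { Scaled_Method1, Scaled_Method2 };

/* Hypothetical class 2: adds w to the argument sum. */
static int Offset_Method1(FooObj *self, int x, int y) { return self->w + x + y; }
static int Offset_Method2(FooObj *self, int x, int y, int z) { return self->w + x + y + z; }
static const FooObj_vt Offset_vt = { Offset_Method1, Offset_Method2 };

/* "Constructor": binds the object to a class by setting its vtable. */
static void FooObj_Init(FooObj *obj, const FooObj_vt *vt, int w)
{
    assert(obj != NULL && vt != NULL);
    obj->vt = vt;
    obj->w = w;
}

/* Virtual call: the caller needs only the vtable layout, not the concrete class. */
static int FooObj_Method1(FooObj *self, int x, int y)
{
    return self->vt->Method1(self, x, y);
}
```

Usage: after FooObj_Init(&obj, &Scaled_vt, 3), FooObj_Method1(&obj, 1, 2) returns 9. The same call on an object bound to Offset_vt with w = 3 returns 6.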
>>
>> Though, similar was often a problem in my other language design
>> attempts: The most efficient way to do things was often also the C way.
>
> IME, *writing* software in "C" requires much more time than in C++;
> presuming you meant that with "most efficient way to do things".
>
> (Saving a few seconds in "C" compared to C++ programs can hardly be
> relevant, I'd say; unless you were not really familiar with C++ ?
> Or have special application areas, as I read below in the post.)
>
The main limiting factor at present is that a non-trivial C++ compiler is
a much harder thing to write.
I could write C++ code, but then it isn't really portable beyond
running on my PC or similar.
Though, I have a mostly usable C compiler at least.
At least, usable for porting single programs.
Trying to port something like the Linux userland, not so much.
Too much stuff here is written to assume GCC.
Some simple programs worked with "./configure" scripts, by getting the
compiler to mimic GCC well enough that configure would try to use it as
a cross compiler; but then programs invariably break when they use
various GCC-isms, rely on glibc-specific stuff, or need other
Linux-specific headers and so on.
So, yeah, nowhere near the level needed to port "bash", "coreutils",
and similar.
But, I was able to experimentally port things like "Quake 3 Arena" and
similar, though Q3A is a little impractical on a 50MHz CPU; but Doom
runs well.
Granted, a new language would not really address any of the "make
existing software work" issues.
>> [...]
>>
>> Some amount of my stuff recently has involved various niche stuff.
>> Interfacing with hardware;
>> Motor controls;
>> Implementing things like an OpenGL back-end or similar;
>> Being used for a Boot ROM and OS kernel;
>> Sometimes neural nets.
>
> "Nice. - I've done Neural Net simulations with C++ back these days.)
>
I have experimented with some, but in this case mostly using a lot of SIMD.
I had noted that in some cases, like primarily SIMD-heavy NN code, my
50MHz FPGA soft-processor could compete surprisingly well with an early
2000s laptop.
But, then again, I also noted when benchmarking said laptop:
memcpy: ~450 MB/s;
x87 multiply-accumulate: ~60 MFLOP/s.
CPU speed: 1400 MHz, 32-bit x86.
Has MMX and similar, but was not using MMX.
Had noted the process is mostly bandwidth limited.
The SIMD unit on my soft-processor has a theoretical hard limit of 200
MFLOP/s, but if using compact formats (mostly FP8 for storage, FP16
internally) and careful pipelining, it can approach similar performance to
the laptop at this task.
I had experimented some with more compact encodings for weights, for
example:
FP8U A/B (E4.M4)
3-bit per value: S,FF (S=Sign)
A<=B: Interpolated: A, B, (5/8)A+(3/8)B, (5/8)B+(3/8)A
A> B: Similar, but '111' encodes 0, ...
With interpolation as bytes, results unpacked into vectors of 4x
Binary16 (with 4 weight vectors in 64 bits).
Could also be used as:
Monochrome HDR format with 16 texels per block;
A color HDR format with 4 texels per block.
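Here is a rough decoder sketch for the 64-bit block format described above. The post does not spell out the bit layout, so everything below is an assumption:

- A is in the low byte and B is in the next byte.
- Sixteen 3-bit codes follow, LSB-first. In each code, bit 2 is the sign and bits 0..1 are the palette index.
- FP8U has 4 exponent bits, 4 mantissa bits, and bias 7.
- The "A > B" mode is identical except that code 111 means zero.

Output is float rather than Binary16 so the sketch stays portable.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout: bits 0..7 = A, bits 8..15 = B, bits 16..63 = sixteen
   3-bit codes (code i at bit 16 + 3*i); bit 2 = sign, bits 0..1 = index. */

/* Assumed FP8U E4.M4: 4 exponent bits, 4 mantissa bits, bias 7, unsigned. */
static float fp8u_to_float(uint8_t v)
{
    int e = v >> 4, m = v & 15;
    float f = (e == 0) ? (float)m / 16.0f : 1.0f + (float)m / 16.0f;
    int x = (e == 0) ? 1 - 7 : e - 7;   /* e == 0 treated as subnormal */
    while (x > 0) { f *= 2.0f; x--; }   /* exact power-of-two scaling, no libm */
    while (x < 0) { f *= 0.5f; x++; }
    return f;
}

/* Palette: A, B and two in-between values, interpolated on the raw FP8 bytes. */
static void fp8blk_palette(uint8_t a, uint8_t b, uint8_t pal[4])
{
    pal[0] = a;
    pal[1] = b;
    pal[2] = (uint8_t)((5 * a + 3 * b) / 8);
    pal[3] = (uint8_t)((5 * b + 3 * a) / 8);
}

/* Decode one 64-bit block into 16 values (float here; the post unpacks to Binary16). */
static void fp8blk_decode(uint64_t blk, float out[16])
{
    uint8_t a = (uint8_t)(blk & 0xFF), b = (uint8_t)((blk >> 8) & 0xFF);
    uint8_t pal[4];
    assert(out != 0);
    fp8blk_palette(a, b, pal);
    for (int i = 0; i < 16; i++) {
        unsigned code = (unsigned)(blk >> (16 + 3 * i)) & 7u;
        if (a > b && code == 7u) {       /* assumed A > B mode: '111' encodes zero */
            out[i] = 0.0f;
            continue;
        }
        float mag = fp8u_to_float(pal[code & 3u]);
        out[i] = (code & 4u) ? -mag : mag;
    }
}
```

Under these assumptions, A = 0x70 (1.0) and B = 0x80 (2.0) give the palette 1.0, 2.0, 1.375, 1.625.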
In this case, there is a big gap in main RAM bandwidth, which seems to
be a big issue with this task. But, the RAM bandwidth gap is reduced by
using more compact storage for values (with special-purpose
instructions, like a Load that also performs the 4xFP8 to 4xFP16
conversion, ...).
But, yeah, getting "even anywhere close" is kinda notable given the
laptop has a 28x clock-speed advantage.
Though, the laptop is far more powerful at running Quake and similar (no
contest regarding Quake performance).
The FPGA soft-processor could maybe compete better if it could do
Binary16 SIMD multiply-accumulate operations, but the latency would be
too high and I couldn't pipeline it.
Getting this result requires writing ASM and manually scheduling the
pipeline though.
>>
>> Few traditional languages other than C work well at a lot of this.
>>
>>
>> A usual argued weakness of C is that it requires manual memory
>> management. But, OTOH, you *really* don't want a GC in motor controls or
>> an OS kernel or similar.
>>
>> Like, if the GC triggers, and an interrupt handler happens at a bad
>> time, then you have a problem.
>>
>> Or, if you have a 1us timing tolerance for motor controls and this gets
>> blown because the GC takes 75ms, etc...
>
> Sure, you should know where to use static memory, dynamic management
> organized yourself, or "I-don't-want-to-care" and use GC management,
> or a sensible deliberate mixture of that (if the language allows).
>
> (I've never used GC with C++; is that meanwhile possible?)
>
It is possible to use conservative mark/sweep collectors in C.
But performance leaves something to be desired.
Younger me tried to do this, but even for things like 3D engines, I
ended up trying more to find ways to avoid needing to run the GC.
>> [...]
>>
>> Maybe C will be around indefinitely for all I know.
>
> Not unlikely.
>
>>
>> Like, the passage of time still hasn't totally eliminated FORTRAN and
>> COBOL.
>
> There's obviously some demand. *shrug* - I don't care much. - My last
> "contact" with FORTRAN was when one of my children was asked to handle
> some legacy library code; my suggestion was to get rid of that task.
>
In my case, I don't have any descendants.
Apparently FORTRAN and COBOL still exist in some places, mostly as
languages that no one uses.
Seemingly a lot of businesses made a migration from COBOL to Java.
>> And, C is far more commonly used than either.
>>
>> Unless maybe something can come along that is a better C than C...
>
> There's so many languages meanwhile - frankly, there were already a
> lot back then, four decades ago! - so I don't think the proliferation
> will stop; I don't think that evolution is a good thing. It seems that
> often the inventors have their own agenda and the success of languages
> depends mainly on the marketing efforts and the number of fan-people
> that got triggered by newly invented buzzwords, and an own invented
> terminology [for already existing old concepts]!
>
Apparently the languages people are trying to push as C replacements are
mostly Rust, Zig, and Go.
None of these particularly compel me though.
They seem more like needless deviations from C than a true successor.
I guess the older generations mostly had Pascal and Ada.
There was ALGOL, but both C and Pascal descended from ALGOL.
>> [...]
>>
>>> I certainly agree to what a "clean language" can be.
>>>
>>> My opinion on that is, though, that the "C" base of C++ is part of
>>> the problem. Which doesn't let it appear to me "C" to be "better"
>>> than C++, but that the "C" base is part of C++'s problem. (Here
>>> I'm not speaking about "C++"'s own problems that probably entered
>>> about with C++0x/C++11, IMO. - Mileages certainly vary.)
>>>
>>
>> Possibly.
>>
>>
>> A new C-like language need not necessarily be strictly C based.
>
> (There's a couple things I like in "C". But if I'd have to invent a
> language it would certainly not be "C-like". I'd took a higher-level
> [better designed] language as paragon and support the "C" features I
> like, if not already present in that language.)
>
I would think some major goals might be:
Allowing for a compiler with a smaller code footprint.
Though, the backend is often a big source of pain here.
Language is reasonably clean and orthogonal;
Is amenable to efficient code generation;
Low requirements for implementation overhead.
Should aim for similar hard constraints to C.
Should still be usable for bare-metal and firmware.
And for OS kernel programming.
Sadly, cleaning up the frontend language won't do as much to simplify the
backend.
Cleaning up the backend mostly means needing to limit complexity in
areas that affect code generation:
Corner cases in data representation and the type-system;
Corner cases in the native ABI;
...
My preference is to keep a C family syntax, sorta like C# or GLSL.
As noted elsewhere, my thinking is partly that the pipeline looks like:
Preprocessor (basic or optional, C like)
Parser (Context-independent, generates ASTs)
Front end compiler: Compiles ASTs to a stack IL.
With front-end semi-type-aware.
The core language should only require the frontend to understand primitive
types (like in C# and Java; with complex types offloaded to the backend).
Would aim to eliminate headers, mostly because headers add considerable
bulk to the ASTs (often far more time and memory is spent dealing with
header stuff than with the actual code in the translation units).
Backend:
IL -> 3AC/SSA;
Does code generation and similar.
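One common way to do the IL -> 3AC step is to simulate the operand stack at compile time, handing out a fresh virtual register for each push. The following is a minimal sketch for straight-line code only. The opcodes (IL_CONST, IL_LOAD, ...) and the text output are hypothetical, not BGBCC's IL, and a real translator must also reconcile stack states at branch targets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stack-IL opcodes (not BGBCC's actual IL). */
typedef enum { IL_CONST, IL_LOAD, IL_ADD, IL_MUL, IL_STORE } IlOp;
typedef struct { IlOp op; int arg; } IlInsn;

/* Translate straight-line stack IL into 3AC text, one instruction per line.
   The operand stack is simulated at compile time: each push creates a fresh
   virtual register tN, and each pop hands that register to the consumer.
   Returns 0 on success, -1 on underflow, leftover values, or a full buffer. */
static int il_to_tac(const IlInsn *code, int n, char *out, size_t cap)
{
    int stack[64], sp = 0, next = 0;
    size_t len = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        char line[64];
        int a, b, t;
        switch (code[i].op) {
        case IL_CONST:
            assert(sp < 64);
            t = next++;
            snprintf(line, sizeof line, "t%d = %d\n", t, code[i].arg);
            stack[sp++] = t;
            break;
        case IL_LOAD:
            assert(sp < 64);
            t = next++;
            snprintf(line, sizeof line, "t%d = v%d\n", t, code[i].arg);
            stack[sp++] = t;
            break;
        case IL_ADD:
        case IL_MUL:
            if (sp < 2) return -1;
            b = stack[--sp];
            a = stack[--sp];
            t = next++;
            snprintf(line, sizeof line, "t%d = t%d %c t%d\n",
                     t, a, code[i].op == IL_ADD ? '+' : '*', b);
            stack[sp++] = t;
            break;
        case IL_STORE:
            if (sp < 1) return -1;
            t = stack[--sp];
            snprintf(line, sizeof line, "v%d = t%d\n", code[i].arg, t);
            break;
        default:
            return -1;
        }
        size_t l = strlen(line);
        if (len + l >= cap) return -1;      /* keep room for the final NUL */
        memcpy(out + len, line, l + 1);
        len += l;
    }
    return sp == 0 ? 0 : -1;
}
```

Consider the IL sequence LOAD v0, LOAD v1, MUL, CONST 3, ADD, STORE v2. It emits `t0 = v0`, `t1 = v1`, `t2 = t0 * t1`, `t3 = 3`, `t4 = t2 + t3`, `v2 = t4`. SSA construction and register allocation would then work on that form.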
Likely, most packaging (for IL object files and static libraries) would
be based around a variation of the WAD format (probably WAD2 based;
though in simple cases the Doom IWAD/PWAD format works well).
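For reference, the classic Doom IWAD/PWAD container mentioned above is simple enough to read in a few lines. It has a 12-byte header ("IWAD"/"PWAD", lump count, directory offset) and 16-byte directory entries (offset, size, 8-byte name), all little-endian. The sketch below looks up a lump in an in-memory image. wad_find and rd32le are hypothetical helpers, and the WAD2 variant or any modified WAD layout would differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Look up a lump by name in an in-memory Doom-style WAD image.
   Header (12 bytes): "IWAD" or "PWAD", int32 lump count, int32 directory offset.
   Directory entry (16 bytes): int32 data offset, int32 size, 8-byte name
   padded with NULs. Returns the lump data and its size, or NULL. */
static const unsigned char *wad_find(const unsigned char *wad, size_t wadlen,
                                     const char *name, uint32_t *size)
{
    char key[8] = {0};
    assert(wad != NULL && name != NULL && size != NULL);
    if (wadlen < 12 || (memcmp(wad, "IWAD", 4) != 0 && memcmp(wad, "PWAD", 4) != 0))
        return NULL;
    uint32_t count = rd32le(wad + 4), dir = rd32le(wad + 8);
    if (dir > wadlen || count > (wadlen - dir) / 16)
        return NULL;                              /* directory out of range */
    for (int k = 0; k < 8 && name[k] != '\0'; k++)
        key[k] = name[k];                         /* exact, case-sensitive match */
    for (uint32_t i = count; i-- > 0; ) {         /* scan backward, as Doom does */
        const unsigned char *e = wad + dir + 16u * i;
        if (memcmp(e + 8, key, 8) == 0) {
            uint32_t ofs = rd32le(e), len = rd32le(e + 4);
            if (ofs > wadlen || len > wadlen - ofs)
                return NULL;                      /* lump out of range */
            *size = len;
            return wad + ofs;
        }
    }
    return NULL;
}
```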
One of my past (stalled) attempts at doing a smaller C compiler had been
using a modified WAD in place of the COFF format, though in this case it
is debatable how much was really saved by using WAD in place of COFF
here (and some of the tables were partly derived from ELF as well).
There are pros/cons between modified WAD and RIFF-style TLV formats:
RIFF is more traditional;
But, WAD sometimes fits use-patterns better, and can be adapted to
different contexts.
One option is to add an additional magic number to encode the use-case
of the header.
It would be nice if I could manage to fit a full-featured compiler in
under 40k lines.
>>
>> My thinking would be likely keeping a similar basic syntax though,
>> though likely more syntactically similar to C#,
>
> (But the syntax is one of C's and descendants' problem, IMO. - Part
> of what was described in existing "C-like" languages is either the
> less-desired elements or deviations, but the latter will probably
> just add to confusion if details are subtle. It's already bad enough
> with subtle differences between different "C" standards it seems.)
>
Some simplification is possible, particularly regarding things like
declarations, without drastically changing the look of the language.
So, the language may still look like C, but be a little easier to parse.
Keeping the general syntax intact helps with familiarity and ease of
writing code for those who already know similar languages. Though, yes,
looking mostly similar to C, but not exactly, could annoy some people.
Though, mostly, the syntax could follow similar patterns to C# and Java.
Would differ from Java mostly in avoiding needless verbosity, and in
allowing a more free-form program structure.
It is also possible to allow for a subset of code that is valid in both
languages.
>> but retaining more in
>> terms of implementation with C and C++.
>
> (But weren't exactly these languages already [partly] invented with
> such an agenda?)
>
?...
I am imagining something that basically does similar stuff to what C
already does, and can ideally be used in a similar context.
The main downside is that C and C++ are more complicated than ideal in
many areas. This has a detrimental effect on compilers.
Not so much intending to make a language that tries to be more intuitive
or hand-holding though. However, if it is possible to make provisions
for things like static analysis or bounds-checked arrays (in a way that
ideally doesn't adversely affect performance), this can be nice.
In some cases, one can try to pass some compile-time metadata through the
type-system, but this has the downside of adding complexity for the
compiler. Though, it could be allowed in cases where it is mostly relevant
for static analysis, without actually adding new requirements for the
implementation.
Say, for example:
"int*" and "int[]" are equivalent for the ABI and for a minimal
implementation, but a more advanced implementation is allowed to
constrain the allowed semantics for "int[]" in ways that would not
necessarily be valid for "int*".
Well, and you could "have your cake and eat it too", say, by having
"int[]" and friends allow for very aggressive TBAA (type-based alias
analysis), while "int*" is assumed to alias readily.
Also, "int[]" can be assumed to potentially convey implicit array
bounds, whereas "int*" can be assumed to not convey array bounds (even
if the compiler represents both, at the machine level, as a bare pointer
to the first element of the array).
So, for example:
Foo obj1;
Bar obj2;
Where obj1 and obj2 may only alias in the case of a subclass/superclass
relationship (and, if they cannot potentially be the same class instance,
it can be assumed that no alias is possible).
Yeah, I am aware there is the "provenance" thing, but personally I fail
to understand how exactly the provenance model is supposed to work (so
it makes more sense to me to operate within the limits of more
conventional aliasing semantics; and defining rules for when it is safe,
and when it is not safe, to assume non-alias based on types or similar).
Well, and personally I feel "assume TBAA may wreck your day; so just use
memcpy()" to be a crappy solution. I also sympathize with the desire to
not ask people to put "restrict" or similar all over the place (and to
be able to optimize stuff by assuming that non-aliasing things don't alias).
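For comparison, this is what the "just use memcpy()" idiom looks like in standard C. The sketch assumes float is IEEE 754 binary32, as on common platforms. float_bits is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Well-defined way to read a float's bit pattern: copy the bytes.
   Compilers typically turn this fixed-size memcpy into a single move. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* assumes a 32-bit float */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* By contrast, this breaks C's effective-type (strict aliasing) rules:
   a float object is read through a uint32_t lvalue, which is undefined
   behavior that TBAA-based optimizers are allowed to exploit.
   uint32_t bad_bits(float f) { return *(uint32_t *)&f; }            */
```

On an IEEE 754 platform, float_bits(1.0f) is 0x3F800000.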
Realistically, asking the compiler to infer any value flow outside of a
single frame is asking too much. So my model would likely assume that
each function exists as its own island in terms of pointer aliasing.
Well, and a simpler model (used by BGBCC) where taking the address of
something, etc, effectively nukes the ability to assume non-alias.
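In standard C, the closest tool for stating a per-function non-alias assumption is restrict. The minimal sketch below uses hypothetical functions, add_twice and add_twice_restrict. It shows why a compiler has to assume plain int* parameters may alias, and what restrict lets it assume instead.

```c
#include <assert.h>

/* Plain pointers: the compiler must assume dst and src may refer to the
   same int, so *src has to be re-read after the first store through dst. */
static void add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;
}

/* restrict: the caller promises the two do not overlap, so the compiler may
   load *src once and keep it in a register. Calling this with dst == src
   is undefined behavior. */
static void add_twice_restrict(int *restrict dst, const int *restrict src)
{
    assert(dst != src);   /* the no-overlap contract, checked in debug builds */
    *dst += *src;
    *dst += *src;
}
```

Calling add_twice(&x, &x) with x == 1 leaves x == 4, because the second addition must see the updated value. A compiler that cached *src would produce 3, which is why it may only do so under restrict.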
>>
>> Would likely simplify or eliminate some infrequently used features in C.
>>
>> Possibly:
>> Preprocessor, still exists, but its role is reduced.
>> Its role can be partly replaced by compiler metadata.
>> Trigraphs and digraphs: Gone;
>> K&R style declarations, also gone;
>> Parser should not depend on previous declarations;
>> Non trivial types and declarator syntax: Eliminate;
>> ...
>
> Sounds all reasonable to me.
>
>>
>> Possibly:
>> Pointers and arrays can be specified on the type rather than declarator
>> (so, more like C# here)
>
> (Yeah, but mind the comments on effects of "subtle differences".)
>
Possible, though if there is a type mismatch here, most likely the
compiler will error-out.
>> [...]
>>
>> Though, the harder problem here isn't necessarily that of designing or
>> implementing it, but more in how to make its use preferable to jus
>> staying with C.
>
> Well, as formulated, that's an individual thing. Meanwhile I have the
> freedom to use what I like in my recreational activities, but if we
> consider professional projects there's conditions and requirements to
> take into account.
>
Probably.
I would like something to be a "good" alternative to C, while:
Allowing cheap/simple compiler;
Rules tuned to make static analysis less of a pain;
Doesn't overly hinder a memory-safe implementation;
But, also can be used for machine-level development;
Shouldn't be overly unfamiliar to those who know similar languages.
>>
>> One merit is if code can be copy-pasted, but if one has to change all
>> instances of:
>> char *s0, *s1;
>> To:
>> char* s0, s1;
>
> Such changes would be annoying. (And I say that with a strong aversion
> of C's declaration syntax.) - For me, "C" is not a good base; neither
> to keep its bad syntax nor to have to change it alike in subtle ways.
>
> My style is anyway another; [mostly] separate declarations, and those
> initialized, as in
>
> char * s0 = some_alloc (...);
> char * s1 = 0;
>
> More important is that such declarations may appear anywhere not just
> at the begin of a block. (I'm still traumatized by K&R, I suppose.)
>
Yeah.
I would assume declarations are allowed anywhere.
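The copy-paste hazard discussed above is real in C, because the '*' binds to each declarator rather than to the type. So char* s0, s1; is valid C but makes s1 a plain char. A small illustration follows. The variable names are arbitrary, and static_assert assumes C11 or later.

```c
#include <assert.h>   /* static_assert (C11) */

char *s0, *s1;        /* both s0 and s1 are char *               */
char* t0, t1;         /* t0 is char *, but t1 is a plain char     */

typedef char *cstr;
cstr u0, u1;          /* the typedef carries the '*', so both are char * */

static_assert(sizeof s1 == sizeof(char *), "s1 is a pointer");
static_assert(sizeof t1 == 1, "t1 is a plain char, not a pointer");
static_assert(sizeof u1 == sizeof(char *), "u1 is a pointer");
```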
>> [...]
>>
>> Java and C# had made 'char' 16-bit, but I now suspect this may have been
>> a mistake. It may be preferable instead keep 'char' as 8 bits and make
>> UTF-8 the default string format. In the vast majority of cases, strings
>> hold primarily or entirely ASCII characters.
>
> I think we should be careful here! An Unicode "character" may require
> even 32 bit, but UTF-8 is just an "encoding" (in units of an octet).
> If we want a sensible type system defined we should be aware of that
> difference. The question is; what shall be expressed by a 'char' type;
> the semantic entity or the transfer syntax. (This question is similar
> to the Unix file system, also based on octets; that made it possible
> to represent any international multi-octet characters. There's some
> layer necessary to get from the "transfer-syntax" (the encoding) to
> the representation.) - What will, say, a "C" user expect from 'char';
> just move it around or represent it on some output (or input) medium.
>
It is a tradeoff.
But, if "char*" can point to a string, then "char" needs to be the same
size as an item in memory (thus, probably a byte).
Otherwise, it would make sense to have "char" as an alias to "int" and
require "ubyte*" for use as strings. For consistency with C, it makes more
sense to assume char is a byte.
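To make the byte/code-point distinction concrete: with char as a byte, strlen() on a UTF-8 string counts bytes, while counting characters requires decoding. The minimal decoder sketch below uses hypothetical names (utf8_decode, utf8_count). It checks lead and continuation bytes but, for brevity, does not reject overlong forms or surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s; store the code point in *cp and return
   the byte count, or 0 if malformed. Stops at a NUL, since NUL is never
   a valid continuation byte. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6)
            | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
            | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    return 0;
}

/* Count code points in a NUL-terminated UTF-8 string; (size_t)-1 if malformed. */
static size_t utf8_count(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t n = 0;
    assert(str != NULL);
    while (*s) {
        uint32_t cp;
        size_t len = utf8_decode(s, &cp);
        if (len == 0) return (size_t)-1;
        s += len;
        n++;
    }
    return n;
}
```

For example, utf8_count("h\xC3\xA9llo") returns 5, while strlen() on the same string returns 6.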
>>
>> Also, can probably have a string type:
>> string str="Some String";
>> But, then allow that string is freely cast to "char*", ...
>
> (Wasn't that so in C++? - And in addition there's the corresponding
> template classes, IIRC. - But I don't recall all the gory details.)
>
C++'s string seemingly assumes some sort of object representation (rather
than something that can simply be cast to a pointer; one goes through c_str()).
I am more assuming that it is an implementation type which would be
represented as basically the equivalent of "const char *restrict".
But, with the compiler able to assume that it is a string type, so one
of either:
A pointer to a string literal in some presumably read-only memory area;
A character array or buffer that was interned into a string table.
But not "a pointer to a modifiable character buffer".
In the latter case, "char*" or "char[]" would be considered the correct
types to use.
Nominally, "string" would likely not allow pointer arithmetic, but could
decay into "const char *" or similar, which would allow pointer arithmetic.
While object-based strings are a perennial feature in many languages,
having them as anything much more complex than a pointer to a string
table adds overhead.
One can argue that one merit of object-based representations is that
then you don't have to use a generic "strlen()"; but for constant
strings and string tables, there is a workaround that I have used in
some of my own past languages:
Look at the preceding byte:
00: Raw string, you will need to "strlen()" it;
01..BF: We are not looking at the start of a string.
C0..EF: Length-prefix present
Encoded as a byte-transposed UTF-8 value.
The prefix can also be used to encode the character encoding, but for
this case I will assume it is always UTF-8.
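Here is a sketch of the preceding-byte check described above. The post does not define "byte-transposed UTF-8" precisely. This sketch assumes it means a UTF-8-style length sequence stored with its bytes reversed, so the lead byte (C0..EF) sits immediately before the first character and the continuation bytes precede it. Under that reading a 1-byte form is never used, since 00..7F would collide with the raw-string and mid-string cases. strtab_len and strtab_put are hypothetical names, and every entry must have at least one byte in front of it.

```c
#include <assert.h>
#include <string.h>

/* Length of string table entry str, found by inspecting the byte(s) before it.
   ASSUMED: reversed UTF-8-style prefix, lead byte directly before the text. */
static size_t strtab_len(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    unsigned lead = s[-1];
    if (lead == 0x00)                                  /* raw string: scan for NUL */
        return strlen(str);
    if (lead >= 0xC0 && lead <= 0xDF)                  /* 2-byte prefix: 11 bits */
        return ((size_t)(lead & 0x1F) << 6) | (size_t)(s[-2] & 0x3F);
    if (lead >= 0xE0 && lead <= 0xEF)                  /* 3-byte prefix: 16 bits */
        return ((size_t)(lead & 0x0F) << 12) | ((size_t)(s[-2] & 0x3F) << 6)
             | (size_t)(s[-3] & 0x3F);
    return (size_t)-1;           /* 01..BF: not a string start (or unsupported lead) */
}

/* Hypothetical writer: stores prefix, characters and NUL into buf and returns
   the start of the string. buf needs room for strlen(str) + 4 bytes. */
static char *strtab_put(unsigned char *buf, const char *str)
{
    size_t len = strlen(str);
    unsigned char *p = buf;
    assert(len < 0x10000);
    if (len < 0x800) {
        *p++ = (unsigned char)(0x80 | (len & 0x3F));          /* continuation */
        *p++ = (unsigned char)(0xC0 | (len >> 6));            /* lead byte */
    } else {
        *p++ = (unsigned char)(0x80 | (len & 0x3F));
        *p++ = (unsigned char)(0x80 | ((len >> 6) & 0x3F));
        *p++ = (unsigned char)(0xE0 | (len >> 12));
    }
    memcpy(p, str, len + 1);
    return (char *)p;
}
```

With this layout, strtab_len() costs at most three byte reads for prefixed entries and falls back to strlen() only for raw strings.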
>> Well, and that the underlying representation of a string is still as a
>> pointer into a string-table or similar.
>
>> Also the design of the standard library should remain conservative and
>> not add piles of needless wrappers or cruft.
>
> Not sure what you have in mind here.
>
> Personally, despite some resentment on some of the complex syntax
> and constructs necessary, I liked the C++ STL; its orthogonality
> and concepts in principle. (And especially if compared to some
> other languages' ad hoc "tool-chest" libraries I stumbled across.)
>
I was primarily thinking of Java and its excessive piles of wrapper
classes. Like, C gives you the stdio functions, which are basic but
effective.
Java has:
WhateverInputStream, WhateverOutputStream,
WhateverRandomAccessWhateverStream, etc.
We don't need this. Java just sort of ran with it, creating piles of
random wrapper classes whose existence serves almost no practical
purpose (and would have been much better served, say, by simply
providing a File class that holds a mock-up of C's stdio interface;
which is, ironically, closer to the approach C# had taken here).
The great sin here of C++ is mostly things like iostream.
In any case, I would avoid following Java's pattern of "overly
bureaucratic boilerplate". Well, or of assuming that programmers can't
think for themselves and will just look through a word-salad list until
they find whatever class has the combination of words describing the
specific task they intend to do.
>>>> [...]
>>>>
>>>> Like, one can throw out the whole mess that is dealing with
>>>> Multiple-Inheritance
>>>
>>> Well, when I started with C++ there wasn't multiple-inheritance
>>> available. Personally thinking its omission would be a mistake;
>>> I missed it back these day.
>>>
>>> I'm not sure what "mess" you have in mind. - Explicit qualification
>>> isn't a hindrance. Weakening the independence of classes in complex
>>> multi-level class-topologies is something under control of the
>>> program designer. - So it's fine to have it with all design options
>>> it opens.
>>
>> There is both implementation complexity of MI, and also some added
>> complexity with using it. The complexity gets messy.
>
> (Okay, if that's what you took from it, I of course accept it.
> But I'd have more expected that you might have dislike of some
> STL parts than [multiple] inheritance.)
>
Not exactly a fan of STL either, but these are different.
As noted, my concern here is more for compiler complexity, and MI is
more of a big thorn in the side of anyone who wants to write their own
compiler.
>>
>>
>> The SI + Interfaces model can reduce both.
>
> I've used classes with only "pure virtual" functions to achieve
> the interface abstraction; since I could easily design what I
> needed with standard features and practically no overhead I thus
> wasn't missing the 'interface' feature.
>
> (But of course I can see the implementation argument you make.)
>
Yeah.
An "abstract base class" can be inferred to be an interface.
So, one could also end up with a C++ implementation that allows:
single inheritance;
abstract base classes.
But doesn't allow true MI.
Ironically this is closer to how BGBCC's attempt at C++ turned out.
>> Granted, these can grow their own warts (like default methods or
>> similar), but arguably still not as bad as MI.
>
> (Well, I appreciated it to have that feature available in C++,
> even though my first OO language, Simula, didn't support it, so
> I was used to not having it when I got into C++ and liked it.)
>
>>
>> I am more thinking from the perspective of implementing a compiler.
>
> Hah! Yeah. - Recently in another NG someone disliked a feature
> because he had suffered from troubles implementing it. (It was
> not MI but formatted I/O in that case.) - I'm not implementing
> complex languages, so I guess I can feel lucky if someone else
> did the language implementation job and I can just use it.
>
I am writing from the POV of someone who did start making an attempt to
implement C++ support, and mostly gave up at roughly an early 1990s
feature level.
If you dropped MI, templates, and pretty much everything following from
these, stuff would be a lot easier.
>> [ implementation issues snipped and gracefully skipped ]
>
>> [...]
>> Virtual inheritance still means one can't just call the copy logic for
>> each parent class when copying a derived class;
>
> (I don't think I agree here. - Or are you still talking of the
> implementers' challenges? - But never mind. Programming in C++
> I could model everything I liked. That was really nice.)
>
Still implementation.
Theoretically, any combination of features that is allowed in the
language should also be allowed by the compiler.
With simpler "POD" classes, it is mostly a "memcpy()" internally.
With things like virtual inheritance and non-trivial inheritance
patterns, "all hell breaks loose".
It solves "the diamond inheritance problem" from the perspective of the
user, but creates a new problem for the implementation:
Now the in-memory layout of the parent classes depends on how they are
used within the derived class.
Combined with the ability to assign classes by-value, as far as compiler
implementation goes, now you have a mess on your hands.
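A minimal C sketch of why a byte copy stops being enough. The POD struct can be copied with `memcpy()`. For the virtual-base case it assumes one simple scheme, an in-object pointer to the shared base (all names hypothetical); many current ABIs keep virtual-base offsets in the vtable instead, but either way the base's location depends on the enclosing object, so a correct copy has to be generated with knowledge of the full object rather than by copying each part blindly.

```c
#include <string.h>

/* POD case: a plain struct can be copied byte for byte. */
struct Point {
    int x, y;
};

static void copy_point(struct Point *dst, const struct Point *src) {
    memcpy(dst, src, sizeof *dst); /* same effect as *dst = *src */
}

/* Hypothetical emulation of a class with a virtual base: the object
   stores a pointer to where its shared base sub-object lives, since in
   a more derived class the base could sit at a different offset. */
struct Base {
    int id;
};

struct Middle {
    struct Base *vbase;       /* points at the shared base sub-object */
    int middle_data;
    struct Base base_storage; /* where the base lives in this layout */
};

static void middle_init(struct Middle *m, int id, int data) {
    m->vbase = &m->base_storage;
    m->vbase->id = id;
    m->middle_data = data;
}

/* Naive copy: also copies vbase, which still points into *src. */
static void middle_copy_naive(struct Middle *dst, const struct Middle *src) {
    memcpy(dst, src, sizeof *dst);
}

/* Layout-aware copy: copy the data, then re-seat the base pointer. */
static void middle_copy(struct Middle *dst, const struct Middle *src) {
    dst->middle_data = src->middle_data;
    dst->base_storage = *src->vbase;
    dst->vbase = &dst->base_storage;
}
```

After `middle_copy_naive()`, the copy's `vbase` still refers to the source object, which is exactly the kind of bug a compiler must avoid when it synthesizes copy logic for such classes.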
> Janis
>
>> [...]
>
>
| From | bart <bc@freeuk.com> |
|---|---|
| Date | 2025-10-15 11:26 +0100 |
| Message-ID | <10cnstb$3irl3$1@dont-email.me> |
| In reply to | #394548 |
On 15/10/2025 02:13, BGB wrote:
> Apparently the languages people are trying to push as C replacements are
> mostly Rust, Zig, and Go.
>
> None of these particularly compel me though.
> They seem more like needless deviations from C than a true successor.
So what would a true successor look like?
>
>
> I guess the older generations mostly had Pascal and Ada.
>
> There was ALGOL, but both C and Pascal descended from ALGOL.
I've heard that before that C was somehow derived from Algol and even
Algol 68.
But it is so utterly unlike either of those, that if it's from the same
family, then it must have been adopted.
> As noted elsewhere, my thinking is partly that pipeline looks like:
> Preprocessor (basic or optional, C like)
> Parser (Context-independent, generates ASTs)
> Front end compiler: Compiles ASTs to a stack IL.
>
> Backend:
> IL -> 3AC/SSA;
That's odd: you're going from a stack IL to a 3AC non-stack IR/IL?
Why not go straight to 3AC?
(I've tried both stack and 3AC ILs, but not both in the same compiler! I
finally decided to stay with stack; 3AC code *always* got too fiddly to
deal with.
So stack IL is directly translated to register-based, unoptimised native
code, which is reasonably efficient. Performance is usually somewhere in
between Tiny C and gcc-O2.)
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2025-10-15 13:00 -0500 |
| Message-ID | <10conk5$3rhj2$1@dont-email.me> |
| In reply to | #394556 |
On 10/15/2025 5:26 AM, bart wrote:
> On 15/10/2025 02:13, BGB wrote:
>
>> Apparently the languages people are trying to push as C replacements
>> are mostly Rust, Zig, and Go.
>>
>> None of these particularly compel me though.
>> They seem more like needless deviations from C than a true successor.
>
> So what would a true successor look like?
>
Probably sorta like C with a few vaguely C++ like features, but with a
cleaner and simpler design.
Should ideally be usable for similar stuff to C.
Not drastically or needlessly different.
Looking around, the CMU C0 and C1 teaching languages also seem to be in
the same general area design-wise, though they exist more as limited
C-like subset languages intended for introductory programming in CS
courses.
Could make sense to have some C++ style functionality, but with an aim
of not going down the rabbit hole of adding excessive implementation
complexity.
>>
>>
>> I guess the older generations mostly had Pascal and Ada.
>>
>> There was ALGOL, but both C and Pascal descended from ALGOL.
>
> I've heard that before that C was somehow derived from Algol and even
> Algol 68.
>
> But it is so utterly unlike either of those, that if it's from the same
> family, then it must have been adopted.
>
Idea is that it went ALGOL -> BCPL -> B -> C.
Going the other way, ALGOL was derived from FORTRAN.
ALGOL was also the ancestor of Pascal and Ada, so there was a bit of
mutation there.
>
>> As noted elsewhere, my thinking is partly that pipeline looks like:
>> Preprocessor (basic or optional, C like)
>> Parser (Context-independent, generates ASTs)
>> Front end compiler: Compiles ASTs to a stack IL.
>
>> Backend:
>> IL -> 3AC/SSA;
>
> That's odd: you're going from a stack IL to a 3AC non-stack IR/IL?
>
> Why not go straight to 3AC?
>
> (I've tried both stack and 3AC ILs, but not both in the same compiler! I
> finally decided to stay with stack; 3AC code *always* got too fiddly to
> deal with.
>
Well, the downside of 3AC (as an IL) is that it tends to be fiddly and
often is much more specific to the design choices of the frontend and
backend that produced it.
Also, going from a Stack IL to 3AC is fairly easy, and generally less of
a mess than dealing with a 3AC IL here. Also with 3AC one has to decide
on things like whether or not it is in SSA form, as SSA vs non-SSA
follow different rules.
Downside is that a stack IL is often further from the code you "actually
want to generate" than a 3AC IL would have been (and to generate more
efficient 3AC you may need to generate less-concise stack code, such as
my having the frontend manually use temporary variables, partly negating
some of the conceptual benefits of a stack IR, but alas).
But, on the positive side, the stack manipulations/etc map readily to
SSA form.
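To illustrate why stack IL to 3AC is the easy direction, here is a minimal C sketch, assuming a made-up five-opcode IL (`S_LOAD`, `S_CONST`, `S_ADD`, `S_MUL`, `S_STORE`). The translator simulates the stack at compile time, holding operand names rather than values, and every operator result gets a fresh temporary. Types, branches, and register allocation are left out.

```c
#include <stdio.h>
#include <string.h>

#define VSTACK_MAX 16
#define OPND_LEN 16

/* Hypothetical opcode set for a tiny stack IL. */
enum StackOp { S_LOAD, S_CONST, S_ADD, S_MUL, S_STORE };

struct StackInsn {
    enum StackOp op;
    const char *name; /* variable name for S_LOAD / S_STORE */
    int value;        /* literal for S_CONST */
};

/* Convert stack IL to 3AC text.  The compile-time "virtual stack" holds
   operand names; each operator pops its inputs and pushes a fresh
   temporary, so every temporary is assigned exactly once (SSA-like).
   Returns 0 on success, -1 on stack misuse or if out is too small. */
static int stack_to_3ac(const struct StackInsn *code, int n,
                        char *out, size_t outsz) {
    char vstack[VSTACK_MAX][OPND_LEN];
    int sp = 0, ntemp = 0;
    size_t len = 0;
    if (outsz == 0) return -1;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        const struct StackInsn *in = &code[i];
        int w = 0; /* characters of 3AC emitted by this instruction */
        switch (in->op) {
        case S_LOAD:
        case S_CONST:
            if (sp >= VSTACK_MAX) return -1;
            if (in->op == S_LOAD)
                snprintf(vstack[sp], OPND_LEN, "%s", in->name);
            else
                snprintf(vstack[sp], OPND_LEN, "%d", in->value);
            sp++;
            break;
        case S_ADD:
        case S_MUL: {
            char tmp[OPND_LEN];
            if (sp < 2) return -1;
            snprintf(tmp, sizeof tmp, "t%d", ntemp++);
            w = snprintf(out + len, outsz - len, "%s = %s %c %s\n", tmp,
                         vstack[sp - 2], in->op == S_ADD ? '+' : '*',
                         vstack[sp - 1]);
            sp -= 2;
            snprintf(vstack[sp++], OPND_LEN, "%s", tmp);
            break;
        }
        case S_STORE:
            if (sp < 1) return -1;
            sp--;
            w = snprintf(out + len, outsz - len, "%s = %s\n", in->name,
                         vstack[sp]);
            break;
        }
        if (w < 0 || (size_t)w >= outsz - len) return -1;
        len += (size_t)w;
    }
    return sp == 0 ? 0 : -1; /* the stack must be empty at the end */
}
```

For `a = b + c * 2`, the six stack ops `LOAD b; LOAD c; CONST 2; MUL; ADD; STORE a` come out as `t0 = c * 2`, `t1 = b + t0`, `a = t1`; that final copy is the sort of operation a fused "ADD_ST"-style opcode or a later copy-propagation pass removes.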
A stack IL that makes sense for a compiler might look like:
  Stack ops for each major operator;
    No explicit types in most instructions.
    Type can be carried along the stack.
      The .NET IL also did this.
  Control flow is via labels and conditional branches.
    Typically no items on the stack during a branch.
  May make sense to combine common stack-ops with storing to a variable.
    Say: "ADD; STORE n" => "ADD_ST n"
    Rationale being that this is less work for the backend.
  Types can be identified by signature strings.
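The "ADD; STORE n" => "ADD_ST n" combination can be done as a simple peephole pass over the instruction array. A minimal C sketch, with hypothetical opcode names; it assumes no branch target points at an absorbed STORE (a real pass would have to check labels first).

```c
#include <stddef.h>

/* Hypothetical opcodes; only the ones this pass cares about. */
enum Opc { OP_LOAD, OP_ADD, OP_SUB, OP_STORE, OP_ADD_ST, OP_SUB_ST };

struct Ins {
    enum Opc op;
    int arg; /* variable number for LOAD/STORE and the fused forms */
};

/* Rewrite "ADD; STORE n" as "ADD_ST n" (and SUB likewise), compacting
   the array in place.  Returns the new instruction count. */
static size_t fuse_op_store(struct Ins *code, size_t n) {
    size_t w = 0;
    for (size_t r = 0; r < n; r++) {
        int fusable = r + 1 < n && code[r + 1].op == OP_STORE &&
                      (code[r].op == OP_ADD || code[r].op == OP_SUB);
        if (fusable) {
            code[w].op = code[r].op == OP_ADD ? OP_ADD_ST : OP_SUB_ST;
            code[w].arg = code[r + 1].arg;
            r++; /* skip the STORE that was absorbed */
        } else {
            code[w] = code[r];
        }
        w++;
    }
    return w;
}
```

Each fused pair saves the backend one instruction to decode and one stack slot to track.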
Granted, one can note that a stack IL typically needs around 70% more
operations than you would need for a 3AC, but most of these operations
will disappear in the conversion process.
One semi-unresolved design issue is whether it is better to have a
single unified numbering space for local variables, like in the JVM and
similar, or several different numbering spaces (arguments, locals, and
temporary variables). In my ILs, I have often ended up going for the latter.
Say, for example, you can encode the "name"/"symbol" for Load/Store/Etc
as a VLN, say:
0xxxxxxx: 0..127
10xxxxxx xxxxxxxx 128..16383
110xxxxx ...: 16384..2M
...
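A minimal C sketch of the variable-length number scheme above, assuming the payload bits follow the prefix in big-endian order and values are not biased (so 128 encodes as the two bytes `0x80 0x80`). Only the three listed forms are handled; values needing the longer forms elided by "..." make the encoder return 0. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode v with the prefix scheme:
     0xxxxxxx                    0..127
     10xxxxxx xxxxxxxx           up to 16383
     110xxxxx xxxxxxxx xxxxxxxx  up to 2^21 - 1
   Returns bytes written, or 0 if v needs a longer (unhandled) form. */
static size_t vln_encode(uint32_t v, uint8_t *out) {
    if (v < 0x80u) {
        out[0] = (uint8_t)v;
        return 1;
    }
    if (v < 0x4000u) {
        out[0] = (uint8_t)(0x80u | (v >> 8));
        out[1] = (uint8_t)v;
        return 2;
    }
    if (v < 0x200000u) {
        out[0] = (uint8_t)(0xC0u | (v >> 16));
        out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)v;
        return 3;
    }
    return 0;
}

/* Decode one value; the prefix bits of the first byte give the length.
   Returns bytes consumed, or 0 for an unhandled prefix. */
static size_t vln_decode(const uint8_t *in, uint32_t *v) {
    uint8_t b0 = in[0];
    if ((b0 & 0x80u) == 0) {
        *v = b0;
        return 1;
    }
    if ((b0 & 0xC0u) == 0x80u) {
        *v = ((uint32_t)(b0 & 0x3Fu) << 8) | in[1];
        return 2;
    }
    if ((b0 & 0xE0u) == 0xC0u) {
        *v = ((uint32_t)(b0 & 0x1Fu) << 16) | ((uint32_t)in[1] << 8) | in[2];
        return 3;
    }
    return 0;
}
```

Because the length is fully determined by the first byte, a decoder can skip over operands without interpreting them.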
And then use a tagging scheme to encode variable IDs, say:
...xxxx00 Local
...xxxx10 Temporary
...xxx001 Argument
...xxx101 Int32 Literal
...xx0011 Global Variable
...xx1011 String Literal
Where Locals and Temporaries are given the shortest code as these are
more common and preferably have shorter (single byte) encodings when
possible (so, for example, the first 32 local variables can be single
byte, etc).
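The tag table can be sketched the same way. Assumptions: the names are hypothetical, the index is shifted left past the tag bits, and the unlisted `...x111` patterns are treated as invalid. With a two-bit tag, local 31 encodes as 124, which still fits the one-byte `0xxxxxxx` form, while local 32 becomes 128 and needs two bytes, matching the "first 32 local variables" remark.

```c
#include <stdint.h>

/* Hypothetical names for the variable classes in the tag table. */
enum VarKind { VK_LOCAL, VK_TEMP, VK_ARG, VK_INT_LIT, VK_GLOBAL, VK_STRING, VK_INVALID };

/* Shift the index past the tag bits, then OR in the tag.  Shorter tags
   leave more index bits inside a one-byte VLN. */
static uint32_t var_encode(enum VarKind k, uint32_t index) {
    switch (k) {
    case VK_LOCAL:   return index << 2;         /* ...xxxx00 */
    case VK_TEMP:    return (index << 2) | 2u;  /* ...xxxx10 */
    case VK_ARG:     return (index << 3) | 1u;  /* ...xxx001 */
    case VK_INT_LIT: return (index << 3) | 5u;  /* ...xxx101 */
    case VK_GLOBAL:  return (index << 4) | 3u;  /* ...xx0011 */
    case VK_STRING:  return (index << 4) | 11u; /* ...xx1011 */
    default:         return UINT32_MAX;         /* low bits 1111: invalid */
    }
}

/* Test the shortest tags first; each test masks exactly the tag bits. */
static enum VarKind var_decode(uint32_t code, uint32_t *index) {
    if ((code & 3u) == 0u)   { *index = code >> 2; return VK_LOCAL; }
    if ((code & 3u) == 2u)   { *index = code >> 2; return VK_TEMP; }
    if ((code & 7u) == 1u)   { *index = code >> 3; return VK_ARG; }
    if ((code & 7u) == 5u)   { *index = code >> 3; return VK_INT_LIT; }
    if ((code & 15u) == 3u)  { *index = code >> 4; return VK_GLOBAL; }
    if ((code & 15u) == 11u) { *index = code >> 4; return VK_STRING; }
    return VK_INVALID; /* ...x111 patterns are unassigned */
}
```

The tagged value would then be written with the VLN scheme, so the most frequent kinds get the most one-byte slots.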
For integer literals, one can additionally use a zigzag coding
(0,-1,1,-2,2, ...). String literals can be encoded as an offset into a
string table.
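Zigzag coding in C, as a sketch; the helper names are hypothetical. `int_lit_code()` assumes the zigzagged value is then placed under the `...xxx101` Int32-literal tag from the table above, widened to 64 bits so the shift cannot overflow.

```c
#include <stdint.h>

/* Zigzag: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
   Small magnitudes of either sign get small codes, and thus short VLNs. */
static uint32_t zigzag_encode(int32_t n) {
    uint32_t u = (uint32_t)n; /* well-defined: wraps modulo 2^32 */
    return (u << 1) ^ (n < 0 ? 0xFFFFFFFFu : 0u);
}

static int32_t zigzag_decode(uint32_t z) {
    int32_t half = (int32_t)(z >> 1);   /* always <= INT32_MAX */
    return (z & 1u) ? -half - 1 : half; /* odd codes are negative */
}

/* Hypothetical helper: zigzag value shifted under the ...xxx101 tag. */
static uint64_t int_lit_code(int32_t n) {
    return ((uint64_t)zigzag_encode(n) << 3) | 5u;
}
```

For example, the literal -1 zigzags to 1 and becomes the tagged code 13, a single VLN byte.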
For something like a typecast operator, you might encode an offset into
a string table for a type-signature string.
...
Well, sorta, the IL used in BGBCC isn't quite so clean.
It instead encodes strings and symbols inline, and uses a sliding table
to refer back to them when they repeat. This also works, but is more
ugly than encoding IDs and using a string table might have been.
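One way such a sliding table could work, as a hedged sketch (BGBCC's actual scheme isn't described in detail here): encoder and decoder each keep a small ring of recently seen strings, and because both insert new inline strings in the same order, a slot number is enough for a back-reference. The window size, names, and return convention are assumptions. The table stores pointers, so the strings must outlive it.

```c
#include <string.h>

#define SLIDE_N 4 /* hypothetical window size; a real IL would use more */

struct SlideTab {
    const char *slot[SLIDE_N];
    unsigned next; /* slot the next new string will overwrite */
};

/* Returns the slot index if s was seen recently (emit a back-reference),
   or -1 after inserting it (emit the string inline).  The decoder runs
   the same insertion logic on inline strings, so slots stay in sync. */
static int slide_lookup_or_add(struct SlideTab *t, const char *s) {
    for (unsigned i = 0; i < SLIDE_N; i++)
        if (t->slot[i] != NULL && strcmp(t->slot[i], s) == 0)
            return (int)i;
    t->slot[t->next] = s;
    t->next = (t->next + 1) % SLIDE_N;
    return -1;
}
```

Once a string falls out of the window it has to be emitted inline again, which is the trade-off against a global string table.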
But, string tables make more sense for an externally-structured format.
Ironically, I came up with a possible format for manifest files (loosely
WAD based) that could also make sense as an IL packaging format.
I ended up going back and forth between having it be WAD2 or WAD4 based,
and instead settled on a compromise of supporting mixed 32- and 64-byte
entries. It would have a tree structure similar to WAD4, but with the
downside that for the 32-byte entries names are reduced to 10 bytes (vs
32 bytes for the 64-byte entries; or 16 bytes in the original WAD2 format).
But, can debate whether or not this would make sense in a
space-efficiency sense. The design is more focused on semi-efficient
random access rather than compactness (whereas typically bytecode IL
packaging is more focused on being compact).
Though, compactness may not matter as much for things like object-files
which are less likely to be used to actually distribute code.
Though, one merit is that it could more easily allow for a compiler that
decodes stack-IR into 3AC one function at a time, or demand-loads parts
of the image, rather than needing to load everything for the whole
program in advance (and burning a lot of RAM this way).
Annoyingly, even a simple format like IWAD would still end up needing 16
bytes per entry.
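For reference, the entry sizes being compared, written as C structs: the Doom IWAD directory entry (16 bytes), the Quake WAD2 directory entry (32 bytes, with a 16-byte name), and the 8-byte RIFF chunk header. The on-disk formats are little-endian; mapping them straight onto structs like this assumes a little-endian host and no extra padding, and the static assertions only check the sizes. A portable loader would decode the bytes field by field.

```c
#include <stdint.h>

/* Doom IWAD/PWAD directory entry: 16 bytes per lump. */
struct DoomLump {
    int32_t filepos; /* offset of the lump data in the file */
    int32_t size;    /* lump size in bytes */
    char name[8];    /* not necessarily NUL-terminated */
};

/* Quake WAD2 directory entry: 32 bytes per lump. */
struct Wad2Lump {
    int32_t filepos;
    int32_t disksize; /* stored (possibly compressed) size */
    int32_t size;     /* uncompressed size */
    char type;
    char compression;
    char pad1, pad2;
    char name[16];    /* NUL-terminated */
};

/* RIFF chunk header: the minimum 8 bytes of overhead per chunk. */
struct RiffChunkHeader {
    char id[4];    /* FourCC */
    uint32_t size; /* payload size; payload is padded to an even length */
};

_Static_assert(sizeof(struct DoomLump) == 16, "IWAD entry is 16 bytes");
_Static_assert(sizeof(struct Wad2Lump) == 32, "WAD2 entry is 32 bytes");
_Static_assert(sizeof(struct RiffChunkHeader) == 8, "RIFF header is 8 bytes");
```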
But it can offer more flexibility (and avoids needing an additional
mechanism to look things up by QName), say, if compared with a format
like RIFF (which has an 8-byte minimum overhead per lump). Also, the
scheme as-is allows lumps with <= 12 bytes of payload to store that
payload inline in a 32-byte entry, so it might not be too far behind. It
may just make sense to use it, and then LZ-compress it if it needs to be
more compact.
> So stack IL is directly translated to register-based, unoptimised native
> code, which is reasonably efficient. Performance is usually somewhere in
> between Tiny C and gcc-O2.)
>
I usually go stack -> 3AC, and then 3AC -> Native.
In BGBCC, as-is, there was no separate assembler step, but I now realize
this probably isn't ideal (and I still end up needing an assembler
anyways, just now it is a little more of a mess as it isn't really
cleanly separated from the rest of the 3AC->Native backend).
Long ago, when I originally wrote it, there was an x86 backend, which
didn't use an assembler. For the SuperH backend, I initially skipped
having an assembler, and my current backend (targeting my BJX2 ISA and
also RISC-V) was derived from a fork off the SuperH backend.
So it continued in not having an assembler, and generating instructions
with big "switch()" blocks, which also scales poorly, but sorta makes
sense when the ISA is smallish.
The stalled out new compiler would have used an assembler, and more so,
an assembler driven by an instruction listing table.
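A minimal sketch of what "an assembler driven by an instruction listing table" can mean: adding an instruction becomes a new table row rather than a new `switch` case. It covers only the RISC-V RV32I R-type ALU instructions (RISC-V being one of the targets mentioned above); the table layout and function names are assumptions, not BGBCC's design.

```c
#include <stdint.h>
#include <string.h>

/* One row per mnemonic: RV32I R-type ALU instructions share opcode 0x33
   and differ only in funct7/funct3. */
struct RTypeRow {
    const char *mnem;
    uint8_t funct7;
    uint8_t funct3;
};

static const struct RTypeRow rtype_table[] = {
    { "add",  0x00, 0x0 }, { "sub",  0x20, 0x0 },
    { "sll",  0x00, 0x1 }, { "slt",  0x00, 0x2 },
    { "sltu", 0x00, 0x3 }, { "xor",  0x00, 0x4 },
    { "srl",  0x00, 0x5 }, { "sra",  0x20, 0x5 },
    { "or",   0x00, 0x6 }, { "and",  0x00, 0x7 },
};

/* Look the mnemonic up and pack the fields:
   funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0].
   Returns 0 and stores the word in *out, or -1 if unknown/out of range. */
static int asm_rtype(const char *mnem, unsigned rd, unsigned rs1,
                     unsigned rs2, uint32_t *out) {
    if (rd > 31 || rs1 > 31 || rs2 > 31) return -1;
    for (size_t i = 0; i < sizeof rtype_table / sizeof rtype_table[0]; i++) {
        const struct RTypeRow *row = &rtype_table[i];
        if (strcmp(row->mnem, mnem) == 0) {
            *out = ((uint32_t)row->funct7 << 25) | ((uint32_t)rs2 << 20) |
                   ((uint32_t)rs1 << 15) | ((uint32_t)row->funct3 << 12) |
                   ((uint32_t)rd << 7) | 0x33u;
            return 0;
        }
    }
    return -1;
}
```

A fuller table would carry a format column (R, I, S, B, U, J) and select the packing routine from it, so the per-instruction knowledge stays in data.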
It stalled out though because code footprint quickly exceeded my
original target and still wasn't done enough to be useful.
I was also trying for a more traditional compiler design (one
translation unit at a time, producing native object files as an
intermediate step, with a native-code linker). I suspect now this may
have been a mistake, and Frontend->IL + IL->Native may be a better option.
| From | bart <bc@freeuk.com> |
|---|---|
| Date | 2025-10-17 22:07 +0100 |
| Message-ID | <10cub5m$1ch30$1@dont-email.me> |
| In reply to | #394560 |
On 15/10/2025 19:00, BGB wrote:
> On 10/15/2025 5:26 AM, bart wrote:
>> On 15/10/2025 02:13, BGB wrote:
>>
>>> Apparently the languages people are trying to push as C replacements
>>> are mostly Rust, Zig, and Go.
>>>
>>> None of these particularly compel me though.
>>> They seem more like needless deviations from C than a true successor.
>>
>> So what would a true successor look like?
>>
>
> Probably sorta like C with a few vaguely C++ like features, but with a
> cleaner and simpler design.
>
> Should ideally be usable for similar stuff to C.
> Not drastically or needlessly different.
Well, my own language is somewhere at the level of C, yet it looks very
different.
It was a rather crude affair in 1981/82, it has now evolved some modern
conveniences. But I deliberately keep it low-level.
This is a C program to print some square roots:
#include <stdio.h>
#include <math.h>
int main() {
    for (int i=1; i<=10; ++i)
        printf("%d %f\n", i, sqrt(i));
}
This is it in my language (all are complete programs):
proc main=
    for i to 10 do
        println i, sqrt i
    od
end
Finally, this is it in Zig:
const std = @import("std");
pub fn main() void {
    for (1..11) |i| {
        std.debug.print("{} {}\n", .{i, @sqrt(@as(f64, @floatFromInt(i)))});
    }
}
This one is interesting in making the C version look remarkably clean
and uncluttered by comparison!
But, yeah, I NEED it to look different from C as I find C syntax
unreadable; it's a kind of 'C-blindness' where source looks more like a
block of MIME-encoded text.
>> But it is so utterly unlike either of those, that if it's from the
>> same family, then it must have been adopted.
>>
>
> Idea is that it went ALGOL -> BCPL -> B -> C.
> Going the other way, ALGOL was derived from FORTRAN.
BCPL also looks quite unlike Algol. If C's progression from Algol was
via BCPL, which lost some features, then the claim that it came from
Algol makes even less sense.
I've also heard that there were influences on C from Algol68. That was a
language that used simple left-to-right type syntax, defaulted to
1-based indexing, was (largely) case-insensitive, and used := for
assignment and = for equality.
Each one of these is the exact opposite of C!
My language was genuinely inspired by Algol68, and as such, still has
those exact characteristics.
So I think I'd take such claims of C being inspired by or based on Algol
/or/ Algol68 with a pinch of salt.
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2025-10-17 17:44 -0500 |
| Message-ID | <10cuh13$1enaj$1@dont-email.me> |
| In reply to | #394578 |
On 10/17/2025 4:07 PM, bart wrote:
> On 15/10/2025 19:00, BGB wrote:
>> On 10/15/2025 5:26 AM, bart wrote:
>>> On 15/10/2025 02:13, BGB wrote:
>>>
>>>> Apparently the languages people are trying to push as C replacements
>>>> are mostly Rust, Zig, and Go.
>>>>
>>>> None of these particularly compel me though.
>>>> They seem more like needless deviations from C than a true
>>>> successor.
>>>
>>> So what would a true successor look like?
>>>
>>
>> Probably sorta like C with a few vaguely C++ like features, but with a
>> cleaner and simpler design.
>>
>> Should ideally be usable for similar stuff to C.
>> Not drastically or needlessly different.
>
> Well, my own language is somewhere at the level of C, yet it looks very
> different.
>
> It was a rather crude affair in 1981/82, it has now evolved some modern
> conveniences. But I deliberately keep it low-level.
>
As for me, in 1982: I didn't exist yet.
So, alas, my journey mostly starts in the 90s (childhood) and 2000s
(teens / early 20s). But, alas, I have now exceeded 4 decades of existence.
>
> This is a C program to print some square roots:
>
> #include <stdio.h>
> #include <math.h>
>
> int main() {
>     for (int i=1; i<=10; ++i)
>         printf("%d %f\n", i, sqrt(i));
> }
>
In the language I was imagining as a hypothetical, probably:
using stdio, math;
int main() {
    for (int i=1; i<=10; ++i)
        printf("%d %f\n", i, sqrt(i));
}
Doesn't really match my personal style, but there is little here that
actually needs to change in the language as-imagined.
>
> This is it in my language (all are complete programs):
>
> proc main=
>     for i to 10 do
>         println i, sqrt i
>     od
> end
>
Yeah.
In my first script language:
for(var i=1; i<=10; i++)
    printf("%d %f\n", i, sqrt(i));
Generally there was no "main"; it would instead basically execute
everything at the toplevel from top to bottom.
Function syntax was basically:
function foo(x, y)
    { return x+y; }
But it did allow a shorthand:
function foo(x, y) x+y;
Here, the lack of braces meant that the expression was interpreted as the
entire body.
Alternatively:
function foo(x, y) { x+y }
Here, if a function is terminated by an expression with no semicolon,
then it behaves as if there was an implicit return statement.
Initially, it was dynamically typed, but later gained static types:
function foo(x:int, y:int):int { x+y }
Which did basically the same thing, but with the types as 32-bit integers.
Say:
var arr1 = [1,2,3,4,5]; //array, dynamically typed
var arr2:int[] = [1,2,3,4,5]; //array of int
var arr3 = [1,2,3,4,5]i; //array is int[], but arr3 is not.
var obj1 = {x:3, y:4}; //ex-nihilo object with 2 members
...
A later successor language had switched to a more Java-like syntax and
structure. It was then in an awkward middle ground of being neither
great as a free-form scripting language nor great as an implementation
language. It competed with C, but still lacked good integration with C,
as I had also switched to a simpler FFI modeled after C#'s P/Invoke
(its predecessor had used header-mining and auto-glue).
A more recent project had gone in the complete opposite direction, and
used a language modeled off of 1980s-era BASIC.
But, as noted, it ended up diverging some from BASIC, gaining the use of
dynamic scoping and "RETURN expr" + "x = GOSUB label", ...
Arguments could be provided, but were also wonky:
  Rather than being defined by the function, they are provided caller
  side, mostly as names that are bound within the newly created dynamic
  frame of the called label.
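A minimal C sketch of dynamic scoping with caller-side argument binding, assuming one flat binding stack plus a record of where each GOSUB frame starts. All names are hypothetical. Lookups search newest-first, so a called label sees the names its caller bound for it before any outer bindings, regardless of where the label sits in the source.

```c
#include <string.h>

#define ENV_MAX_BINDINGS 64
#define ENV_MAX_DEPTH 16

/* Hypothetical binding record: a name and a numeric value. */
struct Binding {
    const char *name;
    double value;
};

/* All live bindings sit on one stack; frame_start[] remembers where each
   GOSUB frame begins so RETURN can drop it in one step. */
struct Env {
    struct Binding b[ENV_MAX_BINDINGS];
    int top;
    int frame_start[ENV_MAX_DEPTH];
    int depth;
};

/* Dynamic scoping: search from the newest binding backward. */
static struct Binding *env_lookup(struct Env *e, const char *name) {
    for (int i = e->top - 1; i >= 0; i--)
        if (strcmp(e->b[i].name, name) == 0)
            return &e->b[i];
    return NULL;
}

static int env_bind(struct Env *e, const char *name, double value) {
    if (e->top >= ENV_MAX_BINDINGS) return -1;
    e->b[e->top].name = name;
    e->b[e->top].value = value;
    e->top++;
    return 0;
}

/* GOSUB: open a new frame; the caller then binds argument names into it. */
static int env_enter(struct Env *e) {
    if (e->depth >= ENV_MAX_DEPTH) return -1;
    e->frame_start[e->depth++] = e->top;
    return 0;
}

/* RETURN: discard every binding made since the matching env_enter(). */
static void env_leave(struct Env *e) {
    if (e->depth > 0)
        e->top = e->frame_start[--e->depth];
}
```

Since the callee never declares its parameters, whatever names the caller binds simply shadow outer ones until the frame is dropped.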
In this latter case, no separate parser, no bytecode, ...
Instead, the program was basically parsed into an array of token IDs
(grouped into lines), and the interpreter walked the tokens (with all
the keywords and special tokens defined first so that they got
statically-assigned numbers).
So, there was basically an array of tokens, and an array mapping lines
to indices in the token array, and an array to map labels to line
indices, ...
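A sketch of those arrays in C, using the loop program shown further down as data. The token IDs, array sizes, and the rule that a label is a line consisting of just `name :` are assumptions made for illustration. A GOTO then becomes a label lookup followed by `line_start[]` indexing.

```c
/* Hypothetical token IDs: keywords and punctuation first, so they get
   fixed numbers; identifiers and numbers are interned from TK_USER up. */
enum { TK_LET, TK_PRINT, TK_IF, TK_GOTO, TK_COLON, TK_ASSIGN, TK_PLUS, TK_LE, TK_USER };

#define MAX_LABELS 16

struct Program {
    const int *tok;        /* token IDs for the whole program */
    const int *line_start; /* first token of line k; line_start[nlines] == total */
    int nlines;
    int label_tok[MAX_LABELS];  /* token ID of each label name */
    int label_line[MAX_LABELS]; /* line index each label marks */
    int nlabels;
};

/* Scan for lines of the form "name :" and record them as labels. */
static void build_labels(struct Program *p) {
    p->nlabels = 0;
    for (int k = 0; k < p->nlines && p->nlabels < MAX_LABELS; k++) {
        int s = p->line_start[k], e = p->line_start[k + 1];
        if (e - s == 2 && p->tok[s] >= TK_USER && p->tok[s + 1] == TK_COLON) {
            p->label_tok[p->nlabels] = p->tok[s];
            p->label_line[p->nlabels] = k;
            p->nlabels++;
        }
    }
}

/* GOTO target: label token ID -> line index, or -1 if undefined. */
static int goto_line(const struct Program *p, int label_tok) {
    for (int i = 0; i < p->nlabels; i++)
        if (p->label_tok[i] == label_tok)
            return p->label_line[i];
    return -1;
}
```

The interpreter then resumes walking tokens at `tok[line_start[goto_line(p, label)]]`.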
In this case, we are back to starting by walking down from the top of a
file. The parser design and syntax was also not very flexible, but
sufficient for the use-case.
I went this route mostly because it allows for a smaller/simpler
interpreter. So, whole language design was based mostly on how I could
do an interpreter within a relatively modest line-count (initial goal
being to fit everything in under 1000 lines of C).
Adding the CSG stuff basically blew out this limit though (could be
sub-1000 lines, but this mostly excludes things like vector math or CSG
solids; which also broke consistency with the "1980s BASIC" design
influence).
I had wanted something for some basic 3D modeling and animation tasks,
but OpenSCAD + exporting STL didn't fit my needs.
Implementing the SCAD language, to extend with new features, would have
been more code.
In some ways, programmatic CSG is preferable to traditional 3D modeling
and animation tools (even if pretty much everyone else seems to prefer
triangle mesh models and GUI-driven 3D model tools over CSG and
describing geometry directly). Though, granted, it is nicer to have
higher-level primitives and operators, rather than having to do
everything manually with planes and "Brush Solids". Though, internally,
the primitives get turned into "brush solids" based on collections of
planes, with geometry mostly being clipped between solids using the
planes (so, you don't have true spheres, rather a sphere becomes a brush
with a large number of planes, ...).
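A minimal C sketch of a brush as an intersection of half-spaces, assuming planes of the form n·p <= d with unit normals. The "sphere" here uses only the 26 directions whose components are -1, 0, or 1, so it is a coarse polyhedron; a real CSG tool would use many more planes and also do the clipping between solids, which is omitted. The names are hypothetical.

```c
/* A plane n.p = d with unit normal n; n.p <= d is the "inside" side. */
struct Plane {
    double nx, ny, nz, d;
};

struct Brush {
    const struct Plane *planes;
    int nplanes;
};

/* A point is in the brush only if it is inside every plane. */
static int brush_contains(const struct Brush *b, double x, double y, double z) {
    for (int i = 0; i < b->nplanes; i++) {
        const struct Plane *p = &b->planes[i];
        if (p->nx * x + p->ny * y + p->nz * z > p->d)
            return 0;
    }
    return 1;
}

/* Fill out[] (room for 26) with planes tangent to a sphere of radius r
   along the 26 directions with components in {-1, 0, 1}. */
static int sphere_brush26(double r, struct Plane *out) {
    /* 1/sqrt(k) for k nonzero components, to normalize each direction */
    static const double inv_len[4] = { 0.0, 1.0, 0.70710678118654752,
                                       0.57735026918962576 };
    int n = 0;
    for (int x = -1; x <= 1; x++)
        for (int y = -1; y <= 1; y++)
            for (int z = -1; z <= 1; z++) {
                int k = (x != 0) + (y != 0) + (z != 0);
                if (k == 0)
                    continue; /* skip the zero vector */
                out[n].nx = x * inv_len[k];
                out[n].ny = y * inv_len[k];
                out[n].nz = z * inv_len[k];
                out[n].d = r;
                n++;
            }
    return n;
}
```

For a unit radius, the point (0.95, 0.35, 0) lies outside the true sphere (0.95^2 + 0.35^2 = 1.025) yet inside this 26-plane brush; that gap is the approximation error that adding more planes shrinks.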
Granted, all the code for the CSG proper is not counted in the line
count for the BASIC interpreter, otherwise the number would be
significantly higher.
But, in this language, one could have something like:
let i=0
lbl0:
print i
i=i+1
if i<=10 goto lbl0
Where, "THEN" is optional if "GOTO" or similar is used.
Though, "THEN" is required if you want an "ELSE" clause.
Note that both clauses need to go on the same line.
Note that in this case, the language was case-insensitive as the
keywords and identifiers were case normalized.
L0:
...
GOTO L0
Is equivalent to:
l0:
...
goto l0
Traditionally, BASIC is written in all upper case, but then one
needs to hold SHIFT or use CAPS-LOCK the whole time.
>
> Finally, this is it in Zig:
>
> const std = @import("std");
>
> pub fn main() void {
>     for (1..11) |i| {
>         std.debug.print("{} {}\n", .{i, @sqrt(@as(f64, @floatFromInt(i)))});
>     }
> }
>
>
> This one is interesting in making the C version look remarkably clean
> and uncluttered by comparison!
>
> But, yeah, I NEED it to look different from C as I find C syntax
> unreadable; it's a kind of 'C-blindness' where source looks more like a
> block of MIME-encoded text.
>
OK.
>
>>> But it is so utterly unlike either of those, that if it's from the
>>> same family, then it must have been adopted.
>>>
>>
>> Idea is that it went ALGOL -> BCPL -> B -> C.
>> Going the other way, ALGOL was derived from FORTRAN.
>
> BCPL also looks quite unlike Algol. If C's progression from Algol was
> via BCPL, which lost some features, then the claim that it came from
> Algol makes even less sense.
>
> I've also heard that there were influences on C from Algol68. That was a
> language that used simple left-to-right type syntax, defaulted to 1-
> based indexing, was (largely) case-insensitive, and used := for
> assignment and = for equality.
>
> Each one of these is the exact opposite of C!
>
> My language was genuinely inspired by Algol68, and as such, still has
> those exact characteristics.
>
> So I think I'd take such claims of C being inspired by or based on
> Algol /or/ Algol68 with a pinch of salt.
>
I think it is more the claim that the path that led to C was via ALGOL
60, but either way...
| From | Janis Papanagnou <janis_papanagnou+ng@hotmail.com> |
|---|---|
| Date | 2025-10-20 10:02 +0200 |
| Message-ID | <10d4q9q$32gi5$1@dont-email.me> |
| In reply to | #394560 |
On 15.10.2025 20:00, BGB wrote:
> On 10/15/2025 5:26 AM, bart wrote:
>> On 15/10/2025 02:13, BGB wrote:
>>
>>> There was ALGOL, but both C and Pascal descended from ALGOL.
>>
>> I've heard that before that C was somehow derived from Algol and even
>> Algol 68.
>>
>> But it is so utterly unlike either of those, that if it's from the
>> same family, then it must have been adopted.
>
> Idea is that it went ALGOL -> BCPL -> B -> C.
> Going the other way, ALGOL was derived from FORTRAN.
In which way do you think that Algol 60 was "derived from FORTRAN"?
Janis
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2025-10-20 04:42 -0500 |
| Message-ID | <10d50b6$341dj$1@dont-email.me> |
| In reply to | #394589 |
On 10/20/2025 3:02 AM, Janis Papanagnou wrote:
> On 15.10.2025 20:00, BGB wrote:
>> On 10/15/2025 5:26 AM, bart wrote:
>>> On 15/10/2025 02:13, BGB wrote:
>>>
>>>> There was ALGOL, but both C and Pascal descended from ALGOL.
>>>
>>> I've heard that before that C was somehow derived from Algol and even
>>> Algol 68.
>>>
>>> But it is so utterly unlike either of those, that if it's from the
>>> same family, then it must have been adopted.
>>
>> Idea is that it went ALGOL -> BCPL -> B -> C.
>> Going the other way, ALGOL was derived from FORTRAN.
>
> In which way do you think that Algol 60 was "derived from FORTRAN"?
>
Stuff online seems to say the backwards path went:
ALGOL 60 <- ALGOL 58 <- FORTRAN and Plankalkul
Granted, FORTRAN is a somewhat different language from ALGOL.
Though, one can note that a lot of the early languages, like early forms
of FORTRAN and COBOL, were line-oriented, tended to use line numbers,
and required stuff to be in certain columns, ...
Though, ALGOL and its descendants generally went over to a more
free-form syntax.
Arguably, FORTRAN and BASIC were some of the easier ways to approach
designing a language syntax.
Say, if ranking languages by minimal complexity, maybe:
  Level 1:
    Limited forms of FORTH and G-Code
    Can read line-by-line and evaluate tokens left to right;
    Logic mostly driven directly by tokens.
      Forth: Tokens drive a stack;
      G-Code: Tokens set internal state/parameters.
        Any command initiates at the end of the line.
  Level 2:
    Older forms of FORTRAN and BASIC
    Can work line by line and match fixed patterns;
    Can use fairly simple rules to split up tokens;
    Can use basic pattern matching for an interpreter.
  Level 3:
    LISP, QBasic and Visual Basic, ALGOL, ...
    Typically requires building an AST, etc.
      Though, Lisp syntax can map directly to an AST.
    Can be supported easily by a tree-walk interpreter.
  Level 4:
    JavaScript and similar;
    Requires a full syntax tree and more advanced parser;
    An interpreter can use a tree-walk or basic IR.
  Level 5:
    Languages like C;
    Generally requires a full-featured type-system;
    Language structure and parsing may depend on types;
    Direct interpretation is not usually feasible;
    ...
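To make "Level 1" concrete, a minimal C sketch of a Forth-like line evaluator: tokens are read left to right, numbers push, and words act on the stack directly, with no parser or AST. The tiny word set (`+`, `-`, `*`, `dup`), the stack depth, and the 127-character line limit are assumptions.

```c
#include <stdlib.h>
#include <string.h>

#define FSTACK_MAX 32

/* Evaluate one line; returns 0 and stores the top of stack in *result,
   or -1 on stack underflow/overflow, an unknown word, or a long line. */
static int forth_eval(const char *line, long *result) {
    long st[FSTACK_MAX];
    int sp = 0;
    char buf[128];
    if (strlen(line) >= sizeof buf) return -1;
    strcpy(buf, line); /* strtok modifies its input */
    for (char *tok = strtok(buf, " \t"); tok != NULL; tok = strtok(NULL, " \t")) {
        char *end;
        long v = strtol(tok, &end, 10);
        if (end != tok && *end == '\0') {          /* number: push it */
            if (sp >= FSTACK_MAX) return -1;
            st[sp++] = v;
        } else if (strcmp(tok, "dup") == 0) {      /* ( a -- a a ) */
            if (sp < 1 || sp >= FSTACK_MAX) return -1;
            st[sp] = st[sp - 1];
            sp++;
        } else if (strlen(tok) == 1 && strchr("+-*", tok[0]) != NULL) {
            if (sp < 2) return -1;                 /* ( a b -- a op b ) */
            long b = st[--sp];
            long a = st[--sp];
            st[sp++] = tok[0] == '+' ? a + b : tok[0] == '-' ? a - b : a * b;
        } else {
            return -1;                             /* unknown word */
        }
    }
    if (sp < 1) return -1;
    *result = st[sp - 1];
    return 0;
}
```

For example, `"2 3 + 4 *"` leaves 20 on the stack; every token is acted on the moment it is read.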
> Janis
>