Groups > comp.lang.c > #393959 > unrolled thread

"The provenance memory model for C", by Jens Gustedt

Started by	Alexis <flexibeast@gmail.com>
First post	2025-07-02 14:22 +1000
Last post	2025-07-20 05:09 +0200
Articles	15 — 6 participants

  "The provenance memory model for C", by Jens Gustedt Alexis <flexibeast@gmail.com> - 2025-07-02 14:22 +1000
    Re: "The provenance memory model for C", by Jens Gustedt Kaz Kylheku <643-408-1753@kylheku.com> - 2025-07-02 13:10 +0000
      Re: "The provenance memory model for C", by Jens Gustedt BGB <cr88192@gmail.com> - 2025-07-08 21:39 -0500
        Re: "The provenance memory model for C", by Jens Gustedt David Brown <david.brown@hesbynett.no> - 2025-07-09 11:41 +0200
          Re: "The provenance memory model for C", by Jens Gustedt BGB <cr88192@gmail.com> - 2025-07-09 21:28 -0500
            Re: "The provenance memory model for C", by Jens Gustedt David Brown <david.brown@hesbynett.no> - 2025-07-10 11:34 +0200
              Re: "The provenance memory model for C", by Jens Gustedt BGB <cr88192@gmail.com> - 2025-07-10 21:09 -0500
                Re: "The provenance memory model for C", by Jens Gustedt David Brown <david.brown@hesbynett.no> - 2025-07-11 10:48 +0200
                  Re: "The provenance memory model for C", by Jens Gustedt BGB <cr88192@gmail.com> - 2025-07-11 14:05 -0500
                  Re: "The provenance memory model for C", by Jens Gustedt "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-07-12 17:30 -0700
                    Re: "The provenance memory model for C", by Jens Gustedt BGB <cr88192@gmail.com> - 2025-07-13 15:10 -0500
                      Re: "The provenance memory model for C", by Jens Gustedt "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2025-07-13 13:55 -0700
                        Re: "The provenance memory model for C", by Jens Gustedt BGB <cr88192@gmail.com> - 2025-07-13 19:11 -0500
          Re: "The provenance memory model for C", by Jens Gustedt antispam@fricas.org (Waldek Hebisch) - 2025-07-20 00:21 +0000
            Re: "The provenance memory model for C", by Jens Gustedt David Brown <david.brown@hesbynett.no> - 2025-07-20 05:09 +0200

#393959 — "The provenance memory model for C", by Jens Gustedt

From	Alexis <flexibeast@gmail.com>
Date	2025-07-02 14:22 +1000
Subject	"The provenance memory model for C", by Jens Gustedt
Message-ID	<87o6u343y3.fsf@gmail.com>

Thought some here might find this interesting:

"Our provenance-aware memory object model for C provides a precise
mathematical specification, in place of the ambiguity of these aspects
of the current C standard. It has also stimulated and informed
discussion of provenance in the broader C, C++, Rust, and compiler
communities.

"This work has finally resulted in the publication of an international
standard, Technical Specification ISO/IEC TS 6010 (edited by Henry
Kleynhans, Bloomberg, UK) ...

"In this article, I will try to explain what this is all about, namely
on how a provenance model for pointers interferes with alias analysis of
modern compilers. For those that are not fluent with the terminology or
the concept we have a short intro what pointer aliasing is all about, a
review of existing tools to help the compiler and inherent difficulties
and then the proposed model itself. At the end there is a brief takeaway
that explains how to generally avoid complications and loss of
optimization opportunities that could result from mis-guided aliasing
analysis."

-- https://gustedt.wordpress.com/2025/06/30/the-provenance-memory-model-for-c/


Alexis.

#393960

From	Kaz Kylheku <643-408-1753@kylheku.com>
Date	2025-07-02 13:10 +0000
Message-ID	<20250702025125.969@kylheku.com>
In reply to	#393959

On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>
> "This work has finally resulted in the publication of an international
> standard, Technical Specification ISO/IEC TS 6010 (edited by Henry
> Kleynhans, Bloomberg, UK) ...

OMG, it's a completely idiotic document. What it is is a kind of patch
against a specific version of ISO C, written in plain language rather
than in diff format. Like "replace this paragraph with this one, add
this sentence after that one, ...".

What the actual fuck? How will that be maintainable going forward, first
of all.

You can't follow what this is without applying the patch: obtaining
the exact ISO C standard that it targets and performing the edits.

Almost nobody is going to do that.

Right off the bat I spotted pointless shit in it that has nothing to do
with provenance:

  6.4.5 Equality operators

  1 In section 6.5.9 Equality operators, add the following after the first
    sentence of paragraph 3:

  2 None of the operands shall be an invalid pointer value.

I don't have confidence in an author's understanding of C, if they
believe that ISO C defines the behavior of invalid pointers being
compared, such that this needs to be rectified by a private "patch"
of the text.

The concept of pointer provenance can be expressed other than
as a textual patch against ISO C.

It can be regarded as a language extension and documented similarly
to how a sane compiler documentor would do it.

> "In this article, I will try to explain what this is all about, namely
> on how a provenance model for pointers interferes with alias analysis of
> modern compilers.

Well, no shit; provenance is often dynamic; whereas aliasing analysis
wants to be static.

> For those that are not fluent with the terminology or
> the concept we have a short intro what pointer aliasing is all about, a
> review of existing tools to help the compiler and inherent difficulties
> and then the proposed model itself. At the end there is a brief takeaway
> that explains how to generally avoid complications and loss of
> optimization opportunities that could result from mis-guided aliasing
> analysis."

If you think that certain code could go faster because certain suspected
aliasing isn't actually taking place, then since C99 you were able to
spin the roulette wheel and use "restrict".

So the aliasing analysis and its missed opportunities are the
programmer's responsibility.

It's always better for the machine to miss opportunities than to miss
compile. :)
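
For readers who have not used it, the "restrict" gamble mentioned above can be sketched roughly as follows. The function name add_bias and its parameters are made up for this illustration and do not come from the thread or the article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread: adds *bias to the first n
   elements of dst. The restrict qualifiers are the programmer's promise
   that dst and bias do not overlap, so the compiler is free to load
   *bias once instead of after every store to dst. */
void add_bias(int *restrict dst, const int *restrict bias, size_t n)
{
    assert(dst != NULL && bias != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] += *bias;
}
```

A call such as add_bias(a, &a[0], 3) breaks that promise and has undefined behaviour, which is the "roulette wheel" part: the compiler does not check the claim, it simply trusts it.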

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

#393965

From	BGB <cr88192@gmail.com>
Date	2025-07-08 21:39 -0500
Message-ID	<104kkp3$anl$1@dont-email.me>
In reply to	#393960

On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>
>> "This work has finally resulted in the publication of an international
>> standard, Technical Specification ISO/IEC TS 6010 (edited by Henry
>> Kleynhans, Bloomberg, UK) ...
> 
> OMG, it's a completely idiotic document. What it is is a kind of patch
> against a specific version of ISO C, written in plain language rather
> than in diff format. Like "replace this paragraph with this one, add
> this sentence after that one, ...".
> 
> What the actual fuck? How will that be maintainable going forward, first
> of all.
> 
> You can't follow what this is without applying the patch: obtaining
> the exact ISO C standard that it targets and performing the edits.
> 
> Almost nobody is going to do that.
> 
> Right off the bat I spotted pointless shit in it that has nothing to do
> with provenance:
> 
>    6.4.5 Equality operators
> 
>    1 In section 6.5.9 Equality operators, add the following after the first
>      sentence of paragraph 3:
> 
>    2 None of the operands shall be an invalid pointer value.
> 
> I don't have confidence in an author's understanding of C, if they
> believe that ISO C defines the behavior of invalid pointers being
> compared, such that this needs to be rectified by a private "patch"
> of the text.
> 
> The concept of pointer provenance can be expressed other than
> as a textual patch against ISO C.
> 
> It can be regarded as a language extension and documented similarly
> to how a sane compiler documentor would do it.
> 
>> "In this article, I will try to explain what this is all about, namely
>> on how a provenance model for pointers interferes with alias analysis of
>> modern compilers.
> 
> Well, no shit; provenance is often dynamic; whereas aliasing analysis
> wants to be static.
> 
>> For those that are not fluent with the terminology or
>> the concept we have a short intro what pointer aliasing is all about, a
>> review of existing tools to help the compiler and inherent difficulties
>> and then the proposed model itself. At the end there is a brief takeaway
>> that explains how to generally avoid complications and loss of
>> optimization opportunities that could result from mis-guided aliasing
>> analysis."
> 
> If you think that certain code could go faster because certain suspected
> aliasing isn't actually taking place, then since C99 you were able to
> spin the roulette wheel and use "restrict".
> 
> So the aliasing analysis and its missed opportunities are the
> programmer's responsibility.
> 
> It's always better for the machine to miss opportunities than to miss
> compile. :)
> 

Agreed.

In my compiler, the default was to use a fairly conservative aliasing 
strategy.

Structure or array loads may be cached, but any "wild store" will flush 
any cached loads.
Store to a free pointer will invalidate every cached load;
Storing to a struct member will flush any cached loads of this member 
(from the same struct type);
Storing to an array will flush any cached loads for which non-alias 
can't be verified;
...

For example:
   x=arr[5];
   arr[10]=w;
   y=arr[5];
The first array load can be reused because it is provable that a store
to arr[10] can't affect a load from arr[5].

A store to a pointer, however, would invalidate the load (unless the 
pointer is marked as restrict or similar).

Array stores and pointer stores could be partially distinguished at this 
level, in that an array store may not necessarily invalidate stuff, but 
a pointer store necessarily will.

With pointer operations, all stores can be assumed potentially aliasing 
unless restrict is used, regardless of type.

The compiler will keep track of which variables have had their addresses
taken, with more conservative semantics used in these cases. If you take
the address of a variable, or load the address of an array, etc., then
their contents will be assumed to be volatile. Say, for example, the
value of a variable may not be held across a pointer store, etc.

If no address has been taken (explicitly) then non-alias may still be 
assumed.

   int arr[16];
   int *ptr;
   arr[5]=10;  //only affects arr
   *ptr=15;    //no effect on arr[5] if arr's address is not taken
   x=arr[5];

But, if this exists somewhere:
   ptr2=arr;
Then this changes 'arr', which may no longer cache loads, and "*ptr=15;" 
would then flush the cached value.

Similarly, casting a struct type to a different type (within a local 
scope) could be assumed to disallow the ability to make assumptions of 
non-alias between members of different structs. Though, partly, one can 
ignore the case of casting "void *" to a struct type.

...

Say:
   Foo *foo;
   Bar *bar;
   x=foo->x;
   bar->y=5;
   ...
"foo->x" remains cached by default, but:
   bar=(Bar *)foo;
Would locally invalidate the assumption that Foo and Bar do not alias.

Whereas, say:
   ptr=(char *)foo;
Would only necessarily break caching for Foo but not necessarily for Bar.

It varies some as to whether some things are evaluated as local (within 
the scope of a function) or global (whole program). Things like taking 
the address of a global variable or array also apply globally in my 
compiler; ... Though, some of this works mostly because the compiler 
lacks true separate compilation.

If something is marked volatile, no caching is performed.

But, yeah, a more conservative model allows a lot of the same 
performance gains as something like TBAA, but without breaking as 
easily. Ideally, one can use type-casts and pointer-based type punning 
however they want and not run into issues.

There may be errors here, as I am writing from memory.

...

#393966

From	David Brown <david.brown@hesbynett.no>
Date	2025-07-09 11:41 +0200
Message-ID	<104ldg8$5f8m$1@dont-email.me>
In reply to	#393965

On 09/07/2025 04:39, BGB wrote:
> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>
...

>>
>> I don't have confidence in an author's understanding of C, if they
>> believe that ISO C defines the behavior of invalid pointers being
>> compared, such that this needs to be rectified by a private "patch"
>> of the text.
>>

You might not be aware of it, but the author Jens Gustedt is a member of 
the C standards committee, and has been for some time.  He is the most 
vocal, public and active member.  I think that suggests he has quite a 
good understanding of C and the ISO standards!  Not everyone agrees 
about his ideas and suggestions about how to move C forward - but that's 
fine (and it's fine by Jens, from what I have read).  That's why there 
is a standards committee, with voting, rather than a BDFL.

>> The concept of pointer provenance can be expressed other than
>> as a textual patch against ISO C.
>>

There have been plenty of papers and blogs written about pointer 
provenance (several by Gustedt) and how it could work.  It's not a very 
easy thing to follow in any format.  A patch to current C standards is 
perhaps the least easy to follow, but it is important for how the 
concept could be added to C.

>> It can be regarded as a language extension and documented similarly
>> to how a sane compiler documentor would do it.
>>
>>> "In this article, I will try to explain what this is all about, namely
>>> on how a provenance model for pointers interferes with alias analysis of
>>> modern compilers.
>>
>> Well, no shit; provenance is often dynamic; whereas aliasing analysis
>> wants to be static.
>>
>>> For those that are not fluent with the terminology or
>>> the concept we have a short intro what pointer aliasing is all about, a
>>> review of existing tools to help the compiler and inherent difficulties
>>> and then the proposed model itself. At the end there is a brief takeaway
>>> that explains how to generally avoid complications and loss of
>>> optimization opportunities that could result from mis-guided aliasing
>>> analysis."
>>
>> If you think that certain code could go faster because certain suspected
>> aliasing isn't actually taking place, then since C99 you were able to
>> spin the roulette wheel and use "restrict".
>>

"restrict" can certainly be useful in some cases.  There are also dozens 
of compiler extensions (such as gcc attributes) for giving the compiler 
extra information about aliasing.

>> So the aliasing analysis and its missed opportunities are the
>> programmer's responsibility.
>>
>> It's always better for the machine to miss opportunities than to miss
>> compile. :)
>>
> 
> Agreed.

It is always better for the toolchain to be able to optimise 
automatically than to require manual intervention by the programmer. 
(It should go without saying that optimisations are only valid if they 
do not affect the observable behaviour of correct code.)  Programmers 
are notoriously bad at figuring out what will affect their code 
efficiency, and will either under-use "restrict" where it could clearly 
be safely used to speed up code, or over-use it resulting in risky code.

If the compiler can't be sure that accesses don't alias, then of course 
it should assume that aliasing is possible.

The idea of pointer provenance is to let compilers (and programmers!) 
have a better understanding of when accesses are guaranteed to be 
alias-free, when they are guaranteed to be aliasing, and when there are 
no guarantees.  This is useful for optimisation and program analysis 
(including static error checking).  The more information the compiler 
has, the better.

> 
> In my compiler, the default was to use a fairly conservative aliasing 
> strategy.
> 
...
> With pointer operations, all stores can be assumed potentially aliasing 
> unless restrict is used, regardless of type.
> 

C does not require that.  And it is rare in practice, IME, for code to 
actually need to access the same data through different lvalue types 
(other than unsigned char).  It is rarer still for it not to be handled 
better using type-punning unions or memcpy() - assuming the compiler 
handles memcpy() decently.
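
As a minimal sketch of the two alternatives named above, here is how the bits of a float could be read without accessing it through an incompatible lvalue type. The function names are invented for the example, and it assumes float and uint32_t have the same size, which the static_assert checks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "example assumes a 32-bit float");

/* memcpy copies the object representation; optimising compilers
   commonly reduce this to a single register move. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reading a union member other than the one last stored reinterprets
   the bytes; this is permitted in C (though not in C++). */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun = { .f = f };
    return pun.u;
}
```

On an implementation using IEEE 754 single precision, both functions return 0x3F800000 for 1.0f.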

Equally, this means that using type-based alias analysis generally gives 
only small efficiency benefits in C code (but more in C++).  The 
majority of situations where alias analysis and a compiler knowledge of 
no aliasing (or always aliasing) would make a difference, are between 
pointers or other lvalues of compatible types.  That is why provenance 
tracking can have potentially significant benefits.

#393967

From	BGB <cr88192@gmail.com>
Date	2025-07-09 21:28 -0500
Message-ID	<104n8he$lb42$1@dont-email.me>
In reply to	#393966

On 7/9/2025 4:41 AM, David Brown wrote:
> On 09/07/2025 04:39, BGB wrote:
>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>
> ...
> 
>>>
>>> I don't have confidence in an author's understanding of C, if they
>>> believe that ISO C defines the behavior of invalid pointers being
>>> compared, such that this needs to be rectified by a private "patch"
>>> of the text.
>>>
> 
> You might not be aware of it, but the author Jens Gustedt is a member of 
> the C standards committee, and has been for some time.  He is the most 
> vocal, public and active member.  I think that suggests he has quite a 
> good understanding of C and the ISO standards!  Not everyone agrees 
> about his ideas and suggestions about how to move C forward - but that's 
> fine (and it's fine by Jens, from what I have read).  That's why there 
> is a standards committee, with voting, rather than a BDFL.
> 
>>> The concept of pointer provenance can be expressed other than
>>> as a textual patch against ISO C.
>>>
> 
> There have been plenty of papers and blogs written about pointer 
> provenance (several by Gustedt) and how it could work.  It's not a very 
> easy thing to follow in any format.  A patch to current C standards is 
> perhaps the least easy to follow, but it is important for how the 
> concept could be added to C.
> 

Admittedly, as of yet, I haven't quite figured out what exactly 
provenance is supposed to be, or how it is supposed to work in practice.

>>> It can be regarded as a language extension and documented similarly
>>> to how a sane compiler documentor would do it.
>>>
>>>> "In this article, I will try to explain what this is all about, namely
>>>> on how a provenance model for pointers interferes with alias analysis of
>>>> modern compilers.
>>>
>>> Well, no shit; provenance is often dynamic; whereas aliasing analysis
>>> wants to be static.
>>>
>>>> For those that are not fluent with the terminology or
>>>> the concept we have a short intro what pointer aliasing is all about, a
>>>> review of existing tools to help the compiler and inherent difficulties
>>>> and then the proposed model itself. At the end there is a brief takeaway
>>>> that explains how to generally avoid complications and loss of
>>>> optimization opportunities that could result from mis-guided aliasing
>>>> analysis."
>>>
>>> If you think that certain code could go faster because certain suspected
>>> aliasing isn't actually taking place, then since C99 you were able to
>>> spin the roulette wheel and use "restrict".
>>>
> 
> "restrict" can certainly be useful in some cases.  There are also dozens 
> of compiler extensions (such as gcc attributes) for giving the compiler 
> extra information about aliasing.
> 

And, the annoyance of them being compiler dependent...

>>> So the aliasing analysis and its missed opportunities are the
>>> programmer's responsibility.
>>>
>>> It's always better for the machine to miss opportunities than to miss
>>> compile. :)
>>>
>>
>> Agreed.
> 
> It is always better for the toolchain to be able to optimise 
> automatically than to require manual intervention by the programmer. (It 
> should go without saying that optimisations are only valid if they do 
> not affect the observable behaviour of correct code.)  Programmers are 
> notoriously bad at figuring out what will affect their code efficiency, 
> and will either under-use "restrict" where it could clearly be safely 
> used to speed up code, or over-use it resulting in risky code.
> 
> If the compiler can't be sure that accesses don't alias, then of course 
> it should assume that aliasing is possible.
> 
> The idea of pointer provenance is to let compilers (and programmers!)
> have a better understanding of when accesses are guaranteed to be
> alias-free, when they are guaranteed to be aliasing, and when there are
> no guarantees.  This is useful for optimisation and program analysis
> (including static error checking).  The more information the compiler
> has, the better.
> 

That is the idea at least.

Though, if one assumes the compiler has non-local visibility, this is a 
problem.

Granted, as long as one can keep using more traditional semantics, 
probably OK.

>>
>> In my compiler, the default was to use a fairly conservative aliasing 
>> strategy.
>>
> ...
>> With pointer operations, all stores can be assumed potentially 
>> aliasing unless restrict is used, regardless of type.
>>
> 
> C does not require that.  And it is rare in practice, IME, for code to 
> actually need to access the same data through different lvalue types 
> (other than unsigned char).  It is rarer still for it not to be handled 
> better using type-punning unions or memcpy() - assuming the compiler 
> handles memcpy() decently.
> 

I take a conservative approach because I want the compiler to be able to 
run code that assumes traditional behavior (like that typical of 1990s 
era compilers, or MSVC).

Granted, it is a tradeoff that a lot of this code needs to be modified 
to work on GCC and Clang (absent the usual need for "-fwrapv 
-fno-strict-aliasing" options).

Granted, there is a command-line option to enable TBAA semantics, just 
it is not the default option in this case (so, in BGBCC, TBAA is opt-in; 
rather than opt-out in GCC and Clang).

BGBCC's handling of memcpy is intermediate:
It can turn it into loads and stores;
But, it can't turn it into a plain register move;
Taking the address of a variable will also cause the variable to be 
loaded/stored every time it is accessed in this function (regardless of 
where it is accessed in said function).

So:
   memcpy(&i, &f, 8);
Will still use memory ops and wreck the performance of both the i and f 
variables.

Meanwhile:
   i=*(uint64_t *)(&f);
Will only wreck the performance of 'f'.

The best option for performance in BGBCC is one of either:
   i=__float64_getbits(f);  //compiler intrinsic
   i=(__m64)f;              //__m64 and __m128 do a raw-bits cast.

Though, these options don't exist in the other compilers.

Implicitly, casting via __m64 or __m128 is a double-cast though. In 
BGBCC, these types don't natively support any operators (so, they are 
basically sort of like the value-equivalents of "void *").

So:
   memcpy(&i, &f, 8);      //best for GCC and Clang
   i=*(uint64_t *)(&f);    //best for MSVC, error-prone in GCC
   i=(__m64)f;             //best for BGBCC, N/A for MSVC or GCC

In a lot of cases, these end up with wrappers.

GCC:
   static inline uint64_t getU64(void *ptr)
   {
     uint64_t v;
     memcpy(&v, ptr, 8);
     return(v);
   }
MSVC or BGBCC:
   #define getU64(ptr)  (*((volatile uint64_t *)(ptr)))

Though, have noted that volatile usually works in GCC as well, though in 
GCC there is no obvious performance difference between volatile and 
memcpy, whereas in MSVC the use of a volatile cast is faster.

Don't want to use static inline functions in BGBCC though, as it still 
doesn't support inline functions in the general case.

Though, a lot of the 90s era code I run doesn't assume inline functions 
either, but instead more often uses big macros:
   #define foo(x, y) \
   do { \
     int z = (x)+(y); \
     ... \
   } while(0)

But, ironically a few cases replaced these macros with functions, as the 
function-call overhead was less than the hassle of having bulky blobs of 
code being duplicated inline each time (and in this case, the function 
call overhead isn't too unreasonable).

> Equally, this means that using type-based alias analysis generally gives 
> only small efficiency benefits in C code (but more in C++).  The 
> majority of situations where alias analysis and a compiler knowledge of 
> no aliasing (or always aliasing) would make a difference, are between 
> pointers or other lvalues of compatible types.  That is why provenance 
> tracking can have potentially significant benefits.
> 

But, the "tracking" part isn't great.

It implies potentially needing to be able to figure out where the value 
came from by walking backwards across the control flow graph (or, worse 
yet, the call graph). There be dragons there...

My model doesn't need tracking, merely keeping track of the relevant 
status flags and similar.

   Did variable 'a' ever have its address taken?
   Was type T ever cast to a different type?
   ...

It is much like the way the compiler may also keep track of other things,
for example whether types like "__int128" or "__float128" were
used, which operators were used on them, etc.

Decided to leave out talking about past language/compiler efforts which 
ran into trouble in these areas.

But, preferable IMO to try to keep things simple.
Which ideally means avoiding things more complicated than what can be 
managed by setting flags or similar.

Even if, yeah, the current strategy also has some drawbacks.

Granted, there are some more complicated things in the compiler, like 
code for tracking the first and last use of a given "version" of a 
variable. Where, every time a variable is assigned, it gets a new 
version; and references to a variable track which version it is accessing.

But this is somewhat relevant to things like register allocator decisions.
If the current variable version isn't used again, it can be evicted;
If it isn't used as an input in the reachable basic-blocks, its current 
value can also be discarded (possibly saving a memory store).

...

#393968

From	David Brown <david.brown@hesbynett.no>
Date	2025-07-10 11:34 +0200
Message-ID	<104o1f2$pkvi$1@dont-email.me>
In reply to	#393967

On 10/07/2025 04:28, BGB wrote:
> On 7/9/2025 4:41 AM, David Brown wrote:
>> On 09/07/2025 04:39, BGB wrote:
>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>
>> ...
>>
>> There have been plenty of papers and blogs written about pointer 
>> provenance (several by Gustedt) and how it could work.  It's not a 
>> very easy thing to follow in any format.  A patch to current C 
>> standards is perhaps the least easy to follow, but it is important for 
>> how the concept could be added to C.
>>
> 
> Admittedly, as of yet, I haven't quite figured out what exactly 
> provenance is supposed to be, or how it is supposed to work in practice.
> 

I've read a bit, but I think it would take quite an effort to understand 
the details.

As a compiler user (albeit one with an interest in compilers and code 
generation), rather than a compiler developer, my attitude to writing C 
code will be the same if and when pointer provenance becomes part of the 
C model and C compiler optimisations - don't lie to your compiler.  If 
you want to do weird stuff behind the compiler's back (and that is 
certainly possible in embedded development), use "volatile" accesses in 
the right places.  So for me, in practical use, pointer provenance will 
simply mean that the compiler can do a bit more optimisation with less 
manual work - and that's a nice thing.  (I'll still be interested in how 
it works, but that's for fun, not for real work.)

>>>>
>>>> If you think that certain code could go faster because certain suspected
>>>> aliasing isn't actually taking place, then since C99 you were able to
>>>> spin the roulette wheel and use "restrict".
>>>>
>>
>> "restrict" can certainly be useful in some cases.  There are also 
>> dozens of compiler extensions (such as gcc attributes) for giving the 
>> compiler extra information about aliasing.
>>
> 
> And, the annoyance of them being compiler dependent...

Sure.  "restrict" is, of course, not compiler dependent - but the effect 
it has on optimisation is compiler dependent.

Often you can also get improved results by manually "caching" data in 
local variables, instead of using pointer or array access directly, thus 
avoiding any extra memory accesses the compiler has to put in just in 
case pointers alias.  But code is neater if you don't have to do that 
kind of thing.
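
The manual "caching" described here might look like the following sketch; the function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Direct version: out[i] and *factor have the same type, so the
   compiler may have to reload *factor after every store to out[i]. */
void scale_direct(int *out, const int *factor, size_t n)
{
    assert(out != NULL && factor != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] *= *factor;
}

/* Cached version: the value is read once into a local whose address
   is never taken, so no store through out can change it. */
void scale_cached(int *out, const int *factor, size_t n)
{
    assert(out != NULL && factor != NULL);
    const int f = *factor;
    for (size_t i = 0; i < n; i++)
        out[i] *= f;
}
```

The two versions give the same result only when factor does not point into out; if it does, the cached version deliberately ignores the mid-loop change, so the rewrite is a statement of intent as well as an optimisation.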

> 
> 
>>>> So the aliasing analysis and its missed opportunities are the
>>>> programmer's responsibility.
>>>>
>>>> It's always better for the machine to miss opportunities than to miss
>>>> compile. :)
>>>>
>>>
>>> Agreed.
>>
>> It is always better for the toolchain to be able to optimise 
>> automatically than to require manual intervention by the programmer. 
>> (It should go without saying that optimisations are only valid if they 
>> do not affect the observable behaviour of correct code.)  Programmers 
>> are notoriously bad at figuring out what will affect their code 
>> efficiency, and will either under-use "restrict" where it could 
>> clearly be safely used to speed up code, or over-use it resulting in 
>> risky code.
>>
>> If the compiler can't be sure that accesses don't alias, then of 
>> course it should assume that aliasing is possible.
>>
>> The idea of pointer provenance is to let compilers (and programmers!)
>> have a better understanding of when accesses are guaranteed to be
>> alias-free, when they are guaranteed to be aliasing, and when there
>> are no guarantees.  This is useful for optimisation and program
>> analysis (including static error checking).  The more information the
>> compiler has, the better.
>>
> 
> That is the idea at least.
> 
> Though, if one assumes the compiler has non-local visibility, this is a 
> problem.
> 
> Granted, as long as one can keep using more traditional semantics, 
> probably OK.

Of course compilers can (and must!) fall back to the "assume accesses 
might alias" approach when they don't have the extra information.  But 
at least for code in the same compilation, they can do better.

And there is a trend amongst those wanting higher performance to use 
link-time optimisation, whole-program optimisation, or similarly named 
techniques to share information across units.  Traditional separate 
compilation to object files then linking by identifier name only is a 
nice clear model, but hugely limiting for both optimisation and static 
error checking.

> 
> 
>>>
>>> In my compiler, the default was to use a fairly conservative aliasing 
>>> strategy.
>>>
>> ...
>>> With pointer operations, all stores can be assumed potentially 
>>> aliasing unless restrict is used, regardless of type.
>>>
>>
>> C does not require that.  And it is rare in practice, IME, for code to 
>> actually need to access the same data through different lvalue types 
>> (other than unsigned char).  It is rarer still for it not to be 
>> handled better using type-punning unions or memcpy() - assuming the 
>> compiler handles memcpy() decently.
>>
> 
> I take a conservative approach because I want the compiler to be able to 
> run code that assumes traditional behavior (like that typical of 1990s 
> era compilers, or MSVC).

Please don't call this "traditional behaviour" of compilers - be honest, 
and call it limited optimisation and dumb translation.  And don't call 
it "code that assumes traditional behaviour" - call it "code written by 
people who don't really understand the language".  Code which assumes 
you can do "extern float x; unsigned int * p = (unsigned int *) &x;" is 
broken code.  It always has been, and always will be - even if it does 
what the programmer wanted on old or limited compilers.

There were compilers in the 1990's that did type-based alias analysis, 
and many other "modern" optimisations - I have used at least one.

It's okay to be conservative in a compiler (especially when high 
optimisation is really difficult!).  It's okay to have command-line 
switches or pragmas to support additional language semantics such as 
supporting access via any lvalue type, or giving signed integer 
arithmetic two's complement wrapping behaviour.  It's okay to make these 
the defaults.

But it is not okay to encourage code to make these compiler-specific 
assumptions without things like a pre-processor check for the specific 
compiler and pragmas to explicitly set the required compiler switches. 
It is not okay to excuse bad code as "traditional style" - that's an 
insult to people who have been writing good C code for decades.

> 
> Granted, it is a tradeoff that a lot of this code needs to be modified 
> to work on GCC and Clang (absent the usual need for "-fwrapv 
> -fno-strict-aliasing" options).
> 
> Granted, there is a command-line option to enable TBAA semantics, just 
> it is not the default option in this case (so, in BGBCC, TBAA is opt-in; 
> rather than opt-out in GCC and Clang).
> 
> BGBCC's handling of memcpy is intermediate:
> It can turn it into loads and stores;
> But, it can't turn it into a plain register move;
> Taking the address of a variable will also cause the variable to be 
> loaded/stored every time it is accessed in this function (regardless of 
> where it is accessed in said function).
> 
> So:
>    memcpy(&i, &f, 8);
> Will still use memory ops and wreck the performance of both the i and f 
> variables.

Well, there you have scope for some useful optimisations (more useful 
than type-based alias analysis).  memcpy does not need to use memory 
accesses unless real memory accesses are actually needed to give the 
observable effects specified in the C standards.

unsigned int f_to_u(float f) {
     unsigned int u;
     memcpy(&u, &f, sizeof(f));
     return u;
}

gcc compiles that to :

f_to_u:
	movd eax, xmm0
	ret

> Meanwhile:
>    i=*(uint64_t *)(&f);
> Will only wreck the performance of 'f'.
> 
> 
> The best option for performance in BGBCC is one of either:
>    i=__float64_getbits(f);  //compiler intrinsic
>    i=(__m64)f;              //__m64 and __m128 do a raw-bits cast.
> 
> Though, these options don't exist in the other compilers.

Such compiler extensions can definitely be useful, but it's even better 
if a compiler can optimise standard code - that way, programmers can 
write code that works correctly on any compiler and is efficient on the 
compilers that they are most interested in.

> 
> Implicitly, casting via __m64 or __m128 is a double-cast though. In 
> BGBCC, these types don't natively support any operators (so, they are 
> basically sort of like the value-equivalents of "void *").
> 
> 
> So:
>    memcpy(&i, &f, 8);      //best for GCC and Clang
>    i=*(uint64_t *)(&f);   //best for MSVC, error-prone in GCC
>    i=(__m64)f;             //best for BGBCC, N/A for MSVC or GCC
> 
> In a lot of cases, these end up with wrappers.
> 
> GCC:
>    static inline uint64_t getU64(void *ptr)
>    {
>      uint64_t v;
>      memcpy(&v, ptr, 8);
>      return(v);
>    }
> MSVC or BGBCC:
>    #define getU64(ptr)  (*((volatile uint64_t *)(ptr)))
> 
> Though, have noted that volatile usually works in GCC as well, though in 
> GCC there is no obvious performance difference between volatile and 
> memcpy, whereas in MSVC the use of a volatile cast is faster.

In gcc, a memcpy here will need to use a single memory read unless 
"getU64" is called with the address of a variable that is already in a 
register (in which case you get a single register move instruction).  A 
volatile read will also do a single memory read - but it might hinder 
other optimisations by limiting the movement of code around.

On MSVC, last I saw (which is a long time ago), any use of "memcpy" will 
be done using an external library function (in a DLL) for generic 
memcpy() use - clearly that will have /massive/ overhead in comparison 
to the single memory read needed for a volatile access.
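
As a minimal sketch of the memcpy-based accessor discussed above (the names get_u64 and double_bits are illustrative, not taken from any of the compilers mentioned, and the bit-pattern check assumes an IEEE 754 binary64 double):

```c
#include <stdint.h>
#include <string.h>

/* Read a 64-bit value from an arbitrary, possibly unaligned, address.
   memcpy is well-defined for any object type, and with a constant size
   an optimising compiler is free to turn it into a single load. */
static inline uint64_t get_u64(const void *ptr)
{
    uint64_t v;
    memcpy(&v, ptr, sizeof v);
    return v;
}

/* Reinterpret the bits of a double as a 64-bit integer.  Here the
   source is a local object, so no real memory access is required. */
static inline uint64_t double_bits(double d)
{
    _Static_assert(sizeof(double) == sizeof(uint64_t),
                   "this sketch assumes a 64-bit double");
    return get_u64(&d);
}
```

Whether a given compiler reduces double_bits() to a register move depends on the compiler and its optimisation settings; the C code itself is valid either way.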

> 
> Don't want to use static inline functions in BGBCC though, as it still 
> doesn't support inline functions in the general case.
>


#393969

From	BGB <cr88192@gmail.com>
Date	2025-07-10 21:09 -0500
Message-ID	<104proo$15qjv$1@dont-email.me>
In reply to	#393968

On 7/10/2025 4:34 AM, David Brown wrote:
> On 10/07/2025 04:28, BGB wrote:
>> On 7/9/2025 4:41 AM, David Brown wrote:
>>> On 09/07/2025 04:39, BGB wrote:
>>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>>
>>> ...
>>>
>>> There have been plenty of papers and blogs written about pointer 
>>> provenance (several by Gustedt) and how it could work.  It's not a 
>>> very easy thing to follow in any format.  A patch to current C 
>>> standards is perhaps the least easy to follow, but it is important 
>>> for how the concept could be added to C.
>>>
>>
>> Admittedly, as of yet, I haven't quite figured out what exactly 
>> provenance is supposed to be, or how it is supposed to work in practice.
>>
> 
> I've read a bit, but I think it would take quite an effort to understand 
> the details.
> 
> As a compiler user (albeit one with an interest in compilers and code 
> generation), rather than a compiler developer, my attitude to writing C 
> code will be the same if and when pointer provenance becomes part of the 
> C model and C compiler optimisations - don't lie to your compiler.  If 
> you want to do weird stuff behind the compiler's back (and that is 
> certainly possible in embedded development), use "volatile" accesses in 
> the right places.  So for me, in practical use, pointer provenance will 
> simply mean that the compiler can do a bit more optimisation with less 
> manual work - and that's a nice thing.  (I'll still be interested in how 
> it works, but that's for fun, not for real work.)
> 

Probably true, but I am also thinking as a compiler developer.

Granted, arguably my compiler isn't great, but this is a different issue.

>>>>>
>>>>> If you think that certain code could go faster because certain 
>>>>> suspected
>>>>> aliasing isn't actually taking place, then since C99 you were able to
>>>>> spin the roulette wheel and use "restrict".
>>>>>
>>>
>>> "restrict" can certainly be useful in some cases.  There are also 
>>> dozens of compiler extensions (such as gcc attributes) for giving the 
>>> compiler extra information about aliasing.
>>>
>>
>> And, the annoyance of them being compiler dependent...
> 
> Sure.  "restrict" is, of course, not compiler dependent - but the effect 
> it has on optimisation is compiler dependent.
> 

I was unclear, I meant more about the use of GCC attributes and similar 
being compiler dependent.

> Often you can also get improved results by manually "caching" data in 
> local variables, instead of using pointer or array access directly, thus 
> avoiding any extra memory accesses the compiler has to put in just in 
> case pointers alias.  But code is neater if you don't have to do that 
> kind of thing.
> 

This is often sort of what I end up doing anyways, because manually 
caching stuff in local variables and being selective about when things 
are loaded or stored to external locations, is often better for 
performance in MSVC.

Doesn't make so much difference with GCC though.

But, for native Windows, I am primarily using MSVC.

For BGBCC, yeah best case performance is manually caching things, 
manually unrolling or modulo scheduling loops, and trying to organize 
expressions such that results are not reused too quickly. Often 
typically breaking up complex expressions into multiple simpler ones so 
as to limit dependencies. Also avoiding any expensive operations 
whenever possible (divide, modulo, or 64 bit multiply or similar), ...

Such a style tends to look kinda like one is writing assembler in C, but 
can often perform OK.
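
A small sketch of the "manual caching" idea, with hypothetical function names chosen for illustration. In the first version the compiler has to allow for dst[i] overlapping *gain; in the second, the value is copied to a local once, so it cannot be changed by the stores.

```c
#include <stddef.h>

/* Without further information the compiler must assume that a store to
   dst[i] might modify *gain, so *gain may be reloaded each iteration. */
void scale_naive(float *dst, const float *src, const float *gain, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *gain;
}

/* Copying *gain into a local first: g cannot be modified through dst,
   so it can stay in a register for the whole loop. */
void scale_cached(float *dst, const float *src, const float *gain, size_t n)
{
    const float g = *gain;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * g;
}
```

The two functions give the same results only when gain does not point into dst; the cached version effectively encodes that assumption by hand.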

>>
>>
>>>>> So the aliasing analysis and its missed opportunities are the
>>>>> programmer's responsibility.
>>>>>
>>>>> It's always better for the machine to miss opportunities than to miss
>>>>> compile. :)
>>>>>
>>>>
>>>> Agreed.
>>>
>>> It is always better for the toolchain to be able to optimise 
>>> automatically than to require manual intervention by the programmer. 
>>> (It should go without saying that optimisations are only valid if 
>>> they do not affect the observable behaviour of correct code.)  
>>> Programmers are notoriously bad at figuring out what will affect 
>>> their code efficiency, and will either under-use "restrict" where it 
>>> could clearly be safely used to speed up code, or over-use it 
>>> resulting in risky code.
>>>
>>> If the compiler can't be sure that accesses don't alias, then of 
>>> course it should assume that aliasing is possible.
>>>
>>> The idea of pointer provenance is to let compilers (and programmers!) 
>>> have a better understanding of when accesses are guaranteed to be 
>>> alias-free, when they are guaranteed to be aliasing, and when there 
>>> are no guarantees.  This is useful for optimisation and program 
>>> analysis (including static error checking).  The more information the 
>>> compiler has, the better.
>>>
>>
>> That is the idea at least.
>>
>> Though, if one assumes the compiler has non-local visibility, this is 
>> a problem.
>>
>> Granted, as long as one can keep using more traditional semantics, 
>> probably OK.
> 
> Of course compilers can (and must!) fall back to the "assume accesses 
> might alias" approach when they don't have the extra information.  But 
> at least for code in the same compilation, they can do better.
> 
> And there is a trend amongst those wanting higher performance to use 
> link-time optimisation, whole-program optimisation, or similarly named 
> techniques to share information across units.  Traditional separate 
> compilation to object files then linking by identifier name only is a 
> nice clear model, but hugely limiting for both optimisation and static 
> error checking.
> 

Ironically, my compiler doesn't do traditional separate compilation in 
the first place.

The frontend (separate stages) are basically:
   Preprocess;
   Parse to AST;
   Flatten AST to a Stack-Based IL.

The handling of value caching and similar is basically in the 3AC stage, 
which doesn't exist until after we are generating the final binary.

If you try to produce object files, or static libraries, in this case, 
they are basically just blobs of this stack-oriented bytecode.

The behavior of the IL is sort of part-way between that of JVM and .NET 
bytecode. Though, has some structural aspects in common with WASM (eg, 
more monolithic bytecode blob rather than structures and tables), but 
the bytecode semantics are more like those of .NET bytecode.

Where, say:
   JVM:
     ".class" files, each representing a single class type;
      Has tables for literals, fields, and methods;
        Each table contains various structures for each item.
      Bytecode operations have explicit types;
      Member and method types, etc, are identified by ASCII strings;
      Handles internal control flow with byte-based branch offsets;
      Stack offset to be the same at all paths to a given spot.
        However, it is not always obvious what the stack-offset is.
   .NET:
     Has various tables embedded in a COFF or PE/COFF;
     Has tables organized similar to those in a relational database;
       Uses a dense packing scheme for table contents.
     Bytecode operations types are carried implicitly on the stack;
     Type signatures are encoded as binary blobs;
     IIRC, mixed offsets and label breaks;
     No items may be on the stack during a branch.
   WASM:
     Big blob of bytecode, which defines other structure;
     Sorta type-inferred, often using wonk to express types implicitly;
     Uses structured control-flow even at the IL level.
   BGBCC / RIL3:
     Big blob of bytecode, defining all structure;
       Internally, compiler uses two large tables for everything;
       One table holds all top-level declarations and similar;
       The other holds any literal values and non-declaration entities.
     Operation types are carried on the stack;
     Types are mostly identified using ASCII strings;
     Control flow via label markers and branch to label ID's;
       Any branch requires a label instruction with a matching ID.
     Usually no items on stack during branch.
       Except for special edge cases like "a?b:c".

Though, part of this was because BGBCC's origins weren't originally as a 
C compiler, rather it was an early fork off of my BGBScript interpreter 
(though, using parts from two different versions of the interpreter, *1).

Where, BGBScript was a language superficially resembling JavaScript.

*1:
   2001: Wrote a Scheme interpreter (early high-school);
     Mostly worked using S-Expression walking and similar.
   2004: Wrote a JS clone around the time I graduated high-school;
     First version used repurposed DOM for the ASTs;
     And AST walking for interpretation;
     Major property: Horridly slow.
   2006: Rewrote BGBScript VM
     Used chunks from the Scheme interpreter;
     Switched over to bytecode;
     Went from conservative to precise GC (...pain...).
   2007:
     BGBCC starts as a fork of the 2004 interpreter,
       With parts of the 2006 interpreter.
     Soon discovers: Interpreted C not a great script language.
   2008/2009: Sorta writes a Doom 3 clone;
     A (mostly final) version of BGBScript is created.
       Mostly borrowing more from ActionScript;
       Gains a JIT compiler;
       Switched back to conservative GC.
   2010/2011: The Doom3 clone partly reworked into a Minecraft clone.
     Uses BGBScript as a major script language, partial impl language.
   2014:
     Rebooted script language, new version more like Java and C#;
     Also a new 3D engine, designed to be simpler and lighter weight.
   2017:
     I started designing my own ISA and similar;
     BGBCC was revived and used as a C compiler for it.
       Had also looked at LCC,
       but noted I had less of a starting point with LCC;
       What LCC could do, seemingly BGBCC already did.

However, partly because of this, BGBCC remained mostly using a bytecode 
format. But, stuff got wonky in the middle part partly as at one point I 
had assumed trying to eliminate the stack IL stage (so the stages aren't 
fully separate in the code).

Also some wonk as there was a point earlier on in BGBCC's development 
where I was trying to move to a purely ASCII IL that was syntactically 
derived from PostScript, but this sorta fell apart.

>>
>>
>>>>
>>>> In my compiler, the default was to use a fairly conservative 
>>>> aliasing strategy.
>>>>
>>> ...
>>>> With pointer operations, all stores can be assumed potentially 
>>>> aliasing unless restrict is used, regardless of type.
>>>>
>>>
>>> C does not require that.  And it is rare in practice, IME, for code 
>>> to actually need to access the same data through different lvalue 
>>> types (other than unsigned char).  It is rarer still for it not to be 
>>> handled better using type-punning unions or memcpy() - assuming the 
>>> compiler handles memcpy() decently.
>>>
>>
>> I take a conservative approach because I want the compiler to be able 
>> to run code that assumes traditional behavior (like that typical of 
>> 1990s era compilers, or MSVC).
> 
> Please don't call this "traditional behaviour" of compilers - be honest, 
> and call it limited optimisation and dumb translation.  And don't call 
> it "code that assumes traditional behaviour" - call it "code written by 
> people who don't really understand the language".  Code which assumes 
> you can do "extern float x; unsigned int * p = (unsigned int *) &x;" is 
> broken code.  It always has been, and always will be - even if it does 
> what the programmer wanted on old or limited compilers.
> 
> There were compilers in the 1990's that did type-based alias analysis, 
> and many other "modern" optimisations - I have used at least one.
> 

Either way, MSVC mostly accepts this sorta code.

Also I think a lot of this code was originally written for compilers 
like Watcom C and similar.

Have noted that there are some behavioral inconsistencies, for example:
Some old code seems to assume that x<<y, y always shifts left but 
modulo to the width of the type. Except, when both x and y are constant, 
code seems to expect it as if it were calculated with a wider type, and 
where negative shifts go in the opposite direction, ... with the result 
then being converted to the final type.

Meanwhile, IIRC, GCC and Clang raise an error if trying to do a large or 
negative shift. MSVC will warn if the shift is large or negative.

Though, in most cases, if the shift is larger than the width of the 
type, or negative, it is usually a programming error.
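
For code that really does depend on one of those behaviours, the two conventions can be written out explicitly so that no shift count ever goes out of range (which would be undefined behaviour in ISO C). This is only a sketch for 32-bit unsigned values; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Shift count taken modulo the type width, as many CPUs do in hardware. */
static inline uint32_t shl32_mod(uint32_t x, int n)
{
    return x << ((unsigned)n & 31u);
}

/* "Wide" convention: a negative count shifts right instead, and any
   count of 32 or more (in either direction) shifts everything out. */
static inline uint32_t shl32_wide(uint32_t x, int n)
{
    if (n <= -32 || n >= 32)
        return 0;
    return (n >= 0) ? (x << n) : (x >> -n);
}
```

With these, shl32_mod(1, 33) behaves like a shift by 1, while shl32_wide(1, 33) gives 0, which makes the difference between the two assumptions visible in the source.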

> It's okay to be conservative in a compiler (especially when high 
> optimisation is really difficult!).  It's okay to have command-line 
> switches or pragmas to support additional language semantics such as 
> supporting access via any lvalue type, or giving signed integer 
> arithmetic two's complement wrapping behaviour.  It's okay to make these 
> the defaults.
> 
> But it is not okay to encourage code to make these compiler-specific 
> assumptions without things like a pre-processor check for the specific 
> compiler and pragmas to explicitly set the required compiler switches. 
> It is not okay to excuse bad code as "traditional style" - that's an 
> insult to people who have been writing good C code for decades.
> 

A lot of the code I have seen from the 90s was written this way.

Though, a lot of it comes from a few major sources:
   id Software;
     Can mostly be considered "standard" practice,
     along with maybe Linux kernel, ...
   Apogee Software
     Well, some of this code is kinda bad.
     This code tends to be dominated by global variables.
     Also treating array bounds as merely a suggestion.
   Raven Software
     Though, most of this was merely modified ID Software code.

Early on, I think I also looked a fair bit at the Linux kernel, and also 
some of the GNU shell utilities and similar (though, the "style" was 
very different vs either the Linux kernel or ID code).

Early on, I had learned C partly by tinkering around with id's code and 
trying to understand what secrets it contained.

But, alas, an example from Wikipedia shows a relevant aspect of id's style:
https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code

Which is, at least to me, what I consider "traditional".

A particularly infamous example of Apogee's style is basically the code 
for the Build Engine. Where, ironically, the dominant versions of 
"modern" Duke Nukem 3D ports are derived from ground-up rewrites rather 
than ports of the original Build Engine code.

But, Build Engine defies one's motivation to try to port it to anything 
that is not 32-bit x86.

Like, at least id's code was mostly 64-bit clean...

Quake 1/2/3 also used a lot of "float *" and "double *" for general 
memory copies in a few places.

I had found though that these would break if compiling on 64-bit and 
then running on a Core2 based system, as seemingly the MOVSS and MOVSD 
instructions on Core2 may mangle bit patterns if they happen to encode a 
NaN (but not on AMD chips).

Though, the general workaround was to replace these cases with "memcpy()".

> 
>>
>> Granted, it is a tradeoff that a lot of this code needs to be modified 
>> to work on GCC and Clang (absent the usual need for "-fwrapv
>> -fno-strict-aliasing" options).
>>
>> Granted, there is a command-line option to enable TBAA semantics, just 
>> it is not the default option in this case (so, in BGBCC, TBAA is
>> opt-in; rather than opt-out in GCC and Clang).
>>
>> BGBCC's handling of memcpy is intermediate:
>> It can turn it into loads and stores;
>> But, it can't turn it into a plain register move;
>> Taking the address of a variable will also cause the variable to be 
>> loaded/stored every time it is accessed in this function (regardless 
>> of where it is accessed in said function).
>>
>> So:
>>    memcpy(&i, &f, 8);
>> Will still use memory ops and wreck the performance of both the i and 
>> f variables.
> 
> Well, there you have scope for some useful optimisations (more useful 
> than type-based alias analysis).  memcpy does not need to use memory 
> accesses unless real memory accesses are actually needed to give the 
> observable effects specified in the C standards.
> 

Possibly, but by the stage we know that it could be turned into a 
reg-reg move (in the final code generation), most of the damage has 
already been done.

Basically, it would likely be necessary to detect and special case this 
scenario at the AST level (probably by turning it into a cast or 
intrinsic). But, usually one doesn't want to add too much of this sort 
of cruft to the AST walk.

But, then, apart from code written to assume GCC or similar, most of the 
code doesn't use memcpy in this way.

So, it would mostly only bring significant advantage if pulling code in 
from GCC land.

Ideally, I would want the compiler to be smaller and faster.

Had at one point started looking into this, but the effort fizzled, and 
now I am not sure I was going in the right direction. It would have gone 
over to separate compilation, but now it seems like compiling via an IL 
stage may in fact be the better option.

Well, with the main debate being what the IL should look like:
   Stack IL: Like JVM, .NET, WASM, existing BGBCC, ...
   Untagged register IL: Like Dalvik
   SSA Form: Like LLVM/Clang.

But, my own experience suggests stack-oriented may still be the better 
option for a compiler IL (with translation into SSA form during the 
backend).

Then, the major difference is seemingly the perennial issue of trying to 
rework how the stack bytecode is packaged.

Which then turns into me wanting to stick it into a TLV package or 
similar, then losing motivation vs not bothering and staying with the 
existing "big blob of bytecode" format.

Well, and the simplest option of "take the existing format and put it in 
a RIFF wrapper" is mostly in "meh, whatever" territory.

Decided to leave out a tangent where BMP vs RDIB led me off on a tangent 
about image and audio formats.

> unsigned int f_to_u(float f) {
>      unsigned int u;
>      memcpy(&u, &f, sizeof(f));
>      return u;
> }
> 
> gcc compiles that to :
> 
> f_to_u:
>      movd eax, xmm0
>      ret
> 

Yeah, it is more clever here, granted.

>> Meanwhile:
>>    i=*(uint64_t *)(&f);
>> Will only wreck the performance of 'f'.
>>
>>
>> The best option for performance in BGBCC is one of either:
>>    i=__float64_getbits(f);  //compiler intrinsic
>>    i=(__m64)f;              //__m64 and __m128 do a raw-bits cast.
>>
>> Though, these options don't exist in the other compilers.
> 
> Such compiler extensions can definitely be useful, but it's even better 
> if a compiler can optimise standard code - that way, programmers can 
> write code that works correctly on any compiler and is efficient on the 
> compilers that they are most interested in.
> 

Possibly.

For "semi-portable" code, usually used MSVC style, partly as by adding 
'volatile' it seemingly also works in GCC. Though, often with macro 
wrappers.

>>
>> Implicitly, casting via __m64 or __m128 is a double-cast though. In 
>> BGBCC, these types don't natively support any operators (so, they are 
>> basically sort of like the value-equivalents of "void *").
>>
>>
>> So:
>>    memcpy(&i, &f, 8);      //best for GCC and Clang
>>    i=*(uint64_t *)(&f);   //best for MSVC, error-prone in GCC
>>    i=(__m64)f;             //best for BGBCC, N/A for MSVC or GCC
>>
>> In a lot of cases, these end up with wrappers.
>>
>> GCC:
>>    static inline uint64_t getU64(void *ptr)
>>    {
>>      uint64_t v;
>>      memcpy(&v, ptr, 8);
>>      return(v);
>>    }
>> MSVC or BGBCC:
>>    #define getU64(ptr)  (*((volatile uint64_t *)(ptr)))
>>
>> Though, have noted that volatile usually works in GCC as well, though 
>> in GCC there is no obvious performance difference between volatile and 
>> memcpy, whereas in MSVC the use of a volatile cast is faster.
> 
> In gcc, a memcpy here will need to use a single memory read unless 
> "getU64" is called with the address of a variable that is already in a 
> register (in which case you get a single register move instruction).  A 
> volatile read will also do a single memory read - but it might hinder 
> other optimisations by limiting the movement of code around.
> 

Possibly.

When I tried benchmarking these before:
   GCC:
     Seemingly no difference between memcpy and volatile;
   MSVC:
     Adding or removing volatile made no real difference;
     Using memcpy is slower.
   BGBCC: Either memcpy or volatile carries an overhead.
     The use of volatile is basically a shotgun de-optimization;
     It doesn't know what to de-optimize, so goes naive for everything.

> On MSVC, last I saw (which is a long time ago), any use of "memcpy" will 
> be done using an external library function (in a DLL) for generic 
> memcpy() use - clearly that will have /massive/ overhead in comparison 
> to the single memory read needed for a volatile access.
> 

It is slightly more clever now, but still not great.
   Will not (always) generate a library call.
   Though, in VS2008 or similar, was always still a library call.
     VS2010 and VS2013 IIRC might setup and use "REP MOVSB" instead.

It will do it inline, but still often:
   Spill variables;
   Load addresses;
   Load from source;
   Store to destination;
   Load value from destination.

What BGBCC gives here is basically similar.

> 
>>
>> Don't want to use static inline functions in BGBCC though, as it still 
>> doesn't support inline functions in the general case.
>>
>


#393970

From	David Brown <david.brown@hesbynett.no>
Date	2025-07-11 10:48 +0200
Message-ID	<104qj4o$1drn4$1@dont-email.me>
In reply to	#393969

On 11/07/2025 04:09, BGB wrote:
> On 7/10/2025 4:34 AM, David Brown wrote:
>> On 10/07/2025 04:28, BGB wrote:
>>> On 7/9/2025 4:41 AM, David Brown wrote:
>>>> On 09/07/2025 04:39, BGB wrote:
>>>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>>>
>>>> ...

>>
>> Please don't call this "traditional behaviour" of compilers - be 
>> honest, and call it limited optimisation and dumb translation.  And 
>> don't call it "code that assumes traditional behaviour" - call it 
>> "code written by people who don't really understand the language".  
>> Code which assumes you can do "extern float x; unsigned int * p = 
>> (unsigned int *) &x;" is broken code.  It always has been, and always 
>> will be - even if it does what the programmer wanted on old or limited 
>> compilers.
>>
>> There were compilers in the 1990's that did type-based alias analysis, 
>> and many other "modern" optimisations - I have used at least one.
>>
> 
> Either way, MSVC mostly accepts this sorta code.

I remember reading in a MSVC blog somewhere that they had no plans to 
introduce type-based alias analysis in the compiler.  The same blog 
article announced their advanced new optimisations that treat signed 
integer overflow as undefined behaviour and explained that they'd been 
doing that for years in a few specific cases.  I think it is fair to 
assume there is a strong overlap between the programmers who think MSVC, 
or C and C++ in general, have two's complement wrapping of signed 
integers when the hardware supports it, and those who think pointer casts 
let you access any data.

And despite the blog, I don't believe MSVC will be restricted that way 
indefinitely.  After all, they encourage the use of clang/llvm for C 
programming, and that does do type-based alias analysis and optimisation.

The C world is littered with code that "used to work" or "works when 
optimisation is not used" because it relied on shite like this - 
unwarranted assumptions about limitations in compiler technology.

> 
> Also I think a lot of this code was originally written for compilers 
> like Watcom C and similar.
> 
> 
> Have noted that there are some behavioral inconsistencies, for example:
> Some old code seems to assume that x<<y, y always shifts left but 
> modulo to the width of the type. Except, when both x and y are constant, 
> code seems to expect it as if it were calculated with a wider type, and 
> where negative shifts go in the opposite direction, ... with the result 
> then being converted to the final type.
> 
> Meanwhile, IIRC, GCC and Clang raise an error if trying to do a large or 
> negative shift. MSVC will warn if the shift is large or negative.
> 
> Though, in most cases, if the shift is larger than the width of the 
> type, or negative, it is usually a programming error.
> 
> 
>> It's okay to be conservative in a compiler (especially when high 
>> optimisation is really difficult!).  It's okay to have command-line 
>> switches or pragmas to support additional language semantics such as 
>> supporting access via any lvalue type, or giving signed integer 
>> arithmetic two's complement wrapping behaviour.  It's okay to make 
>> these the defaults.
>>
>> But it is not okay to encourage code to make these compiler-specific 
>> assumptions without things like a pre-processor check for the specific 
>> compiler and pragmas to explicitly set the required compiler switches. 
>> It is not okay to excuse bad code as "traditional style" - that's an 
>> insult to people who have been writing good C code for decades.
>>
> 
> A lot of the code I have seen from the 90s was written this way.
> 

Yes.  A lot of code from the 90's was written badly.  A lot of code today 
is written badly.  Just because a lot of code was, and still is, written 
that way does not stop it being bad code.

> 
> Though, a lot of it comes from a few major sources:
>    id Software;
>      Can mostly be considered "standard" practice,
>      along with maybe Linux kernel, ...
>    Apogee Software
>      Well, some of this code is kinda bad.
>      This code tends to be dominated by global variables.
>      Also treating array bounds as merely a suggestion.
>    Raven Software
>      Though, most of this was merely modified ID Software code.
> 
> Early on, I think I also looked a fair bit at the Linux kernel, and also 
> some of the GNU shell utilities and similar (though, the "style" was 
> very different vs either the Linux kernel or ID code).
> 

The Linux kernel is not a C style to aspire to.  But they do at least 
try to make such assumptions explicit - the kernel build process makes 
it very clear that it requires the "-fno-strict-aliasing" flag and can 
only be correctly compiled by a specific range of gcc versions (and I 
think experimentally, icc and clang).  Low-level and systems programming 
is sometimes very dependent on the details of the targets, or the 
details of particular compilers - that's okay, as long as it is clear in 
the code and the build instructions.  Then the code (or part of it at 
least) is not written in standard C, but in gcc-specific C or some other 
non-standard dialect.  It is not, however, "traditional C".
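
One way of making such a dependency explicit in the source might look like the sketch below.  The macro PUN_VIA_CAST and the function float_bits are hypothetical names, and the assumption that the cast is acceptable on MSVC is exactly the kind of compiler-specific assumption that the check is there to document; every other compiler gets the standard memcpy path.

```c
#include <stdint.h>
#include <string.h>

#if defined(_MSC_VER) && !defined(__clang__)
/* Compiler-specific path, selected deliberately and only for MSVC.
   Assumption: this compiler does not optimise on type-based aliasing. */
#define PUN_VIA_CAST 1
#else
#define PUN_VIA_CAST 0
#endif

/* Return the object representation of a float as a 32-bit integer. */
static inline uint32_t float_bits(float f)
{
    _Static_assert(sizeof(float) == sizeof(uint32_t),
                   "this sketch assumes a 32-bit float");
#if PUN_VIA_CAST
    return *(volatile uint32_t *)&f;   /* non-standard, MSVC only */
#else
    uint32_t u;
    memcpy(&u, &f, sizeof u);          /* standard C, valid everywhere */
    return u;
#endif
}
```

The point of the structure is that the non-standard access is confined to one clearly marked place, rather than being scattered through the code as an unstated assumption.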

> 
> Early on, I had learned C partly by tinkering around with id's code and 
> trying to understand what secrets it contained.
> 
> 
> But, alas, an example from Wikipedia shows a relevant aspect of id's style:
> https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code
> 
> Which is, at least to me, what I consider "traditional".

The declaration of all the variables at the top of the function is 
"traditional".  The reliance on a specific format for floating point is 
system-dependent code (albeit one that works on a great many systems). 
The use of "long" for a 32-bit integer is both "traditional" /and/ 
system-dependent.  (Though it is possible that earlier in the code there 
are pre-processor checks on the size of "long".)  The use of signed 
integer types for bit manipulation is somewhere between "traditional" 
and "wrong".  The use of pointer casts instead of a type-punning union 
is wrong.  The lack of documentation and comments, use of an unexplained 
magic number, and failure to document or comment the range for which the 
algorithm works and its accuracy limitations are also very traditional - 
a programming tradition that remains strong today.
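
For comparison, here is a minimal sketch of the same trick written without 
the pointer casts.  It is not id's code: the name "rsqrt_approx" is made up 
for illustration, it assumes "float" is a 32-bit IEEE-754 type, and it uses 
memcpy with a uint32_t instead of casting to "long".  The magic constant 
and the single Newton-Raphson step are the ones shown on the Wikipedia page.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32-bit IEEE-754; fail the build otherwise. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "rsqrt_approx assumes a 32-bit float");

/* Hypothetical name; approximates 1/sqrt(number) for positive, finite,
   normal inputs.  Accuracy after one Newton step is limited. */
static float rsqrt_approx(float number)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &number, sizeof bits);   /* well-defined type pun */
    bits = 0x5f3759dfu - (bits >> 1);      /* initial guess, unsigned maths */
    memcpy(&y, &bits, sizeof y);

    y = y * (1.5f - (number * 0.5f * y * y));  /* one Newton-Raphson step */
    return y;
}
```

A call such as rsqrt_approx(4.0f) should give a value close to 0.5.  On 
compilers that optimise small fixed-size memcpy calls, neither copy needs 
to touch memory.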

It is worth remembering that game code (especially commercial game code) 
is seldom written with a view to portability, standard correctness, or 
future maintainability.  It is written to be as fast as possible using 
the compiler chosen at the time, to be built and released as a binary in 
the shortest possible time-to-market.

>>>
>>> So:
>>>    memcpy(&i, &f, 8);
>>> Will still use memory ops and wreck the performance of both the i and 
>>> f variables.
>>
>> Well, there you have scope for some useful optimisations (more useful 
>> than type-based alias analysis).  memcpy does not need to use memory 
>> accesses unless real memory accesses are actually needed to give the 
>> observable effects specified in the C standards.
>>
> 
> Possibly, but by the stage we know that it could be turned into a 
> reg-reg move (in the final code generation), most of the damage has 
> already been done.
> 
> Basically, it would likely be necessary to detect and special case this 
> scenario at the AST level(probably by turning it into a cast or 
> intrinsic). But, usually one doesn't want to add too much of this sort 
> of cruft to the AST walk.
> 

One thing to remember is that functions like "memcpy" don't have to be 
treated as normal functions.  You can handle it as a keyword in your 
compiler if that's easiest.  You can declare it as a macro in your 
<string.h>.  You can combine these, and have compiler-specific 
extensions (keywords, attributes, whatever) and have the declaration as 
a function with attributes.  Your key aim is to spot cases where there 
is a small compile-time constant on the size of the memcpy.

> 
> But, then, apart from code written to assume GCC or similar, most of the 
> code doesn't use memcpy in this way.
> 
> So, it would mostly only bring significant advantage if pulling code in 
> from GCC land.

How well do you handle type-punning unions?  Do they need to be moved 
out to the stack, or can they be handled in registers?

> 
>> unsigned int f_to_u(float f) {
>>      unsigned int u;
>>      memcpy(&u, &f, sizeof(f));
>>      return u;
>> }
>>
>> gcc compiles that to :
>>
>> f_to_u:
>>      movd eax, xmm0
>>      ret
>>
> 
> Yeah, it is more clever here, granted.
> 
>>> Meanwhile:
>>>    i=*(uint64_t *)(&f);
>>> Will only wreck the performance of 'f'.
>>>
>>>
>>> The best option for performance in BGBCC is one of either:
>>>    i=__float64_getbits(f);  //compiler intrinsic
>>>    i=(__m64)f;              //__m64 and __m128 do a raw-bits cast.
>>>
>>> Though, these options don't exist in the other compilers.
>>
>> Such compiler extensions can definitely be useful, but it's even 
>> better if a compiler can optimise standard code - that way, 
>> programmers can write code that works correctly on any compiler and is 
>> efficient on the compilers that they are most interested in.
>>
> 
> Possibly.
> 
> For "semi-portable" code, usually used MSVC style, partly as by adding 
> 'volatile' it seemingly also works in GCC. Though, often with macro 
> wrappers.

Code that has to be widely portable, with an aim to being efficient on 
many compilers and correct on all, always ends up with macro wrappers 
for this kind of thing, defined conditionally according to compiler 
detection.

> 
> 
>>>
>>> Implicitly, casting via __m64 or __m128 is a double-cast though. In 
>>> BGBCC, these types don't natively support any operators (so, they are 
>>> basically sort of like the value-equivalents of "void *").
>>>
>>>
>>> So:
>>>    memcpy(&i, &f, 8);      //best for GCC and Clang
>>>    i=*(uint64_t *)(&f);   //best for MSVC, error-prone in GCC
>>>    i=(__m64)f;             //best for BGBCC, N/A for MSVC or GCC
>>>
>>> In a lot of cases, these end up with wrappers.
>>>
>>> GCC:
>>>    static inline uint64_t getU64(void *ptr)
>>>    {
>>>      uint64_t v;
>>>      memcpy(&v, ptr, 8);
>>>      return(v);
>>>    }
>>> MSVC or BGBCC:
>>>    #define getU64(ptr)  (*((volatile uint64_t *)(ptr)))
>>>
>>> Though, have noted that volatile usually works in GCC as well, though 
>>> in GCC there is no obvious performance difference between volatile 
>>> and memcpy, whereas in MSVC the use of a volatile cast is faster.
>>
>> In gcc, a memcpy here will need to use a single memory read unless 
>> "getU64" is called with the address of a variable that is already in a 
>> register (in which case you get a single register move instruction).  
>> A volatile read will also do a single memory read - but it might 
>> hinder other optimisations by limiting the movement of code around.
>>
> 
> Possibly.
> 
> When I tried benchmarking these before:
>    GCC:
>      Seemingly no difference between memcpy and volatile;

As I explained, that is to be expected in cases where you can't get 
other optimisations that "volatile" would block.  Usually simple timing 
benchmarks have fewer optimisation opportunities than real code.

>    MSVC:
>      Adding or removing volatile made no real difference;

That will, of course, depend on the benchmark.  A volatile access will 
not normally take more time than a non-volatile access.  But 
non-volatile accesses can be re-ordered, combined, or omitted in ways 
that volatile accesses cannot.

>      Using memcpy is slower.

As I explained.

>    BGBCC: Either memcpy or volatile carries an overhead.
>      The use of volatile is basically a shotgun de-optimization;
>      It doesn't know what to de-optimize, so goes naive for everything.
> 

Okay.

> 
>> On MSVC, last I saw (which is a long time ago), any use of "memcpy" 
>> will be done using an external library function (in a DLL) for 
>> generic memcpy() use - clearly that will have /massive/ overhead in 
>> comparison to the single memory read needed for a volatile access.
>>
> 
> It is slightly more clever now, but still not great.
>    Will not (always) generate a library call.
>    Though, in VS2008 or similar, was always still a library call.
>      VS2010 and VS2013 IIRC might setup and use "REP MOVSB" instead.
> 
> It will do it inline, but still often:
>    Spill variables;
>    Load addresses;
>    Load from source;
>    Store to destination;
>    Load value from destination.
> 
> What BGBCC gives here is basically similar.
> 
> 
>>
>>>
>>> Don't want to use static inline functions in BGBCC though, as it 
>>> still doesn't support inline functions in the general case.
>>>
>>
>


#393971

From	BGB <cr88192@gmail.com>
Date	2025-07-11 14:05 -0500
Message-ID	<104rna7$1lh8i$1@dont-email.me>
In reply to	#393970

On 7/11/2025 3:48 AM, David Brown wrote:
> On 11/07/2025 04:09, BGB wrote:
>> On 7/10/2025 4:34 AM, David Brown wrote:
>>> On 10/07/2025 04:28, BGB wrote:
>>>> On 7/9/2025 4:41 AM, David Brown wrote:
>>>>> On 09/07/2025 04:39, BGB wrote:
>>>>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>>>>
>>>>> ...
> 
>>>
>>> Please don't call this "traditional behaviour" of compilers - be 
>>> honest, and call it limited optimisation and dumb translation.  And 
>>> don't call it "code that assumes traditional behaviour" - call it 
>>> "code written by people who don't really understand the language". 
>>> Code which assumes you can do "extern float x; unsigned int * p = 
>>> (unsigned int *) &x;" is broken code.  It always has been, and always 
>>> will be - even if it does what the programmer wanted on old or 
>>> limited compilers.
>>>
>>> There were compilers in the 1990's that did type-based alias 
>>> analysis, and many other "modern" optimisations - I have used at 
>>> least one.
>>>
>>
>> Either way, MSVC mostly accepts this sorta code.
> 
> I remember reading in a MSVC blog somewhere that they had no plans to 
> introduce type-based alias analysis in the compiler.  The same blog 
> article announced their advanced new optimisations that treat signed 
> integer overflow as undefined behaviour and explained that they'd been 
> doing that for years in a few specific cases.  I think it is fair to 
> assume there is a strong overlap between the programmers who think MSVC, 
> or C and C++ in general, have two's complement wrapping of signed 
> integers when the hardware supports it, as those who think pointer casts 
> let you access any data.
> 
> And despite the blog, I don't believe MSVC will be restricted that way 
> indefinitely.  After all, they encourage the use of clang/llvm for C 
> programming, and that does do type-based alias analysis and optimisation.
> 
> The C world is littered with code that "used to work" or "works when 
> optimisation is not used" because it relied on shite like this - 
> unwarranted assumptions about limitations in compiler technology.
> 

This is why "-fwrapv -fno-strict-aliasing" need to be used so often when 
compiling pretty much anything non-trivial with GCC or Clang...

>>
>> Also I think a lot of this code was originally written for compilers 
>> like Watcom C and similar.
>>
>>
>> Have noted that there are some behavioral inconsistencies, for example:
>> Some old code seems to assume that x<<y always shifts left, but with y 
>> taken modulo the width of the type. Except, when both x and y are 
>> constant, code seems to expect it as if it were calculated with a 
>> wider type, and where negative shifts go in the opposite 
>> direction, ... with the result then being converted to the final type.
>>
>> Meanwhile, IIRC, GCC and Clang raise an error if trying to do a large 
>> or negative shift. MSVC will warn if the shift is large or negative.
>>
>> Though, in most cases, if the shift is larger than the width of the 
>> type, or negative, it is usually a programming error.
>>
>>
>>> It's okay to be conservative in a compiler (especially when high 
>>> optimisation is really difficult!).  It's okay to have command-line 
>>> switches or pragmas to support additional language semantics such as 
>>> supporting access via any lvalue type, or giving signed integer 
>>> arithmetic two's complement wrapping behaviour.  It's okay to make 
>>> these the defaults.
>>>
>>> But it is not okay to encourage code to make these compiler-specific 
>>> assumptions without things like a pre-processor check for the 
>>> specific compiler and pragmas to explicitly set the required compiler 
>>> switches. It is not okay to excuse bad code as "traditional style" - 
>>> that's an insult to people who have been writing good C code for 
>>> decades.
>>>
>>
>> A lot of the code I have seen from the 90s was written this way.
>>
> 
> Yes.  A lot of code from the 90's was written badly.  A lot of code today 
> is written badly.  Just because a lot of code was, and still is, written 
> that way does not stop it being bad code.
> 

There is a tradeoff between "bad" code, and "code that gives the best 
performance" (but doesn't necessarily follow some people's definitions 
of "good").

If the idioms for working around TBAA and similar end up costing more in 
terms of performance than the performance gains of TBAA, it is not a win.

>>
>> Though, a lot of it comes from a few major sources:
>>    id Software;
>>      Can mostly be considered "standard" practice,
>>      along with maybe Linux kernel, ...
>>    Apogee Software
>>      Well, some of this code is kinda bad.
>>      This code tends to be dominated by global variables.
>>      Also treating array bounds as merely a suggestion.
>>    Raven Software
>>      Though, most of this was merely modified ID Software code.
>>
>> Early on, I think I also looked a fair bit at the Linux kernel, and 
>> also some of the GNU shell utilities and similar (though, the "style" 
>> was very different vs either the Linux kernel or ID code).
>>
> 
> The Linux kernel is not a C style to aspire to.  But they do at least 
> try to make such assumptions explicit - the kernel build process makes 
> it very clear that it requires the "-fno-strict-aliasing" flag and can 
> only be correctly compiled by a specific range of gcc versions (and I 
> think experimentally, icc and clang).  Low-level and systems programming 
> is sometimes very dependent on the details of the targets, or the 
> details of particular compilers - that's okay, as long as it is clear in 
> the code and the build instructions.  Then the code (or part of it at 
> least) is not written in standard C, but in gcc-specific C or some other 
> non-standard dialect.  It is not, however, "traditional C".
> 

I sort of consider it "traditional", along with things like Doom and 
Quake and similar.

Granted, Doom wasn't too hard to get to be 64-bit clean and to work with 
TBAA. For example, I have builds that work moderately correctly in WSL 
with Xming. There was a "gotcha" with porting the Doom code to ARM based 
targets though, as ARM assumed char was unsigned by default, but the 
Doom engine assumed 'char' as signed, so needed to change some things to 
use 'signed char' explicitly.

Quake requires getting a little hacky in a few places (there are a few 
"problem areas" in the design of their "progs.dat" VM).

Though, not as bad as the "QVM" in "Quake 3 Arena", for which I was 
unsure of a good way to make this work on a 64-bit machine. Basically, 
it compiles C code for a 32-bit VM, which then promptly tries to share 
pointers directly with the rest of the Quake 3 engine.

At the time, the amount of wonk I would have needed to try to make QVM 
work in a 64-bit machine didn't seem worthwhile. Very likely it would 
have meant needing to add a virtual memory space and address-translation 
schemes, ... Though, Quake 3 allowed the fallback of going to a Quake 2 
like strategy and compiling the game scripts as native DLL's / SO's.

Well, either this, or force all heap memory to be allocated in the low 
4GB or similar. I don't usually like this option.

> 
>>
>> Early on, I had learned C partly by tinkering around with id's code 
>> and trying to understand what secrets it contained.
>>
>>
>> But, alas, an example from Wikipedia shows a relevant aspect of id's 
>> style:
>> https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code
>>
>> Which is, at least to me, what I consider "traditional".
> 
> The declaration of all the variables at the top of the function is 
> "traditional".  The reliance on a specific format for floating point is 
> system-dependent code (albeit one that works on a great many systems). 
> The use of "long" for a 32-bit integer is both "traditional" /and/ 
> system-dependent.  (Though it is possible that earlier in the code there 
> are pre-processor checks on the size of "long".)  The use of signed 
> integer types for bit manipulation is somewhere between "traditional" 
> and "wrong".  The use of pointer casts instead of a type-punning union 
> is wrong.  The lack of documentation and comments, use of an unexplained 
> magic number, and failure to document or comment the range for which the 
> algorithm works and its accuracy limitations are also very traditional - 
> a programming tradition that remains strong today.
> 
> It is worth remembering that game code (especially commercial game code) 
> is seldom written with a view to portability, standard correctness, or 
> future maintainability.  It is written to be as fast as possible using 
> the compiler chosen at the time, to be built and released as a binary in 
> the shortest possible time-to-market.
> 

It is a lot of the code I have seen, or messed with porting to my ISA.

So, a lot of BGBCC's design priorities were partly motivated by making 
this sort of code-porting relatively low hassle.

Though, my usual porting process was, roughly:
   Port code to Win64 via MSVC (main initial hassle, 64-bit stuff);
   Port code to my ISA (BGBCC);
   Port code to modern GCC (often more of a pain).
     Though, "-fwrapv -fno-strict-aliasing" makes it easier.
     Next stage is often to make it work without these.

Though, in some amount of the code for my own ISA, the use of direct 
pointer casting is the main option (though, often using 'volatile').

In other cases, the use of "memcpy()" is more of an "#ifdef __GNUC__" 
thing (since both GCC and Clang define this macro).

>>>>
>>>> So:
>>>>    memcpy(&i, &f, 8);
>>>> Will still use memory ops and wreck the performance of both the i 
>>>> and f variables.
>>>
>>> Well, there you have scope for some useful optimisations (more useful 
>>> than type-based alias analysis).  memcpy does not need to use memory 
>>> accesses unless real memory accesses are actually needed to give the 
>>> observable effects specified in the C standards.
>>>
>>
>> Possibly, but by the stage we know that it could be turned into a 
>> reg-reg move (in the final code generation), most of the damage has 
>> already been done.
>>
>> Basically, it would likely be necessary to detect and special case 
>> this scenario at the AST level(probably by turning it into a cast or 
>> intrinsic). But, usually one doesn't want to add too much of this sort 
>> of cruft to the AST walk.
>>
> 
> One thing to remember is that functions like "memcpy" don't have to be 
> treated as normal functions.  You can handle it as a keyword in your 
> compiler if that's easiest.  You can declare it as a macro in your 
> <strings.h>.  You can combine these, and have compiler-specific 
> extensions (keywords, attributes, whatever) and have the declaration as 
> a function with attributes.  Your key aim is to spot cases where there 
> is a small compile-time constant on the size of the memcpy.
> 

I guess it is possible.

>>
>> But, then, apart from code written to assume GCC or similar, most of 
>> the code doesn't use memcpy in this way.
>>
>> So, it would mostly only bring significant advantage if pulling code 
>> in from GCC land.
> 
> How well do you handle type-punning unions?  Do they need to be moved 
> out to the stack, or can they be handled in registers?
> 

Generally, yes; they get moved out to the stack.

BGBCC's handling of local structs and unions is along the lines of:
   Creates an internal hidden pointer;
   Reserve space in the frame;
   Initialize pointer to address in frame;
   Struct operations go through this pointer.

So, basically, it is sort of like:
   Foo foo;
Became:
   char t_foo[sizeof(Foo)];
   Foo *foo = ((Foo *)&t_foo);
A similar use of hidden internal pointers also applies to arrays.

Note that these pointers do not exist within structs or similar (so 
inline structs or arrays are stored as in typical ABIs), but rather 
merely exist within the local frame (which basically always handles 
structures and arrays via pointers).

So, from the compiler's POV, "foo.x" and "foo->x" are basically 
equivalent, and BGBCC doesn't actually bother to distinguish them 
(occasionally pops up as an issue when porting code from BGBCC to other 
compilers, as sometimes the wrong operator ends up being used, and the 
other compilers actually care about the distinction).

Union is basically a struct where all members are at offset 0.

So, type-punning via a union will go to memory and load from memory.

However, since the union doesn't require taking the address of the 
variables (which adds additional penalty) it would still be preferable 
in terms of performance vs the "memcpy()" option.
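
As a rough illustration of the union option, a minimal sketch (the names 
"FloatBits" and "float_to_bits_union" are made up here, and it assumes a 
32-bit IEEE-754 "float"):

```c
#include <stdint.h>

/* Hypothetical helper type: both members live at offset 0. */
typedef union {
    float    f;
    uint32_t u;
} FloatBits;

/* Store as float, read back the same bytes as an unsigned integer.
   No address of the caller's variable is taken. */
static uint32_t float_to_bits_union(float f)
{
    FloatBits fb;
    fb.f = f;
    return fb.u;
}
```

With the handling described above, BGBCC would still do a store and a load 
through the frame for "fb" here, but the caller's own variable can stay in 
a register.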

Note that for struct assignment, or returning a struct, it basically 
behaves as if a "memcpy()" call were used at that location.

In the RISC-V based targets, BGBCC does differ slightly from the 
official ABI in the handling of struct passing and return by value.

Rule is basically always:
   1-8 bytes: Register
   9-16 bytes: Register pair
   17+: Reference
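
Written out as code, the size rule above is roughly the following. This is 
only a sketch for illustration; the enum and function names are made up 
and are not BGBCC internals:

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    PASS_IN_REGISTER,       /* 1-8 bytes  */
    PASS_IN_REGISTER_PAIR,  /* 9-16 bytes */
    PASS_BY_REFERENCE       /* 17+ bytes  */
} PassKind;

/* Classify a struct by size only, following the three-way rule. */
static PassKind classify_struct_pass(size_t size_bytes)
{
    if (size_bytes <= 8)
        return PASS_IN_REGISTER;
    if (size_bytes <= 16)
        return PASS_IN_REGISTER_PAIR;
    return PASS_BY_REFERENCE;
}
```

Note that this only looks at the size; it ignores member types, which is 
part of what makes it simpler than the official ABI rules.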

For struct return, it passes the destination address in X28 (if it can't 
fit into a register pair, X11:X10).

The official RV ABI involves passing and returning structs via on-stack 
copy (inline to the argument list), and return by using the argument 
space for the returned struct.

But, BGBCC does this more like in the Win64 ABI.

Also, technically using a variant of the RV64 Soft-Float ABI as even 
though I am using an ISA with FPU registers, the Hard-Float ABI is worse 
for my uses.

It is actually more tempting to instead use F10..F17 for more arguments, 
say:
   X10..X17: Arguments 1-8
   F10..F17: Arguments 9-16
Vs. spilling any arguments past the first 8 to the stack (where 16 register 
arguments allow full register coverage of generally around 99.95% of all 
functions).

Where, one difference is "automatic COM-Interface thunks";
If one does the ABI like I have done, the thunks are a lot easier.
But, if one tries to do it like in the proper RV64 Hard-Float ABI, 
automatic COM thunks are going to be a pain (the thunks need to be 
generated one-off and the generator needs to actually know the argument 
list for each method, ...).

Also doing it my way makes the varargs mechanism easier as well, ...

Note though that within the local frame, if it is a struct or union, it 
is still generally always handled as a reference though. But, register 
or register pair is still generally preferable to using a memory-memory 
copy for small structs for performance reasons.

A gray area is SIMD vector types, but I can note that BGBCC does not 
treat vectors like structs. Instead, the SIMD vectors are purely value 
types (or, rvalue).

This is also one significant area, say, where "__m64" and "__m128" 
differ between BGBCC and MSVC, where in BGBCC they are value-like 
(rvalue), whereas in MSVC they are more struct-like (lvalue).

So, for example, in BGBCC one can do:
   __vec4f v0, v1;
   v1=v0.zyxw;
Or:
   v1=(__vec4f){v0.z,v0.y,v0.x,v0.w};
But, not:
   v1.x=v0.z;  v1.y=v0.y;
   v1.z=v0.x;  v1.w=v0.w;
...

The latter could possibly be faked, but:
   v1.x=v0.z;
Would be essentially equivalent to:
   v1=(__vec4f){v0.z,v1.y,v1.z,v1.w};

...

Some of this is partly because BGBCC essentially has a set of "core 
types" that divide up all other types:
   I: Int32 and smaller;
   L: Int64 and related;
   D: "double" and smaller floating-point types;
   A: Anything pointer-like (includes structs and arrays);
   X: Anything 128-bit goes here.

It was originally ILFDAX, but F got subsumed into D.

Can note that ILFDA is basically the same as the core type model used by 
the Java VM, and static-typed variants of the BGBScript VM (which had 
moved to a model of being JVM-like internally, but using a 
JavaScript/ActionScript style language on the surface).

But, can note that some of my early compiler and VM design stuff was 
influenced by the design of the JVM.

But, here, the use of a simplified set of "core types" can simplify the 
general design of the compiler and type system, since any sub-types of a 
given core type can mostly be handled with similar rules.

Can also note that the type-signature notation was also partly 
influenced by both the JVM and C++ name-mangling.

So, say:
   int          i0;  //"i"
   unsigned int i1;  //"j"
   double       f0;  //"d"
   int        *pi0;  //"Pi"
   int (*f)(long,float);  //"(lf)i"
   ...

>>
>>> unsigned int f_to_u(float f) {
>>>      unsigned int u;
>>>      memcpy(&u, &f, sizeof(f));
>>>      return u;
>>> }
>>>
>>> gcc compiles that to :
>>>
>>> f_to_u:
>>>      movd eax, xmm0
>>>      ret
>>>
>>
>> Yeah, it is more clever here, granted.
>>
>>>> Meanwhile:
>>>>    i=*(uint64_t *)(&f);
>>>> Will only wreck the performance of 'f'.
>>>>
>>>>
>>>> The best option for performance in BGBCC is one of either:
>>>>    i=__float64_getbits(f);  //compiler intrinsic
>>>>    i=(__m64)f;              //__m64 and __m128 do a raw-bits cast.
>>>>
>>>> Though, these options don't exist in the other compilers.
>>>
>>> Such compiler extensions can definitely be useful, but it's even 
>>> better if a compiler can optimise standard code - that way, 
>>> programmers can write code that works correctly on any compiler and 
>>> is efficient on the compilers that they are most interested in.
>>>
>>
>> Possibly.
>>
>> For "semi-portable" code, usually used MSVC style, partly as by adding 
>> 'volatile' it seemingly also works in GCC. Though, often with macro 
>> wrappers.
> 
> Code that has to be widely portable, with an aim to being efficient on 
> many compilers and correct on all, always ends up with macro wrappers 
> for this kind of thing, defined conditionally according to compiler 
> detection.
> 

Generally, yes.

The use of macro wrappers often ends up as a "necessary evil".
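
A minimal sketch of what such a wrapper can look like, using the getU64 
name from earlier in the thread. The exact conditions are an assumption 
for illustration (a BGBCC check would need whatever macro that compiler 
defines, which is left out here):

```c
#include <stdint.h>
#include <string.h>

#if defined(_MSC_VER) && !defined(__clang__)
/* MSVC style: direct volatile access.  This relies on the compiler
   tolerating the cast; it is not guaranteed by standard C. */
#define getU64(ptr)  (*((const volatile uint64_t *)(ptr)))
#else
/* GCC, Clang, and anything unknown: memcpy is correct everywhere,
   and compilers that optimise it turn this into a single load. */
static inline uint64_t getU64(const void *ptr)
{
    uint64_t v;
    memcpy(&v, ptr, sizeof v);
    return v;
}
#endif
```

Callers then write "i = getU64(&f);" and do not need to care which form 
was picked.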

>>
>>
>>>>
>>>> Implicitly, casting via __m64 or __m128 is a double-cast though. In 
>>>> BGBCC, these types don't natively support any operators (so, they 
>>>> are basically sort of like the value-equivalents of "void *").
>>>>
>>>>
>>>> So:
>>>>    memcpy(&i, &f, 8);      //best for GCC and Clang
>>>>    i=*(uint64_t *)(&f);   //best for MSVC, error-prone in GCC
>>>>    i=(__m64)f;             //best for BGBCC, N/A for MSVC or GCC
>>>>
>>>> In a lot of cases, these end up with wrappers.
>>>>
>>>> GCC:
>>>>    static inline uint64_t getU64(void *ptr)
>>>>    {
>>>>      uint64_t v;
>>>>      memcpy(&v, ptr, 8);
>>>>      return(v);
>>>>    }
>>>> MSVC or BGBCC:
>>>>    #define getU64(ptr)  (*((volatile uint64_t *)(ptr)))
>>>>
>>>> Though, have noted that volatile usually works in GCC as well, 
>>>> though in GCC there is no obvious performance difference between 
>>>> volatile and memcpy, whereas in MSVC the use of a volatile cast is 
>>>> faster.
>>>
>>> In gcc, a memcpy here will need to use a single memory read unless 
>>> "getU64" is called with the address of a variable that is already in 
>>> a register (in which case you get a single register move 
>>> instruction). A volatile read will also do a single memory read - but 
>>> it might hinder other optimisations by limiting the movement of code 
>>> around.
>>>
>>
>> Possibly.
>>
>> When I tried benchmarking these before:
>>    GCC:
>>      Seemingly no difference between memcpy and volatile;
> 
> As I explained, that is to be expected in cases where you can't get 
> other optimisations that "volatile" would block.  Usually simple timing 
> benchmarks have fewer optimisation opportunities than real code.
> 
>>    MSVC:
>>      Adding or removing volatile made no real difference;
> 
> That will, of course, depend on the benchmark.  A volatile access will 
> not normally take more time than a non-volatile access.  But 
> non-volatile accesses can be re-ordered, combined, or omitted in ways 
> that volatile accesses cannot.
> 

Yeah, pretty much.

>>      Using memcpy is slower.
> 
> As I explained.
> 
>>    BGBCC: Either memcpy or volatile carries an overhead.
>>      The use of volatile is basically a shotgun de-optimization;
>>      It doesn't know what to de-optimize, so goes naive for everything.
>>
> 
> Okay.
> 

Other compilers might be more clever and be like, "OK, we only need to 
de-optimize this particular memory reference".

BGBCC is not that clever.

Though, generally the impact of, say:
   i=*(volatile uint32_t *)ptr;

Will be smaller than:
   volatile uint32_t *vptr;
   ...
   vptr=ptr;
   ...
   i=*vptr;

Since here BGBCC will be naive about everything having to do with vptr, 
rather than just the memory load. And, at the lower levels, it doesn't 
track which specific accesses are volatile, so effectively needs to 
disable load/store reordering for the whole basic-block.

Though, mostly applies to my XG1 and XG2 ISAs; for RISC-V and XG3 
targets, the relevant instruction shuffling logic is not currently 
implemented (or, technically, at present this penalty always is in 
effect for these targets).

The idea being to try to reorder instructions to try to fit better into 
an in-order CPU pipeline. Though, it seems the in-order superscalar 
still does semi-OK even with naive instruction ordering.

Where, XG3 is basically my ISA shoehorned into the same encoding space 
as RISC-V, but replacing the 16-bit "compressed" instructions. The 
compiler then treats it like it was compiling for RISC-V, but has the 
ability to do various things that don't exist in normal RISC-V 
(producing a hybrid instruction stream).

Despite being kind of an ugly hack, XG3 does seem promising for my uses 
though.

>>
>>> On MSVC, last I saw (which is a long time ago), any use of "memcpy" 
>>> will be done using an external library function (in a DLL) for 
>>> generic memcpy() use - clearly that will have /massive/ overhead in 
>>> comparison to the single memory read needed for a volatile access.
>>>
>>
>> It is slightly more clever now, but still not great.
>>    Will not (always) generate a library call.
>>    Though, in VS2008 or similar, was always still a library call.
>>      VS2010 and VS2013 IIRC might setup and use "REP MOVSB" instead.
>>
>> It will do it inline, but still often:
>>    Spill variables;
>>    Load addresses;
>>    Load from source;
>>    Store to destination;
>>    Load value from destination.
>>
>> What BGBCC gives here is basically similar.
>>
>>
>>>
>>>>
>>>> Don't want to use static inline functions in BGBCC though, as it 
>>>> still doesn't support inline functions in the general case.
>>>>
>>>
>>
>


#393972

From	"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Date	2025-07-12 17:30 -0700
Message-ID	<104uun7$2dmgj$1@dont-email.me>
In reply to	#393970

On 7/11/2025 1:48 AM, David Brown wrote:
> On 11/07/2025 04:09, BGB wrote:
>> On 7/10/2025 4:34 AM, David Brown wrote:
>>> On 10/07/2025 04:28, BGB wrote:
>>>> On 7/9/2025 4:41 AM, David Brown wrote:
>>>>> On 09/07/2025 04:39, BGB wrote:
>>>>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>>>>
>>>>> ...
> 
>>>
[...]
> It is worth remembering that game code (especially commercial game code) 
> is seldom written with a view to portability, standard correctness, or 
> future maintainability.  It is written to be as fast as possible using 
> the compiler chosen at the time, to be built and released as a binary in 
> the shortest possible time-to-market.

Love this one:

https://forums.parallax.com/discussion/147522/dog-leg-hypotenuse-approximation

[...]


#393973

From	BGB <cr88192@gmail.com>
Date	2025-07-13 15:10 -0500
Message-ID	<10513rs$2uqg3$1@dont-email.me>
In reply to	#393972

On 7/12/2025 7:30 PM, Chris M. Thomasson wrote:
> On 7/11/2025 1:48 AM, David Brown wrote:
>> On 11/07/2025 04:09, BGB wrote:
>>> On 7/10/2025 4:34 AM, David Brown wrote:
>>>> On 10/07/2025 04:28, BGB wrote:
>>>>> On 7/9/2025 4:41 AM, David Brown wrote:
>>>>>> On 09/07/2025 04:39, BGB wrote:
>>>>>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>>>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>>>>>
>>>>>> ...
>>
>>>>
> [...]
>> It is worth remembering that game code (especially commercial game 
>> code) is seldom written with a view to portability, standard 
>> correctness, or future maintainability.  It is written to be as fast 
>> as possible using the compiler chosen at the time, to be built and 
>> released as a binary in the shortest possible time-to-market.
> 
> Love this one:
> 
> https://forums.parallax.com/discussion/147522/dog-leg-hypotenuse-approximation
> 
> [...]

I looked at the post and initially didn't recognize it as I didn't use 
the same term (I usually called it "distance approximation").

But, yeah, variations of the first algorithm were used frequently in a 
lot of this code. For example, it was the main way of calculating 
distance in Wolf3D and ROTT.

Not sure about Doom, would need to find the relevant code, but wouldn't 
be surprised.
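
For reference, a minimal sketch of one common form of this approximation (longer leg plus half of the shorter leg). This is only an illustration of the general idea, not code taken from Wolf3D, ROTT, or the linked post; the name approx_dist is made up here:

```c
#include <stdint.h>

/* Approximate sqrt(dx*dx + dy*dy) without a multiply or square root:
 * longer leg plus half of the shorter leg. In exact arithmetic this
 * variant overestimates by at most about 12%; integer truncation of
 * the half can push small results slightly low. */
static int32_t approx_dist(int32_t dx, int32_t dy)
{
    int32_t hi, lo;

    if (dx < 0) dx = -dx;
    if (dy < 0) dy = -dy;
    hi = (dx > dy) ? dx : dy;
    lo = (dx > dy) ? dy : dx;
    return hi + (lo >> 1);
}
```

Other variants trade a few more shifts and adds for a smaller error (for example, weighting the shorter leg by 3/8 instead of 1/2).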

In terms of coding style, ROTT had considerable "hair".

In my initial porting effort, spent quite a long time trying to deal 
with all of the places where memory access was going out of bounds 
(though, in many cases, just ended up masking the access by the size of 
the array or similar; or in some cases rewriting the logic to not depend 
on particular out-of-bounds access behaviors).
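
A rough sketch of the masking approach, assuming a power-of-two array size (TABLE_SIZE, table and table_read are hypothetical names, not anything from the ROTT source):

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 256u   /* hypothetical size; must be a power of two */

static uint8_t table[TABLE_SIZE];

/* Wrap any index back into range instead of reading past the end of
 * the array. This hides the out-of-bounds access; it does not make the
 * original index arithmetic correct. */
static uint8_t table_read(size_t i)
{
    return table[i & (TABLE_SIZE - 1u)];
}
```

For sizes that are not a power of two, a modulo or an explicit clamp would be needed instead of the mask.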

Also had to deal with the mess that a lot of the drawing code depended 
on the specific way they were using the VGA hardware:
320x200 on screen, but 384x200 in memory, but divided into 4 planes of 
96x200 (and doing tricks with the plane mask register).
Kinda needed to write wrappers to fake a lot of this crap.
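
Something along these lines, as a sketch of what such a wrapper might look like. It assumes the usual unchained ("Mode X" style) layout, where pixel x lives in plane (x & 3) at byte offset y*96 + (x >> 2); all names are invented for illustration:

```c
#include <stdint.h>

enum { PLANES = 4, STRIDE = 96, ROWS = 200 };   /* 4 x 96 = 384 pixels per row */

static uint8_t vga_planes[PLANES][STRIDE * ROWS];  /* emulated video memory  */
static uint8_t plane_mask = 0x0F;                  /* emulated plane mask reg */

/* One "hardware" write lands in every plane enabled in the mask. */
static void vga_write(unsigned offset, uint8_t value)
{
    int p;
    for (p = 0; p < PLANES; p++)
        if (plane_mask & (1u << p))
            vga_planes[p][offset] = value;
}

static void put_pixel(unsigned x, unsigned y, uint8_t color)
{
    plane_mask = (uint8_t)(1u << (x & 3u));
    vga_write(y * STRIDE + (x >> 2), color);
}

static uint8_t get_pixel(unsigned x, unsigned y)
{
    return vga_planes[x & 3u][y * STRIDE + (x >> 2)];
}
```

With all four mask bits set, a single vga_write fills 4 adjacent pixels at once, which is the sort of trick the original drawing code relied on.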

But, it seemed like Apogee / 3D Realms liked this mode, vs id mostly 
staying with the standard 320x200 256-color VGA mode (which can be 
mapped directly to a 64K array).

FWIW, my most recent 3D engine also used this strategy for distance math 
in many places.

Can note that in this 3D engine, world coordinates are primarily 
expressed in a packed fixed-point form:
   (23: 0): X coord, 16.8
   (47:24): Y coord, 16.8
   (63:48): Z coord, 8.8
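
A minimal sketch of packing and unpacking this layout, with made-up helper names (pos_pack and friends are illustrative, not the engine's actual API):

```c
#include <stdint.h>

#define POS_XY_MASK 0xFFFFFFu   /* 24 bits: 16.8 fixed point */
#define POS_Z_MASK  0xFFFFu     /* 16 bits:  8.8 fixed point */

/* Pack X into bits 23:0, Y into bits 47:24, Z into bits 63:48. */
static uint64_t pos_pack(uint32_t x, uint32_t y, uint32_t z)
{
    return  (uint64_t)(x & POS_XY_MASK)
         | ((uint64_t)(y & POS_XY_MASK) << 24)
         | ((uint64_t)(z & POS_Z_MASK)  << 48);
}

static uint32_t pos_x(uint64_t p) { return (uint32_t)( p        & POS_XY_MASK); }
static uint32_t pos_y(uint64_t p) { return (uint32_t)((p >> 24) & POS_XY_MASK); }
static uint32_t pos_z(uint64_t p) { return (uint32_t)((p >> 48) & POS_Z_MASK);  }
```

Masking X and Y to 24 bits means the coordinates wrap at 65536 units, which lines up with a 64km world that is modulo in X and Y.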

With the base unit roughly 1 meter, albeit informally subdivided into 32 
"inches", with the exact handling of the 25% error on the size of the 
inch relative to the "standard inch" being subject to interpretation.

Though, for sake of hand waving, in 3D engines one can also say that 
meters and yards are "basically the same thing" as well (if one 
redefines the base unit as "yard", the inch discrepancy drops to 12.5%...).

Though, in my case, I just sorta instead used "decimal inches", where 
there is only around a 4% discrepancy between 1/32 of a meter, and the 
decimal inch.

Another factor is the size of the player:
   In Minecraft, IIRC "Steve" is 1.85m tall, this is atypical (~ 6'2").
   In my 3D engine, decided to adjust it so the player is 1.60m tall (5'4").

This small difference has a notable effect on how big the blocks look.

Note that this engine (like my previous 3D engine) has a world that
is modulo in X and Y.

The previous engine had used 20.12 fixed point coords for X/Y/Z, with a 
wrap around in all 3 axes; though the terrain generation was planar (the 
sky limit was more part of the terrain generator, only generating a 
world 1 region tall). Actually, outdoor spaces were partly "faked" as 
the 3D engine actually treated the world like an infinite cave, just the 
terrain generation put a "sky" block at Z=255, which created an 
overworld-like situation. Initially, the sky handling was similar to the 
Quake engines (the skybox was drawn onto geometry), but then switched to 
a "generic" skybox with the "sky at Z=255" key (and there was support 
for multiple skies, possibly to have an effect similar to the 
"dimensions" in Minecraft).

Actually, this isn't too far from how world parameters were defined in
the ROTT engine, where the blocks near X=0,Y=0 were used to define the
parameters for the rest of the map (if nothing was there, it gave a
world similar to that in Wolfenstein 3D).

In my own Minecraft playing experience, I can say a 64km world limit is
still plenty large for practical gameplay. Never mind that the current
engine has a world height limit of 128, which is a little less than
Minecraft.

Granted, 64km is still a lot less than ~ 1048km.

Partly the reason for the design choices was to try to make the engine 
need less RAM, so that it was more practical to fit in the RAM limits of 
the FPGA boards I had (eg, a 128MB RAM chip).

To fit a Minecraft like experience in under, say, 80MB of RAM, requires 
compromises.

One was region size, where reducing the region size from a 256m cube to 
a 128m cube was an 8x reduction of the storage size of each region.

And generally for shorter draw distances (~ 64 .. 96), roughly 9 regions 
will be loaded at a time.

At ~ 128m to 192m, the number of loaded regions increases to 29, and at 
256m, may increase to around 51 regions.

I ended up staying with the same chunk size (16x16x16).

In both engines, blocks exist as a table of unique blocks with a 4 or 8 
bit index table (2K or 4K for each loaded chunk).

Currently each chunk stores its own table of blocks (32 bits in the
newer engine), but in theory could merge the blocks per region (with a
16-bit number), with an assumption of fewer than 64K unique
blocks per region (would mostly hold).
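
As a sketch of the 4-bit case (the struct name, axis order and nibble order are all assumptions made for illustration):

```c
#include <stdint.h>

enum { CHUNK_DIM = 16, CHUNK_BLOCKS = CHUNK_DIM * CHUNK_DIM * CHUNK_DIM };

typedef struct {
    uint32_t palette[16];               /* unique 32-bit block values     */
    uint8_t  index[CHUNK_BLOCKS / 2];   /* 4096 x 4 bits = 2K bytes       */
} Chunk4;

static unsigned chunk_slot(unsigned x, unsigned y, unsigned z)
{
    return (z * CHUNK_DIM + y) * CHUNK_DIM + x;
}

/* Store palette entry number 'pal' (0..15) for the block at x, y, z. */
static void chunk_set(Chunk4 *c, unsigned x, unsigned y, unsigned z,
                      unsigned pal)
{
    unsigned i = chunk_slot(x, y, z);
    uint8_t *b = &c->index[i >> 1];

    if (i & 1u) *b = (uint8_t)((*b & 0x0Fu) | ((pal & 15u) << 4));
    else        *b = (uint8_t)((*b & 0xF0u) | (pal & 15u));
}

/* Look up the full block value through the palette. */
static uint32_t chunk_get(const Chunk4 *c, unsigned x, unsigned y, unsigned z)
{
    unsigned i = chunk_slot(x, y, z);
    uint8_t  b = c->index[i >> 1];

    return c->palette[(i & 1u) ? (b >> 4) : (b & 15u)];
}
```

A chunk that exceeds 16 unique blocks would need to be promoted to the 8-bit index form (4K bytes), which is not shown here.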

In both engines, each chunk is LZ compressed:
   Current engine uses RP2, but this is in a similar category to LZ4.
   Prior engine had used an LZ+AdRice entropy coder.
The main reason for the change is that AdRice adds a lot of additional 
CPU performance cost on a 50MHz CPU, vs a more modest effect on compression.

Can note that most of the "variation" between blocks is typically the 
same few block types but usually at different lighting levels. In the 
absence of light sources, many chunks fit into a limit of 16 or fewer 
unique blocks.

Where, block lighting:
   4 bits: Sky light intensity;
     At 15, travels downward at full strength;
       This is understood as direct view of the sky.
     Drops off by 1 for each meter.
     At 14 or less, it drops off by 1 in all directions.
     Sunlight is always white in open air.
   4 bits: Block light intensity;
     Say, 15 at a light source, dropping off by 1 each meter.
     Block lighting is the max of the contribution from adjacent blocks
   4 bits: Block light color
     Color applied to the block lighting.
     Derived from a 16 color palette.
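
One possible reading of these rules in code. The bit positions within the 12 bits are an assumption (the list gives field widths, not their order), and the function names are made up:

```c
#include <stdint.h>

/* Assumed layout: sky in bits 3:0, block light in 7:4, color in 11:8. */
static uint16_t light_pack(unsigned sky, unsigned blk, unsigned color)
{
    return (uint16_t)((sky & 15u) | ((blk & 15u) << 4) | ((color & 15u) << 8));
}

static unsigned light_sky(uint16_t l)   { return  l       & 15u; }
static unsigned light_blk(uint16_t l)   { return (l >> 4) & 15u; }
static unsigned light_color(uint16_t l) { return (l >> 8) & 15u; }

/* Sky light handed from one cell to a neighbour: full strength (15)
 * continues straight down undiminished, anything else loses 1. */
static unsigned sky_light_spread(unsigned src, int straight_down)
{
    if (src == 15u && straight_down)
        return 15u;
    return src ? src - 1u : 0u;
}

/* Block light for a cell: the brightest of the 6 neighbours, minus 1. */
static unsigned block_light_from(const unsigned neighbours[6])
{
    unsigned best = 0;
    int i;

    for (i = 0; i < 6; i++)
        if (neighbours[i] > best)
            best = neighbours[i];
    return best ? best - 1u : 0u;
}
```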

This is a similar scheme to that in Minecraft, just with the addition of 
an explicit block light color. Note that these fields only apply to 
transparent blocks (but may instead be metadata for opaque blocks).

Contrast:
Minecraft had used Deflate, but for chunk compression, Deflate has a 
high overhead. Minecraft also stores full block data for every block, 
rather than using a "block palette".

My first 3D engine had used the same RLE compression scheme as Wolf3D 
and ROTT, but this had left something to be desired.

Where, say, Wolf3D and ROTT both used flat words containing multiple 
byte planes (IIRC):
   Plane 0: Block type at each location.
     0: Empty Space
     1-239: Various types of wall blocks.
       Typically, index number mapped 1:1 to wall texture.
       Rule breaks in ROTT for multi-state or animated walls.
   Plane 1: Entity Type
     1-239: Various types of entity (such as enemy or item).
   Plane 2: Entity Attribute
     Mostly encoded facing direction and spawn difficulty and similar.

Wolf3D world size was 64x64, whereas ROTT was 128x128.
   In both cases, block size was ~ 2 meters.

The worlds were stored in an RLE compressed form, where:
   A magic tag byte was defined externally
     (IIRC, usually 0xFF or similar for Wolf3D).
     In ROTT, it was defined per map plane IIRC (vs hard coded).
   Most bytes were passed through as-is;
   If the tag byte is seen:
     Tag Length Value
   An RLE run of 1 byte could be used to escape-code the tag byte.
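
A small decoder following the description above (byte-oriented, tag / length / value). This is a sketch of the scheme as described from memory, not a reader for the actual Wolf3D or ROTT map files; rle_decode and RLE_ERROR are illustrative names:

```c
#include <stddef.h>
#include <stdint.h>

#define RLE_ERROR ((size_t)-1)

/* Decode src into dst. Bytes other than 'tag' pass through unchanged;
 * 'tag' is followed by a run length and the value to repeat.
 * Returns the number of bytes written, or RLE_ERROR on truncated
 * input or when dst is too small. */
static size_t rle_decode(const uint8_t *src, size_t src_len, uint8_t tag,
                         uint8_t *dst, size_t dst_cap)
{
    size_t si = 0, di = 0;

    while (si < src_len) {
        uint8_t b   = src[si++];
        size_t  run = 1;

        if (b == tag) {                     /* tag, length, value */
            if (src_len - si < 2)
                return RLE_ERROR;
            run = src[si++];
            b   = src[si++];
        }
        if (run > dst_cap - di)
            return RLE_ERROR;
        while (run--)
            dst[di++] = b;
    }
    return di;
}
```

A literal tag byte in the data is written as a run of length 1 whose value is the tag itself, e.g. FF 01 FF when the tag is 0xFF.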

Can note that an LZ compression scheme does notably better here vs a 
simple RLE scheme.

Can note also:
Wolf3D and ROTT had used ray-casting to determine visibility, where they 
would raycast along the map until they hit something (with a raycast for 
each horizontal screen pixel).
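
For illustration, a generic grid walk of this kind (a textbook DDA in floating point, not the fixed-point code those games actually used). It assumes the ray starts inside the map; map_cells and cast_ray are invented names:

```c
#include <stdint.h>

enum { MAP_W = 64, MAP_H = 64 };

static uint8_t map_cells[MAP_H][MAP_W];   /* 0 = empty, nonzero = wall */

/* Step cell by cell from (ox, oy) along (dx, dy) until a nonzero cell
 * is found. Returns 1 and stores the cell in hit_x / hit_y, or returns
 * 0 if the ray leaves the map first. */
static int cast_ray(double ox, double oy, double dx, double dy,
                    int *hit_x, int *hit_y)
{
    const double never = 1e30;            /* "this axis is never crossed" */
    int cx = (int)ox, cy = (int)oy;       /* origin assumed non-negative  */
    int step_x = (dx < 0) ? -1 : 1;
    int step_y = (dy < 0) ? -1 : 1;
    double delta_x = (dx > 0) ? 1.0 / dx : (dx < 0) ? -1.0 / dx : never;
    double delta_y = (dy > 0) ? 1.0 / dy : (dy < 0) ? -1.0 / dy : never;
    double side_x = ((dx < 0) ? (ox - cx) : (cx + 1.0 - ox)) * delta_x;
    double side_y = ((dy < 0) ? (oy - cy) : (cy + 1.0 - oy)) * delta_y;

    while (cx >= 0 && cx < MAP_W && cy >= 0 && cy < MAP_H) {
        if (map_cells[cy][cx]) {
            *hit_x = cx;
            *hit_y = cy;
            return 1;
        }
        /* Advance across whichever cell boundary comes first. */
        if (side_x < side_y) { side_x += delta_x; cx += step_x; }
        else                 { side_y += delta_y; cy += step_y; }
    }
    return 0;
}
```

A renderer would call this once per screen column, with the direction swept across the field of view.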

My newer 3D engine had used raycasts for visibility determination.
   Whenever a raycast hit a block, it could take note of what block was 
hit and where. So, it would build a visible shell for the currently 
visible parts of the world. However, couldn't do this per-pixel in full 
3D, so it mostly raycasts in a spherical region and jitters the ray
vectors to try to improve coverage (something drops off the list if it 
wasn't hit recently enough).

This doesn't scale well with draw distance though. For use on PCs (with
a larger draw distance), I have ended up with a modified approach:
Per block ray-casting is only used for shorter distances (64 meters);

For more distant stuff (medium distance), it is per-chunk and using 
cached chunk vertex arrays (loading the chunks). This is more like the 
approach Minecraft uses, just in my case, used for things further than 
64 meters.

For higher distances (say, past 192 meters), it builds an "outer shell" 
representation of the region. Here, the region is represented as 6x 
128x128 faces, each encoding both the block type at the location, and 
the distance of the block from the edge of the region (16-bit block type 
index, 8 bit distance). The limitations of this representation are less 
obvious from a distance, but it does save on RAM vs actually loading the 
chunks (it can only represent things that are line-of-sight along an axis
from the edge of the region).
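
Roughly, and only as a sketch of the idea (the type names, the face numbering, the "nothing hit" marker and the block_at callback are all assumptions made here, not the engine's actual structures):

```c
#include <stdint.h>

enum { REGION_DIM = 128, SHELL_EMPTY = 255 };

typedef struct {
    uint16_t type;    /* block type index; 0 = nothing hit on this line */
    uint8_t  depth;   /* distance of the hit from the region edge       */
} ShellCell;          /* compilers will typically pad this to 4 bytes   */

typedef struct {
    ShellCell face[6][REGION_DIM][REGION_DIM];
} RegionShell;

typedef uint16_t (*BlockAtFn)(int x, int y, int z);

/* Build the face seen from the x = 0 edge looking inward (face 0 here):
 * for each (y, z), record the first non-empty block along +X. */
static void shell_build_face0(RegionShell *s, BlockAtFn block_at)
{
    int x, y, z;

    for (z = 0; z < REGION_DIM; z++) {
        for (y = 0; y < REGION_DIM; y++) {
            ShellCell *c = &s->face[0][z][y];

            c->type  = 0;
            c->depth = SHELL_EMPTY;
            for (x = 0; x < REGION_DIM; x++) {
                uint16_t t = block_at(x, y, z);
                if (t) {
                    c->type  = t;
                    c->depth = (uint8_t)x;
                    break;
                }
            }
        }
    }
}

/* Stand-in world for illustration: solid type 7 from x = 40 onward. */
static uint16_t demo_block_at(int x, int y, int z)
{
    (void)y;
    (void)z;
    return (x >= 40) ? 7 : 0;
}
```

The other five faces would be built the same way along the remaining axis directions. Anything hidden behind the first hit on every axis is simply lost, which is the limitation noted above.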

One possibility (for greater performance) could be if this were turned 
into textures with rendering done via a parallax shader. Currently this 
engine doesn't use shaders though (it is operating roughly within the 
limits of the OpenGL 1.3 feature-set; so no shaders or similar).

I can push it up to around 384 meters before performance goes to crap. 
But, this is acceptable to me, as Minecraft with a 24 chunk draw 
distance also performs like crap...

Have noted that my 3D engine does still seem to use less RAM relative to 
draw distance if compared with Minecraft.

Most of the RAM use goes into the vertex arrays though.
   Sadly, the steep cost of vertex arrays is unavoidable here.

Well, unless each face were drawn one-at-a-time using immediate mode 
(glBegin/glEnd for each polygon), but this would be unacceptably slow 
(as it scales very poorly).

Note, not using VBOs partly as these are also outside the feature range 
of OpenGL 1.3, ...

Can note that it seems like OpenGL somewhat prefers if vertex array 
contents don't change. If one has large vertex arrays with rapidly 
changing contents, then the OpenGL backend seems to allocate absurd 
amounts of RAM. Works better if one has a larger number of more moderate 
size vertex arrays with static contents.

Seemingly OpenGL likely caches stuff, and checks whether or not a repeat 
draw is the same vertex array with the same contents.

...

[toc] | [prev] | [next] | [standalone]

#393974

From	"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Date	2025-07-13 13:55 -0700
Message-ID	<10516fv$2uk5o$2@dont-email.me>
In reply to	#393973

On 7/13/2025 1:10 PM, BGB wrote:
> On 7/12/2025 7:30 PM, Chris M. Thomasson wrote:
>> On 7/11/2025 1:48 AM, David Brown wrote:
>>> On 11/07/2025 04:09, BGB wrote:
>>>> On 7/10/2025 4:34 AM, David Brown wrote:
>>>>> On 10/07/2025 04:28, BGB wrote:
>>>>>> On 7/9/2025 4:41 AM, David Brown wrote:
>>>>>>> On 09/07/2025 04:39, BGB wrote:
>>>>>>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>>>>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>>>>>>
>>>>>>> ...
>>>
>>>>>
>> [...]
>>> It is worth remembering that game code (especially commercial game 
>>> code) is seldom written with a view to portability, standard 
>>> correctness, or future maintainability.  It is written to be as fast 
>>> as possible using the compiler chosen at the time, to be build and 
>>> released as a binary in the shortest possible time-to-market.
>>
>> Love this one:
>>
>> https://forums.parallax.com/discussion/147522/dog-leg-hypotenuse-approximation
>>
>> [...]
> 
> I looked at the post and initially didn't recognize it as I didn't use 
> the same term (I usually called it "distance approximation").
> 
> But, yeah, variations of the first algorithm were used frequently in a 
> lot of this code. For example, it was the main way of calculating 
> distance in Wolf3D and ROTT.
> 
> Not sure about Doom, would need to find the relevant code, but wouldn't 
> be surprised.
> 
> 
> In terms of coding style, ROTT had considerable "hair".
> 
> In my initial porting effort, spent quite a long time trying to deal 
> with all of the places where memory access was going out of bounds 
> (though, in many cases, just ended up masking the access by the size of 
> the array or similar; or in some cases rewriting the logic to not depend 
> on particular out-of-bounds access behaviors).
> 
> Also had to deal with the mess that a lot of the drawing code depended 
> on the specific way they were using the VGA hardware:
> 320x200 on screen, but 384x200 in memory, but divided into 4 planes of 
> 96x200 (and doing tricks with the plane mask register).
> Kinda needed to write wrappers to fake a lot of this crap.
> 
> But, it seemed like Apogee / 3D Realms liked this mode, vs id mostly 
> staying with the standard 320x200 256-color VGA mode (which can be 
> mapped directly to a 64K array).
> 
> 
> 
> FWIW, my most recent 3D engine also used this strategy for distance math 
> in many places.
> 
> 
> Can note that in this 3D engine, world coordinates are primarily 
> expressed in a packed fixed-point form:
>    (23: 0): X coord, 16.8
>    (47:24): Y coord, 16.8
>    (63:48): Z coord, 8.8
> 
> With the base unit roughly 1 meter, albeit informally subdivided into 32 
> "inches", with the exact handling of the 25% error on the size of the 
> inch relative to the "standard inch" being subject to interpretation.
> 
> 
> Though, for sake of hand waving, in 3D engines one can also say that 
> meters and yards are "basically the same thing" as well (if one 
> redefines the base unit as "yard", the inch discrepancy drops to 12.5%...).
> 
> Though, in my case, I just sorta instead used "decimal inches", where 
> there is only around a 4% discrepancy between 1/32 of a meter, and the 
> decimal inch.
> 
> Another factor is the size of the player:
>    In Minecraft, IIRC "Steve" is 1.85m tall, this is atypical (~ 6'2").
>    In my 3D engine, decided to adjust it so the player is 1.60m tall (5'4").
> 
> This small difference has a notable effect on how big the blocks look.
> 
> 
> 
> 
> Note that this engine (like my previous 3D engine) has a world that
> is modulo in X and Y.

For some reason, this reminds me of an example from my experimental 3d 
engine:

(FlyMatirx ver(0.0.0))

https://youtu.be/VSPgb51Hry8

:^)

[snip]

I need more time to give your response a proper read! Well done. Am busy 
with other things right now. Fwiw, check out an interesting 3d model of 
mine on SketchFab:

(ctHyperField)
https://skfb.ly/pyP9E

(ctWorms (0.0.1))
https://skfb.ly/pyXH6

(InfaRing)
https://skfb.ly/py7Mu

[toc] | [prev] | [next] | [standalone]

#393975

From	BGB <cr88192@gmail.com>
Date	2025-07-13 19:11 -0500
Message-ID	<1051hv8$31njr$1@dont-email.me>
In reply to	#393974

On 7/13/2025 3:55 PM, Chris M. Thomasson wrote:
> On 7/13/2025 1:10 PM, BGB wrote:
>> On 7/12/2025 7:30 PM, Chris M. Thomasson wrote:
>>> On 7/11/2025 1:48 AM, David Brown wrote:
>>>> On 11/07/2025 04:09, BGB wrote:
>>>>> On 7/10/2025 4:34 AM, David Brown wrote:
>>>>>> On 10/07/2025 04:28, BGB wrote:
>>>>>>> On 7/9/2025 4:41 AM, David Brown wrote:
>>>>>>>> On 09/07/2025 04:39, BGB wrote:
>>>>>>>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>>>>>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>> ...
>>>>
>>>>>>
>>> [...]
>>>> It is worth remembering that game code (especially commercial game 
>>>> code) is seldom written with a view to portability, standard 
>>>> correctness, or future maintainability.  It is written to be as fast 
>>>> as possible using the compiler chosen at the time, to be build and 
>>>> released as a binary in the shortest possible time-to-market.
>>>
>>> Love this one:
>>>
>>> https://forums.parallax.com/discussion/147522/dog-leg-hypotenuse-approximation
>>>
>>> [...]
>>
>> I looked at the post and initially didn't recognize it as I didn't use 
>> the same term (I usually called it "distance approximation").
>>
>> But, yeah, variations of the first algorithm were used frequently in a 
>> lot of this code. For example, it was the main way of calculating 
>> distance in Wolf3D and ROTT.
>>
>> Not sure about Doom, would need to find the relevant code, but 
>> wouldn't be surprised.
>>
>>
>> In terms of coding style, ROTT had considerable "hair".
>>
>> In my initial porting effort, spent quite a long time trying to deal 
>> with all of the places where memory access was going out of bounds 
>> (though, in many cases, just ended up masking the access by the size 
>> of the array or similar; or in some cases rewriting the logic to not 
>> depend on particular out-of-bounds access behaviors).
>>
>> Also had to deal with the mess that a lot of the drawing code depended 
>> on the specific way they were using the VGA hardware:
>> 320x200 on screen, but 384x200 in memory, but divided into 4 planes of 
>> 96x200 (and doing tricks with the plane mask register).
>> Kinda needed to write wrappers to fake a lot of this crap.
>>
>> But, it seemed like Apogee / 3D Realms liked this mode, vs id mostly 
>> staying with the standard 320x200 256-color VGA mode (which can be 
>> mapped directly to a 64K array).
>>
>>
>>
>> FWIW, my most recent 3D engine also used this strategy for distance 
>> math in many places.
>>
>>
>> Can note that in this 3D engine, world coordinates are primarily 
>> expressed in a packed fixed-point form:
>>    (23: 0): X coord, 16.8
>>    (47:24): Y coord, 16.8
>>    (63:48): Z coord, 8.8
>>
>> With the base unit roughly 1 meter, albeit informally subdivided into 
>> 32 "inches", with the exact handling of the 25% error on the size of 
>> the inch relative to the "standard inch" being subject to interpretation.
>>
>>
>> Though, for sake of hand waving, in 3D engines one can also say that 
>> meters and yards are "basically the same thing" as well (if one 
>> redefines the base unit as "yard", the inch discrepancy drops to 
>> 12.5%...).
>>
>> Though, in my case, I just sorta instead used "decimal inches", where 
>> there is only around a 4% discrepancy between 1/32 of a meter, and the 
>> decimal inch.
>>
>> Another factor is the size of the player:
>>    In Minecraft, IIRC "Steve" is 1.85m tall, this is atypical (~ 6'2").
>>    In my 3D engine, decided to adjust it so the player is 1.60m tall (5'4").
>>
>> This small difference has a notable effect on how big the blocks look.
>>
>>
>>
>>
>> Note that this engine (like my previous 3D engine) has a world that
>> is modulo in X and Y.
> 
> For some reason, this reminds me of an example from my experimental 3d 
> engine:
> 
> (FlyMatirx ver(0.0.0))
> 
> https://youtu.be/VSPgb51Hry8
> 
> :^)
> 

I have no good videos online of my 3rd engine it seems...

Here is a video of my 2nd engine:
https://www.youtube.com/watch?v=trUZl4onzqI
(as noted, from 7 years ago).

Another video from 2nd engine:
https://www.youtube.com/watch?v=LGB53L7f314

This effort fizzled with the rise of my ISA project eating most of my 
time, and me starting to run out of ideas or motivation to continue 
working on it.

And, first 3D engine:
https://www.youtube.com/watch?v=WTrfNAfWV8o

Which started out trying to be a Doom 3 clone before being turned into a 
Minecraft clone (it used stencil shadows and similar).

But, my 3rd engine seems the most limited thus far. But, as noted, its 
design priority was more "less RAM use" than "looks good or modern".

In its most recent form, I had also made a hit/miss attempt at moving 
over to using 16-color artwork and trying to integrate some "text parser
game" mechanics (similar to 1980s era Sierra games, like King's Quest and
similar). But, it is a bit of a mystery how to best integrate this stuff 
with a vaguely Minecraft like world structure.

In the more recent form, my newer 3D engine has experimented again with 
using 3D models (previously, it, like my 2nd 3D engine, was using 
exclusively sprite graphics).

Generally, the current 3D modeling tech is based on CSG rather than mesh 
models.

Though, I did the questionable thing of basing the CSG language on 1980s 
style BASIC (or, like, the bizarre hybrid offspring of early BASIC and 
Emacs Lisp). Partly this is because while I was partly tempted to clone 
and modify the OpenSCAD language with the features I wanted, this would 
have been more effort.

Kind of ironic is that despite the wonk, my wacky BASIC variant can 
generally express 3D models in fewer lines (in both cases, the languages 
work by writing code to express the geometry one wants; as opposed to 
using a 3D modeler).

These convert to a mesh-model format which is what is actually used 
in-engine, generally with a skeletal animation system.

> [snip]
> 
> I need more time to give your response a proper read! Well done. Am busy 
> with other things right now. Fwiw, check out an interesting 3d model of 
> mine on SketchFab:
> 
> (ctHyperField)
> https://skfb.ly/pyP9E
> 
> (ctWorms (0.0.1))
> https://skfb.ly/pyXH6
> 
> (InfaRing)
> https://skfb.ly/py7Mu

My stuff is generally less abstract...

Here are some pictures from my 3rd engine, showing off a few of the 3D 
models:
https://x.com/cr88192/status/1921809223639322962/photo/2

It is unclear if the 16-color thing is really working...

And, even then, in some places I cheated and used a 32-color palette; 
with additional darker shades of the first 16 colors.

The color palette here is based on the CGA/EGA palette (as opposed to 
the default 16-color VGA palette, which is "similar but different").

Well, and this is also combined with the distortions caused by DXT1 
texture compression, which also does not preserve the 16 colors all that 
well and also limits how colors could be used in blocks. To be 
authentic, one would need to limit use to 2 colors per 4x4 block, and 
possibly also ensure that mipmaps also maintain 16-color integrity. I 
didn't go that far.

The main alternative being to just stay with the usual RGB555 hi-color 
that is being used otherwise.

For the design of the 3D guy, was partly also trying to take some 
inspiration from the humanoids in the "Money for Nothing" video, though 
in this case slightly more minimalist. Tried to go more minimal, but it 
wasn't really working for me. Minecraft style models likely only really 
work because they have texture maps (and flat colors increase the
minimum geometric complexity needed).

...

[toc] | [prev] | [next] | [standalone]

#393980

From	antispam@fricas.org (Waldek Hebisch)
Date	2025-07-20 00:21 +0000
Message-ID	<105hcqj$14hsb$1@paganini.bofh.team>
In reply to	#393966

David Brown <david.brown@hesbynett.no> wrote:
> On 09/07/2025 04:39, BGB wrote:
>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>
> ...
> 
>>>
>>> I don't have confidence in an author's understanding of C, if they
>>> believe that ISO C defines the behavior of invalid pointers being
>>> compared, such that this needs to be rectified by a private "patch"
>>> of the text.
>>>
> 
> You might not be aware of it, but the author Jens Gustedt is a member of 
> the C standards committee, and has been for some time.  He is the most 
> vocal, public and active member.  I think that suggests he has quite a 
> good understanding of C and the ISO standards!  Not everyone agrees 
> about his ideas and suggestions about how to move C forward - but that's 
> fine (and it's fine by Jens, from what I have read).  That's why there 
> is a standards committee, with voting, rather than a BDFL.
> 
>>> The concept of pointer provenance can be expressed other than
>>> as a textual patch against ISO C.
>>>
> 
> There have been plenty of papers and blogs written about pointer 
> provenance (several by Gustedt) and how it could work.  It's not a very 
> easy thing to follow in any format.  A patch to current C standards is 
> perhaps the least easy to follow, but it is important for how the 
> concept could be added to C.

I looked at the blog post.  About two thirds of it is explaining what
I consider obvious.  Later he makes some assumptions/rules and
claims that they cover segmented model.  But assumption:

:  Two pointer values are equal if they correspond to the same              
:  abstract address. 

is problematic for 8086 segmentation (would force "huge" style
pointer comparison).  It is probably unworkable for more abstract
segmentation (like in 286) when there are overlapping segments.
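
To make the 8086 case concrete, a small sketch of the two comparison styles, modelling a real-mode far pointer as a segment:offset pair (FarPtr and the function names are illustrative; no actual compiler's representation is implied):

```c
#include <stdint.h>

typedef struct {
    uint16_t seg;
    uint16_t off;
} FarPtr;

/* Real-mode linear address: segment * 16 + offset, wrapped to 20 bits. */
static uint32_t far_linear(FarPtr p)
{
    return (((uint32_t)p.seg << 4) + p.off) & 0xFFFFFu;
}

/* Cheap "far" style comparison: compares the representation only. */
static int far_equal_bits(FarPtr a, FarPtr b)
{
    return a.seg == b.seg && a.off == b.off;
}

/* "Huge" style comparison: normalizes to the address actually referenced. */
static int far_equal_huge(FarPtr a, FarPtr b)
{
    return far_linear(a) == far_linear(b);
}
```

For example, 1000:0010 and 1001:0000 name the same byte (linear 0x10010) but differ bitwise, so only the huge-style comparison treats them as equal.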

He spends time talking about the XOR trick, but leaves a different
(and IMO much more important) trick in undefined territory.
Namely, modern ARM and RISC-V embedded processors are 32-bit,
so need 32-bit pointers.  But low end processors frequently
have tiny RAM that could be addressed using 16 bits.  More
precisely, one can use a base pointer initialized to the address
of the start of RAM and access a memory location using a 16 bit offset
from the start of RAM.  AFAICS definitions in the blog post
put this strictly into undefined territory, but I expect this
to work as intended in gcc.
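
A sketch of the kind of code meant here. The static array stands in for the RAM region so the example can run on a hosted system; on a real microcontroller the base would be the start-of-RAM address. All names are hypothetical, and whether the integer round trip is blessed by the provenance model is exactly the open question:

```c
#include <stdint.h>

/* Stand-in for a small RAM region (1 KB here). */
static uint32_t ram_arena[256];

typedef uint16_t ram_ref;   /* 16-bit byte offset from the start of RAM */

static uintptr_t ram_base(void)
{
    return (uintptr_t)ram_arena;
}

/* Compress a pointer into RAM down to a 16-bit offset. */
static ram_ref ref_from_ptr(const void *p)
{
    return (ram_ref)((uintptr_t)p - ram_base());
}

/* Rebuild a pointer from base + offset (an integer-to-pointer cast). */
static void *ptr_from_ref(ram_ref r)
{
    return (void *)(ram_base() + r);
}
```

Storing ram_ref values instead of full pointers halves the size of links in data structures on a 32-bit target, which is the attraction on parts with only a few KB of RAM.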

Later he writes about exposure and synthesized pointers.
That is rather natural, but I did not find an explicit
statement of how exposure and synthesized pointers are
related to aliasing.  Maybe the intent is like:
"access via synthesized pointer may alias access to
any exposed storage instance".  OTOH in cases like
conversion to offset with respect to some base and
back we deal with synthesized pointers, but in principle
the compiler could track bases and offsets and come up with
quite good alias analysis.

More generally, the blog post looks like very preliminary
analysis that a compiler should do before further work on
alias analysis.  But the compiler writer presumably knows
the target, so can make assumptions that better fit
the actual situation than the assumptions made in the blog post.

So, ATM it is not clear to me that putting such things in the
standard adds value.  It could if standard formulated new
aliasing rules, but I see no new aliasing rule in the blog
post.  And IMO new rules should be related to algorithms:
without good algorithms rules must be either conservative
(disallowing optimizations) or risk breaking code.

-- 
                              Waldek Hebisch

[toc] | [prev] | [next] | [standalone]

#393981

From	David Brown <david.brown@hesbynett.no>
Date	2025-07-20 05:09 +0200
Message-ID	<105hmke$32473$1@dont-email.me>
In reply to	#393980

On 20/07/2025 02:21, Waldek Hebisch wrote:
> David Brown <david.brown@hesbynett.no> wrote:
>> On 09/07/2025 04:39, BGB wrote:
>>> On 7/2/2025 8:10 AM, Kaz Kylheku wrote:
>>>> On 2025-07-02, Alexis <flexibeast@gmail.com> wrote:
>>>>>
>> ...
>>
>>>>
>>>> I don't have confidence in an author's understanding of C, if they
>>>> believe that ISO C defines the behavior of invalid pointers being
>>>> compared, such that this needs to be rectified by a private "patch"
>>>> of the text.
>>>>
>>
>> You might not be aware of it, but the author Jens Gustedt is a member of
>> the C standards committee, and has been for some time.  He is the most
>> vocal, public and active member.  I think that suggests he has quite a
>> good understanding of C and the ISO standards!  Not everyone agrees
>> about his ideas and suggestions about how to move C forward - but that's
>> fine (and it's fine by Jens, from what I have read).  That's why there
>> is a standards committee, with voting, rather than a BDFL.
>>
>>>> The concept of pointer provenance can be expressed other than
>>>> as a textual patch against ISO C.
>>>>
>>
>> There have been plenty of papers and blogs written about pointer
>> provenance (several by Gustedt) and how it could work.  It's not a very
>> easy thing to follow in any format.  A patch to current C standards is
>> perhaps the least easy to follow, but it is important for how the
>> concept could be added to C.
> 
> I looked at the blog post.  About two thirds of it is explaining what
> I consider obvious.  Later he makes some assumptions/rules and
> claims that they cover segmented model.  But assumption:
> 
> :  Two pointer values are equal if they correspond to the same
> :  abstract address.
> 
> is problematic for 8086 segmentation (would force "huge" style
> pointer comparison).  It is probably unworkable for more abstract
> segmentation (like in 286) when there are overlapping segments.
> 

Segmented memory models are ancient history, and basically irrelevant to 
new ideas and new C standard versions.  Compilers for these models, if 
anyone ever makes new versions, can continue to use the old memory 
models.  The provenance memory model is about making more accurate 
analysis for optimisation and static checking - if a compiler can't use 
it, that's okay.

> He spends time talking about the XOR trick, but leaves a different
> (and IMO much more important) trick in undefined territory.
> Namely, modern ARM and RISC-V embedded processors are 32-bit,
> so need 32-bit pointers.  But low end processors frequently
> have tiny RAM that could be addressed using 16 bits.  More
> precisely, one can use a base pointer initialized to the address
> of the start of RAM and access a memory location using a 16 bit offset
> from the start of RAM.  AFAICS definitions in the blog post
> put this strictly into undefined territory, but I expect this
> to work as intended in gcc.

That's not just for low-end microcontrollers.  On PowerPC and POWER, it 
is normal to have a register for the "small data segment" pointer - 
small statically allocated data is placed in a 64 KB segment and 
addressed using the base register plus a 16-bit offset.

More generally, it is perfectly normal for the same data to be accessed 
in different ways - absolute addresses, direct pointer registers, 
pointers to a struct and then with constant offset, and so on.  The 
model and its implementations have to deal with that, or they will be 
useless on all targets.
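
As a trivial illustration of the same object being reached by different routes (struct config and the function names are made up for the example):

```c
#include <stddef.h>
#include <string.h>

struct config {
    int mode;
    int level;
};

static struct config cfg;

/* Direct access by name (typically an absolute or base-relative address). */
static int read_level_direct(void)
{
    return cfg.level;
}

/* Access through a pointer to the struct, with a constant member offset. */
static int read_level_member(const struct config *c)
{
    return c->level;
}

/* Access through a byte pointer plus an explicit offset. */
static int read_level_offset(const void *base)
{
    int v;

    memcpy(&v, (const char *)base + offsetof(struct config, level), sizeof v);
    return v;
}
```

All three read the same storage, so any alias analysis has to treat them as potentially referring to the same object.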

> 
> Later he writes about exposure and synthesized pointers.
> That is rather natural, but I did not find an explicit
> statement of how exposure and synthesized pointers are
> related to aliasing.  Maybe the intent is like:
> "access via synthesized pointer may alias access to
> any exposed storage instance".  OTOH in cases like
> conversion to offset with respect to some base and
> back we deal with synthesized pointers, but in principle
> the compiler could track bases and offsets and come up with
> quite good alias analysis.
> 
> More generally, the blog post looks like very preliminary
> analysis that a compiler should do before further work on
> alias analysis.  But the compiler writer presumably knows
> the target, so can make assumptions that better fit
> the actual situation than the assumptions made in the blog post.
> 
> So, ATM it is not clear to me that putting such things in the
> standard adds value.  It could if standard formulated new
> aliasing rules, but I see no new aliasing rule in the blog
> post.  And IMO new rules should be related to algorithms:
> without good algorithms rules must be either conservative
> (disallowing optimizations) or risk breaking code.
> 

I think it will be a while before all this is ready for the standard, or 
for implementation.

[toc] | [prev] | [standalone]
