


Re: Return of the Overlays!

From Don Y <blockedofcourse@foo.invalid>
Newsgroups comp.arch.embedded
Subject Re: Return of the Overlays!
Date 2026-01-15 03:25 -0700
Organization A noiseless patient Spider
Message-ID <10kafba$mu1n$1@dont-email.me> (permalink)
References <10k6mhf$3hfn8$1@dont-email.me> <3d6emktheaubi9puta0iiojebcq3e7g9l9@4ax.com> <10k8utl$8nmr$1@dont-email.me> <j30hmk5b7to4aq9nfjvqg6dg26nlo0unqe@4ax.com>



> Past expiration date.

Sorry to hear that.

>>>> But, there is nothing in the traditional build process that
>>>> ensures references from "some point" in the code don't refer
>>>> back to some *other* point (that you thought you were done with).
> 
> With discipline it (miserably) can be done.
> See below.

In addition to knowing what MIGHT be accessible (exported), you
also have to track down every reference to each of those and
identify when, in time, they occur (or might occur) relative to
your explicitly declaring them to be "no longer needed".

THAT is the tough part -- a link map tells you "what talks to
what" but says nothing about when, in time, those interactions occur.
If FOO calls BAR and BAR calls BAZ, then anything that calls
BAZ, BAR *or* FOO, at a point in time AFTER you have declared
BAZ to no longer be needed will SIGSEGV.

You have to keep track of these interdependencies when making
decisions declaring particular "identifiers" (code or data)
to be no longer needed.

The "Initialize()" example I provided is easy to conceive of as
occurring exactly once in a "program's" execution.  So, as long as
it is only invoked once AND THERE ARE NO PATHS BACK TO Main() -- in
my example -- there is no way it can be reaccessed/referenced after
the "extinctify" invocation that follows it.

Can you say that about string(3C) functions -- can you decide
when those library functions are no longer needed, thereby
freeing up the resources that they require?  Or, some function
that you wrote to handle a particular error condition?

>>>> The overlay build process had to ensure THIS overlay didn't refer to
>>>> anything in THAT overlay.  (Bankswitching code had to rely on "trampoline"
>>>> logic in some shared/persistent area to get from one bank to another,
>>>> but, there was nothing prohibiting such references!)
>>>
>>> Just scope handling.
>>
>> If they can coexist in a shared address space, that's an issue.
>> But, if you're just trying to ensure nothing "here" ever refers
>> to something "there" (marked extinct), I don't think it is as
>> much of an issue.
>>
>> The problem becomes one of discipline -- "inherently" knowing how to
>> organize your modules/references so you can be "sure" when you execute
>> one specific line of code (that marks a section as "extinct") that you
>> will NEVER reference anything that it containED (past tense because it
>> no longer exists)
>>
>> Think in terms of FORTRAN's "COMMON" and "CHAIN"; you want to be able
>> to put any shared data created in "section 1" into COMMON and then
>> fall into the code in the next "section", having the benefit of
>> all that "common" data.
>>
>> Knowing that the code from the first section is forever gone.
> 
> Data produced by X and consumed later by Y is not a problem unless X
> and Y disagree on the data's format.

The format isn't the issue.  Rather, the issue is that X /no longer exists/.
References to the address(es) that it occupied aren't mapped to
anything, anymore.

You must be sure Y (and every other X consumer) have made use
of X before you unmap its resources.  THAT is what is hard to track
because tools don't lay out the temporal relationships of various
identifiers.

> Problems are only possible when there can exist stored references to
> things which no longer exist.

The references need not be explicitly "stored" but, rather, can be
part of an instruction stream generated by a compiler.

>  In Fortran that was not possible: a
> COMMON block could not include pointers [even in the Fortrans that had
> pointers], and so CHAIN with COMMON could not fail in that way.

I mention COMMON and CHAIN because they implement the mechanisms that
must be present for one part of a "program" to do work and pass those
results to a followup part (via COMMON) while execution is passed
to that followup part (via CHAIN).

You (typ) need data and code bridges to connect separate parts of a program
together, esp if you are deliberately trying to cut a program into smaller
pieces to reduce its "current" footprint (for any value of "current").

> Similarly, creating a data structure stored by the main program with
> overlay X and then swapping in overlay Y to process the data will not
> be a problem so long as the data structure contains no references to
> anything inside X.

That doesn't address having the data structure overlaid (or, in
my case, unmapped) because *IT* was considered "no longer needed".

At some point in the "earlier" execution of the "program", that data
structure had meaning.  But, once the developer declares that data to
no longer be needed, it disappears.  Any code that later references it
(in error!) crashes.  No, it doesn't get the wrong values for the data
or interpret the values incorrectly.  The memory reference simply FAILS.
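
A rough sketch of that failure mode, assuming a POSIX host with mmap()
and a catchable SIGSEGV (the names late_reference_faults and on_segv are
made up for illustration; a real target would more likely have no handler
at all and simply die):

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_return;

/* SIGSEGV handler: jump back to the probe instead of dying. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_return, 1);
}

/*
 * Create a page of data, declare it "no longer needed" by unmapping it,
 * then reference it anyway.  Returns 1 if the late reference faulted,
 * 0 if it (unexpectedly) succeeded, -1 on setup failure.
 */
int late_reference_faults(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0)
        return -1;
    size_t page = (size_t)psz;

    volatile int *table = mmap(NULL, page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (table == MAP_FAILED)
        return -1;
    table[0] = 42;
    assert(table[0] == 42);          /* the data "had meaning" here */

    struct sigaction sa, old;
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    if (munmap((void *)table, page) != 0)    /* "extinctify" the data */
        return -1;

    volatile int faulted;
    if (sigsetjmp(fault_return, 1) == 0) {
        volatile int v = table[0];   /* late reference, made in error */
        (void)v;
        faulted = 0;                 /* reached only if the read worked */
    } else {
        faulted = 1;                 /* arrived here via the handler */
    }
    sigaction(SIGSEGV, &old, NULL);  /* restore previous disposition */
    return faulted;
}
```

The read never returns a stale or wrong 42; it never completes at all.
Nothing is allocated between the munmap() and the read, so nothing else
should have been mapped at that address in the meantime.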

>> [Ideally, I would like to be able to section off more varied instances
>> of resources instead of just chopping a program into discrete CONSECUTIVE
>> sections.]
>>
>>> :
>>> More broadly: explicitly loaded DLLs [ using dlopen()/dlsym(),
>>> LoadLibrary()/GetProcAddress(), etc. ] could be considered modern
>>> moral equivalents of overlays.
>>
>> Yes.  Though you can REload a DLL if you decide you need it
>> later.  The CHAIN/COMMON distinction was that prior sections
>> are gone -- until you restart the program.
> 
> You're confused.

Reread the above.

> With explicit DLL management, the programmer has to deliberately write
> code to load a DLL and map its exported API to function pointers.  If
> the DLL then is unloaded, the mapped pointers become garbage: trying
> to call the functions will _NOT_ reload the DLL - rather it will,
> almost certainly, crash the program.

That is EXACTLY the situation I am describing.  The developer explicitly
decided the DLL was NO LONGER NEEDED.  *He* unloaded it.  If he wasn't
disciplined enough to know that he wasn't yet done with it, then his
program crashes.

However, he can choose to unload the DLL (to reduce his resource
usage) and then, at some later time, explicitly REload it as if
for the first time and make continued use of it.

With CHAIN/COMMON, the past is past.  The code that preceded is no longer
available (unless you rerun the program).

Bank switching, overlays, etc. don't suffer that fate.

Virtual memory that has been swapped out doesn't, either.

But, my deliberate marking of portions of that memory as "no longer
needed" (unmapped) discards them and their contents, irretrievably.
The only way to recreate the data and code is to restart the
program.

I.e., there is no remedy for "extinctifying" resources prematurely.
You *must* know where every possible reference exists in space AND TIME
(so you don't "prematurely" declare it unneeded).

> If the program was written in a CHAIN/COMMON fashion, using and
> discarding a progression of DLLs, then there will be NO programmer
> supplied code to "go back" and reload one of them.

As in my intended use case.

>>> Note that grafting this onto a language lacking the notion of nested
>>> functions will not be easy.
>>
>> I can handle some cases easily:
>>
>> main() {
>>      Initialize()
>>
>>      Extinctify(&Initialize)
>>
>>      DoWork()
>> }
>>
>> and place "Initialize" in its own section, commanding the linker to locate
>> it "conveniently" (likely on a page-frame boundary so as to maximize the
>> amount of usable space in that page-frame).
>>
>> But, aside from such obvious choices, I think it is hard to mentally
>> subdivide a piece of code for such a partitioning.  And, then, remembering
>> to "extinctify" portions that you no longer need.  You'd have to be
>> keenly aware of the cost of each such "portion" of the algorithm so
>> you could identify things that could/should be excised.
> 
> It's hard because you work in languages that make it hard(er).

You have some codebase.  Tools will tell you where a particular
identifier is referenced.  You can use that to build a list
of other objects that implicitly reference that identifier.

*What* is going to tell you WHEN any of those objects are
invoked, relative to a particular statement that is executed
at a specific point in time:
     unmap the memory used by identifier X

If your chosen language and programming style arranges all
statements sequentially, then you can just look for any
references to the identifier in question (and all other
identifiers that directly or indirectly reference it)
and verify that they all occur BEFORE the statement in
question.
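
For that strictly sequential case the check is almost mechanical.  A
minimal sketch, assuming you can reduce the program to a linear list of
"use" and "unmap" events (Event, EV_USE, EV_UNMAP and first_late_use are
hypothetical names, and a "use" has to include every indirect reference):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_OBJECTS = 64 };       /* arbitrary limit for the sketch */

typedef enum { EV_USE, EV_UNMAP } EventKind;

typedef struct {
    EventKind kind;              /* reference it, or declare it unneeded */
    int       object;            /* small integer id of the identifier  */
} Event;

/*
 * Walk the events in program order.  Returns the index of the first
 * event that uses an object after it was unmapped, -1 if every use
 * precedes the corresponding unmap, or -2 for an out-of-range id.
 */
long first_late_use(const Event *trace, size_t n)
{
    bool gone[MAX_OBJECTS] = { false };

    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int id = trace[i].object;
        if (id < 0 || id >= MAX_OBJECTS)
            return -2;
        if (trace[i].kind == EV_UNMAP)
            gone[id] = true;     /* from here on, any use is an error */
        else if (gone[id])
            return (long)i;      /* use AFTER the unmap statement */
    }
    return -1;
}
```

This only works because there is a single ordering to walk.  Once the
order of invocation depends on data, configuration or events, there is no
one list to check.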

If your language allows for multiple modules to be
defined in different source files (including libraries)
and allows them to be invoked in arbitrary time orderings,
then what's your approach?  Besides inspecting each such
reference?

> The most straightforward way to arrange the code in C would be to
> separate the disjoint "overlay" scopes by source file / compilation
> unit.
> 
> pseudo'ing the example above in C:
> 
>    ----------------
>    F1();
>    F2();
>    main()
>    {
>       F1();
>       F2();
>    }
>    ----------------
>    static G1() {}
>    static H1() {}
>    F1()
>    {
>     ...
>    }
>    ----------------
>    static G2() {}
>    static H2() {}
>    F2()
>    {
>     ...
>    }
>    ----------------

X4()
{
    X7();
}
------
X5()
{
    F2();
}
-----
X6()
{
    if (blah)
        X4();
}
-----
X7()
{
    F1();
}
----
X8()
{
    X4();
}

[This isn't really far-fetched with libraries that can be interrelated or
interact in other ways]

At some point in time, in some module, I decide to unmap the resources that
F1 consumes.  Maybe that happens conditionally.

But, at some later point in time, X8() is invoked.  Or, X4().  Or, X7().
Or, X6() with blah being true.

You have to track the dependency hierarchy for all of these "objects"
and ensure that none of them are invoked *after* you have unmapped F1.
Because you can't recreate F1, its resources or its actual "value"/meaning.
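
The "who" half of that bookkeeping can be automated.  A minimal sketch,
with the call edges transcribed by hand from the X4..X8 fragments above
(the enum and mark_reaches are illustrative names; a real tool would pull
the edges from the compiler or a link map):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { F1, F2, X4, X5, X6, X7, X8, NFUNCS };

/* Static call edges, { caller, callee }.  X6 -> X4 is conditional in
 * the source, but a conservative analysis has to count it anyway. */
static const int edges[][2] = {
    { X4, X7 }, { X5, F2 }, { X6, X4 }, { X7, F1 }, { X8, X4 },
};

/*
 * Set reaches[f] for every function f that can end up executing
 * `target`, directly or through a chain of calls (target included).
 * Simple fixed-point iteration over the edge list.
 */
void mark_reaches(int target, bool reaches[NFUNCS])
{
    assert(target >= 0 && target < NFUNCS);
    for (int f = 0; f < NFUNCS; f++)
        reaches[f] = false;
    reaches[target] = true;

    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t e = 0; e < sizeof edges / sizeof edges[0]; e++) {
            int caller = edges[e][0];
            int callee = edges[e][1];
            if (reaches[callee] && !reaches[caller]) {
                reaches[caller] = true;   /* caller inherits the hazard */
                changed = true;
            }
        }
    }
}
```

For F1 this flags X7, X4, X6 and X8 but not X5 or F2.  It says nothing
about WHEN any of them run relative to the unmap, which is the part the
tools don't give you.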

And, you can't exhaustively test to ensure that every possible path through
the application is exercised.  You could easily have a latent bug that
only surfaces in some particular set of circumstances that you didn't
consider, test or encounter, before.

And, "you" likely won't be handling the segfault so can't do anything
about it.

We aren't accustomed to having objects (code or data) disappear during
the course of developing or executing a piece of code.  If "foo"
existed at some point in the program -- and you are executing in
the same context and scope, then why would you NOT expect it to still
be there?

You can do an RPC and then be surprised that the NEXT invocation
fails -- because the remote host disappeared.  You code against
that possibility.

But, you expect everything that YOU defined/created to remain
accessible.  When you wrote that "late" X8() invocation, you likely
weren't thinking "have I already unmapped any of the resources
on which it relies?"

> where the horizontal lines denote source file / compilation unit
> boundaries.
> 
> This way the compiler will catch most problems, and the code will be
> in modules that potentially could be loaded/unloaded independently
> (assuming you have a way to do that).
> 
> But, as you said, it takes some discipline.
> 
> The best way is to just use DLLs if you can.  Export only the entry
> points and everything else will be hidden.

I just delete objects and whatever they contain/represent goes away
in that action.  Whether it is a collection of data that I no longer
need, a group of functions, a subassembly, etc.  The developer
has to decide how to group "things" for their utility and temporal
pertinence.

You'd not want to group atexit() and command line processing in the same
object because you would likely want to shed the command line processing
resources early in execution -- just after processing the argument list.
Yet, atexit() would want to linger until the literal last gasp, regardless
of how/when that occurs.


