Path: csiph.com!news.mixmin.net!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups: comp.lang.c
Subject: Re: Dymamic arrays: memory management and naming
Date: Tue, 19 Sep 2023 07:58:57 -0700
Organization: A noiseless patient Spider
Lines: 117
Message-ID: <86o7hyky4u.fsf@linuxsc.com>
References: <20230909132336.99cdb303ad685bc1e40f92df@gmail.moc> <86ediwmk8b.fsf@linuxsc.com> <20230919011018.d9ed14da6e842291d47b69d7@gmail.moc>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="2708376e434a174750b1cbd4a8683397"; logging-data="2530628"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/hYcKi1HnEiki2vY4slRrk1AxEK/sL0lM="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:JKuZTWMTjl7pN2ri+OWWVunK3NU= sha1:J8dUbw8cfHqQDkYvonTgN1zB48c=
Xref: csiph.com comp.lang.c:176032

Anton Shepelev <anton.txt@gmail.moc> writes:

> Tim Rentsch:
>
>> Another point to consider:  if you are using malloc(), and
>> the array elements might be small (eg chars or shorts),
>
> I thought such effects should be prominent only with small
> array elements /and/ small array lengthes... I expected them
> to drown below the noise floor (so to say) when the length
> becomes large.

I think it's not as simple as that.  Consider a scenario:  a
program is making use of a growable character buffer, with
eventually 150 bytes needed.  Let's take an initial guess of 8
elements, and growing by a factor of two.  The chain of events in
the two cases would look like

(ignore grain, etc)       8   16   32   64  128  256  (272 used)
(factor in grain, etc)   24   56  120  248            (256 used)

Taking granularity, etc, into account the program needs just four
allocations (and uses only slightly less memory).  Ignoring these
factors the program needs six allocations.  Granted, the memory
savings is only about 6 percent, but there is also a time savings
by virtue of needing fewer re-allocations.

(If the numbers in the second line are confusing, remember that
at each stage the numbers are rounded up to be 8 mod 16.  The
initial 24 is because that's the smallest amount of memory that
the allocator will fit in a single block;  any smaller amount
will end up returning 24 usable bytes, so we might as well ask
for that.)

I admit this is only one scenario, and other scenarios will give
different results, including some where a naive approach does
better.  Still, the potential upside seems worth getting, since
the cost of getting it (basically some simple arithmetic in the
allocation function) is pretty small.

>> you
>> might try to take into account minimum allocation size,
>> grain, and overhead.  Here is a program to help explain:
>>
>>   #include <stdio.h>
>>   #include <stdlib.h>
>>
>>   int
>>   main(){
>>     enum { N = 90 };
>>     char *blocks[N+1];
>>     unsigned i;
>>     for(  i = 0;  i < N+1;  i++  ){
>>       blocks[i] = malloc( i );
>>     }
>>     for(  i = 0;  i < N;  i++  ){
>>       printf( "i: %2u   delta: %3td0, i, blocks[i+1] - blocks[i] );
>>     }
>>   }
>>
>> Running this program on my system here, it indicates a grain
>> of 16 bytes, an overhead of 8 bytes
>
> Same here (on the Freeshell.de server).  I had not thought
> of that at all, assuming subconsciously that the allocation
> map (some sort of RAM filesystem) is kept separately from
> the actual allocated blocks.  May I rely on the
> contiguousness of each overhead data element with the
> corresponding allocated block of data?

I think most allocators use a little bit of memory just before
each allocated block, to facilitate free()-ing and coalescing of
adjacent free blocks.  That isn't guaranteed of course, but it's
common.

> May I rely in
> general on the sequential order of allocations that your
> program clearly demonstrated in both our cases?

When a program starts it's likely that the allocator has a single
large block of memory available, and will fill successive calls
at one end of that block.  Again, not guaranteed, but I expect
it occurs often enough that it's worth making a stab.

If an attempt to discover the allocator characteristics fails, I
think a reasonable fallback is to guess a granularity of 16 and
an overhead of 8 (minimum size 24) for 64-bit systems, and half
of those for 32-bit systems.

> For aught I
> know, the memory manager is free to do howsoever it wishes...

Certainly it is, but it's not just a coincidence that the
numbers you got match the numbers that I got.

Having said that, I should add that my experience with MS Windows
is that it has enormous amount of overhead for each allocated
block.

> Thanks for the observation, Tim.  As you see, it is somewhat
> below the horizon (geologically speaking) of my knowledge.
> If you should care to provide a strategy that estimates
> these parameters of the memory manager and then uses them to
> optimise the allocation of dynamic arrays, I shall gladly
> include it as as an option with my implementation.  Right now
> I am casting for some lean, clean, and not git-driven platform
> to host the code.

My first suggestion is to provide an option (runtime selectable)
that guesses 'sizeof (size_t)' for overhead, twice that for the
granularity, and three times that for the minimum request size.
It's probably worth adding an initialization call to your
allocator interface, so that you can later add code that tries to
discover the true values of these parameters.

(Note: some spelling corrections to your message have been
made.)