
Hint For GNU/Linux Programmers

Started by	Lester Thorpe <lt@gnu.rocks>
First post	2025-09-10 12:33 +0000
Last post	2025-09-12 00:15 -0400
Articles	16 — 8 participants


  Hint For GNU/Linux Progrmmers Lester Thorpe <lt@gnu.rocks> - 2025-09-10 12:33 +0000
    Re: Hint For GNU/Linux Progrmmers John McCue <jmclnx@gmail.com.invalid> - 2025-09-11 15:11 +0000
      Re: Hint For GNU/Linux Progrmmers John Ames <commodorejohn@gmail.com> - 2025-09-11 09:30 -0700
        Re: Hint For GNU/Linux Progrmmers c186282 <c186282@nnada.net> - 2025-09-12 00:25 -0400
      Re: Hint For GNU/Linux Progrmmers Lester Thorpe <lt@gnu.rocks> - 2025-09-11 18:42 +0000
      Re: Hint For GNU/Linux Progrmmers Lawrence D’Oliveiro <ldo@nz.invalid> - 2025-09-11 22:38 +0000
        Re: Hint For GNU/Linux Progrmmers c186282 <c186282@nnada.net> - 2025-09-12 03:02 -0400
          Re: Hint For GNU/Linux Progrmmers Lester Thorpe <lt@gnu.rocks> - 2025-09-12 08:19 +0000
            Re: Hint For GNU/Linux Progrmmers Stéphane CARPENTIER <sc@fiat-linux.fr> - 2025-09-13 13:44 +0000
          Re: Hint For GNU/Linux Progrmmers Farley Flud <fsquared@fsquared.linux> - 2025-09-12 10:17 +0000
            Re: Hint For GNU/Linux Progrmmers c186282 <c186282@nnada.net> - 2025-09-12 06:51 -0400
            Re: Hint For GNU/Linux Progrmmers The Natural Philosopher <tnp@invalid.invalid> - 2025-09-12 15:45 +0100
              Re: Hint For GNU/Linux Progrmmers Farley Flud <fsquared@fsquared.linux> - 2025-09-12 15:20 +0000
                Re: Hint For GNU/Linux Progrmmers The Natural Philosopher <tnp@invalid.invalid> - 2025-09-13 07:57 +0100
              Re: Hint For GNU/Linux Progrmmers John Ames <commodorejohn@gmail.com> - 2025-09-12 10:11 -0700
      Re: Hint For GNU/Linux Progrmmers c186282 <c186282@nnada.net> - 2025-09-12 00:15 -0400

#73831 — Hint For GNU/Linux Programmers

From	Lester Thorpe <lt@gnu.rocks>
Date	2025-09-10 12:33 +0000
Subject	Hint For GNU/Linux Progrmmers
Message-ID	<pan$88697$563f5c11$d06bbf9a$5d033190@gnu.rocks>

Program optimization is essential, but yet it is difficult
to arrive at a best method.

For example, unrolling all loops can either improve or
degrade performance.

To get the best optimization, the user therefore has
to experiment, through profiling or trial and error,
to arrive at the best means.  This can be prohibitive
for many users.
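
The trial-and-error loop described above can be partly automated. What follows is only a minimal sketch, assuming GCC is installed as `gcc` and the program under test is a single C file. The file name `bench.c`, the candidate list and the helper names are all illustrative and do not come from any real package.

```python
import subprocess
import time

# Candidate flag sets to compare. These are real GCC options, but the
# selection is only an example, not a recommendation.
CANDIDATES = [
    ["-O1"],
    ["-O2"],
    ["-O3"],
    ["-O2", "-funroll-loops"],
]


def gcc_command(source, output, flags):
    """Build the gcc command line for one candidate flag set."""
    return ["gcc", *flags, "-o", output, source]


def time_command(cmd, repeats=3):
    """Run cmd several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def compare(source):
    """Compile source once per flag set and time each resulting binary."""
    results = {}
    for i, flags in enumerate(CANDIDATES):
        exe = f"./bench_{i}"
        subprocess.run(gcc_command(source, exe, flags), check=True)
        results[" ".join(flags)] = time_command([exe])
    return results
```

Calling `compare("bench.c")` would return a dict mapping each flag string to its best run time on the machine doing the measuring. The ranking is only valid for that machine and that input.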

I propose that GNU/Linux programmers should determine
the best options and then publish these recommendations
in the source tree to guide the interested user.
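
One way such published recommendations might look is a small machine-readable file in the source tree. The sketch below is purely hypothetical: the file name `OPTFLAGS.json`, its schema and the notes in it are invented for illustration and are not an existing convention. Only the GCC flags themselves are real.

```python
import json

# Hypothetical contents of a file such as "OPTFLAGS.json" shipped in a
# source tree. Neither the file name nor the schema is an existing
# convention, and the notes are placeholders.
EXAMPLE = """
{
  "default": ["-O2"],
  "profiles": {
    "x86_64": {"flags": ["-O3", "-march=native"], "note": "placeholder note"},
    "aarch64": {"flags": ["-O2", "-funroll-loops"], "note": "placeholder note"}
  },
  "avoid": ["-ffast-math"]
}
"""


def recommended_flags(text, arch):
    """Return the author's suggested flags for arch, or the default set."""
    data = json.loads(text)
    profile = data.get("profiles", {}).get(arch)
    flags = profile["flags"] if profile else data["default"]
    # Drop anything the author explicitly warns against.
    return [f for f in flags if f not in data.get("avoid", [])]
```

A build script could then call `recommended_flags(text, "x86_64")` and append the result to its own `CFLAGS`, falling back to the default set for architectures the author never measured.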

When I create programs I always determine the best optimization
but these programs are only for my own use.  They are
never published.



-- 
Gentoo: the only road to GNU/Linux perfection.


#73897

From	John McCue <jmclnx@gmail.com.invalid>
Date	2025-09-11 15:11 +0000
Message-ID	<109uoqs$2n3c4$1@dont-email.me>
In reply to	#73831

Follow-ups trimmed to comp.os.linux.misc

In comp.os.linux.misc Lester Thorpe <lt@gnu.rocks> wrote:
> Program optimization is essential, but yet it is difficult
> to arrive at a best method.
<snip> 
> I propose that GNU/Linux programmers should determine
> the best options and then publish these recommendations
> in the source tree to guide the interested user.

I find O1 is good enough for all programs I create.

To me, testing and retesting different optimizations is a
huge waste of time and at most you might save 1 second :)

For programs created by others, I keep whatever setting
they use since they know much better than me.

-- 
csh(1) - "An elegant shell, for a more... civilized age."
                     - Paraphrasing Star Wars


#73898

From	John Ames <commodorejohn@gmail.com>
Date	2025-09-11 09:30 -0700
Message-ID	<20250911093036.00006328@gmail.com>
In reply to	#73897

On Thu, 11 Sep 2025 15:11:24 -0000 (UTC)
John McCue <jmclnx@gmail.com.invalid> wrote:

> I find O1 is good enough for all programs I create.
> 
> To, me, testing and retesting different optimizations is a
> huge waste of time and at most you might save 1 second :)

Even if you're an optimization freak (and there's nothing wrong with
that,) the efficacy of tweaks like loop unrolling is highly dependent
on machine particulars (cache size, etc.) - it's difficult if not
impossible to establish a one-size-fits-all recipe for True Optimum
Performance that could be handed out to non-freaks, as is being
suggested here. Some level of tweaking may be warranted (e.g. unrolling
loops in a way that suits the particular algorithm,) but there's little
point trying to generalize deep grease-monkey fine-tuning across *all
target systems* even for a single distro, let alone The World At Large.
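
For readers who have not seen it written out, this is roughly what the unrolling transformation does to a loop. In C the compiler can do it on request (for example GCC's `-funroll-loops`). Python is used here only to show the shape of the change, and no speed claim is implied. The function names are made up for the illustration.

```python
def dot_rolled(a, b):
    """Straightforward loop: one multiply-add per iteration."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a, b):
    """Same computation, unrolled by four with a cleanup loop for the tail."""
    n = len(a)
    total = 0.0
    i = 0
    while i + 4 <= n:
        total += a[i] * b[i]
        total += a[i + 1] * b[i + 1]
        total += a[i + 2] * b[i + 2]
        total += a[i + 3] * b[i + 3]
        i += 4
    while i < n:  # leftover elements when n is not a multiple of 4
        total += a[i] * b[i]
        i += 1
    return total
```

The unrolled form checks the loop condition a quarter as often but takes more instructions to express. That trade of fewer branches for larger code is why the payoff depends on things like cache size.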


#73920

From	c186282 <c186282@nnada.net>
Date	2025-09-12 00:25 -0400
Message-ID	<8JednfQnLaxBPV71nZ2dnZfqnPWdnZ2d@giganews.com>
In reply to	#73898

On 9/11/25 12:30 PM, John Ames wrote:
> On Thu, 11 Sep 2025 15:11:24 -0000 (UTC)
> John McCue <jmclnx@gmail.com.invalid> wrote:
> 
>> I find O1 is good enough for all programs I create.
>>
>> To, me, testing and retesting different optimizations is a
>> huge waste of time and at most you might save 1 second :)
> 
> Even if you're an optimization freak (and there's nothing wrong with
> that,) the efficacy of tweaks like loop unrolling is highly dependent
> on machine particulars (cache size, etc.) - it's difficult if not
> impossible to establish a one-size-fits-all recipe for True Optimum
> Performance that could be handed out to non-freaks, as is being
> suggested here. Some level of tweaking may be warranted (e.g. unrolling
> loops in a way that suits the particular algorithm,) but there's little
> point trying to generalize deep grease-monkey fine-tuning across *all
> target systems* even for a single distro, let alone The World At Large.

   Best tack - proto, then re-write a few weeks later.
   The 2nd take will be smarter, tighter.

   Compiler options ... only deliver slight improvements.
   Best used if you need SMALLER, not faster.

   Had one microcontroller app I kept tweaking for
   five or six generations. Each time I could zap
   unnecessary steps. Got it down nearly 50% from
   the original - saved power (it was a solar-powered
   field app so that was kinda important).

   New "AI" code-writing ... don't count on much
   "optimization". The AI won't really "get it".
   It may work - but be kinda messy. If it's a
   popular app, figure the power/time consumption
   of 'messy' for millions/billions of users.


#73906

From	Lester Thorpe <lt@gnu.rocks>
Date	2025-09-11 18:42 +0000
Message-ID	<pan$f36d4$163cb056$190affba$333a3f12@gnu.rocks>
In reply to	#73897

On Thu, 11 Sep 2025 15:11:24 -0000 (UTC), John McCue wrote:

> 
> To, me, testing and retesting different optimizations is a
> huge waste of time and at most you might save 1 second :)
> 

That was my original point and the reason I suggest that
programmers should do the dirty work for the user.

But seconds can quickly add up.  For audio/video encoding and
math/physics simulations optimization can mean the difference
between 20 minutes and 1 hour, which is highly significant.

There is an "ancient" program called "paranoia" which evaluates
a machine's floating-point accuracy:

<https://netlib.sandia.gov/paranoia/paranoia.c>

Using your "-O1" to compile would lead to erroneous results.
In this case, "-O0" is required.
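
To give a flavour of the kind of probe such a program performs, here is a tiny sketch that finds the machine epsilon of a double by repeated halving. It is not a port of paranoia (which is written in C), only an illustration. One commonly cited reason that optimized C builds can disturb probes like this is that intermediate values may be kept in wider registers (for example 80-bit x87) instead of being rounded to a 64-bit double. Whether that is what affects paranoia at "-O1" on a given system is an assumption that would need checking.

```python
import sys


def machine_epsilon():
    """Halve eps until adding half of it to 1.0 no longer changes the sum."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


if __name__ == "__main__":
    # Print the probed value next to the one the runtime reports.
    print(machine_epsilon(), sys.float_info.epsilon)
```

If the comparison inside the loop were evaluated at a higher precision than the stored variable, the loop would run longer and report a smaller epsilon than the format really has.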

Granted, this program precedes Linux and GCC, but other, more
recent programs may behave in similar ways regarding optimization.

Therefore, it should be the programmer's responsibility to
indicate the correct optimization. 

> For programs created by others, I keep whatever setting
> the use since they know much better than me.

-- 
Gentoo: the only road to GNU/Linux perfection.


#73911

From	Lawrence D’Oliveiro <ldo@nz.invalid>
Date	2025-09-11 22:38 +0000
Message-ID	<109vj1l$2vup8$6@dont-email.me>
In reply to	#73897

On Thu, 11 Sep 2025 15:11:24 -0000 (UTC), John McCue wrote:

> To, me, testing and retesting different optimizations is a huge waste of
> time and at most you might save 1 second :)

I was once hired to build an app in MATLAB for decoding and displaying 
multiple channels of EEG data, using its built-in GUI tools (momentary 
shudder as the PTSD kicks in), in real time. One of the original 
researchers had already written some stream-decoding code to start with; I 
had a go at doing it in different ways, and was able to achieve close to a 
2:1 speedup on the DEC Alpha I was using for testing.

Then I ran the same code on the Windows NT box which was going to be used 
as the actual deployment platform ... and most of the speedup went away.


#73940

From	c186282 <c186282@nnada.net>
Date	2025-09-12 03:02 -0400
Message-ID	<XPedndWqAP33WF71nZ2dnZfqnPednZ2d@giganews.com>
In reply to	#73911

On 9/11/25 6:38 PM, Lawrence D’Oliveiro wrote:
> On Thu, 11 Sep 2025 15:11:24 -0000 (UTC), John McCue wrote:
> 
>> To, me, testing and retesting different optimizations is a huge waste of
>> time and at most you might save 1 second :)
> 
> I was once hired to build an app in MATLAB for decoding and displaying
> multiple channels of EEG data, using its built-in GUI tools (momentary
> shudder as the PTSD kicks in), in real time. One of the original
> researchers had already written some stream-decoding code to start with; I
> had a go at doing it in different ways, and was able to achieve close to a
> 2:1 speedup on the DEC Alpha I was using for testing.
> 
> Then I ran the same code on the Windows NT box which was going to be used
> as the actual deployment platform ... and most of the speedup went away.

   THE best op is to proto, look/think for a few weeks,
   then re-write.

   That will do FAR more than any compiler tweaks.

   I like to proto in Python, then re-write in Pascal
   or maybe K&R 'C' depending.

   New - "AI" generated code. The "AI" does NOT
   "get it". Its code will be MESSY - 'Lego'.
   Maybe not so bad for random utilities, but if
   the app is meant for millions/billions then
   shitty code sucks a LOT more CPU cycles and
   energy.

   "AI" ... at present it's gonna suck maybe
   25% of the entire global energy output just
   so it can pretend to be idiot people. BIZ
   loves it because they think it can replace
   all those annoying HUMANS. Alas, disemployed
   humans can't BUY their stuff so .......

   Can't get there from here.

   Sorry.


#73947

From	Lester Thorpe <lt@gnu.rocks>
Date	2025-09-12 08:19 +0000
Message-ID	<pan$7e885$b476cfd9$150b5649$705f5b01@gnu.rocks>
In reply to	#73940

On Fri, 12 Sep 2025 03:02:09 -0400, c186282 wrote:

> 
>    THE best op is to proto, look/think for a few weeks,
>    then re-write.
> 
>    That will do FAR more than any compiler tweaks.
> 

Everyone is missing the main point.

I am referring to optimizing code that is already published
and available, e.g. the average GNU/Linux package.

This code cannot be (easily) rewritten by the user and the
only way to optimize is during build time, which can be quite
effective.  I have experienced up to 40% performance increase
using just compiler options.

But finding the best options can at times be difficult and
that's why the code author should provide guidance.

-- 
Gentoo: the only road to GNU/Linux perfection.


#74032

From	Stéphane CARPENTIER <sc@fiat-linux.fr>
Date	2025-09-13 13:44 +0000
Message-ID	<68c57549$0$3363$426a34cc@news.free.fr>
In reply to	#73947

On 12-09-2025, Lester Thorpe <lt@gnu.rocks> wrote:
> On Fri, 12 Sep 2025 03:02:09 -0400, c186282 wrote:
>
>> 
>>    THE best op is to proto, look/think for a few weeks,
>>    then re-write.
>> 
>>    That will do FAR more than any compiler tweaks.
>> 
>
> Everyone is missing the main point.

Which one?
- That you are a fraud? Nope: I know it.
- That you don't know how to optimize compilation? Nope: I know it.
- That you can only copy/paste code? Nope: I know it.
- That you are a distro lackey? Nope: I know it.
- The fact that the more you speak about something, the less you know
  about it? Nope: I know it.
- That you are a Windows fanboy trying to make Linux users look like
  morons? Nope: I know it.

> I am referring to optimizing code that is already published
> and available, e.g. the average GNU/Linux package.

You mean that the guys who wrote and published the code know how to
compile it? Or do you mean that the people competent enough to write
code for a great tool are too stupid to be able to know how to compile
it?

Do you really understand how your sentence is, at the same time, stupid
and inconsistent? You explain at the same time they know what they are
doing and they don't know what they are doing. You just explained you
need random people to help you find a general way to sort out what
experts do well and what they don't.

> I have experienced up to 40% performance increase using just compiler
> options.

I don't believe that. And your last video proves that it's a lie.

> But finding the best options can at times be difficult

Agreed. But I don't believe you can find them. And, I believe the distro
managers, helped by the people who provided the code, can do it. In
any case, it would take me hours to find better options than what's
provided by the distro managers helped by package producers to get
noticeable results on my own computer. Another way to state it: spending
hours to save a few seconds each month is a waste of my precious time.

-- 
If you have time to waste:
https://scarpet42.gitlab.io


#73951

From	Farley Flud <fsquared@fsquared.linux>
Date	2025-09-12 10:17 +0000
Message-ID	<186481985309dd34$12201$2237616$802601b3@news.usenetexpress.com>
In reply to	#73940

On Fri, 12 Sep 2025 03:02:09 -0400, c186282 wrote:

> 
>    THE best op is to proto, look/think for a few weeks,
>    then re-write.
> 
>    That will do FAR more than any compiler tweaks.
> 

Not necessarily.

Consider the Automatically Tuned Linear Algebra Software (ATLAS):

<https://math-atlas.sourceforge.net/>

Linear algebra (i.e. matrix operations) software is used as a standard
benchmark for all supercomputers.

The ATLAS program will automatically tune itself, using compiler options,
for the best performance on a particular machine.

ATLAS has some pre-determined options for certain CPUs but if a CPU
is not on the list ATLAS will then undergo an automatic tuning wherein
different options are tried and compared.

Compiler tweaks can make a big difference.

The original point of this thread is that all software should
emulate ATLAS to some extent.
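
The general idea (use a pre-determined answer when the CPU is known, otherwise try candidates and keep the fastest) can be sketched in a few lines. This is not ATLAS's code, and its real tuning is far more involved. The function name, the CPU labels and the timings below are all made up for illustration.

```python
def choose_flags(cpu, known, candidates, measure):
    """Pick flags for cpu: use the table if it has an entry, else search.

    known      -- dict mapping a CPU name to a pre-determined flag list
    candidates -- list of flag lists to try when cpu is not in known
    measure    -- callable taking a flag list and returning a run time
                  in seconds (lower is better)
    """
    if cpu in known:
        return known[cpu]
    timings = [(measure(flags), flags) for flags in candidates]
    return min(timings, key=lambda pair: pair[0])[1]


# Made-up timings standing in for real benchmark runs.
FAKE_SECONDS = {"-O2": 2.0, "-O3": 1.5, "-O3 -funroll-loops": 1.8}


def fake_measure(flags):
    """Look up a pretend run time instead of building and running anything."""
    return FAKE_SECONDS[" ".join(flags)]
```

In a real build, `measure` would compile and run a benchmark, which is where the cost lies: every candidate adds a full build plus timing runs.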

-- 
Hail Linux!  Hail FOSS!  Hail Stallman!


#73958

From	c186282 <c186282@nnada.net>
Date	2025-09-12 06:51 -0400
Message-ID	<IfCdnU0dLJHfZl71nZ2dnZfqnPudnZ2d@giganews.com>
In reply to	#73951

On 9/12/25 6:17 AM, Farley Flud wrote:
> On Fri, 12 Sep 2025 03:02:09 -0400, c186282 wrote:
> 
>>
>>     THE best op is to proto, look/think for a few weeks,
>>     then re-write.
>>
>>     That will do FAR more than any compiler tweaks.
>>
> 
> Not necessarily.

   Should ......... always did for me .......

> Consider the Automatically Tuned Linear Algebra Software (ATLAS):
> 
> <https://math-atlas.sourceforge.net/>

   Ugh ....

   Gimme 'pure' 'C' or Pascal or FORTRAN.

   Linear algebra is not the best solution to everything.

> Linear algebra (i.e. matrix operations) software is used as a standard
> benchmark for all supercomputers.

   Not interested in that sort of benchmark.

> The ATLAS program will automatically tune itself, using compiler options,
> for the best performance on a particular machine.

   But we're not really talking 'compiler options' here
   but good/better/best source code. If your source is
   messy then no compiler can help you much.

> ATLAS has some pre-determined options for certain CPUs but if a CPU
> is not on the list ATLAS will then undergo an automatic tuning wherein
> different options are tried and compared.
> 
> Compiler tweaks can make a big difference.

   Depends. Garbage IN = Garbage OUT.

> The original point of this thread is that all software should
> emulate ATLAS to some extent.

   Ummm ... maybe in deep theory ...... but that's not
   how the 99% will do it. "Hello World" does NOT need
   this approach.


#73972

From	The Natural Philosopher <tnp@invalid.invalid>
Date	2025-09-12 15:45 +0100
Message-ID	<10a1bmp$3i30k$10@dont-email.me>
In reply to	#73951

On 12/09/2025 11:17, Farley Flud wrote:
> The ATLAS program will automatically tune itself, using compiler options,
> for the best performance on a particular machine.

How does it know what machine is the target?
-- 
There is something fascinating about science. One gets such wholesale 
returns of conjecture out of such a trifling investment of fact.

Mark Twain


#73976

From	Farley Flud <fsquared@fsquared.linux>
Date	2025-09-12 15:20 +0000
Message-ID	<1864922473a43b41$102$2557511$802601b3@news.usenetexpress.com>
In reply to	#73972

On Fri, 12 Sep 2025 15:45:45 +0100, The Natural Philosopher wrote:

> On 12/09/2025 11:17, Farley Flud wrote:
>> The ATLAS program will automatically tune itself, using compiler options,
>> for the best performance on a particular machine.
> 
> How does it know what machine is the target?
>

The tuning occurs during build time.  The "target" is the machine upon which
it is being built.

No binaries are distributed.  Only the source code is available.

However, some GNU/Linux distros will include binary Atlas packages
but these are necessarily sub-optimal builds.  Check out the blurb
from Fedora:

https://www.rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/a/atlas-3.10.3-30.fc43.x86_64.html

-- 
Hail Linux!  Hail FOSS!  Hail Stallman!


#74007

From	The Natural Philosopher <tnp@invalid.invalid>
Date	2025-09-13 07:57 +0100
Message-ID	<10a34kk$7bv3$1@dont-email.me>
In reply to	#73976

On 12/09/2025 16:20, Farley Flud wrote:
> On Fri, 12 Sep 2025 15:45:45 +0100, The Natural Philosopher wrote:
> 
>> On 12/09/2025 11:17, Farley Flud wrote:
>>> The ATLAS program will automatically tune itself, using compiler options,
>>> for the best performance on a particular machine.
>>
>> How does it know what machine is the target?
>>
> 
> The tuning occurs during build time.  The "target" is the machine upon which
> it is being built.
> 
That will do really nicely when I am compiling for an ARM 2040 on my *86 
machine, then...
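
The cross-compiling objection suggests that any build-time tuner needs a guard: timing runs on the build machine only mean something when that machine can stand in for the target. A minimal sketch of such a check follows. The function names and the small alias table are assumptions for illustration, not part of any real build system.

```python
import platform


def normalize(arch):
    """Map common spellings of an architecture name to one form."""
    aliases = {"amd64": "x86_64", "arm64": "aarch64"}
    name = arch.lower()
    return aliases.get(name, name)


def can_tune_on_host(target_arch, host_arch=None):
    """True when the build machine has the same architecture as the target."""
    host = host_arch or platform.machine()
    return normalize(host) == normalize(target_arch)
```

Even when this returns True, the same architecture does not guarantee the same cache sizes or microarchitecture, so a tuner would still be guessing for any machine other than the one it ran on.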

> No binaries are distributed.  Only the source code is available.
> 
> However, some GNU/Linux distros will include binary Atlas packages
> but these are necessarily sub-optimal builds.  Check out the blurb
> from Fedora:
> 
> https://www.rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/a/atlas-3.10.3-30.fc43.x86_64.html
> 
> 
> 

-- 
“Politics is the art of looking for trouble, finding it everywhere, 
diagnosing it incorrectly and applying the wrong remedies.”
― Groucho Marx


#73980

From	John Ames <commodorejohn@gmail.com>
Date	2025-09-12 10:11 -0700
Message-ID	<20250912101103.00007f46@gmail.com>
In reply to	#73972

On Fri, 12 Sep 2025 15:45:45 +0100
The Natural Philosopher <tnp@invalid.invalid> wrote:

> > The ATLAS program will automatically tune itself, using compiler
> > options, for the best performance on a particular machine.  
> 
> How does it know what machine is the target?

Presumably it targets the machine on which it's running. Reminds me a
bit of one of the few genuinely smart things MS's .NET framework does -
part of the install/update process involves it auto-profiling/tuning
its core VM interpreter/library in situ so it can accurately benchmark
itself. That only affects raw VM performance (a bad algorithm running
on top of a fast VM is still gonna suck,) and it'd be more involved to
do something comparable with a native-code application (dynamic linking
might save you the trouble of a full recompile, but it'd still be non-
trivial,) but it *is* a nice touch.


#73919

From	c186282 <c186282@nnada.net>
Date	2025-09-12 00:15 -0400
Message-ID	<wIadnWI-CYX5A171nZ2dnZfqn_GdnZ2d@giganews.com>
In reply to	#73897

On 9/11/25 11:11 AM, John McCue wrote:
> Follow-ups trimmed to comp.os.linux.misc
> 
> In comp.os.linux.misc Lester Thorpe <lt@gnu.rocks> wrote:
>> Program optimization is essential, but yet it is difficult
>> to arrive at a best method.
> <snip>
>> I propose that GNU/Linux programmers should determine
>> the best options and then publish these recommendations
>> in the source tree to guide the interested user.
> 
> I find O1 is good enough for all programs I create.

   Yep.

   As for the actual writ code, write it once, wait
   a week or two, then write it over again better.
   I tended to proto in Python, then re-do in Pascal.
   The re-do was always a lot tighter/smarter.

> To, me, testing and retesting different optimizations is a
> huge waste of time and at most you might save 1 second :)

   Yep, esp at the compiler level.

   Refined source - might improve 25% or so. Drop
   unneeded/weird steps.

> For programs created by others, I keep whatever setting
> the use since they know much better than me.

   Well, not necessarily ....

