


Re: CPU enumeration (OT)

From Paul <nospam@needed.invalid>
Newsgroups alt.comp.os.windows-11
Subject Re: CPU enumeration (OT)
Date 2025-04-26 07:02 -0400
Organization A noiseless patient Spider
Message-ID <vuieh2$28abr$1@dont-email.me>
References <vuhtr9$1opqk$1@dont-email.me>



On Sat, 4/26/2025 2:18 AM, Jeff Barnett wrote:
> Background: I recall reading in the ACPI specs (a decade or so ago) that some parts of a computer description were an enumeration of logical cores (think hyper thread hardware in Intel) and a distance map between them. Distance was composed of such information as which logical cores shared which cache levels and the number of "hops" for the logical cores to signal one another. The purpose of this grand scheme was so that an OS could intelligently allocate threads/processes to logical cores and thus contribute to efficient implementations. In today's world, the logical cores in the same physical core are very close and others are further away (the metric depends on design decisions beyond my pay grade or comprehension). For example, server configurations with multiple CPUs on the same mainboard have more interesting distance information than those that do not.
> 
> Questions: Utilities such as Windows' Task Manager and Core Temp report on various dynamic run time conditions for each logical core. My questions are fairly basic: 1) how is the enumeration of logical cores selected by these utilities, 2) is it possible from the enumeration to determine which logical cores comprise the same physical core, and 3) how do I determine which of the physical and logical cores are E and which are P in Intel equipment?

As an AI would say, "Now that's a good question." :-)

At one time, the numbering made sense.
There were counting numbers, there were cores, and the accounting
seemed to add up.

Today, you see patterns in Task Manager that indicate there
is another level of mapping going on.

Take my eight-core processor here. Typically, six cores light
up and participate, while two cores can remain at zero utilization.
What should happen is that the IOAPIC spreads background I/O load
(deferred procedure call, or DPC, interrupt handling) over all of the
cores. For some reason on my machine, the background I/O load
isn't going onto those two cores. And the excluded cores
are not at the end of the number range. It isn't cores 7 and 8 that
are excluded; it might be a couple of cores in the middle of the range.
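Windows doesn't make the DPC spread easy to see, but on Linux the kernel keeps a per-CPU tally of handled hardware interrupts in /proc/interrupts, which is a rough analogue for spotting cores that never receive any. A minimal Python sketch, assuming the usual layout (a header row of CPU0 CPU1 ... followed by one row per interrupt source); the function names are hypothetical helpers, not part of any tool:

```python
from __future__ import annotations

from pathlib import Path

def interrupt_totals(text: str) -> dict[str, int]:
    """Sum each CPU's column in /proc/interrupts-style text."""
    lines = text.splitlines()
    cpus = lines[0].split()                      # header row: CPU0 CPU1 ...
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        counts = line.split()[1:1 + len(cpus)]   # skip the "IRQ:" label
        # Rows such as "ERR:" hold a single system-wide count; skip them.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals

def quiet_cpus(totals: dict[str, int]) -> list[str]:
    """CPUs that have not handled a single interrupt."""
    return [cpu for cpu, count in totals.items() if count == 0]

if __name__ == "__main__":
    path = Path("/proc/interrupts")
    if path.exists():
        totals = interrupt_totals(path.read_text())
        print("interrupts per CPU:", totals)
        print("CPUs with none:", quiet_cpus(totals) or "none")
```

Only online CPUs appear in the header row, so an offline core simply won't show up rather than reading as zero.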

There are also some notions of "preferred cores" for when top clock
speed is required. On a CPU with two CCX silicon dice, one
seems to have its cores used to the 100% level, while the second CCX
never seems to reach full utilization, as if its clock rate
were somehow different.

They've reached a level of perfection where now none of it
makes sense any more. I don't trust the numbering scheme
displayed on the screen, or how it is being used.
Some good work there.

It reminds me of a chip we were using at work. The twits
who designed it made all the register numbers arbitrary.
The register order might be 3,4,1,2,9,5,7... And the engineers
would explain (indirectly, through their field people) that
during layout, "something happened" with the layout tool to
improve the layout, requiring reassignment of register numbers.
The chip has four thousand registers. The notion of
"enumeration" had been soundly thrashed and put to bed.

And it would seem a similar whip has been applied to our
modern hardware. It doesn't take too many added layers
before there is no intuitive sense that what is displayed
on the screen is in any way honest.

*******

One thing to note: if you are using Windows Pro, that's
good up to 64 virtual cores. There is a thing called "processor groups"
(core groups) of up to 64 logical processors each, and if you had a
384 core processor, that would be six groups. Any time you have more
than one group, you want at least Windows Pro for Workstations. The
scheduler does not work properly for Windows Pro above 64 cores (you
would lose some of your maximum multi-threaded performance if using
Windows Pro on a 96 core CPU).
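The group count is just a ceiling division, since one Windows processor group holds at most 64 logical processors. A small Python sketch (the function name is made up for illustration); it gives a lower bound only, because Windows can form more groups than the minimum, for example to keep each NUMA node inside a single group:

```python
GROUP_LIMIT = 64  # most logical processors a single Windows processor group can hold

def min_processor_groups(logical_processors: int) -> int:
    """Lower bound on the number of processor groups needed."""
    if logical_processors < 1:
        raise ValueError("need at least one logical processor")
    # Integer ceiling division: (n + 63) // 64
    return (logical_processors + GROUP_LIMIT - 1) // GROUP_LIMIT

for n in (16, 64, 65, 192, 384):
    print(f"{n:>3} logical processors -> at least {min_processor_groups(n)} group(s)")
```

A 96 core CPU with SMT enabled presents 192 logical processors, so it needs at least three groups; even with SMT off, 96 is past the single-group limit.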

Also, at some point the Task Manager display switches to Heat Map
mode instead of CPU graphs. The purpose of the Server SKUs is
licensing capabilities, like some kind of network feature
perhaps, or domains maybe. The Server SKU should also be able
to handle processor groups.

If you wanted an environment where the numbering was exposed a
tiny bit, you might watch a Linux like Knoppix start up. It
used to put icons on the screen for each core, and numbers in
the dmesg trace for the cores. But it was relying on the BIOS
for the numbering, I would presume. Maybe to some extent, the BIOS
was already hiding the distance issues and "address numbering" for
core-to-core communications in the CCX. Some of the processors
have a "fabric" for communications, with multiple high-speed serial
links on the fabric running at 56 Gbit/sec or 112 Gbit/sec. And your
segmented L3 is stuck with whatever constraints those links impose.
L3 on one processor here is only three times faster than system RAM.
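On Linux, the firmware-derived numbering ends up in sysfs, and it answers the second quoted question fairly directly: logical CPUs that share the same physical_package_id and core_id are SMT siblings on one physical core. A minimal Python sketch, assuming the standard /sys/devices/system/cpu/cpuN/topology files. For the third question, recent kernels on hybrid Intel parts typically also list P-cores in /sys/devices/cpu_core/cpus and E-cores in /sys/devices/cpu_atom/cpus; treat those two paths as an assumption to check on your own machine. (On Windows, the rough equivalent is GetLogicalProcessorInformationEx with RelationProcessorCore, whose EfficiencyClass field is higher on the more performance-oriented cores.)

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path

def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel CPU list such as "0-3,8,10-11" into individual CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus

def physical_cores(root: Path = Path("/sys/devices/system/cpu")) -> dict[tuple[int, int], list[int]]:
    """Group logical CPUs by (physical_package_id, core_id); each group is one physical core."""
    cores: dict[tuple[int, int], list[int]] = defaultdict(list)
    for topo in root.glob("cpu[0-9]*/topology"):
        cpu = int(topo.parent.name[len("cpu"):])
        package = int((topo / "physical_package_id").read_text())
        core = int((topo / "core_id").read_text())
        cores[(package, core)].append(cpu)
    return {key: sorted(cpus) for key, cpus in sorted(cores.items())}

def hybrid_split(root: Path = Path("/sys/devices")) -> dict[str, list[int]]:
    """P-core and E-core lists on hybrid Intel parts, if exposed (assumed paths)."""
    split: dict[str, list[int]] = {}
    for label, pmu in (("P", "cpu_core"), ("E", "cpu_atom")):
        cpus_file = root / pmu / "cpus"
        if cpus_file.exists():
            split[label] = parse_cpu_list(cpus_file.read_text())
    return split

if __name__ == "__main__":
    for (package, core), cpus in physical_cores().items():
        print(f"package {package}, core {core}: logical CPUs {cpus}")
    print(hybrid_split() or "no P/E split exposed")
```

core_id is paired with the package id because core_id values can repeat across sockets, and they are not guaranteed to be contiguous.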

And strangely, Intel used to have ring buses on high core
count processors. A 28 core processor would have
three ring buses, and some comms would have to "transfer"
from one ring to another. This could give dreadful performance,
and of your 28 cores, maybe you were only getting 16-24 cores
of performance due to comms issues. Then they switched to
a "bus matrix" (a mesh), like maybe six buses horizontal and five
buses vertical, and then comms might be two hops. But recently,
some of the Intel designs appear to have switched back to
the blasted rings. Your dual-channel RAM gets one bus stop,
and each core gets a bus stop. A four-channel DRAM machine would
have two bus stops for the RAM interfaces.
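To put rough numbers on the ring versus "bus matrix" comparison, here is a toy hop-count model in Python. It assumes one bidirectional ring with a stop per core, and a mesh of five columns by six rows using dimension-ordered routing (all horizontal hops, then all vertical), which is why a mesh message takes at most two legs even when each leg crosses several stops. Real interconnects add memory-controller stops, multiple rings, and different routing, so this is illustration only:

```python
from __future__ import annotations

from itertools import product

def ring_hops(a: int, b: int, stops: int) -> int:
    """Hops between two stops on a bidirectional ring (take the shorter way round)."""
    d = abs(a - b) % stops
    return min(d, stops - d)

def mesh_hops(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Hops on a 2D mesh with X-then-Y routing: one horizontal leg, one vertical leg."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def ring_average(stops: int) -> float:
    """Mean hop count over all ordered pairs of distinct stops."""
    hops = [ring_hops(a, b, stops) for a, b in product(range(stops), repeat=2) if a != b]
    return sum(hops) / len(hops)

def mesh_average(cols: int, rows: int) -> float:
    """Mean hop count over all ordered pairs of distinct tiles."""
    tiles = list(product(range(cols), range(rows)))
    hops = [mesh_hops(a, b) for a, b in product(tiles, repeat=2) if a != b]
    return sum(hops) / len(hops)

print(f"28-stop ring: {ring_average(28):.2f} hops on average")
print(f"5 x 6 mesh:   {mesh_average(5, 6):.2f} hops on average")
```

Under these assumptions the 28-stop ring averages 196/27, about 7.3 hops, while the 30-tile mesh averages about 3.7 hops, which is the kind of gap that shows up as cores waiting on their mail.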

What goes on underneath does matter, as it determines
whether some of your cores are starving, waiting for
their items to arrive in the mail. Bad comms equals
a lot of spin-wait.

And at some point, the silicon doesn't actually "like"
massive parallelism. AVX512 requires the CPU to be down-clocked
a bit. The microcode now includes clock speed and/or voltage
settings per instruction type. That's how the 13900K
and 14900K were getting damaged, and that's patchable
via a BIOS microcode update.

Summary: I don't think you want to know what's going on down there.
         (Waves hands, gives hypnotic suggestion to wake up refreshed...)
         The fabric needs absolute address bits for the mailboxes,
         but how they number cores for the screen, who knows how
         that works.

         https://i.postimg.cc/2yxCfvgn/honest-accounting.gif

   Paul



Thread

CPU enumeration (OT) Jeff Barnett <jbb@notatt.com> - 2025-04-26 00:18 -0600
  Re: CPU enumeration (OT) Paul <nospam@needed.invalid> - 2025-04-26 07:02 -0400
    Re: CPU enumeration (OT) Jeff Barnett <jbb@notatt.com> - 2025-04-27 01:01 -0600
