| From | Jeff Barnett <jbb@notatt.com> |
|---|---|
| Newsgroups | alt.comp.os.windows-11 |
| Subject | Re: CPU enumeration (OT) |
| Date | 2025-04-27 01:01 -0600 |
| Organization | A noiseless patient Spider |
| Message-ID | <vukknj$8epm$1@dont-email.me> |
| References | <vuhtr9$1opqk$1@dont-email.me> <vuieh2$28abr$1@dont-email.me> |
On 4/26/2025 5:02 AM, Paul wrote:
> On Sat, 4/26/2025 2:18 AM, Jeff Barnett wrote:
>> Background: I recall reading in the ACPI specs (a decade or so ago) that some parts of a computer description were an enumeration of logical cores (think hyper thread hardware in Intel) and a distance map between them. Distance was comprised of such information as which logical cores shared which cache levels and number of "hops" for the logical cores to signal one another. The purpose of this grand scheme was so that an OS could intelligently allocate threads/processes to logical cores and thus contribute to efficient implementations. In today's world, the logical cores in the same physical core are very close and others are further away (the metric depends on design decisions beyond my pay grade or comprehension). For example, server configurations with multiple CPUs on the same mainboard have more interesting distance information than those that do not.
>>
>> Questions: Utilities such as Windows' Task Manager and Core Temp report on various dynamic run time conditions for each logical core. My questions are fairly basic: 1) how is the enumeration of logical cores selected by these utilities, 2) is it possible from the enumeration to determine which logical cores comprise the same physical core, and 3) how do I determine which of the physical and logical cores are E and which are P in Intel equipment?
>
> As an AI would say, "now that's a good question" :-)
>
> At one time, the determination of numbering made sense.
> There were counting numbers, there were cores, the accounting
> seemed to make sense.
>
> Today, you see patterns in Task Manager, that indicate there
> is another level of mapping going on.
>
> Take my eight core processor here. Typically, six cores light
> up and participate. Two cores can remain at zero utilization.
> What should happen, is the IOAPIC spreads background I/O load
> (delayed procedure interrupt handling) over top of all the
> cores.
> For some reason on my machine, the background I/O load
> isn't going onto those two cores. And the cores being excluded,
> are not at the end of a number range. It isn't 7,8 excluded,
> it might be a couple cores in the middle of the range of numbers.
>
> There are also some notions of "preferred cores" when top clock
> speed is required. On a CPU with two CCX silicon dice, one
> seems to have its cores used to the 100% level. The second CCX
> never seems to have full utilization, as if the clock rate
> is different somehow.
>
> They've reached a level of perfection, where now none of it
> makes sense any more. I don't trust the numbering scheme
> displayed on the screen, or, how it is being used.
> Some good work there.
>
> It reminds me of a chip we were using at work. The twits
> who designed it, made all the register numbers arbitrary.
> The register order might be 3,4,1,2 9,5,7... And the engineers
> would explain (indirectly through their field people), that
> during layout, "something happened" with the layout tool to
> improve the layout requiring reassignment of register numbers.
> The chip has four thousand registers. The notion of
> "enumeration" had been soundly thrashed and put to bed.
>
> And it would seem a similar whip has been applied to our
> modern hardware. It doesn't take too many added layers,
> before there is no intuitive sense that what is displayed
> on the screen, is in any way, honest.
>
> *******
>
> One thing to note, is if you are using Windows Pro, that's
> good up to 64 virtual cores. There is a thing called "core groups",
> and if you had a 384 core processor, that would be six core groups.
> Any time you have more than one core group, you want at least
> Windows Workstation for that. The scheduler does not work properly
> for Windows Pro, above 64 cores (you would lose some of your
> max multi-threaded performance if using Windows Pro on a 96 core CPU).
>
> Also at some point, the Task Manager display switches to Heat Map
> mode, instead of CPU Graphs. The purpose of Server SKUs for
> licensing, is for the licensing of capabilities, like some kind
> of network feature perhaps. Domains maybe. The Server SKU should
> also be able to handle Core Groups.
>
> If you wanted an environment, where the numbering was exposed a
> tiny bit, you might watch a Linux like Knoppix start up. It
> used to put icons on the screen for each core. And numbers in
> the dmesg trace, for the cores. But it was relying on the BIOS
> for the numbering, I would presume. Maybe to some extent, the BIOS
> was already hiding the distance issues and "address-numbering" for
> core to core communications in the CCX. Some of the processors
> have a "fabric" for communications, with high speed serial links
> on the fabric, running at 56 Gbit/sec or 112 Gbit/sec and multiple
> of those. And your segmented L3 is stuck with the constrictions
> those arrange. L3 on one processor here, is only three times
> faster than system RAM.
>
> And strangely, Intel used to have ring buses, and high core
> count processors, like a 28 core processor, it would have
> three ring buses, and some comms would have to "transfer"
> from one ring to another. This could give dreadful performance,
> and of your 28 cores, maybe you were only getting 16-24 cores
> of performance due to comms issues. Then, they switched to
> a "bus matrix", like maybe six buses horizontal and five buses
> vertical. And then comms might be two hops. But recently,
> some of the Intel designs, appear to have switched back to
> the blasted rings. Your dual channel RAM, gets one bus stop,
> and each core gets a bus stop. A four DRAM channel machine would
> have two bus stops for the RAM interfaces.
>
> What goes on, underneath, does matter, as it determines
> whether some of your cores are starving, waiting for
> their items to arrive in the mail. Bad comms equals
> a lot of spin-wait.
>
> And at some point, the silicon doesn't actually "like"
> massive parallelism. AVX512 requires the CPU to be down-clocked
> a bit. The microcode now includes clock speed and/or voltage
> settings, per instruction type. That's how the 13900K
> and 14900K were getting damaged, and that's patch-able
> via a BIOS microcode update.
>
> Summary: I don't think you want to know what's going on down there.
> (Waves hands, gives hypnotic suggestion to wake up refreshed...)
> The fabric needs absolute address bits for the mail boxes,
> but how they number cores for the screen, who knows how
> that works.
>
> https://i.postimg.cc/2yxCfvgn/honest-accounting.gif

Thanks for your thoughts on the matter. I was poking around (not very ably) trying to figure out if M$ was actually using all this layout information that they are handed on every boot, if I understand correctly. I mean, after all, they helped design that information transfer. It could be so much more valuable to their paying customers than deciding when is the most intrusive time to insert commercials in the games included with their OS.

--
Jeff Barnett
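
For questions 2) and 3) in the quoted post, here is a minimal Python sketch of how per-core topology records could be turned into a "which logical processors share a core, and is it P or E" table. As far as I know, Windows exposes this kind of data through GetLogicalProcessorInformationEx with RelationProcessorCore: one record per physical core, carrying an affinity mask (set bits are the logical processors in that core) and an EfficiencyClass value, where a higher value means a higher-performance core type. The sketch does not call that API. CoreRecord, mask_to_logical_ids, summarize and the sample data are all hypothetical, hand-filled stand-ins, and the "P"/"E" labels assume a hybrid part with only two core types.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class CoreRecord(NamedTuple):
    """Hypothetical stand-in for one per-physical-core topology record."""
    affinity_mask: int     # bit i set => logical processor i belongs to this core
    efficiency_class: int  # assumption: larger value => higher-performance core


def mask_to_logical_ids(mask: int) -> list[int]:
    """Return the positions of the set bits in an affinity mask."""
    ids = []
    bit = 0
    while mask:
        if mask & 1:
            ids.append(bit)
        mask >>= 1
        bit += 1
    return ids


def summarize(records: Iterable[CoreRecord]) -> list[dict]:
    """List each physical core with its logical processors and a P/E guess."""
    records = list(records)
    top = max((r.efficiency_class for r in records), default=0)
    # If every core reports the same class, the part is not hybrid.
    hybrid = len({r.efficiency_class for r in records}) > 1
    rows = []
    for index, rec in enumerate(records):
        if not hybrid:
            kind = "uniform"
        else:
            kind = "P" if rec.efficiency_class == top else "E"
        rows.append({
            "core": index,
            "logical": mask_to_logical_ids(rec.affinity_mask),
            "kind": kind,
        })
    return rows


# Made-up data: two hyper-threaded P-cores and two single-threaded E-cores.
sample = [
    CoreRecord(0b000011, 1),
    CoreRecord(0b001100, 1),
    CoreRecord(0b010000, 0),
    CoreRecord(0b100000, 0),
]

for row in summarize(sample):
    print(row)
```

Note that the "core" index here is only the position of the record in the list. Nothing in the sketch says it matches the numbering Task Manager or Core Temp shows on screen, which is the open question in this thread.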
CPU enumeration (OT) Jeff Barnett <jbb@notatt.com> - 2025-04-26 00:18 -0600
Re: CPU enumeration (OT) Paul <nospam@needed.invalid> - 2025-04-26 07:02 -0400
Re: CPU enumeration (OT) Jeff Barnett <jbb@notatt.com> - 2025-04-27 01:01 -0600