| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Newsgroups | comp.arch.embedded |
| Subject | Re: arm-gcc, Cortex-M0+, uint64_t and alignment |
| Date | 2026-01-22 10:03 +0100 |
| Organization | A noiseless patient Spider |
| Message-ID | <10ksp4q$2s8q3$1@dont-email.me> |
| References | (5 earlier) <10kq1m7$1un41$1@dont-email.me> <10kq4mi$1vcdq$1@dont-email.me> <10kqpg9$26oa0$1@dont-email.me> <10kqu02$280ts$1@dont-email.me> <10kr0gf$29a7t$1@dont-email.me> |
On 21/01/2026 17:57, pozz wrote:
> Il 21/01/2026 17:13, David Brown ha scritto:
>> On 21/01/2026 15:58, pozz wrote:
>>> Il 21/01/2026 10:02, David Brown ha scritto:
>>>> On 21/01/2026 09:11, pozz wrote:
<snip for brevity>
> Most probably I am not explaining well what I want to say. I don't want
> to use an *improper* alignment (different from the one that gcc is
> really using). I want to know what happens when I *instruct* the compiler
> to use a 4-byte alignment for uint64_t, in the context of the Cortex-M0+ core only.
>
I think we may have been talking slightly past each other, so that
re-wording was helpful.
> In other words, is it completely safe to use, as you suggested,
>
> typedef uint64_t __attribute__((aligned(4))) uint64_a;
>
> ???
Barring compiler bugs, yes, that is completely safe. When the compiler
lets you make such a type, and use it, it is the compiler's
responsibility to get the details right. You should never see issues
from the compiler's knowledge and assumptions of alignments, and it
should generate instructions that work on the target (for example, if
the target hardware required 8-byte alignment for 64-bit loads and
stores, then the compiler would generate two 32-bit accesses instead).
And for the Cortex M series, 4-byte alignment is the maximum needed for
working code (though there might be efficiency differences on some of
the biggest M cores that have 64-bit buses internally, or when data
caches are used).
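
A minimal sketch of what that looks like in practice, assuming gcc or clang. The struct and function names here are hypothetical and not from the thread. The 64-bit field sits at offset 4, and every access goes through the reduced-alignment type, so the compiler is not entitled to assume 8-byte alignment when it emits the loads and stores.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit unsigned integer whose alignment requirement is lowered to 4. */
typedef uint64_t __attribute__((aligned(4))) uint64_a;

/* Hypothetical record: y follows x directly, so it is only 4-byte aligned. */
struct record {
    uint32_t x;
    uint64_a y;
};

static_assert(_Alignof(uint64_a) == 4, "typedef should lower the alignment to 4");

/* Pointers use the typedef as well, so the alignment information travels
   with them into any function that dereferences them. */
uint64_t load64(const uint64_a *p) { return *p; }
void store64(uint64_a *p, uint64_t v) { *p = v; }
```

The point of giving the pointer parameters the `uint64_a` type, rather than plain `uint64_t *`, is that the compiler sees the weaker alignment at every dereference.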
>
> From what you wrote, I think yes. Maybe just a very small optimization
> penalty.
>
Yes. And I think that is a "missed optimisation opportunity" bug. I
suspect (or speculate), but have not looked at the compiler code to be
sure, that the code generator generally accesses the low half of 64-bit
data first. And then it may have specific optimisations ("peephole"
optimisations) for re-ordering the accesses for "long long" types in
certain circumstances, saving a register and an instruction. However,
that would apply only to the specific type - and while "uint64_a" works
a lot like "unsigned long long", it is not that exact type, and won't
trigger the same optimisation.
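
To check this on a particular toolchain, a pair of accessors like the following can be compiled and the generated assembly compared. This is only a sketch: the struct and function names are hypothetical, and the command line in the comment is one plausible invocation, not necessarily the one used for the listing discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* One plausible way to inspect the output (an assumption, adjust to taste):
     arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -O2 -S test.c          */

typedef uint64_t __attribute__((aligned(4))) uint64_a;

static_assert(sizeof(uint64_a) == sizeof(uint64_t),
              "same size, only the alignment requirement differs");

struct plain   { uint32_t x; uint64_t y; };  /* y has the ABI's default alignment */
struct relaxed { uint32_t x; uint64_a y; };  /* y is 4-byte aligned               */

/* Both functions load a 64-bit member through a struct pointer; any
   difference in the generated code comes from the member's type. */
uint64_t get_plain(const struct plain *p)     { return p->y; }
uint64_t get_relaxed(const struct relaxed *p) { return p->y; }
```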
>
>>> From the godbolt link that you shared in another post, it seems
>>> there's a penalty of a single instruction (it's strange, it seems the
>>> compiler needs to save the struct pointer to r3 before loading the
>>> two halves of the word, but only if uint64_t is aligned to 4 bytes).
>>>
>>
>> The compiler is not perfect here - there is definitely an extra
>> instruction because it is reading the low word first. (clang reads
>> the low word first for uint64_t as well, meaning it gives worse code
>> for A and B as well.) In real code, rather than a brief test snippet,
>> other factors could mean this does not happen - it's only because the
>> pointer happens to be in r0 that you see it here.
>>
>> But there's no harm in filing a gcc bug on this, looking for an
>> obvious improvement.
>>
>>>
>>>> I would be extremely surprised to find code that fails to work on an
>>>> M0+ because of a uint64_t pointer that is 4-byte aligned but not
>>>> 8-byte aligned. But if /I/ want to use 64-bit integers with 4-byte
>>>> alignments, I'd use the typedef'd aligned type for the object type
>>>> and for any relevant pointers.
>>>
>>> Yes, of course. Even so, I don't understand why the compiler isn't
>>> able to align the uint64_t member in struct B to a 4-byte address.
>>>
>>
>> It can't align the uint64_t member because the EABI says uint64_t (or,
>> rather, unsigned long long) is 8 bytes aligned. gcc didn't make those
>> rules - ARM did.
>
> But struct B is defined with correct alignment attribute for uint64_t
> member. I tried also:
>
> struct B {
>     uint32_t x;
>     uint64_t y __attribute__((aligned(4)));
> };
>
> The struct size is always 16, so y is placed at offset 8 and not 4. It
> seems to me gcc isn't able to respect the aligned attribute of 4 bytes
> when it is specified inside the struct definition.
That is my conclusion too. (I tried the "aligned" attribute in every
place I could.) The only place it worked was on a typedef for the new
"uint64_a" type.
>
> I don't see many differences with:
>
> typedef __attribute__((aligned(4))) uint64_t uint64_a;
>
> struct C {
>     uint32_t x;
>     uint64_a y;
> };
>
It certainly seems inconsistent to me that it works on the typedef, and
not directly in the struct definition. After all, a typedef does not
actually define a new type (it's a silly name) - it merely defines an
alias or shortcut name for a type. So it would seem logical that using
"uint64_a" or "__attribute__((aligned(4))) uint64_t" in the struct
definition would mean exactly the same thing. But apparently not. gcc
attributes are not part of the normal C grammar, so there's no standard
to fall back on here.
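
The GCC manual's description of the "aligned" attribute may account for the difference. As far as I read it, on a struct member the attribute can only increase alignment unless "packed" is given as well, whereas on a typedef it can also decrease it. The sketch below puts the variants side by side. Structs B and C are the ones from this thread. Struct D is an assumption based on that reading of the manual and was not tried in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t __attribute__((aligned(4))) uint64_a;

/* Attribute on the member only: as reported above, y stays at offset 8
   and the struct size is 16 where uint64_t is 8-byte aligned. */
struct B {
    uint32_t x;
    uint64_t y __attribute__((aligned(4)));
};

/* Reduced-alignment typedef: y moves up to offset 4, size becomes 12. */
struct C {
    uint32_t x;
    uint64_a y;
};

/* Assumption (not from the thread): combining "packed" with "aligned(4)"
   on the member should also allow the alignment to be lowered. */
struct D {
    uint32_t x;
    uint64_t y __attribute__((packed, aligned(4)));
};

static_assert(offsetof(struct C, y) == 4, "typedef'd member should sit at offset 4");
static_assert(sizeof(struct C) == 12, "no padding expected in struct C");
```

Printing `offsetof` and `sizeof` for each struct on the actual target is the quickest way to confirm which variants the compiler in use honours.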
>>
>> As I briefly mentioned before, there are a number of very poor choices
>> in the EABI (and the 32-bit ARM ABI used for Linux). This is far from
>> the worst.
>>
>>>>> However, what really changes in the binary output?
>>>>>
>>>>> In some cases, the address of a uint64_t can change from an 8-byte
>>>>> to a 4-byte aligned address (because we instructed it to do so). What
>>>>> about the code that accesses a uint64_t aligned to 4 bytes? Is it
>>>>> identical for 4-byte and 8-byte alignment requirements? I think so,
>>>>> because in both cases the compiler should emit two 4-byte
>>>>> load/store instructions.
>>>>>
>>>>
>>>
>>
>