Re: [OSDev] How to switch to long mode in x86 CPUs?

From cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups alt.os.development
Subject Re: [OSDev] How to switch to long mode in x86 CPUs?
Date 2025-03-02 15:17 +0000
Organization PANIX Public Access Internet and UNIX, NYC
Message-ID <vq1sqj$abd$1@reader1.panix.com>
References <871pvje5yq.fsf@onesoftnet.eu.org> <vpv19u$b1b$1@reader1.panix.com> <vq1fp5$oi9d$1@dont-email.me>


In article <vq1fp5$oi9d$1@dont-email.me>,
Ar Rakin  <rakinar2@onesoftnet.eu.org> wrote:
>On 3/1/25 7:15 PM, Dan Cross wrote:
>> [Note: Followup-To: set to alt.os.development]
>> 
>> In article <871pvje5yq.fsf@onesoftnet.eu.org>,
>> Ar Rakin  <rakinar2@onesoftnet.eu.org> wrote:
>>> Hello there,
>>>
>>> I am trying to develop my own, simple operating system to learn more
>>> about how kernels work and low level stuff like that.  However, I am
>>> stuck at setting up paging while switching long mode (64-bit protected
>>> mode) in x86 processors.
>> 
>> As has been mentioned, comp.lang.c is not the appropriate place
>> to ask this.  I have set the `Followup-To:` header to
>> alt.os.development, and am cross-posting this post to that
>> newsgroup.
>> 
>>> The assembly code I currently have:
>>>
>>> #define PG_START 0x000000000
>> 
>> Just to be clear, this means that you have decided to make your
>> _virtual_ address space start at absolute address 0?  What
>> address do you link your kernel at?
>
>The page start address was originally 0x1000000, due to the triple fault 

0x100_0000 is 16MiB; perhaps a typo for 0x10_0000 (1MiB)?

>I was trying to use different addresses.

So there's a difference between the addresses at which the
symbols in your kernel are linked and the physical memory
location where the boot loader puts your kernel.  The former are,
generally speaking, virtual addresses in some address space that
it is your job to create in early boot time, while the latter
is a physical address, usually near the start of RAM.

>Sorry for the confusion.  And 
>the kernel is loaded by GRUB, and apparently that means the kernel gets 
>loaded at 0x100000.

No worries, but note an important distinction here.  The kernel
may be loaded into memory (and started running...) by grub, but
it is linked by some other program.  The former is a run time
thing, but the latter is build time.

Your goal here is to take care and make sure that you're mapping
things so that the virtual addresses chosen by your linker when
you build match the actual addresses those things appear at in
the address space you create with your page tables at run time.
Once the MMU is on, all references to "memory" will go through
its translations, so everything you need to get yourself going
must be in those initial page tables.

So while the boot loader, grub in this case, may well load you
at 1MiB in the physical address space, you've got to make
sure that when you turn on paging, you've configured the MMU
with a set of page tables that includes a mapping so that the
next instruction executed by the processor is going to be mapped
at the address that is already in %rip.  The trick here is to
note that that address is in the instruction pointer register as
soon as the MMU is turned on.  Sometimes this isn't the address
that you linked at (most kernels load themselves into the upper
portion of the address space and leave the lower part for user
programs), but a goal in early boot should be to get everything
running at and referencing things at those addresses as soon as
possible.
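
To make that concrete, here is a rough sketch in C (not your
boot code, and not taken from any particular kernel) that fills
in a PDPT with four 1GiB identity mappings and then checks that
a given address maps to itself.  The names `identity_map_4g` and
`maps_to_self` are made up for illustration, and the "walk" only
looks at a single PDPT.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  0x001ULL
#define PTE_WRITABLE 0x002ULL
#define PTE_HUGE     0x080ULL          /* bit 7: 1GiB page at PDPT level */
#define GIB          (1ULL << 30)

/* Hypothetical helper: map virtual 0..4GiB onto physical 0..4GiB
 * using four 1GiB pages; every other entry is left empty. */
void identity_map_4g(uint64_t pdpt[512])
{
    for (unsigned i = 0; i < 512; i++)
        pdpt[i] = 0;
    for (unsigned i = 0; i < 4; i++)
        pdpt[i] = (uint64_t)i * GIB | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
}

/* Hypothetical helper: returns 1 if `addr` is mapped and translates
 * to the same physical address, 0 otherwise. */
int maps_to_self(const uint64_t pdpt[512], uint64_t addr)
{
    assert(addr < 512 * GIB);          /* only one PDPT is modelled */
    uint64_t e = pdpt[addr / GIB];
    if (!(e & PTE_PRESENT) || !(e & PTE_HUGE))
        return 0;
    uint64_t frame = e & 0x000FFFFFC0000000ULL;   /* bits 51:30 */
    return (frame | (addr % GIB)) == addr;
}
```

With tables like these, the address the boot loader started you
at (1MiB) maps to itself, so the instruction after the one that
enables paging is still reachable.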

>>> #define MSR_EFER 0xc0000080
>>>
>>> .section .bss, "aw", @nobits
>>> .align 4096
>>> pml4_tbl:
>>>   .skip 4096
>>> pdpt_tbl:
>>>   .skip 4096
>> 
>> This is fine, but note that, instead of using code to fill in
>> your page tables, you could simply define them here with the
>> expected entries, as they are very simple.
>
>I tried doing something like this:
>
>pml4_tbl:
>   .quad pdpt_tbl | 0x3
>
>But the assembler complained that I can't use the '|' operator.  Is 
>there any other way I could define entries directly here?

Yes, use addition.  A property of x86 page tables is that every
(non-empty) interior entry in the page table radix tree includes
the address of a 4KiB physical page frame that is 4KiB aligned;
this means that the lower 12 bits of the address portion of such
entries must be clear; indeed, a careful reader will note that
the entries in the various page table structures mostly confine
their flag bits to the low 12 bits (NX is the high bit, as an
exception to this).  So to set flag bits and so on in the
assembler, you can freely add those bits to the address without
worrying about carries changing the value.

Leaf entries in the tree may point to larger frames (2MiB, or
1GiB as you are using), but these must still be aligned to their
sizes, so the same arithmetic properties hold for those as well. 

So here, you can write:

pml4_tbl:
	.quad pdpt_tbl + 0x3

But you don't have to do it that way if you don't want; your
approach of writing these in code is fine, and it's not like the
page table police are going to show up at your door if you don't
define them differently.  :-)
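
If you want to convince yourself that `+` and `|` really are
interchangeable here, a small C sketch shows it.  The helper
names are invented for this example, and the preconditions
(4KiB-aligned address, flags confined to the low 12 bits) are
exactly the assumptions discussed above; NX, living in bit 63,
is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  0x001ULL
#define PTE_WRITABLE 0x002ULL
#define LOW12_MASK   0xFFFULL

/* Build an interior entry by addition, as `pdpt_tbl + 0x3` does.
 * Precondition: the table address is 4KiB aligned and the flags
 * fit in the low 12 bits, so no carry can reach the address. */
uint64_t make_entry_add(uint64_t table_phys, uint64_t flags)
{
    assert((table_phys & LOW12_MASK) == 0);
    assert((flags & ~LOW12_MASK) == 0);
    return table_phys + flags;
}

/* The same entry built with bitwise OR, as run-time code would. */
uint64_t make_entry_or(uint64_t table_phys, uint64_t flags)
{
    return table_phys | flags;
}
```

For any aligned address the two functions return the same
value, which is why the assembler's lack of `|` on symbols is
not a problem in practice.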

Andy Valencia already posted a link to VSTa, and Scott Lurndal
posted assembly code suitable for setting up a machine running
a kernel written in C++, but if you want to see another example
of how to do this you could have a look at:
https://github.com/dancrossnyc/rxv64/blob/main/kernel/src/entry.S
(I used 2MiB pages here and only mapped the first 128MiB to
avoid relying on huge pages for obscure pedagogical reasons, but
the overall structure may be useful as an example.  As Andy
said, you should be cribbing from other projects).

>>> .text
>>> .globl _mboot_start
>>> _mboot_start:
>>>   /* GRUB executes this code in 32-bit protected mode. */
>>>
>>>   /* Write (pdpt_tbl | 0x3) to the first 8 bytes of pml4_tbl */
>>>   movl $pdpt_tbl, %eax
>>>   orl $0x3, %eax
>>>   movl $pml4_tbl, %edi
>>>   movl %eax, (%edi)
>> 
>> Note that this sequence implicitly assumes that you are starting
>> with an identity mapping between the physical and
>> virtual address spaces.  In particular, when you
>> `movl $pdpt_tbl, %eax` you are copying whatever address the
>> linker assigns to `$pdpt_tbl` into %eax (the low 32-bits of it
>> anyway, though the assembler would probably squawk at you if
>> it didn't fit into a 32 bit immediate).  Page table entries must
>> refer to physical addresses, so if you've arranged for the
>> linker to use some base address other than 0 for your kernel,
>> you've got to take care to account for an offset here.
>
>Yes, I have the following linker script:
>
>[snip]

That looks reasonable. In this case, you're directing the linker
to start the binary at 1M, so your symbols should start there.
I'd probably dump the kernel image to verify
(e.g., `objdump -x kern.elf`, presuming you're building an ELF
binary).

>>>   xorl %eax, %eax
>>>   movl %eax, 4(%edi)
>> 
>> Note that, as you're doing this in assembly, the upper bits in
>> the table are already filled with zeros, so there's no need for
>> the `xorl %eax, %eax; movl %eax, 4(%edi)` sequence.
>
>Makes sense!
>
>>>   movl $pdpt_tbl, %edi
>>>   movl $PG_START, %eax
>>>   /* 0x83 = 0b10000011; flags: present, writable, supervisor-only,
>>>      1GB huge page */
>>>   movl $0x83, (%edi)
>>>   movl %eax, 4(%edi)
>> 
>> This looks correct.  Your page tables will now map a single
>> gigabyte of address space starting at (virtual) address zero to
>> physical address 0, and nothing else.  To be clear, is that what
>> you want?  When coming out of protected mode, I generally try to
>> map the whole 32-bit address space; that is, all 4 GiB.
>
>Now that I have mentioned the page start address (0x1000000), yeah, I 
>wanted to create 1:1 identity page mapping for simplicity.

That's fine to get started, but I think your page start address
logic might be slightly off.  If I go back to your code, you are
taking the address and writing it into the first entry in your
page table, but your page start in this case is the _physical_
address that that entry will map to.  The _virtual_ address of a
mapping is, in a sense, isomorphic to the path that the MMU will
take as it walks from the root to some leaf; that is, the
indices in the tables it processes are functions of the virtual
address.  So by putting 0x100_0000 into the first entry in the
PDPT, and marking it a huge page by setting bit 7 in the PDPTE,
what you're telling the MMU is that virtual address 0 should map
to physical address 0x100_0000, which does not sound like what
you want.
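
The relationship between a virtual address and the indices the
MMU uses during the walk can be sketched in C as below.  This
assumes ordinary 4-level paging (9 index bits per level); the
function names are mine, and `va_base_of_pdpt_entry` only
handles the lower half of the address space so that it need not
deal with sign extension.

```c
#include <assert.h>
#include <stdint.h>

/* Each level of the 4-level tree is indexed by 9 bits of the
 * virtual address; the low 12 bits are the offset in a 4KiB page. */
unsigned pml4_index(uint64_t va) { return (unsigned)((va >> 39) & 0x1FF); }
unsigned pdpt_index(uint64_t va) { return (unsigned)((va >> 30) & 0x1FF); }
unsigned pd_index(uint64_t va)   { return (unsigned)((va >> 21) & 0x1FF); }
unsigned pt_index(uint64_t va)   { return (unsigned)((va >> 12) & 0x1FF); }

/* Going the other way: the first virtual address covered by a
 * given PDPT slot.  Which slot you fill decides the virtual
 * address; the value you store decides the physical address. */
uint64_t va_base_of_pdpt_entry(unsigned pml4_i, unsigned pdpt_i)
{
    assert(pml4_i < 256);   /* lower half only: no sign extension */
    assert(pdpt_i < 512);
    return ((uint64_t)pml4_i << 39) | ((uint64_t)pdpt_i << 30);
}
```

So slot 0 of the first PDPT always describes virtual addresses
starting at 0, whatever physical frame address is written into
it.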

Note also that this is an error: the hardware requires that
pages map to frames that are aligned to the page size.  Here,
you're trying to map a gigabyte page to 16MiB, which is clearly
not aligned to a GiB boundary.  That will certainly fault.
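
A quick way to check this sort of thing before the hardware
does it for you is a power-of-two alignment test.  The sketch
below is only illustrative; `is_aligned` and the size constants
are names I made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIZE_4K (1ULL << 12)
#define SIZE_2M (1ULL << 21)
#define SIZE_1G (1ULL << 30)

/* True if `phys` is a multiple of `page_size`, which must be a
 * power of two (as all x86 page sizes are). */
bool is_aligned(uint64_t phys, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (phys & (page_size - 1)) == 0;
}
```

Here `is_aligned(0x1000000, SIZE_1G)` is false, while the same
address passes for `SIZE_2M` and `SIZE_4K`; 16MiB is a fine
frame address for smaller pages, just not for a 1GiB one.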

In general, you need to take alignment into account when
creating mappings.  Suppose, for example, that you'd like to map
some (say) 5 byte datum at some arbitrary byte address into your
address space; there's no 5 byte page size, so you have to
take the start address of the data, round that down to the
nearest page boundary, and map that page.  But since the
address of the datum is arbitrary, it might span across a page
boundary, so you've got to take the end address (that of the
last byte) and _also_ round that down (not up!) to the nearest
page boundary, and make sure that's mapped as well.  In this
case, a 5 byte mapping can only span a maximum of two pages, but
for larger objects, you'd have to ensure that every page spanned
by the object is mapped.
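
The rounding described above might look like this in C.  It is
a sketch assuming 4KiB pages and a non-empty object; the helper
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL

/* Round an address down to the start of its 4KiB page. */
uint64_t page_round_down(uint64_t addr)
{
    return addr & ~(PAGE_SIZE_4K - 1);
}

/* Page containing the first byte of the object. */
uint64_t first_page(uint64_t addr)
{
    return page_round_down(addr);
}

/* Page containing the last byte of the object [addr, addr+len).
 * Note that the end is rounded down, not up. */
uint64_t last_page(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    return page_round_down(addr + len - 1);
}

/* Number of pages that must be mapped to cover the object. */
uint64_t pages_spanned(uint64_t addr, uint64_t len)
{
    return (last_page(addr, len) - first_page(addr)) / PAGE_SIZE_4K + 1;
}
```

A 5 byte datum at 0x1FFE, for instance, straddles the boundary
at 0x2000 and needs both the page at 0x1000 and the page at
0x2000 mapped.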

>[snip] 
>>>   /* Jump to 64-bit code */
>>>   ljmpl $0x08, $long_mode_entry
>> 
>> Have you set up a GDT with an entry for a 64-bit code segment
>> by this point?  It doesn't look like it.  My guess is that that
>> is the source of your fault; note that the multiboot1 spec says
>> that you must set up a GDT and should not rely on the one that
>> it set up to get you into 32-bit protected mode.  Certainly
>> there is no guarantee that there's a 64-bit code segment at
>> offset 0x8 in whatever table it set up.
>> 
>> My guess is that this is the source of your problem.
>
>Correct, thanks!  I forgot to load a GDT.
>
>[snip]
>
>Oh yeah!  I also need to set up a stack.

Oh great; does that mean it's working now?

>[snip]
>Thank you for your help!

Sure thing!

	- Dan C.
