| From | "Marven Lee" <marven10@gmail.com> |
|---|---|
| Newsgroups | alt.os.development, comp.arch |
| Subject | Protection rings within applications |
| Date | 2012-04-05 11:04 +0100 |
| Message-ID | <9u593lF5htU1@mid.individual.net> |
Cross-posted to 2 groups.
*Decided to cross-post to comp.arch, as I've seen user-mode protection rings mentioned there before, and all the current talk of user-mode interrupts and signal handling in the M68k thread is somewhat related, so it might be of interest.

I've been thinking of splitting user processes into two or more protection rings in my OS. This might be useful for some kind of virtualization, or for running a shell in the more privileged part of a process and commands in the least privileged ring of a process, possibly dividing the least privileged ring into multiple sandboxes.

I've always assumed that privilege levels above user processes were just slightly less privileged parts of the kernel. I guess that is how the driver ring of OS/2 worked. I'm not sure how the rings of VMS work: are the three more privileged rings all global, or is the supervisor ring per process (or per the VMS equivalent of a process)?

So I've thought of some ways to implement application protection rings and sandboxes using either segmentation or paging.

With segmentation on x86 it is relatively easy to split a process into two rings using PL2 and PL3. If user mode spans from 0gb to 2gb then the PL2 and PL3 code and data segments can be set up as:

  PL0 code and data - base: 0gb  limit: 4gb  (kernel)
  PL2 code and data - base: 0gb  limit: 2gb  (user)
  PL3 code and data - base: 0gb  limit: 1gb  (user)

Then add a call_monitor() trap gate to call into the PL2 ring, and ensure the iopl field in the eflags register is 0.

A better option would be to use only PL3 segments for the two user rings and adjust the base and limit of the PL3 segments using two system calls, call_monitor() and _sigreturn(). These could be implemented as signals, with the addition of a new signal, sigmonitor. All other unmasked signals would trap into the more privileged ring, so preemption would be possible.
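As a rough illustration of the PL0/PL2/PL3 layout above, here is a minimal sketch of how the six segment descriptors could be encoded for a 32-bit GDT. The helper make_descriptor(), the build_gdt() function and the GDT_* slot names are made up for this example and are not from any existing kernel; only the descriptor bit layout is the standard x86 one.

```c
#include <stdint.h>

#define GIB (1ULL << 30)

/* Access-byte bits of an x86 segment descriptor. */
#define SEG_PRESENT    0x80u
#define SEG_DPL(n)     (((n) & 3u) << 5)   /* descriptor privilege level */
#define SEG_CODE_DATA  0x10u               /* S=1: code or data segment */
#define SEG_CODE_RX    0x0Au               /* code, execute/read */
#define SEG_DATA_RW    0x02u               /* data, read/write */

/* Flag nibble: G=1 (limit counted in 4 KiB pages), D/B=1 (32-bit). */
#define SEG_FLAGS_4K_32BIT 0xCu

/* Hypothetical GDT slots for this sketch. */
enum {
    GDT_NULL, GDT_K_CODE, GDT_K_DATA,      /* PL0: kernel          */
    GDT_MON_CODE, GDT_MON_DATA,            /* PL2: monitor ring    */
    GDT_BOX_CODE, GDT_BOX_DATA,            /* PL3: least privileged */
    GDT_COUNT
};

/* Pack base, size (a multiple of 4 KiB), DPL and type into one descriptor. */
static uint64_t make_descriptor(uint32_t base, uint64_t size_bytes,
                                unsigned dpl, unsigned type)
{
    uint32_t limit = (uint32_t)((size_bytes >> 12) - 1);  /* 20-bit limit */
    uint64_t access = SEG_PRESENT | SEG_DPL(dpl) | SEG_CODE_DATA | type;
    uint64_t d = 0;

    d |= (uint64_t)(limit & 0xFFFFu);              /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;       /* base bits 0..23   */
    d |= access << 40;                             /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* limit bits 16..19 */
    d |= (uint64_t)SEG_FLAGS_4K_32BIT << 52;       /* G and D/B flags   */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;   /* base bits 24..31  */
    return d;
}

/* Fill a GDT with the three rings from the table in the post. */
static void build_gdt(uint64_t gdt[GDT_COUNT])
{
    gdt[GDT_NULL]     = 0;
    gdt[GDT_K_CODE]   = make_descriptor(0, 4 * GIB, 0, SEG_CODE_RX);
    gdt[GDT_K_DATA]   = make_descriptor(0, 4 * GIB, 0, SEG_DATA_RW);
    gdt[GDT_MON_CODE] = make_descriptor(0, 2 * GIB, 2, SEG_CODE_RX);
    gdt[GDT_MON_DATA] = make_descriptor(0, 2 * GIB, 2, SEG_DATA_RW);
    gdt[GDT_BOX_CODE] = make_descriptor(0, 1 * GIB, 3, SEG_CODE_RX);
    gdt[GDT_BOX_DATA] = make_descriptor(0, 1 * GIB, 3, SEG_DATA_RW);
}
```

For the PL3-only variant, the same helper could presumably be reused by rewriting the two PL3 descriptors with a new base and size on each ring switch and then reloading the segment registers.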
  int call_monitor (int call_idx, void *args);
  void _sigreturn (&ret_context);

call_idx could be passed to the signal handler in the siginfo->si_code field and a pointer to the args in siginfo->si_value.sival_ptr.

call_monitor() and other signals would expand the PL3 segment to the full 0-2gb, switch stacks (obviously) and return into the signal handler. Only the call_monitor() system call would be allowed from the least privileged ring.

Returning from a signal via _sigreturn() would do the opposite, setting the base and limit of the segment to 0-1gb and returning to the stack within this ring.

For more flexibility, ret_context can hold a base and limit value, and on each call to _sigreturn() the PL3 segment base and limit can be adjusted. So the least privileged ring doesn't have to be from 0-1gb; it can be from 64mb to 128mb, for example. The lower 1gb portion of the address space could be split into several sandboxes, and _sigreturn() would be used to switch to a particular sandbox.

The address space could be laid out like this:

  Monitor   (segment base: 0      limit: 2gb)
  ...
  Sandbox 3 (segment base: 128mb  limit: 64mb)
  Sandbox 2 (segment base: 64mb   limit: 64mb)
  Sandbox 1 (segment base: 0mb    limit: 64mb)

The monitor would have to implement copyin() and copyout() functions to access the data in the less privileged ring. These would have to catch sigsegv signals, using setjmp()/longjmp() for example.

As not every CPU has segmentation, paging alone can be used to implement the protection rings in user mode, which makes the scheme more portable. Using 2 page directories per process it is possible to implement similar protection rings:

  Page Dir 1 - maps user 0-2gb, kernel 2-4gb
  Page Dir 2 - maps user 0-1gb, kernel 2-4gb

Again two system calls, call_monitor() and _sigreturn(), are used to transfer between rings, this time by switching page directories. Of course this is slower than using just segments or altering the segment base and limits.
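To make the two-page-directory idea a bit more concrete, here is a minimal sketch of deriving the second (sandbox) directory from the first (monitor) one. It assumes classic 32-bit non-PAE x86 paging, where a page directory has 1024 entries and each entry covers 4mb, and a kernel half starting at 2gb. build_sandbox_dir() and the constant names are hypothetical, just for illustration.

```c
#include <stdint.h>
#include <string.h>

#define PDE_COUNT     1024u        /* entries in a non-PAE page directory  */
#define PDE_SPAN      (4u << 20)   /* each entry covers 4mb                */
#define KERNEL_FIRST  512u         /* first kernel entry: 2gb / 4mb        */

/*
 * Build the sandbox page directory from the monitor page directory.
 * Only the user entries covering [base, base + size) are copied, plus
 * all kernel entries; every other user entry is left not-present (0),
 * so the rest of the monitor's memory is unreachable from the sandbox.
 *
 * Returns 0 on success, -1 if the range is not 4mb aligned or does not
 * fit inside the user half.
 */
static int build_sandbox_dir(const uint32_t *monitor_dir,
                             uint32_t *sandbox_dir,
                             uint32_t base, uint32_t size)
{
    if (size == 0 || base % PDE_SPAN != 0 || size % PDE_SPAN != 0)
        return -1;

    uint32_t first = base / PDE_SPAN;
    uint32_t count = size / PDE_SPAN;

    if (first >= KERNEL_FIRST || count > KERNEL_FIRST - first)
        return -1;

    /* Start with nothing mapped. */
    memset(sandbox_dir, 0, PDE_COUNT * sizeof *sandbox_dir);

    /* Share the page tables that back the sandbox's own range. */
    memcpy(&sandbox_dir[first], &monitor_dir[first],
           count * sizeof *sandbox_dir);

    /* The kernel half is identical in both directories. */
    memcpy(&sandbox_dir[KERNEL_FIRST], &monitor_dir[KERNEL_FIRST],
           (PDE_COUNT - KERNEL_FIRST) * sizeof *sandbox_dir);

    return 0;
}
```

With base 0 and size 1gb this gives the "Page Dir 2" row above. In this sketch _sigreturn() would load the sandbox directory and call_monitor() the monitor one; since both share the same page tables, changes to mappings inside the sandbox range show up in both.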
The page directory entries of the 2nd page directory need to be altered whenever the base and limit of a sandbox change. Also, the granularity of the sandbox is limited to 4mb, or whatever range of memory a single page table maps.

The ret_context of _sigreturn() could have a relocation flag that indicates that the pages of a sandbox should always be mapped starting at address zero. For example, if the sandbox exists between 16mb and 20mb, the relocation flag would map it between 0mb and 4mb in the sandbox page directory. That way addresses would all be relative to 0 from within the sandbox.

I've read that user-mode Linux did something similar to some of the above, protecting the guest OS by restricting the segment limits of guest OS processes to 1GB and supporting a page directory per guest OS process.

It's a pity x86-64 long mode doesn't support segmentation. It could have allowed many large 4GB+ sandboxes in a single address space. Perhaps a mode a bit like virtual-8086 mode could have been added, with a single base and limit.

-- 
Marv