Post by Arne Vajhøj
Post by Camiel Vanderhoeven
x86-64 in long mode only supports 2 modes in PTEs, so
VMS x86-64 is a 4 mode OS on 2 mode hardware - U in ring 3,
S, E and K in ring 0.
Not exactly.
Ring 3 is used for Exec, Super, and User
Ring 0 is used for kernel and for transitions between modes (SWIS)
Running Exec and Super in ring 0 would blow away the separation
(which, I might add, is there more for stability than for security,
before I unintentionally re-start that debate)
You are more afraid that DCL or RMS would step on VMS than
applications would step on DCL or RMS?
No, certainly not. That is why we have a separate set of page tables for
each mode. For instance, a page that has kernel write / exec read
protections is represented by the following PTEs in these 4 sets of page
tables:
kernel mode: S(upervisor) W(riteable)
exec mode: U(ser) R(eadable)
super mode: not present
user mode: not present
The more I think about it, the more fascinating it sounds.
If some C code declares:
char __align(13) buf[8192];
and the C code calls SYS$SETPRT with PRT$C_UREW on that, then
it works like:
logK : write
logE : write
logS : read
logU : read
logK => page table with: physK : write, physU : ? (should not matter)
logE => page table with: physK : write, physU : write
logS => page table with: physK : write, physU : read
logU => page table with: physK : write, physU : read
It's pretty obvious that VMS has to use multiple page tables to
emulate systems with multiple protection modes on systems that
don't have such things in hardware. There's no other reasonable
architecture.
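
To make the multiple-page-table idea concrete, here is a minimal sketch in C. It is not actual VMS code. It assumes, per Camiel's description, that kernel mode runs in ring 0 while exec, super and user all run in ring 3, and that the page-table set for each mode simply encodes what that one mode may do. The names vms_mode, vms_access, pte_bits and pte_for_mode are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* VMS access modes, most privileged first. */
enum vms_mode { MODE_K, MODE_E, MODE_S, MODE_U, MODE_COUNT };

/* What the logical VMS page protection grants to one mode. */
enum vms_access { ACC_NONE, ACC_READ, ACC_WRITE };

/* Hypothetical view of the x86-64 PTE bits that matter here. */
struct pte_bits {
    bool present;   /* P bit */
    bool writable;  /* R/W bit */
    bool user;      /* U/S bit: true means reachable from ring 3 */
};

/*
 * Build the PTE that goes into the page-table set used while running
 * in `mode`. Assumption: K runs in ring 0; E, S and U run in ring 3.
 */
struct pte_bits pte_for_mode(const enum vms_access prot[MODE_COUNT],
                             enum vms_mode mode)
{
    assert((int)mode >= 0 && (int)mode < MODE_COUNT);

    struct pte_bits pte = { false, false, false };
    if (prot[mode] == ACC_NONE)
        return pte;                      /* page not present for this mode */

    pte.present  = true;
    pte.writable = (prot[mode] == ACC_WRITE);
    pte.user     = (mode != MODE_K);     /* only kernel mode runs in ring 0 */
    return pte;
}
```

With a protection of kernel write / exec read, this yields S+W, U+R, not present, not present for the K, E, S and U sets, matching the table quoted above. For PRT$C_UREW (K and E write, S and U read) it yields S+W, U+W, U+R, U+R.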
On x86 in long mode, specifically, page table entries have bits
for readability (the "P", for "Present", bit implies that the
page is readable, unless memory protection keys are used, in
which case a page can be made execute-only),
writability (if set, the page is writable; otherwise not), and
non-executability (if NX is set, the page is not executable,
otherwise it is).
Separately, there is a bit for whether the page is accessible
from userspace or not (the U/S bit): if set, the page can be
accessed from ring 3, in accordance with the other permission
bits, otherwise not. By default, page-level write permission
bits are ignored for supervisor mode stores (that is, stores
from any ring other than ring 3) unless the `WP` bit in
control register CR0 is set; if CR0.WP is set and the page is
not marked writable, then the kernel can't write to it, unless
the same page is also mapped with suitable permissions at
some other address.
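
The rules above can be sketched as a small C predicate. This is a simplified model, not a full description of the hardware: it looks at a single leaf PTE only and ignores SMEP, SMAP, protection keys, EFER.NXE and the permission bits in upper-level paging entries. The names x86_pte, access_kind and access_allowed are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC };

/* Hypothetical view of the leaf PTE permission bits. */
struct x86_pte {
    bool present;   /* P */
    bool writable;  /* R/W */
    bool user;      /* U/S */
    bool nx;        /* NX (XD) */
};

/*
 * Return true if an access of `kind` from privilege level `cpl` would
 * be permitted by `pte`, given the state of CR0.WP.
 */
bool access_allowed(struct x86_pte pte, enum access_kind kind,
                    int cpl, bool cr0_wp)
{
    assert(cpl >= 0 && cpl <= 3);

    if (!pte.present)
        return false;

    bool from_ring3 = (cpl == 3);
    if (from_ring3 && !pte.user)
        return false;                 /* supervisor-only page */

    switch (kind) {
    case ACCESS_READ:
        return true;                  /* present implies readable */
    case ACCESS_WRITE:
        if (pte.writable)
            return true;
        /* Read-only page: supervisor stores succeed only if CR0.WP is clear. */
        return !from_ring3 && !cr0_wp;
    case ACCESS_EXEC:
        return !pte.nx;
    }
    return false;
}
```

For example, a present, read-only, supervisor page is writable from ring 0 when CR0.WP is clear and not writable when it is set, and it is never accessible from ring 3.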
A number of bits in CR4 and a handful of MSRs will also affect
behavior around page permission enforcement.
- Dan C.