CPU Switches from Kernel mode to User Mode on X86 : When and How?

From:


http://stackoverflow.com/questions/13243958/cpu-switches-from-kernel-mode-to-user-mode-on-x86-when-and-how?rq=1


When and how does the CPU switch from kernel mode to user mode on x86? What exactly does it do, and how does it make this transition?


answer 1:

IRET does this, for example. See the INTERRUPT_RETURN macro in the Linux kernel source.


answer 2:

In x86 protected mode, the current privilege level that the CPU is executing in is controlled by the two least significant bits of the CS register (the RPL field of the segment selector).

So a switch from kernel mode (CPL=0) to user mode (CPL=3) is accomplished by replacing a kernel-mode CS value with a user-mode one. There are many ways to do this, but a typical one is the IRET instruction, which pops the EIP, CS and EFLAGS registers from the stack.

Segment Selectors



Requested Privilege Level (RPL)

(Bits 0 and 1) — Specifies the privilege level of the selector. The privilege level can range from 0 to 
3, with 0 being the most privileged level. See Section 5.5, “Privilege Levels”, for a description of the 
relationship of the RPL to the CPL of the executing program (or task) and the descriptor privilege 
level (DPL) of the descriptor the segment selector points to.
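
To make the selector layout concrete, here is a minimal sketch in C that splits a 16-bit selector into its index, table indicator and RPL fields. The struct and function names are made up for illustration, and the values 0x10 and 0x23 mentioned afterwards are example selectors only, not taken from any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector (names are illustrative). */
struct selector_fields {
    unsigned index; /* bits 3..15: descriptor index within the table */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 0..1: requested privilege level */
};

/* Split a selector into its three fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = (unsigned)sel >> 3;
    f.ti    = ((unsigned)sel >> 2) & 1u;
    f.rpl   = (unsigned)sel & 3u;
    return f;
}

/* Build a selector from its parts; the inverse of decode_selector. */
uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192u && ti <= 1u && rpl <= 3u);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

With these helpers, decode_selector(0x23) yields index 4 in the GDT with RPL 3 (a user-mode style selector), while decode_selector(0x10) yields index 2 with RPL 0 (a kernel-mode style selector).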


load instructions

Two kinds of load instructions are provided for loading the segment registers:

1. Direct load instructions such as the MOV, POP, LDS, LES, LSS, LGS, and LFS instructions. These instructions 
explicitly reference the segment registers.

2. Implied load instructions such as the far pointer versions of the CALL, JMP, and RET instructions, the SYSENTER 
and SYSEXIT instructions, and the IRET, INTn, INTO and INT3 instructions. These instructions change the 
contents of the CS register (and sometimes other segment registers) as an incidental part of their operation.
The MOV instruction can also be used to store the visible part of a segment register in a general-purpose register.

Segment Registers



 The information cached in the segment register (visible and hidden) allows the processor to translate addresses without taking extra bus 
cycles to read the base address and limit from the segment descriptor. 


Segment Descriptors


DPL (descriptor privilege level) field

Specifies the privilege level of the segment. The privilege level can range from 0 to 3, with 0 being 
the most privileged level. The DPL is used to control access to the segment. See Section 5.5, “Priv-
ilege Levels”, for a description of the relationship of the DPL to the CPL of the executing code 
segment and the RPL of a segment selector.
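
As a rough illustration of where the DPL field sits, the sketch below reads and rewrites it in an 8-byte segment descriptor held in a uint64_t. The function names are hypothetical. The sample values in the usage note are the conventional flat 4 GiB code-segment encodings, used here only as examples.

```c
#include <assert.h>
#include <stdint.h>

/* In an 8-byte segment descriptor the access byte occupies bits 40..47:
 * bit 47 = P (present), bits 45..46 = DPL, bit 44 = S, bits 40..43 = type. */

/* Extract the 2-bit DPL field. */
unsigned descriptor_dpl(uint64_t desc)
{
    return (unsigned)((desc >> 45) & 3u);
}

/* Extract the present bit. */
int descriptor_present(uint64_t desc)
{
    return (int)((desc >> 47) & 1u);
}

/* Return a copy of the descriptor with its DPL field replaced. */
uint64_t descriptor_with_dpl(uint64_t desc, unsigned dpl)
{
    assert(dpl <= 3u);
    return (desc & ~(3ULL << 45)) | ((uint64_t)dpl << 45);
}
```

For example, 0x00CF9A000000FFFF (access byte 0x9A) decodes to DPL 0, and setting its DPL to 3 gives 0x00CFFA000000FFFF (access byte 0xFA).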


PRIVILEGE LEVELS

Current privilege level (CPL) 

— The CPL is the privilege level of the currently executing program or task. 

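It is stored in bits 0 and 1 of the CS and SS segment registers. (It lives in the segment registers. The segment registers are CS, DS, SS, ES, FS and GS, and the CPL exists only in CS and SS. DS carries no CPL, which is why privilege checks on DS need the concept of an RPL.)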

Normally, the CPL is equal to the privilege level of 

the code segment from which instructions are being fetched. The processor changes the CPL when program 
control is transferred to a code segment with a different privilege level. The CPL is treated slightly differently 
when accessing conforming code segments. Conforming code segments can be accessed from any privilege 
level that is equal to or numerically greater (less privileged) than the DPL of the conforming code segment. 
Also, the CPL is not changed when the processor accesses a conforming code segment that has a different 
privilege level than the CPL.
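
The conforming-segment rule can be sketched as a small C function. This is a simplified model under stated assumptions: it covers only a direct far transfer (no gate), it ignores the selector's RPL, and the nonconforming case (which requires CPL equal to DPL) is added from the general protection rules rather than from the paragraph above. The function name and the -1 return convention are invented for the example.

```c
#include <assert.h>

/* Returns the CPL in effect after a direct far transfer to a code segment,
 * or -1 where the processor would raise #GP.
 * Simplification: the selector's RPL is not modeled. */
int cpl_after_transfer(unsigned cpl, unsigned dpl, int conforming)
{
    assert(cpl <= 3u && dpl <= 3u);
    if (conforming) {
        /* Reachable from an equal or less privileged level;
         * the CPL is NOT changed to the segment's DPL. */
        return cpl >= dpl ? (int)cpl : -1;
    }
    /* Nonconforming segment without a gate: levels must match. */
    return cpl == dpl ? (int)cpl : -1;
}
```

So user code at CPL 3 may jump into a conforming segment with DPL 0 and keeps running at CPL 3, whereas the same jump into a nonconforming DPL 0 segment is rejected.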


Descriptor privilege level (DPL) 

— The DPL is the privilege level of a segment or gate. 

It is stored in the DPL field of the segment or gate descriptor for the segment or gate (that is, in the GDT; the GDT resides in memory, so the DPL lives in the memory that holds the GDT).

When the currently executing code segment 
attempts to access a segment or gate, the DPL of the segment or gate is compared to the CPL and RPL of the 
segment or gate selector (as described later in this section). The DPL is interpreted differently, depending on 
the type of segment or gate being accessed.

Requested privilege level (RPL)

 — The RPL is an override privilege level that is assigned to segment selectors. 

It is stored in bits 0 and 1 of the segment selector (it lives in a segment selector that is constructed on the fly, so it is also in memory).

The processor checks the RPL along with the CPL 
to determine if access to a segment is allowed. Even if the program or task requesting access to a segment has 
sufficient privilege to access the segment, access is denied if the RPL is not of sufficient privilege level. That is, 
if the RPL of a segment selector is numerically greater than the CPL, the RPL overrides the CPL, and vice versa. 
The RPL can be used to insure that privileged code does not access a segment on behalf of an application 
program unless the program itself has access privileges for that segment. 
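
The way the RPL overrides the CPL can be written down as a one-line check. The sketch below models the data-segment rule only: the numerically larger of CPL and RPL must not exceed the DPL. Function names are illustrative.

```c
#include <assert.h>

/* The weaker (numerically larger) of CPL and RPL is what counts. */
unsigned effective_privilege(unsigned cpl, unsigned rpl)
{
    assert(cpl <= 3u && rpl <= 3u);
    return cpl > rpl ? cpl : rpl;
}

/* Loading a data segment is allowed only if max(CPL, RPL) <= DPL. */
int data_segment_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(dpl <= 3u);
    return effective_privilege(cpl, rpl) <= dpl;
}
```

This is why kernel code (CPL 0) that uses a selector handed in by an application with RPL 3 is refused access to a DPL 0 segment: the check is made as if the application itself were accessing it.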




IRET:



PROTECTED-MODE:

    IF VM = 1 (* Virtual-8086 mode: PE = 1, VM = 1 *)
        THEN 
            GOTO RETURN-FROM-VIRTUAL-8086-MODE; (* PE = 1, VM = 1 *)
    FI;
    IF NT = 1
        THEN 
            GOTO TASK-RETURN; (* PE = 1, VM = 0, NT = 1 *)
    FI;
    IF OperandSize = 32
        THEN
            IF top 12 bytes of stack not within stack limits
                THEN #SS(0); FI;
            tempEIP ← Pop();
            tempCS ← Pop();
            tempEFLAGS ← Pop();

        ELSE (* OperandSize = 16 *)
            IF top 6 bytes of stack are not within stack limits
                THEN #SS(0); FI;
            tempEIP ← Pop();
            tempCS ← Pop();
            tempEFLAGS ← Pop();
            tempEIP ← tempEIP AND FFFFH;
            tempEFLAGS ← tempEFLAGS AND FFFFH;
    FI;
    IF tempEFLAGS(VM) = 1 and CPL = 0
        THEN 
            GOTO RETURN-TO-VIRTUAL-8086-MODE; 
        ELSE 
            GOTO PROTECTED-MODE-RETURN;
    FI;


PROTECTED-MODE-RETURN: (* PE = 1 *)

IF return code segment selector is NULL
    THEN #GP(0); FI;
IF return code segment selector addresses descriptor beyond descriptor table limit 
    THEN #GP(selector); FI;
Read segment descriptor pointed to by the return code segment selector;
IF return code segment descriptor is not a code segment
    THEN #GP(selector); FI;
IF return code segment selector RPL < CPL 
    THEN #GP(selector); FI;
IF return code segment descriptor is conforming
and return code segment DPL > return code segment selector RPL
    THEN #GP(selector); FI;
IF return code segment descriptor is not present 
    THEN #NP(selector); FI;
IF return code segment selector RPL > CPL 
    THEN GOTO RETURN-TO-OUTER-PRIVILEGE-LEVEL;
    ELSE GOTO RETURN-TO-SAME-PRIVILEGE-LEVEL; FI;
END;    
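
The privilege comparison at the heart of PROTECTED-MODE-RETURN can be sketched in C as follows. This is a deliberately reduced model: it looks only at the NULL-selector test and the RPL versus CPL comparison, and skips the descriptor-table, type, conforming and present checks. The enum and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

enum iret_path {
    IRET_FAULT_GP,     /* #GP would be raised */
    IRET_SAME_LEVEL,   /* RETURN-TO-SAME-PRIVILEGE-LEVEL */
    IRET_OUTER_LEVEL   /* RETURN-TO-OUTER-PRIVILEGE-LEVEL */
};

/* Decide which path IRET takes from the current CPL and the CS selector
 * popped off the stack. Descriptor checks are not modeled. */
enum iret_path classify_iret(unsigned cpl, uint16_t return_cs)
{
    unsigned rpl = (unsigned)return_cs & 3u;

    assert(cpl <= 3u);
    if ((return_cs & 0xFFFCu) == 0)   /* index 0 in the GDT: NULL selector */
        return IRET_FAULT_GP;
    if (rpl < cpl)                    /* cannot return to a more privileged level */
        return IRET_FAULT_GP;
    return rpl > cpl ? IRET_OUTER_LEVEL : IRET_SAME_LEVEL;
}
```

A kernel at CPL 0 returning with a CS whose RPL is 3 takes the outer-privilege path, which is the kernel-to-user switch the question asks about.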
    

RETURN-TO-OUTER-PRIVILEGE-LEVEL:

    IF OperandSize = 32
        THEN
            IF top 8 bytes on stack are not within limits 
                THEN #SS(0); FI;
        ELSE (* OperandSize = 16 *)
            IF top 4 bytes on stack are not within limits 
                THEN #SS(0); FI;
    FI;
    Read return segment selector;
    IF stack segment selector is NULL
        THEN #GP(0); FI;
    IF return stack segment selector index is not within its descriptor table limits
        THEN #GP(SSselector); FI;
    Read segment descriptor pointed to by return segment selector;
    IF stack segment selector RPL ≠ RPL of the return code segment selector
    or the stack segment descriptor does not indicate a writable data segment;
    or the stack segment DPL ≠ RPL of the return code segment selector
        THEN #GP(SS selector); FI;
    IF stack segment is not present 
        THEN #SS(SS selector); FI;
    IF new mode ≠ 64-Bit Mode
        THEN
            IF tempEIP is not within code segment limits 
                THEN #GP(0); FI;
            EIP ← tempEIP;
        ELSE (* new mode = 64-bit mode *)
            IF tempRIP is non-canonical
                THEN #GP(0); FI;
            RIP ← tempRIP;
        FI;
        CS ← tempCS;
        EFLAGS (CF, PF, AF, ZF, SF, TF, DF, OF, NT) ← tempEFLAGS;
        IF OperandSize = 32
            THEN EFLAGS(RF, AC, ID) ← tempEFLAGS; FI;
        IF CPL ≤ IOPL 
            THEN EFLAGS(IF) ← tempEFLAGS; FI;
        IF CPL = 0
            THEN
                EFLAGS(IOPL) ← tempEFLAGS;
                IF OperandSize = 32
                    THEN EFLAGS(VM, VIF, VIP) ← tempEFLAGS; FI;
                IF OperandSize = 64
                    THEN EFLAGS(VIF, VIP) ← tempEFLAGS; FI;
        FI;
        CPL ← RPL of the return code segment selector;
        FOR each of segment register (ES, FS, GS, and DS)
            DO
                IF segment register points to data or non-conforming code segment
                and CPL > segment descriptor DPL (* Stored in hidden part of segment register *)
                    THEN (* Segment register invalid *)
                        SegmentSelector ← 0; (* NULL segment selector *)
                FI;
            OD;
    END;
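
The final loop of RETURN-TO-OUTER-PRIVILEGE-LEVEL, which clears data segment registers the new privilege level may not keep, might be modeled like this. The struct is a hypothetical stand-in for the visible selector plus the cached descriptor bits, and the selector values in the usage note are examples only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one data segment register (ES, FS, GS or DS). */
struct seg_reg {
    uint16_t selector;                   /* visible part */
    unsigned dpl;                        /* from the hidden, cached descriptor */
    int is_data_or_nonconforming_code;   /* segment type, also cached */
};

/* After CPL has been raised numerically (made less privileged), load a
 * NULL selector into every register that still refers to a segment
 * more privileged than the new CPL. */
void scrub_data_segments(struct seg_reg regs[4], unsigned new_cpl)
{
    assert(new_cpl <= 3u);
    for (int i = 0; i < 4; i++) {
        if (regs[i].is_data_or_nonconforming_code && new_cpl > regs[i].dpl)
            regs[i].selector = 0;   /* NULL segment selector */
    }
}
```

If ES still held a kernel data selector such as 0x18 (DPL 0) when returning to CPL 3, it would come back as 0, while a register holding a DPL 3 selector such as 0x2B is left alone.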


    

INT n:



PROTECTED-MODE:

    IF ((vector_number « 3) + 7) is not within IDT limits
    or selected IDT descriptor is not an interrupt-, trap-, or task-gate type
        THEN #GP(error_code(vector_number,1,EXT)); FI;
        (* idt operand to error_code set because vector is used *)
    IF software interrupt (* Generated by INT n, INT3, or INTO *)
        THEN
            IF gate DPL < CPL (* PE = 1, DPL < CPL, software interrupt *)
                THEN #GP(error_code(vector_number,1,0)); FI;
                (* idt operand to error_code set because vector is used *)
                (* ext operand to error_code is 0 because INT n, INT3, or INTO*)
    FI;
    IF gate not present 
        THEN #NP(error_code(vector_number,1,EXT)); FI;
        (* idt operand to error_code set because vector is used *)
    IF task gate (* Specified in the selected interrupt table descriptor *)
        THEN GOTO TASK-GATE;
        ELSE GOTO TRAP-OR-INTERRUPT-GATE; (* PE = 1, trap/interrupt gate *)
    FI;
END;
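
The two checks that matter most for a software interrupt issued from user mode, the IDT limit test and the gate DPL test, can be sketched as below. The gate-type and present checks are left out, 8-byte gates are assumed (as in the pseudocode's shift by 3), and the names are illustrative.

```c
#include <assert.h>

enum int_result { INT_OK, INT_FAULT_GP };

/* Model of the first checks made for INT n in protected mode.
 * idt_limit is the offset of the last valid byte of the IDT. */
enum int_result check_software_int(unsigned vector, unsigned idt_limit,
                                   unsigned gate_dpl, unsigned cpl)
{
    assert(vector <= 255u && gate_dpl <= 3u && cpl <= 3u);
    if ((vector << 3) + 7u > idt_limit)   /* 8-byte gate must fit in the IDT */
        return INT_FAULT_GP;
    if (gate_dpl < cpl)                   /* software interrupt: DPL must be >= CPL */
        return INT_FAULT_GP;
    return INT_OK;
}
```

A gate meant to be invoked from user mode therefore needs DPL 3. With a full 256-entry table (limit 0x7FF), check_software_int(0x80, 0x7FF, 3, 3) succeeds, but the same call with a gate DPL of 0 faults.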


TRAP-OR-INTERRUPT-GATE:

    Read new code-segment selector for trap or interrupt gate (IDT descriptor);
    IF new code-segment selector is NULL
        THEN #GP(EXT); FI; (* Error code contains NULL selector *)
    IF new code-segment selector is not within its descriptor table limits 
        THEN #GP(error_code(new code-segment selector,0,EXT)); FI;
        (* idt operand to error_code is 0 because selector is used *)
    Read descriptor referenced by new code-segment selector;
    IF descriptor does not indicate a code segment or new code-segment DPL > CPL
        THEN #GP(error_code(new code-segment selector,0,EXT)); FI;
        (* idt operand to error_code is 0 because selector is used *)
    IF new code-segment descriptor is not present, 
        THEN #NP(error_code(new code-segment selector,0,EXT)); FI;
        (* idt operand to error_code is 0 because selector is used *)
    IF new code segment is non-conforming with DPL < CPL
        THEN 
            IF VM = 0
                THEN 
                    GOTO INTER-PRIVILEGE-LEVEL-INTERRUPT; 
                    (* PE = 1, VM = 0, interrupt or trap gate, nonconforming code segment,
                    DPL < CPL *)

                ELSE (* VM = 1 *)
                    IF new code-segment DPL ≠ 0 
                        THEN #GP(error_code(new code-segment selector,0,EXT));
                        (* idt operand to error_code is 0 because selector is used *)
                    GOTO INTERRUPT-FROM-VIRTUAL-8086-MODE; FI;
                    (* PE = 1, interrupt or trap gate, DPL < CPL, VM = 1 *)
            FI;
        ELSE (* PE = 1, interrupt or trap gate, DPL ≥ CPL *)
            IF VM = 1 
                THEN #GP(error_code(new code-segment selector,0,EXT));
                (* idt operand to error_code is 0 because selector is used *)
            IF new code segment is conforming or new code-segment DPL = CPL
                THEN 
                    GOTO INTRA-PRIVILEGE-LEVEL-INTERRUPT; 
            ELSE (* PE = 1, interrupt or trap gate, nonconforming code segment, DPL > CPL *)
                #GP(error_code(new code-segment selector,0,EXT));
                (* idt operand to error_code is 0 because selector is used *)
            FI;
    FI;
END;
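
The branch structure of TRAP-OR-INTERRUPT-GATE reduces to a three-way decision once virtual-8086 mode is excluded. The sketch below assumes VM = 0 and that the selector and descriptor have already passed the NULL, limit, type and present checks. Names are invented for the example.

```c
#include <assert.h>

enum gate_path {
    GATE_FAULT_GP,          /* #GP: target less privileged than caller */
    GATE_INTER_PRIVILEGE,   /* INTER-PRIVILEGE-LEVEL-INTERRUPT */
    GATE_INTRA_PRIVILEGE    /* INTRA-PRIVILEGE-LEVEL-INTERRUPT */
};

/* Choose the delivery path from the current CPL and the target code
 * segment's DPL and conforming bit (VM = 0 assumed). */
enum gate_path classify_gate_target(unsigned cpl, unsigned code_dpl, int conforming)
{
    assert(cpl <= 3u && code_dpl <= 3u);
    if (code_dpl > cpl)
        return GATE_FAULT_GP;
    if (!conforming && code_dpl < cpl)
        return GATE_INTER_PRIVILEGE;   /* privilege rises, stack switches */
    return GATE_INTRA_PRIVILEGE;       /* conforming, or DPL == CPL */
}
```

An interrupt arriving at CPL 3 with a nonconforming DPL 0 handler takes the inter-privilege path, while the same interrupt arriving when the kernel is already at CPL 0 takes the intra-privilege path.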


INTER-PRIVILEGE-LEVEL-INTERRUPT:

...
    IF IDT gate is 32-bit
        THEN 
            CS:EIP ← Gate(CS:EIP); (* Segment descriptor information also loaded *)
        ELSE 
            IF IDT gate 16-bit
                THEN 
                    CS:IP ← Gate(CS:IP); 
                    (* Segment descriptor information also loaded *)
                ELSE (* 64-bit IDT gate *)
                    CS:RIP ← Gate(CS:RIP); 
                    (* Segment descriptor information also loaded *)
        FI;
    FI;
    IF IDT gate is 32-bit
        THEN
            Push(far pointer to old stack); 
            (* Old SS and ESP, 3 words padded to 4 *)
            Push(EFLAGS);
            Push(far pointer to return instruction); 
            (* Old CS and EIP, 3 words padded to 4 *)
            Push(ErrorCode); (* If needed, 4 bytes *)
        ELSE
            IF IDT gate 16-bit
                THEN
                    Push(far pointer to old stack); 
                    (* Old SS and SP, 2 words *)
                    Push(EFLAGS[15:0]);
                    Push(far pointer to return instruction); 
                    (* Old CS and IP, 2 words *)
                    Push(ErrorCode); (* If needed, 2 bytes *)
                ELSE (* 64-bit IDT gate *)
                    Push(far pointer to old stack); 
                    (* Old SS and SP, each an 8-byte push *)
                    Push(RFLAGS); (* 8-byte push *)
                    Push(far pointer to return instruction); 
                    (* Old CS and RIP, each an 8-byte push *)
                    Push(ErrorCode); (* If needed, 8-bytes *)
            FI;
    FI;
    CPL ← new code-segment DPL;
    CS(RPL) ← CPL;
    IF IDT gate is interrupt gate
        THEN IF ← 0 (* Interrupt flag set to 0, interrupts disabled *); FI;
    TF ← 0;
    VM ← 0;
    RF ← 0;
    NT ← 0;
END;


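The stack when an interrupt occurs

The INT instruction

is equivalent to the following (the pushes in parentheses happen only when the privilege level changes)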

(push SS)

(push ESP)

push EFLAGS

push CS

push EIP

(push Error Code, only for exceptions that define one; INT n itself pushes none)


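The IRET instruction

is equivalent to the following (the pops in parentheses happen only when returning to an outer privilege level)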

pop EIP

pop CS

pop EFLAGS

(pop ESP)

(pop SS)
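
The push and pop sequences above mirror each other, which a small simulation can show. This is a toy model, not how any kernel is written: the array stands in for the kernel stack, the switch to the stack named in the TSS is not modeled, no error code is pushed, and all names and register values are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_SLOTS 16

struct cpu { uint32_t eip, cs, eflags, esp, ss; };

/* Toy kernel stack growing downward; top == KSTACK_SLOTS means empty. */
struct kstack { uint32_t slot[KSTACK_SLOTS]; int top; };

static void push(struct kstack *s, uint32_t v)
{
    assert(s->top > 0);
    s->slot[--s->top] = v;
}

static uint32_t pop(struct kstack *s)
{
    assert(s->top < KSTACK_SLOTS);
    return s->slot[s->top++];
}

/* Interrupt entry: push SS:ESP only if the handler is more privileged. */
void enter_interrupt(struct cpu *c, struct kstack *s,
                     uint32_t handler_cs, uint32_t handler_eip)
{
    int privilege_change = (handler_cs & 3u) < (c->cs & 3u);
    if (privilege_change) {
        push(s, c->ss);
        push(s, c->esp);
    }
    push(s, c->eflags);
    push(s, c->cs);
    push(s, c->eip);
    c->cs = handler_cs;      /* CPL is now the handler's privilege level */
    c->eip = handler_eip;
}

/* IRET: pop EIP, CS, EFLAGS, then ESP and SS if CS is less privileged. */
void iret(struct cpu *c, struct kstack *s)
{
    unsigned old_cpl = c->cs & 3u;
    c->eip = pop(s);
    c->cs = pop(s);
    c->eflags = pop(s);
    if ((c->cs & 3u) > old_cpl) {
        c->esp = pop(s);
        c->ss = pop(s);
    }
}
```

Entering from a CS with RPL 3 into a handler CS with RPL 0 pushes five values, and the matching iret restores all five, leaving the low two bits of CS at 3 again.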


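On a process context switch, the kernel switches the address space (switch_mm), the stack together with thread_info (task_struct), and CS:EIP.

This is implemented by software and hardware together; the hardware involved includes CR3 and the TSS.

On an interrupt context switch taken in kernel space, only CS:EIP is switched; the stack and the address space are not.

On an interrupt context switch taken in user space, CS:EIP and the stack are switched.

This is done purely by hardware mechanisms: the INT and IRET instructions.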



