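This post resolves the problem described in the previous one: the kernel rebooted when execve released the page tables it shared with the parent process. (A correction to the previous post: it was my kernel that rebooted, not bochs.)
Some background first: before enabling paging, the kernel maps kernel virtual addresses 0xC0000000-0xC0800000 to physical addresses 0x00000000-0x00800000 (the first 8 MB of physical memory). From then on, any kernel access to those first 8 MB of memory has to go through page-table translation.
During execve, our process unmaps the entire 0-3 GB process space, that is, it clears the first 768 entries of the process's page directory. If execve touches the 0-3 GB range at that point, a page fault is inevitable.
I went through the execve code carefully and ruled out every coding error that could make kernel-mode code access the 0-3 GB range directly, yet the kernel still rebooted. Sigh. Out of options, I had to single-step through the assembly in bochs (no IDE, thoroughly painful) and laboriously trace the execution, until I finally narrowed the problem down to one system call instruction in user space: int 0x80
After the kernel enters user mode with the iret instruction (move_to_user_mode), it immediately makes a fork system call to create the init process. While tracing, I found that once the fork() call trapped into kernel mode, the code address bochs displayed was, strangely, wrong: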
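The figure of 768 page-directory entries follows from the address layout. Here is a minimal sketch, assuming classic 32-bit two-level paging with 4 KB pages (which the 768-entries-for-3-GB figure implies, though the post does not state it). The helper names `pde_index`, `pte_index`, `in_kernel_window` and `kernel_virt_to_phys` are hypothetical and are not taken from the kernel's source.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_BASE     0xC0000000u  /* start of kernel virtual space (3 GB) */
#define KERNEL_MAP_SIZE 0x00800000u  /* the fixed 8 MB window described above */

/* Page-directory index: the top 10 bits of a 32-bit virtual address.
 * Each directory entry therefore covers 4 MB. */
static inline uint32_t pde_index(uint32_t vaddr) { return vaddr >> 22; }

/* Page-table index: the middle 10 bits. */
static inline uint32_t pte_index(uint32_t vaddr) { return (vaddr >> 12) & 0x3FFu; }

/* Hypothetical helper: is vaddr inside the kernel's fixed 8 MB mapping? */
static inline int in_kernel_window(uint32_t vaddr) {
    return vaddr >= KERNEL_BASE && (vaddr - KERNEL_BASE) < KERNEL_MAP_SIZE;
}

/* Hypothetical helper: physical address behind a kernel-window address. */
static inline uint32_t kernel_virt_to_phys(uint32_t vaddr) {
    return vaddr - KERNEL_BASE;
}

/* 3 GB / 4 MB = 768, so user space owns directory entries 0..767. */
static_assert((KERNEL_BASE >> 22) == 768, "user space covers PDEs 0..767");
```

Any address whose `pde_index` is below 768 goes through one of the entries that execve clears, so a kernel-mode access to such an address after the unmap will fault.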

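The address changed from 0x1b:0xc010d91a to 0x08:0x001092c0. The cs segment register shows that the CPU did go from ring 3 to ring 0, but the virtual address is off by 0xC0000000. The init process and task 0, the system's first process, both run kernel code, so their addresses should all be above 0xC0000000; now the address is below 0xC0000000, so something is definitely wrong. By chance I used bochs's info idt command to look at the CPU's IDT, and what I saw gave me quite a shock.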
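Going back to the two cs values in the trace, this is how they show the ring 3 to ring 0 transition. The sketch decodes an x86 segment selector into its standard fields (index, table indicator, requested privilege level). The struct and function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector. */
struct selector_fields {
    unsigned index; /* bits 15..3: descriptor index in the GDT/LDT */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level */
};

static inline struct selector_fields decode_selector(uint16_t sel) {
    struct selector_fields f;
    f.rpl   = sel & 0x3u;
    f.ti    = (sel >> 2) & 0x1u;
    f.index = (unsigned)sel >> 3;
    return f;
}

/* 0x1b has its low two bits set, so it is a ring-3 selector;
 * 0x08 has them clear, so it is a ring-0 selector. */
static_assert((0x1bu & 0x3u) == 3u, "0x1b is a ring-3 selector");
static_assert((0x08u & 0x3u) == 0u, "0x08 is a ring-0 selector");
```

So the selector half of the address behaved as expected: 0x1b is GDT entry 3 at ring 3 and 0x08 is GDT entry 1 at ring 0. Only the offset half was wrong.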

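Every populated entry except entries 32 and 33 had a wrong address (entry 128, the system call, is not in the screenshot). No wonder. The problem had to be here, so I found the IDT initialization code and compared how trap gates and interrupt gates are set up (they are defined through the following three macros):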
#define set_igate_descriptor(n, _dpl,entry) \
do{\
number_set(gate_desc,n,offset1,((unsigned long)(&entry)& 0xFFFF));\
number_set(gate_desc,n,selector,0x8);\
number_set(gate_desc,n,res,0x0);\
number_set(gate_desc,n,attr,0x0);\
number_set(gate_desc,n,type,0xE);\
number_set(gate_desc,n,s,0);\
number_set(gate_desc,n,dpl,_dpl);\
number_set(gate_desc,n,p,1);\
number_set(gate_desc,n,offset2,((((unsigned long)(&entry)))>>16) & 0xFFFF);\
}while(0)
#define set_trap_descriptor(n, _dpl,entry) \
do{\
number_set(gate_desc,n,offset1,((unsigned long)(&entry) & 0xFFFF));\
number_set(gate_desc,n,selector,0x8);\
number_set(gate_desc,n,res,0x0);\
number_set(gate_desc,n,attr,0x0);\
number_set(gate_desc,n,type,0xF);\
number_set(gate_desc,n,s,0);\
number_set(gate_desc,n,dpl,_dpl);\
number_set(gate_desc,n,p,1);\
number_set(gate_desc,n,offset2,((((unsigned long)(&entry)))>>16) & 0xFFF);\
}while(0)
#define set_system_descriptor(n,_dpl,entry)\
do{\
number_set(gate_desc,n,offset1,((unsigned long)(&entry) & 0xFFFF));\
number_set(gate_desc,n,selector,0x8);\
number_set(gate_desc,n,res,0x0);\
number_set(gate_desc,n,attr,0x0);\
number_set(gate_desc,n,type,0xF);\
number_set(gate_desc,n,s,0);\
number_set(gate_desc,n,dpl,_dpl);\
number_set(gate_desc,n,p,1);\
number_set(gate_desc,n,offset2,((((unsigned long)(&entry)))>>16) & 0xFFF);\
}while(0)
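Comparing the three macros carefully, I found that the trap gate entry address has 4 fewer high bits than the interrupt gate one: offset2 is masked with 0xFFF instead of 0xFFFF, so the top 4 bits of the 32-bit address are lost. What should have been 0xC01092c0 became 0x001092c0.
As a result, once the init process entered kernel mode, every address it accessed lacked the fixed 0xC0000000 offset and fell into user process space. Because mm_copy copied the page tables in full from task 0 when init was forked, the low 8 MB of user space was mapped normally, so accessing those addresses caused no problem. But when init forked a child and the child ran a new image through execve, mm_free released the child's user space. Accessing addresses below 0xC0000000 from kernel mode then hit unmapped pages, which set off a chain of problems.
Careless fingers again. The corrected macro definitions are: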
#define set_igate_descriptor(n, _dpl,entry) \
do{\
number_set(gate_desc,n,offset1,((unsigned long)(&entry)& 0xFFFF));\
number_set(gate_desc,n,selector,0x8);\
number_set(gate_desc,n,res,0x0);\
number_set(gate_desc,n,attr,0x0);\
number_set(gate_desc,n,type,0xE);\
number_set(gate_desc,n,s,0);\
number_set(gate_desc,n,dpl,_dpl);\
number_set(gate_desc,n,p,1);\
number_set(gate_desc,n,offset2,((((unsigned long)(&entry)))>>16) & 0xFFFF);\
}while(0)
#define set_trap_descriptor(n, _dpl,entry) \
do{\
number_set(gate_desc,n,offset1,((unsigned long)(&entry) & 0xFFFF));\
number_set(gate_desc,n,selector,0x8);\
number_set(gate_desc,n,res,0x0);\
number_set(gate_desc,n,attr,0x0);\
number_set(gate_desc,n,type,0xF);\
number_set(gate_desc,n,s,0);\
number_set(gate_desc,n,dpl,_dpl);\
number_set(gate_desc,n,p,1);\
number_set(gate_desc,n,offset2,((((unsigned long)(&entry)))>>16) & 0xFFFF);\
}while(0)
#define set_system_descriptor(n,_dpl,entry)\
do{\
number_set(gate_desc,n,offset1,((unsigned long)(&entry) & 0xFFFF));\
number_set(gate_desc,n,selector,0x8);\
number_set(gate_desc,n,res,0x0);\
number_set(gate_desc,n,attr,0x0);\
number_set(gate_desc,n,type,0xF);\
number_set(gate_desc,n,s,0);\
number_set(gate_desc,n,dpl,_dpl);\
number_set(gate_desc,n,p,1);\
number_set(gate_desc,n,offset2,((((unsigned long)(&entry)))>>16) & 0xFFFF);\
}while(0)
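The effect of the wrong mask can be reproduced outside the kernel. This is a minimal user-space sketch, not the kernel's own code: it splits an entry address into the two 16-bit halves the way the macros above do, then rebuilds the offset the CPU would jump to. The function names `gate_offset` and `stored_offset` are hypothetical; the address 0xC01092c0 is the one from the trace above.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the 32-bit handler offset from the two 16-bit halves
 * stored in a gate descriptor (offset1 = low half, offset2 = high half). */
static inline uint32_t gate_offset(uint16_t offset1, uint16_t offset2) {
    return ((uint32_t)offset2 << 16) | offset1;
}

/* Split an entry address as the macros do, applying high_mask to the
 * upper half, and return the offset that ends up in the descriptor. */
static inline uint32_t stored_offset(uint32_t entry, uint32_t high_mask) {
    uint16_t offset1 = (uint16_t)(entry & 0xFFFFu);
    uint16_t offset2 = (uint16_t)((entry >> 16) & high_mask);
    return gate_offset(offset1, offset2);
}

/* The buggy 12-bit mask keeps 0x010 of the high half 0xC010;
 * the correct 16-bit mask keeps all of it. */
static_assert(((0xC01092c0u >> 16) & 0xFFFu) == 0x010u, "0xFFF drops the top 4 bits");
static_assert(((0xC01092c0u >> 16) & 0xFFFFu) == 0xC010u, "0xFFFF keeps the full high half");
```

With these definitions, `stored_offset(0xC01092c0u, 0xFFFu)` gives 0x001092c0, the address bochs showed, while `stored_offset(0xC01092c0u, 0xFFFFu)` gives back 0xC01092c0.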
Things have finally calmed down...