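User Inter-Process Communication
Linux inherited its inter-process communication (IPC) methods from Unix, and it follows the POSIX standard (Portable Operating System Interface). IPC comes in three families: the System V IPC standard, the POSIX IPC standard, and socket-based IPC. The first two are limited to processes on a single computer, while sockets work both within one computer and between different computers.
System V IPC comprises System V message queues, System V semaphores, and System V shared memory. POSIX IPC comprises POSIX message queues, POSIX semaphores, and POSIX shared memory. Linux supports all of System V IPC and POSIX IPC and provides system calls for each. System V IPC is implemented in the kernel with a unified set of data structures, whereas POSIX IPC is generally implemented through file-system mechanisms.
User applications usually reach IPC through C library functions. The functionality itself is implemented in the kernel, and the C library accesses it through system calls and the file system. This chapter describes how each user IPC mechanism is implemented in the kernel.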
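System V IPC Object Management
The Linux kernel abstracts the operations common to the three IPC types (semaphores, message queues, and shared memory) into generic functions whose names contain "ipc". Each type-specific operation inherits a generic function by passing a function name as one of its arguments. The generic operations are analyzed below using semaphores as the example; message queues and shared memory are implemented in a similar way.
System V IPC Data Structures
The System V IPC data structures are the namespace structure ipc_namespace, the ID-set structure ipc_ids, and the IPC object attribute structure kern_ipc_perm. ipc_namespace is the entry point to the ID sets of all IPC objects. ipc_ids describes the IDs of one class of IPC objects (for example, semaphore objects). kern_ipc_perm describes the permission attributes common to all IPC objects and is inherited by the structure of every IPC object class.
After a process enters kernel space through a system call, it uses the global variable init_ipc_ns (of type struct ipc_namespace) to find the ID set for the relevant IPC type, finds the ID in that set, finds the object's descriptor from the ID, and reads the communication data from the descriptor. Figure 1 shows the relationship between ipc_ids and kern_ipc_perm.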
Figure 1: Relationship between struct ipc_ids and struct kern_ipc_perm
The System V IPC data structures are described in turn below:
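(1) IPC object attribute structure kern_ipc_perm
System V IPC is implemented in the kernel with a unified set of data structures. Its objects are System V message queues, semaphores, and shared memory. Every object has the following attributes:
- Read and write permissions for the creator, the creator's group, and others.
- The UID and GID of the object's creator.
- The UID and GID of the object's owner (initially equal to the creator's).
When a process accesses a System V IPC object, the following rules apply:
- If the process has root privileges, access is granted.
- If the process's EUID equals the UID of the object's owner or creator, the corresponding creator permission bits are checked to see whether access is allowed.
- If the process's EGID equals the GID of the object's owner or creator, or one of the groups the process belongs to has that GID, the corresponding group permission bits are checked to see whether access is allowed.
- Otherwise, the "others" permission bits are checked.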
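The access rules above can be sketched as a small user-space model. This is not the kernel's implementation: `perm_model`, `cred_model`, `in_group`, and `ipc_access_allowed` are hypothetical names introduced only for illustration, and the bit layout (owner bits shifted by 6, group bits by 3) is the usual Unix convention, assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for the permission part of an IPC object. */
struct perm_model {
    uid_t uid, cuid;   /* owner and creator user IDs */
    gid_t gid, cgid;   /* owner and creator group IDs */
    unsigned mode;     /* rw bits: 0600 owner, 0060 group, 0006 others */
};

/* Hypothetical stand-in for the credentials of the calling process. */
struct cred_model {
    uid_t euid;
    gid_t egid;
    const gid_t *groups;   /* supplementary groups */
    size_t ngroups;
    bool is_root;
};

/* True if g is the effective group or one of the supplementary groups. */
static bool in_group(const struct cred_model *c, gid_t g)
{
    if (c->egid == g)
        return true;
    for (size_t i = 0; i < c->ngroups; i++)
        if (c->groups[i] == g)
            return true;
    return false;
}

/* want: requested access in the "others" bit position (04 = read, 02 = write). */
static bool ipc_access_allowed(const struct perm_model *p,
                               const struct cred_model *c, unsigned want)
{
    unsigned granted = p->mode;

    if (c->is_root)
        return true;                         /* rule 1: root may always access */
    if (c->euid == p->uid || c->euid == p->cuid)
        granted >>= 6;                       /* rule 2: use the owner bits */
    else if (in_group(c, p->gid) || in_group(c, p->cgid))
        granted >>= 3;                       /* rule 3: use the group bits */
    /* rule 4: otherwise the low three bits are already the "others" bits */
    return (want & ~granted & 07) == 0;
}
```

With mode 0640, for instance, the owner may read and write, a group member may only read, and everyone else is refused.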
The attributes of a System V IPC object are described by struct kern_ipc_perm, which holds the uid, gid, key, ID number, and related fields. It is listed below (in include/linux/ipc.h):
    struct kern_ipc_perm {
        spinlock_t lock;
        int deleted;          /* deletion flag: this object has been deleted */
        int id;               /* identifier of the IPC object, used to fetch the object from the set of IPC objects */
        key_t key;            /* key: public or private */
        uid_t uid;
        gid_t gid;
        uid_t cuid;
        gid_t cgid;
        mode_t mode;          /* combination of permission bits */
        unsigned long seq;    /* sequence number within the IPC object type */
        void *security;       /* SELinux-related security data */
    };
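A key in kern_ipc_perm is either public or private. If the key is public, any process in the system that passes the permission check can obtain the identifier of the System V IPC object. If the key is private, its value is 0 (IPC_PRIVATE), which means any process can use key 0 to create an object for its own private use. System V IPC objects are referenced by identifier, not by key.
kern_ipc_perm holds the attributes common to all IPC objects, and every concrete IPC object structure inherits it by embedding it as a member.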
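The "inheritance" described above is plain structure embedding combined with container_of. The following user-space sketch illustrates the idea. `perm_model` and `sem_array_model` are simplified, hypothetical stand-ins for kern_ipc_perm and sem_array, and the macro is a user-space equivalent of the kernel's container_of.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct perm_model {              /* stand-in for kern_ipc_perm */
    int id;
    int key;
};

struct sem_array_model {         /* stand-in for sem_array */
    struct perm_model sem_perm;  /* embedded "base" part shared by all IPC types */
    unsigned long sem_nsems;     /* type-specific data */
};

/* Generic code only ever sees the embedded member. */
static int generic_get_id(const struct perm_model *p)
{
    return p->id;
}

/* Type-specific code recovers the enclosing object from the member pointer. */
static struct sem_array_model *to_sem_array(struct perm_model *p)
{
    return container_of(p, struct sem_array_model, sem_perm);
}
```

Because the generic layer stores and passes only pointers to the embedded member, the same lookup and locking code can serve semaphores, message queues, and shared memory.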
(2) Structure ipc_ids
Every System V IPC object has an ID. All the IDs of one class of System V IPC objects (for example, semaphore objects) form an ID set, which is described by struct ipc_ids. Object pointers are associated with IDs through the IDR mechanism. IDR stores the ID-to-object mapping in a radix tree. It behaves like an array indexed by ID but is not limited by a fixed array size.
struct ipc_ids describes an IPC ID set and is listed below (in include/linux/ipc_namespace.h):
    struct ipc_ids {
        int in_use;
        unsigned short seq;
        unsigned short seq_max;
        struct rw_semaphore rw_mutex;
        struct idr ipcs_idr;   /* IDR associates each ID with a struct kern_ipc_perm pointer */
    };
(3) Structure ipc_namespace
The ID sets of all System V IPC objects are stored in the global variable init_ipc_ns, of type struct ipc_namespace. Once a user-space process enters kernel space it runs as a kernel-space thread, and kernel-space threads share global variables. Different processes can therefore look up IPC object information through init_ipc_ns, which is what makes inter-process communication possible.
struct ipc_namespace is listed below (in include/linux/ipc_namespace.h):
    struct ipc_namespace {
        struct kref kref;
        struct ipc_ids ids[3];   /* ID sets for semaphores, message queues, and shared memory, respectively */

        int sem_ctls[4];
        int used_sems;

        int msg_ctlmax;
        int msg_ctlmnb;
        int msg_ctlmni;
        atomic_t msg_bytes;
        atomic_t msg_hdrs;

        size_t shm_ctlmax;
        size_t shm_ctlall;
        int shm_ctlmni;
        int shm_tot;

        struct notifier_block ipcns_nb;
    };
The global variable init_ipc_ns is defined as follows (in ipc/util.c):
    struct ipc_namespace init_ipc_ns = {
        .kref = {
            .refcount = ATOMIC_INIT(2),
        },
    };
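RCU Support in IPC
IPC is one of the most common operations a process performs, and its efficiency directly affects program performance. To make synchronized access to IPC objects more efficient, the Linux kernel uses spinlocks, read/write semaphores, RCU, deletion flags, and reference counts.
Synchronization is organized around data structures, and a different mechanism is chosen depending on the size of the structure being operated on. Spinlocks protect small structures that occupy little memory and are operated on quickly, such as kern_ipc_perm. Read/write semaphores protect small to medium structures that are read far more often than they are written, such as ipc_ids, which also contains a simple radix tree used by the IDR mechanism. RCU protects large structures that contain queues or linked lists and take longer to operate on, such as sem_array, which contains several linked lists. Deletion flags and reference counts coordinate the different mechanisms.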
(1) RCU prefix object structures
To let IPC support RCU, a prefix object related to RCU operations is placed in front of each IPC object, which keeps changes to the original functions to a minimum. There are three prefix structures: ipc_rcu_hdr, ipc_rcu_grace, and ipc_rcu_sched. ipc_rcu_hdr is used while the original object is in use and adds a reference count. ipc_rcu_grace is used during the RCU grace period and adds the RCU update-request list head. ipc_rcu_sched is used only when the memory was obtained with vmalloc and adds the work item that the vmalloc case requires. These prefix objects sit in front of the original object and share one memory block with it, and container_of separates the prefix from the original object. The three structures are listed below (in ipc/util.c):
    struct ipc_rcu_hdr {
        int refcount;
        int is_vmalloc;
        /* for a semaphore object this is the semaphore set (struct sem_array *sma);
           it is used to reach this structure from the IPC object */
        void *data[0];
    };

    struct ipc_rcu_grace {
        struct rcu_head rcu;
        /* for a semaphore object this is struct sem_array *sma;
           it is used to reach this structure from the IPC object */
        void *data[0];
    };

    struct ipc_rcu_sched {
        struct work_struct work;   /* work item for the work queue; the vmalloc case needs a work queue */
        /* for a semaphore object this is struct sem_array *sma;
           it is used to reach this structure from the IPC object */
        void *data[0];
    };
(2) Adding the RCU prefix object when an IPC object is allocated
Memory for an IPC object is allocated by calling ipc_rcu_alloc. This function wraps the memory allocators, places an RCU prefix object in front of the IPC object, and initializes the prefix. The parameter size is the size of the IPC object. The function returns the address of the IPC object inside the memory block that holds both the prefix object and the IPC object (together called the RCU IPC object).
ipc_rcu_alloc is listed below (in ipc/util.c):
    void *ipc_rcu_alloc(int size)
    {
        void *out;

        if (rcu_use_vmalloc(size)) {   /* allocations larger than one physical page use vmalloc */
            out = vmalloc(HDRLEN_VMALLOC + size);
            if (out) {
                out += HDRLEN_VMALLOC;
                /* use container_of to reach the prefix object from the IPC object,
                   then initialize the members of the prefix object */
                container_of(out, struct ipc_rcu_hdr, data)->is_vmalloc = 1;
                container_of(out, struct ipc_rcu_hdr, data)->refcount = 1;
            }
        } else {
            out = kmalloc(HDRLEN_KMALLOC + size, GFP_KERNEL);
            if (out) {
                out += HDRLEN_KMALLOC;
                container_of(out, struct ipc_rcu_hdr, data)->is_vmalloc = 0;
                container_of(out, struct ipc_rcu_hdr, data)->refcount = 1;
            }
        }

        return out;   /* return the address of the RCU IPC object */
    }
The macros that compute the size of the IPC prefix object are listed below:
    #define HDRLEN_KMALLOC (sizeof(struct ipc_rcu_grace) > sizeof(struct ipc_rcu_hdr) ? \
            sizeof(struct ipc_rcu_grace) : sizeof(struct ipc_rcu_hdr))
    #define HDRLEN_VMALLOC (sizeof(struct ipc_rcu_sched) > HDRLEN_KMALLOC ? \
            sizeof(struct ipc_rcu_sched) : HDRLEN_KMALLOC)
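The prefix technique can be tried out in user space. The sketch below is only a model under stated assumptions: `hdr_model` and `grace_model` are hypothetical stand-ins for ipc_rcu_hdr and ipc_rcu_grace, malloc replaces the kernel allocators, and `prefixed_alloc` and `prefixed_free` are names invented for this example. It shows how one prefix slot, sized for the larger of the two layouts, is placed in front of the payload and later reached from the payload pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the two prefix layouts that share one slot. */
struct hdr_model   { int refcount; int is_vmalloc; void *data[]; };
struct grace_model { void *next; void (*func)(void *); void *data[]; };

/* The prefix slot must be large enough for whichever layout is bigger. */
#define HDRLEN (sizeof(struct grace_model) > sizeof(struct hdr_model) ? \
                sizeof(struct grace_model) : sizeof(struct hdr_model))

/* Allocate prefix + payload in one block and return the payload address. */
static void *prefixed_alloc(size_t size)
{
    char *out = malloc(HDRLEN + size);

    if (!out)
        return NULL;
    out += HDRLEN;   /* skip the prefix slot */
    container_of(out, struct hdr_model, data)->is_vmalloc = 0;
    container_of(out, struct hdr_model, data)->refcount = 1;
    return out;
}

/* Step back over the prefix slot to recover the start of the block. */
static void prefixed_free(void *obj)
{
    free((char *)obj - HDRLEN);
}
```

Callers work with the payload pointer as if it came from an ordinary allocator. Only the helper functions know that a header lives just in front of it.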
(3) Deferred update triggered by modifying an IPC object
When an object is modified, RCU defers the update through call_rcu. For an RCU IPC object, the call to call_rcu is triggered by the reference count: ipc_rcu_getref increments the count before the modification, ipc_rcu_putref decrements it afterwards, and when the count reaches 0, call_rcu is invoked to perform the deferred update.
ipc_rcu_getref is listed below:
    void ipc_rcu_getref(void *ptr)
    {
        container_of(ptr, struct ipc_rcu_hdr, data)->refcount++;
    }
ipc_rcu_putref is listed below:
    void ipc_rcu_putref(void *ptr)
    {
        if (--container_of(ptr, struct ipc_rcu_hdr, data)->refcount > 0)
            return;

        if (container_of(ptr, struct ipc_rcu_hdr, data)->is_vmalloc) {
            call_rcu(&container_of(ptr, struct ipc_rcu_grace, data)->rcu,
                     ipc_schedule_free);
        } else {
            call_rcu(&container_of(ptr, struct ipc_rcu_grace, data)->rcu,
                     ipc_immediate_free);
        }
    }
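The interplay of the reference count and the deferred free can be modeled in user space. The sketch below is single-threaded and is not RCU. A pending list stands in for call_rcu, and `grace_period_end` stands in for the moment the grace period has elapsed. All names (`obj_model`, `obj_getref`, `obj_putref`, `grace_period_end`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct obj_model {
    int refcount;
    struct obj_model *next_pending;   /* link used only after the last put */
    int payload;
};

static struct obj_model *pending;     /* objects waiting for the "grace period" */

static struct obj_model *obj_new(int payload)
{
    struct obj_model *o = calloc(1, sizeof(*o));

    if (o) {
        o->refcount = 1;              /* the creator holds the first reference */
        o->payload = payload;
    }
    return o;
}

static void obj_getref(struct obj_model *o)
{
    o->refcount++;
}

/* Stand-in for call_rcu(): the last put queues the object instead of freeing it. */
static void obj_putref(struct obj_model *o)
{
    if (--o->refcount > 0)
        return;
    o->next_pending = pending;
    pending = o;
}

/* Stand-in for the end of a grace period: free everything queued so far. */
static int grace_period_end(void)
{
    int freed = 0;

    while (pending) {
        struct obj_model *o = pending;

        pending = o->next_pending;
        free(o);
        freed++;
    }
    return freed;
}
```

The point of the delay is that a reader which found the object just before the last put can still dereference it safely until the grace period ends.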
Modifications to an IPC object must also be made under a spinlock. For example, the locking function sem_lock_and_putref and the unlocking function sem_getref_and_unlock used when modifying semaphore objects are listed below (in ipc/sem.c):
    static inline void sem_lock_and_putref(struct sem_array *sma)
    {
        ipc_lock_by_ptr(&sma->sem_perm);
        ipc_rcu_putref(sma);
    }

    static inline void sem_getref_and_unlock(struct sem_array *sma)
    {
        ipc_rcu_getref(sma);
        ipc_unlock(&(sma)->sem_perm);
    }
The generic IPC object locking function ipc_lock_by_ptr and unlocking function ipc_unlock are listed below (in ipc/util.h):
    static inline void ipc_lock_by_ptr(struct kern_ipc_perm *perm)
    {
        rcu_read_lock();
        spin_lock(&perm->lock);
    }

    static inline void ipc_unlock(struct kern_ipc_perm *perm)
    {
        spin_unlock(&perm->lock);
        rcu_read_unlock();
    }
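IPC Object Lookup
IPC operations are the concrete communication operations performed by user-space processes, such as incrementing or decrementing a semaphore. These operations and their implementations differ between IPC object types and are covered in detail in the sections on semaphores, shared memory, and message queues.
Although the operations differ between IPC types, some are common to all of them, such as finding an IPC object from its ID and adding or removing IDs. These common operations are illustrated below using semaphores.
In a semaphore operation, the process in kernel space first finds the semaphore object from the semaphore ID and then modifies that object. The lookup is a read operation and can proceed concurrently without blocking under RCU, whereas modifying the semaphore object requires holding the spinlock.
The system call sys_semtimedop performs semaphore increment and decrement operations. The code related to semaphore object lookup is listed below: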
    asmlinkage long sys_semtimedop(int semid, struct sembuf __user *tsops,
                                   unsigned nsops, const struct timespec __user *timeout)
    {
        ……
        sma = sem_lock_check(ns, semid);   /* look up the semaphore object by semid and take its spinlock */
        ……
        error = try_atomic_semop(sma, sops, nsops, un, task_tgid_vnr(current));   /* apply the increment/decrement operations */
        ……
    out_unlock_free:
        sem_unlock(sma);   /* release the spinlock once the operation is complete */
        ……
    }
The functions sem_lock_check and sem_unlock are described in turn below.
- Function sem_lock_check
sem_lock_check finds the IPC object by id and takes its spinlock so that the object can be modified, then calls container_of to obtain the semaphore object. It is listed below (in ipc/sem.c):
    static inline struct sem_array *sem_lock_check(struct ipc_namespace *ns, int id)
    {
        struct kern_ipc_perm *ipcp = ipc_lock_check(&sem_ids(ns), id);

        if (IS_ERR(ipcp))
            return (struct sem_array *)ipcp;

        return container_of(ipcp, struct sem_array, sem_perm);
    }
After obtaining the IPC object, ipc_lock_check verifies that the sequence number encoded in the id is correct. It is listed below (in ipc/util.c):
    struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id)
    {
        struct kern_ipc_perm *out;

        out = ipc_lock(ids, id);   /* find the struct kern_ipc_perm object by id */
        if (IS_ERR(out))
            return out;

        if (ipc_checkid(out, id)) {   /* check the sequence number of the id: id / 32768 != out->seq */
            ipc_unlock(out);
            return ERR_PTR(-EIDRM);
        }

        return out;
    }
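The comment in ipc_lock_check hints at how a user-visible ID is composed: a slot index in the low part and a sequence number in the high part, separated by a multiplier of 32768. A minimal sketch of that arithmetic follows. It assumes the multiplier value from the comment above and that the index is the remainder of the same division. The names `SEQ_MULTIPLIER_MODEL`, `build_id`, `id_to_idx`, `id_to_seq`, and `check_id` are hypothetical.

```c
#include <assert.h>

#define SEQ_MULTIPLIER_MODEL 32768   /* value taken from the comment above */

/* Combine a slot index and a sequence number into one ID. */
static int build_id(int idx, int seq)
{
    return SEQ_MULTIPLIER_MODEL * seq + idx;
}

/* Low part of the ID: index of the slot in the ID set. */
static int id_to_idx(int id)
{
    return id % SEQ_MULTIPLIER_MODEL;
}

/* High part of the ID: sequence number assigned when the object was created. */
static int id_to_seq(int id)
{
    return id / SEQ_MULTIPLIER_MODEL;
}

/* Returns nonzero when the ID is stale, i.e. the slot now has another sequence number. */
static int check_id(int slot_seq, int id)
{
    return id_to_seq(id) != slot_seq;
}
```

The sequence number lets the kernel reject an old ID after its slot has been reused for a new object, which is the case that produces EIDRM in the listing above.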
ipc_lock looks up an id in ids while holding the reader lock. Once it has found the id and obtained the IPC object, it locks the object. The IPC object is still locked when the function returns so that the IPC operation can modify it. The function is meant to be called when rw_mutex is not held and the radix tree ids->ipcs_idr is not otherwise protected. ipc_lock is listed below (in ipc/util.c):
    struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
    {
        struct kern_ipc_perm *out;
        int lid = ipcid_to_idx(id);

        down_read(&ids->rw_mutex);   /* ids is protected by a read/write semaphore: take the reader lock */

        rcu_read_lock();   /* the radix tree ids->ipcs_idr is protected by RCU: take the RCU read lock */
        out = idr_find(&ids->ipcs_idr, lid);   /* find the object pointer for lid in ids->ipcs_idr */
        if (out == NULL) {   /* not found: unlock and return an error */
            rcu_read_unlock();
            up_read(&ids->rw_mutex);
            return ERR_PTR(-EINVAL);
        }

        up_read(&ids->rw_mutex);   /* done reading ids: release the reader lock */

        spin_lock(&out->lock);   /* take the spinlock so that later code can modify out */

        /* ipc_rmid() in another process may have released the ID while ipc_lock
           was spinning; check the flag to verify that out is still valid */
        if (out->deleted) {   /* out has been deleted: release the locks and return an error */
            spin_unlock(&out->lock);
            rcu_read_unlock();
            return ERR_PTR(-EINVAL);
        }

        return out;
    }
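The "look up, lock, then re-check the deletion flag" pattern in ipc_lock can be reproduced in a small user-space model. The sketch below is a simplification under stated assumptions: a fixed array stands in for the IDR, a C11 `atomic_flag` stands in for the kernel spinlock, and neither the read/write semaphore nor RCU is modeled. `perm_model`, `perm_init`, `perm_lock`, `perm_unlock`, and `lookup_and_lock` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NSLOTS 4

struct perm_model {
    atomic_flag lock;   /* stand-in for the kernel spinlock */
    int deleted;        /* set once the object has been removed from the table */
};

static struct perm_model *table[NSLOTS];   /* stand-in for the IDR */

static void perm_init(struct perm_model *p, int deleted)
{
    atomic_flag_clear(&p->lock);
    p->deleted = deleted;
}

static void perm_lock(struct perm_model *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void perm_unlock(struct perm_model *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Returns the object with its lock held, or NULL if absent or already deleted. */
static struct perm_model *lookup_and_lock(int idx)
{
    struct perm_model *p;

    if (idx < 0 || idx >= NSLOTS || (p = table[idx]) == NULL)
        return NULL;

    perm_lock(p);
    if (p->deleted) {   /* removed while we were waiting for the lock */
        perm_unlock(p);
        return NULL;
    }
    return p;
}
```

In the kernel, the RCU read-side critical section is what keeps the object's memory valid between the lookup and the lock. This model has no equivalent and is only safe while the objects outlive the table.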
- Function sem_unlock
sem_unlock releases the spinlock after the IPC operation has finished modifying the IPC object. It is listed below:
    #define sem_unlock(sma) ipc_unlock(&(sma)->sem_perm)

    static inline void ipc_unlock(struct kern_ipc_perm *perm)
    {
        spin_unlock(&perm->lock);
        rcu_read_unlock();
    }
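Releasing the IPC Namespace
The function free_ipc_ns releases an IPC namespace. An IPC namespace is represented by struct ipc_namespace and is the top-level entry point for IPC. In kernel space, a process uses a global variable of type struct ipc_namespace to find the ID-set structure of each IPC class, finds the IPC object's id in the ID set, and finds the IPC object from the id.
Releasing an IPC namespace releases all three classes of IPC objects. Because IPC objects are shared by multiple threads, the release path uses read/write semaphores, RCU, and other synchronization mechanisms. It is a good showcase of how the kernel's synchronization mechanisms are applied, so their use is analyzed here as well.
The call hierarchy of free_ipc_ns is shown in Figure 1. The implementation is analyzed below following the figure, using semaphores as the example, to show how the kernel synchronization mechanisms are applied.
Figure 1: Call hierarchy of the function free_ipc_ns
free_ipc_ns is listed below (in ipc/namespace.c):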
    void free_ipc_ns(struct kref *kref)
    {
        struct ipc_namespace *ns;

        ns = container_of(kref, struct ipc_namespace, kref);   /* get the namespace object from kref */
        /* Unregistering the hotplug notifier first guarantees that the IPC namespace
           object is not freed inside the callback routine. The notifier holds the IPC
           read/write lock and releases the object only after that lock is released. */
        unregister_ipcns_notifier(ns);
        sem_exit_ns(ns);   /* release the semaphore IPC objects */
        msg_exit_ns(ns);   /* release the message queue IPC objects */
        shm_exit_ns(ns);   /* release the shared memory IPC objects */
        kfree(ns);
        atomic_dec(&nr_ipc_ns);   /* decrement the count of IPC namespaces */

        /* send the notification */
        ipcns_notify(IPCNS_REMOVED);
    }
Only the release of the semaphore part of the namespace is described below. It is released by calling sem_exit_ns, which is listed below (in ipc/sem.c):
    void sem_exit_ns(struct ipc_namespace *ns)
    {
        /* #define sem_ids(ns) ((ns)->ids[IPC_SEM_IDS]) */
        free_ipcs(ns, &sem_ids(ns), freeary);
    }
When an ipc_namespace instance exits, free_ipcs is called to release all IPC objects of one IPC type. The parameter ns is the namespace whose IPC objects are being removed, ids is the ID set whose IPC objects are to be released, and free is the function called to release an object of the given IPC type.
free_ipcs first takes the writer lock ids->rw_mutex in order to modify ids, and then takes the spinlock perm->lock for each IPC object it releases.
free_ipcs is listed below (in ipc/namespace.c):
    void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
                   void (*free)(struct ipc_namespace *, struct kern_ipc_perm *))
    {
        struct kern_ipc_perm *perm;
        int next_id;
        int total, in_use;

        down_write(&ids->rw_mutex);   /* take the writer lock on ids */

        in_use = ids->in_use;

        for (total = 0, next_id = 0; total < in_use; next_id++) {
            /* find the struct kern_ipc_perm pointer for this id in the radix tree ipcs_idr */
            perm = idr_find(&ids->ipcs_idr, next_id);
            if (perm == NULL)
                continue;
            /* lock the object: rcu_read_lock() and spin_lock(&perm->lock) */
            ipc_lock_by_ptr(perm);
            free(ns, perm);   /* release perm; for semaphores this calls freeary */
            total++;
        }
        up_write(&ids->rw_mutex);   /* release the writer lock */
    }
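The loop in free_ipcs walks a sparse ID space and stops once it has released as many objects as were live at the start, delegating the actual release to a callback. A user-space sketch of that shape is shown below. It is an illustration only: a fixed array stands in for the IDR, no locking is modeled, and the bound check on `next_id` is an addition needed by the array model. `perm_model`, `ids_model`, `mark_freed`, and `free_all` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8

struct perm_model {
    int id;      /* slot index of this object */
    int freed;   /* set by the release callback */
};

struct ids_model {
    int in_use;                        /* number of live objects */
    struct perm_model *slot[NSLOTS];   /* stand-in for the IDR */
};

/* Type-specific release callback, playing the role of freeary(). */
static void mark_freed(struct ids_model *ids, struct perm_model *p)
{
    ids->slot[p->id] = NULL;
    ids->in_use--;
    p->freed = 1;
}

/* Visit slots in index order until every object live at the start was released. */
static int free_all(struct ids_model *ids,
                    void (*release)(struct ids_model *, struct perm_model *))
{
    int in_use = ids->in_use;   /* snapshot: the callback changes ids->in_use */
    int total = 0;

    for (int next_id = 0; total < in_use && next_id < NSLOTS; next_id++) {
        struct perm_model *p = ids->slot[next_id];

        if (p == NULL)
            continue;           /* empty slot: keep scanning */
        release(ids, p);
        total++;
    }
    return total;
}
```

Taking the snapshot of in_use before the loop matters because the callback decrements the live count. Comparing against the changing value would end the scan too early.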
函数freeary释放一个信号量集。在调用此函数时,sem_ids.rw_mutex已作为写者锁锁住,用于操作sem_ids,并且它持有结构 kern_ipc_perm的自旋锁,用于操作ipcp。sem_ids.rw_mutex在函数退出时一直保持锁住状态。
由于结构sem_array(使用RCU)包含结构kern_ipc_perm(使用自旋锁),它需要延迟删除,但结构kern_ipc_perm 使用自旋锁而无法延迟删除,因此,它使用了删除标识,在删除时,将删除标识设置为1,等待到RCU延迟删除结构sem_array时,RCU再一起删除结构kern_ipc_perm。
函数freeary列出如下(在ipc/sem.c中):
    static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
    {
        struct sem_undo *un;
        struct sem_queue *q;
        /* Get the semaphore set: recover the derived object from the pointer to its embedded base object */
        struct sem_array *sma = container_of(ipcp, struct sem_array, sem_perm);

        /* Invalidate the existing undo structures of this semaphore set. They are freed,
           without any further action, in exit_sem() or during the next semop */
        for (un = sma->undo; un; un = un->id_next)
            un->semid = -1;

        /* Wake up all pending processes and let them fail with EIDRM */
        q = sma->sem_pending;
        while (q) {
            struct sem_queue *n;
            /* lazy remove_from_queue: the whole queue is being killed */
            q->prev = NULL;
            n = q->next;
            q->status = IN_WAKEUP;
            wake_up_process(q->sleeper); /* wake up the process */
            smp_wmb();
            q->status = -EIDRM; /* mark the status as "ID removed" */
            q = n;
        }

        /* Remove the semaphore set from the IDR, the tree that maps IDs to pointers */
        sem_rmid(ns, sma);
        sem_unlock(sma); /* unlock: ipc_unlock(&(sma)->sem_perm) */

        ns->used_sems -= sma->sem_nsems;
        security_sem_free(sma); /* release the security data attached to the set */
        ipc_rcu_putref(sma); /* when the reference count drops to 0, call_rcu frees sma after a delay */
    }

    static inline void sem_rmid(struct ipc_namespace *ns, struct sem_array *s)
    {
        ipc_rmid(&sem_ids(ns), &s->sem_perm);
    }
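The function ipc_rmid removes an IPC ID. It sets the deleted flag, but the semaphore set itself stays in memory and is freed only after the RCU grace period. Before this function is called, sem_ids.rw_mutex must be held as a writer lock and the spinlock of the semaphore set must be held. sem_ids.rw_mutex is still held when the function returns. The function ipc_rmid is listed below: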
    void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
    {
        int lid = ipcid_to_idx(ipcp->id); /* (id) % 32768 */

        idr_remove(&ids->ipcs_idr, lid); /* remove lid from ids */
        ids->in_use--;                   /* one fewer ID in use */
        ipcp->deleted = 1;               /* mark ipcp as deleted */
        return;
    }
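The index computation in ipc_rmid can be tried out in user space. The sketch below assumes that a user-visible ID is built as a sequence number times 32768 plus a slot index, which is the layout the `(id) % 32768` comment suggests. The names `SEQ_MULTIPLIER_SKETCH`, `build_id`, `id_to_idx` and `id_to_seq` are hypothetical stand-ins, not the kernel's definitions.

```c
/* Hypothetical stand-in for the multiplier implied by "(id) % 32768". */
enum { SEQ_MULTIPLIER_SKETCH = 32768 };

/* Assumed layout: id = seq * 32768 + idx, where idx is the slot in the ID set. */
static int build_id(int seq, int idx)
{
    return seq * SEQ_MULTIPLIER_SKETCH + idx;
}

/* Mirrors the "(id) % 32768" comment: recover the slot index from an ID. */
static int id_to_idx(int id)
{
    return id % SEQ_MULTIPLIER_SKETCH;
}

/* Recover the sequence number that tells apart successive users of one slot. */
static int id_to_seq(int id)
{
    return id / SEQ_MULTIPLIER_SKETCH;
}
```

Under this assumption, a stale ID whose slot has been reused maps to the same index but carries a different sequence number, so it can be told apart from the new object.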
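Pipes
A pipe is a shared file that connects a reading process and a writing process so that they can communicate. The sending process (the writer) feeds the pipe, putting large amounts of data into it as a character stream, and the receiving process (the reader) takes the data out of the pipe. Because the sender and the receiver communicate through this shared file, the shared file itself is called a pipe.
To coordinate the two sides, the pipe mechanism must provide the following three capabilities.
- Mutual exclusion. While one process is reading or writing the pipe, the other process must wait.
- Synchronization. When the writer has written a certain amount of data (for example 4 KB) into the pipe, it goes to sleep until the reader has taken data out and wakes it up. A reader that finds the pipe empty likewise sleeps until a writer has written data into the pipe and wakes it up.
- Detecting whether the peer exists. Communication is possible only when the other side is known to exist.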
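The third point in the list above is visible from user space. The following is a minimal sketch, assuming a POSIX system: a read on a pipe with no writer left returns 0 (end of file), and a write on a pipe with no reader left fails with EPIPE. The helper names are made up for this illustration.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Returns what read() reports on an empty pipe whose write end is closed. */
static long read_with_no_writer(void)
{
    int fd[2];
    char c;
    long n;

    if (pipe(fd) == -1)
        return -1;
    close(fd[1]);                       /* the only writer goes away */
    n = (long)read(fd[0], &c, 1);       /* does not block: no writer exists */
    close(fd[0]);
    return n;                           /* 0 means end of file */
}

/* Returns errno after writing to a pipe whose read end is closed. */
static int write_errno_with_no_reader(void)
{
    int fd[2];
    int err = 0;
    void (*old)(int);

    if (pipe(fd) == -1)
        return -1;
    old = signal(SIGPIPE, SIG_IGN);     /* otherwise SIGPIPE ends the process */
    close(fd[0]);                       /* the only reader goes away */
    if (write(fd[1], "x", 1) == -1)
        err = errno;
    close(fd[1]);
    if (old != SIG_ERR)
        signal(SIGPIPE, old);           /* restore the previous disposition */
    return err;
}
```

Ignoring SIGPIPE is a choice made for the sketch, so that the error code can be observed instead of the process being terminated.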
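In the kernel version discussed here, a pipe is a fixed-size buffer of one page, that is, 4 KB. A pipe borrows the file structure of the file system and the VFS inode: two file structures point to the same temporary VFS inode, and that inode in turn points to one physical page. The two file structures carry different file operation tables. One holds the address of the routine that writes data into the pipe, the other the address of the routine that reads data out of it. User programs therefore still issue ordinary file system calls, while the kernel uses this abstraction to implement the special behavior of a pipe.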
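The statement that both file structures point to one inode can be checked from user space. This is a small sketch, assuming Linux, where fstat reports the same device and inode number for both ends of a pipe. The function name is made up for this illustration.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both ends of a fresh pipe are FIFOs on the same inode,
 * 0 if they are not, and -1 on error. */
static int pipe_ends_share_inode(void)
{
    int fd[2];
    struct stat rd, wr;
    int same = -1;

    if (pipe(fd) == -1)
        return -1;
    if (fstat(fd[0], &rd) == 0 && fstat(fd[1], &wr) == 0)
        same = S_ISFIFO(rd.st_mode) && S_ISFIFO(wr.st_mode) &&
               rd.st_dev == wr.st_dev && rd.st_ino == wr.st_ino;
    close(fd[0]);
    close(fd[1]);
    return same;
}
```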
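A typical use is a shell pipeline such as: $ ls | grep '\.m$' | lp
In this shell command, the listing of the current directory produced by ls is sent to grep as its standard input, and the output of grep is in turn sent to lp as its standard input. The pattern is quoted so that the shell does not expand it before grep sees it.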
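What the shell does for each `|` can be sketched with pipe, fork and dup2. The example below is a simplified illustration, not the shell's actual code: the child plays the left-hand command and writes a message to its standard output, which has been redirected into the pipe, and the parent plays the right-hand command and reads it. The function name `capture_child_output` is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a child whose standard output is a pipe and collects what it writes
 * into buf. Returns the number of bytes read, or -1 on error. */
static long capture_child_output(const char *msg, char *buf, size_t cap)
{
    int fd[2];
    pid_t pid;
    long total = 0;
    ssize_t n;

    if (pipe(fd) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                         /* child: left-hand command */
        close(fd[0]);
        if (dup2(fd[1], STDOUT_FILENO) == -1)
            _exit(1);                       /* stdout could not be redirected */
        close(fd[1]);
        if (write(STDOUT_FILENO, msg, strlen(msg)) < 0)
            _exit(1);
        _exit(0);
    }
    close(fd[1]);                           /* parent: right-hand command */
    while ((size_t)total < cap &&
           (n = read(fd[0], buf + total, cap - (size_t)total)) > 0)
        total += n;
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

A real shell would call one of the exec functions in the child after the dup2 step instead of writing the message itself.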
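Linux also supports named pipes. A named pipe is a special file of type FIFO: it has a name like an ordinary file and is accessed like an ordinary file. It always works on a first-in, first-out basis, which is why it is also called a FIFO. FIFOs are not temporary objects. They are entities in the file system and can be created with the mkfifo command.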
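Besides the mkfifo command there is a library function of the same name. The sketch below, which assumes a POSIX system with a writable /tmp, creates a FIFO, sends one byte through it within a single process, and removes it again. The path and the function name are made up for this illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a FIFO, passes one byte through it and returns the byte received,
 * or -1 on error. */
static int fifo_round_trip(char byte)
{
    char path[64];
    char got = 0;
    int rfd = -1, wfd = -1, result = -1;

    /* Hypothetical path; the process ID keeps parallel runs apart. */
    snprintf(path, sizeof path, "/tmp/fifo-sketch-%ld", (long)getpid());
    if (mkfifo(path, 0600) == -1)
        return -1;

    /* A non-blocking read-only open returns at once, even without a writer. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    /* With a reader present, the write-only open does not block. */
    if (rfd != -1)
        wfd = open(path, O_WRONLY);
    if (wfd != -1 && write(wfd, &byte, 1) == 1 && read(rfd, &got, 1) == 1)
        result = (unsigned char)got;

    if (wfd != -1)
        close(wfd);
    if (rfd != -1)
        close(rfd);
    unlink(path);                       /* the FIFO is a file system entry */
    return result;
}
```

Two unrelated processes would normally open the two ends; one process is used here only to keep the sketch short.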
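Implementation of pipes
In the kernel, pipes are implemented as a module in the form of a file system. An application creates a pipe with the system call sys_pipe and then operates on it with the ordinary file read and write functions.
The implementation code, found in fs/pipe.c, is analyzed below. The function init_pipe_fs initializes the pipe module and mounts the pipe file system. It is listed below: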
    static int __init init_pipe_fs(void)
    {
        // register the pipe file system
        int err = register_filesystem(&pipe_fs_type);

        if (!err) {
            // mount the file system pipe_fs_type
            pipe_mnt = kern_mount(&pipe_fs_type);
            if (IS_ERR(pipe_mnt)) {
                err = PTR_ERR(pipe_mnt);
                unregister_filesystem(&pipe_fs_type);
            }
        }
        return err;
    }
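The system call sys_pipe creates a pipe and returns two file descriptors: fd[0] is the read end of the pipe and fd[1] is the write end. The call hierarchy of sys_pipe is shown in Figure 16.1. The system call sys_pipe is listed below (in arch/i386/kernel/sys_i386.c):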
    asmlinkage int sys_pipe(unsigned long __user *fildes)
    {
        int fd[2];
        int error;

        error = do_pipe(fd);
        if (!error) {
            // copy the two newly created file descriptors to user space
            if (copy_to_user(fildes, fd, 2 * sizeof(int)))
                error = -EFAULT;
        }
        return error;
    }
Figure 16.1 Call hierarchy of the system call sys_pipe
The function do_pipe does the actual work of creating the pipe. It is listed below (in fs/pipe.c):
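From user space, sys_pipe is reached through the C library function pipe. A minimal sketch, assuming a POSIX system, that writes a short message into fd[1] and reads it back from fd[0] within one process is given here. The function name `pipe_round_trip` is made up for this illustration, and the message is assumed to be short enough to fit into the pipe buffer.

```c
#include <string.h>
#include <unistd.h>

/* Sends msg through a pipe within one process.
 * Returns the number of bytes read back, or -1 on error.
 * msg is assumed to fit into the pipe buffer, otherwise write() would block. */
static long pipe_round_trip(const char *msg, char *out, size_t cap)
{
    int fd[2];
    long n = -1;
    size_t len = strlen(msg);

    if (len == 0)
        return 0;                       /* nothing to send, avoid a blocking read */
    if (pipe(fd) == -1)                 /* fd[0]: read end, fd[1]: write end */
        return -1;
    if (write(fd[1], msg, len) == (ssize_t)len)
        n = (long)read(fd[0], out, cap);
    close(fd[0]);
    close(fd[1]);
    return n;
}
```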
    int do_pipe(int *fd)
    {
        struct qstr this;
        char name[32];
        struct dentry *dentry;
        struct inode *inode;
        struct file *f1, *f2;
        int error;
        int i, j;

        error = -ENFILE;
        // get the file structure for the read end of the pipe
        f1 = get_empty_filp();
        if (!f1)
            goto no_files;
        // get the file structure for the write end of the pipe
        f2 = get_empty_filp();
        if (!f2)
            goto close_f1;
        // create the pipe inode
        inode = get_pipe_inode();
        if (!inode)
            goto close_f12;
        // get the file descriptor for the read end
        error = get_unused_fd();
        if (error < 0)
            goto close_f12_inode;
        i = error;
        // get the file descriptor for the write end
        error = get_unused_fd();
        if (error < 0)
            goto close_f12_inode_i;
        j = error;

        error = -ENOMEM;
        // use the inode number as the file name
        sprintf(name, "[%lu]", inode->i_ino);
        this.name = name;
        this.len = strlen(name);
        this.hash = inode->i_ino; /* will go */
        // allocate the dentry
        dentry = d_alloc(pipe_mnt->mnt_sb->s_root, &this);
        if (!dentry)
            goto close_f12_inode_i_j;
        // for a pipe, pipefs_delete_dentry() is the only dentry operation
        dentry->d_op = &pipefs_dentry_operations;
        d_add(dentry, inode);
        f1->f_vfsmnt = f2->f_vfsmnt = mntget(mntget(pipe_mnt));
        f1->f_dentry = f2->f_dentry = dget(dentry);
        f1->f_mapping = f2->f_mapping = inode->i_mapping;

        // the read end allows only reads
        f1->f_pos = f2->f_pos = 0;
        f1->f_flags = O_RDONLY;
        f1->f_op = &read_pipe_fops;
        f1->f_mode = FMODE_READ;
        f1->f_version = 0;

        // the write end allows only writes
        f2->f_flags = O_WRONLY;
        f2->f_op = &write_pipe_fops;
        f2->f_mode = FMODE_WRITE;
        f2->f_version = 0;

        // install the file pointers f1 and f2 into the fd array
        fd_install(i, f1);
        fd_install(j, f2);
        fd[0] = i;
        fd[1] = j;
        return 0;
        ……
    }
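The function get_pipe_inode creates a special inode. It allocates one page of memory as a buffer that serves as the file, so operating on a pipe file really means operating on a buffer. The function get_pipe_inode is listed below: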
    static struct inode *get_pipe_inode(void)
    {
        // create a new inode
        struct inode *inode = new_inode(pipe_mnt->mnt_sb);

        if (!inode)
            goto fail_inode;
        if (!pipe_new(inode))
            goto fail_iput;
        PIPE_READERS(*inode) = PIPE_WRITERS(*inode) = 1;
        inode->i_fop = &rdwr_pipe_fops;
        // initialize the inode
        inode->i_state = I_DIRTY; // mark the inode dirty
        inode->i_mode = S_IFIFO | S_IRUSR | S_IWUSR;
        inode->i_uid = current->fsuid;
        inode->i_gid = current->fsgid;
        inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
        inode->i_blksize = PAGE_SIZE;
        return inode;

    fail_iput:
        iput(inode);
    fail_inode:
        return NULL;
    }
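The function pipe_new creates the information structure of a pipe and allocates one page that serves as the pipe buffer. The function is listed below: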
    struct inode *pipe_new(struct inode *inode)
    {
        unsigned long page;

        // allocate a memory page as the pipe buffer
        page = __get_free_page(GFP_USER);
        if (!page)
            return NULL;
        // allocate the pipe information structure
        inode->i_pipe = kmalloc(sizeof(struct pipe_inode_info), GFP_KERNEL);
        if (!inode->i_pipe)
            goto fail_page;
        // initialize the pipe state
        init_waitqueue_head(PIPE_WAIT(*inode));
        PIPE_BASE(*inode) = (char *) page;
        PIPE_START(*inode) = PIPE_LEN(*inode) = 0;
        PIPE_READERS(*inode) = PIPE_WRITERS(*inode) = 0;
        PIPE_WAITING_WRITERS(*inode) = 0;
        PIPE_RCOUNTER(*inode) = PIPE_WCOUNTER(*inode) = 1;
        *PIPE_FASYNC_READERS(*inode) = *PIPE_FASYNC_WRITERS(*inode) = NULL;
        return inode;

    fail_page:
        free_page(page);
        return NULL;
    }
The structure instance read_pipe_fops holds the file operations of the read end of the pipe. It is listed below:
    struct file_operations read_pipe_fops = {
        .llseek  = no_llseek,          // seeking is not supported on a pipe
        .read    = pipe_read,
        .readv   = pipe_readv,
        .write   = bad_pipe_w,         // writing to the read end is rejected
        .poll    = pipe_poll,
        .ioctl   = pipe_ioctl,
        .open    = pipe_read_open,
        .release = pipe_read_release,
        .fasync  = pipe_read_fasync,
    };
The structure instance write_pipe_fops holds the file operations of the write end of the pipe. It is listed below:
    struct file_operations write_pipe_fops = {
        .llseek  = no_llseek,          // seeking is not supported on a pipe
        .read    = bad_pipe_r,         // reading from the write end is rejected
        .write   = pipe_write,
        .writev  = pipe_writev,
        .poll    = pipe_poll,
        .ioctl   = pipe_ioctl,
        .open    = pipe_write_open,
        .release = pipe_write_release,
        .fasync  = pipe_write_fasync,
    };
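The effect of the one-directional operation tables can be observed from user space. The sketch below, which assumes Linux, records the errno values seen when each end of a pipe is used in the wrong direction or is asked to seek. It only observes the user-visible result and makes no claim about which kernel layer rejects the call. The function names are made up for this illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns errno from write() on the read end of a pipe (0 if it succeeded). */
static int errno_write_on_read_end(void)
{
    int fd[2], err = 0;

    if (pipe(fd) == -1)
        return -1;
    if (write(fd[0], "x", 1) == -1)
        err = errno;
    close(fd[0]);
    close(fd[1]);
    return err;
}

/* Returns errno from read() on the write end of a pipe (0 if it succeeded). */
static int errno_read_on_write_end(void)
{
    int fd[2], err = 0;
    char c;

    if (pipe(fd) == -1)
        return -1;
    if (read(fd[1], &c, 1) == -1)
        err = errno;
    close(fd[0]);
    close(fd[1]);
    return err;
}

/* Returns errno from lseek() on a pipe (0 if it succeeded). */
static int errno_lseek_on_pipe(void)
{
    int fd[2], err = 0;

    if (pipe(fd) == -1)
        return -1;
    if (lseek(fd[0], 0, SEEK_SET) == (off_t)-1)
        err = errno;
    close(fd[0]);
    close(fd[1]);
    return err;
}
```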
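The read function pipe_read and the write function pipe_write are a typical solution to the reader-writer problem on a ring buffer. A reader reads whenever the buffer holds data and then wakes up any writer that may be waiting. If there is no data to read, it goes to sleep. A writer writes whenever the buffer has free space and wakes up any reader that may be waiting. If there is no space, it sleeps. The two functions are analyzed below.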
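Before the kernel code, the ring buffer idea can be shown in a small user-space sketch. It is an illustration only: the 8-byte buffer stands in for the one-page pipe buffer, the names `struct ring`, `ring_write` and `ring_read` are hypothetical, and instead of sleeping the functions simply return a short count when the buffer is full or empty. The wrap-around uses the same power-of-two mask idea as `PIPE_START(*inode) &= (PIPE_SIZE - 1)` in the kernel code.

```c
#include <stddef.h>

enum { RING_SIZE = 8 };                 /* must be a power of two */

struct ring {
    char base[RING_SIZE];
    size_t start;                       /* index of the first unread byte */
    size_t len;                         /* number of unread bytes */
};

/* Stores up to n bytes from src; returns how many were stored.
 * A real pipe writer would sleep instead of returning early when full. */
static size_t ring_write(struct ring *r, const char *src, size_t n)
{
    size_t done = 0;

    while (done < n && r->len < RING_SIZE) {
        size_t end = (r->start + r->len) & (RING_SIZE - 1);
        r->base[end] = src[done++];
        r->len++;
    }
    return done;
}

/* Fetches up to n bytes into dst; returns how many were fetched.
 * A real pipe reader would sleep instead of returning 0 when empty. */
static size_t ring_read(struct ring *r, char *dst, size_t n)
{
    size_t done = 0;

    while (done < n && r->len > 0) {
        dst[done++] = r->base[r->start];
        r->start = (r->start + 1) & (RING_SIZE - 1);
        r->len--;
    }
    return done;
}
```

The kernel copies in contiguous chunks rather than byte by byte. The byte loop is used here only to keep the wrap-around logic easy to follow. The kernel's own pipe_read and pipe_write follow.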
<span class="kw4">static</span> ssize_t pipe_read<span class="br0">(</span> <span class="kw4">struct</span> file <span class="sy0">*</span> filp<span class="sy0">,</span> <span class="kw4">char</span> __user <span class="sy0">*</span> buf<span class="sy0">,</span> size_t count<span class="sy0">,</span> loff_t <span class="sy0">*</span> ppos<span class="br0">)</span> <span class="br0">{</span> <span class="kw4">struct</span> iovec iov <span class="sy0">=</span> <span class="br0">{</span> .<span class="me1">iov_base</span> <span class="sy0">=</span> buf<span class="sy0">,</span> .<span class="me1">iov_len</span> <span class="sy0">=</span> count <span class="br0">}</span> <span class="sy0">;</span> <span class="kw1">return</span> pipe_readv<span class="br0">(</span> filp<span class="sy0">,</span> <span class="sy0">&</span> iov<span class="sy0">,</span> <span class="nu0">1</span> <span class="sy0">,</span> ppos<span class="br0">)</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="kw4">static</span> ssize_t pipe_readv<span class="br0">(</span> <span class="kw4">struct</span> file <span class="sy0">*</span> filp<span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">struct</span> iovec <span class="sy0">*</span> _iov<span class="sy0">,</span> <span class="kw4">unsigned</span> <span class="kw4">long</span> nr_segs<span class="sy0">,</span> loff_t <span class="sy0">*</span> ppos<span class="br0">)</span> <span class="br0">{</span> <span class="kw4">struct</span> inode <span class="sy0">*</span> inode <span class="sy0">=</span> filp<span class="sy0">-></span> f_dentry<span class="sy0">-></span> d_inode<span class="sy0">;</span> <span class="kw4">int</span> do_wakeup<span class="sy0">;</span> ssize_t ret<span class="sy0">;</span> <span class="kw4">struct</span> iovec <span class="sy0">*</span> iov <span class="sy0">=</span> <span class="br0">(</span> <span class="kw4">struct</span> iovec <span class="sy0">*</span> <span 
class="br0">)</span> _iov<span class="sy0">;</span> size_t total_len<span class="sy0">;</span> total_len <span class="sy0">=</span> iov_length<span class="br0">(</span> iov<span class="sy0">,</span> nr_segs<span class="br0">)</span> <span class="sy0">;</span> <span class="co1">// A zero-length read succeeds trivially</span> <span class="kw1">if</span> <span class="br0">(</span> unlikely<span class="br0">(</span> total_len <span class="sy0">==</span> <span class="nu0">0</span> <span class="br0">)</span> <span class="br0">)</span> <span class="kw1">return</span> <span class="nu0">0</span> <span class="sy0">;</span> do_wakeup <span class="sy0">=</span> <span class="nu0">0</span> <span class="sy0">;</span> ret <span class="sy0">=</span> <span class="nu0">0</span> <span class="sy0">;</span> down<span class="br0">(</span> PIPE_SEM<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">for</span> <span class="br0">(</span> <span class="sy0">;;</span> <span class="br0">)</span> <span class="br0">{</span> <span class="kw4">int</span> size <span class="sy0">=</span> PIPE_LEN<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> size<span class="br0">)</span> <span class="br0">{</span> <span class="co1">// start of the readable data</span> <span class="kw4">char</span> <span class="sy0">*</span> pipebuf <span class="sy0">=</span> PIPE_BASE<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">+</span> PIPE_START<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">;</span> <span class="co1">// bytes up to the end of the buffer, i.e. PAGE_SIZE - PIPE_START(*inode)</span> ssize_t chars <span class="sy0">=</span> PIPE_MAX_RCHUNK<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">;</span> <span 
class="kw1">if</span> <span class="br0">(</span> chars <span class="sy0">></span> total_len<span class="br0">)</span> chars <span class="sy0">=</span> total_len<span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> chars <span class="sy0">></span> size<span class="br0">)</span> chars <span class="sy0">=</span> size<span class="sy0">;</span> <span class="co1">// copy from pipebuf to the user-space iov</span> <span class="kw1">if</span> <span class="br0">(</span> pipe_iov_copy_to_user<span class="br0">(</span> iov<span class="sy0">,</span> pipebuf<span class="sy0">,</span> chars<span class="br0">)</span> <span class="br0">)</span> <span class="br0">{</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> ret<span class="br0">)</span> ret <span class="sy0">=</span> <span class="sy0">-</span> EFAULT<span class="sy0">;</span> <span class="kw2">break</span> <span class="sy0">;</span> <span class="br0">}</span> ret <span class="sy0">+=</span> chars<span class="sy0">;</span> <span class="co1">// advance the read position for the next chunk</span> PIPE_START<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">+=</span> chars<span class="sy0">;</span> PIPE_START<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">&=</span> <span class="br0">(</span> PIPE_SIZE <span class="sy0">-</span> <span class="nu0">1</span> <span class="br0">)</span> <span class="sy0">;</span> PIPE_LEN<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">-=</span> chars<span class="sy0">;</span> total_len <span class="sy0">-=</span> chars<span class="sy0">;</span> do_wakeup <span class="sy0">=</span> <span class="nu0">1</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> total_len<span class="br0">)</span> <span class="kw2">break</span> <span class="sy0">;</span> <span 
class="coMULTI">/* common path: read succeeded */</span> <span class="br0">}</span> <span class="kw1">if</span> <span class="br0">(</span> PIPE_LEN<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="coMULTI">/* test for cyclic buffers */</span> <span class="kw1">continue</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> PIPE_WRITERS<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="co1">// no writers are left: stop</span> <span class="kw2">break</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> PIPE_WAITING_WRITERS<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="br0">{</span> <span class="co1">// no writer is waiting</span> <span class="co1">// With O_NONBLOCK set, or once some data has been read, we must not sleep.</span> <span class="co1">// But if a writer is sleeping in kernel space, we can wait for its data.</span> <span class="kw1">if</span> <span class="br0">(</span> ret<span class="br0">)</span> <span class="kw2">break</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> filp<span class="sy0">-></span> f_flags <span class="sy0">&</span> O_NONBLOCK<span class="br0">)</span> <span class="br0">{</span> ret <span class="sy0">=</span> <span class="sy0">-</span> EAGAIN<span class="sy0">;</span> <span class="kw2">break</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="br0">}</span> <span class="kw1">if</span> <span class="br0">(</span> signal_pending<span class="br0">(</span> current<span class="br0">)</span> <span class="br0">)</span> <span class="br0">{</span> <span class="co1">// a signal is pending for the current process</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> ret<span class="br0">)</span> ret <span class="sy0">=</span> <span class="sy0">-</span> ERESTARTSYS<span 
class="sy0">;</span> <span class="kw2">break</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="kw1">if</span> <span class="br0">(</span> do_wakeup<span class="br0">)</span> <span class="br0">{</span> <span class="co1">// wake up waiting processes</span> wake_up_interruptible_sync<span class="br0">(</span> PIPE_WAIT<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="sy0">;</span> kill_fasync<span class="br0">(</span> PIPE_FASYNC_WRITERS<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">,</span> SIGIO<span class="sy0">,</span> POLL_OUT<span class="br0">)</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="co1">// sleep on the wait queue</span> pipe_wait<span class="br0">(</span> inode<span class="br0">)</span> <span class="sy0">;</span> <span class="br0">}</span> up<span class="br0">(</span> PIPE_SEM<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="sy0">;</span> <span class="co1">// signal asynchronous writers that there is more room </span> <span class="kw1">if</span> <span class="br0">(</span> do_wakeup<span class="br0">)</span> <span class="br0">{</span> <span class="co1">// wake up waiting processes</span> wake_up_interruptible<span class="br0">(</span> PIPE_WAIT<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="sy0">;</span> kill_fasync<span class="br0">(</span> PIPE_FASYNC_WRITERS<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">,</span> SIGIO<span class="sy0">,</span> POLL_OUT<span class="br0">)</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="kw1">if</span> <span class="br0">(</span> ret <span class="sy0">></span> <span class="nu0">0</span> <span class="br0">)</span> file_accessed<span class="br0">(</span> filp<span class="br0">)</span> <span 
class="sy0">;</span> <span class="co1">// update the access time</span> <span class="kw1">return</span> ret<span class="sy0">;</span> <span class="br0">}</span> <span class="kw4">static</span> ssize_t pipe_write<span class="br0">(</span> <span class="kw4">struct</span> file <span class="sy0">*</span> filp<span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">char</span> __user <span class="sy0">*</span> buf<span class="sy0">,</span> size_t count<span class="sy0">,</span> loff_t <span class="sy0">*</span> ppos<span class="br0">)</span> <span class="br0">{</span> <span class="co1">// user-space buffer</span> <span class="kw4">struct</span> iovec iov <span class="sy0">=</span> <span class="br0">{</span> .<span class="me1">iov_base</span> <span class="sy0">=</span> <span class="br0">(</span> <span class="kw4">void</span> __user <span class="sy0">*</span> <span class="br0">)</span> buf<span class="sy0">,</span> .<span class="me1">iov_len</span> <span class="sy0">=</span> count <span class="br0">}</span> <span class="sy0">;</span> <span class="kw1">return</span> pipe_writev<span class="br0">(</span> filp<span class="sy0">,</span> <span class="sy0">&</span> iov<span class="sy0">,</span> <span class="nu0">1</span> <span class="sy0">,</span> ppos<span class="br0">)</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="kw4">static</span> ssize_t pipe_writev<span class="br0">(</span> <span class="kw4">struct</span> file <span class="sy0">*</span> filp<span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">struct</span> iovec <span class="sy0">*</span> _iov<span class="sy0">,</span> <span class="kw4">unsigned</span> <span class="kw4">long</span> nr_segs<span class="sy0">,</span> loff_t <span class="sy0">*</span> ppos<span class="br0">)</span> <span class="br0">{</span> <span class="kw4">struct</span> inode <span class="sy0">*</span> inode <span class="sy0">=</span> filp<span class="sy0">-></span> f_dentry<span class="sy0">-></span> d_inode<span 
class="sy0">;</span> ssize_t ret<span class="sy0">;</span> size_t min<span class="sy0">;</span> <span class="kw4">int</span> do_wakeup<span class="sy0">;</span> <span class="kw4">struct</span> iovec <span class="sy0">*</span> iov <span class="sy0">=</span> <span class="br0">(</span> <span class="kw4">struct</span> iovec <span class="sy0">*</span> <span class="br0">)</span> _iov<span class="sy0">;</span> size_t total_len<span class="sy0">;</span> total_len <span class="sy0">=</span> iov_length<span class="br0">(</span> iov<span class="sy0">,</span> nr_segs<span class="br0">)</span> <span class="sy0">;</span> <span class="co1">// A zero-length write succeeds trivially</span> <span class="kw1">if</span> <span class="br0">(</span> unlikely<span class="br0">(</span> total_len <span class="sy0">==</span> <span class="nu0">0</span> <span class="br0">)</span> <span class="br0">)</span> <span class="kw1">return</span> <span class="nu0">0</span> <span class="sy0">;</span> do_wakeup <span class="sy0">=</span> <span class="nu0">0</span> <span class="sy0">;</span> ret <span class="sy0">=</span> <span class="nu0">0</span> <span class="sy0">;</span> min <span class="sy0">=</span> total_len<span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> min <span class="sy0">></span> PIPE_BUF<span class="br0">)</span> min <span class="sy0">=</span> <span class="nu0">1</span> <span class="sy0">;</span> down<span class="br0">(</span> PIPE_SEM<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">for</span> <span class="br0">(</span> <span class="sy0">;;</span> <span class="br0">)</span> <span class="br0">{</span> <span class="kw4">int</span> free<span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> PIPE_READERS<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span 
class="br0">{</span> <span class="co1">// no readers are left</span> <span class="co1">// send SIGPIPE to the current process; the last argument is 0</span> send_sig<span class="br0">(</span> SIGPIPE<span class="sy0">,</span> current<span class="sy0">,</span> <span class="nu0">0</span> <span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> ret<span class="br0">)</span> ret <span class="sy0">=</span> <span class="sy0">-</span> EPIPE<span class="sy0">;</span> <span class="kw2">break</span> <span class="sy0">;</span> <span class="br0">}</span> free <span class="sy0">=</span> PIPE_FREE<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">;</span> <span class="co1">// free space in the buffer</span> <span class="kw1">if</span> <span class="br0">(</span> free <span class="sy0">>=</span> min<span class="br0">)</span> <span class="br0">{</span> <span class="co1">// write into the ring buffer</span> ssize_t chars <span class="sy0">=</span> PIPE_MAX_WCHUNK<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">;</span> <span class="kw4">char</span> <span class="sy0">*</span> pipebuf <span class="sy0">=</span> PIPE_BASE<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">+</span> PIPE_END<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">;</span> <span class="co1">// Always wake up, even if the copy fails; otherwise readers that sleep due to syscall merging would be locked up </span> do_wakeup <span class="sy0">=</span> <span class="nu0">1</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> chars <span class="sy0">></span> total_len<span class="br0">)</span> chars <span class="sy0">=</span> total_len<span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> chars <span class="sy0">></span> free<span class="br0">)</span> chars <span class="sy0">=</span> free<span class="sy0">;</span> <span 
class="co1">//从用户空间的iov中拷贝数据到pipebuf中。</span> <span class="kw1">if</span> <span class="br0">(</span> pipe_iov_copy_from_user<span class="br0">(</span> pipebuf<span class="sy0">,</span> iov<span class="sy0">,</span> chars<span class="br0">)</span> <span class="br0">)</span> <span class="br0">{</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> ret<span class="br0">)</span> ret <span class="sy0">=</span> <span class="sy0">-</span> EFAULT<span class="sy0">;</span> <span class="kw2">break</span> <span class="sy0">;</span> <span class="br0">}</span> ret <span class="sy0">+=</span> chars<span class="sy0">;</span> PIPE_LEN<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">+=</span> chars<span class="sy0">;</span> total_len <span class="sy0">-=</span> chars<span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> total_len<span class="br0">)</span> <span class="kw2">break</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="kw1">if</span> <span class="br0">(</span> PIPE_FREE<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">&&</span> ret<span class="br0">)</span> <span class="br0">{</span> <span class="co1">//处理环形缓冲区</span> min <span class="sy0">=</span> <span class="nu0">1</span> <span class="sy0">;</span> <span class="kw1">continue</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="kw1">if</span> <span class="br0">(</span> filp<span class="sy0">-></span> f_flags <span class="sy0">&</span> O_NONBLOCK<span class="br0">)</span> <span class="br0">{</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> ret<span class="br0">)</span> ret <span class="sy0">=</span> <span class="sy0">-</span> EAGAIN<span class="sy0">;</span> <span class="kw2">break</span> <span class="sy0">;</span> <span class="br0">}</span> 
<span class="kw1">if</span> <span class="br0">(</span> signal_pending<span class="br0">(</span> current<span class="br0">)</span> <span class="br0">)</span> <span class="br0">{</span> <span class="co1">//挂起当前进程</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> ret<span class="br0">)</span> ret <span class="sy0">=</span> <span class="sy0">-</span> ERESTARTSYS<span class="sy0">;</span> <span class="kw2">break</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="kw1">if</span> <span class="br0">(</span> do_wakeup<span class="br0">)</span> <span class="br0">{</span> <span class="co1">//唤醒等待的进程</span> wake_up_interruptible_sync<span class="br0">(</span> PIPE_WAIT<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="sy0">;</span> kill_fasync<span class="br0">(</span> PIPE_FASYNC_READERS<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">,</span> SIGIO<span class="sy0">,</span> POLL_IN<span class="br0">)</span> <span class="sy0">;</span> do_wakeup <span class="sy0">=</span> <span class="nu0">0</span> <span class="sy0">;</span> <span class="br0">}</span> PIPE_WAITING_WRITERS<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">++;</span> <span class="co1">//等待的写者增加</span> pipe_wait<span class="br0">(</span> inode<span class="br0">)</span> <span class="sy0">;</span> <span class="co1">//调度等待队列</span> PIPE_WAITING_WRITERS<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">--;</span> <span class="br0">}</span> up<span class="br0">(</span> PIPE_SEM<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> do_wakeup<span class="br0">)</span> <span 
class="br0">{</span> <span class="co1">//唤醒等待的进程</span> wake_up_interruptible<span class="br0">(</span> PIPE_WAIT<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="br0">)</span> <span class="sy0">;</span> kill_fasync<span class="br0">(</span> PIPE_FASYNC_READERS<span class="br0">(</span> <span class="sy0">*</span> inode<span class="br0">)</span> <span class="sy0">,</span> SIGIO<span class="sy0">,</span> POLL_IN<span class="br0">)</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="co1">//节点时间更新</span> <span class="kw1">if</span> <span class="br0">(</span> ret <span class="sy0">></span> <span class="nu0">0</span> <span class="br0">)</span> inode_update_time<span class="br0">(</span> inode<span class="sy0">,</span> <span class="nu0">1</span> <span class="br0">)</span> <span class="sy0">;</span> <span class="coMULTI">/* mtime and ctime */</span> <span class="kw1">return</span> ret<span class="sy0">;</span> <span class="br0">}</span>
The function pipe_wait releases the pipe semaphore and waits for one pipe event. It queues the current process on the pipe's wait queue in the TASK_INTERRUPTIBLE state, drops the semaphore, and calls schedule() to give up the CPU. Because the process is queued before the semaphore is released, a wakeup that arrives in between is not lost. Once woken, the process removes itself from the wait queue and re-acquires the semaphore. The function is listed below:
void pipe_wait(struct inode *inode)
{
    DEFINE_WAIT(wait);
    // add wait to the inode's wait queue and set the task state to TASK_INTERRUPTIBLE
    prepare_to_wait(PIPE_WAIT(*inode), &wait, TASK_INTERRUPTIBLE);
    up(PIPE_SEM(*inode));   // release the pipe semaphore
    schedule();             // give up the CPU until woken
    // set the task back to the running state and remove wait from the queue
    finish_wait(PIPE_WAIT(*inode), &wait);
    down(PIPE_SEM(*inode)); // re-acquire the pipe semaphore
}
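Message Queues
A message queue is a linked list of messages. One or more processes with the appropriate permissions can read from and write to the queue.
Message queues are available through both POSIX and System V system calls. The POSIX calls are sys_mq_open, sys_mq_unlink, sys_mq_timedsend, sys_mq_timedreceive, sys_mq_notify, and sys_mq_getsetattr. The System V calls are sys_msgget, sys_msgsnd, sys_msgrcv, and sys_msgctl.
POSIX message queues are implemented on top of a dedicated message queue file system in which each file represents one queue. The file's inode structure is embedded in a larger message queue information structure, and that structure holds the queue's messages.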
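System V message queues are implemented by building the queue structures in kernel memory. A queue is located by its message queue ID through the global variable static struct ipc_ids msg_ids, and the messages are then found in that queue. Through msg_ids the kernel reaches the first member of each queue header, struct kern_ipc_perm. Each struct kern_ipc_perm can be matched to a specific queue because it contains a member key of type key_t, and a non-private key uniquely identifies one message queue. Figure 5 shows the relationships among the System V message queue data structures.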
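Figure 5: Relationships among the System V message queue data structures
Because the System V message queue implementation resembles the other System V IPC mechanisms, only the POSIX message queue is analyzed here.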
Message Queue Structures
Each message queue in the system is described by a msg_queue structure, listed below (in include/linux/msg.h):
struct msg_queue {
    struct kern_ipc_perm q_perm;
    time_t q_stime;                 // time of the last send
    time_t q_rtime;                 // time of the last receive
    time_t q_ctime;                 // time of the last change
    unsigned long q_cbytes;         // current number of bytes on the queue
    unsigned long q_qnum;           // number of messages on the queue
    unsigned long q_qbytes;         // maximum number of bytes on the queue
    pid_t q_lspid;                  // pid of the last sender
    pid_t q_lrpid;                  // pid of the last receiver
    struct list_head q_messages;    // list of messages
    struct list_head q_receivers;   // list of sleeping receivers
    struct list_head q_senders;     // list of sleeping senders
};
Each message is described by a msg_msg structure, listed below:
struct msg_msg {
    struct list_head m_list;
    long m_type;                // message type
    int m_ts;                   // size of the message text
    struct msg_msgseg *next;    // next segment of this message
};
Each sleeping receiver is described by a msg_receiver structure, listed below:
struct msg_receiver {
    struct list_head r_list;
    struct task_struct *r_tsk;          // process performing the read
    int r_mode;                         // receive mode
    long r_msgtype;                     // message type to receive
    long r_maxsize;                     // maximum size of the message to receive
    struct msg_msg *volatile r_msg;     // the message
};
Each sleeping sender is described by a msg_sender structure, listed below:
struct msg_sender {
    struct list_head list;
    struct task_struct *tsk;    // process sending the message
};
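The Message Queue File System
POSIX message queues use a special message queue file system as their interface to user space. This section analyzes that file system (in ipc/mqueue.c).
The function init_mqueue_fs initializes the message queue file system: it registers the file system type and mounts it inside the kernel. The function is analyzed below: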
static struct inode_operations mqueue_dir_inode_operations;
static struct file_operations mqueue_file_operations;
static struct super_operations mqueue_super_ops;
static kmem_cache_t *mqueue_inode_cachep;
static struct ctl_table_header *mq_sysctl_table;
static int __init init_mqueue_fs(void)
{
    int error;
    // create the slab cache from which mqueue_inode_info objects are allocated
    mqueue_inode_cachep = kmem_cache_create("mqueue_inode_cache",
        sizeof(struct mqueue_inode_info), 0, SLAB_HWCACHE_ALIGN, init_once, NULL);
    if (mqueue_inode_cachep == NULL)
        return -ENOMEM;
    // register with sysctl, i.e. append mq_sysctl_root to the end of the sysctl table
    mq_sysctl_table = register_sysctl_table(mq_sysctl_root, 0);
    if (!mq_sysctl_table) {
        error = -ENOMEM;
        goto out_cache;
    }
    // register the file system type mqueue_fs_type
    error = register_filesystem(&mqueue_fs_type);
    if (error)
        goto out_sysctl;
    // mount the file system inside the kernel
    if (IS_ERR(mqueue_mnt = kern_mount(&mqueue_fs_type))) {
        error = PTR_ERR(mqueue_mnt);
        goto out_filesystem;
    }
    // internal initialization, not needed by the generic VFS
    queues_count = 0;
    spin_lock_init(&mq_lock);
    return 0;
    ……
}
static struct file_system_type mqueue_fs_type = {
    .name = "mqueue",
    .get_sb = mqueue_get_sb,
    .kill_sb = kill_litter_super,
};
The get_sb operation of mqueue_fs_type, mqueue_get_sb, obtains the superblock:
static struct super_block *mqueue_get_sb(struct file_system_type *fs_type,
                                         int flags, const char *dev_name, void *data)
{
    // create the superblock, fill it in with mqueue_fill_super, then mount the file system
    return get_sb_single(fs_type, flags, data, mqueue_fill_super);
}
The function mqueue_fill_super initializes the superblock: it sets the superblock operations and allocates the root inode and the root directory entry. The function is listed below:
static int mqueue_fill_super(struct super_block *sb, void *data, int silent)
{
    struct inode *inode;
    sb->s_blocksize = PAGE_CACHE_SIZE;
    sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
    sb->s_magic = MQUEUE_MAGIC;
    sb->s_op = &mqueue_super_ops; // superblock operations
    // create the root inode
    inode = mqueue_get_inode(sb, S_IFDIR | S_ISVTX | S_IRWXUGO, NULL);
    if (!inode)
        return -ENOMEM;
    // allocate the dentry for the root directory
    sb->s_root = d_alloc_root(inode);
    if (!sb->s_root) {
        iput(inode);
        return -ENOMEM;
    }
    return 0;
}
The function mqueue_get_inode creates and initializes an inode. For a regular file (a message queue) it also initializes the queue information and charges the queue's memory to the creating user. The function is analyzed below:
static struct inode *mqueue_get_inode(struct super_block *sb, int mode,
                                      struct mq_attr *attr)
{
    struct inode *inode;
    // allocate the inode structure, assign a new inode number, and add it to the inode list
    inode = new_inode(sb);
    if (inode) {
        // fill in the times and the uid/gid inherited from the current process
        inode->i_mode = mode;
        inode->i_uid = current->fsuid;
        inode->i_gid = current->fsgid;
        inode->i_blksize = PAGE_CACHE_SIZE;
        inode->i_blocks = 0;
        inode->i_mtime = inode->i_ctime = inode->i_atime = CURRENT_TIME;
        if (S_ISREG(mode)) { // regular file, i.e. a message queue
            struct mqueue_inode_info *info;
            struct task_struct *p = current;
            struct user_struct *u = p->user; // per-uid user information
            unsigned long mq_bytes, mq_msg_tblsz;
            inode->i_fop = &mqueue_file_operations; // file operations
            inode->i_size = FILENT_SIZE; // 80 bytes
            // queue-specific information:
            // get the mqueue_inode_info structure that contains this inode
            info = MQUEUE_I(inode);
            spin_lock_init(&info->lock);
            // initialize the wait queues
            init_waitqueue_head(&info->wait_q);
            INIT_LIST_HEAD(&info->e_wait_q[0].list);
            INIT_LIST_HEAD(&info->e_wait_q[1].list);
            // initialize the mqueue_inode_info fields
            info->messages = NULL;
            info->notify_owner = 0;
            info->qsize = 0;
            info->user = NULL; /* set when all is ok */
            memset(&info->attr, 0, sizeof(info->attr));
            info->attr.mq_maxmsg = DFLT_MSGMAX;
            info->attr.mq_msgsize = DFLT_MSGSIZEMAX;
            if (attr) {
                info->attr.mq_maxmsg = attr->mq_maxmsg;
                info->attr.mq_msgsize = attr->mq_msgsize;
            }
            // size in bytes of the message pointer table (one pointer per message)
            mq_msg_tblsz = info->attr.mq_maxmsg * sizeof(struct msg_msg *);
            // total number of bytes the queue may occupy
            mq_bytes = (mq_msg_tblsz +
                (info->attr.mq_maxmsg * info->attr.mq_msgsize));
            spin_lock(&mq_lock);
            // reject on arithmetic overflow, or if the user's total would exceed RLIMIT_MSGQUEUE
            if (u->mq_bytes + mq_bytes < u->mq_bytes ||
                u->mq_bytes + mq_bytes >
                p->rlim[RLIMIT_MSGQUEUE].rlim_cur) {
                spin_unlock(&mq_lock);
                goto out_inode;
            }
            // charge the bytes to the user's user_struct
            u->mq_bytes += mq_bytes;
            spin_unlock(&mq_lock);
            // allocate the message pointer table
            info->messages = kmalloc(mq_msg_tblsz, GFP_KERNEL);
            if (!info->messages) { // allocation failed: undo the charge
                spin_lock(&mq_lock);
                u->mq_bytes -= mq_bytes;
                spin_unlock(&mq_lock);
                goto out_inode;
            }
            /* all is ok */
            info->user = get_uid(u);
        } else if (S_ISDIR(mode)) { // directory
            inode->i_nlink++;
            // set the directory size and the inode and file operations
            inode->i_size = 2 * DIRENT_SIZE;
            inode->i_op = &mqueue_dir_inode_operations;
            inode->i_fop = &simple_dir_operations;
        }
    }
    return inode;
out_inode:
    make_bad_inode(inode);
    iput(inode);
    return NULL;
}
The following structures are specific to the message queue file system. The structure mqueue_inode_info holds the queue-specific information for an inode; given a pointer to its vfs_inode member, the enclosing mqueue_inode_info structure can be recovered. The structure is listed below:
struct mqueue_inode_info {
    spinlock_t lock;
    struct inode vfs_inode;             // VFS inode
    wait_queue_head_t wait_q;
    struct msg_msg **messages;          // array of message pointers
    struct mq_attr attr;                // queue attributes
    struct sigevent notify;             // notification event
    pid_t notify_owner;                 // pid of the process to notify
    struct user_struct *user;           // user who created the queue
    struct sock *notify_sock;
    struct sk_buff *notify_cookie;
    struct ext_wait_queue e_wait_q[2];  // tasks waiting for free space and for messages, respectively
    unsigned long qsize;                // size of the queue in memory: the sum of all messages
};
The structure mq_attr records the attributes of a message queue and is listed below (in include/linux/mqueue.h):
struct mq_attr {
    long mq_flags;          // message queue flags
    long mq_maxmsg;         // maximum number of messages
    long mq_msgsize;        // maximum message size
    long mq_curmsgs;        // number of messages currently queued
    long __reserved[4];     // reserved, set to 0
};
Of the two structures below, mqueue_dir_inode_operations holds the directory inode operations and mqueue_file_operations holds the file operations.
static struct inode_operations mqueue_dir_inode_operations = {
    .lookup = simple_lookup,    // directory lookup
    .create = mqueue_create,    // create a message queue; analyzed in the next section
    .unlink = mqueue_unlink,
};
static struct file_operations mqueue_file_operations = {
    .flush = mqueue_flush_file,
    .poll = mqueue_poll_file,
    .read = mqueue_read_file,
};
消息队列系统调用函数
函数sys_mq_open打开一个消息队列,创建一个消息队列或从文件系统中找到消息队列名对应的文件的file结构。函数分析如下:
asmlinkage long sys_mq_open(const char __user *u_name, int oflag, mode_t mode,
                            struct mq_attr __user *u_attr)
{
    struct dentry *dentry;
    struct file *filp;
    char *name;
    int fd, error;

    if (IS_ERR(name = getname(u_name)))
        return PTR_ERR(name);
    // get an unused file descriptor
    fd = get_unused_fd();
    if (fd < 0)
        goto out_putname;
    down(&mqueue_mnt->mnt_root->d_inode->i_sem);
    // look up the dentry for name (hash lookup keyed on the name)
    dentry = lookup_one_len(name, mqueue_mnt->mnt_root, strlen(name));
    if (IS_ERR(dentry)) {
        error = PTR_ERR(dentry);
        goto out_err;
    }
    mntget(mqueue_mnt);
    if (oflag & O_CREAT) {
        if (dentry->d_inode) {
            // the entry already exists: fail with O_EXCL, otherwise open the dentry
            filp = (oflag & O_EXCL) ? ERR_PTR(-EEXIST) : do_open(dentry, oflag);
        } else {
            // create the dentry, i.e. create a new queue
            filp = do_create(mqueue_mnt->mnt_root, dentry, oflag, mode, u_attr);
        }
    } else
        // get the file structure of the file that corresponds to the queue name
        filp = (dentry->d_inode) ? do_open(dentry, oflag) : ERR_PTR(-ENOENT);
    dput(dentry);
    if (IS_ERR(filp)) {
        error = PTR_ERR(filp);
        goto out_putfd;
    }
    set_close_on_exec(fd, 1); // set fd in files->close_on_exec
    fd_install(fd, filp); // install the file pointer in the fd array, i.e. current->files->fd[fd] = filp
    goto out_upsem;
    ……
}
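The O_CREAT / O_EXCL branching in sys_mq_open can be read as a small decision table. The following is a minimal user-space sketch of that table only, assuming a boolean `exists` in place of the `dentry->d_inode` test; the enum values and the function name `mq_open_decide` are hypothetical and do not appear in the kernel.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical outcome codes for this sketch; not kernel names. */
enum mq_open_action { MQ_ACT_OPEN = 1, MQ_ACT_CREATE = 2 };

/*
 * Given whether the queue already exists and the caller's oflag, return
 * MQ_ACT_OPEN, MQ_ACT_CREATE, or a negative errno value, following the
 * same branches as the listing above.
 */
int mq_open_decide(int exists, int oflag)
{
    if (oflag & O_CREAT) {
        if (exists)
            return (oflag & O_EXCL) ? -EEXIST : MQ_ACT_OPEN;
        return MQ_ACT_CREATE;          /* corresponds to do_create() */
    }
    return exists ? MQ_ACT_OPEN : -ENOENT;
}
```

The sketch leaves out locking, the dentry lookup and file descriptor handling; it only shows which of do_open, do_create or an error is chosen.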
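The function do_create creates a new queue. It calls the inode operation create to do the work, which here resolves to the function mqueue_create. The function do_create is analyzed below: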
static struct file *do_create(struct dentry *dir, struct dentry *dentry,
                              int oflag, mode_t mode, struct mq_attr __user *u_attr)
{
    struct file *filp;
    struct mq_attr attr;
    int ret;

    if (u_attr != NULL) {
        if (copy_from_user(&attr, u_attr, sizeof(attr)))
            return ERR_PTR(-EFAULT);
        if (!mq_attr_ok(&attr))
            return ERR_PTR(-EINVAL);
        // stash the attributes so they can be used during creation
        dentry->d_fsdata = &attr;
    }
    // call the create inode operation of dir to create the queue's inode, i.e. mqueue_create
    ret = vfs_create(dir->d_inode, dentry, mode, NULL);
    dentry->d_fsdata = NULL;
    if (ret)
        return ERR_PTR(ret);
    // open the dentry to obtain the file structure filp
    filp = dentry_open(dentry, mqueue_mnt, oflag);
    if (!IS_ERR(filp))
        dget(dentry); // increment dentry->d_count
    return filp;
}
The function mqueue_create does the actual work of creating a message queue. It is listed below:
static int mqueue_create(struct inode *dir, struct dentry *dentry,
                         int mode, struct nameidata *nd)
{
    struct inode *inode;
    struct mq_attr *attr = dentry->d_fsdata;
    int error;

    spin_lock(&mq_lock);
    if (queues_count >= queues_max && !capable(CAP_SYS_RESOURCE)) {
        error = -ENOSPC;
        goto out_lock;
    }
    queues_count++; // count of message queues
    spin_unlock(&mq_lock);
    // create the inode and fill in the queue's information structure
    inode = mqueue_get_inode(dir->i_sb, mode, attr);
    if (!inode) {
        error = -ENOMEM;
        spin_lock(&mq_lock);
        queues_count--;
        goto out_lock;
    }
    dir->i_size += DIRENT_SIZE;
    // update the timestamps of dir
    dir->i_ctime = dir->i_mtime = dir->i_atime = CURRENT_TIME;
    // attach the inode to the dentry
    d_instantiate(dentry, inode);
    dget(dentry); // increment dentry->d_count
    return 0;
out_lock:
    spin_unlock(&mq_lock);
    return error;
}
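mqueue_create reserves a slot in queues_count before the slower inode allocation and gives it back if the allocation fails. A minimal sketch of that reserve-then-roll-back accounting is shown here, with the locking omitted and with the capability check and the allocation result passed in as plain flags; `struct mq_limits` and `mq_reserve_and_create` are hypothetical names used only for illustration.

```c
#include <errno.h>

/* Hypothetical user-space stand-ins for queues_count and queues_max. */
struct mq_limits {
    int queues_count;
    int queues_max;
};

/*
 * Reserve a queue slot, run the "allocation" step, and undo the
 * reservation if that step fails. privileged stands in for
 * capable(CAP_SYS_RESOURCE); alloc_ok stands in for a successful
 * mqueue_get_inode(). The kernel holds mq_lock around each counter
 * update, which this single-threaded sketch leaves out.
 */
int mq_reserve_and_create(struct mq_limits *lim, int privileged, int alloc_ok)
{
    if (lim->queues_count >= lim->queues_max && !privileged)
        return -ENOSPC;
    lim->queues_count++;            /* reserve before the slow step */
    if (!alloc_ok) {
        lim->queues_count--;        /* roll the reservation back */
        return -ENOMEM;
    }
    return 0;
}
```

Reserving first keeps the limit check and the increment together under one lock acquisition, so two concurrent creators cannot both pass the check for the last free slot.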
The system call sys_mq_timedsend is what a process uses to send a message to a message queue. The function is analyzed below:
asmlinkage long sys_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
    size_t msg_len, unsigned int msg_prio,
    const struct timespec __user *u_abs_timeout)
{
    struct file *filp;
    struct inode *inode;
    struct ext_wait_queue wait;
    struct ext_wait_queue *receiver;
    struct msg_msg *msg_ptr;
    struct mqueue_inode_info *info;
    long timeout;
    int ret;

    if (unlikely(msg_prio >= (unsigned long) MQ_PRIO_MAX))
        return -EINVAL;
    // copy the user-space timeout into the kernel and convert it to jiffies
    timeout = prepare_timeout(u_abs_timeout);
    ret = -EBADF;
    filp = fget(mqdes); // get the file structure from the file descriptor
    if (unlikely(!filp))
        goto out;
    inode = filp->f_dentry->d_inode; // get the inode
    ……
    // allocate memory and copy the user-space message into msg_ptr
    // (a msg_msg header followed by the message body)
    msg_ptr = load_msg(u_msg_ptr, msg_len);
    if (IS_ERR(msg_ptr)) {
        ret = PTR_ERR(msg_ptr);
        goto out_fput;
    }
    msg_ptr->m_ts = msg_len;
    msg_ptr->m_type = msg_prio;
    spin_lock(&info->lock);
    // the queue is full: either fail or sleep until there is room
    if (info->attr.mq_curmsgs == info->attr.mq_maxmsg) {
        if (filp->f_flags & O_NONBLOCK) { // blocking is not allowed: return
            spin_unlock(&info->lock);
            ret = -EAGAIN;
        } else if (unlikely(timeout < 0)) { // prepare_timeout reported an error
            spin_unlock(&info->lock);
            ret = timeout;
        } else { // block for a while, then send
            wait.task = current;
            wait.msg = (void *) msg_ptr;
            wait.state = STATE_NONE;
            // add wait to info->e_wait_q[SEND] ahead of the entries with lower priority,
            // then call schedule_timeout to sleep until woken or until the timeout expires
            ret = wq_sleep(info, SEND, timeout, &wait);
        }
        if (ret < 0)
            free_msg(msg_ptr); // free the message
    } else { // send the message
        // get the first waiting ext_wait_queue structure from the receive wait queue
        receiver = wq_get_first_waiter(info, RECV);
        if (receiver) { // a receiver is waiting: hand it the message and wake it up
            pipelined_send(info, msg_ptr, receiver);
        } else {
            // no receiver is waiting: insert the message into info's message array
            msg_insert(msg_ptr, info);
            __do_notify(info); // signal handling and waking the wait queue in info
        }
        // update the inode timestamps
        inode->i_atime = inode->i_mtime = inode->i_ctime =
            CURRENT_TIME;
        spin_unlock(&info->lock);
        ret = 0;
    }
out_fput:
    fput(filp);
out:
    return ret;
}
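Because the system call receives an absolute timeout (u_abs_timeout) that the kernel converts to a relative number of jiffies, a user-space caller has to build an absolute deadline first. The helper below is a minimal sketch of that step, assuming a non-negative delay in milliseconds; the name `deadline_after` is hypothetical.

```c
#include <time.h>

/*
 * Add a relative delay in milliseconds (assumed >= 0) to a base time and
 * normalise tv_nsec into the range [0, 1e9).
 */
struct timespec deadline_after(struct timespec base, long ms)
{
    struct timespec t = base;

    t.tv_sec += ms / 1000;
    t.tv_nsec += (ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) {   /* carry one second at most */
        t.tv_sec += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}
```

A caller would typically read the current time with clock_gettime(CLOCK_REALTIME, &now), compute ts = deadline_after(now, 500), and pass &ts as the last argument of mq_timedsend() or mq_timedreceive().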
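The function load_msg copies a message from user space into kernel space and stores it in a message structure. The function is analyzed below: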
struct msg_msg *load_msg(const void __user *src, int len)
{
    struct msg_msg *msg;
    struct msg_msgseg **pseg;
    int err;
    int alen;

    alen = len;
    if (alen > DATALEN_MSG)
        alen = DATALEN_MSG;
    // allocate the message structure together with room for the message body
    msg = (struct msg_msg *) kmalloc(sizeof(*msg) + alen, GFP_KERNEL);
    if (msg == NULL)
        return ERR_PTR(-ENOMEM);
    msg->next = NULL;
    msg->security = NULL;
    // copy the body from user space into the area right after the msg_msg header
    if (copy_from_user(msg + 1, src, alen)) {
        err = -EFAULT;
        goto out_err;
    }
    len -= alen;
    src = ((char __user *) src) + alen;
    pseg = &msg->next;
    // store whatever exceeds DATALEN_MSG in a chain of additional segment structures
    while (len > 0) {
        struct msg_msgseg *seg;
        alen = len;
        if (alen > DATALEN_SEG)
            alen = DATALEN_SEG;
        seg = (struct msg_msgseg *) kmalloc(sizeof(*seg) + alen, GFP_KERNEL);
        if (seg == NULL) {
            err = -ENOMEM;
            goto out_err;
        }
        *pseg = seg;
        seg->next = NULL;
        if (copy_from_user(seg + 1, src, alen)) {
            err = -EFAULT;
            goto out_err;
        }
        pseg = &seg->next;
        len -= alen;
        src = ((char __user *) src) + alen;
    }
    err = security_msg_msg_alloc(msg);
    if (err)
        goto out_err;
    return msg;
out_err:
    free_msg(msg);
    return ERR_PTR(err);
}
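The splitting loop in load_msg can be checked with a small calculation: one head allocation that holds at most DATALEN_MSG bytes, followed by as many segments of at most DATALEN_SEG bytes as the remainder needs. The sketch below counts those allocations. Since the excerpt does not give the values of the two constants, they are passed in as the parameters `head_cap` and `seg_cap`; the function name `msg_chunk_count` is hypothetical.

```c
#include <stddef.h>

/*
 * Number of allocations (head plus extra segments) needed for a message
 * of len bytes when the head holds at most head_cap bytes and every
 * further segment at most seg_cap bytes. Both capacities must be > 0.
 */
size_t msg_chunk_count(size_t len, size_t head_cap, size_t seg_cap)
{
    size_t chunks = 1;                         /* the head always exists */
    size_t alen = len < head_cap ? len : head_cap;

    len -= alen;
    while (len > 0) {                          /* same loop shape as load_msg */
        alen = len < seg_cap ? len : seg_cap;
        len -= alen;
        chunks++;
    }
    return chunks;
}
```

With head_cap = 100 and seg_cap = 120, for instance, a 221-byte message needs three allocations: 100 bytes in the head, 120 in the first segment and 1 in the second.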
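The processing logic of the pipelined send and receive functions is as follows:
If a receiver finds no waiting message, it registers itself in the list of waiting receivers. A sender checks that list before adding a new message to the message array. If there is a waiting receiver, the sender bypasses the message array and hands the message directly to the receiver.
The receiver accepts the message and returns without grabbing the queue spinlock. Therefore an intermediate STATE_PENDING state and memory barriers are required. The same algorithm is used for System V semaphores; see ipc/sem.c.
The same algorithm is used for senders.
The function pipelined_send hands a message directly to a task waiting in sys_mq_timedreceive(), without inserting the message into the queue.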
static inline void pipelined_send(struct mqueue_inode_info *info,
                                  struct msg_msg *message,
                                  struct ext_wait_queue *receiver)
{
    receiver->msg = message; // the receiver gets the message
    list_del(&receiver->list); // remove the receiver from the wait list
    receiver->state = STATE_PENDING; // mark the hand-off as pending
    wake_up_process(receiver->task); // wake up the receiver process
    wmb(); // memory barrier
    receiver->state = STATE_READY; // mark the receiver as ready
}
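The PENDING to READY transition is what lets the woken task read its message without taking the queue spinlock. A rough user-space analogue can be written with C11 atomics, where a release store plays the role of wmb() followed by the final state write. This is only a sketch under that assumption: `struct waiter`, `waiter_hand_off` and `waiter_take` are hypothetical names, and the way the waiter polls the state is an illustration, not a copy of the kernel's wait loop.

```c
#include <stdatomic.h>
#include <stddef.h>

enum { ST_NONE, ST_PENDING, ST_READY };    /* mirrors STATE_* in the text */

struct waiter {
    const void *msg;
    atomic_int state;
};

/* Sender side: publish the message, then flip the state to READY. */
void waiter_hand_off(struct waiter *w, const void *msg)
{
    w->msg = msg;
    atomic_store_explicit(&w->state, ST_PENDING, memory_order_relaxed);
    /* the kernel wakes the sleeping task at this point */
    atomic_store_explicit(&w->state, ST_READY, memory_order_release);
}

/* Waiter side: wait out the PENDING window without taking any lock. */
const void *waiter_take(struct waiter *w)
{
    int s;

    while ((s = atomic_load_explicit(&w->state, memory_order_acquire))
           == ST_PENDING)
        ;                                   /* the sender is about to finish */
    return s == ST_READY ? w->msg : NULL;   /* NULL: nothing was handed off */
}
```

The acquire load that observes ST_READY is ordered after the sender's write to msg, so the waiter never sees READY together with a stale message pointer.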
The system call sys_mq_timedreceive is used by a receiving process to receive a message with a timeout. It is listed below:
asmlinkage ssize_t sys_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
    size_t msg_len, unsigned int __user *u_msg_prio,
    const struct timespec __user *u_abs_timeout)
{
    long timeout;
    ssize_t ret;
    struct msg_msg *msg_ptr;
    struct file *filp;
    struct inode *inode;
    struct mqueue_inode_info *info;
    struct ext_wait_queue wait;

    // copy the user-space timeout into the kernel and convert it to jiffies
    timeout = prepare_timeout(u_abs_timeout);
    ret = -EBADF;
    filp = fget(mqdes); // get the file structure from the file descriptor
    if (unlikely(!filp))
        goto out;
    // get the inode
    inode = filp->f_dentry->d_inode;
    if (unlikely(filp->f_op != &mqueue_file_operations))
        goto out_fput;
    info = MQUEUE_I(inode); // get the queue information structure from the inode
    if (unlikely(!(filp->f_mode & FMODE_READ)))
        goto out_fput;
    // check that the buffer is large enough
    if (unlikely(msg_len < info->attr.mq_msgsize)) {
        ret = -EMSGSIZE;
        goto out_fput;
    }
    spin_lock(&info->lock);
    if (info->attr.mq_curmsgs == 0) { // no message to receive right now
        if (filp->f_flags & O_NONBLOCK) { // blocking is not allowed: return
            spin_unlock(&info->lock);
            ret = -EAGAIN;
            msg_ptr = NULL;
        } else if (unlikely(timeout < 0)) { // prepare_timeout reported an error
            spin_unlock(&info->lock);
            ret = timeout;
            msg_ptr = NULL;
        } else { // block and sleep
            wait.task = current;
            wait.state = STATE_NONE;
            // add wait to info->e_wait_q[RECV] ahead of the entries with lower priority,
            // then call schedule_timeout to sleep until woken or until the timeout expires
            ret = wq_sleep(info, RECV, timeout, &wait);
            msg_ptr = wait.msg;
        }
    } else { // receive a message
        msg_ptr = msg_get(info); // take a msg_msg structure from info
        // update the inode timestamps
        inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
        // a message was taken, so the queue now has free space
        pipelined_receive(info);
        spin_unlock(&info->lock);
        ret = 0;
    }
    if (ret == 0) {
        ret = msg_ptr->m_ts;
        // copy the priority and the message msg_ptr out to user space (u_msg_ptr)
        if ((u_msg_prio && put_user(msg_ptr->m_type, u_msg_prio)) ||
            store_msg(u_msg_ptr, msg_ptr, msg_ptr->m_ts)) {
            ret = -EFAULT;
        }
        free_msg(msg_ptr);
    }
out_fput:
    fput(filp);
out:
    return ret;
}
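The function pipelined_receive completes message reception. If a process is blocked in sys_mq_timedsend() waiting to send, its message is taken and placed into the queue; the caller must make sure the queue has a free slot.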
static inline void pipelined_receive(struct mqueue_inode_info *info)
{
    // get the first waiting sender
    struct ext_wait_queue *sender = wq_get_first_waiter(info, SEND);

    if (!sender) {
        // no sender is waiting: wake up the wait queue in info
        /* for poll */
        wake_up_interruptible(&info->wait_q);
        return;
    }
    msg_insert(sender->msg, info); // insert the sender's message into info
    list_del(&sender->list); // remove the sender from the wait list
    sender->state = STATE_PENDING; // mark the hand-off as pending
    wake_up_process(sender->task); // wake up the sender process
    wmb(); // memory barrier
    sender->state = STATE_READY; // mark the sender as ready
}
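The effect of pipelined_receive on a full queue can be illustrated with a toy model: a receive frees one slot, and that slot is immediately refilled from a blocked sender. The sketch below is a deliberately simplified, single-threaded model that assumes plain FIFO order (priorities are ignored), a capacity of two, and at most one blocked sender; `struct mini_mq`, `mini_send` and `mini_receive` are hypothetical names.

```c
#include <stddef.h>

#define QCAP 2                      /* hypothetical tiny capacity */

struct mini_mq {
    int msgs[QCAP];
    size_t count;
    int pending;                    /* message held by one blocked sender */
    int has_pending;
};

/* Returns 0 if the message was queued, 1 if the sender would block. */
int mini_send(struct mini_mq *q, int m)
{
    if (q->count == QCAP) {         /* full: park the message with the sender */
        q->pending = m;
        q->has_pending = 1;
        return 1;
    }
    q->msgs[q->count++] = m;
    return 0;
}

/* Take the oldest message, then refill the freed slot from the blocked sender. */
int mini_receive(struct mini_mq *q, int *out)
{
    if (q->count == 0)
        return -1;                  /* empty: a real receiver would block */
    *out = q->msgs[0];
    for (size_t i = 1; i < q->count; i++)
        q->msgs[i - 1] = q->msgs[i];
    q->count--;
    if (q->has_pending) {           /* the pipelined_receive step */
        q->msgs[q->count++] = q->pending;
        q->has_pending = 0;
    }
    return 0;
}
```

In this model the queue is full again right after the receive, which matches the requirement that the caller guarantees a free slot before the sender's message is inserted.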
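Shared memory
When processes A and B share memory, the same block of physical memory is mapped into the address space of each of them. Process A sees the updates that process B makes to the shared data immediately, and vice versa.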
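Shared memory can be set up in three ways: the mmap() system call, POSIX shared memory, and System V shared memory. With mmap(), different processes open the same regular file and map it into memory; each of them can then access the mapping, which achieves sharing. Linux 2.6 has no dedicated kernel implementation of POSIX shared memory; the C library provides shm_open() on top of tmpfs files, typically under /dev/shm. System V shared memory creates a file in the in-memory tmpfs file system and maps that file into the address spaces of the different processes.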
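Each newly created shared memory area is represented by a shmid_ds data structure. These structures are kept in the shm_segs array. The shmid_ds data structure describes how large the shared memory area is, how many processes are using it, and how the shared memory is mapped into their address spaces. The creator of the shared memory controls the access permissions to it and whether its key is public or private. If it has sufficient access rights, it may also lock the shared memory into physical memory.
Each process that wants to use the shared memory must attach it to its virtual memory through a system call. This creates a new vm_area_struct that describes the shared memory for that process. The process can choose where in its virtual address space the shared memory goes, or it can let Linux pick a free area that is large enough.
The new vm_area_struct is put into the list of vm_area_struct structures pointed to by the shmid_ds. The vm_next_shared and vm_prev_shared pointers link them together. The virtual memory is not actually created at attach time; it is created when a process first accesses it.
The first time a process accesses a page of the shared virtual memory, a page fault occurs. When Linux handles the page fault, it finds the vm_area_struct data structure that describes the page. That structure contains pointers to the handler routines for this type of virtual memory. The shared memory page fault handler looks in the list of page table entries for this shmid_ds to see whether an entry exists for this page of the shared virtual memory. If none exists, it allocates a physical page and creates a page table entry for it. Besides going into the current process's page tables, this entry is saved in the shmid_ds. As a result, when the next process that accesses this memory takes a page fault, the shared memory fault handler uses the newly created physical page for that process as well. So the first process that accesses a page of the shared memory causes the page to be created, and later accesses by other processes add that page to their own virtual address spaces.
When a process no longer wants to share the virtual memory, it detaches from it. As long as other processes are still using the memory, the detach affects only the current process. Its vm_area_struct is removed from the shmid_ds data structure and freed. The page table entries of the current process that cover the shared memory address range are updated and marked invalid.
When the last process detaches from the shared memory, the shared memory pages currently in physical memory are freed, and so is the shmid_ds structure for this shared memory.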
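Shared memory data structures
Each shared memory segment has a control structure, struct shmid_kernel, which is private to the kernel. Its member shm_file stores the address of the file to be mapped. Every shared memory segment object corresponds to one file in the special file system shm. Normally, files in the special file system shm cannot be accessed with read(), write() and similar calls. Once such a file has been mapped into a process's address space through the shared memory mechanism, it can be accessed directly as memory.
The structure shmid_kernel is listed below (in include/linux/shm.h):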
struct shmid_kernel {
    struct kern_ipc_perm shm_perm; /* operation permissions */
    struct file *shm_file;
    int id;
    unsigned long shm_nattch; /* number of processes currently attached to the segment */
    unsigned long shm_segsz; /* size of the segment in bytes */
    time_t shm_atim; /* time of the last attach to the segment */
    time_t shm_dtim; /* time of the last detach from the segment */
    time_t shm_ctim; /* time of the last change to this structure */
    pid_t shm_cprid; /* pid of the process that created the segment */
    pid_t shm_lprid; /* pid of the last process that operated on the segment */
    struct user_struct *mlock_user;
};
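The kernel tracks every shared memory segment in the system through the global structure struct ipc_ids shm_ids. shm_ids.entries points to an array of ipc_id structures, and each element of that array holds a pointer to a kern_ipc_perm structure. Figure 3 shows how the shared memory data structures relate to one another.
For System V shared memory, the host (container) of kern_ipc_perm is the shmid_kernel structure. A shmid_kernel describes one shared memory segment, and every segment in the system can be reached through shm_ids.
In addition, the struct file pointer shm_file in shmid_kernel points to the corresponding file in the tmpfs file system. This ties each shared memory segment to a tmpfs file, so the segment can be mapped through the file system.
The shared memory file system
Calling shmget() creates a shared memory segment together with a corresponding file in the tmpfs file system. The segment must then be mapped into the process's address space, which is the job of the shmat() system call. shmat() works by mapping that tmpfs file, much as the mmap() system call does.
The hugetlbfs in-memory file system can also back shared memory; it works much like tmpfs. Only tmpfs is analyzed here.
tmpfs is a memory-based file system. Its pages can be written out to disk swap space, and it obtains the pages that hold file data from the virtual memory (VM) subsystem. File systems such as Ext2fs and JFFS2 sit on top of an underlying block device, whereas tmpfs sits directly on the VM. By default the system mounts it at /dev/shm.
Compared with ramfs, tmpfs adds swapping and limit checking. There is also the ramdisk (under /dev/ram*), which uses memory as a block device: it emulates a fixed-size disk in physical RAM, and an ordinary file system can be created on it. A ramdisk cannot be swapped and cannot be resized. These three concepts should not be confused.
The function init_tmpfs initializes the tmpfs file system: it registers the file system type tmpfs_fs_type and mounts the file system. It is analyzed below (in mm/shmem.c):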
static int __init init_tmpfs(void)
{
    int error;

    // create the object cache for struct shmem_inode_info
    error = init_inodecache();
    if (error)
        goto out3;
    // register the file system
    error = register_filesystem(&tmpfs_fs_type);
    if (error) {
        printk(KERN_ERR "Could not register tmpfs\n");
        goto out2;
    }
#ifdef CONFIG_TMPFS
    devfs_mk_dir("shm"); // create the shm directory in devfs
#endif
    // mount the file system
    shm_mnt = do_kern_mount(tmpfs_fs_type.name, MS_NOUSER,
                            tmpfs_fs_type.name, NULL);
    ……
}
The structure shmem_inode_info holds the inode information specific to shared memory. Because the generic inode is embedded in it as the member vfs_inode, the enclosing shmem_inode_info can be recovered from a pointer to that inode. It is analyzed below (in include/linux/shmem_fs.h):
struct shmem_inode_info {
    spinlock_t lock;
    unsigned long flags;
    unsigned long alloced;                  // data pages allocated to the file
    unsigned long swapped;                  // total pages assigned to swap
    unsigned long next_index;               /* highest alloced index + 1 */
    struct shared_policy policy;            // NUMA memory allocation policy
    struct page *i_indirect;                // top-level indirect block page
    swp_entry_t i_direct[SHMEM_NR_DIRECT];  // the first blocks
    struct list_head swaplist;              // list of inodes that may be on swap
    struct inode vfs_inode;
};
The structure shmem_sb_info holds the shared memory superblock information: the block and inode counts of the file system. It is listed below (in include/linux/shmem_fs.h):
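Recovering the enclosing structure from its embedded vfs_inode member is the usual embedded-member pattern. The sketch below illustrates it in user space; the toy_ types and the function toy_shmem_i are hypothetical stand-ins for struct inode, struct shmem_inode_info and the kernel's SHMEM_I() helper, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inode and struct shmem_inode_info. */
struct toy_inode {
    unsigned long i_ino;
};

struct toy_shmem_info {
    unsigned long flags;
    unsigned long alloced;
    struct toy_inode vfs_inode;   /* embedded, not a pointer */
};

/* Step back from the embedded member to the start of its container. */
static struct toy_shmem_info *toy_shmem_i(struct toy_inode *inode)
{
    assert(inode != NULL);
    return (struct toy_shmem_info *)
        ((char *)inode - offsetof(struct toy_shmem_info, vfs_inode));
}
```

Embedding the inode rather than pointing to it means one allocation serves both structures, and the conversion in either direction is a constant offset.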
struct shmem_sb_info {
    unsigned long max_blocks;   // maximum number of blocks allowed
    unsigned long free_blocks;  // free blocks available for allocation
    unsigned long max_inodes;   // maximum number of inodes allowed
    unsigned long free_inodes;  // free inodes available for allocation
    spinlock_t stat_lock;
};
The structure tmpfs_fs_type describes the file system type:
static struct file_system_type tmpfs_fs_type = {
    .owner = THIS_MODULE,
    .name = "tmpfs",
    .get_sb = shmem_get_sb,
    .kill_sb = kill_litter_super,
};
The function shmem_fill_super fills in the superblock structure and allocates the root inode and root directory. It is listed below (in mm/shmem.c):
static int shmem_fill_super(struct super_block *sb,
                            void *data, int silent)
{
    struct inode *inode;
    struct dentry *root;
    int mode = S_IRWXUGO | S_ISVTX;
    uid_t uid = current->fsuid;
    gid_t gid = current->fsgid;
    int err = -ENOMEM;
#ifdef CONFIG_TMPFS
    unsigned long blocks = 0;
    unsigned long inodes = 0;
    if (!(sb->s_flags & MS_NOUSER)) {
        blocks = totalram_pages / 2; // limit the blocks to half of the RAM pages
        inodes = totalram_pages - totalhigh_pages; // limit the inodes to the low-memory pages
        if (inodes > blocks) // no more inodes than blocks
            inodes = blocks;
        // parse data to obtain mode, uid, gid, blocks and inodes
if (shmem_parse_options(data, &mode,
&uid, &gid, &blocks, &inodes))
return -EINVAL;
}
if (blocks || inodes) {
struct shmem_sb_info *sbinfo;
// allocate the shmem_sb_info object
sbinfo = kmalloc(sizeof(struct shmem_sb_info), GFP_KERNEL);
if (!sbinfo)
return -ENOMEM;
sb->s_fs_info = sbinfo;
spin_lock_init(&sbinfo->stat_lock);
sbinfo->max_blocks = blocks;
sbinfo->free_blocks = blocks;
sbinfo->max_inodes = inodes;
sbinfo->free_inodes = inodes;
}
#endif
sb->s_maxbytes = SHMEM_MAX_BYTES;
sb->s_blocksize = PAGE_CACHE_SIZE;
sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
sb->s_magic = TMPFS_MAGIC;
sb->s_op = &shmem_ops; // superblock operations
inode = shmem_get_inode(sb, S_IFDIR | mode, 0); // allocate the root inode
if (!inode)
goto failed;
inode->i_uid = uid;
inode->i_gid = gid;
root = d_alloc_root(inode); // allocate the root dentry
if (!root)
goto failed_iput;
sb->s_root = root;
return 0;
failed_iput:
iput(inode);
failed:
shmem_put_super(sb);
return err;
}
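The default limits that shmem_fill_super computes before parsing the mount options can be restated as a small pure function. This is only a sketch of the arithmetic shown in the listing above; the type toy_limits and the function default_tmpfs_limits are hypothetical names, and the page counts are passed in as parameters instead of being read from kernel globals.

```c
#include <assert.h>

/* Hypothetical container for the two default limits. */
struct toy_limits {
    unsigned long blocks;
    unsigned long inodes;
};

/* Defaults used when the mount is user-visible (MS_NOUSER not set). */
static struct toy_limits default_tmpfs_limits(unsigned long totalram_pages,
                                              unsigned long totalhigh_pages)
{
    struct toy_limits lim;

    assert(totalhigh_pages <= totalram_pages);
    lim.blocks = totalram_pages / 2;               /* half of all RAM pages */
    lim.inodes = totalram_pages - totalhigh_pages; /* low-memory pages only */
    if (lim.inodes > lim.blocks)                   /* never more inodes than blocks */
        lim.inodes = lim.blocks;
    return lim;
}
```

For 1000 RAM pages and no high memory this gives 500 blocks and 500 inodes; with 800 of the pages in high memory the inode limit drops to 200.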
The function shmem_get_inode creates and initializes an inode and installs the operation tables that match its type. It is listed below:
static struct inode *shmem_get_inode(struct super_block *sb,
                                     int mode, dev_t dev)
{
    struct inode *inode;
    struct shmem_inode_info *info;
    // fetch the sb->s_fs_info member
struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
if (sbinfo) {
spin_lock(&sbinfo->stat_lock);
if (!sbinfo->free_inodes) { // no free inode left
spin_unlock(&sbinfo->stat_lock);
return NULL;
}
sbinfo->free_inodes--;
spin_unlock(&sbinfo->stat_lock);
}
// allocate an inode number, create the inode object and initialize it
inode = new_inode(sb);
if (inode) { // fill in the inode
inode->i_mode = mode;
inode->i_uid = current->fsuid;
inode->i_gid = current->fsgid;
inode->i_blksize = PAGE_CACHE_SIZE;
inode->i_blocks = 0;
// address space operations, which provide page-level methods such as .writepage
inode->i_mapping->a_ops = &shmem_aops;
inode->i_mapping->backing_dev_info = &shmem_backing_dev_info;
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
// get the shmem_inode_info that contains this inode
info = SHMEM_I(inode);
// zero the members that precede vfs_inode
memset(info, 0, (char *)inode - (char *)info);
spin_lock_init(&info->lock);
mpol_shared_policy_init(&info->policy);
INIT_LIST_HEAD(&info->swaplist);
switch (mode & S_IFMT) {
default:
init_special_inode(inode, mode, dev);
break;
case S_IFREG:
inode->i_op = &shmem_inode_operations; // inode operations
inode->i_fop = &shmem_file_operations; // file operations
break;
case S_IFDIR:
inode->i_nlink++;
/* Some things misbehave if size == 0 on a directory */
inode->i_size = 2 * BOGO_DIRENT_SIZE; // i.e. 2 * 20
inode->i_op = &shmem_dir_inode_operations; // directory inode operations
inode->i_fop = &simple_dir_operations; // directory file operations
break;
case S_IFLNK:
break;
}
}
return inode;
}
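The call memset(info, 0, (char *)inode - (char *)info) in the listing above clears only the private fields that sit in front of the embedded inode, which new_inode() has already initialized. A minimal userspace sketch of that idiom follows; the toy_ types and toy_info_reset are hypothetical and only mimic the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct inode. */
struct toy_inode {
    unsigned long i_ino;
};

/* Hypothetical stand-in for struct shmem_inode_info. */
struct toy_info {
    unsigned long flags;
    unsigned long alloced;
    unsigned long swapped;
    struct toy_inode vfs_inode;   /* already initialized elsewhere */
};

/* Zero every byte that precedes vfs_inode and leave the inode untouched. */
static void toy_info_reset(struct toy_info *info)
{
    struct toy_inode *inode = &info->vfs_inode;

    assert((char *)inode >= (char *)info);
    memset(info, 0, (size_t)((char *)inode - (char *)info));
}
```

The length passed to memset is the byte distance between the start of the container and the embedded member, so fields added in front of vfs_inode later are cleared automatically.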
Of these operation tables, only the shared memory file operations are listed here:
static struct file_operations shmem_file_operations = {
    .mmap = shmem_mmap,
#ifdef CONFIG_TMPFS
    .llseek = generic_file_llseek,
    .read = shmem_file_read,
    .write = shmem_file_write,
    .fsync = simple_sync_file,
    .sendfile = shmem_file_sendfile,
#endif
};
The function shmem_file_read reads file contents held in memory into the user-space buffer buf. It is listed below:
static ssize_t shmem_file_read(struct file *filp, char __user *buf,
                               size_t count, loff_t *ppos)
{
    read_descriptor_t desc;

    if ((ssize_t) count < 0)
        return -EINVAL;
    if (!access_ok(VERIFY_WRITE, buf, count)) // the user buffer is not writable
        return -EFAULT;
    if (!count)
        return 0;
    desc.written = 0;
    desc.count = count;
    desc.arg.buf = buf;
    desc.error = 0;
    do_shmem_file_read(filp, ppos, &desc, file_read_actor);
    if (desc.written)
        return desc.written;
    return desc.error;
}
The function do_shmem_file_read does the actual reading: it copies data from the shared memory file to user space. It is analyzed below:
static void do_shmem_file_read(struct file *filp, loff_t *ppos,
                               read_descriptor_t *desc, read_actor_t actor)
{
    struct inode *inode = filp->f_dentry->d_inode;
    struct address_space *mapping = inode->i_mapping;
    unsigned long index, offset;

    index = *ppos >> PAGE_CACHE_SHIFT; // index of the page that holds *ppos
    offset = *ppos & ~PAGE_CACHE_MASK; // offset within that page
    for (;;) {
        struct page *page = NULL;
        unsigned long end_index, nr, ret;
        // read inode->i_size
        loff_t i_size = i_size_read(inode);
        // compute the index of the last page
end_index = i_size >> PAGE_CACHE_SHIFT;
if (index > end_index)
break;
if (index == end_index) {
nr = i_size & ~PAGE_CACHE_MASK; // number of valid bytes in the last page
if (nr <= offset)
break;
}
// look up the page, reading it back from swap if necessary; page may stay NULL (handled below)
desc->error = shmem_getpage(inode, index, &page, SGP_READ, NULL);
if (desc->error) {
if (desc->error == -EINVAL)
desc->error = 0;
break;
}
nr = PAGE_CACHE_SIZE; // nr starts as one full page
i_size = i_size_read(inode); // re-read inode->i_size
end_index = i_size >> PAGE_CACHE_SHIFT;
if (index == end_index) {
nr = i_size & ~PAGE_CACHE_MASK;
if (nr <= offset) {
if (page)
page_cache_release(page);
break;
}
}
nr -= offset;
if (page) {
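// If users can write to this page through arbitrary virtual addresses,
// guard against potential aliasing before reading the page on the kernel side:
// the page of this file may have been modified from user space.
if (mapping_writably_mapped(mapping))
flush_dcache_page(page);
if (!offset) // offset 0: reading starts at the beginning of the page
mark_page_accessed(page); // mark the page as accessed
} else
// no page was returned: use ZERO_PAGE, the globally shared page of zeros
// (on i386 it is backed by the global array unsigned long empty_zero_page[1024])
page = ZERO_PAGE(0);
// We now have the page and it is up to date, so copy it to user space.
// The actor routine returns the number of bytes actually used; here it is file_read_actor.
ret = actor(desc, page, offset, nr); // copy to user space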
offset += ret;
index += offset >> PAGE_CACHE_SHIFT;
offset &= ~PAGE_CACHE_MASK;
page_cache_release(page); // drop the reference to the page
if (ret != nr || !desc->count)
break;
cond_resched(); // reschedule if needed
}
*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
file_accessed(filp);
}
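The index and offset arithmetic in do_shmem_file_read decides how many bytes can be taken from the current page. The sketch below restates that calculation in user space under the assumption of 4096-byte pages; the TOY_ macros and the function readable_in_page are hypothetical names, not kernel symbols.

```c
#include <assert.h>   /* used by the checks in the usage note */

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/*
 * Number of bytes readable from the page that contains pos, for a file
 * of i_size bytes. Returns 0 when pos is at or beyond the end of file.
 */
static unsigned long readable_in_page(unsigned long long pos,
                                      unsigned long long i_size)
{
    unsigned long index = (unsigned long)(pos >> TOY_PAGE_SHIFT);
    unsigned long offset = (unsigned long)(pos & ~TOY_PAGE_MASK);
    unsigned long end_index = (unsigned long)(i_size >> TOY_PAGE_SHIFT);
    unsigned long nr = TOY_PAGE_SIZE;   /* a full page by default */

    if (index > end_index)
        return 0;
    if (index == end_index) {
        /* Last page: only the bytes below i_size are valid. */
        nr = (unsigned long)(i_size & ~TOY_PAGE_MASK);
        if (nr <= offset)
            return 0;
    }
    return nr - offset;
}
```

For example, with a 6000-byte file and a position of 5000, the position falls in page 1 at offset 904, the last page holds 1904 valid bytes, and assert(readable_in_page(5000, 6000) == 1000) holds.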
The function file_read_actor copies size bytes, starting at offset within page, to the user-space buffer described by desc. It is analyzed below (in mm/filemap.c):
int file_read_actor(read_descriptor_t *desc, struct page *page,
                    unsigned long offset, unsigned long size)
{
    char *kaddr;
    unsigned long left, count = desc->count;

    if (size > count)
        size = count;
    // pre-fault the user buffer so that the atomic copy below can succeed; 0 means success
if (!fault_in_pages_writeable(desc->arg.buf, size)) {
kaddr = kmap_atomic(page, KM_USER0); // map the page at kaddr
// copy from kernel space (kaddr + offset) to user space (desc->arg.buf)
left = __copy_to_user_inatomic(desc->arg.buf,
kaddr + offset, size);
kunmap_atomic(kaddr, KM_USER0); // undo the mapping
if (left == 0)
goto success;
}
/* Do it the slow way */
kaddr = kmap(page); // map the page at kaddr
// copy from kernel space (kaddr + offset) to user space (desc->arg.buf)
left = __copy_to_user(desc->arg.buf, kaddr + offset, size);
kunmap(page); // undo the mapping
if (left) {
size -= left;
desc->error = -EFAULT;
}
success:
desc->count = count - size;
desc->written += size;
desc->arg.buf += size;
return size;
}
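The descriptor bookkeeping at the end of file_read_actor (count shrinks, written grows, buf advances) can be shown with a plain memcpy in place of the user-space copy. This is a sketch only; struct toy_read_desc and toy_read_actor are hypothetical userspace analogues of read_descriptor_t and the actor, and they omit the fault handling of the real function.

```c
#include <assert.h>   /* used by the checks in the usage note */
#include <string.h>

/* Hypothetical userspace analogue of read_descriptor_t. */
struct toy_read_desc {
    size_t written;  /* bytes copied so far */
    size_t count;    /* bytes still wanted by the caller */
    char *buf;       /* next destination byte */
    int error;
};

/* Copy up to size bytes from page + offset; returns the bytes copied. */
static size_t toy_read_actor(struct toy_read_desc *desc, const char *page,
                             size_t offset, size_t size)
{
    if (size > desc->count)      /* never copy more than was requested */
        size = desc->count;
    memcpy(desc->buf, page + offset, size);
    desc->count -= size;
    desc->written += size;
    desc->buf += size;
    return size;
}
```

Because the actor returns the number of bytes it consumed, the caller's loop can stop as soon as the return value is short or count reaches zero, which is the test do_shmem_file_read performs after each page.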
Shared memory system calls
The function shm_init initializes shared memory. It is listed below (in ipc/shm.c):
void __init shm_init(void)
{
    ipc_init_ids(&shm_ids, 1); // set up the shared memory ID set shm_ids with room for 1 ID
#ifdef CONFIG_PROC_FS
    // create the file in the /proc file system
    create_proc_read_entry("sysvipc/shm", 0, NULL, sysvipc_shm_read_proc, NULL);
#endif
}
The function ipc_init_ids sets up the ID set ids with room for size IDs. It clamps size to IPCMNI, computes the upper bound of the sequence numbers, and then allocates and initializes the entries array. It is analyzed below (in ipc/util.c):
void __init ipc_init_ids(struct ipc_ids *ids, int size)
{
    int i;

    // set the semaphore count ids->sem.count to 1 and initialize its wait queue
    sema_init(&ids->sem, 1);
    if (size > IPCMNI) // more IDs requested than the maximum, 32768
        size = IPCMNI;
    // initialize ids
    ids->size = size;
    ids->in_use = 0;
    ids->max_id = -1;
    ids->seq = 0;
    {
        // compute the largest sequence number
        int seq_limit = INT_MAX / SEQ_MULTIPLIER; // largest int / maximum array size (32768)
        if (seq_limit > USHRT_MAX)
            ids->seq_max = USHRT_MAX; // 0xffff
        else
            ids->seq_max = seq_limit;
    }
    // allocate the array of entries
    ids->entries = ipc_rcu_alloc(sizeof(struct ipc_id) * size);
    if (ids->entries == NULL) {
        printk(KERN_ERR "ipc_init_ids() failed, ipc service disabled.\n");
        ids->size = 0;
    }
    for (i = 0; i < ids->size; i++)
        ids->entries[i].p = NULL;
}
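The bound on the sequence number in ipc_init_ids follows from two constraints: seq times SEQ_MULTIPLIER must fit in an int, and seq itself is capped at USHRT_MAX. A minimal sketch of that calculation, with the multiplier passed as a parameter; toy_seq_max is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/* Largest usable sequence number for a given ID multiplier. */
static int toy_seq_max(int seq_multiplier)
{
    int seq_limit;

    assert(seq_multiplier > 0);
    seq_limit = INT_MAX / seq_multiplier;   /* keeps seq * multiplier in an int */
    return seq_limit > USHRT_MAX ? USHRT_MAX : seq_limit;
}
```

On a platform where int is 32 bits, a multiplier of 32768 gives 2147483647 / 32768 = 65535, which happens to equal USHRT_MAX, so either branch yields the same bound.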
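Creating shared memory
The system call sys_shmget returns the ID of a shared memory segment; if no segment exists for the given key, it creates one, provided IPC_CREAT is set. It is analyzed below (in ipc/shm.c):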
asmlinkage long sys_shmget(key_t key, size_t size, int shmflg)
{
    struct shmid_kernel *shp;
    int err, id = 0;

    down(&shm_ids.sem);
    if (key == IPC_PRIVATE) {
        // create a shared memory segment
        err = newseg(key, shmflg, size);
    } else if ((id = ipc_findkey(&shm_ids, key)) == -1) { // no segment with this key
        if (!(shmflg & IPC_CREAT))
            err = -ENOENT;
        else
            err = newseg(key, shmflg, size); // create a shared memory segment
    } else if ((shmflg & IPC_CREAT) && (shmflg & IPC_EXCL)) {
        err = -EEXIST;
    } else {
        shp = shm_lock(id); // find the shmid_kernel for this id in the array
        if (shp == NULL)
            BUG();
        if (shp->shm_segsz < size) // check the segment size
            err = -EINVAL;
        else if (ipcperms(&shp->shm_perm, shmflg)) // check the IPC permissions
            err = -EACCES;
        else {
            // shmid = SEQ_MULTIPLIER (the maximum IPC array size) * seq + id
            int shmid = shm_buildid(id, shp->shm_perm.seq);
            err = security_shm_associate(shp, shmflg);
            if (!err)
                err = shmid; // return the shared memory ID
        }
        shm_unlock(shp);
    }
    up(&shm_ids.sem);
    return err;
}
#define shm_lock(id)
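In the global structure shm_ids, the member shm_ids.entries is an array of ipc_id structures. Indexing it with id gives shm_ids.entries[id], whose member p points to the id-th kern_ipc_perm structure.
Because kern_ipc_perm is the first member of shmid_kernel, shm_lock(id) can obtain the corresponding shmid_kernel structure and, from it, the file structure. This is how different processes find the same shared memory through its id.
The function ipc_lock is analyzed below (in ipc/util.c):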
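Seen from user space, the branches of sys_shmget listed above correspond to distinct errno values. The following sketch probes them with the standard shmget() call; it assumes a Linux system with System V IPC enabled and that the key produced by the hypothetical helper demo_key is not already in use. shmget_errno is likewise a hypothetical wrapper.

```c
#include <assert.h>   /* used by the checks in the usage note */
#include <errno.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Hypothetical helper: a key derived from the pid, unlikely to be in use. */
static key_t demo_key(void)
{
    return (key_t)(0x5a000000L | (long)getpid());
}

/* Calls shmget and returns 0 on success or the errno value on failure. */
static int shmget_errno(key_t key, size_t size, int flags, int *shmid_out)
{
    int id = shmget(key, size, flags);

    if (id == -1)
        return errno;
    if (shmid_out)
        *shmid_out = id;
    return 0;
}
```

With a fresh key, a lookup without IPC_CREAT is expected to give ENOENT, a second IPC_CREAT | IPC_EXCL on the same key EEXIST, and a request larger than the existing segment EINVAL. The segment should be removed afterwards with shmctl(id, IPC_RMID, NULL).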
struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
{
    struct kern_ipc_perm *out;
    int lid = id % SEQ_MULTIPLIER; // reduce modulo the maximum number of IDs to get the array index
    struct ipc_id *entries;

    rcu_read_lock();
    if (lid >= ids->size) {
        rcu_read_unlock();
        return NULL;
    }
    /* The two read barriers below pair with the two write barriers in grow_ary(); they ensure that the writes are seen in the same order on the read side. smp_rmb() takes effect on all CPUs. rcu_dereference() is used when there is a data dependency between the two reads, and takes effect only on Alpha. */
    smp_rmb(); // prevent indexing the old array with the new size
    entries = rcu_dereference(ids->entries);
    out = entries[lid].p; // get the kern_ipc_perm structure
    if (out == NULL) {
        rcu_read_unlock();
        return NULL;
    }
    spin_lock(&out->lock);
    /* ipc_rmid() may already have freed the ID while ipc_lock() was spinning, so verify here that the structure is still valid. */
    if (out->deleted) {
        spin_unlock(&out->lock);
        rcu_read_unlock();
        return NULL;
    }
    return out;
}
The function newseg creates a shared memory segment: it creates a file in memory and sets up the file operations structure.
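The user-visible ID combines an array slot with a sequence number: sys_shmget builds it as SEQ_MULTIPLIER * seq + id, and ipc_lock recovers the slot with id % SEQ_MULTIPLIER. A small sketch of the round trip, assuming a multiplier of 32768 as in the comments above; the toy_ names are hypothetical.

```c
#include <assert.h>

#define TOY_SEQ_MULTIPLIER 32768   /* assumed equal to IPCMNI */

/* Combine an array slot and a sequence number into a user-visible ID. */
static int toy_buildid(int slot, int seq)
{
    assert(slot >= 0 && slot < TOY_SEQ_MULTIPLIER);
    return TOY_SEQ_MULTIPLIER * seq + slot;
}

/* Recover the array slot, as ipc_lock does with the modulo operation. */
static int toy_slot(int id)
{
    return id % TOY_SEQ_MULTIPLIER;
}

/* Recover the sequence number that was current when the ID was issued. */
static int toy_seq(int id)
{
    return id / TOY_SEQ_MULTIPLIER;
}
```

Slot 5 with sequence 3 gives the ID 98309. Reusing slot 5 with a later sequence number yields a different ID, so a stale ID held by another process no longer matches the object now stored in that slot.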
<span class="kw4">static</span> <span class="kw4">int</span> newseg <span class="br0">(</span> key_t key<span class="sy0">,</span> <span class="kw4">int</span> shmflg<span class="sy0">,</span> size_t size<span class="br0">)</span> <p><span class="br0">{</span> <span class="kw4">int</span> error<span class="sy0">;</span> <span class="kw4">struct</span> shmid_kernel <span class="sy0">*</span> shp<span class="sy0">;</span> <span class="co1">//将分配的大小转换成以页为单位</span> <span class="kw4">int</span> numpages <span class="sy0">=</span> <span class="br0">(</span> size <span class="sy0">+</span> PAGE_SIZE <span class="sy0">-</span> <span class="nu0">1</span> <span class="br0">)</span> <span class="sy0">>></span> PAGE_SHIFT<span class="sy0">;</span> <span class="kw4">struct</span> file <span class="sy0">*</span> file<span class="sy0">;</span> <span class="kw4">char</span> name<span class="br0">[</span> <span class="nu0">13</span> <span class="br0">]</span> <span class="sy0">;</span> <span class="kw4">int</span> id<span class="sy0">;</span> <span class="co1">//大小超界检查</span> <span class="kw1">if</span> <span class="br0">(</span> size <span class="sy0"><</span> SHMMIN <span class="sy0">||</span> size <span class="sy0">></span> shm_ctlmax<span class="br0">)</span> <span class="kw1">return</span> <span class="sy0">-</span> EINVAL<span class="sy0">;</span> <span class="co1">//当前共享内存的总页数 >= 系统提供的最大共享内存总页数</span> <span class="kw1">if</span> <span class="br0">(</span> shm_tot <span class="sy0">+</span> numpages <span class="sy0">>=</span> shm_ctlall<span class="br0">)</span> <span class="kw1">return</span> <span class="sy0">-</span> ENOSPC<span class="sy0">;</span> <span class="co1">//分配对象空间</span> shp <span class="sy0">=</span> ipc_rcu_alloc<span class="br0">(</span> <span class="kw4">sizeof</span> <span class="br0">(</span> <span class="sy0">*</span> shp<span class="br0">)</span> <span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">if</span> <span 
class="br0">(</span> <span class="sy0">!</span> shp<span class="br0">)</span> <span class="kw1">return</span> <span class="sy0">-</span> ENOMEM<span class="sy0">;</span> shp<span class="sy0">-></span> shm_perm.<span class="me1">key</span> <span class="sy0">=</span> key<span class="sy0">;</span> shp<span class="sy0">-></span> shm_flags <span class="sy0">=</span> <span class="br0">(</span> shmflg <span class="sy0">&</span> S_IRWXUGO<span class="br0">)</span> <span class="sy0">;</span> shp<span class="sy0">-></span> mlock_user <span class="sy0">=</span> NULL<span class="sy0">;</span> shp<span class="sy0">-></span> shm_perm.<span class="me1">security</span> <span class="sy0">=</span> NULL<span class="sy0">;</span> error <span class="sy0">=</span> security_shm_alloc<span class="br0">(</span> shp<span class="br0">)</span> <span class="sy0">;</span> <span class="co1">//安全机制检查</span> <span class="kw1">if</span> <span class="br0">(</span> error<span class="br0">)</span> <span class="br0">{</span> ipc_rcu_putref<span class="br0">(</span> shp<span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">return</span> error<span class="sy0">;</span> <span class="br0">}</span> <span class="kw1">if</span> <span class="br0">(</span> shmflg <span class="sy0">&</span> SHM_HUGETLB<span class="br0">)</span> <span class="br0">{</span> <span class="co1">//使用hugetlb文件系统创建size大小的文件 </span> file <span class="sy0">=</span> hugetlb_zero_setup<span class="br0">(</span> size<span class="br0">)</span> <span class="sy0">;</span> shp<span class="sy0">-></span> mlock_user <span class="sy0">=</span> current<span class="sy0">-></span> user<span class="sy0">;</span> <span class="br0">}</span> <span class="kw1">else</span> <span class="br0">{</span> <span class="co1">//使用tmpfs文件系统创建名为key的文件</span> sprintf <span class="br0">(</span> name<span class="sy0">,</span> <span class="st0">"SYSV%08x"</span> <span class="sy0">,</span> key<span class="br0">)</span> <span class="sy0">;</span> file <span 
class="sy0">=</span> shmem_file_setup<span class="br0">(</span> name<span class="sy0">,</span> size<span class="sy0">,</span> VM_ACCOUNT<span class="br0">)</span> <span class="sy0">;</span> <span class="br0">}</span> error <span class="sy0">=</span> PTR_ERR<span class="br0">(</span> file<span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> IS_ERR<span class="br0">(</span> file<span class="br0">)</span> <span class="br0">)</span> <span class="kw1">goto</span> no_file<span class="sy0">;</span> error <span class="sy0">=</span> <span class="sy0">-</span> ENOSPC<span class="sy0">;</span> id <span class="sy0">=</span> shm_addid<span class="br0">(</span> shp<span class="br0">)</span> <span class="sy0">;</span> <span class="co1">//找到一个空闲的id</span> <span class="kw1">if</span> <span class="br0">(</span> id <span class="sy0">==</span> <span class="sy0">-</span> <span class="nu0">1</span> <span class="br0">)</span> <span class="kw1">goto</span> no_id<span class="sy0">;</span> shp<span class="sy0">-></span> shm_cprid <span class="sy0">=</span> current<span class="sy0">-></span> tgid<span class="sy0">;</span> shp<span class="sy0">-></span> shm_lprid <span class="sy0">=</span> <span class="nu0">0</span> <span class="sy0">;</span> shp<span class="sy0">-></span> shm_atim <span class="sy0">=</span> shp<span class="sy0">-></span> shm_dtim <span class="sy0">=</span> <span class="nu0">0</span> <span class="sy0">;</span> shp<span class="sy0">-></span> shm_ctim <span class="sy0">=</span> get_seconds<span class="br0">(</span> <span class="br0">)</span> <span class="sy0">;</span> shp<span class="sy0">-></span> shm_segsz <span class="sy0">=</span> size<span class="sy0">;</span> shp<span class="sy0">-></span> shm_nattch <span class="sy0">=</span> <span class="nu0">0</span> <span class="sy0">;</span></p> <pre> <span class="co1">//即shmid = IPC最大数组个数(SEQ_MULTIPLIER)*seq + id</span>
shp->id = shm_buildid(id,shp->shm_perm.seq);
shp->shm_file = file;
file->f_dentry->d_inode->i_ino = shp->id;
if (shmflg & SHM_HUGETLB)
//设置hugetlb文件系统文件操作函数结构
set_file_hugepages(file);
else
file->f_op = &shm_file_operations;
shm_tot += numpages; //总的共享内存页数
shm_unlock(shp);
return shp->id;
no_id:
fput(file);
no_file:
security_shm_free(shp);
ipc_rcu_putref(shp);
return error;
}
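Two small computations in newseg are easy to check in user space: the rounding of a byte count up to whole pages, and the name given to the tmpfs file. The following is a minimal sketch, not kernel code. The names prefixed with demo_ are hypothetical, and the 4 KiB page size is an assumption, since the kernel's PAGE_SHIFT depends on the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed for illustration: 4 KiB pages. The kernel's PAGE_SHIFT and
 * PAGE_SIZE depend on the architecture. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((size_t)1 << DEMO_PAGE_SHIFT)

/* Round a byte count up to whole pages, as newseg does for numpages. */
size_t demo_pages_for(size_t size)
{
    return (size + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}

/* Format the file name used for a key: "SYSV" followed by the key as
 * eight hex digits. buf must hold at least 13 bytes (4 + 8 + NUL),
 * which is why newseg declares name[13]. */
void demo_segment_name(char *buf, size_t buflen, unsigned int key)
{
    assert(buf != NULL && buflen >= 13);
    snprintf(buf, buflen, "SYSV%08x", key);
}
```

Under these assumptions a request of 4097 bytes occupies two pages, and the key 0x1234 gives the name SYSV00001234.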
The function shmem_file_setup returns an unlinked file in tmpfs. The parameter name is the name of the dentry, which is visible in /proc/<pid>/maps; the parameter size is the size to set for the file. The function is analyzed below:
struct file *shmem_file_setup(char *name, loff_t size, unsigned long flags)
{
    int error;
    struct file *file;
    struct inode *inode;
    struct dentry *dentry, *root;
    struct qstr this;

    if (IS_ERR(shm_mnt))
        return (void *)shm_mnt;
    if (size < 0 || size > SHMEM_MAX_BYTES)
        return ERR_PTR(-EINVAL);
    // Pre-account the whole fixed size of the VM object. For shared memory and shared
    // anonymous (/dev/zero) mappings, this accounting is the same as for private mappings.
    if (shmem_acct_size(flags, size))
        return ERR_PTR(-ENOMEM);
    error = -ENOMEM;
    this.name = name;
    this.len = strlen(name);
    this.hash = 0; /* will go */
    root = shm_mnt->mnt_root; // get the root directory
    dentry = d_alloc(root, &this); // allocate and initialize a dentry
    if (!dentry)
        goto put_memory;
    error = -ENFILE;
    file = get_empty_filp(); // get an unused file structure
    if (!file)
        goto put_dentry;
    error = -ENOSPC;
    inode = shmem_get_inode(root->d_sb, S_IFREG | S_IRWXUGO, 0); // allocate an inode
    if (!inode)
        goto close_file;
    SHMEM_I(inode)->flags = flags & VM_ACCOUNT;
    d_instantiate(dentry, inode); // attach the inode to the dentry
    inode->i_size = size;
    inode->i_nlink = 0; // the file is unlinked
    file->f_vfsmnt = mntget(shm_mnt);
    file->f_dentry = dentry;
    file->f_mapping = inode->i_mapping;
    file->f_op = &shmem_file_operations; // file operations
    file->f_mode = FMODE_WRITE | FMODE_READ;
    return file;
close_file:
    put_filp(file);
put_dentry:
    dput(dentry);
put_memory:
    shmem_unacct_size(flags, size);
    return ERR_PTR(error);
}
The mapping function shmat
An application calls shmat() to map a shared memory segment into the address space of the calling process, after which the process can access the shared region directly.
In the kernel, the shmat system call is carried out by the function do_shmat, which allocates a descriptor, maps the shm segment, and adds the descriptor to a list. Its parameter shmaddr is the target address at which the current process asks for the segment to be mapped. The function do_shmat is analyzed below:
long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr)
{
    struct shmid_kernel *shp;
    unsigned long addr;
    unsigned long size;
    struct file *file;
    int err;
    unsigned long flags;
    unsigned long prot;
    unsigned long o_flags;
    int acc_mode;
    void *user_addr;

    if (shmid < 0) { // invalid shared memory ID
        err = -EINVAL;
        goto out;
    } else if ((addr = (ulong)shmaddr)) {
        if (addr & (SHMLBA - 1)) { // not a multiple of SHMLBA: the address is unaligned
            if (shmflg & SHM_RND) // SHM_RND asks for the address to be adjusted
                addr &= ~(SHMLBA - 1); // round down to the alignment boundary
            else
#ifndef __ARCH_FORCE_SHMLBA
                if (addr & ~PAGE_MASK)
#endif
                    return -EINVAL;
        }
        flags = MAP_SHARED | MAP_FIXED;
    } else {
        if ((shmflg & SHM_REMAP))
            return -EINVAL;
        flags = MAP_SHARED;
    }
    if (shmflg & SHM_RDONLY) { // read-only
        prot = PROT_READ;
        o_flags = O_RDONLY;
        acc_mode = S_IRUGO;
    } else {
        prot = PROT_READ | PROT_WRITE;
        o_flags = O_RDWR;
        acc_mode = S_IRUGO | S_IWUGO;
    }
    if (shmflg & SHM_EXEC) { // executable
        prot |= PROT_EXEC;
        acc_mode |= S_IXUGO;
    }
    // Look up the shmid_kernel structure in the array by id
    shp = shm_lock(shmid);
    if (shp == NULL) {
        err = -EINVAL;
        goto out;
    }
    err = shm_checkid(shp, shmid); // check that shmid is valid
    if (err) {
        shm_unlock(shp);
        goto out;
    }
    if (ipcperms(&shp->shm_perm, acc_mode)) { // check IPC permissions
        shm_unlock(shp);
        err = -EACCES;
        goto out;
    }
    err = security_shm_shmat(shp, shmaddr, shmflg); // security check
    if (err) {
        shm_unlock(shp);
        return err;
    }
    file = shp->shm_file; // get the file structure
    size = i_size_read(file->f_dentry->d_inode); // read the size of the file
    shp->shm_nattch++; // count the processes attached to the segment
    shm_unlock(shp);
    down_write(&current->mm->mmap_sem);
    if (addr && !(shmflg & SHM_REMAP)) {
        user_addr = ERR_PTR(-EINVAL);
        // If a VMA of the current process overlaps the requested range
        if (find_vma_intersection(current->mm, addr, addr + size))
            goto invalid;
        // If the shm segment lies below the stack, make sure there is room left
        // for the stack to grow (at least 4 pages)
        if (addr < current->mm->start_stack &&
            addr > current->mm->start_stack - size - PAGE_SIZE * 5)
            goto invalid;
    }
    // Map the file into the virtual address space of the process
    user_addr = (void *)do_mmap(file, addr, size, prot, flags, 0);
    ……
}
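The function shmdt() removes a process's mapping of a shared memory segment. The function shmctl performs control operations on a shared memory segment.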
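Seen from user space, the calls discussed in this section fit together as in the sketch below. It is only an illustration under stated assumptions: demo_shm_roundtrip is a hypothetical helper, the segment is created with IPC_PRIVATE and mode 0600, and error handling is reduced to returning -1.

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private segment, attach it, copy msg in and back out, then
 * detach and remove the segment. Returns 0 on success, -1 on failure. */
int demo_shm_roundtrip(const char *msg, char *out, size_t outlen)
{
    size_t len = strlen(msg) + 1;
    assert(out != NULL && outlen >= len);

    int shmid = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    /* A NULL address lets the kernel choose where to map the segment. */
    char *p = shmat(shmid, NULL, 0);
    if (p == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }

    memcpy(p, msg, len);   /* write through the mapping */
    memcpy(out, p, len);   /* read it back */

    int rc = shmdt(p);     /* remove the mapping */
    if (shmctl(shmid, IPC_RMID, NULL) == -1)   /* remove the segment */
        rc = -1;
    return rc;
}
```

In a real program a second process would attach the same segment by its ID or key; the single-process round trip here only shows the order of the calls.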
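Semaphores
A semaphore controls access to resources shared between processes; in the simplest case it ensures that only one process accesses a resource at a time. Semaphores are mainly used for synchronization between processes. A semaphore set is a collection of semaphores, used to synchronize processes over several kinds of shared resource. The value of a semaphore is the number of units of the shared resource that are currently available. A process that wants to acquire the resource subtracts the number of units it needs from the semaphore value; if not enough units are available, the process can either sleep until they are, or return immediately.
User-space semaphores are implemented in kernel space and used directly by user processes. The system calls that operate on semaphores are sys_semget(), sys_semop(), and sys_semctl().
Semaphore data structures
Semaphores are implemented with data structures provided by the kernel; Figure 4 shows how these structures relate. A single semaphore is described by the structure sem, and a semaphore set by the structure sem_array, whose sem_base member points to an array of semaphores. The semaphore data structures are described below.
Figure 4: Relationships between the semaphore data structures
(1) The semaphore structure sem
Each semaphore in the system is described by a sem structure, listed below (in include/linux/sem.h):
struct sem {
    int semval; // current value of the semaphore
    int sempid; // pid of the process that performed the last operation
};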
(2) The semaphore set structure sem_array
Each semaphore set in the system is described by a sem_array structure, listed below:
struct sem_array {
    struct kern_ipc_perm sem_perm; // IPC permission structure, holding uid, gid, and so on
    time_t sem_otime; // time of the last semaphore operation
    time_t sem_ctime; // time of the last change
    struct sem *sem_base; // pointer to the first semaphore in the set
    struct sem_queue *sem_pending; // pending operations waiting to be processed
    struct sem_queue **sem_pending_last; // last pending operation
    struct sem_undo *undo; // undo requests on this set
    unsigned long sem_nsems; // number of semaphores in the set
};
(3) The sleep queue structure sem_queue of a semaphore set
Each sleeping process in the system is described by a sem_queue structure, listed below:
struct sem_queue {
    struct sem_queue *next; // next entry in the queue
    struct sem_queue **prev;
    struct task_struct *sleeper; // the sleeping process
    struct sem_undo *undo;
    int pid; // pid of the requesting process
    int status; // completion status of the operation
    struct sem_array *sma; // semaphore set being operated on
    int id; // internal semaphore ID
    struct sembuf *sops; // array of pending operations
    int nsops; // number of operations
};
(4) The semaphore operation structure sembuf
The semop system call receives sembuf structures from user space. The member sem_op is an integer giving the number of resource units to acquire or release, and it is added to the current value of the corresponding semaphore. If adding it would make the semaphore value negative, the resources are not available and the current process goes to sleep and waits. The member sem_flg carries two flag bits: IPC_NOWAIT, which makes the call return immediately with the error EAGAIN instead of sleeping when the condition cannot be met, and SEM_UNDO, which makes the kernel give the resources back if the process exits without releasing them.
The structure sembuf is listed below:
struct sembuf {
    unsigned short sem_num; // index of the semaphore in the array
    short sem_op; /* semaphore operation value (positive, negative, or 0) */
    short sem_flg; // operation flags: IPC_NOWAIT or SEM_UNDO
};
(5) The deadlock recovery structure sem_undo
Suppose a process changes a semaphore to enter a critical section and then crashes or is killed before leaving it. The other processes suspended on that semaphore never get a chance to run, and the result is a deadlock.
To avoid this, the Linux kernel maintains lists of adjustments to the semaphore arrays, so that a semaphore can be put back into the state it had before the process's operations were applied.
Linux maintains at most one sem_undo structure per process for each semaphore array. A newly created sem_undo is queued both on the undo member of the process's task_struct and on the undo member of the semaphore array structure sem_array. When an operation with SEM_UNDO set is applied to a semaphore in the array, the negation of the operation value is added to that semaphore's adjustment value. For example, if the operation value is 2, then -2 is added to the semaphore's entry in semadj.
Each task has a list of undo requests, which are carried out automatically when the process exits: Linux walks the process's sem_undo structures and applies their adjustments to the semaphore arrays. If a semaphore set is deleted, its sem_undo structures remain in the process's task_struct, but the semaphore set identifier they hold becomes invalid.
The structure sem_undo is listed below:
struct sem_undo {
    // next sem_undo entry of this process, linked into the undo queue of task_struct
    struct sem_undo *proc_next;
    // next entry for this semaphore set, linked into the undo queue of sem_array
    struct sem_undo *id_next;
    int semid; // semaphore set ID
    short *semadj; // adjustments to the semaphore array, one per semaphore
};
The structure sem_undo_list controls shared access to the list of sem_undo structures, which may be shared by all the tasks of a CLONE_SYSVSEM task group. The structure sem_undo_list is listed below:
struct sem_undo_list {
    atomic_t refcnt;
    spinlock_t lock;
    struct sem_undo *proc_list;
};

struct sysv_sem {
    struct sem_undo_list *undo_list;
};
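The bookkeeping behind semadj can be modeled in a few lines. The sketch below is a simplified user-space model, not the kernel implementation: the demo_ types and functions are hypothetical stand-ins, the set size is fixed at two, and the range checks that a real kernel applies when it replays adjustments are left out.

```c
#include <assert.h>

#define DEMO_NSEMS 2   /* hypothetical number of semaphores in the set */

/* Simplified stand-ins for sem_array and sem_undo. */
struct demo_set  { int semval[DEMO_NSEMS]; };
struct demo_undo { short semadj[DEMO_NSEMS]; };

/* Apply one operation and record its negation in the adjustment array,
 * as happens for an operation that carries SEM_UNDO. */
void demo_op_with_undo(struct demo_set *set, struct demo_undo *un,
                       int num, short op)
{
    assert(num >= 0 && num < DEMO_NSEMS);
    set->semval[num] += op;
    un->semadj[num] -= op;
}

/* Model of process exit: add every recorded adjustment back to the
 * semaphores, which cancels the effect of the process's operations. */
void demo_exit_undo(struct demo_set *set, struct demo_undo *un)
{
    for (int i = 0; i < DEMO_NSEMS; i++) {
        set->semval[i] += un->semadj[i];
        un->semadj[i] = 0;
    }
}
```

Starting from the values 1 and 3, an operation of -1 on the first semaphore and +2 on the second leaves the adjustments 1 and -2; replaying them at exit restores 1 and 3.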
System call functions
The system calls that operate on semaphores are sys_semget(), sys_semop(), and sys_semctl(). Each is described below.
(1) The system call sys_semget
The system call sys_semget creates a semaphore set or obtains an existing one. The parameter nsems is the number of semaphores in the set; the parameter semflg holds the operation flags IPC_CREAT and IPC_EXCL together with the permission bits. Its definition is:
asmlinkage long sys_semget(key_t key, int nsems, int semflg)
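(2) The system call sys_semop
The system call sys_semop operates on semaphores. Its definition is:
asmlinkage long sys_semop(int semid, struct sembuf __user *tsops, unsigned nsops)
The parameter semid is the identifier of the semaphore set, which can be obtained with sys_semget; the parameter tsops points to the array of operations to perform; the parameter nsops is the number of operations.
For each operation, the operation value is added to the current value of the semaphore. A nonzero operation succeeds if the result is not negative; an operation value of 0 succeeds only if the current value is also 0. If any one of the operations cannot succeed, none of them is applied and Linux suspends the process. If the process may not be suspended (IPC_NOWAIT is set), the system call returns with an error indicating that the operation did not succeed, and the process continues to run. If the process is suspended, Linux saves the state of the semaphore operations and puts the current process on a wait queue.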
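The all-or-nothing rule for a group of operations can be written down as a small user-space model. This is a sketch of the rule only, not of the kernel's code: demo_sembuf and demo_try_semop are hypothetical names, flags are ignored, and "would block" is reported through the return value instead of putting anything to sleep.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced form of struct sembuf: index and operation value only. */
struct demo_sembuf {
    unsigned short sem_num;
    short sem_op;
};

/* Try to apply all nsops operations to the semval array.
 * Returns 0 if every operation succeeded and was applied.
 * Returns 1 if some operation would block; semval is then unchanged. */
int demo_try_semop(int *semval, size_t nsems,
                   const struct demo_sembuf *sops, size_t nsops)
{
    size_t i;

    assert(semval != NULL && sops != NULL);
    for (i = 0; i < nsops; i++) {
        assert(sops[i].sem_num < nsems);
        int *v = &semval[sops[i].sem_num];
        int op = sops[i].sem_op;

        /* op == 0 waits for the value to be 0; any other op must not
         * drive the value below 0. */
        if (op == 0 ? *v != 0 : *v + op < 0)
            break;
        *v += op;
    }
    if (i == nsops)
        return 0;

    /* Roll back the operations applied before the failing one. */
    while (i-- > 0)
        semval[sops[i].sem_num] -= sops[i].sem_op;
    return 1;
}
```

With the values 0 and 1, the pair of operations (semaphore 1, -1) then (semaphore 0, -1) is refused as a whole: the first decrement is undone when the second one cannot succeed.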
(3) The system call sys_semctl
The system call sys_semctl executes the specified control command. Its definition is:
asmlinkage long sys_semctl(int semid, int semnum, int cmd, union semun arg)
The parameter semid is the ID of the semaphore set, semnum is the index of a semaphore within the set, cmd is the control command, and arg is a union used to pass semaphore information in or to return it. The union semun is defined as follows:
union semun {
    int val; /* value for the SETVAL command */
    struct semid_ds __user *buf; /* buffer for the IPC_STAT and IPC_SET commands */
    unsigned short __user *array; /* semaphore array for the GETALL and SETALL commands */
    struct seminfo __user *__buf; /* buffer for the IPC_INFO command */
    void __user *__pad;
};
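Table 7 describes the values of the command parameter cmd of the system call sys_semctl.
Command | Description |
IPC_STAT | Reads the semid_ds structure of the semaphore set and returns it in the buf member of semun. |
IPC_SET | Sets the ipc_perm fields of the set's semid_ds structure from the values in buf of semun. |
IPC_RMID | Removes the semaphore set. |
GETALL | Reads the values of all semaphores in the set and returns them in array of semun. |
GETNCNT | Returns the number of processes waiting for the semaphore value to increase. |
GETPID | Returns the PID of the last process that called semop on the semaphore. |
GETVAL | Returns the value of a single semaphore in the set. |
GETZCNT | Returns the number of processes waiting for the semaphore value to become 0. |
SETALL | Sets the values of all semaphores in the set. |
SETVAL | Sets the value of a single semaphore in the set from val of semun. |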
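From user space, the three calls are reached through the C library functions semget, semop, and semctl. The sketch below shows a few of the commands from the table in use. It is an illustration under stated assumptions: the demo_ helpers and union demo_semun are hypothetical names (on Linux the caller has to define the union passed to semctl), the set is private with a single semaphore, and IPC_NOWAIT is always set so that nothing sleeps.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* The caller-defined argument union for semctl. */
union demo_semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private set with one semaphore and give it an initial value
 * with SETVAL. Returns the set ID, or -1 on failure. */
int demo_sem_create(int initial)
{
    assert(initial >= 0);
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    union demo_semun arg;
    arg.val = initial;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* Add delta to semaphore 0 without sleeping. Returns 0 on success, or
 * -1 with errno set to EAGAIN if the operation would have to wait. */
int demo_sem_adjust(int semid, short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = IPC_NOWAIT };
    return semop(semid, &op, 1);
}

/* GETVAL: read the current value of semaphore 0. */
int demo_sem_value(int semid)
{
    return semctl(semid, 0, GETVAL);
}

/* IPC_RMID: remove the whole set. */
int demo_sem_remove(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```

A set created with the value 1 accepts one decrement; a second decrement fails with EAGAIN until the value is raised again.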
Implementation of the system calls
(1) The initialization function sem_init
The function sem_init initializes the global semaphore structure sem_ids and adds the semaphore file to the /proc filesystem. It is listed below (in ipc/sem.c):
static struct ipc_ids sem_ids;

void __init sem_init(void)
{
    used_sems = 0;
    // Initialize the global structure sem_ids, sized for at most sc_semmni (128 by default) semaphore IDs
    ipc_init_ids(&sem_ids, sc_semmni);
    // Create the semaphore file and its contents in the proc filesystem
#ifdef CONFIG_PROC_FS
    create_proc_read_entry("sysvipc/sem", 0, NULL, sysvipc_sem_read_proc, NULL);
#endif
}
(2) The system call sys_semget
The system call sys_semget creates or opens a semaphore set. It is listed below:
asmlinkage long sys_semget(key_t key, int nsems, int semflg)
{
    struct ipc_namespace *ns;
    struct ipc_ops sem_ops;
    struct ipc_params sem_params;

    ns = current->nsproxy->ipc_ns;
    if (nsems < 0 || nsems > ns->sc_semmsl)
        return -EINVAL;
    sem_ops.getnew = newary;
    sem_ops.associate = sem_security;
    sem_ops.more_checks = sem_more_checks;
    sem_params.key = key;
    sem_params.flg = semflg;
    sem_params.u.nsems = nsems;
    return ipcget(ns, &sem_ids(ns), &sem_ops, &sem_params);
}
The function newary allocates a new semaphore set and adds the set's sem_perm member to the global structure sem_ids, so that other processes can find the set through sem_ids. It also initializes the members of the set and obtains the set's ID. The function newary is listed below:
#define IN_WAKEUP 1

static int newary(key_t key, int nsems, int semflg)
{
    int id;
    int retval;
    struct sem_array *sma;
    int size;

    if (!nsems)
        return -EINVAL;
    if (used_sems + nsems > sc_semmns)
        return -ENOSPC;
    // Size of the set = size of the sem_array structure + nsems * size of a sem structure
    size = sizeof(*sma) + nsems * sizeof(struct sem);
    sma = ipc_rcu_alloc(size); // allocate the memory
    if (!sma) {
        return -ENOMEM;
    }
    memset(sma, 0, size);
    sma->sem_perm.mode = (semflg & S_IRWXUGO);
    sma->sem_perm.key = key;
    sma->sem_perm.security = NULL;
    retval = security_sem_alloc(sma); // security check
    if (retval) {
        ipc_rcu_putref(sma);
        return retval;
    }
    // Add an IPC ID to the global structure sem_ids
    id = ipc_addid(&sem_ids, &sma->sem_perm, sc_semmni);
    if (id == -1) {
        security_sem_free(sma);
        ipc_rcu_putref(sma);
        return -ENOSPC;
    }
    used_sems += nsems;
    sma->sem_base = (struct sem *)&sma[1];
    /* sma->sem_pending = NULL; */
    sma->sem_pending_last = &sma->sem_pending;
    /* sma->undo = NULL; */
    sma->sem_nsems = nsems;
    sma->sem_ctime = get_seconds();
    sem_unlock(sma);
    // Build the semaphore set ID from the IPC ID: SEQ_MULTIPLIER * seq + id
    return sem_buildid(id, sma->sem_perm.seq);
}
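(3) System call sys_semop
The system call sys_semop performs operations on semaphores and determines whether the current process has to sleep and wait. It is listed below: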
asmlinkage long sys_semop(int semid, struct sembuf __user *tsops, unsigned nsops)
{
    return sys_semtimedop(semid, tsops, nsops, NULL);
}
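From user space this system call is reached through the semop(2) library function. The following is a minimal sketch, not taken from the original text, that creates a private set with one semaphore and shows a non-blocking acquire failing and then succeeding. The function name semop_demo and its step-number return values are illustrative assumptions, and union semun has to be defined by the caller as semctl(2) requires.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* semctl(2) requires the calling program to define this union itself. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Hypothetical demo: returns 0 if every step behaved as expected,
 * otherwise the number of the first step that did not.
 */
int semop_demo(void)
{
    int rc = 0;
    union semun arg;
    struct sembuf take = { .sem_num = 0, .sem_op = -1,
                           .sem_flg = IPC_NOWAIT | SEM_UNDO };
    int semid = semget(IPC_PRIVATE, 1, 0600);

    if (semid < 0)
        return 1;

    arg.val = 0;
    if (semctl(semid, 0, SETVAL, arg) < 0) { rc = 2; goto out; }

    /* Value is 0: a non-blocking acquire must fail with EAGAIN. */
    if (semop(semid, &take, 1) != -1 || errno != EAGAIN) { rc = 3; goto out; }

    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) < 0) { rc = 4; goto out; }

    /* Value is 1: the same operation now succeeds and leaves 0 behind. */
    if (semop(semid, &take, 1) != 0) { rc = 5; goto out; }
    if (semctl(semid, 0, GETVAL) != 0) { rc = 6; goto out; }

out:
    semctl(semid, 0, IPC_RMID);  /* always remove the set again */
    return rc;
}
```

Calling semop_demo() from a test program should yield 0 on a system where System V IPC is available. The SEM_UNDO flag asks the kernel to record the adjustment so that it can be reverted if the process exits while still holding the resource.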
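Each process keeps a queue of sem_undo structures in its task_struct to guard against deadlock. The queue records resources that the process has taken but not yet returned, that is, the "debts" of the process, and the kernel repays them when the process exits. The function sys_semtimedop is listed below: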
asmlinkage long sys_semtimedop(int semid, struct sembuf __user *tsops,
        unsigned nsops, const struct timespec __user *timeout)
{
    int error = -EINVAL;
    struct sem_array *sma;
    struct sembuf fast_sops[SEMOPM_FAST];
    struct sembuf *sops = fast_sops, *sop;
    struct sem_undo *un;
    int undos = 0, decrease = 0, alter = 0, max;
    struct sem_queue queue;
    unsigned long jiffies_left = 0;

    if (nsops < 1 || semid < 0)
        return -EINVAL;
    if (nsops > sc_semopm)
        return -E2BIG;
    if (nsops > SEMOPM_FAST) {
        // allocate space for a larger array of semaphore operations
        sops = kmalloc(sizeof(*sops) * nsops, GFP_KERNEL);
        if (sops == NULL)
            return -ENOMEM;
    }
    // copy the semaphore operations from user space
    if (copy_from_user(sops, tsops, nsops * sizeof(*tsops))) {
        error = -EFAULT;
        goto out_free;
    }
    if (timeout) {
        struct timespec _timeout;
        // copy the timeout from user space
        if (copy_from_user(&_timeout, timeout, sizeof(*timeout))) {
            error = -EFAULT;
            goto out_free;
        }
        if (_timeout.tv_sec < 0 || _timeout.tv_nsec < 0 ||
            _timeout.tv_nsec >= 1000000000L) {
            error = -EINVAL;
            goto out_free;
        }
        // convert the timeout into kernel ticks (jiffies)
        jiffies_left = timespec_to_jiffies(&_timeout);
    }
    max = 0;
    for (sop = sops; sop < sops + nsops; sop++) {
        if (sop->sem_num >= max)        // track the highest semaphore index used
            max = sop->sem_num;
        if (sop->sem_flg & SEM_UNDO)    // operation requests undo
            undos++;
        if (sop->sem_op < 0)            // negative: acquire resources
            decrease = 1;
        if (sop->sem_op > 0)            // positive: return resources
            alter = 1;
    }
    alter |= decrease;

retry_undos:
    if (undos) {
        /* look up the sem_undo structure un in the undo_list of the current
           process; if there is none, allocate one for the semaphore set
           identified by semid and initialize it */
        un = find_undo(semid);
        if (IS_ERR(un)) {
            error = PTR_ERR(un);
            goto out_free;
        }
    } else
        un = NULL;

    // find the semaphore set by ID in the global structure sem_ids
    sma = sem_lock(semid);
    error = -EINVAL;
    if (sma == NULL)
        goto out_free;
    error = -EIDRM;
    if (sem_checkid(sma, semid))    // check that semid really corresponds to sma
        goto out_unlock_free;
    if (un && un->semid == -1) {
        sem_unlock(sma);
        goto retry_undos;
    }
    error = -EFBIG;
    if (max >= sma->sem_nsems)
        goto out_unlock_free;

    error = -EACCES;
    // check the IPC access permissions
    if (ipcperms(&sma->sem_perm, alter ? S_IWUGO : S_IRUGO))
        goto out_unlock_free;

    error = security_sem_semop(sma, sops, nsops, alter);    // security check
    if (error)
        goto out_unlock_free;

    /* perform the operations atomically: 0 means success and the current
       process has obtained all resources, a negative value means failure,
       1 means the process has to sleep and wait */
    error = try_atomic_semop(sma, sops, nsops, un, current->tgid);
    if (error <= 0)    // no need to sleep: go update the queue
        goto update;

    /* we have to sleep on this operation: fill in the queue entry, put the
       current process on the pending queue, and go to sleep */
    queue.sma = sma;
    queue.sops = sops;
    queue.nsops = nsops;
    queue.undo = un;
    queue.pid = current->tgid;
    queue.id = semid;
    // link the sem_queue entry that represents the current process into sma->sem_pending
    if (alter)
        append_to_queue(sma, &queue);     // add at the tail
    else
        prepend_to_queue(sma, &queue);    // add at the head

    queue.status = -EINTR;
    queue.sleeper = current;    // the sleeper is the current process
    current->state = TASK_INTERRUPTIBLE;
    sem_unlock(sma);

    // schedule
    if (timeout)
        jiffies_left = schedule_timeout(jiffies_left);
    else
        schedule();

    error = queue.status;
    while (unlikely(error == IN_WAKEUP)) {
        cpu_relax();
        error = queue.status;
    }

    if (error != -EINTR) {
        // update_queue already performed the requested operations for us
        goto out_free;    // normal exit
    }

    // find the semaphore set by ID in the global structure sem_ids
    sma = sem_lock(semid);
    if (sma == NULL) {
        if (queue.prev != NULL)
            BUG();
        error = -EIDRM;
        goto out_free;
    }

    // if queue.status != -EINTR, we were woken up by another process
    error = queue.status;
    if (error != -EINTR) {
        goto out_unlock_free;
    }

    // an interruption occurred, so we have to remove ourselves from the queue
    if (timeout && jiffies_left == 0)
        error = -EAGAIN;
    remove_from_queue(sma, &queue);
    goto out_unlock_free;

update:
    if (alter)    // the operations changed semaphore values
        update_queue(sma);
out_unlock_free:
    sem_unlock(sma);
out_free:
    if (sops != fast_sops)
        kfree(sops);
    return error;
}
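The function try_atomic_semop determines whether a whole series of semaphore operations can succeed. It returns 0 on success, 1 if the caller has to sleep, and any other value on error. The function try_atomic_semop is listed below: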
static int try_atomic_semop(struct sem_array *sma, struct sembuf *sops,
        int nsops, struct sem_undo *un, int pid)
{
    int result, sem_op;
    struct sembuf *sop;
    struct sem *curr;

    // walk through every semaphore operation
    for (sop = sops; sop < sops + nsops; sop++) {
        curr = sma->sem_base + sop->sem_num;    // semaphore this operation refers to
        sem_op = sop->sem_op;
        result = curr->semval;                  // current value of the semaphore

        if (!sem_op && result)
            goto would_block;

        result += sem_op;                       // value + operation
        if (result < 0)                         // below 0: no resource available, must block
            goto would_block;
        if (result > SEMVMX)                    // outside the range of semaphore values
            goto out_of_range;                  // go restore the previous semval values
        if (sop->sem_flg & SEM_UNDO) {
            // undo adjustment: subtract the operation value
            int undo = un->semadj[sop->sem_num] - sem_op;
            // exceeding the undo range is an error
            if (undo < (-SEMAEM - 1) || undo > SEMAEM)
                goto out_of_range;
        }
        curr->semval = result;
    }

    // walk the operations again, from last to first
    sop--;
    while (sop >= sops) {
        // record pid as the last process that operated on each semaphore
        sma->sem_base[sop->sem_num].sempid = pid;
        if (sop->sem_flg & SEM_UNDO)
            // save the undo value of the operation
            un->semadj[sop->sem_num] -= sop->sem_op;
        sop--;
    }

    // record the time of the operation
    sma->sem_otime = get_seconds();
    return 0;

out_of_range:
    result = -ERANGE;
    goto undo;

would_block:    // the process has to block
    if (sop->sem_flg & IPC_NOWAIT)    // do not wait, return immediately
        result = -EAGAIN;
    else
        result = 1;                   // must wait

undo:
    // subtract all operations already applied, restoring the previous semval values
    sop--;
    while (sop >= sops) {
        sma->sem_base[sop->sem_num].semval -= sop->sem_op;
        sop--;
    }

    return result;
}
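The all-or-nothing behaviour of try_atomic_semop can be tried out in user space with a simplified model. The sketch below is an illustration only: the names model_op, model_atomic_semop and MODEL_SEMVMX are hypothetical, plain integers stand in for struct sem, and the SEM_UNDO bookkeeping is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_SEMVMX 32767  /* assumed ceiling, standing in for the kernel's SEMVMX */

struct model_op {
    int sem_num;  /* index into the value array */
    int sem_op;   /* <0 acquire, >0 release, 0 wait for zero */
    int nowait;   /* non-zero plays the role of IPC_NOWAIT */
};

/*
 * Applies every operation to vals[] or none of them.
 * Returns 0 on success, 1 if the caller would have to sleep,
 * -EAGAIN if it would block but nowait is set, -ERANGE on overflow.
 */
int model_atomic_semop(int *vals, const struct model_op *ops, size_t nops)
{
    size_t i;
    int result = 0;

    assert(vals != NULL && ops != NULL);

    for (i = 0; i < nops; i++) {
        int v = vals[ops[i].sem_num];

        if ((ops[i].sem_op == 0 && v != 0) || v + ops[i].sem_op < 0) {
            result = ops[i].nowait ? -EAGAIN : 1;  /* would block */
            break;
        }
        if (v + ops[i].sem_op > MODEL_SEMVMX) {
            result = -ERANGE;                      /* value out of range */
            break;
        }
        vals[ops[i].sem_num] = v + ops[i].sem_op;
    }
    if (result == 0)
        return 0;

    /* ops[i] failed: undo ops[0..i-1] in reverse order. */
    while (i-- > 0)
        vals[ops[i].sem_num] -= ops[i].sem_op;
    return result;
}
```

With values {1, 0}, a request that acquires one unit of each semaphore returns 1 and leaves both values untouched, because the first decrement is rolled back when the second one cannot be satisfied.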
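The function update_queue walks the pending queue and retries the semaphore operations of each waiting process. Every process whose operations can now complete is removed from the queue and woken up. The function update_queue is listed below: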
static void update_queue(struct sem_array *sma)
{
    int error;
    struct sem_queue *q;

    q = sma->sem_pending;
    while (q) {    // walk the queue of sleeping waiters and retry their operations
        error = try_atomic_semop(sma, q->sops, q->nsops, q->undo, q->pid);

        // does q->sleeper still have to sleep?
        if (error <= 0) {    // no, it does not need to wait any longer
            struct sem_queue *n;
            remove_from_queue(sma, q);         // take the waiter off the queue
            n = q->next;
            q->status = IN_WAKEUP;
            wake_up_process(q->sleeper);       // wake the sleeping process
            // q will disappear immediately after q->status is written
            q->status = error;
            q = n;
        } else {
            q = q->next;
        }
    }
}
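Fast userspace mutex (Futex)
A fast userspace mutex (futex) is a fast lock that lives in user space. It is an alternative to the traditional synchronization methods, such as System V semaphores, file locks, and message queues, which require a system call on every lock access. A futex enters the kernel through a system call only when an operation is contended, so when contention is rare it greatly reduces the overhead.
In the uncontended case a futex can be acquired and released entirely in user space without entering the kernel. Like a semaphore, it has a counter that can be incremented and decremented atomically, and a process can wait for the counter to become positive. When there is contention, user processes use a system call to let the kernel arbitrate.
A futex is an integer in user space that is shared by multiple threads or processes. The futex system call operates on this integer and arbitrates contended accesses. The NPTL library in glibc wraps the futex system call and abstracts the futex interface. Users call the thread synchronization API functions of NPTL just as in traditional programming and do not notice that futexes are involved.
The futex mechanism works as follows. If the current process tries to enter a critical section that another process is using, it marks the lock with a value meaning "a waiter is pending" and calls sys_futex(FUTEX_WAIT) to wait for the other process to release it. The kernel creates a futex queue internally so that wakers can later be matched with waiters. When the thread that owns the critical section releases the futex, it sees from the value of the variable that waiters are still pending and calls sys_futex(FUTEX_WAKE) to wake them. Once all waiters have acquired the resource and released the lock, the futex returns to the uncontended state and no kernel state is associated with it.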
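The mechanism just described can be sketched as a small mutex built directly on the futex system call. This is a simplified illustration under stated assumptions, not the NPTL implementation: the type futex_mutex, the helper futex_call, and the three-state encoding (0 unlocked, 1 locked, 2 locked with possible waiters) are choices made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
typedef struct {
    atomic_int state;
} futex_mutex;

/* glibc provides no futex() wrapper, so the call goes through syscall(2). */
static long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void futex_mutex_lock(futex_mutex *m)
{
    int c = 0;

    assert(m != NULL);
    /* Fast path: 0 -> 1 happens entirely in user space. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;

    /* Slow path: mark "waiters pending" and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_call(&m->state, FUTEX_WAIT, 2);  /* sleeps only while state == 2 */
        c = atomic_exchange(&m->state, 2);
    }
}

void futex_mutex_unlock(futex_mutex *m)
{
    assert(m != NULL);
    /* Fast path: 1 -> 0 means nobody is waiting, so no system call is made. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex_call(&m->state, FUTEX_WAKE, 1);  /* wake one waiter */
    }
}
```

A futex_mutex initialized to zero can be locked and unlocked by a single thread without a single system call. The kernel is involved only when the lock word shows the value 2.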
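Robust futexes are an enhancement of futexes that handles the case where the holder of a lock crashes. For example, a process that holds a pthread_mutex_t lock for which other processes are contending may exit prematurely for some unexpected reason, such as a segmentation fault or being forcibly terminated by a user with the shell command "kill -9". In that case a mechanism is needed to tell the waiters that "the last holder of the lock exited abnormally".
To solve this kind of problem, NPTL provides robust mutexes through the user-space API pthread_mutex_lock(). If the owner of the lock exits prematurely, pthread_mutex_lock() returns an error value (EOWNERDEAD), and the new owner can decide whether the data protected by the lock can be recovered safely.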
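A minimal sketch of the robust mutex API follows. It assumes a glibc where the pthread functions can be linked (older versions need -pthread), and the names robust_mtx, owner_dies and robust_mutex_demo are made up for this example. A thread that ends while holding the lock stands in for the crashed owner.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t robust_mtx;

/* Locks the mutex and ends without unlocking, simulating a crashed owner. */
static void *owner_dies(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&robust_mtx);
    return NULL;
}

/*
 * Returns the result of the first pthread_mutex_lock() after the owner is
 * gone, or -1 if the setup itself failed.
 */
int robust_mutex_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0)
        return -1;
    if (pthread_mutex_init(&robust_mtx, &attr) != 0)
        return -1;
    pthread_mutexattr_destroy(&attr);

    if (pthread_create(&tid, NULL, owner_dies, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);  /* the owner is gone, the lock was never released */

    rc = pthread_mutex_lock(&robust_mtx);
    if (rc == EOWNERDEAD) {
        /* The protected data would be repaired here before continuing. */
        pthread_mutex_consistent(&robust_mtx);
    }
    if (rc == 0 || rc == EOWNERDEAD)
        pthread_mutex_unlock(&robust_mtx);
    pthread_mutex_destroy(&robust_mtx);
    return rc;
}
```

On Linux with NPTL the function is expected to return EOWNERDEAD. Calling pthread_mutex_consistent() before unlocking marks the mutex as usable again, otherwise later lock attempts fail with ENOTRECOVERABLE.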
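Signals
Signal overview
A signal is used to send an asynchronous event to one or more processes. It is a software-level imitation of the interrupt mechanism: the way a process handles a received signal resembles the way a processor handles an interrupt request. Among the interprocess communication mechanisms only signals are asynchronous. A process does not have to perform any operation to wait for a signal, and it does not know when a signal will arrive. Signals originate from hardware (for example, a hardware fault) or from software (for example, an illegal operation). With the POSIX real-time extensions the signal mechanism became more powerful: besides basic notification, a signal can also carry additional information.
(1) Signal definitions
The Linux kernel represents signals as bits in a word-sized bitmap, one bit per signal, so a single word on a 32-bit platform covers 32 signals. Linux predefines a set of signals, which can be generated by kernel threads or by user processes. The signals defined by POSIX.1 are described in Table 1. They are defined in include/asm-x86/signal.h.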
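Table 1 Signals defined by POSIX.1
Signal | Value | Action | Reason the signal is raised |
SIGHUP | 1 | A | Hangup of the controlling terminal or death of the controlling process |
SIGINT | 2 | A | Interrupt from the keyboard (for example, the break key is pressed) |
SIGQUIT | 3 | C | Quit key pressed on the keyboard |
SIGILL | 4 | C | Illegal instruction |
SIGABRT | 6 | C | Abort signal raised by abort(3) |
SIGFPE | 8 | C | Floating-point exception |
SIGKILL | 9 | AEF | Kill signal |
SIGSEGV | 11 | C | Invalid memory reference |
SIGPIPE | 13 | A | Broken pipe: write to a pipe that has no reader |
SIGALRM | 14 | A | Timer signal raised by alarm(2) |
SIGTERM | 15 | A | Termination signal |
SIGUSR1 | 30,10,16 | A | User-defined signal 1 |
SIGUSR2 | 31,12,17 | A | User-defined signal 2 |
SIGCHLD | 20,17,18 | B | Child process stopped or terminated |
SIGCONT | 19,18,25 | - | Continue the process if it was stopped |
SIGSTOP | 17,19,23 | DEF | Stop the process |
SIGTSTP | 18,20,24 | D | Stop key typed on the controlling terminal (tty) |
SIGTTIN | 21,21,26 | D | Background process attempts to read from the controlling terminal |
SIGTTOU | 22,22,27 | D | Background process attempts to write to the controlling terminal |
Notes:
1. The "Value" column gives the signal number on different hardware platforms: the first value applies to Alpha and Sparc, the middle value to i386, ppc and sh, and the last value to mips.
2. Letters in the "Action" column: A means the default action is to terminate the process, B means the default action is to ignore the signal, C means the default action is to terminate the process and dump core, D means the default action is to stop the process, E means the signal cannot be caught, and F means the signal cannot be ignored.
3. The signals SIGKILL and SIGSTOP can be neither caught nor ignored.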
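Signals serve two main purposes: to make a process aware that a specific event has occurred, and to force a process to execute a signal handler. Signals are caused by events, which come from the following sources:
- Exceptions: an exception occurs while the process is running;
- Other processes: a process can send a signal to another process or to a group of processes;
- Terminal interrupts: keys such as Ctrl-C or Ctrl-\ are pressed;
- Job control: management of foreground and background processes;
- Quotas: the CPU time limit or the file size limit is exceeded;
- Notifications: the process is told that some event occurred, such as I/O becoming ready;
- Alarms: a timer expires.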
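(2) Real-time and non-real-time signals
Non-real-time signals are the regular signals whose numbers lie below SIGRTMIN (defined as 32 in the x86 kernel headers). When such a signal is sent several times before it is handled, only one instance reaches the receiving process.
Real-time signals are the signals whose numbers lie between SIGRTMIN (32) and SIGRTMAX (64). They were added by the POSIX standard as an extension of the regular signals. Real-time signals are queued: when a process sends the same signal several times, every instance is received.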
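The difference in queuing can be observed from user space. The following sketch is an illustration whose function and variable names are made up. It blocks SIGUSR1 and SIGRTMIN, sends each of them n times to the calling process, and counts how many times each handler runs after the signals are unblocked. It assumes the Linux behaviour that all pending signals are delivered before sigprocmask() returns. Note that glibc reserves the lowest real-time signals for its own use, so the user-space macro SIGRTMIN is larger than the kernel's value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t std_hits, rt_hits;

static void on_std(int sig) { (void)sig; std_hits++; }
static void on_rt(int sig)  { (void)sig; rt_hits++; }

/*
 * Sends SIGUSR1 and SIGRTMIN n times each while both are blocked, then
 * unblocks them. Stores the handler counts and returns 0, or -1 on error.
 */
int count_deliveries(int n, int *std_count, int *rt_count)
{
    struct sigaction sa;
    sigset_t set;
    int i;

    assert(std_count != NULL && rt_count != NULL);
    std_hits = rt_hits = 0;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_std;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;
    sa.sa_handler = on_rt;
    if (sigaction(SIGRTMIN, &sa, NULL) < 0)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigaddset(&set, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
        return -1;

    for (i = 0; i < n; i++) {
        kill(getpid(), SIGUSR1);   /* regular signal: one pending instance at most */
        kill(getpid(), SIGRTMIN);  /* real-time signal: every instance is queued */
    }

    /* Unblocking lets the pending signals be delivered. */
    if (sigprocmask(SIG_UNBLOCK, &set, NULL) < 0)
        return -1;

    *std_count = std_hits;
    *rt_count = rt_hits;
    return 0;
}
```

With n set to 3, the regular signal handler is expected to run once and the real-time handler three times.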
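(3) Responding to signals
The life cycle of a signal consists of its generation, the pending state, and the response. A pending signal is one that has been sent but not yet received. The response carries out the registered action to deliver or handle the signal.
When one process sends a signal to another through a system call, the Linux kernel sets the bit for that signal in the signal field of the receiving process's task structure. If the receiving process is sleeping in an interruptible task state, it is woken up. If it is sleeping in any other state, only the bit is set and the process is not woken.
The receiving process checks for signals when it returns from a system call and when it enters or leaves a sleep state. The receiver therefore does not respond to a signal immediately. It runs the corresponding handler only at one of these checkpoints.
A process can respond to a signal in three ways: ignore it, catch it, or perform the default action. Ignoring a signal means that the signal is received but no handler is run. Ignoring differs from blocking: blocking filters the signal out with a mask so that it is not delivered, whereas an ignored signal is delivered but no handler is run for it.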
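The difference between blocking and ignoring is visible through sigpending(2). The sketch below uses made-up function names and SIGUSR2 as an arbitrary choice: a blocked signal stays pending until the mask is lifted, while an ignored signal leaves nothing behind and runs no handler.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t caught;

static void on_usr2(int sig) { (void)sig; caught++; }

/* Returns 1 if sig is pending for the caller, 0 if not, -1 on error. */
int is_pending(int sig)
{
    sigset_t p;

    assert(sig > 0);
    if (sigpending(&p) < 0)
        return -1;
    return sigismember(&p, sig);
}

/* Blocks SIGUSR2, raises it, and reports whether it stayed pending. */
int blocked_signal_stays_pending(void)
{
    struct sigaction sa;
    sigset_t set;
    int pending;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_usr2;
    if (sigaction(SIGUSR2, &sa, NULL) < 0)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR2);
    if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
        return -1;

    raise(SIGUSR2);
    pending = is_pending(SIGUSR2);         /* held back by the mask */
    sigprocmask(SIG_UNBLOCK, &set, NULL);  /* now the handler runs */
    return pending;
}

/* Ignores SIGUSR2, raises it, and reports whether anything is left pending. */
int ignored_signal_is_discarded(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = SIG_IGN;
    if (sigaction(SIGUSR2, &sa, NULL) < 0)
        return -1;

    raise(SIGUSR2);
    return is_pending(SIGUSR2);
}

/* Number of times the SIGUSR2 handler has run so far. */
int times_caught(void)
{
    return caught;
}
```

Calling blocked_signal_stays_pending() first and ignored_signal_is_discarded() second should give 1 and 0, with the handler having run exactly once in total.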
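Catching a signal means defining a handler for it, so that the user-defined handler runs when the signal occurs. The user-defined handler lives in user space, but the signal check takes place in kernel space, and user-space functions are not allowed to run in kernel space. The kernel therefore creates a new frame on the user stack and sets the return address in that frame to the address of the user-defined handler. When the process returns from the kernel and pops the top of the stack, it lands in the user-defined function. When that function returns and the stack is popped again, execution goes back to the point where the process originally entered the kernel.
Performing the default action means running the default action that Linux defines for each kind of signal.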
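Signal-related system calls
The Linux kernel provides two sets of system calls, one for non-real-time signals and one for real-time signals. They let user processes send signals, set signal handlers, suspend while waiting for a signal, and so on. Table 3 describes what these system calls do.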
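Table 3 Signal-related system calls
Signal type | System call | Description |
Non-real-time | sys_signal | Older interface, superseded by sys_sigaction. |
Non-real-time | sys_kill | Sends a signal to a process (that is, a thread group) or to a process group. |
Non-real-time | sys_tkill | Sends a signal to a single process (thread). |
Non-real-time | sys_tgkill | Sends a signal to a process within a specific thread group. |
Non-real-time | sys_sigaction | Sets or changes the handler of a signal. |
Non-real-time | sys_sigsuspend | Suspends the process until a signal arrives. |
Non-real-time | sys_sigpending | Checks whether there are pending signals. |
Non-real-time | sys_sigreturn | Called automatically when a user signal handler finishes. Restores the process context saved on the signal stack into the context on the kernel stack. |
Non-real-time | sys_sigprocmask | Modifies the set of blocked signals. |
Non-real-time | sys_sigaltstack | Lets a process define an alternate signal stack. |
Real-time | sys_rt_sigreturn | Same as sys_sigreturn. |
Real-time | sys_rt_sigaction | Same as sys_sigaction. |
Real-time | sys_rt_sigprocmask | Same as sys_sigprocmask. |
Real-time | sys_rt_sigpending | Same as sys_sigpending. |
Real-time | sys_rt_sigtimedwait | Waits synchronously, with an optional timeout, for one of a set of signals to become pending. |
Real-time | sys_rt_sigqueueinfo | Sends a signal together with accompanying information. |
Real-time | sys_rt_sigsuspend | Same as sys_sigsuspend. |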
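Two of the real-time calls in the table are reachable from user space as sigqueue(3) and sigtimedwait(2). The sketch below, whose function name send_and_wait is an assumption of this example, queues SIGRTMIN to the calling process with an integer payload and collects it synchronously, without installing a handler.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/*
 * Queues SIGRTMIN to the calling process with payload attached and receives
 * it synchronously. Returns the received payload, or -1 on failure or timeout.
 */
int send_and_wait(int payload)
{
    sigset_t set, old;
    siginfo_t info;
    union sigval value;
    struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
    int result = -1;

    assert(payload >= 0);

    /* The signal must be blocked so that it stays pending for sigtimedwait(). */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &set, &old) < 0)
        return -1;

    value.sival_int = payload;
    if (sigqueue(getpid(), SIGRTMIN, value) == 0 &&
        sigtimedwait(&set, &info, &timeout) == SIGRTMIN)
        result = info.si_value.sival_int;

    sigprocmask(SIG_SETMASK, &old, NULL);  /* restore the previous mask */
    return result;
}
```

A call such as send_and_wait(42) is expected to return 42, which shows the additional information that a real-time signal can carry.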
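Signal-related data structures
The relationships among the signal-related data structures are shown in Figure 1. The individual data structures are described below, following that figure.
Figure 1 Relationships among the signal-related data structures
(1) Signal fields in the process descriptor
The process descriptor task_struct contains data members for signal handling, which store signal information and are used to process signals. The signal-related members of task_struct are listed below: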
struct task_struct {
    …
    int sigpending;
    ……
    struct signal_struct *signal;    // all signals waiting to be handled by this process
    struct sighand_struct *sighand;
    ……
    int exit_code, exit_signal;
    int pdeath_signal;               // signal sent when the parent dies
    ……
    spinlock_t sigmask_lock;         // protects signal and blocked
    struct signal_struct *sig;
    /* blocked is a bitmap holding the mask of signals this process blocks.
       A bit set to 1 means the corresponding signal is blocked. Every signal
       except SIGSTOP and SIGKILL can be blocked. A blocked signal stays
       pending until the process unblocks it. */
    sigset_t blocked;
    struct sigpending pending;
    // location of the stack used while the process runs a signal handler in user space
    unsigned long sas_ss_sp;
    size_t sas_ss_size;              // size of that stack
    ……
};
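(2) Signal descriptor signal_struct
The signal descriptor signal_struct tracks pending signals and also holds some process information that signal handling needs, such as the resource limit array rlim and timer variables. All processes in the same thread group share one signal descriptor, which contains the pending-signal queue shared by the group.
signal_struct has no lock of its own, because a shared signal structure always implies a shared signal handler structure, so locking the signal handler structure (sighand_struct) always locks the signal structure as well.
signal_struct is listed below (in include/linux/sched.h):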
    struct signal_struct {
        atomic_t count;
        atomic_t live;

        wait_queue_head_t wait_chldexit;   /* used by wait4() */

        /* current load-balancing target for signals sent to the thread group */
        struct task_struct *curr_target;

        /* shared pending signals */
        struct sigpending shared_pending;

        /* thread-group exit support */
        int group_exit_code;
        /* overloaded:
         * - notify group_exit_task when ->count equals notify_count
         * - while a fatal signal is being delivered, every task except
         *   group_exit_task is stopped; group_exit_task handles the signal */
        struct task_struct *group_exit_task;
        int notify_count;

        /* thread-group stop support */
        int group_stop_count;
        unsigned int flags;   /* SIGNAL_* flags */

        /* POSIX.1b interval timers */
        struct list_head posix_timers;

        /* ITIMER_REAL real-time timer for the process */
        struct hrtimer real_timer;
        struct pid *leader_pid;
        ktime_t it_real_incr;

        /* ITIMER_PROF and ITIMER_VIRTUAL timers for the process */
        cputime_t it_prof_expires, it_virt_expires;
        cputime_t it_prof_incr, it_virt_incr;

        /* job control IDs */
        /* the pgrp and session fields are deprecated;
         * use task_session_Xnr and task_pgrp_Xnr instead */
        union {
            pid_t pgrp __deprecated;
            pid_t __pgrp;
        };

        struct pid *tty_old_pgrp;

        union {
            pid_t session __deprecated;
            pid_t __session;
        };

        /* is this task the session leader? */
        int leader;

        struct tty_struct *tty;   /* NULL if there is no controlling tty */

        /* Cumulative resource counters for dead threads in the group and for
         * reaped dead children forked by the group. Live threads keep their
         * own counters and add them to these in __exit_signal, except for
         * the group leader. */
        cputime_t utime, stime, cutime, cstime;
        cputime_t gtime;
        cputime_t cgtime;
        unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
        unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
        unsigned long inblock, oublock, cinblock, coublock;

        /* cumulative scheduled CPU time (ns) of dead threads in the group,
         * not including a zombie group leader */
        unsigned long long sum_sched_runtime;

        struct rlimit rlim[RLIM_NLIMITS];   /* resource limits */
        struct list_head cpu_timers[3];
        ……
    };
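(2) Signal handler structure sighand_struct
Every process has a signal handler structure, sighand_struct, which holds the actions for all signals. The structure stores these actions in an array and adds a reference count, a spinlock, and a wait queue to manage the array. sighand_struct is listed below (in include/linux/sched.h):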
    struct sighand_struct {
        atomic_t count;                     /* reference count */
        struct k_sigaction action[_NSIG];   /* actions for every signal on the platform; _NSIG is 64 on x86-64 */
        spinlock_t siglock;                 /* spinlock */
        wait_queue_head_t signalfd_wqh;     /* wait queue */
    };
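(3) Signal action structure k_sigaction
A signal action is described by the structure k_sigaction, which holds the handler address, flags, and related information. It is listed below (in include/asm-x86/signal.h):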
    struct k_sigaction {
        struct sigaction sa;
    };

    struct sigaction {
        __sighandler_t sa_handler;    /* entry address of the handler, a user-space function (or SIG_DFL / SIG_IGN) */
        unsigned long sa_flags;       /* SA_* flags that control how the signal is handled */
        __sigrestore_t sa_restorer;   /* restore function run after the handler returns */
        /* One bit per signal; a set bit blocks that signal while the handler
         * runs. The signal being handled is also blocked automatically during
         * its own handler, which keeps the same handler from nesting. */
        sigset_t sa_mask;
    };
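The effect of sa_mask can be observed from user space. The following is a minimal sketch, not kernel code: the function name usr2_interrupts_usr1 and the two handlers are hypothetical, and it assumes a single-threaded Linux process in which SIGUSR1 and SIGUSR2 are not blocked by the caller. The SIGUSR1 handler raises SIGUSR2, and the sketch reports whether the SIGUSR2 handler ran before the SIGUSR1 handler finished.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_finished;
static volatile sig_atomic_t usr2_nested;

static void on_usr2(int signo)
{
    (void)signo;
    /* "Nested" means this handler runs while on_usr1 is still active. */
    usr2_nested = !usr1_finished;
}

static void on_usr1(int signo)
{
    (void)signo;
    raise(SIGUSR2);        /* raise() is async-signal-safe */
    usr1_finished = 1;
}

/* Returns 1 if SIGUSR2 interrupted the SIGUSR1 handler, 0 if it was
 * deferred until that handler returned, -1 if no handler ran at all. */
int usr2_interrupts_usr1(int mask_usr2)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_usr2;
    rc = sigaction(SIGUSR2, &sa, NULL);
    assert(rc == 0);

    sa.sa_handler = on_usr1;
    if (mask_usr2)
        sigaddset(&sa.sa_mask, SIGUSR2);   /* block SIGUSR2 while on_usr1 runs */
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    usr1_finished = 0;
    usr2_nested = -1;
    raise(SIGUSR1);
    return usr2_nested;
}
```

Calling usr2_interrupts_usr1(1) from main should report that SIGUSR2 was deferred, while usr2_interrupts_usr1(0) should report that it interrupted the running handler.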
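(4) Pending signal and queue structures
Pending signals are described by the structure sigpending. The kernel keeps signals that are pending for a whole thread group in a shared pending structure, and signals that are pending for one specific task in a private pending structure. For real-time signals, sigpending keeps the pending instances on its queue, list.
The structure sigpending is listed below (in include/linux/signal.h):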
    struct sigpending {
        struct list_head list;   /* queue of pending signals */
        sigset_t signal;         /* bitmask of pending signals */
    };
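The structure sigqueue describes one queued pending signal, and its instances make up the pending signal queue. The queue matters mainly for real-time signals, which may have several instances pending at once; a standard signal is recorded at most once. It is listed below: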
    struct sigqueue {
        struct list_head list;
        int flags;                  /* flags describing how the entry is handled */
        siginfo_t info;             /* describes the event that raised the signal */
        struct user_struct *user;   /* per-user data structure of the process owner */
    };
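The difference between the pending bitmask and the pending queue is visible from user space. The sketch below is an illustration under assumptions, not kernel code: deliveries_after_burst is a hypothetical helper, and it assumes a single-threaded Linux process. It blocks a signal, sends it several times to itself, unblocks it, and counts how often the handler ran.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t deliveries;

static void count_delivery(int signo)
{
    (void)signo;
    deliveries++;
}

/* Sends `burst` instances of signo to this process while signo is blocked,
 * then unblocks it and returns how many times the handler was invoked. */
int deliveries_after_burst(int signo, int burst)
{
    struct sigaction sa;
    sigset_t block, old_mask;
    int i;

    assert(burst > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_delivery;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, signo);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    deliveries = 0;
    for (i = 0; i < burst; i++)
        kill(getpid(), signo);          /* each call makes the signal pending */

    /* Restoring the mask lets the pending instances be delivered. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return deliveries;
}
```

On Linux, a burst of three SIGUSR1 signals should collapse into a single delivery, because a standard signal is only a bit in the mask, whereas three SIGRTMIN signals should all be delivered, because each one gets its own queue entry.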
Setting signal actions
In the C library, the function that installs a signal action is sigaction. Its prototype is:
    int sigaction(int signum, const struct sigaction *newact, struct sigaction *oldact);
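The kernel provides three corresponding system calls: sys_signal, sys_sigaction, and sys_rt_sigaction. Which one is invoked is decided by the C library interface that the program calls, not by the value of signum. All three call do_sigaction to do the actual work and differ only in how they process their arguments. sys_signal is kept for backward compatibility and has been superseded by sigaction. Only do_sigaction is analyzed here.
These system calls let a user define the action taken for a signal. If no action has been defined, the kernel performs the default action when the signal is received.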
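From user space, installing and querying an action looks roughly like the sketch below. The helper names install_and_read_back and errno_when_ignoring are hypothetical. The sketch relies on two behaviors of sigaction on Linux: passing NULL as the new action only queries the current one, and an attempt to change the action of SIGKILL or SIGSTOP fails with EINVAL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

static void on_usr1(int signo)
{
    (void)signo;
}

/* Installs on_usr1 for SIGUSR1 and reads the action back.
 * Returns 1 if the kernel reports on_usr1 as the current handler. */
int install_and_read_back(void)
{
    struct sigaction new_action, current_action;

    memset(&new_action, 0, sizeof new_action);
    new_action.sa_handler = on_usr1;
    sigemptyset(&new_action.sa_mask);
    if (sigaction(SIGUSR1, &new_action, NULL) != 0)
        return 0;

    /* A NULL new action leaves the setting alone and only queries it. */
    if (sigaction(SIGUSR1, NULL, &current_action) != 0)
        return 0;
    return current_action.sa_handler == on_usr1;
}

/* Tries to set signo to SIG_IGN. Returns 0 on success, else errno.
 * On success the previous action is put back before returning. */
int errno_when_ignoring(int signo)
{
    struct sigaction ignore, previous;

    assert(signo > 0);
    memset(&ignore, 0, sizeof ignore);
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    if (sigaction(signo, &ignore, &previous) != 0)
        return errno;
    sigaction(signo, &previous, NULL);
    return 0;
}
```

For SIGKILL and SIGSTOP, errno_when_ignoring should return EINVAL, which corresponds to the sig_kernel_only check in do_sigaction.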
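The function do_sigaction saves the old action, installs the new one, and, if the new action ignores the signal, removes any pending instances of that signal. The parameter sig is the signal number, act is the new action, and oact is an output parameter that receives the previous action for the signal. The function is listed below (in kernel/signal.c):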
    int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
    {
        struct task_struct *t = current;   /* task structure of the current process */
        struct k_sigaction *k;
        sigset_t mask;

        /* _NSIG (64) is the highest signal number. sig_kernel_only(sig) is
         * true when sig < 64 and sig is SIGKILL or SIGSTOP; the action of
         * these two signals cannot be changed. */
        if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig)))
            return -EINVAL;

        k = &t->sighand->action[sig-1];   /* current action for this signal */

        spin_lock_irq(&current->sighand->siglock);
        if (oact)
            *oact = *k;   /* return the old action */

        if (act) {
            /* SIGKILL and SIGSTOP can never be blocked, so strip them from sa_mask */
            sigdelsetmask(&act->sa.sa_mask,
                          sigmask(SIGKILL) | sigmask(SIGSTOP));
            *k = *act;   /* install the new action */
            if (__sig_ignored(t, sig)) {   /* the signal is now ignored: discard pending instances */
                sigemptyset(&mask);
                sigaddset(&mask, sig);
                /* remove the signal from the pending set and queue; returns 1 if it was found */
                rm_from_queue_full(&mask, &t->signal->shared_pending);
                do {
                    rm_from_queue_full(&mask, &t->pending);
                    t = next_thread(t);
                } while (t != current);
            }
        }

        spin_unlock_irq(&current->sighand->siglock);
        return 0;
    }
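The last branch of do_sigaction, which discards pending instances once a signal becomes ignored, can be checked from user space. The following is a hedged sketch: ignore_discards_pending is a hypothetical helper, and it assumes a Linux process in which signo is a catchable signal such as SIGUSR1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static void noop_handler(int signo)
{
    (void)signo;
}

/* Makes signo pending while it is blocked, then switches its action to
 * SIG_IGN. Stores in *was_pending whether the signal was pending before the
 * switch and returns whether it is still pending afterwards. */
int ignore_discards_pending(int signo, int *was_pending)
{
    struct sigaction hold, ignore, previous;
    sigset_t block, old_mask, pending;
    int still_pending;

    assert(was_pending != NULL);
    sigemptyset(&block);
    sigaddset(&block, signo);
    sigprocmask(SIG_BLOCK, &block, &old_mask);

    memset(&hold, 0, sizeof hold);
    hold.sa_handler = noop_handler;       /* a real handler, so not ignored */
    sigemptyset(&hold.sa_mask);
    sigaction(signo, &hold, &previous);

    raise(signo);                         /* blocked, so it stays pending */
    sigpending(&pending);
    *was_pending = sigismember(&pending, signo);

    memset(&ignore, 0, sizeof ignore);
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    sigaction(signo, &ignore, NULL);      /* pending instance is discarded here */
    sigpending(&pending);
    still_pending = sigismember(&pending, signo);

    /* Unblock while the signal is still ignored, then restore the action. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(signo, &previous, NULL);
    return still_pending;
}
```

The mask is restored before the previous action so that, even on a system that behaved differently, a leftover pending signal would be ignored rather than delivered with its default action.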
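Signal delivery
The system calls that send signals are sys_kill, sys_tgkill, sys_tkill, and sys_rt_sigqueueinfo. For sys_kill, a pid of 0 sends the signal to every process in the caller's process group, and a pid of -1 sends it to every process in the system except init and the caller. sys_tgkill sends a signal to the thread with the given thread ID inside the given thread group, and sys_tkill sends a signal to a single thread identified only by its thread ID. sys_rt_sigqueueinfo sends a signal that carries additional information and targets one specific process.
A signal sent to one specific task is stored in the private pending structure of that task's task structure. A signal sent to a process or to a process group is stored in the shared pending structure of each target process, which all threads of that process share.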
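The additional information carried by sys_rt_sigqueueinfo is what the C library function sigqueue exposes. The sketch below is illustrative only: queue_and_receive is a hypothetical helper, and it assumes a single-threaded Linux process. It queues SIGRTMIN with an integer payload to itself and collects the signal synchronously with sigwaitinfo instead of installing a handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Queues SIGRTMIN with `value` attached, receives it synchronously, stores
 * the reported si_code in *code_out and returns the received payload.
 * Returns -1 on failure. */
int queue_and_receive(int value, int *code_out)
{
    sigset_t set, old_mask;
    siginfo_t info;
    union sigval payload;
    int received = -1;

    assert(code_out != NULL);
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    sigprocmask(SIG_BLOCK, &set, &old_mask);   /* keep it pending until we wait */

    payload.sival_int = value;
    if (sigqueue(getpid(), SIGRTMIN, payload) == 0 &&
        sigwaitinfo(&set, &info) == SIGRTMIN) {
        *code_out = info.si_code;              /* SI_QUEUE for sigqueue() */
        received = info.si_value.sival_int;
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return received;
}
```

A call such as queue_and_receive(42, &code) should return 42 with code set to SI_QUEUE, which distinguishes the signal from one sent by kill.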
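Only the system call sys_kill is analyzed below. Its call hierarchy is shown in Figure 3.
Figure 3. Call hierarchy of sys_kill
The system call sys_kill is listed below (in kernel/signal.c):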
    asmlinkage long sys_kill(int pid, int sig)
    {
        struct siginfo info;

        info.si_signo = sig;
        info.si_errno = 0;
        info.si_code = SI_USER;        /* sent from user space by kill() */
        info.si_pid = current->tgid;   /* thread group ID of the sender */
        info.si_uid = current->uid;    /* UID of the sender */

        return kill_something_info(sig, &info, pid);
    }
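The siginfo fields that sys_kill fills in are what the receiver eventually sees. A minimal sketch, assuming a single-threaded Linux process and using the hypothetical helper kill_reports_sender, sends SIGUSR1 to the calling process and inspects the information that arrives with it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends SIGUSR1 to the calling process with kill() and returns 1 if the
 * received siginfo names this process and user as the sender. */
int kill_reports_sender(void)
{
    sigset_t set, old_mask;
    siginfo_t info;
    int ok = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_BLOCK, &set, &old_mask);   /* hold the signal for sigwaitinfo */

    if (kill(getpid(), SIGUSR1) == 0 &&
        sigwaitinfo(&set, &info) == SIGUSR1) {
        ok = info.si_signo == SIGUSR1 &&
             info.si_code == SI_USER &&        /* set by sys_kill */
             info.si_pid == getpid() &&
             info.si_uid == getuid();
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return ok;
}
```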
The function kill_something_info calls different functions to send the signal, depending on the value of pid. It is listed below:
    static int kill_something_info(int sig, struct siginfo *info, int pid)
    {
        if (!pid) {
            /* pid == 0: send to every process in the caller's process group */
            return kill_pg_info(sig, info, process_group(current));
        } else if (pid == -1) {
            /* pid == -1: send to every process in the system except
             * swapper (PID 0), init (PID 1), and the caller */
            int retval = 0, count = 0;
            struct task_struct *p;

            read_lock(&tasklist_lock);
            for_each_process(p) {
                if (p->pid > 1 && p->tgid != current->tgid) {
                    int err = group_send_sig_info(sig, info, p);
                    ++count;
                    if (err != -EPERM)
                        retval = err;
                }
            }
            read_unlock(&tasklist_lock);
            return count ? retval : -ESRCH;
        } else if (pid < 0) {
            /* pid < -1: send to every process in process group -pid */
            return kill_pg_info(sig, info, -pid);
        } else {
            /* pid > 0: send to the process whose ID is pid */
            return kill_proc_info(sig, info, pid);
        }
    }
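The four-way dispatch on pid can be restated as a small pure function. This is only a user-space model of the branch order in kill_something_info; the enum kill_target and the function classify_kill_pid are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical labels for the four branches of kill_something_info. */
enum kill_target {
    TARGET_OWN_PROCESS_GROUP,   /* pid == 0  */
    TARGET_ALL_PROCESSES,       /* pid == -1 */
    TARGET_PROCESS_GROUP,       /* pid < -1: group ID is -pid */
    TARGET_SINGLE_PROCESS       /* pid > 0   */
};

/* Mirrors the order of the tests in kill_something_info. */
enum kill_target classify_kill_pid(int pid)
{
    if (pid == 0)
        return TARGET_OWN_PROCESS_GROUP;
    if (pid == -1)
        return TARGET_ALL_PROCESSES;
    if (pid < 0)
        return TARGET_PROCESS_GROUP;
    return TARGET_SINGLE_PROCESS;
}

/* For TARGET_PROCESS_GROUP, the group that receives the signal. */
int target_group_id(int pid)
{
    assert(pid < -1);
    return -pid;
}
```

The test for -1 has to come before the test for negative values, otherwise a broadcast request would be mistaken for a request to signal process group 1.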
The function group_send_sig_info sends a signal to a process as a whole, that is, to its thread group. It is analyzed below:
    int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
    {
        unsigned long flags;
        int ret;

        /* check that the caller is permitted to signal p */
        ret = check_kill_permission(sig, info, p);
        if (!ret && sig && p->sighand) {
            spin_lock_irqsave(&p->sighand->siglock, flags);
            ret = __group_send_sig_info(sig, info, p);   /* send to the thread group */
            spin_unlock_irqrestore(&p->sighand->siglock, flags);
        }

        return ret;
    }

    static int __group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
    {
        int ret = 0;

    #ifdef CONFIG_SMP
        if (!spin_is_locked(&p->sighand->siglock))
            BUG();
    #endif
        /* handle the process-wide effects of stop/continue signals */
        handle_stop_signal(sig, p);

        if (((unsigned long)info > 2) && (info->si_code == SI_TIMER))
            /* set up ret to report that the signal was dropped */
            ret = info->si_sys_private;

        /* Short-circuit ignored signals: if the target's action for the
         * signal is SIG_IGN, the target is not being traced, and the signal
         * is not blocked, there is nothing to deliver. */
        if (sig_ignored(p, sig))
            return ret;

        if (LEGACY_QUEUE(&p->signal->shared_pending, sig))
            /* a non-real-time signal that already has an instance queued */
            return ret;

        /* Put the signal on the shared pending queue. Process-wide signals
         * always use the shared queue, which avoids races between them. */
        ret = send_signal(sig, info, p, &p->signal->shared_pending);
        if (unlikely(ret))
            return ret;

        __group_complete_signal(sig, p);
        return 0;
    }
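Because group_send_sig_info runs the permission check first and queues the signal only when sig is non-zero, signal number 0 can be used to probe a process without disturbing it. The sketch below assumes Linux; probe_process and reaped_child_pid are hypothetical helpers, and the second one assumes the PID of a just-reaped child is not reused immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the caller could signal pid, otherwise the errno from
 * kill(): ESRCH if no such process exists, EPERM if permission is denied. */
int probe_process(pid_t pid)
{
    assert(pid > 0);
    return kill(pid, 0) == 0 ? 0 : errno;   /* signal 0: checks only */
}

/* Forks a child that exits at once, reaps it, and returns its PID, which
 * then no longer names a process. Returns -1 on failure. */
pid_t reaped_child_pid(void)
{
    pid_t child = fork();

    if (child == 0)
        _exit(0);
    if (child < 0)
        return -1;
    /* ECHILD means the child was reaped automatically (SIGCHLD ignored). */
    if (waitpid(child, NULL, 0) != child && errno != ECHILD)
        return -1;
    return child;
}
```

Probing the caller's own PID should succeed, and probing the PID returned by reaped_child_pid should normally fail with ESRCH.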
The function send_signal performs the actual delivery: it queues the signal being sent on signals. It is analyzed below (in kernel/signal.c):
    static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
                           struct sigpending *signals)
    {
        struct sigqueue *q = NULL;
        int ret = 0;

        /* fast path for kernel-internal signals such as SIGSTOP or SIGKILL */
        if ((unsigned long)info == 2)
            goto out_set;

        /* real-time signals must be queued if they were sent by sigqueue */
        if (atomic_read(&t->user->sigpending) <
            t->rlim[RLIMIT_SIGPENDING].rlim_cur)
            q = kmem_cache_alloc(sigqueue_cachep, GFP_ATOMIC);   /* allocate the queue entry */

        if (q) {   /* queue the signal */
            q->flags = 0;
            q->user = get_uid(t->user);
            atomic_inc(&q->user->sigpending);
            /* append the entry to the list in signals */
            list_add_tail(&q->list, &signals->list);
            switch ((unsigned long)info) {
            case 0:
                q->info.si_signo = sig;
                q->info.si_errno = 0;
                q->info.si_code = SI_USER;
                q->info.si_pid = current->pid;
                q->info.si_uid = current->uid;
                break;
            case 1:
                q->info.si_signo = sig;
                q->info.si_errno = 0;
                q->info.si_code = SI_KERNEL;
                q->info.si_pid = 0;
                q->info.si_uid = 0;
                break;
            default:
                copy_siginfo(&q->info, info);
                break;
            }
        } else {
            if (sig >= SIGRTMIN && info && (unsigned long)info != 1
                && info->si_code != SI_USER)
                /* Queue overflow, abort. Aborting is allowed when the signal
                 * is a real-time signal that a user sent with something
                 * other than kill(). */
                return -EAGAIN;
            if (((unsigned long)info > 1) && (info->si_code == SI_TIMER))
                ret = info->si_sys_private;
        }

    out_set:
        sigaddset(&signals->signal, sig);   /* set the signal's bit in the pending bitmask */
        return ret;
    }
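The combined effect of the LEGACY_QUEUE check and send_signal can be condensed into a toy model. Everything in the sketch below is hypothetical and deliberately simplified: struct pending_model, model_send, and the MODEL_* constants are not kernel names, the queue is reduced to per-signal counters, and the per-user limit is reduced to one integer.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_NSIG     64
#define MODEL_SIGRTMIN 32   /* first real-time signal number in this model */

struct pending_model {
    unsigned long long bits;        /* pending bitmask, bit (sig - 1) */
    int queued[MODEL_NSIG + 1];     /* queue entries per signal number */
    int total_queued;               /* entries charged against the limit */
    int queue_limit;                /* stand-in for RLIMIT_SIGPENDING */
};

/* Models one send. from_sigqueue is non-zero when the sender used a
 * real-time mechanism rather than kill(). Returns 0 or -EAGAIN. */
int model_send(struct pending_model *p, int sig, int from_sigqueue)
{
    unsigned long long bit;

    assert(sig >= 1 && sig <= MODEL_NSIG);
    bit = 1ULL << (sig - 1);

    /* LEGACY_QUEUE: a standard signal that is already pending is dropped. */
    if (sig < MODEL_SIGRTMIN && (p->bits & bit))
        return 0;

    if (p->total_queued < p->queue_limit) {
        p->queued[sig]++;           /* room left: add a queue entry */
        p->total_queued++;
    } else if (sig >= MODEL_SIGRTMIN && from_sigqueue) {
        return -EAGAIN;             /* overflow of a queued real-time signal */
    }
    /* Otherwise fall through: no entry, but the bit is still set. */

    p->bits |= bit;
    return 0;
}
```

With a zero-initialized model whose queue_limit is 2, sending signal 10 twice leaves one entry, a third queued real-time signal is rejected, and a signal sent with kill still sets its bit even though no entry can be allocated.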
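The function __group_complete_signal does the work that follows queuing: it wakes a thread so that the thread takes the signal off the queue and, if the signal is fatal, it takes the whole thread group down. It is listed below: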
<span class="kw4">static</span> <span class="kw4">void</span> __group_complete_signal<span class="br0">(</span> <span class="kw4">int</span> sig<span class="sy0">,</span> <span class="kw4">struct</span> task_struct <span class="sy0">*</span> p<span class="br0">)</span> <span class="br0">{</span> <span class="kw4">unsigned</span> <span class="kw4">int</span> mask<span class="sy0">;</span> <span class="kw4">struct</span> task_struct <span class="sy0">*</span> t<span class="sy0">;</span> <span class="coMULTI">/*不打搅僵死或已停止的任务,但SIGKILL将通过停止状态给一定的惩罚值*/</span> mask <span class="sy0">=</span> TASK_DEAD <span class="sy0">|</span> TASK_ZOMBIE <span class="sy0">|</span> TASK_TRACED<span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> sig <span class="sy0">!=</span> SIGKILL<span class="br0">)</span> mask <span class="sy0">|=</span> TASK_STOPPED<span class="sy0">;</span> <span class="co1">//如果进程p需要信号</span> <span class="kw1">if</span> <span class="br0">(</span> wants_signal<span class="br0">(</span> sig<span class="sy0">,</span> p<span class="sy0">,</span> mask<span class="br0">)</span> <span class="br0">)</span> t <span class="sy0">=</span> p<span class="sy0">;</span> <span class="kw1">else</span> <span class="kw1">if</span> <span class="br0">(</span> thread_group_empty<span class="br0">(</span> p<span class="br0">)</span> <span class="br0">)</span> <span class="co1">//目标线程组是否为空</span> <span class="coMULTI">/*线程组为空,仅仅有一个线程并且它不必被唤醒,它在再次运行之前将从队列取下非阻塞的信号。*/</span> <span class="kw1">return</span> <span class="sy0">;</span> <span class="kw1">else</span> <span class="br0">{</span> <span class="co1">//尝试查找一个合适的线程</span> t <span class="sy0">=</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> curr_target<span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> t <span class="sy0">==</span> NULL<span class="br0">)</span> <span class="coMULTI">/* 在这个线程重启动平衡*/</span> t <span class="sy0">=</span> p<span 
class="sy0">-></span> signal<span class="sy0">-></span> curr_target <span class="sy0">=</span> p<span class="sy0">;</span> BUG_ON<span class="br0">(</span> t<span class="sy0">-></span> tgid <span class="sy0">!=</span> p<span class="sy0">-></span> tgid<span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">while</span> <span class="br0">(</span> <span class="sy0">!</span> wants_signal<span class="br0">(</span> sig<span class="sy0">,</span> t<span class="sy0">,</span> mask<span class="br0">)</span> <span class="br0">)</span> <span class="br0">{</span> t <span class="sy0">=</span> next_thread<span class="br0">(</span> t<span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> t <span class="sy0">==</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> curr_target<span class="br0">)</span> <span class="co1">//没有线程需要被唤醒,不久后任何合格的线程将看见信号在队列里</span> <span class="kw1">return</span> <span class="sy0">;</span> <span class="br0">}</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> curr_target <span class="sy0">=</span> t<span class="sy0">;</span> <span class="br0">}</span> <span class="co1">//找到一个可杀死的线程,如果信号将是致命的,那就开始把整个组停下来*</span> <span class="kw1">if</span> <span class="br0">(</span> sig_fatal<span class="br0">(</span> p<span class="sy0">,</span> sig<span class="br0">)</span> <span class="sy0">&&</span> <span class="sy0">!</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> group_exit <span class="sy0">&&</span> <span class="sy0">!</span> sigismember<span class="br0">(</span> <span class="sy0">&</span> t<span class="sy0">-></span> real_blocked<span class="sy0">,</span> sig<span class="br0">)</span> <span class="sy0">&&</span> <span class="br0">(</span> sig <span class="sy0">==</span> SIGKILL <span class="sy0">||</span> <span class="sy0">!</span> <span class="br0">(</span> t<span class="sy0">-></span> ptrace <span class="sy0">&</span> PT_PTRACED<span 
class="br0">)</span> <span class="br0">)</span> <span class="br0">)</span> <span class="br0">{</span> <span class="co1">/* This signal will be fatal to the whole thread group; check whether it is a core-dump signal such as SIGQUIT or SIGABRT */</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span> sig_kernel_coredump<span class="br0">(</span> sig<span class="br0">)</span> <span class="br0">)</span> <span class="br0">{</span> <span class="co1">/* Not a core-dump signal */</span> <span class="coMULTI">/* Start a group exit and wake every member of the group. This way no other thread keeps running and doing things after a slower thread has the fatal signal pending. */</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> group_exit <span class="sy0">=</span> <span class="nu0">1</span> <span class="sy0">;</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> group_exit_code <span class="sy0">=</span> sig<span class="sy0">;</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> group_stop_count <span class="sy0">=</span> <span class="nu0">0</span> <span class="sy0">;</span> t <span class="sy0">=</span> p<span class="sy0">;</span> <span class="kw1">do</span> <span class="br0">{</span> sigaddset<span class="br0">(</span> <span class="sy0">&</span> t<span class="sy0">-></span> pending.<span class="me1">signal</span> <span class="sy0">,</span> SIGKILL<span class="br0">)</span> <span class="sy0">;</span> <span class="co1">/* Mark SIGKILL as pending */</span> <span class="coMULTI">/* Tell thread t that it has a new active signal and wake it; the second argument (resume = 1) asks for the wake-up even if the thread is stopped, not only when it is in TASK_INTERRUPTIBLE */</span> signal_wake_up<span class="br0">(</span> t<span class="sy0">,</span> <span class="nu0">1</span> <span class="br0">)</span> <span class="sy0">;</span> t <span class="sy0">=</span> next_thread<span class="br0">(</span> t<span class="br0">)</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="kw1">while</span> <span class="br0">(</span> t <span class="sy0">!=</span> p<span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">return</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="coMULTI">/* There will be a core 
dump. We make all threads other than the chosen one go into a group stop, so that nothing happens until the chosen thread is scheduled, takes the signal off the shared queue, and does the core dump. This is a little more complicated than strictly necessary, but it keeps the signal state that ends up in the core dump unchanged from the death state, for example which thread had the core-dump signal unblocked. */</span> rm_from_queue<span class="br0">(</span> SIG_KERNEL_STOP_MASK<span class="sy0">,</span> <span class="sy0">&</span> t<span class="sy0">-></span> pending<span class="br0">)</span> <span class="sy0">;</span> rm_from_queue<span class="br0">(</span> SIG_KERNEL_STOP_MASK<span class="sy0">,</span> <span class="sy0">&</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> shared_pending<span class="br0">)</span> <span class="sy0">;</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> group_stop_count <span class="sy0">=</span> <span class="nu0">0</span> <span class="sy0">;</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> group_exit_task <span class="sy0">=</span> t<span class="sy0">;</span> t <span class="sy0">=</span> p<span class="sy0">;</span> <span class="kw1">do</span> <span class="br0">{</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> group_stop_count<span class="sy0">++;</span> signal_wake_up<span class="br0">(</span> t<span class="sy0">,</span> <span class="nu0">0</span> <span class="br0">)</span> <span class="sy0">;</span> <span class="co1">/* Wake the thread */</span> t <span class="sy0">=</span> next_thread<span class="br0">(</span> t<span class="br0">)</span> <span class="sy0">;</span> <span class="br0">}</span> <span class="kw1">while</span> <span class="br0">(</span> t <span class="sy0">!=</span> p<span class="br0">)</span> <span class="sy0">;</span> <span class="coMULTI">/* While the fatal signal is being delivered, every task except group_exit_task is stopped; group_exit_task handles the fatal signal. */</span> wake_up_process<span class="br0">(</span> p<span class="sy0">-></span> signal<span class="sy0">-></span> group_exit_task<span class="br0">)</span> <span class="sy0">;</span> <span class="co1">/* Wake group_exit_task */</span> <span class="kw1">return</span> <span class="sy0">;</span> <span class="br0">}</span> 
<span class="co1">/* The signal is already in the shared pending queue; tell the chosen thread to wake up and dequeue it. */</span> signal_wake_up<span class="br0">(</span> t<span class="sy0">,</span> sig <span class="sy0">==</span> SIGKILL<span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">return</span> <span class="sy0">;</span> <span class="br0">}</span>
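Signal Response
In the interrupt mechanism, the CPU checks for a pending interrupt request at the end of every instruction. The signal mechanism, like softirqs, instead checks for pending signals and responds to them at specific points: just before returning to user space from a system call, an interrupt handler, or an exception handler, and when the process is woken up.
The response depends on the action configured for the signal, as follows: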
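- If the action is the default (SIG_DFL), the usual result is that the process is terminated by a call to do_exit. A few signals also require a core dump when the process exits, which is done by the function do_coredump. If the signal is a stop signal, the process is moved to the TASK_STOPPED state.
- If the signal is SIGCHLD and its action is set to ignore (SIG_IGN), terminated child processes are released at once instead of remaining as zombies.
- If the receiving process has registered a handler for the signal, the function handle_signal is called to carry out the response.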
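Figure 3 shows the signal response process. When a process in user space sends a signal to another process, the signal value is recorded in the receiving process's context in kernel space and the signal becomes pending. When the receiving process returns from a system call or from an interrupt, it calls the function do_notify_resume in kernel space to check for and handle pending signals. From there, handle_signal is called to process the signal and setup_rt_frame is called to build the user-space stack frame for the handler (the user-space function registered by the user). The process returns to user space through this frame and runs the handler. When the handler finishes, the return code in the frame invokes the system call sys_sigreturn, which restores the kernel-space and user-space stacks. When that system call completes, control returns to user space and the program continues.
Figure 3 Signal response process
(1) Signal response triggered on return from a system call
A thread handles signals when a system call returns. The code that handles signals on system call return is listed below (in arch/x86/kernel/entry_64.S):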
sysret_signal:
        TRACE_IRQS_ON
        ENABLE_INTERRUPTS(CLBR_NONE)
        testl $_TIF_DO_NOTIFY_MASK,%edx
        jz    1f
        /* It is a signal */
        /* edx holds thread_info_flags, the third argument of the function */
        leaq do_notify_resume(%rip),%rax  /* store the address of do_notify_resume in %rax */
        leaq -ARGOFFSET(%rsp),%rdi        # first argument: &pt_regs
        xorl %esi,%esi                    # second argument: oldset
        call ptregscall_common            /* runs call *%rax, which invokes do_notify_resume */
1:      movl $_TIF_NEED_RESCHED,%edi
        DISABLE_INTERRUPTS(CLBR_NONE)
        TRACE_IRQS_OFF
        jmp int_with_check
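Figure 3 shows the call hierarchy of the function do_notify_resume. The function acts on the thread-info flags passed to it, for example by handling pending signals.
Figure 3 Call hierarchy of the function do_notify_resume
The function do_notify_resume is listed below (in arch/x86/kernel/signal_64.c):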
void do_notify_resume(struct pt_regs *regs, void *unused,
                      __u32 thread_info_flags)
{
        /* Pending single-step? */
        if (thread_info_flags & _TIF_SINGLESTEP) {
                regs->flags |= X86_EFLAGS_TF;
                clear_thread_flag(TIF_SINGLESTEP);
        }

        /* Handle pending signals */
        if (thread_info_flags & _TIF_SIGPENDING)
                do_signal(regs);

        if (thread_info_flags & _TIF_HRTICK_RESCHED)
                hrtick_resched();
}
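The function do_signal handles pending signals that are not blocked (not masked). Its parameter regs is the address of the stack area that holds the user-mode register contents of the current process. It carries out a different response depending on the action configured for the signal. It is listed below: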
static void do_signal(struct pt_regs *regs)
{
        struct k_sigaction ka;
        siginfo_t info;
        int signr;
        sigset_t *oldset;

        /* Return at once if regs does not describe a return to user mode */
        if (!user_mode(regs))
                return;

        if (current_thread_info()->status & TS_RESTORE_SIGMASK)
                oldset = &current->saved_sigmask;  /* the signal mask that will be restored */
        else
                oldset = &current->blocked;        /* the mask of blocked signals */

        /* Get the signal to deliver */
        signr = get_signal_to_deliver(&info, &ka, regs, NULL);
        if (signr > 0) {
                /* Re-enable any watchpoints before delivering the signal to
                 * user space. The processor register will have been cleared
                 * if a watchpoint triggered inside the kernel. */
                if (current->thread.debugreg7)
                        set_debugreg(current->thread.debugreg7, 7);

                /* Handle the signal */
                if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
                        /* The signal was delivered successfully: the saved
                         * sigmask has been stored in the signal frame and
                         * will be restored by sigreturn, so simply clear the
                         * TS_RESTORE_SIGMASK flag here. */
                        current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
                }
                return;
        }

        /* Reaching this point means no signal was obtained for delivery */
        /* Error handling for the system call return is omitted */
        …..

        /* If there is no signal to deliver, just put the saved sigmask back */
        if (current_thread_info()->status & TS_RESTORE_SIGMASK) {
                current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
                sigprocmask(SIG_SETMASK, &current->saved_sigmask, NULL);
        }
}
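(2) Getting a signal from the process context and performing its default action
The function get_signal_to_deliver takes a signal from the process context. If the signal's action is the default, it performs the default action itself; otherwise it returns the signal so that the function handle_signal can process it. It is listed below (in kernel/signal.c):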
<span class="kw4">int</span> get_signal_to_deliver<span class="br0">(</span> siginfo_t <span class="sy0">*</span> info<span class="sy0">,</span> <span class="kw4">struct</span> k_sigaction <span class="sy0">*</span> return_ka<span class="sy0">,</span> <span class="kw4">struct</span> pt_regs <span class="sy0">*</span> regs<span class="sy0">,</span> <span class="kw4">void</span> <span class="sy0">*</span> cookie<span class="br0">)</span> <span class="br0">{</span> <span class="kw4">struct</span> sighand_struct <span class="sy0">*</span> sighand <span class="sy0">=</span> current<span class="sy0">-></span> sighand<span class="sy0">;</span> <span class="kw4">struct</span> signal_struct <span class="sy0">*</span> signal <span class="sy0">=</span> current<span class="sy0">-></span> signal<span class="sy0">;</span> <span class="kw4">int</span> signr<span class="sy0">;</span> relock<span class="sy0">:</span> try_to_freeze<span class="br0">(</span> <span class="br0">)</span> <span class="sy0">;</span> spin_lock_irq<span class="br0">(</span> <span class="sy0">&</span> sighand<span class="sy0">-></span> siglock<span class="br0">)</span> <span class="sy0">;</span> <span class="coMULTI">/* Every stopped thread comes here after it is woken. Check whether the parent should be notified; prepare_signal(SIGCONT) encodes the CLD_ si_code into the SIGNAL_CLD_MASK bits. */</span> <span class="coMULTI">/* SIGNAL_CLD_MASK is (SIGNAL_CLD_STOPPED|SIGNAL_CLD_CONTINUED) */</span> <span class="kw1">if</span> <span class="br0">(</span> unlikely<span class="br0">(</span> signal<span class="sy0">-></span> flags <span class="sy0">&</span> SIGNAL_CLD_MASK<span class="br0">)</span> <span class="br0">)</span> <span class="br0">{</span> <span class="kw4">int</span> why <span class="sy0">=</span> <span class="br0">(</span> signal<span class="sy0">-></span> flags <span class="sy0">&</span> SIGNAL_STOP_CONTINUED<span class="br0">)</span> <span class="sy0">?</span> CLD_CONTINUED <span class="sy0">:</span> CLD_STOPPED<span class="sy0">;</span> <span class="co1">/* Whether the child continued or stopped */</span> signal<span 
class="sy0">-></span> flags <span class="sy0">&=</span> ~SIGNAL_CLD_MASK<span class="sy0">;</span> spin_unlock_irq<span class="br0">(</span> <span class="sy0">&</span> sighand<span class="sy0">-></span> siglock<span class="br0">)</span> <span class="sy0">;</span> read_lock<span class="br0">(</span> <span class="sy0">&</span> tasklist_lock<span class="br0">)</span> <span class="sy0">;</span> do_notify_parent_cldstop<span class="br0">(</span> current<span class="sy0">-></span> group_leader<span class="sy0">,</span> why<span class="br0">)</span> <span class="sy0">;</span> read_unlock<span class="br0">(</span> <span class="sy0">&</span> tasklist_lock<span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">goto</span> relock<span class="sy0">;</span> <span class="br0">}</span> <span class="kw1">for</span> <span class="br0">(</span> <span class="sy0">;;</span> <span class="br0">)</span> <span class="br0">{</span> <span class="kw4">struct</span> k_sigaction <span class="sy0">*</span> ka<span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> unlikely<span class="br0">(</span> signal<span class="sy0">-></span> group_stop_count <span class="sy0">></span> <span class="nu0">0</span> <span class="br0">)</span> <span class="sy0">&&</span> do_signal_stop<span class="br0">(</span> <span class="nu0">0</span> <span class="br0">)</span> <span class="br0">)</span> <span class="kw1">goto</span> relock<span class="sy0">;</span> <span class="coMULTI">/* Take one signal from the current process context */</span> signr <span class="sy0">=</span> dequeue_signal<span class="br0">(</span> current<span class="sy0">,</span> <span class="sy0">&</span> current<span class="sy0">-></span> blocked<span class="sy0">,</span> info<span class="br0">)</span> <span class="sy0">;</span> <span class="kw1">if</span> <span class="br0">(</span> <span class="sy0">!</span>