Unix File Descriptors: Sockets


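On Unix systems, a socket is treated like an ordinary file because it can be read and written through the same interface. It still has characteristics of its own: for example, the read/write position of a file can be set, whereas a socket can only be read and written sequentially. So how does a Unix system implement this?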
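The sequential-only restriction is easy to observe from user space. The sketch below is not taken from the 4.4BSD-Lite sources: the helper name seek_errno_on_socket is made up for illustration, and it assumes a POSIX system where lseek on a socket fails with ESPIPE.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno that lseek() reports on a socket,
 * 0 if the seek unexpectedly succeeds, or -1 if the setup fails. */
int seek_errno_on_socket(void)
{
    int sv[2];
    char buf[2] = {0};
    int err = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Plain write()/read() work on a socket just as they do on a file. */
    if (write(sv[0], "hi", 2) == 2 && read(sv[1], buf, 2) == 2) {
        /* Repositioning does not: a socket has no settable offset. */
        errno = 0;
        err = (lseek(sv[0], 0, SEEK_SET) == (off_t)-1) ? errno : 0;
    }

    close(sv[0]);
    close(sv[1]);
    return err;
}
```

Calling seek_errno_on_socket() from a small main and comparing the result with ESPIPE shows the difference: the same descriptor accepts read and write but rejects a seek.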

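As the figure below shows, the key data structures are proc, filedesc, and file. Once these structures and the relationships among them are clear, the question above answers itself. This article uses the 4.4BSD-Lite source code, the same source used in the book "TCP/IP Illustrated, Volume 2: The Implementation". Compared with the Linux kernel in use today, it is smaller and simpler, which makes it better suited to study.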


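(1) Each process has a corresponding data structure in the OS (struct proc), usually called the PCB (process control block). It defines all the data needed to control the process. Here we only need to focus on its member struct filedesc *p_fd, which points to a structure describing the files the process has open. struct proc is defined in 4.4BSD-Lite\sys\sys\proc.h as follows: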

struct	proc {
	struct	proc *p_forw;		/* Doubly-linked run/sleep queue. */
	struct	proc *p_back;
	struct	proc *p_next;		/* Linked list of active procs */
	struct	proc **p_prev;		/*    and zombies. */

	/* substructures: */
	struct	pcred *p_cred;		/* Process owner's identity. */
	struct	filedesc *p_fd;		/* Ptr to open files structure. */
	struct	pstats *p_stats;	/* Accounting/statistics (PROC ONLY). */
	struct	plimit *p_limit;	/* Process limits. */
	struct	vmspace *p_vmspace;	/* Address space. */
	struct	sigacts *p_sigacts;	/* Signal actions, state (PROC ONLY). */

#define	p_ucred		p_cred->pc_ucred
#define	p_rlimit	p_limit->pl_rlimit

	int	p_flag;			/* P_* flags. */
	char	p_stat;			/* S* process status. */
	char	p_pad1[3];

	pid_t	p_pid;			/* Process identifier. */
	struct	proc *p_hash;	 /* Hashed based on p_pid for kill+exit+... */
	struct	proc *p_pgrpnxt; /* Pointer to next process in process group. */
	struct	proc *p_pptr;	 /* Pointer to process structure of parent. */
	struct	proc *p_osptr;	 /* Pointer to older sibling processes. */

/* The following fields are all zeroed upon creation in fork. */
#define	p_startzero	p_ysptr
	struct	proc *p_ysptr;	 /* Pointer to younger siblings. */
	struct	proc *p_cptr;	 /* Pointer to youngest living child. */
	pid_t	p_oppid;	 /* Save parent pid during ptrace. XXX */
	int	p_dupfd;	 /* Sideways return value from fdopen. XXX */

	/* scheduling */
	u_int	p_estcpu;	 /* Time averaged value of p_cpticks. */
	int	p_cpticks;	 /* Ticks of cpu time. */
	fixpt_t	p_pctcpu;	 /* %cpu for this process during p_swtime */
	void	*p_wchan;	 /* Sleep address. */
	char	*p_wmesg;	 /* Reason for sleep. */
	u_int	p_swtime;	 /* Time swapped in or out. */
	u_int	p_slptime;	 /* Time since last blocked. */

	struct	itimerval p_realtimer;	/* Alarm timer. */
	struct	timeval p_rtime;	/* Real time. */
	u_quad_t p_uticks;		/* Statclock hits in user mode. */
	u_quad_t p_sticks;		/* Statclock hits in system mode. */
	u_quad_t p_iticks;		/* Statclock hits processing intr. */

	int	p_traceflag;		/* Kernel trace points. */
	struct	vnode *p_tracep;	/* Trace to vnode. */

	int	p_siglist;		/* Signals arrived but not delivered. */

	struct	vnode *p_textvp;	/* Vnode of executable. */

	long	p_spare[5];		/* pad to 256, avoid shifting eproc. */

/* End area that is zeroed on creation. */
#define	p_endzero	p_startcopy

/* The following fields are all copied upon creation in fork. */
#define	p_startcopy	p_sigmask

	sigset_t p_sigmask;	/* Current signal mask. */
	sigset_t p_sigignore;	/* Signals being ignored. */
	sigset_t p_sigcatch;	/* Signals being caught by user. */

	u_char	p_priority;	/* Process priority. */
	u_char	p_usrpri;	/* User-priority based on p_cpu and p_nice. */
	char	p_nice;		/* Process "nice" value. */
	char	p_comm[MAXCOMLEN+1];

	struct 	pgrp *p_pgrp;	/* Pointer to process group. */

/* End area that is copied on creation. */
#define	p_endcopy	p_thread
	int	p_thread;	/* Id for this "thread"; Mach glue. XXX */
	struct	user *p_addr;	/* Kernel virtual addr of u-area (PROC ONLY). */
	struct	mdproc p_md;	/* Any machine-dependent fields. */

	u_short	p_xstat;	/* Exit status for wait; also stop signal. */
	u_short	p_acflag;	/* Accounting flags. */
	struct	rusage *p_ru;	/* Exit information. XXX */

};

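(2) struct filedesc describes all files opened by the process. Focus on two of its array members: struct file **fd_ofiles and char *fd_ofileflags (see the figure above). Each element of fd_ofiles holds the address of the file structure for one file the process has open; each element of fd_ofileflags holds the descriptor flags for one open file. Descriptor flags are stored as bits, and since each element is a char, every open file has 8 bits available for up to 8 different flags, for example close-on-exec and mapped-from-device. The two arrays are parallel and are indexed by descriptor number, that is:

fd_ofiles[n] holds the address of the file structure for descriptor n of the current process;

fd_ofileflags[n] holds the descriptor flags for descriptor n of the current process;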
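That the flags belong to the descriptor slot while the file structure can be shared is visible from user space. The following is a rough sketch, assuming a POSIX system with a writable /tmp; struct dup_result and dup_experiment are names invented for this example. It duplicates a descriptor with dup, sets close-on-exec on the original only, and then checks that the duplicate still sees the shared file offset (f_offset in struct file).

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Results of the experiment; the field names are made up for this sketch. */
struct dup_result {
    int   orig_cloexec;  /* FD_CLOEXEC bit on the original descriptor */
    int   dup_cloexec;   /* FD_CLOEXEC bit on the duplicate */
    off_t dup_offset;    /* file offset as seen through the duplicate */
};

/* Returns 0 on success, -1 on failure. */
int dup_experiment(struct dup_result *r)
{
    char path[] = "/tmp/fdsketchXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file disappears once it is closed */

    int fd2 = dup(fd);            /* second slot, same open file */
    if (fd2 < 0) {
        close(fd);
        return -1;
    }

    /* Set the flag on one slot only, then advance the offset through it. */
    int ok = fcntl(fd, F_SETFD, FD_CLOEXEC) == 0 &&
             write(fd, "hello", 5) == 5;
    if (ok) {
        r->orig_cloexec = (fcntl(fd, F_GETFD) & FD_CLOEXEC) != 0;
        r->dup_cloexec  = (fcntl(fd2, F_GETFD) & FD_CLOEXEC) != 0;
        r->dup_offset   = lseek(fd2, 0, SEEK_CUR);
    }

    close(fd);
    close(fd2);
    return ok ? 0 : -1;
}
```

The flag differs between the two descriptors because each has its own flags entry, while the offset of 5 is visible through both because both entries refer to the same open file.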

struct filedesc is defined in 4.4BSD-Lite\sys\sys\filedesc.h as follows:

struct filedesc {
	struct	file **fd_ofiles;	/* file structures for open files */
	char	*fd_ofileflags;		/* per-process open file flags */
	struct	vnode *fd_cdir;		/* current directory */
	struct	vnode *fd_rdir;		/* root directory */
	int	fd_nfiles;		/* number of open files allocated */
	u_short	fd_lastfile;		/* high-water mark of fd_ofiles */
	u_short	fd_freefile;		/* approx. next free file */
	u_short	fd_cmask;		/* mask for file creation */
	u_short	fd_refcnt;		/* reference count */
};
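The path from a process to one of its open files can be sketched in user space as follows. This is a simplified model, not the kernel code: the structures are cut down to the members discussed here, fd_lookup and fd_is_cloexec are hypothetical helpers, and UF_EXCLOSE is assumed to have the value 0x01 as in the 4.4BSD filedesc.h.

```c
#include <errno.h>
#include <stddef.h>

#define UF_EXCLOSE 0x01            /* close-on-exec bit (assumed value) */

/* Reduced stand-ins for the kernel structures. */
struct file { short f_type; };

struct filedesc {
    struct file **fd_ofiles;       /* one slot per descriptor */
    char *fd_ofileflags;           /* parallel array of flag bytes */
    int fd_nfiles;                 /* number of slots allocated */
};

struct proc { struct filedesc *p_fd; };

/* Follow p -> p_fd -> fd_ofiles[fd]; returns 0 or EBADF. */
int fd_lookup(struct proc *p, int fd, struct file **fpp)
{
    struct filedesc *fdp = p->p_fd;

    if (fd < 0 || fd >= fdp->fd_nfiles || fdp->fd_ofiles[fd] == NULL)
        return EBADF;
    *fpp = fdp->fd_ofiles[fd];
    return 0;
}

/* Test the close-on-exec bit in the slot that matches the descriptor. */
int fd_is_cloexec(struct proc *p, int fd)
{
    struct file *fp;

    if (fd_lookup(p, fd, &fp) != 0)
        return 0;
    return (p->p_fd->fd_ofileflags[fd] & UF_EXCLOSE) != 0;
}
```

The descriptor number a program passes to a system call is nothing more than the index n used in both arrays, so an unused or out-of-range slot is what produces EBADF in this model.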

(3) struct file represents one open file of the current process. Here we focus on its members short f_type, struct fileops *f_ops, and caddr_t f_data:

short f_type is the type of the open file. The file 4.4BSD-Lite\usr\src\sys\sys\file.h defines two values for f_type:

#define	DTYPE_VNODE	1	/* file */
#define	DTYPE_SOCKET	2	/* communications endpoint */

A value of DTYPE_SOCKET means the open file is a socket, and a value of DTYPE_VNODE means it is an ordinary file;

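struct fileops *f_ops points to a table of five function pointers, which refer to concrete functions chosen by the file type (given by f_type). When the open file is a socket (f_type is DTYPE_SOCKET), they point to the five socket operations soo_read, soo_write, soo_ioctl, soo_select, and soo_close; when it is an ordinary file (f_type is DTYPE_VNODE), they point to the five vnode operations vn_read, vn_write, vn_ioctl, vn_select, and vn_closefile;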

caddr_t f_data points to the data behind the open file, which is either a vnode or a socket structure. The type caddr_t is really just a char *, defined as: typedef char * caddr_t;.

struct file is defined in 4.4BSD-Lite\sys\sys\file.h as follows:
struct file {
	struct	file *f_filef;	/* list of active files */
	struct	file **f_fileb;	/* list of active files */
	short	f_flag;		/* see fcntl.h */
#define	DTYPE_VNODE	1	/* file */
#define	DTYPE_SOCKET	2	/* communications endpoint */
	short	f_type;		/* descriptor type */
	short	f_count;	/* reference count */
	short	f_msgcount;	/* references from message queue */
	struct	ucred *f_cred;	/* credentials associated with descriptor */
	struct	fileops {
		int	(*fo_read)	__P((struct file *fp, struct uio *uio,
					    struct ucred *cred));
		int	(*fo_write)	__P((struct file *fp, struct uio *uio,
					    struct ucred *cred));
		int	(*fo_ioctl)	__P((struct file *fp, int com,
					    caddr_t data, struct proc *p));
		int	(*fo_select)	__P((struct file *fp, int which,
					    struct proc *p));
		int	(*fo_close)	__P((struct file *fp, struct proc *p));
	} *f_ops;
	off_t	f_offset;
	caddr_t	f_data;		/* vnode or socket */
};
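The dispatch through f_ops is what lets one read path serve both files and sockets. Below is a minimal user-space sketch of the idea, with the table cut down to a single operation and with simplified signatures; the mock_ functions and generic_read are invented for illustration and only imitate the roles of vn_read and soo_read.

```c
#include <stdio.h>
#include <string.h>

#define DTYPE_VNODE  1
#define DTYPE_SOCKET 2

struct file;

/* Reduced to one operation; the kernel's fileops has five. */
struct fileops {
    int (*fo_read)(struct file *fp, char *buf, int len);
};

struct file {
    short f_type;            /* DTYPE_VNODE or DTYPE_SOCKET */
    struct fileops *f_ops;   /* operations matching f_type */
    void *f_data;            /* vnode or socket; a string in this mock */
};

/* Stand-in for vn_read: tags the data as coming from a vnode. */
int mock_vn_read(struct file *fp, char *buf, int len)
{
    snprintf(buf, (size_t)len, "vnode:%s", (const char *)fp->f_data);
    return (int)strlen(buf);
}

/* Stand-in for soo_read: tags the data as coming from a socket. */
int mock_soo_read(struct file *fp, char *buf, int len)
{
    snprintf(buf, (size_t)len, "socket:%s", (const char *)fp->f_data);
    return (int)strlen(buf);
}

struct fileops vnops     = { mock_vn_read };
struct fileops socketops = { mock_soo_read };

/* The caller never inspects f_type; it simply calls through f_ops. */
int generic_read(struct file *fp, char *buf, int len)
{
    return (*fp->f_ops->fo_read)(fp, buf, len);
}
```

A file set up with &socketops ends up in mock_soo_read and one set up with &vnops ends up in mock_vn_read, even though both go through the same generic_read call.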

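(4) caddr_t f_data points to the actual data of the open file (caddr_t is a char *, as noted above). When the open file is an ordinary file (f_type is DTYPE_VNODE), f_data points to a struct vnode; when it is a socket (f_type is DTYPE_SOCKET), f_data points to a struct socket. For now, focus on two members of struct socket, short so_type and caddr_t so_pcb. so_type is the socket type, for example SOCK_DGRAM for a datagram socket (UDP in the Internet domain) and SOCK_STREAM for a stream socket (TCP). so_pcb points to the socket's protocol control block, which is itself a node in a doubly linked list.

struct socket is defined in 4.4BSD-Lite\sys\sys\socketvar.h as follows: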

struct socket {
	short	so_type;		/* generic type, see socket.h */
	short	so_options;		/* from socket call, see socket.h */
	short	so_linger;		/* time to linger while closing */
	short	so_state;		/* internal state flags SS_*, below */
	caddr_t	so_pcb;			/* protocol control block */
	struct	protosw *so_proto;	/* protocol handle */
/*
 * Variables for connection queueing.
 * Socket where accepts occur is so_head in all subsidiary sockets.
 * If so_head is 0, socket is not related to an accept.
 * For head socket so_q0 queues partially completed connections,
 * while so_q is a queue of connections ready to be accepted.
 * If a connection is aborted and it has so_head set, then
 * it has to be pulled out of either so_q0 or so_q.
 * We allow connections to queue up based on current queue lengths
 * and limit on number of queued connections for this socket.
 */
	struct	socket *so_head;	/* back pointer to accept socket */
	struct	socket *so_q0;		/* queue of partial connections */
	struct	socket *so_q;		/* queue of incoming connections */
	short	so_q0len;		/* partials on so_q0 */
	short	so_qlen;		/* number of connections on so_q */
	short	so_qlimit;		/* max number queued connections */
	short	so_timeo;		/* connection timeout */
	u_short	so_error;		/* error affecting connection */
	pid_t	so_pgid;		/* pgid for signals */
	u_long	so_oobmark;		/* chars to oob mark */
/*
 * Variables for socket buffering.
 */
	struct	sockbuf {
		u_long	sb_cc;		/* actual chars in buffer */
		u_long	sb_hiwat;	/* max actual char count */
		u_long	sb_mbcnt;	/* chars of mbufs used */
		u_long	sb_mbmax;	/* max chars of mbufs to use */
		long	sb_lowat;	/* low water mark */
		struct	mbuf *sb_mb;	/* the mbuf chain */
		struct	selinfo sb_sel;	/* process selecting read/write */
		short	sb_flags;	/* flags, see below */
		short	sb_timeo;	/* timeout for read/write */
	} so_rcv, so_snd;
#define	SB_MAX		(256*1024)	/* default for max chars in sockbuf */
#define	SB_LOCK		0x01		/* lock on data queue */
#define	SB_WANT		0x02		/* someone is waiting to lock */
#define	SB_WAIT		0x04		/* someone is waiting for data/space */
#define	SB_SEL		0x08		/* someone is selecting */
#define	SB_ASYNC	0x10		/* ASYNC I/O, need signals */
#define	SB_NOTIFY	(SB_WAIT|SB_SEL|SB_ASYNC)
#define	SB_NOINTR	0x40		/* operations not interruptible */

	caddr_t	so_tpcb;		/* Wisc. protocol control block XXX */
	void	(*so_upcall) __P((struct socket *so, caddr_t arg, int waitf));
	caddr_t	so_upcallarg;		/* Arg for above */
};
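The type stored in so_type can be read back from user space with the SO_TYPE socket option. The sketch below assumes a POSIX system on which AF_INET sockets can be created; reported_socket_type is a hypothetical helper name.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates an AF_INET socket of the requested type and returns the type
 * that getsockopt(SO_TYPE) reports for it, or -1 on failure. */
int reported_socket_type(int type)
{
    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return -1;

    int got = -1;
    socklen_t len = sizeof got;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &got, &len) < 0)
        got = -1;

    close(fd);
    return got;
}
```

Passing SOCK_STREAM or SOCK_DGRAM should return the same constant, which is the value the kernel recorded for the socket when it was created.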

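(5) The protocol control block, caddr_t so_pcb. The protocol control block is a core data structure behind the socket and is kept on a doubly linked list. Its members include: the previous inpcb, the next inpcb, the head node of the inpcb list, the local IP address, local port, remote IP address, and remote port of the socket it belongs to, the address of that socket structure, and so on;

Every socket has a corresponding protocol control block (inpcb), reachable through the socket's so_pcb member; the protocol control block in turn has a member struct socket *inp_socket that points back to the socket it belongs to.

In the OS, each protocol has exactly one inpcb list: the inpcbs of all TCP sockets are on the same TCP doubly linked list, and the inpcbs of all UDP sockets are on the same UDP doubly linked list.

When data is received for a socket, the OS takes the packet from the network interface driver, searches the inpcb list by comparing the local IP address, local port, remote IP address, and remote port to find the matching inpcb, follows it to the corresponding socket, and stores the data in that socket's receive buffer.

The protocol control block struct inpcb is defined in 4.4BSD-Lite\sys\netinet\in_pcb.h as follows: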

struct inpcb {
	struct	inpcb *inp_next,*inp_prev;
					/* pointers to other pcb's */
	struct	inpcb *inp_head;	/* pointer back to chain of inpcb's
					   for this protocol */
	struct	in_addr inp_faddr;	/* foreign host table entry */
	u_short	inp_fport;		/* foreign port */
	struct	in_addr inp_laddr;	/* local host table entry */
	u_short	inp_lport;		/* local port */
	struct	socket *inp_socket;	/* back pointer to socket */
	caddr_t	inp_ppcb;		/* pointer to per-protocol pcb */
	struct	route inp_route;	/* placeholder for routing entry */
	int	inp_flags;		/* generic IP/datagram flags */
	struct	ip inp_ip;		/* header prototype; should have more */
	struct	mbuf *inp_options;	/* IP options */
	struct	ip_moptions *inp_moptions; /* IP multicast options */
};
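The lookup described above can be sketched as a walk over a circular doubly linked list. This is a simplified user-space model with exact four-tuple matching only; the kernel's own lookup routine also deals with wildcard addresses, which is left out here. struct mock_inpcb, the sock_id field (standing in for inp_socket), and the pcb_ helpers are all invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for struct inpcb. */
struct mock_inpcb {
    struct mock_inpcb *inp_next, *inp_prev;
    uint32_t inp_faddr;   /* foreign (remote) address */
    uint16_t inp_fport;   /* foreign (remote) port */
    uint32_t inp_laddr;   /* local address */
    uint16_t inp_lport;   /* local port */
    int sock_id;          /* stands in for inp_socket */
};

/* An empty list is a head node that points to itself. */
void pcb_list_init(struct mock_inpcb *head)
{
    head->inp_next = head->inp_prev = head;
}

/* Link a new control block in right after the head. */
void pcb_insert(struct mock_inpcb *head, struct mock_inpcb *inp)
{
    inp->inp_next = head->inp_next;
    inp->inp_prev = head;
    head->inp_next->inp_prev = inp;
    head->inp_next = inp;
}

/* Walk the list and return the block whose four-tuple matches, or NULL. */
struct mock_inpcb *pcb_lookup(struct mock_inpcb *head,
                              uint32_t faddr, uint16_t fport,
                              uint32_t laddr, uint16_t lport)
{
    struct mock_inpcb *inp;

    for (inp = head->inp_next; inp != head; inp = inp->inp_next) {
        if (inp->inp_faddr == faddr && inp->inp_fport == fport &&
            inp->inp_laddr == laddr && inp->inp_lport == lport)
            return inp;
    }
    return NULL;
}
```

Once a block is found, the real code follows inp_socket to the socket and appends the data to its receive buffer; in this model, sock_id plays that role.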









