libpcap + PF_RING Source Code Analysis, Parts 1 and 2

Latest recommended article published on 2024-01-19 22:10:00

lionzl


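Libpcap is the packet capture library on Linux, and it is built mainly on sockets. Its essential difference from WinPcap is that WinPcap sits at the same level as the TCP/IP protocol stack, whereas libpcap is a user-space library that wraps sockets once more on top of the kernel's network stack. As a result, a packet taken from the NIC is copied several times before it reaches the application, and capture performance is poor on gigabit links. To improve libpcap's capture performance, PF_RING is used to modify libpcap: the patched libpcap receives packets from the NIC into a ring buffer and maps that buffer into the application with mmap, which reduces the number of memory copies. To understand libpcap, pfring, and libpfring better, I analyze their source code here. pfring is the kernel source, while libpfring wraps pfring for applications to call. Packets can in fact be captured with libpfring directly, without libpcap, but because most sniffing tools today are built on libpcap, we keep the libpcap interface and use pfring underneath to change how the socket is implemented.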

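The difference between how WinPcap and libpcap capture packets is that WinPcap is a protocol at the same level as TCP/IP, while libpcap is an application-layer development kit. With the PF_RING patch, libpcap becomes somewhat similar to WinPcap: both use a ring-shaped kernel buffer whose size can be configured. Another difference is that WinPcap lets you set mintocopy, meaning that once the kernel buffer holds at least that much data, the data is copied to the application buffer; libpcap has no such feature. Libpcap mainly relies on NIC interrupts or polling to pass data up to the upper layers.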

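Taking libpcap as the main thread: first the pcap_open_live function performs some initialization, such as opening the NIC and setting the callback functions used to read packets; after that, packets can be captured with pcap_next, pcap_next_ex, pcap_dispatch, or pcap_loop. The main purpose of this article is to analyze the source code, from the user-space libpcap and pfring libraries down to the kernel's PF_RING, so that we understand PF_RING in depth and see how it improves libpcap's capture performance.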

1) pcap_open_live

We start with the application-layer libpcap. The first function to analyze is pcap_open_live, found in pcap.c. Its source is as follows:

pcap_t *pcap_open_live(const char *source, int snaplen, int promisc, int to_ms, char *errbuf)
{
    pcap_t *p;
    int status;

    p = pcap_create(source, errbuf);
    if (p == NULL)
        return (NULL);
    status = pcap_set_snaplen(p, snaplen);
    if (status < 0)
        goto fail;
    status = pcap_set_promisc(p, promisc);
    if (status < 0)
        goto fail;
    status = pcap_set_timeout(p, to_ms);
    if (status < 0)
        goto fail;
    p->oldstyle = 1;
    status = pcap_activate(p);
    if (status < 0)
        goto fail;
    return (p);
fail:
    if (status == PCAP_ERROR)
        snprintf(errbuf, PCAP_ERRBUF_SIZE, "%s: %s", source,
            p->errbuf);
    else if (status == PCAP_ERROR_NO_SUCH_DEVICE ||
        status == PCAP_ERROR_PERM_DENIED)
        snprintf(errbuf, PCAP_ERRBUF_SIZE, "%s: %s (%s)", source,
            pcap_statustostr(status), p->errbuf);
    else
        snprintf(errbuf, PCAP_ERRBUF_SIZE, "%s: %s", source,
            pcap_statustostr(status));
    pcap_close(p);
    return (NULL);
}

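As the source shows, pcap_open_live first calls pcap_create (analyzed below), then calls pcap_set_snaplen to set the maximum capture length. For Ethernet the maximum frame length is 1518 bytes, and setting snaplen to 65535 is enough to capture every packet in full. It then calls pcap_set_promisc to set the capture mode (1 means promiscuous mode). pcap_set_timeout sets the read timeout: if the application reads no data within this time, the call returns. Next comes pcap_activate, also explained below. Between pcap_create and pcap_activate you can also call pcap_set_buffer_size to set the size of the kernel buffer; opentest.c shows how it is called, and I come back to it later in this article.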

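Because libpcap supports many operating systems, its code is intricate. If you search for pcap_create, you will find it defined in many places. We are analyzing the source on Linux, so we look for pcap_create in pcap-linux.c first. Its source is as follows: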

pcap_t *pcap_create(const char *device, char *ebuf)
{   // device: name of the network device; ebuf: buffer that receives error messages
    pcap_t *handle;

    /*
     * A null device name is equivalent to the "any" device.
     */
    if (device == NULL)
        device = "any";
#ifdef HAVE_DAG_API
    if (strstr(device, "dag")) {
        return dag_create(device, ebuf);
    }
#endif /* HAVE_DAG_API */
#ifdef HAVE_SEPTEL_API
    if (strstr(device, "septel")) {
        return septel_create(device, ebuf);
    }
#endif /* HAVE_SEPTEL_API */
#ifdef HAVE_SNF_API
    handle = snf_create(device, ebuf);
    if (strstr(device, "snf") || handle != NULL)
        return handle;
#endif /* HAVE_SNF_API */
#ifdef PCAP_SUPPORT_BT
    if (strstr(device, "bluetooth")) {
        return bt_create(device, ebuf);
    }
#endif
#ifdef PCAP_SUPPORT_CAN
    if (strstr(device, "can") || strstr(device, "vcan")) {
        return can_create(device, ebuf);
    }
#endif
#ifdef PCAP_SUPPORT_USB
    if (strstr(device, "usbmon")) {
        return usb_create(device, ebuf);
    }
#endif
    handle = pcap_create_common(device, ebuf);
    if (handle == NULL)
        return NULL;
    // pcap_create_common is the initialization function: from the device name it obtains a pcap_t * handle; the handle's callbacks are then set below.
    handle->activate_op = pcap_activate_linux;
    handle->can_set_rfmon_op = pcap_can_set_rfmon_linux; // checks whether rfmon mode can be set
    return handle;
}

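To support different devices, pcap_create uses #ifdef to distinguish them, so opening the various device types is gathered into one function. In our case the device is an ordinary NIC, so pcap_create_common is called. That function is defined in pcap.c, which at first felt confusing to me: defining it in pcap-linux.c would be more direct, and I had to go and look for it in pcap.c while tracing. The reason is presumably that libpcap must also support other operating systems; if the function lived in pcap-linux.c, the other platforms could not call it conveniently. From that point of view the libpcap authors' architecture is quite sound. pcap_create also installs two callbacks, pcap_activate_linux and pcap_can_set_rfmon_linux. Its return value is a pcap_t * handle for the device. Having covered pcap_create, we must follow it into pcap_create_common and the two callbacks. Here is the source of pcap_create_common.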

pcap_t *pcap_create_common(const char *source, char *ebuf)
{
    pcap_t *p;

    p = malloc(sizeof(*p)); // allocate memory for p
    if (p == NULL) {
        snprintf(ebuf, PCAP_ERRBUF_SIZE, "malloc: %s",
            pcap_strerror(errno));
        return (NULL);
    }
    memset(p, 0, sizeof(*p)); // zero the memory of p
#ifndef WIN32
    p->fd = -1; /* not opened yet */
    p->selectable_fd = -1;
    p->send_fd = -1;
#endif
    p->opt.source = strdup(source); // source is the name of the network device
    if (p->opt.source == NULL) {
        snprintf(ebuf, PCAP_ERRBUF_SIZE, "malloc: %s",
            pcap_strerror(errno));
        free(p);
        return (NULL);
    }
    /*
     * Default to "can't set rfmon mode"; if it's supported by
     * a platform, the create routine that called us can set
     * the op to its routine to check whether a particular
     * device supports it.
     */
    p->can_set_rfmon_op = pcap_cant_set_rfmon;
    initialize_ops(p);
    /* put in some defaults */
    pcap_set_timeout(p, 0);
    pcap_set_snaplen(p, 65535); /* max packet size */
    p->opt.promisc = 0;
    p->opt.buffer_size = 0;
    return (p);
}

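The one call in this function that needs explaining is strdup. It duplicates a string and returns a pointer to the copy, which is allocated with malloc and must later be released with free. To use it, include the header <string.h>.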
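The ownership rule behind that strdup call can be illustrated with a minimal sketch. The struct capture_opts and the two helper functions below are hypothetical stand-ins invented for this example; they are not part of libpcap.

```c
#define _POSIX_C_SOURCE 200809L  /* makes strdup() visible in strict ISO C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the part of pcap_t that stores the device name. */
struct capture_opts {
    char *source;   /* heap copy owned by this struct */
};

/* Copy the caller's device name so the handle does not depend on the
 * lifetime of the caller's buffer. Returns 0 on success, -1 on failure. */
int opts_set_source(struct capture_opts *o, const char *name)
{
    assert(o != NULL && name != NULL);
    o->source = strdup(name);   /* malloc + copy; NULL if out of memory */
    return o->source != NULL ? 0 : -1;
}

/* Release the copy made by opts_set_source(). */
void opts_clear(struct capture_opts *o)
{
    assert(o != NULL);
    free(o->source);
    o->source = NULL;
}
```

Because the handle keeps its own copy, the caller may modify or free its original buffer right after the call.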

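The line p->can_set_rfmon_op = pcap_cant_set_rfmon; is explained by the comment inside the function: the default is "can't set rfmon mode". initialize_ops(p) installs the initial set of callbacks.

    pcap_set_timeout(p, 0);
    pcap_set_snaplen(p, 65535); /* max packet size */
    p->opt.promisc = 0;
    p->opt.buffer_size = 0;

These lines set the initial timeout to 0, snaplen to 65535, non-promiscuous mode, and an initial kernel buffer size of 0. In short, pcap_create_common is an initialization function.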

The source of initialize_ops is as follows:

static void initialize_ops(pcap_t *p)
{
    /*
     * Set operation pointers for operations that only work on
     * an activated pcap_t to point to a routine that returns
     * a "this isn't activated" error.
     */
    p->read_op = (read_op_t)pcap_not_initialized;
    p->inject_op = (inject_op_t)pcap_not_initialized;
    p->setfilter_op = (setfilter_op_t)pcap_not_initialized;
    p->setdirection_op = (setdirection_op_t)pcap_not_initialized;
    p->set_datalink_op = (set_datalink_op_t)pcap_not_initialized;
    p->getnonblock_op = (getnonblock_op_t)pcap_not_initialized;
    p->setnonblock_op = (setnonblock_op_t)pcap_not_initialized;
    p->stats_op = (stats_op_t)pcap_not_initialized;
#ifdef WIN32
    p->setbuff_op = (setbuff_op_t)pcap_not_initialized;
    p->setmode_op = (setmode_op_t)pcap_not_initialized;
    p->setmintocopy_op = (setmintocopy_op_t)pcap_not_initialized;
#endif
    /*
     * Default cleanup operation - implementations can override
     * this, but should call pcap_cleanup_live_common() after
     * doing their own additional cleanup.
     */
    p->cleanup_op = pcap_cleanup_live_common;
    /*
     * In most cases, the standard one-shot callback can
     * be used for pcap_next()/pcap_next_ex().
     */
    p->oneshot_callback = pcap_oneshot;
}

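That completes pcap_create_common. Next comes the other callback set in pcap_create, pcap_activate_linux, which is found in pcap-linux.c. I admire this architecture: the functions needed only on Linux are gathered in pcap-linux.c, while functions shared by several operating systems, such as pcap_create_common discussed above, live in pcap.c. pcap_create_common gives pcap_t *p its initial values, much like an initialization routine in C++ such as a constructor, or MFC's OnInitDialog. Initialization is only initialization, though; each system then needs its own settings. In pcap_activate_linux the callbacks that pcap_create_common initialized are set again to their Linux implementations, which is why it makes sense for pcap_create_common to live in pcap.c. The source of pcap_activate_linux follows.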

static int pcap_activate_linux(pcap_t *handle)
{
    const char *device;
    int status = 0;

    device = handle->opt.source; // name of the network device
    handle->inject_op = pcap_inject_linux;
    handle->setfilter_op = pcap_setfilter_linux;
    handle->setdirection_op = pcap_setdirection_linux;
    handle->set_datalink_op = NULL; /* can't change data link type */
    handle->getnonblock_op = pcap_getnonblock_fd;
    handle->setnonblock_op = pcap_setnonblock_fd;
    handle->cleanup_op = pcap_cleanup_linux;
    handle->read_op = pcap_read_linux;
    handle->stats_op = pcap_stats_linux;
    /*
     * The "any" device is a special device which causes us not
     * to bind to a particular device and thus to look at all
     * devices.
     */
    if (strcmp(device, "any") == 0) {
        if (handle->opt.promisc) {
            handle->opt.promisc = 0;
            /* Just a warning. */
            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                "Promiscuous mode not supported on the \"any\" device");
            status = PCAP_WARNING_PROMISC_NOTSUP;
        }
    }
    handle->md.device = strdup(device);
    if (handle->md.device == NULL) {
        snprintf(handle->errbuf, PCAP_ERRBUF_SIZE, "strdup: %s",
            pcap_strerror(errno));
        return PCAP_ERROR;
    }
#ifdef HAVE_PF_RING // compiled in only when PF_RING support is defined
    if (!getenv("PCAP_NO_PF_RING")) {
        /* Code courtesy of Chris Wakelin <c.d.wakelin@reading.ac.uk> */
        char *clusterId;

        handle->ring = pfring_open((char *)device, handle->opt.promisc, handle->snapshot, 1);

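When HAVE_PF_RING is defined, the code inside this #ifdef is executed. The calls inside it show that PF_RING supplies its own socket type. pfring_open initializes a PF_RING socket and returns a structure of type pfring. Its prototype is:

pfring *pfring_open(char *device_name, u_int8_t promisc, u_int32_t caplen, u_int8_t reentrant);

Function: initializes a PF_RING socket and returns a pfring structure. To open a device in DNA mode, pfring_open_dna() must be called instead.

Parameters:

device_name: symbolic name of the PF_RING-aware device to open (e.g. eth0);

promisc: whether to open the device in promiscuous mode (1 = promiscuous);

caplen: maximum capture length (also known as snaplen, the same as the snaplen of pcap_open_live; 65535 is normally enough to capture the largest packet on the network);

reentrant: if non-zero, the device is opened in reentrant mode. The documentation describes this as implemented with semaphores; it makes performance slightly worse and is mainly meant for multithreaded applications;

Return value: a handle on success, NULL otherwise.

The source of pfring_open is: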

pfring *pfring_open(char *device_name, u_int8_t promisc, u_int32_t caplen, u_int8_t _reentrant) {
    return (pfring_open_consumer(device_name, promisc, caplen, _reentrant,
                                 0, NULL, 0));
}

pfring_open simply calls pfring_open_consumer; we analyze that function further below.

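        if (handle->ring) {
            if (clusterId = getenv("PCAP_PF_RING_CLUSTER_ID"))

getenv is the C library function that reads the current value of an environment variable.

Prototype: char *getenv(const char *name)

Usage: s = getenv("VARIABLE_NAME");

       (declare char *s; first)

Behavior: returns the value of the given environment variable. On Linux the variable name is case-sensitive. If the variable is not defined in the environment, getenv returns NULL, not an empty string.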
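As an illustration of the getenv contract, here is a small sketch that reads a numeric id from an environment variable and applies the same range check the libpcap patch uses (greater than 0 and less than 255). The function name cluster_id_from_env is hypothetical, and strtol replaces atoi so that non-numeric values are rejected; that stricter parsing is a choice made for this sketch, not what the patch does.

```c
#define _POSIX_C_SOURCE 200809L  /* lets callers use setenv()/unsetenv() to try this */
#include <assert.h>
#include <stdlib.h>

/* Read a cluster id from the environment variable `name`.
 * Returns the id when it lies in the open range (0, 255), otherwise -1. */
int cluster_id_from_env(const char *name)
{
    const char *value;
    char *end;
    long id;

    assert(name != NULL);
    value = getenv(name);            /* NULL when the variable is unset */
    if (value == NULL || *value == '\0')
        return -1;
    id = strtol(value, &end, 10);
    if (*end != '\0' || id <= 0 || id >= 255)
        return -1;                   /* trailing junk or out of range */
    return (int)id;
}
```

Checking the pointer for NULL before using it is the important step; passing an unset variable's result straight to atoi would dereference a null pointer.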

                if (atoi(clusterId) > 0 && atoi(clusterId) < 255)
                    if (getenv("PCAP_PF_RING_USE_CLUSTER_PER_FLOW"))
                        pfring_set_cluster(handle->ring, atoi(clusterId), cluster_per_flow);
                    else
                        pfring_set_cluster(handle->ring, atoi(clusterId), cluster_round_robin);
            pfring_enable_ring(handle->ring);
        } else
            handle->ring = NULL;
    } else
        handle->ring = NULL;

pfring_set_cluster only sets the cluster id, which it does by calling setsockopt on the PF_RING socket.

The PF_RING documentation describes this function as follows; pfring_set_cluster is very useful on multi-CPU systems:

This call allows a ring to be added to a cluster that can spawn across address spaces. In a nutshell, when two or more sockets are clustered they share incoming packets that are balanced in a per-flow manner. This technique is useful for exploiting multicore systems or for sharing packets in the same address space across multiple threads.

int pfring_set_cluster(pfring *ring, u_int clusterId, cluster_type the_type) {

#ifdef USE_PCAP

return(-1);

#else

if(ring->dna_mapped_device)

return(-1);

else {

struct add_to_cluster cluster;

cluster.clusterId = clusterId,cluster.the_type = the_type;

return(ring ? setsockopt(ring->fd, 0,SO_ADD_TO_CLUSTER,

&cluster, sizeof(cluster)): -1);

}

#endif

}

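The setsockopt and getsockopt functions work as follows.

Description:

They get or set options associated with a socket. Options may exist at multiple protocol levels; they are always present at the uppermost socket level. When manipulating socket options, the level at which the option resides and the name of the option must be specified. To manipulate options at the socket level, level is set to SOL_SOCKET. To manipulate options at any other level, the protocol number of the protocol that controls the option is supplied. For example, to indicate that an option is to be interpreted by TCP, level is set to the protocol number of TCP. Usage: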

#include <sys/types.h>

#include <sys/socket.h>

int getsockopt(int sock,int level, int optname, void *optval, socklen_t *optlen);

int setsockopt(int sock,int level, int optname, const void *optval, socklen_t optlen);

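Parameters:

sock: the socket whose option is to be set or read.

level: the protocol level at which the option resides.

optname: the name of the option to access. // here SO_ADD_TO_CLUSTER

optval: for getsockopt(), points to the buffer that receives the option value. For setsockopt(), points to the buffer that holds the new option value.

optlen: for getsockopt(), on input the maximum length of the option value and on output its actual length. For setsockopt(), the length of the option value being set.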
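To make the two calls concrete, the sketch below sets SO_RCVBUF on an ordinary UDP socket and reads the value back. It uses only standard socket calls; the function name request_rcvbuf is made up for this example, and a UDP socket is used instead of a PF_RING or PF_PACKET socket so that it runs without special privileges.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ask the kernel for a receive buffer of `bytes` on a fresh UDP socket and
 * return the size the kernel reports back, or -1 on any error. */
int request_rcvbuf(int bytes)
{
    int fd;
    int granted = 0;
    socklen_t len = sizeof(granted);   /* in: buffer size, out: bytes written */

    assert(bytes > 0);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) == -1) {
        close(fd);
        return -1;
    }
    close(fd);
    return granted;
}
```

On Linux the value read back usually differs from the value requested: the kernel doubles the request to account for bookkeeping overhead and caps it at the net.core.rmem_max limit.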

When PF_RING is defined, pfring_open is what creates the socket. That concludes this part of the explanation.

        if (handle->ring != NULL) {
            handle->fd = handle->ring->fd;
            handle->bufsize = handle->snapshot;
            handle->linktype = DLT_EN10MB;
            handle->offset = 2;
            /* printf("Open HAVE_PF_RING(%s)\n", device); */
        } else {
            /* printf("Open HAVE_PF_RING(%s) failed. Fallback to pcap\n", device); */
#endif

    /*
     * If we're in promiscuous mode, then we probably want
     * to see when the interface drops packets too, so get an
     * initial count from /proc/net/dev
     */
    if (handle->opt.promisc)
        handle->md.proc_dropped = linux_if_drops(handle->md.device);
    /*
     * Current Linux kernels use the protocol family PF_PACKET to
     * allow direct access to all packets on the network while
     * older kernels had a special socket type SOCK_PACKET to
     * implement this feature.
     * While this old implementation is kind of obsolete we need
     * to be compatible with older kernels for a while so we are
     * trying both methods with the newer method preferred.
     */
    // Current kernels use PF_PACKET, while older kernels use SOCK_PACKET

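    if ((status = activate_new(handle)) == 1) {

The header comment of activate_new reads:

    /*
     * Try to open a packet socket using the new kernel PF_PACKET interface.
     * Returns 1 on success, 0 on an error that means the new interface isn't
     * present (so the old SOCK_PACKET interface should be tried), and a
     * PCAP_ERROR_ value on an error that means that the old mechanism won't
     * work either (so it shouldn't be tried).
     */

When PF_RING is not in use, activate_new creates the socket through the PF_PACKET interface. A return value of 1 means success: the socket was created with PF_PACKET. A return value of 0 means PF_PACKET is not available, and creating the socket through the SOCK_PACKET interface can be tried instead. Its source is also in pcap-linux.c. The value of status therefore distinguishes three cases. If it is 1, the PF_PACKET socket is in use. If it is 0, activate_old is called: a return of 1 from activate_old means the socket was created with SOCK_PACKET, and anything else means failure. In the third case status is neither 1 nor 0, which means a fatal failure.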

        /*
         * Success.
         * Try to use memory-mapped access.
         */
        switch (activate_mmap(handle)) {
        case 1:
            /* we succeeded; nothing more to do */
            return 0;
        case 0:
            /*
             * Kernel doesn't support it - just continue
             * with non-memory-mapped access.
             */
            status = 0;
            break;
        case -1:
            /*
             * We failed to set up to use it, or kernel
             * supports it, but we failed to enable it;
             * return an error. handle->errbuf contains
             * an error message.
             */
            status = PCAP_ERROR;
            goto fail;
        }
    } else if (status == 0) {
        /* Non-fatal error; try old way */
        if ((status = activate_old(handle)) != 1) {
            /*
             * Both methods to open the packet socket failed.
             * Tidy up and report our failure (handle->errbuf
             * is expected to be set by the functions above).
             */
            goto fail;
        }
    } else {
        /*
         * Fatal error with the new way; just fail.
         * status has the error return; if it's PCAP_ERROR,
         * handle->errbuf has been set appropriately.
         */
        goto fail;
    }
    /*
     * We set up the socket, but not with memory-mapped access.
     */

if(handle->opt.buffer_size != 0) {

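If opt.buffer_size != 0, then as I understand it the application has called pcap_set_buffer_size to set the kernel buffer size instead of using the default, so the requested size is first passed to the kernel with setsockopt (SO_RCVBUF). The malloc that follows allocates the user-space packet buffer of handle->bufsize + handle->offset bytes.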

        /* Set the socket buffer size to the specified value. */
        if (setsockopt(handle->fd, SOL_SOCKET, SO_RCVBUF,
            &handle->opt.buffer_size,
            sizeof(handle->opt.buffer_size)) == -1) {
            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                "SO_RCVBUF: %s", pcap_strerror(errno));
            status = PCAP_ERROR;
            goto fail;
        }
    }
#ifdef HAVE_PF_RING
    }
#endif
    /* Allocate the buffer */
    handle->buffer = malloc(handle->bufsize + handle->offset);
    if (!handle->buffer) {
        snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
            "malloc: %s", pcap_strerror(errno));
        status = PCAP_ERROR;
        goto fail;
    }
    /*
     * "handle->fd" is a socket, so "select()" and "poll()"
     * should work on it.
     */
    handle->selectable_fd = handle->fd;
    return status;
fail:
    pcap_cleanup_linux(handle);
    return status;
}

That is the whole of pcap_activate_linux. My understanding was that PF_RING should replace PF_PACKET or SOCK_PACKET. On a first, quick reading, though, I saw that the function first creates the socket with pfring_open, and I expected that when PF_RING is defined the function would return right after pfring_open succeeds instead of going on to activate_new and activate_old. I did not understand the author's intent, so I kept tracing the code: first pfring_open, then activate_new, to see how they are implemented. As mentioned, pfring_open calls pfring_open_consumer. Its source is in pfring.c:

pfring *pfring_open_consumer(char *device_name, u_int8_t promisc,
                             u_int32_t caplen, u_int8_t _reentrant,
                             u_int8_t consumer_plugin_id,
                             char *consumer_data, u_int consumer_data_len) {
#ifdef USE_PCAP
    char ebuf[256];
    pcap_t *pcapPtr = pcap_open_live(device_name,
                                     caplen,
                                     1 /* promiscuous mode */,
                                     1000 /* ms */,
                                     ebuf);
    return ((pfring *)pcapPtr);
#else
    int err = 0;
    pfring *ring = (pfring *)malloc(sizeof(pfring)); // allocate memory for one pfring structure

    if (ring == NULL)
        return (NULL);
    else
        memset(ring, 0, sizeof(pfring)); // zero the structure
    ring->reentrant = _reentrant;
    ring->fd = socket(PF_RING, SOCK_RAW, htons(ETH_P_ALL)); // create the socket
#ifdef RING_DEBUG
    printf("Open RING [fd=%d]\n", ring->fd);
#endif
    if (ring->fd > 0) {
        int rc;
        u_int memSlotsLen;

        if (caplen > MAX_CAPLEN) caplen = MAX_CAPLEN;
        // MAX_CAPLEN is defined in pfring.h: #define MAX_CAPLEN 16384
        setsockopt(ring->fd, 0, SO_RING_BUCKET_LEN, &caplen, sizeof(caplen));
        // set caplen, the capture length; pfring.h limits it to 16384
        /* printf("channel_id=%d\n", channel_id); */
        if (device_name == NULL /* any */) {
            device_name = "any";
            rc = pfring_bind(ring, device_name); // bind the ring
        } else if (!strcmp(device_name, "none")) {
            /* No binding yet */
            rc = 0;
        } else
            rc = pfring_bind(ring, device_name);
        if (rc == 0) {
            if (consumer_plugin_id > 0) {
                ring->kernel_packet_consumer = consumer_plugin_id;
                rc = pfring_set_packet_consumer_mode(ring, consumer_plugin_id,
                                                     consumer_data, consumer_data_len);
                if (rc < 0) {
                    free(ring);
                    return (NULL);
                }
            } else
                ring->kernel_packet_consumer = 0;
            ring->buffer = (char *)mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
                                        MAP_SHARED, ring->fd, 0);
            // mmap memory mapping; PAGE_SIZE = 4096

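The prototype of mmap is: void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

Parameter start: the desired start address of the mapping, usually NULL so that the system chooses the address; on success that address is returned.

Parameter length: how many bytes of the file to map into memory.

Parameter prot: the protection of the mapped region, a combination of:
PROT_EXEC the region may be executed; PROT_READ the region may be read; PROT_WRITE the region may be written;
PROT_NONE the region may not be accessed.

Parameter flags: properties of the mapped region. Either MAP_SHARED or MAP_PRIVATE must be specified when calling mmap().
MAP_FIXED: if the mapping cannot be established at the address given by start, the call fails instead of choosing another address. Use of this flag is usually discouraged.
MAP_SHARED: writes to the region are carried through to the file and are visible to other processes that map the same file.
MAP_PRIVATE: writes to the region go to a private copy-on-write copy; changes are not written back to the file.
MAP_ANONYMOUS: creates an anonymous mapping that is not backed by any file; fd is ignored.
MAP_DENYWRITE: only writes through the mapping are allowed, and direct writes to the file are refused (current Linux kernels ignore this flag).
MAP_LOCKED: locks the mapped region in memory so that it is not swapped out.

Parameter fd: the file descriptor to map (here ring->fd, the return value of socket()). For an anonymous mapping, that is, when MAP_ANONYMOUS is set in flags, fd is set to -1. On systems without anonymous mappings, the same effect can be achieved by opening /dev/zero with open() and mapping that file.

Parameter offset: the offset in the file at which the mapping starts, usually 0, meaning the beginning of the file; it must be a multiple of the page size.

Return value:

On success, the start address of the mapped region; otherwise MAP_FAILED, which is (void *) -1, with the cause stored in errno.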

            if (ring->buffer == MAP_FAILED) {
                printf("mmap() failed: try with a smaller snaplen\n");
                free(ring);
                return (NULL);
            }
            ring->slots_info = (FlowSlotInfo *)ring->buffer;
            // ring->buffer is the mmap-ed region; ring->slots_info points to its start
            if (ring->slots_info->version != RING_FLOWSLOT_VERSION) {
                printf("Wrong RING version: "
                       "kernel is %i, libpfring was compiled with %i\n",
                       ring->slots_info->version, RING_FLOWSLOT_VERSION);
                free(ring);
                return (NULL);
            }
            memSlotsLen = ring->slots_info->tot_mem; // total size of the ring memory
            munmap(ring->buffer, PAGE_SIZE); // remove the one-page mapping
            ring->buffer = (char *)mmap(NULL, memSlotsLen,
                                        PROT_READ|PROT_WRITE,
                                        MAP_SHARED, ring->fd, 0);
            // The first mmap apparently exists only to read memSlotsLen; it is then removed with munmap and the whole region is mapped again.
            if (ring->buffer == MAP_FAILED) {
                printf("mmap() failed");
                free(ring);
                return (NULL);
            }
            ring->slots_info = (FlowSlotInfo *)ring->buffer; // header of the ring buffer
            ring->slots = (char *)(ring->buffer + sizeof(FlowSlotInfo));
            // skip the header structure at the start of the ring; what follows holds the received packets
            /* Set defaults */
            ring->device_name = strdup(device_name ? device_name : "");
#ifdef RING_DEBUG
            printf("RING (%s):tot_mem=%u/min_tot_slots=%u/max_slot_len=%u/"
                   "insert_off=%u/remove_off=%u/dropped=%llu\n",
                   device_name,
                   ring->slots_info->tot_mem,
                   ring->slots_info->tot_slots,
                   ring->slots_info->slot_len,
                   ring->slots_info->insert_off,
                   ring->slots_info->remove_off,
                   ring->slots_info->tot_lost);
#endif
            if (promisc) {
                if (set_if_promisc(device_name, 1) == 0)
                    ring->clear_promisc = 1;
            }
#ifdef ENABLE_HW_TIMESTAMP
            pfring_enable_hw_timestamp(ring, device_name);
#endif
        } else {
            close(ring->fd);
            err = -1;
        }
    } else {
        err = -1;
        free(ring);
    }
    if (err == 0) {
        if (ring->reentrant)
            pthread_spin_init(&ring->spinlock, PTHREAD_PROCESS_PRIVATE);
        return (ring);
    } else
        return (NULL);
#endif
}
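The two-stage mapping in this function (map one page, read the total size from the header, unmap, then map the whole region) can be tried out on an ordinary file. The sketch below is only an analogy: struct region_header is a made-up header that plays the role FlowSlotInfo plays in PF_RING, and a temporary file stands in for the PF_RING socket.

```c
#define _POSIX_C_SOURCE 200809L  /* fileno(), ftruncate() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical header, loosely modelled on the role of FlowSlotInfo. */
struct region_header {
    uint32_t version;
    uint32_t tot_mem;    /* total bytes of the region, header included */
};

/* Map the first page to learn the real size, then remap the whole region.
 * Returns the mapping (and its length via *len) or MAP_FAILED. */
void *map_region(int fd, uint32_t expected_version, size_t *len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct region_header *hdr;
    uint32_t total;
    void *p;

    assert(len != NULL);
    p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;
    hdr = p;
    if (hdr->version != expected_version) {   /* producer/consumer mismatch */
        munmap(p, page);
        return MAP_FAILED;
    }
    total = hdr->tot_mem;
    munmap(p, page);                          /* probe mapping no longer needed */

    p = mmap(NULL, total, PROT_READ, MAP_SHARED, fd, 0);
    if (p != MAP_FAILED)
        *len = total;
    return p;
}

/* Helper for trying the sketch: builds a temporary file of `total` bytes
 * whose header announces that size. Returns a descriptor or -1. */
int make_region_file(uint32_t version, uint32_t total)
{
    struct region_header h = { version, total };
    FILE *f = tmpfile();
    int fd;

    if (f == NULL)
        return -1;
    if (fwrite(&h, sizeof(h), 1, f) != 1 || fflush(f) != 0 ||
        ftruncate(fileno(f), (off_t)total) != 0) {
        fclose(f);
        return -1;
    }
    fd = dup(fileno(f));
    fclose(f);   /* the duplicated descriptor keeps the unlinked file alive */
    return fd;
}
```

The probe mapping is needed because the consumer cannot know the size of the region in advance; only the producer (in PF_RING, the kernel module) knows it and publishes it in the header.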

// pfring_bind calls bind to bind the socket: rc = bind(ring->fd, (struct sockaddr *)&sa, sizeof(sa));

int pfring_bind(pfring *ring, char *device_name) {
    struct sockaddr sa; // socket address variable
    char *at;
    int32_t channel_id = -1;
    int rc = 0;

    if ((device_name == NULL) || (strcmp(device_name, "none") == 0))
        return (-1);
    at = strchr(device_name, '@');
    if (at != NULL) {
        char *tok, *pos = NULL;

        at[0] = '\0';
        /* Syntax
           ethX@1,5       channel 1 and 5
           ethX@1-5       channel 1,2...5
           ethX@1-3,5-7   channel 1,2,3,5,6,7
        */
        tok = strtok_r(&at[1], ",", &pos);
        channel_id = 0;
        while (tok != NULL) {
            char *dash = strchr(tok, '-');
            int32_t min_val, max_val, i;

            if (dash) {
                dash[0] = '\0';
                min_val = atoi(tok);
                max_val = atoi(&dash[1]);
            } else
                min_val = max_val = atoi(tok);
            for (i = min_val; i <= max_val; i++)
                channel_id |= 1 << i;
            tok = strtok_r(NULL, ",", &pos);
        }
    }
    /* Setup TX */
    ring->sock_tx.sll_family = PF_PACKET;
    ring->sock_tx.sll_protocol = htons(ETH_P_ALL);
    sa.sa_family = PF_RING;
    snprintf(sa.sa_data, sizeof(sa.sa_data), "%s", device_name);
    rc = bind(ring->fd, (struct sockaddr *)&sa, sizeof(sa));

The bind function:

Header files: #include <sys/types.h> and #include <sys/socket.h>
Prototype: int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);
Return value: 0 on success, -1 on failure (errno is set).

    if (rc == 0) {
        if (channel_id != -1) {
            int rc = pfring_set_channel_id(ring, channel_id);

            if (rc != 0)
                printf("pfring_set_channel_id() failed: %d\n", rc);
        }
    }
    return (rc);
}
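The channel syntax handled by pfring_bind (the part after the @ sign, such as 1-3,5-7) is easy to try in isolation. The following sketch reimplements just that parsing step as a standalone function; the name channel_mask and the clamping of channel numbers to 0..31 are choices made for this example and are not present in the PF_RING source.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Turn a channel list such as "1-3,5-7" into a bitmask with one bit per
 * channel. The input buffer is modified in place, as strtok_r() requires.
 * Channel numbers are clamped to 0..31 so the shift stays well defined. */
uint32_t channel_mask(char *spec)
{
    uint32_t mask = 0;
    char *pos = NULL;
    char *tok;

    assert(spec != NULL);
    tok = strtok_r(spec, ",", &pos);
    while (tok != NULL) {
        char *dash = strchr(tok, '-');
        int lo, hi, i;

        if (dash != NULL) {
            *dash = '\0';            /* split "a-b" into "a" and "b" */
            lo = atoi(tok);
            hi = atoi(dash + 1);
        } else {
            lo = hi = atoi(tok);     /* single channel */
        }
        if (lo < 0)
            lo = 0;
        if (hi > 31)
            hi = 31;
        for (i = lo; i <= hi; i++)
            mask |= (uint32_t)1 << i;
        tok = strtok_r(NULL, ",", &pos);
    }
    return mask;
}
```

For example, the list 1,5 sets bits 1 and 5, and the list 1-3,5-7 sets bits 1, 2, 3, 5, 6, and 7. The argument must be a writable buffer, because the tokenizer overwrites the separators.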

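That completes pfring_open_consumer, and it works as I expected: it binds the socket with pfring_bind and then maps the ring buffer into the process with mmap. As I said earlier, my understanding is that the PF_RING patch uses a new socket type in place of the original PF_PACKET and SOCK_PACKET. But when I started reading the source I was puzzled: once the PF_RING socket is set up, why does pcap_activate_linux not return directly? Let us go back to pcap_activate_linux and see what I missed. First look at the data structure behind its parameter, pcap_t *handle; as the saying goes, algorithms + data structures = programs, which shows how important data structures are. The ring member is declared in pcap-int.h: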

#ifdef HAVE_PF_RING

pfring *ring;

#endif

Since pfring_open has already created and bound the socket, what is the activate_new call that follows for? Let us trace activate_new:

static int activate_new(pcap_t *handle)
{
#ifdef HAVE_PF_PACKET_SOCKETS
    // HAVE_PF_PACKET_SOCKETS: if the build supports PF_PACKET sockets, the code below is compiled in; otherwise the function effectively just returns 0, and activate_old is then called to try SOCK_PACKET.
    const char *device = handle->opt.source;
    int is_any_device = (strcmp(device, "any") == 0);
    int sock_fd = -1, arptype;
#ifdef HAVE_PACKET_AUXDATA
    int val;
#endif
    int err = 0;
    struct packet_mreq mr;

    /*
     * Open a socket with protocol family packet. If the
     * "any" device was specified, we open a SOCK_DGRAM
     * socket for the cooked interface, otherwise we first
     * try a SOCK_RAW socket for the raw interface.
     */
    sock_fd = is_any_device ?
        socket(PF_PACKET, SOCK_DGRAM, htons(ETH_P_ALL)) :
        socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    // socket() creates the socket; will a bind call show up further down? Let us look closely.
    if (sock_fd == -1) {
        snprintf(handle->errbuf, PCAP_ERRBUF_SIZE, "socket: %s",
            pcap_strerror(errno));
        return 0; /* try old mechanism */
    }
    /* It seems the kernel supports the new interface. */
    handle->md.sock_packet = 0;
    /*
     * Get the interface index of the loopback device.
     * If the attempt fails, don't fail, just set the
     * "md.lo_ifindex" to -1.
     *
     * XXX - can there be more than one device that loops
     * packets back, i.e. devices other than "lo"? If so,
     * we'd need to find them all, and have an array of
     * indices for them, and check all of them in
     * "pcap_read_packet()".
     */
    handle->md.lo_ifindex = iface_get_id(sock_fd, "lo", handle->errbuf);
    /*
     * Default value for offset to align link-layer payload
     * on a 4-byte boundary.
     */
    handle->offset = 0;
    /*
     * What kind of frames do we have to deal with? Fall back
     * to cooked mode if we have an unknown interface type
     * or a type we know doesn't work well in raw mode.
     */

    if (!is_any_device) {
        /* Assume for now we don't need cooked mode. */
        handle->md.cooked = 0;
        if (handle->opt.rfmon) {
            /*
             * We were asked to turn on monitor mode.
             * Do so before we get the link-layer type,
             * because entering monitor mode could change
             * the link-layer type.
             */
            err = enter_rfmon_mode(handle, sock_fd, device);
            if (err < 0) {
                /* Hard failure */
                close(sock_fd);
                return err;
            }
            if (err == 0) {
                /*
                 * Nothing worked for turning monitor mode
                 * on.
                 */
                close(sock_fd);
                return PCAP_ERROR_RFMON_NOTSUP;
            }
            /*
             * Either monitor mode has been turned on for
             * the device, or we've been given a different
             * device to open for monitor mode. If we've
             * been given a different device, use it.
             */
            if (handle->md.mondevice != NULL)
                device = handle->md.mondevice;
        }
        arptype = iface_get_arptype(sock_fd, device, handle->errbuf);
        if (arptype < 0) {
            close(sock_fd);
            return arptype;
        }
        map_arphrd_to_dlt(handle, arptype, 1);

        if (handle->linktype == -1 ||
            handle->linktype == DLT_LINUX_SLL ||
            handle->linktype == DLT_LINUX_IRDA ||
            handle->linktype == DLT_LINUX_LAPD ||
            (handle->linktype == DLT_EN10MB &&
             (strncmp("isdn", device, 4) == 0 ||
              strncmp("isdY", device, 4) == 0))) {
            if (close(sock_fd) == -1) {
                snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                    "close: %s", pcap_strerror(errno));
                return PCAP_ERROR;
            }
            sock_fd = socket(PF_PACKET, SOCK_DGRAM,
                htons(ETH_P_ALL));
            if (sock_fd == -1) {
                snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                    "socket: %s", pcap_strerror(errno));
                return PCAP_ERROR;
            }
            handle->md.cooked = 1;
            /*
             * Get rid of any link-layer type list
             * we allocated - this only supports cooked
             * capture.
             */
            if (handle->dlt_list != NULL) {
                free(handle->dlt_list);
                handle->dlt_list = NULL;
                handle->dlt_count = 0;
            }
            if (handle->linktype == -1) {
                /*
                 * Warn that we're falling back on
                 * cooked mode; we may want to
                 * update "map_arphrd_to_dlt()"
                 * to handle the new type.
                 */
                snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                    "arptype %d not "
                    "supported by libpcap - "
                    "falling back to cooked "
                    "socket",
                    arptype);
            }
            /*
             * IrDA capture is not a real "cooked" capture,
             * it's IrLAP frames, not IP packets. The
             * same applies to LAPD capture.
             */
            if (handle->linktype != DLT_LINUX_IRDA &&
                handle->linktype != DLT_LINUX_LAPD)
                handle->linktype = DLT_LINUX_SLL;
        }

        handle->md.ifindex = iface_get_id(sock_fd, device,
            handle->errbuf);
        if (handle->md.ifindex == -1) {
            close(sock_fd);
            return PCAP_ERROR;
        }
        // Here the long-awaited bind finally appears: iface_bind is the binding function. I guessed that it calls bind() internally, and after tracing iface_bind that proved correct.
        if ((err = iface_bind(sock_fd, handle->md.ifindex,
            handle->errbuf)) != 1) {
            close(sock_fd);
            if (err < 0)
                return err;
            else
                return 0; /* try old mechanism */
        }
    } else {
        /*
         * The "any" device.
         */
        if (handle->opt.rfmon) {
            /*
             * It doesn't support monitor mode.
             */
            return PCAP_ERROR_RFMON_NOTSUP;
        }
        /*
         * It uses cooked mode.
         */
        handle->md.cooked = 1;
        handle->linktype = DLT_LINUX_SLL;
        /*
         * We're not bound to a device.
         * For now, we're using this as an indication
         * that we can't transmit; stop doing that only
         * if we figure out how to transmit in cooked
         * mode.
         */
        handle->md.ifindex = -1;
    }

    if (!is_any_device && handle->opt.promisc) {
        memset(&mr, 0, sizeof(mr));
        mr.mr_ifindex = handle->md.ifindex;
        mr.mr_type = PACKET_MR_PROMISC;
        if (setsockopt(sock_fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
            &mr, sizeof(mr)) == -1) {
            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                "setsockopt: %s", pcap_strerror(errno));
            close(sock_fd);
            return PCAP_ERROR;
        }
    }
    /* Enable auxiliary data if supported and reserve room for
     * reconstructing VLAN headers. */
#ifdef HAVE_PACKET_AUXDATA
    val = 1;
    if (setsockopt(sock_fd, SOL_PACKET, PACKET_AUXDATA, &val,
        sizeof(val)) == -1 && errno != ENOPROTOOPT) {
        snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
            "setsockopt: %s", pcap_strerror(errno));
        close(sock_fd);
        return PCAP_ERROR;
    }
    handle->offset += VLAN_TAG_LEN;
#endif /* HAVE_PACKET_AUXDATA */
    if (handle->md.cooked) {
        if (handle->snapshot < SLL_HDR_LEN + 1)
            handle->snapshot = SLL_HDR_LEN + 1;
    }
    handle->bufsize = handle->snapshot;
    /* Save the socket FD in the pcap structure */
    handle->fd = sock_fd;
    return 1;
#else
    // without PF_PACKET support the function simply returns 0
    strncpy(ebuf,
        "New packet capturing interface not supported by build "
        "environment", PCAP_ERRBUF_SIZE);
    return 0;
#endif
}
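One decision inside activate_new is worth isolating: whether to open the PF_PACKET socket as SOCK_RAW or as SOCK_DGRAM ("cooked" mode, where the kernel removes the link-layer header). The sketch below condenses that choice into one function. The function name and the linktype_usable flag are hypothetical simplifications; the real code derives the answer from the ARP hardware type of the interface.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Returns SOCK_DGRAM (cooked mode, link-layer header removed by the kernel)
 * or SOCK_RAW (frames delivered with their link-layer header).
 * linktype_usable is a hypothetical flag: nonzero when the interface's
 * link-layer type is known and works in raw mode. */
int packet_socket_type(const char *device, int linktype_usable)
{
    assert(device != NULL);
    if (strcmp(device, "any") == 0)
        return SOCK_DGRAM;   /* no single link-layer type across all devices */
    return linktype_usable ? SOCK_RAW : SOCK_DGRAM;
}
```

Cooked mode trades information for uniformity: the capture loses the original link-layer header but gets one common pseudo-header format regardless of the device.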

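The source of activate_new did not answer my question either: when PF_RING is in use, the other two socket types should not be tried at all. So I went back to pcap_activate_linux and read it carefully again, and this time I saw it. I had overlooked the handle->ring != NULL check, which cost me a long detour through other code, although I did learn something along the way: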

if (handle->ring != NULL) {
    handle->fd = handle->ring->fd;
    handle->bufsize = handle->snapshot;
    handle->linktype = DLT_EN10MB;
    handle->offset = 2;
    /* printf("Open HAVE_PF_RING(%s)\n", device); */
} else {
    /* printf("Open HAVE_PF_RING(%s) failed. Fallback to pcap\n", device); */
    ...
}

When handle->ring != NULL, the code that calls activate_new and the rest is skipped. In other words, once PF_RING has been set up successfully the other two socket types are not tried, just as I expected. Now the behavior of pcap_activate_linux is finally clear.

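Addendum, 2011-4-18. pfring_open does not succeed in every case. For example, in my experiment, after patching PF_RING into the kernel, I got the error "Wrong RING version: kernel is 10, libpfring was compiled with 13", yet the program still ran correctly. I later found that ring.h defines the kernel-side PF_RING version as:

#define RING_FLOWSLOT_VERSION 10

while pf_ring.h has:

#define RING_FLOWSLOT_VERSION 13

In pfring_open_consumer, which implements pfring_open, a version mismatch prints this error and returns immediately, so pfring_open returns NULL. Why does the program keep running? Because execution takes the else branch of the handle->ring != NULL check and then uses the original libpcap receive path, that is, packets are read through PF_PACKET, so the program still works.

Likewise, if the module has not been loaded with insmod pf_ring.ko, pfring_open returns NULL, and the program again falls back to libpcap's original PF_PACKET path.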
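The order in which pcap_activate_linux tries its capture mechanisms can be summarized as a small decision function. This is a simplified model written for this article, not code from libpcap: the enum, the function name, and the three integer inputs are all invented, with the inputs standing for the results of pfring_open, activate_new, and activate_old.

```c
#include <assert.h>

enum backend {
    BACKEND_PF_RING,
    BACKEND_PF_PACKET,
    BACKEND_SOCK_PACKET,
    BACKEND_NONE
};

/* ring_ok:    nonzero when the PF_RING open succeeded
 * new_status: 1 = PF_PACKET socket opened, 0 = interface absent, <0 = fatal
 * old_status: 1 = SOCK_PACKET socket opened, anything else = failure */
enum backend pick_backend(int ring_ok, int new_status, int old_status)
{
    assert(new_status <= 1);
    if (ring_ok)
        return BACKEND_PF_RING;        /* the later probes are skipped */
    if (new_status == 1)
        return BACKEND_PF_PACKET;
    if (new_status == 0 && old_status == 1)
        return BACKEND_SOCK_PACKET;    /* non-fatal: the old mechanism worked */
    return BACKEND_NONE;               /* fatal error or both probes failed */
}
```

The version mismatch and the missing kernel module described above both correspond to ring_ok being 0, which is why capture continued through PF_PACKET in those experiments.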

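One more observation: when reading packets through PF_RING, CPU usage rose from 37% to 47%. Sending 1514-byte packets at 240 Mbit/s, the original setup lost about 3 packets every 2 minutes; with PF_RING this improved to about 2 packets every 3 minutes.

The callbacks set in pcap_activate_linux also deserve attention. They are listed here: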

device = handle->opt.source;

handle->inject_op = pcap_inject_linux;

handle->setfilter_op = pcap_setfilter_linux;

handle->setdirection_op = pcap_setdirection_linux;

handle->set_datalink_op = NULL; /* can't change data link type */

handle->getnonblock_op = pcap_getnonblock_fd;

handle->setnonblock_op = pcap_setnonblock_fd;

handle->cleanup_op = pcap_cleanup_linux;

handle->read_op = pcap_read_linux;

handle->stats_op = pcap_stats_linux;

I will not go through the other callbacks. The one to focus on is pcap_read_linux, whose source is:

/*
 * Read at most max_packets from the capture stream and call the callback
 * for each of them. Returns the number of packets handled or -1 if an
 * error occurred.
 */

static int
pcap_read_linux(pcap_t *handle, int max_packets, pcap_handler callback, u_char *user)
{
    /*
     * Currently, on Linux only one packet is delivered per read,
     * so we don't loop.
     */
    return pcap_read_packet(handle, callback, user);
}

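The function body is very simple: a single statement that calls pcap_read_packet to read a packet.

pcap_read_packet is a long function, so we take it step by step. Only by understanding this code thoroughly can we see why plain libpcap loses packets and why it loses fewer once the PF_RING patch is applied. When is this callback invoked? My guess is that it is called when the application reads packets through pcap_next, pcap_next_ex, pcap_dispatch, or pcap_loop; that is only a guess for now, since I have not yet analyzed that part of the code. Here is pcap_read_packet.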

/*
 * Read a packet from the socket calling the handler provided by
 * the user. Returns the number of packets received or -1 if an
 * error occurred.
 */
static int
pcap_read_packet(pcap_t *handle, pcap_handler callback, u_char *userdata)
{
    u_char *bp;
    int offset;
#ifdef HAVE_PF_PACKET_SOCKETS
    struct sockaddr_ll from;
    struct sll_header *hdrp;
#else
    struct sockaddr from;
#endif
#if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)
    struct iovec iov;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    union {
        struct cmsghdr cmsg;
        char buf[CMSG_SPACE(sizeof(struct tpacket_auxdata))];
    } cmsg_buf;
#else /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */
    socklen_t fromlen;
#endif /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */
    int packet_len, caplen;
#ifdef HAVE_PF_RING
    struct pfring_pkthdr pcap_header;
#else
    struct pcap_pkthdr pcap_header;
#endif

// A note is needed here: when HAVE_PF_RING is defined, pcap_header is a struct pfring_pkthdr. Let us see how it differs from struct pcap_pkthdr. pfring_pkthdr is defined as follows:

struct pfring_pkthdr {
    /* pcap header */
    struct timeval ts;    /* timestamp */
    u_int32_t caplen;     /* length of portion present */
    u_int32_t len;        /* length this packet (off wire) */
    struct pfring_extended_pkthdr extended_hdr; /* PF_RING extended header */
};

pcap_pkthdr is defined as follows:

struct pcap_pkthdr {

struct timeval ts; /* time stamp */

bpf_u_int32 caplen; /* length of portion present */

bpf_u_int32 len; /* length this packet (off wire) */

};

// Comparing the two, pfring_pkthdr has one additional member: the PF_RING extended header.
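Why this layout matters can be shown with simplified stand-ins for the two structures. In the sketch below, struct basic_hdr mirrors the three pcap fields, struct extended_hdr adds an invented extension (the members of struct extended_info are hypothetical and are not the real pfring_extended_pkthdr), and prefix_matches checks that the shared fields sit at the same offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Simplified stand-ins; the real definitions live in pcap.h and pfring.h. */
struct basic_hdr {
    struct timeval ts;
    uint32_t caplen;
    uint32_t len;
};

struct extended_info {         /* hypothetical extra metadata */
    uint16_t vlan_id;
    uint8_t  l3_proto;
};

struct extended_hdr {
    struct timeval ts;         /* same three leading fields ... */
    uint32_t caplen;
    uint32_t len;
    struct extended_info ext;  /* ... followed by the extension */
};

/* Nonzero when the leading fields of both structs have identical offsets,
 * which is what lets code written for the basic header read them. */
int prefix_matches(void)
{
    return offsetof(struct basic_hdr, ts) == offsetof(struct extended_hdr, ts) &&
           offsetof(struct basic_hdr, caplen) == offsetof(struct extended_hdr, caplen) &&
           offsetof(struct basic_hdr, len) == offsetof(struct extended_hdr, len);
}

/* Copy the common prefix out of an extended header. */
struct basic_hdr to_basic(const struct extended_hdr *e)
{
    struct basic_hdr b;

    assert(e != NULL);
    b.ts = e->ts;
    b.caplen = e->caplen;
    b.len = e->len;
    return b;
}
```

Keeping the pcap fields first means the extended header can carry extra per-packet metadata while the timestamp and the two length fields stay where pcap-style code expects them.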

#ifdef HAVE_PF_RING
    if (handle->ring) {
        do {
            if (handle->break_loop) {
                /*
                 * Yes - clear the flag that indicates that it
                 * has, and return -2 as an indication that we
                 * were told to break out of the loop.
                 *
                 * Patch courtesy of Michael Stiller <ms@2scale.net>
                 */
                handle->break_loop = 0;
                return -2;
            }
            packet_len = pfring_recv(handle->ring, (char *)handle->buffer,
                                     handle->bufsize,
                                     &pcap_header,
                                     1 /* wait_for_incoming_packet */);
            /* With PF_RING defined, pfring_recv receives the packet (it is explained later). Without PF_RING, recvmsg or recvfrom is used instead; the difference between those two is not covered here. */
            if (packet_len > 0) {
                bp = handle->buffer;
                pcap_header.caplen = min(pcap_header.caplen, handle->bufsize);
                caplen = pcap_header.caplen, packet_len = pcap_header.len;
                goto pfring_pcap_read_packet;
            }
        } while (packet_len == -1 && (errno == EINTR || errno == ENETDOWN));
    }
#endif

#ifdef HAVE_PF_PACKET_SOCKETS
    /*
     * If this is a cooked device, leave extra room for a
     * fake packet header.
     */
    if (handle->md.cooked)
        offset = SLL_HDR_LEN;
    else
        offset = 0;
#else
    /*
     * This system doesn't have PF_PACKET sockets, so it doesn't
     * support cooked devices.
     */
    offset = 0;
#endif

    /*
     * Receive a single packet from the kernel.
     * We ignore EINTR, as that might just be due to a signal
     * being delivered - if the signal should interrupt the
     * loop, the signal handler should call pcap_breakloop()
     * to set handle->break_loop (we ignore it on other
     * platforms as well).
     * We also ignore ENETDOWN, so that we can continue to
     * capture traffic if the interface goes down and comes
     * back up again; comments in the kernel indicate that
     * we'll just block waiting for packets if we try to
     * receive from a socket that delivered ENETDOWN, and,
     * if we're using a memory-mapped buffer, we won't even
     * get notified of "network down" events.
     */
    bp = handle->buffer + handle->offset;

#if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)
    msg.msg_name = &from;
    msg.msg_namelen = sizeof(from);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = &cmsg_buf;
    msg.msg_controllen = sizeof(cmsg_buf);
    msg.msg_flags = 0;

    iov.iov_len = handle->bufsize - offset;
    iov.iov_base = bp + offset;
#endif /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */

    do {
        /*
         * Has "pcap_breakloop()" been called?
         */
        if (handle->break_loop) {
            /*
             * Yes - clear the flag that indicates that it has,
             * and return PCAP_ERROR_BREAK as an indication that
             * we were told to break out of the loop.
             */
            handle->break_loop = 0;
            return PCAP_ERROR_BREAK;
        }

#if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)
        packet_len = recvmsg(handle->fd, &msg, MSG_TRUNC);
#else /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */
        fromlen = sizeof(from);
        packet_len = recvfrom(
            handle->fd, bp + offset,
            handle->bufsize - offset, MSG_TRUNC,
            (struct sockaddr *)&from, &fromlen);
#endif /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */
    } while (packet_len == -1 && errno == EINTR);

    /* Check if an error occurred */
    if (packet_len == -1) {
        switch (errno) {

        case EAGAIN:
            return 0;    /* no packet there */

        case ENETDOWN:
            /*
             * The device on which we're capturing went away.
             *
             * XXX - we should really return
             * PCAP_ERROR_IFACE_NOT_UP, but pcap_dispatch()
             * etc. aren't defined to return that.
             */
            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                "The interface went down");
            return PCAP_ERROR;

        default:
            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                "recvfrom: %s", pcap_strerror(errno));
            return PCAP_ERROR;
        }
    }

#ifdef HAVE_PF_PACKET_SOCKETS
    if (!handle->md.sock_packet) {
        /*
         * Unfortunately, there is a window between socket() and
         * bind() where the kernel may queue packets from any
         * interface. If we're bound to a particular interface,
         * discard packets not from that interface.
         *
         * (If socket filters are supported, we could do the
         * same thing we do when changing the filter; however,
         * that won't handle packet sockets without socket
         * filter support, and it's a bit more complicated.
         * It would save some instructions per packet, however.)
         */
        if (handle->md.ifindex != -1 &&
            from.sll_ifindex != handle->md.ifindex)
            return 0;

        /*
         * Do checks based on packet direction.
         * We can only do this if we're using PF_PACKET; the
         * address returned for SOCK_PACKET is a "sockaddr_pkt"
         * which lacks the relevant packet type information.
         */
        if (from.sll_pkttype == PACKET_OUTGOING) {
            /*
             * Outgoing packet.
             * If this is from the loopback device, reject it;
             * we'll see the packet as an incoming packet as well,
             * and we don't want to see it twice.
             */
            if (from.sll_ifindex == handle->md.lo_ifindex)
                return 0;

            /*
             * If the user only wants incoming packets, reject it.
             */
            if (handle->direction == PCAP_D_IN)
                return 0;
        } else {
            /*
             * Incoming packet.
             * If the user only wants outgoing packets, reject it.
             */
            if (handle->direction == PCAP_D_OUT)
                return 0;
        }
    }
#endif

#ifdef HAVE_PF_PACKET_SOCKETS
    /*
     * If this is a cooked device, fill in the fake packet header.
     */
    if (handle->md.cooked) {
        /*
         * Add the length of the fake header to the length
         * of packet data we read.
         */
        packet_len += SLL_HDR_LEN;

        hdrp = (struct sll_header *)bp;
        hdrp->sll_pkttype = map_packet_type_to_sll_type(from.sll_pkttype);
        hdrp->sll_hatype = htons(from.sll_hatype);
        hdrp->sll_halen = htons(from.sll_halen);
        memcpy(hdrp->sll_addr, from.sll_addr,
            (from.sll_halen > SLL_ADDRLEN) ?
              SLL_ADDRLEN :
              from.sll_halen);
        hdrp->sll_protocol = from.sll_protocol;
    }

#ifdef HAVE_PF_RING
pfring_pcap_read_packet:
#endif

#if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        struct tpacket_auxdata *aux;
        unsigned int len;
        struct vlan_tag *tag;

        if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct tpacket_auxdata)) ||
            cmsg->cmsg_level != SOL_PACKET ||
            cmsg->cmsg_type != PACKET_AUXDATA)
            continue;

        aux = (struct tpacket_auxdata *)CMSG_DATA(cmsg);
        if (aux->tp_vlan_tci == 0)
            continue;

        len = packet_len > iov.iov_len ? iov.iov_len : packet_len;
        if (len < 2 * ETH_ALEN)
            break;

        bp -= VLAN_TAG_LEN;
        memmove(bp, bp + VLAN_TAG_LEN, 2 * ETH_ALEN);

        tag = (struct vlan_tag *)(bp + 2 * ETH_ALEN);
        tag->vlan_tpid = htons(ETH_P_8021Q);
        tag->vlan_tci = htons(aux->tp_vlan_tci);

        packet_len += VLAN_TAG_LEN;
    }
#endif /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */
#endif /* HAVE_PF_PACKET_SOCKETS */

    /*
     * XXX: According to the kernel source we should get the real
     * packet len if calling recvfrom with MSG_TRUNC set. It does
     * not seem to work here :(, but it is supported by this code
     * anyway.
     * To be honest the code RELIES on that feature so this is really
     * broken with 2.2.x kernels.
     * I spent a day to figure out what's going on and I found out
     * that the following is happening:
     *
     * The packet comes from a random interface and the packet_rcv
     * hook is called with a clone of the packet. That code inserts
     * the packet into the receive queue of the packet socket.
     * If a filter is attached to that socket that filter is run
     * first - and there lies the problem. The default filter always
     * cuts the packet at the snaplen:
     *
     * # tcpdump -d
     * (000) ret      #68
     *
     * So the packet filter cuts down the packet. The recvfrom call
     * says "hey, it's only 68 bytes, it fits into the buffer" with
     * the result that we don't get the real packet length. This
     * is valid at least until kernel 2.2.17pre6.
     *
     * We currently handle this by making a copy of the filter
     * program, fixing all "ret" instructions with non-zero
     * operands to have an operand of 65535 so that the filter
     * doesn't truncate the packet, and supplying that modified
     * filter to the kernel.
     */

    caplen = packet_len;
    if (caplen > handle->snapshot)
        caplen = handle->snapshot;

    /* Run the packet filter if not using kernel filter */
    if (!handle->md.use_bpf && handle->fcode.bf_insns) {
        if (bpf_filter(handle->fcode.bf_insns, bp,
                       packet_len, caplen) == 0)
        {
            /* rejected by filter */
            return 0;
        }
    }

    /* Fill in our own header data */

#ifdef HAVE_PF_RING
    if (!handle->ring) {
#endif
        if (ioctl(handle->fd, SIOCGSTAMP, &pcap_header.ts) == -1) {
            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                "SIOCGSTAMP: %s", pcap_strerror(errno));
            return PCAP_ERROR;
        }
        pcap_header.caplen = caplen;
        pcap_header.len = packet_len;
#ifdef HAVE_PF_RING
    }
#endif

    /*
     * Count the packet.
     *
     * Arguably, we should count them before we check the filter,
     * as on many other platforms "ps_recv" counts packets
     * handed to the filter rather than packets that passed
     * the filter, but if filtering is done in the kernel, we
     * can't get a count of packets that passed the filter,
     * and that would mean the meaning of "ps_recv" wouldn't
     * be the same on all Linux systems.
     *
     * XXX - it's not the same on all systems in any case;
     * ideally, we should have a "get the statistics" call
     * that supplies more counts and indicates which of them
     * it supplies, so that we supply a count of packets
     * handed to the filter only on platforms where that
     * information is available.
     *
     * We count them here even if we can get the packet count
     * from the kernel, as we can only determine at run time
     * whether we'll be able to get it from the kernel (if
     * HAVE_TPACKET_STATS isn't defined, we can't get it from
     * the kernel, but if it is defined, the library might
     * have been built with a 2.4 or later kernel, but we
     * might be running on a 2.2[.x] kernel without Alexey
     * Kuznetzov's turbopacket patches, and thus the kernel
     * might not be able to supply those statistics). We
     * could, I guess, try, when opening the socket, to get
     * the statistics, and if we can not increment the count
     * here, but it's not clear that always incrementing
     * the count is more expensive than always testing a flag
     * in memory.
     *
     * We keep the count in "md.packets_read", and use that for
     * "ps_recv" if we can't get the statistics from the kernel.
     * We do that because, if we *can* get the statistics from
     * the kernel, we use "md.stat.ps_recv" and "md.stat.ps_drop"
     * as running counts, as reading the statistics from the
     * kernel resets the kernel statistics, and if we directly
     * increment "md.stat.ps_recv" here, that means it will
     * count packets *twice* on systems where we can get kernel
     * statistics - once here, and once in pcap_stats_linux().
     */
    handle->md.packets_read++;

    /* Call the user supplied callback function */
#if defined(HAVE_PF_RING)
    {
        struct myts {
            struct timeval ts;
            u_int32_t caplen, len;
            u_int64_t ns;
        };
        struct myts myhdr;

        myhdr.ts.tv_sec = pcap_header.ts.tv_sec, myhdr.ts.tv_usec = pcap_header.ts.tv_usec;
        myhdr.caplen = pcap_header.caplen, myhdr.len = pcap_header.len;
        myhdr.ns = pcap_header.extended_hdr.timestamp_ns;
        callback(userdata, (struct pcap_pkthdr *)&myhdr, bp);
    }
#else
    callback(userdata, &pcap_header, bp);
#endif
    /*
     * The function is long, but read from top to bottom it is fairly easy
     * to follow: it receives one packet using whichever call matches the
     * socket type, and at the end it checks whether HAVE_PF_RING is
     * defined. If it is, the header handed to the callback is laid out
     * differently, as the code above shows.
     */
    return 1;
}
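The PF_RING branch of the callback step builds a local struct myts whose first three fields match pcap_pkthdr, then hands it to the user callback through a pointer cast. To illustrate why code that only knows the shorter header still reads sensible values, here is a minimal sketch. The types demo_pkthdr and demo_ext_pkthdr and both functions are hypothetical stand-ins written for this note, not libpcap or PF_RING definitions. The sketch also copies the shared prefix with memcpy instead of casting the pointer the way the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical stand-in for pcap_pkthdr: the three classic fields. */
struct demo_pkthdr {
    struct timeval ts;
    uint32_t caplen;
    uint32_t len;
};

/* Hypothetical stand-in for the local "struct myts": same leading
 * fields, followed by a nanosecond timestamp. */
struct demo_ext_pkthdr {
    struct timeval ts;
    uint32_t caplen;
    uint32_t len;
    uint64_t ns;
};

/* Returns 1 when the shared fields sit at identical offsets in both
 * structs, which is what the cast in the patch silently relies on. */
int demo_prefix_matches(void)
{
    return offsetof(struct demo_pkthdr, ts) == offsetof(struct demo_ext_pkthdr, ts)
        && offsetof(struct demo_pkthdr, caplen) == offsetof(struct demo_ext_pkthdr, caplen)
        && offsetof(struct demo_pkthdr, len) == offsetof(struct demo_ext_pkthdr, len);
}

/* Copies the shared prefix out of the extended header. Using memcpy
 * (an assumption of this sketch, not what the patch does) sidesteps
 * any aliasing questions a pointer cast would raise. */
struct demo_pkthdr demo_base_view(const struct demo_ext_pkthdr *ext)
{
    struct demo_pkthdr base;
    const size_t prefix = offsetof(struct demo_pkthdr, len) + sizeof base.len;

    assert(ext != NULL);
    memset(&base, 0, sizeof base);
    memcpy(&base, ext, prefix);
    return base;
}
```

With a demo_ext_pkthdr filled in, demo_base_view(&ext) yields a demo_pkthdr carrying the same ts, caplen and len. The trailing ns field is simply invisible to code that only knows the shorter layout.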

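After all of this, pcap_open_live is still not finished. These dozens of pages have covered just one of the functions that pcap_open_live calls, namely pcap_create, which leads into pcap-linux.c. libpcap is broad and deep, and with PF_RING added it feels deeper still. Since we are not done, let's continue with another function called from pcap_open_live: pcap_activate. The code reached through pcap_create creates and binds the socket and installs a number of callbacks. So what does pcap_activate do? Let the source speak. I love Linux, I love open source.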

int pcap_activate(pcap_t *p)
{
    int status;

    status = p->activate_op(p);
    /*
     * What is activate_op? A quick search shows that it is a function
     * pointer. Where is it assigned? Searching the source turns up the
     * assignment in pcap-linux.c:
     *
     *     handle->activate_op = pcap_activate_linux;
     *
     * So the activate_op callback, which pcap_create set to
     * pcap_activate_linux, is finally called here. pcap_create only
     * assigns the callback; the actual call happens at this point.
     * Everything analyzed earlier only runs now.
     */
    if (status >= 0)    /* a return value >= 0 from pcap_activate_linux means success */
        p->activated = 1;
    else {
        if (p->errbuf[0] == '\0') {
            /*
             * No error message supplied by the activate routine;
             * for the benefit of programs that don't specially
             * handle errors other than PCAP_ERROR, return the
             * error message corresponding to the status.
             */
            snprintf(p->errbuf, PCAP_ERRBUF_SIZE, "%s",
                pcap_statustostr(status));
        }

        /*
         * Undo any operation pointer setting, etc. done by
         * the activate operation.
         */
        initialize_ops(p);
    }
    return (status);
}
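The pattern in pcap_activate, where one step stores a function pointer and a later step calls it, can be shown in isolation. The following is a minimal sketch under my own assumptions: demo_handle, demo_create, demo_activate and the two activate routines are hypothetical names invented for illustration. They mirror the shape of the code above, not libpcap's actual internals.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_ERRBUF_SIZE 64

struct demo_handle;
typedef int (*demo_activate_fn)(struct demo_handle *);

/* Hypothetical handle: stores the operation at create time and
 * records whether activation succeeded. */
struct demo_handle {
    demo_activate_fn activate_op;
    int activated;
    char errbuf[DEMO_ERRBUF_SIZE];
};

/* Two hypothetical platform routines: one succeeds, one fails. */
int demo_activate_ok(struct demo_handle *h)   { (void)h; return 0; }
int demo_activate_fail(struct demo_handle *h) { (void)h; return -1; }

/* Create step: only remembers which routine to call later. */
void demo_create(struct demo_handle *h, demo_activate_fn op)
{
    memset(h, 0, sizeof *h);
    h->activate_op = op;
}

/* Activate step: the deferred call happens here. On failure with no
 * message from the routine, a generic message is filled in. */
int demo_activate(struct demo_handle *h)
{
    int status;

    assert(h != NULL && h->activate_op != NULL);
    status = h->activate_op(h);
    if (status >= 0) {
        h->activated = 1;
    } else if (h->errbuf[0] == '\0') {
        snprintf(h->errbuf, sizeof h->errbuf,
                 "activation failed (status %d)", status);
    }
    return status;
}
```

Calling demo_create with demo_activate_ok and then demo_activate leaves activated set to 1. With demo_activate_fail the flag stays 0 and errbuf receives the fallback message, much as pcap_activate falls back to pcap_statustostr when the activate routine supplied no text.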

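That finally completes pcap_open_live, and I am off to dinner. There is still a lot left to analyze, so let's queue it up, starting with pcap_next and its siblings. The socket has been created and bound, so it is time to capture data. The capture callback has been installed as well: it is pcap_read_linux, which in turn means pcap_read_packet. Since nothing in pcap_open_live called it, my guess is that pcap_next and the other read functions are what finally invoke this callback; we shall see. Dinner first, since nobody works well on an empty stomach. See you shortly.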
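Before reading the real pcap_next and pcap_loop code, it helps to see how any capture loop could consume the return values seen in pcap_read_packet: 1 for a delivered packet, 0 for nothing delivered (no packet, or rejected by the filter), -1 for an error, and -2 after a break request. The sketch below is my own illustration under those assumptions. demo_capture_loop, demo_scripted_read and the DEMO_ constants are hypothetical names, and this is not libpcap's pcap_loop.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ERROR       (-1)  /* mirrors the -1 error return */
#define DEMO_ERROR_BREAK (-2)  /* mirrors the -2 break return */

/* Hypothetical read callback: returns 1, 0, or a negative status. */
typedef int (*demo_read_fn)(void *state);

/* Calls read_one until max_packets packets were delivered, or until
 * it reports an error or a break. Returns the number of packets
 * delivered, or the negative status that stopped the loop. */
int demo_capture_loop(demo_read_fn read_one, void *state, int max_packets)
{
    int delivered = 0;

    assert(read_one != NULL);
    while (delivered < max_packets) {
        int n = read_one(state);
        if (n < 0)
            return n;        /* error or break: stop immediately */
        delivered += n;      /* 0 means nothing was delivered this time */
    }
    return delivered;
}

/* Scripted reader used only to exercise the loop: replays a fixed
 * list of results, then reports a break. */
struct demo_script {
    const int *results;
    size_t count;
    size_t next;
};

int demo_scripted_read(void *state)
{
    struct demo_script *s = state;

    if (s->next >= s->count)
        return DEMO_ERROR_BREAK;
    return s->results[s->next++];
}
```

A reader that returns 0 does not end the loop; it only means that call delivered nothing. That is why the filter rejection and the direction checks in pcap_read_packet can return 0 without stopping a capture.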