libpcap + PF_RING Source Code Analysis, Parts 1 and 2

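libpcap is the packet capture library used on Linux. It is built mainly on sockets. The essential difference from WinPcap is that WinPcap's driver sits at the same level as the TCP/IP protocol stack, whereas libpcap is a user-space library that wraps the kernel's socket interface once more. A packet coming off the NIC therefore has to be copied several times before it reaches the application, and on gigabit links capture performance is poor. To improve it, libpcap can be patched with PF_RING: the patched libpcap receives packets from the NIC into a ring buffer and maps that buffer into the application with mmap, which reduces the number of memory copies. To understand libpcap, pfring, and libpfring better, this article analyzes their source code. pfring is the kernel-side code, and libpfring is a wrapper around it for applications to call. Packets can in fact be captured with libpfring directly, without libpcap, but because most sniffing tools today are built on libpcap, we keep the libpcap interface and use pfring underneath to change how the socket is implemented.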

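WinPcap and libpcap differ in that WinPcap operates at the same level as TCP/IP, while libpcap is an application-layer development library. With the PF_RING patch applied, libpcap becomes somewhat similar to WinPcap: both use a circular kernel buffer whose size can be configured. Another difference is that WinPcap lets you set a minimum-to-copy size (mintocopy), so that data is copied to the application buffer only once the kernel buffer holds at least that much; libpcap has no such feature. libpcap passes data up to the upper layer mainly in response to NIC interrupts or polling.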
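The circular buffer idea can be pictured with a small user-space sketch. This is not PF_RING's actual memory layout: the names `ring_t`, `ring_put`, and `ring_get`, the slot count, and the slot size are all assumptions chosen for illustration. It only shows how an insert offset and a remove offset let a producer and a consumer share fixed-size slots, and how a full ring turns into dropped packets.

```c
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 4      /* hypothetical slot count */
#define SLOT_LEN   64     /* hypothetical bytes per slot */

typedef struct {
    unsigned insert_off;              /* next slot the producer writes */
    unsigned remove_off;              /* next slot the consumer reads */
    unsigned used;                    /* slots currently holding data */
    unsigned dropped;                 /* packets discarded because the ring was full */
    size_t   len[RING_SLOTS];         /* stored length of each slot */
    unsigned char slot[RING_SLOTS][SLOT_LEN];
} ring_t;

/* Producer side: copy one packet into the next free slot, or count a drop. */
int ring_put(ring_t *r, const void *pkt, size_t n)
{
    if (r->used == RING_SLOTS) {
        r->dropped++;
        return -1;
    }
    if (n > SLOT_LEN)
        n = SLOT_LEN;                 /* truncate, much like a snaplen */
    memcpy(r->slot[r->insert_off], pkt, n);
    r->len[r->insert_off] = n;
    r->insert_off = (r->insert_off + 1) % RING_SLOTS;
    r->used++;
    return 0;
}

/* Consumer side: return a pointer into the ring; no second copy is made. */
const unsigned char *ring_get(ring_t *r, size_t *n)
{
    const unsigned char *p;

    if (r->used == 0)
        return NULL;
    p = r->slot[r->remove_off];
    *n = r->len[r->remove_off];
    r->remove_off = (r->remove_off + 1) % RING_SLOTS;
    r->used--;
    return p;
}
```

In a real kernel ring the producer runs in the kernel and the consumer reads the same memory through mmap, so synchronization is needed; this single-threaded sketch leaves that out.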

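We follow libpcap as the main thread. pcap_open_live first performs initialization, such as opening the NIC and setting up the callbacks that read packets; after that, packets can be captured with pcap_next, pcap_next_ex, pcap_dispatch, or pcap_loop. The purpose of this article is to analyze the source, from the user-space libpcap and libpfring down to PF_RING in the kernel, so that we understand PF_RING in depth and see how it improves libpcap's capture performance.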


1) pcap_open_live 

We start with the user-space libpcap. The first function to analyze is pcap_open_live, found in pcap.c. Its source is as follows:

pcap_t *pcap_open_live(const char *source, int snaplen, int promisc, int to_ms, char *errbuf)
{
       pcap_t *p;
       int status;

       p = pcap_create(source, errbuf);
       if (p == NULL)
              return (NULL);
       status = pcap_set_snaplen(p, snaplen);
       if (status < 0)
              goto fail;
       status = pcap_set_promisc(p, promisc);
       if (status < 0)
              goto fail;
       status = pcap_set_timeout(p, to_ms);
       if (status < 0)
              goto fail;

       p->oldstyle = 1;
       status = pcap_activate(p);
       if (status < 0)
              goto fail;
       return (p);

fail:
       if (status == PCAP_ERROR)
              snprintf(errbuf, PCAP_ERRBUF_SIZE, "%s: %s", source,
                  p->errbuf);
       else if (status == PCAP_ERROR_NO_SUCH_DEVICE ||
           status == PCAP_ERROR_PERM_DENIED)
              snprintf(errbuf, PCAP_ERRBUF_SIZE, "%s: %s (%s)", source,
                  pcap_statustostr(status), p->errbuf);
       else
              snprintf(errbuf, PCAP_ERRBUF_SIZE, "%s: %s", source,
                  pcap_statustostr(status));
       pcap_close(p);
       return (NULL);
}

 

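As the source shows, pcap_open_live first calls pcap_create, whose contents are analyzed below. It then calls pcap_set_snaplen to set the maximum number of bytes captured per packet. An Ethernet frame is at most 1518 bytes, and setting the value to 65535 is enough to capture any packet in full. Next, pcap_set_promisc sets the capture mode (1 means promiscuous mode), and pcap_set_timeout sets the read timeout: if the application has read no data within that time, the read returns. Then comes pcap_activate, which is also explained below. Between pcap_create and pcap_activate you can additionally call pcap_set_buffer_size to set the size of the kernel buffer; opentest.c shows how it is called, and it is covered later in this article.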

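To support many operating systems, the libpcap source is intricate. Search for pcap_create and you will find it defined in many places. Since we are analyzing the code on Linux, we look for pcap_create in pcap-linux.c first. Its source is as follows: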

pcap_t *pcap_create(const char *device, char *ebuf)
{     // device: name of the network device; ebuf: buffer that receives error messages
       pcap_t *handle;

       /*
        * A null device name is equivalent to the "any" device.
        */
       if (device == NULL)
              device = "any";

#ifdef HAVE_DAG_API
       if (strstr(device, "dag")) {
              return dag_create(device, ebuf);
       }
#endif /* HAVE_DAG_API */

#ifdef HAVE_SEPTEL_API
       if (strstr(device, "septel")) {
              return septel_create(device, ebuf);
       }
#endif /* HAVE_SEPTEL_API */

#ifdef HAVE_SNF_API
        handle = snf_create(device, ebuf);
        if (strstr(device, "snf") || handle != NULL)
              return handle;

#endif /* HAVE_SNF_API */

#ifdef PCAP_SUPPORT_BT
       if (strstr(device, "bluetooth")) {
              return bt_create(device, ebuf);
       }
#endif

#ifdef PCAP_SUPPORT_CAN
       if (strstr(device, "can") || strstr(device, "vcan")) {
              return can_create(device, ebuf);
       }
#endif

#ifdef PCAP_SUPPORT_USB
       if (strstr(device, "usbmon")) {
              return usb_create(device, ebuf);
       }
#endif

       handle = pcap_create_common(device, ebuf);
       if (handle == NULL)
              return NULL;
       // pcap_create_common is the initialization routine: given the device name it returns a pcap_t * handle, whose callbacks are then set below.
       handle->activate_op = pcap_activate_linux;
       handle->can_set_rfmon_op = pcap_can_set_rfmon_linux;  // callback that checks whether rfmon (monitor) mode can be set
       return handle;
}
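The name-based dispatch in pcap_create can be modelled in a few self-contained lines. The sketch below is an illustration only: `backend_t`, `backends`, and `backend_for` are hypothetical names, and the string labels stand in for the real `*_create()` functions. It shows that `strstr` matches a substring anywhere in the device name and that the checks run in a fixed order, with the generic Linux path as the fallback.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *needle;   /* substring looked for in the device name */
    const char *backend;  /* label standing in for a *_create() function */
} backend_t;

/* Checked in order, like the #ifdef blocks in pcap_create. */
static const backend_t backends[] = {
    { "dag", "dag" },
    { "septel", "septel" },
    { "bluetooth", "bluetooth" },
    { "can", "can" },
    { "usbmon", "usb" },
};

const char *backend_for(const char *device)
{
    size_t i;

    if (device == NULL)
        device = "any";               /* same default as pcap_create */
    for (i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
        if (strstr(device, backends[i].needle) != NULL)
            return backends[i].backend;
    return "linux";                   /* stands in for pcap_create_common */
}
```

Because `strstr(device, "can")` already matches "vcan0", the separate "vcan" test in the original is redundant, and any device whose name merely contains "can" would take that branch as well.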

 

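To support different kinds of devices, pcap_create uses #ifdef blocks to tell them apart, so opening every device type is gathered into a single function. In our case the device is an ordinary NIC, so pcap_create_common is called. That function is defined in pcap.c, which seemed a little confusing at first: why not define it in pcap-linux.c, where it would be easier to find while tracing the code? The likely reason is that libpcap has to support other operating systems as well, and a function placed in pcap-linux.c would be awkward for them to call. Seen that way, the libpcap authors' architecture is quite sound. pcap_create also installs two callbacks, pcap_activate_linux and pcap_can_set_rfmon_linux, and returns a handle of type pcap_t * for the NIC. Having covered pcap_create, we need to follow pcap_create_common and the two callbacks. Here is the source of pcap_create_common: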

pcap_t *pcap_create_common(const char *source, char *ebuf)
{
       pcap_t *p;

       p = malloc(sizeof(*p));         // allocate memory for p
       if (p == NULL) {
              snprintf(ebuf, PCAP_ERRBUF_SIZE, "malloc: %s",
                  pcap_strerror(errno));
              return (NULL);
       }
       memset(p, 0, sizeof(*p));       // zero the memory of p
#ifndef WIN32
       p->fd = -1;     /* not opened yet */
       p->selectable_fd = -1;
       p->send_fd = -1;
#endif
       p->opt.source = strdup(source);    // source is the name of the NIC
       if (p->opt.source == NULL) {
              snprintf(ebuf, PCAP_ERRBUF_SIZE, "malloc: %s",
                  pcap_strerror(errno));
              free(p);
              return (NULL);
       }
       /*
        * Default to "can't set rfmon mode"; if it's supported by
        * a platform, the create routine that called us can set
        * the op to its routine to check whether a particular
        * device supports it.
        */
       p->can_set_rfmon_op = pcap_cant_set_rfmon;
       initialize_ops(p);

       /* put in some defaults */
       pcap_set_timeout(p, 0);
       pcap_set_snaplen(p, 65535);  /* max packet size */
       p->opt.promisc = 0;
       p->opt.buffer_size = 0;
       return (p);
}

 

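One function in here worth explaining is strdup. It duplicates a string and returns a pointer to the copy, which is allocated with malloc and must later be released with free. Using it requires the header #include <string.h>.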
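A minimal sketch of the strdup pattern used for the device name, assuming a POSIX system. The helper name `copy_device_name` is hypothetical and exists only for this example; the point is that the copy is independent of the caller's string and has to be freed by whoever owns it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of name that the caller must free(), or NULL on failure. */
char *copy_device_name(const char *name)
{
    char *copy;

    if (name == NULL)
        return NULL;
    copy = strdup(name);  /* allocates strlen(name) + 1 bytes and copies */
    return copy;          /* NULL here means the allocation failed */
}
```

pcap_create_common follows the same shape: it checks the strdup result for NULL and releases the partly built handle before returning an error.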

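p->can_set_rfmon_op = pcap_cant_set_rfmon; is explained by the comment inside the function: by default, rfmon mode cannot be set. initialize_ops(p); installs the initial set of callback functions.

       pcap_set_timeout(p, 0);
       pcap_set_snaplen(p, 65535);  /* max packet size */
       p->opt.promisc = 0;
       p->opt.buffer_size = 0;

These lines set the initial timeout to 0, set snaplen to 65535, select non-promiscuous mode, and initialize the kernel buffer size to 0. Overall, pcap_create_common is simply an initialization function.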

The source of initialize_ops is as follows:

static void initialize_ops(pcap_t *p)
{
       /*
        * Set operation pointers for operations that only work on
        * an activated pcap_t to point to a routine that returns
        * a "this isn't activated" error.
        */
       p->read_op = (read_op_t)pcap_not_initialized;
       p->inject_op = (inject_op_t)pcap_not_initialized;
       p->setfilter_op = (setfilter_op_t)pcap_not_initialized;
       p->setdirection_op = (setdirection_op_t)pcap_not_initialized;
       p->set_datalink_op = (set_datalink_op_t)pcap_not_initialized;
       p->getnonblock_op = (getnonblock_op_t)pcap_not_initialized;
       p->setnonblock_op = (setnonblock_op_t)pcap_not_initialized;
       p->stats_op = (stats_op_t)pcap_not_initialized;
#ifdef WIN32
       p->setbuff_op = (setbuff_op_t)pcap_not_initialized;
       p->setmode_op = (setmode_op_t)pcap_not_initialized;
       p->setmintocopy_op = (setmintocopy_op_t)pcap_not_initialized;
#endif

       /*
        * Default cleanup operation - implementations can override
        * this, but should call pcap_cleanup_live_common() after
        * doing their own additional cleanup.
        */
       p->cleanup_op = pcap_cleanup_live_common;
       /*
        * In most cases, the standard one-shot callback can
        * be used for pcap_next()/pcap_next_ex().
        */
       p->oneshot_callback = pcap_oneshot;
}
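The two-phase callback setup (a stub at creation time, the real routine at activation time) can be sketched with a single function pointer. Everything here is hypothetical: `capture_t`, `ERR_NOT_ACTIVATED`, and the function names are invented for illustration and do not appear in libpcap. Unlike initialize_ops, which casts one stub to several pointer types, the sketch gives the stub the same signature as the real reader.

```c
#include <stddef.h>

#define ERR_NOT_ACTIVATED (-3)   /* hypothetical error code */

typedef struct capture capture_t;
typedef int (*read_op_t)(capture_t *, int);

struct capture {
    int       activated;
    read_op_t read_op;     /* assigned twice: first a stub, then the real reader */
};

/* Stub installed at creation time, playing the role of pcap_not_initialized. */
int not_activated(capture_t *c, int count)
{
    (void)c;
    (void)count;
    return ERR_NOT_ACTIVATED;
}

/* Platform-specific reader installed by the activate step. */
int read_packets(capture_t *c, int count)
{
    (void)c;
    return count;          /* pretend that count packets were delivered */
}

void capture_init(capture_t *c)      /* plays the role of initialize_ops */
{
    c->activated = 0;
    c->read_op = not_activated;
}

void capture_activate(capture_t *c)  /* plays the role of pcap_activate_linux */
{
    c->read_op = read_packets;
    c->activated = 1;
}
```

Calling `read_op` before activation then fails cleanly with an error code instead of dereferencing a null pointer, which is the benefit of installing stubs first.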

 

That completes pcap_create_common. Next is the other callback set in pcap_create, pcap_activate_linux, which a search finds in pcap-linux.c. The libpcap authors' architecture is admirable here: the functions Linux needs are collected in pcap-linux.c, while functions shared by several operating systems, such as pcap_create_common above, live in pcap.c. pcap_create_common gives pcap_t *p its initial values, much like an initialization routine in C++ such as a constructor, or MFC's OnInitDialog. Each system then needs its own settings, and in pcap_activate_linux the callbacks that pcap_create_common initialized are assigned again with Linux-specific functions. That is why it makes sense for pcap_create_common to live in pcap.c. The source of pcap_activate_linux follows:

static int pcap_activate_linux(pcap_t *handle)
{
       const char      *device;
       int             status = 0;

       device = handle->opt.source;                          // name of the NIC
       handle->inject_op = pcap_inject_linux;
       handle->setfilter_op = pcap_setfilter_linux;
       handle->setdirection_op = pcap_setdirection_linux;
       handle->set_datalink_op = NULL; /* can't change data link type */
       handle->getnonblock_op = pcap_getnonblock_fd;
       handle->setnonblock_op = pcap_setnonblock_fd;
       handle->cleanup_op = pcap_cleanup_linux;
       handle->read_op = pcap_read_linux;
       handle->stats_op = pcap_stats_linux;
       /*
        * The "any" device is a special device which causes us not
        * to bind to a particular device and thus to look at all
        * devices.
        */
       if (strcmp(device, "any") == 0) {
              if (handle->opt.promisc) {
                     handle->opt.promisc = 0;
                     /* Just a warning. */
                     snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                         "Promiscuous mode not supported on the \"any\" device");
                     status = PCAP_WARNING_PROMISC_NOTSUP;
              }
       }

       handle->md.device = strdup(device);
       if (handle->md.device == NULL) {
              snprintf(handle->errbuf, PCAP_ERRBUF_SIZE, "strdup: %s",
                      pcap_strerror(errno) );
              return PCAP_ERROR;
       }

#ifdef HAVE_PF_RING                     // compiled only when PF_RING support is defined
       if (!getenv("PCAP_NO_PF_RING")) {
         /* Code courtesy of Chris Wakelin <c.d.wakelin@reading.ac.uk> */
         char *clusterId;
         handle->ring = pfring_open((char*)device, handle->opt.promisc, handle->snapshot, 1);

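/*
       When HAVE_PF_RING is defined, the code in this block runs. As the calls show, PF_RING supplies its own socket functions. pfring_open initializes a PF_RING socket and returns a structure of type pfring. Its prototype is:

 pfring* pfring_open(char *device_name, u_int8_t promisc, u_int32_t caplen, u_int8_t reentrant);

 Purpose: initialize a PF_RING socket and obtain a pfring structure. To open a device in DNA mode, pfring_open_dna() must be called instead.

 Parameters:
 device_name: symbolic name of the device PF_RING should attach to (e.g. eth0)
 promisc: whether to enable promiscuous mode (1 = promiscuous)
 caplen: maximum capture length (also known as snaplen, the same as the snaplen of pcap_open_live; 65535 is the usual value for capturing the largest packets on the network)
 reentrant: if non-zero, the device is opened in reentrant mode. This is implemented with a locking mechanism and costs a little performance; it is meant for multithreaded applications.
 Return value: a handle on success, NULL otherwise.

   The source of pfring_open is:

   pfring* pfring_open(char *device_name, u_int8_t promisc, u_int32_t caplen, u_int8_t _reentrant) {
     return(pfring_open_consumer(device_name, promisc, caplen, _reentrant,
                                 0, NULL, 0));
   }

   pfring_open simply calls pfring_open_consumer, which is analyzed later.
*/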

 

         if (handle->ring) {
           if (clusterId = getenv("PCAP_PF_RING_CLUSTER_ID"))
/*
getenv is the C library function that reads the current value of an environment variable.
Prototype: char *getenv(const char *name)
Usage: s = getenv("VARIABLE_NAME");   (declare char *s; first)
Purpose: returns the value of the given environment variable. On Linux the name is case-sensitive. If the variable is not defined in the environment, getenv returns NULL.
*/
             if (atoi(clusterId) > 0 && atoi(clusterId) < 255)
               if (getenv("PCAP_PF_RING_USE_CLUSTER_PER_FLOW"))
                 pfring_set_cluster(handle->ring, atoi(clusterId), cluster_per_flow);
               else
                 pfring_set_cluster(handle->ring, atoi(clusterId), cluster_round_robin);
           pfring_enable_ring(handle->ring);
         } else
           handle->ring = NULL;
       } else
         handle->ring = NULL;

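/*
pfring_set_cluster only sets the cluster id, which it does through a setsockopt call on the PF_RING socket.
The PF_RING documentation describes the function as follows; pfring_set_cluster is very useful on multi-CPU systems:

This call allows a ring to be added to a cluster that can spawn across address spaces. In a nutshell, when two or more sockets are clustered they share incoming packets that are balanced on a per-flow manner. This technique is useful for exploiting multicore systems or for sharing packets in the same address space across multiple threads.

int pfring_set_cluster(pfring *ring, u_int clusterId, cluster_type the_type) {
#ifdef USE_PCAP
  return(-1);
#else
  if(ring->dna_mapped_device)
    return(-1);
  else {
    struct add_to_cluster cluster;
    cluster.clusterId = clusterId, cluster.the_type = the_type;
    return(ring ? setsockopt(ring->fd, 0, SO_ADD_TO_CLUSTER,
                         &cluster, sizeof(cluster)): -1);
  }
#endif
}

setsockopt/getsockopt:

Description:
Get or set an option associated with a socket. Options may exist at several protocol levels, and they are always present at the uppermost socket level. When manipulating a socket option, both the level at which the option resides and the option's name must be given. To manipulate options at the socket level, specify the level as SOL_SOCKET. For other levels, supply the protocol number of the protocol that controls the option; for example, for an option interpreted by TCP, set the level to the protocol number of TCP. Usage:

#include <sys/types.h>
#include <sys/socket.h>
int getsockopt(int sock, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sock, int level, int optname, const void *optval, socklen_t optlen);

Parameters:
sock: the socket whose option is to be set or read.
level: the protocol level at which the option resides.
optname: the name of the option to access (here SO_ADD_TO_CLUSTER).
optval: for getsockopt(), points to the buffer that receives the option value; for setsockopt(), points to the buffer holding the new value.
optlen: for getsockopt(), on input the maximum length of the option value and on output its actual length; for setsockopt(), the length of the new option value.

With PF_RING defined, the socket is created by calling pfring_open. That concludes this part.

*/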

       if (handle->ring != NULL) {
         handle->fd = handle->ring->fd;
         handle->bufsize = handle->snapshot;
         handle->linktype = DLT_EN10MB;
         handle->offset = 2;

         /* printf("Open HAVE_PF_RING(%s)\n", device); */
       } else {
         /* printf("Open HAVE_PF_RING(%s) failed. Fallback to pcap\n", device); */
#endif

       /*
        * If we're in promiscuous mode, then we probably want
        * to see when the interface drops packets too, so get an
        * initial count from /proc/net/dev
        */
       if (handle->opt.promisc)
              handle->md.proc_dropped = linux_if_drops(handle->md.device);

 

       /*
        * Current Linux kernels use the protocol family PF_PACKET to
        * allow direct access to all packets on the network while
        * older kernels had a special socket type SOCK_PACKET to
        * implement this feature.
        * While this old implementation is kind of obsolete we need
        * to be compatible with older kernels for a while so we are
        * trying both methods with the newer method preferred.
        */
// Current kernels use PF_PACKET, while older kernels use SOCK_PACKET.
       if ((status = activate_new(handle)) == 1) {
/*
 * Try to open a packet socket using the new kernel PF_PACKET interface.
 * Returns 1 on success, 0 on an error that means the new interface isn't
 * present (so the old SOCK_PACKET interface should be tried), and a
 * PCAP_ERROR_ value on an error that means that the old mechanism won't
 * work either (so it shouldn't be tried).
 *
 * When PF_RING is not defined, activate_new creates the socket through the
 * PF_PACKET interface. It returns 1 on success. A return of 0 means PF_PACKET
 * is unavailable, in which case the SOCK_PACKET interface can be tried; the
 * source of that function is also in pcap-linux.c. The value of status
 * distinguishes three cases: 1 means the socket was created with PF_PACKET;
 * 0 leads to a call to activate_old, which returns 1 if the socket was
 * created with SOCK_PACKET and any other value on failure; any other value
 * of status means failure.
 */

              /*

               * Success.

               * Try to use memory-mapped access.

               */

              switch (activate_mmap(handle)) {

              case 1:
                     /* we succeeded; nothing more to do */
                     return 0;

              case 0:
                     /*
                      * Kernel doesn't support it - just continue
                      * with non-memory-mapped access.
                      */
                     status = 0;
                     break;

              case -1:
                     /*
                      * We failed to set up to use it, or kernel
                      * supports it, but we failed to enable it;
                      * return an error.  handle->errbuf contains
                      * an error message.
                      */
                     status = PCAP_ERROR;
                     goto fail;
              }
       }
       else if (status == 0) {
              /* Non-fatal error; try old way */
              if ((status = activate_old(handle)) != 1) {
                     /*
                      * Both methods to open the packet socket failed.
                      * Tidy up and report our failure (handle->errbuf
                      * is expected to be set by the functions above).
                      */
                     goto fail;
              }
       } else {
              /*
               * Fatal error with the new way; just fail.
               * status has the error return; if it's PCAP_ERROR,
               * handle->errbuf has been set appropriately.
               */
              goto fail;
       }

 

       /*
        * We set up the socket, but not with memory-mapped access.
        */
       if (handle->opt.buffer_size != 0) {
              /*
               * As I understand it, opt.buffer_size != 0 means the application
               * called pcap_set_buffer_size to choose the kernel buffer size
               * instead of using the default. The request is first sent with
               * setsockopt, and the user-space buffer is then allocated with malloc.
               *
               * Set the socket buffer size to the specified value.
               */
              if (setsockopt(handle->fd, SOL_SOCKET, SO_RCVBUF,
                  &handle->opt.buffer_size,
                  sizeof(handle->opt.buffer_size)) == -1) {
                     snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                             "SO_RCVBUF: %s", pcap_strerror(errno));
                     status = PCAP_ERROR;
                     goto fail;
              }
       }

#ifdef HAVE_PF_RING

        }

#endif

 

       /* Allocate the buffer */
       handle->buffer = malloc(handle->bufsize + handle->offset);
       if (!handle->buffer) {
              snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
                      "malloc: %s", pcap_strerror(errno));
              status = PCAP_ERROR;
              goto fail;
       }

       /*
        * "handle->fd" is a socket, so "select()" and "poll()"
        * should work on it.
        */
       handle->selectable_fd = handle->fd;

       return status;

fail:
       pcap_cleanup_linux(handle);
       return status;
}
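The setsockopt/getsockopt pair described in the comments above can be tried on an ordinary socket. The sketch below is an assumption-laden illustration, not libpcap code: the helper name `rcvbuf_after_request` is hypothetical, and it uses an AF_UNIX datagram socket pair instead of a packet socket so that no special privileges are needed. It requests a receive buffer size with SO_RCVBUF and reads back what the kernel actually applied.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Ask the kernel for a receive buffer of `bytes` on a throwaway socket and
 * return the size the kernel reports afterwards, or -1 on any failure.
 */
int rcvbuf_after_request(int bytes)
{
    int fds[2];
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) == -1)
        return -1;
    if (setsockopt(fds[0], SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1 ||
        getsockopt(fds[0], SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        actual = -1;
    close(fds[0]);
    close(fds[1]);
    return actual;
}
```

The value read back need not equal the value requested: Linux, for example, adds room for bookkeeping overhead and clamps the request to a system-wide limit, so treating the requested size as a hint is the safer assumption.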

 

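That completes pcap_activate_linux. My understanding was that PF_RING should replace PF_PACKET or SOCK_PACKET. A quick read of pcap_activate_linux shows that it first creates the socket with pfring_open. I expected that, when PF_RING is defined, the function would return right after pfring_open creates the socket rather than going on to the later code, such as activate_new and activate_old. I did not understand this, or the authors' intent, so I went back to the pfring_open source and kept tracing: first pfring_open, then activate_new, to see how each is implemented. As noted earlier, pfring_open calls pfring_open_consumer. Its source, found in pfring.c, is as follows: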

 

pfring* pfring_open_consumer(char *device_name, u_int8_t promisc,
                          u_int32_t caplen, u_int8_t _reentrant,
                          u_int8_t consumer_plugin_id,
                          char* consumer_data, u_int consumer_data_len) {
#ifdef USE_PCAP
  char ebuf[256];
  pcap_t *pcapPtr = pcap_open_live(device_name,
                               caplen,
                               1 /* promiscuous mode */,
                               1000 /* ms */,
                               ebuf);
  return((pfring*)pcapPtr);
#else
  int err = 0;
  pfring *ring = (pfring*)malloc(sizeof(pfring));            // allocate memory for one pfring structure
  if(ring == NULL)
    return(NULL);
  else
    memset(ring, 0, sizeof(pfring));                         // zero the structure
  ring->reentrant = _reentrant;
  ring->fd = socket(PF_RING, SOCK_RAW, htons(ETH_P_ALL));    // create the socket

#ifdef RING_DEBUG
  printf("Open RING [fd=%d]\n", ring->fd);
#endif

  if(ring->fd > 0) {
    int rc;
    u_int memSlotsLen;

    if(caplen > MAX_CAPLEN) caplen = MAX_CAPLEN;
    // MAX_CAPLEN is defined in pfring.h: #define MAX_CAPLEN 16384
    setsockopt(ring->fd, 0, SO_RING_BUCKET_LEN, &caplen, sizeof(caplen));
    // set caplen, the number of bytes captured per packet; pfring.h caps it at 16384
    /* printf("channel_id=%d\n", channel_id); */

    if(device_name == NULL /* any */) {
      device_name = "any";
      rc = pfring_bind(ring, device_name);                   // bind the ring
    } else if(!strcmp(device_name, "none")) {
      /* No binding yet */
      rc = 0;
    } else
      rc = pfring_bind(ring, device_name);

    if(rc == 0) {
      if(consumer_plugin_id > 0) {
       ring->kernel_packet_consumer = consumer_plugin_id;
       rc = pfring_set_packet_consumer_mode(ring, consumer_plugin_id,
                                        consumer_data, consumer_data_len);
       if(rc < 0) {
        free(ring);
        return(NULL);
       }
      } else
       ring->kernel_packet_consumer = 0;

      ring->buffer = (char *)mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
                              MAP_SHARED, ring->fd, 0);

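// mmap memory mapping; PAGE_SIZE is 4096 here
/*
The prototype of mmap is: void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

start: the desired start address of the mapping. It is usually NULL, which lets the system choose the address; the chosen address is returned on success.

length: how many bytes of the file to map into memory.

prot: the protection of the mapped region, a combination of:
PROT_EXEC   the region may be executed
PROT_READ   the region may be read
PROT_WRITE  the region may be written
PROT_NONE   the region may not be accessed

flags: properties of the mapped region. Exactly one of MAP_SHARED or MAP_PRIVATE must be specified when calling mmap().
MAP_FIXED      if the mapping cannot be established at the address given by start, the call fails instead of choosing another address. Using this flag is generally discouraged.
MAP_SHARED     writes to the region are carried through to the file and are visible to other processes that map the same file.
MAP_PRIVATE    writes to the region create a private copy-on-write copy; changes to the region are never written back to the original file.
MAP_ANONYMOUS  creates an anonymous mapping. The fd argument is ignored, no file is involved, and the region is zero-filled.
MAP_DENYWRITE  only writes through the mapping are allowed; other direct writes to the file are refused (current Linux kernels ignore this flag).
MAP_LOCKED     locks the region in memory so that it is not swapped out.

fd: the file descriptor to map (here ring->fd, the return value of socket). For an anonymous mapping, that is when MAP_ANONYMOUS is set in flags, fd should be -1. On systems that do not support anonymous mappings, open /dev/zero and map that file to get the same effect.

offset: the offset into the file at which the mapping starts. It is usually 0, meaning the start of the file, and must be a multiple of the page size.

Return value:
On success, the start address of the mapped region; on failure, MAP_FAILED ((void *) -1), with the reason stored in errno.
*/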

      if(ring->buffer == MAP_FAILED) {
       printf("mmap() failed: try with a smaller snaplen\n");
       free(ring);
       return(NULL);
      }

      ring->slots_info = (FlowSlotInfo *)ring->buffer;
      // ring->buffer is the region mapped by mmap; ring->slots_info points to its start
      if(ring->slots_info->version != RING_FLOWSLOT_VERSION) {
       printf("Wrong RING version: "
              "kernel is %i, libpfring was compiled with %i\n",
              ring->slots_info->version, RING_FLOWSLOT_VERSION);
       free(ring);
       return(NULL);
      }
      memSlotsLen = ring->slots_info->tot_mem;                // tot_mem as recorded in the ring header
      munmap(ring->buffer, PAGE_SIZE);                        // remove the one-page mapping

      ring->buffer = (char *)mmap(NULL, memSlotsLen,
                              PROT_READ|PROT_WRITE,
                              MAP_SHARED, ring->fd, 0);
/*
       The first mmap appears to exist only to obtain memSlotsLen. That mapping is then removed with munmap, and mmap is called again to map the whole region.
*/
      if(ring->buffer == MAP_FAILED) {
       printf("mmap() failed");
       free(ring);
       return(NULL);
      }

      ring->slots_info   = (FlowSlotInfo *)ring->buffer;      // pointer to the ring buffer header
      ring->slots = (char *)(ring->buffer+sizeof(FlowSlotInfo));
      // skip the header structure at the front of the ring; the memory after it receives the packet data

 

      /* Set defaults */

      ring->device_name = strdup(device_name? device_name : "");

 

#ifdef RING_DEBUG

      printf("RING (%s):tot_mem=%u/min_tot_slots=%u/max_slot_len=%u/"

           "insert_off=%u/remove_off=%u/dropped=%llu\n",

           device_name,

           ring->slots_info->tot_mem,

           ring->slots_info->tot_slots,

           ring->slots_info->slot_len,

           ring->slots_info->insert_off,

           ring->slots_info->remove_off,

           ring->slots_info->tot_lost);

#endif

 

      if(promisc) {

       if(set_if_promisc(device_name, 1) == 0)

        ring->clear_promisc = 1;

      }

 

#ifdef ENABLE_HW_TIMESTAMP

      pfring_enable_hw_timestamp(ring,device_name);

#endif

    } else {

      close(ring->fd);

      err = -1;

    }

  } else {

    err = -1;

    free(ring);

  }

 

  if(err == 0) {

    if(ring->reentrant)

      pthread_spin_init(&ring->spinlock,PTHREAD_PROCESS_PRIVATE);

 

    return(ring);

  } else

    return(NULL);

#endif

}
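The map, read the header, unmap, remap sequence in pfring_open_consumer can be reproduced with a regular file. This is a sketch under stated assumptions: `header_t` is a hypothetical stand-in for FlowSlotInfo, and `map_in_two_steps` and `make_demo_file` are names invented here. A temporary file plays the role of the PF_RING socket, and its header records how many bytes the full mapping must cover.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    uint32_t version;
    uint32_t tot_mem;     /* total bytes the full mapping must cover */
} header_t;               /* hypothetical stand-in for FlowSlotInfo */

/*
 * Map one page to read the header, unmap it, then map tot_mem bytes.
 * Returns the full mapping (the caller munmap()s *len bytes) or NULL.
 */
void *map_in_two_steps(int fd, size_t *len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    header_t *h = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    void *full;

    if (h == MAP_FAILED)
        return NULL;
    *len = h->tot_mem;                  /* the only thing needed from step one */
    munmap(h, page);

    full = mmap(NULL, *len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return full == MAP_FAILED ? NULL : full;
}

/* Build a temporary file of `pages` pages whose header records its own size. */
int make_demo_file(unsigned pages)
{
    char path[] = "/tmp/ring_demo_XXXXXX";
    header_t h;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);                       /* the file disappears once fd is closed */
    h.version = 1;
    h.tot_mem = pages * (uint32_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, (off_t)h.tot_mem) == -1 ||
        write(fd, &h, sizeof(h)) != (ssize_t)sizeof(h)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The two-step approach is needed whenever the size of the shared region is only known from data stored inside the region itself, which is the situation the author describes for memSlotsLen.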

 

// pfring_bind binds the socket by calling bind: rc = bind(ring->fd, (struct sockaddr *)&sa, sizeof(sa));

 

int pfring_bind(pfring *ring, char *device_name) {

  struct sockaddr sa;                                               // socket address variable

  char *at;

  int32_t channel_id = -1;

  int rc = 0;

 

  if((device_name == NULL) ||(strcmp(device_name, "none") == 0))

    return(-1);

 

  at = strchr(device_name, '@');

  if(at != NULL) {

    char *tok, *pos = NULL;
    at[0] = '\0';

    /* Syntax
       ethX@1,5       channel 1 and 5
       ethX@1-5       channel 1,2...5
       ethX@1-3,5-7   channel 1,2,3,5,6,7
    */

 

    tok = strtok_r(&at[1], ",",&pos);

    channel_id = 0;

    while(tok != NULL) {

      char *dash = strchr(tok, '-');

      int32_t min_val, max_val, i;

 

      if(dash) {

       dash[0] = '\0';

       min_val = atoi(tok);

       max_val = atoi(&dash[1]);

      } else

       min_val = max_val = atoi(tok);

      for(i = min_val; i <= max_val; i++)

       channel_id |= 1 << i;

      tok = strtok_r(NULL, ",",&pos);

    }

  }

 

  /* Setup TX */

  ring->sock_tx.sll_family = PF_PACKET;

  ring->sock_tx.sll_protocol =htons(ETH_P_ALL);

 

  sa.sa_family = PF_RING;

  snprintf(sa.sa_data, sizeof(sa.sa_data),"%s", device_name);

 

  rc = bind(ring->fd, (struct sockaddr*)&sa, sizeof(sa));

/*
       The bind function:

Headers:
#include <sys/types.h>
#include <sys/socket.h>

Prototype:
int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);

Return value:
0 on success; -1 on failure, with the reason stored in errno.
*/

  if(rc == 0) {

    if(channel_id != -1) {

      int rc = pfring_set_channel_id(ring,channel_id);

 

      if(rc != 0)

       printf("pfring_set_channel_id()failed: %d\n", rc);

    }

  }

 

  return(rc);

}
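The channel syntax handled by pfring_bind (`ethX@1,5`, `ethX@1-5`, `ethX@1-3,5-7`) boils down to turning a comma-separated list of numbers and ranges into a bitmask. The function below is a stand-alone sketch of that parsing step, not the library's code: `parse_channel_mask` is a hypothetical name, and the range check and unsigned mask are additions made here because shifting a signed `1` by 31 or more bits is not well defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Turn a channel list such as "1-3,5" into a bitmask with one bit per
 * channel. Returns 0 for NULL, over-long, or out-of-range input.
 * Non-numeric tokens are read as 0, as atoi would read them.
 */
uint32_t parse_channel_mask(const char *spec)
{
    char buf[64], *tok, *pos = NULL;
    uint32_t mask = 0;

    if (spec == NULL || strlen(spec) >= sizeof(buf))
        return 0;
    strcpy(buf, spec);                 /* strtok_r modifies its input */

    for (tok = strtok_r(buf, ",", &pos); tok != NULL;
         tok = strtok_r(NULL, ",", &pos)) {
        char *dash = strchr(tok, '-');
        long lo, hi, i;

        if (dash != NULL) {
            *dash = '\0';              /* split "a-b" into "a" and "b" */
            lo = strtol(tok, NULL, 10);
            hi = strtol(dash + 1, NULL, 10);
        } else {
            lo = hi = strtol(tok, NULL, 10);
        }
        if (lo < 0 || hi > 31 || lo > hi)
            return 0;                  /* keep every shift inside 32 bits */
        for (i = lo; i <= hi; i++)
            mask |= (uint32_t)1 << i;
    }
    return mask;
}
```

For example, "1,5" sets bits 1 and 5, and "1-3,5-7" sets bits 1, 2, 3, 5, 6, and 7, matching the syntax comment in pfring_bind.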

 

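That completes pfring_open_consumer, and it matches my understanding: it sets up a ring buffer through memory mapping and calls pfring_bind to bind the socket. As I said earlier, my understanding is that the PF_RING patch uses a new socket type in place of the original PF_PACKET and SOCK_PACKET. Yet when I first read the source I wondered why, once the PF_RING socket has been created, pcap_activate_linux does not simply return. So let us go back to pcap_activate_linux and see whether something was missed. Start with its parameter, pcap_t *handle. As the saying goes, algorithms + data structures = programs, which shows how much the data structure matters. The ring member is declared in pcap-int.h, as follows: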

#ifdef HAVE_PF_RING

  pfring *ring;

#endif
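One possible reading of the control flow in pcap_activate_linux, offered as an interpretation of the listing above and not as a statement of the authors' intent: the `else {` that precedes the first `#endif` and the lone `}` inside the later `#ifdef HAVE_PF_RING` block appear to wrap the whole PF_PACKET/SOCK_PACKET section, so that section would run only when pfring_open fails, while the buffer allocation after it is shared by both paths. The sketch below reproduces that preprocessor pattern with hypothetical names (`HAVE_FAST_PATH`, `open_fast`, `activate`).

```c
#define HAVE_FAST_PATH 1      /* hypothetical build flag, playing the role of HAVE_PF_RING */

enum { PATH_FAST = 1, PATH_CLASSIC = 2 };

/* Pretend fast-path opener: succeeds only when told the fast path is available. */
int open_fast(int available)
{
    return available;
}

int activate(int fast_available)
{
    int path;

    (void)fast_available;     /* unused when the flag is undefined */
#ifdef HAVE_FAST_PATH
    if (open_fast(fast_available)) {
        path = PATH_FAST;     /* fast socket is ready; the classic setup is skipped */
    } else {
#endif
        path = PATH_CLASSIC;  /* the only branch left when the flag is undefined */
#ifdef HAVE_FAST_PATH
    }
#endif
    return path;              /* shared tail: buffer allocation would go here */
}
```

With the flag undefined, the preprocessor removes the `if`/`else` scaffolding and the classic path becomes unconditional, which is how one source file can serve both builds.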

The next question is: given that pfring_open has already created and bound the socket, what is the later call to activate_new for? Let us trace activate_new:

 

static int activate_new(pcap_t *handle)
{
#ifdef HAVE_PF_PACKET_SOCKETS
// HAVE_PF_PACKET_SOCKETS: the code in this block is used only when PF_PACKET sockets are available. Otherwise the function effectively just returns 0, and activate_old is then called to try SOCK_PACKET.
       const char              *device = handle->opt.source;
       int                     is_any_device = (strcmp(device, "any") == 0);
       int                     sock_fd = -1, arptype;
#ifdef HAVE_PACKET_AUXDATA
       int                     val;
#endif
       int                     err = 0;
       struct packet_mreq      mr;

       /*
        * Open a socket with protocol family packet. If the
        * "any" device was specified, we open a SOCK_DGRAM
        * socket for the cooked interface, otherwise we first
        * try a SOCK_RAW socket for the raw interface.
        */
       sock_fd = is_any_device ?
              socket(PF_PACKET, SOCK_DGRAM, htons(ETH_P_ALL)) :
              socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
       // socket() creates the socket; watch for a bind call further down
       if (sock_fd == -1) {
              snprintf(handle->errbuf, PCAP_ERRBUF_SIZE, "socket: %s",
                      pcap_strerror(errno) );
              return 0;  /* try old mechanism */
       }

 

       /* It seems the kernel supports the new interface. */
       handle->md.sock_packet = 0;

       /*
        * Get the interface index of the loopback device.
        * If the attempt fails, don't fail, just set the
        * "md.lo_ifindex" to -1.
        *
        * XXX - can there be more than one device that loops
        * packets back, i.e. devices other than "lo"?  If so,
        * we'd need to find them all, and have an array of
        * indices for them, and check all of them in
        * "pcap_read_packet()".
        */
       handle->md.lo_ifindex = iface_get_id(sock_fd, "lo", handle->errbuf);

       /*
        * Default value for offset to align link-layer payload
        * on a 4-byte boundary.
        */
       handle->offset = 0;

       /*
        * What kind of frames do we have to deal with? Fall back
        * to cooked mode if we have an unknown interface type
        * or a type we know doesn't work well in raw mode.
        */

       if (!is_any_device) {
              /* Assume for now we don't need cooked mode. */
              handle->md.cooked = 0;
              if (handle->opt.rfmon) {
                     /*
                      * We were asked to turn on monitor mode.
                      * Do so before we get the link-layer type,
                      * because entering monitor mode could change
                      * the link-layer type.
                      */
                     err = enter_rfmon_mode(handle, sock_fd, device);
                     if (err < 0) {
                            /* Hard failure */
                            close(sock_fd);
                            return err;
                     }
                     if (err == 0) {
                            /*
                             * Nothing worked for turning monitor mode
                             * on.
                             */
                            close(sock_fd);
                            return PCAP_ERROR_RFMON_NOTSUP;
                     }

                     /*
                      * Either monitor mode has been turned on for
                      * the device, or we've been given a different
                      * device to open for monitor mode.  If we've
                      * been given a different device, use it.
                      */
                     if (handle->md.mondevice != NULL)
                            device = handle->md.mondevice;
              }
              arptype = iface_get_arptype(sock_fd, device, handle->errbuf);
              if (arptype < 0) {
                     close(sock_fd);
                     return arptype;
              }
              map_arphrd_to_dlt(handle, arptype, 1);
              if (handle->linktype == -1 ||
                  handle->linktype == DLT_LINUX_SLL ||
                  handle->linktype == DLT_LINUX_IRDA ||
                  handle->linktype == DLT_LINUX_LAPD ||
                  (handle->linktype == DLT_EN10MB &&
                   (strncmp("isdn", device, 4) == 0 ||
                    strncmp("isdY", device, 4) == 0))) {

             

                     if (close(sock_fd) == -1) {

                            snprintf(handle->errbuf,PCAP_ERRBUF_SIZE,

                                    "close: %s", pcap_strerror(errno));

                            return PCAP_ERROR;

                     }

                     sock_fd = socket(PF_PACKET,SOCK_DGRAM,

                         htons(ETH_P_ALL));

                     if (sock_fd == -1) {

                            snprintf(handle->errbuf,PCAP_ERRBUF_SIZE,

                                "socket: %s",pcap_strerror(errno));

                            return PCAP_ERROR;

                     }

                     handle->md.cooked = 1;

 

                     /*

                      * Get rid of any link-layer type list

                      * we allocated - this only supports cooked

                      * capture.

                      */

                     if (handle->dlt_list !=NULL) {

                            free(handle->dlt_list);

                            handle->dlt_list= NULL;

                            handle->dlt_count= 0;

                     }

 

                     if (handle->linktype ==-1) {

                            /*

                             * Warn that we're falling back on

                             * cooked mode; we may want to

                             * update "map_arphrd_to_dlt()"

                             * to handle the new type.

                             */

                            snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,

                                   "arptype %d not "

                                   "supported by libpcap - "

                                   "falling back to cooked "

                                   "socket",

                                   arptype);

                     }

 

                     /*

                      * IrDA capture is not a real "cooked" capture,

                      * it's IrLAP frames, not IP packets.  The

                      * same applies to LAPD capture.

                      */

                     if (handle->linktype != DLT_LINUX_IRDA &&

                         handle->linktype != DLT_LINUX_LAPD)

                            handle->linktype = DLT_LINUX_SLL;

              }



              handle->md.ifindex = iface_get_id(sock_fd, device,

                 handle->errbuf);

              if (handle->md.ifindex == -1) {

                     close(sock_fd);

                     return PCAP_ERROR;

              }

// The long-awaited bind step finally appears: iface_bind is the function that does the binding. I guessed that it calls bind() internally, and tracing the iface_bind source confirmed that it does.

              if ((err = iface_bind(sock_fd, handle->md.ifindex,

                 handle->errbuf)) != 1) {

                        close(sock_fd);

                     if (err < 0)

                            return err;

                     else

                            return 0;  /* try old mechanism */

              }

       } else {

              /*

               * The "any" device.

               */

              if (handle->opt.rfmon) {

                     /*

                      * It doesn't support monitor mode.

                      */

                     return PCAP_ERROR_RFMON_NOTSUP;

              }

 

              /*

               * It uses cooked mode.

               */

              handle->md.cooked = 1;

              handle->linktype = DLT_LINUX_SLL;

 

              /*

               * We're not bound to a device.

               * For now, we're using this as an indication

               * that we can't transmit; stop doing that only

               * if we figure out how to transmit in cooked

               * mode.

               */

              handle->md.ifindex = -1;

       }

 

       if (!is_any_device && handle->opt.promisc) {

              memset(&mr, 0, sizeof(mr));

              mr.mr_ifindex = handle->md.ifindex;

              mr.mr_type    = PACKET_MR_PROMISC;

              if (setsockopt(sock_fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,

                 &mr, sizeof(mr)) == -1) {

                     snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,

                            "setsockopt: %s", pcap_strerror(errno));

                     close(sock_fd);

                     return PCAP_ERROR;

              }

       }

 

       /* Enable auxiliary data if supported and reserve room for

        * reconstructing VLAN headers. */

#ifdef HAVE_PACKET_AUXDATA

       val = 1;

       if (setsockopt(sock_fd, SOL_PACKET, PACKET_AUXDATA, &val,

                     sizeof(val)) == -1 && errno != ENOPROTOOPT) {

              snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,

                      "setsockopt: %s", pcap_strerror(errno));

              close(sock_fd);

              return PCAP_ERROR;

       }

       handle->offset += VLAN_TAG_LEN;

#endif /* HAVE_PACKET_AUXDATA */

 

       if (handle->md.cooked) {

              if (handle->snapshot < SLL_HDR_LEN + 1)

                     handle->snapshot = SLL_HDR_LEN + 1;

       }

       handle->bufsize = handle->snapshot;

 

       /* Save the socket FD in the pcap structure */

       handle->fd = sock_fd;

 

       return 1;

#else

// If PF_PACKET sockets are not available in the build environment, the function simply returns 0.

       strncpy(ebuf,

              "New packet capturing interface not supported by build "

              "environment", PCAP_ERRBUF_SIZE);

       return 0;

#endif

}
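
Since the comment above notes that iface_bind ends up calling bind(), here is a minimal sketch of what binding a PF_PACKET socket to one interface can look like. It is not libpcap's iface_bind: the name sketch_iface_bind is made up for illustration, the error message handling is left out, and only the return convention (1 on success, -1 on failure) mirrors what activate_new checks for.

```c
#include <sys/socket.h>       /* bind, AF_PACKET */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <arpa/inet.h>        /* htons */
#include <string.h>           /* memset */

/*
 * Hypothetical helper (an assumption, not libpcap code): bind an
 * already-open PF_PACKET socket to the interface with index ifindex.
 * Returns 1 on success and -1 on failure; errno is left as set by bind().
 */
int sketch_iface_bind(int fd, int ifindex)
{
    struct sockaddr_ll sll;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family   = AF_PACKET;          /* packet socket address family */
    sll.sll_ifindex  = ifindex;            /* which interface to listen on */
    sll.sll_protocol = htons(ETH_P_ALL);   /* every protocol, network order */

    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) == -1)
        return -1;
    return 1;
}
```

Calling it with the descriptor returned by socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL)) and the index obtained from iface_get_id corresponds to the step shown in activate_new. Opening such a socket normally requires root privileges.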

 

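   The source of activate_new did not answer my question either: if PF_RING is in use, the two socket types that follow should not be tried at all. I went back to pcap_activate_linux and read it more carefully, and this time I saw it: a handle->ring != NULL test that I had overlooked at first. That cost me a long detour through other code, though I did learn something along the way.

if (handle->ring != NULL) {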

         handle->fd = handle->ring->fd;

         handle->bufsize = handle->snapshot;

         handle->linktype = DLT_EN10MB;

         handle->offset = 2;

 

         /* printf("Open HAVE_PF_RING(%s)\n", device); */

       } else {

         /* printf("Open HAVE_PF_RING(%s) failed. Fallback to pcap\n", device); */

             ...

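   When handle->ring != NULL, activate_new and the code after it are skipped. In other words, once PF_RING has been opened successfully, the two socket types that follow are never tried, just as I predicted. The role of pcap_activate_linux is finally clear.

       Addendum, 2011-4-18. pfring_open does not always succeed when pcap_activate_linux calls it. In my experiment, after patching PF_RING into the kernel I got the error "Wrong RING version: kernel is 10, libpfring was compiled with 13", yet the program still ran correctly. I later found that ring.h defines the kernel-side pf_ring version as:

#define RING_FLOWSLOT_VERSION          10

while pf_ring.h has:

#define RING_FLOWSLOT_VERSION          13

pfring_open_consumer, which pfring_open calls, reports an error and returns immediately when the versions differ, so pfring_open returns NULL. The program keeps running because execution then takes the else branch of the handle->ring != NULL test and falls back to the original libpcap receive path. Packets are read through PF_PACKET, so capture still works.

Likewise, if pf_ring.ko has not been loaded with insmod, pfring_open returns NULL and the program again falls back to libpcap's original PF_PACKET capture.

One more observation: when reading packets through PF_RING, CPU usage rose from 37% to 47%. Sending 1514-byte packets at 240 Mbit/s, the original setup lost about 3 packets every 2 minutes; with PF_RING this improved to about 2 packets every 3 minutes.

The callbacks that pcap_activate_linux installs also deserve attention. They are all listed here.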
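
The fallback order described above can be sketched as a small decision function. This is only a model of the control flow, not PF_RING or libpcap code: sketch_ring_open stands in for pfring_open, its three parameters are assumptions that capture the two failure cases mentioned (module not loaded, version mismatch), and the status values 1, 0 and negative follow the return convention of activate_new shown earlier.

```c
#include <stddef.h>

enum sketch_backend {
    SKETCH_BACKEND_PF_RING,      /* ring opened: PF_RING path is used        */
    SKETCH_BACKEND_PF_PACKET,    /* activate_new() returned 1                */
    SKETCH_BACKEND_SOCK_PACKET,  /* activate_new() returned 0: old mechanism */
    SKETCH_BACKEND_ERROR         /* activate_new() returned a negative value */
};

/*
 * Stand-in for pfring_open() (hypothetical): yields a non-NULL token only
 * when the kernel module is loaded and both sides agree on the version.
 */
const void *sketch_ring_open(int module_loaded, int kernel_version,
                             int lib_version)
{
    static const int ring_token = 1;   /* dummy object standing in for a ring */

    if (!module_loaded || kernel_version != lib_version)
        return NULL;
    return &ring_token;
}

/* Mirrors the order of checks: ring first, then the socket-based paths. */
enum sketch_backend sketch_choose_backend(const void *ring,
                                          int activate_new_status)
{
    if (ring != NULL)
        return SKETCH_BACKEND_PF_RING;
    if (activate_new_status == 1)
        return SKETCH_BACKEND_PF_PACKET;
    if (activate_new_status == 0)
        return SKETCH_BACKEND_SOCK_PACKET;
    return SKETCH_BACKEND_ERROR;
}
```

With kernel version 10 and library version 13, sketch_ring_open returns NULL and the chooser lands on the PF_PACKET backend, which matches the behaviour observed in the experiment.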
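
To put those loss figures in proportion, the two helpers below turn them into a loss ratio. The assumptions are mine: 240 Mbit/s is read as 240 x 10^6 bits per second, and preamble, inter-frame gap and FCS are ignored, so the frame rate is only approximate (roughly 19,800 frames per second for 1514-byte frames).

```c
/* Frames per second for a given bit rate and frame length in bytes.
 * Assumption: no preamble, inter-frame gap or FCS is counted. */
double sketch_frames_per_second(double bits_per_second, double frame_bytes)
{
    return bits_per_second / (frame_bytes * 8.0);
}

/* Fraction of frames lost when `lost` frames go missing over `seconds`. */
double sketch_loss_ratio(double lost, double seconds, double frames_per_second)
{
    return lost / (seconds * frames_per_second);
}
```

Under these assumptions, 3 lost frames in 2 minutes is a ratio of about 1.3e-6, and 2 lost frames in 3 minutes is about 5.6e-7, so the loss rate is a little less than half of what it was.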

       device = handle->opt.source;

       handle->inject_op = pcap_inject_linux;

       handle->setfilter_op = pcap_setfilter_linux;

       handle->setdirection_op = pcap_setdirection_linux;

       handle->set_datalink_op = NULL; /* can't change data link type */

       handle->getnonblock_op = pcap_getnonblock_fd;

       handle->setnonblock_op = pcap_setnonblock_fd;

       handle->cleanup_op = pcap_cleanup_linux;

       handle->read_op = pcap_read_linux;

       handle->stats_op = pcap_stats_linux;

 

I will not go through the other callbacks. The one to focus on is pcap_read_linux, whose source is:

/*

 *  Read at most max_packets from the capture stream and call the callback

 *  for each of them. Returns the number of packets handled or -1 if an

 *  error occurred.

 */

static int

pcap_read_linux(pcap_t *handle, int max_packets, pcap_handler callback, u_char *user)

{

       /*

        * Currently, on Linux only one packet is delivered per read,

        * so we don't loop.

        */

       return pcap_read_packet(handle, callback, user);

}

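   The function body is very simple: a single statement that calls pcap_read_packet to read a packet.

pcap_read_packet is a long function, so let us go through it step by step. Understanding this code thoroughly is what explains why plain libpcap drops packets and how the pf-ring patch reduces the loss. As for when this callback gets invoked, my guess is that it happens when the application reads packets through pcap_next, pcap_next_ex, pcap_dispatch or pcap_loop. That is only a guess for now, since I have not yet analyzed that part of the source. On to pcap_read_packet.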
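
The guess about who calls the read callback can be illustrated with a small, self-contained model of dispatch through a function pointer. Everything here is hypothetical (struct sketch_handle, sketch_read_one, sketch_dispatch, sketch_loop); it only shows the pattern of a handle carrying a read_op pointer that the public read functions call. Unlike the real pcap_loop, the loop below stops as soon as the fake queue is empty and returns the number of packets handled.

```c
#include <stddef.h>

struct sketch_handle;
typedef void (*sketch_handler)(void *user, const unsigned char *pkt,
                               unsigned len);
typedef int (*sketch_read_op)(struct sketch_handle *h, int max_packets,
                              sketch_handler cb, void *user);

struct sketch_handle {
    sketch_read_op read_op;   /* installed at activation time */
    int pending;              /* fake queue: packets still waiting */
};

/* Fake reader: delivers at most one packet per call, as pcap_read_linux does. */
int sketch_read_one(struct sketch_handle *h, int max_packets,
                    sketch_handler cb, void *user)
{
    static const unsigned char frame[4] = { 0xde, 0xad, 0xbe, 0xef };

    (void)max_packets;            /* ignored: one packet per read */
    if (h->pending == 0)
        return 0;                 /* nothing available right now */
    h->pending--;
    cb(user, frame, (unsigned)sizeof(frame));
    return 1;
}

/* Dispatch: a single call through the pointer stored in the handle. */
int sketch_dispatch(struct sketch_handle *h, int cnt,
                    sketch_handler cb, void *user)
{
    return h->read_op(h, cnt, cb, user);
}

/* Loop: keep calling read_op until cnt packets are handled or none remain. */
int sketch_loop(struct sketch_handle *h, int cnt,
                sketch_handler cb, void *user)
{
    int handled = 0;

    while (handled < cnt) {
        int n = h->read_op(h, cnt - handled, cb, user);
        if (n <= 0)
            break;
        handled += n;
    }
    return handled;
}

/* Example callback: counts packets in the int that `user` points to. */
void sketch_count_cb(void *user, const unsigned char *pkt, unsigned len)
{
    (void)pkt;
    (void)len;
    ++*(int *)user;
}
```

The point of the indirection is that the application-facing functions never need to know whether the packets come from PF_RING, PF_PACKET or anything else; only the function stored in read_op changes.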

 

/*

 *  Read a packet from the socket calling the handler provided by

 *  the user. Returns the number of packets received or -1 if an

 *  error occurred.

 */



static int

pcap_read_packet(pcap_t *handle, pcap_handler callback, u_char *userdata)

{

       u_char                   *bp;

       int                 offset;

#ifdef HAVE_PF_PACKET_SOCKETS

       struct sockaddr_ll   from;

       struct sll_header     *hdrp;

#else

       struct sockaddr              from;

#endif

#if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)

       struct iovec            iov;

       struct msghdr         msg;

       struct cmsghdr              *cmsg;

       union {

              struct cmsghdr       cmsg;

              char        buf[CMSG_SPACE(sizeof(struct tpacket_auxdata))];

       } cmsg_buf;

#else /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */

       socklen_t        fromlen;

#endif /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */

 

       int                 packet_len, caplen;

#ifdef HAVE_PF_RING

        struct pfring_pkthdr    pcap_header;

#else

       struct pcap_pkthdr  pcap_header;

#endif

// A note is needed here: when HAVE_PF_RING is defined, pcap_header is a struct pfring_pkthdr rather than a struct pcap_pkthdr, so it is worth comparing the two. struct pfring_pkthdr is defined as follows:

/*

struct pfring_pkthdr {

 // pcap header

 struct timeval ts;    // timestamp

 u_int32_t caplen;     // length of portion present

 u_int32_t len;        // length this packet (off wire)

 struct pfring_extended_pkthdr extended_hdr; // PF_RING extended header

};

*/

/*

 struct pcap_pkthdr is defined as follows:

struct pcap_pkthdr {

       struct timeval ts;     // time stamp

       bpf_u_int32 caplen;       // length of portion present

       bpf_u_int32 len;     // length this packet (off wire)

};

*/

// Comparing the two, pfring_pkthdr has the same three leading fields plus one extra member, the PF_RING extended header.

 

#ifdef HAVE_PF_RING

       if (handle->ring) {

        do {

          if (handle->break_loop) {

            /*

             * Yes - clear the flag that indicates that it

             * has, and return -2 as an indication that we

             * were told to break out of the loop.

             *

             * Patch courtesy of Michael Stiller <ms@2scale.net>

             */

            handle->break_loop = 0;

            return -2;

          }

 

          packet_len = pfring_recv(handle->ring, (char*)handle->buffer,

                                 handle->bufsize,

                                 &pcap_header,

                                 1 /* wait_for_incoming_packet */);

  /* If PF_RING is in use, packets are received with pfring_recv, which is covered later. Otherwise recvmsg or recvfrom is used; the differences between those two are left for the reader to look up.

*/

 

          if (packet_len > 0) {

            bp = handle->buffer;

            pcap_header.caplen = min(pcap_header.caplen, handle->bufsize);

            caplen = pcap_header.caplen, packet_len = pcap_header.len;

            goto pfring_pcap_read_packet;

          }

         } while (packet_len == -1 && (errno == EINTR || errno == ENETDOWN));

       }

#endif

 

#ifdef HAVE_PF_PACKET_SOCKETS

       /*

        * If this is a cooked device, leave extra room for a

        * fake packet header.

        */

       if (handle->md.cooked)

              offset = SLL_HDR_LEN;

       else

              offset = 0;

#else

       /*

        * This system doesn't have PF_PACKET sockets, so it doesn't

        * support cooked devices.

        */

       offset = 0;

#endif

 

       /*

        * Receive a single packet from the kernel.

        * We ignore EINTR, as that might just be due to a signal

        * being delivered - if the signal should interrupt the

        * loop, the signal handler should call pcap_breakloop()

        * to set handle->break_loop (we ignore it on other

        * platforms as well).

        * We also ignore ENETDOWN, so that we can continue to

        * capture traffic if the interface goes down and comes

        * back up again; comments in the kernel indicate that

        * we'll just block waiting for packets if we try to

        * receive from a socket that delivered ENETDOWN, and,

        * if we're using a memory-mapped buffer, we won't even

        * get notified of "network down" events.

        */

       bp = handle->buffer + handle->offset;

 

#if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)

       msg.msg_name             = &from;

       msg.msg_namelen         = sizeof(from);

       msg.msg_iov         = &iov;

       msg.msg_iovlen            = 1;

       msg.msg_control           = &cmsg_buf;

       msg.msg_controllen       = sizeof(cmsg_buf);

       msg.msg_flags              = 0;

 

       iov.iov_len            = handle->bufsize - offset;

       iov.iov_base           = bp + offset;

#endif /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */

 

       do {

              /*

               * Has "pcap_breakloop()" been called?

               */

              if (handle->break_loop) {

                     /*

                      * Yes - clear the flag that indicates that it has,

                      * and return PCAP_ERROR_BREAK as an indication that

                      * we were told to break out of the loop.

                      */

                     handle->break_loop = 0;

                     return PCAP_ERROR_BREAK;

              }

 

#if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)

              packet_len = recvmsg(handle->fd, &msg, MSG_TRUNC);

#else /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */

              fromlen = sizeof(from);

              packet_len = recvfrom(

                     handle->fd, bp + offset,

                     handle->bufsize - offset, MSG_TRUNC,

                     (struct sockaddr *)&from, &fromlen);

#endif /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */

       } while (packet_len == -1 && errno == EINTR);

 

       /* Check if an error occurred */

       if (packet_len == -1) {

              switch (errno) {

              case EAGAIN:

                     return 0;  /* no packet there */

 

              case ENETDOWN:

                     /*

                      * The device on which we're capturing went away.

                      *

                      * XXX - we should really return

                      * PCAP_ERROR_IFACE_NOT_UP, but pcap_dispatch()

                      * etc. aren't defined to return that.

                      */

                     snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,

                            "The interface went down");

                     return PCAP_ERROR;



              default:

                     snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,

                             "recvfrom: %s", pcap_strerror(errno));

                     return PCAP_ERROR;

              }

       }

 

#ifdef HAVE_PF_PACKET_SOCKETS

       if (!handle->md.sock_packet) {

              /*

               * Unfortunately, there is a window between socket() and

               * bind() where the kernel may queue packets from any

               * interface. If we're bound to a particular interface,

               * discard packets not from that interface.

               *

               * (If socket filters are supported, we could do the

               * same thing we do when changing the filter; however,

               * that won't handle packet sockets without socket

               * filter support, and it's a bit more complicated.

               * It would save some instructions per packet, however.)

               */

              if (handle->md.ifindex != -1 &&

                 from.sll_ifindex != handle->md.ifindex)

                     return 0;

 

              /*

               * Do checks based on packet direction.

               * We can only do this if we're using PF_PACKET; the

               * address returned for SOCK_PACKET is a "sockaddr_pkt"

               * which lacks the relevant packet type information.

               */

              if (from.sll_pkttype == PACKET_OUTGOING) {

                     /*

                      * Outgoing packet.

                      * If this is from the loopback device, reject it;

                      * we'll see the packet as an incoming packet as well,

                      * and we don't want to see it twice.

                      */

                     if (from.sll_ifindex == handle->md.lo_ifindex)

                            return 0;



                     /*

                      * If the user only wants incoming packets, reject it.

                      */

                     if (handle->direction == PCAP_D_IN)

                            return 0;

              } else {

                     /*

                      * Incoming packet.

                      * If the user only wants outgoing packets, reject it.

                      */

                     if (handle->direction == PCAP_D_OUT)

                            return 0;

              }

       }

#endif

 

#ifdef HAVE_PF_PACKET_SOCKETS

       /*

        * If this is a cooked device, fill in the fake packet header.

        */

       if (handle->md.cooked) {

              /*

               * Add the length of the fake header to the length

               * of packet data we read.

               */

              packet_len += SLL_HDR_LEN;



              hdrp = (struct sll_header *)bp;

              hdrp->sll_pkttype = map_packet_type_to_sll_type(from.sll_pkttype);

              hdrp->sll_hatype = htons(from.sll_hatype);

              hdrp->sll_halen = htons(from.sll_halen);

              memcpy(hdrp->sll_addr, from.sll_addr,

                 (from.sll_halen > SLL_ADDRLEN) ?

                    SLL_ADDRLEN :

                    from.sll_halen);

              hdrp->sll_protocol = from.sll_protocol;

       }

 

#ifdef HAVE_PF_RING

 pfring_pcap_read_packet:

#endif



#if defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI)

       for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {

              struct tpacket_auxdata *aux;

              unsigned int len;

              struct vlan_tag *tag;

 

              if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct tpacket_auxdata)) ||

                 cmsg->cmsg_level != SOL_PACKET ||

                 cmsg->cmsg_type != PACKET_AUXDATA)

                     continue;



              aux = (struct tpacket_auxdata *)CMSG_DATA(cmsg);

              if (aux->tp_vlan_tci == 0)

                     continue;



              len = packet_len > iov.iov_len ? iov.iov_len : packet_len;

              if (len < 2 * ETH_ALEN)

                     break;



              bp -= VLAN_TAG_LEN;

              memmove(bp, bp + VLAN_TAG_LEN, 2 * ETH_ALEN);



              tag = (struct vlan_tag *)(bp + 2 * ETH_ALEN);

              tag->vlan_tpid = htons(ETH_P_8021Q);

              tag->vlan_tci = htons(aux->tp_vlan_tci);

 

              packet_len += VLAN_TAG_LEN;

       }

#endif /* defined(HAVE_PACKET_AUXDATA) && defined(HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI) */

#endif /* HAVE_PF_PACKET_SOCKETS */

 

       /*

        * XXX: According to the kernel source we should get the real

        * packet len if calling recvfrom with MSG_TRUNC set. It does

        * not seem to work here :(, but it is supported by this code

        * anyway.

        * To be honest the code RELIES on that feature so this is really

        * broken with 2.2.x kernels.

        * I spent a day to figure out what's going on and I found out

        * that the following is happening:

        *

        * The packet comes from a random interface and the packet_rcv

        * hook is called with a clone of the packet. That code inserts

        * the packet into the receive queue of the packet socket.

        * If a filter is attached to that socket that filter is run

        * first - and there lies the problem. The default filter always

        * cuts the packet at the snaplen:

        *

        * # tcpdump -d

        * (000) ret      #68

        *

        * So the packet filter cuts down the packet. The recvfrom call

        * says "hey, it's only 68 bytes, it fits into the buffer" with

        * the result that we don't get the real packet length. This

        * is valid at least until kernel 2.2.17pre6.

        *

        * We currently handle this by making a copy of the filter

        * program, fixing all "ret" instructions with non-zero

        * operands to have an operand of 65535 so that the filter

        * doesn't truncate the packet, and supplying that modified

        * filter to the kernel.

        */

 

       caplen = packet_len;

       if (caplen > handle->snapshot)

              caplen = handle->snapshot;

 

       /* Run the packet filter if not using kernel filter */

       if (!handle->md.use_bpf && handle->fcode.bf_insns) {

              if (bpf_filter(handle->fcode.bf_insns, bp,

                              packet_len, caplen) == 0)

              {

                     /* rejected by filter */

                     return 0;

              }

       }

 

       /* Fill in our own header data */

#ifdef HAVE_PF_RING

        if (!handle->ring) {

#endif

 

       if (ioctl(handle->fd, SIOCGSTAMP, &pcap_header.ts) == -1) {

              snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,

                      "SIOCGSTAMP: %s", pcap_strerror(errno));

              return PCAP_ERROR;

       }

       pcap_header.caplen = caplen;

       pcap_header.len             = packet_len;

#ifdef HAVE_PF_RING

        }

#endif

 

       /*

        * Count the packet.

        *

        * Arguably, we should count them before we check the filter,

        * as on many other platforms "ps_recv" counts packets

        * handed to the filter rather than packets that passed

        * the filter, but if filtering is done in the kernel, we

        * can't get a count of packets that passed the filter,

        * and that would mean the meaning of "ps_recv" wouldn't

        * be the same on all Linux systems.

        *

        * XXX - it's not the same on all systems in any case;

        * ideally, we should have a "get the statistics" call

        * that supplies more counts and indicates which of them

        * it supplies, so that we supply a count of packets

        * handed to the filter only on platforms where that

        * information is available.

        *

        * We count them here even if we can get the packet count

        * from the kernel, as we can only determine at run time

        * whether we'll be able to get it from the kernel (if

        * HAVE_TPACKET_STATS isn't defined, we can't get it from

        * the kernel, but if it is defined, the library might

        * have been built with a 2.4 or later kernel, but we

        * might be running on a 2.2[.x] kernel without Alexey

        * Kuznetzov's turbopacket patches, and thus the kernel

        * might not be able to supply those statistics).  We

        * could, I guess, try, when opening the socket, to get

        * the statistics, and if we can not increment the count

        * here, but it's not clear that always incrementing

        * the count is more expensive than always testing a flag

        * in memory.

        *

        * We keep the count in "md.packets_read", and use that for

        * "ps_recv" if we can't get the statistics from the kernel.

        * We do that because, if we *can* get the statistics from

        * the kernel, we use "md.stat.ps_recv" and "md.stat.ps_drop"

        * as running counts, as reading the statistics from the

        * kernel resets the kernel statistics, and if we directly

        * increment "md.stat.ps_recv" here, that means it will

        * count packets *twice* on systems where we can get kernel

        * statistics - once here, and once in pcap_stats_linux().

        */

       handle->md.packets_read++;

 

       /* Call the user supplied callback function */

#if defined(HAVE_PF_RING)

       {

         struct myts {

           struct timeval ts;

           u_int32_t caplen, len;

           u_int64_t ns;

         };

         struct myts myhdr;

 

         myhdr.ts.tv_sec = pcap_header.ts.tv_sec, myhdr.ts.tv_usec = pcap_header.ts.tv_usec;

         myhdr.caplen = pcap_header.caplen, myhdr.len = pcap_header.len;

         myhdr.ns = pcap_header.extended_hdr.timestamp_ns;



         callback(userdata, (struct pcap_pkthdr *)&myhdr, bp);

       }

#else

       callback(userdata, &pcap_header, bp);

#endif

 

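/* The function is long but easy to follow when read top to bottom: it calls a different receive function depending on the socket type, and at the end, if HAVE_PF_RING is defined, the header passed to the callback is built differently, as the code above shows.

*/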

       return 1;

}
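
The PF_RING branch at the end of pcap_read_packet hands the callback a header that starts with the same three fields as struct pcap_pkthdr and carries a nanosecond timestamp after them. The sketch below uses local stand-in structs (sketch_base_hdr and sketch_ext_hdr are assumptions, not the real pcap or PF_RING types) to show why a callback that reads only ts, caplen and len sees the values it expects: the common fields sit at identical offsets. Instead of casting the pointer as the patch does, the sketch copies the fields, which keeps the example free of aliasing concerns.

```c
#include <stddef.h>     /* offsetof */
#include <stdint.h>     /* uint32_t, uint64_t */
#include <sys/time.h>   /* struct timeval */

/* Local stand-in for the plain capture header (hypothetical type). */
struct sketch_base_hdr {
    struct timeval ts;
    uint32_t caplen;
    uint32_t len;
};

/* Local stand-in for the extended header: same prefix, one extra field. */
struct sketch_ext_hdr {
    struct timeval ts;
    uint32_t caplen;
    uint32_t len;
    uint64_t ns;        /* extra data carried after the common prefix */
};

/* Returns 1 when the three leading fields sit at identical offsets. */
int sketch_prefix_matches(void)
{
    return offsetof(struct sketch_base_hdr, ts) ==
               offsetof(struct sketch_ext_hdr, ts) &&
           offsetof(struct sketch_base_hdr, caplen) ==
               offsetof(struct sketch_ext_hdr, caplen) &&
           offsetof(struct sketch_base_hdr, len) ==
               offsetof(struct sketch_ext_hdr, len);
}

/* Builds a plain header from the common prefix of an extended one. */
struct sketch_base_hdr sketch_to_base(const struct sketch_ext_hdr *ext)
{
    struct sketch_base_hdr base;

    base.ts     = ext->ts;
    base.caplen = ext->caplen;
    base.len    = ext->len;
    return base;
}
```

A callback written against the plain header keeps working, and one that knows about the extension can still reach the extra field.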

 

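After all this, pcap_open_live is still not finished. These dozens of pages covered just one step of pcap_open_live: the pcap_create path in pcap-linux.c and the pcap_activate_linux code it registers. libpcap is broad and deep, and pf-ring adds another layer on top. So let us continue with the other function that pcap_open_live calls, pcap_activate. pcap_create allocates the handle and registers the callbacks, while the socket is created and bound inside pcap_activate_linux. What, then, does pcap_activate itself do? Let the source speak. I love Linux, I love open source.



int pcap_activate(pcap_t *p)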

{

       int status;

       status = p->activate_op(p);

 /* What is activate_op? Its declaration shows that it is a function pointer. Searching the source for where it is assigned leads to pcap-linux.c:

       handle->activate_op = pcap_activate_linux;

    So pcap_create only stores pcap_activate_linux in activate_op, and the call through that pointer happens here. Everything analyzed so far runs only at this point.

*/

       if (status >= 0)                                   // a return value >= 0 from pcap_activate_linux means success

              p->activated = 1;

       else {

              if (p->errbuf[0] == '\0') {

              /*

                 * No error message supplied by the activate routine;

                 * for the benefit of programs that don't specially

                 * handle errors other than PCAP_ERROR, return the

                 * error message corresponding to the status.

              */

                     snprintf(p->errbuf, PCAP_ERRBUF_SIZE, "%s",

                         pcap_statustostr(status));

              }

              /*

               * Undo any operation pointer setting, etc. done by

               * the activate operation.

               */

              initialize_ops(p);

       }

       return (status);

}
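
The create-then-activate split can be modelled in a few lines. This is a sketch under my own assumptions, not libpcap code: struct sketch_pcap, sketch_create and sketch_activate are invented names, the error text is a placeholder for what pcap_statustostr would supply, and resetting the pointer to sketch_not_initialized stands in for initialize_ops.

```c
#include <stdio.h>

#define SKETCH_ERRBUF_SIZE 256

struct sketch_pcap {
    int (*activate_op)(struct sketch_pcap *p);  /* set at create time */
    int activated;
    char errbuf[SKETCH_ERRBUF_SIZE];
};

/* Default op: installed again after a failed activation. */
int sketch_not_initialized(struct sketch_pcap *p) { (void)p; return -1; }

/* Platform op that succeeds. */
int sketch_activate_ok(struct sketch_pcap *p) { (void)p; return 0; }

/* Platform op that fails without writing an error message. */
int sketch_activate_fail(struct sketch_pcap *p) { (void)p; return -1; }

/* Create: only records which function will do the real work later. */
void sketch_create(struct sketch_pcap *p, int (*op)(struct sketch_pcap *))
{
    p->activate_op = op;
    p->activated = 0;
    p->errbuf[0] = '\0';
}

/* Activate: calls through the pointer and tidies up on failure. */
int sketch_activate(struct sketch_pcap *p)
{
    int status = p->activate_op(p);

    if (status >= 0) {
        p->activated = 1;
    } else {
        if (p->errbuf[0] == '\0')
            snprintf(p->errbuf, sizeof(p->errbuf),
                     "activation failed (status %d)", status);
        p->activate_op = sketch_not_initialized;  /* undo pointer setup */
    }
    return status;
}
```

Separating the two steps is what lets options such as snaplen, promiscuous mode and timeout be set on the handle between creation and activation, as pcap_open_live does.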

 

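That completes pcap_open_live, and I am off to dinner. There is still plenty left to analyze, so here is the order: pcap_next and its relatives come first. The socket has been created and bound, so it is time to capture data, and the capture callback is already in place: pcap_read_linux, which is really pcap_read_packet. My guess is that pcap_next and the other read functions call this callback; we shall see. Dinner first. See you shortly.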

 

