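Recently I was working on a project at my company where a colleague's program, which used a raw socket, kept dropping packets. I had previously written a receiver with the Java version of pcap that did not drop packets, so I wanted to use the pcap library to solve the receiving problem. My colleague switched to this library, but it only helped at low message volume: at high volume, packets were still dropped.
I spent about a week tracking this down. Comparing against the gulp program, I found that my colleague's program differed from gulp in the receive method and in some initialization variables. At first I concluded that the variables were probably not the problem and suspected the difference between pcap_next() and pcap_loop().
So I wrote a small program myself. The code is as follows: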
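#include <pcap.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

// progname, ringsize, buf and Reader() come from gulp and were not part of
// this listing; the declarations below are assumed so that main() compiles.
extern char *progname;
extern int ringsize;
extern char *buf;
void *Reader(void *arg);

int main(int argc, char* argv[]){
    signal(SIGINT, SIG_IGN);
    char *dev, errbuf[PCAP_ERRBUF_SIZE];
    dev = pcap_lookupdev(errbuf);
    if (dev == NULL) {
        fprintf(stderr, "Couldn't find default device: %s\n", errbuf);
        return 1;
    }
    printf("Device: %s\n", dev);
    pthread_t threads[1];
    // Allocate the ring buffer and try to keep it resident in RAM.
    buf = (char *)malloc(ringsize+1);
    if (!buf) {
        fprintf(stderr, "%s: Malloc failed, exiting\n", progname); exit(1);
    }
    if (mlock(buf, ringsize+1) != 0) {
        fprintf(stderr, "%s: Warning: could not lock ring buffer into RAM\n",
                progname);
    }
    // The capture itself runs in the Reader thread.
    int rc = pthread_create(&threads[0], NULL, &Reader, NULL);
    if (rc != 0) {
        fprintf(stderr, "%s: pthread_create failed (%d)\n", progname, rc); exit(1);
    }
    // main only waits for signals; nothing below this loop is reached.
    while (true) {
        pause();
    }
    /* Disabled pcap_loop() variant (got_packet and captured are defined elsewhere):
    int rtid = getpid();
    printf("rtid is %d \n",rtid);
    // Pin the process to CPU 1 and raise its priority.
    cpu_set_t csmask;
    CPU_ZERO(&csmask);
    CPU_SET(1, &csmask);
    if (sched_setaffinity(rtid, sizeof(csmask), &csmask) == -1)
    {
        printf("warning: could not set CPU affinity, continuing...\n");
    }
    setpriority(PRIO_PROCESS, rtid, -15);
    bpf_u_int32 net;
    bpf_u_int32 mask;
    const char *sDevice = "eth1";
    if (pcap_lookupnet(sDevice, &net, &mask, errbuf) == -1)
    {
        printf( "Couldn't get netmask for device %s: %s\n", sDevice, errbuf);
        net = 0;
        mask = 0;
    }
    // Second argument is the snapshot length (snaplen), in bytes.
    pcap_t* handle = pcap_open_live(sDevice, 65535, 1, 0, errbuf);
    if (handle == NULL) {
        fprintf(stderr, "Couldn't open device %s: %s\n", sDevice, errbuf);
        return 1;
    }
    printf("pcap handle is %p \n", (void *)handle);
    char filter_exp[] = "";
    struct bpf_program filter;
    if (pcap_compile(handle, &filter, filter_exp, 0, net) == -1) {
        fprintf(stderr, "Couldn't parse filter %s: %s\n", filter_exp, pcap_geterr(handle));
        return 1;
    }
    if (pcap_setfilter(handle, &filter) == -1) {
        fprintf(stderr, "Couldn't install filter %s: %s\n", filter_exp, pcap_geterr(handle));
        return 1;
    }
    fprintf(stderr, "receive begin \n");
    int num_packets = 10000;
    pcap_loop(handle, num_packets, got_packet, NULL);
    fprintf(stderr, "packet count is %d \n", captured);
    pcap_close(handle);
    return(0);
    */
}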
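It still dropped packets, so I had no choice but to modify the gulp code instead, changing and testing it one piece at a time. The problem turned out to be in the call pcap_t* handle = pcap_open_live(sDevice, 65535, 1, 0, errbuf);. With the second argument (the snapshot length) set to 1024, not a single packet is dropped, but with 65535 packets are dropped. I read pcap's pcap_open_live function and still could not work out the reason. My suspicion is that, in the internal memory handling, allocating 65535 bytes for every packet takes longer than allocating 1024 bytes per packet, and that this is what causes the drops.
Please keep this in mind whenever you use pcap. I learned it the hard way...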
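One other way to look at the 1024 versus 65535 difference, offered here as an assumption and not as something verified in this post: if the capture buffer has a fixed total size and is divided into slots that must each hold a full snapshot, then a larger snapshot length leaves far fewer slots to absorb a burst. The sketch below only does that arithmetic. The helper names, the 2 MiB buffer size, the 32-byte per-packet header and the 16-byte alignment are all illustrative assumptions, not values taken from pcap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to a multiple of align.
 * align must be a power of two. */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical model of a fixed-size capture ring: every slot is sized for
 * snaplen bytes of packet data plus header_bytes of per-packet metadata,
 * rounded up to 16 bytes. Returns how many packets fit in buffer_bytes. */
static size_t ring_slots(size_t buffer_bytes, size_t snaplen, size_t header_bytes) {
    size_t slot = round_up(snaplen + header_bytes, 16);
    return buffer_bytes / slot;
}
```

Under these assumed numbers, ring_slots(2097152, 1024, 32) gives 1985 slots while ring_slots(2097152, 65535, 32) gives only 31, so the same buffer would tolerate a much shorter burst at the larger snapshot length.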