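Introduction
pktgen is a high-performance packet generator built into the Linux kernel and used mainly for network performance testing. It is generally sufficient for testing gigabit NICs. Because pktgen runs in kernel space, it reaches very high packet rates without consuming many system resources.
pktgen only sends UDP packets (port 9 by default). It is a very low-level test tool, normally used to measure the performance of network devices, and does not involve the application layer. To test the performance of higher-level network applications, use other tools.
An advantage of pktgen is that the sending port is selected by MAC address rather than by routing. The tool can be used to test the throughput of optical modules and SFP+ cables, and to compare the performance of different NICs in identically configured servers.
This test applies the pktgen_rx patch on top of the stock kernel pktgen module, which adds receive-side statistics.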
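Installation
The pktgen module that ships with the Linux kernel has no rx statistics. If you need them, download the patch pktgen_rx.tgz. Download address:
Test environment:
Machine model: DELL R720
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz (20 cores, 40 threads)
Kernel version: 2.6.37
Download pktgen_rx.tgz from the address above, extract it, and enter the pktgen directory.
Copy the kernel source to /usr/src/kernels/, then build.
The build reports a problem with a function; modify it as follows
Change it to
Build again.
Load the module and it is ready to use.
Testing
Test topology
Shell script that sends from eth6 and receives on eth7
pktgen.sh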
#!/bin/bash
# pktgen.conf -- Sample configuration for send on two devices on a UP system
#modprobe pktgen
# Write one command to the current $PGDEV and print the result line if it failed.
function pgset() {
    local result
    echo "$1" > "$PGDEV"
    result=$(grep -F "Result: OK:" "$PGDEV")
    if [ "$result" = "" ]; then
        grep -F "Result:" "$PGDEV"
    fi
}
function pg() {
    echo inject > "$PGDEV"
    cat "$PGDEV"
}
#rx config
# Disable pause-frame autonegotiation and rx/tx flow control on the receiving interface ($1)
/sbin/ethtool -A "$1" autoneg off rx off tx off
# Reception configuration (rx statistics)
PGDEV=/proc/net/pktgen/pgrx
echo "Removing old config"
pgset "rx_reset"
echo "Adding rx $1"
pgset "rx $1"
echo "Setting statistics $2(counters/basic/time)"
pgset "statistics $2"
pgset "display $3"
# pgset "display script/human"
# Results can be viewed in /proc/net/pktgen/pgrx
#cat /proc/net/pktgen/pgrx
# We use eth6
echo "Adding devices to run."
PGDEV=/proc/net/pktgen/kpktgend_0
pgset "rem_device_all"
pgset "add_device eth6" # change the interface to match your test
pgset "max_before_softirq 10000"
# Configure the individual devices
echo "Configuring devices"
PGDEV=/proc/net/pktgen/eth6 # change the interface to match your test
pgset "clone_skb 10000"
pgset "pkt_size 1514" # change the packet size to match your test
pgset "dst_mac 00:16:31:F0:84:D1" # change to match your test
pgset "count 0" # count 0 sends packets without limit
# Time to run
PGDEV=/proc/net/pktgen/pgctrl
echo "Running... ctrl^C to stop"
pgset "start"
echo "Done"
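When running the script, set $2 to time; the jitter and latency figures can then be viewed in /proc/net/pktgen/pgrx, as follows
Test screenshots:
Screenshot of running the pktgen script
Screenshot of the traffic during the test
Screenshot comparing packets sent on eth6 with packets received on eth7
The red boxes show that the sent and received packet counts are the same, and throughput reached roughly 9.8 Gbit/s or more. The packet size can of course be changed. If packets are lost during a test, repeat the test a few times.
My tests suggest one conclusion: the faster the CPU, the more packets it can send per second, and the more likely it is to reach line rate. In my test environment, the send rate reached about 4 Mpps.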
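To judge how close a measured rate is to line rate, the theoretical maximum can be worked out from the frame size. The sketch below assumes a 10 Gbit/s link (suggested by the 9.8 Gbit/s figure above, not stated explicitly) and untagged Ethernet frames; `line_rate_pps` is a hypothetical helper name, not part of pktgen. It treats pkt_size as the frame without the 4-byte CRC and adds the CRC, the 8-byte preamble, and the 12-byte inter-frame gap.

```bash
#!/bin/bash
# line_rate_pps LINK_BPS PKT_SIZE
# Prints the theoretical maximum packets per second (rounded down).
# PKT_SIZE is the pktgen pkt_size, i.e. the frame without the 4-byte CRC.
line_rate_pps() {
    local link_bps=$1 pkt_size=$2
    # Bytes occupied on the wire: frame + CRC (4) + preamble/SFD (8) + inter-frame gap (12)
    local wire_bytes=$((pkt_size + 4 + 8 + 12))
    echo $((link_bps / (wire_bytes * 8)))
}

line_rate_pps 10000000000 60     # prints 14880952
line_rate_pps 10000000000 1514   # prints 812743
```

Under these assumptions, the roughly 4 Mpps seen with a single thread is well below the 14.88 Mpps needed for line rate with minimum-size frames, while 1514-byte packets need only about 0.81 Mpps.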
Shell script for bidirectional traffic: eth6 sends while eth7 receives, and eth7 sends while eth6 receives
pktgen_eth6_eth7.sh
#!/bin/bash
#modprobe pktgen
pgset() {
    local result
    echo "$1" > "$PGDEV"
    result=$(grep -F "Result: OK:" "$PGDEV")
    if [ "$result" = "" ]; then
        grep -F "Result:" "$PGDEV"
    fi
}
pg() {
    echo inject > "$PGDEV"
    cat "$PGDEV"
}
# Config Start Here -----------------------------------------------------------
# thread config
# Each CPU has its own thread. Two-CPU example: eth6 and eth7 are added to kpktgend_0 and kpktgend_1 respectively.
# Disable pause-frame autonegotiation and rx/tx flow control on the receiving interface ($1)
/sbin/ethtool -A "$1" autoneg off rx off tx off
# Reception configuration
PGDEV=/proc/net/pktgen/pgrx
echo "Removing old config"
pgset "rx_reset"
echo "Adding rx $1"
pgset "rx $1"
echo "Setting statistics $2"
pgset "statistics $2"
pgset "display human"
# pgset "display script"
PGDEV=/proc/net/pktgen/kpktgend_0
echo "Removing all devices"
pgset "rem_device_all"
echo "Adding eth6"
pgset "add_device eth6"
echo "Setting max_before_softirq 10000"
pgset "max_before_softirq 10000"
PGDEV=/proc/net/pktgen/kpktgend_1
echo "Removing all devices"
pgset "rem_device_all"
echo "Adding eth7"
pgset "add_device eth7"
echo "Setting max_before_softirq 10000"
pgset "max_before_softirq 10000"
# device config
# delay 0 means maximum speed.
CLONE_SKB="clone_skb 1000000"
# NIC adds 4 bytes CRC
#COUNT="count 0"
PKT_SIZE="pkt_size 1514"
COUNT="count 0"
DELAY="delay 0"
PGDEV=/proc/net/pktgen/eth6
echo "Configuring $PGDEV"
pgset "$COUNT"
pgset "$CLONE_SKB"
pgset "$PKT_SIZE"
pgset "$DELAY"
#pgset "src_min 100.1.1.2"
#pgset "src_max 100.1.1.254"
pgset "dst 200.1.1.2"
pgset "dst_mac 00:16:31:F0:84:D1"
PGDEV=/proc/net/pktgen/eth7
echo "Configuring $PGDEV"
pgset "$COUNT"
pgset "$CLONE_SKB"
pgset "$PKT_SIZE"
pgset "$DELAY"
#pgset "src_min 200.1.1.2"
#pgset "src_max 200.1.1.254"
pgset "dst 100.1.1.2"
pgset "dst_mac 00:16:31:F0:84:D0"
# Time to run
PGDEV=/proc/net/pktgen/pgctrl
echo "Running... ctrl^C to stop"
pgset "start"
echo "Done"
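pktgen_eth6_eth7.sh can collect statistics for just one port: ./pktgen_eth6_eth7.sh eth6 counters (choosing counters, basic, or time made no difference; the reason is unknown)
It can also collect statistics for both ports: run ./pktgen_eth6_eth7.sh directly.
Note: if pktgen_eth6_eth7.sh is run first and pktgen.sh is run afterwards, only eth6 should send, but in fact both eth6 and eth7 send. The workaround is to unload pktgen.ko and load it again.
With small packets, both tests above reach only about 4 Mpps. To raise the send rate, use multiple cores and threads, as in the following script (still sending from eth6 and receiving on eth7)
pktgen_multicore.sh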
#!/bin/bash
# $1 Rate in packets per s
# $2 Number of CPUs to use
function pgset() {
    echo "$1" > "$PGDEV"
}
# Reception configuration
PGDEV=/proc/net/pktgen/pgrx
echo "Removing old config"
pgset "rx_reset"
echo "Adding rx eth7"
pgset "rx eth7"
echo "Setting statistics counters"
pgset "statistics counters"
pgset "display human"
# pgset "display script"
# Results can be viewed in /proc/net/pktgen/pgrx
#cat /proc/net/pktgen/pgrx
# Config Start Here -----------------------------------------------------------
# thread config
CPUS=$2
#PKTS=`echo "scale=0; $3/$CPUS" | bc`
CLONE_SKB="clone_skb 10000"
PKT_SIZE="pkt_size 60"
COUNT="count 0"
DELAY="delay 0"
MAC="00:16:31:F0:84:D1"
ETH="eth6"
RATEP=`echo "scale=0; $1/$CPUS" | bc`
for processor in {0..14} # kpktgend_0 to kpktgend_14
do
    PGDEV=/proc/net/pktgen/kpktgend_$processor
    # echo "Removing all devices"
    pgset "rem_device_all"
done
for ((processor=0; processor<CPUS; processor++))
do
PGDEV=/proc/net/pktgen/kpktgend_$processor
# echo "Adding $ETH"
pgset "add_device $ETH@$processor"
PGDEV=/proc/net/pktgen/$ETH@$processor
# echo "Configuring $PGDEV"
pgset "$COUNT"
pgset "flag QUEUE_MAP_CPU"
pgset "$CLONE_SKB"
pgset "$PKT_SIZE"
#pgset "$DELAY"
pgset "ratep $RATEP"
#pgset "dst 10.0.0.1"
pgset "dst_mac $MAC"
# Random address within the min-max range
#pgset "flag IPDST_RND"
pgset "src_min 1.0.0.0"
pgset "src_max 100.255.255.255"
#enable configuration packet
#pgset "config 1" #config [0 or 1] Enables or disables the configuration packet, which reset the statistics and allows to calculate the losses.
#pgset "flows 1024"
#pgset "flowlen 8"
done
# Time to run
PGDEV=/proc/net/pktgen/pgctrl
echo "Running... ctrl^C to stop"
pgset "start"
echo "Done"
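The script above derives the per-thread rate (RATEP) by dividing $1 by $2 with bc and does not check its arguments, so a missing or zero CPU count produces an empty RATEP. A minimal sketch of the same division with validation, using shell arithmetic instead of bc; `per_thread_rate` is a hypothetical helper name introduced here for illustration.

```bash
#!/bin/bash
# per_thread_rate TOTAL_PPS CPUS
# Prints the integer share of the total packet rate for each pktgen thread.
per_thread_rate() {
    local total=$1 cpus=$2
    # Both arguments must be decimal integers, and CPUS must be at least 1.
    if ! [[ $total =~ ^[0-9]+$ && $cpus =~ ^[1-9][0-9]*$ ]]; then
        echo "usage: per_thread_rate TOTAL_PPS CPUS (CPUS >= 1)" >&2
        return 1
    fi
    # 10# forces base 10 so a leading zero is not read as octal.
    echo $((10#$total / 10#$cpus))
}

per_thread_rate 10000000 4    # prints 2500000
per_thread_rate 10000000 15   # prints 666666
```

Because the division rounds down, the sum over all threads can fall slightly short of the requested total, as in the second call.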
Also set the affinity between the NIC's queues and the CPUs, as follows
Binding eth6 to CPU0-19
[root@localhost pktgen]# echo 1 > /proc/irq/122/smp_affinity
[root@localhost pktgen]# echo 2 > /proc/irq/123/smp_affinity
[root@localhost pktgen]# echo 4 > /proc/irq/124/smp_affinity
[root@localhost pktgen]# echo 8 > /proc/irq/125/smp_affinity
[root@localhost pktgen]# echo 10 > /proc/irq/126/smp_affinity
[root@localhost pktgen]# echo 20 > /proc/irq/127/smp_affinity
[root@localhost pktgen]# echo 40 > /proc/irq/128/smp_affinity
[root@localhost pktgen]# echo 80 > /proc/irq/129/smp_affinity
[root@localhost pktgen]# echo 100 > /proc/irq/130/smp_affinity
[root@localhost pktgen]# echo 200 > /proc/irq/131/smp_affinity
[root@localhost pktgen]# echo 400 > /proc/irq/132/smp_affinity
[root@localhost pktgen]# echo 800 > /proc/irq/133/smp_affinity
[root@localhost pktgen]# echo 1000 >/proc/irq/134/smp_affinity
[root@localhost pktgen]# echo 2000 >/proc/irq/135/smp_affinity
[root@localhost pktgen]# echo 4000 >/proc/irq/136/smp_affinity
[root@localhost pktgen]# echo 8000 >/proc/irq/137/smp_affinity
[root@localhost pktgen]# echo 10000 >/proc/irq/138/smp_affinity
[root@localhost pktgen]# echo 20000 >/proc/irq/139/smp_affinity
[root@localhost pktgen]# echo 40000 >/proc/irq/140/smp_affinity
[root@localhost pktgen]# echo 80000 >/proc/irq/141/smp_affinity
Binding eth7 to CPU0-19
[root@localhost pktgen]# echo 1 > /proc/irq/143/smp_affinity
[root@localhost pktgen]# echo 2 > /proc/irq/144/smp_affinity
[root@localhost pktgen]# echo 4 > /proc/irq/145/smp_affinity
[root@localhost pktgen]# echo 8 > /proc/irq/146/smp_affinity
[root@localhost pktgen]# echo 10 > /proc/irq/147/smp_affinity
[root@localhost pktgen]# echo 20 > /proc/irq/148/smp_affinity
[root@localhost pktgen]# echo 40 > /proc/irq/149/smp_affinity
[root@localhost pktgen]# echo 80 > /proc/irq/150/smp_affinity
[root@localhost pktgen]# echo 100 > /proc/irq/151/smp_affinity
[root@localhost pktgen]# echo 200 > /proc/irq/152/smp_affinity
[root@localhost pktgen]# echo 400 > /proc/irq/153/smp_affinity
[root@localhost pktgen]# echo 800 > /proc/irq/154/smp_affinity
[root@localhost pktgen]# echo 1000 >/proc/irq/155/smp_affinity
[root@localhost pktgen]# echo 2000 >/proc/irq/156/smp_affinity
[root@localhost pktgen]# echo 4000 >/proc/irq/157/smp_affinity
[root@localhost pktgen]# echo 8000 >/proc/irq/158/smp_affinity
[root@localhost pktgen]# echo 10000 >/proc/irq/159/smp_affinity
[root@localhost pktgen]# echo 20000 >/proc/irq/160/smp_affinity
[root@localhost pktgen]# echo 40000 >/proc/irq/161/smp_affinity
[root@localhost pktgen]# echo 80000 >/proc/irq/162/smp_affinity
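The values written above follow a pattern: queue n of an interface is pinned to CPU n, and the mask is the hexadecimal form of 1 shifted left by n. A short sketch that generates the same commands; it only prints them, so it can be reviewed before being run as root. The helper names are hypothetical, and the starting IRQ numbers 122 and 143 are those of this test machine; on another system they would need to be looked up in /proc/interrupts. The single-word mask is only valid for CPU numbers below 32.

```bash
#!/bin/bash
# cpu_mask CPU
# Prints the hexadecimal smp_affinity mask that selects exactly one CPU (CPU < 32).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# print_affinity_cmds FIRST_IRQ NUM_QUEUES
# Prints (does not run) one command per queue, pinning IRQ FIRST_IRQ+n to CPU n.
print_affinity_cmds() {
    local first_irq=$1 queues=$2 n
    for ((n = 0; n < queues; n++)); do
        echo "echo $(cpu_mask "$n") > /proc/irq/$((first_irq + n))/smp_affinity"
    done
}

print_affinity_cmds 122 20   # eth6 in this test
print_affinity_cmds 143 20   # eth7 in this test
```

Piping the output to a root shell applies the settings, for example `print_affinity_cmds 122 20 | sh`.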
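Note: the NIC assigns packets to queues based on IP address and port, so the IP or port must not stay fixed, or the binding has no effect. This test spreads the source IP across a range.
The results show that binding queues to CPUs greatly improves both sending and receiving. (Previously a single CPU core could receive at most 2 Mpps; with the binding in place the rate reached about 9.5 Mpps, and it can still be pushed higher.)
The NIC's default MTU is 1500, so it cannot receive frames larger than 1518 bytes. Change the MTU to receive larger packets, for example 8192 bytes: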
ip link set dev eth6 mtu 8174
ip link set dev eth7 mtu 8174
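After the change, set pkt_size in the script to 8188, and eth7 can receive the data (this can be seen with tcpdump; without the change, the capture shows the following)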
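The numbers 8174, 8188, and 8192 are related by the Ethernet header and CRC sizes. A small sketch of the arithmetic, assuming untagged frames (no VLAN tag) and that pkt_size excludes the CRC, as the script comment "NIC adds 4 bytes CRC" indicates; the function names are hypothetical.

```bash
#!/bin/bash
ETH_HEADER=14   # destination MAC (6) + source MAC (6) + EtherType (2)
ETH_CRC=4       # frame check sequence appended by the NIC

# max_pkt_size MTU: largest pktgen pkt_size that fits in the given MTU.
max_pkt_size() {
    echo $(($1 + ETH_HEADER))
}

# wire_frame_size MTU: resulting frame size including the CRC.
wire_frame_size() {
    echo $(($1 + ETH_HEADER + ETH_CRC))
}

max_pkt_size 8174      # prints 8188
wire_frame_size 8174   # prints 8192
max_pkt_size 1500      # prints 1514
wire_frame_size 1500   # prints 1518
```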
Appendix:
Configuration options explained
Configuring threads and devices
================================
This is done via the /proc interface, most easily through pgset in the scripts.
Examples:
pgset "clone_skb 1" sets the number of copies of the same packet
pgset "clone_skb 0" use single SKB for all transmits
pgset "pkt_size 9014" sets packet size to 9014
pgset "frags 5" packet will consist of 5 fragments
pgset "count 200000" sets number of packets to send, set to zero for continuous sends until explicitly stopped.
pgset "delay 5000" adds delay to hard_start_xmit(). nanoseconds
pgset "dst 10.0.0.1" sets IP destination address(BEWARE! This generator is very aggressive!)
pgset "dst_min 10.0.0.1" Same as dst
pgset "dst_max 10.0.0.254" Set the maximum destination IP.
pgset "src_min 10.0.0.1" Set the minimum (or only) source IP.
pgset "src_max 10.0.0.254" Set the maximum source IP.
pgset "dst6 fec0::1" IPV6 destination address
pgset "src6 fec0::2" IPV6 source address
pgset "dst_mac 00:00:00:00:00:00" sets MAC destination address
pgset "src_mac 00:00:00:00:00:00" sets MAC source address
pgset "queue_map_min 0" Sets the min value of tx queue interval
pgset "queue_map_max 7" Sets the max value of tx queue interval, for multiqueue devices. To select queue 1 of a given device, use queue_map_min=1 and queue_map_max=1
pgset "src_mac_count 1" Sets the number of MACs we'll range through. The 'minimum' MAC is what you set with src_mac.
pgset "dst_mac_count 1" Sets the number of MACs we'll range through. The 'minimum' MAC is what you set with dst_mac.
pgset "flag [name]" Set a flag to determine behaviour. Current flags are:
IPSRC_RND #IP Source is random (between min/max),
IPDST_RND, UDPSRC_RND
UDPDST_RND, MACSRC_RND, MACDST_RND
MPLS_RND, VID_RND, SVID_RND
QUEUE_MAP_RND # queue map random
QUEUE_MAP_CPU # queue map mirrors smp_processor_id()
pgset "udp_src_min 9" set UDP source port min, If < udp_src_max, then cycle through the port range.
pgset "udp_src_max 9" set UDP source port max.
pgset "udp_dst_min 9" set UDP destination port min, If < udp_dst_max, then cycle through the port range.
pgset "udp_dst_max 9" set UDP destination port max.
pgset "mpls 0001000a,0002000a,0000000a" set MPLS labels (in this example outer label=16, middle label=32, inner label=0 (IPv4 NULL)). Note that there must be no spaces between the arguments. Leading zeros are required. Do not set the bottom of stack bit, that's done automatically. If you do set the bottom of stack bit, that indicates that you want to randomly generate that address and the flag MPLS_RND will be turned on. You can have any mix of random and fixed labels in the label stack.
pgset "mpls 0" turn off mpls (or any invalid argument works too!)
pgset "vlan_id 77" set VLAN ID 0-4095
pgset "vlan_p 3" set priority bit 0-7 (default 0)
pgset "vlan_cfi 0" set canonical format identifier 0-1 (default 0)
pgset "svlan_id 22" set SVLAN ID 0-4095
pgset "svlan_p 3" set priority bit 0-7 (default 0)
pgset "svlan_cfi 0" set canonical format identifier 0-1 (default 0)
pgset "vlan_id 9999" > 4095 remove vlan and svlan tags
pgset "svlan 9999" > 4095 remove svlan tag
pgset "tos XX" set former IPv4 TOS field (e.g. "tos 28" for AF11 no ECN, default 00)
pgset "traffic_class XX" set former IPv6 TRAFFIC CLASS (e.g. "traffic_class B8" for EF no ECN, default 00)
pgset stop aborts injection. Also, ^C aborts generator.
pgset "rate 300M" set rate to 300 Mb/s
pgset "ratep 1000000" set rate to 1Mpps
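The hexadecimal values in the mpls example above can be derived from the label numbers. The sketch below assumes the standard 32-bit MPLS label stack entry layout (label in the top 20 bits, then 3 traffic-class bits, the bottom-of-stack bit, and an 8-bit TTL) and reads the trailing 0a in the example as a TTL of 10; that reading is an assumption, and the helper names are hypothetical. The bottom-of-stack bit is left clear, as the option description requires.

```bash
#!/bin/bash
# mpls_entry LABEL [TTL]
# Prints one label stack entry as 8 hex digits: label << 12, TTL in the low byte.
# Traffic-class and bottom-of-stack bits are left at zero.
mpls_entry() {
    local label=$1 ttl=${2:-10}
    printf '%08x\n' $(( (label << 12) | ttl ))
}

# mpls_arg LABEL...
# Joins the entries with commas and no spaces, outermost label first.
mpls_arg() {
    local out="" label
    for label in "$@"; do
        out+="${out:+,}$(mpls_entry "$label")"
    done
    echo "$out"
}

mpls_arg 16 32 0   # prints 0001000a,0002000a,0000000a
```

The result can be passed to the option as `pgset "mpls $(mpls_arg 16 32 0)"`.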
Definition of line rate