DPDK relies on three main techniques: hugetlbpage, uio, and CPU affinity.
1) hugetlbpage improves memory efficiency by backing DPDK's memory with large pages.
2) uio is the kernel mechanism that supports user-space drivers. Since DPDK is a user-space platform, the NIC drivers tied to it (chiefly Intel's own gigabit igb and 10-gigabit ixgbe drivers) run in user space via uio.
3) CPU affinity is a product of the multi-core era: on machines with ever more cores, the most direct way to raise device and program throughput is to dedicate each core to one job. For example, with two receiving NICs eth0 and eth1, cpu0 can handle eth0 exclusively while cpu1 handles eth1. DPDK uses CPU affinity to pin the control-plane thread and each data-plane thread to different cores, avoiding the cost of repeated rescheduling; each thread runs its own endless while loop, focused on its task without interfering with the others.
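To make the pinning idea concrete, here is a small sketch of ours (not from DPDK itself): `taskset` is the standard Linux affinity tool, the program names in the comments are hypothetical, and DPDK expresses the same choice through a hex coremask passed with `-c`.

```shell
# Outside DPDK you could pin one busy-poll process per NIC by hand, e.g.:
#   taskset -c 0 ./poll_eth0 &   # cpu0 serves eth0 only
#   taskset -c 1 ./poll_eth1 &   # cpu1 serves eth1 only
# DPDK instead takes a single hex coremask (-c), one bit per core.
# Cores 0 and 1 together give:
printf '0x%x\n' $(( (1 << 0) | (1 << 1) ))   # prints: 0x3
```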
1. System and software versions
OS: CentOS Linux release 7.0.1406, 64-bit
dpdk: 1.8.0 (download page)
2. Virtual machine configuration
VM software: VMware Workstation 10.0.1 build-1379776
CPU: 3 CPUs, 1 core each
Memory: 1 GB
NICs: 2 Intel NICs for the DPDK experiments; one more NIC for communicating with the host system
3. DPDK installation and testing
3.1. Preparation
If the CentOS VM was installed with the minimal profile, you also need to install the basic development tools group (gcc, python, etc.).
In addition, the dpdk_nic_bind.py script shipped with DPDK calls the lspci command, which is not installed by default; without it the NICs cannot be bound. Install it with:
```
yum install pciutils
```
ifconfig is also absent by default; if you want it, run:
```
yum install net-tools
```
On CentOS, a NIC you intend to hand over to DPDK may still be active before binding; it must be brought down first, otherwise binding fails. One way to disable it (eno33554984 is the interface name, analogous to eth0):
```
ifconfig eno33554984 down
```
Install kernel-devel matching the running kernel:
```
yum install "kernel-devel-uname-r == $(uname -r)"
```
3.2. Configuration via the setup script
DPDK ships a convenient configuration script, /tools/setup.sh, which makes setting up the environment easy.
1) Set the environment variables (this is the 64-bit Linux configuration):
```
export RTE_SDK=<dpdk root directory>
export RTE_TARGET=x86_64-native-linuxapp-gcc
```
2) Run setup.sh; it displays the following menu:
```
----------------------------------------------------------
 Step 1: Select the DPDK environment to build
----------------------------------------------------------
[1] i686-native-linuxapp-gcc
[2] i686-native-linuxapp-icc
[3] ppc_64-power8-linuxapp-gcc
[4] x86_64-ivshmem-linuxapp-gcc
[5] x86_64-ivshmem-linuxapp-icc
[6] x86_64-native-bsdapp-clang
[7] x86_64-native-bsdapp-gcc
[8] x86_64-native-linuxapp-clang
[9] x86_64-native-linuxapp-gcc
[10] x86_64-native-linuxapp-icc

----------------------------------------------------------
 Step 2: Setup linuxapp environment
----------------------------------------------------------
[11] Insert IGB UIO module
[12] Insert VFIO module
[13] Insert KNI module
[14] Setup hugepage mappings for non-NUMA systems
[15] Setup hugepage mappings for NUMA systems
[16] Display current Ethernet device settings
[17] Bind Ethernet device to IGB UIO module
[18] Bind Ethernet device to VFIO module
[19] Setup VFIO permissions

----------------------------------------------------------
 Step 3: Run test application for linuxapp environment
----------------------------------------------------------
[20] Run test application ($RTE_TARGET/app/test)
[21] Run testpmd application in interactive mode ($RTE_TARGET/app/testpmd)

----------------------------------------------------------
 Step 4: Other tools
----------------------------------------------------------
[22] List hugepage info from /proc/meminfo

----------------------------------------------------------
 Step 5: Uninstall and system cleanup
----------------------------------------------------------
[23] Uninstall all targets
[24] Unbind NICs from IGB UIO driver
[25] Remove IGB UIO module
[26] Remove VFIO module
[27] Remove KNI module
[28] Remove hugepage mappings
[29] Exit Script
```
Select 9 (x86_64-native-linuxapp-gcc) to build.
3) Select 11 to insert the igb_uio module.
4) Select 14 to set up hugepage mappings (non-NUMA); you will be prompted for the number of pages; enter 64.
```
Removing currently reserved hugepages
Unmounting /mnt/huge and removing directory
Input the number of 2MB pages
Example: to have 128MB of hugepages available, enter '64' to reserve 64 * 2MB pages
Number of pages: 64
```
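The arithmetic behind the prompt is simple: each hugepage here is 2 MB, so 64 pages reserve 128 MB. A one-line sanity check:

```shell
# Each 2MB hugepage contributes 2 MB, so n pages reserve n*2 MB in total.
pages=64
echo "$(( pages * 2 )) MB"   # prints: 128 MB
```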
Select 22 to confirm the hugepage configuration:
```
AnonHugePages:      6144 kB
HugePages_Total:      64
HugePages_Free:       64
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
```
5) Select 17 to bind the NICs that DPDK will use:
```
Network devices using DPDK-compatible driver
============================================
<none>

Network devices using kernel driver
===================================
0000:02:01.0 '82545EM Gigabit Ethernet Controller (Copper)' if=eno16777736 drv=e1000 unused=igb_uio *Active*
0000:02:05.0 '82545EM Gigabit Ethernet Controller (Copper)' if=eno33554984 drv=e1000 unused=igb_uio
0000:02:06.0 '82545EM Gigabit Ethernet Controller (Copper)' if=eno50332208 drv=e1000 unused=igb_uio

Other network devices
=====================
<none>

Enter PCI address of device to bind to IGB UIO driver: 0000:02:05.0
```
After binding, select 16 to view the current NIC configuration:
```
Network devices using DPDK-compatible driver
============================================
0000:02:05.0 '82545EM Gigabit Ethernet Controller (Copper)' drv=igb_uio unused=
0000:02:06.0 '82545EM Gigabit Ethernet Controller (Copper)' drv=igb_uio unused=

Network devices using kernel driver
===================================
0000:02:01.0 '82545EM Gigabit Ethernet Controller (Copper)' if=eno16777736 drv=e1000 unused=igb_uio *Active*

Other network devices
=====================
<none>
```
6) Select 21 to run the testpmd test application:
```
Enter hex bitmask of cores to execute testpmd app on
Example: to execute app on cores 0 to 7, enter 0xff
bitmask: 3
```
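Entering 3 here selects cores 0 and 1, because the value is a bitmask with one bit per core. A small sketch of ours that expands such a mask:

```shell
# Expand a testpmd-style hex core bitmask into core IDs.
# 0x3 = 0b11, so bits 0 and 1 are set.
mask=0x3
for i in $(seq 0 7); do
  if (( (mask >> i) & 1 )); then
    echo "core $i"
  fi
done
# prints: core 0
#         core 1
```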
Note: to run this test program, the virtual machine should ideally provide two NICs for DPDK.
```
Launching app
x86_64-native-linuxapp-gcc/app/testpmd -c 3 -n 4 -- -i
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 3 lcore(s)
EAL: unsupported IOMMU type!
EAL: VFIO support could not be initialized
EAL: Setting up memory...
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f76db600000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f76db200000 (size = 0x200000)
EAL: Ask a virtual area of 0x7c00000 bytes
EAL: Virtual area found at 0x7f76d3400000 (size = 0x7c00000)
EAL: Requesting 64 pages of size 2MB from socket 0
EAL: TSC frequency is ~2294255 KHz
EAL: Master core 0 is ready (tid=dc81e840)
PMD: ENICPMD trace: rte_enic_pmd_init
EAL: Core 1 is ready (tid=d2bfe700)
EAL: PCI device 0000:02:01.0 on NUMA socket -1
EAL:   probe driver: 8086:100f rte_em_pmd
EAL:   0000:02:01.0 not managed by UIO driver, skipping
EAL: PCI device 0000:02:05.0 on NUMA socket -1
EAL:   probe driver: 8086:100f rte_em_pmd
EAL:   PCI memory mapped at 0x7f76db800000
EAL:   PCI memory mapped at 0x7f76db820000
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x100f
EAL: PCI device 0000:02:06.0 on NUMA socket -1
EAL:   probe driver: 8086:100f rte_em_pmd
EAL:   PCI memory mapped at 0x7f76db830000
EAL:   PCI memory mapped at 0x7f76db850000
PMD: eth_em_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x100f
Interactive-mode selected
Configuring Port 0 (socket 0)
PMD: eth_em_tx_queue_setup(): sw_ring=0x7f76d3eef880 hw_ring=0x7f76db648600 dma_addr=0x648600
PMD: eth_em_rx_queue_setup(): sw_ring=0x7f76d3eef380 hw_ring=0x7f76db658600 dma_addr=0x658600
PMD: eth_em_start(): <<
Port 0: 00:0C:29:50:BD:E2
Configuring Port 1 (socket 0)
PMD: eth_em_tx_queue_setup(): sw_ring=0x7f76d3eed180 hw_ring=0x7f76db668600 dma_addr=0x668600
PMD: eth_em_rx_queue_setup(): sw_ring=0x7f76d3eecc80 hw_ring=0x7f76db678600 dma_addr=0x678600
PMD: eth_em_start(): <<
Port 1: 00:0C:29:50:BD:EC
Checking link statuses...
Port 0 Link Up - speed 1000 Mbps - full-duplex
Port 1 Link Up - speed 1000 Mbps - full-duplex
Done
testpmd>
```
Enter start to begin packet forwarding:
```
testpmd> start
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  RX queues=1 - RX desc=128 - RX free threshold=32
  RX threshold registers: pthresh=8 hthresh=8 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=32 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0
```
Enter stop to halt packet forwarding; the statistics are then displayed:
```
testpmd> stop
Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0 ----------------------
  RX-packets: 5832826        RX-dropped: 0             RX-total: 5832826
  TX-packets: 5832800        TX-dropped: 0             TX-total: 5832800
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1 ----------------------
  RX-packets: 5832822        RX-dropped: 0             RX-total: 5832822
  TX-packets: 5832800        TX-dropped: 0             TX-total: 5832800
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports +++++++++++++++
  RX-packets: 11665648       RX-dropped: 0             RX-total: 11665648
  TX-packets: 11665600       TX-dropped: 0             TX-total: 11665600
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.
testpmd>
```
3.3. Configuration by hand
1) Build DPDK
Enter the dpdk root directory and build by running:
```
make install T=x86_64-native-linuxapp-gcc
```
2) Configure hugepages (non-NUMA):
```
echo 128 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
mkdir /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
```
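A note of ours, not part of the original walkthrough: both the page reservation and the mount above are lost on reboot. If persistence is wanted, the usual approach is setting `vm.nr_hugepages=128` via sysctl plus an /etc/fstab entry of roughly this shape:

```
nodev /mnt/huge hugetlbfs defaults 0 0
```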
Check the hugepage status with:
```
cat /proc/meminfo | grep Huge
```
3) Install the igb_uio driver:
```
modprobe uio
insmod x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
```
4) Bind the NICs
First, check the current NIC status:
```
./tools/dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver
============================================
<none>

Network devices using kernel driver
===================================
0000:02:01.0 '82545EM Gigabit Ethernet Controller (Copper)' if=eth0 drv=e1000 unused=igb_uio *Active*

Other network devices
=====================
0000:02:06.0 '82545EM Gigabit Ethernet Controller (Copper)' unused=e1000,igb_uio
0000:02:07.0 '82545EM Gigabit Ethernet Controller (Copper)' unused=e1000,igb_uio
```
Bind them:
```
./tools/dpdk_nic_bind.py -b igb_uio 0000:02:05.0
./tools/dpdk_nic_bind.py -b igb_uio 0000:02:06.0
```
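With more NICs to bind, the commands can be generated from a loop. This sketch of ours only echoes the commands instead of executing them (drop the echo to actually bind); the PCI addresses are the ones from this walkthrough:

```shell
# Print a bind command per device; remove 'echo' to run them for real.
for dev in 0000:02:05.0 0000:02:06.0; do
  echo ./tools/dpdk_nic_bind.py -b igb_uio "$dev"
done
```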
5) Run the testpmd test application:
```
./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 2 -- -i
```
6) Building and running the other sample programs
<dpdk>/examples contains many sample programs, which are not built when DPDK itself is compiled. Taking helloworld as an example, first set the environment variables:
```
export RTE_SDK=<dpdk root directory>
export RTE_TARGET=x86_64-native-linuxapp-gcc
```
Then enter /examples/helloworld and run make; on success a build directory is generated, containing the compiled helloworld program.
4. Errors
EAL: Error reading from file descriptor
This bug has since been fixed by the DPDK developers; the patch is:
```
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index d1ca26e..c46a00f 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -505,14 +505,11 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 		}
 	/* fall back to INTX */
 	case RTE_INTR_MODE_LEGACY:
-		if (pci_intx_mask_supported(dev)) {
-			dev_dbg(&dev->dev, "using INTX");
-			udev->info.irq_flags = IRQF_SHARED;
-			udev->info.irq = dev->irq;
-			udev->mode = RTE_INTR_MODE_LEGACY;
-			break;
-		}
-		dev_notice(&dev->dev, "PCI INTX mask not supported\n");
+		dev_dbg(&dev->dev, "using INTX");
+		udev->info.irq_flags = IRQF_SHARED;
+		udev->info.irq = dev->irq;
+		udev->mode = RTE_INTR_MODE_LEGACY;
+		break;
 	/* fall back to no IRQ */
 	case RTE_INTR_MODE_NONE:
 		udev->mode = RTE_INTR_MODE_NONE;
```
Cause: Creation of mbuf pool for socket 0 failed
Almost certainly too few hugepages were reserved; increase the hugepage count.