TUN/TAP virtual network devices give user-space programs the ability to send and receive network packets. A device can act either as a point-to-point device (TUN) or as an Ethernet device (TAP). TUN/TAP is not unique to Linux; other UNIX systems support it as well, with only minor differences between them.
How it works
The principle behind TUN/TAP virtual network devices is fairly simple: the Linux kernel adds a TUN/TAP driver together with an associated character device, /dev/net/tun. The character device tun serves as the interface through which user space and kernel space exchange data. When the kernel sends a packet to the virtual network device, the packet is held in a per-device queue until a user-space program reads from an open descriptor on the tun character device, at which point it is copied into a user-space buffer. The effect is that the packet has been delivered straight to user space. Sending a packet with the write system call works on the same principle, in reverse.
Note that a single read system call transfers exactly one packet to user space, and if the user-space buffer is too small, the packet is truncated and the remainder is lost for good; write behaves likewise, sending one packet per call. When writing such programs, use a sufficiently large buffer and call the read/write system calls directly, rather than C's buffered stdio functions.
In computer networking, TUN and TAP are virtual network devices in the operating-system kernel. Unlike ordinary devices backed by a hardware network card, these virtual devices are implemented entirely in software, yet they offer software running on the operating system exactly the same functionality as hardware network devices. The operating system sends data through a TUN/TAP device to the user-space program bound to that device; conversely, the user-space program can send data through the TUN/TAP device just as it would operate a hardware network device. In the latter case, the TUN/TAP device delivers (or "injects") packets into the operating system's network stack, simulating reception of data from outside.
A server with the TUN/TAP module available can provide VPN proxy functionality.
Design principles of the virtual NIC TUN/TAP driver:
macvlan assigns multiple MAC addresses to a single physical NIC, making it possible to configure several Ethernet interfaces in software; it is a physical-layer feature.
macvtap replaces the TUN/TAP and bridge kernel modules. Built on top of macvlan, it provides the same interface as the tap device in TUN/TAP, so a virtual machine attached to a macvtap interface can pass data through the tap device interface directly into the corresponding macvtap interface in the kernel.
vhost-net is an optimization of virtio. virtio itself was designed for communication between the guest front end and the VMM back end, reducing the switches between root and non-root mode that hardware virtualization incurs. With vhost-net, once the CPU has entered root mode the data no longer has to be handed up to user space and then back into the kernel to reach the tap device; everything stays in kernel mode, eliminating those extra privilege-level transitions. It is not quite accurate to place vhost-net at a particular layer; it is better described as an optimization of layer-2 data transfer.
Once macvtap is in use, the tap device is no longer needed at all: packets arriving from vhost_net call into the macvtap interface and are passed on to the virtual network device created by macvlan. The tap device interface disappears.
MacVTap
Purpose
Macvtap is a new device driver meant to simplify virtualized bridged networking. It replaces the combination of the tun/tap and bridge drivers with a single module based on the macvlan device driver. A macvtap endpoint is a character device that largely follows the tun/tap ioctl interface and can be used directly by kvm/qemu and other hypervisors that support the tun/tap interface. The endpoint extends an existing network interface, the lower device, and has its own mac address on the same ethernet segment. Typically, this is used to make both the guest and the host show up directly on the switch that the host is connected to.
VEPA, Bridge and private mode
Like macvlan, any macvtap device can be in one of three modes, defining the communication between macvtap endpoints on a single lower device:
- Virtual Ethernet Port Aggregator (VEPA), the default mode: data from one endpoint to another endpoint on the same lower device gets sent down the lower device to the external switch. If that switch supports hairpin mode, the frames get sent back to the lower device and from there to the destination endpoint.
  Most switches today do not support hairpin mode, so the two endpoints are not able to exchange ethernet frames, although they might still be able to communicate using a tcp/ip router. A linux host used as the adjacent bridge can be put into hairpin mode by writing to /sys/class/net/<dev>/brif/<port>/hairpin_mode. This mode is particularly interesting if you want to manage the virtual machine networking at the switch level. A switch that is aware of the VEPA guests can enforce filtering and bandwidth limits per MAC address without the Linux host knowing about it.
- Bridge, connecting all endpoints directly to each other. Two endpoints that are both in bridge mode can exchange frames directly, without the round trip through the external bridge. This is the most useful mode for setups with classic switches, and when inter-guest communication is performance critical.
- For completeness, a private mode exists that behaves like a VEPA mode endpoint in the absence of a hairpin aware switch. Even when the switch is in hairpin mode, a private endpoint can never communicate to any other endpoint on the same lower device.
Setting up macvtap
A macvtap interface is created and configured using the ip link command from iproute2, in the same way as we configure macvlan or veth interfaces.
Example:
$ ip link add link eth1 name macvtap0 type macvtap
$ ip link set macvtap0 address 1a:46:0b:ca:bc:7b up
$ ip link show macvtap0
12: macvtap0@eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 1a:46:0b:ca:bc:7b brd ff:ff:ff:ff:ff:ff
At the same time a character device gets created by udev. Unless configured otherwise, udev names this device /dev/tapn, with n corresponding to the interface index of the new macvtap endpoint, '12' in the above example. Unlike tun/tap, the character device represents only a single network interface, so we can give its ownership to the user or group that should be able to use the new tap. Configuring the mac address of the endpoint is important: this address is used on the external network, and the guest is not able to spoof or change it, so the guest has to be configured with the same address.
Qemu on macvtap
Qemu as of 0.12 does not have direct support for macvtap, so we have to (ab)use the tun/tap configuration interface. To start a guest on the interface from the above example, we need to pass the device node as an open file descriptor to qemu and tell it about the mac address. The scripts normally used for bridge configuration must be disabled. A bash redirect can be used to open the character device in read/write mode and pass it as file descriptor 3.
qemu -net nic,model=virtio,addr=1a:46:0b:ca:bc:7b -net tap,fd=3 3<>/dev/tap11
TUN devices
A TUN device is a virtual network device through which a program can conveniently simulate network behaviour. First, how a physical device works:
All packets received by physical NICs are handed to the kernel's network stack and then delivered to user programs through the socket API. Now, how TUN works:
An ordinary NIC sends and receives packets over a cable; a TUN device sends and receives them through a file. Every write to this file is turned by the TUN device into a packet delivered to the kernel; when the kernel sends a packet to the TUN device, reading the file yields the packet's contents.
If we use a TUN device to build a UDP-based VPN, the whole flow looks like this:
Each packet passes through the kernel network stack twice. After the App has processed it, however, the packet may be encrypted and its original IP header encapsulated inside UDP, so on the second pass the kernel sees a completely different packet.
TAP devices
A TAP device works exactly like a TUN device; the difference is that a TUN device exchanges layer-3 IP packets, while a TAP device exchanges layer-2 Ethernet frames.
MACVLAN
Sometimes we need a single physical NIC bound to multiple IP addresses and multiple MAC addresses. Binding multiple IPs is easy, but they would all share the physical NIC's MAC address, which may not satisfy our design; hence the MACVLAN device, which works as follows:
MACVLAN uses the destination MAC address of an incoming packet to decide which virtual NIC the packet should be handed to. MACVLAN on its own may look pointless, but combined with the network namespaces introduced earlier, we can build a network like this:
Because the macvlan interface and eth0 live in different namespaces and have different network stacks, the virtual namespace can use the network this way without setting up a bridge.
MACVTAP
MACVTAP is an improvement on MACVLAN that combines MACVLAN with the characteristics of a TAP device: it sends and receives packets the MACVLAN way, but instead of handing received packets to the network stack, it creates a /dev/tapX file and delivers them to that file:
Since MACVLAN operates at the MAC layer, MACVTAP can only operate at the MAC layer as well; there will never be a MACVTUN device.
Source: https://blog.kghost.info/2013/03/27/linux-network-tun/
Enabling host-guest networking with KVM, Macvlan and Macvtap
The perfect setup, nearly
You installed your Linux server and naturally selected KVM (Kernel Virtual Machine) as hypervisor. Using virt-manager, you also created one or more guest VMs (Virtual Machines).
You want fast networking. So you use the paravirtualized virtio drivers for the guests.
You also want no difference between virtual and non-virtual machines. All should be able to talk over the same LAN, use the same subnet, contact the same DHCP server and talk with each other. So you use the Macvtap driver. Macvtap makes use of Macvlan, also written as MAC VLAN. MAC VLAN allows you to have multiple Ethernet MAC (Media Access Control) addresses on one NIC (Network Interface Card). Network traffic will go directly to and from the physical line to the guest VM. If you enable bridge mode, then all kind-of-virtual NICs attached to the same host (or physical NIC, I’m not sure) can see each other.
It’s just so much easier than having to create and manage traditional brctl bridges. And probably it performs better, too.
The problem: the host cannot talk with the guests
The guests can talk to each other. But the host is excluded from the social event. Look at the picture below. Guest 1 and guest 2 are connected using a red line; they are also connected with the eth0 physical NIC of the host. Packets delivered to eth0 will be sent to the network immediately. The hypervisor cannot intercept them.
Solution: create a macvlan interface on the host
If you create a macvlan interface on the host, and use that one instead of eth0, then the host can communicate with the guests. Some people don’t like this solution because of bad integration with NetworkManager, but I like it because I don’t have to modify the guests. And I’m using only one host machine, so I can handle that with ease.
I have tested this solution myself on two different computers, both running Scientific Linux 6.4 (a RHEL derivative). So beware, YMMV.
What I did: I wrote a simple shell script that takes care of the creation of and routing to a macvlan interface on the host. So on the host, you have to run this script on startup, e.g. by adding the full path to the script in /etc/rc.local. Here is the script:
#!/bin/bash
# let host and guests talk to each other over macvlan
# configures a macvlan interface on the hypervisor
# run this on the hypervisor (e.g. in /etc/rc.local)
# made for IPv4; need modification for IPv6
# meant for a simple network setup with only eth0,
# and a static (manual) ip config
# Evert Mouw, 2013
HWLINK=eth0
MACVLN=macvlan0
TESTHOST=www.google.com
# ------------
# wait for network availability
# ------------
while ! ping -q -c 1 $TESTHOST > /dev/null
do
echo "$0: Cannot ping $TESTHOST, waiting another 5 secs..."
sleep 5
done
# ------------
# get network config
# ------------
IP=$(ip address show dev $HWLINK | grep "inet " | awk '{print $2}')
NETWORK=$(ip -o route | grep $HWLINK | grep -v default | awk '{print $1}')
GATEWAY=$(ip -o route | grep default | awk '{print $3}')
# ------------
# setting up $MACVLN interface
# ------------
ip link add link $HWLINK $MACVLN type macvlan mode bridge
ip address add $IP dev $MACVLN
ip link set dev $MACVLN up
# ------------
# routing table
# ------------
# empty routes
ip route flush dev $HWLINK
ip route flush dev $MACVLN
# add routes
ip route add $NETWORK dev $MACVLN metric 0
# add default gateway
ip route add default via $GATEWAY
Beware: If the underlying eth{n} link is down, then also the macvlan will go to the “down” state. That means that the hardware ethernet link must be up, otherwise macvlan/macvtap based VMs will not be able to communicate with each other, or with the host. Also, NetworkManager can play nasty on your customized routing table when the link comes up again.
The resulting routing table will look like this:
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 macvlan0
0.0.0.0 10.0.0.2 0.0.0.0 UG 0 0 0 macvlan0
Guest configuration
The guest must be configured to use macvtap in bridge mode. Typically, in the configuration XML (/etc/libvirt/qemu) you will find:
<interface type='direct'>
<source dev='eth0' mode='bridge'/>
Remember that the guest will then use the DHCP server of the physical LAN. No need any more for the dnsmasq part on the hypervisor. If all your guests use this trick, then you can do:
rm /etc/libvirt/qemu/networks/autostart/*
That removes the bridge interfaces you see when you run ifconfig. If you cannot wait until the next reboot, also do for each network:
virsh net-destroy _network-name_