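Ethernet Bridge Netfilter Framework and Data Path Analysis
1. Introduction to Netfilter
netfilter is the firewall framework introduced by Rusty Russell for the Linux 2.4 kernel. The framework is both simple and flexible, and supports many features needed by security policies, such as packet filtering, packet mangling, address masquerading, transparent proxying, dynamic Network Address Translation (NAT), filtering based on user and Media Access Control (MAC) addresses, stateful filtering, and packet rate limiting.
The netfilter architecture places a number of checkpoints (hooks) at specific positions along a packet's path through the network protocol stack. Firewall submodules register callback functions for these hooks through the API provided by netfilter, and the callbacks implement operations such as filtering packets or unpacking, modifying, and repacking them. When a packet reaches a hook point, the kernel automatically invokes the callbacks registered there.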
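The hook mechanism described above can be illustrated with a small user-space model. This is only a sketch: every `toy_*` name is hypothetical and none of it is kernel API. It assumes that callbacks are kept sorted by priority (lower value runs first) and that traversal stops at the first callback that drops the packet.

```c
#include <stddef.h>

/* Verdicts of this toy model (illustrative names, not kernel constants). */
enum toy_verdict { TOY_DROP = 0, TOY_ACCEPT = 1 };

typedef enum toy_verdict (*toy_hook_fn)(int *pkt);

struct toy_hook {
    toy_hook_fn fn;
    int priority;               /* lower value runs earlier */
};

#define TOY_MAX_HOOKS 8

struct toy_hook_point {
    struct toy_hook hooks[TOY_MAX_HOOKS];
    size_t count;
};

/* Insert a callback, keeping the array sorted by ascending priority. */
static int toy_register_hook(struct toy_hook_point *hp, toy_hook_fn fn,
                             int priority)
{
    size_t i;

    if (hp->count == TOY_MAX_HOOKS)
        return -1;
    i = hp->count;
    while (i > 0 && hp->hooks[i - 1].priority > priority) {
        hp->hooks[i] = hp->hooks[i - 1];
        i--;
    }
    hp->hooks[i].fn = fn;
    hp->hooks[i].priority = priority;
    hp->count++;
    return 0;
}

/* Run every callback in order; stop at the first one that drops the packet. */
static enum toy_verdict toy_run_hooks(const struct toy_hook_point *hp, int *pkt)
{
    size_t i;

    for (i = 0; i < hp->count; i++)
        if (hp->hooks[i].fn(pkt) == TOY_DROP)
            return TOY_DROP;
    return TOY_ACCEPT;
}

/* Two sample callbacks: one rewrites the "packet", one filters on its value. */
static enum toy_verdict toy_double(int *pkt)
{
    *pkt *= 2;
    return TOY_ACCEPT;
}

static enum toy_verdict toy_drop_large(int *pkt)
{
    return *pkt > 10 ? TOY_DROP : TOY_ACCEPT;
}
```

Registering `toy_drop_large` at priority 100 and `toy_double` at priority -100 makes the rewrite run before the filter, regardless of registration order.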
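2. The Netfilter Framework of the Ethernet Bridge
In the five-layer network model, the firewall functions of both the network layer (IP layer) and the data link layer are built on the netfilter framework. They are configured from user space with iptables and ebtables respectively. This chapter introduces the Ethernet Bridge netfilter framework (EB-NF for short) and how it is implemented.
2.1 EB-NF hook points
In the Linux bridge code we can see that the Ethernet Bridge netfilter framework has six hook points: BROUTING, PREROUTING, INPUT, FORWARD, OUTPUT, and POSTROUTING. BROUTING is specific to the bridge, and its callback is not registered through netfilter's hook registration functions.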
/* Bridge Hooks */
/* After promisc drops, checksum checks. */
#define NF_BR_PRE_ROUTING 0
/* If the packet is destined for this box. */
#define NF_BR_LOCAL_IN 1
/* If the packet is destined for another interface. */
#define NF_BR_FORWARD 2
/* Packets coming from a local process. */
#define NF_BR_LOCAL_OUT 3
/* Packets about to hit the wire. */
#define NF_BR_POST_ROUTING 4
/* Not really a hook, but used for the ebtables broute table */
#define NF_BR_BROUTING 5
#define NF_BR_NUMHOOKS 6
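Based on these six hook points, the overall EB-NF framework can be drawn roughly as follows:
[Figure: overview of the EB-NF framework and its six hook points]
2.2 Common terms in the EB-NF framework and the ebtables command
2.2.1 Chains
The ebtables module in the Linux kernel contains three tables, and each table contains several chains by default. Tables divide the firewall rules into groups by function, and each group of rules is called a chain. A chain is an ordered list of firewall rules that process Ethernet frames. Each rule specifies what to do with frames that match its conditions; that action is called the "target". When a frame does not match the current rule, the next rule in the chain is tried, and so on. ebtables has six built-in chains, corresponding to the hooks above. Users can also define new chains: a rule in a built-in chain can jump to a user-defined chain when it matches, and matching then continues in that chain.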
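2.2.2 Targets
Every firewall rule specifies a target. The target can be one of the following:
(1) ACCEPT (let the frame pass this chain)
(2) DROP (discard the frame)
(3) CONTINUE (check the next rule in the same chain; typically used to count the traffic passing through the chain)
(4) RETURN (stop checking the remaining rules of this chain and resume at the next rule of the calling chain)
(5) A target extension (for example mark with --mark-set)
(6) A jump to a user-defined chain, where rule matching continues.
Note that ACCEPT and DROP have a different meaning in the BROUTING chain: ACCEPT means the frame enters the bridge forwarding path, and DROP means the frame enters the local routing path.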
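The way a chain is traversed can be sketched in a few lines of user-space C. This is a simplified model, not the kernel's ebt_do_table(): the `toy_*` names are hypothetical, a rule matches only on the EtherType, and jumps to user-defined chains are left out. RETURN is simply handed back to the caller, which would resume its own chain.

```c
#include <stddef.h>

enum toy_target { TOY_ACCEPT, TOY_DROP, TOY_CONTINUE, TOY_RETURN };

struct toy_rule {
    unsigned short ethproto;    /* 0 matches any protocol */
    enum toy_target target;
    unsigned long hits;         /* per-rule counter */
};

struct toy_chain {
    struct toy_rule *rules;
    size_t nrules;
    enum toy_target policy;     /* used when no rule gives a final verdict */
};

/* Walk the rules in order and return the verdict for one frame. */
static enum toy_target toy_eval_chain(struct toy_chain *c,
                                      unsigned short ethproto)
{
    size_t i;

    for (i = 0; i < c->nrules; i++) {
        struct toy_rule *r = &c->rules[i];

        if (r->ethproto != 0 && r->ethproto != ethproto)
            continue;           /* no match: try the next rule */
        r->hits++;
        if (r->target == TOY_CONTINUE)
            continue;           /* counted, keep checking */
        return r->target;       /* ACCEPT, DROP, or RETURN to the caller */
    }
    return c->policy;
}
```

A CONTINUE rule therefore acts as a pure counter: it increments `hits` and never decides the fate of the frame.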
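2.2.3 Tables
As mentioned above, the ebtables module is divided by function into three tables:
(1) The filter table
The filter table is the default table operated on by the user-space ebtables program. It contains three chains: INPUT (for frames destined for the bridge itself), OUTPUT (for locally generated or (b)routed frames), and FORWARD (for frames being forwarded by the bridge).
(2) The nat table
The nat table is mostly used to change the MAC addresses of Ethernet frames. It also contains three chains: PREROUTING (for altering frames as soon as they come in), OUTPUT (for altering locally generated or (b)routed frames before they are bridged), and POSTROUTING (for altering frames as they are about to go out).
Note: the names PREROUTING and POSTROUTING in ebtables are borrowed from the IP-layer tool iptables. At the data link layer, PREFORWARDING and POSTFORWARDING would be more accurate names.
(3) The broute table
The broute table is used to build a brouter, a device that bridges some frames and routes others. It has a single chain, BROUTING. Frames received on a bridge port in the forwarding state first traverse the BROUTING chain, which decides whether a frame enters the bridge forwarding path or the local routing path. Most of the time frames are bridged; rules added to the BROUTING chain can make selected frames be routed instead. In addition, when the bridge device is in promiscuous mode, a frame is not only bridged but also cloned (skb clone) and passed up for local processing.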
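2.3 The table-chain-rule relationship in EB-NF
As described above, each table represents a different firewall function, a table can have several chains, a chain is an ordered list of firewall rules, and a rule consists of match conditions and a target. The relationship is shown below:
[Figure: relationship between tables, chains, and rules]
2.4 Kernel source implementation of tables, chains, and rules in EB-NF
The EB-NF source code lives in "$(kernel_sourcetree_top)/net/bridge/netfilter/". The three ebtables tables mentioned earlier correspond to ebtable_broute.c, ebtable_filter.c, and ebtable_nat.c.
The analysis below uses ebtable_filter.c as the example.
2.4.1 How the ebtables module creates its tables, chains, and rules at initialization
A table in the ebtables module is described by struct ebt_table: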
struct ebt_table {
struct list_head list;
char name[EBT_TABLE_MAXNAMELEN];
struct ebt_replace_kernel *table;
unsigned int valid_hooks;
rwlock_t lock;
/* e.g. could be the table explicitly only allows certain
* matches, targets, ... 0 == let it in */
int (*check)(const struct ebt_table_info *info,
unsigned int valid_hooks);
/* the data used by the kernel */
struct ebt_table_info *private;
struct module *me;
};
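At kernel startup (or when the module is loaded), the ebtables filter table is registered with the netfilter subsystem through ebt_register_table(), once per network namespace.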
static int __net_init frame_filter_net_init(struct net *net)
{
net->xt.frame_filter = ebt_register_table(net, &frame_filter);
return PTR_ERR_OR_ZERO(net->xt.frame_filter);
}
The table is named "filter". By default it has three chains named "INPUT", "FORWARD", and "OUTPUT", which correspond to NF_BR_LOCAL_IN, NF_BR_FORWARD, and NF_BR_LOCAL_OUT among the six EB-NF hooks.
#define FILTER_VALID_HOOKS ((1 << NF_BR_LOCAL_IN) | (1 << NF_BR_FORWARD) | \
(1 << NF_BR_LOCAL_OUT))
static struct ebt_entries initial_chains[] = { // the three initial chains of the filter table
{
.name = "INPUT",
.policy = EBT_ACCEPT,
},
{
.name = "FORWARD",
.policy = EBT_ACCEPT,
},
{
.name = "OUTPUT",
.policy = EBT_ACCEPT,
},
};
static struct ebt_replace_kernel initial_table = {
.name = "filter",
.valid_hooks = FILTER_VALID_HOOKS,
.entries_size = 3 * sizeof(struct ebt_entries),
.hook_entry = {
[NF_BR_LOCAL_IN] = &initial_chains[0],
[NF_BR_FORWARD] = &initial_chains[1],
[NF_BR_LOCAL_OUT] = &initial_chains[2],
},
.entries = (char *)initial_chains,
};
static const struct ebt_table frame_filter = {
.name = "filter", // the table is named "filter"
.table = &initial_table,
.valid_hooks = FILTER_VALID_HOOKS,
.check = check,
.me = THIS_MODULE,
};
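The valid_hooks field is a bitmask with one bit per hook number. A minimal user-space sketch of how such a mask can be tested follows. The hook numbers and FILTER_VALID_HOOKS are copied from the excerpts above; `TOY_NAT_VALID_HOOKS` is an assumption derived from the three nat chains listed in section 2.2.3, and `toy_hook_is_valid()` is a hypothetical helper, not a kernel function.

```c
/* Hook numbers as listed in section 2.1. */
#define NF_BR_PRE_ROUTING   0
#define NF_BR_LOCAL_IN      1
#define NF_BR_FORWARD       2
#define NF_BR_LOCAL_OUT     3
#define NF_BR_POST_ROUTING  4
#define NF_BR_BROUTING      5

/* Same definition as in the filter table excerpt. */
#define FILTER_VALID_HOOKS ((1 << NF_BR_LOCAL_IN) | (1 << NF_BR_FORWARD) | \
                            (1 << NF_BR_LOCAL_OUT))

/* Assumed mask for the nat table: PREROUTING, OUTPUT, POSTROUTING. */
#define TOY_NAT_VALID_HOOKS ((1 << NF_BR_PRE_ROUTING) | (1 << NF_BR_LOCAL_OUT) | \
                             (1 << NF_BR_POST_ROUTING))

/* Returns 1 if the table described by valid_hooks has a chain on this hook. */
static int toy_hook_is_valid(unsigned int valid_hooks, unsigned int hook)
{
    return (valid_hooks >> hook) & 1u;
}
```

With these definitions the filter table answers 1 for NF_BR_FORWARD and 0 for NF_BR_PRE_ROUTING, which mirrors the chain list of each table.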
A chain is represented by struct ebt_entries, and a single firewall rule by struct ebt_entry.
struct ebt_entries {
/* this field is always set to zero
* See EBT_ENTRY_OR_ENTRIES.
* Must be same size as ebt_entry.bitmask */
unsigned int distinguisher;
/* the chain name */
char name[EBT_CHAIN_MAXNAMELEN];
/* counter offset for this chain */
unsigned int counter_offset;
/* one standard (accept, drop, return) per hook */
int policy;
/* nr. of entries */
unsigned int nentries;
/* entry list */
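char data[0] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); // the chain's rules follow here; the field is only aligned like struct ebt_replace, the table-level descriptor that holds the entry count and hook entry points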
};
/* one entry */
struct ebt_entry {
/* this needs to be the first field */
unsigned int bitmask;
unsigned int invflags;
__be16 ethproto;
/* the physical in-dev */
char in[IFNAMSIZ];
/* the logical in-dev */
char logical_in[IFNAMSIZ];
/* the physical out-dev */
char out[IFNAMSIZ];
/* the logical out-dev */
char logical_out[IFNAMSIZ];
unsigned char sourcemac[ETH_ALEN];
unsigned char sourcemsk[ETH_ALEN];
unsigned char destmac[ETH_ALEN];
unsigned char destmsk[ETH_ALEN];
/* sizeof ebt_entry + matches */
unsigned int watchers_offset; // the extension matches (struct ebt_entry_match) end here and the adjacent watchers begin
/* sizeof ebt_entry + matches + watchers */
unsigned int target_offset; // where the target (struct ebt_entry_target) begins
/* sizeof ebt_entry + matches + watchers + target */
unsigned int next_offset;
unsigned char elems[0] __attribute__ ((aligned (__alignof__(struct ebt_replace))));
};
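The three offsets in struct ebt_entry partition a variable-length rule into header, matches, watchers, and target, and next_offset tells where the following rule starts. The sketch below models only that layout in user space. `struct toy_entry` and the `toy_*` helpers are hypothetical and far simpler than the kernel structures; the payload sizes are arbitrary byte counts chosen by the caller.

```c
#include <stddef.h>
#include <string.h>

/* Toy rule header: only the offsets, all measured from the rule's start. */
struct toy_entry {
    unsigned int watchers_offset;   /* header + matches */
    unsigned int target_offset;     /* header + matches + watchers */
    unsigned int next_offset;       /* total size of this rule */
};

/* Append one rule; returns the new used length, or 0 if it does not fit. */
static size_t toy_append(unsigned char *buf, size_t cap, size_t used,
                         unsigned int matches, unsigned int watchers,
                         unsigned int target)
{
    struct toy_entry e;

    e.watchers_offset = (unsigned int)sizeof(e) + matches;
    e.target_offset = e.watchers_offset + watchers;
    e.next_offset = e.target_offset + target;
    if (used > cap || e.next_offset > cap - used)
        return 0;
    memset(buf + used, 0, e.next_offset);
    memcpy(buf + used, &e, sizeof(e));
    return used + e.next_offset;
}

/* Count rules by hopping next_offset bytes at a time; -1 if malformed. */
static int toy_count(const unsigned char *buf, size_t used)
{
    size_t off = 0;
    int n = 0;

    while (off < used) {
        struct toy_entry e;

        if (used - off < sizeof(e))
            return -1;
        memcpy(&e, buf + off, sizeof(e));   /* avoids unaligned access */
        if (e.next_offset < sizeof(e) || e.next_offset > used - off)
            return -1;
        off += e.next_offset;
        n++;
    }
    return n;
}
```

The bounds checks in `toy_count()` show why a rule blob supplied from outside has to be validated before it is walked: a bad next_offset would otherwise step past the end of the buffer.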
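2.4.2 How the ebtables module checks table rules
Besides registering the filter table, the ebtables filter module also registers bridge netfilter hook functions, so that ebt_do_table() is called to check the rules of the filter table when a frame passes the corresponding hook points in the protocol stack.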
static int __init ebtable_filter_init(void)
{
int ret;
ret = register_pernet_subsys(&frame_filter_net_ops);
if (ret < 0)
return ret;
ret = nf_register_hooks(ebt_ops_filter, ARRAY_SIZE(ebt_ops_filter));
if (ret < 0)
unregister_pernet_subsys(&frame_filter_net_ops);
return ret;
}
static struct nf_hook_ops ebt_ops_filter[] __read_mostly = {
{
.hook = ebt_in_hook,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_LOCAL_IN,
.priority = NF_BR_PRI_FILTER_BRIDGED,
},
{
.hook = ebt_in_hook,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_FORWARD,
.priority = NF_BR_PRI_FILTER_BRIDGED,
},
{
.hook = ebt_out_hook,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_LOCAL_OUT,
.priority = NF_BR_PRI_FILTER_OTHER,
},
};
static unsigned int
ebt_in_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
{
return ebt_do_table(skb, state, state->net->xt.frame_filter);
}
static unsigned int
ebt_out_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
{
return ebt_do_table(skb, state, state->net->xt.frame_filter);
}
2.4.3 Source implementation of match extensions in the ebtables module
Match extensions are registered in the kernel through xt_register_match(). For example, the arp match is implemented as follows:
static int __init ebt_arp_init(void)
{
return xt_register_match(&ebt_arp_mt_reg);
}
static struct xt_match ebt_arp_mt_reg __read_mostly = {
.name = "arp",
.revision = 0,
.family = NFPROTO_BRIDGE,
.match = ebt_arp_mt,
.checkentry = ebt_arp_mt_check,
.matchsize = sizeof(struct ebt_arp_info),
.me = THIS_MODULE,
};
static bool
ebt_arp_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
const struct ebt_arp_info *info = par->matchinfo;
const struct arphdr *ah;
struct arphdr _arph;
ah = skb_header_pointer(skb, 0, sizeof(_arph), &_arph);
if (ah == NULL)
return false;
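// some checks omitted
if (info->bitmask & EBT_ARP_OPCODE && FWINV(info->opcode !=
ah->ar_op, EBT_ARP_OPCODE))
return false; // the opcode does not match the rule
// remaining checks omitted
return true; // every configured condition matched
}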
2.4.4 Source implementation of target extensions in the ebtables module
Target extensions are registered in the kernel through xt_register_target(). For example, the mark target is implemented as follows:
static int __init ebt_mark_init(void)
{
return xt_register_target(&ebt_mark_tg_reg);
}
static struct xt_target ebt_mark_tg_reg __read_mostly = {
.name = "mark",
.revision = 0,
.family = NFPROTO_BRIDGE,
.target = ebt_mark_tg,
.checkentry = ebt_mark_tg_check,
.targetsize = sizeof(struct ebt_mark_t_info),
#ifdef CONFIG_COMPAT
.compatsize = sizeof(struct compat_ebt_mark_t_info),
.compat_from_user = mark_tg_compat_from_user,
.compat_to_user = mark_tg_compat_to_user,
#endif
.me = THIS_MODULE,
};
static unsigned int
ebt_mark_tg(struct sk_buff *skb, const struct xt_action_param *par)
{
const struct ebt_mark_t_info *info = par->targinfo;
int action = info->target & -16;
if (action == MARK_SET_VALUE)
skb->mark = info->mark;
else if (action == MARK_OR_VALUE)
skb->mark |= info->mark;
else if (action == MARK_AND_VALUE)
skb->mark &= info->mark;
else
skb->mark ^= info->mark;
return info->target | ~EBT_VERDICT_BITS;
}
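The expressions `info->target & -16` and `info->target | ~EBT_VERDICT_BITS` work because the mark target packs two things into one integer: the low 4 bits carry the verdict and the remaining bits carry the mark action. The user-space sketch below reproduces that encoding. The numeric constants are written from memory of the ebtables uapi headers and should be treated as assumptions; the `toy_*` names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, mirroring EBT_VERDICT_BITS and the MARK_*_VALUE macros. */
#define TOY_VERDICT_BITS    0x0000000Fu
#define TOY_MARK_SET_VALUE  0xfffffff0u
#define TOY_MARK_OR_VALUE   0xffffffe0u
#define TOY_MARK_AND_VALUE  0xffffffd0u
#define TOY_MARK_XOR_VALUE  0xffffffc0u

/* Assumed verdict values (small negative numbers). */
#define TOY_EBT_ACCEPT      (-1)
#define TOY_EBT_DROP        (-2)
#define TOY_EBT_CONTINUE    (-3)

/* Pack an action and a verdict into one target word. */
static uint32_t toy_pack_target(uint32_t action, int verdict)
{
    return action | ((uint32_t)verdict & TOY_VERDICT_BITS);
}

/* Apply the action to *mark and return the verdict stored in the low bits. */
static int toy_mark_tg(uint32_t target, uint32_t value, uint32_t *mark)
{
    uint32_t action = target & ~TOY_VERDICT_BITS;   /* same as "& -16" */

    if (action == TOY_MARK_SET_VALUE)
        *mark = value;
    else if (action == TOY_MARK_OR_VALUE)
        *mark |= value;
    else if (action == TOY_MARK_AND_VALUE)
        *mark &= value;
    else
        *mark ^= value;
    /* Sign-extend the 4-bit verdict; equivalent to target | ~VERDICT_BITS. */
    return (int)(target & TOY_VERDICT_BITS) - 16;
}
```

For example, packing the OR action with CONTINUE and applying value 0x4 to a mark of 0x1 leaves the mark at 0x5 and yields the CONTINUE verdict, so rule traversal goes on after the mark is updated.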
3. The Data Path of the Ethernet Bridge
3.1 br-nf code
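Before looking at the whole data path of the Ethernet bridge, we first need to introduce the br-nf code, which is controlled by the CONFIG_BRIDGE_NETFILTER option. Together with the "3 tables, 6 chains" of the ebtables module, it makes up the firewall framework of the data link layer.
Recall the earlier statement that "when a packet reaches a hook point, the kernel automatically invokes the callbacks registered there". In the kernel this is done through the NF_HOOK()/NF_HOOK_THRESH() helpers. Their prototype is shown below:
/*
pf: protocol family
hook: hook number
net: the network namespace the packet belongs to
sk: the transport-layer socket associated with the skb
skb: the skb to process
in: input network device
out: output network device
okfn: called when every hook callback has returned NF_ACCEPT
thresh: priority threshold; callbacks whose priority value is lower than thresh are skipped, only those with priority >= thresh are run
*/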
static inline int
NF_HOOK_THRESH(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
struct sk_buff *skb, struct net_device *in,
struct net_device *out,
int (*okfn)(struct net *, struct sock *, struct sk_buff *),
int thresh)
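The roles of okfn and thresh can be shown with a small user-space model. It is a sketch under stated assumptions: the `toy_*` names are hypothetical, the hook array is already sorted by ascending priority, and a drop is reported as -1 (the kernel frees the skb and returns an error code instead).

```c
#include <stddef.h>

enum { TOY_DROP = 0, TOY_ACCEPT = 1 };

struct toy_pkt {
    int seen;           /* number of callbacks that looked at the packet */
    int delivered;      /* set by the okfn */
};

typedef int (*toy_hook_fn)(struct toy_pkt *pkt);
typedef int (*toy_okfn)(struct toy_pkt *pkt);

struct toy_hook {
    toy_hook_fn fn;
    int priority;
};

/* hooks[] must be sorted by ascending priority. */
static int toy_nf_hook_thresh(const struct toy_hook *hooks, size_t n,
                              struct toy_pkt *pkt, toy_okfn okfn, int thresh)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (hooks[i].priority < thresh)
            continue;               /* below the threshold: skipped */
        if (hooks[i].fn(pkt) != TOY_ACCEPT)
            return -1;              /* dropped: okfn is never called */
    }
    return okfn(pkt);               /* all callbacks accepted the packet */
}

static int toy_count_hook(struct toy_pkt *pkt)
{
    pkt->seen++;
    return TOY_ACCEPT;
}

static int toy_drop_hook(struct toy_pkt *pkt)
{
    pkt->seen++;
    return TOY_DROP;
}

static int toy_deliver(struct toy_pkt *pkt)
{
    pkt->delivered = 1;
    return 0;
}
```

Calling it with a very low thresh behaves like plain NF_HOOK(); raising thresh lets a caller resume traversal after a given priority, which is how a callback can re-inject a packet without running the earlier callbacks again.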
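br-nf is an optional module. It registers callbacks on the EB-NF/ebtables hooks, and inside those callbacks it calls NF_HOOK()/NF_HOOK_THRESH() so that bridged frames also traverse the IP-layer firewall. As a result, rules configured with iptables apply to bridged IP packets too, and IP NAT can be performed on bridged IP packets at the data link layer. (Note: to some extent this breaks the layering principle of TCP/IP.)
3.2 Frame flow at the data link layer, seen through the ebtables "3 tables, 6 chains" and br-nf
The figure below shows the overall data link layer processing of a frame received by a bridge device. The following sections walk through it step by step with the source code.
[Figure: data link layer processing flow for frames received by a bridge device]
3.2.1 Broute stage
The figure below shows the part of the flow that covers the Broute stage:
[Figure: frame processing flow in the Broute stage]
When an Ethernet interface is attached to a bridge, br_handle_frame() is registered as its rx_handler.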
int br_add_if(struct net_bridge *br, struct net_device *dev)
{
// preceding code omitted
err = netdev_rx_handler_register(dev, br_handle_frame, p);
// remaining code omitted
}
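When the network device receives a packet, br_handle_frame() is executed to process the frame.
If the destination address of the frame is a link-local address (01:80:c2:00:00:0X, the unlikely case), the kernel decides from the specific address whether to forward the frame or deliver it locally.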
rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
{
// preceding code omitted
if (unlikely(is_link_local_ether_addr(dest))) {
u16 fwd_mask = p->br->group_fwd_mask_required;
/*
* See IEEE 802.1D Table 7-10 Reserved addresses
*
* Assignment Value
* Bridge Group Address 01-80-C2-00-00-00
* (MAC Control) 802.3 01-80-C2-00-00-01
* (Link Aggregation) 802.3 01-80-C2-00-00-02
* 802.1X PAE address 01-80-C2-00-00-03
*
* 802.1AB LLDP 01-80-C2-00-00-0E
*
* Others reserved for future standardization
*/
switch (dest[5]) {
// Bridge Group Address: if STP is enabled, deliver locally through the bridge INPUT chain; otherwise forward the frame.
case 0x00: /* Bridge Group Address */
/* If STP is turned off,
then must forward to keep loop detection */
if (p->br->stp_enabled == BR_NO_STP ||
fwd_mask & (1u << dest[5]))
goto forward;
break;
// IEEE MAC Control address: dropped directly.
case 0x01: /* IEEE MAC (Pause) */
goto drop;
// Addresses of other protocols: forwarded only if the matching bit is set in the forward mask.
default:
/* Allow selective forwarding for most other protocols */
fwd_mask |= p->br->group_fwd_mask;
if (fwd_mask & (1u << dest[5]))
goto forward;
}
/* Deliver packet to local host only */
if (NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,
dev_net(skb->dev), NULL, skb, skb->dev, NULL,
br_handle_local_finish)) {
return RX_HANDLER_CONSUMED; /* consumed by filter */
} else {
*pskb = skb;
return RX_HANDLER_PASS; /* continue processing */
}
}
// remaining code omitted
}
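The switch statement above boils down to a small decision function. The following user-space sketch restates it under simplifying assumptions: STP state is reduced to a boolean, the two masks are passed in directly, and the `toy_*` names and the three-way result enum are hypothetical.

```c
#include <stdint.h>

enum toy_action { TOY_FORWARD, TOY_DROP, TOY_LOCAL_IN };

/* True for the reserved range 01:80:c2:00:00:00 to 01:80:c2:00:00:0f. */
static int toy_is_link_local(const uint8_t dest[6])
{
    return dest[0] == 0x01 && dest[1] == 0x80 && dest[2] == 0xc2 &&
           dest[3] == 0x00 && dest[4] == 0x00 && (dest[5] & 0xf0) == 0x00;
}

/* Decide what happens to a frame based on its destination address. */
static enum toy_action toy_link_local_action(const uint8_t dest[6],
                                             int stp_enabled,
                                             uint16_t fwd_mask_required,
                                             uint16_t group_fwd_mask)
{
    uint16_t fwd_mask = fwd_mask_required;

    if (!toy_is_link_local(dest))
        return TOY_FORWARD;     /* ordinary frames take the normal path */

    switch (dest[5]) {
    case 0x00:                  /* Bridge Group Address (STP BPDUs) */
        if (!stp_enabled || (fwd_mask & (1u << dest[5])))
            return TOY_FORWARD;
        break;
    case 0x01:                  /* IEEE MAC Control (pause frames) */
        return TOY_DROP;
    default:                    /* selectively forwarded protocols */
        fwd_mask |= group_fwd_mask;
        if (fwd_mask & (1u << dest[5]))
            return TOY_FORWARD;
    }
    return TOY_LOCAL_IN;        /* delivered to the local host only */
}
```

With a group_fwd_mask of 0x4000 (bit 14), LLDP frames addressed to 01:80:c2:00:00:0e are forwarded; with a mask of 0 they are only delivered locally.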
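For frames received on a port in the forwarding state, br_handle_frame() consults the broute table. Note that, unlike the filter table described earlier, the broute table does not register any bridge netfilter hooks when it is initialized, so its rules are not looked up through NF_HOOK/NF_HOOK_THRESH. A function pointer (br_should_route_hook) is used instead, but in the end ebt_do_table() is still called to look up the broute table. After the lookup there are only two outcomes for a frame: it is either bridged or routed.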
rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
{
// preceding code omitted
forward:
switch (p->state) {
case BR_STATE_FORWARDING:
rhook = rcu_dereference(br_should_route_hook);
if (rhook) {
if ((*rhook)(skb)) { // consult the broute table; non-zero means "route it"
*pskb = skb;
return RX_HANDLER_PASS;
}
dest = eth_hdr(skb)->h_dest;
}
// remaining code omitted
}
static int __init ebtable_broute_init(void)
{
int ret;
ret = register_pernet_subsys(&broute_net_ops);
if (ret < 0)
return ret;
/* see br_input.c */
RCU_INIT_POINTER(br_should_route_hook,
(br_should_route_hook_t *)ebt_broute);
return 0;
}
static int ebt_broute(struct sk_buff *skb)
{
struct nf_hook_state state;
int ret;
nf_hook_state_init(&state, NULL, NF_BR_BROUTING, INT_MIN,
NFPROTO_BRIDGE, skb->dev, NULL, NULL,
dev_net(skb->dev), NULL);
ret = ebt_do_table(skb, &state, state.net->xt.broute_table);
// as mentioned earlier, NF_DROP in the broute table means the frame is to be routed
if (ret == NF_DROP)
return 1; /* route it */
return 0; /* bridge it */
}
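The inverted meaning of the verdict in the broute table is easy to get wrong, so here is a minimal user-space restatement. The `toy_*` names are hypothetical; the verdict values 0 and 1 follow the usual NF_DROP and NF_ACCEPT numbering, which is an assumption here because the excerpts do not show it.

```c
/* Assumed numbering of the two verdicts involved. */
enum { TOY_NF_DROP = 0, TOY_NF_ACCEPT = 1 };

enum toy_path { TOY_BRIDGE, TOY_ROUTE };

/* Mirrors ebt_broute(): DROP from the broute table means "route it". */
static int toy_should_route(int verdict)
{
    return verdict == TOY_NF_DROP;
}

/*
 * Mirrors the BR_STATE_FORWARDING branch of br_handle_frame(): the broute
 * lookup only happens when the port is forwarding and a hook is installed.
 */
static enum toy_path toy_rx_path(int port_forwarding, int broute_hook_set,
                                 int verdict)
{
    if (port_forwarding && broute_hook_set && toy_should_route(verdict))
        return TOY_ROUTE;       /* handed up to the normal IP stack */
    return TOY_BRIDGE;          /* continues to NF_BR_PRE_ROUTING */
}
```

When the broute module is not loaded, no hook is installed and every frame stays on the bridging path.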
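3.2.2 PreRouting stage (before forwarding)
The figure below shows the part of the flow that covers the PreRouting stage:
[Figure: frame processing flow in the PreRouting stage]
After the broute chain, frames that are to be bridged are processed by the callbacks registered on the bridge netfilter hook NF_BR_PRE_ROUTING.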
rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
{
// preceding code omitted
NF_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING,
dev_net(skb->dev), NULL, skb, skb->dev, NULL,
br_handle_frame_finish);
// remaining code omitted
}
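In the kernel 4.4 source, only the ebtables nat table and the br-nf code register callbacks on this hook by default. Following the hook priorities, the frame is processed by the ebtables nat table first. (Note: the ebtables nat table is not enabled by default.)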
enum nf_br_hook_priorities {
NF_BR_PRI_FIRST = INT_MIN,
NF_BR_PRI_NAT_DST_BRIDGED = -300, // priority used by the ebtables nat table
NF_BR_PRI_FILTER_BRIDGED = -200,
NF_BR_PRI_BRNF = 0,
NF_BR_PRI_NAT_DST_OTHER = 100,
NF_BR_PRI_FILTER_OTHER = 200,
NF_BR_PRI_NAT_SRC = 300,
NF_BR_PRI_LAST = INT_MAX,
};
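In the PreRouting stage, nat table processing mainly looks up the rules of the nat table's PREROUTING chain and rewrites the destination MAC address according to the match conditions.
When ebt_do_table() runs, it only consults the chain that belongs to the current hook. In the PreRouting stage, that is the PREROUTING chain only.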
static struct nf_hook_ops ebt_ops_nat[] __read_mostly = {
// preceding entries omitted
{
.hook = ebt_nat_in,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_PRE_ROUTING,
.priority = NF_BR_PRI_NAT_DST_BRIDGED,
},
// remaining entries omitted
};
static unsigned int
ebt_nat_in(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
{
return ebt_do_table(skb, state, state->net->xt.frame_nat);
}
When the kernel is built with CONFIG_BRIDGE_NETFILTER, a frame that has passed the PREROUTING chain of the nat table also enters the PreRouting processing of the br-nf code.
static struct nf_hook_ops br_nf_ops[] __read_mostly = {
{
.hook = br_nf_pre_routing,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_PRE_ROUTING,
.priority = NF_BR_PRI_BRNF,
},
{
.hook = br_nf_post_routing,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_POST_ROUTING,
.priority = NF_BR_PRI_LAST,
},
{
.hook = ip_sabotage_in,
.pf = NFPROTO_IPV4,
.hooknum = NF_INET_PRE_ROUTING,
.priority = NF_IP_PRI_FIRST,
},
{
.hook = ip_sabotage_in,
.pf = NFPROTO_IPV6,
.hooknum = NF_INET_PRE_ROUTING,
.priority = NF_IP6_PRI_FIRST,
},
};
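The br-nf code registers br_nf_pre_routing() as a handler on the bridge netfilter PREROUTING hook. Inside br_nf_pre_routing(), NF_HOOK is used to invoke the callbacks of the IP-layer PREROUTING hook.
At the IP layer, IPv4 and IPv6 have their own separate tables and chains, so br_nf_pre_routing() handles a frame differently depending on its IP protocol version. Non-IP frames are not processed by the br-nf code; the function returns immediately and br_handle_frame_finish() runs.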
static unsigned int br_nf_pre_routing(void *priv,
struct sk_buff *skb,
const struct nf_hook_state *state)
{
// preceding code omitted
// IPv6 packets are handled separately (the logic is largely the same as for IPv4)
if (IS_IPV6(skb) || IS_VLAN_IPV6(skb) || IS_PPPOE_IPV6(skb)) {
if (!brnf_call_ip6tables && !br->nf_call_ip6tables)
return NF_ACCEPT;
nf_bridge_pull_encap_header_rcsum(skb);
return br_nf_pre_routing_ipv6(priv, skb, state);
}
// non-IP frames are not processed
if (!IS_IP(skb) && !IS_VLAN_IP(skb) && !IS_PPPOE_IP(skb))
return NF_ACCEPT;
// invoke the callbacks of the IP-layer IPv4 NF_INET_PRE_ROUTING hook
NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, state->net, state->sk, skb,
skb->dev, NULL,
br_nf_pre_routing_finish);
return NF_STOLEN; // the skb is now owned by the IP-layer hook path
}
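The IP-layer netfilter hook priorities are shown below. The four iptables tables are ordered raw -> mangle -> nat -> filter. Of these, the raw, mangle, and nat tables have a PREROUTING chain (the filter table does not), so an IP packet that enters the prerouting processing of the br-nf code is matched against the PREROUTING rules of those tables in that order.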
enum nf_ip_hook_priorities {
NF_IP_PRI_FIRST = INT_MIN,
NF_IP_PRI_CONNTRACK_DEFRAG = -400,
NF_IP_PRI_RAW = -300,
NF_IP_PRI_SELINUX_FIRST = -225,
NF_IP_PRI_CONNTRACK = -200,
NF_IP_PRI_MANGLE = -150,
NF_IP_PRI_NAT_DST = -100,
NF_IP_PRI_FILTER = 0,
NF_IP_PRI_SECURITY = 50,
NF_IP_PRI_NAT_SRC = 100,
NF_IP_PRI_SELINUX_LAST = 225,
NF_IP_PRI_CONNTRACK_HELPER = 300,
NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX,
NF_IP_PRI_LAST = INT_MAX,
};
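To keep packets that went through the br-nf code from being matched against the IP-layer firewall rules a second time, the br-nf code also registers an IP-layer hook callback with the highest priority, which performs the necessary check.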
static struct nf_hook_ops br_nf_ops[] __read_mostly = {
// preceding entries omitted
{
.hook = ip_sabotage_in,
.pf = NFPROTO_IPV4,
.hooknum = NF_INET_PRE_ROUTING,
.priority = NF_IP_PRI_FIRST, // highest priority
},
{
.hook = ip_sabotage_in,
.pf = NFPROTO_IPV6,
.hooknum = NF_INET_PRE_ROUTING,
.priority = NF_IP6_PRI_FIRST, // highest priority
},
// remaining entries omitted
};
/* Don't hand locally destined packets to PF_INET(6)/PRE_ROUTING
* for the second time. */
static unsigned int ip_sabotage_in(void *priv,
struct sk_buff *skb,
const struct nf_hook_state *state)
{
if (skb->nf_bridge && !skb->nf_bridge->in_prerouting)
return NF_STOP;
return NF_ACCEPT;
}
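After the PREROUTING chain of the IP-layer nat table, the IPv4/IPv6 destination address of the frame may have changed, and its destination MAC address may need to be updated accordingly.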
/* This requires some explaining. If DNAT has taken place,
* we will need to fix up the destination Ethernet address.
* This is also true when SNAT takes place (for the reply direction).
*
* There are two cases to consider:
* 1. The packet was DNAT'ed to a device in the same bridge
* port group as it was received on. We can still bridge
* the packet.
* 2. The packet was DNAT'ed to a different device, either
* a non-bridged device or another bridge port group.
* The packet will need to be routed.
*
* The correct way of distinguishing between these two cases is to
* call ip_route_input() and to look at skb->dst->dev, which is
* changed to the destination device if ip_route_input() succeeds.
*
* Let's first consider the case that ip_route_input() succeeds:
*
* If the output device equals the logical bridge device the packet
* came in on, we can consider this bridging. The corresponding MAC
* address will be obtained in br_nf_pre_routing_finish_bridge.
* Otherwise, the packet is considered to be routed and we just
* change the destination MAC address so that the packet will
* later be passed up to the IP stack to be routed. For a redirected
* packet, ip_route_input() will give back the localhost as output device,
* which differs from the bridge device.
*
* Let's now consider the case that ip_route_input() fails:
*
* This can be because the destination address is martian, in which case
* the packet will be dropped.
* If IP forwarding is disabled, ip_route_input() will fail, while
* ip_route_output_key() can return success. The source
* address for ip_route_output_key() is set to zero, so ip_route_output_key()
* thinks we're handling a locally generated packet and won't care
* if IP forwarding is enabled. If the output device equals the logical bridge
* device, we proceed as if ip_route_input() succeeded. If it differs from the
* logical bridge port or if ip_route_output_key() fails we drop the packet.
*/
static int br_nf_pre_routing_finish(struct net *net, struct sock *sk, struct sk_buff *skb);
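3.2.3 Bridging decision stage
Once the PreRouting stage is complete, the bridging decision is made. There are generally four possible outcomes:
(1) If the destination MAC address is on another port of the bridge, the frame is forwarded to that port.
(2) If the destination MAC address is unknown, the frame is sent to all ports of the bridge (flooded).
(3) If the destination MAC address is local, or the bridge device is in promiscuous mode, the frame is passed up to the higher-layer protocols of the local host.
(4) If the destination MAC address is on the same side as the port the frame arrived on, the frame is ignored (dropped without further processing).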
int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
const unsigned char *dest = eth_hdr(skb)->h_dest;
struct net_bridge_port *p = br_port_get_rcu(skb->dev);
struct net_bridge *br;
struct net_bridge_fdb_entry *dst;
struct net_bridge_mdb_entry *mdst;
struct sk_buff *skb2;
bool unicast = true;
u16 vid = 0;
if (!p || p->state == BR_STATE_DISABLED)
goto drop;
if (!br_allowed_ingress(p->br, nbp_vlan_group_rcu(p), skb, &vid))
goto out;
/* insert into forwarding database after filtering to avoid spoofing */
br = p->br;
if (p->flags & BR_LEARNING)
br_fdb_update(br, p, eth_hdr(skb)->h_source, vid, false);
if (!is_broadcast_ether_addr(dest) && is_multicast_ether_addr(dest) &&
br_multicast_rcv(br, p, skb, vid))
goto drop;
if (p->state == BR_STATE_LEARNING)
goto drop;
BR_INPUT_SKB_CB(skb)->brdev = br->dev;
/* The packet skb2 goes to the local host (NULL to skip). */
skb2 = NULL;
if (br->dev->flags & IFF_PROMISC)
skb2 = skb;
dst = NULL;
if (IS_ENABLED(CONFIG_INET) && skb->protocol == htons(ETH_P_ARP))
br_do_proxy_arp(skb, br, vid, p);
if (is_broadcast_ether_addr(dest)) {
skb2 = skb;
unicast = false;
} else if (is_multicast_ether_addr(dest)) {
mdst = br_mdb_get(br, skb, vid);
if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
br_multicast_querier_exists(br, eth_hdr(skb))) {
if ((mdst && mdst->mglist) ||
br_multicast_is_router(br))
skb2 = skb;
br_multicast_forward(mdst, skb, skb2);
skb = NULL;
if (!skb2)
goto out;
} else
skb2 = skb;
unicast = false;
br->dev->stats.multicast++;
} else if ((dst = __br_fdb_get(br, dest, vid)) &&
dst->is_local) {
skb2 = skb;
/* Do not forward the packet since it's local. */
skb = NULL;
}
if (skb) {
if (dst) {
dst->used = jiffies;
// case 1: forward to the specific bridge port
br_forward(dst->dst, skb, skb2);
} else
// case 2: forward to all other bridge ports
br_flood_forward(br, skb, skb2, unicast);
}
if (skb2)
return br_pass_frame_up(skb2); // case 3: the frame is delivered locally
out:
return 0;
drop:
kfree_skb(skb); // case 4: the frame is dropped
goto out;
}
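The four outcomes of the bridging decision can be condensed into a lookup against a forwarding database. The sketch below is a deliberately reduced user-space model: broadcast, multicast, VLANs, and promiscuous mode are ignored, the table is a plain array, and all `toy_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum toy_decision { TOY_FORWARD_PORT, TOY_FLOOD, TOY_LOCAL, TOY_IGNORE };

struct toy_fdb_entry {
    uint8_t mac[6];
    int port;           /* port the address was learned on */
    int is_local;       /* address belongs to the bridge itself */
};

/* Linear lookup of a unicast MAC address; NULL if it is unknown. */
static const struct toy_fdb_entry *
toy_fdb_get(const struct toy_fdb_entry *fdb, size_t n, const uint8_t mac[6])
{
    for (size_t i = 0; i < n; i++)
        if (memcmp(fdb[i].mac, mac, 6) == 0)
            return &fdb[i];
    return NULL;
}

/* Decide what to do with a unicast frame received on in_port. */
static enum toy_decision toy_decide(const struct toy_fdb_entry *fdb, size_t n,
                                    const uint8_t dest[6], int in_port)
{
    const struct toy_fdb_entry *e = toy_fdb_get(fdb, n, dest);

    if (!e)
        return TOY_FLOOD;           /* case 2: unknown destination */
    if (e->is_local)
        return TOY_LOCAL;           /* case 3: pass up to the host */
    if (e->port == in_port)
        return TOY_IGNORE;          /* case 4: same side as the sender */
    return TOY_FORWARD_PORT;        /* case 1: known port elsewhere */
}
```

The learning step (br_fdb_update() in the kernel excerpt) is what fills such a table: every received frame records its source address together with the ingress port.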
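Frames that the bridging decision delivers locally first have to pass the INPUT chain of the ebtables filter table on the bridge netfilter NF_BR_LOCAL_IN hook.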
static int br_pass_frame_up(struct sk_buff *skb)
{
// preceding code omitted
return NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,
dev_net(indev), NULL, skb, indev, NULL,
br_netif_receive_skb);
}
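3.2.4 Forwarding stage
Whether the bridging decision is case 1 (forward to a specific bridge port) or case 2 (forward to all bridge ports), __br_forward() is eventually called, and it invokes the callbacks of the bridge netfilter NF_BR_FORWARD hook.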
static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
{
// preceding code omitted
NF_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,
dev_net(indev), NULL, skb, indev, skb->dev,
br_forward_finish);
}
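The analysis is similar to the PreRouting stage. Within the ebtables module only the filter table registers on the NF_BR_FORWARD hook, so the frame first traverses the FORWARD chain of the ebtables filter table.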
static struct nf_hook_ops ebt_ops_filter[] __read_mostly = {
// preceding entries omitted
{
.hook = ebt_in_hook,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_FORWARD,
.priority = NF_BR_PRI_FILTER_BRIDGED,
},
// remaining entries omitted
};
static unsigned int
ebt_in_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
{
return ebt_do_table(skb, state, state->net->xt.frame_filter);
}
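Likewise, the frame then goes through the br-nf code, which invokes the callbacks of the IP-layer netfilter NF_INET_FORWARD hook, and after that through the filter table of the arp module, which has not been mentioned so far (it handles ARP packets and is configured from user space with arptables).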
static struct nf_hook_ops br_nf_ops[] __read_mostly = {
// preceding entries omitted
{
.hook = br_nf_forward_ip,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_FORWARD,
.priority = NF_BR_PRI_BRNF - 1,
},
{
.hook = br_nf_forward_arp,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_FORWARD,
.priority = NF_BR_PRI_BRNF,
},
// remaining entries omitted
};
In the IP-layer NF_INET_FORWARD hook processing, the frame passes the FORWARD chains of the mangle table and then the filter table, again separately for IPv4 and IPv6.
static unsigned int br_nf_forward_ip(void *priv,
struct sk_buff *skb,
const struct nf_hook_state *state)
{
// preceding code omitted
if (IS_IP(skb) || IS_VLAN_IP(skb) || IS_PPPOE_IP(skb))
pf = NFPROTO_IPV4;
else if (IS_IPV6(skb) || IS_VLAN_IPV6(skb) || IS_PPPOE_IPV6(skb))
pf = NFPROTO_IPV6;
else
return NF_ACCEPT; // non-IP frames are not processed
// middle part omitted
NF_HOOK(pf, NF_INET_FORWARD, state->net, NULL, skb,
brnf_get_logical_dev(skb, state->in),
parent, br_nf_forward_finish);
return NF_STOLEN;
}
The frame then passes the filter table of the ARP firewall (ARP packets only):
static unsigned int br_nf_forward_arp(void *priv,
struct sk_buff *skb,
const struct nf_hook_state *state)
{
// preceding code omitted
// only ARP packets are processed
if (!IS_ARP(skb)) {
if (!IS_VLAN_ARP(skb))
return NF_ACCEPT;
nf_bridge_pull_encap_header(skb);
}
// middle part omitted
NF_HOOK(NFPROTO_ARP, NF_ARP_FORWARD, state->net, state->sk, skb,
state->in, state->out, br_nf_forward_finish);
return NF_STOLEN;
}
After that, br_forward_finish() is called and the frame enters the POSTROUTING flow.
int br_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
return NF_HOOK(NFPROTO_BRIDGE, NF_BR_POST_ROUTING,
net, sk, skb, NULL, skb->dev,
br_dev_queue_push_xmit);
}
3.2.5 PostRouting stage
Within the ebtables module only the nat table registers on the NF_BR_POST_ROUTING hook, so the frame first traverses the POSTROUTING chain of the ebtables nat table.
static struct nf_hook_ops ebt_ops_nat[] __read_mostly = {
// preceding entries omitted
{
.hook = ebt_nat_out,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_POST_ROUTING,
.priority = NF_BR_PRI_NAT_SRC,
},
// remaining entries omitted
};
static unsigned int
ebt_nat_out(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
{
return ebt_do_table(skb, state, state->net->xt.frame_nat);
}
After that, the frame goes through the br-nf code, which invokes the callbacks of the IP-layer netfilter NF_INET_POST_ROUTING hook.
static struct nf_hook_ops br_nf_ops[] __read_mostly = {
// preceding entries omitted
{
.hook = br_nf_post_routing,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_POST_ROUTING,
.priority = NF_BR_PRI_LAST,
},
// remaining entries omitted
};
In the IP-layer NF_INET_POST_ROUTING hook processing, the frame passes the POSTROUTING chains of the mangle table and then the nat table, again separately for IPv4 and IPv6.
static unsigned int br_nf_post_routing(void *priv,
struct sk_buff *skb,
const struct nf_hook_state *state)
{
// preceding code omitted
if (IS_IP(skb) || IS_VLAN_IP(skb) || IS_PPPOE_IP(skb))
pf = NFPROTO_IPV4;
else if (IS_IPV6(skb) || IS_VLAN_IPV6(skb) || IS_PPPOE_IPV6(skb))
pf = NFPROTO_IPV6;
else
return NF_ACCEPT; // non-IP frames are not processed
// middle part omitted
NF_HOOK(pf, NF_INET_POST_ROUTING, state->net, state->sk, skb,
NULL, realoutdev,
br_nf_dev_queue_xmit);
return NF_STOLEN;
}
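3.2.6 Output stage
Frames that are generated locally or routed and then sent out through the bridge eventually reach __br_deliver(). It invokes the callbacks of the bridge netfilter NF_BR_LOCAL_OUT hook. After that, just like frames coming from the Forwarding stage, they go through the PostRouting stage before being transmitted.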
static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
{
// preceding code omitted
NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT,
dev_net(skb->dev), NULL, skb,NULL, skb->dev,
br_forward_finish);
}
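In EB-NF, the OUTPUT chains of both the ebtables nat table and the ebtables filter table are registered on the NF_BR_LOCAL_OUT hook. The frame therefore traverses these two chains in priority order (nat at NF_BR_PRI_NAT_DST_OTHER first, then filter at NF_BR_PRI_FILTER_OTHER) before entering the PostRouting stage.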
static struct nf_hook_ops ebt_ops_filter[] __read_mostly = {
// preceding entries omitted
{
.hook = ebt_out_hook,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_LOCAL_OUT,
.priority = NF_BR_PRI_FILTER_OTHER,
},
// remaining entries omitted
};
static struct nf_hook_ops ebt_ops_nat[] __read_mostly = {
// preceding entries omitted
{
.hook = ebt_nat_out,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_LOCAL_OUT,
.priority = NF_BR_PRI_NAT_DST_OTHER,
},
// remaining entries omitted
};
4. References
4.1 Official documentation (source: https://ebtables.netfilter.org/documentation/docs.html)
4.2 Forum and blog posts
https://bbs.csdn.net/topics/390955966
https://blog.csdn.net/T146lLa128XX0x/article/details/80115563
http://blog.chinaunix.net/uid-23871250-id-5825936.html
https://www.cnblogs.com/balance/p/8711264.html
————————————————
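Copyright notice: this is an original article by the CSDN blogger "blacksonlgx", published under the CC 4.0 BY-SA license. When reposting, please include the link to the original source and this notice.
Original link: https://blog.csdn.net/weixin_45254661/article/details/104931627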