An nf_conntrack Bug in Old Linux Kernels

dog250

Article link: https://blog.csdn.net/dog250/article/details/103829505

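There is no practical value in writing this up anymore, since it concerns an nf_conntrack bug in the ancient Linux 3.10 kernel. I am writing it down anyway because it is both nostalgic and fun.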

The bug report is simple: the system crashed on a NULL pointer dereference inside ____nf_conntrack_find.

My job was to find it.

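This time, instead of the craftsman's approach, I debugged it in the manager's style. So no crash tool, no stap, no trace/eBPF... just staring at the code with the naked eye. Yes, that is the method managers usually use.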

I am not a manager. I just happen to know nf_conntrack fairly well, which is the nostalgia I mentioned above.

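Let's start with a patch from the community:
https://patchwork.ozlabs.org/patch/29808/
Roughly, the patch says that slab objects from a cache created with the SLAB_DESTROY_BY_RCU flag can be reused within an RCU grace period. So, for hlist_nulls_for_each_entry to run safely, keep in mind that even after a slab object has been freed back to the slab:

It may still be in use.
It can be picked up and reused at any moment.

Since hlist_nulls_for_each_entry unconditionally follows the next field of hnnode, care is needed here: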

/**
 * hlist_nulls_for_each_entry_rcu - iterate over rcu list of given type
 * @tpos:   the type * to use as a loop cursor.
 * @pos:    the &struct hlist_nulls_node to use as a loop cursor.
 * @head:   the head for your list.
 * @member: the name of the hlist_nulls_node within the struct.
 *
 * The barrier() is needed to make sure compiler doesn't cache first element [1],
 * as this loop can be restarted [2]
 * [1] Documentation/atomic_ops.txt around line 114
 * [2] Documentation/RCU/rculist_nulls.txt around line 146
 */
#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)         \
    for (({barrier();}),                            \
         pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));        \
        (!is_a_nulls(pos)) &&                       \
        ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
        pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
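
The macro above only walks the chain. What makes a lookup safe with SLAB_DESTROY_BY_RCU is the discipline around it: a key match seen during the walk is only a hint, so the reader pins the object first and then compares the key again. Below is a minimal single-threaded sketch of that pattern. The names struct obj, obj_get_unless_zero, lookup and the recycle hook are all hypothetical stand-ins, not kernel APIs, and the sketch simply gives up on a mismatch where the kernel would restart the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a conntrack entry. */
struct obj {
    int key;     /* identity of the object, rewritten on reuse */
    int refcnt;  /* 0 means the object sits in the free pool */
};

/* Take a reference unless the object has already been freed. */
static bool obj_get_unless_zero(struct obj *o)
{
    if (o->refcnt == 0)
        return false;
    o->refcnt++;
    return true;
}

static void obj_put(struct obj *o)
{
    assert(o->refcnt > 0);
    o->refcnt--;
}

/*
 * Lookup with re-validation. recycle() is a test hook that stands in
 * for a concurrent free + alloc hitting the object between the first
 * key match and the moment the reader pins it.
 */
static struct obj *lookup(struct obj *table, size_t n, int key,
                          void (*recycle)(struct obj *))
{
    for (size_t i = 0; i < n; i++) {
        struct obj *o = &table[i];

        if (o->key != key)
            continue;
        if (recycle)
            recycle(o);            /* simulated race window */
        if (!obj_get_unless_zero(o))
            return NULL;           /* freed under us */
        if (o->key != key) {       /* reused under another identity */
            obj_put(o);
            return NULL;
        }
        return o;                  /* pinned and re-validated */
    }
    return NULL;
}

/* Test hook: the slot is freed and handed out again with a new key. */
static void recycle_as_99(struct obj *o)
{
    o->refcnt = 1;
    o->key = 99;
}
```

Calling lookup(table, n, key, NULL) models the quiet case. Passing recycle_as_99 models the object being recycled in the middle of the lookup, in which case the second key comparison rejects it and the reference is dropped again.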

With the principle sketched out, it is time to show the trigger scenario:
[Figure: the trigger scenario. The image was not preserved.]

To see how many kinds of ext there are, look at the following enum:

enum nf_ct_ext_id {
	...
};

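As soon as any ext is attached, the ext field of the ct is non-NULL. Typically, if you have configured NAT rules, the ct's ext is non-NULL, and the chance of a crash goes up considerably.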

One less commonly used ext deserves special mention here, because it is the one that caused the crash:

NF_CT_EXT_ZONE

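This ext was introduced late in the 2.6 series. It isolates conntrack entries from one another and looks like an efficiency win, but it is not widely used (and the internal hash table is not actually split per zone). What it did bring was a hidden hazard: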

    return nf_ct_tuple_equal(tuple, &((struct nf_conntrack_tuple_hash *)h)->tuple) &&
        nf_ct_zone(ct) == zone && // the added call to nf_ct_zone()!
        nf_ct_is_confirmed(ct);
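
The added nf_ct_zone() call is risky because the zone lookup fetches ct->ext more than once: once for the NULL check, then again for the dereference. If the object is recycled and zeroed in between, the second fetch sees NULL. The sketch below replays that interleaving deterministically in a single thread. The struct layouts mirror the simplified ones in the POC later in this post rather than the real kernel, and reinit_on_reuse, zone_lookup_replay and enum outcome are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned short u16;

/* Simplified stand-ins, not the real kernel layouts. */
struct nf_ct_ext { u16 offset[2]; };
struct nf_conn {
    int something;
    struct nf_ct_ext *ext;
    int proto;
};

enum outcome { NO_EXT, ZONE_READ_OK, WOULD_DEREF_NULL };

/* What reuse does to the object: zero everything up to proto. */
static void reinit_on_reuse(struct nf_conn *ct)
{
    memset((char *)ct + offsetof(struct nf_conn, something), 0,
           offsetof(struct nf_conn, proto) -
           offsetof(struct nf_conn, something));
}

/*
 * Replays the reader's two fetches of ct->ext. When race is non-zero,
 * the reuse lands between the NULL check and the second fetch.
 */
static enum outcome zone_lookup_replay(struct nf_conn *ct, int race)
{
    assert(ct != NULL);

    if (ct->ext == NULL)           /* fetch 1: the existence check */
        return NO_EXT;
    if (race)
        reinit_on_reuse(ct);       /* another CPU recycles the object */
    /* fetch 2: the value the real code would dereference */
    return ct->ext ? ZONE_READ_OK : WOULD_DEREF_NULL;
}
```

With race set to 0 the replay reports ZONE_READ_OK. With race set to 1 it reports WOULD_DEREF_NULL, which is the point where the real code would dereference a NULL ct->ext.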

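The problem went unnoticed until the nf_conntrack_zone struct was split out of ext:
https://lists.openwall.net/netdev/2016/07/06/88
Note that this patch was not written as a bugfix; it had other motivations. Luckily, though, the nf_conntrack_zone field ended up before __nfct_init_offset, which means it is no longer inside the range that memset zeroes.

Yes, a lucky accident!

Since then, the problem is gone.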
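
The effect of moving the zone above the zeroed region can be sketched in a few lines. This is a heavily simplified, hypothetical layout in which only the ordering of members matters. The real struct nf_conn has many more fields, and here the first zeroed member (master) plays the role of the __nfct_init_offset marker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned short u16;

struct nf_conntrack_zone { u16 id; };

/* Hypothetical layout: only the ordering of members matters here. */
struct nf_conn {
    struct nf_conntrack_zone zone;  /* lives above the zeroed region */
    void *master;                   /* first member that reuse zeroes */
    void *ext;
    int proto;                      /* zeroing stops before this member */
};

/* Zero the range [master, proto), as done when the object is reused. */
static void reinit_on_reuse(struct nf_conn *ct)
{
    assert(ct != NULL);
    memset((char *)ct + offsetof(struct nf_conn, master), 0,
           offsetof(struct nf_conn, proto) -
           offsetof(struct nf_conn, master));
}

/* With the zone embedded in the object, no pointer is chased at all. */
static u16 nf_ct_zone_id(const struct nf_conn *ct)
{
    return ct->zone.id;
}
```

After reinit_on_reuse() the ext pointer is NULL, but nf_ct_zone_id() never looks at it, so a reader racing with reuse can at worst see a stale zone id, never a NULL pointer.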

Below is a userspace POC that I threw together on the shuttle bus home from work. It is really just a simple race condition:

#include <pthread.h>
#include <stddef.h>   /* offsetof */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* sleep */

typedef unsigned short u16;
typedef unsigned char u8;

enum nf_ct_ext_id {
    NF_CT_EXT_HELPER,
    NF_CT_EXT_ZONE,
	NF_CT_EXT_NUM,
};

struct callback_head {
    struct callback_head *next;
    void (*func)(struct callback_head *head);
};
#define rcu_head callback_head

struct nf_ct_ext {
	struct rcu_head rcu;
	u16 offset[NF_CT_EXT_NUM];
};

struct nf_conntrack_zone {
    u16 id;
};

struct nf_conn {
	int something;
	struct nf_ct_ext *ext;
	int proto;
};

struct nf_conn *CT, *ct;
struct nf_ct_ext ext;

static void nf_conn_free_to_slab(struct nf_conn *ct)
{
	// do nothing;
}
static struct nf_conn *nf_conn_alloc_from_slab(void)
{
	return CT;
}

static inline int nf_ct_ext_exist(const struct nf_conn *ct, u8 id)
{
    return ct->ext && !!(ct->ext->offset[id]);
}

static inline void *nf_ct_ext_find(const struct nf_conn *ct, u8 id)
{
    if (!nf_ct_ext_exist(ct, id))
        return NULL;

    return (void *)ct->ext + ct->ext->offset[id];
}

static inline u16 nf_ct_zone(const struct nf_conn *ct)
{
    struct nf_conntrack_zone *nf_ct_zone;
    nf_ct_zone = nf_ct_ext_find(ct, NF_CT_EXT_ZONE);
    if (nf_ct_zone)
        return nf_ct_zone->id;
    return 0; /* no zone ext attached: fall back to the default zone */
}

void *thread_find(void *arg)
{
	struct nf_conn *ct = CT;
	static int i = 0;

	while(1) {
		nf_ct_zone(ct);
		printf("count: %d\n", i++);

	}
}

void *thread_slab_free_alloc(void *arg)
{
	ct = CT;
	while(1) {
		nf_conn_free_to_slab(ct);
		ct = nf_conn_alloc_from_slab();

		memset(&ct->something, 0,
			   offsetof(struct nf_conn, proto) - offsetof(struct nf_conn, something));
		ct->ext = &ext;
	}
}

static void global_init()
{
	CT = (struct nf_conn *)calloc(1, sizeof(struct nf_conn));
	ext.offset[NF_CT_EXT_ZONE] = 1;
	CT->ext = &ext;
}

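int main(int argc, char **argv)
{
	pthread_t id1, id2;

	global_init();

	/*
	 * The reader and the recycler race on CT->ext. Build without
	 * optimization (for example gcc -O0 -pthread) so that ct->ext is
	 * re-read on every access; the reader is then expected to fault.
	 */
	pthread_create(&id1, NULL, thread_find, NULL);
	pthread_create(&id2, NULL, thread_slab_free_alloc, NULL);

	sleep(10000);
	return 0;
}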

Finally, a few related links:

A crash caused by some NAT metadata being memset to 0:
https://git.shtrih-m.ru/amednyy/mainline_linux/commit/5173bc679dec881120df109a6a2b39143235382c
A race condition related to RCU reuse:
http://patchwork.ozlabs.org/patch/516773/
And here is a fun one:
https://lists.freedesktop.org/archives/dri-devel/2018-August/185207.html

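Still, none of this compares to a manager's leather shoes letting in water on a rainy day 👞.

In Wenzhou, Zhejiang, the leather shoes are wet; the rain gets in, yet they won't get fat.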
