Using SystemTap to Get the Number of Half-Open Connections in Real Time

dog250

Article link: https://blog.csdn.net/dog250/article/details/105022347

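Counting the TCP half-open connections in a system is very useful for analyzing SYN attack behavior. There are currently several ways to do it, for example walking all of the system's connections with netstat/ss and then grepping out the half-open ones.

However, this kind of traversal is a poor choice once an attack is already under way: it piles more load onto a system that is already overwhelmed. The traversal itself is $O(n)$, so the more intense the attack, the more resources are consumed just to obtain the half-open connection statistics.

So, at noon yesterday, I wrote some simple code that traverses only the Listeners and sums up the requests attached to each of them. This approach works because, at any given moment, the number of Listeners in the system is a constant, so its time complexity is $O(1)$ with respect to the number of connections. For details, see: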
https://blog.csdn.net/dog250/article/details/105013772
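
As a rough userspace picture of the Listener-only traversal described above, the sketch below sums a per-listener queue length. The `struct listener` type and its `qlen` field are hypothetical stand-ins chosen for illustration; they are not the kernel's data structures, and this is not the code from the linked article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a listening socket: only the fields we need. */
struct listener {
    unsigned short port;  /* listening port */
    int qlen;             /* half-open requests currently attached */
};

/* Cost depends on the number of listeners, not on the number of requests. */
static long total_halfopen(const struct listener *ls, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ls[i].qlen >= 0);  /* a queue length can never be negative */
        total += ls[i].qlen;
    }
    return total;
}
```

Calling `total_halfopen` on an array of three listeners visits exactly three entries, however many requests each one holds.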

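But can we go one step further?

What I mean is: suppose the system holds a value, call it counter, that represents the number of TCP half-open connections in the current snapshot. When a TCP SYN request comes in, counter is incremented by 1; when a handshake completes or a TCP SYN request times out and is released, counter is decremented by 1. Wouldn't that be more elegant? That way, even the cost of traversing the Listeners is saved.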
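
The bookkeeping rule just described can be written down as a tiny event handler. This is only a single-threaded model of the idea; the event names (`REQ_SYN_ARRIVED` and so on) are made up for illustration and do not correspond to kernel identifiers.

```c
#include <assert.h>

/* Hypothetical events that change the number of half-open connections. */
enum req_event {
    REQ_SYN_ARRIVED,     /* a new SYN request was queued */
    REQ_HANDSHAKE_DONE,  /* the three-way handshake completed */
    REQ_TIMED_OUT        /* the SYN request timed out and was released */
};

static long counter;  /* current number of half-open connections */

/* Apply one event and return the updated counter. */
static long on_event(enum req_event ev)
{
    switch (ev) {
    case REQ_SYN_ARRIVED:
        counter++;
        break;
    case REQ_HANDSHAKE_DONE:
    case REQ_TIMED_OUT:
        assert(counter > 0);  /* nothing to remove means the model is broken */
        counter--;
        break;
    }
    return counter;
}
```

Reading the current value is then a single load of `counter`, with no traversal at all.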

Let's try it with stap first.

From the kernel source, we know that each Listener's half-open connection count is updated in the following two functions:

static inline int reqsk_queue_removed(struct request_sock_queue *queue,
                      struct request_sock *req);
static inline int reqsk_queue_added(struct request_sock_queue *queue);

Although they are inline, don't forget that kprobe can probe individual instructions. Once that is wrapped by stap, we can use kernel.statement:

[root@localhost ~]# stap -L 'kernel.statement("**")'|grep '\"reqsk_queue_added\|\"reqsk_queue_removed'
kernel.statement("reqsk_queue_added@include/net/request_sock.h:241") $queue:struct request_sock_queue*
kernel.statement("reqsk_queue_removed@include/net/request_sock.h:232")

OK, this looks promising!

Let's get to it and write the stap script:

global counter

probe begin {
	printf("hello!\n")
}

probe kernel.statement("reqsk_queue_removed@include/net/request_sock.h:232") {
	counter--
	printf("1 request removed: %d\n", counter)
}

probe kernel.statement("reqsk_queue_added@include/net/request_sock.h:241") {
	counter++
	printf("1 request added: %d\n", counter)
}

probe end {
	printf("bye...\n")
}

Start it while the system has no half-open connections (otherwise the numbers will be off):

[root@localhost ~]# stap ./synrecv.stp
hello!

Now, from another terminal or another machine, launch a SYN flood. Aim it at listening ports, otherwise the target will simply answer with a reset:

root@zhaoya-VirtualBox:/home/zhaoya# hping3 -i u1 -S -p 22 192.168.56.110 -q &
[1] 2067
root@zhaoya-VirtualBox:/home/zhaoya# hping3 -i u1 -S -p 23 192.168.56.110 -q &
[2] 2068

Watch stap's real-time output:

1 request added: 512
1 request removed: 511
1 request removed: 510
1 request added: 511
1 request added: 512
1 request removed: 511
1 request removed: 510
1 request added: 511
1 request added: 512
1 request removed: 511
1 request removed: 510
...

Now kill hping3 and we get the expected real-time output as the counter drains. Finally, kill stap to end it all:

...
1 request removed: 15
1 request removed: 14
1 request removed: 13
1 request removed: 12
1 request removed: 11
1 request removed: 10
1 request removed: 9
1 request removed: 8
1 request removed: 7
1 request removed: 6
1 request removed: 5
1 request removed: 4
1 request removed: 3
1 request removed: 2
1 request removed: 1
1 request removed: 0
^Cbye...
[root@localhost ~]#

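This means we can read the value of counter at any time. Plotted as a curve, it clearly shows how the number of half-open connections changes over any time period, which helps us understand the behavioral characteristics of an attack.

However, since the kprobe mechanism carries overhead, can stap really be deployed in production? Obviously not! So this is only a POC. Besides, we have not considered synchronization either. As the saying goes, ninety miles is only the halfway point of a hundred-mile journey, and so far we have covered only ten.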

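What next?

Easy: we just use kpatch to hotfix the callers of these inline functions, incrementing or decrementing a global counter right after each call.

The code shows that this is actually very easy:

reqsk_queue_added is called only in inet_csk_reqsk_queue_hash_add.
reqsk_queue_removed is called in 2 functions.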

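We only need to modify a handful of functions. Because multi-core synchronization has to be taken into account, we have two options:

Use an atomic variable.
Use percpu variables.

I prefer percpu variables because they add almost no overhead. To read the value, you just sum each CPU's variable without taking any lock. This does introduce some error, but the error is confined to the final summation, and walking the CPUs is fast; averaged over a time interval, the error is negligible.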
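
To make the percpu idea concrete, here is a minimal userspace model, not kernel code. It assumes a fixed `NR_SLOTS` standing in for the number of CPUs and a 64-byte cache line, and it uses C11 relaxed atomics so that the lock-free read is well defined in userspace; inside the kernel one would use the real percpu facilities instead. The names `halfopen_add` and `halfopen_sum` are invented for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_SLOTS   4   /* hypothetical number of CPUs */
#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* One slot per CPU, aligned so that two slots never share a cache line. */
struct slot {
    _Alignas(CACHE_LINE) atomic_long delta;
};

static struct slot slots[NR_SLOTS];

/* Hot path: each caller touches only the slot of the CPU it runs on. */
static void halfopen_add(unsigned int cpu, long delta)
{
    assert(cpu < NR_SLOTS);
    atomic_fetch_add_explicit(&slots[cpu].delta, delta,
                              memory_order_relaxed);
}

/* Reader: sum all slots without a lock. The result may miss updates that
 * land while the loop is running, which is the small error accepted above. */
static long halfopen_sum(void)
{
    long sum = 0;
    for (unsigned int i = 0; i < NR_SLOTS; i++)
        sum += atomic_load_explicit(&slots[i].delta, memory_order_relaxed);
    return sum;
}
```

Note that an individual slot can legitimately go negative, because a request added on one CPU may be removed on another; only the sum is meaningful.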

Below are the places that need to be modified.

// net/ipv4/inet_connection_sock.c
void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
                   unsigned long timeout)
{
    unsigned int this_cpu_counter = ... // obtain the percpu counter
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct listen_sock *lopt = icsk->icsk_accept_queue.listen_opt;
    const u32 h = inet_synq_hash(inet_rsk(req)->ir_rmt_addr,
                     inet_rsk(req)->ir_rmt_port,
                     lopt->hash_rnd, lopt->nr_table_entries);

    reqsk_queue_hash_req(&icsk->icsk_accept_queue, h, req, timeout);
    inet_csk_reqsk_queue_added(sk, timeout);
    this_cpu_counter ++;
}
void inet_csk_reqsk_queue_prune(struct sock *parent,
                const unsigned long interval,
                const unsigned long timeout,
                const unsigned long max_rto)
{
    unsigned int this_cpu_counter = ... // obtain the percpu counter
	...
                /* Drop this request */
                inet_csk_reqsk_queue_unlink(parent, req, reqp);
                reqsk_queue_removed(queue, req);
                this_cpu_counter --;
                reqsk_free(req);
                continue;
            }
            reqp = &req->dl_next;
        }
        ...
}
// net/ipv4/tcp_minisock.c
struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
               struct request_sock *req,
               struct request_sock **prev,
               bool fastopen)
{
    unsigned int this_cpu_counter = ... // obtain the percpu counter
	...
    inet_csk_reqsk_queue_removed(sk, req);
    this_cpu_counter --;

    inet_csk_reqsk_queue_add(sk, req, child);
    return child;
    ...
    if (!fastopen) {
        inet_csk_reqsk_queue_drop(sk, req, prev);
        this_cpu_counter --;
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_EMBRYONICRSTS);
    }
    return NULL;
}
// net/ipv4/tcp_ipv4.c
void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
{
	unsigned int this_cpu_counter = ... // obtain the percpu counter
	...
	inet_csk_reqsk_queue_drop(sk, req, prev);
	this_cpu_counter --;
	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
	goto out;
	...
}

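This code has no proc interface for reading the data out, because I don't know how to register a proc entry or a sysctl from within kpatch. Of course, one could hook an arbitrary function and do it like this: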

static int initial = 0;
void hooked_func(...)
{
	...
	if (!initial) {
		register_proc(...);
		initial = 1;
	}
}

Not sure whether that would work?
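
One thing the plain `initial` flag does not cover is two CPUs entering the hooked function at the same moment and both seeing `initial == 0`. The userspace sketch below shows a run-once guard built on a C11 compare-and-exchange. `register_proc_stub` is a hypothetical stand-in for the elided registration call, and the sketch says nothing about whether registering a proc entry is safe in the context where the hooked function runs; that would have to be checked separately.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool registered;  /* false until someone claims the setup */
static int register_calls;      /* counts how often the stub actually ran */

/* Hypothetical stand-in for the registration elided in the original. */
static void register_proc_stub(void)
{
    assert(register_calls == 0);  /* must run at most once */
    register_calls++;
}

static void hooked_func(void)
{
    bool expected = false;

    /* Only the caller that flips false -> true performs the registration;
     * every other caller, concurrent or later, skips it. */
    if (atomic_compare_exchange_strong(&registered, &expected, true))
        register_proc_stub();

    /* ... the original body of the hooked function would follow here ... */
}
```

However many times `hooked_func` is called, `register_calls` ends up at 1.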

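The kpatch must be loaded while the system has no traffic; otherwise the half-open connections that already exist at load time will be missing from the count.

If you want to hot-load it without a stall while traffic is flowing, that is not hard either: when the kpatch is loaded, use yesterday's Listener-scanning method to get the current total length of the half-open queues, and set that as the initial value of a global counter. When reading the value, sum all the percpu variables and then add the global initial value to get the result. Remember that in this case the percpu variables must be signed int, not unsigned int!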
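
The arithmetic of "initial value plus percpu deltas" can be modeled in a few lines. This is a single-threaded sketch with invented names (`snapshot_base`, `on_added`, `on_removed`, `current_halfopen`) and a hypothetical `NR_SLOTS` standing in for the CPU count; it shows only why the deltas are kept signed, not how the real patch would synchronize them.

```c
#include <assert.h>

#define NR_SLOTS 2  /* hypothetical number of CPUs */

static long base;            /* total found by the one-time Listener scan */
static int delta[NR_SLOTS];  /* signed per-CPU changes since the scan */

/* Called once at load time with the scanned queue total. */
static void snapshot_base(long scanned_total)
{
    base = scanned_total;
}

static void on_added(unsigned int cpu)
{
    assert(cpu < NR_SLOTS);
    delta[cpu]++;
}

static void on_removed(unsigned int cpu)
{
    assert(cpu < NR_SLOTS);
    delta[cpu]--;  /* may go below zero: the request may predate the scan */
}

/* Current value = load-time snapshot + everything that happened since. */
static long current_halfopen(void)
{
    long sum = base;
    for (unsigned int i = 0; i < NR_SLOTS; i++)
        sum += delta[i];
    return sum;
}
```

If 100 requests were queued at load time and three of them are then removed on CPU 1 while one new request arrives on CPU 0, the deltas are +1 and -3 and the reported value is 98.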

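Of course, these are just a craftsman's little tricks. The manager certainly doesn't think this way, and certainly doesn't care about any of this.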

In Wenzhou, Zhejiang, the leather shoes get wet; when it rains the water gets in, but they won't swell up.