Analysis of the Linux net namespace optimization code


qiushanjushi

Reposted from: http://hi.baidu.com/supperwangli/item/fbfe42c2eecf7a4ea8ba9420

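Network namespaces were introduced into the Linux protocol stack to support multiple instances of the network stack. The namespace is what isolates those instances from one another (somewhat like a process's linear address space: one stack cannot access another stack's private data).

The elements that must be placed in a namespace are processes, sockets, and network devices. A socket created by a process must belong to a namespace, and operations on that socket must be performed within the namespace. A network device must also belong to a namespace, but that namespace can change, because network devices are a shared resource.

Implementing network namespaces requires updating code throughout the protocol stack, which is a very large amount of work.

Imagine two protocol stacks running side by side: every stack-related global variable has to become private to one stack. The best approach is to make these globals members of a per-net-namespace variable, and then add a namespace parameter to the protocol stack's function calls.

However, kernel developers face several problems:

1. Existing kernel code should preferably use the variables inside the namespace implicitly, without every piece of kernel code having to be updated; otherwise the workload is too large.

2. The performance overhead should be very small, so that whether or not namespaces are used makes no difference to users.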
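The idea of turning globals into members of a per-namespace structure can be sketched in user-space C. This is only an illustration under stated assumptions: `struct demo_net`, `demo_init_net`, `demo_count_rx` and `demo_count_rx_legacy` are hypothetical names invented for this example and are not kernel APIs. The legacy wrapper shows one way unconverted callers could keep working by implicitly using a default namespace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-namespace container: each former global becomes a member. */
struct demo_net {
	int id;
	unsigned long rx_packets;	/* previously a file-scope global counter */
	int ip_forward;			/* previously a global configuration knob */
};

/* Hypothetical default namespace used implicitly by unconverted code. */
static struct demo_net demo_init_net = { .id = 0 };

/* Converted function: the namespace is passed explicitly. */
static void demo_count_rx(struct demo_net *net, unsigned long n)
{
	assert(net != NULL);
	net->rx_packets += n;
}

/* Unconverted entry point: same signature as before, default namespace implied. */
static void demo_count_rx_legacy(unsigned long n)
{
	demo_count_rx(&demo_init_net, n);
}
```

With two `struct demo_net` instances, a call on one leaves the counter of the other untouched, which is the isolation property the text describes.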

[1] http://lwn.net/Articles/219597/
[2] http://lwn.net/Articles/218595/

The kernel source contains a patch that optimizes the release of net_namespace:
git format-patch 2b035b39970740722598f7a9d548835f9bdd730f -1
Subject: [PATCH] net: Batch network namespace destruction.

It is fairly common to kill several network namespaces at once. Either
because they are nested one inside the other or because they are cooperating
in multiple machine networking experiments. As the network stack control logic
does not parallelize easily batch up multiple network namespaces existing
together.

To get the full benefit of batching the virtual network devices to be
removed must be all removed in one batch. For that purpose I have added
a loop after the last network device operations have run that batches
up all remaining network devices and deletes them.

An extra benefit is that the reorganization slightly shrinks the size
of the per network namespace data structures replaceing a work_struct
with a list_head.

In a trivial test with 4K namespaces this change reduced the cost of
a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
(at 60% cpu). The bulk of that 44s was spent in inet_twsk_purge.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/net/net_namespace.h | 2 +-
net/core/net_namespace.c | 66 +++++++++++++++++++++++++++++++++++++-----
2 files changed, 59 insertions(+), 9 deletions(-)

...
+static DEFINE_SPINLOCK(cleanup_list_lock);
+static LIST_HEAD(cleanup_list);  /* Must hold cleanup_list_lock to touch */
+
 static void cleanup_net(struct work_struct *work)
 {
 	struct pernet_operations *ops;
-	struct net *net;
+	struct net *net, *tmp;
+	LIST_HEAD(net_kill_list);
 
-	net = container_of(work, struct net, work);
+	/* Atomically snapshot the list of namespaces to cleanup */
+	spin_lock_irq(&cleanup_list_lock);
+	list_replace_init(&cleanup_list, &net_kill_list);
+	spin_unlock_irq(&cleanup_list_lock);
 
 	mutex_lock(&net_mutex);
 
 	/* Don't let anyone else find us. */
 	rtnl_lock();
-	list_del_rcu(&net->list);
+	list_for_each_entry(net, &net_kill_list, cleanup_list)
+		list_del_rcu(&net->list);
 	rtnl_unlock();
 
 	/*
@@ -170,8 +201,18 @@ static void cleanup_net(struct work_struct *work)
 
 	/* Run all of the network namespace exit methods */
 	list_for_each_entry_reverse(ops, &pernet_list, list) {
-		if (ops->exit)
-			ops->exit(net);
+		if (ops->exit) {
+			list_for_each_entry(net, &net_kill_list, cleanup_list)
+				ops->exit(net);
+		}
+		if (&ops->list == first_device) {
+			LIST_HEAD(dev_kill_list);
+			rtnl_lock();
+			list_for_each_entry(net, &net_kill_list, cleanup_list)
+				unregister_netdevices(net, &dev_kill_list);
+			unregister_netdevice_many(&dev_kill_list);
+			rtnl_unlock();
+		}
 	}
 
 	mutex_unlock(&net_mutex);
@@ -182,14 +223,23 @@ static void cleanup_net(struct work_struct *work)
 	rcu_barrier();
 
 	/* Finally it is safe to free my network namespace structure */
-	net_free(net);
+	list_for_each_entry_safe(net, tmp, &net_kill_list, cleanup_list) {
+		list_del_init(&net->cleanup_list);
+		net_free(net);
+	}
 }
+static DECLARE_WORK(net_cleanup_work, cleanup_net);
 
 void __put_net(struct net *net)
 {
 	/* Cleanup the network namespace in process context */
-	INIT_WORK(&net->work, cleanup_net);
-	queue_work(netns_wq, &net->work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&cleanup_list_lock, flags);
+	list_add(&net->cleanup_list, &cleanup_list);
+	spin_unlock_irqrestore(&cleanup_list_lock, flags);
+
+	queue_work(netns_wq, &net_cleanup_work);
 }
 EXPORT_SYMBOL_GPL(__put_net);
...
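
The snapshot step in cleanup_net() moves every entry from the shared cleanup_list onto a private net_kill_list and leaves the shared list empty, so that new entries can keep arriving while the batch is processed. The following user-space sketch mimics that behaviour with a minimal circular doubly linked list. The helper names (`init_list_head`, `list_add_front`, `list_snapshot`, `list_count`) are assumptions made for this example and are not the kernel's list API. The sketch also omits the spinlock, so it only shows the pointer manipulation.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert entry right after head, as list_add() does in the patch. */
static void list_add_front(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Move all entries from old_head to new_head and reset old_head to empty.
 * This is the effect the patch relies on when it snapshots cleanup_list.
 */
static void list_snapshot(struct list_head *old_head, struct list_head *new_head)
{
	new_head->next = old_head->next;
	new_head->next->prev = new_head;
	new_head->prev = old_head->prev;
	new_head->prev->next = new_head;
	init_list_head(old_head);
}

/* Count the entries on a list (the head itself is not counted). */
static size_t list_count(const struct list_head *h)
{
	size_t n = 0;
	const struct list_head *p;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}
```

The whole transfer costs a constant number of pointer updates regardless of how many namespaces are queued, which is why it is cheap to do while holding the lock.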

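From the code: every time a net namespace is released, __put_net is called, and the actual release is handed to a work queue for deferred execution.
Before the patch, each struct net carried its own work_struct, so releasing many net namespaces queued many separate work items, each of which ran cleanup_net once.
The optimized code instead puts each net namespace to be released on a list (cleanup_list) and queues a single shared work item (net_cleanup_work).
Because the work queue runs later, the list may already hold many namespaces by the time the release actually executes.
cleanup_net then releases all of them in one pass,
which avoids running the release path through the work queue once per namespace.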
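
The effect described above can be illustrated with a small single-threaded simulation. It is a sketch under simplifying assumptions, not the kernel implementation: there are no locks or real work queues, a boolean stands in for the "work already pending" state of the single work item, and all names (`struct demo_ns`, `demo_put_ns`, `demo_run_worker`, `worker_runs`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a namespace awaiting destruction. */
struct demo_ns {
	int id;
	bool freed;
	struct demo_ns *next;
};

static struct demo_ns *pending;	/* plays the role of cleanup_list */
static bool work_queued;	/* plays the role of the single shared work item */
static int worker_runs;		/* number of times the cleanup pass has run */

/* Like __put_net after the patch: link the namespace in, request one pass. */
static void demo_put_ns(struct demo_ns *ns)
{
	ns->next = pending;
	pending = ns;
	work_queued = true;	/* requesting an already requested pass changes nothing */
}

/* Like cleanup_net after the patch: take the whole list, free it in one pass. */
static int demo_run_worker(void)
{
	struct demo_ns *batch;
	int freed = 0;

	if (!work_queued)
		return 0;
	work_queued = false;

	batch = pending;	/* snapshot the pending list */
	pending = NULL;
	worker_runs++;

	/* Expensive shared steps (taking locks, waiting for RCU) would run once here. */
	while (batch) {
		struct demo_ns *next = batch->next;

		batch->next = NULL;
		batch->freed = true;
		freed++;
		batch = next;
	}
	return freed;
}
```

Calling `demo_put_ns` on several namespaces before the worker gets to run results in a single `demo_run_worker` pass that frees all of them, whereas a design with one work item per namespace would run the pass once for each.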