A Brief Analysis of Concurrency Handling in slab

kaka__55

Last modified: 2022-04-28 20:15:48

First published: 2022-04-27 19:56:35

Article link: https://blog.csdn.net/kaka__55/article/details/124458775

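This article examines how the Linux kernel's slub memory allocator handles concurrency. It starts from a lock-free retry loop built on find_first_zero_bit and the atomic test_and_set_bit, then shows how slab_alloc_node uses this_cpu_cmpxchg_double to allocate memory without taking a lock, which is very efficient when there is no contention. It also discusses the role of barrier() and the simplified code in later kernels, illustrating why such kernel optimizations matter.

Summary generated automatically by CSDN.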

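The code in this article is based on Linux 4.19.195.
I have been reading a lot of slab (slub) code recently and found its concurrency handling quite elegant, so I am writing it down here.
Let's start with a function from the 5.15 kernel.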

static inline int heart_alloc_int(void)
{
	int bit;

again:
	bit = find_first_zero_bit(heart_irq_map, HEART_NUM_IRQS);
	if (bit >= HEART_NUM_IRQS)
		return -ENOSPC;

	if (test_and_set_bit(bit, heart_irq_map))
		goto again;

	return bit;
}

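This function uses find_first_zero_bit to locate the first zero bit in the bitmap, then uses the atomic operation test_and_set_bit to set that bit and return its old value. If the bit was set to 1 between the find_first_zero_bit call and the test_and_set_bit call, someone else has already claimed it and the function has to go back and look for another zero bit. Otherwise the bit was set successfully and the function returns.
The clever part is that the function never takes a lock. A single atomic operation is enough to handle concurrency.
The slub code uses the same approach. Let's look at the actual code. When we allocate memory from slub, execution reaches the function slab_alloc_node().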
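To make the retry pattern concrete outside the kernel, here is a minimal userspace sketch using C11 atomics. It is not the kernel implementation: `demo_map`, `DEMO_NUM_SLOTS`, `demo_alloc_slot` and `demo_free_slot` are hypothetical names, a plain scan of a snapshot stands in for find_first_zero_bit, and `atomic_fetch_or` stands in for test_and_set_bit.

```c
#include <stdatomic.h>

#define DEMO_NUM_SLOTS 8	/* hypothetical size, not taken from the kernel */

/* One bit per slot; a set bit means the slot is taken. */
static atomic_uint demo_map;

/* Returns the claimed slot index, or -1 when every slot is taken. */
int demo_alloc_slot(void)
{
	for (;;) {
		unsigned int snapshot = atomic_load(&demo_map);
		int bit = 0;

		/* Non-atomic scan of a snapshot: stand-in for find_first_zero_bit(). */
		while (bit < DEMO_NUM_SLOTS && (snapshot & (1u << bit)))
			bit++;
		if (bit >= DEMO_NUM_SLOTS)
			return -1;

		/* Atomic read-modify-write: stand-in for test_and_set_bit(). */
		unsigned int old = atomic_fetch_or(&demo_map, 1u << bit);
		if (!(old & (1u << bit)))
			return bit;	/* the bit was clear before us, so we own it */
		/* Another thread set the bit first: scan again. */
	}
}

/* Releases a slot previously returned by demo_alloc_slot(). */
void demo_free_slot(int bit)
{
	atomic_fetch_and(&demo_map, ~(1u << bit));
}
```

The scan itself may observe stale data. Correctness rests only on the old value returned by the atomic operation, which is why no lock is needed.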

static __always_inline void *slab_alloc_node(struct kmem_cache *s,
		gfp_t gfpflags, int node, unsigned long addr)
{
***
redo:
	/*
	 * Must read kmem_cache cpu data via this cpu ptr. Preemption is
	 * enabled. We may switch back and forth between cpus while
	 * reading from one cpu area. That does not matter as long
	 * as we end up on the original cpu again when doing the cmpxchg.
	 *
	 * We should guarantee that tid and kmem_cache are retrieved on
	 * the same cpu. It could be different if CONFIG_PREEMPT so we need
	 * to check if it is matched or not.
	 */
	do {
		tid = this_cpu_read(s->cpu_slab->tid);
		c = raw_cpu_ptr(s->cpu_slab);
	} while (IS_ENABLED(CONFIG_PREEMPT) &&
		 unlikely(tid != READ_ONCE(c->tid)));

	/*
	 * Irqless object alloc/free algorithm used here depends on sequence
	 * of fetching cpu_slab's data. tid should be fetched before anything
	 * on c to guarantee that object and page associated with previous tid
	 * won't be used with current tid. If we fetch tid first, object and
	 * page could be one associated with next tid and our alloc/free
	 * request will be failed. In this case, we will retry. So, no problem.
	 */
	barrier();

	/*
	 * The transaction ids are globally unique per cpu and per operation on
	 * a per cpu queue. Thus they can be guarantee that the cmpxchg_double
	 * occurs on the right processor and that there was no operation on the
	 * linked list in between.
	 */

	object = c->freelist;
	page = c->page;
	if (unlikely(!object || !node_match(page, node))) {
		object = __slab_alloc(s, gfpflags, node, addr, c);
		stat(s, ALLOC_SLOWPATH);
	} else {
		/* Fast path: take the object from c->freelist */
		void *next_object = get_freepointer_safe(s, object);

		/*
		 * The cmpxchg will only match if there was no additional
		 * operation and if we are on the right processor.
		 *
		 * The cmpxchg does the following atomically (without lock
		 * semantics!)
		 * 1. Relocate first pointer to the current per cpu area.
		 * 2. Verify that tid and freelist have not been changed
		 * 3. If they were not changed replace tid and freelist
		 *
		 * Since this is without lock semantics the protection is only
		 * against code executing on this cpu *not* from access by
		 * other cpus.
		 */
		if (unlikely(!this_cpu_cmpxchg_double(
				s->cpu_slab->freelist, s->cpu_slab->tid,
				object, tid,
				next_object, next_tid(tid)))) {

			note_cmpxchg_failure("slab_alloc", s, tid);
			goto redo;
		}
		prefetch_freepointer(s, next_object);
		stat(s, ALLOC_FASTPATH);
	}

	***

	return object;
}

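Let's first look at the do-while loop that reads tid and c.
The comment states it clearly: with CONFIG_PREEMPT enabled, the task can be preempted inside this loop, and the loop exists to make sure that tid and c are obtained on the same CPU. this_cpu_read is safe against preemption on its own (the generic implementation disables and re-enables preemption around the read), while raw_cpu_ptr has no such protection. If the task is preempted right after this_cpu_read and wakes up on another CPU, c points at that other CPU's data, the tid != READ_ONCE(c->tid) check detects the mismatch, and the loop runs again. This guarantees that tid and c come from the same CPU. Doesn't this way of guarding against concurrency look a lot like the function at the start of the article?
What puzzled me is why it is not simply written as:
disable preemption -> read tid and c -> enable preemption
Wouldn't that guarantee that both are read on the same CPU? Why the extra complexity and the conditional check?
Later I looked at the 5.15 code and found that this part has been rewritten to be even simpler, with no need to disable preemption at all. Interested readers can look at the code themselves.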

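I did not fully understand the barrier() that follows the loop at first, and wondered whether it was heavier than necessary. It is in fact only a compiler barrier and emits no CPU instruction. barrier() tells the compiler that memory may have changed, so any copies of variables held in registers are invalid and must be reloaded from memory, and that memory accesses must not be reordered across it. It has no effect on the CPU cache. Here it keeps the compiler from moving the reads of c->freelist and c->page ahead of the read of tid, which is the ordering the code comment asks for.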
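The ordering requirement can be sketched in a few lines. The `barrier()` definition below is the one the kernel uses for GCC-compatible compilers. `struct demo_cpu_slab`, `struct demo_snapshot` and `demo_read_in_order` are hypothetical names introduced only for this illustration, and the sketch assumes GCC or Clang.

```c
/* Compiler-only barrier: an empty asm statement with a "memory" clobber. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-in for the two per-cpu fields discussed above. */
struct demo_cpu_slab {
	void *freelist;
	unsigned long tid;
};

struct demo_snapshot {
	unsigned long tid;
	void *object;
};

struct demo_snapshot demo_read_in_order(struct demo_cpu_slab *c)
{
	struct demo_snapshot snap;

	snap.tid = c->tid;		/* tid is read first */
	barrier();			/* the compiler may not hoist the next load above this */
	snap.object = c->freelist;	/* freelist is read after tid */
	return snap;
}
```

Because the barrier only constrains the compiler, it costs nothing at run time. That is sufficient here because, as the kernel comment explains, the scheme only protects against other code running on the same CPU.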

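Next, the fast path. The key part is the this_cpu_cmpxchg_double call and the goto redo that follows it, which use the same concurrent-programming pattern as the code at the start of the article. this_cpu_cmpxchg_double checks whether this CPU's freelist and tid still equal the object and tid read earlier. If they do, no other code on this CPU (a task that preempted us, or an interrupt handler) has allocated from or freed to this per-cpu slab in the meantime (or, strictly speaking, none got there before us), and the current task has not been migrated to another CPU. It is then safe to take the object, update freelist and tid, and return. Otherwise, goto redo jumps back to the redo label and the allocation starts over. Note that this_cpu_cmpxchg_double is also an atomic operation but, as the code comment says, without lock semantics: it protects only against code running on the same CPU, not against access from other CPUs.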
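The "compare the pair, then swap the pair" idea can be sketched in userspace as follows. This is only an analogy under stated assumptions: the free list holds array indices instead of pointers, the tid and the list head are packed into one 64-bit word so that an ordinary C11 compare-exchange can cover both, and all `demo_*` names are hypothetical. Unlike this_cpu_cmpxchg_double, the C11 operation is atomic across CPUs, so it is stronger (and more expensive) than what the kernel uses.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NOBJ 4
#define DEMO_END  0xffffffffu	/* marks the end of the free list */

/* demo_next[i] is the index of the free object that follows object i. */
static uint32_t demo_next[DEMO_NOBJ] = { 1, 2, 3, DEMO_END };

/* High 32 bits: tid. Low 32 bits: index of the first free object. */
static _Atomic uint64_t demo_state;	/* starts as tid 0, head 0 */

/* Returns the index of the allocated object, or -1 if the list is empty. */
int demo_alloc(void)
{
	for (;;) {
		uint64_t old = atomic_load(&demo_state);
		uint32_t tid = (uint32_t)(old >> 32);
		uint32_t head = (uint32_t)old;

		if (head == DEMO_END)
			return -1;	/* the real code takes the slow path here */

		/* New state: list head advanced, tid bumped. */
		uint64_t next = ((uint64_t)(tid + 1) << 32) | demo_next[head];

		/* Succeeds only if neither tid nor head changed since we read them. */
		if (atomic_compare_exchange_strong(&demo_state, &old, next))
			return (int)head;
		/* Someone else got in first: start over, like "goto redo". */
	}
}
```

Bumping the tid on every successful operation means that a head value which happens to reappear after an intervening alloc and free is still detected as a change.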

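After all this analysis, the concurrency handling in the slub allocation path turns out to follow the same pattern as heart_alloc_int from the start of the article. It avoids taking a lock and is very efficient when there is no contention. Simply wrapping slab_alloc_node in a spinlock at its start and end would remove the need for such intricate code, but slub allocation is inevitably a hot path in the kernel, so this optimization is well worth the effort.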