Linux Memory Management: Memory Allocators

The slob allocator

slob is a traditional K&R-style heap allocator with support for returning aligned objects. The granularity of the allocator is as little as 2 bytes, although most typical architectures require 4-byte alignment on 32-bit systems and 8-byte alignment on 64-bit systems. The slob heap is a set of linked lists of pages obtained from alloc_pages(); within each page there is a singly linked list of free blocks (slob_t). The heap grows on demand. To reduce fragmentation, heap pages are segregated into three lists: objects smaller than 256 bytes, objects smaller than 1024 bytes, and all larger objects.
Allocation from the heap first searches for a page with enough free blocks, followed by a first-fit scan within that page. Deallocation re-inserts objects into the free list in address order, so this is effectively an address-ordered first fit.
On top of this sits the kmalloc/kfree implementation. Blocks returned from kmalloc are prepended with a 4-byte header recording the kmalloc size. If kmalloc is asked for an object of PAGE_SIZE or larger, it calls alloc_pages() directly and allocates a compound page, so the page order does not need to be tracked separately. Such page allocations are detected in kfree() because PageSlab() returns false for them.
The SLAB layer is emulated on top of SLOB by simply calling constructors and destructors. Unless the SLAB_HWCACHE_ALIGN flag is set, objects allocated by the SLAB layer are 4-byte aligned; when it is set, the low-level allocator fragments blocks to create the proper alignment. Likewise, objects of PAGE_SIZE or larger are allocated by calling alloc_pages().

The slab allocator

The buddy system allocates memory in units of pages, but in practice many allocations are measured in bytes; this is where the slab allocator comes in. The slab allocator solves the problem of allocating small blocks of memory and is one of the most important players in kernel memory allocation. It still obtains its physical memory from the buddy system, but layers its own algorithm on top of those contiguous physical pages to manage small blocks. Four questions matter for slab memory:
1. How the slab allocator allocates and frees small blocks of memory;
2. How the slab allocator colours small memory objects on cache lines;
3. Whether the slab allocator optimizes slab objects per CPU;
4. How the slab allocator deals with large numbers of free objects.

The slab allocator manages memory mainly through the following APIs:

1. Creating a slab descriptor:
/*
 * Besides adding the allocated memory to the KASAN trace (for dynamic memory
 * error detection), this function also caps the size of the allocation. If
 * KASAN is not enabled, size is neither modified nor limited.
 */
void __kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
			  slab_flags_t *flags)
{
	unsigned int ok_size;
	unsigned int optimal_size;

	/*
	 * SLAB_KASAN is used to mark caches as ones that are sanitized by
	 * KASAN. Currently this flag is used in two places:
	 * 1. In slab_ksize() when calculating the size of the accessible
	 *    memory within the object.
	 * 2. In slab_common.c to prevent merging of sanitized caches.
	 */
	*flags |= SLAB_KASAN;

	//Stack trace collection is enabled unconditionally unless CONFIG_KASAN_HW_TAGS is configured, in which case it depends on kasan_flag_stacktrace.
	if (!kasan_stack_collection_enabled())
		return;

	ok_size = *size;

	/* Add the allocation metadata to the redzone */
	cache->kasan_info.alloc_meta_offset = *size;
	*size += sizeof(struct kasan_alloc_meta); /* grow size to leave room for the KASAN allocation metadata */

	/*
	 * If alloc meta doesn't fit, don't add it.
	 * This can only happen with SLAB, as it has KMALLOC_MAX_SIZE equal
	 * to KMALLOC_MAX_CACHE_SIZE and doesn't fall back to page_alloc for
	 * larger sizes.
	 * KMALLOC_MAX_SIZE is derived from MAX_ORDER and PAGE_SHIFT and, for
	 * SLAB, is capped at 2^25 bytes (32 MB). PAGE_SHIFT depends on the
	 * configured page size (12 for 4 KB pages, 14 for 16 KB, 16 for 64 KB),
	 * and MAX_ORDER defaults to 11 for 4 KB pages.
	 */
	if (*size > KMALLOC_MAX_SIZE) {
		cache->kasan_info.alloc_meta_offset = 0;
		*size = ok_size;
		/* Continue, since free meta might still fit. */
	}

	/* Only the generic mode uses free meta or flexible redzones. */
	if (!IS_ENABLED(CONFIG_KASAN_GENERIC)) {
		cache->kasan_info.free_meta_offset = KASAN_NO_FREE_META;
		return;
	}

	/*
	 * Add free meta into redzone when it's not possible to store
	 * it in the object. This is the case when:
	 * 1. Object is SLAB_TYPESAFE_BY_RCU, which means that it can
	 *    be touched after it was freed, or
	 * 2. Object has a constructor, which means it's expected to
	 *    retain its content until the next allocation, or
	 * 3. Object is too small.
	 * Otherwise cache->kasan_info.free_meta_offset = 0 is implied.
	 */
	if ((cache->flags & SLAB_TYPESAFE_BY_RCU) || cache->ctor ||
	    cache->object_size < sizeof(struct kasan_free_meta)) {
		ok_size = *size;

		cache->kasan_info.free_meta_offset = *size;
		*size += sizeof(struct kasan_free_meta);

		/* If free meta doesn't fit, don't add it. */
		if (*size > KMALLOC_MAX_SIZE) {
			cache->kasan_info.free_meta_offset = KASAN_NO_FREE_META;
			*size = ok_size;
		}
	}

	/* Calculate size with optimal redzone. */
	optimal_size = cache->object_size + optimal_redzone(cache->object_size);
	/* Limit it with KMALLOC_MAX_SIZE (relevant for SLAB only). */
	if (optimal_size > KMALLOC_MAX_SIZE)
		optimal_size = KMALLOC_MAX_SIZE;
	/* Use optimal size if the size with added metas is not large enough. */
	if (*size < optimal_size)
		*size = optimal_size;
}

int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
{
	size_t ralign = BYTES_PER_WORD;
	gfp_t gfp;
	int err;
	unsigned int size = cachep->size;

#if DEBUG
#if FORCED_DEBUG
	/*
	 * Enable redzoning and last-user accounting, except for caches with
	 * large objects, if the increased size would push the object size above
	 * the next power of two: caches with object sizes just above a power of
	 * two have a significant amount of internal fragmentation.
	 */
	if (size < 4096 || fls(size - 1) == fls(size-1 + REDZONE_ALIGN +
						2 * sizeof(unsigned long long)))
		flags |= SLAB_RED_ZONE | SLAB_STORE_USER;
	if (!(flags & SLAB_TYPESAFE_BY_RCU))
		flags |= SLAB_POISON;
#endif
#endif

	/*
	 * Check that size is in terms of words.  This is needed to avoid
	 * unaligned accesses for some archs when redzoning is used, and makes
	 * sure any on-slab bufctl's are also correctly aligned.
	 */
	size = ALIGN(size, BYTES_PER_WORD); /* round size up to word alignment */

	if (flags & SLAB_RED_ZONE) {
		ralign = REDZONE_ALIGN;
		/* If redzoning, ensure that the second redzone is suitably
		 * aligned, by adjusting the object size accordingly. */
		size = ALIGN(size, REDZONE_ALIGN); /* REDZONE_ALIGN is the larger of BYTES_PER_WORD and alignof(unsigned long long) */
	}

	/* 3) caller mandated alignment */
	if (ralign < cachep->align) {
		ralign = cachep->align;
	}
	/* disable debug if necessary */
	if (ralign > __alignof__(unsigned long long))
		flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
	/*
	 * 4) Store it.
	 */
	cachep->align = ralign;
	cachep->colour_off = cache_line_size();
	/* Offset must be a multiple of the alignment. */
	if (cachep->colour_off < cachep->align)
		cachep->colour_off = cachep->align;

	if (slab_is_available())
		gfp = GFP_KERNEL;
	else
		gfp = GFP_NOWAIT;

#if DEBUG

	/*
	 * Both debugging options require word-alignment which is calculated
	 * into align above.
	 */
	if (flags & SLAB_RED_ZONE) {
		/* add space for red zone words */
		cachep->obj_offset += sizeof(unsigned long long);
		size += 2 * sizeof(unsigned long long);
	}
	if (flags & SLAB_STORE_USER) {
		/* user store requires one word storage behind the end of
		 * the real object. But if the second red zone needs to be
		 * aligned to 64 bits, we must allow that much space.
		 */
		if (flags & SLAB_RED_ZONE)
			size += REDZONE_ALIGN;
		else
			size += BYTES_PER_WORD;
	}
#endif

	/*
	 * kasan_cache_create marks the cache with SLAB_KASAN. Based on the
	 * kernel KASAN mechanism, it turns this into a slab descriptor capable
	 * of dynamic memory error detection. Its behaviour depends on
	 * CONFIG_KASAN and CONFIG_KASAN_HW_TAGS; the latter controls stack
	 * trace collection for the heap, and when it is not configured the
	 * collection is enabled by default. The actual work is done by
	 * __kasan_cache_create.
	 */
	kasan_cache_create(cachep, &size, &flags);

	size = ALIGN(size, cachep->align);
	/*
	 * We should restrict the number of objects in a slab to implement
	 * byte sized index. Refer comment on SLAB_OBJ_MIN_SIZE definition.
	 */
	if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
		size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);

#if DEBUG
	/*
	 * To activate debug pagealloc, off-slab management is necessary
	 * requirement. In early phase of initialization, small sized slab
	 * doesn't get initialized so it would not be possible. So, we need
	 * to check size >= 256. It guarantees that all necessary small
	 * sized slab is initialized in current slab initialization sequence.
	 */
	if (debug_pagealloc_enabled_static() && (flags & SLAB_POISON) &&
		size >= 256 && cachep->object_size > cache_line_size()) {
		if (size < PAGE_SIZE || size % PAGE_SIZE == 0) {
			size_t tmp_size = ALIGN(size, PAGE_SIZE);

			if (set_off_slab_cache(cachep, tmp_size, flags)) {
				flags |= CFLGS_OFF_SLAB;
				cachep->obj_offset += tmp_size - size;
				size = tmp_size;
				goto done;
			}
		}
	}
#endif

	if (set_objfreelist_slab_cache(cachep, size, flags)) {
		flags |= CFLGS_OBJFREELIST_SLAB;
		goto done;
	}

	if (set_off_slab_cache(cachep, size, flags)) {
		flags |= CFLGS_OFF_SLAB;
		goto done;
	}

	if (set_on_slab_cache(cachep, size, flags))
		goto done;

	return -E2BIG;

done:
	cachep->freelist_size = cachep->num * sizeof(freelist_idx_t);
	cachep->flags = flags;
	cachep->allocflags = __GFP_COMP;
	if (flags & SLAB_CACHE_DMA)
		cachep->allocflags |= GFP_DMA;
	if (flags & SLAB_CACHE_DMA32)
		cachep->allocflags |= GFP_DMA32;
	if (flags & SLAB_RECLAIM_ACCOUNT)
		cachep->allocflags |= __GFP_RECLAIMABLE;
	cachep->size = size;
	cachep->reciprocal_buffer_size = reciprocal_value(size);

#if DEBUG
	/*
	 * If we're going to use the generic kernel_map_pages()
	 * poisoning, then it's going to smash the contents of
	 * the redzone and userword anyhow, so switch them off.
	 */
	if (IS_ENABLED(CONFIG_PAGE_POISONING) &&
		(cachep->flags & SLAB_POISON) &&
		is_debug_pagealloc_cache(cachep))
		cachep->flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
#endif

	if (OFF_SLAB(cachep)) {
		cachep->freelist_cache =
			kmalloc_slab(cachep->freelist_size, 0u);
	}

	err = setup_cpu_cache(cachep, gfp);
	if (err) {
		__kmem_cache_release(cachep);
		return err;
	}

	return 0;
}

struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
		slab_flags_t flags, const char *name, void (*ctor)(void *))
{
	struct kmem_cache *s;

	if (slab_nomerge)
		return NULL;

	if (ctor)
		return NULL;

	size = ALIGN(size, sizeof(void *));
	align = calculate_alignment(flags, align, size);
	size = ALIGN(size, align);
	flags = kmem_cache_flags(size, flags, name);

	if (flags & SLAB_NEVER_MERGE)
		return NULL;

	/*
	 * The checks in the loop below ensure the cache found matches the
	 * request in size, flags and alignment. The candidate's s->size must
	 * lie in [size, size + sizeof(void *)): size has already been rounded
	 * up to sizeof(void *), and the candidate's size must also be a
	 * multiple of the requested alignment.
	 */
	list_for_each_entry_reverse(s, &slab_caches, list) {
		if (slab_unmergeable(s))
			continue;

		if (size > s->size)
			continue;

		if ((flags & SLAB_MERGE_SAME) != (s->flags & SLAB_MERGE_SAME))
			continue;
		/*
		 * Check if alignment is compatible.
		 * Courtesy of Adrian Drzewiecki
		 */
		if ((s->size & ~(align - 1)) != s->size)
			continue;

		if (s->size - size >= sizeof(void *))
			continue;

		if (IS_ENABLED(CONFIG_SLAB) && align &&
			(align > s->align || s->align % align))
			continue;

		return s;
	}
	return NULL;
}

/*
 * @align: byte alignment computed by kmem_cache_create_usercopy from the
 *         original flags, align and size
 * @root_cache: kmem_cache_create_usercopy passes NULL here.
 */
static struct kmem_cache *create_cache(const char *name,
		unsigned int object_size, unsigned int align,
		slab_flags_t flags, unsigned int useroffset,
		unsigned int usersize, void (*ctor)(void *),
		struct kmem_cache *root_cache)
{
	struct kmem_cache *s;
	int err;

	if (WARN_ON(useroffset + usersize > object_size))
		useroffset = usersize = 0;

	err = -ENOMEM;
	/*
	 * Before setting up the descriptor, first allocate a zeroed block of
	 * memory. This ends up calling kmem_cache_alloc, carving a slab cache
	 * object out of the global kmem_cache pool (kmem_cache is a global
	 * struct kmem_cache * variable). In effect this creates a slab cache
	 * management descriptor.
	 */
	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
	if (!s)
		goto out;

	s->name = name;
	s->size = s->object_size = object_size;
	s->align = align;
	s->ctor = ctor;
	s->useroffset = useroffset;
	s->usersize = usersize;

	/*
	 * Set up the cache descriptor on the freshly allocated object.
	 */
	err = __kmem_cache_create(s, flags);
	if (err)
		goto out_free_cache;

	s->refcount = 1; /* the new slab descriptor starts with one reference */
	list_add(&s->list, &slab_caches); /* add the new descriptor to the global cache list */
out:
	if (err)
		return ERR_PTR(err);
	return s;

out_free_cache:
	kmem_cache_free(kmem_cache, s);
	goto out;
}

/*
 * @useroffset: offset of the usercopy region; 0 is passed in here;
 * @usersize: size of the usercopy region; 0 is passed in here.
 */
struct kmem_cache *
kmem_cache_create_usercopy(const char *name,
		  unsigned int size, unsigned int align,
		  slab_flags_t flags,
		  unsigned int useroffset, unsigned int usersize,
		  void (*ctor)(void *))
{
	struct kmem_cache *s = NULL;
	const char *cache_name;
	int err;

	mutex_lock(&slab_mutex);

	/*
	 * kmem_cache_sanity_check verifies that name is non-NULL and size is
	 * valid, and also checks whether we are in NMI/IRQ/SoftIRQ context or
	 * have BHs disabled; in that case creating a slab descriptor is not
	 * allowed, because kmem_cache_create must not be used from interrupt
	 * handlers.
	 */
	err = kmem_cache_sanity_check(name, size);
	if (err) {
		goto out_unlock;
	}

	/* Refuse requests with allocator specific flags */
	/*
	 * Flags permitted when creating a slab descriptor:
	 * SLAB_CORE_FLAGS, SLAB_RED_ZONE, SLAB_POISON, SLAB_STORE_USER,
	 * SLAB_TRACE, SLAB_CONSISTENCY_CHECKS, SLAB_MEM_SPREAD,
	 * SLAB_NOLEAKTRACE, SLAB_RECLAIM_ACCOUNT, SLAB_TEMPORARY,
	 * SLAB_ACCOUNT
	 */
	if (flags & ~SLAB_FLAGS_PERMITTED) {
		err = -EINVAL;
		goto out_unlock;
	}

	/*
	 * Some allocators will constraint the set of valid flags to a subset
	 * of all flags. We expect them to define CACHE_CREATE_MASK in this
	 * case, and we'll just provide them with a sanitized version of the
	 * passed flags.
	 */
	flags &= CACHE_CREATE_MASK; /* filter flags through SLAB_CORE_FLAGS, SLAB_DEBUG_FLAGS and SLAB_CACHE_FLAGS */

	/* Fail closed on bad usersize of useroffset values. */
	if (WARN_ON(!usersize && useroffset) ||
	    WARN_ON(size < usersize || size - usersize < useroffset))
		usersize = useroffset = 0;

	/*
	 * __kmem_cache_alias uses find_mergeable to look for an existing slab
	 * descriptor in the cache that can be reused. It computes the aligned
	 * allocation size from size and align and searches the global
	 * slab_caches list for a match; if one is found it is returned
	 * directly, otherwise NULL is returned and create_cache below
	 * allocates a new slab descriptor.
	 */
	if (!usersize)
		s = __kmem_cache_alias(name, size, align, flags, ctor);
	if (s)
		goto out_unlock;

	cache_name = kstrdup_const(name, GFP_KERNEL); /* conditionally duplicate an existing constant string; the result must be freed with kfree_const and must not be resized with krealloc */
	if (!cache_name) {
		err = -ENOMEM;
		goto out_unlock;
	}

	s = create_cache(cache_name, size,
			 calculate_alignment(flags, align, size),
			 flags, useroffset, usersize, ctor, NULL);
	if (IS_ERR(s)) {
		err = PTR_ERR(s);
		kfree_const(cache_name);
	}

out_unlock:
	mutex_unlock(&slab_mutex);

	if (err) {
		if (flags & SLAB_PANIC)
			panic("kmem_cache_create: Failed to create slab '%s'. Error %d\n",
				name, err);
		else {
			pr_warn("kmem_cache_create(%s) failed with error %d\n",
				name, err);
			dump_stack();
		}
		return NULL;
	}
	return s;
}

/*
 * @name: name of the descriptor
 * @size: size of the objects the allocator will hand out;
 * @align: byte alignment of the objects;
 * @flags: flags used when creating the descriptor.
 * The common groups are SLAB_CORE_FLAGS, SLAB_DEBUG_FLAGS and
 * SLAB_CACHE_FLAGS (together these form CACHE_CREATE_MASK):
 * (1) SLAB_CORE_FLAGS:
 * 1) SLAB_HWCACHE_ALIGN (align objects on cache lines);
 * 2) SLAB_CACHE_DMA (serve requests from memory allocated with GFP_DMA);
 * 3) SLAB_PANIC (panic if kmem_cache_create fails);
 * 4) SLAB_DESTROY_BY_RCU (defer freeing of slabs via RCU);
 * 5) SLAB_DEBUG_OBJECTS (suppress checks on free).
 * (2) SLAB_DEBUG_FLAGS:
 * 1) with CONFIG_DEBUG_SLAB enabled, this covers:
 * a) SLAB_RED_ZONE: place red zones around objects in the cache;
 * b) SLAB_POISON: poison objects;
 * c) SLAB_STORE_USER: store the last owner for bug hunting.
 * 2) with CONFIG_SLUB_DEBUG enabled, two more flags are added:
 * a) SLAB_TRACE: trace allocations and frees of the cache's memory;
 * b) SLAB_DEBUG_FREE: perform extra checks when freeing.
 * @ctor: constructor callback for objects.
 * @return: a pointer to the cache on success, NULL on failure. Cannot be
 *          called from interrupt context, but may be interrupted. The ctor
 *          is run whenever new pages are allocated for the cache.
 */
struct kmem_cache *
kmem_cache_create(const char *name, unsigned int size, unsigned int align,
		slab_flags_t flags, void (*ctor)(void *))
{
	return kmem_cache_create_usercopy(name, size, align, flags, 0, 0,
					  ctor);
}
EXPORT_SYMBOL(kmem_cache_create);

The slub allocator
