The SLOB allocator
slob is a traditional K&R-style allocator that supports returning aligned objects. The allocator's granularity is 2 bytes, although most typical architectures require 4 bytes on 32-bit systems and 8 bytes on 64-bit systems. The slob heap is a set of page lists allocated with alloc_pages(); within each page there is a singly linked list of free blocks (slob_t). The heap grows on demand, and to reduce fragmentation the slob heap pages are split across three lists: objects smaller than 256 bytes, objects smaller than 1024 bytes, and everything else.
Allocating from the heap first searches for a page with enough free blocks, then performs a first-fit scan within that page. Freeing re-inserts the object into the free list in address order, so this is effectively an address-ordered first fit.
On top of this sit the kmalloc/kfree implementations. Blocks returned from kmalloc are prepended with a 4-byte header recording the kmalloc size. If kmalloc is asked for a block of PAGE_SIZE or larger, it calls alloc_pages() directly to allocate a compound page, so the page order does not have to be tracked separately. Such blocks are detected in kfree() because PageSlab() returns false for them.
The SLAB layer is implemented on top of SLOB by simply invoking constructors and destructors. Unless the SLAB_HWCACHE_ALIGN flag is set, SLAB-allocated objects use 4-byte alignment; when it is set, the low-level allocator fragments blocks to create the proper alignment. Likewise, blocks of PAGE_SIZE or larger are allocated with alloc_pages().
The slab allocator
The buddy system allocates memory in units of pages, but in practice many allocations are byte-granular, and that is where the slab allocator comes in. It mainly solves the problem of allocating small blocks of memory and plays a very important role in memory allocation. The slab allocator still relies on the buddy system for the actual physical memory; it merely implements its own algorithm on top of contiguous physical pages to manage small blocks. For slab memory, the main points to consider are:
1. how the slab allocator allocates and frees small blocks of memory;
2. how the slab allocator colours small memory objects;
3. whether the slab allocator optimizes slab objects per-CPU;
4. how the slab allocator deals with large numbers of free objects.
The slab allocator manages memory mainly through the following APIs:
1. Creating a slab descriptor:
/*
* Besides adding the allocated memory to the trace (used for dynamic
* memory error detection), this function also caps the allocated size.
* If KASAN is not enabled, size is neither modified nor limited.
*/
void __kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
slab_flags_t *flags)
{
unsigned int ok_size;
unsigned int optimal_size;
/*
* SLAB_KASAN is used to mark caches as ones that are sanitized by
* KASAN. Currently this flag is used in two places:
* 1. In slab_ksize() when calculating the size of the accessible
* memory within the object.
* 2. In slab_common.c to prevent merging of sanitized caches.
*/
*flags |= SLAB_KASAN;
//heap stack-trace collection: enabled by default when CONFIG_KASAN_HW_TAGS is not configured, otherwise it depends on kasan_flag_stacktrace
if (!kasan_stack_collection_enabled())
return;
ok_size = *size;
/* Place the allocation metadata in the redzone */
cache->kasan_info.alloc_meta_offset = *size;
*size += sizeof(struct kasan_alloc_meta);//grow size so there is room for KASAN's allocation metadata
/*
* If alloc meta doesn't fit, don't add it.
* This can only happen with SLAB, as it has KMALLOC_MAX_SIZE equal
* to KMALLOC_MAX_CACHE_SIZE and doesn't fall back to page_alloc for
* larger sizes.
* KMALLOC_MAX_SIZE is computed from MAX_ORDER and PAGE_SHIFT; the
* resulting shift never exceeds 25. PAGE_SHIFT is fixed by the
* configured page size, and the default MAX_ORDER depends on it:
* 11 for 4K pages, 12 for 16K pages, 14 for 64K pages; beyond 16K
* pages the shift is clamped to 25. A shift of 25 means at most
* 2^25 bytes (32M) can be allocated.
*/
if (*size > KMALLOC_MAX_SIZE) {
cache->kasan_info.alloc_meta_offset = 0;
*size = ok_size;
/* Continue, since free meta might still fit. */
}
/* Only the generic mode uses free meta or flexible redzones. */
if (!IS_ENABLED(CONFIG_KASAN_GENERIC)) {
cache->kasan_info.free_meta_offset = KASAN_NO_FREE_META;
return;
}
/*
* Add free meta into redzone when it's not possible to store
* it in the object. This is the case when:
* 1. Object is SLAB_TYPESAFE_BY_RCU, which means that it can
* be touched after it was freed, or
* 2. Object has a constructor, which means it's expected to
* retain its content until the next allocation, or
* 3. Object is too small.
* Otherwise cache->kasan_info.free_meta_offset = 0 is implied.
*/
if ((cache->flags & SLAB_TYPESAFE_BY_RCU) || cache->ctor ||
cache->object_size < sizeof(struct kasan_free_meta)) {
ok_size = *size;
cache->kasan_info.free_meta_offset = *size;
*size += sizeof(struct kasan_free_meta);
/* If free meta doesn't fit, don't add it. */
if (*size > KMALLOC_MAX_SIZE) {
cache->kasan_info.free_meta_offset = KASAN_NO_FREE_META;
*size = ok_size;
}
}
/* Calculate size with optimal redzone. */
optimal_size = cache->object_size + optimal_redzone(cache->object_size);
/* Limit it with KMALLOC_MAX_SIZE (relevant for SLAB only). */
if (optimal_size > KMALLOC_MAX_SIZE)
optimal_size = KMALLOC_MAX_SIZE;
/* Use optimal size if the size with added metas is not large enough. */
if (*size < optimal_size)
*size = optimal_size;
}
int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
{
size_t ralign = BYTES_PER_WORD;
gfp_t gfp;
int err;
unsigned int size = cachep->size;
#if DEBUG
#if FORCED_DEBUG
/*
* Enable redzoning and last-user accounting, except for caches with
* large objects, if the resulting size would push the object size past
* the next power of two: caches with object sizes just above a power
* of two have a significant amount of internal fragmentation.
*/
if (size < 4096 || fls(size - 1) == fls(size-1 + REDZONE_ALIGN +
2 * sizeof(unsigned long long)))
flags |= SLAB_RED_ZONE | SLAB_STORE_USER;
if (!(flags & SLAB_TYPESAFE_BY_RCU))
flags |= SLAB_POISON;
#endif
#endif
/*
* Check that size is in terms of words. This is needed to avoid
* unaligned accesses for some archs when redzoning is used, and makes
* sure any on-slab bufctl's are also correctly aligned.
*/
size = ALIGN(size, BYTES_PER_WORD);//round the size up to a whole number of words
if (flags & SLAB_RED_ZONE) {
ralign = REDZONE_ALIGN;
/* If redzoning, ensure that the second redzone is suitably
* aligned, by adjusting the object size accordingly. */
size = ALIGN(size, REDZONE_ALIGN);//REDZONE_ALIGN is the larger of BYTES_PER_WORD and the alignment of unsigned long long
}
/* 3) caller mandated alignment */
if (ralign < cachep->align) {
ralign = cachep->align;
}
/* disable debug if necessary */
if (ralign > __alignof__(unsigned long long))
flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
/*
* 4) Store it.
*/
cachep->align = ralign;
cachep->colour_off = cache_line_size();
/* Offset must be a multiple of the alignment. */
if (cachep->colour_off < cachep->align)
cachep->colour_off = cachep->align;
if (slab_is_available())
gfp = GFP_KERNEL;
else
gfp = GFP_NOWAIT;
#if DEBUG
/*
* Both debugging options require word-alignment which is calculated
* into align above.
*/
if (flags & SLAB_RED_ZONE) {
/* add space for red zone words */
cachep->obj_offset += sizeof(unsigned long long);
size += 2 * sizeof(unsigned long long);
}
if (flags & SLAB_STORE_USER) {
/* user store requires one word storage behind the end of
* the real object. But if the second red zone needs to be
* aligned to 64 bits, we must allow that much space.
*/
if (flags & SLAB_RED_ZONE)
size += REDZONE_ALIGN;
else
size += BYTES_PER_WORD;
}
#endif
/*
* kasan_cache_create marks the descriptor with SLAB_KASAN. Built on the
* kernel's KASAN mechanism, it prepares a slab descriptor capable of
* dynamic memory error detection. Its behaviour depends on the kernel
* options CONFIG_KASAN and CONFIG_KASAN_HW_TAGS; the latter mainly
* controls stack-trace collection for heap regions and whether KASAN
* checking is enabled, and when it is not configured both of those
* switches default to on. The actual work is done by
* __kasan_cache_create.
*/
kasan_cache_create(cachep, &size, &flags);
size = ALIGN(size, cachep->align);
/*
* We should restrict the number of objects in a slab to implement
* byte sized index. Refer comment on SLAB_OBJ_MIN_SIZE definition.
*/
if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);
#if DEBUG
/*
* To activate debug pagealloc, off-slab management is necessary
* requirement. In early phase of initialization, small sized slab
* doesn't get initialized so it would not be possible. So, we need
* to check size >= 256. It guarantees that all necessary small
* sized slab is initialized in current slab initialization sequence.
*/
if (debug_pagealloc_enabled_static() && (flags & SLAB_POISON) &&
size >= 256 && cachep->object_size > cache_line_size()) {
if (size < PAGE_SIZE || size % PAGE_SIZE == 0) {
size_t tmp_size = ALIGN(size, PAGE_SIZE);
if (set_off_slab_cache(cachep, tmp_size, flags)) {
flags |= CFLGS_OFF_SLAB;
cachep->obj_offset += tmp_size - size;
size = tmp_size;
goto done;
}
}
}
#endif
if (set_objfreelist_slab_cache(cachep, size, flags)) {
flags |= CFLGS_OBJFREELIST_SLAB;
goto done;
}
if (set_off_slab_cache(cachep, size, flags)) {
flags |= CFLGS_OFF_SLAB;
goto done;
}
if (set_on_slab_cache(cachep, size, flags))
goto done;
return -E2BIG;
done:
cachep->freelist_size = cachep->num * sizeof(freelist_idx_t);
cachep->flags = flags;
cachep->allocflags = __GFP_COMP;
if (flags & SLAB_CACHE_DMA)
cachep->allocflags |= GFP_DMA;
if (flags & SLAB_CACHE_DMA32)
cachep->allocflags |= GFP_DMA32;
if (flags & SLAB_RECLAIM_ACCOUNT)
cachep->allocflags |= __GFP_RECLAIMABLE;
cachep->size = size;
cachep->reciprocal_buffer_size = reciprocal_value(size);
#if DEBUG
/*
* If we're going to use the generic kernel_map_pages()
* poisoning, then it's going to smash the contents of
* the redzone and userword anyhow, so switch them off.
*/
if (IS_ENABLED(CONFIG_PAGE_POISONING) &&
(cachep->flags & SLAB_POISON) &&
is_debug_pagealloc_cache(cachep))
cachep->flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
#endif
if (OFF_SLAB(cachep)) {
cachep->freelist_cache =
kmalloc_slab(cachep->freelist_size, 0u);
}
err = setup_cpu_cache(cachep, gfp);
if (err) {
__kmem_cache_release(cachep);
return err;
}
return 0;
}
struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
slab_flags_t flags, const char *name, void (*ctor)(void *))
{
struct kmem_cache *s;
if (slab_nomerge)
return NULL;
if (ctor)
return NULL;
size = ALIGN(size, sizeof(void *));
align = calculate_alignment(flags, align, size);
size = ALIGN(size, align);
flags = kmem_cache_flags(size, flags, name);
if (flags & SLAB_NEVER_MERGE)
return NULL;
/*
* In the loop below, each if ensures that the cache found matches the
* requested one in size, flags and alignment. The candidate's s->size
* must lie between size and size + sizeof(void *); this works because
* size has already been aligned to sizeof(void *). The candidate's
* alignment must also not exceed the requested alignment.
*/
list_for_each_entry_reverse(s, &slab_caches, list) {
if (slab_unmergeable(s))
continue;
if (size > s->size)
continue;
if ((flags & SLAB_MERGE_SAME) != (s->flags & SLAB_MERGE_SAME))
continue;
/*
* Check if alignment is compatible.
* Courtesy of Adrian Drzewiecki
*/
if ((s->size & ~(align - 1)) != s->size)
continue;
if (s->size - size >= sizeof(void *))
continue;
if (IS_ENABLED(CONFIG_SLAB) && align &&
(align > s->align || s->align % align))
continue;
return s;
}
return NULL;
}
/*
* @align: the alignment computed by kmem_cache_create_usercopy from the
* original flags, align and size
* @root_cache: kmem_cache_create_usercopy passes NULL here.
*/
static struct kmem_cache *create_cache(const char *name,
unsigned int object_size, unsigned int align,
slab_flags_t flags, unsigned int useroffset,
unsigned int usersize, void (*ctor)(void *),
struct kmem_cache *root_cache)
{
struct kmem_cache *s;
int err;
if (WARN_ON(useroffset + usersize > object_size))
useroffset = usersize = 0;
err = -ENOMEM;
/*
* Before building the descriptor, allocate a zeroed chunk of memory.
* This is really kmem_cache_alloc carving a slab cache object out of
* the global kmem_cache pool (kmem_cache here is a global
* struct kmem_cache * variable); in effect it creates a slab cache
* management descriptor.
*/
s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
if (!s)
goto out;
s->name = name;
s->size = s->object_size = object_size;
s->align = align;
s->ctor = ctor;
s->useroffset = useroffset;
s->usersize = usersize;
/*
* Build the cache descriptor from the freshly allocated slab cache object
*/
err = __kmem_cache_create(s, flags);
if (err)
goto out_free_cache;
s->refcount = 1;//the new slab descriptor starts with one reference
list_add(&s->list, &slab_caches);//add the new slab cache descriptor to the global slab_caches list
out:
if (err)
return ERR_PTR(err);
return s;
out_free_cache:
kmem_cache_free(kmem_cache, s);
goto out;
}
/*
* @useroffset: offset of the usercopy region; 0 is passed in here;
* @usersize: size of the usercopy region; 0 is passed in here.
*/
struct kmem_cache *
kmem_cache_create_usercopy(const char *name,
unsigned int size, unsigned int align,
slab_flags_t flags,
unsigned int useroffset, unsigned int usersize,
void (*ctor)(void *))
{
struct kmem_cache *s = NULL;
const char *cache_name;
int err;
mutex_lock(&slab_mutex);
/*
* kmem_cache_sanity_check verifies that name is non-NULL and size is
* valid, and also that we are not in NMI/IRQ/SoftIRQ context or running
* with BHs disabled; creating a slab descriptor there is not allowed,
* because kmem_cache_create must not be used inside interrupt handlers.
*/
err = kmem_cache_sanity_check(name, size);
if (err) {
goto out_unlock;
}
/* Refuse requests with allocator specific flags */
/*
* Flags allowed when creating a slab descriptor:
* SLAB_CORE_FLAGS、SLAB_RED_ZONE、SLAB_POISON、SLAB_STORE_USER
* SLAB_TRACE、SLAB_CONSISTENCY_CHECKS、SLAB_MEM_SPREAD
* SLAB_NOLEAKTRACE、SLAB_RECLAIM_ACCOUNT、SLAB_TEMPORARY
* SLAB_ACCOUNT
*/
if (flags & ~SLAB_FLAGS_PERMITTED) {
err = -EINVAL;
goto out_unlock;
}
/*
* Some allocators will constraint the set of valid flags to a subset
* of all flags. We expect them to define CACHE_CREATE_MASK in this
* case, and we'll just provide them with a sanitized version of the
* passed flags.
*/
flags &= CACHE_CREATE_MASK;//filter flags against SLAB_CORE_FLAGS, SLAB_DEBUG_FLAGS and SLAB_CACHE_FLAGS
/* Fail closed on bad usersize of useroffset values. */
if (WARN_ON(!usersize && useroffset) ||
WARN_ON(size < usersize || size - usersize < useroffset))
usersize = useroffset = 0;
/*
* __kmem_cache_alias uses find_mergeable to look for an existing slab
* descriptor in the cache that can be reused. It computes the aligned
* size a descriptor would need from size and align and searches the
* global slab_caches list for a match; if one is found it is returned
* right away, otherwise create_cache below allocates a new slab
* descriptor.
*/
if (!usersize)
s = __kmem_cache_alias(name, size, align, flags, ctor);
if (s)
goto out_unlock;
cache_name = kstrdup_const(name, GFP_KERNEL);//conditionally duplicate an existing const string; the result must be freed with kfree_const and must not be reallocated via krealloc
if (!cache_name) {
err = -ENOMEM;
goto out_unlock;
}
s = create_cache(cache_name, size,
calculate_alignment(flags, align, size),
flags, useroffset, usersize, ctor, NULL);
if (IS_ERR(s)) {
err = PTR_ERR(s);
kfree_const(cache_name);
}
out_unlock:
mutex_unlock(&slab_mutex);
if (err) {
if (flags & SLAB_PANIC)
panic("kmem_cache_create: Failed to create slab '%s'. Error %d\n",
name, err);
else {
pr_warn("kmem_cache_create(%s) failed with error %d\n",
name, err);
dump_stack();
}
return NULL;
}
return s;
}
/*
* @name: name of the descriptor
* @size: size of the memory the allocator hands out;
* @align: memory alignment in bytes;
* @flags: flags used to create the descriptor.
* The common ones are SLAB_CORE_FLAGS, SLAB_DEBUG_FLAGS and
* SLAB_CACHE_FLAGS (together these three make up CACHE_CREATE_MASK):
* (1) SLAB_CORE_FLAGS:
* 1) SLAB_HWCACHE_ALIGN (align objects on cache lines);
* 2) SLAB_CACHE_DMA (serve allocation requests with GFP_DMA memory);
* 3) SLAB_PANIC (panic if kmem_cache_create fails);
* 4) SLAB_DESTROY_BY_RCU (defer freeing of slabs via RCU);
* 5) SLAB_DEBUG_OBJECTS (suppress checks on free).
* (2) SLAB_DEBUG_FLAGS:
* 1) with CONFIG_DEBUG_SLAB enabled, this flag covers:
* a) SLAB_RED_ZONE: place red zones around cached objects;
* b) SLAB_POISON: poison objects;
* c) SLAB_STORE_USER: store the last owner for bug hunting.
* 2) with CONFIG_SLUB_DEBUG, two more flags are added:
* a) SLAB_TRACE: trace the cache so allocations and frees can be traced;
* b) SLAB_DEBUG_FREE: perform extra checks when freeing.
* @ctor: object constructor hook.
* @return: a pointer to the cache on success, NULL on failure. Must not
* be called inside an interrupt, but may itself be interrupted. The
* ctor hook runs whenever the cache allocates new pages.
*/
struct kmem_cache *
kmem_cache_create(const char *name, unsigned int size, unsigned int align,
slab_flags_t flags, void (*ctor)(void *))
{
return kmem_cache_create_usercopy(name, size, align, flags, 0, 0,
ctor);
}
EXPORT_SYMBOL(kmem_cache_create);