[Original work] Please do not reproduce without permission.
The Linux kernel uses a buffer allocation and management scheme known as the "slab" allocator. Under this scheme, every important data structure gets its own dedicated queue of buffers, and each data structure may have corresponding constructor and destructor functions.
A key characteristic of slab management is that the buffer queue for each kind of object is not made up of individual objects directly, but of a chain of larger blocks, the slabs, each of which holds a number of objects of the same kind. Objects fall roughly into two categories, large and small: a small object is one of which several fit into a single page, and most data structures in the kernel are small objects of this kind.
The slab queue built for each kind of object has a queue head, whose control structure is a kmem_cache_t. These queue heads themselves live on slabs: the system keeps one master slab queue whose objects are the slab queue heads of all the other object types, and whose own head is also a kmem_cache_t structure, called cache_cache.
This forms a layered, tree-like structure.
When a data structure is fairly large and no longer counts as a small object, the slab layout differs slightly: the slab's control structure is detached from the slab and stored, together with others, on a separate slab. Since the control structure kmem_slab_t contains a pointer to the first object on the corresponding slab, the logic stays the same.
In addition, when an object's size is exactly 1/2, 1/4, or 1/8 of a physical page, keeping the per-object link pointers right next to the objects would waste a significant amount of slab space, so in that case the link pointers are likewise detached from the slab and stored together elsewhere.
The Linux kernel also provides general-purpose buffer pools that are partitioned by size, much like the physical page allocator, yet managed in the slab fashion; they are referred to as "slab_cache". They work much like cache_cache, except that the top level is not a queue but an array of structures, each element of which points to a different slab queue. These slab queues differ only in the size of the objects they hold, from 32, 64, 128 bytes up to 128 KB. Buffers are allocated from and returned to these general-purpose pools with kmalloc and kfree.
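For orientation, here is a minimal sketch of how kernel code typically uses the general-purpose pools; the structure and function names below are invented for illustration and are not part of the code discussed in this article:

#include <linux/slab.h>

struct foo_ctx {                        /* hypothetical example structure */
        int id;
        char name[32];
};

static struct foo_ctx *foo_create(void)
{
        /* The request is rounded up to the nearest kmalloc size class (e.g. 64 bytes). */
        struct foo_ctx *ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);

        if (!ctx)
                return NULL;
        ctx->id = 0;
        ctx->name[0] = '\0';
        return ctx;
}

static void foo_destroy(struct foo_ctx *ctx)
{
        kfree(ctx);                     /* returns the buffer to its size class's slab queue */
}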
The kernel file mm/slab_common.c defines two globals: LIST_HEAD(slab_caches) and struct kmem_cache *kmem_cache. Every kmem_cache object that is created gets linked into the global slab_caches list, and cat /proc/slabinfo lists all the slab caches on it.
kmem_cache is the first kmem_cache, statically initialized by the system; every kmem_cache structure allocated afterwards comes from this global kmem_cache slab.
The global kmem_cache is itself a structure of type kmem_cache, and it is the very first one, so how could it be allocated with kmem_cache_create? It cannot be: the first kmem_cache is set up statically at system initialization via create_boot_cache. Once it exists, every later kmem_cache is an object allocated from this global kmem_cache.
After the first kmem_cache object is set up, __kmem_cache_create performs a series of calculations to determine the best slab layout: how many pages each slab consists of and how many buffers it is divided into; whether the slab's control structure (kmem_slab_t) should be stored off-slab in a separate place or at the tail of each slab; whether each buffer's link pointer should be stored off-slab or right next to its buffer on the slab; how many "colour" offsets there are; and so on.
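As a rough, back-of-the-envelope example of these calculations (the numbers are purely illustrative; this assumes a 4096-byte page, 64-byte alignment, an on-slab freelist with one-byte indices as in this kernel generation, and no debug options; it only approximates what cache_estimate/calculate_slab_order do and ignores corner cases):

/* slab_geometry.c - illustrative arithmetic only, not kernel code */
#include <stdio.h>

#define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
        unsigned int page = 4096, size = 256, align = 64, idx = 1; /* idx: bytes per freelist entry */
        unsigned int num = page / (size + idx);                    /* objects per order-0 slab      */
        unsigned int freelist = ALIGN_UP(num * idx, align);        /* on-slab freelist, aligned     */
        unsigned int left_over = page - num * size - freelist;     /* unused tail of the slab       */

        /* Taking colour_off as the 64-byte cache line, this prints:
         * num=15 freelist=64 left_over=192 colours=3
         */
        printf("num=%u freelist=%u left_over=%u colours=%u\n",
               num, freelist, left_over, left_over / align);
        return 0;
}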
Phase 1: Creating the first kmem_cache
First, the first kmem_cache object, i.e. the global kmem_cache, is set up. At this point the kmem_cache slab cache exists, but no page has actually been allocated for it yet; pages are only allocated when a buffer is actually allocated from this slab cache.
kmem_cache_init->create_boot_cache->__kmem_cache_create.
kmem_cache_init does quite a lot more than allocate the first kmem_cache: it also allocates the first kmalloc caches and creates the remaining ones, and it copies the statically initialized array data into the corresponding slots of kmem_cache->array and of the arrays in kmalloc_caches.
void __init
kmem_cache_init(void)
{
int i;
BUILD_BUG_ON(sizeof(((struct page *)NULL)->lru) <
sizeof(struct rcu_head));
kmem_cache = &kmem_cache_boot;
//The first kmem_cache is statically allocated.
setup_node_pointer(kmem_cache);
//kmem_cache->node = &kmem_cache->array[nr_cpu_ids];
if (num_possible_nodes() == 1)
use_alien_caches = 0;
for (i = 0; i < NUM_INIT_LISTS; i++)
kmem_cache_node_init(&init_kmem_cache_node[i]);
//Initialize the kmem_cache_node structures; a UMA machine has only one node.
set_up_node(kmem_cache, CACHE_CACHE);
//Set kmem_cache->node[node] to the corresponding &init_kmem_cache_node[] entry.
/*
* Fragmentation resistance on low memory - only use bigger
* page orders on machines with more than 32MB of memory if
* not overridden on the command line.
*/
if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
slab_max_order = SLAB_MAX_ORDER_HI;
/* Bootstrap is tricky, because several objects are allocated
* from caches that do not exist yet:
* 1) initialize the kmem_cache cache: it contains the struct
* kmem_cache structures of all caches, except kmem_cache itself:
* kmem_cache is statically allocated.
* Initially an __init data area is used for the head array and the
* kmem_cache_node structures, it's replaced with a kmalloc allocated
* array at the end of the bootstrap.
* 2) Create the first kmalloc cache.
* The struct kmem_cache for the new cache is allocated normally.
* An __init data area is used for the head array.
* 3) Create the remaining kmalloc caches, with minimally sized
* head arrays.
* 4) Replace the __init data head arrays for kmem_cache and the first
* kmalloc cache with kmalloc allocated arrays.
* 5) Replace the __init data for kmem_cache_node for kmem_cache and
* the other cache's with kmalloc allocated memory.
* 6) Resize the head arrays of the kmalloc caches to their final sizes.
*/
/* 1) create the kmem_cache */
/*
* struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
*/
create_boot_cache(kmem_cache, "kmem_cache",
offsetof(struct kmem_cache, array[nr_cpu_ids]) +
nr_node_ids * sizeof(struct kmem_cache_node *),
SLAB_HWCACHE_ALIGN);
//Create the first kmem_cache.
list_add(&kmem_cache->list, &slab_caches);
//Link kmem_cache into the global slab_caches list.
/* 2+3) create the kmalloc caches */
/*
* Initialize the caches that provide memory for the array cache and the
* kmem_cache_node structures first. Without this, further allocations will
* bug.
*/
kmalloc_caches[INDEX_AC] = create_kmalloc_cache("kmalloc-ac",
kmalloc_size(INDEX_AC), ARCH_KMALLOC_FLAGS);
//Create the kmem_cache used for the kmalloc array caches (struct arraycache_init).
if (INDEX_AC != INDEX_NODE)
kmalloc_caches[INDEX_NODE] =
create_kmalloc_cache("kmalloc-node",
kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
//Create the kmalloc cache for struct kmem_cache_node.
slab_early_init = 0;
/* 4) Replace the bootstrap head arrays */
{
struct array_cache *ptr;
ptr = kmalloc(sizeof(struct arraycache_init), GFP_NOWAIT);
memcpy(ptr, cpu_cache_get(kmem_cache),
sizeof(struct arraycache_init));
/*
* Do not assume that spinlocks can be initialized via memcpy:
*/
spin_lock_init(&ptr->lock);
kmem_cache->array[smp_processor_id()] = ptr;
ptr = kmalloc(sizeof(struct arraycache_init), GFP_NOWAIT);
BUG_ON(cpu_cache_get(kmalloc_caches[INDEX_AC])
!= &initarray_generic.cache);
memcpy(ptr, cpu_cache_get(kmalloc_caches[INDEX_AC]),
sizeof(struct arraycache_init));
/*
* Do not assume that spinlocks can be initialized via memcpy:
*/
spin_lock_init(&ptr->lock);
kmalloc_caches[INDEX_AC]->array[smp_processor_id()] = ptr;
//This block kmallocs new array_cache objects to replace the bootstrap __init arrays.
}
/* 5) Replace the bootstrap kmem_cache_node */
{
int nid;
for_each_online_node(nid) {
init_list(kmem_cache, &init_kmem_cache_node[CACHE_CACHE + nid], nid);
//Replace kmem_cache's bootstrap kmem_cache_node.
init_list(kmalloc_caches[INDEX_AC],
&init_kmem_cache_node[SIZE_AC + nid], nid);
//Replace the bootstrap kmem_cache_node of the kmalloc array cache.
if (INDEX_AC != INDEX_NODE) {
init_list(kmalloc_caches[INDEX_NODE],
&init_kmem_cache_node[SIZE_NODE + nid], nid);
//Replace the bootstrap kmem_cache_node of the kmalloc node cache.
}
}
}
/* Note: normal kmem_cache creation is only possible once the first kmem_cache, the kmem_cache for the kmalloc array caches, and the kmem_cache for kmem_cache_node have all been created. */
create_kmalloc_caches(ARCH_KMALLOC_FLAGS);
//Everything is in place; the kmem_caches needed by kmalloc can now be created.
}
Now let's look at how the first kmem_cache is created.
void __init
create_boot_cache(struct kmem_cache *s, const char *name, size_t size,
unsigned long flags)
{
int err;
s->name = name;
s->size = s->object_size = size;
s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
err = __kmem_cache_create(s, flags);
//The function that actually sets up the kmem_cache.
if (err)
panic("Creation of kmalloc slab %s size=%zu failed. Reason %d\n",
name, size, err);
s->refcount = -1; /* Exempt from merging for now */
}
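create_boot_cache leaves the alignment decision to calculate_alignment; roughly, it behaves like the following sketch (based on mm/slab_common.c of this kernel generation, with debug-related handling omitted, so details may differ):

unsigned long calculate_alignment(unsigned long flags,
                unsigned long align, unsigned long size)
{
        /*
         * For SLAB_HWCACHE_ALIGN, start from the cache line size but keep
         * halving it while the object still fits twice into it, so that
         * small objects remain tightly packed.
         */
        if (flags & SLAB_HWCACHE_ALIGN) {
                unsigned long ralign = cache_line_size();
                while (size <= ralign / 2)
                        ralign /= 2;
                align = max(align, ralign);
        }

        if (align < ARCH_SLAB_MINALIGN)
                align = ARCH_SLAB_MINALIGN;

        /* The result must at least allow pointer-sized accesses. */
        return ALIGN(align, sizeof(void *));
}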
To keep the listing readable, some debug code and less important code has been removed below.
int
__kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
{
size_t left_over, freelist_size, ralign;
gfp_t gfp;
int err;
size_t size = cachep->size;
//Below is the alignment handling.
/*
* Check that size is in terms of words. This is needed to avoid
* unaligned accesses for some archs when redzoning is used, and makes
* sure any on-slab bufctl's are also correctly aligned.
*/
if (size & (BYTES_PER_WORD - 1)) {
//Round size up to a multiple of BYTES_PER_WORD.
size += (BYTES_PER_WORD - 1);
size &= ~(BYTES_PER_WORD - 1);
}
/*
* Redzoning and user store require word alignment or possibly larger.
* Note this will be overridden by architecture or caller mandated
* alignment if either is greater than BYTES_PER_WORD.
*/
if (flags & SLAB_STORE_USER)
ralign = BYTES_PER_WORD;
if (flags & SLAB_RED_ZONE) {
ralign = REDZONE_ALIGN;
/* If redzoning, ensure that the second redzone is suitably
* aligned, by adjusting the object size accordingly. */
size += REDZONE_ALIGN - 1;
size &= ~(REDZONE_ALIGN - 1);
}
/* 3) caller mandated alignment */
if (ralign < cachep->align) {
ralign = cachep->align;
}
/* disable debug if necessary */
if (ralign > __alignof__(unsigned long long))
flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
/*
* 4) Store it.
*/
cachep->align = ralign;
if (slab_is_available())
gfp = GFP_KERNEL;
else
gfp = GFP_NOWAIT;
setup_node_pointer(cachep);
/*
* Determine if the slab management is 'on' or 'off' slab.
* (bootstrapping cannot cope with offslab caches so don't do
* it too early on. Always use on-slab management when
* SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
*/
if ((size >= (PAGE_SIZE >> 5)) && !slab_early_init &&
!(flags & SLAB_NOLEAKTRACE))
//When size >= PAGE_SIZE/32 (128 bytes with 4 KB pages), move the slab management structure off-slab.
/*
* Size is large, assume best to place the slab management obj
* off-slab (should allow better packing of objs).
*/
flags |= CFLGS_OFF_SLAB;
size = ALIGN(size, cachep->align);
/*
* We should restrict the number of objects in a slab to implement
* byte sized index. Refer comment on SLAB_OBJ_MIN_SIZE definition.
*/
if (FREELIST_BYTE_INDEX && size < SLAB_OBJ_MIN_SIZE)
size = ALIGN(SLAB_OBJ_MIN_SIZE, cachep->align);
left_over = calculate_slab_order(cachep, size, cachep->align, flags);
//Compute the page order each slab uses and how many objects one slab holds; returns the leftover space.
if (!cachep->num)
return -E2BIG;
freelist_size = calculate_freelist_size(cachep->num, cachep->align);
/*
* If the slab has been placed off-slab, and we have enough space then
* move it on-slab. This is at the expense of any extra colouring.
*/
if (flags & CFLGS_OFF_SLAB && left_over >= freelist_size) {
//If the leftover space can hold the freelist, don't use OFF_SLAB after all.
flags &= ~CFLGS_OFF_SLAB;
left_over -= freelist_size;
}
if (flags & CFLGS_OFF_SLAB) {
/* really off slab. No need for manual alignment */
freelist_size = calculate_freelist_size(cachep->num, 0);
#ifdef CONFIG_PAGE_POISONING
/* If we're going to use the generic kernel_map_pages()
* poisoning, then it's going to smash the contents of
* the redzone and userword anyhow, so switch them off.
*/
if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
#endif
}
cachep->colour_off = cache_line_size();
/* Offset must be a multiple of the alignment. */
if (cachep->colour_off < cachep->align)
cachep->colour_off = cachep->align;
//colour_off must be a multiple of the alignment.
cachep->colour = left_over / cachep->colour_off;
cachep->freelist_size = freelist_size;
cachep->flags = flags;
cachep->allocflags = __GFP_COMP;
if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
cachep->allocflags |= GFP_DMA;
cachep->size = size;
cachep->reciprocal_buffer_size = reciprocal_value(size);
if (flags & CFLGS_OFF_SLAB) {
cachep->freelist_cache = kmalloc_slab(freelist_size, 0u);
//For an OFF_SLAB cache, pick the kmalloc cache that will hold its freelists.
/*
* This is a possibility for one of the kmalloc_{dma,}_caches.
* But since we go off slab only for object size greater than
* PAGE_SIZE/8, and kmalloc_{dma,}_caches get created
* in ascending order,this should not happen at all.
* But leave a BUG_ON for some lucky dude.
*/
BUG_ON(ZERO_OR_NULL_PTR(cachep->freelist_cache));
}
err = setup_cpu_cache(cachep, gfp);
if (err) {
__kmem_cache_shutdown(cachep);
return err;
}
if (flags & SLAB_DEBUG_OBJECTS) {
/*
* Would deadlock through slab_destroy()->call_rcu()->
* debug_object_activate()->kmem_cache_alloc().
*/
WARN_ON_ONCE(flags & SLAB_DESTROY_BY_RCU);
slab_set_debugobj_lock_classes(cachep);
} else if (!OFF_SLAB(cachep) && !(flags & SLAB_DESTROY_BY_RCU))
on_slab_lock_classes(cachep);
return 0;
}
At this point the first kmem_cache has been initialized, but no page has been allocated to it yet. The next time a new kmem_cache has to be created, a page will be allocated for this first kmem_cache and an object will be carved out of that page.
Phase 2: Allocating the kmalloc caches from the kmem_cache slab
Now let's see how the kmem_cache set up in phase 1 is used to create the kmalloc caches.
The allocation happens in two steps: first a kmem_cache object is allocated from kmem_cache; then create_boot_cache initializes and sets up the freshly allocated kmem_cache object, exactly as in the creation of kmem_cache itself, so that part is not repeated here.
create_kmalloc_cache->kmem_cache_zalloc->kmem_cache_alloc->slab_alloc->__do_cache_alloc->____cache_alloc.
create_kmalloc_cache->create_boot_cache.
Rather than stepping through every frame of these call chains, here is first a rough sketch of create_kmalloc_cache itself, followed by the key function, slab_alloc, which allocates a struct kmem_cache object from the kmem_cache slab.
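A sketch of create_kmalloc_cache, showing the two steps described above (based on mm/slab_common.c of this kernel generation; details and error handling may differ):

struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
                                                unsigned long flags)
{
        /* Step 1: allocate a struct kmem_cache object from the kmem_cache slab. */
        struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);

        if (!s)
                panic("Out of memory when creating slab %s\n", name);

        /* Step 2: initialize it, using the same path as the first kmem_cache. */
        create_boot_cache(s, name, size, flags);
        list_add(&s->list, &slab_caches);
        s->refcount = 1;
        return s;
}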
static __always_inline void *
slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
//Here cachep is the kmem_cache that was just created above.
{
unsigned long save_flags;
void *objp;
flags &= gfp_allowed_mask;
lockdep_trace_alloc(flags);
if (slab_should_failslab(cachep, flags))
return NULL;
cachep = memcg_kmem_get_cache(cachep, flags);
cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
objp = __do_cache_alloc(cachep, flags);
//The function that actually allocates a slab object.
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
kmemleak_alloc_recursive(objp, cachep->object_size, 1, cachep->flags,
flags);
prefetchw(objp);
if (likely(objp)) {
kmemcheck_slab_alloc(cachep, flags, objp, cachep->object_size);
if (unlikely(flags & __GFP_ZERO))
memset(objp, 0, cachep->object_size);
}
return objp;
//Return the allocated slab object.
}
static __always_inline void *
__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
return ____cache_alloc(cachep, flags);
}
static inline void *
____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
void *objp;
struct array_cache *ac;
bool force_refill = false;
check_irq_off();
ac = cpu_cache_get(cachep);//Get this CPU's array_cache.
if (likely(ac->avail)) {//If the ac currently has free objects, allocate directly from it.
ac->touched = 1;
objp = ac_get_obj(cachep, ac, flags, false);
/*
* Allow for the possibility all avail objects are not allowed
* by the current flags
*/
if (objp) {
STATS_INC_ALLOCHIT(cachep);
goto out;
}
force_refill = true;
}
//No object is available in this cache's per-CPU array; cache_alloc_refill refills it, allocating a new page if necessary and filling the array_cache with slab objects.
STATS_INC_ALLOCMISS(cachep);
objp = cache_alloc_refill(cachep, flags, force_refill);
/*
* the 'ac' may be updated by cache_alloc_refill(),
* and kmemleak_erase() requires its correct value.
*/
ac = cpu_cache_get(cachep);
out:
/*
* To avoid a false negative, if an object that is in one of the
* per-CPU caches is leaked, we need to make sure kmemleak doesn't
* treat the array pointers as a reference to the object.
*/
if (objp)
kmemleak_erase(&ac->entry[ac->avail]);
//After handing out the object, clear its slot in the ac so that kmemleak does not treat the array entry as a reference to it.
return objp;
}
The function that refills the per-CPU array with slab objects, taking them from existing slabs or from a newly allocated page, is the following:
static void *
cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags,
bool force_refill)
{
int batchcount;
struct kmem_cache_node *n;
struct array_cache *ac;
int node;
check_irq_off();
node = numa_mem_id();
if (unlikely(force_refill))
goto force_grow;
retry:
ac = cpu_cache_get(cachep);
//Get the current CPU's array_cache.
batchcount = ac->batchcount;
if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
//Adjust the batch count.
/*
* If there was little recent activity on this cache, then
* perform only a partial refill. Otherwise we could generate
* refill bouncing.
*/
batchcount = BATCHREFILL_LIMIT;
}
n = cachep->node[node];
//Get this node's kmem_cache_node (the per-node slab lists, historically called the l3 lists, and other management data).
BUG_ON(ac->avail > 0 || !n);
spin_lock(&n->list_lock);
//Take the node's list_lock.
/* See if we can refill from the shared array */
//First check whether this node has a shared array cache; if so, transfer objects from it.
if (n->shared && transfer_objects(ac, n->shared, batchcount)) {
n->shared->touched = 1;
goto alloc_done;
}
while (batchcount > 0) {
struct list_head *entry;
struct page *page;
/* Get slab alloc is to come from. */
entry = n->slabs_partial.next;
if (entry == &n->slabs_partial) {
//If the partial list is empty, fall through and check slabs_free.
n->free_touched = 1;
entry = n->slabs_free.next;
if (entry == &n->slabs_free)
//If slabs_free is empty as well, goto must_grow.
goto must_grow;
}
page = list_entry(entry, struct page, lru);
//Get the page that backs this slab.
check_spinlock_acquired(cachep);
/*
* The slab was either on partial or free list so
* there must be at least one object available for
* allocation.
*/
BUG_ON(page->active >= cachep->num);
while (page->active < cachep->num && batchcount--) {
//While the page still has free objects and more are wanted.
STATS_INC_ALLOCED(cachep);
STATS_INC_ACTIVE(cachep);
STATS_SET_HIGH(cachep);
ac_put_obj(cachep, ac, slab_get_obj(cachep, page,
node));
//Take a slab object from the page and put it into the ac.
}
/* move slabp to correct slabp list: */
list_del(&page->lru);
//Remove the page from its current list.
if (page->active == cachep->num)
//Move the page to the full or partial list, depending on how many of its objects are now in use.
list_add(&page->lru, &n->slabs_full);
else
list_add(&page->lru, &n->slabs_partial);
}
must_grow:
//Neither the partial list nor the free list had anything usable, so a new page must be allocated to grow the slab cache.
n->free_objects -= ac->avail;
//Update how many free objects remain on this node.
alloc_done:
spin_unlock(&n->list_lock);
if (unlikely(!ac->avail)) {
int x;
force_grow:
x = cache_grow(cachep, flags | GFP_THISNODE, node, NULL);//Grow the cache by one slab; returns 1 on success.
/* cache_grow can reenable interrupts, then ac could change. */
ac = cpu_cache_get(cachep);
node = numa_mem_id();
/* no objects in sight? abort */
if (!x && (ac->avail == 0 || force_refill))
return NULL;
if (!ac->avail) /* objects refilled by interrupt? */
//The ac still has nothing available, but the cache now has free objects after growing;
goto retry;//retry so that this CPU's ac gets refilled with slab objects.
}
ac->touched = 1;
//Mark the ac as recently used.
return ac_get_obj(cachep, ac, flags, force_refill);
//Return one of the objects just placed in the ac.
}
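For reference, pulling an object off a page's on-slab freelist boils down to an index lookup; a sketch of slab_get_obj and index_to_obj (based on mm/slab.c of this kernel generation, with the helper get_free_obj inlined and debug checks removed):

static inline void *index_to_obj(struct kmem_cache *cache, struct page *page,
                                 unsigned int idx)
{
        /* Objects live at s_mem + idx * size inside the slab's pages. */
        return page->s_mem + cache->size * idx;
}

static void *slab_get_obj(struct kmem_cache *cachep, struct page *page,
                          int nodeid)
{
        /*
         * page->freelist is an array of object indices: entries below
         * page->active are already in use, the entry at page->active
         * names the next free object.
         */
        void *objp = index_to_obj(cachep, page,
                        ((freelist_idx_t *)page->freelist)[page->active]);

        page->active++;
        return objp;
}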
Next, let's analyse how a kmem_cache allocates a new slab.
static int
cache_grow(struct kmem_cache *cachep,
gfp_t flags, int nodeid, struct page *page)
{
void *freelist;
size_t offset;
gfp_t local_flags;
struct kmem_cache_node *n;
/*
* Be lazy and only check for valid flags here, keeping it out of the
* critical path in kmem_cache_alloc().
*/
BUG_ON(flags & GFP_SLAB_BUG_MASK);
local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
/* Take the node list lock to change the colour_next on this node */
check_irq_off();
n = cachep->node[nodeid];
//Get the kmem_cache_node of the given node; it tracks that node's partial, full and free slab lists.
spin_lock(&n->list_lock);
/* Get colour for the slab, and cal the next value. */
offset = n->colour_next;
n->colour_next++;
if (n->colour_next >= cachep->colour)
n->colour_next = 0;
spin_unlock(&n->list_lock);
offset *= cachep->colour_off;
if (local_flags & __GFP_WAIT)
local_irq_enable();
/*
* The test for missing atomic flag is performed here, rather than
* the more obvious place, simply to reduce the critical path length
* in kmem_cache_alloc(). If a caller is seriously mis-behaving they
* will eventually be caught here (where it matters).
*/
kmem_flagcheck(cachep, flags);
/*
* Get mem for the objs. Attempt to allocate a physical page from
* 'nodeid'.
*/
if (!page)
page = kmem_getpages(cachep, local_flags, nodeid); //Get fresh pages from the system page allocator.
if (!page)
goto failed;
/* Get slab management. */
freelist = alloc_slabmgmt(cachep, page, offset,
local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
//Get a pointer to the slab management structure (the freelist).
if (!freelist)
goto opps1;
slab_map_pages(cachep, page, freelist);
//Record cachep and freelist in the struct page.
cache_init_objs(cachep, page);
//Initialize the objects: set freelist[i] = i and call the constructor, if any.
if (local_flags & __GFP_WAIT)
local_irq_disable();
check_irq_off();
spin_lock(&n->list_lock);
/* Make slab active. */
list_add_tail(&page->lru, &(n->slabs_free));
//Add the new slab to slabs_free; the next object allocation can then take objects from it via the freelist.
STATS_INC_GROWN(cachep);
n->free_objects += cachep->num;
//The node now has cachep->num more free objects.
spin_unlock(&n->list_lock);
return 1;
opps1:
kmem_freepages(cachep, page);
failed:
if (local_flags & __GFP_WAIT)
local_irq_disable();
return 0;
}
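The comment above notes that cache_init_objs sets freelist[i] = i and runs the constructor; stripped of debug code it looks roughly like this (a sketch based on mm/slab.c of this kernel generation, with set_free_obj written out inline):

static void cache_init_objs(struct kmem_cache *cachep, struct page *page)
{
        int i;

        for (i = 0; i < cachep->num; i++) {
                void *objp = index_to_obj(cachep, page, i);

                /* Run the cache's constructor on the object, if one was given. */
                if (cachep->ctor)
                        cachep->ctor(objp);

                /* Initially every object is free, so freelist[i] simply holds i. */
                ((freelist_idx_t *)page->freelist)[i] = i;
        }
}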
At this point an object has been allocated from kmem_cache. create_boot_cache then initializes the members of this cachep, and the call chain unwinds back to kmalloc_caches[INDEX_AC] = create_kmalloc_cache("kmalloc-ac", kmalloc_size(INDEX_AC), ARCH_KMALLOC_FLAGS); in kmem_cache_init. kmalloc_caches[INDEX_AC] now exists and is used to allocate struct arraycache_init objects.
Next, kmalloc_caches[INDEX_NODE] = create_kmalloc_cache("kmalloc-node", kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS); creates the slab cache used to allocate struct kmem_cache_node objects.
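For reference, INDEX_AC and INDEX_NODE are just the kmalloc size-class indices that fit the two bootstrap structures; in mm/slab.c of this kernel generation they are defined roughly as follows (quoted from memory, so treat this as a sketch):

/* Index into kmalloc_caches[] of the size class that fits each bootstrap structure. */
#define INDEX_AC        kmalloc_index(sizeof(struct arraycache_init))
#define INDEX_NODE      kmalloc_index(sizeof(struct kmem_cache_node))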
Phase 3: Creating the kmalloc caches
Phases 1 and 2 have produced kmem_cache, kmalloc_caches[INDEX_AC] and kmalloc_caches[INDEX_NODE]. These slab caches are now used to set up the rest of kmalloc_caches.
create_kmalloc_caches is called to create the slab caches that kmalloc needs for its various size classes.
void __init
create_kmalloc_caches(unsigned long flags)
{
int i;
/*
* Patch up the size_index table if we have strange large alignment
* requirements for the kmalloc array. This is only the case for
* MIPS it seems. The standard arches will not generate any code here.
*
* Largest permitted alignment is 256 bytes due to the way we
* handle the index determination for the smaller caches.
*
* Make sure that nothing crazy happens if someone starts tinkering
* around with ARCH_KMALLOC_MINALIGN
*/
BUILD_BUG_ON(KMALLOC_MIN_SIZE > 256 ||
(KMALLOC_MIN_SIZE & (KMALLOC_MIN_SIZE - 1)));
for (i = 8; i < KMALLOC_MIN_SIZE; i += 8) {
int elem = size_index_elem(i);
if (elem >= ARRAY_SIZE(size_index))
break;
size_index[elem] = KMALLOC_SHIFT_LOW;
}
if (KMALLOC_MIN_SIZE >= 64) {
/*
* The 96 byte size cache is not used if the alignment
* is 64 byte.
*/
for (i = 64 + 8; i <= 96; i += 8)
size_index[size_index_elem(i)] = 7;
}
if (KMALLOC_MIN_SIZE >= 128) {
/*
* The 192 byte sized cache is not used if the alignment
* is 128 byte. Redirect kmalloc to use the 256 byte cache
* instead.
*/
for (i = 128 + 8; i <= 192; i += 8)
size_index[size_index_elem(i)] = 8;
}
for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
if (!kmalloc_caches[i]) {
kmalloc_caches[i] =
create_kmalloc_cache(NULL,
1 << i, flags);
//Call create_kmalloc_cache to create the slab cache for this size; the function was covered in detail in phase 2.
}
/*
* Caches that are not of the two-to-the-power-of size.
* These have to be created immediately after the
* earlier power of two caches
*/
if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[1] && i == 6)
kmalloc_caches[1] = create_kmalloc_cache(NULL, 96, flags);
if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[2] && i == 7)
kmalloc_caches[2] = create_kmalloc_cache(NULL, 192, flags);
}
/* Kmalloc array is now usable */
slab_state = UP;
//Everything is ready, so slab_state is set to UP; from now on kmem_cache_create and kmalloc can be used directly to create slab caches and allocate objects.
for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
struct kmem_cache *s = kmalloc_caches[i];
char *n;
if (s) {
n = kasprintf(GFP_NOWAIT, "kmalloc-%d", kmalloc_size(i));
BUG_ON(!n);
s->name = n;
}
}
#ifdef CONFIG_ZONE_DMA
for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
struct kmem_cache *s = kmalloc_caches[i];
if (s) {
int size = kmalloc_size(i);
char *n = kasprintf(GFP_NOWAIT,
"dma-kmalloc-%d", size);
BUG_ON(!n);
kmalloc_dma_caches[i] = create_kmalloc_cache(n,
size, SLAB_CACHE_DMA | flags);
//If CONFIG_ZONE_DMA is defined, also create a DMA kmalloc cache of each size with create_kmalloc_cache; the function was covered in detail in phase 2.
}
}
#endif
}
#endif /* !CONFIG_SLOB */
This essentially completes the walkthrough of kmem_cache_init. To summarize: the first kmem_cache is created statically, and all other kmem_cache objects are then allocated from the kmem_cache slab, forming a two-level tree structure. We also looked in detail at how a newly created slab cache grows by a page and hands out objects from it.
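Once slab_state is UP, the rest of the kernel uses exactly this machinery; as a closing illustration, here is a minimal usage sketch (the cache name, structure and constructor below are invented for this example):

#include <linux/init.h>
#include <linux/list.h>
#include <linux/slab.h>

struct my_obj {                          /* hypothetical object type */
        struct list_head list;
        int state;
};

static struct kmem_cache *my_cachep;

/* Called once per object by cache_init_objs when a new slab is populated. */
static void my_obj_ctor(void *obj)
{
        struct my_obj *o = obj;

        INIT_LIST_HEAD(&o->list);
        o->state = 0;
}

static int __init my_cache_init(void)
{
        /* Allocates yet another kmem_cache object from the global kmem_cache slab. */
        my_cachep = kmem_cache_create("my_obj_cache", sizeof(struct my_obj),
                                      0, SLAB_HWCACHE_ALIGN, my_obj_ctor);
        return my_cachep ? 0 : -ENOMEM;
}

static struct my_obj *my_obj_alloc(void)
{
        /* The first allocation on an empty cache goes through cache_alloc_refill/cache_grow. */
        return kmem_cache_alloc(my_cachep, GFP_KERNEL);
}

static void my_obj_free(struct my_obj *o)
{
        kmem_cache_free(my_cachep, o);
}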