Linux Memory Management (5) - Kmalloc

Hacker_Albert

Article link: https://blog.csdn.net/weixin_41028621/article/details/103550697

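This article walks through the kmalloc memory allocation mechanism in the Linux kernel: the difference between static and dynamic allocation, a comparison of kmalloc and vmalloc, and how kmalloc is implemented, with emphasis on the fact that kmalloc memory is contiguous in both physical and virtual address space.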

Goal: understand how kmalloc is implemented.

1.Memory allocation

Memory allocation in the Linux kernel differs from its user-space counterpart. The following facts are noteworthy:

Kernel memory is not pageable.
Kernel memory allocation mistakes can easily cause a kernel oops or a system crash.
The kernel stack has a small, fixed size limit.

There are two ways to allocate memory in kernel code: statically from the stack or dynamically from the heap.

1.1.Static Memory Allocation

Static memory allocation is normally used when you know in advance how much memory you will need. For example,

#define BUF_LEN 2048
char buf[BUF_LEN];

However, the kernel stack size is fixed and limited (the limit is architecture dependent, typically 8 KB or 16 KB). Therefore people seldom request a big chunk of memory on the stack. The better way is to allocate the memory dynamically from the heap.

1.2.Dynamic Memory Allocation

Two functions are available for allocating heap memory in the Linux kernel, kmalloc() and vmalloc(); both are simple interfaces for obtaining kernel memory in byte-sized chunks.

The kmalloc() function guarantees that the pages are physically contiguous (and virtually contiguous).
The vmalloc() function works in a similar fashion to kmalloc(), except it allocates memory that is only virtually contiguous and not necessarily physically contiguous.
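
As a rough illustration of the two interfaces, the sketch below is a minimal kernel module that takes a small buffer from kmalloc() and a large one from vmalloc(), then releases both on unload. The module and variable names, the buffer sizes, and the use of GFP_KERNEL are assumptions chosen for illustration; they do not come from the article. It has to be built against kernel headers and cannot run in user space.

```c
#include <linux/module.h>
#include <linux/init.h>
#include <linux/printk.h>
#include <linux/errno.h>
#include <linux/slab.h>     /* kmalloc(), kfree() */
#include <linux/vmalloc.h>  /* vmalloc(), vfree() */

/* Hypothetical sizes, chosen only for this sketch. */
#define SMALL_LEN 2048
#define BIG_LEN   (4 * 1024 * 1024)

static char *small_buf; /* physically contiguous */
static char *big_buf;   /* only virtually contiguous */

static int __init alloc_demo_init(void)
{
    /* GFP_KERNEL may sleep, so this must run in process context. */
    small_buf = kmalloc(SMALL_LEN, GFP_KERNEL);
    if (!small_buf)
        return -ENOMEM;

    big_buf = vmalloc(BIG_LEN);
    if (!big_buf) {
        kfree(small_buf);
        return -ENOMEM;
    }

    pr_info("alloc_demo: kmalloc=%p vmalloc=%p\n", small_buf, big_buf);
    return 0;
}

static void __exit alloc_demo_exit(void)
{
    /* Each buffer goes back through the matching release function. */
    vfree(big_buf);
    kfree(small_buf);
}

module_init(alloc_demo_init);
module_exit(alloc_demo_exit);
MODULE_LICENSE("GPL");
```

Both calls can fail, so each return value is checked before use, and memory from kmalloc() is released with kfree() while memory from vmalloc() is released with vfree().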

1.3. vmalloc VS kmalloc

1.3.1.Why is kmalloc() more efficient than vmalloc()?

kmalloc allocates a region of memory that is physically contiguous (and therefore also virtually contiguous). It comes from the kernel's direct mapping, where virtual and physical addresses differ only by a fixed offset. For vmalloc(), page table entries are set up for each page, and the pages behind a virtually contiguous range need not be physically contiguous. vmalloc is often slower than kmalloc because it has to map the pages into a virtually contiguous range; kmalloc never remaps.

kmalloc is the preferred way as long as you don’t need very large areas. The trouble is that DMA to or from a hardware device generally needs physically contiguous memory, so you’ll need kmalloc, and you’ll probably need a larger chunk. The solution is to allocate the memory as early as possible, before memory gets fragmented.

1.3.2.Why does Linux directly map the kernel virtual space to physical memory?

The Linux kernel supports DMA (Direct Memory Access), which requires physically contiguous memory. So when the kernel triggers a DMA operation, it has to specify a physically contiguous buffer. That is why the direct memory mapping is needed.
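
The direct mapping can be pictured as plain offset arithmetic. The user-space model below is only a sketch: the two base addresses are hypothetical values typical of a 32-bit ARM layout, and the helper names are made up for this example rather than taken from the kernel.

```c
#include <stdint.h>

/* Assumed layout for this model only. */
#define MODEL_PAGE_OFFSET 0xC0000000u /* start of the kernel direct map */
#define MODEL_PHYS_OFFSET 0x80000000u /* physical address where RAM starts */

/* Direct-mapped virtual address -> physical address. */
static inline uint32_t model_virt_to_phys(uint32_t vaddr)
{
    return vaddr - MODEL_PAGE_OFFSET + MODEL_PHYS_OFFSET;
}

/* Physical address -> direct-mapped virtual address. */
static inline uint32_t model_phys_to_virt(uint32_t paddr)
{
    return paddr - MODEL_PHYS_OFFSET + MODEL_PAGE_OFFSET;
}
```

Because the translation is a constant offset, two virtual addresses that are 4096 bytes apart map to physical addresses that are also 4096 bytes apart, which is what makes a virtually contiguous buffer in this region physically contiguous as well.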

2.Kmalloc function

kmalloc is the normal method of allocating memory for objects smaller than page size in the kernel.

#include <linux/slab.h>
void *kmalloc(size_t size, gfp_t flags);
@param size: size of the memory to allocate, in bytes.
@param flags: type of memory to allocate (GFP flags).

The implementation is as follows:

static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
    if (__builtin_constant_p(size)) {
#ifndef CONFIG_SLOB
        unsigned int index;
#endif
        if (size > KMALLOC_MAX_CACHE_SIZE)
            return kmalloc_large(size, flags);
#ifndef CONFIG_SLOB
        index = kmalloc_index(size);

        if (!index)
            return ZERO_SIZE_PTR;

        return kmem_cache_alloc_trace(
                kmalloc_caches[kmalloc_type(flags)][index],
                flags, size);
#endif
    }
    return __kmalloc(size, flags);
}

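__builtin_constant_p reports whether its argument is a compile-time constant. GCC makes this decision at compile time: it returns true when the argument is a known constant rather than a variable, so it is mainly used for compile-time optimization.
If size is a constant, it is compared against KMALLOC_MAX_CACHE_SIZE, the largest object size served by the slab caches. The kernel pre-creates a set of kmem caches of different sizes for kmalloc. A request larger than this value goes straight to kmalloc_large, which ends up calling the page allocator (the buddy allocator) instead of the slab allocator. Otherwise kmalloc_index(size) selects the matching cache, and an index of 0 yields ZERO_SIZE_PTR.
If size is a variable, __kmalloc is called to do the allocation.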
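
The branching can be tried out in user space. The following is a simplified model of the non-SLOB path, not kernel code: the enum, the classify() helper, the CLASSIFY macro, and the 4 MiB limit are all assumptions introduced for this sketch. Only __builtin_constant_p is the real GCC/Clang builtin.

```c
#include <stddef.h>

/* Assumed cache limit for this model only: 4 MiB. */
#define MODEL_MAX_CACHE_SIZE (1UL << 22)

enum alloc_path {
    PATH_LARGE,   /* would call kmalloc_large(): page allocator */
    PATH_ZERO,    /* would return ZERO_SIZE_PTR */
    PATH_CACHE,   /* cache chosen at compile time via kmalloc_index() */
    PATH_RUNTIME  /* would call __kmalloc(): cache looked up at run time */
};

/* Mirrors the order of the checks in the inline kmalloc(). */
static inline enum alloc_path classify(size_t size, int is_constant)
{
    if (is_constant) {
        if (size > MODEL_MAX_CACHE_SIZE)
            return PATH_LARGE;
        if (size == 0)
            return PATH_ZERO;
        return PATH_CACHE;
    }
    return PATH_RUNTIME;
}

/* The builtin must see the caller's expression, hence a macro. */
#define CLASSIFY(size) classify((size), __builtin_constant_p(size))
```

For example, CLASSIFY(64) takes the compile-time cache path, CLASSIFY(8UL << 20) takes the large path, and a size read from a volatile variable takes the run-time path because the compiler cannot treat it as a constant.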

2.1.KMALLOC_MAX_CACHE_SIZE macro definitions:

#define MAX_ORDER 11
#define PAGE_SHIFT      12

#define KMALLOC_SHIFT_HIGH  ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
                (MAX_ORDER + PAGE_SHIFT - 1) : 25)
#define KMALLOC_SHIFT_MAX   KMALLOC_SHIFT_HIGH

/* Maximum allocatable size */
#define KMALLOC_MAX_SIZE    (1UL << KMALLOC_SHIFT_MAX)
/* Maximum size for which we actually use a slab cache */
#define KMALLOC_MAX_CACHE_SIZE  (1UL << KMALLOC_SHIFT_HIGH)
/* Maximum order allocatable via the slab allocator */
#define KMALLOC_MAX_ORDER   (KMALLOC_SHIFT_MAX - PAGE_SHIFT)

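The definitions can differ between platforms and allocator configurations. Taking arm32 as an example (MAX_ORDER = 11, PAGE_SHIFT = 12), the definitions above give a KMALLOC_MAX_SIZE of 4M:

KMALLOC_SHIFT_HIGH = min(11 + 12 - 1, 25) = 22
KMALLOC_SHIFT_MAX  = 22
KMALLOC_MAX_SIZE  = (1<<22) = 4M
KMALLOC_MAX_CACHE_SIZE = (1<<22) = 4M

In a configuration that defines KMALLOC_SHIFT_HIGH as PAGE_SHIFT instead (as SLOB does), KMALLOC_SHIFT_HIGH = 12 and KMALLOC_MAX_CACHE_SIZE = (1<<12), while KMALLOC_MAX_SIZE stays at 4M.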
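
The arithmetic can be checked by letting the compiler evaluate the macros. The sketch below copies the definitions listed in 2.1 into a user-space file; print_kmalloc_limits() is a helper made up for this example. With these particular definitions both shifts evaluate to 22, so a configuration that reports a smaller cache limit (for example one that defines KMALLOC_SHIFT_HIGH as PAGE_SHIFT) must be using a different definition.

```c
#include <stdio.h>

#define MAX_ORDER   11
#define PAGE_SHIFT  12

#define KMALLOC_SHIFT_HIGH  ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
                (MAX_ORDER + PAGE_SHIFT - 1) : 25)
#define KMALLOC_SHIFT_MAX   KMALLOC_SHIFT_HIGH

#define KMALLOC_MAX_SIZE        (1UL << KMALLOC_SHIFT_MAX)
#define KMALLOC_MAX_CACHE_SIZE  (1UL << KMALLOC_SHIFT_HIGH)
#define KMALLOC_MAX_ORDER       (KMALLOC_SHIFT_MAX - PAGE_SHIFT)

/* Hypothetical helper: prints the limits derived from the macros above. */
void print_kmalloc_limits(void)
{
    printf("KMALLOC_SHIFT_HIGH     = %d\n", KMALLOC_SHIFT_HIGH);
    printf("KMALLOC_SHIFT_MAX      = %d\n", KMALLOC_SHIFT_MAX);
    printf("KMALLOC_MAX_SIZE       = %lu bytes\n", KMALLOC_MAX_SIZE);
    printf("KMALLOC_MAX_CACHE_SIZE = %lu bytes\n", KMALLOC_MAX_CACHE_SIZE);
    printf("KMALLOC_MAX_ORDER      = %d\n", KMALLOC_MAX_ORDER);
}
```

Changing MAX_ORDER or PAGE_SHIFT at the top is enough to redo the calculation for another platform.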

2.2.__kmalloc():

static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
                      unsigned long caller)
{
    struct kmem_cache *cachep;
    void *ret;

    if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
        return NULL;
    cachep = kmalloc_slab(size, flags);
    if (unlikely(ZERO_OR_NULL_PTR(cachep)))
        return cachep;
    ret = slab_alloc(cachep, flags, caller);

    ret = kasan_kmalloc(cachep, ret, size, flags);
    trace_kmalloc(caller, ret,
              size, cachep->size, flags);

    return ret;
}

void *__kmalloc(size_t size, gfp_t flags)
{
    return __do_kmalloc(size, flags, _RET_IP_);
}

__do_kmalloc looks up the matching kmem cache with kmalloc_slab, then allocates the object from that cache with slab_alloc.

struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
{
    unsigned int index;

    if (size <= 192) {
        if (!size)
            return ZERO_SIZE_PTR;

        index = size_index[size_index_elem(size)];
    } else {
        if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
            WARN_ON(1);
            return NULL;
        }
        index = fls(size - 1);
    }

#ifdef CONFIG_ZONE_DMA
    if (unlikely((flags & GFP_DMA)))
        return kmalloc_dma_caches[index];

#endif
    return kmalloc_caches[index];
}

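If the requested size is at most 192 bytes and nonzero (a size of 0 returns ZERO_SIZE_PTR), size_index_elem converts it to a subscript and the global size_index array supplies the cache index; otherwise the index comes directly from fls(). Finally, if DMA zone support is enabled and the GFP_DMA flag is set, the index selects a kmem_cache from kmalloc_dma_caches; otherwise it selects one from kmalloc_caches.

As this shows, kmalloc() is fairly simple to implement, and the memory it returns is contiguous in both virtual and physical address space.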
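
The index lookup can be modelled in user space. In the sketch below the function names are hypothetical, fls_model() is a plain-C stand-in for the kernel's fls(), and the table values follow the default size_index layout in mm/slab_common.c as an assumption (the kernel may adjust that table at boot on configurations with a larger minimum object size). Indexes 1 and 2 stand for the 96-byte and 192-byte caches; every other index i stands for a cache of 2^i bytes.

```c
#include <stddef.h>

/* Assumed default table: entry k covers sizes 8*k+1 .. 8*(k+1). */
static const unsigned char size_index_model[24] = {
    3, 4, 5, 5, 6, 6, 6, 6,   /*   8 ..  64 bytes */
    1, 1, 1, 1,               /*  72 ..  96 bytes -> 96-byte cache */
    7, 7, 7, 7,               /* 104 .. 128 bytes */
    2, 2, 2, 2, 2, 2, 2, 2    /* 136 .. 192 bytes -> 192-byte cache */
};

/* Position of the highest set bit, counting from 1; 0 for x == 0. */
static inline unsigned int fls_model(unsigned int x)
{
    unsigned int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Returns the cache index for a request, or 0 for a zero-byte request. */
static inline unsigned int kmalloc_index_model(size_t size)
{
    if (size == 0)
        return 0;
    if (size <= 192)
        return size_index_model[(size - 1) / 8];
    return fls_model((unsigned int)(size - 1));
}

/* Object size of the cache behind an index. */
static inline size_t cache_size_model(unsigned int index)
{
    if (index == 1)
        return 96;
    if (index == 2)
        return 192;
    return (size_t)1 << index;
}
```

A 100-byte request lands in the 128-byte cache (index 7), while a 257-byte request is rounded up to the 512-byte cache (index 9), which shows how much internal padding a size just above a power of two can cost.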

2.3.kmalloc_large

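If size exceeds the largest request the buddy system supports, for example a request whose order is MAX_ORDER (11) or above, where does the kernel detect this and return?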

static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
{
    unsigned int order = get_order(size);
    return kmalloc_order_trace(size, flags, order);
}

get_order returns the following values:
0 -> 2^0 * PAGE_SIZE and below
1 -> 2^1 * PAGE_SIZE to 2^0 * PAGE_SIZE + 1
2 -> 2^2 * PAGE_SIZE to 2^1 * PAGE_SIZE + 1
3 -> 2^3 * PAGE_SIZE to 2^2 * PAGE_SIZE + 1
4 -> 2^4 * PAGE_SIZE to 2^3 * PAGE_SIZE + 1
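
The table can be reproduced with a few lines of user-space C. This is a sketch of the same rounding rule, not the kernel's implementation: get_order_model() is a hypothetical name, a 4 KB page is assumed, and size must be nonzero.

```c
#define MODEL_PAGE_SHIFT 12 /* assumed 4 KB pages */

/*
 * Smallest order such that (2^order * PAGE_SIZE) >= size.
 * The caller must pass size > 0.
 */
static inline unsigned int get_order_model(unsigned long size)
{
    unsigned int order = 0;

    /* Number of whole pages beyond the first one, rounded down. */
    size = (size - 1) >> MODEL_PAGE_SHIFT;

    /* Count the bits needed to represent that page count. */
    while (size) {
        order++;
        size >>= 1;
    }
    return order;
}
```

With 4 KB pages, 4096 bytes fits in order 0, 4097 bytes needs order 1, and 4 MB needs order 10, the largest order available when MAX_ORDER is 11.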

The call finally reaches kmalloc_order:

void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
{
    void *ret;
    struct page *page;

    flags |= __GFP_COMP;
    page = alloc_pages(flags, order);
    ret = page ? page_address(page) : NULL;
    kmemleak_alloc(ret, size, 1, flags);
    kasan_kmalloc_large(ret, size, flags);
    return ret;
}

This finally calls alloc_pages, the page allocator interface, which allocates pages using the buddy system algorithm. The core of the buddy system is:

mm/page_alloc.c:
static inline
struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
                        int migratetype)
{
    unsigned int current_order;
    struct free_area *area;
    struct page *page;

    /* Find a page of the appropriate size in the preferred list */
    for (current_order = order; current_order < MAX_ORDER; ++current_order) {
        area = &(zone->free_area[current_order]);
        page = list_first_entry_or_null(&area->free_list[migratetype],
                            struct page, lru);
        if (!page)
            continue;
        list_del(&page->lru);
        rmv_page_order(page);
        area->nr_free--;
        expand(zone, page, order, current_order, area, migratetype);
        set_pcppage_migratetype(page, migratetype);
        return page;
    }

    return NULL;
}
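
The search-and-split behaviour of this loop can be modelled with counters alone. The sketch below is a toy user-space model, not kernel code: struct zone_model and rmqueue_smallest_model() are hypothetical, migrate types and real page lists are left out, and each free list is reduced to a count of free blocks per order.

```c
#define MAX_ORDER 11

/* Toy zone: only the number of free blocks at each order. */
struct zone_model {
    unsigned long nr_free[MAX_ORDER];
};

/*
 * Take one block of the requested order. Returns the order of the
 * block that was actually removed from the free counts, or -1 if
 * no block of sufficient size exists.
 */
static inline int rmqueue_smallest_model(struct zone_model *zone,
                                         unsigned int order)
{
    unsigned int current_order;
    unsigned int o;

    for (current_order = order; current_order < MAX_ORDER; ++current_order) {
        if (zone->nr_free[current_order] == 0)
            continue;

        zone->nr_free[current_order]--;

        /*
         * Model of expand(): halve the block repeatedly, giving the
         * unused half back to the next lower order each time.
         */
        for (o = current_order; o > order; --o)
            zone->nr_free[o - 1]++;

        return (int)current_order;
    }
    return -1;
}
```

Starting from a single free order-10 block, a request for order 0 splits it all the way down and leaves one free block at each of the orders 0 through 9. A request for order 11 never enters the loop and fails immediately.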

Note the current_order < MAX_ORDER bound: it limits the largest block the buddy system can allocate to 2^(MAX_ORDER-1) pages. The calculation gives: