An In-Depth Look at the Linux Kernel Memory Allocation Function devm_kzalloc

When reading driver code, you will often come across devm_kzalloc() being used to allocate a block of memory for a device. devm_kzalloc() is one of the kernel's memory allocation functions; devm_kmalloc(), kzalloc() and kmalloc() can also allocate memory. The difference is that memory allocated by the devm_* functions is bound to a device: when the device is detached from its driver, any memory bound to it is released automatically, with no manual freeing required on our part. Of course, if the memory is no longer needed earlier than that, it can still be released explicitly with devm_kfree(). Memory allocated with kzalloc() or kmalloc(), on the other hand, must be freed manually with kfree(); forgetting to do so causes a memory leak.

Key point: since memory requested with the devm_* functions is bound to a device, we can call devm_kzalloc() in a platform driver's probe function to allocate a block of memory and bind it to the platform device. That memory then lives and dies with the device, and we never need to worry about releasing it.
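For example, a bare-bones platform driver might use it like the sketch below. This is only an illustration under assumed names (foo_priv, foo_probe and foo_remove are hypothetical), not code from any real driver:

#include <linux/platform_device.h>
#include <linux/slab.h>

/* Hypothetical per-device private data, for illustration only. */
struct foo_priv {
	int irq;
	void __iomem *regs;
};

static int foo_probe(struct platform_device *pdev)
{
	struct foo_priv *priv;

	/* Bound to &pdev->dev: the memory is zeroed and will be released
	 * automatically when the device is detached from the driver. */
	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	platform_set_drvdata(pdev, priv);
	return 0;
}

static int foo_remove(struct platform_device *pdev)
{
	/* Nothing to free here: the devres core releases priv for us. */
	return 0;
}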

Next, let's dig into devm_kzalloc() to see how it binds memory to a device and how that memory gets released automatically. devm_kzalloc() is an inline function defined in include/linux/device.h:

static inline void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp)
{
	return devm_kmalloc(dev, size, gfp | __GFP_ZERO);
}

dev is the device the memory will be bound to; size is the number of bytes to allocate; gfp selects the allocation flags for the memory, and in a driver GFP_KERNEL is usually the right choice. The flags are defined in include/linux/gfp.h.
As you can see, devm_kzalloc() simply ORs __GFP_ZERO into the gfp argument and calls devm_kmalloc(), which allocates the requested amount of memory and zero-initializes it. devm_kmalloc() is defined in drivers/base/devres.c:

/**
 * devm_kmalloc - Resource-managed kmalloc
 * @dev: Device to allocate memory for
 * @size: Allocation size
 * @gfp: Allocation gfp flags
 *
 * Managed kmalloc.  Memory allocated with this function is
 * automatically freed on driver detach.  Like all other devres
 * resources, guaranteed alignment is unsigned long long.
 *
 * RETURNS:
 * Pointer to allocated memory on success, NULL on failure.
 */
void * devm_kmalloc(struct device *dev, size_t size, gfp_t gfp)
{
	struct devres *dr;

	/* use raw alloc_dr for kmalloc caller tracing */
	dr = alloc_dr(devm_kmalloc_release, size, gfp);
	if (unlikely(!dr))
		return NULL;

	/*
	 * This is named devm_kzalloc_release for historical reasons
	 * The initial implementation did not support kmalloc, only kzalloc
	 */
	set_node_dbginfo(&dr->node, "devm_kzalloc_release", size);
	devres_add(dev, dr->data);
	return dr->data;
}
EXPORT_SYMBOL_GPL(devm_kmalloc);

Its parameters have exactly the same meaning as those of devm_kzalloc(). On success it returns the start address of the allocated memory; on failure it returns NULL. We will not dwell on how the memory itself is allocated; what matters here is how it gets bound to the device.

Note the return dr->data at the end of the function: dr->data points to the allocated memory. A few lines earlier there is this statement:

devres_add(dev, dr->data);

devres_add() binds the information block (struct devres) describing the dr->data memory to the device dev. It is defined in drivers/base/devres.c:

/**
 * devres_add - Register device resource
 * @dev: Device to add resource to
 * @res: Resource to register
 *
 * Register devres @res to @dev.  @res should have been allocated
 * using devres_alloc().  On driver detach, the associated release
 * function will be invoked and devres will be freed automatically.
 */
void devres_add(struct device *dev, void *res)
{
	struct devres *dr = container_of(res, struct devres, data);
	unsigned long flags;

	spin_lock_irqsave(&dev->devres_lock, flags);
	add_dr(dev, &dr->node);
	spin_unlock_irqrestore(&dev->devres_lock, flags);
}
EXPORT_SYMBOL_GPL(devres_add);

Note the following line:

struct devres *dr = container_of(res, struct devres, data);

Here container_of() takes the start address of the allocated memory (res) and works backwards to the enclosing struct devres that contains it. container_of is a macro; there is plenty of material online about how it is implemented, so I will not repeat it here (a small self-contained sketch of the idea follows the struct definition below). struct devres is defined as follows:

struct devres {
	struct devres_node		node;
	/* -- 3 pointers */
	unsigned long long		data[];	/* guarantee ull alignment */
};
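To see the idea behind container_of() in isolation, here is a small self-contained userspace sketch. struct outer and my_container_of are made-up names; this is not the kernel's actual macro, just the same pointer arithmetic built on offsetof():

#include <stddef.h>
#include <stdio.h>

/* Stand-in for struct devres: 'data' is a member embedded in a larger struct. */
struct outer {
	int header;
	unsigned long long data[4];
};

/* The core trick: subtract the member's offset from the member's address
 * to recover the address of the enclosing structure. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
	struct outer o;
	void *res = o.data;	/* plays the role of dr->data */
	struct outer *recovered = my_container_of(res, struct outer, data);

	printf("recovered == &o ? %d\n", recovered == &o);	/* prints 1 */
	return 0;
}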

Once the address of the struct devres has been saved in dr, add_dr(dev, &dr->node) is called. That function is defined as follows:

static void add_dr(struct device *dev, struct devres_node *node)
{
	devres_log(dev, node, "ADD");
	BUG_ON(!list_empty(&node->entry));
	list_add_tail(&node->entry, &dev->devres_head);
}

This function uses the kernel's doubly linked list. For background on list_head, see for example the article 《Linux内核中经典链表 list_head 常见使用方法解析》 on CSDN. From the way list_add_tail() is used here, node->entry is the node being inserted into the list; it represents one memory information block. dev->devres_head is the corresponding list head, i.e. the head of the device's list of memory information blocks. struct devres_node is defined in drivers/base/devres.c:

struct devres_node {
	struct list_head		entry;
	dr_release_t			release;
#ifdef CONFIG_DEBUG_DEVRES
	const char			*name;
	size_t				size;
#endif
};

Since devres_head inside dev is the list head and entry inside each node is a list node, starting from devres_head in struct device we can walk the list and visit every struct devres_node on it. In other words, once we have a device's struct device, we can reach every struct devres_node describing memory bound to that device; the release member of each node is the function used to release that memory. Applying container_of to a struct devres_node takes us back up to the enclosing struct devres, from which we can get the start address of the memory block. This is how memory is bound to a platform device: a doubly linked list records the information blocks of all memory allocated to the device. The figure below summarizes the binding process:
[Figure: devm_kzalloc binds memory to a platform device]
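To make the list relationship concrete, here is a simplified model of the mechanism, under made-up names (mini_devres, mini_dev_head, mini_bind_one, mini_release_all): mini_devres plays the role of struct devres and mini_dev_head plays the role of dev->devres_head. This is not how a driver would use devres (the real list is private to the devres core and protected by dev->devres_lock); it only mirrors the add_dr() binding and the release walk we will see below:

#include <linux/list.h>
#include <linux/slab.h>

/* Simplified stand-in for struct devres: a list node plus the data it carries. */
struct mini_devres {
	struct list_head	entry;		/* like devres_node.entry */
	unsigned long long	data[4];	/* like devres.data */
};

static LIST_HEAD(mini_dev_head);		/* like dev->devres_head */

static int mini_bind_one(gfp_t gfp)
{
	struct mini_devres *dr = kzalloc(sizeof(*dr), gfp);

	if (!dr)
		return -ENOMEM;
	/* Binding: append the node to the device's list, as add_dr() does. */
	list_add_tail(&dr->entry, &mini_dev_head);
	return 0;
}

static void mini_release_all(void)
{
	struct mini_devres *dr, *tmp;

	/* Releasing: walk the list, unlink each node and free it, which is in
	 * essence what release_nodes() does on driver detach. */
	list_for_each_entry_safe(dr, tmp, &mini_dev_head, entry) {
		list_del(&dr->entry);
		kfree(dr);
	}
}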

Now that the allocated memory blocks hang off the platform device, how do they get released automatically when the platform driver is unloaded? Unloading a platform driver goes through platform_driver_unregister(), defined in drivers/base/platform.c:

/**
 * platform_driver_unregister - unregister a driver for platform-level devices
 * @drv: platform driver structure
 */
void platform_driver_unregister(struct platform_driver *drv)
{
	driver_unregister(&drv->driver);
}
EXPORT_SYMBOL_GPL(platform_driver_unregister);

It calls driver_unregister(), defined in drivers/base/driver.c:

/**
 * driver_unregister - remove driver from system.
 * @drv: driver.
 *
 * Again, we pass off most of the work to the bus-level call.
 */
void driver_unregister(struct device_driver *drv)
{
	if (!drv || !drv->p) {
		WARN(1, "Unexpected driver unregister!\n");
		return;
	}
	driver_remove_groups(drv, drv->groups);
	bus_remove_driver(drv);
}
EXPORT_SYMBOL_GPL(driver_unregister);

That in turn calls bus_remove_driver(), defined in drivers/base/bus.c:

/**
 * bus_remove_driver - delete driver from bus's knowledge.
 * @drv: driver.
 *
 * Detach the driver from the devices it controls, and remove
 * it from its bus's list of drivers. Finally, we drop the reference
 * to the bus we took in bus_add_driver().
 */
void bus_remove_driver(struct device_driver *drv)
{
	if (!drv->bus)
		return;

	if (!drv->suppress_bind_attrs)
		remove_bind_files(drv);
	driver_remove_groups(drv, drv->bus->drv_groups);
	driver_remove_file(drv, &driver_attr_uevent);
	klist_remove(&drv->p->knode_bus);
	pr_debug("bus: '%s': remove driver %s\n", drv->bus->name, drv->name);
	driver_detach(drv);
	module_remove_driver(drv);
	kobject_put(&drv->p->kobj);
	bus_put(drv->bus);
}

This function detaches the platform driver from the devices it was bound to. The call chain is fairly long, so here it is written out directly:
bus_remove_driver --> driver_detach --> __device_release_driver --> devres_release_all. Let's go straight to devres_release_all(). As the name suggests, it releases all of a device's managed resources. It is defined in drivers/base/devres.c:

/**
 * devres_release_all - Release all managed resources
 * @dev: Device to release resources for
 *
 * Release all resources associated with @dev.  This function is
 * called on driver detach.
 */
int devres_release_all(struct device *dev)
{
	unsigned long flags;

	/* Looks like an uninitialized device structure */
	if (WARN_ON(dev->devres_head.next == NULL))
		return -ENODEV;
	spin_lock_irqsave(&dev->devres_lock, flags);
	return release_nodes(dev, dev->devres_head.next, &dev->devres_head,
			     flags);
}

It calls release_nodes(), also defined in drivers/base/devres.c:

static int release_nodes(struct device *dev, struct list_head *first,
			 struct list_head *end, unsigned long flags)
	__releases(&dev->devres_lock)
{
	LIST_HEAD(todo);
	int cnt;
	struct devres *dr, *tmp;

	cnt = remove_nodes(dev, first, end, &todo);

	spin_unlock_irqrestore(&dev->devres_lock, flags);

	/* Release.  Note that both devres and devres_group are
	 * handled as devres in the following loop.  This is safe.
	 */
	list_for_each_entry_safe_reverse(dr, tmp, &todo, node.entry) {
		devres_log(dev, &dr->node, "REL");
		dr->node.release(dev, dr->data);
		kfree(dr);
	}

	return cnt;
}

Note that, from the way struct list_head works, first corresponds to the first allocated memory block (its containing structure is a struct devres_node), while end marks the end of the list. What gets passed in as end is the address of the list head embedded in struct device, which does not correspond to any memory block; the head serves only as a sentinel. remove_nodes() gathers the nodes that need to be released onto a separate list headed by todo, so the device's memory resources are detached from the device and handed over to todo. remove_nodes() is defined in drivers/base/devres.c:

static int remove_nodes(struct device *dev,
			struct list_head *first, struct list_head *end,
			struct list_head *todo)
{
	int cnt = 0, nr_groups = 0;
	struct list_head *cur;

	/* First pass - move normal devres entries to @todo and clear
	 * devres_group colors.
	 */
	cur = first;
	while (cur != end) {
		struct devres_node *node;
		struct devres_group *grp;

		node = list_entry(cur, struct devres_node, entry);
		cur = cur->next;

		grp = node_to_group(node);
		if (grp) {
			/* clear color of group markers in the first pass */
			grp->color = 0;
			nr_groups++;
		} else {
			/* regular devres entry */
			if (&node->entry == first)
				first = first->next;
			list_move_tail(&node->entry, todo);
			cnt++;
		}
	}

	if (!nr_groups)
		return cnt;

	/* Second pass - Scan groups and color them.  A group gets
	 * color value of two iff the group is wholly contained in
	 * [cur, end).  That is, for a closed group, both opening and
	 * closing markers should be in the range, while just the
	 * opening marker is enough for an open group.
	 */
	cur = first;
	while (cur != end) {
		struct devres_node *node;
		struct devres_group *grp;

		node = list_entry(cur, struct devres_node, entry);
		cur = cur->next;

		grp = node_to_group(node);
		BUG_ON(!grp || list_empty(&grp->node[0].entry));

		grp->color++;
		if (list_empty(&grp->node[1].entry))
			grp->color++;

		BUG_ON(grp->color <= 0 || grp->color > 2);
		if (grp->color == 2) {
			/* No need to update cur or end.  The removed
			 * nodes are always before both.
			 */
			list_move_tail(&grp->node[0].entry, todo);
			list_del_init(&grp->node[1].entry);
		}
	}

	return cnt;
}

The function makes two passes over the list; we only need to care about the first one (the second pass deals with devres groups, which we have not introduced and which do not affect this analysis). The first pass is a while (cur != end) loop that visits every memory-block node on the list. Inside the loop, list_entry is used to obtain the address of the containing structure for each list node.
list_entry is a macro defined in include/linux/list.h:

/**
 * list_entry - get the struct for this entry
 * @ptr:	the &struct list_head pointer.
 * @type:	the type of the struct this is embedded in.
 * @member:	the name of the list_head within the struct.
 */
#define list_entry(ptr, type, member) \
	container_of(ptr, type, member)

As you can see, it is just the container_of macro again.

Next, list_move_tail(&node->entry, todo) is called on the node: it removes the node from its original list (the list hanging off struct device, i.e. the device's list of memory resources) and appends it to the tail of the list headed by todo. list_move_tail is defined in include/linux/list.h:

/**
 * list_move_tail - delete from one list and add as another's tail
 * @list: the entry to move
 * @head: the head that will follow our entry
 */
static inline void list_move_tail(struct list_head *list,
				  struct list_head *head)
{
	__list_del_entry(list);
	list_add_tail(list, head);
}

Once the whole list has been walked, remove_nodes() returns and we are back in release_nodes(). At this point the todo list holds every devres node of this platform device that needs to be released. All that remains is to walk that list, recover the struct devres for each node, and release its memory:

/* Release.  Note that both devres and devres_group are
 * handled as devres in the following loop.  This is safe.
 */
list_for_each_entry_safe_reverse(dr, tmp, &todo, node.entry) {
	devres_log(dev, &dr->node, "REL");
	dr->node.release(dev, dr->data);
	kfree(dr);
}

list_for_each_entry_safe_reverse is a macro defined in include/linux/list.h:

/**
 * list_for_each_entry_safe_reverse - iterate backwards over list safe against removal
 * @pos:	the type * to use as a loop cursor.
 * @n:		another type * to use as temporary storage
 * @head:	the head for your list.
 * @member:	the name of the list_head within the struct.
 *
 * Iterate backwards over list of given type, safe against removal
 * of list entry.
 */
#define list_for_each_entry_safe_reverse(pos, n, head, member)		\
	for (pos = list_last_entry(head, typeof(*pos), member),		\
		n = list_prev_entry(pos, member);			\
	     &pos->member != (head); 					\
	     pos = n, n = list_prev_entry(n, member))

As the comment says, this macro iterates backwards over the list headed by head and is safe against removal of the current entry; pos is the cursor pointing at the current entry, and it is all we need to work with. Here pos points to a struct devres; for convenience, here is its definition again:

struct devres {
	struct devres_node		node;
	/* -- 3 pointers */
	unsigned long long		data[];	/* guarantee ull alignment */
};

The release member inside node is the release callback for the memory, and data is the memory to be released. So dr->node.release(dev, dr->data) releases the memory associated with this node. In our case dr->node.release is devm_kmalloc_release, which is an empty function, so the actual freeing is done by the kfree(dr) that follows: since data[] is a flexible array embedded at the end of struct devres, freeing dr frees the caller's memory along with it. Once the whole todo list has been processed, every memory block bound to the platform device has been released. The figure below summarizes how memory is released automatically when the platform device and driver are detached:
[Figure: memory is released automatically when the platform device is removed]
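The release callback we just saw is also what the generic devres API exposes to drivers: devres_alloc() allocates a struct devres with a caller-supplied release function, and devres_add() hooks it onto the device, exactly as devm_kmalloc() did above. A minimal sketch, assuming hypothetical names (foo_state, foo_state_release, foo_setup):

#include <linux/device.h>
#include <linux/slab.h>

/* Hypothetical resource wrapper, for illustration only. */
struct foo_state {
	int enabled;
};

static void foo_state_release(struct device *dev, void *res)
{
	struct foo_state *st = res;

	/* Undo whatever the resource represents; 'res' itself is freed by the
	 * devres core (the kfree(dr) we saw in release_nodes()). */
	st->enabled = 0;
}

static int foo_setup(struct device *dev)
{
	struct foo_state *st;

	st = devres_alloc(foo_state_release, sizeof(*st), GFP_KERNEL);
	if (!st)
		return -ENOMEM;

	st->enabled = 1;
	devres_add(dev, st);	/* released automatically on driver detach */
	return 0;
}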
