NVDLA Kernel-Mode Driver Code Walkthrough, Part 1


Preface

This series aims to lay out the NVDLA kernel-mode driver in an organized way; if any of the analysis is wrong, please point it out.


I. Reading nvdla_gem.c, Part 1

1. The container_of macro

/*
This is a macro that converts a pointer to a structure member into a pointer to
the complete structure containing that member. Here, to_nvdla_obj(x) is defined
as container_of(x, struct nvdla_gem_object, object), where:

x: the member pointer to convert (here, a pointer to the embedded drm_gem_object).
struct nvdla_gem_object: the type of the containing structure.
object: the name of the member that x points to.

The macro lets you recover the containing structure from a member pointer
without computing offsets by hand or doing fragile pointer arithmetic, a very
common pattern in Linux kernel code for navigating between data structures.
The name to_nvdla_obj signals that it converts a member pointer into a
struct nvdla_gem_object pointer.
*/
#define to_nvdla_obj(x) container_of(x, struct nvdla_gem_object, object)

struct nvdla_gem_object {
	struct drm_gem_object object;
	/* kvaddr: kernel virtual address of the data buffer backing this object
	   (graphics/DMA data), letting the kernel access it directly without
	   physical address translation. */
	void *kvaddr; 
	/* dma_addr: the DMA (bus) address of the same buffer, as seen by the device. */
	dma_addr_t dma_addr;
	unsigned long dma_attrs;
};

For container_of, the kernel documentation gives a good description:

If you have a struct uio_map structure, finding its embedded kobject is
just a matter of using the kobj member.  Code that works with kobjects will
often have the opposite problem, however: given a struct kobject pointer,
what is the pointer to the containing structure?  You must avoid tricks
(such as assuming that the kobject is at the beginning of the structure)
and, instead, use the container_of() macro, found in <linux/kernel.h>::

    container_of(pointer, type, member)

where:

  * "pointer" is the pointer to the embedded kobject,
  * "type" is the type of the containing structure, and
  * "member" is the name of the structure field to which "pointer" points.

The return value from container_of() is a pointer to the corresponding
container type.

So container_of takes pointer, a pointer to a member embedded in some structure, and returns a pointer to the containing structure of the given type in which that member field lives.
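To make the mechanics concrete, here is a minimal user-space sketch that re-implements the macro with offsetof; the two structures are simplified mirrors of the NVDLA ones, and the program just checks that the round trip from member pointer back to containing structure works:

#include <stdio.h>
#include <stddef.h>

/* user-space re-implementation of the kernel macro, for demonstration only */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct drm_gem_object { size_t size; };

struct nvdla_gem_object {
	struct drm_gem_object object;
	void *kvaddr;
};

int main(void)
{
	struct nvdla_gem_object nobj = { .object = { .size = 4096 } };
	struct drm_gem_object *dobj = &nobj.object; /* only the member pointer */

	/* recover the containing structure from the member pointer */
	struct nvdla_gem_object *back =
		container_of(dobj, struct nvdla_gem_object, object);

	printf("round trip ok: %d\n", back == &nobj); /* prints 1 */
	return 0;
}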
We won't go deep into DRM for now. It's enough to know that DRM provides a split graphics driver architecture, separating the hardware driver, the kernel module, and the user-space driver; it supports multiple applications accessing the GPU at the same time and offers richer graphics features such as hardware acceleration and 3D acceleration; and, like other common driver frameworks, it exposes kernel interfaces through which user-space applications interact with the driver. Besides the "drm" part of drm_gem_object there is also the "gem" part: GEM (Graphics Execution Manager) is the piece of DRM responsible for memory management and release.

2. The nvdla_fill_task_desc function

Moving on with the code: nvdla_fill_task_desc lowers a task descriptor into the kernel. It copies the address count num_addresses from local_task, and copies the task's address entries into a freshly allocated kernel array handles; local_task->num_addresses * sizeof(struct nvdla_mem_handle) is exactly the space needed for all the per-task address data. The central structure here is nvdla_mem_handle.

static int32_t nvdla_fill_task_desc(struct nvdla_ioctl_submit_task *local_task,
				struct nvdla_task *task)
{
	struct nvdla_mem_handle *handles;

	/* update task desc fields */
	task->num_addresses = local_task->num_addresses; // plain field copy; no extra allocation needed


	handles = kzalloc(local_task->num_addresses *
				sizeof(struct nvdla_mem_handle), GFP_KERNEL); // Step 1: allocate (and zero) a small kernel buffer
	if (handles == NULL)
		return -EFAULT;

	/* get user addresses list */
	if (copy_from_user(handles,
		(void __user *)local_task->address_list,              // Step 2: copy address_list from user space into handles
		(task->num_addresses *                                // copy_from_user returns the number of bytes it could NOT copy
			sizeof(struct nvdla_mem_handle)))) {
		pr_err("failed to copy address list from user ptr\n");
		kfree(handles);
		return -EFAULT;
	}

	task->address_list = handles;  // handles is kernel-side data, while local_task came from user space

	return 0;
}

A few of the key points, expanded one by one:
(1) kzalloc and kfree

// GFP_KERNEL: the most common allocation flag. The allocation may sleep (the
//             process may be switched out) and may perform I/O, e.g. temporarily
//             writing pages out to disk or doing filesystem operations to obtain
//             free pages.
// kzalloc: part of the slab allocator. Instead of the large page-granularity
//          alloc_pages/free_pages scheme, which a driver rarely needs, the slab
//          allocator serves small allocations of a few bytes. For many objects of
//          one fixed size, use kmem_cache_create/alloc/free/destroy; for ad-hoc
//          small allocations there is a malloc/free-style interface: kmalloc,
//          kzalloc, kfree. kzalloc's prototype is
//          void *kzalloc(size_t size, gfp_t flags); (kmalloc plus zeroing)
// kfree: prototype is void kfree(const void *objp);
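For the fixed-size-object case mentioned above, the kmem_cache interface looks roughly like this. A hedged sketch: the cache name and the wrapper functions are illustrative, not taken from the NVDLA driver:

#include <linux/slab.h>

static struct kmem_cache *task_cache; /* illustrative dedicated cache */

static int example_cache_init(void)
{
	task_cache = kmem_cache_create("nvdla_task_cache",
				       sizeof(struct nvdla_task), 0,
				       SLAB_HWCACHE_ALIGN, NULL);
	return task_cache ? 0 : -ENOMEM;
}

static void example_cache_use(void)
{
	struct nvdla_task *t = kmem_cache_alloc(task_cache, GFP_KERNEL);

	if (t)
		kmem_cache_free(task_cache, t); /* return the object to the cache */
}

static void example_cache_exit(void)
{
	kmem_cache_destroy(task_cache);
}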

(2) copy_from_user

// copy_from_user: prototype is unsigned long __must_check copy_from_user(void *to, const void __user *from, unsigned long n);
// copy_to_user:   prototype is unsigned long __must_check copy_to_user(void __user *to, const void *from, unsigned long n);
// __must_check means the return value must be checked; n is the number of bytes to
// copy, and both functions return the number of bytes that could NOT be copied
// (0 on success).

// Versus memcpy: copy_to/from_user call access_ok to verify that the user-space
// range is actually readable/writable, and both functions may put the process to sleep.

// For single scalar values, get_user/put_user can be used instead.

// The __user annotation reminds driver authors that this memory belongs to user space.
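As a contrast with copy_from_user, here is a hedged sketch of get_user/put_user on a single scalar; the ioctl handler is illustrative and not part of this driver:

#include <linux/fs.h>
#include <linux/uaccess.h>

/* read one __u32 from user space, bump it, and write it back */
static long example_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	__u32 __user *uptr = (__u32 __user *)arg;
	__u32 val;

	if (get_user(val, uptr))	/* 0 on success, -EFAULT on a bad address */
		return -EFAULT;

	val += 1;

	if (put_user(val, uptr))
		return -EFAULT;

	return 0;
}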

(3) nvdla_mem_handle

/**
 * struct nvdla_mem_handle structure for memory handles
 *
 * @handle		handle to DMA buffer allocated in userspace
 * @reserved		Reserved for padding
 * @offset		offset in bytes from start address of buffer
 *
 */
/*
This structure packages the information needed to refer to a DMA buffer, so the
code can pass buffers around and manage them conveniently.
*/
struct nvdla_mem_handle {
	__u32 handle; // 32-bit handle to a DMA buffer allocated in user space; a unique identifier used to reach a specific buffer
	__u32 reserved; // 32-bit value reserved for padding; no specific use in the current definition, possibly for future extension
	__u64 offset; // 64-bit byte offset from the start of the DMA buffer, selecting the position to read or write
};

(4) nvdla_ioctl_submit_task and nvdla_task

/**
 * struct nvdla_ioctl_submit_task structure for single task information
 *
 * @num_addresses		total number of entries in address_list
 * @reserved			Reserved for padding
 * @address_list		pointer to array of struct nvdla_mem_handle
 *
 */
/*
This structure organizes and carries the information about a single task: the
memory addresses the task touches, its timeout setting, and so on. It is used
when communicating with the hardware and scheduling tasks.
*/
struct nvdla_ioctl_submit_task {
#define NVDLA_MAX_BUFFERS_PER_TASK (6144)
	__u32 num_addresses; // number of entries in address_list, i.e. how many memory address entries the task involves
#define NVDLA_NO_TIMEOUT    (0xffffffff)
	__u32 timeout; // task timeout; the special value NVDLA_NO_TIMEOUT (0xffffffff) means no limit, any other value makes the task time out after that long
	__u64 address_list; // user-space pointer to an array of struct nvdla_mem_handle describing the task's buffers (each entry: handle, reserved, 64-bit offset)
};

The definition of nvdla_task follows:

/**
 * @brief			Task information submitted from user space
 *
 * ref				Reference count for task
 * num_addresses		Number of addresses in address list
 * nvdla_dev			Pointer to NVDLA device
 * address_list			Address list
 * file				DRM file instance
 */
struct nvdla_task {
	struct kref ref;
	uint32_t num_addresses;
	struct nvdla_device *nvdla_dev;
	struct nvdla_mem_handle *address_list;
	struct drm_file *file;
};

Note: the name nvdla_ioctl_submit_task tells you that this structure is filled in from user space, while nvdla_task is the kernel-side structure that receives the task information from user space and hands it to the hardware abstraction layer for execution. In other words, nvdla_ioctl_submit_task is lowered into nvdla_task, with the per-buffer nvdla_mem_handle structure serving as the intermediary.

3. The nvdla_submit function

Moving on: nvdla_submit receives the argument arg, whose actual content is a struct nvdla_submit_args (carrying the tasks, the number of tasks, and so on). The task arg points to is copied into a struct nvdla_ioctl_submit_task, and nvdla_fill_task_desc then lowers the task data from user space into kernel space. Meanwhile, dev_get_drvdata on the incoming drm_device pointer drm fetches the driver data registered with the device, which yields the other key variable in the nvdla_fill_task_desc step, task; the drm_file structure, which holds the per-file-descriptor state of this open file, is also attached to task. Finally, nvdla_task_submit submits the NVDLA task and waits for it to complete.

static int32_t nvdla_submit(struct drm_device *drm, void *arg,
					struct drm_file *file)
{
	int32_t err = 0;
	struct nvdla_task *task;
	struct nvdla_ioctl_submit_task local_task;
	struct nvdla_ioctl_submit_task __user *user_task;
	struct nvdla_device *nvdla_dev = dev_get_drvdata(drm->dev);
	struct nvdla_submit_args *args =
			(struct nvdla_submit_args *)arg;

	user_task = (struct nvdla_ioctl_submit_task __user *)
			(uintptr_t)args->tasks;
	if (!user_task)
		return -EINVAL;

	/* IOCTL copy descriptors */
	if (copy_from_user(&local_task, (void __user *)user_task,
			(sizeof(*user_task))))
		return -EFAULT;

	task = kzalloc(sizeof(*task), GFP_KERNEL);
	if (task == NULL)
		return -EFAULT;

	nvdla_dev->task = task;
	kref_init(&task->ref);
	task->nvdla_dev = nvdla_dev;
	task->file = file;

	/* update task desc fields */
	err = nvdla_fill_task_desc(&local_task, task);
	if (err)
		goto free_task_desc;

	// submit the NVDLA task and wait for it to complete
	err = nvdla_task_submit(nvdla_dev, task); // declared in nvdla_core_callbacks.c

	kfree(task->address_list);

free_task_desc:
	kfree(task);
	return err;
}

The variables and functions involved are explained below.
(1) nvdla_device and dev_get_drvdata

/**
 * @brief			NVDLA device
 *
 * irq				Interrupt number associated with this device
 * ref				Reference count for device
 * base				IO mapped base address for device
 * nvdla_lock			Spinlock used for synchronization
 * drm				DRM device instance
 * task				Pointer to task in execution
 * config_data			Pointer to the configuration data
 * pdev				Pointer to NVDLA platform device
 * event_notifier		Completion object used to wait for events from HW
 * engine_context		Private data passed from engine in dla_engine_init
 */
struct nvdla_device {
	int32_t irq;
	struct kref ref;
	void __iomem *base;
	spinlock_t nvdla_lock;
	struct drm_device *drm;
	struct nvdla_task *task;
	struct nvdla_config *config_data;
	struct platform_device *pdev;
	struct completion event_notifier;

	void *engine_context;
};

As you can see, the structure holds the usual per-device state: interrupt number, platform device, DRM device, and so on.
The prototype of dev_get_drvdata is:

static inline void *dev_get_drvdata(const struct device *dev)
{
	return dev->driver_data;
}
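For dev_get_drvdata to return anything useful here, the probe path must have stored the pointer earlier with dev_set_drvdata. A hedged sketch of that pairing (the probe function below is illustrative; the real NVDLA probe code is not shown in this post):

#include <linux/platform_device.h>

static int nvdla_probe_sketch(struct platform_device *pdev)
{
	struct nvdla_device *nvdla_dev;

	nvdla_dev = devm_kzalloc(&pdev->dev, sizeof(*nvdla_dev), GFP_KERNEL);
	if (!nvdla_dev)
		return -ENOMEM;

	/* stash the pointer; nvdla_submit() later retrieves it via
	 * dev_get_drvdata(drm->dev), since drm->dev is this same struct device */
	dev_set_drvdata(&pdev->dev, nvdla_dev);
	return 0;
}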

(2) nvdla_submit_args

/**
 * struct nvdla_submit_args structure for task submit
 *
 * @tasks		pointer to array of struct nvdla_ioctl_submit_task
 * @num_tasks		number of entries in tasks
 * @flags		flags for task submit, no flags defined yet
 * @version		version of task structure
 *
 */
struct nvdla_submit_args {
	__u64 tasks;
	__u16 num_tasks;
#define NVDLA_MAX_TASKS_PER_SUBMIT	24
#define NVDLA_SUBMIT_FLAGS_ATOMIC	(1 << 0)
	__u16 flags;
	__u32 version;
};

This structure is the top-level argument used to submit tasks.
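Putting nvdla_submit_args, nvdla_ioctl_submit_task and nvdla_mem_handle together, a user-space submitter would fill them in roughly as below. This is a hedged sketch: DRM_IOCTL_NVDLA_SUBMIT is a placeholder name, and in real code the structures and the ioctl number come from the driver's uapi header rather than being pasted in:

#include <stdint.h>
#include <sys/ioctl.h>
/* assume the driver's uapi header, providing the nvdla_* structs and the real
 * ioctl request code; DRM_IOCTL_NVDLA_SUBMIT below is hypothetical */

int submit_one_task(int drm_fd, uint32_t gem_handle)
{
	struct nvdla_mem_handle addr = {
		.handle = gem_handle,	/* GEM handle from an earlier create ioctl */
		.reserved = 0,
		.offset = 0,		/* start of the buffer */
	};
	struct nvdla_ioctl_submit_task task = {
		.num_addresses = 1,
		.timeout = NVDLA_NO_TIMEOUT,
		.address_list = (uint64_t)(uintptr_t)&addr,
	};
	struct nvdla_submit_args args = {
		.tasks = (uint64_t)(uintptr_t)&task,
		.num_tasks = 1,
		.flags = 0,
		.version = 0,
	};

	return ioctl(drm_fd, DRM_IOCTL_NVDLA_SUBMIT, &args);
}
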
(3) kref_init

/*
kref_init takes a pointer to a struct kref, the embedded reference-count object,
and initializes its count to 1.

Afterwards, other kernel functions increment or decrement the count to track the
object's references. When the count reaches zero, the object is no longer
referenced and its resources can safely be destroyed or released.
*/
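A minimal sketch of the kref lifecycle, with an illustrative object and release function (not from the NVDLA code):

#include <linux/kref.h>
#include <linux/slab.h>

struct example_obj {
	struct kref ref;
	/* ... payload ... */
};

static void example_release(struct kref *kref)
{
	/* container_of again: recover the object from its embedded kref */
	struct example_obj *obj = container_of(kref, struct example_obj, ref);

	kfree(obj);
}

static void example_kref_usage(void)
{
	struct example_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

	if (!obj)
		return;

	kref_init(&obj->ref);			/* refcount = 1 */
	kref_get(&obj->ref);			/* refcount = 2 */
	kref_put(&obj->ref, example_release);	/* refcount = 1 */
	kref_put(&obj->ref, example_release);	/* reaches 0: example_release() runs */
}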

(4) drm_file
The drm_file structure definition can be found in the kernel code; it is reproduced below without further discussion, as groundwork for later notes:

/**
 * struct drm_file - DRM file private data
 *
 * This structure tracks DRM state per open file descriptor.
 */
struct drm_file {
	/**
	 * @authenticated:
	 *
	 * Whether the client is allowed to submit rendering, which for legacy
	 * nodes means it must be authenticated.
	 *
	 * See also the :ref:`section on primary nodes and authentication
	 * <drm_primary_node>`.
	 */
	unsigned authenticated :1;

	/**
	 * @stereo_allowed:
	 *
	 * True when the client has asked us to expose stereo 3D mode flags.
	 */
	unsigned stereo_allowed :1;

	/**
	 * @universal_planes:
	 *
	 * True if client understands CRTC primary planes and cursor planes
	 * in the plane list. Automatically set when @atomic is set.
	 */
	unsigned universal_planes:1;

	/** @atomic: True if client understands atomic properties. */
	unsigned atomic:1;

	/**
	 * @aspect_ratio_allowed:
	 *
	 * True, if client can handle picture aspect ratios, and has requested
	 * to pass this information along with the mode.
	 */
	unsigned aspect_ratio_allowed:1;

	/**
	 * @writeback_connectors:
	 *
	 * True if client understands writeback connectors
	 */
	unsigned writeback_connectors:1;

	/**
	 * @is_master:
	 *
	 * This client is the creator of @master. Protected by struct
	 * &drm_device.master_mutex.
	 *
	 * See also the :ref:`section on primary nodes and authentication
	 * <drm_primary_node>`.
	 */
	unsigned is_master:1;

	/**
	 * @master:
	 *
	 * Master this node is currently associated with. Only relevant if
	 * drm_is_primary_client() returns true. Note that this only
	 * matches &drm_device.master if the master is the currently active one.
	 *
	 * See also @authentication and @is_master and the :ref:`section on
	 * primary nodes and authentication <drm_primary_node>`.
	 */
	struct drm_master *master;

	/** @pid: Process that opened this file. */
	struct pid *pid;

	/** @magic: Authentication magic, see @authenticated. */
	drm_magic_t magic;

	/**
	 * @lhead:
	 *
	 * List of all open files of a DRM device, linked into
	 * &drm_device.filelist. Protected by &drm_device.filelist_mutex.
	 */
	struct list_head lhead;

	/** @minor: &struct drm_minor for this file. */
	struct drm_minor *minor;

	/**
	 * @object_idr:
	 *
	 * Mapping of mm object handles to object pointers. Used by the GEM
	 * subsystem. Protected by @table_lock.
	 */
	struct idr object_idr;

	/** @table_lock: Protects @object_idr. */
	spinlock_t table_lock;

	/** @syncobj_idr: Mapping of sync object handles to object pointers. */
	struct idr syncobj_idr;
	/** @syncobj_table_lock: Protects @syncobj_idr. */
	spinlock_t syncobj_table_lock;

	/** @filp: Pointer to the core file structure. */
	struct file *filp;

	/**
	 * @driver_priv:
	 *
	 * Optional pointer for driver private data. Can be allocated in
	 * &drm_driver.open and should be freed in &drm_driver.postclose.
	 */
	void *driver_priv;

	/**
	 * @fbs:
	 *
	 * List of &struct drm_framebuffer associated with this file, using the
	 * &drm_framebuffer.filp_head entry.
	 *
	 * Protected by @fbs_lock. Note that the @fbs list holds a reference on
	 * the framebuffer object to prevent it from untimely disappearing.
	 */
	struct list_head fbs;

	/** @fbs_lock: Protects @fbs. */
	struct mutex fbs_lock;

	/**
	 * @blobs:
	 *
	 * User-created blob properties; this retains a reference on the
	 * property.
	 *
	 * Protected by @drm_mode_config.blob_lock;
	 */
	struct list_head blobs;

	/** @event_wait: Waitqueue for new events added to @event_list. */
	wait_queue_head_t event_wait;

	/**
	 * @pending_event_list:
	 *
	 * List of pending &struct drm_pending_event, used to clean up pending
	 * events in case this file gets closed before the event is signalled.
	 * Uses the &drm_pending_event.pending_link entry.
	 *
	 * Protect by &drm_device.event_lock.
	 */
	struct list_head pending_event_list;

	/**
	 * @event_list:
	 *
	 * List of &struct drm_pending_event, ready for delivery to userspace
	 * through drm_read(). Uses the &drm_pending_event.link entry.
	 *
	 * Protect by &drm_device.event_lock.
	 */
	struct list_head event_list;

	/**
	 * @event_space:
	 *
	 * Available event space to prevent userspace from
	 * exhausting kernel memory. Currently limited to the fairly arbitrary
	 * value of 4KB.
	 */
	int event_space;

	/** @event_read_lock: Serializes drm_read(). */
	struct mutex event_read_lock;

	/**
	 * @prime:
	 *
	 * Per-file buffer caches used by the PRIME buffer sharing code.
	 */
	struct drm_prime_file_private prime;

	/* private: */
	unsigned long lock_count; /* DRI1 legacy lock count */
};

4. The nvdla_gem_alloc function

Moving on: nvdla_gem_alloc takes the structure NVDLA uses for buffer management, nvdla_gem_object. As introduced above, it holds three important pieces: the drm_gem_object responsible for buffer allocation and management under DRM, the kernel virtual address kvaddr, and the DMA-related fields. The function as a whole performs the DMA buffer allocation.

static int32_t nvdla_gem_alloc(struct nvdla_gem_object *nobj)
{
	struct drm_gem_object *dobj = &nobj->object;   // GEM buffer object
	struct drm_device *drm = dobj->dev;            // DRM dev this object belongs to.

	nobj->dma_attrs = DMA_ATTR_WRITE_COMBINE;

	// dma_alloc_attrs works much like dma_alloc_coherent; its parameters:
	// 1st: the DMA device d
	// 2nd: the requested DMA buffer size s
	// 3rd: out-parameter receiving the bus address of the DMA buffer h
	// 4th: the allocation flags f
	// The return value is the kernel virtual address c of the mapping.
	// Freeing uses dma_free_attrs(d, s, c, h, attrs), where c is that kernel virtual address.
	nobj->kvaddr = dma_alloc_attrs(drm->dev, dobj->size, &nobj->dma_addr,  // allocate a DMA buffer for the drm_device's GEM buffer object
						GFP_KERNEL, nobj->dma_attrs);

	if (!nobj->kvaddr)
		return -ENOMEM;

	return 0;
}

The important pieces are:
(1) The drm_gem_object structure

/**
 * struct drm_gem_object - GEM buffer object
 *
 * This structure defines the generic parts for GEM buffer objects, which are
 * mostly around handling mmap and userspace handles.
 *
 * Buffer objects are often abbreviated to BO.
 */
struct drm_gem_object {
	/**
	 * @refcount:
	 *
	 * Reference count of this object
	 *
	 * Please use drm_gem_object_get() to acquire and drm_gem_object_put()
	 * or drm_gem_object_put_unlocked() to release a reference to a GEM
	 * buffer object.
	 */
	struct kref refcount;

	/**
	 * @handle_count:
	 *
	 * This is the GEM file_priv handle count of this object.
	 *
	 * Each handle also holds a reference. Note that when the handle_count
	 * drops to 0 any global names (e.g. the id in the flink namespace) will
	 * be cleared.
	 *
	 * Protected by &drm_device.object_name_lock.
	 */
	unsigned handle_count;

	/**
	 * @dev: DRM dev this object belongs to.
	 */
	struct drm_device *dev;

	/**
	 * @filp:
	 *
	 * SHMEM file node used as backing storage for swappable buffer objects.
	 * GEM also supports driver private objects with driver-specific backing
	 * storage (contiguous CMA memory, special reserved blocks). In this
	 * case @filp is NULL.
	 */
	struct file *filp;

	/**
	 * @vma_node:
	 *
	 * Mapping info for this object to support mmap. Drivers are supposed to
	 * allocate the mmap offset using drm_gem_create_mmap_offset(). The
	 * offset itself can be retrieved using drm_vma_node_offset_addr().
	 *
	 * Memory mapping itself is handled by drm_gem_mmap(), which also checks
	 * that userspace is allowed to access the object.
	 */
	struct drm_vma_offset_node vma_node;

	/**
	 * @size:
	 *
	 * Size of the object, in bytes.  Immutable over the object's
	 * lifetime.
	 */
	size_t size;

	/**
	 * @name:
	 *
	 * Global name for this object, starts at 1. 0 means unnamed.
	 * Access is covered by &drm_device.object_name_lock. This is used by
	 * the GEM_FLINK and GEM_OPEN ioctls.
	 */
	int name;

	/**
	 * @dma_buf:
	 *
	 * dma-buf associated with this GEM object.
	 *
	 * Pointer to the dma-buf associated with this gem object (either
	 * through importing or exporting). We break the resulting reference
	 * loop when the last gem handle for this object is released.
	 *
	 * Protected by &drm_device.object_name_lock.
	 */
	struct dma_buf *dma_buf;

	/**
	 * @import_attach:
	 *
	 * dma-buf attachment backing this object.
	 *
	 * Any foreign dma_buf imported as a gem object has this set to the
	 * attachment point for the device. This is invariant over the lifetime
	 * of a gem object.
	 *
	 * The &drm_driver.gem_free_object callback is responsible for cleaning
	 * up the dma_buf attachment and references acquired at import time.
	 *
	 * Note that the drm gem/prime core does not depend upon drivers setting
	 * this field any more. So for drivers where this doesn't make sense
	 * (e.g. virtual devices or a displaylink behind an usb bus) they can
	 * simply leave it as NULL.
	 */
	struct dma_buf_attachment *import_attach;
};

Mainly two of its fields are used here:

/**
	 * @dev: DRM dev this object belongs to.
	 */
	struct drm_device *dev; // the device this object belongs to
/**
	 * @size:
	 *
	 * Size of the object, in bytes.  Immutable over the object's
	 * lifetime.
	 */
	size_t size;  // the size of this object

(2) The drm_device structure:

/**
 * DRM device structure. This structure represent a complete card that
 * may contain multiple heads.
 */
struct drm_device {
	struct list_head legacy_dev_list;/**< list of devices per driver for stealth attach cleanup */
	int if_version;			/**< Highest interface version set */

	/** \name Lifetime Management */
	/*@{ */
	struct kref ref;		/**< Object ref-count */
	struct device *dev;		/**< Device structure of bus-device */
	struct drm_driver *driver;	/**< DRM driver managing the device */
	void *dev_private;		/**< DRM driver private data */
	struct drm_minor *primary;		/**< Primary node */
	struct drm_minor *render;		/**< Render node */
	bool registered;

	/* currently active master for this device. Protected by master_mutex */
	struct drm_master *master;

	/**
	 * @unplugged:
	 *
	 * Flag to tell if the device has been unplugged.
	 * See drm_dev_enter() and drm_dev_is_unplugged().
	 */
	bool unplugged;

	struct inode *anon_inode;		/**< inode for private address-space */
	char *unique;				/**< unique name of the device */
	/*@} */

	/** \name Locks */
	/*@{ */
	struct mutex struct_mutex;	/**< For others */
	struct mutex master_mutex;      /**< For drm_minor::master and drm_file::is_master */
	/*@} */

	/** \name Usage Counters */
	/*@{ */
	int open_count;			/**< Outstanding files open, protected by drm_global_mutex. */
	spinlock_t buf_lock;		/**< For drm_device::buf_use and a few other things. */
	int buf_use;			/**< Buffers in use -- cannot alloc */
	atomic_t buf_alloc;		/**< Buffer allocation in progress */
	/*@} */

	struct mutex filelist_mutex;
	struct list_head filelist;

	/**
	 * @filelist_internal:
	 *
	 * List of open DRM files for in-kernel clients. Protected by @filelist_mutex.
	 */
	struct list_head filelist_internal;

	/**
	 * @clientlist_mutex:
	 *
	 * Protects @clientlist access.
	 */
	struct mutex clientlist_mutex;

	/**
	 * @clientlist:
	 *
	 * List of in-kernel clients. Protected by @clientlist_mutex.
	 */
	struct list_head clientlist;

	/** \name Memory management */
	/*@{ */
	struct list_head maplist;	/**< Linked list of regions */
	struct drm_open_hash map_hash;	/**< User token hash table for maps */

	/** \name Context handle management */
	/*@{ */
	struct list_head ctxlist;	/**< Linked list of context handles */
	struct mutex ctxlist_mutex;	/**< For ctxlist */

	struct idr ctx_idr;

	struct list_head vmalist;	/**< List of vmas (for debugging) */

	/*@} */

	/** \name DMA support */
	/*@{ */
	struct drm_device_dma *dma;		/**< Optional pointer for DMA support */
	/*@} */

	/** \name Context support */
	/*@{ */

	__volatile__ long context_flag;	/**< Context swapping flag */
	int last_context;		/**< Last current context */
	/*@} */

	/**
	 * @irq_enabled:
	 *
	 * Indicates that interrupt handling is enabled, specifically vblank
	 * handling. Drivers which don't use drm_irq_install() need to set this
	 * to true manually.
	 */
	bool irq_enabled;
	int irq;

	/**
	 * @vblank_disable_immediate:
	 *
	 * If true, vblank interrupt will be disabled immediately when the
	 * refcount drops to zero, as opposed to via the vblank disable
	 * timer.
	 *
	 * This can be set to true it the hardware has a working vblank counter
	 * with high-precision timestamping (otherwise there are races) and the
	 * driver uses drm_crtc_vblank_on() and drm_crtc_vblank_off()
	 * appropriately. See also @max_vblank_count and
	 * &drm_crtc_funcs.get_vblank_counter.
	 */
	bool vblank_disable_immediate;

	/**
	 * @vblank:
	 *
	 * Array of vblank tracking structures, one per &struct drm_crtc. For
	 * historical reasons (vblank support predates kernel modesetting) this
	 * is free-standing and not part of &struct drm_crtc itself. It must be
	 * initialized explicitly by calling drm_vblank_init().
	 */
	struct drm_vblank_crtc *vblank;

	spinlock_t vblank_time_lock;    /**< Protects vblank count and time updates during vblank enable/disable */
	spinlock_t vbl_lock;

	/**
	 * @max_vblank_count:
	 *
	 * Maximum value of the vblank registers. This value +1 will result in a
	 * wrap-around of the vblank register. It is used by the vblank core to
	 * handle wrap-arounds.
	 *
	 * If set to zero the vblank core will try to guess the elapsed vblanks
	 * between times when the vblank interrupt is disabled through
	 * high-precision timestamps. That approach is suffering from small
	 * races and imprecision over longer time periods, hence exposing a
	 * hardware vblank counter is always recommended.
	 *
	 * If non-zeor, &drm_crtc_funcs.get_vblank_counter must be set.
	 */
	u32 max_vblank_count;           /**< size of vblank counter register */

	/**
	 * List of events
	 */
	struct list_head vblank_event_list;
	spinlock_t event_lock;

	/*@} */

	struct drm_agp_head *agp;	/**< AGP data */

	struct pci_dev *pdev;		/**< PCI device structure */
#ifdef __alpha__
	struct pci_controller *hose;
#endif

	struct drm_sg_mem *sg;	/**< Scatter gather memory */
	unsigned int num_crtcs;                  /**< Number of CRTCs on this device */

	struct {
		int context;
		struct drm_hw_lock *lock;
	} sigdata;

	struct drm_local_map *agp_buffer_map;
	unsigned int agp_buffer_token;

	struct drm_mode_config mode_config;	/**< Current mode config */

	/** \name GEM information */
	/*@{ */
	struct mutex object_name_lock;
	struct idr object_name_idr;
	struct drm_vma_offset_manager *vma_offset_manager;
	/*@} */
	int switch_power_state;

	/**
	 * @fb_helper:
	 *
	 * Pointer to the fbdev emulation structure.
	 * Set by drm_fb_helper_init() and cleared by drm_fb_helper_fini().
	 */
	struct drm_fb_helper *fb_helper;
};

Mainly the following field is used:

struct device *dev;		/**< Device structure of bus-device */

(3) dma_alloc_attrs, in light of the structures above.
First, the function's documentation in the kernel:

Warning: These pieces of the DMA API should not be used in the
majority of cases, since they cater for unlikely corner cases that
don't belong in usual drivers.

If you don't understand how cache line coherency works between a
processor and an I/O device, you should not be using this part of the
API at all.

::

	void *
	dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
			gfp_t flag, unsigned long attrs)

Identical to dma_alloc_coherent() except that when the
DMA_ATTR_NON_CONSISTENT flags is passed in the attrs argument, the
platform will choose to return either consistent or non-consistent memory
as it sees fit.  By using this API, you are guaranteeing to the platform
that you have all the correct and necessary sync points for this memory
in the driver should it choose to return non-consistent memory.

Note: where the platform can return consistent memory, it will
guarantee that the sync points become nops.

Warning:  Handling non-consistent memory is a real pain.  You should
only use this API if you positively know your driver will be
required to work on one of the rare (usually non-PCI) architectures
that simply cannot make consistent memory.

The function itself is defined as follows:

static inline void *dma_alloc_attrs(struct device *dev, size_t size,
				       dma_addr_t *dma_handle, gfp_t flag,
				       unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);
	void *cpu_addr;

	BUG_ON(!ops);
	WARN_ON_ONCE(dev && !dev->coherent_dma_mask);

	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
		return cpu_addr;

	/* let the implementation decide on the zone to allocate from: */
	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);

	if (!arch_dma_alloc_attrs(&dev))
		return NULL;
	if (!ops->alloc)
		return NULL;

	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
	return cpu_addr;
}

In short: identical to dma_alloc_coherent(), except that when DMA_ATTR_NON_CONSISTENT is passed in attrs, the platform may return either consistent or non-consistent memory as it sees fit; by using this API, the driver guarantees it has all the correct and necessary sync points for this memory in case non-consistent memory is returned.
The documentation says this API should only be used when the driver must work on one of the rare (usually non-PCI) architectures that simply cannot provide consistent memory. NVDLA uses it anyway and runs without problems, so we won't second-guess the API choice here and will only look at what it does.

nobj->kvaddr = dma_alloc_attrs(drm->dev, dobj->size, &nobj->dma_addr,GFP_KERNEL, nobj->dma_attrs);
// drm->dev: the struct device of the drm_device this drm_gem_object belongs to
//           (dma_alloc_attrs needs a struct device, which drm_gem_object does not
//           carry directly, so we go through dobj->dev, the drm_device, whose dev
//           field is the bus device structure)
// dobj->size: the size of the object
// &nobj->dma_addr: out-parameter receiving the bus address
// nobj->dma_attrs: set to DMA_ATTR_WRITE_COMBINE; dma-attributes.txt describes it:
/*
DMA_ATTR_WRITE_COMBINE
----------------------

DMA_ATTR_WRITE_COMBINE specifies that writes to the mapping may be
buffered to improve performance.

Since it is optional for platforms to implement DMA_ATTR_WRITE_COMBINE,
those that do not will simply ignore the attribute and exhibit default
behavior.
*/

5. The nvdla_gem_free function

Moving on to nvdla_gem_free:

// releases the device DMA buffer allocated earlier
static void nvdla_gem_free(struct nvdla_gem_object *nobj)
{
	struct drm_gem_object *dobj = &nobj->object;
	struct drm_device *drm = dobj->dev;

	dma_free_attrs(drm->dev, dobj->size, nobj->kvaddr, nobj->dma_addr,
				nobj->dma_attrs);
}

6. The nvdla_gem_create_object function

Moving on: nvdla_gem_create_object creates an NVDLA GEM object, the kernel object through which the DMA buffer is allocated and managed.

static struct nvdla_gem_object *
nvdla_gem_create_object(struct drm_device *drm, uint32_t size)
{
	int32_t ret;
	struct drm_gem_object *dobj;
	struct nvdla_gem_object *nobj;

	/* Round size up to the nearest multiple of PAGE_SIZE, since kernel memory management works at page granularity. */
	size = round_up(size, PAGE_SIZE);

	/* Allocate a zeroed block of sizeof(*nobj) bytes; nobj points to the struct nvdla_gem_object representing the NVDLA GEM object. */
	nobj = kzalloc(sizeof(*nobj), GFP_KERNEL);
	if (!nobj)
		return ERR_PTR(-ENOMEM);

	dobj = &nobj->object;

	/* Initialize the GEM object as a private object of the given size (no GEM-provided backing store). */
	drm_gem_private_object_init(drm, dobj, size);

	ret = nvdla_gem_alloc(nobj);
	if (ret)
		goto free_nvdla_obj;

	return nobj;

free_nvdla_obj:
	kfree(nobj);
	return ERR_PTR(ret);
}

(1) The pattern is the same as before: wherever drm_device *drm shows up, think of the drm_gem_object and nvdla_gem_object structures. The function wires the three together through two calls, drm_gem_private_object_init and nvdla_gem_alloc; the latter pushes from the input drm_device *drm through drm_gem_object *dobj to nvdla_gem_object *nobj and performs the DMA allocation.
(2) drm_gem_private_object_init is defined as follows:

/**
 * drm_gem_private_object_init - initialize an allocated private GEM object
 * @dev: drm_device the object should be initialized for
 * @obj: drm_gem_object to initialize
 * @size: object size
 *
 * Initialize an already allocated GEM object of the specified size with
 * no GEM provided backing store. Instead the caller is responsible for
 * backing the object and handling it.
 */
void drm_gem_private_object_init(struct drm_device *dev,
				 struct drm_gem_object *obj, size_t size)
{
	BUG_ON((size & (PAGE_SIZE - 1)) != 0);

	obj->dev = dev;
	obj->filp = NULL;

	kref_init(&obj->refcount);
	obj->handle_count = 0;
	obj->size = size;
	drm_vma_node_reset(&obj->vma_node);
}

As the kernel comment explains, this initializes an already-allocated GEM object of the given size but provides no GEM backing store; the caller is responsible for backing and handling the object, which is exactly what nvdla_gem_alloc does next with its DMA allocation.

7. The nvdla_gem_free_object function

Moving on: nvdla_gem_free_object releases an NVDLA GEM object, destroying the kernel object and freeing the previously allocated DMA buffer:

// releases an NVDLA GEM object: destroys the kernel object and frees its DMA buffer
static void nvdla_gem_free_object(struct drm_gem_object *dobj)
{
	struct nvdla_gem_object *nobj;

	drm_gem_free_mmap_offset(dobj);

	nobj = to_nvdla_obj(dobj);

	nvdla_gem_free(nobj);

	kfree(nobj);
}

(1) drm_gem_free_mmap_offset is worth a word: it releases the fake mmap offset of a GEM object. When drm_gem_object_release() is used, drivers need not call it themselves, since the former calls it automatically.

/**
 * drm_gem_free_mmap_offset - release a fake mmap offset for an object
 * @obj: obj in question
 *
 * This routine frees fake offsets allocated by drm_gem_create_mmap_offset().
 *
 * Note that drm_gem_object_release() already calls this function, so drivers
 * don't have to take care of releasing the mmap offset themselves when freeing
 * the GEM object.
 */
void
drm_gem_free_mmap_offset(struct drm_gem_object *obj)
{
	struct drm_device *dev = obj->dev;

	drm_vma_offset_remove(dev->vma_offset_manager, &obj->vma_node);
}

(2) to_nvdla_obj in nobj = to_nvdla_obj(dobj) is exactly the macro introduced at the beginning:

container_of(x, struct nvdla_gem_object, object): the generic Linux kernel macro
that derives the pointer to the containing structure from the member pointer x
// here it recovers nvdla_gem_object *nobj from drm_gem_object *dobj

8. The nvdla_gem_create_with_handle function

Moving on: nvdla_gem_create_with_handle creates an NVDLA GEM object with a handle; it lets a user-space application create a GEM object and get back a handle to it.

static struct nvdla_gem_object *
nvdla_gem_create_with_handle(struct drm_file *file_priv,
				struct drm_device *drm, uint32_t size,
				uint32_t *handle)
{
	int32_t ret;
	struct drm_gem_object *dobj;
	struct nvdla_gem_object *nobj;

	nobj = nvdla_gem_create_object(drm, size); // create the NVDLA GEM object nobj
	if (IS_ERR(nobj))
		return ERR_CAST(nobj);

	dobj = &nobj->object;

	/* drm_gem_handle_create creates a handle for the GEM object; the handle is returned to the user-space application, which uses it to refer to the object. */
	ret = drm_gem_handle_create(file_priv, dobj, handle); 
	if (ret)
		goto free_drm_object;

	/* Drop this function's reference on dobj. Once the handle exists, user space holds a handle reference to the GEM object, so the local reference is no longer needed; dropping it lets the object be released correctly when no longer used. */
	drm_gem_object_unreference_unlocked(dobj);

	return nobj;

free_drm_object:
	nvdla_gem_free_object(dobj);

	return ERR_PTR(ret);
}

First, from the incoming drm_device *drm and uint32_t size, the DRM GEM object is initialized and its DMA buffer allocated, returning the nvdla_gem_object *nobj threaded through the nested structures. Two calls deserve a closer look: drm_gem_handle_create and drm_gem_object_unreference_unlocked.
(1) drm_gem_handle_create:

/**
 * drm_gem_handle_create - create a gem handle for an object
 * @file_priv: drm file-private structure to register the handle for
 * @obj: object to register
 * @handlep: pionter to return the created handle to the caller
 *
 * Create a handle for this object. This adds a handle reference to the object,
 * which includes a regular reference count. Callers will likely want to
 * dereference the object afterwards.
 *
 * Since this publishes @obj to userspace it must be fully set up by this point,
 * drivers must call this last in their buffer object creation callbacks.
 */
int drm_gem_handle_create(struct drm_file *file_priv,
			  struct drm_gem_object *obj,
			  u32 *handlep)
{
	mutex_lock(&obj->dev->object_name_lock);

	return drm_gem_handle_create_tail(file_priv, obj, handlep);
}

As the comment says, this creates a handle for the object, which adds a handle reference (itself including a regular reference count); callers will typically want to drop their own reference afterwards. Looking at the surrounding call, both handle and size ultimately come from user space, so user space decides how much memory the DRM GEM object gets; the handle is what lets user space name the GEM object it created, tied to this drm_file's per-file-descriptor state.
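The handle is what later ioctls use to find their way back to the object. A hedged sketch of the reverse direction (drm_gem_object_lookup with this two-argument signature exists in kernels of roughly the same vintage as this driver; the wrapper itself is illustrative):

static struct nvdla_gem_object *
lookup_nvdla_obj(struct drm_file *file, uint32_t handle)
{
	struct drm_gem_object *dobj;

	dobj = drm_gem_object_lookup(file, handle); /* takes a reference */
	if (!dobj)
		return NULL;

	/* caller is responsible for dropping the reference when done */
	return to_nvdla_obj(dobj);
}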
(2) drm_gem_object_unreference_unlocked:

/**
 * drm_gem_object_unreference_unlocked - release a GEM buffer object reference
 * @obj: GEM buffer object
 *
 * This is a compatibility alias for drm_gem_object_put_unlocked() and should
 * not be used by new code.
 */
static inline void
drm_gem_object_unreference_unlocked(struct drm_gem_object *obj)
{
	drm_gem_object_put_unlocked(obj);
}

Again, once the handle has been created, the user-space application holds a handle reference to the GEM object, so the reference on dobj is no longer needed.

9. The nvdla_gem_create function

Finally, nvdla_gem_create is a thin wrapper around nvdla_gem_create_with_handle: it unpacks the ioctl argument and delegates the actual work.

static int32_t nvdla_gem_create(struct drm_device *drm, void *data,
				struct drm_file *file)
{
	struct nvdla_gem_object *nobj;
	struct nvdla_gem_create_args *args = data;

	nobj = nvdla_gem_create_with_handle(file, drm, args->size,
					 &args->handle);
	if (IS_ERR(nobj))
		return PTR_ERR(nobj);

	return 0;
}
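From user space this is reached through a GEM-create ioctl. The nvdla_gem_create_args type is not shown in this post, so the sketch below infers a plausible layout from the args->size and args->handle accesses above; both the struct layout and the ioctl name are assumptions, not taken from the real uapi header:

#include <stdint.h>
#include <sys/ioctl.h>

/* hypothetical layout: size in, handle out */
struct nvdla_gem_create_args_sketch {
	uint32_t handle;	/* out: GEM handle */
	uint32_t flags;
	uint64_t size;		/* in: requested buffer size */
};

int create_buffer(int drm_fd, uint64_t size, uint32_t *handle_out)
{
	struct nvdla_gem_create_args_sketch args = { .size = size };

	/* DRM_IOCTL_NVDLA_GEM_CREATE is a placeholder name */
	if (ioctl(drm_fd, DRM_IOCTL_NVDLA_GEM_CREATE, &args))
		return -1;

	*handle_out = args.handle;
	return 0;
}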

This is a good place to stop; the post is already long, so the rest will continue in the next one.

II. Functions in nvdla_gem.c: summary, Part 1

Function prototype and purpose:

static int32_t nvdla_fill_task_desc(struct nvdla_ioctl_submit_task *local_task, struct nvdla_task *task)
	Lowers a task descriptor into the kernel: copies num_addresses from local_task and the per-task address entries into a kernel array, allocating local_task->num_addresses * sizeof(struct nvdla_mem_handle) bytes for all the task's address data.

static int32_t nvdla_submit(struct drm_device *drm, void *arg, struct drm_file *file)
	Ioctl entry point for task submission. arg actually carries a struct nvdla_submit_args (tasks, number of tasks, and so on); the task it points to is copied into a struct nvdla_ioctl_submit_task, and nvdla_fill_task_desc lowers the task data from user space into kernel space. dev_get_drvdata on drm retrieves the driver data, yielding the other key variable task, to which the drm_file (per-file-descriptor state) is attached. Finally nvdla_task_submit submits the NVDLA task and waits for it to complete.

static int32_t nvdla_gem_alloc(struct nvdla_gem_object *nobj)
	Takes NVDLA's buffer-management structure nvdla_gem_object (the drm_gem_object for DRM-side allocation and management, the kernel virtual address kvaddr, and the DMA fields) and performs the DMA buffer allocation.

static void nvdla_gem_free(struct nvdla_gem_object *nobj)
	Frees the device DMA buffer obtained by nvdla_gem_alloc.

static struct nvdla_gem_object * nvdla_gem_create_object(struct drm_device *drm, uint32_t size)
	Creates an NVDLA GEM object and its DMA buffer: the first half uses the kernel API drm_gem_private_object_init, the second half calls nvdla_gem_alloc.

static void nvdla_gem_free_object(struct drm_gem_object *dobj)
	Releases an NVDLA GEM object, destroying the kernel object and freeing its previously allocated DMA buffer.

static struct nvdla_gem_object * nvdla_gem_create_with_handle(struct drm_file *file_priv, struct drm_device *drm, uint32_t size, uint32_t *handle)
	Creates an NVDLA GEM object with a handle, letting a user-space application create a GEM object and receive a handle to it.

static int32_t nvdla_gem_create(struct drm_device *drm, void *data, struct drm_file *file)
	Thin wrapper over nvdla_gem_create_with_handle.

III. Structures involved in nvdla_gem.c: summary, Part 1

Structure and purpose:

nvdla_gem_object
	Bundles the important pieces: drm_gem_object, the structure for DRM buffer management and allocation; kvaddr, the kernel virtual address of the data buffer, used to access the data without physical address translation; and the DMA address and attributes.

nvdla_mem_handle
	The intermediary connecting the user-space task structure nvdla_ioctl_submit_task and the kernel-space task structure nvdla_task.

nvdla_ioctl_submit_task
	User-space task structure.

nvdla_task
	Kernel-space task structure.

nvdla_device
	Holds common device state: interrupt, platform device, DRM device, and so on.

nvdla_submit_args
	The parameter through which user space passes in task-related data; it interacts with nvdla_ioctl_submit_task and sits one granularity level above it.

drm_file
	Per-file-descriptor state of an open DRM file.

drm_gem_object
	Describes a DRM buffer object, including the drm_device it belongs to and its size.

drm_device
	Describes the DRM device, holding the data structures of the bus device.

IV. Conclusion

The above has been a brief walk through part of the code in nvdla_gem.c.
