wowo: Introduction to the Linux kernel scatterlist / DMA-mapping API
Coherent DMA: the buffer is mapped when the driver initializes and unmapped when the driver shuts down (i.e. the mapping is not one-shot; it is used for the lifetime of the driver). The hardware must guarantee that the device and the CPU can access the same data in parallel, and that each sees the other's updates without any explicit software flush. "Consistent" here can be read as "synchronous".
API: dma_alloc_coherent() allocates memory directly from a DMA-capable region.
Streaming DMA: the buffer is usually mapped for a single DMA transfer and unmapped when the transfer finishes (dma_sync operations are also possible, more on that below); the hardware is free to optimize the access order. "Streaming" can be read as "asynchronous".
Typical uses: the DMA buffers a NIC uses to transmit and receive packets; filesystem buffers written or read by a SCSI device.
The interface is split this way so that hardware performance can be exploited as fully as possible.
Related interfaces: dma_map_sg(), dma_unmap_sg(), dma_map_single(), dma_unmap_single().
With the coherent approach, the kernel sets aside a dedicated chunk of memory for DMA. Sometimes the driver does not do that and instead lets the DMA engine work directly on memory handed down from an upper layer, for example a packet coming out of the network stack that has to be sent through the NIC.
The stack does not know where the packet will end up, so it does not treat the allocation specially: the memory holding the packet is normally cacheable.
In that case, before the memory is handed to the DMA engine, the driver must call dma_map_sg() or dma_map_single(), depending on whether the DMA engine supports scatter-gather: use dma_map_sg() if it does, dma_map_single() if it does not. When the DMA is finished, the matching unmap interface must be called. Because the packet data coming down from the stack may still sit in the cache, dma_map_single() makes the CPU flush the cache so the data reaches memory and the DMA engine reads the up-to-date contents.
Note that map takes a parameter indicating the transfer direction, memory-to-device or device-to-memory:
Memory to device (DMA_TO_DEVICE): the CPU flushes the cache so the new data lands in memory.
Device to memory (DMA_FROM_DEVICE): the CPU invalidates the cache, so later CPU reads miss and fetch the fresh data from memory. Also note that these interfaces are one-shot: every transfer needs its own map/unmap pair, and while the buffer is mapped the CPU must not touch it; a CPU write during that window would make the data inconsistent again.
Likewise, the back-end implementations of dma_map_sg() and dma_map_single() are hardware specific.
API: dma_handle = dma_map_single(dev, addr, size, direction);
The buffer is allocated beforehand, and its address does not have to lie in a DMA zone; the map operation maps it into the DMA-addressable range, so two regions are effectively involved. The map performs the required sync internally. The mapping is normally temporary and must be released with dma_unmap_single() after use.
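A minimal streaming-DMA sketch (the driver context, device pointer and buffer are assumptions, not taken from the text above): a cacheable buffer is mapped for one device-bound transfer and unmapped afterwards:

#include <linux/dma-mapping.h>

static int send_one_buffer(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma_handle;

	/* DMA_TO_DEVICE: the map flushes the CPU cache and returns a bus address. */
	dma_handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma_handle))
		return -ENOMEM;

	/* Hypothetical step: program the device with dma_handle and start the
	 * transfer here; the CPU must not write buf while the mapping exists. */

	dma_unmap_single(dev, dma_handle, len, DMA_TO_DEVICE);
	return 0;
}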
static const struct dma_map_ops iommu_dma_ops = {
.alloc = iommu_dma_alloc, ----------------------------(1)
.free = iommu_dma_free,
.alloc_pages = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
.alloc_noncoherent = iommu_dma_alloc_noncoherent,
.free_noncoherent = iommu_dma_free_noncoherent,
.mmap = iommu_dma_mmap,
.get_sgtable = iommu_dma_get_sgtable,
.map_page = iommu_dma_map_page, ---------------------(2)
.unmap_page = iommu_dma_unmap_page,
.map_sg = iommu_dma_map_sg,
.unmap_sg = iommu_dma_unmap_sg,
.sync_single_for_cpu = iommu_dma_sync_single_for_cpu,
.sync_single_for_device = iommu_dma_sync_single_for_device,
.sync_sg_for_cpu = iommu_dma_sync_sg_for_cpu,
.sync_sg_for_device = iommu_dma_sync_sg_for_device,
.map_resource = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
.get_merge_boundary = iommu_dma_get_merge_boundary,
};
(1) Coherent: allocates physically contiguous memory that satisfies the device's address-range requirement; the address stays usable for the whole lifetime of the mapping.
dma_alloc_coherent -> dma_alloc_attrs: depending on whether the device is direct-mapped or sits behind an SMMU, this calls either dma_direct_alloc or ops->alloc from the dma_map_ops above.
iommu_dma_alloc                  // chooses a path based on the parameters
  non-contiguous physical memory:
    iommu_dma_alloc_remap
      __iommu_dma_alloc_pages    // allocates non-contiguous physical pages, still preferring the node the device belongs to:
                                 //   nid = dev_to_node(dev);
                                 //   page = alloc_pages_node(nid, alloc_flags, order);
      iommu_dma_alloc_iova
      iommu_map_sg_atomic
        __iommu_map_sg
          __iommu_map            // loops over the sg list
            ops->map
  contiguous physical memory:
    dma_alloc_from_pool          // when sleeping is not allowed
    or: iommu_dma_alloc_pages    // uses dma_alloc_contiguous (contiguous, but not necessarily on the device's node unless cma_pernuma is configured) or alloc_pages_node (on the device's node) to allocate several contiguous pages
    __iommu_dma_map              // input: the physical address allocated above
      iommu_dma_alloc_iova       // allocates the iova
      iommu_map_atomic           // maps the physical address to the iova
        __iommu_map              // while loop, one mapping per IOMMU page-table granule per iteration
          ops->map               // calls the iommu_ops map callback, arm_smmu_map, see below
// direct:
dma_direct_alloc
  __dma_direct_alloc_pages
    // gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, &phys_limit);  // coherent_dma_mask decides whether 64-bit addresses are usable
    page = dma_alloc_contiguous(dev, size, gfp);               // allocate contiguous physical memory
    or: page = alloc_pages_node(node, gfp, get_order(size));   // fallback when the CMA area is too small
  *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));   // return the DMA address
* If the kernel has CONFIG_DMA_REMAP enabled and attrs does not contain DMA_ATTR_FORCE_CONTIGUOUS, iommu_dma_alloc_remap() is called: non-contiguous pages are allocated and then mapped internally through an sg list.
Otherwise contiguous pages are allocated and mapped.
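A small sketch of forcing the contiguous path from a driver (only dev, a DMA-capable device handed in by the caller, is assumed): passing DMA_ATTR_FORCE_CONTIGUOUS makes iommu_dma_alloc() skip iommu_dma_alloc_remap() even when CONFIG_DMA_REMAP is set:

#include <linux/dma-mapping.h>

static void *alloc_contig_coherent(struct device *dev, size_t size,
				   dma_addr_t *dma_handle)
{
	/* Same as dma_alloc_coherent(), but with an explicit attrs argument. */
	return dma_alloc_attrs(dev, size, dma_handle, GFP_KERNEL,
			       DMA_ATTR_FORCE_CONTIGUOUS);
}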
Three addresses are involved above:
phys: the physical address; it is not returned to the caller.
iova: the address allocated from the IOMMU; it is contiguous and is what the device uses.
vaddr: the kernel virtual address corresponding to phys, used by the CPU.
The iommu_dma_alloc function:
static void *iommu_dma_alloc(struct device *dev, size_t size,
dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
{
bool coherent = dev_is_dma_coherent(dev);
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
struct page *page = NULL;
void *cpu_addr;
gfp |= __GFP_ZERO;
	// gfpflags_allow_blocking() checks gfp for __GFP_DIRECT_RECLAIM, i.e. whether reclaim/sleeping is allowed;
	// GFP_KERNEL includes that flag, so such allocations may normally reclaim and sleep.
if (IS_ENABLED(CONFIG_DMA_REMAP) && gfpflags_allow_blocking(gfp) &&
!(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) {
return iommu_dma_alloc_remap(dev, size, handle, gfp,
dma_pgprot(dev, PAGE_KERNEL, attrs), attrs);
}
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
	    !gfpflags_allow_blocking(gfp) && !coherent) // only the non-blocking (no direct reclaim), non-coherent case
page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
gfp, NULL);
else
cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
if (!cpu_addr)
return NULL;
*handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
dev->coherent_dma_mask);
if (*handle == DMA_MAPPING_ERROR) {
__iommu_dma_free(dev, size, cpu_addr);
return NULL;
}
return cpu_addr;
}
Header and macro definitions from include/linux/gfp.h:
#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define __GFP_RECLAIM ((__force gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM))
static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
{
return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
}
So under GFP_KERNEL, gfpflags_allow_blocking() returns true.
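Conversely, a sketch of the non-blocking case (the device and size are assumptions): with GFP_ATOMIC, gfpflags_allow_blocking() is false and iommu_dma_alloc() takes the dma_alloc_from_pool()/iommu_dma_alloc_pages() branch instead of iommu_dma_alloc_remap():

#include <linux/dma-mapping.h>

static void *alloc_in_atomic_context(struct device *dev, dma_addr_t *dma_handle)
{
	/* Callers holding a spinlock or running in IRQ context must not sleep. */
	return dma_alloc_coherent(dev, PAGE_SIZE, dma_handle, GFP_ATOMIC);
}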
Demo driver using dma_alloc_coherent:
For example, the ftrace stack when iommu_dma_alloc_remap is hit:
insmod-503025 [082] .... 27149.691437: iommu_dma_alloc_remap <-iommu_dma_alloc
insmod-503025 [082] .... 27149.691441: <stack trace>
=> iommu_dma_alloc_remap
=> iommu_dma_alloc
=> dma_alloc_attrs
=> test_init
=> do_one_initcall
=> do_init_module
=> load_module
=> __se_sys_finit_module
=> __arm64_sys_finit_module
=> invoke_syscall
=> el0_svc_common.constprop.0
=> do_el0_svc
=> el0_svc
=> el0_sync_handler
=> el0_sync
====================================================
Reference code, test.c:
#include <linux/module.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/platform_device.h>
#include <linux/device.h>
#include <linux/io.h>
#include <linux/sched/task.h>
#include <linux/arm-smccc.h>
#include <linux/cpumask.h>
#include <asm/cacheflush.h>
#include <linux/pci.h>
#include <linux/dma-mapping.h>
#include <linux/dma-map-ops.h>
static int __init test_init(void)
{
	struct pci_dev *gpu_dev = NULL;
	dma_addr_t addr;
	void *vaddr;

	/* Look for the example device 8088:0107 (the vendor/device IDs are board specific). */
	while ((gpu_dev = pci_get_device(0x8088, 0x0107, gpu_dev))) {
		printk("dev %s ok found dma_ops %px max_seg_size %x\n",
		       dev_name(&gpu_dev->dev), get_dma_ops(&gpu_dev->dev),
		       dma_get_max_seg_size(&gpu_dev->dev));
		printk("dev %s class %px\n",
		       dev_name(&gpu_dev->dev), gpu_dev->dev.class);
		break;
	}
	if (!gpu_dev) {
		printk("device not found!\n");
		return 0;
	}
	if (!get_dma_ops(&gpu_dev->dev)) {
		printk("dma_ops is null !\n");
		return 0;
	}

	vaddr = dma_alloc_coherent(&gpu_dev->dev, 4096, &addr, GFP_KERNEL);
	if (!vaddr) {
		printk("dma_alloc_coherent failed!\n");
		return 0;
	}
	printk("addr %llx\n", (unsigned long long)addr);
	/* The buffer is deliberately leaked in this demo; a real driver would
	 * call dma_free_coherent() (and pci_dev_put()) on its teardown path. */
	return 0;
}
static void __exit test_exit(void)
{
printk(KERN_INFO "test_exit\n");
}
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
(2) Streaming: the goal is to take physical pages that already exist and map them to a DMA address within the range the device can reach.
dma_map_single -> dma_map_single_attrs -> dma_map_page_attrs: depending on whether the device is direct-mapped or behind an SMMU, this calls dma_direct_map_page or ops->map_page (iommu_dma_map_page), see above.
iommu_dma_map_page
  __iommu_dma_map                                // see (1)
// direct:
dma_direct_map_page
  if !dma_capable(dev, dma_addr, size, true)     // e.g. a 32-bit device given a physical page above 4G
    swiotlb_map(dev, phys, size, dir, attrs);    // bounce the mapping through swiotlb
      swiotlb_tbl_map_single
If you need to access the same streaming DMA buffer several times, reading and writing its data between DMA transfers, you have to be careful to sync the buffer so that the CPU and the device (DMA controller) both see up-to-date, correct data.
* With streaming DMA, dma_sync_single_for_XXX must be called after each transfer completes, or before the next one is set up:
void dma_sync_single_for_cpu(dev, dma_handle, size, direction) // called after the DMA has moved data from the device into DDR, before the CPU accesses it, so the CPU sees the latest data
void dma_sync_single_for_device(dev, dma_handle, size, direction) // called before the DMA moves data from DDR to the device, so the device sees the latest data
Original article: https://blog.csdn.net/HT_77/article/details/124077109
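A sketch of reusing a single streaming RX buffer across many transfers (the rx_complete() callback and its arguments are hypothetical): the buffer stays mapped, and ownership bounces between device and CPU through the two sync calls:

#include <linux/dma-mapping.h>

static void rx_complete(struct device *dev, dma_addr_t dma_handle,
			void *buf, size_t len)
{
	/* The DMA has written DDR: drop stale cache lines before the CPU reads. */
	dma_sync_single_for_cpu(dev, dma_handle, len, DMA_FROM_DEVICE);

	/* ... the CPU consumes the received data in buf here ... */

	/* Hand the buffer back to the device for the next transfer. */
	dma_sync_single_for_device(dev, dma_handle, len, DMA_FROM_DEVICE);
}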
void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir));
if (dma_map_direct(dev, ops))
dma_direct_sync_single_for_cpu(dev, addr, size, dir);
else if (ops->sync_single_for_cpu)
ops->sync_single_for_cpu(dev, addr, size, dir);
debug_dma_sync_single_for_cpu(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_sync_single_for_cpu);
void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir));
if (dma_map_direct(dev, ops))
dma_direct_sync_single_for_device(dev, addr, size, dir);
else if (ops->sync_single_for_device)
ops->sync_single_for_device(dev, addr, size, dir);
debug_dma_sync_single_for_device(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_sync_single_for_device);
Call path of dma_sync_single_for_cpu:
dma_sync_single_for_cpu
  // direct:
  dma_direct_sync_single_for_cpu
    if (is_swiotlb_buffer(paddr)):                 // if the buffer lives in the swiotlb area
      swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
        swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);  // orig_addr is the original (possibly high) memory, tlb_addr is the bounce address swiotlb allocated below 4G
          // loops over memcpy(vaddr, buffer + offset, sz);
  // iommu:
  ops->sync_single_for_cpu(dev, addr, size, dir);
arm_smmu_ops :
static struct iommu_ops arm_smmu_ops = {
.capable = arm_smmu_capable,
.domain_alloc = arm_smmu_domain_alloc,
.domain_free = arm_smmu_domain_free,
.attach_dev = arm_smmu_attach_dev,
.map = arm_smmu_map, ------------------------(1)
.unmap = arm_smmu_unmap,
.flush_iotlb_all = arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys = arm_smmu_iova_to_phys,
.probe_device = arm_smmu_probe_device,
.release_device = arm_smmu_release_device,
.device_group = arm_smmu_device_group,
.domain_get_attr = arm_smmu_domain_get_attr,
.domain_set_attr = arm_smmu_domain_set_attr,
.of_xlate = arm_smmu_of_xlate,
.get_resv_regions = arm_smmu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
.dev_has_feat = arm_smmu_dev_has_feature,
.dev_feat_enabled = arm_smmu_dev_feature_enabled,
.dev_enable_feat = arm_smmu_dev_enable_feature,
.dev_disable_feat = arm_smmu_dev_disable_feature,
.pgsize_bitmap = -1UL, /* Restricted during device attach */
};
(1)
arm_smmu_map
  ops->map              // calls io_pgtable_ops->map
    arm_lpae_map
      __arm_lpae_map    // see https://blog.csdn.net/flyingnosky/article/details/122951474
For example:
[ 2.504955] dump_backtrace+0x0/0x1b0
[ 2.508611] show_stack+0x18/0x70
[ 2.511920] dump_stack+0xd0/0x12c
[ 2.515315] __arm_lpae_map+0xc4/0x388
[ 2.519058] __arm_lpae_map+0x354/0x388
[ 2.522888] __arm_lpae_map+0x354/0x388
[ 2.526717] arm_lpae_map+0xe4/0x180
[ 2.530286] arm_smmu_map+0x20/0x34
[ 2.533769] __iommu_map+0xdc/0x1cc
[ 2.537251] iommu_map+0x14/0x20
[ 2.540473] iommu_dma_prepare_msi+0x164/0x210
[ 2.544911] its_irq_domain_alloc+0x7c/0x150
[ 2.549175] irq_domain_alloc_irqs_parent+0x28/0x40
[ 2.554048] msi_domain_alloc+0x78/0x140
[ 2.557965] __irq_domain_alloc_irqs+0x16c/0x470
[ 2.562576] __msi_domain_alloc_irqs+0x90/0x2e0
[ 2.567100] msi_domain_alloc_irqs+0x1c/0x30
[ 2.571364] __pci_enable_msix_range+0x614/0x6a0
[ 2.575976] pci_alloc_irq_vectors_affinity+0xc0/0x13c
[ 2.581109] pcie_port_device_register+0x108/0x40c
[ 2.585893] pcie_portdrv_probe+0x34/0xe4
[ 2.589897] local_pci_probe+0x40/0xac
Note: __iommu_map runs a while loop, mapping one IOMMU page-table granule per iteration, and the iova must be page aligned.
static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
{
const struct iommu_ops *ops = domain->ops;
unsigned long orig_iova = iova;
unsigned int min_pagesz;
size_t orig_size = size;
phys_addr_t orig_paddr = paddr;
int ret = 0;
if (unlikely(ops->map == NULL ||
domain->pgsize_bitmap == 0UL))
return -ENODEV;
if (unlikely(!(domain->type & __IOMMU_DOMAIN_PAGING)))
return -EINVAL;
/* find out the minimum page size supported */
min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
/*
* both the virtual address and the physical one, as well as
* the size of the mapping, must be aligned (at least) to the
* size of the smallest page supported by the hardware
*/
	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) { // size and both addresses must be aligned
pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n",
iova, &paddr, size, min_pagesz);
return -EINVAL;
}
pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
	while (size) { // loop over the whole range
size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
iova, &paddr, pgsize);
ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);
if (ret)
break;
iova += pgsize;
paddr += pgsize;
size -= pgsize;
}
if (ops->iotlb_sync_map)
ops->iotlb_sync_map(domain);
/* unroll mapping in case something went wrong */
if (ret)
iommu_unmap(domain, orig_iova, orig_size - size);
else
trace_map(orig_iova, orig_paddr, orig_size);
return ret;
}
For example, modifying iommu_dma_prepare_msi -> iommu_dma_get_msi_page as follows:
size_t size = cookie_msi_granule(cookie) + PAGE_SIZE + 1; // adding PAGE_SIZE + 1 triggers the errors below, because the size 0x2001 is not aligned to the 0x1000 page size:
[ 2.385478] iommu: unaligned: iova 0xffffe000 pa 0x0000000030830040 size 0x2001 min_pagesz 0x1000
[ 2.500911] iommu: unaligned: iova 0xffffe000 pa 0x0000000030830040 size 0x2001 min_pagesz 0x1000
[ 2.587739] iommu: unaligned: iova 0xffffe000 pa 0x0000000030830040 size 0x2001 min_pagesz 0x1000
[ 2.660997] iommu: unaligned: iova 0xffffe000 pa 0x0000000030830040 size 0x2001 min_pagesz 0x1000
[ 2.734276] iommu: unaligned: iova 0xffffe000 pa 0x0000000030830040 size 0x2001 min_pagesz 0x1000
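A sketch of satisfying that alignment rule when calling the raw IOMMU API from a driver (the function and its parameters are illustrative, and it assumes the device is already attached to a domain):

#include <linux/iommu.h>
#include <linux/mm.h>

static int map_one_region(struct device *dev, unsigned long iova,
			  phys_addr_t paddr, size_t size)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

	if (!domain)
		return -ENODEV;

	/* The caller must pass page-aligned iova and paddr; the size is rounded
	 * up here so iova | paddr | size stays aligned to the smallest page size
	 * in domain->pgsize_bitmap, otherwise __iommu_map() fails like the log above. */
	return iommu_map(domain, iova, paddr, PAGE_ALIGN(size),
			 IOMMU_READ | IOMMU_WRITE);
}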
When is a device's dma_ops assigned?
For example during ACPI initialization; see also the default values of dma_mask and coherent_dma_mask.
Platform devices:
int platform_dma_configure(struct device *dev)
{
enum dev_dma_attr attr;
int ret = 0;
if (dev->of_node) {
ret = of_dma_configure(dev, dev->of_node, true);
} else if (has_acpi_companion(dev)) {
attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
ret = acpi_dma_configure(dev, attr);
}
return ret;
}
acpi_dma_configure(struct device *dev, enum dev_dma_attr attr)   // attr: DEV_DMA_NOT_SUPPORTED / DEV_DMA_COHERENT / DEV_DMA_NON_COHERENT
  acpi_dma_configure_id(dev, attr, NULL);
    arch_setup_dma_ops(dev, dma_addr, size, iommu, attr == DEV_DMA_COHERENT);
      dev->dma_coherent = coherent;
      if (iommu) iommu_setup_dma_ops(dev, dma_base, size);
        if (domain->type == IOMMU_DOMAIN_DMA) {   // with iommu.passthrough=1 on the command line the domain type check fails, dma_ops stays NULL and the direct path is used
          if (iommu_dma_init_domain(domain, dma_base, size, dev))
            goto out_err;
          dev->dma_ops = &iommu_dma_ops;
        }
PCI devices:
static int pci_dma_configure(struct device *dev)
{
struct device *bridge;
int ret = 0;
bridge = pci_get_host_bridge_device(to_pci_dev(dev));
if (IS_ENABLED(CONFIG_OF) && bridge->parent &&
bridge->parent->of_node) {
ret = of_dma_configure(dev, bridge->parent->of_node, true);
} else if (has_acpi_companion(bridge)) {
struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
}
pci_put_host_bridge_device(bridge);
return ret;
}
log:
[ 13.291332] ===iommu_setup_dma_ops for dev 0000:06:00.0 domain ffff69cdb189ed58
[ 13.291338] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.10.0-wzm-test #11
[ 13.291341] Hardware name: PL PR212F3/BC82AMDRA, BIOS 0.22 04/26/2022 10:42:11
[ 13.291345] Call trace:
[ 13.291356] dump_backtrace+0x0/0x1e4
[ 13.291361] show_stack+0x20/0x2c
[ 13.291369] dump_stack+0xd8/0x140
[ 13.291378] iommu_setup_dma_ops+0x60/0x210
[ 13.291382] arch_setup_dma_ops+0x80/0xd0
[ 13.291390] acpi_dma_configure_id+0xb0/0xd0
[ 13.291396] pci_dma_configure+0xb0/0xd4
[ 13.291402] really_probe+0xac/0x51c
[ 13.291405] driver_probe_device+0xfc/0x170
[ 13.291408] device_driver_attach+0xc8/0xd0
[ 13.291411] __driver_attach+0xac/0x180
[ 13.291415] bus_for_each_dev+0x78/0xdc
[ 13.291418] driver_attach+0x2c/0x40
[ 13.291420] bus_add_driver+0xd8/0x230
[ 13.291423] driver_register+0x80/0x13c
[ 13.291426] __pci_register_driver+0x4c/0x5c
[ 13.291433] xhci_pci_init+0x68/0x78
[ 13.291439] do_one_initcall+0x4c/0x250
[ 13.291444] do_initcall_level+0xe4/0xfc
[ 13.291446] do_initcalls+0x80/0xa4
[ 13.291448] kernel_init_freeable+0x1a4/0x250
[ 13.291453] kernel_init+0x1c/0x128
[ 13.291456] ret_from_fork+0x10/0x18
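A quick way to check the outcome from a driver (a sketch, assuming a 5.10-era kernel like the one traced above): once iommu_setup_dma_ops() has run for a device in an IOMMU DMA domain, get_dma_ops() returns &iommu_dma_ops, while a NULL return means the dma_direct_* path is used:

#include <linux/device.h>
#include <linux/dma-map-ops.h>
#include <linux/iommu.h>

static bool uses_iommu_dma(struct device *dev)
{
	/* Non-NULL dma_ops plus an IOMMU translation means the iommu_dma_ops path. */
	return get_dma_ops(dev) && device_iommu_mapped(dev);
}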
The difference between CMA and swiotlb:
CMA is for allocating physical memory. By default it reserves contiguous memory below 4G so the DMA allocation interfaces can hand out contiguous buffers, e.g. dma_alloc_coherent -> iommu_dma_alloc_pages -> dma_alloc_contiguous -> cma_alloc.
swiotlb is for mapping. Interfaces such as dma_map_single rely on it, and only in the non-IOMMU case: it bounces high addresses into a low region so that devices that only support 32-bit DMA can reach the data.
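In the direct-mapping case the amount of swiotlb bouncing depends on the device's dma_mask, so the driver should declare the hardware's real addressing capability. A minimal sketch (the 64-bit capability is an assumption about the example device):

#include <linux/dma-mapping.h>

static int declare_dma_capability(struct device *dev)
{
	/* A 64-bit mask lets dma_direct_map_page() skip the swiotlb bounce;
	 * fall back to 32 bit if the wider mask is rejected. */
	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
		return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
	return 0;
}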
Layering of the DMA API:
1) kernel/dma/mapping.c
2) drivers/iommu/dma-iommu.c — the dma_map_ops implementation (iommu_dma_ops)
3) drivers/iommu/iommu.c — the generic IOMMU layer above the concrete SMMU driver
4) drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c — the concrete iommu_ops mapping callbacks
Concretely, on 5.15 (a driver-side dma_map_sg usage sketch follows these flows):
1) dma_map_sg:
dma_map_sg_attrs                         // kernel/dma/mapping.c
  iommu:
    iommu_dma_map_sg                     // drivers/iommu/dma-iommu.c
      1) untrusted devices: handled by iommu_dma_map_sg_swiotlb   // for_each_sg, calling __iommu_dma_map_swiotlb for each entry
        __iommu_dma_map_swiotlb          // bounces phys to a swiotlb address phys1, then __iommu_dma_map maps phys1 to the allocated iova; in effect swiotlb adds an extra layer so the untrusted device only ever sees the bounce buffer, not the original memory
          swiotlb_tbl_map_single
          __iommu_dma_map
            iommu_dma_alloc_iova
            iommu_map_atomic             // drivers/iommu/iommu.c
              _iommu_map
      2) normal devices:
        iommu_dma_alloc_iova
        iommu_map_sg_atomic
          __iommu_map_sg                 // maps each sg entry in a loop
            __iommu_map
        __finalise_sg
  direct:
    dma_direct_map_sg                    // kernel/dma/direct.c
      dma_direct_map_page                // for_each_sg
        if dma_mask does not cover the buffer (e.g. no 64-bit support):
          swiotlb_map
        otherwise return the dma_addr directly
2) dma_map_page: iommu_dma_map_page calls __iommu_dma_map_swiotlb directly.
dma_map_page_attrs                       // kernel/dma/mapping.c
  iommu:
    dma_map_ops *ops->map_page
      iommu_dma_map_page                 // drivers/iommu/dma-iommu.c
        iommu_map_atomic                 // drivers/iommu/iommu.c
          _iommu_map
            __iommu_map
              __iommu_map_pages
                iommu_ops *ops->map_pages   // arm-smmu-v3.c
  direct:
    dma_direct_map_page
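Finally, a driver-side sketch of driving the dma_map_sg() flow above (the pages array, its length and the sg_table come from a hypothetical caller): with an SMMU the entries typically end up merged into one contiguous IOVA range, while the direct path maps, or swiotlb-bounces, every entry separately:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int map_pages_for_device(struct device *dev, struct page **pages,
				unsigned int nr_pages, struct sg_table *sgt)
{
	int nents, ret;

	/* Build one scatterlist entry per page (physically adjacent pages may merge). */
	ret = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,
					(unsigned long)nr_pages * PAGE_SIZE,
					GFP_KERNEL);
	if (ret)
		return ret;

	nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);
	if (!nents) {
		sg_free_table(sgt);
		return -ENOMEM;
	}

	/* Program the device from sg_dma_address()/sg_dma_len() over the first
	 * nents entries; call dma_unmap_sg() and sg_free_table() when done. */
	return 0;
}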