Two types of DMA mapping: coherent DMA mapping & streaming DMA mapping (5.10)

古井无波1999

Last modified 2025-04-29 08:38:57


Column: DMA and IOMMU/SMMU   Tags: linux

First published 2023-01-14 11:23:50

Article link: https://blog.csdn.net/gjioui123/article/details/128683273



Wowo (蜗蜗): Introduction to the Linux kernel scatterlist API

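Coherent DMA: the mapping is created when the driver initializes and torn down when the driver shuts down **(so it is not one-shot: the DMA mapping stays in use for the whole lifetime of the driver)**. The hardware must let the device and the CPU access the same data in parallel. It must also guarantee that each side sees the other's updates without any explicit flush in software. On non-coherent hardware, the kernel provides this by mapping the buffer uncached for the CPU. "Coherent" (consistent) can be read as "synchronous".

API: dma_alloc_coherent. It allocates a buffer usable by both the CPU and the device. It returns the CPU virtual address plus a DMA address (dma_handle) that lies within the device's coherent_dma_mask.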

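Streaming DMA: you typically map right before a DMA transfer and unmap after it finishes. dma_sync operations are also possible; they are discussed in detail below. The hardware is free to optimize the order of accesses. "Streaming" can be read as "asynchronous".

Typical use cases: the DMA buffers a NIC uses for data transfers, and the filesystem buffers a SCSI device writes or reads.

These interfaces are designed to let the hardware perform as well as possible.

Related APIs: dma_map_sg(), dma_unmap_sg(), dma_map_single(), dma_unmap_single().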
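With coherent DMA, the kernel sets aside a buffer specifically for DMA. Sometimes a driver does not do that. Instead, it lets the DMA engine work directly in memory passed down from upper layers, for example a packet that comes down from the network stack and is to be sent out through the NIC.
The network stack does not know where the packet will end up, so it allocates the memory without any special treatment. That memory is normally cacheable.
In this case, dma_map_sg() or dma_map_single() must be called before the memory is handed to DMA. Which one depends on whether your DMA engine supports scatter-gather: use dma_map_sg() if it does, and dma_map_single() if it does not. When the DMA is done, call the corresponding unmap function.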

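The packet's data may still be sitting in the cache. Calling dma_map_single() therefore makes the CPU flush the cache, writing the cached data back to memory, so the DMA engine reads the new data from memory. On cache-coherent hardware this flush is skipped.

Note that map takes a direction parameter. It states whether data flows from the device to memory or from memory to the device:
Memory to device (DMA_TO_DEVICE): the CPU flushes (cleans) the cache, writing the new cached data back to memory.
Device to memory (DMA_FROM_DEVICE): the CPU invalidates the cache, so a later CPU read misses and fetches the new data from memory.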
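The two directions can be made concrete with a small model. This is a hypothetical userspace simulation, not kernel code: one word of DRAM sits behind one write-back cache line on a non-coherent system. The struct and the names sim_map/sim_unmap are invented for illustration. The clean-versus-invalidate choice follows the rule above and is loosely modeled on what arm64 does at map/unmap time. Real caches operate on whole lines and address ranges.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model: one word of DRAM behind one write-back
 * cache line on a NON-coherent system. Not kernel code. */
enum dma_dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

struct sim {
	int dram;   /* what the DMA engine reads and writes */
	int cache;  /* the CPU's cached copy */
	bool valid; /* the cache line holds a copy */
	bool dirty; /* the cached copy is newer than DRAM */
};

static void cpu_write(struct sim *s, int v)
{
	s->cache = v;           /* write-back: DRAM is not updated yet */
	s->valid = true;
	s->dirty = true;
}

static int cpu_read(struct sim *s)
{
	if (!s->valid) {        /* miss: refill the line from DRAM */
		s->cache = s->dram;
		s->valid = true;
		s->dirty = false;
	}
	return s->cache;
}

static void cache_clean(struct sim *s)      /* "flush": write dirty data back */
{
	if (s->valid && s->dirty) {
		s->dram = s->cache;
		s->dirty = false;
	}
}

static void cache_invalidate(struct sim *s) /* drop the line; next read misses */
{
	s->valid = false;
	s->dirty = false;
}

/* Map hands the buffer to the device. */
static void sim_map(struct sim *s, enum dma_dir dir)
{
	if (dir == DMA_FROM_DEVICE)
		cache_invalidate(s); /* no stale line may be written back over device data */
	else
		cache_clean(s);      /* the device must see the CPU's latest writes */
}

/* Unmap hands the buffer back to the CPU. */
static void sim_unmap(struct sim *s, enum dma_dir dir)
{
	if (dir != DMA_TO_DEVICE)
		cache_invalidate(s); /* the device may have written DRAM */
}
```

In this model, a CPU write reaches DRAM only after sim_map(..., DMA_TO_DEVICE). A CPU read after sim_unmap(..., DMA_FROM_DEVICE) returns the value the device wrote, not an old cached one.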

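Also note that these interfaces are per-transfer: map and unmap are called once each time the data is operated on, unless ownership is handed back and forth with dma_sync_*. While the buffer is mapped, the CPU must not touch it. If the CPU writes to it, the data becomes inconsistent again.
Likewise, the back-end implementations of dma_map_sg() and dma_map_single() depend on hardware characteristics.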

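API: dma_handle = dma_map_single(dev, addr, size, direction);

The memory is allocated beforehand, and its address need not lie in a DMA zone. Mapping produces a DMA address the device can reach. When the device cannot reach the buffer directly, swiotlb bounces the data through a second, low buffer, so two regions are actually in use. The map call performs the cache sync internally. After this temporary use, release the mapping with dma_unmap_single(), the counterpart of dma_map_single(); it is implemented on top of dma_unmap_page_attrs().

Reference: "dma基础_一文读懂dma的方方面面" (DMA basics: understanding every aspect of DMA in one article), Zhihu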

static const struct dma_map_ops iommu_dma_ops = {
        .alloc                  = iommu_dma_alloc,        /* (1) */
        .free                   = iommu_dma_free,
        .alloc_pages            = dma_common_alloc_pages,
        .free_pages             = dma_common_free_pages,
        .alloc_noncoherent      = iommu_dma_alloc_noncoherent,
        .free_noncoherent       = iommu_dma_free_noncoherent,
        .mmap                   = iommu_dma_mmap,
        .get_sgtable            = iommu_dma_get_sgtable,
        .map_page               = iommu_dma_map_page,     /* (2) */
        .unmap_page             = iommu_dma_unmap_page,
        .map_sg                 = iommu_dma_map_sg,
        .unmap_sg               = iommu_dma_unmap_sg,
        .sync_single_for_cpu    = iommu_dma_sync_single_for_cpu,
        .sync_single_for_device = iommu_dma_sync_single_for_device,
        .sync_sg_for_cpu        = iommu_dma_sync_sg_for_cpu,
        .sync_sg_for_device     = iommu_dma_sync_sg_for_device,
        .map_resource           = iommu_dma_map_resource,
        .unmap_resource         = iommu_dma_unmap_resource,
        .get_merge_boundary     = iommu_dma_get_merge_boundary,
};

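(1) Coherent: allocates memory whose DMA address lies in the range the device requires. On the direct path this memory is physically contiguous; behind an IOMMU only the IOVA has to be contiguous. The address remains usable until it is freed.

dma_alloc_coherent->dma_alloc_attrs takes the direct path (dma_direct_alloc) when the device has no IOMMU dma_ops installed. Otherwise it calls ops->alloc (the dma_map_ops shown above).
	iommu_dma_alloc //the path is chosen from the arguments
	Non-contiguous physical memory:
		iommu_dma_alloc_remap
			 __iommu_dma_alloc_pages  //allocates non-contiguous physical pages here, still preferring the NUMA node of dev:
                //nid = dev_to_node(dev); 
                //page = alloc_pages_node(nid, alloc_flags, order); 
			 iommu_dma_alloc_iova
			 iommu_map_sg_atomic
				 __iommu_map_sg
					__iommu_map //loops over the sg entries
						ops->map
	Contiguous physical memory:
		dma_alloc_from_pool //cannot sleep
		or: iommu_dma_alloc_pages, which allocates several contiguous pages with dma_alloc_contiguous (contiguous, but not necessarily on dev's node unless cma_pernuma is used) or alloc_pages_node (on dev's node)
		__iommu_dma_map //input: the physical address allocated above
			iommu_dma_alloc_iova  //allocate an iova
			iommu_map_atomic  //map the physical address to the iova
				__iommu_map    //while loop; each iteration maps one chunk whose size is a page size supported by the IOMMU page table
					ops->map   //calls the iommu_ops map function, arm_smmu_map, see below
	//direct
	dma_direct_alloc
		__dma_direct_alloc_pages 
			//gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,&phys_limit); //coherent_dma_mask decides between ZONE_DMA, ZONE_DMA32 or no zone restriction
			page = dma_alloc_contiguous(dev, size, gfp); //allocate contiguous physical memory (CMA)
			or:page = alloc_pages_node(node, gfp, get_order(size)); //fallback when CMA cannot satisfy the request
			
			*dma_handle = phys_to_dma_direct(dev, page_to_phys(page)); //return the DMA address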
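The zone choice made by dma_direct_optimal_gfp_mask() can be sketched in userspace. This is a simplified model that assumes dma_to_phys() is the identity, i.e. there is no DMA offset. zone_dma_bits is passed in because its value depends on the architecture and the kernel config. sim_pick_zone and the SIM_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define SIM_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

enum sim_zone { SIM_ZONE_NORMAL, SIM_ZONE_DMA32, SIM_ZONE_DMA };

static uint64_t sim_min_not_zero(uint64_t a, uint64_t b)
{
	if (!a)
		return b;
	if (!b)
		return a;
	return a < b ? a : b;
}

/* Sketch of dma_direct_optimal_gfp_mask(): the tighter of the device mask
 * and bus_dma_limit decides which zone the pages must come from.
 * Assumes dma_to_phys() is the identity. */
static enum sim_zone sim_pick_zone(uint64_t coherent_dma_mask,
				   uint64_t bus_dma_limit,
				   unsigned int zone_dma_bits)
{
	uint64_t phys_limit = sim_min_not_zero(coherent_dma_mask, bus_dma_limit);

	if (phys_limit <= SIM_DMA_BIT_MASK(zone_dma_bits))
		return SIM_ZONE_DMA;     /* GFP_DMA */
	if (phys_limit <= SIM_DMA_BIT_MASK(32))
		return SIM_ZONE_DMA32;   /* GFP_DMA32 */
	return SIM_ZONE_NORMAL;          /* no zone modifier */
}
```

Here is an example with zone_dma_bits = 30. A 64-bit coherent mask needs no zone restriction, while a 32-bit mask, or a 32-bit bus_dma_limit, forces ZONE_DMA32.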




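*iommu_dma_alloc_remap is called when three conditions hold: the kernel enables CONFIG_DMA_REMAP, the gfp flags allow blocking, and attrs does not contain DMA_ATTR_FORCE_CONTIGUOUS (forced contiguity). It allocates non-contiguous pages and maps them internally through an sg list.
Otherwise, contiguous pages are allocated and mapped.
Three addresses are involved:
phys: the physical address. It is not returned.
iova: the allocated IOMMU address, returned through *handle for the device to use. It is contiguous.
vaddr: the virtual address used by the CPU. On the remap path it is a new contiguous kernel mapping of the scattered pages; otherwise it is the kernel virtual address corresponding to phys.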
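Here is a toy model of the remap case. Three scattered physical pages are given one contiguous IOVA range, and that range is what the device sees. The single-level table, the sim_* names and the 4 KiB page size are assumptions for illustration. A real SMMU uses multi-level page tables that the hardware walks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SHIFT 12
#define SIM_PAGE_SIZE (1ULL << SIM_PAGE_SHIFT)
#define SIM_MAX_PAGES 8

/* Hypothetical single-level "IOMMU page table": entry i translates the
 * i-th page of a contiguous IOVA range starting at iova_base. */
struct sim_domain {
	uint64_t iova_base;
	uint64_t pte[SIM_MAX_PAGES];  /* physical page address per IOVA page */
	size_t npages;
};

/* Like the remap path: scattered physical pages get one contiguous IOVA range. */
static int sim_map_pages(struct sim_domain *d, uint64_t iova_base,
			 const uint64_t *phys_pages, size_t n)
{
	if (n > SIM_MAX_PAGES || (iova_base & (SIM_PAGE_SIZE - 1)))
		return -1;
	d->iova_base = iova_base;
	for (size_t i = 0; i < n; i++)
		d->pte[i] = phys_pages[i];
	d->npages = n;
	return 0;
}

/* Device-side translation: IOVA -> physical address, or 0 on a fault. */
static uint64_t sim_iova_to_phys(const struct sim_domain *d, uint64_t iova)
{
	uint64_t idx;

	if (iova < d->iova_base)
		return 0;
	idx = (iova - d->iova_base) >> SIM_PAGE_SHIFT;
	if (idx >= d->npages)
		return 0;
	return d->pte[idx] | (iova & (SIM_PAGE_SIZE - 1));
}
```

For example, mapping the physical pages 0x80005000, 0x80001000 and 0x9000a000 at IOVA 0xffffd000 lets the device access them as the single range 0xffffd000..0xfffffff. IOVA 0xfffff123 translates to 0x9000a123.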

The iommu_dma_alloc function:

static void *iommu_dma_alloc(struct device *dev, size_t size,
		dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
{
	bool coherent = dev_is_dma_coherent(dev);
	int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
	struct page *page = NULL;
	void *cpu_addr;

	gfp |= __GFP_ZERO;
    //gfpflags_allow_blocking checks whether gfp contains __GFP_DIRECT_RECLAIM (direct reclaim allowed).
    //GFP_KERNEL includes this flag, so a GFP_KERNEL allocation may normally reclaim and sleep.
	if (IS_ENABLED(CONFIG_DMA_REMAP) && gfpflags_allow_blocking(gfp) &&
	    !(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) {
		return iommu_dma_alloc_remap(dev, size, handle, gfp,
				dma_pgprot(dev, PAGE_KERNEL, attrs), attrs);
	}

	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
	    !gfpflags_allow_blocking(gfp) && !coherent) //only non-blocking allocations for non-coherent devices use the atomic pool
		page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
					       gfp, NULL);
	else
		cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
	if (!cpu_addr)
		return NULL;

	*handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
			dev->coherent_dma_mask);
	if (*handle == DMA_MAPPING_ERROR) {
		__iommu_dma_free(dev, size, cpu_addr);
		return NULL;
	}

	return cpu_addr;
}


****************************Header and macro definitions: include/linux/gfp.h

#define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)

#define __GFP_RECLAIM ((__force gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM))


static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
{
	return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
}

Therefore, with GFP_KERNEL, gfpflags_allow_blocking returns true (1).
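The branch order of iommu_dma_alloc() shown above can be reproduced as a pure function, which makes the path choice easy to check. The flag bit values mirror 5.10 include/linux/gfp.h and are copied for illustration; verify them against your own tree. The SIM_ prefixes and sim_pick_path are hypothetical, and the two CONFIG options are passed in as booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as in 5.10 include/linux/gfp.h, copied for illustration. */
#define SIM___GFP_HIGH           0x20u
#define SIM___GFP_IO             0x40u
#define SIM___GFP_FS             0x80u
#define SIM___GFP_ATOMIC         0x200u
#define SIM___GFP_DIRECT_RECLAIM 0x400u
#define SIM___GFP_KSWAPD_RECLAIM 0x800u
#define SIM___GFP_RECLAIM (SIM___GFP_DIRECT_RECLAIM | SIM___GFP_KSWAPD_RECLAIM)
#define SIM_GFP_KERNEL    (SIM___GFP_RECLAIM | SIM___GFP_IO | SIM___GFP_FS)
#define SIM_GFP_ATOMIC    (SIM___GFP_HIGH | SIM___GFP_ATOMIC | SIM___GFP_KSWAPD_RECLAIM)

static bool sim_gfpflags_allow_blocking(unsigned int gfp)
{
	return !!(gfp & SIM___GFP_DIRECT_RECLAIM);
}

enum sim_path { PATH_REMAP, PATH_ATOMIC_POOL, PATH_CONTIG_PAGES };

/* Same branch order as iommu_dma_alloc() in 5.10. */
static enum sim_path sim_pick_path(unsigned int gfp, bool coherent,
				   bool force_contiguous,
				   bool cfg_dma_remap, bool cfg_direct_remap)
{
	if (cfg_dma_remap && sim_gfpflags_allow_blocking(gfp) && !force_contiguous)
		return PATH_REMAP;          /* iommu_dma_alloc_remap() */
	if (cfg_direct_remap && !sim_gfpflags_allow_blocking(gfp) && !coherent)
		return PATH_ATOMIC_POOL;    /* dma_alloc_from_pool() */
	return PATH_CONTIG_PAGES;           /* iommu_dma_alloc_pages() */
}
```

With both options enabled, GFP_KERNEL takes the remap path unless DMA_ATTR_FORCE_CONTIGUOUS is set. GFP_ATOMIC on a non-coherent device goes to the atomic pool.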

Demo driver example using dma_alloc_coherent:

For example, a trace of the call to iommu_dma_alloc_remap:

          insmod-503025  [082] .... 27149.691437: iommu_dma_alloc_remap <-iommu_dma_alloc
          insmod-503025  [082] .... 27149.691441: <stack trace>
 => iommu_dma_alloc_remap
 => iommu_dma_alloc
 => dma_alloc_attrs
 => test_init
 => do_one_initcall
 => do_init_module
 => load_module
 => __se_sys_finit_module
 => __arm64_sys_finit_module
 => invoke_syscall
 => el0_svc_common.constprop.0
 => do_el0_svc
 => el0_svc
 => el0_sync_handler
 => el0_sync





====================================================
Reference code, test.c:
#include <linux/module.h>
#include <linux/init.h>
#include <linux/pci.h>
#include <linux/dma-mapping.h>
#include <linux/dma-map-ops.h>	/* get_dma_ops() */

static struct pci_dev *gpu_dev;	/* reference held from init until exit */
static void *cpu_addr;		/* CPU virtual address of the coherent buffer */
static dma_addr_t dma_addr;	/* DMA address (an IOVA when an IOMMU is in use) */

static int __init test_init(void)
{
	/* 0x8088:0x0107 is the vendor/device ID of the device under test */
	gpu_dev = pci_get_device(0x8088, 0x0107, NULL);
	if (!gpu_dev) {
		pr_info("device not found\n");
		return -ENODEV;
	}

	pr_info("dev %s ok found dma_ops %px max_seg_size 0x%x\n",
		dev_name(&gpu_dev->dev), get_dma_ops(&gpu_dev->dev),
		dma_get_max_seg_size(&gpu_dev->dev));
	pr_info("dev %s class %px\n", dev_name(&gpu_dev->dev), gpu_dev->dev.class);

	/* NULL dma_ops means dma-direct (no IOMMU); this demo targets the IOMMU path */
	if (!get_dma_ops(&gpu_dev->dev)) {
		pr_info("dma_ops is null !\n");
		return 0;	/* nothing allocated; test_exit still drops the reference */
	}

	cpu_addr = dma_alloc_coherent(&gpu_dev->dev, 4096, &dma_addr, GFP_KERNEL);
	if (!cpu_addr) {
		pci_dev_put(gpu_dev);
		return -ENOMEM;
	}
	pr_info("cpu_addr %px dma addr %pad\n", cpu_addr, &dma_addr);
	return 0;
}

static void __exit test_exit(void)
{
	if (cpu_addr)
		dma_free_coherent(&gpu_dev->dev, 4096, cpu_addr, dma_addr);
	pci_dev_put(gpu_dev);	/* drop the reference taken by pci_get_device() */
	pr_info("test_exit\n");
}

module_init(test_init);
module_exit(test_exit);

MODULE_LICENSE("GPL");

(2) Streaming: the goal is to map physical pages that already exist to DMA addresses within the range the device can address.

dma_map_single->dma_map_single_attrs->dma_map_page_attrs calls dma_direct_map_page when dma_map_direct() is true (no IOMMU dma_ops). Otherwise it calls ops->map_page (iommu_dma_map_page), see above.
	iommu_dma_map_page
		__iommu_dma_map   //see (1)

	//direct
	dma_direct_map_page
		if !dma_capable(dev, dma_addr, size, true) //e.g. a 32-bit device given a physical page above 4G
			swiotlb_map(dev, phys, size, dir, attrs); //map through a swiotlb bounce buffer
				swiotlb_tbl_map_single
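The decision in dma_direct_map_page() comes down to a range check. The sketch below is a simplified dma_capable(). It assumes the DMA address equals the physical address and ignores bus_dma_limit and forced swiotlb. sim_direct_map_page and the other sim_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Simplified dma_capable(): the last byte of the buffer must not exceed
 * the device's DMA mask (bus_dma_limit and DMA offsets are ignored). */
static bool sim_dma_capable(uint64_t dma_mask, uint64_t dma_addr, size_t size)
{
	uint64_t end;

	if (size == 0)
		return false;
	end = dma_addr + size - 1;
	return end >= dma_addr && end <= dma_mask;   /* also rejects wrap-around */
}

enum sim_map_result { MAP_DIRECT, MAP_SWIOTLB };

/* dma_direct_map_page() in miniature: DMA address == phys here. */
static enum sim_map_result sim_direct_map_page(uint64_t dma_mask,
					       uint64_t phys, size_t size)
{
	return sim_dma_capable(dma_mask, phys, size) ? MAP_DIRECT : MAP_SWIOTLB;
}
```

A buffer that merely crosses the 4G line, such as 0xFFFFF000 with size 0x2000, also ends up in swiotlb for a 32-bit device, because the check looks at the last byte.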

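Suppose you need to access the same streaming DMA buffer several times and read or write its data between DMA transfers. Then you must sync the DMA buffer carefully, so that both the CPU and the device (DMA controller) see the latest, correct data.

*For streaming DMA, call dma_sync_single_for_XXX after each transfer completes or before each one is prepared.

void dma_sync_single_for_cpu(dev, dma_handle, size, direction)// call after the DMA has moved data from the device into DDR and before the CPU accesses that DDR, so the CPU sees the latest data

void dma_sync_single_for_device(dev, dma_handle, size, direction)// call before the DMA moves data from DDR to the device, so the device sees the latest data

Source: https://blog.csdn.net/HT_77/article/details/124077109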
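The ownership rules for a reused streaming buffer can be modeled as a small state machine. This is a hypothetical checker, not a kernel API, and the kernel does not enforce these transitions. Following them is what keeps the CPU and device views consistent: the CPU may touch the buffer only when it is unmapped or has been handed back with sync_for_cpu.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ownership tracker for one streaming mapping. */
enum sim_owner { OWNER_CPU, OWNER_DEVICE };

struct sim_stream {
	bool mapped;
	enum sim_owner owner;
};

static bool sim_map(struct sim_stream *s)            /* dma_map_single() */
{
	if (s->mapped)
		return false;
	s->mapped = true;
	s->owner = OWNER_DEVICE;
	return true;
}

static bool sim_sync_for_cpu(struct sim_stream *s)   /* dma_sync_single_for_cpu() */
{
	if (!s->mapped || s->owner != OWNER_DEVICE)
		return false;
	s->owner = OWNER_CPU;
	return true;
}

static bool sim_sync_for_device(struct sim_stream *s) /* dma_sync_single_for_device() */
{
	if (!s->mapped || s->owner != OWNER_CPU)
		return false;
	s->owner = OWNER_DEVICE;
	return true;
}

static bool sim_unmap(struct sim_stream *s)          /* dma_unmap_single() */
{
	if (!s->mapped)
		return false;
	s->mapped = false;
	s->owner = OWNER_CPU;
	return true;
}

/* The CPU may touch the buffer only when unmapped or handed back to it. */
static bool sim_cpu_may_access(const struct sim_stream *s)
{
	return !s->mapped || s->owner == OWNER_CPU;
}
```

A typical receive loop runs map, then repeatedly sync_for_cpu (CPU reads) and sync_for_device (device refills), and finally unmap.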


void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr, size_t size,
		enum dma_data_direction dir)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	BUG_ON(!valid_dma_direction(dir));
	if (dma_map_direct(dev, ops))
		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
	else if (ops->sync_single_for_cpu)
		ops->sync_single_for_cpu(dev, addr, size, dir);
	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_sync_single_for_cpu);

void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
		size_t size, enum dma_data_direction dir)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	BUG_ON(!valid_dma_direction(dir));
	if (dma_map_direct(dev, ops))
		dma_direct_sync_single_for_device(dev, addr, size, dir);
	else if (ops->sync_single_for_device)
		ops->sync_single_for_device(dev, addr, size, dir);
	debug_dma_sync_single_for_device(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_sync_single_for_device);

dma_sync_single_for_cpu call chain:

dma_sync_single_for_cpu
	//direct
	dma_direct_sync_single_for_cpu	
		if (is_swiotlb_buffer(paddr)) //the buffer is a swiotlb bounce buffer
            swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_CPU);
			    swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE); //orig_addr is the original buffer in high memory; tlb_addr is the swiotlb buffer below 4G
				memcpy(vaddr, buffer + offset, sz) in a loop
    //iommu	
	ops->sync_single_for_cpu(dev, addr, size, dir);
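What swiotlb_bounce() achieves can be shown with plain memcpy between an original buffer and a low bounce buffer. The struct and function names are hypothetical, and the model skips slot allocation. The direction checks follow the swiotlb rule: data is copied toward the device for DMA_TO_DEVICE/DMA_BIDIRECTIONAL, and back to the original buffer for DMA_FROM_DEVICE/DMA_BIDIRECTIONAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dma_dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Hypothetical bounce slot: orig is the caller's buffer the device cannot
 * reach, tlb is the low buffer the device actually accesses. */
struct sim_bounce {
	unsigned char *orig;
	unsigned char *tlb;
	size_t size;
};

/* map / sync_for_device: copy CPU data into the bounce buffer. */
static void sim_sync_for_device(struct sim_bounce *b, enum dma_dir dir)
{
	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
		memcpy(b->tlb, b->orig, b->size);
}

/* sync_for_cpu / unmap: copy device-written data back to the original. */
static void sim_sync_for_cpu(struct sim_bounce *b, enum dma_dir dir)
{
	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
		memcpy(b->orig, b->tlb, b->size);
}
```

This is why a missing dma_sync_single_for_cpu() on a bounced buffer leaves the CPU looking at old data: the device's writes stay in the low buffer until they are copied back.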

arm_smmu_ops:


static struct iommu_ops arm_smmu_ops = {
        .capable                = arm_smmu_capable,
        .domain_alloc           = arm_smmu_domain_alloc,
        .domain_free            = arm_smmu_domain_free,
        .attach_dev             = arm_smmu_attach_dev,
        .map                    = arm_smmu_map,           /* (1) */
        .unmap                  = arm_smmu_unmap,
        .flush_iotlb_all        = arm_smmu_flush_iotlb_all,
        .iotlb_sync             = arm_smmu_iotlb_sync,
        .iova_to_phys           = arm_smmu_iova_to_phys,
        .probe_device           = arm_smmu_probe_device,
        .release_device         = arm_smmu_release_device,
        .device_group           = arm_smmu_device_group,
        .domain_get_attr        = arm_smmu_domain_get_attr,
        .domain_set_attr        = arm_smmu_domain_set_attr,
        .of_xlate               = arm_smmu_of_xlate,
        .get_resv_regions       = arm_smmu_get_resv_regions,
        .put_resv_regions       = generic_iommu_put_resv_regions,
        .dev_has_feat           = arm_smmu_dev_has_feature,
        .dev_feat_enabled       = arm_smmu_dev_feature_enabled,
        .dev_enable_feat        = arm_smmu_dev_enable_feature,
        .dev_disable_feat       = arm_smmu_dev_disable_feature,
        .pgsize_bitmap          = -1UL, /* Restricted during device attach */
};


(1)
arm_smmu_map
	ops->map, i.e. io_pgtable_ops->map
		arm_lpae_map
			__arm_lpae_map    //see https://blog.csdn.net/flyingnosky/article/details/122951474

For example:

[    2.504955]  dump_backtrace+0x0/0x1b0
[    2.508611]  show_stack+0x18/0x70
[    2.511920]  dump_stack+0xd0/0x12c
[    2.515315]  __arm_lpae_map+0xc4/0x388
[    2.519058]  __arm_lpae_map+0x354/0x388
[    2.522888]  __arm_lpae_map+0x354/0x388
[    2.526717]  arm_lpae_map+0xe4/0x180
[    2.530286]  arm_smmu_map+0x20/0x34
[    2.533769]  __iommu_map+0xdc/0x1cc
[    2.537251]  iommu_map+0x14/0x20
[    2.540473]  iommu_dma_prepare_msi+0x164/0x210
[    2.544911]  its_irq_domain_alloc+0x7c/0x150
[    2.549175]  irq_domain_alloc_irqs_parent+0x28/0x40
[    2.554048]  msi_domain_alloc+0x78/0x140
[    2.557965]  __irq_domain_alloc_irqs+0x16c/0x470
[    2.562576]  __msi_domain_alloc_irqs+0x90/0x2e0
[    2.567100]  msi_domain_alloc_irqs+0x1c/0x30
[    2.571364]  __pci_enable_msix_range+0x614/0x6a0
[    2.575976]  pci_alloc_irq_vectors_affinity+0xc0/0x13c
[    2.581109]  pcie_port_device_register+0x108/0x40c
[    2.585893]  pcie_portdrv_probe+0x34/0xe4
[    2.589897]  local_pci_probe+0x40/0xac

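Note: __iommu_map runs a while loop. Each iteration maps one chunk, whose size is the largest page size in pgsize_bitmap that fits both the remaining size and the alignment of iova | paddr. iova, paddr and size must all be aligned to the smallest supported page size.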

static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
		       phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
{
	const struct iommu_ops *ops = domain->ops;
	unsigned long orig_iova = iova;
	unsigned int min_pagesz;
	size_t orig_size = size;
	phys_addr_t orig_paddr = paddr;
	int ret = 0;

	if (unlikely(ops->map == NULL ||
		     domain->pgsize_bitmap == 0UL))
		return -ENODEV;

	if (unlikely(!(domain->type & __IOMMU_DOMAIN_PAGING)))
		return -EINVAL;

	/* find out the minimum page size supported */
	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);

	/*
	 * both the virtual address and the physical one, as well as
	 * the size of the mapping, must be aligned (at least) to the
	 * size of the smallest page supported by the hardware
	 */
	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {  //size and addresses must be aligned
		pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n",
		       iova, &paddr, size, min_pagesz);
		return -EINVAL;
	}

	pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);

	while (size) { //loop
		size_t pgsize = iommu_pgsize(domain, iova | paddr, size);

		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
			 iova, &paddr, pgsize);
		ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);

		if (ret)
			break;

		iova += pgsize;
		paddr += pgsize;
		size -= pgsize;
	}

	if (ops->iotlb_sync_map)
		ops->iotlb_sync_map(domain);

	/* unroll mapping in case something went wrong */
	if (ret)
		iommu_unmap(domain, orig_iova, orig_size - size);
	else
		trace_map(orig_iova, orig_paddr, orig_size);

	return ret;
}
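The alignment check and page-size selection of __iommu_map() can be replayed in userspace to count how many ops->map calls a request produces. sim_pgsize follows the logic of iommu_pgsize() in 5.10: the largest supported size that fits both the remaining length and the alignment of iova | paddr. The bitmap 0x40201000 (4K|2M|1G) used below is an assumed SMMU configuration, and the sim_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest / lowest set bit; x must be non-zero. */
static unsigned int sim_fls(uint64_t x)
{
	unsigned int i = 0;

	while (x >>= 1)
		i++;
	return i;
}

static unsigned int sim_ffs(uint64_t x)
{
	unsigned int i = 0;

	while (!(x & 1)) {
		x >>= 1;
		i++;
	}
	return i;
}

/* Page-size choice as in iommu_pgsize() (5.10): the largest size in the
 * bitmap that fits in 'size' and matches the alignment of addr_merge. */
static uint64_t sim_pgsize(uint64_t pgsize_bitmap, uint64_t addr_merge, uint64_t size)
{
	unsigned int idx = sim_fls(size);
	uint64_t mask;

	if (addr_merge && sim_ffs(addr_merge) < idx)
		idx = sim_ffs(addr_merge);
	mask = (idx >= 63) ? ~0ULL : ((1ULL << (idx + 1)) - 1);
	mask &= pgsize_bitmap;
	assert(mask != 0);      /* the kernel uses BUG_ON() here */
	return 1ULL << sim_fls(mask);
}

/* Returns how many ops->map calls __iommu_map() would make,
 * or -1 where it would fail (-ENODEV / "unaligned" -EINVAL). */
static int sim_iommu_map(uint64_t pgsize_bitmap, uint64_t iova,
			 uint64_t paddr, uint64_t size)
{
	uint64_t min_pagesz;
	int calls = 0;

	if (!pgsize_bitmap)
		return -1;
	min_pagesz = 1ULL << sim_ffs(pgsize_bitmap);
	if ((iova | paddr | size) & (min_pagesz - 1))
		return -1;
	while (size) {
		uint64_t pgsize = sim_pgsize(pgsize_bitmap, iova | paddr, size);

		iova += pgsize;
		paddr += pgsize;
		size -= pgsize;
		calls++;
	}
	return calls;
}
```

Consider pgsize_bitmap = 0x40201000. Mapping 0x201000 bytes at a 2M-aligned iova and paddr takes two calls: one 2M block plus one 4K page. The iova/pa/size from the MSI log below (0xffffe000, 0x30830040, 0x2001) is rejected as unaligned.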

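For example, I modified iommu_dma_prepare_msi->iommu_dma_get_msi_page as follows:

size_t size = cookie_msi_granule(cookie) + PAGE_SIZE + 1; // the extra PAGE_SIZE + 1 triggers the errors below: size 0x2001 is not aligned to the 0x1000 page size (the pa is misaligned too, because msi_addr is aligned by masking with ~(size - 1), which only works for a power-of-two size)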

[    2.385478] iommu: unaligned: iova 0xffffe000 pa 0x0000000030830040 size 0x2001 min_pagesz 0x1000
[    2.500911] iommu: unaligned: iova 0xffffe000 pa 0x0000000030830040 size 0x2001 min_pagesz 0x1000
[    2.587739] iommu: unaligned: iova 0xffffe000 pa 0x0000000030830040 size 0x2001 min_pagesz 0x1000
[    2.660997] iommu: unaligned: iova 0xffffe000 pa 0x0000000030830040 size 0x2001 min_pagesz 0x1000
[    2.734276] iommu: unaligned: iova 0xffffe000 pa 0x0000000030830040 size 0x2001 min_pagesz 0x1000
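The misaligned pa in this log follows from the mask arithmetic, since masking with ~(size - 1) rounds down only when size is a power of two. Below is a standalone check; sim_align_down is a hypothetical helper that reproduces the masking idiom.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The masking idiom used to align msi_addr: addr & ~(size - 1). */
static uint64_t sim_align_down(uint64_t addr, uint64_t size)
{
	return addr & ~(size - 1);
}

static bool sim_is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}
```

With the original 0x1000 granule, the doorbell address 0x30830040 rounds down to 0x30830000. With 0x2001, the mask only clears bit 13, which is already 0, so the pa stays 0x30830040, exactly as in the log.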

When is a device's dma_ops assigned?

For example, during ACPI initialization (see also "default values of dma_mask and coherent_dma_mask"):

Platform devices:

int platform_dma_configure(struct device *dev)
{
	enum dev_dma_attr attr;
	int ret = 0;

	if (dev->of_node) {
		ret = of_dma_configure(dev, dev->of_node, true);
	} else if (has_acpi_companion(dev)) {
		attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
		ret = acpi_dma_configure(dev, attr);
	}

	return ret;
}

acpi_dma_configure(struct device *dev,enum dev_dma_attr attr) //attr:DEV_DMA_NOT_SUPPORTED/DEV_DMA_COHERENT/DEV_DMA_NON_COHERENT
	acpi_dma_configure_id(dev, attr, NULL);
		arch_setup_dma_ops(dev, dma_addr, size,iommu, attr == DEV_DMA_COHERENT);
			dev->dma_coherent = coherent;
			if(iommu) iommu_setup_dma_ops(dev, dma_base, size);
					if (domain->type == IOMMU_DOMAIN_DMA) {  //with iommu.passthrough=1 the default domain is not IOMMU_DOMAIN_DMA, so this check fails, dma_ops stays NULL and the device uses dma-direct
						if (iommu_dma_init_domain(domain, dma_base, size, dev))
							goto out_err;
						dev->dma_ops = &iommu_dma_ops;
					}
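The effect of this setup on later mapping calls can be sketched with a function-pointer table. A DMA-type domain installs IOMMU ops; anything else leaves dma_ops NULL, and the dispatcher falls back to the direct path. All sim_* names are hypothetical, and the fake IOVA (0xffff0000 plus the in-page offset) is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sim_domain_type { SIM_DOMAIN_IDENTITY, SIM_DOMAIN_DMA };

struct sim_dma_map_ops {
	uint64_t (*map_page)(uint64_t phys);
};

struct sim_device {
	const struct sim_dma_map_ops *dma_ops;
};

static uint64_t sim_iommu_map_page(uint64_t phys)
{
	return 0xffff0000ULL | (phys & 0xfff);   /* pretend an IOVA was allocated */
}

static const struct sim_dma_map_ops sim_iommu_dma_ops = {
	.map_page = sim_iommu_map_page,
};

/* Like iommu_setup_dma_ops(): only a DMA-type domain installs IOMMU ops. */
static void sim_setup_dma_ops(struct sim_device *dev, enum sim_domain_type type)
{
	if (type == SIM_DOMAIN_DMA)
		dev->dma_ops = &sim_iommu_dma_ops;
}

/* Like dma_map_page_attrs(): direct path when no ops are installed. */
static uint64_t sim_dma_map_page(const struct sim_device *dev, uint64_t phys)
{
	if (!dev->dma_ops)
		return phys;     /* dma-direct: DMA address == phys (no offset) */
	return dev->dma_ops->map_page(phys);
}
```

With passthrough (an identity domain), the same physical address goes to the device unchanged. With a DMA domain, the device gets an IOVA.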

PCI devices:

static int pci_dma_configure(struct device *dev)
{
	struct device *bridge;
	int ret = 0;

	bridge = pci_get_host_bridge_device(to_pci_dev(dev));

	if (IS_ENABLED(CONFIG_OF) && bridge->parent &&
	    bridge->parent->of_node) {
		ret = of_dma_configure(dev, bridge->parent->of_node, true);
	} else if (has_acpi_companion(bridge)) {
		struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);

		ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
	}

	pci_put_host_bridge_device(bridge);
	return ret;
}


log:
[   13.291332] ===iommu_setup_dma_ops for dev 0000:06:00.0 domain ffff69cdb189ed58
[   13.291338] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.10.0-wzm-test #11
[   13.291341] Hardware name: PL PR212F3/BC82AMDRA, BIOS 0.22 04/26/2022 10:42:11
[   13.291345] Call trace:
[   13.291356]  dump_backtrace+0x0/0x1e4
[   13.291361]  show_stack+0x20/0x2c
[   13.291369]  dump_stack+0xd8/0x140
[   13.291378]  iommu_setup_dma_ops+0x60/0x210
[   13.291382]  arch_setup_dma_ops+0x80/0xd0
[   13.291390]  acpi_dma_configure_id+0xb0/0xd0
[   13.291396]  pci_dma_configure+0xb0/0xd4
[   13.291402]  really_probe+0xac/0x51c
[   13.291405]  driver_probe_device+0xfc/0x170
[   13.291408]  device_driver_attach+0xc8/0xd0
[   13.291411]  __driver_attach+0xac/0x180
[   13.291415]  bus_for_each_dev+0x78/0xdc
[   13.291418]  driver_attach+0x2c/0x40
[   13.291420]  bus_add_driver+0xd8/0x230
[   13.291423]  driver_register+0x80/0x13c
[   13.291426]  __pci_register_driver+0x4c/0x5c
[   13.291433]  xhci_pci_init+0x68/0x78
[   13.291439]  do_one_initcall+0x4c/0x250
[   13.291444]  do_initcall_level+0xe4/0xfc
[   13.291446]  do_initcalls+0x80/0xa4
[   13.291448]  kernel_init_freeable+0x1a4/0x250
[   13.291453]  kernel_init+0x1c/0x128
[   13.291456]  ret_from_fork+0x10/0x18

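Difference between CMA and pswiotlb:
CMA is for allocating physical memory. By default it reserves a contiguous region below 4G, so the DMA allocation interfaces can obtain contiguous memory from it, e.g. dma_alloc_coherent -> iommu_dma_alloc_pages -> dma_alloc_contiguous -> cma_alloc.
pswiotlb is for mapping. It is used by interfaces such as dma_map_single, and only on the non-IOMMU path. It bounces buffers at high addresses into a low region, so that devices supporting only 32-bit DMA can use them.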

DMA API layering:
1) kernel/dma/mapping.c
   2) the dma_map_ops in drivers/iommu/dma-iommu.c
       3) drivers/iommu/iommu.c: the generic layer above the concrete SMMU driver
           4) drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c: the concrete iommu_ops mapping callbacks

Specifically, in 5.15:

1) dma_map_sg:
   dma_map_sg_attrs //kernel/dma/mapping.c
       iommu:
           iommu_dma_map_sg //drivers/iommu/dma-iommu.c
               1) untrusted device: returns iommu_dma_map_sg_swiotlb //for_each_sg, calls __iommu_dma_map_swiotlb on each entry
               __iommu_dma_map_swiotlb //bounces phys into a swiotlb buffer phys1, then __iommu_dma_map maps phys1 to the allocated iova. The extra swiotlb layer keeps the untrusted device from reaching unrelated data that shares a page with the buffer.
                   swiotlb_tbl_map_single
                   __iommu_dma_map
                       iommu_dma_alloc_iova
                       iommu_map_atomic //drivers/iommu/iommu.c
                           _iommu_map
               2) regular devices
               iommu_dma_alloc_iova
               iommu_map_sg_atomic
                   __iommu_map_sg //maps each sg entry in a loop
                       __iommu_map
               __finalise_sg

       direct:
           dma_direct_map_sg //kernel/dma/direct.c
               dma_direct_map_page //for_each_sg
                   if the address is not reachable under the device's dma_mask (dma_capable fails, e.g. a 32-bit mask and a page above 4G):
                       swiotlb_map
                   otherwise return directly
                       dma_addr

2) dma_map_page: iommu_dma_map_page calls __iommu_dma_map_swiotlb directly.
   dma_map_page_attrs //kernel/dma/mapping.c
       iommu：
       dma_map_ops *ops->map_page
           iommu_dma_map_page //drivers/iommu/dma-iommu.c

       direct：
       dma_direct_map_page

iommu_map_atomic //drivers/iommu/iommu.c
   _iommu_map
       __iommu_map
           __iommu_map_pages
               iommu_ops *ops->map_pages //arm-smmu-v3.c
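The regular-device branch of iommu_dma_map_sg() packs a scatterlist into one IOVA range. That packing can be approximated as follows: each segment keeps its offset within the IOVA granule and is padded to a whole number of granules, and the segments are laid out back to back. This is a simplified model with hypothetical names. It ignores the segment-boundary padding and the merging that __finalise_sg performs later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sim_sg {
	uint64_t offset;       /* offset of the data inside its first page */
	uint64_t length;
	uint64_t dma_address;  /* filled in by sim_map_sg() */
};

static uint64_t sim_align_up(uint64_t x, uint64_t granule)
{
	return (x + granule - 1) & ~(granule - 1);
}

/* Lay all segments out back to back in ONE IOVA range, each padded to the
 * IOVA granule; returns the total IOVA length that would be allocated. */
static uint64_t sim_map_sg(struct sim_sg *sg, size_t nents,
			   uint64_t iova_base, uint64_t granule)
{
	uint64_t pos = 0;

	for (size_t i = 0; i < nents; i++) {
		uint64_t in_page = sg[i].offset & (granule - 1);

		sg[i].dma_address = iova_base + pos + in_page;
		pos += sim_align_up(in_page + sg[i].length, granule);
	}
	return pos;
}
```

Take three segments: (offset 0x100, length 0x200), (0, 0x1000) and (0x800, 0x1000), with a 4K granule. They need a single 0x4000-byte IOVA allocation, and their DMA addresses start at +0x100, +0x1000 and +0x2800 from its base.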