Deep Dive into Contiguous Memory Allocator

This is the first part of an extended version of an LWN article on CMA. It contains much more detail on how to use CMA, and a lot of boring code samples. Should you be more interested in an overview, consider reading the original instead.

The Contiguous Memory Allocator (or CMA) has been developed to allow big physically contiguous memory allocations. By initialising early at boot time and with some fairly intrusive changes to Linux memory management, it is able to allocate big memory chunks without the need to grab memory for exclusive use.

Simple in principle, it grew to be a quite complicated system which requires cooperation between the boot-time allocator, the buddy system, the DMA subsystem, and some architecture-specific code. Still, all that complexity is usually hidden away and normal users won't be exposed to it. Depending on one's perspective, CMA appears slightly different, and there are different things to do and to look for.

Using CMA in device drivers

From a device driver author's point of view, nothing should change. CMA is integrated with the DMA subsystem, so the usual calls to the DMA API (such as dma_alloc_coherent) should work as usual.

In fact, device drivers should never need to call the CMA API directly. Most importantly, device drivers operate on kernel mappings and bus addresses whereas CMA operates on pages and PFNs. Furthermore, CMA does not handle cache coherency, which the DMA API was designed to deal with. Lastly, the DMA API is more flexible: it allows allocations in atomic contexts (e.g. interrupt handlers) and the creation of memory pools, which are well suited for small allocations.

For a quick example, this is what an allocation might look like:

dma_addr_t dma_addr;
void *virt_addr =
	dma_alloc_coherent(dev, 100 << 20, &dma_addr, GFP_KERNEL);
if (!virt_addr)
	return -ENOMEM;

Provided that dev is a pointer to a valid struct device, the above code will allocate 100 MiB of memory. It may or may not be CMA memory, but it is a portable way to get buffers. The following can be used to free it:

dma_free_coherent(dev, 100 << 20, virt_addr, dma_addr);

Barry Song has posted a very simple test driver which uses those two calls to allocate DMA memory.

More information about the DMA API can be found in Documentation/DMA-API.txt and Documentation/DMA-API-HOWTO.txt. Those two documents describe the provided functions and give usage examples.

Integration with architecture code

Obviously, CMA has to be integrated with a given architecture's DMA subsystem beforehand. This is performed in a few fairly easy steps. The CMA patchset integrates it with the x86 and ARM architectures. This section will refer to both patches as well as quote their relevant portions.

Reserving memory

CMA works by reserving memory early at boot time. This memory, called a CMA area or CMA context, is later returned to the buddy system so it can be used by regular applications. To make the reservation happen, one needs to call:

void dma_contiguous_reserve(
	phys_addr_t limit);

just after memblock is initialised but prior to the buddy allocator setup.

The limit argument, if not zero, specifies the physical address above which no memory will be prepared for CMA. The intention is to allow limiting CMA contexts to addresses that DMA can handle. The only real constraint that CMA imposes is that reserved memory must belong to the same zone.

In the case of ARM, the limit is set to arm_dma_limit or arm_lowmem_limit, whichever is smaller:

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
@@ -364,6 +373,12 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
    if (mdesc->reserve)
            mdesc->reserve();

+	/*
+	 * reserve memory for DMA contigouos allocations,
+	 * must come from DMA area inside low memory
+	 */
+	dma_contiguous_reserve(min(arm_dma_limit, arm_lowmem_limit));
+
 	arm_memblock_steal_permitted = false;
 	memblock_allow_resize();
 	memblock_dump_all();

On x86 it is called just after memblock is set up in the setup_arch function, with no limit specified:

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
@@ -934,6 +935,7 @@ void __init setup_arch(char **cmdline_p)
 	}
 #endif
 	memblock.current_limit = get_max_mapped();
+	dma_contiguous_reserve(0);

 	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.

The amount of reserved memory depends on a few Kconfig options and a cma kernel parameter which will be described later on.

Architecture-specific memory preparations

The dma_contiguous_reserve function will reserve memory and prepare it to be used with CMA. On some architectures, architecture-specific work may need to be performed as well. To allow that, CMA will call the following function:

void dma_contiguous_early_fixup(
	phys_addr_t base,
	unsigned long size);

It is the architecture's responsibility to provide it along with its declaration in the asm/dma-contiguous.h header file. The function will be called quite early, so some of the kernel subsystems – like kmalloc – will not be available. Furthermore, it may be called several times, but no more than MAX_CMA_AREAS times.

If an architecture does not need any special handling, the header file may just say:

#ifndef H_ARCH_ASM_DMA_CONTIGUOUS_H
#define H_ARCH_ASM_DMA_CONTIGUOUS_H
#ifdef __KERNEL__

#include <linux/types.h>
#include <asm-generic/dma-contiguous.h>

static inline void
dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
{ /* nop */ }

#endif
#endif

ARM requires some work modifying mappings and so it provides a full definition of this function:

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
[…]
+static struct dma_contig_early_reserve dma_mmu_remap[MAX_CMA_AREAS] __initdata;
+
+static int dma_mmu_remap_num __initdata;
+
+void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
+{
+	dma_mmu_remap[dma_mmu_remap_num].base = base;
+	dma_mmu_remap[dma_mmu_remap_num].size = size;
+	dma_mmu_remap_num++;
+}
+
+void __init dma_contiguous_remap(void)
+{
+	int i;
+	for (i = 0; i < dma_mmu_remap_num; i++) {
		[…]
+	}
+}
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
@@ -1114,11 +1122,12 @@ void __init paging_init(struct machine_desc *mdesc)
 {
 	void *zero_page;

-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);

 	build_mem_type_table();
 	prepare_page_table();
 	map_lowmem();
+	dma_contiguous_remap();
 	devicemaps_init(mdesc);
 	kmap_init();

DMA subsystem integration

The second thing to do is to change the architecture's DMA API to use the whole machinery. To allocate memory from CMA, one uses:

struct page *dma_alloc_from_contiguous(
	struct device *dev,
	int count,
	unsigned int align);

Its first argument is the device the allocation is performed on behalf of. The second one specifies the number of pages (not bytes or order) to allocate.

The third argument is the alignment expressed as a page order. It enables allocation of buffers which are aligned to at least 2^align pages. To avoid fragmentation, pass zero here if at all possible. It is worth noting that there is a Kconfig option (CONFIG_CMA_ALIGNMENT) which specifies the maximal alignment accepted by the function. By default, its value is 8, meaning an alignment of 256 pages.

The return value is the first of a sequence of count allocated pages.

Here's how allocation looks on x86:

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
@@ -99,14 +99,18 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 				 dma_addr_t *dma_addr, gfp_t flag)
 {
	[…]
 again:
-	page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
+	if (!(flag & GFP_ATOMIC))
+		page = dma_alloc_from_contiguous(dev, count, get_order(size));
+	if (!page)
+		page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
 	if (!page)
 		return NULL;

To free allocated buffer, one needs to call:

bool dma_release_from_contiguous(
	struct device *dev,
	struct page *pages,
	int count);

The dev and count arguments are the same as before, whereas pages is what dma_alloc_from_contiguous has returned.

If the region passed to the function did not come from CMA, the function will return false. Otherwise, it will return true. This removes the need for higher-level functions to track which allocations were made with CMA and which were made using some other method.

Again, here's how it is used on x86:

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
@@ -126,6 +130,16 @@ again:
 	return page_address(page);
 }

+void dma_generic_free_coherent(struct device *dev, size_t size, void *vaddr,
+			       dma_addr_t dma_addr)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct page *page = virt_to_page(vaddr);
+
+	if (!dma_release_from_contiguous(dev, page, count))
+		free_pages((unsigned long)vaddr, get_order(size));
+}
+
 /*
  * See <Documentation/x86/x86_64/boot-options.txt> for the iommu kernel
  * parameter documentation.

Atomic allocations

Beware that dma_alloc_from_contiguous may not be called from atomic context (e.g. when a spin lock is held or in an interrupt). It performs some “heavy” operations such as page migration, direct reclaim, etc., which may take a while. Because of that, to make dma_alloc_coherent and friends work as advertised, the architecture needs a different method of allocating memory in atomic context.

The simplest solution is to put aside a bit of memory at boot time and perform atomic allocations from it. This is in fact what ARM does. Existing architectures most likely have a special path for atomic allocations already.

Special memory requirements

At this point, most drivers should “just work”. They use the DMA API, which calls into CMA. Life is beautiful. Except some devices may have special memory requirements. For instance, Samsung's Multi-format codec (MFC) requires different types of buffers to be located in different memory banks (which allows reading them through two memory channels, thus increasing memory bandwidth). Furthermore, one may want to separate some devices' allocations from others so as to limit fragmentation within CMA areas.

As mentioned earlier, CMA operates on contexts describing a portion of system memory to allocate buffers from. One global area is created to be used by devices by default, but if a device needs to use a different area, it can easily be done.

There is a many-to-one mapping between struct device and struct cma (i.e. CMA context). This means that if a single device driver needs to use more than one CMA area, it has to have separate struct device objects. At the same time, several struct device objects may point to the same CMA context.

Assigning CMA area to a single device

To assign a CMA area to a device, all one needs to do is call:

int dma_declare_contiguous(
	struct device *dev,
	unsigned long size,
	phys_addr_t base,
	phys_addr_t limit);

As with dma_contiguous_reserve, this needs to be called after memblock initialises but before too much memory gets grabbed from it. For ARM platforms, a convenient place to put the invocation of this function is the machine's reserve callback.

The first argument of the function is the device that the new context is to be assigned to. The second is its size in bytes (not in pages). The third is the physical address of the area, or zero. The last one has the same meaning as the limit argument to dma_contiguous_reserve. The return value is either zero (on success) or a negative error code.

For an example, one can take a look at the code called from Samsung's S5P platform's reserve callback. It creates two CMA contexts for the MFC driver:

diff --git a/arch/arm/plat-s5p/dev-mfc.c b/arch/arm/plat-s5p/dev-mfc.c
@@ -22,52 +23,14 @@
 #include <plat/irqs.h>
 #include <plat/mfc.h>

[…]
 void __init s5p_mfc_reserve_mem(phys_addr_t rbase, unsigned int rsize,
 				phys_addr_t lbase, unsigned int lsize)
 {
	[…]
+	if (dma_declare_contiguous(&s5p_device_mfc_r.dev, rsize, rbase, 0))
+		printk(KERN_ERR "Failed to reserve memory for MFC device (%u bytes at 0x%08lx)\n",
+		       rsize, (unsigned long) rbase);
	[…]
+	if (dma_declare_contiguous(&s5p_device_mfc_l.dev, lsize, lbase, 0))
+		printk(KERN_ERR "Failed to reserve memory for MFC device (%u bytes at 0x%08lx)\n",
+		       lsize, (unsigned long) lbase);
 }

There is a limit to how many “private” areas can be declared, namely CONFIG_CMA_AREAS. Its default value is seven, but it can be safely increased if the need arises. If called more times than that, the dma_declare_contiguous function will print an error message and return -ENOSPC.

Assigning CMA area to multiple devices

Things get a bit more complicated if the same (non-default) CMA context needs to be used by two or more devices. The current API does not provide a trivial way to do that. What can be done is to use dev_get_cma_area to figure out the CMA area one device is using, and dev_set_cma_area to set the same context for another device. This sequence must be called no sooner than in postcore_initcall. Here is how it could look:

static int __init foo_set_up_cma_areas(void)
{
	struct cma *cma;

	cma = dev_get_cma_area(device1);
	dev_set_cma_area(device2, cma);
	return 0;
}
postcore_initcall(foo_set_up_cma_areas);

Of course, device1's area must be set up with dma_declare_contiguous as described in the previous subsection.

A device's CMA context may be changed at any time as long as the device holds no CMA memory – it would be rather tricky to release any allocation after the area has changed.

No default context

As a matter of fact, there is nothing special about the default context that is created by the dma_contiguous_reserve function. It is in no way required, and the system may work without it.

If there is no default context, dma_alloc_from_contiguous will return NULL for devices without an assigned area. dev_get_cma_area can be used to distinguish this situation from an allocation failure.

Of course, if there is no default area, the architecture should provide other means to allocate memory for devices without an assigned CMA context.

Size of the default context

dma_contiguous_reserve does not take a size as an argument, which raises the question of how it knows how much memory should be reserved. There are two sources this information comes from.

First of all, there is a set of Kconfig options which specify the default size of the reservation. All of those options are located under “Device Drivers” » “Generic Driver Options” » “Contiguous Memory Allocator” in the Kconfig menu. They allow specifying one of four possible ways of calculating the size: it can be an absolute size in megabytes, a percentage of total memory, the lower of the two, or the larger of the two. The default is to reserve 16 MiB of memory.

Second of all, there is a cma kernel command line option. It lets one specify the size of the area at boot time without the need to recompile the kernel. This option specifies the size in bytes and accepts the usual suffixes.


The CMA test driver mentioned earlier is as follows:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/sizes.h>
#include <linux/types.h>


struct cma_allocation {
	struct list_head list;
	size_t size;
	dma_addr_t dma;
	void *virt;
};

static struct device *cma_dev;
static LIST_HEAD(cma_allocations);
static DEFINE_SPINLOCK(cma_lock);

/*
 * Any read request will free the first allocated coherent memory, e.g.:
 *   cat /dev/cma_test
 */
static ssize_t
cma_test_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
{
	struct cma_allocation *alloc = NULL;

	spin_lock(&cma_lock);
	if (!list_empty(&cma_allocations)) {
		alloc = list_first_entry(&cma_allocations,
					 struct cma_allocation, list);
		list_del(&alloc->list);
	}
	spin_unlock(&cma_lock);

	if (!alloc)
		return -EIDRM;

	dma_free_coherent(cma_dev, alloc->size, alloc->virt, alloc->dma);
	_dev_info(cma_dev, "free: CM virt: %p dma: %p size: %zuK\n",
		  alloc->virt, (void *)alloc->dma, alloc->size / SZ_1K);
	kfree(alloc);
	return 0;
}


/*
 * Any write request will allocate a new coherent memory, e.g.:
 *   echo 1024 > /dev/cma_test
 * will request 1024 KiB from CMA.
 */
static ssize_t
cma_test_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
{
	struct cma_allocation *alloc;
	unsigned long size;
	int ret;

	ret = kstrtoul_from_user(buf, count, 0, &size);
	if (ret)
		return ret;

	if (!size)
		return -EINVAL;

	if (size > ~(size_t)0 / SZ_1K)
		return -EOVERFLOW;

	alloc = kmalloc(sizeof(*alloc), GFP_KERNEL);
	if (!alloc)
		return -ENOMEM;

	alloc->size = size * SZ_1K;
	alloc->virt = dma_alloc_coherent(cma_dev, alloc->size,
					 &alloc->dma, GFP_KERNEL);
	if (alloc->virt) {
		_dev_info(cma_dev, "alloc: virt: %p dma: %p size: %zuK\n",
			  alloc->virt, (void *)alloc->dma, alloc->size / SZ_1K);

		spin_lock(&cma_lock);
		list_add_tail(&alloc->list, &cma_allocations);
		spin_unlock(&cma_lock);

		return count;
	} else {
		dev_err(cma_dev, "no mem in CMA area\n");
		kfree(alloc);
		return -ENOSPC;
	}
}

static const struct file_operations cma_test_fops = {
	.owner = THIS_MODULE,
	.read  = cma_test_read,
	.write = cma_test_write,
};

static struct miscdevice cma_test_misc = {
	.name = "cma_test",
	.fops = &cma_test_fops,
};


static int __init cma_test_init(void)
{
	int ret = misc_register(&cma_test_misc);

	if (unlikely(ret)) {
		pr_err("failed to register cma test misc device!\n");
		return ret;
	}

	cma_dev = cma_test_misc.this_device;
	cma_dev->coherent_dma_mask = ~0;
	_dev_info(cma_dev, "registered.\n");
	return 0;
}

module_init(cma_test_init);

static void __exit cma_test_exit(void)
{
	misc_deregister(&cma_test_misc);
}

module_exit(cma_test_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Barry Song <Baohua.Song@csr.com>");
MODULE_DESCRIPTION("kernel module to help the test of CMA");
MODULE_ALIAS("cma_test");

http://mina86.com/2012/06/10/deep-dive-into-contiguous-memory-allocator-part-i/

