Arm-kernel memory collection

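An in-depth look at the Linux kernel memory management subsystem. It focuses on how the kernel on ARM processors collects the address range and size of physical memory, and on how the bootloader passes this information to the kernel.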
  

 


 

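The memory management subsystem of the Linux kernel is very complex. To understand it in depth, I plan to analyze Linux memory management across several articles. This one covers how the kernel collects the address range and size of physical memory.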

 

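Embedded ARM processors differ from the Intel processors we usually work with. On Intel machines, the size of physical memory can be detected automatically by the motherboard or BIOS code. Embedded ARM systems are not so fortunate: the bootloader, configured manually (or with hard-coded values), must tell the kernel the start address and size of the board's physical memory.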

 

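Besides physical memory information, the command-line parameters, video information and ramdisk information are also all passed to the kernel by the bootloader. The bootloader and the kernel need an agreed convention for passing this information, and the ARM kernel requires the bootloader to store it as follows: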

 

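1. When the bootloader jumps to the kernel, register r2 holds the physical start address of the area where this information is stored.

2. The individual pieces of information must be combined in a defined format.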
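To make the convention concrete, here is a minimal bootloader-side sketch that lays out a tag list in a buffer of 32-bit words. put_tag and build_atags are hypothetical helpers, not part of the kernel or of any particular bootloader; the tag numbers are the values from the kernel's ARM setup header, and the memory start and size are sample inputs only. After building the list, a bootloader would jump to the kernel with r0 = 0, r1 = the machine type number and r2 = the physical address of the buffer; that jump is only described in a comment here.

```c
#include <stddef.h>
#include <stdint.h>

/* Tag numbers as used by the ARM kernel's setup header */
#define ATAG_NONE 0x00000000u
#define ATAG_CORE 0x54410001u
#define ATAG_MEM  0x54410002u

/* Hypothetical helper: append one tag (2-word header + payload) at pos.
 * hdr.size is counted in 32-bit words and includes the header itself. */
static uint32_t *put_tag(uint32_t *pos, uint32_t tag,
                         const uint32_t *data, uint32_t nwords)
{
    pos[0] = 2 + nwords;   /* hdr.size */
    pos[1] = tag;          /* hdr.tag  */
    for (uint32_t i = 0; i < nwords; i++)
        pos[2 + i] = data[i];
    return pos + 2 + nwords;
}

/* Hypothetical helper: write ATAG_CORE, one ATAG_MEM and the terminating
 * empty tag into buf (at least 11 words). Returns the number of words used.
 * A bootloader would then enter the kernel with r0 = 0, r1 = machine type,
 * r2 = physical address of buf. */
static size_t build_atags(uint32_t *buf, uint32_t mem_start, uint32_t mem_size)
{
    /* struct tag_core: flags (bit 0 = read-only root), pagesize, rootdev */
    const uint32_t core[3] = { 1, 4096, 0 };
    /* struct tag_mem32 stores size first, then start */
    const uint32_t mem[2] = { mem_size, mem_start };
    uint32_t *p = buf;

    p = put_tag(p, ATAG_CORE, core, 3);
    p = put_tag(p, ATAG_MEM, mem, 2);
    p[0] = 0;              /* hdr.size == 0 ends the list */
    p[1] = ATAG_NONE;
    p += 2;
    return (size_t)(p - buf);
}
```

Calling build_atags(buf, 0x30000000, 0x04000000) fills 11 words: a 5-word ATAG_CORE, a 4-word ATAG_MEM and the 2-word terminator.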

 

In the ARM kernel, each piece of information is called an atag. struct tag is defined as follows:

 

struct tag {
	struct tag_header hdr;
	union {
		struct tag_core		core;
		struct tag_mem32	mem;
		struct tag_videotext	videotext;
		struct tag_ramdisk	ramdisk;
		struct tag_initrd	initrd;
		struct tag_serialnr	serialnr;
		struct tag_revision	revision;
		struct tag_videolfb	videolfb;
		struct tag_cmdline	cmdline;
		struct tag_acorn	acorn;
		struct tag_memclk	memclk;
	} u;
};


 

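struct tag_header hdr works as a header: it identifies the type of the union that follows and the size of the whole structure in memory. The union means that each tag carries exactly one of the kinds of information listed above.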

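The figure above shows an example of several tags combined into a tag list. hdr records which kind of tag follows and how long the whole tag is, which makes it easy to jump to the next tag. hdr is of type struct tag_header, defined as follows: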

 

struct tag_header {
	__u32 size;
	__u32 tag;
};
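Because hdr.size is counted in 32-bit words and covers both the header and the payload, the size of a tag can be derived from its payload structure. The kernel's ARM setup header provides a tag_size() macro for this; the sketch below reproduces it in user space, with uint32_t standing in for __u32 and only two of the payload structures copied over.

```c
#include <stdint.h>

/* Layouts mirror the kernel structures; uint32_t replaces __u32. */
struct tag_header { uint32_t size; uint32_t tag; };
struct tag_core   { uint32_t flags; uint32_t pagesize; uint32_t rootdev; };
struct tag_mem32  { uint32_t size; uint32_t start; };

/* Header plus payload, converted from bytes to 32-bit words. */
#define tag_size(type) ((sizeof(struct tag_header) + sizeof(struct type)) >> 2)
```

So an ATAG_CORE with its payload occupies 5 words and an ATAG_MEM occupies 4.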


To simplify the analysis, assume the kernel already knows the address of the tags. It then calls parse_tags to process each tag in turn:

[arch/arm/kernel/setup.c]
static void __init parse_tags(const struct tag *t)
{
	for (; t->hdr.size; t = tag_next(t))
		if (!parse_tag(t))
			printk(KERN_WARNING
				"Ignoring unrecognised tag 0x%08x\n",
				t->hdr.tag);
}


 

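The tag list convention requires that the last real tag be followed by an empty tag whose hdr.size is 0 (ATAG_NONE), so the for loop uses t->hdr.size == 0 as its exit condition. t = tag_next(t) uses t->hdr.size to jump to the next tag; because the pointer is cast to __u32 * before the addition, hdr.size is counted in 32-bit words, not bytes. tag_next is defined as follows: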

 

#define tag_next(t) ((struct tag *)((__u32 *)(t) + (t)->hdr.size))
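The termination rule and tag_next() can be exercised outside the kernel. This is a user-space sketch under simplifying assumptions: the union keeps only the mem member, fixed-width types replace __u32, and walk_tags is a hypothetical function that mirrors the loop in parse_tags but counts tags and totals the ATAG_MEM sizes instead of dispatching to handlers.

```c
#include <stddef.h>
#include <stdint.h>

#define ATAG_NONE 0x00000000u
#define ATAG_CORE 0x54410001u
#define ATAG_MEM  0x54410002u

struct tag_header { uint32_t size; uint32_t tag; };
struct tag_mem32  { uint32_t size; uint32_t start; };

/* Only the union member needed here; the kernel's union has many more. */
struct tag {
    struct tag_header hdr;
    union { struct tag_mem32 mem; } u;
};

/* Same arithmetic as the kernel macro: advance hdr.size 32-bit words. */
#define tag_next(t) ((const struct tag *)((const uint32_t *)(t) + (t)->hdr.size))

/* Hypothetical walker: stops at the first tag with hdr.size == 0,
 * returns how many tags it saw and sums the ATAG_MEM sizes. */
static size_t walk_tags(const struct tag *t, uint64_t *mem_total)
{
    size_t count = 0;

    *mem_total = 0;
    for (; t->hdr.size; t = tag_next(t)) {
        count++;
        if (t->hdr.tag == ATAG_MEM)
            *mem_total += t->u.mem.size;
    }
    return count;
}
```

For a list holding an empty 2-word ATAG_CORE and two 32 MB ATAG_MEM entries followed by the size-0 terminator, walk_tags returns 3 and a total of 0x04000000 bytes.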

 

parse_tag then processes the tag that t points to:

static int __init parse_tag(const struct tag *tag)
{
	extern struct tagtable __tagtable_begin, __tagtable_end;
	struct tagtable *t;

	for (t = &__tagtable_begin; t < &__tagtable_end; t++)
		if (tag->hdr.tag == t->tag) {
			t->parse(tag);
			break;
		}

	return t < &__tagtable_end;
}


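This uses a common kernel technique: the structures that associate each atag type with its handler are placed in a dedicated section, and at link time __tagtable_begin is set to the start address of that section and __tagtable_end to its end address. As parse_tag shows, after compilation this section is simply an array of struct tagtable, which is defined as follows: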

 

struct tagtable {
	__u32 tag;
	int (*parse)(const struct tag *);
};


 

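tag is the atag type an entry describes, and parse is the handler for atags of that type. Together with parse_tag, this shows that the tagtable array defines a handler for each atag type: when the bootloader passes the atags to the kernel, the kernel checks the type of each atag in turn, looks it up in the table (tagtable) and calls the matching handler.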
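The lookup can be sketched in user space as well. To keep the example portable, it swaps the linker-section trick for an ordinary static array whose bounds play the role of __tagtable_begin and __tagtable_end; the search loop itself is the same as in parse_tag. parse_core, parse_mem and the last_mem_* variables are hypothetical stand-ins for the kernel's real handlers.

```c
#include <stdint.h>

#define ATAG_CORE 0x54410001u
#define ATAG_MEM  0x54410002u

struct tag_header { uint32_t size; uint32_t tag; };
struct tag_mem32  { uint32_t size; uint32_t start; };
struct tag {
    struct tag_header hdr;
    union { struct tag_mem32 mem; } u;
};

struct tagtable {
    uint32_t tag;
    int (*parse)(const struct tag *);
};

/* Hypothetical handlers: parse_mem just records what it was given. */
static uint32_t last_mem_start, last_mem_size;

static int parse_core(const struct tag *tag) { (void)tag; return 0; }

static int parse_mem(const struct tag *tag)
{
    last_mem_start = tag->u.mem.start;
    last_mem_size  = tag->u.mem.size;
    return 0;
}

/* Stands in for the .taglist.init section; its bounds replace
 * __tagtable_begin and __tagtable_end. */
static const struct tagtable tagtable[] = {
    { ATAG_CORE, parse_core },
    { ATAG_MEM,  parse_mem  },
};
#define TAGTABLE_END (tagtable + sizeof(tagtable) / sizeof(tagtable[0]))

/* Same lookup as parse_tag(): 1 if a handler was found, 0 otherwise. */
static int parse_tag(const struct tag *tag)
{
    const struct tagtable *t;

    for (t = tagtable; t < TAGTABLE_END; t++)
        if (tag->hdr.tag == t->tag) {
            t->parse(tag);
            break;
        }
    return t < TAGTABLE_END;
}
```

An unknown tag type runs off the end of the table, so parse_tag returns 0 and parse_tags prints its "Ignoring unrecognised tag" warning.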

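In practice the bootloader passes many kinds of atags to the kernel, each with its own purpose. This article focuses on how the kernel collects physical memory information, so we only care about tags of type ATAG_MEM. First, here is the tagtable definition for ATAG_MEM: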

#define __tag __used __attribute__((__section__(".taglist.init")))
#define __tagtable(tag, fn) \
static struct tagtable __tagtable_##fn __tag = { tag, fn }

__tagtable(ATAG_MEM, parse_tag_mem32);
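To see what the macro produces, the sketch below expands __tagtable(ATAG_MEM, parse_tag_mem32) in a standalone file. It assumes GCC or Clang on an ELF target, which accept the __section__ attribute with this section name; __used is spelled out locally because it normally comes from the kernel's compiler headers, and the handler is only a stub.

```c
#include <stdint.h>

#define ATAG_MEM 0x54410002u

struct tag;  /* left opaque: the stub handler never dereferences it */
struct tagtable {
    uint32_t tag;
    int (*parse)(const struct tag *);
};

/* Normally provided by the kernel's compiler headers. */
#ifndef __used
#define __used __attribute__((__used__))
#endif

#define __tag __used __attribute__((__section__(".taglist.init")))
#define __tagtable(tag, fn) \
static struct tagtable __tagtable_##fn __tag = { tag, fn }

/* Stub standing in for the real parse_tag_mem32. */
static int parse_tag_mem32(const struct tag *tag) { (void)tag; return 0; }

/* Expands to:
 * static struct tagtable __tagtable_parse_tag_mem32
 *     __attribute__((__used__))
 *     __attribute__((__section__(".taglist.init")))
 *     = { 0x54410002u, parse_tag_mem32 };
 */
__tagtable(ATAG_MEM, parse_tag_mem32);
```

The result is an ordinary static struct tagtable named __tagtable_parse_tag_mem32, marked as used so the compiler keeps it even though nothing refers to it by name, and placed in .taglist.init, the section whose boundaries become __tagtable_begin and __tagtable_end at link time.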


 

This tagtable definition shows that the handler for ATAG_MEM tags is parse_tag_mem32:

static int __init parse_tag_mem32(const struct tag *tag)
{
	return arm_add_memory(tag->u.mem.start, tag->u.mem.size);
}


 

When tag->hdr.tag is ATAG_MEM, the mem member of the union tag.u is the valid one, and it has the following type:

struct tag_mem32 {
	__u32	size;
	__u32	start;	/* physical start address */
};

 

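So parse_tag_mem32 passes the physical memory information on to arm_add_memory. During boot, the ARM kernel uses a data structure called meminfo to record all of the system's physical memory. In ARM systems the SDRAM is connected to the banks of the processor's memory controller, so physical memory is divided into banks, and meminfo mirrors this layout. It is defined as follows: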

 

struct membank {
	unsigned long start;
	unsigned long size;
	int           node;
};

struct meminfo {
	int nr_banks;
	struct membank bank[NR_BANKS];
};


 

nr_banks records how many banks meminfo currently holds, and bank[i] records the start address and size of the i-th memory bank.
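As an illustration of how the structure is read, the sketch below copies membank and meminfo and adds two hypothetical helpers: one totals the RAM across the recorded banks, the other finds which bank contains a physical address. NR_BANKS is set to 8 here only as an assumption; the real value comes from the platform headers.

```c
#include <stdint.h>

#define NR_BANKS 8   /* assumption for this sketch */

struct membank {
    unsigned long start;
    unsigned long size;
    int           node;
};

struct meminfo {
    int nr_banks;
    struct membank bank[NR_BANKS];
};

/* Hypothetical helper: total bytes described by the recorded banks. */
static uint64_t meminfo_total(const struct meminfo *mi)
{
    uint64_t total = 0;

    for (int i = 0; i < mi->nr_banks; i++)
        total += mi->bank[i].size;
    return total;
}

/* Hypothetical helper: index of the bank containing phys, or -1.
 * Unsigned wraparound makes phys - start < size true only inside the bank. */
static int meminfo_find_bank(const struct meminfo *mi, unsigned long phys)
{
    for (int i = 0; i < mi->nr_banks; i++)
        if (phys - mi->bank[i].start < mi->bank[i].size)
            return i;
    return -1;
}
```

The single comparison in meminfo_find_bank covers both bounds: an address below start wraps around to a huge value and fails the test.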

Not surprisingly, arm_add_memory adds the physical memory information from the atags to the meminfo structure:

 

static int __init arm_add_memory(unsigned long start, unsigned long size)
{
	struct membank *bank = &meminfo.bank[meminfo.nr_banks];

	if (meminfo.nr_banks >= NR_BANKS) {
		printk(KERN_CRIT "NR_BANKS too low, "
			"ignoring memory at %#lx\n", start);
		return -EINVAL;
	}

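	/*
	 * Make start and size multiples of PAGE_SIZE (4 KB here): start is
	 * rounded up and size is rounded down. Subtracting only the offset of
	 * start within its page is exact only when start is page-aligned.
	 */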
	size -= start & ~PAGE_MASK;
	bank->start = PAGE_ALIGN(start);
	bank->size  = size & PAGE_MASK;
	bank->node  = PHYS_TO_NID(start);

	/*
	 * Check whether this memory region has non-zero size or
	 * invalid node number.
	 */
	if (bank->size == 0 || bank->node >= MAX_NUMNODES)
		return -EINVAL;

	meminfo.nr_banks++;
	return 0;
}
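The rounding in arm_add_memory can be checked with a small user-space sketch. page_align_region is a hypothetical helper, not kernel code; it derives the size from the rounded-down end address instead of subtracting start & ~PAGE_MASK. The two agree whenever start is page-aligned, as it normally is. For an unaligned region such as start = 0x100, size = 0x2E00 (ending at 0x2F00), the quoted arithmetic yields a 0x2000-byte bank starting at 0x1000, which extends 0x100 bytes past the real end, whereas this sketch keeps only the single whole page 0x1000-0x2000.

```c
#include <stdint.h>

#define PAGE_SHIFT    12
#define PAGE_SIZE     (1ull << PAGE_SHIFT)
#define PAGE_MASK     (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & PAGE_MASK)

struct region { uint64_t start, size; };

/* Keep only the whole pages inside [start, start + size):
 * round the start up, round the end down, and take the difference.
 * Returns 0 and fills *out, or -1 if no whole page remains. */
static int page_align_region(uint64_t start, uint64_t size, struct region *out)
{
    uint64_t first = PAGE_ALIGN(start);
    uint64_t end   = (start + size) & PAGE_MASK;

    if (end <= first)
        return -1;
    out->start = first;
    out->size  = end - first;
    return 0;
}
```

With the page-aligned values typical of a real board, for example start = 0x30000000 and size = 0x04000000, the region passes through unchanged.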


 

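Once all the atags have been processed, meminfo describes the physical memory of the whole development board. To see the result directly, I added the following print statement to arm_add_memory: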

printk(KERN_ERR "add memory bank(%d) 0x%08lx-0x%08lx\n",
	meminfo.nr_banks, bank->start, bank->start + bank->size);

 

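This prints the physical memory layout of the board. On the Embedsky (天嵌) S3C2440 development board, for example, the SDRAM is connected to bank 6, so its start address is 0x30000000; its size is 64 MB, so it ends at 0x34000000.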

 

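Although we now know that parse_tags collects the physical memory information, we have not yet seen where it is called. It runs during early boot, with the following call chain: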

start_kernel() -> setup_arch() -> parse_tags()

 

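After parse_tags returns, meminfo holds the complete physical memory information. The next steps are to build the final kernel page tables, set up the temporary boot-time memory manager, and initialize all physical pages.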
