Bootloader Performance Optimization

Latest recommended article published on 2022-07-06 16:26:24

xiewen202201


Column: bootloader    Tags: performance optimization, bootloader

Article link: https://blog.csdn.net/xiewen2010/article/details/13504801


Performance optimization notes:

1. Memory type settings (moderate gain)

ARM (ARMv7) defines three memory types:

          Normal
          Device
          Strongly-ordered

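RAM is normally mapped as Normal memory. The other two types add the following access rules, which guarantee correct behaviour for memory-mapped peripherals. They also rule out caching, speculative reads and merging of accesses, so mapping RAM as Device or Strongly-ordered costs performance: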

          • both read and write accesses can have side effects
          • accesses must not be repeated, for example, on return from an exception
          • the number, order and sizes of the accesses must be maintained.

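ARM determines the memory type of a region from the TEX[2:0], C and B bits of its translation table (page table) descriptor. How those bits are interpreted depends on the TRE bit of the CP15 c1 System Control Register (SCTLR). See ARM's documentation, ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition (Errata markup PDF), B3.12.17, which describes CP15 c1, c0: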

          TRE, bit [28]  TEX Remap Enable bit. This bit enables remapping of the TEX[2:1] bits for use as two
                         translation table bits that can be managed by the operating system. Enabling this
                         remapping also changes the scheme used to describe the memory region attributes in
                         the VMSA. The possible values of this bit are:
                         0  TEX Remap disabled. TEX[2:0] are used, with the C and B bits, to describe the
                            memory region attributes.
                         1  TEX Remap enabled. TEX[2:1] are reassigned for use as flags managed by the
                            operating system. The TEX[0], C and B bits are used to describe the memory
                            region attributes, with the MMU remap registers.

B3.7.2, "C, B, and TEX[2:0] encodings without TEX remap", lists the memory type selected by each combination of C, B and TEX[2:0].

B3.3.1 describes how to set C, B and TEX[2:0].

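Example: configure a region as Device memory

          1. Set SCTLR.TRE = 0 to disable TEX remap.
          2. In the region's translation table descriptor, set TEX[2:0] = 0b000.
          3. Set C = 0, B = 1 (TEX = 0b000, C = 0, B = 1 encodes Shareable Device memory).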

2. Enable CPU branch prediction (significant gain)

Bit 11 (Z) of the CP15 c1, c0 register (SCTLR) enables or disables branch prediction. The register description reads:

          Z, bit [11]  Branch prediction enable bit. This bit is used to enable branch prediction, also called program flow prediction:
                       0  Program flow prediction disabled.
                       1  Program flow prediction enabled.

Example change:

    @@ -27,8 +27,9 @@ InitMultiLevelMMU:
              ORR             r0, r0, #0x1000         //; icache on
              ORR             r0, r0, #0x4            //; dcache on
              ORR             r0, r0, #0x1            //; mmu on
          +   ORR             r0, r0, #0x800          //; set bit 11 (Z): enable branch predictor
              MCR             p15, 0, r0, c1, c0, 0
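The same change written in C, as a hedged sketch: `sctlr_enable_bits` is a hypothetical, host-testable helper that sets M (bit 0), C (bit 2), Z (bit 11) and I (bit 12), mirroring the ORR sequence above. The inline assembly (GCC syntax, compiled only for an ARMv7-A target) does the read-modify-write of SCTLR and then an ISB so the following instructions run with the new settings. It assumes the page tables, TTBR0 and DACR are already configured, which the original routine presumably does before this point.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)    /* MMU enable */
#define SCTLR_C (1u << 2)    /* data/unified cache enable */
#define SCTLR_Z (1u << 11)   /* branch (program flow) prediction enable */
#define SCTLR_I (1u << 12)   /* instruction cache enable */

/* Pure helper: compute the new SCTLR value (same bits as the ORRs). */
static uint32_t sctlr_enable_bits(uint32_t sctlr)
{
    return sctlr | SCTLR_I | SCTLR_C | SCTLR_M | SCTLR_Z;
}

#if defined(__ARM_ARCH_7A__)
/* Hypothetical routine: read SCTLR (CP15 c1, c0, 0), set the bits,
 * write it back, then ISB so later instructions see the change. */
static void mmu_caches_bp_on(void)
{
    uint32_t v;
    __asm__ __volatile__("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    v = sctlr_enable_bits(v);
    __asm__ __volatile__("mcr p15, 0, %0, c1, c0, 0" : : "r"(v) : "memory");
    __asm__ __volatile__("isb" : : : "memory");
}
#endif
```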

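3. DMA and cache coherency

If the destination of a DMA transfer overlaps memory that is held in the cache, the DMA engine, which runs without CPU involvement, can change that memory after the data was cached. The CPU does not know this and still treats the cached copy as the contents of memory, so it reads stale data: this is the DMA/cache coherency problem. The opposite direction fails in the same way: when DMA reads a buffer the CPU has just written, the newest data may still be in dirty cache lines rather than in main memory.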
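A minimal sketch of the usual fix, assuming an ARMv7-A core, GCC inline assembly and a 64-byte D-cache line (an assumption: Cortex-A9, for example, uses 32-byte lines, so the real value should come from the Cache Type Register). Before a DMA engine reads a buffer the CPU wrote, the buffer is cleaned by address (DCCMVAC); before the CPU reads a buffer a DMA engine wrote, it is invalidated by address (DCIMVAC). The function names are hypothetical; on a non-ARM host the loops only count cache lines, so the address arithmetic can be tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE 64u   /* assumption: check CTR.DminLine on your core */

static uintptr_t line_down(uintptr_t a) { return a & ~(uintptr_t)(DCACHE_LINE - 1); }

/* Clean [start, start+len) to the point of coherency: dirty lines are
 * written back so a DMA engine reading main memory sees the CPU's data. */
static size_t dcache_clean_range(uintptr_t start, size_t len)
{
    size_t n = 0;
    for (uintptr_t a = line_down(start); a < start + len; a += DCACHE_LINE, n++) {
#if defined(__ARM_ARCH_7A__)
        __asm__ __volatile__("mcr p15, 0, %0, c7, c10, 1" : : "r"(a) : "memory"); /* DCCMVAC */
#endif
    }
#if defined(__ARM_ARCH_7A__)
    __asm__ __volatile__("dsb" : : : "memory");   /* wait for the write-backs */
#endif
    return n;                                     /* lines maintained */
}

/* Invalidate [start, start+len) so the CPU re-reads data written by DMA.
 * start and len should be line-aligned; otherwise unrelated data sharing
 * the first or last line is discarded as well. */
static size_t dcache_invalidate_range(uintptr_t start, size_t len)
{
    size_t n = 0;
    for (uintptr_t a = line_down(start); a < start + len; a += DCACHE_LINE, n++) {
#if defined(__ARM_ARCH_7A__)
        __asm__ __volatile__("mcr p15, 0, %0, c7, c6, 1" : : "r"(a) : "memory"); /* DCIMVAC */
#endif
    }
#if defined(__ARM_ARCH_7A__)
    __asm__ __volatile__("dsb" : : : "memory");
#endif
    return n;
}
```

Typical use: after the CPU fills a transmit buffer, call `dcache_clean_range` and then start the DMA; for a receive buffer, call `dcache_invalidate_range` before starting the DMA (so no dirty line is later evicted on top of the incoming data) and again after it completes, before the CPU reads the buffer.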

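In a bootloader, cache incoherency can also appear at the moments when the caches are enabled or disabled.

For example, invalidate the caches and the TLB before enabling the MMU. Enabling the MMU switches on virtual-to-physical translation: on cores with virtually addressed caches (ARMv5 and earlier), lines cached before the switch are tagged with physical addresses and lines cached afterwards with virtual ones. On ARMv7 cores the caches and TLB may also hold stale or undefined contents after reset, and none of it may be hit once translation and caching are on.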
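Before the MMU is on, the L1 D-cache is normally invalidated by set/way, since its contents are not tied to any known address range. The sketch below assumes ARMv7-A, GCC inline assembly and a cache geometry passed in by the caller (real code would read it from CCSIDR after selecting the level in CSSELR); the function names and the 32 KB, 4-way, 64-byte example in the tests are hypothetical. The ARM-only routine also invalidates the I-cache (ICIALLU), the branch predictor (BPIALL) and the unified TLB (TLBIALL), then issues DSB and ISB.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(log2(n)) for n >= 1 */
static unsigned log2_ceil(uint32_t n)
{
    unsigned r = 0;
    while ((1u << r) < n)
        r++;
    return r;
}

/* Operand for DCISW (ARMv7 set/way format): way in the top A bits,
 * set starting at bit L = log2(line bytes), level (0 = L1) in bits [3:1]. */
static uint32_t setway_operand(uint32_t way, uint32_t set, uint32_t level,
                               uint32_t ways, uint32_t line_bytes)
{
    unsigned a = log2_ceil(ways);
    unsigned l = log2_ceil(line_bytes);
    uint32_t way_field = (a == 0) ? 0 : way << (32 - a); /* avoid a shift by 32 */
    return way_field | (set << l) | (level << 1);
}

/* Invalidate (without cleaning) one data cache level by set/way.
 * Returns the number of DCISW operations issued. */
static uint32_t dcache_invalidate_level(uint32_t level, uint32_t sets,
                                        uint32_t ways, uint32_t line_bytes)
{
    uint32_t n = 0;
    for (uint32_t way = 0; way < ways; way++) {
        for (uint32_t set = 0; set < sets; set++, n++) {
            uint32_t op = setway_operand(way, set, level, ways, line_bytes);
#if defined(__ARM_ARCH_7A__)
            __asm__ __volatile__("mcr p15, 0, %0, c7, c6, 2" : : "r"(op) : "memory"); /* DCISW */
#else
            (void)op;   /* host build: only the operand arithmetic runs */
#endif
        }
    }
    return n;
}

#if defined(__ARM_ARCH_7A__)
/* Hypothetical routine; call it with the MMU and caches still off. */
static void invalidate_before_mmu_on(uint32_t sets, uint32_t ways, uint32_t line_bytes)
{
    uint32_t zero = 0;
    dcache_invalidate_level(0, sets, ways, line_bytes);
    __asm__ __volatile__("mcr p15, 0, %0, c7, c5, 0" : : "r"(zero)); /* ICIALLU */
    __asm__ __volatile__("mcr p15, 0, %0, c7, c5, 6" : : "r"(zero)); /* BPIALL  */
    __asm__ __volatile__("mcr p15, 0, %0, c8, c7, 0" : : "r"(zero)); /* TLBIALL */
    __asm__ __volatile__("dsb" : : : "memory");
    __asm__ __volatile__("isb" : : : "memory");
}
#endif
```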

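4. Points to watch for cache coherency

Cleaning the D-cache writes dirty data back to main memory; invalidating the I-cache makes the CPU fetch instructions from main memory again.

a) Before enabling the MMU, invalidate the I-cache and D-cache.

b) Before disabling the MMU, clean the D-cache and invalidate the I-cache.

c) When DMA works on cacheable memory: before DMA sends a buffer out (memory to device), clean the D-cache over that buffer; before DMA fills a buffer (device to memory), invalidate the D-cache over it, and on cores that prefetch speculatively, invalidate it again after the transfer completes.

d) When enabling the caches, check whether their contents are consistent with main memory.

e) Access I/O addresses and device registers through volatile variables.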
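A minimal sketch of item e), with hypothetical accessor names and a hypothetical status bit. Accessing a register through a `volatile` pointer stops the compiler from keeping the value in a CPU register, merging repeated accesses or dropping a seemingly useless read, so a polling loop really re-reads the hardware each time. `volatile` does not emit any barrier instruction, so ordering against DMA or cache maintenance still needs DSB/DMB (section 5).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit MMIO accessors */
static inline void reg_write32(volatile uint32_t *reg, uint32_t val) { *reg = val; }
static inline uint32_t reg_read32(const volatile uint32_t *reg) { return *reg; }

#define TX_READY (1u << 5)   /* hypothetical "transmitter ready" status bit */

/* Busy-wait until the device sets TX_READY. Without volatile the compiler
 * could hoist the load out of the loop and spin forever. */
static void wait_tx_ready(const volatile uint32_t *status)
{
    while ((reg_read32(status) & TX_READY) == 0)
        ;
}
```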

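5. Using memory barriers for coherency

Memory barrier instructions enforce the ordering and completion that the cache operations above depend on:

a) DSB

The Data Synchronization Barrier is a special kind of memory barrier. No instruction that follows it in program order executes until the DSB has completed. The DSB completes when:

          • all explicit memory accesses before it have completed;
          • all cache, branch predictor and TLB maintenance operations before it have completed.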
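One bootloader case where "later instructions wait until earlier memory accesses complete" matters is releasing a secondary CPU: its entry address must be in memory before the event that wakes it. The sketch assumes a hypothetical mailbox variable that the secondary core reads after waking from WFE, and GCC inline assembly on ARMv7-A; on other hosts the barrier falls back to a compiler barrier so the logic can be compiled and tested.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_ARCH_7A__)
#define dsb() __asm__ __volatile__("dsb" : : : "memory")
#define sev() __asm__ __volatile__("sev" : : : "memory")
#else
#define dsb() __asm__ __volatile__("" : : : "memory")   /* host: compiler barrier only */
#define sev() ((void)0)
#endif

/* Hypothetical mailbox polled by the secondary core after WFE. */
static volatile uint32_t secondary_entry;

static void release_secondary(uint32_t entry)
{
    secondary_entry = entry;   /* 1. publish the entry point            */
    dsb();                     /* 2. the store must complete before ... */
    sev();                     /* 3. ... the event that wakes the core  */
}
```

If the secondary core runs with its caches off, the mailbox line would also have to be cleaned to main memory before the DSB.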

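b) ISB

The Instruction Synchronization Barrier flushes the processor pipeline, so every instruction after the ISB is fetched from cache or memory only after the ISB has completed. This guarantees that instructions fetched after the ISB see the effects of context-changing operations executed before it, such as changing the ASID, completed TLB maintenance operations, branch predictor maintenance operations, and all changes to CP15 registers.

In addition, ISB ensures that any branches that appear in program order after it are always written into the branch prediction logic with the context that is visible after the ISB. This is required for the instruction stream to execute correctly.

Both instructions can be used directly in assembly; in C they can be defined as macros (GCC inline assembly, ARMv7 target):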

    #define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
    #define isb() __asm__ __volatile__ ("isb" : : : "memory")

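Example: points to watch in a bootloader

1) After turning off the MMU and caches, and before jumping to the kernel, insert an ISB;

2) At the end of invalidate_dcache, just before it returns, insert a DSB followed by an ISB;

3) After enabling the MMU in the bootloader, insert an ISB;

4) Before starting a DMA operation, add wmb() in Linux and dsb() in the bootloader.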
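A sketch of item 4), assuming a hypothetical DMA controller with source, length and start registers. It redefines the `dsb()` macro shown above, with a compiler-barrier fallback so it can be built and tested on a non-ARM host. The buffer is assumed to have been cleaned from the D-cache already (section 3); the DSB makes sure the buffer writes and that maintenance have completed before the device is told to start.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_ARCH_7A__)
#define dsb() __asm__ __volatile__("dsb" : : : "memory")
#else
#define dsb() __asm__ __volatile__("" : : : "memory")   /* host: compiler barrier */
#endif

/* Hypothetical DMA controller register block */
struct dma_regs {
    volatile uint32_t src;     /* source address */
    volatile uint32_t len;     /* transfer length in bytes */
    volatile uint32_t start;   /* write 1 to start */
};

/* Start a memory-to-device transfer of a buffer the caller has already
 * cleaned from the D-cache. */
static void dma_start_tx(struct dma_regs *dma, const uint8_t *buf, uint32_t len)
{
    dma->src = (uint32_t)(uintptr_t)buf;
    dma->len = len;
    dsb();                     /* buffer and cache clean complete first */
    dma->start = 1;            /* only then kick the engine */
}
```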
