Hands-on: flushing caches on arm64
Quick questions
- On arm64, which instructions can flush the entire data cache?
- Can user mode flush the cache?
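1. Preface
What we loosely call "flushing the cache" can mean two different operations:
- clean: write the new (dirty) contents of the cache back to DDR
- invalidate: mark the contents of the cache invalid, i.e. discard them
Although both are called "flushing", the actual operation may be either one of the two or a combination of them.
For example, flushing the instruction cache only needs an invalidate, because instructions are read-only.
For the data cache, a plain invalidate is enough when the data can definitely be discarded;
the usual data cache flush cleans first and then invalidates;
the data cache can of course also be cleaned without being invalidated.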
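The difference between the two operations can be illustrated with a toy model. This is only a sketch: `struct toy_line` and the `toy_*` helpers are hypothetical names that model a single cache line in software and have nothing to do with real hardware or kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one cache line shadowing one word of "DDR". */
struct toy_line {
	bool valid;     /* the line holds a copy of *mem */
	bool dirty;     /* the copy is newer than *mem */
	uint32_t data;  /* cached copy */
	uint32_t *mem;  /* backing memory word */
};

/* A store hits the cache only and leaves memory stale. */
static void toy_write(struct toy_line *l, uint32_t v)
{
	assert(l && l->mem);
	l->data = v;
	l->valid = true;
	l->dirty = true;
}

/* A load refills the line from memory when it is not valid. */
static uint32_t toy_read(struct toy_line *l)
{
	if (!l->valid) {
		l->data = *l->mem;
		l->valid = true;
		l->dirty = false;
	}
	return l->data;
}

/* clean: write a dirty line back, keep it in the cache. */
static void toy_clean(struct toy_line *l)
{
	if (l->valid && l->dirty) {
		*l->mem = l->data;
		l->dirty = false;
	}
}

/* invalidate: drop the line, dirty data included. */
static void toy_invalidate(struct toy_line *l)
{
	l->valid = false;
	l->dirty = false;
}

/* clean + invalidate: the usual "flush" for data caches. */
static void toy_clean_invalidate(struct toy_line *l)
{
	toy_clean(l);
	toy_invalidate(l);
}
```

In this model, `toy_write()` followed by `toy_invalidate()` loses the written value, while `toy_write()` followed by `toy_clean_invalidate()` leaves it in memory, which is why a plain invalidate is only safe when the data really can be thrown away.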
2. Cache structure
level | cpu0 | - | cpu1 | - | cpu2 | - | cpu3 | - |
---|---|---|---|---|---|---|---|---|
level1 | I | D | I | D | I | D | I | D |
level2 | cluster0-U | - | - | - | cluster1-U | - | - | - |
DDR cache (L3) | all-share-U | - | - | - | - | - | - | - |
DDR | - | - | - | - | - | - | - | - |
- PoU (Point of Unification): operations reach L2
- PoC (Point of Coherency): operations reach L3
3. Which instructions flush the cache
3.1 Instructions for flushing the instruction cache
Instruction | Description | Action | Executable in user mode |
---|---|---|---|
IC IALLUIS | Invalidate all to Point of Unification, Inner Shareable | I | No |
IC IALLU | Invalidate all to Point of Unification | I | No |
IC IVAU, Xt | Invalidate by virtual address to Point of Unification | I | Yes |
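Since only the by-address form is available in user mode, user-space code that writes instructions into memory normally goes through a compiler helper rather than issuing `IC IVAU` by hand. The sketch below assumes GCC or Clang, whose `__builtin___clear_cache(begin, end)` builtin is expected to perform the required by-VA maintenance on arm64 and is a no-op on architectures with coherent instruction caches. `publish_code` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy freshly generated instructions into dst and make them visible to
 * instruction fetch. Assumes a GCC/Clang toolchain for the builtin.
 */
static void *publish_code(void *dst, const void *src, size_t len)
{
	assert(dst && src);
	memcpy(dst, src, len);
	/* Synchronize the I-cache with the bytes just written to [dst, dst + len). */
	__builtin___clear_cache((char *)dst, (char *)dst + len);
	return dst;
}
```

A JIT would call `publish_code()` on an executable mapping before jumping to it; the copy itself is ordinary data traffic, and only the builtin deals with the instruction cache.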
3.2 Instructions for flushing the data cache
Instruction | Description | Action | Executable in user mode |
---|---|---|---|
DC IVAC, Xt | Invalidate by virtual address to Point of Coherency | I | No |
DC ISW, Xt | Invalidate by set/way | I | No |
DC CVAC, Xt | Clean by virtual address to Point of Coherency | C | Yes |
DC CSW, Xt | Clean by set/way | C | No |
DC CVAU, Xt | Clean by virtual address to Point of Unification | C | Yes |
DC CIVAC, Xt | Clean and invalidate by virtual address to Point of Coherency | C&I | Yes |
DC CISW, Xt | Clean and invalidate by set/way | C&I | No |
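The "Yes" rows mean a user program can clean and invalidate its own buffers by address. A minimal sketch follows, assuming the kernel has enabled EL0 cache maintenance (SCTLR_EL1.UCI) and EL0 reads of CTR_EL0. `dcache_line_bytes` and `flush_dcache_range_user` are hypothetical helper names, and the non-arm64 branch uses a made-up CTR_EL0 value only so that the sketch builds on other hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CTR_EL0.DminLine (bits [19:16]) is log2 of the smallest D-cache line, in 4-byte words. */
static size_t dcache_line_bytes(uint64_t ctr_el0)
{
	return (size_t)4 << ((ctr_el0 >> 16) & 0xf);
}

/* Clean and invalidate [addr, addr + len) to the PoC; returns the number of lines visited. */
static size_t flush_dcache_range_user(void *addr, size_t len)
{
	uint64_t ctr;
	size_t line, n = 0;
	uintptr_t p, end;

	assert(addr);
	if (len == 0)
		return 0;
#if defined(__aarch64__)
	__asm__ __volatile__("mrs %0, ctr_el0" : "=r" (ctr));
#else
	ctr = 0x84448004; /* hypothetical value: 64-byte lines */
#endif
	line = dcache_line_bytes(ctr);
	p = (uintptr_t)addr & ~(uintptr_t)(line - 1); /* round down to a line boundary */
	end = (uintptr_t)addr + len;
	for (; p < end; p += line, n++) {
#if defined(__aarch64__)
		__asm__ __volatile__("dc civac, %0" : : "r" (p) : "memory");
#endif
	}
#if defined(__aarch64__)
	__asm__ __volatile__("dsb sy" : : : "memory"); /* wait for completion */
#endif
	return n;
}
```

Note that this only covers the addresses passed in; nothing in the user-mode rows offers a whole-cache operation.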
4. Which interfaces Linux provides for flushing the cache
4.1 Interfaces for flushing the instruction cache
- __flush_icache_range(regs->pc, regs->pc + AARCH64_INSN_SIZE); // ic ivau
- __flush_icache_all(); // ic ialluis
4.2 Interfaces for flushing the data cache
extern void __flush_dcache_area(void *addr, size_t len); // dc civac
extern void __inval_dcache_area(void *addr, size_t len); // dc ivac
extern void __clean_dcache_area_poc(void *addr, size_t len); // dc cvac
extern void __clean_dcache_area_pop(void *addr, size_t len); // dc cvap, when ARM64_HAS_DCPOP is available
extern void __clean_dcache_area_pou(void *addr, size_t len); // dc cvau
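5. How to flush the cache globally
As the interfaces above show, the instruction cache has instructions that operate on the whole cache, but the data cache does not.
The simplest way to flush the whole data cache is to walk every set/way with DC CISW, Xt.
The number of ways at each cache level is stated directly in the chip documentation,
and the number of sets can then be computed from the total cache size.
For example, the A72 L1-I:
48KB 3-way set-associative instruction cache. Fixed line length of 64 bytes.
so sets = 48 * 1024 / 64 / 3 == 256
For example, the L1-D: 32KB 2-way set-associative data cache. Fixed line length of 64 bytes.
so sets = 32 * 1024 / 64 / 2 == 256
For example, a 1MB L2: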
• Configurable L2 cache size of 512KB, 1MB, 2MB and 4MB.
• Fixed line length of 64 bytes.
• Physically indexed and tagged cache.
• 16-way set-associative cache structure.
• Banked pipeline structures.
• Inclusion property with L1 data caches.
so sets = 1 * 1024 * 1024 / 64 / 16 == 1024
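The three calculations above follow one formula, sets = size / line length / ways. A small sketch to double-check the numbers, where `cache_num_sets` is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Number of sets for a set-associative cache of the given geometry. */
static size_t cache_num_sets(size_t size_bytes, size_t line_bytes, size_t ways)
{
	assert(line_bytes != 0 && ways != 0);
	/* The size must be a whole number of (line * ways) groups. */
	assert(size_bytes % (line_bytes * ways) == 0);
	return size_bytes / line_bytes / ways;
}
```

For instance, `cache_num_sets(32 * 1024, 64, 2)` gives the 256 sets of the L1-D, and the largest set index used in a set/way loop is that count minus one.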
+/*
+ * Clean and invalidate one cache level by set/way.
+ * way_max and set_max are the highest indices (count - 1);
+ * level is the cache level minus 1 (0 for L1, 1 for L2).
+ */
+static void flush_cache_internal(unsigned int way_shift, unsigned int way_max,
+		unsigned int set_shift, unsigned int set_max, unsigned int level)
+{
+#define FLEVEL_SHIFT 1
+	unsigned long set, way;
+	unsigned long setway;
+
+	for (way = 0; way <= way_max; way++) {
+		for (set = 0; set <= set_max; set++) {
+			setway = (way << way_shift) | (set << set_shift) |
+				 ((unsigned long)level << FLEVEL_SHIFT);
+			//pr_info("setway=%lx, way=%lu, set=%lu, l=%u\n", setway, way, set, level);
+			asm volatile("dc cisw, %0" : : "r" (setway) : "memory");
+		}
+	}
+	/* Wait for the maintenance operations to complete. */
+	dsb(sy);
+}
+
+static void flush_l1dl2d_all(void)
+{
+/* L1-D: 2 ways, 256 sets, 64-byte lines */
+#define FL1D_LEVEL_V 0
+#define FL1D_WAY_SHIFT 31
+#define FL1D_WAY_MAX 1
+#define FL1D_SET_SHIFT 6
+#define FL1D_SET_MAX 255
+
+/* L2 (1MB): 16 ways, 1024 sets, 64-byte lines */
+#define FL2D_LEVEL_V 1
+#define FL2D_WAY_SHIFT 28
+#define FL2D_WAY_MAX 15
+#define FL2D_SET_SHIFT 6
+#define FL2D_SET_MAX 1023
+	unsigned long long start, end;	/* sched_clock() returns nanoseconds */
+
+	start = sched_clock();
+	flush_cache_internal(FL1D_WAY_SHIFT, FL1D_WAY_MAX, FL1D_SET_SHIFT, FL1D_SET_MAX, FL1D_LEVEL_V);
+	flush_cache_internal(FL2D_WAY_SHIFT, FL2D_WAY_MAX, FL2D_SET_SHIFT, FL2D_SET_MAX, FL2D_LEVEL_V);
+	end = sched_clock();
+	pr_info("cache flush start=%llu, end=%llu, delta=%llu\n", start, end, end - start);
+}
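The shift constants in the patch come from the set/way operand layout: the way index sits in the top bits ending at bit 31, the set index starts at bit log2(line length), and the level field occupies bits [3:1]. A minimal sketch of that derivation, assuming a 32-bit way/set layout as used above; the `ceil_log2` and `setway_*` helpers are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest n such that (1 << n) >= v. */
static unsigned int ceil_log2(unsigned int v)
{
	unsigned int n = 0;

	assert(v != 0 && v <= (1u << 31));
	while ((1u << n) < v)
		n++;
	return n;
}

/* The way index occupies bits [31 : 32 - log2(ways)]. */
static unsigned int setway_way_shift(unsigned int ways)
{
	return 32 - ceil_log2(ways);
}

/* The set index starts at bit log2(line length in bytes). */
static unsigned int setway_set_shift(unsigned int line_bytes)
{
	return ceil_log2(line_bytes);
}

/* Build the Xt operand for DC ISW/CSW/CISW; level is the cache level minus 1. */
static uint64_t setway_operand(unsigned int way, unsigned int set,
			       unsigned int level, unsigned int ways,
			       unsigned int line_bytes)
{
	return ((uint64_t)way << setway_way_shift(ways)) |
	       ((uint64_t)set << setway_set_shift(line_bytes)) |
	       ((uint64_t)level << 1);
}
```

With 2 ways and 64-byte lines this reproduces FL1D_WAY_SHIFT = 31 and FL1D_SET_SHIFT = 6, and with 16 ways it reproduces FL2D_WAY_SHIFT = 28, so the hard-coded macros could be replaced by values computed from the cache geometry.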