LLVM Simple Register Coalescing

Qing Wang at Intel

Last modified: 2023-02-12 18:28:46

First published: 2023-02-12 18:13:25

Article link: https://blog.csdn.net/weixin_42600768/article/details/128997218

Simple Introduction

This pass is similar to copy propagation. In short, it merges the live intervals of the source and destination of a COPY whenever possible. It also works together with PHI elimination: it deletes the redundant COPY instructions that PHI elimination leaves behind.

Example

The following is the MI sequence before the pass.

# *** IR Dump Before Simple Register Coalescing (simple-register-coalescing) ***:
# Machine code for function mult: NoPHIs, TracksLiveness, TiedOpsRewritten
Function Live Ins: $edi in %3

0B	bb.0.entry:
	  successors: %bb.1(0x50000000), %bb.2(0x30000000); %bb.1(62.50%), %bb.2(37.50%)
	  liveins: $edi
16B	[[13]]  %3:gr32 = COPY $edi
32B	[[2]]  %4:gr32 = MOV32r0 implicit-def dead $eflags
48B	[[3]]  %5:gr8 = COPY %4.sub_8bit:gr32
64B	[[4]]  TEST8rr %5:gr8, %5:gr8, implicit-def $eflags
80B	[[5]]  JCC_1 %bb.2, 5, implicit killed $eflags
96B	[[6]]  JMP_1 %bb.1

112B	bb.1.if.then:
	; predecessors: %bb.0
	  successors: %bb.3(0x80000000); %bb.3(100.00%)

128B	[[8]]  %0:gr32 = MOV32ri 2
144B	[[15]]  %7:gr32 = COPY %0:gr32
160B	[[9]]  JMP_1 %bb.3

176B	bb.2.if.else:
	; predecessors: %bb.0
	  successors: %bb.3(0x80000000); %bb.3(100.00%)

192B	[[17]]  %1:gr32 = COPY %3:gr32
208B	[[7]]  %1:gr32 = nsw ADD32ri8 %1:gr32(tied-def 0), 2, implicit-def dead $eflags
224B	[[16]]  %7:gr32 = COPY %1:gr32

240B	bb.3.if.end:
	; predecessors: %bb.2, %bb.1

256B	[[14]]  %2:gr32 = COPY %7:gr32
272B	[[18]]  %6:gr32 = COPY %2:gr32
288B	[[10]]  %6:gr32 = nsw INC32r %6:gr32(tied-def 0), implicit-def dead $eflags
304B	[[11]]  $eax = COPY %6:gr32
320B	[[12]]  RET 0, killed $eax

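After the pass, the sequence becomes the following. (The dump is taken before the next pass, Rename Disconnected Subregister Components, so it shows the output of the coalescer.)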

# *** IR Dump Before Rename Disconnected Subregister Components (rename-independent-subregs) ***:
# Machine code for function mult: NoPHIs, TracksLiveness, TiedOpsRewritten
Function Live Ins: $edi in %3

0B	bb.0.entry:
	  successors: %bb.1(0x50000000), %bb.2(0x30000000); %bb.1(62.50%), %bb.2(37.50%)
	  liveins: $edi
16B	[[13]]  %7:gr32 = COPY $edi
32B	[[2]]  %4:gr32 = MOV32r0 implicit-def dead $eflags
64B	[[4]]  TEST8rr %4.sub_8bit:gr32, %4.sub_8bit:gr32, implicit-def $eflags
80B	[[5]]  JCC_1 %bb.2, 5, implicit killed $eflags
96B	[[6]]  JMP_1 %bb.1

112B	bb.1.if.then:
	; predecessors: %bb.0
	  successors: %bb.3(0x80000000); %bb.3(100.00%)

128B	[[8]]  %7:gr32 = MOV32ri 2
160B	[[9]]  JMP_1 %bb.3

176B	bb.2.if.else:
	; predecessors: %bb.0
	  successors: %bb.3(0x80000000); %bb.3(100.00%)

208B	[[7]]  %7:gr32 = nsw ADD32ri8 %7:gr32(tied-def 0), 2, implicit-def dead $eflags

240B	bb.3.if.end:
	; predecessors: %bb.2, %bb.1

288B	[[10]]  %7:gr32 = nsw INC32r %7:gr32(tied-def 0), implicit-def dead $eflags
304B	[[11]]  $eax = COPY %7:gr32
320B	[[12]]  RET 0, killed $eax
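The effect on the gr32 registers above can be reproduced with a small toy model. This is only a sketch, not how LLVM implements the pass: it assumes every virtual-to-virtual COPY may be coalesced (the real pass first has to prove that the live intervals do not interfere), it ignores subregister copies such as [[3]], and it keeps the higher-numbered register of each pair purely so that the result resembles the dump. The function name `coalesce_copies` and the tuple encoding of the instructions are made up for this illustration.

```python
def coalesce_copies(instrs):
    """Toy coalescer. instrs is a list of (dest, opcode, sources) tuples,
    with registers written as strings like '%7' (virtual) or '$edi' (physical)."""
    parent = {}

    def find(reg):
        # Union-find lookup with path halving.
        parent.setdefault(reg, reg)
        while parent[reg] != reg:
            parent[reg] = parent[parent[reg]]
            reg = parent[reg]
        return reg

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Assumption for illustration only: the higher-numbered register survives.
        keep, drop = (ra, rb) if int(ra[1:]) > int(rb[1:]) else (rb, ra)
        parent[drop] = keep

    def is_virtual(reg):
        return reg is not None and reg.startswith("%")

    # Step 1: put both sides of every virtual-to-virtual COPY in one class.
    for dest, opcode, sources in instrs:
        if opcode == "COPY" and is_virtual(dest) and is_virtual(sources[0]):
            union(dest, sources[0])

    def rename(reg):
        return find(reg) if is_virtual(reg) else reg

    # Step 2: rewrite all operands and drop copies that became 'x = COPY x'.
    result = []
    for dest, opcode, sources in instrs:
        new_dest = rename(dest)
        new_sources = [rename(s) for s in sources]
        if opcode == "COPY" and new_sources == [new_dest]:
            continue
        result.append((new_dest, opcode, new_sources))
    return result


# The gr32 instructions of the "before" dump, in program order.
before = [
    ("%3", "COPY", ["$edi"]),         # [[13]]
    ("%0", "MOV32ri", ["2"]),         # [[8]]
    ("%7", "COPY", ["%0"]),           # [[15]]
    ("%1", "COPY", ["%3"]),           # [[17]]
    ("%1", "ADD32ri8", ["%1", "2"]),  # [[7]]
    ("%7", "COPY", ["%1"]),           # [[16]]
    ("%2", "COPY", ["%7"]),           # [[14]]
    ("%6", "COPY", ["%2"]),           # [[18]]
    ("%6", "INC32r", ["%6"]),         # [[10]]
    ("$eax", "COPY", ["%6"]),         # [[11]]
]
after = coalesce_copies(before)
for instr in after:
    print(instr)
```

Only five instructions remain, all of them on %7, which matches [[13]], [[8]], [[7]], [[10]] and [[11]] in the dump after the pass.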

The following shows the merging process. Each pair of Self/Other intervals corresponds to the COPY marked [[n]] in the dump above.

Self interval:
32r::48r  %4  [[3]]  // If %4 were merged into %5, then [[2]] would have to be split so that it produces two destinations, %4 and %5. So although the destination of the COPY is %5, the pass merges %5 into %4.
Other interval:
48r::64r  %5
join-end
join-begin
Self interval:
144r::176B  %7  [[15]]
224r::240B
240B::256r
Other interval:
128r::144r  %0
join-end
join-begin
Self interval:
192r::208r  %1  [[17]]  
208r::224r
Other interval:
16r::112B  %3
176B::192r
join-end
join-begin
Self interval:
128r::176B  {%0, %7}  [[16]]
224r::240B
240B::256r
Other interval:
16r::112B  {%1, %3}
176B::208r
208r::224r
join-end
join-begin
Self interval:
16r::112B  {%0, %1, %3, %7}  [[14]]  // Although the destination is %2, after several rounds of joinCopy the source's live interval is larger than the destination's, so %2 is merged into the source and deleted.
128r::176B
176B::208r
208r::240B
240B::256r
Other interval:
256r::272r  {%2}
join-end
join-begin
Self interval:
16r::112B  {%0, %1, %2, %3, %7}  [[18]]
128r::176B
176B::208r
208r::240B
240B::272r
Other interval:
272r::288r  {%6}
288r::304r
join-end
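The interval arithmetic behind one join can be sketched as follows. This is a simplified model under stated assumptions, not LLVM's LiveInterval code: a slot index such as 144r is treated as an instruction number plus a slot letter ordered B < e < r < d, each segment is a half-open range [start, end), and value numbers are ignored. Because of the last point the sketch fuses every pair of touching segments, whereas the log above keeps 224r::240B and 240B::256r apart. All function names are made up for this illustration.

```python
# Assumed ordering of the slots inside one instruction index.
SLOT_ORDER = {"B": 0, "e": 1, "r": 2, "d": 3}


def slot(text):
    """Parse a slot index such as '144r' into a sortable pair (144, 2)."""
    return (int(text[:-1]), SLOT_ORDER[text[-1]])


def segments(*pairs):
    """Build a segment list from ('start', 'end') string pairs."""
    return [(slot(a), slot(b)) for a, b in pairs]


def overlaps(a, b):
    """True if any half-open segment [start, end) of a intersects one of b."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)


def join(a, b):
    """Union two segment lists, fusing segments that touch or overlap."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged


# The join for COPY [[15]]: %7 = COPY %0.
r7 = segments(("144r", "176B"), ("224r", "240B"), ("240B", "256r"))
r0 = segments(("128r", "144r"))

# %0 ends at 144r, exactly where %7 begins, so the intervals do not overlap.
can_join = not overlaps(r7, r0)
joined = join(r7, r0)
```

Here `joined` starts with the segment 128r::176B, the same first segment that the log shows for {%0, %7} at [[16]]. In contrast, `overlaps(segments(("16r", "112B")), segments(("32r", "48r")))` is true, which is the kind of conflict that makes a join fail.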

When the pass meets a COPY whose source and destination are different registers, it tries to merge the live intervals of the two registers. If either interval is too large, the pass gives up on this copy. If the two intervals conflict, the pass will most likely give up as well, but not always. If the conflict is caused by a coalescable copy, the intervals can still be merged. If the conflict is caused by a non-coalescable copy, the pass walks back along the copy chains of dst and src; if both chains end at the same value, the intervals can be merged. The following example illustrates this last condition.

x4 = y1 add y2
x3 copy x4
x1 copy x2
src copy x3
dest copy x1  (at this point dest holds the value of x4, assuming x2 is itself a copy of x4, and src holds the value of x4 too, so this MI can be erased and analyzeValue returns CR_Erase)
y3 = src and src
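The copy-chain check in this example can be sketched in a few lines. This is a hedged illustration, not the code of JoinVals::analyzeValue: it assumes each register has a single definition, it only follows full copies, and the names `copy_chain_root` and `same_value` are made up here. The example never defines x2, so the sketch additionally assumes that x2 is a copy of x4; without that assumption the two chains do not meet.

```python
def copy_chain_root(reg, defs):
    """Walk back through copies and return the register where the chain starts.
    defs maps a register name to its defining instruction, e.g. ('copy', 'x4')."""
    seen = set()
    while reg in defs and defs[reg][0] == "copy" and reg not in seen:
        seen.add(reg)  # guard against cyclic input
        reg = defs[reg][1]
    return reg


def same_value(a, b, defs):
    """True if a and b are reached from the same value through copies only."""
    return copy_chain_root(a, defs) == copy_chain_root(b, defs)


defs = {
    "x4": ("add", "y1", "y2"),
    "x3": ("copy", "x4"),
    "x2": ("copy", "x4"),  # assumption: not stated in the example
    "x1": ("copy", "x2"),
    "src": ("copy", "x3"),
}

# 'dest copy x1' redefines dest while src is live. Both x1 and src lead back
# to x4, so the copy is redundant and can be erased.
erasable = same_value("x1", "src", defs)

# Without the assumed definition of x2, the chain of x1 stops at x2.
defs_without_x2 = {reg: d for reg, d in defs.items() if reg != "x2"}
erasable_without_x2 = same_value("x1", "src", defs_without_x2)
```

With the assumption `erasable` is true; without it `erasable_without_x2` is false, and the conflict would have to be resolved some other way.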

There are, of course, other conditions to consider. They are described in the body of JoinVals::analyzeValue.