Constructing SSA

电影旅行敲代码

Last modified 2023-12-13 09:46:05


First published 2019-12-01 14:03:22

Article link: https://blog.csdn.net/dashuniuniu/article/details/103275708


I am an amateur, so treat what this blog says with caution.
For authoritative material, read 《Static Single Assignment Book》, 《Data Flow Analysis Theory and Practice》, and 《Engineering a Compiler》.

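In the article 《Advantages of SSA》 I summarized the advantages of SSA with close to no hands-on experience. The article 《Constructing the Dominator Tree and Dominance Frontier》 described how to build the dominator tree and the dominance frontier, in preparation for placing $\phi$-instructions. This article describes how to construct SSA form.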

Constructing SSA takes roughly two steps:

placement of $\phi$ -functions
renaming

To make the constructed SSA form optimal, we may additionally need to maintain the minimality property.

the minimality property states the minimality of the number of inserted $\phi$ -functions.

The whole point of building the DomTree and the DomFrontier is to maintain this minimality property.

minimal SSA form is obtained by placing $\phi$-functions of variable $v$ using the formalism of dominance frontier.

Constructing the DomTree

CFG
Here we build the dominator tree with the method presented in 《Engineering a Compiler》; for the algorithm, see 《Constructing the Dominator Tree and Dominance Frontier》.
IDOM
Note: draw.io distorts formulas somewhat when they are inserted.
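As a rough illustration of the dominator computation, here is a minimal Python sketch of the set-based iterative formulation $Dom(n) = \{n\} \cup \bigcap_{p \in preds(n)} Dom(p)$. The CFG, the node names, and the function names `compute_dominators` and `immediate_dominators` are assumptions made up for this example; they are not the CFG from the figure, and the sketch assumes every node is reachable from the entry.

```python
def compute_dominators(succ, entry):
    """Iterate Dom(n) = {n} | intersection of Dom(p) over predecessors p."""
    nodes = list(succ)
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            preds[s].append(n)

    # Optimistic start: every node is dominated by everything, except entry.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            pred_doms = [dom[p] for p in preds[n]]
            new = set.intersection(*pred_doms) | {n} if pred_doms else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(dom, entry):
    """The strict dominators of n form a chain; the closest one has the largest Dom set."""
    idom = {entry: None}
    for n, ds in dom.items():
        if n != entry:
            idom[n] = max(ds - {n}, key=lambda d: len(dom[d]))
    return idom


# Hypothetical CFG: a diamond (a -> b, c -> d) inside a loop (d -> a).
succ = {
    "entry": ["a"],
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["a", "exit"],
    "exit": [],
}
dom = compute_dominators(succ, "entry")
idom = immediate_dominators(dom, "entry")
```

The `idom` map is exactly the parent relation of the dominator tree, which is all the later steps need.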

Next we continue with the algorithm from 《Engineering a Compiler》 to compute the dominance frontier.

Computing the dominance frontier
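The dominance frontier computation can be sketched as follows: for every join node, walk up the dominator tree from each predecessor until reaching the join node's immediate dominator, adding the join node to the frontier of every node visited on the way. The function name `dominance_frontiers` and the small CFG (given here by its `preds` and `idom` maps) are assumptions for illustration, not the example from the figure.

```python
def dominance_frontiers(preds, idom):
    """Compute DF for each node from predecessor lists and immediate dominators."""
    df = {n: set() for n in preds}
    for b, ps in preds.items():
        if len(ps) < 2:          # only join nodes can appear in a frontier
            continue
        for p in ps:
            runner = p
            # Every node from p up to (but excluding) idom[b] has b in its DF.
            while runner != idom[b]:
                df[runner].add(b)
                runner = idom[runner]
    return df


# Hypothetical CFG: entry -> a, a -> b, a -> c, b -> d, c -> d, d -> a, d -> exit.
preds = {
    "entry": [],
    "a": ["entry", "d"],
    "b": ["a"],
    "c": ["a"],
    "d": ["b", "c"],
    "exit": ["d"],
}
idom = {"entry": None, "a": "entry", "b": "a", "c": "a", "d": "a", "exit": "d"}
df = dominance_frontiers(preds, idom)
```

Note that the loop header `a` ends up in its own frontier, because the back edge from `d` re-enters it.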

Place $\phi$ -functions

Next we insert the $\phi$-functions with the algorithm from 《Data Flow Analysis Theory and Practice》 (the algorithm is in fact quite general, and 《Engineering a Compiler》 has a more detailed example).

The naive algorithm places a $\phi$-function for every variable at the start of every join node.

With dominance frontiers, the compiler can determine more precisely where $\phi$-functions might be needed. The basic idea is simple. A definition of $x$ in block $b$ forces a $\phi$-function at every node in $DF(b)$. Since that $\phi$-function is a new definition of $x$, it may, in turn, force the insertion of additional $\phi$-functions.

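If the def and all uses of a variable sit in the same block, no $\phi$-function needs to be placed for it. We take the variable $x$ from the example above and place $\phi$-functions for it.
phi-function algo
Note: the figure above comes from 《Data Flow Analysis Theory and Practice》.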

$inWorklist$: if $inWorklist_n$ is $x$, node $n$ has already been added to $worklist$ because it defines $x$.
$inserted$: if $inserted_n$ is $x$, a $\phi$-instruction for $x$ has already been inserted in node $n$.
$assign$: $assign_x$ is the set of nodes that define $x$.
$df$ is short for dominance frontier.

phi-function
A brief walk through the computation above: in the figure, $worklist$ initially holds every basic block that defines $x$, namely $Entry$, $n_1$, $n_4$, and $n_9$.

1. Remove $Entry$ from $worklist$; the $df$ of $Entry$ is empty, so there is nothing to traverse.
2. Remove $n_1$ from $worklist$ and traverse its $df$, $\{n_1, n_7\}$.
3. Since $inserted_{n_1} \ne x$, insert a $\phi$-instruction in node $n_1$. Set $inserted_{n_1} = x$ and move on to the next $df$ element ($n_1$ is not re-added, because $inWorklist_{n_1}$ is already $x$).
4. Since $inserted_{n_7} \ne x$, insert a $\phi$-instruction in node $n_7$. Set $inserted_{n_7} = x$ and $inWorklist_{n_7} = x$, and add $n_7$ to $worklist$.
5. Remove $n_4$ from $worklist$ and traverse its $df$, $\{n_6\}$.
6. Since $inserted_{n_6} \ne x$, insert a $\phi$-instruction in node $n_6$. Set $inserted_{n_6} = x$ and $inWorklist_{n_6} = x$, and add $n_6$ to $worklist$.
7. Remove $n_9$ from $worklist$ and traverse its $df$, $\{n_{10}\}$.
8. Since $inserted_{n_{10}} \ne x$, insert a $\phi$-instruction in node $n_{10}$. Set $inserted_{n_{10}} = x$ and $inWorklist_{n_{10}} = x$, and add $n_{10}$ to $worklist$.
9. Remove $n_7$ from $worklist$ and traverse its $df$, $\{n_7\}$. Since $inserted_{n_7} = x$, skip it.
10. Remove $n_6$ from $worklist$ and traverse its $df$, $\{n_2, n_7\}$.
11. Since $inserted_{n_2} \ne x$, insert a $\phi$-instruction in node $n_2$. Set $inserted_{n_2} = x$ and $inWorklist_{n_2} = x$, and add $n_2$ to $worklist$. Since $inserted_{n_7} = x$, skip $n_7$.
12. Remove $n_{10}$ from $worklist$ and traverse its $df$, $\{n_7\}$. Since $inserted_{n_7} = x$, skip it.
13. Remove $n_2$ from $worklist$ and traverse its $df$, $\{n_2, n_7\}$. Since $inserted_{n_2} = x$ and $inserted_{n_7} = x$, skip both.
14. $worklist$ is empty, so the algorithm terminates.

The nodes that end up with a $\phi$-function are $n_1$ (step 3), $n_2$ (step 11), $n_6$ (step 6), $n_7$ (step 4), and $n_{10}$ (step 8). The join node $n_8$ does not need a $\phi$-function.
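The worklist procedure walked through above can be sketched in Python as follows. The dominance frontiers and the defining blocks are the ones quoted in the walkthrough; nodes whose $df$ the text does not mention are simply left out. The function name `place_phis` and the dictionary encoding are assumptions of this sketch, and a FIFO worklist is used so that the processing order matches the walkthrough.

```python
from collections import deque


def place_phis(var, assign, df):
    """Return the blocks that need a phi for `var`, in insertion order."""
    in_worklist = {n: None for n in df}
    inserted = {n: None for n in df}
    worklist = deque()
    for n in assign:                 # every block defining `var` starts on the worklist
        in_worklist[n] = var
        worklist.append(n)

    phi_blocks = []
    while worklist:
        n = worklist.popleft()
        for m in df[n]:
            if inserted[m] != var:
                phi_blocks.append(m)         # insert a phi for `var` in m
                inserted[m] = var
                if in_worklist[m] != var:    # the phi is a new def of `var`
                    in_worklist[m] = var
                    worklist.append(m)
    return phi_blocks


# Dominance frontiers as listed in the walkthrough above.
df = {
    "Entry": [],
    "n1": ["n1", "n7"],
    "n2": ["n2", "n7"],
    "n4": ["n6"],
    "n6": ["n2", "n7"],
    "n7": ["n7"],
    "n9": ["n10"],
    "n10": ["n7"],
}
phi_blocks = place_phis("x", ["Entry", "n1", "n4", "n9"], df)
```

Storing the variable name in `inserted` and `in_worklist`, rather than a boolean, mirrors the flags in the algorithm figure: the same two arrays can be reused for the next variable without being reset.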

Renaming of Variables

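The core of renaming is to link each def to the $\phi$-instructions and uses it feeds; the "name" is only the surface.

Continuing with the example above, every $def$ has the $use$s it reaches. In the figure below each color stands for one $def$, and a single continuous data flow may contain many $def$s. The $def$ in $Entry$ "dominates" $n_1$ and $n_2$, while $n_5$ is dominated by $n_1$. This can be viewed as a backtracking traversal over a tree, only the tree here happens to be a binary tree. Backtracking calls for a stack, and to name the different versions of $x$ we need something like a counter to supply the subscripts.
(figure: the uses reached by each def, one color per def)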

Here we again use the algorithm from 《Engineering a Compiler》 to explain the whole process.
rename
dominance frontier 1

The description in 《Engineering a Compiler》 is listed here:

renames both definitions and uses in a preorder walk over the procedure’s dominator tree
In each block, it first renames the values defined by $\phi$-functions at the head of the block
then it visits each operation in the block, in order
It rewrites the operands with current SSA names, then it creates a new SSA name for the result of the operation
After all the operations in the block have been rewritten, the algorithm rewrites the appropriate $\phi$-function parameters in each CFG successor of the block, using the current SSA names.
Finally, it recurs on any children of the block in the dominator tree.
When it returns from those recursive calls, it restores the set of current SSA names to the state that existed before the current block was visited.

The process is actually fairly simple.
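The renaming steps listed above can be sketched with one stack and one counter per variable. Everything about the IR encoding here is an assumption made for illustration: blocks are dictionaries with a `phis` list and an `ops` list, each op has a `dest` and a list of variable `uses`, and SSA names are written `x_0`, `x_1`, and so on. The sketch also assumes every use is preceded by a def on every path.

```python
from collections import defaultdict


def rename(blocks, succ, dom_children, entry):
    counter = defaultdict(int)      # next subscript for each variable
    stack = defaultdict(list)       # current SSA name for each variable

    def new_name(var):
        name = f"{var}_{counter[var]}"
        counter[var] += 1
        stack[var].append(name)
        return name

    def walk(b):
        pushed = []                              # variables defined in this block
        for phi in blocks[b]["phis"]:            # 1. phi results first
            phi["dest"] = new_name(phi["var"])
            pushed.append(phi["var"])
        for op in blocks[b]["ops"]:              # 2. operations, in order
            op["uses"] = [stack[u][-1] for u in op["uses"]]
            var = op["dest"]
            op["dest"] = new_name(var)
            pushed.append(var)
        for s in succ[b]:                        # 3. phi parameters in CFG successors
            for phi in blocks[s]["phis"]:
                if stack[phi["var"]]:
                    phi["args"][b] = stack[phi["var"]][-1]
        for child in dom_children.get(b, []):    # 4. recurse over the dominator tree
            walk(child)
        for var in pushed:                       # 5. restore the names on the way out
            stack[var].pop()

    walk(entry)


# Hypothetical diamond: entry defines x, block a redefines it, block m joins.
blocks = {
    "entry": {"phis": [], "ops": [{"dest": "x", "uses": []}]},
    "a": {"phis": [], "ops": [{"dest": "x", "uses": ["x"]}]},
    "b": {"phis": [], "ops": []},
    "m": {"phis": [{"var": "x", "dest": None, "args": {}}],
          "ops": [{"dest": "y", "uses": ["x"]}]},
}
succ = {"entry": ["a", "b"], "a": ["m"], "b": ["m"], "m": []}
dom_children = {"entry": ["a", "b", "m"]}
rename(blocks, succ, dom_children, "entry")
```

After the call, the $\phi$ in `m` reads `x_1` along the edge from `a` and `x_0` along the edge from `b`, which is the "linking" of defs to $\phi$-instructions that renaming is really about.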

Converting Out of SSA

The program in SSA form must be finally converted into executable code. However no real processor has instructions that can directly capture the semantics of $\phi$ -instructions. Therefore the $\phi$ -instructions have to be replaced by code fragments inserted at appropriate places. The elimination of $\phi$ -instructions from a program in SSA form is called SSA destruction. - 《Data Flow Analysis Theory and Practice》

Once we have completed SSA based optimization passes, and certainly before code generation, it is necessary to eliminate $\phi$ -functions since these are not executable machine instructions. This elimination phase is known as SSA destruction. - 《Static Single Assignment Book》

Conventional and transformed SSA form

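Note: this part is excerpted from 《Static Single Assignment Book》 and 《Data Flow Analysis Theory and Practice》.

CSSA. The IR obtained by converting directly into SSA form is CSSA (Conventional SSA form, also called Canonical SSA).
TSSA. What we get after applying optimizing transformations to CSSA is TSSA (Transformed SSA).
$\phi$-related. If $x$ and $y$ both appear in the same $\phi$-function, we say that $x$ and $y$ are $\phi$-related.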

From $\phi$-related we can define $\phi$-webs.

The transitive closure of this relation (that is, $\phi$-related) defines an equivalence relation that partitions the variables defined locally in the procedure into equivalence classes, the $\phi$-webs. - 《Static Single Assignment Book》

Intuitively, the $\phi$ -equivalence class of a resource represents a set of resources “connected” via $\phi$ -functions.
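Since the $\phi$-webs are the equivalence classes of the transitive closure of $\phi$-related, one way to compute them is with a union-find structure. The sketch below assumes each $\phi$-function is given as a `(dest, args)` pair and treats the result and all parameters of one $\phi$-function as $\phi$-related; the function name `phi_webs` and the variable names are made up for illustration.

```python
def phi_webs(phis):
    """Group variables into phi-webs; `phis` is a list of (dest, [args...])."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for dest, args in phis:
        for arg in args:
            union(dest, arg)                # dest and every arg are phi-related

    webs = {}
    for v in list(parent):
        webs.setdefault(find(v), set()).add(v)
    return sorted(sorted(web) for web in webs.values())


# Hypothetical phi-functions: two chained phis for `a`, one for `b`.
phis = [
    ("a3", ["a1", "a2"]),
    ("a5", ["a3", "a4"]),
    ("b2", ["b0", "b1"]),
]
webs = phi_webs(phis)
```

Here `a1` and `a4` never appear in the same $\phi$-function, yet they land in the same web through `a3`, which is the transitive closure at work.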

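For CSSA, within any $\phi$-web only one $v_i$ is live at any program point. In other words, the different $v_i$ inside the same $\phi$-web never interfere with one another. Likewise, different $\phi$-webs do not interfere with each other.

TSSA may break this guarantee, because optimizations move instructions and can propagate some $v_i$ far enough that it interferes with another $v_j$.

As figure (b) below shows, in CSSA the different $a_i$ do not overlap. After some optimizations, for example deleting $a_2 \leftarrow a_1$ and $tmp \leftarrow a_1$, $a_1$ ends up being used at the very end, so $a_1$ overlaps with $\{a_2, a_3, a_4\}$. This overlap makes SSA destruction harder. The essence of a $\phi$-web is that the $v_i$ in it (that is, in the same $\phi$-related equivalence class) describe "the same value" (possibly along different paths); in figure (b) below, $a_2$, $a_3$, and $a_4$ all describe the same thing. If different $\phi$-webs interfere, then deleting the $\phi$-instructions and dropping the indices from the variable names, as one would for CSSA, tangles different $\phi$-webs, that is, different things, together. Nearly every optimization on CSSA moves parts of a $\phi$-web around, which is how interference between $\phi$-webs arises.
interference
Note: the figure above comes from 《Static Single Assignment Book》.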

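We use the example below to illustrate a problem that can arise when converting out of TSSA. The picture on the right shows the optimization performed on the non-SSA IR, $b \leftarrow a$ (the $x + y$ at the bottom cannot be replaced by $a$). For the TSSA on the left, converting back to non-SSA IR cannot be done, as it can for CSSA, by simply deleting the $index$ from each name.
TSSA

Note: the picture above comes from 《Engineering a Compiler》.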

insert copy instructions

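The goal is to delete the $\phi$-instructions. A blunt way to do so is to replace each $\phi$-instruction with a series of copy instructions in the predecessor blocks, as in the figure below. Note, however, that the copies destined for predecessor block $B_3$ cannot be handled the way those in $B_0$ are, because $B_1$ is not the only successor of $B_3$: placing them directly at the end of $B_3$ would also affect $B_4$ (the extra copies get executed on that path too, which may produce wrong results, although I have not yet come up with a concrete example). To solve this, a separate basic block $B_9$ is added between $B_3$ and $B_1$, giving $B_3 \rightarrow B_9 \rightarrow B_1$, and $B_9$ exists only to hold these extra copy instructions. This operation is called splitting critical edges.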

A critical edge is an edge from a node with several successors to a node with several predecessors. - 《Static Single Assignment Book》

An edge $n \rightarrow m$ is a critical edge if $n$ has more than one successor and $m$ has more than one predecessor. - 《Data Flow Analysis Theory and Practice》
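Following the two definitions above, critical edges can be found and split with a single pass over the CFG. The CFG below is a hypothetical four-block fragment in which $B_3$ branches to both $B_1$ and $B_4$; it is not the full figure. The function name `split_critical_edges` and the naming scheme for the new block (`B3_B1` rather than $B_9$) are assumptions of this sketch, which also assumes there are no duplicate edges.

```python
def split_critical_edges(succ):
    """Return (new successor map, list of critical edges that were split)."""
    preds = {n: [] for n in succ}
    for n, ss in succ.items():
        for s in ss:
            preds[s].append(n)

    new_succ = {n: list(ss) for n, ss in succ.items()}
    critical = []
    for n, ss in succ.items():
        for i, m in enumerate(ss):
            # n has several successors and m has several predecessors.
            if len(ss) > 1 and len(preds[m]) > 1:
                mid = f"{n}_{m}"            # fresh block that will hold the copies
                new_succ[n][i] = mid
                new_succ[mid] = [m]
                critical.append((n, m))
    return new_succ, critical


# Hypothetical fragment: B1 is reached from B0 and B3; B3 also branches to B4.
succ = {"B0": ["B1"], "B1": ["B3"], "B3": ["B1", "B4"], "B4": []}
new_succ, critical = split_critical_edges(succ)
```

Copies placed in the new block run only when control actually flows from `B3` to `B1`, so the path to `B4` is left untouched.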

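Note: the two figures above come from 《Engineering a Compiler》. The $a_0$, $b_0$, $c_0$, and $d_0$ above are defined before $B_0$.

One workable approach is to convert IR whose $\phi$-webs interfere into IR whose $\phi$-webs do not. Once no interference remains, the $\phi$-functions can be replaced by copy instructions inserted at the end of the predecessor blocks.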
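For the interference-free case, replacing $\phi$-functions by copies in the predecessors is mechanical. The sketch below assumes a made-up IR in which each block is a dictionary with `phis` and `ops` lists, that block terminators are kept outside `ops` (so appending really places the copy before the branch), and that critical edges have already been split. The function name `replace_phis_with_copies` is hypothetical. It deliberately ignores ordering problems among the copies.

```python
def replace_phis_with_copies(blocks):
    """Turn each phi `dest = phi(pred: src, ...)` into `dest = src` at the end of pred."""
    for block in blocks.values():
        for phi in block["phis"]:
            for pred, src in phi["args"].items():
                blocks[pred]["ops"].append(("copy", phi["dest"], src))
        block["phis"] = []      # the phis are now expressed as copies
    return blocks


# Hypothetical diamond: x_2 = phi(a: x_1, b: x_0) at the head of block m.
blocks = {
    "a": {"phis": [], "ops": [("add", "x_1", "x_0")]},
    "b": {"phis": [], "ops": []},
    "m": {"phis": [{"dest": "x_2", "args": {"a": "x_1", "b": "x_0"}}],
          "ops": [("use", "x_2")]},
}
replace_phis_with_copies(blocks)
```

Each predecessor now sets `x_2` on its own path, so the join block can read `x_2` without knowing where control came from.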

There are two further small problems to watch out for.

the lost copy problem

lost copy
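Note: the figure above comes from 《Data Flow Analysis Theory and Practice》.

To delete the $\phi$-function, $x_3 = x_2$ is added at the end of $n_2$ in figure (d) above, but the copy target $x_3$ then interferes with the existing live range of $x_3$. There are two ways to deal with this:

insert a new block between $n_2$ and $n_1$
introduce a temporary variable $t$ to hold the value of $x_2$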

swap problem

In this case also the problem arises because the process of SSA destruction does not follow the semantics of $\phi$ -instructions.

swap
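Note: the figure above comes from 《Data Flow Analysis Theory and Practice》.

Semantically, $\phi$-functions execute simultaneously, but in figure (d) above $x_3 = y_3$ and $y_3 = x_3$ execute one after the other, which makes the result semantically wrong.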
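To make the swap problem concrete, the sketch below treats the copies at the end of a block as one parallel copy and turns it into a safe sequence: a copy is emitted only when its destination is no longer needed as a source, and a temporary breaks any remaining cycle. This is one common way to sequentialize parallel copies, not necessarily the method used in the books cited here. The function names `sequentialize` and `run` and the temporary name `t` are assumptions, and all destinations are assumed distinct.

```python
def sequentialize(parallel_copies, temp="t"):
    """Order (dest, src) copies so the result equals simultaneous execution."""
    pending = [(d, s) for d, s in parallel_copies if d != s]
    out = []
    while pending:
        sources = {s for _, s in pending}
        ready = [c for c in pending if c[0] not in sources]
        if ready:
            # Overwriting this dest cannot destroy a value that is still needed.
            out.append(ready[0])
            pending.remove(ready[0])
        else:
            # Only cycles remain: save one dest in the temporary and
            # redirect its readers to the temporary.
            d, _ = pending[0]
            out.append((temp, d))
            pending = [(dd, temp if ss == d else ss) for dd, ss in pending]
    return out


def run(copies, env):
    """Execute copies one after another on a copy of `env`."""
    env = dict(env)
    for dest, src in copies:
        env[dest] = env[src]
    return env


swap = [("x3", "y3"), ("y3", "x3")]
env = {"x3": 1, "y3": 2}
naive = run(swap, env)                  # sequential execution loses the old x3
fixed = run(sequentialize(swap), env)   # t = x3; x3 = y3; y3 = t
```

Running the two copies in program order leaves both variables holding the old value of `y3`, whereas the sequentialized version performs a real swap.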

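Algorithm implementation

The goal is now clear: insert copy instructions to split a $\phi$-web that contains interference into two $\phi$-webs that do not interfere with each other. The core questions are where to insert the copy instructions, which names they should target, and how to minimize the number of copies inserted.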