Vector-Instruction Optimization of the Hadamard-Transform SATD in HEVC

Latest recommended article published on 2021-08-22 17:57:27

XX_bai


Tags: HEVC, Hadamard transform, vector, SATD

Article link: https://blog.csdn.net/xx_bai/article/details/88735454


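Abstract

HEVC has a fast way of estimating rate-distortion cost that uses SATD (Sum of Absolute Transformed Differences): the residual matrix is passed through a Hadamard transform to obtain a coefficient matrix, and the absolute values of the coefficients are summed to give the SATD value. In x265 the function for the 4x4 size is satd_4x4, which comes with an x86 vector-instruction optimization. This post describes the structure of the algorithm after that vectorization.

Main text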

The whole satd_4x4 procedure can be summarized by one formula:

$\sum\sum|HXH|$

More specifically, the transform inside the absolute value is

$\frac{1}{4} \begin{Bmatrix}\begin{bmatrix} 1&1&1&1\\ 1&-1&1&-1\\ 1&1&-1&-1\\ 1&-1&-1&1\\ \end{bmatrix}\begin{bmatrix} r_{00}&r_{01}&r_{02}&r_{03}\\ r_{10}&r_{11}&r_{12}&r_{13}\\ r_{20}&r_{21}&r_{22}&r_{23}\\ r_{30}&r_{31}&r_{32}&r_{33}\\ \end{bmatrix}\begin{bmatrix} 1&1&1&1\\ 1&-1&1&-1\\ 1&1&-1&-1\\ 1&-1&-1&1\\ \end{bmatrix} \end{Bmatrix}$

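The multiplication on the left is the column transform and the one on the right is the row transform. That is roughly all the theory needed here; how the transform is derived belongs to the specialist literature and is not covered in this post.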
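As a point of reference for what follows, here is a minimal scalar sketch of the formula in Python. The names `matmul` and `satd_4x4_ref` are made up for illustration and are not x265 functions. The sketch assumes the blocks are given as 4x4 lists of rows and returns the plain sum of $|HXH|$ without any scaling; a real encoder may normalize the result.

```python
# 4x4 Hadamard matrix, written exactly as in the formula above.
H = [
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
]

def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd_4x4_ref(pix1, pix2):
    """Sum of |H R H| over all 16 coefficients, where R = pix1 - pix2.

    Hypothetical helper: no normalization is applied here.
    """
    residual = [[pix1[i][j] - pix2[i][j] for j in range(4)] for i in range(4)]
    coeffs = matmul(matmul(H, residual), H)   # column transform, then row transform
    return sum(abs(c) for row in coeffs for c in row)
```

A constant residual ends up entirely in the top-left coefficient, while a single non-zero residual sample spreads evenly over all 16 coefficients.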

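Now for the implementation. The specific instructions are not discussed in detail; the vectors described here are 128 bits wide. The function satd_4x4 receives the parameters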

(const pixel *pix1, intptr_t stride_pix1, const pixel *pix2, intptr_t stride_pix2)

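that is, the two source blocks and their respective strides.
A few notational conventions are used below: $row_{n}$ denotes row n, $col_{m}$ denotes column m, v(x)i(y) denotes a vector of x signed elements of y bits each, and v(x)u(y) is the unsigned counterpart.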

The first step is the basic data load. After memory loads, inserts, and duplication we get

v16i8 $pix1\_0$ = { $row_{1}, row_{1}, row_{0}, row_{0}$ }
v16i8 $pix1\_1$ = { $row_{3}, row_{3}, row_{2}, row_{2}$ }

pix1 and pix2 each have such a pair, with the same layout.
Next, a constant vector is used

v16i8 con = {-1, 1, -1, 1, 1, 1, 1, 1, -1, 1, -1, 1, 1, 1, 1, 1}

Each pair of adjacent elements is multiplied with the data above and summed (a pairwise dot product), for example

$pix1\_0[0] * con[0] + pix1\_0[1] * con[1]$

and so on, which yields a v8i16 value. The layout is the same for every row; in terms of columns it is

v8i16 pix = { $col_{2} - col_{3}, col_{0} - col_{1}, col_{2} + col_{3}, col_{0} + col_{1}, ...$ }
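The load and pairwise multiply-add can be mimicked with plain Python lists. This is only a sketch under two assumptions: lanes are listed from the highest to the lowest (which is what the swaps described later imply), and the helper names `load_two_rows` and `multiply_add_pairs` are invented for illustration. It does not model instruction details such as saturation.

```python
# Lanes are listed from the highest to the lowest, matching the vectors in
# the text, so one 4-pixel row appears as [col3, col2, col1, col0].
CON = [-1, 1, -1, 1, 1, 1, 1, 1, -1, 1, -1, 1, 1, 1, 1, 1]

def load_two_rows(row_hi, row_lo):
    """Build {row_hi, row_hi, row_lo, row_lo}; rows are given as [col0..col3]."""
    hi, lo = row_hi[::-1], row_lo[::-1]   # reverse into high-to-low lane order
    return hi + hi + lo + lo

def multiply_add_pairs(bytes16, con=CON):
    """Multiply lane by lane, then add each adjacent pair (16 lanes -> 8 lanes)."""
    assert len(bytes16) == 16 and len(con) == 16
    return [bytes16[i] * con[i] + bytes16[i + 1] * con[i + 1]
            for i in range(0, 16, 2)]

# pix1_0 = {row1, row1, row0, row0} for two sample rows
pix1_0 = load_two_rows([10, 20, 30, 40], [1, 2, 3, 4])
pix = multiply_add_pairs(pix1_0)
```

For each row the four output lanes are `col2 - col3`, `col0 - col1`, `col2 + col3`, `col0 + col1`, in that order.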

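The remaining four elements belong to the other row and have the same layout. Subtracting, pix1 - pix2, then gives the residual as two vectors, $v8i16\ res0, res1$. Their row layout is the same as above: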

$res0 = \{row_{1}, row_{1}, row_{0}, row_{0}\}$
$res1 = \{row_{3}, row_{3}, row_{2}, row_{2}\}$

The column layout is likewise

$\{col_{2} - col_{3}, col_{0} - col_{1}, col_{2} + col_{3}, col_{0} + col_{1}, ...\}$

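Rows and columns are tracked separately throughout because they correspond to the row and column transforms of the Hadamard transform, which can be computed independently; it also makes the data flow easier to follow.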

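The next series of operations works on whole rows. Ignoring the column layout, it combines the rows with each other, which is the Hadamard column transform (the left multiplication). First compute

$res1 + res0$ and $res1 - res0$

which gives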

$res0 = \{row_{3} + row_{1}, row_{3} + row_{1}, row_{2} + row_{0}, row_{2} + row_{0}\}$
$res1 = \{row_{3} - row_{1}, row_{3} - row_{1}, row_{2} - row_{0}, row_{2} - row_{0}\}$

Then the high 64 bits of $res0$ are exchanged with the low 64 bits of $res1$, giving

$res0 = \{row_{2} - row_{0}, row_{2} - row_{0}, row_{2} + row_{0}, row_{2} + row_{0}\}$
$res1 = \{row_{3} - row_{1}, row_{3} - row_{1}, row_{3} + row_{1}, row_{3} + row_{1}\}$

Computing $res1 + res0$ and $res1 - res0$ once more gives

$res0 = \{row_{3} + row_{2} - row_{1} - row_{0}, ..., row_{3} + row_{2} + row_{1} + row_{0},...\}$
$res1 = \{row_{3} - row_{2} - row_{1} + row_{0}, ..., row_{3} - row_{2} + row_{1} - row_{0}, ...\}$

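This completes the column transform:

$row_{3} + row_{2} - row_{1} - row_{0}$ is the fourth row of coefficients after the column transform, negated (4-)
$row_{3} + row_{2} + row_{1} + row_{0}$ is the first row of coefficients (1)
$row_{3} - row_{2} - row_{1} + row_{0}$ is the second row of coefficients (2)
$row_{3} - row_{2} + row_{1} - row_{0}$ is the third row of coefficients, negated (3-)

The labels (1) to (4) are only names for the four transformed rows and do not follow the row order of the matrix written earlier. Every coefficient is later taken in absolute value, so neither the ordering nor the sign affects the SATD.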
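The add/subtract, 64-bit exchange, add/subtract sequence can be traced with a small sketch. Each list entry stands for one 32-bit slot holding (part of) a row, lists are written high-to-low as in the text, and the function names are hypothetical.

```python
def add_sub(res0, res1):
    """Return (res1 + res0, res1 - res0), slot by slot."""
    return ([b + a for a, b in zip(res0, res1)],
            [b - a for a, b in zip(res0, res1)])

def swap_halves(res0, res1):
    """Exchange the high half of res0 with the low half of res1.

    Lists are written high-to-low, so the high half is the first half.
    """
    h = len(res0) // 2
    return res1[h:] + res0[h:], res1[:h] + res0[:h]

def column_transform(row0, row1, row2, row3):
    """Trace the three steps from the text on one value per row."""
    res0 = [row1, row1, row0, row0]
    res1 = [row3, row3, row2, row2]
    res0, res1 = add_sub(res0, res1)      # rows 3+1, 2+0 and rows 3-1, 2-0
    res0, res1 = swap_halves(res0, res1)  # group 2-0 with 2+0, 3-1 with 3+1
    res0, res1 = add_sub(res0, res1)      # final sums and differences
    return res0, res1
```

With `row0..row3 = 1, 2, 4, 8` the result is `([9, 9, 15, 15], [3, 3, 5, 5])`, which matches the four expressions above: 8+4-2-1, 8+4+2+1, 8-4-2+1, 8-4+2-1.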

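The column elements have in fact already gone through a first step of computation. Expanding the column part, and writing the row part as the label of the row after the column transform, gives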

$res0 = \{(col_{2} - col_{3})(4-), (col_{0} - col_{1})(4-), (col_{2} + col_{3})(4-), (col_{0} + col_{1})(4-), (col_{2} - col_{3})(1), (col_{0} - col_{1})(1), (col_{2} + col_{3})(1), (col_{0} + col_{1})(1)\}$

$res1 = \{(col_{2} - col_{3})(2), (col_{0} - col_{1})(2), (col_{2} + col_{3})(2), (col_{0} + col_{1})(2), (col_{2} - col_{3})(3-), (col_{0} - col_{1})(3-), (col_{2} + col_{3})(3-), (col_{0} + col_{1})(3-)\}$

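In this expanded form, $col_{n}(m)$ is simply the coefficient in row m, column n+1 after the column transform.
Then, within every 32 bits, the high 16 bits of $res0$ are exchanged with the low 16 bits of $res1$, giving (written in abbreviated form)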

$res0 = \{(col_{0} - col_{1})(2), (col_{0} - col_{1})(4-), (col_{1} + col_{0})(2), (col_{1} + col_{0})(4-), (col_{0} - col_{1})(3-), (col_{0} - col_{1})(1), (col_{1} + col_{0})(3-), (col_{1} + col_{0})(1)\}$

$res1 = \{(col_{2} - col_{3})(2), (col_{2} - col_{3})(4-), (col_{2} + col_{3})(2), (col_{2} + col_{3})(4-), (col_{2} - col_{3})(3-), (col_{2} - col_{3})(1), (col_{2} + col_{3})(3-), (col_{2} + col_{3})(1)\}$
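The 16-bit exchange can be checked with a short sketch. It works on lists in high-to-low lane order and treats every two consecutive entries as one 32-bit pair; `swap_within_pairs` is a hypothetical name, and the string labels below are only shorthand for the lanes listed above (for example `d23_4` for $(col_{2} - col_{3})(4-)$ and `s01_2` for $(col_{0} + col_{1})(2)$).

```python
def swap_within_pairs(res0, res1):
    """In every 32-bit pair, exchange res0's high lane with res1's low lane."""
    new0, new1 = [], []
    for i in range(0, len(res0), 2):
        hi0, lo0 = res0[i], res0[i + 1]
        hi1, lo1 = res1[i], res1[i + 1]
        new0 += [lo1, lo0]   # res0 keeps its low lane and receives res1's low lane
        new1 += [hi1, hi0]   # res1 keeps its high lane and receives res0's high lane
    return new0, new1

# First four lanes of res0 and res1 before the exchange.
before0 = ["d23_4", "d01_4", "s23_4", "s01_4"]
before1 = ["d23_2", "d01_2", "s23_2", "s01_2"]
after0, after1 = swap_within_pairs(before0, before1)
```

After the exchange, `res0` holds only the $col_{0}, col_{1}$ terms and `res1` only the $col_{2}, col_{3}$ terms, and lanes at the same position in the two vectors belong to the same transformed row.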

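The row transform is done with a shortcut here. First, the conventional method: the column transform yields a coefficient matrix.
$\begin{bmatrix} col_{0}(1)&col_{1}(1)&col_{2}(1)&col_{3}(1)\\ col_{0}(2)&col_{1}(2)&col_{2}(2)&col_{3}(2)\\ col_{0}(3)&col_{1}(3)&col_{2}(3)&col_{3}(3)\\ col_{0}(4)&col_{1}(4)&col_{2}(4)&col_{3}(4)\\ \end{bmatrix}$
The row transform is then applied. Rather than listing every row, take the first row as the example. Transforming it produces four coefficients: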

$coeff(0,0) = col_{0}(1) + col_{1}(1) + col_{2}(1) + col_{3}(1)$
$coeff(0,1) = col_{0}(1) - col_{1}(1) + col_{2}(1) - col_{3}(1)$
$coeff(0,2) = col_{0}(1) + col_{1}(1) - col_{2}(1) - col_{3}(1)$
$coeff(0,3) = col_{0}(1) - col_{1}(1) - col_{2}(1) + col_{3}(1)$

Then the absolute values are summed:

$|coeff(0,0)| + |coeff(0,1)| + |coeff(0,2)| + |coeff(0,3)|$

That finishes the first row. The remaining rows are handled the same way, and adding up the per-row sums gives the final result.

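The x86 vector code does not do this, however, and the layout of $res0$ and $res1$ obtained above does not allow it either. The x86 approach described next gives an equivalent result (half of the conventional sum, as shown further down) while skipping many steps.
First the absolute value of every element of $res0$ and $res1$ is taken; then the elements at corresponding positions are compared and the larger one is kept. Finally all the 16-bit elements are summed into one 32-bit sum, which is the result.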
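To see that the whole flow hangs together, the steps can be simulated end to end and compared with the conventional method. This is a sketch on plain Python lists, not the x265 code: all function names are made up, lanes are listed high-to-low, and for brevity the residual is formed before the pairwise step rather than after it, which is equivalent because both operations are linear.

```python
import random

H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]

def satd_scalar(pix1, pix2):
    """Conventional method: sum of |H R H| with R = pix1 - pix2 (unscaled)."""
    r = [[a - b for a, b in zip(p, q)] for p, q in zip(pix1, pix2)]
    hr = [[sum(H[i][k] * r[k][j] for k in range(4)) for j in range(4)]
          for i in range(4)]
    hrh = [[sum(hr[i][k] * H[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return sum(abs(v) for row in hrh for v in row)

def add_sub(x, y):
    """Return (y + x, y - x) lane by lane."""
    return [b + a for a, b in zip(x, y)], [b - a for a, b in zip(x, y)]

def satd_vector_style(pix1, pix2):
    """Mimic the vector data flow; lists are in high-to-low lane order."""
    def lanes(r):
        # [c0, c1, c2, c3] -> [c2 - c3, c0 - c1, c2 + c3, c0 + c1]
        return [r[2] - r[3], r[0] - r[1], r[2] + r[3], r[0] + r[1]]

    rows = [lanes([a - b for a, b in zip(p, q)]) for p, q in zip(pix1, pix2)]
    res0, res1 = rows[1] + rows[0], rows[3] + rows[2]
    res0, res1 = add_sub(res0, res1)
    res0, res1 = res1[4:] + res0[4:], res1[:4] + res0[:4]   # 64-bit exchange
    res0, res1 = add_sub(res0, res1)
    new0, new1 = [], []
    for i in range(0, 8, 2):                                # 16-bit exchange
        new0 += [res1[i + 1], res0[i + 1]]
        new1 += [res1[i], res0[i]]
    # absolute value, lane-wise maximum, horizontal sum
    return sum(max(abs(a), abs(b)) for a, b in zip(new0, new1))

def random_block(rng):
    """A 4x4 block of 8-bit sample values."""
    return [[rng.randrange(256) for _ in range(4)] for _ in range(4)]
```

For any pair of blocks, `satd_vector_style` returns exactly half of `satd_scalar`, which is the factor of 2 discussed in the text.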

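The first half of this procedure relies on an identity. I will list the actual processing first, again using the first-row data as the example. After taking absolute values, the elements at corresponding positions are: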

$|(col_{0} - col_{1})|$ ---- $|(col_{2} - col_{3})|$
$|(col_{0} + col_{1})|$ ---- $|(col_{2} + col_{3})|$

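Each pair is compared, the larger value is kept, and the two maxima are added. The result is the absolute-value sum of the conventional method divided by 2. Expanded, the conventional result is:

$|col_{0} + col_{1} + col_{2} + col_{3}| + |col_{0} - col_{1} + col_{2} - col_{3}| + |col_{0} + col_{1} - col_{2} - col_{3}| + |col_{0} - col_{1} - col_{2} + col_{3}|$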
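A quick numeric check of this claim for a single row, as a sketch with invented function names:

```python
def row_abs_sum(c0, c1, c2, c3):
    """Conventional method: four row-transform coefficients, absolute sum."""
    coeffs = [c0 + c1 + c2 + c3, c0 - c1 + c2 - c3,
              c0 + c1 - c2 - c3, c0 - c1 - c2 + c3]
    return sum(abs(v) for v in coeffs)

def row_max_sum(c0, c1, c2, c3):
    """Shortcut: compare the partial differences and sums, keep the larger."""
    return max(abs(c0 - c1), abs(c2 - c3)) + max(abs(c0 + c1), abs(c2 + c3))
```

For the row `(5, -3, 2, 7)` the four coefficients are 11, 3, -7 and 13, so the conventional sum is 34, while the shortcut gives max(8, 5) + max(2, 9) = 17.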

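This expression is built from the same four values as above, before their absolute values are taken. What follows is an informal proof.
The four values are unrelated to each other and can compare in any order, so for simplicity let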

$|(col_{0} - col_{1})| = |a|$
$|(col_{2} - col_{3})| = |b|$
$|(col_{0} + col_{1})| = |c|$
$|(col_{2} + col_{3})| = |d|$

The claim then reduces to proving

$\max(|a|, |b|) = (|a + b| + |a - b|)/2$
$\max(|c|, |d|) = (|c + d| + |c - d|)/2$

Proving one of them is enough; the other is identical.

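The identity $\max(a, b) = \frac{1}{2}(a + b + |a - b|)$ is known to hold.

Substituting $|a|$ and $|b|$ gives $\max(|a|, |b|) = \frac{1}{2}(\,||a| + |b|| + ||a| - |b||\,)$

Removing the inner absolute values and enumerating the sign cases completes the proof: when a and b have the same sign, $|a + b| = |a| + |b|$ and $|a - b| = ||a| - |b||$; when their signs differ, the two roles are exchanged, so the sum is the same in every case.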
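The case analysis can also be backed by a brute-force check over a range of integers. This does not replace the proof; it only confirms the identity on the tested values. The range of -64 to 64 is an arbitrary choice, and the identity is multiplied by 2 to stay in integer arithmetic.

```python
def identity_holds(a, b):
    """Check 2 * max(|a|, |b|) == |a + b| + |a - b| for one pair."""
    return 2 * max(abs(a), abs(b)) == abs(a + b) + abs(a - b)

# Exhaustive check over a small, arbitrarily chosen range of integers.
all_ok = all(identity_holds(a, b)
             for a in range(-64, 65)
             for b in range(-64, 65))
```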
This concludes the walkthrough of the function.
