Mpeg1/2标准的DCT变换与H264的整数DCT不太一样,会有小数运算。硬件实现时会做定点化处理,因此也会产生误差,误差主要体现在某些数值小数部分在0.5左右。比如小数运算时数值为4.4999,四舍五入最终像素值为4;如果定点化结果为4.50001,实际上与小数运算差异不大,但最终像素值却相差1。
1.DCT简介
DCT和DCT反变换可用如下公式表示:
其中,Xij是图像块 X 中第 i 行第 j 列图像或残差值,Ymn 是变换结果矩阵 Y 相应频率点上的 DCT 系 数。可以用矩阵表示
Y
=
A
X
A
T
Y=AXA^T
Y=AXAT
X
=
A
T
Y
A
X=A^TYA
X=ATYA
Mpeg1/2标准的DCT是8x8大小的。
令
a
=
1
/
8
a=\sqrt{1/8}
a=1/8,
b
=
1
2
cos
(
π
8
)
b=\frac{1}{2}\cos{\left(\frac{\pi}{8}\right)}
b=21cos(8π),
c
=
1
2
cos
(
3
π
8
)
c=\frac{1}{2}\cos{\left(\frac{3\pi}{8}\right)}
c=21cos(83π) ,
d
=
1
2
cos
(
π
16
)
d=\frac{1}{2}\cos{\left(\frac{\pi}{16}\right)}
d=21cos(16π),
e
=
1
2
cos
(
3
16
π
)
e=\frac{1}{2}\cos{\left(\frac{3}{16}\pi\right)}
e=21cos(163π),
f
=
1
2
cos
5
16
π
f=\frac{1}{2}\cos{\frac{5}{16}\pi}
f=21cos165π,
g
=
1
2
cos
7
16
π
g=\frac{1}{2}\cos{\frac{7}{16}\pi}
g=21cos167π,
则 矩阵A可以如下表示
(
a
a
a
a
a
a
a
a
d
e
f
g
−
g
−
f
−
e
−
d
b
c
−
c
−
b
−
b
−
c
c
b
e
−
g
−
d
−
f
f
d
g
−
e
a
−
a
−
a
a
a
−
a
−
a
a
f
−
d
g
e
−
e
−
g
d
−
f
c
−
b
b
−
c
−
c
b
−
b
c
g
−
f
e
−
d
d
−
e
f
−
g
)
\begin{pmatrix} a & a & a & a& a& a& a& a \\ d & e&f &g&-g&-f&-e&-d \\ b&c&-c&-b&-b&-c&c&b \\ e&-g&-d&-f&f&d&g&-e \\ a& -a&-a&a&a&-a&-a&a \\ f&-d&g&e&-e&-g&d&-f \\ c&-b&b&-c&-c&b&-b&c\\ g&-f&e&-d&d&-e&f&-g \end{pmatrix}
⎝⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎛adbeafcgaec−g−a−d−b−faf−c−d−agbeag−b−fae−c−da−g−bfa−e−cda−f−cd−a−gb−ea−ecg−ad−bfa−db−ea−fc−g⎠⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎞
矩阵运算可以使用蝶形变换加速,以8x8反变换为例,分析
A
T
Y
A^TY
ATY的第一列,
其中间结果用acc0-acc7表示,如下
(
a
c
c
0
a
c
c
1
a
c
c
2
a
c
c
3
)
=
(
a
b
a
c
a
c
−
a
−
b
a
−
c
−
a
b
a
−
b
a
−
c
)
(
x
0
x
2
x
4
x
6
)
\begin{pmatrix} acc0 \\ acc1 \\ acc 2 \\ acc3 \\ \end{pmatrix} =\begin{pmatrix} a&b& a& c\\ a&c&-a&-b \\ a&-c&-a&b\\ a&-b&a&-c \end{pmatrix}\begin{pmatrix} x0\\x2 \\ x4\\ x6 \end{pmatrix}
⎝⎜⎜⎛acc0acc1acc2acc3⎠⎟⎟⎞=⎝⎜⎜⎛aaaabc−c−ba−a−aac−bb−c⎠⎟⎟⎞⎝⎜⎜⎛x0x2x4x6⎠⎟⎟⎞
(
a
c
c
4
a
c
c
5
a
c
c
6
a
c
c
7
)
=
(
d
e
f
g
e
−
g
−
d
−
f
f
−
d
g
e
g
−
f
e
−
d
)
(
x
1
x
3
x
5
x
7
)
\begin{pmatrix} acc4 \\ acc5 \\ acc 6 \\ acc7 \\ \end{pmatrix} =\begin{pmatrix} d&e& f& g\\ e&-g&-d&-f \\ f&-d&g&e\\ g&-f&e&-d \end{pmatrix}\begin{pmatrix} x1\\x3 \\ x5\\ x7 \end{pmatrix}
⎝⎜⎜⎛acc4acc5acc6acc7⎠⎟⎟⎞=⎝⎜⎜⎛defge−g−d−ff−dgeg−fe−d⎠⎟⎟⎞⎝⎜⎜⎛x1x3x5x7⎠⎟⎟⎞
最终结果为
y
0
=
a
c
c
0
+
a
c
c
4
y0 = acc0+acc4
y0=acc0+acc4
y
1
=
a
c
c
1
+
a
c
c
5
y1 = acc1+acc5
y1=acc1+acc5
y
2
=
a
c
c
2
+
a
c
c
6
y2 = acc2+acc6
y2=acc2+acc6
y
3
=
a
c
c
3
+
a
c
c
7
y3 = acc3+acc7
y3=acc3+acc7
y
4
=
a
c
c
3
−
a
c
c
7
y4 = acc3-acc7
y4=acc3−acc7
y
5
=
a
c
c
2
−
a
c
c
6
y5 = acc2-acc6
y5=acc2−acc6
y
6
=
a
c
c
1
−
a
c
c
5
y6= acc1-acc5
y6=acc1−acc5
y
7
=
a
c
c
0
−
a
c
c
4
y7 = acc0-acc4
y7=acc0−acc4
DCT反变换的几种python实现
- 按照公式直接计算
def normal_dct(block):
A = np.zeros((8, 8)) # 生成0矩阵
shape = src.shape[1] # 获取维数
for i in range(8):
for j in range(8):
if (i == 0):
x = sqrt(1 / shape)
else:
x = sqrt(2 / shape)
A[i][j] = x * cos(pi * (j + 0.5) * i / shape)
return np.dot(np.dot(A.T, src), A)
- 使用scipy的dct函数
from scipy.fftpack import dct, idct
def idct2(block):
return idct(idct(block.T, norm = 'ortho').T, norm = 'ortho')
2.定点化实现
定点主要是把小数转成整型数据范围,通过整形加减以及移位操作替换小数运算。DCT运算过程中矩阵
A
A
A中的参数都是小数,需要转为整数进行运算。
小数可以用2的负数次幂相加来表示,以
a
=
1
/
8
a=\sqrt{1/8}
a=1/8为例,可以表示为
2
−
2
+
2
−
4
+
2
−
5
+
2
−
7
+
2
−
9
2^{-2} + 2^{-4} + 2^{-5} + 2^{-7} + 2^{-9}
2−2+2−4+2−5+2−7+2−9。实现过程中以16个元素的数组来表示,每个数组元素所在数组下标对应2的幂次,则a的表项为a_table = [0,0,1,0,1,1,0,1,0,1,0,0,0,0,0,0]。
# a = sqrt(1/8) = 0.35355 = 2^-2 + 2^-4 + 2^-5 + 2^-7 + 2^-9
a_table = [0,0,1,0,1,1,0,1,0,1,0,0,0,0,0,0]
# b = (1/2*cos(PI/8)) = 0.46194 = 2^-1 - 2^-5 - 2^-7 + 2^-10
b_table = [0,1,0,0,0,-1,0,-1,0,0,1,0,0,0,0,0]
# c = (1/2*cos(PI*3/8)) = 0.19134 = 2^-3 + 2^-4 + 2^-8 - 2^-14
c_table = [0,0,0,1,1,0,0,0,1,0,0,0,0,0,-1,0]
# d = 1/2*cos(PI/16) = 0.49039 = 2^-1 - 2^-7 - 2^-9 + 2^-13
d_table = [0,1,0,0,0,0,0,-1,0,-1,0,0,0,1,0,0]
# e = 1/2*cos(PI*3/16) = 0.41573 = 2^-2 + 2^-3 + 2^-5 + 2^-7 + 2^-9 - 2^-12
e_table = [0,0,1,1,0,1,0,1,0,1,0,0,-1,0,0,0]
# f = 1/2*cos(PI*5/16) = 0.27779 = 2^-2 + 2^-5 - 2^-8 + 2^-11
f_table = [0,0,1,0,0,1,0,0,-1,0,0,1,0,0,0,0]
# g = 1/2*cos(PI*7/16) = 0.09755 = 2^-4 + 2^-5 + 2^-8 - 2^-13
g_table = [0,0,0,0,1,1,0,0,1,0,0,0,0,-1,0,0]
# 定点化IDCT
def fixpoint_idct(block):
input = block.reshape(64, 1)
idct_data1 = [0 for i in range(64)]
idct_data2 = [0 for i in range(64)]
tmp_data1 = idct_data1
tmp_data2 = idct_data2
for i in range(64):
tmp_data1[i] = input[i] << 10
#两次蝶形变换
for k in range(2):
for hor in range(8):
a_x0 = b_x2 = a_x4 = c_x6 = 0
c_x2 = b_x6 = 0
d_x1 = e_x3 = f_x5 = g_x7 = 0
e_x1 = g_x3 = d_x5 = f_x7 = 0
f_x1 = d_x3 = g_x5 = e_x7 = 0
g_x1 = f_x3 = e_x5 = d_x7 = 0
i = hor << 3
x0 = tmp_data1[i]
x1 = tmp_data1[i+1]
x2 = tmp_data1[i+2]
x3 = tmp_data1[i+3]
x4 = tmp_data1[i+4]
x5 = tmp_data1[i+5]
x6 = tmp_data1[i+6]
x7 = tmp_data1[i+7]
#遍历长度为16的查找表
for j in range(16):
a_x0 += (x0 >> j) * a_table[j]
b_x2 += (x2 >> j) * b_table[j]
a_x4 += (x4 >> j) * a_table[j]
c_x6 += (x6 >> j) * c_table[j]
c_x2 += (x2 >> j) * c_table[j]
b_x6 += (x6 >> j) * b_table[j]
d_x1 += (x1 >> j) * d_table[j]
e_x1 += (x1 >> j) * e_table[j]
f_x1 += (x1 >> j) * f_table[j]
g_x1 += (x1 >> j) * g_table[j]
d_x3 += (x3 >> j) * d_table[j]
e_x3 += (x3 >> j) * e_table[j]
f_x3 += (x3 >> j) * f_table[j]
g_x3 += (x3 >> j) * g_table[j]
d_x5 += (x5 >> j) * d_table[j]
e_x5 += (x5 >> j) * e_table[j]
f_x5 += (x5 >> j) * f_table[j]
g_x5 += (x5 >> j) * g_table[j]
d_x7 += (x7 >> j) * d_table[j]
e_x7 += (x7 >> j) * e_table[j]
f_x7 += (x7 >> j) * f_table[j]
g_x7 += (x7 >> j) * g_table[j]
acc0 = a_x0 + b_x2 + a_x4 + c_x6
acc1 = a_x0 + c_x2 - a_x4 - b_x6
acc2 = a_x0 - c_x2 - a_x4 + b_x6
acc3 = a_x0 - b_x2 + a_x4 - c_x6
acc4 = d_x1 + e_x3 + f_x5 + g_x7
acc5 = e_x1 - g_x3 - d_x5 - f_x7
acc6 = f_x1 - d_x3 + g_x5 + e_x7
acc7 = g_x1 - f_x3 + e_x5 - d_x7
tmp_data2[hor] = acc0 + acc4
tmp_data2[(1<<3)+hor] = acc1 + acc5
tmp_data2[(2<<3)+hor] = acc2 + acc6
tmp_data2[(3<<3)+hor] = acc3 + acc7
tmp_data2[(4<<3)+hor] = acc3 - acc7
tmp_data2[(5<<3)+hor] = acc2 - acc6
tmp_data2[(6<<3)+hor] = acc1 - acc5
tmp_data2[(7<<3)+hor] = acc0 - acc4
if k == 0:
tmp_data1 = idct_data2
tmp_data2 = idct_data1
for i in range(64):
tmp_data2[i] = (tmp_data2[i] + 512) >> 10
if tmp_data2[i] > 255:
tmp_data2[i] = 255
if tmp_data2[i] < -256:
tmp_data2[i] = -256
return np.asarray(tmp_data2).reshape(8, 8)