This article walks through the computation flow of quantized matrix multiplication. The matrix product is:

$$C = A \times B$$
where $A$ is uint8, $B$ is int8, and $C$ is uint8. The detailed flow is u8s8 -> s32: the s32 accumulator has to be dequantized to fp32 and then quantized again before it becomes uint8. Under affine quantization, the zero_point of the s32 value is 0.
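The s32 -> fp32 -> u8 tail of this flow can be sketched in plain Python for a single accumulator value. This is only an illustrative sketch, not any library's API: the function names `dequantize_s32` and `requantize_u8`, the output parameters `scale_c` and `zero_c`, and the round-then-clamp policy are assumptions introduced here. The accumulator is assumed to be already corrected for zero points, which is why no zero_point is subtracted from it.

```python
def dequantize_s32(acc: int, scale_a: float, scale_b: float) -> float:
    """Map a zero-point-corrected s32 accumulator back to fp32.

    The s32 value has zero_point 0, so only the combined scale is applied.
    """
    return acc * scale_a * scale_b


def requantize_u8(value: float, scale_c: float, zero_c: int) -> int:
    """Quantize an fp32 value to uint8: scale, round, shift, clamp to [0, 255].

    Python's round() rounds halves to even; real kernels may round differently.
    """
    q = round(value / scale_c) + zero_c
    return max(0, min(255, q))


acc = 1200  # hypothetical corrected s32 accumulator
x = dequantize_s32(acc, scale_a=0.5, scale_b=0.25)
c = requantize_u8(x, scale_c=2.0, zero_c=10)
```

The scales here are powers of two so the intermediate fp32 value is exact; with arbitrary scales the result is subject to the usual floating-point rounding.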
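Matrix A uses per-tensor quantization, and matrix B uses per-channel quantization with one scale and one zero point per column $j$. In the formulas below, the superscript int8 marks the quantized 8-bit value (uint8 for A, int8 for B), and rounding and clamping are omitted.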
$$A_i^{int8} = \frac{A_i^{fp32}}{scale_A} + zero_A$$
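As a concrete illustration of per-tensor quantization of A, the sketch below derives one scale and one zero point for the whole tensor and applies the formula above with rounding and clamping added. The min/max calibration rule and the helper names `choose_qparams_u8`, `quantize_u8`, and `dequantize` are assumptions made for this example; the article itself does not say how scale and zero point are chosen.

```python
def choose_qparams_u8(values):
    """Pick (scale, zero) so that [min, max], widened to contain 0, maps to [0, 255]."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for an all-zero tensor
    zero = round(-lo / scale)
    return scale, max(0, min(255, zero))


def quantize_u8(values, scale, zero):
    """q = round(x / scale) + zero, clamped to the uint8 range."""
    return [max(0, min(255, round(v / scale) + zero)) for v in values]


def dequantize(qvalues, scale, zero):
    """Inverse mapping: x ~= (q - zero) * scale."""
    return [(q - zero) * scale for q in qvalues]


a_fp32 = [-1.0, 0.0, 0.5, 1.55]          # the whole tensor shares one scale/zero
scale_a, zero_a = choose_qparams_u8(a_fp32)
a_q = quantize_u8(a_fp32, scale_a, zero_a)
a_back = dequantize(a_q, scale_a, zero_a)
```

Dequantizing recovers each value to within half a quantization step, which is the rounding error the formula leaves out.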
$$B_i^{int8} = \frac{B_i^{fp32}}{scale_{B_{col_j}}} + zero_{B_{col_j}}$$
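Per-channel quantization of B differs only in that every column $j$ gets its own scale and zero point. A minimal sketch follows, assuming B is stored as a list of rows with shape K x N and that each column is calibrated from its own min/max onto the int8 range [-128, 127]; the function name `quantize_per_column_s8` and the calibration rule are illustrative assumptions.

```python
def quantize_per_column_s8(b):
    """Quantize a K x N matrix to int8 with one (scale, zero) pair per column."""
    k, n = len(b), len(b[0])
    scales, zeros = [], []
    for j in range(n):
        col = [b[i][j] for i in range(k)]
        lo, hi = min(col + [0.0]), max(col + [0.0])   # keep 0.0 representable
        scale = (hi - lo) / 255.0 or 1.0              # avoid a zero scale
        zero = max(-128, min(127, round(-128 - lo / scale)))
        scales.append(scale)
        zeros.append(zero)
    # q = round(x / scale_j) + zero_j, clamped to the int8 range
    b_q = [[max(-128, min(127, round(b[i][j] / scales[j]) + zeros[j]))
            for j in range(n)] for i in range(k)]
    return b_q, scales, zeros


b_fp32 = [[-1.0, 0.0],
          [0.5, 2.0],
          [1.55, 5.1]]
b_q, scales_b, zeros_b = quantize_per_column_s8(b_fp32)
```

The two columns end up with different scales and zero points, which is why the later formulas carry the subscript $col_j$ on both.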
Multiplying one pair of quantized elements gives:

$$
\begin{aligned}
A_i^{int8} \times B_i^{int8} &= \left(\frac{A_i^{fp32}}{scale_A} + zero_A\right) \times \left(\frac{B_i^{fp32}}{scale_{B_{col_j}}} + zero_{B_{col_j}}\right) \\
&= \frac{A_i^{fp32} \times B_i^{fp32}}{scale_A \times scale_{B_{col_j}}} + zero_A \times B_i^{int8} + A_i^{int8} \times zero_{B_{col_j}} - zero_A \times zero_{B_{col_j}}
\end{aligned}
$$

The second step substitutes $\frac{A_i^{fp32}}{scale_A} = A_i^{int8} - zero_A$ and $\frac{B_i^{fp32}}{scale_{B_{col_j}}} = B_i^{int8} - zero_{B_{col_j}}$ into the cross terms, which is where the minus sign on the last term comes from.
Summing over the $k$ elements of one dot product:

$$
\begin{aligned}
\sum_{i=0}^{k-1} A_i^{int8} \times B_i^{int8} &= \frac{\sum_{i=0}^{k-1} A_i^{fp32} \times B_i^{fp32}}{scale_A \times scale_{B_{col_j}}} + zero_A \times \sum_{i=0}^{k-1} B_i^{int8} + zero_{B_{col_j}} \times \sum_{i=0}^{k-1} A_i^{int8} - zero_A \times \sum_{i=0}^{k-1} zero_{B_{col_j}} \\
&= \frac{\sum_{i=0}^{k-1} A_i^{fp32} \times B_i^{fp32}}{scale_A \times scale_{B_{col_j}}} + zero_A \times \left(\sum_{i=0}^{k-1} B_i^{int8} - \sum_{i=0}^{k-1} zero_{B_{col_j}}\right) + zero_{B_{col_j}} \times \sum_{i=0}^{k-1} A_i^{int8}
\end{aligned}
$$

Here $\sum_{i=0}^{k-1} zero_{B_{col_j}}$ is simply $k \times zero_{B_{col_j}}$, since the zero point is constant within column $j$.
Therefore:

$$
\sum_{i=0}^{k-1} A_i^{fp32} \times B_i^{fp32} = scale_A \times scale_{B_{col_j}} \times \left[\sum_{i=0}^{k-1} A_i^{int8} \times B_i^{int8} - zero_{B_{col_j}} \times \sum_{i=0}^{k-1} A_i^{int8} - zero_A \times \left(\sum_{i=0}^{k-1} B_i^{int8} - \sum_{i=0}^{k-1} zero_{B_{col_j}}\right)\right]
$$

The bracketed expression is the zero-point-corrected s32 accumulator; multiplying it by $scale_A \times scale_{B_{col_j}}$ is the dequantize step to fp32.
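The rearranged identity can be checked with exact integer arithmetic on a single dot product. The values below are made up for illustration, and the variable names are not taken from any library. The check compares the corrected accumulator against the direct sum of $(A_i^{int8} - zero_A) \times (B_i^{int8} - zero_{B_{col_j}})$.

```python
a_q = [12, 200, 37, 255]     # one row of A, uint8 values
b_q = [-128, 5, 90, -17]     # one column j of B, int8 values
zero_a, zero_b = 100, -28    # zero point of A and of column j of B
scale_a, scale_b = 0.5, 0.25

k = len(a_q)
acc = sum(a * b for a, b in zip(a_q, b_q))   # raw u8s8 -> s32 accumulation
row_offset = sum(a_q)                        # sum of the quantized row of A
col_offset = sum(b_q) - k * zero_b           # column sum of B minus k * zero_b

# Zero-point-corrected accumulator, as in the bracketed term of the formula.
corrected = acc - zero_b * row_offset - zero_a * col_offset

# Direct evaluation with the zero points removed element by element.
reference = sum((a - zero_a) * (b - zero_b) for a, b in zip(a_q, b_q))
assert corrected == reference

fp32_dot = scale_a * scale_b * corrected     # dequantized dot product
```

Both routes give the same integer, but the corrected form needs only one integer dot product plus two sums that can be computed separately.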
Here $\left(\sum_{i=0}^{k-1} B_i^{int8} - \sum_{i=0}^{k-1} zero_{B_{col_j}}\right)$ is the col_offset of column $j$ of B, and $\sum_{i=0}^{k-1} A_i^{int8}$ is the row_offset of the corresponding row of matrix A.
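For a full M x K by K x N product, the offsets become two vectors: one row_offset per row of A and one col_offset per column of B. The sketch below computes both once and reuses them for every output element. The function name `qmatmul_fp32`, the list-of-rows layout, and the sample values are assumptions for illustration; optimized kernels organize this work very differently.

```python
def qmatmul_fp32(a_q, b_q, zero_a, zeros_b, scale_a, scales_b):
    """Quantized matmul returning fp32, using row_offset / col_offset corrections."""
    m, k, n = len(a_q), len(b_q), len(b_q[0])
    # row_offset[r]: sum of the quantized values in row r of A.
    row_offsets = [sum(row) for row in a_q]
    # col_offset[j]: column sum of quantized B minus k * zero point of column j.
    col_offsets = [sum(b_q[i][j] for i in range(k)) - k * zeros_b[j]
                   for j in range(n)]
    out = []
    for r in range(m):
        out_row = []
        for j in range(n):
            acc = sum(a_q[r][i] * b_q[i][j] for i in range(k))   # u8s8 -> s32
            acc -= zeros_b[j] * row_offsets[r] + zero_a * col_offsets[j]
            out_row.append(scale_a * scales_b[j] * acc)          # s32 -> fp32
        out.append(out_row)
    return out


a_q = [[12, 200], [37, 255]]          # uint8, per-tensor zero point
b_q = [[-128, 5], [90, -17]]          # int8, per-column zero points
c_fp32 = qmatmul_fp32(a_q, b_q, zero_a=100, zeros_b=[-28, 3],
                      scale_a=0.5, scales_b=[0.25, 0.125])
```

Because B usually holds fixed weights, col_offsets can be precomputed ahead of time, while row_offsets depend on the activations in A and have to be computed per input.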