
# 1. Title

Quantization and Training of Neural Networks for Efficient
Integer-Arithmetic-Only Inference

# 3. Methods

## 1. Quantized inference and training

$$r = S(q - Z)$$

$$r_1 = S_1(q_1 - Z_1),\qquad r_2 = S_2(q_2 - Z_2),\qquad r_3 = S_3(q_3 - Z_3)$$

$$r_3^{i,j} = \sum_{k=1}^{N} r_1^{i,k}\, r_2^{k,j}$$

$$\begin{aligned}
S_3(q_3^{i,j} - Z_3) &= \sum_{k=1}^{N} S_1(q_1^{i,k} - Z_1)\, S_2(q_2^{k,j} - Z_2) \\
&= S_1 S_2 \sum_{k=1}^{N} (q_1^{i,k} - Z_1)(q_2^{k,j} - Z_2) \\
&= S_1 S_2 \Big( N Z_1 Z_2 - Z_1 \sum_{k=1}^{N} q_2^{k,j} - Z_2 \sum_{k=1}^{N} q_1^{i,k} + \sum_{k=1}^{N} q_1^{i,k} q_2^{k,j} \Big)
\end{aligned}$$

If a bias $b_r$ is included, the quantized bias $\dfrac{b_r}{S_1 S_2}$ (quantized with scale $S_1 S_2$ and zero-point $0$) is also added inside the parentheses.

When the layer output passes through a ReLU and $Z_3 = 0$, the following equalities hold:

$$q_3^{i,j} = \dfrac{\mathrm{ReLU}(r_3^{i,j})}{S_3} + Z_3 = \dfrac{\mathrm{ReLU}(r_3^{i,j})}{S_3} = \mathrm{ReLU}\!\Big(\dfrac{r_3^{i,j}}{S_3}\Big) = \mathrm{ReLU}\!\Big(\dfrac{r_3^{i,j}}{S_3} + Z_3\Big)$$

$$q_3^{i,j} = \mathrm{ReLU}\!\Bigg(Z_3 + \dfrac{S_1 S_2}{S_3}\Big(N Z_1 Z_2 - Z_1 \sum_{k=1}^{N} q_2^{k,j} - Z_2 \sum_{k=1}^{N} q_1^{i,k} + \sum_{k=1}^{N} q_1^{i,k} q_2^{k,j} + \dfrac{b_r}{S_1 S_2}\Big)\Bigg)$$

Finally, the int32 accumulator is clamped back to the uint8 range:

$$q_{\mathrm{uint8}} = \mathrm{clamp}(0,\, 255,\, q_{\mathrm{int32}})$$
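The integer-only inference path above can be sketched in numpy. This is a minimal illustration, not the paper's implementation: the function names `quantize`, `dequantize`, and `quantized_matmul` are my own, and the real multiplier $M = S_1 S_2 / S_3$ is applied here in floating point, whereas the paper realizes it as a fixed-point integer multiply plus bit shift.

```python
import numpy as np

def quantize(r, S, Z):
    """Map real values r to uint8 via q = round(r/S + Z), clamped to [0, 255]."""
    q = np.round(r / S + Z)
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, S, Z):
    """Recover approximate real values: r = S * (q - Z)."""
    return S * (q.astype(np.int32) - Z)

def quantized_matmul(q1, Z1, S1, q2, Z2, S2, S3, Z3):
    """Matmul on quantized operands: accumulate in int32, then requantize.

    Equivalent to the derivation above: the accumulator holds
    sum_k (q1 - Z1)(q2 - Z2), which is then scaled by M = S1*S2/S3.
    """
    acc = (q1.astype(np.int32) - Z1) @ (q2.astype(np.int32) - Z2)  # int32 accumulator
    M = S1 * S2 / S3  # the paper applies M as fixed-point multiply + shift
    q3 = np.round(M * acc) + Z3
    return np.clip(q3, 0, 255).astype(np.uint8)  # clamp(0, 255, int32)
```

Dequantizing the uint8 result should match the float matmul up to roughly one quantization step of $S_3$.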

## 2. Training with simulated quantization

$$s(a, b, n) = \dfrac{b - a}{n - 1},\qquad q = \mathrm{round}\!\Big(\dfrac{\mathrm{clamp}(a, b, r) - a}{s}\Big)$$

$$r' = q \cdot s(a, b, n) + a$$
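The simulated-quantization forward pass is a quantize-then-dequantize round trip, so the float training graph sees the rounding error that real quantization will introduce. A minimal numpy sketch (`fake_quantize` is an illustrative name, not the paper's API):

```python
import numpy as np

def fake_quantize(r, a, b, n=256):
    """Simulated quantization: clamp to [a, b], snap to one of n levels,
    and return the dequantized float value r'."""
    s = (b - a) / (n - 1)                      # step size s(a, b, n)
    q = np.round((np.clip(r, a, b) - a) / s)   # integer level in [0, n-1]
    return q * s + a                           # dequantized value r'
```

In training, the backward pass treats this round trip as the identity (straight-through estimator), since `round` has zero gradient almost everywhere.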

## 3. Batch normalization folding

$$\gamma \dfrac{x - \mu}{\sigma} + \beta$$

$$w_{\mathrm{fold}} = \dfrac{\gamma w}{\sigma},\qquad b_{\mathrm{fold}} = \beta - \dfrac{\gamma \mu}{\sigma}$$
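The folding can be sketched for a linear layer (no bias of its own) followed by per-output-channel batch norm; `fold_batch_norm` is an illustrative name, and `eps` is the usual numerical-stability constant added to the variance:

```python
import numpy as np

def fold_batch_norm(w, gamma, beta, mu, var, eps=1e-5):
    """Fold BN parameters into the preceding layer's weights.

    w: (out, in) weight matrix; gamma, beta, mu, var: per-output-channel
    BN scale, shift, running mean, and running variance.
    Returns (w_fold, b_fold) such that w_fold @ x + b_fold == BN(w @ x).
    """
    sigma = np.sqrt(var + eps)
    w_fold = w * (gamma / sigma)[:, None]   # w_fold = gamma * w / sigma
    b_fold = beta - gamma * mu / sigma      # b_fold = beta - gamma * mu / sigma
    return w_fold, b_fold
```

After folding, quantization-aware training fake-quantizes `w_fold` directly, so inference sees a single conv/linear layer with no separate BN op.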

# 4. Experiment

## 2.1 MobileNet on ImageNet

The experiments run on three hardware platforms:

1. Snapdragon 835 LITTLE core
2. Snapdragon 835 big core
3. Snapdragon 821 big core

Results are observed by varying MobileNet's depth multiplier (DM) and input resolution.

The figure above plots accuracy against latency for the float and 8-bit models on the Snapdragon 835: at the same latency, the 8-bit model reaches higher accuracy.

