AI & DL & ML (notes on artificial intelligence, deep learning, and machine learning) note 1.1

AI Learning Notes

Quantization

Basic Knowledge

Concept

Quantization converts the weights and activation values of a model from a high-bit-width representation to a low bit width, e.g. Float32 -> Float16, Int8 (most popular, range -128 to 127), or UInt8 (range 0 to 255).

Pros
  • Reduces model size: an Int8 model is about 1/4 the size of its Float32 counterpart.
  • Improves inference speed, allowing more data to be processed in the same amount of time.
  • Fits hardware accelerators such as DSPs and NPUs.

Symmetric & Asymmetric Quantization

Symmetric Quantization

Float -> Int8 (symmetric quantization)

Scale factor:
$\Delta = \max(\mathrm{abs}(r_{max}), \mathrm{abs}(r_{min}))$
Quantization procedure: round to the nearest integer according to the scale factor, then clip, producing a fixed-point number:
$x_{init} = \mathrm{round}(\frac{x}{\Delta})$
$x_Q = \mathrm{clip}(x_{init}, -\frac{N_{levels}}{2}, \frac{N_{levels}}{2}-1)$, if signed
$N_{levels} = 256$ for 8 bits of precision
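The symmetric scheme above can be sketched in a few lines of Python (a minimal illustration using plain Python lists; dividing the max by 128 follows the weight-scale formula used later in these notes, and real frameworks differ in whether they divide by 127 or 128):

```python
def symmetric_quantize(x, n_levels=256):
    """Symmetric quantization of a list of floats to signed 8-bit codes."""
    max_abs = max(abs(v) for v in x)              # Delta = max(|r_min|, |r_max|)
    if max_abs == 0:
        return [0] * len(x), 1.0                  # all-zero input: nothing to scale
    scale = max_abs / (n_levels // 2)             # e.g. max/128 for 8 bits
    lo, hi = -(n_levels // 2), n_levels // 2 - 1  # signed range [-128, 127]
    q = [max(lo, min(hi, round(v / scale))) for v in x]
    return q, scale
```

For input [-1.0, 0.0, 0.5, 1.0] this yields codes [-128, 0, 64, 127] with scale 1/128; note the largest positive value saturates at 127.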

Asymmetric Quantization

The procedure and principle are similar to symmetric quantization, except the range changes and a zero-point offset $z$ is added:
$\Delta = (r_{max} - r_{min})/255$
$z = -\frac{r_{min}}{\Delta}$
$x_{init} = \mathrm{round}(\frac{x}{\Delta}) + z$
$x_Q = \mathrm{clip}(x_{init}, 0, N_{levels}-1)$
$N_{levels} = 256$ for 8 bits of precision

However, this unsaturated linear quantization is generally considered to lose a relatively large amount of precision.
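A matching sketch of the asymmetric scheme (plain Python; rounding the zero point $z$ to an integer is an implementation choice the note does not spell out):

```python
def asymmetric_quantize(x, n_levels=256):
    """Asymmetric quantization of a list of floats to unsigned 8-bit codes."""
    r_min, r_max = min(x), max(x)
    delta = (r_max - r_min) / (n_levels - 1)  # Delta = (r_max - r_min)/255
    z = round(-r_min / delta)                 # zero-point offset
    q = [max(0, min(n_levels - 1, round(v / delta) + z)) for v in x]
    return q, delta, z
```

Unlike the symmetric case, $r_{min}$ and $r_{max}$ each map to an end of the [0, 255] range, so none of the codes are wasted when the data is not centered on zero.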

Post-Training Quantization & Quantization-Aware Training

(Post-Training) TensorRT Quantization

Activation values -> saturated quantization, choosing a suitable threshold $abs(T)$
Weights -> direct unsaturated quantization
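The effect of saturated quantization can be illustrated with a small sketch (a toy under stated assumptions, not TensorRT's actual calibrator; TensorRT selects the threshold T by minimizing the KL divergence between the original and the clipped-then-quantized activation distributions):

```python
def saturated_quantize(x, threshold, n_levels=256):
    """Quantize with a clipping threshold T: values beyond |T| saturate to the extreme codes."""
    scale = threshold / (n_levels // 2 - 1)        # map |T| onto code 127
    lo, hi = -(n_levels // 2 - 1), n_levels // 2 - 1
    return [max(lo, min(hi, round(v / scale))) for v in x]
```

With threshold 1.0, an outlier activation of 2.0 saturates to code 127 instead of stretching the scale and wasting codes on rare extreme values, which is why saturation suits activation distributions with long tails.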

Simulated-Quantization Training

In the forward pass, weights and activations are quantized to 8 bits and then dequantized back to 32-bit floats carrying the quantization error; training itself still runs in floating point.
In the backward pass, the gradient is computed with respect to the simulated-quantized weights, and this gradient is used to update the original (pre-quantization) float weights.
Taking symmetric quantization as an example:
$x_{init} = \mathrm{round}(\frac{x}{\Delta})$
$x_Q = \mathrm{clip}(x_{init}, -\frac{N_{levels}}{2}, \frac{N_{levels}}{2}-1)$, if signed
$x_{out} = x_Q \Delta$
Here $x_{out}$ is the dequantized output; it carries a certain error, and this value is what the forward pass then uses.
For the gradient:
$\omega_{float} = \omega_{float} - \eta \frac{\partial L}{\partial \omega_{out}} \cdot I_{\omega_{out}\in (\omega_{min}, \omega_{max})}$
$\omega_{out} = SimQuant(\omega_{float})$
where $SimQuant$ is the same computation as $x_{out}$ above, and $\eta$ is the learning rate.
The goal is to let the network learn the error introduced by quantization.
The weight scale is obtained directly from the maximum value in each forward pass:
$weight\ scale = \max(\mathrm{abs}(weight))/128$
The activation scale is similar, but its max is tracked during training with an EMA (exponential moving average):
$max = max \cdot momenta + \max(\mathrm{abs}(activation)) \cdot (1 - momenta),\ momenta = 0.95$
$scale = max/128$
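The forward quantize-dequantize step together with the EMA max tracking can be sketched as follows (a simplified per-tensor version; real QAT frameworks also handle per-channel scales and the straight-through gradient, which plain Python does not express):

```python
class FakeQuant:
    """Quantize-dequantize ("fake quant") with an EMA-tracked max, momenta = 0.95."""
    def __init__(self, momenta=0.95):
        self.momenta = momenta
        self.running_max = None

    def __call__(self, x):
        batch_max = max(abs(v) for v in x)
        if self.running_max is None:
            self.running_max = batch_max
        else:
            # max = max * momenta + max(abs(activation)) * (1 - momenta)
            self.running_max = self.running_max * self.momenta + batch_max * (1 - self.momenta)
        scale = self.running_max / 128                 # scale = max / 128
        out = []
        for v in x:
            q = max(-128, min(127, round(v / scale)))  # x_Q = clip(round(x / Delta))
            out.append(q * scale)                      # x_out = x_Q * Delta (dequantize)
        return out
```

Each call both updates the running max and returns floats that carry the quantization error, which is exactly what the forward pass consumes so the network can learn around that error.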
Simulated-quantization training also requires folding batch norm into the convolution parameters for inference. A convolution layer receives the original float values and computes the activation values; from these, $\gamma$ and $\beta$ along with the resulting mean and variance are computed and folded into the convolution parameters, which are then quantized before running the convolution.

Implementation Details

  1. The quantized weights are restricted to (-127, 127). Normal 8-bit values lie in $[-2^7, 2^7-1]$; the product of two such values lies in $(-2^{14}, 2^{14}]$, and accumulating two products extends the range to $(-2^{15}, 2^{15}]$, which exceeds $2^{15}-1$, the largest positive value representable in int16. With the restriction, the result of a single multiplication stays below $2^{14}$.
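The range argument can be checked numerically (assuming the weights are clipped to [-127, 127] while the other operand keeps the full int8 range [-128, 127]):

```python
# With weights restricted to [-127, 127], one int8 product stays below 2^14 ...
max_product = 127 * 128              # largest |weight * activation|
assert max_product < 2**14           # 16256 < 16384
# ... so two accumulated products still fit in a signed 16-bit accumulator.
assert 2 * max_product <= 2**15 - 1  # 32512 <= 32767
# Without the restriction, (-128) * (-128) = 2^14, and two such products
# reach 2^15, which overflows int16's positive range.
assert 2 * (128 * 128) > 2**15 - 1
```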