DAY4 (Quantization & Tensor)

Quantization

Model choice: quantize model architectures that are already efficient at trading off latency for accuracy.

Some schemes represent weights as binary or ternary values and replace multiplications with bit-shifts.

  • Quantize values (weights and activations) so that inference uses integer arithmetic operations.

  • Scheme: integer-arithmetic inference with floating-point training.

  • equivalently, the quantization scheme is an affine mapping from integers q (the bit representation / quantized value) to real numbers r:
    r = S(q - Z)
    A single set of quantization parameters {S, Z} is used for all values within each activation array and each weight array.

  • S (scale): an arbitrary positive real number, stored in floating point.

  • Z (zero-point): the integer q that quantizes the real value 0, so that zero is exactly representable.
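The affine mapping r = S(q - Z) can be sketched in NumPy; the scale S = 1/128 and zero-point Z = 128 below are assumed example values, not taken from the text (choosing a power-of-two scale keeps this demo exact):

```python
import numpy as np

def quantize(r, S, Z, qmin=0, qmax=255):
    """Affine quantization: map real r to integer q with r = S*(q - Z)."""
    q = np.round(r / S + Z)
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, S, Z):
    """Recover an approximation of r from the quantized values."""
    return S * (q.astype(np.int32) - Z)

# One {S, Z} pair for the whole array, as in the text.
S, Z = 1.0 / 128, 128
r = np.array([-1.0, 0.0, 0.5])
q = quantize(r, S, Z)             # array([  0, 128, 192], dtype=uint8)
r_hat = dequantize(q, S, Z)       # recovers [-1.0, 0.0, 0.5] exactly here
```

Note that q = Z dequantizes to exactly 0, which is why the zero-point is kept as an integer.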

Multiplication:

  • for r3 = r1 * r2 with each ri = Si(qi - Zi), substitution gives (in the matrix-multiplication case)
    q3[i,k] = Z3 + M * Σ_j (q1[i,j] - Z1)(q2[j,k] - Z2)
  • M = S1S2/S3 is empirically always in the interval (0,1), and can therefore be expressed in the normalized form
    M = 2^(-n)*M0 (M0 is in the interval [0.5,1))
    so multiplying by M becomes a fixed-point multiplication by M0 followed by a bit-shift by n
  • the summation above takes 2N^3 arithmetic operations for N×N matrices
  • take the q1 matrix to be the weights and the q2 matrix to be the activations
  • accumulation uses a 32-bit register: int32 += uint8 * uint8
  • biases are quantized with Sbias = S1*S2 and Zbias = 0 and added into the int32 accumulator
  • the int32 accumulator is then scaled down by M (via M0 and the bit-shift) and cast down to uint8; the ReLU activation is fused into this saturating 8-bit cast

Training with simulated quantization

  • simulate quantization in the forward pass of training

  • weights and biases are stored in floating point; backpropagation operates on the float values

  • weights are quantized before they are convolved with the input

  • activations are quantized at points where they would be during inference

  • bias quantization parameters are inferred from those of the weights and activations (biases are stored as 32-bit integers)

For each layer, quantization is parameterized by the number of quantization levels and the clamping range:
    clamp(r; a, b) := min(max(r, a), b)
    s(a, b, n) := (b - a)/(n - 1)
    q(r; a, b, n) := round((clamp(r; a, b) - a)/s(a, b, n)) * s(a, b, n) + a
where r is a real-valued number to be quantized, [a; b] is the clamping range, n is the number of quantization levels, and round(.) denotes rounding to the nearest integer.
n is fixed for all layers, e.g. n = 2^8 = 256 for 8-bit quantization.
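A minimal sketch of the simulated ("fake") quantization function q(r; a, b, n), assuming an example clamping range [-1, 1]:

```python
import numpy as np

def fake_quant(r, a, b, n=256):
    """Simulated quantization: clamp r to [a, b], snap to one of n evenly
    spaced levels, and return the result as a float."""
    step = (b - a) / (n - 1)                 # step size s(a, b, n)
    r_clamped = np.clip(r, a, b)
    return np.round((r_clamped - a) / step) * step + a

# 8-bit simulation (n = 2^8 = 256) on an assumed clamping range [-1, 1].
x = np.array([-1.5, -0.2, 0.7, 3.0])
xq = fake_quant(x, a=-1.0, b=1.0)
# out-of-range values are clamped to -1.0 and 1.0; in-range values move
# to the nearest of the 256 levels
```

Because the output stays in floating point, this function can sit in the forward pass during training while backpropagation still sees float values.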

Learning quantization ranges

  • For weights:
  • a := min w, b := max w
  • a minor tweak nudges the quantized weight range to [-127, 127], so the value -128 is never produced
  • For activations:
  • ranges depend on the inputs to the network
  • the learned quantization parameters map to the scale S and zero-point Z via S = s(a, b, n), Z = z(a, b, n)
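A sketch of how a learned range (a, b) with n levels yields S and Z; the functions s and z follow the step-size definition above, and the range [-0.5, 1.5] is an assumed example:

```python
def s(a, b, n):
    """Step size implied by clamping range [a, b] with n quantization levels."""
    return (b - a) / (n - 1)

def z(a, b, n):
    """Zero-point: the integer q representing real 0 in r = S*(q - Z),
    obtained by rounding -a/S so that 0 falls exactly on a level."""
    return int(round(-a / s(a, b, n)))

# Assumed example: a learned activation range [-0.5, 1.5] with n = 256 levels.
a, b, n = -0.5, 1.5, 256
S, Z = s(a, b, n), z(a, b, n)
# S = 2/255, Z = 64: the real value 0 maps to the integer level 64
```

Rounding the zero-point (rather than keeping it fractional) is what guarantees zero is exactly representable after quantization.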

Tensor

Three attributes of a Tensor object:
rank: the number of dimensions
shape: the size along each dimension (e.g. rows and columns for a matrix)
type: the data type of the tensor's elements
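The three attributes can be inspected on a NumPy array, whose ndim/shape/dtype correspond to rank, shape, and type (a tf.Tensor exposes analogous attributes):

```python
import numpy as np

# A rank-2 tensor (matrix) of 32-bit floats.
t = np.zeros((3, 4), dtype=np.float32)

print(t.ndim)    # rank:  number of dimensions -> 2
print(t.shape)   # shape: size along each dimension -> (3, 4)
print(t.dtype)   # type:  element data type -> float32
```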
