DAY4 (Quantization & Tensor)

Quantization

Model choice: quantize model architectures that are already efficient at trading off latency for accuracy.

Some schemes represent weights as binary or ternary values and replace multiplications with bit-shifts.

  • Quantize values (weights and activations) so that inference uses integer arithmetic operations.

  • Scheme: integer-arithmetic inference with floating-point training.

  • equivalently, the quantization scheme is an affine mapping from integers q (the bit representation / quantized value) to real numbers r:
    r = S(q - Z)
    A single set of quantization parameters {S, Z} is used for all values within each activation array and each weight array.

  • S (scale): an arbitrary positive real number, stored in floating point.

  • Z (zero-point): the integer q that quantizes the real value 0, so that zero is exactly representable.
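The affine mapping r = S(q - Z) can be sketched in NumPy; the scale S = 1/128 and zero-point Z = 128 below are assumed example values, not taken from the text (choosing a power-of-two scale keeps this demo exact):

```python
import numpy as np

def quantize(r, S, Z, qmin=0, qmax=255):
    """Affine quantization: map real r to integer q with r = S*(q - Z)."""
    q = np.round(r / S + Z)
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, S, Z):
    """Recover an approximation of r from the quantized values."""
    return S * (q.astype(np.int32) - Z)

# One {S, Z} pair for the whole array, as in the text.
S, Z = 1.0 / 128, 128
r = np.array([-1.0, 0.0, 0.5])
q = quantize(r, S, Z)             # array([  0, 128, 192], dtype=uint8)
r_hat = dequantize(q, S, Z)       # recovers [-1.0, 0.0, 0.5] exactly here
```

Note that q = Z dequantizes to exactly 0, which is why the zero-point is kept as an integer.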

Multiplication:

  • for r3 = r1 * r2 with each ri = Si(qi - Zi), substitution gives (in the matrix-multiplication case)
    q3[i,k] = Z3 + M * Σ_j (q1[i,j] - Z1)(q2[j,k] - Z2)
  • M = S1S2/S3 is empirically always in the interval (0,1), and can therefore be expressed in the normalized form
    M = 2^(-n)*M0 (M0 is in the interval [0.5,1))
    so multiplying by M becomes a fixed-point multiplication by M0 followed by a bit-shift by n
  • the summation above takes 2N^3 arithmetic operations for N×N matrices
  • take the q1 matrix to be the weights and the q2 matrix to be the activations
  • accumulation uses a 32-bit register: int32 += uint8 * uint8
  • biases are quantized with Sbias = S1*S2 and Zbias = 0 and added into the int32 accumulator
  • the int32 accumulator is then scaled down by M (via M0 and the bit-shift) and cast down to uint8; the ReLU activation is fused into this saturating 8-bit cast

Training with simulated quantization

  • simulate quantization in the forward pass of training

  • weights and biases are stored in floating point; backpropagation operates on the float values

  • weights are quantized before they are convolved with the input

  • activations are quantized at points where they would be during inference

  • bias quantization parameters are inferred from those of the weights and activations (biases are stored as 32-bit integers)

For each layer, quantization is parameterized by the number of quantization levels and the clamping range:
    clamp(r; a, b) := min(max(r, a), b)
    s(a, b, n) := (b - a)/(n - 1)
    q(r; a, b, n) := round((clamp(r; a, b) - a)/s(a, b, n)) * s(a, b, n) + a
where r is a real-valued number to be quantized, [a; b] is the clamping range, n is the number of quantization levels, and round(.) denotes rounding to the nearest integer.
n is fixed for all layers, e.g. n = 2^8 = 256 for 8-bit quantization.
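A minimal sketch of the simulated ("fake") quantization function q(r; a, b, n), assuming an example clamping range [-1, 1]:

```python
import numpy as np

def fake_quant(r, a, b, n=256):
    """Simulated quantization: clamp r to [a, b], snap to one of n evenly
    spaced levels, and return the result as a float."""
    step = (b - a) / (n - 1)                 # step size s(a, b, n)
    r_clamped = np.clip(r, a, b)
    return np.round((r_clamped - a) / step) * step + a

# 8-bit simulation (n = 2^8 = 256) on an assumed clamping range [-1, 1].
x = np.array([-1.5, -0.2, 0.7, 3.0])
xq = fake_quant(x, a=-1.0, b=1.0)
# out-of-range values are clamped to -1.0 and 1.0; in-range values move
# to the nearest of the 256 levels
```

Because the output stays in floating point, this function can sit in the forward pass during training while backpropagation still sees float values.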

Learning quantization ranges

  • For weights:
  • a := min w, b := max w
  • a minor tweak nudges the quantized weight range to [-127, 127], so the value -128 is never produced
  • For activations:
  • ranges depend on the inputs to the network
  • the learned quantization parameters map to the scale S and zero-point Z via S = s(a, b, n), Z = z(a, b, n)
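A sketch of how a learned range (a, b) with n levels yields S and Z; the functions s and z follow the step-size definition above, and the range [-0.5, 1.5] is an assumed example:

```python
def s(a, b, n):
    """Step size implied by clamping range [a, b] with n quantization levels."""
    return (b - a) / (n - 1)

def z(a, b, n):
    """Zero-point: the integer q representing real 0 in r = S*(q - Z),
    obtained by rounding -a/S so that 0 falls exactly on a level."""
    return int(round(-a / s(a, b, n)))

# Assumed example: a learned activation range [-0.5, 1.5] with n = 256 levels.
a, b, n = -0.5, 1.5, 256
S, Z = s(a, b, n), z(a, b, n)
# S = 2/255, Z = 64: the real value 0 maps to the integer level 64
```

Rounding the zero-point (rather than keeping it fractional) is what guarantees zero is exactly representable after quantization.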

Tensor

Three attributes of a Tensor object:
rank: the number of dimensions
shape: the size along each dimension (e.g. rows and columns for a matrix)
type: the data type of the tensor's elements
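The three attributes can be inspected on a NumPy array, whose ndim/shape/dtype correspond to rank, shape, and type (a tf.Tensor exposes analogous attributes):

```python
import numpy as np

# A rank-2 tensor (matrix) of 32-bit floats.
t = np.zeros((3, 4), dtype=np.float32)

print(t.ndim)    # rank:  number of dimensions -> 2
print(t.shape)   # shape: size along each dimension -> (3, 4)
print(t.dtype)   # type:  element data type -> float32
```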
