1 Basic programming knowledge for this article's code
1.1 Abstract class, abstractmethod, classmethod and metaclass
Abstract classes and abstractmethod are mainly used to define a set of specifications, ensuring that implementing classes follow a specific interface. For example, abstractmethod is defined in Python's standard-library abc.py as:
def abstractmethod(funcobj):
    """A decorator indicating abstract methods.

    Requires that the metaclass is ABCMeta or derived from it.  A
    class that has a metaclass derived from ABCMeta cannot be
    instantiated unless all of its abstract methods are overridden.
    The abstract methods can be called using any of the normal
    'super' call mechanisms.  abstractmethod() may be used to declare
    abstract methods for properties and descriptors.

    Usage:

        class C(metaclass=ABCMeta):
            @abstractmethod
            def my_abstract_method(self, ...):
                ...
    """
    funcobj.__isabstractmethod__ = True
    return funcobj
To instantiate a subclass of an abstract class (Father), all of Father's abstract methods must be overridden in the subclass; the abstract class itself can never be instantiated.
Usage:
from abc import ABCMeta, abstractmethod

class Father(metaclass=ABCMeta):
    def __init__(self, name, sex):
        self.name = name
        self.sex = sex

    @abstractmethod
    def my_instance_method(self):
        """An abstract method; subclasses must implement it."""
        pass

    @classmethod
    @abstractmethod
    def my_class_method(cls):
        """An abstract class method; subclasses must implement it. cls is the
        first parameter of a class method: a reference to the class itself,
        which lets the method access and modify class state."""
        pass

class Son(Father):
    def my_instance_method(self):
        print("instance method implementation")

    @classmethod
    def my_class_method(cls):
        print("class method implementation", cls)

# Call the class method
Son.my_class_method()  # Output: class method implementation <class '__main__.Son'>
So, Father itself cannot be instantiated, and a subclass of Father can only be instantiated once it overrides all of Father's abstract methods.
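The rule above can be checked directly: a subclass that fails to override the abstract method raises TypeError on instantiation. A minimal sketch (the class names Incomplete and the return value are illustrative, not from the article):

```python
from abc import ABCMeta, abstractmethod

class Father(metaclass=ABCMeta):
    def __init__(self, name, sex):
        self.name = name
        self.sex = sex

    @abstractmethod
    def my_instance_method(self):
        pass

class Incomplete(Father):
    pass  # does NOT override the abstract method

class Son(Father):
    def my_instance_method(self):
        return "implemented"

try:
    Incomplete("Tom", "male")   # abstract method not overridden
except TypeError as e:
    print(e)  # "Can't instantiate abstract class Incomplete ..."

print(Son("Tom", "male").my_instance_method())
```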
1.2 property
The property decorator exposes a method as a managed attribute of the instance; on its own it is read-only, and adding a setter and a deleter makes the attribute writable and deletable.
class Circle:
    def __init__(self, radius):
        self.radius = radius

    @property
    def diameter(self):
        return self.radius * 2

    @diameter.setter
    def diameter(self, value):
        self.radius = value / 2

    @diameter.deleter
    def diameter(self):
        del self.radius

# Usage example
circle = Circle(5)
print(circle.radius)    # Output: 5
print(circle.diameter)  # Output: 10
circle.diameter = 8     # Without @diameter.setter this assignment would fail
print(circle.radius)    # Output: 4.0
print(circle.diameter)  # Output: 8.0
del circle.diameter
print(circle.radius)    # Raises AttributeError: 'Circle' object has no attribute 'radius'
2 Some quantization methods
SpQR, OWQ, AWQ
3 OmniQuant
The goal of quantization is
$$\arg\min_{\Theta_1,\Theta_2}||\mathcal{F}(\mathbf{W},\mathbf{X})-\mathcal{F}\big(Q_w(\mathbf{W};\Theta_1,\Theta_2),Q_a(\mathbf{X},\Theta_2)\big)||\tag{1}$$
where $\mathcal{F}$ represents the mapping function for a transformer block, $\mathbf{W}$ and $\mathbf{X}$ are the full-precision weight and activation, $Q_w$ and $Q_a$ are the weight and activation quantizers, and $\Theta_1$ and $\Theta_2$ are the quantization parameters of LWC and LET, respectively. The idea of OmniQuant draws heavily on AWQ.
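As a toy sketch of objective (1): here $\mathcal{F}$ is reduced to a single linear map and both quantizers are naive MinMax round-to-nearest stand-ins (the real method uses a full transformer block and the learnable quantizers described below; all names, shapes and bit-widths are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))   # full-precision weight
X = rng.normal(size=(4, 8))   # full-precision activation

def quantize(t, n_bits=4):
    """Naive asymmetric MinMax quantize-dequantize (stand-in for Q_w / Q_a)."""
    h = (t.max() - t.min()) / (2 ** n_bits - 1)       # step size
    z = -np.round(t.min() / h)                        # zero point
    q = np.clip(np.round(t / h) + z, 0, 2 ** n_bits - 1)
    return (q - z) * h                                # dequantize back

def F(W, X):
    return X @ W   # toy "transformer block": a single linear map

# Block-wise quantization error that Theta_1, Theta_2 would be trained to minimize
err = np.linalg.norm(F(W, X) - F(quantize(W), quantize(X)))
print(err)
```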
3.1 Initialization of Quantization Parameters
3.2 Block-wise Quantization Error Minimization
3.3 Learnable Weight Clipping (LWC)
Weight-only quantization:
$$\mathbf{W_q}=\mathrm{clamp}(\lfloor\frac{\mathbf{W}}{h}\rceil+z,0,2^N-1),\ \mathrm{where}\ h=\frac{\gamma\max(\mathbf{W})-\beta\min(\mathbf{W})}{2^N-1},\ z=-\lfloor\frac{\beta\min(\mathbf{W})}{h}\rceil\tag{2}$$
where $\gamma\in[0,1]$ and $\beta\in[0,1]$ are learnable clipping strengths for the upper and lower weight bounds. The meaning of $\mathrm{clamp}(\lfloor\frac{\mathbf{W}}{h}\rceil+z,0,2^N-1)$ is simple saturation: any value larger than $2^N-1$ is set to $2^N-1$, and any value smaller than $0$ is set to $0$. So, $\Theta_1=\{\gamma,\beta\}$ in Eqn. 1.
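Eqn. 2 can be sketched directly in numpy. Note that in OmniQuant $\gamma$ and $\beta$ are obtained from learnable parameters (via sigmoids); here they are plain floats, and the example weights are illustrative:

```python
import numpy as np

def lwc_quantize(W, gamma=1.0, beta=1.0, n_bits=4):
    """Eqn. 2: W_q = clamp(round(W/h) + z, 0, 2^N - 1)."""
    h = (gamma * W.max() - beta * W.min()) / (2 ** n_bits - 1)  # step size
    z = -np.round(beta * W.min() / h)                            # zero point
    Wq = np.clip(np.round(W / h) + z, 0, 2 ** n_bits - 1)
    return Wq, h, z

W = np.array([[-1.0, -0.2, 0.4, 2.0]])
Wq, h, z = lwc_quantize(W, gamma=1.0, beta=1.0, n_bits=4)
print(Wq)            # integer codes in [0, 15]
print((Wq - z) * h)  # dequantized approximation of W
```

Shrinking gamma or beta below 1 clips outliers harder but shrinks the step size h, which is exactly the trade-off LWC learns.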
3.4 Learnable Equivalent Transformation (LET)
$$\mathbf{Y}=\mathbf{X}\mathbf{W}+\mathbf{B}=[\underbrace{(\mathbf{X}-\delta)\oslash s}_{\tilde{\mathbf{X}}}]\cdot[\underbrace{s\odot\mathbf{W}}_{\tilde{\mathbf{W}}}]+[\underbrace{\mathbf{B}+\delta\mathbf{W}}_{\tilde{\mathbf{B}}}]\tag{3}$$
where $\mathbf{Y}$ represents the output, $s\in\mathbb{R}^{1\times C_{in}}$ and $\delta\in\mathbb{R}^{1\times C_{in}}$ are channel-wise scaling and shifting parameters, respectively, $\tilde{\mathbf{X}}$, $\tilde{\mathbf{W}}$ and $\tilde{\mathbf{B}}$ are the equivalent activation, weight and bias, and $\odot$ and $\oslash$ are elementwise multiplication and division.
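Eqn. 3 is an exact re-parameterization: the per-channel scale applied to the activations cancels against the scale folded into the weights. A minimal numpy sketch (random shapes and names are illustrative) verifies the equivalence numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))               # activations, C_in = 8
W = rng.normal(size=(8, 3))               # weights, C_out = 3
B = rng.normal(size=(3,))                 # bias
s = rng.uniform(0.5, 2.0, size=(1, 8))    # channel-wise scaling
delta = rng.normal(size=(1, 8))           # channel-wise shifting

X_t = (X - delta) / s           # X~ = (X - delta) ⊘ s
W_t = s.T * W                   # W~ = s ⊙ W (scale each input channel)
B_t = B + (delta @ W).ravel()   # B~ = B + delta · W

print(np.allclose(X @ W + B, X_t @ W_t + B_t))  # True
```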
Finally, quantization on the transformed activations and weights is performed as given by
$$\mathbf{Y}=Q_a(\tilde{\mathbf{X}})Q_w(\tilde{\mathbf{W}})+\tilde{\mathbf{B}}\tag{4}$$
where $Q_a$ is the vanilla MinMax quantizer and $Q_w$ is the MinMax quantizer with LWC.
The learnable equivalent transformation of the self-attention affinity matrix can be written as:
$$\mathbf{P}=\mathrm{Softmax}(\mathbf{Q}\mathbf{K}^T)=\mathrm{Softmax}\big((\underbrace{\mathbf{Q}\oslash s_a}_{\tilde{\mathbf{Q}}})(\underbrace{s_a\odot\mathbf{K}^T}_{\tilde{\mathbf{K}}^T})\big)$$
where $s_a\in\mathbb{R}^{1\times C_{out}}$ is the scaling factor in the affinity matrix. Similar to Eqn. 3, the quantized affinity-matrix calculation is expressed as
$$\mathbf{P}=\mathrm{Softmax}(Q_a(\tilde{\mathbf{Q}})Q_a(\tilde{\mathbf{K}}^T)).$$
And $\Theta_2=\{\delta,s,s_a\}$ in Eqn. 1.
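As with Eqn. 3, the affinity scaling cancels exactly: dividing the queries by $s_a$ while multiplying $\mathbf{K}^T$ by $s_a$ leaves $\mathbf{Q}\mathbf{K}^T$ unchanged (Softmax is omitted here since the equivalence holds inside it). A numpy sketch with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 6))                # queries
K = rng.normal(size=(4, 6))                # keys
s_a = rng.uniform(0.5, 2.0, size=(1, 6))   # per-channel affinity scale

Q_t = Q / s_a        # Q~   = Q ⊘ s_a
Kt_t = s_a.T * K.T   # K~^T = s_a ⊙ K^T (scale the rows of K^T)

print(np.allclose(Q @ K.T, Q_t @ Kt_t))  # True
```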
3.5 Quantization Operation
3.6 Training and Optimization
3.7 Iterative Calibration
3.8 Model Evaluation
Deployment
reference