2024-06-05 Log: OmniQuant

1 Basic programming knowledge for this article's code

1.1 Abstract class, abstractmethod, classmethod and metaclass

Abstract classes and abstractmethod are mainly used to define a set of specifications, ensuring that implementing classes follow a specific interface. For example, abstractmethod is defined in Python's standard library abc.py as:

def abstractmethod(funcobj):
    """A decorator indicating abstract methods.

    Requires that the metaclass is ABCMeta or derived from it.  A
    class that has a metaclass derived from ABCMeta cannot be
    instantiated unless all of its abstract methods are overridden.
    The abstract methods can be called using any of the normal
    'super' call mechanisms.  abstractmethod() may be used to declare
    abstract methods for properties and descriptors.

    Usage:

        class C(metaclass=ABCMeta):
            @abstractmethod
            def my_abstract_method(self, ...):
                ...
    """
    funcobj.__isabstractmethod__ = True
    return funcobj

A class whose metaclass is ABCMeta (here, Father) cannot be instantiated directly; a subclass can only be instantiated once it overrides all of the abstract methods.
Usage:

from abc import ABCMeta, abstractmethod

class Father(metaclass=ABCMeta):
    def __init__(self, name, sex):
        self.name = name
        self.sex = sex

    @abstractmethod
    def my_instance_method(self):
        """An abstract method; subclasses must implement it."""
        pass

    @classmethod
    @abstractmethod
    def my_class_method(cls):
        """An abstract class method that subclasses must implement. cls is the
        first argument of a class method: a reference to the class itself,
        which lets the method access and modify class-level state."""
        pass

class Son(Father):
    def my_instance_method(self):
        print("instance method implementation")

    @classmethod
    def my_class_method(cls):
        print("class method implementation", cls)

# Call the class method
Son.my_class_method()  # Output: class method implementation <class '__main__.Son'>

So Father itself can never be instantiated; only a subclass that overrides all of Father's abstract methods can be.
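A quick check with the classes above (the exact wording of the TypeError message depends on the Python version):

# Father still has unimplemented abstract methods, so it cannot be instantiated.
try:
    Father("Tom", "M")
except TypeError as e:
    print(e)  # e.g. "Can't instantiate abstract class Father ..."

# Son overrides every abstract method, so it can be instantiated and used.
son = Son("Tom", "M")
son.my_instance_method()  # instance method implementation
Son.my_class_method()     # class method implementation <class '__main__.Son'>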

1.2 property

@property turns a method into a managed attribute. On its own it creates a read-only attribute; adding a setter and a deleter, as below, also makes it writable and deletable.

class Circle:
    def __init__(self, radius):
        self.radius = radius
    @property
    def diameter(self):
        return self.radius * 2
    @diameter.setter
    def diameter(self, value):
        self.radius = value / 2
    @diameter.deleter
    def diameter(self):
        del self.radius

# Usage example
circle = Circle(5)
print(circle.radius)    # Output: 5
print(circle.diameter)  # Output: 10

circle.diameter = 8     # Without the @diameter.setter defined above, this assignment would fail
print(circle.radius)    # Output: 4.0
print(circle.diameter)  # Output: 8.0

del circle.diameter
print(circle.radius)    # Raises AttributeError: 'Circle' object has no attribute 'radius'

2 Some quantization methods

Fig. 2.1 Mixed-precision quantization (figure not shown)

SpQR, OWQ, AWQ [1]

3 OmniQuant

The goal of quantization is
$$\arg\min_{\Theta_1,\Theta_2}\big\|\mathcal{F}(\mathbf{W},\mathbf{X})-\mathcal{F}\big(Q_w(\mathbf{W};\Theta_1,\Theta_2),\,Q_a(\mathbf{X},\Theta_2)\big)\big\|\tag{1}$$
where $\mathcal{F}$ is the mapping function of a transformer block, $\mathbf{W}$ and $\mathbf{X}$ are the full-precision weight and activation, $Q_w$ and $Q_a$ are the weight and activation quantizers, and $\Theta_1$ and $\Theta_2$ are the quantization parameters of LWC and LET, respectively. The idea of OmniQuant largely comes from AWQ.
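As a reading aid, here is a minimal PyTorch-style sketch of how this block-wise objective (Section 3.2) could be evaluated; fp_block, quant_block, quant_params and calib_batch are placeholder names for this sketch, not OmniQuant's actual code.

import torch

def block_wise_loss(fp_block, quant_block, x):
    # F(W, X): output of the frozen full-precision transformer block
    with torch.no_grad():
        target = fp_block(x)
    # F(Q_w(W; Θ1, Θ2), Q_a(X, Θ2)): output of the block with quantized weights/activations
    pred = quant_block(x)
    return torch.norm(target - pred)

# Typical usage, with the LWC/LET parameters Θ1, Θ2 collected in quant_params:
# optimizer = torch.optim.AdamW(quant_params, lr=1e-3)
# loss = block_wise_loss(fp_block, quant_block, calib_batch)
# loss.backward()
# optimizer.step()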

3.1 Initialization of Quantization Parameters

3.2 Block-wise Quantization Error Minimization

3.3 Learnable Weight Clipping (LWC)

Weight-only quantization:
$$\mathbf{W}_q=\mathrm{clamp}\Big(\Big\lfloor\frac{\mathbf{W}}{h}\Big\rceil+z,\,0,\,2^N-1\Big),\quad\text{where }h=\frac{\gamma\max(\mathbf{W})-\beta\min(\mathbf{W})}{2^N-1},\;z=-\Big\lfloor\frac{\beta\min(\mathbf{W})}{h}\Big\rceil\tag{2}$$
where $\gamma\in[0,1]$ and $\beta\in[0,1]$. The $\mathrm{clamp}(\lfloor\frac{\mathbf{W}}{h}\rceil+z,\,0,\,2^N-1)$ simply saturates the result: any value larger than $2^N-1$ is set to $2^N-1$ and any value smaller than $0$ is set to $0$. Nothing special.
Here is an example of this [2]:
Fig. 2.2 Asymmetric quantization (figure not shown)

So $\Theta_1=\{\gamma,\beta\}$ in Eqn. 1.
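A minimal NumPy sketch of the quantizer in Eqn. 2 (per-tensor for simplicity; gamma and beta are fixed scalars here, whereas in OmniQuant they are learnable parameters in [0, 1] trained through the block-wise loss):

import numpy as np

def lwc_quantize(W, N=4, gamma=1.0, beta=1.0):
    # h: step size, z: zero point, following Eqn. 2
    h = (gamma * W.max() - beta * W.min()) / (2**N - 1)
    z = -np.round(beta * W.min() / h)
    Wq = np.clip(np.round(W / h) + z, 0, 2**N - 1)  # clamp(round(W/h) + z, 0, 2^N - 1)
    return Wq, h, z

W = np.random.randn(4, 4).astype(np.float32)
Wq, h, z = lwc_quantize(W, N=4, gamma=0.9, beta=0.9)
W_hat = (Wq - z) * h   # de-quantized approximation of W (outliers are clipped)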

3.4 Learnable Equivalent Transformation (LET)

$$\mathbf{Y}=\mathbf{X}\mathbf{W}+\mathbf{B}=\big[\underbrace{(\mathbf{X}-\delta)\oslash s}_{\tilde{\mathbf{X}}}\big]\cdot\big[\underbrace{s\odot\mathbf{W}}_{\tilde{\mathbf{W}}}\big]+\big[\underbrace{\mathbf{B}+\delta\mathbf{W}}_{\tilde{\mathbf{B}}}\big]\tag{3}$$
where $\mathbf{Y}$ is the output, $s\in\mathbb{R}^{1\times C_{in}}$ and $\delta\in\mathbb{R}^{1\times C_{in}}$ are channel-wise scaling and shifting parameters, respectively, and $\tilde{\mathbf{X}}$, $\tilde{\mathbf{W}}$ and $\tilde{\mathbf{B}}$ are the equivalent activation, weight and bias. $\odot$ and $\oslash$ denote element-wise multiplication and division.
Finally, quantization is performed on the transformed activations and weights:
$$\mathbf{Y}=Q_a(\tilde{\mathbf{X}})\,Q_w(\tilde{\mathbf{W}})+\tilde{\mathbf{B}}\tag{4}$$
where $Q_a$ is the vanilla MinMax quantizer and $Q_w$ is the MinMax quantizer with LWC.
The learnable equivalent transformation of the self-attention affinity matrix can be written as
$$\mathbf{P}=\mathrm{Softmax}(\mathbf{Q}\mathbf{K}^T)=\mathrm{Softmax}\big((\underbrace{\mathbf{Q}\oslash s_a}_{\tilde{\mathbf{Q}}})(\underbrace{s_a\odot\mathbf{K}^T}_{\tilde{\mathbf{K}}^T})\big)$$
where $s_a\in\mathbb{R}^{1\times C_{out}}$ is the scaling factor in the affinity matrix. Similar to Eqn. 4, the quantized affinity matrix is computed as $\mathbf{P}=\mathrm{Softmax}\big(Q_a(\tilde{\mathbf{Q}})\,Q_a(\tilde{\mathbf{K}}^T)\big)$.
And $\Theta_2=\{\delta,s,s_a\}$ in Eqn. 1.
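A quick NumPy check that the transformation in Eqn. 3 leaves the layer output unchanged before quantization (shapes and values here are only illustrative):

import numpy as np

T, C_in, C_out = 4, 8, 16
X = np.random.randn(T, C_in)
W = np.random.randn(C_in, C_out)
B = np.random.randn(C_out)
s = np.abs(np.random.randn(1, C_in)) + 0.1   # channel-wise scaling
delta = np.random.randn(1, C_in)             # channel-wise shifting

X_t = (X - delta) / s          # X~ = (X - delta) ⊘ s
W_t = s.T * W                  # W~ = s ⊙ W (scale each input channel's row)
B_t = B + (delta @ W).ravel()  # B~ = B + delta·W

assert np.allclose(X @ W + B, X_t @ W_t + B_t)  # Eqn. 3 holds exactly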

3.5 Quantization Operation

3.6 Training and Optimization

3.7 Iterative Calibration

3.8 Model Evaluation

Deployment

References


  1. OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.

  2. RethinkFun (2024). Model Quantization Part 1: Quantization Basics (symmetric quantization, asymmetric quantization, absmax quantization, zero-point quantization).
