2024-06-05 Log: OmniQuant

1 Basic programming knowledge for this article's code

1.1 Abstract class, abstractmethod, classmethod and metaclass

Abstract classes and abstractmethod are mainly used to define a set of specifications, ensuring that implementing classes follow a specific interface. For example, abstractmethod is defined in Python's standard library abc.py as:

def abstractmethod(funcobj):
    """A decorator indicating abstract methods.

    Requires that the metaclass is ABCMeta or derived from it.  A
    class that has a metaclass derived from ABCMeta cannot be
    instantiated unless all of its abstract methods are overridden.
    The abstract methods can be called using any of the normal
    'super' call mechanisms.  abstractmethod() may be used to declare
    abstract methods for properties and descriptors.

    Usage:

        class C(metaclass=ABCMeta):
            @abstractmethod
            def my_abstract_method(self, ...):
                ...
    """
    funcobj.__isabstractmethod__ = True
    return funcobj

A class whose metaclass is ABCMeta (here, Father) cannot be instantiated directly; a subclass can only be instantiated once it overrides all of the abstract methods.
Usage:

from abc import ABCMeta, abstractmethod

class Father(metaclass=ABCMeta):
    def __init__(self, name, sex):
        self.name = name
        self.sex = sex

    @abstractmethod
    def my_instance_method(self):
        """An abstract method; subclasses must implement it."""
        pass

    @classmethod
    @abstractmethod
    def my_class_method(cls):
        """An abstract class method that subclasses must implement. cls is the
        first argument of a class method: a reference to the class itself,
        which lets the method access and modify class-level state."""
        pass

class Son(Father):
    def my_instance_method(self):
        print("instance method implementation")

    @classmethod
    def my_class_method(cls):
        print("class method implementation", cls)

# Call the class method
Son.my_class_method()  # Output: class method implementation <class '__main__.Son'>

So Father itself can never be instantiated; only a subclass that overrides all of Father's abstract methods can be.
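A quick check with the classes above (the exact wording of the TypeError message depends on the Python version):

# Father still has unimplemented abstract methods, so it cannot be instantiated.
try:
    Father("Tom", "M")
except TypeError as e:
    print(e)  # e.g. "Can't instantiate abstract class Father ..."

# Son overrides every abstract method, so it can be instantiated and used.
son = Son("Tom", "M")
son.my_instance_method()  # instance method implementation
Son.my_class_method()     # class method implementation <class '__main__.Son'>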

1.2 property

@property turns a method into a managed attribute. On its own it creates a read-only attribute; adding a setter and a deleter, as below, also makes it writable and deletable.

class Circle:
    def __init__(self, radius):
        self.radius = radius
    @property
    def diameter(self):
        return self.radius * 2
    @diameter.setter
    def diameter(self, value):
        self.radius = value / 2
    @diameter.deleter
    def diameter(self):
        del self.radius

# Usage example
circle = Circle(5)
print(circle.radius)    # Output: 5
print(circle.diameter)  # Output: 10

circle.diameter = 8     # Without the @diameter.setter defined above, this assignment would fail
print(circle.radius)    # Output: 4.0
print(circle.diameter)  # Output: 8.0

del circle.diameter
print(circle.radius)    # Raises AttributeError: 'Circle' object has no attribute 'radius'

2 Some quantization methods

Fig. 2.1 Mixed-precision quantization (figure not shown)

SpQR, OWQ, AWQ [1]

3 OmniQuant

The goal of quantization is
$$\arg\min_{\Theta_1,\Theta_2}\big\|\mathcal{F}(\mathbf{W},\mathbf{X})-\mathcal{F}\big(Q_w(\mathbf{W};\Theta_1,\Theta_2),\,Q_a(\mathbf{X},\Theta_2)\big)\big\|\tag{1}$$
where $\mathcal{F}$ is the mapping function of a transformer block, $\mathbf{W}$ and $\mathbf{X}$ are the full-precision weight and activation, $Q_w$ and $Q_a$ are the weight and activation quantizers, and $\Theta_1$ and $\Theta_2$ are the quantization parameters of LWC and LET, respectively. The idea of OmniQuant largely comes from AWQ.
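As a reading aid, here is a minimal PyTorch-style sketch of how this block-wise objective (Section 3.2) could be evaluated; fp_block, quant_block, quant_params and calib_batch are placeholder names for this sketch, not OmniQuant's actual code.

import torch

def block_wise_loss(fp_block, quant_block, x):
    # F(W, X): output of the frozen full-precision transformer block
    with torch.no_grad():
        target = fp_block(x)
    # F(Q_w(W; Θ1, Θ2), Q_a(X, Θ2)): output of the block with quantized weights/activations
    pred = quant_block(x)
    return torch.norm(target - pred)

# Typical usage, with the LWC/LET parameters Θ1, Θ2 collected in quant_params:
# optimizer = torch.optim.AdamW(quant_params, lr=1e-3)
# loss = block_wise_loss(fp_block, quant_block, calib_batch)
# loss.backward()
# optimizer.step()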

3.1 Initialization of Quantization Parameters

3.2 Block-wise Quantization Error Minimization

3.3 Learnable Weight Clipping (LWC)

Weight-only quantization:
$$\mathbf{W}_q=\mathrm{clamp}\Big(\Big\lfloor\frac{\mathbf{W}}{h}\Big\rceil+z,\,0,\,2^N-1\Big),\quad\text{where }h=\frac{\gamma\max(\mathbf{W})-\beta\min(\mathbf{W})}{2^N-1},\;z=-\Big\lfloor\frac{\beta\min(\mathbf{W})}{h}\Big\rceil\tag{2}$$
where $\gamma\in[0,1]$ and $\beta\in[0,1]$. The $\mathrm{clamp}(\lfloor\frac{\mathbf{W}}{h}\rceil+z,\,0,\,2^N-1)$ simply saturates the result: any value larger than $2^N-1$ is set to $2^N-1$ and any value smaller than $0$ is set to $0$. Nothing special.
Here is an example of this [2]:
Fig. 2.2 Asymmetric quantization (figure not shown)

So $\Theta_1=\{\gamma,\beta\}$ in Eqn. 1.
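A minimal NumPy sketch of the quantizer in Eqn. 2 (per-tensor for simplicity; gamma and beta are fixed scalars here, whereas in OmniQuant they are learnable parameters in [0, 1] trained through the block-wise loss):

import numpy as np

def lwc_quantize(W, N=4, gamma=1.0, beta=1.0):
    # h: step size, z: zero point, following Eqn. 2
    h = (gamma * W.max() - beta * W.min()) / (2**N - 1)
    z = -np.round(beta * W.min() / h)
    Wq = np.clip(np.round(W / h) + z, 0, 2**N - 1)  # clamp(round(W/h) + z, 0, 2^N - 1)
    return Wq, h, z

W = np.random.randn(4, 4).astype(np.float32)
Wq, h, z = lwc_quantize(W, N=4, gamma=0.9, beta=0.9)
W_hat = (Wq - z) * h   # de-quantized approximation of W (outliers are clipped)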

3.4 Learnable Equivalent Transformation (LET)

$$\mathbf{Y}=\mathbf{X}\mathbf{W}+\mathbf{B}=\big[\underbrace{(\mathbf{X}-\delta)\oslash s}_{\tilde{\mathbf{X}}}\big]\cdot\big[\underbrace{s\odot\mathbf{W}}_{\tilde{\mathbf{W}}}\big]+\big[\underbrace{\mathbf{B}+\delta\mathbf{W}}_{\tilde{\mathbf{B}}}\big]\tag{3}$$
where $\mathbf{Y}$ is the output, $s\in\mathbb{R}^{1\times C_{in}}$ and $\delta\in\mathbb{R}^{1\times C_{in}}$ are channel-wise scaling and shifting parameters, respectively, and $\tilde{\mathbf{X}}$, $\tilde{\mathbf{W}}$ and $\tilde{\mathbf{B}}$ are the equivalent activation, weight and bias. $\odot$ and $\oslash$ denote element-wise multiplication and division.
Finally, quantization is performed on the transformed activations and weights:
$$\mathbf{Y}=Q_a(\tilde{\mathbf{X}})\,Q_w(\tilde{\mathbf{W}})+\tilde{\mathbf{B}}\tag{4}$$
where $Q_a$ is the vanilla MinMax quantizer and $Q_w$ is the MinMax quantizer with LWC.
The learnable equivalent transformation of the self-attention affinity matrix can be written as
$$\mathbf{P}=\mathrm{Softmax}(\mathbf{Q}\mathbf{K}^T)=\mathrm{Softmax}\big((\underbrace{\mathbf{Q}\oslash s_a}_{\tilde{\mathbf{Q}}})(\underbrace{s_a\odot\mathbf{K}^T}_{\tilde{\mathbf{K}}^T})\big)$$
where $s_a\in\mathbb{R}^{1\times C_{out}}$ is the scaling factor in the affinity matrix. Similar to Eqn. 4, the quantized affinity matrix is computed as $\mathbf{P}=\mathrm{Softmax}\big(Q_a(\tilde{\mathbf{Q}})\,Q_a(\tilde{\mathbf{K}}^T)\big)$.
And $\Theta_2=\{\delta,s,s_a\}$ in Eqn. 1.
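A quick NumPy check that the transformation in Eqn. 3 leaves the layer output unchanged before quantization (shapes and values here are only illustrative):

import numpy as np

T, C_in, C_out = 4, 8, 16
X = np.random.randn(T, C_in)
W = np.random.randn(C_in, C_out)
B = np.random.randn(C_out)
s = np.abs(np.random.randn(1, C_in)) + 0.1   # channel-wise scaling
delta = np.random.randn(1, C_in)             # channel-wise shifting

X_t = (X - delta) / s          # X~ = (X - delta) ⊘ s
W_t = s.T * W                  # W~ = s ⊙ W (scale each input channel's row)
B_t = B + (delta @ W).ravel()  # B~ = B + delta·W

assert np.allclose(X @ W + B, X_t @ W_t + B_t)  # Eqn. 3 holds exactly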

3.5 Quantization Operation

3.6 Training and Optimization

3.7 Iterative Calibration

3.8 Model Evaluation

Deployment

References


  1. OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models.

  2. RethinkFun (2024). Model Quantization Part 1: Quantization Basics (symmetric quantization, asymmetric quantization, absmax quantization, zero-point quantization).
