INT8, KL Divergence
1. Computing the KL divergence
Reposted from: https://zhuanlan.zhihu.com/p/339613080
KL divergence measures how similar two probability distributions are: the closer the two distributions, the smaller the KL divergence. It is computed as

$$D_{KL}(P||Q)=\sum_{i}P(i)\log\frac{P(i)}{Q(i)}$$
Conventionally, P is the true distribution of the event and Q is the distribution obtained by a theoretical fit. Note that the divergence is asymmetric: $D_{KL}(P||Q)$ (fitting P with Q) and $D_{KL}(Q||P)$ (fitting Q with P) are not the same.
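To make the asymmetry concrete, here is a minimal sketch using scipy's `stats.entropy` (the same call the calibration code below relies on); the two distributions are made up for illustration:

```python
import numpy as np
from scipy import stats

P = np.array([0.7, 0.2, 0.1])  # "true" distribution
Q = np.array([0.5, 0.3, 0.2])  # fitted distribution

# stats.entropy(p, q) computes sum(p * log(p / q)) = D_KL(P||Q)
print(stats.entropy(P, Q))  # D_KL(P||Q)
print(stats.entropy(Q, P))  # D_KL(Q||P), a different value
```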
2. Code
Code excerpted from: https://github.com/BUG1989/caffe-int8-convert-tools
This is the activation quantization step in TensorRT INT8 quantization. The goal is to choose a threshold and apply `clamp(x, 0, threshold)`, so the choice of `threshold` is critical.
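As a sketch of what the threshold is eventually used for (hypothetical function name; assuming non-negative post-ReLU activations mapped onto the non-negative half of INT8), the clamp plus a single scale turns floats into int8:

```python
import numpy as np

def quantize_activations(x, threshold):
    # hypothetical sketch: clamp to [0, threshold], then map linearly to [0, 127]
    scale = threshold / 127.0
    x_clamped = np.clip(x, 0.0, threshold)
    return np.round(x_clamped / scale).astype(np.int8)
```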
The overall procedure for finding the threshold is:
- Discretize the activation blob into a 2048-bin histogram (a sketch of this step follows the list)
- Iterate threshold from 128 to 2048; for each candidate, clamp the original distribution at the threshold to obtain p
- Use the threshold to map the sliced data into 128 bins, giving q
- Compute the KL divergence between p and q
- Keep the threshold whose KL divergence is smallest
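A minimal sketch of the first step (the `activations` blob here is a stand-in; real calibration collects activations over a calibration dataset):

```python
import numpy as np

# stand-in for a real activation blob (e.g. post-ReLU, non-negative)
activations = np.abs(np.random.randn(100000))

# 2048-bin histogram over [0, max]; `hist` is what threshold_distribution consumes,
# and the bin width is needed later to turn a bin index back into a float threshold
hist, bin_edges = np.histogram(activations, bins=2048,
                               range=(0.0, float(activations.max())))
bin_width = bin_edges[1] - bin_edges[0]
```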
```python
import numpy as np
import copy
from scipy import stats


def own_kl(p, q):
    """Hand-rolled KL divergence; should match stats.entropy(p, q)."""
    pk = 1.0 * p / np.sum(p)
    qk = 1.0 * q / np.sum(q)
    t = 0.0
    for i in range(pk.shape[0]):
        t += pk[i] * np.log(pk[i]) - pk[i] * np.log(qk[i])
    return t


def threshold_distribution(distribution, target_bin=128):
    """
    Return the best threshold bin.
    Ref: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/quantization.py
    Args:
        distribution: array, activation histogram (already binned, optionally normalized), size 2048
        target_bin: int, number of bins used by quantization; for INT8 the default is 128
    Returns:
        threshold_value: int, index of the bin with the minimum KL divergence
    """
    distribution = distribution[1:]  # skip the first bin (typically the zero-valued activations)
    length = distribution.size
    threshold_sum = sum(distribution[target_bin:])  # mass beyond the current threshold
    kl_divergence = np.zeros(length - target_bin)

    for threshold in range(target_bin, length):
        sliced_nd_hist = copy.deepcopy(distribution[:threshold])

        # generate reference distribution p: keep bins [0, threshold) and fold
        # all clipped (outlier) mass into the last kept bin
        p = sliced_nd_hist.astype(np.float64)  # float copy, so the 0.0001 floor below is not truncated
        p[threshold - 1] += threshold_sum
        threshold_sum = threshold_sum - distribution[threshold]  # update outlier mass for the next iteration

        # is_nonzeros[k] indicates whether hist[k] is nonzero
        is_nonzeros = (p != 0).astype(np.int64)

        quantized_bins = np.zeros(target_bin, dtype=np.int64)
        # calculate how many bins should be merged to generate the quantized distribution q
        num_merged_bins = sliced_nd_hist.size // target_bin
        # merge hist into target_bin bins
        for j in range(target_bin):
            start = j * num_merged_bins
            stop = start + num_merged_bins
            quantized_bins[j] = sliced_nd_hist[start:stop].sum()
        quantized_bins[-1] += sliced_nd_hist[target_bin * num_merged_bins:].sum()

        # expand quantized_bins back into p.size bins, spreading each merged
        # bin's mass evenly over the positions that were nonzero in p
        q = np.zeros(sliced_nd_hist.size, dtype=np.float64)
        for j in range(target_bin):
            start = j * num_merged_bins
            if j == target_bin - 1:
                stop = sliced_nd_hist.size  # the original `stop = -1` silently dropped the last bin
            else:
                stop = start + num_merged_bins
            norm = is_nonzeros[start:stop].sum()
            if norm != 0:
                q[start:stop] = float(quantized_bins[j]) / float(norm)
        q[p == 0] = 0
        # p = _smooth_distribution(p)  # has some bugs, needs fixing
        # q = _smooth_distribution(q)
        p[p == 0] = 0.0001  # floor zeros so log() in the KL computation stays finite
        q[q == 0] = 0.0001

        # calculate the KL divergence between p and q (stats.entropy normalizes internally)
        t = stats.entropy(p, q)
        kl_divergence[threshold - target_bin] = t
        assert np.isclose(t, own_kl(p, q))  # sanity check: hand-rolled KL agrees with scipy

    min_kl_index = np.argmin(kl_divergence)
    threshold_value = min_kl_index + target_bin
    return threshold_value


if __name__ == "__main__":
    vector = np.random.randint(500, 1500, 2048)
    threshold_bin = threshold_distribution(vector)
    print("threshold bin: ", threshold_bin)
```
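The function returns a bin index rather than a float threshold. Converting it back needs the histogram bin width from the discretization step; a hedged sketch (the `+ 0.5` bin-center convention follows caffe-int8-convert-tools, and `bin_width` here is a placeholder):

```python
# turn the chosen bin into a float threshold and an int8 scale
bin_width = 1.0  # in practice: (histogram range) / 2048, from the histogram step
threshold = (threshold_bin + 0.5) * bin_width  # center of the chosen bin
scale = threshold / 127.0
print("threshold: ", threshold, " scale: ", scale)
```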