INT8, KL Divergence
1. Computing the KL divergence
Reposted from: https://zhuanlan.zhihu.com/p/339613080
KL divergence measures how similar two probability distributions are: the closer the two distributions, the smaller the KL divergence. It is computed as

$$D_{KL}(P||Q)=\sum_{i}P(i)\log\frac{P(i)}{Q(i)}$$
Conventionally, P is the true distribution of the event and Q is the distribution obtained by a theoretical fit. Note that the divergence is asymmetric: $D_{KL}(P||Q)$ (fitting P with Q) and $D_{KL}(Q||P)$ (fitting Q with P) are not the same.
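To make the asymmetry concrete, here is a minimal sketch using scipy's `stats.entropy` (the same call the calibration code below relies on); the two distributions are made up for illustration:

```python
import numpy as np
from scipy import stats

P = np.array([0.7, 0.2, 0.1])  # "true" distribution
Q = np.array([0.5, 0.3, 0.2])  # fitted distribution

# stats.entropy(p, q) computes sum(p * log(p / q)) = D_KL(P||Q)
print(stats.entropy(P, Q))  # D_KL(P||Q)
print(stats.entropy(Q, P))  # D_KL(Q||P), a different value
```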
2. Code
Code excerpted from: https://github.com/BUG1989/caffe-int8-convert-tools
This is the activation quantization step in TensorRT INT8 quantization. The goal is to choose a threshold and apply `clamp(x, 0, threshold)`, so the choice of `threshold` is critical.
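As a sketch of what the threshold is eventually used for (hypothetical function name; assuming non-negative post-ReLU activations mapped onto the non-negative half of INT8), the clamp plus a single scale turns floats into int8:

```python
import numpy as np

def quantize_activations(x, threshold):
    # hypothetical sketch: clamp to [0, threshold], then map linearly to [0, 127]
    scale = threshold / 127.0
    x_clamped = np.clip(x, 0.0, threshold)
    return np.round(x_clamped / scale).astype(np.int8)
```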
The overall procedure for finding the threshold is:
- Discretize the activation blob into a 2048-bin histogram (a sketch of this step follows the list)
- Iterate threshold from 128 to 2048; for each candidate, clamp the original distribution at the threshold to obtain p
- Use the threshold to map the sliced data into 128 bins, giving q
- Compute the KL divergence between p and q
- Keep the threshold whose KL divergence is smallest
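A minimal sketch of the first step (the `activations` blob here is a stand-in; real calibration collects activations over a calibration dataset):

```python
import numpy as np

# stand-in for a real activation blob (e.g. post-ReLU, non-negative)
activations = np.abs(np.random.randn(100000))

# 2048-bin histogram over [0, max]; `hist` is what threshold_distribution consumes,
# and the bin width is needed later to turn a bin index back into a float threshold
hist, bin_edges = np.histogram(activations, bins=2048,
                               range=(0.0, float(activations.max())))
bin_width = bin_edges[1] - bin_edges[0]
```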
```python
import numpy as np
import copy
from scipy import stats


def own_kl(p, q):
    """Hand-rolled KL divergence; should match stats.entropy(p, q)."""
    pk = 1.0 * p / np.sum(p)
    qk = 1.0 * q / np.sum(q)
    t = 0.0
    for i in range(pk.shape[0]):
        t += pk[i] * np.log(pk[i]) - pk[i] * np.log(qk[i])
    return t


def threshold_distribution(distribution, target_bin=128):
    """
    Return the best threshold bin.
    Ref: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/quantization.py
    Args:
        distribution: array, activation histogram (already binned, optionally normalized), size 2048
        target_bin: int, number of bins used by quantization; for INT8 the default is 128
    Returns:
        threshold_value: int, index of the bin with the minimum KL divergence
    """
    distribution = distribution[1:]  # skip the first bin (typically the zero-valued activations)
    length = distribution.size
    threshold_sum = sum(distribution[target_bin:])  # mass beyond the current threshold
    kl_divergence = np.zeros(length - target_bin)

    for threshold in range(target_bin, length):
        sliced_nd_hist = copy.deepcopy(distribution[:threshold])

        # generate reference distribution p: keep bins [0, threshold) and fold
        # all clipped (outlier) mass into the last kept bin
        p = sliced_nd_hist.astype(np.float64)  # float copy, so the 0.0001 floor below is not truncated
        p[threshold - 1] += threshold_sum
        threshold_sum = threshold_sum - distribution[threshold]  # update outlier mass for the next iteration

        # is_nonzeros[k] indicates whether hist[k] is nonzero
        is_nonzeros = (p != 0).astype(np.int64)

        quantized_bins = np.zeros(target_bin, dtype=np.int64)
        # calculate how many bins should be merged to generate the quantized distribution q
        num_merged_bins = sliced_nd_hist.size // target_bin
        # merge hist into target_bin bins
        for j in range(target_bin):
            start = j * num_merged_bins
            stop = start + num_merged_bins
            quantized_bins[j] = sliced_nd_hist[start:stop].sum()
        quantized_bins[-1] += sliced_nd_hist[target_bin * num_merged_bins:].sum()

        # expand quantized_bins back into p.size bins, spreading each merged
        # bin's mass evenly over the positions that were nonzero in p
        q = np.zeros(sliced_nd_hist.size, dtype=np.float64)
        for j in range(target_bin):
            start = j * num_merged_bins
            if j == target_bin - 1:
                stop = sliced_nd_hist.size  # the original `stop = -1` silently dropped the last bin
            else:
                stop = start + num_merged_bins
            norm = is_nonzeros[start:stop].sum()
            if norm != 0:
                q[start:stop] = float(quantized_bins[j]) / float(norm)
        q[p == 0] = 0
        # p = _smooth_distribution(p)  # has some bugs, needs fixing
        # q = _smooth_distribution(q)
        p[p == 0] = 0.0001  # floor zeros so log() in the KL computation stays finite
        q[q == 0] = 0.0001

        # calculate the KL divergence between p and q (stats.entropy normalizes internally)
        t = stats.entropy(p, q)
        kl_divergence[threshold - target_bin] = t
        assert np.isclose(t, own_kl(p, q))  # sanity check: hand-rolled KL agrees with scipy

    min_kl_index = np.argmin(kl_divergence)
    threshold_value = min_kl_index + target_bin
    return threshold_value


if __name__ == "__main__":
    vector = np.random.randint(500, 1500, 2048)
    threshold_bin = threshold_distribution(vector)
    print("threshold bin: ", threshold_bin)
```
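The function returns a bin index rather than a float threshold. Converting it back needs the histogram bin width from the discretization step; a hedged sketch (the `+ 0.5` bin-center convention follows caffe-int8-convert-tools, and `bin_width` here is a placeholder):

```python
# turn the chosen bin into a float threshold and an int8 scale
bin_width = 1.0  # in practice: (histogram range) / 2048, from the histogram step
threshold = (threshold_bin + 0.5) * bin_width  # center of the chosen bin
scale = threshold / 127.0
print("threshold: ", threshold, " scale: ", scale)
```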