onnx_calibrate: an analysis of how the calibration code works

陶表犁

Category: onnx. Tags: calibration, tensorrt, onnx, accuracy recovery

Article link: https://blog.csdn.net/tbl1234567/article/details/109240924


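The idea behind calibration is to feed a set of validation images through the network, collect statistics on the output values of each layer, and find the best mapping threshold T by comparing the KL divergence between the value distributions before and after quantization. For details, see NVIDIA's GTC 2017 slides.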
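To make the comparison concrete, here is a minimal sketch of the KL divergence between two histograms. It uses only the standard library; the function name `kl_divergence` and the toy histograms are my own illustration, not part of onnx_calibrate (the code below uses `scipy.stats.entropy` for the same quantity). It assumes q is nonzero wherever p is nonzero.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two histograms of equal length; both are normalized first.

    Assumes q[i] > 0 wherever p[i] > 0 (otherwise the divergence is infinite).
    """
    p_total, q_total = sum(p), sum(q)
    kl = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a term with p_i == 0 contributes nothing
        p_norm, q_norm = p_i / p_total, q_i / q_total
        kl += p_norm * math.log(p_norm / q_norm)
    return kl

# Identical histograms give zero divergence; different ones give a positive value.
identical = kl_divergence([1, 2, 3, 4], [1, 2, 3, 4])
different = kl_divergence([1, 2, 3, 4], [4, 3, 2, 1])
print(identical, different)
```

The smaller the value, the less information is lost when the quantized distribution stands in for the original one, which is why the search later picks the threshold with the minimum KL divergence.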

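import datetime

import cv2
import numpy as np
import onnxruntime as rt

def onnx_runtime(model_path,image_files):
    '''
    Helper function: run the calibration images through the model and pass each
    node's tensor to the calibration statistics.
    parameter model_path: path to the onnx model
    parameter image_files: calibration input images
    return: None (results are stored on the nodes in the global quantize_node_list)
    '''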
    sess = rt.InferenceSession(model_path)
    Input_name = sess.get_inputs()[0].name
    model_outputs = sess.get_outputs()
    print(len(model_outputs)) 
    # 1. For every node to be quantized, find the maximum absolute value of its input tensor
    for i,image in enumerate(image_files):
        img = cv2.imread(image)
        img = cv2.resize(img,(224,224))
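        # Assumption: the model takes NHWC input, as the (1,224,224,3) shape below implies,
        # so OpenCV's HWC layout is kept (a CHW transpose followed by this reshape would scramble the pixels).
        img = img.astype('float32')/255
        img = img.reshape(1,224,224,3)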
        start_time = datetime.datetime.now()
        for node in quantize_node_list:
            Output_name = node.output_name
            res = sess.run([Output_name],{Input_name:img})
            node.initial_input_max(np.array(res).flatten())
        end_time = datetime.datetime.now()
        print('elapsed time:', (end_time - start_time))
        if i % 100 == 0:
            print('loop stage 1 : %d/%d' % (i,len(image_files)))

    # calculate the statistics range and bin interval of each node
    # 2. The bin width is the maximum value divided by 2048; it is used later to recover T
    for node in quantize_node_list:
        node.initial_input_distubution_interval()

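    # for each node, collect a histogram of its activations
    # 3. Build the data distribution of each node: the range (0, max) is split into 2048 bins
    #    and the number of values falling into each bin is counted (e.g. 20 values fall into bin [0, 1]).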
    print('\n Collect histograms of activations: ')
    for i, image in enumerate(image_files):
        img = cv2.imread(image)
        img = cv2.resize(img,(224,224))
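        # Assumption: the model takes NHWC input, as the (1,224,224,3) shape below implies,
        # so OpenCV's HWC layout is kept (a CHW transpose followed by this reshape would scramble the pixels).
        #print(img.shape)
        img = img.astype('float32')/255
        img = img.reshape(1,224,224,3)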
        for node in quantize_node_list:
            Output_name = node.output_name
            res = sess.run([Output_name],{Input_name:img})
            node.initial_histograms(np.array(res).flatten())
        if i % 100 == 0:
            print('loop stage 2 : %d/%d' % (i,len(image_files)))

    # calculate the threshold with KL divergence
    # 4. The core step: the KL-divergence search
    for node in quantize_node_list:
        node.quantize_input()

    return None

The core code behind the first three steps:

    # 1. For every node to be quantized, find the maximum absolute value of its input tensor
    def initial_input_max(self, input_data):
        # track the largest absolute value seen so far
        max_val = np.max(input_data)
        min_val = np.min(input_data)
        self.input_max = max(self.input_max, max(abs(max_val), abs(min_val)))

    # 2. The bin width is the maximum value divided by 2048 (INTERVAL_NUM); it is used later to recover T
    def initial_input_distubution_interval(self):
        self.input_distubution_interval = STATISTIC * self.input_max / INTERVAL_NUM
        print("%-20s max_val : %-10.8f distribution_intervals : %-10.8f" % (self.node_name, self.input_max, self.input_distubution_interval))

    # 3. Build the data distribution of each node: the range (0, max) is split into 2048 bins
    #    and the number of values falling into each bin is counted (e.g. 20 values fall into bin [0, 1]).
    def initial_histograms(self, input_data):
        # accumulate the histogram of this node's input
        th = self.input_max
        # hist: number of values in each bin; hist_edge: float array of the bin edges.
        # range=(0, th) limits the histogram to [0, th]; values outside it (including negatives) are ignored.
        hist, hist_edge = np.histogram(input_data, bins=INTERVAL_NUM, range=(0, th))
        self.input_distubution += hist
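The three steps above can be tried on toy data. The sketch below is standard-library Python that mirrors the logic with 4 bins instead of 2048; the function names `abs_max` and `build_histogram` and the sample values are made up for illustration, and the bin width assumes STATISTIC is 1.

```python
def abs_max(batches):
    """Step 1: largest absolute value seen over all calibration batches."""
    return max(abs(v) for batch in batches for v in batch)

def build_histogram(batches, max_val, num_bins):
    """Steps 2 and 3: bin width, then per-bin counts over [0, max_val]."""
    interval = max_val / num_bins  # step 2: width of one bin
    hist = [0] * num_bins
    for batch in batches:
        for v in batch:
            if 0 <= v <= max_val:  # values outside the range are ignored
                # the right edge belongs to the last bin, as in np.histogram
                idx = min(int(v / interval), num_bins - 1)
                hist[idx] += 1
    return hist, interval

# Two toy "batches" of activations; the negative value affects the max only.
batches = [[0.1, 0.4, 1.9], [0.6, 3.9, 4.0, -1.0]]
max_val = abs_max(batches)
hist, interval = build_histogram(batches, max_val, num_bins=4)
print(max_val, interval, hist)
```

Two passes over the data are needed because the bin edges depend on the global maximum, which is only known once every image has been seen.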

The core question is how the calibration itself is done:

    def quantize_input(self):
        # calculate threshold  
        distribution = np.array(self.input_distubution)
        # pick threshold which minimizes KL divergence
        threshold_bin = threshold_distribution(distribution) 
        self.input_threshold = threshold_bin
        threshold = (threshold_bin + 0.5) * self.input_distubution_interval
        # get the activation calibration value
        self.input_scale = QUANTIZE_NUM / threshold
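As a worked example of the last two lines, the sketch below converts a threshold bin into a real-valued threshold and a scale. The values QUANTIZE_NUM = 127 and STATISTIC = 1 are my assumptions (the constants are not shown in the excerpt), and `scale_from_bin` is a hypothetical helper, not a function of the project.

```python
QUANTIZE_NUM = 127   # assumed: largest int8 magnitude
INTERVAL_NUM = 2048  # number of histogram bins, as in the text

def scale_from_bin(threshold_bin, input_max):
    """Map a bin index back to a real threshold T and the int8 scale."""
    interval = input_max / INTERVAL_NUM            # width of one bin (STATISTIC taken as 1)
    threshold = (threshold_bin + 0.5) * interval   # centre of the chosen bin
    return threshold, QUANTIZE_NUM / threshold

threshold, scale = scale_from_bin(threshold_bin=1023, input_max=8.0)

# Quantizing an activation: multiply by the scale, round, and saturate to [-127, 127].
quantized = max(-QUANTIZE_NUM, min(QUANTIZE_NUM, round(2.0 * scale)))
print(threshold, scale, quantized)
```

Adding 0.5 places T at the centre of the selected bin rather than at its left edge.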

The calibration procedure:

import copy

import numpy as np
from scipy import stats

def threshold_distribution(distribution, target_bin=128):
    """
    Return the best threshold bin.
    Args:
        distribution: ndarray, histogram of the activations, size is 2048
        target_bin: int, the number of bins used by the quantized representation, 128 by default for int8
    Returns:
        target_threshold: int, index of the bin with the minimum KL divergence
    """
    distribution = distribution[1:]
    length = distribution.size
    threshold_sum = sum(distribution[target_bin:])
    kl_divergence = np.zeros(length - target_bin)
    # search over thresholds from target_bin (128) up to the histogram length
    for threshold in range(target_bin, length):
        sliced_nd_hist = copy.deepcopy(distribution[:threshold])

        # generate reference distribution p
        # p is simple to build: keep the first `threshold` bins and add all remaining bins to the last kept bin (index threshold-1)
        p = sliced_nd_hist.copy()
        p[threshold-1] += threshold_sum
        threshold_sum = threshold_sum - distribution[threshold]

        # is_nonzeros[k] indicates whether p[k] is nonzero,
        # e.g. is_nonzeros = [1,1,1,1,0,....]
        is_nonzeros = (p != 0).astype(np.int64)
        quantized_bins = np.zeros(target_bin, dtype=np.int64)
        # calculate how many bins should be merged to generate quantized distribution q
        num_merged_bins = sliced_nd_hist.size // target_bin
        
        # merge hist into target_bin bins
        # This is the "quantization" step: it does not convert fp32 values to int8, it only merges the histogram into 128 bins
        for j in range(target_bin):
            start = j * num_merged_bins  # first bin of group j
            stop = start + num_merged_bins
            quantized_bins[j] = sliced_nd_hist[start:stop].sum()  # sum the bins that belong to the same group
        quantized_bins[-1] += sliced_nd_hist[target_bin * num_merged_bins:].sum()  # leftover bins at the end all go into the last group
        
        # expand quantized_bins back into p.size bins; this is the inverse of the merge above
        q = np.zeros(sliced_nd_hist.size, dtype=np.float64)
        for j in range(target_bin):
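            start = j * num_merged_bins  # start position
            if j == target_bin - 1:
                stop = q.size  # the last group runs to the end, matching the merge step
            else:
                stop = start + num_merged_bins  # end position
            norm = is_nonzeros[start:stop].sum()
            if norm != 0:
                q[start:stop] = float(quantized_bins[j]) / float(norm)  # spread the count evenly over the nonzero bins; this is where the inverse loses information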
        q[p == 0] = 0
        # p = _smooth_distribution(p) # with some bugs, need to fix
        # q = _smooth_distribution(q)
        p[p == 0] = 0.0001
        q[q == 0] = 0.0001
        
        # calculate kl_divergence between q and p
        kl_divergence[threshold - target_bin] = stats.entropy(p, q)

    min_kl_divergence = np.argmin(kl_divergence)
    threshold_value = min_kl_divergence + target_bin

    return threshold_value
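The construction of the reference distribution p is easy to check on a small histogram. The following standard-library sketch uses a hypothetical helper `reference_distribution` and invented counts; it only illustrates the "clip and fold the tail into the last bin" step, not the full search.

```python
def reference_distribution(hist, threshold):
    """Keep the first `threshold` bins and add the clipped tail to the last kept bin."""
    p = list(hist[:threshold])
    p[-1] += sum(hist[threshold:])  # outliers beyond the threshold saturate into the last bin
    return p

hist = [5, 4, 3, 2, 1, 1]
p = reference_distribution(hist, threshold=4)
print(p)
```

Folding the tail keeps the total count unchanged, which models saturation: every value above the threshold is clipped to the threshold instead of being discarded.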

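2020/5/27, notes from a second review:
How calibration works:

1. For each op to be calibrated, obtain its input tensor.
2. Distribute the tensor's values over 2048 bins: first find the tensor's maximum value, then divide it by 2048 to get the width of each bin, and count how many values fall into each bin. The 2048 bins therefore describe the distribution of the tensor's data.
3. Search the range 128 to 2048 for a suitable T. For each candidate i, the 2048 bins are reduced to i bins: bins 0 through i-2 are identical to the first i-1 of the 2048 bins, and bin i-1 holds the cumulative sum of the original bins i-1 through 2047.

3.1 The i bins now have to be quantized to 128 bins. The i bins contain the distribution of all the data, and we want to map it into 128 bins. Suppose i equals 1280: every ten bins then correspond to one of the 128 bins. The i bins are accumulated in groups of i//128 (10 in this example), so the counts of bins 0 through 9 are summed to give the first bin, and so on. This yields the quantized 128-bin distribution.

3.2 Dequantize back to i bins and compute the KL divergence against the original i bins. Dequantization is the inverse of quantization: to recover i bins, the 128 bins are expanded to i bins, again in groups of i//128, and every bin in a restored group (e.g. bins 0 to 9) receives the value of the corresponding bin among the 128 divided by the group size, i.e. the count is spread evenly (the code divides by the number of nonzero bins in the group and leaves empty bins at zero). Operationally this is the inverse process, although it averages out each group (e.g. every 10 bins take their mean); over a large amount of data this turns out to work well. The result is two sets of i bins, whose KL divergence is then computed.

3.3 Each i yields one KL divergence value; finally, take the i with the smallest KL divergence, which is t.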
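Steps 3.1 and 3.2 can be followed by hand on a tiny histogram. Below is a standard-library sketch with 7 bins merged into 3 instead of i bins into 128; `merge_bins` and `expand_bins` are hypothetical names, and the sketch assumes the last group runs to the end of the histogram so that leftover bins are covered in both directions.

```python
def merge_bins(hist, target_bin):
    """Step 3.1: sum groups of len(hist) // target_bin bins; leftovers go to the last group."""
    group = len(hist) // target_bin
    merged = [sum(hist[j * group:(j + 1) * group]) for j in range(target_bin)]
    merged[-1] += sum(hist[target_bin * group:])
    return merged

def expand_bins(merged, hist):
    """Step 3.2: spread each merged count evenly over the nonzero bins of its group."""
    target_bin = len(merged)
    group = len(hist) // target_bin
    q = [0.0] * len(hist)
    for j in range(target_bin):
        start = j * group
        stop = len(hist) if j == target_bin - 1 else start + group
        nonzero = sum(1 for v in hist[start:stop] if v != 0)
        if nonzero:
            for k in range(start, stop):
                if hist[k] != 0:  # bins that were empty stay empty
                    q[k] = merged[j] / nonzero
    return q

hist = [4, 0, 2, 2, 1, 3, 2]
merged = merge_bins(hist, target_bin=3)
q = expand_bins(merged, hist)
print(merged, q)
```

Comparing `hist` with `q` shows what the round trip loses: the total count is preserved, but the shape inside each group is flattened, and the KL divergence measures exactly that loss.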

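The resulting t is only an integer between 128 and 2048, and it has to be converted back to the best T. In the end, every op to be calibrated gets its own T, and together these values form the calibration table.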

Please credit the source when reposting: https://blog.csdn.net/tbl1234567. Author: 陶表犁
