A Deep Dive into tensorflow_model_optimization: the tf.keras Quantization Toolkit


What is tensorflow_model_optimization?

It is TensorFlow's toolkit for fast quantization of models built with the Keras API.
The key functions are listed below.

import tensorflow_model_optimization as tfmot # quantization toolkit
quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer # annotate a layer for quantization
quantize_apply = tfmot.quantization.keras.quantize_apply # actually quantize the annotated layers
quantize_model = tfmot.quantization.keras.quantize_model # quantize an entire model
quantize_scope = tfmot.quantization.keras.quantize_scope # declare a quantization scope (used when loading a quantized model, to pass in custom quantize configs)
LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer # quantizes a tensor using the range of the last batch of values; the default weight quantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer # quantizes a tensor using a moving average of the per-batch ranges; the default activation quantizer
# output quantization is disabled by default
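
If you only need the default 8-bit scheme for a whole model, quantize_model performs the annotate and apply steps in one call; a minimal sketch, assuming model is an already-built Keras model:

qat_model = quantize_model(model) # annotate all supported layers and apply the default scheme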

1. How to Define Your Own Quantize Config

Step 1: Define your own default config class, which must implement the following six methods.

class DefaultDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
        return [(layer.kernel, LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)),\
                (layer.bias, LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False))]
    # Configure how to quantize activations.
    def get_activations_and_quantizers(self, layer):
        return [(layer.activation, MovingAverageQuantizer(num_bits=8, symmetric=False, narrow_range=False, per_axis=False))]

    def set_quantize_weights(self, layer, quantize_weights):
        # Add this line for each item returned in `get_weights_and_quantizers`
        # , in the same order
        layer.kernel = quantize_weights[0]
        layer.bias = quantize_weights[1]

    def set_quantize_activations(self, layer, quantize_activations):
        # Add this line for each item returned in `get_activations_and_quantizers`
        # , in the same order.
        layer.activation = quantize_activations[0]

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}

Step 2: Define separate quantize configs for layers with different quantization needs by subclassing the default config and overriding the relevant methods.
Note: different layer types may name their internal member variables differently. If you hit an error, check the corresponding tf.keras layer implementation: for example, Conv2D stores its weights in self.kernel, while DepthwiseConv2D uses self.depthwise_kernel. The common attribute names are self.kernel, self.bias, self.outputs, and self.activation. (A quick way to check the names is shown after the configs below.)

class DC_MDQC(DefaultDenseQuantizeConfig):
    def get_weights_and_quantizers(self, layer):
        return [(layer.depthwise_kernel, LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)),\
                (layer.bias, LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False))]
    def set_quantize_weights(self, layer, quantize_weights):
        # Add this line for each item returned in `get_weights_and_quantizers`
        # , in the same order
        layer.depthwise_kernel = quantize_weights[0]
        layer.bias = quantize_weights[1]
    def get_activations_and_quantizers(self, layer):
        # Skip quantizing activations.
        return []
    def set_quantize_activations(self, layer, quantize_activations):
        # Empty since `get_activations_and_quantizers` returns
        # an empty list.
        return

class MDQC(DefaultDenseQuantizeConfig):
    def get_activations_and_quantizers(self, layer):
        # Skip quantizing activations.
        return []

    def set_quantize_activations(self, layer, quantize_activations):
        # Empty since `get_activations_and_quantizers` returns
        # an empty list.
        return
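
A quick way to verify which attribute names a given layer type actually exposes (a minimal sketch; the list of attributes probed is my own choice):

from tensorflow.keras.layers import DepthwiseConv2D

probe = DepthwiseConv2D(kernel_size=(3, 3))
probe.build(input_shape=(None, 28, 28, 1)) # build() creates the weight variables
print([a for a in ('kernel', 'depthwise_kernel', 'bias', 'activation') if hasattr(probe, a)])
# expected to include 'depthwise_kernel' and 'bias' (exact output may vary by TF version)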

2. How to Quantize Your Own Model

Step 1: Define the model. This example uses the Sequential API (the functional API works much the same) together with per-layer annotation.

from tensorflow.keras.layers import (DepthwiseConv2D, Convolution2D, MaxPooling2D,
                                     Activation, Flatten, Dense, Dropout)
from tensorflow.keras.constraints import max_norm
import tensorflow as tf

model = tf.keras.models.Sequential([
        quantize_annotate_layer(DepthwiseConv2D(kernel_size=(5,5),strides=(1,1),padding='valid',activation=None,use_bias=True,\
               input_shape=(28,28,1),name='conv1',kernel_constraint=max_norm(1.),bias_constraint=max_norm(1.)),DC_MDQC()),
        MaxPooling2D(pool_size=(3,3),strides=2,padding="same",name='maxpooling1'),
        quantize_annotate_layer(Convolution2D(filters=6,kernel_size=(1,1),strides=(1,1),padding='valid',activation=None,use_bias=True,\
               name='conv2',kernel_constraint=max_norm(1.),bias_constraint=max_norm(1.)),MDQC()),
        Activation('relu',name='relu1'),
        quantize_annotate_layer(DepthwiseConv2D(kernel_size=(5,5),strides=(1,1),padding='same',activation=None,use_bias=True,\
               name='conv3',kernel_constraint=max_norm(1.),bias_constraint=max_norm(1.)),DC_MDQC()),
        MaxPooling2D(pool_size=(3,3),strides=2,padding="same",name='maxpooling2'),
        quantize_annotate_layer(Convolution2D(filters=16,kernel_size=(1,1),strides=(1,1),padding='same',activation=None,use_bias=True,\
              name='conv4',kernel_constraint=max_norm(1.),bias_constraint=max_norm(1.)),MDQC()),
        Activation('relu',name='relu2'),
        Flatten(name='flat'),
        Activation('relu',name='relu3'),
        quantize_annotate_layer(Dense(84, activation=None,name='fc2',kernel_constraint=max_norm(1.),\
                                      bias_constraint=max_norm(1.)),MDQC()),
        Activation('relu',name='relu4'),
        Dropout(0.2,name='dropout'),
        quantize_annotate_layer(Dense(10,name='fc3'),MDQC()),
        Activation('softmax',name='softmax')
    ])

Step 2: Declare the quantize scope and quantize the model.
Note: this must be done before quant_model.compile(...).

    with quantize_scope({'MDQC': MDQC,'DC_MDQC':DC_MDQC}):
        # Use `quantize_apply` to actually make the model quantization aware.
        quant_model = quantize_apply(model)
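
The quantized model must then be compiled before training; a minimal sketch, assuming the same optimizer and loss as the unquantized reference model defined later in this article:

    quant_model.compile(optimizer='adam',
                        loss='sparse_categorical_crossentropy',
                        metrics=['accuracy'])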

Step 3: Train and save your model.
Note: this works exactly like an ordinary model.

    quant_model.summary() # each layer.name is automatically prefixed with 'quant'; a functional model will also show an extra quantize_layer added to handle the float input (no need to worry)
    print("==> training")
    quant_model.fit(x_train, y_train, epochs=1)
    print("==> evaluate")
    quant_model.evaluate(x_test,y_test, verbose=2)
    quant_model.save('./log/quant_model.h5')

3. How to Test Your Quantized Model

Step 1: Define a quantization function.
Note: the saved parameters are still unquantized floats; they merely pass through a fake-quantize layer at inference time, so that layer has to be simulated. The example below fits my own needs, so adapt it as necessary: the range [-128, 127] corresponds to narrow_range=False in the quantize config above, while [-127, 127] corresponds to narrow_range=True.
Note: if bias is enabled, the layout of get_weights() changes; see the parameter-list layouts in Step 3 below.

import os
import csv
import numpy as np

def Get_Quant_Weights(weights, quant_width=128, quant_scope=1):
    print('max=', np.max(weights), 'min=', np.min(weights))
    weights_q = np.round(weights*quant_width/quant_scope) # to int
    weights_q[np.where(weights_q >= quant_width)] = quant_width-1 # clamp positive overflow
    weights_q[np.where(weights_q < -quant_width)] = -quant_width # clamp negative overflow
    quant_weights = weights_q/quant_width*quant_scope # back to float
    print('quant_scope=', quant_scope, 'max=', np.max(quant_weights), 'min=', np.min(quant_weights))
    return quant_weights
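
As a quick sanity check (the input values here are made up for illustration):

w = np.array([0.51, -1.2, 0.999])
print(Get_Quant_Weights(w, quant_width=128, quant_scope=1))
# round(0.51*128)/128 = 0.5078125; -1.2 is clamped to -128/128 = -1.0;
# 0.999 rounds to 128, which is clamped to 127/128 = 0.9921875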

def Save_Quant_Info(layer_name, ws, bs, save_dir):
    # Append the (layer_name, weight_scale, bias_scale) record to a CSV log.
    if save_dir is not None:
        csv_path = save_dir + '/quant_ws_bs.csv'
        if not os.path.exists(csv_path):
            quant_ws_bs = [[layer_name, ws, bs]]
        else:
            with open(csv_path, 'r') as file_p:
                quant_ws_bs = list(csv.reader(file_p))
            quant_ws_bs.append([layer_name, ws, bs])
        np.savetxt(csv_path, quant_ws_bs, fmt='%s', delimiter=',')

# ori_quant: True means test the accuracy of the original quantization (recover the trained scales)
# quant_all: True means quantize all parameters with a shared scale
# Attention: the case "weights not quantized but bias quantized" is not supported
def Get_Quant_Model_Weights(quant_model, layer_name, ori_quant=False, quant_all=False, save_dir=None):
    # ---- user-defined ---- #
    quant_width_w = 128   # weight quantization width
    quant_width_b = 16384 # bias quantization width
    ws = 1                # weight scale
    bs = 1                # bias scale
    # ---- end user-defined ---- #
    quant_weights = quant_model.get_layer(layer_name).get_weights()
    print(len(quant_weights))
    if len(quant_weights) in [2, 5, 7]:
        weights = [[], []]
        if len(quant_weights) == 2:   # weights not quantized & bias not quantized
            if ori_quant:
                ws = max(np.max(quant_weights[0]), np.max(quant_weights[1]))
                bs = ws
        elif len(quant_weights) == 5: # weights quantized & bias not quantized
            if ori_quant:
                ws = -quant_weights[3] # kernel_min; the range is symmetric, so -min is the scale
                if not quant_all:
                    weights[0] = Get_Quant_Weights(quant_weights[1], quant_width_w, ws)
                    weights[1] = quant_weights[0]
                    Save_Quant_Info(layer_name, ws, bs, save_dir)
                    return weights
            if quant_all:
                ws = max(ws, np.max(quant_weights[0]))
                bs = ws
            # this layout is [bias, kernel, ...]; swap back to [kernel, bias]
            quant_weights[0], quant_weights[1] = quant_weights[1], quant_weights[0]
        elif len(quant_weights) == 7: # weights quantized & bias quantized
            if ori_quant:
                ws = -quant_weights[3] # kernel_min
                bs = -quant_weights[5] # bias_min
        weights[0] = Get_Quant_Weights(quant_weights[0], quant_width_w, ws)
        weights[1] = Get_Quant_Weights(quant_weights[1], quant_width_b, bs)
    else:
        weights = [[]]
        bs = 0
        if len(quant_weights) == 1:   # weights not quantized, no bias
            if ori_quant:
                ws = np.max(quant_weights[0])
        elif len(quant_weights) == 4: # weights quantized, no bias
            if ori_quant:
                ws = -quant_weights[2] # kernel_min
        else:
            print('error of quant_weights =', len(quant_weights))
            return
        weights[0] = Get_Quant_Weights(quant_weights[0], quant_width_w, ws)
    Save_Quant_Info(layer_name, ws, bs, save_dir)
    return weights

Step 2: Define an identical model that is not quantized.

model_q = tf.keras.models.Sequential([
        DepthwiseConv2D(kernel_size=(5,5),strides=(1,1),padding='valid',activation=None,use_bias=True,\
               input_shape=(28,28,1),name='conv1',kernel_constraint=max_norm(1.),bias_constraint=max_norm(1.)),
        MaxPooling2D(pool_size=(3,3),strides=2,padding="same",name='maxpooling1'),
        Convolution2D(filters=6,kernel_size=(1,1),strides=(1,1),padding='valid',activation=None,use_bias=True,\
               name='conv2',kernel_constraint=max_norm(1.),bias_constraint=max_norm(1.)),
        Activation('relu',name='relu1'),
        DepthwiseConv2D(kernel_size=(5,5),strides=(1,1),padding='same',activation=None,use_bias=True,\
               name='conv3',kernel_constraint=max_norm(1.),bias_constraint=max_norm(1.)),
        MaxPooling2D(pool_size=(3,3),strides=2,padding="same",name='maxpooling2'),
        Convolution2D(filters=16,kernel_size=(1,1),strides=(1,1),padding='same',activation=None,use_bias=True,\
              name='conv4',kernel_constraint=max_norm(1.),bias_constraint=max_norm(1.)),
        Activation('relu',name='relu2'),
        Flatten(name='flat'),
        Activation('relu',name='relu3'),
        Dense(84, activation=None,name='fc2',kernel_constraint=max_norm(1.),bias_constraint=max_norm(1.)),
        Activation('relu',name='relu4'),
        Dropout(0.2,name='dropout'),
        Dense(10,name='fc3'),
        Activation('softmax',name='softmax')
    ])
model_q.compile(optimizer='adam',
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])
model_q.summary()

Step 3: Load the model and test the effect of quantization.
Note: if the kernel is quantized but the bias is not, the first element of the list returned by get_weights() on a quant_model layer is the bias.
The layer parameter lists are:
kernel quantized, bias present but not quantized: [bias, kernel, (), kernel_min, kernel_max]
kernel quantized, no bias: [kernel, (), kernel_min, kernel_max]
kernel and bias both quantized: [kernel, bias, (), kernel_min, kernel_max, bias_min, bias_max]

from tensorflow.keras.models import load_model

model_dir = './log/quant_model.h5'
with tfmot.quantization.keras.quantize_scope({'MDQC': MDQC, 'DC_MDQC': DC_MDQC}):
    quant_model = load_model(model_dir)
quant_model.summary()
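# Optional sanity check (my own addition): print each layer's parameter count and
# shapes to confirm the layouts described above.
for layer in quant_model.layers:
    ws = layer.get_weights()
    print(layer.name, len(ws), [np.shape(w) for w in ws])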
# adjust the following code as needed #
layer_name = ['conv1', 'conv2', 'conv3', 'conv4', 'fc2', 'fc3']
for i in range(len(layer_name)):
    print(layer_name[i])
    # recover the dequantized weights from the QAT model and load them into the float model
    weights = Get_Quant_Model_Weights(quant_model, 'quant_'+layer_name[i], ori_quant=True, quant_all=False)
    model_q.get_layer(layer_name[i]).set_weights(weights)

print("==>quant_model evaluate")
quant_model.evaluate(x_test,  y_test, verbose=2) 
    
print("==>model_q evaluate")
model_q.evaluate(x_test,  y_test, verbose=2)

4. A Solved Problem: Quantization-Aware Training Cannot Be Combined with Keras Fine-Tuning

Remark: this problem falls into the category of incompatibilities that arise when mixing Keras and TensorFlow code. Keras implements fine-tuning through the trainable attribute, whereas TensorFlow implements it through the var_list argument of optimizer.minimize(loss, var_list).

4.1 Case: the model is built with Keras but trained with TF

Use with tf.variable_scope('yyy'): to define the part of the model to be frozen and with tf.variable_scope('xxx'): to define the part to be trained, as sketched below. Then obtain the trainable variables with trainable_var = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'xxx'), pass them to minimize(loss, var_list), and train with TF as usual. See Reference 4 for details.
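
A minimal sketch of this pattern (the scope names follow the text; the placeholder shapes and dense layers are made up for illustration, and TF1-style graph mode is assumed):

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])

with tf.variable_scope('yyy'): # frozen sub-network
    h = tf.layers.dense(x, 8, activation=tf.nn.relu)
with tf.variable_scope('xxx'): # sub-network to fine-tune
    pred = tf.layers.dense(h, 1)

loss = tf.reduce_mean(tf.square(pred - y))
# only the variables under scope 'xxx' are handed to the optimizer
trainable_var = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'xxx')
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=trainable_var)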

4.2 Case: the model is built with Keras and also trained with Keras

Remark: since tensorflow_model_optimization is TF's quantization support for Keras, it still counts as native TensorFlow code, and layers annotated for quantization cannot be frozen. The workaround is as follows (see the sketch after this paragraph):
First dequantize: convert the weights of the quant_xxx layers into dequantized weights (the Get_Quant_Weights routine written in this article does exactly this). Then remove the quantization annotation from the layers that need freezing and load the dequantized weights into a new model; the layers that do not need freezing stay as they were and load their pre-trained parameters, or simply load nothing if no pre-trained parameters exist.
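
A minimal sketch of this workaround, reusing the layers, configs, and helpers defined above (which layer to freeze, and the abbreviated architecture, are hypothetical choices for illustration):

with quantize_scope({'MDQC': MDQC, 'DC_MDQC': DC_MDQC}):
    trained_quant_model = load_model('./log/quant_model.h5')

# Rebuild the model, leaving 'conv1' un-annotated so it can be frozen.
new_model = tf.keras.models.Sequential([
    DepthwiseConv2D(kernel_size=(5, 5), strides=(1, 1), padding='valid', use_bias=True,
                    input_shape=(28, 28, 1), name='conv1'),
    Flatten(name='flat'),
    quantize_annotate_layer(Dense(10, name='fc3'), MDQC()),
    Activation('softmax', name='softmax'),
])
with quantize_scope({'MDQC': MDQC, 'DC_MDQC': DC_MDQC}):
    new_quant = quantize_apply(new_model)

# Load the dequantized weights into the un-annotated layer and freeze it,
# then compile and fine-tune as usual.
new_quant.get_layer('conv1').set_weights(
    Get_Quant_Model_Weights(trained_quant_model, 'quant_conv1', ori_quant=True))
new_quant.get_layer('conv1').trainable = False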

5. A Partially Solved Problem: How to Customize the Quantization Range?

Remark: one approach is to skip quantizing the bias during training and to add kernel constraints that keep the weights within a set range (a config sketch follows this paragraph). A problem remains: for layers with few weights, the maximum trained weight may never reach the configured bound. A further refinement is therefore: after quantization-aware training, dequantize and freeze the layers with few parameters, then fine-tune plus quantization-train the network again to obtain the final model (this adds an extra step and does not necessarily bring a large accuracy gain, so decide for yourself whether to use it).
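
A minimal sketch of such a weight-only config, assuming a Dense-like layer (this subclass is my own illustration, not part of the code above):

class WeightOnlyQC(DefaultDenseQuantizeConfig):
    # quantize only the kernel; leave the bias in float
    def get_weights_and_quantizers(self, layer):
        return [(layer.kernel, LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False))]
    def set_quantize_weights(self, layer, quantize_weights):
        layer.kernel = quantize_weights[0]

# combined with a kernel constraint that bounds the float weights:
layer = quantize_annotate_layer(Dense(84, kernel_constraint=max_norm(1.), name='fc2'), WeightOnlyQC())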
Discussion in the comments is welcome.

References

1. tensorflow_model_optimization API reference
2. tensorflow_model_optimization examples
3. Quantization primer (Zhihu)
4. Keras/TensorFlow mixed programming: trainable=False has no effect
