tensorflow_model_optimization Explained: the tf.keras Quantization Toolkit
What is tensorflow_model_optimization?
It is TensorFlow's toolkit for fast quantization of models built with the Keras API.
The important functions are listed below.
import tensorflow_model_optimization as tfmot  # quantization toolkit
quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer  # mark a layer for quantization
quantize_apply = tfmot.quantization.keras.quantize_apply  # actually quantize the annotated layers
quantize_model = tfmot.quantization.keras.quantize_model  # quantize an entire model
quantize_scope = tfmot.quantization.keras.quantize_scope  # quantization scope (used when loading/quantizing a model, to pass in custom quantization configs)
LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer  # quantizes a tensor based on the range of the most recent batch; the default weight quantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer  # quantizes a tensor based on a moving average of per-batch ranges; the default activation quantizer
# Output quantization is disabled by default.
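The difference between the two default quantizers can be illustrated with a small range-tracking sketch (plain Python of my own, not the tfmot implementation; function names are hypothetical):

```python
def last_value_range(batches):
    """LastValueQuantizer-style tracking: the quantization range is taken
    from the most recent batch only (suits slowly-changing weights)."""
    lo = hi = 0.0
    for batch in batches:
        lo, hi = min(batch), max(batch)
    return lo, hi

def moving_average_range(batches, momentum=0.9):
    """MovingAverageQuantizer-style tracking: an exponential moving average
    of per-batch min/max (smooths out noisy activation statistics)."""
    lo = hi = None
    for batch in batches:
        b_lo, b_hi = min(batch), max(batch)
        if lo is None:
            lo, hi = b_lo, b_hi
        else:
            lo = momentum * lo + (1 - momentum) * b_lo
            hi = momentum * hi + (1 - momentum) * b_hi
    return lo, hi
```

A single outlier batch shifts the last-value range completely but moves the averaged range only slightly, which is why the moving average is the default for activations.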
I. How to define your own quantization configuration
1. First define your own default configuration class; it must implement the following six methods:
class DefaultDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
        return [(layer.kernel, LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)),
                (layer.bias, LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False))]

    # Configure how to quantize activations.
    def get_activations_and_quantizers(self, layer):
        return [(layer.activation, MovingAverageQuantizer(num_bits=8, symmetric=False, narrow_range=False, per_axis=False))]

    def set_quantize_weights(self, layer, quantize_weights):
        # Add a line here for each item returned in `get_weights_and_quantizers`,
        # in the same order.
        layer.kernel = quantize_weights[0]
        layer.bias = quantize_weights[1]

    def set_quantize_activations(self, layer, quantize_activations):
        # Add a line here for each item returned in `get_activations_and_quantizers`,
        # in the same order.
        layer.activation = quantize_activations[0]

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}
2. Then, for layers with different quantization needs, define per-layer configuration classes by subclassing the default class and overriding the relevant methods.
Note: the internal attribute names differ between layer types; if you hit an error, check how the corresponding tf.keras layer is implemented. For example, Conv2D stores its weights in self.kernel while DepthwiseConv2D uses self.depthwise_kernel. The usual names are self.kernel, self.bias, self.outputs and self.activation.
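Before writing a config for an unfamiliar layer, it can help to probe which of these attributes the layer actually exposes. A small helper sketch (the function name is my own; it works on any layer object):

```python
def weight_attr_names(layer):
    """List which of the common tf.keras weight attributes exist on `layer`."""
    candidates = ('kernel', 'depthwise_kernel', 'bias', 'activation', 'outputs')
    return [name for name in candidates if getattr(layer, name, None) is not None]
```

For a built DepthwiseConv2D layer this would report 'depthwise_kernel' rather than 'kernel', telling you which attribute to use in get_weights_and_quantizers / set_quantize_weights.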
class DC_MDQC(DefaultDenseQuantizeConfig):
    def get_weights_and_quantizers(self, layer):
        return [(layer.depthwise_kernel, LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)),
                (layer.bias, LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False))]

    def set_quantize_weights(self, layer, quantize_weights):
        # Add a line here for each item returned in `get_weights_and_quantizers`,
        # in the same order.
        layer.depthwise_kernel = quantize_weights[0]
        layer.bias = quantize_weights[1]

    def get_activations_and_quantizers(self, layer):
        # Skip quantizing activations.
        return []

    def set_quantize_activations(self, layer, quantize_activations):
        # Empty because `get_activations_and_quantizers` returns an empty list.
        return

class MDQC(DefaultDenseQuantizeConfig):
    def get_activations_and_quantizers(self, layer):
        # Skip quantizing activations.
        return []

    def set_quantize_activations(self, layer, quantize_activations):
        # Empty because `get_activations_and_quantizers` returns an empty list.
        return
II. How to quantize your own model
1. Define the model. This example uses the Sequential API (the functional API works much the same) together with per-layer annotation:
model = tf.keras.models.Sequential([
    quantize_annotate_layer(DepthwiseConv2D(kernel_size=(5,5), strides=(1,1), padding='valid', activation=None, use_bias=True,
                            input_shape=(28,28,1), name='conv1', kernel_constraint=max_norm(1.), bias_constraint=max_norm(1.)), DC_MDQC()),
    MaxPooling2D(pool_size=(3,3), strides=2, padding="same", name='maxpooling1'),
    quantize_annotate_layer(Convolution2D(filters=6, kernel_size=(1,1), strides=(1,1), padding='valid', activation=None, use_bias=True,
                            name='conv2', kernel_constraint=max_norm(1.), bias_constraint=max_norm(1.)), MDQC()),
    Activation('relu', name='relu1'),
    quantize_annotate_layer(DepthwiseConv2D(kernel_size=(5,5), strides=(1,1), padding='same', activation=None, use_bias=True,
                            name='conv3', kernel_constraint=max_norm(1.), bias_constraint=max_norm(1.)), DC_MDQC()),
    MaxPooling2D(pool_size=(3,3), strides=2, padding="same", name='maxpooling2'),
    quantize_annotate_layer(Convolution2D(filters=16, kernel_size=(1,1), strides=(1,1), padding='same', activation=None, use_bias=True,
                            name='conv4', kernel_constraint=max_norm(1.), bias_constraint=max_norm(1.)), MDQC()),
    Activation('relu', name='relu2'),
    Flatten(name='flat'),
    Activation('relu', name='relu3'),
    quantize_annotate_layer(Dense(84, activation=None, name='fc2', kernel_constraint=max_norm(1.),
                            bias_constraint=max_norm(1.)), MDQC()),
    Activation('relu', name='relu4'),
    Dropout(0.2, name='dropout'),
    quantize_annotate_layer(Dense(10, name='fc3'), MDQC()),
    Activation('softmax', name='softmax')
])
2. Declare the quantize scope and quantize the model.
Note: this must come before quant_model.compile(...).
with quantize_scope({'MDQC': MDQC, 'DC_MDQC': DC_MDQC}):
    # Use `quantize_apply` to actually make the model quantization aware.
    quant_model = quantize_apply(model)
3. Train and save your model.
Note: this works the same as for an ordinary model.
quant_model.summary()  # layer names are automatically prefixed with 'quant_'; a functional model will additionally show a quantize_layer inserted to handle the float input (no cause for alarm)
print("==> training")
quant_model.fit(x_train, y_train, epochs=1)
print("==> evaluate")
quant_model.evaluate(x_test,y_test, verbose=2)
quant_model.save('./log/quant_model.h5')
III. How to test your quantized model
1. Define a quantization function.
Note: the saved parameters are still unquantized floats; at inference time they merely pass through a fake-quantize layer, so that layer has to be simulated. The example below is what I needed; adapt it as required. The integer range [-128, 127] corresponds to narrow_range=False in the configs above, and [-127, 127] to narrow_range=True.
Note: if use_bias is enabled, mind the parameter-list ordering described in step 3 below.
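The fake-quantize round trip described above can be sketched in plain Python for a single value (num_bits=8; the function name is my own, and the clamping follows the convention of Get_Quant_Weights below rather than tfmot internals):

```python
def fake_quant_value(x, scope=1.0, narrow_range=False):
    """Round x onto an 8-bit grid covering [-scope, scope], then dequantize.
    narrow_range=False -> integer grid [-128, 127]
    narrow_range=True  -> integer grid [-127, 127]"""
    width = 128
    q = round(x * width / scope)
    qmin = -(width - 1) if narrow_range else -width
    q = max(qmin, min(width - 1, q))   # clamp to the integer grid
    return q * scope / width           # back to float
```

Note the asymmetry with narrow_range=False: -1.0 is representable exactly, but +1.0 clamps to 127/128.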
def Get_Quant_Weights(weights, quant_width=128, quant_scope=1):
    print('max=', np.max(weights), 'min=', np.min(weights))
    weights_q = np.round(weights * quant_width / quant_scope)        # to int
    weights_q[np.where(weights_q >= quant_width)] = quant_width - 1  # truncate positive
    weights_q[np.where(weights_q < -quant_width)] = -quant_width     # truncate negative
    quant_weights = weights_q / quant_width * quant_scope            # back to float
    print('quant_scope=', quant_scope, 'max=', np.max(quant_weights), 'min=', np.min(quant_weights))
    return quant_weights
def Save_Quant_Info(layer_name, ws, bs, save_dir):
    if save_dir is not None:
        if not os.path.exists(save_dir + '/quant_ws_bs.csv'):
            quant_ws_bs = [[layer_name, ws, bs]]
        else:
            with open(save_dir + '/quant_ws_bs.csv', 'r') as file_p:
                quant_ws_bs = list(csv.reader(file_p))
            quant_ws_bs.append([layer_name, ws, bs])
        np.savetxt(save_dir + '/quant_ws_bs.csv', quant_ws_bs, fmt='%s', delimiter=',')
# ori_quant: True means use the min/max ranges learned in training (to test the original quantized accuracy)
# quant_all: True means quantize all parameters
# Note: the case "weights not quantized but bias quantized" is not supported
def Get_Quant_Model_Weights(quant_model, layer_name, ori_quant=False, quant_all=False, save_dir=None):
    # --- user-defined settings --- #
    quant_width_w = 128; quant_width_b = 16384; ws = 1; bs = 1
    # ----------- end ------------- #
    quant_weights = quant_model.get_layer(layer_name).get_weights()
    print(len(quant_weights))
    if len(quant_weights) in [2, 5, 7]:
        weights = [[], []]
        if len(quant_weights) == 2:    # neither weights nor bias quantized
            if ori_quant:
                ws = max(np.max(quant_weights[0]), np.max(quant_weights[1]))
                bs = ws
        elif len(quant_weights) == 5:  # weights quantized, bias not quantized
            if ori_quant:
                ws = -quant_weights[3]
            if not quant_all:
                weights[0] = Get_Quant_Weights(quant_weights[1], quant_width_w, ws)
                weights[1] = quant_weights[0]
                Save_Quant_Info(layer_name, ws, bs, save_dir)
                return weights
            else:
                ws = max(ws, np.max(quant_weights[0]))
                bs = ws
                # swap so that quant_weights[0] is the kernel and [1] the bias
                quant_weights[0], quant_weights[1] = quant_weights[1], quant_weights[0]
        elif len(quant_weights) == 7:  # weights and bias both quantized
            if ori_quant:
                ws = -quant_weights[3]
                bs = -quant_weights[5]
        weights[0] = Get_Quant_Weights(quant_weights[0], quant_width_w, ws)
        weights[1] = Get_Quant_Weights(quant_weights[1], quant_width_b, bs)
    else:
        weights = [[]]
        bs = 0
        if len(quant_weights) == 1:    # weights not quantized, no bias
            if ori_quant:
                ws = np.max(quant_weights[0])
        elif len(quant_weights) == 4:  # weights quantized, no bias
            if ori_quant:
                ws = -quant_weights[2]
        else:
            print('error of quant_weights =', len(quant_weights))
            return
        weights[0] = Get_Quant_Weights(quant_weights[0], quant_width_w, ws)
    Save_Quant_Info(layer_name, ws, bs, save_dir)
    return weights
2. Define an identical model without quantization:
model_q = tf.keras.models.Sequential([
    DepthwiseConv2D(kernel_size=(5,5), strides=(1,1), padding='valid', activation=None, use_bias=True,
                    input_shape=(28,28,1), name='conv1', kernel_constraint=max_norm(1.), bias_constraint=max_norm(1.)),
    MaxPooling2D(pool_size=(3,3), strides=2, padding="same", name='maxpooling1'),
    Convolution2D(filters=6, kernel_size=(1,1), strides=(1,1), padding='valid', activation=None, use_bias=True,
                  name='conv2', kernel_constraint=max_norm(1.), bias_constraint=max_norm(1.)),
    Activation('relu', name='relu1'),
    DepthwiseConv2D(kernel_size=(5,5), strides=(1,1), padding='same', activation=None, use_bias=True,
                    name='conv3', kernel_constraint=max_norm(1.), bias_constraint=max_norm(1.)),
    MaxPooling2D(pool_size=(3,3), strides=2, padding="same", name='maxpooling2'),
    Convolution2D(filters=16, kernel_size=(1,1), strides=(1,1), padding='same', activation=None, use_bias=True,
                  name='conv4', kernel_constraint=max_norm(1.), bias_constraint=max_norm(1.)),
    Activation('relu', name='relu2'),
    Flatten(name='flat'),
    Activation('relu', name='relu3'),
    Dense(84, activation=None, name='fc2', kernel_constraint=max_norm(1.), bias_constraint=max_norm(1.)),
    Activation('relu', name='relu4'),
    Dropout(0.2, name='dropout'),
    Dense(10, name='fc3'),
    Activation('softmax', name='softmax')
])
model_q.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])
model_q.summary()
3. Load the model and test the quantization quality.
Note: if the bias is not quantized, the first entry of the list returned by get_weights() on a quant_model layer is the bias.
The layer parameter lists are, case by case:
    kernel quantized, bias present but not quantized: [bias, kernel, (), kernel_min, kernel_max]
    kernel quantized, no bias: [kernel, (), kernel_min, kernel_max]
    kernel and bias both quantized: [kernel, bias, (), kernel_min, kernel_max, bias_min, bias_max]
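These orderings can be captured in a small dispatch helper (a sketch of my own, mirroring the length-based branching in Get_Quant_Model_Weights; the placeholder strings stand in for the actual arrays):

```python
def unpack_kernel_and_bias(ws):
    """Return (kernel, bias) from a quantized layer's get_weights() list,
    following the length conventions described above."""
    n = len(ws)
    if n == 2:        # neither kernel nor bias quantized
        return ws[0], ws[1]
    if n == 5:        # kernel quantized, bias present but NOT quantized:
        return ws[1], ws[0]   # the bias comes first in this case
    if n == 7:        # kernel and bias both quantized
        return ws[0], ws[1]
    if n in (1, 4):   # no bias (kernel unquantized / quantized)
        return ws[0], None
    raise ValueError('unexpected weight list length: %d' % n)
```

The length-5 case is the trap: forgetting the swap silently loads the bias as the kernel.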
model_dir = './log/quant_model.h5'
with tfmot.quantization.keras.quantize_scope({'MDQC': MDQC, 'DC_MDQC': DC_MDQC}):
    quant_model = load_model(model_dir)
quant_model.summary()
# Adapt the code below to your own needs.
layer_name = ['conv1', 'conv2', 'conv3', 'conv4', 'fc2', 'fc3']
for i in range(len(layer_name)):
    print(layer_name[i])
    weights = Get_Quant_Model_Weights(quant_model, 'quant_' + layer_name[i], ori_quant=True, quant_all=False)
    model_q.get_layer(layer_name[i]).set_weights(weights)
print("==> quant_model evaluate")
quant_model.evaluate(x_test, y_test, verbose=2)
print("==> model_q evaluate")
model_q.evaluate(x_test, y_test, verbose=2)
IV. A solved problem: quantization-aware training does not work with Keras fine-tuning
Note: this belongs to the class of incompatibilities that arise when mixing Keras and TensorFlow code. Keras implements fine-tuning through the trainable attribute, whereas TensorFlow implements it through the var_list argument of optimizer.minimize(loss, var_list).
4.1 Case: the model is built with Keras but trained with TF
Define the part of the model to freeze inside with tf.variable_scope('yyy'): and the part to train inside with tf.variable_scope('xxx'):. Then obtain the trainable variables with trainable_var = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'xxx'), pass them to minimize(loss, var_list), and train with TF as usual. See Reference 4 for details.
4.2 Case: the model is built with Keras and trained with Keras
Note: since tensorflow_model_optimization is TF's quantization support for Keras, it is still native TensorFlow code, and annotated layers cannot be frozen. The workaround is as follows:
First dequantize: convert the weights of the quant_xxx layers into dequantized weights (the Get_Quant_Weights function written in this article does the conversion). Then remove the quantization annotation from the layers to be frozen and load the dequantized weights into a new model; the remaining layers keep their annotations and load their pretrained parameters (or none, if there are no pretrained parameters).
V. A partially solved problem: how to customize the quantization range?
Note: one approach is to skip quantizing the bias during training and add a kernel constraint that bounds the weights to a known range. Some problems remain: for layers with few weights, the maximum trained weight may never reach the bound. A further refinement is to train the quantized model, then dequantize and freeze the layers with few parameters, then fine-tune with quantization-aware training to obtain the final model (an extra step that does not necessarily bring a large accuracy gain; use your own judgment).
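The kernel-constraint idea works because max_norm rescales a weight column whenever its L2 norm exceeds the bound, so with max_norm(1.) every element is guaranteed to lie in [-1, 1] and the quantization scope can be fixed at 1. A plain-Python sketch of the operation on a single column (Keras applies the equivalent rescaling along the constraint axis after each update):

```python
import math

def max_norm_clip(column, max_value=1.0):
    """Rescale a weight column if its L2 norm exceeds max_value
    (mimics the effect of tf.keras.constraints.max_norm on one column)."""
    norm = math.sqrt(sum(w * w for w in column))
    if norm > max_value:
        return [w * max_value / norm for w in column]
    return list(column)
```

Because the L2 norm bounds every individual element, no weight can exceed max_value in magnitude after the constraint is applied.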
Comments and discussion are welcome.
Reference
1. tensorflow_model_optimization API documentation
2. tensorflow_model_optimization examples
3. A primer on quantization (Zhihu)