Preface
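After training a model, it is worth trying model merging, which usually brings a small gain. By model merging I mean the following scenario: suppose we trained for several epochs and kept the top-k best checkpoints, say three of them: a, b, and c. We can load each of the three models, take their trained parameter values, average them, and use the averaged parameters as a new model. There is also another kind of model ensembling that averages the predictions of the three models (similar in spirit to voting), which is worth trying too; however, at serving time every request then has to go through three models, which adds too much latency, whereas parameter merging still deploys a single model. So this post gives two simple demos to help you put this idea into practice quickly, for the two most commonly used frameworks, TensorFlow and PyTorch. For other frameworks, follow the same thread and look up the corresponding APIs.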
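In symbols (the notation is added here, not taken from the original), with K checkpoints whose parameters are θ^(1) to θ^(K) and a network f, parameter merging and prediction averaging are:

```latex
\theta_{\text{merged}} = \frac{1}{K}\sum_{k=1}^{K}\theta^{(k)}
\qquad\text{vs.}\qquad
\hat{y}(x) = \frac{1}{K}\sum_{k=1}^{K} f\bigl(x;\theta^{(k)}\bigr)
```

The first yields one model, so serving costs a single forward pass f(x; θ_merged) per input; the second needs K forward passes per input.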
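Approach
To get this done we essentially need two steps: first, obtain the parameter values of each of the top-k models; second, write the averaged values back into a model. Put more plainly, the main task is to find two APIs: one that reads a model's parameter values and one that modifies them. So if you use another framework, just look up what these two APIs are called there, and the problem is basically solved.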
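To make the two steps concrete without committing to any framework, here is a minimal sketch in which a "model" is just a dict mapping parameter names to NumPy arrays. The functions get_params and set_params are hypothetical stand-ins for a framework's read and write APIs (in PyTorch these roles are played by state_dict() and load_state_dict()).

```python
import numpy as np


def get_params(model):
    # Hypothetical "read" API: return a copy of every named parameter.
    return {name: np.array(value, copy=True) for name, value in model.items()}


def set_params(model, params):
    # Hypothetical "write" API: overwrite parameters in place, checking shapes.
    for name, value in params.items():
        if model[name].shape != value.shape:
            raise ValueError(f"shape mismatch for {name}")
        model[name][...] = value


def average_into(models, target):
    # Step 1: read the parameters of every checkpoint.
    all_params = [get_params(m) for m in models]
    # Step 2: average name by name and write the result back into `target`.
    averaged = {
        name: np.mean([p[name] for p in all_params], axis=0)
        for name in all_params[0]
    }
    set_params(target, averaged)
    return target


a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
b = {"w": np.array([3.0, 4.0]), "b": np.array([1.0])}
c = {"w": np.array([5.0, 6.0]), "b": np.array([2.0])}
merged = average_into([a, b, c], get_params(a))  # target is a copy of a
print(merged["w"], merged["b"])  # [3. 4.] [1.]
```

Writing into a copy of one of the inputs mirrors what the framework versions below do: the averaged values are loaded into a model that already has the right structure.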
PyTorch
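Core code (models is a list of the loaded top-k models and new_model is a freshly built model with the same architecture):
import collections

model_dict = [cur_model.state_dict() for cur_model in models]
parameter_name_list = list(model_dict[0].keys())
new_model_parameter = collections.OrderedDict()
for parameter_name in parameter_name_list:
    parameter_val = 0
    # Sum this tensor over all models, then divide to get the element-wise mean
    for i in range(len(model_dict)):
        parameter_val = parameter_val + model_dict[i][parameter_name]
    new_model_parameter[parameter_name] = parameter_val / len(model_dict)
new_model.load_state_dict(new_model_parameter)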
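A self-contained way to try the core loop above, assuming only that torch is installed: three small models with the same architecture (a toy nn.Sequential invented for this sketch, not the author's model) are averaged into a fourth. Non-floating-point entries of the state dict, such as BatchNorm's num_batches_tracked counter, are copied from the first model instead of being averaged; that is a design choice of this sketch.

```python
import collections

import torch
from torch import nn


def make_model():
    # Toy architecture, chosen only for illustration.
    return nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3), nn.Linear(3, 2))


def average_models(models, new_model):
    state_dicts = [m.state_dict() for m in models]
    averaged = collections.OrderedDict()
    for name, first in state_dicts[0].items():
        if not torch.is_floating_point(first):
            # Integer buffers such as num_batches_tracked: keep model 0's value.
            averaged[name] = first.clone()
            continue
        # Element-wise mean of this tensor over all checkpoints.
        averaged[name] = torch.stack([sd[name] for sd in state_dicts]).mean(dim=0)
    new_model.load_state_dict(averaged)
    return new_model


torch.manual_seed(0)
models = [make_model() for _ in range(3)]
merged = average_models(models, make_model())

expected = (models[0][0].weight + models[1][0].weight + models[2][0].weight) / 3
print(torch.allclose(merged[0].weight, expected))  # True
```

Stacking and calling mean(dim=0) is equivalent to the running sum divided by the count in the core code; it just does the reduction in one call.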
Now let's write a real example, using transformers, the library we use most often.
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "2"
import collections

import torch
from tqdm import tqdm
from transformers import BertTokenizer, WEIGHTS_NAME, CONFIG_NAME
from basic_modules.SimBERT_model import SimBertModel  # the author's own model class


class Merge_model():
    def __init__(self, root_models_path):
        self.root_models_path = root_models_path
        self.model_list = []
        print("*" * 20)
        print("load models ...")
        # Every sub-directory of root_models_path is one checkpoint to merge;
        # skip the "merge" output directory in case of a re-run
        model_names = sorted(
            name for name in os.listdir(self.root_models_path)
            if os.path.isdir(os.path.join(self.root_models_path, name)) and name != "merge"
        )
        progress = tqdm(model_names)
        for model_name in progress:
            progress.set_description("loading %s" % model_name)
            model_path = os.path.join(self.root_models_path, model_name)
            model = SimBertModel(model_path)
            # self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
            # model = SimBertModel(model_path).to(self.device)
            model.load_state_dict(torch.load(os.path.join(model_path, 'pytorch_model.bin'), map_location='cpu'))
            self.model_list.append(model)
        print("All models loaded successfully")
        print("num of models to be merged is : " + str(len(self.model_list)))
        # Reuse the first checkpoint's tokenizer and architecture for the merged model
        first_model_path = os.path.join(self.root_models_path, model_names[0])
        self.bert_tokenizer = BertTokenizer.from_pretrained(first_model_path, do_lower_case=True)
        self.merge_model = SimBertModel(first_model_path)

    def merge_average(self, output_dir=None):
        '''
        Merge the models by averaging their weights.
        '''
        print("*" * 20)
        print("merging ...")
        worker_state_dict = [x.state_dict() for x in self.model_list]
        weight_keys = tqdm(list(worker_state_dict[0].keys()))
        fed_state_dict = collections.OrderedDict()
        for key in weight_keys:
            weight_keys.set_description("merging weights %s" % key)
            if not torch.is_floating_point(worker_state_dict[0][key]):
                # Integer buffers (e.g. position ids) are not averaged; keep the first model's copy
                fed_state_dict[key] = worker_state_dict[0][key]
                continue
            key_sum = 0
            for i in range(len(self.model_list)):
                key_sum = key_sum + worker_state_dict[i][key]
            fed_state_dict[key] = key_sum / len(self.model_list)
        self.merge_model.load_state_dict(fed_state_dict)
        print("*" * 20)
        print("merge successfully")
        print("save ...")
        if not output_dir:
            output_dir = os.path.join(self.root_models_path, "merge")
        os.makedirs(output_dir, exist_ok=True)
        # Unwrap the model if it is wrapped in PyTorch DistributedDataParallel or DataParallel
        model_to_save = self.merge_model.module if hasattr(self.merge_model, 'module') else self.merge_model
        # Saving under the predefined file names lets the result be loaded with `from_pretrained`
        output_model_file = os.path.join(output_dir, WEIGHTS_NAME)
        output_config_file = os.path.join(output_dir, CONFIG_NAME)
        torch.save(model_to_save.state_dict(), output_model_file)
        model_to_save.config.to_json_file(output_config_file)
        self.bert_tokenizer.save_vocabulary(output_dir)
        print("save successfully")
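Merge_model above instantiates every checkpoint as a full SimBertModel. If only the weights are needed, a lighter variant is to average the saved state dict files directly and write a new weight file. This sketch assumes every file is a state dict saved with torch.save (like pytorch_model.bin in the example) and that all files share the same keys and shapes; the function name and the temporary-directory demo are illustrative only.

```python
import os
import tempfile

import torch


def merge_state_dict_files(paths, output_path):
    """Average the floating-point tensors of several saved state dicts."""
    merged = None
    for path in paths:
        # Load one checkpoint at a time onto the CPU.
        state_dict = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {name: tensor.clone() for name, tensor in state_dict.items()}
            continue
        for name, tensor in state_dict.items():
            if torch.is_floating_point(tensor):
                merged[name] += tensor  # running sum
    for name, tensor in list(merged.items()):
        if torch.is_floating_point(tensor):
            merged[name] = tensor / len(paths)
        # Non-float entries keep the value from the first file.
    torch.save(merged, output_path)
    return merged


# Demo with three tiny fake checkpoints written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    path = os.path.join(tmp_dir, "ckpt%d.bin" % i)
    torch.save({"w": torch.full((2,), float(i)), "step": torch.tensor(i)}, path)
    paths.append(path)
out = merge_state_dict_files(paths, os.path.join(tmp_dir, "merged.bin"))
print(out["w"])  # tensor([1., 1.])
```

Only two state dicts are held in memory at any time, which matters for large models. If the weights are stored in fp16, converting to float32 before summing is a safer variant.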
TensorFlow
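When it comes to TensorFlow, the first thing that comes to mind is probably get_weights() and set_weights(); after looking them up I tried them, but did not get them to work. Let me first explain what makes this hard. To average parameters we first need to read their values, and we only care about the trainable parameters (trainable_variables); non-trainable values do not need to be averaged. The other problem is modifying the parameter values. If you search for related blog posts you will very likely find many, but most of them do not solve the problem at hand, for example: https://blog.csdn.net/jiongnima/article/details/86632517
That approach is actually flawed: after you modify the checkpoint, loading it fails, with errors saying that some of the model's input placeholders and the like are missing. The deeper reason is exactly what the first comment on that post says:
This method can change variable names, but the original model's graph is not inherited, so it cannot be used to reuse the model: the graph written back is not the original model's graph but a graph with no logic. Other posts suggest calling saver.save with a specified var_list; that does not work either. The variables in the list are written to the data file, but the graph is still wrong, because saver.save writes all variables in the current graph into the meta file, which again differs from the original model's graph. TensorFlow does not provide any effective interface for model reuse.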
Here I used the tf.assign API instead. Core code:
# TensorFlow 1.x API; checkpoint_path_list holds the checkpoint prefixes of the top-k models
import collections

import numpy as np
import tensorflow as tf

# get parameters
new_model_parameter = collections.OrderedDict()
for checkpoint_path in checkpoint_path_list:
    graph = tf.Graph()
    with graph.as_default():
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        with sess.as_default():
            # Rebuild this checkpoint's graph and restore its variable values
            saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
            saver.restore(sess, checkpoint_path)
            variable_list = sess.graph.get_collection('trainable_variables')
            for v in variable_list:
                if v.name not in new_model_parameter:
                    new_model_parameter[v.name] = [sess.run(v)]
                else:
                    new_model_parameter[v.name].append(sess.run(v))
        sess.close()

# merge parameters
checkpoint_path = checkpoint_path_list[0]
graph = tf.Graph()
with graph.as_default():
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    with sess.as_default():
        # Load one of the models only to get the original graph
        saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
        saver.restore(sess, checkpoint_path)
        variable_list = sess.graph.get_collection('trainable_variables')
        for v in variable_list:
            # Stack the K copies along a new axis 0 and average over it
            merge_value = list(map(lambda x: np.expand_dims(x, 0), new_model_parameter[v.name]))
            var = np.mean(np.concatenate(merge_value), 0)
            update = tf.assign(v, var)
            sess.run(update)
        # Save every variable; the .meta file is written from the imported graph
        saver = tf.train.Saver(tf.global_variables())
        model_name = 'model_merge'
        checkpoint_merge_path = model_name
        saver.save(sess, checkpoint_merge_path)
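It looks long, but there is not much to it. In short, sess.graph.get_collection('trainable_variables') returns all of the model's trainable variables, whose values we read with sess.run and store in the new_model_parameter dictionary. Then we load any one of the models (just to bring in the original model's graph), overwrite the parameter values with tf.assign, and finally save.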
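The averaging line in the merge loop adds a new leading axis to each of the K copies of a variable, concatenates them, and takes the mean over that axis. A NumPy-only sketch (no TensorFlow needed; the values are made up) showing that this is the same as the more direct np.stack form:

```python
import numpy as np

# Three hypothetical copies of one 2x2 kernel, one per checkpoint.
values = [np.full((2, 2), 1.0), np.full((2, 2), 2.0), np.full((2, 2), 6.0)]

# Form used in the post: add a leading axis, concatenate, mean over axis 0.
via_concat = np.mean(np.concatenate([np.expand_dims(x, 0) for x in values]), 0)

# Equivalent, more direct form.
via_stack = np.mean(np.stack(values), axis=0)

print(via_concat)  # every entry is 3.0
print(np.array_equal(via_concat, via_stack))  # True
```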
Again, here is a real example:
import os

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.assign, ...)

top_k = 3
root_model_path = "./cache/soccer/2021-10-22_17-21-11"
model_score_dict = dict()
for filename in os.listdir(root_model_path):
    if filename.endswith(".ckpt.meta"):
        # Assumes the 5th "_"-separated field of the file name is "<score>.ckpt.meta"
        score = float(filename.split("_")[4][:-10])
        model_score_dict[score] = filename[:-5]  # strip ".meta" to get the checkpoint prefix

# get the parameter names and values of the top-k models
model_parameter_dict = dict()
merge_score_list = []
for i in sorted(model_score_dict, reverse=True)[:top_k]:
    merge_score_list.append(str(i))
    checkpoint_path = os.path.join(root_model_path, model_score_dict[i])
    graph = tf.Graph()
    with graph.as_default():
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        with sess.as_default():
            saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
            saver.restore(sess, checkpoint_path)
            variable_list = sess.graph.get_collection('trainable_variables')
            for v in variable_list:
                if v.name not in model_parameter_dict:
                    model_parameter_dict[v.name] = [sess.run(v)]
                else:
                    model_parameter_dict[v.name].append(sess.run(v))
        sess.close()

# merge
# Any checkpoint can provide the graph; its trainable variables are overwritten below
checkpoint_path = os.path.join(root_model_path, list(model_score_dict.values())[0])
graph = tf.Graph()
with graph.as_default():
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    with sess.as_default():
        saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
        saver.restore(sess, checkpoint_path)
        variable_list = sess.graph.get_collection('trainable_variables')
        for v in variable_list:
            merge_value = list(map(lambda x: np.expand_dims(x, 0), model_parameter_dict[v.name]))
            var = np.mean(np.concatenate(merge_value), 0)
            update = tf.assign(v, var)
            sess.run(update)
        saver = tf.train.Saver(tf.global_variables())
        model_name = 'model_merge_' + "_".join(merge_score_list)
        checkpoint_merge_path = os.path.join(root_model_path, model_name)
        saver.save(sess, checkpoint_merge_path)
print(" successfully !!!")
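The script recovers each checkpoint's score by slicing the file name, which only works for one exact naming pattern, and it keys model_score_dict by score, so two checkpoints with the same score overwrite each other. A slightly more defensive sketch, assuming hypothetical file names such as model_epoch_3_acc_0.8166.ckpt.meta (the score is the last "_"-separated field before .ckpt.meta), keeps (score, prefix) pairs and sorts them; top_k_checkpoints is a name made up for this example.

```python
import re

# Assumed naming pattern: <anything>_<score>.ckpt.meta
CKPT_META = re.compile(r"^(?P<prefix>.*_(?P<score>\d+(?:\.\d+)?)\.ckpt)\.meta$")


def top_k_checkpoints(filenames, top_k):
    scored = []
    for name in filenames:
        m = CKPT_META.match(name)
        if m:
            scored.append((float(m.group("score")), m.group("prefix")))
    # Highest score first; equal scores are all kept (the sort is stable).
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


files = [
    "model_epoch_1_acc_0.8081.ckpt.meta",
    "model_epoch_2_acc_0.8166.ckpt.meta",
    "model_epoch_3_acc_0.8160.ckpt.meta",
    "model_epoch_4_acc_0.8160.ckpt.meta",
    "model_epoch_2_acc_0.8166.ckpt.index",  # not a .meta file, ignored
]
for score, prefix in top_k_checkpoints(files, top_k=3):
    print(score, prefix)
```

The returned prefixes are what tf.train.import_meta_graph and saver.restore expect once joined with root_model_path.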
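Finally, just to be safe, we can check the result in practice: print some values and see whether the new model's parameters really are the average of the other models' parameters.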
import os

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

top_k = 3
root_model_path = "./cache/soccer/2021-10-20_11-06-41"
model_score_dict = dict()
for filename in os.listdir(root_model_path):
    if filename.endswith(".ckpt.meta"):
        score = float(filename.split("_")[4][:-10])
        model_score_dict[score] = filename[:-5]

# get the parameter names and values of the top-k models
model_parameter_dict = dict()
merge_score_list = []
for i in sorted(model_score_dict, reverse=True)[:top_k]:
    merge_score_list.append(str(i))
    checkpoint_path = os.path.join(root_model_path, model_score_dict[i])
    graph = tf.Graph()
    with graph.as_default():
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        with sess.as_default():
            saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
            saver.restore(sess, checkpoint_path)
            variable_list = sess.graph.get_collection('trainable_variables')
            for v in variable_list:
                if v.name not in model_parameter_dict:
                    model_parameter_dict[v.name] = [sess.run(v)]
                else:
                    model_parameter_dict[v.name].append(sess.run(v))
        sess.close()
# Print the first two rows of this kernel in each of the top-k models
for i in model_parameter_dict["feature_output/kernel:0"]:
    print(i[:2])

# Load the merged model and print the same rows for comparison
model_parameter_dict = dict()
checkpoint_path = os.path.join(root_model_path, "model_merge_0.8166_0.816_0.8081")
graph = tf.Graph()
with graph.as_default():
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    with sess.as_default():
        saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
        saver.restore(sess, checkpoint_path)
        variable_list = sess.graph.get_collection('trainable_variables')
        for v in variable_list:
            if v.name not in model_parameter_dict:
                model_parameter_dict[v.name] = [sess.run(v)]
            else:
                model_parameter_dict[v.name].append(sess.run(v))
for i in model_parameter_dict["feature_output/kernel:0"]:
    print(i[:2])
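Instead of eyeballing the printed rows, the comparison can be made explicit. A NumPy-only sketch, assuming the per-checkpoint values and the merged value of one variable (for example model_parameter_dict["feature_output/kernel:0"] before and after reloading) have already been read into arrays; check_is_average is a hypothetical helper and the random kernels only stand in for real weights.

```python
import numpy as np


def check_is_average(per_checkpoint, merged, atol=1e-6):
    # Element-wise mean over checkpoints; atol tolerates float32 round-off.
    expected = np.mean(np.stack(per_checkpoint), axis=0)
    max_diff = float(np.max(np.abs(expected - merged)))
    return np.allclose(expected, merged, atol=atol), max_diff


rng = np.random.default_rng(0)
kernels = [rng.normal(size=(4, 3)).astype(np.float32) for _ in range(3)]
merged_kernel = (kernels[0] + kernels[1] + kernels[2]) / 3
ok, diff = check_is_average(kernels, merged_kernel)
print(ok, diff)
```

Looping this check over every key of the dictionaries covers all trainable variables rather than a single kernel.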