Model Fusion in Practice: Code Examples

Preface

After training a model, it is often worth trying model fusion; it usually brings a small gain. The scenario here is the following: suppose we train for many epochs and keep the top-k best checkpoints, say three of them: a, b, and c. We can load these three models, take their trained parameter values, average them, and use the averaged parameters as a new model. There is of course another kind of fusion, where we average the predictions of the three models (similar to voting). That is also worth trying, but at serving time it requires running all three models, which is too slow, whereas the parameter-averaging approach still deploys a single model. So this post gives two simple demos so that you can quickly try the idea, covering the two most common frameworks, TensorFlow and PyTorch; for other frameworks you just need to find the corresponding APIs.
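As an aside, the prediction-averaging variant mentioned above is only a few lines. Here is a minimal PyTorch sketch (the model list and input batch are hypothetical, not part of the demos below):

import torch

def ensemble_predict(models, batch):
    """Prediction-averaging fusion: average the outputs of several models (the 'voting' style)."""
    with torch.no_grad():
        outputs = torch.stack([model(batch) for model in models], dim=0)
    return outputs.mean(dim=0)  # note: at serving time this still has to run every model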

Feel free to follow my WeChat official account and the other channels listed at the end of the post; more good content will be shared there.

Approach

To make this work there are really only two steps. First, read the parameter values of the top-k models. Second, write the averaged values back into a model. Put more bluntly, we just need to find two APIs: one that reads a model's parameter values, and one that modifies them. If you use a different framework, look up what those two APIs are in that framework and the problem is essentially solved.
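Framework aside, the averaging step itself is trivial. Here is a hypothetical helper (my own sketch, assuming each model's parameters have already been read into a name-to-array dict):

from typing import Dict, List
import numpy as np

def average_parameters(param_dicts: List[Dict[str, np.ndarray]]) -> Dict[str, np.ndarray]:
    """Average parameters with the same name across several models."""
    averaged = {}
    for name in param_dicts[0]:
        averaged[name] = sum(d[name] for d in param_dicts) / len(param_dicts)
    return averaged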

PyTorch

Core pseudocode:

import collections

# state_dict() reads each model's parameters as an (ordered) name -> tensor mapping
model_dict = [cur_model.state_dict() for cur_model in models]
parameter_name_list = list(model_dict[0].keys())

# average every parameter across the models
new_model_parameter = collections.OrderedDict()
for parameter_name in parameter_name_list:
    parameter_val = 0
    for i in range(len(model_dict)):
        parameter_val = parameter_val + model_dict[i][parameter_name]
    new_model_parameter[parameter_name] = parameter_val / len(model_dict)

# load_state_dict() writes the averaged parameters into a fresh model
new_model.load_state_dict(new_model_parameter)

Below is a real example, using the transformers library that we use most often.

import os
#os.environ["CUDA_VISIBLE_DEVICES"] = "2"
import torch
import collections
from tqdm import tqdm
from transformers import BertTokenizer
from basic_modules.SimBERT_model import SimBertModel
from transformers import WEIGHTS_NAME, CONFIG_NAME
class Merge_model():

    def __init__(self, root_models_path):
        self.root_models_path = root_models_path
        self.model_list = []
        print("*" * 20)
        print("load models ...")
        # each sub-directory under root_models_path holds one saved checkpoint
        model_dir_list = os.listdir(self.root_models_path)
        progress_bar = tqdm(model_dir_list)
        for model_dir in progress_bar:
            progress_bar.set_description("loading %s" % model_dir)
            model_path = os.path.join(self.root_models_path, model_dir)
            model = SimBertModel(model_path)
            #self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
            #model = SimBertModel(model_path).to(self.device)
            model.load_state_dict(torch.load(os.path.join(model_path, 'pytorch_model.bin')))
            self.model_list.append(model)
        print("All models loaded successfully")
        print("num of models to be merged is : " + str(len(self.model_list)))

        # any of the checkpoint directories works as the skeleton for the merged model
        self.bert_tokenizer = BertTokenizer.from_pretrained(os.path.join(self.root_models_path, model_dir_list[0]), do_lower_case=True)
        self.merge_model = SimBertModel(os.path.join(self.root_models_path, model_dir_list[0]))
            
    def merge_average(self, output_dir=None):
        '''
        Weight-averaging fusion: average the parameters of all loaded models
        and load the result into self.merge_model.
        '''
        print("*"*20)
        print("merging ...")
        worker_state_dict = [x.state_dict() for x in self.model_list]
        weight_keys = tqdm(list(worker_state_dict[0].keys()))
        fed_state_dict = collections.OrderedDict()
        for key in weight_keys:
            weight_keys.set_description("merging weights %s" % key)
            key_sum = 0
            for i in range(len(self.model_list)):
                key_sum = key_sum + worker_state_dict[i][key]
            fed_state_dict[key] = key_sum / len(self.model_list)
        
        self.merge_model.load_state_dict(fed_state_dict)
        
        print("*"*20)
        print("merge successfully")
        print("save ...")
        
        if not output_dir:
            output_dir = os.path.join(self.root_models_path, "merge")
        os.makedirs(output_dir, exist_ok=True)

        # unwrap the model if it is wrapped in PyTorch DistributedDataParallel or DataParallel
        model_to_save = self.merge_model.module if hasattr(self.merge_model, 'module') else self.merge_model
        # saving with the predefined file names lets us reload later with `from_pretrained`
        output_model_file = os.path.join(output_dir, WEIGHTS_NAME)
        output_config_file = os.path.join(output_dir, CONFIG_NAME)

        torch.save(model_to_save.state_dict(), output_model_file)
        model_to_save.config.to_json_file(output_config_file)
        self.bert_tokenizer.save_vocabulary(output_dir)
        
        print("save successfully")

TensorFlow

For TensorFlow, the first thing that comes to mind is probably get_weights() and set_weights(). I looked them up and tried them, but without success. Let me first point out the difficulties here. When averaging parameters we first need to read their values, and we really only care about the trainable parameters, i.e. trainable_variables; non-trainable values do not need to be averaged. The other problem is writing the new values back. If you search related blog posts you will find plenty, but most of them do not solve this particular problem, for example: https://blog.csdn.net/jiongnima/article/details/86632517

That approach actually has a problem: after modifying the weights, loading the result fails, with errors about missing input placeholders and so on. The deeper reason is what the first comment under that post points out:

This method can rename variables, but the original model's graph is not carried over, so the result cannot be reused: the graph that gets written back is not the original graph but one with no logic in it. Other posts suggest passing a var_list to saver.save, but that does not work either; the listed variables are written into the data file, yet the graph is still wrong, because saver.save writes all variables of the current graph into the meta file, which differs from the original model's graph. TensorFlow does not really provide an effective interface for this kind of model reuse.

Here I use the tf.assign API instead. Core pseudocode:

import collections
import numpy as np
import tensorflow as tf

# step 1: read the trainable-variable values of every checkpoint
new_model_parameter = collections.OrderedDict()
for checkpoint_path in checkpoint_path_list:
    graph = tf.Graph()
    with graph.as_default():
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        with sess.as_default():
            saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
            saver.restore(sess, checkpoint_path)
    variable_list = sess.graph.get_collection('trainable_variables')
    for v in variable_list:
        if v.name not in new_model_parameter:
            new_model_parameter[v.name] = [sess.run(v)]
        else:
            new_model_parameter[v.name].append(sess.run(v))
# step 2: restore one checkpoint (to bring the original graph back), overwrite its
# trainable variables with the averaged values via tf.assign, then save
checkpoint_path = checkpoint_path_list[0]
graph = tf.Graph()
with graph.as_default():
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    with sess.as_default():
        saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
        saver.restore(sess, checkpoint_path)
    variable_list = sess.graph.get_collection('trainable_variables')
    for v in variable_list:
        merge_value = list(map(lambda x: np.expand_dims(x, 0), new_model_parameter[v.name]))
        var = np.mean(np.concatenate(merge_value), 0)
        update = tf.assign(v, var)
        sess.run(update)
    saver = tf.train.Saver(tf.global_variables())
    model_name = 'model_merge'
    checkpoint_merge_path = model_name
    saver.save(sess, checkpoint_merge_path)

It looks like a lot, but it really is not much. In short, sess.graph.get_collection reads the values of every model's trainable parameters and stores them in the new_model_parameter dict; then we load any one of the models (just so the original graph gets loaded back), change its parameter values with tf.assign, and finally save.
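One small note of my own on this approach: every tf.assign call adds a new op to the restored graph, which is harmless for a one-off merge script but slightly pollutes the graph that gets saved. In TF 1.x, Variable.load writes a value directly into a variable without adding ops, so the update loop could equally be written as the following sketch:

    for v in variable_list:
        merge_value = list(map(lambda x: np.expand_dims(x, 0), new_model_parameter[v.name]))
        var = np.mean(np.concatenate(merge_value), 0)
        v.load(var, sess)  # same effect as sess.run(tf.assign(v, var)), but adds no ops to the graph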

Here is a real example as well:

import os
import numpy as np
import tensorflow as tf

top_k = 3
root_model_path = "./cache/soccer/2021-10-22_17-21-11"

model_score_dict = dict()
for filename in os.listdir(root_model_path):
    if filename.endswith(".ckpt.meta"):
        # the validation score is encoded in the checkpoint file name (project-specific naming)
        score = float(filename.split("_")[4][:-10])
        model_score_dict[score] = filename[:-5]  # strip the ".meta" suffix

# get top parameter name and value
model_parameter_dict = dict()
merge_score_list = []
for i in sorted(model_score_dict, reverse=True)[:top_k]:
    merge_score_list.append(str(i))
    checkpoint_path = os.path.join(root_model_path, model_score_dict[i])
    graph = tf.Graph()
    with graph.as_default():
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        with sess.as_default():
            saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
            saver.restore(sess, checkpoint_path)
    variable_list = sess.graph.get_collection('trainable_variables')
    for v in variable_list:
        if v.name not in model_parameter_dict:
            model_parameter_dict[v.name] = [sess.run(v)]
        else:
            model_parameter_dict[v.name].append(sess.run(v))

#merge 
checkpoint_path = os.path.join(root_model_path, list(model_score_dict.values())[0])
graph = tf.Graph()
with graph.as_default():
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    with sess.as_default():
        saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
        saver.restore(sess, checkpoint_path)
    variable_list = sess.graph.get_collection('trainable_variables')
    for v in variable_list:
        merge_value = list(map(lambda x: np.expand_dims(x, 0), model_parameter_dict[v.name]))
        var = np.mean(np.concatenate(merge_value), 0)
        update = tf.assign(v, var)
        sess.run(update)
    saver = tf.train.Saver(tf.global_variables())
    model_name = 'model_merge_' + "_".join(merge_score_list)
    checkpoint_merge_path = os.path.join(root_model_path, model_name)
    saver.save(sess, checkpoint_merge_path)


print(" successfully !!!")

Finally, just to be safe, we can check the result: print the parameters and verify that the new model's weights really are the average of the other models' weights.

import os
import numpy as np
import tensorflow as tf

top_k = 3
root_model_path = "./cache/soccer/2021-10-20_11-06-41"

model_score_dict = dict()
for filename in os.listdir(root_model_path):
    if filename.endswith(".ckpt.meta"):
        score = float(filename.split("_")[4][:-10])
        model_score_dict[score] = filename[:-5]

# get top parameter name and value
model_parameter_dict = dict()
merge_score_list = []
for i in sorted(model_score_dict, reverse=True)[:top_k]:
    merge_score_list.append(str(i))
    checkpoint_path = os.path.join(root_model_path, model_score_dict[i])
    graph = tf.Graph()
    with graph.as_default():
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        with sess.as_default():
            saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
            saver.restore(sess, checkpoint_path)
    variable_list = sess.graph.get_collection('trainable_variables')
    for v in variable_list:
        if v.name not in model_parameter_dict:
            model_parameter_dict[v.name] = [sess.run(v)]
        else:
            model_parameter_dict[v.name].append(sess.run(v))



# print (the first two rows of) one weight matrix from each of the top-k original models
for i in model_parameter_dict["feature_output/kernel:0"]:
    print(i[:2])


# now load the merged checkpoint and print the same weight matrix for comparison
model_parameter_dict = dict()
checkpoint_path = os.path.join(root_model_path, "model_merge_0.8166_0.816_0.8081")
graph = tf.Graph()
with graph.as_default():
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    with sess.as_default():
        saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
        saver.restore(sess, checkpoint_path)
variable_list = sess.graph.get_collection('trainable_variables')
for v in variable_list:
    if v.name not in model_parameter_dict:
        model_parameter_dict[v.name] = [sess.run(v)]
    else:
        model_parameter_dict[v.name].append(sess.run(v))

for i in model_parameter_dict["feature_output/kernel:0"]:
    print(i[:2])
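Instead of eyeballing the printed numbers, the comparison can also be made explicit. A small sketch, assuming `topk_kernels` holds the three original matrices printed above and `merged_kernel` the one read from the merged checkpoint (both names are hypothetical placeholders for values already collected above):

expected = np.mean(np.stack(topk_kernels, axis=0), axis=0)
assert np.allclose(expected, merged_kernel, atol=1e-6), "merged weights are not the element-wise average"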

Follow

Zhihu:

小小梦想: https://www.zhihu.com/people/sa-tuo-de-yisheng/posts

GitHub:

Mryangkaitong
