Model Fusion in Practice: Code

weixin_42001089

Article link: https://blog.csdn.net/weixin_42001089/article/details/121673751

Preface

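After training a model, it is worth trying model fusion, which usually brings a small gain. The scenario here is the following: after training for several epochs we keep the top-k checkpoints, say three of them: a, b, and c. We load all three, take their trained parameter values, average them, and use the averaged parameters as a new model. Another kind of fusion averages the predictions of the three models (similar to voting). That is also worth trying, but at serving time every request has to pass through all three models, which costs too much latency, whereas parameter averaging still leaves a single model to deploy. This post gives two short demos so you can try the idea quickly, one for TensorFlow and one for PyTorch, the two most common frameworks. For other frameworks, look up the corresponding APIs and follow the same steps.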
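To make the difference between the two fusion styles concrete, here is a toy sketch in plain Python. The one-neuron model, the three checkpoints, and the input value are all made up for illustration. It shows that, once a nonlinearity is involved, averaging parameters and averaging predictions generally give different outputs, and that only the first style ends up with a single model to run.

```python
# Toy comparison of the two fusion styles; all numbers are made up.

def relu(z):
    return z if z > 0 else 0.0

def predict(params, x):
    """A one-neuron 'model': relu(w * x + b)."""
    w, b = params
    return relu(w * x + b)

# Three hypothetical checkpoints (w, b) of the same architecture.
checkpoints = [(1.0, -0.5), (0.8, 0.1), (1.2, -0.2)]
x = 0.3
n = len(checkpoints)

# Style 1: average the parameters, then run ONE model at inference time.
avg_params = (
    sum(w for w, _ in checkpoints) / n,
    sum(b for _, b in checkpoints) / n,
)
weight_avg_prediction = predict(avg_params, x)

# Style 2: run ALL models and average their predictions.
prediction_avg = sum(predict(p, x) for p in checkpoints) / n

print(weight_avg_prediction, prediction_avg)
```

The two numbers differ here because the first checkpoint's output is clipped to zero by the ReLU, which parameter averaging does not see.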

Feel free to follow the author's WeChat official account and other channels listed at the end of this post for more content.

Approach

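Two steps are needed. First, obtain the parameter values of each of the top-k models. Second, write the averaged result back into a model. Put plainly, it comes down to finding two APIs: one that reads a model's parameter values and one that overwrites them. If you use another framework, find out what these two APIs are in that framework and the problem is essentially solved.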
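As a minimal illustration of the "two APIs" idea in PyTorch, the sketch below reads parameters with `state_dict()` and writes new values back with `load_state_dict()`. The tiny `nn.Linear` model and the all-zero replacement values are assumptions chosen only to keep the example short.

```python
import torch
from torch import nn

model = nn.Linear(2, 1)

# API 1: read the parameters (an ordered mapping of name -> tensor).
state = model.state_dict()
print(list(state.keys()))

# Build replacement values with the same names, shapes, and dtypes.
# Zeros are used here only so the effect is easy to see.
new_state = {name: torch.zeros_like(t) for name, t in state.items()}

# API 2: write the parameters back into the model.
model.load_state_dict(new_state)
```

Parameter averaging is the same round trip, with the replacement values computed as the mean over several checkpoints.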

PyTorch

Core pseudocode:

import collections

# models: list of trained models that share one architecture
# new_model: a model of the same architecture that receives the averaged weights
model_dict = [cur_model.state_dict() for cur_model in models]
parameter_name_list = list(model_dict[0].keys())
new_model_parameter = collections.OrderedDict()
for parameter_name in parameter_name_list:
    parameter_val = 0
    for i in range(len(model_dict)):
        parameter_val = parameter_val + model_dict[i][parameter_name]
    new_model_parameter[parameter_name] = parameter_val / len(model_dict)
new_model.load_state_dict(new_model_parameter)

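Now a real example, using the widely used transformers library. SimBertModel is imported from the author's own project module (basic_modules.SimBERT_model).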

import os
#os.environ["CUDA_VISIBLE_DEVICES"] = "2"
import torch
import collections
from tqdm import tqdm
from transformers import BertTokenizer
from basic_modules.SimBERT_model import SimBertModel
from transformers import WEIGHTS_NAME, CONFIG_NAME
class Merge_model():

    def __init__(self, root_models_path):
        self.root_models_path = root_models_path
        self.model_list = []
        print("*"*20)
        print("load models ...")
        # Keep the plain list of names: a tqdm object cannot be indexed later
        model_names = sorted(os.listdir(self.root_models_path))
        progress = tqdm(model_names)
        for model_name in progress:
            progress.set_description("loading %s" % model_name)
            model_path = os.path.join(self.root_models_path, model_name)
            model = SimBertModel(model_path)
            #self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
            #model = SimBertModel(model_path).to(self.device)
            model.load_state_dict(torch.load(os.path.join(model_path, 'pytorch_model.bin')))
            self.model_list.append(model)
        print("All models loaded successfully")
        print("num of models to be merged is : " + str(len(self.model_list)))

        # Use the first model directory for the tokenizer and as the container for the merged weights
        first_model_path = os.path.join(self.root_models_path, model_names[0])
        self.bert_tokenizer = BertTokenizer.from_pretrained(first_model_path, do_lower_case=True)
        self.merge_model = SimBertModel(first_model_path)
            
    def merge_average(self, output_dir=None):
        '''
        Fuse the models by averaging their weights.
        '''
        print("*"*20)
        print("merging ...")
        worker_state_dict = [x.state_dict() for x in self.model_list]
        weight_keys = tqdm(list(worker_state_dict[0].keys()))
        fed_state_dict = collections.OrderedDict()
        for key in weight_keys:
            weight_keys.set_description("merging weights %s" % key)
            key_sum = 0
            for i in range(len(self.model_list)):
                key_sum = key_sum + worker_state_dict[i][key]
            fed_state_dict[key] = key_sum / len(self.model_list)
        
        self.merge_model.load_state_dict(fed_state_dict)
        
        print("*"*20)
        print("merge successfully")
        print("save ...")
        
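        if not output_dir:
            output_dir = os.path.join(self.root_models_path, "merge")
        # torch.save does not create missing directories
        os.makedirs(output_dir, exist_ok=True)

        # Unwrap the model if it is wrapped in PyTorch DistributedDataParallel or DataParallel
        model_to_save = self.merge_model.module if hasattr(self.merge_model, 'module') else self.merge_model
        # Saving under the predefined names allows loading with `from_pretrained`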
        output_model_file = os.path.join(output_dir, WEIGHTS_NAME)
        output_config_file = os.path.join(output_dir, CONFIG_NAME)

        torch.save(model_to_save.state_dict(), output_model_file)
        model_to_save.config.to_json_file(output_config_file)
        self.bert_tokenizer.save_vocabulary(output_dir)
        
        print("save successfully")
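One detail the class above does not address is that a `state_dict` can also contain non-float entries, such as the integer `num_batches_tracked` counter of BatchNorm layers, for which an arithmetic mean is not meaningful. The sketch below is one possible way to handle this, not the author's method: it averages floating-point tensors and copies everything else from the first checkpoint. The helper name `average_state_dicts` and the tiny `make_model` network are hypothetical.

```python
import collections

import torch
from torch import nn

def average_state_dicts(state_dicts):
    """Average float tensors key by key; copy non-float entries from the first dict."""
    averaged = collections.OrderedDict()
    for key, first in state_dicts[0].items():
        if torch.is_floating_point(first):
            stacked = torch.stack([sd[key] for sd in state_dicts])
            averaged[key] = stacked.mean(dim=0)
        else:
            # e.g. BatchNorm's num_batches_tracked, an int64 counter
            averaged[key] = first.clone()
    return averaged

def make_model():
    # Hypothetical stand-in for a real network
    return nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3))

models = [make_model() for _ in range(3)]
merged = make_model()
merged.load_state_dict(average_state_dicts([m.state_dict() for m in models]))
```

Whether copying the counters from the first checkpoint is appropriate depends on the model; for a BERT-style network without BatchNorm the distinction mostly concerns index buffers.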

TensorFlow

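With TensorFlow, the first things that come to mind are probably get_weights() and set_weights(). I looked them up and tried them, without success. The difficulties are as follows. To average parameters we first need to read their values, and we only care about the trainable parameters, i.e. trainable_variables; non-trainable values do not need to be averaged. The other problem is modifying the parameter values. Searching for related blog posts returns plenty of results, but most of them do not solve this problem. A typical example is https://blog.csdn.net/jiongnima/article/details/86632517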

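That approach has a problem: after the values are modified, loading the model fails, with errors about missing input placeholders and the like. The deeper reason is given in the first comment on that post: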

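This method can rename variables, but the original model's graph is not carried over, so the result cannot be used for model reuse: the graph written back is not the original model's graph but a graph with no logic in it. Other methods found online pass a var_list to saver.save, and that does not work either. The listed variables are written to the data file, but the graph is still wrong, because saver.save writes every variable in the current graph to the meta file, which again differs from the original model's graph. TensorFlow provides no effective interface for model reuse.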

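I used the tf.assign API instead. The code in this section relies on the TensorFlow 1.x graph and session API (tf.Session, tf.ConfigProto). Core pseudocode: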

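import collections
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

# get parameter
# checkpoint_path_list: checkpoint prefixes (paths without the .meta suffix) of the models to merge
new_model_parameter = collections.OrderedDict()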
for checkpoint_path in checkpoint_path_list:
    graph = tf.Graph()
    with graph.as_default():
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        with sess.as_default():
            saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
            saver.restore(sess, checkpoint_path)
    variable_list = sess.graph.get_collection('trainable_variables')
    for v in variable_list:
        if v.name not in new_model_parameter:
            new_model_parameter[v.name] = [sess.run(v)]
        else:
            new_model_parameter[v.name].append(sess.run(v))

#merge parameter
checkpoint_path = checkpoint_path_list[0]
graph = tf.Graph()
with graph.as_default():
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    with sess.as_default():
        saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
        saver.restore(sess, checkpoint_path)
    variable_list = sess.graph.get_collection('trainable_variables')
    for v in variable_list:
        merge_value = list(map(lambda x: np.expand_dims(x, 0), new_model_parameter[v.name]))
        var = np.mean(np.concatenate(merge_value), 0)
        update = tf.assign(v, var)
        sess.run(update)
    saver = tf.train.Saver(tf.global_variables())
    model_name = 'model_merge'
    checkpoint_merge_path = model_name
    saver.save(sess, checkpoint_merge_path)

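It looks like a lot, but there is not much to it. In short: sess.graph.get_collection retrieves the trainable variables of every model, and their values are stored in the new_model_parameter dictionary. Then any one of the models is loaded (so that the original model's full graph is present), tf.assign overwrites the variable values, and the result is saved.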
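The averaging step itself is plain NumPy and can be checked without TensorFlow. The sketch below uses three made-up arrays as stand-ins for the values that sess.run(v) would return for one variable. It reproduces the expand_dims / concatenate / mean form from the snippet above and compares it with the shorter `np.stack` form, which computes the same thing.

```python
import numpy as np

# Three made-up values of one variable, standing in for sess.run(v) results.
values = [np.full((2, 3), 1.0), np.full((2, 3), 2.0), np.full((2, 3), 6.0)]

# Form used in the snippet above: add a leading axis, concatenate, average over it.
merged_a = np.mean(np.concatenate([np.expand_dims(x, 0) for x in values]), 0)

# Equivalent shorter form: stack along a new leading axis, then average.
merged_b = np.mean(np.stack(values), axis=0)

print(merged_a.shape, np.array_equal(merged_a, merged_b))
```

Either form keeps the variable's original shape, which is what tf.assign needs.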

Here is a real example as well:

import os
import numpy as np
import tensorflow as tf

top_k = 3
root_model_path = "./cache/soccer/2021-10-22_17-21-11"

model_score_dict = dict()
for filename in os.listdir(root_model_path):
    if filename.endswith(".ckpt.meta"):
        score = float(filename.split("_")[4][:-10])
        model_score_dict[score] = filename[:-5]

# get top parameter name and value
model_parameter_dict = dict()
merge_score_list = []
for i in sorted(model_score_dict, reverse=True)[:top_k]:
    merge_score_list.append(str(i))
    checkpoint_path = os.path.join(root_model_path, model_score_dict[i])
    graph = tf.Graph()
    with graph.as_default():
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        with sess.as_default():
            saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
            saver.restore(sess, checkpoint_path)
    variable_list = sess.graph.get_collection('trainable_variables')
    for v in variable_list:
        if v.name not in model_parameter_dict:
            model_parameter_dict[v.name] = [sess.run(v)]
        else:
            model_parameter_dict[v.name].append(sess.run(v))

#merge 
checkpoint_path = os.path.join(root_model_path, list(model_score_dict.values())[0])
graph = tf.Graph()
with graph.as_default():
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    with sess.as_default():
        saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
        saver.restore(sess, checkpoint_path)
    variable_list = sess.graph.get_collection('trainable_variables')
    for v in variable_list:
        merge_value = list(map(lambda x: np.expand_dims(x, 0), model_parameter_dict[v.name]))
        var = np.mean(np.concatenate(merge_value), 0)
        update = tf.assign(v, var)
        sess.run(update)
    saver = tf.train.Saver(tf.global_variables())
    model_name = 'model_merge_' + "_".join(merge_score_list)
    checkpoint_merge_path = os.path.join(root_model_path, model_name)
    saver.save(sess, checkpoint_merge_path)


print(" successfully !!!")
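The score parsing in the script above depends on how the checkpoint files are named, which the post does not show. The sketch below assumes a hypothetical naming scheme in which the fifth underscore-separated field is the score followed by ".ckpt.meta", which is consistent with `filename.split("_")[4][:-10]`. It also collects (score, prefix) pairs in a list instead of a dict keyed by score, so that two checkpoints with the same score do not overwrite each other.

```python
# Hypothetical file names; the real naming scheme is not shown in the post.
filenames = [
    "model_epoch_3_f1_0.8166.ckpt.meta",
    "model_epoch_5_f1_0.8160.ckpt.meta",
    "model_epoch_7_f1_0.8081.ckpt.meta",
    "model_epoch_9_f1_0.7900.ckpt.meta",
    "model_epoch_3_f1_0.8166.ckpt.index",  # ignored: not a .meta file
]
suffix = ".ckpt.meta"
top_k = 3

scored = []
for name in filenames:
    if name.endswith(suffix):
        # Fifth field is "<score>.ckpt.meta"; strip the suffix to get the score.
        score = float(name.split("_")[4][: -len(suffix)])
        # Checkpoint prefix, i.e. the path handed to saver.restore.
        prefix = name[: -len(".meta")]
        scored.append((score, prefix))

# Highest scores first; keep only the top_k checkpoints.
top = sorted(scored, reverse=True)[:top_k]
for score, prefix in top:
    print(score, prefix)
```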

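Finally, to be safe, we can check the result by printing the values and confirming that the new model's parameters are the average of the other models' parameters.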

import os
import numpy as np
import tensorflow as tf

top_k = 3
root_model_path = "./cache/soccer/2021-10-20_11-06-41"

model_score_dict = dict()
for filename in os.listdir(root_model_path):
    if filename.endswith(".ckpt.meta"):
        score = float(filename.split("_")[4][:-10])
        model_score_dict[score] = filename[:-5]

# get top parameter name and value
model_parameter_dict = dict()
merge_score_list = []
for i in sorted(model_score_dict, reverse=True)[:top_k]:
    merge_score_list.append(str(i))
    checkpoint_path = os.path.join(root_model_path, model_score_dict[i])
    graph = tf.Graph()
    with graph.as_default():
        config = tf.ConfigProto(allow_soft_placement=True)
        config.gpu_options.allow_growth = True
        sess = tf.Session(config=config)
        with sess.as_default():
            saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
            saver.restore(sess, checkpoint_path)
    variable_list = sess.graph.get_collection('trainable_variables')
    for v in variable_list:
        if v.name not in model_parameter_dict:
            model_parameter_dict[v.name] = [sess.run(v)]
        else:
            model_parameter_dict[v.name].append(sess.run(v))


for i in model_parameter_dict["feature_output/kernel:0"]:
    print(i[:2])


model_parameter_dict = dict()
checkpoint_path = os.path.join(root_model_path, "model_merge_0.8166_0.816_0.8081")
graph = tf.Graph()
with graph.as_default():
    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    with sess.as_default():
        saver = tf.train.import_meta_graph('{}.meta'.format(checkpoint_path))
        saver.restore(sess, checkpoint_path)
variable_list = sess.graph.get_collection('trainable_variables')
for v in variable_list:
    if v.name not in model_parameter_dict:
        model_parameter_dict[v.name] = [sess.run(v)]
    else:
        model_parameter_dict[v.name].append(sess.run(v))

for i in model_parameter_dict["feature_output/kernel:0"]:
    print(i[:2])
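Comparing printed numbers by eye works for a spot check, but the comparison can also be automated. The sketch below is a hypothetical helper, `is_average`, that uses `np.allclose` to test whether a merged array equals the element-wise mean of the source arrays. The arrays are made up. In the script above, the source values would have to be kept under a separate name before model_parameter_dict is reset.

```python
import numpy as np

def is_average(merged, sources, atol=1e-6):
    """Return True if `merged` equals the element-wise mean of `sources`."""
    return bool(np.allclose(merged, np.mean(np.stack(sources), axis=0), atol=atol))

# Made-up stand-ins for the arrays collected with sess.run(v).
base = np.array([[0.1, 0.2], [0.3, 0.4]])
sources = [base * k for k in (1.0, 2.0, 3.0)]
merged = np.array([[0.2, 0.4], [0.6, 0.8]])

print(is_average(merged, sources))
```

A tolerance is used because checkpoints are typically stored as float32, so exact equality is not a reasonable expectation.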

Follow

Zhihu:

小小梦想 on Zhihu, ML/NLP researcher. WeChat official account: “算法让生活更美好”. https://www.zhihu.com/people/sa-tuo-de-yisheng/posts

GitHub:

Mryangkaitong · GitHub
