Keras: a problem encountered with multi-GPU data parallelism

The_answer_manba

Article link: https://blog.csdn.net/z5217632/article/details/80952372

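Problem: in Keras, when training with multi-GPU data parallelism and using ModelCheckpoint to save weights during training, the following error is raised as soon as the first epoch finishes:
TypeError: can't pickle thread.lock objects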

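Analysis:
The error says that a thread.lock object cannot be pickled (serialized). In Keras's data-parallel mechanism, the ordinary single-GPU model is wrapped with multi_gpu_model(), and training is then run as parallel_model.fit(). Because fit() is called on the parallel model, ModelCheckpoint receives and tries to save the parallel wrapper model instead of the original non-parallel model, and serializing that wrapper fails with the error above.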
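The message itself comes from Python's pickle machinery: a thread lock cannot be serialized, so any object graph that contains one fails as a whole. The snippet below is a framework-free illustration in which two plain dicts are hypothetical stand-ins for the single model and the parallel wrapper; it does not claim to show where exactly inside the Keras multi-GPU wrapper the lock lives. The exact wording of the message differs between Python versions (Python 2 says "thread.lock", Python 3 says "_thread.lock").

```python
import pickle
import threading


def pickle_error(obj):
    """Try to pickle obj; return the TypeError message, or None on success."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


# Hypothetical stand-in for the plain model: only ordinary data
single_model_state = {"weights": [0.5, -1.2]}

# Hypothetical stand-in for the parallel wrapper: same data plus a thread lock
parallel_model_state = {"weights": [0.5, -1.2], "lock": threading.Lock()}

print(pickle_error(single_model_state))    # None: pickles fine
print(pickle_error(parallel_model_state))  # message mentions the lock type
```

One unpicklable member is enough to make the whole object fail, which is why saving the wrapper breaks while the single model on its own would be fine.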

Solution: write a custom callback.
See the following post: https://blog.csdn.net/u012862372/article/details/80367607

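My approach was to define a class that inherits from ModelCheckpoint, i.e. method 2 in the referenced post, and that solved the problem. The subclass keeps a reference to the single (non-parallel) model and overrides set_model() so that the checkpoint always saves that model, whatever model fit() hands to the callback.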

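```python
from keras.callbacks import ModelCheckpoint

class ParallelModelCheckpoint(ModelCheckpoint):
    def __init__(self, model, filepath, monitor='val_loss', verbose=0,
                 save_best_only=False, save_weights_only=False,
                 mode='auto', period=1):
        # Keep a reference to the original, non-parallel model
        self.single_model = model
        super(ParallelModelCheckpoint, self).__init__(
            filepath, monitor, verbose, save_best_only,
            save_weights_only, mode, period)

    def set_model(self, model):
        # fit() passes in the multi_gpu_model wrapper; attach the single model instead
        super(ParallelModelCheckpoint, self).set_model(self.single_model)

# single_model: the model built before wrapping it with multi_gpu_model()
check_point = ParallelModelCheckpoint(single_model, 'best.h5')
```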
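Why overriding set_model() is enough: in Keras 2.x, fit() passes the model it was called on (here the multi_gpu_model wrapper) to every callback through set_model(), and ModelCheckpoint saves whatever model it was given. The sketch below mimics that flow without Keras or GPUs. Callback, Checkpoint, ParallelCheckpoint and fit are simplified hypothetical stand-ins, not the real Keras classes, and strings take the place of real models.

```python
class Callback:
    """Hypothetical stand-in for keras.callbacks.Callback."""

    def __init__(self):
        self.model = None

    def set_model(self, model):
        self.model = model


class Checkpoint(Callback):
    """Hypothetical stand-in for ModelCheckpoint: records which model it would save."""

    def __init__(self):
        super().__init__()
        self.saved = []

    def on_epoch_end(self, epoch):
        # The real ModelCheckpoint would call self.model.save(...) here
        self.saved.append((epoch, self.model))


class ParallelCheckpoint(Checkpoint):
    """Same idea as ParallelModelCheckpoint: pin the single model."""

    def __init__(self, single_model):
        self.single_model = single_model
        super().__init__()

    def set_model(self, model):
        # Ignore the model the training loop passes in
        super().set_model(self.single_model)


def fit(model, callbacks, epochs=2):
    """Hypothetical training loop: hands its own model to every callback."""
    for cb in callbacks:
        cb.set_model(model)
    for epoch in range(epochs):
        for cb in callbacks:
            cb.on_epoch_end(epoch)


plain = Checkpoint()
pinned = ParallelCheckpoint("single_model")
fit("parallel_model", [plain, pinned])
print(plain.saved)   # [(0, 'parallel_model'), (1, 'parallel_model')]
print(pinned.saved)  # [(0, 'single_model'), (1, 'single_model')]
```

The plain checkpoint ends up holding the parallel model (the one that fails to serialize in the real case), while the subclass always holds the single model, so the saved file can later be loaded without multiple GPUs.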
