Caffe multi-GPU error: parallel.cpp:241] Check failed: solver_->net()->params().size() == solver_->net()->learna

诗大人

Article link: https://blog.csdn.net/u010205128/article/details/81298751

While training a Siamese network on multiple GPUs, I ran into the following error:

F0731 09:13:30.965281 14024 parallel.cpp:241] Check failed: solver_->net()->params().size() == solver_->net()->learnable_params().size() (118 vs. 80) Layer-wise reduce is not supported for nets with shared weights.
*** Check failure stack trace: ***

Issue #5538 on the Caffe GitHub repository gives the fix: add the line layer_wise_reduce: false to solver.prototxt, then restart training. That resolved the problem.

Issue #5538: https://github.com/BVLC/caffe/issues/5538

The issue contains the following explanation: "Setting layer_wise_reduce: false in the solver specification should resolve this. The issue is that the order of gradients with weight sharing does not necessarily respect topological ordering of the layer graph that the parallel implementation follows for overlapping communication with computation. The error is triggered to keep from accidentally computing the wrong gradients."

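In other words, the parallel implementation follows the topological order of the layer graph so that it can overlap communication with computation, but with shared weights the gradients do not necessarily become available in that order. The check fails on purpose, so that wrong gradients are not computed by accident.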
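To make the two numbers in the failed check concrete, here is a toy model in Python. It is not Caffe code. It assumes that params() holds one entry per blob of every layer, while learnable_params() holds each shared blob only once. The layer and parameter names are made up for illustration.

```python
def count_params(layers):
    """Count blobs the way the failed check is assumed to see them.

    layers: list of (layer_name, [param_name, ...]) tuples, where blobs
    that use the same param_name are treated as shared weights.
    Returns (params_size, learnable_params_size).
    """
    params_size = 0        # every blob of every layer, shared or not
    seen_names = set()     # names of blobs that already have an owner
    for _layer_name, param_names in layers:
        for name in param_names:
            params_size += 1
            seen_names.add(name)  # a shared name is only counted once here
    return params_size, len(seen_names)


# Hypothetical Siamese net: two branches that share all weights by name.
siamese = [
    ("conv1",   ["conv1_w", "conv1_b"]),
    ("ip1",     ["ip1_w", "ip1_b"]),
    ("conv1_p", ["conv1_w", "conv1_b"]),  # second branch reuses conv1 blobs
    ("ip1_p",   ["ip1_w", "ip1_b"]),      # second branch reuses ip1 blobs
]

# Same layers, but every blob has its own name (no sharing).
unshared = [
    ("conv1", ["conv1_w", "conv1_b"]),
    ("ip1",   ["ip1_w", "ip1_b"]),
]

print(count_params(siamese))
print(count_params(unshared))
```

With sharing, the two counts differ, which is the condition the check rejects. Under this reading, the "118 vs. 80" in the log above would mean that 38 blob entries are shared duplicates.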

layer_wise_reduce is an optional solver setting and defaults to true, so it has to be set to false explicitly.
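If several solver files need the same change, the edit can be scripted. The following is a minimal sketch in plain Python that works on the prototxt text only and does not use the Caffe API. The helper name disable_layer_wise_reduce and the file name in the example are hypothetical. It also assumes that the field, if present, sits on a line of its own.

```python
import re

# Matches an existing "layer_wise_reduce: <value>" line, optionally
# followed by a trailing "# comment". Only spaces and tabs are allowed
# around the tokens so that a match never spans more than one line.
_FIELD = re.compile(
    r"^[ \t]*layer_wise_reduce[ \t]*:[ \t]*\w+[ \t]*(#.*)?$",
    re.MULTILINE,
)


def disable_layer_wise_reduce(solver_text: str) -> str:
    """Return solver prototxt text with layer_wise_reduce set to false."""
    line = "layer_wise_reduce: false"
    if _FIELD.search(solver_text):
        # The field is already present: overwrite its value in place.
        return _FIELD.sub(line, solver_text)
    # The field is absent (so the default applies): append it.
    if solver_text and not solver_text.endswith("\n"):
        solver_text += "\n"
    return solver_text + line + "\n"


# Hypothetical solver file contents, used only for illustration.
example = 'net: "siamese_train_test.prototxt"\nbase_lr: 0.01\n'
patched = disable_layer_wise_reduce(example)
print(patched)
```

Running the helper a second time on its own output leaves the text unchanged, so it is safe to apply to files that were already fixed by hand.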
