Check failed: top_shape[j] == bottom[i]->shape(j) (1 vs. 2) All inputs must have the same shape, except at concat_axis.

While training ShuffleNet under Caffe, I ran into the following error:

I1018 19:26:19.104892  3548 net.cpp:84] Creating Layer resx13_concat
I1018 19:26:19.104895  3548 net.cpp:406] resx13_concat <- resx13_match_conv
I1018 19:26:19.104898  3548 net.cpp:406] resx13_concat <- resx13_conv3
I1018 19:26:19.104902  3548 net.cpp:380] resx13_concat -> resx13_concat
F1018 19:26:19.104913  3548 concat_layer.cpp:42] Check failed: top_shape[j] == bottom[i]->shape(j) (1 vs. 2) All inputs must have the same shape, except at concat_axis.
*** Check failure stack trace: ***
    @     0x7f2beb8fcdaa  (unknown)
    @     0x7f2beb8fcce4  (unknown)
    @     0x7f2beb8fc6e6  (unknown)
    @     0x7f2beb8ff687  (unknown)
    @     0x7f2bebfc6227  caffe::ConcatLayer<>::Reshape()
    @     0x7f2bec05e365  caffe::Net<>::Init()
    @     0x7f2bec060262  caffe::Net<>::Net()
    @     0x7f2bec01b9a0  caffe::Solver<>::InitTrainNet()
    @     0x7f2bec01c8f3  caffe::Solver<>::Init()
    @     0x7f2bec01cbcf  caffe::Solver<>::Solver()
    @     0x7f2bec079b01  caffe::Creator_SGDSolver<>()
    @           0x40ee6e  caffe::SolverRegistry<>::CreateSolver()
    @           0x407efd  train()
    @           0x40590c  main
    @     0x7f2bea908f45  (unknown)
    @           0x40617b  (unknown)
    @              (nil)  (unknown)

As the message says, the error is caused by the Concat layer's input blobs having mismatched shapes. Searching the training log, the failure occurs right at:

I1018 19:26:19.104895  3548 net.cpp:406] resx13_concat <- resx13_match_conv
I1018 19:26:19.104898  3548 net.cpp:406] resx13_concat <- resx13_conv3
I1018 19:26:19.104902  3548 net.cpp:380] resx13_concat -> resx13_concat

i.e. at this Concat layer: the blobs flowing into it do not line up. There are similar Concat connections earlier in the network that set up fine, so at first I had no clue; carefully comparing the log of an earlier, successful Concat revealed the problem.
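For reference, the check that fires is in ConcatLayer::Reshape() (concat_layer.cpp:42): every bottom blob must match the first one in every dimension except the concat axis, whose sizes are summed. Below is a minimal Python sketch of that rule (my own illustration, not Caffe's actual code):

def concat_output_shape(bottom_shapes, concat_axis=1):
    """Mimic the shape rule of Caffe's ConcatLayer.

    Every input must match the first one in every dimension
    except concat_axis, which is summed into the output.
    """
    top_shape = list(bottom_shapes[0])
    for shape in bottom_shapes[1:]:
        for j, dim in enumerate(shape):
            if j == concat_axis:
                continue
            # This is the comparison that fails with "(1 vs. 2)":
            assert top_shape[j] == dim, (
                f"All inputs must have the same shape, except at "
                f"concat_axis ({top_shape[j]} vs. {dim})")
        top_shape[concat_axis] += shape[concat_axis]
    return top_shape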

I1018 19:26:19.052892  3548 net.cpp:84] Creating Layer resx1_conv3
I1018 19:26:19.052904  3548 net.cpp:406] resx1_conv3 <- resx1_conv2
I1018 19:26:19.052908  3548 net.cpp:380] resx1_conv3 -> resx1_conv3
I1018 19:26:19.053154  3548 net.cpp:122] Setting up resx1_conv3
I1018 19:26:19.053160  3548 net.cpp:129] Top shape: 90 216 6 6 (699840)

Above is the setup of the resx1_conv3 layer; its output shape is [90 216 6 6].

I1018 19:26:19.051407  3548 net.cpp:84] Creating Layer resx1_match_conv
I1018 19:26:19.051409  3548 net.cpp:406] resx1_match_conv <- pool1_pool1_0_split_0
I1018 19:26:19.051414  3548 net.cpp:380] resx1_match_conv -> resx1_match_conv
I1018 19:26:19.051427  3548 net.cpp:122] Setting up resx1_match_conv
I1018 19:26:19.051434  3548 net.cpp:129] Top shape: 90 24 6 6 (77760)

Above is the setup of the resx1_match_conv layer; its output shape is [90 24 6 6].
Then:

I1018 19:26:19.053496  3548 net.cpp:84] Creating Layer resx1_concat
I1018 19:26:19.053500  3548 net.cpp:406] resx1_concat <- resx1_match_conv
I1018 19:26:19.053503  3548 net.cpp:406] resx1_concat <- resx1_conv3
I1018 19:26:19.053508  3548 net.cpp:380] resx1_concat -> resx1_concat
I1018 19:26:19.053527  3548 net.cpp:122] Setting up resx1_concat
I1018 19:26:19.053532  3548 net.cpp:129] Top shape: 90 240 6 6 (777600)
I1018 19:26:19.053534  3548 net.cpp:137] Memory required for data: 51244200

The two inputs agree in every dimension except channels ([90, *, 6, 6]), so the Concat succeeds along axis 1 (24 + 216 = 240) and network construction continues.
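Plugging these two shapes into the sketch above reproduces the logged result:

print(concat_output_shape([[90, 24, 6, 6], [90, 216, 6, 6]]))
# -> [90, 240, 6, 6], exactly the "Top shape: 90 240 6 6" in the log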

Now compare with the failing resx13 block:

I1018 19:26:19.102349  3548 net.cpp:84] Creating Layer resx13_match_conv
I1018 19:26:19.102351  3548 net.cpp:406] resx13_match_conv <- resx12_elewise_resx12_elewise_relu_0_split_0
I1018 19:26:19.102355  3548 net.cpp:380] resx13_match_conv -> resx13_match_conv
I1018 19:26:19.102373  3548 net.cpp:122] Setting up resx13_match_conv
I1018 19:26:19.102377  3548 net.cpp:129] Top shape: 90 480 1 1 (43200)

At this point the output shape of resx13_match_conv is [90 480 1 1].

I1018 19:26:19.103997  3548 net.cpp:84] Creating Layer resx13_conv3
I1018 19:26:19.103999  3548 net.cpp:406] resx13_conv3 <- resx13_conv2
I1018 19:26:19.104003  3548 net.cpp:380] resx13_conv3 -> resx13_conv3
I1018 19:26:19.104576  3548 net.cpp:122] Setting up resx13_conv3
I1018 19:26:19.104599  3548 net.cpp:129] Top shape: 90 480 2 2 (172800)

And the output shape of resx13_conv3 is [90 480 2 2].

The two branches disagree at the spatial dimensions (1 vs. 2), which is exactly where the "(1 vs. 2)" in the top_shape[j] == bottom[i]->shape(j) (1 vs. 2) error comes from.
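The mismatch itself comes from how Caffe rounds output sizes: pooling layers round up (ceil) while convolution layers round down (floor). Starting from the same 3x3 input, the average-pooling shortcut (kernel 3, stride 2) and the conv branch (assuming the usual ShuffleNet 3x3 stride-2 convolution with pad 1 upstream of resx13_conv3) therefore land on different spatial sizes. A quick check of the arithmetic:

import math

def pool_out(in_size, kernel, stride, pad=0):
    # Caffe pooling rounds UP:
    # pooled = ceil((in + 2*pad - kernel) / stride) + 1
    return math.ceil((in_size + 2 * pad - kernel) / stride) + 1

def conv_out(in_size, kernel, stride, pad=0):
    # Caffe convolution rounds DOWN:
    # out = floor((in + 2*pad - kernel) / stride) + 1
    return (in_size + 2 * pad - kernel) // stride + 1

# Shortcut branch: AVE pooling, kernel 3, stride 2, on the 3x3 input
print(pool_out(3, kernel=3, stride=2))         # -> 1

# Conv branch: assumed 3x3 stride-2 conv with pad 1 on the same input
print(conv_out(3, kernel=3, stride=2, pad=1))  # -> 2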

The fix

We need to edit the network definition in train.prototxt so that resx13_match_conv also outputs [90 480 2 2]. Locate the definition of the resx13_match_conv layer:

layer {
  name: "resx13_match_conv"
  type: "Pooling"
  bottom: "resx12_elewise"
  top: "resx13_match_conv"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}

From the log below, the blob arriving from the previous layer has shape [90 480 3 3]:

I1018 19:26:19.102326  3548 net.cpp:122] Setting up resx12_elewise_resx12_elewise_relu_0_split
I1018 19:26:19.102330  3548 net.cpp:129] Top shape: 90 480 3 3 (388800)
I1018 19:26:19.102334  3548 net.cpp:129] Top shape: 90 480 3 3 (388800)
I1018 19:26:19.102335  3548 net.cpp:137] Memory required for data: 256336200
I1018 19:26:19.102337  3548 layer_factory.hpp:77] Creating layer resx13_match_conv
I1018 19:26:19.102349  3548 net.cpp:84] Creating Layer resx13_match_conv
I1018 19:26:19.102351  3548 net.cpp:406] resx13_match_conv <- resx12_elewise_resx12_elewise_relu_0_split_0
I1018 19:26:19.102355  3548 net.cpp:380] resx13_match_conv -> resx13_match_conv
I1018 19:26:19.102373  3548 net.cpp:122] Setting up resx13_match_conv
I1018 19:26:19.102377  3548 net.cpp:129] Top shape: 90 480 1 1 (43200)

So change kernel_size: 3 to kernel_size: 2:

layer {
  name: "resx13_match_conv"
  type: "Pooling"
  bottom: "resx12_elewise"
  top: "resx13_match_conv"
  pooling_param {
    pool: AVE
    kernel_size: 2
    stride: 2
  }
}
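Re-running the pooling arithmetic with the helper from above confirms the new kernel size produces the matching 2x2 output:

# ceil((3 - 2) / 2) + 1 = ceil(0.5) + 1 = 2
print(pool_out(3, kernel=2, stride=2))  # -> 2, matching resx13_conv3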

With both inputs now shaped [90 480 2 2], the Concat layer sets up correctly and the network trains successfully!
