1 Error Description
1.1 System Environment
Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu (kernel 4.15.0-74-generic)
– GCC/Compiler version (if compiled from source):
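1.2 Basic Information
1.2.1 Script
The script builds a single-operator network around AvgPool that applies 2-D average pooling to a multi-dimensional input. The script is as follows: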
import numpy as np
import mindspore
from mindspore import Tensor, nn, ops


class Net(nn.Cell):
    """Single-operator network: 2-D average pooling only."""
    def __init__(self):
        super(Net, self).__init__()
        self.avgpool_op = ops.AvgPool(pad_mode="VALID", kernel_size=32, strides=1)

    def construct(self, x):
        result = self.avgpool_op(x)
        return result


# NCHW input: N=65, C=32, H=20, W=128
x = Tensor(np.arange(128 * 20 * 32 * 65).reshape(65, 32, 20, 128), mindspore.float32)
net = Net()
output = net(x)
print(output)
1.2.2 Error
The error message is as follows:
Traceback (most recent call last):
File "avgpool.py", line 17, in <module>
output = net(x)
File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 573, in __call__
out = self.compile_and_run(*args)
File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 956, in compile_and_run
self.compile(*inputs)
File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 929, in compile
_cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1063, in compile
result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 575, in __infer__
out[track] = fn(*(x[track] for x in args))
File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/ops/operations/nn_ops.py", line 1572, in infer_shape
raise ValueError(f"For '{self.name}', the each element of the output shape must be larger than 0, "
ValueError: For 'AvgPool', the each element of the output shape must be larger than 0, but got output shape: [65, 32, -3, 105]. The input shape: [65, 32, 20, 128], kernel size: (1, 1, 24, 24), strides: (1, 1, 1, 1).Please check the official api documents for more information about the output.
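The negative dimension can be reproduced by hand. With pad_mode="VALID" there is no padding, so each spatial output size is the input size minus the kernel size, divided by the stride (rounded down), plus one. Plugging in the input shape, kernel size and strides reported in the log:

$$
H_{out} = \left\lfloor \frac{H_{in} - k_h}{s_h} \right\rfloor + 1 = \frac{20 - 24}{1} + 1 = -3,
\qquad
W_{out} = \left\lfloor \frac{W_{in} - k_w}{s_w} \right\rfloor + 1 = \frac{128 - 24}{1} + 1 = 105
$$

The height goes negative because the kernel is taller than the 20-row input. The width stays positive.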
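Cause Analysis
Start with the error message. The ValueError reads: For 'AvgPool', the each element of the output shape must be larger than 0, but got output shape: [65, 32, -3, 105]. Every dimension of the output shape must be greater than zero, but a negative value appeared. An invalid output shape usually means that one of the inputs does not meet the operator's requirements, so we begin by checking each input in turn.

With pad_mode="VALID" and strides=1, each output spatial dimension equals input size - kernel size + 1. Any kernel taller than the input height of 20 therefore produces a negative height. With the kernel size shown in the log (24), this gives 20 - 24 + 1 = -3.

Next, compare the official API description with our arguments. strides=1 is within the valid range, but kernel_size=32 is not. The documentation states that on Ascend, the product of the kernel height and width must be less than 256, and 32*32=1024. The fix is to reduce the kernel height or width so that the kernel both fits within the 20x128 input and satisfies this area limit.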
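The analysis above reduces to two checks that can be run before the network is built. The block below is a minimal pure-Python sketch, not a MindSpore API. `valid_pool_output_shape` and `check_avgpool_args` are hypothetical helpers. The shape formula assumes pad_mode="VALID" on an NCHW input. The area limit of 256 is taken from the documentation statement quoted in the analysis.

```python
from typing import List, Sequence, Tuple

# Assumption: Ascend limit as stated in the analysis (kernel_h * kernel_w < 256).
ASCEND_MAX_KERNEL_AREA = 256


def valid_pool_output_shape(
    input_shape: Sequence[int], kernel: Tuple[int, int], stride: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Output shape of 2-D pooling with pad_mode="VALID" on an NCHW input."""
    n, c, h, w = input_shape
    kh, kw = kernel
    sh, sw = stride
    # VALID means no padding: out = floor((in - k) / s) + 1 on each spatial axis.
    return (n, c, (h - kh) // sh + 1, (w - kw) // sw + 1)


def check_avgpool_args(
    input_shape: Sequence[int], kernel: Tuple[int, int], stride: Tuple[int, int]
) -> List[str]:
    """Return human-readable problems with the arguments (empty list if none found)."""
    problems = []
    out_shape = valid_pool_output_shape(input_shape, kernel, stride)
    if min(out_shape[2:]) <= 0:
        problems.append(
            f"output shape {list(out_shape)} has a non-positive spatial dim: "
            f"kernel {kernel} does not fit input {tuple(input_shape[2:])}"
        )
    if kernel[0] * kernel[1] >= ASCEND_MAX_KERNEL_AREA:
        problems.append(
            f"kernel area {kernel[0] * kernel[1]} is not below {ASCEND_MAX_KERNEL_AREA}"
        )
    return problems


print(check_avgpool_args((65, 32, 20, 128), (32, 32), (1, 1)))  # two problems
print(check_avgpool_args((65, 32, 20, 128), (15, 15), (1, 1)))  # []
```

The original kernel_size=32 fails both checks. A 15x15 kernel passes both.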
2 Solution
Based on the cause identified above, the fix is straightforward:
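import numpy as np
import mindspore
from mindspore import Tensor, nn, ops


class Net(nn.Cell):
    """Single-operator network: 2-D average pooling only."""
    def __init__(self):
        super(Net, self).__init__()
        # 15 <= H (20), and 15 * 15 = 225 < 256
        self.avgpool_op = ops.AvgPool(pad_mode="VALID", kernel_size=15, strides=1)

    def construct(self, x):
        result = self.avgpool_op(x)
        return result


# NCHW input: N=65, C=32, H=20, W=128
x = Tensor(np.arange(128 * 20 * 32 * 65).reshape(65, 32, 20, 128), mindspore.float32)
net = Net()
output = net(x)
print(output.shape)  # print the output shape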
The script now runs successfully and prints:
(65, 32, 6, 114)
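The fixed script uses kernel_size=15. A small search confirms that 15 is the largest square kernel meeting both constraints from the cause analysis: it fits the 20-row input with stride 1, and its area stays below 256. `largest_square_kernel` is a hypothetical helper. This sketch does not claim that the author chose 15 this way.

```python
def largest_square_kernel(height: int, width: int, max_area: int = 256) -> int:
    """Largest k for which a k x k VALID pool (stride 1) fits the input and k * k < max_area."""
    best = 0
    # A kernel larger than either spatial dim would give a non-positive output size.
    for k in range(1, min(height, width) + 1):
        if k * k < max_area:
            best = k
    return best


k = largest_square_kernel(20, 128)
print(k, (20 - k + 1, 128 - k + 1))  # 15 (6, 114)
```

A smaller kernel also changes the pooling window, so the right size ultimately depends on what the model needs. The search only tells you the upper bound.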
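3 Summary
Steps for locating this kind of error:
1. Find the user code line that raised the error: output = net(x);
2. Use keywords in the error log to narrow down the problem, here: "the each element of the output shape must be larger than 0";
3. Pay close attention to whether variables, here the operator arguments, are defined and initialized correctly.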
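Step 2 can be partly automated. This minimal sketch pulls the reported output shape out of an error message and returns the indices of its non-positive dimensions. `nonpositive_output_dims` is a hypothetical helper. The regular expression assumes the message wording shown in the log above, which may differ in other MindSpore versions.

```python
import re
from typing import List, Optional


def nonpositive_output_dims(message: str) -> Optional[List[int]]:
    """Indices of non-positive entries in the 'output shape: [...]' part of an error message."""
    # Assumes the message format shown in the log above; other versions may word it differently.
    match = re.search(r"output shape: \[([-\d,\s]+)\]", message)
    if match is None:
        return None
    dims = [int(d) for d in match.group(1).split(",")]
    return [i for i, d in enumerate(dims) if d <= 0]


log = ("ValueError: For 'AvgPool', the each element of the output shape must be larger "
       "than 0, but got output shape: [65, 32, -3, 105]. The input shape: [65, 32, 20, 128]")
print(nonpositive_output_dims(log))  # [2] -> the H axis of an NCHW tensor
```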