1 Error Description
1.1 System Environment
Hardware Environment (Ascend/GPU/CPU): Ascend
Software Environment:
-- MindSpore version (source or binary): 1.6.0
-- Python version (e.g., Python 3.7.5): 3.7.6
-- OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
-- GCC/Compiler version (if compiled from source):
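1.2 Basic Information
1.2.1 Script
The training script builds a single-operator network around Pad to pad the borders of an input tensor. The script is as follows (the imports of numpy as np, mindspore, mindspore.nn as nn, and mindspore.Tensor are omitted):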
01 class Net(nn.Cell):
02     def __init__(self):
03         super(Net, self).__init__()
04         self.pad_op = nn.Pad(((0, 0), (-1, 1)))
05
06     def construct(self, input_x):
07         output = self.pad_op(input_x)
08         return output
09 net = Net()
10
11 input_x = Tensor(np.ones([3, 3]), mindspore.float32)
12 output = net(input_x)
13 print('output shape', output.shape)
1.2.2 Error Message
The error output is as follows:
[INFO] ANALYZER(169538,ffffb713b010,python):2022-04-07-11:09:58.692.494 [mindspore/ccsrc/pipeline/jit/static_analysis/async_eval_result.cc:103] Wait] Infer finished.
Traceback (most recent call last):
  File "pad.py", line 12, in <module>
    output = net(input_x)
  File "/lib/python3.7/site-packages/mindspore/nn/cell.py", line 569, in __call__
    out = self.compile_and_run(*args)
  File "/lib/python3.7/site-packages/mindspore/nn/cell.py", line 899, in compile_and_run
    self.compile(*inputs)
  File "/lib/python3.7/site-packages/mindspore/nn/cell.py", line 884, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/lib/python3.7/site-packages/mindspore/common/api.py", line 784, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
  File "/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 575, in __infer__
    out[track] = fn(*(x[track] for x in args))
  File "/lib/python3.7/site-packages/mindspore/ops/operations/nn_ops.py", line 3944, in infer_shape
    raise ValueError(f"For '{self.name}', all elements of paddings must be >= 0.")
ValueError: For 'Pad', all elements of paddings must be >= 0.
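Cause Analysis
The ValueError in the error message reads "For 'Pad', all elements of paddings must be >= 0", meaning that every value passed in paddings must be greater than or equal to 0. From this message and the traceback, it is easy to trace the problem back to line 04 of the script, where the paddings attribute is initialized with the negative value -1. Looking up pad in the API mapping shows that the PyTorch and MindSpore pad operators differ in usage (see the figure).
For details, see [https://www.mindspore.cn/docs/migration_guide/zh-CN/r1.6/api_mapping/pytorch_diff/Pad.html](https://www.mindspore.cn/docs/migration_guide/zh-CN/r1.6/api_mapping/pytorch_diff/Pad.html). Note that the r1.3 documentation on the official site states that torch.nn.functional.pad and mindspore.ops.Pad are functionally identical. That description is inaccurate; the r1.5 branch and all later branches describe the similarities and differences between the two.
PyTorch accepts negative padding values. An example script: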
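import torch

# torch.empty leaves the values uninitialized; only the shape matters here.
x = torch.empty(3, 3)
# (left, right) padding for the last dimension; a negative value crops.
pad = (-1, 1)
output = torch.nn.functional.pad(x, pad)
print('output shape', output.shape)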
It runs successfully and prints:
output shape torch.Size([3, 3])
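The difference between the two shape rules can be sketched in plain Python. This is only an illustration of the behaviour described above, not the actual MindSpore or PyTorch implementation; the helper names torch_pad_to_pairs and padded_shape and the allow_negative flag are hypothetical and exist only for this example.

```python
from typing import Sequence, Tuple


def torch_pad_to_pairs(pad: Sequence[int], ndim: int) -> Tuple[Tuple[int, int], ...]:
    """Convert a flat PyTorch-style pad tuple (last dimension first) into
    one (before, after) pair per dimension, first dimension first."""
    if len(pad) % 2 != 0 or len(pad) // 2 > ndim:
        raise ValueError("pad must hold an even number of values, at most 2 * ndim")
    pairs = [(0, 0)] * ndim
    for i in range(len(pad) // 2):
        pairs[ndim - 1 - i] = (pad[2 * i], pad[2 * i + 1])
    return tuple(pairs)


def padded_shape(shape: Sequence[int],
                 paddings: Sequence[Tuple[int, int]],
                 allow_negative: bool) -> Tuple[int, ...]:
    """Compute the output shape of a pad operation.

    allow_negative=True mimics the PyTorch rule (negative values crop);
    allow_negative=False mimics the check reported in the traceback.
    """
    if len(paddings) != len(shape):
        raise ValueError("paddings needs one (before, after) pair per dimension")
    out = []
    for size, (before, after) in zip(shape, paddings):
        if not allow_negative and (before < 0 or after < 0):
            raise ValueError("all elements of paddings must be >= 0")
        new_size = size + before + after
        if new_size < 0:
            raise ValueError("padding crops more elements than the dimension holds")
        out.append(new_size)
    return tuple(out)


pairs = torch_pad_to_pairs((-1, 1), ndim=2)
print(pairs)                                              # ((0, 0), (-1, 1))
print(padded_shape((3, 3), pairs, allow_negative=True))   # (3, 3)
try:
    padded_shape((3, 3), pairs, allow_negative=False)
except ValueError as err:
    print("rejected:", err)
```

With the PyTorch-style rule the last dimension loses one element on the left and gains one on the right, so the shape stays (3, 3); with the non-negative rule the same paddings are rejected before any shape is computed.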
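2 Solution
Given the cause identified above, the fix is straightforward: replace the negative value in paddings on line 04 with a non-negative one.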
01 class Net(nn.Cell):
02     def __init__(self):
03         super(Net, self).__init__()
04         self.pad_op = nn.Pad(((0, 0), (1, 1)))
05
06     def construct(self, input_x):
07         output = self.pad_op(input_x)
08         return output
09 net = Net()
10
11 input_x = Tensor(np.ones([3, 3]), mindspore.float32)
12 output = net(input_x)
13 print('output shape', output.shape)
The script now runs successfully and prints:
output shape (3, 5)
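Note that replacing (-1, 1) with (1, 1) changes the result: the output shape is (3, 5) rather than the (3, 3) that PyTorch produces. If the cropping behaviour of the PyTorch script needs to be preserved, one possible approach is to crop first and then pad with non-negative values only. The sketch below is a minimal plain-Python illustration of that idea; split_paddings is a hypothetical helper, not a MindSpore API, and it assumes a negative padding means "remove that many elements from that side".

```python
from typing import Sequence, Tuple


def split_paddings(paddings: Sequence[Tuple[int, int]]):
    """Split paddings that may contain negatives into
    (crop slices, non-negative paddings), one entry per dimension."""
    slices = []
    non_negative = []
    for before, after in paddings:
        # A negative "before" drops elements from the start of the dimension.
        start = -before if before < 0 else None
        # A negative "after" drops elements from the end of the dimension.
        stop = after if after < 0 else None
        slices.append(slice(start, stop))
        non_negative.append((max(before, 0), max(after, 0)))
    return tuple(slices), tuple(non_negative)


crop, pad = split_paddings(((0, 0), (-1, 1)))
print(crop)  # (slice(None, None, None), slice(1, None, None))
print(pad)   # ((0, 0), (0, 1))

# Demonstration on one row of a plain list: crop, then pad with zeros.
row = [1, 2, 3]
cropped = row[crop[1]]
padded = [0] * pad[1][0] + cropped + [0] * pad[1][1]
print(padded)  # [2, 3, 0]
```

In the network, the non-negative part could then be passed to nn.Pad and the slices applied to the input before padding, for example input_x[:, 1:] for this case. Whether that fits depends on the model being migrated, and it has not been verified here on MindSpore 1.6.0.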
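3 Summary
Steps for locating this kind of error:
1. Find the line of user code that raised the error: output = self.pad_op(input_x);
2. Use the keywords in the logged error message to narrow down the scope of the problem: For 'Pad', all elements of paddings must be >= 0;
3. Look up the API mapping for the relevant versions and compare against the reference operator;
4. Pay close attention to whether variables are defined and initialized correctly.
4 References
4.1 API Mapping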