Resolving the MindSpore error where the two operands of Abs cannot be broadcast

1 Error description

1.1 System environment

Hardware Environment (Ascend/GPU/CPU): Ascend
Software Environment:
-- MindSpore version (source or binary): 1.6.0
-- Python version (e.g., Python 3.7.5): 3.7.6
-- OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
-- GCC/Compiler version (if compiled from source):

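1.2 Basic information

1.2.1 Script

The training script builds a single-operator network around Abs: it subtracts one input tensor from the other (Sub) and then takes the absolute value (Abs) of the result. The script is as follows: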

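import numpy as np
import mindspore
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.abs = ops.Abs()

    def construct(self, x1, x2):
        # Sub (x1 - x2) is evaluated first, then Abs is applied to the result
        output = self.abs(x1 - x2)
        return output

net = Net()
# Shapes (2, 5) and (3, 5) differ in the first dimension and neither is 1
x1 = Tensor(np.ones((2, 5), dtype=np.float32), mindspore.float32)
x2 = Tensor(np.ones((3, 5), dtype=np.float32), mindspore.float32)
out = net(x1, x2)
print('out', out.shape)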

2 Error message

The error message is as follows:

The function call stack (See file '/demo/rank_0/om/analyze_fail.dat' for more details):
# 0 In file demo.py(7)
         output = self.abs(x1 - x2)
                           ^

Traceback (most recent call last):
  File "demo.py", line 13, in <module>
    out = net(x1,x2)
  File "/lib/python3.7/site-packages/mindspore/nn/cell.py", line 576, in __call__
    out = self.compile_and_run(*args)
  File "/lib/python3.7/site-packages/mindspore/nn/cell.py", line 942, in compile_and_run
    self.compile(*inputs)
  File "/lib/python3.7/site-packages/mindspore/nn/cell.py", line 915, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/lib/python3.7/site-packages/mindspore/common/api.py", line 791, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
  File "/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 575, in __infer__
    out[track] = fn(*(x[track] for x in args))
  File "/lib/python3.7/site-packages/mindspore/ops/operations/math_ops.py", line 78, in infer_shape
    return get_broadcast_shape(x_shape, y_shape, self.name)
  File "/lib/python3.7/site-packages/mindspore/ops/_utils/utils.py", line 70, in get_broadcast_shape
    raise ValueError(f"For '{prim_name}', {arg_name1}.shape and {arg_name2}.shape are supposed "
ValueError: For 'Sub', x.shape and y.shape are supposed to broadcast, where broadcast means that x.shape[i] = 1 or -1 or y.shape[i] = 1 or -1 or x.shape[i] = y.shape[i], but now x.shape and y.shape can not broadcast, got i: -2, x.shape: [2, 5], y.shape: [3, 5].

Root cause analysis

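Look at the ValueError in the error message: "For 'Sub', x.shape and y.shape are supposed to broadcast, where broadcast means that x.shape[i] = 1 or -1 or y.shape[i] = 1 or -1 or x.shape[i] = y.shape[i]". Although the network is built around Abs, the operator that actually fails is Sub: the two operands of x1 - x2 cannot be broadcast, so the input to Abs is never produced. The broadcast rule is checked one dimension at a time: for each index i, either x.shape[i] is 1 (or -1), or y.shape[i] is 1 (or -1), or x.shape[i] equals y.shape[i]. It does not require the two shapes to be identical as a whole.

The message continues: "but now x.shape and y.shape can not broadcast, got i: -2, x.shape: [2, 5], y.shape: [3, 5]". At i = -2, which is the first dimension here, the sizes are 2 and 3. They are not equal and neither of them is 1, and that is the cause of the failure. The official documentation states this input constraint for operators that broadcast: the shapes of the two input tensors must be the same or must be broadcastable to each other. Many other two-input operators also rely on broadcasting, so the same constraint applies to them.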
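The per-dimension rule quoted in the error message can be sketched in plain Python. This is only an illustration under stated assumptions: the function name broadcast_shape is hypothetical, it is not MindSpore's own get_broadcast_shape, and it ignores the -1 case mentioned in the message. It walks the shapes from the last dimension backwards, which is why the failing index is reported as a negative number.

```python
def broadcast_shape(x_shape, y_shape, prim_name="Sub"):
    """Return the broadcast result shape, or raise ValueError.

    Hypothetical helper that mirrors the rule in the error message;
    it is not the MindSpore implementation.
    """
    x_shape, y_shape = list(x_shape), list(y_shape)
    length = min(len(x_shape), len(y_shape))
    result = []
    # Compare dimensions from the end: i = -1, -2, ...
    for i in range(-1, -length - 1, -1):
        x_dim, y_dim = x_shape[i], y_shape[i]
        if x_dim == 1:
            result.append(y_dim)
        elif y_dim == 1 or x_dim == y_dim:
            result.append(x_dim)
        else:
            raise ValueError(
                f"For '{prim_name}', cannot broadcast at i: {i}, "
                f"x.shape: {x_shape}, y.shape: {y_shape}"
            )
    # Leading dimensions that only the longer shape has are kept as they are.
    longer = x_shape if len(x_shape) > len(y_shape) else y_shape
    leading = longer[: len(longer) - length]
    return tuple(leading + result[::-1])


print(broadcast_shape((5,), (3, 5)))
try:
    broadcast_shape((2, 5), (3, 5))
except ValueError as err:
    print(err)
```

With the shapes from the failing script, (2, 5) and (3, 5), the check passes at i = -1 (5 and 5) and fails at i = -2 (2 and 3), matching the index in the real error.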

3 Solution

Given the cause identified above, the fix is to make the shapes of the two inputs broadcast-compatible. Example 1:

The script now runs successfully and prints:

out: (3, 5)

Example 2:

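import numpy as np
import mindspore
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.abs = ops.Abs()

    def construct(self, x1, x2):
        output = self.abs(x1 - x2)
        return output

net = Net()
# Shape (5,) broadcasts against (3, 5): the last dimensions match
x1 = Tensor(np.ones((5,), dtype=np.float32), mindspore.float32)
x2 = Tensor(np.ones((3, 5), dtype=np.float32), mindspore.float32)
out = net(x1, x2)
print('out', out.shape)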

The script now runs successfully and prints:

out (3, 5)
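To see what the broadcast in Example 2 computes, here is a minimal pure-Python sketch using nested lists in place of tensors. The function name abs_diff_broadcast is hypothetical and this is not how MindSpore implements Sub or Abs. It only shows the idea: the length-5 vector is reused for every row of the 3 x 5 matrix.

```python
def abs_diff_broadcast(vector, matrix):
    """Compute |vector - row| for every row of matrix.

    vector has shape (n,), matrix has shape (m, n). Reusing the vector for
    each row is what broadcasting (n,) against (m, n) amounts to.
    Hypothetical helper for illustration only.
    """
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("last dimensions differ, cannot broadcast")
    return [[abs(a - b) for a, b in zip(vector, row)] for row in matrix]


x1 = [1.0] * 5                       # shape (5,)
x2 = [[1.0] * 5 for _ in range(3)]   # shape (3, 5)
out = abs_diff_broadcast(x1, x2)
print((len(out), len(out[0])))       # prints (3, 5)
```

The result has one row per row of x2, so its shape is (3, 5), and every entry is 0.0 because both inputs are filled with ones.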

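4 Summary

Steps for locating the error:

1. Find the line of user code that raised the error: output = self.abs(x1 - x2);

2. Use the keywords in the error log to narrow down the scope of the analysis: x.shape: [2, 5], y.shape: [3, 5];

3. Pay close attention to whether the variables are defined and initialized correctly, in this case the shapes given to x1 and x2.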
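Step 2 above can be partly automated. The sketch below pulls the operator name, the failing index, and the two shapes out of the ValueError text with a regular expression. The names SHAPE_ERROR and parse_broadcast_error are hypothetical, and the pattern assumes the message wording shown in section 2, which may differ in other MindSpore versions.

```python
import re

# Pattern written against the MindSpore 1.6.0 message quoted in this post;
# other versions may word the message differently.
SHAPE_ERROR = re.compile(
    r"For '(?P<op>\w+)'.*got i: (?P<i>-?\d+), "
    r"x\.shape: \[(?P<x>[^\]]*)\], y\.shape: \[(?P<y>[^\]]*)\]",
    re.DOTALL,
)


def _to_shape(text):
    """Turn a string such as '2, 5' into the tuple (2, 5)."""
    return tuple(int(d) for d in text.split(",") if d.strip())


def parse_broadcast_error(message):
    """Return the key facts from a broadcast error, or None if no match."""
    match = SHAPE_ERROR.search(message)
    if match is None:
        return None
    return {
        "op": match.group("op"),
        "index": int(match.group("i")),
        "x_shape": _to_shape(match.group("x")),
        "y_shape": _to_shape(match.group("y")),
    }


message = (
    "ValueError: For 'Sub', x.shape and y.shape are supposed to broadcast, "
    "but now x.shape and y.shape can not broadcast, got i: -2, "
    "x.shape: [2, 5], y.shape: [3, 5]."
)
info = parse_broadcast_error(message)
print(info["op"], info["x_shape"][info["index"]], info["y_shape"][info["index"]])
```

For the message in this post, the parsed index -2 points at sizes 2 and 3, which are the two values to reconcile when fixing the inputs.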

5 References

5.1 The broadcast method
