MindSpore: passing a sens value when computing gradients raises an infer error "For 'MatMul', the input dimensions"


beauty0220


Tags: python, programming language

Article link: https://blog.csdn.net/beauty0220/article/details/129155241


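This article describes a MatMul error encountered in MindSpore, caused by mismatched input shapes. Analyzing the error message and the backpropagation rule shows that the shape of the sens (sensitivity) value passed to GradOperation is wrong. The fix is to give sens the same shape as the forward network's output. The article also stresses using error messages to analyze problems and understanding how automatic differentiation works.

Summary generated automatically by CSDN.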

1 Error Description

1.1 System Environment

Hardware Environment (Ascend/GPU/CPU): GPU
Software Environment:

MindSpore version (source or binary): 1.7.0
Python version (e.g., Python 3.7.5): 3.7.5
OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.4 LTS
GCC/Compiler version (if compiled from source): 7.5.0

1.2 Basic Information

1.2.1 Source Code

import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor
from mindspore import ParameterTuple, Parameter
from mindspore import dtype as mstype

x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)  # shape (2, 3)
y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)  # shape (3, 3)

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.matmul = ops.MatMul()
        self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z')

    def construct(self, x, y):
        x = x * self.z
        out = self.matmul(x, y)  # (2, 3) @ (3, 3) -> output shape (2, 3)
        return out


class GradNetWrtN(nn.Cell):
    def __init__(self, net):
        super(GradNetWrtN, self).__init__()
        self.net = net
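        # sens_param=True: the gradient function takes an extra sens argument
        self.grad_op = ops.GradOperation(sens_param=True)
        # sens of shape (1, 3): it does not match the (2, 3) network output, which triggers the error
        self.grad_wrt_output = Tensor([[0.1, 0.6, 0.2]], dtype=mstype.float32)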

    def construct(self, x, y):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, y, self.grad_wrt_output)


output = GradNetWrtN(Net())(x, y)
print(output)

1.2.2 Error

Error message: ValueError: For 'MatMul', the input dimensions must be equal, but got 'x1_col': 2 and 'x2_row': 1. And 'x' shape [2, 3](transpose_a=True), 'y' shape [1, 3](transpose_b=False).

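2 Cause Analysis

According to the error message, the MatMul operator's input-shape check failed during shape inference (infer shape): the number of columns of x1 (after the optional transpose) does not equal the number of rows of x2.
Open the debug file /root/gitee/mindspore/rank_0/om/analyze_fail.dat mentioned in the error output and examine the relevant excerpt, in which three parts are highlighted with red boxes.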
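Following the guide to analyzing analyze_fail.dat, the MatMul in the first red box is the one whose shape inference failed. The second red box shows that this operator's first input has shape (2, 3) and its second input has shape (1, 3), which matches the error message (note that this MatMul has transpose_a=True). Finally, the third red box shows that this MatMul is called at line 253 of grad_math_ops.py, so it is an operator generated by MatMul's backpropagation rule.

In MatMul's backpropagation rule, the two inputs of this MatMul are x and dout. The shape of x is confirmed to be correct, so the shape of dout must be wrong.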
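To make the shape reasoning concrete, here is a pure-Python sketch of 2-D MatMul shape inference plus the shapes produced by a backward rule. It assumes the backward rule for out = x @ w (no transposes in the forward op) computes dx = dout @ w^T and dw = x^T @ dout, the standard matrix-calculus result; the function names are hypothetical and this is not MindSpore's grad_math_ops.py source. With the (1, 3) sens from the example, the dw product (a MatMul with transpose_a=True whose inputs are x and dout) fails with the same 2-versus-1 mismatch reported in the error message.

```python
def matmul_out_shape(a_shape, b_shape, transpose_a=False, transpose_b=False):
    """Infer the 2-D output shape of a @ b, applying optional transposes first."""
    a_rows, a_cols = tuple(reversed(a_shape)) if transpose_a else tuple(a_shape)
    b_rows, b_cols = tuple(reversed(b_shape)) if transpose_b else tuple(b_shape)
    if a_cols != b_rows:
        raise ValueError(
            f"input dimensions must be equal, but got x1_col: {a_cols} and x2_row: {b_rows}"
        )
    return (a_rows, b_cols)


def matmul_bprop_shapes(x_shape, w_shape, dout_shape):
    """Gradient shapes for out = x @ w (forward op without transposes).

    dx = dout @ w^T  -> a MatMul with transpose_b=True
    dw = x^T @ dout  -> a MatMul with transpose_a=True (inputs: x and dout)
    """
    dx_shape = matmul_out_shape(dout_shape, w_shape, transpose_b=True)
    dw_shape = matmul_out_shape(x_shape, dout_shape, transpose_a=True)
    return dx_shape, dw_shape


# Forward: (2, 3) @ (3, 3) -> (2, 3), so dout must also be (2, 3).
print(matmul_out_shape((2, 3), (3, 3)))              # (2, 3)
print(matmul_bprop_shapes((2, 3), (3, 3), (2, 3)))   # ((2, 3), (3, 3))
try:
    matmul_bprop_shapes((2, 3), (3, 3), (1, 3))      # the (1, 3) sens from the example
except ValueError as err:
    print(err)  # input dimensions must be equal, but got x1_col: 2 and x2_row: 1
```

Note that dx succeeds even with the bad sens; only the product whose second input is dout along its row dimension exposes the mismatch.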
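From the mechanism of reverse-mode automatic differentiation, we know that the first operator of the backward part is generated from the backpropagation rule of the last operator of the forward part. The forward network contains only one MatMul, and it is the last operator, so the backward MatMul whose shape inference fails was generated from this forward MatMul's backpropagation rule, and it is the first operator of the backward part. (This case is simple because the forward network has only one MatMul. For more complex cases, refer to the analyze_fail.dat analysis guide and use the operators' inputs and outputs to work out which forward operator's backpropagation rule produced a given backward operator.)

Therefore, the second input of this backward MatMul, dout, can only be passed in from outside, namely as self.grad_wrt_output in the script. In other words, the shape of self.grad_wrt_output is wrong.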
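The reverse-order argument can be traced by hand on this network. The sketch below uses plain nested lists instead of MindSpore tensors: it runs the two forward ops (Mul, then MatMul) and then the backward ops in reverse order, so the externally supplied sens becomes the dout of the last forward op. It assumes the textbook MatMul gradients d(xz) = dout @ y^T and dy = (xz)^T @ dout, leaves out the gradient for z, and all helper names are hypothetical. With its default get_all=False, GradOperation returns the gradient with respect to the first input x, so d_x should correspond to what the corrected script computes, up to float32 rounding.

```python
# Toy reverse pass for the example network out = MatMul(x * z, y).

def shape(m):
    """(rows, cols) of a 2-D nested list."""
    return (len(m), len(m[0]))


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain 2-D matrix product with a MatMul-style dimension check."""
    if shape(a)[1] != shape(b)[0]:
        raise ValueError(f"x1_col: {shape(a)[1]} and x2_row: {shape(b)[0]} differ")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def scale(m, s):
    return [[v * s for v in row] for row in m]


def backward(x, y, z, sens):
    """Return (d_x, d_y), the gradients of the output weighted by sens."""
    xz = scale(x, z)          # forward op 1: Mul
    _out = matmul(xz, y)      # forward op 2 (the last one): MatMul
    # The backward pass runs in reverse: the last forward op (MatMul) goes first,
    # and its dout is simply the externally supplied sens.
    dout = sens
    d_xz = matmul(dout, transpose(y))  # d(xz) = dout @ y^T
    d_y = matmul(transpose(xz), dout)  # dy = (xz)^T @ dout; fails if sens is (1, 3)
    # Then the Mul op: dx = d(xz) * z (the gradient for z is omitted here).
    return scale(d_xz, z), d_y


x = [[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]]
y = [[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]]
d_x, d_y = backward(x, y, 1.0, [[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]])
print(shape(d_x), shape(d_y))                      # (2, 3) (3, 3)
print([[round(v, 3) for v in row] for row in d_x])  # [[2.211, 0.51, 1.49], [5.588, 2.68, 4.07]]
try:
    backward(x, y, 1.0, [[0.1, 0.6, 0.2]])         # the original (1, 3) sens
except ValueError as err:
    print(err)                                     # x1_col: 2 and x2_row: 1 differ
```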

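3 Solution

The sens value passed to GradOperation is the gradient with respect to the forward network's output, supplied from outside by the script, and it can be used to scale the gradients. Since it is a gradient with respect to the network output, sens must have the same shape as the forward network's output (which you can obtain by calling the forward network once and printing the shape of its output). Change self.grad_wrt_output in the example above as follows: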

self.grad_wrt_output = Tensor([[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]], dtype=mstype.float32)  # shape (2, 3), same as the output of Net
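Because sens must mirror the forward output, a cheap safeguard is to check or derive it from that output before calling the gradient function. The helpers below (tensor_shape, check_sens, make_sens) are hypothetical and work on nested Python lists, with a zero-filled list standing in for the (2, 3) output of Net; in MindSpore the same idea amounts to comparing net(x, y).shape with the shape of the sens Tensor.

```python
# Hypothetical helpers (not MindSpore APIs): validate or build a sens value
# from the forward output before calling the gradient function.

def tensor_shape(value):
    """Shape of a nested-list tensor, e.g. [[0.1, 0.6, 0.2]] -> (1, 3)."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0] if value else None
    return tuple(dims)


def check_sens(output, sens):
    """Raise early if sens does not have the forward output's shape."""
    if tensor_shape(sens) != tensor_shape(output):
        raise ValueError(
            f"sens shape {tensor_shape(sens)} != forward output shape {tensor_shape(output)}"
        )


def make_sens(output, scale=1.0):
    """A sens shaped like output, with every entry equal to scale."""
    if isinstance(output, list):
        return [make_sens(item, scale) for item in output]
    return scale


out = [[0.0] * 3 for _ in range(2)]  # placeholder for Net's output; only its (2, 3) shape matters
check_sens(out, [[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]])  # corrected sens: passes
print(make_sens(out))  # [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
try:
    check_sens(out, [[0.1, 0.6, 0.2]])  # original sens
except ValueError as err:
    print(err)  # sens shape (1, 3) != forward output shape (2, 3)
```

Since backpropagation is linear in sens, a sens built with make_sens(out, k) should scale every gradient by k relative to an all-ones sens, which is the scaling effect mentioned above.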

This resolves the problem. The Automatic Differentiation chapter of the official MindSpore tutorial also provides guidance on this.

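4 Summary

When a MindSpore script fails, make good use of the error message to analyze the problem, and consult the official tutorials as well.

5 References

Automatic Differentiation (MindSpore master documentation)

Viewing Intermediate Files (MindSpore master documentation)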
