In graph mode (static graph), adding print(...) before a few lines of code gives a different computed result than not adding it.
Environment: MindSpore 1.5.0 (CPU), Windows 11, Python 3.7.5
[Steps to reproduce & observed behavior]
This is a custom spectral_norm operator I wrote based on the PyTorch source. With print(weight_mat) added at line 54 of my file (the print inside construct below), the computed result differs from the result without it, and the result with the print matches the one obtained in PyNative (dynamic graph) mode.
import numpy as np
import mindspore
from mindspore import nn, ops, Tensor, Parameter, context, numpy as msnp
from mindspore.common.initializer import initializer, Normal
context.set_context(mode=context.GRAPH_MODE, device_target="CPU")
mindspore.set_seed(123)
np.random.seed(123)
class _SpectralNorm(nn.Cell):
    def __init__(
        self,
        weight,
        n_power_iterations: int = 1,
        dim: int = 0,
        eps: float = 1e-12
    ) -> None:
        super(_SpectralNorm, self).__init__()
        self.dim = dim
        self.n_power_iterations = n_power_iterations
        self.eps = eps
        self.expand_dims = ops.ExpandDims()
        self.l2_normalize = ops.L2Normalize(epsilon=self.eps)
        weight_mat = self._reshape_weight_to_matrix(weight)
        h, w = weight_mat.shape
        init_u = initializer(Normal(1.0, 0), [h], mindspore.float32).init_data()
        init_v = initializer(Normal(1.0, 0), [w], mindspore.float32).init_data()
        self.u = Parameter(self.l2_normalize(init_u), requires_grad=False)
        self.v = Parameter(self.l2_normalize(init_v), requires_grad=False)
        self._update_vectors(weight_mat, init=True)

    def _reshape_weight_to_matrix(self, weight):
        # flatten the weight into a 2-D matrix with self.dim as the leading axis
        weight_mat = weight
        if self.dim != 0:
            list_dim = list(range(1, self.dim + 1))
            list_dim.append(0)
            for i in range(self.dim + 1, weight_mat.ndim):
                list_dim.append(i)
            weight_mat = msnp.moveaxis(weight_mat, list(range(weight_mat.ndim)), list_dim)
        height = weight_mat.shape[0]
        return weight_mat.reshape((height, -1))

    def _update_vectors(self, weight_mat, init=False) -> None:
        # power iteration: updates the Parameters self.u and self.v in place
        for _ in range(self.n_power_iterations):
            self.u = self.l2_normalize(msnp.multi_dot([weight_mat, self.expand_dims(self.v, -1)]).flatten())
            self.v = self.l2_normalize(msnp.multi_dot([weight_mat.T, self.expand_dims(self.u, -1)]).flatten())
        return None

    def construct(self, weight):
        weight_mat = self._reshape_weight_to_matrix(weight)
        print(weight_mat)  # line 54 in my file: removing this print changes the result in graph mode
        self._update_vectors(weight_mat)
        sigma = ops.tensor_dot(self.u, msnp.multi_dot([weight_mat, self.expand_dims(self.v, -1)]), 1)
        return weight / sigma
spe = _SpectralNorm(Tensor([[1,2],[3,4]], mindspore.float32))
res = spe(Tensor([[1,2],[3,4]], mindspore.float32))
print("res: ", res)
[Screenshots]
Result in graph mode with the print at line 54:
Result in graph mode without the print at line 54:
Result in PyNative mode:
The PyNative result matches the graph-mode result with the print, so the result with the print should be the correct one. Is this a control-flow issue?
Answer:
The _update_vectors function returns None, so when the print is absent the call is mistakenly optimized away during the specialize phase, and none of the logic inside _update_vectors is executed. Fundamentally this has nothing to do with side effects; adding the print merely prevents _update_vectors from being eliminated by mistake. The suggested workaround is to not have _update_vectors return None.
When a function returns None but updates Parameters inside, the function gets incorrectly eliminated, which is not the expected behavior. A follow-up fix adds a condition to function specialization so that functions inside stop_gradient are not specialized, preventing this mis-optimization: https://gitee.com/mindspore/mindspore/pulls/27097
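A minimal sketch of that workaround, assuming any non-None return value is enough to keep the call from being eliminated (returning the updated vector here is an arbitrary choice, not something the answer prescribes):

def _update_vectors(self, weight_mat, init=False):
    for _ in range(self.n_power_iterations):
        self.u = self.l2_normalize(msnp.multi_dot([weight_mat, self.expand_dims(self.v, -1)]).flatten())
        self.v = self.l2_normalize(msnp.multi_dot([weight_mat.T, self.expand_dims(self.u, -1)]).flatten())
    # return the updated vector instead of a bare None so that the call site
    # is not optimized away during the specialize phase
    return self.u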