1 Error Description
1.1 System Environment
Hardware Environment(Ascend/GPU/CPU): CPU
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):
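1.2 Basic Information
1.2.1 Script
The script builds a single-operator network around TopK and uses it to take the top k elements along the last dimension of a tensor. The script is as follows: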
import mindspore
from mindspore import Tensor, nn, ops

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.topk = ops.TopK(sorted=False)

    def construct(self, x, k):
        output = self.topk(x, k)
        return output

net = Net()
# mindspore.double is an alias for float64, which TopK rejects on this backend
x = Tensor(([[5, 2, 3, 3, 5], [5, 2, 9, 3, 5]]), mindspore.double)
k = 5
values, indices = net(x, k)
print(values, indices)
1.2.2 Error
The error output is as follows:
Traceback (most recent call last):
  File "C:/Users/l30026544/PycharmProjects/q2_map/new/I4H30H.py", line 21, in <module>
    values, indices = net(x, k)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\cell.py", line 586, in __call__
    out = self.compile_and_run(*args)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\cell.py", line 964, in compile_and_run
    self.compile(*inputs)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\nn\cell.py", line 937, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\common\api.py", line 1006, in compile
    result = self._graph_executor.compile(obj, args_list, phase, self._use_vm_mode())
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\ops\operations\nn_ops.py", line 2178, in __infer__
    validator.check_tensor_dtype_valid('x', x_dtype, valid_dtypes, self.name)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\_checkparam.py", line 541, in check_tensor_dtype_valid
    Validator.check_subclass(arg_name, arg_type, tensor_types, prim_name)
  File "C:\Users\l30026544\PycharmProjects\q2_map\lib\site-packages\mindspore\_checkparam.py", line 493, in check_subclass
    raise TypeError(f"For '{prim_name}', the type of '{arg_name}'"
TypeError: For 'TopK', the type of 'x' should be one of Tensor[Int32], Tensor[Float16], Tensor[Float32], but got Tensor[Float64] . The supported data types depend on the hardware that executes the operator, please refer the official api document to get more information about the data type.
WARNING: Logging before InitGoogleLogging() is written to STDERR
[WARNING] UTILS(11576,1,?):2022-6-25 8:31:24 [mindspore\ccsrc\utils\comm_manager.cc:78] GetInstance] CommManager instance for CPU not found, return default instance.
[ERROR] ANALYZER(11576,1,?):2022-6-25 8:31:24 [mindspore\ccsrc\pipeline\jit\static_analysis\async_eval_result.cc:66] HandleException] Exception happened, check the information as below.
The function call stack (See file 'C:\Users\l30026544\PycharmProjects\q2_map\new\rank_0\om/analyze_fail.dat' for more details):
# 0 In file C:/Users/l30026544/PycharmProjects/q2_map/new/I4H30H.py(15)
output = self.topk(x, k)
^
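Cause Analysis
The TypeError reads: For 'TopK', the type of 'x' should be one of Tensor[Int32], Tensor[Float16], Tensor[Float32], but got Tensor[Float64]. In other words, the input x of TopK must be int32, float16, or float32, but a float64 tensor was passed in. The line of the script that creates x does use mindspore.double, which is float64. The fix is to lower the precision of x to a supported type such as float32.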
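Reading the message by eye works for a single failure. When the same kind of dtype error shows up repeatedly, the relevant fields can be pulled out programmatically. The following is a minimal sketch using only the Python standard library; the helper name parse_dtype_error and the regular expression are illustrative assumptions fitted to the message quoted above, not part of MindSpore.

```python
import re

# Pattern fitted to the message format seen in this post (an assumption;
# other MindSpore versions may word the message differently).
DTYPE_ERROR = re.compile(
    r"For '(?P<op>\w+)', the type of '(?P<arg>\w+)' should be one of "
    r"(?P<valid>.+?), but got (?P<got>Tensor\[\w+\])"
)


def parse_dtype_error(message):
    """Extract operator, argument, valid dtypes and actual dtype.

    Returns None when the message does not match the expected format.
    """
    match = DTYPE_ERROR.search(message)
    if match is None:
        return None
    return {
        "op": match["op"],
        "arg": match["arg"],
        # "Tensor[Int32], Tensor[Float16]" -> ["Int32", "Float16"]
        "valid": re.findall(r"Tensor\[(\w+)\]", match["valid"]),
        "got": re.search(r"\[(\w+)\]", match["got"]).group(1),
    }


msg = (
    "For 'TopK', the type of 'x' should be one of Tensor[Int32], "
    "Tensor[Float16], Tensor[Float32], but got Tensor[Float64] ."
)
info = parse_dtype_error(msg)
print(info)
# {'op': 'TopK', 'arg': 'x', 'valid': ['Int32', 'Float16', 'Float32'], 'got': 'Float64'}
```

The parsed result shows that the actual dtype (Float64) is not in the valid list, which is exactly the mismatch described above.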
2 Solution
Based on the cause above, the fix is to create x as mindspore.float32 instead of mindspore.double:
import mindspore
from mindspore import Tensor, nn, ops

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.topk = ops.TopK(sorted=False)

    def construct(self, x, k):
        output = self.topk(x, k)
        return output

net = Net()
# float32 is one of the dtypes TopK accepts on this backend
x = Tensor(([[5, 2, 3, 3, 5], [5, 2, 9, 3, 5]]), mindspore.float32)
k = 5
values, indices = net(x, k)
print(values, indices)
The script now runs successfully and prints:
[[5. 2. 3. 3. 5.]
 [5. 2. 9. 3. 5.]] [[0 1 2 3 4]
 [0 1 2 3 4]]
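If the tensor already exists as float64 (for example, it comes from another part of the program), it can be cast before it reaches TopK instead of changing where it is created. The sketch below is one possible approach under stated assumptions: the helper fallback_dtype and the tuple TOPK_DTYPES are hypothetical names, and the supported list is copied from the error message rather than queried from MindSpore. The MindSpore part is guarded so that the helper can be tried without MindSpore installed; it assumes that str(x.dtype) yields a name such as "Float64" and that Tensor.astype is available in the installed version.

```python
# Dtype names the TopK error message lists as valid on this backend
# (copied from the log, not queried from MindSpore).
TOPK_DTYPES = ("int32", "float16", "float32")


def fallback_dtype(actual, supported=TOPK_DTYPES):
    """Return `actual` if supported, else the widest supported dtype of the same kind."""
    actual = actual.lower()
    if actual in supported:
        return actual
    kind = actual.rstrip("0123456789")  # "float64" -> "float"
    same_kind = [d for d in supported if d.rstrip("0123456789") == kind]
    if not same_kind:
        raise TypeError(f"no supported dtype of kind '{kind}' for '{actual}'")
    # Pick the candidate with the largest bit width, e.g. float32 over float16.
    return max(same_kind, key=lambda d: int(d[len(kind):]))


print(fallback_dtype("float64"))  # float32

try:
    import mindspore
    from mindspore import Tensor
except ImportError:  # lets the helper run without MindSpore installed
    mindspore = None

if mindspore is not None:
    x = Tensor([[5, 2, 3, 3, 5], [5, 2, 9, 3, 5]], mindspore.float64)
    target = fallback_dtype(str(x.dtype))         # assumes str(x.dtype) == "Float64"
    x = x.astype(getattr(mindspore, target))      # cast to mindspore.float32
    print(x.dtype)
```

Casting down loses precision, and narrowing an integer type can overflow for large values, so this is only appropriate when the data fits the narrower type.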
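3 Summary
Steps for locating the problem:
1. Find the line of user code that raised the error: output = self.topk(x, k).
2. Use the keywords in the error log to narrow down the problem: For 'TopK', the type of 'x' should be one of Tensor[Int32], Tensor[Float16], Tensor[Float32].
3. Check that the variables involved are defined and initialized correctly; here, that means the dtype of x.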