MindSpore error: task_fail_info or current_graph_ is nullptr

Latest recommended article published on 2022-11-14 11:06:18

BaldheadedM

Tags: big data, deep learning, python

Original link: https://bbs.huaweicloud.com/forum/thread-192844-1-1.html

1 Error description

1.1 System environment

Hardware Environment(Ascend/GPU/CPU): Ascend
Software Environment:
– MindSpore version (source or binary): 1.8.0
– Python version (e.g., Python 3.7.5): 3.7.6
– OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 4.15.0-74-generic
– GCC/Compiler version (if compiled from source):

1.2 Basic information

1.2.1 Script

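The training script builds a single-operator network around CTCGreedyDecoder, which performs greedy decoding (best path) on the logits given in the input. The script is as follows: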

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.ctc_greedyDecoder = ops.CTCGreedyDecoder()
 05 
 06     def construct(self, input_x, sequence_length):
 07         return self.ctc_greedyDecoder(input_x, sequence_length)
 08 net = Net()
 09 
 10 
 11 inputs = Tensor(np.array([[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
 12                           [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]), mindspore.float32)
 13 sequence_length = Tensor(np.array([4, 2]), mindspore.int32)
 14 
 15 decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
 16 print(decoded_indices, decoded_values, decoded_shape, log_probability)

1.2.2 Error

The error message is as follows:

[ERROR] DEVICE(172230,fffeae7fc160,python):2022-06-28-07:02:12.636.101 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:603] TaskFailCallback] Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr
Traceback (most recent call last):
  File "CTCGreedyDecoder.py", line 26, in <module>
    decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 573, in __call__
    out = self.compile_and_run(*args)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/nn/cell.py", line 979, in compile_and_run
    return _cell_graph_executor(self, *new_inputs, phase=self.phase)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1128, in __call__
    return self.run(obj, *args, phase=phase)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1165, in run
    return self._exec_pip(obj, *args, phase=phase_real)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 94, in wrapper
    results = fn(*arg, **kwargs)
  File "/root/archiconda3/envs/lilinjie_high/lib/python3.7/site-packages/mindspore/common/api.py", line 1147, in _exec_pip
    return self._graph_executor(args, phase)
RuntimeError: Call runtime rtStreamSynchronize failed. Op name: Default/CTCGreedyDecoder-op2

Cause analysis

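Looking at the error message, the ERROR line reads: Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr. This sentence alone does not pinpoint where the problem lies, so we can pull out its keywords, form a hypothesis, and verify it. The message mentions nullptr, which suggests that an out-of-bounds access may have occurred. The traceback also names the failing operator, Default/CTCGreedyDecoder-op2, so the next step is to read the parameter descriptions for CTCGreedyDecoder on the official website carefully. According to those descriptions, inputs has shape (max_time, batch_size, num_classes), and sequence_length has shape (batch_size,), with each value required to be less than or equal to max_time.

Here inputs has shape (2, 2, 3), so max_time is 2, but line 13 of the script sets sequence_length to [4, 2]. The first value, 4, exceeds max_time, so this condition is not met and the error is raised.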
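A check like this can be run on the host before calling the operator, so that a bad length fails with a readable message instead of a device-side error. The sketch below uses plain NumPy and assumes the constraint is the documented one: sequence_length has shape (batch_size,) and no value exceeds max_time, the first dimension of the logits. The helper name check_sequence_length is hypothetical and is not part of MindSpore.

```python
import numpy as np


def check_sequence_length(logits, sequence_length):
    """Raise ValueError if sequence_length is inconsistent with logits.

    Hypothetical helper. Assumes logits has shape
    (max_time, batch_size, num_classes).
    """
    logits = np.asarray(logits)
    sequence_length = np.asarray(sequence_length)
    max_time, batch_size = logits.shape[0], logits.shape[1]

    if sequence_length.shape != (batch_size,):
        raise ValueError(
            f"sequence_length must have shape ({batch_size},), "
            f"got {sequence_length.shape}"
        )

    # Positions in the batch whose length runs past the time axis.
    too_long = np.flatnonzero(sequence_length > max_time)
    if too_long.size > 0:
        raise ValueError(
            f"sequence_length values at batch positions {too_long.tolist()} "
            f"exceed max_time={max_time}"
        )


logits = np.array([[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
                   [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]], dtype=np.float32)

try:
    check_sequence_length(logits, np.array([4, 2], dtype=np.int32))
except ValueError as err:
    print("rejected:", err)
```

With the values from the failing script, the check rejects batch position 0, whose length of 4 is larger than max_time = 2.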

2 Solution

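Based on the cause identified above, the fix is straightforward: change sequence_length on line 13 from [4, 2] to [2, 2] so that no value exceeds max_time.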

 01 class Net(nn.Cell):
 02     def __init__(self):
 03         super(Net, self).__init__()
 04         self.ctc_greedyDecoder = ops.CTCGreedyDecoder()
 05 
 06     def construct(self, input_x, sequence_length):
 07         return self.ctc_greedyDecoder(input_x, sequence_length)
 08 net = Net()
 09 
 10 
 11 inputs = Tensor(np.array([[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
 12                           [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]]), mindspore.float32)
 13 sequence_length = Tensor(np.array([2, 2]), mindspore.int32)
 14 
 15 decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length)
 16 print(decoded_indices, decoded_values, decoded_shape, log_probability)

The script now runs successfully, with the following output:

[[0 0]
 [0 1]
 [1 0]] [0 1 0] [2 2] [[-1.2]
 [-1.3]]
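To make the output easier to interpret, here is a NumPy sketch of greedy (best-path) CTC decoding. It is an illustration and not MindSpore's implementation. It assumes that the blank label is the last class (num_classes - 1), that repeated labels are merged, and that log_probability is the negated sum of the maximum logit at each time step. Under those assumptions it yields the same values as the output above for this input.

```python
import numpy as np


def ctc_greedy_decode(logits, sequence_length, merge_repeated=True):
    """Illustrative greedy CTC decoder (not the MindSpore kernel).

    logits: array of shape (max_time, batch_size, num_classes).
    Assumption: the blank label is the last class.
    """
    _, batch_size, num_classes = logits.shape
    blank = num_classes - 1
    indices, values = [], []
    log_probability = np.zeros((batch_size, 1), dtype=logits.dtype)
    max_decoded = 0

    for b in range(batch_size):
        prev = -1  # label chosen at the previous time step
        pos = 0    # next write position in the decoded sequence
        for t in range(int(sequence_length[b])):
            # If sequence_length[b] > max_time, this index goes out of
            # range and NumPy raises IndexError.
            best = int(np.argmax(logits[t, b]))
            log_probability[b, 0] -= logits[t, b, best]
            if best != blank and not (merge_repeated and best == prev):
                indices.append([b, pos])
                values.append(best)
                pos += 1
            prev = best
        max_decoded = max(max_decoded, pos)

    return (np.array(indices, dtype=np.int64).reshape(-1, 2),
            np.array(values, dtype=np.int64),
            np.array([batch_size, max_decoded], dtype=np.int64),
            log_probability)


logits = np.array([[[0.6, 0.4, 0.2], [0.8, 0.6, 0.3]],
                   [[0.0, 0.6, 0.0], [0.5, 0.4, 0.5]]], dtype=np.float32)
for part in ctc_greedy_decode(logits, np.array([2, 2], dtype=np.int32)):
    print(part)
```

For batch item 0 the best labels are 0 and then 1, giving the decoded sequence [0, 1] with log probability -(0.6 + 0.6). For batch item 1 the best label is 0 at both steps, which merges to [0] with log probability -(0.8 + 0.5). Passing [4, 2] to this sketch fails with an IndexError at t = 2, which is the host-side analogue of the out-of-bounds read suspected on the device.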

3 Summary

Steps for locating the problem:

1. Find the line of user code that raised the error, which is line 15 of the listing: decoded_indices, decoded_values, decoded_shape, log_probability = net(inputs, sequence_length);

2. Use the keywords in the error log to narrow the scope of the analysis: Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr;
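When the keyword search is done by script, for example over many log files, the operator name can be pulled out of the RuntimeError text. The sketch below assumes the message follows the "Op name: ..." pattern seen in the traceback in section 1.2.2. The function name extract_op_name is hypothetical, and other MindSpore versions may word the message differently.

```python
import re

# Matches the "Op name: <scope/operator>" fragment of the error message.
OP_NAME_PATTERN = re.compile(r"Op name:\s*(\S+)")


def extract_op_name(message):
    """Return the operator name from an error message, or None if absent."""
    match = OP_NAME_PATTERN.search(message)
    return match.group(1) if match else None


message = ("Call runtime rtStreamSynchronize failed. "
           "Op name: Default/CTCGreedyDecoder-op2")
print(extract_op_name(message))  # prints Default/CTCGreedyDecoder-op2
```

The extracted name points directly at the operator whose parameter constraints should be checked in the API documentation.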

4 References

4.1 CTCGreedyDecoder operator API
