1. Environment
- Linux
- Ascend 910
- MindSpore 2.22
2. Problem description and error message
2.1 Problem description
The code runs successfully on GPU and CPU but fails on Ascend. A minimal reproduction:
import mindspore as ms
ms.context.set_context(mode=ms.GRAPH_MODE, device_target='Ascend')
x = ms.ops.randn((1, 8500, 4))
out = ((x > 0.01) & (x < 0.99)).all(-1, keep_dims=True)
print('done')
2.2 Error message
TypeError: Can not select a valid kernel info for [BitwiseAnd] in AI CORE or AI CPU kernel info candidates list.
----------------------------------------------------
- Kernel Info Candidates List:
----------------------------------------------------
AI CORE:
(<Int16xDefaultFormat>, <Int16xDefaultFormat>, object_type: [Tensor,Tensor]) -> (<Int16xDefaultFormat>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<UInt16xDefaultFormat>, <UInt16xDefaultFormat>, object_type: [Tensor,Tensor]) -> (<UInt16xDefaultFormat>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<Int32xDefaultFormat>, <Int32xDefaultFormat>, object_type: [Tensor,Tensor]) -> (<Int32xDefaultFormat>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<Int16xNC1HWC0>, <Int16xNC1HWC0>, object_type: [Tensor,Tensor]) -> (<Int16xNC1HWC0>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<UInt16xNC1HWC0>, <UInt16xNC1HWC0>, object_type: [Tensor,Tensor]) -> (<UInt16xNC1HWC0>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<Int32xNC1HWC0>, <Int32xNC1HWC0>, object_type: [Tensor,Tensor]) -> (<Int32xNC1HWC0>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<Int16xFRACTAL_Z>, <Int16xFRACTAL_Z>, object_type: [Tensor,Tensor]) -> (<Int16xFRACTAL_Z>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<UInt16xFRACTAL_Z>, <UInt16xFRACTAL_Z>, object_type: [Tensor,Tensor]) -> (<UInt16xFRACTAL_Z>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<Int32xFRACTAL_Z>, <Int32xFRACTAL_Z>, object_type: [Tensor,Tensor]) -> (<Int32xFRACTAL_Z>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<Int16xC1HWNCoC0>, <Int16xC1HWNCoC0>, object_type: [Tensor,Tensor]) -> (<Int16xC1HWNCoC0>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<UInt16xC1HWNCoC0>, <UInt16xC1HWNCoC0>, object_type: [Tensor,Tensor]) -> (<UInt16xC1HWNCoC0>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<Int32xC1HWNCoC0>, <Int32xC1HWNCoC0>, object_type: [Tensor,Tensor]) -> (<Int32xC1HWNCoC0>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<Int16xFRACTAL_NZ>, <Int16xFRACTAL_NZ>, object_type: [Tensor,Tensor]) -> (<Int16xFRACTAL_NZ>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<UInt16xFRACTAL_NZ>, <UInt16xFRACTAL_NZ>, object_type: [Tensor,Tensor]) -> (<UInt16xFRACTAL_NZ>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<Int32xFRACTAL_NZ>, <Int32xFRACTAL_NZ>, object_type: [Tensor,Tensor]) -> (<Int32xFRACTAL_NZ>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
AI CPU:
{}
Please check the given data type or shape:
AI CORE: : (<Tensor[Bool], (1, 8500, 4)>, <Tensor[Bool], (1, 8500, 4)>) -> (<Tensor[Bool], (1, 8500, 4)>)
AI CPU: : (<Tensor[Bool], (1, 8500, 4)>, <Tensor[Bool], (1, 8500, 4)>) -> (<Tensor[Bool], (1, 8500, 4)>)
For more details, please refer to 'Kernel Select Failed' at https://www.mindspore.cn
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/plugin/device/ascend/hal/device/kernel_select_ascend.cc:1487 HandleKernelSelectFailure
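3. Root cause
The error message shows that an operator received an input data type it does not support. The failing operator is [BitwiseAnd]: the inputs are Bool tensors, while the kernel candidates list only contains Int16, UInt16 and Int32 kernels. The operator comes from the & in the reproduction code. Replacing & with the MindSpore built-in ms.ops.logical_and resolves the error. The corrected code: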
import mindspore as ms
ms.context.set_context(mode=ms.GRAPH_MODE, device_target='Ascend')
x = ms.ops.randn((1, 8500, 4))
# Original line, fails on Ascend because BitwiseAnd has no Bool kernel:
# out = ((x > 0.01) & (x < 0.99)).all(-1, keep_dims=True)
out = ms.ops.logical_and(x > 0.01, x < 0.99).all(-1, keep_dims=True)
print('done')
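
The mismatch behind the error can also be read mechanically from the log. The following is a minimal sketch in plain Python, assuming the log format shown in section 2.2; the helper names supported_dtypes and given_dtypes are hypothetical and not part of MindSpore. It extracts the data types offered by the kernel candidates and the data types actually passed, then checks whether they overlap.

```python
import re

# Excerpt of the error log from section 2.2 (shortened to three candidates).
LOG = """\
AI CORE:
(<Int16xDefaultFormat>, <Int16xDefaultFormat>, object_type: [Tensor,Tensor]) -> (<Int16xDefaultFormat>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<UInt16xDefaultFormat>, <UInt16xDefaultFormat>, object_type: [Tensor,Tensor]) -> (<UInt16xDefaultFormat>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
(<Int32xDefaultFormat>, <Int32xDefaultFormat>, object_type: [Tensor,Tensor]) -> (<Int32xDefaultFormat>, object_type: [Tensor], kernel_type: TBE_KERNEL, op_type: UNKNOWN_OP_TYPE)
AI CPU:
{}
Please check the given data type or shape:
AI CORE: : (<Tensor[Bool], (1, 8500, 4)>, <Tensor[Bool], (1, 8500, 4)>) -> (<Tensor[Bool], (1, 8500, 4)>)
"""


def supported_dtypes(log):
    """Hypothetical helper: dtype part of every '<DtypexFormat>' candidate entry."""
    return set(re.findall(r"<(\w+?)x[A-Za-z0-9_]+>", log))


def given_dtypes(log):
    """Hypothetical helper: dtypes of the tensors actually passed to the operator."""
    return set(re.findall(r"Tensor\[(\w+)\]", log))


supported = supported_dtypes(LOG)
given = given_dtypes(LOG)
print("supported:", sorted(supported))
print("given:", sorted(given))
print("overlap:", sorted(supported & given))
```

An empty overlap means no candidate kernel accepts the given data type, which is the condition the TypeError reports.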
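4. Summary
In a real project with a lot of code, the location reported in the error often does not match the line that actually causes it, so the faulty line has to be narrowed down by bisection. For similar kernel selection failures such as [BitwiseAnd], replace the offending operator with an equivalent supported one directly in the code.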
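
The bisection mentioned above could look roughly like the sketch below. It is plain Python and only illustrates the idea: first_failing_step is a hypothetical helper, and it assumes the code can be split into an ordered list of deterministic steps, so that a prefix of the steps fails exactly when it contains the faulty one. In a real network the steps would be sub-blocks or intermediate expressions.

```python
def first_failing_step(steps, initial):
    """Return the index of the first step that raises, or None if all pass.

    Assumption: the steps are deterministic, so a prefix of the list
    fails exactly when it contains the faulty step.
    """
    def prefix_ok(n):
        # Run the first n steps in order and report whether they all succeed.
        value = initial
        try:
            for step in steps[:n]:
                value = step(value)
        except Exception:
            return False
        return True

    if prefix_ok(len(steps)):
        return None

    lo, hi = 0, len(steps)  # prefix of length lo passes, prefix of length hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if prefix_ok(mid):
            lo = mid
        else:
            hi = mid
    return hi - 1


def unsupported(value):
    # Stand-in for an operator that has no kernel for the given dtype.
    raise TypeError("Can not select a valid kernel info")


steps = [lambda v: v + 1, lambda v: v * 2, unsupported, lambda v: v - 3]
print(first_failing_step(steps, 0))  # 2
```

Each probe reruns a prefix from the start, so this trades some repeated work for far fewer manual edits than commenting out lines one at a time.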