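MindSpore version: 1.5.0-rc1
Ubuntu 18.04
Python 3.7.5
GPU, CUDA 10.1
【Steps & Symptoms】
1. After changing the batch size to 32 and updating the dataset and paths, running the script directly fails with "Attr output_num 32must less than28". Changing group to 16 fails with "Attr output_num 16must less than14". Training only runs normally after changing group to 7.
2. After uploading to ModelArts, the error is the same as on my own machine, and again it only works with group set to 7. Configuration: GPU: 1*NVIDIA-V100(32GB) | CPU: 8 cores, 64GB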
[ERROR] KERNEL(3516,7f24a92a2740,python):2021-10-23-20:03:05.062.308 [mindspore/ccsrc/backend/kernel_compiler/gpu/arrays/split_gpu_kernel.h:144] CheckParam] Attr output_num 32must less than28
[EXCEPTION] DEVICE(3516,7f24a92a2740,python):2021-10-23-20:03:05.062.651 [mindspore/ccsrc/runtime/device/gpu/gpu_kernel_build.cc:63] CreateGPUKernel] Initialize gpu kernel op[Default/network-TrainOneStepCell/network-WithLossCell/_backbone-SENet/layer2-SequentialCell/1-SEResNeXtBottleneck/conv2-GroupConv/Split-op137405] failed.
Traceback (most recent call last):
File "/home/zxm/PycharmProjects/pythonProject3/train.py", line 288, in
model.train(cfg.epoch_size, dataset, callbacks=cbs)
File "/home/zxm/.local/lib/python3.7/site-packages/mindspore/train/model.py", line 718, in train
sink_size=sink_size)
File "/home/zxm/.local/lib/python3.7/site-packages/mindspore/train/model.py", line 502, in _train
self._train_dataset_sink_process(epoch, train_dataset, list_callback, cb_params, sink_size)
File "/home/zxm/.local/lib/python3.7/site-packages/mindspore/train/model.py", line 564, in _train_dataset_sink_process
outputs = self._train_network(*inputs)
File "/home/zxm/.local/lib/python3.7/site-packages/mindspore/nn/cell.py", line 404, in __call__
out = self.compile_and_run(*inputs)
File "/home/zxm/.local/lib/python3.7/site-packages/mindspore/nn/cell.py", line 682, in compile_and_run
self.compile(*inputs)
File "/home/zxm/.local/lib/python3.7/site-packages/mindspore/nn/cell.py", line 669, in compile
_cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
File "/home/zxm/.local/lib/python3.7/site-packages/mindspore/common/api.py", line 542, in compile
result = self._graph_executor.compile(obj, args_list, phase, use_vm, self.queue_name)
RuntimeError: mindspore/ccsrc/runtime/device/gpu/gpu_kernel_build.cc:63 CreateGPUKernel] Initialize gpu kernel op[Default/network-TrainOneStepCell/network-WithLossCell/_backbone-SENet/layer2-SequentialCell/1-SEResNeXtBottleneck/conv2-GroupConv/Split-op137405] failed.
Answer:
The key error messages are:
_backbone-SENet/layer2-SequentialCell/1-SEResNeXtBottleneck/conv2-GroupConv/Split
split_gpu_kernel.h:144] CheckParam] Attr output_num 32 must less than28
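The error means that your network uses the Split operator (here inside conv2-GroupConv of SEResNeXtBottleneck, as the node path shows). For this operator, input_x.shape()[axis] is 28, but output_num is set to 32. That exceeds the maximum number of pieces the input can be split into along axis, so the parameter check fails. The same applies to the second message: output_num is 16 where the size along axis is only 14. A value of 7 fits within both 28 and 14, which is why it runs.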
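The condition described above can be sketched in plain Python. This is only an illustrative mirror of the check reported in the log, not the MindSpore source; `check_output_num` is a hypothetical helper name, and the values 28, 32, and 7 are taken from the error message and the question.

```python
def check_output_num(dim_size, output_num):
    """Illustrative check: output_num must not exceed the size of the split axis.

    dim_size   -- input_x.shape[axis], e.g. 28 in the reported log
    output_num -- number of pieces requested from Split, e.g. 32
    """
    if output_num > dim_size:
        # Message format imitates the log line; the real check lives in MindSpore.
        raise ValueError(f"Attr output_num {output_num} must less than {dim_size}")
    return True


# group = 7 passes for a split axis of size 28
check_output_num(28, 7)

# group = 32 fails for the same axis, as in the report
try:
    check_output_num(28, 32)
except ValueError as err:
    print(err)
```

Running the same helper with (14, 16) reproduces the second message from the question.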
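Suggestion: debug the network structure, or change the network configuration parameters so that output_num does not exceed the size of the axis being split.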
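When adjusting the configuration, it may help to list which group values fit every split size seen in the logs (28 and 14). The sketch below is a hypothetical helper, `valid_groups`, written for illustration only. By default it also requires that each size be evenly divisible by the group count; that extra condition is an assumption here and should be confirmed against the Split documentation.

```python
def valid_groups(dim_sizes, require_divisible=True):
    """Return group counts that are usable for every split-axis size in dim_sizes.

    dim_sizes         -- sizes of the split axis at each affected layer, e.g. [28, 14]
    require_divisible -- assumption: also demand that every size divides evenly
    """
    smallest = min(dim_sizes)
    result = []
    for groups in range(1, smallest + 1):  # groups may not exceed the smallest size
        if require_divisible and any(size % groups != 0 for size in dim_sizes):
            continue
        result.append(groups)
    return result


print(valid_groups([28, 14]))  # [1, 2, 7, 14]
```

Under these assumptions, 7 appears in the list while 16 and 32 do not, which matches the behavior described in the question.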
The Split operator API is documented here:
https://www.mindspore.cn/docs/api/en/master/api_python/ops/mindspore.ops.Split.html#mindspore.ops.Split