[ERROR] DEVICE(32023,ffff8e9bea40,python):2023-07-14-15:54:53.529.931 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_kernel_runtime.cc:675] TaskFailCallback] Execute TaskFailCallback failed. task_fail_info or current_graph_ is nullptr
[ERROR] DEVICE(32023,ffff8e9bea40,python):2023-07-14-15:54:53.600.448 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_stream_manager.cc:186] SyncStream] Call runtime rtStreamSynchronize error.
[ERROR] DEVICE(32023,ffff8e9bea40,python):2023-07-14-15:54:53.600.507 [mindspore/ccsrc/plugin/device/ascend/hal/device/ascend_stream_manager.cc:196] SyncAllStreams] SyncStream for stream id 0 failed.
Traceback (most recent call last):
  File "train.py", line 612, in <module>
    main(args)
  File "train.py", line 606, in main
    train_semi(generator,discriminator,optimG,optimD,trainloader_l,step_size_train,trainloader_u,step_size_untrain,trainset_l,step_size_val,args)
  File "train.py", line 511, in train_semi
    LGadv = train_step2(img,mask,itr)
  File "train.py", line 376, in train_step2
    LGadv, grads = grad_fn2(inputs,mask)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/ops/composite/base.py", line 554, in after_grad
    return grad_(fn_, weights)(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 98, in wrapper
    results = fn(*arg, **kwargs)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/ops/composite/base.py", line 535, in after_grad
    out = _pynative_executor(fn, grad_.sens_param, *args, **kwargs)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 999, in __call__
    return self._executor(sens_param, obj, args)
RuntimeError: Sync stream failed:Ascend_0
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/runtime/graph_scheduler/graph_scheduler.cc:632 Run
****************************************************Answer*****************************************************
The error is caused by a data overflow (the batch no longer fits in device memory); reducing the batch_size fixes it.
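For reference, a minimal sketch of the fix, assuming the loaders are built with mindspore.dataset; the dataset path, loader construction, and batch sizes below are hypothetical, not taken from the original train.py:

import mindspore.dataset as ds

# Hypothetical loader setup; the real script builds trainloader_l and
# trainloader_u elsewhere. The only change needed is a smaller batch size.
batch_size = 4  # e.g. halved from 8; lower it until the sync-stream error stops

trainloader_l = ds.ImageFolderDataset("data/labeled", shuffle=True)  # placeholder path
trainloader_l = trainloader_l.batch(batch_size, drop_remainder=True)

The same reduction should be applied to trainloader_u. If the error still appears at batch_size=1, the model itself is likely too large for a single device.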