[WARNING] ME(160:281473323057520,MainProcess):2022-08-23-02:05:59.496.078 [mindspore/dataset/core/config.py:464] The shared memory is on, multiprocessing performance will be improved. Note: the required shared memory can't exceeds 80% of the available shared memory. You can reduce max_rowsize or reduce num_parallel_workers to reduce shared memory usage.
loading annotations into memory...
Done (t=33.57s)
creating index...
index created!
2022-08-23 02:06:39,951:INFO:Finish loading dataset
[EXCEPTION] ANALYZER(160,ffff9d6f4170,python):2022-08-23-02:11:13.452.182 [mindspore/ccsrc/pipeline/jit/static_analysis/prim.cc:954] GetEvaluatedValueForBuiltinTypeAttrOrMethod] Not supported to get attribute item name:'arange' of a type[kMetaTypeNone]
[ERROR] ME(160:281473323057520,MainProcess):2022-08-23-02:11:13.584.850 [mindspore/dataset/engine/datasets.py:2686] Uncaught exception:
Traceback (most recent call last):
  File "/home/ma-user/modelarts/user-job-dir/yolov5/train.py", line 137, in
    run_train()
  File "/home/ma-user/modelarts/user-job-dir/yolov5/model_utils/moxing_adapter.py", line 167, in wrapped_func
    run_func(*args, **kwargs)
  File "/home/ma-user/modelarts/user-job-dir/yolov5/train.py", line 111, in run_train
    data[7], input_shape)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/mindspore/nn/cell.py", line 404, in __call__
    out = self.compile_and_run(*inputs)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/mindspore/nn/cell.py", line 682, in compile_and_run
    self.compile(*inputs)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/mindspore/nn/cell.py", line 669, in compile
    _cell_graph_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
  File "/home/ma-user/anaconda/lib/python3.7/site-packages/mindspore/common/api.py", line 548, in compile
    result = self._graph_executor.compile(obj, args_list, phase, use_vm, self.queue_name)
RuntimeError: mindspore/ccsrc/pipeline/jit/static_analysis/prim.cc:954 GetEvaluatedValueForBuiltinTypeAttrOrMethod] Not supported to get attribute item name:'arange' of a type[kMetaTypeNone]
The function call stack (See file '/home/ma-user/modelarts/workspace/device0/rank_0/om/analyze_fail.dat' for more details):
# 0 In file /home/ma-user/anaconda/lib/python3.7/site-packages/mindspore/nn/wrap/cell_wrapper.py(353)
loss = self.network(*inputs)
^
# 1 In file /home/ma-user/modelarts/user-job-dir/yolov5/src/yolo.py(394)
yolo_out = self.yolo_network(x, input_shape)
^
# 2 In file /home/ma-user/modelarts/user-job-dir/yolov5/src/yolo.py(358)
output_big = self.detect_1(big_object_output, input_shape)
^
# 3 In file /home/ma-user/modelarts/user-job-dir/yolov5/src/yolo.py(192)
if self.conf_training:
# 4 In file /home/ma-user/modelarts/user-job-dir/yolov5/src/yolo.py(168)
grid_x = ms.numpy.arange(grid_size[1])
^
[ERROR] ME(160:281470911484384,MainProcess):2022-08-23-02:11:24.966.42 [mindspore/dataset/engine/datasets.py:2532] The subprocess of dataset may exit unexpected or be killed, main process will exit.
[ModelArts Service Log]2022-08-23 02:11:35,699 - ERROR - proc-rank-0-device-0 (pid: 160) has exited with non-zero code: -15
[ModelArts Service Log]2022-08-23 02:11:35,702 - INFO - Begin destroy training processes
[ModelArts Service Log]2022-08-23 02:11:35,702 - INFO - proc-rank-0-device-0 (pid: 160) has exited
[ModelArts Service Log]2022-08-23 02:11:35,702 - INFO - End destroy training processes
time="2022-08-23T02:11:35+08:00" level=info msg="start and wait python command is exit with 241" file="controller.go:181" Args="[/home/ma-user/anaconda/bin/python /home/ma-user/modelarts/run/davincirun.py /home/ma-user/anaconda/bin/python /home/ma-user/modelarts/user-job-dir/yolov5/train.py --data_url=/home/ma-user/modelarts/inputs/data_url_0/ --train_url=/home/ma-user/modelarts/outputs/train_url_0/ --data_url=/home/ma-user/modelarts/inputs/data_url_0/ --train_url=/home/ma-user/modelarts/outputs/train_url_0/]" Command=run-with-backoff Component=ma-training-toolkit Platform=ModelArts-Service TaskID=worker-0
time="2022-08-23T02:11:35+08:00" level=info msg="run-with-backoff exit with 241" file="controller.go:159" Args="[/home/ma-user/anaconda/bin/python /home/ma-user/modelarts/run/davincirun.py /home/ma-user/anaconda/bin/python /home/ma-user/modelarts/user-job-dir/yolov5/train.py --data_url=/home/ma-user/modelarts/inputs/data_url_0/ --train_url=/home/ma-user/modelarts/outputs/train_url_0/ --data_url=/home/ma-user/modelarts/inputs/data_url_0/ --train_url=/home/ma-user/modelarts/outputs/train_url_0/]" Command=run-with-backoff Component=ma-training-toolkit Platform=ModelArts-Service TaskID=worker-0
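This is most likely caused by a version mismatch between ModelZoo and MindSpore. MindSpore 1.5 does not yet support this usage of ms.numpy (calling ms.numpy.arange inside the network, as at yolo.py line 168 in the call stack above). Try switching ModelZoo to the 1.5 version as well.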
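To catch this kind of mismatch before a long training job starts, you could compare the installed MindSpore version against the ModelZoo release you checked out. The following is a minimal sketch, not part of MindSpore or ModelZoo: the helper names are hypothetical, and it assumes ModelZoo release branches are named like "r1.5" (adjust to whatever naming your checkout actually uses). It only compares major.minor numbers; it does not verify API compatibility itself.

```python
import re
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.5.0' or '1.8.1rc1'."""
    match = re.match(r"^\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_match(framework_version: str, model_zoo_branch: str) -> bool:
    """Return True if a ModelZoo branch name (assumed form 'r<major>.<minor>')
    targets the same major.minor release as the installed framework."""
    branch_version = model_zoo_branch.lstrip("r")  # 'r1.5' -> '1.5'
    return major_minor(framework_version) == major_minor(branch_version)


def installed_mindspore_version() -> Optional[str]:
    """Return mindspore.__version__, or None if MindSpore is not importable."""
    try:
        import mindspore
    except ImportError:
        return None
    return mindspore.__version__


if __name__ == "__main__":
    # Hypothetical branch name for the ModelZoo checkout used by train.py.
    model_zoo_branch = "r1.5"
    installed = installed_mindspore_version()
    if installed is None:
        print("MindSpore is not installed in this environment")
    elif not versions_match(installed, model_zoo_branch):
        print(f"Mismatch: MindSpore {installed} vs ModelZoo {model_zoo_branch}")
    else:
        print(f"OK: MindSpore {installed} matches ModelZoo {model_zoo_branch}")
```

Comparing only major.minor is a deliberate simplification: patch releases within one minor version are usually compatible, while the failure in this log came from running newer ModelZoo code on an older minor release.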