Model Training on the Ascend Platform

This article describes problems encountered while training the nnUNet model on the Ascend platform: abnormal npy files produced by the preprocessing step, dataset-conversion errors, and errors during the PyTorch-to-MindSpore code migration. The fix involves checking the npy file sizes and the pickle-loading settings.


# set the nnUNet data and results environment variables
export nnUNet_raw_data_base="/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnUNetFrame/DATASET/nnUNet_raw"
export nnUNet_preprocessed="/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnUNetFrame/DATASET/nnUNet_preprocessed"
export RESULTS_FOLDER="/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnUNetFrame/DATASET/nnUNet_trained_models"

# convert the Decathlon-format dataset into nnUNet's layout
nnUNet_convert_decathlon_task -i /home/ma-user/work/nnunet/nnUNet-nnunetv1/nnUNetFrame/DATASET/nnUNet_raw/nnUNet_raw_data/Task02_BraTS2021

# experiment planning and preprocessing for task 2, verifying dataset integrity
nnUNet_plan_and_preprocess -t 2 --verify_dataset_integrity

# 3d_fullres training of task 1, fold 0, with the V2 and base trainers
# (--npz stores the softmax outputs of the validation cases)
nnUNet_train 3d_fullres nnUNetTrainerV2 1 0 --npz
nnUNet_train 3d_fullres nnUNetTrainer 1 0 --npz

nnUNet_train 2d nnUNetTrainerV2 501 0
# hiddenlayer is not installed yet; remember to install it

# run the training in the background, logging to a file
nohup nnUNet_train 2d nnUNetTrainerV2 501 0 > log_8_24_test.txt &

Code Migration

# enter the toolkit directory
cd /usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/

# run the Python file conversion script
python ms_fmk_transplt.py -i /home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/ -o /home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet_ascend/ -v 1.11.0

python ms_fmk_transplt.py -i /home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/training/ -o /home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet_ascend/ -v 1.11.0

# check NPU utilization
npu-smi info

# kill the training processes
pkill -9 python

# inference with the 2d model, task 501, fold 0
nnUNet_predict -i /home/ma-user/work/nnunet/nnUNet-nnunetv1/nnUNetFrame/DATASET/nnUNet_raw/nnUNet_raw_data/Task002_BraTS2021/imagesTr/ -o /home/ma-user/work/nnunet/nnUNet-nnunetv1/nnUNetFrame/DATASET/nnUNet_raw/nnUNet_raw_data/Task002_BraTS2021/inferTs/ -t 501 -m 2d -f 0

# PyTorch-to-MindSpore code migration
cd /usr/local/Ascend/ascend-toolkit/latest/tools/x2mindspore
./run_x2mindspore.sh -i /home/ma-user/work/nnunet_ascend/nnUNet-nnunetv1_msft/ -o /home/ma-user/work/nnunet_ascend/ -f pytorch

PyTorch 2.0

# activate the PyTorch environment
source activate /home/ma-user/anaconda3/envs/pytorch_1_11

# enter the toolkit directory
cd /usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/

# run the Python file conversion script
python ms_fmk_transplt.py -i /home/ma-user/work/nnUNet-nnunetv1/ -o /home/ma-user/work/nnunet_ascend/

# archive the migrated folder
tar -zcvf /home/ma-user/work/nnunet/nnUNet-nnunetv1_msft.tar /home/ma-user/work/nnunet/nnUNet-nnunetv1_msft/

# extract the archives
tar -zxvf nnunet_msft.tar
tar -zxvf nnUNet-nnunetv1_ascend.tar
tar -xvf BraTS2021_Data.tar

# if the wrong command was run, terminate the tar process
pkill tar

# normalize the dataset format; the paths inside the script must be edited first
python Task500_BraTS_2021.py


# dataset conversion
nnUNet_convert_decathlon_task -i /home/ma-user/work/nnunet/nnUNet-nnunetv1/nnUNetFrame/DATASET/nnUNet_raw/nnUNet_raw_data/Task01_BraTS2021

Errors

 Failed to interpret file '/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnUNetFrame/DATASET/nnUNet_preprocessed/Task001_BraTS2021/nnUNetData_plans_v2.1_stage0/BraTS2021_01351.npy' as a pickle
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/numpy/lib/npyio.py", line 448, in load
    return pickle.load(fid, **pickle_kwargs)
EOFError: Ran out of input

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 46, in producer
    item = next(data_loader)
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__
    return self.generate_train_batch()
  File "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/training/dataloading/dataset_loading.py", line 245, in generate_train_batch
    case_all_data = np.load(self._data[i]['data_file'][:-4] + ".npy", self.memmap_mode, allow_pickle=True)
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/numpy/lib/npyio.py", line 451, in load
    "Failed to interpret file %s as a pickle" % repr(file)) from e
OSError: Failed to interpret file '/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnUNetFrame/DATASET/nnUNet_preprocessed/Task001_BraTS2021/nnUNetData_plans_v2.1_stage0/BraTS2021_01351.npy' as a pickle
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/bin/nnUNet_train", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
  File "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/run/run_training.py", line 182, in main
    trainer.run_training()
  File "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/training/network_training/nnUNetTrainerV2.py", line 440, in run_training
    ret = super().run_training()
  File "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/training/network_training/nnUNetTrainer.py", line 318, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/training/network_training/network_trainer.py", line 418, in run_training
    _ = self.tr_gen.next()
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 181, in next
    return self.__next__()
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 204, in __next__
    item = self.__get_next_item()
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 189, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
THPModule_npu_shutdown success.
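The EOFError and the "Failed to interpret file ... as a pickle" wrapper above are what NumPy emits when `np.load(..., allow_pickle=True)` is pointed at a file without the NPY magic header, such as a zero-byte .npy left behind by an interrupted preprocessing run: NumPy falls back to unpickling the stream, and pickle immediately runs out of input. A minimal reproduction (the file name is made up; on newer NumPy versions the exception may instead surface as `EOFError: No data left in file`):

```python
import os
import tempfile

import numpy as np

# create a zero-byte file with an .npy extension, mimicking a
# truncated output of the nnUNet preprocessing step
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "BraTS_case_broken.npy")
open(path, "wb").close()

try:
    # the missing NPY magic header makes NumPy fall back to pickle,
    # which cannot read anything from an empty stream
    np.load(path, allow_pickle=True)
except Exception as exc:
    print(type(exc).__name__, exc)
```

Either way, the exception means the file contents are gone; the `allow_pickle` flag only changes which error message is printed.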

Error from the unmodified code

Exception in background worker 4:
 Cannot load file containing pickled data when allow_pickle=False
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 46, in producer
    item = next(data_loader)
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in __next__
    return self.generate_train_batch()
  File "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/training/dataloading/dataset_loading.py", line 246, in generate_train_batch
    case_all_data = np.load(self._data[i]['data_file'][:-4] + ".npy", self.memmap_mode)
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/numpy/lib/npyio.py", line 445, in load
    raise ValueError("Cannot load file containing pickled data "
ValueError: Cannot load file containing pickled data when allow_pickle=False
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/bin/nnUNet_train", line 33, in <module>
    sys.exit(load_entry_point('nnunet', 'console_scripts', 'nnUNet_train')())
  File "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/run/run_training.py", line 182, in main
    trainer.run_training()
  File "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/training/network_training/nnUNetTrainerV2.py", line 440, in run_training
    ret = super().run_training()
  File "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/training/network_training/nnUNetTrainer.py", line 318, in run_training
    super(nnUNetTrainer, self).run_training()
  File "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnunet/training/network_training/network_trainer.py", line 418, in run_training
    _ = self.tr_gen.next()
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 181, in next
    return self.__next__()
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 204, in __next__
    item = self.__get_next_item()
  File "/home/ma-user/anaconda3/envs/Pytorch-1.11.0/lib/python3.7/site-packages/batchgenerators/dataloading/multi_threaded_augmenter.py", line 189, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
THPModule_npu_shutdown success.
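This ValueError is NumPy's default protection: since NumPy 1.16.3, `np.load` refuses anything that would require unpickling unless `allow_pickle=True` is passed, and a file without the NPY magic header (such as the zero-byte file above) is treated as pickled data. Patching `dataset_loading.py` to pass `allow_pickle=True` only changes which error is raised; the underlying file is still broken. The flag itself behaves as follows on a legitimate object array (file name made up):

```python
import os
import tempfile

import numpy as np

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "object_array.npy")

# object-dtype arrays are stored via pickle inside the .npy container
arr = np.array([{"case": "example"}, {"case": "another"}], dtype=object)
np.save(path, arr, allow_pickle=True)

try:
    np.load(path)  # allow_pickle defaults to False since NumPy 1.16.3
except ValueError as exc:
    print("default load refused:", exc)

# explicitly allowing unpickling makes the load succeed
loaded = np.load(path, allow_pickle=True)
print(loaded[0]["case"])  # prints "example"
```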

Problem

The preprocessing step went wrong: some .npy files were written with a size of 0.
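A quick way to confirm the diagnosis is to scan the preprocessed folder for zero-byte .npy files. `find_empty_npy` below is a hypothetical helper; the path is taken from the environment variables set above:

```python
import os

# hypothetical helper: list all zero-byte .npy files under a folder
def find_empty_npy(root):
    empty = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".npy"):
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) == 0:
                    empty.append(path)
    return empty

if __name__ == "__main__":
    root = "/home/ma-user/work/nnunet/nnUNet-nnunetv1/nnUNetFrame/DATASET/nnUNet_preprocessed"
    for path in find_empty_npy(root):
        print("empty:", path)
```

One recovery option is to delete the empty files and re-run nnUNet_plan_and_preprocess for the task so the affected cases are regenerated.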
