MindSpore: enabling MS_DISABLE_REF_MODE causes the error The device address type is wrong:type name in address:CPU,type nam

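System environment

Hardware environment (Ascend/GPU/CPU): Ascend

MindSpore version: 2.2

Execution mode (PyNative/Graph): any

Error message

2.1 Problem description

With MindSpore 2.2 + CANN 7.0, training runs normally with the parallel strategy dp:mp:pp = 1:4:2, but changing the parallel strategy to dp:mp:pp = 1:8:1 produces the following error: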

2023-10-28-10:52:46.336.155 -> 2023-10-28-10:52:46.336.477

For more details, please refer to the FAQ at https://www.mindspore.cn/docs/en/master/faq/data_processing.html.

Traceback (most recent call last):
  File "wizardcoder/run_wizardcoder.py", line 170, in <module>
    merge_file=args.merge_file)
  File "wizardcoder/run_wizardcoder.py", line 104, in main
    task.finetune(finetune_checkpoint=config.load_checkpoint, auto_trans_ckpt=config.auto_trans_ckpt, resume=resume)
  File "/home/wizardcoder/2_wizardcoder-mindformers -1019/mindformers/trainer/trainer.py", line 462, in finetune
    is_full_config=True, **kwargs)
  File "/home/wizardcoder/2_wizardcoder-mindformers -1019/mindformers/trainer/causal_language_modeling/causal_language_modeling.py", line 163, in train
    **kwargs)
  File "/home/wizardcoder/2_wizardcoder-mindformers -1019/mindformers/trainer/base_trainer.py", line 653, in training_process
    initial_epoch=config.runner_config.initial_epoch)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore/train/model.py", line 1073, in train
    initial_epoch=initial_epoch)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore/train/model.py", line 114, in wrapper
    func(self, *args, **kwargs)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore/train/model.py", line 624, in _train
    cb_params, sink_size, initial_epoch, valid_infos)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore/train/model.py", line 708, in _train_dataset_sink_process
    outputs = train_network(*inputs)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore/nn/cell.py", line 680, in __call__
    out = self.compile_and_run(*args, **kwargs)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore/nn/cell.py", line 1023, in compile_and_run
    return _cell_graph_executor(self, *new_args, phase=self.phase)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore/common/api.py", line 1589, in __call__
    return self.run(obj, *args, phase=phase)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore/common/api.py", line 1628, in run
    return self._exec_pip(obj, *args, phase=phase_real)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore/common/api.py", line 121, in wrapper
    results = fn(*arg, **kwargs)
  File "/home/miniconda3/lib/python3.7/site-packages/mindspore/common/api.py", line 1608, in _exec_pip
    return self._graph_executor(args, phase)
RuntimeError: The device address type is wrong: type name in address:CPU, type name in context:Ascend

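Root cause analysis

Analysis showed that the error is caused by having the "MS_DISABLE_REF_MODE" environment variable enabled.

Solution

After unsetting this environment variable, training runs normally.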
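
When the launch script cannot easily be edited, the same cleanup can be attempted from Python. The following is only a minimal sketch: the helper name clear_ref_mode_flag is hypothetical, and it assumes that MindSpore reads MS_DISABLE_REF_MODE from the process environment when it initializes, so the variable has to be removed before mindspore is imported. The post itself only confirms that unsetting the variable in the environment resolves the error.

```python
import os

# Variable name taken from the post.
ENV_VAR = "MS_DISABLE_REF_MODE"


def clear_ref_mode_flag(environ=os.environ):
    """Remove MS_DISABLE_REF_MODE from the given environment mapping.

    Returns the previous value, or None if the variable was not set.
    (Hypothetical helper, not part of MindSpore.)
    """
    return environ.pop(ENV_VAR, None)


if __name__ == "__main__":
    previous = clear_ref_mode_flag()
    if previous is None:
        print(f"{ENV_VAR} was not set")
    else:
        print(f"removed {ENV_VAR} (was {previous!r})")
    # Assumption: import mindspore only after this point, so that the
    # framework never sees the variable in this process.
```

In a multi-card job every rank runs in its own process, so the variable would have to be absent in each of them; removing the export from the launch script, as described above, is the more direct route.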
