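1 System environment
Hardware environment (Ascend/GPU/CPU): Ascend
MindSpore version: 2.1
Execution mode (PyNative/Graph): any
2 Error message
2.1 Problem description
In an Ascend environment, starting the profiler as described in Performance_Tuning.md fails with the following error: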
Traceback (most recent call last):
  File "wizardcoder/run_wizardcoder.py", line 149, in <module>
    device_id=args.device_id)
  File "wizardcoder/run_wizardcoder.py", line 81, in main
    task.train(train_checkpoint=ckpt, resume=resume)
  File "/home/wizardcoder/1_wizardcoder-mindformers/mindformers/trainer/trainer.py", line …, in …
    is_full_config=True, **kwargs)
  File "/home/wizardcoder/1_wizardcoder-mindformers/mindformers/trainer/causal_language_modeling/causal_language_modeling.py", line 104, in train
    **kwargs)
  File "/home/wizardcoder/1_wizardcoder-mindformers/mindformers/trainer/base_trainer.py", line 631, in training_process
    initial_epoch=config.runner_config.initial_epoch)
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/train/model.py", line 1066, in train
    initial_epoch=initial_epoch)
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/train/model.py", line 113, in wrapper
    func(self, *args, **kwargs)
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/train/model.py", line 613, in _train
    self._train_process(epoch, train_dataset, list_callback, cb_params, initial_epoch, valid_infos)
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/train/model.py", line 921, in _train_process
    list_callback.on_train_step_end(run_context)
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/train/callback/_callback.py", line 413, in on_train_step_end
    cb.on_train_step_end(run_context)
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/train/callback/_callback.py", line 255, in on_train_step_end
    self.step_end(run_context)
  File "/home/wizardcoder/1_wizardcoder-mindformers/mindformers/core/callback/callback.py", line 593, in step_end
    self.profiler.analyse()
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 594, in analyse
    self._analyse(offline_path=offline_path)
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 634, in _analyse
    self._ascend_analyse()
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 1021, in _ascend_analyse
    self._ascend_graph_analyse()
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 1250, in _ascend_graph_analyse
    op_summary, op_statistic, steptrace = self._ascend_graph_msprof_analyse(source_path)
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 298, in _ascend_graph_msprof_analyse
    df_op_summary, df_op_statistic, df_step_trace = msprof_analyser.parse()
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/profiler/parser/ascend_msprof_generator.py", line 97, in parse
    self._read_op_summary()
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/profiler/parser/ascend_msprof_generator.py", line 121, in _read_op_summary
    row = [row[index.get('index')] for index in self.op_summary_name.values()]
  File "/root/miniconda3/envs/wizardcoder/lib/python3.7/site-packages/mindspore/profiler/parser/ascend_msprof_generator.py", line 121, in <listcomp>
    row = [row[index.get('index')] for index in self.op_summary_name.values()]
IndexError: list index out of range
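3 Root cause analysis
Starting the profiler in an Ascend environment requires certain environment settings to be configured. The traceback shows where the problem surfaces: _read_op_summary picks values out of each op summary row by stored column positions, and a row with fewer columns than expected raises the IndexError.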
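The failure mode in the last traceback frame can be reproduced in isolation. The following is a minimal sketch, not MindSpore's code: the column map and the sample rows are hypothetical and only mimic the shape of the list comprehension at line 121 of ascend_msprof_generator.py.

```python
# Hypothetical column map, shaped like the `op_summary_name` dict in the traceback:
# each entry stores the position of one column inside a parsed row.
op_summary_name = {
    "Op Name": {"index": 0},
    "Task Duration": {"index": 5},
}


def select_columns(row, name_map):
    """Pick the mapped columns out of one row, as the failing comprehension does."""
    return [row[col.get("index")] for col in name_map.values()]


# A row with all expected columns (made-up values).
complete_row = ["MatMul", "a", "b", "c", "d", "12.5"]
# A row with fewer columns than the map expects (made-up values).
short_row = ["MatMul", "a"]

if __name__ == "__main__":
    print(select_columns(complete_row, op_summary_name))
    try:
        select_columns(short_row, op_summary_name)
    except IndexError as err:
        print("IndexError:", err)
```

The short row fails on the lookup for position 5, which is the same exception type reported by the profiler.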
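4 Solution
Follow these steps:
- Unset the environment variables mentioned above.
- Install hccl_parser: pip install {CANN package path}/latest/tools/hccl_parser-0.1-py3-none-any.whl. Use the env command to find the CANN package path.
- Set profile to True in the YAML file.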
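The hccl_parser step can be partly checked from Python. This is a rough sketch under stated assumptions: filtering environment values by the substring "ascend" is only a heuristic stand-in for reading the env output by eye, and the importable module name hccl_parser is assumed from the wheel name. The helper names are hypothetical.

```python
import importlib.util
import os
import posixpath


def cann_candidates(environ):
    """Return (name, value) pairs whose value mentions 'ascend' (heuristic only)."""
    return sorted(
        (name, value)
        for name, value in environ.items()
        if "ascend" in value.lower()
    )


def hccl_parser_wheel(cann_root):
    """Build the wheel path from the install step for a given CANN package path."""
    return posixpath.join(
        cann_root, "latest", "tools", "hccl_parser-0.1-py3-none-any.whl"
    )


def hccl_parser_installed():
    """True if a module named hccl_parser is importable (module name is assumed)."""
    return importlib.util.find_spec("hccl_parser") is not None


if __name__ == "__main__":
    # Print candidate variables so the CANN package path can be picked manually.
    for name, value in cann_candidates(os.environ):
        print(f"{name}={value}")
    print("hccl_parser importable:", hccl_parser_installed())
```

Once the CANN package path is identified, passing it to hccl_parser_wheel gives the argument for pip install; running the script again afterwards shows whether the module can be imported.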