Dataset-related problem when training SlowFast on ModelArts

小乐快乐

Tags: python, programming language

Article link: https://blog.csdn.net/weixin_45666880/article/details/127773513

While developing the SlowFast operator on the ModelArts platform, I ran into a dataset-processing problem:

[10/04 14:43:43][INFO] start copy.py: 299: ============== Starting Training ==============
[10/04 14:43:43][INFO] start copy.py: 301: total_epoch=20, steps_per_epoch=101
[WARNING] MD(178,fffba4ff91e0,python):2022-10-04-14:44:30.306.953 [mindspore/ccsrc/minddata/dataset/engine/datasetops/device_queue_op.cc:725] DetectPerBatchTime] Bad performance attention, it takes more than 25 seconds to fetch a batch of data from dataset pipeline, which might result `GetNext` timeout problem. You may test dataset processing performance(with creating dataset iterator) and optimize it.
[ERROR] MD(178,ffff60c791e0,python):2022-10-04-14:45:25.453.944 [mindspore/ccsrc/minddata/dataset/util/task.cc:67] operator()] Task: GeneratorOp(ID:3) - thread(281472305435104) is terminated with err msg: Exception thrown from PyFunc. Exception: Generator worker process timeout.

At:
  /home/ma-user/anaconda/lib/python3.7/site-packages/mindspore/dataset/engine/datasets.py(3841): process

Line of code : 195
File         : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_CentOS@2/mindspore/mindspore/ccsrc/minddata/dataset/engine/datasetops/source/generator_op.cc

[ERROR] MD(178,ffff60c791e0,python):2022-10-04-14:45:25.454.325 [mindspore/ccsrc/minddata/dataset/util/task_manager.cc:217] InterruptMaster] Task is terminated with err msg(more detail in info level log):Exception thrown from PyFunc. Exception: Generator worker process timeout.

At:
  /home/ma-user/anaconda/lib/python3.7/site-packages/mindspore/dataset/engine/datasets.py(3841): process

Line of code : 195
File         : /home/jenkins/agent-working-dir/workspace/Compile_Ascend_ARM_CentOS@2/mindspore/mindspore/ccsrc/minddata/dataset/engine/datasetops/source/generator_op.cc

[WARNING] CORE(178,ffffaff20170,python):2022-10-04-14:48:20.618.138 [mindspore/core/ir/anf_extends.cc:65] fullname_with_scope] Input 0 of cnode is not a value node, its type is CNode.

The log shows that dataset processing timed out, yet the same dataset runs without problems on the OpenI (启智) platform.

The OpenI run uses MindSpore 1.7, while ModelArts on Huawei Cloud uses MindSpore 1.5.1. Is this version difference the cause? Is there any other way to fix it?

****************************************************Answer*****************************************************

Judging from the error, the Python function takes too long to execute. Try the following:

1. Set python_multiprocessing to True in GeneratorDataset

2. Set GeneratorDataset's num_parallel_workers to a larger value (the default should be 1)
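
The two suggestions above can be sketched roughly as follows. This is a minimal sketch, not the original training script: `time_per_sample` and `build_dataset` are hypothetical helper names, `source` stands for whatever random-access dataset object the script already passes to GeneratorDataset, and the worker count of 8 is an arbitrary assumption that should be matched to the CPU cores available on the ModelArts instance. The timing helper follows the hint in the warning to test dataset processing performance before training.

```python
import time


def time_per_sample(source, num_samples=4):
    """Return the average seconds spent in source[i] for the first few samples."""
    count = min(num_samples, len(source))
    if count == 0:
        return 0.0
    start = time.perf_counter()
    for index in range(count):
        source[index]  # runs the same Python loading code the generator worker runs
    return (time.perf_counter() - start) / count


def build_dataset(source, column_names, workers=8):
    """Wrap `source` in a GeneratorDataset with both suggested settings applied."""
    # Imported lazily so the timing helper above can be used without MindSpore.
    import mindspore.dataset as ds

    return ds.GeneratorDataset(
        source,
        column_names=column_names,
        num_parallel_workers=workers,  # suggestion 2: more workers than the default
        python_multiprocessing=True,   # suggestion 1: use worker processes
    )
```

As a rough check, multiplying the value returned by `time_per_sample(source)` by the batch size estimates how long a single worker needs to produce one batch. If that estimate is close to or above the 25 seconds mentioned in the warning, raising the worker count or making the per-sample loading code cheaper is likely needed.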
