Debugging multi-worker batch generation

Latest recommended article published 2024-06-12 16:52:09

草莓橙子碗

Article tags: machine learning, python

Article link: https://blog.csdn.net/weixin_57506268/article/details/135125890

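Project context:

This follows up on my previous post, "Why does machine learning train in batches?" (CSDN blog). The training set has 50,000 samples and the test set has 10,000, so loading them in a single process is slow. I therefore switched to loading the data with multiple workers.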

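Problem description

Setting num_workers directly on the DataLoader to load data in parallel produced the following error (note that DataLoader workers are separate processes, not threads):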

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

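Root cause analysis:

In Python's multiprocessing module, when the main module starts a new process with the spawn start method (the default on Windows and macOS), the child process imports the main module again, so the module's top-level code runs a second time inside the child. Function and class definitions are safe at top level, because importing them only defines them and does not run their bodies.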

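If the main module has code at global scope that itself starts child processes, each child would run that code again on import and start further children, creating processes recursively without end. Python detects this situation and raises the RuntimeError instead, which is why the error message points to the if __name__ == '__main__': idiom.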
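The guard named in the error message can be sketched with the standard library alone. This is a minimal illustration, not the code from this post: the names square and run_workers are made up for the example, and a multiprocessing.Pool stands in for the DataLoader workers.

```python
import multiprocessing as mp


def square(x):
    """Work done inside each worker process (hypothetical example task)."""
    return x * x


def run_workers(values, processes=2):
    """Start a pool of worker processes and collect the results in order."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(square, values)


# Top-level statements run again in every child that re-imports this module,
# so nothing at this level may start a process. Definitions above are fine.
if __name__ == "__main__":
    # Only the original process has __name__ == "__main__". A child that
    # re-imports the module sees a different name and skips this block.
    print(run_workers([1, 2, 3]))
```

For the DataLoader case described here, the same pattern should apply: the code that builds the DataLoader with num_workers greater than 0 and iterates over it would go inside a function that is called only from under the guard.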