PyTorch DataLoader: fixing "TypeError: can't pickle Environment objects" when using LMDB data with num_workers > 0


evilfromshadow


Original link: https://github.com/pytorch/vision/issues/689#issuecomment-787215916


Noting this down so I don't forget it. Original issue: https://github.com/pytorch/vision/issues/689#issuecomment-787215916

Solution:

Do not call lmdb.open in the __init__ method;
open the LMDB environment the first time data is loaded instead.

The code is as follows:

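import lmdb
import torch


# Renamed from DataLoader: this is a Dataset, and the old name shadowed
# torch.utils.data.DataLoader.
class LMDBDataset(torch.utils.data.Dataset):
    def __init__(self, lmdb_dir: str):
        """do not open lmdb here!!"""
        # Keep only the path; an open lmdb Environment cannot be pickled.
        self.lmdb_dir = lmdb_dir

    def open_lmdb(self):
        # Runs lazily, so each worker process opens its own environment.
        self.env = lmdb.open(self.lmdb_dir, readonly=True, create=False)
        self.txn = self.env.begin(buffers=True)

    def __getitem__(self, item: int):
        if not hasattr(self, 'txn'):
            self.open_lmdb()
        """
        Then do anything you want with env/txn here.
        """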

Explanation (quoted verbatim from the original issue comment):
The multi-processing actually happens when you create the data iterator (e.g., when calling for datum in dataloader:):
https://github.com/pytorch/pytorch/blob/461014d54b3981c8fa6617f90ff7b7df51ab1e85/torch/utils/data/dataloader.py#L712-L720
In short, it would create multiple processes which “copy” the state of the current process. This copy involves a pickle of the LMDB’s Env thus causes an issue. In our solution, we open it at the first data iteration and the opened lmdb file object would be dedicated to each subprocess.
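
The pickling behavior described above can be reproduced without PyTorch or LMDB. The following is a minimal sketch, not the code from the issue: FakeEnv, EagerDataset, LazyDataset and can_pickle are hypothetical names, and FakeEnv holds a threading.Lock purely as a stand-in for a resource that, like an LMDB environment, cannot be pickled.

```python
import pickle
import threading


class FakeEnv:
    """Hypothetical stand-in for an LMDB environment."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()  # locks cannot be pickled


class EagerDataset:
    """Opens the handle in __init__, so it becomes part of the pickled state."""

    def __init__(self, path):
        self.env = FakeEnv(path)


class LazyDataset:
    """Stores only the path; the handle is created on first access."""

    def __init__(self, path):
        self.path = path

    def __getitem__(self, index):
        if not hasattr(self, "env"):
            self.env = FakeEnv(self.path)
        return index


def can_pickle(obj):
    """Return True if obj can be serialized with pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


eager_ok = can_pickle(EagerDataset("data.lmdb"))  # handle already open
lazy_ok = can_pickle(LazyDataset("data.lmdb"))    # nothing opened yet
```

Here the eager variant fails to pickle because the handle is already an attribute, while the lazy variant holds only a string at the moment it is handed to a worker.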
