PyTorch sampler: samplers.py

范德彪陕西分彪


Article link: https://blog.csdn.net/weixin_46815330/article/details/116926905


# -*- encoding: utf-8 -*-
"""
@File    : samplers.py
@Time    : 2021-05-09 22:35
@Author  : XD
@Email   : gudianpai@qq.com
@Software: PyCharm
"""
from __future__ import absolute_import
from collections import defaultdict
import numpy as np

import torch
from torch.utils.data.sampler import Sampler

class RandomIdentitySampler(Sampler):
    """
    Randomly sample N identities, then for each identity,
    randomly sample K instances, therefore batch size is N*K.

    Code imported from https://github.com/Cysu/open-reid/blob/master/reid/utils/data/sampler.py.

    Args:
        data_source (Dataset): dataset to sample from.
        num_instances (int): number of instances per identity.
    """
    def __init__(self, data_source, num_instances = 4):
        self.data_source = data_source
        self.num_instance = num_instances
        # Map each person ID (pid) to the list of dataset indices that carry it.
        # defaultdict(list) creates an empty list the first time a pid is seen.
        self.index_dic = defaultdict(list)
        for index, (_, pid, _) in enumerate(data_source):
            self.index_dic[pid].append(index)
        self.pids = list(self.index_dic.keys())
        self.num_identities = len(self.pids)

    def __iter__(self):
        # Market1501 train: 751 identities * 4 instances = 3004 indices per epoch,
        # e.g. [0, 12, 3003, 56, 7, 8, 9, ..., 2]. Identities appear in random order
        # and the 4 indices of one identity are adjacent, so a batch of 32
        # consecutive indices covers 8 identities.
        indices = torch.randperm(self.num_identities)
        ret = []
        for i in indices:
            pid = self.pids[i]
            t = self.index_dic[pid]
            # Sample with replacement only when this identity has fewer than
            # num_instance images.
            replace = len(t) < self.num_instance
            t = np.random.choice(t, size=self.num_instance, replace=replace)
            ret.extend(t)
        # Return an iterator over the flat index list (a viewer comment pointed
        # out that this should be iter(ret)).
        return iter(ret)

    def __len__(self):
        return self.num_instance * self.num_identities
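
if __name__ == '__main__':
    import data_manager  # dataset loader module from the same project (not shown here)
    # Raw string so the backslash in the Windows path is not read as an escape.
    dataset = data_manager.init_img_dataset(root=r'G:\data', name='market1501')
    sampler = RandomIdentitySampler(dataset.train, num_instances=4)
    a = iter(sampler)

    # Print the first three sampled dataset indices.
    print(next(a))
    print(next(a))
    print(next(a))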
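
The `defaultdict(list)` call in `__init__` is what groups dataset indices by identity. A minimal sketch of that grouping step, run on a made-up list of tuples: the file names and the `(img_path, pid, camid)` layout are assumptions for illustration, since the listing only shows that the middle element is the pid.

```python
from collections import defaultdict

# Hypothetical toy data: (img_path, pid, camid) tuples, unpacked the same
# way the sampler does with (_, pid, _).
toy_data = [
    ("a.jpg", 7, 0),
    ("b.jpg", 3, 1),
    ("c.jpg", 7, 1),
    ("d.jpg", 3, 0),
    ("e.jpg", 7, 2),
]

index_dic = defaultdict(list)
for index, (_, pid, _) in enumerate(toy_data):
    # A missing key is created as an empty list on first access,
    # so no "if pid not in index_dic" check is needed.
    index_dic[pid].append(index)

pids = list(index_dic.keys())
print(dict(index_dic))  # {7: [0, 2, 4], 3: [1, 3]}
print(pids)             # [7, 3]
```

With a plain `dict`, the first `index_dic[pid].append(index)` for a new pid would raise `KeyError`.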

Running the script prints:

=> Market1501 loaded
Dataset statistics:
  ------------------------------
  subset   | # ids | # images
  ------------------------------
  train    |   751 |    12936
  query    |   750 |     3368
  gallery  |   751 |    15913
  ------------------------------
  total    |  1501 |    32217
  ------------------------------
8560
8564
8565

Process finished with exit code 0
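
The script above needs the Market1501 data and the project's `data_manager` module. The following sketch can be run without either and shows how such a sampler is typically passed to a `DataLoader`. `ToyReIDDataset` and `ToyIdentitySampler` are hypothetical names; the sampler is a condensed copy of the logic in the listing, and the `(image, pid, camid)` item layout is an assumption.

```python
from collections import defaultdict

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset, Sampler


class ToyReIDDataset(Dataset):
    """Hypothetical stand-in for dataset.train: items are (image, pid, camid)."""

    def __init__(self, pids):
        self.pids = pids

    def __len__(self):
        return len(self.pids)

    def __getitem__(self, index):
        # A zero tensor replaces the real image; only the pid matters here.
        return torch.zeros(3), self.pids[index], 0


class ToyIdentitySampler(Sampler):
    """Condensed copy of the sampling logic in the listing, for this demo only."""

    def __init__(self, data_source, num_instances=4):
        self.num_instances = num_instances
        self.index_dic = defaultdict(list)
        for index, (_, pid, _) in enumerate(data_source):
            self.index_dic[pid].append(index)
        self.pids = list(self.index_dic.keys())

    def __iter__(self):
        ret = []
        for i in torch.randperm(len(self.pids)).tolist():
            t = self.index_dic[self.pids[i]]
            # Fall back to sampling with replacement for small identities.
            replace = len(t) < self.num_instances
            picked = np.random.choice(t, size=self.num_instances, replace=replace)
            ret.extend(picked.tolist())
        return iter(ret)

    def __len__(self):
        return self.num_instances * len(self.pids)


# 6 identities with 5, 2, 4, 6, 1 and 4 images respectively.
pid_list = [0] * 5 + [1] * 2 + [2] * 4 + [3] * 6 + [4] * 1 + [5] * 4
dataset = ToyReIDDataset(pid_list)
sampler = ToyIdentitySampler(dataset, num_instances=4)

# batch_size = N * K = 2 identities * 4 instances per identity.
loader = DataLoader(dataset, batch_size=8, sampler=sampler)
batches = [pids.tolist() for _, pids, _ in loader]
for b in batches:
    print(b)  # each batch holds 4 copies of one pid, then 4 of another
```

Because the sampler emits each identity's indices back to back, choosing `batch_size` as a multiple of `num_instances` keeps every identity inside a single batch. Identities with fewer than `num_instances` images (pids 1 and 4 here) contribute repeated indices.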