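I am working on a classification task with heavily imbalanced classes: the largest class is more than a hundred times the size of the smallest. The options are either stratified sampling or the sampler that PyTorch provides in torch.utils.data: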
WeightedRandomSampler(weights: Sequence[float], num_samples: int, replacement: bool = True, generator=None)
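To make the signature concrete, here is a minimal sketch on a hypothetical toy label tensor (the labels, the seed, and the inverse-frequency weighting are assumptions for illustration, not part of the original example). `weights` holds one entry per sample, not per class, and with `replacement=True` the sampler draws `num_samples` indices in proportion to those weights, so the two classes should come out roughly balanced.

```python
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical toy labels: 2 samples of class 0, 8 samples of class 1.
labels = torch.tensor([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])

# One weight per sample: the inverse of the size of that sample's class.
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()

# Draw 10000 dataset indices with replacement, proportional to the weights.
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=10000,
    replacement=True,
)

# Map the drawn indices back to labels and count how often each class appears.
drawn = torch.tensor(list(sampler))
drawn_counts = torch.bincount(labels[drawn], minlength=2)
print(drawn_counts)
```

In practice the sampler is passed to a `DataLoader` through its `sampler` argument, which cannot be combined with `shuffle=True`.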
Assign a weight to the samples of each class, then sample according to those weights:
import torch

# Create dummy data with class imbalance (104 / 642 / 784 samples per class)
class_counts = torch.tensor([104, 642, 784])
numDataPoints = class_counts.sum()
data_dim = 5
bs = 170
data = torch.randn(numDataPoints, data_dim)
target = torch.cat((torch.zeros(class_counts[0], dtype=torch.long),
                    torch.ones(class_counts[1], dtype=torch.long),
                    torch.ones(class_counts[2], dtype=torch.long) * 2))
print('target train 0/1/2: {}/{}/{}'.format(
    (target == 0).sum(), (target == 1).sum(), (target == 2).sum()))
# Compute samples weight (each sample should get its own