Shuffling Part of a Sequence

HackerTom

Last modified 2024-03-25 21:13:06

Category: 乱搞. Tags: random, python, numpy

First published 2024-03-25 08:33:08

Article link: https://blog.csdn.net/hackertom/article/details/137000944

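Given a sequence $\langle x_1,\dots,x_n\rangle$, we want to shuffle a fraction $p$ ($p\in[0,1]$) of its positions. This comes up in noisy-label work: the PrecompDataset in [1], for example, simulates noisy pairs by shuffling. However, [1] simply calls numpy.random.shuffle, which may not shuffle thoroughly, because some elements can stay at their original positions. For example: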

import numpy as np
b = np.arange(5) # [0, 1, 2, 3, 4]
np.random.shuffle(b)
print(b)  # e.g. [0, 4, 2, 3, 1]: 0, 2 and 3 are still in place
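
To get a feel for how often this happens, here is a small sketch that repeats the shuffle many times and counts the elements left in place. It is not taken from [1]; the helper name count_fixed_points, the seed and the number of trials are arbitrary choices made for illustration. For a uniformly random permutation the expected number of fixed points is 1 whatever the length, and the chance of having none at all is roughly 1/e, so the simulation should land near those values.

```python
import numpy as np


def count_fixed_points(n, rng):
    """Shuffle arange(n) once and count elements still at their original index."""
    b = np.arange(n)
    rng.shuffle(b)  # in-place uniform shuffle
    return int((b == np.arange(n)).sum())


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n, trials = 5, 10000  # illustrative sizes
counts = np.array([count_fixed_points(n, rng) for _ in range(trials)])

mean_fixed = counts.mean()  # average number of elements left in place
share_clean = (counts == 0).mean()  # share of shuffles that moved every element
print("mean fixed points per shuffle:", mean_fixed)
print("share of shuffles with no fixed point:", share_clean)
```

In other words, a plain shuffle moves every element only in a minority of runs, which is why a stricter procedure is needed when the noise rate has to be exact.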

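Because of such fixed points, the fraction of positions that actually change may fall short of $p$. To guarantee that exactly int(p * n) positions change, it can be written like this: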

import numpy as np

# example data
n = 7  # number of samples
x = np.arange(n * 5).reshape(n, 5)  # 2-D data, one row per sample
x_orig = x.copy()
print("original:", x)

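p = 0.4  # fraction of positions to shuffle
noise_length = int(p * n)  # number of positions to shuffle
# np.random.randint(1, noise_length) below raises ValueError unless noise_length >= 2
assert noise_length >= 2, "need at least 2 positions to move all of them"
pos_to_shuffle = np.random.permutation(n)[:noise_length]  # randomly pick noise_length positions to shuffle
# shuffle these positions by applying a random cyclic offset
meta_index = np.arange(noise_length)  # indices into pos_to_shuffle
rnd_shift = np.random.randint(1, noise_length)  # random offset in [1, noise_length)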
# print(rnd_shift)
shuffled_pos = pos_to_shuffle[(meta_index + rnd_shift) % noise_length]
# print(pos_to_shuffle, shuffled_pos)
assert (pos_to_shuffle != shuffled_pos).all()
# shuffle the data using the shifted positions
x[pos_to_shuffle] = x[shuffled_pos]
print("shuffled:", x)

# check that the noise rate is as expected (rows are distinct, so comparing column 0 is enough)
assert (x_orig[:, 0] != x[:, 0]).sum() == noise_length
real_p = round((x_orig[:, 0] != x[:, 0]).sum() / n, 6)
assert real_p > 0
exp_p = round(noise_length / n, 6)
print("real p:", real_p, " v.s. expected p:", exp_p)
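
The same steps can be wrapped in a reusable function. The following is only a sketch under a few assumptions: the name shuffle_fraction and its return value are made up here (it is not part of numpy or of [1]), it uses numpy's Generator API instead of the legacy np.random functions, and when fewer than two positions are selected it returns the data unchanged, since a single position cannot be moved among itself.

```python
import numpy as np


def shuffle_fraction(x, p, rng=None):
    """Return (shuffled copy of x, moved positions); int(p * len(x)) rows change place.

    Hypothetical helper written for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    n = len(x)
    m = int(p * n)  # number of positions to disturb
    out = x.copy()
    if m < 2:
        # 0 or 1 selected positions cannot be permuted without a fixed point
        return out, np.empty(0, dtype=int)
    pos = rng.permutation(n)[:m]  # m random positions, in random order
    shift = rng.integers(1, m)  # cyclic offset in [1, m), never 0
    src = np.roll(pos, -shift)  # src[i] == pos[(i + shift) % m], so src[i] != pos[i]
    out[pos] = x[src]  # each selected position receives another selected row
    return out, pos


# usage on the same toy data as above
x = np.arange(35).reshape(7, 5)
y, moved = shuffle_fraction(x, 0.4, np.random.default_rng(0))
changed = x[:, 0] != y[:, 0]  # valid because the rows of x are pairwise distinct
print("moved positions:", np.sort(moved))
print("changed rows:", np.flatnonzero(changed))
```

Returning the moved positions alongside the data makes it easy to record which samples were made noisy. Note that the check on column 0 only works when the rows are pairwise distinct; with duplicate rows, a moved position could receive an identical row and look unchanged.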