How to Quickly and Reliably Generate an Integer Array with No Duplicate Elements

suoluo_2020

Last modified: 2023-06-14 17:50:37

Tags: python, numpy, programming languages

First published: 2023-06-14 17:46:57

Article link: https://blog.csdn.net/suoluo_2020/article/details/131212938

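To generate an array or sequence of non-repeating random integers within a given range, the np.random module offers several approaches, but their speeds differ considerably. This article presents four generation methods, each with its own strengths. Overall, depending on the sample size and the value range, methods 1 and 2 perform well, while the other methods are less stable. Specifically:
1. When the sample size is close to the upper bound of the value range, method 2, arange() + shuffle(arr), is the fastest;
2. When the sample size is much smaller than the upper bound, method 1, randint() + unique(), is the fastest;
3. Methods 3 and 4 are slower, and their speed fluctuates widely when the upper bound is large;
4. Method 1 is the most consistently fast overall, but method 2 performs best when the value range and the sample size are close.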

Code and experimental data:

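import random
import time

import numpy as np

# Set the value range and the sample size. Note: these parameters strongly affect the speed of all four methods below.
upper_bound = 1_000_001
batch_size = 1_000_000

# Method 1, randint() + unique(): good overall with stable speed, but it has to generate about 1.3x the data and truncate.
# Note: np.unique() returns sorted values, and the result can hold fewer than batch_size elements when batch_size is close to upper_bound.
t1 = time.perf_counter()
a = np.random.randint(low=0, high=upper_bound, size=int(batch_size * 1.3))  # oversample because duplicates are dropped
a = np.unique(a)[:batch_size]
t2 = time.perf_counter()
print("randint() + unique() takes: ", t2 - t1)

# Method 2, arange() + shuffle(arr): good overall, excellent when the value range and the sample size are close.
t3 = time.perf_counter()
b = np.arange(0, upper_bound, 1)  # default integer dtype; np.int16 cannot hold values above 32767
np.random.shuffle(b)
b = b[:batch_size]
t4 = time.perf_counter()
print("arange() + shuffle(arr) takes: ", t4 - t3)

# Method 3, random.sample(): fairly stable timing, but slow overall.
t5 = time.perf_counter()
c = np.array(random.sample(range(upper_bound), batch_size))
t6 = time.perf_counter()
print("random.sample() takes: ", t6 - t5)

# Method 4, random.choice(): slow and the least stable.
# Assumes np.random.choice with replace=False (sampling without replacement), as the method name suggests.
t7 = time.perf_counter()
d = np.random.choice(upper_bound, size=batch_size, replace=False)
t8 = time.perf_counter()
print("random.choice() takes: ", t8 - t7)

print("shape: ", a.shape, b.shape, c.shape, d.shape)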
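Method 1 has two side effects worth knowing about: np.unique() returns its values sorted, so truncating keeps the smallest ones, and a single 1.3x oversample can leave fewer than batch_size values. The following is a minimal sketch of one possible workaround, not the author's method. The function name unique_randint, the oversample parameter, and the use of np.random.default_rng() are assumptions introduced here for illustration.

```python
import numpy as np


def unique_randint(upper_bound, batch_size, oversample=1.3, rng=None):
    """Draw batch_size distinct integers from [0, upper_bound) by repeated oversampling.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if batch_size > upper_bound:
        raise ValueError("batch_size cannot exceed upper_bound")
    rng = np.random.default_rng() if rng is None else rng
    values = np.empty(0, dtype=np.int64)
    # Keep drawing until enough distinct values have been collected.
    while values.size < batch_size:
        draw = rng.integers(0, upper_bound, size=int(batch_size * oversample) + 1)
        values = np.unique(np.concatenate([values, draw]))
    # np.unique() sorts its output, so restore a random order before truncating.
    rng.shuffle(values)
    return values[:batch_size]


sample = unique_randint(1_001, 1_000)
print(sample.shape, np.unique(sample).size)
```

The extra loop iterations and the final shuffle cost time, so this variant will likely be slower than the single-pass version timed in this article, especially when batch_size is close to upper_bound.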

Test records (times in seconds):

upper_bound = 1,000,001, batch_size = 1,000,000 (upper bound and size nearly equal)
randint() + unique(arr) takes: 0.11203    too many duplicates; also relatively slow
arange() + shuffle(arr) takes: 0.02429    best of the four
random.sample() takes: 0.80355    slowest
random.choice() takes: 0.12579
shape: (727290,) (1000000,) (1000000,) (1000000,) (1000000,)
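The first shape above, (727290,), is smaller than batch_size because 1.3 million uniform draws from about one million possible values contain many duplicates. As a rough check, the expected number of distinct values after m draws from N equally likely values is N * (1 - (1 - 1/N)^m). The helper below is an illustrative sketch, and its name expected_unique is an assumption introduced here.

```python
def expected_unique(upper_bound, draws):
    """Expected number of distinct values among `draws` uniform draws from [0, upper_bound)."""
    return upper_bound * (1.0 - (1.0 - 1.0 / upper_bound) ** draws)


# Method 1 with upper_bound = 1,000,001 and batch_size = 1,000,000 draws 1,300,000 values.
print(round(expected_unique(1_000_001, 1_300_000)))
```

The result is roughly 727,000, in line with the recorded 727,290, so the shortfall is a property of the 1.3x oversampling factor rather than a fluke of one run.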

upper_bound = 5,000,001, batch_size = 1,000,000 (upper bound 5x larger, size unchanged)
randint() + unique(arr) takes: 0.12324    fastest of the four
arange() + shuffle(arr) takes: 0.15661    slower
random.sample() takes: 0.76003    slightly faster; slowest of the four
random.choice() takes: 0.6372    much slower
shape: (1000000,) (1000000,) (1000000,) (1000000,) (1000000,)

upper_bound = 10,000,001, batch_size = 1,000,000 (upper bound 10x larger, size unchanged)
randint() + unique(arr) takes: 0.11756    fastest of the four
arange() + shuffle(arr) takes: 0.40519    slower
random.sample() takes: 0.73025    slightly faster!
random.choice() takes: 1.33287    much slower; slowest of the four
shape: (1000000,) (1000000,) (1000000,) (1000000,) (1000000,)

upper_bound = 50,000,001, batch_size = 1,000,000 (upper bound 50x larger, size unchanged)
randint() + unique(arr) takes: 0.11704    fastest of the four
arange() + shuffle(arr) takes: 2.64421    much slower
random.sample() takes: 0.69041    slightly faster!
random.choice() takes: 7.2729    much slower; slowest of the four
shape: (1000000,) (1000000,) (1000000,) (1000000,)

upper_bound = 1,000,001, batch_size = 500,000 (upper bound unchanged, size reduced by 50%)
randint() + unique(arr) takes: 0.05162    fewer duplicates; still relatively slow
arange() + shuffle(arr) takes: 0.02534    best of the four
random.sample() takes: 0.41961    slowest
random.choice() takes: 0.11534
shape: (477494,) (500000,) (500000,) (500000,)

upper_bound = 1,000,001, batch_size = 200,000 (upper bound unchanged, size reduced to 20%)
randint() + unique(arr) takes: 0.02002    fastest
arange() + shuffle(arr) takes: 0.02273    fairly fast
random.sample() takes: 0.18611    slowest
random.choice() takes: 0.11471
shape: (200000,) (200000,) (200000,) (200000,)

upper_bound = 1,000,001, batch_size = 100,000 (upper bound unchanged, size reduced to 10%)
randint() + unique(arr) takes: 0.01142    fastest
arange() + shuffle(arr) takes: 0.02853    fairly fast
random.sample() takes: 0.10654
random.choice() takes: 0.11218    slowest
shape: (100000,) (100000,) (100000,) (100000,)

upper_bound = 1,000,001, batch_size = 50,000 (upper bound unchanged, size reduced to 5%)
randint() + unique(arr) takes: 0.00417    fastest
arange() + shuffle(arr) takes: 0.02312
random.sample() takes: 0.02741
random.choice() takes: 0.11936    slowest
shape: (50000,) (50000,) (50000,) (50000,)

upper_bound = 1,000,001, batch_size = 10,000 (upper bound unchanged, size reduced to 1%)
randint() + unique(arr) takes: 0.00072    fastest
arange() + shuffle(arr) takes: 0.02218
random.sample() takes: 0.00503
random.choice() takes: 0.1143    slowest
shape: (10000,) (10000,) (10000,) (10000,)

upper_bound = 100,001, batch_size = 1,000 (upper bound reduced to 10%, size reduced to 0.1%)
randint() + unique(arr) takes: 0.0001    fastest
arange() + shuffle(arr) takes: 0.00209
random.sample() takes: 0.00049
random.choice() takes: 0.01045    slowest
shape: (1000,) (1000,) (1000,) (1000,)
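Each figure above comes from a single run timed with time.perf_counter(), so some of the fluctuation may be measurement noise. One possible way to reduce it, sketched here under the assumption that repeating each method a few times is affordable, is to take the best of several runs with the standard-library timeit module. The helper name bench and the function shuffle_method are hypothetical.

```python
import timeit

import numpy as np


def bench(func, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` single runs of func."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


upper_bound, batch_size = 100_001, 1_000


def shuffle_method():
    # Method 2 from this article: shuffle the full range, then keep the first batch_size values.
    arr = np.arange(upper_bound)
    np.random.shuffle(arr)
    return arr[:batch_size]


print("arange() + shuffle(arr) best of 5:", bench(shuffle_method))
```

Taking the minimum rather than the mean is a common convention for micro-benchmarks, since slower runs usually reflect interference from other processes rather than the code itself.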
