How to generate integer arrays without duplicate elements, both well and fast

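To produce a random array or sequence of non-repeating integers within a given range, NumPy's np.random module (together with Python's built-in random module) offers several approaches, but their speeds differ considerably. This article compares 4 methods, each with its own strengths. Overall, across the sample sizes and value ranges tested, methods 1 and 2 perform best, while the other two are less consistent. Specifically:
1. When the sample size is close to the upper bound of the range, method 2, arange() + shuffle(arr), is the fastest;
2. When the sample size is far below the upper bound, method 1, randint() + unique(), is the fastest;
3. Methods 3 and 4 are slower, and method 4 in particular slows down sharply as the upper bound grows;
4. Method 1 is the most consistently fast overall, but method 2 is best when the value range is close to the sample size. In that regime method 1 can also return fewer values than requested (727,290 instead of 1,000,000 in the first test below), because 1.3x oversampling does not yield enough distinct values.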
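The rule of thumb above can be packaged into a single helper. This is a minimal sketch, not the author's code: `unique_randints` and its `ratio_threshold` parameter are hypothetical, and 0.3 is an assumed switch-over ratio (the tests below only place the crossover somewhere between 20% and 50% of the upper bound). It uses NumPy's newer Generator API (`np.random.default_rng`) rather than the legacy functions benchmarked here, and it also avoids two pitfalls of the plain randint() + unique() recipe: `np.unique()` returns sorted values, so truncating them keeps only the smallest ones, and a fixed 1.3x oversample can come up short.

```python
import numpy as np


def unique_randints(upper_bound, batch_size, rng=None, ratio_threshold=0.3):
    """Return batch_size distinct integers from [0, upper_bound) in random order.

    Hypothetical helper: ratio_threshold=0.3 is an assumed switch-over point
    between the two strategies, not a measured one.
    """
    if not 0 <= batch_size <= upper_bound:
        raise ValueError("need 0 <= batch_size <= upper_bound")
    rng = np.random.default_rng() if rng is None else rng

    if batch_size >= ratio_threshold * upper_bound:
        # Method 2 style: permute the whole range and keep a prefix.
        return rng.permutation(upper_bound)[:batch_size]

    # Method 1 style: oversample with replacement, then deduplicate.
    values = np.unique(rng.integers(0, upper_bound, size=int(batch_size * 1.3)))
    # A fixed 1.3x oversample can come up short, so top up until there are enough.
    while values.size < batch_size:
        missing = batch_size - values.size
        values = np.union1d(values, rng.integers(0, upper_bound, size=2 * missing))
    # np.unique()/np.union1d() return sorted output; shuffle before truncating
    # so the kept values are a uniform random subset, not the smallest ones.
    rng.shuffle(values)
    return values[:batch_size]


# Usage: 100,000 distinct values from [0, 1,000,001)
rng = np.random.default_rng(0)
sample = unique_randints(1_000_001, 100_000, rng)
print(sample.size, np.unique(sample).size)  # → 100000 100000
```

The top-up loop only matters on the oversampling branch; near the upper bound the helper switches to a full permutation, which is exactly the regime where the tests favour method 2.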

Code and experimental data:

import numpy as np
import random, time


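# Set the value range and sample size. Note: these parameters strongly affect the speed of all 4 methods below
upper_bound = 1_000_001   # values are drawn from [0, upper_bound); "1,000,001" would create a tuple
batch_size = 1_000_000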

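# Method 1, randint() + unique(): good overall with stable speed, but it must draw about 1.3x extra values and truncate.
# Caveats: np.unique() returns sorted values, so [:batch_size] keeps the smallest ones; and when batch_size
# approaches upper_bound, 1.3x oversampling leaves fewer than batch_size distinct values.

t1 = time.perf_counter()
a = np.random.randint(low=0, high=upper_bound, size=int(batch_size * 1.3))  # draws can repeat, so oversample
a = np.unique(a)[:batch_size]
t2 = time.perf_counter()
print("randint() + unique(arr) takes: ", t2 - t1)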


# Method 2, arange() + shuffle(arr): good overall, and excellent when the value range is close to the sample size
t3 = time.perf_counter()
b = np.arange(upper_bound, dtype=np.int64)  # int16 cannot hold values above 32,767
np.random.shuffle(b)
b = b[:batch_size]
t4 = time.perf_counter()
print("arange() + shuffle(arr) takes: ", t4 - t3)

# Method 3, random.sample(): the time is fairly stable, but slow overall

t5 = time.perf_counter()
c = np.array(random.sample(range(upper_bound), batch_size))
t6 = time.perf_counter()
print("random.sample() takes: ", t6 - t5)

# Method 4, np.random.choice() without replacement: slow, and the least stable (its time grows sharply with upper_bound)

t7 = time.perf_counter()
d = np.random.choice(upper_bound, batch_size, replace=False)
t8 = time.perf_counter()
print("random.choice() takes: ", t8 - t7)

print("shape: ", a.shape, b.shape, c.shape, d.shape)
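Each figure in the records below is a single time.perf_counter() measurement, so one-off noise can look like instability. As a hedged sketch (the names method1 to method4 and benchmark are hypothetical, with methods 3 and 4 following the labels used in the records), the same four methods can be wrapped in functions and timed several times with the standard library's timeit.repeat, reporting both the best and the worst run:

```python
import functools
import random
import timeit

import numpy as np


def method1(upper_bound, batch_size):
    # randint() + unique(): may return fewer than batch_size values, in sorted order.
    drawn = np.random.randint(0, upper_bound, size=int(batch_size * 1.3))
    return np.unique(drawn)[:batch_size]


def method2(upper_bound, batch_size):
    # arange() + shuffle(arr): permute the whole range, keep a prefix.
    values = np.arange(upper_bound, dtype=np.int64)
    np.random.shuffle(values)
    return values[:batch_size]


def method3(upper_bound, batch_size):
    # random.sample() from Python's standard library.
    return np.array(random.sample(range(upper_bound), batch_size))


def method4(upper_bound, batch_size):
    # Legacy np.random.choice() without replacement.
    return np.random.choice(upper_bound, batch_size, replace=False)


def benchmark(upper_bound, batch_size, repeat=5):
    """Return {method name: (best, worst)} wall-clock seconds for one call each."""
    results = {}
    for func in (method1, method2, method3, method4):
        call = functools.partial(func, upper_bound, batch_size)
        times = timeit.repeat(call, repeat=repeat, number=1)
        results[func.__name__] = (min(times), max(times))
    return results


for name, (best, worst) in benchmark(1_000_001, 100_000).items():
    print(f"{name}: best {best:.5f} s, worst {worst:.5f} s")
```

A large gap between best and worst for the same method points to measurement noise rather than to the method itself.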

Test records (times in seconds, measured with time.perf_counter()):

upper_bound = 1,000,001, batch_size = 1,000,000   (upper bound almost equal to the size)
randint() + unique(arr) takes:  0.11203    too many duplicates; also relatively slow
arange() + shuffle(arr) takes:  0.02429    best
random.sample() takes:          0.80355    slowest
random.choice() takes:          0.12579
shape:  (727290,) (1000000,) (1000000,) (1000000,) (1000000,)

upper_bound = 5,000,001, batch_size = 1,000,000   (only the upper bound grows 5x; size unchanged)
randint() + unique(arr) takes:  0.12324    fastest
arange() + shuffle(arr) takes:  0.15661    slower
random.sample() takes:          0.76003    slightly faster, but still the slowest
random.choice() takes:          0.6372     much slower
shape:  (1000000,) (1000000,) (1000000,) (1000000,) (1000000,)

upper_bound = 10,000,001, batch_size = 1,000,000   (only the upper bound grows 10x; size unchanged)
randint() + unique(arr) takes:  0.11756    fastest
arange() + shuffle(arr) takes:  0.40519    slower
random.sample() takes:          0.73025    slightly faster!
random.choice() takes:          1.33287    much slower, now the slowest
shape:  (1000000,) (1000000,) (1000000,) (1000000,) (1000000,)

upper_bound = 50,000,001, batch_size = 1,000,000   (only the upper bound grows 50x; size unchanged)
randint() + unique(arr) takes:  0.11704    fastest
arange() + shuffle(arr) takes:  2.64421    much slower
random.sample() takes:          0.69041    slightly faster!
random.choice() takes:          7.2729     much slower, the slowest
shape:  (1000000,) (1000000,) (1000000,) (1000000,)
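These four tests keep batch_size fixed and only raise upper_bound, yet arange() + shuffle(arr) and random.choice() slow down as the upper bound grows. Both touch every candidate value: method 2 shuffles the full arange(), and in NumPy's legacy RandomState code, choice(..., replace=False) without p is implemented as a prefix of a full permutation. The sketch below checks that implementation detail with two identically seeded legacy generators; treat it as an observation about the legacy code, not a documented API guarantee. The newer Generator.choice is implemented differently (it even adds a shuffle parameter), so benchmark it separately before switching APIs.

```python
import numpy as np

upper_bound, batch_size = 50_001, 1_000  # small values so the check runs quickly

# Two legacy generators with the same seed produce the same random stream.
via_choice = np.random.RandomState(42).choice(upper_bound, batch_size, replace=False)
via_permutation = np.random.RandomState(42).permutation(upper_bound)[:batch_size]

# True when choice() is implemented as permutation(n)[:k], i.e. its cost
# scales with upper_bound rather than with batch_size.
print(np.array_equal(via_choice, via_permutation))
```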

upper_bound = 1,000,001, batch_size = 500,000   (upper bound unchanged; size cut by 50%)
randint() + unique(arr) takes:  0.05162    too few distinct values left after deduplication (477,494); also relatively slow
arange() + shuffle(arr) takes:  0.02534    best
random.sample() takes:          0.41961    slowest
random.choice() takes:          0.11534
shape:  (477494,) (500000,) (500000,) (500000,)

upper_bound = 1,000,001, batch_size = 200,000   (upper bound unchanged; size reduced to 20%)
randint() + unique(arr) takes:  0.02002    fastest
arange() + shuffle(arr) takes:  0.02273    fairly fast
random.sample() takes:          0.18611    slowest
random.choice() takes:          0.11471
shape:  (200000,) (200000,) (200000,) (200000,)

upper_bound = 1,000,001, batch_size = 100,000   (upper bound unchanged; size reduced to 10%)
randint() + unique(arr) takes:  0.01142    fastest
arange() + shuffle(arr) takes:  0.02853    fairly fast
random.sample() takes:          0.10654
random.choice() takes:          0.11218    slowest
shape:  (100000,) (100000,) (100000,) (100000,)

upper_bound = 1,000,001, batch_size = 50,000   (upper bound unchanged; size reduced to 5%)
randint() + unique(arr) takes:  0.00417    fastest
arange() + shuffle(arr) takes:  0.02312
random.sample() takes:          0.02741
random.choice() takes:          0.11936    slowest
shape:  (50000,) (50000,) (50000,) (50000,)

upper_bound = 1,000,001, batch_size = 10,000   (upper bound unchanged; size reduced to 1%)
randint() + unique(arr) takes:  0.00072    fastest
arange() + shuffle(arr) takes:  0.02218
random.sample() takes:          0.00503
random.choice() takes:          0.1143     slowest
shape:  (10000,) (10000,) (10000,) (10000,)

upper_bound = 100,001, batch_size = 1,000   (upper bound reduced to 10%; size reduced to 0.1%)
randint() + unique(arr) takes:  0.0001     fastest
arange() + shuffle(arr) takes:  0.00209
random.sample() takes:          0.00049
random.choice() takes:          0.01045    slowest
shape:  (1000,) (1000,) (1000,) (1000,)
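The shortfall of method 1 in the first test (727,290 distinct values instead of 1,000,000) and in the batch_size = 500,000 test (477,494) follows from a standard occupancy calculation: drawing k values uniformly with replacement from N candidates yields on average N * (1 - (1 - 1/N)^k) ≈ N * (1 - e^(-k/N)) distinct values. The small sketch below (the helper names are hypothetical) evaluates this for the tested settings and inverts the approximation to estimate how many draws a given number of distinct values requires.

```python
import math


def expected_unique(n_candidates, n_draws):
    """Expected number of distinct values when n_draws values are drawn
    uniformly, with replacement, from n_candidates equally likely values."""
    return n_candidates * (1.0 - (1.0 - 1.0 / n_candidates) ** n_draws)


def draws_needed(n_candidates, n_wanted):
    """Approximate number of draws whose expected distinct count is n_wanted.

    Solves N * (1 - exp(-k / N)) = n for k, i.e. k = -N * ln(1 - n / N).
    """
    if not 0 <= n_wanted < n_candidates:
        raise ValueError("need 0 <= n_wanted < n_candidates")
    return -n_candidates * math.log1p(-n_wanted / n_candidates)


# Settings from the records above, with method 1's 1.3x oversampling.
for upper_bound, batch_size in [(1_000_001, 1_000_000),
                                (1_000_001, 500_000),
                                (5_000_001, 1_000_000)]:
    draws = int(batch_size * 1.3)
    print(upper_bound, batch_size, round(expected_unique(upper_bound, draws)))

# Draws needed for 1,000,000 distinct values out of 1,000,001 candidates.
print(round(draws_needed(1_000_001, 1_000_000)))
```

The first two expected counts come out at roughly 727,500 and 478,000, close to the 727,290 and 477,494 actually observed, and the third exceeds 1,000,000, matching the full-size result in that test. The last line gives roughly 13.8 million draws, about 14 times batch_size, which is why oversampling is a poor fit when batch_size approaches upper_bound.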
