Random Sampling (with/without replacement) & Random Shuffling

1 Random sample & Random shuffle

1.1 Example

Commonly used sampling functions in Python's random module:

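import random
# random.random() returns a random float in [0, 1)
# random.uniform(num1, num2) returns a random float in [num1, num2] (whether num2 itself can occur depends on floating-point rounding)
# random.randint(int1, int2) takes two integers and returns a random integer N with int1 <= N <= int2
# random.choice(lis) returns one random element of the list lis

# random.sample(lis, ele_num) returns a new list of ele_num elements drawn at random from lis; the original list is unchanged
# random.shuffle(lis) randomly reorders the elements of lis in place: it changes the order of the sequence passed in and does not return a new sequence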
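
As a quick illustration of the functions listed above, here is a minimal sketch. The list contents, the variable names, and the fixed seed are assumptions chosen only for the demonstration.

```python
import random

random.seed(0)  # fixed seed only so that reruns give the same draws

lis = [10, 20, 30, 40, 50]

x = random.random()            # float in [0.0, 1.0)
u = random.uniform(1.5, 2.5)   # float between 1.5 and 2.5
n = random.randint(1, 6)       # integer in 1..6, both ends included
c = random.choice(lis)         # one element of lis

picked = random.sample(lis, 3)   # new list with 3 elements; lis itself is untouched
returned = random.shuffle(lis)   # reorders lis in place and returns None

print(x, u, n, c)
print(picked, lis, returned)
```

Note that random.shuffle() returns None, so writing `lis = random.shuffle(lis)` would discard the list.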

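The two functions random.sample() and random.shuffle() have been discussed in detail in an earlier blog post. Their source code (a Python 2 version of the standard library, as the use of xrange shows) is as follows: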

def sample(self, population, k):
        """Chooses k unique random elements from a population sequence.

        Returns a new list containing elements from the population while
        leaving the original population unchanged.  The resulting list is
        in selection order so that all sub-slices will also be valid random
        samples.  This allows raffle winners (the sample) to be partitioned
        into grand prize and second place winners (the subslices).

        Members of the population need not be hashable or unique.  If the
        population contains repeats, then each occurrence is a possible
        selection in the sample.

        To choose a sample in a range of integers, use xrange as an argument.
        This is especially fast and space efficient for sampling from a
        large population:   sample(xrange(10000000), 60)
        """

        # Sampling without replacement entails tracking either potential
        # selections (the pool) in a list or previous selections in a set.

        # When the number of selections is small compared to the
        # population, then tracking selections is efficient, requiring
        # only a small set and an occasional reselection.  For
        # a larger number of selections, the pool tracking method is
        # preferred since the list takes less space than the
        # set and it doesn't suffer from frequent reselections.

        n = len(population)
        if not 0 <= k <= n:
            raise ValueError("sample larger than population")
        random = self.random
        _int = int
        result = [None] * k
        setsize = 21        # size of a small set minus size of an empty list
        if k > 5:
            setsize += 4 ** _ceil(_log(k * 3, 4)) # table size for big sets
        if n <= setsize or hasattr(population, "keys"):
            # An n-length list is smaller than a k-length set, or this is a
            # mapping type so the other algorithm wouldn't work.
            pool = list(population)
            for i in xrange(k):         # invariant:  non-selected at [0,n-i)
                j = _int(random() * (n-i))
                result[i] = pool[j]
                pool[j] = pool[n-i-1]   # move non-selected item into vacancy
        else:
            try:
                selected = set()
                selected_add = selected.add
                for i in xrange(k):
                    j = _int(random() * n)
                    while j in selected:
                        j = _int(random() * n)
                    selected_add(j)
                    result[i] = population[j]
            except (TypeError, KeyError):   # handle (at least) sets
                if isinstance(population, list):
                    raise
                return self.sample(tuple(population), k)
        return result
def shuffle(self, x, random=None, int=int):
        """x, random=random.random -> shuffle list x in place; return None.

        Optional arg random is a 0-argument function returning a random
        float in [0.0, 1.0); by default, the standard random.random.
        """

        if random is None:
            random = self.random
        for i in reversed(xrange(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = int(random() * (i+1))
            x[i], x[j] = x[j], x[i]
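
The comments in the sample() source above describe two ways of tracking a sample without replacement: a shrinking pool, or a set of already selected indices. The following is a simplified Python 3 sketch of both ideas. The function names sample_by_pool and sample_by_set and the rng parameter are hypothetical, and the sketch leaves out the size heuristic (setsize) that the library uses to choose between the two.

```python
import random

def sample_by_pool(population, k, rng=random):
    """Pool tracking: copy the population and draw from a shrinking pool."""
    pool = list(population)
    n = len(pool)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population or is negative")
    result = []
    for i in range(k):
        j = rng.randrange(n - i)     # index among the n-i items not yet selected
        result.append(pool[j])
        pool[j] = pool[n - i - 1]    # move the last unselected item into the vacancy
    return result

def sample_by_set(population, k, rng=random):
    """Selection tracking: remember chosen indices and redraw on a repeat."""
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population or is negative")
    selected = set()
    result = []
    for _ in range(k):
        j = rng.randrange(n)
        while j in selected:         # occasional reselection when an index repeats
            j = rng.randrange(n)
        selected.add(j)
        result.append(population[j])
    return result

population = list(range(100))
pool_sample = sample_by_pool(population, 10)
set_sample = sample_by_set(population, 10)
print(pool_sample)
print(set_sample)
```

The pool version costs a full copy of the population, while the set version only stores k indices but redraws more often as k approaches n, which matches the trade-off described in the source comments.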

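random.shuffle() is based on the Fisher–Yates shuffle. For a list of N elements with indices 0 to N-1:

Step 1: pick a random element among indices 0 to N-1 and swap it with element N-1
Step 2: pick a random element among indices 0 to N-2 and swap it with element N-2
Step k: pick a random element among indices 0 to N-k and swap it with element N-k
(an element may be swapped with itself)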
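
The steps above can be written out directly. This is a minimal sketch; the function name fisher_yates_shuffle and the rng parameter are hypothetical, and it uses randrange rather than the int(random() * (i+1)) expression from the quoted source.

```python
import random

def fisher_yates_shuffle(items, rng=random):
    """Shuffle items in place; step k swaps position N-k with a random earlier-or-equal position."""
    n = len(items)
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i + 1)                  # random index in 0..i, so j == i is allowed
        items[i], items[j] = items[j], items[i]   # swap (possibly with itself)

deck = list(range(10))
fisher_yates_shuffle(deck)
print(deck)
```

Allowing j == i matters: if self-swaps were excluded, some permutations (including the identity) could never be produced.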

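It is easy to check empirically that every permutation is equally likely after a shuffle. For a 3-element list there are 6 permutations, so any fixed one should appear in about 1/6 of the trials: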

import random

lis = [1, 2, 3]
count = 0
for test in range(10000):
    random.shuffle(lis)
    if lis == [1, 2, 3]:
        count += 1
print(count)  # expected to be close to 10000 / 6, about 1667
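
The check above only counts one permutation. A slightly fuller sketch, assuming a fixed seed and a trial count chosen for illustration, tallies all six permutations so that their frequencies can be compared side by side.

```python
import random
from collections import Counter

random.seed(2024)  # fixed seed only for reproducibility
trials = 60000
counts = Counter()

lis = [1, 2, 3]
for _ in range(trials):
    random.shuffle(lis)
    counts[tuple(lis)] += 1   # record the permutation produced by this shuffle

for perm in sorted(counts):
    # each relative frequency should be close to 1/6
    print(perm, counts[perm] / trials)
```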

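For random.sample(), the source shows that it implements "Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement." Every position in a population of size n is selected with probability k/n. In the test below, a equals [3, 3, 3] with probability 1/20 = 0.05, and b contains 1, 2 and 3 with probability 6/20 = 0.3: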

import random

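lis = [1, 2, 2, 3, 3, 3]
count_a = 0
count_b = 0
for test in range(10000):
    a = random.sample(lis, 3)
    b = random.sample(lis, 3)
    if a == [3, 3, 3]:  # all three 3s are drawn; expected probability 0.05
        count_a += 1
    if sum(b) == 6:  # the sample contains 1, 2 and 3; expected probability 0.3
        count_b += 1
print(count_a, count_b)  # expected to be close to 500 and 3000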
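
The k/n inclusion probability can also be checked directly. The sketch below samples indices instead of values so that repeated values do not blur the count; the seed, n, k and the number of trials are assumptions for illustration.

```python
import random

random.seed(7)  # fixed seed only for reproducibility
n, k, trials = 6, 3, 60000

hits = [0] * n
for _ in range(trials):
    for idx in random.sample(range(n), k):   # k distinct indices out of n
        hits[idx] += 1

freqs = [h / trials for h in hits]
print(freqs)  # every entry should be close to k / n = 0.5
```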

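The author of that blog post summarizes: "An MP3 player has two playback modes, shuffle and random. The difference is that the former shuffles the play order and guarantees that every song is played once, while the latter picks one song at random each time." In fact, this property of shuffle often helps generalization.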

1.2 The role of Random Shuffle

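In machine learning, when the dataset is large, the data is usually not all stored in one place. Drawing a fresh random sample from the whole dataset at every step, with the drawn data put back before the next draw, is therefore often hard to implement. In addition, recent research has studied under which conditions random reshuffling SGD outperforms random sampling SGD. See the following papers: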
How Good is SGD with Random Shuffling?
Random Reshuffling: Simple Analysis with Vast Improvements
Random Shuffling Beats SGD after Finite Epochs
Open Problem: Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?
Random Reshuffling is Not Always Better
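
To make the two data orders concrete, here is a minimal sketch of the index sequences an SGD loop would visit under each scheme. The generator names random_reshuffling_order and with_replacement_order are hypothetical, and no actual gradient step is shown.

```python
import random

def random_reshuffling_order(n, epochs, rng=random):
    """Yield sample indices epoch by epoch; each epoch is a fresh permutation of 0..n-1."""
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)       # every index appears exactly once per epoch
        yield from perm

def with_replacement_order(n, steps, rng=random):
    """Yield one independently drawn index per step; repeats and omissions are possible."""
    for _ in range(steps):
        yield rng.randrange(n)

n, epochs = 5, 2
rr = list(random_reshuffling_order(n, epochs))
wr = list(with_replacement_order(n, n * epochs))
print("random reshuffling:", rr)
print("with replacement:  ", wr)
```

Under random reshuffling each epoch touches every sample once, so the data can be streamed in a shuffled order, whereas independent draws need random access to the whole dataset at every step.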

A detailed analysis is given in the next post. Link:

2 Random sampling with / without replacement

2.1 Concepts

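The two differ as follows:
Random sampling with replacement: a subset of observations is drawn at random, an observation may be selected more than once, and every element of the population has the same chance of being selected on each draw.
Random sampling without replacement: a subset of observations is drawn at random, and once an observation has been selected it cannot be selected again.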
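
In Python's random module the two schemes correspond to random.choices() (with replacement, available since Python 3.6) and random.sample() (without replacement). A minimal sketch, with an example population and seed that are assumptions for illustration:

```python
import random

random.seed(1)  # fixed seed only for reproducibility
population = ["a", "b", "c", "d", "e"]

# With replacement: repeats are allowed, and k may exceed len(population).
with_repl = random.choices(population, k=8)

# Without replacement: each position is used at most once, so k <= len(population).
without_repl = random.sample(population, 3)

# Asking for more items than exist without replacement is an error.
try:
    random.sample(population, 8)
    error = None
except ValueError as exc:
    error = str(exc)

print(with_repl)
print(without_repl)
print(error)
```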

2.2 Variance derivation

Below we give the sample mean and variance obtained when the system uses random sampling with and without replacement, respectively.
Assume the system has $N$ users in total, and consider the finite-sum minimization problem:
$$\min x=\frac{1}{N}\sum_{i=1}^{N}x_i$$
Using each of the two sampling schemes, draw $K$ ($K < N$) users, denoted as
