How to control your randomizer in R

cumian8165

Original article: https://www.freecodecamp.org/news/how-to-control-your-randomizer-in-r-852ae7d8f80c/

by Michelle Jones

What happens when you need a particular type of randomization?

Overview of random number generation in R

R has at least 20 random number generator functions. Each uses a specific probability distribution to create the numbers. All require you to specify the number of random numbers you want (the above image shows 200). All are available in base R — no packages required.

Common random number generator distributions are:

normal (rnorm): default mean of 0 and standard deviation of 1
binomial (rbinom): no defaults; specify the number of trials (size) and the probability of success on each trial (prob)
uniform (runif): default minimum value of 0 and maximum value of 1

Of the three above, only the binomial random number generator creates integers.
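
A minimal sketch of the three generators listed above, called with their defaults (or, for rbinom(), with the two arguments it requires). The seed and the choice of five draws each are arbitrary:

```r
set.seed(42)  # arbitrary seed so the draws can be repeated

normal_draws   <- rnorm(5)                          # mean = 0, sd = 1 by default
binomial_draws <- rbinom(5, size = 10, prob = 0.5)  # size and prob have no defaults
uniform_draws  <- runif(5)                          # min = 0, max = 1 by default

print(normal_draws)    # arbitrary real numbers
print(binomial_draws)  # whole numbers: successes out of 10 trials
print(uniform_draws)   # real numbers between 0 and 1
```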

Why create random numbers?

Problems involving random numbers are very common — there are around 50,000 questions relating to random numbers on Stack Exchange.

But why use them?

Random numbers have many practical applications. They are used in Monte Carlo simulations. They are used in cryptography. They have been used to produce CAPTCHA content. They are used in slot machines. They have also been used for more mundane tasks such as creating a random sort order for an array of ordered data.
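
For the last use, a random sort order, sample() called with just a vector returns a random permutation of it. A minimal sketch, where ordered_data is a hypothetical stand-in for real data and the seed is arbitrary:

```r
ordered_data <- 1:10   # hypothetical ordered data

set.seed(1)
shuffled <- sample(ordered_data)  # random permutation: every element appears exactly once
print(shuffled)
```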

Problems with random numbers

Common questions include “are my random numbers actually random?” and “how can I generate non-repeated random numbers?”

Note: the latter reduces randomness, because the pool of possible numbers shrinks by one each time a number is drawn, so later draws depend on earlier ones. The method is appropriate in situations such as lotteries or bingo, where each ticket or ball can only be drawn once.

This brings in another problem! Numbers drawn by sampling without replacement in this way must be integers. No one has ticket 5.6932 or bingo ball 0.18967.
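
A sketch of a bingo-style draw, assuming balls numbered 1 to 75 (the range is only an example). Drawing from a vector of integers without replacement gives whole numbers and no repeats:

```r
balls <- 1:75   # assumed ball numbers

set.seed(3)     # arbitrary seed
drawn <- sample(balls, 5, replace = FALSE)  # each ball can come out only once
print(drawn)
```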

A practical example of random number problems

Let’s take the example that I have 20 female students of the same age. I have five teaching methods that I want to trial. I only want to trial one teaching method for each student. Easy math: I need four students in each group.

But how do I do this so that each student is randomly assigned?

And how do I make sure that I only have integers produced?

And how do I do all this while using randomly generated numbers without replacement? I don’t want, for example, six students in one group and two students in another.

First, I need to create some dummy data in R. Let’s create that list of mock female students.

```r
FemaleStudents <- data.frame(Names=c("Alice", "Betty", "Carol", "Denise", "Erica",
                                     "Frances", "Gina", "Helen", "Iris", "Julie",
                                     "Katherine", "Lisa", "Michelle", "Ngaire", "Olivia",
                                     "Penelope", "Rachel", "Sarah", "Trudy", "Uma"))
```

Now we have a one-dimensional dataset of our 20 students.

We know that the runif() function doesn’t create integers. So why not use it anyway and round the random numbers, so that we only get integers? We can wrap the random number generator in a rounding function.

Question 1: why am I using the random uniform distribution and not another one, such as the random normal distribution?

There are five types of rounding functions in R. We will use round().
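
For reference, the five base R rounding functions are floor(), ceiling(), trunc(), round() and signif(). A short comparison on an arbitrary vector, with values chosen to avoid exact halves:

```r
x <- c(-1.7, 2.3, 2.6)

floor(x)              # -2  2  2   round down
ceiling(x)            # -1  3  3   round up
trunc(x)              # -1  2  2   drop the fractional part (toward zero)
round(x)              # -2  2  3   nearest whole number
signif(1234.5678, 2)  # 1200       keep two significant digits
```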

So that we get the same results, I will set a seed for the random number generation. Each time we generate random numbers, we will use the same seed. I’ve decided on 5 as the seed. If you do not set a seed, or if you set a seed other than 5, your results will be different than mine.
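
A minimal sketch of what the seed does: resetting the same seed before a call reproduces the same numbers, while continuing without a reset moves on to new values. The variable names are illustrative:

```r
set.seed(5)
first_run <- runif(3, 1, 5)

set.seed(5)                   # reset to the same seed...
second_run <- runif(3, 1, 5)  # ...and the same three numbers come back

third_run <- runif(3, 1, 5)   # no reset, so the generator continues its stream

identical(first_run, second_run)  # TRUE
identical(first_run, third_run)   # FALSE
```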

```r
set.seed(5)
FemaleStudents$Group <- round(runif(20, 1, 5))
```

Well, that seemed to work. We have each student allocated to a group numbered between 1 and 5.

Let’s double check our allocation.

```r
table(FemaleStudents$Group)
```

```
1 2 3 4 5 
2 6 5 4 3 
```

Darn. Only one of the five groups has the correct number of students (Group 4). Why did this happen?

We can check the numbers that runif() actually produces by leaving out the rounding and letting the output print to the console. The output prints here because I have not assigned the result to an object (for example, to a data.frame variable).

```r
set.seed(5)
runif(20, 1, 5)
```

```
 [1] 1.800858 3.740874 4.667503 2.137598 1.418601 3.804230 3.111840 4.231741 4.826001 1.441812 2.093140 2.962053 2.273616 3.236691 2.050373
[16] 1.807501 2.550103 4.551479 3.219690 4.368718
```

As we can see, the rounding caused our problem. But if we hadn’t rounded, each student would have been assigned to a different group.
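
Rounding causes a second, subtler problem. With runif(n, 1, 5), round() returns 1 only for values in [1, 1.5) and 5 only for values in [4.5, 5], intervals half as wide as those that round to 2, 3 or 4. So groups 1 and 5 each have a 12.5% chance per student, against 25% for each of the other groups. A quick simulation sketch (the 100,000 draws and the seed are arbitrary):

```r
set.seed(5)
many <- round(runif(100000, 1, 5))

observed <- prop.table(table(many))          # share of draws landing in each group
expected <- c(0.125, 0.25, 0.25, 0.25, 0.125)

print(round(observed, 3))
```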

What do we do?

sample()

sample() is now one of my favourite functions in R. Let’s see how it works.

Randomly allocate to equally sized groups (counts matter)

How can we use it to randomly assign our 20 students to five equally sized groups?

What happens if we use sample() the obvious way, drawing a group number from 1 to 5 for each student with replacement?

```r
set.seed(5)
FemaleStudents$Sample <- sample(1:5, nrow(FemaleStudents), replace=TRUE)
```

Question 2: what output did you get when you used table(FemaleStudents$Sample)?

We can fix this problem by creating a vector of group numbers and then sampling from this vector without replacement. The rep() function creates a vector of repeated values. With the each argument, it repeats each number in the series in turn, which is what the allocation code does: number 1 is repeated four times, then number 2 is repeated four times, and so forth. You can also use it to repeat a whole sequence of numbers, with this code instead: rep(1:5,4)
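
A small illustration of the two forms of rep() described above, using 1:3 instead of 1:5 to keep the output short:

```r
each_form  <- rep(1:3, each = 2)  # each value repeated in turn
times_form <- rep(1:3, 2)         # the whole sequence repeated

print(each_form)   # 1 1 2 2 3 3
print(times_form)  # 1 2 3 1 2 3
```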

```r
OurGroups <- rep(1:5, each=4)
set.seed(5)
FemaleStudents$Sample <- sample(OurGroups, nrow(FemaleStudents), replace=FALSE)
```

We used our vector of numbers (OurGroups) to allocate our students to groups. We used sampling without replacement (replace=FALSE) from OurGroups because we need to use each value in that vector. We need to remove each value as we use it.

And we get the result we wanted!

```r
table(FemaleStudents$Sample)
```

```
1 2 3 4 5 
4 4 4 4 4 
```
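
To check that the equal group sizes come from the method rather than from a lucky seed, one can repeat the allocation under many seeds. A minimal sketch; the 100 seeds are an arbitrary choice:

```r
OurGroups <- rep(1:5, each = 4)

# For each seed, allocate all 20 group labels and test whether every group has 4 members.
balanced <- vapply(1:100, function(seed) {
  set.seed(seed)
  allocation <- sample(OurGroups, length(OurGroups), replace = FALSE)
  all(table(allocation) == 4)
}, logical(1))

all(balanced)  # TRUE
```

Because the sample size equals the length of OurGroups, sampling without replacement simply shuffles the vector, so every group always keeps exactly four members.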

Question 3: why did I still set a seed?

Another advantage of sample() is that it works with vectors of any type, not just numbers. We can repeat the allocation using a vector of strings. This can be useful if you don’t want to keep referring back to what “1” means.

```r
OurNamedGroups <- rep(c("Up", "Down", "Charmed", "Strange", "Top"), each=4)
set.seed(5)
FemaleStudents$Sample2 <- sample(OurNamedGroups, nrow(FemaleStudents), replace=FALSE)
table(FemaleStudents$Sample2)
```

```
Charmed    Down Strange     Top      Up 
      4       4       4       4       4 
```

Because we used the same seed, we can see that the same student allocation was performed, irrespective of whether we used numeric or character data for the assignment.

```r
table(FemaleStudents$Sample, FemaleStudents$Sample2)
```

```
    Charmed Down Strange Top Up
  1       0    0       0   0  4
  2       0    4       0   0  0
  3       4    0       0   0  0
  4       0    0       4   0  0
  5       0    0       0   4  0
```

Randomly allocate when group size is not restricted

Sometimes we want to randomly allocate to groups, but we don’t have a vector of groups. We are still only allocating each unit (person, sheep, block of cheese) to a single group, and we use completely random allocation.

Let’s say that our school has a new, special library room. It’s been constructed to be soundproof to give students a better studying environment. The chief librarian would like to know about the experiences of students in that room. The only problem is that the room is limited in size. The chief librarian thinks that around four students is a large enough group to provide the initial feedback.

Again, we can use sample() to pick our student groups. In this case, we have “students who will test the room” and “students who won’t test the room”. I’m going to call them “Test” and “Not test”. I chose these labels because they are short and easy to tell apart.

Earlier we sampled without replacement, so we didn’t need to specify probabilities of assignment to groups: we simply drew each assignment from a vector. Now we are going to sample with replacement. “With replacement” refers to the group labels, not to the students: each label goes back into the pool after it is drawn, while each student is still assigned only once.

We need to sample with replacement as we only have two groups (“Test”, “Not test”) and 20 students. If we tried to sample without replacement, our code would error.
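
A sketch confirming that claim: asking sample() for 20 draws from two labels without replacement raises an error, caught here with tryCatch() so the message can be printed instead of stopping the script:

```r
err <- tryCatch(
  sample(c("Test", "Not test"), 20, replace = FALSE),
  error = function(e) e   # return the error object instead of stopping
)
print(conditionMessage(err))
```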

Our code is very similar:

```r
set.seed(5)
FemaleStudents$Library <- sample(c("Test", "Not test"), nrow(FemaleStudents), replace=TRUE, prob=c(4/20,16/20))
table(FemaleStudents$Library)
```

```
Not test     Test 
      15        5 
```

As you can see, we allocated five students to test the room, not four. This type of result is expected when dealing with small samples. However, our allocation of students is completely random. Each student had exactly the same probability of being assigned to test the room. Whether previous students were testers or not had no impact on the allocation of the next student.
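
To put “expected” in numbers: with 20 independent draws and a 4/20 chance each, the number of testers follows a binomial distribution, so dbinom() gives the probability of getting exactly four. The second half of this sketch is an alternative, not part of the original walkthrough: if exactly four testers were required, the fixed-count approach from the previous section would apply. The seed and variable names are illustrative:

```r
# Probability that exactly 4 of 20 students are labelled "Test" when p = 4/20
p_exactly_four <- dbinom(4, size = 20, prob = 4/20)
print(p_exactly_four)  # about 0.218

# Exact-count alternative: shuffle a fixed vector of 4 "Test" and 16 "Not test" labels
tester_labels <- rep(c("Test", "Not test"), times = c(4, 16))
set.seed(5)
exact_allocation <- sample(tester_labels)
print(table(exact_allocation))
```

So even with the right probabilities, landing on exactly four testers happens only about one time in five.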

Let’s walk through some of that code.

I’ve constructed a new variable in the data.frame to collect the allocation (Library).

Instead of using numbers as group names, I’ve used the strings I mentioned earlier. As with any set of several values, the group names must be combined into a vector with c(), as in c("Test", "Not test"), with each name separated by a comma.

Replacement has been set to TRUE.

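Because the two groups should not be equally likely, the probability of assignment to each group must be provided. This is the prob=c(4/20,16/20) part of the sample() function. The probabilities are matched to the group names in order, so “Test” gets 4/20 and “Not test” gets 16/20. Again, note how c() is used to contain the probabilities. Also of interest is that the probabilities can be expressed as fractions, rather than decimals.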
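
sample() treats prob as weights and rescales them to sum to 1, so whole-number weights in the same ratio should give the same draws as the fractions. A minimal sketch checking this with the same seed; groups, with_fractions and with_weights are illustrative names:

```r
groups <- c("Test", "Not test")

set.seed(5)
with_fractions <- sample(groups, 20, replace = TRUE, prob = c(4/20, 16/20))

set.seed(5)
with_weights <- sample(groups, 20, replace = TRUE, prob = c(4, 16))  # rescaled internally

identical(with_fractions, with_weights)  # TRUE
```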

Hooray for sample()

I use sample() all the time for the work I am doing. The ability to use strings, as well as to restrict numeric output to integers (and define the desired integer range), provides me with more control than trying to use one of the random number functions.

Answers

Answer 1: I used a random uniform distribution because I wanted each value to be equally probable.

Answer 2: I got this output:

```
1 2 3 4 5 
2 7 4 2 5 
```

Answer 3: If we don’t set a seed value, or we use a different one, the allocation of specific students will be different. For example, when the seed is 5, Alice is allocated to group 2. If the seed is 7, Alice is allocated to group 5. Replication is important when code needs to be re-run (for example, in testing).
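
One caveat for replication: R 3.6.0 changed the default algorithm used by sample() from "Rounding" to "Rejection", so the same seed can give different allocations in older and newer versions of R, and the exact group numbers shown in this article may not match yours. A sketch of how to check the generator in use and, if needed, switch back to the old sampler; old_style and current_style are illustrative names:

```r
OurGroups <- rep(1:5, each = 4)

# Show the generators in use; the third entry is the algorithm behind sample().
print(RNGkind())

# Switch sample() back to the pre-3.6.0 "Rounding" algorithm. R warns that it is
# slightly non-uniform, so the warning is suppressed here.
suppressWarnings(RNGkind(sample.kind = "Rounding"))
set.seed(5)
old_style <- sample(OurGroups, length(OurGroups), replace = FALSE)

# Return to the current default before doing anything else.
RNGkind(sample.kind = "Rejection")
set.seed(5)
current_style <- sample(OurGroups, length(OurGroups), replace = FALSE)

print(old_style)
print(current_style)
```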
