[Algorithms] 04 Sampling Without Replacement from a Discrete Distribution

Latest recommended article published on 2023-01-28 17:37:07

卢家波

Category: Algorithms. Tags: algorithms, probability theory, sampling, discrete distribution

Article link: https://blog.csdn.net/weixin_43012724/article/details/122024276

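Problem statement

While implementing the CCE part of the SCE-UA algorithm in C++, I ran into the following concrete problem:
Sample without replacement from a population that follows a discrete distribution. For example, take an array of 24 elements {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24} whose weights are {48, 46, 44, 42, 40, 38, 36, 34, 32, 30, 28, 26, 24, 22, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2}. The weights total 600, so the corresponding probabilities are {0.08,0.076667,0.073333,0.07,0.066667,0.063333,0.06,0.056667,0.053333,0.05,0.046667,0.043333,0.04,0.036667,0.033333,0.03,0.026667,0.023333,0.02,0.016667,0.013333,0.01,0.006667,0.003333}, which sum to 1. Draw 10 numbers from this population without replacement, with each number drawn according to its probability under the discrete distribution.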
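The conversion from weights to probabilities in the example is a division by the weight total. A minimal sketch of that step follows; the helper names `normalizeWeights` and `exampleWeights` are made up for illustration and do not appear in the original code.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: convert raw weights into probabilities that sum to 1.
std::vector<double> normalizeWeights(const std::vector<int>& weights)
{
	// Accumulate in double so the division below is not an integer division.
	const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
	assert(total > 0.0);  // at least one weight must be positive

	std::vector<double> probs;
	probs.reserve(weights.size());
	for (int w : weights)
	{
		probs.push_back(w / total);
	}
	return probs;
}

// Hypothetical helper: the example weights 48, 46, ..., 2 from the problem statement.
std::vector<int> exampleWeights()
{
	std::vector<int> weights(24);
	for (int i = 0; i < 24; i++)
	{
		weights[i] = 48 - 2 * i;
	}
	return weights;
}
```

Calling `normalizeWeights(exampleWeights())` gives 48 / 600 = 0.08 for the first element and 2 / 600 for the last. `std::discrete_distribution` performs the same normalization internally, so the raw weights can be passed to it directly.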

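Approach

Sampling the array directly in C++ is awkward, so sample its indices instead: the 24 consecutive integers 0 to 23, where each index carries the same probability as the element it refers to. Project path: E:\Master\study\Cpp\RandomSample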
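As a rough illustration of the index idea, the sketch below draws one index with `std::discrete_distribution` and maps it back to the population element. The function name `drawOne` and its signature are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw one element of `population`, where element i
// is chosen with probability weights[i] / sum(weights).
template <typename T, typename Engine>
T drawOne(const std::vector<T>& population,
          const std::vector<double>& weights,
          Engine& gen)
{
	assert(population.size() == weights.size());

	// The distribution produces indices in [0, population.size()).
	std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
	const std::size_t index = dist(gen);

	return population[index];  // map the sampled index back to the value
}
```

Because only indices are sampled, the population can hold any type; the sampling code never needs to look at the values themselves.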

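Failed attempt

Include the header <algorithm> and use std::sample (available since C++17) to sample from the population. However, std::sample only samples uniformly; it has no parameter for weights, so the discrete distribution dist cannot be passed to it. This approach therefore does not work.
Test code: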

#include <iostream>
#include <vector>
#include <random>
#include <iomanip>
#include <algorithm>
#include <iterator>
#include <map>

int main()
{
	// Seed the random engine from the device entropy source
	auto gen = std::mt19937{ std::random_device{}() };

	std::vector<int> wts(24); // weights

	std::vector<int> in(24);  // population

	std::vector<int> out;  // sampling result

	std::map<int, int> count;  // occurrence count per element

	int sampleSize = 10;  // number of elements per sample

	int sampleTimes = 10000;  // number of samples drawn

	// Assign the weights
	for (int i = 0; i < 24; i++)
	{
		wts.at(i) = 48 - 2 * i;
	}

	// Assign the population
	for (int i = 0; i < 24; i++)
	{
		in.at(i) = i + 1;
	}


	// Build the discrete distribution from the given weights
	// (used here only to print the target probabilities; std::sample cannot take it)
	std::discrete_distribution<size_t> dist{ std::begin(wts), std::end(wts) };

	auto probs = dist.probabilities(); // normalized probabilities

	// Print the probabilities
	std::copy(probs.begin(), probs.end(), std::ostream_iterator<double>
	{ std::cout << std::fixed << std::setprecision(5), " "});

	std::cout << std::endl;

	// Sampling test
	for (int j = 0; j < sampleTimes; j++)
	{
		// Test std::sample
		std::sample(in.begin(), in.end(), std::back_inserter(out), sampleSize, gen);

		for (int i = 0; i < sampleSize; i++)
		{
			//std::cout << out.at(i) << " ";  // print the sampled value

			count[out.at(i)] += 1;  // count the sampled value
		}

		out.clear(); // clear the output vector for the next sample
	}

	double sum = 0.0;  // running total of the frequencies

	// Print the sampling result
	for (int i = 1; i <= 24; i++)
	{
		std::cout << i << ": count " << count[i] << ", frequency " << count[i] / double(sampleTimes * sampleSize) << std::endl;

		sum += count[i] / double(sampleTimes * sampleSize);
	}

	std::cout << "Total frequency: " << sum << std::endl;  // print the total frequency

	std::cin.get();  // keep the console window open
	return 0;
}

Output:

0.08000 0.07667 0.07333 0.07000 0.06667 0.06333 0.06000 0.05667 0.05333 0.05000 0.04667 0.04333 0.04000 0.03667 0.03333 0.03000 0.02667 0.02333 0.02000 0.01667 0.01333 0.01000 0.00667 0.00333
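1: count 4147, frequency 0.04147
2: count 4169, frequency 0.04169
3: count 4147, frequency 0.04147
4: count 4157, frequency 0.04157
5: count 4152, frequency 0.04152
6: count 4161, frequency 0.04161
7: count 4158, frequency 0.04158
8: count 4184, frequency 0.04184
9: count 4046, frequency 0.04046
10: count 4087, frequency 0.04087
11: count 4174, frequency 0.04174
12: count 4074, frequency 0.04074
13: count 4197, frequency 0.04197
14: count 4196, frequency 0.04196
15: count 4221, frequency 0.04221
16: count 4188, frequency 0.04188
17: count 4234, frequency 0.04234
18: count 4250, frequency 0.04250
19: count 4185, frequency 0.04185
20: count 4207, frequency 0.04207
21: count 4210, frequency 0.04210
22: count 4195, frequency 0.04195
23: count 4131, frequency 0.04131
24: count 4130, frequency 0.04130
Total frequency: 1.00000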

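Original approach

Include the header <random> and use std::discrete_distribution to generate random integers that follow the discrete distribution. Each iteration draws a single index from the distribution and inserts it into a std::set. Because a set holds unique, ordered elements, duplicates are discarded, and the loop repeats until the set reaches the required sample size.
Test code: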

#include <iostream>
#include <vector>
#include <random>
#include <iomanip>
#include <iterator>
#include <map>
#include <set>


int main()
{
	// Seed the random engine from the device entropy source
	auto gen = std::mt19937{ std::random_device{}() };

	std::vector<int> wts(24); // weights

	std::vector<int> in(24);  // population

	std::set<int> out;  // sampled indices

	std::map<int, int> count;  // occurrence count per index

	int sampleCount = 0;  // number of draws performed

	int index = 0;  // drawn index

	int sampleSize = 24;  // number of elements to sample

	int sampleTimes = 100000;  // number of draws in the frequency test

	// Assign the weights
	for (int i = 0; i < 24; i++)
	{
		wts.at(i) = 48 - 2 * i;
	}

	// Assign and print the population
	std::cout << "Population of 24 elements:" << std::endl;
	
	// Assignment
	for (int i = 0; i < 24; i++)
	{
		in.at(i) = i + 1;

		std::cout << in.at(i) << " ";
	}

	std::cout << std::endl;

	// Build the discrete distribution from the given weights
	std::discrete_distribution<size_t> dist{ std::begin(wts), std::end(wts) };	

	auto probs = dist.probabilities(); // normalized probabilities

	// Print the probabilities
	std::cout << "Normalized weights of the population elements:" << std::endl;

	std::copy(probs.begin(), probs.end(), std::ostream_iterator<double>
	{ std::cout << std::fixed << std::setprecision(5), " "});
	
	std::cout << std::endl << std::endl;

	//==========Frequency test==========
	for (int j = 0; j < sampleTimes; j++)
	{
		index = static_cast<int>(dist(gen));

		//std::cout << index << " ";  // print the drawn index

		count[index] += 1;  // count the drawn index
	}

	double sum = 0.0;  // running total of the frequencies

	// Print the frequency test result
	std::cout << "Total of " << sampleTimes << " draws; " << "count and frequency of each index:" << std::endl;

	for (int i = 0; i < 24; i++)
	{
		std::cout << i << ": count " << count[i] << ", frequency " << count[i] / double(sampleTimes) << std::endl;

		sum += count[i] / double(sampleTimes);
	}

	std::cout << "Total frequency: " << sum << std::endl << std::endl;  // print the total frequency
	//==========Frequency test==========

	// Draw from the population into the set until the set reaches the sample size
	while (out.size() < static_cast<size_t>(sampleSize))
	{
		index = static_cast<int>(dist(gen));

		out.insert(index);  // duplicates are ignored by the set

		sampleCount += 1;
	}

	// Print the sampling result
	std::cout << "Indices of the " << sampleSize << " sampled elements:" << std::endl;

	for (auto iter : out)
	{
		std::cout << iter << " ";
	}

	std::cout << std::endl;

	// Print the number of draws
	std::cout << "Number of draws: " << sampleCount << std::endl;

	out.clear(); // clear the output set for the next sample

	std::cin.get(); // keep the console window open
	return 0;
}

Output:

Population of 24 elements:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Normalized weights of the population elements:
0.08000 0.07667 0.07333 0.07000 0.06667 0.06333 0.06000 0.05667 0.05333 0.05000 0.04667 0.04333 0.04000 0.03667 0.03333 0.03000 0.02667 0.02333 0.02000 0.01667 0.01333 0.01000 0.00667 0.00333

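Total of 100000 draws; count and frequency of each index:
0: count 7935, frequency 0.07935
1: count 7674, frequency 0.07674
2: count 7355, frequency 0.07355
3: count 6931, frequency 0.06931
4: count 6596, frequency 0.06596
5: count 6386, frequency 0.06386
6: count 6002, frequency 0.06002
7: count 5691, frequency 0.05691
8: count 5375, frequency 0.05375
9: count 4963, frequency 0.04963
10: count 4747, frequency 0.04747
11: count 4314, frequency 0.04314
12: count 3908, frequency 0.03908
13: count 3770, frequency 0.03770
14: count 3370, frequency 0.03370
15: count 3026, frequency 0.03026
16: count 2706, frequency 0.02706
17: count 2319, frequency 0.02319
18: count 1989, frequency 0.01989
19: count 1680, frequency 0.01680
20: count 1285, frequency 0.01285
21: count 981, frequency 0.00981
22: count 668, frequency 0.00668
23: count 329, frequency 0.00329
Total frequency: 1.00000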

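Indices of the 24 sampled elements:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Number of draws: 264

As the output shows, collecting 24 samples took 264 draws from the population, because indices that were already in the set kept being drawn again and discarded. This is clearly very inefficient.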
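The figure of 264 draws comes from a single run, so it varies from run to run. To get a feel for the cost, the draw-and-discard loop can be wrapped in a function that reports how many draws it needed. This is only a sketch; `drawsUntilDistinct` is a name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: repeat weighted draws with replacement, discarding
// duplicates, until `sampleSize` distinct indices have been collected.
// Returns the number of draws that were needed.
template <typename Engine>
std::size_t drawsUntilDistinct(const std::vector<double>& weights,
                               std::size_t sampleSize,
                               Engine& gen)
{
	// Only indices with a positive weight can ever be drawn.
	std::size_t positive = 0;
	for (double w : weights)
	{
		if (w > 0.0) ++positive;
	}
	assert(sampleSize <= positive);  // otherwise the loop would never end

	std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
	std::set<std::size_t> chosen;
	std::size_t draws = 0;
	while (chosen.size() < sampleSize)
	{
		chosen.insert(dist(gen));  // a duplicate leaves the set unchanged
		++draws;
	}
	return draws;
}
```

With the example weights, the rarest index has probability 2 / 600 = 1 / 300, so on average 300 draws are needed just to see it once. When the sample size equals the population size, a draw count in the hundreds is therefore to be expected, and smaller sample sizes should need far fewer draws.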

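Improved approach

After each draw, set the weight of the drawn index to 0 so that it cannot be drawn again; the number of draws then equals the sample size. Note that the final draw is performed separately, outside the loop. This covers the case where the sample size equals the population size: after the last draw every weight would be 0, and assigning an all-zero weight array to the discrete distribution dist is an error. Because the sample rarely equals the whole population, I did not add a check for that case inside the loop; performing the last draw outside the loop avoids the cost of that check.
Test code: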

// Improved approach using the STL
#include <iostream>
#include <vector>
#include <random>
#include <iomanip>
#include <iterator>
#include <map>
#include <set>

int main()
{
	// Seed the random engine from the device entropy source
	auto gen = std::mt19937{ std::random_device{}() };

	std::vector<int> wts(24); // weights

	std::vector<int> in(24);  // population

	std::set<int> out;  // sampled indices

	std::map<int, int> count;  // occurrence count per index

	int sampleCount = 0;  // number of draws performed

	int index = 0;  // drawn index

	int sampleSize = 24;  // number of elements to sample

	int sampleTimes = 100000;  // number of draws in the frequency test

	// Assign the weights
	for (int i = 0; i < 24; i++)
	{
		wts.at(i) = 48 - 2 * i;
	}

	// Assign and print the population
	std::cout << "Population of 24 elements:" << std::endl;

	// Assignment
	for (int i = 0; i < 24; i++)
	{
		in.at(i) = i + 1;

		std::cout << in.at(i) << " ";
	}

	std::cout << std::endl;

	// Build the discrete distribution from the given weights
	std::discrete_distribution<size_t> dist{ wts.begin(), wts.end() };

	auto probs = dist.probabilities(); // normalized probabilities

	// Print the probabilities
	std::cout << "Normalized weights of the population elements:" << std::endl;

	std::copy(probs.begin(), probs.end(), std::ostream_iterator<double>
	{ std::cout << std::fixed << std::setprecision(5), " "});

	std::cout << std::endl << std::endl;

	//==========Frequency test==========
	for (int j = 0; j < sampleTimes; j++)
	{
		index = static_cast<int>(dist(gen));

		//std::cout << index << " ";  // print the drawn index

		count[index] += 1;  // count the drawn index
	}

	double sum = 0.0;  // running total of the frequencies

	// Print the frequency test result
	std::cout << "Total of " << sampleTimes << " draws; " << "count and frequency of each index:" << std::endl;

	for (int i = 0; i < 24; i++)
	{
		std::cout << i << ": count " << count[i] << ", frequency " << count[i] / double(sampleTimes) << std::endl;

		sum += count[i] / double(sampleTimes);
	}

	std::cout << "Total frequency: " << sum << std::endl << std::endl;  // print the total frequency
	//==========Frequency test==========

	// Draw from the population into the set until one element is still missing
	while (out.size() < static_cast<size_t>(sampleSize - 1))
	{
		index = static_cast<int>(dist(gen));  // draw an index

		out.insert(index);  // insert into the set

		sampleCount += 1;   // one more draw

		wts.at(index) = 0; // set the weight of the drawn index to 0
		
		dist.param({ wts.begin(), wts.end() });  // rebuild the distribution from the updated weights

		probs = dist.probabilities(); // normalized probabilities

		// Print the probabilities
		std::cout << "Normalized weights of the population elements:" << std::endl;

		std::copy(probs.begin(), probs.end(), std::ostream_iterator<double>
		{ std::cout << std::fixed << std::setprecision(5), " "});

		std::cout << std::endl << std::endl;
	}
	// Final draw, done outside the loop so that an all-zero weight array is never assigned to dist
	index = static_cast<int>(dist(gen));  // draw an index

	out.insert(index);  // insert into the set

	sampleCount += 1;   // one more draw

	// Print the sampling result
	std::cout << "Indices of the " << sampleSize << " sampled elements:" << std::endl;

	for (auto iter : out)
	{
		std::cout << iter << " ";
	}

	std::cout << std::endl;

	// Print the number of draws
	std::cout << "Number of draws: " << sampleCount << std::endl;

	out.clear(); // clear the output set for the next sample

	std::cin.get(); // keep the console window open
	return 0;
}

Output:

Population of 24 elements:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Normalized weights of the population elements:
0.08000 0.07667 0.07333 0.07000 0.06667 0.06333 0.06000 0.05667 0.05333 0.05000 0.04667 0.04333 0.04000 0.03667 0.03333 0.03000 0.02667 0.02333 0.02000 0.01667 0.01333 0.01000 0.00667 0.00333

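Total of 100000 draws; count and frequency of each index:
0: count 8069, frequency 0.08069
1: count 7804, frequency 0.07804
2: count 7379, frequency 0.07379
3: count 6965, frequency 0.06965
4: count 6705, frequency 0.06705
5: count 6389, frequency 0.06389
6: count 5852, frequency 0.05852
7: count 5679, frequency 0.05679
8: count 5394, frequency 0.05394
9: count 5040, frequency 0.05040
10: count 4600, frequency 0.04600
11: count 4247, frequency 0.04247
12: count 3937, frequency 0.03937
13: count 3618, frequency 0.03618
14: count 3350, frequency 0.03350
15: count 3006, frequency 0.03006
16: count 2706, frequency 0.02706
17: count 2308, frequency 0.02308
18: count 2034, frequency 0.02034
19: count 1658, frequency 0.01658
20: count 1356, frequency 0.01356
21: count 973, frequency 0.00973
22: count 609, frequency 0.00609
23: count 322, frequency 0.00322
Total frequency: 1.00000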

Normalized weights of the population elements:
0.08304 0.07958 0.07612 0.07266 0.06920 0.06574 0.06228 0.05882 0.05536 0.05190 0.04844 0.04498 0.04152 0.00000 0.03460 0.03114 0.02768 0.02422 0.02076 0.01730 0.01384 0.01038 0.00692 0.00346

Normalized weights of the population elements:
0.08664 0.08303 0.07942 0.07581 0.07220 0.06859 0.06498 0.06137 0.05776 0.05415 0.05054 0.04693 0.00000 0.00000 0.03610 0.03249 0.02888 0.02527 0.02166 0.01805 0.01444 0.01083 0.00722 0.00361

Normalized weights of the population elements:
0.09302 0.08915 0.08527 0.08140 0.07752 0.00000 0.06977 0.06589 0.06202 0.05814 0.05426 0.05039 0.00000 0.00000 0.03876 0.03488 0.03101 0.02713 0.02326 0.01938 0.01550 0.01163 0.00775 0.00388

Normalized weights of the population elements:
0.10127 0.09705 0.09283 0.00000 0.08439 0.00000 0.07595 0.07173 0.06751 0.06329 0.05907 0.05485 0.00000 0.00000 0.04219 0.03797 0.03376 0.02954 0.02532 0.02110 0.01688 0.01266 0.00844 0.00422

Normalized weights of the population elements:
0.00000 0.10798 0.10329 0.00000 0.09390 0.00000 0.08451 0.07981 0.07512 0.07042 0.06573 0.06103 0.00000 0.00000 0.04695 0.04225 0.03756 0.03286 0.02817 0.02347 0.01878 0.01408 0.00939 0.00469

Normalized weights of the population elements:
0.00000 0.11735 0.11224 0.00000 0.10204 0.00000 0.09184 0.00000 0.08163 0.07653 0.07143 0.06633 0.00000 0.00000 0.05102 0.04592 0.04082 0.03571 0.03061 0.02551 0.02041 0.01531 0.01020 0.00510

Normalized weights of the population elements:
0.00000 0.12707 0.12155 0.00000 0.11050 0.00000 0.09945 0.00000 0.08840 0.00000 0.07735 0.07182 0.00000 0.00000 0.05525 0.04972 0.04420 0.03867 0.03315 0.02762 0.02210 0.01657 0.01105 0.00552

Normalized weights of the population elements:
0.00000 0.13295 0.12717 0.00000 0.11561 0.00000 0.10405 0.00000 0.09249 0.00000 0.08092 0.07514 0.00000 0.00000 0.05780 0.05202 0.00000 0.04046 0.03468 0.02890 0.02312 0.01734 0.01156 0.00578

Normalized weights of the population elements:
0.00000 0.00000 0.14667 0.00000 0.13333 0.00000 0.12000 0.00000 0.10667 0.00000 0.09333 0.08667 0.00000 0.00000 0.06667 0.06000 0.00000 0.04667 0.04000 0.03333 0.02667 0.02000 0.01333 0.00667

Normalized weights of the population elements:
0.00000 0.00000 0.15278 0.00000 0.13889 0.00000 0.12500 0.00000 0.11111 0.00000 0.09722 0.09028 0.00000 0.00000 0.06944 0.06250 0.00000 0.04861 0.00000 0.03472 0.02778 0.02083 0.01389 0.00694

Normalized weights of the population elements:
0.00000 0.00000 0.17460 0.00000 0.15873 0.00000 0.00000 0.00000 0.12698 0.00000 0.11111 0.10317 0.00000 0.00000 0.07937 0.07143 0.00000 0.05556 0.00000 0.03968 0.03175 0.02381 0.01587 0.00794

Normalized weights of the population elements:
0.00000 0.00000 0.18182 0.00000 0.16529 0.00000 0.00000 0.00000 0.13223 0.00000 0.11570 0.10744 0.00000 0.00000 0.08264 0.07438 0.00000 0.05785 0.00000 0.00000 0.03306 0.02479 0.01653 0.00826

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.20202 0.00000 0.00000 0.00000 0.16162 0.00000 0.14141 0.13131 0.00000 0.00000 0.10101 0.09091 0.00000 0.07071 0.00000 0.00000 0.04040 0.03030 0.02020 0.01010

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.20253 0.00000 0.17722 0.16456 0.00000 0.00000 0.12658 0.11392 0.00000 0.08861 0.00000 0.00000 0.05063 0.03797 0.02532 0.01266

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.24242 0.00000 0.21212 0.00000 0.00000 0.00000 0.15152 0.13636 0.00000 0.10606 0.00000 0.00000 0.06061 0.04545 0.03030 0.01515

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.28000 0.00000 0.00000 0.00000 0.20000 0.18000 0.00000 0.14000 0.00000 0.00000 0.08000 0.06000 0.04000 0.02000

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.34146 0.00000 0.00000 0.00000 0.24390 0.00000 0.00000 0.17073 0.00000 0.00000 0.09756 0.07317 0.04878 0.02439

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.37037 0.00000 0.00000 0.25926 0.00000 0.00000 0.14815 0.11111 0.07407 0.03704

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.50000 0.00000 0.00000 0.00000 0.00000 0.00000 0.20000 0.15000 0.10000 0.05000

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.58824 0.00000 0.00000 0.00000 0.00000 0.00000 0.23529 0.00000 0.11765 0.05882

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.57143 0.00000 0.28571 0.14286

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.66667 0.00000 0.33333 0.00000

Normalized weights of the population elements:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000 0.00000

Indices of the 24 sampled elements:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Number of draws: 24
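For reuse outside a test program, the zero-the-weight idea can be packaged as a function. The sketch below is one possible arrangement rather than the code used above: the name `sampleByZeroingWeights` is invented here, and instead of a separate final draw it rebuilds the distribution before each draw, so an all-zero weight array is never passed to `std::discrete_distribution`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: weighted sampling without replacement by zeroing the
// weight of each drawn index. `weights` is taken by value because it is
// modified locally. Returns the indices in the order they were drawn.
template <typename Engine>
std::vector<std::size_t> sampleByZeroingWeights(std::vector<double> weights,
                                                std::size_t sampleSize,
                                                Engine& gen)
{
	// There must be enough drawable (positive-weight) indices.
	std::size_t positive = 0;
	for (double w : weights)
	{
		if (w > 0.0) ++positive;
	}
	assert(sampleSize <= positive);

	std::vector<std::size_t> result;
	result.reserve(sampleSize);
	while (result.size() < sampleSize)
	{
		// Rebuild the distribution from the remaining weights: O(n) per draw.
		std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
		const std::size_t index = dist(gen);
		result.push_back(index);
		weights[index] = 0.0;  // this index can no longer be drawn
	}
	return result;
}
```

A vector is returned instead of a std::set so that the draw order is preserved; sort the result if ordered indices are needed. Each draw rebuilds the distribution, so the total cost grows with the population size times the sample size.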

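Random perturbation approach

This approach follows the code under "Easier answer" in Aleph0's answer to the question "C++: Sampling from discrete distribution without replacement". That answer draws on the answer "Faster weighted sampling without replacement", and the underlying principle comes from the paper "Weighted random sampling with a reservoir". In short, each element gets a key u^(1/w): a uniform random number u is the base and the reciprocal of the weight w is the exponent, which acts as a random perturbation of the weights. The keys are then sorted in descending order together with their indices, and the first sampleSize indices are taken as the sample.
Test code: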

// Random perturbation approach
#include <iostream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <cmath>
#include <map>
#include <random>
#include <utility>

int main() 
{
	// Seed the random engine from the device entropy source
	auto gen = std::mt19937{ std::random_device{}() };

	std::vector<double> wts(24); // weights

	std::vector<int> in(24);  // population

	std::map<int, int> count;  // occurrence count per index

	std::uniform_real_distribution<double> u(0.0, 1.0);  // uniform distribution

	std::vector<double> vals;  // perturbed keys

	std::vector<std::pair<size_t, double>> valsWithIndices;  // (index, key) pairs

	std::vector<size_t> samples;  // sampled indices

	int sampleCount = 0;  // number of sampling passes

	int sampleSize = 24;  // number of elements to sample

	int sampleTimes = 10000;  // number of draws in the frequency test

	// Assign the weights
	for (int i = 0; i < 24; i++)
	{
		wts.at(i) = 48 - 2 * i;
	}

	// Assign and print the population
	std::cout << "Population of 24 elements:" << std::endl;

	// Assignment
	for (int i = 0; i < 24; i++)
	{
		in.at(i) = i + 1;

		std::cout << in.at(i) << " ";
	}

	std::cout << std::endl << std::endl;


	//==========Frequency test==========
	// Draw a single index per iteration
	for (int i = 0; i < sampleTimes; i++)
	{
		// Key = u^(1/w): a uniform random number as the base and the
		// reciprocal of the weight as the exponent, i.e. a random perturbation of the weights
		for (auto iter : wts) 
		{
			vals.push_back(std::pow(u(gen), 1. / iter));
		}

		// Attach the index to each key, then sort by key in descending order
		for (size_t iter = 0; iter < vals.size(); iter++) 
		{
			valsWithIndices.emplace_back(iter, vals[iter]);
		}

		std::sort(valsWithIndices.begin(), valsWithIndices.end(), [](auto x, auto y) {return x.second > y.second; });
		
		// Sample size is 1 here: take the index with the largest key
		samples.push_back(valsWithIndices[0].first);
		
		// Count the sampled index
		count[static_cast<int>(samples.at(0))] += 1;
		
		vals.clear(); // clear for the next draw

		valsWithIndices.clear(); // clear for the next draw

		samples.clear(); // clear for the next draw

	}
	
	double sum = 0.0;  // running total of the frequencies

	// Print the frequency test result
	std::cout << "Total of " << sampleTimes << " draws; " << "count and frequency of each index:" << std::endl;

	for (int i = 0; i < 24; i++)
	{
		std::cout << i << ": count " << count[i] << ", frequency " << count[i] / double(sampleTimes) << std::endl;

		sum += count[i] / double(sampleTimes);
	}

	std::cout << "Total frequency: " << sum << std::endl << std::endl;  // print the total frequency
	//==========Frequency test==========

	//==========Actual sampling==========
	// Key = u^(1/w): a uniform random number as the base and the
	// reciprocal of the weight as the exponent, i.e. a random perturbation of the weights
	for (auto iter : wts)
	{
		vals.push_back(std::pow(u(gen), 1. / iter));
	}

	// Attach the index to each key, then sort by key in descending order
	for (size_t iter = 0; iter < vals.size(); iter++)
	{
		valsWithIndices.emplace_back(iter, vals[iter]);
	}

	std::sort(valsWithIndices.begin(), valsWithIndices.end(), [](auto x, auto y) {return x.second > y.second; });

	// Take the first sampleSize indices as the sample
	for (auto iter = 0; iter < sampleSize; iter++)
	{
		samples.push_back(valsWithIndices[iter].first);
	}

	sampleCount += 1; // one sampling pass

	// Print the sampling result
	std::cout << "Indices of the " << sampleSize << " sampled elements:" << std::endl;

	for (auto iter : samples)
	{
		std::cout << iter << " ";
	}

	std::cout << std::endl;

	// Print the number of sampling passes
	std::cout << "Number of draws: " << sampleCount << std::endl;

	vals.clear(); // clear for the next sample

	valsWithIndices.clear(); // clear for the next sample

	samples.clear(); // clear for the next sample

	//==========Actual sampling==========

	std::cin.get(); // keep the console window open
	return 0;
}

Output:

Population of 24 elements:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

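Total of 10000 draws; count and frequency of each index:
0: count 822, frequency 0.0822
1: count 752, frequency 0.0752
2: count 714, frequency 0.0714
3: count 641, frequency 0.0641
4: count 697, frequency 0.0697
5: count 646, frequency 0.0646
6: count 585, frequency 0.0585
7: count 536, frequency 0.0536
8: count 545, frequency 0.0545
9: count 499, frequency 0.0499
10: count 480, frequency 0.048
11: count 449, frequency 0.0449
12: count 363, frequency 0.0363
13: count 365, frequency 0.0365
14: count 347, frequency 0.0347
15: count 328, frequency 0.0328
16: count 274, frequency 0.0274
17: count 236, frequency 0.0236
18: count 211, frequency 0.0211
19: count 172, frequency 0.0172
20: count 139, frequency 0.0139
21: count 99, frequency 0.0099
22: count 60, frequency 0.006
23: count 40, frequency 0.004
Total frequency: 1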

Indices of the 24 sampled elements:
10 9 5 7 4 13 2 14 21 8 3 11 6 19 0 12 16 1 17 18 15 20 22 23
Number of draws: 1
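The random-key idea can also be written as a standalone function. The following is a sketch under two assumptions that differ from the listing above: it uses the key log(u) / w, which orders the elements exactly as u^(1/w) does because the logarithm is monotonic, and it uses std::partial_sort so that only the first sampleSize positions are ordered. The name `sampleByRandomKeys` is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: weighted sampling without replacement using one
// random key per element. All weights must be positive.
template <typename Engine>
std::vector<std::size_t> sampleByRandomKeys(const std::vector<double>& weights,
                                            std::size_t sampleSize,
                                            Engine& gen)
{
	assert(sampleSize <= weights.size());
	std::uniform_real_distribution<double> u(0.0, 1.0);

	// key_i = log(u_i) / w_i has the same ordering as u_i^(1/w_i).
	std::vector<std::pair<double, std::size_t>> keyed;
	keyed.reserve(weights.size());
	for (std::size_t i = 0; i < weights.size(); i++)
	{
		assert(weights[i] > 0.0);
		keyed.emplace_back(std::log(u(gen)) / weights[i], i);
	}

	// Move the sampleSize largest keys to the front, in descending order.
	std::partial_sort(keyed.begin(),
	                  keyed.begin() + static_cast<std::ptrdiff_t>(sampleSize),
	                  keyed.end(),
	                  [](const auto& a, const auto& b) { return a.first > b.first; });

	std::vector<std::size_t> result;
	result.reserve(sampleSize);
	for (std::size_t i = 0; i < sampleSize; i++)
	{
		result.push_back(keyed[i].second);
	}
	return result;
}
```

For the example in this article, `sampleByRandomKeys(wts, 10, gen)` would return 10 distinct indices in a single pass, with no need to rebuild a distribution between draws.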