A Sample of Brilliance (More Programming Pearls, Chapter 13)

娃哈哈纯净李

Category: Programming Pearls. Tags: random sampling

Article link: https://blog.csdn.net/u010585135/article/details/40989059

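The problem with naive sampling: to pick M distinct numbers out of N, the obvious method keeps drawing random numbers and throws away any that are already in the set. This breaks down when M is close to N. Take M == N: once N-1 numbers have been chosen, each draw for the last one succeeds with probability only 1/N, so most draws are duplicates and a lot of time is wasted. Floyd proposed the following algorithm to avoid this.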
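
To make the cost of the naive method concrete, here is a minimal sketch that counts how many draws it needs. The function name `naive_sample_draws` is made up for this illustration, and it uses `<random>` rather than `rand()`, which is my substitution and not something the original code does.

```cpp
#include <cassert>
#include <random>
#include <set>

// Hypothetical helper: picks m distinct values from 0..n-1 by rejection
// and returns the total number of random draws that were needed.
long naive_sample_draws(int n, int m, std::mt19937 &rng, std::set<int> &out)
{
    assert(n > 0 && 0 <= m && m <= n);
    std::uniform_int_distribution<int> dist(0, n - 1);
    long draws = 0;
    out.clear();
    while (static_cast<int>(out.size()) < m) {
        ++draws;
        out.insert(dist(rng)); // a duplicate leaves the set unchanged
    }
    return draws;
}
```

With m == n this is the coupon collector problem, so the expected number of draws grows like n ln n rather than n.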

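Let j run from N-M+1 up to N. At each step, pick a random number among the first j; if it is already in the set, put the j-th number into the set instead. Every step adds exactly one new element, so the loop runs exactly M times (at most O(N) iterations) and is guaranteed to produce M distinct random numbers with no retries.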
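
One quick sanity check on the fallback rule, using 1-based numbering as in the description above: the last element N should end up in the sample with probability M/N. The notation is mine, not the book's: t is the number drawn in the final step (uniform on 1..N) and S' is the set just before that step, which holds M-1 values, all at most N-1.

```latex
\[
\Pr[N \in S]
  = \Pr[t = N] + \Pr[t \in S']
  = \frac{1}{N} + \frac{M-1}{N}
  = \frac{M}{N}
\]
```

This only checks one element; the full argument that every M-subset is equally likely goes by induction on j.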

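Note: A is filled with 0 to N-1 here, but that is not a requirement. A can hold arbitrary values in any order, as long as they are distinct so that the set test works.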

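#include <cstdlib>
#include <ctime>
#include <iostream>
#include <set>
using namespace std;

int main()
{
	set<int> S;            // the sample chosen so far
	const int N=100;
	int A[N];              // the population; any N distinct values would do
	for(int i=0;i<N;i++)
		A[i]=i;
	const int M=10;
	srand(static_cast<unsigned>(time(NULL)));
	// Floyd's algorithm: step i draws from the first i elements of A
	for(int i=N-M+1;i<=N;i++)
	{
		int temp=A[rand()%i];
		if(S.find(temp)==S.end())
			S.insert(temp);       // new value: take it
		else
			S.insert(A[i-1]);     // duplicate: take the i-th element, which cannot be in S yet
	}
	// std::set prints the sample in sorted order
	for(auto iter=S.begin();iter!=S.end();iter++)
		cout<<*iter<<endl;

	return 0;
}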
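
The loop above can also be wrapped as a reusable function, which makes it easier to test. This is a sketch under my own assumptions: the name `floyd_sample` is illustrative, the population is taken to be the integers 0..n-1, and `<random>` replaces `rand()%i`.

```cpp
#include <cassert>
#include <random>
#include <set>

// Floyd's algorithm over the integers 0..n-1 (illustrative function name).
std::set<int> floyd_sample(int n, int m, std::mt19937 &rng)
{
    assert(0 <= m && m <= n);
    std::set<int> s;
    for (int i = n - m + 1; i <= n; ++i) {
        // t is uniform on 0..i-1
        int t = std::uniform_int_distribution<int>(0, i - 1)(rng);
        // insert() reports whether t was new; if it was not, take i-1,
        // which cannot be in s because earlier steps only added values < i-1
        if (!s.insert(t).second)
            s.insert(i - 1);
    }
    return s;
}
```

A call such as `floyd_sample(100, 10, rng)` returns exactly 10 distinct values after exactly 10 loop iterations, however close m is to n.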

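When the M numbers are needed in random order, one option is to generate them first and then shuffle, using the method from Chapter 13 of Programming Pearls (2nd edition). The code below fills A with all N values and shuffles only the first M positions. This costs O(N) space to hold the array A, and O(N) time to initialize A and produce the M random numbers.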

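#include <cstdlib>
#include <ctime>
#include <iostream>
#include <utility>
using namespace std;

int main()
{
	const int N=10;
	const int M=5;
	int A[N];
	for(int i=0;i<N;i++)
		A[i]=i;
	srand(static_cast<unsigned>(time(NULL)));
	// Partial shuffle: swap A[i] with a random element of A[i..N-1]
	for(int i=0;i<M;i++)
		swap(A[i],A[i+rand()%(N-i)]);
	// The first M entries are the sample, in random order
	for(int i=0;i<M;i++)
		cout<<A[i]<<endl;

	return 0;
}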

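When N is large relative to M, Floyd's algorithm is more efficient in both space and time. Its space complexity is O(M). For the ordered version below, the worst-case time is O(M^2), which happens when almost every draw hits a value already in the sample, so almost every insertion needs a search; the best-case time is O(M), when no draw is a duplicate.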

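Here a hash set stores the values chosen so far, and a linked list keeps them in output order. A new random value goes to the front of the list. If the random value is already in the list, the value for the current i is inserted into the list right after it. A single pass then yields M numbers in random order. Note, however, that each such insertion first has to find the random value in the list, which is a linear search and wastes time. A better solution follows later.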

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <iterator>
#include <list>
#include <unordered_set>
using namespace std;

int main()
{
	unordered_set<int> S;   // fast membership test
	list<int> L;            // the sample, in output order
	const int N=10;
	const int M=5;
	srand(static_cast<unsigned>(time(NULL)));
	for(int i=N-M+1;i<=N;i++)
	{
		int temp=rand()%i;
		if(S.find(temp)==S.end())
		{
			// new value: put it at the front of the list
			S.insert(temp);
			L.push_front(temp);
		}
		else
		{
			// duplicate: insert i-1 right after temp (linear search in the list)
			L.insert(next(find(L.begin(),L.end(),temp)),i-1);
			S.insert(i-1);
		}
	}
	for(auto iter=L.begin();iter!=L.end();iter++)
		cout<<*iter<<endl;

	return 0;
}

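To cut the time spent inserting into the list, I store each value together with a pointer to its list node in an unordered_map, so the node can be found directly. To avoid collisions, the hash table can be sized to the first prime greater than N, which guarantees that the keys never collide. The space complexity is then O(N), but the time complexity drops to O(M). If that is too much space, a smaller table with ordinary collision resolution saves memory, and the time complexity still stays reasonable. (The code below does not set the table size itself; it relies on std::unordered_map, whose lookups are O(1) on average.)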

This also solves Exercise 13.6.
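
On the table-size remark above, here is a small sketch of what "first prime greater than N" buys. It assumes the hash is simply `key % size` and that the keys are 0..N-1; both are my assumptions for illustration, and all three function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Trial-division primality test; adequate for small table sizes.
bool is_prime(int x)
{
    if (x < 2) return false;
    for (int d = 2; static_cast<long long>(d) * d <= x; ++d)
        if (x % d == 0) return false;
    return true;
}

// Smallest prime strictly greater than n.
int next_prime_above(int n)
{
    int p = n + 1;
    while (!is_prime(p)) ++p;
    return p;
}

// Hypothetical check: do the keys 0..n-1 land in distinct slots of a
// table with `size` slots when the hash is key % size?
bool slots_are_distinct(int n, int size)
{
    assert(size > 0);
    std::vector<bool> used(size, false);
    for (int key = 0; key < n; ++key) {
        int slot = key % size;
        if (used[slot]) return false; // two keys share a slot
        used[slot] = true;
    }
    return true;
}
```

Under these assumptions any table size of at least N already keeps the keys 0..N-1 apart; a prime size mainly helps when the keys are not such a dense range.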

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <unordered_map>
using namespace std;

struct ListNode
{
	int val;
	ListNode *next;
	ListNode(int _val=0):val(_val),next(NULL){}
};

// Inserts value right after node and returns the new node
ListNode* insertAfter(int value,ListNode *node)
{
	ListNode *temp=node->next;
	node->next=new ListNode(value);
	node->next->next=temp;
	return node->next;
}

// Inserts value in front of head (head may be NULL) and returns the new head
ListNode* insertHead(int value,ListNode *head)
{
	ListNode *node=new ListNode(value);
	node->next=head;
	return node;
}

int main()
{
	unordered_map<int,ListNode*> S;   // value -> its node in the list
	ListNode *head=NULL;
	const int N=10;
	int A[N];
	for(int i=0;i<N;i++)
		A[i]=i;
	const int M=5;
	srand(static_cast<unsigned>(time(NULL)));
	for(int i=N-M+1;i<=N;i++)
	{
		int temp=A[rand()%i];
		if(S.find(temp)==S.end())
		{
			// new value: becomes the new head of the list
			S[temp]=insertHead(temp,head);
			head=S[temp];
		}
		else
			// duplicate: insert the i-th value right after temp, found via the map
			S[A[i-1]]=insertAfter(A[i-1],S[temp]);
	}
	ListNode *curr=head;
	while(curr!=NULL)
	{
		cout<<curr->val<<endl;
		curr=curr->next;
	}
	// free the list
	while(head!=NULL)
	{
		ListNode *next=head->next;
		delete head;
		head=next;
	}

	return 0;
}
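
The same idea can be written without a hand-rolled list, since std::list iterators stay valid when other elements are inserted. The following is a sketch rather than the original author's code: the name `floyd_permutation` is illustrative, the values are assumed to be 0..n-1, and `<random>` replaces `rand()`.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <random>
#include <unordered_map>
#include <vector>

// Returns m distinct values from 0..n-1 in random order (illustrative name).
std::vector<int> floyd_permutation(int n, int m, std::mt19937 &rng)
{
    assert(0 <= m && m <= n);
    std::list<int> order;                                    // output order
    std::unordered_map<int, std::list<int>::iterator> where; // value -> node
    for (int i = n - m + 1; i <= n; ++i) {
        int t = std::uniform_int_distribution<int>(0, i - 1)(rng);
        auto hit = where.find(t);
        if (hit == where.end()) {
            // new value: goes to the front
            order.push_front(t);
            where[t] = order.begin();
        } else {
            // duplicate: place i-1 right after t; list::insert puts the
            // new node before the position it is given
            auto pos = order.insert(std::next(hit->second), i - 1);
            where[i - 1] = pos; // done last, since this may rehash the map
        }
    }
    return std::vector<int>(order.begin(), order.end());
}
```

Each step does one hash lookup and one constant-time list insertion, so the whole run takes O(m) time on average and O(m) space.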