bucket sort algorithm

The latest recommended article was published on 2020-12-26 15:29:40

weixin_33859844

Article tags: Data Structures and Algorithms

Original link: https://my.oschina.net/amince/blog/309335

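Bucket sort assumes that the input is drawn from a uniform distribution. Under that assumption its average running time is Θ(n). Like counting sort, bucket sort is fast because it assumes something about the input. The difference is that counting sort assumes the input consists of integers in a small range. Bucket sort instead assumes the input is generated by a random process that distributes elements uniformly over the interval [0, 1).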

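Bucket sort divides the interval [0, 1) into m equal-sized subintervals, called buckets, and then distributes the n input elements among them. Because the input is uniformly distributed over [0, 1), we do not expect many elements to fall into any one bucket. Next, the elements in each bucket are sorted. Finally, the buckets are visited in order and their elements are taken out one after another, which yields the sorted sequence.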
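The steps above can be sketched in C++ roughly as follows. This is a minimal sketch and not the author's code. It assumes every value lies in [0, 1) and uses m = n buckets, as CLRS does. The names `bucketSort` and `insertionSort` are chosen here for illustration. Each bucket is a `std::vector<double>` that is sorted with insertion sort, and the buckets are then copied back into the input in order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort, used on each bucket (buckets are expected to be small).
static void insertionSort(std::vector<double>& v) {
    for (std::size_t j = 1; j < v.size(); ++j) {
        double key = v[j];
        std::size_t i = j;
        // shift larger elements one slot to the right
        while (i > 0 && v[i - 1] > key) {
            v[i] = v[i - 1];
            --i;
        }
        v[i] = key;
    }
}

// Bucket sort for values assumed to lie in [0, 1).
// Uses m = a.size() buckets; value x goes to bucket floor(m * x).
void bucketSort(std::vector<double>& a) {
    const std::size_t m = a.size();
    if (m < 2) return;                        // nothing to sort
    std::vector<std::vector<double>> buckets(m);

    // 1. scatter: put each element into its bucket
    for (double x : a) {
        assert(x >= 0.0 && x < 1.0);          // precondition of this sketch
        std::size_t idx = static_cast<std::size_t>(x * m);
        if (idx >= m) idx = m - 1;            // guard against rounding at the upper edge
        buckets[idx].push_back(x);
    }

    // 2. sort each bucket, then 3. gather the buckets back in order
    std::size_t k = 0;
    for (auto& b : buckets) {
        insertionSort(b);
        for (double x : b) a[k++] = x;
    }
}
```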

The bucket sort pseudocode is as follows:

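The figure below shows the result of running bucket sort on the input array {0.78,0.17,0.39,0.26,0.72,0.94,0.21,0.12,0.23,0.68}. The buckets resemble the way a hash table chains together entries whose hash values collide.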
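The bucket contents for this example can be reproduced with a small helper. In this sketch the name `fillBuckets` is hypothetical and 10 buckets are assumed. Each bucket is kept sorted as elements arrive by inserting at the position returned by `std::upper_bound`, much like chaining in a hash table. For the array above, the helper yields the following buckets:

- bucket 1: 0.12, 0.17
- bucket 2: 0.21, 0.23, 0.26
- bucket 3: 0.39
- bucket 6: 0.68
- bucket 7: 0.72, 0.78
- bucket 9: 0.94

Buckets 0, 4, 5 and 8 stay empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Distribute values in [0, 1) into m buckets, keeping each bucket sorted.
// This is the state of the bucket table just before the buckets are concatenated.
std::vector<std::vector<double>> fillBuckets(const std::vector<double>& a, std::size_t m) {
    assert(m > 0);
    std::vector<std::vector<double>> buckets(m);
    for (double x : a) {
        assert(x >= 0.0 && x < 1.0);                        // assumed input range
        std::size_t idx = static_cast<std::size_t>(x * m);  // bucket floor(m * x)
        if (idx >= m) idx = m - 1;
        std::vector<double>& b = buckets[idx];
        // insert after any equal keys, so equal values keep their input order
        b.insert(std::upper_bound(b.begin(), b.end(), x), x);
    }
    return buckets;
}
```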

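My code follows the version on Wikipedia. The pseudocode keeps an array of lists and sorts each list afterwards. This version instead works with linked lists throughout: each element is inserted into its bucket's list at the sorted position, and the bucket lists are then merged into one sorted list: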

#include<iostream>
#include<vector>
using namespace std;

const int BUCKET_NUM = 10;	// number of buckets; every input value is assumed to be in [0, 1)
 
// singly linked list node holding one input value
struct ListNode
{
	explicit ListNode(double i=0):mNext(nullptr), mData(i){}	// initializers listed in declaration order
	ListNode* mNext;
	double mData;
};

/*
* insert val into the sorted list that starts at head and return the new head.
* dummy_Node sits in front of head so that all three cases share the same code:
* 1. the bucket is empty (head is NULL): the loop is skipped and the new node becomes the head.
* 2. every existing value is greater than val: the loop is also skipped, the new node is linked
*    in front of head, and dummy_Node.mNext (the new node) is returned as the new head.
* 3. otherwise the loop stops at the first node whose value is greater than val; the new node
*    is linked in after pre, and dummy_Node.mNext is still the original head.
* using <= in the loop places a new value after existing equal values, so the sort is stable.
*/
ListNode* insert(ListNode* head, double val)
{
	ListNode dummy_Node;
	ListNode *newNode = new ListNode(val);
	ListNode *pre = NULL, *curr = NULL;
	dummy_Node.mNext = head;				//need by case 3.
	pre = &dummy_Node;
	curr = head;

	//find the position to insert val's node
	while(NULL!=curr && curr->mData<=val)
	{
		pre = curr;
		curr = curr->mNext;
	}
	newNode->mNext = curr;
	pre->mNext = newNode;

	return dummy_Node.mNext;
}

/*
* merge two sorted lists into one sorted list and return its head.
* 1. head1 is NULL: the loop is skipped and head2 is returned.
* 2. head2 is NULL: the loop is skipped and head1 is returned.
* 3. both lists are non-empty: p_dummy tracks the tail of the merged list and repeatedly
*    takes the smaller front node (head1 on ties); the leftover part of either list is appended.
*/
ListNode* Merge(ListNode *head1, ListNode *head2)
{
	ListNode dummyNode;
	ListNode *p_dummy = &dummyNode;

	while(NULL!=head1 && NULL!=head2)
	{
		if(head1->mData <= head2->mData)
		{
			p_dummy->mNext = head1;
			head1 = head1->mNext;
		}
		else
		{
			p_dummy->mNext = head2;
			head2 = head2->mNext;
		}
		p_dummy = p_dummy->mNext;
	}

	// at most one list still has nodes left; append it to the tail of the merged list
	if(NULL != head1)
		p_dummy->mNext = head1;

	if(NULL != head2)
		p_dummy->mNext = head2;

	return dummyNode.mNext;
}

/*
bucket sort core process, divided into three steps; sorts arr[0..n-1] in place.
every value is expected to lie in [0, 1).
*/
void BucketSort(int n, double arr[])
{
	int i = 0;

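	vector<ListNode*> buckets(BUCKET_NUM, nullptr);	// head of each bucket's sorted list

//step 1: insert every value into its bucket, keeping each bucket sorted
	for(i=0; i<n; ++i)
	{
		// bucket index floor(arr[i] * BUCKET_NUM); adjust this mapping if the input range changes
		int index = static_cast<int>(arr[i] * BUCKET_NUM);
		// clamp out-of-range values into a valid bucket; the merge in step 2 keeps the result sorted
		if(index < 0) index = 0;
		if(index >= BUCKET_NUM) index = BUCKET_NUM - 1;

		ListNode *head = buckets.at(index);
		buckets.at(index) = insert(head, arr[i]);
	}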

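//step 2: merge all sorted buckets into one sorted list
	ListNode *head = buckets.at(0);
	for(i=1; i<BUCKET_NUM; ++i)
	{
		head = Merge(head, buckets.at(i));
	}

//step 3: copy the sorted values back into arr and free the list nodes
	for(i=0; i<n; ++i)
	{
		arr[i] = head->mData;
		ListNode *next = head->mNext;
		delete head;		// node was allocated with new in insert()
		head = next;
	}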
}

void print_array(double array[], int length)
{
	int i = 0;

	for(i=0; i<length; i++)
	{
		cout << array[i] << " ";
	}
	cout << endl << endl;
}

int main(void)
{
	double array_src[] = {0.78,0.17,0.39,0.26,0.72,0.94,0.21,0.12,0.23,0.68};//{0.79, 0.13, 0.16, 0.64, 0.39, 0.20, 0.89, 0.53, 0.73, 0.42};
	int array_length = sizeof(array_src)/sizeof(array_src[0]);
	
	BucketSort(array_length, array_src);

	print_array(array_src, array_length);

	return 0;
}
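Assume all values lie in [0, 1). The bucket index floor(x * BUCKET_NUM) then never decreases as x grows, so every value in bucket i is no larger than any value in bucket i + 1. Under that assumption, step 2 could simply link the sorted buckets end to end instead of merging them. In the code above, each `Merge(head, buckets.at(i))` call ends up walking the entire accumulated list, because all of its values are smaller, before appending bucket i. Concatenation, by contrast, visits each node once. The sketch below is an alternative under that assumption and is not the Wikipedia code. `Node` and `concatenateBuckets` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type for this sketch (mirrors ListNode above).
struct Node {
    double data;
    Node* next;
};

// Link the sorted bucket lists end to end, in bucket order.
// Valid only when every value in bucket i is <= every value in bucket i + 1.
Node* concatenateBuckets(const std::vector<Node*>& buckets) {
    Node dummy{0.0, nullptr};
    Node* tail = &dummy;                       // last node of the result so far
    for (Node* head : buckets) {
        if (head == nullptr) continue;         // skip empty buckets
        // debug check of the ordering assumption between neighbouring buckets
        assert(tail == &dummy || tail->data <= head->data);
        tail->next = head;
        while (tail->next != nullptr)          // walk to the end of this bucket
            tail = tail->next;
    }
    return dummy.next;
}
```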

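Here is a brief analysis of the running time. According to the pseudocode, line 8 runs insertion sort on each of the n buckets; every other step takes Θ(n) time in total. Insertion sort on a bucket holding n_i elements takes O(n_i²) time. The following property follows easily from this.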
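For the uniform case, the expected-time argument in CLRS goes as follows. It uses m = n buckets, and n_i is the number of elements placed in bucket i. Each n_i is binomially distributed with parameters n and p = 1/n, so E[n_i] = 1 and Var[n_i] = 1 - 1/n:

```latex
T(n) = \Theta(n) + \sum_{i=0}^{n-1} O\!\left(n_i^2\right)

E[n_i^2] = \operatorname{Var}[n_i] + E[n_i]^2 = \left(1 - \tfrac{1}{n}\right) + 1 = 2 - \tfrac{1}{n}

E[T(n)] = \Theta(n) + \sum_{i=0}^{n-1} O\!\left(E[n_i^2]\right) = \Theta(n) + n \cdot O\!\left(2 - \tfrac{1}{n}\right) = \Theta(n)
```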

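Even if the input is not uniformly distributed, bucket sort still runs in linear time, as long as the sum of the squares of the bucket sizes is linear in the total number of elements n.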
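The property above can be checked for a concrete input by counting bucket sizes. The helper below is a hypothetical illustration; `sumOfSquaredBucketSizes` is not from the original. It assumes values in [0, 1), m buckets, and the bucket index floor(m * x).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of squared bucket sizes, sum_i n_i^2, for values in [0, 1) spread over m buckets.
// By the property above, bucket sort stays linear when this sum is O(n).
std::size_t sumOfSquaredBucketSizes(const std::vector<double>& a, std::size_t m) {
    assert(m > 0);
    std::vector<std::size_t> count(m, 0);      // n_i for each bucket
    for (double x : a) {
        assert(x >= 0.0 && x < 1.0);           // assumed input range
        std::size_t idx = static_cast<std::size_t>(x * m);
        if (idx >= m) idx = m - 1;
        ++count[idx];
    }
    std::size_t sum = 0;
    for (std::size_t c : count) sum += c * c;
    return sum;
}
```

With ten evenly spread values and m = 10, the sum is 10, equal to n. If all ten values fall into bucket 0, it becomes 100 = n², the quadratic case in which insertion sort dominates.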

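A remaining question is when n can be split into m summands whose squares add up to something linear in n. That is a matter for mathematical proof, and interested readers may want to look into it.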
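Two elementary bounds may serve as a starting point, though not a full answer. Here n_i is the size of bucket i and the n_i sum to n. The first bound shows that the sum of squares is linear whenever every bucket holds at most a constant number of elements. The second bound (Cauchy-Schwarz) shows that it cannot be linear unless the number of buckets m grows like Ω(n):

```latex
\sum_{i=0}^{m-1} n_i^2 \le \Big(\max_i n_i\Big) \sum_{i=0}^{m-1} n_i = \Big(\max_i n_i\Big)\, n

\sum_{i=0}^{m-1} n_i^2 \ge \frac{1}{m} \Big(\sum_{i=0}^{m-1} n_i\Big)^2 = \frac{n^2}{m}
```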

References:

Introduction to Algorithms, 3rd edition (English edition)

http://zh.wikipedia.org/wiki/%E6%A1%B6%E6%8E%92%E5%BA%8F
