桶排序假设输入数组均匀分布,则其平均运行时间为θ(n).同计数排序一样,因为对输入做某种假设,桶排序比较快.不同的是,计数排序假设输入由小区间的整数构成;而桶排序则假设输入是随机产生且均匀分布在区间[0,1)内.
桶排序将区间[0,1)分成m个相同大小的子区间或称为桶,然后将n个元素分别放入各个子区间。因为输入在区间[0,1)均匀分布,所以不会出现所有数据出现在某个区间的情况。接着对每个区间(桶)中的数据进行排序,然后把各个区间(桶)中的数据依次取出,就得到有序的数据。
桶排序伪代码如下:
下图是对输入数组{0.78,0.17,0.39,0.26,0.72,0.94,0.21,0.12,0.23,0.68}处理后的示意图(有点类似于hash遇到相同hash值时的处理):
个人code参照的wiki上,不同于伪代码中引入存储链表的数组,而是全部采用的链表:
#include<iterator>
#include<iostream>
#include<vector>
using namespace std;
const int BUCKET_NUM = 10;
struct ListNode
{
explicit ListNode(double i=0):mData(i), mNext(NULL){}
ListNode* mNext;
double mData;
};
/*
* three case:
* 1. before insert val, this bucket is empty, so head is NULL. so just return the newNode's address
* 2. before insert val, this bucket is not empty, and the all exist data is big than new insert val, like case 1, no need enter for loop,
* so return dummy_Node.mNext is just pre.next = newNode's address.
* 3. before insert val, this bucket is not empty, and not all exist data is big than new insert val, need enter for loop, so need return
* pre-exist head addres. because enter into for loop, so return dummy_Node.mNext is just head no change, pre now is the front
* position before insert.
*/
ListNode* insert(ListNode* head, double val)
{
ListNode dummy_Node;
ListNode *newNode = new ListNode(val);
ListNode *pre = NULL, *curr = NULL;
dummy_Node.mNext = head; //need by case 3.
pre = &dummy_Node;
curr = head;
//find the position to insert val's node
while(NULL!=curr && curr->mData<=val)
{
pre = curr;
curr = curr->mNext;
}
newNode->mNext = curr;
pre->mNext = newNode;
return dummy_Node.mNext;
}
/*
* three case:
* 1. before merge, head1 is NULL, no need other precess, just return head2's address
* 2. before merge, head1 is not NULL, but head2 is NULL, like case 1, no need other precess, just return
* head1's address.
* 3. before merge, both head1 and head2 is valid. use a p_dummy_node to link each node by comparing
* each node in two bucket. last return the head's address.
*/
ListNode* Merge(ListNode *head1, ListNode *head2)
{
ListNode dummyNode;
ListNode *p_dummy = &dummyNode;
while(NULL!=head1 && NULL!=head2)
{
if(head1->mData <= head2->mData)
{
p_dummy->mNext = head1;
head1 = head1->mNext;
}
else
{
p_dummy->mNext = head2;
head2 = head2->mNext;
}
p_dummy = p_dummy->mNext;
}
if(NULL != head1)
p_dummy->mNext = head1;
if(NULL != head2)
p_dummy->mNext = head2; //if head1 is not NULL, will link head2 to the end of head1
return dummyNode.mNext;
}
/*
bucket sort core process, divide into three steps.
*/
void BucketSort(int n, double arr[])
{
int i = 0;
vector<ListNode*> buckets(BUCKET_NUM, (ListNode*)(0));
//step 1: insert all data into each bucket
for(i=0; i<n; ++i)
{
int index = arr[i]*10;//arr[i]/BUCKET_NUM; //here may change with input array
ListNode *head = buckets.at(index);
buckets.at(index) = insert(head, arr[i]);
}
//step 2: merge all sorted bucket
ListNode *head = buckets.at(0);
for(i=1; i<BUCKET_NUM; ++i)
{
head = Merge(head, buckets.at(i));
}
//step 3: get sorted data from bucket in turn.
for(i=0; i<n; ++i)
{
arr[i] = head->mData;
head = head->mNext;
}
}
void print_array(double array[], int length)
{
int i = 0;
for(i=0; i<length; i++)
{
cout << array[i] << " ";
}
cout << endl << endl;
}
int main(void)
{
double array_src[] = {0.78,0.17,0.39,0.26,0.72,0.94,0.21,0.12,0.23,0.68};//{0.79, 0.13, 0.16, 0.64, 0.39, 0.20, 0.89, 0.53, 0.73, 0.42};
int array_length = sizeof(array_src)/sizeof(array_src[0]);
BucketSort(array_length, array_src);
print_array(array_src, array_length);
return 0;
}
对运行时间稍作分析,根据伪代码可知除了第8行n次循环插入排序,其他处理时间均为θ(n),易知插入排序运行时间为θ(n2)(这里n为每个桶中数据个数)。容易得出下面性质:
就算桶排序输入数据不是均匀分布,只要满足桶中元素个数的平方和同元素总个数n呈线性关系,则桶排序仍然能以线性时间运行。
至于如何满足n拆分出m个数相加,m个数的平方和同n是线性关系,这算数学证明题了,有兴趣的同学也可以研究下。
reference:
算法导论英文版第3版
http://zh.wikipedia.org/wiki/%E6%A1%B6%E6%8E%92%E5%BA%8F