count sort, radix sort, bucket sort

Latest recommended article published 2024-06-08 14:59:45

qcrao

Category: Algorithms. Tags: sorting algorithms

Article link: https://blog.csdn.net/qcrao/article/details/50445077

Tags (space-separated): algorithms

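Comparison-based sorting algorithms cannot escape the $\Omega(n \log n)$ worst-case lower bound¹. Non-comparison sorts such as counting sort, radix sort, and bucket sort are not subject to this bound: they exploit restrictive assumptions about the input (small integer keys, fixed-width digits, uniformly distributed values) to avoid most comparisons altogether.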

Counting sort

http://www.geeksforgeeks.org/counting-sort/

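Time complexity: $O(N + K)$, where $N$ is the number of elements and $K$ is the maximum key value (the size of the count array). Counting sort is a stable sort.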

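At first glance the cost looks like just $O(N)$, since counting the elements and placing each one in its final position both take $O(N)$. However, the prefix-sum pass walks the entire count array, which costs $O(K)$, so the bound reduces to $O(N)$ only when $K = O(N)$.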

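```cpp
#include <string>
#include <vector>
#include <iostream>

using namespace std;

/*
O(n + k), k = number of possible key values (256 for char).
After the placement loop, count[c] equals the number of characters smaller
than c, i.e. the prefix-sum array has effectively shifted by one slot.
*/

void count_sort(string &s)
{
	const int range = 255;
	vector<int> count(range+1, 0);

	// Cast to unsigned char so bytes >= 128 never produce a negative index.
	for (unsigned char c : s)
		count[c]++;

	// Prefix sums: count[i] = number of characters <= i.
	for (int i = 1; i <= range; i++)
		count[i] += count[i-1];

	string temp(s.size(), ' ');

	// Scan right to left so equal keys keep their relative order (stable).
	// Scanning left to right with the same decrement would reverse equal keys.
	// Another approach skips the prefix sums: walk count[] with a running index and
	// write each character count[c] times. It regenerates values instead of moving
	// the original elements, so it is not a stable sort of records.
	for (int i = (int)s.size() - 1; i >= 0; i--)
	{
		unsigned char c = s[i];
		temp[count[c] - 1] = s[i];
		count[c]--;
	}

	s = temp;
}

int main()
{
	string s = "geeksforgeeks";
	count_sort(s);
	cout << s << endl;   // eeeefggkkorss
}
```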

Radix sort

http://www.geeksforgeeks.org/radix-sort/
http://notepad.yehyeh.net/Content/Algorithm/Sort/Radix/Radix.php

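The per-digit sort inside radix sort can be counting sort or bucket sort; whichever is used, it must be stable, otherwise the order established by earlier digits is lost.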

Let there be $d$ digits in the input integers. Radix sort takes $O(d \cdot (n + b))$ time, where $b$ is the base used to represent the numbers; for example, for the decimal system $b$ is 10. What is the value of $d$? If $k$ is the maximum possible value, then $d$ is $O(\log_b k)$ (for example, $k = 1000$, $b = 10$ gives $d = \lfloor \log_{10} 1000 \rfloor + 1 = 4$). So the overall time complexity is $O((n + b) \cdot \log_b k)$, which looks worse than comparison-based sorting algorithms for a large $k$. Let us first limit $k$: let $k \le n^c$, where $c$ is a constant. In that case, the complexity becomes $O(n \log_b n)$. But it still doesn't beat comparison-based sorting algorithms.

What if we make the value of $b$ larger? What value of $b$ makes the time complexity linear? If we set $b$ to $n$, we get a time complexity of $O(n)$: with $k \le n^c$ there are only $d = O(c)$ digits, and each counting-sort pass costs $O(n + n)$. In other words, we can sort an array of integers in the range 1 to $n^c$ if the numbers are represented in base $n$ (i.e. every digit takes $\log_2 n$ bits).

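The last paragraph says that if the numbers lie in the range $1$ to $n^c$ and are represented in base $n$, they can be sorted in linear time, because only about $c$ counting-sort passes are needed.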

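Problem: sort $n$ integers in the range $[0, n^2-1]$ in linear time.
Method 1: first convert the integers to base $n$, so that each number has two digits, each in $[0, n-1]$, and then radix sort them. http://blog.csdn.net/mishifangxiangdefeng/article/details/7685839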

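```cpp
#include <string>
#include <algorithm>
#include <vector>
#include <iostream>

using namespace std;

// Stable counting sort of nums by the decimal digit (n / exp) % 10.
void countSort(vector<int>& nums, int exp)
{
	int sz = nums.size();
	vector<int> output(sz, 0);
	vector<int> count(10, 0);
	for (auto n : nums)
		count[(n/exp)%10]++;

	// After the prefix sums, count[i] is the number of elements whose digit is <= i,
	// so the last element with digit i belongs at output[count[i]-1].
	for (int i = 1; i < 10; i++)
		count[i] += count[i-1];

	// Place from the back so that equal digits keep their order (stable).
	for (int i = sz-1; i >= 0; i--)
	{
		output[count[(nums[i]/exp)%10]-1] = nums[i];
		count[(nums[i]/exp)%10]--;
	}
	nums = output;
}

// LSD radix sort for non-negative integers, one decimal digit per pass.
void radix_sort(vector<int>& nums)
{
	if (nums.empty()) return;
	int m = *max_element(nums.begin(), nums.end());
	// long long so that exp *= 10 cannot overflow when m has 10 digits
	for (long long exp = 1; m/exp > 0; exp *= 10)
		countSort(nums, (int)exp);
}

int main()
{
	int arr[] = {170, 45, 75, 90, 802, 24, 2, 66};
	vector<int> test(arr, arr+sizeof(arr)/sizeof(int));
	radix_sort(test);
	for (auto r : test)
		cout << r << " ";   // 2 24 45 66 75 90 170 802
	cout << endl;
}
```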

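LSD (least significant digit): start from the lowest-priority key; implemented as a loop.
MSD (most significant digit): start from the highest-priority key; implemented recursively.

LSD is well suited to sorting an array of fixed-length strings: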

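```cpp
#include <string>
#include <vector>
#include <iostream>

using namespace std;

// LSD radix sort; assumes every string in sVec has the same length w.
void lsd(vector<string>& sVec)
{
	if (sVec.empty()) return;
	const int N = 256+1;                 // 256 byte values + 1 for the shifted counts
	int w = sVec[0].length();
	int sz = sVec.size();
	for (int d = w-1; d >= 0; d--)       // last character first
	{
		vector<int> count(N, 0);
		vector<string> temp(sz, "");     // one slot per string
		// Count into slot key+1, so after the prefix sums count[c] is the start index of c.
		for (int i = 0; i < sz; i++)
			count[(unsigned char)sVec[i][d]+1]++;
		for (int i = 1; i < N; i++)
			count[i] += count[i-1];
		// Left-to-right placement starting at each key's start index is stable.
		for (int i = 0; i < sz; i++)
		{
			unsigned char c = sVec[i][d];
			temp[count[c]] = sVec[i];
			count[c]++;
		}
		for (int i = 0; i < sz; i++)
			sVec[i] = temp[i];
	}
}

int main()
{
	//lsd
	string s1[] = {"dab","add","cab","fab","fee","bad","dad","bee","fed","bed","ebb","ace"};
	vector<string> test1(s1, s1+sizeof(s1)/sizeof(string));
	lsd(test1);
	for (auto r : test1)
		cout << r << endl;   // ace add bad bed bee cab dab dad ebb fab fed fee
}
```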

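How the program fills the count array, using
vector sVec: aab, bba, baa.

The program increments count[sVec[i][d]+1], one slot to the right of the key. Taking the pass on the first character as the example (the mechanism is identical at every position; lsd() itself processes the positions from last to first), the counts are:

| index | 0 | …… | 'a' | 'b' | 'c' |
|:----|:----|:----|:----|:----|:----|
| count | 0 | …… | 0 | 1 | 2 |

After the prefix sums, count[c] is the index where the first string with character c goes: count['a'] = 0, count['b'] = 1, count['c'] = 3.

The pass then places each string at count[c] and increments that entry:
aab goes to [0], count['a'] → 1; bba goes to [1], count['b'] → 2; baa goes to [2], count['b'] → 3.

| index | 0 | …… | 'a' | 'b' | 'c' |
|:----|:----|:----|:----|:----|:----|
| count | 0 | …… | 1 | 3 | 3 |

The remaining character positions are sorted in the same way.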

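Below, the MSD method is used to sort an array of strings in lexicographic order.

Partition the array into R parts (R = alphabet size) by the first character, using counting sort.
Recursively apply counting sort to each of the R parts on the next character. (To allow strings of different lengths, count the terminating '\0' at the end of each string as a character, and skip strings that have ended when recursing.)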

```cpp
#include <vector>
#include <iostream>
#include <string>

using namespace std;

// lo..hi means the strings to sort are sVec[lo, hi-1]; pos is the current character.
void msd(vector<string>& sVec, int lo, int hi, int pos)
{
	const int N = 256+1;
	if (hi <= lo+1) return;
	vector<int> count(N, 0);
	vector<string> temp(hi-lo, "");
	// Unlike the usual counting sort, count into slot key+1 (shifted by one),
	// so after the prefix sums count[c] is the start index of bucket c.
	// A string that ends at pos yields '\0' (s[s.size()] is '\0'): bucket 0.
	for (int i = lo; i < hi; i++)
		count[(unsigned char)sVec[i][pos]+1]++;

	for (int i = 1; i < N; i++)
		count[i] += count[i-1];
	// Front-to-back placement is still stable: every bucket is filled left to
	// right starting from its start index count[c].
	for (int i = lo; i < hi; i++)
	{
		unsigned char c = sVec[i][pos];
		temp[count[c]] = sVec[i];
		count[c]++;   // afterwards count[c] = start of bucket c+1 (shifted forward by one)
	}
	for (int i = lo; i < hi; i++)
		sVec[i] = temp[i-lo];

	// Bucket c now occupies [count[c-1], count[c]). Skip bucket 0 (strings that
	// have ended) and recurse on every other bucket at the next position.
	for (int c = 1; c < 256; c++)
		msd(sVec, lo+count[c-1], lo+count[c], pos+1);
}

void msd(vector<string>& sVec)
{
	msd(sVec, 0, sVec.size(), 0);
}

int main()
{
	string s[] = {"dabggg","adda","cabeu","fab","fee","bad","dad","bee","fed","bed","ebb","ace"};
	vector<string> test(s, s+sizeof(s)/sizeof(string));
	msd(test);
	for (auto r : test)
		cout << r << endl;
}
```

Bucket sort

http://www.geeksforgeeks.org/bucket-sort-2/

```cpp
#include <vector>
#include <iostream>
#include <algorithm>

using namespace std;

// Bucket sort with 10 buckets; assumes every value lies in [0, 1).
void bucket_sort(vector<double>& nums)
{
	vector<vector<double>> bucket(10);
	for (auto num : nums)
		bucket[static_cast<int>(10*num)].push_back(num);   // bucket i holds [i/10, (i+1)/10)
	for (int i = 0; i < 10; i++)
		sort(bucket[i].begin(), bucket[i].end());
	int index = 0;
	// Concatenate the 10 buckets in order
	for (int i = 0; i < 10; i++)
	{
		for (size_t j = 0; j < bucket[i].size(); j++)
			nums[index++] = bucket[i][j];
	}
}

int main()
{
	double arr[] = {0.897, 0.565, 0.656, 0.1234, 0.665, 0.3434};
	vector<double> test(arr, arr+sizeof(arr)/sizeof(double));
	bucket_sort(test);
	for (auto r : test)
		cout << r << " ";   // 0.1234 0.3434 0.565 0.656 0.665 0.897
	cout << endl;
}
```

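A brief analysis: with $M$ buckets and uniformly distributed data, each bucket holds $N/M$ elements on average. If each bucket is sorted with quicksort, sorting one bucket costs $O\left(\frac{N}{M}\log\frac{N}{M}\right)$, so the total time complexity is
$O(N) + M \cdot O\left(\frac{N}{M}\log\frac{N}{M}\right) = O\left(N + N\log\frac{N}{M}\right) = O(N + N\log N - N\log M)$.
When $M$ approaches $N$, bucket sort can be regarded as approximately $O(N)$. In other words, more buckets give better time efficiency but use more space, so time and space trade off against each other¹.
¹ https://www.byvoid.com/blog/sort-radix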

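The average complexity is $O(n)$ (with about $n$ buckets and uniformly distributed input): distributing the elements into buckets costs $O(n)$, collecting them costs $O(n)$, and sorting all the buckets costs $O(n)$ in expectation.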