Finding the elements that appear more than n/3 times in an array of n elements



Design an algorithm that, given an array of n elements (n >= 0), finds all the elements that appear more than n/3 times. The algorithm should run in linear time.

You are expected to use comparisons only and achieve linear time: no hashing, no excessive extra space, and no standard deterministic linear-time selection algorithm.


I have a correct solution to it, and I am going to post a short piece of code. You need a compiler that supports C++11 to run it.
But don't worry if you don't have one. I know that most people prefer English to code, so I explain the idea in English as well. Please excuse my English; I am not a native speaker.

The algorithm here was not designed specifically for this question; it handles a more general case:
Given an array of N numbers, find all the elements that appear more than N/M times and report their frequencies.
The time complexity is O(N log M) (two passes over the array, each costing O(log M) per element) and the space complexity is O(M).
For this question M = 3, so the time is O(N log 3) = O(N) and the space is O(3) = O(1).


The idea comes from the famous game Tetris.
Let's see how it works. Take M = 5.
Given the array 4 3 3 2 1 2 3 4 4 7, we treat each number as a Tetris piece that falls down from the ceiling. Our task is to keep equal numbers stacked in the same column. Consider the moment when 7 is about to fall. The snapshot of the game looks like this:


7


4 3
4 3 2
4 3 2 1 _


Note that the width of a row is M, which is 5 here.
Just like Tetris, this game has the rule that a row full of numbers is eliminated.
So when 7 lands, the board becomes:
4 3
4 3 2
4 3 2 1 7 //This row is full and to be eliminated
Then the bottom row is removed, and the board becomes:
4 3
4 3 2
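
The falling and clearing steps above can be replayed in a few lines. This is only a minimal sketch of the game itself, not the full solution: the name dropPieces is made up for illustration, and the board is assumed to be a std::map from each value to the height of its column.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: drops every piece and returns the final board
// as a map from value to column height.
std::map<int, int> dropPieces(const std::vector<int>& pieces, std::size_t m)
{
	std::map<int, int> columns;
	for (int piece : pieces)
	{
		++columns[piece]; // the piece lands on its own column
		if (columns.size() == m) // the bottom row now holds m distinct values
		{
			// Clear the bottom row: every column loses one piece.
			for (auto it = columns.begin(); it != columns.end(); )
			{
				if (--it->second == 0)
					it = columns.erase(it); // the column is empty now
				else
					++it;
			}
		}
	}
	return columns;
}
```

For the array 4 3 3 2 1 2 3 4 4 7 with M = 5, the returned board has column heights 4 -> 2, 3 -> 2 and 2 -> 1, which matches the last snapshot.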


As the numbers fall down, the game eventually ends in a state where no row is full.
So at most M - 1 distinct numbers are left at the final stage.
But it is not over. We can prove that, if there is a solution, it must be among the numbers left at the final stage. Suppose a number appears more than N/M times but is not left at the end. Then every copy of it was eliminated, and since one elimination removes at most one copy of it, there were more than N/M eliminations. Each elimination removes M numbers, so more than N numbers were removed in total, which contradicts the fact that the array has only N numbers.
However, we cannot guarantee that every number left is a correct solution.
So we need to scan the array again and count the numbers that are left, to find the correct solutions and report their frequencies.

Note: in the code below, the parameter n plays the role of M.

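#include <cstddef>
#include <iostream>
#include <map>
#include <algorithm>
typedef std::map<int, int> Map;
// Returns every value that occurs more than size / n times in arr,
// mapped to its frequency. n must be positive.
Map findOverNth(int arr[], int size, int n)
{
	Map ret_map; // candidate value -> counter
	typedef Map::value_type Elem; // std::pair<const int, int>
	int total = 0; // debugging only: number of pieces currently on the board
	// Pass 1: drop every element and clear a row whenever n distinct values are present.
	std::for_each(arr, arr + size, [&, n](int val)
	{
		auto ret_pair = ret_map.insert(Elem(val, 0)); // no-op if val is already a candidate
		++ret_pair.first->second; ++total;
		if (ret_map.size() == static_cast<std::size_t>(n))
			for (auto iter = ret_map.begin(); iter != ret_map.end(); )
			{
				--iter->second; --total;
				if (iter->second == 0)
					iter = ret_map.erase(iter); // erase returns the next valid iterator
				else
					++iter;
			}
	});
	// Pass 2: reset the counters and count the exact frequency of each candidate.
	std::for_each(ret_map.begin(), ret_map.end(), [](Elem &elem) { elem.second = 0; });
	std::for_each(arr, arr + size, [&ret_map](int val)
	{
		auto iter = ret_map.find(val);
		if (iter != ret_map.end()) ++iter->second;
	});
	// Keep only the candidates that really occur more than size / n times.
	for (auto iter = ret_map.begin(); iter != ret_map.end(); )
	{
		if (iter->second <= size / n)
			iter = ret_map.erase(iter);
		else
			++iter;
	}
	return ret_map;
}
int main()
{
	//int arr[] = {5,6,7,8, 10, 4,4, 4, 4,1, 1,1};
	int arr[] = {5,6,7,8, 10, 10, 10,10,10,10, 4,4, 4, 4,4,1, 1,1,1};
	const int size = sizeof(arr) / sizeof(arr[0]);
	auto a_map = findOverNth(arr, size, 4);
	std::cout << size << std::endl;
	// Standard range-based for; "for each (... in ...)" is a compiler-specific extension.
	for (const auto &elem : a_map)
	{
		std::cout << elem.first << " " << elem.second << std::endl;
	}
	// Prints 19, then "4 5" and "10 6".
}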

Now the idea is clear, but the implementation matters just as much: a bad implementation would result in a very bad complexity. In my code I used std::map, which is typically implemented as a red-black tree.
Each number is inserted into the map. The Tetris rule says that the map size must never stay at M. If an existing number is inserted, its count in the map is increased by 1; if a new one comes, it is inserted with count = 1; if the map reaches the size limit M, all the counts in the map are decreased by 1. Of course, a number whose count drops to 0 is erased. This is how it works like Tetris.
You may ask why it is still O(N log M) if I need to traverse the whole map to decrease each count by 1. That's because every decrement cancels exactly one earlier increment, and there are only N increments in total, so all the traversals together cost O(N). Each insertion costs O(log M), so the total is still O(N log M).

The remaining part of the algorithm is simple. Reset all the counts in the map to zero, scan the array again and increase the count of every number that is still in the map, then check each count: the numbers whose count is greater than N/M are the answer.
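
To make the amortized argument concrete, here is a small instrumented sketch of the first pass. It is not part of my solution; the names SweepStats and countSweepWork are invented for this illustration. It only counts how many times a counter is increased and how many times one is decreased.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical bookkeeping type for this illustration only.
struct SweepStats
{
	std::size_t increments; // one per array element
	std::size_t decrements; // one per counter touched while clearing a row
};

SweepStats countSweepWork(const std::vector<int>& a, std::size_t m)
{
	std::map<int, std::size_t> counts;
	SweepStats stats{0, 0};
	for (int x : a)
	{
		++counts[x];
		++stats.increments;
		if (counts.size() == m)
		{
			for (auto it = counts.begin(); it != counts.end(); )
			{
				++stats.decrements; // each decrement undoes one earlier increment
				if (--it->second == 0)
					it = counts.erase(it);
				else
					++it;
			}
		}
	}
	return stats;
}
```

Since a counter never goes below zero, decrements can never exceed increments, and increments is exactly N. For the 19-element test array in main with M = 4, the only row that is ever cleared is 5 6 7 8, so the sketch reports 19 increments and 4 decrements.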

Finally, by the way, the variable "total" in my code plays no role; it was only there for debugging, so feel free to delete it :)
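
For the original question (M = 3) the map is not strictly needed: two candidate slots with two counters are enough, which keeps the space constant and uses only equality comparisons. The following is a sketch of that specialization under the same Tetris rule, not the code I posted above; the name overOneThird is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the values that occur more than a.size() / 3 times.
std::vector<int> overOneThird(const std::vector<int>& a)
{
	int cand1 = 0, cand2 = 0;       // the two columns that may survive
	std::size_t cnt1 = 0, cnt2 = 0; // their heights; 0 means the slot is free
	for (int x : a)
	{
		if (cnt1 > 0 && x == cand1) ++cnt1;
		else if (cnt2 > 0 && x == cand2) ++cnt2;
		else if (cnt1 == 0) { cand1 = x; cnt1 = 1; }
		else if (cnt2 == 0) { cand2 = x; cnt2 = 1; }
		else { --cnt1; --cnt2; } // x completes a row of three distinct values: clear it
	}
	// Second scan: count the exact frequency of the surviving candidates.
	std::size_t n1 = 0, n2 = 0;
	for (int x : a)
	{
		if (cnt1 > 0 && x == cand1) ++n1;
		else if (cnt2 > 0 && x == cand2) ++n2;
	}
	std::vector<int> result;
	if (n1 > a.size() / 3) result.push_back(cand1);
	if (n2 > a.size() / 3) result.push_back(cand2);
	return result;
}
```

For example, on the array 1 1 1 3 3 2 2 2 it returns 1 and 2 (each appears 3 times, more than 8/3), and on 1 2 3 it returns nothing.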
