问题: 数组中有一个数字出现的次数超过了数组长度的一半, 找出这个数字。
上述这个问题的有很多的解决办法。 下面一一描述(注意当要求时间复杂度为O(N)(数组的元素的个数为N), 也即线性时间完成任务的时候, 有一个非常elegant的算法, 称为Moore’s Voting Algorithm, 该算法的时间复杂度为O(n), 空间复杂度为O(1), 即算法运行的时候只需要常数的空间开销, simple, elegant as well as powerful)。
Majority Element: A majority element in an array A[] of size n is an element that appears more than n/2 times (and hence there is at most one such element).
Write a function which takes an array and emits the majority element (if it exists), otherwise prints NONE as follows:
stack overflow的提示。 试想, 既然是majority element, 那么用这个majority element同那些非majority 相对应, 遇到不是majority的element, 就用这个majority element将其抵消掉(假如我们知道那一个数是majority), 最后剩下的数字必然只有我们的majority element。 当然算法运行前, 我们并不知道哪一个数是majority element。 所以我们就把相异的元素抵消掉, 最后剩下的元素就是我们的majority element了。
METHOD 1 (最基本的解决办法)
从前到后遍历数组计数。 其实我们只需要遍历一半的数组就可以有两个结果: 要么找到了这个众数,要么数组中不存在众数。 我们从索引0开始, 记录这个位置, 从这个位置开始向后遍历。 对arr[0]的值进行统计整个数组中出现的次数, 如果统计完成之后超过n/2, 那么这个数就是我们要找的majority element了, 直接输出即可, 否则从arr[1]的值进行统计, 一次进行下去, 只需要到达arr[n/2]我们就能知道答案了。
Time Complexity: O((n - 1)+(n-1) + ... + n/2) = O(n^2).
Auxiliary Space : O(1).
METHOD 2 (Using Binary Search Tree)
这个办法是使用二叉搜索树进行统计。 每个节点保存有数组元素数值以及对应的出现的频率信息, 每次插入都要检查插入元素的频率是否超过n/2, 超过, 即退出, 就是找到了这个元素。
Node of the Binary Search Tree (used in this approach) will be as follows.
struct tree
{
int element;
int count;
}BST;
|
Insert elements in BST one by one and if an element is already present then increment the count of the node. At any stage, if count of a node becomes more than n/2 then return.
The method works well for the cases where n/2+1 occurrences of the majority element is present in the starting of the array, for example {1, 1, 1, 1, 1, 2, 3, 4}.
Time Complexity: If a binary search tree is used then time complexity will be O(n^2). If a self-balancing-binary-search tree is used then O(nlogn)
Auxiliary Space: O(n)
METHOD 3 (使用 Moore’s Voting Algorithm)
This is a two step process.
1. Get an element occurring most of the time in the array. This phase will make sure that if there is a majority element then it will return that only.
2. Check if the element obtained from above step is majority element.
1. Finding a Candidate:
The algorithm for first phase that works in O(n) is known as Moore’s Voting Algorithm. Basic idea of the algorithm is if we cancel out each occurrence of an element e with all the other elements that are different from e then e will exist till end if it is a majority element.
Above algorithm loops through each element and maintains a count of a[maj_index], If next element is same then increments the count, if next element is not same then decrements the count, and if the count reaches 0 then changes the maj_index to the current element and sets count to 1.
First Phase algorithm gives us a candidate element. In second phase we need to check if the candidate is really a majority element. Second phase is simple and can be easily done in O(n). We just need to check if count of the candidate element is greater than n/2.
Example:
A[] = 2, 2, 3, 5, 2, 2, 6
Initialize:
maj_index = 0, count = 1 –> candidate ‘2?
2, 2, 3, 5, 2, 2, 6
Same as a[maj_index] => count = 2
2, 2, 3, 5, 2, 2, 6
Different from a[maj_index] => count = 1
2, 2, 3, 5, 2, 2, 6
Different from a[maj_index] => count = 0
Since count = 0, change candidate for majority element to 5 => maj_index = 3, count = 1
2, 2, 3, 5, 2, 2, 6
Different from a[maj_index] => count = 0
Since count = 0, change candidate for majority element to 2 => maj_index = 4
2, 2, 3, 5, 2, 2, 6
Same as a[maj_index] => count = 2
2, 2, 3, 5, 2, 2, 6
Different from a[maj_index] => count = 1
Finally candidate for majority element is 2.
First step uses Moore’s Voting Algorithm to get a candidate for majority element.
2. Check if the element obtained in step 1 is majority
printMajority (a[], size)
1. Find the candidate for majority
2. If candidate is majority. i.e., appears more than n/2 times.
Print the candidate
3. Else
Print "NONE"
Implementation of method 3:
<span style="font-size:14px;">/* Program for finding out majority element in an array */
# include<stdio.h>
# define bool int
int findCandidate(int *, int);
bool isMajority(int *, int, int);
/* Function to print Majority Element */
void printMajority(int a[], int size)
{
/* Find the candidate for Majority*/
int cand = findCandidate(a, size);
/* Print the candidate if it is Majority*/
if(isMajority(a, size, cand))
printf(" %d ", cand);
else
printf("NO Majority Element");
}
/* Function to find the candidate for Majority */
int findCandidate(int a[], int size)
{
int maj_index = 0, count = 1;
int i;
for(i = 1; i < size; i++)
{
if(a[maj_index] == a[i])
count++;
else
count--;
if(count == 0)
{
maj_index = i;
count = 1;
}
}
return a[maj_index];
}
/* Function to check if the candidate occurs more than n/2 times */
bool isMajority(int a[], int size, int cand)
{
int i, count = 0;
for (i = 0; i < size; i++)
if(a[i] == cand)
count++;
if (count > size/2)
return 1;
else
return 0;
}
/* Driver function to test above functions */
int main()
{
int a[] = {1, 3, 3, 1, 2};
printMajority(a, 5);
getchar();
return 0;
}</span>
Time Complexity: O(n)Auxiliary Space : O(1)
算法证明:
* A one-pass, linear time algorithm that returns the majority element of a
* range of data. The majority element is an element that appears strictly
* more than half the time. For example, in the sequence
*
* 0 1 0 0 2 0 3
*
* The number 0 is a majority element, since it appears 4/7 times. However,
* in the sequence
*
* 0 1 0 0 2 0 3 3
*
* There is no majority element, since even though 0 occurs 4/8 times, this
* isn't strictly greater than half the elements.
*
* The algorithm for finding the majority element is remarkably simple, but its
* correctness is not immediately obvious. The algorithm works as follows. At
* each step, we maintain our "guess" of what the majority element will be, and
* also a counter. We then scan across the array. At each point, if the new
* element matches our current guess we increment the counter, and otherwise
* we decrement it. If the counter is ever zero, then on the next element we
* change the counter to 1 and pick the next element as our guess. Finally, we
* output the guessed element. For example, here is the algorithm running on
* the earlier input. The topmost row shows the input, below that our guess,
* and below that the counter:
*
* INPUT 0 1 0 0 2 0 3
* GUESS ? 0 ? 0 0 0 0 0
* COUNTER 0 1 0 1 2 1 2 1
*
* Since our guess at the end is zero, we output zero. This algorithm is
* due to Boyer and Moore and is described in their paper "MJRTY - A Fast
* Majority Vote Algorithm."
*
* There are many ways to think about why this algorithm works. One good
* intuition is to think of the algorithm as breaking the input down into lots
* of stretches of consecutive copies of particular values. Incrementing the
* counter then corresponds to marking that multiple copies of the same value
* were found, while decrementing it corresponds to some other sequences of
* values "canceling out" the accumulation of values of a particular type.
*
* A formal proof of correctness of this algorithm (based on the proof in
* Boyer and Moore's paper) relies on a key lemma. In this section, we'll
* let C be the number that is currently a candidate for the majority element,
* K be its count after some number of steps, and N be the number of total
* elements.
*
* Lemma 1: For any i, 1 <= i <= N, after i steps of the algorithm, the
* elements in the range [1, i] can rearranged into two groups A and B such
* that A is K copies of C, and B is a collection of elements with at most
* i / 2 copies of any one element.
*
* Let's hold off of the proof of this lemma for now, and show that if it
* holds and there is a majority element, the algorithm must be correct. Using
* the above lemma, note that when the algorithm terminates, there must be some
* element C that was chosen with some count K. Assume for the sake of
* contradiction that C is not the majority element; then there is some other
* element C' that must be the majority element. Consequently, there are at
* least n / 2 elements of the range equal to C'. Let's consider where they
* are. By the above lemma, all the elements of the input can be broken up
* into groups A and B, where everything in group A has value C and at most
* |B| / 2 elements of |B| have value C'. Since |A| = K and |A| + |B| = N,
* this means that there are at most (N - K) / 2 copies of C', contradicting
* the fact that C' is the actual majority element. We have reached a
* contradiction, and so C must be the majority element at the end of the
* algorithm's run.
*
* We can now prove the claim of the lemma by induction on i. As a base case,
* if i = 1, then K = 1 and C is the first element of the range. Then we can
* let A be the singleton element and B be the empty set, which trivially obeys
* the criteria of the lemma. For the inductive step, assume that for some i
* the claim holds and consider the execution of the algorithm on step i + 1.
* Let A and B be the sets A and B from the ith step. Then we consider three
* possible cases:
*
* 1. On entry to this step, K = 0. Then after this step finishes, K = 1
* and C is the newest element. This means that on entry to this step,
* A was the empty set and B was some set where no element appeared more
* than i/2 times in B. If we then let A' be the singleton set containing
* the new element and B' = B, then these sets satisfy the requirements of
* the lemma and the claim holds.
* 2. On entry to this step, K > 0 and the new element matches the current
* majority element. Then we can add this element to A to get a new set
* A' meeting the lemma's requirements, so the claim holds.
* 3. On entry to this step, K > 0, and the new element does not match the
* current majority element. This means that the new K is one minus the
* previous K, but the candidate majority element does not change. If
* we then move one element from A into B, then place the new element into
* the set B, then the updated A and B will satisfy the lemma's claims.
* This is tedious but simple to check, so I'll leave it as an exercise
* to the reader. :-)
*
* In the case where there is no majority element, the element produced by the
* algorithm will be arbitrary. We can then check whether we have the majority
* element by performing a linear scan over the input range and counting the
* frequency of the element.