Majority Element问题

最新推荐文章于 2023-11-25 00:00:29 发布

JUAN425

最新推荐文章于 2023-11-25 00:00:29 发布

阅读量2.9k

点赞数

分类专栏：刷题 C++ 文章标签：算法 C++

C++ 同时被 2 个专栏收录

139 篇文章 1 订阅

订阅专栏

刷题

36 篇文章 0 订阅

订阅专栏

问题：数组中有一个数字出现的次数超过了数组长度的一半，找出这个数字。

上述这个问题的有很多的解决办法。下面一一描述（注意当要求时间复杂度为O（N）（数组的元素的个数为N），也即线性时间完成任务的时候，有一个非常elegant的算法，称为Moore’s Voting Algorithm， 该算法的时间复杂度为O(n), 空间复杂度为O（1）， 即算法运行的时候只需要常数的空间开销， simple， elegant as well as powerful）。

Majority Element: A majority element in an array A[] of size n is an element that appears more than n/2 times (and hence there is at most one such element).

Write a function which takes an array and emits the majority element (if it exists), otherwise prints NONE as follows:

stack overflow的提示。试想，既然是majority element，那么用这个majority element同那些非majority 相对应，遇到不是majority的element，就用这个majority element将其抵消掉（假如我们知道那一个数是majority），最后剩下的数字必然只有我们的majority element。当然算法运行前，我们并不知道哪一个数是majority element。所以我们就把相异的元素抵消掉，最后剩下的元素就是我们的majority element了。

METHOD 1 (最基本的解决办法)

从前到后遍历数组计数。其实我们只需要遍历一半的数组就可以有两个结果：要么找到了这个众数，要么数组中不存在众数。我们从索引0开始，记录这个位置，从这个位置开始向后遍历。对arr[0]的值进行统计整个数组中出现的次数，如果统计完成之后超过n/2，那么这个数就是我们要找的majority element了，直接输出即可，否则从arr[1]的值进行统计，一次进行下去，只需要到达arr[n/2]我们就能知道答案了。

Time Complexity: O(（n - 1）+(n-1) + ... + n/2) = O(n^2).
Auxiliary Space : O(1).

METHOD 2 (Using Binary Search Tree)

这个办法是使用二叉搜索树进行统计。每个节点保存有数组元素数值以及对应的出现的频率信息，每次插入都要检查插入元素的频率是否超过n/2，超过，即退出，就是找到了这个元素。

Node of the Binary Search Tree (used in this approach) will be as follows.

 
        structtree 
       
        { 
       
          intelement; 
       
          intcount; 
       
        }BST;

Insert elements in BST one by one and if an element is already present then increment the count of the node. At any stage, if count of a node becomes more than n/2 then return.
The method works well for the cases where n/2+1 occurrences of the majority element is present in the starting of the array, for example {1, 1, 1, 1, 1, 2, 3, 4}.

Time Complexity: If a binary search tree is used then time complexity will be O(n^2). If a self-balancing-binary-search tree is used then O(nlogn)
Auxiliary Space: O(n)

METHOD 3 (使用 Moore’s Voting Algorithm)

This is a two step process.
1. Get an element occurring most of the time in the array. This phase will make sure that if there is a majority element then it will return that only.
2. Check if the element obtained from above step is majority element.

1. Finding a Candidate:
The algorithm for first phase that works in O(n) is known as Moore’s Voting Algorithm. Basic idea of the algorithm is if we cancel out each occurrence of an element e with all the other elements that are different from e then e will exist till end if it is a majority element.

Above algorithm loops through each element and maintains a count of a[maj_index], If next element is same then increments the count, if next element is not same then decrements the count, and if the count reaches 0 then changes the maj_index to the current element and sets count to 1.
First Phase algorithm gives us a candidate element. In second phase we need to check if the candidate is really a majority element. Second phase is simple and can be easily done in O(n). We just need to check if count of the candidate element is greater than n/2.

Example:
A[] = 2, 2, 3, 5, 2, 2, 6
Initialize:
maj_index = 0, count = 1 –> candidate ‘2?
2, 2, 3, 5, 2, 2, 6

Same as a[maj_index] => count = 2
2, 2, 3, 5, 2, 2, 6

Different from a[maj_index] => count = 1
2, 2, 3, 5, 2, 2, 6

Different from a[maj_index] => count = 0
Since count = 0, change candidate for majority element to 5 => maj_index = 3, count = 1
2, 2, 3, 5, 2, 2, 6

Different from a[maj_index] => count = 0
Since count = 0, change candidate for majority element to 2 => maj_index = 4
2, 2, 3, 5, 2, 2, 6

Same as a[maj_index] => count = 2
2, 2, 3, 5, 2, 2, 6

Different from a[maj_index] => count = 1

Finally candidate for majority element is 2.

First step uses Moore’s Voting Algorithm to get a candidate for majority element.

2. Check if the element obtained in step 1 is majority

printMajority (a[], size)
1.  Find the candidate for majority
2.  If candidate is majority. i.e., appears more than n/2 times.
       Print the candidate
3.  Else
       Print "NONE"

Implementation of method 3:

<span style="font-size:14px;">/* Program for finding out majority element in an array */
# include<stdio.h>
# define bool int
  
int findCandidate(int *, int);
bool isMajority(int *, int, int);
  
/* Function to print Majority Element */
void printMajority(int a[], int size)
{
  /* Find the candidate for Majority*/
  int cand = findCandidate(a, size);
  
  /* Print the candidate if it is Majority*/
  if(isMajority(a, size, cand))
    printf(" %d ", cand);
  else
    printf("NO Majority Element");
}
  
/* Function to find the candidate for Majority */
int findCandidate(int a[], int size)
{
    int maj_index = 0, count = 1;
    int i;
    for(i = 1; i < size; i++)
    {
        if(a[maj_index] == a[i])
            count++;
        else
            count--;
        if(count == 0)
        {
            maj_index = i;
            count = 1;
        }
    }
    return a[maj_index];
}
  
/* Function to check if the candidate occurs more than n/2 times */
bool isMajority(int a[], int size, int cand)
{
    int i, count = 0;
    for (i = 0; i < size; i++)
      if(a[i] == cand)
         count++;
    if (count > size/2)
       return 1;
    else
       return 0;
}
  
/* Driver function to test above functions */
int main()
{
    int a[] = {1, 3, 3, 1, 2};
    printMajority(a, 5);
    getchar();
    return 0;
}</span>

Time Complexity: O(n)
Auxiliary Space : O(1)

算法证明：

* A one-pass, linear time algorithm that returns the majority element of a
* range of data.  The majority element is an element that appears strictly
* more than half the time.  For example, in the sequence
*
*                                  0 1 0 0 2 0 3
*
* The number 0 is a majority element, since it appears 4/7 times.  However,
* in the sequence
*
*                                 0 1 0 0 2 0 3 3
*
* There is no majority element, since even though 0 occurs 4/8 times, this
* isn't strictly greater than half the elements.
*
* The algorithm for finding the majority element is remarkably simple, but its
* correctness is not immediately obvious.  The algorithm works as follows.  At
* each step, we maintain our "guess" of what the majority element will be, and
* also a counter.  We then scan across the array.  At each point, if the new
* element matches our current guess we increment the counter, and otherwise
* we decrement it.  If the counter is ever zero, then on the next element we
* change the counter to 1 and pick the next element as our guess.  Finally, we
* output the guessed element.  For example, here is the algorithm running on
* the earlier input.  The topmost row shows the input, below that our guess,
* and below that the counter:
*
*  INPUT    0 1 0 0 2 0 3
*  GUESS   ? 0 ? 0 0 0 0 0
*  COUNTER 0 1 0 1 2 1 2 1
*
* Since our guess at the end is zero, we output zero.  This algorithm is
* due to Boyer and Moore and is described in their paper "MJRTY - A Fast
* Majority Vote Algorithm."
*
* There are many ways to think about why this algorithm works.  One good
* intuition is to think of the algorithm as breaking the input down into lots
* of stretches of consecutive copies of particular values.  Incrementing the
* counter then corresponds to marking that multiple copies of the same value
* were found, while decrementing it corresponds to some other sequences of
* values "canceling out" the accumulation of values of a particular type.
*
* A formal proof of correctness of this algorithm (based on the proof in
* Boyer and Moore's paper) relies on a key lemma.  In this section, we'll
* let C be the number that is currently a candidate for the majority element,
* K be its count after some number of steps, and N be the number of total
* elements.
*
* Lemma 1: For any i, 1 <= i <= N, after i steps of the algorithm, the
* elements in the range [1, i] can rearranged into two groups A and B such
* that A is K copies of C, and B is a collection of elements with at most
* i / 2 copies of any one element.
*
* Let's hold off of the proof of this lemma for now, and show that if it
* holds and there is a majority element, the algorithm must be correct.  Using
* the above lemma, note that when the algorithm terminates, there must be some
* element C that was chosen with some count K.  Assume for the sake of
* contradiction that C is not the majority element; then there is some other
* element C' that must be the majority element.  Consequently, there are at
* least n / 2 elements of the range equal to C'.  Let's consider where they
* are.  By the above lemma, all the elements of the input can be broken up
* into groups A and B, where everything in group A has value C and at most
* |B| / 2 elements of |B| have value C'.  Since |A| = K and |A| + |B| = N,
* this means that there are at most (N - K) / 2 copies of C', contradicting
* the fact that C' is the actual majority element.  We have reached a
* contradiction, and so C must be the majority element at the end of the
* algorithm's run.
*
* We can now prove the claim of the lemma by induction on i.  As a base case,
* if i = 1, then K = 1 and C is the first element of the range.  Then we can
* let A be the singleton element and B be the empty set, which trivially obeys
* the criteria of the lemma.  For the inductive step, assume that for some i
* the claim holds and consider the execution of the algorithm on step i + 1.
* Let A and B be the sets A and B from the ith step.  Then we consider three
* possible cases:
*
* 1. On entry to this step, K = 0.  Then after this step finishes, K = 1
*    and C is the newest element.  This means that on entry to this step,
*    A was the empty set and B was some set where no element appeared more
*    than i/2 times in B.  If we then let A' be the singleton set containing
*    the new element and B' = B, then these sets satisfy the requirements of
*    the lemma and the claim holds.
* 2. On entry to this step, K > 0 and the new element matches the current
*    majority element.  Then we can add this element to A to get a new set
*    A' meeting the lemma's requirements, so the claim holds.
* 3. On entry to this step, K > 0, and the new element does not match the
*    current majority element.  This means that the new K is one minus the
*    previous K, but the candidate majority element does not change.  If
*    we then move one element from A into B, then place the new element into
*    the set B, then the updated A and B will satisfy the lemma's claims.
*    This is tedious but simple to check, so I'll leave it as an exercise
*    to the reader. :-)
*
* In the case where there is no majority element, the element produced by the
* algorithm will be arbitrary.  We can then check whether we have the majority
* element by performing a linear scan over the input range and counting the
* frequency of the element.