Finding the Median of Two Sorted Arrays

cncnlg

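1. Two sorted arrays A and B, each of length n, are given. Find the middle element of the merged sequence in O(log n) time.
2. Now suppose the two sorted arrays have different lengths; find the median again.
Part 1. Analysis: The problem looks very simple at first. For the first question, with arrays of length n, one can merge array 1 and array 2 directly and read off the middle element. With this approach the first and second questions are no different, and the time complexity is O(n). At this point a demanding interviewer will typically prompt: "Is there a better way?" The natural target below linear time is logarithmic, O(log n). Is that achievable here? It is.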
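For reference, the linear-time approach described above can be sketched as follows. This is a minimal baseline rather than code from the original post: the name `median_by_merge` and the choice to return the lower median (index (n-1)/2 of the merged sequence) are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Baseline: merge both sorted arrays, then read the middle element.
// Returns the lower median, i.e. index (size-1)/2 of the merged sequence.
// Precondition (assumed): at least one input is non-empty.
int median_by_merge(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> merged;
    merged.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    // Standard merge step: always take the smaller head element.
    while (i < a.size() && j < b.size()) {
        merged.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
    }
    // Copy whatever remains in either array.
    while (i < a.size()) merged.push_back(a[i++]);
    while (j < b.size()) merged.push_back(b[j++]);
    return merged[(merged.size() - 1) / 2];
}
```

Because it is so simple, a function like this is mostly useful as a correctness check for the faster versions.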
The analysis in Introduction to Algorithms goes as follows:
Let A and B be two arrays sorted in increasing order.
The median of each array can be found in O(1) time.
Let m be the median of array A and n the median of array B. Then:
1. If m == n, the median after merging is clearly also m, and the algorithm terminates.
2. If m < n, keep the half of A whose elements are greater than or equal to m, and the half of B whose elements are less than or equal to n.
Run the algorithm on the two new arrays.
3. If m > n, keep the half of A whose elements are less than or equal to m, and the half of B whose elements are greater than or equal to n.
Run the algorithm on the two new arrays.
Time complexity: O(log n), where n here is the array length.
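Let us analyze this idea step by step:

Logarithmic running time immediately suggests binary search. What does binary search mean for this problem?
Compare A[n/2] with B[n/2]:
1. If they are equal, the search is over: A[n/2] is the median of the merged sequence.
2. If B[n/2] > A[n/2], the median lies either in A[n/2]..A[n-1] or in B[0]..B[n/2]. The "or" matters: the problem has been reduced to finding the merged median of the sorted subarrays A[n/2]..A[n-1] and B[0]..B[n/2], so recursion is a natural fit.
3. If B[n/2] < A[n/2], search in A[0]..A[n/2] and B[n/2]..B[n-1] instead.
When does the recursion stop? One case is when equal values appear; otherwise it ends when n == 1.
Following this reasoning, the code below is straightforward to write, though the boundary indices need some care (recursive version):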

 // Find the (lower) median of two sorted arrays of equal length
 int Find_Media_Equal_Length(int a[] , int b[] , int length)
 {
     if(length == 1)
     {
         return a[0] > b[0] ? b[0] : a[0];
     }
     int mid = (length-1)/2;   // odd length: the middle index; even length: the lower of the two middle indices
     if(a[mid] == b[mid])
         return a[mid];
     else if(a[mid] < b[mid])
     {
         // keep the upper part of a and the lower part of b:
         // length/2 elements each when length is even, length/2+1 when odd
         return Find_Media_Equal_Length(&a[length-mid-1] , &b[0] , mid+1);
         // equivalent: return Find_Media_Equal_Length(a+length-mid-1 , b , mid+1);
     }
     else
     {
         // keep the lower part of a and the upper part of b
         return Find_Media_Equal_Length(&a[0] , &b[length-mid-1] , mid+1);
         // equivalent: return Find_Media_Equal_Length(a , b+length-mid-1 , mid+1);
     }
 }

The non-recursive version:

 // Non-recursive version
 int Find_Media_Equal_Length(int a[] , int b[] , int length)
 {
     int mid;
     while(1)
     {
         if(length == 1)
         {
             return a[0] > b[0] ? b[0] : a[0];
         }
         mid = (length-1)/2;
         if(a[mid] == b[mid])
             return a[mid];
         else if(a[mid] < b[mid])
             a = a + length - mid - 1;    // keep the upper part of a (b keeps its first mid+1 elements)
         else
             b = b + length - mid - 1;    // keep the upper part of b (a keeps its first mid+1 elements)
         length = mid + 1;
     }
 }

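Part 2. What about arrays of different lengths? The approach is the same:

Again compare the two arrays. Without loss of generality, assume B is longer than A; let A have length n and B have length m. Compare A[n/2] with B[m/2]. As before, there are several cases:
a. If A[n/2] == B[m/2], we are done: A[n/2] is a median, regardless of whether each length is odd or even.
b. If A[n/2] < B[m/2], the median cannot lie in A[0]..A[n/2-1], nor in B[m/2+1]..B[m-1]. It is tempting to discard both ranges entirely, but that would be wrong: the same number of elements must be removed from each side. Discard only A[0]..A[n/2-1] and B[m-n/2]..B[m-1], that is, n/2 elements from each array. The problem then becomes finding the median of A[n/2]..A[n-1] (length n-n/2) and B[0]..B[m-n/2-1] (length m-n/2), so the problem size shrinks.
c. The remaining case is A[n/2] > B[m/2]. Symmetrically to b, discard A[n-n/2]..A[n-1] and B[0]..B[n/2-1], then recurse.
This leads to the following code: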

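 // Find a median of two sorted arrays of possibly different lengths.
 // Both arrays must be non-empty. When the total length is even, the result
 // is one of the two middle elements, not necessarily the lower one.
 int Find_Media_Random_Length(int a[] , int lengtha , int b[] , int lengthb)
 {
     int mida = lengtha/2;
     int midb = lengthb/2;
     int l = (mida <= midb) ? mida : midb;   // number of elements dropped from each array
     if(lengtha == 1)
     {
         if(lengthb % 2 == 0)
         {
             // the median is the middle value of b[midb-1], a[0], b[midb]
             if(a[0] >= b[midb])
                 return b[midb];
             else if(a[0] <= b[midb-1])
                 return b[midb-1];
             return a[0];
         }
         else
             return b[midb];
     }
     else if(lengthb == 1)
     {
         if(lengtha % 2 == 0)
         {
             // the median is the middle value of a[mida-1], b[0], a[mida]
             if(b[0] >= a[mida])
                 return a[mida];
             else if(b[0] <= a[mida-1])
                 return a[mida-1];
             return b[0];
         }
         else
             return a[mida];
     }
     if(a[mida] == b[midb])
         return a[mida];
     else if(a[mida] < b[midb])
         // drop the first l elements of a and the last l elements of b
         // (starting at &a[mida] would read past the end of a when mida > midb)
         return Find_Media_Random_Length(&a[l] , lengtha-l , &b[0] , lengthb-l);
     else
         // drop the last l elements of a and the first l elements of b
         return Find_Media_Random_Length(&a[0] , lengtha-l , &b[l] , lengthb-l);
 }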

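An example, traced with the equal-length function above (the compared elements are in braces):
A: 1, 2, 8, 9, 10
B: 1, 2, 3, 4, 11
Step 1:
A: 1, 2, {8}, 9, 10
B: 1, 2, {3}, 4, 11
Since 8 > 3, step 2:
A: 1, {2}, 8
B: 3, {4}, 11
Since 2 < 4, step 3:
A: {2}, 8
B: {3}, 4
Since 2 < 3, step 4:
A: {8}
B: {3}
Both arrays now have length 1, so the result is the smaller of the two, 3.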

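Another simple example:
A: 1 3 5 7 9
B: 2 4 6 8 10
The answer is 5 (the lower median).
Step 1: compare 5 and 6. Since 5 < 6, should we continue searching in 7 9 and 2 4?
The even case is similar:
A: 1 3 5 7 9 11
B: 2 4 6 8 10 12
Step 1: compare 5 and 6. Since 5 < 6, should we continue searching in 7 9 11 and 2 4?
In my view, the half that is kept must always include the two medians that were just compared, whether the length is odd or even.
That is, after the first step in the two examples above:
continue in A: 5 7 9 and B: 2 4 6
continue in A: 5 7 9 11 and B: 2 4 6
But this creates a new problem: in the even case the new arrays A and B no longer have the same number of elements!
The fix: during the recursion, whenever the arrays have an even number of elements, take the upper median in one array and the lower median in the other, and keep that assignment fixed throughout. Whichever array uses the upper median keeps using it, and vice versa. The odd case is unchanged.

This also explains why, when a[mid] < b[mid] in the code, the new start of array a is a[length-mid-1].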

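Problem: There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Analysis: Finding the median of two sorted arrays amounts to finding the k-th smallest element of their sorted merge. To find the k-th smallest, split k evenly between the two arrays and use a key fact: if A[k/2-1] < B[k/2-1], then A[0]..A[k/2-1] all belong to the k smallest elements of the merge. This can be proved by contradiction.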

[LeetCode] Median of Two Sorted Arrays

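This findKth() function is a classic. The idea is:

1. Keep A as the shorter array and B as the longer one.

2. Split k in half, one half for A and one half for B (if A has fewer than k/2 elements, pa points at its last element).

3. If the value at pa is less than the value at pb, the k-th element cannot lie before pa; recurse with the part of A up to pa cut off. Symmetrically, cut off part of B in the opposite case.

4. Recurse until m == 0 (the shorter array is used up), returning B[k - 1], or until k == 1 (looking for the first element), returning min(A[0], B[0]).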
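
The four steps above can be sketched as follows. The findKth() code itself is not reproduced in the text, so this is a reconstruction under assumptions: the name `find_kth`, the raw-pointer signature, the 1-based k with 1 <= k <= m + n, and the `lower_median` wrapper are all illustrative choices, not the original author's code.

```cpp
#include <algorithm>
#include <vector>

// k-th smallest (1-based) element of the union of sorted arrays a[0..m) and b[0..n).
// Precondition (assumed): 1 <= k <= m + n.
int find_kth(const int* a, int m, const int* b, int n, int k) {
    if (m > n) return find_kth(b, n, a, m, k);  // step 1: keep a as the shorter array
    if (m == 0) return b[k - 1];                // step 4: shorter array is used up
    if (k == 1) return std::min(a[0], b[0]);    // step 4: smallest remaining element

    int pa = std::min(k / 2, m);                // step 2: up to k/2 elements from a ...
    int pb = k - pa;                            // ... and the rest of k from b
    if (a[pa - 1] < b[pb - 1])                  // step 3: a's first pa elements precede the answer
        return find_kth(a + pa, m - pa, b, n, k - pa);
    if (a[pa - 1] > b[pb - 1])                  // symmetric: drop b's first pb elements
        return find_kth(a, m, b + pb, n - pb, k - pb);
    return a[pa - 1];                           // equal values and pa + pb == k: this is the k-th
}

// Hypothetical convenience wrapper: lower median of two sorted vectors.
// Precondition (assumed): at least one vector is non-empty.
int lower_median(const std::vector<int>& a, const std::vector<int>& b) {
    int m = static_cast<int>(a.size());
    int n = static_cast<int>(b.size());
    return find_kth(a.data(), m, b.data(), n, (m + n + 1) / 2);
}
```

Each call discards about k/2 elements (or exhausts the shorter array), so the recursion depth is logarithmic in k.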

A more detailed analysis appears on a CSDN blog; it also generalizes the problem to finding the k-th smallest element.

Link: http://blog.csdn.net/zxzxy1988/article/details/8587244

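I came across a LeetCode problem today that I found to be a classic, so I am writing it down.

The problem: given two sorted arrays (either may be empty), find the k-th smallest element among all their elements. A more specific form asks for the median of all elements. This article discusses only the more general question: how do we find the k-th smallest element of two arrays? The solution is tested, however, against the median problem, LeetCode problem 4, Median of Two Sorted Arrays.
Approach 1: Suppose the two arrays contain n elements in total. There is an obvious O(n)-time, O(n)-space method: merge them as in merge sort, then take the element at index k-1 of the merged array.
This is easy to think of, but is there a better way?
Approach 2: Full sorting is more than we need, since we only want the k-th smallest element. Keep a counter m recording how many of the smallest elements have been found so far, and two pointers pA and pB that start at the first elements of A and B. As in the merge step of merge sort, if A's current element is smaller, do pA++ and m++; if B's current element is smaller, do pB++ and m++. When m reaches k we have the answer, in O(k) time and O(1) space.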
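Approach 2 can be sketched as below. The function name `kth_by_counting`, the use of `std::vector`, and the 1-based k are assumptions for illustration; the loop counter plays the role of the counter m in the description.

```cpp
#include <cstddef>
#include <vector>

// k-th smallest (1-based) element, found by walking both arrays in merge order.
// O(k) time, O(1) extra space.
// Precondition (assumed): 1 <= k <= a.size() + b.size().
int kth_by_counting(const std::vector<int>& a, const std::vector<int>& b, std::size_t k) {
    std::size_t pa = 0, pb = 0;  // next unread position in a and in b
    int current = 0;
    for (std::size_t count = 0; count < k; ++count) {
        // Take from a when b is exhausted or a's head is not larger than b's head.
        if (pb == b.size() || (pa < a.size() && a[pa] <= b[pb])) {
            current = a[pa++];
        } else {
            current = b[pb++];
        }
    }
    return current;  // the k-th element consumed in merge order
}
```

No merged array is built, which is where the O(1) space bound comes from.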
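However, when k is close to n this is still slow. We could check whether k exceeds n/2 and, if so, search from the largest element instead. But for the median of all elements the time is still O(n/2) = O(n). Is there a better approach?
Start from k. If each step eliminates one element known to precede the k-th smallest, we need k steps. What if each step eliminates half of them instead? With this binary-search-like idea, we can reason as follows: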

Assume that A and B each contain more than k/2 elements, and compare the k/2-th smallest element of A (i.e. A[k/2-1]) with the k/2-th smallest element of B (i.e. B[k/2-1]). There are three possible outcomes:
(Because k can be odd or even, we assume k is even here for simplicity. The following also holds when k is odd.)
A[k/2-1] = B[k/2-1]
A[k/2-1] > B[k/2-1]
A[k/2-1] < B[k/2-1]
If A[k/2-1] < B[k/2-1], then all the elements from A[0] to A[k/2-1] (i.e. the k/2 smallest elements of A) are among the k smallest elements of the union of A and B. In other words, A[k/2-1] can never be larger than the k-th smallest element of the union of A and B.

Why?
We can use a proof by contradiction. Suppose A[k/2-1] is larger than the k-th smallest element of the union of A and B; say it is the (k+1)-th smallest. Since it is smaller than B[k/2-1], B[k/2-1] must be at least the (k+2)-th smallest. There are at most (k/2-1) elements smaller than A[k/2-1] in A, and at most (k/2-1) elements smaller than A[k/2-1] in B. So the total is at most k/2+k/2-2, which, whether k is odd or even, is surely smaller than k. This contradicts the assumption that A[k/2-1] is the (k+1)-th smallest element, which would require k smaller elements. So A[k/2-1] can never be larger than the k-th smallest element of the union of A and B if A[k/2-1] < B[k/2-1].
Given this conclusion, we can safely drop the first k/2 elements of A, which are definitely not larger than the k-th element of the union of A and B. The same holds for the case A[k/2-1] > B[k/2-1], where we drop the elements of B instead.
When A[k/2-1] = B[k/2-1], we have found the k-th smallest element: it is this common value, call it m. A and B each contain at most (k/2-1) numbers smaller than m, so m must be the k-th smallest number. So we can call a function recursively: when A[k/2-1] < B[k/2-1] we drop the elements of A, otherwise we drop the elements of B.
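
The counting step of the argument can also be written out directly. The following is a sketch for even k, under the assumption that rank(x) means one plus the number of elements strictly smaller than x in the union; this notation is introduced here and is not in the original text.

```latex
\begin{align*}
x &= A[k/2-1] < B[k/2-1] \\
\#\{a \in A : a < x\} &\le \tfrac{k}{2} - 1
  && \text{(only } A[0],\dots,A[k/2-2] \text{ can be smaller)} \\
\#\{b \in B : b < x\} &\le \tfrac{k}{2} - 1
  && \text{(since } B[k/2-1] > x \text{)} \\
\#\{y \in A \cup B : y < x\} &\le k - 2 \\
\operatorname{rank}(x) &\le k - 1 < k
\end{align*}
```

So x and everything before it in A lie strictly within the first k positions, which is why dropping them and searching for the (k - k/2)-th smallest of the remainder is safe.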

We should also consider the edge cases, that is, when do we stop?
1. When A or B is empty, we return B[k-1] (or A[k-1], respectively);
2. When k is 1 (and A and B are both non-empty), we return the smaller of A[0] and B[0];
3. When A[k/2-1] = B[k/2-1], we return either of them.

In the code, we check whether m is larger than n to guarantee that we always know which array is the smaller one, for coding simplicity.

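Let array A have length m and array B have length n, both sorted in increasing order.

Find the median of the two arrays.

First consider what characterizes a median in an array of size n.

If n is odd there is a single median, and exactly (n-1)/2 elements of the array are smaller than it.

If n is even there are two medians (the lower median and the upper median). Here we look only for the lower median, and exactly (n-1)/2 elements (integer division) are smaller than the lower median.

In this problem we therefore look for a single median, which has c = (m+n-1)/2 elements before it. The median appears either in array A or in array B; we search array A first. Consider an element A[p] of A. Within A, p elements precede A[p]. If exactly c-p elements of B precede A[p], then exactly c elements of the merged array precede A[p], so A[p] is the median we want. The cases are:

If A[p] lies between B[c-p-1] and B[c-p], then A[p] is the median.

If A[p] is less than B[c-p-1], A[p] is too small; continue searching in A[p+1]..A[m-1].

If A[p] is greater than B[c-p], A[p] is too large; continue searching in A[0]..A[p-1].

If the median is not found in A, search B in the same way.

Since A and B are both sorted, binary search applies. The code is as follows: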

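 #include <stdio.h>

 /* Find the lower median of sorted arrays A (length m) and B (length n).
    [s, t] is the index range of A still being searched.
    At least one of the arrays must be non-empty. */
 int find_median(int *A, int *B, int m, int n, int s, int t)
 {
     int p, c, q;

     c = (m+n-1)/2;  /* number of elements that precede the lower median */

     /* The lower median is not in A: search B instead */
     if (s > t) {
         return find_median(B, A, n, m, 0, n-1);
     }

     p = (s+t)/2;
     q = c - p;      /* p elements of A precede A[p]; q elements of B must precede it too */

     /* A[p] is too large: try a smaller element of A.
        The bounds on q are tested first so that B is never indexed out of range. */
     if (q < 0 || (q < n && A[p] > B[q])) {
         return find_median(A, B, m, n, s, p-1);
     }

     /* A[p] is too small: try a larger element of A */
     if (q > n || (q > 0 && A[p] < B[q-1])) {
         return find_median(A, B, m, n, p+1, t);
     }

     /* B[q-1] <= A[p] <= B[q], wherever those elements exist: A[p] is the median */
     return A[p];
 }

 int main()
 {
     int m, n;
     int A[]={1,3,5,7,8,9,10,12,24,45,65};
     int B[]={2,4,6,10,11,12,13,14,17,19,20,34,44,45,66,99};

     m = sizeof(A)/sizeof(int);
     n = sizeof(B)/sizeof(int);

     printf("%d\n", find_median(A, B, m, n, 0, m-1));

     return 0;
 }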
