LeetCode 004. Median of Two Sorted Arrays


4. Median of Two Sorted Arrays

There are two sorted arrays nums1 and nums2 of size m and n respectively.

Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Example 1:

nums1 = [1, 3]

nums2 = [2]

The median is 2.0

Example 2:
nums1 = [1, 2]
nums2 = [3, 4]

The median is (2 + 3)/2 = 2.5
class Solution {
public:
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
    }
};
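Approach:
  • My own approach
An algorithm with O(m+n) space and O(m+n) time comes to mind immediately, and that is the one I used: merge the two sorted arrays into one large array, then read the median directly from it.
  • Someone else's approach
The key addition is a find_kth() function that finds the k-th smallest number across the two arrays.
find_kth() has a binary-search flavor: take the (k/2)-th element of array A and the (k/2)-th element of array B, compare the two, and then discard the prefix that cannot contain the answer.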
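The discard step described above can also be sketched iteratively. This is only an illustrative variant, not the referenced author's code: the names `kth_smallest` and `median_via_kth` are hypothetical, `k` is assumed to be 1-based, and both inputs are assumed to be sorted in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: k-th smallest (1-based) element of the union of two
// sorted vectors. Assumes 1 <= k <= a.size() + b.size().
int kth_smallest(const std::vector<int>& a, const std::vector<int>& b, int k) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    assert(k >= 1 && k <= m + n);
    int i = 0, j = 0;  // everything before a[i] and b[j] is already discarded
    while (true) {
        if (i == m) return b[j + k - 1];          // a is used up
        if (j == n) return a[i + k - 1];          // b is used up
        if (k == 1) return std::min(a[i], b[j]);  // smallest remaining element
        // Probe up to k/2 elements ahead in each array.
        const int ni = std::min(i + k / 2, m) - 1;
        const int nj = std::min(j + k / 2, n) - 1;
        if (a[ni] <= b[nj]) {
            k -= ni - i + 1;  // a[i..ni] cannot contain the k-th element
            i = ni + 1;
        } else {
            k -= nj - j + 1;  // b[j..nj] cannot contain the k-th element
            j = nj + 1;
        }
    }
}

// Hypothetical wrapper: median built from one or two k-th lookups.
double median_via_kth(const std::vector<int>& a, const std::vector<int>& b) {
    const int total = static_cast<int>(a.size() + b.size());
    assert(total > 0);
    if (total % 2 == 1) return kth_smallest(a, b, total / 2 + 1);
    return (static_cast<double>(kth_smallest(a, b, total / 2)) +
            kth_smallest(a, b, total / 2 + 1)) / 2.0;
}
```

Each round removes about k/2 elements from consideration, which is where the logarithmic running time comes from.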
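Takeaways:
  • Found a new website that makes a good first impression: http://www.geeksforgeeks.org
  • The idea behind this problem is hard to grasp because of the many boundary conditions, such as odd versus even total length and picking the middle elements. Worth solving again later.
Appendix 1: Code
1. My own code: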
class Solution
{
    public:
    double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2)
    {
        vector<int> temp;
        int len = nums1.size() + nums2.size();
        temp.resize(len, 0);
        for(int i = 0, j1 = 0, j2 = 0; i < len;)
        {
            if(j1 < nums1.size() && j2 < nums2.size())
            {
                temp[i++] = (nums1[j1] < nums2[j2])? nums1[j1++]:nums2[j2++];
            }
            else if(j1 < nums1.size())
            {
                temp[i++] = nums1[j1++];
            }
            else
            {
                temp[i++] = nums2[j2++];
            }
        }
        int median = len >> 1;
        return (len & 1)?double(temp[median]):(temp[median - 1] + temp[median]) / 2.0;
    }
};
2. Someone else's code
find_kth() finds the k-th smallest number across the two arrays
class Solution
{
    public:
    double findMedianSortedArrays(const vector<int>& A, const vector<int>& B)
    {
        const int m = A.size();
        const int n = B.size();
        int total = m + n;
        if(total & 0x1)
            return find_kth(A.begin(), m, B.begin(), n, total / 2 + 1);
        else
            return (find_kth(A.begin(), m, B.begin(), n, total / 2)
            + find_kth(A.begin(), m, B.begin(), n, total / 2 + 1)) / 2.0;
    }
    private:
    static int find_kth(std::vector<int>::const_iterator A, int m,
                        std::vector<int>::const_iterator B, int n, int k)
    {
        //always assume that m is equal or smaller than n
        if(m > n) return find_kth(B, n, A, m, k);
        if(m == 0) return *(B + k - 1);
        if(k == 1) return min(*A, *B);
        //divide k into two parts
        int ia = min(k / 2, m), ib = k - ia;
        if(*(A + ia - 1) < *(B + ib - 1))
            return find_kth(A + ia, m - ia, B, n, k - ia);
        else if(*(A + ia - 1) > *(B + ib - 1))
            return find_kth(A, m, B + ib, n - ib, k - ib);
        else
            return A[ia - 1];
    }
};
Appendix 2: Further reading (a quick skim is enough; reposted)


This problem is notoriously hard to implement due to all the corner cases. Most implementations consider odd-length and even-length arrays as two different cases and treat them separately. As a matter of fact, with a little mind twist, these two cases can be combined into one, leading to a very simple solution where (almost) no special treatment is needed.

First, let's see the concept of 'MEDIAN' in a slightly unconventional way. That is:

"if we cut the sorted array to two halves of EQUAL LENGTHS, then
median is the AVERAGE OF Max(lower_half) and Min(upper_half), i.e. the
two numbers immediately next to the cut
".

For example, for [2 3 5 7], we make the cut between 3 and 5:

[ 2 3 / 5 7 ]

then the median = (3+5)/2. Note that I'll use '/' to represent a cut, and (number / number) to represent a cut made through a number in this article.

For [2 3 4 5 6], we make the cut right through 4 like this:

[2 3 (4/4) 5 6]

Since we split 4 into two halves, we say now both the lower and upper subarray contain 4. This notion also leads to the correct answer: (4 + 4) / 2 = 4.

For convenience, let's use L to represent the number immediately left to the cut, and R the right counterpart. In [2 3 5 7], for instance, we have L = 3 and R = 5, respectively.

We observe the index of L and R have the following relationship with the length of the array N:

N        Index of L / R
1                0 / 0
2                0 / 1
3                1 / 1  
4                1 / 2      
5                2 / 2
6                2 / 3
7                3 / 3
8                3 / 4

It is not hard to conclude that index of L = (N-1)/2, and R is at N/2. Thus, the median can be represented as

(L + R)/2 = (A[(N-1)/2] + A[N/2])/2
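As a quick check of this formula, here is a minimal sketch for a single sorted array. The function name `median_of_sorted` is made up for this illustration, and the input is assumed to be non-empty and sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of one sorted, non-empty array using
// L = A[(N-1)/2] and R = A[N/2]. No odd/even branch is needed.
double median_of_sorted(const std::vector<int>& a) {
    assert(!a.empty());
    const std::size_t n = a.size();
    const int left = a[(n - 1) / 2];   // L: element just left of the cut
    const int right = a[n / 2];        // R: element just right of the cut
    return (static_cast<double>(left) + right) / 2.0;
}
```

For odd N both indices coincide, so the same expression returns the middle element itself.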

To get ready for the two array situation, let's add a few imaginary 'positions' (represented as #'s) in between numbers, and treat numbers as 'positions' as well.

[ 6 9 13 18 ]  ->   [ # 6 # 9 # 13 # 18 #]    (N = 4)
position index     0 1 2 3 4 5   6 7   8      (N_Position = 9 )
                  
[ 6 9 11 13 18 ]->   [ # 6 # 9 # 11 # 13 # 18 #]   (N = 5)
position index      0 1 2 3 4 5   6 7   8 9 10     (N_Position = 11 )

As you can see, there are always exactly 2*N+1 'positions' regardless of length N. Therefore, the middle cut should always be made on the Nth position (0-based). Since index(L) = (N-1)/2 and index(R) = N/2 in this situation, we can infer that index(L) = (CutPosition-1)/2, index(R) = (CutPosition)/2.
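The mapping from a cut position to the indices of L and R can be written down directly. This is a small sketch with a hypothetical helper name, `neighbors_of_cut`; it assumes an interior cut (positions 1 through 2N-1), since the two end positions need the special handling discussed later in the article.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: for an array of length n laid out as 2n+1 positions,
// return {index of L, index of R} for a cut at position `cut` (0-based).
// Assumes an interior cut, i.e. 1 <= cut <= 2n - 1.
std::pair<int, int> neighbors_of_cut(int n, int cut) {
    assert(n >= 1 && cut >= 1 && cut <= 2 * n - 1);
    return {(cut - 1) / 2, cut / 2};
}
```

A cut on a '#' position yields two different indices, while a cut through a number yields the same index twice.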

Now for the two-array case:

A1 : [# 1 # 2 # 3 # 4 # 5 #]    (N1 = 5, N1_positions = 11)
A2: [ # 1 # 1 # 1 # 1 #]     (N2 = 4, N2_positions = 9)

Similar to the one-array problem, we need to find a cut that divides the two arrays each into two halves such that

"any number in the two left halves" <= "any number in the two right
halves".

We can also make the following observations

  1. There are 2N1 + 2N2 + 2 positions altogether. Therefore, there must be exactly N1 + N2 positions on each side of the cut, and 2 positions directly on the cut.
  2. Therefore, when we cut at position C2 = K in A2, the cut position in A1 must be C1 = N1 + N2 - K. For instance, if C2 = 2, then we must have C1 = 4 + 5 - C2 = 7:

     [# 1 # 2 # 3 # (4/4) # 5 #]
     [# 1 / 1 # 1 # 1 #]

  3. When the cuts are made, we have two L's and two R's. They are

     L1 = A1[(C1-1)/2]; R1 = A1[C1/2];
     L2 = A2[(C2-1)/2]; R2 = A2[C2/2];

In the above example,

    L1 = A1[( 7 - 1 )/ 2 ] = A1[ 3 ] = 4 ; R1 = A1[ 7 / 2 ] = A1[ 3 ] = 4 ;
    L2 = A2[( 2 - 1 )/ 2 ] = A2[ 0 ] = 1 ; R2 = A2[ 2 / 2 ] = A2[ 1 ] = 1 ;
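The arithmetic of this example can be reproduced with a short sketch. The struct `CutValues` and the function `values_at_cut` are hypothetical names introduced here, and the code assumes both cuts are interior positions so that no index falls outside its array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container for the four numbers next to the two cuts.
struct CutValues {
    int L1, R1, L2, R2;
};

// Given a cut position c2 in a2, derive c1 = N1 + N2 - c2 and read off
// L1, R1, L2, R2. Assumes both cuts are interior (1 <= c <= 2N - 1).
CutValues values_at_cut(const std::vector<int>& a1,
                        const std::vector<int>& a2, int c2) {
    const int n1 = static_cast<int>(a1.size());
    const int n2 = static_cast<int>(a2.size());
    const int c1 = n1 + n2 - c2;
    assert(c1 >= 1 && c1 <= 2 * n1 - 1);
    assert(c2 >= 1 && c2 <= 2 * n2 - 1);
    return {a1[(c1 - 1) / 2], a1[c1 / 2], a2[(c2 - 1) / 2], a2[c2 / 2]};
}
```

Calling it with A1 = [1 2 3 4 5], A2 = [1 1 1 1] and C2 = 2 gives L1 = R1 = 4 and L2 = R2 = 1, matching the values above.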

Now how do we decide if this cut is the cut we want? Because L1, L2 are the greatest numbers on the left halves and R1, R2 are the smallest numbers on the right, we only need

L1 <= R1 && L1 <= R2 && L2 <= R1 && L2 <= R2

to make sure that any number in lower halves <= any number in upper halves. As a matter of fact, since
L1 <= R1 and L2 <= R2 are naturally guaranteed because A1 and A2 are sorted, we only need to make sure:

L1 <= R2 and L2 <= R1.

Now we can use simple binary search to find out the result.

If we have L1 > R2, it means there are too many large numbers on the left half of A1, so we must move C1 to the left (i.e. move C2 to the right);
If L2 > R1, then there are too many large numbers on the left half of A2, and we must move C2 to the left.
Otherwise, this cut is the right one.
After we find the cut, the median can be computed as (max(L1, L2) + min(R1, R2)) / 2.
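Before turning the condition into a binary search, it can help to verify it with a plain linear scan over every possible C2. The sketch below is a slow cross-check only, not the article's method: the name `median_by_cut_scan` is hypothetical, and it assumes that an L falling off the left end counts as INT_MIN and an R falling off the right end counts as INT_MAX.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical cross-check: try every cut C2 in a2, derive C1, and return
// the median at the first cut with L1 <= R2 and L2 <= R1.
// Assumes sorted inputs and at least one element overall.
double median_by_cut_scan(const std::vector<int>& a1,
                          const std::vector<int>& a2) {
    const int n1 = static_cast<int>(a1.size());
    const int n2 = static_cast<int>(a2.size());
    assert(n1 + n2 > 0);
    for (int c2 = 0; c2 <= 2 * n2; ++c2) {
        const int c1 = n1 + n2 - c2;
        if (c1 < 0 || c1 > 2 * n1) continue;  // no matching cut in a1
        // Out-of-range neighbours are replaced by sentinels (an assumption).
        const double L1 = (c1 == 0) ? INT_MIN : a1[(c1 - 1) / 2];
        const double L2 = (c2 == 0) ? INT_MIN : a2[(c2 - 1) / 2];
        const double R1 = (c1 == 2 * n1) ? INT_MAX : a1[c1 / 2];
        const double R2 = (c2 == 2 * n2) ? INT_MAX : a2[c2 / 2];
        if (L1 <= R2 && L2 <= R1) {
            return (std::max(L1, L2) + std::min(R1, R2)) / 2.0;
        }
    }
    return 0.0;  // not expected to be reached for sorted inputs
}
```

The scan costs O(N2) instead of O(log N2), so it is only useful for testing the faster version against small inputs.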

Two side notes:

A. since C1 and C2 can be mutually determined from each other, we might as well select the shorter array (say A2) and only move C2 around, and calculate C1 accordingly. That way we can achieve a run-time complexity of O(log(min(N1, N2)))

B. The only edge case is when a cut falls on the 0th (first) or the 2Nth (last) position. For instance, if C2 = 2N2, then R2 = A2[2*N2/2] = A2[N2], which exceeds the boundary of the array. To solve this problem, we can imagine that both A1 and A2 actually have two extra elements, INT_MIN at A[-1] and INT_MAX at A[N]. These additions don't change the result, but make the implementation easier: if any L falls out of the left boundary of the array, then L = INT_MIN, and if any R falls out of the right boundary, then R = INT_MAX.

I know that was not very easy to understand, but all the above reasoning eventually boils down to the following concise code:

#include <algorithm>
#include <climits>
#include <vector>
using namespace std;

double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
    int N1 = nums1.size();
    int N2 = nums2.size();
    if (N1 < N2) return findMedianSortedArrays(nums2, nums1);  // Make sure A2 is the shorter one.

    if (N2 == 0) return ((double)nums1[(N1 - 1) / 2] + (double)nums1[N1 / 2]) / 2;  // If A2 is empty

    int lo = 0, hi = N2 * 2;
    while (lo <= hi) {
        int mid2 = (lo + hi) / 2;   // Try Cut 2
        int mid1 = N1 + N2 - mid2;  // Calculate Cut 1 accordingly

        // Get L1, R1, L2, R2 respectively; cuts at either end use sentinels
        double L1 = (mid1 == 0) ? INT_MIN : nums1[(mid1 - 1) / 2];
        double L2 = (mid2 == 0) ? INT_MIN : nums2[(mid2 - 1) / 2];
        double R1 = (mid1 == N1 * 2) ? INT_MAX : nums1[mid1 / 2];
        double R2 = (mid2 == N2 * 2) ? INT_MAX : nums2[mid2 / 2];

        if (L1 > R2) lo = mid2 + 1;       // A1's lower half is too big; need to move C1 left (C2 right)
        else if (L2 > R1) hi = mid2 - 1;  // A2's lower half too big; need to move C2 left.
        else return (max(L1, L2) + min(R1, R2)) / 2;  // Otherwise, that's the right cut.
    }
    return -1;
}
  1. A site I was not familiar with, but my first impression of it is good. It also has a write-up of a related problem: http://www.geeksforgeeks.org/median-of-two-sorted-arrays-of-different-sizes/
This problem can be solved with binary search.

A first idea is to merge the two arrays into one sorted array and read off the median. However, that takes $O(m + n)$ time, which does not meet the requirement, so a faster method is needed.

Binary search can find a position in each array such that the two positions together split all elements into a left part and a right part of equal size, or with the left part holding exactly one more element when $m + n$ is odd. The median sits at that split.

Concretely, binary-search a position $i$ in the first array; the matching position in the second array is then $j = (m + n + 1) / 2 - i$. The split is correct when every element on the left is no larger than every element on the right, that is, when nums1[i - 1] <= nums2[j] and nums2[j - 1] <= nums1[i]. A comparison that would use an out-of-range index is treated as satisfied.

The implementation can follow this Java code:

```java
public double findMedianSortedArrays(int[] nums1, int[] nums2) {
    int m = nums1.length, n = nums2.length;
    if (m > n) { // make sure the first array is not longer than the second
        int[] tmp = nums1; nums1 = nums2; nums2 = tmp;
        int t = m; m = n; n = t;
    }
    int imin = 0, imax = m, halfLen = (m + n + 1) / 2;
    while (imin <= imax) {
        int i = (imin + imax) / 2;
        int j = halfLen - i;
        if (i < imax && nums2[j - 1] > nums1[i]) {
            imin = i + 1; // i is too small, increase it
        } else if (i > imin && nums1[i - 1] > nums2[j]) {
            imax = i - 1; // i is too large, decrease it
        } else { // i is the right position
            int maxLeft = 0;
            if (i == 0) { // nothing on the left of nums1
                maxLeft = nums2[j - 1];
            } else if (j == 0) { // nothing on the left of nums2
                maxLeft = nums1[i - 1];
            } else {
                maxLeft = Math.max(nums1[i - 1], nums2[j - 1]);
            }
            if ((m + n) % 2 == 1) { // odd total number of elements
                return maxLeft;
            }
            int minRight = 0;
            if (i == m) { // nothing on the right of nums1
                minRight = nums2[j];
            } else if (j == n) { // nothing on the right of nums2
                minRight = nums1[i];
            } else {
                minRight = Math.min(nums1[i], nums2[j]);
            }
            return (maxLeft + minRight) / 2.0;
        }
    }
    return 0.0;
}
```

The time complexity is $O(\log\min(m, n))$.