LeetCode Problem Notes (3) -- Median of Two Sorted Arrays

Problem: Median of Two Sorted Arrays (979/5373 -- 18%)

Problem Description:
There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Struggled to write bug-free code within 20 mins. Still have bugs. Take care of corner cases.

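This problem drove me a little crazy. In fact, my solution still has not passed the online judge. I am writing it down anyway and will update it later. No wonder the acceptance rate is only 18%.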


Step I: Ask Questions

Are the elements numbers, strings, or something else? Are both arrays sorted ascending, both descending, or in different orders? What exactly is the definition of the median?

Step II: Describe Approach to the problem-->Algorithm-->Algorithm Analysis

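First, be clear about what the median is: the middle element of a sorted sequence, or, if the length is even, the average of the two middle elements. If complexity is not a concern, this problem is just the merge step of merge sort: set two pointers at the start of the two arrays and advance them in order of value to obtain the merged array. Here we only need to walk as far as the ((m+n)/2)-th element, but the time complexity is still O(m+n).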
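The two-pointer merge described above can be sketched as follows. This is only an illustration of the O(m+n) baseline, assuming both arrays are ascending and at least one is non-empty; the function name `medianByMerge` is made up for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Median of two ascending arrays by walking them in merged order.
// O(m+n) time, O(1) extra space. Assumes a.size() + b.size() > 0.
double medianByMerge(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t total = a.size() + b.size();
    const std::size_t mid = total / 2;  // 0-based index of the upper middle element
    std::size_t i = 0, j = 0;           // read positions in a and b
    int prev = 0, curr = 0;             // the last two elements taken

    // Take mid + 1 elements in merged order, remembering the last two.
    for (std::size_t taken = 0; taken <= mid; ++taken) {
        prev = curr;
        if (j == b.size() || (i < a.size() && a[i] <= b[j])) {
            curr = a[i++];
        } else {
            curr = b[j++];
        }
    }

    if (total % 2 == 1) return curr;                   // odd: the middle element
    return (static_cast<double>(prev) + curr) / 2.0;   // even: average of the two middle elements
}
```

The merged array is never materialized; only the last two elements seen are kept, which is all the median needs.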

How can we achieve O(log(m+n))? Logarithmic complexity immediately suggests binary search. But how can binary search be applied to this problem?

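Binary search looks for an element in a sorted sequence. It first compares the target with the middle element; apart from equality, the result is either greater or less, so a single comparison reduces the problem of finding one element among n to finding it among n/2. Because the two outcomes are roughly equally likely, each comparison eliminates as many unsuitable elements as possible. A couple of days ago I read an article by 刘未鹏 on the relationship between sorting and information theory; it is very good and worth reading: http://blog.csdn.net/pongba/article/details/2544933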
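For reference, a minimal sketch of plain binary search on a single ascending array, the building block discussed above. The name `binarySearch` and the convention of returning -1 for "not found" are choices made for this example.

```cpp
#include <vector>

// Returns the index of target in the ascending vector `sorted`, or -1 if absent.
// Each iteration halves the candidate range [lo, hi], so it runs in O(log n).
int binarySearch(const std::vector<int>& sorted, int target) {
    int lo = 0;
    int hi = static_cast<int>(sorted.size()) - 1;
    while (lo <= hi) {
        const int mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
        if (sorted[mid] == target) return mid;
        if (sorted[mid] < target) {
            lo = mid + 1;   // target can only be in the right half
        } else {
            hi = mid - 1;   // target can only be in the left half
        }
    }
    return -1;
}
```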

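The difficulty here is that we do not know which element is the middle one. Can we apply the same idea anyway, eliminating half of the elements each time and finding the median after about log(m+n) comparisons?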

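To eliminate half of the elements each time, a natural idea is to look at the medians of A and B separately. If we can somehow discard half of A and half of B, we are done.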

What happens if we compare median(A) and median(B)? There are three cases: median(A) < median(B), median(A) > median(B), and median(A) = median(B).

1) median(A) = median(B). This is the easy case: regardless of how the other elements compare, there are m/2 + n/2 elements before median(A), so for either parity we can find the median quickly.

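2) median(A) < median(B). By similar reasoning, median(A) cannot exceed the overall median: since median(A) < median(B), at most n/2 elements of B are smaller than median(A), so in the merged array at most m/2 + n/2 elements are smaller than median(A), hence median(A) <= median. By the same argument, median(B) >= median.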

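From this one comparison we conclude that the median lies between median(A) and median(B). Correspondingly, the median is no smaller than the elements A[0]..median(A) and no larger than the elements median(B)..B[n-1].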
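To make the bracketing claim concrete, here is a small brute-force check that merges the two arrays and tests whether the overall median falls between the two individual medians. It assumes the textbook median (averaging the two middle elements for even lengths) and non-empty ascending inputs; `medianOfSorted` and `medianIsBracketed` are names invented for this sketch, and it only checks individual examples rather than proving anything.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Median of one non-empty ascending vector.
double medianOfSorted(const std::vector<int>& v) {
    const std::size_t n = v.size();
    if (n % 2 == 1) return v[n / 2];
    return (static_cast<double>(v[n / 2 - 1]) + v[n / 2]) / 2.0;
}

// True if median(A merged with B) lies between median(A) and median(B), inclusive.
bool medianIsBracketed(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> merged;
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(merged));
    const double ma = medianOfSorted(a);
    const double mb = medianOfSorted(b);
    const double m = medianOfSorted(merged);
    return std::min(ma, mb) <= m && m <= std::max(ma, mb);
}
```

For A = {0, 10} and B = {-1, 12}, the overall median is 5, which equals median(A); this is why the bounds have to be inclusive.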

3) median(A) > median(B). This is symmetric to case 2).

The first comparison lets us discard (m+n)/2 elements, which is what we wanted. But what about the second comparison?

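After the first comparison, assuming case 2), we know that at least A[0]..median(A) come before the median. The problem then becomes: among the (m+n)/2 numbers in A1 = median(A)..A[m-1] and B1 = B[0]..median(B), find the (n/2)-th smallest. This is where trouble starts: the subproblem differs from the original, because it no longer asks for a median. I have two options: (1) find the (n/4)-th element in each of the two subarrays and compare them as in the first step, or (2) keep comparing the medians of the two subarrays. Option 1 requires checking whether n/4 is smaller than m/2, but on careful thought it can be carried through. Would the complexity then still be O(log(m+n)), though? The second step clearly would not discard half of the elements, so I try option 2 instead.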

Compare median(A1) with median(B1). If median(A1) < median(B1), the median lies within median(A1)..A[m-1] and B[0]..median(B1), so at least A[0]..median(A1) are smaller than the median and median(B1)..B[n-1] are larger.

If median(A1) > median(B1), then at least A[0]..median(A) and B[0]..median(B1) are smaller than the median; the median lies within median(A)..median(A1) and median(B1)..median(B); and median(A1)..A[m-1] and median(B)..B[n-1] are larger than the median.

Either way, the candidate range for the median shrinks to a stretch of length (m+n)/4.

After about log(m+n) such comparisons, the candidate range must shrink to length 1, which means we have found the median.
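The halving argument can be written out as a short calculation. It rests on the assumption, which these notes hope for but do not prove, that every comparison really cuts the candidate range in half; N_k is a symbol introduced here for the length of the range after k comparisons.

```latex
\[
\begin{aligned}
N_0 &= m + n, \qquad N_k = \frac{N_{k-1}}{2} = \frac{m+n}{2^{k}}, \\
N_k \le 1 \;&\Longleftrightarrow\; 2^{k} \ge m + n \;\Longleftrightarrow\; k \ge \log_2 (m+n).
\end{aligned}
\]
```

So N_1 = (m+n)/2 and N_2 = (m+n)/4, matching the two steps worked through above, and roughly log_2(m+n) comparisons suffice.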

A few issues need to be considered:

1. Parity. The median is defined differently for odd and even total lengths, so we need separate functions to handle the two cases.

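2. Suppose each range has shrunk to a single element. Which of the two elements is the median, or is it their average? We need a counter that records how many elements precede these two.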
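One way to handle both concerns at once is to stop looking for "the median" and look for the k-th smallest element instead. The sketch below is the widely used "discard about k/2 elements per step" technique, not the median-comparison scheme developed in these notes. Here k doubles as the counter: it always says how many elements still have to be passed. The names `kthSmallest` and `medianViaKth` are made up for the example, and both arrays are assumed ascending with at least one element in total.

```cpp
#include <algorithm>
#include <vector>

// k-th smallest (1-based) element of two ascending arrays taken together.
// Requires 1 <= k <= a.size() + b.size(). Runs in O(log k) steps.
int kthSmallest(const std::vector<int>& a, const std::vector<int>& b, int k) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    int i = 0, j = 0;  // how many leading elements of a and b are already discarded
    while (true) {
        if (i == m) return b[j + k - 1];          // a exhausted: answer is in b
        if (j == n) return a[i + k - 1];          // b exhausted: answer is in a
        if (k == 1) return std::min(a[i], b[j]);  // smallest remaining element
        const int half = k / 2;
        const int ni = std::min(i + half, m) - 1; // probe index in a
        const int nj = std::min(j + half, n) - 1; // probe index in b
        if (a[ni] <= b[nj]) {
            k -= ni - i + 1;  // a[i..ni] all precede the k-th element
            i = ni + 1;
        } else {
            k -= nj - j + 1;  // b[j..nj] all precede the k-th element
            j = nj + 1;
        }
    }
}

// Median built on kthSmallest; parity is handled only here.
double medianViaKth(const std::vector<int>& a, const std::vector<int>& b) {
    const int total = static_cast<int>(a.size() + b.size());
    if (total % 2 == 1) return kthSmallest(a, b, total / 2 + 1);
    const double lower = kthSmallest(a, b, total / 2);
    const double upper = kthSmallest(a, b, total / 2 + 1);
    return (lower + upper) / 2.0;
}
```

Because each step removes about k/2 elements from consideration, the loop runs O(log(m+n)) times, and the even case simply calls it twice.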

Algorithm:

1. If m+n is odd, find the ((m+n)/2 + 1)-th element of the merged array; if m+n is even, find the ((m+n)/2)-th and ((m+n)/2 + 1)-th elements.

2. Define start and end as the lower and upper bounds of the current median range. Define lastStart and lastEnd to record the previous range, in case start overshoots (m+n)/2.

while( subA.length() > 1 && subB.length() > 1 && start < (m+n)/2 )

    medianA = subA.length()/2, medianB = subB.length()/2; compare subA[medianA] with subB[medianB];

    if( subA[medianA] > subB[medianB] ) { start = start + medianB; end = end - medianA; }

    else { start = start + medianA; end = end - medianB; }

    update the sub arrays; record the last start and last end.

3. if start > (m+n)/2, start = lastStart, end = lastEnd, then merge the two sub arrays until start = (m+n)/2

    else if ( subA.length() <=1 ) or (subB.length()<=1 )


// INCORRECT CODE (online judge result: 174/2098)


   
   
class Solution {
public:
     double findMedianSortedArraysEven ( int A [], int m , int B [], int n ){
         double median ;
         int midA = m / 2 ;
         int midB = n / 2 ;
         int subLenA = m / 2 ;
         int subLenB = n / 2 ;
         int start = 0 ; //lowerBand id of median
         int end = ( m + n ) - 1 ; //upperBand id of median
        
        
         while ( subLenA > 0 || subLenB > 0 )
         {
             if ( A [ midA ] >= B [ midB ] )
             {
                 start = start + subLenB ;
                 if ( start > ( m + n ) / 2 - 1 )
                 {
                     start = start - subLenB ;
                     return findMedianPair ( A , midA , m , B , midB , n , start );
                 }
                 end = end - subLenA ;
                 subLenA = subLenA / 2 ;
                 subLenB = subLenB / 2 ;
                 midA = midA - subLenA ;
                 midB = midB + subLenB ;
             }
             else
             {
                 start = start + subLenA ;
                 if ( start > ( m + n ) / 2 - 1 )
                 {
                     start = start - subLenA ;
                     return findMedianPair ( A , midA , m , B , midB , n , start );
                 }
                 end = end - subLenB ;
                 subLenA = subLenA / 2 ;
                 subLenB = subLenB / 2 ;
                 midA = midA + subLenA ;
                 midB = midB - subLenB ;
             }
         }
        
         //if we get here, subLenA == 0 and subLenB == 0, so average the two candidates
         median = ( A [ midA ] + B [ midB ] ) / 2.0 ;
        
        
         return median ;
     }
     double findMedianSortedArraysOdd ( int A [], int m , int B [], int n ){
         double median ;
         int midA = m / 2 ;
         int midB = n / 2 ;
         int subLenA = m / 2 ;
         int subLenB = n / 2 ;
         int start = 0 ; //lowerBand id of median
         int end = ( m + n ) - 1 ; //upperBand id of median
        
         //simply case, if A[midA] == B[midB] , return A[midA], because before A[midA],
         //there are m/2 elements in A, and before B[midB], there are n/2 in B.
         //Since we want to return the (m+n)/2 + 1 th element, thus we just return A[midA]
         if ( A [ midA ] == B [ midB ] )
             return A [ midA ];
        
         while ( subLenA > 0 || subLenB > 0 )
         {
             if ( A [ midA ] >= B [ midB ] )
             {
                 start = start + subLenB ;
                 if ( start > ( m + n ) / 2 )
                 {
                     start = start - subLenB ;
                     return findMedianElement ( A , midA , m , B , midB , n , start );
                 }
                 end = end - subLenA ;
                 subLenA = subLenA / 2 ;
                 subLenB = subLenB / 2 ;
                 midA = midA - subLenA ;
                 midB = midB + subLenB ;
             }
             else
             {
                 start = start + subLenA ;
                 if ( start > ( m + n ) / 2 )
                 {
                     start = start - subLenA ;
                     return findMedianElement ( A , midA , m , B , midB , n , start );
                 }
                 end = end - subLenB ;
                 subLenA = subLenA / 2 ;
                 subLenB = subLenB / 2 ;
                 midA = midA + subLenA ;
                 midB = midB - subLenB ;
             }
         }
        
         //if we get here, subLenA == 0 and subLenB == 0, so take the smaller candidate
         median = min ( A [ midA ] , B [ midB ] );
        
        
         return median ;
     }
    
     double findMedianElement ( int A [], int midA , int m , int B [], int midB , int n , int start ){
        
         while ( start < ( m + n ) / 2 && midA < m && midB < n ){
             if ( A [ midA ] >= B [ midB ] )
             {
                 midB ++ ;
                 start ++ ;
                 if ( start == ( m + n ) / 2 )
                     return A [ midA ] <= B [ midB ] ? A [ midA ] : B [ midB ];
             }
             else
             {
                 midA ++ ;
                 start ++ ;
                 if ( start == ( m + n ) / 2 )
                     return A [ midA ] <= B [ midB ] ? A [ midA ] : B [ midB ];
             }
         }
        
         if ( midA == m )
         {
             while ( start < ( m + n ) / 2 )
             {
                 midB ++ ;
                 start ++ ;
             }
             return B [ midB ];
         }
         else if ( midB == n )
         {
             while ( start < ( m + n ) / 2 )
             {
                 midA ++ ;
                 start ++ ;
             }
             return A [ midA ];
         }
        
     }
    
     double findMedianPair ( int A [], int midA , int m , int B [], int midB , int n , int start )
     {
         while ( start < ( m + n ) / 2 - 1 && midA < m && midB < n ){
             if ( A [ midA ] >= B [ midB ] )
             {
                 midB ++ ;
                 start ++ ;
                 if ( start == ( m + n ) / 2 - 1 )
                 {
                     int pair1 = min ( A [ midA ], B [ midB ]);
                     if ( pair1 == A [ midA ] ) midA ++ ;
                     else if ( pair1 == B [ midB ]) midB ++ ;
                     int pair2 = min ( A [ midA ], B [ midB ]);
                     return ( pair1 + pair2 ) / 2.0 ;
                 }
             }
             else
             {
                 midA ++ ;
                 start ++ ;
                 if ( start == ( m + n ) / 2 - 1 )
                 {
                     int pair1 = min ( A [ midA ], B [ midB ]);
                     if ( pair1 == A [ midA ] ) midA ++ ;
                     else if ( pair1 == B [ midB ]) midB ++ ;
                     int pair2 = min ( A [ midA ], B [ midB ]);
                     return ( pair1 + pair2 ) / 2.0 ;
                 }
             }
         }
        
         if ( midA == m )
         {
             while ( start < ( m + n ) / 2 - 1 )
             {
                 midB ++ ;
                 start ++ ;
             }
             return ( B [ midB ] + B [ midB + 1 ]) / 2.0 ;
         }
         else if ( midB == n )
         {
             while ( start < ( m + n ) / 2 - 1 )
             {
                 midA ++ ;
                 start ++ ;
             }
             return ( A [ midA ] + A [ midA + 1 ]) / 2.0 ;
         }
     }
     double findMedianSortedArrays ( int A [], int m , int B [], int n ) {
         // Start typing your C/C++ solution below
         // DO NOT write int main() function
         if ( m == 0 && n == 0 ) return 0 ;
         else if ( m == 0 && n == 1 )
             return B [ 0 ];
         else if ( m == 1 && n == 0 )
             return A [ 0 ];
         else if ( m == 1 && n == 1 )
             return ( A [ 0 ] + B [ 0 ]) / 2.0 ; // 2.0 avoids integer division
         //Two cases: m+n is even, then the median is (O[(m+n)/2-1] + O[(m+n)/2])/2,
         //assuming that the O[] stands for the merged array.
         //if m+n is odd, then median is O[(m+n)/2].
        
         if ( ( m + n ) % 2 == 0 ) //even
             return findMedianSortedArraysEven ( A , m , B , n );
         else
             return findMedianSortedArraysOdd ( A , m , B , n );
     }
    
};


do binary search until start = (m+n)/2.

Algorithm complexity: O(log(m+n))


I still do not have a clear line of thought on this problem; I will come back another day to correct this answer.
