Problem: Median of Two Sorted Arrays (979/5373 -- 18%)
Problem Description: There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)). Struggled to write bug-free code within 20 mins. Still have bugs. Take care of corner cases.
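This problem drove me a little crazy. In fact, my solution still does not pass the online judge. I am writing it down now anyway and will update it later. No wonder the acceptance rate is only 18%.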
Step I: Ask Questions
Confirm the element type: numbers, strings, or something else? Are both arrays sorted ascending, both descending, or one of each? How is the median defined?
Step II: Describe Approach to the problem-->Algorithm-->Algorithm Analysis
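First, be clear about the median: it is the middle element of a sorted sequence, and if the length is even it is the average of the two middle elements. Ignoring complexity, this problem is just the merge step of mergeSort: set two pointers at the starts of the two arrays and advance them in order of value to produce the merged array. Here we only need to reach the (m+n)/2-th element, but the time complexity is still O(m+n).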
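The merge-style walk described above can be sketched as follows. This is a minimal illustration, assuming ascending int arrays passed as std::vector; the name medianByMerge is made up for this example, and returning 0 for two empty inputs is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: walks both ascending arrays like the merge step of
// merge sort, stopping once the element at 0-based index total/2 is reached.
double medianByMerge(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t total = a.size() + b.size();
    if (total == 0) return 0.0;  // assumption: empty input yields 0

    std::size_t i = 0, j = 0;
    int prev = 0, cur = 0;  // cur: latest merged element, prev: the one before it
    for (std::size_t step = 0; step <= total / 2; ++step) {
        prev = cur;
        // Take from a when b is exhausted or a's head is not larger.
        if (j >= b.size() || (i < a.size() && a[i] <= b[j])) {
            cur = a[i++];
        } else {
            cur = b[j++];
        }
    }
    if (total % 2 == 1) return cur;                   // odd: single middle element
    return (static_cast<double>(prev) + cur) / 2.0;   // even: average of the two middles
}
```

It takes (m+n)/2 + 1 steps, so it is O(m+n) time and O(1) extra space, which is the baseline the rest of the post tries to beat.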
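How can we reach O(log(m+n))? Logarithmic complexity immediately suggests binary search. But how do we apply binary search to this problem?
Binary search looks for an element in a sorted sequence. It first compares the target with the middle element. Apart from equality, the result is either greater or less, so a single comparison reduces the problem of finding 1 element among n to finding 1 element among n/2. Because the two outcomes are roughly equally likely, each comparison eliminates as many unsuitable elements as possible. A few days ago I read an article by 刘未鹏 on the relationship between sorting and information theory. It is very good and I recommend it: http://blog.csdn.net/pongba/article/details/2544933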
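For reference, the halving idea in plain binary search looks roughly like this. It is a minimal sketch on a single ascending std::vector; the function name binarySearch and the -1 "not found" convention are choices made for this example.

```cpp
#include <vector>

// Returns the index of `target` in the ascending vector `v`, or -1 if absent.
int binarySearch(const std::vector<int>& v, int target) {
    int lo = 0;
    int hi = static_cast<int>(v.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
        if (v[mid] == target) return mid;
        if (v[mid] < target) {
            lo = mid + 1;               // discard the left half, including mid
        } else {
            hi = mid - 1;               // discard the right half, including mid
        }
    }
    return -1;
}
```

Each iteration discards about half of the remaining candidates, which is where the O(log n) bound comes from.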
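The difficulty here is that we do not know which element is the middle one. But can we borrow the same idea to find the median, eliminating half of the elements each time and reaching it after log(m+n) comparisons?
To eliminate half of the elements each time, the natural idea is to look at the medians of A and B separately. If some method lets us discard half of A and half of B, we are done.
What happens if we compare median(A) and median(B)? There are three cases: median(A) < median(B), median(A) > median(B), and median(A) = median(B).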
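1) If median(A) = median(B). This is the easy case: whatever the other elements are, there are m/2 + n/2 elements before median(A), so regardless of parity we can find the median quickly.
2) If median(A) < median(B). By a similar argument, median(A) cannot exceed the overall median. Since median(A) < median(B), at most n/2 elements of B are smaller than median(A), so in the merged array at most m/2 + n/2 elements are smaller than median(A), hence median(A) <= median. By the same reasoning, median(B) >= median.
After this comparison we conclude that the median lies between median(A) and median(B). Correspondingly, the median is no smaller than the elements A[0]~median(A) and no larger than the elements median(B)~B[n-1].
3) If median(A) > median(B). This is symmetric to case 2).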
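With the first comparison we can eliminate (m+n)/2 elements, which is what we wanted. What about the second comparison?
After the first comparison, assuming case 2), we know that at least A[0]~median(A) lie before the median. The problem then becomes: among the (m+n)/2 numbers in A1 = median(A)~A[m-1] and B1 = B[0]~median(B), find the n/2-th smallest. Here a difficulty appears: this subproblem differs from the original, because it is no longer a median search. I have two options. Option 1 is to find the n/4-th element in each subarray and then compare them as in the first round.
Option 2 is to keep comparing the medians of the two subarrays. Option 1 requires checking whether n/4 is smaller than m/2, but on careful thought it can be carried through. However, would that algorithm be O(log(m+n))? The second round clearly does not eliminate half of the elements, so I try option 2.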
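Compare median(A1) with median(B1). If median(A1) < median(B1), the median lies within median(A1)~A[m-1] and B[0]~median(B1), so at least A[0]~median(A1) are smaller than the median and median(B1)~B[n-1] are larger than the median.
If median(A1) > median(B1), then at least A[0]~median(A) and B[0]~median(B1) are smaller than the median. The median lies within median(A)~median(A1) and median(B1)~median(B), and median(A1)~A[m-1] and median(B)~B[n-1] are larger than the median.
Either way, the range that contains the median has shrunk to arrays of total length (m+n)/4.
So after log(m+n) comparisons we are guaranteed to shrink the median's range to an array of length 1, that is, to find it.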
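For comparison, here is a minimal sketch of the other route mentioned above (option 1): keep searching for the k-th smallest element and discard about k/2 candidates per round. This is the commonly used k-th element technique, not the median-versus-median scheme this post pursues. The names kthSmallest and medianOfTwo are hypothetical, and the code assumes ascending int arrays with 0 returned for two empty inputs.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// k is 1-based and must satisfy 1 <= k <= a.size() + b.size().
int kthSmallest(const std::vector<int>& a, const std::vector<int>& b, std::size_t k) {
    std::size_t i = 0, j = 0;  // a[0..i) and b[0..j) are already discarded
    while (true) {
        if (i == a.size()) return b[j + k - 1];   // only b is left
        if (j == b.size()) return a[i + k - 1];   // only a is left
        if (k == 1) return std::min(a[i], b[j]);
        const std::size_t half = k / 2;           // half >= 1 because k >= 2
        // Probe up to `half` elements ahead in each array, clamped to the end.
        const std::size_t ni = std::min(i + half, a.size()) - 1;
        const std::size_t nj = std::min(j + half, b.size()) - 1;
        if (a[ni] <= b[nj]) {
            k -= ni - i + 1;   // a[i..ni] cannot contain the k-th element
            i = ni + 1;
        } else {
            k -= nj - j + 1;   // b[j..nj] cannot contain the k-th element
            j = nj + 1;
        }
    }
}

double medianOfTwo(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t total = a.size() + b.size();
    if (total == 0) return 0.0;  // assumption: empty input yields 0
    if (total % 2 == 1) return kthSmallest(a, b, total / 2 + 1);
    // Even total: average the two middle ranks, computed in double to avoid int overflow.
    return (static_cast<double>(kthSmallest(a, b, total / 2)) +
            kthSmallest(a, b, total / 2 + 1)) / 2.0;
}
```

Whenever neither array is close to exhausted, a round removes k/2 elements and roughly halves k. When the probe is clamped at the end of an array, that array is used up within one more round and the answer is read off directly. Since k starts near (m+n)/2, the number of rounds is O(log(m+n)).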
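A few issues need to be considered:
1. Parity. The median is defined differently for odd and even lengths, so we need separate functions to handle each.
2. Suppose each range has shrunk to a single element: which of the two is the median, or is it their average? We need a counter that tracks how many numbers precede these two.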
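Algorithm:
1. If m+n is odd, find the ((m+n)/2 + 1)-th element of the merged array; if m+n is even, find the (m+n)/2-th and the ((m+n)/2 + 1)-th elements of the merged array.
2. Define start and end as the lower and upper bounds of the current median range. Define lastStart and lastEnd to record the previous range, in case start overshoots (m+n)/2.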
   while ( subA.length() > 1 && subB.length() > 1 && start < (m+n)/2 )
       medianA = subA.length()/2, medianB = subB.length()/2; compare subA[medianA] with subB[medianB];
       if ( subA[medianA] > subB[medianB] ) { start = start + medianB; end = end - medianA; }
       else { start = start + medianA; end = end - medianB; }
       update the subarrays; record the last start and last end.
3. if start > (m+n)/2: set start = lastStart, end = lastEnd, then merge the two subarrays until start = (m+n)/2
   else if ( subA.length() <= 1 ) or ( subB.length() <= 1 ):
//FALSE CODE 174/2098
class Solution {
public:
    double findMedianSortedArraysEven(int A[], int m, int B[], int n) {
        double median;
        int midA = m / 2;
        int midB = n / 2;
        int subLenA = m / 2;
        int subLenB = n / 2;
        int start = 0;            // lowerBand id of median
        int end = (m + n) - 1;    // upperBand id of median
        while (subLenA > 0 || subLenB > 0) {
            if (A[midA] >= B[midB]) {
                start = start + subLenB;
                if (start > (m + n) / 2 - 1) {
                    start = start - subLenB;
                    return findMedianPair(A, midA, m, B, midB, n, start);
                }
                end = end - subLenA;
                subLenA = subLenA / 2;
                subLenB = subLenB / 2;
                midA = midA - subLenA;
                midB = midB + subLenB;
            } else {
                start = start + subLenA;
                if (start > (m + n) / 2 - 1) {
                    start = start - subLenA;
                    return findMedianPair(A, midA, m, B, midB, n, start);
                }
                end = end - subLenB;
                subLenA = subLenA / 2;
                subLenB = subLenB / 2;
                midA = midA + subLenA;
                midB = midB - subLenB;
            }
        }
        // if goes to here, subLenA == 1, subLenB == 1, then
        median = (A[midA], B[midB]) / 2.0;
        return median;
    }

    double findMedianSortedArraysOdd(int A[], int m, int B[], int n) {
        double median;
        int midA = m / 2;
        int midB = n / 2;
        int subLenA = m / 2;
        int subLenB = n / 2;
        int start = 0;            // lowerBand id of median
        int end = (m + n) - 1;    // upperBand id of median
        // simply case, if A[midA] == B[midB], return A[midA], because before A[midA],
        // there are m/2 elements in A, and before B[midB], there are n/2 in B.
        // Since we want to return the (m+n)/2 + 1 th element, thus we just return A[midA]
        if (A[midA] == B[midB])
            return A[midA];
        while (subLenA > 0 || subLenB > 0) {
            if (A[midA] >= B[midB]) {
                start = start + subLenB;
                if (start > (m + n) / 2) {
                    start = start - subLenB;
                    return findMedianElement(A, midA, m, B, midB, n, start);
                }
                end = end - subLenA;
                subLenA = subLenA / 2;
                subLenB = subLenB / 2;
                midA = midA - subLenA;
                midB = midB + subLenB;
            } else {
                start = start + subLenA;
                if (start > (m + n) / 2) {
                    start = start - subLenA;
                    return findMedianElement(A, midA, m, B, midB, n, start);
                }
                end = end - subLenB;
                subLenA = subLenA / 2;
                subLenB = subLenB / 2;
                midA = midA + subLenA;
                midB = midB - subLenB;
            }
        }
        // if goes to here, subLenA == 1, subLenB == 1, then
        median = min(A[midA], B[midB]);
        return median;
    }

    double findMedianElement(int A[], int midA, int m, int B[], int midB, int n, int start) {
        while (start < (m + n) / 2 && midA < m && midB < n) {
            if (A[midA] >= B[midB]) {
                midB++;
                start++;
                if (start == (m + n) / 2)
                    return A[midA] <= B[midB] ? A[midA] : B[midB];
            } else {
                midA++;
                start++;
                if (start == (m + n) / 2)
                    return A[midA] <= B[midB] ? A[midA] : B[midB];
            }
        }
        if (midA == m) {
            while (start < (m + n) / 2) {
                midB++;
                start++;
            }
            return B[midB];
        } else if (midB == n) {
            while (start < (m + n) / 2) {
                midA++;
                start++;
            }
            return A[midA];
        }
    }

    double findMedianPair(int A[], int midA, int m, int B[], int midB, int n, int start) {
        while (start < (m + n) / 2 - 1 && midA < m && midB < n) {
            if (A[midA] >= B[midB]) {
                midB++;
                start++;
                if (start == (m + n) / 2 - 1) {
                    int pair1 = min(A[midA], B[midB]);
                    if (pair1 == A[midA]) midA++;
                    else if (pair1 == B[midB]) midB++;
                    int pair2 = min(A[midA], B[midB]);
                    return (pair1 + pair2) / 2.0;
                }
            } else {
                midA++;
                start++;
                if (start == (m + n) / 2 - 1) {
                    int pair1 = min(A[midA], B[midB]);
                    if (pair1 == A[midA]) midA++;
                    else if (pair1 == B[midB]) midB++;
                    int pair2 = min(A[midA], B[midB]);
                    return (pair1 + pair2) / 2.0;
                }
            }
        }
        if (midA == m) {
            while (start < (m + n) / 2 - 1) {
                midB++;
                start++;
            }
            return (B[midB] + B[midB + 1]) / 2.0;
        } else if (midB == n) {
            while (start < (m + n) / 2 - 1) {
                midA++;
                start++;
            }
            return (A[midA] + A[midA + 1]) / 2.0;
        }
    }

    double findMedianSortedArrays(int A[], int m, int B[], int n) {
        if (m == 0 && n == 0) return 0;
        else if (m == 0 && n == 1)
            return B[0];
        else if (m == 1 && n == 0)
            return A[0];
        else if (m == 1 && n == 1)
            return (A[0] + B[0]) / 2;
        // Two cases: m+n is even, then the median is (O[(m+n)/2-1] + O[(m+n)/2])/2,
        // assuming that the O[] stands for the merged array.
        // if m+n is odd, then median is O[(m+n)/2].
        if ((m + n) % 2 == 0)  // even
            return findMedianSortedArraysEven(A, m, B, n);
        else
            return findMedianSortedArraysOdd(A, m, B, n);
    }
};
(Continuing step 3, for the case where subA or subB has length <= 1:) do binary search until start = (m+n)/2.
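Algorithm complexity: O(log(m+n))
I still do not have a clear line of thought on this problem. I will come back and fix this answer another day.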