It has been a while, but I finally have some free time, so my LeetCode journey continues.
-----------------------------------------------------------------Problem description: start-------------------------------------------------------------------
There are two sorted arrays nums1 and nums2 of size m and n respectively.
Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).
Example 1:
nums1 = [1, 3]
nums2 = [2]
The median is 2.0
Example 2:
nums1 = [1, 2]
nums2 = [3, 4]
The median is (2 + 3)/2 = 2.5
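-----------------------------------------------------------------Problem description: end-------------------------------------------------------------------
Let me state up front that I did not solve this problem on my own; I worked from the most-upvoted answer. So why write this post? Two reasons. To record: write down the idea behind the solution. To reflect: work out why I failed to find that idea myself.
To be honest, when I first saw this problem I was baffled, because I did not know exactly what "median" meant, and the examples in the problem statement were not enough to pin it down. From the examples I could tell that the median must be the element that splits the merged array of the two sorted arrays into two parts with the same number of elements. The trouble is that in the examples given, the median always equals the average of its two immediate neighbors, which misled me: must the median always be the average of the elements immediately to its left and right?
So I googled it and found the answer on Wikipedia (source: the Wikipedia article "Median"):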
The median is the value separating the higher half of a data sample, a population, or a probability distribution, from the lower half. In simple terms, it may be thought of as the "middle" value of a data set.
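From Wikipedia's explanation and examples I learned how the median is defined in statistics: nothing requires it to be the average of its two immediate neighbors.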
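To make the definition concrete, here is a minimal reference sketch in Python that simply sorts the combined values and takes the middle. It runs in O((m+n) log(m+n)), so it is not a solution to the problem, only a way to see what the median is. The function name `median_by_merging` is my own, chosen for illustration.

```python
def median_by_merging(nums1, nums2):
    """Reference median: sort the combined values and take the middle."""
    merged = sorted(nums1 + nums2)
    total = len(merged)
    if total == 0:
        raise ValueError("both arrays are empty")
    mid = total // 2
    if total % 2 == 1:
        # odd count: the single middle element
        return float(merged[mid])
    # even count: average of the two middle elements
    return (merged[mid - 1] + merged[mid]) / 2.0


print(median_by_merging([1, 3], [2]))        # 2.0
print(median_by_merging([1, 2], [3, 4]))     # 2.5
print(median_by_merging([1, 2], [7, 8, 9]))  # 7.0
```

In the last call the merged array is [1, 2, 7, 8, 9]: the median is 7, while the average of its neighbors 2 and 8 is 5, which confirms that the median need not be the average of its neighbors.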
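With that settled, I could go back to the problem. The required time complexity is O(log (m+n)) and the task is to find the median element; O(log (m+n)) plus searching naturally made me think of binary search. I followed that lead looking for a solution, and that is as far as I got....
Working on this problem really showed me how much mathematical grounding matters for solving algorithm problems. Unfortunately I have had little contact with mathematics since graduating, and I had never appreciated so clearly how closely mathematics and computing problems are connected. Since I want to focus on improving my algorithm skills, studying mathematics is unavoidable. There should be study-note posts about that when the time comes.
Back to the problem itself. Since I had no clear idea how to solve it, I looked up the most-upvoted answer under the problem's Top Solutions. The original text of that answer follows:
-----------------------------------------------------------------Quoted answer: start-------------------------------------------------------------------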
To solve this problem, we need to understand "What is the use of median". In statistics, the median is used for dividing a set into two equal length subsets, that one subset is always greater than the other. If we understand the use of median for dividing, we are very close to the answer.
First let's cut A into two parts at a random position i:
      left_A             |        right_A
A[0], A[1], ..., A[i-1]  |  A[i], A[i+1], ..., A[m-1]
Since A has m elements, there are m+1 ways to cut it (i = 0 ~ m). And we know: len(left_A) = i, len(right_A) = m - i. Note: when i = 0, left_A is empty, and when i = m, right_A is empty.
In the same way, cut B into two parts at a random position j:
      left_B             |        right_B
B[0], B[1], ..., B[j-1]  |  B[j], B[j+1], ..., B[n-1]
Put left_A and left_B into one set, and put right_A and right_B into another set. Let's name them left_part and right_part :
      left_part          |        right_part
A[0], A[1], ..., A[i-1]  |  A[i], A[i+1], ..., A[m-1]
B[0], B[1], ..., B[j-1]  |  B[j], B[j+1], ..., B[n-1]
If we can ensure:
1) len(left_part) == len(right_part)
2) max(left_part) <= min(right_part)
then we divide all elements in {A, B} into two parts with equal length, and one part is always greater than the other. Then median = (max(left_part) + min(right_part))/2.
To ensure these two conditions, we just need to ensure:
(1) i + j == m - i + n - j (or: m - i + n - j + 1)
if n >= m, we just need to set: i = 0 ~ m, j = (m + n + 1)/2 - i
(2) B[j-1] <= A[i] and A[i-1] <= B[j]
(For simplicity, I presume A[i-1],B[j-1],A[i],B[j] are always valid even if i=0/i=m/j=0/j=n . I will talk about how to deal with these edge values at last.)
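(My own note, not part of the quoted answer.) To check the size bookkeeping in condition (1), here is a small Python sketch. It assumes the `/` in the formulas means integer division, which is how the accepted code uses it; `part_sizes` is a hypothetical helper introduced only for this illustration.

```python
def part_sizes(m, n, i):
    """Sizes of left_part and right_part when A is cut at i.

    j is derived from i with the formula j = (m + n + 1)/2 - i,
    using integer division.
    """
    j = (m + n + 1) // 2 - i
    left = i + j
    right = (m - i) + (n - j)
    return left, right


print(part_sizes(2, 2, 1))  # (2, 2): even total, both parts equal
print(part_sizes(1, 2, 0))  # (2, 1): odd total, left part has one extra
```

For an odd total the left part holds one more element than the right, which is why the median in that case is simply max(left_part).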
So, all we need to do is:
Searching i in [0, m], to find an object `i` that:
B[j-1] <= A[i] and A[i-1] <= B[j], ( where j = (m + n + 1)/2 - i )
And we can do a binary search following steps described below:
<1> Set imin = 0, imax = m, then start searching in [imin, imax]
<2> Set i = (imin + imax)/2, j = (m + n + 1)/2 - i
<3> Now we have len(left_part)==len(right_part). And there are only 3 situations
    that we may encounter:
    <a> B[j-1] <= A[i] and A[i-1] <= B[j]
        Means we have found the object `i`, so stop searching.
    <b> B[j-1] > A[i]
        Means A[i] is too small. We must `adjust` i to get `B[j-1] <= A[i]`.
        Can we `increase` i?
            Yes. Because when i is increased, j will be decreased.
            So B[j-1] is decreased and A[i] is increased, and `B[j-1] <= A[i]` may
            be satisfied.
        Can we `decrease` i?
            `No!` Because when i is decreased, j will be increased.
            So B[j-1] is increased and A[i] is decreased, and B[j-1] <= A[i] will
            never be satisfied.
        So we must `increase` i. That is, we must adjust the searching range to
        [i+1, imax]. So, set imin = i+1, and goto <2>.
    <c> A[i-1] > B[j]
        Means A[i-1] is too big. And we must `decrease` i to get `A[i-1]<=B[j]`.
        That is, we must adjust the searching range to [imin, i-1].
        So, set imax = i-1, and goto <2>.
When the object i is found, the median is:
max(A[i-1], B[j-1]) (when m + n is odd)
or (max(A[i-1], B[j-1]) + min(A[i], B[j]))/2 (when m + n is even)
Now let's consider the edge values i=0, i=m, j=0, j=n where A[i-1], B[j-1], A[i], B[j] may not exist. Actually this situation is easier than you think.
What we need to do is ensure that max(left_part) <= min(right_part). So, if i and j are not edge values (meaning A[i-1], B[j-1], A[i], B[j] all exist), then we must check both B[j-1] <= A[i] and A[i-1] <= B[j]. But if some of A[i-1], B[j-1], A[i], B[j] don't exist, then we don't need to check one (or both) of these two conditions. For example, if i=0, then A[i-1] doesn't exist, so we don't need to check A[i-1] <= B[j]. So, what we need to do is:
Searching i in [0, m], to find an object `i` that:
(j == 0 or i == m or B[j-1] <= A[i]) and
(i == 0 or j == n or A[i-1] <= B[j])
where j = (m + n + 1)/2 - i
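(My own note, not part of the quoted answer.) The edge-aware condition above can be written down almost literally. The sketch below tries every i from 0 to m in order, so it is a linear scan used only to illustrate the condition, not the O(log) binary search the answer goes on to describe. It assumes A is the shorter array (m <= n) so that j stays within [0, n]; `is_perfect_cut` and `find_cut_linear` are hypothetical names.

```python
def is_perfect_cut(A, B, i):
    """True if cutting A at i (and B at the derived j) satisfies both conditions."""
    m, n = len(A), len(B)
    j = (m + n + 1) // 2 - i
    # Short-circuit `or` skips any comparison whose operand does not exist.
    return ((j == 0 or i == m or B[j - 1] <= A[i]) and
            (i == 0 or j == n or A[i - 1] <= B[j]))


def find_cut_linear(A, B):
    """Return the first (i, j) that is a perfect cut; assumes len(A) <= len(B)."""
    m, n = len(A), len(B)
    for i in range(m + 1):
        if is_perfect_cut(A, B, i):
            return i, (m + n + 1) // 2 - i
    raise ValueError("no valid cut found; are both inputs sorted?")


print(find_cut_linear([2], [1, 3]))     # (1, 1)
print(find_cut_linear([1, 2], [3, 4]))  # (2, 0)
```

In the second call all of A lands in the left part and all of B in the right part (i = m, j = 0), which is exactly the kind of edge case the extra checks exist for.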
And in a searching loop, we will encounter only three situations:
<a> (j == 0 or i == m or B[j-1] <= A[i]) and
(i == 0 or j == n or A[i-1] <= B[j])
Means i is perfect, we can stop searching.
<b> j > 0 and i < m and B[j - 1] > A[i]
Means i is too small, we must increase it.
<c> i > 0 and j < n and A[i - 1] > B[j]
Means i is too big, we must decrease it.
Thanks to @Quentin.chen, who pointed out that i < m ==> j > 0 and i > 0 ==> j < n, because:
m <= n, i < m ==> j = (m+n+1)/2 - i > (m+n+1)/2 - m >= (2*m+1)/2 - m >= 0
m <= n, i > 0 ==> j = (m+n+1)/2 - i < (m+n+1)/2 <= (2*n+1)/2 <= n
So in situations <b> and <c>, we don't need to check whether j > 0 and whether j < n.
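(My own note, not part of the quoted answer.) The two implications can also be checked by brute force for small sizes. The Python sketch below assumes integer division in j = (m + n + 1)/2 - i; `implications_hold` is a hypothetical helper name.

```python
def implications_hold(m, n):
    """Check i < m ==> j > 0 and i > 0 ==> j < n for every cut i in [0, m]."""
    half_len = (m + n + 1) // 2
    for i in range(m + 1):
        j = half_len - i
        if i < m and j <= 0:
            return False
        if i > 0 and j >= n:
            return False
    return True


# Every pair with m <= n up to length 30 satisfies both implications.
all_ok = all(implications_hold(m, n)
             for m in range(31) for n in range(m, 31))
print(all_ok)                   # True
print(implications_hold(2, 1))  # False
```

The second call shows that the implications break when m > n (here i = 1 gives j = 1, which is not less than n = 1), so the swap that makes A the shorter array matters.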
Below is the accepted code:
def median(A, B):
    m, n = len(A), len(B)
    if m > n:
        # make A the shorter array so that j = half_len - i stays in [0, n]
        A, B, m, n = B, A, n, m
    if n == 0:
        raise ValueError

    # // is integer division; a plain / would produce floats on Python 3
    imin, imax, half_len = 0, m, (m + n + 1) // 2
    while imin <= imax:
        i = (imin + imax) // 2
        j = half_len - i
        if i < m and B[j-1] > A[i]:
            # i is too small, must increase it
            imin = i + 1
        elif i > 0 and A[i-1] > B[j]:
            # i is too big, must decrease it
            imax = i - 1
        else:
            # i is perfect
            if i == 0: max_of_left = B[j-1]
            elif j == 0: max_of_left = A[i-1]
            else: max_of_left = max(A[i-1], B[j-1])

            if (m + n) % 2 == 1:
                return max_of_left

            if i == m: min_of_right = B[j]
            elif j == n: min_of_right = A[i]
            else: min_of_right = min(A[i], B[j])

            return (max_of_left + min_of_right) / 2.0
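-----------------------------------------------------------------Quoted answer: end-------------------------------------------------------------------
As you can see, the author of this answer reasons about the problem very clearly and handles every detail carefully. I learned a lot from reading it.
To summarize the author's approach:
1. First, understand what the median means. Only once you fully grasp what the problem is asking can you solve it correctly and quickly;
2. Once the meaning of the median is clear, we also know that the median we are looking for has the following properties:
- it splits the merged array of the two sorted arrays (no merge actually happens, but picturing one helps) into left and right parts of equal size;
- every element in the left part is less than or equal to the median and every element in the right part is greater than or equal to it, that is, max(left_ele) <= min(right_ele);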
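#include <algorithm>
#include <vector>
using namespace std;

// C++ version of the approach above: binary search on the cut position i.
// Assumes both inputs are sorted and at least one of them is non-empty.
double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) {
    // Bind A to the shorter array by reference instead of copying:
    // copying costs O(m + n) and would defeat the O(log) bound.
    const bool firstIsShorter = nums1.size() <= nums2.size();
    const vector<int>& A = firstIsShorter ? nums1 : nums2;
    const vector<int>& B = firstIsShorter ? nums2 : nums1;
    int m = A.size(), n = B.size();
    int half_len = (m + n + 1) / 2;
    int iMin = 0, iMax = m;
    double result = 0.0;
    while (iMin <= iMax)
    {
        int i = (iMin + iMax) / 2;
        int j = half_len - i;
        if (i < m && B[j - 1] > A[i])
        {
            iMin = i + 1;   // i is too small, must increase it
        }
        else if (i > 0 && A[i - 1] > B[j])
        {
            iMax = i - 1;   // i is too big, must decrease it
        }
        else
        {
            // i is perfect: compute the largest element of the left part
            int max_of_left = 0;
            if (i == 0)
            {
                max_of_left = B[j - 1];
            }
            else if (j == 0)
            {
                max_of_left = A[i - 1];
            }
            else
            {
                max_of_left = max(A[i - 1], B[j - 1]);
            }
            if ((m + n) % 2 != 0)
            {
                result = max_of_left;   // odd total: the median is max(left_part)
                break;
            }
            // even total: also need the smallest element of the right part
            int min_of_right = 0;
            if (i == m)
            {
                min_of_right = B[j];
            }
            else if (j == n)
            {
                min_of_right = A[i];
            }
            else
            {
                min_of_right = min(A[i], B[j]);
            }
            // add as double so the sum of two large ints cannot overflow
            result = (static_cast<double>(max_of_left) + min_of_right) / 2.0;
            break;
        }
    }
    return result;
}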
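As a sanity check on the approach, here is a minimal Python sketch that cross-checks the partition-based binary search against `statistics.median` on random inputs. Note that this is a variant of my own, not the quoted code or the C++ above: instead of explicit edge branches it treats missing neighbors as minus or plus infinity. The names `median_with_sentinels` and `random_check` are hypothetical.

```python
import random
import statistics

NEG_INF, POS_INF = float("-inf"), float("inf")


def median_with_sentinels(nums1, nums2):
    """Binary search on the cut of the shorter array, with +/-inf sentinels."""
    A, B = (nums1, nums2) if len(nums1) <= len(nums2) else (nums2, nums1)
    m, n = len(A), len(B)
    if n == 0:
        raise ValueError("both arrays are empty")
    half_len = (m + n + 1) // 2
    imin, imax = 0, m
    while imin <= imax:
        i = (imin + imax) // 2
        j = half_len - i
        # Neighbors of the cut; out-of-range ones become sentinels.
        a_left = A[i - 1] if i > 0 else NEG_INF
        a_right = A[i] if i < m else POS_INF
        b_left = B[j - 1] if j > 0 else NEG_INF
        b_right = B[j] if j < n else POS_INF
        if b_left > a_right:
            imin = i + 1      # i is too small
        elif a_left > b_right:
            imax = i - 1      # i is too big
        else:
            max_of_left = max(a_left, b_left)
            if (m + n) % 2 == 1:
                return float(max_of_left)
            return (max_of_left + min(a_right, b_right)) / 2.0
    raise ValueError("inputs must be sorted")


def random_check(trials=1000, seed=0):
    """Compare against statistics.median on small random sorted arrays."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = sorted(rng.randint(-50, 50) for _ in range(rng.randint(0, 8)))
        b = sorted(rng.randint(-50, 50) for _ in range(rng.randint(0, 8)))
        if not a and not b:
            continue  # the median of nothing is undefined
        if median_with_sentinels(a, b) != statistics.median(a + b):
            return False
    return True


print(median_with_sentinels([1, 3], [2]))     # 2.0
print(median_with_sentinels([1, 2], [3, 4]))  # 2.5
print(random_check())
```

The sentinels trade a few extra comparisons for fewer branches; the explicit edge handling in the code above avoids mixing floats into integer comparisons, so which one reads better is a matter of taste.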
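This problem exposed my weakness in applying mathematical thinking to algorithm problems and my loose grip on the details of a problem. I need targeted practice to strengthen both.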