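Suppose I have an Oracle who can tell me the price of a stock over some future period, and my goal is to use that information to make as much money as possible.
1. My first idea is to buy at the lowest point and sell at the highest point.
Unfortunately, within a given period the lowest point may come after the highest point. In the figure below, for example, the lowest price is on day 7 and the highest is on day 1.
2. Staring at the figure above, it is easy to see that buying on day 7 and selling on day 11 gives the maximum profit of 43. Well, well, so could I find the lowest point and scan forward (right) for the highest point after it, and similarly find the highest point and scan backward (left) for the lowest point before it, then take whichever of the two gives the larger profit?
The figure below is another counterexample: there, buying on day 2 and selling on day 3 gives the maximum profit, and that trade involves neither the overall lowest price nor the overall highest price.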
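To make the failure of idea 2 concrete, here is a minimal sketch. The price list is an assumed example, chosen only so that the best trade is "buy on day 2, sell on day 3" as described above; it is not read off the original figure, and both function names are hypothetical.

```python
def heuristic_profit(prices):
    """Idea 2: anchor on the global low (scan right) or the global high (scan left)."""
    lo = prices.index(min(prices))
    hi = prices.index(max(prices))
    # Sell at the highest price on or after the global low.
    after_low = max(prices[lo:]) - prices[lo]
    # Buy at the lowest price on or before the global high.
    before_high = prices[hi] - min(prices[:hi + 1])
    return max(after_low, before_high)


def best_profit(prices):
    """Check every buy/sell pair where the buy day is not after the sell day."""
    n = len(prices)
    return max(prices[j] - prices[i] for i in range(n) for j in range(i, n))


prices = [10, 11, 7, 10, 6]  # assumed prices for days 0..4
print(heuristic_profit(prices))  # 1
print(best_profit(prices))       # 3 (buy on day 2 at 7, sell on day 3 at 10)
```

The heuristic only ever considers trades that touch the global minimum or the global maximum, so it misses a best trade that touches neither.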
3. The simplest approach: brute force
Can we do better?
A transformation
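The problem, then, is to find the run of consecutive days whose net change from its first day to its last day is the largest.
So let's stop looking at each day's price and look at each day's change in price instead.
The figure below shows the daily price changes for the first figure.
Our task now becomes: find a non-empty contiguous subarray of this array with the largest sum. We call it the maximum subarray.
Note that the maximum-subarray problem is only interesting when the array contains negative numbers. If every entry is non-negative, the whole array obviously gives the largest sum.
The brute-force approach, however, still has to try C(n-1, 2) = Θ(n^2) candidates, and that is before counting the time spent summing each subarray, so it is still Ω(n^2).
Here is pseudocode for the brute-force solution: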
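As a quick sanity check of the transformation, here is a small sketch. The change array `A` is the one used later in this post; the starting price of 100 is an assumption made only so that concrete prices can be rebuilt from the changes.

```python
A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]

start_price = 100  # assumed; any starting price gives the same changes
prices = [start_price]
for delta in A:
    prices.append(prices[-1] + delta)

# changes[k] is the change from day k to day k + 1
changes = [prices[k + 1] - prices[k] for k in range(len(prices) - 1)]
assert changes == A

buy_day, sell_day = 7, 11
profit = prices[sell_day] - prices[buy_day]
# The profit of a trade equals the sum of the changes between the two days.
print(profit, sum(changes[buy_day:sell_day]))  # 43 43
```

So maximizing profit over (buy day, sell day) pairs is the same as maximizing the sum over contiguous slices of the change array.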
MAX-SUBARRAY-BRUTE-FORCE(A)
    n = A.length
    max_so_far = -∞
    begin = end = 0
    for i = 0 to n - 1
        for j = i to n - 1
            sum = 0    // the next partial sum we are computing
            for k = i to j
                sum = sum + A[k]
            if sum > max_so_far
                max_so_far = sum
                begin = i
                end = j
    return (max_so_far, begin, end)
Python code
def max_subarray_brute_force(A):
    """Try every subarray A[i..j]; return (best_sum, begin, end)."""
    max_so_far = float("-inf")  # smaller than any possible sum
    begin = end = 0
    n = len(A)
    for i in range(n):
        for j in range(i, n):
            total = 0  # the next partial sum we are computing
            for k in range(i, j + 1):
                total += A[k]
            if total > max_so_far:
                max_so_far = total
                begin = i
                end = j
    return (max_so_far, begin, end)


A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
result = max_subarray_brute_force(A)
print(result)  # (43, 7, 10) is expected
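Did you notice that the innermost for loop in the code above spends a lot of time recomputing subarray sums, starting from scratch every single time?
I hate doing redundant work, so let's define something called prefix sums: the sum of the array's entries up to index t. Each prefix sum S_t is
S_t = A[0] + A[1] + ... + A[t]
Once all the prefix sums have been computed in a single pass, the sum of any subarray A[i..j] is available in constant time as S_j - S_(i-1), where S_(-1) is taken to be 0.
The pseudocode can then be rewritten as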
MAX-SUBARRAY-PREFIX-SUM(A)
    n = A.length
    S[0] = A[0]
    for i = 1 to n - 1
        S[i] = S[i - 1] + A[i]
    max_so_far = -∞
    begin = end = 0
    for i = 0 to n - 1
        for j = i to n - 1
            // sum of A[i..j]; when i = 0 there is nothing to subtract
            if i == 0
                sum = S[j]
            else
                sum = S[j] - S[i - 1]
            if sum > max_so_far
                max_so_far = sum
                begin = i
                end = j
    return (max_so_far, begin, end)
So with prefix sums, the brute-force solution runs in Θ(n^2) time.
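The prefix-sum idea can be sketched in Python as follows. To avoid a special case for i = 0, this sketch shifts the prefix array by one position (`P[t]` is the sum of the first t entries, with `P[0] = 0`); that shift and the function name are my own choices, not part of the pseudocode above.

```python
def max_subarray_prefix_sum(A):
    """Theta(n^2) search using prefix sums; returns (best_sum, begin, end)."""
    n = len(A)
    # P[t] = A[0] + ... + A[t - 1], so P[0] = 0 and sum(A[i..j]) = P[j + 1] - P[i]
    P = [0] * (n + 1)
    for t in range(n):
        P[t + 1] = P[t] + A[t]

    best = (float("-inf"), 0, 0)
    for i in range(n):
        for j in range(i, n):
            total = P[j + 1] - P[i]  # constant-time subarray sum
            if total > best[0]:
                best = (total, i, j)
    return best


A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(max_subarray_prefix_sum(A))  # (43, 7, 10)
```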
4. Divide-and-Conquer
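Suppose we want to find a maximum subarray of A[low..high]. Divide-and-conquer means splitting this array into two halves (or more parts): find the midpoint mid and turn to the subarrays A[low..mid] and A[mid+1..high]. But is it enough to solve just these two subproblems, A[low..mid] and A[mid+1..high]? Not quite, because the answer may straddle the midpoint.
Note that any contiguous subarray A[i..j] of A[low..high] must lie in exactly one of the following places:
- entirely in the subarray A[low..mid], so that low <= i <= j <= mid.
- entirely in the subarray A[mid+1..high], so that mid < i <= j <= high.
- crossing the midpoint mid, so that low <= i <= mid < j <= high.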
MAX-CROSSING-SUBARRAY(A, low, mid, high)    // Θ(high - low + 1) = Θ(n)
    /* find a maximum subarray of the left half, in the form A[i..mid];
       this subarray must contain A[mid] */
    left-max = -∞    // holds the greatest sum found so far
    left-sum = 0     // holds the sum of the entries in A[i..mid]
    for i = mid downto low
        left-sum = left-sum + A[i]
        if left-sum > left-max
            left-max = left-sum
            left-index = i
    /* similar to the left half, in the form A[mid+1..j] */
    right-max = -∞
    right-sum = 0
    for j = mid + 1 to high
        right-sum = right-sum + A[j]
        if right-sum > right-max
            right-max = right-sum
            right-index = j
    return (left-max + right-max, left-index, right-index)

MAX-SUBARRAY-DIVIDE-AND-CONQUER(A, low, high)
    // base case: only one element
    if high == low
        return (A[low], low, high)
    else
        mid = ⌊(low + high) / 2⌋
        (left-sum, left-low, left-high) = MAX-SUBARRAY-DIVIDE-AND-CONQUER(A, low, mid)
        (right-sum, right-low, right-high) = MAX-SUBARRAY-DIVIDE-AND-CONQUER(A, mid + 1, high)
        (cross-sum, cross-low, cross-high) = MAX-CROSSING-SUBARRAY(A, low, mid, high)
        if left-sum >= right-sum and left-sum >= cross-sum
            return (left-sum, left-low, left-high)
        elseif right-sum >= left-sum and right-sum >= cross-sum
            return (right-sum, right-low, right-high)
        else
            return (cross-sum, cross-low, cross-high)
def max_crossing_subarray(A, low, mid, high):
    """Best subarray that crosses mid, as (sum, left_index, right_index)."""
    left_max = right_max = float("-inf")  # the max sum so far on each side
    left_sum = right_sum = 0  # the running sum on each side
    # left_index and right_index mark where left_max and right_max were reached;
    # both loops run at least once because low <= mid < high
    # iterate from mid down to low
    for i in range(mid, low - 1, -1):
        left_sum += A[i]
        if left_sum > left_max:
            left_max = left_sum
            left_index = i
    # iterate from mid+1 up to high
    for i in range(mid + 1, high + 1):
        right_sum += A[i]
        if right_sum > right_max:
            right_max = right_sum
            right_index = i
    return (left_max + right_max, left_index, right_index)


def max_subarray_divide_and_conquer(A, low, high):
    if low == high:
        return (A[low], low, high)
    else:
        mid = (low + high) // 2  # integer division
        left = max_subarray_divide_and_conquer(A, low, mid)  # (left_max, left_low, left_high)
        right = max_subarray_divide_and_conquer(A, mid + 1, high)  # (right_max, right_low, right_high)
        cross = max_crossing_subarray(A, low, mid, high)  # (cross_max, cross_low, cross_high)
        # trace output for each call
        print("low mid high =", low, mid, high)
        print("left, right, cross, max", left, right, cross, max(left, right, cross))
        # tuples compare by their first element (the sum) first
        return max(left, right, cross)


A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(max_subarray_divide_and_conquer(A, 0, len(A) - 1))  # (43, 7, 10) is expected
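The running time is the same as merge sort's: two subproblems of half the size, plus Θ(n) time for the crossing case, with every other step costing Θ(1),
T(n) = 2T(n/2) + Θ(n) + Θ(1) = 2T(n/2) + Θ(n)
with T(1) = Θ(1), which solves to T(n) = Θ(n lg n).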
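To get a feel for where n lg n comes from, here is a rough numeric check. It replaces the Θ terms with concrete costs, which is purely an assumption for illustration: the base case costs 1 and the combine step costs exactly n. Under those assumptions the recurrence has the closed form n lg n + n when n is a power of two.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """T(1) = 1, T(n) = 2 T(n / 2) + n, for n a power of two (assumed costs)."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n


for k in range(11):
    n = 2 ** k
    # closed form under the assumed costs: n * lg(n) + n, with lg(n) = k
    assert T(n) == n * k + n

print(T(8), T(1024))  # 32 11264
```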
5. Kadane's algorithm (Linear-time Algorithm)
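One nice touch in the code: it never tests whether the previous max_ending_here is negative. It simply adds the current value x and takes the maximum, because whenever the previous value is non-negative the addition has to be done anyway, and whenever it is negative the max picks x on its own.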
Kadane(A):
    max_ending_here = max_so_far = A[0]
    for i = 1 to A.length - 1
        x = A[i]
        max_ending_here = max(x, max_ending_here + x)
        max_so_far = max(max_so_far, max_ending_here)
    return max_so_far
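Although the code makes only one pass, you can think of it like this: first go through the array once and compute every max_ending_here (the partial maximums), then pick the largest of those partial maximums (max_so_far).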
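That two-step view can be written out literally. The sketch below keeps every partial maximum in a list before taking the largest; it uses O(n) extra space and is meant only to illustrate the idea, and the function name is hypothetical.

```python
def kadane_two_pass(A):
    """Step 1: best sum ending at each index. Step 2: the largest of those."""
    ending_here = [A[0]]
    for x in A[1:]:
        # either start a new subarray at x or extend the best one ending just before it
        ending_here.append(max(x, ending_here[-1] + x))
    return max(ending_here)


A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(kadane_two_pass(A))  # 43
```

Folding the second step into the loop gives the usual single-pass version with constant extra space.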
Kadane(A):
    max_ending_here = max_so_far = A[0]
    begin = end = begin_temp = 0
    for i = 1 to A.length - 1
        if max_ending_here < 0
            // a negative running sum can only hurt: start over at i
            max_ending_here = A[i]
            begin_temp = i
        else
            max_ending_here = max_ending_here + A[i]
        // calculate max_so_far
        if max_ending_here > max_so_far
            max_so_far = max_ending_here
            begin = begin_temp
            end = i
    return (max_so_far, begin, end)
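Here is the index-tracking version as a Python sketch. It starts the loop at index 1, since A[0] already seeds the running values, and it assumes a non-empty list.

```python
def kadane(A):
    """Linear-time maximum subarray; returns (best_sum, begin, end). Assumes A is non-empty."""
    max_ending_here = max_so_far = A[0]
    begin = end = begin_temp = 0
    for i in range(1, len(A)):
        if max_ending_here < 0:
            # a negative running sum can only hurt: restart at i
            max_ending_here = A[i]
            begin_temp = i
        else:
            max_ending_here += A[i]
        if max_ending_here > max_so_far:
            max_so_far = max_ending_here
            begin, end = begin_temp, i
    return (max_so_far, begin, end)


A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(kadane(A))  # (43, 7, 10)
```

On an all-negative input such as `[-3, -1, -2]` it returns the single largest element, `(-1, 1, 1)`, which is the right answer for the non-empty variant of the problem.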