最近烦恼了一阵关于最大全1子矩阵的问题。流行的解法似乎是把0设为大负数然后就可以用求最大和子矩阵的算法了。
……不过总感觉有很大优化改进的余地,尚在思考中……
anyway,回到最大全1子矩阵。自己想了另一个算法:
比如对于矩阵:
0 1 1 0 0
0 1 0 1 1
1 0 1 1 1
0 1 1 1 1
1 1 1 1 1
首先处理每一行,对连续的1序列逐个累加,逢0reset:
比如1 0 1 1 1就变换为 1 0 1 2 3
0 1 1 0 0 就变换为 0 1 2 0 0
这样每个位置的数即表示从此位置往左共有多少连续的1
然后处理变换过后的每一列,对每个数,若〉0,则往上往下分别遍历至第一个小于其值的数为止,计算行数,行数x数值即是一个可能的全1子矩阵的面积。找出最大的即可。
比如对于:
0
1
3
2
0
4
对于第二行1,向上向下遍历,>=1的行共3行,所以此子矩阵宽度1,高度3,面积为3
对于第三行3,向上向下遍历,>=3的行共1行,所以此子矩阵宽度3,高度1,面积为3
对于第四行2,向上向下遍历,>=2的行共2行,所以此子矩阵宽度2,高度2,面积为4
等等。
复杂度分析:
从时间复杂度上来说此算法和最大和算法(i=1...矩阵最大行数;j=i...矩阵最大行数,对于包含行i至j子矩阵分算每列和得到一维数组然后动态规划求最大连续和,遍历所有可能 i , j 组合)并无区别,设矩阵行数H,列数W,其复杂度应该都为O(H^2 * W)。但是以上的方法比较而言避免了很多不必要的比较和计算,所以应该快一些。
缺点是占用更多空间。
python编码验证,以上方法的运算时间是最大和方法的30~50分之一。当矩阵尺寸为300时,最大和方法运算时间200~300秒+,以上方法运算时间5秒+。
代码如下:
首先是生成测试矩阵的代码:
import random
import time
import string
matrix_size = 8
one_percentage = 0.85
random.seed()
example_matrix = []
for i in range(matrix_size):
row = []
for j in range(matrix_size):
v = 1
if random.random() > one_percentage:
v = 0
row.append(v)
example_matrix.append(row[:])
然后是记录比较/运算次数的代码:
comp_count = 0
add_count = 0
mul_count = 0
swap_count = 0
is_counting = True
#is_counting = False
start_time = None
def reset():
if is_counting:
global comp_count
global add_count
global mul_count
global swap_count
comp_count = 0
add_count = 0
mul_count = 0
swap_count = 0
global start_time
start_time = time.clock()
def report():
global start_time
end_time = time.clock()
print("---Process time: %g sec(s)---" % \
(end_time - start_time,))
if is_counting:
global comp_count
global add_count
global mul_count
global swap_count
print 'reporting:'
print 'compare:', comp_count
print 'add:', add_count
print 'mul:', mul_count
print 'swap:', swap_count
print 'total calculation:', add_count + mul_count + swap_count
def comp(a, b):
if is_counting:
global comp_count
comp_count += 1
if a > b:
return 1
if a < b:
return -1
return 0
def add(a, b):
if is_counting:
global add_count
add_count += 1
return a+b
def mul(a, b):
if is_counting:
global mul_count
mul_count += 1
return a*b
# Get continuous sub sequence with max sum
def getMax(l):
start_i = 0
sum = l[0]
# max_start: inclusive
max_start = 0
# max_end: exclusive
max_end = 0
max_sum = l[0]
for i in range(1, len(l)):
if comp(sum, 0) < 0:
sum = l[i]
start_i = i
else:
sum = add(sum, l[i])
if comp(sum, max_sum) > 0:
max_start = start_i
# max_end: exclusive
max_end = i + 1
max_sum = sum
return max_start, max_end, max_sum
#reset()
#
#t = [31, -41, 59, 26, -53, 58, 97, -93, -23, 84]
#print getMax(t)
#
#report()
# start_row and end_row: start_row inclusive, end_row exclusive
def getMatrixColumnSum(matrix, start_row, end_row):
#h = len(matrix)
w = len(matrix[0])
sum = matrix[start_row][:]
for i in range(start_row + 1, end_row):
for j in range(0, w):
sum[j] = add(sum[j], matrix[i][j])
return sum
# well actually can optimize it a bit
def addTwoRows(row1, row2):
w = len(row1)
for i in range(w):
row1[i] = add(row1[i], row2[i])
def getMaxSubMatrixSum(matrix):
h = len(matrix)
# w = len(matrix[0])
max_sum = matrix[0][0]
# for now just testing so not going to record actual sub matrix area,
# only output max sum.
for i in range(h):
column_sum = matrix[i][:]
for j in range(i, h):
if comp(j, i) > 0:
addTwoRows(column_sum, matrix[j])
column_max_sum = getMax(column_sum)[2]
if comp(column_max_sum, max_sum) > 0:
max_sum = column_max_sum
return max_sum
#for i in range(12):
# for j in range(11):
# if example_matrix[i][j] <= 0:
# example_matrix[i][j] = - 132
#
#reset()
#print getMaxSubMatrixSum(example_matrix)
#report()
然后是较快的算法:
def process_row(row):
processed_row = []
accu = 0
for i in range(len(row)):
if comp(row[i], 0) == 0:
processed_row.append(0)
accu = 0
else:
accu = add(accu, row[i])
processed_row.append(accu)
return processed_row
def getMaxSubMatrixForSingleColumnFromProcessedMatrix(p_m, column_num):
h = len(p_m)
row_sum = p_m[0][column_num]
max_sum = row_sum
max_row_num = [0]*h
for i in range(h):
if comp(p_m[i][column_num], 0) == 0:
# pass
continue
else:
if comp(max_row_num[i], 0) > 0:
# pass: previously examined same value
continue
max_row_num[i] = 1
same_value_rows = []
# can merge these 2 loops into 1
# but for now leave it be.
for j in range(i-1, -1, -1):
if comp(p_m[j][column_num], p_m[i][column_num]) < 0:
break;
else:
max_row_num[i] = add(max_row_num[i], 1)
if comp(p_m[j][column_num], p_m[i][column_num]) == 0:
same_value_rows.append(j)
for j in range(i+1, h):
if comp(p_m[j][column_num], p_m[i][column_num]) < 0:
break;
else:
max_row_num[i] = add(max_row_num[i], 1)
if comp(p_m[j][column_num], p_m[i][column_num]) == 0:
same_value_rows.append(j)
for j in same_value_rows:
max_row_num[j] = max_row_num[i]
sum = mul(max_row_num[i], p_m[i][column_num])
if comp(sum, max_sum) > 0:
max_sum = sum
return max_sum
def getMaxSubMatrix(matrix):
p_m = []
for row in matrix:
p_m.append(process_row(row))
max_sum = getMaxSubMatrixForSingleColumnFromProcessedMatrix(p_m, 0)
for i in range(1, len(matrix[0])):
sum = getMaxSubMatrixForSingleColumnFromProcessedMatrix(p_m, i)
if comp(sum, max_sum) > 0:
max_sum = sum
return max_sum
#print 'new method:'
#reset()
#print getMaxSubMatrix(example_matrix)
#report()
#
#
#
#
#
#for i in range(len(example_matrix)):
# for j in range(len(example_matrix[0])):
# if example_matrix[i][j] <= 0:
# example_matrix[i][j] = - 132
#
#print
#print 'old method:'
#reset()
#print getMaxSubMatrixSum(example_matrix)
#report()