Problem
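An M×N matrix A is partitioned into blocks of size s×t, and an N×P matrix B into blocks of size t×v,
so the product AB can be computed block by block.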
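The block identity behind this partitioning can be written out as follows. This is a sketch of the standard block-product formula, not a formula recovered from the original text; the block indices I, J, K are names introduced here for illustration, with A_{IJ} an s×t block, B_{JK} a t×v block and C_{IK} an s×v block of C = AB.

```latex
C_{IK} = \sum_{J=0}^{\lceil N/t \rceil - 1} A_{IJ}\, B_{JK},
\qquad
c_{ik} = \sum_{j=0}^{N-1} a_{ij}\, b_{jk}
```

Each term A_{IJ} B_{JK} depends on only one block of A and one block of B, so each term can be computed independently.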
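Input
Each record has the form <M><i><j><m_ij> (comma-separated in the sample below), where M is the matrix name, i and j are the row and column indices of the element, and m_ij is the non-zero value at row i, column j of M.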
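Suppose the matrices A and B are (reconstructed from the sample records; entries that do not appear are 0):
A = [ 0  1  2  3  4 ]
    [ 5  6  7  8  9 ]
B = [  0   1   2 ]
    [  3   4   5 ]
    [  6   7   8 ]
    [  9  10  11 ]
    [ 12  13  14 ]
The sample input data is then: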
A,0,1,1.0
A,0,2,2.0
A,0,3,3.0
A,0,4,4.0
A,1,0,5.0
A,1,1,6.0
A,1,2,7.0
A,1,3,8.0
A,1,4,9.0
B,0,1,1.0
B,0,2,2.0
B,1,0,3.0
B,1,1,4.0
B,1,2,5.0
B,2,0,6.0
B,2,1,7.0
B,2,2,8.0
B,3,0,9.0
B,3,1,10.0
B,3,2,11.0
B,4,0,12.0
B,4,1,13.0
B,4,2,14.0
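As a minimal sketch of how records in this format might be read, the snippet below parses comma-separated lines into one sparse dictionary per matrix. The function name `parse_records` and the dictionary layout are assumptions made for illustration; they are not part of the original Hadoop code.

```python
def parse_records(lines):
    """Parse 'name,i,j,value' lines into {name: {(i, j): value}}."""
    matrices = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, i, j, value = line.split(",")
        matrices.setdefault(name, {})[(int(i), int(j))] = float(value)
    return matrices


# A few of the sample records from above
sample = """A,0,1,1.0
A,1,0,5.0
B,0,1,1.0
B,4,2,14.0"""

matrices = parse_records(sample.splitlines())
print(matrices["A"][(0, 1)])
```

Only non-zero entries are stored, so a missing key stands for 0.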
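Output
Each record has the form <i><j><m_ij>: the row and column indices i, j of the result matrix and the corresponding value m_ij.
The output for the sample data above is: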
0,0,90.0
0,1,100.0
0,2,110.0
1,0,240.0
1,1,275.0
1,2,310.0
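The sample output can be cross-checked with a direct, non-distributed sparse multiplication. The sketch below is a verification aid only, not the map-reduce algorithm. It rebuilds the sample matrices from the pattern a_ij = 5i + j and b_jk = 3j + k, which matches the records listed above, and `sparse_multiply` is a name chosen here for illustration.

```python
from collections import defaultdict

# Sample matrices stored sparsely as {(row, col): value}; zero entries are omitted,
# as in the input format.
A = {(i, j): float(5 * i + j) for i in range(2) for j in range(5) if 5 * i + j != 0}
B = {(j, k): float(3 * j + k) for j in range(5) for k in range(3) if 3 * j + k != 0}


def sparse_multiply(A, B):
    """Multiply two sparse matrices given as {(row, col): value} dictionaries."""
    C = defaultdict(float)
    for (i, j), a in A.items():
        for (j2, k), b in B.items():
            if j == j2:  # inner indices must match
                C[(i, k)] += a * b
    return dict(C)


C = sparse_multiply(A, B)
for (i, k) in sorted(C):
    print(f"{i},{k},{C[(i, k)]}")
```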
Pseudocode
The blocked multiplication AB is computed in two map-reduce passes.
Step 1
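map(key, value):
    // value is ("A", i, j, a_ij) or ("B", j, k, b_jk)
    // "/" is integer division; s, t, v are the block dimensions, M and P the matrix dimensions
    if value[0] == "A":
        i = value[1]
        j = value[2]
        a_ij = value[3]
        // send a_ij to every block column of the result it contributes to
        for k_per_v = 0 to ceil(P/v) - 1:
            emit((i/s, j/t, k_per_v), ("A", i%s, j%t, a_ij))
    else:
        j = value[1]
        k = value[2]
        b_jk = value[3]
        // send b_jk to every block row of the result it contributes to
        for i_per_s = 0 to ceil(M/s) - 1:
            emit((i_per_s, j/t, k/v), ("B", j%t, k%v, b_jk))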
reduce(key, values):
    // key is (i_per_s, j_per_t, k_per_v)
    // values is a list of ("A", i_mod_s, j_mod_t, a_ij) and ("B", j_mod_t, k_mod_v, b_jk)
    list_A = [(i_mod_s, j_mod_t, a_ij) for (name, i_mod_s, j_mod_t, a_ij) in values if name == "A"]
    list_B = [(j_mod_t, k_mod_v, b_jk) for (name, j_mod_t, k_mod_v, b_jk) in values if name == "B"]
    hash = {}  // missing entries count as 0
    for a in list_A:
        for b in list_B:
            if a[1] == b[0]: // a.j_mod_t == b.j_mod_t
                hash[(a[0], b[1])] += a[2]*b[2]
    // the partial sum is named x so that it does not shadow the block width v
    for {(i_mod_s, k_mod_v): x} in hash:
        emit((key[0]*s + i_mod_s, key[2]*v + k_mod_v), x)
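The first pass can be simulated in plain Python, without Hadoop, to see which partial sums it produces. This is a sketch under stated assumptions: the block sizes s = t = v = 2 are chosen arbitrarily for illustration, the names `map1` and `reduce1` are hypothetical, the shuffle is emulated with a dictionary, and the sample records are regenerated from the pattern a_ij = 5i + j, b_jk = 3j + k.

```python
from collections import defaultdict
from math import ceil

M, N, P = 2, 5, 3   # dimensions of the sample matrices
s, t, v = 2, 2, 2   # block sizes (assumed for this illustration)


def map1(record):
    """Route each element to every block product that needs it."""
    name, r, c, val = record
    if name == "A":
        i, j = r, c
        for k_blk in range(ceil(P / v)):
            yield (i // s, j // t, k_blk), ("A", i % s, j % t, val)
    else:
        j, k = r, c
        for i_blk in range(ceil(M / s)):
            yield (i_blk, j // t, k // v), ("B", j % t, k % v, val)


def reduce1(key, values):
    """Multiply one block of A by one block of B and emit partial sums."""
    a_items = [(x[1], x[2], x[3]) for x in values if x[0] == "A"]
    b_items = [(x[1], x[2], x[3]) for x in values if x[0] == "B"]
    partial = defaultdict(float)
    for i_mod, j_mod, a in a_items:
        for j_mod2, k_mod, b in b_items:
            if j_mod == j_mod2:
                partial[(i_mod, k_mod)] += a * b
    for (i_mod, k_mod), x in partial.items():
        # convert block-local indices back to global indices
        yield (key[0] * s + i_mod, key[2] * v + k_mod), x


records = [("A", i, j, float(5 * i + j)) for i in range(M) for j in range(N) if 5 * i + j]
records += [("B", j, k, float(3 * j + k)) for j in range(N) for k in range(P) if 3 * j + k]

groups = defaultdict(list)  # emulates the shuffle phase
for rec in records:
    for key, value in map1(rec):
        groups[key].append(value)

partials = [out for key, vals in groups.items() for out in reduce1(key, vals)]
print(sorted(x for key, x in partials if key == (0, 0)))
```

With these block sizes the inner dimension is split into three block columns, so each result index receives several partial sums that still have to be added together.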
Step 2
map(key, value):
    emit(key, value)
reduce(key, values):
    result = 0
    for value in values:
        result += value
    emit(key, result)
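The second pass is an identity map followed by a sum per key. The sketch below simulates it in plain Python. The hard-coded `partials` are the partial sums that result indices (0,0) and (0,1) of the sample would receive if the inner dimension were split into blocks of width t = 2 (an assumption made for this illustration), and `map2` and `reduce2` are hypothetical names.

```python
from collections import defaultdict


def map2(key, value):
    """Identity map: pass each partial sum through unchanged."""
    yield key, value


def reduce2(key, values):
    """Add up all partial sums that share the same (i, k) index."""
    yield key, sum(values)


# Partial sums for (0,0) and (0,1), assuming inner blocks of width t = 2
partials = [((0, 0), 3.0), ((0, 0), 39.0), ((0, 0), 48.0),
            ((0, 1), 4.0), ((0, 1), 44.0), ((0, 1), 52.0)]

groups = defaultdict(list)  # emulates the shuffle phase
for key, value in partials:
    for k2, v2 in map2(key, value):
        groups[k2].append(v2)

result = dict(out for key, vals in groups.items() for out in reduce2(key, vals))
print(result)  # {(0, 0): 90.0, (0, 1): 100.0}
```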
Hadoop code
import java.io.IOException;
import java