Calling Spark MLlib from Python: how do you multiply two matrices with Spark MLlib, including local matrix multiplication, distributed matrix multiplication, and local dis...

weixin_39647499

Published 2020-12-06 12:02:35


Tags: python调用sparkmlib (calling Spark MLlib from Python)

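Matrix multiplication can be computed with either the inner-product method or the outer-product method. By granularity of the computation, it can also be divided into element-wise (ordinary) and blocked multiplication.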

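Inner-product method:

Drawback: too much data has to be shuffled, which hurts performance.

However, this algorithm works well when one of the two matrices is small. You can broadcast the small matrix to every compute node, so no shuffle is needed and data locality is fully exploited.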
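As a rough illustration of the inner-product method, here is a minimal pure-Python sketch (no Spark involved). The function name `inner_product_multiply` is made up for this example. Each row of A plays the role of one task, and the whole of B is assumed to be available to every task, which is what broadcasting a small matrix achieves.

```python
from typing import List

Matrix = List[List[float]]


def inner_product_multiply(a: Matrix, b: Matrix) -> Matrix:
    """Compute C = A * B where C[i][j] is the dot product of row i of A and column j of B."""
    if a and len(a[0]) != len(b):
        raise ValueError("columns of A must equal rows of B")
    # Stand-in for the broadcast: every row of A sees all columns of B locally.
    b_columns = list(zip(*b))
    return [
        [sum(x * y for x, y in zip(row, col)) for col in b_columns]
        for row in a
    ]


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
print(inner_product_multiply(a, b))
```

Because every output row depends only on one row of A plus the full B, the rows of A never need to leave the partition they already live in.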

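Outer-product method:

Drawback: unless the matrices are sparse, the intermediate matrices produced during the computation are very large.

Its strength is multiplying sparse matrices.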
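The outer-product method can be sketched in pure Python as follows. This is only an illustration under assumptions: sparse matrices are represented as dictionaries mapping `(row, col)` to a non-zero value, and `outer_product_multiply` is a hypothetical name. For each shared index k, column k of A is combined with row k of B, and the resulting partial products are summed into C.

```python
from collections import defaultdict
from typing import Dict, Tuple

SparseMatrix = Dict[Tuple[int, int], float]  # (row, col) -> non-zero value


def outer_product_multiply(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    """Compute C = sum over k of outer(column k of A, row k of B)."""
    # Group the non-zeros of A by column index and those of B by row index.
    a_cols = defaultdict(list)
    for (i, k), value in a.items():
        a_cols[k].append((i, value))
    b_rows = defaultdict(list)
    for (k, j), value in b.items():
        b_rows[k].append((j, value))

    c = defaultdict(float)
    for k, col_entries in a_cols.items():
        # Only non-zero pairs contribute, which is why sparsity keeps this cheap.
        for i, a_value in col_entries:
            for j, b_value in b_rows.get(k, []):
                c[(i, j)] += a_value * b_value
    return dict(c)


a = {(0, 0): 1.0, (1, 1): 2.0}
b = {(0, 1): 3.0, (1, 0): 4.0}
print(outer_product_multiply(a, b))
```

With dense inputs every k produces a full-size partial matrix, which is the large intermediate result mentioned above. With sparse inputs only the non-zero pairs are ever materialised.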

Blocked computation:
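To make the blocked approach concrete, here is a small pure-Python sketch. It is not Spark code. All helper names (`to_blocks`, `block_multiply`, `from_blocks`) are hypothetical, and the matrices are assumed to be dense so that every block exists. Block C(i, j) is the sum over k of A(i, k) * B(k, j).

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Blocks = Dict[Tuple[int, int], Matrix]  # (blockRow, blockCol) -> local block


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain local multiplication, used for each pair of blocks."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def to_blocks(m: Matrix, rows_per_block: int, cols_per_block: int) -> Blocks:
    blocks = {}
    for r in range(0, len(m), rows_per_block):
        for c in range(0, len(m[0]), cols_per_block):
            key = (r // rows_per_block, c // cols_per_block)
            blocks[key] = [row[c:c + cols_per_block] for row in m[r:r + rows_per_block]]
    return blocks


def block_multiply(a_blocks: Blocks, b_blocks: Blocks) -> Blocks:
    c_blocks: Blocks = {}
    for (i, k), a_blk in a_blocks.items():
        for (k2, j), b_blk in b_blocks.items():
            if k != k2:
                continue  # only A(i, k) and B(k, j) with matching k are multiplied
            product = matmul(a_blk, b_blk)
            c_blocks[(i, j)] = add(c_blocks[(i, j)], product) if (i, j) in c_blocks else product
    return c_blocks


def from_blocks(blocks: Blocks) -> Matrix:
    """Reassemble a dense matrix; assumes every block of the grid is present."""
    n_block_rows = max(i for i, _ in blocks) + 1
    n_block_cols = max(j for _, j in blocks) + 1
    result = []
    for i in range(n_block_rows):
        for r in range(len(blocks[(i, 0)])):
            row = []
            for j in range(n_block_cols):
                row.extend(blocks[(i, j)][r])
            result.append(row)
    return result


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
c = from_blocks(block_multiply(to_blocks(a, 2, 2), to_blocks(b, 2, 2)))
print(c == matmul(a, b))
```

The column block size of A has to match the row block size of B, which mirrors the `colsPerBlock == other.rowsPerBlock` check in the Spark code.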

Analysis of Spark's matrix multiplication code:

def multiply(other: BlockMatrix): BlockMatrix = {
  .......
  if (colsPerBlock == other.rowsPerBlock) {
    // The GridPartitioner covers the grid of numRowBlocks * other.numColBlocks result blocks;
    // the third argument is the suggested number of partitions.
    val resultPartitioner = GridPartitioner(numRowBlocks, other.numColBlocks,
      math.max(blocks.partitions.length, other.blocks.partitions.length))
    // leftDestinations and rightDestinations both have type Map[(Int, Int), Set[Int]]:
    // for every block of the left and right matrices, they record which partitions
    // the block will be shuffled to.
    val (leftDestinations, rightDestinations) = simulateMultiply(other, resultPartitioner)
    // Each block of A must be multiplied with the corresponding blocks in the columns of B.
    val flatA = blocks.flatMap { case ((blockRowIndex, blockColIndex), block) =>
      val destinations = leftDestinations.getOrElse((blockRowIndex, blockColIndex), Set.empty)
      destinations.map(j => (j, (blockRowIndex, blockColIndex, block)))
    }
    // Each block of B must be multiplied with the corresponding blocks in each row of A.
    val flatB = other.blocks.flatMap { case ((blockRowIndex, blockColIndex), block) =>
      val destinations = rightDestinations.getOrElse((blockRowIndex, blockColIndex), Set.empty)
      destinations.map(j => (j, (blockRowIndex, blockColIndex, block)))
    }
    // cogroup uses resultPartitioner, so when computing A * B = C, all blocks of A and B
    // needed for a given block of C end up in the same partition. reduceByKey uses the same
    // partitioner, so it can combine map-side (combineByKey): in effect it only adds
    // partial products that are already co-located instead of moving blocks around.
    val newBlocks = flatA.cogroup(flatB, resultPartitioner).flatMap { case (pId, (a, b)) =>
      a.flatMap { case (leftRowIndex, leftColIndex, leftBlock) =>
        b.filter(_._1 == leftColIndex).map { case (rightRowIndex, rightColIndex, rightBlock) =>
          // The local multiplication uses the matrix routines from the com.github.fommil.netlib
          // package; the matrix addition uses the one provided by the scalanlp (Breeze) package.
          val C = rightBlock match {
            case dense: DenseMatrix => leftBlock.multiply(dense)
            case sparse: SparseMatrix => leftBlock.multiply(sparse.toDense)
            case _ =>
              throw new SparkException(s"Unrecognized matrix type ${rightBlock.getClass}.")
          }
          ((leftRowIndex, rightColIndex), C.toBreeze)
        }
      }
    }.reduceByKey(resultPartitioner, (a, b) => a + b).mapValues(Matrices.fromBreeze)
    // TODO: Try to use aggregateByKey instead of reduceByKey to get rid of intermediate matrices
    new BlockMatrix(newBlocks, rowsPerBlock, other.colsPerBlock, numRows(), other.numCols())
  } else {
    .......
  }
}

private[distributed] def simulateMultiply(
    other: BlockMatrix,
    partitioner: GridPartitioner): (BlockDestinations, BlockDestinations) = {
  val leftMatrix = blockInfo.keys.collect() // blockInfo should already be cached
  val rightMatrix = other.blocks.keys.collect()
  // How to read the code below: suppose A * B = C. Block A11 is needed to compute C11 through C1n,
  // so a copy of A11 is sent to every machine that computes one of C11 through C1n.
  val leftDestinations = leftMatrix.map { case (rowIndex, colIndex) =>
    // A left block is multiplied with the right blocks whose row index equals its column index,
    // so collect the positions of all right blocks whose row index matches this column index.
    // Thanks to this filter, a left block is not copied for right blocks that do not exist
    // (have no values), which avoids computing on zeros.
    val rightCounterparts = rightMatrix.filter(_._1 == colIndex)
    // The multiplication is followed by an addition (reduceByKey); when the additions happen
    // on the same machine they can be optimized with combineByKey.
    // This directly gives the partitions in which each block is needed after the multiplication.
    val partitions = rightCounterparts.map(b => partitioner.getPartition((rowIndex, b._2)))
    ((rowIndex, colIndex), partitions.toSet)
  }.toMap
  val rightDestinations = rightMatrix.map { case (rowIndex, colIndex) =>
    val leftCounterparts = leftMatrix.filter(_._2 == rowIndex)
    val partitions = leftCounterparts.map(b => partitioner.getPartition((b._1, colIndex)))
    ((rowIndex, colIndex), partitions.toSet)
  }.toMap
  (leftDestinations, rightDestinations)
}
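The destination bookkeeping in simulateMultiply can be tried out with a small pure-Python sketch. It is a loose port for illustration, not Spark's implementation. The function name `simulate_multiply` and its parameters are hypothetical, and the partitioner is simplified to one partition per result block with id `i + j * num_row_blocks`, whereas the real GridPartitioner may group several blocks into one partition.

```python
from typing import Dict, Iterable, Set, Tuple

BlockIndex = Tuple[int, int]  # (blockRowIndex, blockColIndex)
Destinations = Dict[BlockIndex, Set[int]]


def simulate_multiply(
    left_keys: Iterable[BlockIndex],
    right_keys: Iterable[BlockIndex],
    num_row_blocks: int,
) -> Tuple[Destinations, Destinations]:
    """For each existing block, find the partitions that will need a copy of it."""

    def get_partition(i: int, j: int) -> int:
        # Simplifying assumption: one partition per result block C(i, j).
        return i + j * num_row_blocks

    left = list(left_keys)
    right = list(right_keys)

    # A(i, k) is needed wherever some B(k, j) exists, i.e. in the partition of C(i, j).
    left_destinations = {
        (i, k): {get_partition(i, j) for (k2, j) in right if k2 == k}
        for (i, k) in left
    }
    # B(k, j) is needed wherever some A(i, k) exists, i.e. in the partition of C(i, j).
    right_destinations = {
        (k, j): {get_partition(i, j) for (i, k2) in left if k2 == k}
        for (k, j) in right
    }
    return left_destinations, right_destinations


# A has all four blocks; B is missing block (0, 1), e.g. because it is all zeros.
left_dest, right_dest = simulate_multiply(
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(0, 0), (1, 0), (1, 1)],
    num_row_blocks=2,
)
print(left_dest)
print(right_dest)
```

In this example A(0, 0) is sent only to the partition of C(0, 0), because the block B(0, 1) it would otherwise be paired with does not exist. This is the "no copy for missing blocks" saving described in the comments above.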
