References:
- Data Types - RDD-based API - Spark 3.2.1 Documentation
- python - sparse Matrix/ CSC Matrix in pyspark - Stack Overflow
- Understanding the CSC storage format for sparse matrices (Local Matrix) - CSDN blog
- Spark 3.2.1 ScalaDoc - org.apache.spark.mllib.linalg.SparseMatrix
Suppose we want to represent the following matrix:
1.0 0.0 4.0
0.0 3.0 5.0
2.0 0.0 6.0
Scala code:
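import org.apache.spark.ml.linalg.{Matrix, Matrices}
// Arguments: numRows, numCols, colPtrs, rowIndices, values (entries listed column by column)
val sm: Matrix = Matrices.sparse(3, 3, Array(0, 2, 3, 6), Array(0, 2, 1, 0, 1, 2), Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))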
The output is:
sm: org.apache.spark.ml.linalg.Matrix =
3 x 3 CSCMatrix
(0,0) 1.0
(2,0) 2.0
(1,1) 3.0
(0,2) 4.0
(1,2) 5.0
(2,2) 6.0
In the Matrices.sparse call, the first array, [0, 2, 3, 6], holds the column pointers. It is read as follows:
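The data values and row indices for the first column are in the slice [0:2], for the second column in [2:3], and for the third in [3:6] (start inclusive, end exclusive).
Or, to look at it another way, the differences between consecutive pointers, [2, 1, 3], tell us how many stored terms there are in each column.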
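To make this reading of the column pointers concrete, here is a minimal sketch in plain Scala with no Spark dependency. It uses the same three arrays as the call above, computes the per-column counts as pointer differences, and expands the CSC arrays back into a dense grid. The helper cscToDense and the names countsPerColumn and dense are hypothetical, introduced only for illustration; they are not part of the Spark API.

```scala
// The same three arrays passed to Matrices.sparse in the example above.
val numRows = 3
val colPtrs = Array(0, 2, 3, 6)
val rowIndices = Array(0, 2, 1, 0, 1, 2)
val values = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)

// Stored entries per column: differences of consecutive column pointers.
val countsPerColumn: Array[Int] =
  colPtrs.sliding(2).map(p => p(1) - p(0)).toArray

// Hypothetical helper (not a Spark function): expand CSC arrays
// into a dense grid indexed as dense(row)(col).
def cscToDense(
    numRows: Int,
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    values: Array[Double]): Array[Array[Double]] = {
  val numCols = colPtrs.length - 1
  val dense = Array.fill(numRows, numCols)(0.0)
  // Column j owns positions colPtrs(j) until colPtrs(j + 1)
  // in both rowIndices and values.
  for (j <- 0 until numCols; k <- colPtrs(j) until colPtrs(j + 1)) {
    dense(rowIndices(k))(j) = values(k)
  }
  dense
}

val dense = cscToDense(numRows, colPtrs, rowIndices, values)
```

Evaluated in the Scala REPL, countsPerColumn comes out as Array(2, 1, 3), and the rows of dense reproduce the 3 x 3 matrix shown at the top.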