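Spark MLlib uses the Breeze library for its underlying vector and matrix operations. Breeze provides Vector/Matrix implementations and the corresponding computation interfaces (Linalg). MLlib also ships its own Vector and Linalg implementations. To use Breeze, import:
import breeze.linalg._
import breeze.numerics._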
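Because MLlib has its own vector types alongside Breeze, code sometimes needs to move data between the two. The following is a minimal sketch, assuming both spark-mllib and breeze are on the classpath. The helper names `toBreeze` and `fromBreeze` are hypothetical, and they go through plain arrays rather than any MLlib-internal conversion.

```scala
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.linalg.{Vector => MLVector, Vectors}

object MLlibBreezeBridge {
  // Hypothetical helper: build a Breeze dense vector from an MLlib vector's values.
  // Sparse MLlib vectors are densified by toArray.
  def toBreeze(v: MLVector): BDV[Double] = new BDV[Double](v.toArray)

  // Hypothetical helper: build an MLlib dense vector from a Breeze dense vector.
  def fromBreeze(bv: BDV[Double]): MLVector = Vectors.dense(bv.toArray)

  def main(args: Array[String]): Unit = {
    val mv = Vectors.dense(1.0, 2.0, 3.0)
    val doubled = toBreeze(mv) * 2.0   // Breeze arithmetic returns a new vector
    println(fromBreeze(doubled))       // the doubled values as an MLlib vector
  }
}
```

Going through arrays keeps the sketch independent of Spark's internal conversion methods, at the cost of an extra allocation in some cases.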
Breeze creation functions

val m1 = DenseMatrix.zeros[Double](2,3)
DenseMatrix[Double] =
0.0  0.0  0.0
0.0  0.0  0.0

val v1 = DenseVector.zeros[Double](3)
DenseVector(0.0, 0.0, 0.0)

val v2 = DenseVector.ones[Double](3)
DenseVector(1.0, 1.0, 1.0)

val v3 = DenseVector.fill(3){5.0}
DenseVector(5.0, 5.0, 5.0)

val v4 = DenseVector.range(1,10,2)
DenseVector(1, 3, 5, 7, 9)

val m2 = DenseMatrix.eye[Double](3)
DenseMatrix[Double] =
1.0  0.0  0.0
0.0  1.0  0.0
0.0  0.0  1.0

val v6 = diag(DenseVector(1.0,2.0,3.0))
DenseMatrix[Double] =
1.0  0.0  0.0
0.0  2.0  0.0
0.0  0.0  3.0

val v8 = DenseVector(1,2,3,4)
DenseVector(1, 2, 3, 4)

val v9 = DenseVector(1,2,3,4).t
Transpose(DenseVector(1, 2, 3, 4))

val v10 = DenseVector.tabulate(3){i => 2*i}
DenseVector(0, 2, 4)

val m4 = DenseMatrix.tabulate(3, 2){case (i, j) => i+j}
DenseMatrix[Int] =
0  1
1  2
2  3

val v11 = new DenseVector(Array(1, 2, 3, 4))
DenseVector(1, 2, 3, 4)

val m5 = new DenseMatrix(2, 3, Array(11, 12, 13, 21, 22, 23)) // the array is read column by column
DenseMatrix[Int] =
11  13  22
12  21  23

val v12 = DenseVector.rand(4) // random values; the output differs on every run
DenseVector(0.7517657487447951, 0.8171495400874123, 0.8923542318540489, 0.174311259949119)

val m6 = DenseMatrix.rand(2, 3) // random values; the output differs on every run
DenseMatrix[Double] =
0.5349430131148125   0.8822136832272578  0.7946323804433382
0.41097756311601086  0.3181490074596882  0.34195102205697414
Breeze element access

val a = DenseVector(1,2,3,4,5,6,7,8,9,10)
DenseVector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

a(1 to 4)
DenseVector(2, 3, 4, 5)

a(5 to 0 by -1)
DenseVector(6, 5, 4, 3, 2, 1)

a(1 to -1) // a negative index counts from the end
DenseVector(2, 3, 4, 5, 6, 7, 8, 9, 10)

a( -1 )
Int = 10

val m = DenseMatrix((1.0,2.0,3.0), (3.0,4.0,5.0))
DenseMatrix[Double] =
1.0  2.0  3.0
3.0  4.0  5.0

m(0,1)
Double = 2.0

m(::,1)
DenseVector(2.0, 4.0)
Breeze element operations

val m = DenseMatrix((1.0,2.0,3.0), (3.0,4.0,5.0))
DenseMatrix[Double] =
1.0  2.0  3.0
3.0  4.0  5.0

m.reshape(3, 2) // elements are taken column by column
DenseMatrix[Double] =
1.0  4.0
3.0  3.0
2.0  5.0

m.toDenseVector
DenseVector(1.0, 3.0, 2.0, 4.0, 3.0, 5.0)

val m = DenseMatrix((1.0,2.0,3.0), (4.0,5.0,6.0) , (7.0,8.0,9.0))
DenseMatrix[Double] =
1.0  2.0  3.0
4.0  5.0  6.0
7.0  8.0  9.0

lowerTriangular(m)
DenseMatrix[Double] =
1.0  0.0  0.0
4.0  5.0  0.0
7.0  8.0  9.0

upperTriangular(m)
DenseMatrix[Double] =
1.0  2.0  3.0
0.0  5.0  6.0
0.0  0.0  9.0

m.copy
DenseMatrix[Double] =
1.0  2.0  3.0
4.0  5.0  6.0
7.0  8.0  9.0

diag(m)
DenseVector(1.0, 5.0, 9.0)

m(::, 2) := 5.0 // assigns to the third column of m in place
DenseVector(5.0, 5.0, 5.0)

m
DenseMatrix[Double] =
1.0  2.0  5.0
4.0  5.0  5.0
7.0  8.0  5.0

m(1 to 2,1 to 2) := 5.0
DenseMatrix[Double] =
5.0  5.0
5.0  5.0

val a = DenseVector(1,2,3,4,5,6,7,8,9,10)
DenseVector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

a(1 to 4) := 5
DenseVector(5, 5, 5, 5)

a(1 to 4) := DenseVector(1,2,3,4)
DenseVector(1, 2, 3, 4)

val a1 = DenseMatrix((1.0,2.0,3.0), (4.0,5.0,6.0))
val a2 = DenseMatrix((1.0,1.0,1.0), (2.0,2.0,2.0))
DenseMatrix.vertcat(a1,a2)
DenseMatrix[Double] =
1.0  2.0  3.0
4.0  5.0  6.0
1.0  1.0  1.0
2.0  2.0  2.0

DenseMatrix.horzcat(a1,a2)
DenseMatrix[Double] =
1.0  2.0  3.0  1.0  1.0  1.0
4.0  5.0  6.0  2.0  2.0  2.0

val b1 = DenseVector(1,2,3,4)
val b2 = DenseVector(1,1,1,1)
DenseVector.vertcat(b1,b2)
DenseVector(1, 2, 3, 4, 1, 1, 1, 1)
Breeze numeric functions

val a = DenseMatrix((1.0,2.0,3.0), (4.0,5.0,6.0))
val b = DenseMatrix((1.0,1.0,1.0), (2.0,2.0,2.0))
a + b // element-wise addition
DenseMatrix[Double] =
2.0  3.0  4.0
6.0  7.0  8.0

a :* b // element-wise multiplication
DenseMatrix[Double] =
1.0  2.0   3.0
8.0  10.0  12.0

a :/ b // element-wise division
DenseMatrix[Double] =
1.0  2.0  3.0
2.0  2.5  3.0

a :< b // element-wise comparison
DenseMatrix[Boolean] =
false  false  false
false  false  false

a :== b // element-wise equality
DenseMatrix[Boolean] =
true   false  false
false  false  false

a :+= 1.0 // in place: adds 1.0 to every element of a
DenseMatrix[Double] =
2.0  3.0  4.0
5.0  6.0  7.0

a :*= 2.0 // in place: multiplies every element of a by 2.0
DenseMatrix[Double] =
4.0   6.0   8.0
10.0  12.0  14.0

max(a) // largest element of the updated a
Double = 14.0

argmax(a) // (row, column) position of the largest element
(Int, Int) = (1,2)

DenseVector(1, 2, 3, 4) dot DenseVector(1, 1, 1, 1) // dot product
Int = 10
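The operators listed above work element by element. For comparison, the sketch below shows the ordinary matrix product written with `*`, which this section does not otherwise cover. The names `x`, `y`, `v`, `product`, and `applied` are illustrative and not from the original.

```scala
import breeze.linalg._

object MatrixProductSketch {
  val x = DenseMatrix((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)) // 2 x 3
  val y = DenseMatrix((1.0, 1.0, 1.0), (2.0, 2.0, 2.0)) // 2 x 3
  val v = DenseVector(1.0, 0.0, -1.0)

  // Matrix product: (2 x 3) times (3 x 2) gives a 2 x 2 result.
  val product: DenseMatrix[Double] = x * y.t

  // Matrix-vector product: (2 x 3) times a length-3 vector gives a length-2 vector.
  val applied: DenseVector[Double] = x * v

  def main(args: Array[String]): Unit = {
    println(product) // rows: 6.0 12.0 and 15.0 30.0
    println(applied) // DenseVector(-2.0, -2.0)
  }
}
```

The inner dimensions must agree, which is why `y` is transposed before multiplying.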
Breeze sum functions

val a = DenseMatrix((1.0,2.0,3.0), (4.0,5.0,6.0) , (7.0,8.0,9.0))
sum(a)
Double = 45.0

sum(a, Axis._0) // sum of each column
DenseMatrix[Double] =
12.0  15.0  18.0

sum(a, Axis._1) // sum of each row
trace(a) // sum of the diagonal: 15.0

accumulate(DenseVector(1, 2, 3, 4)) // cumulative sum: 1, 1+2, 1+2+3, ...
DenseVector(1, 3, 6, 10)
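Breeze boolean functions

val a = DenseVector(true, false, true)
val b = DenseVector(false, true, true)
a :& b // element-wise AND: DenseVector(false, false, true)
a :| b // element-wise OR: DenseVector(true, true, true)
!a // element-wise NOT: DenseVector(false, true, false)

val a = DenseVector(1.0, 0.0, -2.0)
any(a) // true if any element is nonzero: true
all(a) // true if all elements are nonzero: false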
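Breeze linear algebra functions

a \ b // solve the linear system
a.t // transpose
det(a) // determinant
inv(a) // inverse
pinv(a) // pseudo-inverse
norm(a) // norm
eigSym(a) // eigenvalues and eigenvectors of a symmetric matrix
val (er, ei, _) = eig(a) // eigenvalues (real and imaginary parts returned separately)
eig(a)._3 // eigenvectors
val svd.SVD(u,s,v) = svd(a) // singular value decomposition
rank(a) // rank of the matrix
a.length // number of elements of a vector (for a matrix, use a.size)
a.rows // number of rows
a.cols // number of columns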
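
val a = DenseMatrix((1.0,2.0,3.0), (4.0,5.0,6.0) , (7.0,8.0,9.0))
val b = DenseMatrix((1.0,1.0,1.0), (1.0,1.0,1.0) , (1.0,1.0,1.0))
a \ b // note: this a is singular (its determinant is 0), so the solve may fail or return unreliable values
a.t
DenseMatrix[Double] =
1.0  4.0  7.0
2.0  5.0  8.0
3.0  6.0  9.0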
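To see several of the functions listed above produce meaningful results, the sketch below uses a small non-singular system. The matrix `A`, the vector `rhs`, and the object name are illustrative choices, not from the original.

```scala
import breeze.linalg._

object LinearSolveSketch {
  // Illustrative system: 2x + y = 3 and x + 3y = 5.
  val A = DenseMatrix((2.0, 1.0), (1.0, 3.0))
  val rhs = DenseVector(3.0, 5.0)

  // Solve A * x = rhs.
  val x: DenseVector[Double] = A \ rhs

  def main(args: Array[String]): Unit = {
    println(x)                  // approximately DenseVector(0.8, 1.4)
    println(det(A))             // determinant, approximately 5.0
    println(rank(A))            // rank of A: 2
    println(norm(A * x - rhs))  // residual norm, close to 0.0
    println(inv(A) * A)         // approximately the 2 x 2 identity matrix
  }
}
```

Checking the residual `A * x - rhs` is a simple way to confirm that a solve succeeded, since floating-point results are rarely exact.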
Breeze rounding functions

round(a) // round to the nearest integer
ceil(a) // round up
floor(a) // round down
signum(a) // sign function
abs(a) // absolute value

val a = DenseVector(1.2, 0.6, -2.3)
signum(a)
DenseVector(1.0, 1.0, -1.0)
Other Breeze functions

Breeze trigonometric functions include:
sin, sinh, asin, asinh
cos, cosh, acos, acosh
tan, tanh, atan, atanh
atan2
sinc(x), i.e. sin(x)/x
sincpi(x), i.e. sinc(x * Pi)

Breeze logarithmic and exponential functions include:
log, exp, log10
log1p, expm1
sqrt, cbrt
pow
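The functions listed above apply element by element to Breeze vectors and matrices, and also accept plain numbers. A minimal sketch, with illustrative names and values that are not from the original:

```scala
import breeze.linalg._
import breeze.numerics._

object NumericsSketch {
  val v = DenseVector(0.0, 1.0, 4.0)

  val roots = sqrt(v)          // element-wise square root
  val squares = pow(v, 2.0)    // element-wise power with a scalar exponent
  val shifted = exp(log1p(v))  // exp(log(1 + x)) recovers 1 + x up to rounding

  def main(args: Array[String]): Unit = {
    println(roots)     // DenseVector(0.0, 1.0, 2.0)
    println(squares)   // DenseVector(0.0, 1.0, 16.0)
    println(shifted)   // approximately DenseVector(1.0, 2.0, 5.0)
    println(cos(0.0))  // the same functions work on a plain Double: 1.0
  }
}
```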
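Introduction to BLAS (a linear algebra library)

BLAS routines are grouped into three levels by functionality:
Level 1: vector-vector operations, such as the dot product (ddot), scaled vector addition (daxpy), the sum of absolute values (dasum), and so on;
Level 2: matrix-vector operations, the most important being general matrix-vector multiplication (dgemv);
Level 3: matrix-matrix operations, the most important being general matrix-matrix multiplication (dgemm).
Each routine comes in variants for different data types (single precision, double precision, complex), indicated by the prefix of its name (s, d, c, z).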
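To make the three levels concrete, here is a plain-Scala sketch of what the named routines compute. It is not the BLAS API: the real routines take extra arguments (strides, scaling factors such as alpha and beta, transpose flags) and are heavily optimized. The function names and the row-major `Array[Array[Double]]` layout are assumptions chosen for readability.

```scala
object BlasLevelsSketch {
  // Level 1: dot product of two vectors (the operation behind ddot).
  def dot(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "vectors must have the same length")
    var s = 0.0
    var i = 0
    while (i < x.length) { s += x(i) * y(i); i += 1 }
    s
  }

  // Level 1: y := alpha * x + y, updating y in place (the operation behind daxpy).
  def axpy(alpha: Double, x: Array[Double], y: Array[Double]): Unit = {
    require(x.length == y.length, "vectors must have the same length")
    var i = 0
    while (i < x.length) { y(i) += alpha * x(i); i += 1 }
  }

  // Level 1: sum of absolute values (the operation behind dasum).
  def asum(x: Array[Double]): Double = x.map(e => math.abs(e)).sum

  // Level 2: matrix-vector product A * x (a simplified form of dgemv).
  def gemv(a: Array[Array[Double]], x: Array[Double]): Array[Double] =
    a.map(row => dot(row, x))

  // Level 3: matrix-matrix product A * B (a simplified form of dgemm).
  def gemm(a: Array[Array[Double]], b: Array[Array[Double]]): Array[Array[Double]] = {
    val bColumns = b.transpose
    a.map(row => bColumns.map(col => dot(row, col)))
  }

  def main(args: Array[String]): Unit = {
    val a = Array(Array(1.0, 2.0), Array(3.0, 4.0))
    val x = Array(1.0, 1.0)
    println(dot(x, Array(2.0, 3.0)))                          // 5.0
    println(gemv(a, x).mkString(", "))                        // 3.0, 7.0
    println(gemm(a, a).map(_.mkString(", ")).mkString("; "))  // 7.0, 10.0; 15.0, 22.0
  }
}
```

Each level builds on the one below it here (`gemv` and `gemm` reuse `dot`), which mirrors the growth in work per call from O(n) to O(n^2) to O(n^3).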