References:
1. http://spark.apache.org/docs/latest/ml-guide.html
2. https://github.com/apache/spark/tree/v2.2.0
3. http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html
SVD Example
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix

# sc is an existing SparkContext (e.g. the one created by the pyspark shell).
rows = sc.parallelize([
    Vectors.sparse(5, {1: 1.0, 3: 7.0}),
    Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
    Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
])
mat = RowMatrix(rows)

# Compute the top 5 singular values and corresponding singular vectors.
svd = mat.computeSVD(5, computeU=True)
U = svd.U  # The U factor is a RowMatrix.
s = svd.s  # The singular values are stored in a local dense vector.
V = svd.V  # The V factor is a local dense matrix.
Find full example code at "examples/src/main/python/mllib/svd_example.py" in the Spark repo.
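To see what the three factors returned by computeSVD represent, here is a local NumPy sketch (not Spark code) that applies the same thin decomposition A = U·diag(s)·Vᵀ to the 3×5 matrix from the example above; np.linalg.svd plays the role of computeSVD on a single machine.

```python
import numpy as np

# The same 3x5 matrix as in the Spark example, as a local NumPy array.
A = np.array([
    [0.0, 1.0, 0.0, 7.0, 0.0],
    [2.0, 0.0, 3.0, 4.0, 5.0],
    [4.0, 0.0, 0.0, 6.0, 7.0],
])

# Thin SVD: A = U * diag(s) * V^T. Spark returns the analogous factors
# (U as a RowMatrix, s as a local vector, V as a local dense matrix).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values come back in descending order, and the factorization
# reconstructs A up to floating-point error.
reconstruction = U @ np.diag(s) @ Vt
print(np.allclose(A, reconstruction))  # True
```

Keeping only the largest k singular values (and the matching columns of U and V) gives the best rank-k approximation of A, which is why the Spark API asks for the number of singular values up front.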
Principal component analysis (PCA)
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix

# sc is an existing SparkContext (e.g. the one created by the pyspark shell).
rows = sc.parallelize([
    Vectors.sparse(5, {1: 1.0, 3: 7.0}),
    Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
    Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
])
mat = RowMatrix(rows)

# Compute the top 4 principal components.
# Principal components are stored in a local dense matrix.
pc = mat.computePrincipalComponents(4)

# Project the rows to the linear space spanned by the top 4 principal components.
projected = mat.multiply(pc)
Find full example code at "examples/src/main/python/mllib/pca_rowmatrix_example.py" in the Spark repo.
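As a rough local sketch of what computePrincipalComponents does (assuming the standard definition of PCA as the eigenvectors of the column-centered covariance matrix; this is NumPy, not Spark code), the same 3×5 matrix can be processed as follows:

```python
import numpy as np

# Same 3x5 matrix as the Spark example, as a local NumPy array.
A = np.array([
    [0.0, 1.0, 0.0, 7.0, 0.0],
    [2.0, 0.0, 3.0, 4.0, 5.0],
    [4.0, 0.0, 0.0, 6.0, 7.0],
])

k = 4  # number of principal components requested, as in the Spark example

# Center the columns, then form the sample covariance matrix.
centered = A - A.mean(axis=0)
cov = centered.T @ centered / (A.shape[0] - 1)

# Eigenvectors of the covariance matrix are the principal components.
# np.linalg.eigh returns eigenvalues in ascending order, so reorder.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1][:k]
pc = eigvecs[:, order]  # 5 x k matrix, columns are principal components

# Project the rows onto the principal-component space,
# analogous to mat.multiply(pc) in the Spark example.
projected = A @ pc
print(projected.shape)  # (3, 4)
```

Note that with only 3 rows, at most 2 components carry nonzero variance after centering; the remaining requested components span directions with (numerically) zero variance.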