Mahout distributed Lanczos SVD method: a summary based on the MAHOUT-180 comments

Reposted from: [color=red]http://issues.apache.org/jira/browse/MAHOUT-180[/color]
1. A Hadoop version of the [color=red]Lanczos[/color] algorithm for performing [color=red]SVD[/color] on [color=red]sparse[/color] matrices. It performs well on sparse input.

2. The primary work in parallelizing Lanczos is the parallelized multiplication of your input matrix (or its square) by vectors. [color=red]The input matrix lives in HDFS[/color]; the Lanczos SVD method leaves it there ([color=red]which means the input matrix stays in distributed storage and no additional data transfer is needed[/color]) and sends one vector at a time to perform the parallelized matrix*vector product.
The main work is therefore the matrix*vector multiplication, sometimes in the form (square of the matrix)*vector: M^T M * vector.
The implementation also avoids squaring the input matrix when it is symmetric [color=red](to try)[/color].
If the matrix is symmetric, it is not squared for you; if it is not symmetric, it is squared first.
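
To make the "(square of the matrix)*vector" step concrete, here is a minimal single-machine sketch in Python. It assumes each sparse row is stored as a `{column: value}` dict; the function name `times_squared` and the sample matrix are illustrative only, not Mahout's code. The point is that M^T M * v can be accumulated row by row as (row . v) * row, so M^T M is never built and each row can be processed independently, which is what makes the operation easy to distribute.

```python
# Each sparse row is a dict {column_index: value}; the matrix is a list of rows.
def times_squared(rows, v):
    """Return M^T (M v) without ever forming the matrix M^T M."""
    result = [0.0] * len(v)
    for row in rows:
        # "Map" side: every row is handled independently of the others.
        dot = sum(val * v[j] for j, val in row.items())
        # "Reduce" side: the partial vectors (dot * row) are summed up.
        for j, val in row.items():
            result[j] += dot * val
    return result


# Illustrative 2 x 3 matrix:
# M = [[1, 0, 2],
#      [0, 3, 0]]
rows = [{0: 1.0, 2: 2.0}, {1: 3.0}]
v = [1.0, 1.0, 1.0]
out = times_squared(rows, v)
```

Only the vector `v` and the small result vector move around; the rows stay where they are, which matches the note above that the matrix stays in HDFS while one vector at a time is sent to it.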

3. The author's unit tests show that Lanczos is working well.
4. Generate sparse vectors with SparseVectorsFromSequenceFiles:
[color=red]$HADOOP_HOME/bin/hadoop jar examples/target/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.text.SparseVectorsFromSequenceFiles -i text_path -o corpus_as_vectors_path -seq true -w tfidf -chunk 1000 --minSupport 1 --minDF 5 --maxDFPercent 50 --norm 2[/color]

Then run the distributed Lanczos solver to compute the singular values:
[color=red]$HADOOP_HOME/bin/hadoop jar examples/target/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver -i corpus_as_vectors_path -o corpus_svd_path -nr 1 -nc <numFeatures> --rank 100
[/color]
Read the comment that contains this command carefully, especially the part below it explaining what desiredRank means.

5. EigenVerificationJob can discard the bad eigenvalues.
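
The notes do not say how the bad eigenvalues are detected, so the following Python sketch only shows one plausible kind of check and is not a description of what EigenVerificationJob does. It assumes a candidate vector is "good" when A*v points in nearly the same direction as v. The function names and the `max_error` threshold are hypothetical.

```python
import math


def cosine_error(matvec, v):
    """Return 1 - |cos(angle between v and A v)|; 0.0 for an exact eigenvector."""
    av = matvec(v)
    dot = sum(a * b for a, b in zip(av, v))
    norm = math.sqrt(sum(a * a for a in av)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 1.0  # treat a vector mapped to zero as unverifiable
    return 1.0 - abs(dot) / norm


def keep_good(matvec, candidates, max_error=0.05):
    """Keep only the candidates whose error is within the (assumed) threshold."""
    return [v for v in candidates if cosine_error(matvec, v) <= max_error]


# Illustrative matrix A = diag(2, 5), given as a matrix-vector product.
def matvec(x):
    return [2.0 * x[0], 5.0 * x[1]]


candidates = [[1.0, 0.0], [1.0, 1.0]]  # the first is an eigenvector, the second is not
kept = keep_good(matvec, candidates)
```

Each check costs one more matrix*vector product per candidate, the same primitive the solver itself relies on.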

6. Multiplication of a matrix (or the square of a matrix) by a vector is the primary operation of Lanczos, and each multiplication is done in one M/R iteration. [color=red]If you want the top-k singular vectors, you make k passes over the data.[/color]
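
The "k passes" remark follows from the shape of the Lanczos loop: each step calls the matrix*vector product exactly once. Below is a minimal textbook-style Python sketch of that loop for a symmetric operator. It is not Mahout's implementation; it omits re-orthogonalization and the final eigen-decomposition of the small tridiagonal matrix, and the names `lanczos`, `matvec` and `passes` are illustrative.

```python
import math


def lanczos(matvec, n, k):
    """Run up to k Lanczos steps on a symmetric operator of size n.

    Returns (alphas, betas, passes): the diagonal and off-diagonal of the
    tridiagonal matrix T, and the number of matrix*vector products used.
    """
    q_prev = [0.0] * n
    q = [1.0 / math.sqrt(n)] * n  # normalized start vector
    alphas, betas = [], []
    beta = 0.0
    passes = 0
    for _ in range(k):
        w = matvec(q)  # the only step that touches the matrix
        passes += 1
        alpha = sum(wi * qi for wi, qi in zip(w, q))
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(wi * wi for wi in w))
        if beta < 1e-12:  # invariant subspace found, stop early
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    # T has one fewer off-diagonal entry than diagonal entries.
    return alphas, betas[:len(alphas) - 1], passes


# Illustrative symmetric matrix A = [[2, 1], [1, 3]].
A = [[2.0, 1.0], [1.0, 3.0]]
alphas, betas, passes = lanczos(
    lambda x: [sum(a * b for a, b in zip(row, x)) for row in A], n=2, k=2
)
```

In the distributed setting, `matvec` would be the M/R job, so the value of `passes` is the number of times the data is read.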

7. The code seems to be working fine and indeed produces the right number of dense (eigen?) vectors.