Running the canopy clustering algorithm with mahout-0.6

1. Convert the text file to vectors

mahout org.apache.mahout.clustering.conversion.InputDriver -i /mahout/input/p04-17.txt -o /mahout/output/vectorfiles -v org.apache.mahout.math.RandomAccessSparseVector
[root@masterclone ~]# hadoop fs -ls /mahout/output/vectorfiles
Warning: $HADOOP_HOME is deprecated.

Found 3 items
-rw-r--r--   1 root supergroup          0 2014-05-12 06:58 /mahout/output/vectorfiles/_SUCCESS
drwxr-xr-x   - root supergroup          0 2014-05-12 06:58 /mahout/output/vectorfiles/_logs
-rw-r--r--   1 root supergroup      56430 2014-05-12 06:58 /mahout/output/vectorfiles/part-m-00000

Detailed steps: http://blog.csdn.net/panguoyuan/article/details/25655763
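
To make the vectorization step concrete, here is a rough Python sketch of what happens to each input line, assuming the input file holds one data point per line as space-delimited floating point numbers (the format InputDriver is documented to read). The function name `line_to_sparse_vector` and the dict representation are illustrative assumptions only; this is not Mahout code, and the real job writes VectorWritable records into a sequence file.

```python
from typing import Dict


def line_to_sparse_vector(line: str) -> Dict[int, float]:
    """Parse one line of whitespace-separated numbers into a sparse vector.

    The vector is modelled as {index: value}, keeping only non-zero entries,
    which is roughly the idea behind a sparse vector type such as
    RandomAccessSparseVector (hypothetical stand-in, not the Mahout class).
    """
    values = [float(token) for token in line.split()]
    return {index: value for index, value in enumerate(values) if value != 0.0}


if __name__ == "__main__":
    # One made-up input line with three dimensions.
    print(line_to_sparse_vector("1.5 0 2"))
```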

2. Run the canopy clustering algorithm

mahout canopy -i /mahout/output/vectorfiles -o /mahout/output/canopy-result -t1 1 -t2 2 -ow
[root@masterclone ~]# mahout canopy -i /mahout/output/vectorfiles -o /mahout/output/canopy-result -t1 1 -t2 2 -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB: /root/mahout/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.

14/05/12 16:23:17 INFO common.AbstractJob: Command line arguments: {--distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=/mahout/output/vectorfiles, --method=mapreduce, --output=/mahout/output/canopy-result, --overwrite=null, --startPhase=0, --t1=1, --t2=2, --tempDir=temp}
14/05/12 16:23:17 INFO canopy.CanopyDriver: Build Clusters Input: /mahout/output/vectorfiles Out: /mahout/output/canopy-result Measure: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure@6d79953c t1: 1.0 t2: 2.0
14/05/12 16:23:19 INFO input.FileInputFormat: Total input paths to process : 1
14/05/12 16:23:19 INFO mapred.JobClient: Running job: job_201405121559_0005
14/05/12 16:23:20 INFO mapred.JobClient:  map 0% reduce 0%
14/05/12 16:23:31 INFO mapred.JobClient:  map 100% reduce 0%
14/05/12 16:23:39 INFO mapred.JobClient:  map 100% reduce 33%
14/05/12 16:23:41 INFO mapred.JobClient:  map 100% reduce 100%
14/05/12 16:23:43 INFO mapred.JobClient: Job complete: job_201405121559_0005
14/05/12 16:23:43 INFO mapred.JobClient: Counters: 29
14/05/12 16:23:43 INFO mapred.JobClient:   Job Counters 
14/05/12 16:23:43 INFO mapred.JobClient:     Launched reduce tasks=1
14/05/12 16:23:43 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10071
14/05/12 16:23:43 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/12 16:23:43 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/05/12 16:23:43 INFO mapred.JobClient:     Launched map tasks=1
14/05/12 16:23:43 INFO mapred.JobClient:     Data-local map tasks=1
14/05/12 16:23:43 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10145
14/05/12 16:23:43 INFO mapred.JobClient:   File Output Format Counters 
14/05/12 16:23:43 INFO mapred.JobClient:     Bytes Written=210
14/05/12 16:23:43 INFO mapred.JobClient:   FileSystemCounters
14/05/12 16:23:43 INFO mapred.JobClient:     FILE_BYTES_READ=38
14/05/12 16:23:43 INFO mapred.JobClient:     HDFS_BYTES_READ=56557
14/05/12 16:23:43 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=108662
14/05/12 16:23:43 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=210
14/05/12 16:23:43 INFO mapred.JobClient:   File Input Format Counters 
14/05/12 16:23:43 INFO mapred.JobClient:     Bytes Read=56430
14/05/12 16:23:43 INFO mapred.JobClient:   Map-Reduce Framework
14/05/12 16:23:43 INFO mapred.JobClient:     Map output materialized bytes=38
14/05/12 16:23:43 INFO mapred.JobClient:     Map input records=1800
14/05/12 16:23:43 INFO mapred.JobClient:     Reduce shuffle bytes=38
14/05/12 16:23:43 INFO mapred.JobClient:     Spilled Records=2
14/05/12 16:23:43 INFO mapred.JobClient:     Map output bytes=30
14/05/12 16:23:43 INFO mapred.JobClient:     CPU time spent (ms)=1400
14/05/12 16:23:43 INFO mapred.JobClient:     Total committed heap usage (bytes)=176033792
14/05/12 16:23:43 INFO mapred.JobClient:     Combine input records=0
14/05/12 16:23:43 INFO mapred.JobClient:     SPLIT_RAW_BYTES=127
14/05/12 16:23:43 INFO mapred.JobClient:     Reduce input records=1
14/05/12 16:23:43 INFO mapred.JobClient:     Reduce input groups=1
14/05/12 16:23:43 INFO mapred.JobClient:     Combine output records=0
14/05/12 16:23:43 INFO mapred.JobClient:     Physical memory (bytes) snapshot=257114112
14/05/12 16:23:43 INFO mapred.JobClient:     Reduce output records=1
14/05/12 16:23:43 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2100338688
14/05/12 16:23:43 INFO mapred.JobClient:     Map output records=1
14/05/12 16:23:43 INFO driver.MahoutDriver: Program took 26551 ms (Minutes: 0.44251666666666667)
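
To illustrate what the -t1 and -t2 thresholds control, below is a minimal single-machine sketch of canopy clustering in Python. It is a simplification for illustration, not Mahout's MapReduce implementation; the names `canopy_cluster` and `squared_euclidean` and the sample points are assumptions of this sketch. It uses squared Euclidean distance because that is the measure shown in the log above. Canopy clustering is normally described with T1 > T2 (T1 the loose threshold, T2 the tight one), whereas the run above passes -t1 1 -t2 2, so the sketch uses its own values.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Canopy = Tuple[Point, List[Point]]  # (center, member points)


def squared_euclidean(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def canopy_cluster(
    points: Sequence[Point],
    t1: float,
    t2: float,
    distance: Callable[[Point, Point], float] = squared_euclidean,
) -> List[Canopy]:
    """Assign points to canopies in a single pass.

    A point joins every canopy whose center is closer than t1 (loose
    threshold). If it is also closer than t2 (tight threshold) to some
    center, it is considered covered and does not start a new canopy.
    """
    canopies: List[Canopy] = []
    for point in points:
        strongly_bound = False
        for center, members in canopies:
            d = distance(point, center)
            if d < t1:
                members.append(point)
            if d < t2:
                strongly_bound = True
        if not strongly_bound:
            # The point is not tightly covered by any canopy: it seeds a new one.
            canopies.append((point, [point]))
    return canopies


if __name__ == "__main__":
    # Made-up 2-D points; thresholds apply to squared distances here.
    sample = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.0), (10.0, 10.0)]
    for center, members in canopy_cluster(sample, t1=4.0, t2=1.0):
        print(center, len(members))
```

A smaller t2 produces more canopies, and a larger t1 lets canopies overlap more. Because the distance here is squared, thresholds have to be chosen on the squared scale as well.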


3. Inspect the output directory

[root@masterclone ~]# hadoop fs -ls /mahout/output/canopy-result
Warning: $HADOOP_HOME is deprecated.

Found 1 items
drwxr-xr-x   - root supergroup          0 2014-05-12 16:23 /mahout/output/canopy-result/clusters-0-final
[root@masterclone ~]# 

 


 
