Mahout 0.6: the fuzzy k-means clustering algorithm

1. Convert the text file to vectors

mahout org.apache.mahout.clustering.conversion.InputDriver -i /mahout/input/p04-17.txt -o /mahout/output/vectorfiles -v org.apache.mahout.math.RandomAccessSparseVector

[root@masterclone ~]# hadoop fs -ls /mahout/output/vectorfiles
Warning: $HADOOP_HOME is deprecated.

Found 3 items
-rw-r--r--   1 root supergroup          0 2014-05-12 06:58 /mahout/output/vectorfiles/_SUCCESS
drwxr-xr-x   - root supergroup          0 2014-05-12 06:58 /mahout/output/vectorfiles/_logs
-rw-r--r--   1 root supergroup      56430 2014-05-12 06:58 /mahout/output/vectorfiles/part-m-00000
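
To make the conversion step concrete, here is a rough Python sketch of the kind of parsing InputDriver is understood to do: each line of whitespace-separated numbers becomes one vector. The function name parse_vectors and the sample lines are hypothetical and are not taken from p04-17.txt. The real driver writes a Hadoop SequenceFile of vectors rather than Python lists.

```python
def parse_vectors(lines):
    """Turn lines of whitespace-separated numbers into lists of floats.

    Blank lines are skipped. This only mirrors the assumed input format;
    it does not produce a Hadoop SequenceFile.
    """
    vectors = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        vectors.append([float(f) for f in fields])
    return vectors


if __name__ == "__main__":
    # Hypothetical sample input, two 3-dimensional points.
    sample = ["1.0 2.5 3", "", "4 5 6"]
    for v in parse_vectors(sample):
        print(v)
```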

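2. Run the fuzzy k-means algorithm

In the command below, -m 2 is the fuzziness exponent, -x 20 caps the number of iterations, -k 2 asks for two clusters whose initial centers are seeded at random into the -c directory, -cd 0.1 is the convergence delta, -ow overwrites any existing output, and -cl runs the final clustering step once the centers are computed. The log echoes the corresponding long option names.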

mahout fkmeans -i /mahout/output/vectorfiles -o /mahout/output/fuzzy-kmeans-result -c /mahout/input/fuzzy-kmeans-centerpt -m 2 -x 20 -k 2 -cd 0.1 -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -ow -cl

[root@masterclone ~]# mahout fkmeans -i /mahout/output/vectorfiles -o /mahout/output/fuzzy-kmeans-result -c /mahout/input/fuzzy-kmeans-centerpt -m 2 -x 20 -k 2 -cd 0.1 -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -ow -cl
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB: /root/mahout/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.

14/05/12 16:40:48 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=/mahout/input/fuzzy-kmeans-centerpt, --convergenceDelta=0.1, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --emitMostLikely=true, --endPhase=2147483647, --input=/mahout/output/vectorfiles, --m=2, --maxIter=20, --method=mapreduce, --numClusters=2, --output=/mahout/output/fuzzy-kmeans-result, --overwrite=null, --startPhase=0, --tempDir=temp, --threshold=0}
14/05/12 16:40:49 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/05/12 16:40:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/05/12 16:40:49 INFO compress.CodecPool: Got brand-new compressor
14/05/12 16:40:50 INFO kmeans.RandomSeedGenerator: Wrote 2 vectors to /mahout/input/fuzzy-kmeans-centerpt/part-randomSeed
14/05/12 16:40:50 INFO fuzzykmeans.FuzzyKMeansDriver: Fuzzy K-Means Iteration 1
14/05/12 16:40:51 INFO input.FileInputFormat: Total input paths to process : 1
14/05/12 16:40:51 INFO mapred.JobClient: Running job: job_201405121559_0006
14/05/12 16:40:52 INFO mapred.JobClient:  map 0% reduce 0%
14/05/12 16:41:04 INFO mapred.JobClient:  map 100% reduce 0%
14/05/12 16:41:13 INFO mapred.JobClient:  map 100% reduce 33%
14/05/12 16:41:15 INFO mapred.JobClient:  map 100% reduce 100%
14/05/12 16:41:16 INFO mapred.JobClient: Job complete: job_201405121559_0006
14/05/12 16:41:16 INFO mapred.JobClient: Counters: 30
14/05/12 16:41:16 INFO mapred.JobClient:   Job Counters 
14/05/12 16:41:16 INFO mapred.JobClient:     Launched reduce tasks=1
14/05/12 16:41:16 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9902
14/05/12 16:41:16 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/12 16:41:16 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/05/12 16:41:16 INFO mapred.JobClient:     Launched map tasks=1
14/05/12 16:41:16 INFO mapred.JobClient:     Data-local map tasks=1
14/05/12 16:41:16 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10380
14/05/12 16:41:16 INFO mapred.JobClient:   File Output Format Counters 
14/05/12 16:41:16 INFO mapred.JobClient:     Bytes Written=381
14/05/12 16:41:16 INFO mapred.JobClient:   Clustering
14/05/12 16:41:16 INFO mapred.JobClient:     Converged Clusters=2
14/05/12 16:41:16 INFO mapred.JobClient:   FileSystemCounters
14/05/12 16:41:16 INFO mapred.JobClient:     FILE_BYTES_READ=134
14/05/12 16:41:16 INFO mapred.JobClient:     HDFS_BYTES_READ=57277
14/05/12 16:41:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=109662
14/05/12 16:41:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=381
14/05/12 16:41:16 INFO mapred.JobClient:   File Input Format Counters 
14/05/12 16:41:16 INFO mapred.JobClient:     Bytes Read=56430
14/05/12 16:41:16 INFO mapred.JobClient:   Map-Reduce Framework
14/05/12 16:41:16 INFO mapred.JobClient:     Map output materialized bytes=134
14/05/12 16:41:16 INFO mapred.JobClient:     Map input records=1800
14/05/12 16:41:16 INFO mapred.JobClient:     Reduce shuffle bytes=134
14/05/12 16:41:16 INFO mapred.JobClient:     Spilled Records=4
14/05/12 16:41:16 INFO mapred.JobClient:     Map output bytes=223200
14/05/12 16:41:16 INFO mapred.JobClient:     CPU time spent (ms)=2880
14/05/12 16:41:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=176033792
14/05/12 16:41:16 INFO mapred.JobClient:     Combine input records=3600
14/05/12 16:41:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=127
14/05/12 16:41:16 INFO mapred.JobClient:     Reduce input records=2
14/05/12 16:41:16 INFO mapred.JobClient:     Reduce input groups=2
14/05/12 16:41:16 INFO mapred.JobClient:     Combine output records=2
14/05/12 16:41:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=280854528
14/05/12 16:41:16 INFO mapred.JobClient:     Reduce output records=2
14/05/12 16:41:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2104713216
14/05/12 16:41:16 INFO mapred.JobClient:     Map output records=3600
14/05/12 16:41:16 INFO fuzzykmeans.FuzzyKMeansDriver: Clustering
14/05/12 16:41:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/05/12 16:41:17 INFO input.FileInputFormat: Total input paths to process : 1
14/05/12 16:41:17 INFO mapred.JobClient: Running job: job_201405121559_0007
14/05/12 16:41:18 INFO mapred.JobClient:  map 0% reduce 0%
14/05/12 16:41:29 INFO mapred.JobClient:  map 100% reduce 0%
14/05/12 16:41:31 INFO mapred.JobClient: Job complete: job_201405121559_0007
14/05/12 16:41:31 INFO mapred.JobClient: Counters: 19
14/05/12 16:41:31 INFO mapred.JobClient:   Job Counters 
14/05/12 16:41:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10457
14/05/12 16:41:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/12 16:41:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/05/12 16:41:31 INFO mapred.JobClient:     Launched map tasks=1
14/05/12 16:41:31 INFO mapred.JobClient:     Data-local map tasks=1
14/05/12 16:41:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/05/12 16:41:31 INFO mapred.JobClient:   File Output Format Counters 
14/05/12 16:41:31 INFO mapred.JobClient:     Bytes Written=74631
14/05/12 16:41:31 INFO mapred.JobClient:   FileSystemCounters
14/05/12 16:41:31 INFO mapred.JobClient:     HDFS_BYTES_READ=56938
14/05/12 16:41:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=53392
14/05/12 16:41:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=74631
14/05/12 16:41:31 INFO mapred.JobClient:   File Input Format Counters 
14/05/12 16:41:31 INFO mapred.JobClient:     Bytes Read=56430
14/05/12 16:41:31 INFO mapred.JobClient:   Map-Reduce Framework
14/05/12 16:41:31 INFO mapred.JobClient:     Map input records=1800
14/05/12 16:41:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=77889536
14/05/12 16:41:31 INFO mapred.JobClient:     Spilled Records=0
14/05/12 16:41:31 INFO mapred.JobClient:     CPU time spent (ms)=1060
14/05/12 16:41:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=15728640
14/05/12 16:41:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1047531520
14/05/12 16:41:31 INFO mapred.JobClient:     Map output records=1800
14/05/12 16:41:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=127
14/05/12 16:41:31 INFO driver.MahoutDriver: Program took 43066 ms (Minutes: 0.7177666666666667)
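
The log reports "Converged Clusters=2" after iteration 1, which is why the result lands in clusters-1-final. As a rough illustration of what one iteration computes, the sketch below uses the textbook fuzzy c-means formulas with the same settings as the command (m = 2, squared Euclidean distance, convergence delta 0.1). It is an assumption that Mahout's internals match these formulas exactly. All function names and the sample points are made up for illustration.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance, the measure chosen with -dm above."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(point, centers, m=2.0, eps=1e-10):
    """Textbook fuzzy membership of one point in every cluster (sums to 1)."""
    # Clamp zero distances so a point lying on a center does not divide by zero.
    dists = [max(squared_euclidean(point, c), eps) for c in centers]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def update_centers(points, centers, m=2.0):
    """One iteration: each new center is the mean of all points weighted by u**m."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    totals = [0.0] * len(centers)
    for p in points:
        for j, u in enumerate(memberships(p, centers, m)):
            w = u ** m
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
    return [[s / totals[j] for s in sums[j]] for j in range(len(centers))]


def converged(old, new, delta=0.1):
    """True when every center moved by at most delta (assumed meaning of -cd)."""
    return all(squared_euclidean(o, n) <= delta for o, n in zip(old, new))


def most_likely(point, centers, m=2.0):
    """Index of the cluster with the highest membership for this point."""
    u = memberships(point, centers, m)
    return max(range(len(u)), key=u.__getitem__)


if __name__ == "__main__":
    # Made-up 2-D points, not the contents of p04-17.txt.
    points = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
    centers = [[0.0, 0.0], [5.0, 5.0]]
    for _ in range(20):  # same cap as -x 20
        new_centers = update_centers(points, centers)
        done = converged(centers, new_centers)
        centers = new_centers
        if done:
            break
    print(centers)
    print([most_likely(p, centers) for p in points])
```

With a delta as loose as 0.1, a run can stop after very few iterations, as it did here. A smaller -cd value would normally force more passes before the centers are treated as stable.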


3. Inspect the output directory

[root@masterclone ~]# hadoop fs -ls /mahout/output/fuzzy-kmeans-result/clusters-1-final
Warning: $HADOOP_HOME is deprecated.

Found 3 items
-rw-r--r--   1 root supergroup          0 2014-05-12 16:41 /mahout/output/fuzzy-kmeans-result/clusters-1-final/_SUCCESS
drwxr-xr-x   - root supergroup          0 2014-05-12 16:40 /mahout/output/fuzzy-kmeans-result/clusters-1-final/_logs
-rw-r--r--   1 root supergroup        381 2014-05-12 16:41 /mahout/output/fuzzy-kmeans-result/clusters-1-final/part-r-00000


 
