1. Convert the text file to vectors
mahout org.apache.mahout.clustering.conversion.InputDriver -i /mahout/input/p04-17.txt -o /mahout/output/vectorfiles -v org.apache.mahout.math.RandomAccessSparseVector
[root@masterclone ~]# hadoop fs -ls /mahout/output/vectorfiles
Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r-- 1 root supergroup 0 2014-05-12 06:58 /mahout/output/vectorfiles/_SUCCESS
drwxr-xr-x - root supergroup 0 2014-05-12 06:58 /mahout/output/vectorfiles/_logs
-rw-r--r-- 1 root supergroup 56430 2014-05-12 06:58 /mahout/output/vectorfiles/part-m-00000
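Before running the conversion it can help to sanity-check the raw text file. The sketch below assumes that InputDriver expects one data point per line, written as space-separated numbers, and that every point has the same number of dimensions. The function name check_numeric_rows is hypothetical, and the check is meant for a local copy of the input file, not for the copy in HDFS.

```shell
# Hypothetical helper: verify that every non-blank line of a local text file
# holds only numeric fields and that all lines have the same field count.
# Exit status is 0 when the file looks clean, 1 otherwise.
check_numeric_rows() {
  awk '
    NF == 0 { next }    # ignore blank lines
    {
      for (i = 1; i <= NF; i++)
        if ($i !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/) {
          printf "line %d: non-numeric field \"%s\"\n", NR, $i
          bad = 1
        }
      if (cols == 0)
        cols = NF       # first data line fixes the expected dimension
      else if (NF != cols) {
        printf "line %d: %d fields, expected %d\n", NR, NF, cols
        bad = 1
      }
    }
    END { exit bad + 0 }
  ' "$1"
}

# Usage (assumes a local copy named p04-17.txt):
#   check_numeric_rows p04-17.txt && echo "input looks clean"
```

If the check reports problems, fixing them locally is usually cheaper than debugging a failed conversion job.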
2. Run the fuzzy k-means algorithm
[root@masterclone ~]# mahout fkmeans -i /mahout/output/vectorfiles -o /mahout/output/fuzzy-kmeans-result -c /mahout/input/fuzzy-kmeans-centerpt -m 2 -x 20 -k 2 -cd 0.1 -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -ow -cl
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB: /root/mahout/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
14/05/12 16:40:48 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=/mahout/input/fuzzy-kmeans-centerpt, --convergenceDelta=0.1, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --emitMostLikely=true, --endPhase=2147483647, --input=/mahout/output/vectorfiles, --m=2, --maxIter=20, --method=mapreduce, --numClusters=2, --output=/mahout/output/fuzzy-kmeans-result, --overwrite=null, --startPhase=0, --tempDir=temp, --threshold=0}
14/05/12 16:40:49 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/05/12 16:40:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/05/12 16:40:49 INFO compress.CodecPool: Got brand-new compressor
14/05/12 16:40:50 INFO kmeans.RandomSeedGenerator: Wrote 2 vectors to /mahout/input/fuzzy-kmeans-centerpt/part-randomSeed
14/05/12 16:40:50 INFO fuzzykmeans.FuzzyKMeansDriver: Fuzzy K-Means Iteration 1
14/05/12 16:40:51 INFO input.FileInputFormat: Total input paths to process : 1
14/05/12 16:40:51 INFO mapred.JobClient: Running job: job_201405121559_0006
14/05/12 16:40:52 INFO mapred.JobClient: map 0% reduce 0%
14/05/12 16:41:04 INFO mapred.JobClient: map 100% reduce 0%
14/05/12 16:41:13 INFO mapred.JobClient: map 100% reduce 33%
14/05/12 16:41:15 INFO mapred.JobClient: map 100% reduce 100%
14/05/12 16:41:16 INFO mapred.JobClient: Job complete: job_201405121559_0006
14/05/12 16:41:16 INFO mapred.JobClient: Counters: 30
14/05/12 16:41:16 INFO mapred.JobClient: Job Counters
14/05/12 16:41:16 INFO mapred.JobClient: Launched reduce tasks=1
14/05/12 16:41:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9902
14/05/12 16:41:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/12 16:41:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/05/12 16:41:16 INFO mapred.JobClient: Launched map tasks=1
14/05/12 16:41:16 INFO mapred.JobClient: Data-local map tasks=1
14/05/12 16:41:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10380
14/05/12 16:41:16 INFO mapred.JobClient: File Output Format Counters
14/05/12 16:41:16 INFO mapred.JobClient: Bytes Written=381
14/05/12 16:41:16 INFO mapred.JobClient: Clustering
14/05/12 16:41:16 INFO mapred.JobClient: Converged Clusters=2
14/05/12 16:41:16 INFO mapred.JobClient: FileSystemCounters
14/05/12 16:41:16 INFO mapred.JobClient: FILE_BYTES_READ=134
14/05/12 16:41:16 INFO mapred.JobClient: HDFS_BYTES_READ=57277
14/05/12 16:41:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=109662
14/05/12 16:41:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=381
14/05/12 16:41:16 INFO mapred.JobClient: File Input Format Counters
14/05/12 16:41:16 INFO mapred.JobClient: Bytes Read=56430
14/05/12 16:41:16 INFO mapred.JobClient: Map-Reduce Framework
14/05/12 16:41:16 INFO mapred.JobClient: Map output materialized bytes=134
14/05/12 16:41:16 INFO mapred.JobClient: Map input records=1800
14/05/12 16:41:16 INFO mapred.JobClient: Reduce shuffle bytes=134
14/05/12 16:41:16 INFO mapred.JobClient: Spilled Records=4
14/05/12 16:41:16 INFO mapred.JobClient: Map output bytes=223200
14/05/12 16:41:16 INFO mapred.JobClient: CPU time spent (ms)=2880
14/05/12 16:41:16 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792
14/05/12 16:41:16 INFO mapred.JobClient: Combine input records=3600
14/05/12 16:41:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=127
14/05/12 16:41:16 INFO mapred.JobClient: Reduce input records=2
14/05/12 16:41:16 INFO mapred.JobClient: Reduce input groups=2
14/05/12 16:41:16 INFO mapred.JobClient: Combine output records=2
14/05/12 16:41:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=280854528
14/05/12 16:41:16 INFO mapred.JobClient: Reduce output records=2
14/05/12 16:41:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2104713216
14/05/12 16:41:16 INFO mapred.JobClient: Map output records=3600
14/05/12 16:41:16 INFO fuzzykmeans.FuzzyKMeansDriver: Clustering
14/05/12 16:41:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/05/12 16:41:17 INFO input.FileInputFormat: Total input paths to process : 1
14/05/12 16:41:17 INFO mapred.JobClient: Running job: job_201405121559_0007
14/05/12 16:41:18 INFO mapred.JobClient: map 0% reduce 0%
14/05/12 16:41:29 INFO mapred.JobClient: map 100% reduce 0%
14/05/12 16:41:31 INFO mapred.JobClient: Job complete: job_201405121559_0007
14/05/12 16:41:31 INFO mapred.JobClient: Counters: 19
14/05/12 16:41:31 INFO mapred.JobClient: Job Counters
14/05/12 16:41:31 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=10457
14/05/12 16:41:31 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/12 16:41:31 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/05/12 16:41:31 INFO mapred.JobClient: Launched map tasks=1
14/05/12 16:41:31 INFO mapred.JobClient: Data-local map tasks=1
14/05/12 16:41:31 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/05/12 16:41:31 INFO mapred.JobClient: File Output Format Counters
14/05/12 16:41:31 INFO mapred.JobClient: Bytes Written=74631
14/05/12 16:41:31 INFO mapred.JobClient: FileSystemCounters
14/05/12 16:41:31 INFO mapred.JobClient: HDFS_BYTES_READ=56938
14/05/12 16:41:31 INFO mapred.JobClient: FILE_BYTES_WRITTEN=53392
14/05/12 16:41:31 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=74631
14/05/12 16:41:31 INFO mapred.JobClient: File Input Format Counters
14/05/12 16:41:31 INFO mapred.JobClient: Bytes Read=56430
14/05/12 16:41:31 INFO mapred.JobClient: Map-Reduce Framework
14/05/12 16:41:31 INFO mapred.JobClient: Map input records=1800
14/05/12 16:41:31 INFO mapred.JobClient: Physical memory (bytes) snapshot=77889536
14/05/12 16:41:31 INFO mapred.JobClient: Spilled Records=0
14/05/12 16:41:31 INFO mapred.JobClient: CPU time spent (ms)=1060
14/05/12 16:41:31 INFO mapred.JobClient: Total committed heap usage (bytes)=15728640
14/05/12 16:41:31 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1047531520
14/05/12 16:41:31 INFO mapred.JobClient: Map output records=1800
14/05/12 16:41:31 INFO mapred.JobClient: SPLIT_RAW_BYTES=127
14/05/12 16:41:31 INFO driver.MahoutDriver: Program took 43066 ms (Minutes: 0.7177666666666667)
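The -m 2 option above sets the fuzziness exponent. As a rough illustration of its effect, the sketch below computes the textbook fuzzy c-means membership weights u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)) for one point, given its distance to each cluster center. This is a sketch of the standard formula, not Mahout's code: the function name fuzzy_memberships is hypothetical, all distances are assumed to be nonzero, and the d values stand for whatever the chosen distance measure returns. The counters in the log (1800 map input records, 3600 map output records) are consistent with one weighted record being emitted per point per cluster.

```shell
# Hypothetical helper: print the fuzzy membership of one point in each cluster.
# Arguments: fuzziness m (> 1), then the distance to each cluster center.
# Assumes every distance is strictly positive.
fuzzy_memberships() {
  m="$1"
  shift
  printf '%s\n' "$@" | awk -v m="$m" '
    { d[NR] = $1 }                      # collect one distance per line
    END {
      e = 2 / (m - 1)                   # exponent from the fuzziness m
      for (i = 1; i <= NR; i++) {
        sum = 0
        for (j = 1; j <= NR; j++)
          sum += (d[i] / d[j]) ^ e
        printf "%.4f\n", 1 / sum        # membership in cluster i
      }
    }'
}

# A point at distance 1 from the first center and 3 from the second, m = 2:
fuzzy_memberships 2 1 3
# prints:
# 0.9000
# 0.1000
```

The weights for a point always sum to 1. As m approaches 1 the assignment approaches hard k-means, and larger m spreads the weight more evenly across clusters.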
3. Inspect the output directory
[root@masterclone ~]# hadoop fs -ls /mahout/output/fuzzy-kmeans-result/clusters-1-final
Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r-- 1 root supergroup 0 2014-05-12 16:41 /mahout/output/fuzzy-kmeans-result/clusters-1-final/_SUCCESS
drwxr-xr-x - root supergroup 0 2014-05-12 16:40 /mahout/output/fuzzy-kmeans-result/clusters-1-final/_logs
-rw-r--r-- 1 root supergroup 381 2014-05-12 16:41 /mahout/output/fuzzy-kmeans-result/clusters-1-final/part-r-00000
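The final directory is named clusters-1-final here because the log shows a single iteration before the clustering step. With other data or a smaller -cd value the number will probably differ, so it can be convenient to look the name up instead of hard-coding it. The sketch below assumes the listing format shown above, where the path is the last field on each line; the function name final_cluster_dir is hypothetical.

```shell
# Hypothetical helper: read `hadoop fs -ls` output on stdin and print the
# path of any entry whose name matches clusters-<N>-final.
final_cluster_dir() {
  awk '$NF ~ /\/clusters-[0-9]+-final$/ { print $NF }'
}

# Usage (requires a working Hadoop client):
#   hadoop fs -ls /mahout/output/fuzzy-kmeans-result | final_cluster_dir
```

The printed path can then be passed to further listing or dump commands without knowing the iteration count in advance.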