Environment: Hadoop 1.0.4, Mahout 0.5.
Mahout provides a class called ClusterDumper for reading the output of its clustering algorithms. Its output usually looks like this:
VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}
Weight: Point:
1.0: [1.000, 3.000]
...
1.0: [3.000, 2.500]
VL-11{n=7 c=[2.857, 4.714] r=[0.990, 0.364]}
Weight: Point:
1.0: [1.000, 5.000]
...
1.0: [4.000, 4.500]
VL-14{n=8 c=[4.750, 3.438] r=[0.433, 0.682]}
Weight: Point:
1.0: [4.000, 3.000]
...
1.0: [5.000, 4.000]
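However, this does not help if all I want is a file containing only the cluster centers. My first idea was to subclass ClusterDumper, but ClusterDumper is declared final, so I decided to write my own class instead.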
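If a ClusterDumper text dump already exists, a rough workaround is to filter it and keep only the cluster header lines. The sketch below assumes every header line has the shape shown in the sample output above (an identifier such as VL-2 followed by a {...} body); the class name ClusterHeaderFilter and its method are made up for illustration and are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not a Mahout class): keeps only the cluster header
// lines of a ClusterDumper-style text dump and drops the point listings.
class ClusterHeaderFilter {

    // Assumed header shape: letters, a dash, digits, then a {...} body,
    // e.g. "VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}".
    private static final Pattern HEADER = Pattern.compile("^[A-Za-z]+-\\d+\\{.*\\}$");

    static List<String> headerLines(List<String> dumpLines) {
        List<String> result = new ArrayList<String>();
        for (String line : dumpLines) {
            String trimmed = line.trim();
            if (HEADER.matcher(trimmed).matches()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dump = Arrays.asList(
            "VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}",
            "Weight: Point:",
            "1.0: [1.000, 3.000]",
            "VL-11{n=7 c=[2.857, 4.714] r=[0.990, 0.364]}");
        // Prints only the two lines that start with a cluster identifier.
        for (String header : headerLines(dump)) {
            System.out.println(header);
        }
    }
}
```

This still requires running ClusterDumper first, which is why reading the sequence files directly, as done in the rest of this post, is the cleaner route.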
The relevant part of the ClusterDumper source, for reference:
for (Cluster value :
    new SequenceFileDirValueIterable<Cluster>(new Path(seqFileDir, "part-*"), PathType.GLOB, conf)) {
  String fmtStr = value.asFormatString(dictionary);
  if (subString > 0 && fmtStr.length() > subString) {
    writer.write(':');
    writer.write(fmtStr, 0, Math.min(subString, fmtStr.length()));
  } else {
    writer.write(fmtStr);
  }
  // ... (rest of the loop body not shown)
}
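To make the branch in this excerpt concrete, here is a small standalone sketch of the same truncation logic, run against a StringWriter instead of Mahout's output writer. The class and method names (SubStringDemo, writeFormatted, format) are hypothetical; only the if/else body is taken from the excerpt.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical demo class: reproduces the subString branch of the excerpt
// without any Hadoop or Mahout dependency.
class SubStringDemo {

    // Same branch as in the excerpt: a subString value <= 0 means "no limit".
    static void writeFormatted(Writer writer, String fmtStr, int subString) throws IOException {
        if (subString > 0 && fmtStr.length() > subString) {
            writer.write(':');
            writer.write(fmtStr, 0, Math.min(subString, fmtStr.length()));
        } else {
            writer.write(fmtStr);
        }
    }

    // Convenience wrapper that returns what would have been written.
    static String format(String fmtStr, int subString) {
        StringWriter writer = new StringWriter();
        try {
            writeFormatted(writer, fmtStr, subString);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        String line = "VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}";
        System.out.println(format(line, 4)); // prints ":VL-2"
        System.out.println(format(line, 0)); // prints the full line
    }
}
```

So a positive subString shortens long cluster lines and marks them with a leading colon, while everything else is written unchanged.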
Alternatively, see one of my earlier posts, "mahout源码KMeansDriver分析之二中心点文件分析(无语篇)", which also covers reading cluster centers.
Based on this, a ClusterCenterDump class can be written as follows:
package com.caic.cloud.util;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.Writer;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.common.iterator.sequencefile.PathType;
import org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable;
import com.google.common.base.Charsets;
import com.google.common.io.Files;
/**
 * Writes only the cluster center vectors to a given file.
 *
 * @author fansy
 */
public class ClusterCenterDump {

    private static final Log log = LogFactory.getLog(ClusterCenterDump.class);

    private Configuration conf;
    private Path centerPathDir;
    private String outputPath;

    /**
     * @param conf          Hadoop configuration used to reach the file system
     * @param centerPathDir directory holding the cluster sequence files (part-*)
     * @param outputPath    local file the centers are written to
     */
    public ClusterCenterDump(Configuration conf, String centerPathDir, String outputPath) {
        this.conf = conf;
        this.centerPathDir = new Path(centerPathDir);
        this.setOutputPath(outputPath);
    }

    /**
     * Writes the cluster centers under centerPathDir to the local file outputPath.
     *
     * @return true on success; false if a setting is missing or an I/O error occurs
     * @throws FileNotFoundException kept from the original signature; the IOException
     *         handler below already catches it
     */
    public boolean writeCenterToLocal() throws FileNotFoundException {
        if (this.conf == null || this.outputPath == null || this.centerPathDir == null) {
            log.error("conf, outputPath and centerPathDir must all be set before writing");
            return false;
        }
        Writer writer = null;
        try {
            File outputFile = new File(outputPath);
            writer = Files.newWriter(outputFile, Charsets.UTF_8);
            // Read every part-* file in the cluster directory as a sequence of Cluster values.
            this.writeTxtCenter(writer,
                new SequenceFileDirValueIterable<Cluster>(new Path(centerPathDir, "part-*"), PathType.GLOB, conf));
            // Alternative left commented out by the author:
            // new SequenceFileDirValueIterable<Writable>(new Path(centerPathDir, "part-r-00000"), PathType.LIST,
            //     PathFilters.partFilter(), conf));
            writer.flush();
        } catch (IOException e) {
            log.error("write error: " + e.getMessage(), e);
            return false;
        } finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException e) {
                log.error("close writer error: " + e.getMessage(), e);
            }
        }
        return true;
    }
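
    /**
     * Writes one line per cluster, in Cluster.asFormatString form, to the writer.
     *
     * @param writer   destination writer
     * @param clusters clusters to write
     * @return true once all clusters have been written
     * @throws IOException if writing fails
     */
    private boolean writeTxtCenter(Writer writer, Iterable<Cluster> clusters) throws IOException {
        for (Cluster cluster : clusters) {
            // null means no dictionary is supplied
            String fmtStr = cluster.asFormatString(null);
            // echo each line to the console for debugging
            System.out.println("fmtStr:" + fmtStr);
            writer.write(fmtStr);
            writer.write("\n");
        }
        return true;
    }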

    public Configuration getConf() {
        return conf;
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    public Path getCenterPathDir() {
        return centerPathDir;
    }

    public void setCenterPathDir(Path centerPathDir) {
        this.centerPathDir = centerPathDir;
    }

    /**
     * @return the outputPath
     */
    public String getOutputPath() {
        return outputPath;
    }

    /**
     * @param outputPath the outputPath to set
     */
    public void setOutputPath(String outputPath) {
        this.outputPath = outputPath;
    }
}
Here is a test class:
package fansy;

import java.io.FileNotFoundException;

import junit.framework.TestCase;

import org.apache.hadoop.conf.Configuration;

import com.caic.cloud.util.ClusterCenterDump;
import com.caic.forecast.pub.util.SpringUtil;

public class ClusterCenterDumpTest extends TestCase {

    public void testWrite() throws FileNotFoundException {
        // Project-specific helper from my own code base (not shown here)
        SpringUtil.springWithoutWeb();
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "master:9001");
        conf.set("fs.default.name", "master:9000");
        // Cluster directory on HDFS and the local output file
        String centerPath = "output/clusters-2";
        String outputPath = "e:/a.txt";
        ClusterCenterDump cc = new ClusterCenterDump(conf, centerPath, outputPath);
        boolean flag = cc.writeCenterToLocal();
        System.out.println("done:" + flag);
    }
}
After running the test, the local file e:/a.txt contains output like the following:
VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}
VL-15{n=10 c=[4.600, 3.700] r=[0.490, 0.812]}
VL-5{n=5 c=[2.400, 4.700] r=[0.800, 0.400]}
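Once the file has one cluster per line, the center coordinates can be pulled out of the c=[...] field with a little parsing. The sketch below assumes the dense format shown above, with plain comma-separated numbers; it would not handle a sparse index:value style. The class name CenterLineParser and the method parseCenter are hypothetical, not part of Mahout.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Mahout class): extracts the center vector from
// one line such as "VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}".
class CenterLineParser {

    // Captures everything between "c=[" and the next "]".
    private static final Pattern CENTER = Pattern.compile("c=\\[([^\\]]*)\\]");

    static double[] parseCenter(String line) {
        Matcher m = CENTER.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no c=[...] field in: " + line);
        }
        String body = m.group(1).trim();
        if (body.isEmpty()) {
            return new double[0];
        }
        // Assumes dense, comma-separated values as in the sample output.
        String[] parts = body.split(",");
        double[] center = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            center[i] = Double.parseDouble(parts[i].trim());
        }
        return center;
    }

    public static void main(String[] args) {
        double[] c = parseCenter("VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}");
        System.out.println(Arrays.toString(c)); // prints [1.833, 2.417]
    }
}
```

Note that the values in the text file are already rounded to three decimals by the formatting, so reading the Cluster objects directly is the better choice when full precision matters.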
Share, grow, be happy.
When reposting, please credit the blog: http://blog.csdn.net/fansy1990