mahout
linest00
Code reading - InputMapper
package org.apache.mahout.clustering.conversion; Purpose: read the input and convert it into vector output. [code="java"]private static final Pattern SPACE = Pattern.compile(" ");private Constructor constructor;[/code] Uses reflection to load the vec... 2011-10-27 16:46:29 · 216 reads · 0 comments
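The two fields in the excerpt suggest a simple pattern: split each input line on spaces, then build the vector through a constructor looked up by reflection. Below is a minimal sketch of that idea. `ToyVector` and `LineToVector` are hypothetical names invented for illustration, and the `(int)` constructor signature is an assumption, not Mahout's actual Vector API.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a vector class; this is NOT Mahout's Vector.
class ToyVector {
    final double[] values;

    public ToyVector(int cardinality) {
        values = new double[cardinality];
    }

    void set(int index, double value) {
        values[index] = value;
    }
}

// Hypothetical converter: one space-separated line in, one vector out.
class LineToVector {
    private static final Pattern SPACE = Pattern.compile(" ");
    private final Constructor<?> constructor;

    // Look up the (int) constructor of the named class by reflection.
    LineToVector(String vectorClassName) {
        try {
            constructor = Class.forName(vectorClassName).getConstructor(int.class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + vectorClassName, e);
        }
    }

    ToyVector convert(String line) {
        // Parse every non-empty token as a double.
        List<Double> parsed = new ArrayList<>();
        for (String token : SPACE.split(line)) {
            if (!token.isEmpty()) {
                parsed.add(Double.valueOf(token));
            }
        }
        try {
            // Build the vector reflectively, sized to the number of values.
            ToyVector vector = (ToyVector) constructor.newInstance(parsed.size());
            for (int i = 0; i < parsed.size(); i++) {
                vector.set(i, parsed.get(i));
            }
            return vector;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate vector", e);
        }
    }
}
```

Passing the class name as a string is what lets the vector implementation be chosen by configuration rather than at compile time.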
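Code reading - BayesFileFormatter
Uses: file reading and writing, iterating over the files in a directory. package org.apache.mahout.classifier; public final class BayesFileFormatter. It offers two processing modes: process all files in a directory and write them into a single document, or write each file to its own document. Single document: [code="java"] public static void collapse(Stri... 2012-02-03 22:51:26 · 65 reads · 0 comments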
Code reading - TopKStringPatterns
package org.apache.mahout.fpm.pfpgrowth.convertors.string; public final class TopKStringPatterns implements Writable. Stores patterns and merges them to find the top-k patterns. Its core is a list of pairs; each pair consists of a list of strings that forms the pattern and a long... Original · 2011-11-24 14:25:18 · 93 reads · 0 comments
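A rough sketch of the merge step described above: two lists of (pattern, long) pairs are combined and only the k highest-scoring pairs are kept. `ScoredPattern` and `TopKMerge` are hypothetical names, and treating the long as a support count is an assumption, since the excerpt is cut off before it says what the long holds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical pair: a pattern (list of item strings) plus a long score.
// Assumption: the long is the pattern's support count.
class ScoredPattern {
    final List<String> items;
    final long support;

    ScoredPattern(List<String> items, long support) {
        this.items = items;
        this.support = support;
    }
}

class TopKMerge {
    // Merge two pattern lists and keep the k entries with the highest support.
    static List<ScoredPattern> merge(List<ScoredPattern> left,
                                     List<ScoredPattern> right,
                                     int k) {
        List<ScoredPattern> all = new ArrayList<>(left);
        all.addAll(right);
        // Sort in descending order of support.
        all.sort(Comparator.comparingLong((ScoredPattern p) -> p.support).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Sorting the concatenated list is the simplest way to show the idea; if both inputs are already sorted, a linear two-pointer merge would avoid the sort.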
Code reading - TrainClassifier and TestClassifier
package org.apache.mahout.classifier.bayes; public final class TrainClassifier. Entry class for bayes and cbayes, with two branches: [code="java"] public static void trainNaiveBayes(Path dir, Path outputDir, BayesParameters ... 2011-11-17 19:44:16 · 241 reads · 0 comments
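Code reading - MinHashDriver and related classes
Uses: generic classes, counters, hash implementations. package org.apache.mahout.clustering.minhash; public final class MinHashDriver extends AbstractJob. The input is in Sequence format. Depending on the debug mode, the output can be in vector or text form, and the files can be in Sequence or Text format. [code="java"]... 2012-01-26 14:17:41 · 115 reads · 0 comments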
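The ToolRunner mechanism
The framework defines an interface that concrete classes implement: [code="java"]public interface Tool extends Configurable { int run(String [] args) throws Exception;}[/code] ToolRunner is the unified entry point: it parses the arguments according to the configuration and then calls the interface method. [code="java"] public... 2012-01-26 11:57:55 · 130 reads · 0 comments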
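To make the division of labour concrete, here is a small imitation of the pattern that does not depend on Hadoop. `MiniConfigurable`, `MiniTool`, `MiniToolRunner` and `ArgCountTool` are hypothetical stand-ins, and the `-Dkey=value` handling is a simplified assumption about how generic options are separated from tool arguments; it is not Hadoop's actual parser.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Configurable and Tool (not the Hadoop types).
interface MiniConfigurable {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
}

interface MiniTool extends MiniConfigurable {
    int run(String[] args) throws Exception;
}

// Unified entry point: strips -Dkey=value options into the configuration,
// then hands the remaining arguments to the tool.
class MiniToolRunner {
    static int run(MiniTool tool, String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        List<String> remaining = new ArrayList<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                remaining.add(arg);
            }
        }
        tool.setConf(conf);
        return tool.run(remaining.toArray(new String[0]));
    }
}

// Example tool: returns the number of tool-specific arguments it received.
class ArgCountTool implements MiniTool {
    private Map<String, String> conf = new HashMap<>();

    @Override
    public void setConf(Map<String, String> conf) {
        this.conf = conf;
    }

    @Override
    public Map<String, String> getConf() {
        return conf;
    }

    @Override
    public int run(String[] args) {
        return args.length;
    }
}
```

The tool itself never parses configuration options; it only sees its own arguments and reads settings through `getConf()`, which is the point of routing every job through one runner.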
Code reading - RandomSeedGenerator
package org.apache.mahout.clustering.kmeans; public final class RandomSeedGenerator. Performs the random sampling of the center points. The HDFS handling follows a common pattern: delete first, then create. [code="java"] FileSystem fs = FileSystem.get(output.toUri(), conf); ... 2011-11-04 17:01:35 · 298 reads · 0 comments
Code reading - VectorWritable
package org.apache.mahout.math; public final class VectorWritable extends Configured implements Writable. The VectorWritable class wraps a Vector and gives it the ability to be read and written. private Vector vector; private boolean writesLaxPrecis... Original · 2011-11-01 11:04:19 · 134 reads · 0 comments
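The wrapper idea can be shown with plain `java.io` streams: the wrapped data gains a `write` method and a matching `readFields` method. `DoubleArrayWritable` is a hypothetical class wrapping a `double[]` instead of a Mahout Vector, and the length-prefixed layout is an assumption chosen for illustration, not the format VectorWritable actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wrapper that adds read/write capability to a double[].
class DoubleArrayWritable {
    private double[] values = new double[0];

    void set(double[] values) {
        this.values = values.clone();
    }

    double[] get() {
        return values.clone();
    }

    // Assumed layout: element count first, then each value as a double.
    void write(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (double value : values) {
            out.writeDouble(value);
        }
    }

    // Reads back exactly what write() produced.
    void readFields(DataInput in) throws IOException {
        int length = in.readInt();
        values = new double[length];
        for (int i = 0; i < length; i++) {
            values[i] = in.readDouble();
        }
    }

    // Convenience helper: serialize to bytes and deserialize again.
    static double[] roundTrip(double[] input) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DoubleArrayWritable writer = new DoubleArrayWritable();
            writer.set(input);
            writer.write(new DataOutputStream(bytes));

            DoubleArrayWritable reader = new DoubleArrayWritable();
            reader.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return reader.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```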
Code reading - KMeansDriver
package org.apache.mahout.clustering.kmeans; public class KMeansDriver extends AbstractJob. The entry point for kmeans. The run function of the KMeansDriver class calls buildClusters and clusterData. [code="java"] Path clustersOut = buildCl... Original · 2011-10-31 11:14:34 · 111 reads · 0 comments
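Code reading - SequenceFilesFromDirectory
[color=olive]package org.apache.mahout.text;[/color] Purpose: convert the text files in a directory into sequence format. The main function is the entry point of the SequenceFilesFromDirectory class. It has three basic parts: fs, writer and filter. [code="java"]FileSystem fs = FileSystem.get(conf);Ch... Original · 2011-10-27 20:53:13 · 95 reads · 0 comments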
Code reading - Pattern and FrequentPatternMaxHeap
package org.apache.mahout.fpm.pfpgrowth.fpgrowth; public class Pattern implements Comparable. Pattern encapsulates a group of items, the support value of each item, and the overall support value. [code="java"] private int[] pattern; private... 2011-12-01 19:52:23 · 138 reads · 0 comments
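As a sketch of how such a comparable pattern can feed a bounded heap, the classes below keep only the k best patterns seen so far. `ItemPattern` and `BoundedPatternHeap` are hypothetical names. Two things here are assumptions rather than facts from the excerpt: the overall support is taken as the minimum of the item supports, and ordering is by support first and pattern length second.

```java
import java.util.PriorityQueue;

// Hypothetical pattern: item ids, per-item supports, and an overall support.
class ItemPattern implements Comparable<ItemPattern> {
    final int[] items;
    final long[] itemSupports;
    final long support;

    ItemPattern(int[] items, long[] itemSupports) {
        this.items = items.clone();
        this.itemSupports = itemSupports.clone();
        // Assumption: overall support is the smallest per-item support.
        long min = Long.MAX_VALUE;
        for (long s : itemSupports) {
            min = Math.min(min, s);
        }
        this.support = itemSupports.length == 0 ? 0 : min;
    }

    // Assumed ordering: by support, then by number of items.
    @Override
    public int compareTo(ItemPattern other) {
        int bySupport = Long.compare(support, other.support);
        return bySupport != 0 ? bySupport : Integer.compare(items.length, other.items.length);
    }
}

// Keeps at most k patterns; the weakest one is evicted when the heap overflows.
class BoundedPatternHeap {
    private final int capacity;
    private final PriorityQueue<ItemPattern> heap = new PriorityQueue<>();

    BoundedPatternHeap(int capacity) {
        this.capacity = capacity;
    }

    void add(ItemPattern pattern) {
        heap.offer(pattern);
        if (heap.size() > capacity) {
            heap.poll(); // removes the smallest element under compareTo
        }
    }

    // The weakest pattern still retained, or null if the heap is empty.
    ItemPattern weakest() {
        return heap.peek();
    }

    int size() {
        return heap.size();
    }
}
```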