Summary of the shortcomings of known methods
Causes of the "Job failed" error
Check hadoop.log
1 Insufficient JVM memory settings: -Xms800m -Xmx800m
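2 In NutchDocumentAnalyzer.java, add: import org.wltea.analyzer.lucene.IKAnalyzer; and change import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute; to import org.apache.lucene.analysis.tokenattributes.*; (Without this change, compilation fails with "cannot find symbol", because the newly referenced Attribute classes have not been imported; importing all Attribute classes solves the problem. The same change must be made in NutchDocumentTokenizer.java.)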
3 After private static Analyzer ANCHOR_ANALYZER; add
private static Analyzer MY_ANALYZER;
After ANCHOR_ANALYZER = new AnchorAnalyzer(); add
MY_ANALYZER = new IKAnalyzer();
Override the tokenStream method:
public TokenStream tokenStream(String fieldName, Reader reader) {
    // Use the IK analyzer for every field
    Analyzer analyzer;
    analyzer = MY_ANALYZER;
    TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
    // Register the attributes that are read from the stream later
    tokenStream.addAttribute(TypeAttribute.class);
    tokenStream.addAttribute(FlagsAttribute.class);
    tokenStream.addAttribute(PayloadAttribute.class);
    tokenStream.addAttribute(PositionIncrementAttribute.class);
    return tokenStream;
}
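
For the memory-related cause (item 1), one quick way to confirm that the -Xms/-Xmx options actually reach the JVM is to print the maximum heap size from a small standalone program. This is only a minimal sketch: the class name HeapCheck and the method maxHeapMb are hypothetical and are not part of Nutch or Hadoop; only the standard Runtime.maxMemory() call is used.

```java
// Hypothetical helper (not part of Nutch): reports the JVM's maximum heap size.
class HeapCheck {
    // Returns the maximum amount of memory the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```

Running it as java -Xms800m -Xmx800m HeapCheck should print a value close to 800; on some JVMs the reported number may be somewhat lower than the -Xmx value.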