I wasn't sure whether this approach would work, so I ran a simple test. It seems to hold up, so I'm sharing the code here.
The idea uses two analyzers, PaodingAnalyzer and IK_Analyzer. Tokenization produces useless words (stopwords) such as 的 ("of") or 人 ("person"). We can list those words in a configuration file and then filter them out automatically when building the index. Here is how I implemented it.
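As a minimal sketch of the config-file stopword idea (the class, the hard-coded word set, and the `filter` helper below are hypothetical illustrations, not part of the original code — in practice the set would be loaded from the configuration file):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StopWordFilterSketch {
    // Hypothetical stopword set; in a real setup this is read from a config file
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("的", "人"));

    // Drop every token that appears in the stopword set before indexing
    static List<String> filter(List<String> tokens) {
        return tokens.stream()
                .filter(t -> !STOP_WORDS.contains(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList("中国", "的", "女人"))); // prints [中国, 女人]
    }
}
```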
public static TreeSet<String> getExclusiveStringList2(String fieldName, String string) {
    // Split the parsed query's string form on "||", whitespace, commas, and double quotes
    Pattern pattern = Pattern.compile("(\\|\\|)|\\s|,|(\")");
    PaodingAnalyzer pa = new PaodingAnalyzer();
    IK_CAnalyzer ik = new IK_CAnalyzer();
    Analyzer[] analyzers = {pa, ik};
    TreeSet<String> treeSetString = new TreeSet<String>();
    for (int i = 0; i < analyzers.length; i++) {
        QueryParser parser = new QueryParser(fieldName, analyzers[i]);
        try {
            // Let this analyzer tokenize the input, then split the result into terms
            String temp = parser.parse(string).toString();
            String[] t = pattern.split(temp);
            for (int j = 0; j < t.length; j++) {
                // TreeSet already ignores duplicates; just skip empty strings
                if (!"".equals(t[j])) {
                    treeSetString.add(t[j]);
                }
            }
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }
    for (String name : treeSetString) {
        System.out.println(name);
    }
    return treeSetString;
}
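The split pattern above breaks the parser's `toString()` output on `||`, whitespace, commas, and double quotes. A standalone sketch of just that regex (the sample input string is illustrative, not actual parser output):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    public static void main(String[] args) {
        // Same pattern as in getExclusiveStringList2
        Pattern pattern = Pattern.compile("(\\|\\|)|\\s|,|(\")");
        String[] parts = pattern.split("title:中国 title:女人||title:平江");
        System.out.println(Arrays.toString(parts));
        // prints [title:中国, title:女人, title:平江]
    }
}
```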
A main method to test it:
public static void main(String[] args) {
    String path = "E:\\index";
    IndexSearcher indexSearcher = new IndexSearcher(path);
    BooleanQuery query = new BooleanQuery();
    TermQuery titleQuery;
    TermQuery contentQuery;
    // Collect the distinct terms produced by both analyzers
    TreeSet<String> terms = BaeeqUtil.getExclusiveStringList2("", "中国,女人||平江");
    for (Iterator<String> iterator = terms.iterator(); iterator.hasNext();) {
        String str = iterator.next();
        // Search both the title and the content field; SHOULD makes the clauses OR-ed
        titleQuery = new TermQuery(new Term(KeywordBean.FIELD_TITLE, str));
        query.add(titleQuery, BooleanClause.Occur.SHOULD);
        contentQuery = new TermQuery(new Term(KeywordBean.FIELD_CONTENT, str));
        query.add(contentQuery, BooleanClause.Occur.SHOULD);
    }
    SortByLastModifyHits hits;
    synchronized (indexSearcher) {
        hits = new SortByLastModifyHits(indexSearcher, query, null,
                SearcherUtil.SORT);
    }
    List<Long> list = hits.searcher();
    System.out.println("Record count: " + list.size());
}
The printed record count: 5.
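The SHOULD clauses above make the whole query behave like an OR across every term/field pair: a document matches if any term hits its title or its content. A plain-Java sketch of that matching rule (the class and the sample documents are made up for illustration; the real matching happens inside Lucene):

```java
import java.util.Arrays;
import java.util.List;

public class ShouldClauseSketch {
    // A document matches when ANY term occurs in its title or content (OR semantics)
    static boolean matches(String title, String content, List<String> terms) {
        for (String t : terms) {
            if (title.contains(t) || content.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("中国", "女人", "平江");
        System.out.println(matches("平江新闻", "正文", terms)); // prints true
        System.out.println(matches("其他", "正文", terms));     // prints false
    }
}
```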
If anyone has a better approach, I'd be glad to discuss it.