A Chinese Word Segmentation Component for LuceneNet
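The following sample reads a GB2312-encoded text file, segments it with SegmentGBKTokenizer, and prints the resulting terms: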
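using System;
using System.IO;
using System.Text;
// Also import the namespace that contains SegmentGBKTokenizer
// (from the Segment for LuceneNet package).
public class Program
{
    public static void Main(string[] args)
    {
        // The test file holds GB2312-encoded Chinese text. On .NET Core/.NET 5+,
        // first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        using (StreamReader reader = new StreamReader(
            @"E:\projects\Segment\test\a.txt",
            Encoding.GetEncoding("GB2312")))
        {
            SegmentGBKTokenizer tokenizer = new SegmentGBKTokenizer(reader);
            Lucene.Net.Analysis.Token token;
            int i = 0;
            // Next() returns null at end of input; print five terms per line.
            while ((token = tokenizer.Next()) != null)
            {
                Console.Out.Write(token.TermText() + " / ");
                ++i;
                if (i % 5 == 0)
                    Console.Out.WriteLine();
            }
        }
        Console.ReadLine(); // keep the console window open
    }
}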
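The sample pulls terms until Next() returns null and starts a new line after every fifth term. To try just that loop without Lucene.Net or the segmenter installed, here is a minimal sketch. TermPrinter, Format and FromList are hypothetical names made up for illustration and are not part of the Segment package. A Func<string> stands in for tokenizer.Next(), and the terms in Main are made-up inputs, not real output of the component:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of Segment for LuceneNet): mirrors the
// sample's print loop. `next` stands in for tokenizer.Next(); it returns
// the next term text, or null once the input is exhausted.
public static class TermPrinter
{
    public static string Format(Func<string> next, int termsPerLine)
    {
        var sb = new StringBuilder();
        string term;
        int i = 0;
        while ((term = next()) != null)
        {
            sb.Append(term).Append(" / ");
            ++i;
            // Same rule as `i % 5 == 0` in the sample, with '\n' as the line break.
            if (i % termsPerLine == 0)
                sb.Append('\n');
        }
        return sb.ToString();
    }

    // Adapts an in-memory list to the null-terminated Next() style.
    public static Func<string> FromList(IEnumerable<string> terms)
    {
        IEnumerator<string> e = terms.GetEnumerator();
        return () => e.MoveNext() ? e.Current : null;
    }

    public static void Main()
    {
        // Made-up terms for illustration only.
        var terms = new List<string> { "中文", "分词", "组件", "示例", "代码", "输出" };
        Console.Write(TermPrinter.Format(TermPrinter.FromList(terms), 5));
    }
}
```

With six terms and five per line, the sixth term lands on a second line, just as the sample does for longer input.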
Downloads
C# version: Segment for LuceneNet-1.0.zip
Java version: segment for lucene-1.0.zip