Snowball Stemmer for C#: stemming (Chinese not supported)

Source: http://hi.baidu.com/liuqiyuan/item/9926018e6e4561d55e0ec1df

More: http://www.liuqiyuan.com/blog/?p=110

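In information retrieval, stemming is the process of reducing a word (English, in this post) to its stem. Stemming differs from lemmatization in that the former simply strips a word down to its stem, while the latter uses the surrounding context to map the word to its lemma. Here are Wikipedia's definitions of the two: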

Stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form—generally a written word form.
Lemmatization in linguistics is the process of grouping together the different inflected forms of a word so they can be analysed as a single item. In computational linguistics, lemmatisation is the algorithmic process of determining the lemma for a given word. Since the process may involve complex tasks such as understanding context and determining the part of speech of a word in a sentence (requiring, for example, knowledge of the grammar of a language) it can be a hard task to implement a lemmatiser for a new language.
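
To make the difference between the two definitions concrete, here is a minimal sketch in C#. It is not the Snowball algorithm and not part of the library discussed in this post: the class StemVsLemmaSketch, the three-entry suffix list, and the tiny lemma dictionary are all hypothetical, made up only for illustration. The "stemmer" blindly strips a suffix, while the "lemmatizer" needs a lookup table to know that "went" belongs to "go".

```csharp
using System;
using System.Collections.Generic;

static class StemVsLemmaSketch
{
    // Hypothetical toy suffix list; a real Snowball stemmer applies many more rules.
    static readonly string[] Suffixes = { "ing", "es", "s" };

    // Hypothetical toy dictionary standing in for the lexicon a lemmatizer needs.
    static readonly Dictionary<string, string> Lemmas = new Dictionary<string, string>
    {
        { "went", "go" },
        { "better", "good" },
        { "cats", "cat" }
    };

    // Strips the first matching suffix, keeping at least three characters.
    public static string NaiveStem(string word)
    {
        string w = word.ToLowerInvariant();
        foreach (string suffix in Suffixes)
        {
            if (w.Length - suffix.Length >= 3 && w.EndsWith(suffix, StringComparison.Ordinal))
            {
                return w.Substring(0, w.Length - suffix.Length);
            }
        }
        return w;
    }

    // Looks the word up in the dictionary; unknown words are returned unchanged.
    public static string NaiveLemma(string word)
    {
        string w = word.ToLowerInvariant();
        string lemma;
        return Lemmas.TryGetValue(w, out lemma) ? lemma : w;
    }

    static void Main()
    {
        foreach (string word in new[] { "fishing", "cats", "went" })
        {
            Console.WriteLine(word + ": stem=" + NaiveStem(word) + ", lemma=" + NaiveLemma(word));
        }
    }
}
```

With these toy rules, "fishing" is stemmed to "fish" without any dictionary, whereas "went" can only be mapped to "go" because the lookup table says so. That extra resource is what makes lemmatization more expensive.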

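Lemmatization is harder to implement, and its reliance on a dictionary makes it relatively costly, so when it comes to handling word-form variation a stemmer is naturally the first choice. For C# on the .NET platform, a port of the well-known Snowball stemmer has already been released.
Original author's post: http://www.iveonik.com/blog/2011/08/snowball-stemmers-on-csharp-free-download/
Stemmer for C# download: http://www.iveonik.com/src/StemmersNet.rar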

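Demo
Library reference error: after opening the downloaded files, a trial run of Program.cs in the Demo fails with an error (Figure 1). To fix it, simply set StemmerDemo as the startup project, as shown in Figure 2.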

Code to run:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Iveonik.Stemmers
{
    class Program
    {
        static void Main(string[] args)
        {
            // Sample sentence to split into words and stem.
            string Liu = "That fishman likes fishing with his cats";
            Console.WriteLine("OriginalString:" + Liu);
            TestStemmer(new EnglishStemmer(), SplitedStr(Liu));

            // Keep the console window open until a key is pressed.
            Console.ReadKey();
        }

        // Prints each word next to the stem the given stemmer produces for it.
        private static void TestStemmer(IStemmer stemmer, params string[] words)
        {
            Console.WriteLine("Stemmer: " + stemmer);
            foreach (string word in words)
            {
                Console.WriteLine(word + " --> " + stemmer.Stem(word));
            }
        }

        // Splits the sentence into words on single spaces.
        private static string[] SplitedStr(string Liu)
        {
            return Liu.Split(' ');
        }
    }
}

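Running it prints the original string, then the stemmer's name, then one "word --> stem" line for each word in the sentence.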

 
