Word Spelling Correction

/*
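English spelling correction: users frequently make mistakes when typing English words, and we need to correct them. Suppose a dictionary of correctly spelled English words is already available; design a spelling-correction program. (1) Describe your approach to the problem; (2) give the main processing flow, the algorithm, and the algorithm's complexity; (3) describe possible improvements (for example in accuracy or performance; this is an open question).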
*/
#include<iostream>
#include<vector>
#include<fstream>
#include<utility>
#include<map>
#include<set>
#include<string>
#include<cstring>  //strcpy, strtok
#include<cstdlib>  //exit, system

using namespace std;

char seps[]   = " ,\t\"\n.?";
char *token;

int main()
{
    map<string,int> mapStr;
    set<string> setWord;  //the set of words in the dictionary
    ifstream fin("english.txt");
    if(!fin)
    {
         cerr<<"The file was not opened."<<endl;
         exit(1);
    }
    string strTemp;
    string src; //store the source text.
   
    map<string,int>::iterator ite;
   
    cout<<"The following is source text."<<endl;
   
    while(getline(fin,strTemp))
    {
        src += strTemp;
        src += ' '; //getline drops the newline; add a separator so words on adjacent lines do not merge
    }
    cout<<src<<endl;
    cout<<"================================="<<endl;
   
    int si = src.size();
    char *pstr = new char[si +1];
    strcpy(pstr,src.c_str());
   
   
    //process the user's input text, mainly by splitting it into words with strtok
    token = strtok(pstr,seps);
    while(token != NULL)
    {
        strTemp = token;
                
        ite = mapStr.find(strTemp);
       
        //check whether the word is already in the map; if so, increment its count
        if(ite != mapStr.end())
        {
             mapStr[strTemp]++;
        }
        else
        {   
             mapStr.insert(make_pair(strTemp,1)); 
        }
       
        token = strtok(NULL,seps);
    }
   
    //now I read the word set.
    //open the word set.
    ifstream ifs("wordset.txt");
    if(!ifs)
    {
           cerr<<"Failed to read word set."<<endl;
           exit(1);
    }
    string wordset;
    string wordTemp;
   
    while(getline(ifs,wordTemp))
    {
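         wordset += wordTemp;
         wordset += ' '; //getline drops the newline; add a separator so words on adjacent lines do not merge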
    }
    cout<<endl<<endl; //The set of words.
   
    int setsize = wordset.size();
    char *pset = new char[setsize + 1];
    strcpy(pset,wordset.c_str());
   
    token = strtok(pset,seps);
    while(token != NULL)
    {
        wordTemp = token;
        setWord.insert(wordTemp);
        token = strtok(NULL,seps);
    }
    cout<<endl<<endl;
      
    //now I begin to check the word in the text.
    set<string>::iterator ite_set;
   
    //first I show the set.
    cout<<"The following is the set of words."<<endl;
   
    for(ite_set = setWord.begin();ite_set != setWord.end();++ite_set)
    {
         cout<<*ite_set<<"   ";
    }
    cout<<endl<<endl;
    bool HasError = false;
   
    for(ite = mapStr.begin();ite != mapStr.end();++ite)
    {
         ite_set = setWord.find((*ite).first);
         if(ite_set == setWord.end())
         {
              HasError = true;
              cerr<<"The word "<<(*ite).first<<" may be misspelled. Please check it."<<endl;
         }
    }
    if(!HasError)
    {
         cout<<"Congratulations! Your article has no spelling errors."<<endl;
    }
    delete []pstr; //free the space.
    delete []pset;
    fin.close();
    ifs.close();
    system("pause");
    return 0;
}
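
The program above only reports words that are missing from the dictionary; it does not propose a correction. One possible improvement, in the spirit of part (3) of the problem, is to suggest dictionary words that lie within a small edit distance of the flagged word. The following is a minimal sketch under that assumption, not the original author's method: `editDistance` and `suggest` are hypothetical helper names, and the distance used is the standard Levenshtein distance (insertions, deletions, substitutions).

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Levenshtein distance between a and b, computed with a rolling row.
// Time O(|a| * |b|), extra space O(|b|).
std::size_t editDistance(const std::string& a, const std::string& b)
{
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = j; // turning "" into b[0..j) takes j insertions
    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        cur[0] = i; // turning a[0..i) into "" takes i deletions
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            std::size_t del   = prev[j] + 1;
            std::size_t ins   = cur[j - 1] + 1;
            cur[j] = std::min(subst, std::min(del, ins));
        }
        prev.swap(cur);
    }
    return prev[b.size()];
}

// Hypothetical helper: every dictionary word within maxDistance edits of word,
// returned in the set's (alphabetical) order.
std::vector<std::string> suggest(const std::string& word,
                                 const std::set<std::string>& dict,
                                 std::size_t maxDistance)
{
    std::vector<std::string> result;
    for (std::set<std::string>::const_iterator it = dict.begin(); it != dict.end(); ++it)
    {
        std::size_t lenDiff = it->size() > word.size() ? it->size() - word.size()
                                                       : word.size() - it->size();
        if (lenDiff > maxDistance)
            continue; // the length gap alone already exceeds the budget
        if (editDistance(word, *it) <= maxDistance)
            result.push_back(*it);
    }
    return result;
}
```

In the program above, `suggest((*ite).first, setWord, 1)` could be called where a word is reported as possibly misspelled. Each call scans the whole dictionary, so the cost per flagged word is roughly the dictionary size times the product of the two word lengths; this is acceptable for a small word list but is the first thing to optimize for a large one.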

Test:

wordset.txt

you are so clever.an your
a genius.
nice boy
You are really something.
lucky dog.
everything me.
so sweet.
my angle.
so kind.
considerate.
the one for me.
prettiest girl in the world.
mine.and I am yours.
breaking my heart.
sexy.
so hot.
turnning on.
kiding.

making fun of me.
snake.
mess.
unbelievable.
good kisser or lover.
always in trouble.
shame to our family.
embarrassment.
behavior is unacceptable.
Please bill call sankt who

The file containing the user's input text: english.txt

you are so clever.You are a genius.
You are a nice boy You are really something.
You are a lucky dog. You are everything to me.

You are so sweet.
You are my angle.
you are so kind.
you are so considerate.
you are the one for me.
You are the prettiest girl in the world.
You are mine.and I am yours.
you are breaking my heart.
You are sexy.
you are so hot.
you are turnning me on.
you are kiding.
you are making fun of me.
you are a snake.
you are a mess.
you are unbelievable.
you are a good kisser or lover.
You are always in trouble.
you are shame to our family.
you are an embarrassment.
your behavior is unacceptable.
I am sankt.who are you?Please call me bill.Haha

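To build a spelling-correction system, you can use NLTK, a natural language processing toolkit for Python. A basic spelling-correction system can be implemented in the following steps:

1. Prepare a corpus: use one of the corpora that ship with NLTK, or collect your own.
2. Preprocess the text: tokenize it, lemmatize it, remove stop words, and so on.
3. Build a dictionary: store the words that occur in the text in a dictionary.
4. Edit distance: use an edit-distance algorithm to compute the distance between the input word and the words in the dictionary.
5. Select candidates: choose the words with the smallest distance to the input word.
6. Rank: sort the candidates by some rule, such as frequency of occurrence or edit distance.
7. Output: return the top-ranked word or words as the correction.

Here is a simple code example:

```python
from nltk.corpus import brown
from nltk.metrics.distance import edit_distance

# Prepare the corpus (the Brown corpus must be downloaded once with nltk.download('brown'))
corpus = brown.words()

# Build the dictionary
word_dict = set(corpus)

# Edit-distance candidate generation
def get_candidates(word, max_distance=1):
    candidates = set()
    for w in word_dict:
        # Skip words whose length difference already exceeds the allowed distance
        if abs(len(word) - len(w)) > max_distance:
            continue
        if edit_distance(word, w) <= max_distance:
            candidates.add(w)
    return candidates

# Ranking
def get_top_n_words(word, n=5):
    candidates = get_candidates(word)
    distances = [(w, edit_distance(word, w)) for w in candidates]
    distances.sort(key=lambda x: x[1])
    return [w[0] for w in distances[:n]]

# Test
word = 'speling'
print(get_top_n_words(word))
```

The exact output depends on the contents of the corpus. With `max_distance=1`, only words within one edit of 'speling', such as 'spelling', can be returned, so 'spelling' is the expected correction for this input.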
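
The ranking step (sorting candidates by edit distance and frequency of occurrence) can also be sketched in C++, the language of the main program. This is only an illustration under stated assumptions: the `Candidate` struct, `betterCandidate`, and `topN` are hypothetical names, and the distance and frequency of each candidate are assumed to have been computed beforehand.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one correction candidate.
struct Candidate
{
    std::string word;
    int distance;   // edit distance to the misspelled input
    int frequency;  // number of occurrences in some reference corpus
};

// Ordering rule: smaller distance first; on ties, higher frequency first;
// on further ties, alphabetical order so the result is deterministic.
bool betterCandidate(const Candidate& a, const Candidate& b)
{
    if (a.distance != b.distance)
        return a.distance < b.distance;
    if (a.frequency != b.frequency)
        return a.frequency > b.frequency;
    return a.word < b.word;
}

// Return the words of the n best candidates (fewer if there are not enough).
std::vector<std::string> topN(std::vector<Candidate> candidates, std::size_t n)
{
    std::sort(candidates.begin(), candidates.end(), betterCandidate);
    std::vector<std::string> out;
    for (std::size_t i = 0; i < candidates.size() && i < n; ++i)
        out.push_back(candidates[i].word);
    return out;
}
```

Breaking ties by frequency means that, among equally close candidates, the more common word is preferred. The frequencies themselves would have to come from a corpus count such as the `mapStr` word counts built in the main program, applied to a trusted text rather than to the text being checked.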