SSD5 option 4



Spellchecking

Prerequisites, Goals, and Outcomes

Prerequisites: Students should have mastered the following prerequisite skills.

  • Hash Tables - Understanding of the concept of a recursive function

  • Strings - Basic string handling skills

  • Inheritance - Enhancing an existing data structure through specialization

    Goals: This assignment is designed to reinforce the student's understanding of the use of hash tables as searchable containers.

    Outcomes: Students successfully completing this assignment would master the following outcomes.

  • Understand how to use hash tables, specifically hash sets

    Background

    Any word processing application will typically contain a spell check feature. Not only does this feature point out potentially misspelled words; it also suggests possible corrections.

    Description

    The program to be completed for this assessment is a spell checker. Below is a screen shot of the program in execution. The program begins by opening a word list text file, specified by a command line parameter. The program outputs an error message and terminates if it cannot open the specified word list text file. A sample word list text file (wordlist.txt) is given in the supplied wordlist.zip archive. After successfully opening the specified word list text file, the program then stores each word into a hash table.

     

    The program then opens a file to spell check. The user specifies this file through the command line. After opening this file, the program then compares each word in the file against the words stored in the hash table. The program considers a word to be misspelled if the word does not exist in the hash table. When this occurs, the program displays the line number the word appeared in, the word, and a list of possible corrections.

    The list of possible corrections for a misspelled word is generated using a simple algorithm. Any variation of a misspelled word that is itself a word (i.e. it is found in the word list file) is a possible correction. Your solution to this assessment should consider the following variations of a misspelled word.

    ·  Transposing of adjacent letters

    For the misspelled word "acr", transposing adjacent letters yields the possible corrections of "car" and "arc".

    ·  Removal of each letter

    For example, removing each letter from the misspelled word "boaot" yields only the possible correction of "boat". Removing letters other than the second "o" does not generate a correctly spelled word.

    ·  Replacement of each letter

    For each character in a misspelled word, the program should check if the replacement with any letter generates a correctly spelled word. For the misspelled word "acr", replacing the "c" with an "i" yields "air", replacing the "r" with an "e" yields "ace", and so on.

    ·  Inserting any letter at any position in a word

    The program should consider if inserting any letter at any position in a misspelled word generates a correctly spelled word. For the misspelled word "acr", inserting an "e" after the "r" yields "acre".

    Files

    Following is a list of files needed to complete this assessment.

  • handout-files.zip contains all of the following necessary files:
    • main.cpp - This file contains the main routine.
    • hashset.h - This declares a hash set class.
    • hashset.cpp - This defines a hash set class.
    • dictionary.h - This file contains the partial definition of class Dictionary. Class Dictionary inherits from class HashSet.
    • wordlist.zip - This file is an archive that contains a word list text file.
    • test.txt - This is a sample text file that contains spelling errors.

Tasks

To complete this assessment, you need to complete the implementation of class Dictionary and complete the spell checking program contained in main.cpp.

To begin, verify the files needed for this assessment.

  1. Extract the archive to retrieve the files needed to complete this assessment.

Following is an ordered list of steps that serves as a guide to completing this assessment. Work and test incrementally. Save often.

  1. Begin by completing the definition of class Dictionary. Class Dictionary must provide a constructor that accepts a single string as a parameter. This parameter is the file name of the word list text file. This constructor must place all the words contained in the text file into the dictionary. Remember, class Dictionary is a type of HashSet, so use the inherited methods accordingly.
  2. Next, complete the hash function encapsulated in class hash_function in dictionary.h.
  3. Then, finish the implementation of function check_spelling. This function already contains code that reads a file line by line. It also extracts each word from a line using an instance of class stringstream. Your task is to check the spelling of each word. Use the inherited search function of class Dictionary to determine if a word exists in the dictionary. If the word exists in the dictionary, assume that it is spelled correctly. If it does not exist, assume it is misspelled. For each misspelled word, generate and display a list of possible corrections.

Submission

Submit only the following.

  1. dictionary.h - your completed class Dictionary definition
  2. dictionary.cpp - if created
  3. main.cpp - your completed spell checker program

Problem summary:

Given a word list, store it; then, given a passage of text, find the misspelled words in it and suggest reasonable corrections.

First, hashset.h

 

// template hash set class

#ifndef  _HASHSET_H_
#define  _HASHSET_H_

#include <iostream>
#include <vector>
#include <algorithm>
#include <stdexcept>

using namespace std;

// we do not compute prime numbers but use a table instead
static const int num_primes = 25;
static const unsigned long prime_list[] = {
            53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317,
            196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843,
            50331653, 100663319, 201326611, 402653189, 805306457
        };

template <typename key_type, typename hash_func, typename key_equal>
class HashSet {

protected:
    // hashtable entries
    class Entry {
    public:
        key_type key;
        bool used;

        Entry() : used(false) {}
    };

    int entries;      // number of entries
    int prime;        // index to size table

    vector<Entry> *ht;
    hash_func hf;        // hash function on key_type
    key_equal eq;        // equality predicate on key_type

    int table_size() const { return prime_list[prime];}
    float load_factor() const { return float(size()) / table_size();}
    int resize();

public:

    HashSet()
        : entries(0), prime(0),
          ht(new vector<Entry>(prime_list[0])) {}

    virtual ~HashSet() {
        delete ht;
    }

    virtual int size() const { return entries;}
    virtual bool search(const key_type& k);
    virtual void insert(const key_type& k);
    virtual void remove(const key_type& k);
};

#endif

Next, hashset.cpp

 

 

#include  <cstdlib>     // exit
#include  <iostream>    // cerr

#include  "hashset.h"

using namespace std;

template <typename key_type, typename hash_func, typename key_equal>
bool HashSet<key_type, hash_func, key_equal>::search(const key_type& k) {

    int p = hf(k) % table_size();

    while ((*ht)[p].used) {
        if (eq((*ht)[p].key, k)) {       // equality predicate for key_type
            return true;
        }
        p++;
        if (p == table_size()) {
            p = 0;  // wrap around to beginning
        }
    }

    return false;
}

template <typename key_type, typename hash_func, typename key_equal>
void HashSet<key_type, hash_func, key_equal>::remove(const key_type& k) {

    int p = hf(k) % table_size();

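    // NOTE: clearing a slot can cut a probe sequence in two, so keys that
    // were stored past this slot may no longer be found by search().
    // The spell checker below never calls remove().
    while ((*ht)[p].used) {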
        if (eq((*ht)[p].key, k)) {
            (*ht)[p].used = false;
            entries--;
            break;
        }
        p++;
        if (p == table_size()) {
            p = 0;  // wrap around to beginning
        }
    }

}


template <typename key_type, typename hash_func, typename key_equal>
void HashSet<key_type, hash_func, key_equal>::insert(const key_type& k) {

    if (load_factor() > .7) {
        resize();
    }

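    int p = hf(k) % table_size();

    // linear probing: walk forward (wrapping around) to the first free slot;
    // the load-factor check above guarantees that one exists
    while ((*ht)[p].used) {
        if (eq((*ht)[p].key, k)) {
            return;  // already present: a set stores each key once
        }
        p++;
        if (p == table_size()) {
            p = 0;  // wrap around to beginning
        }
    }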

    (*ht)[p].key = k;
    (*ht)[p].used = true;
    entries++;

}

template <typename key_type, typename hash_func, typename key_equal>
int HashSet<key_type, hash_func, key_equal>::resize() {

    if (prime == num_primes - 1) {
        cerr << "maximal table size reached, aborting ... " << endl;
        exit(2);
    }

    int mm = prime_list[prime];
    prime++;
    int m = prime_list[prime];
    vector<Entry>* ptr = new vector<Entry>(m);

    for (int i = 0; i < mm; ++i) {

        if ((*ht)[i].used) {
            key_type kk = (*ht)[i].key;

            int p = hf(kk) % m;

            while (p < m && (*ptr)[p].used) {
                p++;
            }
            if (p == m) {
                p = 0;
            }
            while ((*ptr)[p].used) {
                p++;
            }

            (*ptr)[p].key = kk;
            (*ptr)[p].used = true;
        }
    }

    delete ht;
    ht = ptr;
    return m;
}

 

 

 

dictionary.h

 

#ifndef  _DICTIONARY_H_
#define  _DICTIONARY_H_

#include  <iostream>
#include  <fstream>
#include  <cstdlib>
#include  <vector>
#include  <list>
#include  <algorithm>
#include  <string>

#include  "hashset.h"
#include  "hashset.cpp"   // template member definitions must be visible here

using namespace std;

class hash_function
{
public:
    hash_function() {}

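    // Polynomial (BKDR-style) string hash with multiplier 131.
    unsigned int operator()( const string& s )  const
    {
        unsigned int seed = 131;
        unsigned int hash = 0;
        for ( string::size_type i = 0; i < s.size(); i++ )
        {
            // unsigned overflow wraps around, which is fine for hashing
            hash = hash * seed + static_cast<unsigned char>(s[i]);
        }
        return (hash & 0xFFFFF);   // keep the low 20 bits
    }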
};

class equality
{
public:
    equality() {}
    bool  operator()( const string& A, const string& B )  const {
		return  (A == B);
    }
};

class Dictionary: public HashSet<string, hash_function, equality> {

public:
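    // Builds the dictionary from the word list file named by `file`.
    Dictionary(const string& file)
    {
        ifstream in(file.c_str());
        if ( !in )
        {
            cerr << "Cannot open word list file " << file << endl;
            exit(EXIT_FAILURE);
        }
        string str;
        // test the extraction itself: looping on !in.eof() would
        // insert the last word a second time
        while ( in >> str )
        {
            insert(str);
        }
    }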
    // Complete definition

};

#endif

 

main.cpp

 

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <cstdlib>
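#include <cctype>
#include <vector>
#include <algorithm>
#include <iterator>

#include "dictionary.h"

using namespace std;

void lower ( string& s );
string strip_punct(const string& s);
vector<string> suggestion(Dictionary &dic, string word);
void check_spelling(ifstream& in, Dictionary& dict);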


int main(int argc, char* argv[]) {

    // Output usage message if improper command line args were given.
    if (argc != 3) {
        cerr << "Usage: " << argv[0] << " wordlist_filename input_file\n";
        return EXIT_FAILURE;
    }

    ifstream inf(argv[2]);
    if (! inf) {
        cerr << "Could not open " << argv[2] << "\n";
        return EXIT_FAILURE;
    }

    // Read dictionary, but let user know what we are working on.
    cout << "Loading dictionary, this may take a while...\n";
    /** load the word list into the hash set */
    Dictionary d(argv[1]);

    check_spelling(inf, d);

    inf.close();

    return EXIT_SUCCESS;
}
vector<string> suggestion(Dictionary &dic, string word)
{
    vector<string> suggestions;

    unsigned int i;
    char c;

    /// transpose all adjacent letters and look each result up in the dictionary
    /// (i + 1 < length avoids unsigned wrap-around when the word is empty)
    for (i = 0; i + 1 < word.length(); i++) {

        string new_word(word);
        char temp = new_word[i];
        new_word[i] = new_word[i + 1];
        new_word[i + 1] = temp;

        if (dic.search(new_word)) {
            suggestions.push_back(new_word);
        }
    }

    /// remove each letter
    /// delete one letter at a time and look the result up in the dictionary
    for (i = 0; i < word.length(); i++) {

        string new_word(word);
        new_word.erase(i, 1);
        if (dic.search(new_word)) {
            suggestions.push_back(new_word);
        }

    }

    /// replace each letter
    /// substitute every letter a-z at each position and look the result up
    for (i = 0; i < word.length(); i++) {
        for (c = 'a'; c <= 'z'; c++) {

            string new_word(word);
            new_word.replace(i, 1, 1, c);
            if (dic.search(new_word)) {
                suggestions.push_back(new_word);
            }

        }
    }

    /// insert a letter at each position, including after the last letter
    /// (e.g. "acr" -> "acre"), and look each result up in the dictionary
    for (i = 0; i <= word.length(); i++) {
        for (c = 'a'; c <= 'z'; c++) {

            string new_word(word);
            new_word.insert(i, 1, c);
            if (dic.search(new_word)) {
                suggestions.push_back(new_word);
            }
        }
    }

    // unique() only removes adjacent duplicates, so sort first
    sort(suggestions.begin(), suggestions.end());
    suggestions.erase(unique(suggestions.begin(), suggestions.end()),
                      suggestions.end());

    return suggestions;
}
void check_spelling(ifstream& in, Dictionary& dict) {

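    int line_number = 0;
    string line;

    // getline() fails at end of file, so no phantom last line is processed
    while (getline(in, line)) {

        line_number++;

        /// wrap the line in a string stream to split it into words
        stringstream ss (stringstream::in | stringstream::out);
        ss << line;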

        string word;
        while (ss >> word)
        {

            // strip trailing punctuation and lowercase the word before the lookup
            word = strip_punct(word);
            lower(word);
            if ( word.empty() )
            {
                continue;   // the token was punctuation only
            }
            if ( !dict.search(word) )
            {
                cout << "Line " << line_number << ": " << word << endl;
                cout << "\tsuggestions:" << endl;
                vector<string> suggest = suggestion(dict, word);
                for ( size_t i = 0; i < suggest.size(); i++ )
                {
                    cout << "\t\t" << suggest[i] << endl;
                }
            }
        }

    }

}

void lower(string& s)
{

    /// Ensures that a word is lowercase
    /// (the cast keeps tolower() well defined for non-ASCII bytes)
    for (unsigned int i = 0; i < s.length(); i++) {
        s[i] = tolower(static_cast<unsigned char>(s[i]));
    }
}

string strip_punct(const string& s) {

    /// Remove any single trailing
    /// punctuation character from a word.
    /// ispunct() is true for a printable character that is not a space, digit, or letter.
    if (!s.empty() && ispunct(static_cast<unsigned char>(s[s.length() - 1])) ) {
        return s.substr (0, s.length() - 1);
    }
    else {
        return s;
    }
}

 

 

 

 

 

 

 
