Spell checker
Time Limit: 2000MS | Memory Limit: 65536K
Total Submissions: 18113 | Accepted: 6648
Description
You, as a member of a development team for a new spell checking program, are to write a module that will check the correctness of given words using a known dictionary of all correct words in all their forms.
If the word is absent in the dictionary then it can be replaced by correct words (from the dictionary) that can be obtained by one of the following operations:
- deleting of one letter from the word;
- replacing of one letter in the word with an arbitrary letter;
- inserting of one arbitrary letter into the word.
Your task is to write the program that will find all possible replacements from the dictionary for every given word.
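The three operations above can be checked directly on a pair of words, without generating any candidates. The following is a minimal sketch of such a check, not the approach used in the solution further down; the function name `is_one_edit` is my own and purely illustrative. It treats equal-length words as a possible replacement and words whose lengths differ by one as a possible deletion or insertion.

```c
#include <string.h>

/* Returns 1 if cand can be obtained from word by deleting, replacing,
   or inserting exactly one letter; returns 0 otherwise, including when
   the two words are identical. (Hypothetical helper, for illustration.) */
int is_one_edit(const char *word, const char *cand)
{
    size_t lw = strlen(word), lc = strlen(cand);
    const char *longer = word, *shorter = cand;
    size_t i = 0, j = 0;
    int skipped = 0;

    if (lw == lc) {
        /* Same length: exactly one position must differ (replacement). */
        int diff = 0;
        for (i = 0; i < lw; i++)
            if (word[i] != cand[i])
                diff++;
        return diff == 1;
    }
    if (lw + 1 != lc && lc + 1 != lw)
        return 0;               /* lengths differ by more than one */
    if (lc > lw) {
        longer = cand;
        shorter = word;
    }
    /* Lengths differ by one: the longer word must be the shorter one
       plus a single extra letter, so allow one skip in the longer word. */
    while (longer[i] != '\0' && shorter[j] != '\0') {
        if (longer[i] == shorter[j]) {
            i++;
            j++;
        } else {
            if (skipped)
                return 0;       /* a second mismatch means two edits */
            skipped = 1;
            i++;
        }
    }
    return 1;
}
```

With a check like this, scanning the dictionary from top to bottom for each checked word would already produce the replacements in dictionary order.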
Input
The first part of the input file contains all words from the dictionary. Each word occupies its own line. This part is finished by the single character '#' on a separate line. All words are different. There will be at most 10000 words in the dictionary.
The next part of the file contains all words that are to be checked. Each word occupies its own line. This part is also finished by the single character '#' on a separate line. There will be at most 50 words that are to be checked.
All words in the input file (words from the dictionary and words to be checked) consist only of small alphabetic characters and each one contains 15 characters at most.
Output
Write to the output file exactly one line for every checked word in the order of their appearance in the second part of the input file. If the word is correct (i.e. it exists in the dictionary) write the message: "<checked word> is correct". If the word is not correct then write this word first, then write the character ':' (colon), and after a single space write all its possible replacements, separated by spaces. The replacements should be written in the order of their appearance in the dictionary (in the first part of the input file). If there are no replacements for this word then the line feed should immediately follow the colon.
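The output rules above have one easy-to-miss detail: when there are no replacements, nothing may follow the colon, not even a space. Here is a small sketch of building one output line, assuming the replacements have already been collected in dictionary order; the function `format_result` and its parameters are names I made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Builds one output line in out (capacity cap, which must be at least 1).
   found != 0 means the word is in the dictionary. Otherwise
   repl[0..count-1] are its replacements, already in dictionary order.
   (Hypothetical helper, for illustration only.) */
void format_result(char *out, size_t cap, const char *word, int found,
                   const char *repl[], int count)
{
    size_t used;
    int i;

    if (found) {
        snprintf(out, cap, "%s is correct\n", word);
        return;
    }
    snprintf(out, cap, "%s:", word);
    for (i = 0; i < count; i++) {
        /* Each replacement is preceded by exactly one space. */
        used = strlen(out);
        snprintf(out + used, cap - used, " %s", repl[i]);
    }
    /* With count == 0 the line feed follows the colon immediately. */
    used = strlen(out);
    snprintf(out + used, cap - used, "\n");
}
```

Writing the separator before each word, rather than after it, avoids a trailing space at the end of the line.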
Sample Input
i
is
has
have
be
my
more
contest
me
too
if
award
#
me
aware
m
contest
hav
oo
or
i
fi
mre
#
Sample Output
me is correct
aware: award
m: i my me
contest is correct
hav: has have
oo: too
or:
i is correct
fi: i
mre: more me
Source
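Here is how it went: I tried to solve this problem with hashing. With a chained hash table, printing the results in dictionary order turned out to be very awkward, so I switched to a linear-probing hash table. After that the candidates still have to be deduplicated, which means maintaining a second hash table, and at the end the results are sorted by dictionary index before being printed.
One thing is certain: with hashing, it was not as fast as brute force...
Maybe the data set is too small, or maybe my code is just bad... Either way, it took a long time to write... I'll count it as hash practice. (It was meant as hash practice in the first place, otherwise I would not have used hashing for this.)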
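As a side note on the deduplicate-then-sort step described above: if each match is recorded as its dictionary index rather than its table slot, sorting first makes duplicates adjacent, so they can be dropped in the same pass and the second hash table is not strictly needed. This is only a sketch of that alternative, not what the solution below does; `sort_unique` and `cmp_int` are names I made up, and `qsort` is the standard library routine.

```c
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts pos[0..n-1] ascending and removes duplicates in place.
   Returns the number of distinct values left at the front of pos.
   (Hypothetical helper, for illustration only.) */
int sort_unique(int *pos, int n)
{
    int i, m = 0;

    if (n <= 0)
        return 0;
    qsort(pos, (size_t)n, sizeof pos[0], cmp_int);
    for (i = 0; i < n; i++) {
        /* Keep a value only if it differs from the last one kept. */
        if (m == 0 || pos[i] != pos[m - 1])
            pos[m++] = pos[i];
    }
    return m;
}
```

Duplicates do occur with candidate generation: for a word such as "aa", deleting either letter yields the same candidate "a".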
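#include <stdio.h>
#include <string.h>

#define PRIME 19991        // hash modulus; the tables have PRIME + 1 slots
#define MAX_LEN_IN 10001

// One table slot: the stored word and its position in the dictionary input.
typedef struct{
    char str[16];
    int pos;
}Hash;

Hash hash_t[PRIME + 1];         // dictionary table
int head[PRIME + 1];            // head[i] != 0 means slot i of hash_t is occupied
Hash hash_t_repeat[PRIME + 1];  // per-query table used to drop duplicate candidates
int head_repeat[PRIME + 1];     // occupancy flags for hash_t_repeat

// Base-26 polynomial hash of a lowercase word, reduced modulo PRIME.
int calculate(char* str){
    int key = 0;
    int str_len = strlen(str);
    int i;
    for (i = 0;i < str_len;i ++){
        key = (key * 26 + str[i] - 'a') % PRIME;
    }
    return key;
}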

// Linear probing in the dictionary table: returns the slot that holds str,
// or the first free slot after its hash position if str is not stored.
int Hash_calculate(char* str){
    int pos = calculate(str);
    while(head[pos] != 0 && strcmp(str, hash_t[pos].str) != 0){
        pos = (pos + 1) % (PRIME + 1);
    }
    return pos;
}

// Insert into the first hash table (the dictionary), recording the input order.
void insert_hash_1(char* str, int mark_pos){
    int pos = Hash_calculate(str);
    if (!head[pos]){
        head[pos] = 1;
        strcpy(hash_t[pos].str, str);
        hash_t[pos].pos = mark_pos;
    }
}

// Same probing as Hash_calculate, but in the duplicate-filter table.
int Hash_calculate_repeat(char* str){
    int pos = calculate(str);
    while(head_repeat[pos] != 0 && strcmp(str, hash_t_repeat[pos].str) != 0){
        pos = (pos + 1) % (PRIME + 1);
    }
    return pos;
}

// Insert into the second hash table (the duplicate filter).
// Returns 1 if str was new, 0 if it had already been recorded.
int insert_hash_repeat(char* str){
    int pos = Hash_calculate_repeat(str);
    if (!head_repeat[pos]){
        head_repeat[pos] = 1;
        strcpy(hash_t_repeat[pos].str, str);
        return 1;
    }
    return 0;
}

// Lomuto partition of buf_ans[p..r]. The elements are table slots, ordered
// by the dictionary position stored in hash_t[slot].pos.
int partition_q(int *buf_ans, int p, int r){
    int i = p - 1, j;
    int x = hash_t[buf_ans[r]].pos;   // pivot: position of the last element
    int temp;
    for (j = p;j < r;j ++){
        if (hash_t[buf_ans[j]].pos < x){
            i ++;
            temp = buf_ans[i];buf_ans[i] = buf_ans[j];buf_ans[j] = temp;
        }
    }
    temp = buf_ans[i + 1]; buf_ans[i + 1] = buf_ans[r]; buf_ans[r] = temp;
    return i + 1;
}

// Quicksort of buf_ans[p..r] by dictionary position.
void q_sort(int *buf_ans, int p, int r){
    int q;
    if (p < r){
        q = partition_q(buf_ans, p, r);
        q_sort(buf_ans, p, q-1);
        q_sort(buf_ans, q+1, r);
    }
}

int main()
{
    //freopen("in.txt", "r", stdin);    // uncomment to read from a local file
    //freopen("out.txt", "w", stdout);
    char str[16];
    char temp_str[17];      // room for a 15-letter word plus one inserted letter
    int buf_ans[11111];     // table slots of the replacements found
    int str_len = 0;
    int mark_pos = 0;
    int temp_pos = 0;
    int buflen = 0;
    int i, j, k;

    memset(hash_t, 0, sizeof(Hash)*(PRIME + 1));
    memset(head, 0, sizeof(head));

    // Part 1: read the dictionary up to the first '#'.
    // "%15s" keeps the read inside str[16].
    while(scanf("%15s", str) == 1 && str[0] != '#'){
        insert_hash_1(str, mark_pos);
        mark_pos ++;
    }

    // Part 2: check each word up to the second '#'.
    while(scanf("%15s", str) == 1 && str[0] != '#'){
        printf("%s", str);
        temp_pos = Hash_calculate(str);
        if (strcmp(hash_t[temp_pos].str, str) == 0){
            printf(" is correct\n");
            continue;
        }
        str_len = strlen(str);
        buflen = 0;
        memset(buf_ans, 0, sizeof(buf_ans));
        memset(hash_t_repeat, 0, sizeof(Hash)*(PRIME + 1));
        memset(head_repeat, 0, sizeof(head_repeat));

        // Candidates with one letter inserted at position i.
        for (i = 0;i <= str_len;i ++){
            strcpy(temp_str, str);
            for (k = str_len;k >= i;k --){
                temp_str[k + 1] = temp_str[k];    // shift the tail (and '\0') right
            }
            for (j = 'a';j <= 'z';j ++){
                temp_str[i] = j;
                temp_pos = Hash_calculate(temp_str);
                if (strcmp(hash_t[temp_pos].str, temp_str) == 0 && insert_hash_repeat(temp_str)){
                    buf_ans[buflen++] = temp_pos;
                }
            }
        }

        // Candidates with the letter at position i deleted.
        for (i = 0;i < str_len;i ++){
            strcpy(temp_str, str);
            for (k = i;k < str_len;k ++){
                temp_str[k] = temp_str[k + 1];    // shift the tail (and '\0') left
            }
            if (temp_str[0] == 0) continue;       // skip the empty word
            temp_pos = Hash_calculate(temp_str);
            if (strcmp(hash_t[temp_pos].str, temp_str) == 0 && insert_hash_repeat(temp_str)){
                buf_ans[buflen++] = temp_pos;
            }
        }

        // Candidates with the letter at position i replaced.
        for (i = 0;i < str_len;i ++){
            strcpy(temp_str, str);
            for (j = 'a';j <= 'z';j ++){
                temp_str[i] = j;
                temp_pos = Hash_calculate(temp_str);
                if (strcmp(hash_t[temp_pos].str, temp_str) == 0 && insert_hash_repeat(temp_str)){
                    buf_ans[buflen++] = temp_pos;
                }
            }
        }

        // Restore dictionary order, then print the replacements.
        q_sort(buf_ans, 0, buflen-1);
        printf(":");
        for (i = 0;i < buflen;i ++){
            printf(" %s", hash_t[buf_ans[i]].str);
        }
        printf("\n");
    }
    return 0;
}