Original assignment link: Implementations of a Spell Checker
C-Spellchecker Implementation
Introduction
You will create a very basic spell checker. Your program will read the “dictionary” (list of words) into a Set, then check the words in the input against the Set of words. You will provide several implementations of the Set ADT for strings.
We’ve provided an interface to a Set of strings. Recall, a Set is an unordered, finite collection of unique keys.
The Set ADT
You will implement the Set over:
- A Linked List of strings
- A sorted Array of strings
The Interface
The Set will have the following functions:
- setSize — return the size of the Set
- setInsert — insert an element into the Set
- setFind — check if element is in the Set
We’ll need 2 other functions for creation and cleanup:
- set() — create an empty Set
- setKill() — decommission the Set; return all heap memory, etc.
Look at ~kschmidt/public_html/CS265/Assignments/C-Spellchecker/set.h.
Set is declared in the interface (.h) file. It is simply the set_t type, which might be different for each of the implementations.
All of the functions take a pointer to a Set (Set*), so that the Set itself can be modified as needed.
The Implementations
All of these will copy the key (a string) into the set. That is, when you add an item to the Set, it will be copied into heap memory, and a pointer to that memory will be stored. See strdup. It’ll do most of the work for you.
Remember, strings are arrays of characters. You cannot assign arrays, nor strings, as you would, e.g., integers. Draw pictures. Also, for the same reason, you can’t use relational operators. If you have an ASCII string, you can use the strcmp function.
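As a small illustration of the two points above, the sketch below copies a key into heap memory with strdup and compares strings by content with strcmp. The helper names copyKey and keysEqual are hypothetical and are not part of set.h.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* strdup is POSIX; needed under strict -std=c99 */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of key, or NULL if out of memory.
   The caller owns the copy and must free() it. */
char *copyKey(const char *key)
{
    assert(key != NULL);
    return strdup(key);
}

/* Hypothetical helper: 1 if the two strings hold the same characters.
   Writing a == b would compare only the addresses, not the contents. */
int keysEqual(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Because the Set stores its own copy, later changes to the caller's buffer do not affect the stored key.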
When removing, or decommissioning, all heap memory must be returned.
Also note, for each implementation X.c, X.c and main.c are compiled together to make the executable X. See the Makefile.
None of the implementations will store duplicates, so your Inserts must check.
Each of the implementations must define the set_t type, to store the actual Set.
Files are in the assignment directory.
Unordered Linked List
- Implementation file: ll.c
- Makefile target: ll
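One possible shape for the linked-list version is sketched below. It is only a sketch under assumptions: the Node and List types and the listFind, listInsert and listClear names are hypothetical, and a real ll.c would define set_t and the functions declared in set.h instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for strdup under strict -std=c99 */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node and list types; the real set_t for ll.c may differ. */
typedef struct node {
    char *key;
    struct node *next;
} Node;

typedef struct {
    Node *head;
    size_t n;
} List;

/* Return 1 if x is stored in the list, 0 otherwise (linear scan). */
int listFind(const List *l, const char *x)
{
    assert(l != NULL && x != NULL);
    for (const Node *p = l->head; p != NULL; p = p->next)
        if (strcmp(p->key, x) == 0)
            return 1;
    return 0;
}

/* Insert a heap copy of x at the head. Return 1 on success,
   0 if x is already present or memory runs out. */
int listInsert(List *l, const char *x)
{
    if (listFind(l, x))
        return 0;
    Node *node = malloc(sizeof *node);
    if (node == NULL)
        return 0;
    node->key = strdup(x);
    if (node->key == NULL) {
        free(node);
        return 0;
    }
    node->next = l->head;
    l->head = node;
    l->n++;
    return 1;
}

/* Free every key and every node, leaving the list empty. */
void listClear(List *l)
{
    Node *p = l->head;
    while (p != NULL) {
        Node *next = p->next;
        free(p->key);
        free(p);
        p = next;
    }
    l->head = NULL;
    l->n = 0;
}
```

Inserting at the head keeps the insert itself constant-time, but the duplicate check still walks the whole list, so each insert is linear in the size of the Set.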
Sorted Array
- Implementation file: arrs.c
- Makefile target: arrs
As you read a word, insert it into its correct place within the array, so that the array is always sorted. Note that this is, in effect, Insertion Sort.
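A minimal sketch of such an insert, assuming the array of pointers and its count are passed in directly. The name sortedInsert and its parameters are hypothetical; a real setInsert would work on the fields of set_t.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for strdup under strict -std=c99 */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: insert a heap copy of x into words[0..*n-1], which
   must already be sorted by strcmp. cap is the capacity of words.
   Returns 1 on success, 0 if x is a duplicate, the array is full,
   or memory runs out. */
int sortedInsert(char *words[], size_t *n, size_t cap, const char *x)
{
    assert(words != NULL && n != NULL && x != NULL);

    /* Find the first position whose word is not less than x. */
    size_t pos = 0;
    while (pos < *n && strcmp(words[pos], x) < 0)
        pos++;

    if (pos < *n && strcmp(words[pos], x) == 0)
        return 0;                /* already present */
    if (*n >= cap)
        return 0;                /* no room left */

    char *copy = strdup(x);
    if (copy == NULL)
        return 0;

    /* Shift the tail one slot to the right to open a gap at pos. */
    memmove(&words[pos + 1], &words[pos], (*n - pos) * sizeof words[0]);
    words[pos] = copy;
    (*n)++;
    return 1;
}
```

The position is found here with a linear scan for simplicity; the shift dominates the cost either way, so each insert remains linear in the number of stored words.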
Note, if you have a sorted array, then you can find things more quickly than by simply plodding through the array. Have setFind implement Binary Search.
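For comparison with a recursive version, here is a hedged sketch of an iterative Binary Search over a sorted array of strings. The name binaryFind and its parameters are hypothetical; it uses a half-open range so that empty and single-element arrays need no special case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if x occurs in words[0..n-1], which must be
   sorted by strcmp, and 0 otherwise. Works for n == 0 and n == 1 too. */
int binaryFind(char *const words[], size_t n, const char *x)
{
    assert(x != NULL);
    size_t lo = 0, hi = n;           /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(words[mid], x);
        if (cmp == 0)
            return 1;
        if (cmp < 0)
            lo = mid + 1;            /* x can only be to the right of mid */
        else
            hi = mid;                /* x can only be to the left of mid */
    }
    return 0;
}
```

Each iteration halves the range, so a lookup takes on the order of log2(n) string comparisons.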
The Dictionary
In main.c you will write your client code, that uses your Set to check a file.
The location of the dictionary you are to use will be provided through the environment variable WORDS. Read all of the words in the dictionary into memory, using your Set.
This file will contain one word per line. You may not assume that the words in the file are given in alphabetical order.
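Reading the dictionary might look roughly like the sketch below, which looks up WORDS with getenv and reads one word per line with fgets. The helper names openDictionary and readWord are hypothetical, and stripping a carriage return as well as the newline is an extra assumption about the file's line endings.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Open the file named by the WORDS environment variable for reading.
   Returns NULL if WORDS is unset or the file cannot be opened. */
FILE *openDictionary(void)
{
    const char *path = getenv("WORDS");
    return path != NULL ? fopen(path, "r") : NULL;
}

/* Hypothetical helper: read the next line of in into buf (of the given size)
   and strip the trailing line ending. Returns 1 if a line was read, 0 at end
   of file. Lines longer than size - 1 characters arrive in pieces. */
int readWord(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 1);
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';
    return 1;
}
```

A client would call readWord in a loop and pass each word to setInsert, which takes its own copy, so one buffer can be reused for every line.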
For this assignment you may assume that the number of words in the dictionary is bounded above. See the appropriate variable in set.h. If the dictionary has more words, print an informative error message to stderr, then exit (after cleaning up any memory).
The Input File to be Checked
You will read the name of the file to be checked as a command-line argument. If there is no filename given, you will read stdin. See CS265/Labs/C/fgets.c for an example of doing this.
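Choosing between a named file and stdin can be sketched as follows. This is not the course's fgets.c example, which is not reproduced here; the helper name openInput is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the stream to check. With no filename argument
   this is stdin; otherwise argv[1] is opened for reading (NULL on failure). */
FILE *openInput(int argc, char *argv[])
{
    assert(argv != NULL);
    if (argc < 2)
        return stdin;
    return fopen(argv[1], "r");
}
```

The caller should report a NULL result on stderr, and should call fclose only on a stream it opened itself, not on stdin.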
For this assignment, do not treat punctuation specially. Do not attempt to parse it out. Just take any token separated by white space to be a word to be checked.
For this assignment you may assume that the length of a single word is bounded above. See the appropriate variable in set.h.
For each word in the file, if it is not in your set, assume that it is misspelt. Print each misspelt word you find to stdout, one per line, single-spaced.
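The checking loop can be sketched with fscanf, whose %s conversion skips white space and stops at the next white space, which matches the tokenizing rule above. Several things here are assumptions: the names reportUnknown and demoKnown are hypothetical, WORD_MAX stands in for MAX_WORD_SIZE, the bound is taken to count characters without the terminating null, and demoKnown is only a stand-in for a real setFind lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bound, standing in for MAX_WORD_SIZE from set.h. */
#define WORD_MAX 80

/* Read whitespace-separated tokens from in and write every token for which
   known() returns 0 to out, one per line. Returns the number written.
   Tokens longer than WORD_MAX characters are split by the %80s width. */
size_t reportUnknown(FILE *in, FILE *out, int (*known)(const char *))
{
    assert(in != NULL && out != NULL && known != NULL);
    char word[WORD_MAX + 1];
    size_t misses = 0;
    while (fscanf(in, "%80s", word) == 1) {
        if (!known(word)) {
            fprintf(out, "%s\n", word);
            misses++;
        }
    }
    return misses;
}

/* Stand-in for a real setFind lookup: knows only "the" and "cat". */
int demoKnown(const char *w)
{
    return strcmp(w, "the") == 0 || strcmp(w, "cat") == 0;
}
```

Since punctuation is not parsed out, a token such as "cat," is looked up as written and reported as misspelt even when "cat" is in the dictionary.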
Output
Print to stdout every occurrence of each misspelt word in the input file, one per line, in the order in which you encounter them. Just the word. No decoration, no commentary.
set.h
#ifndef __MY_SET_H__
#define __MY_SET_H__
#include <stddef.h>
//interface for the Set
// Constants //
enum consts {
    MAX_SET_SIZE = 30000 ,
    MAX_WORD_SIZE = 80
} ;
// Set i/f //
typedef struct set_t Set ;
// set - factory function
// Returns a pointer to a new, empty set
// Should be disposed of using setKill
Set* set() ;
// setSize - return size of the set pointed to by s
size_t setSize( const Set* s ) ;
// return 1 if x in Set pointed to by s
_Bool setFind( const Set* s, const char* x ) ;
// setInsert - Insert string x into set pointed to by s
// x is copied into heap memory (see strdup)
// return true (1) if x was successfully inserted
// false (0) otherwise (x is already in set, no memory, etc.)
_Bool setInsert( Set* s, const char* x ) ;
// setKill - Decommissions
void setKill( Set* ) ;
#endif // __MY_SET_H__
arr.c
//Sorted array implementation of the Set
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <inttypes.h>
#include "set.h"
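// Shadows the MAX_SET_SIZE enumerator from set.h for the rest of this file,
// raising the capacity from 30000 to 130000 words
#define MAX_SET_SIZE 130000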
struct set_t {
char* words[ MAX_SET_SIZE ] ;
size_t n ;
};
// set - allocate a new, empty Set; returns NULL if malloc fails
Set * set()
{
    Set *data = malloc(sizeof *data);
    if (data == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < MAX_SET_SIZE; i++) data->words[i] = NULL;
    data->n = 0;
    return data;
}
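// setPrintf - print every stored word in order (debugging aid)
void setPrintf( Set* s )
{
    // size_t index avoids a signed/unsigned comparison with s->n
    for (size_t i = 0; i < s->n; i++)
    {
        printf("%s", s->words[i]);
    }
}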
size_t setSize( const Set* s )
{
return s->n ;
}
// isStrExit - recursive binary search for dest in s->words[left..right]
static bool isStrExit(Set* s, const char *dest, int left, int right)
{
    // n > 0 (not n > 1), so that a one-element Set is searched as well
    if (left<=right && (s->n)>0)
    {
        int mid = left + (right - left) / 2;
        if (strcmp(s->words[mid], dest) == 0)
        {
            return true;
        }
        else if (strcmp(s->words[mid], dest) > 0)
        {
            return isStrExit(s, dest, left, mid - 1);
        }
        else
        {
return isStrExit(s, dest, mid + 1, right