A permuterm index is an index structure designed specifically for wildcard queries.
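Method: $ marks the end of a word (as in regular expressions). For example, the word ab is written as ab$ and then rotated, producing ab$, b$a, and $ab; every rotation points back to ab.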
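The rotation step can be sketched in a few lines of C. This is only an illustration, not part of permuterm.c: the function names permuterm_rotation and print_rotations are hypothetical, and the fixed 64-byte buffer is an assumption made for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EOW '$' /* end-of-word marker, same symbol as in the text */

/* Writes rotation number `shift` of word+'$' into out.
   out must have room for strlen(word) + 2 bytes. (Hypothetical helper.) */
void permuterm_rotation(const char *word, size_t shift, char *out)
{
    size_t n = strlen(word) + 1;      /* length including the '$' marker */
    assert(shift < n);
    for (size_t i = 0; i < n; i++) {
        size_t src = (i + shift) % n; /* position inside word+'$' */
        out[i] = (src == n - 1) ? EOW : word[src];
    }
    out[n] = '\0';
}

/* Prints every rotation of word+'$' together with the word it points to. */
void print_rotations(const char *word)
{
    char buf[64];                     /* assumed large enough for the demo */
    size_t n = strlen(word) + 1;
    assert(n < sizeof buf);
    for (size_t shift = 0; shift < n; shift++) {
        permuterm_rotation(word, shift, buf);
        printf("%s -> %s\n", buf, word);
    }
}
```

Calling print_rotations("ab") lists the three keys ab$, b$a, and $ab, each pointing to ab.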
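To process a query with a single wildcard, say *b, first append $ to get *b$, then rotate until * is at the end, giving b$*, and look it up in the search tree as a prefix query on b$. The key b$a matches, so ab satisfies the query.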
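A minimal sketch of the single-wildcard rotation, assuming the query contains exactly one *. The helper names single_wildcard_prefix and has_prefix are hypothetical and do not appear in permuterm.c; has_prefix stands in for the prefix lookup that the search tree would perform.

```c
#include <assert.h>
#include <string.h>

/* Turns a query X*Z into the lookup prefix Z$X (the rotation with '*' last,
   without the '*' itself). Returns 0 on success, -1 if the query does not
   contain exactly one '*' or if out is too small. (Hypothetical helper.) */
int single_wildcard_prefix(const char *query, char *out, size_t out_size)
{
    const char *star = strchr(query, '*');
    if (star == NULL || strchr(star + 1, '*') != NULL)
        return -1;
    size_t before = (size_t)(star - query); /* length of X */
    size_t after = strlen(star + 1);        /* length of Z */
    if (before + after + 2 > out_size)      /* Z + '$' + X + NUL */
        return -1;
    memcpy(out, star + 1, after);
    out[after] = '$';
    memcpy(out + after + 1, query, before);
    out[after + 1 + before] = '\0';
    return 0;
}

/* Returns 1 if key starts with prefix, 0 otherwise. */
int has_prefix(const char *key, const char *prefix)
{
    return strncmp(key, prefix, strlen(prefix)) == 0;
}
```

For the query *b the prefix is b$, and the permuterm key b$a (a rotation of ab$) starts with it, so ab is reported as a match.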
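To process a query with multiple wildcards, say a*b*, first append $ to get a*b*$, then rotate it to $a*b*. Look up $a* first, then filter the results against the original pattern a*b*.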
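The two-stage idea (prefix lookup, then filtering) might look like the sketch below. It assumes the usual generalization that a pattern X*...*Z is looked up as Z$X and every candidate is then checked against the full pattern. Both function names are hypothetical, and wildcard_match is a plain backtracking matcher chosen for brevity, not code from permuterm.c.

```c
#include <assert.h>
#include <string.h>

/* Builds the lookup prefix Z$X from a pattern X*...*Z, where X is the text
   before the first '*' and Z the text after the last '*'.
   Returns 0 on success, -1 if there is no '*' or out is too small. */
int multi_wildcard_prefix(const char *query, char *out, size_t out_size)
{
    const char *first = strchr(query, '*');
    const char *last = strrchr(query, '*');
    if (first == NULL)
        return -1;
    size_t before = (size_t)(first - query); /* length of X */
    size_t after = strlen(last + 1);         /* length of Z */
    if (before + after + 2 > out_size)       /* Z + '$' + X + NUL */
        return -1;
    memcpy(out, last + 1, after);
    out[after] = '$';
    memcpy(out + after + 1, query, before);
    out[after + 1 + before] = '\0';
    return 0;
}

/* Filter step: returns 1 if text matches pattern, where '*' stands for any
   (possibly empty) run of characters, 0 otherwise. */
int wildcard_match(const char *p, const char *t)
{
    const char *star = NULL;  /* position of the most recent '*' in p */
    const char *retry = NULL; /* where in t that '*' started matching */
    while (*t != '\0') {
        if (*p == '*') {
            star = p++;
            retry = t;
        } else if (*p == *t) {
            p++;
            t++;
        } else if (star != NULL) {
            p = star + 1;     /* let the last '*' absorb one more character */
            t = ++retry;
        } else {
            return 0;
        }
    }
    while (*p == '*')
        p++;
    return *p == '\0';
}
```

For a*b* the lookup prefix is $a, which selects every word starting with a; wildcard_match then keeps abandon and aback but rejects aardvark, which contains no b.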
Drawback: the dictionary becomes very large, because a word of length n contributes n+1 rotations.
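To get a feeling for the growth, the sketch below counts how many permuterm keys a word list produces, and how many characters they would occupy if each rotation were stored as a separate flat string. Both function names are hypothetical. The flat-storage figure is an assumption used only as an upper bound: a trie shares common prefixes and will need fewer nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of permuterm keys: each word of length n yields n + 1 rotations. */
size_t permuterm_key_count(const char *const words[], size_t count)
{
    size_t total = 0;
    assert(words != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += strlen(words[i]) + 1;
    return total;
}

/* Characters needed if every rotation is stored as its own string
   (without terminators): (n + 1) rotations of (n + 1) characters each. */
size_t permuterm_char_count(const char *const words[], size_t count)
{
    size_t total = 0;
    assert(words != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(words[i]) + 1;
        total += n * n;
    }
    return total;
}
```

For the three words a, aardvark, and aardvarks this gives 2 + 9 + 10 = 21 keys and 4 + 81 + 100 = 185 characters, compared with 18 characters in the original words.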
I implemented and ran the following code on Ubuntu.
Makefile (build command):
gcc -o permuterm_trie permuterm.c -std=gnu99
Search file (words_ordered.txt):
(The contents of this file can be extended or trimmed as needed.)
a
aardvark
aardvarks
abaci
aback
abacus
abacuses
abaft
abalone
abalones
abandon
abandoned
abandoning
abandonment
abandons
abase
abased
abasement
The following is the code of permuterm.c:
#include <stdio.h>
#include <stdlib.h> // malloc
#include <string.h> // strdup
#include <ctype.h> // isupper, tolower
#define MAX_DEGREE 27 // 'a' ~ 'z' and EOW
#define EOW '$' // end of word
// used in the following functions: trieInsert, trieSearch, triePrefixList
#define getIndex(x) (((x) == EOW) ? (MAX_DEGREE - 1) : ((x) - 'a'))
// TRIE type definition
typedef struct trieNode {
int index; // dictionary index of the word ending at this node (0, 1, 2, ...); -1 if no word ends here
struct trieNode *subtrees[MAX_DEGREE];
} TRIE;
// Prototype declarations
/* Allocates dynamic memory for a trie node and returns its address to caller
return node pointer on success
NULL if memory allocation fails
*/
TRIE *trieCreateNode(void);
/* Deletes all data in trie and recycles memory
*/
void trieDestroy( TRIE *root);
/* Inserts new entry into the trie
return 1 success
0 failure
*/
int trieInsert( TRIE *root, char *str, int dic_index);
/* Searches the trie for the requested key
return index in dictionary (trie) if key found
-1 key not found
*/
int trieSearch( TRIE *root, char *str);
/* prints all entries in trie using preorder traversal
*/
void trieList( TRIE *root, char *dic[]);
/* prints all entries starting with str (as prefix) in trie
ex) "abb" -> "abbas",