Trie basics:
A trie, also called a dictionary tree, word-search tree, or prefix tree, is a multi-way tree structure used for fast string retrieval.
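Properties of a trie:
It has three characteristics:
1. The root node holds no character; every other node holds exactly one character.
2. Concatenating the characters on the path from the root to a node gives the string that node represents.
3. All children of any given node hold different characters.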
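Trie node data structure:
typedef struct TRIETREENODE
{
int cnt; /* number of times the word ending at this node has been inserted; initialized to 0 */
struct TRIETREENODE *next[26]; /* assumes words contain only lowercase letters; letter c is stored at index c - 'a' */
}TrieTreeNode;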
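Advantages: it uses the common prefixes of strings to cut query time and avoids unnecessary string comparisons as far as possible. Looking up a key of length L takes at most L steps no matter how many keys are stored, which in many workloads makes it faster than a hash-based structure.
Drawback: because it trades space for time, a trie holding many strings that share no common prefix consumes a lot of memory, since every node in the layout above carries 26 child pointers.
Core idea: trade space for time, using the common prefixes of strings to reduce the cost of queries.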
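Typical applications:
Application 1: string retrieval, word-frequency counting, and hot queries in search engines
Examples:
1. Store information about a known set of strings (a dictionary) in a trie in advance, then check whether other, unknown strings have appeared, or how often.
2. Given a list of N familiar words and an article written entirely in lowercase English, output every word that is not in the familiar list, in order of first appearance.
3. Among 10 million strings, some are duplicates; remove all the duplicates and keep the strings that are not repeated.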
Code:
/*
* Trie tree algorithm demo
*/
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
#define MAXSTRLEN 1024
typedef struct TRIETREENODE
{
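int cnt; /* number of times the word ending at this node has been inserted; initialized to 0 */
struct TRIETREENODE *next[26]; /* assumes words contain only lowercase letters */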
}TrieTreeNode;
TrieTreeNode *createTrieTreeNode(void); /* create and initialize a node */
int InsertTrieTreeNode(TrieTreeNode *ptrRoot, char *str); /* insert a word */
int DeleteTrieTreeNode(TrieTreeNode *ptrRoot, char *str); /* delete one occurrence of a word */
int SearchTrieTree(TrieTreeNode *ptrRoot, char *str); /* search the trie */
void TraverseTrieTree(TrieTreeNode *ptrRoot); /* traverse the trie */
void DestroyTrieTree(TrieTreeNode *ptrRoot); /* destroy the trie */
/* Create and initialize a node */
TrieTreeNode *createTrieTreeNode(void)
{
TrieTreeNode *treeNode = NULL;
/* allocate memory */
treeNode = (TrieTreeNode*)malloc(sizeof(TrieTreeNode));
if(treeNode == NULL)
{
printf("Error! malloc error.\n");
return NULL;
}
/* initialize: no children, count 0 */
memset(treeNode->next, 0x00, sizeof(treeNode->next));
treeNode->cnt = 0;
return treeNode;
}
/* Insert a word; returns 0 on success (or when there is nothing to insert), -1 on error */
int InsertTrieTreeNode(TrieTreeNode *ptrRoot, char *str)
{
int index;
size_t i, len;
TrieTreeNode *tempNode = ptrRoot;
if(ptrRoot==NULL || str == NULL || str[0]=='\0') /* the trie has a head node that holds no character */
return 0;
len = strlen(str);
for(i=0;i<len ;i++) /* validate first so that an invalid word leaves no partial path */
{
if(str[i]<'a' || str[i]>'z') /* only letters a-z are allowed */
{
printf("Invalid letter[%c]\n", str[i]);
return -1;
}
}
for(i=0;i<len ;i++)
{
index = str[i]-'a';
if(tempNode->next[index] == NULL) /* no shared prefix here: create a new node */
{
tempNode->next[index] = createTrieTreeNode();
if(tempNode->next[index] == NULL)
{
printf("createTrieTreeNode error.\n");
return -1;
}
}
tempNode = tempNode->next[index];
}
tempNode->cnt = tempNode->cnt + 1;
return 0;
}
/*
* Search the trie
* exist: return the count of the target word
* not exist: return 0
*/
int SearchTrieTree(TrieTreeNode *ptrRoot, char *str)
{
int index;
size_t i, len;
TrieTreeNode *tempNode = ptrRoot;
if(ptrRoot==NULL || str == NULL)
return 0;
len = strlen(str);
for(i=0;i<len ;i++)
{
if(str[i]<'a' || str[i]>'z') /* a word with other characters cannot be stored; also avoids an out-of-range index */
return 0;
index = str[i]-'a';
if(tempNode->next[index] == NULL)
{
return 0;
}
tempNode = tempNode->next[index];
}
return tempNode->cnt;
}
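/*
* Delete one occurrence of a word
* not exist: return 0
* exist: return the remaining count (node->cnt - 1)
* Only the counter is decremented; nodes stay allocated until DestroyTrieTree.
*/
int DeleteTrieTreeNode(TrieTreeNode *ptrRoot, char *str)
{
int index;
size_t i, len;
TrieTreeNode *tempNode = ptrRoot;
if(ptrRoot==NULL || str == NULL)
return 0;
len = strlen(str);
for(i=0;i<len ;i++)
{
if(str[i]<'a' || str[i]>'z') /* a word with other characters cannot be stored */
return 0;
index = str[i]-'a';
if(tempNode->next[index] == NULL)
{
return 0;
}
tempNode = tempNode->next[index];
}
if(tempNode->cnt > 0) /* never let the count go negative */
tempNode->cnt = tempNode->cnt-1;
return tempNode->cnt;
}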
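/* Traverse the trie in preorder and print every word with cnt > 0.
* Static variables keep the characters of the current path across recursive calls. */
void TraverseTrieTree(TrieTreeNode *ptrRoot)
{
int i;
static char word[MAXSTRLEN + 1] ={0};
static int j=0;
if(ptrRoot==NULL)
return;
for(i=0; i<26; i++)
{
if(ptrRoot->next[i] == NULL)
{
continue;
}
if(j >= MAXSTRLEN) /* path longer than the buffer: stop descending */
return;
word[j++] = i + 'a';
if(ptrRoot->next[i]->cnt > 0)
{
word[j] = '\0'; /* terminate right after the last character of the path */
printf("%-20s %-8d\n",word, ptrRoot->next[i]->cnt);
}
TraverseTrieTree(ptrRoot->next[i]);
j--;
}
return;
}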
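/* Destroy the trie (post-order: children first, then the node itself) */
void DestroyTrieTree(TrieTreeNode *ptrRoot)
{
int i;
if(ptrRoot==NULL)
return;
for(i=0; i<26; i++)
{
if(ptrRoot->next[i] != NULL)
{
DestroyTrieTree(ptrRoot->next[i]);
}
}
free(ptrRoot); /* a node can be freed only after all its children have been freed */
/* assigning NULL to ptrRoot here would only change the local copy; the caller resets its own pointer */
return ;
}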
int main()
{
int N;
int i;
int ret;
char str[100][1024+1] = {0};
char word[1024+1] = {0};
TrieTreeNode *ptrRoot = NULL;
/*input*/
printf("Please input a positive integer N between 1 and 100.\n");
if(scanf("%d", &N) != 1) /* non-numeric input is treated as invalid */
N = 0;
if(N <= 0 || N >100)
{
printf("Invalid input N[%d]\n", N);
return -1;
}
printf("Please input %d words which contain only lowercase letters.\n", N);
for(i=0; i<N; i++)
{
if(scanf("%1024s", str[i]) != 1) /* the width limit prevents a buffer overflow */
return -1;
}
/* create the head node */
ptrRoot = createTrieTreeNode();
if(ptrRoot == NULL)
{
printf("createTrieTreeNode error.\n");
return -1;
}
/*build tree */
for(i=0; i<N; i++)
{
printf("%s ", *(str+i));
ret = InsertTrieTreeNode(ptrRoot,*(str+i));
if(ret != 0)
{
printf("InsertTrieTreeNode error.\n");
DestroyTrieTree(ptrRoot); /* free what has been built so far */
return -1;
}
}
printf("\n"); /* all words are now stored in the trie */
printf("TraverseTrieTree:\n");
TraverseTrieTree(ptrRoot);
printf("Please input a word.\n");
if(scanf("%1024s", word) != 1)
word[0] = '\0';
printf("Search %s in the tree. cnt=[%d]\n", word,SearchTrieTree(ptrRoot,word));
printf("Delete %s in the tree. cnt=[%d]\n", word,DeleteTrieTreeNode(ptrRoot,word));
printf("TraverseTrieTree:\n");
TraverseTrieTree(ptrRoot);
printf("DestroyTrieTree\n");
DestroyTrieTree(ptrRoot);
ptrRoot = NULL;
printf("TraverseTrieTree:\n");
TraverseTrieTree(ptrRoot);
return 0;
}
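Application 2: count the words that have a given string as a prefix (a word counts as a prefix of itself)
Example: Ignatius recently ran into a hard problem. His teacher gave him many words (made of lowercase letters only, with no duplicates) and asked him to count how many of them have a given string as a prefix (a word is also a prefix of itself).
C++ code: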
#include <iostream>
#include <string>
using namespace std;
struct trieNode {
trieNode() : prefixLatterWords(0) {
for (size_t i = 0; i < 26; ++i) {
children[i] = NULL;
}
}
~trieNode() {
for (size_t i = 0; i < 26; ++i) {
if (children[i]) {
delete children[i];
children[i] = NULL;
}
}
}
int prefixLatterWords; // number of inserted words whose path passes through this node (words with this prefix); filled in while the trie is built
trieNode *children[26];
};
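class trie {
public:
trie() : root(new trieNode) {}
~trie() { delete root; } // trieNode's destructor frees the whole subtree
void insert(const string& word);
int countPrefix(const string& prefix);
private:
// copying is disabled: a shallow copy would delete the same nodes twice
trie(const trie&);
trie& operator=(const trie&);
// maps 'a'..'z' to 0..25 and returns -1 for any other character
static int Index(char c) {
return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
}
trieNode *root;
};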
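void trie::insert(const string& word) {
// reject the whole word if it has a non-lowercase character,
// so that no counters are incremented along a partial path
for (size_t i = 0; i < word.size(); ++i) {
if (Index(word[i]) < 0) {
return;
}
}
trieNode *cur = root;
for (size_t i = 0; i < word.size(); ++i) {
int idx = Index(word[i]);
if (!cur->children[idx]) {
cur->children[idx] = new trieNode;
}
cur = cur->children[idx];
++cur->prefixLatterWords; // one more word passes through this node
}
}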
int trie::countPrefix(const string& prefix) {
trieNode *cur = root;
for (size_t i = 0; i < prefix.size(); ++i) {
int idx = Index(prefix[i]);
if (idx < 0 || !cur->children[idx]) {
return 0; // no stored word has this prefix
}
cur = cur->children[idx];
}
return cur->prefixLatterWords;
}
int main()
{
trie t;
int n = 0, m = 0;
cout << "Number of words used to build the trie:" << endl;
cin >> n;
// build the trie
cout << "Enter the words:" << endl;
string word;
for (int i = 0; i < n; ++i) {
cin >> word;
t.insert(word);
}
cout << "Number of prefixes to look up:" << endl;
cin >> m;
cout << "Enter the prefixes:" << endl;
string prefix;
for (int i = 0; i < m; ++i) {
cin >> prefix;
cout << t.countPrefix(prefix) << endl;
}
return 0;
}
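Application 3: sorting the strings stored in a trie (lexicographic order)
Method: print the trie in preorder, visiting the children of each node from 'a' to 'z'.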
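Application 4: translation (code words to plain text)
Example: we are given pairs of strings s and k, and whenever we read k we must translate it into s, so the pairs form a mapping. Next we are given a passage and must translate it back into ordinary text. A map would be simpler to use, but a trie is usually faster: looking up a word of length L walks at most L nodes instead of performing O(log n) string comparisons as a std::map does. It is enough to store s at the node where k ends.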
/* Written in C style but compiled as C++: struct node is used without the struct keyword */
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<ctype.h>
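struct node{
char dic[15]; /* translation, stored at the node where a foreign word ends */
node * next[26];
bool flag; /* true if a foreign word ends at this node */
}*root;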
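/* Allocate and initialize one trie node */
node *build()
{
node *p=(node *)malloc(sizeof(node));
if(p==NULL)
{
fprintf(stderr,"malloc failed\n");
exit(1);
}
for(int i=0;i<26;i++)
p->next[i]=NULL;
p->flag=false;
p->dic[0]='\0';
return p;
}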
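/* Map the foreign word mars to its translation earth */
void insert(char *earth,char *mars)
{
int len=strlen(mars);
node *p;
p=root;
for(int i=0;i<len;i++)
if(mars[i]<'a' || mars[i]>'z') /* keys must be lowercase a-z */
return;
for(int i=0;i<len;i++)
{
if(p->next[mars[i]-'a']==NULL)
p->next[mars[i]-'a']=build();
p=p->next[mars[i]-'a'];
}
p->flag=true;
strncpy(p->dic,earth,sizeof(p->dic)-1); /* bounded copy into the fixed-size buffer */
p->dic[sizeof(p->dic)-1]='\0';
}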
void query(char *earth)
{
int len=strlen(earth);
node *p;
p=root;
for(int i=0;i<len;i++)
{
if(p->next[earth[i]-'a']==NULL)
{
printf("%s",earth);
return;
}
p=p->next[earth[i]-'a'];
}
if(p->flag)
printf("%s",p->dic);
else
printf("%s", earth);
}
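/* Free every node of the trie */
void destroy(node *p)
{
if(p==NULL)
return;
for(int i=0;i<26;i++)
destroy(p->next[i]);
free(p);
}
int main()
{
char earth[15],mars[15],ask[3010],word[3010];
scanf("%14s",earth); /* skip the first "START" */
root=build();
/* each dictionary line is "earth mars"; the Martian word is the key */
while(scanf("%14s",earth)==1 && strcmp(earth,"END"))
{
if(scanf("%14s",mars)!=1)
break;
insert(earth,mars);
}
scanf("%14s",earth); /* skip the second "START" */
getchar(); /* consume the newline after it */
/* gets() was removed from C11 and C++14; fgets() bounds the read */
while(fgets(ask,sizeof(ask),stdin))
{
ask[strcspn(ask,"\r\n")]='\0'; /* strip the line ending */
if(strcmp(ask,"END")==0)
break;
int len=strlen(ask);
for(int i=0;i<len;)
{
if(islower((unsigned char)ask[i]))
{
/* copy a maximal run of lowercase letters and translate it */
int j=0;
while(i<len && islower((unsigned char)ask[i]))
word[j++]=ask[i++];
word[j]='\0';
query(word);
}
else
{
putchar(ask[i++]); /* spaces and punctuation are copied unchanged */
}
}
printf("\n");
}
destroy(root);
return 0;
}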