Trie (Dictionary Tree) and Trie Code

Latest recommended article published on 2023-06-05 04:50:58

lxmky

Category: Algorithms. Tags: data structures, null, list

Article link: https://blog.csdn.net/lxmky/article/details/7553112

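Trie concept:

A trie (dictionary tree), as the name suggests, is a tree data structure specialized for storing and looking up strings. For lowercase English words it is simply a 26-ary tree: every node has one child slot for each letter from a to z. A root (head) pointer is defined, and every operation starts from it.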

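Trie analysis:

Word frequencies can also be counted with a hash table, but a trie offers more: it can count prefixes as well, which a hash table keyed on whole words cannot do directly.

In the extreme case where every node has all 26 children, the space used grows on the order of 26^n, where n is the average word length. For real words, however, many nodes can never occur while others occur very often, for example the common prefix pre. Words that share a prefix share the nodes for that prefix, so a trie supports fast lookup and insertion (time proportional to the word length) and makes counting word frequencies and prefixes very efficient.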
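To make the node-sharing argument above concrete, here is a minimal sketch that counts how many nodes a trie would need for a given word list without building the trie. It relies on the fact that each distinct non-empty prefix corresponds to exactly one node. The function names `trieNodeCount` and `totalChars` are hypothetical helpers introduced for this illustration; they are not part of the article's code.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Number of nodes a trie would need for `words`, including the root.
// Every distinct non-empty prefix maps to exactly one trie node.
std::size_t trieNodeCount(const std::vector<std::string>& words) {
    std::set<std::string> prefixes;
    for (const std::string& w : words)
        for (std::size_t len = 1; len <= w.size(); ++len)
            prefixes.insert(w.substr(0, len));
    return prefixes.size() + 1;  // +1 for the root
}

// Total number of characters, i.e. the node count (without the root)
// if no prefix were shared between words.
std::size_t totalChars(const std::vector<std::string>& words) {
    std::size_t total = 0;
    for (const std::string& w : words)
        total += w.size();
    return total;
}
```

For the words pre, prefix, press and present there are 21 characters in total, but only 11 distinct prefixes, so the trie needs 12 nodes including the root. This is far below the 26^n worst case.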

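There are two common operations:
1. Count how many times a given string occurs.
Every node's count is initialized to 0. When an insertion reaches the end of the string, count++ is applied to the final node only, so that node records how many times the string has occurred.
2. Count how many times a given prefix occurs.
Every node's count is initialized to 0, and each time a character is read, the count of the node reached by that character is incremented. At query time, a node's count is then the number of inserted strings that begin with the sequence spelled from the root to that node. This is useful for problems that ask for statistics on word prefixes.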
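The two counting schemes above can be combined in one node by keeping two counters. The following is a minimal sketch under stated assumptions, not the article's implementation: the names `CountNode`, `endCount`, `passCount`, `walk`, `insertWord`, `wordCount` and `prefixCount` are all hypothetical, input is assumed to be lowercase a to z, and the root's `passCount` is also incremented so that the empty prefix matches every word.

```cpp
#include <string>

const int ALPHABET = 26;  // assumption: lowercase 'a'..'z' only

struct CountNode {
    int endCount = 0;                 // operation 1: words ending exactly here
    int passCount = 0;                // operation 2: words passing through here
    CountNode* child[ALPHABET] = {};  // all children start out null
};

// Follows s from the root; returns nullptr if the path is missing
// or s contains a character outside 'a'..'z'.
const CountNode* walk(const CountNode* root, const std::string& s) {
    const CountNode* cur = root;
    for (char c : s) {
        if (c < 'a' || c > 'z') return nullptr;
        cur = cur->child[c - 'a'];
        if (!cur) return nullptr;
    }
    return cur;
}

// Inserts s; returns false (and changes nothing) on invalid input.
// Nodes are never freed in this sketch.
bool insertWord(CountNode* root, const std::string& s) {
    for (char c : s)
        if (c < 'a' || c > 'z') return false;
    CountNode* cur = root;
    ++cur->passCount;  // the empty prefix matches every word
    for (char c : s) {
        int idx = c - 'a';
        if (!cur->child[idx]) cur->child[idx] = new CountNode();
        cur = cur->child[idx];
        ++cur->passCount;  // one more word has this prefix
    }
    ++cur->endCount;  // one more occurrence of the whole word
    return true;
}

int wordCount(const CountNode* root, const std::string& s) {
    const CountNode* n = walk(root, s);
    return n ? n->endCount : 0;
}

int prefixCount(const CountNode* root, const std::string& s) {
    const CountNode* n = walk(root, s);
    return n ? n->passCount : 0;
}
```

After inserting pre, prefix, press and pre again, `wordCount(&root, "pre")` is 2 while `prefixCount(&root, "pre")` is 4, because all four insertions pass through the node for pre.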

#include <cstring>
#include <cstdio>
#include <cassert>
using namespace std;
#define MAX 26	// alphabet size: the lowercase letters a...z

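struct Dictree
{
	bool word;			// true if the path from the root to this node spells a complete word
	int count;			// how many times that word has been inserted
	struct Dictree *trie[MAX];	// the 26 children, one per letter
} * a;					// a: root of the trie

int init()		// create the root node of the trie
{
	a = new Dictree;
	a->word = false;
	a->count = 0;		// was left uninitialized before
	for(int i = 0; i < MAX; i++)
		a->trie[i] = NULL;

	return 0;
}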

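bool searchTrie(const char *str)
{
	int res;
	Dictree *head = a;
	assert(head);		// init() must have been called first
	size_t len = strlen(str);

	for(size_t i = 0; i < len; i++)
	{
		res = str[i] - 'a';
		if(res < 0 || res >= MAX)	// not a lowercase letter, so it cannot be in the trie
			return false;
		if(head->trie[res] == NULL)
			return false;
		head = head->trie[res];
	}

	return head->word;	// true only if a complete word ends exactly here
}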

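int insertTrie(const char *str)	// returns the word's count after insertion, or -1 if str is invalid
{
	int res;
	Dictree *head = a;
	assert(head);		// init() must have been called first
	size_t len = strlen(str);

	for(size_t i = 0; i < len; i++)	// reject anything outside a...z before touching the trie
	{
		if(str[i] < 'a' || str[i] > 'z')
			return -1;
	}

	for(size_t i = 0; i < len; i++)
	{
		res = str[i] - 'a';
		if(head->trie[res] == NULL)		// no child for this letter yet, so create it
		{
			head->trie[res] = new Dictree;
			head = head->trie[res];
			head->count = 0;
			head->word = false;
			for(int j = 0; j < MAX; j++)
				head->trie[j] = NULL;
		}
		else
			head = head->trie[res];
	}
	head->count++;		// one more occurrence of this word
	head->word = true;	// a complete word ends at this node

	return head->count;
}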

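int main()
{
	char str[20];

	init();
	for(int i = 0; i < 10; i++)	// read 10 words and print each word's running count
	{
		if(scanf("%19s", str) != 1)	// %19s keeps the input inside str; stop on end of input
			break;
		printf("%d\n", insertTrie(str));
	}

	if(scanf("%19s", str) == 1)	// finally, look up one word
		printf("%s\n", searchTrie(str) ? ("YES"):("NO"));

	return 0;
}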

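PS: The first version of Dictree had no bool word flag, so word lookups were not accurate. With the flag added, word == true at a node means that the path from the root to that node is a complete word, while false means the node lies in the middle of a word.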
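The difference the word flag makes can be seen by comparing "the path exists" with "a word ends here". The sketch below is illustrative only: `FlagNode`, `findNode`, `addWord`, `pathExists` and `isWord` are hypothetical names, and it uses `std::unique_ptr` for the children (a swapped-in choice, not the article's raw pointers) so that nodes are freed automatically.

```cpp
#include <memory>
#include <string>

struct FlagNode {
    bool word = false;                    // true if a complete word ends here
    std::unique_ptr<FlagNode> child[26];  // assumption: lowercase 'a'..'z' only
};

// Returns the node reached by s, or nullptr if the path is missing
// or s contains a character outside 'a'..'z'.
const FlagNode* findNode(const FlagNode& root, const std::string& s) {
    const FlagNode* cur = &root;
    for (char c : s) {
        if (c < 'a' || c > 'z') return nullptr;
        cur = cur->child[c - 'a'].get();
        if (!cur) return nullptr;
    }
    return cur;
}

// Inserts s; invalid input is ignored and leaves the trie unchanged.
void addWord(FlagNode& root, const std::string& s) {
    for (char c : s)
        if (c < 'a' || c > 'z') return;
    FlagNode* cur = &root;
    for (char c : s) {
        std::unique_ptr<FlagNode>& next = cur->child[c - 'a'];
        if (!next) next.reset(new FlagNode());
        cur = next.get();
    }
    cur->word = true;  // mark the end of a complete word
}

// Lookup without the flag: true for any prefix of an inserted word.
bool pathExists(const FlagNode& root, const std::string& s) {
    return findNode(root, s) != nullptr;
}

// Lookup with the flag: true only for words that were actually inserted.
bool isWord(const FlagNode& root, const std::string& s) {
    const FlagNode* n = findNode(root, s);
    return n != nullptr && n->word;
}
```

After `addWord(root, "apple")`, `pathExists(root, "app")` is true even though app was never inserted, whereas `isWord(root, "app")` is false until app itself is added.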
