Trie (prefix tree): something that beats a hash table in some respects

Advsance

Article link: https://blog.csdn.net/Advsance/article/details/121069234

Prefix Tree

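A trie, also called a dictionary tree, word-search tree, or key tree, is a tree structure and a variant of the hash tree. Its typical use is counting and sorting large numbers of strings (though it is not limited to strings), which is why search engines often use it for word-frequency statistics over text. Its advantage is that it minimizes unnecessary string comparisons: a lookup follows one node per character, so its cost depends on the length of the key rather than on how many words are stored, and it stops as soon as a character is missing. Combined with direct support for prefix queries, this lets it outperform a hash table for many string lookups.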

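The core idea of a trie is trading space for time: it exploits the common prefixes of strings to cut the cost of queries.
It has three basic properties:

1. The root node contains no character; every other node contains exactly one character.
2. Concatenating the characters along the path from the root to a node gives the string that node represents.
3. All children of a node contain different characters.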

https://blog.csdn.net/v_july_v/article/details/6897097
https://www.cnblogs.com/cherish_yimi/archive/2009/10/12/1581666.html

Trie.h

#ifndef TRIE_H
#define TRIE_H
#include <vector>
#include <string>

class Trie
{
public:
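    Trie();
    ~Trie();

    // Each node owns its children through raw pointers, so copying is
    // disabled to prevent double deletion.
    Trie(const Trie &) = delete;
    Trie &operator=(const Trie &) = delete;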

    /**
     * @brief Check whether the word exists in the trie
     * @param word   the string to check (lowercase letters a-z only)
     * @return true if the word exists, false otherwise
     */
    bool hasWord(const std::string &word);

    /**
     * @brief Check whether some word in the trie starts with prefix
     * @param prefix   the prefix to check
     * @return true if such a word exists, false otherwise
     */
    bool startsWith(std::string prefix);

    /**
     * @brief Add a word to the trie
     * @param word  the word to add
     */
    void addWord(const std::string &word);

    /**
     * @brief Remove a word from the trie
     * @param word  the word to remove
     */
    void removeWord(const std::string &word);

    /**
     * @brief Return the list of words that start with the given prefix
     * @param prefix the prefix to match
     */
    std::vector<std::string> enumerateWords(const std::string &prefix) const;

    /**
     * @brief Remove all words from the trie
     */
    void clearNode();

private:
    // Child pointers indexed by letter ('a'-'z'); works like a small hash table
    std::vector<Trie *> children;
    // Whether a word ends at this node
    bool isEnd;

private:
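    /**
     * @brief Find the node reached by following the characters of prefix
     * @param prefix the character sequence to look up
     * @return the node of the last character of prefix if the whole sequence
     *         exists in the trie, otherwise nullptr
     */
    Trie *searchPrefix(std::string prefix) const;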

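    // Depth-first walk below node that appends every stored word to result;
    // word is the string represented by node.
    void scanningTrie(Trie const *node, std::vector<std::string> &result, std::string word) const;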
};

#endif // TRIE_H

Trie.cpp

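#include "Trie.h"
#include <utility>
using namespace std;

// Only the lowercase letters 'a'-'z' are supported, one child slot per letter.
constexpr int LETTER_SIZE = 26;

// Map a character to its child slot; returns -1 for characters outside 'a'-'z'
// (the original `ch - 'a'` indexed out of bounds for such characters).
static int letterIndex(char ch)
{
    return (ch >= 'a' && ch <= 'z') ? ch - 'a' : -1;
}

// Returns true if any child slot of a node is occupied.
static bool hasChildren(const vector<Trie *> &children)
{
    for (const Trie *child : children) {
        if (child != nullptr) {
            return true;
        }
    }
    return false;
}

Trie::Trie() :
    children(LETTER_SIZE),  // each node acts like a small hash table of pointers to the next nodes
    isEnd(false)            // whether a word ends at this node
{
}

Trie *Trie::searchPrefix(string prefix) const  // find the node at the end of the prefix path
{
    const Trie *node = this;
    for (char ch : prefix) {
        int idx = letterIndex(ch);
        if (idx < 0 || node->children[idx] == nullptr) {
            return nullptr;
        }
        node = node->children[idx];
    }

    // The declared return type is non-const; callers here only read through it.
    return const_cast<Trie *>(node);
}

void Trie::addWord(const std::string &word)
{
    // Reject the whole word up front so that no partial path is created.
    for (char ch : word) {
        if (letterIndex(ch) < 0) {
            return;
        }
    }

    Trie *node = this;
    for (char ch : word) {
        int idx = letterIndex(ch);
        if (node->children[idx] == nullptr) {
            node->children[idx] = new Trie();  // create the node if it does not exist yet
        }
        node = node->children[idx];
    }
    node->isEnd = true;
}

bool Trie::hasWord(const std::string &word)
{
    const Trie *node = searchPrefix(word);
    return node != nullptr && node->isEnd;
}

bool Trie::startsWith(string prefix)
{
    // Every non-root node lies on the path of some word, but the root always
    // exists, so also require that the node ends a word or has children.
    const Trie *node = searchPrefix(prefix);
    return node != nullptr && (node->isEnd || hasChildren(node->children));
}

void Trie::removeWord(const string &word)
{
    // Record (parent, child slot) for every character along the word's path.
    vector<pair<Trie *, int>> path;
    Trie *node = this;
    for (char ch : word) {
        int idx = letterIndex(ch);
        if (idx < 0 || node->children[idx] == nullptr) {
            return;  // the word is not in the trie
        }
        path.emplace_back(node, idx);
        node = node->children[idx];
    }
    if (!node->isEnd) {
        return;  // only a prefix of other words, not a stored word
    }
    node->isEnd = false;

    // Prune bottom-up: free nodes that no longer end a word and have no
    // children, stopping at the first node still needed by another word.
    while (!path.empty()) {
        Trie *parent = path.back().first;
        int idx = path.back().second;
        Trie *child = parent->children[idx];
        if (child->isEnd || hasChildren(child->children)) {
            break;
        }
        delete child;
        parent->children[idx] = nullptr;
        path.pop_back();
    }
}

vector<string> Trie::enumerateWords(const string &prefix) const
{
    vector<string> result;

    const Trie *node = searchPrefix(prefix);
    if (node == nullptr) {
        return result;
    }
    if (node->isEnd) {
        result.push_back(prefix);  // the prefix itself may be a stored word
    }
    scanningTrie(node, result, prefix);
    return result;
}

void Trie::scanningTrie(const Trie *node, vector<string> &result, string word) const
{
    // Depth-first walk in 'a'..'z' order, so the words come out sorted.
    if (node == nullptr) {
        return;
    }

    for (size_t i = 0; i < node->children.size(); i++) {
        const Trie *child = node->children[i];
        if (child != nullptr) {
            string next = word + static_cast<char>('a' + i);
            if (child->isEnd) {
                result.push_back(next);
            }
            scanningTrie(child, result, next);
        }
    }
}

void Trie::clearNode()
{
    // Deleting a child runs its destructor, which clears its own subtree.
    for (size_t i = 0; i < children.size(); i++) {
        delete children[i];  // deleting nullptr is a no-op
        children[i] = nullptr;
    }

    isEnd = false;
}

Trie::~Trie()
{
    clearNode();
}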

If you spot any problems in the code, please point them out.
