High-Performance Computing in Python: Tries

张海军2013

Article link: https://blog.csdn.net/zhanghaijun2013/article/details/109908544

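Many readers may never have heard of the trie, also called a prefix tree. It is not widely known, but it is very useful in certain situations: it finds the strings in a collection that start with a given prefix extremely quickly, which makes it well suited to search-as-you-type and autocomplete features.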

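Python's standard library does not include a trie, but third-party packages such as patricia-trie provide one. Let's start with prefix matching using only the standard library.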

First, define a list of 10,000 random 32-character strings made up only of uppercase letters:

from random import choice
from string import ascii_uppercase

def random_string(length):
    return ''.join(choice(ascii_uppercase) for _ in range(length))  # ascii_uppercase is the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

strs = [random_string(32) for i in range(10000)]

Find the strings that start with the prefix 'AB':

matches = [s for s in strs if s.startswith('AB')]

Time the match in IPython:

%timeit matches = [s for s in strs if s.startswith('AB')]
1.66 ms ± 22.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

With the standard library, the match takes about 1.66 ms.
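`%timeit` is an IPython magic and does not work in a plain Python script. A rough equivalent with the standard `timeit` module is sketched below. The fixed seed and the helper name `linear_prefix_scan` are illustrative assumptions, and the absolute numbers will vary from machine to machine.

```python
import timeit
from random import choice, seed
from string import ascii_uppercase


def random_string(length):
    # Build a string of `length` random uppercase letters A-Z
    return ''.join(choice(ascii_uppercase) for _ in range(length))


seed(0)  # assumption: fixed seed so repeated runs use the same data
strs = [random_string(32) for _ in range(10000)]


def linear_prefix_scan(strings, prefix):
    # Hypothetical helper: O(N) scan that tests every string against the prefix
    return [s for s in strings if s.startswith(prefix)]


# 7 repeats of 100 calls each; the best repeat divided by 100 gives the per-call time
times = timeit.repeat(lambda: linear_prefix_scan(strs, 'AB'), number=100, repeat=7)
per_call_ms = min(times) / 100 * 1000
print(f"{per_call_ms:.3f} ms per call")
```

Taking the minimum over repeats, rather than the mean, reduces the influence of other processes on the measurement.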

Now let's see how long the same lookup takes with the patricia-trie package mentioned above. First, install it:

pip install patricia-trie

Here is how the package is used:

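from patricia import trie
str_dict = {s: 0 for s in strs}       # map every string to a dummy value 0
str_trie = trie(**str_dict)           # each key/value pair is inserted as a branch of the trie
matches2 = list(str_trie.iter('AB'))  # collect all keys that start with 'AB'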

Why the `**` operator? Because the trie constructor is defined like this:

def __init__(self, *value, **branch):
    """
    Create a new tree node.
    Any arguments will be used as the ``value`` of this node.
    If keyword arguments are given, they initialize a whole ``branch``.
    Note that `None` is a valid value for a node.
    """
    self._edges = {}
    self._value = __NON_TERMINAL__
    if len(value):
        if len(value) == 1:
            self._value = value[0]
        else:
            self._value = value
    for key, val in branch.items():
        self[key] = val

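We pass no positional arguments, so `value` is an empty tuple and the root node keeps its non-terminal marker. The unpacked dictionary is collected into the `**branch` parameter, and the final loop inserts every key-value pair into the trie with `self[key] = val`.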
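The argument binding can be checked with a plain function that has the same signature. `init_node` below is a hypothetical stand-in, not part of patricia-trie; it only returns what the two parameters receive.

```python
def init_node(*value, **branch):
    # Hypothetical stand-in with the same signature as trie.__init__:
    # positional arguments land in the `value` tuple, keyword arguments in the `branch` dict
    return value, branch


str_dict = {'ABC': 0, 'XYZ': 0}

value, branch = init_node(**str_dict)
print(value)         # () : no positional arguments were passed
print(branch)        # {'ABC': 0, 'XYZ': 0} : every pair became a keyword argument
print(init_node(5))  # ((5,), {}) : a single positional argument goes into `value`
```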

Now check the running time:

%timeit matches2 = list(str_trie.iter('AB'))
22.6 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The lookup now takes about 22.6 µs instead of 1.66 ms, roughly 73 times faster.

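The patricia module is a single pure-Python file with no underlying C implementation, so the speedup comes from the search algorithm itself. A trie query runs in O(S) time, where S is the length of the longest string in the collection, whereas a linear scan runs in O(N), where N is the number of strings. More precisely, reaching the node for the prefix costs one step per prefix character (at most S steps), regardless of N; enumerating the matches then adds work proportional to the size of the subtree under that node.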
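To see where the speedup comes from, here is a minimal pure-Python character trie. It is a sketch for illustration, not the patricia-trie source: as its name indicates, patricia-trie implements a PATRICIA (compressed, radix) trie that merges chains of single-child nodes into one multi-character edge, while the hypothetical `SimpleTrie` below keeps one node per character.

```python
from typing import Dict, Iterator, Optional


class TrieNode:
    # One node per character; `terminal` marks the end of a stored key
    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False


class SimpleTrie:
    """Uncompressed character trie: a sketch, not patricia-trie's code."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: str) -> None:
        # Walk (creating where needed) one node per character of the key
        node = self.root
        for ch in key:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.terminal = True

    def iter_prefix(self, prefix: str) -> Iterator[str]:
        # Step 1: descend one edge per prefix character, O(len(prefix)), independent of N
        node: Optional[TrieNode] = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return  # no stored key starts with this prefix
        # Step 2: visit only the subtree under the prefix node (iterative depth-first search)
        stack = [(node, prefix)]
        while stack:
            current, path = stack.pop()
            if current.terminal:
                yield path
            for ch, child in current.children.items():
                stack.append((child, path + ch))


# Usage: only the keys below the 'AB' node are visited
t = SimpleTrie()
for word in ["ABC", "ABD", "AXY", "B"]:
    t.insert(word)
print(sorted(t.iter_prefix("AB")))  # ['ABC', 'ABD']
```

Step 1 never looks at strings outside the prefix's branch, and Step 2 only visits matching keys. With the article's 10,000 uniformly random strings, on average only about 10000 / 26² ≈ 15 start with 'AB', so the trie touches a tiny fraction of the data that the linear scan has to check.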

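Articles in this series:
1. High-Performance Computing in Python: Lists
2. High-Performance Computing in Python: Dictionaries
3. High-Performance Computing in Python: Heaps
4. High-Performance Computing in Python: Tries
5. Complexity of Common Python Operations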

WeChat official account: Quant_Times
