An autocomplete prefix tree (trie) implemented in Python

1. The code

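import sys
import json


class TrieTree:
    """Prefix tree (trie) with autocomplete.

    is_sentence=1: one node per space-separated word of each line.
    is_sentence=0: one node per space-separated token, but the search walks the
    query one character at a time, so word-mode keys should be written with
    their characters separated by spaces (e.g. "c a t#") for searches to match.
    """

    # Reserved keys stored in every node next to its children
    META_KEYS = ("##cnt", "##terminal", "##wordTag")

    def __init__(self, is_debug=1, is_sentence=0):
        self.tree = {}
        self.is_debug = is_debug
        self.is_sentence = is_sentence
        self.prefix_list = []

    def addFromFile(self, filePath):
        with open(filePath, encoding="utf-8") as f:
            for line in f:
                # Line format: "key#completion#completion#..."; the first
                # field is both the key and a stored completion
                line_list = line.strip().strip("#").split("#")
                main_word = line_list[0].strip().split()
                if not self.is_sentence:
                    # Word mode: store completions without the separating spaces
                    sub_word_list = [u.replace(" ", "") for u in line_list]
                else:
                    sub_word_list = line_list

                target_dict = self.tree
                for i, w in enumerate(main_word):
                    if w not in target_dict:
                        # ##cnt: number of keys passing through this node
                        # ##terminal: completions stored at this node
                        # ##wordTag: 1 if a key ends at this node
                        target_dict[w] = {"##cnt": 1, "##terminal": [], "##wordTag": 0}
                    else:
                        target_dict[w]["##cnt"] += 1
                    if i == len(main_word) - 1:
                        target_dict[w]["##terminal"].extend(sub_word_list)
                        target_dict[w]["##wordTag"] = 1
                    target_dict = target_dict[w]  # descend one level
        if self.is_debug:
            # Dump the whole tree for inspection
            with open("./debug.json", "w", encoding="utf-8") as out:
                json.dump(self.tree, out, indent=2, ensure_ascii=False)

    def searchPrefix(self, prefix_string):
        """Return all completions stored in the subtree reached by prefix_string."""
        self.prefix_list = []
        if self.is_sentence:
            # Sentence mode: the prefix must consist of complete words
            tokens = prefix_string.split()
        else:
            # Word mode: walk the prefix one character at a time
            tokens = prefix_string.strip()
        # An empty tree or empty prefix matches nothing (the root has no ##terminal)
        if not self.tree or not tokens:
            return self.prefix_list
        target_dict = self.tree
        for w in tokens:
            if w not in target_dict:
                return self.prefix_list
            target_dict = target_dict[w]

        def deepSearch(node):
            # Collect this node's completions, then recurse into its children
            self.prefix_list.extend(node["##terminal"])
            for k, child in node.items():
                if k not in self.META_KEYS:
                    deepSearch(child)

        deepSearch(target_dict)
        return self.prefix_list


if __name__ == "__main__":
    trie = TrieTree(is_debug=1, is_sentence=1)
    trie.addFromFile(sys.argv[1])
    while True:
        raw = input("Please input:")
        print(trie.searchPrefix(raw))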
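
To make the node layout concrete, here is a minimal self-contained sketch that applies the same per-line insertion logic as addFromFile to in-memory strings instead of a file. The helper name `insert_line` is hypothetical. The reserved keys `##cnt`, `##terminal` and `##wordTag` are the ones the class uses. The word-mode input format (characters separated by spaces) is an assumption inferred from the `replace(" ", "")` call.

```python
import json


def insert_line(tree, line, is_sentence=True):
    """Insert one "key#completion#..." line into a dict-based trie.

    Hypothetical helper mirroring TrieTree.addFromFile for a single line.
    """
    parts = line.strip().strip("#").split("#")
    tokens = parts[0].split()
    # Word mode (assumed input "c a t#") stores completions without spaces
    completions = parts if is_sentence else [p.replace(" ", "") for p in parts]
    node = tree
    for i, tok in enumerate(tokens):
        if tok not in node:
            node[tok] = {"##cnt": 0, "##terminal": [], "##wordTag": 0}
        node[tok]["##cnt"] += 1  # number of keys passing through this node
        if i == len(tokens) - 1:
            node[tok]["##terminal"].extend(completions)
            node[tok]["##wordTag"] = 1  # a key ends at this node
        node = node[tok]
    return tree


tree = {}
for line in ["Are you ready?#", "Are you tired?#", "Are they from abroad?#"]:
    insert_line(tree, line)

print(tree["Are"]["##cnt"])         # 3: all three sentences pass through "Are"
print(tree["Are"]["you"]["##cnt"])  # 2
print(json.dumps(tree["Are"]["you"], indent=2))

# Word mode: one node per character, completion stored without spaces
chars = insert_line({}, "c a t#", is_sentence=False)
print(chars["c"]["a"]["t"]["##terminal"])  # ['cat']
```

Note that `##cnt` counts keys passing through a node, while `##wordTag` marks only the nodes where a key ends. That is why `you` has a count of 2 but a word tag of 0.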

2. Test cases: paste the English sentences below into a file named sent.d.

Hi, my name is Steve.#
It’s nice to meet you.#
It’s a pleasure to meet you. I’m Jack.#
What do you do for a living?#
I work at a restaurant.#
I work at a bank.#
I work in a software company.#
I’m a dentist.#
What is your name?#
What was that again?#
Excuse me.#
Pardon me.#
Are you ready?#
Are you free now?#
Are you Mr. Murthy?#
Are you angry with me?#
Are you afraid of them?#
Are you tired?#
Are you married?#
Are you employed?#
Are you interested in that?#
Are you awake?#
Are you aware of that?#
Are you a relative of Mr. Mohan?#
Are you not well?#
Are they your relatives?#
Are they from abroad?#
Are the shops open?#
Are you satisfied now?#
Are you joking?#

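3. Testing
Save the code above as trieTree.py, then run it in a Linux shell:
python trieTree.py sent.d
You can now query by typing a prefix made of complete words, such as Are or Are you.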
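
The query step can also be checked without an interactive session. The sketch below rebuilds a sentence-mode trie from a few of the sent.d sentences in memory and searches it the same way searchPrefix does. `build` and `search` are hypothetical helper names, and the walk is iterative rather than recursive, so result order may differ from the class. It also shows why the prefix must consist of complete words: "Ar" is not a key in the tree, so it matches nothing.

```python
META_KEYS = ("##cnt", "##terminal", "##wordTag")


def build(sentences):
    """Build a sentence-mode trie: one node per space-separated word."""
    tree = {}
    for s in sentences:
        node = tree
        words = s.split()
        for i, w in enumerate(words):
            child = node.setdefault(w, {"##cnt": 0, "##terminal": [], "##wordTag": 0})
            child["##cnt"] += 1
            if i == len(words) - 1:
                child["##terminal"].append(s)
                child["##wordTag"] = 1
            node = child
    return tree


def search(tree, prefix):
    """Return every sentence stored under the node reached by the prefix words."""
    words = prefix.split()
    if not words:
        return []
    node = tree
    for w in words:
        if w not in node:
            return []  # prefix not in the tree, e.g. a partial word
        node = node[w]
    results = []
    stack = [node]
    while stack:  # iterative depth-first walk over the subtree
        cur = stack.pop()
        results.extend(cur["##terminal"])
        stack.extend(v for k, v in cur.items() if k not in META_KEYS)
    return results


sentences = ["Are you ready?", "Are you tired?", "Are they from abroad?",
             "Are the shops open?", "Excuse me."]
tree = build(sentences)
print(sorted(search(tree, "Are you")))  # ['Are you ready?', 'Are you tired?']
print(len(search(tree, "Are")))         # 4
print(search(tree, "Ar"))               # []
```

Supporting partial last words would need a different lookup, for example scanning the children of the last complete node for keys that start with the typed fragment.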

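Note: you may have spotted a limitation. This algorithm can only search by prefix. With the examples from part 2, typing Are returns only sentences that start with Are, and typing Are you returns only sentences that start with Are you. What if you want every sentence that contains the word shops? This is where a "suffix tree" comes in. Despite the name, it is not really a separate structure here: every suffix of every sentence is simply inserted into a prefix tree. For example, the sentence
Are you a lucky dog?
has the following word-level suffixes: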
Are you a lucky dog?
you a lucky dog?
a lucky dog?
lucky dog?
dog?
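
At word level, the suffixes of a sentence are just the word lists starting at each position. A minimal sketch (the function name `word_suffixes` is hypothetical):

```python
def word_suffixes(sentence):
    """Return every word-level suffix of a sentence, longest first."""
    words = sentence.split()
    return [" ".join(words[i:]) for i in range(len(words))]


for suffix in word_suffixes("Are you a lucky dog?"):
    print(suffix)
```

A sentence of n words has exactly n suffixes, so the suffix trie stores roughly n(n+1)/2 word nodes per sentence before shared prefixes are merged.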
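Once every suffix of every sentence is in the prefix tree, finding all sentences that contain a given word becomes an ordinary prefix search. This works provided the terminal list at the end of each suffix stores the full original sentence rather than the suffix itself.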
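
Here is a minimal sketch of that idea. It assumes sentence mode (one node per word) and uses the same reserved keys as the TrieTree class. Every word-level suffix is inserted, and each suffix's terminal list stores the original sentence, so a search for "shops" returns the sentence itself. Results are de-duplicated, because a word that occurs twice in one sentence produces two matching suffixes. The names `SuffixWordTrie`, `add_sentence` and `search` are hypothetical.

```python
class SuffixWordTrie:
    """Word-level trie holding every suffix of every sentence.

    Hypothetical sketch: the last node of each suffix stores the ORIGINAL
    sentence, so a prefix search over suffixes finds sentences that contain
    the query words anywhere.
    """

    META_KEYS = ("##cnt", "##terminal", "##wordTag")

    def __init__(self):
        self.tree = {}

    def add_sentence(self, sentence):
        words = sentence.split()
        for start in range(len(words)):
            suffix = words[start:]
            node = self.tree
            for i, w in enumerate(suffix):
                child = node.setdefault(w, {"##cnt": 0, "##terminal": [], "##wordTag": 0})
                child["##cnt"] += 1
                if i == len(suffix) - 1:
                    child["##terminal"].append(sentence)  # full sentence, not the suffix
                    child["##wordTag"] = 1
                node = child

    def search(self, query):
        """Return the sentences containing the query word sequence, without duplicates."""
        words = query.split()
        if not words:
            return []
        node = self.tree
        for w in words:
            if w not in node:
                return []
            node = node[w]
        found = []
        stack = [node]
        while stack:  # iterative depth-first walk over the subtree
            cur = stack.pop()
            found.extend(cur["##terminal"])
            stack.extend(v for k, v in cur.items() if k not in self.META_KEYS)
        # A sentence can be reached through several suffixes; keep each once
        return list(dict.fromkeys(found))


trie = SuffixWordTrie()
for s in ["Are the shops open?", "Are you ready?", "Are you free now?",
          "I work at a bank."]:
    trie.add_sentence(s)

print(trie.search("shops"))        # ['Are the shops open?']
print(sorted(trie.search("you")))  # ['Are you free now?', 'Are you ready?']
print(trie.search("a bank."))      # ['I work at a bank.']
```

As with the prefix tree, the query must be made of whole words, punctuation included ("bank." rather than "bank"). Normalizing punctuation before insertion would relax that.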
