展开全部
下面的 Tree class 代码实现了一个前缀62616964757a686964616fe78988e69d8331333337383865树(只实现了插入和遍历操作)。
树的每一个节点都对应一个key, 一个key就是句中的一个词。
遍历函数 traverse 本身是一个generator 函数,它会不断的返回(yield)完整的键值(即组成一句话的所有词组),通过词组就可以恢复出原始的句子。
#!/usr/bin/python2
import collections
class Tree(collections.OrderedDict):
def __init__(self, keys=None):
super(Tree, self).__init__()
self.keys = ()
if keys is not None:
self.insert(keys)
def insert(self, keys):
assert type(keys) is list, "Expect a list of keys"
d = self
for k in keys:
d[k] = d.get(k, Tree())
d = d[k]
d.keys = tuple(keys)
def traverse(self):
if len(self.keys) > 0:
yield self.keys
for subtree in self.values():
for x in subtree.traverse():
yield x
mytree = Tree()
# process source file to generate an internal tree
for line in open("a.txt", "r"):
line = line.decode("utf-8").strip()
mytree.insert(line.split())
# traverse the tree to recover the source file
for ks in mytree.traverse():
print ' '.join(ks)