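Background:
I recently needed to parse some tree-structured debug trace text. To make it easier to read, I wanted to turn each leaf into a dotted path of the form a.b.c.
Parent and child nodes are distinguished by one Tab: each child is indented one Tab further than its parent. A leaf node starts with "ptr" once its leading Tabs are ignored.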
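Both rules can be checked with plain string methods. The following is a minimal sketch. The helper names `tab_depth` and `is_leaf` are hypothetical, and the sketch assumes that indentation uses Tab characters only (no spaces).

```python
LEAF_MARKER = "ptr"  # leaf lines start with "ptr" once the Tabs are removed


def tab_depth(line):
    """Number of leading Tab characters, i.e. the node's level in the tree."""
    return len(line) - len(line.lstrip("\t"))


def is_leaf(line):
    """True if the line starts with the leaf marker after its leading Tabs."""
    return line.lstrip("\t").startswith(LEAF_MARKER)


# Usage on lines taken from the sample trace
print(tab_depth("name: [1]\n"))                      # 0
print(tab_depth("\t\t\tptr -- [1111]--[value0]\n"))  # 3
print(is_leaf("\tname:[11]\n"))                      # False
```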
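Core idea:
First find a leaf node. Then walk backward through the preceding lines to find its parent, which has exactly one Tab fewer than the current node. Repeat from that parent until the top-level node is reached. A "}" at the start of a line marks the end of a tree, so the backward walk never needs to go past one.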
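The backward walk can be sketched over an in-memory list of lines. This is a sketch under assumptions: it assumes the layout of 1.txt (Tabs only, one bracketed name per line), and the names `bracket_value`, `depth` and `leaf_paths` are hypothetical. The test `depth(prev) == want and ... startswith("name")` is equivalent to checking that the line starts with exactly `want` Tabs followed by "name".

```python
def bracket_value(line):
    """Text between the first '[' and the following ']' (assumes both exist)."""
    return line.split("[", 1)[1].split("]", 1)[0]


def depth(line):
    """Number of leading Tabs."""
    return len(line) - len(line.lstrip("\t"))


def leaf_paths(lines):
    """Return 'service[...]' headers and one dotted path per 'ptr' leaf."""
    result = []
    for n, line in enumerate(lines):
        body = line.lstrip("\t")
        if body.startswith("service"):
            result.append(body.rstrip("\n"))
            continue
        if not body.startswith("ptr"):
            continue
        parts = [bracket_value(line)]
        want = depth(line) - 1  # the parent has exactly one Tab fewer
        for prev in reversed(lines[:n]):
            # stop once the top-level ancestor has been found, or at a
            # column-0 "}" that closes an earlier tree
            if want < 0 or prev.startswith("}"):
                break
            if depth(prev) == want and prev.lstrip("\t").startswith("name"):
                parts.append(bracket_value(prev))
                want -= 1
        result.append(".".join(reversed(parts)))
    return result


sample = [
    "service[hi]\n",
    "name: [1]\n",
    "{\n",
    "\tname:[11]\n",
    "\t{\n",
    "\t\tname: [111]\n",
    "\t\t{\n",
    "\t\t\tptr -- [1111]--[value0]\n",
    "\t\t}\n",
    "\t}\n",
    "}\n",
]
print(leaf_paths(sample))  # ['service[hi]', '1.11.111.1111']
```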
To simulate a debug trace, create a text file named 1.txt with the following content (each level is indented with one Tab):
service[hi]
name: [1]
{
	name:[11]
	{
		name: [111]
		{
			ptr -- [1111]--[value0]
			ptr -- [1112]--[value1]
		}
		name: [112]
		{
			name: [1121]
			{
				ptr -- [111211]--[value2]
			}
		}
	}
	name:[12]
	{
		ptr -- [121]--[value3]
	}
	name:[13]
	{
		ptr -- [131]--[value4]
	}
}
service[Jeff]
name: [1]
{
	name:[11]
	{
		name: [111]
		{
			ptr -- [1111]--[value0]
			ptr -- [1112]--[value1]
		}
		name: [112]
		{
			name: [1121]
			{
				ptr -- [111211]--[value2]
			}
		}
	}
	name:[12]
	{
		ptr -- [121]--[value3]
	}
	name:[13]
	{
		ptr -- [131]--[value4]
	}
}
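Tabs are invisible in most editors, so it can help to generate the sample file from a nested structure instead of typing it by hand. The sketch below assumes a hypothetical tuple representation: `(name, children)` for inner nodes and `(key, value)` with a string value for "ptr" leaves. `TREE` and `render` are illustrative names. It always writes "name: [x]" with a space; the parser only looks at the text between the brackets, so this spacing does not matter.

```python
# Hypothetical in-memory form of one tree: (name, children) for inner nodes,
# (key, value) with a str value for "ptr" leaves.
TREE = ("1", [
    ("11", [
        ("111", [("1111", "value0"), ("1112", "value1")]),
        ("112", [("1121", [("111211", "value2")])]),
    ]),
    ("12", [("121", "value3")]),
    ("13", [("131", "value4")]),
])


def render(node, depth=0):
    """Return the trace lines for one node, one Tab per level."""
    name, children = node
    tabs = "\t" * depth
    lines = [tabs + "name: [%s]" % name, tabs + "{"]
    for key, payload in children:
        if isinstance(payload, str):  # a leaf sits one Tab deeper than its parent
            lines.append(tabs + "\tptr -- [%s]--[%s]" % (key, payload))
        else:
            lines.extend(render((key, payload), depth + 1))
    lines.append(tabs + "}")
    return lines


if __name__ == "__main__":
    text = "\n".join(["service[hi]"] + render(TREE)
                     + ["service[Jeff]"] + render(TREE)) + "\n"
    with open("1.txt", "w") as f:
        f.write(text)
```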
The parsing program is as follows:
1. common.py
'''
Created on 2012-5-26
author: Jeff
'''


def getValue(string, key1, key2):
    """
    get the value between key1 and key2 in string,
    e.g. getValue("name: [111]", '[', ']') returns "111"
    """
    index1 = string.find(key1)
    # search for key2 only after key1, so an earlier key2 is ignored
    index2 = string.find(key2, index1 + len(key1))
    value = string[index1 + len(key1):index2]
    return value


def getFiledNum(string, key, begin):
    """
    get the number of occurrences of key in string, starting at position begin
    (main.py calls it with a Tab as key to get the depth of a line)
    """
    keyNum = 0
    start = begin
    while True:
        index = string.find(key, start)
        if index == -1:
            break
        keyNum = keyNum + 1
        start = index + 1
    return keyNum
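`getValue` returns only the text between the first key1 and the key2 that follows it. On a leaf line it therefore yields the key ("1111") and drops the value ("value0"). For a one-character key such as a Tab, `getFiledNum(s, key, begin)` gives the same count as `s.count(key, begin)`. If every bracketed field is needed, a regular expression can extract them in one call. This is a sketch: `bracket_fields` is a hypothetical name, and it assumes a field never contains a "]" itself.

```python
import re

# One "[...]" group; the capture is the text inside the brackets.
BRACKET_FIELD = re.compile(r"\[([^\]]*)\]")


def bracket_fields(line):
    """Return every bracketed field on the line, left to right."""
    return BRACKET_FIELD.findall(line)


print(bracket_fields("\t\t\tptr -- [1111]--[value0]\n"))  # ['1111', 'value0']
print(bracket_fields("\tname:[11]\n"))                    # ['11']
print(bracket_fields("{\n"))                               # []
```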
2. main.py
'''
Created on 2012-5-26
author: Jeff
'''
import linecache

import common

fileName = "1.txt"
fileNameWrite = "result.txt"
leafNode = "ptr"
curLine = 0

f = open(fileName, 'r')
fw = open(fileNameWrite, 'w')
# read the trace line by line
while True:
    data = f.readline()
    if not data:
        break
    curLine = curLine + 1
    # a "service[...]" line starts a new tree: copy it to the output as-is
    if data.startswith("service"):
        line = data.rstrip('\n')
        print(line)
        fw.write(line + '\n')
        continue
    # find a leaf node: after its leading Tabs it starts with "ptr"
    if data.lstrip('\t').startswith(leafNode):
        # value of the leaf node, e.g. "1111"
        value = common.getValue(data, '[', ']')
        string = value
        # the number of Tabs is the depth of the leaf
        tabNum = common.getFiledNum(data, '\t', 0)
        # i: number of the previous line to read
        # j: number of Tabs the next ancestor must have
        i = curLine - 1
        j = tabNum - 1
        # walk backward until the top-level (zero-Tab) ancestor has been
        # found or the beginning of the file is reached
        while j >= 0 and i > 0:
            prefix = '\t' * j + 'name'
            # get previous line (linecache numbers lines from 1)
            preline = linecache.getline(fileName, i)
            # a "}" in column 0 closes the previous tree: stop here
            if preline.startswith("}"):
                break
            if preline.startswith(prefix):
                value = common.getValue(preline, '[', ']')
                string = value + "." + string
                j = j - 1
            i = i - 1
        print(string)
        fw.write(string + '\n')
fw.close()
f.close()
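main.py depends on two properties of `linecache.getline` from the Python standard library. Line numbers start at 1, and an out-of-range line number returns an empty string instead of raising an error. A backward walk that never meets its stop condition would therefore keep reading empty strings forever unless it bounds the line number. The check below runs on a throwaway temporary file whose contents are made up for the demonstration. Also note that linecache keeps a file's lines after the first read, and `linecache.checkcache()` refreshes them if the file may change while the program runs.

```python
import linecache
import os
import tempfile

# Throwaway two-line file for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("service[hi]\nname: [1]\n")

first = linecache.getline(path, 1)      # 'service[hi]\n' (numbering starts at 1)
before = linecache.getline(path, 0)     # '' (out of range, no exception)
negative = linecache.getline(path, -5)  # ''
print(repr(first), repr(before), repr(negative))

linecache.clearcache()  # drop the cached copy of the file
os.remove(path)
```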
Parse result (result.txt):
service[hi]
1.11.111.1111
1.11.111.1112
1.11.112.1121.111211
1.12.121
1.13.131
service[Jeff]
1.11.111.1111
1.11.111.1112
1.11.112.1121.111211
1.12.121
1.13.131
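For comparison, the backward walk in main.py re-reads earlier lines for every leaf. The same kind of paths can also be produced in a single pass by keeping the names of the currently open nodes in a list indexed by Tab depth. The sketch below is one such approach and assumes the same layout as 1.txt. It is not necessarily the optimization described in the author's follow-up post, and `leaf_paths_single_pass` is a hypothetical name.

```python
def leaf_paths_single_pass(lines):
    """One pass: stack[d] holds the name of the open node at Tab depth d."""
    out = []
    stack = []
    for raw in lines:
        line = raw.rstrip("\n")
        body = line.lstrip("\t")
        depth = len(line) - len(body)
        if body.startswith("service"):
            out.append(body)
            stack = []  # a new tree starts
        elif body.startswith("name"):
            del stack[depth:]  # forget finished siblings and their descendants
            stack.append(body.split("[", 1)[1].split("]", 1)[0])
        elif body.startswith("ptr"):
            key = body.split("[", 1)[1].split("]", 1)[0]
            out.append(".".join(stack[:depth] + [key]))
    return out


trace = ("service[hi]\nname: [1]\n{\n\tname:[11]\n\t{\n\t\tname: [111]\n\t\t{\n"
         "\t\t\tptr -- [1111]--[value0]\n\t\t}\n\t\tname: [112]\n\t\t{\n"
         "\t\t\tname: [1121]\n\t\t\t{\n\t\t\t\tptr -- [111211]--[value2]\n"
         "\t\t\t}\n\t\t}\n\t}\n}\n")
for path in leaf_paths_single_pass(trace.splitlines()):
    print(path)
# service[hi]
# 1.11.111.1111
# 1.11.112.1121.111211
```

Each line is visited once, so the work grows linearly with the file size instead of with the distance from every leaf back to its ancestors.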
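Optimizations:
1. Replace the string concatenation with the form all = '%s%s%s%s' % (str0, str1, str2, str3), or with ''.join.
2. Keep the content to be written in a list and write it all at once at the end with f.writelines(list). writelines does not add line endings, so each item must already end with '\n'.
3. For algorithm optimizations, see my blog post 《Python 解析树状结构文件(算法优化)》 ("Parsing tree-structured files with Python (algorithm optimization)").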
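Items 1 and 2 could be applied to the output step roughly as follows. This is a sketch. It collects ancestor values in the order a backward walk finds them (leaf first), builds the path once with `%s` formatting or `'.'.join`, and writes all lines in a single `writelines` call. `io.StringIO` stands in for `open("result.txt", "w")` so the snippet has no side effects. The values and variable names are illustrative.

```python
import io

# Values in the order a backward walk finds them: leaf first, root last.
found = ["1111", "111", "11", "1"]

# Original style: prepend with "+" at every step (a new string each time).
path_plus = found[0]
for value in found[1:]:
    path_plus = value + "." + path_plus

# Item 1: "%s" formatting (fixed number of pieces only) or str.join (any number).
path_fmt = "%s.%s.%s.%s" % tuple(reversed(found))
path_join = ".".join(reversed(found))

# Item 2: keep the output lines in a list and write them in one call.
output = ["service[hi]", path_join]
out_file = io.StringIO()  # stands in for open("result.txt", "w")
out_file.writelines(line + "\n" for line in output)  # writelines adds no "\n"
print(out_file.getvalue())
```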