Parsing Tree-Structured Files with Python

weixin_33877092

Original link: https://my.oschina.net/jeffyu/blog/59775

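Background:

I recently needed to parse some tree-structured debug trace text. To make it easier to read, I wanted to flatten each entry into the form a.b.c.

Parent and child nodes are told apart by indentation: a child is indented one Tab deeper than its parent. A leaf node's line starts with ptr (after the leading Tabs).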
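
The indentation rule above can be illustrated with a small helper. This is only a sketch: the function name classify and its leaf_marker parameter are hypothetical and do not appear in the original program, which counts Tabs with its own helper.

```python
def classify(line, leaf_marker="ptr"):
    """Return (depth, is_leaf) for one trace line.

    depth is the number of leading Tab characters; is_leaf tells whether
    the text after those Tabs starts with the leaf marker.
    """
    body = line.lstrip("\t")          # strips leading Tabs only
    depth = len(line) - len(body)
    return depth, body.startswith(leaf_marker)


print(classify("\t\t\tptr -- [1111]--[value0]\n"))  # (3, True)
print(classify("\tname:[11]\n"))                    # (1, False)
```

Counting only the leading Tabs, rather than every Tab in the line, keeps the depth correct even when a line carries a stray trailing Tab.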

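Core idea:

First locate a leaf node, then walk backward through the preceding lines to find each parent in turn (a parent has one fewer Tab than the current node). When the walk reaches the top-level brace (the code below tests for a line that starts with "{"), the tree is finished.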
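
The backward walk can be sketched on an in-memory list of lines. This is a simplified variation under stated assumptions, not the program from this post: leaf_path, depth and first_bracket are hypothetical names, and the walk here stops once the depth-0 name has been consumed instead of testing for a brace.

```python
import re


def leaf_path(lines, leaf_index):
    """Build the dotted path for the leaf at lines[leaf_index] by walking backward."""

    def depth(line):
        # number of leading Tab characters
        return len(line) - len(line.lstrip("\t"))

    def first_bracket(line):
        # text inside the first [...] on the line, or "" if there is none
        match = re.search(r"\[([^\]]*)\]", line)
        return match.group(1) if match else ""

    parts = [first_bracket(lines[leaf_index])]
    wanted = depth(lines[leaf_index]) - 1   # depth of the parent being searched for
    i = leaf_index - 1
    while i >= 0 and wanted >= 0:
        line = lines[i]
        if depth(line) == wanted and line.lstrip("\t").startswith("name"):
            parts.append(first_bracket(line))
            wanted -= 1                      # next, look one level further up
        i -= 1
    return ".".join(reversed(parts))


sample = [
    "name: [1]",
    "{",
    "\tname:[11]",
    "\t{",
    "\t\tname: [111]",
    "\t\t{",
    "\t\t\tptr -- [1111]--[value0]",
    "\t\t}",
    "\t\tname: [112]",
    "\t\t{",
    "\t\t\tptr -- [1121]--[value1]",
    "\t\t}",
    "\t}",
    "}",
]
print(leaf_path(sample, 6))   # 1.11.111.1111
print(leaf_path(sample, 10))  # 1.11.112.1121
```

Because each parent must have exactly one Tab fewer than the node below it, the sibling subtree under 111 is skipped when the path for the leaf under 112 is built.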

To simulate a debug trace, create a text file named 1.txt.

Its contents are as follows:

service[hi]
name: [1]
{
	name:[11]
	{	
		name: [111]
		{
			ptr -- [1111]--[value0]
			ptr -- [1112]--[value1]
		}
		name: [112]
		{
			name: [1121]
			{
				ptr -- [111211]--[value2]
			}

		}
	}
	name:[12]
	{
		ptr -- [121]--[value3]
	}
	name:[13]
	{
		ptr -- [131]--[value4]
	}
}
service[Jeff]
name: [1]
{
	name:[11]
	{	
		name: [111]
		{
			ptr -- [1111]--[value0]
			ptr -- [1112]--[value1]
		}
		name: [112]
		{
			name: [1121]
			{
				ptr -- [111211]--[value2]
			}

		}
	}
	name:[12]
	{
		ptr -- [121]--[value3]
	}
	name:[13]
	{
		ptr -- [131]--[value4]
	}
}

The parsing program is as follows:

1. common.py

'''
Created on 2012-5-26

author: Jeff
'''

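def getValue(string,key1,key2):
    """
    get the value between the first key1 and the first key2 after it in string
    """
    index1 = string.find(key1)
    # search for key2 only after key1, and skip the full length of key1
    index2 = string.find(key2, index1 + len(key1))

    value = string[index1 + len(key1):index2]
    return value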

def getFiledNum(string,key,begin):
    """
    get the number of key in string from begin position
    """
    keyNum = 0
    start = begin

    while True:
        index = string.find(key, start)
        if index == -1:
            break
    
        keyNum = keyNum + 1
        start = index + 1

    return keyNum
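
A leaf line carries two bracketed fields (an id and a value), while getValue returns only the first. As a hedged alternative, the sketch below pulls every bracketed field from a line with the standard re module; bracket_fields is a hypothetical name and is not part of the original common.py.

```python
import re


def bracket_fields(line):
    """Return the text of every [...] group on the line, in order."""
    return re.findall(r"\[([^\]]*)\]", line)


print(bracket_fields("\t\t\tptr -- [1111]--[value0]"))  # ['1111', 'value0']
print(bracket_fields("\tname:[11]"))                    # ['11']
```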

2. main.py

'''
Created on 2012-5-26

author: Jeff
'''

import common
import linecache       

fileName = "1.txt"
fileNameWrite = "result.txt"
leafNode = "ptr"
curLine = 0
nextLine = 0

f = open(fileName,'r')
fw = open(fileNameWrite,'w')

# read line
while True:
    data = f.readline()
    
    if not data:
        break
    
    curLine = curLine + 1
    
    # find the leafNode
    
    if data.startswith("service"):
        # service header: copy it through without the trailing newline
        header = data.rstrip('\n')
        print(header)
        fw.write(header + '\n')
        continue
    

    if data.find(leafNode) != -1:
        nextLine = curLine + 1
        #print "data is %s, current line is %d, next line is %d." %(data,curLine,nextLine)
        
        # value of leaf node
        value = common.getValue(data, '[', ']')
        string = value
        #print "value of leaf node is %s" % value
        
        # get the number of tab
        tabNum = common.getFiledNum(data, '\t', 0)
        #print( "Tab number is  %d" % tabNum )
        
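        # i: number of the previous line to inspect
        # j: number of Tabs in the prefix of the parent being searched for
        i = curLine - 1
        j = tabNum - 1

        # stop at the first line of the file so the loop cannot run forever
        while i >= 1:

            prefix = '\t' * j + 'name'

            # get previous line
            preline = linecache.getline(fileName, i)

            if preline.startswith("{"):
                # top-level brace: the root name is on the line just above it
                value = common.getValue(linecache.getline(fileName, i - 1), '[', ']')
                string = value + "." + string
                break

            if preline.startswith(prefix):
                # parent found: prepend its value and look one level up
                value = common.getValue(preline, '[', ']')
                string = value + "." + string
                j = j - 1

            i = i - 1

        print(string)
        fw.write(string + '\n')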

fw.close()
f.close()

Parsed output, result.txt:

service[hi]
1.11.111.1111
1.11.111.1112
1.11.112.1121.111211
1.12.121
1.13.131
service[Jeff]
1.11.111.1111
1.11.111.1112
1.11.112.1121.111211
1.12.121
1.13.131

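Optimizations:

1. Replace the string concatenation with the form all = '%s%s%s%s' % (str0, str1, str2, str3) or with ''.join.

2. Collect the content to be written in a list, then write it all at once at the end with f.writelines(list).

3. For the algorithm optimization, see my blog post "Parsing Tree-Structured Files with Python (Algorithm Optimization)".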
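
The three points above can be combined in one sketch. It is one possible single-pass approach and not necessarily the algorithm from the referenced post: parse_trace and write_result are hypothetical names, and the code assumes that every node is indented exactly one Tab deeper than its parent. It keeps a stack of the currently open names, builds each path with '.'.join, and collects the output in a list for a single writelines call.

```python
def parse_trace(lines):
    """Return the output lines for a trace, reading the input only once."""
    out = []     # everything to be written, collected first
    stack = []   # stack[d] is the name of the open node at depth d
    for raw in lines:
        line = raw.rstrip("\r\n")
        body = line.lstrip("\t")
        depth = len(line) - len(body)        # number of leading Tabs
        if body.startswith("service"):
            out.append(body + "\n")
            stack = []                       # a new service starts a new tree
        elif body.startswith("name"):
            del stack[depth:]                # drop siblings and deeper nodes
            stack.append(body[body.find("[") + 1:body.find("]")])
        elif body.startswith("ptr"):
            leaf = body[body.find("[") + 1:body.find("]")]
            out.append(".".join(stack[:depth] + [leaf]) + "\n")
    return out


def write_result(path, out_lines):
    """Write all collected lines with a single writelines call."""
    with open(path, "w") as fw:
        fw.writelines(out_lines)


sample = [
    "service[hi]\n",
    "name: [1]\n",
    "{\n",
    "\tname:[11]\n",
    "\t{\n",
    "\t\tptr -- [111]--[value0]\n",
    "\t}\n",
    "\tname:[12]\n",
    "\t{\n",
    "\t\tptr -- [121]--[value3]\n",
    "\t}\n",
    "}\n",
]
print(parse_trace(sample))  # ['service[hi]\n', '1.11.111\n', '1.12.121\n']
```

Since the stack always holds the ancestors of the current line, no line has to be read again, so linecache is not needed in this variant. Calling write_result("result.txt", parse_trace(f)) on an open input file f would produce the output file in one write.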
