
parse json with python



Parse a log file whose content is JSON.
Python 2.6 and later ship a json module in the standard library (it is derived from simplejson). On Python 2.5 you need to install simplejson manually.

First, import the module:

import json
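
For a script that has to run on interpreters both with and without the standard-library module, a common pattern is to fall back to simplejson when the import fails. This is only a sketch, and it assumes simplejson has been installed wherever json is missing:

```python
# Prefer the standard-library module; fall back to the third-party
# simplejson package on interpreters that lack it (for example Python 2.5).
try:
    import json
except ImportError:
    import simplejson as json

print(json.loads('{"ok": true}'))
```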

1, read the file: 

f = open('json.log', 'r')
data = f.read()

2, load as json

If the file has non-JSON text before the JSON (a log prefix, for example), trim everything before the first '{':

jdata = json.loads(data[data.find('{') : ])
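
One thing to watch with this approach: str.find returns -1 when there is no '{' at all, and slicing from -1 silently keeps only the last character. A minimal sketch with an explicit check, written for Python 3 and using a made-up log line as input:

```python
import json

# Hypothetical log content: a non-JSON prefix followed by one JSON object.
data = 'INFO job finished {"mapper": [{"id": 1}], "Reducer": {"id": 2}}'

start = data.find('{')
if start == -1:
    # Without this check, data[-1:] would be passed to json.loads.
    raise ValueError('no JSON object found in input')

jdata = json.loads(data[start:])
print(jdata["mapper"])
```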

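If the whole file is a single JSON structure, either of the following loads it directly. Use one or the other on a freshly opened file, since each of them reads the file to the end:

jdata = json.load(f)
# or, equivalently:
jdata = json.JSONDecoder().decode(f.read())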
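
The difference between reading from a file object and decoding a string can be tried without a real file. The sketch below is written for Python 3 and uses io.StringIO as a stand-in for an open file; the JSON text is a made-up example:

```python
import io
import json

text = '{"mapper": [], "Reducer": {}}'  # hypothetical file content

# Option 1: let json.load read from a file-like object.
with io.StringIO(text) as f:
    from_load = json.load(f)

# Option 2: read the text yourself and decode the string.
with io.StringIO(text) as f:
    from_loads = json.loads(f.read())

# A second read() on the same handle returns an empty string, which is
# why the two options should not be chained on one open file.
f = io.StringIO(text)
f.read()
leftover = f.read()
```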

3, parse the json 

The log contains a sub-JSON under the key "mapper", so get it:

mapper = jdata["mapper"]

There may also be a sub-JSON under the key "Reducer"; get it if it is present:

reducer = {}
if "Reducer" in jdata : 
    reducer = jdata["Reducer"]
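
The same "use it if present" logic can also be written with dict.get, which returns a default instead of raising KeyError. A short sketch for Python 3; the input here is a made-up document without a "Reducer" key:

```python
import json

# Hypothetical input that has no "Reducer" key.
jdata = json.loads('{"mapper": [{"id": 1}]}')

# dict.get returns the second argument when the key is missing.
reducer = jdata.get("Reducer", {})
print(reducer)
```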

There are other JSON objects at the top level as well; iterate over them:

for k, v in jdata.iteritems() : 
    print k

The mapper is a list whose elements are themselves JSON objects:

for elem in mapper : 
    for k, v in elem.iteritems() : 
        print k, v 
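
When the nesting depth is not known in advance, the two loops can be generalized into a recursive walk. This is only an illustrative sketch for Python 3: the helper name walk, the path notation and the sample data are all made up here and are not part of the log format.

```python
import json

def walk(node, path=""):
    """Yield (path, value) pairs for every leaf in a decoded JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            for item in walk(value, path + "/" + key):
                yield item
    elif isinstance(node, list):
        for index, value in enumerate(node):
            for item in walk(value, "%s[%d]" % (path, index)):
                yield item
    else:
        # Strings, numbers, booleans and None are leaves.
        yield path, node

# Hypothetical data shaped like the "mapper" list described above.
jdata = json.loads('{"mapper": [{"id": 1, "host": "a"}, {"id": 2, "host": "b"}]}')
leaves = sorted(walk(jdata))
for path, value in leaves:
    print(path, value)
```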

Practice 1

A Cangjie is a structure type defined in our project. A Cangjie definition file contains many JSON structures, one after another, with // comments mixed in.

The file looks like this:

{
 "namespace" : "project::component",
 "dependence" : ["../include/log_cj.cj", "base_cj.cj"]
}

[[ // enum 
"LogType",
["CREATE",     0],     
["DELETE",         1]           
]]

[[ // enum 
"DetectionType",
["32BIT_PER_CHUNK",        0],     
["32BIT_PER_BLOCK_CRC32C", 1],
["32BIT_PER_BLOCK_CRC32C_V2", 2]
]]

["LogCJ",
    {
      "base": "project::component::Message"
    },
    ["required", "uint32", "Type", 100],
    ["required", "IdTypeCJ", "Id", 101],
    ["required", "vuint64", "Version", 102],
    ["optional", "uint32", "Checksum", 103],
    ["optional", "uint64", "Length", 104, 0],
    ["optional", "uint32", "DiskId", 105, 0],
    ["optional", "uint32", "DetectionType", 106, 0],
    ["optional", "string", "Checksum", 107, ""],
    ["optional", "uint64", "ChecksumLength", 108, 0],
    ["optional", "uint64", "Timestamps", 109, 0],
    ["optional", "int32",  "Status", 110, 0],
    ["optional", "bool",  "IsDeleted", 111, false] // some comments
]


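I need a tool to parse it. The key is how to split the individual JSON values out of the file.

Once comments and whitespace are removed, the strings '}{', '}[', '][' and ']{' can only occur at the boundary between two JSON values (provided they do not appear inside a string value). So use a regular expression to find those boundaries and decode each piece, including the last one after the final separator:

    import json
    import re

    buf = ''
    fd = open(cangjie_file)
    for line in fd.readlines() :
        pos = line.find('//') # find() returns -1 when the line has no comment
        if pos != -1 :
            line = line[:pos]
        line = line.strip().replace(' ', '') # note: also removes spaces inside strings
        if len(line) == 0 :
            continue
        buf += line
    fd.close()
    pattern = re.compile(r"\]\[|\}\[|\]\{|\}\{") #set separator
    beg = 0
    for match in pattern.finditer(buf):
        #print match.group(), match.start(), match.end(), match.span()
        print json.loads(buf[beg : match.start() + 1]) # it's the json we want
        beg = match.end() - 1
    print json.loads(buf[beg : ]) # the last json, which no separator follows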
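
An alternative to splitting on a separator pattern is to let the decoder itself report where each value ends. The sketch below, written for Python 3, uses json.JSONDecoder.raw_decode, which decodes one value starting at a given index and returns the value together with the index where it stopped. The helper names iter_json_values and strip_line_comments are made up for illustration, the sample is a shortened version of the file above, and the comment stripping assumes '//' never appears inside a string value.

```python
import json
import re

_WHITESPACE = re.compile(r"\s*")

def strip_line_comments(text):
    """Remove // comments; assumes '//' never occurs inside a string."""
    return "\n".join(line.split("//", 1)[0] for line in text.splitlines())

def iter_json_values(text):
    """Yield each top-level JSON value found in text, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        pos = _WHITESPACE.match(text, pos).end()
        if pos >= len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value

sample = '''
{ "namespace" : "project::component" }

[[ // enum
"LogType",
["CREATE", 0],
["DELETE", 1]
]]

["LogCJ",
    ["optional", "bool", "IsDeleted", 111, false] // some comments
]
'''

values = list(iter_json_values(strip_line_comments(sample)))
for value in values:
    print(value)
```

Because the text is never stripped of spaces, string values keep their original content, and the last value in the file needs no special handling.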



TO BE CONTINUED




