parse json with python


parse a log file which is a json
python 2.6.* already has a json module, simplejson. python 2.5.* need install it manually. 

of course, 

import json

1, read the file: 

f = open('json.log', 'r')
data = f.read()

2, load as json

trim the other string at the beginning of file

jdata = json.loads(data[data.find('{') : ])

if the whole file is a json structure, would also load a json as following:

jdata = json.load(f) 
jdata = json.JSONDecoder().decode(f.read())

3, parse the json 

a sub josn named Mapper is included in it, so get it 

mapper = jdata["mapper"]
there may be a sub json, Reducer, in it, try to get it 

reducer = {}
if "Reducer" in jdata : 
reducer = jdata["Reducer"]

there are some other json object in it, iterate them 

for k, v in jdata.iteritems() : 
print k
the mapper, which is a list, have several elements which are json object also 

for elem in mapper : 
for k, v in elem.iteritems() : 
print k, v 

Practice 1

a Cangjie is a structure type  defined in our project. a Cangjie difination file includes many json structures. 

the file like this:

{
 "namespace" : "project::component",
 "dependence" : ["../include/log_cj.cj", "base_cj.cj"]
}

[[ // enum 
"LogType",
["CREATE",     0],     
["DELETE",         1]           
]]

[[ // enum 
"DetectionType",
["32BIT_PER_CHUNK",        0],     
["32BIT_PER_BLOCK_CRC32C", 1],
["32BIT_PER_BLOCK_CRC32C_V2", 2]
]]

["LogCJ",
    {
      "base": "project::component::Message"
    },
    ["required", "uint32", "Type", 100],
    ["required", "IdTypeCJ", "Id", 101],
    ["required", "vuint64", "Version", 102],
    ["optional", "uint32", "Checksum", 103],
    ["optional", "uint64", "Length", 104, 0],
    ["optional", "uint32", "DiskId", 105, 0],
    ["optional", "uint32", "DetectionType", 106, 0],
    ["optional", "string", "Checksum", 107, ""],
    ["optional", "uint64", "ChecksumLength", 108, 0],
    ["optional", "uint64", "Timestamps", 109, 0],
    ["optional", "int32",  "Status", 110, 0],
    ["optional", "bool",  "IsDeleted", 111, false] // some comments
]


i need a tool to parse it . the key is how to split jsons out. 

string  '}{' or  '}['  or '][' is the separator between jsons. so, use a regular expression and code as follow

    buf = ''
    fd = open(cangjie_file)
    for line in fd.readlines() :
        line = line[:line.find('//')].strip().replace(' ', '')
        if len(line) == 0 :
            continue
        buf += line
    fd.close()
    pattern = re.compile(r"\]\[|\}\[|\]\{") #set separator
    beg = 0
    for match in pattern.finditer(buf):
        #print match.group(),match.start(), match.end(), m.span()
        print json.loads(buf[beg : match.start() + 1]) # it's the json we want
        beg = match.end() - 1



TO BE CONTINUED





  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值