Parse JSON with Python


The task: parse a log file whose content is JSON.
Python 2.6 and later ship a json module in the standard library (it is derived from simplejson). On Python 2.5 you need to install simplejson manually.

First, of course, import the module:

import json
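On Python 2.5, where json is not in the standard library, a common idiom is to fall back to simplejson, which offers the same loads()/dumps() interface. A minimal sketch, assuming simplejson has been installed wherever json is missing:

```python
# Prefer the standard-library module; fall back to the simplejson package
# on interpreters that lack it (e.g. Python 2.5 with simplejson installed).
try:
    import json
except ImportError:
    import simplejson as json

# Both modules expose the same loads()/dumps() functions.
print(json.loads('{"ok": true}'))
```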

1. Read the file:

f = open('json.log', 'r')
data = f.read()
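The snippet above never closes the file. A with block closes it automatically, even if an exception is raised. A sketch; the helper name read_text is hypothetical, and a temporary file stands in for json.log so the example runs anywhere:

```python
import os
import tempfile

def read_text(path):
    """Return the whole content of a text file; the file is closed on exit."""
    with open(path, 'r') as f:
        return f.read()

# Demo: write a small stand-in for json.log, then read it back.
fd, path = tempfile.mkstemp(suffix='.log')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('{"Mapper": []}')
data = read_text(path)
os.remove(path)
print(data)  # {"Mapper": []}
```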

2. Load it as JSON

The log has some other text before the JSON, so cut off everything before the first '{':

jdata = json.loads(data[data.find('{') : ])
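Slicing from the first '{' still fails if anything follows the JSON. json.JSONDecoder.raw_decode parses one JSON value starting at a given index and returns it together with the index where it ended, so trailing text is simply ignored. A sketch; the helper name load_embedded_json and the sample log line are made up for illustration:

```python
import json

def load_embedded_json(text):
    """Parse the first JSON object in text, ignoring text before and after it.

    Hypothetical helper: assumes the object starts at the first '{'.
    """
    start = text.find('{')
    if start == -1:
        raise ValueError('no JSON object found')
    obj, _ = json.JSONDecoder().raw_decode(text, start)  # _ is the end index
    return obj

line = 'INFO 12:00:01 result: {"Mapper": [{"id": 1}]} (done)'
print(load_embedded_json(line))  # {'Mapper': [{'id': 1}]}
```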

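If the whole file is a single JSON structure, there is nothing to trim, and either of these works (json.load needs a file that has not already been read to the end, so open it again):

with open('json.log') as f:
    jdata = json.load(f)    # parse straight from a freshly opened file
jdata = json.JSONDecoder().decode(data)    # or decode the string read in step 1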
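The two forms are equivalent: json.load(f) reads a file object and decodes it, while JSONDecoder().decode(s) decodes a string. The sketch below uses io.StringIO as a stand-in for the opened log file (the content is made up) and also shows why the file must not have been read already:

```python
import io
import json

text = '{"Mapper": [{"k": "v"}], "Reducer": {}}'

f = io.StringIO(text)             # stands in for open('json.log')
from_file = json.load(f)          # reads to EOF and parses
from_string = json.JSONDecoder().decode(text)
assert from_file == from_string

# The file object is now at EOF, so reading it again yields an empty string,
# which json cannot parse.
try:
    json.load(f)
except ValueError as exc:         # json.JSONDecodeError subclasses ValueError
    print('second load failed:', exc)
```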

3. Parse the JSON

The log contains a sub-object named Mapper, so get it:

mapper = jdata["Mapper"]
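Key lookup is exact and case-sensitive, so "mapper" and "Mapper" are different keys, and a missing key raises KeyError. A small sketch with a made-up log structure:

```python
import json

jdata = json.loads('{"Mapper": [{"input": "a.txt"}]}')  # hypothetical log content

mapper = jdata["Mapper"]          # exact key: works
try:
    jdata["mapper"]               # wrong case: not the same key
except KeyError as exc:
    print('missing key:', exc)
```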
It may also contain a sub-object named Reducer; get it only if it is present:

reducer = {}
if "Reducer" in jdata:
    reducer = jdata["Reducer"]
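The same optional lookup fits on one line with dict.get, which returns a default when the key is absent. A sketch with made-up data:

```python
import json

with_reducer = json.loads('{"Mapper": [], "Reducer": {"out": "r.txt"}}')
without_reducer = json.loads('{"Mapper": []}')

# dict.get(key, default) returns the default instead of raising KeyError.
print(with_reducer.get("Reducer", {}))     # {'out': 'r.txt'}
print(without_reducer.get("Reducer", {}))  # {}
```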

The top level holds other JSON objects as well; iterate over them:

for k, v in jdata.items():
    print(k)
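JSON objects decode to Python dicts and arrays to lists, while strings, numbers, true/false and null become str, int/float, bool and None. When iterating the top level, isinstance can tell nested structures apart from plain values. A sketch with made-up data:

```python
import json

jdata = json.loads('{"Mapper": [{"a": 1}], "Reducer": {"b": 2}, "JobId": "job_1"}')

nested = {}
scalars = {}
for k, v in jdata.items():
    if isinstance(v, (dict, list)):
        nested[k] = v      # a sub-object or array
    else:
        scalars[k] = v     # a string, number, bool or None
print(sorted(nested), sorted(scalars))  # ['Mapper', 'Reducer'] ['JobId']
```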
Mapper is a list, and each of its elements is a JSON object too:

for elem in mapper:
    for k, v in elem.items():
        print(k, v)
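If the elements of Mapper share the same keys, a common next step is to pull one field out of every element. A sketch; the field names task and ms are hypothetical:

```python
import json

jdata = json.loads('{"Mapper": [{"task": "m_0", "ms": 120}, {"task": "m_1", "ms": 95}]}')

mapper = jdata["Mapper"]
# elem.get() avoids a KeyError if some element lacks the field.
durations = [elem.get("ms", 0) for elem in mapper]
print(durations)       # [120, 95]
print(sum(durations))  # 215
```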

Practice 1

A Cangjie is a structure type defined in our project. A Cangjie definition file contains many JSON structures, one after another.

The file looks like this:

{
 "namespace" : "project::component",
 "dependence" : ["../include/log_cj.cj", "base_cj.cj"]
}

[[ // enum 
"LogType",
["CREATE",     0],     
["DELETE",         1]           
]]

[[ // enum 
"DetectionType",
["32BIT_PER_CHUNK",        0],     
["32BIT_PER_BLOCK_CRC32C", 1],
["32BIT_PER_BLOCK_CRC32C_V2", 2]
]]

["LogCJ",
    {
      "base": "project::component::Message"
    },
    ["required", "uint32", "Type", 100],
    ["required", "IdTypeCJ", "Id", 101],
    ["required", "vuint64", "Version", 102],
    ["optional", "uint32", "Checksum", 103],
    ["optional", "uint64", "Length", 104, 0],
    ["optional", "uint32", "DiskId", 105, 0],
    ["optional", "uint32", "DetectionType", 106, 0],
    ["optional", "string", "Checksum", 107, ""],
    ["optional", "uint64", "ChecksumLength", 108, 0],
    ["optional", "uint64", "Timestamps", 109, 0],
    ["optional", "int32",  "Status", 110, 0],
    ["optional", "bool",  "IsDeleted", 111, false] // some comments
]
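The // comments in this sample are not legal JSON, so they must be removed before parsing. Cutting each line at the first // also breaks any string that contains //, such as a URL. A string-aware stripper sketch, assuming comments only use the // form and strings use double quotes with backslash escapes; strip_line_comment is a hypothetical helper name:

```python
def strip_line_comment(line):
    """Remove a trailing // comment, ignoring // inside double-quoted strings."""
    in_string = False
    escaped = False
    for i, ch in enumerate(line):
        if in_string:
            if escaped:
                escaped = False       # this character was escaped
            elif ch == '\\':
                escaped = True        # next character is escaped
            elif ch == '"':
                in_string = False     # closing quote
        elif ch == '"':
            in_string = True          # opening quote
        elif ch == '/' and line.startswith('//', i):
            return line[:i]           # comment starts here
    return line

print(strip_line_comment('["optional", "bool", "IsDeleted", 111, false] // some comments'))
print(strip_line_comment('"url": "http://example.com" // home page'))
```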


I need a tool to parse it. The key problem is how to split the individual JSON structures apart.

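Once comments and line breaks are removed, two adjacent structures meet at '}{', '}[', '][' or ']{'. None of these can occur inside a single well-formed JSON value (outside a string literal), because nested elements are always separated by commas. So a regular expression can find the boundaries: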

    import json
    import re

    buf = ''
    with open(cangjie_file) as fd:
        for line in fd:
            # drop '//' comments (naive: a '//' inside a string is cut too)
            idx = line.find('//')
            if idx != -1:
                line = line[:idx]
            line = line.strip()
            if line:
                buf += line
    # a closing ']' or '}' directly followed by an opening '[' or '{'
    # marks the boundary between two structures
    pattern = re.compile(r"[\]}]\s*[\[{]")
    beg = 0
    for match in pattern.finditer(buf):
        print(json.loads(buf[beg : match.start() + 1]))  # one complete structure
        beg = match.end() - 1
    print(json.loads(buf[beg:]))  # the last structure has no separator after it
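The separator regex can be fooled by a string that happens to contain '][' or '}{'. An alternative that avoids separators altogether is to call json.JSONDecoder.raw_decode repeatedly: each call parses one complete value and reports where it ended, and the next value starts after any whitespace. A sketch, assuming comments have already been removed; iter_json_values is a hypothetical helper name:

```python
import json
import re

_WS = re.compile(r'\s*')

def iter_json_values(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = _WS.match(text, 0).end()          # skip leading whitespace
    while pos < len(text):
        value, pos = decoder.raw_decode(text, pos)
        pos = _WS.match(text, pos).end()    # skip whitespace between values
        yield value

buf = ('{"namespace": "project::component"}\n'
       '[["LogType", ["CREATE", 0], ["DELETE", 1]]]\n'
       '["LogCJ", {"base": "a][b"}]')
for value in iter_json_values(buf):
    print(value)
```

The string "a][b" in the last structure would make the regex version split in the wrong place; raw_decode tracks string literals itself, so it is not affected.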



TO BE CONTINUED