Parse a log file that contains JSON
Python 2.6 and later ship with a json module, which is based on simplejson. On Python 2.5 you need to install simplejson manually.
First, import the module:
import json
1. Read the file:
f = open('json.log', 'r')
data = f.read()
2. Load it as JSON
If the file has other text before the JSON starts, skip everything before the first '{':
jdata = json.loads(data[data.find('{'):])
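The trimming step can be wrapped in a small helper that also reports the case where no '{' is present at all. This is only a sketch: the function name load_json_after_prefix and the sample log line are made up for illustration, and it assumes the JSON object runs to the end of the text.

```python
import json

def load_json_after_prefix(text):
    """Parse the JSON object that starts at the first '{' in text."""
    start = text.find('{')
    if start == -1:
        # find() returns -1 when there is no '{'; fail loudly instead of slicing
        raise ValueError("no JSON object found")
    return json.loads(text[start:])

# Hypothetical log content: a non-JSON prefix followed by one JSON object.
sample = '2012-01-01 12:00:00 INFO job summary: {"Mapper": [{"id": 1}], "Total": 3}'
jdata = load_json_after_prefix(sample)
print(jdata["Total"])
```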
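If the whole file is a single JSON structure, you can also load it in either of the following ways. Both read from f, so use only one of them, on a file object that has not been read yet:
jdata = json.load(f)
jdata = json.JSONDecoder().decode(f.read())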
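To see that these calls give the same result, and why the read position matters, here is a small sketch. It uses io.StringIO as an in-memory stand-in for the open file, and the JSON content is made up.

```python
import io
import json

# In-memory stand-in for an open file whose whole content is one JSON document.
f = io.StringIO(u'{"Mapper": [], "Reducer": {}}')

from_file = json.load(f)            # parses directly from the file object

f.seek(0)                           # rewind: the previous call consumed the stream
from_text = json.loads(f.read())    # read into a string first, then parse

f.seek(0)
from_decoder = json.JSONDecoder().decode(f.read())

print(from_file == from_text == from_decoder)
```

Without the seek(0) calls, the second and third reads would return an empty string and the parse would fail.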
3. Parse the JSON
It contains a sub-JSON named Mapper, so get it:
mapper = jdata["Mapper"]
It may also contain a sub-JSON named Reducer; try to get it:
reducer = {}
if "Reducer" in jdata:
    reducer = jdata["Reducer"]
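The same optional lookup can be written in one line with dict.get, which returns the given default when the key is absent. A minimal sketch with made-up data that has no "Reducer" key:

```python
import json

jdata = json.loads('{"Mapper": []}')   # hypothetical data without a "Reducer" key

# dict.get returns the second argument when the key is missing.
reducer = jdata.get("Reducer", {})
print(reducer)
```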
There are other JSON objects in it as well; iterate over them:
for k, v in jdata.iteritems():
    print k
The mapper, which is a list, has several elements that are JSON objects too:
for elem in mapper:
    for k, v in elem.iteritems():
        print k, v
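Putting the traversal together on a small example may help. The key names and values below are invented, since the real log content is not shown here. The sketch uses items() rather than iteritems() so that it also runs on Python 3, where iteritems() no longer exists.

```python
import json

# Hypothetical log content; the real key names and values may differ.
data = '{"Mapper": [{"id": 0, "status": "ok"}, {"id": 1, "status": "failed"}], "Total": 2}'
jdata = json.loads(data)

# Top-level keys, sorted so the order is predictable.
top_level_keys = sorted(jdata.keys())
print(top_level_keys)

# Collect (index, key, value) triples for every element of the list.
rows = []
for i, elem in enumerate(jdata["Mapper"]):
    for k, v in sorted(elem.items()):
        rows.append((i, k, v))

for row in rows:
    print(row)
```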
Practice 1
A Cangjie is a structure type defined in our project. A Cangjie definition file contains many JSON structures.
The file looks like this:
{
"namespace" : "project::component",
"dependence" : ["../include/log_cj.cj", "base_cj.cj"]
}
[[ // enum
"LogType",
["CREATE", 0],
["DELETE", 1]
]]
[[ // enum
"DetectionType",
["32BIT_PER_CHUNK", 0],
["32BIT_PER_BLOCK_CRC32C", 1],
["32BIT_PER_BLOCK_CRC32C_V2", 2]
]]
["LogCJ",
{
"base": "project::component::Message"
},
["required", "uint32", "Type", 100],
["required", "IdTypeCJ", "Id", 101],
["required", "vuint64", "Version", 102],
["optional", "uint32", "Checksum", 103],
["optional", "uint64", "Length", 104, 0],
["optional", "uint32", "DiskId", 105, 0],
["optional", "uint32", "DetectionType", 106, 0],
["optional", "string", "Checksum", 107, ""],
["optional", "uint64", "ChecksumLength", 108, 0],
["optional", "uint64", "Timestamps", 109, 0],
["optional", "int32", "Status", 110, 0],
["optional", "bool", "IsDeleted", 111, false] // some comments
]
I need a tool to parse it. The key is how to split the JSON structures apart.
Once comments and whitespace are removed, one of the strings '][', '}[', ']{' or '}{' marks the boundary between two structures, so use a regular expression to find them, as follows:
import json
import re

buf = ''
fd = open(cangjie_file)
for line in fd.readlines():
    pos = line.find('//')
    if pos != -1:    # strip the comment; find() returns -1 when there is none
        line = line[:pos]
    buf += line.strip().replace(' ', '')
fd.close()
pattern = re.compile(r"\]\[|\}\[|\]\{|\}\{")  # separators between structures
beg = 0
for match in pattern.finditer(buf):
    print json.loads(buf[beg : match.start() + 1])  # one complete structure
    beg = match.end() - 1
print json.loads(buf[beg:])  # the last structure comes after the final separator
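As an alternative to the regular expression, the standard json.JSONDecoder.raw_decode method can pull structures off the text one at a time: it parses one JSON value starting at a given index and returns the value together with the index where it ended. The sketch below is a different technique from the one above, not a drop-in copy of it. The helper names strip_comments and split_json_documents are made up, the sample is a shortened version of the file shown earlier, and it assumes '//' never appears inside a string.

```python
import json

def strip_comments(text):
    """Drop '//' comments line by line (assumes '//' never occurs inside a string)."""
    lines = []
    for line in text.splitlines():
        pos = line.find('//')
        if pos != -1:
            line = line[:pos]
        lines.append(line)
    return '\n'.join(lines)

def split_json_documents(text):
    """Return every JSON value that appears back to back in text."""
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        docs.append(obj)
    return docs

# Shortened version of the definition file shown earlier.
sample = '''
{
"namespace" : "project::component",
"dependence" : ["../include/log_cj.cj", "base_cj.cj"]
}
[[ // enum
"LogType",
["CREATE", 0],
["DELETE", 1]
]]
["LogCJ",
{"base": "project::component::Message"},
["optional", "bool", "IsDeleted", 111, false] // some comments
]
'''

docs = split_json_documents(strip_comments(sample))
print(len(docs))
```

Because the decoder itself decides where each structure ends, this version does not need to remove spaces, so spaces inside string values are preserved.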