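Python data processing: parsing a three-level nested dictionary
(Disclaimer: this tutorial is for my own study only; I take no responsibility if anyone uses this technique to break the law.)
(If you spot any mistakes, please point them out so we can improve together.)
Data and data format
** Input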
ipt = {
    "0": {
        "43": {"112": 65808588, "113": 74291673},
        "60": {"112": 32307294, "113": 25144857},
        "61": {"112": 1884118, "113": 56347}},
    "1": {
        "60": {"113": 68571461, "109": 77974500},
        "61": {"113": 26031692, "109": 20688676},
        "62": {"113": 5396847, "109": 1336824}},
    "2": {
        "61": {"109": 87649000, "95": 80176340},
        "62": {"109": 11292684, "95": 19387880},
        "110": {"109": 1058316, "95": 435780}}
}
** Output
opt = {
    'bb_0': {
        'common': [43, 60, 61],
        'probability': {
            '112': [0.65808588, 0.32307294, 0.01884118],
            '113': [0.74291673, 0.25144857, 0.00056347]
        }
    },
    'bb_1': {
        'common': [60, 61, 62],
        'probability': {
            '113': [0.68571461, 0.26031692, 0.05396847],
            '109': [0.779745, 0.20688676, 0.01336824]
        }
    },
    'bb_2': {
        'common': [61, 62, 110],
        'probability': {
            '109': [0.87649, 0.11292684, 0.01058316],
            '95': [0.8017634, 0.1938788, 0.0043578]
        }
    }
}
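Parsing requirements
Neither the outer keys nor the keys nested inside the values are known in advance. All of them have to be parsed out and used as keys in the new structure.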
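Because none of the keys are known up front, it can help to first list which keys actually occur at each nesting level. The following is only a minimal sketch: `keys_by_level` is a hypothetical helper name, and `sample` is a cut-down copy of the input above.

```python
def keys_by_level(cfg):
    """Return the distinct keys seen at levels 1, 2 and 3, in first-seen order."""
    outer, middle, inner = [], [], []
    for wk, wv in cfg.items():
        if wk not in outer:
            outer.append(wk)
        for mk, mv in wv.items():
            if mk not in middle:
                middle.append(mk)
            for ik in mv:
                if ik not in inner:
                    inner.append(ik)
    return outer, middle, inner


# Cut-down copy of the input, for illustration only.
sample = {
    "0": {"43": {"112": 65808588, "113": 74291673},
          "60": {"112": 32307294, "113": 25144857}},
    "1": {"60": {"113": 68571461, "109": 77974500}},
}
levels = keys_by_level(sample)
print(levels)
```

Nothing here depends on the concrete key names, which is the property all three methods below need as well.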
Several ways to parse it
Method 1: three nested for loops
The simplest solution, and also the most direct and effective one.
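#!/usr/bin/python
# -*- coding:utf-8 -*-
from __future__ import division  # keep "/" as true division on Python 2 as well


def conversion(_cfg):
    result = dict()
    id_set = set()  # every second- and third-level key, stored as int
    for wk, wv in _cfg.items():  # level 1: "0", "1", "2"
        common = list()
        probability = dict()
        for mk, mv in wv.items():  # level 2: e.g. "43", "60", "61"
            id_set.add(int(mk))
            common.append(int(mk))
            for ik, iv in mv.items():  # level 3: e.g. "112", "113"
                id_set.add(int(ik))
                pb = iv / (100 * 1000 * 1000)  # scale the raw integer down
                try:
                    probability[ik].append(pb)
                except KeyError:
                    probability[ik] = [pb, ]
        result["bb_" + wk] = {"common": common, "probability": probability}
    return result, id_set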
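The `try` / `except KeyError` step above is one way to say "append to the list, creating it first if needed". As a possible variation, and not the author's code, the same loops can be sketched with `collections.defaultdict`. The names `conversion_defaultdict` and `SCALE` are made up for this example.

```python
from collections import defaultdict

SCALE = 100 * 1000 * 1000  # same divisor as in the code above


def conversion_defaultdict(cfg):
    """Same three loops as method 1, with defaultdict creating the lists."""
    result = {}
    id_set = set()
    for wk, wv in cfg.items():
        common = []
        probability = defaultdict(list)  # a missing key starts as an empty list
        for mk, mv in wv.items():
            id_set.add(int(mk))
            common.append(int(mk))
            for ik, iv in mv.items():
                id_set.add(int(ik))
                probability[ik].append(iv / SCALE)
        # Convert back to a plain dict so the result matches the target format.
        result["bb_" + wk] = {"common": common, "probability": dict(probability)}
    return result, id_set


# First block of the input, for illustration only.
sample = {"0": {"43": {"112": 65808588, "113": 74291673},
                "60": {"112": 32307294, "113": 25144857},
                "61": {"112": 1884118, "113": 56347}}}
opt, ids = conversion_defaultdict(sample)
print(opt)
print(sorted(ids))
```

Note that the function returns two things: the converted dictionary and the set of integer ids. The `opt` shown in the Output section corresponds to the first of them.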
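Method 2: recursive traversal
I am a bit obsessive about code, and three nested for loops did not look great to me, so I tried writing a recursive version.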
#!/usr/bin/python
# -*- coding:utf-8 -*-
from __future__ import division  # keep "/" as true division on Python 2 as well


def conversion(_cfg, in_dict=None, state=1, id_set=None):
    # "state" records which nesting level this call is handling (1, 2 or 3).
    if state == 1:
        id_set = set()
        result = dict()
        for wk, wv in _cfg.items():
            _result = dict()
            if isinstance(wv, dict):
                _result, id_set = conversion(wv, state=2, id_set=id_set)
            result["bb_" + wk] = _result
    elif state == 2:
        common = list()
        in_dict = dict()  # shared by all level-3 calls under this level-2 dict
        for mk, mv in _cfg.items():
            common.append(int(mk))
            id_set.add(int(mk))
            if isinstance(mv, dict):
                in_dict, id_set = conversion(mv, in_dict, state=3, id_set=id_set)
        result = {"common": common, "probability": in_dict}
    else:
        # Level 3: append each scaled value to the list for its key.
        for ik, iv in _cfg.items():
            id_set.add(int(ik))
            pb = iv / (100 * 1000 * 1000)
            try:
                in_dict[ik].append(pb)
            except KeyError:
                in_dict[ik] = [pb, ]
        result = in_dict
    return result, id_set
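The recursive version above still needs a `state` flag to know which level it is on. For comparison, here is a small sketch of a recursion that does not care about the level at all: it just yields every leaf together with the path of keys that leads to it. The generator name `walk` is hypothetical, and this is a different technique from the code above, shown only to illustrate the idea.

```python
def walk(node, path=()):
    """Yield (path, leaf) pairs for every non-dict value in a nested dict."""
    for key, value in node.items():
        if isinstance(value, dict):
            # Descend one level, remembering the key we came through.
            yield from walk(value, path + (key,))
        else:
            yield path + (key,), value


# Cut-down copy of the input, for illustration only.
sample = {"0": {"43": {"112": 65808588}, "60": {"112": 32307294}}}
leaves = list(walk(sample))
for path, leaf in leaves:
    print(path, leaf)
```

The recursion depth here equals the nesting depth of the data, so for a three-level dictionary there are never more than three `walk` calls active at once.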
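Method 3: tightening the for loops with list comprehensions
I was worried that recursion might put pressure on memory, so I decided to go back and optimize the three-loop approach instead.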
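#!/usr/bin/python
# -*- coding:utf-8 -*-
from __future__ import division  # keep "/" as true division on Python 2 as well


def conversion(ipt):
    res = dict()
    id_set = set()
    for key, value in ipt.items():
        keys = list(value.keys())  # level-2 keys
        # Level-3 keys are taken from the first entry only.
        big_ks = list(value.values())[0].keys()
        r_collection = {k: [] for k in big_ks}
        # Nested list comprehension, used only for its append side effect.
        [[r_collection[k].append(i[k] / (100 * 1000 * 1000)) for k in r_collection] for i in [value[sk] for sk in keys]]
        res['bb_{}'.format(key)] = {
            'common': [int(k) for k in keys],
            'probability': r_collection
        }
        # Store ids as int so the set matches methods 1 and 2.
        id_set.update(int(k) for k in keys + list(r_collection.keys()))
    return res, id_set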
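The nested list comprehension above is evaluated only for its `append` side effect, and the list it builds is thrown away. One possible alternative, sketched here under the same assumption that every inner dict has the same keys as the first one, is a dict comprehension that builds the lists directly. `conversion_comprehension` and `SCALE` are names invented for this example.

```python
SCALE = 100 * 1000 * 1000  # same divisor as in the code above


def conversion_comprehension(ipt):
    """Build each probability list with a comprehension, without side effects."""
    res = {}
    id_set = set()
    for key, value in ipt.items():
        mids = list(value)  # level-2 keys
        # Assumption: every inner dict has the same keys as the first one.
        inner_keys = list(value[mids[0]]) if mids else []
        probability = {k: [value[m][k] / SCALE for m in mids] for k in inner_keys}
        res["bb_{}".format(key)] = {
            "common": [int(m) for m in mids],
            "probability": probability,
        }
        id_set.update(int(k) for k in mids + inner_keys)
    return res, id_set


# Last block of the input, for illustration only.
sample = {"2": {"61": {"109": 87649000, "95": 80176340},
                "62": {"109": 11292684, "95": 19387880},
                "110": {"109": 1058316, "95": 435780}}}
opt, ids = conversion_comprehension(sample)
print(opt)
print(sorted(ids))
```

If an inner dict is missing one of the expected keys, this sketch raises `KeyError` instead of silently producing misaligned lists.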
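Comparison
Method 1 | Method 2 | Method 3 |
---|---|---|
Simple and direct; hard to get wrong | Avoids the look of three nested for loops by using recursion; in practice it is still three levels of for loops, with some added memory overhead from the extra call frames | The list comprehension reduces the line count; the underlying logic is not optimized; it adds a redundant for loop; if the inner dicts gain keys, the values may no longer line up; unlike a generator, it does not reduce memory use |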
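The mismatch risk listed for method 3 can be made concrete with a small experiment. The sketch below uses two hypothetical helpers, `collect_from_first` and `collect_all`, that imitate only the key-handling of methods 3 and 1 respectively. The extra key `"114"` is invented for the demonstration.

```python
SCALE = 100 * 1000 * 1000  # same divisor as in the methods above


def collect_from_first(value):
    """Imitate method 3: take the inner keys from the first entry only."""
    inner_keys = list(next(iter(value.values())))
    return {k: [entry[k] / SCALE for entry in value.values()] for k in inner_keys}


def collect_all(value):
    """Imitate method 1: accept whatever keys each entry carries."""
    out = {}
    for entry in value.values():
        for k, v in entry.items():
            out.setdefault(k, []).append(v / SCALE)
    return out


# Hypothetical input in which the second entry gained an extra key "114".
uneven = {
    "43": {"112": 65808588, "113": 74291673},
    "60": {"112": 32307294, "113": 25144857, "114": 1000000},
}
first_only = collect_from_first(uneven)  # "114" is silently dropped
everything = collect_all(uneven)         # "114" is kept
print(first_only)
print(everything)
```

Neither behaviour is fully safe on uneven data: the first drops `"114"` without warning, while the second keeps it but the list for `"114"` has one element and no longer lines up with the two entries in `common`.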
Takeaways
My pesky obsession with tidy code is what got me thinking about all this 😊😊😊~~~