Parsing JSON that contains duplicate keys with Python's json.loads

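Python's built-in json package makes it easy to parse JSON text, but when the text contains duplicate keys the parsed result silently loses data. Take the following example: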

Python
In [5]: d = """ {"key":"1", "key":"2", "key":"3", "key2":"4"}"""

In [6]: d
Out[6]: ' {"key":"1", "key":"2", "key":"3", "key2":"4"}'

In [7]: json.loads(d)
Out[7]: {'key': '3', 'key2': '4'}

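The reason is that the parser builds a dict for each JSON object. It first stores the value of key, but every time the same key shows up again the later value overwrites the earlier one, so only the last value of key is left at the end.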
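The overwriting can be reproduced without any JSON at all. The sketch below assumes the default decoder behaves like calling dict() on the key/value pairs in document order; the variable names are made up for illustration.

```python
import json

# The key/value pairs of the sample object, in document order
pairs = [("key", "1"), ("key", "2"), ("key", "3"), ("key2", "4")]

# dict() keeps one slot per key, so each later value replaces the earlier one
as_dict = dict(pairs)
print(as_dict)  # {'key': '3', 'key2': '4'}

# json.loads with no hook ends up with the same mapping
text = '{"key": "1", "key": "2", "key": "3", "key2": "4"}'
print(json.loads(text) == as_dict)  # True
```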

This is usually not the result we want. One alternative is to collect the values of a repeated key into an array, as shown here.

Python
{
    "key": ["1", "2", "3"],
    "key2": "4"
}

How do we get this result? Python's json package does leave a way out. First, look at the signature of the parsing function loads.

Python
json.loads(s, encoding=None, cls=None,
           object_hook=None, parse_float=None,
           parse_int=None, parse_constant=None,
           object_pairs_hook=None, **kw)

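The parameter to note is object_pairs_hook. It is a callback: each time the decoder finishes parsing a JSON object, it calls this function with that object's key/value pairs as an ordered list and uses the return value in place of the default dict. To get the result described above, we define the following hook function: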

Python
def my_obj_pairs_hook(lst):
    """Merge the values of duplicate keys into a list."""
    result = {}
    count = {}  # how many times each key has been seen so far
    for key, val in lst:
        count[key] = count.get(key, 0) + 1
        if key not in result:
            # first occurrence: store the value as-is
            result[key] = val
        elif count[key] == 2:
            # second occurrence: wrap both values in a new list
            result[key] = [result[key], val]
        else:
            # third or later occurrence: extend the existing list
            result[key].append(val)
    return result

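When parsing the text, pass the function above as the object_pairs_hook argument. The listing shows the call, followed by the IPython help output for json.loads: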

Python
json.loads(data, object_pairs_hook=my_obj_pairs_hook)

Signature: json.loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
Docstring:
Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
containing a JSON document) to a Python object.

``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).

``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs.  The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict will remember the order of insertion). If
``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.

``parse_float``, if specified, will be called with the string
of every JSON float to be decoded. By default this is equivalent to
float(num_str). This can be used to use another datatype or parser
for JSON floats (e.g. decimal.Decimal).

``parse_int``, if specified, will be called with the string
of every JSON int to be decoded. By default this is equivalent to
int(num_str). This can be used to use another datatype or parser
for JSON integers (e.g. float).

``parse_constant``, if specified, will be called with one of the
following strings: -Infinity, Infinity, NaN.
This can be used to raise an exception if invalid JSON numbers
are encountered.

To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.

The ``encoding`` argument is ignored and deprecated.
File:      /usr/local/anaconda3/lib/python3.6/json/__init__.py
Type:      function

This gives the result described earlier, with the values of identical keys merged into an array.
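For a copy-and-run check, here is a minimal end-to-end sketch. It is not the hook defined above but a variant that groups every value into a list first and then unwraps the keys that occurred only once; the name merge_duplicate_keys and the variable names are assumptions for illustration.

```python
import json

def merge_duplicate_keys(pairs):
    """Hypothetical variant: group values per key, then unwrap single values."""
    grouped = {}
    for key, value in pairs:
        # Every key gets a list; duplicates simply append to it
        grouped.setdefault(key, []).append(value)
    # Keys that occurred once keep their plain value instead of a 1-item list
    return {k: (v[0] if len(v) == 1 else v) for k, v in grouped.items()}

data = '{"key": "1", "key": "2", "key": "3", "key2": "4"}'
result = json.loads(data, object_pairs_hook=merge_duplicate_keys)
print(result)  # {'key': ['1', '2', '3'], 'key2': '4'}
```

With either version, a key whose values were merged is indistinguishable from a key whose single original value was an array, so callers need to know which keys may repeat.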
For the sample text above, the argument passed to my_obj_pairs_hook is a list of tuples, roughly as follows:

Python
[("key", "1"), ("key", "2"), ("key", "3"), ("key2", "4")]

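The argument has this shape because these key/value pairs together make up one object. By default Python builds the result with dict, so values naturally overwrite each other. Once my_obj_pairs_hook is supplied, that function is called to build the result instead, which guarantees that no value is lost and gives us the result we want. For a more complex JSON text, the function is called once for every object that is parsed, each time with a different list of tuples similar to the one shown in the example.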
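The per-object calls can be observed with a hook that records its arguments. This is only a sketch: recording_hook, calls, and the nested sample text are made up for illustration, and the hook returns a plain dict, so duplicates are still overwritten here.

```python
import json

calls = []  # one entry per hook invocation, in call order

def recording_hook(pairs):
    # Remember what the decoder passed in, then behave like the default dict
    calls.append(list(pairs))
    return dict(pairs)

nested = '{"outer": {"a": 1, "a": 2}, "items": [{"b": 3}]}'
parsed = json.loads(nested, object_pairs_hook=recording_hook)

for pairs in calls:
    print(pairs)
# [('a', 1), ('a', 2)]
# [('b', 3)]
# [('outer', {'a': 2}), ('items', [{'b': 3}])]
```

Inner objects are finished first, so by the time the hook sees the outer object, its nested values have already been replaced by whatever the hook returned for them.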




  • Author contact: zeropython (WeChat official account), 5868037 (QQ), 5868037@qq.com (QQ email)