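Python's built-in json package makes it easy to parse JSON text, but when the text contains duplicate keys the parsed result silently drops data. Take the following session as an example: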
    In [5]: d = """ {"key":"1", "key":"2", "key":"3", "key2":"4"}"""

    In [6]: d
    Out[6]: ' {"key":"1", "key":"2", "key":"3", "key2":"4"}'

    In [7]: json.loads(d)
    Out[7]: {'key': '3', 'key2': '4'}
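The reason is that the parser builds a dict for each JSON object. It stores the first value it reads for key, but every time it meets the same key again, the later value overwrites the earlier one, so only the last value of key survives.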
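The overwriting can be reproduced without the json package at all. The following is only a simplified model of the default behavior, not the decoder's actual code: it assumes the object arrives as a sequence of key-value pairs (the name `pairs` is made up for this illustration) and inserts them into a dict one by one.

```python
import json

text = '{"key":"1", "key":"2", "key":"3", "key2":"4"}'

# Simplified model: the pairs of the object, in the order they appear.
pairs = [("key", "1"), ("key", "2"), ("key", "3"), ("key2", "4")]

result = {}
for key, value in pairs:
    # Assigning to an existing key replaces the value stored earlier.
    result[key] = value

print(result)            # {'key': '3', 'key2': '4'}
print(json.loads(text))  # {'key': '3', 'key2': '4'}
```

Both lines print the same dict, which matches the session shown earlier: the last value for a repeated key wins.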
That is usually not the result we want. One alternative is to collect the values of a repeated key into an array, as shown below.
    {
        "key": ["1", "2", "3"],
        "key2": "4"
    }
How do we get this result? Python's json package does leave a way out. First, look at the signature of the parsing function loads.
json.loads(s, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
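The parameter to note is object_pairs_hook. It is a callback: each time the decoder finishes parsing a JSON object, it calls the hook with that object's key-value pairs and uses the hook's return value in place of the dict. To get the result described above, we define the following hook function: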
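    def my_obj_pairs_hook(lst):
        result = {}
        count = {}  # number of times each key has been seen so far
        for key, val in lst:
            if key in count:
                count[key] = 1 + count[key]
            else:
                count[key] = 1
            if key in result:
                if count[key] > 2:
                    # third or later occurrence: the stored value is already a list
                    result[key].append(val)
                else:
                    # second occurrence: wrap the first value and this one in a list
                    result[key] = [result[key], val]
            else:
                result[key] = val
        return result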
When parsing the text, pass the function above as an argument, as in the following code:
    json.loads(data, object_pairs_hook=my_obj_pairs_hook)

    Signature: json.loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
    Docstring:
    Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
    containing a JSON document) to a Python object.

    ``object_hook`` is an optional function that will be called with the
    result of any object literal decode (a ``dict``). The return value of
    ``object_hook`` will be used instead of the ``dict``. This feature
    can be used to implement custom decoders (e.g. JSON-RPC class hinting).

    ``object_pairs_hook`` is an optional function that will be called with the
    result of any object literal decoded with an ordered list of pairs. The
    return value of ``object_pairs_hook`` will be used instead of the ``dict``.
    This feature can be used to implement custom decoders that rely on the
    order that the key and value pairs are decoded (for example,
    collections.OrderedDict will remember the order of insertion). If
    ``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.

    ``parse_float``, if specified, will be called with the string
    of every JSON float to be decoded. By default this is equivalent to
    float(num_str). This can be used to use another datatype or parser
    for JSON floats (e.g. decimal.Decimal).

    ``parse_int``, if specified, will be called with the string
    of every JSON int to be decoded. By default this is equivalent to
    int(num_str). This can be used to use another datatype or parser
    for JSON integers (e.g. float).

    ``parse_constant``, if specified, will be called with one of the
    following strings: -Infinity, Infinity, NaN.
    This can be used to raise an exception if invalid JSON numbers
    are encountered.

    To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
    kwarg; otherwise ``JSONDecoder`` is used.

    The ``encoding`` argument is ignored and deprecated.
    File:      /usr/local/anaconda3/lib/python3.6/json/__init__.py
    Type:      function
This produces the result described earlier, with the values of identical keys merged into an array.
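For readers who want something to run end to end, here is a minimal sketch of the same idea. It is a variant rather than the hook defined in this post: the name `merge_duplicate_keys` is made up for the illustration, and it groups values with `collections.defaultdict` instead of keeping a separate counter.

```python
import json
from collections import defaultdict


def merge_duplicate_keys(pairs):
    """Hypothetical variant of the hook: group values by key, then unwrap
    keys that occurred only once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # A key seen once keeps its plain value; a repeated key keeps the list.
    return {key: values[0] if len(values) == 1 else values
            for key, values in grouped.items()}


data = '{"key":"1", "key":"2", "key":"3", "key2":"4"}'
result = json.loads(data, object_pairs_hook=merge_duplicate_keys)
print(result)  # {'key': ['1', '2', '3'], 'key2': '4'}
```

Deciding by the number of occurrences, rather than by checking whether the stored value is already a list, keeps the hook correct when a value in the JSON text is itself an array.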
In this example, the argument passed to my_obj_pairs_hook is a list of tuples, roughly as follows:
[("key","1"),("key","2"),("key","3"),("key2","4")]
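The argument has this shape because these key-value pairs together make up one JSON object. By default Python builds the result with dict, so later values naturally overwrite earlier ones. When my_obj_pairs_hook is supplied, the decoder calls it instead to build the result, so no value is lost and we end up with the outcome we wanted. For more complex JSON text, the function is called once for every object that gets parsed, each time with a different list of tuples of the same general shape as the example.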
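To see the per-object calls on nested input, the sketch below records every argument the hook receives. The hook name `recording_hook`, the `calls` list, and the sample text are all made up for this illustration; the hook returns `dict(pairs)`, so it deliberately keeps the default last-value-wins behavior and only observes the calls.

```python
import json

calls = []  # one entry per hook call, in the order the calls happen


def recording_hook(pairs):
    # Hypothetical hook: remember the pairs, then build a plain dict.
    calls.append(list(pairs))
    return dict(pairs)


text = '{"outer": {"a": 1, "a": 2}, "b": 3}'
parsed = json.loads(text, object_pairs_hook=recording_hook)

for call in calls:
    print(call)
# [('a', 1), ('a', 2)]
# [('outer', {'a': 2}), ('b', 3)]
```

The inner object is finished first, so its pairs arrive first, and the outer call already sees the value returned by the hook for the inner object.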