I am trying to normalize complex nested JSON in Python, but I am unable to parse all the objects out.
    sample_object = {'Name':'John', 'Location':{'City':'Los Angeles','State':'CA'}, 'hobbies':['Music', 'Running']}

    def flatten_json(y):
        out = {}

        def flatten(x, name=''):
            if type(x) is dict:
                for a in x:
                    flatten(x[a], name + a + '_')
            elif type(x) is list:
                for a in x:
                    flatten(a, name)
            else:
                out[name[:-1]] = x

        flatten(y)
        return out

    flat = flatten_json(sample_object)
    print(json_normalize(flat))
Return Result:
Name | Location_City | Location_State | Hobbies
-----+---------------+----------------+--------
John | Los Angeles | CA | Running
Expected Result:
Name | Location_City | Location_State | Hobbies
-----+---------------+----------------+--------
John | Los Angeles | CA | Running
John | Los Angeles | CA | Music
Solution
The problem you are having originates in the following section:

    elif type(x) is list:
        for a in x:
            flatten(a, name)
Because you do not change the name for each element of the list, every subsequent element overwrites the value assigned by the previous one, so only the last element shows up in the output.
Applied to this example: when the flattening function reaches the list 'hobbies', it first stores the element 'Music' under the name 'hobbies'. The next element in the list, 'Running', is stored under the same name 'hobbies'. Because that name already exists in the output, the value 'Music' is overwritten with the value 'Running'.
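The overwrite can be reproduced in isolation. This is only a minimal sketch of the effect, not the flattening function itself; the variable names are chosen for illustration:

```python
out = {}
for hobby in ['Music', 'Running']:
    # The key is the same on every iteration, so each assignment
    # replaces the value stored by the previous one.
    out['hobbies'] = hobby

print(out)  # {'hobbies': 'Running'}
```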
To prevent this, the script from the link you referenced uses the following piece of code to append the array's index to the name, thus creating a unique name for every element of the array:

    elif type(x) is list:
        i = 0
        for a in x:
            flatten(a, name + str(i) + '_')
            i += 1
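To see what the indexed names look like, here is a self-contained sketch of the flattening function with the index appended, run on the sample object from the question. It uses enumerate instead of a manual counter, which is only a stylistic choice and not taken from the referenced script:

```python
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            # Append the element's index so every element gets a unique name.
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            # Strip the trailing underscore from the accumulated name.
            out[name[:-1]] = x

    flatten(y)
    return out


sample_object = {'Name': 'John',
                 'Location': {'City': 'Los Angeles', 'State': 'CA'},
                 'hobbies': ['Music', 'Running']}

flat = flatten_json(sample_object)
print(flat)
# {'Name': 'John', 'Location_City': 'Los Angeles', 'Location_State': 'CA',
#  'hobbies_0': 'Music', 'hobbies_1': 'Running'}
```

Both hobbies survive, but as the two columns hobbies_0 and hobbies_1 of a single row.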
However, this creates extra columns in the data rather than new rows. If new rows are what you want, you would have to change the way the function is set up. One way would be to adapt the function to return a list of flat dictionaries (one for each list element in the original JSON).
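One possible way to get a row per list element is sketched below. The function name flatten_to_rows and the approach (combining the rows of each key with itertools.product) are my own assumptions, not part of the referenced script, and the sketch has not been tuned for large or deeply nested input:

```python
from itertools import product


def flatten_to_rows(x, name=''):
    """Return a list of flat dicts, one per combination of list elements."""
    if isinstance(x, dict):
        # Flatten each value on its own, then combine one row from every key.
        per_key = [flatten_to_rows(v, name + k + '_') for k, v in x.items()]
        rows = []
        for combo in product(*per_key):
            row = {}
            for part in combo:
                row.update(part)
            rows.append(row)
        return rows
    if isinstance(x, list):
        # Every element keeps the same name and contributes its own rows.
        return [row for item in x for row in flatten_to_rows(item, name)]
    # Scalar value: a single row with one column (trailing underscore removed).
    return [{name[:-1]: x}]


sample_object = {'Name': 'John',
                 'Location': {'City': 'Los Angeles', 'State': 'CA'},
                 'hobbies': ['Music', 'Running']}

rows = flatten_to_rows(sample_object)
for row in rows:
    print(row)
# {'Name': 'John', 'Location_City': 'Los Angeles', 'Location_State': 'CA', 'hobbies': 'Music'}
# {'Name': 'John', 'Location_City': 'Los Angeles', 'Location_State': 'CA', 'hobbies': 'Running'}
```

The resulting list of dictionaries can be passed to pandas.DataFrame to get one row per hobby. Two caveats of this sketch: an object with several lists produces every combination of their elements, and an empty list removes the object from the result entirely, so both cases may need special handling.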
An extra note: I would recommend being a bit more careful when copying code into a question. The indentation is a bit off in this case, and since you left out the part where you import json_normalize, it might not be completely clear to everyone that you are importing it from pandas:

    from pandas.io.json import json_normalize

In pandas 1.0 and later, the same function is also available directly as pandas.json_normalize.