I'm fairly new to programming and Python. I know there are a lot of explanations of this in previous questions, but I read all of them carefully and didn't find a solution.
I'm trying to read a JSON file which contains about 1 billion records like this:
334465|{"color":"33ef","age":"55","gender":"m"}
334477|{"color":"3444","age":"56","gender":"f"}
334477|{"color":"3999","age":"70","gender":"m"}
I tried hard to get past the 6-digit number at the beginning of each line, but I don't know how to read multiple JSON objects.
Here is my code, but I can't figure out why it isn't working:
import json
T = []
s = open('simple.json', 'r')
ss = s.read()
for line in ss:
    line = ss[7:]
    T.append(json.loads(line))
s.close()
And here is the error that I got:
ValueError: Extra Data: line 3 column 1 - line 5 column 48 (char 42 - 138)
Any suggestion would be very helpful for me!
Solution
There are several problems with the logic of your code.
ss = s.read()
reads the entire file s into a single string. The next line
for line in ss:
iterates over each character in that string, one by one, so on each iteration line is a single character. In
line = ss[7:]
you are getting the entire file contents apart from the first 7 characters (in positions 0 through 6, inclusive) and replacing the previous content of line with that. And then
T.append(json.loads(line))
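attempts to parse that string as JSON and append the resulting object to the T list. Because the string still holds every remaining line of the file, the parser finds more text after the first object and raises the "Extra data" error.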
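To see these two effects in isolation, here is a small sketch that uses an in-memory string in place of the file. The `text` variable is a stand-in for the result of `s.read()` and is not part of the original code.

```python
import json

# Stand-in for the result of s.read(): the whole file as one string.
text = (
    '334465|{"color":"33ef","age":"55","gender":"m"}\n'
    '334477|{"color":"3444","age":"56","gender":"f"}\n'
)

# Iterating over a string yields single characters, not lines.
first_items = [ch for ch in text][:3]
print(first_items)

# Slicing the whole string only removes the first prefix; the second
# line is still there, so json.loads finds text after the first object.
try:
    json.loads(text[7:])
    error_message = None
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    error_message = str(exc)
print(error_message)
```

The first print shows three one-character strings, and the second shows an "Extra data" message like the one in the question.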
Here's some code that does what you want. We don't need to read the entire file into a string with .read, or into a list of lines with .readlines; we can simply put the file handle into a for loop, and that will iterate over the file line by line. Slicing each line with [7:] drops the 6-digit number and the | separator, leaving only the JSON text.
We use a with statement to open the file, so that it will get closed automatically when we exit the with block, or if there's an IO error.
import json
table = []
with open('simple.json', 'r') as f:
    for line in f:
        table.append(json.loads(line[7:]))
for row in table:
    print(row)
output
{'color': '33ef', 'age': '55', 'gender': 'm'}
{'color': '3444', 'age': '56', 'gender': 'f'}
{'color': '3999', 'age': '70', 'gender': 'm'}
We can make this more compact by building the table list in a list comprehension:
import json
with open('simple.json', 'r') as f:
    table = [json.loads(line[7:]) for line in f]
for row in table:
    print(row)
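If the numeric prefix is not always exactly 6 digits, or if you want to keep it, one possible variation is to split each line at the first | instead of slicing at a fixed position. The sketch below is only an illustration: `iter_records` and `sample` are hypothetical names, and it assumes every non-blank line has the form id|{json}. It is written as a generator, so records are produced one at a time rather than collected in a list, which may matter for a file with around 1 billion lines.

```python
import json

def iter_records(lines):
    """Yield (record_id, record) pairs from lines shaped like 'id|{json}'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition splits at the first '|' only, so any '|' inside
        # the JSON text is left untouched.
        record_id, sep, payload = line.partition('|')
        if not sep:
            raise ValueError(f"no '|' separator in line: {line!r}")
        yield record_id, json.loads(payload)

# Demo on an in-memory sample (hypothetical data copied from the question).
sample = [
    '334465|{"color":"33ef","age":"55","gender":"m"}\n',
    '334477|{"color":"3444","age":"56","gender":"f"}\n',
]
for record_id, record in iter_records(sample):
    print(record_id, record['color'])
```

With a real file, you would pass the open file handle instead of the list, for example by calling iter_records(f) inside the with block.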