First, how do you skip the header? That's easy:

    next(infile)  # skip the first line
    for line in infile:
        ...  # loop body unchanged

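To check that the header really is skipped, here is a small sketch that uses an in-memory file, so nothing on disk is needed. The sample data and the `read_body` name are made up for illustration. Passing `None` as a default to `next` is my own addition, so that an empty file doesn't raise `StopIteration`.

```python
import io

def read_body(infile):
    """Return every line after the first (header) line, without newlines."""
    # Skip the header; the None default avoids StopIteration on an empty file.
    next(infile, None)
    return [line.rstrip('\n') for line in infile]

# Made-up sample standing in for names.csv (the real layout is an assumption).
sample = io.StringIO("id,team_name,team_members\n1,Red,Ann,Bob\n2,Blue,Cy\n")
body = read_body(sample)
print(body)
```

The same `next(infile, None)` call works unchanged on a file opened with `open()`.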
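However, you may want to consider using `csv.DictReader` for your input instead. It reads the header line, uses the information there to create a dict for each row, and splits the rows for you (as well as handling cases you may not have thought of, such as the quoted or escaped text that can appear in CSV files).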
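As a rough sketch of the `csv.DictReader` approach: the rows here seem to carry a variable number of team members, so this version passes the two fixed column names explicitly and lets `restkey` collect everything else into a list. That also means the header has to be skipped by hand. The sample data, the column layout, and the `read_rows` name are assumptions for illustration, not taken from your file.

```python
import csv
import io
import json

# Made-up sample; note the quoted member name containing a comma.
sample = io.StringIO(
    'id,team_name,team_members\n'
    '1,Red,Ann,"Bob, Jr."\n'
    '2,Blue,Cy\n'
)

def read_rows(infile):
    """Yield one dict per CSV row, with extra columns gathered as team_members."""
    # fieldnames are given explicitly, so DictReader will not consume the header.
    next(infile, None)
    reader = csv.DictReader(
        infile, fieldnames=['id', 'team_name'], restkey='team_members')
    for row in reader:
        # Rows with no extra columns get no restkey entry; default to an empty list.
        row.setdefault('team_members', [])
        yield dict(row)

rows = list(read_rows(sample))
print(json.dumps(rows, indent=2))
```

A plain `line.split(',')` would have cut `"Bob, Jr."` in two, and the `csv` module keeps it together. When reading a real file, the `csv` documentation recommends opening it with `newline=''`.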
Now, on to the harder problem.
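A better solution would probably be to use an iterative JSON library that can dump an iterator as a JSON array. Then you could do something like this:

    def rows(infile):
        for line in infile:
            row = dict()
            # rstrip so the last team member doesn't keep the trailing newline
            id, team_name, *team_members = line.rstrip('\n').split(',')
            row["id"] = id
            row["team_name"] = team_name
            row["team_members"] = team_members
            yield row

    # genjson stands in for whichever streaming JSON library you pick
    with open('names.csv', 'r') as infile, open('names1.json', 'w') as outfile:
        genjson.dump(rows(infile), outfile)
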
The stdlib `json.JSONEncoder` has an example in its documentation that does this, although not very efficiently, because it first consumes the entire iterator to build a list and then dumps that:

    import json

    class GenJSONEncoder(json.JSONEncoder):
        def default(self, o):
            try:
                iterable = iter(o)
            except TypeError:
                pass
            else:
                return list(iterable)
            # Let the base class default method raise the TypeError
            return json.JSONEncoder.default(self, o)

    j = GenJSONEncoder()
    with open('names.csv', 'r') as infile, open('names1.json', 'w') as outfile:
        outfile.write(j.encode(rows(infile)))

In fact, if you're willing to build a whole list rather than encode row by row, it may be simpler to do the listifying explicitly:

    with open('names.csv', 'r') as infile, open('names1.json', 'w') as outfile:
        json.dump(list(rows(infile)), outfile)

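You could go further by overriding the `iterencode` method, but that is much less trivial, and you would probably be better off looking on PyPI for an efficient, well-tested streaming iterative JSON library than building one yourself on top of the `json` module.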
But, meanwhile, here's a direct solution to your problem, changing as little as possible in your existing code:

    import json

    with open('names.csv', 'r') as infile, open('names1.json', 'w') as outfile:
        # print the opening [
        outfile.write('[\n')
        # keep track of the index, just to distinguish line 0 from the rest
        for i, line in enumerate(infile):
            row = dict()
            # rstrip so the last team member doesn't keep the trailing newline
            id, team_name, *team_members = line.rstrip('\n').split(',')
            row["id"] = id
            row["team_name"] = team_name
            row["team_members"] = team_members
            # add the ,\n _before_ each row except the first
            if i:
                outfile.write(',\n')
            json.dump(row, outfile)
        # write the final ]
        outfile.write('\n]')

This trick of special-casing the first element rather than the last simplifies many problems of this type.
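The same trick can be packaged as a small reusable helper, which is roughly what the `genjson.dump` call earlier would have to do internally. `dump_array` is a name I made up for this sketch and is not part of any library. It writes to any object with a `write` method, and only one element is held in memory at a time.

```python
import io
import json

def dump_array(items, outfile):
    """Write any iterable to outfile as a JSON array, one element at a time."""
    outfile.write('[\n')
    for i, item in enumerate(items):
        # Every element except the first gets a leading separator,
        # so there is never a dangling comma to clean up at the end.
        if i:
            outfile.write(',\n')
        json.dump(item, outfile)
    outfile.write('\n]')

# Demo with a generator and an in-memory file instead of names1.json.
out = io.StringIO()
dump_array(({"n": n} for n in range(3)), out)
print(out.getvalue())
```

An empty iterable produces `[` and `]` with only whitespace between them, which is still a valid (empty) JSON array.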
Another way to simplify things is to actually iterate over adjacent pairs of lines, using a minor variation on the `pairwise` example in the itertools docs:

    import itertools
    import json

    def pairwise(iterable):
        a, b = itertools.tee(iterable)
        next(b, None)
        return itertools.zip_longest(a, b, fillvalue=None)

    with open('names.csv', 'r') as infile, open('names1.json', 'w') as outfile:
        # print the opening [
        outfile.write('[\n')
        # iterate pairs of lines
        for line, nextline in pairwise(infile):
            row = dict()
            # rstrip so the last team member doesn't keep the trailing newline
            id, team_name, *team_members = line.rstrip('\n').split(',')
            row["id"] = id
            row["team_name"] = team_name
            row["team_members"] = team_members
            json.dump(row, outfile)
            # add the , if there is a next line
            if nextline is not None:
                outfile.write(',')
            outfile.write('\n')
        # write the final ]
        outfile.write(']')

This is just as efficient as the previous version, and conceptually simpler, but more abstract.
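To make the abstraction a little more concrete, here is what this `pairwise` variant yields for a short list. The input values are made up. The point to notice is the final pair, whose second element is `None` and which is what tells the loop it has reached the last line.

```python
import itertools

def pairwise(iterable):
    """Yield (item, next_item) pairs; the last item is paired with None."""
    a, b = itertools.tee(iterable)
    next(b, None)  # advance the second iterator by one
    return itertools.zip_longest(a, b, fillvalue=None)

pairs = list(pairwise(['a', 'b', 'c']))
print(pairs)  # [('a', 'b'), ('b', 'c'), ('c', None)]
```

Note that the built-in `itertools.pairwise` added in Python 3.10 stops at `('b', 'c')` and never produces the trailing `('c', None)` pair, so it is not a drop-in replacement here. Using `None` as the sentinel is safe for lines read from a file, but it would be ambiguous for an iterable that can itself contain `None`.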