Suppose you have data like this:
feature, r, feature, r
word1, freq1, word2, freq2
word3, freq3, word4, freq4
If you don't mind me using my own library, pyexcel, here is a step-by-step solution:
>>> import pyexcel
>>> r=pyexcel.SeriesReader("sample.csv")
>>> r[0]
['word1', ' freq1', ' word2', ' freq2']
>>> r[1]
['word3', ' freq3', ' word4', ' freq4']
>>> r.series()
['feature', ' r', ' feature', ' r']
>>> r.column_at(0)
['word1', 'word3']
>>> r.column_at(1)
[' freq1', ' freq3']
>>> r.column_at(2)
[' word2', ' word4']
>>> r.column_at(3)
[' freq2', ' freq4']
>>> a=list(zip(r.column_at(0),r.column_at(1)))  # list() so that a+b also works on Python 3
>>> b=list(zip(r.column_at(2),r.column_at(3)))
>>> a+b
[('word1', ' freq1'), ('word3', ' freq3'), (' word2', ' freq2'), (' word4', ' freq4')]
>>> j=open('sample.json', 'w')
>>> import json
>>> j.write(json.dumps(a+b))
>>> j.close()
>>> exit()
The resulting sample.json contains:
[["word1", " freq1"], ["word3", " freq3"], [" word2", " freq2"], [" word4", " freq4"]]
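As you can see, some of the quoted values still carry a leading space. To get rid of it, you can attach a sheet formatter that strips every string cell:
>>> import pyexcel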
>>> r=pyexcel.SeriesReader("sample.csv")
>>> def clean(value, type):
...     return value.strip()
...
>>> r.add_formatter(pyexcel.formatters.SheetFormatter(str, clean))
>>> r.column_at(0)
['word1', 'word3']
>>> r.column_at(1)
['freq1', 'freq3']
>>> r.column_at(2)
['word2', 'word4']
>>> r.column_at(3)
['freq2', 'freq4']
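
If you would rather not depend on pyexcel, roughly the same reshaping can be sketched with the standard library alone. This is only an illustration, not how pyexcel works internally: `csv_to_pairs` and `SAMPLE` are names I made up for the example, and it assumes Python 3 and a sheet whose columns come in (word, freq) pairs with a single header row.

```python
import csv
import io
import json

# Stand-in for sample.csv, copied from the data at the top of this answer.
SAMPLE = """feature, r, feature, r
word1, freq1, word2, freq2
word3, freq3, word4, freq4
"""


def csv_to_pairs(stream):
    """Return (word, freq) tuples: first column pair, then the second pair."""
    # skipinitialspace=True drops the blank that follows each comma.
    reader = csv.reader(stream, skipinitialspace=True)
    next(reader, None)  # skip the header row (hypothetical single-line header)
    # strip() also removes any trailing whitespace left in a cell.
    rows = [[cell.strip() for cell in row] for row in reader if row]
    columns = list(zip(*rows))  # transpose rows into columns
    pairs = []
    # Walk the columns two at a time: (0, 1), then (2, 3).
    for i in range(0, len(columns) - 1, 2):
        pairs.extend(zip(columns[i], columns[i + 1]))
    return pairs


pairs = csv_to_pairs(io.StringIO(SAMPLE))
print(json.dumps(pairs))
```

To read the real file instead of the inline sample, pass `open("sample.csv", newline="")` to `csv_to_pairs`, and use `json.dump(pairs, f)` to write the result to sample.json.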