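I have a CSV file (original.csv) with a unique ID column (uid) and several columns that I want to run a calculation on. I then want to create a new file (result.csv) that keeps uid unmodified and adds new columns based on the results of the calculation.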
My original file looks like this:

uid,var01,var02,var03,var04,var05
1,2,3,2,3,1
2,2,2,2,2,1
3,,2,2,1,1
4,2,2,2,1,1
5,1,2,2,1,2
6,3,,2,3,2
7,3,,1,1,1
8,2,3,1,,3
9,3,1,,3,
10,,3,2,3,3

I want to do a calculation with the same logic as this (written in SQL): case when var01 = 1 then 1 else 0 end as var01_new, case when var02 = 1 then 1 else 0 end as var02_new, ...
The result should keep uid unchanged and contain one new column per variable (var01_new, var02_new, ...).
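
A minimal sketch of that CASE logic applied to a single row, assuming the fields arrive as strings (which is how the csv module delivers them) and that an empty field should fall through to the else branch, as a NULL would in SQL. The helper name case_flag is hypothetical.

```python
def case_flag(value, target="1"):
    """Return 1 when the raw CSV field equals target, else 0."""
    # CSV fields are strings, so the comparison is against "1", not the int 1.
    # An empty field ("") never matches and therefore maps to 0.
    return 1 if value.strip() == target else 0


row = ["5", "1", "2", "2", "1", "2"]  # uid 5 from the sample data
new_row = [row[0]] + [case_flag(v) for v in row[1:]]
print(new_row)  # ['5', 1, 0, 0, 1, 0]
```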
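
Given the size of the real file (~20M rows, 50+ columns), I would like to keep the solution in base Python rather than use memory-bound packages such as Pandas and NumPy. I tried modifying this S/O question, but I could not get it to work for my use case.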
I tried this code, but it did not work.

>>> import csv
>>>
>>> sourcepath = "/Users/me/python_case_statement.csv"
>>> destpath = "/Users/me/python_case_statement_flat.csv"
>>>
>>> with open(sourcepath, "rb") as source, open(destpath, "wb") as dest:
...     reader = csv.reader(source, delimiter = ',', quotechar='"')
...     writer = csv.writer(dest, delimiter = ',', quotechar='"')
...     headers = reader.next()
...     writer.writerow(headers)
...     for rownum, row in enumerate(reader):
...         'uid' = 'uid'
...         if 'var01' == 1:
...             'var01_new' == 1
...         else:
...             'var01_new' == 0
...         row.append(result)
...         writer.writerow(row)
...
  File "<stdin>", line 7
SyntaxError: can't assign to literal
>>>
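
The SyntaxError comes from 'uid' = 'uid', which tries to assign to a string literal. Even without that line, 'var01' == 1 compares the literal text 'var01' with an integer rather than looking at the value in the row. What follows is one possible streaming sketch using only the standard library, not a definitive implementation. It assumes Python 3 (the attempt above looks like Python 2), that uid is the first column, and that every other column gets a flag. The function name flatten is hypothetical.

```python
import csv


def flatten(sourcepath, destpath, target="1"):
    """Stream sourcepath row by row, writing uid plus one 0/1 column per variable."""
    # newline="" is what the csv docs recommend for files handed to reader/writer.
    with open(sourcepath, newline="") as source, \
            open(destpath, "w", newline="") as dest:
        reader = csv.reader(source)
        writer = csv.writer(dest)
        headers = next(reader)  # Python 3 spelling of reader.next()
        # Keep the uid header and rename the rest to var01_new, var02_new, ...
        writer.writerow([headers[0]] + [name + "_new" for name in headers[1:]])
        for row in reader:
            # Fields are strings; empty fields never equal target, so they become 0.
            flags = [1 if value == target else 0 for value in row[1:]]
            writer.writerow([row[0]] + flags)
```

Calling flatten("original.csv", "result.csv") would process the file one row at a time, so memory use should stay roughly constant regardless of the number of rows.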