As suggested, a DictReader can also be used as follows to create a list of rows, which can then be imported into pandas as a DataFrame:

import csv
import pandas as pd

rows = []
csv_header = ['user', 'item', 'time', 'rating', 'review']
frame_header = ['user', 'item', 'rating', 'review']

# newline='' lets the csv module handle line endings itself
with open('input.csv', newline='') as f_input:
    # Fields beyond the first four are collected as a list under restkey ('review')
    reader = csv.DictReader(f_input, delimiter=' ', fieldnames=csv_header[:-1], restkey=csv_header[-1], skipinitialspace=True)
    for row in reader:
        try:
            rows.append([row['user'], row['item'], row['rating'], ' '.join(row['review'])])
        except KeyError:
            # No review text on this line
            rows.append([row['user'], row['item'], row['rating'], ' '])

frame = pd.DataFrame(rows, columns=frame_header)
print(frame)
This displays the following:

         user      item  rating                                  review
0  disjiad123  TYh23hs9       5  I love this phone as it is easy to use
1  hjf2329ccc  TGjsk123       3                         Suck restaurant
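To see how `restkey` gathers the trailing words, here is a minimal sketch that runs the same DictReader settings on an in-memory file. The two sample lines and the `parse_reviews` helper are made up for illustration, and it uses `row.get` instead of `try`/`except`, so a missing review becomes an empty string rather than a single space.

```python
import csv
import io

# Hypothetical sample input: the second line has no review text
sample = io.StringIO(
    "disjiad123 TYh23hs9 13160032 5 I love this phone\n"
    "hjf2329ccc TGjsk123 14423321 3\n"
)

def parse_reviews(f_input):
    """Illustrative helper (not part of the answer above)."""
    fieldnames = ['user', 'item', 'time', 'rating']
    reader = csv.DictReader(f_input, delimiter=' ', fieldnames=fieldnames,
                            restkey='review', skipinitialspace=True)
    parsed = []
    for row in reader:
        # 'review' is only present when a line has more than four fields;
        # in that case it holds a list of the extra fields
        review = ' '.join(row.get('review', []))
        parsed.append([row['user'], row['item'], row['rating'], review])
    return parsed

parsed_rows = parse_reviews(sample)
print(parsed_rows)
```

The resulting list can be passed to `pd.DataFrame(parsed_rows, columns=[...])` in the same way as `rows` above.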
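If the review instead appears at the start of each line, one approach is to parse the line in reverse order, as follows:

import pandas as pd

rows = []
frame_header = ['rating', 'time', 'item', 'user', 'review']

# newline='' keeps the original '\r\n' line endings that [2:] relies on
with open('input.csv', newline='') as f_input:
    for row in f_input:
        # Reverse the line, drop the two-character line ending, split on spaces, then un-reverse each piece
        cols = [col[::-1] for col in row[::-1][2:].split(' ') if len(col)]
        # The first four entries are the fixed columns; the rest is the review, restored to its original word order
        rows.append(cols[:4] + [' '.join(cols[4:][::-1])])

frame = pd.DataFrame(rows, columns=frame_header)
print(frame)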
This displays:

   rating      time      item        user  \
0       5  13160032  TYh23hs9   isjiad123
1       3  14423321  TGjsk123  hjf2329ccc

                                    review
0  I love this phone as it is easy to used
1                          Suck restaurant
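row[::-1] reverses the text of the whole line, and [2:] skips the line ending (assumed to be the two characters \r\n), which is now at the start of the line. Each line is then split on spaces, and the list comprehension re-reverses each split entry. Finally, each row is built by first taking the four fixed column entries (now at the start). The remaining entries are put back into their original order, joined with spaces, and added as the final column.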
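The steps can be traced on a single line. This is a small sketch with a made-up input line (review first, then user, item, time and rating, ending in `\r\n`); the variable names are for illustration only.

```python
# Hypothetical input line, ending in a two-character '\r\n'
line = "Suck restaurant hjf2329ccc TGjsk123 14423321 3\r\n"

# Step 1: reverse the whole line and skip the line ending, now at the front
reversed_line = line[::-1][2:]

# Step 2: split on spaces, drop empty pieces, and un-reverse each piece
cols = [col[::-1] for col in reversed_line.split(' ') if col]

# Step 3: the first four entries are rating, time, item, user;
# whatever is left is the review, still in reverse word order
fixed = cols[:4]
review_words = cols[4:][::-1]

parsed = fixed + [' '.join(review_words)]
print(parsed)
# ['3', '14423321', 'TGjsk123', 'hjf2329ccc', 'Suck restaurant']
```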
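The benefit of this approach is that it does not rely on the input data being in an exact fixed-width format, so you do not need to worry about the column widths changing over time.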
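As a quick check of that claim, the sketch below wraps the reverse-parsing logic in a hypothetical `parse_reversed` helper and applies it to two made-up lines that differ only in how much padding separates the columns. Both are assumed to end in `\r\n`. Note that runs of spaces inside the review text itself would also be collapsed to single spaces.

```python
def parse_reversed(line):
    """Illustrative helper: parse 'review user item time rating' from the right.

    Assumes the line ends with a two-character '\\r\\n' line ending.
    """
    cols = [col[::-1] for col in line[::-1][2:].split(' ') if col]
    return cols[:4] + [' '.join(cols[4:][::-1])]

# Same data, different column padding (hypothetical samples)
narrow = "Suck restaurant hjf2329ccc TGjsk123 14423321 3\r\n"
wide = "Suck restaurant      hjf2329ccc   TGjsk123  14423321     3\r\n"

narrow_result = parse_reversed(narrow)
wide_result = parse_reversed(wide)
print(narrow_result == wide_result)  # True
```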