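I've just started using Pandas, and I'm finding it hard to treat DataFrames like DataFrames. Every so often I can't work out how to do something without iterating over the rows.
For example, I have a DataFrame containing budget information. I want to extract the 'vendor' from the 'short description', which is a string in one of three possible forms: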
> blah blah blah to vendor name
> blah blah blah at vendor name
> vendor name
I can do this with the following code, but I can't help feeling it isn't using Pandas properly. Any suggestions for improving it?
    for i, row in dataframe.iterrows():
        current = dataframe['short description'][i]
        if 'to' in current:
            point_of_break = current.index('to') + 3
            dataframe['vendor'][i] = current[point_of_break:]
        elif 'at' in current:
            point_of_break = current.index('at') + 3
            dataframe['vendor'][i] = current[point_of_break:]
        else:
            dataframe['vendor'][i] = current
Solution:
I think you can use str.split to split on to or at, then select the last element of each resulting list with str[-1]:
    import pandas as pd

    df = pd.DataFrame({'A': ['blah blah blah to "vendor name"',
                             'blah blah blah at "vendor name"',
                             '"vendor name"']})
    print(df)
                                     A
    0  blah blah blah to "vendor name"
    1  blah blah blah at "vendor name"
    2                    "vendor name"
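    # (?:at|to) matches the whole word "at" or "to"; [at|to] would be a
    # character class that matches any single one of the characters a, t, |, o.
    print(df.A.str.split(r'\b(?:at|to)\s+'))
    0    [blah blah blah , "vendor name"]
    1    [blah blah blah , "vendor name"]
    2                     ["vendor name"]
    Name: A, dtype: object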
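To see why taking the last element handles all three forms, the same split can be tried on plain strings with Python's re module. This is only a sketch: the sample strings and the word-boundary pattern are assumptions for illustration, not taken from the real budget data.

```python
import re

# Sample strings covering the three forms from the question (illustrative only).
samples = [
    'blah blah blah to "vendor name"',
    'blah blah blah at "vendor name"',
    '"vendor name"',
]

# Assumed pattern: a standalone "at" or "to" followed by whitespace.
pattern = re.compile(r'\b(?:at|to)\s+')

# When the pattern does not occur, split() returns a one-element list,
# so [-1] falls back to the whole string.
vendors = [pattern.split(s)[-1] for s in samples]
print(vendors)
```

The third form contains no separator, which is why indexing with -1 rather than 1 matters: the list has only one element in that case.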
    # Non-capturing group with word boundaries, so only standalone "at"/"to" split.
    df['vendor'] = df.A.str.split(r'\b(?:at|to)\b *').str[-1]
    print(df)
                                     A         vendor
    0  blah blah blah to "vendor name"  "vendor name"
    1  blah blah blah at "vendor name"  "vendor name"
    2                    "vendor name"  "vendor name"
Or use:
    # Same idea, but requiring at least one whitespace character after the word.
    df['vendor'] = df.A.str.split(r'\b(?:at|to)\s+').str[-1]
    print(df)
                                     A         vendor
    0  blah blah blah to "vendor name"  "vendor name"
    1  blah blah blah at "vendor name"  "vendor name"
    2                    "vendor name"  "vendor name"
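One caveat worth checking on real data: a pattern that matches at or to anywhere will also split inside words such as Data. The sketch below compares a word-boundary pattern with an unanchored one. The descriptions are made up, and extract_vendor is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Made-up descriptions (an assumption); the vendor names deliberately
# contain the letter pairs "at" and "to".
df = pd.DataFrame({'short description': ['payment to Data Corp',
                                         'lunch at Tomato Cafe',
                                         'Staton Supplies']})


def extract_vendor(descriptions, pattern):
    """Hypothetical helper: split each string on `pattern`, keep the last piece."""
    return descriptions.str.split(pattern).str[-1]


# \b anchors the match at the start of a word, so "at"/"to" inside a word is ignored.
df['vendor'] = extract_vendor(df['short description'], r'\b(?:at|to)\s+')

# Unanchored pattern, for comparison: it also splits inside words such as "Data".
df['naive'] = extract_vendor(df['short description'], r'(?:at|to)\s*')

print(df[['vendor', 'naive']])
```

With these inputs the word-boundary version should give back the three vendor names intact, while the unanchored version returns fragments such as a Corp. If a vendor name itself contains a standalone to or at, neither pattern can tell that apart from the separator, so the result would need checking by hand.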
Tags: python, dataframe, pandas
Source: https://codeday.me/bug/20190627/1308321.html