Cleaning data with pandas
Splitting a column with split
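import pandas as pd
# Sample data: each 'name' holds two comma-separated parts
tmp_dict = {'name': ['a,v', 'b,n'], 'age': [11, 34]}
df = pd.DataFrame(tmp_dict)
# Split on ',' and keep the first and second piece of each row
df['f_name'] = df['name'].str.split(',').str[0]
df['l_name'] = df['name'].str.split(',').str[1]
print(df)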
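As a possible alternative to splitting twice, the sketch below splits once with expand=True and assigns both columns from the result. The variable name parts is only illustrative; the sample data mirrors the example above.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a,v', 'b,n'], 'age': [11, 34]})

# expand=True returns a DataFrame with one column per piece,
# so the split only has to run once.
parts = df['name'].str.split(',', expand=True)
df['f_name'] = parts[0]
df['l_name'] = parts[1]
print(df)
```

This form tends to be easier to extend when a value has more than two pieces, since each piece is already its own column.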
Merging two columns with cat
df['name_cat'] = df['f_name'].str.cat(df['l_name'], sep='=')  # sep is the separator placed between the two values
Merging directly with "+"
df['name_add'] = df['f_name'] + '+' + df['l_name']
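The two merging approaches behave the same way on missing values and differ mainly in the options they offer. The sketch below uses a small made-up frame (with one missing l_name) to illustrate this, along with the string conversion that a numeric column needs before concatenation.

```python
import pandas as pd

# Hypothetical sample with a missing last name in the second row
df = pd.DataFrame({'f_name': ['a', 'b'], 'l_name': ['v', None], 'age': [11, 34]})

# A missing value on either side makes the result missing in both approaches
plus = df['f_name'] + '+' + df['l_name']
cat_default = df['f_name'].str.cat(df['l_name'], sep='=')

# na_rep substitutes a placeholder for missing values instead
cat_filled = df['f_name'].str.cat(df['l_name'], sep='=', na_rep='?')

# Non-string columns must be converted to strings before concatenation
name_age = df['f_name'] + '_' + df['age'].astype(str)
```

If missing values are expected, str.cat with na_rep is probably the more convenient choice; otherwise "+" is shorter to write.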
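Filtering on multiple conditions
Write each condition as its own boolean mask, then combine the masks with '&' (and), '|' (or), and similar operators.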
fac1 = df['name_add'] == 'a+v'
fac2 = df['name_cat'] == 'a=v'
fac3 = df['age'] > 10
df1 = df[fac1 & fac2 & fac3]
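When the conditions are written inline rather than stored in variables, each comparison needs its own parentheses, because & and | bind more tightly than == and >. A minimal sketch with made-up data, also showing ~ for negation:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'name': ['a,v', 'b,n', 'c,v'], 'age': [11, 34, 8]})

# Each comparison is wrapped in parentheses before combining with |
older_or_a = df[(df['age'] > 10) | (df['name'] == 'a,v')]

# ~ inverts a boolean mask
not_older = df[~(df['age'] > 10)]
```

Without the parentheses, an expression such as df['age'] > 10 | df['name'] == 'a,v' is parsed differently and typically raises an error.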
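Filtering by substring
Keep the rows whose selected column contains a target string; use '|' inside the pattern to match any of several alternatives. str.contains treats the pattern as a regular expression by default, so the literal '+' has to be escaped.
df1 = df.loc[df['name_add'].str.contains(r'a\+')]
df2 = df.loc[df['name_add'].str.contains(r'a\+|b\+')]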
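Because str.contains interprets its pattern as a regular expression by default, characters such as '+' carry special meaning. The sketch below, on a made-up Series, contrasts the regex reading with a literal match and shows one way to build a safe alternation with re.escape.

```python
import re
import pandas as pd

# Hypothetical sample values
s = pd.Series(['a+v', 'b+n', 'aav', 'c+d'])

# As a regex, 'a+' means "one or more a", so 'aav' matches as well
as_regex = s[s.str.contains('a+')]

# regex=False treats the pattern as a plain substring
literal = s[s.str.contains('a+', regex=False)]

# re.escape neutralises special characters before joining with '|'
pattern = '|'.join(re.escape(t) for t in ['a+', 'b+'])
either = s[s.str.contains(pattern)]
```

Here as_regex keeps 'a+v' and 'aav', literal keeps only 'a+v', and either keeps 'a+v' and 'b+n'.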
.isin(): filtering rows where a column equals any of several numbers or strings
df1 = df[df['f_name'].isin(['a', 'v'])]  # the argument is a list of accepted values
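The same mask can be inverted to exclude the listed values rather than keep them. A small sketch with made-up data; the names wanted, kept, and dropped are only illustrative.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'f_name': ['a', 'b', 'c'], 'age': [11, 34, 8]})

wanted = ['a', 'c']

# Rows whose f_name is in the list
kept = df[df['f_name'].isin(wanted)]

# ~ inverts the mask, keeping rows whose f_name is NOT in the list
dropped = df[~df['f_name'].isin(wanted)]
```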