NumPy + Pandas Data Processing: Challenge
Level 2
The normal approach
Setup (provided by the platform)
import pandas as pd
df = pd.read_excel('/home/mw/input/pandas1206855/pandas120.xlsx')
df.head()
- Extract the rows where education is 本科 (bachelor's degree) and salary is 25k-35k
df1 = df[(df['education'] == '本科')&(df['salary'] == '25k-35k')]
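The filter above combines two boolean masks with `&`. Each comparison needs its own parentheses because `&` binds more tightly than `==`. A minimal sketch on a small made-up frame (the `sample` rows are hypothetical, not taken from pandas120.xlsx) shows the two masks built separately:

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from pandas120.xlsx
sample = pd.DataFrame({
    'education': ['本科', '硕士', '本科', '本科'],
    'salary': ['25k-35k', '25k-35k', '20k-40k', '25k-35k'],
})

# Build each condition as its own boolean Series
is_bachelor = sample['education'] == '本科'
in_band = sample['salary'] == '25k-35k'

# Element-wise AND keeps only the rows where both conditions hold
matched = sample[is_bachelor & in_band]
```

Naming the masks is optional, but it makes longer filters easier to read and to debug one condition at a time.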
- Extract the rows whose salary value ends with '40k'
df2 = df[df['salary'].str.endswith('40k')]
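If the salary column contained missing values, `str.endswith` would return a missing value for those rows and the mask could no longer be used for indexing. Whether the real dataset has such gaps is not stated here, so the following is only a defensive sketch on hypothetical rows, using the `na` parameter of `str.endswith`:

```python
import pandas as pd

# Hypothetical sample with one missing salary
sample = pd.DataFrame({'salary': ['20k-40k', '25k-35k', None, '30k-40k']})

# na=False treats missing values as "does not match"
ends_40k = sample['salary'].str.endswith('40k', na=False)
matched = sample[ends_40k]
```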
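- Extract the rows where the average of the lowest and highest salary in the range is greater than 30k; only the original fields ('createTime', 'education', 'salary') need to be returned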
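def func(df):
    # Split each salary string such as '25k-35k' into ['25k', '35k']
    lst = df['salary'].apply(lambda x: x.split('-')).tolist()
    return lst

lst = func(df)
aver_sala = []
for i in range(len(lst)):
    # Strip the 'k' suffix and convert both bounds to integers
    num1 = int(lst[i][0].replace('k', ''))
    num2 = int(lst[i][1].replace('k', ''))
    aver_sala.append((num1 + num2) / 2)
df['aver_sala'] = aver_sala
# Keep rows whose average exceeds 30, returning only the original fields
df3 = df[df['aver_sala'] > 30][['createTime', 'education', 'salary']]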
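The same result can likely be reached without an explicit loop or a helper column. The sketch below is one possible vectorized variant, not the method used above: it assumes every salary string has the form `<min>k-<max>k` with a lowercase `k`, and the `sample` rows (including the `createTime` values) are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from pandas120.xlsx
sample = pd.DataFrame({
    'createTime': ['2020-03-16', '2020-03-16', '2020-03-17'],
    'education': ['本科', '硕士', '本科'],
    'salary': ['20k-35k', '30k-40k', '25k-35k'],
})

# Capture the lower and upper bounds as two columns, then convert to int
bounds = sample['salary'].str.extract(r'(\d+)k-(\d+)k').astype(int)

# Row-wise mean of the two bounds
avg = bounds.mean(axis=1)

# Select matching rows and the three original fields in one step
result = sample.loc[avg > 30, ['createTime', 'education', 'salary']]
```

Because the average is kept in a separate Series, the source frame is left unchanged, whereas the loop version adds an `aver_sala` column to `df`.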
The unconventional approach
Same as the normal approach.