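Given a resume table, we want to replace the values in the two columns “期望薪资(最低)” (minimum expected salary) and “期望薪资(最高)” (maximum expected salary) with 1/2/3/4/5 according to the rule {1: <=5000, 2: 5000~10000, 3: 10000~20000, 4: 20000~30000, 5: >30000}, where each upper bound is inclusive. The steps are as follows:
First import numpy and pandas, then read the data: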
import numpy as np
import pandas as pd
a = pd.read_excel(r"your_file_path\resume(1).xlsx")  # replace your_file_path with the actual folder
Convert the DataFrame a to an array:
a_array = np.array(a)
Index into a_array to take the minimum expected salary column (position 6):
expect_lowest_salary = a_array[:,6]
print(expect_lowest_salary)
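This prints the original values of the minimum expected salary column.
Use enumerate() to iterate over expect_lowest_salary and replace each value with its level: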
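The hard-coded position 6 only works as long as the column order of the spreadsheet never changes. As a minimal sketch, assuming the headers are unique, the position could instead be looked up by name with `columns.get_loc`. The small `demo` table here is a hypothetical stand-in for the resume file, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the resume table; only the column names matter here.
demo = pd.DataFrame({
    "姓名": ["A", "B"],
    "期望薪资(最低)": [4000, 12000],
    "期望薪资(最高)": [6000, 25000],
})

# Look up the integer position of each column by its header text.
low_pos = demo.columns.get_loc("期望薪资(最低)")
high_pos = demo.columns.get_loc("期望薪资(最高)")

demo_array = np.array(demo)
print(low_pos, high_pos)        # positions within this demo table
print(demo_array[:, low_pos])   # the minimum expected salary column
```

With the real file, the same two `get_loc` calls on `a.columns` would give the positions to use in place of the literal indices.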
for idx, val in enumerate(expect_lowest_salary):
    if val <= 5000:
        expect_lowest_salary[idx] = 1
    elif 5000 < val <= 10000:
        expect_lowest_salary[idx] = 2
    elif 10000 < val <= 20000:
        expect_lowest_salary[idx] = 3
    elif 20000 < val <= 30000:
        expect_lowest_salary[idx] = 4
    elif 30000 < val:
        expect_lowest_salary[idx] = 5
print(expect_lowest_salary)
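The result is the same array with each salary replaced by its level from 1 to 5.
That completes the replacement for 期望薪资(最低). The next step is to assign the new expect_lowest_salary array back to the DataFrame a: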
a.loc[:,'期望薪资(最低)'] = expect_lowest_salary
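# a.to_excel('替换后最低期望薪资.xlsx')  # optional: save this intermediate result
After this step, the 期望薪资(最低) column has been replaced.
Next, replace the 期望薪资(最高) column in the same way. The code is as follows: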
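Because the same thresholds are applied to both salary columns, the if/elif chain could be factored into one function so that it is written only once. This is a minimal sketch rather than the post's own code; `salary_to_level` is a hypothetical helper name.

```python
def salary_to_level(val):
    """Map a salary to a level 1..5 using the same thresholds as the loop above."""
    if val <= 5000:
        return 1
    elif val <= 10000:
        return 2
    elif val <= 20000:
        return 3
    elif val <= 30000:
        return 4
    elif val > 30000:
        return 5
    # No branch matched (for example NaN): leave the value unchanged,
    # which is also what the original loop does.
    return val


print([salary_to_level(v) for v in [3000, 5000, 5001, 20000, 30001]])
# -> [1, 1, 2, 3, 5]
```

Inside either loop, the body would then reduce to a single assignment of `salary_to_level(val)`, which removes the risk of editing one copy of the chain and forgetting the other.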
expect_highest_salary = a_array[:,7]
print(expect_highest_salary)
for idx, val in enumerate(expect_highest_salary):
    # Write into expect_highest_salary here, not expect_lowest_salary,
    # otherwise the maximum salary column is never replaced.
    if val <= 5000:
        expect_highest_salary[idx] = 1
    elif 5000 < val <= 10000:
        expect_highest_salary[idx] = 2
    elif 10000 < val <= 20000:
        expect_highest_salary[idx] = 3
    elif 20000 < val <= 30000:
        expect_highest_salary[idx] = 4
    elif 30000 < val:
        expect_highest_salary[idx] = 5
print(expect_highest_salary)
# Print each replaced minimum salary level on its own line as a quick check
for i in expect_lowest_salary:
    print(i)
a.loc[:,'期望薪资(最高)'] = expect_highest_salary
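a.to_excel('替换后最低最高期望薪资.xlsx') # save the replacements made to a by the two a.loc[] assignments to a new Excel file
Opening that Excel file now shows that both salary columns have been replaced.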
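As an alternative to the explicit loops, the same binning can be done in a vectorized way with `pd.cut`. This is a different technique from the one used in this post, shown only as a sketch. The `demo` table is hypothetical sample data, and the final `astype(int)` assumes the columns contain no missing values (with NaN present, that conversion would raise an error).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the two salary columns.
demo = pd.DataFrame({
    "期望薪资(最低)": [3000, 5000, 8000, 15000, 30000],
    "期望薪资(最高)": [5001, 10000, 20001, 30000, 50000],
})

# pd.cut uses right-closed intervals by default, matching the rule:
# (-inf, 5000], (5000, 10000], (10000, 20000], (20000, 30000], (30000, inf]
bins = [-np.inf, 5000, 10000, 20000, 30000, np.inf]
labels = [1, 2, 3, 4, 5]

for col in ["期望薪资(最低)", "期望薪资(最高)"]:
    # Assumes no missing values, so the categorical result converts cleanly to int.
    demo[col] = pd.cut(demo[col], bins=bins, labels=labels).astype(int)

print(demo)
```

Here both columns share one list of bin edges, so the thresholds are defined in a single place.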