Replacing numbers in a table with interval codes in Python

zzydkyd

Last modified: 2023-04-11 20:30:47

Categories: numpy, pandas. Tags: numpy, pandas, python

First published: 2023-04-11 20:27:09

Article link: https://blog.csdn.net/zzydkyd/article/details/130092232

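Given a spreadsheet named resume, the goal is to replace the values in its two columns 期望薪资（最低） (minimum expected salary) and 期望薪资（最高） (maximum expected salary) with the codes 1/2/3/4/5 according to the rule {1: <=5000, 2: 5000~10000, 3: 10000~20000, 4: 20000~30000, 5: >30000}, where each upper bound is inclusive (for example, exactly 10000 maps to 2). The steps are as follows: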

First import numpy and pandas, then read the data:

import numpy as np
import pandas as pd
a = pd.read_excel(r"your_file_path\resume(1).xlsx")  # replace your_file_path with the folder that holds the file

Convert the DataFrame a to a NumPy array:

a_array = np.array(a)

Index into a_array to pull out the column at position 6, which holds 期望薪资（最低） in this file:

expect_lowest_salary = a_array[:,6]
print(expect_lowest_salary)

This prints the raw values of the minimum expected salary column.
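The position 6 is specific to this particular spreadsheet. As a hedged sketch, the position can instead be looked up from the column name with `Index.get_loc`. The small DataFrame `demo` below is made-up stand-in data, not the real resume file, and the `name` column is an assumption added only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the resume table (hypothetical values).
demo = pd.DataFrame({
    "name": ["A", "B", "C"],
    "期望薪资（最低）": [4000, 8000, 35000],
    "期望薪资（最高）": [6000, 12000, 40000],
})

# Look up the positional index of the column by its name.
low_pos = demo.columns.get_loc("期望薪资（最低）")

# Same conversion and slicing as in the article, but without a hard-coded 6.
demo_array = np.array(demo)
low_col = demo_array[:, low_pos]
print(low_pos, low_col)
```

Looking the position up by name keeps the slicing correct if columns are later added or reordered in the spreadsheet.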

Use enumerate() to loop over expect_lowest_salary and replace each value with its code:

for idx, val in enumerate(expect_lowest_salary):
    if val <= 5000:
        expect_lowest_salary[idx] = 1
    elif 5000 < val <= 10000:
        expect_lowest_salary[idx] = 2
    elif 10000 < val <= 20000:
        expect_lowest_salary[idx] = 3
    elif 20000 < val <= 30000:
        expect_lowest_salary[idx] = 4
    elif 30000 < val:
        expect_lowest_salary[idx] = 5

print(expect_lowest_salary)

The printed array now contains the codes 1 to 5 in place of the raw salaries.
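The same mapping can also be written without an explicit loop. The following is a minimal sketch using `np.digitize` with `right=True`, which makes each interval closed on the right, matching the if/elif chain above. The sample salaries are invented for illustration, and this is an alternative technique rather than the method used in this article.

```python
import numpy as np

# Invented sample salaries that hit every interval boundary.
salaries = np.array([3000, 5000, 5001, 10000, 20000, 20001, 30000, 30001])

# Upper edges of codes 1..4; anything above the last edge gets code 5.
edges = [5000, 10000, 20000, 30000]

# With right=True, digitize returns i such that edges[i-1] < x <= edges[i],
# so adding 1 turns the bin index 0..4 into the codes 1..5.
codes = np.digitize(salaries, edges, right=True) + 1
print(codes)
```

A column sliced from a mixed-type array such as a_array has object dtype, so it would probably need converting first, for example with `.astype(float)`, before being passed to `np.digitize`.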

That completes the replacement for 期望薪资（最低）. The next step is to assign the updated expect_lowest_salary array back to the DataFrame a:

a.loc[:,'期望薪资（最低）'] = expect_lowest_salary
# a.to_excel('替换后最低期望薪资.xlsx')  # optional: save this intermediate result

After this step, the 期望薪资（最低） column in a has been replaced.
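The write-back is needed because `np.array(a)` produces a copy of the data: editing the array, or a slice of it, does not change the DataFrame. The short sketch below illustrates this with a made-up two-column table; the names `df`, `arr`, `col`, and `unchanged` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up two-column table (hypothetical values).
df = pd.DataFrame({"name": ["A", "B"], "salary": [4000, 8000]})

arr = np.array(df)   # copies the data into a new object-dtype array
col = arr[:, 1]      # basic slicing returns a view into arr
col[0] = 1           # modifies arr through the view

# The DataFrame has not been touched by the edit above.
unchanged = df.loc[0, "salary"]
print(arr[0, 1], unchanged)

# Writing the column back is what actually updates the DataFrame.
df.loc[:, "salary"] = col
print(df)
```

This is why the loop can modify expect_lowest_salary in place, yet a only changes once the a.loc[...] assignment runs.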

The 期望薪资（最高） column is replaced in the same way. The code is as follows:

expect_highest_salary = a_array[:,7]
print(expect_highest_salary)
# Write each code into expect_highest_salary (not expect_lowest_salary)
for idx, val in enumerate(expect_highest_salary):
    if val <= 5000:
        expect_highest_salary[idx] = 1
    elif 5000 < val <= 10000:
        expect_highest_salary[idx] = 2
    elif 10000 < val <= 20000:
        expect_highest_salary[idx] = 3
    elif 20000 < val <= 30000:
        expect_highest_salary[idx] = 4
    elif 30000 < val:
        expect_highest_salary[idx] = 5

print(expect_highest_salary)
for i in expect_highest_salary:  # print each code on its own line as a quick check
    print(i)
a.loc[:,'期望薪资（最高）'] = expect_highest_salary
a.to_excel('替换后最低最高期望薪资.xlsx') # save the replacements made to a by both a.loc[] steps to a new Excel file

Opening that Excel file now shows that both salary columns have been replaced.
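As an alternative to the loop and write-back approach, pandas offers `pd.cut` for binning a column directly. The sketch below assumes the two column names from this article and uses invented data. It is not the method used above, and the output file name in the final comment is hypothetical.

```python
import numpy as np
import pandas as pd

# Invented stand-in for the resume table.
resume = pd.DataFrame({
    "期望薪资（最低）": [3000, 5000, 8000, 15000, 25000, 31000],
    "期望薪资（最高）": [5000, 10000, 12000, 20000, 30001, 50000],
})

# Bin edges; pd.cut uses right-closed intervals by default, e.g. (5000, 10000].
bins = [-np.inf, 5000, 10000, 20000, 30000, np.inf]
labels = [1, 2, 3, 4, 5]

for column in ["期望薪资（最低）", "期望薪资（最高）"]:
    # astype(int) turns the categorical result into plain integers
    # (this assumes the column has no missing values).
    resume[column] = pd.cut(resume[column], bins=bins, labels=labels).astype(int)

print(resume)
# resume.to_excel("binned_salaries.xlsx")  # hypothetical output file name
```

Because the same loop body handles both columns, there is no second copy of the if/elif chain to keep in sync.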