pandas实现Excel vlookup功能
import numpy as np
import pandas as pd
file = r'data\loan_grade.xlsx'
loan_stats = pd. read_excel( file , sheet_name= 'loan_stats' )
loan_stats
id member_id loan_amnt term int_rate 0 1077501 1296599 5000 36 months 0.1065 1 1077175 1313524 2400 36 months 0.1596 2 1075358 1311748 3000 60 months 0.1269 3 1075269 1311441 5000 36 months 0.0790 4 1072053 1288686 3000 36 months 0.1864 5 1071795 1306957 5600 60 months 0.2128
member_grade = pd. read_excel( file , sheet_name= 'member_grade' )
member_grade
member_id grade 0 1296599 B 1 1313524 C 2 1277178 C 3 1311441 A 4 1304742 C 5 1306957 F
pd. merge( loan_stats, member_grade, how= 'inner' )
id member_id loan_amnt term int_rate grade 0 1077501 1296599 5000 36 months 0.1065 B 1 1077175 1313524 2400 36 months 0.1596 C 2 1075269 1311441 5000 36 months 0.0790 A 3 1071795 1306957 5600 60 months 0.2128 F
inner起到的效果如下,只得到两个数据集交集部分。
pd. merge( loan_stats, member_grade, how= 'left' )
id member_id loan_amnt term int_rate grade 0 1077501 1296599 5000 36 months 0.1065 B 1 1077175 1313524 2400 36 months 0.1596 C 2 1075358 1311748 3000 60 months 0.1269 NaN 3 1075269 1311441 5000 36 months 0.0790 A 4 1072053 1288686 3000 36 months 0.1864 NaN 5 1071795 1306957 5600 60 months 0.2128 F
left 效果如下,right填充left, 缺失部分NaN填充:
pd. merge( loan_stats, member_grade, how= 'right' )
id member_id loan_amnt term int_rate grade 0 1077501.0 1296599 5000.0 36 months 0.1065 B 1 1077175.0 1313524 2400.0 36 months 0.1596 C 2 1075269.0 1311441 5000.0 36 months 0.0790 A 3 1071795.0 1306957 5600.0 60 months 0.2128 F 4 NaN 1277178 NaN NaN NaN C 5 NaN 1304742 NaN NaN NaN C
right效果如下, left填充right,缺失的用NaN填充:
pd. merge( loan_stats, member_grade, how= 'outer' )
id member_id loan_amnt term int_rate grade 0 1077501.0 1296599 5000.0 36 months 0.1065 B 1 1077175.0 1313524 2400.0 36 months 0.1596 C 2 1075358.0 1311748 3000.0 60 months 0.1269 NaN 3 1075269.0 1311441 5000.0 36 months 0.0790 A 4 1072053.0 1288686 3000.0 36 months 0.1864 NaN 5 1071795.0 1306957 5600.0 60 months 0.2128 F 6 NaN 1277178 NaN NaN NaN C 7 NaN 1304742 NaN NaN NaN C
完成的是并集的聚合,原图有错,所以就不自己做了!
_map = pd. Series( member_grade. grade. values, index= member_grade. member_id)
loan_stats. member_id. map ( _map)
0 B
1 C
2 NaN
3 A
4 NaN
5 F
Name: member_id, dtype: object
map只能实现left的作用,但是大多数情况适合我们使用!