Computing IV values in Python: WOE and IV tables

I have a function that computes WOE and IV, shown below:

    import numpy as np
    import pandas as pd

    def calc_iv(df, feature, target, pr=0):
        lst = []
        # One row per distinct feature value: total count and count with target == 1
        for i in range(df[feature].nunique()):
            val = list(df[feature].unique())[i]
            lst.append([feature, val,
                        df[df[feature] == val].count()[feature],
                        df[(df[feature] == val) & (df[target] == 1)].count()[feature]])

        data = pd.DataFrame(lst, columns=['Variable', 'Value', 'All', 'Bad'])
        data = data[data['Bad'] > 0]

        data['Share'] = data['All'] / data['All'].sum()
        data['Bad Rate'] = data['Bad'] / data['All']
        data['Distribution Good'] = (data['All'] - data['Bad']) / (data['All'].sum() - data['Bad'].sum())
        data['Distribution Bad'] = data['Bad'] / data['Bad'].sum()
        data['grp_score'] = round((data['Distribution Good'] / (data['Distribution Good'] + data['Distribution Bad'])) * 10, 2)
        data['WoE'] = np.log(data['Distribution Good'] / data['Distribution Bad'])
        data['IV'] = (data['WoE'] * (data['Distribution Good'] - data['Distribution Bad'])).sum()
        data['Efficiency'] = abs(data['Distribution Good'] - data['Distribution Bad']) / 2

        data = data.sort_values(by=['Variable', 'Value'], ascending=True)

        # Braces without keys make a set literal of Series; this is the statement that raises the TypeError
        d = {data['Distribution Good'], data['Distribution Bad'], data['Share'],
             data['Bad Rate'], data['grp_score'], data['WoE'], data['IV'], data['Efficiency']}
        mydf = pd.DataFrame(data=d)

        if pr == 1:
            print(data)

        #return data['IV'].values[0]
        return mydf.values

The function works on a DataFrame (dat) like the one below:

^{pr2}$

I then call the function like this:

    calc_iv(dat, 'myvar1', 'target', pr=0)

I want the function to return the following for myvar1:

    Distribution Good  Distribution Bad  Share  Bad Rate  grp_score  WoE  IV   Efficiency
    0.1                0.9               1      0.9       20         0.2  0.6  0.8
    0.8                0.2               2      0.2       10         0.1  0.2  0.1
    0.7                0.3               3      0.3       70         0.7  0.8  0.5

But I get the following error:

    TypeError: 'Series' objects are mutable, thus they cannot be hashed
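The error comes from the braces: `{a, b, c}` with no keys is a Python set literal, and a set has to hash each element, which a pandas Series does not allow. One possible fix, sketched below, is to select the wanted columns with a list instead of building a set. The name `calc_iv_fixed`, the constant `OUTPUT_COLUMNS`, and the toy data are illustrative and not part of the original question. The sketch also swaps the per-value loop for a `groupby`, which assumes the target column holds only 0 and 1.

```python
import numpy as np
import pandas as pd

# Hypothetical name: the columns the question expects in the output, in order.
OUTPUT_COLUMNS = ['Distribution Good', 'Distribution Bad', 'Share', 'Bad Rate',
                  'grp_score', 'WoE', 'IV', 'Efficiency']


def calc_iv_fixed(df, feature, target, pr=0):
    """Sketch of the question's function, returning a DataFrame of the columns."""
    # One row per distinct feature value: total rows and rows with target == 1.
    # Assumption: target is coded as 0/1, so sum() counts the "bad" rows.
    grouped = df.groupby(feature)[target]
    data = pd.DataFrame({'All': grouped.count(), 'Bad': grouped.sum()})
    data = data[data['Bad'] > 0]  # same filter as the original function

    data['Share'] = data['All'] / data['All'].sum()
    data['Bad Rate'] = data['Bad'] / data['All']
    data['Distribution Good'] = (data['All'] - data['Bad']) / (data['All'].sum() - data['Bad'].sum())
    data['Distribution Bad'] = data['Bad'] / data['Bad'].sum()
    good_plus_bad = data['Distribution Good'] + data['Distribution Bad']
    data['grp_score'] = (data['Distribution Good'] / good_plus_bad * 10).round(2)
    data['WoE'] = np.log(data['Distribution Good'] / data['Distribution Bad'])
    # IV is a single total, repeated on every row as in the original code.
    data['IV'] = (data['WoE'] * (data['Distribution Good'] - data['Distribution Bad'])).sum()
    data['Efficiency'] = (data['Distribution Good'] - data['Distribution Bad']).abs() / 2
    data = data.sort_index()

    if pr == 1:
        print(data)

    # Select columns with a list; a set literal would try to hash each Series.
    return data[OUTPUT_COLUMNS]


# Toy data (made up for illustration): value 'a' has 1 bad of 4, 'b' has 3 bad of 4.
toy = pd.DataFrame({'myvar1': ['a'] * 4 + ['b'] * 4,
                    'target': [1, 0, 0, 0, 1, 1, 1, 0]})
table = calc_iv_fixed(toy, 'myvar1', 'target')
print(table)
```

The result is indexed by the values of `myvar1`; calling `.values` on it gives the plain array that the original function tried to return. A value with no good rows still produces an infinite WoE here, since the sketch does not add any smoothing.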

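WOE encoding and IV are common feature-engineering tools for measuring how strongly a feature is associated with the target variable and how much predictive power it carries. Below is example Python code for computing the WOE encoding and IV:

```python
import pandas as pd
import numpy as np


def calc_woe_iv(df, col, target):
    """
    Compute the WOE encoding and IV of one feature.

    :param df: the data set
    :param col: name of the feature column
    :param target: name of the target column (1 = bad, 0 = good)
    :return: (DataFrame with the WOE and IV contribution of each value, total IV)
    """
    # Count total and bad samples for each value of the feature
    freq = pd.DataFrame({'total': df.groupby(col)[target].count(),
                         'bad': df.groupby(col)[target].sum()})
    freq['good'] = freq['total'] - freq['bad']
    # Share of all bad (good) samples that falls on each value
    freq['bad_rate'] = freq['bad'] / freq['bad'].sum()
    freq['good_rate'] = freq['good'] / freq['good'].sum()
    # Replace zero shares with a small constant to avoid division by zero and log(0)
    freq.loc[freq['bad_rate'] == 0, 'bad_rate'] = 0.0001
    freq.loc[freq['good_rate'] == 0, 'good_rate'] = 0.0001
    # WOE of each value
    freq['woe'] = np.log(freq['good_rate'] / freq['bad_rate'])
    # IV contribution of each value, and the total IV
    freq['iv'] = (freq['good_rate'] - freq['bad_rate']) * freq['woe']
    iv = freq['iv'].sum()
    return freq[['woe', 'iv']].reset_index().rename(columns={col: 'value'}), iv
```

The function takes the data set `df`, the feature column name `col`, and the target column name `target`, and returns the WOE encoding together with the IV. It first computes, for each value of the feature, the total count, the bad count, the good count, the share of bad samples, and the share of good samples. From those shares it computes the WOE of each value, then each value's IV contribution from the IV formula, and sums the contributions to get the total IV. It returns the per-value WOE and IV contributions as a DataFrame, along with the total IV.

Note that, to avoid division by zero, the code replaces any share that is exactly 0 with a small constant, 0.0001, before computing WOE and IV. The way WOE and IV are computed can also be adjusted to fit specific business needs.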
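To make the formulas concrete, here is a small worked sketch on made-up data that repeats the same steps without the zero-share replacement and then maps each value to its WOE, which is what WOE encoding means in practice. The column names `grade` and `grade_woe` and the toy rows are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy data (made up): grade 'A' has 1 bad of 4, grade 'B' has 3 bad of 4.
toy = pd.DataFrame({'grade': ['A'] * 4 + ['B'] * 4,
                    'target': [1, 0, 0, 0, 1, 1, 1, 0]})

# Total and bad counts per value of the feature.
counts = toy.groupby('grade')['target'].agg(['count', 'sum'])
counts.columns = ['total', 'bad']
counts['good'] = counts['total'] - counts['bad']

# Share of all good / all bad samples that falls on each value.
dist_good = counts['good'] / counts['good'].sum()  # A: 0.75, B: 0.25
dist_bad = counts['bad'] / counts['bad'].sum()     # A: 0.25, B: 0.75

# WOE per value and the total IV.
woe = np.log(dist_good / dist_bad)                 # A: ln(3), B: -ln(3)
iv = ((dist_good - dist_bad) * woe).sum()          # 0.5*ln(3) + 0.5*ln(3) = ln(3)

# WOE encoding: replace each category with its WOE value.
toy['grade_woe'] = toy['grade'].map(woe)

print(woe)
print(iv)
```

With these counts the two values contribute equally, so the total IV equals ln(3), roughly 1.10.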