Understanding WOE and IV in Feature Engineering

Introduction to WOE

WOE (weight of evidence) measures the predictive power of an independent variable in relation to the dependent variable.

Formula

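For each group i:

WOEi = ln(% of non-events in group i / % of events in group i)

Here, % of non-events in group i is the share of all non-events that fall in group i, and likewise for events.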

Steps for Calculating WOE

  1. For a continuous variable, split the data into N groups (or fewer, depending on the distribution).
  2. Count the number of events and non-events in each group (bin).
  3. Calculate the % of events and % of non-events in each group.
  4. Calculate WOE by taking the natural log of the % of non-events divided by the % of events.
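The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `woe_per_bin`, the `upper_edges` parameter, and the sample data are all hypothetical, and the 0.5 adjustment for empty cells is one common workaround that is assumed here rather than taken from the text. It also assumes the data contains at least one event and one non-event overall.

```python
import bisect
import math

def woe_per_bin(values, targets, upper_edges):
    """Return one WOE score per bin.

    values      : numbers for one continuous variable
    targets     : 0/1 labels, 1 = event, 0 = non-event
    upper_edges : sorted upper bin edges (hypothetical choice); a value v
                  falls in the first bin whose edge is >= v, and anything
                  above the last edge falls in a final overflow bin
    """
    n_bins = len(upper_edges) + 1
    events = [0] * n_bins
    non_events = [0] * n_bins

    # Step 2: count events and non-events in each bin.
    for v, t in zip(values, targets):
        b = bisect.bisect_left(upper_edges, v)
        if t == 1:
            events[b] += 1
        else:
            non_events[b] += 1

    total_events = sum(events)
    total_non_events = sum(non_events)

    scores = []
    for e, n in zip(events, non_events):
        # Assumption: nudge bins with an empty cell by 0.5 so the log is defined.
        if e == 0 or n == 0:
            e, n = e + 0.5, n + 0.5
        # Step 3: share of all events / non-events that fall in this bin.
        pct_events = e / total_events
        pct_non_events = n / total_non_events
        # Step 4: natural log of the ratio.
        scores.append(math.log(pct_non_events / pct_events))
    return scores

# Hypothetical data: two bins split at 4.
sample_values = [1, 2, 3, 4, 5, 6, 7, 8]
sample_targets = [1, 0, 0, 0, 1, 1, 1, 0]
woe_scores = woe_per_bin(sample_values, sample_targets, upper_edges=[4])
```

In this sample the lower bin holds 1 of 4 events and 3 of 4 non-events, so its WOE is ln(3); the upper bin is the mirror image and gets ln(1/3).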

Benefits of WOE

  • It can treat outliers. Suppose a continuous variable such as annual salary has extreme values of more than 500 million dollars. These values would be grouped into a single class (say, 250 million dollars and above). Later, instead of the raw values, we would use the WOE score of each class.
  • It can handle missing values, because missing values can be binned separately.
  • It handles categorical variables directly, so there is no need for dummy variables.
  • It helps build a linear relationship with the log odds. This is hard to achieve with other transformations such as log or square root. In short, without the WOE transformation you may have to try several transformation methods to get there.
  • It encodes each class (category) as a continuous value.
  • It yields fewer input columns for model training than encodings such as one-hot, because each variable stays a single column.
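To illustrate the points about missing values and categorical variables, here is a small sketch that replaces each category with its WOE score and treats missing entries as their own group. The function name `woe_encode`, the `"MISSING"` label, and the sample data are hypothetical. The sketch assumes every category contains at least one event and one non-event; otherwise the logarithm is undefined and some adjustment would be needed.

```python
import math
from collections import defaultdict

def woe_encode(categories, targets, missing_label="MISSING"):
    """Map each category (None = missing) to its WOE score.

    Returns (woe_by_category, encoded_column).
    Assumes each category has at least one event and one non-event.
    """
    # category -> [event count, non-event count]
    counts = defaultdict(lambda: [0, 0])
    keys = [missing_label if c is None else c for c in categories]
    for key, t in zip(keys, targets):
        counts[key][0 if t == 1 else 1] += 1

    total_events = sum(e for e, _ in counts.values())
    total_non_events = sum(n for _, n in counts.values())

    woe_by_category = {
        key: math.log((n / total_non_events) / (e / total_events))
        for key, (e, n) in counts.items()
    }
    # One numeric column replaces the original categorical column.
    encoded_column = [woe_by_category[key] for key in keys]
    return woe_by_category, encoded_column

# Hypothetical data: three groups, one of them made of missing values.
sample_categories = ["A", "A", "A", "B", "B", None, None, None]
sample_labels = [1, 0, 0, 1, 0, 1, 1, 0]
category_woe, encoded = woe_encode(sample_categories, sample_labels)
```

The result is a single numeric column of the same length as the input, whereas one-hot encoding would produce one column per category.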

Introduction to IV

Information value (IV) is one of the most useful techniques for selecting important variables in a predictive model. It helps rank variables by their importance.

Formula

For a variable x, summing over its groups i:

IVx = ∑i (% of non-events in group i - % of events in group i) * WOEi
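The formula can be checked with a short calculation. The sketch below takes per-group counts and sums the per-group contributions; the function name `information_value` and the example counts are hypothetical, and it assumes every group has at least one event and one non-event.

```python
import math

def information_value(bin_counts):
    """Compute IV from a list of (events, non_events) tuples, one per group.

    Assumes every group has at least one event and one non-event.
    """
    total_events = sum(e for e, _ in bin_counts)
    total_non_events = sum(n for _, n in bin_counts)

    iv = 0.0
    for e, n in bin_counts:
        pct_events = e / total_events
        pct_non_events = n / total_non_events
        woe_i = math.log(pct_non_events / pct_events)
        # Each term is non-negative: the difference and the log share a sign.
        iv += (pct_non_events - pct_events) * woe_i
    return iv

# Hypothetical counts: group 1 has 1 event and 3 non-events, group 2 the reverse.
example_iv = information_value([(1, 3), (3, 1)])

# Identical event and non-event distributions carry no information.
flat_iv = information_value([(2, 2), (2, 2)])
```

For the first example each group contributes 0.5 * ln(3), so the IV equals ln(3). In the second example every WOE is 0, so the IV is 0.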

Rules related to Information Value

Information Value    Variable Predictiveness
Less than 0.02       Not useful for prediction
0.02 to 0.1          Weak predictive power
0.1 to 0.3           Medium predictive power
0.3 to 0.5           Strong predictive power
Greater than 0.5     Suspicious predictive power
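These rules of thumb can be turned into a small lookup helper. The table does not say which band a boundary value such as 0.1 belongs to, so the sketch below assumes each lower bound is inclusive and that exactly 0.5 still counts as strong. The function name `iv_predictiveness` is hypothetical.

```python
def iv_predictiveness(iv):
    """Return the rule-of-thumb label for an information value.

    Assumption: lower bounds are inclusive, and 0.5 itself is "strong".
    """
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive power"
    if iv < 0.3:
        return "Medium predictive power"
    if iv <= 0.5:
        return "Strong predictive power"
    return "Suspicious predictive power"

# Hypothetical IV scores for a handful of candidate variables.
iv_scores = {"age": 0.01, "income": 0.05, "tenure": 0.2, "balance": 0.4, "leak": 0.9}
iv_labels = {name: iv_predictiveness(score) for name, score in iv_scores.items()}
```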

Benefits of IV

  • It provides a basis for drilling further into the relationship between independent and dependent variables.
  • If a variable is qualitative (categorical), we can bin its values and then apply WOE and IV to engineer meaningful features.

References

  1. https://towardsdatascience.com/model-or-do-you-mean-weight-of-evidence-woe-and-information-value-iv-331499f6fc2
  2. https://www.listendata.com/2015/03/weight-of-evidence-woe-and-information.html