Introduction to WOE
WOE (weight of evidence) measures the predictive power of an independent variable with respect to the dependent variable.
Formula
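For each group i:
WOE_i = ln(% of non-events in i / % of events in i)
where % of non-events in i = (non-events in group i) / (total non-events), and % of events in i = (events in group i) / (total events).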
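As a quick arithmetic check of the formula, the sketch below computes WOE for a single group from raw counts. The function name `woe_for_group` and the counts are hypothetical, chosen only for illustration. It assumes the group contains at least one event and one non-event, because otherwise the ratio or the logarithm is undefined.

```python
import math

def woe_for_group(events_i, non_events_i, total_events, total_non_events):
    """WOE for one group i: ln(% of non-events in i / % of events in i)."""
    pct_events = events_i / total_events              # share of all events that fall in group i
    pct_non_events = non_events_i / total_non_events  # share of all non-events that fall in group i
    return math.log(pct_non_events / pct_events)

# Hypothetical counts: 1,000 events and 9,000 non-events overall;
# group i holds 100 events (10%) and 2,700 non-events (30%).
print(round(woe_for_group(100, 2700, 1000, 9000), 4))  # → 1.0986, i.e. ln(3)
```

A positive WOE means the group holds a larger share of the non-events than of the events; a negative WOE means the opposite.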
Steps for Calculating WOE
- For a continuous variable, split the data into N groups (bins), or fewer depending on the distribution.
- Calculate the number of events and non-events in each group (bin).
- Calculate the % of events and % of non-events in each group, relative to the total number of events and the total number of non-events.
- Calculate WOE by taking the natural log of (% of non-events / % of events).
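The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: `woe_table` is a hypothetical helper, and it assumes equal-frequency binning, a 0/1 target where 1 marks an event, no missing values, and that every bin ends up with at least one event and one non-event. Tied values may also be split across adjacent bins, which dedicated binning tools usually avoid.

```python
import math

def woe_table(values, targets, n_bins=10):
    """Equal-frequency binning of a continuous variable, then WOE per bin.

    values  : list of numbers (no missing values in this sketch)
    targets : list of 0/1 labels, 1 = event
    Returns a list of (low, high, events, non_events, woe) tuples.
    """
    pairs = sorted(zip(values, targets))
    n = len(pairs)
    n_bins = min(n_bins, n)  # "or fewer": never more bins than observations
    total_events = sum(t for _, t in pairs)
    total_non_events = n - total_events
    rows = []
    for b in range(n_bins):
        # Step 1: take the b-th slice of the sorted data (roughly equal sizes)
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        # Step 2: count events and non-events in this bin
        events = sum(t for _, t in chunk)
        non_events = len(chunk) - events
        # Step 3: % of all events / non-events that fall in this bin
        pct_events = events / total_events
        pct_non_events = non_events / total_non_events
        # Step 4: WOE = ln(% of non-events / % of events)
        woe = math.log(pct_non_events / pct_events)
        rows.append((chunk[0][0], chunk[-1][0], events, non_events, woe))
    return rows

# Hypothetical data: 8 observations split into 2 bins
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 0, 1, 0, 1, 0, 0]
for low, high, events, non_events, woe in woe_table(x, y, n_bins=2):
    print(low, high, events, non_events, round(woe, 4))
# 1 4 3 1 -1.0986
# 5 8 1 3 1.0986
```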
Benefits of WOE
- It can treat outliers. Suppose you have a continuous variable such as annual salary, with extreme values above 500 million dollars. These values would be grouped into the highest class (say, 250 million dollars and above), and instead of the raw values we would use the WOE score of each class.
- It can handle missing values, as missing values can be binned separately.
- Since the WOE transformation handles categorical variables, there is no need for dummy variables.
- The WOE transformation helps you build a strictly linear relationship with the log odds, which is not easy to achieve with other transformations such as log or square root. Without the WOE transformation, you may have to try several transformation methods to achieve this.
- It encodes categorical values as continuous values.
- It reduces the number of input columns for model training compared with encodings such as one-hot.
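The missing-value and encoding benefits can be illustrated with a categorical variable: each category, plus a separate bin for missing values, is replaced by a single WOE score, so the model receives one numeric column instead of one dummy column per category. This is a minimal sketch, assuming a 0/1 target and that every bin contains both events and non-events; `fit_woe_map`, `transform`, and the `<missing>` label are hypothetical names.

```python
import math
from collections import Counter

MISSING = "<missing>"  # hypothetical label for the separate missing-value bin

def fit_woe_map(categories, targets):
    """Learn one WOE score per category; None values form their own bin."""
    labels = [MISSING if c is None else c for c in categories]
    events = Counter(lab for lab, t in zip(labels, targets) if t == 1)
    non_events = Counter(lab for lab, t in zip(labels, targets) if t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    woe_map = {}
    for label in set(labels):
        pct_events = events[label] / total_events
        pct_non_events = non_events[label] / total_non_events
        woe_map[label] = math.log(pct_non_events / pct_events)
    return woe_map

def transform(categories, woe_map):
    """Replace each raw value with its WOE score (raises KeyError for unseen categories)."""
    return [woe_map[MISSING if c is None else c] for c in categories]

# Hypothetical training data with one missing-value bin
woe_map = fit_woe_map(["A", "A", "A", "B", "B", "B", None, None],
                      [1, 0, 0, 1, 1, 0, 1, 0])
print([round(w, 4) for w in transform(["B", None, "A"], woe_map)])
# [-0.6931, 0.0, 0.6931]
```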
Introduction to IV
Information value (IV) is one of the most useful techniques for selecting important variables in a predictive model. It helps rank variables by their importance.
Formula
For a variable x, summing over all of its groups i:
IV_x = ∑_i (% of non-events in i - % of events in i) * WOE_i
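The IV formula can be sketched directly from per-group counts, and sorting variables by the result gives the ranking mentioned in the introduction to IV. The helper name and the counts for the two hypothetical variables `x1` and `x2` are made up for illustration; as with WOE, every group is assumed to contain both events and non-events. Each term is non-negative, because (% of non-events - % of events) and WOE_i always share the same sign.

```python
import math

def information_value(groups):
    """IV = sum over groups i of (% of non-events in i - % of events in i) * WOE_i.

    groups: list of (events_i, non_events_i) count pairs, one per bin.
    """
    total_events = sum(e for e, _ in groups)
    total_non_events = sum(ne for _, ne in groups)
    iv = 0.0
    for events_i, non_events_i in groups:
        pct_events = events_i / total_events
        pct_non_events = non_events_i / total_non_events
        woe_i = math.log(pct_non_events / pct_events)
        iv += (pct_non_events - pct_events) * woe_i
    return iv

# Hypothetical binned counts (events, non-events) for two candidate variables
candidates = {
    "x1": [(40, 60), (60, 40)],
    "x2": [(48, 52), (52, 48)],
}
ranked = sorted(candidates, key=lambda name: information_value(candidates[name]), reverse=True)
print(ranked)  # ['x1', 'x2']
```

Here `x1` gets an IV of about 0.16 and `x2` about 0.006, so `x1` ranks first.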
Rules related to Information Value
Information Value | Variable Predictiveness |
---|---|
Less than 0.02 | Not useful for prediction |
0.02 to 0.1 | Weak predictive power |
0.1 to 0.3 | Medium predictive power |
0.3 to 0.5 | Strong predictive power |
More than 0.5 | Suspicious predictive power |
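The rule-of-thumb table can be turned into a small lookup. The function name is hypothetical, and because the table's ranges share their endpoints, the sketch assumes each band includes its lower bound (so exactly 0.1 counts as medium), while exactly 0.5 still counts as strong since only values above 0.5 are flagged as suspicious.

```python
def iv_predictiveness(iv: float) -> str:
    """Map an Information Value to the rule-of-thumb predictiveness bands."""
    # Assumption: bands include their lower bound; only IV > 0.5 is "suspicious".
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive power"
    if iv < 0.3:
        return "Medium predictive power"
    if iv <= 0.5:
        return "Strong predictive power"
    return "Suspicious predictive power"

print(iv_predictiveness(0.16))  # Medium predictive power
```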
Benefits of IV
- It provides a basis for drilling down further into the relationship between the independent and dependent variables.
- If a variable is qualitative (categorical), we can group its values into bins and then apply WoE and IV to engineer meaningful features.