Stepwise Selection

In statistics, stepwise selection is a procedure for building a regression model from a set of predictor variables by entering and removing predictors one at a time until there is no statistically valid reason to enter or remove any more.

Backward Selection

One of the most commonly used stepwise selection methods is known as backward selection, which works as follows:

Step 1: Fit a regression model using all p predictor variables. Calculate the AIC* value for the model.

Step 2: Remove the predictor variable that leads to the largest reduction in AIC and also leads to a statistically significant reduction in AIC compared to the model with all p predictor variables.

Step 3: Remove the predictor variable that leads to the largest reduction in AIC and also leads to a statistically significant reduction in AIC compared to the model with p-1 predictor variables.

Repeat the process until removing any predictor variable no longer leads to a statistically significant reduction in AIC.

*There are several metrics you could use to calculate the quality of fit of a regression model including cross-validation prediction error, Cp, BIC, AIC, or adjusted R2. In the example below we choose to use AIC.
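To see what a single elimination step looks like, here is a minimal sketch using R's built-in drop1() function, which reports the AIC of the model after dropping each predictor in turn:

# fit the full model with all predictors
all <- lm(mpg ~ ., data = mtcars)

# for each predictor, show the AIC of the model with that predictor dropped;
# test = "F" adds an F-test for the significance of each removal
drop1(all, test = "F")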

The following example shows how to perform backward selection in R.

Example: Backward Selection in R

For this example we’ll use the built-in mtcars dataset in R:

# view first six rows of mtcars
head(mtcars)

We will fit a multiple linear regression model using mpg (miles per gallon) as the response variable and all of the other 10 variables in the dataset as potential predictor variables.

The following code shows how to perform backward stepwise selection:

# define model with all predictors
all <- lm(mpg ~ ., data = mtcars)

# perform backward stepwise regression
backward <- step(all, direction = 'backward', scope = formula(all), trace = 0)

# view results of backward stepwise regression
backward$anova

# view final model
backward$coefficients

Here is how to interpret the results:

  • First, we fit a model using all 10 predictor variables and calculated the AIC of the model.

  • Next, we removed the variable (cyl) that led to the greatest reduction in AIC and also had a statistically significant reduction in AIC compared to the 10-predictor model.

  • Next, we removed the variable (vs) that led to the greatest reduction in AIC and also had a statistically significant reduction in AIC compared to the 9-predictor model.

  • Next, we removed the variable (carb) that led to the greatest reduction in AIC and also had a statistically significant reduction in AIC compared to the 8-predictor model.

We repeated this process until removing any variable no longer led to a statistically significant reduction in AIC.

The final model turns out to be:
mpg = 9.62 - 3.92*wt + 1.23*qsec + 2.94*am
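As a quick check, we can refit this final model directly and confirm the coefficients (a minimal sketch using the three retained predictors):

# refit the final backward-selected model directly
final_backward <- lm(mpg ~ wt + qsec + am, data = mtcars)

# coefficients should match those shown above
coef(final_backward)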

A Note on Using AIC

In the previous example, we chose to use AIC as the metric for evaluating the fit of various regression models.

AIC stands for Akaike information criterion and is calculated as:
AIC = 2K - 2ln(L)
where:

  • K: The number of model parameters.
  • ln(L): The log-likelihood of the model. This tells us how likely the model is, given the data.
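To make the formula concrete, here is a minimal sketch that computes AIC by hand for a simple model and compares it to R's built-in AIC() function (note that for a linear model R counts the residual standard deviation as one of the K parameters):

# fit a simple one-predictor model
fit <- lm(mpg ~ wt, data = mtcars)

# extract the log-likelihood and the parameter count K
# (K = regression coefficients plus the residual standard deviation)
ll <- logLik(fit)
K <- attr(ll, "df")

# AIC by hand: 2K - 2ln(L)
2 * K - 2 * as.numeric(ll)

# should match R's built-in calculation
AIC(fit)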

However, there are other metrics you might choose to use to evaluate the fit of regression models, including cross-validation prediction error, Cp, BIC, or adjusted R2.

Fortunately, most statistical software allows you to specify which metric you would like to use when performing backward selection.
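In R, for example, the step() function uses AIC by default, but setting its k argument to log(n) replaces the AIC penalty with the BIC penalty (a minimal sketch, assuming the all model defined earlier):

# backward selection using BIC instead of AIC:
# k = log(n) turns step()'s per-parameter penalty into the BIC penalty
backward_bic <- step(all, direction = 'backward', k = log(nrow(mtcars)), trace = 0)

# view final model
backward_bic$coefficients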

Forward Selection

The goal of stepwise selection is to build a regression model that includes all of the predictor variables that are statistically significantly related to the response variable.

One of the most commonly used stepwise selection methods is known as forward selection, which works as follows:

Step 1: Fit an intercept-only regression model with no predictor variables. Calculate the AIC* value for the model.

Step 2: Fit every possible one-predictor regression model. Identify the model that produced the lowest AIC and also had a statistically significant reduction in AIC compared to the intercept-only model.

Step 3: Fit every possible two-predictor regression model. Identify the model that produced the lowest AIC and also had a statistically significant reduction in AIC compared to the one-predictor model.

Repeat the process until fitting a regression model with more predictor variables no longer leads to a statistically significant reduction in AIC.

*There are several metrics you could use to calculate the quality of fit of a regression model including cross-validation prediction error, Cp, BIC, AIC, or adjusted R2.

In the example below we choose to use AIC.
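To see what a single forward step looks like, here is a minimal sketch using R's built-in add1() function, which reports the AIC after adding each candidate predictor to the current model (this mirrors Step 2 above):

# define the intercept-only model and the full set of candidates
intercept_only <- lm(mpg ~ 1, data = mtcars)
all <- lm(mpg ~ ., data = mtcars)

# for each candidate predictor, show the AIC of the model with it added;
# test = "F" adds an F-test for the significance of each addition
add1(intercept_only, scope = formula(all), test = "F")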

The following example shows how to perform forward selection in R.

Example: Forward Selection in R

For this example we’ll use the built-in mtcars dataset in R:

We will fit a multiple linear regression model using mpg (miles per gallon) as the response variable and all of the other 10 variables in the dataset as potential predictor variables.

The following code shows how to perform forward stepwise selection:

# view first six rows of mtcars
head(mtcars)

# define intercept-only model
intercept_only <- lm(mpg ~ 1, data = mtcars)

# define model with all predictors
all <- lm(mpg ~ ., data = mtcars)

# perform forward stepwise regression
forward <- step(intercept_only, direction = 'forward', scope = formula(all), trace = 0)

# view results of forward stepwise regression
forward$anova

# view final model
forward$coefficients
Here is how to interpret the results:

  • First, we fit the intercept-only model. This model had an AIC of 115.94345.
  • Next, we fit every possible one-predictor model. The model that produced the lowest AIC and also had a statistically significant reduction in AIC compared to the intercept-only model used the predictor wt. This model had an AIC of 73.21736.
  • Next, we fit every possible two-predictor model. The model that produced the lowest AIC and also had a statistically significant reduction in AIC compared to the single-predictor model added the predictor cyl. This model had an AIC of 63.19800.
  • Next, we fit every possible three-predictor model. The model that produced the lowest AIC and also had a statistically significant reduction in AIC compared to the two-predictor model added the predictor hp. This model had an AIC of 62.66456.
  • Next, we fit every possible four-predictor model. It turned out that none of these models produced a significant reduction in AIC, thus we stopped the procedure.

Thus, the final model turns out to be:

mpg = 38.75 - 3.17*wt - 0.94*cyl - 0.02*hp

It turns out that attempting to add more predictor variables to the model does not lead to a statistically significant reduction in AIC.

Thus, we conclude that the best model is the one with three predictor variables: wt, cyl, and hp.
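As before, we can refit the selected model directly to confirm the coefficients (a minimal sketch):

# refit the final forward-selected model directly
final_forward <- lm(mpg ~ wt + cyl + hp, data = mtcars)

# coefficients should match those shown above
coef(final_forward)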
