How to choose the features?
- Fit a multiple linear regression model on all the provided features and keep those that are significant at the 0.001 level (or, less strictly, at the 0.01 or 0.05 level).
- Plot the dependent variable against each chosen feature and look for a non-linear relationship (for example logarithmic, exponential, or polynomial).
- Build the correlation matrix of the features and add an interaction term (the product of the two features) for each highly correlated pair.
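The three steps above can be sketched compactly as follows. This is only a minimal illustration, not the full implementation: it uses the original `MASS::Boston` column names, assumes `medv` is the dependent variable, and the `alpha` and `threshold` values are arbitrary choices made for the example.

```r
# Minimal sketch of the three feature-selection steps (assumptions noted inline).
library(MASS)  # provides the Boston data frame

# Step 1: fit a linear model on all features and keep the significant ones.
# Assumption: medv is the dependent variable.
full_fit <- lm(medv ~ ., data = Boston)
p_values <- summary(full_fit)$coefficients[-1, "Pr(>|t|)"]  # drop the intercept row
alpha <- 0.001  # assumed threshold; 0.01 or 0.05 are looser alternatives
chosen <- names(p_values)[p_values < alpha]

# Step 2: plot the dependent variable against each chosen feature
# to look for curvature by eye.
for (feature in chosen) {
  plot(Boston[[feature]], Boston$medv, xlab = feature, ylab = "medv")
}

# Step 3: correlation matrix of the chosen features, then collect the
# highly correlated pairs (upper triangle only, so each pair appears once).
cor_matrix <- cor(Boston[, chosen, drop = FALSE])
threshold <- 0.7  # assumed cutoff for "high" correlation
high <- which(abs(cor_matrix) > threshold & upper.tri(cor_matrix), arr.ind = TRUE)
high_pairs <- data.frame(a = rownames(cor_matrix)[high[, 1]],
                         b = colnames(cor_matrix)[high[, 2]])

# Add one interaction term per highly correlated pair and refit.
interaction_terms <- if (nrow(high_pairs) > 0) {
  paste(high_pairs$a, high_pairs$b, sep = ":")
} else {
  character(0)
}
rhs <- paste(c(chosen, interaction_terms), collapse = " + ")
refit <- lm(as.formula(paste("medv ~", rhs)), data = Boston)
summary(refit)
```

Writing the interaction as `a:b` alongside the main effects keeps both features in the model and adds only their product, which matches the "multiply the highly correlated features" idea in the last step.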
General Implementation with R
# import some necessary packages
library(haven) # used to load our data
library(texreg) # used to display fit info
library(dplyr) # used to manipulate data
library(tidyr) # used for the drop_na function
library(ggplot2) # in case we want to make ggplots
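library(caTools) # utilities such as sample.split for train/test splits
library(MASS) # provides the Boston dataset (loaded after dplyr, so it masks dplyr::select)
library(corrgram) # used to draw correlograms of the correlation matrix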
# import Boston dataset
boston_df <- Boston
# change the name of the columns
names(boston_df) <- c("crime", "zoned_bigger_25000", "non_retail_proportion","chas_river", "nitrogen_density", "average_room_number", "built_before_1940_ratio", "distance_to_centre", "ac