1. Introduction
An earlier post gave a brief explanation of multicollinearity. This post uses R code to test for it.
2. Program Analysis
Load the data into a data frame and display its contents:
x <- read.csv(file = "C:/R_files/Employee_salary.csv", header = TRUE)
head(x)
names(x)
Draw a scatter plot for a first look at the data:
plot(x$experience,x$salary, xlab="Experience", ylab="Salary", main="Scatter Plot")
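Create the dummy-variable column for gender:
# 1 = female, 0 = otherwise (no need to initialize the column with NULL first)
dummyGender <- as.integer(x$gender == "female")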
Add it to the data set as the first column:
x1<-cbind(dummyGender, x)
head(x1)
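R generates dummy variables for factors on its own, so gender can be passed to the model directly as as.factor(gender).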
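As a minimal sketch of that automatic coding, the example below builds the design matrix for a small made-up data frame (the toy values are hypothetical and do not come from Employee_salary.csv). It assumes the gender column holds the strings "female" and "male"; in that case R treats "female" as the reference level, so the generated column flags "male", which is the mirror image of the manual dummyGender column.

```r
# Toy data frame (made-up values, not taken from Employee_salary.csv)
toy <- data.frame(
  salary     = c(40, 52, 47, 61, 58, 70),
  experience = c(1, 3, 2, 6, 5, 9),
  gender     = c("female", "male", "female", "male", "female", "male")
)

# model.matrix() returns the design matrix that lm() builds from the formula
design <- model.matrix(salary ~ experience + as.factor(gender), data = toy)
print(colnames(design))

# Manual indicator for comparison: "female" is the reference level
# (alphabetically first), so R's generated column flags "male" instead
dummy_female <- as.integer(toy$gender == "female")
print(all(design[, "as.factor(gender)male"] == 1 - dummy_female))
```

Because the reference level differs, a coefficient estimated on the manual female dummy would have the opposite sign of the coefficient lm() reports for the male level.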
Fit the linear model:
fit1<-lm(salary~experience+as.factor(gender), x)
summary(fit1)
Test the interaction effect between experience and gender:
fit1<-lm(salary~experience+as.factor(gender)+experience:as.factor(gender), x)
summary(fit1)
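Written out, the interaction model fitted above has the following form. The symbols are introduced here for illustration only: G_i stands for the 0/1 indicator R creates for the non-reference gender level, and the beta coefficients correspond to the rows of the summary() output.

```latex
\text{salary}_i = \beta_0 + \beta_1\,\text{experience}_i + \beta_2\,G_i
                + \beta_3\,(\text{experience}_i \times G_i) + \varepsilon_i
```

Under this notation the slope of salary on experience is \beta_1 for the reference group and \beta_1 + \beta_3 for the other group, so a \beta_3 that is not significantly different from zero suggests the two groups share the same experience effect.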
Test age on its own, then age together with experience (without an interaction term):
fit2<-lm(salary~age+as.factor(gender), x)
summary(fit2)
fit3<-lm(salary~experience+age+as.factor(gender), x)
summary(fit3)
Correlation matrix:
# Correlation matrix
# For a better visualization, use the PerformanceAnalytics package
install.packages("PerformanceAnalytics")
install.packages("zoo")
install.packages("xts")
library("zoo")
library("xts")
library("PerformanceAnalytics")
my_data <- x1[, c(1,3,4,5)]
chart.Correlation(my_data, histogram=TRUE, pch= 19)
cor(my_data)
Multicollinearity:
# Multicollinearity
install.packages("mctest")
library(mctest)
my_data1 <- x1[, c(1,3,5)]
# Individual multicollinearity diagnostic measures:
# computes diagnostics such as tolerance (TOL), variance inflation factor (VIF),
# corrected VIF (CVIF), and Klein's rule
imcdiag(x = my_data1, y = x1$salary)
# Try imcdiag() on the fitted model with three independent variables, fit3.
# If the x/y call above fails with your installed mctest version, this form works instead.
imcdiag(fit3)
?imcdiag
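To make the TOL and VIF figures reported by imcdiag() less of a black box, here is a minimal sketch that computes them by hand using the standard definitions VIF_j = 1 / (1 - R_j^2) and TOL_j = 1 / VIF_j, where R_j^2 comes from regressing predictor j on the remaining predictors. The data are simulated and purely illustrative: the variable names mirror the tutorial, but the values and the assumed link between age and experience are made up.

```r
set.seed(1)  # simulated data, purely illustrative
n <- 200
experience <- rnorm(n, mean = 10, sd = 3)
age <- 22 + experience + rnorm(n, sd = 1)  # deliberately tied to experience
female <- rbinom(n, 1, 0.5)
preds <- data.frame(experience, age, female)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors
manual_vif <- sapply(names(preds), function(v) {
  aux_formula <- reformulate(setdiff(names(preds), v), response = v)
  r2 <- summary(lm(aux_formula, data = preds))$r.squared
  1 / (1 - r2)
})

# Tolerance is the reciprocal of VIF
tolerance <- 1 / manual_vif
print(round(rbind(VIF = manual_vif, TOL = tolerance), 3))
```

In this simulation experience and age should show clearly inflated VIF values while female stays close to 1, which is the pattern to look for when age and experience are both included as in fit3.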