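Background: I have a sparse, high-dimensional predictor matrix and a matching response vector, and I want to do variable selection on the predictor matrix using different penalties.
Packages used: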
library(readr)
library(glmnet)
Update, 4.19
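# Read the GBK-encoded CSV; column 1 is the response, columns 2-304 are the predictors
B <- read_csv("B.csv", col_names = TRUE, locale = locale(encoding = "GBK"))
B <- as.matrix(B)
# Lasso path (alpha = 1) for a logistic model
fit <- glmnet(B[, 2:304], B[, 1], family = "binomial", alpha = 1)
plot(fit, xvar = "dev", label = TRUE)
# 10-fold cross-validation (the default type.measure for binomial is deviance)
cvfit <- cv.glmnet(B[, 2:304], B[, 1], family = "binomial", nfolds = 10, trace.it = 1)
# plot() for a cv.glmnet object takes no xvar/label arguments
plot(cvfit)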
# 5-fold cross-validation scored by MSE
cvfit <- cv.glmnet(B[, 2:304], B[, 1],
                   family = "binomial", type.measure = "mse", nfolds = 5, trace.it = 1)
# type.measure = "mse" or "auc": not the full list, but these are the only ones I need.
# See ?cv.glmnet for the other options.
plot(cvfit)
cvfit$lambda.min
log(cvfit$lambda.min)
cvfit$lambda.1se
log(cvfit$lambda.1se)
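# Coefficients at the two selected lambdas; mode = "step" is not a glmnet argument, so it is dropped
coef1 <- coef(fit, s = cvfit$lambda.min)
# @i: 0-based row indices of the nonzero coefficients (0 is the intercept); @x: their values
coef1@i; coef1@x
coef2 <- coef(fit, s = cvfit$lambda.1se)
coef2@i; coef2@x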
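The `@i` and `@x` slots give positions and values but not variable names. Below is a minimal sketch of mapping them back to names. It runs on simulated data, not on B.csv; the column names `V1`...`V30` and the value `s = 0.05` are arbitrary assumptions for illustration.

```r
library(glmnet)

# Simulated stand-in data (hypothetical; not the author's B.csv)
set.seed(1)
n <- 200
p <- 30
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("V", 1:p)))
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - 1.5 * x[, 2]))

fit <- glmnet(x, y, family = "binomial", alpha = 1)

# Sparse (p + 1) x 1 coefficient matrix; s = 0.05 is an arbitrary lambda
cf <- coef(fit, s = 0.05)

# @i is 0-based, so add 1 before indexing rownames(); row 1 is the intercept
nz <- cf@i + 1
selected <- data.frame(variable = rownames(cf)[nz], coefficient = cf@x)
selected <- selected[selected$variable != "(Intercept)", ]
print(selected)
```

With the real data, replacing `s = 0.05` with `cvfit$lambda.min` or `cvfit$lambda.1se` should list the variables kept at those lambdas.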
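The penalty is chosen through the alpha argument: alpha = 1 is the Lasso, alpha = 0 is ridge, and values in between give the elastic net. When tuning for accuracy later on, the other penalties can be tried.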
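As a rough sketch of what trying other penalties could look like, the code below compares a few alpha values on the same fold assignment, so that differences come from the penalty rather than from the random split. The data are simulated, and the alpha grid, fold count and use of deviance are assumptions, not the settings used above.

```r
library(glmnet)

# Simulated stand-in data (hypothetical; not the author's B.csv)
set.seed(2)
n <- 300
p <- 40
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2] + 0.5 * x[, 3]))

# One fixed fold assignment shared by every alpha
foldid <- sample(rep(1:5, length.out = n))

alphas <- c(0, 0.5, 1)  # ridge, elastic net, Lasso
res <- do.call(rbind, lapply(alphas, function(a) {
  cv <- cv.glmnet(x, y, family = "binomial", alpha = a,
                  foldid = foldid, type.measure = "deviance")
  best <- which.min(cv$cvm)  # index of the lambda with the lowest CV deviance
  data.frame(alpha = a,
             lambda.min = cv$lambda[best],
             cv.deviance = cv$cvm[best],
             nonzero = unname(cv$nzero[best]))
}))
print(res)
```

The `nonzero` column shows how many variables each penalty keeps at its best lambda, which is the quantity of interest for variable selection.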
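Problems encountered:
The cross-validation results are not very stable, with either 10 folds or 5 folds. The sample size is 1350+, which counts as a large sample, so 5 folds should be enough. Even so, the AUC curve changes a lot from run to run, whereas the lambda selected by MSE is fairly stable.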
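One common way to reduce this kind of run-to-run noise is to repeat the cross-validation over several random fold assignments on a shared lambda sequence and average the curves. The following is a sketch on simulated data, not a fix verified on B.csv; `n_rep = 10` and the data-generating model are arbitrary assumptions.

```r
library(glmnet)

# Simulated stand-in data (hypothetical; not the author's B.csv)
set.seed(3)
n <- 400
p <- 50
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2] + 0.5 * x[, 3]))

# Shared lambda sequence, taken from a fit on the full data,
# so that the CV curves from different repeats line up
lambda <- glmnet(x, y, family = "binomial", alpha = 1)$lambda

n_rep <- 10  # number of repeated 5-fold CV runs (arbitrary)
auc_mat <- sapply(seq_len(n_rep), function(r) {
  foldid <- sample(rep(1:5, length.out = n))  # new random split per repeat
  cv <- cv.glmnet(x, y, family = "binomial", alpha = 1, lambda = lambda,
                  foldid = foldid, type.measure = "auc")
  # cv$cvm can be shorter than lambda if the path is cut short; pad with NA
  out <- rep(NA_real_, length(lambda))
  out[seq_along(cv$cvm)] <- cv$cvm
  out
})

# Average AUC per lambda across repeats, then pick the best lambda
mean_auc <- rowMeans(auc_mat, na.rm = TRUE)
lambda_best <- lambda[which.max(mean_auc)]
lambda_best
```

Calling `set.seed()` before each `cv.glmnet` run, or passing a fixed `foldid`, also makes a single run reproducible, although that only hides the variability and does not reduce it.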