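Model selection in R: validating models and choosing parameters with AIC, BIC, adjusted R², 10-fold cross-validation, LOOCV and more, using the Fama-French three-factor model and the CAPM as examples @理科班的习习同学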
Loading packages and preprocessing the data
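install.packages("leaps") # best-subset regression; only needs to be installed once
install.packages("car") # vif() and other regression diagnostics
install.packages("caret") # cross-validation utilities
library("car")
library("leaps")
library("heavy") # not installed above; install it separately if it is missing
library("caret")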
mt=read.csv("XIXI-Data.csv")
mt$X=as.Date(mt$X,"%Y/%m/%d") # convert the dates to Date objects and write them back to column X
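ff=read.csv("F-F_Research_Data_Factors_daily.csv",header=TRUE,skip=4) # load the three factors; skip=4 skips the descriptive lines above the column header
# Ken French's raw file stores dates as YYYYMMDD integers. For the merge below to match, ff$X must use the same Date format as mt$X,
# e.g. ff$X=as.Date(as.character(ff$X),"%Y%m%d") if your copy has not been reformatted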
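dat=merge(mt,ff,by="X") # join the two datasets on date (inner join); if your data is already a single dataset, skip loading ff and this merge
# Ken French's factor returns are in percent, while the returns computed below are decimals;
# if your copy has not been converted, divide Mkt.RF, SMB, HML and RF by 100 before the regression
attach(dat) # lets the code below refer to columns such as Adj.Close and RF by name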
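A minimal sketch of the date handling and merge on toy data. It assumes, hypothetically, that the stock file writes dates like `2020/1/2` and that the factor file keeps Ken French's raw `YYYYMMDD` integers and percent units; the names `mt_toy`, `ff_toy`, `dat_toy` and every value are made up for illustration.

```r
# Toy stand-ins for the two files (hypothetical values, not real data)
mt_toy <- data.frame(X = c("2020/1/2", "2020/1/3", "2020/1/6"),
                     Adj.Close = c(100, 102, 99.96))
ff_toy <- data.frame(X = c(20200102L, 20200103L, 20200107L),
                     Mkt.RF = c(0.86, -0.67, 0.33),
                     RF = c(0.006, 0.006, 0.006))

# Parse both key columns into Date objects so they can be compared
mt_toy$X <- as.Date(mt_toy$X, "%Y/%m/%d")
ff_toy$X <- as.Date(as.character(ff_toy$X), "%Y%m%d")

# merge() is an inner join by default: only dates present in both frames survive,
# and the result is sorted by the key, i.e. in date order
dat_toy <- merge(mt_toy, ff_toy, by = "X")

# Convert the factor columns from percent to decimals
dat_toy[c("Mkt.RF", "RF")] <- dat_toy[c("Mkt.RF", "RF")] / 100
print(dat_toy)
```

Here 2020-01-06 and 2020-01-07 each appear in only one frame, so both rows are dropped; checking `nrow()` before and after the merge is a quick way to catch mismatched date formats.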
Regression
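The Fama-French three-factor model is:
$$R_t - \mu_t = \alpha + \beta_1 (R_{M,t} - \mu_t) + \beta_2 \, SMB_t + \beta_3 \, HML_t + \epsilon_t$$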
A brief explanation of the terms:
Here, $R_t$, $R_{M,t}$ and $\mu_t$ are the net returns of the asset, the market portfolio and the risk-free asset from $t-1$ to $t$, respectively. The market excess return $R_{M,t} - \mu_t$ is the first risk factor; the second and third are SMB (small minus big) and HML (high minus low). SMB is the difference between the returns on a portfolio of small stocks and a portfolio of large stocks, where "small" and "big" refer to a stock's market value. HML is the difference between the returns on a portfolio of high book-to-market stocks and a portfolio of low book-to-market stocks. $SMB_t$ and $HML_t$ are these return differences over the period $(t-1, t]$.
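To see what the regression described above estimates, the sketch below simulates data from the three-factor model with known coefficients, turns the asset returns into a price path, recovers net returns with the same `[-1]` / `[-n]` indexing used in this post, and checks that `lm()` gets the coefficients back. All parameter values and the `set.seed` choice are assumptions for illustration; this is not the author's data.

```r
set.seed(1)
n_days <- 500

# Simulated daily factor returns in decimals (hypothetical)
mkt_rf <- rnorm(n_days, mean = 0.0004, sd = 0.01)
smb    <- rnorm(n_days, mean = 0, sd = 0.005)
hml    <- rnorm(n_days, mean = 0, sd = 0.005)
rf     <- rep(0.0001, n_days)

# Assumed true parameters: alpha and (beta1, beta2, beta3)
alpha <- 0
b <- c(1.1, 0.3, -0.2)
eps <- rnorm(n_days, mean = 0, sd = 0.002)
r_asset <- rf + alpha + b[1] * mkt_rf + b[2] * smb + b[3] * hml + eps

# Build a price path from the returns (n_days + 1 prices)
price <- 100 * cumprod(c(1, 1 + r_asset))
m <- length(price)

# Same construction as the post: later price over earlier price, minus 1
r_net <- price[-1] / price[-m] - 1
excess <- r_net - rf  # rf is already aligned with r_net here, so no [-1] is needed

fit <- lm(excess ~ mkt_rf + smb + hml)
print(round(coef(fit), 3))  # should be close to 0, 1.1, 0.3, -0.2
```

In the real data the first row has a price but no return, which is why the post drops it from `RF`, `Mkt.RF`, `SMB` and `HML` with `[-1]`.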
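n=nrow(dat) # number of observations (rows)
R_g=Adj.Close[-1]/Adj.Close[-n] # gross return: each day's adjusted close divided by the previous day's (merge() sorted the rows by date)
R_n=R_g-1 # net return
lef_term=(R_n-RF[-1]) # left-hand side of the three-factor model: excess return over the risk-free rate
# [-1] drops the first day, which has no return, so RF and the factors line up with R_n
reg1=lm(lef_term~Mkt.RF[-1]+SMB[-1]+HML[-1]) # fit the three-factor regression
vif(reg1) # variance inflation factors (car package)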
# Interpreting the VIF output:
# The VIF for Mkt.RF is 1.056693: the variance (squared standard error) of its coefficient estimate is
# 1.056693 times what it would be if Mkt.RF were uncorrelated with the other predictors.
# The other two VIFs are read the same way. All three are small, which indicates that collinearity does not
# noticeably inflate the standard errors of the coefficient estimates, so there is no multicollinearity problem.
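The reading of VIF above rests on the identity $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors. The sketch below computes it that way on simulated, mildly correlated predictors (hypothetical data, not the post's factors) and checks it against the equivalent shortcut, the diagonal of the inverse correlation matrix. For numeric predictors like these, `car::vif()` on a model using them should report the same values.

```r
set.seed(2)
n_obs <- 300

# Simulated predictors; x2 is mildly correlated with x1 (hypothetical data)
x1 <- rnorm(n_obs)
x2 <- 0.3 * x1 + rnorm(n_obs)
x3 <- rnorm(n_obs)
X <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other predictors
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse of the predictors' correlation matrix
vif_corr <- diag(solve(cor(X)))

print(round(vif_manual, 4))
print(round(vif_corr, 4))
```

A VIF of exactly 1 means the predictor is uncorrelated with the others; values near 1, as for the three factors here, mean collinearity adds almost nothing to the coefficient variances.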
summary
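The title lists AIC, BIC, adjusted R², 10-fold cross-validation and LOOCV as criteria for choosing between the CAPM (market factor only) and the three-factor model. As a hedged sketch in base R, not the author's own code, the block below computes all five on simulated data; the helper names `loocv` and `cv_mse` and all parameter values are hypothetical. LOOCV uses the exact OLS shortcut $\frac{1}{n}\sum_i \left(e_i / (1 - h_{ii})\right)^2$, so no refitting is needed.

```r
set.seed(3)
n_days <- 400

# Simulated decimal factor returns and excess asset returns (hypothetical)
d <- data.frame(Mkt.RF = rnorm(n_days, 0, 0.01),
                SMB    = rnorm(n_days, 0, 0.005),
                HML    = rnorm(n_days, 0, 0.005))
d$excess <- 1.0 * d$Mkt.RF + 0.4 * d$SMB - 0.3 * d$HML + rnorm(n_days, 0, 0.003)

capm <- lm(excess ~ Mkt.RF, data = d)             # CAPM: market factor only
ff3  <- lm(excess ~ Mkt.RF + SMB + HML, data = d) # three-factor model

# Exact LOOCV MSE for OLS via the hat values (leverages)
loocv <- function(fit) mean((residuals(fit) / (1 - hatvalues(fit)))^2)

# k-fold CV MSE; fold labels are passed in so both models use identical splits
cv_mse <- function(formula, data, folds) {
  y <- data[[all.vars(formula)[1]]]  # response column named in the formula
  sq_err <- numeric(nrow(data))
  for (f in unique(folds)) {
    held_out <- folds == f
    fit <- lm(formula, data = data[!held_out, ])
    sq_err[held_out] <- (y[held_out] - predict(fit, newdata = data[held_out, ]))^2
  }
  mean(sq_err)
}
folds <- sample(rep(1:10, length.out = n_days))  # 10 roughly equal folds

comparison <- data.frame(
  model = c("CAPM", "FF3"),
  AIC   = c(AIC(capm), AIC(ff3)),
  BIC   = c(BIC(capm), BIC(ff3)),
  adjR2 = c(summary(capm)$adj.r.squared, summary(ff3)$adj.r.squared),
  LOOCV = c(loocv(capm), loocv(ff3)),
  CV10  = c(cv_mse(formula(capm), d, folds), cv_mse(formula(ff3), d, folds))
)
print(comparison)
```

Lower AIC, BIC, LOOCV and CV10, and higher adjusted R², favour a model. The caret package loaded at the top offers a packaged alternative through `train()` with `trainControl(method = "cv", number = 10)` or `trainControl(method = "LOOCV")`.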