一、获取数据
> library(quantmod)
> codedata<-list()
> codepool<-c("MMM","AXP","AAPL","BA","CAT","CVX","CSCO","KO","DD","XOM","GE","GS","HD","IBM","INTC","JNJ","JPM","MCD","MRK","MSFT","NKE","PFE","PG","TRV","UTX","UNH","VZ","WMT","V","DIS")
> number<-length(codepool)
> for(i in 1:number){
+ n<-codepool[i]
+ TESTdata<-get(getSymbols(n,src="yahoo",from=Sys.Date()-50,to=Sys.Date()))
+ codedata[[i]]<-as.data.frame(TESTdata[,4])
+ next}
> finaldata<-do.call(cbind,codedata)
> names(finaldata)<-codepool
> head(finaldata,3)
MMM AXP AAPL BA CAT CVX CSCO KO DD XOM GE GS HD IBM INTC JNJ JPM MCD MRK MSFT NKE PFE PG TRV
2017-01-23 178.51 75.97 120.08 157.84 94.46 115.39 30.27 41.43 72.78 84.97 29.75 232.67 138.07 171.03 36.77 113.91 83.71 121.38 61.81 62.96 53.24 31.46 86.96 118.04
2017-01-24 175.97 77.43 119.97 160.55 96.24 116.37 30.60 41.90 76.05 85.09 30.00 233.68 138.06 175.90 37.62 111.76 84.72 121.05 61.21 63.52 53.45 31.15 87.86 116.84
2017-01-25 176.73 76.89 121.88 167.36 98.15 117.24 30.70 42.12 76.67 85.34 30.37 237.25 137.48 178.29 37.80 112.80 86.03 121.79 61.08 63.68 53.86 31.29 87.16 117.67
UTX UNH VZ WMT V DIS
2017-01-23 110.34 159.07 52.41 66.65 82.15 107.12
2017-01-24 111.61 160.43 50.12 67.40 83.23 107.90
2017-01-25 110.96 161.24 49.77 66.89 83.90 108.04
> tail(finaldata,3)
MMM AXP AAPL BA CAT CVX CSCO KO DD XOM GE GS HD IBM INTC JNJ JPM MCD MRK MSFT NKE PFE PG TRV
2017-03-08 189.51 79.04 139.00 181.74 93.23 109.61 34.02 41.99 79.77 81.03 29.80 250.24 146.92 179.45 35.62 124.10 91.21 128.09 65.80 64.99 56.51 33.91 90.14 121.24
2017-03-09 189.90 79.30 138.68 180.57 91.39 110.04 34.07 42.03 80.48 81.67 29.66 250.18 146.62 177.18 35.82 125.95 91.57 128.14 65.89 64.73 56.36 34.05 90.34 121.90
2017-03-10 191.21 79.38 139.14 178.70 92.31 110.61 34.26 42.29 80.86 81.61 30.28 248.38 146.85 177.83 35.91 126.21 91.28 127.98 65.60 64.93 56.43 34.11 91.07 122.83
UTX UNH VZ WMT V DIS
2017-03-08 111.75 167.91 49.16 69.80 88.96 110.84
2017-03-09 111.93 168.01 49.28 69.86 89.11 111.03
2017-03-10 112.14 169.98 49.35 70.10 89.73 110.92
道琼斯指数共有30只成分股,取各个成分股最近50天的收盘价。
二、主成分分析
> p<-princomp(finaldata[,1:30])
> summary(p)
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12
Standard deviation 16.1041328 4.22688825 3.50222384 2.20689405 2.01370950 1.459120153 1.298793653 0.991546988 0.858445383 0.752024098 0.679270196 0.579182056
Proportion of Variance 0.8458697 0.05827339 0.04000521 0.01588517 0.01322581 0.006944019 0.005501855 0.003206678 0.002403555 0.001844558 0.001504922 0.001094105
Cumulative Proportion 0.8458697 0.90414310 0.94414831 0.96003348 0.97325929 0.980203314 0.985705168 0.988911847 0.991315402 0.993159960 0.994664882 0.995758987
主成分分析法生成了30个新变量,前面3个变量可以解释到原数据集94.41%(0.8458697+0.05827339+0.04000521)的方差,因此我们可以用这3个新的变量代替原来30个初始变量。
> stockdata<-as.data.frame(as.matrix(finaldata[,1:30])%*%as.matrix(p$loadings[,1:3]))
> head(stockdata)
Comp.1 Comp.2 Comp.3
2017-01-23 -497.0058 -3.717498 174.8456
2017-01-24 -498.6973 -5.271716 178.5616
2017-01-25 -505.3037 -4.955875 181.6700
2017-01-26 -507.5417 -4.710687 182.2582
2017-01-27 -506.5475 -3.575603 180.2852
2017-01-30 -502.8693 -3.291562 176.4937
三、回归分析
1、获取道琼斯指数最近50天数据,合并数据框
> res<-get(getSymbols("DJIA",src="yahoo",from=Sys.Date()-50,to=Sys.Date()))
> US30<-as.data.frame(res[,4])
> mydata<-cbind(stockdata,US30)
> names(mydata)<-c("X1","X2","X3","US30")
> head(mydata)
X1 X2 X3 US30
2017-01-23 -497.0058 -3.717498 174.8456 19799.85
2017-01-24 -498.6973 -5.271716 178.5616 19912.71
2017-01-25 -505.3037 -4.955875 181.6700 20068.51
2017-01-26 -507.5417 -4.710687 182.2582 20100.91
2017-01-27 -506.5475 -3.575603 180.2852 20093.78
2017-01-30 -502.8693 -3.291562 176.4937 19971.13
2、构造多元线性回归模型
> model<-lm(US30~X1+X2+X3,mydata)
> summary(model)
Call:
lm(formula = US30 ~ X1 + X2 + X3, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-50.267 -7.866 -1.937 10.018 39.555
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5395.8202 218.4604 24.699 < 2e-16 ***
X1 -25.7840 0.2282 -113.006 < 2e-16 ***
X2 0.4522 0.8693 0.520 0.607
X3 9.1254 1.0492 8.698 1.06e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 21.43 on 30 degrees of freedom
Multiple R-squared: 0.9977, Adjusted R-squared: 0.9974
F-statistic: 4282 on 3 and 30 DF, p-value: < 2.2e-16
> model
Call:
lm(formula = US30 ~ X1 + X2 + X3, data = mydata)
Coefficients:
(Intercept) X1 X2 X3
5395.8202 -25.7840 0.4522 9.1254
检验显示X2与US30不显著,剔除X2。因此线性回归方程为:
> US30 = 5395.8202 + -25.7840*X1 + 9.1254*X3