Data Analysis Course Project (SPSS, EViews, R) [Theory]

Run forward, backward, and stepwise regression on the foreign-exchange data, then analyze and interpret the output.

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 Province y
1.944.5154.45207.33246.87277.64135.7930.58110.6780.8351.8314.09北 京2384
0.336.49133.16127.29120.17114.8881.2114.0535.71627.12.93天 津202
6.1617.18313.4386.96202.98204.2279.4332.4279.3814.54128.1342.15河 北100
5.359.3123.8122.94101.5996.8434.6713.9937.285.9363.913.12山 西38
3.784.26106.0595.4927.5822.7534.2414.0628.24.6935.729.51内蒙古126
11.178.17271.96533.15164.4123.78187.758.6390.5231.7184.0511.61辽 宁262
2.843.61109.37130.852.4962.2638.1521.8244.5325.7848.4914.22吉 林38
8.6411.41160.06246.57109.18115.3268.7134.5558.0813.5272.0521.17黑龙江121
3.646.67244.42412.04459.63512.21160.4543.5189.9348.5548.637.05上 海1218
30.8919.08435.77724.85376.04381.81210.3971.82150.6423.74188.2819.65江 苏529
6.266.3321.75665.8157.94172.19147.1652.4478.1610.993.059.45浙 江361
4.138.87152.29258.683.4285.175.7426.7563.475.8947.022.66安 徽51
5.855.61347.25332.59157.32172.48115.1633.877.278.6979.018.24福 建651
6.76.8145.4143.5497.4100.543.2817.7151.035.4162.0318.25江 西43
10.811.73442.2665.33411.89429.88115.0787.45145.2521.39187.77110.2山 东220
4.1622.51299.63316.81132.57139.7684.7953.9384.2312.36116.8910.38河 南101
4.647.65195.56373.04161.84180.14101.585880.5321.61100.695.16湖 北88
7.0810.99216.49291.73119.22125.6247.0548.1997.9712.07139.3916.67湖 南156
16.324.1688.83827.16271.07268.2331.5571.44146.1523.38145.7716.52广 东2942
4.014125.04243.552.0631.2247.2525.5955.274.4960.1313.64广 西156
0.82.0735.0360.929.230.1420.224.2212.191.39.290.27海 南96
4.422.1178.93138.4368.3173.8479.9818.4243.320.0148.480.72重 庆88
11.189.42196.27328.46204.49144.45101.2143.0174.2215.8590.611.05四 川84
2.012.0325.0469.9740.8636.4527.0213.826.832.8625.636.76贵 州48
6.436.0888.9170.1588.8689.8433.6629.251.258.640.474.81云 南261
1.910.985.0811.130.671.691.942.955.020.897.590.17西 藏33
5.499.9115.4294.6376.5753.1447.8822.0856.9714.0248.6438.17陕 西247
3.977.839.3299.2341.6450.5511.418.8115.986.3316.467.02甘 肃30
1.313.0813.6718.7918.3718.573.153.148.661.2614.31.2青 海3
1.12.116.1119.6417.8516.524.163.036.761.067.523.18宁 夏1
4.5810.3592.03103.3449.1950.228.1411.8237.954.5239.493.53新 疆82
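The R sketches later in this write-up assume the table above has been loaded into a data frame. A minimal sketch, where the file name fx_data.csv and the column order are assumptions:

```r
# Load the province-level data (a sketch: the file name and the exact
# column order are assumptions; adjust them to the real data file).
fx <- read.csv("fx_data.csv",
               col.names = c(paste0("x", 1:12), "province", "y"))
str(fx)  # expect 31 observations: x1..x12, province, y
```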

Forward and backward selection are omitted here; only the stepwise results are shown.

Model Summary

Model   R       R Square   Adjusted   Std. Error of   R Square   F Change   Sig. F    AIC
                           R Square   the Estimate    Change                Change    (EViews)
1       .741a   .549       .533       455.9279        .549       35.261     .000      15.14489
2       .835b   .697       .675       380.4405        .148       13.650     .001      14.81230
3       .860c   .739       .710       359.3347        .042       4.386      .046      14.72630
4       .885d   .783       .749       334.0439        .044       5.243      .030      14.60711
5       .908e   .824       .789       306.8386        .041       5.815      .024      14.46251
6       .901f   .812       .783       310.9102        -.012      1.695      .205      14.46358
7       .889g   .791       .768       321.5075        -.021      2.872      .102      14.50383

a. Predictors: (Constant), x7
b. Predictors: (Constant), x7, x4
c. Predictors: (Constant), x7, x4, x10
d. Predictors: (Constant), x7, x4, x10, x3
e. Predictors: (Constant), x7, x4, x10, x3, x11
f. Predictors: (Constant), x4, x10, x3, x11
g. Predictors: (Constant), x10, x3, x11

Coefficients a

                      Unstandardized           Standardized
Model   Variable      B           Std. Error   Beta           t        Sig.
1       (Constant)    -209.535    124.469                     -1.683   .103
        x7            6.907       1.163        .741           5.938    .000
2       (Constant)    -96.142     108.300                     -.888    .382
        x7            13.791      2.101        1.479          6.564    .000
        x4            -2.520      .682         -.832          -3.695   .001
3       (Constant)    -174.886    108.984                     -1.605   .120
        x7            11.152      2.351        1.196          4.744    .000
        x4            -2.034      .685         -.672          -2.970   .006
        x10           10.761      5.139        .260           2.094    .046
4       (Constant)    -228.815    104.015                     -2.200   .037
        x7            8.786       2.417        .942           3.635    .001
        x4            -3.261      .832         -1.077         -3.919   .001
        x10           13.864      4.965        .335           2.792    .010
        x3            2.849       1.244        .647           2.290    .030
5       (Constant)    -140.625    102.304                     -1.375   .181
        x7            3.910       3.003        .419           1.302    .205
        x4            -1.997      .927         -.660          -2.154   .041
        x10           18.431      4.939        .446           3.732    .001
        x3            5.090       1.473        1.157          3.455    .002
        x11           -7.442      3.086        -.551          -2.411   .024
6       (Constant)    -127.159    103.130                     -1.233   .229
        x4            -1.289      .761         -.426          -1.695   .102
        x10           22.650      3.776        .548           5.998    .000
        x3            6.375       1.108        1.448          5.753    .000
        x11           -10.148     2.312        -.751          -4.389   .000
7       (Constant)    -117.497    106.482                     -1.103   .280
        x10           21.479      3.839        .519           5.595    .000
        x3            4.975       .764         1.130          6.516    .000
        x11           -11.264     2.292        -.834          -4.916   .000

a. Dependent Variable: y

Analysis:
The final stepwise model is y = -117.497 + 21.479x10 + 4.975x3 - 11.264x11.
For comparison:
Forward:  y = -140.625 + 3.910x7 - 1.997x4 + 18.431x10 + 5.090x3 - 7.442x11
Backward: y = -184.69 + 4.325x3 - 20.188x8 + 17.334x9 + 11.644x10 - 12.998x11

Note that x3, x10 and x11 survive under all three methods, which again confirms that these three variables are the ones best suited for the regression.
Turning to R squared, adjusted R squared, and AIC: the first five steps are identical to forward selection. R squared, true to form, increases whenever a variable is added, so it peaks at step 5, where the model contains the most variables. Adjusted R squared agrees: as in the earlier forward-selection analysis, the step-5 combination x3, x4, x7, x10, x11 remains the best, and the later deletions at steps 6 and 7 do not improve on it. Is it really that clear-cut? Checking from another angle, the AIC says yes: its minimum (14.46251) also falls at step 5. We therefore have good grounds to regard the step-5 model as the most suitable fit.
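The same search can be reproduced in R. A minimal sketch, assuming the data frame `fx` defined when loading the data; note that R's step() selects by AIC, whereas SPSS uses F-to-enter/F-to-remove p-values, so the selected model can differ slightly:

```r
# Bidirectional stepwise selection by AIC (a sketch; step() uses AIC,
# not the SPSS p-value entry/removal rules).
null <- lm(y ~ 1, data = fx)                                # intercept only
full <- lm(reformulate(paste0("x", 1:12), "y"), data = fx)  # all 12 regressors

sel <- step(null, scope = list(lower = null, upper = full),
            direction = "both", trace = FALSE)
summary(sel)  # compare with the SPSS stepwise and EViews output above
```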
 

Example in EViews (verifying only the final step):

Dependent Variable: Y
Method: Stepwise Regression
Date: 10/26/20   Time: 20:36
Sample: 1 31
Included observations: 31
Number of always included regressors: 1
Number of search regressors: 12
Selection method: Stepwise forwards
Stopping criterion: p-value forwards/backwards = 0.05/0.051

Variable      Coefficient    Std. Error    t-Statistic    Prob.*
C             -117.4965      106.4821      -1.103439      0.2796
X11           -11.26444      2.291584      -4.915569      0.0000
X3            4.975142       0.763533      6.515946       0.0000
X10           21.47859       3.838694      5.595287       0.0000

R-squared           0.791069     Mean dependent var       347.0968
Adjusted R-squared  0.767854     S.D. dependent var       667.2840
S.E. of regression  321.5075     Akaike info criterion    14.50383
Sum squared resid   2790910.     Schwarz criterion        14.68886
Log likelihood      -220.8094    Hannan-Quinn criter.     14.56415
F-statistic         34.07639     Durbin-Watson stat       1.242992
Prob(F-statistic)   0.000000

Selection Summary

Added X7
Added X4
Added X10
Added X3
Added X11
Removed X7
Removed X4

*Note: p-values and subsequent tests do not account for stepwise selection.


Ridge regression, LASSO, and PCA on the data

   R-SQUARE AND BETA COEFFICIENTS FOR ESTIMATED VALUES OF K

  K     RSQ      x1       x2       x3       x4       x5       x6       x7       x8       x9      x10      x11      x12

______ ______ ________ ________ ________ ________ ________ ________ ________ ________ ________ ________ ________ ________

.00000 .87481 -.012491  .022873  .749084 -.312414 -.962825  .759538  .446284 -.519848 1.037980  .221303 -.780227  .041865

.01000 .86789 -.055610  .027677  .719018 -.255572 -.437414  .293662  .441002 -.505298  .786637  .287679 -.611372  .013926

.02000 .85976 -.072465  .038023  .657698 -.220189 -.298092  .180214  .464919 -.476889  .638421  .315486 -.513426  .006697

.03000 .85231 -.081208  .046847  .608603 -.190878 -.229206  .128640  .475058 -.448808  .541643  .331856 -.452919  .000636

.04000 .84548 -.086395  .053850  .568792 -.166619 -.186798  .099777  .477484 -.422877  .473334  .342069 -.411422 -.005267

.05000 .83915 -.089680  .059310  .535619 -.146433 -.157441  .081835  .475775 -.399303  .422456  .348511 -.380745 -.011018

.06000 .83322 -.091812  .063540  .507358 -.129465 -.135565  .069969  .471786 -.377934  .383053  .352488 -.356796 -.016545

.07000 .82761 -.093180  .066799  .482864 -.115043 -.118424  .061811  .466521 -.358540  .351621  .354782 -.337326 -.021796

.08000 .82227 -.094013  .069294  .461342 -.102654 -.104497  .056063  .460552 -.340887  .325957  .355890 -.321000 -.026744

.09000 .81717 -.094454  .071183  .442221 -.091906 -.092868  .051953  .454219 -.324766  .304602  .356142 -.306981 -.031380

.10000 .81228 -.094598  .072588  .425075 -.082500 -.082952  .048998  .447729 -.309990  .286554  .355766 -.294715 -.035706

.11000 .80757 -.094512  .073606  .409582 -.074202 -.074354  .046876  .441214 -.296399  .271099  .354920 -.283822 -.039732

.12000 .80302 -.094244  .074312  .395489 -.066831 -.066798  .045369  .434753 -.283859  .257715  .353722 -.274031 -.043472

.13000 .79863 -.093833  .074764  .382596 -.060240 -.060082  .044324  .428399 -.272249  .246011  .352257 -.265143 -.046940

.14000 .79437 -.093306  .075011  .370741 -.054314 -.054059  .043628  .422183 -.261471  .235687  .350589 -.257008 -.050154

.15000 .79025 -.092686  .075089  .359792 -.048958 -.048614  .043200  .416124 -.251435  .226513  .348766 -.249512 -.053129

.16000 .78624 -.091990  .075031  .349641 -.044095 -.043659  .042979  .410232 -.242068  .218305  .346827 -.242563 -.055882

.17000 .78234 -.091232  .074860  .340195 -.039661 -.039124  .042919  .404512 -.233303  .210918  .344801 -.236090 -.058428

.18000 .77854 -.090425  .074597  .331377 -.035601 -.034953  .042984  .398965 -.225084  .204232  .342710 -.230034 -.060781

.19000 .77484 -.089578  .074260  .323122 -.031872 -.031099  .043147  .393589 -.217359  .198151  .340572 -.224347 -.062955

.20000 .77124 -.088699  .073861  .315373 -.028435 -.027525  .043384  .388380 -.210085  .192596  .338403 -.218989 -.064963

.21000 .76772 -.087796  .073413  .308082 -.025258 -.024200  .043680  .383334 -.203223  .187501  .336212 -.213926 -.066816

.22000 .76428 -.086873  .072925  .301206 -.022313 -.021095  .044021  .378446 -.196738  .182808  .334011 -.209128 -.068526

.23000 .76091 -.085935  .072404  .294709 -.019576 -.018190  .044396  .373709 -.190599  .178473  .331805 -.204572 -.070102

.24000 .75762 -.084987  .071859  .288557 -.017027 -.015463  .044796  .369118 -.184779  .174454  .329601 -.200236 -.071555

.25000 .75441 -.084032  .071294  .282722 -.014646 -.012899  .045214  .364668 -.179254  .170717  .327404 -.196101 -.072892

.26000 .75125 -.083073  .070713  .277179 -.012419 -.010483  .045645  .360352 -.174000  .167232  .325218 -.192152 -.074122

.27000 .74816 -.082112  .070122  .271905 -.010331 -.008202  .046083  .356165 -.168998  .163975  .323046 -.188372 -.075253

.28000 .74513 -.081151  .069524  .266879 -.008371 -.006044  .046525  .352102 -.164230  .160922  .320890 -.184751 -.076290

.29000 .74216 -.080193  .068920  .262083 -.006527 -.004001  .046969  .348156 -.159680  .158056  .318752 -.181276 -.077241

.30000 .73925 -.079238  .068314  .257502 -.004789 -.002062  .047411  .344323 -.155332  .155357  .316635 -.177938 -.078111

Since we do not know what X1 to X12 mean in their real-world context, the value of k (i.e., lambda) should be chosen by scanning from small to large, guided both by domain knowledge (whether each variable should, in practice, be positively or negatively related to the dependent variable) and by the K-RSQ table. Around k = 0.1 to 0.2 the regression coefficients begin to stabilize. For example, at K = 0.2 the fitted equation is:

y = -0.088699x1 +0.073861x2 +0.315373x3 -0.028435x4 -0.027525x5 +0.043384x6 +0.388380x7 -0.210085x8 +0.192596x9 +0.338403x10 -0.218989x11 -0.064963x12

Comparing with the ridge-trace plot (figure omitted), the traces level off around k = 0.2, and looking back at the table above there is no marked movement after k = 0.2 either, so the two views agree. At k = 0.2 the RSQ is lower than at k = 0, but giving up some explained variance in exchange for more stable estimates is a worthwhile trade.
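The same ridge trace can be produced in R with MASS::lm.ridge. A minimal sketch, assuming the data frame `fx` from before; lm.ridge() fits on standardized variables internally, and coef() rescales the results to the original units:

```r
library(MASS)  # provides lm.ridge()

# Ridge regression over the same grid of k values as the table above.
ridge <- lm.ridge(reformulate(paste0("x", 1:12), "y"),
                  data = fx, lambda = seq(0, 0.30, by = 0.01))

plot(ridge)    # ridge trace: coefficients versus k
select(ridge)  # HKB, LW and GCV suggestions for k
coef(ridge)[which.min(abs(ridge$lambda - 0.20)), ]  # coefficients at k = 0.2
```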

LASSO

We run the LASSO in EViews. Since EViews offers no dedicated LASSO routine, we use its elastic-net estimator, whose penalty takes the usual form

$P_\lambda(\beta) = \lambda \sum_j \left( \alpha\,|\beta_j| + \tfrac{1-\alpha}{2}\,\beta_j^2 \right),$

so setting α = 1 reduces the elastic net to the LASSO. This produces the following output:

Dependent Variable: Y
Method: Elastic Net Regularization
Date: 11/10/20   Time: 20:06
Sample: 1 31
Included observations: 31
Penalty type: LASSO (alpha = 1)
Lambda at minimum error: 92.69
Regressor transformation: Std Dev (smpl)
Cross-validation method: K-Fold (number of folds = 5), rng=kn, seed=99713398
Selection measure: Mean Squared Error

                (minimum)     (+ 1 SE)      (+ 2 SE)
Lambda          92.69         465.3         465.3

Variable        Coefficients
X1              2.10E-08      0.000000      0.000000
X10             10.62391      0.329137      0.329137
X11             0.000000      0.000000      0.000000
X12             -0.331988     0.000000      0.000000
X2              0.000000      0.000000      0.000000
X3              0.392510      0.008381      0.008381
X4              0.000000      0.003180      0.003180
X5              2.21E-09      0.000000      0.000000
X6              0.035251      0.000000      0.000000
X7              2.285083      0.046742      0.046742
X8              4.62E-09      0.000000      0.000000
X9              0.900589      0.203648      0.203648
C               -126.1446     323.2102      323.2102

d.f.            9             5             5
L1 Norm         140.7139      323.8013      323.8013
R-squared       0.573399      0.034777      0.034777

The output indicates that the best lambda is 92.69. (EViews describes its lambda-ratio option as the "Ratio of minimum to maximum lambda for EViews-supplied list"; it must lie between 0 and 1, and here I set min/max lambda = 0.0001.) At this lambda, x11, x2 and x4 are removed, while the coefficients of x1, x5 and x8 are so small that they barely affect the regression equation, which recalls the earlier forward, backward, and stepwise results.

LASSO is linear regression with shrinkage: its L1-norm penalty term constrains the model parameters and shrinks some coefficients exactly to zero, the "shrinkage" mentioned earlier.

After this treatment of the highly correlated variables, only 9 variables remain.

The coefficient-path plot (figure omitted) shows that x10 still fluctuates noticeably beyond lambda = 92 while the other variables gradually level off.

This further supports the choice of lambda = 92, which keeps the penalty small while still giving stable estimates.
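As a cross-check, the same LASSO with five-fold cross-validation can be run in R with glmnet. A minimal sketch, assuming `fx` from before; the seed simply echoes the one EViews reported and is otherwise arbitrary:

```r
library(glmnet)  # elastic net / LASSO

# LASSO (alpha = 1) with 5-fold cross-validation over lambda.
X <- as.matrix(fx[, paste0("x", 1:12)])
set.seed(99713398)                    # illustrative seed
cv <- cv.glmnet(X, fx$y, alpha = 1, nfolds = 5)

cv$lambda.min                # lambda minimizing CV mean squared error
cv$lambda.1se                # sparser "+ 1 SE" choice
coef(cv, s = "lambda.min")   # several coefficients shrunk exactly to zero
plot(cv$glmnet.fit, xvar = "lambda")  # coefficient paths against log(lambda)
```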

This yields the following forecast statistics:

Forecast: YF
Actual: Y
Forecast sample: 1 31
Included observations: 31

Root Mean Squared Error           428.7473
Mean Absolute Error               267.1796
Mean Absolute Percentage Error    570.8096
Theil Inequality Coef.            0.344282
     Bias Proportion              0.000000
     Variance Proportion          0.465991
     Covariance Proportion        0.534009
Theil U2 Coefficient              0.520825
Symmetric MAPE                    102.9197


 

For comparison, the corresponding ordinary least squares results:

Forecast: YF
Actual: Y
Forecast sample: 1 31
Included observations: 31

Root Mean Squared Error           232.2607
Mean Absolute Error               197.1669
Mean Absolute Percentage Error    1009.038
Theil Inequality Coef.            0.160419
     Bias Proportion              0.000000
     Variance Proportion          0.033425
     Covariance Proportion        0.966575
Theil U2 Coefficient              1.324975
Symmetric MAPE                    113.5775


After adding the LASSO penalty, the root mean squared error and the mean absolute error both rise, which is the expected in-sample price of shrinkage, although the mean absolute percentage error actually falls (from 1009.0 to 570.8).
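These accuracy measures are straightforward to recompute. A sketch, where `yhat` is a hypothetical vector of fitted values from either model:

```r
# Forecast-accuracy measures used in the two tables above (a sketch;
# `yhat` is a placeholder for the LASSO or OLS fitted values).
rmse <- function(y, f) sqrt(mean((y - f)^2))
mae  <- function(y, f) mean(abs(y - f))
mape <- function(y, f) 100 * mean(abs((y - f) / y))

c(RMSE = rmse(fx$y, yhat), MAE = mae(fx$y, yhat), MAPE = mape(fx$y, yhat))
```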

PCA

From SPSS we obtain the following eigenvalues and their variance contributions:

Initial Eigenvalues (raw analysis)

Component   Total        % of Variance   Cumulative %
1           96887.647    87.246          87.246
2           9689.193     8.725           95.971
3           2103.445     1.894           97.865
4           1479.046     1.332           99.197
5           446.118      .402            99.598
6           184.926      .167            99.765
7           143.417      .129            99.894
8           43.857       .039            99.933
9           31.613       .028            99.962
10          27.888       .025            99.987
11          9.475        .009            99.996
12          4.896        .004            100.000

The table shows that the first two eigenvalues account for most of the variance: the first alone captures 87.246%, and the second, while respectable at 8.725%, is far smaller than the first. If the goal were to preserve as much of the original information as possible, the third and fourth components could also be considered; but since the whole point of PCA is dimension reduction, there is little reason to keep that many. The aim is to give up some information in exchange for a small set of composite variables that represent the data.

The scree plot (figure omitted) shows the dominance of the first two eigenvalues just as plainly.
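For reference, a covariance-based PCA mirroring SPSS's "raw" analysis can be run in R. A sketch, again assuming the data frame `fx`:

```r
# Covariance-based PCA, matching the "raw" SPSS analysis above
# (use scale. = TRUE for a correlation-based PCA instead).
pc <- prcomp(fx[, paste0("x", 1:12)], center = TRUE, scale. = FALSE)

pc$sdev^2                      # eigenvalues, as in the table above
summary(pc)                    # variance proportion per component
screeplot(pc, type = "lines")  # the scree plot discussed above
round(pc$rotation[, 1:2], 3)   # eigenvectors; SPSS's raw loadings are
                               # these columns times sqrt(eigenvalue)
```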

Component Matrix (raw)

        Component 1   Component 2
x1      4.195         -.852
x2      3.970         -1.257
x3      144.276       -29.159
x4      215.711       -36.110
x5      101.999       55.366
x6      105.668       65.480
x7      64.468        -7.777
x8      20.878        -1.153
x9      37.351        2.572
x10     7.254         8.335
x11     42.440        -5.373
x12     9.850         3.595

SPSS accordingly kept the first two components. The first component is influenced most strongly by x3, x4, x5 and x6, all with positive loadings. The second component is again driven mainly by x3, x4, x5 and x6, but since the two components represent different real-world quantities, the overlap in dominant variables is likely coincidental. Their influence on the second component is also much weaker than on the first, and x3, x4 and several other variables even load negatively there. What the first and second components actually mean therefore still depends on common sense and domain knowledge.

Finally, back to the correlation matrix:

Correlation Matrix

      x1     x2     x3     x4     x5     x6     x7     x8     x9     x10    x11    x12
x1    1.000  .640   .691   .738   .582   .519   .663   .691   .719   .150   .758   .301
x2    .640   1.000  .773   .658   .502   .464   .602   .660   .686   .118   .760   .337
x3    .691   .773   1.000  .934   .742   .710   .885   .867   .889   .314   .855   .457
x4    .738   .658   .934   1.000  .780   .743   .887   .926   .892   .348   .849   .437
x5    .582   .502   .742   .780   1.000  .989   .740   .790   .850   .630   .705   .515
x6    .519   .464   .710   .743   .989   1.000  .703   .753   .821   .646   .666   .493
x7    .663   .602   .885   .887   .740   .703   1.000  .781   .834   .541   .649   .190
x8    .691   .660   .867   .926   .790   .753   .781   1.000  .931   .404   .906   .548
x9    .719   .686   .889   .892   .850   .821   .834   .931   1.000  .569   .895   .533
x10   .150   .118   .314   .348   .630   .646   .541   .404   .569   1.000  .241   .155
x11   .758   .760   .855   .849   .705   .666   .649   .906   .895   .241   1.000  .613
x12   .301   .337   .457   .437   .515   .493   .190   .548   .533   .155   .613   1.000

As noted earlier, x3 and x4, and likewise x5 and x6, are pairwise highly correlated (r = .934 and r = .989). We have reason to suspect that if this multicollinearity were removed and the PCA rerun, each variable's influence on the first and second components would still look much like it does now.
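A quick way to probe this conjecture in R: drop one variable from each highly correlated pair and rerun the PCA. A sketch (which member of each pair to drop is an illustrative choice):

```r
# Re-run the PCA after dropping one variable from each highly
# correlated pair (x4 ~ x3 with r = .934, x6 ~ x5 with r = .989).
keep <- setdiff(paste0("x", 1:12), c("x4", "x6"))
pc2  <- prcomp(fx[, keep], center = TRUE, scale. = FALSE)
round(pc2$rotation[, 1:2], 3)  # compare loadings with the full PCA
```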

 

In ridge regression, what effect does the shrinkage of the d_j have when viewed through PCA?

For OLS with an L2 regularization term, the loss function is

$L(\beta) = \|y - X\beta\|^2 + \lambda\|\beta\|^2,$

which is easily solved:

$\hat{\beta}^{\text{ridge}} = (X^T X + \lambda I)^{-1} X^T y.$

Now take the singular value decomposition of X,

$X = U D V^T,$

where U is an n×p matrix with orthonormal columns, D is a p×p diagonal matrix, and V is a p×p orthogonal matrix. Substituting into the ridge solution gives

$X\hat{\beta}^{\text{ridge}} = U D (D^2 + \lambda I)^{-1} D\, U^T y,$

where the d_j are the diagonal entries of D. Expanding this expression term by term, we obtain

$X\hat{\beta}^{\text{ridge}} = \sum_{j=1}^{p} u_j \, \frac{d_j^2}{d_j^2 + \lambda} \, u_j^T y.$

The PCA formulas suggest the connection: let γ_j be the j-th principal component of the matrix Z, so that

$\gamma_j = Z v_j = u_j d_j.$

We find that ridge regression:

1. uses the u_j as a new basis and projects y onto each of them;

2. shrinks each projection by the factor d_j^2 / (d_j^2 + λ). Directions with smaller d_j (the low-variance principal directions) undergo greater relative shrinkage, and since the spread of the data along a direction determines the size of d_j, it also determines the shrinkage rate. When λ = 0 the factor equals 1 and the solution reduces to the least squares solution; when λ is sufficiently large, the factor tends to 0.
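A quick numerical check of these shrinkage factors, as a sketch in R using the design matrix from the glmnet example (lambda = 100 is purely illustrative):

```r
# Verify the factors d_j^2 / (d_j^2 + lambda) on the centered data.
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)                    # Xc = U D V^T
d  <- sv$d

lambda <- 100                    # illustrative penalty
shrink <- d^2 / (d^2 + lambda)   # one factor per principal direction
round(shrink, 3)                 # small d_j gives a factor near 0

# Ridge fitted values via the SVD identity (intercept left unpenalized):
yhat_ridge <- mean(fx$y) +
  sv$u %*% (shrink * (t(sv$u) %*% (fx$y - mean(fx$y))))
```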
