为考试方便,用matlab写了个简单线性回归的整套过程,包括所有参数检验和估计。意义不是很大,因为这些用SPSS等统计软件可以很快得出,但统计软件毕竟是黑箱,对原理感兴趣或熟悉的可以看看代码(其实就是写着玩的,确实没多大用)。下附代码:
clear
clc
load simple_regression;
n=length(y);
x=x';
y=y';
% regression function
X=[ones(size(y)),x];
[b,bint] = regress(y,X);
a=b(1,1); % intercept_hat
b=b(2,1); % slope_hat
disp('regression function:');
disp(['y_hat=',num2str(a),'+',num2str(b),'*x']);
y_hat=a*X(:,1)+b*X(:,2);
Y_bar=mean(y);
SSR=0;
for i=1:n
SSR=SSR+(y_hat(i,1)-Y_bar)^2;
end
disp(['SSR=',num2str(SSR)]);
SSE=0;
for i=1:n
SSE=SSE+(y_hat(i,1)-y(i,1))^2;
end
disp(['SSE=',num2str(SSE)]);
SST=SSR+SSE;
disp(['SST=',num2str(SST)]);
R_square=SSR/SST;
disp(['R_square=',num2str(R_square)]);
MSR=SSR/1;
disp(['MSR=',num2str(MSR)]);
MSE=SSE/(n-1-1);
disp(['MSE=',num2str(MSE)]);
F=MSR/MSE;
%% significance test of slope
% F test
disp('F test:');
disp(['F=',num2str(F)]);
p_F=1-fcdf(F,1,n-1-1);
disp(['p_F=',num2str(p_F)]);
alpha=0.05;
disp(['alpha=',num2str(alpha)]);
if p_F<alpha
disp(['p_F<',num2str(alpha),', so the coefficient of slope is significantly different from 0.']);
elseif p_F>=alpha
disp(['p_F>=',num2str(alpha),', so the coefficient of slope is not significantly different from 0.']);
end
% t test
disp('t test:');
S_yx=sqrt(SSE/(n-2));
disp(['S_yx=',num2str(S_yx)]);
X_bar=mean(x);
SSX=0;
for i=1:n
SSX=SSX+(x(i,1)-X_bar)^2;
end
disp(['SSX=',num2str(SSX)]);
S_b=S_yx/sqrt(SSX);
disp(['S_b=',num2str(S_b)]);
t=b/S_b;
disp(['t=',num2str(t)]);
disp(['alpha=',num2str(alpha)]);
p_t=(1-tcdf(abs(t),n-2))*2;
disp(['p_t=',num2str(p_t)]);
if p_t<alpha
disp(['p_t<',num2str(alpha),', so the coefficient of slope is significantly different from 0.']);
elseif p_t>=alpha
disp(['p_t>=',num2str(alpha),', so the coefficient of slope is not significantly different from 0.']);
end
% interval estimation of slope
disp('interval estimation of slope:');
disp(['1-alpha=',num2str(1-alpha)]);
slope_l=b-tinv(1-alpha/2,n-2)*S_b;
slope_u=b+tinv(1-alpha/2,n-2)*S_b;
disp([num2str(slope_l),'<=slope<=',num2str(slope_u)]);
% interval estimation of Yi_mean and Yi_pre
n_hi=repelem(1/n,n)';
xi_hi=((x-repelem(X_bar,n)').^2)/SSX;
hi=n_hi+xi_hi;
ie_mean_Yi=[y_hat-tinv(1-alpha/2,n-2)*S_yx*sqrt(hi),y_hat+tinv(1-alpha/2,n-2)*S_yx*sqrt(hi)];
disp('ie_mean_Yi contains interval estimation of mean of Yi');
ie_pre_Yi=[y_hat-tinv(1-alpha/2,n-2)*S_yx*sqrt(ones(n,1)+hi),y_hat+tinv(1-alpha/2,n-2)*S_yx*sqrt(ones(n,1)+hi)];
disp('ie_pre_Yi contains interval estimation of prediction of Yi');
simple_regression .mat的数据如下,变量x和y均为行向量:
x: 2556, 2786, 3016, 3278, 3549, 4012, 4638, 4996, 5017, 5275, 5688, 5692, 5697, 5721, 5745, 5752, 5756, 5796, 5802, 5806
y: 502, 565, 606, 636, 687, 702, 768, 795, 812, 824, 867, 869, 876, 912, 935, 987, 1029, 1023, 1238, 1302