目录
数理统计仿真实验(Computational Practice)
最近Mathematical Statistics(数理统计)结课,整理了一些关于计算仿真实验的结果和代码,相关原理在其他博客中可以找到,自己动手实验对于更好地理解相关定理定律有很大的帮助。
大数定律(the Law of Large Numbers)
大数定律即为,在实验条件不变的情况下,随着重复试验次数的增加,随机事件发生的频率逐渐趋近于其概率。针对离散变量和连续变量,这里分别对二项分布、泊松分布和指数分布、伽马分布进行仿真,模拟实验次数在50、500、5000和50000的情况下仿真结果(直方图)与理论结果(散点或曲线)之间的差异,均值与方差计算结果详见图例。可以看到随着实验次数的增大,仿真值愈发趋近于理论值,即验证了大数定律。
二项分布(Binomial Distribution)
泊松分布(Poisson Distribution)
指数分布(Exponential Distribution)
伽马分布(Gamma Distribution)
中心极限定理(Central Limit Theorem)
中心极限定理即为,即使数据的分布不是正态分布,其样本均值的分布也是正态分布。针对离散变量和连续变量,这里同样分别对二项分布、泊松分布和指数分布、伽马分布进行仿真,模拟实验次数为5000、采样点数为10、20、200和600的情况下归一化仿真结果(直方图)与理论结果(散点或曲线)之间的差异,均值与方差计算结果详见图例。可以看到随着实验次数的增大,仿真数据分布愈发趋近于正态分布,即验证了中心极限定理。
二项分布(Binomial Distribution)
泊松分布(Poisson Distribution)
指数分布(Exponential Distribution)
伽马分布(Gamma Distribution)
点估计(Point Estimation)
点估计问题即为,借助于总体的一个样本来构造适当的样本函数去估计总体未知参数的值。这里随机产生长度为10、100、1000和10000的样本,使用矩估计与极大似然估计方法对正态分布和均匀分布进行参数估计,将仿真结果绘制为散点图,从而分析两种不同的点估计方法的差异。
矩估计(Moment Estimation)
矩估计(Moment Estimation)的思想主要为,用样本的k阶矩作为总体k阶矩的估计量。在正态分布中,计算得到的样本均值和方差即为对总体均值和方差的估计;在均匀分布中,取样本均值的二倍作为对总体的估计。
极大似然估计(Maximum Likelihood Estimation)
极大似然估计的思想是通过构建似然函数并找到使其最大的参数值从而得到估计量。在正态分布中,计算得到的样本均值和方差即为对总体均值和方差的估计;在均匀分布中,取样本均值的最大值作为对总体的估计。
正态分布(Normal Distribution)
在正态分布中,矩估计和极大似然估计在均值和方差上的估计结果相同,都是用计算得到的样本均值和方差当做总体的均值和方差。其中均值是无偏估计,方差是有偏估计。从下面的图中可以看到,随着样本数量的增加,散点分布向中心靠拢,估计值愈发接近实际值(10,0.2)。
均匀分布(Uniform Distribution)
在均匀分布中,矩估计和极大似然估计结果有所不同。在下图中用红色点代表矩估计,蓝色点代表极大似然估计,绿色点代表经修正后的极大似然估计。可以看到,绿色点的分布是最接近实际值y=10这条直线的。同时随着样本数量的增大,两种估计方法得到的参数值都逐渐逼近实际值。
附:MATLAB代码
大数定律
%Chapter 5: The law of large numbers
N=[50 500 5000 50000];
res=cell(4,4);
theta=0.3;
n=20;
x=0:n;
y=1:1000;
lambda=theta*n;
mu=theta*n;
alpha=theta*n;
beta=1;
%Part 1 of discrete
%Binomial Distribution
figure(1);
for i=1:4
for j=1:N(i)
res{1,i}(1,j)=binornd(n,theta);
end
m_bino_sim=mean(res{1,i});
v_bino_sim=var(res{1,i});
subplot(2,2,i);
histogram(res{1,i},'Normalization','pdf','FaceColor', ...
'cyan','EdgeColor','magenta','EdgeAlpha',0.5);
hold on;
[m_bino_the,v_bino_the]=binostat(n,theta);
pdf_bino=binopdf(x,n,theta);
plot(x,pdf_bino,'r*','MarkerSize',8);
legend({['Simulated Distribution :',char(13,10)',' mean is ',num2str(m_bino_sim),' , variation is ',num2str(v_bino_sim)], ...
['Theoretical Distribution :',char(13,10)',' mean is ',num2str(m_bino_the),' , variation is ',num2str(v_bino_the)]});
xlabel('$k$','Interpreter',"latex");
ylabel('$P(X=k$)','Interpreter',"latex");
title(['When simulated number $N=$ ' num2str(N(i)) ' , the distributions' ...
' of $b(x;n,\theta)$ with $n=20 , \theta=0.3$'],'Interpreter',"latex");
hold off;
end
%Poisson Distribution
figure(2);
for i=1:4
for j=1:N(i)
res{2,i}(1,j)=poissrnd(lambda);
end
m_poiss_sim=mean(res{2,i});
v_poiss_sim=var(res{2,i});
subplot(2,2,i);
histogram(res{2,i},'Normalization','pdf','FaceColor', ...
'cyan','EdgeColor','magenta','EdgeAlpha',0.5);
hold on;
[m_poiss_the,v_poiss_the]=poisstat(lambda);
pdf_poiss=poisspdf(x,lambda);
plot(x,pdf_poiss,'r*','MarkerSize',8);
legend({['Simulated Distribution :',char(13,10)',' mean is ',num2str(m_poiss_sim),' , variation is ',num2str(v_poiss_sim)], ...
['Theoretical Distribution :',char(13,10)',' mean is ',num2str(m_poiss_the),' , variation is ',num2str(v_poiss_the)]});
xlabel('$k$','Interpreter',"latex");
ylabel('$P(X=k$)','Interpreter',"latex");
title(['When simulated number $N=$ ' num2str(N(i)) ' , the distributions' ...
' of $p(x;\lambda)$ with $\lambda=6$'],'Interpreter',"latex");
hold off;
end
%Part 2 of continuous
%Exponential Distribution
figure(3);
for i=1:4
for j=1:N(i)
res{3,i}(1,j)=exprnd(mu);
end
m_exp_sim=mean(res{3,i});
v_exp_sim=var(res{3,i});
subplot(2,2,i);
histogram(res{3,i},y,'Normalization','pdf','FaceColor', ...
'cyan','EdgeColor','magenta','EdgeAlpha',0.5);
hold on;
[m_exp_the,v_exp_the]=expstat(mu);
pdf_exp=exppdf(y,mu);
plot(pdf_exp,'r');
axis([0 40 0 0.2]);
legend({['Simulated Distribution :',char(13,10)',' mean is ',num2str(m_exp_sim),' , variation is ',num2str(v_exp_sim)], ...
['Theoretical Distribution :',char(13,10)',' mean is ',num2str(m_exp_the),' , variation is ',num2str(v_exp_the)]});
xlabel('$k$','Interpreter',"latex");
ylabel('$P(X=k$)','Interpreter',"latex");
title(['When simulated number $N=$ ' num2str(N(i)) ' , the distributions' ...
' of $Exp(\mu)$ with $\mu=6$'],'Interpreter',"latex");
hold off;
end
%Gamma Distribution
figure(4);
for i=1:4
for j=1:N(i)
res{4,i}(1,j)=gamrnd(alpha,beta);
end
m_gam_sim=mean(res{4,i});
v_gam_sim=var(res{4,i});
subplot(2,2,i);
histogram(res{4,i},y,'Normalization','pdf','FaceColor', ...
'cyan','EdgeColor','magenta','EdgeAlpha',0.5);
hold on;
[m_gam_the,v_gam_the]=gamstat(alpha,beta);
gam_exp=gampdf(y,alpha,beta);
plot(gam_exp,'r');
axis([0 30 0 0.2]);
legend({['Simulated Distribution :',char(13,10)',' mean is ',num2str(m_gam_sim),' , variation is ',num2str(v_gam_sim)], ...
['Theoretical Distribution :',char(13,10)',' mean is ',num2str(m_gam_the),' , variation is ',num2str(v_gam_the)]});
xlabel('$k$','Interpreter',"latex");
ylabel('$P(X=k$)','Interpreter',"latex");
title(['When simulated number $N=$ ' num2str(N(i)) ' , the distributions' ...
' of $g(x;\alpha,\beta)$ with $\alpha=6 , \beta=1$'],'Interpreter',"latex");
hold off;
end
中心极限定理
%Chapter 6: Central limit theorem
n=[10 20 200 600];
m=[10 100 1000];
theta=0.5;
N=5000;
M=1000;
lambda=n*theta;
mu=m*theta;
alpha=m*theta;
beta=1;
res=cell(8,4);
%Part 1 of discrete
%Binomial Distribution
figure(1);
for i=1:4
res{1,i}=binornd(n(i),theta,[1 N]);
m_bino_sim=mean(res{1,i});
v_bino_sim=var(res{1,i});
x=0:n(i);
subplot(2,2,i);
histogram(res{1,i},'Normalization','pdf','FaceColor', ...
'cyan','EdgeColor','magenta','EdgeAlpha',0.5);
hold on;
[m_norm_the,v_norm_the]=normstat(n(i)*theta,sqrt(n(i)*theta*(1-theta)));
pdf_norm=normpdf(x,n(i)*theta,sqrt(n(i)*theta*(1-theta)));
plot(x,pdf_norm,'r*','MarkerSize',8);
legend({['Simulated Distribution :',char(13,10)',' mean is ',num2str(m_bino_sim),' , variation is ',num2str(v_bino_sim)], ...
['Theoretical Distribution :',char(13,10)',' mean is ',num2str(m_norm_the),' , variation is ',num2str(v_norm_the)]});
xlabel('$k$','Interpreter',"latex");
ylabel('$P(X=k$)','Interpreter',"latex");
title(['When simulated number $N=5000$ ',' , the distributions' ...
' of $b(x;n,\theta)$ with $n=$',num2str(n(i)),' , $\theta=0.5$'],'Interpreter',"latex");
hold off;
end
%Normalized Binomial Distribution
figure(2);
for i=1:4
res{2,i}=(res{1,i}-n(i)*theta)/sqrt(n(i)*theta*(1-theta));
m_bino_norm_sim=mean(res{2,i});
v_bino_norm_sim=var(res{2,i});
y=-5:0.2:5;
subplot(2,2,i);
histogram(res{2,i},'Normalization','pdf','FaceColor', ...
'cyan','EdgeColor','magenta','EdgeAlpha',0.5);
hold on;
[m_norm_norm_the,v_norm_norm_the]=normstat(0,1);
pdf_norm_norm=normpdf(y,0,1);
plot(y,pdf_norm_norm,'r*','MarkerSize',8);
axis([-5 5 0 inf]);
legend({['Simulated Distribution :',char(13,10)',' mean is ',num2str(m_bino_norm_sim),' , variation is ',num2str(v_bino_norm_sim)], ...
['Theoretical Distribution :',char(13,10)',' mean is ',num2str(m_norm_norm_the),' , variation is ',num2str(v_norm_norm_the)]});
xlabel('$k$','Interpreter',"latex");
ylabel('$P(X=k$)','Interpreter',"latex");
title(['When simulated number $N=5000$ ',' , the normalized distributions' ...
' of $b(x;n,\theta)$ with $n=$',num2str(n(i)),' , $\theta=0.5$'],'Interpreter',"latex");
hold off;
end
%Poisson Distribution
figure(3);
for i=1:4
res{3,i}=poissrnd(lambda(i),[1 N]);
m_poiss_sim=mean(res{3,i});
v_poiss_sim=var(res{3,i});
x=0:n(i);
subplot(2,2,i);
histogram(res{3,i},'Normalization','pdf','FaceColor', ...
'cyan','EdgeColor','magenta','EdgeAlpha',0.5);
hold on;
[m_norm_the,v_norm_the]=normstat(lambda(i),sqrt(lambda(i)));
pdf_norm=normpdf(x,lambda(i),sqrt(lambda(i)));
plot(x,pdf_norm,'r*','MarkerSize',8);
legend({['Simulated Distribution :',char(13,10)',' mean is ',num2str(m_poiss_sim),' , variation is ',num2str(v_poiss_sim)], ...
['Theoretical Distribution :',char(13,10)',' mean is ',num2str(m_norm_the),' , variation is ',num2str(v_norm_the)]});
xlabel('$k$','Interpreter',"latex");
ylabel('$P(X=k$)','Interpreter',"latex");
title(['When simulated number $N=5000$ ',' , the distributions' ...
' of $p(x;\lambda)$ with $\lambda=$',num2str(lambda(i))],'Interpreter',"latex");
hold off;
end
%Normalized Poisson Distribution
figure(4);
for i=1:4
res{4,i}=(res{3,i}-lambda(i))/sqrt(lambda(i));
m_poiss_norm_sim=mean(res{4,i});
v_poiss_norm_sim=var(res{4,i});
y=-5:0.2:5;
subplot(2,2,i);
histogram(res{4,i},'Normalization','pdf','FaceColor', ...
'cyan','EdgeColor','magenta','EdgeAlpha',0.5);
hold on;
[m_norm_norm_the,v_norm_norm_the]=normstat(0,1);
pdf_norm_norm=normpdf(y,0,1);
plot(y,pdf_norm_norm,'r*','MarkerSize',8);
axis([-5 5 0 inf]);
legend({['Simulated Distribution :',char(13,10)',' mean is ',num2str(m_poiss_norm_sim),' , variation is ',num2str(v_poiss_norm_sim)], ...
['Theoretical Distribution :',char(13,10)',' mean is ',num2str(m_norm_norm_the),' , variation is ',num2str(v_norm_norm_the)]});
xlabel('$k$','Interpreter',"latex");
ylabel('$P(X=k$)','Interpreter',"latex");
title(['When simulated number $N=5000$ ',' , the normalized distributions' ...
' of $p(x;\lambda)$ with $\lambda=$',num2str(n(i))],'Interpreter',"latex");
hold off;
end
% Part 2 of continuous
% Normalized Exponential Distribution
figure(5);
for i=1:3
for j=1:M
res{5,i}=exprnd(mu(i),[1 m(i)]);
res{6,i}(1,j)=(sum(res{5,i})-m(i)*mu(i))/sqrt(m(i)*mu(i)*mu(i));
end
m_exp_sim=mean(res{6,i});
v_exp_sim=var(res{6,i});
subplot(2,2,i);
histogram(res{6,i},y,'Normalization','pdf','FaceColor', ...
'cyan','EdgeColor','magenta','EdgeAlpha',0.5);
hold on;
[m_exp_the,v_exp_the]=normstat(0,1);
y=-100:0.1:100;
pdf_norm=normpdf(y,0,1);
plot(y,pdf_norm,'r');
axis([-5 5 0 inf]);
legend({['Simulated Distribution :',char(13,10)',' mean is ',num2str(m_exp_sim),' , variation is ',num2str(v_exp_sim)], ...
['Theoretical Distribution :',char(13,10)',' mean is ',num2str(m_exp_the),' , variation is ',num2str(v_exp_the)]});
xlabel('$k$','Interpreter',"latex");
ylabel('$P(X=k$)','Interpreter',"latex");
title(['When simulated number $M=1000$ ',' , the normalized distributions' ...
' of $Exp(\mu)$ with $\mu=$ ',num2str(mu(i))],'Interpreter',"latex");
hold off;
end
% Normalized Gamma Distribution
figure(6);
for i=1:3
for j=1:M
res{7,i}=gamrnd(alpha(i),beta,[1 m(i)]);
res{8,i}(1,j)=(sum(res{7,i})-m(i)*alpha(i)/beta)/sqrt(m(i)*alpha(i)/beta/beta);
end
m_gam_sim=mean(res{8,i});
v_gam_sim=var(res{8,i});
subplot(2,2,i);
histogram(res{8,i},y,'Normalization','pdf','FaceColor', ...
'cyan','EdgeColor','magenta','EdgeAlpha',0.5);
hold on;
[m_gam_the,v_gam_the]=normstat(0,1);
y=-100:0.1:100;
pdf_norm=normpdf(y,0,1);
plot(y,pdf_norm,'r');
axis([-5 5 0 inf]);
legend({['Simulated Distribution :',char(13,10)',' mean is ',num2str(m_gam_sim),' , variation is ',num2str(v_gam_sim)], ...
['Theoretical Distribution :',char(13,10)',' mean is ',num2str(m_gam_the),' , variation is ',num2str(v_gam_the)]});
xlabel('$k$','Interpreter',"latex");
ylabel('$P(X=k$)','Interpreter',"latex");
title(['When simulated number $M=1000$ ',' , the normalized distributions' ...
' of $g(x;\alpha,\beta)$ with $\alpha=',num2str(alpha(i)),' , \beta=1$ '],'Interpreter',"latex");
hold off;
end
矩估计与极大似然估计
%The properties of estimators of Example N(μ,σ).
clc;clear all;
N=[10 100 1000 10000];
res=cell(3,4);
mu=10;
sigma=0.2;
length=1000;
x=1:length;
figure(1);
for i=1:4
for j=1:length
res{1,i}=normrnd(mu,sigma,[1,N(i)]);
res{2,i}(1,j)=mean(res{1,i});
res{3,i}(1,j)=sqrt(var((res{1,i})));
end
subplot(2,2,i);
plot(res{2,i},res{3,i},'k.');
xlim([9.8 10.2]);
ylim([0 0.4]);
title(['Point estimation results with $\mu=10$ , $\sigma=0.2$' ' and sample size $n= $' num2str(N(i))],'Interpreter',"latex");
end
%The properties of estimators of Example u(0,β).
clc;clear all;
N=[10 100 1000 10000];
res=cell(4,4);
beta=10;
length=1000;
x=1:length;
figure(2);
for i=1:4
for j=1:length
res{1,i}=unifrnd(0,beta,[1,N(i)]);
res{2,i}(1,j)=2*mean(res{1,i});
res{3,i}(1,j)=max(res{1,i});
end
res{4,i}=(N(i)+1)*res{3,i}/N(i);
subplot(2,2,i);
plot(x,res{2,i},'r.');
hold on;
plot(x,res{3,i},'b.');
hold on;
plot(x,res{4,i},'g.');
hold off;
title(['Point estimation results with $\beta=10$' ' and sample size $n= $' num2str(N(i))],'Interpreter',"latex");
legend({'$\hat{\beta}_{ME}=2\overline{X}$','$\hat{\beta}_{MLE}=max{X_n}$','$\hat{\beta}_{RMLE}=\frac{n+1}{n}max{X_n}$'},'Interpreter',"latex");
end