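This series draws on an early monograph by Professor Xue Dingyu, 《高等应用数学问题 MATLAB求解》 (Solving Advanced Applied Mathematics Problems with MATLAB), and selects some of its exercises for beginners. It is shared purely as study material; copyright belongs to the original author, to whom we express our thanks.
If anything here is inappropriate, please contact us and it will be removed.
The main content is adapted from or based on: Xue Dingyu, 高等应用数学问题 MATLAB求解:习题参考解答 (reference solutions to the exercises, preprint).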
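Daily Problem | MATLAB in Probability and Statistics (0003)
Problem:
Suppose a set of data has been measured in an experiment. Use MATLAB to test these data.
① Assuming the data follow a normal distribution with standard deviation 1.5, test the hypothesis that the mean is 0.5.
② If the variance is unknown, test again the hypothesis that the mean is 0.5.
③ Test the given data for normality.
Solution: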
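① Introduce the two hypotheses H0: μ = 0.5 and H1: μ ≠ 0.5. Because the standard deviation σ = 1.5 is known, the test statistic u = sqrt(n)*(mean(x) - 0.5)/σ follows the standard normal distribution when H0 is true.
Then: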
>> x=[-1.7908; 0.090316; 3.9223; 0.41351; 3.2618; -1.0665; 0.51693; -1.2615; 1.8206; -0.065217; 1.5803; 2.0033; 0.32378; 2.5006; 5.6959; 1.6804; 0.47348; 2.5546; 0.62587; -1.9909; 1.5924; 0.48874; -0.12149; 3.372; 4.6927; -0.67576; 0.73271; -1.3172; 2.031; -4.8203; 2.7278; 0.99252; 1.0887; 3.2303; -0.118; 0.20045; -2.3586; -3.2431; -1.083; 1.132; -0.71772; -2.5004; 2.9135; -1.1022; 0.47461; 0.49816; -0.061232; 1.3923; -0.09403; -3.244; -1.8152; 1.047; -2.3273; -0.28116; -1.6181; -2.1428; -1.7976; -0.40375; 0.89075; 0.23873; 2.8943; -0.052119; -2.9145; 0.5219; 0.66059; 1.2122; 1.6246; 3.3757; -0.73259; 1.0868; 0.47035; -0.80559; 5.3067; -0.079639; -2.6714; 4.4827; 1.2325; -2.0178; 1.8958; 3.357; -1.5161; 0.80414; 0.18716; -2.1176; 3.1634; 0.46528; 1.7065; -1.112; -0.97501; 1.2073; 0.74033; 4.6585; -0.11899; 5.4782; 3.8942; -3.8764; -3.2812; -0.79045; 0.081913; 0.5201; 2.3831; -1.1251; -1.1234; 0.047343; 0.45396; 1.1275; 2.8812; 1.8988; -3.4389; 2.069; 2.3258; 1.9318; 3.4477; 1.236; -1.0142; 0.16401; -5.0103; 1.5649; 0.76313; -0.82998];
>> u=sqrt(length(x))*(mean(x)-0.5)/1.5
u =
-0.18857874651686
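Because |u| < 1.96 (the two-sided critical value of the standard normal distribution at the 5% significance level), the hypothesis cannot be rejected.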
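The decision above compares |u| with the 5% two-sided critical value. As a minimal sketch, the same check can be spelled out with the Statistics Toolbox functions `norminv` and `normcdf`. The value of u is copied from the output above; the variable names `alpha`, `crit`, `p` and `reject` are illustrative choices and not part of the original solution.

```matlab
% Two-sided z-test decision for the statistic computed above.
u      = -0.18857874651686;        % value printed by the original command
alpha  = 0.05;                     % assumed significance level (matches the 1.96 threshold)
crit   = norminv(1 - alpha/2);     % two-sided critical value, about 1.96
p      = 2*(1 - normcdf(abs(u)));  % two-sided p-value under H0
reject = abs(u) > crit;            % false here, so H0 is not rejected
fprintf('critical value = %.4f, p-value = %.4f, reject = %d\n', crit, p, reject);
```

With the full data vector x still in the workspace, `[h,p] = ztest(x,0.5,1.5)` should lead to the same decision.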
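② When the variance is unknown, a t-test should be used instead. The third argument below sets the significance level to 0.02.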
>> [H,p,ci]=ttest(x,0.5,0.02)
H =
0
p =
0.89494112815610
ci =
0.01405637328924 0.93429921004409
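Because H = 0, the hypothesis cannot be rejected. With the significance level 0.02 used in the call, the returned 98% confidence interval for the mean is [0.0141, 0.9343], which contains 0.5.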
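③ Apply the Jarque-Bera hypothesis test. It returns h = 0, so the hypothesis that the data are normally distributed cannot be rejected at the 5% significance level.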
>> [h,s]=jbtest(x,0.05)
h =
0
s =
0.99075654463354
Further reading:
>> help ttest
ttest One-sample and paired-sample t-test.
H = ttest(X) performs a t-test of the hypothesis that the data in the vector X come from a distribution with mean zero, and returns the result of the test in H. H=0 indicates that the null hypothesis ("mean is zero") cannot be rejected at the 5% significance level. H=1 indicates that the null hypothesis can be rejected at the 5% level. The data are assumed to come from a normal distribution with unknown variance.
X can also be a matrix or an N-D array. For matrices, ttest performs separate t-tests along each column of X, and returns a vector of results. For N-D arrays, ttest works along the first non-singleton dimension of X.
ttest treats NaNs as missing values, and ignores them.
H = ttest(X,M) performs a t-test of the hypothesis that the data in X come from a distribution with mean M. M must be a scalar.
H = ttest(X,Y) performs a paired t-test of the hypothesis that two matched samples, in the vectors X and Y, come from distributions with equal means. The difference X-Y is assumed to come from a normal distribution with unknown variance. X and Y must have the same length. X and Y can also be matrices or N-D arrays of the same size.
[H,P] = ttest(...) returns the p-value, i.e., the probability of observing the given result, or one more extreme, by chance if the null hypothesis is true. Small values of P cast doubt on the validity of the null hypothesis.
[H,P,CI] = ttest(...) returns a 100*(1-ALPHA)% confidence interval for the true mean of X, or of X-Y for a paired test.
[H,P,CI,STATS] = ttest(...) returns a structure with the following fields:
   'tstat' -- the value of the test statistic
   'df'    -- the degrees of freedom of the test
   'sd'    -- the estimated population standard deviation. For a paired test, this is the std. dev. of X-Y.
[...] = ttest(X,Y,'PARAM1',val1,'PARAM2',val2,...) specifies one or more of the following name/value pairs:
   Parameter   Value
   'alpha'     A value ALPHA between 0 and 1 specifying the significance level as (100*ALPHA)%. Default is 0.05 for 5% significance.
   'dim'       Dimension DIM to work along. For example, specifying 'dim' as 1 tests the column means. Default is the first non-singleton dimension.
   'tail'      A string specifying the alternative hypothesis:
                  'both'  -- "mean is not M" (two-tailed test)
                  'right' -- "mean is greater than M" (right-tailed test)
                  'left'  -- "mean is less than M" (left-tailed test)
See also ttest2, ztest, signtest, signrank, vartest.
Further reading:
>> help jbtest
jbtest Jarque-Bera hypothesis test of composite normality.
H = jbtest(X) performs the Jarque-Bera goodness-of-fit test of composite normality, i.e., that the data in the vector X came from an unspecified normal distribution, and returns the result of the test in H. H=0 indicates that the null hypothesis ("the data are normally distributed") cannot be rejected at the 5% significance level. H=1 indicates that the null hypothesis can be rejected at the 5% level.
jbtest treats NaNs in X as missing values, and ignores them.
The Jarque-Bera test is a 2-sided goodness-of-fit test suitable for situations where a fully-specified null distribution is not known, and its parameters must be estimated. For large sample sizes, the test statistic has a chi-square distribution with two degrees of freedom. Critical values, computed using Monte-Carlo simulation, have been tabulated for sample sizes N <= 2000 and significance levels 0.001 <= ALPHA <= 0.50.
jbtest computes a critical value for a given test by interpolating into that table, using the analytic approximation to extrapolate for larger sample sizes.
The Jarque-Bera hypotheses and test statistic are:
Null Hypothesis: X is normally distributed with unspecified mean and standard deviation.
Alternative Hypothesis: X is not normally distributed. The test is specifically designed for alternatives in the Pearson family of distributions.
Test Statistic: JBSTAT = N*(SKEWNESS^2/6 + (KURTOSIS-3)^2/24), where N is the sample size and the kurtosis of the normal distribution is defined as 3.
H = jbtest(X,ALPHA) performs the test at significance level ALPHA. ALPHA is a scalar in the range 0.001 <= ALPHA <= 0.50. To perform the test at significance levels outside that range, use the MCTOL input argument.
[H,P] = jbtest(...) returns the p-value P, computed using inverse interpolation into the look-up table of critical values. Small values of P cast doubt on the validity of the null hypothesis. jbtest warns when P is not found within the limits of the table, i.e., outside the interval [0.001, 0.50], and returns one or the other endpoint of that interval. In this case, you can use the MCTOL input argument to compute a more accurate value.
[H,P,JBSTAT] = jbtest(...) returns the test statistic JBSTAT.
[H,P,JBSTAT,CRITVAL] = jbtest(...) returns the critical value CRITVAL for the test. When JBSTAT > CRITVAL, the null hypothesis can be rejected at a significance level of ALPHA.
[H,P,...] = jbtest(X,ALPHA,MCTOL) computes a Monte-Carlo approximation for P directly, rather than using interpolation of the pre-computed tabulated values. This is useful when ALPHA or P is outside the range of the look-up table. jbtest chooses the number of MC replications, MCREPS, large enough to make the MC standard error for P, SQRT(P*(1-P)/MCREPS), less than MCTOL.
See also lillietest, kstest, kstest2, cdfplot.
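The JBSTAT formula quoted in the help text can be checked by hand. The sketch below is an illustration only: it uses a hypothetical random sample (not the data of this problem), and it assumes that `skewness` and `kurtosis`, called with their default settings, give the same moment estimates that `jbtest` uses internally.

```matlab
% Recompute the Jarque-Bera statistic by hand and compare with jbtest.
rng(0);                          % fixed seed so the run is repeatable
x = randn(200,1);                % hypothetical sample, NOT the data of this problem
n = numel(x);
% JBSTAT = N*(SKEWNESS^2/6 + (KURTOSIS-3)^2/24), as stated in the help text
jb_manual = n*(skewness(x)^2/6 + (kurtosis(x) - 3)^2/24);
[h,p,jbstat,critval] = jbtest(x);    % default 5% significance level
fprintf('manual = %.6f, jbtest = %.6f, critical value = %.4f\n', ...
        jb_manual, jbstat, critval);
```

If the two statistics agree, the third output of `jbtest` can be read directly as the quantity in the formula, and comparing it with `critval` reproduces the decision returned in `h`.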
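Supplementary example 1:
The lifetime X (in hours) of a certain electronic component follows a normal distribution in which both μ and σ² are unknown. The measured lifetimes of 16 components are:
159 280 101 212 224 379 179 264 222 362 168 250
149 260 485 170
Is there reason to believe that the mean lifetime of the components exceeds 225 hours?
Solution: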
>> X=[159 280 101 212 224 379 179 264 222 362 168 250 149 260 485 170];
>> [h,sig,ci]=ttest(X,225,0.05,1)
The output is:
h =
0
sig =
0.2570
ci =
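198.2321 Inf % the hypothesized mean 225 lies inside this confidence interval
Conclusion:
h = 0 indicates that, at significance level α = 0.05, the null hypothesis (H0: μ ≤ 225) cannot be rejected; that is, the data give no sufficient reason to conclude that the mean lifetime exceeds 225 hours.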
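The call above uses the older positional form of `ttest`. As a hedged sketch, the same right-tailed test can be written with the name/value pairs listed in the help text ('alpha', 'tail') and checked against a hand computation of the t statistic. The names `t_manual` and `p_manual` are introduced here for illustration only.

```matlab
% Right-tailed one-sample t-test of H0: mu <= 225 against H1: mu > 225.
X = [159 280 101 212 224 379 179 264 222 362 168 250 149 260 485 170];
[h,p,ci,stats] = ttest(X, 225, 'Alpha', 0.05, 'Tail', 'right');

% Hand computation: t = (sample mean - 225) / (s / sqrt(n))
n        = numel(X);
t_manual = (mean(X) - 225) / (std(X)/sqrt(n));
p_manual = 1 - tcdf(t_manual, n - 1);     % right-tail probability, n-1 degrees of freedom

fprintf('tstat = %.4f (manual %.4f), p = %.4f (manual %.4f)\n', ...
        stats.tstat, t_manual, p, p_manual);
```

The p-value printed here should match the value sig = 0.2570 shown in the output above.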
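Supplementary example 2:
An experiment is carried out on an open-hearth furnace to determine whether a proposed change in operating method increases the yield of steel. All heats are made in the same furnace, and apart from the operating method every other condition is kept as nearly identical as possible. One heat is made with the standard method, then one with the proposed new method, and so on alternately until 10 heats have been made with each. The yields are:
(1) Standard method: 78.1 72.4 76.2 74.3 77.4 78.4 76.0 75.5 76.7 77.3
(2) New method: 79.1 81.0 77.3 79.1 80.0 79.1 79.1 77.3 80.2 82.1
Assume the two samples are independent and come from two normal populations.
Does the proposed new operating method increase the yield? (Take α = 0.05.)
Solution: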
Assuming the two populations have equal variances:
>> X=[78.1 72.4 76.2 74.3 77.4 78.4 76.0 75.5 76.7 77.3];
>> Y=[79.1 81.0 77.3 79.1 80.0 79.1 79.1 77.3 80.2 82.1];
>> [h,sig,ci]=ttest2(X,Y,0.05,-1)
The output is:
h =
1
sig =
2.1759e-004 % a p-value this small is strong evidence against the null hypothesis of equal means
ci =
-Inf -1.9083
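Conclusion:
h = 1 indicates that, at significance level α = 0.05, the null hypothesis should be rejected; that is, the proposed new operating method does increase the yield and is therefore better than the standard method.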
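The solution rests on the equal-variance assumption. The sketch below, offered as one possible way to probe it, first runs `vartest2` on the two samples, then repeats the left-tailed test with `ttest2` in name/value form and compares it with a hand-computed pooled t statistic. The Welch variant ('Vartype','unequal') is included only as a cross-check; the variable names are illustrative.

```matlab
% Two-sample left-tailed t-test of H0: mu_X >= mu_Y against H1: mu_X < mu_Y.
X = [78.1 72.4 76.2 74.3 77.4 78.4 76.0 75.5 76.7 77.3];   % standard method
Y = [79.1 81.0 77.3 79.1 80.0 79.1 79.1 77.3 80.2 82.1];   % new method

[hv,pv] = vartest2(X, Y);          % F-test of equal variances (hv = 0: not rejected)

[h,p,ci,stats] = ttest2(X, Y, 'Alpha', 0.05, 'Tail', 'left', 'Vartype', 'equal');

% Hand computation of the pooled-variance t statistic
nx = numel(X);  ny = numel(Y);
sp = sqrt(((nx-1)*var(X) + (ny-1)*var(Y)) / (nx + ny - 2));   % pooled std. dev.
t_manual = (mean(X) - mean(Y)) / (sp*sqrt(1/nx + 1/ny));

% Welch's test, which does not assume equal variances
[hw,pw] = ttest2(X, Y, 'Tail', 'left', 'Vartype', 'unequal');

fprintf('vartest2 p = %.4f, pooled t = %.4f (manual %.4f), p = %.3e, Welch p = %.3e\n', ...
        pv, stats.tstat, t_manual, p, pw);
```

If the equal-variance test is not rejected and both t-tests agree, the conclusion above does not hinge on the variance assumption.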
Go ahead and try it yourself!