About Critical Values

数据架构

Category: Statistical Methods. Tags: big data, statistical methods, machine learning, python

Article link: https://blog.csdn.net/u011868279/article/details/125500434

1. Why Do We Need Critical Values?

Many statistical hypothesis tests return a p-value that is used to interpret the outcome of the test. Some tests do not return a p-value, so the calculated test statistic must be interpreted directly. This can be done by comparing the statistic to critical values taken from the distribution of the test statistic.

Some examples of statistical hypothesis tests and the distributions from which their critical values can be calculated are as follows:

Z-Test: Gaussian distribution.
Student’s t-Test: Student’s t-distribution.
Chi-Squared Test: Chi-Squared distribution.
ANOVA: F-distribution.

Critical values are also used when defining intervals for expected (or unexpected) observations in distributions. Calculating and using critical values may be appropriate when quantifying the uncertainty of estimated statistics or intervals such as confidence intervals and tolerance intervals. Note that a p-value can be calculated from a test statistic by retrieving the probability from the test statistic's cumulative distribution function (CDF).
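As a rough illustration of the note above, the sketch below converts a test statistic into a p-value using the CDF of the standard Gaussian distribution. The value of z is a made-up statistic chosen only for the example, and the choice of a z-test is an assumption.

```python
# p-value from a test statistic via the CDF (illustrative sketch)
from scipy.stats import norm

# hypothetical z statistic, not taken from any real dataset
z = 1.8

# right-tailed p-value: probability of a value at least as large as z
p_right = 1.0 - norm.cdf(z)
# two-tailed p-value: probability in both tails beyond |z|
p_two = 2.0 * (1.0 - norm.cdf(abs(z)))
print(p_right, p_two)
```

The same pattern should apply to other distributions by swapping norm for the distribution of the test statistic in question.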

2. What Is a Critical Value?

A critical value is defined in the context of the population distribution and a probability. An observation from the population has a value equal to or less than the critical value with the given probability. We can express this mathematically as follows:

Pr[X ≤ critical value] = probability

Where Pr is the calculation of probability, X are observations from the population, critical value is the calculated critical value, and probability is the chosen probability. Critical values are calculated using a mathematical function where the probability is provided as an argument. The probability is often expressed in terms of a significance level, denoted by the lowercase Greek letter alpha (α), which is the complement of the probability: alpha = 1 - probability.

Standard alpha values are used when calculating critical values, chosen for historical reasons and kept in use for consistency. These alpha values include:

1% (alpha = 0.01)
5% (alpha = 0.05)
10% (alpha = 0.10)

Critical values provide an alternative and equivalent way to interpret statistical hypothesis tests to the p-value.
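To make the link between alpha and the critical value concrete, here is a minimal sketch that computes right-tailed critical values on the standard Gaussian distribution for three commonly used significance levels. The choice of the Gaussian distribution and of a right-tailed value is an assumption made for the example.

```python
# critical values for common significance levels (illustrative sketch)
from scipy.stats import norm

# commonly used significance levels
alphas = [0.01, 0.05, 0.10]
critical_values = {}
for alpha in alphas:
    # probability = 1 - alpha, so the critical value is the PPF at 1 - alpha
    critical_values[alpha] = norm.ppf(1.0 - alpha)
    print('alpha=%.2f, critical value=%.3f' % (alpha, critical_values[alpha]))
```

A smaller alpha pushes the critical value further into the tail, so a more extreme test statistic is needed to reject the null hypothesis.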

3. How to Use Critical Values

Calculated critical values are used as a threshold for interpreting the result of a statistical test. The observation values in the population beyond the critical value are often called the critical region or the region of rejection.

3.1 One-Tailed Test

A one-tailed test has a single critical value, on either the left or the right of the distribution. For non-symmetrical distributions (such as the Chi-Squared distribution), the critical value is usually on the right. The statistic is compared to the calculated critical value. If the statistic is less than the critical value, we fail to reject the null hypothesis of the statistical test; otherwise, the null hypothesis is rejected. We can summarize this interpretation as follows:

Test Statistic < Critical Value: not significant result, fail to reject null hypothesis (H0).
Test Statistic ≥ Critical Value: significant result, reject null hypothesis (H0).
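The right-tailed rule summarized above can be sketched as a small helper. The function name interpret_right_tailed, the degrees of freedom, and the two test statistics are all hypothetical and exist only to illustrate the comparison.

```python
# right-tailed decision rule (illustrative sketch)
from scipy.stats import chi2

def interpret_right_tailed(stat, critical):
    """Apply the right-tailed decision rule to a test statistic."""
    if stat >= critical:
        return 'reject H0'
    return 'fail to reject H0'

alpha = 0.05
df = 10  # assumed degrees of freedom
# right-tailed critical value on the Chi-Squared distribution
critical = chi2.ppf(1.0 - alpha, df)

# hypothetical test statistics, not from a real dataset
result_small = interpret_right_tailed(12.5, critical)
result_large = interpret_right_tailed(21.0, critical)
print(critical, result_small, result_large)
```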

3.2 Two-Tailed Test

A two-tailed test has two critical values, one on each side of the distribution, which is often assumed to be symmetrical (e.g. Gaussian and Student's t-distributions). When using a two-tailed test, the significance level (or alpha) used in the calculation of the critical values must be divided by 2, so that each critical value accounts for half of alpha on its side of the distribution. To make this concrete, consider an alpha of 5%. This is split into 2.5% in each tail, leaving an acceptance area of 95% in the middle of the distribution.

We can refer to the two critical values as the lower and upper critical values, for the left and right of the distribution respectively. Test statistic values strictly between the lower and upper critical values indicate a failure to reject the null hypothesis, whereas values at or below the lower critical value, or at or above the upper critical value, indicate rejection of the null hypothesis. We can summarize this interpretation as follows:

Lower CR < Test Statistic < Upper CR: not significant result, fail to reject null hypothesis (H0).
Test Statistic ≤ Lower CR OR Test Statistic ≥ Upper CR: significant result, reject null hypothesis (H0).

If the distribution of the test statistic is symmetric around a mean of zero, then we can shortcut the check by comparing the absolute (positive) value of the test statistic to the upper critical value.

|Test Statistic| < Upper Critical Value: not significant result, fail to reject null hypothesis (H0), distributions same.
|Test Statistic| ≥ Upper Critical Value: significant result, reject null hypothesis (H0), distributions differ.
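The two-tailed rule and its absolute-value shortcut can be sketched side by side to check that they agree on a symmetric, zero-centered distribution. The helper names and the sample statistics are hypothetical, and the standard Gaussian distribution is assumed.

```python
# two-tailed decision rule and absolute-value shortcut (illustrative sketch)
from scipy.stats import norm

alpha = 0.05
# split alpha across both tails
lower = norm.ppf(alpha / 2.0)
upper = norm.ppf(1.0 - alpha / 2.0)

def interpret_two_tailed(stat, lower, upper):
    """Compare the statistic to both critical values."""
    if stat <= lower or stat >= upper:
        return 'reject H0'
    return 'fail to reject H0'

def interpret_abs(stat, upper):
    """Shortcut for distributions symmetric around zero."""
    if abs(stat) >= upper:
        return 'reject H0'
    return 'fail to reject H0'

# hypothetical test statistics, not from a real dataset
stats = [-2.5, 0.3, 2.1]
full = [interpret_two_tailed(s, lower, upper) for s in stats]
short = [interpret_abs(s, upper) for s in stats]
print(lower, upper, full, short)
```

The shortcut only holds when the lower critical value is the negative of the upper one, which is why it is limited to distributions that are symmetric around zero.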

4. How to Calculate Critical Values

Critical values are calculated from the functions that describe a distribution. Recall the definitions of the PDF, CDF, and PPF as follows:

Probability Density Function (PDF): Returns the probability density (relative likelihood) of an observation having a specific value from the distribution.
Cumulative Distribution Function (CDF): Returns the probability of an observation being equal to or less than a specific value from the distribution.
Percent Point Function (PPF): Returns the observation value at or below which observations fall with the provided probability. It is the inverse of the CDF.
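The three functions can be compared on the standard Gaussian distribution. This is a small sketch in which the point x = 1.0 is an arbitrary choice, and it shows that the PPF undoes the CDF.

```python
# relationship between the PDF, CDF and PPF (illustrative sketch)
from scipy.stats import norm

x = 1.0  # arbitrary observation value
density = norm.pdf(x)       # height of the density curve at x (not a probability)
prob = norm.cdf(x)          # Pr[X <= x]
recovered = norm.ppf(prob)  # the PPF inverts the CDF and returns x again
print(density, prob, recovered)
```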

4.1 Gaussian Critical Values

The example below calculates the percent point function for 95% on the standard Gaussian distribution.

# gaussian percent point function
from scipy.stats import norm
# define probability
p = 0.95
# retrieve value <= probability
value = norm.ppf(p)
print(value)

# confirm with cdf
p = norm.cdf(value)
print(p)

Running the example first prints a value of about 1.645, below which 95% of the observations from the distribution fall. This value is then confirmed by retrieving the probability of the observation from the CDF, which returns 95%, as expected. Note that this is a one-sided bound. The 68-95-99.7 rule refers to the central 95% of the distribution, which lies within about 2 (more precisely 1.96) standard deviations of the mean, whereas a value 1.645 standard deviations above the mean leaves 5% in the upper tail only.

4.2 Student's t Critical Values

The example below calculates the percentage point function for 95% on the standard Student’s t-distribution with 10 degrees of freedom.

# student t-distribution percent point function
from scipy.stats import t
# define probability
p = 0.95
df = 10
# retrieve value <= probability
value = t.ppf(p, df)
print(value)
# confirm with cdf
p = t.cdf(value, df)
print(p)

Running the example returns a value of about 1.812, below which 95% of the observations from the chosen distribution fall. The probability of the value is then confirmed (with minor rounding error) via the CDF.
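For a two-tailed t-test, alpha would be split across both tails as described in section 3.2. The sketch below, which assumes an alpha of 5% and the same 10 degrees of freedom, compares the one-tailed and two-tailed upper critical values.

```python
# one-tailed versus two-tailed t critical values (illustrative sketch)
from scipy.stats import t

alpha = 0.05
df = 10
# all of alpha in the right tail
one_tailed = t.ppf(1.0 - alpha, df)
# half of alpha in each tail; this is the upper critical value
two_tailed = t.ppf(1.0 - alpha / 2.0, df)
print(one_tailed, two_tailed)
```

The two-tailed value is larger because only 2.5% of the distribution is allowed beyond it.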

4.3 Chi-Squared Critical Values

The example below calculates the percentage point function for 95% on the standard Chi-Squared distribution with 10 degrees of freedom.

# chi-squared percent point function
from scipy.stats import chi2
# define probability
p = 0.95
df = 10
# retrieve value <= probability
value = chi2.ppf(p, df)
print(value)
# confirm with cdf
p = chi2.cdf(value, df)
print(p)

Running the example first calculates a value of about 18.3, below which 95% of the observations from the distribution fall. The probability of this observation is then confirmed by using it as input to the CDF.
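Because Chi-Squared tests are usually right-tailed, the critical value can also be requested directly in terms of alpha. As a sketch, SciPy's inverse survival function isf(alpha, df) should give the same result as ppf(1 - alpha, df); the alpha of 5% is an assumption matching the example above.

```python
# right-tailed chi-squared critical value via ppf and isf (illustrative sketch)
from scipy.stats import chi2

alpha = 0.05
df = 10
# critical value from the percent point function at 1 - alpha
via_ppf = chi2.ppf(1.0 - alpha, df)
# same critical value from the inverse survival function at alpha
via_isf = chi2.isf(alpha, df)
print(via_ppf, via_isf)
```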