About Tolerance Intervals

It can be useful to have upper and lower limits on data. These bounds can be used to help identify anomalies and to set expectations for future observations. A bound on observations from a population is called a tolerance interval.

A tolerance interval is different from a prediction interval, which quantifies the uncertainty for a single predicted value. It is also different from a confidence interval, which quantifies the uncertainty of a population parameter such as a mean. Instead, a tolerance interval covers a proportion of the population distribution.

After completing this tutorial, you will know:

  • That statistical tolerance intervals provide bounds on observations from a population.
  • That a tolerance interval requires that both a coverage proportion and confidence be specified.
  • That the tolerance interval for a data sample with a Gaussian distribution can be easily calculated.

1.1 Tutorial Overview

1. Bounds on Data

2. What Are Statistical Tolerance Intervals?

3. How to Calculate Tolerance Intervals

4. Tolerance Interval for Gaussian Distribution

1.2 Bounds on Data

The range of common values for data is called a tolerance interval.
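To make the contrast with confidence and prediction intervals concrete, the sketch below computes all three for one Gaussian sample. The 95% levels, the 99% confidence for the tolerance interval, and the variable names are illustrative assumptions, and the tolerance factor uses Howe's approximation for Gaussian data.

```python
# compare confidence, prediction and tolerance intervals on one sample
from numpy import mean, sqrt, std
from numpy.random import seed, randn
from scipy.stats import chi2, norm, t

seed(1)
data = 5 * randn(100) + 50
n = len(data)
dof = n - 1
m, s = mean(data), std(data, ddof=1)

# 95% confidence interval for the population mean
t_critical = t.ppf(0.975, dof)
ci_half = t_critical * s / sqrt(n)
# 95% prediction interval for a single new observation
pi_half = t_critical * s * sqrt(1 + 1 / n)
# tolerance interval covering 95% of the population with 99% confidence
k = sqrt(dof * (1 + 1 / n) * norm.ppf(0.975) ** 2 / chi2.ppf(0.01, dof))
ti_half = k * s

for name, half in [('confidence', ci_half), ('prediction', pi_half), ('tolerance', ti_half)]:
    print('%s interval: %.2f to %.2f' % (name, m - half, m + half))
```

For a sample of this size the confidence interval on the mean is the narrowest and the tolerance interval is the widest, because the tolerance interval must bound a proportion of all observations rather than a single parameter or a single new value.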

1.3 What Are Statistical Tolerance Intervals?

The tolerance interval is a bound on an estimate of the proportion of data in a population.

A statistical tolerance interval [contains] a specified proportion of the units from the sampled population or process.

A tolerance interval is defined in terms of two quantities:

  • Coverage: The proportion of the population covered by the interval.
  • Confidence: The probabilistic confidence that the interval covers the proportion of the population.

The tolerance interval is constructed from data using two coefficients, the coverage and the tolerance coefficient. The coverage is the proportion of the population (p) that the interval is supposed to contain. The tolerance coefficient is the degree of confidence with which the interval reaches the specified coverage.
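One way to read the two quantities is through repeated sampling: the confidence is the fraction of samples whose interval really does contain at least the stated proportion of the population. The simulation below is a minimal sketch of that idea, assuming a known Gaussian population (mean 50, standard deviation 5), a sample size of 30, and Howe's approximation for the tolerance factor; all names and settings are illustrative.

```python
# estimate how often a 95%/99% tolerance interval really covers 95% of the population
from numpy import mean, sqrt, std
from numpy.random import seed, randn
from scipy.stats import chi2, norm

seed(1)
mu, sigma, n = 50, 5, 30  # assumed population parameters and sample size
coverage, confidence = 0.95, 0.99
dof = n - 1
# tolerance factor, in units of the sample standard deviation
k = sqrt(dof * (1 + 1 / n) * norm.ppf((1 + coverage) / 2) ** 2 / chi2.ppf(1 - confidence, dof))

trials = 2000
hits = 0
for _ in range(trials):
    data = sigma * randn(n) + mu
    m, s = mean(data), std(data, ddof=1)
    lower, upper = m - k * s, m + k * s
    # true proportion of the population inside this particular interval
    covered = norm.cdf(upper, mu, sigma) - norm.cdf(lower, mu, sigma)
    if covered >= coverage:
        hits += 1
achieved = hits / trials
print('Intervals reaching %d%% coverage: %.1f%%' % (round(coverage * 100), achieved * 100))
```

The printed fraction should land close to the requested confidence, with some simulation noise and a small error from the approximation.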

1.4 How to Calculate Tolerance Intervals

The size of a tolerance interval depends on the size of the data sample from the population and on the variance of the population: the interval narrows as the sample grows and widens as the variance grows. There are two main methods for calculating tolerance intervals depending on the distribution of data: parametric and nonparametric methods.

  • Parametric Tolerance Interval: Use knowledge of the population distribution in specifying both the coverage and confidence. Often used to refer to a Gaussian distribution.
  • Nonparametric Tolerance Interval: Use rank statistics to estimate the coverage and confidence, often resulting in less precision (wider intervals) given the lack of information about the distribution.
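As an illustration of the nonparametric approach, one simple choice (an assumption here, not the method used in this tutorial) is to take the sample minimum and maximum as the interval. For any continuous population distribution, the coverage of that interval follows a Beta(n - 1, 2) distribution, so the confidence can be computed directly from the sample size.

```python
# nonparametric tolerance interval from the sample minimum and maximum
from numpy.random import seed, randn
from scipy.stats import beta

seed(1)
data = 5 * randn(100) + 50
n = len(data)
coverage = 0.95
lower, upper = data.min(), data.max()
# the coverage of (min, max) follows a Beta(n - 1, 2) distribution,
# so the confidence is the probability that it exceeds the target coverage
confidence = beta.sf(coverage, n - 1, 2)
# equivalent closed form, computed for comparison
closed_form = 1 - n * coverage ** (n - 1) + (n - 1) * coverage ** n
print('%.2f to %.2f covers %d%% of data with a confidence of %.1f%%'
      % (lower, upper, round(coverage * 100), confidence * 100))
```

Note that the confidence is fixed by the sample size and the target coverage; it cannot be chosen freely without changing which order statistics are used as the bounds.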

Tolerance intervals are relatively straightforward to calculate for a sample of independent observations drawn from a Gaussian distribution. We will demonstrate this calculation in the next section.

1.5 Tolerance Interval for Gaussian Distribution

We will create a sample of 100 observations drawn from a Gaussian distribution with a mean of 50 and a standard deviation of 5.

# generate dataset
from numpy.random import randn

data = 5 * randn(100) + 50

Remember that the degrees of freedom are the number of values in the calculation that can vary. Here, we have 100 observations, therefore 100 degrees of freedom. We do not know the population standard deviation, so it must be estimated from the sample using the sample mean, which uses up one degree of freedom. This means our degrees of freedom will be (N - 1) or 99.
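The (N - 1) divisor is the same correction that NumPy applies when ddof=1 is passed to std(). The short check below, with illustrative variable names, compares a manual calculation against the NumPy call.

```python
# sample standard deviation with N - 1 degrees of freedom
from numpy import mean, sqrt, std
from numpy.random import seed, randn

seed(1)
data = 5 * randn(100) + 50
n = len(data)
dof = n - 1
# manual calculation: squared deviations from the sample mean, divided by N - 1
manual = sqrt(((data - mean(data)) ** 2).sum() / dof)
# NumPy equivalent; the default ddof=0 would divide by N instead
sample_std = std(data, ddof=1)
print('manual=%.3f numpy=%.3f' % (manual, sample_std))
```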

# specify degrees of freedom
n = len(data)
dof = n - 1

Next, we must specify the proportional coverage of the data.

# specify data coverage
from scipy.stats import norm
prop = 0.95
prop_inv = (1.0 - prop) / 2.0
gauss_critical = norm.ppf(prop_inv)
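Because norm.ppf() is evaluated at the lower tail probability, the critical value is negative (about -1.96 for 95% coverage); this does not matter later because the value is squared. The sketch below, using illustrative variable names, checks that the lower and upper critical values are symmetric and enclose the requested proportion of a standard Gaussian.

```python
# check the Gaussian critical value for 95% coverage
from scipy.stats import norm

prop = 0.95
lower_z = norm.ppf((1.0 - prop) / 2.0)  # lower-tail critical value, negative
upper_z = norm.ppf((1.0 + prop) / 2.0)  # upper-tail critical value, positive
# proportion of a standard Gaussian between the two critical values
covered = norm.cdf(upper_z) - norm.cdf(lower_z)
print('lower=%.3f upper=%.3f covered=%.3f' % (lower_z, upper_z, covered))
```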

Next, we need to calculate the confidence of the coverage. We can do this by retrieving the critical value from the Chi-Squared distribution for the given number of degrees of freedom and desired probability. We can use the chi2.ppf() SciPy function.

# specify confidence
from scipy.stats import chi2
prob = 0.99
prop_inv = 1.0 - prob
chi_critical = chi2.ppf(prop_inv,dof)
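The lower tail of the Chi-Squared distribution is used because, for a Gaussian sample, (n - 1) times the sample variance divided by the population variance follows a Chi-Squared distribution with n - 1 degrees of freedom. With probability 0.99 that ratio exceeds the 1% critical value, which gives an upper bound on the unknown standard deviation. The sketch below illustrates this reading; the variable names are assumptions.

```python
# interpret the Chi-Squared critical value as an upper bound on the standard deviation
from numpy import sqrt
from scipy.stats import chi2

dof = 99
prob = 0.99
chi_critical = chi2.ppf(1.0 - prob, dof)
# probability that a Chi-Squared variable exceeds the critical value
exceed = chi2.sf(chi_critical, dof)
# factor by which the sample standard deviation is inflated to bound sigma
inflation = sqrt(dof / chi_critical)
print('critical=%.3f exceed=%.2f inflation=%.3f' % (chi_critical, exceed, inflation))
```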

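Finally, we can calculate the tolerance interval. A tolerance factor is calculated as sqrt((dof * (1 + (1/n)) * gauss_critical^2) / chi_critical) and then scaled by the sample standard deviation to give the half-width of the interval. Here, dof is the number of degrees of freedom, n is the size of the data sample, gauss_critical is the critical value from the Gaussian distribution, such as 1.96 for 95% coverage of the population, and chi_critical is the critical value from the Chi-Squared distribution for the desired confidence and degrees of freedom.

# calculate tolerance interval
from numpy import sqrt
from numpy import std
# tolerance factor scaled by the sample standard deviation (N - 1 estimate)
interval = sqrt((dof * (1 + (1/n)) * gauss_critical**2) / chi_critical) * std(data, ddof=1)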

We can tie all of this together and calculate the Gaussian tolerance interval for our data sample. The complete example is listed below.

# parametric tolerance interval
from numpy.random import seed
from numpy.random import randn
from numpy import mean
from numpy import sqrt
from numpy import std
from scipy.stats import chi2
from scipy.stats import norm
# seed the random number generator
seed(1)
# generate dataset
data = 5 * randn(100) + 50
# specify degrees of freedom
n = len(data)
dof = n - 1
# specify data coverage
prop = 0.95
prop_inv = (1.0 - prop) / 2.0
gauss_critical = norm.ppf(prop_inv)
print('Gaussian critical value: %.3f (coverage=%d%%)' % (gauss_critical, prop*100))
# specify confidence
prob = 0.99
prop_inv = 1.0 - prob
chi_critical = chi2.ppf(prop_inv, dof)
print('Chi-Squared critical value: %.3f (prob=%d%%, dof=%d)' % (chi_critical, prob*100, dof))
# tolerance factor scaled by the sample standard deviation (N - 1 estimate)
interval = sqrt((dof * (1 + (1/n)) * gauss_critical**2) / chi_critical) * std(data, ddof=1)
print('Tolerance Interval: %.3f' % interval)
# summarize
data_mean = mean(data)
lower, upper = data_mean - interval, data_mean + interval
print('%.2f to %.2f covers %d%% of data with a confidence of %d%%' % (lower, upper, prop*100, prob*100))

Running the example first calculates and prints the relevant critical values for the Gaussian and Chi-Squared distributions. The tolerance is then printed, followed by the interval expressed as lower and upper bounds around the sample mean.
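The same steps can be wrapped in a small helper so that different coverage and confidence levels are easy to try. The function name gaussian_tolerance_interval and its defaults are hypothetical; the sketch assumes independent Gaussian data and scales the tolerance factor by the sample standard deviation (ddof=1).

```python
# reusable helper for a two-sided Gaussian tolerance interval (hypothetical name)
from numpy import mean, sqrt, std
from numpy.random import seed, randn
from scipy.stats import chi2, norm

def gaussian_tolerance_interval(data, coverage=0.95, confidence=0.99):
    n = len(data)
    dof = n - 1
    # critical values for the requested coverage and confidence
    gauss_critical = norm.ppf((1.0 + coverage) / 2.0)
    chi_critical = chi2.ppf(1.0 - confidence, dof)
    # tolerance factor, in units of the sample standard deviation
    k = sqrt((dof * (1 + (1 / n)) * gauss_critical ** 2) / chi_critical)
    half_width = k * std(data, ddof=1)
    center = mean(data)
    return center - half_width, center + half_width, k

seed(1)
data = 5 * randn(100) + 50
lower, upper, k = gaussian_tolerance_interval(data)
print('factor=%.3f, interval %.2f to %.2f' % (k, lower, upper))
```

Raising either the coverage or the confidence argument widens the returned interval, since both increase the tolerance factor.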

 It can also be helpful to demonstrate how the tolerance interval will decrease (become more precise) as the size of the sample is increased. The example below demonstrates this by calculating the tolerance interval for different sample sizes for the same small contrived problem.

 

# plot tolerance interval vs sample size
from numpy.random import seed
from numpy.random import randn
from numpy import sqrt
from numpy import std
from scipy.stats import chi2
from scipy.stats import norm
from matplotlib import pyplot
# seed the random number generator
seed(1)
# sample sizes
sizes = range(5,15)
for n in sizes:
    # generate dataset
    data = 5 * randn(n) + 50
    # calculate degrees of freedom
    dof = n - 1
    # specify data coverage
    prop = 0.95
    prop_inv = (1.0 - prop) / 2.0
    gauss_critical = norm.ppf(prop_inv)
    # specify confidence
    prob = 0.99
    prop_inv = 1.0 - prob
    chi_critical = chi2.ppf(prop_inv, dof)
    # tolerance factor scaled by the sample standard deviation (N - 1 estimate)
    tol = sqrt((dof * (1 + (1/n)) * gauss_critical**2) / chi_critical) * std(data, ddof=1)
    # plot the interval around the true population mean
    pyplot.errorbar(n, 50, yerr=tol, color='blue', fmt='o')
# plot results
pyplot.show()

Running the example creates a plot showing the tolerance interval around the true population mean. We can see that the interval tends to become smaller (more precise) as the sample size is increased from 5 to 14 examples.
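The trend can also be checked numerically without a plot. The sketch below tabulates only the tolerance factor (the multiplier applied to the standard deviation) for a few sample sizes chosen for illustration; as the sample grows, the factor approaches the Gaussian critical value of about 1.96.

```python
# tolerance factor for a range of sample sizes, without plotting
from numpy import sqrt
from scipy.stats import chi2, norm

prop, prob = 0.95, 0.99
gauss_critical = norm.ppf((1.0 + prop) / 2.0)
sizes = [5, 10, 30, 100, 1000, 10000]  # illustrative sample sizes
factors = []
for n in sizes:
    dof = n - 1
    chi_critical = chi2.ppf(1.0 - prob, dof)
    # tolerance factor, in units of the standard deviation
    k = sqrt((dof * (1 + (1 / n)) * gauss_critical ** 2) / chi_critical)
    factors.append(k)
    print('n=%5d factor=%.3f' % (n, k))
```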

 
