Algorithms: Computing Variance While Avoiding Floating-Point Rounding Error


Author: [account deactivated]


Article link: https://blog.csdn.net/sinat_37976731/article/details/80153067


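This article describes how to compute the variance with a recurrence that avoids the precision problems caused by rounding. Through a precision analysis and a mathematical proof, it explains why this approach preserves accuracy, particularly when processing large amounts of data.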

(Summary generated automatically by CSDN.)


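The problem

Chapter 1 of Algorithms, 4th Edition presents an algorithm for computing the variance and standard deviation that avoids the error introduced by rounding. It looks like this: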

import edu.princeton.cs.algs4.StdIn;
import edu.princeton.cs.algs4.StdOut;

public class Accumulator {
   
    private int n = 0;          // number of data values
    private double sum = 0.0;   // sample variance * (n-1)
    private double mu = 0.0;    // sample mean

    /**
     * Initializes an accumulator.
     */
    public Accumulator() {
    }

    /**
     * Adds the specified data value to the accumulator.
     * @param  x the data value
     */
    public void addDataValue(double x) {
        n++;
        double delta = x - mu;
        mu  += delta / n;
        sum += (double) (n - 1) / n * delta * delta;
    }

    /**
     * Returns the mean of the data values.
     * @return the mean of the data values
     */
    public double mean() {
        return mu;
    }

    /**
     * Returns the sample variance of the data values.
     * @return the sample variance of the data values
     * Note: the divisor is n - 1 rather than n. This puzzled me at first,
     * until I realized that this is the sample variance.
     */
    public double var() {
        if (n <= 1) return Double.NaN;
        return sum / (n - 1);
    }

    /**
     * Returns the sample standard deviation of the data values.
     * @return the sample standard deviation of the data values
     */
    public double stddev() {
        return Math.sqrt(this.var());
    }

    /**
     * Returns the number of data values.
     * @return the number of data values
     */
    public int count() {
        return n;
    }

    /**
     * Returns a string representation of this accumulator.
     * @return a string representation of this accumulator
     */
    public String toString() {
        return "n = " + n + ", mean = " + mean() + ", stddev = " + stddev();
    }

    /**
     * Unit tests the {@code Accumulator} data type.
     * Reads in a stream of real numbers from standard input;
     * adds them to the accumulator; and prints the mean,
     * sample standard deviation, and sample variance to standard
     * output.
     *
     * @param args the command-line arguments
     */
    public static void main(String[] args) {
        Accumulator stats = new Accumulator();
        while (!StdIn.isEmpty()) {
            double x = StdIn.readDouble();
            stats.addDataValue(x);
        }

        StdOut.printf("n      = %d\n",   stats.count());
        StdOut.printf("mean   = %.5f\n", stats.mean());
        StdOut.printf("stddev = %.5f\n", stats.stddev());
        StdOut.printf("var    = %.5f\n", stats.var());
        StdOut.println(stats);
    }
}

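The heart of the class is the addDataValue() method (mean() returns the average and var() returns the variance). Instead of the usual approach of summing all the squares and then computing the variance, it uses a recurrence. Two recurrence formulas are involved, and they are proved below.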
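To make the contrast concrete, here is a minimal sketch that runs the usual sum-of-squares formula and the recurrence from addDataValue() on the same data. The class name VarianceDemo, both method names, and the sample values are made up for this illustration and do not come from the book.

```java
// Hypothetical demo class (not from the book): compares two ways of
// computing the sample variance in a single pass.
class VarianceDemo {

    // Usual formula: (sum of squares - n * mean^2) / (n - 1).
    static double naiveVariance(double[] xs) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        int n = xs.length;
        double mean = sum / n;
        return (sumSq - n * mean * mean) / (n - 1);
    }

    // Same update steps as Accumulator.addDataValue().
    static double recurrenceVariance(double[] xs) {
        int n = 0;
        double mu = 0.0;   // running mean
        double sum = 0.0;  // running sample variance * (n - 1)
        for (double x : xs) {
            n++;
            double delta = x - mu;
            mu += delta / n;
            sum += (double) (n - 1) / n * delta * delta;
        }
        return sum / (n - 1);
    }

    public static void main(String[] args) {
        // Assumed sample data: a large offset with a small spread.
        // The exact sample variance of 4, 7, 13, 16 (and of any shifted copy) is 30.
        double[] xs = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
        System.out.println("sum of squares: " + naiveVariance(xs));
        System.out.println("recurrence    : " + recurrenceVariance(xs));
    }
}
```

On this input the recurrence should recover the true value of 30, while the sum-of-squares version is far off: the squares are around 10^18, where neighboring doubles are hundreds apart, so the small difference that carries the variance is lost.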

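Precision analysis

A double has limited precision. Put simply, it is stored in a form of scientific notation (the real format is binary, but decimal is used here for readability):

$N\times10^{n}$

In this form, part of the storage holds the significant digits and the rest holds the exponent.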

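So although the range of a double is very large, that does not mean a double can hold a number with more than three hundred meaningful digits. It can only hold an exponent that large.

In fact the last two hundred or more digits are all 0, something like this:

$1.12345678901234\times10^{308}$

Written out in full, everything after the first 15 or so digits is zero.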
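A quick way to see this limit is to add a small number to a huge one and check whether anything changed. The sketch below is only an illustration; the class name DoubleDigitsDemo and the helper isAbsorbed are names invented here.

```java
// Hypothetical demo class: shows that a double keeps only about
// 15 to 17 significant decimal digits, however large its exponent.
class DoubleDigitsDemo {

    // True when adding 'small' to 'big' leaves 'big' unchanged,
    // i.e. 'small' falls below the digits that 'big' can still store.
    static boolean isAbsorbed(double big, double small) {
        return big + small == big;
    }

    public static void main(String[] args) {
        System.out.println(isAbsorbed(1e308, 1.0)); // true
        System.out.println(isAbsorbed(1e16, 1.0));  // true
        System.out.println(isAbsorbed(1e15, 1.0));  // false
        // Gap between 1e308 and the next representable double.
        System.out.println(Math.ulp(1e308));
    }
}
```

The switch from false to true between 1e15 and 1e16 marks the point where a double can no longer tell consecutive integers apart.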

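Now to the question of why reducing rounding matters. Suppose we simply add all the values together. The running sum then uses up the digits that were available for the fractional part. Take an extreme case: more than 1,000,000 values, each a large number with a random fractional part, added one after another.

$\sum{(10^{9} + k)}, \quad 0<k<1$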

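Stored as a double, one such term looks like this, with 10 integer digits and 5 fractional digits:

$1.00000000012345\times10^{9}$

After ten additions the running sum carries into an eleventh integer digit. Fifteen significant digits are no longer enough to keep all five fractional digits, so the last one is rounded away, leaving 11 integer digits and 4 fractional digits, for example:

$1.00000000001235\times10^{10}$

Precision is lost as a result, and the larger the numbers get, the more pronounced the problem becomes.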
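The same effect can be observed with Math.ulp, which reports the gap between a double and its nearest neighbor. This is a rough sketch: the class name UlpDemo, the helper runningSum, and the term 1e9 + 0.12345 are assumptions chosen to mirror the example above.

```java
// Hypothetical demo class: as a running sum grows, the gap between
// adjacent doubles grows with it, so fractional digits are rounded away.
class UlpDemo {

    // Adds 'term' to an accumulator 'count' times, the plain way.
    static double runningSum(double term, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        double term = 1e9 + 0.12345;
        double sum = runningSum(term, 1_000_000);
        // Resolution available for a single term (around 1e-7).
        System.out.println("gap near one term: " + Math.ulp(term));
        // Resolution available for the sum (around 0.1): far coarser.
        System.out.println("gap near the sum : " + Math.ulp(sum));
    }
}
```

Every addition into the large sum is rounded to the coarser gap, which is why keeping a running mean and a running sum of squared deviations, both of modest size, is the safer design.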

Mathematical proof

Now for the proof. We start from the formula for the mean:

$m_{n}=\frac{\sum_{i=1}^{n}{x_{i}}}{n}, m_{n-1}=\frac{\sum_{i=1}^{n-1}{x_{i}}}{n-1}$

Here $m_{n}$ is the mean of the first $n$ values and $\sum_{i=1}^{n}{x_{i}}$ is their sum.

Subtracting:

$m_{n}-m_{n-1}=\frac{\sum_{i=1}^{n}{x_{i}}}{n}-\frac{\sum_{i=1}^{n-1}{x_{i}}}{n-1}$

Multiplying both sides by $n$:

$n(m_{n}-m_{n-1})=\sum_{i=1}^{n}{x_{i}}-\frac{n}{n-1}\sum_{i=1}^{n-1}{x_{i}}$
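One standard way to carry the derivation through to the two update rules used in addDataValue() is sketched below. This is a reconstruction under assumptions rather than the author's own steps: the symbols $\delta$ and $S_{n}$ are notation introduced here, with $\delta=x_{n}-m_{n-1}$ matching delta in the code and $S_{n}=\sum_{i=1}^{n}{(x_{i}-m_{n})^{2}}$ matching sum.

From the definition of the mean, $\sum_{i=1}^{n-1}{x_{i}}=(n-1)m_{n-1}$, so

$n m_{n}=(n-1)m_{n-1}+x_{n}$

$m_{n}=m_{n-1}+\frac{x_{n}-m_{n-1}}{n}=m_{n-1}+\frac{\delta}{n}$

which is the update mu += delta / n. For the second recurrence, expand the squares:

$S_{n}=\sum_{i=1}^{n}{x_{i}^{2}}-n m_{n}^{2}, \quad S_{n-1}=\sum_{i=1}^{n-1}{x_{i}^{2}}-(n-1)m_{n-1}^{2}$

$S_{n}-S_{n-1}=x_{n}^{2}-n m_{n}^{2}+(n-1)m_{n-1}^{2}$

Substituting $x_{n}=m_{n-1}+\delta$ and $m_{n}=m_{n-1}+\frac{\delta}{n}$, the terms in $m_{n-1}^{2}$ and $m_{n-1}\delta$ cancel, leaving

$S_{n}-S_{n-1}=\delta^{2}-\frac{\delta^{2}}{n}=\frac{n-1}{n}\delta^{2}$

which is the update sum += (double) (n - 1) / n * delta * delta. The sample variance is then $\frac{S_{n}}{n-1}$, as returned by var().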
