Sample kurtosis: its relationship to the kurtosis of a random variable and to fourth-order statistics, with computation and estimation


lppamber


Article link: https://blog.csdn.net/u011503666/article/details/109546638



I. Kurtosis

1. Kurtosis of a random variable (Pearson's moment coefficient of kurtosis)

The kurtosis of a random variable ${X}$ is its fourth standardized moment, defined as:

$Kurt[X]=\displaystyle E \Big[(\frac{X-\mu}{\sigma})^4\Big]=\frac{\mu_4}{\sigma^4}=\frac{E\Big[(X-\mu)^4\Big]}{\Big(E\Big[(X-\mu)^2\Big]\Big)^2},$

where $\mu_4$ is the fourth central moment of ${X}$, $\sigma$ is the standard deviation of ${X}$, and $E$ denotes expectation.
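A quick numerical check of the definition may help. The sketch below computes $Kurt[X]=\mu_4/\sigma^4$ for a discrete random variable from its support and probabilities. The helper name `population_kurtosis` is purely illustrative and does not come from any library.

```python
from typing import Sequence


def population_kurtosis(values: Sequence[float], probs: Sequence[float]) -> float:
    """Kurt[X] = E[(X - mu)^4] / (E[(X - mu)^2])^2 for a discrete random variable.

    `values` is the support of X and `probs` the matching probabilities
    (assumed to sum to 1). Hypothetical helper for illustration only.
    """
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    mu = sum(p * x for x, p in zip(values, probs))                # E[X]
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))   # sigma^2
    mu4 = sum(p * (x - mu) ** 4 for x, p in zip(values, probs))   # fourth central moment
    return mu4 / var ** 2


# X = -1 or +1 with probability 1/2 each: mu_4 = sigma^2 = 1, so Kurt[X] = 1
print(population_kurtosis([-1, 1], [0.5, 0.5]))
# A fair six-sided die: Kurt[X] = 303/175, about 1.7314
print(population_kurtosis([1, 2, 3, 4, 5, 6], [1 / 6] * 6))
```

Both values are well below 3, which is the kurtosis of a normal distribution. Both distributions are therefore lighter-tailed than the normal.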

2. Sample kurtosis

For a sample of $n$ ($n\geq 3$) observations $x_1,\dots,x_n$, the sample kurtosis is defined as:

$\displaystyle g_2=\frac{m_4}{m_2^2} - 3=\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^4}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2\right]^2} - 3$

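Here $\bar x$ is the sample mean. $m_2$ is the second sample moment about the mean, that is, the second sample central moment or the sample variance with divisor $n$. $m_4$ is the fourth sample moment about the mean, that is, the fourth sample central moment. The subtracted 3 is the kurtosis of the normal distribution. As a result, $g_2$ is an excess kurtosis, and it is close to 0 for large samples of normally distributed data.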
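The definition of $g_2$ translates directly into a few lines of plain Python. The function name `sample_excess_kurtosis` is a hypothetical helper, and the sketch assumes the sample is not constant, since otherwise $m_2=0$. With its defaults (`fisher=True`, `bias=True`), `scipy.stats.kurtosis` should compute this same $g_2$.

```python
from typing import Sequence


def sample_excess_kurtosis(x: Sequence[float]) -> float:
    """g2 = m4 / m2**2 - 3, with m2 and m4 the central moments using divisor n."""
    n = len(x)
    if n < 3:
        raise ValueError("the definition above assumes n >= 3")
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n  # second sample central moment
    m4 = sum((xi - mean) ** 4 for xi in x) / n  # fourth sample central moment
    if m2 == 0:
        raise ValueError("g2 is undefined when all observations are equal")
    return m4 / m2 ** 2 - 3


# m2 = 1.25, m4 = 2.5625, so m4 / m2**2 = 1.64 and g2 is about -1.36
print(sample_excess_kurtosis([1, 2, 3, 4]))
```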

3. Estimating the population kurtosis

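Suppose the data are a sample drawn from a larger population. In that case, the sample excess kurtosis $g_2$ is a biased estimator of the population excess kurtosis. A commonly used alternative estimator, found throughout the literature, is: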

${\begin{aligned}G_{2}&=\frac{k_4}{k_2^{2}}\\[18pt]&=\frac{n^{2}\,[(n+1)\,m_{4}-3\,(n-1)\,m_{2}^{2}]}{(n-1)\,(n-2)\,(n-3)}\;\frac{(n-1)^{2}}{n^{2}\,m_{2}^{2}}\\[18pt]&=\frac{n-1}{(n-2)\,(n-3)}\left[(n+1)\,\frac{m_{4}}{m_{2}^{2}}-3\,(n-1)\right]\\[18pt]&=\frac{n-1}{(n-2)\,(n-3)}\left[(n+1)\,g_{2}+6\right]\\[18pt]&=\frac{(n+1)\,n\,(n-1)}{(n-2)\,(n-3)}\;\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{4}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}-3\,\frac{(n-1)^{2}}{(n-2)\,(n-3)}\\[18pt]&=\frac{(n+1)\,n}{(n-1)\,(n-2)\,(n-3)}\;\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{4}}{k_{2}^{2}}-3\,\frac{(n-1)^{2}}{(n-2)\,(n-3)}\end{aligned}}$

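The symbols are as follows:
- $k_4$ is the unique symmetric unbiased estimator of the fourth cumulant.
- $k_2$ is the unique symmetric unbiased estimator of the second cumulant, i.e. the unbiased sample variance with divisor $n-1$.
- $m_4$ and $m_2$ are the fourth and second sample central moments.
- $\bar x$ is the sample mean.

Because of the factor $(n-3)$ in the denominators, $G_2$ requires $n\geq 4$.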
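As a sanity check on the derivation, one can compute $G_2$ two ways: as $k_4/k_2^2$, with $k_4$ and $k_2$ written in terms of $m_4$ and $m_2$ as in the second line, and as the bias-adjusted $g_2$ from the fourth line. The two results should agree up to floating-point rounding. The function names and the sample data are illustrative assumptions.

```python
from typing import Sequence, Tuple


def central_moments(x: Sequence[float]) -> Tuple[float, float]:
    """Return (m2, m4): second and fourth sample central moments with divisor n."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m2, m4


def adjusted_kurtosis_k_stats(x: Sequence[float]) -> float:
    """G2 = k4 / k2**2, using the k-statistics expressed through m2 and m4."""
    n = len(x)
    if n < 4:
        raise ValueError("G2 needs at least 4 observations")
    m2, m4 = central_moments(x)
    k2 = n * m2 / (n - 1)  # unbiased sample variance
    k4 = n ** 2 * ((n + 1) * m4 - 3 * (n - 1) * m2 ** 2) / ((n - 1) * (n - 2) * (n - 3))
    return k4 / k2 ** 2


def adjusted_kurtosis_from_g2(x: Sequence[float]) -> float:
    """G2 = (n-1) / ((n-2)(n-3)) * [(n+1) g2 + 6]."""
    n = len(x)
    if n < 4:
        raise ValueError("G2 needs at least 4 observations")
    m2, m4 = central_moments(x)
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


data = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0, 2.0, 8.0]
print(adjusted_kurtosis_k_stats(data), adjusted_kurtosis_from_g2(data))  # same value up to rounding
print(adjusted_kurtosis_from_g2([1, 2, 3, 4]))  # about -1.2
```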

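In general $G_2$ is still a biased estimator of the population excess kurtosis. It is unbiased only for random samples from a normal distribution.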
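A small Monte Carlo experiment illustrates this for normal data. For random samples of size $n$ from a normal distribution, $E[g_2]=-6/(n+1)$. Plugging that value into $(n+1)\,g_2+6$ gives 0, so the average of $G_2$ should approach the true excess kurtosis of 0. The sample size, number of repetitions and seed below are arbitrary choices for the sketch, and the helper names are hypothetical.

```python
import random
import statistics


def g2_and_G2(x):
    """Return (g2, G2) for one sample, using the formulas in this section."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g2 = m4 / m2 ** 2 - 3
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return g2, G2


def simulate_normal(n=10, reps=20_000, seed=0):
    """Average g2 and G2 over `reps` samples of size n drawn from N(0, 1)."""
    rng = random.Random(seed)
    g2_values, G2_values = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g2, G2 = g2_and_G2(sample)
        g2_values.append(g2)
        G2_values.append(G2)
    return statistics.fmean(g2_values), statistics.fmean(G2_values)


mean_g2, mean_G2 = simulate_normal()
print(mean_g2, -6 / 11)  # g2 is biased downward: E[g2] = -6/(n+1) for normal samples
print(mean_G2)           # should be close to 0, the true excess kurtosis
```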

Many software implementations, including Python's pandas library, compute kurtosis with the $G_2$ formula.
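The pandas claim can be checked directly. The sketch below compares `pandas.Series.kurt()` with $G_2$ computed by hand on a tiny sample. It assumes pandas is installed, and the sample values are arbitrary.

```python
import pandas as pd

x = [1, 2, 3, 4]
n = len(x)
mean = sum(x) / n
m2 = sum((v - mean) ** 2 for v in x) / n  # second sample central moment
m4 = sum((v - mean) ** 4 for v in x) / n  # fourth sample central moment
g2 = m4 / m2 ** 2 - 3                     # sample excess kurtosis
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)  # adjusted estimator

print(pd.Series(x).kurt())  # pandas' kurtosis
print(G2)                   # about -1.2, matching pandas up to rounding
```

Here $g_2\approx-1.36$ while pandas reports about $-1.2$. pandas therefore returns the bias-adjusted $G_2$, not $g_2$.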

Excerpt from the pandas source (nankurt):

import numpy as np  # the excerpt refers to NumPy as np


def nankurt(values, axis=None, skipna=True, mask=None):
    """
    Compute the sample excess kurtosis
    The statistic computed here is the adjusted Fisher-Pearson standardized
    moment coefficient G2, computed directly from the second and fourth
    central moment.
    """
    ......
    mean = values.sum(axis, dtype=np.float64) / count
    if axis is not None:
        mean = np.expand_dims(mean, axis)

    adjusted = values - mean
    if skipna:
        np.putmask(adjusted, mask, 0)  # masked (missing) entries contribute zero deviation
    adjusted2 = adjusted ** 2
    adjusted4 = adjusted2 ** 2
    # Note: these m2 and m4 are *sums* of squared / fourth-power deviations,
    # not the 1/n-averaged moments m_2, m_4 used in the formulas above.
    m2 = adjusted2.sum(axis, dtype=np.float64)
    m4 = adjusted4.sum(axis, dtype=np.float64)

    with np.errstate(invalid='ignore', divide='ignore'):
        # constant term of G2: 3 (n-1)^2 / ((n-2)(n-3))
        adj = 3 * (count - 1) ** 2 / ((count - 2) * (count - 3))
        # n (n+1) (n-1) * sum((x - mean)^4)
        numer = count * (count + 1) * (count - 1) * m4
        # (n-2) (n-3) * (sum((x - mean)^2))^2
        denom = (count - 2) * (count - 3) * m2 ** 2

    with np.errstate(invalid='ignore', divide='ignore'):
        result = numer / denom - adj
    ......
    return result
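The excerpt above omits several parts of the real function, and those parts are left as they are. For experimentation, here is a hypothetical, self-contained NumPy sketch that follows the same visible steps: compute the mean, form the deviations, sum the squared and fourth-power deviations, then evaluate `numer / denom - adj`. It works along an axis of an array that may contain NaNs. It is not pandas' actual code. The NaN handling and the `count < 4` guard are assumptions made for the illustration.

```python
import numpy as np


def kurt_g2_sketch(values, axis=0):
    """Hypothetical sketch of G2 along `axis`, skipping NaNs (not pandas' code)."""
    values = np.asarray(values, dtype=np.float64)
    mask = np.isnan(values)
    count = (~mask).sum(axis=axis).astype(np.float64)  # non-NaN observations per slice
    filled = np.where(mask, 0.0, values)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = filled.sum(axis=axis) / count
        adjusted = filled - np.expand_dims(mean, axis)
        adjusted[mask] = 0.0                   # NaN positions contribute nothing to the sums
        m2 = (adjusted ** 2).sum(axis=axis)    # sum of squared deviations
        m4 = (adjusted ** 4).sum(axis=axis)    # sum of fourth-power deviations
        adj = 3 * (count - 1) ** 2 / ((count - 2) * (count - 3))
        numer = count * (count + 1) * (count - 1) * m4
        denom = (count - 2) * (count - 3) * m2 ** 2
        result = numer / denom - adj
    # Assumed guard: G2 is undefined with fewer than 4 observations.
    return np.where(count < 4, np.nan, result)


data = np.array([
    [1.0, 2.0],
    [2.0, 5.0],
    [3.0, np.nan],
    [4.0, 1.0],
    [np.nan, 7.0],
])
print(kurt_g2_sketch(data, axis=0))  # first column is [1, 2, 3, 4], so about -1.2
```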

References

Skewness - Wikipedia

Joanes D N, Gill C A. Comparing measures of sample skewness and kurtosis[J]. Journal of the Royal Statistical Society: Series D (The Statistician), 1998, 47(1): 183-189.

binti Yusoff S, Wah Y B. Comparison of conventional measures of skewness and kurtosis for small sample size[C]//2012 International Conference on Statistics in Science, Business and Engineering (ICSSBE). IEEE, 2012: 1-6.

Pebay P P. Formulas for robust, one-pass parallel computation of covariances and arbitrary-order statistical moments[R]. Sandia National Laboratories, 2008.