[References]
1. Shape, Center, and Spread of a Distribution
1 Shape
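As the name suggests, the shape of a data distribution is the shape of its PDF, and it falls roughly into a handful of common types.
For example, the shape of a Gaussian distribution is bell-shaped (symmetric, unimodal).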
2 Center
There are at least four common ways to measure the center of a data distribution:
- mean
  $\mu=\frac{\Sigma x}{N}$ (population) and $\bar{x}=\frac{\Sigma x}{n}$ (sample)
- median
  For example, the median of the set {1, 2, 99, 1000, 200000} is 99.
- mode
  The most frequent data value.
- midrange
  midrange = (max + min) / 2
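The four measures above can be sketched in a few lines of Python using only the standard library. The function name `center_summary` and the `scores` list are hypothetical, introduced only for illustration; the first data set is the one from the median example.

```python
from statistics import mean, median, multimode


def center_summary(data):
    """Return the four center measures listed above for a list of numbers."""
    return {
        "mean": mean(data),                          # sum of values / count
        "median": median(data),                      # middle value of the sorted data
        "modes": multimode(data),                    # every most-frequent value (ties kept)
        "midrange": (max(data) + min(data)) / 2,     # (max + min) / 2
    }


# Data set from the median example above.
skewed = center_summary([1, 2, 99, 1000, 200000])

# Hypothetical data set with a repeated value, so the mode is meaningful.
scores = center_summary([2, 3, 3, 5, 7])

print(skewed)
print(scores)
```

On the first data set the mean and midrange are pulled far toward the single large value, while the median stays at 99, which is why the median is often preferred for skewed data.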
3 Spread
Spread (dispersion) describes how widely the values of a distribution are scattered around its center. Common measures:
- variance & standard deviation
  Population: $\sigma^2 = \frac{\Sigma(x-\mu)^2}{N}$ and $\sigma = \sqrt{\frac{\Sigma(x-\mu)^2}{N}}$
  Sample: $s^2 = \frac{\Sigma(x-\bar{x})^2}{n-1}$ and $s = \sqrt{\frac{\Sigma(x-\bar{x})^2}{n-1}}$
- range
  range = max - min
- mean absolute deviation
  $MAD = \frac{\Sigma|x-\mu|}{N}$
- interquartile range (IQR)
  If the data has quartiles $Q_1,Q_2,Q_3,Q_4$ (noting that $Q_2$ is the median and $Q_4$ is the maximum value), then $IQR=Q_3-Q_1$
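As a rough illustration, the spread measures above map onto Python's standard `statistics` module as follows. The function name `spread_summary` and the sample data are hypothetical. Note also that quartiles have several competing definitions; this sketch assumes the `"inclusive"` method of `statistics.quantiles`, and other conventions can give a slightly different IQR.

```python
from statistics import pstdev, pvariance, quantiles, stdev, variance


def spread_summary(data):
    """Return the spread measures listed above for a list of numbers."""
    n = len(data)
    mu = sum(data) / n
    # Three cut points split the data into four groups: Q1, Q2 (median), Q3.
    # The "inclusive" method is an assumption; other quartile conventions exist.
    q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "population_variance": pvariance(data),      # divide by N
        "population_std": pstdev(data),
        "sample_variance": variance(data),           # divide by n - 1
        "sample_std": stdev(data),
        "range": max(data) - min(data),
        "mad": sum(abs(x - mu) for x in data) / n,   # mean absolute deviation
        "iqr": q3 - q1,
    }


# Hypothetical data set with mean 5.
result = spread_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(result)
```

The sample variance is always somewhat larger than the population variance on the same data, because it divides by n - 1 instead of N.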