The Beauty of Algorithms: A Small Incremental Variance Algorithm with Big Payoffs

http://www.cnblogs.com/yoyaprogrammer/p/delta_variance.html

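The statistical definition of variance

$$x_1, x_2, \ldots, x_N$$

The mean of the sample X is simple to compute:

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

$$\sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{X})^2$$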
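As a quick sanity check of the two definitions, here is a small worked example. The four sample values are chosen only to make the arithmetic easy; they are an illustrative assumption and do not come from the original post.

```latex
\begin{align*}
x &= (1,\ 3,\ 5,\ 7), \qquad N = 4 \\
\bar{X} &= \frac{1 + 3 + 5 + 7}{4} = 4 \\
\sigma_X^2 &= \frac{(1-4)^2 + (3-4)^2 + (5-4)^2 + (7-4)^2}{4}
            = \frac{9 + 1 + 1 + 9}{4} = 5
\end{align*}
```

Note that this is the population variance (division by N), which is the definition used throughout this post.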

Deriving the incremental variance

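$$h_1, h_2, \ldots, h_M$$

$$a_1, a_2, \ldots, a_N$$

$$\bar{H} = \frac{1}{M}\sum_{i=1}^{M} h_i$$

$$\sigma_H^2 = \frac{1}{M}\sum_{i=1}^{M} (h_i - \bar{H})^2$$

$$\bar{A} = \frac{1}{N}\sum_{j=1}^{N} a_j$$

$$\sigma_A^2 = \frac{1}{N}\sum_{j=1}^{N} (a_j - \bar{A})^2$$

$$h_1, h_2, \ldots, h_M, a_1, a_2, \ldots, a_N$$

$$\bar{X} = \frac{1}{M+N}\left[\sum_{i=1}^{M} h_i + \sum_{j=1}^{N} a_j\right] = \frac{M\bar{H} + N\bar{A}}{M+N}$$

$$
\begin{aligned}
\sigma^2 &= \frac{1}{M+N}\left[\sum_{i=1}^{M} (h_i - \bar{X})^2 + \sum_{j=1}^{N} (a_j - \bar{X})^2\right] \\
&= \frac{1}{M+N}\left[\sum_{i=1}^{M} \big((h_i - \bar{H}) - (\bar{X} - \bar{H})\big)^2 + \sum_{j=1}^{N} \big((a_j - \bar{A}) - (\bar{X} - \bar{A})\big)^2\right] \\
&= \frac{1}{M+N}\left[\sum_{i=1}^{M} \big((h_i - \bar{H})^2 - 2(h_i - \bar{H})(\bar{X} - \bar{H}) + (\bar{X} - \bar{H})^2\big) + \sum_{j=1}^{N} \big((a_j - \bar{A})^2 - 2(a_j - \bar{A})(\bar{X} - \bar{A}) + (\bar{X} - \bar{A})^2\big)\right] \\
&= \frac{1}{M+N}\left[M\sigma_H^2 + M(\bar{X} - \bar{H})^2 - 2(\bar{X} - \bar{H})\Big(\sum_{i=1}^{M} h_i - M\bar{H}\Big) + N\sigma_A^2 + N(\bar{X} - \bar{A})^2 - 2(\bar{X} - \bar{A})\Big(\sum_{j=1}^{N} a_j - N\bar{A}\Big)\right] \\
&= \frac{1}{M+N}\left[M\sigma_H^2 + M(\bar{X} - \bar{H})^2 + N\sigma_A^2 + N(\bar{X} - \bar{A})^2\right] \\
&= \frac{M\left[\sigma_H^2 + (\bar{X} - \bar{H})^2\right] + N\left[\sigma_A^2 + (\bar{X} - \bar{A})^2\right]}{M+N}
\end{aligned}
$$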
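The final line of the derivation can be checked numerically. The sketch below is only an illustration: the object name `MergeFormulaCheck`, the helper names, and the sample values are assumptions introduced here, not part of the original post. It computes the variance of the combined samples directly from the definition and again from the merged formula, using only the counts, means, and variances of the two parts.

```scala
object MergeFormulaCheck {
  // Mean, as in the definition of X-bar.
  def mean(xs: Seq[Double]): Double = xs.sum / xs.length

  // Population variance (divide by N), as in the definition of sigma^2.
  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / xs.length
  }

  // Last line of the derivation:
  // (M[var_H + (X-bar - H-bar)^2] + N[var_A + (X-bar - A-bar)^2]) / (M + N)
  def mergedVariance(h: Seq[Double], a: Seq[Double]): Double = {
    val (m, n) = (h.length, a.length)
    val meanX = (m * mean(h) + n * mean(a)) / (m + n)
    val dH = meanX - mean(h)
    val dA = meanX - mean(a)
    (m * (variance(h) + dH * dH) + n * (variance(a) + dA * dA)) / (m + n)
  }

  def main(args: Array[String]): Unit = {
    val h = Seq(1.0, 3.0) // illustrative "historical" samples
    val a = Seq(5.0, 7.0) // illustrative "added" samples
    println(variance(h ++ a))     // direct computation over all samples
    println(mergedVariance(h, a)) // computed from the two parts' statistics
  }
}
```

With these values both lines print 5.0: each part has variance 1, the combined mean 4 is 2 away from each part's mean, and (2(1 + 4) + 2(1 + 4)) / 4 = 5.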

Implementing the incremental variance

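case class Measures(n: Int, sum: Double, variance: Double) {
  // Mean of the samples; defined as 0 for an empty set to avoid 0/0 = NaN.
  def avg: Double = if (n == 0) 0d else sum / n

  // Merge the statistics of a new batch (delta) into this one
  // without revisiting the raw samples.
  def appendDelta(delta: Measures): Measures = {
    val newN = this.n + delta.n
    val newSum = this.sum + delta.sum
    // Guard the empty-plus-empty case, where the mean would otherwise be 0/0.
    val newAvg = if (newN == 0) 0d else newSum / newN

    // Contribution of one side: n * (variance + (shift of the mean)^2).
    def partial(m: Measures): Double = {
      val deltaAvg = newAvg - m.avg
      m.n * (m.variance + deltaAvg * deltaAvg)
    }

    val newVariance =
      if (newN == 0) 0d else (partial(this) + partial(delta)) / newN

    Measures(newN, newSum, newVariance)
  }
}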

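Measures holds the sample count, the sum (from which the mean is derived), and the variance. Together these are everything needed to compute variance incrementally. The class also carries the responsibility for the incremental variance algorithm itself.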
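One way such a value type might be used is to fold over batches as they arrive, carrying only the three numbers forward. The sketch below is self-contained, so it repeats a compact copy of the merge rule under the hypothetical name `RunningStats`; the names `RunningStats`, `merge`, `fromBatch`, `empty`, and the batch values are illustrative assumptions rather than part of the original code.

```scala
object RunningStatsDemo {
  // Hypothetical compact stand-in for Measures: count, sum, population variance.
  final case class RunningStats(n: Int, sum: Double, variance: Double) {
    def avg: Double = if (n == 0) 0.0 else sum / n

    // Same rule as the last line of the derivation.
    def merge(other: RunningStats): RunningStats = {
      val newN = n + other.n
      if (newN == 0) this
      else {
        val newSum = sum + other.sum
        val newAvg = newSum / newN
        def part(s: RunningStats): Double = {
          val d = newAvg - s.avg
          s.n * (s.variance + d * d)
        }
        RunningStats(newN, newSum, (part(this) + part(other)) / newN)
      }
    }
  }

  // Statistics of "no samples yet"; the starting point of the fold.
  val empty: RunningStats = RunningStats(0, 0.0, 0.0)

  // Statistics of one batch, computed directly from the definition.
  def fromBatch(xs: Seq[Double]): RunningStats =
    if (xs.isEmpty) empty
    else {
      val m = xs.sum / xs.length
      val v = xs.map(x => (x - m) * (x - m)).sum / xs.length
      RunningStats(xs.length, xs.sum, v)
    }

  def main(args: Array[String]): Unit = {
    // Three batches arriving over time (illustrative values).
    val batches = Seq(Seq(1.0, 3.0), Seq(5.0, 7.0), Seq(9.0))
    val total = batches.map(fromBatch).foldLeft(empty)(_ merge _)
    println(s"n=${total.n}, avg=${total.avg}, variance=${total.variance}")
  }
}
```

For these batches the fold ends at n=5, avg=5.0, variance=8.0, which matches the variance of 1, 3, 5, 7, 9 computed directly. Only the running statistics are kept between batches; the raw samples of earlier batches are never read again.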

case class Samples(values: Seq[Double]) {
  // Summarize the raw samples as (count, sum, variance).
  def measures: Measures = {
    if (values == null || values.isEmpty)
      Measures(0, 0d, 0d)
    else
      Measures(values.length, values.sum, variance)
  }

  // Population variance computed straight from the definition.
  private def variance: Double = {
    val n = values.length
    val avg = values.sum / n
    values.foldLeft(0d) { case (sum, sample) =>
      sum + (sample - avg) * (sample - avg)
    } / n
  }
}

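Samples takes care of computing the statistics needed for a single set of sample values. It computes them directly from the statistical definition, with no incremental algorithm involved.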

object DeltaVarianceUtils {
  def main(args: Array[String]): Unit = {
    // Implicit view so that an Array[Double] can be used where Samples is expected.
    implicit val arrayToSamples = (values: Array[Double]) => Samples(values)

    val historicalSamples = Array(1.5d, 3.4d, 7.8d, 11.6d)
    val deltaSamples = Array(9.4d, 4.2d, 35.6d, 77.9d)

    // Direct computation over all eight samples.
    println("Variance: "
      + (historicalSamples ++ deltaSamples).measures.variance
    )
    // Incremental computation: merge the two summaries.
    println("Variance calculated by delta algorithm: "
      + historicalSamples.measures.appendDelta(deltaSamples.measures).variance
    )
  }
}

Variance: 598.2168750000002
Variance calculated by delta algorithm: 598.2168750000001
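The two printed values agree except in the last digit. That is expected, since the two paths perform their floating-point operations in a different order. When checking such results in code, a relative-tolerance comparison is a common approach. The helper name `approxEqual` and the tolerance `1e-12` below are assumptions chosen for illustration, not part of the original post.

```scala
object ApproxCompare {
  // True when a and b differ by at most relTol, relative to the larger magnitude.
  def approxEqual(a: Double, b: Double, relTol: Double = 1e-12): Boolean =
    math.abs(a - b) <= relTol * math.max(math.abs(a), math.abs(b))

  def main(args: Array[String]): Unit = {
    val direct = 598.2168750000002      // value printed by the direct computation
    val incremental = 598.2168750000001 // value printed by the delta algorithm
    println(direct == incremental)            // exact comparison
    println(approxEqual(direct, incremental)) // tolerance-based comparison
  }
}
```

A relative tolerance scales with the size of the values being compared, which tends to be more robust than a fixed absolute threshold when variances can range over many orders of magnitude.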

The big gains
