
# The Beauty of Algorithms: Big Gains from a Small Incremental Variance Algorithm

http://www.cnblogs.com/yoyaprogrammer/p/delta_variance.html

## Statistical Definition of Variance

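Given a sample X consisting of N values

$$x_1, x_2, \ldots, x_N$$

its mean is simple to compute:

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Its variance is the mean squared deviation from that mean. Dividing by N, as here, gives the population variance:

$$\sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{X}\right)^2$$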
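As a quick worked example, take the four values 1.5, 3.4, 7.8, 11.6. These are the `historicalSamples` array used in the code later in this article.

$$
\bar{X} = \frac{1.5 + 3.4 + 7.8 + 11.6}{4} = \frac{24.3}{4} = 6.075
$$

$$
\sigma_X^2 = \frac{(-4.575)^2 + (-2.675)^2 + 1.725^2 + 5.525^2}{4} = \frac{61.5875}{4} = 15.396875
$$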

## Deriving the Incremental Variance

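Suppose we already have M historical samples

$$h_1, h_2, \ldots, h_M$$

and N newly added (delta) samples

$$a_1, a_2, \ldots, a_N$$

Their means and variances are

$$\bar{H} = \frac{1}{M}\sum_{i=1}^{M} h_i, \qquad \sigma_H^2 = \frac{1}{M}\sum_{i=1}^{M}\left(h_i - \bar{H}\right)^2$$

$$\bar{A} = \frac{1}{N}\sum_{j=1}^{N} a_j, \qquad \sigma_A^2 = \frac{1}{N}\sum_{j=1}^{N}\left(a_j - \bar{A}\right)^2$$

The combined sample is

$$h_1, h_2, \ldots, h_M, a_1, a_2, \ldots, a_N$$

and its mean is

$$\bar{X} = \frac{1}{M+N}\left(\sum_{i=1}^{M} h_i + \sum_{j=1}^{N} a_j\right) = \frac{M\bar{H} + N\bar{A}}{M+N}$$

Expanding the combined variance around each group's own mean gives:

$$
\begin{aligned}
\sigma^2 &= \frac{1}{M+N}\left[\sum_{i=1}^{M}\left(h_i - \bar{X}\right)^2 + \sum_{j=1}^{N}\left(a_j - \bar{X}\right)^2\right] \\
&= \frac{1}{M+N}\left[\sum_{i=1}^{M}\big((h_i - \bar{H}) - (\bar{X} - \bar{H})\big)^2 + \sum_{j=1}^{N}\big((a_j - \bar{A}) - (\bar{X} - \bar{A})\big)^2\right] \\
&= \frac{1}{M+N}\Big[\sum_{i=1}^{M}\big((h_i - \bar{H})^2 - 2(h_i - \bar{H})(\bar{X} - \bar{H}) + (\bar{X} - \bar{H})^2\big) \\
&\qquad + \sum_{j=1}^{N}\big((a_j - \bar{A})^2 - 2(a_j - \bar{A})(\bar{X} - \bar{A}) + (\bar{X} - \bar{A})^2\big)\Big] \\
&= \frac{1}{M+N}\Big[M\sigma_H^2 + M(\bar{X} - \bar{H})^2 - 2(\bar{X} - \bar{H})\Big(\sum_{i=1}^{M} h_i - M\bar{H}\Big) \\
&\qquad + N\sigma_A^2 + N(\bar{X} - \bar{A})^2 - 2(\bar{X} - \bar{A})\Big(\sum_{j=1}^{N} a_j - N\bar{A}\Big)\Big] \\
&= \frac{1}{M+N}\left[M\sigma_H^2 + M(\bar{X} - \bar{H})^2 + N\sigma_A^2 + N(\bar{X} - \bar{A})^2\right] \\
&= \frac{M\left[\sigma_H^2 + (\bar{X} - \bar{H})^2\right] + N\left[\sigma_A^2 + (\bar{X} - \bar{A})^2\right]}{M+N}
\end{aligned}
$$

The cross terms vanish because $\sum_{i=1}^{M} h_i = M\bar{H}$ and $\sum_{j=1}^{N} a_j = N\bar{A}$. The combined variance therefore depends only on each group's count, mean and variance, not on the individual samples.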
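The final formula above can be checked numerically. The sketch below works on (count, mean, variance) triples only. It merges the two groups used later in the article, then compares the result with a direct computation over all eight values.

`Stats`, `MergeCheck`, `direct` and `merge` are hypothetical names introduced for illustration. Both groups are assumed to be non-empty.

```scala
// Hypothetical sketch: a group summarised only by count, mean and population variance.
final case class Stats(n: Int, mean: Double, variance: Double)

object MergeCheck {
  // Direct computation from the definition (population variance: divide by n).
  // Assumes xs is non-empty.
  def direct(xs: Seq[Double]): Stats = {
    val n = xs.length
    val mean = xs.sum / n
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / n
    Stats(n, mean, variance)
  }

  // sigma^2 = (M[sigmaH^2 + (X - H)^2] + N[sigmaA^2 + (X - A)^2]) / (M + N),
  // where X is the combined mean (M*H + N*A) / (M + N). Assumes h.n + a.n > 0.
  def merge(h: Stats, a: Stats): Stats = {
    val total = h.n + a.n
    val mean = (h.n * h.mean + a.n * a.mean) / total
    // One bracketed term of the formula: n * (variance + (combined mean - group mean)^2)
    def term(s: Stats): Double = {
      val d = mean - s.mean
      s.n * (s.variance + d * d)
    }
    Stats(total, mean, (term(h) + term(a)) / total)
  }

  def main(args: Array[String]): Unit = {
    val h = Seq(1.5, 3.4, 7.8, 11.6)
    val a = Seq(9.4, 4.2, 35.6, 77.9)
    val merged = merge(direct(h), direct(a))
    val whole = direct(h ++ a)
    // Both values should agree up to floating-point rounding
    println(s"merged: ${merged.variance}, direct: ${whole.variance}")
  }
}
```

Only the two triples are needed at merge time. The raw `h` values are used here solely to build the first triple and to run the comparison.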

## Implementing the Incremental Variance

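```scala
// Sufficient statistics of a sample: count, sum and population variance.
case class Measures(n: Int, sum: Double, variance: Double) {
  def avg: Double = sum / n

  // Merge the statistics of a new batch (delta) into this one:
  // sigma^2 = (M[varH + (newAvg - avgH)^2] + N[varA + (newAvg - avgA)^2]) / (M + N)
  def appendDelta(delta: Measures): Measures =
    if (delta.n == 0) this        // empty delta: nothing to add (avg would be 0/0 = NaN)
    else if (this.n == 0) delta   // nothing accumulated yet
    else {
      val newN = this.n + delta.n
      val newSum = this.sum + delta.sum
      val newAvg = newSum / newN

      // One bracketed term of the formula: n * (variance + (newAvg - avg)^2)
      def partial(m: Measures): Double = {
        val deltaAvg = newAvg - m.avg
        m.n * (m.variance + deltaAvg * deltaAvg)
      }

      val newVariance = (partial(this) + partial(delta)) / newN

      Measures(newN, newSum, newVariance)
    }
}
```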

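`Measures` holds the sample count and the sum, from which the mean `avg` is derived, together with the variance. These are exactly the ingredients needed to update the variance incrementally. `Measures` also carries the incremental variance algorithm itself, in `appendDelta`.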
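Because a `Measures` value is all that has to be kept, a long-running statistic can be maintained by folding in one batch at a time. Here is a minimal sketch, assuming batches arrive as `Seq[Double]`.

`BatchFold`, `of` and `fold` are hypothetical names. `Measures` is repeated here, with an empty-batch guard added in this sketch, so the block compiles on its own.

```scala
// Measures as in the article; the n == 0 guards are an addition of this sketch.
final case class Measures(n: Int, sum: Double, variance: Double) {
  def avg: Double = sum / n

  def appendDelta(delta: Measures): Measures =
    if (delta.n == 0) this
    else if (this.n == 0) delta
    else {
      val newN = n + delta.n
      val newSum = sum + delta.sum
      val newAvg = newSum / newN
      def partial(m: Measures): Double = {
        val d = newAvg - m.avg
        m.n * (m.variance + d * d)
      }
      Measures(newN, newSum, (partial(this) + partial(delta)) / newN)
    }
}

// Hypothetical helper: keeps only a running Measures across many batches.
object BatchFold {
  // Statistics of one batch, computed directly (two-pass population variance).
  def of(batch: Seq[Double]): Measures =
    if (batch.isEmpty) Measures(0, 0d, 0d)
    else {
      val avg = batch.sum / batch.length
      val variance = batch.map(x => (x - avg) * (x - avg)).sum / batch.length
      Measures(batch.length, batch.sum, variance)
    }

  // Earlier batches' raw values are never revisited; only the accumulator is kept.
  def fold(batches: Seq[Seq[Double]]): Measures =
    batches.foldLeft(Measures(0, 0d, 0d))((acc, batch) => acc.appendDelta(of(batch)))

  def main(args: Array[String]): Unit = {
    val batches = Seq(Seq(1.5, 3.4), Seq(7.8, 11.6, 9.4), Seq.empty[Double], Seq(4.2, 35.6, 77.9))
    println(fold(batches).variance)
  }
}
```

Splitting the same eight values into different batches, including an empty one, should give the same variance as computing it over all of them at once, up to floating-point rounding.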

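```scala
// Statistics of a complete group of values, computed directly from the definition.
case class Samples(values: Seq[Double]) {
  def measures: Measures = {
    if (values == null || values.isEmpty)
      Measures(0, 0d, 0d)
    else
      Measures(values.length, values.sum, variance)
  }

  // Two-pass population variance: first the mean, then the mean squared deviation.
  private def variance: Double = {
    val n = values.length
    val avg = values.sum / n
    values.foldLeft(0d) { case (sum, sample) =>
      sum + (sample - avg) * (sample - avg)
    } / n
  }
}
```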

`Samples` computes the statistics needed for a group of sample values directly from the statistical definition, with no incremental algorithm.
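`Samples` needs every value in memory. By contrast, when values arrive one at a time, a single value x is a batch with N = 1, mean x and variance 0. The derived formula then reduces to a constant-memory update.

The sketch below assumes exactly that. `Running` and `push` are hypothetical names, and the update is just the article's merge formula specialised to a one-value delta.

```scala
// Hypothetical sketch: the merge formula with a one-value delta (N = 1, mean = x, variance = 0).
final case class Running(n: Int, sum: Double, variance: Double) {
  // Mean of the values seen so far; 0 for the empty state so n == 0 needs no special case.
  def avg: Double = if (n == 0) 0d else sum / n

  def push(x: Double): Running = {
    val newN = n + 1
    val newSum = sum + x
    val newAvg = newSum / newN
    val dOld = newAvg - avg // distance from the new mean to the existing group's mean
    val dNew = newAvg - x   // distance from the new mean to the one-value group's mean
    // (M[var + dOld^2] + 1 * [0 + dNew^2]) / (M + 1)
    Running(newN, newSum, (n * (variance + dOld * dOld) + dNew * dNew) / newN)
  }
}
```

Usage: `Seq(1.5, 3.4, 7.8, 11.6, 9.4, 4.2, 35.6, 77.9).foldLeft(Running(0, 0d, 0d))(_ push _).variance` should match the direct result for the same eight values, up to floating-point rounding.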

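```scala
import scala.language.implicitConversions

object DeltaVarianceUtils {
  // Lets an Array[Double] be used wherever Samples is expected, e.g. values.measures
  implicit def arrayToSamples(values: Array[Double]): Samples = Samples(values.toSeq)

  def main(args: Array[String]): Unit = {
    val historicalSamples = Array(1.5d, 3.4d, 7.8d, 11.6d)
    val deltaSamples = Array(9.4d, 4.2d, 35.6d, 77.9d)

    // Direct computation over all eight values
    println("Variance: "
      + (historicalSamples ++ deltaSamples).measures.variance
    )
    // Incremental: merge the statistics of the two groups
    println("Variance calculated by delta algorithm: "
      + historicalSamples.measures.appendDelta(deltaSamples.measures).variance
    )
  }
}
```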

Output:

```
Variance: 598.2168750000002
Variance calculated by delta algorithm: 598.2168750000001
```

The two results agree up to floating-point rounding in the last digit.

## The Big Payoff
