1. Problem Background
Given an array $x=[1,2,3,6]$, how can we quickly compute the mean and variance of each of its prefixes $x[0\cdots n]$? That is, we need to return the mean array $m=[1,1.5,2,3]$, where $m[2]=2$ means that the prefix $x[0\cdots 2]=[1,2,3]$ has mean 2, together with the variance array $S=[0,0.25,\frac{2}{3},3.5]$, where $S[2]=\frac{2}{3}$ means that the prefix $x[0\cdots 2]=[1,2,3]$ has variance $\frac{2}{3}$. The variance here is the population variance, that is, the sum of squared deviations divided by the prefix length.
A brute-force $O(n^2)$ algorithm is easy to find: recompute the mean and variance of every prefix from scratch. Is there an $O(n)$ algorithm?
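For reference, the brute-force baseline might look like the sketch below. The function name `prefix_mean_var_bruteforce` is hypothetical and chosen here for illustration; it assumes the population variance (dividing by the prefix length), as in the example above.

```python
def prefix_mean_var_bruteforce(x):
    """O(n^2) baseline: recompute mean and population variance for every prefix."""
    means, variances = [], []
    for k in range(1, len(x) + 1):
        prefix = x[:k]                      # first k elements
        mean = sum(prefix) / k              # O(k) work per prefix
        var = sum((v - mean) ** 2 for v in prefix) / k
        means.append(mean)
        variances.append(var)
    return means, variances


if __name__ == "__main__":
    print(prefix_mean_var_bruteforce([1, 2, 3, 6]))
```

Each prefix costs time proportional to its length, so the total work grows quadratically with the array length.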
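Consider the following question: if the mean and variance of the first $n-1$ elements are already known, can we derive a recurrence that directly yields the mean and variance of the first $n$ elements?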
2. Derivation
Using 1-based indices, define the mean $m_n$ of the first $n$ elements and $S_n$, the variance of the first $n$ elements multiplied by the current length $n$ (so the variance itself is $S_n/n$):

$$m_n = \frac{\sum_{i=1}^n x_i}{n}, \quad S_n=\sum_{i=1}^n(x_i-m_n)^2$$
The recurrence for the mean follows directly:

$$m_n= \frac{\sum_{i=1}^n x_i}{n}= \frac{\sum_{i=1}^{n-1}x_i+x_n}{n}=\frac{n-1}{n}m_{n-1}+\frac{1}{n}x_n$$
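As a small illustration, the mean recurrence can be rearranged into the equivalent incremental form $m_n = m_{n-1} + \frac{1}{n}(x_n - m_{n-1})$, which needs only the previous mean. The sketch below uses a hypothetical function name `prefix_means` and plain floats.

```python
def prefix_means(x):
    """Return the mean of every prefix of x in a single pass."""
    means = []
    mean = 0.0
    for n, value in enumerate(x, start=1):
        # m_n = m_{n-1} + (x_n - m_{n-1}) / n, same as (n-1)/n * m_{n-1} + x_n / n
        mean += (value - mean) / n
        means.append(mean)
    return means


if __name__ == "__main__":
    print(prefix_means([1, 2, 3, 6]))
```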
Substituting this recurrence into $x_i-m_n$ gives

$$x_i-m_n=x_i-\left(\frac{n-1}{n}m_{n-1}+\frac{1}{n}x_n\right)=x_i-m_{n-1}-\frac{1}{n}(x_n-m_{n-1})$$

and setting $i=n$ yields

$$x_n-m_n=\frac{n-1}{n}(x_n-m_{n-1})$$
With these identities in hand, we can derive the recurrence for $S$:

$$
\begin{aligned}
S_n &= \sum_{i=1}^n(x_i-m_n)^2 \\
&= \sum_{i=1}^{n-1}(x_i-m_n)^2+(x_n-m_n)^2 \\
&= \sum_{i=1}^{n-1}(x_i-m_n)^2+\left(\frac{n-1}{n}\right)^2(x_n-m_{n-1})^2 \\
&= \sum_{i=1}^{n-1}\left[x_i-m_{n-1}-\frac{1}{n}(x_n-m_{n-1})\right]^2+\left(\frac{n-1}{n}\right)^2(x_n-m_{n-1})^2 \\
&= \sum_{i=1}^{n-1}(x_i-m_{n-1})^2+\left[\frac{n-1}{n^2}+\frac{(n-1)^2}{n^2}\right](x_n-m_{n-1})^2 \\
&= S_{n-1}+\frac{n-1}{n}(x_n-m_{n-1})^2
\end{aligned}
$$
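The step from the fourth to the fifth line of this derivation expands a square and drops a cross term. The expansion below spells that out, writing $d = x_n - m_{n-1}$ as a shorthand introduced here for brevity (it is not part of the original notation):

```latex
\begin{aligned}
\sum_{i=1}^{n-1}\left[(x_i-m_{n-1})-\frac{d}{n}\right]^2
&= \sum_{i=1}^{n-1}(x_i-m_{n-1})^2
 - \frac{2d}{n}\sum_{i=1}^{n-1}(x_i-m_{n-1})
 + (n-1)\frac{d^2}{n^2} \\
&= S_{n-1} + \frac{n-1}{n^2}\,d^2
\end{aligned}
```

The middle sum vanishes because $m_{n-1}$ is the mean of $x_1,\dots,x_{n-1}$, so the deviations from it sum to zero. Adding the remaining term $\frac{(n-1)^2}{n^2}d^2$ gives the coefficient $\frac{(n-1)+(n-1)^2}{n^2}=\frac{n-1}{n}$.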
We have now obtained recurrences that produce the mean and variance of the first $n$ elements from those of the first $n-1$ elements:

$$
\begin{aligned}
m_n &= \frac{n-1}{n}m_{n-1}+\frac{1}{n}x_n \\
S_n &= S_{n-1}+\frac{n-1}{n}(x_n-m_{n-1})^2
\end{aligned}
$$
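As a sanity check, the two recurrences can be applied by hand to the example $x=[1,2,3,6]$, starting from $m_1=1$ and $S_1=0$:

```latex
\begin{aligned}
n=2:&\quad m_2=\tfrac{1}{2}\cdot 1+\tfrac{1}{2}\cdot 2=1.5, &\quad S_2&=0+\tfrac{1}{2}(2-1)^2=0.5, &\quad S_2/2&=0.25 \\
n=3:&\quad m_3=\tfrac{2}{3}\cdot 1.5+\tfrac{1}{3}\cdot 3=2, &\quad S_3&=0.5+\tfrac{2}{3}(3-1.5)^2=2, &\quad S_3/3&=\tfrac{2}{3} \\
n=4:&\quad m_4=\tfrac{3}{4}\cdot 2+\tfrac{1}{4}\cdot 6=3, &\quad S_4&=2+\tfrac{3}{4}(6-2)^2=14, &\quad S_4/4&=3.5
\end{aligned}
```

These values agree with the mean array $[1,1.5,2,3]$ and the variance array $[0,0.25,\frac{2}{3},3.5]$ from the problem statement.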
3. Program
Below is a Python example: given an array $x$, it returns the mean array $m$ and the variance array $S$.
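def meanAndSquare(x):
    """Return the prefix means and prefix (population) variances of x."""
    x = [0] + x  # pad so that indices are 1-based, matching the derivation
    m = [0 for _ in range(len(x))]  # m[i]: mean of the first i elements
    S = [0 for _ in range(len(x))]  # S[i]: variance times the current length i
    for i in range(1, len(x)):
        m[i] = ((i - 1) * m[i - 1] + x[i]) / i
        S[i] = S[i - 1] + (i - 1) / i * (x[i] - m[i - 1]) ** 2
    for i in range(1, len(S)):
        S[i] /= i  # divide by the current length to obtain the variance
    m, S = m[1:], S[1:]
    return m, S


if __name__ == '__main__':
    x = [1, 2, 3, 6]
    print(meanAndSquare(x))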
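The same recurrences also work when the values arrive one at a time, since only $n$, $m_{n-1}$ and $S_{n-1}$ need to be kept. The sketch below is one possible arrangement, not part of the original program: the class `RunningStats` and the helper `check` are hypothetical names, and exact `Fraction` arithmetic is an assumption made here so that the results can be compared for equality against the standard library's `statistics.mean` and `statistics.pvariance`.

```python
from fractions import Fraction
import statistics


class RunningStats:
    """Hypothetical helper: keeps n, m_n and S_n and updates them in O(1) per value."""

    def __init__(self):
        self.n = 0
        self.mean = Fraction(0)  # m_n
        self.s = Fraction(0)     # S_n, the sum of squared deviations

    def push(self, value):
        """Add one value and return (mean, population variance) of everything seen so far."""
        value = Fraction(value)
        self.n += 1
        delta = value - self.mean                            # x_n - m_{n-1}
        self.s += Fraction(self.n - 1, self.n) * delta ** 2  # S_n = S_{n-1} + (n-1)/n * delta^2
        self.mean += delta / self.n                          # m_n = m_{n-1} + delta / n
        return self.mean, self.s / self.n


def check(data):
    """Compare every prefix result against the statistics module (exact with Fractions)."""
    stats = RunningStats()
    for k, value in enumerate(data, start=1):
        mean, var = stats.push(value)
        prefix = [Fraction(v) for v in data[:k]]
        assert mean == statistics.mean(prefix)
        assert var == statistics.pvariance(prefix)
    return True


if __name__ == "__main__":
    print(check([1, 2, 3, 6]))
```

With floating-point inputs the comparison would need a tolerance instead of exact equality; the exact arithmetic is used here only to make the check unambiguous.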