0. Mind map
1. Background
Unbiased estimator: let $\hat{\theta}$ be an estimator of the random variable $\theta$. If $E(\hat{\theta})=E(\theta)$, then $\hat{\theta}$ is called an unbiased estimator of $\theta$.
The unbiased estimate of the variance computed from a sample $X_{1}, \ldots, X_{n}$ with sample mean $\bar{X}$ is:
$$\hat{\sigma}^{2}=\frac{1}{n-1}\left[\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}\right]$$
Known facts (for independent, identically distributed samples with mean $\mu$ and variance $\sigma^{2}$):
(1) $E(XY)=E(X)\,E(Y)$ when $X$ and $Y$ are independent
(2) $\operatorname{Var}(\bar{X})=\frac{1}{n} \operatorname{Var}(X)$, or equivalently $\sigma(\bar{X})^{2}=\frac{1}{n} \sigma(X)^{2}$
(3) $\operatorname{Var}(X)=E\left(X^{2}\right)-(E(X))^{2}$
(4) $E\left(X^{2}\right)=\operatorname{Var}(X)+(E(X))^{2}=\sigma^{2}+\mu^{2}$
(5) $E\left(\bar{X}^{2}\right)=\operatorname{Var}(\bar{X})+(E(\bar{X}))^{2}=\frac{1}{n} \sigma^{2}+\mu^{2}$
Proof:
$$
\begin{aligned}
E\left(\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}\right)
&=E\left(\sum_{i=1}^{n}\left(X_{i}^{2}-2 \bar{X} X_{i}+\bar{X}^{2}\right)\right) \\
&=E\left(\sum_{i=1}^{n} X_{i}^{2}\right)-E\left(\sum_{i=1}^{n} 2 \bar{X} X_{i}\right)+E\left(\sum_{i=1}^{n} \bar{X}^{2}\right) \\
&=\sum_{i=1}^{n} E\left(X_{i}^{2}\right)-2 E\left(\bar{X} \sum_{i=1}^{n} X_{i}\right)+\sum_{i=1}^{n} E\left(\bar{X}^{2}\right) \\
&=\sum_{i=1}^{n} E\left(X_{i}^{2}\right)-2 E\left(\bar{X} \sum_{i=1}^{n} X_{i}\right)+n \cdot E\left(\bar{X}^{2}\right)
\end{aligned}
$$
For the second term, since $\sum_{i=1}^{n} X_{i}=n \bar{X}$:
$$2 E\left(\bar{X} \sum_{i=1}^{n} X_{i}\right)=2 E(\bar{X} \cdot n \bar{X})=2 n \cdot E\left(\bar{X}^{2}\right)$$
Substituting back into the original expression:
$$E\left(\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}\right)=\sum_{i=1}^{n} E\left(X_{i}^{2}\right)-n \cdot E\left(\bar{X}^{2}\right)$$
Substituting (4) and (5) gives:
$$
\begin{aligned}
E\left(\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}\right)
&=\sum_{i=1}^{n}\left(\sigma^{2}+\mu^{2}\right)-n \cdot\left(\frac{1}{n} \sigma^{2}+\mu^{2}\right) \\
&=n\left(\sigma^{2}+\mu^{2}\right)-n\left(\frac{1}{n} \sigma^{2}+\mu^{2}\right) \\
&=(n-1) \sigma^{2}
\end{aligned}
$$
Dividing both sides by $n-1$ gives $E\left(\frac{1}{n-1} \sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}\right)=\sigma^{2}$, which completes the proof.
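The result can also be checked numerically. The snippet below is a minimal sketch and not part of the original derivation: the normal distribution, the seed, the true variance, the sample size `n`, and the number of trials are all arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable (arbitrary choice)
true_var = 4.0                   # population variance sigma^2 (assumed for the demo)
n, trials = 5, 200_000           # sample size and number of repeated samples

# Each row is one sample X_1..X_n drawn from N(0, sigma^2)
samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))

biased = samples.var(axis=1, ddof=0).mean()    # estimator that divides by n
unbiased = samples.var(axis=1, ddof=1).mean()  # estimator that divides by n - 1

print(biased)    # close to (n - 1) / n * sigma^2 = 3.2
print(unbiased)  # close to sigma^2 = 4.0
```

Averaged over many samples, the `ddof=1` estimator lands near the true variance, while the `ddof=0` estimator is too small by the factor $(n-1)/n$.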
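2. Testing amin
import numpy as np
# Random integers in [0, 10), arranged as a 3 x 4 x 5 array
x = np.random.randint(0, 10, 3*4*5).reshape(3, 4, 5)
print(x)
print('-'*50)
print(np.amin(x, axis=0))  # minimum across the 3 blocks, result shape (4, 5)
print('-'*50)
print(np.amin(x, axis=1))  # minimum down the rows of each block, result shape (3, 5)
print('-'*50)
print(np.amin(x, axis=2))  # minimum along each row, result shape (3, 4)
Output: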
[[[5 5 3 7 4]
  [4 8 6 5 3]
  [6 0 9 0 5]
  [3 5 0 8 0]]

 [[4 7 2 0 9]
  [5 9 0 6 1]
  [4 2 4 3 0]
  [5 2 3 3 8]]

 [[5 5 7 4 8]
  [3 9 7 7 5]
  [9 9 8 3 2]
  [8 1 3 4 6]]]
--------------------------------------------------
[[4 5 2 0 4]
 [3 8 0 5 1]
 [4 0 4 0 0]
 [3 1 0 3 0]]
--------------------------------------------------
[[3 0 0 0 0]
 [4 2 0 0 0]
 [3 1 3 3 2]]
--------------------------------------------------
[[3 3 0 0]
 [0 0 0 2]
 [4 3 2 1]]
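3. Testing variance
import numpy as np
x = np.array([[11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25],
              [26, 27, 28, 29, 30],
              [31, 32, 33, 34, 35]])
print(x.size)
print(np.mean(x))
print(np.var(x))  # population variance (divides by n)
print(np.mean((x-np.mean(x))**2))  # the same value computed by hand
# Unbiased estimate (divides by n - 1)
print(np.sum((x-np.mean(x))**2)/(x.size-1))
print(np.var(x, ddof=1))
# Testing the axis argument
print(np.var(x,axis=0))  # variance of each column
print(np.var(x,axis=1))  # variance of each row
Output: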
25
23.0
52.0
52.0
54.166666666666664
54.166666666666664
[50. 50. 50. 50. 50.]
[2. 2. 2. 2. 2.]
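The two variances differ only by a constant factor, which a short check makes explicit. This is an illustrative sketch; the names `population_var` and `sample_var` are introduced here and are not from the original code.

```python
import numpy as np

x = np.arange(11, 36).reshape(5, 5)  # the same values 11..35 used in this section
n = x.size

population_var = np.var(x)        # divides by n
sample_var = np.var(x, ddof=1)    # divides by n - 1

# sample_var = population_var * n / (n - 1)
print(np.isclose(sample_var, population_var * n / (n - 1)))  # True
```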
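3. Standard deviation
# TEST 3
import numpy as np
x = np.array([[11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25],
              [26, 27, 28, 29, 30],
              [31, 32, 33, 34, 35]])
print(np.std(x))
print(np.sqrt(np.var(x)))  # the standard deviation is the square root of the variance
print(np.std(x,axis=0))  # per column
print(np.std(x,axis=1))  # per row
Output: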
7.211102550927978
7.211102550927978
[7.07106781 7.07106781 7.07106781 7.07106781 7.07106781]
[1.41421356 1.41421356 1.41421356 1.41421356 1.41421356]
4. Range (peak to peak)
import numpy as np
x = np.random.randint(0, 20, size=[4, 5])
print(x)
print(np.ptp(x))  # max - min over all elements
print(np.ptp(x, axis=0))  # max - min of each column
print(np.ptp(x, axis=1))  # max - min of each row
Output:
[[13 11 15  0  6]
 [ 3 16  2 11 12]
 [ 2  1  2  2 18]
 [ 6  1 13 18 11]]
18
[11 15 13 18 12]
[15 14 17 17]
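The peak-to-peak value is simply the maximum minus the minimum along the chosen axis. A minimal sketch that checks this on the array printed above (the variable name `manual` is introduced here for illustration):

```python
import numpy as np

# The array shown in the output above
x = np.array([[13, 11, 15, 0, 6],
              [3, 16, 2, 11, 12],
              [2, 1, 2, 2, 18],
              [6, 1, 13, 18, 11]])

manual = np.amax(x, axis=0) - np.amin(x, axis=0)  # column-wise max minus min
print(manual)                                     # [11 15 13 18 12]
print(np.array_equal(manual, np.ptp(x, axis=0)))  # True
```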
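5. Percentiles
Detailed explanation: https://blog.csdn.net/juliarjuliar/article/details/81082934
import numpy as np
x = np.random.randint(0,20,[4,5])
print(x)
print(np.percentile(x, [25,50]))  # with no axis, computed over the flattened array
x = x.reshape(-1)
print(x)
print(np.sort(x))
print(np.percentile(x, [25,50]))  # same result as on the 2-D array
Output: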
[[18 13  9 19 11]
 [ 1 19  6 14  1]
 [19  5  4 19  0]
 [ 9 17  0 17  4]]
[ 4. 10.]
[18 13  9 19 11  1 19  6 14  1 19  5  4 19  0  9 17  0 17  4]
[ 0  0  1  1  4  4  5  6  9  9 11 13 14 17 17 18 19 19 19 19]
[ 4. 10.]
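To see where these numbers come from, the default (linear interpolation) percentile can be reproduced by hand. The helper `percentile_linear` below is a hypothetical function written for this illustration, not a NumPy API; it assumes the default interpolation method of `np.percentile`.

```python
import numpy as np

def percentile_linear(values, q):
    """Percentile q (0..100) by linear interpolation between order statistics."""
    s = np.sort(np.asarray(values, dtype=float).ravel())
    pos = q / 100 * (s.size - 1)      # fractional index into the sorted data
    lo = int(np.floor(pos))           # index of the lower neighbour
    hi = min(lo + 1, s.size - 1)      # index of the upper neighbour
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

# The flattened array shown in the output above
x = np.array([18, 13, 9, 19, 11, 1, 19, 6, 14, 1,
              19, 5, 4, 19, 0, 9, 17, 0, 17, 4])

print(percentile_linear(x, 25), percentile_linear(x, 50))  # 4.0 10.0
print(np.percentile(x, [25, 50]))                          # [ 4. 10.]
```

For the 50th percentile of 20 values the fractional index is 9.5, so the result is halfway between the sorted values 9 and 11.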
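6. Median / mean / weighted average
import numpy as np
x = np.random.randint(0, 100, [3, 7])
print(np.sort(x))  # prints each row sorted; x itself is left unchanged
print(np.median(x))
print(np.mean(x))
print(np.average(x))  # without weights, identical to np.mean
print(np.mean(x, axis=0))
print(np.average(x, axis=0))
w = np.arange(1, 22).reshape(3, 7)  # weights 1..21, same shape as x
print(np.average(x, weights=w))
Output: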
[[ 7 13 14 17 20 24 33]
 [ 3 31 32 35 58 70 94]
 [23 28 64 86 88 91 98]]
32.0
44.23809523809524
44.23809523809524
[44.         25.         33.33333333 52.         30.33333333 58.66666667
 66.33333333]
[44.         25.         33.33333333 52.         30.33333333 58.66666667
 66.33333333]
56.54978354978355
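The weighted average is the sum of value times weight divided by the sum of the weights. A minimal sketch on a small hand-made array (the data and weights here are assumptions for illustration, not the random array from the run above):

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # small illustrative data
w = np.array([[1, 1, 1],
              [2, 2, 2]])         # the second row counts twice as much

manual = np.sum(x * w) / np.sum(w)  # (6 + 2 * 15) / 9
print(manual)                       # 4.0
print(np.average(x, weights=w))     # 4.0
```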
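7. Covariance matrix
import numpy as np
x = np.arange(1, 8)
y = np.arange(8, 15)
print(x, y)
print('-'*50)
print(np.var(x))  # divides by n
print(np.cov(x))  # divides by n - 1, so it matches np.var(x, ddof=1)
print(np.var(x, ddof=1))
print('-'*50)
print(np.var(y))
print(np.cov(y))
print(np.var(y, ddof=1))
print('-'*50)
print(np.cov(x, y))  # 2 x 2 matrix: variances on the diagonal, covariance off the diagonal
print('-'*50)
z = np.mean((x - np.mean(x)) * (y - np.mean(y)))  # population covariance (divides by n)
print(z)
z = np.sum((x - np.mean(x)) * (y - np.mean(y))) / (len(x) - 1)  # sample covariance (divides by n - 1)
print(z)
z = np.dot(x - np.mean(x), y - np.mean(y)) / (len(x) - 1)  # sample covariance via a dot product
print(z)
Output: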
[1 2 3 4 5 6 7] [ 8 9 10 11 12 13 14]
--------------------------------------------------
4.0
4.666666666666666
4.666666666666667
--------------------------------------------------
4.0
4.666666666666666
4.666666666666667
--------------------------------------------------
[[4.66666667 4.66666667]
 [4.66666667 4.66666667]]
--------------------------------------------------
4.0
4.666666666666667
4.666666666666667
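The layout of the matrix returned by `np.cov` and the effect of its `bias` parameter can be sketched as follows. This is an illustrative example using the same `x` and `y`; the variable name `c` is introduced here.

```python
import numpy as np

x = np.arange(1, 8)
y = np.arange(8, 15)

c = np.cov(x, y)          # 2 x 2 matrix, normalized by n - 1 by default
print(c[0, 0], c[1, 1])   # sample variances of x and y
print(c[0, 1])            # sample covariance of x and y (c is symmetric)

# bias=True switches the normalization to n, matching np.var's default
print(np.cov(x, y, bias=True)[0, 1])  # 4.0 up to floating-point rounding
```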
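8. Correlation coefficient
import numpy as np
x, y = np.random.randint(0, 20, size=(2, 4))
print(x)
print(y)
z = np.corrcoef(x, y)  # 2 x 2 matrix; the off-diagonal entry is the correlation of x and y
print(z)
# The same coefficient by hand: centered dot product divided by the product of the norms
a = np.dot(x - np.mean(x), y - np.mean(y))
b = np.sqrt(np.dot(x - np.mean(x), x - np.mean(x)))
c = np.sqrt(np.dot(y - np.mean(y), y - np.mean(y)))
print(a / (b * c))
Output: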
[ 9 10  4 14]
[19  6 14  0]
[[ 1.         -0.70975624]
 [-0.70975624  1.        ]]
-0.7097562360053747
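The correlation coefficient is the covariance divided by the product of the two standard deviations, so it can also be read off the covariance matrix. A minimal sketch using the two vectors printed above (the names `c` and `r` are introduced here for illustration):

```python
import numpy as np

# The two vectors shown in the output above
x = np.array([9, 10, 4, 14])
y = np.array([19, 6, 14, 0])

c = np.cov(x, y)
r = c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])  # the (n - 1) factors cancel
print(r)                                  # approximately -0.70975624
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True
```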
9. Histogram
import numpy as np
x = np.array([0.2, 6.4, 3.0, 1.6])
bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])
inds = np.digitize(x, bins)  # index i means bins[i-1] <= x < bins[i]
print(inds) # [1 4 3 2]
for n in range(x.size):
    print(bins[inds[n] - 1], "<=", x[n], "<", bins[inds[n]])
Output:
[1 4 3 2]
0.0 <= 0.2 < 1.0
4.0 <= 6.4 < 10.0
2.5 <= 3.0 < 4.0
1.0 <= 1.6 < 2.5
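`np.digitize` only assigns each value to a bin; the counts per bin, which is what a histogram reports, come from `np.histogram`. The sketch below uses the same data and bin edges and is added for illustration only.

```python
import numpy as np

x = np.array([0.2, 6.4, 3.0, 1.6])
bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])

counts, edges = np.histogram(x, bins=bins)
print(counts)  # [1 1 1 1]

# The same counts from the digitize indices (bin k corresponds to index k + 1)
inds = np.digitize(x, bins)
print(np.bincount(inds, minlength=bins.size)[1:])  # [1 1 1 1]
```

One difference to keep in mind: `np.histogram` includes the right edge of the last bin, while `np.digitize` with its default settings would place a value equal to the last edge outside the bins.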
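10. Exercise
Compute the maximum of each row of the given array.
a = np.random.randint(1, 10, [5, 3])
# WORK 1
import numpy as np
a = np.random.randint(1,10,[5,3])
print(a)
print(np.amax(a, axis=1))  # axis=1 reduces across the columns, giving one maximum per row
Output: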
[[5 2 4]
 [9 9 2]
 [8 8 6]
 [2 6 2]
 [3 6 3]]
[5 9 8 6 6]