This post contains reading notes for *Introduction to Probability*.
Covariance
- The covariance of two random variables $X$ and $Y$, denoted by $\text{cov}(X, Y)$, is defined by
$$\begin{aligned}\text{cov}(X,Y)&=E[(X-E[X])(Y-E[Y])]\\&=E[XY]-E[X]E[Y]\end{aligned}$$
- When $\text{cov}(X, Y) = 0$, we say that $X$ and $Y$ are uncorrelated.
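As a quick numerical sanity check, the sketch below estimates both forms of the definition by sample averages (the joint distribution of $X$ and $Y$ is an arbitrary illustrative choice, not one from the book):

```python
# Minimal check that the two forms of the covariance definition agree.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X = rng.standard_normal(n)
Y = X + 0.5 * rng.standard_normal(n)   # correlated with X; cov(X, Y) = var(X) = 1

cov1 = np.mean((X - X.mean()) * (Y - Y.mean()))  # E[(X - E[X])(Y - E[Y])]
cov2 = np.mean(X * Y) - X.mean() * Y.mean()      # E[XY] - E[X]E[Y]
print(cov1, cov2)                                # both ≈ 1.0
```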
- Roughly speaking, a positive or negative covariance indicates that the values of $X-E[X]$ and $Y-E[Y]$ obtained in a single experiment “tend” to have the same or the opposite sign, respectively. Thus the sign of the covariance provides an important qualitative indicator of the relationship between $X$ and $Y$.
Properties of covariances
- For any random variables $X$, $Y$, and $Z$, and any scalars $a$ and $b$, we have
$$\text{cov}(X,X)=\text{var}(X)$$
$$\text{cov}(X,aY+b)=a\cdot \text{cov}(X,Y)$$
$$\text{cov}(X,Y+Z)=\text{cov}(X,Y)+\text{cov}(X,Z)$$
- Note that if $X$ and $Y$ are independent, we have $\text{cov}(X, Y) = E[XY]-E[X]E[Y]=0$. Thus, if $X$ and $Y$ are independent, they are also uncorrelated. However, the converse is generally not true.
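A short Monte Carlo sketch of these properties (the distributions of $X$, $Y$, $Z$ and the constants $a$, $b$ below are arbitrary illustrative choices):

```python
# Checking cov(X,X)=var(X), linearity, additivity, and independence => uncorrelated.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
X = rng.standard_normal(n)
Y = X + rng.standard_normal(n)   # correlated with X, cov(X, Y) = 1
Z = rng.exponential(size=n)      # independent of X

def cov(u, v):
    return np.mean(u * v) - u.mean() * v.mean()

a, b = 3.0, 5.0
print(cov(X, X), X.var())                      # cov(X, X) = var(X)
print(cov(X, a * Y + b), a * cov(X, Y))        # cov(X, aY+b) = a cov(X, Y)
print(cov(X, Y + Z), cov(X, Y) + cov(X, Z))    # additivity in one argument
print(cov(X, Z))                               # ≈ 0: X and Z are independent
```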
- Assume that $X$ and $Y$ satisfy
$$E[X\mid Y=y]=E[X],\qquad \text{for all } y$$
Then, assuming $X$ and $Y$ are discrete, the total expectation theorem implies that
$$\begin{aligned}E[XY]&=\sum_y p_Y(y)E[XY\mid Y=y]=\sum_y y\,p_Y(y)E[X\mid Y=y]\\&=\sum_y y\,p_Y(y)E[X]=E[X]E[Y]\end{aligned}$$
so $X$ and $Y$ are uncorrelated (the second equality uses that, conditioned on $Y=y$, $XY$ equals $yX$). The argument for the continuous case is similar.
Example 4.13.
- The pair of random variables $(X, Y)$ takes the values $(1, 0)$, $(0, 1)$, $(-1, 0)$, and $(0, -1)$, each with probability $1/4$. Therefore,
$$\text{cov}(X,Y)=E[XY]-E[X]E[Y]=0-0=0$$
and $X$ and $Y$ are uncorrelated.
- However, $X$ and $Y$ are not independent since, for example, a nonzero value of $X$ fixes the value of $Y$ to zero.
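Since the distribution here is finite, the claim can be verified exactly by enumeration; a minimal sketch using Python's `fractions` for exact arithmetic:

```python
# Exact computation for Example 4.13 over the four equally likely outcomes.
from fractions import Fraction

points = [(1, 0), (0, 1), (-1, 0), (0, -1)]
p = Fraction(1, 4)

EX  = sum(p * x for x, y in points)        # 0
EY  = sum(p * y for x, y in points)        # 0
EXY = sum(p * x * y for x, y in points)    # 0, since one coordinate is always 0
print(EXY - EX * EY)                       # cov(X, Y) = 0: uncorrelated

# Not independent: P(Y = 0 | X = 1) = 1, but P(Y = 0) = 1/2.
print(sum(p for x, y in points if y == 0))  # P(Y = 0) = 1/2
```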
Correlation Coefficient
- The correlation coefficient $\rho(X,Y)$ of two random variables $X$ and $Y$ that have positive variances is defined as
$$\rho(X, Y) =\frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}}$$
The simpler notation $\rho$ will also be used when $X$ and $Y$ are clear from the context.
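A small sketch computing $\rho$ directly from the definition and comparing it with NumPy's built-in estimator (the joint distribution below is an arbitrary illustrative choice):

```python
# rho from the definition vs. NumPy's sample correlation coefficient.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
X = rng.standard_normal(n)
Y = 2.0 * X + rng.standard_normal(n)   # rho = 2 / sqrt(5) ≈ 0.894

cov_XY = np.mean(X * Y) - X.mean() * Y.mean()
rho = cov_XY / np.sqrt(X.var() * Y.var())
print(rho, np.corrcoef(X, Y)[0, 1])    # two estimates of the same quantity
```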
- It may be viewed as a normalized version of the covariance $\text{cov}(X, Y)$, and in fact, it can be shown that $\rho$ ranges from $-1$ to $1$.
- If $\rho>0$ (or $\rho < 0$), then the values of $X - E[X]$ and $Y-E[Y]$ “tend” to have the same (or opposite, respectively) sign. The size of $|\rho|$ provides a normalized measure of the extent to which this is true.
- In fact, always assuming that $X$ and $Y$ have positive variances, it can be shown that $\rho = 1$ (or $\rho = -1$) if and only if there exists a positive (or negative, respectively) constant $c$ such that
$$Y-E[Y]=c(X-E[X])$$
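A quick check of this characterization (the constants $c$ below are arbitrary; any nonzero value works):

```python
# rho should equal the sign of c when Y - E[Y] = c (X - E[X]) exactly.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(100_000)
for c in (2.5, -0.7):
    Y = 10.0 + c * (X - X.mean())
    print(c, np.corrcoef(X, Y)[0, 1])  # ≈ +1 for c > 0, ≈ -1 for c < 0
```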
Problem 20. Schwarz inequality
Show that for any random variables $X$ and $Y$, we have
$$(E[XY])^2\leq E[X^2]E[Y^2]$$
SOLUTION
- We may assume that $E[Y^2]\neq 0$; otherwise, we have $Y = 0$ with probability 1, hence $E[XY] = 0$, and the inequality holds trivially.
- We have
$$\begin{aligned}0&\leq E\left[\left(X-\frac{E[XY]}{E[Y^2]}Y\right)^2\right]\\&=E\left[X^2-2\frac{E[XY]}{E[Y^2]}XY+\frac{(E[XY])^2}{(E[Y^2])^2}Y^2\right]\\&=E[X^2]-2\frac{E[XY]}{E[Y^2]}E[XY]+\frac{(E[XY])^2}{(E[Y^2])^2}E[Y^2]\\&=E[X^2]-\frac{(E[XY])^2}{E[Y^2]}\end{aligned}$$
i.e., $(E[XY])^2\leq E[X^2]E[Y^2]$.
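A Monte Carlo spot check of the inequality; note that the sample-average surrogates for the expectations satisfy it exactly, since Cauchy-Schwarz also holds for the empirical distribution (the distributions below are arbitrary illustrative choices):

```python
# Spot-checking (E[XY])^2 <= E[X^2] E[Y^2] on random sample-based estimates.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
for _ in range(5):
    X = rng.standard_normal(n) + rng.uniform(-2, 2)   # random mean shift
    Y = rng.exponential(size=n) * rng.uniform(0.5, 2) # random scale
    lhs = np.mean(X * Y) ** 2
    rhs = np.mean(X ** 2) * np.mean(Y ** 2)
    print(lhs <= rhs)   # True in every trial
```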
Problem 21. Correlation coefficient.
Consider the correlation coefficient
$$\rho(X, Y) =\frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}}$$
of two random variables $X$ and $Y$ that have positive variances. Show that:
- $(a)$ $|\rho(X, Y)|\leq 1$. [Hint: Use the Schwarz inequality from the preceding problem.]
- $(b)$ If $Y - E[Y]$ is a positive (or negative) multiple of $X - E[X]$, then $\rho(X, Y) = 1$ [or $\rho(X, Y) = -1$, respectively].
- $(c)$ If $\rho(X, Y) = 1$ [or $\rho(X, Y) = -1$], then, with probability 1, $Y - E[Y]$ is a positive (or negative, respectively) multiple of $X - E[X]$.
SOLUTION
- $(a)$ Let $\tilde X = X - E[X]$ and $\tilde Y = Y - E[Y]$. Using the Schwarz inequality, we get
$$\rho(X, Y)^2 =\frac{(E[\tilde X\tilde Y])^2}{E[\tilde X^2]E[\tilde Y^2]}\leq 1$$
and hence $|\rho(X, Y)|\leq 1$.
- $(b)$ If $\tilde Y = a\tilde X$, then
$$\rho(X, Y)=\frac{E[\tilde X\cdot a\tilde X]}{\sqrt{E[\tilde X^2]E[(a\tilde X)^2]}}=\frac{a}{|a|}$$
- $(c)$ If $|\rho(X, Y)| = 1$, the calculation in the solution of Problem 20 yields
$$\begin{aligned}E\left[\left(\tilde X-\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y\right)^2\right]&=E[\tilde X^2]-\frac{(E[\tilde X\tilde Y])^2}{E[\tilde Y^2]}\\&=E[\tilde X^2]\left(1-\rho(X,Y)^2\right)\\&=0\end{aligned}$$
Thus, with probability 1, the random variable
$$\tilde X-\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y$$
is equal to zero. It follows that, with probability 1,
$$\tilde X=\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y=\sqrt{\frac{E[\tilde X^2]}{E[\tilde Y^2]}}\,\rho(X,Y)\,\tilde Y$$
i.e., the sign of the constant ratio of $\tilde X$ and $\tilde Y$ is determined by the sign of $\rho(X, Y)$.
Covariance Matrix
- Let $X_1,\dots,X_n$ be $n$ random variables, let $\mu_{i}=E\left(X_{i}\right)$, and set $\sigma_{ij}=\text{cov}(X_i,X_j)$. The matrix
$$\begin{aligned}\boldsymbol \Sigma&=(\sigma_{ij})_{n\times n}\\&=\left[\begin{array}{cccc}E\left[\left(X_{1}-\mu_{1}\right)\left(X_{1}-\mu_{1}\right)\right] & E\left[\left(X_{1}-\mu_{1}\right)\left(X_{2}-\mu_{2}\right)\right] & \cdots & E\left[\left(X_{1}-\mu_{1}\right)\left(X_{n}-\mu_{n}\right)\right] \\ E\left[\left(X_{2}-\mu_{2}\right)\left(X_{1}-\mu_{1}\right)\right] & E\left[\left(X_{2}-\mu_{2}\right)\left(X_{2}-\mu_{2}\right)\right] & \cdots & E\left[\left(X_{2}-\mu_{2}\right)\left(X_{n}-\mu_{n}\right)\right] \\ \vdots & \vdots & \ddots & \vdots \\ E\left[\left(X_{n}-\mu_{n}\right)\left(X_{1}-\mu_{1}\right)\right] & E\left[\left(X_{n}-\mu_{n}\right)\left(X_{2}-\mu_{2}\right)\right] & \cdots & E\left[\left(X_{n}-\mu_{n}\right)\left(X_{n}-\mu_{n}\right)\right]\end{array}\right]\end{aligned}$$
is called the covariance matrix of $X_1,\dots,X_n$.
- With $\mathbf{X}=\left[\begin{array}{c}X_{1} \\ \vdots \\ X_{n}\end{array}\right]\in\mathbb{R}^n$ and $\boldsymbol \mu=\left[\begin{array}{c}\mu_{1} \\ \vdots \\ \mu_{n}\end{array}\right]\in\mathbb{R}^n$, this can be written compactly as
$$\boldsymbol \Sigma=\mathbb{E}\left[(\mathbf{X}-\boldsymbol\mu)(\mathbf{X}-\boldsymbol\mu)^{\top}\right]$$
- The covariance matrix is positive semidefinite: for any $\boldsymbol x\in\mathbb{R}^n$,
$$\begin{aligned}\boldsymbol{x}^{\mathrm{T}} \boldsymbol \Sigma \boldsymbol{x}&=\boldsymbol{x}^{\mathrm{T}} \mathbb{E}\left[(\mathbf{X}-\boldsymbol\mu)(\mathbf{X}-\boldsymbol\mu)^{\top}\right]\boldsymbol{x}\\&= \mathbb{E}\left[\boldsymbol{x}^{\mathrm{T}}(\mathbf{X}-\boldsymbol\mu)(\mathbf{X}-\boldsymbol\mu)^{\top}\boldsymbol{x}\right]\\&=\mathbb{E}\left[((\mathbf{X}-\boldsymbol\mu)^{\mathrm{T}}\boldsymbol{x})^{\mathrm{T}}((\mathbf{X}-\boldsymbol\mu)^{\mathrm{T}}\boldsymbol{x})\right]\\&=\mathbb{E}\left[\left\|(\mathbf{X}-\boldsymbol\mu)^{\mathrm{T}}\boldsymbol{x}\right\|^{2}\right]\\&\geq 0\end{aligned}$$
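A sketch that builds $\boldsymbol\Sigma$ from samples via $\mathbb{E}[(\mathbf X-\boldsymbol\mu)(\mathbf X-\boldsymbol\mu)^\top]$ and checks positive semidefiniteness through its eigenvalues (the mixing matrix and dimensions are arbitrary illustrative choices):

```python
# Sample covariance matrix as E[(X - mu)(X - mu)^T], plus a PSD check.
import numpy as np

rng = np.random.default_rng(5)
n, dim = 100_000, 3
A = rng.standard_normal((dim, dim))            # mixes independent sources
samples = rng.standard_normal((n, dim)) @ A.T  # each row is one draw of X

mu = samples.mean(axis=0)
centered = samples - mu
Sigma = centered.T @ centered / n              # E[(X - mu)(X - mu)^T]
print(np.allclose(Sigma, np.cov(samples.T, bias=True)))  # matches np.cov

eigvals = np.linalg.eigvalsh(Sigma)            # symmetric => real eigenvalues
print(np.all(eigvals >= -1e-12))               # PSD up to round-off
```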
Variance of the Sum of Random Variables
- If $X_1, X_2, \dots, X_n$ are random variables with finite variance, we have
$$\text{var}(X_1+X_2)=\text{var}(X_1)+\text{var}(X_2)+2\,\text{cov}(X_1,X_2)$$
and, more generally,
$$\text{var}\left(\sum_{i=1}^nX_i\right)=\sum_{i=1}^n\text{var}(X_i)+\sum_{\{(i,j)\mid i\neq j\}}\text{cov}(X_i,X_j)$$
PROOF
- For brevity, denote $\tilde X_i=X_i-E[X_i]$; then
$$\begin{aligned}\text{var}\left(\sum_{i=1}^nX_i\right)&=E\left[\left(\sum_{i=1}^n\tilde X_i\right)^2\right]\\&=E\left[\sum_{i=1}^n\sum_{j=1}^n\tilde X_i\tilde X_j\right]\\&=\sum_{i=1}^n\sum_{j=1}^nE[\tilde X_i\tilde X_j]\\&=\sum_{i=1}^nE[\tilde X_i^2]+\sum_{\{(i,j)\mid i\neq j\}}E[\tilde X_i\tilde X_j]\\&=\sum_{i=1}^n\text{var}(X_i)+\sum_{\{(i,j)\mid i\neq j\}}\text{cov}(X_i,X_j)\end{aligned}$$
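A numerical check of the two-variable case of this identity (the correlated pair below is an arbitrary illustrative choice):

```python
# var(X1 + X2) = var(X1) + var(X2) + 2 cov(X1, X2)
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
X1 = rng.standard_normal(n)
X2 = 0.8 * X1 + rng.standard_normal(n)

cov12 = np.mean(X1 * X2) - X1.mean() * X2.mean()
print((X1 + X2).var())
print(X1.var() + X2.var() + 2 * cov12)  # should match the line above
```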
Example 4.15.
$n$ people throw their hats in a box and then pick a hat at random. Let us find the variance of $X$, the number of people who pick their own hat.
SOLUTION
- We have
$$X = X_1 +\cdots+ X_n$$
where $X_i$ is the random variable that takes the value $1$ if the $i$th person selects his/her own hat, and takes the value $0$ otherwise. Noting that $X_i$ is Bernoulli with parameter $p = P(X_i = 1) = 1/n$, we obtain
$$\begin{aligned}E[X_i]&=\frac{1}{n}\\ \text{var}(X_i)&=\frac{1}{n}\left(1-\frac{1}{n}\right)\end{aligned}$$
For $i \neq j$, we have
$$\begin{aligned}\text{cov}(X_i, X_j) &= E[X_iX_j] - E[X_i] E[X_j] \\&= P(X_i = 1\ \text{and}\ X_j = 1)-\frac{1}{n^2} \\&=P(X_i=1)P(X_j=1\mid X_i=1)-\frac{1}{n^2} \\&=\frac{1}{n}\cdot\frac{1}{n-1}-\frac{1}{n^2} \\&=\frac{1}{n^2(n-1)} \end{aligned}$$
Therefore,
$$\begin{aligned}\text{var}(X)&=\text{var}\left(\sum_{i=1}^nX_i\right) \\&=\sum_{i=1}^n\text{var}(X_i)+\sum_{\{(i,j)\mid i\neq j\}}\text{cov}(X_i,X_j) \\&=n\cdot \frac{1}{n}\left(1-\frac{1}{n}\right)+n(n-1)\cdot \frac{1}{n^2(n-1)} \\&=1 \end{aligned}$$
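A Monte Carlo simulation of the hat problem, confirming that both $E[X]$ and $\text{var}(X)$ are close to $1$ (here $n = 10$ and the trial count are arbitrary choices):

```python
# Simulate random hat assignments; X = number of people who get their own hat.
import numpy as np

rng = np.random.default_rng(7)
n, trials = 10, 200_000

perms = np.array([rng.permutation(n) for _ in range(trials)])
matches = (perms == np.arange(n)).sum(axis=1)   # X = number of fixed points
print(matches.mean(), matches.var())            # both ≈ 1
```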