Chapter 4 (Further Topics on Random Variables): Covariance and Correlation

These are reading notes for *Introduction to Probability*.

Covariance

  • The covariance of two random variables $X$ and $Y$, denoted by $\text{cov}(X, Y)$, is defined by
    $$\begin{aligned}\text{cov}(X,Y)&=E\big[(X-E[X])(Y-E[Y])\big] \\&=E[XY]-E[X]E[Y]\end{aligned}$$
    • When $\text{cov}(X, Y) = 0$, we say that $X$ and $Y$ are uncorrelated.
    • Roughly speaking, a positive or negative covariance indicates that the values of $X-E[X]$ and $Y-E[Y]$ obtained in a single experiment "tend" to have the same or the opposite sign, respectively. Thus the sign of the covariance provides an important qualitative indicator of the relationship between $X$ and $Y$.
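As a quick numerical check (a minimal sketch of my own, not from the book), the two equivalent forms of the definition can be compared on simulated data; the joint distribution of $X$ and $Y$ below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary dependent pair: Y = 0.5 X + noise, so cov(X, Y) = 0.5 var(X) = 0.5.
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)

cov_centered = np.mean((x - x.mean()) * (y - y.mean()))  # E[(X - E[X])(Y - E[Y])]
cov_moments = np.mean(x * y) - x.mean() * y.mean()       # E[XY] - E[X]E[Y]

print(cov_centered, cov_moments)  # equal up to floating point, both close to 0.5
```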

Properties of covariances

  • For any random variables $X$, $Y$, and $Z$, and any scalars $a$ and $b$, we have
    $$\text{cov}(X,X)=\text{var}(X)$$
    $$\text{cov}(X,aY+b)=a\cdot \text{cov}(X,Y)$$
    $$\text{cov}(X,Y+Z)=\text{cov}(X,Y)+\text{cov}(X,Z)$$
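These identities are easy to spot-check in code (my own sketch; the scalars $a$, $b$ and the joint distributions are arbitrary choices). In fact all three hold exactly for sample moments as well, not just in expectation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
X = rng.normal(size=n)
Y = 0.3 * X + rng.normal(size=n)  # dependent on X (arbitrary choice)
Z = rng.exponential(size=n)       # independent of X and Y
a, b = 2.0, -3.0

def cov(u, v):
    # Sample version of E[UV] - E[U]E[V], normalized by n like np.var.
    return np.mean(u * v) - u.mean() * v.mean()

print(np.isclose(cov(X, X), X.var()))                    # cov(X, X) = var(X)
print(np.isclose(cov(X, a * Y + b), a * cov(X, Y)))      # scaling; shift b drops out
print(np.isclose(cov(X, Y + Z), cov(X, Y) + cov(X, Z)))  # additivity
```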
  • Note that if $X$ and $Y$ are independent, we have $\text{cov}(X, Y) = E[XY]-E[X]E[Y]=0$. Thus, if $X$ and $Y$ are independent, they are also uncorrelated. However, the converse is generally not true.
    • Assume that $X$ and $Y$ satisfy
      $$E[X\mid Y=y]=E[X],\quad \text{for all } y$$
      Then, assuming $X$ and $Y$ are discrete, the total expectation theorem implies that
      $$\begin{aligned}E[XY]&=\sum_y p_Y(y)E[XY\mid Y=y]=\sum_y y\,p_Y(y)E[X\mid Y=y]\\ &=\sum_y y\,p_Y(y)E[X]=E[X]E[Y]\end{aligned}$$
      so $X$ and $Y$ are uncorrelated. The argument for the continuous case is similar.

Example 4.13.

  • The pair of random variables $(X, Y)$ takes the values $(1, 0), (0, 1), (-1, 0)$, and $(0, -1)$, each with probability $1/4$. Therefore,
    $$\text{cov}(X,Y)=E[XY]-E[X]E[Y]=0-0=0$$
    and $X$ and $Y$ are uncorrelated.
  • However, $X$ and $Y$ are not independent since, for example, a nonzero value of $X$ fixes the value of $Y$ to zero.
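Since the joint PMF has only four points, both claims can be verified exactly by enumeration (a short sketch of my own):

```python
# Joint PMF of Example 4.13: four equally likely points.
pmf = {(1, 0): 0.25, (0, 1): 0.25, (-1, 0): 0.25, (0, -1): 0.25}

EX = sum(p * x for (x, y), p in pmf.items())
EY = sum(p * y for (x, y), p in pmf.items())
EXY = sum(p * x * y for (x, y), p in pmf.items())
print(EXY - EX * EY)  # 0.0 -> X and Y are uncorrelated

# But not independent: P(Y = 0) = 1/2, while P(Y = 0 | X = 1) = 1.
p_x1 = sum(p for (x, y), p in pmf.items() if x == 1)
p_y0 = sum(p for (x, y), p in pmf.items() if y == 0)
p_y0_and_x1 = sum(p for (x, y), p in pmf.items() if x == 1 and y == 0)
print(p_y0, p_y0_and_x1 / p_x1)  # 0.5 vs 1.0
```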

Correlation Coefficient


  • The correlation coefficient $\rho(X,Y)$ of two random variables $X$ and $Y$ that have positive variances is defined as
    $$\rho(X, Y) =\frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}}$$

The simpler notation $\rho$ will also be used when $X$ and $Y$ are clear from the context.

  • It may be viewed as a normalized version of the covariance $\text{cov}(X, Y)$, and in fact, it can be shown that $\rho$ ranges from $-1$ to $1$.
    • If $\rho>0$ (or $\rho < 0$), then the values of $X - E[X]$ and $Y-E[Y]$ "tend" to have the same (or opposite, respectively) sign. The size of $|\rho|$ provides a normalized measure of the extent to which this is true.
    • In fact, always assuming that $X$ and $Y$ have positive variances, it can be shown that $\rho = 1$ (or $\rho = -1$) if and only if there exists a positive (or negative, respectively) constant $c$ such that
      $$Y-E[Y]=c(X-E[X])$$
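A small numeric illustration (again my own sketch, with arbitrary choices): an exact linear relation gives $\rho = \pm 1$, and adding independent noise pulls $|\rho|$ strictly inside $(-1, 1)$.

```python
import numpy as np

rng = np.random.default_rng(2)

def rho(x, y):
    # Sample correlation: cov(X, Y) / sqrt(var(X) var(Y)).
    c = np.mean(x * y) - x.mean() * y.mean()
    return c / np.sqrt(x.var() * y.var())

x = rng.normal(size=50_000)
print(rho(x, 3.0 * x + 7.0))                # +1: positive linear relation
print(rho(x, -3.0 * x + 7.0))               # -1: negative linear relation
print(rho(x, x + rng.normal(size=50_000)))  # about 1/sqrt(2), strictly inside (-1, 1)
```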

Problem 20. Schwarz inequality

Show that for any random variables $X$ and $Y$, we have
$$(E[XY])^2\leq E[X^2]E[Y^2]$$

SOLUTION

  • We may assume that $E[Y^2]\neq 0$; otherwise, we have $Y = 0$ with probability 1, and hence $E[XY] = 0$, so the inequality holds.
  • We have
    $$\begin{aligned}0&\leq E\Big[\Big(X-\frac{E[XY]}{E[Y^2]}Y\Big)^2\Big]\\ &=E\Big[X^2-2\frac{E[XY]}{E[Y^2]}XY+\frac{(E[XY])^2}{(E[Y^2])^2}Y^2\Big] \\&=E[X^2]-2\frac{E[XY]}{E[Y^2]}E[XY]+\frac{(E[XY])^2}{(E[Y^2])^2}E[Y^2] \\&=E[X^2]-\frac{(E[XY])^2}{E[Y^2]}\end{aligned}$$
    i.e., $(E[XY])^2\leq E[X^2]E[Y^2]$.
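Since the Schwarz inequality also holds for sample moments (it is Cauchy-Schwarz on the sample vectors), a quick sanity check is possible; the distributions below are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_t(df=5, size=100_000)  # arbitrary heavy-tailed choice
y = rng.exponential(size=100_000)

lhs = np.mean(x * y) ** 2
rhs = np.mean(x ** 2) * np.mean(y ** 2)
print(lhs <= rhs)  # True for any samples
```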

Problem 21. Correlation coefficient.

Consider the correlation coefficient
$$\rho(X, Y) =\frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}}$$
of two random variables $X$ and $Y$ that have positive variances. Show that:

  • $(a)$ $|\rho(X, Y)|\leq1$. [Hint: Use the Schwarz inequality from the preceding problem.]
  • $(b)$ If $Y - E[Y]$ is a positive (or negative) multiple of $X - E[X]$, then $\rho(X, Y) = 1$ [or $\rho(X, Y) = -1$, respectively].
  • $(c)$ If $\rho(X, Y) = 1$ [or $\rho(X, Y) = -1$], then, with probability 1, $Y - E[Y]$ is a positive (or negative, respectively) multiple of $X - E[X]$.

SOLUTION

  • $(a)$ Let $\tilde X = X - E[X]$ and $\tilde Y = Y - E[Y]$. Using the Schwarz inequality, we get
    $$\rho(X, Y)^2 =\frac{(E[\tilde X\tilde Y])^2}{E[\tilde X^2]E[\tilde Y^2]}\leq1$$
    and hence $|\rho(X, Y)|\leq1$.
  • $(b)$ If $\tilde Y = a\tilde X$, then
    $$\rho(X, Y)=\frac{E[\tilde X\cdot a\tilde X]}{\sqrt{E[\tilde X^2]E[(a\tilde X)^2]}}=\frac{a}{|a|}$$
  • $(c)$ If $|\rho(X, Y)| = 1$, the calculation in the solution of Problem 20 yields
    $$\begin{aligned}E\Big[\Big(\tilde X-\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y\Big)^2\Big]&=E[\tilde X^2]-\frac{(E[\tilde X\tilde Y])^2}{E[\tilde Y^2]} \\&=E[\tilde X^2]\big(1-(\rho(X,Y))^2\big) \\&=0\end{aligned}$$
    Thus, with probability 1, the random variable
    $$\tilde X-\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y$$
    is equal to zero. It follows that, with probability 1,
    $$\tilde X=\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y=\sqrt{\frac{E[\tilde X^2]}{E[\tilde Y^2]}}\,\rho(X,Y)\,\tilde Y$$
    i.e., the sign of the constant ratio of $\tilde X$ and $\tilde Y$ is determined by the sign of $\rho(X, Y)$.

Covariance Matrix

  • Let $X_1,\dots,X_n$ be $n$ random variables, with $\mu_i=E(X_i)$ and $\sigma_{ij}=\text{cov}(X_i,X_j)$. The matrix
    $$\begin{aligned}\boldsymbol \Sigma&=(\sigma_{ij})_{n\times n} \\&=\begin{bmatrix}E[(X_1-\mu_1)(X_1-\mu_1)] & E[(X_1-\mu_1)(X_2-\mu_2)] & \cdots & E[(X_1-\mu_1)(X_n-\mu_n)] \\ E[(X_2-\mu_2)(X_1-\mu_1)] & E[(X_2-\mu_2)(X_2-\mu_2)] & \cdots & E[(X_2-\mu_2)(X_n-\mu_n)] \\ \vdots & \vdots & \ddots & \vdots \\ E[(X_n-\mu_n)(X_1-\mu_1)] & E[(X_n-\mu_n)(X_2-\mu_2)] & \cdots & E[(X_n-\mu_n)(X_n-\mu_n)]\end{bmatrix}\end{aligned}$$
    is called the covariance matrix of $X_1,\dots,X_n$.

  • Writing $\mathbf{X}=\left[\begin{array}{c}X_{1} \\ \vdots \\ X_{n}\end{array}\right]\in\mathbb{R}^n$ and $\boldsymbol \mu=\left[\begin{array}{c}\mu_{1} \\ \vdots \\ \mu_{n}\end{array}\right]\in\mathbb{R}^n$, we have
    $$\boldsymbol \Sigma=E\left[(\mathbf{X}-\boldsymbol\mu)(\mathbf{X}-\boldsymbol\mu)^{\top}\right]$$
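In code, $\boldsymbol\Sigma$ is just an averaged outer product of the centered samples; the NumPy sketch below (my own, with an arbitrary 3-dimensional example) reproduces `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(4)

# 100,000 samples of X = A Z with Z standard normal, so the true
# covariance matrix is A @ A.T (an arbitrary illustrative choice).
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
X = rng.normal(size=(100_000, 3)) @ A.T

mu = X.mean(axis=0)
centered = X - mu
Sigma = centered.T @ centered / len(X)  # sample version of E[(X - mu)(X - mu)^T]

print(np.allclose(Sigma, np.cov(X.T, bias=True)))  # matches NumPy's estimator
print(np.round(Sigma - A @ A.T, 2))                # close to the true A A^T
```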

The covariance matrix is positive semidefinite: for any $\boldsymbol{x}\in\mathbb{R}^n$,

$$\begin{aligned}\boldsymbol{x}^{\top} \boldsymbol \Sigma \boldsymbol{x}&=\boldsymbol{x}^{\top} E\left[(\mathbf{X}-\boldsymbol\mu)(\mathbf{X}-\boldsymbol\mu)^{\top}\right]\boldsymbol{x} \\&= E\left[\boldsymbol{x}^{\top}(\mathbf{X}-\boldsymbol\mu)(\mathbf{X}-\boldsymbol\mu)^{\top}\boldsymbol{x}\right] \\&=E\left[\big((\mathbf{X}-\boldsymbol\mu)^{\top}\boldsymbol{x}\big)^{\top}\big((\mathbf{X}-\boldsymbol\mu)^{\top}\boldsymbol{x}\big)\right] \\&=E\left[\left\|(\mathbf{X}-\boldsymbol\mu)^{\top}\boldsymbol{x}\right\|^{2}\right] \\&\geq0\end{aligned}$$
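Numerically, positive semidefiniteness shows up as nonnegative eigenvalues (a quick check of my own; the distribution is an arbitrary non-Gaussian choice):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(10_000, 4)) ** 3  # arbitrary non-Gaussian 4-dim vector
Sigma = np.cov(X.T, bias=True)

# All eigenvalues of a covariance matrix are >= 0 (up to floating-point error).
print(np.linalg.eigvalsh(Sigma).min() >= -1e-10)

# Equivalently, x^T Sigma x >= 0 for any vector x.
x = rng.normal(size=4)
print(x @ Sigma @ x >= 0)
```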

Variance of the Sum of Random Variables

  • If $X_1, X_2, \dots, X_n$ are random variables with finite variance, we have
    $$\text{var}(X_1+X_2)=\text{var}(X_1)+\text{var}(X_2)+2\,\text{cov}(X_1,X_2)$$
    and, more generally (a numerical spot-check follows the proof),
    $$\text{var}\Big(\sum_{i=1}^nX_i\Big)=\sum_{i=1}^n\text{var}(X_i)+\sum_{\{(i,j)\,|\,i\neq j\}}\text{cov}(X_i,X_j)$$

PROOF

  • For brevity, we denote $\tilde X_i=X_i-E[X_i]$. Then
    $$\begin{aligned}\text{var}\Big(\sum_{i=1}^nX_i\Big)&=E\Big[\Big(\sum_{i=1}^n\tilde X_i\Big)^2\Big] \\&=E\Big[\sum_{i=1}^n\sum_{j=1}^n\tilde X_i\tilde X_j\Big] \\&=\sum_{i=1}^n\sum_{j=1}^nE[\tilde X_i\tilde X_j] \\&=\sum_{i=1}^nE[\tilde X_i^2]+\sum_{\{(i,j)\,|\,i\neq j\}}E[\tilde X_i\tilde X_j] \\&=\sum_{i=1}^n\text{var}(X_i)+\sum_{\{(i,j)\,|\,i\neq j\}}\text{cov}(X_i,X_j)\end{aligned}$$
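The two-variable identity can be checked on samples (a sketch of mine; it holds exactly for sample moments when variance and covariance use the same normalization):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + rng.normal(size=n)  # arbitrary correlated pair

def cov(u, v):
    # Normalized by n, matching np.var's default.
    return np.mean(u * v) - u.mean() * v.mean()

lhs = np.var(X1 + X2)
rhs = np.var(X1) + np.var(X2) + 2 * cov(X1, X2)
print(np.isclose(lhs, rhs))  # True
```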

Example 4.15.

$n$ people throw their hats in a box and then pick a hat at random. Let us find the variance of $X$, the number of people who pick their own hat.

SOLUTION

  • We have
    $$X = X_1 +\cdots+ X_n$$
    where $X_i$ is the random variable that takes the value $1$ if the $i$th person selects his/her own hat, and takes the value $0$ otherwise. Noting that $X_i$ is Bernoulli with parameter $p = P(X_i = 1) = 1/n$, we obtain
    $$\begin{aligned}E[X_i]&=\frac{1}{n}\\ \text{var}(X_i)&=\frac{1}{n}\Big(1-\frac{1}{n}\Big)\end{aligned}$$
    For $i \neq j$, we have
    $$\begin{aligned}\text{cov}(X_i, X_j) &= E[X_iX_j] - E[X_i] E[X_j] \\&= P(X_i = 1\text{ and } X_j = 1)-\frac{1}{n^2} \\&=P(X_i=1)P(X_j=1\mid X_i=1)-\frac{1}{n^2} \\&=\frac{1}{n}\cdot\frac{1}{n-1}-\frac{1}{n^2} \\&=\frac{1}{n^2(n-1)}\end{aligned}$$
    Therefore,
    $$\begin{aligned}\text{var}(X)&=\text{var}\Big(\sum_{i=1}^nX_i\Big) \\&=\sum_{i=1}^n\text{var}(X_i)+\sum_{\{(i,j)\,|\,i\neq j\}}\text{cov}(X_i,X_j) \\&=n\cdot \frac{1}{n}\Big(1-\frac{1}{n}\Big)+n(n-1)\cdot \frac{1}{n^2(n-1)} \\&=1\end{aligned}$$
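A Monte Carlo check (my own sketch) agrees: both the mean and the variance of the number of matches are close to 1.

```python
import numpy as np

rng = np.random.default_rng(7)

def hat_matches(n, trials=100_000):
    # Count the fixed points of a uniformly random permutation, `trials` times.
    counts = np.empty(trials)
    for t in range(trials):
        counts[t] = np.sum(rng.permutation(n) == np.arange(n))
    return counts

X = hat_matches(n=10)
print(X.mean(), X.var())  # both are close to 1 (for any n >= 2)
```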