Chapter 2 (Discrete Random Variables): Independence

This post contains my reading notes on Introduction to Probability.

Independence of a Random Variable from an Event

  • The idea is that knowing the occurrence of the conditioning event provides no new information on the value of the random variable. More formally, we say that the random variable $X$ is independent of the event $A$ if
    $$P(X=x \text{ and } A)=P(X=x)\,P(A)=p_X(x)\,P(A)\quad \text{for all } x$$
    As long as $P(A) > 0$, independence is the same as the condition
    $$p_{X|A}(x)=p_X(x)\quad \text{for all } x$$
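
A minimal sketch (my own illustration, not from the book): enumerate the sample space of two fair coin tosses and check that $X$, the outcome of the first toss, is independent of the event $A$ that the second toss is a head.

```python
# Hypothetical example: X = first of two fair coin tosses, A = {second toss is a head}.
# We verify P(X = x and A) = p_X(x) * P(A) for every value x.
from itertools import product

sample_space = list(product([0, 1], repeat=2))   # (first toss, second toss), equally likely
prob = 1 / len(sample_space)

P_A = sum(prob for (_, t2) in sample_space if t2 == 1)
for x in (0, 1):
    p_X_x = sum(prob for (t1, _) in sample_space if t1 == x)
    joint = sum(prob for (t1, t2) in sample_space if t1 == x and t2 == 1)
    assert abs(joint - p_X_x * P_A) < 1e-12      # the defining condition holds for all x
```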

Independence of Random Variables

  • We say that two random variables $X$ and $Y$ are independent if
    $$p_{X,Y}(x,y)=p_X(x)\,p_Y(y)\quad \text{for all } x,y$$
    which is equivalent to the condition
    $$p_{X|Y}(x|y)=p_X(x)\quad \text{for all } y \text{ with } p_Y(y)>0 \text{ and all } x$$
  • $X$ and $Y$ are said to be conditionally independent, given a positive probability event $A$, if
    $$P(X=x, Y=y \mid A) = P(X=x \mid A)\,P(Y=y \mid A)\quad \text{for all } x \text{ and } y$$
    or, in this chapter's notation,
    $$p_{X,Y|A}(x, y) = p_{X|A}(x)\,p_{Y|A}(y)\quad \text{for all } x \text{ and } y$$
    Once more, this is equivalent to
    $$p_{X|Y,A}(x|y)=p_{X|A}(x)\quad \text{for all } x \text{ and all } y \text{ such that } p_{Y|A}(y)>0$$

  • If $X$ and $Y$ are independent random variables, then
    $$E[XY]=E[X]\,E[Y], \qquad E[g(X)h(Y)]=E[g(X)]\,E[h(Y)]$$

In fact, the second formula follows immediately once we realize that if $X$ and $Y$ are independent, then the same is true for $g(X)$ and $h(Y)$. Both identities are checked numerically in the sketch at the end of this section.

  • Consider now the sum $X + Y$ of two independent random variables $X$ and $Y$, and let us calculate its variance. Since the variance of a random variable is unchanged when the random variable is shifted by a constant, it is convenient to work with the zero-mean random variables $\tilde X = X - E[X]$ and $\tilde Y = Y - E[Y]$. We have
    $$\begin{aligned}var(X+Y)&=var(\tilde X+\tilde Y)\\&=E[(\tilde X+\tilde Y)^2]\\&=E[\tilde X^2]+2E[\tilde X\tilde Y]+E[\tilde Y^2]\\&=E[\tilde X^2]+E[\tilde Y^2]\\&=var(\tilde X)+var(\tilde Y)\\&=var(X)+var(Y)\end{aligned}$$
    where the cross term drops out because, by independence, $E[\tilde X\tilde Y]=E[\tilde X]\,E[\tilde Y]=0$. In conclusion, the variance of the sum of two independent random variables is equal to the sum of their variances.
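
The following sketch (my own, with made-up marginal PMFs) builds the joint PMF of two independent discrete random variables as the product of the marginals, as in the definition above, and numerically checks both $E[XY]=E[X]E[Y]$ and $var(X+Y)=var(X)+var(Y)$.

```python
# Hypothetical marginal PMFs for two independent random variables X and Y.
p_X = {0: 0.2, 1: 0.5, 2: 0.3}
p_Y = {-1: 0.4, 3: 0.6}

def mean(pmf):
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())

# Independence: the joint PMF is the product of the marginals.
joint = {(x, y): px * py for x, px in p_X.items() for y, py in p_Y.items()}

# E[XY] = E[X] E[Y]
E_XY = sum(x * y * p for (x, y), p in joint.items())
assert abs(E_XY - mean(p_X) * mean(p_Y)) < 1e-12

# var(X + Y) = var(X) + var(Y): build the PMF of the sum, then compare.
p_sum = {}
for (x, y), p in joint.items():
    p_sum[x + y] = p_sum.get(x + y, 0.0) + p
assert abs(variance(p_sum) - (variance(p_X) + variance(p_Y))) < 1e-12
```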

Independence of Several Random Variables

  • The preceding discussion extends naturally to the case of more than two random variables. For example, three random variables $X$, $Y$, and $Z$ are said to be independent if
    $$p_{X,Y,Z}(x,y,z)=p_X(x)\,p_Y(y)\,p_Z(z)\quad \text{for all } x,y,z$$
  • If $X$, $Y$, and $Z$ are independent random variables, then any three random variables of the form $f(X)$, $g(Y)$, and $h(Z)$ are also independent. Similarly, any two random variables of the form $g(X, Y)$ and $h(Z)$ are independent. On the other hand, two random variables of the form $g(X, Y)$ and $h(Y, Z)$ are usually not independent, because they are both affected by $Y$.
    • Properties such as the above are intuitively clear if we interpret independence in terms of noninteracting (sub)experiments. They can be formally verified, but this is sometimes tedious.

Variance of the Sum of Independent Random Variables

  • If $X_1, X_2, \ldots, X_n$ are independent random variables, then
    $$var(X_1 + X_2 + \cdots + X_n) = var(X_1) + var(X_2) + \cdots + var(X_n)$$

Example 2.20. Variance of the Binomial and the Poisson.

We consider $n$ independent coin tosses, with each toss having probability $p$ of coming up a head. For each $i$, we let $X_i$ be the Bernoulli random variable which is equal to 1 if the $i$th toss comes up a head, and is 0 otherwise.

  • Then $X = X_1 + X_2 + \cdots + X_n$ is a binomial random variable. Its mean is $E[X] = np$. By the independence of the coin tosses, the random variables $X_1, \ldots, X_n$ are independent, and
    $$var(X)=\sum_{i=1}^n var(X_i)=np(1-p)$$
  • If $Y$ is a Poisson random variable with parameter $\lambda$, then, using $E[Y]=\lambda$,
    $$\begin{aligned}E[Y^2]&=\sum_{k=1}^\infty k^2 e^{-\lambda}\frac{\lambda^k}{k!}\\&=\lambda\sum_{k=1}^\infty k\frac{e^{-\lambda}\lambda^{k-1}}{(k-1)!}\\&=\lambda\sum_{m=0}^\infty(m+1)\frac{e^{-\lambda}\lambda^m}{m!}\\&=\lambda(E[Y]+1)\\&=\lambda(\lambda+1)\end{aligned}$$
    from which
    $$var(Y)=E[Y^2]-(E[Y])^2=\lambda(\lambda+1)-\lambda^2=\lambda$$
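
A quick numerical sanity check of Example 2.20 (my own sketch, with arbitrary choices of $n$, $p$, and $\lambda$): compute the variance directly from the binomial PMF and from a truncated Poisson PMF, and compare with $np(1-p)$ and $\lambda$.

```python
import math

def pmf_variance(pmf):
    m = sum(k * p for k, p in pmf.items())
    return sum((k - m) ** 2 * p for k, p in pmf.items())

# Binomial(n, p): the variance should equal n * p * (1 - p).
n, p = 20, 0.3
binomial = {k: math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
assert abs(pmf_variance(binomial) - n * p * (1 - p)) < 1e-9

# Poisson(lam), truncated far into the tail: the variance should equal lam.
lam = 4.0
poisson, term = {}, math.exp(-lam)       # term starts at P(Y = 0) = e^{-lam}
for k in range(60):
    poisson[k] = term
    term *= lam / (k + 1)                # recursion P(Y = k+1) = P(Y = k) * lam / (k+1)
assert abs(pmf_variance(poisson) - lam) < 1e-9
```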

Example 2.21. Mean and Variance of the Sample Mean.

  • Given $n$ i.i.d. random variables $X_1, \ldots, X_n$, each with mean $E[X]$ and variance $var(X)$, the sample mean $S_n$ is defined as
    $$S_n=\frac{X_1+\cdots+X_n}{n}$$
    $$E[S_n]=\sum_{i=1}^n\frac{1}{n}E[X_i]=E[X]$$
    $$var(S_n)=\sum_{i=1}^n\frac{1}{n^2}var(X_i)=\frac{var(X)}{n}$$
  • The sample mean $S_n$ can be viewed as a "good" estimate of the true mean $E[X]$: it has the correct expected value, and its accuracy, as reflected by its variance, improves as the sample size $n$ increases.
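
A small simulation (my own sketch, with arbitrary parameters): draw many sample means of $n$ i.i.d. Bernoulli($p$) variables and check that the empirical variance of $S_n$ is close to $var(X)/n = p(1-p)/n$.

```python
import random

random.seed(0)
p, n, trials = 0.3, 100, 20000

# Each trial produces one realization of the sample mean S_n.
sample_means = [sum(random.random() < p for _ in range(n)) / n for _ in range(trials)]

m = sum(sample_means) / trials
empirical_var = sum((s - m) ** 2 for s in sample_means) / trials
print(m, empirical_var, p * (1 - p) / n)   # m ~ 0.3; the last two numbers should be close
```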

Problem 40.

A particular professor is known for his arbitrary grading policies. Each paper receives a grade from the set $\{A, A-, B+, B, B-, C+\}$ with equal probability, independent of other papers. How many papers do you expect to hand in before you receive each possible grade at least once?

SOLUTION

  • Associate a success with a paper that receives a grade that has not been received before. Let $X_i$ be the number of papers between the $i$th success and the $(i+1)$st success. Then we have $X = 1 + \sum_{i=1}^5 X_i$ and hence
    $$E[X] = 1 + \sum_{i=1}^5 E[X_i]$$
  • After the $i$th success, $i$ different grades have been received, so each subsequent paper has probability $(6-i)/6$ of receiving a grade that has not been received before. Therefore, the random variable $X_i$ is geometric with parameter $p_i = (6-i)/6$, and $E[X_i] = 6/(6-i)$. It follows that
    $$E[X] = 1 + \sum_{i=1}^5\frac{6}{6-i}= 1 + 6\sum_{i=1}^5\frac{1}{i}= 14.7$$
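
Checking the answer (my own sketch): the exact value $1 + 6\sum_{i=1}^5 1/i$ together with a quick Monte Carlo estimate of the same quantity.

```python
import random

exact = 1 + 6 * sum(1 / i for i in range(1, 6))   # = 14.7

def papers_until_all_grades(num_grades=6):
    """Number of papers handed in until every grade has appeared at least once."""
    seen, count = set(), 0
    while len(seen) < num_grades:
        seen.add(random.randrange(num_grades))
        count += 1
    return count

random.seed(1)
trials = 50000
estimate = sum(papers_until_all_grades() for _ in range(trials)) / trials
print(exact, estimate)   # both should be about 14.7
```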

Problem 41.
You drive to work 250 days in a full year, and with probability $p$ you get a traffic ticket on any given day, independent of other days. Let $X$ be the total number of tickets you get in the year. Suppose you don't know the probability $p$ of getting a ticket, but you got 5 tickets during the year, and you estimate $p$ by the sample mean
$$\hat p=\frac{5}{250}=0.02$$
What is the range of possible values of $p$, assuming that the difference between $p$ and the sample mean $\hat p$ is within 5 times the standard deviation of the sample mean?

SOLUTION

  • The variance of the sample mean is
    $$\frac{p(1-p)}{250}$$
    Then we have
    $$(p-0.02)^2\leq\frac{25\,p(1-p)}{250}$$
    which, after expanding, reads $1.1p^2 - 0.14p + 0.0004 \leq 0$, and therefore $p\in[0.0029,\,0.1243]$ (approximately).
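
A quick numerical check of that range (my own sketch): the admissible values of $p$ lie between the two roots of the quadratic above.

```python
import math

# (p - 0.02)^2 <= 25 p (1 - p) / 250   is equivalent to   1.1 p^2 - 0.14 p + 0.0004 <= 0
a, b, c = 1.1, -0.14, 0.0004
disc = math.sqrt(b * b - 4 * a * c)
lo, hi = (-b - disc) / (2 * a), (-b + disc) / (2 * a)
print(lo, hi)   # approximately 0.0029 and 0.1243
```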

Problem 45.
Let $X_1, \ldots, X_n$ be independent random variables and let $X = X_1 + \cdots + X_n$ be their sum.

  • (a) Suppose that each $X_i$ is Bernoulli with parameter $p_i$, and that $p_1, \ldots, p_n$ are chosen so that the mean of $X$ is a given $\mu > 0$. Show that the variance of $X$ is maximized if the $p_i$ are chosen to be all equal to $\mu/n$.
  • (b) Suppose that each $X_i$ is geometric with parameter $p_i$, and that $p_1, \ldots, p_n$ are chosen so that the mean of $X$ is a given $\mu > 0$. Show that the variance of $X$ is minimized if the $p_i$ are chosen to be all equal to $n/\mu$.

SOLUTION

  • (a) We have
    $$var(X) =\sum_{i=1}^n var(X_i) =\sum_{i=1}^n p_i(1-p_i)=\mu-\sum_{i=1}^n p_i^2$$
    Thus maximizing the variance is equivalent to minimizing $\sum_{i=1}^n p_i^2$. It can be seen that
    $$\sum_{i=1}^n p_i^2=\sum_{i=1}^n(\mu/n)^2+\sum_{i=1}^n(p_i-\mu/n)^2,$$
    since the cross term vanishes: $\sum_{i=1}^n(p_i-\mu/n)=\mu-\mu=0$. Hence $\sum_{i=1}^n p_i^2$ is minimized when $p_i=\mu/n$ for all $i$.
  • (b) We have
    $$\mu=\sum_{i=1}^n E[X_i] =\sum_{i=1}^n\frac{1}{p_i}$$
    and
    $$var(X) =\sum_{i=1}^n var(X_i) =\sum_{i=1}^n\frac{1-p_i}{p_i^2}$$
    Introducing the change of variables $y_i = 1/p_i = E[X_i]$, we see that the constraint becomes
    $$\sum_{i=1}^n y_i =\mu$$
    and that we must minimize
    $$\sum_{i=1}^n y_i(y_i - 1) = \sum_{i=1}^n y_i^2 - \mu,$$
    subject to that constraint. This is the same problem as in part (a), so by the same argument the minimum is attained when $y_i = \mu/n$, i.e., $p_i = n/\mu$, for all $i$.
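
A numerical illustration of part (a) (my own sketch, with made-up probabilities): for a fixed mean $\mu$, the Bernoulli variance $\sum_i p_i(1-p_i)$ is largest when all the $p_i$ equal $\mu/n$.

```python
n, mu = 4, 1.2

def variance_of_sum(ps):
    """Variance of a sum of independent Bernoulli(p_i) random variables."""
    return sum(p * (1 - p) for p in ps)

equal = [mu / n] * n               # p_i = mu / n for all i
skewed = [0.6, 0.3, 0.2, 0.1]      # a different choice with the same total mean mu
assert abs(sum(skewed) - mu) < 1e-12

print(variance_of_sum(equal), variance_of_sum(skewed))   # 0.84 > 0.70
```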

Problem 46. Entropy and uncertainty.

  • Consider a random variable $X$ that can take $n$ values, $x_1, \ldots, x_n$, with corresponding probabilities $p_1, \ldots, p_n$. The entropy of $X$ is defined to be
    $$H(X) = -\sum_{i=1}^n p_i \log p_i$$
    (All logarithms in this problem are with respect to base two.)
    • The entropy $H(X)$ provides a measure of the uncertainty about the value of $X$. To get a sense of this, note that $H(X)\geq 0$ and that $H(X)$ is very close to $0$ when $X$ is "nearly deterministic," i.e., takes one of its possible values with probability very close to $1$ (since $p\log p\approx 0$ if either $p\approx 0$ or $p\approx 1$).
    • The notion of entropy is fundamental in information theory. For example, it can be shown that $H(X)$ is a lower bound on the average number of yes-no questions (such as "is $X = x_1$?" or "is $X < x_5$?") that must be asked in order to determine the value of $X$. Furthermore, if $k$ is the average number of questions required to determine the value of a string of independent identically distributed random variables $X_1, X_2, \ldots, X_n$, then, with a suitable strategy, $k/n$ can be made as close to $H(X)$ as desired, when $n$ is large.
  • (a) Show that if $q_1, \ldots, q_n$ are nonnegative numbers such that $\sum_{i=1}^n q_i=1$, then
    $$H(X)\leq-\sum_{i=1}^n p_i \log q_i,$$
    with equality if and only if $p_i = q_i$ for all $i$. As a special case, show that $H(X)\leq \log n$, with equality if and only if $p_i = 1/n$ for all $i$.
    [Hint: Use the inequality $\ln\alpha\leq\alpha-1$, for $\alpha>0$, which holds with equality if and only if $\alpha=1$.]
  • (b) Let $X$ and $Y$ be random variables taking a finite number of values, and having joint PMF $p_{X,Y}(x,y)$. Define
    $$I(X, Y) =\sum_x\sum_y p_{X,Y}(x, y)\log\left(\frac{p_{X,Y}(x,y)}{p_X(x)\,p_Y(y)}\right)$$
    Show that $I(X, Y) \geq 0$, and that $I(X, Y) = 0$ if and only if $X$ and $Y$ are independent.
  • (c) Show that
    $$I(X, Y) = H(X) + H(Y) - H(X, Y),$$
    where
    $$H(X, Y) = -\sum_x\sum_y p_{X,Y}(x,y)\log p_{X,Y}(x, y), \qquad H(X) = -\sum_x p_X(x)\log p_X(x), \qquad H(Y) = -\sum_y p_Y(y)\log p_Y(y)$$
  • (d) Show that
    $$I(X, Y) = H(X) - H(X \mid Y),$$
    where
    $$H(X \mid Y) = -\sum_y p_Y(y)\sum_x p_{X|Y}(x \mid y)\log p_{X|Y}(x \mid y)$$
    [Note that $H(X \mid Y)$ may be viewed as the conditional entropy of $X$ given $Y$, that is, the entropy of the conditional distribution of $X$, given that $Y = y$, averaged over all possible values $y$. Thus, the quantity $I(X, Y) = H(X) - H(X \mid Y)$ is the reduction in the entropy (uncertainty) of $X$ when $Y$ becomes known. It can therefore be interpreted as the information about $X$ that is conveyed by $Y$, and is called the mutual information of $X$ and $Y$.]
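
A small sketch (my own, using base-2 logarithms as in the problem) that computes $H(X)$, $H(Y)$, $H(X,Y)$, and $I(X,Y)$ for a toy joint PMF, and checks the statements of parts (a) to (c) numerically.

```python
from math import log2

# Hypothetical joint PMF of (X, Y); the marginals are obtained by summing it out.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

p_X, p_Y = {}, {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, 0.0) + p
    p_Y[y] = p_Y.get(y, 0.0) + p

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

# Mutual information I(X, Y), directly from its definition in part (b).
I = sum(p * log2(p / (p_X[x] * p_Y[y])) for (x, y), p in joint.items() if p > 0)

assert I >= 0                                                            # part (b)
assert abs(I - (entropy(p_X) + entropy(p_Y) - entropy(joint))) < 1e-12   # part (c)
assert entropy(p_X) <= log2(len(p_X)) + 1e-12                            # special case of part (a)
print(entropy(p_X), entropy(p_Y), entropy(joint), I)
```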

The problems are not difficult, but the conclusions they lead to are quite interesting.
