2.1.1.
A = at least one child is a boy.

$$|A|=3,\qquad P=\frac{2}{3}$$
2.1.2.
First child is a boy.

$$P=\frac{1}{2}$$
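Both answers are easy to sanity-check with a Monte Carlo sketch (illustrative only; child sexes are modeled as independent fair coin flips, and the seed and sample size are arbitrary):

```python
import random

random.seed(0)
N = 100_000
families = [(random.choice("BG"), random.choice("BG")) for _ in range(N)]

# 2.1.1: condition on "at least one child is a boy"
at_least_one_boy = [f for f in families if "B" in f]
p_other_girl = sum("G" in f for f in at_least_one_boy) / len(at_least_one_boy)

# 2.1.2: condition on "the first child is a boy"
first_boy = [f for f in families if f[0] == "B"]
p_second_girl = sum(f[1] == "G" for f in first_boy) / len(first_boy)

print(p_other_girl)   # ≈ 2/3
print(p_second_girl)  # ≈ 1/2
```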
2.3:

$$\begin{aligned}
\mathrm{var}[x+y]&=E[(x+y)^2]-E^2[x+y]\\
&=E[x^2]+E[y^2]+2E[xy]-(E[x]+E[y])^2\\
&=\mathrm{var}[x]+\mathrm{var}[y]+2(E[xy]-E[x]E[y])\\
&=\mathrm{var}[x]+\mathrm{var}[y]+2\,\mathrm{cov}(x,y)
\end{aligned}$$
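The identity also holds exactly for sample moments (same normalization throughout), so it can be checked numerically. A minimal sketch with a made-up correlated pair (seed and sample size arbitrary):

```python
import random

random.seed(1)
N = 100_000
# made-up correlated pair: y = x + independent noise, so cov(x, y) > 0
xs = [random.gauss(0, 1) for _ in range(N)]
ys = [x + random.gauss(0, 1) for x in xs]

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(abs(lhs - rhs) < 1e-6)  # True: identity is exact up to float rounding
```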
2.4

$$P(ill\mid positive)=0.99,\qquad P(ill)=10^{-4}$$

The answer is 0.99.
Text example:

$$p(positive\mid ill)=0.8,\quad p(ill)=0.004,\quad p(positive\mid \neg ill)=0.1$$

$$p(positive)=p(positive\mid ill)\,p(ill)+p(positive\mid \neg ill)\,p(\neg ill)=0.8\times 0.004+0.1\times(1-0.004)=0.1028$$

$$p(ill\mid positive)=\frac{p(positive\mid ill)\,p(ill)}{p(positive)}=\frac{0.8\times 0.004}{0.1028}\approx 0.031$$
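The arithmetic of the worked example takes a few lines to verify (the numbers are the ones given above):

```python
# numbers from the worked example
p_pos_given_ill = 0.8      # sensitivity
p_ill = 0.004              # prevalence
p_pos_given_healthy = 0.1  # false-positive rate

# total probability of a positive test
p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * (1 - p_ill)
# Bayes' rule
p_ill_given_pos = p_pos_given_ill * p_ill / p_pos

print(round(p_pos, 4))            # 0.1028
print(round(p_ill_given_pos, 3))  # 0.031
```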
2.5
A = prize behind first picked door
B = prize behind final picked door (always switching)

$$P(A)=1/3,\qquad P(\neg A)=2/3$$

$$P(B)=P(B\mid A)P(A)+P(B\mid \neg A)P(\neg A)=0\cdot\frac{1}{3}+1\cdot\frac{2}{3}=\frac{2}{3}$$
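The 2/3 win rate for switching matches a direct simulation (a sketch assuming the standard Monty Hall rules: the host always opens a non-picked, non-prize door):

```python
import random

random.seed(0)
N = 100_000
switch_wins = 0
for _ in range(N):
    prize = random.randrange(3)
    pick = random.randrange(3)
    # the host opens a door that is neither the pick nor the prize
    opened = next(d for d in range(3) if d != pick and d != prize)
    # switching means taking the remaining closed door
    final = next(d for d in range(3) if d != pick and d != opened)
    switch_wins += final == prize

print(switch_wins / N)  # ≈ 2/3
```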
2.6
1.

$$P(H\mid e_1,e_2)=\frac{P(e_1,e_2\mid H)P(H)}{P(e_1,e_2)}$$

The answer is (ii).
2. With conditional independence,

$$P(e_1,e_2\mid H)=P(e_1\mid H)P(e_2\mid H)$$

so (i) and (ii) are sufficient. Since

$$P(e_1,e_2)=\sum_H P(H)P(e_1\mid H)P(e_2\mid H)$$

(iii) is also sufficient.
2.7
Example from Wikipedia: let $x$ and $y$ be independent fair bits, $x\sim \mathrm{Bernoulli}(1/2)$, $y\sim \mathrm{Bernoulli}(1/2)$, and $z=x\ \mathrm{xor}\ y$. Then $x,y,z$ are pairwise independent but not mutually independent.
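The claim can be verified by enumerating the four equally likely outcomes (a small sketch; indices 0, 1, 2 stand for $x$, $y$, $z$):

```python
from itertools import product

# the four equally likely outcomes of two fair bits x, y, with z = x xor y
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def p(pred):
    return sum(pred(o) for o in outcomes) / len(outcomes)

# pairwise independence: P(a=1, b=1) = P(a=1) P(b=1) for every pair
for i, j in [(0, 1), (0, 2), (1, 2)]:
    joint = p(lambda o: o[i] == 1 and o[j] == 1)
    assert joint == p(lambda o: o[i] == 1) * p(lambda o: o[j] == 1)

# but not mutually independent: P(x=1, y=1, z=1) is 0, not 1/8
triple = p(lambda o: o == (1, 1, 1))
print(triple)  # 0.0
```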
2.8

$$x\perp y\mid z \iff p(x,y\mid z)=g(x,z)h(y,z)$$

"$\Rightarrow$" is trivial: take $g(x,z)=p(x\mid z)$ and $h(y,z)=p(y\mid z)$.
Conversely, suppose $p(x,y\mid z)=g(x,z)h(y,z)$. Then

$$\begin{aligned}
p(x\mid z)&=\sum_y p(x,y\mid z)=g(x,z)\sum_y h(y,z)\\
p(y\mid z)&=\sum_x p(x,y\mid z)=h(y,z)\sum_x g(x,z)\\
1&=\sum_{x,y}p(x,y\mid z)=\sum_x g(x,z)\sum_y h(y,z)
\end{aligned}$$

Then

$$p(x\mid z)p(y\mid z)=g(x,z)h(y,z)\sum_x g(x,z)\sum_y h(y,z)=g(x,z)h(y,z)=p(x,y\mid z)$$
2.9
(i) true
(ii) false
2.10
With $y=1/x$ and $x\sim \mathrm{Ga}(a,b)$:

$$\begin{aligned}
p(y)&=p(x)\left|\frac{dx}{dy}\right|,\qquad \frac{dx}{dy}=-\frac{1}{y^2}\\
p(y)&=\frac{b^a}{\Gamma(a)}\left(\frac{1}{y}\right)^{a-1}e^{-b/y}\,\frac{1}{y^2}
=\frac{b^a}{\Gamma(a)}y^{-(a+1)}e^{-b/y}=\mathrm{IG}(y\mid a,b)
\end{aligned}$$
2.11
Switch to polar coordinates and integrate over $\theta$ first:

$$\begin{aligned}
Z^2&=\int_0^{2\pi}d\theta\int_0^{\infty}r\exp\left(-\frac{r^2}{2\sigma^2}\right)dr\\
&=2\pi\int_0^{\infty}r\exp\left(-\frac{r^2}{2\sigma^2}\right)dr\\
&=2\pi\left[-\sigma^2\exp\left(-\frac{r^2}{2\sigma^2}\right)\right]_0^{\infty}\\
&=2\pi\sigma^2
\end{aligned}$$

So $Z^2=2\pi\sigma^2$, then $Z=\sigma\sqrt{2\pi}$.
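The normalization constant can be checked by direct numerical integration (a rough Riemann-sum sketch; the value of $\sigma$ and the grid are arbitrary):

```python
from math import exp, pi, sqrt

# Riemann-sum check of the Gaussian normalization constant Z = sigma*sqrt(2*pi)
sigma = 1.5
dx = 0.001
Z = sum(exp(-(i * dx) ** 2 / (2 * sigma ** 2)) * dx for i in range(-20000, 20000))
print(abs(Z - sigma * sqrt(2 * pi)) < 1e-3)  # True
```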
2.12

$$\begin{aligned}
I(X,Y)&=\sum_{x,y}p(x,y)\log\frac{p(x,y)}{p(x)p(y)}\\
&=\sum_{x,y}p(x,y)\log\frac{p(x\mid y)}{p(x)}\\
&=\sum_{x,y}p(x,y)\left(\log p(x\mid y)-\log p(x)\right)\\
&=-H(x\mid y)-\sum_x \log p(x)\Big(\sum_y p(x,y)\Big)\\
&=-H(x\mid y)+H(x)
\end{aligned}$$
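The identity $I(X,Y)=H(x)-H(x\mid y)$ can be checked on any small discrete joint distribution (the 2×2 table below is made up purely for illustration):

```python
from math import log2

# a made-up 2x2 joint distribution over x, y
p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# marginals
px = {x: sum(v for (a, b), v in p.items() if a == x) for x in (0, 1)}
py = {y: sum(v for (a, b), v in p.items() if b == y) for y in (0, 1)}

# mutual information from the definition
I = sum(v * log2(v / (px[a] * py[b])) for (a, b), v in p.items())

# H(x) and H(x|y) = -sum p(x,y) log p(x|y)
Hx = -sum(v * log2(v) for v in px.values())
Hx_given_y = -sum(v * log2(v / py[b]) for (a, b), v in p.items())

print(abs(I - (Hx - Hx_given_y)) < 1e-9)  # True
```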
2.13

$$\begin{aligned}
I(X,Y)&=H(x)-H(x\mid y)\\
&=H(x)+H(y)-H(x,y)\\
&=\log 2\pi e\sigma^2-\frac{1}{2}\log\left((2\pi e)^2\sigma^4(1-\rho^2)\right)\\
&=-\frac{1}{2}\log(1-\rho^2)
\end{aligned}$$

For $\rho=0$:

$$I(x,y)=\log 2\pi e\sigma^2-\frac{1}{2}\log (2\pi e)^2\sigma^4=0$$

When $\mathrm{Cov}(x,y)=0$ the mutual information vanishes: knowing $x$ gives no information about $y$, and vice versa.

For $\rho=\pm 1$:

$$I(x,y)=\infty$$

All information conveyed by $x$ is shared with $y$: knowing $x$ determines the value of $y$, and vice versa.
2.14
(i) Obvious.
(ii) It is easy to prove the non-negativity of entropy, $H(x)\geq 0$. And $I(x,y)\geq 0$ is obvious from its formula, so

$$I(x,y)\geq 0\rightarrow r\geq 0$$

(iii) $I(x,y)=0$: $x$ and $y$ are independent.
(iv) $I(x,y)=1$: $x$ is fully determined by $y$.
2.15
θ
=
arg min
θ
K
L
(
P
e
m
p
∣
∣
q
(
θ
)
)
=
arg min
θ
E
(
P
e
m
p
log
P
e
m
p
q
(
θ
)
)
=
arg min
θ
E
(
P
e
m
p
(
log
P
e
m
p
−
log
q
(
θ
)
)
)
=
H
e
m
p
−
arg max
θ
E
(
P
e
m
p
log
q
(
θ
)
)
=
arg max
θ
E
(
P
e
m
p
log
q
(
θ
)
)
=
arg max
θ
∑
x
∈
D
a
t
a
s
e
t
log
q
(
x
;
θ
)
\begin{align} \theta=&\argmin_\theta{KL(P_{emp}||q(\theta))}\\ =&\argmin_\theta{E(P_{emp}\log \frac{P_{emp}}{q(\theta)})}\\ =&\argmin_\theta{E(P_{emp}(\log{P_{emp}}-\log{q(\theta)}) )}\\ =&H_{emp}-\argmax_\theta{E(P_{emp}\log{q(\theta)} )}\\ =&\argmax_\theta{E(P_{emp}\log{q(\theta)} )}\\ =&\argmax_\theta{\sum_{x\in Dataset}\log{q(x;\theta)}} \end{align}
θ======θargminKL(Pemp∣∣q(θ))θargminE(Pemplogq(θ)Pemp)θargminE(Pemp(logPemp−logq(θ)))Hemp−θargmaxE(Pemplogq(θ))θargmaxE(Pemplogq(θ))θargmaxx∈Dataset∑logq(x;θ)
2.16
pdf of the beta distribution:

$$\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$$

mode:

$$\frac{d}{dx}\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}=0
\quad\Rightarrow\quad
x=\frac{\alpha-1}{\alpha+\beta-2}$$

moments:

$$E(x^N)=\frac{1}{B(\alpha,\beta)}\int_0^1 x^{\alpha+N-1}(1-x)^{\beta-1}dx=\frac{B(\alpha+N,\beta)}{B(\alpha,\beta)}$$

mean:

$$E(x)=\frac{B(\alpha+1,\beta)}{B(\alpha,\beta)}=\frac{\alpha}{\alpha+\beta}$$

var:

$$E(x^2)-E^2(x)=\frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}-\frac{\alpha^2}{(\alpha+\beta)^2}=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
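The mean and variance formulas can be verified numerically via $B(\alpha,\beta)=\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ (the values $\alpha=3$, $\beta=5$ are arbitrary):

```python
from math import gamma

def B(a, b):
    # Beta function via gamma functions
    return gamma(a) * gamma(b) / gamma(a + b)

a, b = 3.0, 5.0
mean = B(a + 1, b) / B(a, b)    # E(x)   = B(a+1, b) / B(a, b)
second = B(a + 2, b) / B(a, b)  # E(x^2) = B(a+2, b) / B(a, b)
var = second - mean ** 2

print(abs(mean - a / (a + b)) < 1e-9)                          # True
print(abs(var - a * b / ((a + b) ** 2 * (a + b + 1))) < 1e-9)  # True
```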
2.17
The leftmost point's coordinate is $f(x,y)=\min(x,y)$.

$$\begin{aligned}
p(f(x,y)=m)&=p(x=m,y\geq m)+p(x\geq m,y=m)=2(1-m)\\
E(m)&=\int_0^1 2m(1-m)\,dm\\
&=\int_0^1 (2m-2m^2)\,dm\\
&=\left.m^2-\frac{2}{3}m^3\right|_0^1\\
&=\frac{1}{3}
\end{aligned}$$
The problem can also be solved in 3-d coordinates: the solid is a cone with height 1 and base area 1, whose volume is 1/3.
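A quick simulation agrees with $E(m)=1/3$ (a sketch only; seed and sample size arbitrary):

```python
import random

random.seed(0)
N = 200_000
# minimum of two independent U(0,1) draws
m = [min(random.random(), random.random()) for _ in range(N)]
print(sum(m) / len(m))  # ≈ 1/3
```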