Conjugate Distributions
By Bayes' theorem, we have
$$p(\theta|x)\propto p(x|\theta)\,p(\theta)$$
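The proportionality hides the normalizing constant $p(x)=\int p(x|\theta)p(\theta)\,d\theta$. As a minimal sketch, the rule can be applied numerically by multiplying likelihood and prior on a grid and then normalizing. The uniform prior and the data (7 successes in 10 Bernoulli trials) are illustrative assumptions, not taken from the text.

```python
import math


def grid_posterior(prior, likelihood, grid):
    """Approximate p(theta|x) on a uniform grid: multiply prior by likelihood, then normalize."""
    unnorm = [likelihood(t) * prior(t) for t in grid]
    step = grid[1] - grid[0]
    evidence = sum(unnorm) * step  # Riemann-sum estimate of p(x) = ∫ p(x|θ) p(θ) dθ
    return [u / evidence for u in unnorm]


def uniform_prior(t):
    # Illustrative assumption: flat prior on [0, 1]
    return 1.0


def likelihood(t):
    # Illustrative assumption: 7 successes in 10 Bernoulli trials
    return math.comb(10, 7) * t**7 * (1 - t) ** 3


N = 1000
grid = [(i + 0.5) / N for i in range(N)]  # midpoints of [0, 1]
post = grid_posterior(uniform_prior, likelihood, grid)
mode = grid[post.index(max(post))]
print(mode)  # close to 0.7 for this data
```

Note that the constant `math.comb(10, 7)` cancels during normalization, which is exactly why Bayes' rule can be written up to proportionality.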
If the prior distribution $p(\theta)$ of $\Theta$ and the posterior distribution $p(\theta|x)$ belong to the same family of distributions, then $p(\theta)$ and $p(\theta|x)$ are called conjugate distributions, and $p(\theta)$ is called a conjugate prior for the likelihood function $p(x|\theta)$.
The Beta distribution is the conjugate prior of the binomial distribution
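In $n$ independent repeated trials, each trial has only two outcomes: the event occurs or it does not, and it occurs with probability $p$. The number of occurrences $X$ in the $n$ trials follows the binomial distribution $X\sim B(n,p)$: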
$$P(X=k)=C_n^k p^k(1-p)^{n-k}$$
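A direct transcription of the binomial pmf, as a small sanity check that the probabilities over $k=0,\dots,n$ sum to 1 and that the mean is $np$. The values $n=10$, $p=0.3$ are arbitrary illustrations.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # illustrative parameters
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(sum(probs))  # ≈ 1.0 (up to floating-point rounding)
print(sum(k * q for k, q in enumerate(probs)))  # ≈ n * p = 3.0
```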
The Beta distribution $X\sim Be(\alpha,\beta)$:
$$f(x) = \frac{1}{B(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1},\quad x\in[0,1],\ \alpha,\beta>0$$
$$\frac{1}{B(\alpha,\beta)} =\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)},\quad \Gamma(z)=\int_0^\infty t^{z-1}e^{-t}\,dt$$
$$\Gamma(z+1)=z\Gamma(z),\quad \Gamma(1)=1$$
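The definitions above can be sketched in Python with the standard library's `math.lgamma` and `math.gamma`. Computing $B(\alpha,\beta)$ in log space is a design choice (an assumption, not from the text) that avoids overflow of $\Gamma$ for large arguments. The snippet also checks the recurrence $\Gamma(z+1)=z\Gamma(z)$ at an arbitrary point.

```python
import math


def beta_fn(a: float, b: float) -> float:
    """B(a, b) = Γ(a)Γ(b) / Γ(a + b), computed in log space to avoid overflow."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Be(a, b) at x in the open interval (0, 1)."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)


# Gamma recurrence Γ(z+1) = zΓ(z) and Γ(1) = 1
z = 3.7
assert math.isclose(math.gamma(z + 1), z * math.gamma(z))
assert math.gamma(1) == 1.0

print(beta_fn(2, 3))  # Γ(2)Γ(3)/Γ(5) = 2/24
```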
Expectation of the Beta distribution:
$$\begin{aligned} E[X]&=\int_0^1 x \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1} dx\\ &=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^1 x^\alpha (1-x)^{\beta-1} dx\\ &=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\frac{\Gamma(\alpha+1)\Gamma(\beta)}{\Gamma(\alpha+\beta+1)}\\ &=\frac{\Gamma(\alpha+1)}{\Gamma(\alpha)}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+1)}\\ &=\frac{\alpha}{\alpha+\beta} \end{aligned}$$

The third line uses $\int_0^1 x^{\alpha}(1-x)^{\beta-1}dx=B(\alpha+1,\beta)$, and the last line uses $\Gamma(z+1)=z\Gamma(z)$.
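As a numerical cross-check of $E[X]=\alpha/(\alpha+\beta)$, a minimal sketch that integrates $x f(x)$ with the midpoint rule. The parameter pairs and the number of subintervals are illustrative assumptions; the midpoint rule is used because it never evaluates the density at the endpoints $0$ and $1$.

```python
import math


def beta_mean_numeric(a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule estimate of E[X] = ∫_0^1 x f(x) dx for X ~ Be(a, b)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)  # log(1/B(a, b))
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        log_density = log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x)
        total += x * math.exp(log_density)
    return total * h


for a, b in [(2.0, 3.0), (5.0, 1.5), (1.0, 1.0)]:
    print(a, b, beta_mean_numeric(a, b), a / (a + b))
```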
Assume the prior $\Theta\sim Be(\alpha,\beta)$:
$$p(\theta)=\frac{1}{B(\alpha,\beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1}$$
The likelihood $X|\Theta\sim B(n,\theta)$:
$$p(X=k|\Theta=\theta)= C_n^k \theta^k(1-\theta)^{n-k}$$
Then the posterior is $\Theta|X=k\sim Be(\alpha+k,\beta+n-k)$:
$$\begin{aligned} p(X=k|\Theta=\theta)p(\theta)&=C_n^k \theta^k(1-\theta)^{n-k} \frac{1}{B(\alpha,\beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1}\\ &=C_n^k \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}\\ &=C\,\theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1} \end{aligned}$$

where $C=C_n^k\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$ does not depend on $\theta$.
$$\begin{aligned} p(X=k)&=\int_0^1 p(X=k|\Theta=\theta)p(\theta)\,d\theta\\ &=C\int_0^1\theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}d\theta\\ &=C\, B(\alpha+k,\beta+n-k) \end{aligned}$$
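The marginal $p(X=k)=C\,B(\alpha+k,\beta+n-k)$, with $C=C_n^k/B(\alpha,\beta)$, is the beta-binomial distribution. A minimal Python check (the values $n=12$, $\alpha=2$, $\beta=5$ are arbitrary illustrations) that it is a proper distribution over $k=0,\dots,n$. With the uniform prior $Be(1,1)$ it reduces to $1/(n+1)$ for every $k$.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def marginal_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """p(X = k) = C * B(alpha + k, beta + n - k) with C = C_n^k / B(alpha, beta)."""
    return math.comb(n, k) * math.exp(log_beta(alpha + k, beta + n - k) - log_beta(alpha, beta))


n, alpha, beta = 12, 2.0, 5.0  # illustrative parameters
pmf = [marginal_pmf(k, n, alpha, beta) for k in range(n + 1)]
print(sum(pmf))  # ≈ 1.0: the marginal sums to one over k = 0..n
```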
$$p(\theta|X=k)=\frac{p(X=k|\Theta=\theta)p(\theta)}{p(X=k)}= \frac{1}{B(\alpha+k,\beta+n-k)}\theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}$$

This is exactly the density of $Be(\alpha+k,\beta+n-k)$.
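The conjugate update therefore reduces to parameter arithmetic: add the $k$ successes to $\alpha$ and the $n-k$ failures to $\beta$. A minimal sketch that compares this closed form with a brute-force grid normalization of likelihood × prior. The function name `beta_binomial_update` and the values $\alpha=2$, $\beta=3$, $n=20$, $k=14$ are illustrative assumptions.

```python
import math


def beta_binomial_update(alpha: float, beta: float, n: int, k: int) -> tuple[float, float]:
    """Posterior parameters of Theta | X = k for Theta ~ Be(alpha, beta), X | Theta ~ B(n, Theta)."""
    return alpha + k, beta + n - k


def beta_pdf(x: float, a: float, b: float) -> float:
    """Be(a, b) density, evaluated in log space."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_b)


alpha, beta, n, k = 2.0, 3.0, 20, 14  # illustrative values
a_post, b_post = beta_binomial_update(alpha, beta, n, k)

# Brute force: normalize likelihood * prior on a midpoint grid, compare with Be(a_post, b_post).
N = 2000
grid = [(i + 0.5) / N for i in range(N)]
unnorm = [math.comb(n, k) * t**k * (1 - t) ** (n - k) * beta_pdf(t, alpha, beta) for t in grid]
evidence = sum(unnorm) / N
max_err = max(abs(u / evidence - beta_pdf(t, a_post, b_post)) for u, t in zip(unnorm, grid))

print((a_post, b_post))  # (16.0, 9.0)
print(max_err)           # small discretization error
print("posterior mean:", a_post / (a_post + b_post))
```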
The Dirichlet distribution is the conjugate prior of the multinomial distribution
In $n$ independent repeated trials, each trial has one of $k$ outcomes $A_1,...,A_k$, occurring with probabilities $p_1,...,p_k$ respectively. The numbers of occurrences $X_1,...,X_k$ of each outcome over the $n$ trials follow the multinomial distribution $X\sim multi(X_1,...,X_k;p_1,...,p_k)$:
$$P(X_1=n_1,...,X_k=n_k)=\frac{n!}{n_1!...n_k!}\prod_{i=1}^kp_i^{n_i},\quad \sum_{i=1}^k n_i=n$$
$$\sum_{i=1}^kp_i=1,\quad p_i>0$$
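A direct transcription of the multinomial pmf. The multinomial coefficient is computed with exact integer division, which is valid because each partial quotient $n!/(n_1!\cdots n_j!)$ is an integer. Summing over every count vector with $\sum_i n_i=n$ should give 1; the values $n=4$ and $p=(0.2,0.3,0.5)$ are illustrative.

```python
import math
from itertools import product


def multinomial_pmf(counts, probs):
    """P(X_1 = n_1, ..., X_k = n_k) = n! / (n_1! ... n_k!) * prod p_i^{n_i}."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # exact: every partial quotient is an integer
    return coef * math.prod(p**c for p, c in zip(probs, counts))


n, probs = 4, [0.2, 0.3, 0.5]  # illustrative parameters
total = sum(
    multinomial_pmf(c, probs)
    for c in product(range(n + 1), repeat=len(probs))
    if sum(c) == n
)
print(total)  # ≈ 1.0
```

With $k=2$ the formula reduces to the binomial pmf, e.g. `multinomial_pmf([2, 2], [0.5, 0.5])` equals $C_4^2\cdot 0.5^4$.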
The Dirichlet distribution $X\sim Dir(X_1,...,X_k;\alpha_1,...,\alpha_k)$:
$$f(x_1,...,x_k)=\frac{1}{B(\alpha_1,...,\alpha_k)}\prod_{i=1}^kx_i^{\alpha_i-1}$$
$$B(\alpha_1,...,\alpha_k)=\frac{\prod_{i=1}^k\Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^k\alpha_i)},\quad \sum_{i=1}^k x_i=1,\quad \alpha_i>0\ \forall i$$
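A minimal sketch of the multivariate Beta function and the Dirichlet density, again in log space. The input check on the simplex is an added assumption for safety. Two easy consistency checks: for $k=2$ the density coincides with the Beta density (for example $Dir(2,5)$ at $(0.3,0.7)$ equals $30\cdot 0.3\cdot 0.7^4=2.1609$), and $Dir(1,1,1)$ is uniform on the simplex with constant density $\Gamma(3)=2$.

```python
import math


def log_multivariate_beta(alphas):
    """log B(alpha_1, ..., alpha_k) = sum_i log Γ(alpha_i) - log Γ(sum_i alpha_i)."""
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))


def dirichlet_pdf(x, alphas):
    """Dirichlet density at a point x in the interior of the simplex (x_i > 0, sum x_i = 1)."""
    if any(xi <= 0 for xi in x) or not math.isclose(sum(x), 1.0):
        raise ValueError("x must lie in the interior of the probability simplex")
    log_kernel = sum((a - 1) * math.log(xi) for xi, a in zip(x, alphas))
    return math.exp(log_kernel - log_multivariate_beta(alphas))


print(dirichlet_pdf([0.3, 0.7], [2.0, 5.0]))            # same as the Be(2, 5) density at 0.3
print(dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))  # uniform on the simplex: 2.0
```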
Expectation of the Dirichlet distribution:
$$\begin{aligned} E[X_j]&=\int x_j \frac{1}{B(\alpha_1,...,\alpha_k)}\prod_{i=1}^kx_i^{\alpha_i-1}\, dx_1...dx_k\\ &=\frac{1}{B(\alpha_1,...,\alpha_k)}\int x_j^{\alpha_j}\prod_{i\neq j}x_i^{\alpha_i-1}\,dx_1...dx_k\\ &=\frac{B(\alpha_1,...,\alpha_j+1,...,\alpha_k)}{B(\alpha_1,...,\alpha_j,...,\alpha_k)}\\ &=\frac{\Gamma(\alpha_j+1)}{\Gamma(\alpha_j)}\frac{\Gamma(\sum_{i=1}^k\alpha_i)}{\Gamma(\sum_{i=1}^k\alpha_i+1)}\\ &=\frac{\alpha_j}{\sum_{i=1}^k\alpha_i} \end{aligned}$$

The integral runs over the simplex $\sum_i x_i=1$, so it does not factor into one-dimensional integrals; instead, the integrand in the second line is the unnormalized Dirichlet density with $\alpha_j$ replaced by $\alpha_j+1$, whose integral is $B(\alpha_1,...,\alpha_j+1,...,\alpha_k)$.
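A Monte Carlo check of $E[X_j]=\alpha_j/\sum_i\alpha_i$. The sampler uses the standard construction of a Dirichlet draw by normalizing independent $\text{Gamma}(\alpha_i,1)$ variates (`random.gammavariate(shape, scale)` from the standard library); the parameters, seed, and number of draws are illustrative assumptions.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_i, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [gi / s for gi in g]


rng = random.Random(0)      # fixed seed for reproducibility
alphas = [2.0, 3.0, 5.0]    # illustrative parameters
draws = 50_000
sums = [0.0] * len(alphas)
for _ in range(draws):
    for j, xj in enumerate(sample_dirichlet(alphas, rng)):
        sums[j] += xj

empirical = [s / draws for s in sums]
exact = [a / sum(alphas) for a in alphas]  # alpha_j / sum(alpha) = 0.2, 0.3, 0.5
print(empirical, exact)
```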
Assume the prior $\Theta_1,...,\Theta_k\sim Dir(\alpha_1,...,\alpha_k)$:
$$p(\theta_1,...,\theta_k)=\frac{1}{B(\alpha_1,...,\alpha_k)}\prod_{i=1}^k\theta_i^{\alpha_i-1}$$
The likelihood $X_1,...,X_k|\Theta_1,...,\Theta_k\sim multi(\theta_1,...,\theta_k)$:
$$p(n_1,...,n_k|\theta_1,...,\theta_k)=\frac{n!}{n_1!...n_k!}\prod_{i=1}^k\theta_i^{n_i}$$
Then the posterior is $\Theta_1,...,\Theta_k|X_1=n_1,...,X_k=n_k\sim Dir(\alpha_1+n_1,...,\alpha_k+n_k)$:
$$\begin{aligned} p(n_1,...,n_k|\theta_1,...,\theta_k)p(\theta_1,...,\theta_k)&=\frac{n!}{n_1!...n_k!}\prod_{i=1}^k\theta_i^{n_i}\frac{\Gamma(\sum_{i=1}^k\alpha_i)}{\prod_{i=1}^k\Gamma(\alpha_i)}\prod_{i=1}^k\theta_i^{\alpha_i-1}\\ &=\frac{n!}{n_1!...n_k!}\frac{\Gamma(\sum_{i=1}^k\alpha_i)}{\prod_{i=1}^k\Gamma(\alpha_i)}\prod_{i=1}^k\theta_i^{\alpha_i+n_i-1}\\ &=C\prod_{i=1}^k\theta_i^{\alpha_i+n_i-1} \end{aligned}$$

where $C=\frac{n!}{n_1!...n_k!}\frac{\Gamma(\sum_{i=1}^k\alpha_i)}{\prod_{i=1}^k\Gamma(\alpha_i)}$ does not depend on $\theta$.
$$\begin{aligned} p(n_1,...,n_k)&=\int p(n_1,...,n_k|\theta_1,...,\theta_k)p(\theta_1,...,\theta_k)\,d\theta\\ &=C\int\prod_{i=1}^k\theta_i^{\alpha_i+n_i-1}d\theta\\ &=C\, B(\alpha_1+n_1,...,\alpha_k+n_k) \end{aligned}$$
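The marginal $p(n_1,...,n_k)=C\,B(\alpha_1+n_1,...,\alpha_k+n_k)$ is the Dirichlet-multinomial distribution. A minimal check (parameters are illustrative) that it sums to 1 over all count vectors with $\sum_i n_i=n$. With all $\alpha_i=1$ it is uniform over those count vectors: for $n=5$, $k=3$ there are $C_7^2=21$ of them, each with probability $1/21$.

```python
import math
from itertools import product


def log_mbeta(alphas):
    """log of the multivariate Beta function B(alpha_1, ..., alpha_k)."""
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))


def dirichlet_multinomial_pmf(counts, alphas):
    """p(n_1, ..., n_k) = C * B(alpha + n), with C = n!/(n_1!...n_k!) / B(alpha)."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    updated = [a + c for a, c in zip(alphas, counts)]
    return math.exp(log_coef + log_mbeta(updated) - log_mbeta(alphas))


n, alphas = 5, [1.0, 2.0, 0.5]  # illustrative parameters
total = sum(
    dirichlet_multinomial_pmf(c, alphas)
    for c in product(range(n + 1), repeat=len(alphas))
    if sum(c) == n
)
print(total)  # ≈ 1.0
```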
$$\begin{aligned} p(\theta_1,...,\theta_k|n_1,...,n_k)&=\frac{p(n_1,...,n_k|\theta_1,...,\theta_k)p(\theta_1,...,\theta_k)}{p(n_1,...,n_k)}\\ &= \frac{1}{ B(\alpha_1+n_1,...,\alpha_k+n_k)}\prod_{i=1}^k\theta_i^{\alpha_i+n_i-1} \end{aligned}$$

This is exactly the density of $Dir(\alpha_1+n_1,...,\alpha_k+n_k)$.
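As with the Beta case, the update is just adding counts to the prior parameters. A minimal sketch with hypothetical helper names `dirichlet_multinomial_update` and `dirichlet_mean`; the uniform prior $Dir(1,1,1)$ and the counts $(7,2,1)$ are illustrative assumptions. Because the update is additive, feeding the observations one at a time gives the same posterior as a single batch update.

```python
def dirichlet_multinomial_update(alphas, counts):
    """Posterior Dir(alpha_1 + n_1, ..., alpha_k + n_k) for a Dirichlet prior and multinomial counts."""
    if len(alphas) != len(counts):
        raise ValueError("alphas and counts must have the same length k")
    return [a + n for a, n in zip(alphas, counts)]


def dirichlet_mean(alphas):
    """E[Theta_j] = alpha_j / sum(alpha)."""
    total = sum(alphas)
    return [a / total for a in alphas]


prior = [1.0, 1.0, 1.0]  # illustrative: uniform prior over the simplex
counts = [7, 2, 1]       # hypothetical observed counts, n = 10
post = dirichlet_multinomial_update(prior, counts)
print(post)                  # [8.0, 3.0, 2.0]
print(dirichlet_mean(post))  # posterior mean: 8/13, 3/13, 2/13

# Sequential updating, one observation (one-hot count vector) at a time
seq = prior
for j, c in enumerate(counts):
    one_hot = [1 if i == j else 0 for i in range(len(counts))]
    for _ in range(c):
        seq = dirichlet_multinomial_update(seq, one_hot)
print(seq == post)  # True
```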