Frequentist vs. Bayesian
Notation
$$
\begin{aligned}
& X:\text{data} \rightarrow X=(x_1 \; x_2 \; \dots \; x_N)^T_{N \times p} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{Np}
\end{bmatrix}_{N \times p} \\
& \theta:\text{parameter} \\
& x \sim p(x|\theta)
\end{aligned}
$$
Frequentist
Idea
$\theta$ is an unknown constant, and $X$ is a random variable; the frequentist approach therefore estimates $\theta$ from the observed data.
Method
MLE (maximum likelihood estimation):
$$
\theta_{MLE} = \arg\max_\theta \log p(x|\theta)
$$
Frequentist → statistical machine learning: learning reduces to an optimization problem.
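As a minimal sketch of this optimization view, consider MLE for a Gaussian. The data, its true parameters, and the sample size below are all made up for illustration; for a Gaussian, $\arg\max_\theta \log p(x|\theta)$ has a closed form (the sample mean and the biased sample variance):

```python
import numpy as np

# Hypothetical data: N samples assumed drawn i.i.d. from a Gaussian p(x|theta),
# theta = (mu, sigma^2). True values (2.0, 1.5^2) are invented for this demo.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

# Setting the gradient of the log-likelihood to zero gives the closed form:
mu_mle = x.mean()                      # argmax over mu
var_mle = ((x - mu_mle) ** 2).mean()   # argmax over sigma^2 (biased estimator)

print(mu_mle, var_mle)  # both should be close to 2.0 and 2.25
```

With more data the estimates concentrate around the true parameters, which is exactly the frequentist view: $\theta$ is a fixed constant the optimizer recovers.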
Bayesian
Idea
$\theta$ is also a random variable, $\theta \sim p(\theta)$; $p(\theta)$ is usually called the prior.
Prior: $p(\theta)$
Posterior: $p(\theta|x)$
Likelihood: $p(x|\theta)$
$$
p(x)=\int_\theta p(x|\theta)\,p(\theta)\,d\theta
$$
$$
p(\theta|x) = \frac{p(x|\theta)\,p(\theta)}{p(x)}
$$
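A tiny numerical sketch of this formula, using a made-up two-hypothesis coin example (the coin biases, the prior, and the observed counts are all invented for illustration). With a discrete $\theta$, the integral in $p(x)$ becomes a sum:

```python
from math import comb

# Hypothetical setup: theta is either a fair coin (p=0.5) or a biased
# coin (p=0.8), with a uniform prior; we observe x = 7 heads in 10 flips.
thetas = {"fair": 0.5, "biased": 0.8}
prior = {"fair": 0.5, "biased": 0.5}

def likelihood(p, heads=7, n=10):
    # p(x|theta): binomial likelihood of the observed flips
    return comb(n, heads) * p**heads * (1 - p) ** (n - heads)

# p(x) = sum_theta p(x|theta) p(theta): the evidence (the integral as a sum)
evidence = sum(likelihood(p) * prior[name] for name, p in thetas.items())

# Bayes' rule: p(theta|x) = p(x|theta) p(theta) / p(x)
posterior = {name: likelihood(p) * prior[name] / evidence
             for name, p in thetas.items()}
print(posterior)
```

Seven heads favor the biased coin, so the posterior shifts toward it while still summing to one.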
Method
MAP (maximum a posteriori estimation)
Since $p(x)$ is constant with respect to $\theta$,
$$
\theta_{MAP} = \arg\max_\theta p(\theta|x) = \arg\max_\theta p(x|\theta)\,p(\theta)
$$
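A sketch of MAP for the mean of a Gaussian with known variance under a Gaussian prior (all numbers are illustrative). In this conjugate case the arg max has a closed form: a precision-weighted average that shrinks the MLE (the sample mean) toward the prior mean:

```python
import numpy as np

# Assumed model: x ~ N(mu, sigma2) with sigma2 known, prior mu ~ N(mu0, tau2).
# sigma2, mu0, tau2, and the data below are invented for illustration.
rng = np.random.default_rng(1)
sigma2, mu0, tau2 = 1.0, 0.0, 0.25
x = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=20)
n = len(x)

# argmax_mu [log p(x|mu) + log p(mu)] gives the precision-weighted average:
mu_map = (n / sigma2 * x.mean() + mu0 / tau2) / (n / sigma2 + 1 / tau2)

# The MAP estimate lies between the prior mean mu0 and the MLE x.mean().
print(x.mean(), mu_map)
```

Note that with a flat prior ($p(\theta)$ constant), the $p(\theta)$ factor drops out and MAP reduces to MLE.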
Bayesian estimation:
$$
p(\theta|x)=\frac{p(x|\theta)\,p(\theta)}{\int_\theta p(x|\theta)\,p(\theta)\,d\theta}
$$
Bayesian prediction:
Given observed data $X$ and a new sample $\hat{x}$, find $p(\hat{x}|X)$.
$$
\begin{aligned}
p(\hat{x}|X) &= \int_\theta p(\hat{x}, \theta|X)\,d\theta \\
&= \int_\theta p(\hat{x}|\theta)\,p(\theta|X)\,d\theta
\end{aligned}
$$
Note: by the chain rule, $p(\hat{x}, \theta|X)=p(\hat{x}|\theta, X)\,p(\theta|X)$. Because $\hat{x}$ is conditionally independent of $X$ given $\theta$ (once $\theta$ is known, the old data carry no further information about $\hat{x}$), we have $p(\hat{x}|\theta, X)=p(\hat{x}|\theta)$, and hence
$$
p(\hat{x}, \theta|X)=p(\hat{x}|\theta)\,p(\theta|X)
$$
Bayesian → probabilistic graphical models: inference reduces to computing integrals, which is often handled with Monte Carlo methods.
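One way Monte Carlo handles that predictive integral can be sketched as follows, under the assumption that we can already draw samples from the posterior $p(\theta|X)$ (here the posterior is simply taken to be a known Gaussian purely for illustration): draw $\theta_i \sim p(\theta|X)$, then average the likelihood $p(\hat{x}|\theta_i)$:

```python
import numpy as np

rng = np.random.default_rng(2)

def posterior_samples(m):
    # Stand-in for theta_i ~ p(theta|X); a real application would use
    # e.g. MCMC. The N(1.0, 0.1^2) posterior here is invented for the demo.
    return rng.normal(loc=1.0, scale=0.1, size=m)

def lik(x_hat, theta, sigma=1.0):
    # p(x_hat|theta): Gaussian likelihood with known sigma
    return np.exp(-0.5 * ((x_hat - theta) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Monte Carlo estimate: p(x_hat|X) ≈ (1/M) * sum_i p(x_hat|theta_i)
thetas = posterior_samples(100_000)
x_hat = 1.2
p_pred = lik(x_hat, thetas).mean()
print(p_pred)
```

As $M$ grows, the sample average converges to the integral $\int_\theta p(\hat{x}|\theta)\,p(\theta|X)\,d\theta$ by the law of large numbers.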
Bilibili link:
https://www.bilibili.com/video/av31950221?from=search&seid=8309397892501615322