[Statistical Learning Notes] Exercises 4
4.1 Estimation formulas for the naive Bayes probabilities
a. $P(Y=c_k)$
Let $P(Y=c_k)=\theta$. Suppose $N$ trials are performed and $Y=c_k$ occurs in $n$ of them.
The likelihood function is then:
$$L(\theta)=\theta^n(1-\theta)^{N-n}$$
Taking the logarithm:
$$\log L(\theta)=n\log\theta+(N-n)\log(1-\theta)$$
Differentiating with respect to $\theta$:
$$\frac{d\log L(\theta)}{d\theta}=\frac{n}{\theta}-\frac{N-n}{1-\theta}$$
Setting the derivative to zero gives $\theta=n/N$, and at this value the likelihood function attains its maximum.
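That $\theta=n/N$ is a maximum rather than a minimum can be checked from the second derivative of the log-likelihood. A short worked check, using only the symbols defined above and assuming $0<\theta<1$ and $N\ge 1$:

$$\frac{d^2\log L(\theta)}{d\theta^2}=-\frac{n}{\theta^2}-\frac{N-n}{(1-\theta)^2}<0$$

Since $n\ge 0$ and $N-n\ge 0$ cannot both be zero, the expression is strictly negative, so the log-likelihood is concave on $(0,1)$.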
Hence the maximum likelihood estimate of the prior probability is:
$$P(Y=c_k)=\frac{n}{N}=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{N}$$
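The closed form $n/N$ can be sanity-checked numerically. The following is a minimal sketch, not part of the original exercise: the function names `prior_mle` and `log_likelihood` and the label list are hypothetical, chosen only for illustration. It computes $n/N$ by counting and compares it with a grid search over the log-likelihood.

```python
import math

def prior_mle(labels, c_k):
    """Return n/N, the fraction of labels equal to c_k."""
    n = sum(1 for y in labels if y == c_k)  # n = sum_i I(y_i = c_k)
    return n / len(labels)

def log_likelihood(theta, n, N):
    """log L(theta) = n*log(theta) + (N - n)*log(1 - theta), for 0 < theta < 1."""
    return n * math.log(theta) + (N - n) * math.log(1 - theta)

# Hypothetical labels, made up for illustration only.
labels = [1, -1, 1, 1, -1, 1, -1, 1, 1, -1]
N = len(labels)
n = labels.count(1)

# Closed-form estimate from the derivation.
theta_hat = prior_mle(labels, 1)

# Grid search over the open interval (0, 1) as a numerical check.
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, n, N))

print(theta_hat, theta_grid)
```

The grid search is only a check on the calculus; in practice the counting formula is all that is needed.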
b. $P(X^{(j)}=a_{jl}\mid Y=c_k)$
Let $P(X^{(j)}=a_{jl}\mid Y=c_k)=\theta$. Suppose $N$ trials are performed, $Y=c_k$ occurs in $n$ of them, and $Y=c_k$ together with $X^{(j)}=a_{jl}$ occurs in $m$ of them.
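Taking $P(Y=c_k)=n/N$ from part a, a single trial has both $Y=c_k$ and $X^{(j)}=a_{jl}$ with probability $\frac{n}{N}\theta$, so the likelihood function is:
$$L(\theta)=\left(\frac{n}{N}\theta\right)^m\left(1-\frac{n\theta}{N}\right)^{N-m}$$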
Taking the logarithm:
$$\log L(\theta)=m\log\frac{n}{N}+m\log\theta+(N-m)\log\left(1-\frac{n\theta}{N}\right)$$
Differentiating with respect to $\theta$:
$$\frac{d\log L(\theta)}{d\theta}=\frac{m}{\theta}-\frac{n(N-m)}{N(1-n\theta/N)}$$
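The algebra from this derivative to the final estimate can be written out explicitly. A sketch, using only the symbols defined above and the identity $N(1-n\theta/N)=N-n\theta$:

$$\frac{m}{\theta}-\frac{n(N-m)}{N-n\theta}=0\;\Rightarrow\;m(N-n\theta)=n\theta(N-m)\;\Rightarrow\;mN=nN\theta\;\Rightarrow\;\theta=\frac{m}{n}$$

The $mn\theta$ terms cancel on both sides, which is why $N$ drops out of the result.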
Solving the stationarity condition for $\theta$ therefore gives:
$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{m}{n}$$
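As with part a, the result $m/n$ can be checked numerically. This is a minimal sketch under stated assumptions: the names `conditional_mle` and `log_likelihood_b` and the toy data are hypothetical, and the likelihood is the one used in this derivation, with the joint probability written as $\frac{n}{N}\theta$.

```python
import math

def conditional_mle(xs, ys, a_jl, c_k):
    """Return m/n: among samples with y == c_k, the fraction with x == a_jl."""
    n = sum(1 for y in ys if y == c_k)
    m = sum(1 for x, y in zip(xs, ys) if y == c_k and x == a_jl)
    if n == 0:
        raise ValueError("class c_k does not occur in the sample")
    return m / n

def log_likelihood_b(theta, m, n, N):
    """log L(theta) = m*log(n*theta/N) + (N - m)*log(1 - n*theta/N)."""
    p = n * theta / N  # probability of (Y = c_k and X = a_jl) in one trial
    return m * math.log(p) + (N - m) * math.log(1 - p)

# Hypothetical feature values and labels, made up for illustration only.
xs = ["S", "M", "M", "S", "S", "L", "M", "L"]
ys = [1, 1, -1, 1, -1, 1, 1, -1]
N = len(ys)
n = sum(1 for y in ys if y == 1)
m = sum(1 for x, y in zip(xs, ys) if y == 1 and x == "S")

# Closed-form estimate from the derivation.
theta_hat = conditional_mle(xs, ys, "S", 1)

# Grid search over (0, 1) as a numerical check of the closed form.
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood_b(t, m, n, N))

print(theta_hat, theta_grid)
```

The guard on `n == 0` reflects that the estimate is undefined when the class never appears in the sample.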