Naive Bayes
Content drawn from CS229, my Zhejiang University machine learning lecture notes, Baidu Baike, and supplementary blog posts.
We will use an example to illustrate this algorithm; the example is still used in practice today.
Spam email classifier
$x$ is a 0/1 vector whose dimensions correspond to the words of a dictionary, one word per dimension. If a word shows up in an email, its corresponding entry equals 1.
We assume that the $x_i$ are conditionally independent given $y$ (the naive Bayes assumption), although this is obviously not true for real email text.
We choose the top 10,000 most commonly used words as the dictionary.
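As a sketch of the feature construction (using a hypothetical 3-word toy dictionary in place of the real 10,000-word list):

```python
# Sketch: map an email to a 0/1 feature vector over a fixed dictionary.
# The 3-word dictionary here is a toy stand-in for the top-10,000 word list.
dictionary = ["buy", "now", "meeting"]

def email_to_features(email: str) -> list:
    """Return x, where x[j] = 1 iff dictionary word j appears in the email."""
    words = set(email.lower().split())
    return [1 if w in words else 0 for w in dictionary]

x = email_to_features("Buy now and save")
# "buy" and "now" appear, "meeting" does not, so x == [1, 1, 0]
```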
By the chain rule of probability:

$$P(x_1,\ldots,x_{10000}\mid y) = P(x_1\mid y)\,P(x_2\mid x_1,y)\cdots P(x_{10000}\mid x_{9999},x_{9998},\ldots,x_1,y)$$
Under the conditional independence assumption, this simplifies to:

$$P(x_1,\ldots,x_{10000}\mid y) = \prod_{i=1}^{10000} P(x_i\mid y)$$
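Numerically, the factorized likelihood is just a product of per-word Bernoulli terms. A sketch with made-up parameter values (the `phi` entries are illustrative, not estimated from data):

```python
# Sketch: P(x|y) under the naive Bayes factorization is a product of
# per-word Bernoulli terms, one per dictionary word.
phi = [0.8, 0.6, 0.1]   # hypothetical P(x_j = 1 | y) for each word j
x = [1, 1, 0]           # observed feature vector

p = 1.0
for xj, pj in zip(x, phi):
    # word present contributes P(x_j=1|y); absent contributes 1 - P(x_j=1|y)
    p *= pj if xj == 1 else (1.0 - pj)
# p = 0.8 * 0.6 * 0.9 ≈ 0.432
```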
Parameters:
$$\phi_{j|y=1} = P(x_j=1\mid y=1)$$
$$\phi_{j|y=0} = P(x_j=1\mid y=0)$$
$$\phi_y = P(y=1)$$
Here $y=1$ means the email is spam.
Joint likelihood:
$$L(\phi_y,\phi_{j|y}) = \prod_{i=1}^{m} p\!\left(x^{(i)},y^{(i)};\phi_y,\phi_{j|y}\right)$$
MLE:
$$\phi_y=\frac{\sum_{i=1}^m 1\{y^{(i)}=1\}}{m}$$
$$\phi_{j|y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1,\,y^{(i)}=1\}}{\sum_{i=1}^m 1\{y^{(i)}=1\}}$$
$$\phi_{j|y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1,\,y^{(i)}=0\}}{\sum_{i=1}^m 1\{y^{(i)}=0\}}$$
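The MLE formulas are just counting. A minimal sketch on a made-up 4-email, 3-word training set (data is invented for illustration):

```python
# Sketch: maximum-likelihood estimates by counting on a tiny toy dataset.
# Rows of X are emails (0/1 word indicators); y[i] = 1 means spam.
X = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1],
     [0, 1, 1]]
y = [1, 1, 0, 0]
m = len(y)
n = len(X[0])

phi_y = sum(y) / m  # fraction of training emails that are spam

spam = [X[i] for i in range(m) if y[i] == 1]
ham  = [X[i] for i in range(m) if y[i] == 0]

# phi_{j|y=1}: among spam emails, the fraction containing word j (and same for ham)
phi_j_y1 = [sum(row[j] for row in spam) / len(spam) for j in range(n)]
phi_j_y0 = [sum(row[j] for row in ham)  / len(ham)  for j in range(n)]
# phi_y == 0.5, phi_j_y1 == [1.0, 0.5, 0.0]
```

Note that word 2 never appears in a spam email here, so its estimate is exactly 0; that is the fragility the next section addresses.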
Laplace smoothing
If some dictionary word $j$ never appears in a training spam email, then

$$\sum_{i=1}^m 1\{x_j^{(i)}=1,\,y^{(i)}=1\}=0,$$

so $\phi_{j|y=1}=0$. This is not robust: a single unseen word forces the estimated probability to zero. We use Laplace smoothing to fix the estimates.
$$\phi_{j|y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1,\,y^{(i)}=1\}+1}{\sum_{i=1}^m 1\{y^{(i)}=1\}+2}$$

(Each binary $x_j$ has 2 possible values, so we add one pseudo-count per value: 1 to the numerator and 2 to the denominator.)
$$\phi_{j|y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)}=1,\,y^{(i)}=0\}+1}{\sum_{i=1}^m 1\{y^{(i)}=0\}+2}$$
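Putting it together: a minimal sketch of smoothed estimation plus a spam prediction via Bayes' rule, on the same invented toy dataset as above. The smoothing adds one pseudo-count per outcome of the binary word indicator (so +1 in the numerator, +2 in the denominator):

```python
# Sketch: Laplace-smoothed Bernoulli naive Bayes on a toy dataset.
X = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1],
     [0, 1, 1]]
y = [1, 1, 0, 0]
m, n = len(y), len(X[0])

phi_y = sum(y) / m

def smoothed(label):
    """Laplace-smoothed phi_{j|y=label}: (+1)/(+2) on each count."""
    rows = [X[i] for i in range(m) if y[i] == label]
    return [(sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in range(n)]

phi1, phi0 = smoothed(1), smoothed(0)  # no estimate can be exactly 0 or 1 now

def posterior_spam(x):
    """P(y=1 | x) via Bayes' rule with the naive Bayes likelihood."""
    def lik(phi):
        p = 1.0
        for xj, pj in zip(x, phi):
            p *= pj if xj else (1.0 - pj)
        return p
    num = lik(phi1) * phi_y
    return num / (num + lik(phi0) * (1.0 - phi_y))

# posterior_spam([1, 1, 0]) ≈ 0.9: an email with the two "spammy" words
# and no "hammy" word is classified as spam with high probability.
```

In practice one would work with log-probabilities to avoid underflow when multiplying 10,000 small terms; the plain product above is kept only to mirror the equations.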