1. Logistic Regression
In the previous lecture on classification we derived the logistic function, also called the sigmoid function:
$$\sigma(z)=\frac{1}{1+\exp(-z)}\tag{1}$$
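As a quick sanity check, equation (1) can be written directly in NumPy (a small sketch; the function name is my own, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# sigma(0) = 0.5; large positive z -> 1, large negative z -> 0
print(sigmoid(0.0))                         # 0.5
print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # squashes into (0, 1)
```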
![图2](https://img-blog.csdnimg.cn/20190729171615909.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTQxNjkxMQ==,size_16,color_FFFFFF,t_70#pic_center)
![图3](https://img-blog.csdnimg.cn/20190729191743958.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTQxNjkxMQ==,size_16,color_FFFFFF,t_70#pic_center)
![图4](https://img-blog.csdnimg.cn/20190729193149222.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTQxNjkxMQ==,size_16,color_FFFFFF,t_70#pic_center)
We call the quantity shown in the figures above

$$-\left[\hat y^n\ln f_{w,b}(x^n)+(1-\hat y^n)\ln\left(1-f_{w,b}(x^n)\right)\right]\tag{5}$$

the cross-entropy, and we take it as our loss function.
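Numerically, equation (5) behaves as expected: a confident correct prediction costs little, a confident wrong one costs a lot. A minimal sketch (my own function names):

```python
import numpy as np

def cross_entropy(y_hat, f):
    """Per-example cross-entropy loss from equation (5).

    y_hat : target label (0 or 1)
    f     : model output f_{w,b}(x), a probability in (0, 1)
    """
    return -(y_hat * np.log(f) + (1 - y_hat) * np.log(1 - f))

print(cross_entropy(1, 0.9))  # small loss: ~0.105
print(cross_entropy(1, 0.1))  # large loss: ~2.303
```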
Finding the best function means finding the minimum of $-\ln L(w,b)$. Again we use gradient descent:
$$\begin{aligned}\frac{\partial(-\ln L(w,b))}{\partial w_i}&=\sum_n-\left[\hat y^n\frac{\partial \ln f_{w,b}(x^n)}{\partial w_i}+(1-\hat y^n)\frac{\partial \ln\left(1-f_{w,b}(x^n)\right)}{\partial w_i}\right]\\&=\sum_n-\left[\hat y^n\frac{\partial \ln\sigma(z)}{\partial z}\frac{\partial z}{\partial w_i}+(1-\hat y^n)\frac{\partial \ln\left(1-\sigma(z)\right)}{\partial z}\frac{\partial z}{\partial w_i}\right]\\&=\sum_n-\left[\hat y^n\frac{1}{\sigma(z)}\frac{\partial\sigma(z)}{\partial z}+(1-\hat y^n)\left(-\frac{1}{1-\sigma(z)}\right)\frac{\partial\sigma(z)}{\partial z}\right]x_i^n\\&=\sum_n-\left[\hat y^n\left(1-\sigma(z)\right)-(1-\hat y^n)\sigma(z)\right]x_i^n\\&=\sum_n-\left(\hat y^n-f_{w,b}(x^n)\right)x_i^n\end{aligned}\tag{6}$$

where we used $\frac{\partial\sigma(z)}{\partial z}=\sigma(z)\left(1-\sigma(z)\right)$ and $\frac{\partial z}{\partial w_i}=x_i^n$. Here,
$$\begin{cases}f_{w,b}(x)=\sigma(z)=\dfrac{1}{1+\exp(-z)}\\z=w\cdot x+b=\sum\limits_iw_ix_i+b\end{cases}\tag{7}$$

Update the parameters:
$$w_i=w_i-\eta\sum_n-\left(\hat y^n-f_{w,b}(x^n)\right)x_i^n\tag{8}$$

The farther the prediction is from the target, the larger the update.
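Equations (6)–(8) translate almost line for line into code. A minimal batch-gradient-descent sketch (the toy data and function names here are my own, for illustration only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, eta=0.5, epochs=5000):
    """Batch gradient descent on the cross-entropy loss.

    X : (N, d) inputs, y : (N,) labels in {0, 1}.
    The gradient of -ln L w.r.t. w_i is sum_n -(y^n - f(x^n)) x_i^n, eq. (6),
    so the descent step adds eta * (y - f) back through X, eq. (8).
    """
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        f = sigmoid(X @ w + b)      # f_{w,b}(x^n) for every example n
        err = y - f                 # (y^n - f(x^n))
        w += eta * X.T @ err / N    # eq. (8), averaged over the batch
        b += eta * err.sum() / N
    return w, b

# Toy linearly separable data (AND gate)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])
w, b = train_logistic(X, y)
pred = (sigmoid(X @ w + b) > 0.5).astype(float)
print(pred)  # should match y
```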
![图7](https://img-blog.csdnimg.cn/20190729224145704.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTQxNjkxMQ==,size_16,color_FFFFFF,t_70#pic_center)
2. Generative Model vs. Discriminative Classification
From the derivation above we can see that the generative model and logistic regression share the same mathematical model; they differ only in how the parameters $w,b$ are estimated. The comparison is as follows:
![图9](https://img-blog.csdnimg.cn/20190730120104593.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTQxNjkxMQ==,size_16,color_FFFFFF,t_70#pic_center)
![图10](https://img-blog.csdnimg.cn/201907301205105.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTQxNjkxMQ==,size_16,color_FFFFFF,t_70#pic_center)
- Because a probability distribution is assumed, the generative model can make classification predictions without a large amount of data.
- It is less sensitive to noise in the data than discriminative classification.
- The prior probability and the class-conditional (independence) terms can be estimated from different sources. For example, in speech recognition, the probability that a given sentence is spoken does not have to come from acoustic signals; it can be estimated from text collected off the web.
3. Multi-class Classification
The principle is the same as in binary classification above. Taking three classes as an example, the softmax function is used to amplify the differences between the class scores.
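Softmax exponentiates each class score and normalizes, so the largest score takes most of the probability mass. A small sketch (my own function name):

```python
import numpy as np

def softmax(z):
    """Softmax over class scores z; subtracting max(z) avoids overflow."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Three class scores: softmax turns them into probabilities that sum to 1,
# with the gap between classes amplified.
p = softmax(np.array([3.0, 1.0, -3.0]))
print(p)        # the first class dominates
print(p.sum())  # 1.0
```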
4. Limitations of Logistic Regression
![图14](https://img-blog.csdnimg.cn/20190730170519583.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTQxNjkxMQ==,size_16,color_FFFFFF,t_70#pic_center)
![图15](https://img-blog.csdnimg.cn/20190730191611622.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTQxNjkxMQ==,size_16,color_FFFFFF,t_70#pic_center)
![图15](https://img-blog.csdnimg.cn/20190730191739595.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTQxNjkxMQ==,size_16,color_FFFFFF,t_70#pic_center)
![图15](https://img-blog.csdnimg.cn/20190730192212910.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NTQxNjkxMQ==,size_16,color_FFFFFF,t_70#pic_center)
If we cascade several such units in multiple stages, the result turns out to be very powerful, and we give it a new name: a neural network (Neural Network).
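The standard illustration of this limitation (not from the original lecture, but the classic example) is XOR: no single logistic unit can separate it, yet one cascaded layer of logistic units, here with hand-picked weights for clarity, makes the data separable:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: no single line w.x + b = 0 separates the two classes.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

def hidden(x):
    """One cascaded layer: transforms x into features under which
    XOR becomes linearly separable. Weights chosen by hand."""
    h1 = sigmoid(20 * (x[0] + x[1]) - 10)  # ~ OR(x0, x1)
    h2 = sigmoid(20 * (x[0] + x[1]) - 30)  # ~ AND(x0, x1)
    return np.array([h1, h2])

def xor_net(x):
    h = hidden(x)
    return sigmoid(20 * h[0] - 20 * h[1] - 10)  # ~ OR and not AND

pred = np.array([float(xor_net(x) > 0.5) for x in X])
print(pred)  # [0, 1, 1, 0], matching y
```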
In statistical learning theory the VC dimension for this problem is 3; a support vector machine can also be used to solve it.