Overview
Linear regression and logistic regression are two of the most fundamental models in machine learning. Linear regression is generally used for prediction problems, while logistic regression is generally used for classification problems; the two models are related but distinct.
Linear Regression Model
Assume the training set is
$$T=\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$$
The fitted function is
$$f(x_i)=wx_i+b,\quad i=1,2,\dots,n$$
Using the least-squares method, i.e., finding the line that minimizes the sum of squared distances from all sample points to it, the loss function is
$$J(w,b)=\sum_{i=1}^n(f(x_i)-y_i)^2=\sum_{i=1}^n(y_i-wx_i-b)^2$$
We seek the parameters that minimize the loss function:
$$(w^*,b^*)=\arg\min_{w,b}J(w,b)=\arg\min_{w,b}\sum_{i=1}^n(f(x_i)-y_i)^2=\arg\min_{w,b}\sum_{i=1}^n(y_i-wx_i-b)^2$$
Taking partial derivatives:
$$\frac{\partial J(w,b)}{\partial w}=\frac{\partial\left(\sum_{i=1}^n\left(w^2x_i^2+(y_i-b)^2-2wx_i(y_i-b)\right)\right)}{\partial w}=2\sum_{i=1}^n\left(wx_i^2-x_i(y_i-b)\right)$$
$$\frac{\partial J(w,b)}{\partial b}=\frac{\partial\left(\sum_{i=1}^n\left(w^2x_i^2+(y_i-b)^2-2wx_i(y_i-b)\right)\right)}{\partial b}=\frac{\partial\left(\sum_{i=1}^n\left(w^2x_i^2+(y_i^2-2by_i+b^2)-2wx_iy_i+2wx_ib\right)\right)}{\partial b}=2\sum_{i=1}^n(b+wx_i-y_i)=2nb-2\sum_{i=1}^n(y_i-wx_i)$$
Setting both partial derivatives to zero and solving for w and b:
$$w=\frac{\sum_{i=1}^n x_i(y_i-b)}{\sum_{i=1}^n x_i^2}=\frac{\sum_{i=1}^n x_i(y_i-\bar y+w\bar x)}{\sum_{i=1}^n x_i^2}=\frac{\sum_{i=1}^n\left(x_i(y_i-\bar y)+wx_i\bar x\right)}{\sum_{i=1}^n x_i^2}=\frac{\sum_{i=1}^n x_i(y_i-\bar y)}{\sum_{i=1}^n x_i^2-n\bar x^2}=\frac{\sum_{i=1}^n x_iy_i-n\bar x\bar y}{\sum_{i=1}^n x_i^2-n\bar x^2}$$
(here $b=\bar y-w\bar x$ from the next equation is substituted, and the remaining $w$ term $w\sum_{i=1}^n x_i\bar x=wn\bar x^2$ is moved to the left-hand side)
$$b=\frac{1}{n}\sum_{i=1}^n y_i-w\cdot\frac{1}{n}\sum_{i=1}^n x_i=\bar y-w\bar x$$
This yields the fitted linear regression function.
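The closed-form solution above can be sketched in a few lines of Python; this is an illustrative implementation (the helper name `fit_linear` and the sample data are assumptions, not from the original text):

```python
def fit_linear(xs, ys):
    """Fit y = w*x + b by the closed-form least-squares solution:
    w = (sum(x_i*y_i) - n*x̄*ȳ) / (sum(x_i^2) - n*x̄^2),  b = ȳ - w*x̄."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    w = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / \
        (sum(x * x for x in xs) - n * x_bar ** 2)
    b = y_bar - w * x_bar
    return w, b

# Points lying exactly on y = 2x + 1 should recover w = 2, b = 1.
w, b = fit_linear([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(w, b)  # → 2.0 1.0
```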
Logistic Regression Model
For a binary classification problem, the output is $y\in\{0,1\}$. A logistic regression model can be obtained by applying the Sigmoid activation function to the linear regression model. The Sigmoid function is:
$$y=\frac{1}{1+e^{-z}}$$
It maps the value of $z$ to a value of $y$ close to $0$ or $1$, changing steeply near $z=0$. After adding the Sigmoid activation, the linear regression model becomes
$$y=\frac{1}{1+e^{-(wx+b)}}$$
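A minimal sketch of the Sigmoid function and its properties as described above (the specific input values are arbitrary illustrations, not from the text):

```python
import math

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^{-z}); squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))   # → 0.5 (the steepest point of the curve)
print(sigmoid(6.0))   # close to 1
print(sigmoid(-6.0))  # close to 0
```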
Its log-odds (logit) form is
$$\ln\frac{y}{1-y}=wx+b$$
If $y$ is interpreted as the class posterior probability $p(y=1\mid x)$, the equation above can be written as
$$\ln\frac{p(y=1\mid x)}{p(y=0\mid x)}=wx+b$$
This gives the binomial logistic regression model:
$$\begin{cases}p(y=1\mid x)=\dfrac{\exp(wx+b)}{1+\exp(wx+b)}\\[6pt]p(y=0\mid x)=\dfrac{1}{1+\exp(wx+b)}\\[6pt]\text{s.t.}\quad x\in\Bbb R^n,\ y\in\{0,1\}\end{cases}$$
Logistic regression compares the two conditional probabilities and assigns the instance $x$ to the class with the larger probability.
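The decision rule can be sketched as follows; the parameter values `w = 2.0`, `b = -1.0` are illustrative assumptions, not taken from the article:

```python
import math

def classify(x, w, b):
    """Compute both class probabilities from the binomial model and
    return the class with the larger probability."""
    z = w * x + b
    p1 = math.exp(z) / (1.0 + math.exp(z))  # p(y=1|x)
    p0 = 1.0 / (1.0 + math.exp(z))          # p(y=0|x)
    return 1 if p1 > p0 else 0

# With w = 2, b = -1 the decision boundary is at x = 0.5 (where wx+b = 0).
print(classify(1.0, 2.0, -1.0))  # → 1
print(classify(0.0, 2.0, -1.0))  # → 0
```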
Assume:
$$P(y=1\mid x)=\pi(x),\qquad P(y=0\mid x)=1-\pi(x)$$
The likelihood function is:
$$\prod_{i=1}^n[\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i}$$
The log-likelihood function is:
$$L(w)=\sum_{i=1}^n\left[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\right]=\sum_{i=1}^n\left[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\right]=\sum_{i=1}^n\left[y_i(w\cdot x_i+b)-\log(1+\exp(w\cdot x_i+b))\right]$$
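The algebraic simplification above can be checked numerically; this is a quick illustrative check with arbitrary parameter and data values (not from the article):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Two forms of the log-likelihood that the derivation says are equal:
#   form1 = sum(y*log(pi) + (1-y)*log(1-pi))         with pi = sigmoid(wx+b)
#   form2 = sum(y*(wx+b) - log(1 + exp(wx+b)))
w, b = 0.7, -0.3          # arbitrary parameters for the check
xs = [0.5, 1.5, 2.5]
ys = [0, 1, 1]

form1 = sum(y * math.log(sigmoid(w * x + b))
            + (1 - y) * math.log(1 - sigmoid(w * x + b))
            for x, y in zip(xs, ys))
form2 = sum(y * (w * x + b) - math.log(1 + math.exp(w * x + b))
            for x, y in zip(xs, ys))
print(abs(form1 - form2) < 1e-12)  # → True
```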
Maximizing $L(w)$ yields the estimate of $w$, so the problem becomes an optimization problem with the log-likelihood as the objective function. Logistic regression is usually solved with gradient descent or quasi-Newton methods.
With $L(w)$ as the objective function, the optimization is:
$$\hat w=\arg\max_{w}L(w)$$
$$\frac{\partial L(w)}{\partial w}=\sum_{i=1}^n y_ix_i-\sum_{i=1}^{n}\frac{\exp(wx_i+b)}{1+\exp(wx_i+b)}x_i=\sum_{i=1}^n\left(y_i-\frac{1}{1+\exp(-(wx_i+b))}\right)x_i$$
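Using this gradient, training by plain gradient ascent can be sketched as below. This is an assumed minimal implementation (helper names, learning rate, step count, and the toy data are all illustrative choices, not from the article); the gradient for $b$ is the analogous $\sum_i(y_i-\sigma(wx_i+b))$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Maximize L(w) by gradient ascent using
    dL/dw = sum((y_i - sigmoid(w*x_i + b)) * x_i) and
    dL/db = sum( y_i - sigmoid(w*x_i + b)       )."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
        grad_b = sum((y - sigmoid(w * x + b)) for x, y in zip(xs, ys))
        w += lr * grad_w  # ascent: move along the gradient of L
        b += lr * grad_b
    return w, b

# Toy data separable around x = 1.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]
w, b = fit_logistic(xs, ys)
print(sigmoid(w * 0.0 + b) < 0.5, sigmoid(w * 3.0 + b) > 0.5)  # → True True
```

Because the objective is concave in $(w,b)$, gradient ascent with a small enough step size reliably improves the log-likelihood at every step.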