Abstract
This week, I studied logistic regression, learned about the phenomena of overfitting and underfitting, and saw how regularization penalizes the model's weights to address overfitting. The article concludes with an introduction to performing logistic regression with scikit-learn.
1. Logistic Regression
Logistic regression solves binary classification problems, such as detecting whether a tumor is malignant. A binary classification problem predicts only two values, 0 or 1, so we need a new function that maps values into $[0, 1]$: the sigmoid function.
The sigmoid function is

$$g(z)=\frac{1}{1+e^{-z}},$$

and its derivative is

$$g'(z)=-\frac{1}{(1+e^{-z})^2}(-e^{-z})=\frac{e^{-z}}{1+e^{-z}}\cdot\frac{1}{1+e^{-z}}=g(z)[1-g(z)].$$

Its graph is shown below:
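The identity $g'(z)=g(z)[1-g(z)]$ is easy to verify numerically; the sketch below (function names are my own) compares the closed form against a central-difference approximation:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # closed-form derivative g'(z) = g(z) * (1 - g(z))
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-5.0, 5.0, 101)
h = 1e-5
# central-difference approximation of the derivative
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)
print(np.allclose(numeric, sigmoid_grad(z), atol=1e-8))  # True
```

This identity is what makes the gradient of the logistic-regression cost cheap to compute later on.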
1.1 Model
Suppose each example $x$ has $n$ features, $\vec{x}=[x_1, x_2, \cdots, x_n]^T$, with weights $\vec{w}=[w_1, w_2, \cdots, w_n]^T$. The model is

$$f_{\vec{w},b}(\vec{x})=g(\vec{w}^T\vec{x}+b)=\frac{1}{1+e^{-(\vec{w}^T\vec{x}+b)}}=P(y=1\mid\vec{x};\vec{w},b),$$

i.e., the output is the probability that the class label is 1. If $f_{\vec{w},b}(\vec{x}^{(i)}) \geq 0.5$, the predicted label is 1; otherwise it is 0.
From the sigmoid graph we can see that when $z \geq 0$ the function value is at least 0.5, and otherwise it is below 0.5. Since the predicted label is 1 exactly when the function value is at least 0.5, $z=0$ can be viewed as the dividing line between the two classes, i.e., the decision boundary of logistic regression. The decision boundary of logistic regression is given by

$$\vec{w}^T\vec{x}+b=0.$$
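Because $f_{\vec{w},b}(\vec{x}) \geq 0.5$ exactly when $\vec{w}^T\vec{x}+b \geq 0$, prediction only needs the sign of the linear term. A minimal sketch with an arbitrarily chosen boundary (the weights here are hypothetical, not fitted):

```python
import numpy as np

def predict(X, w, b):
    # f(x) >= 0.5 exactly when w.x + b >= 0, so we threshold the linear term directly
    z = X @ w + b
    prob = 1.0 / (1.0 + np.exp(-z))   # P(y = 1 | x)
    return (z >= 0).astype(int), prob

# hypothetical decision boundary: x1 + x2 - 3 = 0
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[1.0, 1.0],   # below the boundary -> label 0
              [2.0, 2.0]])  # above the boundary -> label 1
labels, probs = predict(X, w, b)
print(labels)  # [0 1]
```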
1.2 Cost Function
If we kept the squared-error cost function, the cost surface would no longer be convex, i.e., it could have multiple local minima, so gradient descent could not be relied on to minimize it; a new cost function is required.
In logistic regression, the loss for a single training example is

$$L[f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)}]=\begin{cases}-\log[f_{\vec{w},b}(\vec{x}^{(i)})] & \text{if } y^{(i)} = 1, \\ -\log[1 - f_{\vec{w},b}(\vec{x}^{(i)})] & \text{if } y^{(i)} = 0, \end{cases}$$

and the cost function is

$$J(\vec{w},b)=\frac{1}{m}\sum_{i=1}^m L[f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)}],$$

which can be written compactly as

$$J(\vec{w},b)=-\frac{1}{m}\sum_{i=1}^m\Big[y^{(i)}\log[f_{\vec{w},b}(\vec{x}^{(i)})]+(1-y^{(i)})\log[1-f_{\vec{w},b}(\vec{x}^{(i)})]\Big].$$
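The compact form of the cost vectorizes directly; a minimal sketch with toy numbers (not the article's dataset):

```python
import numpy as np

def cost(w, b, X, y):
    # J(w, b) = -(1/m) * sum of the cross-entropy terms over all examples
    m = len(y)
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predictions in (0, 1)
    return -np.sum(y * np.log(f) + (1 - y) * np.log(1 - f)) / m

X = np.array([[0.0], [1.0]])
y = np.array([0.0, 1.0])
# with w = 0, b = 0 every prediction is 0.5, so J = -log(0.5) = log 2
print(cost(np.array([0.0]), 0.0, X, y))  # ≈ 0.6931
```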
1.3 Gradient Descent
The gradient descent updates for logistic regression are derived as follows:
$$\begin{aligned}
w_j &= w_j-\alpha\frac{\partial J(\vec{w}, b)}{\partial w_j} \\
&= w_j-\alpha \frac{\partial}{\partial w_j}\Big(-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}\log[f_{\vec{w},b}(\vec{x}^{(i)})]+(1-y^{(i)})\log[1-f_{\vec{w},b}(\vec{x}^{(i)})]\big]\Big)\\
&= w_j-\frac{\alpha}{m}\Big(-\sum_{i=1}^m\Big[y^{(i)}\frac{1}{f_{\vec{w}, b}(\vec{x}^{(i)})}f_{\vec{w}, b}(\vec{x}^{(i)})[1-f_{\vec{w}, b}(\vec{x}^{(i)})]x_j^{(i)}-(1-y^{(i)})\frac{1}{1-f_{\vec{w}, b}(\vec{x}^{(i)})}[1-f_{\vec{w}, b}(\vec{x}^{(i)})]f_{\vec{w}, b}(\vec{x}^{(i)})x_j^{(i)}\Big]\Big)\\
&= w_j-\frac{\alpha}{m}\Big(-\sum_{i=1}^m\big[x_j^{(i)}y^{(i)}[1-f_{\vec{w}, b}(\vec{x}^{(i)})]-x_j^{(i)}(1-y^{(i)})f_{\vec{w}, b}(\vec{x}^{(i)})\big]\Big)\\
&= w_j-\frac{\alpha}{m}\sum_{i=1}^m x_j^{(i)}\big[-y^{(i)}+y^{(i)}f_{\vec{w}, b}(\vec{x}^{(i)})+f_{\vec{w}, b}(\vec{x}^{(i)})-y^{(i)}f_{\vec{w}, b}(\vec{x}^{(i)})\big]\\
&= w_j-\frac{\alpha}{m}\sum_{i=1}^m x_j^{(i)}\big[f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\big]\\
b &= b-\alpha\frac{\partial J(\vec{w},b)}{\partial b}\\
&= b-\frac{\alpha}{m}\sum_{i=1}^m\big[-y^{(i)}+y^{(i)}f_{\vec{w}, b}(\vec{x}^{(i)})+f_{\vec{w}, b}(\vec{x}^{(i)})-y^{(i)}f_{\vec{w}, b}(\vec{x}^{(i)})\big]\\
&= b-\frac{\alpha}{m}\sum_{i=1}^m\big[f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\big]
\end{aligned}$$

(The intermediate expansion for $b$ mirrors the $w_j$ case, without the $x_j^{(i)}$ factor.)
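The final update rules reduce to a few lines of NumPy; a minimal sketch of one batch-gradient-descent loop on a toy 1-feature dataset (learning rate and iteration count chosen arbitrarily):

```python
import numpy as np

def gradient_step(w, b, X, y, alpha):
    # w_j <- w_j - (alpha/m) * sum_i x_j^(i) [f(x^(i)) - y^(i)]
    # b   <- b   - (alpha/m) * sum_i        [f(x^(i)) - y^(i)]
    m = len(y)
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    err = f - y                          # prediction error per example
    w_new = w - alpha * (X.T @ err) / m
    b_new = b - alpha * np.sum(err) / m
    return w_new, b_new

# toy data: the label equals the (only) feature
X = np.array([[0.0], [1.0]])
y = np.array([0.0, 1.0])
w, b = np.zeros(1), 0.0
for _ in range(1000):
    w, b = gradient_step(w, b, X, y, alpha=0.5)

# after training, predictions should move toward the labels
f = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print(f.round(2))
```

Note that both updates share the same error term $f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}$, which is why they can reuse one forward pass.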
1.4 Understanding Logistic Regression Through an Example
Perform logistic regression on the data in the table below, where the features are $\vec{x}=[x_1, x_2]$, the weights are $\vec{w}=[w_1, w_2]$, and the decision boundary is $w_1x_1+w_2x_2+b=0$.
| x1 | x2 | y |
| --- | --- | --- |
| 0.5 | 1.5 | 0 |
| 1 | 1 | 0 |
| 1.5 | 0.5 | 0 |
| 3 | 0.5 | 1 |
| 2 | 2 | 1 |
| 1 | 2.5 | 1 |
Initialize $\vec{w}$ to $[0, 0]$ and $b$ to 0, use a learning rate of 0.3, and plot the learning curve. The learning curve shows that after 30 iterations the cost decreases much more slowly, so 30 iterations are chosen. Substituting the values of $\vec{w}, b$ after 30 iterations into the decision-boundary expression above gives the figure below: points above the decision boundary are labeled 1, and points below it are labeled 0.
The code that produces the figures above is as follows:
```python
import numpy as np
import matplotlib.pyplot as plt


class LogisticRegression:
    def __init__(self, alpha, la=1, epoch=1e3, penalty=None):
        self.m = None            # number of training examples
        self.w = None            # weight vector
        self.b = 0               # bias
        self.alpha = alpha       # learning rate
        self.la = la             # regularization strength lambda
        self.e = epoch           # number of iterations
        self.jwb = []            # cost history
        self.ln = penalty is None
        self.l1 = penalty == "l1"
        self.l2 = penalty == "l2"

    def _sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def _f(self, x):
        return self._sigmoid(np.dot(x, self.w) + self.b)

    def _loss(self, x, y):
        loss = -np.sum(y * np.log(self._f(x)) + (1 - y) * np.log(1 - self._f(x))) / self.m
        if self.ln:
            return loss
        if self.l1:
            return loss + self.la * np.sum(np.abs(self.w))
        if self.l2:
            return loss + self.la * np.sum(self.w ** 2)

    def plotlearingcurve(self, st=0, end=None):
        end = int(self.e) if end is None else end
        plt.figure(1)
        plt.plot(np.arange(st, end + 1), self.jwb[st:end + 1])
        plt.xlabel("Iterations")
        plt.ylabel("Cost")
        plt.title(f"Learning curve starting from iteration {st}")

    def fit(self, x, y):
        self.m = len(x)
        self.w = np.zeros((len(x[0]), 1))
        self.jwb.append(self._loss(x, y))
        count = 0
        while count < self.e:
            # gradient step on w, computed before updating so w and b use the same predictions
            w_temp = self.w - self.alpha * np.dot(x.T, self._f(x) - y) / self.m
            if self.l1:
                # subgradient of the L1 penalty: sign of the current weights
                w_temp = w_temp - self.la * self.alpha * np.sign(self.w)
            if self.l2:
                # shrinkage term 2 * alpha * lambda * w_j from the L2 penalty
                w_temp = w_temp - 2 * self.la * self.alpha * self.w
            self.b = self.b - self.alpha * np.sum(self._f(x) - y) / self.m
            self.w = w_temp
            self.jwb.append(self._loss(x, y))
            count += 1

    def predict(self, x):
        return self._f(x)

    def parameters(self):
        return self.w, self.b


x1 = np.array([0.5, 1, 1.5, 3, 2, 1])
x2 = np.array([1.5, 1, 0.5, 0.5, 2, 2.5])
x = np.column_stack([x1, x2])
y = np.array([0, 0, 0, 1, 1, 1]).reshape(6, 1)
logReg = LogisticRegression(0.3)
logReg.fit(x, y)
logReg.plotlearingcurve(30, 200)
w, b = logReg.parameters()
plt.figure(2)
plt.scatter(x[y.flatten() == 1, 0], x[y.flatten() == 1, 1], c="red", marker="o", label="label = 1")
plt.scatter(x[y.flatten() == 0, 0], x[y.flatten() == 0, 1], c="yellow", marker="x", label="label = 0")
x1 = np.linspace(0, 4, 100)
plt.plot(x1, - b / w.flatten()[1] - w.flatten()[0] / w.flatten()[1] * x1, label="Decision boundary")
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("Decision boundary")
plt.legend()
```
2. Regularized Logistic Regression
Regularization does not change the model itself; it only changes the cost function and the parameter update rules.
2.1 L1 Regularization
Cost function:

$$J(\vec{w}, b)=-\frac{1}{m}\sum_{i=1}^m\Big[y^{(i)}\log[f_{\vec{w},b}(\vec{x}^{(i)})]+(1-y^{(i)})\log[1-f_{\vec{w},b}(\vec{x}^{(i)})]\Big]+\lambda \sum_{j=1}^n|w_j|$$
Subgradient descent update (the L1 penalty is not differentiable at 0, so $\operatorname{sign}(w_j)$ is used as a subgradient):
$$\begin{aligned}
w_j &= w_j-\alpha \frac{\partial J(\vec{w},b)}{\partial w_j}\\
&= w_j-\alpha \frac{\partial}{\partial w_j}\Big(-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}\log[f_{\vec{w},b}(\vec{x}^{(i)})]+(1-y^{(i)})\log[1-f_{\vec{w},b}(\vec{x}^{(i)})]\big]+\lambda\sum_{j=1}^n|w_j|\Big)\\
&= w_j-\frac{\alpha}{m}\sum_{i=1}^m x_j^{(i)}\big[f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\big]-\alpha\lambda \operatorname{sign}(w_j)\\
b &= b-\frac{\alpha}{m}\sum_{i=1}^m\big[f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\big]
\end{aligned}$$
The code for logistic regression with L1 regularization is as follows:
```python
# reuses x, y, and the LogisticRegression class defined above
logReg = LogisticRegression(0.3, 0.1, penalty="l1")
logReg.fit(x, y)
logReg.plotlearingcurve(30, 200)
w, b = logReg.parameters()
plt.figure(2)
plt.scatter(x[y.flatten() == 1, 0], x[y.flatten() == 1, 1], c="red", marker="o", label="label = 1")
plt.scatter(x[y.flatten() == 0, 0], x[y.flatten() == 0, 1], c="yellow", marker="x", label="label = 0")
x1 = np.linspace(0, 4, 100)
plt.plot(x1, - b / w.flatten()[1] - w.flatten()[0] / w.flatten()[1] * x1, label="Decision boundary")
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("Decision boundary")
plt.legend()
```
2.2 L2 Regularization
Cost function:

$$J(\vec{w},b)=-\frac{1}{m}\sum_{i=1}^m\Big[y^{(i)}\log[f_{\vec{w},b}(\vec{x}^{(i)})]+(1-y^{(i)})\log[1-f_{\vec{w},b}(\vec{x}^{(i)})]\Big]+\lambda\sum_{j=1}^n w_j^2$$
Gradient descent update:

$$\begin{aligned}
w_j &= w_j-\alpha \frac{\partial J(\vec{w},b)}{\partial w_j}\\
&= w_j-\alpha \frac{\partial}{\partial w_j}\Big(-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}\log[f_{\vec{w},b}(\vec{x}^{(i)})]+(1-y^{(i)})\log[1-f_{\vec{w},b}(\vec{x}^{(i)})]\big]+\lambda\sum_{j=1}^n w_j^2\Big)\\
&= w_j-\frac{\alpha}{m}\sum_{i=1}^m x_j^{(i)}\big[f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\big]-2\alpha\lambda w_j\\
&= (1-2\alpha\lambda)w_j-\frac{\alpha}{m}\sum_{i=1}^m x_j^{(i)}\big[f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\big]\\
b &= b-\frac{\alpha}{m}\sum_{i=1}^m\big[f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\big]
\end{aligned}$$
The code for logistic regression with L2 regularization is as follows:
```python
# reuses x, y, and the LogisticRegression class defined above
logReg = LogisticRegression(0.3, 0.1, penalty="l2")
logReg.fit(x, y)
logReg.plotlearingcurve(30, 200)
w, b = logReg.parameters()
plt.figure(2)
plt.scatter(x[y.flatten() == 1, 0], x[y.flatten() == 1, 1], c="red", marker="o", label="label = 1")
plt.scatter(x[y.flatten() == 0, 0], x[y.flatten() == 0, 1], c="yellow", marker="x", label="label = 0")
x1 = np.linspace(0, 4, 100)
plt.plot(x1, - b / w.flatten()[1] - w.flatten()[0] / w.flatten()[1] * x1, label="Decision boundary")
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("Decision boundary")
plt.legend()
```
3. Logistic Regression with scikit-learn
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# load a dataset; the iris dataset is used here as an example
iris = load_iris()
X = iris.data
y = iris.target

# split the data: 60% for training, 40% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

# create the logistic regression model
logreg = LogisticRegression()

# train the model
logreg.fit(X_train, y_train)

# predict on the test set
y_pred = logreg.predict(X_test)

# evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy}')
```
References
Andrew Ng, Machine Learning lecture videos
Summary
This week, I studied logistic regression and how to use regularization to address overfitting.
Next week, I will begin learning about neural networks.