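Logistic Regression
I. Algorithm Idea
Linear regression fits a linear model to data, and the fitted model outputs continuous values. Because that output is unbounded, it is poorly suited to classification problems. Logistic regression handles classification of discrete target variables: its output is the probability that a sample belongs to a given class, and it is used mainly to decide class membership.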
II. Algorithm Derivation
1. Logistic Regression
The prediction function of linear regression is (taking $x_0 = 1$ so that the intercept fits the vector form):
$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x$$
Logistic regression passes the linear-regression value through the sigmoid activation function to produce a probability. As Figure 1 shows, $g(z) > 0.5$ when $z > 0$, $g(z) < 0.5$ when $z < 0$, and $g(z) = 0.5$ when $z = 0$. The prediction function of logistic regression is:
$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$
The prediction function is therefore interpreted as the probability that $y = 1$ given $x$ and $\theta$:
$$h_\theta(x) = P(y = 1 \mid x; \theta)$$
For example, if a given $x$ and $\theta$ yield $h_\theta(x) = 0.7$, the model assigns probability 0.7 to $y = 1$ and, correspondingly, probability 0.3 to the negative class ($y = 0$).
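The probability interpretation can be illustrated with a minimal sketch. The parameter vector `theta` and the sample `x` below are hypothetical values chosen only so that the output lands near 0.7; they do not come from any dataset in this article.

```python
import numpy as np


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_proba(theta, x):
    """h_theta(x) = g(theta^T x); x must already contain x_0 = 1."""
    return sigmoid(np.dot(theta, x))


# Hypothetical parameters and sample, for illustration only.
theta = np.array([-1.0, 2.0, 0.5])
x = np.array([1.0, 0.8, 0.5])  # first entry is the intercept term x_0 = 1

p_pos = predict_proba(theta, x)  # P(y = 1 | x; theta)
p_neg = 1.0 - p_pos              # P(y = 0 | x; theta)

print("P(y=1) = %.4f, P(y=0) = %.4f" % (p_pos, p_neg))
print("g(0) =", sigmoid(0.0))
```

Here $\theta^T x = 0.85$, so the predicted probability of the positive class is roughly 0.70 and that of the negative class roughly 0.30.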
2. Loss Function
For linear regression, the cost function is defined as the sum of squared errors of the model. In principle the same definition could be reused for logistic regression, but the problem is that substituting
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
into the linear-regression cost function produces the curve in Figure 2, which is non-convex. A different loss function is therefore needed, one that is convex so that the subsequent optimization can be carried out. The loss function chosen is:
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\big(h_\theta(x^{(i)})\big) - (1 - y^{(i)})\log\big(1 - h_\theta(x^{(i)})\big)\right]$$
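The non-convexity of the squared-error cost can be checked numerically. The sketch below assumes a deliberately tiny setup (one sample with $x = 1$, $y = 1$ and a scalar $\theta$) and tests the midpoint inequality $f\big(\tfrac{a+b}{2}\big) \le \tfrac{f(a)+f(b)}{2}$ that every convex function must satisfy. The helper names are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def squared_error(theta, x=1.0, y=1.0):
    """Squared error of the sigmoid output on a single sample."""
    return (sigmoid(theta * x) - y) ** 2


def log_loss(theta, x=1.0, y=1.0):
    """Logistic (cross-entropy) loss on a single sample."""
    h = sigmoid(theta * x)
    return -y * np.log(h) - (1.0 - y) * np.log(1.0 - h)


def violates_midpoint_convexity(f, a, b):
    """True if f((a+b)/2) > (f(a)+f(b))/2, which a convex f never allows."""
    return f((a + b) / 2.0) > (f(a) + f(b)) / 2.0


a, b = -10.0, 0.0
print("squared error violates convexity:", violates_midpoint_convexity(squared_error, a, b))
print("log loss violates convexity:     ", violates_midpoint_convexity(log_loss, a, b))
```

A single violating pair is enough to show that the squared-error cost is not convex in this setup. The log loss passing on one pair does not prove convexity by itself; it is only consistent with it.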
Equivalently, it can be written as:
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \mathrm{Cost}\big(h_\theta(x^{(i)}), y^{(i)}\big)$$
where the Cost function for $y = 1$ and $y = 0$ is:
$$\mathrm{Cost}\big(h_\theta(x), y\big) = \begin{cases} -\log\big(h_\theta(x)\big), & y = 1 \\ -\log\big(1 - h_\theta(x)\big), & y = 0 \end{cases}$$
This function has the following properties:
- When the true value is $y = 1$ and the prediction is $h_\theta(x) = 1$, the error is 0; when $y = 1$ but $h_\theta(x) \neq 1$, the error grows as $h_\theta(x)$ decreases.
- When the true value is $y = 0$ and the prediction is $h_\theta(x) = 0$, the error is 0; when $y = 0$ but $h_\theta(x) \neq 0$, the error grows as $h_\theta(x)$ increases.
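The two properties can be seen by evaluating the per-sample cost on a few probabilities. This is a minimal sketch; the function name `cost` and the probability grid are illustrative choices.

```python
import numpy as np


def cost(h, y):
    """Per-sample cost: -log(h) when y == 1, -log(1 - h) when y == 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)


# Predicted probabilities, from confident-positive to confident-negative.
probs = np.array([0.99, 0.9, 0.5, 0.1, 0.01])
cost_pos = np.array([cost(h, 1) for h in probs])  # true label y = 1
cost_neg = np.array([cost(h, 0) for h in probs])  # true label y = 0

for h, c1, c0 in zip(probs, cost_pos, cost_neg):
    print("h = %.2f   cost(y=1) = %.4f   cost(y=0) = %.4f" % (h, c1, c0))
```

Reading down the output, the cost for $y = 1$ rises as the predicted probability falls, while the cost for $y = 0$ falls, mirroring the two bullet points.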
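3. Solving for the Parameters
Minimizing the loss over $\theta$ is a convex optimization problem. The derivation proceeds as follows. The loss is:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\big(h_\theta(x^{(i)})\big) + (1 - y^{(i)})\log\big(1 - h_\theta(x^{(i)})\big)\right]$$
In addition, the sigmoid function satisfies:
$$g'(z) = g(z)\big(1 - g(z)\big)$$
It follows that:
$$\frac{\partial h_\theta(x)}{\partial \theta_j} = h_\theta(x)\big(1 - h_\theta(x)\big)\,x_j$$
Differentiating the loss then gives:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}$$
Gradient descent can be used to update the parameters:
$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}$$
where $\alpha$ is the learning rate.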
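The gradient of the logistic loss, $\frac{1}{m}\sum_i \big(h_\theta(x^{(i)}) - y^{(i)}\big)x_j^{(i)}$, can be sanity-checked against a finite-difference approximation. The sketch below assumes small random data generated on the spot; the function names and the step size `alpha` are illustrative and not taken from the implementation in this article.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(theta, X, y):
    """Mean logistic loss J(theta)."""
    h = sigmoid(X.dot(theta))
    return np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))


def analytic_gradient(theta, X, y):
    """(1/m) * X^T (h - y), the closed-form gradient of J."""
    return X.T.dot(sigmoid(X.dot(theta)) - y) / len(y)


def numeric_gradient(theta, X, y, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (loss(theta + step, X, y) - loss(theta - step, X, y)) / (2.0 * eps)
    return grad


# Random toy problem: 20 samples, intercept column plus 2 features.
rng = np.random.RandomState(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y = (rng.uniform(size=20) < 0.5).astype(float)
theta = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_gradient(theta, X, y) - numeric_gradient(theta, X, y)))
print("max |analytic - numeric| =", max_diff)

# A few gradient-descent updates: theta := theta - alpha * gradient.
alpha = 0.1
loss_before = loss(theta, X, y)
for _ in range(100):
    theta = theta - alpha * analytic_gradient(theta, X, y)
loss_after = loss(theta, X, y)
print("loss before: %.4f, after: %.4f" % (loss_before, loss_after))
```

If the closed-form gradient is right, the two gradients agree to within floating-point noise and the loss decreases over the updates.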
III. Algorithm Implementation
logistic_regression.py
# -*- coding: utf-8 -*-
import numpy as np


class LogisticRegression(object):
    """Binary logistic regression trained by batch gradient descent."""

    def __init__(self, learning_rate=0.1, max_iter=100, seed=None):
        self.seed = seed
        self.lr = learning_rate
        self.max_iter = max_iter

    def fit(self, x, y):
        # Draw the initial weights and bias from a standard normal distribution.
        np.random.seed(self.seed)
        self.w = np.random.normal(loc=0.0, scale=1.0, size=x.shape[1])
        self.b = np.random.normal(loc=0.0, scale=1.0)
        self.x = x
        self.y = y
        for i in range(self.max_iter):
            self._update_step()
            # print('loss: \t{}'.format(self.loss()))
            # print('score: \t{}'.format(self.score()))
            # print('w: \t{}'.format(self.w))
            # print('b: \t{}'.format(self.b))

    def _sigmoid(self, z):
        return 1.0 / (1.0 + np.exp(-z))

    def _f(self, x, w, b):
        # h(x) = g(w^T x + b)
        z = x.dot(w) + b
        return self._sigmoid(z)

    def predict_proba(self, x=None):
        # Probability of the positive class; defaults to the training data.
        if x is None:
            x = self.x
        y_pred = self._f(x, self.w, self.b)
        return y_pred

    def predict(self, x=None):
        # Threshold the probability at 0.5 to obtain a 0/1 label.
        if x is None:
            x = self.x
        y_pred_proba = self._f(x, self.w, self.b)
        y_pred = np.where(y_pred_proba < 0.5, 0, 1)
        return y_pred

    def score(self, y_true=None, y_pred=None):
        # Accuracy; defaults to the training data.
        if y_true is None or y_pred is None:
            y_true = self.y
            y_pred = self.predict()
        acc = np.mean(np.asarray(y_true) == np.asarray(y_pred))
        return acc

    def loss(self, y_true=None, y_pred_proba=None):
        # Mean logistic loss; defaults to the training data.
        if y_true is None or y_pred_proba is None:
            y_true = self.y
            y_pred_proba = self.predict_proba()
        # Clip probabilities so that log(0) cannot occur.
        p = np.clip(y_pred_proba, 1e-12, 1.0 - 1e-12)
        return np.mean(-1.0 * (y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

    def _calc_gradient(self):
        # Use the probabilities h(x), not the thresholded 0/1 labels,
        # so that the update matches the gradient of the logistic loss.
        y_pred = self.predict_proba()
        d_w = (y_pred - self.y).dot(self.x) / len(self.y)
        d_b = np.mean(y_pred - self.y)
        return d_w, d_b

    def _update_step(self):
        # One gradient-descent step on the weights and the bias.
        d_w, d_b = self._calc_gradient()
        self.w = self.w - self.lr * d_w
        self.b = self.b - self.lr * d_b
        return self.w, self.b

# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
import data_helper
from logistic_regression import LogisticRegression

# data generation
x, y = data_helper.generate_data(seed=272)
x_train, y_train, x_test, y_test = data_helper.train_test_split(x, y)

# visualize data
# plt.scatter(x_train[:,0], x_train[:,1], c=y_train, marker='.')
# plt.show()
# plt.scatter(x_test[:,0], x_test[:,1], c=y_test, marker='.')
# plt.show()

# data normalization: min-max scaling with the training-set statistics,
# applied to both splits so that they share the same feature scale
x_min = np.min(x_train, axis=0)
x_max = np.max(x_train, axis=0)
x_train = (x_train - x_min) / (x_max - x_min)
x_test = (x_test - x_min) / (x_max - x_min)

# Logistic regression classifier
clf = LogisticRegression(learning_rate=0.1, max_iter=500, seed=272)
clf.fit(x_train, y_train)

# plot the result: the decision boundary is the line w0*x0 + w1*x1 + b = 0
split_boundary_func = lambda x: (-clf.b - clf.w[0] * x) / clf.w[1]
xx = np.arange(0.1, 0.6, 0.1)
plt.scatter(x_train[:,0], x_train[:,1], c=y_train, marker='.')
plt.plot(xx, split_boundary_func(xx), c='red')
plt.show()

# accuracy and loss on the test set
y_test_pred = clf.predict(x_test)
y_test_pred_proba = clf.predict_proba(x_test)
print(clf.score(y_test, y_test_pred))
print(clf.loss(y_test, y_test_pred_proba))
# print(y_test_pred_proba)
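The script imports a `data_helper` module whose source is not shown. To run the script, a stand-in along the following lines could be saved as `data_helper.py`. Everything here is an assumption: the two Gaussian blobs, the `n_per_class` and `test_ratio` parameters, and the split logic are hypothetical. Only the function names and the return order (`x_train, y_train, x_test, y_test`) are taken from the calls in the script.

```python
import numpy as np


def generate_data(seed=None, n_per_class=100):
    """Hypothetical generator: two 2-D Gaussian blobs labeled 0 and 1."""
    rng = np.random.RandomState(seed)
    x0 = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(n_per_class, 2))
    x1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(n_per_class, 2))
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return x, y


def train_test_split(x, y, test_ratio=0.3, seed=None):
    """Hypothetical split: shuffle the indices, then cut off a test share."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Return order matches the unpacking used in the script.
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]


x, y = generate_data(seed=272)
x_train, y_train, x_test, y_test = train_test_split(x, y, seed=272)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

With data drawn this way, the plotting range `xx` in the script may need adjusting, since it was chosen for the original data.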