Machine Learning: Logistic Regression


WavenZ


Article link: https://blog.csdn.net/weixin_43374723/article/details/85207575


Logistic Regression

Preface
1. The Logistic Regression Model
2. The Cost Function
3. Gradient Descent
4. A Worked Example of Logistic Regression

Preface

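This post is a summary I wrote while studying the fundamentals of machine learning. My main learning resource was Andrew Ng's machine learning course on NetEase Cloud Classroom, and my main reference was the English lecture notes for Stanford's CS229; interested readers can find the original notes online. Since I am a beginner and my understanding is limited, corrections of any errors are very welcome. Finally, the complete versions of all the code in this post will be uploaded to GitHub, and discussion is welcome.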

1. The Logistic Regression Model

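Logistic regression addresses classification problems in supervised learning. Unlike linear regression, where the target is continuous, the target in logistic regression takes only a finite set of discrete values. In binary logistic regression, the output $y$ takes the values 0 and 1, where 0 is called the negative class and 1 the positive class. For a given input $x^{(i)}$, the corresponding output $y^{(i)}$ is also called the label of that training example. The hypothesis function is: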
$\displaystyle h_\theta(x) = g(\theta^Tx) = \displaystyle \frac{1}{1+e^{-\displaystyle \theta^Tx}}$
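where $g(z) = \frac{1}{1+e^{-z}}$ is called the logistic function or sigmoid function. Its graph is shown below:

[Figure: graph of the sigmoid function $g(z)$]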

The sigmoid function maps $(-\infty, +\infty)$ onto $(0, 1)$, with $g(0) = 0.5$: when $z < 0$, $g(z) < 0.5$, and when $z > 0$, $g(z) > 0.5$. Moreover, the sigmoid is strictly increasing, so it preserves the ordering of $z$.
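A minimal NumPy sketch of the sigmoid (the helper name `sigmoid` is our own choice), evaluated at a few sample points to illustrate the range, the value at 0 and the monotonicity described above:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
g = sigmoid(z)
print(g)

# The properties stated above, checked on these sample points:
print(np.all((g > 0) & (g < 1)))                        # values lie in (0, 1)
print(g[z == 0])                                        # g(0) = 0.5
print(np.all(g[z < 0] < 0.5), np.all(g[z > 0] > 0.5))   # below / above 0.5
print(np.all(np.diff(g) > 0))                           # increasing in z
```

For very negative `z`, `np.exp(-z)` can overflow and emit a warning (the result still rounds to 0); `scipy.special.expit` computes the same function more robustly.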

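The sigmoid function also has an important property concerning its derivative:
$\begin{aligned} g'(z) &= \frac{d}{dz}\frac{1}{1+e^{-z}} \\ &= \frac{e^{-z}}{(1+e^{-z})^2} \\ &= \frac{1}{1+e^{-z}}\left(1-\frac{1}{1+e^{-z}}\right) \\ &= g(z)\,(1-g(z)) \end{aligned}$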
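The identity can be sanity-checked numerically. The sketch below, assuming NumPy and a hypothetical helper `sigmoid`, compares a central finite difference of $g$ with $g(z)(1-g(z))$ on a grid of points:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6.0, 6.0, 25)
step = 1e-5

# Central finite difference approximating g'(z)
numeric = (sigmoid(z + step) - sigmoid(z - step)) / (2 * step)
# Closed form from the identity g'(z) = g(z)(1 - g(z))
analytic = sigmoid(z) * (1 - sigmoid(z))

max_err = np.max(np.abs(numeric - analytic))
print(max_err)  # only finite-difference and rounding error remain
```

The derivative peaks at $z = 0$, where $g'(0) = 0.5 \times 0.5 = 0.25$.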

2. The Cost Function

Consider the cost function used in linear regression:
$J(\theta) = \frac{1}{2m}\sum_{i = 1}^m(h_\theta(x^{(i)})-y^{(i)})^2$
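Because the sigmoid is nonlinear, substituting $h_\theta(x) = g(\theta^Tx)$ into this $J(\theta)$ yields a function that is non-convex in $\theta$. In other words, $J(\theta)$ may have multiple local optima, so when we minimize $J(\theta)$ with gradient descent we may end up at a local optimum rather than the global one.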
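To make the non-convexity concrete, take a single hypothetical example with $x = 1$, $y = 1$ and a scalar $\theta$, so that $h_\theta(x) = g(\theta)$. The sketch estimates second derivatives by finite differences: the squared-error cost $\frac{1}{2}(g(\theta)-1)^2$ curves downward at some $\theta$ and upward at others (its second derivative has the sign of $3g(\theta)-1$), whereas $-\ln g(\theta)$, the $y = 1$ case of the logistic cost introduced below, has second derivative $g(\theta)(1-g(\theta)) > 0$ everywhere. The helper names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def second_diff(f, t, step=1e-3):
    """Central second difference approximating f''(t)."""
    return (f(t + step) - 2 * f(t) + f(t - step)) / step**2

# One hypothetical training example: x = 1, y = 1, so h_theta(x) = g(theta)
squared = lambda t: 0.5 * (sigmoid(t) - 1.0) ** 2   # squared-error cost
log_loss = lambda t: -np.log(sigmoid(t))            # logistic cost for y = 1

for t in (-3.0, 0.0, 2.0):
    # A negative value means the curve bends downward there (not convex)
    print(f"theta={t:+.1f}  squared'': {second_diff(squared, t):+.4f}"
          f"  log_loss'': {second_diff(log_loss, t):+.4f}")
```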

The cost function for logistic regression is given directly below; it can be derived by maximum likelihood estimation:
$Cost(h_\theta(x), y) = \begin{cases} -\ln(h_\theta(x))& ,y = 1 \\-\ln(1-h_\theta(x))&, y = 0 \end{cases}$
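The intuition behind this cost is as follows. When $y = 1$, $Cost(h_\theta(x), y) = -\ln(h_\theta(x))$ varies with $h_\theta(x)$ as shown:

[Figure: $-\ln(h_\theta(x))$ plotted against $h_\theta(x)$ for $y = 1$]

Since $y = 1$, $Cost \rightarrow 0$ as $h_\theta(x) \rightarrow 1$, and $Cost \rightarrow +\infty$ as $h_\theta(x) \rightarrow 0$. In other words, a confident but wrong prediction is penalized very heavily.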

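Similarly, when $y = 0$, $Cost(h_\theta(x), y) = -\ln(1-h_\theta(x))$ behaves as follows:

[Figure: $-\ln(1-h_\theta(x))$ plotted against $h_\theta(x)$ for $y = 0$]

Since $y = 0$, $Cost \rightarrow 0$ as $h_\theta(x) \rightarrow 0$, and $Cost \rightarrow +\infty$ as $h_\theta(x) \rightarrow 1$.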

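Combining the two cases, the cost function can be written more compactly as:
$Cost(h_\theta(x), y) = -y\ln(h_\theta(x))-(1-y)\ln(1-h_\theta(x))$
Setting $y = 1$ or $y = 0$ recovers the corresponding case above.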
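A quick check, with hypothetical helper names, that the compact form agrees with the two-case definition, and that confident wrong predictions ($h = 0.01$ with $y = 1$, or $h = 0.99$ with $y = 0$) cost far more than confident correct ones:

```python
import numpy as np

def cost_piecewise(h, y):
    """Per-example cost in its two-case form."""
    return -np.log(h) if y == 1 else -np.log(1 - h)

def cost_compact(h, y):
    """Per-example cost in the compact form -y ln(h) - (1 - y) ln(1 - h)."""
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

for h in (0.01, 0.5, 0.99):
    for y in (0, 1):
        same = np.isclose(cost_piecewise(h, y), cost_compact(h, y))
        print(f"h={h:.2f}  y={y}  cost={cost_compact(h, y):.4f}  same={same}")
```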

Averaging over $m$ training examples, the cost function $J(\theta)$ is:
$\begin{aligned} J(\theta) & = \frac{1}{m}\sum_{i=1}^m Cost(h_\theta(x^{(i)}), y^{(i)}) \\ & = -\frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\ln h_\theta(x^{(i)})+(1-y^{(i)})\ln\left(1-h_\theta(x^{(i)})\right)\right] \end{aligned}$
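A vectorized sketch of $J(\theta)$, assuming $X$ is the $m \times n$ design matrix with a leading column of ones and $y$ holds 0/1 labels. The `eps` clipping is an extra safeguard against $\ln 0$ that is not part of the formula; the function name and toy data are hypothetical. At $\theta = 0$ every $h_\theta(x^{(i)}) = 0.5$, so $J = \ln 2$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(theta, X, y, eps=1e-12):
    """J(theta) = -(1/m) * sum[y ln h + (1 - y) ln(1 - h)] over the m rows of X.

    X: (m, n) design matrix whose first column is all ones.
    y: (m,) labels in {0, 1}.
    eps: clips h away from 0 and 1 so the logs stay finite (added safeguard).
    """
    m = X.shape[0]
    h = np.clip(sigmoid(X @ theta), eps, 1 - eps)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# Hypothetical toy data: 4 samples, bias column plus one feature
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

print(compute_cost(np.zeros(2), X, y))           # ln 2, about 0.6931
print(compute_cost(np.array([0.0, 2.0]), X, y))  # lower: this theta fits the labels
```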

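It can be shown that this cost function equals the negative log-likelihood divided by $m$, so minimizing $J(\theta)$ gives the same result as maximizing the log-likelihood. Unlike the squared-error cost above, this $J(\theta)$ is convex in $\theta$.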
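A sketch of the argument, assuming the $m$ examples are independent and $h_\theta(x)$ is interpreted as $P(y = 1 \mid x; \theta)$:

```latex
\begin{aligned}
p(y \mid x; \theta) &= h_\theta(x)^{y}\,\bigl(1 - h_\theta(x)\bigr)^{1-y}, \qquad y \in \{0, 1\} \\
\ell(\theta) &= \ln \prod_{i=1}^m p\bigl(y^{(i)} \mid x^{(i)}; \theta\bigr)
  = \sum_{i=1}^m \Bigl[ y^{(i)} \ln h_\theta(x^{(i)}) + (1 - y^{(i)}) \ln\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr] \\
J(\theta) &= -\frac{1}{m}\,\ell(\theta)
\end{aligned}
```

Since $-\frac{1}{m}$ is a negative constant, minimizing $J(\theta)$ and maximizing $\ell(\theta)$ give the same $\theta$.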

3. Gradient Descent

To minimize $J(\theta)$, take the partial derivative with respect to each parameter $\theta_j$ (using $g'(z) = g(z)(1-g(z))$ from Section 1):
$\frac{\partial}{\partial \theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$
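One way to confirm the formula is a gradient check: compare it with central finite differences of $J(\theta)$. The sketch below uses randomly generated, hypothetical data, and the helper names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def analytic_grad(theta, X, y):
    # (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i), for every j at once
    return X.T @ (sigmoid(X @ theta) - y) / X.shape[0]

def numeric_grad(theta, X, y, eps=1e-6):
    # Central differences, perturbing one theta_j at a time
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        grad[j] = (compute_cost(theta + e, X, y) - compute_cost(theta - e, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])  # hypothetical data
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

print(np.max(np.abs(analytic_grad(theta, X, y) - numeric_grad(theta, X, y))))
```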

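Gradient descent therefore repeats the following update, where $\alpha$ is the learning rate:

$\qquad$ Repeat until convergence {
$\displaystyle \qquad\qquad \theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j$
$\qquad$ } (update all $\theta_j$ simultaneously, for $j = 0, 1, \dots, n$)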

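On the surface, the gradient descent update for logistic regression looks identical to that of linear regression, but here $h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}$, whereas in linear regression the hypothesis is $h_\theta(x)=\theta^Tx$.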
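A minimal batch gradient descent sketch in NumPy (the function name, learning rate, iteration count and toy data are all assumptions). It computes the whole gradient vector first and then updates every $\theta_j$ from that same vector, so no update uses a partially modified $\theta$. The toy data are deliberately not linearly separable: on separable data the unregularized $\theta$ keeps growing in magnitude instead of settling.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent for logistic regression.

    X: (m, n) with a leading column of ones; y: (m,) labels in {0, 1}.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m  # h_theta uses the sigmoid
        theta -= alpha * grad                      # all theta_j updated together
    return theta

# Hypothetical, non-separable 1-D data with a bias column
X = np.array([[1.0, -3.0], [1.0, -2.0], [1.0, -1.0], [1.0, 0.5],
              [1.0, -0.5], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

theta = gradient_descent(X, y)
pred = (sigmoid(X @ theta) >= 0.5).astype(float)
print(theta, (pred == y).mean())
```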

4. A Worked Example of Logistic Regression

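Consider the following binary classification problem (the dataset is available on GitHub):

[Figure: scatter plot of the dataset]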

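The dataset consists of points in the plane, each belonging to one of two classes. To classify them with logistic regression, we set up
$X = \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} \\ \vdots & \vdots & \vdots \\ 1 & x_1^{(m)} & x_2^{(m)} \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$
where each row of $X$ is one sample $(x^{(i)})^T$, with $x_0^{(i)} = 1$ as the intercept term.
The partial derivatives of the cost function are: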
$\begin{cases} \displaystyle \frac{\partial}{\partial \theta_0}J(\theta) = \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \\\displaystyle \frac{\partial}{\partial \theta_1}J(\theta) = \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_1^{(i)} \\\displaystyle \frac{\partial}{\partial \theta_2}J(\theta) = \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_2^{(i)} \end{cases}$
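The three partial derivatives are the components of one gradient vector. A small check with hypothetical data that computing them element by element, as written above, gives the same result as the single matrix product $\frac{1}{m}X^T\bigl(g(X\theta) - y\bigr)$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: m = 5 points, columns x_0 = 1, x_1, x_2
X = np.array([[1.0, 0.5, 1.2],
              [1.0, -1.0, 0.3],
              [1.0, 2.0, -0.7],
              [1.0, 0.0, 2.5],
              [1.0, -1.5, -1.1]])
y = np.array([[1.0], [0.0], [1.0], [1.0], [0.0]])  # column vector, shape (m, 1)
theta = np.array([[0.1], [-0.2], [0.3]])           # shape (3, 1)
m = X.shape[0]

# Explicit loops: one partial derivative and one sample at a time
loop_grad = np.zeros((3, 1))
for j in range(3):
    for i in range(m):
        h_i = sigmoid(X[i] @ theta[:, 0])
        loop_grad[j, 0] += (h_i - y[i, 0]) * X[i, j] / m

# All three at once: (1/m) X^T (g(X theta) - y)
matrix_grad = X.T @ (sigmoid(X @ theta) - y) / m

print(np.allclose(loop_grad, matrix_grad))  # True
```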
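All of the formulas above can also be written as matrix operations, which is considerably more efficient in NumPy than looping over samples: with $y = (y^{(1)}, \dots, y^{(m)})^T$, the gradient is $\nabla_\theta J(\theta) = \frac{1}{m}X^T\left(g(X\theta) - y\right)$, where $g$ is applied elementwise. The code is as follows: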

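import matplotlib.pyplot as plt
import numpy as np


class logisRegression(object):
    """Binary logistic regression trained by batch gradient descent."""

    def __init__(self, theta, alpha):
        self.theta = theta  # parameter vector, shape (n, 1)
        self.alpha = alpha  # learning rate
        self.cos = 0        # most recent gradient of J(theta)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def cost(self, trainX, trainY):
        # Despite its name, this computes the gradient of J(theta):
        # (1/m) * X^T (g(X theta) - y), with shape (n, 1).
        # m is taken from the data instead of a global variable.
        m = trainX.shape[0]
        self.cos = trainX.T.dot(self.sigmoid(trainX.dot(self.theta)) - trainY) / m

    def update(self, trainX, trainY):
        # One gradient descent step: theta := theta - alpha * gradient
        self.cost(trainX, trainY)
        self.theta = self.theta - self.alpha * self.cos


def plotFig(lr, data):
    # Split the two classes by the label column instead of fixed row ranges
    neg = data[data[:, -1] == 0]
    pos = data[data[:, -1] == 1]
    plt.plot(neg[:, 0], neg[:, 1], 'x')
    plt.plot(pos[:, 0], pos[:, 1], 'x')
    plt.xlim([-3, 5])
    plt.ylim([-1, 5])
    # Evaluate h_theta on a grid and draw its 0.5 level set (the decision boundary)
    a = np.linspace(-3, 5, 200)
    b = np.linspace(-1, 5, 200)
    x1, x2 = np.meshgrid(a, b)
    z = lr.sigmoid(lr.theta[0] + lr.theta[1] * x1 + lr.theta[2] * x2)
    plt.contour(x1, x2, z, [0.5], linewidths=0.5, alpha=0.4)
    plt.show()


if __name__ == '__main__':
    # Expected file layout: one sample per row, columns x1, x2, y (y in {0, 1})
    data = np.loadtxt("1.txt")
    m = data.shape[0]
    # Number of parameters: one per feature plus the bias, which equals the
    # column count because the last column holds the label
    n = data.shape[1]
    trainX = np.ones(data.shape)
    trainX[:, 1:] = data[:, :-1]  # column 0 stays 1 (x_0 = 1)
    trainY = data[:, -1].reshape(m, 1)
    theta = np.zeros((n, 1))
    LR = logisRegression(theta, 0.03)
    for i in range(30000):
        LR.update(trainX, trainY)
    print(LR.theta)
    plotFig(LR, data)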
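The script ends by plotting the $h_\theta(x) = 0.5$ contour. Because $g(z) \ge 0.5$ exactly when $z \ge 0$, that contour is the straight line $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$, and a prediction needs only the sign of $\theta^T x$. The sketch below shows both; the helper names `predict` and `boundary_x2` and the parameter values are hypothetical stand-ins for the $\theta$ printed by the training script.

```python
import numpy as np

def predict(theta, X):
    """Predict 1 where h_theta(x) >= 0.5, else 0.

    g(z) >= 0.5 exactly when z >= 0, so only the sign of theta^T x is checked.
    X: (m, 3) with a leading column of ones; theta: shape (3,) or (3, 1).
    """
    return (X @ np.ravel(theta) >= 0).astype(int)

def boundary_x2(theta, x1):
    """x2 on the line theta_0 + theta_1 x1 + theta_2 x2 = 0 (requires theta_2 != 0)."""
    t0, t1, t2 = np.ravel(theta)
    return -(t0 + t1 * x1) / t2

# Hypothetical parameters with the same (3, 1) shape as LR.theta
theta = np.array([[-3.0], [1.0], [1.0]])
X = np.array([[1.0, 0.0, 0.0], [1.0, 2.0, 2.0], [1.0, 1.0, 2.0]])

print(predict(theta, X))                          # [0 1 1]
print(boundary_x2(theta, np.array([0.0, 1.0])))   # [3. 2.]
```

Drawing this line directly avoids the grid resolution that `plt.contour` depends on.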