Logistic Regression

影灵衣

Category: machine learn. Tags: logistic regression, python

Article link: https://blog.csdn.net/ZAQ1018472917/article/details/85036983

Machine Learning Notes - Andrew Ng - Table of Contents

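Notes

Description

What is logistic regression?

What is logistic regression used for?

What are the related terms?
Decision boundary, sigmoid function, cost function

Note that although it is called regression, this is a classification algorithm.
(The name comes from the fact that its computation resembles linear regression.)

Decision boundary: the main goal of logistic regression is to obtain a decision boundary. Based on the patterns in the dataset, we first construct a hypothesis function and then obtain its coefficients by training. Finally, the class of a test sample is decided by which side of this boundary it falls on.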
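
As a small illustration of what "which side of the boundary" means, the sketch below uses hand-picked, hypothetical parameters (theta = [-3, 1, 1], not learned from any data), so the boundary is the line x1 + x2 = 3. The function name `predict_side` is made up for this example.

```python
import numpy as np

# Hypothetical parameters chosen for illustration only (not trained):
# theta_0 = -3, theta_1 = 1, theta_2 = 1, so the boundary is x1 + x2 = 3.
theta = np.array([-3.0, 1.0, 1.0])

def predict_side(x1, x2):
    """Return 1 if the point lies on the positive side of the boundary, else 0."""
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    # The linear score z changes sign exactly at the boundary
    return int(z >= 0)

print(predict_side(1.0, 1.0))  # point below the line x1 + x2 = 3
print(predict_side(3.0, 2.0))  # point above the line
```

Training only changes the values in `theta`; the decision rule itself stays the same.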

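Sigmoid function: the labels in our dataset are 0 or 1, so the output of the hypothesis function has to support a yes/no decision. If we simply kept using a linear function, then when the training data are widely spread out, the linear hypothesis could no longer classify correctly near the boundary.
The figure below shows the problem that arises when a linear function is used for classification.
Figure: the problem with classification by a linear function
The sigmoid function is chosen as: $g(z) = \frac{1}{1 + e^{-z}}$
So the hypothesis function is: $h_\theta (x) = \frac{1}{1 + e^{- \theta^T x}}$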
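
A minimal sketch of these two formulas in NumPy follows. The helper names `sigmoid` and `hypothesis` are chosen for this example, and `x` is assumed to already contain the constant feature x_0 = 1.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta^T x); x is assumed to include x_0 = 1."""
    return sigmoid(np.dot(theta, x))

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid(z))  # values close to 0, exactly 0.5, and close to 1
```

The output equals 0.5 exactly where theta^T x = 0, which is why that set of points forms the decision boundary.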

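Cost function: if we kept computing the cost the way linear regression does (squared error), the cost function could turn out to be non-convex. Its graph would then have many small dips, and gradient descent could get stuck in a local optimum instead of reaching the result we want.
To avoid this, we choose a different cost function. Since the labels in the dataset are only 0|1, the cost can be computed separately for each of the two label values. The sigmoid function takes values in (0, 1), so we can use the part of the log function on (0, 1).
A suitable cost function is:
$cost(h_\theta (x), y) = \begin{cases} -\log(h_\theta (x)), & y = 1 \\ -\log(1 - h_\theta (x)), & y = 0 \end{cases}$
Merging the two cases for easier computation: $cost(h_\theta (x), y) = - y \log(h_\theta (x)) - (1 - y)\log(1 - h_\theta (x))$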
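
To see that the merged expression is the same as the two-case definition, here is a short check over a few sample values. The function names `cost_piecewise` and `cost_merged` are illustrative only.

```python
import numpy as np

def cost_piecewise(h, y):
    """Two-case definition: -log(h) if y == 1, otherwise -log(1 - h)."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

def cost_merged(h, y):
    """Single expression: one of the two terms vanishes because y is 0 or 1."""
    return -y * np.log(h) - (1 - y) * np.log(1.0 - h)

for h in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert np.isclose(cost_piecewise(h, y), cost_merged(h, y))
print("merged and piecewise costs agree")
```

The merged form is the one that vectorizes cleanly over a whole training set.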

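Multi-class classification

To extend this to multi-class classification, simply treat the class of interest as the positive set and everything else as the negative set, then apply binary logistic regression. Repeat this once for each class.
Figure: multi-class logistic regression

To make a prediction for a test sample, run it through every binary classifier trained this way and choose the class with the highest confidence.
$\arg\max_i \, h_\theta^{(i)} (x)$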
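
The prediction step can be sketched as follows, assuming the K binary classifiers have already been trained and their parameters are stacked as rows of a matrix. The parameter values below are made up for illustration, and `predict_one_vs_rest` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_rest(thetas, x):
    """thetas: (K, n+1) array, one row per binary classifier.
    x: (n+1,) feature vector with x_0 = 1.
    Returns the index of the most confident classifier and all K scores."""
    scores = sigmoid(thetas @ x)   # h^{(i)}(x) for each class i
    return int(np.argmax(scores)), scores

# Hypothetical parameters for 3 classes and 2 features (illustration only)
thetas = np.array([[0.0,  2.0, -1.0],
                   [0.0, -1.0,  2.0],
                   [1.0, -1.0, -1.0]])
x = np.array([1.0, 3.0, 0.5])
label, scores = predict_one_vs_rest(thetas, x)
print(label, scores)
```

Because the sigmoid is monotonic, taking the argmax of the raw scores `thetas @ x` would select the same class.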

Key Points

Decision boundary
Hypothesis function:
$h_\theta (x) = \frac{1}{1 + e^{- \theta^T x}}$
Cost function:
$cost(h_\theta (x), y) = - y \log(h_\theta (x)) - (1 - y)\log(1 - h_\theta (x))$
Partial derivative of the overall cost $J(\theta) = \frac{1}{m} \sum_{i=1}^m cost(h_\theta (x^{(i)}), y^{(i)})$ (it has exactly the same form as in linear regression):
$\frac{\partial}{\partial \theta_j} J (\theta) = \frac{1}{m} \sum_{i=1}^m (h_\theta (x^{(i)}) - y^{(i)}) x_j^{(i)}$

Gradient descent algorithm
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J (\theta)$
$(j = 0, 1, 2, \ldots, n)$
Note: for j = 0, $x_0 = 1$, which gives the bias (intercept) term of the function
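
A compact sketch of this update rule on made-up one-dimensional data is shown below. It assumes the gradient (1/m) * sum((h - y) * x_j), uses a row-per-sample layout with a leading column of ones for x_0, and picks the learning rate and iteration count arbitrarily. It is separate from the class-based implementation later in the post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iter=1000):
    """X: (m, n+1) with a leading column of ones (x_0 = 1); y: (m,) of 0/1 labels."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(num_iter):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / m      # (1/m) * sum((h - y) * x_j) for every j
        theta = theta - alpha * grad  # simultaneous update of all theta_j
    return theta

# Toy, made-up data: the label is 1 for positive x
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0, 0, 0, 1, 1, 1])

theta = gradient_descent(X, y)
pred = (sigmoid(X @ theta) >= 0.5).astype(int)
print(theta, pred)
```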

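Code

Python implementation

Most of this is the same as for linear regression. Only the cost function differs, along with a few small changes, and the sigmoid computation is written out explicitly.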

import numpy as np

class LogisticRegression:
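    '''
    Logistic regression
    # ! Places that differ from linear regression are marked with '# !'

    Attributes:
        X - training set (features should be scaled to a suitable range, with each sample arranged as a column vector)
        Y - training labels (values are 0|1)
        W - matrix of hypothesis parameters (w1, w2, w3 ...) (W_j corresponds to theta_j, j=1,2,3...)
        b - bias parameter of the hypothesis (the coefficient of x0 = 1) (b corresponds to theta_0)
        learning_rate - learning rate
        num_iter - number of iterations
        costs - list of recorded cost values (optional)
    
    Usage:
        lg = LogisticRegression()
        lg.init(X_train, Y_train)
        lg.train(0.001, 2000)
        predicted = lg.predict(X_test)
    '''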
    X = 0
    Y = 0
    W = 0
    b = 0
    learning_rate = 0
    num_iter = 0
    costs = []


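    # Initialize variables
    def init(self, X, Y):
        '''
        Load the training set and set some initial values

        Args:
            X - training set, shape (n_features, m)
            Y - training labels, shape (1, m)
        '''
        self.X = X
        self.Y = Y
        self.W = np.zeros(shape = (X.shape[0], 1))
        self.b = 0
        self.costs = []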


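    # Derivative of the cost function J
    # h(x) = sigmoid(W^T * x + b)
    def partial_derivative(self):
        '''
        Compute the (hand-derived) partial derivatives used in the second half of the gradient descent update

        Returns:
            dW, db - partial derivatives with respect to the hypothesis parameters
        '''

        m = self.X.shape[1]

        # Hypothesis function (forward propagation)
        # ! Unlike linear regression, this is used for classification, so the hypothesis differs. The values of the training set X must be preprocessed into a suitable range (e.g. -0.5~0.5 or 0~1), otherwise learning may not work properly.
        # Feature scaling: in theory and in practice, it is best to scale features into the middle region of the sigmoid where it changes quickly, so that the scale matches the learning rate and convergence is not made difficult.
        # (e.g. image pixel values lie in 0~255; without feature scaling, most values drive the sigmoid output very close to 1, where it is nearly flat, and learning becomes extremely slow)
        H = 1 / (1 + np.exp(-(np.dot(self.W.T, self.X) + self.b)))

        # Compute and record the cost (optional; only used to observe how gradient descent progresses)
        # ! The squared-error cost based on (h - y) works well for linear regression but cannot be used here: the hypothesis h is non-linear, so that cost may be non-convex and gradient descent may only find a local optimum rather than the global one
        cost = (-1 / m) * np.sum(self.Y * np.log(H) + (1 - self.Y) * np.log(1 - H))
        self.costs.append(cost)

        # Partial derivatives (backward propagation)
        # ! The cost is computed differently from linear regression (to avoid a non-convex function), so one would expect the derivative to differ as well
        # ? I did not understand why Andrew Ng (lecture 50) gets the same expression for the partial derivative of J as in linear regression
        # ?! This GitHub notebook gives a proof; fortunately the result does match the derivative of J in linear regression: https://github.com/halfrost/Halfrost-Field/blob/master/contents/Machine_Learning/Logistic_Regression.ipynb
        dW = 1 / m * np.dot(self.X, (H - self.Y).T)
        db = 1 / m * np.sum(H - self.Y)

        return dW, db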


    # Gradient descent
    # temp0 = W - alpha * partial_derivative(J0(W, b))
    # temp1 = b - alpha * partial_derivative(J1(W, b))
    # ...
    def gradient_descent(self):
        '''
        Run gradient descent. Formula: W_j = W_j - alpha * partial_derivative(J_j(W_j, b)), j = 1,2,3...
        '''

        for i in range(self.num_iter):
            dW, db = self.partial_derivative()
            
            # Gradient descent step: update the parameters W and b
            self.W = self.W - self.learning_rate * dW
            self.b = self.b - self.learning_rate * db


    # Start training
    def train(self, learning_rate = 0, num_iter = 0):
        '''
        Start training
        Args:
            learning_rate - learning rate
            num_iter - number of iterations
        '''
        self.learning_rate = learning_rate
        self.num_iter = num_iter

        self.gradient_descent()


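    # Prediction
    def predict(self, X):
        '''
        Predict labels for dataset X
        Args:
            X - test dataset
        Returns:
            predicted - predictions for the test dataset X
        '''
        # Plug the parameters W and b into the hypothesis to predict on the test set
        # ! Unlike linear regression, the test data is passed through the hypothesis function and then binarized manually
        predicted = 1 / (1 + np.exp(-(np.dot(self.W.T, X) + self.b)))
        # Binarize the result by rounding to 0 or 1
        predicted = np.round(predicted)
        # The np.int alias was removed in NumPy 1.24; the built-in int is equivalent here
        predicted = predicted.astype(int)
        
        return predicted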

Test

The test set consists of images of the digits 0~1

# Digit classification
# This example only recognizes the digit 0

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import os
from logistic_regression import LogisticRegression

if __name__ == '__main__':
    # Dataset paths
    train_dataset_path = '../datasets/digital_datasets/train_images/'
    test_dataset_path = '../datasets/digital_datasets/test_images/'

    def load_dataset(dataset_path):
        images = []
        targets = []

        path_root = os.listdir(dataset_path)
        for path_root_dir in path_root: # each subdirectory
            path_root_x = dataset_path + path_root_dir + '/'
            path_root_root = os.listdir(path_root_x)    # files in the subdirectory
            train_sets_path = [path_root_x + filename for filename in path_root_root]
            for path in train_sets_path:
                image = Image.open(path)    # load the image
                image_array = np.array(image)   # convert to a matrix
                image_array_ravel = image_array.ravel() # flatten to 1-D
                image_array_ravel_scale = image_array_ravel / 255   # scale to 0~1
                images.append(image_array_ravel_scale)

                parts = path.split('/')
                targets.append(1.0 if parts[-2] == '0' else 0.0) # images outside the digit-0 directory get label 0
                
        X = np.stack(images)                 # samples, shape (m, n_features)
        Y = np.array(targets, ndmin=2).T     # labels, shape (m, 1)
        return X.T, Y.T


    # Load the data
    train_set_x, train_set_y = load_dataset(train_dataset_path)
    test_set_x, test_set_y = load_dataset(test_dataset_path)

    # Logistic regression
    N = 2000    # number of iterations
    lr = LogisticRegression()
    lr.init(train_set_x, train_set_y)
    lr.train(0.003, N)
    predicted = lr.predict(test_set_x)

    # Show predictions next to the true labels, and the accuracy
    print('Predictions: ', end = '')
    print(predicted)
    print('Test labels: ', end = '')
    print(test_set_y)
    print(f'Accuracy: {np.mean(np.equal(test_set_y, predicted)) * 100}%')

    # Plot the cost over the iterations
    plt.plot(range(N), lr.costs)
    plt.show()

Training set

The training and test data were copied directly from someone else.
I have put them on GitHub: digit images

Appendix

Derivation of the logistic regression gradient¹
First differentiate the logistic (sigmoid) function:
$\sigma(x)'=\left(\frac{1}{1+e^{-x}}\right)'=\frac{-(1+e^{-x})'}{(1+e^{-x})^2}=\frac{-1'-(e^{-x})'}{(1+e^{-x})^2}=\frac{0-(-x)'(e^{-x})}{(1+e^{-x})^2}=\frac{-(-1)(e^{-x})}{(1+e^{-x})^2}=\frac{e^{-x}}{(1+e^{-x})^2} \\ =\left(\frac{1}{1+e^{-x}}\right)\left(\frac{e^{-x}}{1+e^{-x}}\right)=\sigma(x)\left(\frac{+1-1 + e^{-x}}{1+e^{-x}}\right)=\sigma(x)\left(\frac{1 + e^{-x}}{1+e^{-x}} - \frac{1}{1+e^{-x}}\right)\\ =\sigma(x)(1 - \sigma(x))$
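
As a quick sanity check of this result, the sketch below compares the closed form sigma(x)(1 - sigma(x)) against a central-difference approximation. The step size `eps` and the sample points are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)
eps = 1e-6

# Central-difference approximation of the derivative
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
# Closed form from the derivation above
analytic = sigmoid(x) * (1.0 - sigmoid(x))

print(np.max(np.abs(numeric - analytic)))  # should be very small
```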

Then, using the result above together with the chain rule for composite functions:
$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \frac{-1}{m}\sum_{i=1}^m \left [ y^{(i)} log (h_\theta(x^{(i)})) + (1-y^{(i)}) log (1 - h_\theta(x^{(i)})) \right ] \newline= - \frac{1}{m}\sum_{i=1}^m \left [ y^{(i)} \frac{\partial}{\partial \theta_j} log (h_\theta(x^{(i)})) + (1-y^{(i)}) \frac{\partial}{\partial \theta_j} log (1 - h_\theta(x^{(i)}))\right ] \newline= - \frac{1}{m}\sum_{i=1}^m \left [ \frac{y^{(i)} \frac{\partial}{\partial \theta_j} h_\theta(x^{(i)})}{h_\theta(x^{(i)})} + \frac{(1-y^{(i)})\frac{\partial}{\partial \theta_j} (1 - h_\theta(x^{(i)}))}{1 - h_\theta(x^{(i)})}\right ] \newline= - \frac{1}{m}\sum_{i=1}^m \left [ \frac{y^{(i)} \frac{\partial}{\partial \theta_j} \sigma(\theta^T x^{(i)})}{h_\theta(x^{(i)})} + \frac{(1-y^{(i)})\frac{\partial}{\partial \theta_j} (1 - \sigma(\theta^T x^{(i)}))}{1 - h_\theta(x^{(i)})}\right ] \newline= - \frac{1}{m}\sum_{i=1}^m \left [ \frac{y^{(i)} \sigma(\theta^T x^{(i)}) (1 - \sigma(\theta^T x^{(i)})) \frac{\partial}{\partial \theta_j} \theta^T x^{(i)}}{h_\theta(x^{(i)})} + \frac{- (1-y^{(i)}) \sigma(\theta^T x^{(i)}) (1 - \sigma(\theta^T x^{(i)})) \frac{\partial}{\partial \theta_j} \theta^T x^{(i)}}{1 - h_\theta(x^{(i)})}\right ] \newline= - \frac{1}{m}\sum_{i=1}^m \left [ \frac{y^{(i)} h_\theta(x^{(i)}) (1 - h_\theta(x^{(i)})) \frac{\partial}{\partial \theta_j} \theta^T x^{(i)}}{h_\theta(x^{(i)})} - \frac{(1-y^{(i)}) h_\theta(x^{(i)}) (1 - h_\theta(x^{(i)})) \frac{\partial}{\partial \theta_j} \theta^T x^{(i)}}{1 - h_\theta(x^{(i)})}\right ] \newline= - \frac{1}{m}\sum_{i=1}^m \left [ y^{(i)} (1 - h_\theta(x^{(i)})) x^{(i)}_j - (1-y^{(i)}) h_\theta(x^{(i)}) x^{(i)}_j\right ] \newline= - \frac{1}{m}\sum_{i=1}^m \left [ y^{(i)} (1 - h_\theta(x^{(i)})) - (1-y^{(i)}) h_\theta(x^{(i)}) \right ] x^{(i)}_j \newline= - \frac{1}{m}\sum_{i=1}^m \left [ y^{(i)} - y^{(i)} h_\theta(x^{(i)}) - h_\theta(x^{(i)}) + y^{(i)} h_\theta(x^{(i)}) \right ] x^{(i)}_j \newline= - 
\frac{1}{m}\sum_{i=1}^m \left [ y^{(i)} - h_\theta(x^{(i)}) \right ] x^{(i)}_j \newline= \frac{1}{m}\sum_{i=1}^m \left [ h_\theta(x^{(i)}) - y^{(i)} \right ] x^{(i)}_j$

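So, fortunately, the partial derivative has essentially the same form as in linear regression (only $h_\theta$ is different), and the gradient expression from linear regression can be reused for logistic regression.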
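
The final expression can also be checked numerically. The sketch below uses randomly generated data (seeded, purely illustrative) with a row-per-sample layout, and compares the analytic gradient (1/m) * sum((h - y) * x_j) with finite differences of J. The function names `cost` and `grad` are chosen for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta): mean of the per-sample log costs."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    """Analytic gradient: (1/m) * X^T (h - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / X.shape[0]

# Random illustrative data: 20 samples, 2 features plus the x_0 = 1 column
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.size):
    step = np.zeros_like(theta)
    step[j] = eps
    numeric[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)

print(numeric, grad(theta, X, y))  # the two vectors should agree closely
```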

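An additional property of the sigmoid function
$cost(h_\theta (x), y) = \begin{cases} -\log(h_\theta (x)), & y = 1 \\ -\log(1 - h_\theta (x)), & y = 0 \end{cases}$
Because the sigmoid output lies in the interval 0~1, plotting the cost over that interval shows that the closer the prediction is to the actual label, the smaller the cost. Conversely, the cost rises sharply as the prediction moves away from the label, which is how the learning algorithm gets penalized.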
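
A few sample values make this behavior concrete. The helper name `sample_cost` and the chosen values of h are arbitrary.

```python
import numpy as np

def sample_cost(h, y):
    """Per-sample cost: -log(h) for y = 1, -log(1 - h) for y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

# For a true label y = 1, the cost grows as the prediction h moves toward 0
for h in (0.99, 0.5, 0.01):
    print(f"h = {h:.2f}  cost(y=1) = {sample_cost(h, 1):.3f}  cost(y=0) = {sample_cost(h, 0):.3f}")
```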