Machine Learning: Logistic Regression (very detailed, with derivations and code, easy to understand)

getlxc

Article link: https://blog.csdn.net/GetLxc/article/details/107994318

I. Introduction to Logistic Regression

1.1 What kind of algorithm is logistic regression?

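Logistic regression is a classification algorithm, not a regression algorithm. Its first stage works just like linear regression: it computes a linear combination of the input features. It then passes that value through an activation function (the sigmoid), and this extra step is what turns it into a classifier.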

1.2 The Sigmoid function

$\sigma(t)=\frac{1}{1+e^{-t}}$
[Figure: graph of the sigmoid function $\sigma(t)$]
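To get a feel for the shape of the curve, here is a minimal NumPy sketch that evaluates $\sigma$ at a few points. The function name `sigmoid` and the sample inputs are my own choices for illustration. It checks three properties: every output lies strictly between 0 and 1, $\sigma(0) = 0.5$, and $\sigma(-t) = 1 - \sigma(t)$.

```python
import numpy as np

def sigmoid(t):
    """Logistic function sigma(t) = 1 / (1 + e^(-t)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-t))

t = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
p = sigmoid(t)
print(p)

# Every output lies strictly between 0 and 1
print(np.all((p > 0) & (p < 1)))                 # True
# sigma(0) is exactly 0.5, the midpoint of the range
print(sigmoid(0.0))                              # 0.5
# Symmetry around the midpoint: sigma(-t) = 1 - sigma(t)
print(np.allclose(sigmoid(-t), 1 - sigmoid(t)))  # True
```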

What is this function for?

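Its domain is the whole real line, from negative to positive infinity, but its range is only (0, 1). So logistic regression first performs a linear regression and then uses this activation function to split the result into two classes. An output of 0.5 is the dividing line, which corresponds to an input of 0. Suppose there are two labels, 1 and 0: if the value computed by the linear part is greater than 0, the sample is assigned label 1; if it is less than 0, it is assigned label 0.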
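Because $\sigma$ is strictly increasing and $\sigma(0) = 0.5$, thresholding the probability at 0.5 gives the same labels as thresholding the linear score at 0. The small sketch below checks this on random scores, which stand in for hypothetical linear-regression outputs. Floating-point rounding could blur the two rules only for scores extremely close to 0, which random draws practically never hit.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
scores = rng.normal(scale=5.0, size=1000)   # hypothetical linear-part outputs

labels_from_prob = (sigmoid(scores) >= 0.5).astype(int)  # threshold the probability
labels_from_score = (scores >= 0).astype(int)            # threshold the raw score

print(np.array_equal(labels_from_prob, labels_from_score))
```

In practice this means the sigmoid is not needed to make a hard prediction. It matters for training, where the cost is defined in terms of $\hat{p}$.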

1.3 Summary

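We have m samples, each with n features, and each sample's y belongs to one of two labels. We want to train a model that assigns any new sample to one of these labels. The immediate goal is to find the n+1 parameters $\theta_i$ (an intercept $\theta_0$ plus one weight per feature). Combined with the sigmoid activation, these parameters should produce the correct classification: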
$\hat{p}=\sigma\left(\theta^{T} \cdot x_{b}\right)=\frac{1}{1+e^{-\theta^{T} \cdot x_{b}}}$
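Here $x_b$ is the feature vector $x$ with a constant 1 prepended, so that $\theta_0$ acts as the intercept. Since $\theta^{T} \cdot x_{b}$ ranges from negative to positive infinity, $\hat{p}$ always lies between 0 and 1.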
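As a concrete illustration of $\hat{p}=\sigma(\theta^{T} \cdot x_{b})$, the sketch below builds $x_b$ for a tiny made-up dataset ($m = 4$, $n = 2$) and applies a hypothetical $\theta$ with $n + 1 = 3$ entries. The numbers are invented for illustration and are not fitted values.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# m = 4 samples, n = 2 features (made-up numbers for illustration)
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 2.9],
              [5.9, 3.0]])
m, n = X.shape

# x_b prepends a constant 1 to each sample, so theta needs n + 1 entries:
# theta[0] is the intercept, theta[1:] are the feature weights
X_b = np.hstack([np.ones((m, 1)), X])
theta = np.array([-1.0, 2.0, -3.0])   # hypothetical parameter values

p_hat = sigmoid(X_b @ theta)          # one probability per sample
y_hat = (p_hat >= 0.5).astype(int)    # decision rule: p_hat >= 0.5 -> class 1
print(X_b.shape, p_hat.round(3), y_hat)
```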

II. How do we derive the cost function?

From the discussion above, we get the following decision rule:
$\hat{y}= \begin{cases}1, & \hat{p} \geq 0.5 \\ 0, & \hat{p} < 0.5\end{cases}$

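So the cost function should satisfy these criteria:
$\text { cost }=\left\{\begin{array}{l} \text { if } y=1 \text {, the smaller } \hat{p} \text {, the larger the cost} \\ \text { if } y=0 \text {, the larger } \hat{p} \text {, the larger the cost} \end{array}\right.$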

One function that satisfies these criteria is:
$\text { cost }=\left\{\begin{array}{ccc} -\log (\hat{p}) & \text { if } & y=1 \\ -\log (1-\hat{p}) & \text { if } & y=0 \end{array}\right.$
[Figure: graphs of $-\log(\hat{p})$ and $-\log(1-\hat{p})$ for $\hat{p}$ in (0, 1)]

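Explanation:
If a sample has y = 1 and the $\hat{p}$ we compute is close to 1, the prediction is correct and the cost should be close to 0, which the formula gives. If $\hat{p}$ is close to 0, the prediction is wrong and the penalty should be large, which the formula also gives: $-\log(\hat{p})$ grows without bound as $\hat{p} \to 0$.
If a sample has y = 0 and the $\hat{p}$ we compute is close to 0, the prediction is correct and the cost should be close to 0, which the formula gives. If $\hat{p}$ is close to 1, the prediction is wrong and the penalty should be large, which the formula also gives.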
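To see these four cases numerically, here is a small sketch of the piecewise cost. The helper name `cost_piecewise` is my own, used only for illustration. A confident correct prediction ($\hat{p} = 0.99$ with y = 1) costs about 0.01, while the same confidence in the wrong direction costs about 4.6.

```python
import numpy as np

def cost_piecewise(p_hat, y):
    """Per-sample cost: -log(p_hat) if y == 1, -log(1 - p_hat) if y == 0."""
    return -np.log(p_hat) if y == 1 else -np.log(1.0 - p_hat)

for p in [0.99, 0.5, 0.01]:
    print(f"p_hat={p:4}: cost(y=1)={cost_piecewise(p, 1):.3f}, "
          f"cost(y=0)={cost_piecewise(p, 0):.3f}")
```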
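In short, this cost function does what we need. Its only drawback is that it consists of two expressions and needs an if-check, so we merge them into one to obtain the final cost function. Substituting y = 1 and y = 0 confirms that it reduces to the two cases above: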
$\text { cost }=-y \log (\hat{p})-(1-y) \log (1-\hat{p})$
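A quick check that the merged formula agrees with the two-case version. Both helper names are illustrative. When y = 1 the second term vanishes, and when y = 0 the first one does. The merged form also works elementwise on NumPy arrays, which is the practical reason to prefer it.

```python
import numpy as np

def cost_piecewise(p_hat, y):
    return -np.log(p_hat) if y == 1 else -np.log(1.0 - p_hat)

def cost_merged(p_hat, y):
    # y = 1 kills the second term; y = 0 kills the first term
    return -y * np.log(p_hat) - (1 - y) * np.log(1.0 - p_hat)

for p in np.linspace(0.05, 0.95, 10):
    for y in (0, 1):
        assert np.isclose(cost_merged(p, y), cost_piecewise(p, y))
print("merged cost matches the piecewise cost")

# Works on whole arrays at once, no if-check needed
print(cost_merged(np.array([0.9, 0.2]), np.array([1, 0])))
```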
Averaged over all m samples, the cost function is:
$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(\hat{p}^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-\hat{p}^{(i)}\right)\right]$
where $X_{b}^{(i)}$ is the i-th row of $X_b$, the data matrix with a column of ones prepended:
$\hat{p}^{(i)}=\sigma\left(X_{b}^{(i)} \theta\right)=\frac{1}{1+e^{-X_{b}^{(i)} \theta}}$
Therefore:
$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(\sigma\left(X_{b}^{(i)} \theta\right)\right)+\left(1-y^{(i)}\right) \log \left(1-\sigma\left(X_{b}^{(i)} \theta\right)\right)\right]$
Substituting the definition of $\sigma$:
$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(\frac{1}{1+e^{-X_{b}^{(i)} \theta}}\right)+\left(1-y^{(i)}\right) \log \left(1-\frac{1}{1+e^{-X_{b}^{(i)} \theta}}\right)\right]$
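In code, the sum over i becomes a single vectorized expression. This is a minimal NumPy sketch of $J(\theta)$. The clipping constant `eps` is my own safety margin to keep `np.log` away from exactly 0; it is not part of the formula. With $\theta = 0$ every $\hat{p}^{(i)}$ equals 0.5, so the cost is $\log 2 \approx 0.693$, a useful sanity check.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def J(theta, X_b, y, eps=1e-15):
    """Average cross-entropy cost over m samples (vectorized form of the sum)."""
    p_hat = sigmoid(X_b @ theta)
    # Clip so log() never sees exactly 0 or 1 (eps is an assumed safety margin)
    p_hat = np.clip(p_hat, eps, 1 - eps)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

# Tiny made-up data; the first column of X_b is the constant 1
X_b = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1, 0, 1])
print(J(np.zeros(2), X_b, y))            # theta = 0 -> p_hat = 0.5 everywhere -> log(2)
print(J(np.array([0.0, 3.0]), X_b, y))   # a theta that separates this data has lower cost
```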

2.1 Minimizing the cost function

We minimize the cost function with gradient descent.

Gradient of the cost function:

$\frac{\partial J(\theta)}{\partial \theta_{j}}=\frac{1}{m} \sum_{i=1}^{m}\left(\sigma\left(X_{b}^{(i)} \theta\right)-y^{(i)}\right) X_{j}^{(i)}$
where $X_{j}^{(i)}$ is the j-th component of $X_{b}^{(i)}$ (so $X_{0}^{(i)}=1$).
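For readers who want the omitted steps, here is one way to carry out the derivation. It relies on the identity $\sigma'(t)=\sigma(t)(1-\sigma(t))$ and the chain rule, and uses the same symbols as the formulas above.

```latex
\begin{aligned}
\sigma'(t) &= \frac{e^{-t}}{\left(1+e^{-t}\right)^{2}} = \sigma(t)\bigl(1-\sigma(t)\bigr) \\
\frac{\partial}{\partial \theta_j}\log \sigma\left(X_b^{(i)}\theta\right) &= \bigl(1-\sigma(X_b^{(i)}\theta)\bigr)X_j^{(i)} \\
\frac{\partial}{\partial \theta_j}\log \bigl(1-\sigma(X_b^{(i)}\theta)\bigr) &= -\sigma(X_b^{(i)}\theta)\,X_j^{(i)} \\
\frac{\partial J(\theta)}{\partial \theta_j} &= -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\bigl(1-\sigma(X_b^{(i)}\theta)\bigr) - \bigl(1-y^{(i)}\bigr)\sigma(X_b^{(i)}\theta)\Bigr]X_j^{(i)} \\
&= \frac{1}{m}\sum_{i=1}^{m}\bigl(\sigma(X_b^{(i)}\theta) - y^{(i)}\bigr)X_j^{(i)}
\end{aligned}
```

The last step uses $y(1-\sigma) - (1-y)\sigma = y - \sigma$.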
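The derivation only needs high-school calculus (the chain rule). It is tedious but not difficult. The result should look familiar: it has the same form as the gradient of linear regression's mean-squared-error cost (up to a constant factor, depending on how the MSE is scaled), except that the prediction $X_{b}^{(i)} \theta$ is replaced by $\sigma\left(X_{b}^{(i)} \theta\right)$.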
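One way to gain confidence in a hand-derived gradient is a finite-difference check. The sketch below compares the vectorized gradient $\frac{1}{m}X_b^{T}(\sigma(X_b\theta) - y)$ against central differences of $J$ on random data. The helper `numerical_grad`, the step `h` and the random data are my own illustrative choices.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def J(theta, X_b, y):
    p_hat = sigmoid(X_b @ theta)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

def dJ(theta, X_b, y):
    # Analytic gradient: (1/m) * X_b^T (sigma(X_b theta) - y)
    return X_b.T @ (sigmoid(X_b @ theta) - y) / len(y)

def numerical_grad(f, theta, h=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = h
        grad[j] = (f(theta + e) - f(theta - e)) / (2 * h)
    return grad

rng = np.random.default_rng(42)
X_b = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])
y = rng.integers(0, 2, size=20)
theta = rng.normal(size=4)

analytic = dJ(theta, X_b, y)
numeric = numerical_grad(lambda t: J(t, X_b, y), theta)
print(np.max(np.abs(analytic - numeric)))   # should be close to 0
```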

III. Implementing logistic regression in code

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

class LogisticRegression:

    def __init__(self):
        """Initialize the Logistic Regression model"""
        self.coef_ = None
        self.intercept_ = None
        self._theta = None

    def _sigmoid(self, t):
        return 1. / (1. + np.exp(-t))

    def fit(self, X_train, y_train, eta=0.01, n_iters=1e4):
        """Train the Logistic Regression model with gradient descent on the training set X_train, y_train"""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"

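        def J(theta, X_b, y):
            # Cross-entropy cost. np.log(0) returns -inf with a warning instead of
            # raising, so a try/except cannot catch it; clip y_hat away from 0 and 1.
            y_hat = np.clip(self._sigmoid(X_b.dot(theta)), 1e-15, 1 - 1e-15)
            return - np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / len(y)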

        def dJ(theta, X_b, y):
            # Vectorized gradient: X_b^T (sigmoid(X_b theta) - y) / m
            return X_b.T.dot(self._sigmoid(X_b.dot(theta)) - y) / len(y)

        def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):

            theta = initial_theta
            cur_iter = 0

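            while cur_iter < n_iters:
                gradient = dJ(theta, X_b, y)
                last_theta = theta
                # Step against the gradient with learning rate eta
                theta = theta - eta * gradient
                # Stop early once the cost changes by less than epsilon
                if (abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):
                    break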

                cur_iter += 1

            return theta

        # Prepend a column of ones so that theta[0] acts as the intercept
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        initial_theta = np.zeros(X_b.shape[1])
        self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]

        return self

    def predict_proba(self, X_predict):
        """Given the dataset X_predict, return the vector of predicted probabilities for X_predict"""
        assert self.intercept_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        assert X_predict.shape[1] == len(self.coef_), \
            "the feature number of X_predict must be equal to X_train"

        X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])
        return self._sigmoid(X_b.dot(self._theta))

    def predict(self, X_predict):
        """Given the dataset X_predict, return the vector of predicted labels for X_predict"""
        assert self.intercept_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        assert X_predict.shape[1] == len(self.coef_), \
            "the feature number of X_predict must be equal to X_train"

        proba = self.predict_proba(X_predict)
        return np.array(proba >= 0.5, dtype='int')

    def score(self, X_test, y_test):
        """Compute the model's accuracy on the test set X_test, y_test"""

        y_predict = self.predict(X_test)
        return accuracy_score(y_test, y_predict)

    def __repr__(self):
        return "LogisticRegression()"


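# Binary task: keep only iris classes 0 and 1 (setosa and versicolor)
# and only the first two features, so the data can be plotted in 2-D
iris = datasets.load_iris()
X = iris.data
y = iris.target
X = X[y < 2, :2]
y = y[y < 2]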
plt.scatter(X[y == 0, 0], X[y == 0, 1], color="red")
plt.scatter(X[y == 1, 0], X[y == 1, 1], color="blue")
plt.show()


X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
print('Accuracy:', log_reg.score(X_test, y_test))
print(log_reg.predict(X_test))
print(y_test)
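The model predicts class 1 exactly when $\theta_0 + \theta_1 x_1 + \theta_2 x_2 \ge 0$. With two features, the boundary between the classes is therefore the straight line where this score equals 0, i.e. where $\hat{p} = 0.5$. Below is a minimal sketch for computing that line. The helper name `decision_boundary_x2` and the parameter values are hypothetical placeholders, not output from the run above.

```python
import numpy as np

def decision_boundary_x2(intercept, coef, x1):
    """For a 2-feature model, return the x2 on the line
    intercept + coef[0]*x1 + coef[1]*x2 = 0 (where p_hat = 0.5)."""
    return -(intercept + coef[0] * x1) / coef[1]

# Hypothetical parameters standing in for log_reg.intercept_ and log_reg.coef_
intercept, coef = -0.7, np.array([3.0, -5.0])
x1 = np.linspace(4.0, 7.5, 5)
x2 = decision_boundary_x2(intercept, coef, x1)

# Every point on the line has linear score 0, i.e. sits exactly at the threshold
scores = intercept + coef[0] * x1 + coef[1] * x2
print(np.allclose(scores, 0.0))
```

After running the script above, one could pass `log_reg.intercept_` and `log_reg.coef_` to this helper and draw the line with `plt.plot(x1, x2)` before `plt.show()` to see how the fitted model separates the two classes.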