Logistic Regression

Theory

The logistic function:
$$g(z)=\frac{1}{1+e^{-z}}\tag{1}$$

Visualizing the logistic function

import matplotlib.pyplot as plt
import numpy as np

# plot the logistic function g(z) = 1 / (1 + e^{-z}) over [-10, 10]
x = np.linspace(-10, 10, int(1e6))  # linspace needs an integer sample count, not the float 1e6
y = 1 / (1 + np.exp(-x))

plt.plot(x, y)
plt.show()


The hypothesis function:
$$h_{\theta}(x)=g(\theta^{T}X)=\frac{1}{1+e^{-\theta^{T}X}}\tag{2}$$
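A minimal Python sketch of equations (1) and (2) (the names `sigmoid` and `hypothesis` are mine, not from the original):

import numpy as np

def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^{-z}), equation (1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    # h_theta(x) = g(X theta), equation (2); X is assumed to carry a leading column of ones
    return sigmoid(X @ theta)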

The cost function:
$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\mathrm{Cost}\left(h_{\theta}(x^{(i)}),y^{(i)}\right)\tag{3}$$
where
$$\mathrm{Cost}\left(h_{\theta}(x^{(i)}),y^{(i)}\right)=\begin{cases}-\log\left(h_{\theta}(x)\right), & \text{if } y=1\\-\log\left(1-h_{\theta}(x)\right), & \text{if } y=0\end{cases}\tag{4}$$

Combining the two cases into a single expression:
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)+\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right]\tag{5}$$
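A vectorized Python sketch of cost (5) (the name `compute_cost` is assumed, not from the original):

import numpy as np

def compute_cost(theta, X, y):
    # cross-entropy cost J(theta), equation (5); X includes a leading column of ones
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m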

Plotting the per-example cost $\mathrm{Cost}$ against $h_{\theta}(x)$ for the two cases, we can see:

  • When $y=1$: if the prediction is $h_{\theta}(x)=1$, the cost $\mathrm{Cost}$ equals 0, which is what we want (the cost is minimal when the model predicts exactly right). The further the prediction moves away from 1, the larger the cost, which is also what we want.
  • Likewise, when $y=0$: the cost reaches its minimum when the prediction is 0, and grows as the prediction moves away from 0 (see the quick check below).
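A quick numeric sketch of the first case: for $y=1$ the per-example cost is $-\log(h_{\theta}(x))$, which shrinks toward 0 as the prediction approaches 1 and blows up as it approaches 0.

import numpy as np

# per-example cost -log(h) for y = 1 at a few predicted probabilities
for h in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(h, -np.log(h))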

Deriving the cost function (using maximum likelihood estimation)

The hypothesis $h_{\theta}(x)$ represents the probability that the prediction is 1, so:
$$\begin{aligned}P(y=1\mid x;\theta)&=h_{\theta}(x)\\P(y=0\mid x;\theta)&=1-h_{\theta}(x)\end{aligned}\tag{6}$$

The two cases of equation (6) can be merged into a single formula:
$$P(y\mid x;\theta)=h_{\theta}(x)^{y}\left(1-h_{\theta}(x)\right)^{1-y}\tag{7}$$
(When $y=1$ the second factor is 1 and (7) reduces to $h_{\theta}(x)$; when $y=0$ the first factor is 1 and it reduces to $1-h_{\theta}(x)$.)

The likelihood function is:
$$L(\theta)=\prod_{i=1}^{m}P\left(y^{(i)}\mid x^{(i)};\theta\right)=\prod_{i=1}^{m}\left(h_{\theta}(x^{(i)})\right)^{y^{(i)}}\left(1-h_{\theta}(x^{(i)})\right)^{1-y^{(i)}}\tag{8}$$

Taking the logarithm gives the log-likelihood:
$$l(\theta)=\log\left(L(\theta)\right)=\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)+\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right]\tag{9}$$

Maximum likelihood estimation picks the $\theta$ that maximizes the likelihood. Letting the loss function be $J(\theta)=-\frac{1}{m}l(\theta)$, maximizing $l(\theta)$ is equivalent to minimizing $J(\theta)$.

Gradient descent update rule:
$$\theta_j=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}\tag{10}$$

In matrix form:
$$\theta=\theta-\alpha\frac{1}{m}X^{T}\left(g(X\theta)-y\right)\tag{11}$$

The derivation is sketched below.
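Using the identity $g'(z)=g(z)\left(1-g(z)\right)$ and writing $h=h_{\theta}(x^{(i)})$ for brevity (my notation), the per-example term of (5) differentiates as:

$$\frac{\partial}{\partial\theta_j}\left[y^{(i)}\log h+\left(1-y^{(i)}\right)\log(1-h)\right]=\left(\frac{y^{(i)}}{h}-\frac{1-y^{(i)}}{1-h}\right)h(1-h)x_j^{(i)}=\left(y^{(i)}-h\right)x_j^{(i)}$$

so that

$$\frac{\partial}{\partial\theta_j}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

Stacking these components over $j$ gives the matrix form $\frac{1}{m}X^{T}\left(g(X\theta)-y\right)$ used in (11).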

Logistic regression with a penalty (regularization) term

$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}\tag{12}$$

Repeat until convergence:

$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$

$$\theta_j:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right],\qquad j=1,2,\ldots,n$$
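A sketch of one regularized gradient step matching these updates (the function name `regularized_step` is mine; note that the intercept $\theta_0$ is not penalized):

import numpy as np

def regularized_step(theta, X, y, alpha, lam):
    # one gradient-descent step for the regularized cost (12)
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = X.T @ (h - y) / m   # unregularized gradient, as in (11)
    penalty = (lam / m) * theta
    penalty[0] = 0             # theta_0 is excluded from the penalty
    return theta - alpha * (grad + penalty)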

Python implementation

import numpy as np

# toy dataset: two features per sample, binary labels
X = np.array([[1, 2], [3, 2], [1, 3], [2, 3], [3, 3], [3, 4], [10, 11], [9, 10], [12, 13], [14, 14], [13, 12]])
y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
n_samples, n_features = X.shape

# prepend a column of ones so theta[0] acts as the intercept
X = np.concatenate((np.ones(n_samples).reshape((n_samples, 1)), X), axis=1)
y = y.reshape((n_samples, 1))

max_iter = int(1e4)  # maximum number of iterations
epsilon = 1e-4       # stop when theta changes by less than epsilon between iterations
theta = np.zeros((n_features + 1, 1))  # initialize theta
alpha = 0.0001       # learning rate

for i in range(max_iter):
    # batch gradient-descent step, equation (11): theta -= alpha * X^T (g(X theta) - y) / m
    theta_next = theta - alpha * X.T @ (1 / (1 + np.exp(-X @ theta)) - y) / n_samples
    if np.abs(theta - theta_next).sum() < epsilon:
        theta = theta_next
        print("converged:", theta.ravel())
        break
    theta = theta_next
else:
    print("reached max_iter, stopping.")