Mathematical Formulas
(1) Logarithm rules
\log(M \cdot N) = \log M + \log N
\log M^N = N \log M
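Both identities can be sanity-checked numerically with NumPy (a minimal sketch; M and N are arbitrary positive numbers):
import numpy as np

M, N = 3.0, 7.0
assert np.isclose(np.log(M * N), np.log(M) + np.log(N))  # log(M*N) = logM + logN
assert np.isclose(np.log(M ** N), N * np.log(M))         # log(M^N) = NlogM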
Logistic Regression
Logistic Regression is a kind of generalized linear model; the classification hyperplane can be represented by a linear function:
Wx + b = y
where W is the weight and b is the bias term; with multi-dimensional inputs, W is a vector.
By learning from the training samples we obtain the hyperplane, and a threshold function then maps samples to the two classes (0 or 1).
A commonly used threshold function is the Sigmoid function:
f(x) = \frac{1}{1+e^{-x}}
As the formula shows, the function's range is (0, 1), and it changes most rapidly near 0.
The derivative of the Sigmoid is:
\sigma'(x) = \left(\frac{1}{1+e^{-x}}\right)' = \frac{-(1+e^{-x})'}{(1+e^{-x})^2} = \frac{-(1)'-(e^{-x})'}{(1+e^{-x})^2}
= \frac{0-(-x)'e^{-x}}{(1+e^{-x})^2} = \frac{e^{-x}}{(1+e^{-x})^2}
= \left(\frac{1}{1+e^{-x}}\right)\left(\frac{e^{-x}}{1+e^{-x}}\right)
= \sigma(x)\left(\frac{1+e^{-x}}{1+e^{-x}} - \frac{1}{1+e^{-x}}\right)
= \sigma(x)(1 - \sigma(x))
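The closed form \sigma'(x) = \sigma(x)(1-\sigma(x)) can be verified against a central finite-difference approximation (a minimal sketch, reusing the sig function from the implementation section below):
import numpy as np

def sig(x):
    return 1.0 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)
analytic = sig(x) * (1 - sig(x))                     # closed-form derivative
eps = 1e-6
numeric = (sig(x + eps) - sig(x - eps)) / (2 * eps)  # central difference
assert np.allclose(analytic, numeric)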
Loss Function
For an input vector x, the probability of belonging to the positive class is:
P(y=1) = \sigma(wx+b) = \frac{1}{1+e^{-(wx+b)}}
The probability of belonging to the negative class is:
P(y=0) = 1 - \sigma(wx+b)
By the Bernoulli probability function, the probability of belonging to class y is:
P(y) = \sigma(wx+b)^y(1-\sigma(wx+b))^{1-y}, \quad y = 0, 1
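The exponents simply select one of the two cases: y = 1 gives \sigma(wx+b)^1(1-\sigma(wx+b))^0 = \sigma(wx+b), and y = 0 gives 1 - \sigma(wx+b). A small illustration, where p stands for \sigma(wx+b) and is chosen arbitrarily:
p = 0.8  # sigma(wx + b) for some hypothetical sample
for y in (0, 1):
    print(y, p**y * (1 - p)**(1 - y))  # y=0 -> 0.2, y=1 -> 0.8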
Given the class-membership probability of each training sample, we multiply the per-sample class probabilities together and estimate the parameters by maximum likelihood. The likelihood function is:
L_\theta = \prod_{i=1}^m P_i(y) = \prod_{i=1}^m\left[h_\theta(x^i)^{y^i}(1-h_\theta(x^i))^{1-y^i}\right]
where h_\theta(x^i) = \sigma(wx^i + b).
To maximize the likelihood, it is convenient to work with the log-likelihood, which turns the product into a sum. Taking the negative log-likelihood (NLL) as the loss function, the task becomes minimizing the NLL. The loss function is:
-\log(L_\theta) = -\log\left(\prod_{i=1}^m\left[h_\theta(x^i)^{y^i}(1-h_\theta(x^i))^{1-y^i}\right]\right)
= -\sum_{i=1}^m \log\left(h_\theta(x^i)^{y^i}(1-h_\theta(x^i))^{1-y^i}\right)
= -\sum_{i=1}^m \left[\log\left(h_\theta(x^i)^{y^i}\right) + \log\left((1-h_\theta(x^i))^{1-y^i}\right)\right]
= -\sum_{i=1}^m \left[y^i\log(h_\theta(x^i)) + (1-y^i)\log(1-h_\theta(x^i))\right]
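In NumPy, this NLL can be written in vectorized form (a hedged sketch; h and y are assumed to be arrays of predicted probabilities and 0/1 labels):
import numpy as np

def nll(h, y):
    # negative log-likelihood summed over the m samples;
    # clipping avoids log(0) for saturated predictions
    h = np.clip(h, 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))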
To find the minimum of the loss function, we use gradient descent.
Gradient Descent
The loss function (the NLL averaged over the m samples) is:
J(\theta) = -\frac{1}{m}\sum_{i=1}^m\left[y^i\log(h_\theta(x^i)) + (1-y^i)\log(1-h_\theta(x^i))\right]
The gradient descent update rule is:
\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)
Substituting the loss function and working through the derivation: first, take the partial derivative of \theta^T x^{(i)} with respect to \theta_j:
\theta^T x^{(i)} = [\theta_0, \theta_1, \ldots, \theta_j, \ldots] \cdot x^{(i)} = \theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \cdots + \theta_j x_j^{(i)} + \cdots
The result is x_j^{(i)}. Applying the chain rule with \sigma'(z) = \sigma(z)(1-\sigma(z)) and simplifying then gives
\frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)}) - y^i\right)x_j^{(i)}
so the update rule becomes \theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)}) - y^i\right)x_j^{(i)}.
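This analytic gradient, \frac{1}{m}X^T(h_\theta(X) - y) in matrix form, can be checked against finite differences (a minimal sketch with made-up data):
import numpy as np

def sig(z):
    return 1.0 / (1 + np.exp(-z))

def loss(theta, X, y):
    h = sig(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))               # 20 samples, 3 features
y = (rng.random(20) > 0.5).astype(float)   # random 0/1 labels
theta = rng.normal(size=3)

analytic = X.T @ (sig(X @ theta) - y) / len(y)
eps = 1e-6
numeric = np.array([(loss(theta + eps * e, X, y) -
                     loss(theta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
assert np.allclose(analytic, numeric, atol=1e-6)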
Key Derivation Points
- Differentiation passes through constant factors, e.g. (3x)' = 3(x)'.
- The logarithm with base e is the natural logarithm, written \ln; (\ln x)' = 1/x.
- The derivative of the Sigmoid function is \sigma'(x) = \sigma(x)(1-\sigma(x))\,(x)' (the trailing (x)' is the inner derivative from the chain rule).
Python Implementation
# Code from the book 《Python机器学习算法》 (Python Machine Learning Algorithms)
import numpy as np

def sig(x):
    return 1.0 / (1 + np.exp(-x))

def lr_train_bgd(feature, label, maxCycle, alpha):
    '''Train an LR model with batch gradient descent
    input:  feature(mat)  feature matrix
            label(mat)    class labels
            maxCycle(int) maximum number of iterations
            alpha(float)  learning rate
    output: w(mat) weights
    '''
    n = np.shape(feature)[1]      # number of features
    w = np.mat(np.ones((n, 1)))   # initialize the weights to ones
    i = 0
    while i <= maxCycle:          # loop up to the maximum number of iterations
        i += 1                    # current iteration count
        h = sig(feature * w)      # Sigmoid of the linear scores
        err = label - h           # prediction error y - h
        if i % 100 == 0:
            print("\t---------iter=" + str(i) +
                  " , train error rate= " + str(error_rate(h, label)))
        w = w + alpha * feature.T * err  # weight update
    return w
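Note that error_rate is defined elsewhere in the book's source and is not shown here; judging by how it is called (and by the loss code further below), a stand-in consistent with that usage might be:
def error_rate(h, label):
    # mean negative log-likelihood over the m samples (assumed definition)
    m = np.shape(h)[0]
    sum_err = 0.0
    for i in range(m):
        sum_err -= (label[i, 0] * np.log(h[i, 0])
                    + (1 - label[i, 0]) * np.log(1 - h[i, 0]))
    return sum_err / m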
Code analysis:
(1) feature is the training data, with the bias term's feature value set to 1. The data looks like this:
(Pdb) feature[:10]
matrix([[1. , 4.459, 8.225],
[1. , 0.043, 6.307],
[1. , 6.997, 9.313],
[1. , 4.755, 9.26 ],
[1. , 8.662, 9.768],
[1. , 7.174, 8.695],
[1. , 0.134, 1.969],
[1. , 2.959, 5.805],
[1. , 0.162, 2.596],
[1. , 3.996, 8.833]])
label holds the class labels, with values 0 or 1:
(Pdb) label[:10]
matrix([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])
maxCycle is the maximum number of iterations, set to 1000; alpha is the learning rate, set to 0.01.
There are 3 features, so the weights are initialized as a vector of ones:
(Pdb) p w
matrix([[1.],
[1.],
[1.]])
(2) h = sig(feature * w) is the predicted value, corresponding to the expression:
h_\theta(x^i) = \sigma(wx^i + b) = \frac{1}{1+e^{-(wx^i+b)}}
(Pdb) p h[:10]
matrix([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])
(3) err = label - h corresponds to the expression:
y^i - h_\theta(x^i)
and feature.T * err corresponds to the expression:
x^i \cdot (y^i - h_\theta(x^i))
Stacked over all samples in matrix form this is exactly feature.T * err, so w = w + alpha * feature.T * err performs gradient descent on the loss (the 1/m factor is absorbed into the learning rate alpha).
(4) After updating the weights w, the loss value can be computed. The loss function is:
J(\theta) = -\frac{1}{m}\sum_{i=1}^m\left[y^i\log(h_\theta(x^i)) + (1-y^i)\log(1-h_\theta(x^i))\right]
The code implementation is:
sum_err = 0.0
for i in range(m):
    y_i = label[i, 0]
    sum_err -= (y_i * np.log(h[i, 0]) + (1 - y_i) * np.log(1 - h[i, 0]))
sum_err /= m
(5) After training finishes, we obtain the final weights:
(Pdb) p w
matrix([[ 1.394],
[ 4.527],
[-4.794]])
Prediction
Plug the test data into the prediction function h = sig(feature * w) to obtain the predicted values; a value < 0.5 is predicted as a negative example, and a value >= 0.5 as a positive example.
(Pdb) p h[:10]
matrix([[0. ],
[0. ],
[0.002],
[0. ],
[0.001],
[0. ],
[0.001],
[0.001],
[0. ],
[0. ]])
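Thresholding these probabilities at 0.5 yields the hard class labels (a one-line sketch):
pred = (h >= 0.5).astype(int)  # 1 = positive class, 0 = negative class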