python逻辑回归（logistic regression LR) 底层代码实现 BGD梯度下降算法二分类

最新推荐文章于 2024-09-07 09:51:34 发布

DiDiu520

最新推荐文章于 2024-09-07 09:51:34 发布

阅读量1.1k

点赞数 2

分类专栏： python 文章标签：逻辑回归 python 算法机器学习分类

本文链接：https://blog.csdn.net/Big_SHOe/article/details/121781208

版权

python 专栏收录该内容

18 篇文章 4 订阅

订阅专栏

逻辑回归

逻辑回归是机器学习中的经典分方法。虽然它称之为回归，但它是分类算法，这里我觉得此算法是做了一个决策边界的拟合，即决策边界的回归，SVM的分类也可以这样理解。逻辑回归算法步骤描述如下：

假设决策函数；
引入Sigmoid函数，将函数值映射为概率值；
计算损失函数-交叉熵，进行参数更新：
若结果收敛，退出计算。

1、决策函数

假设决策函数为
$\theta ^TX=\left[ \theta _0+\theta _1+\cdots +\theta _n \right] \cdot \left[ 1+x+\cdots +x^{n-1} \right] ^T=0$

2、假设函数（sigmoid函数）

引入sigmoid函数，将决策函数的函数值映射为某一类的概率，其表达式为：
$h_{\theta}\left( X \right) =\frac{1}{1+e^{-\theta ^TX}}$
该函数将决策边界函数映射到[0,1]之间，表现为x为某类的概率。 $h_{\theta}\left( X \right) =\begin{cases} >0.5, \theta ^TX>0\\ 0.5, \theta ^TX>0\\ <0.5, \theta ^TX>0\\ \end{cases}$

两者之间得关系如下图所示：

在这里插入图片描述

3、预测函数

预测函数为:
$Y=signal\left( h_{\theta}\left( X \right) \right) =\begin{cases} 0, h_{\theta}\left( X \right) <threshold\,\,\\ 1, h_{\theta}\left( X \right) >threshold\\ \end{cases}$
其中，threshold为阈值，其设置依实际需求而定，比如，当根据肿瘤大小判断是否得癌症时，可以将阈值设置较小，以使患者进行更加全面得检查。

4、损失函数（cost function)

使用均方误差（MSE）作为损失函数，参数容易陷入局部最优，因此这里使用交叉熵作为损失函数。
$\cos t\left( h_{\theta}\left( x \right) ,y \right) =\left\{ \begin{array}{c} \begin{array}{l} -\log \left( h_{\theta}\left( x \right) \right) , y=1\\ -\log \left( 1-h_{\theta}\left( x \right) \right) , y=0\\ \end{array}\\ \end{array} \right. \\ ,\text{即}\cos t\left( h_{\theta}\left( x \right) ,y \right) =-y\log \left( h_{\theta}\left( x \right) \right) -\left( 1-y \right) \log \left( 1-h_{\theta}\left( x \right) \right)$
所以
$J\left( \theta \right) =\frac{1}{m}\sum_{i=1}^m{\begin{array}{c} \cos t\left( h_{\theta}\left( x^{\left( i \right)} \right) ,y^{\left( i \right)} \right) =-\frac{1}{m}\sum_{i=0}^m{\left( y\log \left( h_{\theta}\left( x^{\left( i \right)} \right) \right) +\left( 1-y^{\left( i \right)} \right) \log \left( 1-h_{\theta}\left( x^{\left( i \right)} \right) \right) \right)}\\ \end{array}}$

为取得损失函数的最小值，使用梯度下降算法进行参数更新。
计算损失函数关于参数θ的梯度，其结果为：
$\frac{\partial J\left( \theta \right)}{\partial \theta _j}=\frac{1}{m}\sum_{i=1}^m{\left( h_{\theta}\left( x^{\left( i \right)} \right) -y^{\left( i \right)} \right)}x_{j}^{\left( i \right)}$
所以参数更新公式为：
$\theta ^{new}=\theta ^{old\:}-\alpha \frac{\partial J\left( \theta \right)}{\partial \theta _j}=\theta ^{old\:}-\alpha \frac{1}{m}\sum_{i=1}^m{\left( h_{\theta}\left( x^{\left( i \right)} \right) -y^{\left( i \right)} \right)}x_{j}^{\left( i \right)}$

5、代码实现及可视化

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn.metrics import confusion_matrix
%matplotlib
np.random.seed(3)
X, Y = make_classification(n_samples=100,n_features=2, n_redundant=0, n_informative=2,
                             n_clusters_per_class=1, n_classes=2)

x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size=0.2)
x_train=np.hstack((x_train,np.ones([y_train.shape[0],1])*0.01))
x_test=np.hstack((x_test,np.ones([y_test.shape[0],1])*0.01))
m=x_train.shape[0]
Theta=np.random.rand(x_train.shape[1])

def h(x,theta):#假设函数
    h1=[]
    for i in range(len(x)):
        h1.append(1/(1+np.power(np.e,-np.dot(theta,x[i]))))
    return np.array(h1)
#print(h(x_train,Theta).shape)
def partial_theta(x,theta,y):#求偏导
    return np.dot(h(x,theta)-y,x)/m
#print(partial_theta(x_train,Theta,y_train).shape)
def J(x,y,theta):#损失函数
    return -np.dot(np.ones(x.shape[0]),y*np.log(h(x,theta))+(1-y)*np.log(1-h(x,theta)))/m
lr=0.02
cost=[]
iteration_times=3000
def BGD(x,theta,y):#Batch gradient decent#梯度下降进行参数更新
    for i in range(iteration_times):
        cost.append(J(x,y,theta)) 
        theta=theta-lr*partial_theta(x,theta,y)
def LR_predict(x,threshold):#预测函数
    y_pred=h(x,Theta)
    for i in range(len(y_pred)):
        if y_pred[i]>=threshold:
            y_pred[i]=1
        else:
            y_pred[i]=0
    return y_pred
def visualization(resolution):
    fig=plt.figure(figsize=(20,10))
    cmp = mpl.colors.ListedColormap(['b','g'])
    ax1=fig.add_subplot(2,2,1)
    ax1.plot(range(iteration_times),cost)

    ax2=fig.add_subplot(2,2,2)
    plt.xlim(x_train[:,0].min()-0.2, x_train[:,0].max()+0.2)
    plt.ylim(x_train[:,1].min()-0.2, x_train[:,1].max()+0.2)
    
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1   #第一个特征取值范围作为横轴
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1   #第二个特征取值范围作为纵轴
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution)) #reolution是网格剖分粒度，xx1和xx2数组维度一样
    Z = LR_predict(np.array([xx1.ravel(), xx2.ravel(),np.ones(len(xx1.ravel()))*0.01]).T,0.5) 
    Z = Z.reshape(xx1.shape)
    plt.pcolormesh(xx1,xx2, Z, cmap=plt.cm.Paired)
   # plt.contourf(xx1, xx2, Z, alpha=0.4,cmap =cmp )    
    ax2.scatter(x_test[:, 0], x_test[:, 1], alpha=0.8,marker='o', c=y_test)
    ax2.scatter(x_train[:, 0], x_train[:, 1], alpha=0.8,marker='x', c=y_train)
    #决策边界
    ax2.plot(np.arange(X[:, 0].min(), X[:, 0].max(),0.1),-Theta[0]/Theta[1]*np.arange(X[:, 0].min(), X[:, 0].max(),0.1)-Theta[2]/Theta[1]*0.01)
    confusion = confusion_matrix(LR_predict(x_test,0.5), y_test)
    ax3=fig.add_subplot(2,2,3)
    ax3.imshow(confusion, cmap=plt.cm.Blues)
    thresh=0.2
    for first_index in range(len(confusion)):
        for second_index in range(len(confusion[first_index])):
            plt.text(first_index, second_index, confusion[first_index][second_index],
                 color="red" if confusion[first_index][second_index]>thresh else "blue")
    plt.show()

if __name__=="__main__":

    BGD(x_train,Theta,y_train)
    visualization(0.08)
    print(Theta)

可视化