Implementing Logistic Regression in Python

Formula derivation (using the vectorized approach):

Maximum likelihood function:

L(\Theta) = \prod_{i=1}^{m}\left(h_{\Theta}(x_i)\right)^{y_i}\cdot\left(1-h_{\Theta}(x_i)\right)^{1-y_i} \qquad ①

where:

h_{\Theta}(x_i) = g(\Theta^{T} x_i) = \frac{1}{1+e^{-\Theta^{T} x_i}} \qquad ②

Taking its partial derivative (using the sigmoid identity g' = g(1-g)):

\frac{\partial}{\partial\Theta_j} h_{\Theta}(x_i) = \frac{\partial}{\partial\Theta_j} g(\Theta^{T}x_i) = g(\Theta^{T}x_i)\cdot\left(1-g(\Theta^{T}x_i)\right)\cdot\frac{\partial(\Theta^{T}x_i)}{\partial\Theta_j} \qquad ③

Log-likelihood function:

l(\Theta) = \ln L(\Theta) = \sum_{i=1}^{m}\left(y_i\cdot\ln h_{\Theta}(x_i) + (1-y_i)\cdot\ln\left(1-h_{\Theta}(x_i)\right)\right) \qquad ④

Turning gradient ascent into gradient descent (this is the loss function we choose; using a squared loss here instead would make the objective non-convex, with many local minima):

J(\Theta) = -\frac{1}{m}\cdot l(\Theta) \qquad ⑤

Taking the partial derivative of the cost:

\frac{\partial}{\partial\Theta_j} J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\left(y_i\cdot\frac{1}{h_{\Theta}(x_i)}\cdot\frac{\partial}{\partial\Theta_j}h_{\Theta}(x_i) - (1-y_i)\cdot\frac{1}{1-h_{\Theta}(x_i)}\cdot\frac{\partial}{\partial\Theta_j}h_{\Theta}(x_i)\right)

= -\frac{1}{m}\sum_{i=1}^{m}\left(y_i\cdot\frac{1}{g(\Theta^{T}x_i)} - (1-y_i)\cdot\frac{1}{1-g(\Theta^{T}x_i)}\right)\frac{\partial}{\partial\Theta_j}g(\Theta^{T}x_i)

= -\frac{1}{m}\sum_{i=1}^{m}\left(y_i\cdot\frac{1}{g(\Theta^{T}x_i)} - (1-y_i)\cdot\frac{1}{1-g(\Theta^{T}x_i)}\right)g(\Theta^{T}x_i)\left(1-g(\Theta^{T}x_i)\right)\frac{\partial}{\partial\Theta_j}\Theta^{T}x_i

= -\frac{1}{m}\sum_{i=1}^{m}\left(y_i\left(1-g(\Theta^{T}x_i)\right) - (1-y_i)\,g(\Theta^{T}x_i)\right)x_i^{j}

= -\frac{1}{m}\sum_{i=1}^{m}\left(y_i - g(\Theta^{T}x_i)\right)x_i^{j}

= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\Theta}(x_i) - y_i\right)x_i^{j}

The weight update is therefore:

\Theta_j := \Theta_j - \alpha\cdot\frac{1}{m}\sum_{i=1}^{m}\left(h_{\Theta}(x_i) - y_i\right)x_i^{j}

or, fully vectorized over all weights at once (the code below folds the constant 1/m into the learning rate \alpha):

\Theta := \Theta - \frac{\alpha}{m}\cdot X^{T}\left(g(X\Theta) - y\right)
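The derivation can be sanity-checked numerically: the analytic gradient above should agree with a finite-difference approximation of J(\Theta). A minimal standalone sketch (synthetic data; every name in it is my own, not part of this post's program):

from numpy import allclose, array, eye, exp, hstack, log, ones
from numpy.random import default_rng

def cost(theta, X, y):
    # Cross-entropy cost J(theta) from equation ⑤
    h = 1.0/(1.0 + exp(-X.dot(theta)))
    return -(y*log(h) + (1 - y)*log(1 - h)).mean()

def grad(theta, X, y):
    # Analytic gradient: (1/m) * X^T (h - y)
    h = 1.0/(1.0 + exp(-X.dot(theta)))
    return X.T.dot(h - y)/len(y)

rng = default_rng(0)
X = hstack([ones((20, 1)), rng.normal(size=(20, 2))])   # bias column + 2 features
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

eps = 1e-6
num = array([(cost(theta + eps*e, X, y) - cost(theta - eps*e, X, y))/(2*eps)
             for e in eye(3)])                          # central differences
print(allclose(num, grad(theta, X, y), atol=1e-6))      # expect: True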

Pure batch (vectorized) training does not scale to large datasets, because every single update must touch every sample; improvements are proposed further below.

1. Code implementation:

✿ Dataset: https://github.com/zhangqingfeng0105/Machine-Learn/tree/master/logistic_regression

from numpy import *
import matplotlib.pyplot as plt
import time
import random                                  # stdlib random; shadows the numpy.random module pulled in by the star import
from matplotlib.font_manager import FontProperties
def loadDataSet():
    dataMat = []
    labelMat = []
    fr = open('/home/zhangqingfeng/test/TestSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])    # len(dataset) x 3 list; the leading 1.0 is the bias term
        labelMat.append(int(lineArr[2]))                               # 1-D list of class labels
    return dataMat, labelMat

def sigmoid(inX):
    # Numerically stable version that also accepts arrays/matrices (a scalar-only
    # if/else would crash when gradAscent passes a matrix). For x >= 0 use
    # 1/(1+e^-x); for x < 0 use e^x/(1+e^x), so the exp() argument is never
    # positive and cannot overflow.
    inX = asarray(inX, dtype=float)
    pos = inX >= 0
    out = empty_like(inX)
    out[pos] = 1.0/(1.0 + exp(-inX[pos]))
    out[~pos] = exp(inX[~pos])/(1.0 + exp(inX[~pos]))
    return out
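# A quick check on the branch above (illustrative values, my own addition):
# the naive form 1.0/(1+exp(-inX)) overflows in exp() for large negative
# inputs (exp(1000) -> inf with a RuntimeWarning), while the branched form
# computes sigmoid(-1000) as exp(-1000)/(1+exp(-1000)) ~ 0.0 and
# sigmoid(1000) as 1/(1+exp(-1000)) ~ 1.0, both without overflow.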

def gradAscent(dataMatIn, classLabels):        # batch gradient method
    dataMatrix = mat(dataMatIn)                # convert to a matrix: len(dataset) x 3
    labelMat = mat(classLabels).transpose()    # convert to a matrix, then transpose into a len(dataset) x 1 column vector

    m,n = shape(dataMatrix)
    alpha = 0.001
    maxCycles = 150
    weightsList = []                # record every weight update so convergence can be plotted later
    weights = ones((n,1))           # n x 1 array; n is 3 here, so weights is 3 x 1
    weightsList.append(weights)
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)   # from equation ②: h = g(X * w)
        error = h - labelMat                # E = g(X * w) - y
        weights = weights - alpha * dataMatrix.transpose() * error   # vectorized update: w := w - alpha * X^T * E
        weightsList.append(weights)         # save each weight vector
    return weights,weightsList

This program follows the vectorized formulas directly; the termination condition is simply a fixed number of loop iterations.
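The fixed iteration count is the simplest stopping rule. A minimal sketch of an alternative that stops once the per-step weight change becomes negligible (the variant name and the tolerance value are my own assumptions, not from the original):

def gradAscentTol(dataMatIn, classLabels, alpha=0.001, tol=1e-6, maxCycles=10000):
    # Same vectorized update as gradAscent above, but stop early once the
    # largest component of the weight update falls below tol.
    dataMatrix = mat(dataMatIn)
    labelMat = mat(classLabels).transpose()
    m, n = shape(dataMatrix)
    weights = ones((n, 1))
    for k in range(maxCycles):
        error = sigmoid(dataMatrix * weights) - labelMat
        step = alpha * dataMatrix.transpose() * error
        weights = weights - step
        if abs(step).max() < tol:          # updates are negligible: converged
            break
    return weights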

Plotting the decision boundary:

def plotBestFit(dataList, labelList, weights, weightsList):          # plot the decision boundary
    myfont = FontProperties(fname='/usr/share/fonts/simhei.ttf')     # a font file for the plot labels
    plt.rcParams['axes.unicode_minus'] = False                       # keep axis minus signs from rendering as boxes

    dataArr = array(dataList)
    m = shape(dataArr)[0]                      # number of rows, i.e. number of sample points
    xcord1 = []
    ycord1 = []
    xcord2 = []
    ycord2 = []
    for i in range(m):
        if int(labelList[i]) == 1:
            xcord1.append(dataArr[i,1])        # x coordinates of points labeled 1 (2-D array indexing)
            ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1])
            ycord2.append(dataArr[i,2])
    fig = plt.figure(1)                        # first figure: the data and the boundary
    ax = fig.add_subplot(111)                  # one row, one column, first plot
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')   # s is marker size, c is color, marker is shape: 's' is a square, the default 'o' a circle
    ax.scatter(xcord2, ycord2, s=40, c='green')
    x = arange(-4.0, 4.0, 0.1)                   # x range of the boundary line, step 0.1
    y = (-weights[0]-weights[1]*x)/weights[2]    # the boundary w0 + w1*x1 + w2*x2 = 0, solved for x2
    ax.plot(x, y, 'k')
    plt.xlabel("X1")                           # axis labels
    plt.ylabel("X2")

    fig2 = plt.figure(2)                  # second figure: convergence of each weight
    ax21 = fig2.add_subplot(311)
    ax22 = fig2.add_subplot(312)
    ax23 = fig2.add_subplot(313)

    x0 = [float(w[0]) for w in weightsList]      # weightsList recorded every weight update
    x1 = [float(w[1]) for w in weightsList]
    x2 = [float(w[2]) for w in weightsList]

    plt.xlabel("iterations", fontproperties=myfont)

    plt.sca(ax21)                        # sca switches the current axes
    ax21.plot(x0, c='red')
    plt.ylabel("x0")

    plt.sca(ax22)
    ax22.plot(x1, c='orange')
    plt.ylabel("x1")

    plt.sca(ax23)
    ax23.plot(x2, c='green')
    plt.ylabel("x2")
    plt.show()

Call the functions and run the program:

dataList, labelList = loadDataSet()          # load the test dataset; labelList is the list of class labels
begin_time = time.time()                     # time the algorithm
weights,weightsList = gradAscent(dataList, labelList)
# weights,weightsList = stocGradAscent(dataList, labelList)
# weights,weightsList = stocGradAscent1(dataList, labelList)
end_time = time.time()
print('Final weights:\n{0}\nElapsed time: {1} s'.format(weights, end_time-begin_time))
weights = weights.getA()                     # convert the weights matrix to an array; the indexing used in plotBestFit does not work on a matrix
plotBestFit(dataList, labelList, weights, weightsList)     # plot the boundary and the convergence curves
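The post stops at plotting; for completeness, a minimal sketch of using the learned weights to classify a new point (classifyVector is my own helper, not part of the original program):

def classifyVector(inX, weights):
    # Predict class 1 if sigmoid(theta^T x) > 0.5, else class 0.
    prob = sigmoid(sum(inX * weights))
    return 1 if prob > 0.5 else 0

# e.g. a point with x1=1.0, x2=2.0 (plus the constant 1.0 bias term):
print(classifyVector(array([1.0, 1.0, 2.0]), weights.flatten()))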

2. Improving the algorithm:

Every update loop above has to sweep the whole dataset to update the regression weights, so with a huge number of samples the cost per update is too high. This motivates stochastic gradient ascent: each update uses only a single sample point, and it works with elementwise array multiplication instead of matrix multiplication. Replace the gradAscent call in the program above with stocGradAscent, and comment out the weights = weights.getA() line (stocGradAscent already returns plain arrays).

def stocGradAscent(data_list, classLabel_list, numIter = 150):       # stochastic gradient ascent
    dataArr = array(data_list)                        # convert the list to an array for elementwise math
    m,n = shape(dataArr)                              # m samples, n features per sample (3 here: 1.0, x1, x2)
    alpha = 0.01
    weightsList = []
    weights = ones(n)                                 # 1 x 3 array
    weightsList.append(weights)
    for j in range(numIter):
        for i in range(m):                            # note: elementwise array multiplication, not matrix multiplication
            h = sigmoid(sum(dataArr[i]*weights))      # dataArr[i] is one sample, a 1 x 3 array (1.0, x1, x2)
            error = classLabel_list[i]-h              # error is a scalar
            weights = weights + alpha*error*dataArr[i]
            weightsList.append(weights)
    return weights, weightsList

Running it with only a single pass over the data, the elapsed time shrinks by an order of magnitude compared with the unimproved version, although classification accuracy also drops noticeably.
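To actually quantify that accuracy drop, a small training-set error check (errorRate is my own helper, not from the original):

def errorRate(dataList, labelList, weights):
    # Fraction of training points the learned boundary misclassifies.
    dataArr = array(dataList)
    preds = sigmoid(dataArr.dot(asarray(weights).flatten())) > 0.5
    return mean(preds != array(labelList))

print('training error: {:.2%}'.format(errorRate(dataList, labelList, weights)))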

Looking at its parameter convergence curves, there is obvious periodic oscillation and convergence is slow. That said, after 150 passes its classification result is still quite good.

 

3. Improving the algorithm again:

One reliable way to judge such an algorithm is whether its parameters converge. We make two changes to the stochastic gradient ascent method:

(1) Make the step size variable (steadily decreasing, but never reaching zero), which damps the fluctuations caused by noisy single-sample updates; see the schedule sketch after this list.

(2) Make the method truly "stochastic": update the weights with a randomly chosen sample (without replacement within each pass), which reduces the periodic oscillation.
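To see how the step-size schedule from point (1) behaves, a quick print of alpha = 0.01 + 4/(1.0 + j + i) for a few (j, i) pairs (the loop bounds are only for illustration):

for j in range(3):                          # outer pass index
    for i in (0, 50, 99):                   # a few positions within a pass
        alpha = 0.01 + 4/(1.0 + j + i)
        print('j={}, i={}: alpha={:.4f}'.format(j, i, alpha))
# alpha starts at 4.01 and decays toward the 0.01 floor: early updates are
# large, later ones settle down, but the step size never reaches zero.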

def stocGradAscent1(data_list, classLabel_list, numIter=150):       # improved stochastic gradient ascent
    weightsList = []
    dataArr = array(data_list)                        # convert the list to an array for elementwise math
    m,n = shape(dataArr)                              # m samples, n features per sample (3 here: 1.0, x1, x2)
    weights = ones(n)                                 # 1 x 3 array
    weightsList.append(weights)
    for j in range(numIter):
        dataIndex = [x for x in range(m)]
        random.shuffle(dataIndex)                     # shuffle for randomness, used together with choice
        for i in range(m):                            # elementwise array multiplication, not matrix multiplication
            alpha = 0.01 + 4/(1.0+j+i)                # decaying step size with a 0.01 floor
            randIndex = random.choice(dataIndex)      # pick a random sample index
            h = sigmoid(sum(dataArr[randIndex]*weights))   # dataArr[randIndex] is one sample, a 1 x 3 array (1.0, x1, x2)
            error = classLabel_list[randIndex]-h           # error is a scalar
            weights = weights + alpha*error*dataArr[randIndex]
            weightsList.append(weights)
            dataIndex.remove(randIndex)               # sample without replacement within a pass
    return weights,weightsList
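One design note: random.choice on a list followed by list.remove costs O(m) per sample, so each pass spends O(m²) on index bookkeeping. A variant of my own (statistically equivalent: each sample is still visited exactly once per pass, in random order) simply walks the shuffled index list; the weight history is omitted for brevity:

def stocGradAscent2(data_list, classLabel_list, numIter=150):
    # Same update as stocGradAscent1, but iterate over a shuffled index
    # list directly instead of choice() + remove().
    dataArr = array(data_list)
    m, n = shape(dataArr)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        random.shuffle(dataIndex)                       # one shuffle per pass
        for i, randIndex in enumerate(dataIndex):       # each sample once
            alpha = 0.01 + 4/(1.0 + j + i)              # same decaying step
            h = sigmoid(sum(dataArr[randIndex] * weights))
            error = classLabel_list[randIndex] - h
            weights = weights + alpha*error*dataArr[randIndex]
    return weights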

Its parameter convergence curves show that the periodic oscillation is essentially suppressed and convergence is faster. After 150 passes, its classification result is also quite good.

References:

https://blog.csdn.net/l18339702017/article/details/80433171
https://www.cnblogs.com/lc1217/p/6802637.html
