Machine Learning Algorithms from Scratch 03: Logistic Regression

Introduction

Although logistic regression has "regression" in its name, it does not actually perform regression; it is a classic binary classification algorithm. Multi-class problems can also be solved with logistic regression, for example by training a separate binary classifier for each class (one-vs-rest), or by generalizing logistic regression into softmax regression. A minimal sketch of the softmax idea follows below.
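For reference, here is a small sketch of the softmax function just mentioned (the function name and the NumPy implementation are our own illustration, not part of this series' code):

import numpy as np

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick;
    # softmax is unchanged by adding a constant to every input.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Scores for three classes become a probability distribution over them:
print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.659 0.242 0.099]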

I. The Scenario

Suppose there is an exam with two subjects, A and B. Candidates who meet a certain requirement pass the exam, but we do not know what the exact passing criterion is.

We now want to use the scores and pass/fail outcomes of past candidates to learn a rule for judging who passes. The available data are as follows:

Score_A | Score_B | Passed (1 = passed, 0 = failed)
------- | ------- | -------
34.62365962451697 | 78.0246928153624 | 0
30.28671076822607 | 43.894997524001006 | 0
35.84740876993872 | 72.90219802708364 | 0
60.18259938620975 | 86.30855209546827 | 1
79.0327360507101 | 75.3443764369103 | 1
45.083277476683385 | 56.316371781530506 | 0
61.10666453684766 | 96.51142588489624 | 1
75.02474556738889 | 46.55401354116538 | 1
76.09878670226256 | 87.42056971926803 | 1
84.43281996120034 | 43.533393310721095 | 1
95.86155507093572 | 38.22527805795094 | 0
75.01365838958247 | 30.60326323428011 | 0
82.3070533739948 | 76.48196330235605 | 1
69.36458875970939 | 97.71869196188608 | 1
39.538339143672225 | 76.0368108511588 | 0
53.9710521485623 | 89.20735013750205 | 1
69.07014406283025 | 52.740469730167646 | 1
67.94685547711617 | 46.67857410673128 | 0
70.66150955499434 | 92.92713789364832 | 1
76.97878372747499 | 47.575963649755316 | 1
67.37202754570876 | 42.83843832029179 | 0
89.67677575072081 | 65.79936592745237 | 1
50.534788289883004 | 48.85581152764205 | 0
34.21206097786789 | 44.209528598662885 | 0
77.9240914545704 | 68.9723599933059 | 1
62.27101367004632 | 69.95445795447587 | 1
80.19018075095659 | 44.821628932183536 | 1
93.114388797442 | 38.80067033713209 | 0
61.83020602312595 | 50.25610789244621 | 0
38.78580379679423 | 64.99568095539578 | 0
61.379289447425 | 72.80788731317097 | 1
85.40451939411645 | 57.05198397627122 | 1
52.10797973193984 | 63.127623768817145 | 0
52.04540476831827 | 69.43286012045222 | 1
40.236893735451105 | 71.16774802184875 | 0
54.63510555424817 | 52.21388588061123 | 0
33.91550010906887 | 98.86943574220612 | 0
64.17698887494485 | 80.90806058670817 | 1
74.78925295941542 | 41.57341522824434 | 0
34.1836400264419 | 75.2377203360134 | 0
83.90239366249155 | 56.30804621605328 | 1
51.54772026906181 | 46.85629026349976 | 0
94.44336776917852 | 65.56892160559052 | 1
82.36875375713919 | 40.61825515970618 | 0
51.04775177128865 | 45.82270145776001 | 0
62.222675761201884 | 52.060991948366784 | 0
77.19303492601364 | 70.4582000018096 | 1
97.7715992800023 | 86.7278223300282 | 1
62.073063796676465 | 96.76882412413984 | 1
91.56497449807442 | 88.69629254546598 | 1
79.94481794066932 | 74.16311935043757 | 1
99.2725269292572 | 60.99903099844988 | 1
90.5467141139985 | 43.39060180650027 | 1
34.52451385320009 | 60.39634245837173 | 0
50.286496118990705 | 49.80453881323059 | 0
49.58667721632031 | 59.80895099453265 | 0
97.64563396007769 | 68.86157272420604 | 1
32.57720016809309 | 95.59854761387876 | 0
74.24869136721598 | 69.82457122657193 | 1
71.7964620586338 | 78.45356224515052 | 1
75.39561146568029 | 85.75993667331619 | 1
35.28611281526193 | 47.02051394723416 | 0
56.253817497116245 | 39.261472510580184 | 0
30.05882244669796 | 49.59297386723685 | 0
44.66826172480893 | 66.45008614558913 | 0
66.56089447242954 | 41.09209807936973 | 0
40.45755098375164 | 97.53518548909935 | 1
49.07256321908844 | 51.88321182073966 | 0
80.27957401466998 | 92.11606081344084 | 1
66.74671856944039 | 60.99139402740989 | 1
32.722833040603234 | 43.307173064300635 | 0
64.0393204150601 | 78.03168802018232 | 1
72.34649422579923 | 96.22759296761404 | 1
60.4578857391896 | 73.09499809758037 | 1
58.840956217268015 | 75.85844831279043 | 1
99.82785779692128 | 72.36925193383884 | 1
47.26426910848174 | 88.47586499559782 | 1
50.45815980285988 | 75.80985952982456 | 1
60.45555629271532 | 42.50840943572217 | 0
82.2266615778557 | 42.719878537164576 | 0
88.91389641665329 | 69.80378889835472 | 1
94.83450672430195 | 45.69430680250754 | 1
67.31925746917527 | 66.58935317747915 | 1
57.238706315698614 | 59.51428198012957 | 1
80.36675600171273 | 90.96014789746954 | 1
68.46852178591112 | 85.59430710452014 | 1
42.0754545384731 | 78.84478600148044 | 0
75.47770200533905 | 90.42453899753963 | 1
78.6354243489802 | 96.64742716885642 | 1
52.34800398794108 | 60.76950525602592 | 0
94.09433112516793 | 77.15910509073893 | 1
90.44855097096365 | 87.50879176484702 | 1
55.48216114069585 | 35.570703472288656 | 0
74.49269241843041 | 84.84513684930135 | 1
89.8458067072098 | 45.35828361091658 | 1
83.48916274498238 | 48.38028579728175 | 1
42.261700809981705 | 87.10385094025456 | 1
99.31500880510393 | 68.77540947206617 | 1
55.34001756003703 | 64.9319380069486 | 1
74.77589300092767 | 89.52981289513276 | 1

We will treat the candidates who passed as positive examples and those who failed as negative examples.
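Assuming the table above is saved as LogiReg_data.csv with no header row (the format the script in Section IV expects), it can be loaded and inspected like this:

import pandas as pd

# Three unnamed columns: Score_A, Score_B, and the pass/fail label
pdData = pd.read_csv("LogiReg_data.csv", header=None,
                     names=["Score_A", "Score_B", "Admitted"])
print(pdData.shape)  # (100, 3)
print(pdData.head())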

II. Building the Classifier

Given this scenario, we can start the analysis with a linear model:

$$y = Score_A \times \theta_A + Score_B \times \theta_B + \theta_0$$

When $y$ exceeds some threshold, the sample is classified as positive, meaning the candidate passes; otherwise it is classified as negative, meaning the candidate fails.

But the bare linear model has a problem: as the input $x$ grows, the output $y$ is unbounded, which makes it hard to set a decision boundary.

We therefore apply an activation function to map the output into a bounded range. Here we use the sigmoid function.

1. The sigmoid function

$$g(z) = \frac{1}{1+e^{-z}}$$

The sigmoid function has several nice properties: it is differentiable everywhere (and hence continuous), it accepts any real number as input, and its output lies in (0, 1). Its graph is shown below:
[Figure: graph of the sigmoid function]
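A minimal sketch to reproduce the plot above, assuming matplotlib is available:

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-10, 10, 200)
plt.plot(z, 1/(1+np.exp(-z)))
plt.axhline(0.5, linestyle="--", color="gray")  # the 0.5 decision level
plt.xlabel("z")
plt.ylabel("g(z)")
plt.title("sigmoid")
plt.show()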

We can use this function to map the linear model's prediction into (0, 1), which makes classification convenient:

$$\begin{aligned} h_\theta(x) &= g(\theta x) \\ &= \frac{1}{1+e^{-\theta x}} \end{aligned}$$

For example, we can set the threshold to 0.5: when $h_\theta(x) > 0.5$ we classify the sample as positive (the candidate passes), and when $h_\theta(x) < 0.5$ as negative (the candidate fails). The threshold can be adjusted according to how the classifier performs in practice.

2. The loss function

Readers who have just finished the linear regression article might immediately reach for its loss function:

$$\begin{aligned} J(\theta) &= \sum_{i=1}^{m}\left(y^{(i)}-h_\theta(x^{(i)})\right)^2 \\ &= \sum_{i=1}^{m}\left(y^{(i)}-\frac{1}{1+e^{-\theta x^{(i)}}}\right)^2 \end{aligned}$$

In principle, least squares could serve as the loss for logistic regression, but it is impractical: with the sigmoid plugged in, the squared-error loss is not convex, so it may have many local optima, as the figure below illustrates:
[Figure: a non-convex loss surface with many local optima]

Logistic regression therefore does not use least squares as its loss; it uses maximum likelihood instead.

Since this is a binary classification problem, we model the probabilities that a sample is positive or negative as:

$$\begin{aligned} & P(y=1 \mid x;\theta) = h_\theta(x) \\ & P(y=0 \mid x;\theta) = 1-h_\theta(x) \end{aligned}$$

Combining the two expressions into one:

$$P(y \mid x;\theta) = (h_\theta(x))^y\,(1-h_\theta(x))^{1-y} \tag{1}$$

This single expression works because in binary classification $y$ only takes the values 1 and 0: when $y=1$ only the left factor remains, and when $y=0$ only the right factor remains.

The likelihood $L$ over all samples is:

$$\begin{aligned} L(\theta) &= \prod_{i=1}^m P(y^{(i)} \mid x^{(i)};\theta) \\ &= \prod_{i=1}^m (h_\theta(x^{(i)}))^{y^{(i)}}(1-h_\theta(x^{(i)}))^{1-y^{(i)}} \end{aligned} \tag{2}$$

To make differentiation easier, we take the log-likelihood:

$$\begin{aligned} \ln L(\theta) &= \ln \prod_{i=1}^m (h_\theta(x^{(i)}))^{y^{(i)}}(1-h_\theta(x^{(i)}))^{1-y^{(i)}} \\ &= \sum_{i=1}^{m} \ln\left[(h_\theta(x^{(i)}))^{y^{(i)}}(1-h_\theta(x^{(i)}))^{1-y^{(i)}}\right] \\ &= \sum_{i=1}^{m}\left(y^{(i)}\ln h_\theta(x^{(i)}) + (1-y^{(i)})\ln(1-h_\theta(x^{(i)}))\right) \end{aligned}$$

With the log-likelihood in hand, our goal is to find the $\theta$ that maximizes it.

We could maximize it directly with gradient ascent, but to tie back to the gradient descent introduced earlier in this series, we define:

$$J(\theta) = -\frac{1}{m}\ln L(\theta)$$

This turns the problem into one of minimizing the objective $J(\theta)$ by gradient descent.
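In code, $J(\theta)$ is the familiar binary cross-entropy. A direct NumPy translation might look like the sketch below; the epsilon clipping that guards against $\ln 0$ is our addition, not part of the derivation:

import numpy as np

def cost(thetas, X, y, eps=1e-12):
    # J(theta) = -(1/m) * sum( y*ln(h) + (1-y)*ln(1-h) )
    h = 1/(1+np.exp(-np.dot(X, thetas)))
    h = np.clip(h, eps, 1-eps)  # keep ln() finite
    return -np.mean(y*np.log(h) + (1-y)*np.log(1-h))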

3. The gradient of the objective

$$\begin{aligned} \nabla J(\theta) &= \nabla\left(-\frac{1}{m}\ln L(\theta)\right) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\frac{1}{h_\theta(x^{(i)})}-(1-y^{(i)})\frac{1}{1-h_\theta(x^{(i)})}\right)\nabla g(\theta x^{(i)}) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\frac{1}{g(\theta x^{(i)})}-(1-y^{(i)})\frac{1}{1-g(\theta x^{(i)})}\right)\left(1-g(\theta x^{(i)})\right)g(\theta x^{(i)})\,\nabla(\theta x^{(i)}) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\left(1-g(\theta x^{(i)})\right)-(1-y^{(i)})\,g(\theta x^{(i)})\right)x^{(i)} \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}-h_\theta(x^{(i)})\right)x^{(i)} \end{aligned}$$

The derivation only uses the basic differentiation rules plus the chain rule, together with the sigmoid identity $g'(z) = g(z)(1-g(z))$ (which produces the $(1-g)g$ factor in the third line); we will not expand every step here.

In this final expression the $y^{(i)}$ and $x^{(i)}$ are all known data, so we only need to translate it into code.
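A handy sanity check for this formula (and for the implementation below) is to compare it against a finite-difference approximation of $J(\theta)$; a sketch, reusing the hypothetical cost function from above:

import numpy as np

def numeric_gradient(thetas, X, y, h=1e-6):
    # Central differences: dJ/dtheta_j ≈ (J(theta + h*e_j) - J(theta - h*e_j)) / 2h
    grad = np.zeros_like(thetas)
    for j in range(len(thetas)):
        step = np.zeros_like(thetas)
        step[j] = h
        grad[j] = (cost(thetas+step, X, y) - cost(thetas-step, X, y)) / (2*h)
    return grad

# Each entry should agree with -(1/m) * sum((y - h_theta(x)) * x) to roughly 1e-6.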

III. Implementing Logistic Regression from Scratch

1. Implementing sigmoid

def sigmoid(x):
    return 1/(1+np.exp(-x))
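One caveat worth knowing: np.exp(-x) overflows for large negative inputs (NumPy emits a RuntimeWarning, although the result still rounds to 0). If that matters, a numerically stable variant for array inputs is a common workaround (a sketch, not part of the original code; scipy.special.expit implements the same idea):

def stable_sigmoid(x):
    # Only ever exponentiate non-positive values, so exp() cannot overflow.
    out = np.empty_like(x, dtype=float)
    pos = x >= 0
    out[pos] = 1/(1+np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex/(1+ex)
    return out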

2. The prediction model

def model(X, thetas):
    return sigmoid(np.dot(X, thetas))

3. The classifier

def classify(X, thetas, threshold):
    return model(X, thetas) > threshold

4. Gradient descent

def gradient_function(thetas, X, y):
    '''Compute the gradient of J(theta).'''
    diff = y - model(X, thetas)  # residuals y - h_theta(x)
    return -(1/X.shape[0])*np.dot(diff.reshape(1, X.shape[0]), X).T

def gradient_descent(X, y, alpha):
    '''Iterate gradient descent until every gradient component is below 1e-2.'''
    thetas = np.zeros(X.shape[1]).reshape(X.shape[1], 1)
    gradient = gradient_function(thetas, X, y)
    while not np.all(np.absolute(gradient) < 1e-2):
        thetas = thetas - alpha*gradient
        gradient = gradient_function(thetas, X, y)
    return thetas
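Before moving on, a quick smoke test of the functions above on synthetic, linearly separable data (the toy data and the hidden rule are made up for illustration):

import numpy as np

np.random.seed(0)
m = 200
feats = np.random.randn(m, 2)                                  # standardized scores
labels = (feats[:, 0] + feats[:, 1] > 0).astype(float).reshape(m, 1)
X = np.hstack([np.ones((m, 1)), feats])                        # prepend bias column

thetas = gradient_descent(X, labels, 0.1)
preds = classify(X, thetas, 0.5)
print((preds == labels).mean())  # should be close to 1.0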

5. Evaluation metrics

def accuracy(prediction, reality):
    '''Accuracy: proportion of samples classified correctly.'''
    count = 0
    arr = prediction == reality
    for item in arr:
        if item:
            count += 1
    return count/len(arr)

def recall(prediction, reality):
    '''Recall: TP / (TP + FN), the share of true positives that are found.'''
    count_tp = 0
    count_fn = 0
    for i in range(len(prediction)):
        if reality[i] == True and prediction[i] == True:
            count_tp += 1
        if reality[i] == True and prediction[i] == False:
            count_fn += 1
    return count_tp/(count_tp+count_fn)

def precision(prediction, reality):
    '''Precision: TP / (TP + FP), the share of predicted positives that are correct.'''
    count_tp = 0
    count_fp = 0
    for i in range(len(prediction)):
        if reality[i] == True and prediction[i] == True:
            count_tp += 1
        if reality[i] == False and prediction[i] == True:
            count_fp += 1
    return count_tp/(count_tp+count_fp)
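If scikit-learn happens to be installed, these hand-rolled metrics can be cross-checked against its implementations (scikit-learn is not otherwise used in this article; note that its argument order is reversed):

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])

print(accuracy(y_pred, y_true), accuracy_score(y_true, y_pred))    # approx. 0.667, both
print(recall(y_pred, y_true), recall_score(y_true, y_pred))        # 0.75, both
print(precision(y_pred, y_true), precision_score(y_true, y_pred))  # 0.75, both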

IV. Full Code and Test Results

We now wrap the core logistic regression functions in a class and test it on the dataset from this article:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


class LogisticRegression:

    @classmethod
    def sigmoid(cls, x):
        return 1/(1+np.exp(-x))

    @classmethod
    def model(cls, X, thetas):
        return cls.sigmoid(np.dot(X, thetas))

    @classmethod
    def classify(cls, X, thetas, threshold):
        return cls.model(X, thetas) > threshold

    @classmethod
    def gradient_function(cls, thetas, X, y):
        '''Compute the gradient of J(theta).'''
        diff = y - cls.model(X, thetas)  # residuals y - h_theta(x)
        return -(1/X.shape[0])*np.dot(diff.reshape(1, X.shape[0]), X).T

    @classmethod
    def gradient_descent(cls, X, y, alpha):
        '''Iterate gradient descent until every gradient component is below 1e-2.'''
        thetas = np.zeros(X.shape[1]).reshape(X.shape[1], 1)
        gradient = cls.gradient_function(thetas, X, y)
        while not np.all(np.absolute(gradient) < 1e-2):
            thetas = thetas - alpha*gradient
            gradient = cls.gradient_function(thetas, X, y)
        return thetas

    @classmethod
    def accuracy(cls, prediction, reality):
        '''Accuracy: proportion of samples classified correctly.'''
        count = 0
        arr = prediction == reality
        for item in arr:
            if item:
                count += 1
        return count/len(arr)

    @classmethod
    def recall(cls, prediction, reality):
        '''Recall: TP / (TP + FN).'''
        count_tp = 0
        count_fn = 0
        for i in range(len(prediction)):
            if reality[i] == True and prediction[i] == True:
                count_tp += 1
            if reality[i] == True and prediction[i] == False:
                count_fn += 1
        return count_tp/(count_tp+count_fn)

    @classmethod
    def precision(cls, prediction, reality):
        '''Precision: TP / (TP + FP).'''
        count_tp = 0
        count_fp = 0
        for i in range(len(prediction)):
            if reality[i] == True and prediction[i] == True:
                count_tp += 1
            if reality[i] == False and prediction[i] == True:
                count_fp += 1
        return count_tp/(count_tp+count_fp)


if __name__ == "__main__":
    pdData = pd.read_csv("LogiReg_data.csv", header=None,
                         names=["Score_A", "Score_B", "Admitted"])
    pdData.insert(0, "one", 1)     # prepend the bias column of ones
    orig_data = pdData.values
    cols = orig_data.shape[1]
    X = orig_data[:, 0:cols-1]     # bias + Score_A + Score_B
    y = orig_data[:, cols-1:cols]  # pass/fail labels
    thetas = LogisticRegression.gradient_descent(X, y, 0.001)
    print(thetas)  # the fitted theta
    prediction = LogisticRegression.classify(X, thetas, 0.5)  # threshold set to 0.5
    print(prediction)  # predicted labels
    accuracy = LogisticRegression.accuracy(prediction, y)
    print("accuracy:"+str(accuracy))
    recall = LogisticRegression.recall(prediction, y)
    print("recall:"+str(recall))
    precision = LogisticRegression.precision(prediction, y)
    print("precision:"+str(precision))

Output:

thetas:
[[-11.83740615]
[ 0.09998542]
[ 0.09388469]]
accuracy:0.91
recall:0.95
precision:0.9047619047619048
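With the fitted $\theta$, the decision boundary $h_\theta(x)=0.5$ is the line $\theta_0 + \theta_A \cdot Score_A + \theta_B \cdot Score_B = 0$, which we can draw over the data. A sketch that reuses orig_data and thetas from the script above:

import matplotlib.pyplot as plt
import numpy as np

passed = orig_data[:, 3] == 1
plt.scatter(orig_data[passed, 1], orig_data[passed, 2], marker="o", label="passed")
plt.scatter(orig_data[~passed, 1], orig_data[~passed, 2], marker="x", label="failed")

# Boundary: Score_B = -(theta_0 + theta_A * Score_A) / theta_B
a = np.linspace(30, 100, 50)
plt.plot(a, -(thetas[0, 0] + thetas[1, 0]*a)/thetas[2, 0], "r")
plt.xlabel("Score_A")
plt.ylabel("Score_B")
plt.legend()
plt.show()

The near-equal slope coefficients (0.0999 vs 0.0939) suggest the learned rule weights the two subjects almost equally.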
