回归----逻辑回归

xiaoming1999

于 2021-10-29 15:15:38 发布

阅读量479

点赞数

分类专栏：机器学习文章标签：逻辑回归回归机器学习

本文链接：https://blog.csdn.net/xiaoming1999/article/details/121023957

版权

机器学习专栏收录该内容

18 篇文章 3 订阅

订阅专栏

引入

逻辑回归是在线性回归的基础上加了一个 Sigmoid 函数（非线形）映射，使得逻辑回归称为了一个优秀的分类算法。本质上来说，两者都属于广义线性模型，但他们两个要解决的问题不一样，逻辑回归解决的是分类问题，输出的是离散值，线性回归解决的是回归问题，输出的连续值。

我们定义逻辑回归的预测函数为，其中g(x)函数是sigmoid函数，如下图：

决策的边界

假设w0 = -3 ,w1 = 1, w2 =1 ，那么

代价函数

可以将上边的代价函数简化为

梯度下降法

代价函数：

目标：min 即需要求出

那么

在用代码实现之前，插入两个概念：

正确率和召回率

正确率与召回率是广泛应用于信息检索和统计学分类领域的两个度量值，用来评价结果的质量。一般来说，正确率就是检索出来的条目有多少是正确的，召回率就是所有正确的条目有多少被检索出来了。

F1 = 2*（正确率*召回率）/ (正确率+召回率），这个是综合上边两个指标的评估指标，用于综合反映整体的指标。

这几个指标的取值都在0-1之间，数值越接近于1，效果越好。

梯度下降法实现逻辑回归代码

导入所用的包

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report  #对模型进行评估
from sklearn import preprocessing  #对数据做标准化
scale = True
%matplotlib inline

导入数据对数据进行处理，画散点图

data = np.genfromtxt('LR-testSet.csv',delimiter = ',')
x_data = data[:,:-1]
y_data = data[:,-1]

def plot():
    x0 = []
    y0 = []
    x1 = []
    y1 = []
    
    for i in range(len(x_data)):
        if y_data[i] == 0:
            x0.append(x_data[i,0])
            y0.append(x_data[i,1])
        else:
            x1.append(x_data[i,0])
            y1.append(x_data[i,1])
    
    #画图
    scatter0 = plt.scatter(x0,y0,c='b',marker='o')
    scatter1 = plt.scatter(x1,y1,c='r',marker='x')
    #画图例
    plt.legend(handles = [scatter0,scatter1],labels=['label0','label1'],loc ='best')
    
plot()
plt.show()

将数据转换成矩阵形式，添加偏置

#数据进行处理
x_data = data[:,:-1]
y_data = data[:,-1,np.newaxis]
#转成矩阵
print(np.mat(x_data).shape)
print(np.mat(y_data).shape)
#添加偏执
X_data = np.concatenate((np.ones((100,1)),x_data),axis = 1)
print(X_data.shape)

实现sigmoid函数以及代价函数

def sigmoid(x):
    return 1.0/(1+np.exp(-x))

def cost(Xmat,Ymat,ws):
    left  = np.multiply(Ymat,np.log(sigmoid(Xmat*ws)))
    right = np.multiply(1-Ymat,np.log(1-sigmoid(Xmat*ws)))
    return np.sum(left+right) / -(len(Xmat))

def gradAscent(Xarr,Yarr):
    
    if scale == True:
        Xarr = preprocessing.scale(Xarr)
    Xmat = np.mat(Xarr)
    Ymat = np.mat(Yarr)
    
    lr = 0.001
    epochs  =10000
    costList = []
    
    m,n = np.shape(Xmat) #行代表数据的个数，列代表权值的个数
    ws = np.mat(np.ones((n,1))) #初始化权值
    
    for i in range(epochs+1):
        h = sigmoid(Xmat*ws)
        ws_grad = Xmat.T*(h - Ymat)/m
        ws = ws - lr *ws_grad
        
        if i % 50 ==0:
            costList.append(cost(Xmat,Ymat,ws))
        
    return ws,costList

调用函数

ws,costList = gradAscent(X_data,y_data)

做预测

#做预测
def predict(x_data,ws):
     if scale == True:
        x_data = preprocessing.scale(x_data)
     Xmat = np.mat(x_data)
     ws = np.mat(ws)
     return [1 if x >= 0.5 else 0 for x in sigmoid(Xmat*ws)]

predictions = predict(X_data,ws)

print(classification_report(y_data,predictions))

结果显示：在导入包中，我们最初领scale = True ,代表做标准化，结果为0.96

我们改为scale = False ,结果显示如下，为0.92

从上边对比，我们可以看到综合评估的指标一般会更好。

sklearn实现逻辑回归代码

导入所用的包

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report  #对模型进行评估
from sklearn import preprocessing  #对数据做标准化
from sklearn import linear_model
#数据是否需要标准化
scale =True
%matplotlib inline

导入数据对数据进行处理，画散点图

data = np.genfromtxt('LR-testSet.csv',delimiter = ',')
x_data = data[:,:-1]
y_data = data[:,-1]

def plot():
    x0 = []
    y0 = []
    x1 = []
    y1 = []
    
    for i in range(len(x_data)):
        if y_data[i] == 0:
            x0.append(x_data[i,0])
            y0.append(x_data[i,1])
        else:
            x1.append(x_data[i,0])
            y1.append(x_data[i,1])
    
    #画图
    scatter0 = plt.scatter(x0,y0,c='b',marker='o')
    scatter1 = plt.scatter(x1,y1,c='r',marker='x')
    #画图例
    plt.legend(handles = [scatter0,scatter1],labels=['label0','label1'],loc ='best')
    
plot()
plt.show()

训练模型

logistic = linear_model.LogisticRegression()
logistic.fit(x_data,y_data)

做预测

predictions = logistic.predict(x_data)

print(classification_report(y_data,predictions))

结果显示

xiaoming1999

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
回归----逻辑回归

引入逻辑回归是在线性回归的基础上加了一个 Sigmoid 函数（非线形）映射，使得逻辑回归称为了一个优秀的分类算法。本质上来说，两者都属于广义线性模型，但他们两个要解决的问题不一样，逻辑回归解决的是分类问题，输出的是离散值，线性回归解决的是回归问题，输出的连续值。我们定义逻辑回归的预测函数为，其中g(x)函数是sigmoid函数，如下图：决策的边界假设w0 = -3 ,w1 = 1, w2 =1 ，那么代价函数...
复制链接

扫一扫