Statistical Learning Methods Notes (8): Logistic Regression, Principles and Python Implementation

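Logistic Regression

Logistic regression (LR) is a classic classification method in statistical learning. Common logistic regression models include binomial logistic regression and multinomial logistic regression; the multinomial model can be viewed as an extension of the binomial one.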

Definition: the logistic regression model

Binomial logistic regression

The binomial logistic regression model is the following conditional probability distribution:

$$P(Y=1|x)=\frac{\exp(w \cdot x+b)}{1+\exp(w \cdot x+b)} \tag{1}$$

$$P(Y=0|x)=\frac{1}{1+\exp(w \cdot x+b)} \tag{2}$$

where $x\in \mathcal{R}^{n}$ is the input, $Y\in \{0,1\}$ is the output, $w \in \mathcal{R}^{n}$ is the weight vector, $b\in \mathcal{R}$ is the bias, and $w\cdot x$ is the inner product of $w$ and $x$.
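
As a quick numerical check of equations (1) and (2), here is a minimal sketch in plain Python. The function name `binomial_lr_probs` and the parameter values are assumptions made up for illustration; they do not come from the original post.

```python
import math

def binomial_lr_probs(w, x, b):
    """Return (P(Y=1|x), P(Y=0|x)) following equations (1) and (2)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # w . x + b
    p1 = math.exp(z) / (1.0 + math.exp(z))        # equation (1)
    p0 = 1.0 / (1.0 + math.exp(z))                # equation (2)
    return p1, p0

# Hypothetical parameters and input, chosen only for illustration
w = [0.5, -0.25]
b = 0.1
x = [2.0, 1.0]

p1, p0 = binomial_lr_probs(w, x, b)
print(p1, p0, p1 + p0)  # the two probabilities sum to 1
```

Because both expressions share the same denominator, the two probabilities always sum to 1, whatever the values of w, b, and x.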

Augment the weight vector and the input vector, still denoting them $w$ and $x$: $w=\left(w^{(1)},w^{(2)},...,w^{(n)},b \right)^{T}$ and $x=\left(x^{(1)},x^{(2)},...,x^{(n)},1\right)^{T}$. The bias is then absorbed into the inner product, and the binomial logistic regression model becomes:

$$P(Y=1|x)=\frac{\exp(w\cdot x)}{1+\exp(w\cdot x)} \tag{3}$$

$$P(Y=0|x)=\frac{1}{1+\exp(w\cdot x)} \tag{4}$$
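
The augmentation step can be checked with a small sketch: appending the bias to the weight vector and a constant 1 to the input leaves the score unchanged. The helper `dot` and all numeric values are hypothetical and used only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical parameters and input
w = [0.5, -0.25]
b = 0.1
x = [2.0, 1.0]

w_aug = w + [b]      # (w^(1), ..., w^(n), b)
x_aug = x + [1.0]    # (x^(1), ..., x^(n), 1)

z_plain = dot(w, x) + b     # exponent used in equations (1) and (2)
z_aug = dot(w_aug, x_aug)   # exponent used in equations (3) and (4)
print(z_plain, z_aug)
```

The position of the constant does not matter as long as it lines up with the bias. The `data_matrix` method in the class further down prepends `1.0` instead of appending it, so there the bias ends up in `weights[0]`.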

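Multinomial logistic regression

Binomial logistic regression is commonly used for binary classification. It can be extended to the multinomial logistic regression model for multi-class classification.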

Let the discrete random variable $Y$ take values in $\{1,2,...,K\}$. The multinomial logistic regression model is:

$$P(Y=k|x)=\frac{\exp(w_k \cdot x)}{1+\sum_{j=1}^{K-1} \exp(w_j \cdot x)}, \quad k=1,2,...,K-1 \tag{5}$$

$$P(Y=K|x)=\frac{1}{1+\sum_{j=1}^{K-1} \exp(w_j \cdot x)} \tag{6}$$

where $x \in \mathcal{R}^{n+1}$ and $w_k \in \mathcal{R}^{n+1}$.
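
A minimal sketch of the multinomial model, assuming K-1 weight vectors and treating class K as the reference class whose probability is 1 over the shared denominator (the usual completion of this parameterization). The function name `multinomial_lr_probs` and the numbers are hypothetical.

```python
import math

def multinomial_lr_probs(W, x):
    """W holds the K-1 weight vectors w_1..w_{K-1}.

    Returns [P(Y=1|x), ..., P(Y=K|x)] as a list of length K.
    """
    scores = [math.exp(sum(wi * xi for wi, xi in zip(w_k, x))) for w_k in W]
    denom = 1.0 + sum(scores)
    probs = [s / denom for s in scores]  # classes 1 .. K-1
    probs.append(1.0 / denom)            # reference class K
    return probs

# Hypothetical example with K = 3 classes and augmented inputs (last entry is 1)
W = [[0.2, -0.1, 0.0],
     [-0.3, 0.4, 0.1]]
x = [1.0, 2.0, 1.0]

probs = multinomial_lr_probs(W, x)
print(probs, sum(probs))
```

With K = 2 this reduces to the binomial model: one weight vector, with class 2 playing the role that Y = 0 plays in equations (3) and (4).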

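Code examples

LogisticRegression example implemented in Python (iris data)

The example code has been uploaded to GitHub (the address is listed at the end of this post).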

from math import exp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

Define the LR classifier

class LogisticReression:
    def __init__(self, max_iter=200, learning_rate=0.01):
        self.max_iter = max_iter
        self.learning_rate = learning_rate

    def sigmoid(self, x):
        # np.exp handles the one-element array that np.dot returns in train()
        return 1 / (1 + np.exp(-x))

    def data_matrix(self, X):
        data_mat = []
        for d in X:
            data_mat.append([1.0, *d])
        return data_mat
    # Training: per-sample (stochastic) gradient ascent on the log-likelihood
    def train(self, X, y):
        data_mat = self.data_matrix(X)  # m samples, each with a leading 1.0 for the bias
        self.weights = np.zeros((len(data_mat[0]), 1), dtype=np.float32)

        for iter_ in range(self.max_iter):
            for i in range(len(X)):
                result = self.sigmoid(np.dot(data_mat[i], self.weights))
                error = y[i] - result
                self.weights += self.learning_rate * error * np.transpose(
                    [data_mat[i]])
        print('LR model: learning rate={}, max iterations={}'.format(
            self.learning_rate, self.max_iter))

    # Accuracy
    def accuracy(self, X_test, y_test):
        right = 0
        X_test = self.data_matrix(X_test)
        for x, y in zip(X_test, y_test):
            result = np.dot(x, self.weights)
            if (result > 0 and y == 1) or (result < 0 and y == 0):
                right += 1
        return right / len(X_test)
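
The update inside `train` can be read as stochastic gradient ascent on the log-likelihood. The short derivation below uses the notation of equations (3) and (4); the symbols $L$, $\sigma$, $\eta$, and $N$ are introduced here for illustration and stand for the log-likelihood, the sigmoid function, the learning rate, and the number of training samples.

```latex
% Log-likelihood of N samples (x_i, y_i), with y_i in {0, 1}
L(w) = \sum_{i=1}^{N} \left[ y_i \,(w \cdot x_i) - \log\left(1 + \exp(w \cdot x_i)\right) \right]

% Gradient, where \sigma(z) = 1 / (1 + \exp(-z)) = P(Y=1|x)
\frac{\partial L}{\partial w} = \sum_{i=1}^{N} \left( y_i - \sigma(w \cdot x_i) \right) x_i

% Per-sample ascent step
w \leftarrow w + \eta \left( y_i - \sigma(w \cdot x_i) \right) x_i
```

In the class, `error` corresponds to $y_i - \sigma(w \cdot x_i)$, `self.learning_rate` to $\eta$, and `data_mat[i]` to the augmented $x_i$.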

Step 1: build the data

def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, [0,1,-1]])
    return data[:,:2], data[:,-1]
X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

Step 2: train and measure accuracy

LR = LogisticReression()
LR.train(X_train, y_train)
LR.accuracy(X_test, y_test)

LR model: learning rate=0.01, max iterations=200
Accuracy: 1.0
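
The `accuracy` method classifies by the sign of the linear score rather than by computing a probability. The sketch below illustrates why that is equivalent to thresholding the sigmoid output at 0.5. The weights and samples are hypothetical, and `score` is a helper written for this illustration, not part of the class.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, x):
    """Linear score; weights[0] is the bias, matching the leading 1.0 from data_matrix."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

# Hypothetical trained weights (bias, w1, w2) and test samples
weights = [-1.0, 2.0, -3.0]
samples = [[1.0, 0.2], [0.5, 0.5], [2.0, 0.1]]

labels_by_sign = [1 if score(weights, x) > 0 else 0 for x in samples]
labels_by_prob = [1 if sigmoid(score(weights, x)) > 0.5 else 0 for x in samples]
print(labels_by_sign, labels_by_prob)
```

One difference in the edge case: `accuracy` counts a score of exactly 0 (probability 0.5) as wrong for either label, while this sketch assigns it to class 0.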

Step 3: visualize the result

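x_points = np.arange(3, 9)
# Decision boundary: weights[0] + weights[1]*x1 + weights[2]*x2 = 0, solved for x2
y_ = -(LR.weights[1]*x_points + LR.weights[0])/LR.weights[2]
plt.plot(x_points, y_)

# Scatter plot of the two classes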
plt.scatter(X[:50,0],X[:50,1], label='0')
plt.scatter(X[50:,0],X[50:,1], label='1')
plt.legend()

[Figure: scatter plot of the two iris classes with the learned decision boundary]
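
The plotted line is the set of points where the model is undecided, that is, where the linear score is 0 and the predicted probability is 0.5. The sketch below checks this for a hypothetical weight vector; `boundary_x2` is an illustrative helper that mirrors the expression used for `y_` above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def boundary_x2(weights, x1):
    """Solve w0 + w1*x1 + w2*x2 = 0 for x2; requires w2 != 0."""
    w0, w1, w2 = weights
    return -(w1 * x1 + w0) / w2

# Hypothetical weights in the order (bias, w1, w2)
weights = [-1.0, 2.0, -3.0]

boundary_probs = []
for x1 in [3.0, 5.0, 8.0]:
    x2 = boundary_x2(weights, x1)
    z = weights[0] + weights[1] * x1 + weights[2] * x2
    boundary_probs.append(sigmoid(z))

print(boundary_probs)  # each value is 0.5 up to floating-point rounding
```

If the weight on the second feature is 0, the boundary is a vertical line and this formula does not apply.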

LogisticRegression example with sklearn

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn

Step 1: build the dataset

candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,710,680,770,610,580,650,540,590,620,600,550,550,570,670,660,580,650,660,640,620,660,660,680,650,670,580,590,690],
              'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,3.3,3.3,3,2.7,3.7,2.7,2.3,3.3,2,2.3,2.7,3,3.3,3.7,2.3,3.7,3.3,3,2.7,4,3.3,3.3,2.3,2.7,3.3,1.7,3.7],
              'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,3,2,1,4,1,2,6,4,2,6,5,1,2,4,6,5,1,2,1,4,5],
              'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0,0,0,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1]
              }

df = pd.DataFrame(candidates,columns= ['gmat', 'gpa','work_experience','admitted'])
df[:10]


X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']
# Use 75% of the data for training and 25% for testing
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,random_state=0)

logistic_regression= LogisticRegression()
# Train
logistic_regression.fit(X_train,y_train)
# Predict
y_pred=logistic_regression.predict(X_test)

# Plot the confusion matrix as a heat map
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(confusion_matrix, annot=True)

print('Accuracy: ',metrics.accuracy_score(y_test, y_pred))
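
To make the heat map and the accuracy score concrete, the sketch below reproduces both computations in plain Python on a small made-up label list. It only illustrates what is being counted and is not how pandas or sklearn implement it. The names `confusion_counts` and `accuracy` and the data are hypothetical.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair, i.e. the cells of the confusion matrix."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Hypothetical labels, not the output of the model above
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
for a in (0, 1):
    print('Actual', a, [counts[(a, p)] for p in (0, 1)])
print('Accuracy:', accuracy(actual, predicted))
```

The diagonal cells (actual equals predicted) are the correct predictions, so accuracy is the diagonal sum divided by the total number of samples.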


References:
[1] Li Hang, Statistical Learning Methods (《统计学习方法》)

GitHub address: https://github.com/Vambooo/lihang-dl
