Math Derivation + Pure Python Implementation of Machine Learning Algorithms 2: Logistic Regression


Math Derivation

The derivation of parameter estimation for the logistic regression model:

Despite its name, logistic regression has little to do with "logic"; the Chinese name (逻辑回归) is a literal transliteration, and the model is more accurately called log-odds regression (对数几率回归). As is well known, linear regression targets machine learning tasks whose labels are continuous values; if we want to use a linear model for classification, we should use the logistic regression model.

The sigmoid function:
Whereas the dependent variable $y$ of linear regression is a continuous value, the dependent variable of logistic regression is a binary 0/1 value, so we need a mapping that turns the original real value into a 0/1 value. This is where our familiar sigmoid function comes in:

$$f(x)=\frac{1}{1+e^{-x}}$$

Its graph looks like this:
[Figure: the S-shaped curve of the sigmoid function]

Besides looking elegant, the sigmoid function has another very useful property: its derivative can be written in terms of the function itself, which is a great convenience later when we compute the gradient of the cross-entropy loss.

$$f'(x)=f(x)\bigl(1-f(x)\bigr)$$
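This identity is easy to verify directly from the definition of the sigmoid function:

$$f'(x)=\frac{e^{-x}}{(1+e^{-x})^{2}}=\frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}}=f(x)\bigl(1-f(x)\bigr)$$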

From the sigmoid function, the basic form of the logistic regression model is:

$$y=\frac{1}{1+e^{-z}}=\frac{1}{1+e^{-(\boldsymbol w^{T}\boldsymbol x+b)}}$$

Rearranging slightly:

$$\ln\frac{y}{1-y}=\boldsymbol w^{T}\boldsymbol x+b$$

Viewing $y$ as the class posterior probability $p(y=1\mid \boldsymbol x)$, the expression can be rewritten as:

$$\ln\frac{p(y=1\mid \boldsymbol x)}{p(y=0\mid \boldsymbol x)}=\boldsymbol w^{T}\boldsymbol x+b$$

It follows that:

$$p(y=1\mid \boldsymbol x)=\frac{e^{\boldsymbol w^{T}\boldsymbol x+b}}{1+e^{\boldsymbol w^{T}\boldsymbol x+b}}=\hat y$$

$$p(y=0\mid \boldsymbol x)=\frac{1}{1+e^{\boldsymbol w^{T}\boldsymbol x+b}}=1-\hat y$$

Combining the two expressions gives the compact form:

$$p(y\mid \boldsymbol x)=\hat y^{\,y}(1-\hat y)^{1-y}$$

Taking the logarithm yields the log-likelihood, whose negative is the cross-entropy loss function we know so well; this is where the cross-entropy loss comes from:

$$\ln p(y\mid \boldsymbol x)=y\log\hat y+(1-y)\log(1-\hat y)$$

$$\ln p(y\mid \boldsymbol x)=y\log\frac{1}{1+e^{-(\boldsymbol w^{T}\boldsymbol x+b)}}+(1-y)\log\left(1-\frac{1}{1+e^{-(\boldsymbol w^{T}\boldsymbol x+b)}}\right)$$

Maximizing this expression is exactly what statistics calls maximum likelihood estimation. Writing the averaged negative log-likelihood over $m$ samples as the loss $L$ and taking its partial derivatives with respect to $\boldsymbol w$ and $b$ gives:

$$\frac{\partial L}{\partial \boldsymbol w}=\frac{1}{m}\boldsymbol X^{T}(\hat{\boldsymbol y}-\boldsymbol y)$$

$$\frac{\partial L}{\partial b}=\frac{1}{m}\sum_{i=1}^{m}(\hat y_{i}-y_{i})$$

Updating the weights along the gradients of $\boldsymbol w$ and $b$ yields the parameter values that minimize the loss function, i.e., the maximum likelihood estimates of the parameters.
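Before implementing the model, these gradient formulas can be sanity-checked numerically against a finite-difference approximation. A minimal sketch (all names here are illustrative, not part of the implementation below):

import numpy as np

# Cross-entropy loss as derived above
def loss_fn(X, y, w, b):
    a = 1 / (1 + np.exp(-(X @ w + b)))          # sigmoid(Xw + b)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=(5, 1)).astype(float)
w, b = rng.normal(size=(3, 1)), 0.3

# Analytic gradients from the formulas above
a = 1 / (1 + np.exp(-(X @ w + b)))
dw = X.T @ (a - y) / len(X)

# Finite-difference gradient for w[0]
eps = 1e-6
w_plus = w.copy(); w_plus[0, 0] += eps
num_dw0 = (loss_fn(X, y, w_plus, b) - loss_fn(X, y, w, b)) / eps
print(dw[0, 0], num_dw0)   # the two values should closely agree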


NumPy Implementation

  • Define the sigmoid function:
import numpy as np
def sigmoid(x):
    z = 1 / (1 + np.exp(-x))    
    return z
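One caveat: for large negative inputs, `np.exp(-x)` overflows and NumPy emits a RuntimeWarning (the result still saturates to 0). If that matters, a numerically stable variant is a common alternative; a sketch, assuming `x` is a NumPy array (an optional refinement, not used in the rest of this post):

# Numerically stable sigmoid: never exponentiates a large positive number.
def sigmoid_stable(x):
    out = np.empty_like(x, dtype=float)
    pos = x >= 0
    out[pos] = 1 / (1 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])          # here x < 0, so exp(x) cannot overflow
    out[~pos] = ex / (1 + ex)
    return out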
  • Parameter initialization:
def initialize_params(dims):
    w = np.zeros((dims, 1))
    b = 0
    return w, b
  • The main body of the regression model:
def logistic(X, y, w, b):
    num_train = X.shape[0]
    # Model formula: predicted probabilities
    a = sigmoid(np.dot(X, w) + b)
    # Cross-entropy loss
    loss = -np.sum(y*np.log(a) + (1-y)*np.log(1-a)) / num_train
    # Partial derivatives of the parameters
    dw = np.dot(X.T, (a-y)) / num_train
    db = np.sum(a-y) / num_train
    loss = np.squeeze(loss)
    return a, loss, dw, db
  • The gradient-descent training procedure:
def logistic_train(X, y, learning_rate, epochs):
    # Initialize model parameters
    w, b = initialize_params(X.shape[1])
    loss_list = []
    # Iterative training
    for i in range(epochs):
        # Compute current predictions, loss, and parameter gradients
        a, loss, dw, db = logistic(X, y, w, b)

        # Gradient-descent parameter update
        w += -learning_rate * dw
        b += -learning_rate * db

        # Record and print the loss every 100 epochs
        if i % 100 == 0:
            loss_list.append(loss)
            print('epoch %d loss %f' % (i, loss))

    # Save the parameters
    params = {
        'w': w,
        'b': b
    }

    # Save the gradients
    grads = {
        'dw': dw,
        'db': db
    }

    return loss_list, params, grads
  • For a simple training run, generate a toy binary classification dataset with sklearn's make_classification; data preparation:
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

X, labels = make_classification(n_samples=100, n_features=2, n_redundant=0,
                                n_informative=2, random_state=1,
                                n_clusters_per_class=2)
rng = np.random.RandomState(2)
X += 2 * rng.uniform(size=X.shape)

unique_labels = set(labels)
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
for k, col in zip(unique_labels, colors):
    x_k = X[labels == k]
    plt.plot(x_k[:, 0], x_k[:, 1], 'o', markerfacecolor=col, markeredgecolor="k",
             markersize=14)
plt.title('data by make_classification()')
plt.show()

[Figure: scatter plot of the generated two-class dataset]

  • Split the data into simple training and test sets:
# Simple train/test split
offset = int(X.shape[0] * 0.9)
X_train, y_train = X[:offset], labels[:offset]
X_test, y_test = X[offset:], labels[offset:]
y_train = y_train.reshape((-1,1))
y_test = y_test.reshape((-1,1))

print('X_train=', X_train.shape)
print('X_test=', X_test.shape)
print('y_train=', y_train.shape)
print('y_test=', y_test.shape)

X_train= (90, 2)
X_test= (10, 2)
y_train= (90, 1)
y_test= (10, 1)

  • Run the training:
loss_list, params, grads = logistic_train(X_train, y_train, 0.01, 1000)

epoch 0 loss 0.693147
epoch 100 loss 0.554066
epoch 200 loss 0.480953
epoch 300 loss 0.434738
epoch 400 loss 0.402395
epoch 500 loss 0.378275
epoch 600 loss 0.359468
epoch 700 loss 0.344313
epoch 800 loss 0.331783
epoch 900 loss 0.321216
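Since logistic_train records the loss every 100 epochs, the training curve can be plotted directly from loss_list; a small sketch using the matplotlib import from the data-preparation step:

# Plot the recorded training loss (one point per 100 epochs)
plt.plot(np.arange(len(loss_list)) * 100, loss_list, marker='o')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('training loss')
plt.show()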

  • Define a prediction function for the test set:
def predict(X, params):
    w = params['w']
    b = params['b']

    y_prediction = sigmoid(np.dot(X, w) + b)    
    for i in range(len(y_prediction)):        
        if y_prediction[i] > 0.5:
            y_prediction[i] = 1
        else:
            y_prediction[i] = 0
    return y_prediction
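The element-wise thresholding loop can equivalently be written in vectorized form; a sketch (the function name is illustrative):

# Vectorized equivalent of predict: threshold the probabilities at 0.5
def predict_vectorized(X, params):
    proba = sigmoid(np.dot(X, params['w']) + params['b'])
    return (proba > 0.5).astype(float)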
  • Predict on the test data:
y_prediction = predict(X_test, params)
print(y_prediction)

[[0.]
 [1.]
 [1.]
 [0.]
 [1.]
 [1.]
 [0.]
 [0.]
 [1.]
 [0.]]

  • Define a classification accuracy function to evaluate the training and test sets:
def accuracy(y_test, y_pred):
    correct_count = 0
    for i in range(len(y_test)):
        if y_test[i] == y_pred[i]:
            correct_count += 1

    accuracy_score = correct_count / len(y_test)
    return accuracy_score

# Predict on the training set and print training accuracy
y_train_pred = predict(X_train, params)
accuracy_score_train = accuracy(y_train, y_train_pred)
print(accuracy_score_train)

0.8888888888888888
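Since both label arrays have the same shape, the same metric can be computed in one vectorized line; an equivalent sketch (the function name is illustrative):

# Vectorized accuracy: fraction of positions where the labels match
def accuracy_vectorized(y_true, y_pred):
    return float(np.mean(y_true == y_pred))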

  • Check the test-set accuracy:
# Print test accuracy
accuracy_score_test = accuracy(y_test, y_prediction)
print(accuracy_score_test)

1.0

No cross-validation was performed, so the test-set accuracy involves some degree of chance.
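As a rough cross-check on these numbers, sklearn's own LogisticRegression can be fit on the same split; a sketch with default settings (its L2-regularized solver will not reproduce our gradient-descent coefficients exactly):

from sklearn.linear_model import LogisticRegression

# Fit sklearn's implementation on the same split as a reference point
clf = LogisticRegression()
clf.fit(X_train, y_train.ravel())
print('sklearn train accuracy:', clf.score(X_train, y_train.ravel()))
print('sklearn test accuracy:', clf.score(X_test, y_test.ravel()))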

  • Finally, define a plotting function that draws the model's decision boundary to visualize the training result:
def plot_logistic(X_train, y_train, params):
    n = X_train.shape[0]
    xcord1 = []
    ycord1 = []
    xcord2 = []
    ycord2 = []    
    for i in range(n):        
        if y_train[i] == 1:
            xcord1.append(X_train[i][0])
            ycord1.append(X_train[i][1])        
        else:
            xcord2.append(X_train[i][0])
            ycord2.append(X_train[i][1])
        
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1,s=32, c='red')
    ax.scatter(xcord2, ycord2, s=32, c='green')
    x = np.arange(-1.5, 3, 0.1)
    y = (-params['b'] - params['w'][0] * x) / params['w'][1]
    ax.plot(x, y)
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()

plot_logistic(X_train, y_train, params)

[Figure: training data with the learned decision boundary]

  • Wrap it all up in a logistic regression class
import numpy as np
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

class logistic_regression():
    def __init__(self):
        pass

    def sigmoid(self, x):
        z = 1 / (1 + np.exp(-x))
        return z

    # Parameter initialization
    def initialize_params(self, dims):
        w = np.zeros((dims, 1))
        b = 0
        return w, b

    # Data preparation:
    # make_classification generates a sample set, typically used for classification algorithms.
    # Parameters:
    # n_samples=100: number of samples
    # n_features=2: number of features = n_informative + n_redundant + n_repeated
    # n_redundant=0: redundant features, random linear combinations of the informative features
    # n_informative=2: number of informative features
    # n_repeated=0: repeated features, drawn at random from the informative and redundant features
    # random_state=1: if an int, the seed used by the random number generator
    # n_classes: number of classes
    # n_clusters_per_class=2: number of clusters each class is composed of
    # Returns:
    # X: array of shape [n_samples, n_features], the generated samples
    # y: array of shape [n_samples], the integer class label of each sample
    def prepare_data(self):
        X, labels = make_classification(n_samples=100, n_features=2,
                                        n_redundant=0, n_informative=2,
                                        random_state=1, n_clusters_per_class=2)

        labels = labels.reshape((-1, 1))
        offset = int(X.shape[0] * 0.9)
        X_train, y_train = X[:offset], labels[:offset]
        X_test, y_test = X[offset:], labels[offset:]
        return X_train, y_train, X_test, y_test

    # Main body of the regression model
    def logistic(self, X, y, w, b):
        num_train = X.shape[0]
        num_feature = X.shape[1]

        # Model formula
        a = self.sigmoid(np.dot(X, w) + b)

        # Cross-entropy loss
        loss = -1 / num_train * np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))
        # Partial derivatives of the parameters
        dw = np.dot(X.T, (a - y)) / num_train
        db = np.sum(a - y) / num_train
        return a, loss, dw, db

    # Gradient-descent training procedure
    def logistic_train(self, X, y, learning_rate, epochs):
        w, b = self.initialize_params(X.shape[1])

        loss_list = []

        for i in range(1, epochs):
            # Compute current predictions, loss, and parameter gradients
            a, loss, dw, db = self.logistic(X, y, w, b)

            # Gradient-descent parameter update
            w += -learning_rate * dw
            b += -learning_rate * db

            # Record and print the loss every 100 epochs
            if i % 100 == 0:
                loss_list.append(loss)
                print('epoch %d loss %f' % (i, loss))

        # Save parameters
        params = {
            'w': w,
            'b': b
        }

        # Save gradients
        grads = {
            'dw': dw,
            'db': db
        }
        return loss_list, params, grads

    # Prediction function for the test set
    def predict(self, X, params):
        w = params['w']
        b = params['b']
        y_prediction = self.sigmoid(np.dot(X, w) + b)

        for i in range(len(y_prediction)):
            if y_prediction[i] > 0.5:
                y_prediction[i] = 1
            else:
                y_prediction[i] = 0

        return y_prediction

    # Accuracy computation
    def accuracy(self, y_test, y_pred):
        correct_count = 0
        for i in range(len(y_test)):
            if y_test[i] == y_pred[i]:
                correct_count += 1

        accuracy_score = correct_count / len(y_test)
        return accuracy_score

    # Visualize the training result
    def plot_logistic(self, X_train, y_train, params):
        n = X_train.shape[0]
        xcord1 = []
        ycord1 = []
        xcord2 = []
        ycord2 = []
        for i in range(n):
            if y_train[i] == 1:
                xcord1.append(X_train[i][0])
                ycord1.append(X_train[i][1])
            else:
                xcord2.append(X_train[i][0])
                ycord2.append(X_train[i][1])
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.scatter(xcord1, ycord1, s=32, c='red')
        ax.scatter(xcord2, ycord2, s=32, c='green')
        x = np.arange(-1.5, 3, 0.1)
        y = (-params['b'] - params['w'][0] * x) / params['w'][1]
        ax.axis([-5, 5, -6, 8])
        ax.plot(x, y)
        plt.xlabel('X1')
        plt.ylabel('X2')
        plt.show()

if __name__ == '__main__':
    model = logistic_regression()
    X_train, y_train, X_test, y_test = model.prepare_data()
    print('X_train=', X_train.shape)
    # X_train= (90, 2)
    print('X_test=', X_test.shape)
    # X_test= (10, 2)
    print('y_train=', y_train.shape)
    # y_train= (90, 1)
    print('y_test=', y_test.shape)
    # y_test= (10, 1)

    # Run the training:
    loss_list, params, grads = model.logistic_train(X_train, y_train, 0.01, 1000)
    print('loss_list=', loss_list)
    # loss_list= [0.5228014004230804, 0.417193666163787, 
    # 0.3485125888648054, 0.30107891442418105, 0.26662317908339517,     	    
    # 0.2405565898328171, 0.2201833810712475, 0.20383444375123333, 
    # 0.1904280006144534]
    
    print('params=', params)
    # params= {'w': array([[ 2.04507814],
    #    [-0.03969595]]), 'b': 0.12322258108840739}
    print('grads=', grads)
	# grads= {'dw': array([[-0.10035347],
    #    [-0.00495971]]), 'db': -0.01367061680594202}

    y_train_pred = model.predict(X_train, params)
    accuracy_score_train = model.accuracy(y_train, y_train_pred)
    print('train accuracy is:', accuracy_score_train)
    y_test_pred = model.predict(X_test, params)
    accuracy_score_test = model.accuracy(y_test, y_test_pred)
    print('test accuracy is:', accuracy_score_test)
    model.plot_logistic(X_train, y_train, params)

epoch 100 loss 0.522801
epoch 200 loss 0.417194
epoch 300 loss 0.348513
epoch 400 loss 0.301079
epoch 500 loss 0.266623
epoch 600 loss 0.240557
epoch 700 loss 0.220183
epoch 800 loss 0.203834
epoch 900 loss 0.190428
train accuracy is: 0.9666666666666667
test accuracy is: 1.0

[Figure: decision boundary learned by the logistic_regression class]


References:

1. Zhou Zhihua, 《机器学习》 (Machine Learning), Chapter 3: Linear Models

2. 数学推导+纯Python实现机器学习算法2:逻辑回归
