Softmax Regression: Principles and Python Implementation

阿库塔姆

Last modified 2023-07-04 17:15:52

Tags: multi-class classification, artificial intelligence, machine learning, python

First published 2023-07-04 17:09:59

Article link: https://blog.csdn.net/weixin_50744311/article/details/131531869

Contents

Principles
- Parameter learning
Python implementation

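In the earlier article Logistic Regression, we explained how to use logistic regression for binary classification. In practice, many classification tasks involve more than two classes, so we need a multi-class method.
Softmax regression is the generalization of Logistic regression to multi-class problems.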

Principles

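For a multi-class problem, the class label $y\in\{1,2,\cdots,C\}$ can take $C$ values. Given a sample $\bf x$, Softmax regression predicts the conditional probability that it belongs to class $c$ as
$$p(y=c|{\bf x})={\rm softmax}({\bf w}_c^{\rm T}{\bf x})=\frac{e^{{\bf w}_c^{\rm T}{\bf x}}}{\sum_{c^{'}=1}^Ce^{{\bf w}_{c^{'}}^{\rm T}{\bf x}}}$$
where ${\bf w}_c$ is the weight vector of class $c$.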
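The decision function of Softmax regression can be written as
$$\hat{y}=\overset{C}{\underset{c=1}{{\rm argmax}}}\,p(y=c|{\bf x})=\overset{C}{\underset{c=1}{{\rm argmax}}}\,{\bf w}_c^{\rm T}{\bf x}$$
The second equality holds because the exponential is monotonically increasing and the denominator is the same for every class.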
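As a small illustration of the probability and decision formulas, the sketch below evaluates them for a single sample. The weight matrix `W` (column `c` holds ${\bf w}_c$) and the sample `x` are made-up values, and classes are indexed from 0 rather than 1, as in the code later in this article.

```python
import numpy as np

# Hypothetical values for illustration only: 2 features, C = 3 classes.
# Column c of W is the weight vector w_c.
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 2.0, -0.5]])
x = np.array([1.0, 2.0])

scores = W.T @ x                             # w_c^T x for every class c
exp_scores = np.exp(scores - scores.max())   # shift by the max for numerical stability
probs = exp_scores / exp_scores.sum()        # p(y = c | x)

# The most probable class is also the class with the largest raw score.
y_hat = int(np.argmax(probs))
print(probs, y_hat)
```

Here the three scores are 2, 4 and -2, so the second class (index 1) is selected whether the argmax is taken over `probs` or over `scores`.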
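When $C=2$, this reduces to the Logistic regression discussed earlier: dividing the numerator and denominator by $e^{{\bf w}_1^{\rm T}{\bf x}}$ gives $p(y=1|{\bf x})=\frac{1}{1+e^{-({\bf w}_1-{\bf w}_2)^{\rm T}{\bf x}}}$, a sigmoid with weight vector ${\bf w}_1-{\bf w}_2$.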
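The two-class case can be checked numerically: with $C=2$, the softmax probability of class 1 should match a sigmoid applied to $({\bf w}_1-{\bf w}_2)^{\rm T}{\bf x}$. The weights and the sample below are arbitrary random values chosen only for this check.

```python
import numpy as np

def sigmoid(t):
    # Logistic function used by binary logistic regression
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=3), rng.normal(size=3)    # arbitrary weight vectors
x = rng.normal(size=3)                             # arbitrary sample

scores = np.array([w1 @ x, w2 @ x])
p_softmax = np.exp(scores) / np.exp(scores).sum()  # two-class softmax
p_logistic = sigmoid((w1 - w2) @ x)                # logistic regression with w = w1 - w2

print(p_softmax[0], p_logistic)
```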

Parameter learning

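The parameters can again be optimized with gradient descent.
Given $N$ training samples $\{({\bf x}^{(n)},y^{(n)})\}_{n=1}^N$, we use the cross-entropy loss to optimize the parameter matrix ${\bf W}=[{\bf w}_1,\cdots,{\bf w}_C]$. For convenience, the class label is represented by a $C$-dimensional one-hot vector ${\bf y}\in\{0,1\}^C$.
The risk function is $\mathcal R({\bf W})=-\frac{1}{N}\sum_{n=1}^N({\bf y}^{(n)})^{\rm T}\log{\hat {\bf y}}^{(n)}$, where ${\hat {\bf y}}^{(n)}={\rm softmax}({\bf W}^{\rm T}{\bf x}^{(n)})$ is the vector of posterior probabilities of sample ${\bf x}^{(n)}$ over all classes.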
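To make the risk function concrete, here is a minimal sketch that evaluates it for two made-up samples. The function name `cross_entropy_risk` and the small constant `eps` (a guard against `log(0)`) are additions of this sketch, not part of the original article.

```python
import numpy as np

def cross_entropy_risk(Y, Y_hat, eps=1e-12):
    # Y: (N, C) one-hot labels; Y_hat: (N, C) predicted probabilities.
    # eps guards against log(0); it is an assumption of this sketch.
    return -np.mean(np.sum(Y * np.log(Y_hat + eps), axis=1))

# Made-up example: N = 2 samples, C = 3 classes.
Y = np.array([[1, 0, 0],
              [0, 0, 1]])
Y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

risk = cross_entropy_risk(Y, Y_hat)
print(risk)
```

Because the labels are one-hot, only the probability assigned to the true class contributes, so the result equals $-(\log 0.7+\log 0.6)/2$.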
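The gradient of the risk function with respect to $\bf W$ is $\frac{\partial {\mathcal R}({\bf W})}{\partial {\bf W}}=-\frac{1}{N}\sum_{n=1}^N{\bf x}^{(n)}({\bf y}^{(n)}-{\hat {\bf y}}^{(n)})^{\rm T}$
Training can therefore proceed by gradient descent, updating ${\bf W}\leftarrow{\bf W}-\alpha\frac{\partial {\mathcal R}({\bf W})}{\partial {\bf W}}$ with learning rate $\alpha$ at each step.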
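One way to gain confidence in the gradient formula is to compare it with central finite differences on random data. The sketch below does this; the helper names (`softmax_rows`, `risk`, `analytic_grad`) and the problem sizes are assumptions made for illustration.

```python
import numpy as np

def softmax_rows(Z):
    # Row-wise softmax with the usual max shift for stability
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def risk(W, X, Y):
    # Cross-entropy risk averaged over the N samples
    return -np.mean(np.sum(Y * np.log(softmax_rows(X @ W)), axis=1))

def analytic_grad(W, X, Y):
    # -(1/N) * sum_n x^(n) (y^(n) - y_hat^(n))^T in matrix form
    return -(X.T @ (Y - softmax_rows(X @ W))) / X.shape[0]

rng = np.random.default_rng(0)
N, D, C = 5, 2, 3                           # arbitrary small sizes
X = rng.normal(size=(N, D))
Y = np.eye(C)[rng.integers(0, C, size=N)]   # random one-hot labels
W = rng.normal(size=(D, C))

# Central finite differences, one entry of W at a time
h = 1e-6
numeric = np.zeros_like(W)
for i in range(D):
    for j in range(C):
        step = np.zeros_like(W)
        step[i, j] = h
        numeric[i, j] = (risk(W + step, X, Y) - risk(W - step, X, Y)) / (2 * h)

max_diff = np.max(np.abs(numeric - analytic_grad(W, X, Y)))
print(max_diff)
```

A very small `max_diff` indicates that the analytic expression and the numerical estimate agree.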

Python implementation

Below we implement multi-class classification with softmax regression.

Generating the dataset

A program generates a dataset with three classes and two features.
[Figure: the generated dataset]
The class means are (2.5, -2.5), (0, 5), and (-5, -5).
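The author's `MakeData` helper is not reproduced in this article. For readers who want to run the code without it, the following is a hypothetical stand-in, not the author's implementation: it assumes isotropic Gaussian clusters with unit standard deviation and 500 samples per class, neither of which is confirmed by the source.

```python
import numpy as np

def make_gaussian_blobs(means, n_per_class, std=1.0, seed=0):
    # Hypothetical stand-in for the author's MakeData class.
    # Assumes isotropic Gaussian clusters; std and seed are illustrative choices.
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    X_parts, y_parts = [], []
    for label, mean in enumerate(means):
        X_parts.append(rng.normal(loc=mean, scale=std,
                                  size=(n_per_class, means.shape[1])))
        y_parts.append(np.full(n_per_class, label))
    X = np.vstack(X_parts)
    y = np.concatenate(y_parts)
    perm = rng.permutation(len(y))   # shuffle so classes are interleaved
    return X[perm], y[perm]

X, y = make_gaussian_blobs([[2.5, -2.5], [0, 5], [-5, -5]], n_per_class=500)
print(X.shape, y.shape, np.bincount(y))
```

The returned `y` already contains integer labels starting at 0, which is what the one-hot conversion used later expects.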

Import the required libraries

import numpy as np
from makedata import MakeData  # generates the data; see the "Generating the dataset" link above
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Define the softmax function and the label $\to$ one-hot conversion function

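def softmax(z):
    # Softmax over the last axis, so each row (one sample's scores) sums to 1.
    # Subtracting the maximum avoids overflow in np.exp without changing the result.
    e_z = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e_z / np.sum(e_z, axis=-1, keepdims=True)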

def one_hot_encode(y, num_classes):
    # Convert integer labels (0 .. num_classes-1) to one-hot encoding
    num_samples = y.shape[0]
    one_hot = np.zeros((num_samples, num_classes))
    one_hot[np.arange(num_samples), y] = 1
    return one_hot
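The subtraction of the maximum inside softmax matters in practice. The sketch below, which assumes a row-wise variant named `softmax_rows` and made-up scores, shows that the naive formula overflows for large scores while the shifted version returns the same probabilities as for the equivalent small scores.

```python
import numpy as np

def softmax_rows(z):
    # Row-wise softmax: each row of z is one sample's scores.
    z = np.atleast_2d(z)
    e_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e_z / np.sum(e_z, axis=1, keepdims=True)

big = np.array([[1000.0, 1001.0, 1002.0]])   # made-up large scores
small = np.array([[0.0, 1.0, 2.0]])          # same scores shifted by -1000

# Naive formula: exp(1000) overflows to inf, and inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(big) / np.sum(np.exp(big), axis=1, keepdims=True)

stable = softmax_rows(big)
print(naive, stable)
```

Softmax is unchanged when the same constant is added to every score in a row, which is why the shift is safe.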

Define the SoftmaxRegression class

class SoftmaxRegression:
    def __init__(self, num_classes, num_features):
        self.num_classes = num_classes
        self.num_features = num_features
        self.w = np.zeros((num_features, num_classes))
    
    def train(self, X, y, learning_rate=0.01, num_iterations=100):
        num_samples = X.shape[0]
        y_enc = one_hot_encode(y, self.num_classes)
        
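        for i in range(num_iterations):
            # Forward pass: class scores and probabilities for every sample
            scores = np.dot(X, self.w)
            prob = softmax(scores)

            # Backward pass: gradient of the cross-entropy risk with respect to w
            gradient = (1 / num_samples) * np.dot(X.T, (prob - y_enc))

            # Weight update
            self.w -= learning_rate * gradient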
    
    def predict(self, X):
        scores = np.dot(X, self.w)
        prob = softmax(scores)
        return np.argmax(prob, axis=1)
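To see whether a loop like the one in `train` is converging, it can help to record the cross-entropy loss at every iteration. The sketch below repeats the same update on made-up toy data (three unit-variance Gaussian clusters, an assumption of this example) with the same default learning rate and iteration count, and stores the loss history in a list called `losses`.

```python
import numpy as np

def softmax_rows(z):
    # Row-wise softmax with max shift for stability
    e_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e_z / np.sum(e_z, axis=1, keepdims=True)

# Made-up toy data: 100 samples around each of three class means
rng = np.random.default_rng(0)
means = np.array([[2.5, -2.5], [0.0, 5.0], [-5.0, -5.0]])
y = np.repeat(np.arange(3), 100)
X = means[y] + rng.normal(size=(300, 2))
Y = np.eye(3)[y]                      # one-hot labels

W = np.zeros((2, 3))                  # same zero initialization as the class
learning_rate, num_iterations = 0.01, 100
losses = []
for _ in range(num_iterations):
    prob = softmax_rows(X @ W)
    # Cross-entropy: mean negative log-probability of the true class
    losses.append(-np.mean(np.log(prob[np.arange(len(y)), y])))
    W -= learning_rate * (X.T @ (prob - Y)) / len(y)

accuracy = np.mean(np.argmax(X @ W, axis=1) == y)
print(losses[0], losses[-1], accuracy)
```

With zero initial weights every class gets probability 1/3, so the first recorded loss is $\log 3$; a loss that keeps falling from there suggests the learning rate is reasonable.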

Using the SoftmaxRegression class

if __name__ == '__main__':
    # Create a softmax regression model with 3 classes and 2 features
    model = SoftmaxRegression(num_classes=3, num_features=2)
    M = [[2.5, -2.5], [0, 5], [-5, -5]]  # class means
    data = MakeData(3, 2, 500, M)
    X, y = data.produce_data()
    y = y.astype(int)  # integer labels are needed for one-hot indexing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

    # Train the model
    model.train(X_train, y_train)

    # Predict with the trained model
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(accuracy)  # print the test-set accuracy