Assignment 2: Multi-class Classification with Softmax

明·煜

Last modified 2023-04-25 10:26:31

First published 2023-04-25 10:08:46

Article link: https://blog.csdn.net/weixin_53195427/article/details/130358620

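This article introduces the basic principles of Softmax regression, including the decision function and the definitions of the risk function and the gradient function. A concrete example shows how Softmax handles a multi-class problem, applied here to the Iris dataset. A Python implementation is provided, covering data loading, training and testing, the cross-entropy loss, and optimization by gradient descent.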

I. Basic Principles

1.1 Decision function

1.1.1 Expression

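For a multi-class problem, the class label $y \in \{1,2,\cdots,C\}$ can take C values. Given a sample x, the conditional probability that Softmax regression assigns to class c is:
$p(y=c|x) = \operatorname{softmax}(w^T_cx) = \frac{e^{w_c^Tx}}{\sum_{c_i = 1}^{C}e^{w_{c_i}^Tx}}$
where $w_c$ is the weight vector of class c. The class with the largest conditional probability is taken as the final prediction $\hat{y}$, so the decision function is:
$\hat{y} = \mathop{\mathrm{argmax}}\limits_{c = 1}^{C}{p(y=c|x)} = \mathop{\mathrm{argmax}}\limits_{c = 1}^{C}{w_c^Tx}$
In matrix form, the predicted class probabilities are:
$\hat{y} = \operatorname{softmax}(W^Tx) = \frac{e^{W^Tx}}{1_C^T e ^{W^Tx}}$
where $W=[w_1,\cdots,w_C]$ is the matrix formed by the weight vectors of the C classes, $1_C$ is the C-dimensional all-ones vector, and $\hat{y}$ is here a C-dimensional vector whose c-th element is the predicted conditional probability of class c.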
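The formulas above can be illustrated with a minimal NumPy sketch. The function name `softmax_probs` and the numbers in `W` and `x` are made up for illustration and are not taken from the article's script. The sketch also checks that taking the argmax of the probabilities gives the same class as taking the argmax of the raw scores $w_c^Tx$.

```python
import numpy as np

def softmax_probs(W, x):
    """Return the C-dimensional vector softmax(W^T x)."""
    scores = W.T @ x                    # one score w_c^T x per class, shape (C,)
    scores = scores - np.max(scores)    # shifting by a constant leaves softmax unchanged
    exp_scores = np.exp(scores)
    return exp_scores / np.sum(exp_scores)

# Hypothetical toy values: D = 2 features, C = 3 classes
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.5]])
x = np.array([2.0, 1.0])

p = softmax_probs(W, x)                 # predicted probability of each class
y_hat = int(np.argmax(p))               # predicted class index (0-based)
same_as_scores = y_hat == int(np.argmax(W.T @ x))
print(p, y_hat, same_as_scores)
```

Because the exponential is monotonic and the denominator is shared by all classes, the argmax can be computed from the scores alone when only the predicted class is needed.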

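1.1.2 A concrete example

Let $x$ be a five-dimensional vector (four features plus a constant 1 for the bias term):
$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ 1 \end{bmatrix}$

For the $i$-th sample (N samples in total), $x^{(i)}$ is written as:

$x^{(i)} = \begin{bmatrix} x^{(i)}_1 \\ x^{(i)}_2 \\ x^{(i)}_3 \\ x^{(i)}_4 \\ 1 \end{bmatrix}$

Labels are represented by a C-dimensional one-hot vector $y\in\{0,1\}^C$. Taking C=3 as an example, if the $i$-th sample belongs to the second class, its label is

$y^{(i)} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$

To sort the five-dimensional vectors x into three classes, we need the following weight matrix, whose c-th column is $w_c$:

$W = \begin{bmatrix} w_{11} & w_{12} & w_{13}\\ w_{21} & w_{22} & w_{23}\\ w_{31} & w_{32} & w_{33}\\ w_{41} & w_{42} & w_{43}\\ w_{51} & w_{52} & w_{53}\\ \end{bmatrix}$

The vector form of the Softmax output for the $i$-th sample is then:
$\hat{y}^{(i)} = \operatorname{softmax}(W^Tx^{(i)}) = \frac{e^{W^Tx^{(i)}}}{1_C^T e ^{W^Tx^{(i)}}}$

The class with the largest probability in $\hat{y}^{(i)}$ is taken as the predicted class.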
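To make the shapes in this example concrete, here is a small sketch that stacks samples as rows and applies softmax to each row. The helper name `softmax_rows` and the two sample rows are illustrative assumptions. With an all-zero weight matrix every class gets the same score, so each of the three classes receives probability 1/3.

```python
import numpy as np

def softmax_rows(scores):
    """Apply softmax independently to each row of an (N, C) score matrix."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Two illustrative samples: four features followed by the constant 1
X = np.array([[5.1, 3.5, 1.4, 0.2, 1.0],
              [6.2, 2.9, 4.3, 1.3, 1.0]])
labels = np.array([0, 1])               # 0-based class indices (assumed)
Y = np.eye(3)[labels]                   # one-hot labels, shape (2, 3)

W = np.zeros((5, 3))                    # 5 inputs (incl. bias) x 3 classes
Y_hat = softmax_rows(X @ W)             # row i equals softmax(W^T x^(i))
print(Y.shape, Y_hat.shape)
print(Y_hat)
```

Writing the samples as rows turns $W^Tx^{(i)}$ into the matrix product `X @ W`, which avoids an explicit loop over samples.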

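1.2 Risk function

Using the cross-entropy loss, the risk over all N samples is (in matrix form):
$R(W) = \frac{1}{N}\sum_{i=1}^{N}L^{(i)}(W) = -\frac{1}{N}\sum_{i=1}^{N}(y^{(i)})^T \log {\hat{y}}^{(i)}$
Each term is simply the inner product of the one-hot label with the log of the predicted probabilities, and the terms are then summed over all samples.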
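A short sketch of this risk in NumPy follows. The function name `cross_entropy_risk`, the small `eps` guard against `log(0)`, and the toy probabilities are assumptions added for illustration. Because each label is one-hot, the inner product simply picks out the log-probability of the true class.

```python
import numpy as np

def cross_entropy_risk(Y, Y_hat, eps=1e-12):
    """Mean over samples of -(y^(i))^T log(y_hat^(i)).

    Y and Y_hat have shape (N, C). eps is an added safeguard, not part of
    the formula, that keeps log() finite if a probability is exactly zero.
    """
    per_sample = -np.sum(Y * np.log(Y_hat + eps), axis=1)
    return float(np.mean(per_sample))

# Toy values: two samples, three classes
Y = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0]])
Y_hat = np.array([[0.2, 0.7, 0.1],
                  [0.5, 0.3, 0.2]])

risk = cross_entropy_risk(Y, Y_hat)
by_hand = -(np.log(0.7) + np.log(0.5)) / 2   # only the true-class terms survive
print(risk, by_hand)
```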

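1.3 Gradient function

For the i-th sample, the gradient of the loss is (in matrix form):
$\frac{\partial L^{(i)}(W)}{\partial W} = -x^{(i)} ( y^{(i)}-\hat{y}^{(i)} )^T$
Therefore, with full-batch gradient descent, the parameters are updated as follows.

Initialize $W_0 \leftarrow 0$, then iterate:
$W_{t+1} \leftarrow W_t + \alpha(\frac{1}{N}\sum_{i=1}^{N}x^{(i)} ( y^{(i)}-\hat{y}^{(i)} )^T)$
where $\alpha$ is the learning rate.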
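One way to gain confidence in the per-sample gradient formula is to compare it with a finite-difference approximation. The sketch below does this on random values. All names (`softmax_vec`, `sample_loss`, `sample_grad`), the random seed, and the step size `h` are choices made for this illustration only.

```python
import numpy as np

def softmax_vec(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sample_loss(W, x, y):
    """Cross-entropy loss of one sample: -(y^T) log softmax(W^T x)."""
    return -float(y @ np.log(softmax_vec(W.T @ x)))

def sample_grad(W, x, y):
    """Analytic gradient -x (y - y_hat)^T, shape (D, C)."""
    y_hat = softmax_vec(W.T @ x)
    return -np.outer(x, y - y_hat)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
x = np.append(rng.normal(size=4), 1.0)   # four features plus the constant 1
y = np.array([0.0, 1.0, 0.0])            # one-hot label for the second class

analytic = sample_grad(W, x, y)
numeric = np.zeros_like(W)
h = 1e-6
for r in range(W.shape[0]):
    for c in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[r, c] += h
        W_minus[r, c] -= h
        # central difference approximation of dL/dW[r, c]
        numeric[r, c] = (sample_loss(W_plus, x, y) - sample_loss(W_minus, x, y)) / (2 * h)

max_diff = float(np.max(np.abs(analytic - numeric)))
print(max_diff)
```

If the formula is implemented correctly, `max_diff` should be tiny compared with the size of the gradient entries.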

II. Code Implementation

2.1 Reading the Iris data

import csv
import numpy as np
from matplotlib import pyplot as plt
# Load the raw samples
with open('iris.data') as csv_file:
    data = list(csv.reader(csv_file, delimiter=','))

label_map = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2
}
# Solve the multi-class problem with Softmax

# Extract the samples
X = np.array([[float(x) for x in s[:-1]] for s in data[:150]], np.float32) # each row of X has four features
Y = np.array([[label_map[s[-1]]] for s in data[:150]], np.float32)

# Convert Y to one-hot vectors of 0s and 1s
tmp = np.zeros((Y.shape[0],3)) # buffer for the one-hot labels
for i in range(Y.shape[0]):
    if Y[i] == 0:
        tmp[i] = [1,0,0]
    if Y[i] == 1:
        tmp[i] = [0,1,0]
    if Y[i] == 2:
        tmp[i] = [0,0,1]
Y = tmp

# Split the dataset

# Split into a training set and a test set at a ratio of 8:2
train_idx = np.random.choice(150, 120, replace=False)

test_idx = np.array(list(set(range(150)) - set(train_idx)))

# train: training set, test: test set
X_train, Y_train = X[train_idx], Y[train_idx]
b = np.ones((X_train.shape[0],1)) # constant (bias) column
X_train = np.hstack((X_train, b))
X_test, Y_test = X[test_idx], Y[test_idx]
b = np.ones((X_test.shape[0],1)) # constant (bias) column
X_test = np.hstack((X_test, b))
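The split above is drawn without a fixed seed, so the reported errors change from run to run. As a possible variation, not the author's code, the preprocessing steps can be wrapped in small helpers with a seeded generator. The helper names `split_indices`, `one_hot`, and `add_bias_column` are hypothetical.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Return disjoint (train_idx, test_idx) drawn from a seeded permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return perm[:n_train], perm[n_train:]

def one_hot(labels, num_classes):
    """Map integer labels of shape (N,) to one-hot rows of shape (N, num_classes)."""
    return np.eye(num_classes)[labels]

def add_bias_column(X):
    """Append a column of ones so the last weight row acts as the bias."""
    return np.hstack((X, np.ones((X.shape[0], 1))))

# Small demonstration on made-up values
labels_demo = np.array([0, 2, 1, 1])
Y_demo = one_hot(labels_demo, 3)
X_demo = add_bias_column(np.zeros((4, 4)))
train_idx_demo, test_idx_demo = split_indices(150)
print(Y_demo)
print(X_demo.shape, len(train_idx_demo), len(test_idx_demo))
```

Fixing the seed makes the 8:2 split repeatable, which helps when comparing learning rates or epoch counts.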

2.2 Training and testing

##########################
# Decision function
def softmax(x,w):
    x = x.reshape(x.shape[0],1) # convert to a column vector
    a = (w.T)@x
    a = a - np.max(a) # subtract the max to prevent overflow in exp
    return np.exp(a)/(np.sum(np.exp(a)))

##########################


##########################
# Cross-entropy loss function
def loss(x,y,w):
    total = 0.0 # running sum (avoids shadowing the built-in sum)

    for i in range(x.shape[0]):
        y_hat = softmax(x[i],w) # predicted probabilities, shape (C,1)
        total += float(np.dot(y[i],np.log(y_hat.ravel())))

    return -total/x.shape[0] # mean over all samples

##########################


##########################
# Gradient function
def gradient(x,y,w):
    # A 1-D array cannot be transposed, so reshape to 2-D first
    y_hat = softmax(x,w) # predicted probabilities
    y = y.reshape(y.shape[0],1) # reshape to a column vector
    error = (y-y_hat)
    x = x.reshape(x.shape[0],1)
    return -x @ error.T # gradient contributed by this sample
    

##########################

##########################
# Training function
# Note: the test error is evaluated on the global X_test and Y_test
def train(x,y,w,lr=0.05,epoch=300): # learning rate 0.05, at most epoch=300 iterations
    train_err = []
    test_err = []
    for i in range(epoch):
        reg = np.zeros((w.shape[0],w.shape[1])) # accumulator for the gradient
        if loss(x,y,w) > 0:
            for j in range(x.shape[0]):
                reg += gradient(x[j],y[j],w) # sum the gradients of all samples
            reg = reg/x.shape[0] # mean gradient
            w = w - lr*reg # loss is positive: step along the negative gradient
        test_err.append(test(X_test, Y_test,w))
        train_err.append(test(x,y,w))
        # print('epoch:',i,'train error:',train_err[-1],'test error:',test_err[-1])
    return w,train_err,test_err
##########################


##########################
# Test function: returns the error rate
def test(x,y,w):
    right = 0
    for i in range(x.shape[0]):
        pred = np.argmax(softmax(x[i],w)) # index of the largest probability
        true = np.argmax(y[i]) # position of the 1 in y, i.e. the true class
        if pred == true:
            right += 1
    return 1- right/x.shape[0]
##########################

w = np.ones((X_train.shape[1],Y_train.shape[1])) # all-ones start (Section 1.3 initializes W to zeros)
w,train_err,test_err = train(X_train,Y_train,w)
print(w)
print('Final training and test error','train error:',train_err[-1],'test error:',test_err[-1])
# Plot the training error
plt.plot(train_err)
plt.title('Softmax')
plt.xlabel('epoch')
plt.ylabel('train error')
plt.ylim((-0.3, 1))
plt.grid()
plt.show()

[[ 1.3960391   1.26663754  0.33732336]
 [ 1.98074793  0.6978966   0.32135547]
 [-0.41965035  1.14600963  2.27364073]
 [ 0.33760001  0.69904297  1.96335702]
 [ 1.20345149  1.14126093  0.65528758]]
Final training and test error train error: 0.01666666666666672 test error: 0.09999999999999998

[Figure: training error versus epoch]

# Plot the test error
plt.plot(test_err)
plt.title('Softmax')
plt.xlabel('epoch')
plt.ylabel('test error')
plt.ylim((-0.3, 1))
plt.grid()
plt.show()

[Figure: test error versus epoch]
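The training loop above visits one sample at a time. As an alternative sketch, the full-batch update from Section 1.3 can be written with matrix operations only. This is not the author's implementation: the names `train_vectorized` and `error_rate` are hypothetical, W starts from zeros as in Section 1.3, and the data are synthetic Gaussian clusters generated here so the example runs without `iris.data`.

```python
import numpy as np

def softmax_rows(scores):
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def train_vectorized(X, Y, lr=0.05, epochs=300):
    """Full-batch gradient descent; X is (N, D) with a bias column, Y is (N, C) one-hot."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    n = X.shape[0]
    for _ in range(epochs):
        Y_hat = softmax_rows(X @ W)
        grad = -(X.T @ (Y - Y_hat)) / n   # mean of -x (y - y_hat)^T over samples
        W = W - lr * grad
    return W

def error_rate(X, Y, W):
    pred = np.argmax(X @ W, axis=1)       # argmax of scores equals argmax of probabilities
    return float(np.mean(pred != np.argmax(Y, axis=1)))

# Synthetic data: three well-separated 2-D clusters, 30 points each
rng = np.random.default_rng(0)
centers = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 30)
X_raw = centers[labels] + rng.normal(scale=0.5, size=(90, 2))
X_syn = np.hstack((X_raw, np.ones((90, 1))))
Y_syn = np.eye(3)[labels]

W_syn = train_vectorized(X_syn, Y_syn)
err_syn = error_rate(X_syn, Y_syn, W_syn)
print(W_syn.shape, err_syn)
```

The product `X.T @ (Y - Y_hat)` sums the per-sample outer products in one call, so each epoch costs a few matrix multiplications instead of a Python loop over all samples.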
