【Multilayer Perceptron + Common Activation Functions】

  Given an input $x$, a weight $w$, and a bias $b$, the perceptron outputs:

$$o = \sigma(\langle w, x \rangle + b), \qquad \sigma(x) = \begin{cases} 1 & x > 0 \\ -1 & \text{otherwise} \end{cases}$$

As we can see, the perceptron is designed for binary classification: it outputs −1 or 1.

Training the Perceptron

$$
\begin{aligned}
&\textbf{initialize } w = 0 \textbf{ and } b = 0\\
&\textbf{repeat}\\
&\quad \textbf{if } y_i[\langle w, x_i \rangle + b] \le 0 \textbf{ then}\\
&\qquad w \leftarrow w + y_i x_i \quad \textbf{and} \quad b \leftarrow b + y_i\\
&\quad \textbf{end if}\\
&\textbf{until } \text{all samples are classified correctly}
\end{aligned}
$$

This is equivalent to gradient descent with a batch size of 1, using the following loss function:

$$\ell(y, x, w) = \max(0, -y\langle w, x \rangle)$$

  • The parameter update comes from taking partial derivatives of the loss with respect to $w$ and $b$; the derivation is straightforward, so it is omitted here
  • This loss $\ell$ is used because when a sample is classified correctly, $\ell = 0$ and there is no gradient, so no update is needed; when it is misclassified, $\ell > 0$ and the parameters get updated
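
As a concrete illustration, here is a minimal sketch of this training loop in plain NumPy. The toy data and the names X, y, w, b are assumptions made up for the example, not part of the original:

import numpy as np

# Toy linearly separable data with labels in {-1, 1} (illustrative only)
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)   # initialize w = 0
b = 0.0           # initialize b = 0

while True:       # repeat ... until all classified correctly
    errors = 0
    for x_i, y_i in zip(X, y):
        if y_i * (np.dot(w, x_i) + b) <= 0:  # misclassified (or on the boundary)
            w += y_i * x_i                   # w <- w + y_i * x_i
            b += y_i                         # b <- b + y_i
            errors += 1
    if errors == 0:
        break

print(w, b)  # one separating hyperplane for the toy data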

Problems with the Perceptron

  The perceptron cannot fit the XOR function; it can only produce linear decision boundaries.

Summary

  • The perceptron is a binary classification model and one of the earliest AI models
  • Its training algorithm is equivalent to gradient descent with a batch size of 1
  • It cannot fit the XOR function, which led to the first AI winter

Multilayer Perceptron

  Let's revisit the XOR problem: if we make two separate decisions and then combine them, the problem is readily solved, as the sketch below shows.
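
A hedged sketch of this "two decisions" idea in Python (the helper step and the hand-picked thresholds are assumptions for illustration): XOR(x1, x2) equals AND(OR(x1, x2), NAND(x1, x2)), and each of those three pieces is a single linear decision.

def step(z):
    # Hard threshold: 1 if z > 0, else 0 (illustrative helper)
    return 1 if z > 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # first decision: OR(x1, x2)
    h2 = step(1.5 - x1 - x2)    # second decision: NAND(x1, x2)
    return step(h1 + h2 - 1.5)  # combine the two decisions: AND(h1, h2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor(x1, x2))  # prints 0, 1, 1, 0
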
  A multilayer perceptron consists of an input layer, an output layer, and one or more hidden layers. With a single hidden layer, the structure is:

  • The number of hidden layers is a hyperparameter, chosen by hand
  • Input: $x \in \mathbb{R}^n$
  • Hidden layer: $W_1 \in \mathbb{R}^{m \times n}$, $b_1 \in \mathbb{R}^m$
  • Output layer: $W_2 \in \mathbb{R}^m$, $b_2 \in \mathbb{R}$
    $h = \sigma(W_1 x + b_1)$, $\quad o = W_2^T h + b_2$
    where $\sigma$ is the activation function (generally nonlinear)
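
A minimal PyTorch sketch of this single-hidden-layer forward pass; the sizes n = 4 and m = 8 and the random parameters are assumptions for illustration:

import torch

n, m = 4, 8                                 # input size and hidden size (arbitrary)
x = torch.randn(n)                          # x in R^n

W1, b1 = torch.randn(m, n), torch.randn(m)  # hidden layer: W1 in R^{m x n}, b1 in R^m
W2, b2 = torch.randn(m), torch.randn(())    # output layer: W2 in R^m, b2 in R

h = torch.sigmoid(W1 @ x + b1)              # h = sigma(W1 x + b1)
o = W2 @ h + b2                             # o = W2^T h + b2 (a scalar)
print(o)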

Common Activation Functions

  • Sigmoid activation
    Maps the input to (0, 1); it is a smooth ("soft") version of the perceptron's hard threshold:
    $\mathrm{sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$
  • Tanh activation
    Maps the input to (−1, 1):
    $\tanh(x) = \dfrac{1 - e^{-2x}}{1 + e^{-2x}}$
  • ReLU activation
    $\mathrm{ReLU}(x) = \max(x, 0)$
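
All three are available as built-ins in PyTorch; a quick sketch to compare their output ranges:

import torch

x = torch.linspace(-4.0, 4.0, steps=9)
print(torch.sigmoid(x))  # values squashed into (0, 1)
print(torch.tanh(x))     # values squashed into (-1, 1)
print(torch.relu(x))     # negative inputs clipped to 0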

Multiclass Classification

$$y_1, y_2, \ldots, y_k = \mathrm{softmax}(o_1, o_2, \ldots, o_k)$$

That is, we simply apply a softmax to the outputs of the output layer; the only difference from softmax regression is the added hidden layer(s).
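
In practice, softmax is usually computed with the row-wise max subtracted first, since exp of large logits overflows; a minimal sketch (the name stable_softmax is our own):

import torch

def stable_softmax(o):
    # softmax is shift-invariant, so subtracting the row-wise max
    # changes nothing mathematically but keeps exp() from overflowing
    o = o - o.max(dim=-1, keepdim=True).values
    e = torch.exp(o)
    return e / e.sum(dim=-1, keepdim=True)

o = torch.tensor([[1000.0, 1001.0, 1002.0]])
print(stable_softmax(o))  # finite probabilities; a naive exp(1000.) would be inf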

Summary

  • A multilayer perceptron uses hidden layers and activation functions to obtain a nonlinear model
  • Commonly used activation functions are Sigmoid, Tanh, and ReLU
  • Softmax is used to handle multiclass classification
  • The hyperparameters are the number of hidden layers and the size of each hidden layer

Multilayer Perceptron from Scratch

import torch
import torchvision
from matplotlib import pyplot as plt
from torchvision import datasets
from torchvision import transforms
from torch.utils import data
from torch import nn
# Load the dataset
trans = transforms.ToTensor()

train_data = datasets.FashionMNIST(root='../data/', train=True, 
                                  transform=trans, download=True)

test_data = datasets.FashionMNIST(root='../data/', train=False, 
                                  transform=trans, download=True)

# Build the train and test data loaders
def get_dataloader(batch_size, train_data, test_data):
    train_dataloader = data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_dataloader = data.DataLoader(test_data, batch_size=batch_size, shuffle=False)
    return train_dataloader, test_dataloader
# Define the ReLU activation
def relu(x):
    a = torch.zeros_like(x)
    return torch.max(a, x)
# Define the softmax function (naive version; exp can overflow for large logits, see the stable sketch above)
def softmax(x):
    x_exp = torch.exp(x)
    sum_exp = torch.sum(x_exp, dim=-1, keepdim=True)
    return x_exp / sum_exp
# Define the loss function (cross-entropy on the predicted probabilities)
def loss(y_hat, y):
    return -torch.log(y_hat[range(len(y_hat)), y])
# Define the optimizer: minibatch SGD
def mbgd(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()
# Define the model: flatten the input, apply hidden layers with ReLU, then softmax
def model(X, w_params, b_params):
    tmp = X.reshape((-1, w_params[0].shape[0]))
    for w_param, b_param in zip(w_params[:-1], b_params[:-1]):
        tmp = relu(tmp @ w_param + b_param)
    return softmax(tmp @ w_params[-1] + b_params[-1])
# Initialize the parameters (one hidden layer: 784 -> 256 -> 10)
w_1 = torch.normal(0, 0.01, size=(28*28,256), requires_grad=True)
b_1 = torch.zeros(256, requires_grad=True)

w_2 = torch.normal(0, 0.01, size=(256, 10), requires_grad=True)
b_2 = torch.zeros(10, requires_grad=True)

w_params = [w_1, w_2]
b_params = [b_1, b_2]
lr = 0.1
epochs = 10
batch_size = 256
# Define the evaluation metric: classification accuracy
def accuracy(y, y_hat):
    preds = torch.argmax(y_hat, -1)    # predicted class indices
    right = (preds == y).sum().item()  # number of correct predictions
    return right / y.shape[0]
train_dataloader, test_dataloader = get_dataloader(batch_size, train_data, test_data)

for epoch in range(epochs):
    train_acc = 0
    test_acc = 0
    for X,y in train_dataloader:
        y_hat = model(X, w_params, b_params)
        l = loss(y_hat, y)
        l.sum().backward() # backpropagate to compute the gradients
        mbgd([w_1, w_2, b_1, b_2], lr, batch_size)
        train_acc += accuracy(y, y_hat)
    # Compute accuracy on the test set
    with torch.no_grad():
        for x,Y in test_dataloader:
            Y_hat = model(x, w_params, b_params)
            test_acc += accuracy(Y, Y_hat)
            
    print(f'epoch is now {epoch + 1}, the accuracy on train data is {train_acc / (len(train_data) / batch_size)}, and the accuracy on test data is {test_acc / (len(test_data) / batch_size)}')
epoch is now 1, the accuracy on train data is 0.6400277777777778, and the accuracy on test data is 0.7706
epoch is now 2, the accuracy on train data is 0.7906333333333333, and the accuracy on test data is 0.8317
epoch is now 3, the accuracy on train data is 0.8188888888888889, and the accuracy on test data is 0.8429
epoch is now 4, the accuracy on train data is 0.83395, and the accuracy on test data is 0.8497
epoch is now 5, the accuracy on train data is 0.8431055555555556, and the accuracy on test data is 0.8553
epoch is now 6, the accuracy on train data is 0.8496277777777778, and the accuracy on test data is 0.8635
epoch is now 7, the accuracy on train data is 0.8535333333333334, and the accuracy on test data is 0.8663
epoch is now 8, the accuracy on train data is 0.8604333333333334, and the accuracy on test data is 0.8681
epoch is now 9, the accuracy on train data is 0.8626777777777778, and the accuracy on test data is 0.8638
epoch is now 10, the accuracy on train data is 0.867638888888889, and the accuracy on test data is 0.8759

Concise Implementation of the Multilayer Perceptron

import torch
import torchvision
from matplotlib import pyplot as plt
from torchvision import datasets
from torchvision import transforms
from torch.utils import data
from torch import nn
# Load the dataset
trans = transforms.ToTensor()

train_data = datasets.FashionMNIST(root='../data/', train=True, 
                                  transform=trans, download=True)

test_data = datasets.FashionMNIST(root='../data/', train=False, 
                                  transform=trans, download=True)

# Build the train and test data loaders
def get_dataloader(batch_size, train_data, test_data):
    train_dataloader = data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_dataloader = data.DataLoader(test_data, batch_size=batch_size, shuffle=False)
    return train_dataloader, test_dataloader
model = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

model.apply(init_weights)
Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=256, bias=True)
  (2): ReLU()
  (3): Linear(in_features=256, out_features=10, bias=True)
)
# Define the evaluation metric: classification accuracy
def accuracy(y, y_hat):
    preds = torch.argmax(y_hat, -1)    # predicted class indices
    right = (preds == y).sum().item()  # number of correct predictions
    return right / y.shape[0]
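# nn.CrossEntropyLoss combines log-softmax and negative log-likelihood,
# so the model above outputs raw logits (no explicit softmax layer)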
loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lr = 0.1
epochs = 10
batch_size = 256
train_dataloader, test_dataloader = get_dataloader(batch_size, train_data, test_data)

for epoch in range(epochs):
    train_acc = 0
    test_acc = 0
    for X,y in train_dataloader:
        y_hat = model(X)
        l = loss(y_hat, y)
        optimizer.zero_grad()
        l.backward() # backpropagate to compute the gradients
        optimizer.step()
        train_acc += accuracy(y, y_hat)
    # Compute accuracy on the test set
    with torch.no_grad():
        for x,Y in test_dataloader:
            Y_hat = model(x)
            test_acc += accuracy(Y, Y_hat)
            
    print(f'epoch is now {epoch + 1}, the accuracy on train data is {train_acc / (len(train_data) / batch_size)}, and the accuracy on test data is {test_acc / (len(test_data) / batch_size)}')
epoch is now 1, the accuracy on train data is 0.6536, and the accuracy on test data is 0.7447
epoch is now 2, the accuracy on train data is 0.7927777777777777, and the accuracy on test data is 0.8291
epoch is now 3, the accuracy on train data is 0.8186222222222223, and the accuracy on test data is 0.8332
epoch is now 4, the accuracy on train data is 0.8341666666666666, and the accuracy on test data is 0.828
epoch is now 5, the accuracy on train data is 0.8420444444444444, and the accuracy on test data is 0.8573
epoch is now 6, the accuracy on train data is 0.8491277777777777, and the accuracy on test data is 0.8593
epoch is now 7, the accuracy on train data is 0.8551222222222222, and the accuracy on test data is 0.8564
epoch is now 8, the accuracy on train data is 0.8605444444444444, and the accuracy on test data is 0.8634
epoch is now 9, the accuracy on train data is 0.8627111111111111, and the accuracy on test data is 0.8655
epoch is now 10, the accuracy on train data is 0.8670666666666667, and the accuracy on test data is 0.8712