【Multilayer Perceptron + Common Activation Functions】

  Given an input $x$, a weight $w$, and a bias $b$, the perceptron outputs:

$$o = \sigma(\langle w, x \rangle + b), \qquad \sigma(x) = \begin{cases} 1 & x > 0 \\ -1 & \text{otherwise} \end{cases}$$

As we can see, the perceptron is designed for binary classification: it outputs −1 or 1.

Training the Perceptron

$$
\begin{aligned}
&\textbf{initialize } w = 0 \textbf{ and } b = 0\\
&\textbf{repeat}\\
&\quad \textbf{if } y_i[\langle w, x_i \rangle + b] \le 0 \textbf{ then}\\
&\qquad w \leftarrow w + y_i x_i \quad \textbf{and} \quad b \leftarrow b + y_i\\
&\quad \textbf{end if}\\
&\textbf{until } \text{all samples are classified correctly}
\end{aligned}
$$

This is equivalent to gradient descent with a batch size of 1, using the following loss function:

$$\ell(y, x, w) = \max(0, -y\langle w, x \rangle)$$

  • The parameter update comes from taking partial derivatives of the loss with respect to $w$ and $b$; the derivation is straightforward, so it is omitted here
  • This loss $\ell$ is used because when a sample is classified correctly, $\ell = 0$ and there is no gradient, so no update is needed; when it is misclassified, $\ell > 0$ and the parameters get updated
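
As a concrete illustration, here is a minimal sketch of this training loop in plain NumPy. The toy data and the names X, y, w, b are assumptions made up for the example, not part of the original:

import numpy as np

# Toy linearly separable data with labels in {-1, 1} (illustrative only)
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)   # initialize w = 0
b = 0.0           # initialize b = 0

while True:       # repeat ... until all classified correctly
    errors = 0
    for x_i, y_i in zip(X, y):
        if y_i * (np.dot(w, x_i) + b) <= 0:  # misclassified (or on the boundary)
            w += y_i * x_i                   # w <- w + y_i * x_i
            b += y_i                         # b <- b + y_i
            errors += 1
    if errors == 0:
        break

print(w, b)  # one separating hyperplane for the toy data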

Problems with the Perceptron

  The perceptron cannot fit the XOR function; it can only produce linear decision boundaries.

Summary

  • The perceptron is a binary classification model and one of the earliest AI models
  • Its training algorithm is equivalent to gradient descent with a batch size of 1
  • It cannot fit the XOR function, which led to the first AI winter

Multilayer Perceptron

  Let's revisit the XOR problem: if we make two separate decisions and then combine them, the problem is readily solved, as the sketch below shows.
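
A hedged sketch of this "two decisions" idea in Python (the helper step and the hand-picked thresholds are assumptions for illustration): XOR(x1, x2) equals AND(OR(x1, x2), NAND(x1, x2)), and each of those three pieces is a single linear decision.

def step(z):
    # Hard threshold: 1 if z > 0, else 0 (illustrative helper)
    return 1 if z > 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # first decision: OR(x1, x2)
    h2 = step(1.5 - x1 - x2)    # second decision: NAND(x1, x2)
    return step(h1 + h2 - 1.5)  # combine the two decisions: AND(h1, h2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor(x1, x2))  # prints 0, 1, 1, 0
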
  A multilayer perceptron consists of an input layer, an output layer, and one or more hidden layers. With a single hidden layer, the structure is:

  • The number of hidden layers is a hyperparameter, chosen by hand
  • Input: $x \in \mathbb{R}^n$
  • Hidden layer: $W_1 \in \mathbb{R}^{m \times n}$, $b_1 \in \mathbb{R}^m$
  • Output layer: $W_2 \in \mathbb{R}^m$, $b_2 \in \mathbb{R}$
    $h = \sigma(W_1 x + b_1)$, $\quad o = W_2^T h + b_2$
    where $\sigma$ is the activation function (generally nonlinear)
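
A minimal PyTorch sketch of this single-hidden-layer forward pass; the sizes n = 4 and m = 8 and the random parameters are assumptions for illustration:

import torch

n, m = 4, 8                                 # input size and hidden size (arbitrary)
x = torch.randn(n)                          # x in R^n

W1, b1 = torch.randn(m, n), torch.randn(m)  # hidden layer: W1 in R^{m x n}, b1 in R^m
W2, b2 = torch.randn(m), torch.randn(())    # output layer: W2 in R^m, b2 in R

h = torch.sigmoid(W1 @ x + b1)              # h = sigma(W1 x + b1)
o = W2 @ h + b2                             # o = W2^T h + b2 (a scalar)
print(o)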

Common Activation Functions

  • Sigmoid activation
    Maps the input to (0, 1); it is a smooth ("soft") version of the perceptron's hard threshold:
    $\mathrm{sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$
  • Tanh activation
    Maps the input to (−1, 1):
    $\tanh(x) = \dfrac{1 - e^{-2x}}{1 + e^{-2x}}$
  • ReLU activation
    $\mathrm{ReLU}(x) = \max(x, 0)$
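
All three are available as built-ins in PyTorch; a quick sketch to compare their output ranges:

import torch

x = torch.linspace(-4.0, 4.0, steps=9)
print(torch.sigmoid(x))  # values squashed into (0, 1)
print(torch.tanh(x))     # values squashed into (-1, 1)
print(torch.relu(x))     # negative inputs clipped to 0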

Multiclass Classification

$$y_1, y_2, \ldots, y_k = \mathrm{softmax}(o_1, o_2, \ldots, o_k)$$

That is, we simply apply a softmax to the outputs of the output layer; the only difference from softmax regression is the added hidden layer(s).
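
In practice, softmax is usually computed with the row-wise max subtracted first, since exp of large logits overflows; a minimal sketch (the name stable_softmax is our own):

import torch

def stable_softmax(o):
    # softmax is shift-invariant, so subtracting the row-wise max
    # changes nothing mathematically but keeps exp() from overflowing
    o = o - o.max(dim=-1, keepdim=True).values
    e = torch.exp(o)
    return e / e.sum(dim=-1, keepdim=True)

o = torch.tensor([[1000.0, 1001.0, 1002.0]])
print(stable_softmax(o))  # finite probabilities; a naive exp(1000.) would be inf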

Summary

  • A multilayer perceptron uses hidden layers and activation functions to obtain a nonlinear model
  • Commonly used activation functions are Sigmoid, Tanh, and ReLU
  • Softmax is used to handle multiclass classification
  • The hyperparameters are the number of hidden layers and the size of each hidden layer

Multilayer Perceptron from Scratch

import torch
import torchvision
from matplotlib import pyplot as plt
from torchvision import datasets
from torchvision import transforms
from torch.utils import data
from torch import nn
# Load the dataset
trans = transforms.ToTensor()

train_data = datasets.FashionMNIST(root='../data/', train=True, 
                                  transform=trans, download=True)

test_data = datasets.FashionMNIST(root='../data/', train=False, 
                                  transform=trans, download=True)

# Build the train and test data loaders
def get_dataloader(batch_size, train_data, test_data):
    train_dataloader = data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_dataloader = data.DataLoader(test_data, batch_size=batch_size, shuffle=False)
    return train_dataloader, test_dataloader
# Define the ReLU activation
def relu(x):
    a = torch.zeros_like(x)
    return torch.max(a, x)
# Define the softmax function (naive version; exp can overflow for large logits, see the stable sketch above)
def softmax(x):
    x_exp = torch.exp(x)
    sum_exp = torch.sum(x_exp, dim=-1, keepdim=True)
    return x_exp / sum_exp
# Define the loss function (cross-entropy on the predicted probabilities)
def loss(y_hat, y):
    return -torch.log(y_hat[range(len(y_hat)), y])
# Define the optimizer: minibatch SGD
def mbgd(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()
# Define the model: flatten the input, apply hidden layers with ReLU, then softmax
def model(X, w_params, b_params):
    tmp = X.reshape((-1, w_params[0].shape[0]))
    for w_param, b_param in zip(w_params[:-1], b_params[:-1]):
        tmp = relu(tmp @ w_param + b_param)
    return softmax(tmp @ w_params[-1] + b_params[-1])
# Initialize the parameters (one hidden layer: 784 -> 256 -> 10)
w_1 = torch.normal(0, 0.01, size=(28*28,256), requires_grad=True)
b_1 = torch.zeros(256, requires_grad=True)

w_2 = torch.normal(0, 0.01, size=(256, 10), requires_grad=True)
b_2 = torch.zeros(10, requires_grad=True)

w_params = [w_1, w_2]
b_params = [b_1, b_2]
lr = 0.1
epochs = 10
batch_size = 256
# Define the evaluation metric: classification accuracy
def accuracy(y, y_hat):
    preds = torch.argmax(y_hat, -1)    # predicted class indices
    right = (preds == y).sum().item()  # number of correct predictions
    return right / y.shape[0]
train_dataloader, test_dataloader = get_dataloader(batch_size, train_data, test_data)

for epoch in range(epochs):
    train_acc = 0
    test_acc = 0
    for X,y in train_dataloader:
        y_hat = model(X, w_params, b_params)
        l = loss(y_hat, y)
        l.sum().backward() # backpropagate to compute the gradients
        mbgd([w_1, w_2, b_1, b_2], lr, batch_size)
        train_acc += accuracy(y, y_hat)
    # Compute accuracy on the test set
    with torch.no_grad():
        for x,Y in test_dataloader:
            Y_hat = model(x, w_params, b_params)
            test_acc += accuracy(Y, Y_hat)
            
    print(f'epoch is now {epoch + 1}, the accuracy on train data is {train_acc / (len(train_data) / batch_size)}, and the accuracy on test data is {test_acc / (len(test_data) / batch_size)}')
epoch is now 1, the accuracy on train data is 0.6400277777777778, and the accuracy on test data is 0.7706
epoch is now 2, the accuracy on train data is 0.7906333333333333, and the accuracy on test data is 0.8317
epoch is now 3, the accuracy on train data is 0.8188888888888889, and the accuracy on test data is 0.8429
epoch is now 4, the accuracy on train data is 0.83395, and the accuracy on test data is 0.8497
epoch is now 5, the accuracy on train data is 0.8431055555555556, and the accuracy on test data is 0.8553
epoch is now 6, the accuracy on train data is 0.8496277777777778, and the accuracy on test data is 0.8635
epoch is now 7, the accuracy on train data is 0.8535333333333334, and the accuracy on test data is 0.8663
epoch is now 8, the accuracy on train data is 0.8604333333333334, and the accuracy on test data is 0.8681
epoch is now 9, the accuracy on train data is 0.8626777777777778, and the accuracy on test data is 0.8638
epoch is now 10, the accuracy on train data is 0.867638888888889, and the accuracy on test data is 0.8759

Concise Implementation of the Multilayer Perceptron

import torch
import torchvision
from matplotlib import pyplot as plt
from torchvision import datasets
from torchvision import transforms
from torch.utils import data
from torch import nn
# Load the dataset
trans = transforms.ToTensor()

train_data = datasets.FashionMNIST(root='../data/', train=True, 
                                  transform=trans, download=True)

test_data = datasets.FashionMNIST(root='../data/', train=False, 
                                  transform=trans, download=True)

# Build the train and test data loaders
def get_dataloader(batch_size, train_data, test_data):
    train_dataloader = data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_dataloader = data.DataLoader(test_data, batch_size=batch_size, shuffle=False)
    return train_dataloader, test_dataloader
model = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

model.apply(init_weights)
Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=256, bias=True)
  (2): ReLU()
  (3): Linear(in_features=256, out_features=10, bias=True)
)
# Define the evaluation metric: classification accuracy
def accuracy(y, y_hat):
    preds = torch.argmax(y_hat, -1)    # predicted class indices
    right = (preds == y).sum().item()  # number of correct predictions
    return right / y.shape[0]
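# nn.CrossEntropyLoss combines log-softmax and negative log-likelihood,
# so the model above outputs raw logits (no explicit softmax layer)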
loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lr = 0.1
epochs = 10
batch_size = 256
train_dataloader, test_dataloader = get_dataloader(batch_size, train_data, test_data)

for epoch in range(epochs):
    train_acc = 0
    test_acc = 0
    for X,y in train_dataloader:
        y_hat = model(X)
        l = loss(y_hat, y)
        optimizer.zero_grad()
        l.backward() # backpropagate to compute the gradients
        optimizer.step()
        train_acc += accuracy(y, y_hat)
    # Compute accuracy on the test set
    with torch.no_grad():
        for x,Y in test_dataloader:
            Y_hat = model(x)
            test_acc += accuracy(Y, Y_hat)
            
    print(f'epoch is now {epoch + 1}, the accuracy on train data is {train_acc / (len(train_data) / batch_size)}, and the accuracy on test data is {test_acc / (len(test_data) / batch_size)}')
epoch is now 1, the accuracy on train data is 0.6536, and the accuracy on test data is 0.7447
epoch is now 2, the accuracy on train data is 0.7927777777777777, and the accuracy on test data is 0.8291
epoch is now 3, the accuracy on train data is 0.8186222222222223, and the accuracy on test data is 0.8332
epoch is now 4, the accuracy on train data is 0.8341666666666666, and the accuracy on test data is 0.828
epoch is now 5, the accuracy on train data is 0.8420444444444444, and the accuracy on test data is 0.8573
epoch is now 6, the accuracy on train data is 0.8491277777777777, and the accuracy on test data is 0.8593
epoch is now 7, the accuracy on train data is 0.8551222222222222, and the accuracy on test data is 0.8564
epoch is now 8, the accuracy on train data is 0.8605444444444444, and the accuracy on test data is 0.8634
epoch is now 9, the accuracy on train data is 0.8627111111111111, and the accuracy on test data is 0.8655
epoch is now 10, the accuracy on train data is 0.8670666666666667, and the accuracy on test data is 0.8712