【Multilayer Perceptron + Common Activation Functions】

  Given an input $x$, weights $w$, and a bias $b$, the perceptron outputs:
$$o=\sigma(\langle w,x\rangle+b),\quad \sigma(x)=\begin{cases}1 & x>0\\ -1 & \text{otherwise}\end{cases}$$



$$
\begin{array}{l}
\textbf{initialize } w=0 \text{ and } b=0\\
\textbf{repeat}\\
\quad \textbf{if } y_i[\langle w,x_i\rangle+b]\le 0 \textbf{ then}\\
\quad\quad w\leftarrow w+y_ix_i \text{ and } b\leftarrow b+y_i\\
\quad \textbf{end if}\\
\textbf{until } \text{all classified correctly}
\end{array}
$$
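The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names `score`, `predict`, and `train_perceptron`, the `max_epochs` safety cap, and the toy AND dataset are all assumptions made for the example.

```python
def score(w, b, x):
    """Return <w, x> + b for plain Python lists."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    """Perceptron output: 1 if <w, x> + b > 0, otherwise -1."""
    return 1 if score(w, b, x) > 0 else -1


def train_perceptron(X, y, max_epochs=100):
    """Apply the perceptron update until a full pass makes no mistakes.

    max_epochs is a safety cap added for this sketch; the rule itself
    simply repeats until every sample is classified correctly.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * score(w, b, x_i) <= 0:  # misclassified or on the boundary
                w = [wj + y_i * xj for wj, xj in zip(w, x_i)]
                b += y_i
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False


# Toy, linearly separable data: logical AND with labels in {-1, +1}
X_and = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_and = [-1, -1, -1, 1]
w, b, converged = train_perceptron(X_and, y_and)
print(converged, [predict(w, b, x) for x in X_and])
```

Because the AND data is linearly separable, the loop stops after a finite number of updates and the predictions match the labels.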

This training rule is equivalent to gradient descent with a batch size of 1 on the following loss:
$$l(y,x,w)=\max(0,-y\langle w,x\rangle)$$

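  • The parameter update follows from taking the partial derivatives of $l$ with respect to $w$ and $b$; the derivation is straightforward and is omitted here.
  • The loss $l$ is used because a correctly classified sample gives $l=0$, so there is no gradient and no update is needed; a misclassified sample gives $l>0$, so the parameters are updated.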
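For readers who want the skipped step, here is a brief sketch of the derivation. It assumes the bias is written into the loss as $l=\max(0,-y(\langle w,x\rangle+b))$ (the formula above leaves $b$ out) and a learning rate of 1, neither of which is stated explicitly in the notes.

```latex
\frac{\partial l}{\partial w}=
\begin{cases}
-y\,x & y(\langle w,x\rangle+b)<0\\
0 & y(\langle w,x\rangle+b)>0
\end{cases}
\qquad
\frac{\partial l}{\partial b}=
\begin{cases}
-y & y(\langle w,x\rangle+b)<0\\
0 & y(\langle w,x\rangle+b)>0
\end{cases}

% Gradient step with learning rate 1 on a misclassified sample:
w\leftarrow w-\frac{\partial l}{\partial w}=w+y\,x,
\qquad
b\leftarrow b-\frac{\partial l}{\partial b}=b+y
```

At the kink $y(\langle w,x\rangle+b)=0$ the loss is not differentiable; the perceptron rule treats this case as a mistake (the condition is $\le 0$), which corresponds to choosing the subgradient $-y\,x$.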




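  • The perceptron is a binary classification model and one of the earliest AI models.
  • Its training algorithm is equivalent to gradient descent with a batch size of 1.
  • It cannot fit the XOR function, which led to the first AI winter.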
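The XOR limitation is easy to see experimentally. The sketch below runs the perceptron rule on XOR and, for comparison, on OR. The function name `perceptron_converges` and the `max_epochs` cap are assumptions for this example; the cap is needed because on XOR the rule would otherwise loop forever.

```python
def perceptron_converges(X, y, max_epochs=1000):
    """Return True if the perceptron rule reaches a mistake-free pass."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            s = sum(wj * xj for wj, xj in zip(w, x_i)) + b
            if y_i * s <= 0:  # misclassified or on the boundary
                w = [wj + y_i * xj for wj, xj in zip(w, x_i)]
                b += y_i
                mistakes += 1
        if mistakes == 0:
            return True
    return False


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_xor = [-1, 1, 1, -1]  # XOR: +1 when exactly one input is 1
y_or = [-1, 1, 1, 1]    # OR, for comparison: linearly separable

xor_ok = perceptron_converges(X, y_xor)
or_ok = perceptron_converges(X, y_or)
print("XOR:", xor_ok, "OR:", or_ok)  # prints "XOR: False OR: True"
```

No choice of $w$ and $b$ can satisfy all four XOR constraints at once, so raising `max_epochs` does not help; a single linear boundary is simply not enough.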



  • The number of hidden layers is a hyperparameter, chosen by hand.
  • Input: $x\in \mathbb{R}^n$
  • Hidden layer: $W_1\in \mathbb{R}^{m\times n},\ b_1\in \mathbb{R}^m$
  • Output layer: $W_2\in \mathbb{R}^m,\ b_2\in \mathbb{R}$
    $$h=\sigma(W_1x+b_1),\quad o=W_2^Th+b_2$$
    where $\sigma$ is an activation function (usually nonlinear).
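To make the shapes concrete, here is a small plain-Python sketch of the forward pass for a single hidden layer with a scalar output. The function name `mlp_forward`, the choice of sigmoid for $\sigma$, and the specific numbers are assumptions picked for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, W1, b1, W2, b2, activation=sigmoid):
    """Single-hidden-layer MLP with a scalar output.

    x: list of n inputs; W1: m rows of n weights; b1: m biases;
    W2: m output weights; b2: scalar output bias.
    """
    # h = sigma(W1 x + b1), one hidden unit per row of W1
    h = [activation(sum(w * xj for w, xj in zip(row, x)) + bias)
         for row, bias in zip(W1, b1)]
    # o = W2^T h + b2
    o = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, o


# n = 2 inputs, m = 3 hidden units; the numbers are arbitrary illustrations
x = [1.0, 2.0]
W1 = [[0.0, 0.0], [1.0, -0.5], [-2.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [1.0, 1.0, 1.0]
b2 = 0.5
h, o = mlp_forward(x, W1, b1, W2, b2)
print(len(h), o)  # prints "3 2.0"
```

Every row of `W1` happens to give a pre-activation of 0 here, so each hidden unit outputs 0.5 and the result is easy to check by hand.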


  • Sigmoid activation function
    $$\mathrm{sigmoid}(x)=\frac{1}{1+e^{-x}}$$
  • Tanh activation function
    $$\tanh(x)=\frac{1-e^{-2x}}{1+e^{-2x}}$$
  • ReLU activation function
    $$\mathrm{ReLU}(x)=\max(x,0)$$
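The three formulas translate almost directly into code. The sketch below uses only the standard library and scalar inputs, which is an assumption made to keep it short; the tensor version used for training appears later in the post.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Same form as the formula above: (1 - e^{-2x}) / (1 + e^{-2x})
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))


def relu(x):
    return max(x, 0.0)


for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 4), round(tanh(x), 4), relu(x))
```

Sigmoid maps into (0, 1), tanh into (-1, 1), and the two are related by $\tanh(x)=2\,\mathrm{sigmoid}(2x)-1$. These direct forms are fine for moderate inputs; for inputs of very large magnitude `math.exp` can overflow, which library implementations guard against.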

$$y_1,y_2,\ldots,y_k=\mathrm{softmax}(o_1,o_2,\ldots,o_k)$$
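Softmax turns the $k$ raw outputs into probabilities via $y_i=\exp(o_i)/\sum_j \exp(o_j)$. A minimal sketch in plain Python follows; subtracting the maximum before exponentiating is a common stability trick added here as an assumption (it does not change the result), and is not part of the notes above.

```python
import math


def softmax(o):
    """Map a list of scores o_1..o_k to probabilities y_1..y_k."""
    m = max(o)  # shift by the max so exp() never sees a large positive number
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


y = softmax([1.0, 2.0, 3.0])
print([round(v, 4) for v in y])  # prints [0.09, 0.2447, 0.6652]
```

The outputs are positive, sum to 1, and preserve the ordering of the inputs, so the largest score still gets the largest probability.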


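  • A multilayer perceptron uses hidden layers and activation functions to obtain a nonlinear model.
  • Common activation functions are Sigmoid, Tanh, and ReLU.
  • Softmax is used to handle multiclass classification.
  • The hyperparameters are the number of hidden layers and the size of each hidden layer.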


import torch
import torchvision
from matplotlib import pyplot as plt
from torchvision import datasets
from torchvision import transforms
from torch.utils import data
from torch import nn
# Load the dataset
trans = transforms.ToTensor()

train_data = datasets.FashionMNIST(root='../data/', train=True, 
                                  transform=trans, download=True)

test_data = datasets.FashionMNIST(root='../data/', train=False, 
                                  transform=trans, download=True)

# Build the data loaders
def get_dataloader(batch_size, train_data, test_data):
    train_dataloader = data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_dataloader = data.DataLoader(test_data, batch_size=batch_size, shuffle=False)
    return train_dataloader, test_dataloader
# Define the ReLU activation function
def relu(x):
    a = torch.zeros_like(x)
    return torch.max(a, x)
# Define the softmax function
def softmax(x):
    x_exp = torch.exp(x)
    sum_exp = torch.sum(x_exp, dim=-1, keepdim=True)
    return x_exp / sum_exp
# Define the loss function (cross-entropy loss)
def loss(y_hat, y):
    return -torch.log(y_hat[range(len(y_hat)), y])
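# Define the optimizer (mini-batch gradient descent)
def mbgd(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_() # reset the gradient so it does not accumulate across batches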
# Define the model
def model(X, w_params, b_params):
    tmp = X.reshape((-1, w_params[0].shape[0]))
    for w_param, b_param in zip(w_params[:-1], b_params[:-1]):
        tmp = relu(tmp @ w_param + b_param)
    return softmax(tmp @ w_params[-1] + b_params[-1])
# Initialize the parameters
w_1 = torch.normal(0, 0.01, size=(28*28,256), requires_grad=True)
b_1 = torch.zeros(256, requires_grad=True)

w_2 = torch.normal(0, 0.01, size=(256, 10), requires_grad=True)
b_2 = torch.zeros(10, requires_grad=True)

w_params = [w_1, w_2]
b_params = [b_1, b_2]
lr = 0.1
epochs = 10
batch_size = 256
# Define the evaluation metric (fraction of correct predictions in a batch)
def accuracy(y, y_hat):
    pred = torch.argmax(y_hat, -1)
    right = (pred == y).sum().item()
    return right / y.shape[0]
train_dataloader, test_dataloader = get_dataloader(batch_size, train_data, test_data)

for epoch in range(epochs):
    train_acc = 0
    test_acc = 0
    for X,y in train_dataloader:
        y_hat = model(X, w_params, b_params)
        l = loss(y_hat, y)
        l.sum().backward() # backpropagate to compute the gradients
        mbgd([w_1, w_2, b_1, b_2], lr, batch_size)
        train_acc += accuracy(y, y_hat)
    # Compute the accuracy on the test set
    with torch.no_grad():
        for x,Y in test_dataloader:
            Y_hat = model(x, w_params, b_params)
            test_acc += accuracy(Y, Y_hat)
    print(f'epoch is now{epoch + 1}, the accuracy on train data is {train_acc / (len(train_data) / batch_size)}, and the accuracy on test data is {test_acc / (len(test_data) / batch_size)}')
epoch is now1, the accuracy on train data is 0.6400277777777778, and the accuracy on test data is 0.7706
epoch is now2, the accuracy on train data is 0.7906333333333333, and the accuracy on test data is 0.8317
epoch is now3, the accuracy on train data is 0.8188888888888889, and the accuracy on test data is 0.8429
epoch is now4, the accuracy on train data is 0.83395, and the accuracy on test data is 0.8497
epoch is now5, the accuracy on train data is 0.8431055555555556, and the accuracy on test data is 0.8553
epoch is now6, the accuracy on train data is 0.8496277777777778, and the accuracy on test data is 0.8635
epoch is now7, the accuracy on train data is 0.8535333333333334, and the accuracy on test data is 0.8663
epoch is now8, the accuracy on train data is 0.8604333333333334, and the accuracy on test data is 0.8681
epoch is now9, the accuracy on train data is 0.8626777777777778, and the accuracy on test data is 0.8638
epoch is now10, the accuracy on train data is 0.867638888888889, and the accuracy on test data is 0.8759


import torch
import torchvision
from matplotlib import pyplot as plt
from torchvision import datasets
from torchvision import transforms
from torch.utils import data
from torch import nn
# Load the dataset
trans = transforms.ToTensor()

train_data = datasets.FashionMNIST(root='../data/', train=True, 
                                  transform=trans, download=True)

test_data = datasets.FashionMNIST(root='../data/', train=False, 
                                  transform=trans, download=True)

# Build the data loaders
def get_dataloader(batch_size, train_data, test_data):
    train_dataloader = data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_dataloader = data.DataLoader(test_data, batch_size=batch_size, shuffle=False)
    return train_dataloader, test_dataloader
model = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))

# Initialize the weights of every linear layer from a normal distribution with std 0.01
def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

model.apply(init_weights)
Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=256, bias=True)
  (2): ReLU()
  (3): Linear(in_features=256, out_features=10, bias=True)
)
# Define the evaluation metric (fraction of correct predictions in a batch)
def accuracy(y, y_hat):
    pred = torch.argmax(y_hat, -1)
    right = (pred == y).sum().item()
    return right / y.shape[0]
loss = nn.CrossEntropyLoss()
lr = 0.1
epochs = 10
batch_size = 256
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
train_dataloader, test_dataloader = get_dataloader(batch_size, train_data, test_data)

for epoch in range(epochs):
    train_acc = 0
    test_acc = 0
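    for X,y in train_dataloader:
        y_hat = model(X)
        l = loss(y_hat, y)
        optimizer.zero_grad() # clear the gradients left over from the previous step
        l.backward() # backpropagate to compute the gradients
        optimizer.step() # update the parameters
        train_acc += accuracy(y, y_hat)
    # Compute the accuracy on the test set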
    with torch.no_grad():
        for x,Y in test_dataloader:
            Y_hat = model(x)
            test_acc += accuracy(Y, Y_hat)
    print(f'epoch is now{epoch + 1}, the accuracy on train data is {train_acc / (len(train_data) / batch_size)}, and the accuracy on test data is {test_acc / (len(test_data) / batch_size)}')
epoch is now1, the accuracy on train data is 0.6536, and the accuracy on test data is 0.7447
epoch is now2, the accuracy on train data is 0.7927777777777777, and the accuracy on test data is 0.8291
epoch is now3, the accuracy on train data is 0.8186222222222223, and the accuracy on test data is 0.8332
epoch is now4, the accuracy on train data is 0.8341666666666666, and the accuracy on test data is 0.828
epoch is now5, the accuracy on train data is 0.8420444444444444, and the accuracy on test data is 0.8573
epoch is now6, the accuracy on train data is 0.8491277777777777, and the accuracy on test data is 0.8593
epoch is now7, the accuracy on train data is 0.8551222222222222, and the accuracy on test data is 0.8564
epoch is now8, the accuracy on train data is 0.8605444444444444, and the accuracy on test data is 0.8634
epoch is now9, the accuracy on train data is 0.8627111111111111, and the accuracy on test data is 0.8655
epoch is now10, the accuracy on train data is 0.8670666666666667, and the accuracy on test data is 0.8712
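One difference between the two versions is worth spelling out: the concise model ends in a plain `nn.Linear` with no softmax, because `nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally. It also averages over the batch by default, which is why the concise loop calls `l.backward()` directly. The plain-Python sketch below (helper names and the example logits are hypothetical) checks that computing the loss from logits gives the same value as the from-scratch route of softmax followed by a negative log.

```python
import math


def softmax(o):
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_probs(probs, label):
    """-log of the probability assigned to the true class."""
    return -math.log(probs[label])


def cross_entropy_from_logits(logits, label):
    """Same loss computed directly from raw scores via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


logits = [2.0, -1.0, 0.5]  # hypothetical raw outputs for a 3-class problem
label = 0
a = cross_entropy_from_probs(softmax(logits), label)
b = cross_entropy_from_logits(logits, label)
print(abs(a - b) < 1e-12)  # prints True
```

Working from logits avoids taking the log of a probability that has underflowed to 0, which is one reason the built-in loss is usually preferred over a hand-written softmax.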






