NumPy Optimizer (Momentum)

This article explains how the Momentum optimizer works: it combines the current gradient with past gradients through an exponentially weighted average to make training more efficient. The optimizer is implemented in Python with NumPy (scikit-learn is used only for the data) and applied to a small neural network trained on the two-moons dataset, reaching 94% accuracy on the test set.

Preface

  In the previous chapter, 《Numpy批次训练》, I mentioned that optimizers would be covered next, so this chapter introduces the first one: Momentum.

import numpy as np
import time
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import starknn  # import my own module from the earlier chapters

1. Data Preparation

# Prepare the data
X, y = make_moons(n_samples=1000, noise=0.3)  # features and labels
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)  # split train:test = 9:1
y_train = starknn.idx2onehot(y_train)  # convert the integer labels to one-hot encoding
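
  The idx2onehot helper lives in my starknn module from the earlier chapters, so its implementation is not repeated here. For readers following along without that module, here is a minimal sketch of what such a one-hot conversion might look like (the exact name and signature in starknn are assumptions):

def idx2onehot(labels, num_classes=None):
    # Turn integer class labels, e.g. [0, 1, 1, 0], into a one-hot matrix
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1                    # infer the number of classes
    onehot = np.zeros((labels.shape[0], num_classes))
    onehot[np.arange(labels.shape[0]), labels] = 1.0      # one 1 per row, at the label's index
    return onehot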

2. Introduction to Momentum

  Before introducing Momentum, let's first look at the exponentially weighted average. Suppose we have one week of temperature readings and want to draw a trend line from them. The idea is to blend the previous day's value with today's reading, so that each point on the curve is a weighted combination of past and present.

def exponent_prediction(curr_temp, pre_temp, beta):
    # Weighted blend of the previous value and the current reading
    return beta * pre_temp + (1 - beta) * curr_temp
week_temp = np.array([34, 32, 31, 30, 29, 18, 17])  # one week of temperature readings
# The previous day's temperature for each day; repeat the first value so both vectors keep length 7
week_prev = np.append(week_temp[0], week_temp[:-1])
week_prediction = exponent_prediction(week_temp, week_prev, 0.9)
week_day = np.arange(1, 8)
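
  The snippet above applies the formula to each day independently. A true exponentially weighted average is computed recursively, feeding each new average into the next step, which is exactly the form Momentum will use for gradients. A minimal sketch (the ewa helper is my own illustration, not part of starknn):

def ewa(values, beta=0.9):
    # Recursive exponentially weighted average: v_t = beta * v_{t-1} + (1 - beta) * x_t
    v = values[0]                 # initialise with the first observation
    averaged = [v]
    for x in values[1:]:
        v = beta * v + (1 - beta) * x
        averaged.append(v)
    return np.array(averaged)
week_smoothed = ewa(week_temp, beta=0.9)  # smoothed weekly trend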

  Plotting the readings against the prediction:

fig, ax = plt.subplots(figsize=(7, 4))
ax.scatter(week_day, week_temp, c='r')
ax.plot(week_day, week_prediction, c='g')
ax.legend(["temperature", "prediction"])
ax.set_xlabel('day')
ax.set_ylabel('degree')
plt.savefig('exponent.png')

(Figure: weekly temperature readings with the exponentially weighted prediction.)

  Momentum works in exactly the same way, just with the temperatures replaced by gradients: the new update direction is an exponentially weighted combination of the previous gradients and the current gradient.
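
  In update-rule form, a sketch of a single step (momentum_step is a hypothetical helper for illustration, not a starknn function; w is a parameter, g its current gradient, v the running average of gradients):

def momentum_step(w, g, v, learning_rate=0.01, beta=0.9):
    # One Momentum step: average the gradient, then move the parameter
    v = beta * v + (1 - beta) * g        # exponentially weighted average of gradients
    w = w - learning_rate * v            # step along the averaged direction
    return w, v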

3. Hyperparameter Settings

batch_size = 32       # mini-batch size
learning_rate = 0.01  # learning rate
beta = 0.9            # the beta coefficient used by Momentum
epochs = 10000        # number of training iterations (each iteration processes one mini-batch)
# Network configuration; the parameters themselves are initialised later by starknn.init_layers
nn_cfg = [{"in_features": 2,  "out_features": 25, "activation": "relu"},     # (2, 25)
          {"in_features": 25, "out_features": 50, "activation": "relu"},     # (25, 50)
          {"in_features": 50, "out_features": 50, "activation": "relu"},     # (50, 50)
          {"in_features": 50, "out_features": 25, "activation": "relu"},     # (50, 25)
          {"in_features": 25, "out_features": 2,  "activation": "sigmoid"}]  # (25, 2)

4. Implementation

def calc_momentum(pre_value, curr_value, beta):
    return beta * pre_value + (1 - beta) * curr_value  # exponentially weighted average

def momentum_optimizer(curr_grads, pre_grads, beta):
    results = {}
    if pre_grads:  # previous averaged gradients exist
        for layer, curr_value in curr_grads.items():  # layer name and current gradient
            pre_value = pre_grads[layer]              # the previous averaged gradient
            results[layer] = calc_momentum(pre_value, curr_value, beta)
    else:          # first gradient-descent step: nothing to average with yet
        results = curr_grads
    return results

# Mini-batch training with Momentum
def momentum_train(X, Y, nn_cfg, epochs, learning_rate, batch_size, beta, train=True):
    params = starknn.init_layers(nn_cfg, 2)
    num_batch = X.shape[0] // batch_size  # number of batches per pass over the data
    acc_history = []
    cost_history = []
    pre_grads = {}  # gradients are stored in a dict holding all weights and biases
    for i in range(epochs):
        offset_idx = i % num_batch  # pick one mini-batch per iteration
        X_batch = X[offset_idx * batch_size: (offset_idx + 1) * batch_size, :]
        Y_batch = Y[offset_idx * batch_size: (offset_idx + 1) * batch_size, :]
        # Forward pass
        Y_hat, memory = starknn.forward_full_layer(X_batch, params, nn_cfg)
        # Accuracy
        accuracy = starknn.calc_accuracy(Y_hat, Y_batch, train=train)
        # Loss
        cost = starknn.calc_cost(Y_hat, Y_batch)
        acc_history.append(accuracy)
        cost_history.append(cost)
        # Backward pass
        curr_grads = starknn.full_backward_propagation(Y_hat, Y_batch, memory, params, nn_cfg)
        grads = momentum_optimizer(curr_grads, pre_grads, beta)
        pre_grads = grads  # keep the averaged gradients for the next iteration
        # Parameter update
        params = starknn.update(params, grads, nn_cfg, learning_rate)
    return params, acc_history, cost_history
start = time.time()
params, acc_history, cost_history = momentum_train(x_train, y_train, nn_cfg, epochs, learning_rate, batch_size, beta)
end = time.time()
print('The momentum optimizer time is {:.2f} second.'.format(end-start))
# Evaluate on the held-out test set
y_hat, _ = starknn.forward_full_layer(x_test, params, nn_cfg)
test_accuracy = starknn.calc_accuracy(y_hat, y_test, train=False)
print('The accuracy of this test dataset is {}%.'.format(test_accuracy * 100))
The momentum optimizer time is 37.38 second.
The accuracy of this test dataset is 94.0%.

Summary

  The Momentum optimizer performs noticeably better than the plain SGD from 《Numpy批次训练》: averaging the gradients damps the oscillation of mini-batch updates and keeps the steps pointed in directions where successive gradients agree, so the loss falls faster (a quick way to check this yourself is sketched below). The next chapter will continue with other optimizers, so stay tuned.
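
  Because calc_momentum simply returns the current gradient when beta is 0, the momentum_train function above doubles as a plain mini-batch SGD baseline. A rough way to compare the two loss curves, reusing cost_history from the run above (plot styling is just a suggestion):

# beta = 0 turns the Momentum update into plain mini-batch SGD
_, _, sgd_cost = momentum_train(x_train, y_train, nn_cfg, epochs, learning_rate, batch_size, beta=0.0)
plt.figure(figsize=(7, 4))
plt.plot(cost_history, label='momentum (beta=0.9)')  # loss history from the Momentum run
plt.plot(sgd_cost, label='sgd (beta=0.0)')
plt.xlabel('iteration')
plt.ylabel('cost')
plt.legend()
plt.savefig('momentum_vs_sgd.png')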
