Self-Study Neural Networks Series, Part 6: Improving Optimization Algorithms

ML_python_get√

Category: Machine Learning Notes. Tags: neural networks, python, deep learning

Article link: https://blog.csdn.net/weixin_51499396/article/details/118893192

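Improving Optimization Algorithms

6.1 Parameter Updates
6.2 Initial Weight Values
6.3 Normalizing Activations
6.4 Regularization
6.5 Hyperparameter Validation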

6.1 Parameter Updates

SGD
Momentum
AdaGrad
Adam

6.1.1 SGD

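Simple, but it can be inefficient, for example on f = 0.05x^2 + y^2
Gradient direction: it may not point toward the lowest point
Local minima versus the global minimum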
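
To make the second note concrete, here is a small sketch that measures the angle between the negative gradient of f = 0.05x^2 + y^2 and the straight line to the minimum at (0, 0). The helper names `grad_f` and `angle_to_minimum` are made up for this illustration, and the sample point (-7.2, 2.0) is the same starting point used in the test further down.

```python
import math

def grad_f(x, y):
    """Analytic gradient of f(x, y) = x**2 / 20 + y**2."""
    return x / 10.0, 2.0 * y

def angle_to_minimum(x, y):
    """Angle in degrees between the descent direction at (x, y)
    and the direction that points straight at the minimum (0, 0)."""
    gx, gy = grad_f(x, y)
    step = (-gx, -gy)      # direction gradient descent actually moves in
    target = (-x, -y)      # direction of the true minimum
    dot = step[0] * target[0] + step[1] * target[1]
    norm = math.hypot(*step) * math.hypot(*target)
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

# Off the axes the two directions disagree noticeably
print(angle_to_minimum(-7.2, 2.0))
# On an axis they coincide
print(angle_to_minimum(0.0, 3.0))
```

Because the y term is much steeper than the x term, the descent direction is dominated by y almost everywhere, which is why plain SGD wastes steps on this surface.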

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np 

def f(x,y):
    return np.power(x,2)/20 + np.power(y,2)

fig1 = plt.figure()
ax = fig1.add_subplot(projection='3d')  # Axes3D(fig1) no longer attaches the axes in recent Matplotlib
x = np.arange(-10,10,0.1)
y = np.arange(-10,10,0.1)
x,y = np.meshgrid(x,y)
z = f(x,y)
ax.plot_surface(x,y,z,rstride=1,cstride=1,cmap=plt.cm.coolwarm)
ax.contourf(x,y,z,zdir='z', offset=-2,cmap=plt.cm.coolwarm)
ax.set_xlabel('x',color='r')
ax.set_ylabel('y',color='g')
ax.set_zlabel('z',color='b')
plt.show()
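# The contour lines get sparser toward 0, where the surface is almost flat
# The function value changes almost only along the y direction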

# Gradient plot
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-10,10,1)
y= np.arange(-10,10,2)
u,v = np.meshgrid(-x/10,-2*y) # negative gradient direction: (-df/dx, -df/dy)
fig,ax = plt.subplots()
q = ax.quiver(x,y,u,v)
ax.quiverkey(q,X=0.3,Y=1.1,U=10,label='Quiver Key,length=10',labelpos='E')
plt.show()

class SGD:

    def __init__(self,lr=0.01):
        self.lr=lr
    
    def update(self,params,grads):
        for key in params.keys():
            params[key] -= grads[key]*self.lr

# Test: record the SGD path by hand, starting from (-7.2, 2.0)
params = {}
params['x'] = [-7.2]
params['y'] = [2.0]
x = -7.2
y = 2.0
N=40
lr = 0.9
for i in range(N):
    x  -= x/10*lr
    y -= 2*y*lr
    params['x'].append(x)
    params['y'].append(y)

params

# 2D plot
x = np.arange(-10,10,0.1)
y = np.arange(-10,10,0.1)
x,y = np.meshgrid(x,y)
plt.contourf(x,y,f(x,y),cmap=plt.cm.coolwarm)
plt.plot(params['x'],params['y'],color='black',marker='o',linestyle='solid')
plt.show()
# Zigzag path: the point bounces back and forth across the valley, which is inefficient
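
The test above applies the update rule by hand rather than calling the class. As a rough sketch of how the same path could be produced through the `update(params, grads)` interface, the snippet below repeats the SGD class so it runs on its own and uses plain floats as parameters. The helpers `gradients` and `run` are hypothetical names introduced only for this example.

```python
class SGD:
    """Same update rule as the SGD class in this section."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        for key in params.keys():
            params[key] -= grads[key] * self.lr

def gradients(params):
    """Analytic gradient of f(x, y) = x**2 / 20 + y**2 (hypothetical helper)."""
    return {'x': params['x'] / 10.0, 'y': 2.0 * params['y']}

def run(optimizer, start, steps):
    """Apply the optimizer `steps` times and return the visited points."""
    params = dict(start)  # copy so the caller's dict is not modified
    path = [(params['x'], params['y'])]
    for _ in range(steps):
        optimizer.update(params, gradients(params))
        path.append((params['x'], params['y']))
    return path

path = run(SGD(lr=0.9), {'x': -7.2, 'y': 2.0}, steps=40)
print(path[:3])  # y flips sign on every step, x creeps toward 0
```

Keeping the loop in a helper like `run` means any optimizer that exposes the same `update` method can be dropped in without changing the rest of the code.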

6.1.2 Momentum

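Momentum idea: similar to a ball rolling across a surface
Update rule: v = α*v - lr*grads[key], then params[key] += v, where α is the momentum coefficient
The update carries a velocity, so it has a direction of its own
This damps the excessive oscillation seen with SGD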

class Momentum:

    def __init__(self,lr=0.01,momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = None
    
    def update(self,params,grads):
        if self.v is None:
            self.v = {}
            for key,val in params.items():
                self.v[key] = np.zeros_like(val)
        
        for key in params.keys():
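            # grads and params are arguments, not attributes of the optimizer
            self.v[key] = self.momentum*self.v[key] - self.lr*grads[key]
            params[key] += self.v[key] # the velocity changes how fast each parameter moves
            # The current velocity depends on the previous velocity
            # Updates with opposite signs partly cancel each other (v[key] versus -grads[key])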
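
As an informal check of the notes above, the following sketch applies the momentum rule to f = 0.05x^2 + y^2 with plain floats and compares it with the same loop run with the momentum coefficient set to 0 (which reduces to plain SGD). The function name `momentum_path` and the settings lr=0.1, momentum=0.9, 40 steps are assumptions chosen for illustration, not values from the original post.

```python
def f(x, y):
    return x ** 2 / 20.0 + y ** 2

def grad_f(x, y):
    """Analytic gradient of f."""
    return x / 10.0, 2.0 * y

def momentum_path(x, y, lr, momentum, steps):
    """Follow v = momentum * v - lr * grad, then position += v."""
    vx, vy = 0.0, 0.0
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        vx = momentum * vx - lr * gx
        vy = momentum * vy - lr * gy
        x, y = x + vx, y + vy
        path.append((x, y))
    return path

with_momentum = momentum_path(-7.2, 2.0, lr=0.1, momentum=0.9, steps=40)
without_momentum = momentum_path(-7.2, 2.0, lr=0.1, momentum=0.0, steps=40)

print(f(*with_momentum[-1]), f(*without_momentum[-1]))
```

With these particular settings the momentum run ends at a lower value of f, mainly because the accumulated velocity keeps pushing along the shallow x direction where the raw gradient is small. Other learning rates can change the picture, so treat this as one data point rather than a general result.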

6.1.3 AdaGrad

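Problem addressed: with a learning rate that is too large, training fails to converge
AdaGrad is a learning-rate decay algorithm
h = h + grad^2, i.e. h is the sum of squared gradients over all previous steps
Effective learning rate: η = lr/sqrt(h)
After infinitely many updates η tends to 0; the remedy is to gradually forget gradients from the distant past, which is the RMSProp method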
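
The decay described in these notes can be seen with a few lines of arithmetic. The sketch below tracks η = lr/sqrt(h) for a single scalar parameter that receives a constant gradient of 1.0; the helper name `effective_lrs` and the constant-gradient setup are assumptions made purely to keep the numbers easy to follow.

```python
import math

def effective_lrs(grads, lr=0.01, eps=1e-7):
    """Return the AdaGrad effective learning rate after each gradient."""
    h = 0.0
    rates = []
    for g in grads:
        h += g * g                            # running sum of squared gradients
        rates.append(lr / (math.sqrt(h) + eps))
    return rates

rates = effective_lrs([1.0] * 100, lr=0.1)
print(rates[0], rates[-1])  # the rate shrinks roughly like lr / sqrt(t)
```

With a constant gradient, h grows linearly with the step count t, so the effective rate falls off like 1/sqrt(t) and never recovers. That is the behaviour RMSProp is meant to soften.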

class AdaGrad:

    def __init__(self,lr=0.01):
        self.lr = lr
        self.h =None
    
    def update(self,params,grads):
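        if self.h is None:
            self.h = {}
            # params is an argument, not an attribute of the optimizer
            for key,val in params.items():
                self.h[key] = np.zeros_like(val)
            
        for key in params.keys():
            self.h[key] += grads[key]*grads[key] # accumulate squared gradients
            # 1e-7 avoids division by zero while h is still 0
            params[key] -= self.lr*grads[key]/(np.sqrt(self.h[key])+1e-7)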
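
The notes for this subsection name RMSProp as the remedy for AdaGrad's vanishing learning rate but do not show it. Below is a minimal sketch of one common formulation, in which h becomes an exponential moving average of squared gradients instead of a running sum. The parameter name `decay_rate` and its default of 0.99 are assumptions for this example, as is the small driver loop on f = 0.05x^2 + y^2.

```python
import numpy as np

class RMSProp:
    """Sketch of RMSProp using the same update(params, grads) interface."""

    def __init__(self, lr=0.01, decay_rate=0.99):
        self.lr = lr
        self.decay_rate = decay_rate  # assumed name: how much of the old h is kept
        self.h = None

    def update(self, params, grads):
        if self.h is None:
            self.h = {key: np.zeros_like(val) for key, val in params.items()}

        for key in params.keys():
            # Old squared gradients fade away instead of accumulating forever
            self.h[key] *= self.decay_rate
            self.h[key] += (1 - self.decay_rate) * grads[key] * grads[key]
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)

# Hypothetical driver: minimise f(x, y) = x**2 / 20 + y**2
params = {'x': np.array([-7.2]), 'y': np.array([2.0])}
optimizer = RMSProp(lr=0.1)
for _ in range(100):
    grads = {'x': params['x'] / 10.0, 'y': 2.0 * params['y']}
    optimizer.update(params, grads)

print(params)
```

The only structural difference from AdaGrad is the two lines that update `self.h`: multiplying by `decay_rate` first is what lets the effective learning rate recover once gradients become small.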