Notes: Python implementations of common gradient descent algorithms (SGD, RMSprop, Adam, etc.)

Taking the minimization of a simple quadratic (paraboloid) objective as an example, this post implements several common gradient descent algorithms in Python: SGD, Momentum, AdaGrad, RMSprop, AdaDelta, Adam, AdaMax, AMSGrad, and so on.

All parameters have already been tuned, so the script can be run directly to reproduce the results. In each loop iteration, only the component with the largest absolute gradient is descended. The numerical derivative is computed with the following forward-difference formula:
$$f'_g(\mathbf{x}) = \frac{f(\mathbf{x}+h\mathbf{g})-f(\mathbf{x})}{h}$$
where $\mathbf{g}$ is the direction along which the derivative is taken; choosing it appropriately makes it possible to handle some optimization problems with simple constraints.
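As a quick illustration (a minimal standalone sketch, not part of the script below; the helper name numerical_grad and the values used are made up for demonstration), the directional forward difference above can be wrapped as a small function and compared against the analytic gradient 2x of the cost function:

import numpy as np

def f(x):
    return np.power(x[0], 2) + np.power(x[1], 2) + np.power(x[2], 2)

def numerical_grad(f, x, g, h=1e-3):
    # forward difference of f at x along each row of g: (f(x + h*g_i) - f(x)) / h
    return np.array([(f(x + h * gi) - f(x)) / h for gi in g])

x = np.array([50.0, -99.0, 50.0])
g = np.eye(3)                     # unconstrained case: coordinate directions
print(numerical_grad(f, x, g))    # roughly [100, -198, 100], i.e. the analytic gradient 2*x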

The cost function minimized in this example is:
$$\min \ f(x,y,z) = x^2 + y^2 + z^2 \quad \text{s.t.} \quad x + y + z = 1$$

The program defaults to the unconstrained case; the equality constraint takes effect only when g is set to the corresponding constrained directions.
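A brief check of why the commented-out constrained g works: each of its rows sums to zero, so a step along any of those directions leaves x + y + z unchanged, and a starting point on the constraint surface stays on it. The snippet below is only an illustrative sketch (the variable names and the test point are made up):

import numpy as np

g_con = np.array([[ 1.0, -0.5, -0.5],
                  [-0.5,  1.0, -0.5],
                  [-0.5, -0.5,  1.0]])
print(g_con.sum(axis=1))        # [0. 0. 0.]: every direction is tangent to x + y + z = const

x0 = np.array([0.2, 0.3, 0.5])  # a point satisfying x + y + z = 1
x1 = x0 - 0.7 * g_con[1]        # an arbitrary step along the second direction
print(x1.sum())                 # still 1.0 (up to floating-point error)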

The update formulas of the individual gradient algorithms follow the linked reference; the code implementation is as follows.

# -*- coding: utf-8 -*-
"""
Created on Tue Jun  9 09:20:20 2020

@author: Ziz
"""

from math import sqrt
import numpy as np
import matplotlib.pyplot as plt
import time

##      f(x,y) = x^2 + y^2 + z^2
## s.t.  x + y + z = 1
def f(x):
    return np.power(x[0],2)+np.power(x[1],2)+np.power(x[2],2)


x_0 = 50
y_0 = -99
z_0 = 50

# g: directions along which the gradient is evaluated
# g = np.array([[1,-1/2,-1/2],[-1/2,1,-1/2],[-1/2,-1/2,1]])  # constrained descent (rows tangent to x+y+z = const)
g = np.array([[1,0,0],[0,1,0],[0,0,1]])  # unconstrained descent
h = 0.001

x = [x_0,y_0,z_0]


start_time = time.perf_counter()
gradsum = np.zeros([3,1])   # accumulated (squared) gradients / velocity, reused by the different methods
D = np.zeros([3,1])         # running average of squared parameter updates (AdaDelta)
V = np.zeros([3,1])         # first-moment estimate (Adam / AdaMax / AMSGrad)
beta =0.9
alpha = 0.5
x_last = np.zeros_like(x)
V_hat=0
S_hat = 0
def sigmoid_fun(x):
    return (1/(1+np.exp(-x))-0.5)*2
delta_x = np.zeros([3,1])   # previous parameter update (used by AdaDelta)
for i in range(1000):

    delta= [(f(x+h*g[0,:]) - f(x)),
            (f(x+h*g[1,:]) - f(x)),
            (f(x+h*g[2,:]) - f(x))]
    
    grad = np.array([delta[0]/h,
                     delta[1]/h,
                     delta[2]/h])
    max_idx = np.argmax(np.abs(grad))   # component with the largest absolute gradient
    
#SGD
    # x = x - g[max_idx,:]*grad[max_idx]*0.5
      
#Momentum      
    # gradsum[max_idx,0] =gradsum[max_idx,0]*beta + (1-beta)* grad[max_idx]
    # x = x - g[max_idx,:]* gradsum[max_idx]*0.5

#AdaGrad(Adaptive gradient)
    # coef_1 = 100
    # coef_2 = 1e-7
    # gradsum[max_idx,0] =gradsum[max_idx,0]+grad[max_idx]**2
    # x = x - coef_1/sqrt(gradsum[max_idx]+coef_2)*g[max_idx,:]*grad[max_idx]
    
#RMSprop
    # coef_1 = 2   #lr
    # coef_2 = 1e-7  #avoid zeros
    # coef_3 = 0.9   #Decay rate
    # gradsum[max_idx,0] =gradsum[max_idx,0]*coef_3+(1-coef_3)*grad[max_idx]**2
    # x = x - coef_1/sqrt(gradsum[max_idx]+coef_2)*g[max_idx,:]*grad[max_idx]    
  
#AdaDelta
    x_last = x
    coef_2 = 1e-1   #avoid zeros
    coef_3 = 0.95   #decay rate
    gradsum[max_idx,0] = gradsum[max_idx,0]*coef_3 + (1-coef_3)*(grad[max_idx]**2)
    x = x - sqrt(D[max_idx,0]+coef_2)/sqrt(gradsum[max_idx,0]+coef_2)*g[max_idx,:]*grad[max_idx]
    delta_x = x - x_last                                     #current update, used for the E[dx^2] average
    D[max_idx,0] = D[max_idx,0]*coef_3 + (1-coef_3)*(delta_x[max_idx]**2)
#Adam
    # coef_1 = 1
    # coef_2 = 1e-8   #avoid zeros
    # coef_3 = 0.999  #decay rate for gradsum (beta2)
    # coef_4 = 0.9    #decay rate for V (beta1)
    # V[max_idx,0] = V[max_idx,0]*coef_4 + (1-coef_4)*grad[max_idx]
    # gradsum[max_idx,0] = gradsum[max_idx,0]*coef_3 + (1-coef_3)*grad[max_idx]**2
    # V_hat = V[max_idx,0]/(1-coef_4)        #bias correction (simplified constant factor)
    # S_hat = gradsum[max_idx,0]/(1-coef_3)
    # x = x - coef_1/(sqrt(S_hat)+coef_2)*V_hat*g[max_idx,:]
#AdaMax
    # coef_1 = 0.002
    # coef_3 = 0.999  #decay rate for the infinity-norm accumulator gradsum (beta2)
    # coef_4 = 0.99   #decay rate for V (beta1)
    # gradsum[max_idx,0] = max([coef_3*gradsum[max_idx,0], abs(grad[max_idx])])
    # V[max_idx,0] = V[max_idx,0]*coef_4 + (1-coef_4)*grad[max_idx]
    # V_hat = V[max_idx,0]/(1-coef_4)
    # x = x - coef_1/gradsum[max_idx,0]*V_hat*g[max_idx,:]
 
#AMSGrad
    # coef_1 = 1.698
    # coef_2 = 1e-8   #avoid zeros
    # coef_3 = 0.999  #decay rate for gradsum (beta2)
    # coef_4 = 0.99   #decay rate for V (beta1)
    # gradsum[max_idx,0] = gradsum[max_idx,0]*coef_3 + (1-coef_3)*(grad[max_idx]**2)
    # V[max_idx,0] = V[max_idx,0]*coef_4 + (1-coef_4)*grad[max_idx]
    # S_hat = max(S_hat, gradsum[max_idx,0])   #keep the running maximum of the second moment
    # x = x - coef_1/(sqrt(S_hat)+coef_2)*V[max_idx,0]*g[max_idx,:]
    
       
#take turns (cycle through the three directions, one per iteration)
    # j = i % 3
    # grad_j = (f(x+h*g[j,:]) - f(x))/h
    # x = x - h*g[j,:]*grad_j
end_time = time.perf_counter()           
print('The answer is {},min value is {},\r\n time_cost = {}'.format(x,f(x),end_time-start_time))
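For cross-checking the commented Adam block above, here is a compact, self-contained Adam loop on the same objective. It uses the analytic gradient, full-vector updates (no largest-component selection), and the textbook bias correction with beta**t; the hyperparameter values below are illustrative, not the tuned ones from the script:

import numpy as np

def grad_f(x):                     # analytic gradient of f(x,y,z) = x^2 + y^2 + z^2
    return 2 * x

x = np.array([50.0, -99.0, 50.0])
m = np.zeros(3)                    # first-moment estimate
v = np.zeros(3)                    # second-moment estimate
lr, beta1, beta2, eps = 0.5, 0.9, 0.999, 1e-8

for t in range(1, 1001):
    grad = grad_f(x)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)     # bias correction uses beta**t, not the constant 1-beta
    v_hat = v / (1 - beta2**t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)

print(x, np.sum(x**2))             # should approach the unconstrained minimum at the origin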
    
