Gradient Descent (code1.2)

The Gradient Descent Algorithm

Code Lesson

Now, let's write the code for the gradient descent algorithm together.

Recall the update rules:

$$w = w - \alpha \frac{\partial J}{\partial w}$$

$$b = b - \alpha \frac{\partial J}{\partial b}$$

The function needs the following parameters:

w_in, b_in, a

For computing the gradient, there are several options:

(figure: options for computing the gradient)

So we add a parameter gradient_function that computes the gradient.
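For the squared error cost used in this lesson, the two quantities gradient_function must return work out as follows (this is exactly what the gradient() function further below computes):

$$J(w,b)=\frac{1}{2m}\sum_{i=1}^{m}\bigl(wx_i+b-y_i\bigr)^2$$

$$\frac{\partial J}{\partial w}=\frac{1}{m}\sum_{i=1}^{m}\bigl(wx_i+b-y_i\bigr)x_i,\qquad \frac{\partial J}{\partial b}=\frac{1}{m}\sum_{i=1}^{m}\bigl(wx_i+b-y_i\bigr)$$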

To monitor training in real time, it is worth printing the current fit at regular intervals. So we add a logging snippet to the function, along with a cost_function parameter that computes and reports the current value of the cost function.

For the design of the stopping condition, there are several options:

(figure: gradient descent termination conditions)

Here we simply cap the number of iterations, so the function also gets an iteration parameter that controls the iteration count; a tolerance-based alternative is sketched right after this paragraph.
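For reference, a common alternative to a fixed iteration budget is to stop once the cost barely changes between iterations. The fragment below is a minimal sketch of that idea, reusing the x, y, a, w_in, b_in, gradient, cost, and iteration names from this lesson; the epsilon tolerance and cost_prev bookkeeping are illustrative assumptions, not part of the implementation that follows.

# Sketch: stop when the cost stops improving (alternative stopping condition; illustrative only).
epsilon = 1e-6            # assumed tolerance on the change in cost between iterations
cost_prev = float("inf")  # cost from the previous iteration
w, b = w_in, b_in
for it in range(iteration):
    dj_dw, dj_db = gradient(x, y, w, b)
    w = w - a*dj_dw
    b = b - a*dj_db
    cost_now = cost(x, y, w, b)
    if abs(cost_prev - cost_now) < epsilon:  # converged: the cost no longer moves
        break
    cost_prev = cost_now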

Finally, we can lay out the algorithm as the following flowchart.

(figure: flowchart of the gradient descent algorithm)

Let's start coding.

# import necessary modules.
from __future__ import annotations  # postponed evaluation of annotations
from collections.abc import Callable  # type hint for the function-valued parameters
from numpy.typing import ArrayLike  # type hint for the array parameters
import numpy as np  # arrays
import pandas as pd  # display data
import matplotlib.pyplot as plt  # draw figures
np.set_printoptions(precision=2)  # set the display precision of numpy
def gradient(x: ArrayLike, y: ArrayLike, w: float, b: float) -> tuple:
    # Calculate the gradient of the squared error cost function at the given point (w, b).
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(0, m):
        err = w*x[i]+b-y[i]
        dj_dw += err*x[i]
        dj_db += err
    dj_dw = dj_dw/m
    dj_db = dj_db/m
    return (dj_dw, dj_db)
def squaredErrorCost(x: ArrayLike, y: ArrayLike, w: float, b: float) -> float:
    # Calculate the value of the squared error cost function at the given point (w, b).
    m = x.shape[0]
    cost = 0
    for i in range(0, m):
        err = w*x[i]+b-y[i]
        cost += err**2
    cost = cost/(2*m)
    return cost
def gradientDescent(
    x: ArrayLike,
    y: ArrayLike,
    a: float,
    gradient: Callable,
    cost: Callable,
    iteration: int = 10000,
    w_in: float = 1,
    b_in: float = 0,
) -> tuple:
    # Run gradient descent to update w and b.
    w_init = w_in
    b_init = b_in
    cost_now = 0
    save_interval = np.ceil(iteration/10)
    for it in range(0, iteration):
        dj_dw, dj_db = gradient(x, y, w_init, b_init)
        w_init = w_init-a*dj_dw
        b_init = b_init-a*dj_db
        cost_now = cost(x, y, w_init, b_init)
        if it == 0 or it % save_interval == 0:
            print(f"iteration:{it}, cost:{cost_now:0.2e}")
    print(f"Final w:{w_init},b:{b_init},cost:{cost_now}")
    return (w_init, b_init)
# test example
x_emp = np.arange(0, 10)
y_emp = np.arange(0, 10)
a_emp = 1e-3
w_emp = 1
b_emp = 0
# test squaredErrorCost().
squaredErrorCost(x_emp, y_emp, w_emp, b_emp) # np.float64(0.0)
# test gradient().
gradient(x_emp, y_emp, w_emp, b_emp) # (np.float64(0.0), np.float64(0.0))
# test gradientDescent.
gradientDescent(x_emp, y_emp, a_emp, gradient, squaredErrorCost, 10000, 1, 0) #  (np.float64(1.0), np.float64(0.0))
# import real data to learn. 
train_example = np.loadtxt(fname="../data/data1.txt", delimiter=",")
col_label = ['size', 'floors', 'price']
df = pd.DataFrame(train_example, columns=col_label)
print(df) # The result is as follows
      size  floors     price
0   2104.0     3.0  399900.0
1   1600.0     3.0  329900.0
2   2400.0     3.0  369000.0
3   1416.0     2.0  232000.0
4   3000.0     4.0  539900.0
5   1985.0     4.0  299900.0
6   1534.0     3.0  314900.0
7   1427.0     3.0  198999.0
8   1380.0     3.0  212000.0
9   1494.0     3.0  242500.0
10  1940.0     4.0  239999.0
11  2000.0     3.0  347000.0
12  1890.0     3.0  329999.0
13  4478.0     5.0  699900.0
14  1268.0     3.0  259900.0
15  2300.0     4.0  449900.0
16  1320.0     2.0  299900.0
17  1236.0     3.0  199900.0
18  2609.0     4.0  499998.0
19  3031.0     4.0  599000.0
20  1767.0     3.0  252900.0
21  1888.0     2.0  255000.0
22  1604.0     3.0  242900.0
23  1962.0     4.0  259900.0
24  3890.0     3.0  573900.0
25  1100.0     3.0  249900.0
26  1458.0     3.0  464500.0
27  2526.0     3.0  469000.0
28  2200.0     3.0  475000.0
29  2637.0     3.0  299900.0
30  1839.0     2.0  349900.0
31  1000.0     1.0  169900.0
32  2040.0     4.0  314900.0
33  3137.0     3.0  579900.0
34  1811.0     4.0  285900.0
35  1437.0     3.0  249900.0
36  1239.0     3.0  229900.0
37  2132.0     4.0  345000.0
38  4215.0     4.0  549000.0
39  2162.0     4.0  287000.0
40  1664.0     2.0  368500.0
41  2238.0     3.0  329900.0
42  2567.0     4.0  314000.0
43  1200.0     3.0  299000.0
44   852.0     2.0  179900.0
45  1852.0     4.0  299900.0
46  1203.0     3.0  239500.0
# create input variable and target variable
x_train = train_example[:, 0] # x: size
y_train = train_example[:, 2] # y: price
a = 1e-8 # set learning rate
# see the distribution of (x, y), using matplotlib.pyplot.scatter. 
plt.scatter(x_train, y_train)
    # Python result, Not code!!!
    <matplotlib.collections.PathCollection at 0x777f1bfe3290>

(figure: scatter plot of size vs. price)

# Run gradient descent
w, b = gradientDescent(x_train, y_train, a, gradient, squaredErrorCost, 10000, w_in=0, b_in=0) # The result is as follows
    # Python result, Not code!!!
    iteration:0, cost:5.99e+10
    iteration:1000, cost:2.40e+09
    iteration:2000, cost:2.40e+09
    iteration:3000, cost:2.40e+09
    iteration:4000, cost:2.40e+09
    iteration:5000, cost:2.40e+09
    iteration:6000, cost:2.40e+09
    iteration:7000, cost:2.40e+09
    iteration:8000, cost:2.40e+09
    iteration:9000, cost:2.40e+09
    Final w:165.38277501279427,b:1.0249605653820617,cost:2397855952.0764656
# Visualize it.
plt.scatter(x_train, y_train)
plt.plot(x_train, w*x_train+b, color='r')
plt.show()
    [<matplotlib.lines.Line2D at 0x777f1bdbcc90>]

(figure: fitted regression line over the scatter plot)

Perfect!

Discussion

  1. Notice how small the learning rate a is here: 1e-8. In fact, we cannot raise a any further, otherwise gradient descent oscillates violently and fails to converge. That is disastrous: the cost rises instead of falling and eventually overflows; the visualization chapter covers this in detail. A normal learning rate should not be this small; because we have not applied feature scaling, we have to put up with such a tiny a (a rough sketch of feature scaling appears at the end of this discussion);
  2. The dataset shows that more than one variable affects price. To solve multi-variable linear regression, we will need the powerful tool of "vectorization";
  3. The final cost levels off, but that does not mean the cost has reached its minimum. In fact, computing the closed-form linear regression coefficients (the formulas are given below) shows that J can still take a smaller value:
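For reference, the regressionCoef() function below implements the standard least-squares formulas for simple linear regression:

$$w=\frac{\sum_i x_i y_i - m\,\bar{x}\,\bar{y}}{\sum_i x_i^2 - m\,\bar{x}^2},\qquad b=\bar{y}-w\,\bar{x}$$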
def regressionCoef(x: ArrayLike, y: ArrayLike):
    # Calculate the least-squares regression coefficients in closed form.
    m = x.shape[0]    # number of samples
    x_bar = x.mean()  # mean of x
    y_bar = y.mean()  # mean of y
    sum_xiyi = np.dot(x, y)
    sum_xixi = np.dot(x, x)
    # The regression coefficient formulas
    w = (sum_xiyi-m*x_bar*y_bar)/(sum_xixi-m*x_bar*x_bar)
    b = y_bar-w*x_bar
    print(f"w:{w},b:{b}")
    return (w, b)
w, b = regressionCoef(x_train, y_train) # w:134.52528772024127,b:71270.49244872923
squaredErrorCost(x_train, y_train, w, b) # np.float64(2058132740.4330416)

The flat cost is only a display artifact of the log format cost:{cost_now:0.2e}, which shows just two decimal digits. If we raise the displayed precision, we can see that the cost is in fact still decreasing, only at a very slow rate. It is slow because at this point $\frac{\partial J}{\partial w}$ and $\frac{\partial J}{\partial b}$ are very small, so w and b update slowly.

4. A careful reader will also notice that the cost is still very large, even larger than the cost we obtained by brute-force enumeration. This shows that the main reason the cost is so large is not the choice of w and b but the structure of the model itself: the model's cost is inherently large.
5. We keep discovering how important visualization is: it helps us choose a model, verify the fit, and more. As we learn further, we will find even more uses for it.
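To close the loop on points 1 and 2, here is a minimal sketch of z-score feature scaling plus a vectorized gradient. The names mu, sigma, x_norm, and gradient_vec, as well as the 1e-1 learning rate, are illustrative assumptions rather than code from this lesson; the point is that once the feature is rescaled, a far larger learning rate than 1e-8 becomes usable.

# Sketch: z-score feature scaling + a vectorized gradient (illustrative assumptions, not this lesson's code).
mu = x_train.mean()              # mean of the size feature
sigma = x_train.std()            # standard deviation of the size feature
x_norm = (x_train - mu) / sigma  # scaled feature: roughly zero mean, unit variance

def gradient_vec(x, y, w, b):
    # Vectorized gradient of the squared error cost: no explicit Python loop.
    m = x.shape[0]
    err = w*x + b - y                       # residual vector
    return (np.dot(err, x)/m, err.sum()/m)  # (dJ/dw, dJ/db)

# With the scaled feature, a learning rate around 1e-1 is typically safe (assumed here for illustration).
w_s, b_s = gradientDescent(x_norm, y_train, 1e-1, gradient_vec, squaredErrorCost, 10000, 0, 0)
# Note: w_s and b_s live in the scaled-feature space; predictions become w_s*(x - mu)/sigma + b_s.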

Coming up next:

Visualization
