We solve for $w$ and $b$ using gradient descent.
Hypothesis Function

$$z = wx + b$$
Loss Function

$$J(w,b) = \frac{1}{2}(z-y)^2$$
Here $z$ is the predicted value and $y$ is the sample's label.
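As a quick worked example with made-up numbers: for $w=2$, $b=1$ and a sample $(x, y) = (3, 5)$,

$$z = 2 \cdot 3 + 1 = 7, \qquad J(w,b) = \frac{1}{2}(7-5)^2 = 2$$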
Gradient with Respect to w

Taking the value of $J$ as our objective, we ask how $w$ affects it, i.e., we compute the partial derivative of $J$ with respect to $w$ via the chain rule:

$$\frac{\partial{J(w,b)}}{\partial{w}} = \frac{\partial{J}}{\partial{z}}\frac{\partial{z}}{\partial{w}}$$
Because:

$$\frac{\partial{J}}{\partial{z}} = \frac{\partial{}}{\partial{z}}\left[\frac{1}{2}(z-y)^2\right] = z-y$$

$$\frac{\partial{z}}{\partial{w}} = \frac{\partial{}}{\partial{w}}(wx+b) = x$$
Putting the two together:

$$\frac{\partial{J}}{\partial{w}} = \frac{\partial{J}}{\partial{z}}\frac{\partial{z}}{\partial{w}} = (z-y) \cdot x$$
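As a sanity check, this analytic gradient can be compared against a central finite difference. Below is a minimal sketch with made-up sample values, not part of the tutorial's code:

```python
# Numerically verify dJ/dw = (z - y) * x with a central finite difference.
# The sample values (x, y) and parameters (w, b) below are arbitrary.
def J(w, b, x, y):
    return 0.5 * (w * x + b - y) ** 2

x, y, w, b = 3.0, 5.0, 2.0, 1.0
eps = 1e-6
numeric = (J(w + eps, b, x, y) - J(w - eps, b, x, y)) / (2 * eps)
analytic = ((w * x + b) - y) * x
print(numeric, analytic)  # both ~6.0, i.e. (7 - 5) * 3
```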
Gradient with Respect to b

$$\frac{\partial{J}}{\partial{b}} = \frac{\partial{J}}{\partial{z}}\frac{\partial{z}}{\partial{b}}$$
The first factor was already obtained while computing the gradient of $w$, and:

$$\frac{\partial{z}}{\partial{b}} = \frac{\partial{(wx+b)}}{\partial{b}} = 1$$
Therefore:

$$\frac{\partial{J}}{\partial{b}} = \frac{\partial{J}}{\partial{z}}\frac{\partial{z}}{\partial{b}} = (z-y) \cdot 1 = z-y$$
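If sympy is available, a computer-algebra system can confirm both partial derivatives at once; a small sketch:

```python
import sympy as sp

# Symbolically verify the two gradients derived above.
w, b, x, y = sp.symbols('w b x y')
z = w * x + b
J = sp.Rational(1, 2) * (z - y) ** 2
print(sp.simplify(sp.diff(J, w) - (z - y) * x))  # 0, so dJ/dw = (z-y)*x
print(sp.simplify(sp.diff(J, b) - (z - y)))      # 0, so dJ/db = z-y
```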
Code
```python
import numpy as np

if __name__ == '__main__':
    # learning rate
    eta = 0.1
    # ReadData is defined elsewhere in this tutorial; it is assumed to
    # return the sample features X and labels Y as 1-D numpy arrays
    X, Y = ReadData()
    w, b = 0.0, 0.0
    #w, b = np.random.random(), np.random.random()
    # count of samples
    num_example = X.shape[0]
    for i in range(num_example):
        # get x and y value for one sample
        x = X[i]
        y = Y[i]
        # forward pass: get z from x
        z = w * x + b
        # calculate gradients of w and b
        dz = z - y
        db = dz
        dw = dz * x
        # update w and b
        w = w - eta * dw
        b = b - eta * db
        print(w, b)
```
Here $dw = (z-y) \cdot x$ and $db = z-y$, exactly matching the derivation above. The intermediate variable dz exists to cache the shared term $(z-y)$ so it is not computed twice; since this code runs on every pass of the inner loop, it is worth keeping lean.
Also, note that the code never computes the loss value itself; the loss function only enters implicitly, through the derived gradient formulas.
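That said, if you do want to watch the loss shrink during training, here is a minimal sketch of the same loop with explicit loss tracking (the variable names mirror the code above; X and Y are still assumed to come from ReadData):

```python
# Same per-sample update as above, but also evaluates J(w,b) each step,
# purely for monitoring; this costs a little extra computation.
def train_with_loss(X, Y, eta=0.1):
    w, b = 0.0, 0.0
    for x, y in zip(X, Y):
        z = w * x + b
        loss = 0.5 * (z - y) ** 2   # J(w,b) on the current sample
        dz = z - y
        w = w - eta * dz * x
        b = b - eta * dz
        print(f'loss={loss:.6f}, w={w:.4f}, b={b:.4f}')
    return w, b
```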
木头: Oh! Now I get it. The famous gradient descent simply takes the derived results, turns them into formulas and code, and drops them straight into the iteration loop!