I. Assignment Requirements
| Item | Content |
| --- | --- |
| Which course this assignment belongs to | link to the class blog |
| Where the assignment requirements are | link to the assignment requirements |
| My goal in this course | complete a full project and put what I learn into practice |
| How this assignment helps me reach that goal | it helps me understand the basics of backpropagation in a DNN |
II. Solution
1) Recompute the contributions of Δw and Δb at every iteration:
```python
import numpy as np

def func_z(w, b):
    return (2*w + 3*b)*(2*b + 1)

z_true = 150
w = 3
b = 4
count = 0
while (func_z(w, b) - z_true) >= 1e-5:
    count += 1
    z = func_z(w, b)
    dz = np.abs(z - z_true) / 2   # assign half of the error to each variable
    y = 2*b + 1
    x = 2*w + 3*b
    dw = dz / (2*y)               # 2*y     = dz/dw
    db = dz / (2*x + 3*y)         # 2*x+3*y = dz/db
    print("w=%f, b=%f, z=%f, delta_z=%f, delta_b=%f" % (w, b, z, 2*dz, db))
    w = w - dw
    b = b - db
print("w=%f, b=%f, z=%f, delta_z=%f" % (w, b, func_z(w, b), func_z(w, b) - z_true))
print(f"Iteration count: {count}, final_w: {w}, final_b: {b}")
```
Output:

```text
w=3.000000, b=4.000000, z=162.000000, delta_z=12.000000, delta_b=0.095238
w=2.666667, b=3.904762, z=150.181406, delta_z=0.181406, delta_b=0.001499
w=2.661519, b=3.903263, z=150.000044, delta_z=0.000044, delta_b=0.000000
w=2.661517, b=3.903263, z=150.000000, delta_z=0.000000
Iteration count: 3, final_w: 2.661517402927456, final_b: 3.9032629057674404
```
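A quick sanity check (my own sketch, not part of the assignment) of why this rule splits the error: to first order, the w-update removes dz of error (exactly, since z is linear in w) and the b-update removes roughly another dz, so together they cancel the full Δz = 2·dz:

```python
def func_z(w, b):
    return (2*w + 3*b)*(2*b + 1)

z_true = 150
w, b = 3.0, 4.0
z = func_z(w, b)                 # 162.0
dz = abs(z - z_true) / 2         # assign half of the error to each variable
y = 2*b + 1
x = 2*w + 3*b
dw = dz / (2*y)                  # 2*y     = dz/dw
db = dz / (2*x + 3*y)            # 2*x+3*y = dz/db

drop_w = z - func_z(w - dw, b)   # exactly dz = 6.0, since z is linear in w
drop_b = z - func_z(w, b - db)   # ~5.95: dz minus a small second-order term
print(drop_w, drop_b)
```

This also explains the tiny overshoot visible in the output above: z is quadratic in b, so the b-step removes slightly less than dz, and a few iterations are needed to mop up the second-order remainder.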
2) Gradient descent

In practice, at a very low learning rate it is hard to improve the precision any further, although the loss does keep decreasing.
```python
w = 3
b = 4
loss = []
eta = 1e-6
while np.abs(func_z(w, b) - z_true) >= 1e-3:
    y = 2*b + 1
    x = 2*w + 3*b
    gradient_w = 2*y          # dz/dw
    gradient_b = 2*x + 3*y    # dz/db
    w = w - eta*gradient_w
    b = b - eta*gradient_b
    loss.append(func_z(w, b) - z_true)
    if np.abs(func_z(w, b) - z_true) <= 1e-3:
        print(func_z(w, b) - z_true)
```
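To back up the claim that the decrease is almost linear, a small sketch (my own, fixed at 1000 steps instead of the tolerance-based loop above) records the per-step loss drop; since w and b barely move at eta = 1e-6, the gradients stay nearly constant and so does the drop:

```python
def func_z(w, b):
    return (2*w + 3*b)*(2*b + 1)

z_true = 150
w, b = 3.0, 4.0
eta = 1e-6
drops = []
prev = func_z(w, b) - z_true
for _ in range(1000):
    y = 2*b + 1
    x = 2*w + 3*b
    w -= eta * 2*y            # dz/dw
    b -= eta * (2*x + 3*y)    # dz/db
    cur = func_z(w, b) - z_true
    drops.append(prev - cur)  # loss decrease in this step
    prev = cur

# first and last per-step drops differ by only a few percent
print(drops[0], drops[-1])
```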
3) Automatic differentiation with TensorFlow
```python
import tensorflow as tf

w = tf.Variable(3.0, dtype=tf.float64, name='w')
b = tf.Variable(4.0, dtype=tf.float64, name='b')
f = (2*w + 3*b)*(2*b + 1)
loss = tf.abs(f - 150.0)
los = np.infty
# learning_rate = initial_learning_rate * decay_rate**(global_step/decay_step)
initial_learning_rate = 1e-5
decay_step = 100
global_step = tf.Variable(0, trainable=False, name="global_step")
decay_rate = 0.1
learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,
                                           decay_step, decay_rate)
# optimizer = tf.train.AdamOptimizer(learning_rate)
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
training_op = optimizer.minimize(loss, global_step=global_step)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    for epoch in range(500):
        _, los = sess.run([training_op, loss])
    print("Final w, b", sess.run([w, b]))
    print(f"real loss: {sess.run(loss)}")
    print(f"truncated loss: {los}")
```
Output:

```text
Final w, b [2.9487662772301104, 3.820509906304207]
real loss: 1.0325179857773037e-06
truncated loss: 7.657321816623153e-07
```
In theory `loss` and `los` should be identical; I am not sure why they differ here. Either way, gradient descent has a hard time finding an exact solution. If the loss is changed to `tf.square(f - 150)`, the solution becomes more precise. Is that because the squared-error loss is a convex function?
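To test the `tf.square` observation outside TensorFlow, here is a dependency-free sketch (my own; the learning rate 1e-5 and the 1e-8 stopping threshold are assumptions) of gradient descent on the squared loss (f - 150)^2. Because its gradient scales with the remaining error, the steps shrink automatically near the optimum and the iteration converges to a very tight tolerance instead of stalling:

```python
def func_z(w, b):
    return (2*w + 3*b)*(2*b + 1)

z_true = 150.0
w, b = 3.0, 4.0
eta = 1e-5                       # assumed step size for this sketch
for step in range(10000):
    err = func_z(w, b) - z_true
    if abs(err) < 1e-8:          # far tighter than the 1e-3 reached above
        break
    y = 2*b + 1
    x = 2*w + 3*b
    # loss = err**2, so grad(loss) = 2*err * grad(z)
    w -= eta * 2*err * (2*y)
    b -= eta * 2*err * (2*x + 3*y)

print(step, func_z(w, b) - z_true)
```

Note the contrast with the plain-gradient version above: there the step size is constant, so the iterate eventually hops back and forth across the target; here the error itself damps the step.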
III. Reflections and Comparison
1) `los` looks as if it was truncated or lost precision; in theory it should equal the final `loss`. (A likely cause, though I am not certain: `los` is fetched in the same `sess.run` call as `training_op`, so it reflects the loss before the last variable update, while the later `sess.run(loss)` is evaluated after it.)
2) AdamOptimizer already adapts its learning rate on its own; I switched to Momentum plus a learning-rate schedule, and the results were still quite good.

3) In plain gradient descent the gradients with respect to w and b change very little here, so at a low learning rate the loss decreases almost linearly.
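The exponential-decay formula commented in the TF snippet (lr = initial_lr * decay_rate**(step/decay_steps), the non-staircase form of `tf.train.exponential_decay` as far as I know) can be tabulated in plain Python to see how aggressively decay_rate = 0.1 shrinks the step size:

```python
def exponential_decay(initial_lr, global_step, decay_steps, decay_rate):
    # continuous (non-staircase) variant of the schedule
    return initial_lr * decay_rate ** (global_step / decay_steps)

initial_lr, decay_steps, decay_rate = 1e-5, 100, 0.1
for step in (0, 100, 200, 500):
    # drops one order of magnitude every 100 steps
    print(step, exponential_decay(initial_lr, step, decay_steps, decay_rate))
```

With decay this fast, most of the 500 epochs run at a vanishing learning rate, which may partly explain why the final loss plateaus around 1e-6.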