The model in Note 2 was designed from an everyday way of thinking, which is not how neural networks are usually designed. Note 3 looks at how to optimize that model so that its logic is clearer and it runs more efficiently.
4.1 Viewing variable values while the program is running
>>> name = "adam"
>>> print("Name: %s" % name)
Name: adam
>>>
>>> x = 101
>>> y = 12.35
>>> print("x = %d" % x)
x = 101
>>> print("y = %f" % y)
y = 12.350000
>>>
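A small aside (my own addition, not from the book): the same %-style formatting also accepts width and precision specifiers, and several values can go into one format string:

>>> print("y = %.2f" % 12.35)
y = 12.35
>>> print("x = %5d, y = %8.3f" % (101, 12.35))
x =   101, y =   12.350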
4.7 Normalizing the trainable parameters with the softmax function
I thought the book was wrong, and mistakenly concluded that softmax was ineffective
This section claims: *after applying the softmax function, if we then train with the same learning rate and loop count, the number of training steps needed to reach the same error rate drops noticeably.*
At first I assumed that with softmax the weights would quickly converge to [0.6, 0.3, 0.1], but my experiment turned out nothing like that: in my own runs, the softmax-normalized weights kept fluctuating widely.
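For reference, softmax maps any real vector to a set of positive weights that sum to 1, which is how the book keeps the trainable parameters normalized. A minimal NumPy sketch (my illustration) of what tf.nn.softmax computes:

# softmax_sketch.py (my illustration, not from the book)
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.zeros(3)))       # [0.3333 0.3333 0.3333] -- all-zero weights share equally
print(softmax([1.0, 0.3, -0.8]))  # larger entries get exponentially larger shares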
Following the example given in the book:
# code_4.6_score1f.py
import tensorflow as tf

x = tf.placeholder(shape=[3], dtype=tf.float32)
yTrain = tf.placeholder(shape=[], dtype=tf.float32)
w = tf.Variable(tf.zeros([3]), dtype=tf.float32)
wn = tf.nn.softmax(w)  # normalize the weights so they are positive and sum to 1
n = x * wn
y = tf.reduce_sum(n)   # weighted sum of the three scores
loss = tf.abs(y - yTrain)
optimizer = tf.train.RMSPropOptimizer(0.1)
train = optimizer.minimize(loss)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
for i in range(2):
    result = sess.run([train, x, w, wn, y, yTrain, loss], feed_dict={x: [90, 80, 70], yTrain: 85})
    print(result[3])  # result[3] is wn, the softmax-normalized weights
    result = sess.run([train, x, w, wn, y, yTrain, loss], feed_dict={x: [98, 95, 87], yTrain: 96})
    print(result[3])
The output is as follows:
[0.33333334 0.33333334 0.33333334]
[0.413998 0.32727832 0.2587237 ]
[0.44992 0.32819405 0.22188595]
[0.5284719 0.2905868 0.18094125]
Judging by this output, after four training steps the parameters already seem to be approaching the analytic solution [0.6, 0.3, 0.1].
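Why is [0.6, 0.3, 0.1] the analytic solution? Because it reproduces both training targets exactly; a quick check (my own, not from the book):

# check_solution.py (my illustration)
w = [0.6, 0.3, 0.1]
print(sum(xi * wi for xi, wi in zip([90, 80, 70], w)))  # 85.0, matches yTrain = 85
print(sum(xi * wi for xi, wi in zip([98, 95, 87], w)))  # 96.0, matches yTrain = 96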
But when the iteration count is increased to range(100) (two training steps per iteration, 200 steps in total), the last ten outputs look like this:
[0.53597116 0.38605216 0.07797671]
[0.6009958 0.3307964 0.06820776]
[0.57949114 0.34662864 0.07388028]
[0.514625 0.40151832 0.08385672]
[0.53671014 0.3865417 0.07674818]
[0.6015758 0.33125973 0.06716447]
[0.5801556 0.34711993 0.07272442]
[0.5154343 0.40204716 0.0825186 ]
[0.53747654 0.38697198 0.07555145]
[0.6022332 0.33163568 0.06613111]
The weights are clearly oscillating. Increasing the iteration count to range(1000) (2,000 steps in total), the last ten outputs are:
[0.5909161 0.34312806 0.06595593]
[0.52686936 0.39814836 0.07498237]
[0.5489259 0.3824051 0.06866898]
[0.6131694 0.32687482 0.05995575]
[0.5922006 0.34290916 0.0648903 ]
[0.5282114 0.39799386 0.07379475]
[0.55027235 0.3821475 0.06758016]
[0.61447376 0.32654396 0.0589823 ]
[0.5935523 0.34261104 0.06383662]
[0.5296246 0.39775392 0.07262148]
Still unstable. Increasing the iteration count to range(5000) (10,000 steps in total), the last ten outputs are:
[0.581127 0.35069135 0.06818163]
[0.51661265 0.40605828 0.07732906]
[0.5385668 0.39057976 0.07085335]
[0.6031867 0.3347785 0.06203476]
[0.58199143 0.3508892 0.06711936]
[0.5175219 0.4063409 0.07613724]
[0.5394755 0.39075965 0.06976488]
[0.60406595 0.33486596 0.06106813]
[0.5829101 0.3510191 0.06607077]
[0.5184833 0.4065536 0.07496312]
This run is just as baffling. At this point it was hard to say what softmax was actually contributing.
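One plausible explanation (my reading, not the book's): the loss is an absolute value, so its gradient magnitude does not shrink as y approaches yTrain, and with a step size of 0.1 the optimizer keeps overshooting the optimum and falls into a cycle. A quick way to test that hypothesis is to sweep the learning rate; the sketch below assumes the same TF 1.x environment as the book's code, and final_weights is a hypothetical helper of my own:

# lr_sweep.py (my experiment sketch, not from the book)
import tensorflow as tf

def final_weights(lr, iterations):
    # rebuild the softmax model with the given learning rate and
    # return the softmax-normalized weights after training
    tf.reset_default_graph()
    x = tf.placeholder(shape=[3], dtype=tf.float32)
    yTrain = tf.placeholder(shape=[], dtype=tf.float32)
    w = tf.Variable(tf.zeros([3]), dtype=tf.float32)
    wn = tf.nn.softmax(w)
    loss = tf.abs(tf.reduce_sum(x * wn) - yTrain)
    train = tf.train.RMSPropOptimizer(lr).minimize(loss)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(iterations):
            sess.run(train, feed_dict={x: [90, 80, 70], yTrain: 85})
            result = sess.run([train, wn], feed_dict={x: [98, 95, 87], yTrain: 96})
        return result[1]

for lr in (0.1, 0.01, 0.001):
    print(lr, final_weights(lr, 5000))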
For comparison, here is the book's earlier code, which does not use softmax:
# code_3.3_score1c.py
import tensorflow as tf

x1 = tf.placeholder(dtype=tf.float32)
x2 = tf.placeholder(dtype=tf.float32)
x3 = tf.placeholder(dtype=tf.float32)
yTrain = tf.placeholder(dtype=tf.float32)
w1 = tf.Variable(0.1, dtype=tf.float32)
w2 = tf.Variable(0.1, dtype=tf.float32)
w3 = tf.Variable(0.1, dtype=tf.float32)
n1 = x1 * w1
n2 = x2 * w2
n3 = x3 * w3
y = n1 + n2 + n3   # plain weighted sum; the weights are not normalized
loss = tf.abs(y - yTrain)
optimizer = tf.train.RMSPropOptimizer(0.001)
train = optimizer.minimize(loss)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
flag = 0
tmp = 999

def train_v1():
    # print the results around every 1000th iteration
    global flag, tmp  # both are reassigned below, so they must be declared global
    for i in range(10000):
        result = sess.run([train, x1, x2, x3, w1, w2, w3, y, yTrain, loss],
                          feed_dict={x1: 90, x2: 80, x3: 70, yTrain: 85})
        if i == tmp:
            flag = 1
            print("i = %d" % i)
            print(result)
        result = sess.run([train, x1, x2, x3, w1, w2, w3, y, yTrain, loss],
                          feed_dict={x1: 98, x2: 95, x3: 87, yTrain: 96})
        if i == tmp:
            print(result)
        result = sess.run([train, x1, x2, x3, w1, w2, w3, y, yTrain, loss],
                          feed_dict={x1: 70, x2: 90, x3: 80, yTrain: 77})
        if i == tmp:
            print(result)
        if flag == 1:
            tmp = tmp + 1000
            flag = 0

def train_v2():
    for i in range(5000):
        result = sess.run([train, x1, x2, x3, w1, w2, w3, y, yTrain, loss],
                          feed_dict={x1: 90, x2: 80, x3: 70, yTrain: 85})
        print(result)
        result = sess.run([train, x1, x2, x3, w1, w2, w3, y, yTrain, loss],
                          feed_dict={x1: 98, x2: 95, x3: 87, yTrain: 96})
        print(result)

def train_v3():
    for i in range(5000):
        result = sess.run([train, x1, x2, x3, w1, w2, w3, y, yTrain, loss],
                          feed_dict={x1: 92, x2: 98, x3: 90, yTrain: 94})
        print(result)
        result = sess.run([train, x1, x2, x3, w1, w2, w3, y, yTrain, loss],
                          feed_dict={x1: 92, x2: 99, x3: 98, yTrain: 96})
        print(result)

if __name__ == '__main__':
    train_v2()
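Incidentally, the flag/tmp bookkeeping in train_v1 is just an awkward way of printing the results of iterations 999, 1999, 2999, and so on; a plain modulus test (my rewrite, not from the book) does the same job:

# inside train_v1's loop, replacing the flag/tmp bookkeeping
if i % 1000 == 999:  # fires at i = 999, 1999, 2999, ...
    print("i = %d" % i)
    print(result)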
The last ten training results are as follows:
[None, array(90., dtype=float32), array(80., dtype=float32), array(70., dtype=float32), 0.58388305, 0.28717414, 0.1325421, 85.02325, array(85., dtype=float32), 0.023246765]
[None, array(98., dtype=float32), array(95., dtype=float32), array(87., dtype=float32), 0.5828438, 0.2860972, 0.13144642, 96.03325, array(96., dtype=float32), 0.0332489]
[None, array(90., dtype=float32), array(80., dtype=float32), array(70., dtype=float32), 0.5838025, 0.28701225, 0.132338, 84.54497, array(85., dtype=float32), 0.45503235]
[None, array(98., dtype=float32), array(95., dtype=float32), array(87., dtype=float32), 0.5848418, 0.2880892, 0.13343368, 95.99221, array(96., dtype=float32), 0.007789612]
[None, array(90., dtype=float32), array(80., dtype=float32), array(70., dtype=float32), 0.58388305, 0.28717414, 0.1325421, 85.02325, array(85., dtype=float32), 0.023246765]
[None, array(98., dtype=float32), array(95., dtype=float32), array(87., dtype=float32), 0.5828438, 0.2860972, 0.13144642, 96.03325, array(96., dtype=float32), 0.0332489]
[None, array(90., dtype=float32), array(80., dtype=float32), array(70., dtype=float32), 0.5838025, 0.28701225, 0.132338, 84.54497, array(85., dtype=float32), 0.45503235]
[None, array(98., dtype=float32), array(95., dtype=float32), array(87., dtype=float32), 0.5848418, 0.2880892, 0.13343368, 95.99221, array(96., dtype=float32), 0.007789612]
[None, array(90., dtype=float32), array(80., dtype=float32), array(70., dtype=float32), 0.58388305, 0.28717414, 0.1325421, 85.02325, array(85., dtype=float32), 0.023246765]
[None, array(98., dtype=float32), array(95., dtype=float32), array(87., dtype=float32), 0.5828438, 0.2860972, 0.13144642, 96.03325, array(96., dtype=float32), 0.0332489]
We see the loss here is fairly stable. Well, what counts as stable? A few steps still have a loss around 0.5, so it cannot really be called stable either. This kind of training truly feels like black magic.
It turns out I simply misread the book's description; the book is right, and softmax does work
Once I changed the learning rate in code_4.6_score1f.py to 0.001, the same value used in code_3.3_score1c.py, the trained weights really did become much more stable.
Concretely: with the learning rate (the argument to tf.train.RMSPropOptimizer()) in the first script (code_4.6_score1f.py) changed to 0.001 and the iteration count set to range(5000) (10,000 steps in total), the last ten outputs are:
[0.59985083 0.3001836 0.09996562]
[0.6004757 0.29970306 0.09982121]
[0.6002519 0.29983714 0.09991095]
[0.599627 0.3003176 0.10005541]
[0.599851 0.30018356 0.09996543]
[0.6004759 0.29970303 0.09982101]
[0.6002521 0.2998371 0.09991075]
[0.5996272 0.30031753 0.1000552 ]
[0.5998512 0.30018353 0.09996524]
[0.6004761 0.299703 0.09982082]
These final results are all very close to [0.6, 0.3, 0.1], which shows that softmax really is effective.
After applying the softmax function, training with the *same learning rate and loop count* reaches the same error rate in noticeably fewer training steps. The crucial qualifier is *the same learning rate and loop count*: the comparison only holds when the softmax and non-softmax versions share those settings.
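To check the claim directly, one can count how many training steps each model needs before the loss stays below a threshold on every sample, with both models using the same learning rate of 0.001. A sketch under those assumptions (the 0.5 threshold, the steps_until helper, and the step-counting structure are my own choices, not the book's):

# steps_to_converge.py (my illustration)
import tensorflow as tf

SAMPLES = [([90, 80, 70], 85), ([98, 95, 87], 96)]

def steps_until(loss_threshold, use_softmax, max_steps=20000):
    # return how many passes over SAMPLES are needed until the loss
    # is below loss_threshold on every sample, or None if it never is
    tf.reset_default_graph()
    x = tf.placeholder(shape=[3], dtype=tf.float32)
    yTrain = tf.placeholder(shape=[], dtype=tf.float32)
    w = tf.Variable(tf.zeros([3]) if use_softmax else tf.fill([3], 0.1))
    wn = tf.nn.softmax(w) if use_softmax else w
    loss = tf.abs(tf.reduce_sum(x * wn) - yTrain)
    train = tf.train.RMSPropOptimizer(0.001).minimize(loss)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(max_steps):
            losses = []
            for xs, yt in SAMPLES:
                _, lv = sess.run([train, loss], feed_dict={x: xs, yTrain: yt})
                losses.append(lv)
            if max(losses) < loss_threshold:
                return step + 1
    return None

print("with softmax:   ", steps_until(0.5, use_softmax=True))
print("without softmax:", steps_until(0.5, use_softmax=False))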