Derivation of Forward and Backward Propagation

To analyze the backpropagation process of a single neuron, we convolve a 2×2 receptive field exactly once and, through learning, train the neuron to recognize this receptive field as 1.

  1. Input signal of the receptive field: x = \begin{bmatrix} 0.1 & 0.2\\ 0.3 & 0.4 \end{bmatrix}
  2. padding='VALID', otherwise the filter would be convolved 4 times (see the shape sketch after this list).
  3. Optimizer: GradientDescentOptimizer.
  4. Activation function: sigmoid.
  5. Learning rate fixed at 0.2.
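
As a quick sanity check on items 1 and 2, here is a minimal NumPy sketch (independent of the TensorFlow program further below; the filter values are placeholders) showing that a 2×2 filter slid over the 2×2 receptive field with 'VALID' padding produces a single output, whereas 'SAME' padding would apply the filter at 4 positions:

import numpy as np

x = np.array([[0.1, 0.2],
              [0.3, 0.4]])                  # the 2x2 receptive field
w = np.full((2, 2), 0.01)                   # placeholder 2x2 filter values
b = 0.0

# 'VALID': the 2x2 filter fits only once, so there is exactly one output.
print("VALID output:", np.sum(w * x) + b)

# 'SAME': zero-pad on the bottom/right (TensorFlow's convention for this shape),
# so the filter is applied at all 4 positions of the 2x2 input.
x_pad = np.pad(x, ((0, 1), (0, 1)), mode='constant')
y_same = [[np.sum(w * x_pad[i:i+2, j:j+2]) + b for j in range(2)] for i in range(2)]
print("SAME outputs:", y_same)
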
Figure 1. An artificial neuron

The neuron is shown in Figure 1, so:

  1. Loss function: Loss=\left ( 1-y_{2} \right )^{2}
  2. Partial derivative of the loss with respect to the weights: \frac{\partial L}{\partial w} = \frac{\partial L}{\partial y_{2}} \cdot \frac{\partial y_{2}}{\partial y_{1}} \cdot \frac{\partial y_{1}}{\partial w} = 2\left ( y_{2}-1 \right ) \cdot y_{2}\left ( 1-y_{2} \right ) \cdot x = -2y_{2}\cdot Loss \cdot x
  3. Partial derivative of the loss with respect to the bias: \frac{\partial L}{\partial b} = \frac{\partial L}{\partial y_{2}} \cdot \frac{\partial y_{2}}{\partial y_{1}} \cdot \frac{\partial y_{1}}{\partial b} = 2\left ( y_{2}-1 \right ) \cdot y_{2}\left ( 1-y_{2} \right ) = -2y_{2} \cdot Loss (both derivatives are checked numerically in the sketch after this list)
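
To sanity-check these chain-rule expressions, here is a minimal Python sketch (not part of the original program; the filter values are arbitrary placeholders) that compares the analytic gradients -2·y2·Loss·x and -2·y2·Loss against central finite differences:

import numpy as np

def forward(w, b, x):
    # y1 = w.x + b, y2 = sigmoid(y1), Loss = (1 - y2)^2
    y2 = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
    return (1.0 - y2) ** 2, y2

x = np.array([0.1, 0.2, 0.3, 0.4])        # flattened 2x2 receptive field
w = np.array([0.01, -0.02, 0.005, 0.03])  # placeholder filter values
b = 0.0

loss, y2 = forward(w, b, x)
grad_w = -2.0 * y2 * loss * x             # dL/dw from item 2
grad_b = -2.0 * y2 * loss                 # dL/db from item 3

eps = 1e-6
num_w = np.array([(forward(w + eps * np.eye(4)[i], b, x)[0]
                   - forward(w - eps * np.eye(4)[i], b, x)[0]) / (2 * eps)
                  for i in range(4)])
num_b = (forward(w, b + eps, x)[0] - forward(w, b - eps, x)[0]) / (2 * eps)
print(np.allclose(grad_w, num_w), np.allclose(grad_b, num_b))  # True True
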

Forward and Backward Propagation

In one training run, the filter w is initialized to w = \begin{bmatrix} -0.00101103 & 0.00193166\\ -0.01216178 & 0.01202441 \end{bmatrix} and the bias b is initialized to \begin{bmatrix} 0 \end{bmatrix}. First iteration:

Forward propagation

  1.  y_{1}=w\cdot x + b =0.00144646
  2. y_{2}=sigmoid(y_{1})=0.50036161
  3. Loss=(1-y_{2})^{2}=0.2496385164748249
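
These three values can be reproduced directly with a few lines of plain Python (a sketch, separate from the TensorFlow program further below):

import numpy as np

x = np.array([0.1, 0.2, 0.3, 0.4])                                # receptive field
w = np.array([-0.00101103, 0.00193166, -0.01216178, 0.01202441])  # initial filter
b = 0.0                                                           # initial bias

y1 = np.dot(w, x) + b
y2 = 1.0 / (1.0 + np.exp(-y1))                                    # sigmoid
loss = (1.0 - y2) ** 2
print(y1, y2, loss)   # ~ 0.00144646  0.50036161  0.24963852
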

Backpropagation

1. The partial derivatives of Loss with respect to the weights and bias agree with the values computed by tf.gradients(loss, filter) (they differ only slightly in the last digit):

1.1 Gradients printed by the program: --filter gradient: [[-0.02498191 -0.04996381 -0.07494572 -0.09992762]]  bias gradient: [[-0.24981906]]

1.2 Gradients computed by hand:

\frac{\partial L}{\partial b} = -2\cdot y_{2} \cdot Loss = -0.2498191

\frac{\partial L}{\partial w_{1}} = \frac{\partial L}{\partial b} \cdot x_{1} = -0.02498191

\frac{\partial L}{\partial w_{2}} = \frac{\partial L}{\partial b} \cdot x_{2} = 2 \cdot \frac{\partial L}{\partial w_{1}} = -0.04996382

\frac{\partial L}{\partial w_{3}} = \frac{\partial L}{\partial b} \cdot x_{3} = 3 \cdot \frac{\partial L}{\partial w_{1}} = -0.07494573

\frac{\partial L}{\partial w_{4}} = \frac{\partial L}{\partial b} \cdot x_{4} = 4 \cdot \frac{\partial L}{\partial w_{1}} = -0.09992764
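
The same numbers fall out of a couple of lines of plain Python (a sketch reusing y2 and Loss from the forward pass above):

y2, loss = 0.50036161, 0.2496385164748249
x = [0.1, 0.2, 0.3, 0.4]

grad_b = -2 * y2 * loss                      # dL/db = -2 * y2 * Loss
grad_w = [grad_b * xi for xi in x]           # dL/dw_i = dL/db * x_i
print(grad_b)    # ~ -0.24981906
print(grad_w)    # ~ [-0.02498191, -0.04996381, -0.07494572, -0.09992762]
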

2. Updating the weights and bias:

2.1 Weights and bias updated by the program: --filter: [[0.00398535 0.01192442 0.00282736 0.03200994]]  bias: [0.04996381]

2.2 Weights and bias updated by hand:

b=b-\eta \cdot \frac{\partial L}{\partial b} = 0 - 0.2 \times (-0.2498191) = 0.04996381

w_{1} = w_{1} - 0.2 \cdot \frac{\partial L}{\partial w_{1}} = 0.00398535

w_{2} = w_{2} - 0.2 \cdot \frac{\partial L}{\partial w_{2}} = 0.01192442

w_{3} = w_{3} - 0.2 \cdot \frac{\partial L}{\partial w_{3}} = 0.00282736

w_{4} = w_{4} - 0.2 \cdot \frac{\partial L}{\partial w_{4}} = 0.03200994
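
As another quick sketch, the plain gradient-descent update step reproduces the program's new values:

eta = 0.2                                                     # fixed learning rate
w = [-0.00101103, 0.00193166, -0.01216178, 0.01202441]        # initial filter
b = 0.0
grad_w = [-0.02498191, -0.04996381, -0.07494572, -0.09992762]
grad_b = -0.24981906

print([wi - eta * gi for wi, gi in zip(w, grad_w)])  # ~ [0.00398535, 0.01192442, 0.00282736, 0.03200994]
print(b - eta * grad_b)                              # ~ 0.04996381
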

Iteration

Each backward pass updates w and b once. After many such updates, the output y2 = sigmoid(wx + b) gets closer and closer to 1, which is the goal of the training.
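
Putting the pieces together, a minimal pure-NumPy training loop (a sketch of the same computation, independent of TensorFlow) shows the loss shrinking from iteration to iteration, matching the loss values in the program output below:

import numpy as np

x = np.array([0.1, 0.2, 0.3, 0.4])
w = np.array([-0.00101103, 0.00193166, -0.01216178, 0.01202441])
b, eta = 0.0, 0.2

for it in range(3):
    y2 = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))   # forward pass
    loss = (1.0 - y2) ** 2
    grad_b = -2.0 * y2 * loss                        # backward pass
    grad_w = grad_b * x
    w, b = w - eta * grad_w, b - eta * grad_b        # gradient-descent step
    print('it %d  loss %.8f' % (it, loss))
# prints losses ~0.24963852, 0.23368160, 0.21879154
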


Related Program

Program output:

--trainable variables: [<tf.Variable 'filter:0' shape=(2, 2, 1, 1) dtype=float64_ref>, <tf.Variable 'bias:0' shape=(1,) dtype=float64_ref>]

--filter: [[-0.00101103  0.00193166 -0.01216178  0.01202441]]  bias: [0.]
--y1: [[[[0.00144646]]]]  y2: [[[[0.50036161]]]] loss: 0.2496385164748249
--filter gradient: [[-0.02498191 -0.04996381 -0.07494572 -0.09992762]]  bias gradient: [[-0.24981906]]

--filter: [[0.00398535 0.01192442 0.00282736 0.03200994]]  bias: [0.04996381]
--y1: [[[[0.0164356]]]]  y2: [[[[0.51659376]]]] loss: 0.23368159536065822
--filter gradient: [[-0.02414369 -0.04828738 -0.07243107 -0.09657476]]  bias gradient: [[-0.24143691]]

--filter: [[0.00881409 0.02158189 0.01731358 0.05132489]]  bias: [0.0982512]
--y1: [[[[0.03092182]]]]  y2: [[[[0.53224842]]]] loss: 0.21879153616090155
--filter gradient: [[-0.02329029 -0.04658058 -0.06987087 -0.09316116]]  bias gradient: [[-0.2329029]]

Code:

import tensorflow as tf
import numpy as np

def net(input):
    global filter, bias, y1, y2
    # 2x2x1x1 filter initialized with small random values; bias starts at 0.
    init_random = tf.random_normal_initializer(mean=0.0, stddev=0.01, seed=None, dtype=tf.float64)
    filter = tf.get_variable('filter', shape=[2,2,1,1], initializer=init_random, dtype=tf.float64)
    bias = tf.Variable([0], dtype=tf.float64, name='bias')
    # 'VALID' padding on a 2x2 input with a 2x2 filter gives a single output.
    y1 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
    y2 = tf.nn.sigmoid(y1 + bias)
    return y2

def display(sess):
    #print '--it:%2d' % it,'loss:',loss.eval({input:data},sess)
    print
    print "--filter:", filter.eval(sess).reshape(1,4), " bias:", bias.eval(sess)
    print "--y1:", y1.eval({input:data},sess), " y2:", y2.eval({input:data},sess), "loss:", loss.eval({input:data},sess)
    print "--filter gradient:", tf.gradients(loss,filter)[0].eval({input:data},sess).reshape(1,4),\
    " bias gradient:", tf.gradients(loss,bias)[0].eval({input:data},sess).reshape(1,1)

# The 2x2 receptive field, reshaped to NHWC: [batch, height, width, channels].
data = np.array([[0.1,0.2],[0.3,0.4]])
data = np.reshape(data,(1,2,2,1))

input = tf.placeholder(tf.float64, [1,2,2,1])
predict = net(input)
loss = tf.reduce_mean(tf.square(1-predict))
step = tf.Variable(0, trainable=False)
# A decay rate of 1 keeps the learning rate constant at 0.2.
rate = tf.train.exponential_decay(0.2, step, 1, 1)
#optimizer = tf.train.AdadeltaOptimizer(rate)
#optimizer = tf.train.AdagradOptimizer(rate)
#optimizer = tf.train.AdamOptimizer(rate)
#optimizer = tf.train.FtrlOptimizer(rate)
optimizer = tf.train.GradientDescentOptimizer(rate)
#optimizer = tf.train.MomentumOptimizer(rate)
#optimizer = tf.train.RMSPropOptimizer(rate)
train = optimizer.minimize(loss, global_step=step)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print "--trainable variables:", tf.trainable_variables()
    for it in range(3):
        display(sess)
        train.run({input:data},sess)

More

When padding='SAME', there are 4 convolution outputs and the task becomes a 4-class problem:

import tensorflow as tf
import numpy as np

def net(input):
    global filter, bias
    init_random = tf.random_normal_initializer(mean=0.0, stddev=0.01, seed=None, dtype=tf.float32)
    filter = tf.get_variable('filter', shape=[2,2,1,1], initializer=init_random)
    bias = tf.Variable([0], dtype=tf.float32, name='bias')
    # 'SAME' padding keeps the 2x2 spatial shape, so there are 4 outputs.
    out = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
    out = tf.nn.sigmoid(out + bias)
    return out

def display(sess):
    print "--trainable variables:", tf.trainable_variables()
    print "--filter:", filter.eval(sess).reshape(1,4)
    print "--bias:", bias.eval(sess)

data = np.array([[1,2],[3,4]])
data = np.reshape(data,(1,2,2,1))

input = tf.placeholder(tf.float32, [1,2,2,1])
predict = net(input)
# Ground truth for the 4 outputs: only the second position should be 1.
GT = tf.constant([0,1,0,0], shape=(1,2,2,1), dtype=tf.float32)
loss = tf.reduce_mean(tf.square(GT-predict))
step = tf.Variable(0, trainable=False)
rate = tf.train.exponential_decay(0.15, step, 1, 0.9999)
optimizer = tf.train.AdamOptimizer(rate)
train = optimizer.minimize(loss, global_step=step)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    display(sess)
    for it in range(3):
        train.run({input:data},sess)
        print '--it:%2d' % it, 'loss:', loss.eval({input:data},sess)
    display(sess)

[TF Optimizers] [Backpropagation Analysis of Multi-Layer Fully Connected Layers]
