Machine Learning - Recurrent Neural Networks (1)

weixin_38498942

Article link: https://blog.csdn.net/weixin_38498942/article/details/107362968

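1. Introduction
In the feedforward networks covered earlier, the output is a function of the current input and a set of weights. In a recurrent neural network (RNN), the previous network state also influences the output, so an RNN has a notion of time. This effect is achieved with a loop that feeds a layer's output back into its input.
In other words, an RNN is a function of the input x(t) (the input vector) and the previous state h(t-1), and it produces the new state h(t). The recurrent function is fixed after training and applied at every time step. RNNs are well suited to regression on sequential data because they take the past into account. RNNs are also Turing complete: given the right set of weights, an RNN can in principle compute anything, so you can think of the weights as a program. To keep you from becoming overconfident about RNNs, though, note that no backpropagation algorithm is guaranteed to find these "perfect weights".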
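
In symbols, the recurrence described above can be sketched as follows. The subscript notation and the name f_W are assumptions made for illustration; the tanh form uses the row-vector convention of the NumPy code in this article.

```latex
h_t = f_W(h_{t-1}, x_t)
\qquad\text{and, for a vanilla RNN,}\qquad
h_t = \tanh\left(x_t W_x + h_{t-1} W_h + b\right)
```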

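2. Recurrent Neural Networks
Use cases for recurrent neural networks: machine translation (English -> French), speech to text, market prediction, scene labeling (combined with a CNN), and car steering (combined with a CNN). To implement a vanilla RNN in Python, we start from a simple RNN recurrence function:
[Figure: the RNN recurrence function]
The code that computes the next state looks like this: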
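import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    # The forward pass is split into steps to make backpropagation easier.
    # Step 1: input-to-hidden product
    xWx = np.dot(x, Wx)

    # Step 2: hidden-to-hidden product
    phWh = np.dot(prev_h, Wh)

    # Step 3: sum of both products and the bias (b broadcasts over the batch)
    affine = xWx + phWh + b

    # Step 4: the tanh non-linearity gives the next state
    next_h = np.tanh(affine)

    # Cache inputs, state, and weights for the backward pass.
    # prev_h is copied because NumPy arrays are passed by object reference,
    # so the caller may overwrite the array in place later.
    cache = (x, prev_h.copy(), Wx, Wh, next_h, affine)

    return next_h, cache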
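
To make the shapes concrete, here is a minimal usage sketch of a single forward step. The sizes N, D, and H, the random seed, and the compact helper `step_forward` are assumptions chosen for illustration; the helper restates the same recurrence without the cache.

```python
import numpy as np

def step_forward(x, prev_h, Wx, Wh, b):
    # Same recurrence as in the text: h(t) = tanh(x(t) Wx + h(t-1) Wh + b)
    return np.tanh(x @ Wx + prev_h @ Wh + b)

# Hypothetical sizes: N sequences in the batch, D input features, H hidden units
N, D, H = 3, 4, 5
rng = np.random.default_rng(0)
x = rng.standard_normal((N, D))
prev_h = np.zeros((N, H))               # zero initial state
Wx = rng.standard_normal((D, H)) * 0.1
Wh = rng.standard_normal((H, H)) * 0.1
b = np.zeros(H)

next_h = step_forward(x, prev_h, Wx, Wh, b)
print(next_h.shape)  # (3, 5)
```

Because the previous state and the bias are zero here, the result reduces to tanh(x Wx), which is a quick sanity check on the wiring.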

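Note that in an RNN we are now more interested in the next state h(t), which is not exactly the output y(t). Before we start, let us state explicitly how to backpropagate through the tanh block.
[Figure: backpropagation through the tanh block]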
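
The tanh block can be checked numerically. The sketch below assumes the standard derivative d/da tanh(a) = 1 - tanh(a)^2 and compares it with a central finite difference; the helper name `tanh_backward` and the test points are made up for illustration.

```python
import numpy as np

def tanh_backward(dout, a):
    # Local gradient of tanh, scaled by the upstream gradient dout
    return (1.0 - np.tanh(a) ** 2) * dout

a = np.linspace(-2.0, 2.0, 9)
dout = np.ones_like(a)
eps = 1e-6

# Central finite difference as an independent estimate of the derivative
numeric = (np.tanh(a + eps) - np.tanh(a - eps)) / (2 * eps)
analytic = tanh_backward(dout, a)

max_err = np.max(np.abs(numeric - analytic))
print(max_err)  # expected to be very small
```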
Now we can write the backpropagation step (for one time step):

def rnn_step_backward(dnext_h, cache):
    (x, prev_h, Wx, Wh, next_h, affine) = cache

    # Backward pass, walking the forward steps in reverse.
    # Step 4: gradient of tanh times the upstream gradient dnext_h
    # (dt is the gradient with respect to the pre-activation sum)
    dt = (1 - np.square(np.tanh(affine))) * dnext_h

    # Step 3: the sum block passes the gradient unchanged to each term
    dxWx = dt
    dphWh = dt
    db = np.sum(dt, axis=0)

    # Step 2: gradient of the hidden-to-hidden product
    dWh = prev_h.T.dot(dphWh)
    dprev_h = dphWh.dot(Wh.T)

    # Step 1: gradient of the input-to-hidden product
    dx = dxWx.dot(Wx.T)
    dWx = x.T.dot(dxWx)

    return dx, dprev_h, dWx, dWh, db
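
A common way to gain confidence in a hand-written backward step is a numerical gradient check. The sketch below is self-contained: `step_forward` and `step_backward` are compact restatements of the single-step functions, and `numeric_grad`, the sizes, and the seed are hypothetical helpers and values introduced only for this check.

```python
import numpy as np

def step_forward(x, prev_h, Wx, Wh, b):
    next_h = np.tanh(x @ Wx + prev_h @ Wh + b)
    return next_h, (x, prev_h, Wx, Wh, next_h)

def step_backward(dnext_h, cache):
    x, prev_h, Wx, Wh, next_h = cache
    dt = (1.0 - next_h ** 2) * dnext_h
    return dt @ Wx.T, dt @ Wh.T, x.T @ dt, prev_h.T @ dt, dt.sum(axis=0)

def numeric_grad(f, arr, dout, eps=1e-5):
    # Central differences of sum(f() * dout) with respect to each entry of arr
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        old = arr[idx]
        arr[idx] = old + eps
        plus = np.sum(f() * dout)
        arr[idx] = old - eps
        minus = np.sum(f() * dout)
        arr[idx] = old          # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
N, D, H = 2, 3, 4               # hypothetical sizes
x = rng.standard_normal((N, D))
prev_h = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, H))
Wh = rng.standard_normal((H, H))
b = rng.standard_normal(H)
dnext_h = rng.standard_normal((N, H))

_, cache = step_forward(x, prev_h, Wx, Wh, b)
analytic = step_backward(dnext_h, cache)

f = lambda: step_forward(x, prev_h, Wx, Wh, b)[0]
names = ("dx", "dprev_h", "dWx", "dWh", "db")
params = (x, prev_h, Wx, Wh, b)
errors = {}
for name, p, g in zip(names, params, analytic):
    errors[name] = np.max(np.abs(numeric_grad(f, p, dnext_h) - g))
print(errors)
```

If every reported error is tiny, the analytic gradients agree with the finite-difference estimates for this random input.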

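One thing to note is that the same function f (with the same weights) and the same set of parameters are applied at every time step.
[Figure: the same function and parameters applied at every time step]
A good initialization for the RNN state is zero. Again, this is only the initial RNN state, not the weights.
The recurrent loops in an RNN may confuse you at first, but you can think of an RNN as a normal neural network repeated (unrolled) multiple times. The number of times you unroll can be seen as how far back in time the network remembers. In other words, each copy corresponds to one time step.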
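
The equivalence between the loop and the unrolled network can be sketched directly. The helper `step`, the sizes, and the seed below are assumptions for illustration; the point is only that one loop with shared weights and three chained copies of the same layer give the same final state.

```python
import numpy as np

def step(x_t, h_prev, Wx, Wh, b):
    return np.tanh(x_t @ Wx + h_prev @ Wh + b)

rng = np.random.default_rng(1)
N, T, D, H = 2, 3, 4, 5           # hypothetical sizes
x = rng.standard_normal((N, T, D))
Wx = rng.standard_normal((D, H))
Wh = rng.standard_normal((H, H))
b = rng.standard_normal(H)
h0 = np.zeros((N, H))             # zero initial state

# Recurrent view: one loop, one set of weights
h = h0
for t in range(T):
    h = step(x[:, t, :], h, Wx, Wh, b)

# Unrolled view: T copies of the same layer chained together
h1 = step(x[:, 0, :], h0, Wx, Wh, b)
h2 = step(x[:, 1, :], h1, Wx, Wh, b)
h3 = step(x[:, 2, :], h2, Wx, Wh, b)

print(np.allclose(h, h3))  # True
```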
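[Figure: an RNN unrolled over time steps]
Forward and backward propagation over all time steps
The previous example only presented the forward and backward propagation code for a single time step. As mentioned earlier, the RNN is unrolled over a finite number of time steps. Now we show how to run the forward propagation across every time step.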

def rnn_forward(x, h0, Wx, Wh, b):
    """
    Run a vanilla RNN forward on an entire sequence of data. We assume an input
    sequence composed of T vectors, each of dimension D. The RNN uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running
    the RNN forward, we return the hidden states for all timesteps.

    Inputs:

    x: Input data for the entire timeseries, of shape (N, T, D).
    h0: Initial hidden state, of shape (N, H)
    Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    b: Biases of shape (H,)

    Returns a tuple of:

    h: Hidden states for the entire timeseries, of shape (N, T, H).
    cache: Values needed in the backward pass
    """
    # Get shapes
    N, T, D = x.shape
    H = h0.shape[1]

    # Initialization
    h = np.zeros((N, T, H))
    cache = []

    # Store the initial state in the last time slot so that h[:, t-1, :]
    # resolves to h0 when t == 0; the slot is overwritten at t == T-1.
    h[:, -1, :] = h0

    # For each time step
    for t in range(T):
        h[:, t, :], cache_step = rnn_step_forward(x[:, t, :], h[:, t - 1, :], Wx, Wh, b)
        cache.append(cache_step)

    # Return the hidden states for all time steps and the cache
    return h, cache

def rnn_backward(dh, cache):
    """
    Compute the backward pass for a vanilla RNN over an entire sequence of data.

    Inputs:

    dh: Upstream gradients of all hidden states, of shape (N, T, H)

    Returns a tuple of:

    dx: Gradient of inputs, of shape (N, T, D)
    dh0: Gradient of initial hidden state, of shape (N, H)
    dWx: Gradient of input-to-hidden weights, of shape (D, H)
    dWh: Gradient of hidden-to-hidden weights, of shape (H, H)
    db: Gradient of biases, of shape (H,)
    """
    # Get shapes
    N, T, H = dh.shape
    D = cache[0][0].shape[1]  # D taken from x in the cache

    # Initialize each gradient with the same shape as its input or weight
    dx, dprev_h = np.zeros((N, T, D)), np.zeros((N, H))
    dWx, dWh, db = np.zeros((D, H)), np.zeros((H, H)), np.zeros((H,))
    dh = dh.copy()  # copy so the caller's array is not modified

    # For each time step, from the last one back to the first
    for t in reversed(range(T)):
        # Add the gradient flowing back from time step t+1
        dh[:, t, :] += dprev_h
        dx_, dprev_h, dWx_, dWh_, db_ = rnn_step_backward(dh[:, t, :], cache[t])
        # The weights are shared, so their gradients are summed over time steps
        dx[:, t, :] += dx_
        dWx += dWx_
        dWh += dWh_
        db += db_

    dh0 = dprev_h

    return dx, dh0, dWx, dWh, db
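
The backward pass over a whole sequence can be sanity-checked in the same numerical way. The sketch below is a compact, self-contained restatement of the sequence forward and backward passes (the names `forward` and `backward`, the sizes, and the seed are assumptions for illustration) and compares the analytic gradient of the shared weight Wh with a finite-difference estimate.

```python
import numpy as np

def forward(x, h0, Wx, Wh, b):
    N, T, _ = x.shape
    h = np.zeros((N, T, h0.shape[1]))
    prev = h0
    for t in range(T):
        prev = np.tanh(x[:, t] @ Wx + prev @ Wh + b)
        h[:, t] = prev
    return h

def backward(dh, x, h0, h, Wx, Wh):
    N, T, D = x.shape
    dx = np.zeros_like(x)
    dWx, dWh, db = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros(Wh.shape[0])
    dprev = np.zeros_like(h0)
    for t in reversed(range(T)):
        prev = h[:, t - 1] if t > 0 else h0
        # Upstream gradient at step t plus the gradient arriving from step t+1
        dt = (1.0 - h[:, t] ** 2) * (dh[:, t] + dprev)
        dx[:, t] = dt @ Wx.T
        dWx += x[:, t].T @ dt       # shared weights: sum over time steps
        dWh += prev.T @ dt
        db += dt.sum(axis=0)
        dprev = dt @ Wh.T
    return dx, dprev, dWx, dWh, db

rng = np.random.default_rng(2)
N, T, D, H = 2, 4, 3, 5             # hypothetical sizes
x = rng.standard_normal((N, T, D))
h0 = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, H))
Wh = rng.standard_normal((H, H))
b = rng.standard_normal(H)
dh = rng.standard_normal((N, T, H))

h = forward(x, h0, Wx, Wh, b)
dx, dh0, dWx, dWh, db = backward(dh, x, h0, h, Wx, Wh)

# Finite-difference estimate of the gradient with respect to Wh
eps = 1e-5
dWh_num = np.zeros_like(Wh)
for idx in np.ndindex(Wh.shape):
    old = Wh[idx]
    Wh[idx] = old + eps
    plus = np.sum(forward(x, h0, Wx, Wh, b) * dh)
    Wh[idx] = old - eps
    minus = np.sum(forward(x, h0, Wx, Wh, b) * dh)
    Wh[idx] = old
    dWh_num[idx] = (plus - minus) / (2 * eps)

max_err = np.max(np.abs(dWh - dWh_num))
print(max_err)  # expected to be very small
```

This version tracks the previous state in a local variable instead of storing h0 in the last time slot; both approaches compute the same hidden states.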

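Below we show a diagram of the different ways you can use a recurrent neural network compared with a feedforward network. Inputs are the red blocks and outputs are the blue blocks:
[Figure: RNN input/output configurations, with inputs in red and outputs in blue]

One to one: a plain feedforward network, e.g. an image as input and a label as output.
One to many (RNN): (image captioning) an image as input and text describing the scene as output (CNN region detection + RNN).
Many to one (RNN): (sentiment analysis) words as input and the sentiment (good/bad) about a product as output.
Many to many (RNN): (translation) the words of an English phrase as input and Portuguese words as output.
Many to many (RNN): (video classification) video as input and a video description as output.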
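
The difference between the many-to-one and many-to-many configurations comes down to which hidden states are read out. The sketch below is an illustration only: the output layer `Wy`, `by`, the sigmoid readout, the sizes, and the random inputs are all assumptions, not part of the code in this article.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, D, H = 2, 6, 8, 4                  # hypothetical sizes
x = rng.standard_normal((N, T, D))       # e.g. T word vectors per review
Wx = rng.standard_normal((D, H)) * 0.1
Wh = rng.standard_normal((H, H)) * 0.1
b = np.zeros(H)
Wy = rng.standard_normal((H, 1)) * 0.1   # hypothetical output layer
by = np.zeros(1)

h = np.zeros((N, H))
states = []
for t in range(T):
    h = np.tanh(x[:, t] @ Wx + h @ Wh + b)
    states.append(h)

# Many to one: read a single output from the last state only
score = 1.0 / (1.0 + np.exp(-(h @ Wy + by)))             # shape (N, 1)

# Many to many: read one output per time step
all_h = np.stack(states, axis=1)                         # shape (N, T, H)
per_step = 1.0 / (1.0 + np.exp(-(all_h @ Wy + by)))      # shape (N, T, 1)

print(score.shape, per_step.shape)  # (2, 1) (2, 6, 1)
```

The many-to-one output is the last entry of the per-step outputs, which shows that both configurations share the same recurrent core.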
