Deep Learning 3: Recurrent Neural Networks (based on the Python MXNet.Gluon framework)

A recurrent neural network (RNN) is a neural network that can process time-series data and has short-term memory. By updating neuron states through feedback edges, an RNN overcomes the limitations of feedforward networks on sequential data. This article examines the basic structure, computational power, parameter learning, and applications of RNNs, introduces the two parameter-learning algorithms backpropagation through time and real-time recurrent learning, and discusses improvements that target the long-term dependency problem.

Overview of Recurrent Neural Networks

In a feedforward neural network, information flows in only one direction. This restriction makes the network easier to learn, but it also weakens the capability of the model to some extent.

In biological neural networks, the connections between neurons are far more complex. There are two problems that feedforward networks cannot solve:

1. Each input to a feedforward network is treated independently, but in many real-world tasks the network's output depends not only on the input at the current moment but also on its outputs over a recent period of time.

2. Feedforward networks have difficulty handling time-series data (e.g., video, speech, text). The length of a time series is generally not fixed, whereas a feedforward network requires the input and output dimensions to be fixed; they cannot change arbitrarily.

Therefore, when dealing with problems that involve time-series data, a more capable model is needed.

A recurrent neural network (Recurrent Neural Network, RNN) is a class of neural networks with short-term memory. In an RNN, a neuron can receive information not only from other neurons but also from itself, forming a network structure with cycles.

The parameters of a recurrent neural network can be learned with the backpropagation through time (BPTT) algorithm, which passes error information backward step by step in reverse time order. When the input sequence is long, gradient explosion and gradient vanishing problems arise; this is also known as the long-term dependency problem. Many improvements to RNNs have been proposed to address it, the most effective of which is to introduce a gating mechanism (Gating Mechanism).

In addition, recurrent neural networks can easily be extended to two more general classes of memory networks: recursive neural networks and graph networks.

Giving the Network Memory

To process such time-series data and exploit their historical information, we need to give the network short-term memory. A feedforward network is a static network and lacks this kind of memory.

In general, we can add short-term memory to a network in the following three ways.

Time Delay Neural Networks

A simple way to exploit historical information is to add extra delay units that store the network's history (which may include inputs, outputs, hidden states, and so on). A representative model of this kind is the time delay neural network (Time Delay Neural Network, TDNN).

A TDNN adds a delay unit to every non-output layer of a feedforward network to record the most recent activations of its neurons. At time $t$, the activations of the neurons in layer $l$ depend on the activations of the layer $l-1$ neurons at the most recent $K$ time steps, i.e.,

$$h_t^{(l)}=f\big(h_t^{(l-1)},h_{t-1}^{(l-1)},\cdots,h_{t-K}^{(l-1)}\big),$$

where $h_t^{(l)}\in\mathbb{R}^{M_l}$ denotes the activations of the layer-$l$ neurons at time $t$, and $M_l$ is the number of neurons in layer $l$. With these delay units, a feedforward network gains short-term memory.
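Because each neuron looks at a fixed window of the previous layer's last $K+1$ activations, a TDNN layer can be realized as a 1-D convolution along the time axis. Below is a minimal, hypothetical sketch in MXNet Gluon; the layer sizes, window length, and sequence length are illustrative assumptions, not values from the text.

```python
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

# Hypothetical sketch: a TDNN layer as a 1-D convolution over time.
# Using time steps t, t-1, ..., t-K corresponds to kernel_size = K + 1.
K = 2                    # number of delayed steps (assumed)
M_prev, M_cur = 8, 16    # neurons in layers l-1 and l (assumed)
T = 20                   # sequence length (assumed)

tdnn = nn.Conv1D(channels=M_cur, kernel_size=K + 1, activation='tanh')
tdnn.initialize()

# Conv1D uses layout (batch, channels, time); the channel axis plays the
# role of the M_{l-1} neurons of layer l-1.
h_prev = nd.random.normal(shape=(1, M_prev, T))  # activations of layer l-1
h_cur = tdnn(h_prev)                             # shape: (1, M_cur, T - K)
print(h_cur.shape)
```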

Nonlinear Autoregressive Models with Exogenous Inputs

The autoregressive model (AutoRegressive Model, AR) is a class of time-series models widely used in statistics; it predicts a variable $y_t$ from that variable's own history:

$$y_t=\omega_0+\sum_{k=1}^{K}\omega_k y_{t-k}+\epsilon_t,$$

where $K$ is a hyperparameter, $\omega_0,\cdots,\omega_K$ are learnable parameters, and $\epsilon_t\sim\mathcal{N}(0,\sigma^2)$ is the noise at time $t$, whose variance $\sigma^2$ does not depend on time.
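As a quick illustration, here is a toy one-step AR($K$) prediction in plain Python/NumPy; the weights and history values are made-up assumptions rather than fitted parameters.

```python
import numpy as np

K = 3
w0 = 0.1                               # omega_0 (assumed, not learned)
w = np.array([0.5, 0.3, 0.2])          # omega_1 .. omega_K (assumed)
sigma = 0.05                           # noise std, constant over time

y_hist = np.array([1.1, 0.9, 1.2])     # y_{t-1}, y_{t-2}, y_{t-3}
eps_t = np.random.normal(0.0, sigma)   # epsilon_t ~ N(0, sigma^2)

# y_t = omega_0 + sum_k omega_k * y_{t-k} + epsilon_t
y_t = w0 + np.dot(w, y_hist) + eps_t
print(y_t)
```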

The nonlinear autoregressive model with exogenous inputs (Nonlinear AutoRegressive with Exogenous Inputs Model, NARX) extends the AR model: at each time $t$ there is an external input $x_t$, which produces an output $y_t$. NARX uses delay units to record the most recent $K_x$ external inputs and the most recent $K_y$ outputs, and the output $y_t$ at time $t$ is

$$y_t=f\big(x_t,x_{t-1},\cdots,x_{t-K_x},y_{t-1},y_{t-2},\cdots,y_{t-K_y}\big),$$

where $f(\cdot)$ is a nonlinear function, which can be a feedforward network, and $K_x$ and $K_y$ are hyperparameters.
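Below is a minimal sketch of one NARX step in MXNet Gluon, taking $f(\cdot)$ to be a small feedforward network over the concatenated delay buffers, as the text allows. All dimensions and hyperparameter values are assumptions for illustration.

```python
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

Kx, Ky = 2, 2            # delay lengths K_x, K_y (assumed hyperparameters)
x_dim, y_dim = 3, 1      # input/output dimensions (assumed)

# f(.): a small feedforward network
f = nn.Sequential()
f.add(nn.Dense(16, activation='tanh'))
f.add(nn.Dense(y_dim))
f.initialize()

# Delay buffers: x_t, x_{t-1}, ..., x_{t-Kx} and y_{t-1}, ..., y_{t-Ky}
x_delays = [nd.random.normal(shape=(1, x_dim)) for _ in range(Kx + 1)]
y_delays = [nd.random.normal(shape=(1, y_dim)) for _ in range(Ky)]

# y_t = f(x_t, ..., x_{t-Kx}, y_{t-1}, ..., y_{t-Ky})
y_t = f(nd.concat(*(x_delays + y_delays), dim=1))
print(y_t)
```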

Recurrent Neural Networks

A recurrent neural network (Recurrent Neural Network, RNN) uses neurons with self-feedback and can process time-series data of arbitrary length.

Given an input sequence $\mathbf{x}_{1:T}=(\mathbf{x}_1,\mathbf{x}_2,\cdots,\mathbf{x}_t,\cdots,\mathbf{x}_T)$, a recurrent neural network updates the activations $\mathbf{h}_t$ of the hidden layer with feedback edges via

$$\mathbf{h}_t=f(\mathbf{h}_{t-1},\mathbf{x}_t),$$

where $\mathbf{h}_0=0$ and $f(\cdot)$ is a nonlinear function, which can be a feedforward network.

Mathematically, the update $\mathbf{h}_t=f(\mathbf{h}_{t-1},\mathbf{x}_t)$ can be viewed as a dynamical system, and the hidden-layer activation $\mathbf{h}_t$ is therefore also called the state (State) or hidden state (Hidden State).
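To make the update concrete, here is a minimal sketch that unrolls $\mathbf{h}_t=f(\mathbf{h}_{t-1},\mathbf{x}_t)$ by hand in MXNet, taking $f$ to be an affine map followed by $\tanh$ (the common "simple RNN" choice); all dimensions are assumptions. Gluon's built-in `mxnet.gluon.rnn.RNN` layer implements the same recurrence.

```python
from mxnet import nd

T, batch, input_size, hidden_size = 5, 2, 3, 4   # assumed dimensions

X = nd.random.normal(shape=(T, batch, input_size))                   # x_1 .. x_T
W_xh = nd.random.normal(scale=0.1, shape=(input_size, hidden_size))  # input-to-hidden
W_hh = nd.random.normal(scale=0.1, shape=(hidden_size, hidden_size)) # hidden-to-hidden
b_h = nd.zeros((1, hidden_size))

H = nd.zeros((batch, hidden_size))               # h_0 = 0
for t in range(T):
    # h_t = f(h_{t-1}, x_t), with f = tanh(affine(.))
    H = nd.tanh(nd.dot(X[t], W_xh) + nd.dot(H, W_hh) + b_h)
print(H)  # h_T, the state after reading the whole sequence
```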
