[Andrew Ng's team, Natural Language Processing, Course 3, Part 1] Neural Networks and Sentiment Analysis, RNN
Neural networks
![image-20220224171510215](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153157268-679597467.png)
![image-20220224171615451](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153157595-1751972248.png)
Initial Representation
Pad the empty positions with 0 so that every tweet has the same length.
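The initial representation can be sketched in a few lines of Python. This is only an illustration: the toy vocabulary `vocab`, the `__PAD__` and `__UNK__` tokens, and the helpers `tweet_to_tensor` and `pad_batch` are hypothetical names, not taken from the course code.

```python
# Hypothetical toy vocabulary; index 0 is reserved for padding (an assumption).
vocab = {"__PAD__": 0, "__UNK__": 1, "i": 2, "am": 3, "happy": 4, "very": 5, "sad": 6}

def tweet_to_tensor(tweet, vocab, unk_token="__UNK__"):
    """Map each word of a tweet to its integer index in the vocabulary."""
    return [vocab.get(word, vocab[unk_token]) for word in tweet.lower().split()]

def pad_batch(tensors, pad_id=0):
    """Right-pad every sequence with pad_id up to the longest length in the batch."""
    max_len = max(len(t) for t in tensors)
    return [t + [pad_id] * (max_len - len(t)) for t in tensors]

tweets = ["I am happy", "I am very very sad"]
batch = pad_batch([tweet_to_tensor(t, vocab) for t in tweets])
print(batch)  # [[2, 3, 4, 0, 0], [2, 3, 5, 5, 6]]
```

The shorter tweet is filled with zeros on the right, so both rows have the same length and can be stacked into one batch.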
Summary
- Structure for sentiment analysis
- Classify complex tweets
- Initial representation
Trax neural networks
Built on TensorFlow
![image-20220224174020199](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153158201-160687188.png)
Advantages of using frameworks
- Run fast on CPUs, GPUs and TPUs
- Parallel computing
- Record algebraic computations for gradient evaluation
Major frameworks
TensorFlow
PyTorch
JAX
Trax layers
Classes
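Classes in Python
class MyClass:
    def __init__(self, y):
        # Store the value that my_method adds to its input.
        self.y = y
    def my_method(self, x):
        return x + self.y
    def __call__(self, x):
        # Calling the instance like a function delegates to my_method.
        return self.my_method(x)
f = MyClass(7)
print(f(3))
# 10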
Subclasses
class SubClass(MyClass):
    # Overrides my_method; __init__ and __call__ are inherited from MyClass.
    def my_method(self, x):
        return x + self.y**2
f = SubClass(7)
print(f(3))
# 52
Dense Layer and ReLU Layer
Dense Layer
![image-20220224180348278](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153158542-656530313.png)
ReLU Layer
Summary
- Dense Layer -> \[z^{[i]}=W^{[i]}a^{[i-1]} \]
- ReLU Layer -> \[g(z^{[i]})=\max(0,z^{[i]}) \]
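The two formulas can be checked with a small plain-Python sketch. The helper names `dense` and `relu` and the toy weights are assumptions made for illustration; a real framework layer would also handle batches and trainable weights.

```python
def dense(W, a_prev):
    """Dense layer: z = W a_prev, with W given as a list of rows."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a_prev)) for row in W]

def relu(z):
    """ReLU layer: g(z) = max(0, z), applied element-wise."""
    return [max(0.0, z_i) for z_i in z]

# Hypothetical weights: 3 units in layer i, 2 activations coming from layer i-1.
W = [[1.0, -2.0],
     [0.5, 0.5],
     [-1.0, 1.0]]
a_prev = [2.0, 3.0]

z = dense(W, a_prev)
a = relu(z)
print(z)  # [-4.0, 2.5, 1.0]
print(a)  # [0.0, 2.5, 1.0]
```

The negative pre-activation is clipped to zero while the positive ones pass through unchanged.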
Serial Layer
![image-20220224182616411](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153158828-132271924.png)
- Serial layer is a composition of sublayers
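The idea of composing sublayers can be imitated with ordinary callable classes, reusing the `__call__` pattern from the Classes section. The classes `Layer`, `Dense`, `Relu` and `Serial` below are hypothetical stand-ins written for this note; they are not Trax's implementation and only mirror the concept.

```python
class Layer:
    """Minimal callable layer interface (hypothetical, for illustration only)."""
    def __call__(self, x):
        return self.forward(x)

    def forward(self, x):
        raise NotImplementedError

class Dense(Layer):
    def __init__(self, W):
        self.W = W  # list of rows

    def forward(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.W]

class Relu(Layer):
    def forward(self, x):
        return [max(0.0, v) for v in x]

class Serial(Layer):
    """Applies its sublayers one after another, feeding each output to the next."""
    def __init__(self, *sublayers):
        self.sublayers = sublayers

    def forward(self, x):
        for layer in self.sublayers:
            x = layer(x)
        return x

model = Serial(
    Dense([[1.0, -1.0], [-1.0, 1.0]]),
    Relu(),
    Dense([[2.0, 3.0]]),
)
output = model([3.0, 1.0])
print(output)  # [4.0]
```

Because `Serial` is itself a `Layer`, a serial model can be nested inside another serial model.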
Trax: Other Layers
Embedding Layer
![image-20220224183254949](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153159104-149467505.png)
Mean Layer
![image-20220224183634388](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153159436-817347370.png)
Reduces the amount of data passed on to the next step.
Summary
- Embedding is trainable using an embedding layer
- Mean layer gives a vector representation
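A rough sketch of what these two layers compute, in plain Python. The table `embedding_table` and the helpers `embed` and `mean_over_words` are assumptions for illustration; in a real model the table entries are trainable weights.

```python
# Hypothetical embedding table: one row (vector) per vocabulary index.
embedding_table = [
    [0.0, 0.0],   # index 0
    [1.0, 3.0],   # index 1
    [2.0, 1.0],   # index 2
    [3.0, 5.0],   # index 3
]

def embed(indices, table):
    """Embedding lookup: replace each word index by its vector."""
    return [table[i] for i in indices]

def mean_over_words(vectors):
    """Mean layer: average the word vectors, one value per embedding dimension."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]

tweet_indices = [1, 2, 3]
embedded = embed(tweet_indices, embedding_table)
sentence_vector = mean_over_words(embedded)
print(sentence_vector)  # [2.0, 3.0]
```

Three word vectors go in and a single vector of the same dimension comes out, regardless of how many words the tweet has.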
Training
Computing gradients in Trax
![image-20220224205318275](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153159693-1942742375.png)
Training with grad()
![image-20220224205601601](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153159937-1528916156.png)
Summary
- grad() allows much easier training
- Forward and backpropagation in one line!
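To show the pattern of "get a gradient function, then use it in a training loop" without depending on any framework, here is a sketch in which `grad` is a hypothetical numerical stand-in based on central differences. Frameworks compute gradients by automatic differentiation instead; the functions `f` and `loss` and the training values are made up for the example.

```python
def grad(fun, eps=1e-6):
    """Return a function approximating d fun / dx with a central difference.

    Assumption for illustration: real frameworks use automatic
    differentiation rather than this numerical approximation.
    """
    def grad_fun(x):
        return (fun(x + eps) - fun(x - eps)) / (2 * eps)
    return grad_fun

def f(x):
    return 3 * x ** 2 + x

grad_f = grad(f)
print(grad_f(2.0))  # close to 6 * 2 + 1 = 13

# Gradient descent on a one-parameter model y_hat = w * x.
x_train, y_train = 2.0, 8.0

def loss(w):
    return (w * x_train - y_train) ** 2

w = 0.0
alpha = 0.05
grad_loss = grad(loss)
for _ in range(100):
    w -= alpha * grad_loss(w)
print(w)  # approaches 4.0, since 4.0 * 2.0 == 8.0
```

The training loop never spells out the derivative of `loss` by hand; it only calls the function returned by `grad`.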
RNN
Traditional Language Models
![image-20220224210045053](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153200197-1954276280.png)
N-grams
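- Large N-grams are needed to capture dependencies between distant words, and they are hard to estimate without a large corpus
- Need a lot of space and RAM: even with a large corpus, storing the counts takes a large amount of memory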
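The growth problem can be made concrete with a toy count. The corpus string and the helper `ngram_counts` are hypothetical; the point is only that the number of possible N-grams grows as the vocabulary size to the power N, while a small corpus covers very few of them.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

corpus = "i am happy because i am learning".split()
vocab_size = len(set(corpus))  # 5 distinct words

for n in (1, 2, 3):
    observed = len(ngram_counts(corpus, n))
    possible = vocab_size ** n
    print(f"n={n}: {observed} observed out of {possible} possible")
```

With only 5 distinct words there are already 125 possible trigrams, of which this corpus contains 5.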
Advantages of RNNs
![image-20220224211214233](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153200464-1009222040.png)
The word "have" makes no sense here.
![image-20220224211319041](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153200703-1558517202.png)
With n-grams, the required context length would be extremely long.
RNNs Basic Structure
![image-20220224211648199](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153200953-1623031536.png)
Learnable parameters
![image-20220224211746746](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153201186-562353800.png)
Summary
- RNNs model relationships among distant words
- In RNNs a lot of computations share parameters
Tasks
Tasks are grouped by the nature of their inputs and outputs.
One to One
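Takes a set of unrelated features X and returns a single output Y, e.g. predicting a team's position on a leaderboard. There is only one hidden state \(h^{<t_0>}\), so RNNs are not that useful for this kind of task.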
![image-20220224212305851](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153201402-696170839.png)
Many to One
E.g. sentiment analysis of a tweet: I am very happy!
![image-20220224212600166](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153201780-1829559013.png)
Many to Many
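E.g. machine translation. RNNs work well here because they propagate information from the beginning of the sequence to the end.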
![image-20220224212832163](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153202273-871432811.png)
Encoder
Encodes the word sequence into a single representation that captures the overall meaning of the sentence.
![image-20220224212939100](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153202486-1623264713.png)
The representation is then decoded into a word sequence in another language.
Summary
- RNNs can be implemented for a variety of NLP tasks
- Applications include Machine translation and caption generation
Math in simple RNNs
A Vanilla RNN
![image-20220224213346255](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153202711-802475506.png)
![image-20220224213852075](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153202931-840838553.png)
![image-20220224214000754](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153203164-2003230143.png)
![image-20220224214415100](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153203488-1793864018.png)
- Hidden states propagate information through time
- Basic recurrent units have two inputs at each time step: \(h^{<t-1>}\) and \(x^{<t>}\)
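A minimal sketch of one recurrent step, assuming the common formulation in which the previous hidden state and the current input are concatenated and passed through a shared weight matrix. The choice of tanh for the hidden state and sigmoid for the output, the names `rnn_step` and `matvec`, and all weight values are assumptions for illustration.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(h_prev, x_t, W_h, b_h, W_y, b_y):
    """One vanilla RNN step; the same weights are reused at every time step."""
    concat = h_prev + x_t  # list concatenation: [h^{<t-1>}, x^{<t>}]
    h_t = [math.tanh(z + b) for z, b in zip(matvec(W_h, concat), b_h)]
    y_t = [1.0 / (1.0 + math.exp(-(z + b))) for z, b in zip(matvec(W_y, h_t), b_y)]
    return h_t, y_t

# Hypothetical toy sizes: hidden size 2, input size 1, output size 1.
W_h = [[0.5, 0.0, 1.0],
       [0.0, 0.5, -1.0]]
b_h = [0.0, 0.0]
W_y = [[1.0, -1.0]]
b_y = [0.0]

h = [0.0, 0.0]
outputs = []
for x_t in ([1.0], [0.0], [-1.0]):
    h, y = rnn_step(h, x_t, W_h, b_h, W_y, b_y)
    outputs.append(y[0])
print(h, outputs)
```

The loop calls `rnn_step` with the same `W_h` and `W_y` at every position, which is the parameter sharing mentioned earlier, and `h` is the only thing carried from one step to the next.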
Cross Entropy Loss
![image-20220224220648938](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153203733-300348827.png)
![image-20220224220802064](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153203994-1721224795.png)
For RNNs the loss function is just an average through time!
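Assuming the usual form \(J = -\frac{1}{T}\sum_{t=1}^{T}\sum_{j} y_j^{<t>} \log \hat{y}_j^{<t>}\), the averaging through time can be sketched as below. The function name `rnn_cross_entropy` and the toy targets and predictions are hypothetical.

```python
import math

def rnn_cross_entropy(y_true, y_pred):
    """Average over time steps of the per-step cross entropy.

    y_true: list of one-hot vectors, one per time step.
    y_pred: list of predicted probability vectors, one per time step.
    """
    T = len(y_true)
    total = 0.0
    for y_t, y_hat_t in zip(y_true, y_pred):
        # Only the entry of the true class contributes for one-hot targets.
        total += -sum(y * math.log(p) for y, p in zip(y_t, y_hat_t) if y > 0)
    return total / T

y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
loss = rnn_cross_entropy(y_true, y_pred)
print(loss)  # equals log(2), about 0.693
```

Each time step contributes \(-\log 0.5\), and dividing by the number of steps leaves the same value as the average.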
Implementation notes
![image-20220224221223781](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153204243-617894803.png)
Frameworks like TensorFlow need this type of abstraction
Parallel computations and GPU usage
Summary
- Frameworks require abstractions
- tf.scan() mimics RNNs
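The abstraction can be imitated in plain Python. The `scan` function below is a simplified, hypothetical version written for this note; its calling convention (a step function taking `(x, state)` and returning `(y, new_state)`) is an assumption and may differ from the real `tf.scan` signature.

```python
def scan(fn, elems, initializer):
    """Apply fn over elems in order, threading a state through the steps.

    fn takes (x, state) and returns (y, new_state).
    """
    state = initializer
    ys = []
    for x in elems:
        y, state = fn(x, state)
        ys.append(y)
    return ys, state

def running_sum_step(x, state):
    """Toy step function: the state is the sum of everything seen so far."""
    new_state = state + x
    return new_state, new_state

ys, final_state = scan(running_sum_step, [1, 2, 3, 4], 0)
print(ys, final_state)  # [1, 3, 6, 10] 10
```

Replacing `running_sum_step` with a recurrent unit and the initial `0` with an initial hidden state gives the forward pass of an RNN over a sequence.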
Gated recurrent units (GRU)
Outline
- Gated recurrent unit (GRU) structure
- Comparison between GRUs and vanilla RNNs
A GRU retains relevant information about the subject, e.g. that "They" refers to the plural "ants".
![image-20220224221726031](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153204585-231021556.png)
Relevance and update gates to remember important prior information
These gates compute a sigmoid, which squashes their values into the range 0 to 1.
![image-20220224222714841](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153204858-350995422.png)
Vanilla RNN vs GRUs
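- RNN: in longer sequences, information from the early positions is lost, i.e. the vanishing gradient problem
- GRUs: longer processing time and more memory usage. The update gate and relevance gate determine which information from the previous hidden state is relevant and which should be updated. The hidden state candidate (h') stores information that may be used to overwrite what was passed on from the previous hidden state. The current hidden state is computed by updating the information from the previous hidden state. \(\hat{y}\) is always computed from the updated hidden state.
These computations let the network learn what information to keep and when to overwrite it.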
Summary
- GRUs "decide" how to update the hidden state
- GRUs help preserve important information
A GRU is a simplified version of an LSTM.
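The gate computations described above can be sketched as a single step function. This assumes the convention \(h^{<t>} = (1-\Gamma_u)\,h^{<t-1>} + \Gamma_u\,h'\) (some references swap the roles of the two terms); the names `gru_step`, `affine` and `sigmoid` and every weight value are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def gru_step(h_prev, x_t, params):
    W_u, b_u, W_r, b_r, W_h, b_h = params
    concat = h_prev + x_t
    gamma_u = [sigmoid(z) for z in affine(W_u, concat, b_u)]  # update gate
    gamma_r = [sigmoid(z) for z in affine(W_r, concat, b_r)]  # relevance gate
    gated = [r * h for r, h in zip(gamma_r, h_prev)] + x_t
    h_cand = [math.tanh(z) for z in affine(W_h, gated, b_h)]  # candidate h'
    # Mix old state and candidate: gamma_u near 0 keeps, near 1 overwrites.
    return [(1.0 - u) * h + u * c for u, h, c in zip(gamma_u, h_prev, h_cand)]

# Hypothetical toy sizes: hidden size 2, input size 1.
W_zero = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
W_h = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
h_prev, x_t = [0.5, -0.5], [2.0]

keep = gru_step(h_prev, x_t, (W_zero, [-20.0, -20.0], W_zero, [0.0, 0.0], W_h, [0.0, 0.0]))
overwrite = gru_step(h_prev, x_t, (W_zero, [20.0, 20.0], W_zero, [0.0, 0.0], W_h, [0.0, 0.0]))
print(keep)       # very close to h_prev
print(overwrite)  # very close to the candidate [tanh(2), tanh(-2)]
```

The two calls differ only in the update gate bias: a strongly negative bias closes the gate and the old hidden state survives, while a strongly positive bias opens it and the candidate replaces the old state.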
Deep and Bi-directional RNNs
Outline
- How bidirectional RNNs propagate information
- Forward propagation in deep RNNs
<img src="https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153205607-313797841.png" alt="image-20220225151714944" style="zoom:67%;" />
Bi-directional
![image-20220225151902294](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153205819-318721108.png)
![image-20220225152045309](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153206057-1795278387.png)
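A small sketch of the two passes, using a scalar vanilla RNN to keep the arithmetic visible. The helpers `run_rnn` and `bidirectional` and the weights are hypothetical; a full model would feed both states at each position into the output layer.

```python
import math

def run_rnn(xs, w_h, w_x):
    """Scalar vanilla RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t); returns all h_t."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def bidirectional(xs, fwd_weights, bwd_weights):
    """Pair the left-to-right state with the right-to-left state at each position."""
    forward = run_rnn(xs, *fwd_weights)
    backward = run_rnn(xs[::-1], *bwd_weights)[::-1]  # reverse back to align positions
    return list(zip(forward, backward))

# The only non-zero input is at the last position.
xs = [0.0, 0.0, 1.0]
states = bidirectional(xs, (0.5, 1.0), (0.5, 1.0))
print(states)
```

At the first position the forward state is still zero because nothing informative has been seen yet, while the backward state is already non-zero because it has read the last input.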
Deep RNNs
![image-20220225152352717](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153206265-342413106.png)
Multiple RNNs stacked together.
![image-20220225152543260](https://img2022.cnblogs.com/blog/1586717/202202/1586717-20220225153206507-1181086287.png)
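Stacking can be sketched by feeding the hidden states of one layer as the input sequence of the next. The scalar layer `rnn_layer`, the wrapper `deep_rnn` and the per-layer weights are assumptions for illustration only.

```python
import math

def rnn_layer(inputs, w_h, w_x):
    """Scalar vanilla RNN layer; returns the hidden state at every time step."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def deep_rnn(xs, layer_weights):
    """Pass the sequence through each layer in turn, bottom to top."""
    activations = xs
    for w_h, w_x in layer_weights:
        activations = rnn_layer(activations, w_h, w_x)
    return activations

xs = [1.0, -1.0, 0.5]
top_states = deep_rnn(xs, [(0.5, 1.0), (0.3, 0.8)])
print(top_states)
```

Each layer has its own weights, and within a layer information still flows through time as in a single RNN.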
Summary
- In bidirectional RNNs, the outputs take information from the past and the future
- Deep RNNs have more than one layer, which helps in complex tasks