[Andrew Ng Team NLP Course 3, Part 1] Neural Networks, Sentiment Analysis, and RNNs

Neural networks

image-20220224171510215
image-20220224171615451

Initial Representation

image-20220224172754061

Empty positions are padded with zeros.

Summary

  • Structure for sentiment analysis
  • Classify complex tweets
  • Initial representation

Trax neural networks

Trax is built on top of TensorFlow and JAX.

image-20220224174020199
Advantages of using frameworks
  • Run fast on CPUs, GPUs and TPUs
  • Parallel computing
  • Record algebraic computations for gradient evaluation

Major frameworks

TensorFlow, PyTorch, JAX

Trax layers
Classes

Classes in Python

class MyClass(object):
    def __init__(self, y):
        self.y = y
    def my_method(self, x):
        return x + self.y
    def __call__(self, x):
        return self.my_method(x)

f = MyClass(7)
print(f(3))
# 10
Subclasses
class SubClass(MyClass):
    def my_method(self, x):
        return x + self.y ** 2

f = SubClass(7)
print(f(3))
# 52

Dense Layer and ReLU Layer

Dense Layer
image-20220224180348278
ReLU layer

image-20220224180451920

Summary
  • Dense Layer ->

    \[z^{[i]}=W^{[i]}a^{[i-1]} \]
  • ReLU Layer ->

    \[g(z^{[i]})=\max(0,z^{[i]}) \]
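A minimal numpy sketch of these two layers (the weight matrix and input are made-up toy values, not Trax code):

import numpy as np

# Toy dense layer followed by ReLU.
W = np.array([[1.0, -2.0],
              [0.5,  3.0]])      # weight matrix W^[i]
a_prev = np.array([2.0, 1.0])    # activation from the previous layer a^[i-1]

z = W @ a_prev                   # dense layer: z^[i] = W^[i] a^[i-1]
a = np.maximum(0, z)             # ReLU layer: g(z^[i]) = max(0, z^[i])
print(z)   # [0. 4.]
print(a)   # [0. 4.]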

Serial layer

image-20220224182616411
  • Serial layer is a composition of sublayers


Trax: Other Layers

Embedding Layer
image-20220224183254949
Mean Layer
image-20220224183634388

Reduces the amount of data passed to the next step.

Summary
  • Embedding is trainable using an embedding layer
  • Mean layer gives a vector representation
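A hedged sketch of how these layers could be composed with a Serial layer for the sentiment classifier, assuming the trax.layers API (the vocabulary size and embedding dimension below are made-up):

from trax import layers as tl

# Sketch: Embedding -> Mean over the word axis -> Dense -> LogSoftmax.
vocab_size = 9088      # hypothetical vocabulary size
d_feature = 256        # hypothetical embedding dimension

model = tl.Serial(
    tl.Embedding(vocab_size=vocab_size, d_feature=d_feature),  # trainable embeddings
    tl.Mean(axis=1),        # average embeddings over the sequence -> one vector per tweet
    tl.Dense(n_units=2),    # two classes: positive / negative
    tl.LogSoftmax(),        # log probabilities
)
print(model)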

Training

Computing gradients in Trax
image-20220224205318275
Training with grad()
image-20220224205601601
Summary
  • grad() allows much easier training
  • Forward and backpropagation in one line!
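A minimal sketch of the grad() idea, shown here with jax.grad (Trax's grad() is built on the same JAX autodiff machinery):

import jax

# grad() turns a function into a function that returns its derivative.
def f(x):
    return 3.0 * x ** 2 + x   # simple scalar function

grad_f = jax.grad(f)          # df/dx = 6x + 1
print(f(2.0))                 # 14.0
print(grad_f(2.0))            # 13.0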

RNN

Traditional Language Models

image-20220224210045053

N-grams

\[P(w_2|w_1)=\frac{count(w_1,w_2)}{count(w_1)} \rightarrow \text{Bigrams}\\ P(w_3|w_1,w_2)=\frac{count(w_1,w_2,w_3)}{count(w_1,w_2)} \rightarrow \text{Trigrams}\\ P(w_1,w_2,w_3)\approx P(w_1)\times P(w_2|w_1)\times P(w_3|w_2) \]
  • Large N-grams are needed to capture dependencies between distant words, and they are hard to estimate without a huge corpus
  • Even with a large corpus, N-grams need a lot of space and RAM
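A minimal sketch of estimating a bigram probability by counting, on a made-up toy corpus:

from collections import Counter

# Count unigrams and bigrams in a tiny corpus.
corpus = "i am happy because i am learning".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

# P(w2 | w1) = count(w1, w2) / count(w1)
p_am_given_i = bigrams[("i", "am")] / unigrams["i"]
print(p_am_given_i)   # 1.0 -> "am" always follows "i" in this corpus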

Advantages of RNNs

image-20220224211214233

The word "have" on its own carries little meaning here.

image-20220224211319041

Capturing this dependency with N-grams would require a very long N-gram.

RNNs Basic Structure

image-20220224211648199

Learnable parameters:

\[W_h,W_x,W \]
image-20220224211746746
Summary
  • RNNs model relationships among distant words
  • In RNNs a lot of computations share parameters

Tasks

Tasks are grouped by the nature of their inputs and outputs.

One to One

Take a set of unrelated features X as input and return a single output Y.

For example, predicting a team's position in a league table. There is only a single hidden state h^<t0>, so an RNN is not particularly useful for this kind of task.

image-20220224212305851
Many to One

For example, sentiment analysis of a tweet: "I am very happy!"

image-20220224212600166
Many to Many

For example, machine translation. RNNs work well here because they propagate information from the beginning of the sequence to the end.

image-20220224212733143
image-20220224212808383

image-20220224212832163

The encoder encodes the word sequence into a single representation that captures the overall meaning of the sentence.

image-20220224212939100

It is then decoded into a word sequence in another language.

Summary
  • RNNs can be implemented for a variety of NLP tasks
  • Applications include Machine translation and caption generation

Math in simple RNNs

A Vanilla RNN
image-20220224213346255

$$ h^{<t>}=g(W_{hh}h^{<t-1>}+W_{hx}x^{<t>}+b_h) $$

Focus first on the first unit of the RNN.

image-20220224213852075
image-20220224214000754

$$ W_{hh}h^{<t-1>}+W_{hx}x^{<t>}+b_h \ \xrightarrow{\text{activation } g}\ h^{<t>} $$

image-20220224214415100

$$ \hat y^{<t>}=g(W_{yh}h^{<t>}+b_y) $$

Summary
  • Hidden states propagate information through time
  • Basic recurrent units have two inputs at each time step: h^<t-1>, x^<t>
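A minimal numpy sketch of one vanilla RNN step with tanh as the activation g (dimensions and weights are made-up toy values):

import numpy as np

np.random.seed(0)
d_h, d_x = 4, 3                          # hidden and input sizes (made up)
W_hh = np.random.randn(d_h, d_h)
W_hx = np.random.randn(d_h, d_x)
b_h = np.zeros(d_h)

def rnn_step(h_prev, x_t):
    # h^<t> = g(W_hh h^<t-1> + W_hx x^<t> + b_h)
    return np.tanh(W_hh @ h_prev + W_hx @ x_t + b_h)

h = np.zeros(d_h)                        # initial hidden state
for x_t in np.random.randn(5, d_x):      # 5 time steps
    h = rnn_step(h, x_t)                 # the hidden state carries information forward
print(h.shape)                           # (4,)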

Cross Entropy Loss

image-20220224220648938
image-20220224220802064

$$ J=-\frac{1}{T}\sum_{t=1}^T\sum_{j=1}^K y_j^{<t>}\log \hat y_j^{<t>} $$

Summary

For RNNs the loss function is just an average through time!
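A minimal numpy sketch of this loss on made-up one-hot targets y and predictions y_hat of shape (T, K):

import numpy as np

y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)      # one-hot targets over T=3 steps, K=3 classes
y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]])         # predicted probabilities

T = y.shape[0]
J = -np.sum(y * np.log(y_hat)) / T          # -(1/T) * sum_t sum_j y_j^<t> log y_hat_j^<t>
print(J)                                    # average cross entropy over the 3 time steps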

Implementation notes

image-20220224221223781

Frameworks like TensorFlow need this type of abstraction
Parallel computations and GPU usage

Summary

  • Frameworks require abstractions
  • tf.scan() mimics RNNs
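A minimal sketch of tf.scan(): it applies a step function along the time axis while carrying an accumulator forward, which is exactly the abstraction an RNN loop needs (the step function below is a made-up toy update, not a real RNN cell):

import tensorflow as tf

def step(h_prev, x_t):
    # toy update: new state is a weighted mix of old state and current input
    return 0.5 * h_prev + 0.5 * x_t

xs = tf.constant([[1.0], [2.0], [3.0]])    # 3 time steps, 1 feature each
h0 = tf.zeros([1])                         # initial "hidden state"
hs = tf.scan(step, xs, initializer=h0)     # state after every time step
print(hs.numpy())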

Gated recurrent units (GRU)

Outline

  • Gated recurrent unit(GRU) structure
  • Comparison between GRUs and vanilla RNNs

A GRU keeps relevant information about the subject, e.g. remembering that "they" refers back to the plural "ants".

image-20220224221726031
Relevance and update gates to remember important prior information
\[\Gamma_r: \text{relevance gate}\\ \Gamma_u: \text{update gate} \]

These gates apply a sigmoid, squashing their values to between 0 and 1.

image-20220224222714841

Vanilla RNN vs GRUs

  • RNN: in longer sequences, information from the beginning is lost (vanishing gradients)

    image-20220224222912219
  • GRUs: longer computation time and more memory use. The update and relevance gates determine which information from the previous hidden state is relevant and which information should be updated; the hidden state candidate h' stores information that may overwrite what was passed on from the previous hidden state; the current hidden state is computed by updating the information from the previous hidden state; ŷ is always computed from the updated hidden state

image-20220224223905425

These computations let the network learn which information to keep and when to overwrite it.
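A minimal numpy sketch of one GRU step using the standard gate equations (weight shapes and values are made-up toy choices):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(1)
d_h, d_x = 4, 3
W_u, W_r, W_h = (np.random.randn(d_h, d_h + d_x) for _ in range(3))
b_u = b_r = b_h = np.zeros(d_h)

def gru_step(h_prev, x_t):
    hx = np.concatenate([h_prev, x_t])
    gamma_u = sigmoid(W_u @ hx + b_u)            # update gate
    gamma_r = sigmoid(W_r @ hx + b_r)            # relevance gate
    h_cand = np.tanh(W_h @ np.concatenate([gamma_r * h_prev, x_t]) + b_h)  # candidate h'
    return (1 - gamma_u) * h_prev + gamma_u * h_cand   # keep vs. overwrite

h = np.zeros(d_h)
for x_t in np.random.randn(5, d_x):
    h = gru_step(h, x_t)
print(h.shape)   # (4,)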

Summary

  • GRUs "decide" how to update the hidden state

  • GRUs help preserve important information

    A GRU is a simplified version of an LSTM.

Deep and Bi-directional RNNs

Outline

  • How bidirectional RNNs propagate information
  • Forward propagation in deep RNNs

image-20220225151714944

Bi-directional

image-20220225151902294
image-20220225152045309

Deep RNNs

image-20220225152352717

Multiple RNNs stacked together.

image-20220225152543260

Summary

  • In bidirectional RNNs, the outputs take information from the past and the future (see the sketch below)
  • Deep RNNs have more than one layer, which helps in complex tasks
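A minimal numpy sketch of the bidirectional idea (toy sizes and random weights): one vanilla RNN pass reads the sequence left-to-right, another reads it right-to-left, and each position's features are the two hidden states concatenated, so they carry both past and future context.

import numpy as np

np.random.seed(2)
d_h, d_x, T = 4, 3, 5
W_hh_f, W_hh_b = np.random.randn(d_h, d_h), np.random.randn(d_h, d_h)
W_hx_f, W_hx_b = np.random.randn(d_h, d_x), np.random.randn(d_h, d_x)
xs = np.random.randn(T, d_x)

def run(xs, W_hh, W_hx):
    h, states = np.zeros(d_h), []
    for x_t in xs:
        h = np.tanh(W_hh @ h + W_hx @ x_t)
        states.append(h)
    return np.stack(states)

h_forward = run(xs, W_hh_f, W_hx_f)                     # reads t = 1..T
h_backward = run(xs[::-1], W_hh_b, W_hx_b)[::-1]        # reads t = T..1, re-aligned
h_bi = np.concatenate([h_forward, h_backward], axis=1)  # per-step features from both directions
print(h_bi.shape)                                       # (5, 8)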