A Simple Worked Example of RNN and LSTM

路过的风666

Last modified: 2022-04-01 22:48:36

First published: 2022-04-01 22:18:24

Article link: https://blog.csdn.net/m0_46246301/article/details/123909503

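Preface: For sequential data such as video, speech, and text, sequence models such as RNN and LSTM are often a good fit.

By analogy, when people watch a video, listen to speech, or read an article, they reason over a sequence of data: when the next frame or segment arrives, what came before is still in mind and shapes how the new data is interpreted. Neural networks designed along the same lines (RNN, LSTM, and so on) analyze sequential data without forgetting earlier steps while processing later ones. Once they produce more representative features, those features can feed downstream tasks such as sentiment classification of video or text, and other classification or clustering tasks.

This post briefly summarizes the basics of both models and gives a simple implementation that uses only NumPy, without calling a deep learning library.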

RNN

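Principle

The principle is fairly simple. A basic feed-forward network only has weighted connections between consecutive layers. An RNN additionally has weighted connections between the hidden layer at one time step and the hidden layer at the next. As the sequence unfolds, earlier hidden states therefore influence later ones, and the loss also accumulates along the sequence.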
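
To make the recurrence concrete, here is a minimal sketch of a textbook vanilla RNN forward pass. It is not the simplified code used later in this post: the tanh activation, the weight names `W_x`, `W_h`, `b`, and the random toy input are all assumptions chosen for illustration.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    h = np.zeros(W_h.shape[0])          # initial hidden state h_0 = 0
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 3))        # toy data: 4 time steps, 3 features
W_x = rng.normal(size=(3, 2))           # input-to-hidden weights
W_h = rng.normal(size=(2, 2))           # hidden-to-hidden (recurrent) weights
b = np.zeros(2)

states = rnn_forward(inputs, W_x, W_h, b)
print(states.shape)                     # (4, 2)
```

The same `W_h` is reused at every step, which is how information from early inputs reaches later hidden states.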

LSTM

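Principle

Because of vanishing gradients, a plain RNN retains only short-term memory. LSTM combines short-term and long-term memory through a carefully designed gating mechanism and alleviates the vanishing gradient problem to some extent. It uses three gates: the input gate, the forget gate, and the output gate.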

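Forget gate
- Acts on: the cell state
- Role: selectively forgets information held in the cell state
  Take a language model that predicts the next word from the words seen so far. The cell state may hold the category of the current subject so that the correct pronoun can be chosen. When a new subject appears, we want to forget the old one.
- Example: in "He is busy today, so I ...", on reaching "I" the network selectively forgets the earlier "he", or at least reduces its influence on later words.

Input gate
- Acts on: the cell state
- Role: selectively writes new information into the cell state
  In the language model example, we want to add the category of the new subject to the cell state, replacing the old subject that should be forgotten.
- Example: in "He is busy today, so I ...", on reaching "I" the subject "I" is written into the cell state.

Output gate
- Acts on: the hidden state h_t
- Role: decides which part of the cell state is exposed as the hidden state. In the language model example, since the network has just seen a pronoun, it may need to output information relevant to a verb. For example, it may output whether the pronoun is singular or plural, so that if a verb follows we know how it should be conjugated.
- Example: in the sentence above, on reaching "I" the network can predict that the next word is likely a verb, and in the first person.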
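
The three gates described above can be sketched as one step of a standard LSTM cell. This is the common textbook formulation, not the simplified code used later in this post; the parameter names (`W_f`, `W_i`, `W_o`, `W_c` and matching biases), the sizes, and the random toy inputs are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell (textbook formulation)."""
    z = np.concatenate([h_prev, x_t])                     # gates see h_{t-1} and x_t
    f = sigmoid(z @ params["W_f"] + params["b_f"])        # forget gate
    i = sigmoid(z @ params["W_i"] + params["b_i"])        # input gate
    o = sigmoid(z @ params["W_o"] + params["b_o"])        # output gate
    c_tilde = np.tanh(z @ params["W_c"] + params["b_c"])  # candidate cell content
    c = f * c_prev + i * c_tilde                          # forget old, write new
    h = o * np.tanh(c)                                    # expose part of the cell
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 2, 3                                        # illustrative sizes
params = {}
for name in "fioc":
    params["W_" + name] = rng.normal(size=(n_hid + n_in, n_hid))
    params["b_" + name] = np.zeros(n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):                    # toy sequence of 5 steps
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)                                   # (3,) (3,)
```

The cell state `c` is only scaled by `f` and added to, which is the path that lets information survive over many steps.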

Reference: https://blog.csdn.net/qq_16792139/article/details/115530197

A simple worked example of RNN and LSTM

Encode each word with one-hot encoding

import numpy as np

sentence = 'white blood cells destroying an infection'
words = sentence.split(' ')
# One row of the identity matrix per word
X = np.eye(len(words))

for i in range(len(words)):
    print(words[i], 'one-hot encoding:', X[i])

----------------------------------------------------------------------------
Output:
white one-hot encoding: [1. 0. 0. 0. 0. 0.]
blood one-hot encoding: [0. 1. 0. 0. 0. 0.]
cells one-hot encoding: [0. 0. 1. 0. 0. 0.]
destroying one-hot encoding: [0. 0. 0. 1. 0. 0.]
an one-hot encoding: [0. 0. 0. 0. 1. 0.]
infection one-hot encoding: [0. 0. 0. 0. 0. 1.]
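
The identity-matrix trick above works because every word in the sentence is distinct. As a hedged sketch of the more general case, the helper below builds a vocabulary first so that a repeated word maps to the same vector. The function name `one_hot_encode` and the example sentence are hypothetical, introduced only for illustration.

```python
import numpy as np

def one_hot_encode(words):
    """Return one one-hot row per word, sharing rows between repeated words."""
    vocab = {}
    for w in words:
        vocab.setdefault(w, len(vocab))      # first occurrence fixes the index
    indices = np.array([vocab[w] for w in words])
    return np.eye(len(vocab))[indices], vocab

# Hypothetical sentence with a repeated word ("the")
X_demo, vocab = one_hot_encode("the cells and the infection".split(" "))
print(X_demo.shape)                          # (5, 4)
```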

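Randomly initialize a projection matrix W and linearly map each word to a low-dimensional vector. Write out each word's low-dimensional vector and the parameters of W (dimension 2 is used as an example).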

# No random seed is set, so the values below differ from run to run
W = np.random.uniform(0, 1, (6, 2))
low_X = X.dot(W)
for i in range(len(words)):
    print(words[i], 'low-dimension encoding:', low_X[i])

----------------------------------------------------------------------------
Output:
white low-dimension encoding: [0.22358387 0.45554808]
blood low-dimension encoding: [0.26184663 0.84112237]
cells low-dimension encoding: [0.81084067 0.9949041 ]
destroying low-dimension encoding: [0.44082166 0.33534952]
an low-dimension encoding: [0.26281604 0.37646656]
infection low-dimension encoding: [0.09828981 0.4961023 ]
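
Multiplying a one-hot row by W simply selects one row of W, which is why the low-dimensional vectors printed above are exactly the rows of W. The short check below illustrates this with a separately seeded matrix; the names `W_demo` and `X_demo` are assumptions used to avoid touching the variables above.

```python
import numpy as np

rng = np.random.default_rng(0)
W_demo = rng.uniform(0, 1, (6, 2))     # stand-in for the projection matrix
X_demo = np.eye(6)                     # one-hot rows for 6 distinct words

by_matmul = X_demo.dot(W_demo)         # projection via matrix multiplication
by_lookup = W_demo[np.arange(6)]       # projection via row lookup

print(np.allclose(by_matmul, by_lookup))        # True
print(np.allclose(X_demo[2] @ W_demo, W_demo[2]))  # True
```

This is the reason embedding layers are usually implemented as table lookups rather than matrix products.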

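Feed each prefix of the sentence in turn, for example (white), (white, blood), (white, blood, cells), ..., through the RNN or LSTM model and take the output vector for that sequence.
Get the vector of the target word (the word that follows each input sequence), write out its low-dimensional vector, and construct and write out the loss function.
For example, $L = \log(\mathrm{sigmoid}(v(\text{white}) \cdot v(\text{blood})^T))$
For example, $L = \log(\mathrm{sigmoid}(v(\text{white, blood}) \cdot v(\text{cells})^T))$

Using gradient descent, update the parameters of the projection matrix W. The update rule is W = W - 0.01 * (1 - L); write out the updated W. Note that this rule subtracts the same scalar from every entry of W; it is a simplified rule, not a gradient computed by backpropagation.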
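
As a small sketch of the loss and update rule stated above, the helpers below compute L for one pair of vectors and apply W = W - 0.01 * (1 - L). The function names `pair_loss` and `update_rule` and the toy vectors are hypothetical, chosen so the result is easy to check by hand.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pair_loss(v_seq, v_next):
    """L = log(sigmoid(v_seq . v_next)) for a sequence vector and a target vector."""
    return np.log(sigmoid(np.dot(v_seq, v_next)))

def update_rule(W, L, lr=0.01):
    """Apply W - lr * (1 - L); the scalar is broadcast to every entry of W."""
    return W - lr * (1 - L)

# Toy orthogonal vectors: the dot product is 0, so L = log(0.5)
v_seq = np.array([1.0, 0.0])
v_next = np.array([0.0, 1.0])
L = pair_loss(v_seq, v_next)
print(round(float(L), 4))              # -0.6931

W_demo = np.ones((6, 2))
W_new_demo = update_rule(W_demo, L)
# Every entry moves by the same amount, 0.01 * (1 - L)
print(np.allclose(W_demo - W_new_demo, 0.01 * (1 - L)))   # True
```

Since L is the log of a value in (0, 1), it is always negative, so 1 - L is greater than 1 and this rule always decreases every entry of W.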

[RNN version]

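from math import log
import numpy as np

# Reuses W and low_X from the snippets above
sentence = 'white blood cells destroying an infection'
words = sentence.split(' ')
# Initialize parameters
lr = 0.01
W1 = np.ones((2, 2))
W2 = np.ones((2, 2))

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

for i in range(len(words) - 1):
    store = np.zeros(2)  # hidden state starts at zero
    # Forward pass over the prefix words[0..i]
    for j in range(i + 1):
        mid = low_X[j] @ W1 + store  # hidden state update (no activation, previous state added directly)
        store = mid
        y = mid @ W2                 # output layer

    # Compute the loss against the next word's vector
    next_word = low_X[i + 1]
    loss = log(sigmoid(np.dot(y, next_word)))

    # Apply the update rule; each W_new is computed from the initial W,
    # so updates do not accumulate across prefixes
    W_new = W - lr * (1 - loss)

    sequence = ' '.join(words[:i + 1]) + ' '
    print("input sequence is '{}',next_word is {}".format(sequence, next_word))
    print("W_new is {}".format(W_new))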

Output:
input sequence is 'white ',next_word is [0.26184663 0.84112237]
W_new is [[0.21156631 0.44353052]
 [0.24982907 0.82910481]
 [0.79882311 0.98288654]
 [0.4288041  0.32333196]
 [0.25079848 0.364449  ]
 [0.08627225 0.48408474]]
input sequence is 'white blood ',next_word is [0.81084067 0.9949041 ]
W_new is [[0.21356786 0.44553206]
 [0.25183061 0.83110635]
 [0.80082466 0.98488808]
 [0.43080565 0.32533351]
 [0.25280002 0.36645054]
 [0.08827379 0.48608629]]
input sequence is 'white blood cells ',next_word is [0.44082166 0.33534952]
W_new is [[0.21354583 0.44551003]
 [0.25180858 0.83108432]
 [0.80080262 0.98486605]
 [0.43078361 0.32531148]
 [0.25277799 0.36642851]
 [0.08825176 0.48606426]]
input sequence is 'white blood cells destroying ',next_word is [0.26281604 0.37646656]
W_new is [[0.21354621 0.44551041]
 [0.25180896 0.8310847 ]
 [0.80080301 0.98486643]
 [0.430784   0.32531186]
 [0.25277837 0.36642889]
 [0.08825214 0.48606464]]
input sequence is 'white blood cells destroying an ',next_word is [0.09828981 0.4961023 ]
W_new is [[0.21355779 0.445522  ]
 [0.25182055 0.83109629]
 [0.80081459 0.98487802]
 [0.43079558 0.32532344]
 [0.25278995 0.36644047]
 [0.08826373 0.48607622]]

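[LSTM version]

Building on the RNN version, this adds three gates (input gate, forget gate, output gate). In this simplified version each gate depends only on the current input and is rounded to 0 or 1.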

import math
from math import log
import numpy as np

# Reuses W and low_X from the snippets above

def sigmoid_function(z):
    # Element-wise sigmoid of a 1-D array
    return np.array([1 / (1 + math.exp(-v)) for v in z])

def sigmoid_function_2(z):
    # Sigmoid of a single scalar
    return 1 / (1 + math.exp(-z))

def round_ls(ls):
    # Round each gate activation to 0 or 1 (a hard gate)
    return np.array([round(v) for v in ls])

sentence = 'white blood cells destroying an infection'
words = sentence.split(' ')
# Initialize parameters
lr = 0.01
W1 = np.random.rand(2, 2)   # candidate input
bias1 = np.random.rand(2)
W2 = np.random.rand(2, 2)   # input gate
bias2 = np.random.rand(2)
W3 = np.random.rand(2, 2)   # forget gate
bias3 = np.random.rand(2)
W4 = np.random.rand(2, 2)   # output gate
bias4 = np.random.rand(2)
# Forward pass
for i in range(len(low_X) - 1):
    c = np.zeros(2)  # cell state starts as the zero vector
    for j in range(i + 1):
        input_v = low_X[j] @ W1 + bias1
        input_gate = round_ls(sigmoid_function(low_X[j] @ W2 + bias2))
        forget_gate = round_ls(sigmoid_function(low_X[j] @ W3 + bias3))
        output_gate = round_ls(sigmoid_function(low_X[j] @ W4 + bias4))
        c = input_v * input_gate + forget_gate * c   # write new, keep or forget old
        y = sigmoid_function(c) * output_gate        # gated output

    # Compute the loss and apply the update rule to the projection matrix W
    next_word = low_X[i + 1]
    loss = log(sigmoid_function_2(np.dot(y, next_word)))
    W_new = W - lr * (1 - loss)

    sequence = ' '.join(words[:i + 1]) + ' '
    print("input sequence is '{}',next_word is {}".format(sequence, next_word))
    print("W_new is {}".format(W_new))

Output:
input sequence is 'white ',next_word is [0.26184663 0.84112237]
W_new is [[0.20973448 0.44169869]
 [0.24799723 0.82727298]
 [0.79699128 0.9810547 ]
 [0.42697227 0.32150013]
 [0.24896664 0.36261716]
 [0.08444042 0.48225291]]
input sequence is 'white blood ',next_word is [0.81084067 0.9949041 ]
W_new is [[0.21150985 0.44347405]
 [0.2497726  0.82904834]
 [0.79876665 0.98283007]
 [0.42874764 0.3232755 ]
 [0.25074201 0.36439253]
 [0.08621578 0.48402828]]
input sequence is 'white blood cells ',next_word is [0.44082166 0.33534952]
W_new is [[0.20960578 0.44156999]
 [0.24786854 0.82714428]
 [0.79686258 0.98092601]
 [0.42684357 0.32137143]
 [0.24883794 0.36248846]
 [0.08431172 0.48212421]]
input sequence is 'white blood cells destroying ',next_word is [0.26281604 0.37646656]
W_new is [[0.20926757 0.44123177]
 [0.24753032 0.82680606]
 [0.79652437 0.98058779]
 [0.42650536 0.32103322]
 [0.24849973 0.36215025]
 [0.0839735  0.481786  ]]
input sequence is 'white blood cells destroying an ',next_word is [0.09828981 0.4961023 ]
W_new is [[0.20916201 0.44112621]
 [0.24742476 0.8267005 ]
 [0.79641881 0.98048223]
 [0.4263998  0.32092766]
 [0.24839417 0.36204469]
 [0.08386794 0.48168044]]
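
The LSTM version above rounds every gate to 0 or 1, whereas a standard LSTM keeps the gate as a value between 0 and 1. The sketch below contrasts the two on a few hypothetical pre-activation values. Because the weights, biases, and word vectors in this post are all non-negative, the pre-activations there are positive and the rounded gates come out as 1 in practice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical gate pre-activations, two negative and two positive
pre_activations = np.array([-2.0, -0.1, 0.1, 2.0])
soft_gate = sigmoid(pre_activations)   # values strictly between 0 and 1
hard_gate = np.round(soft_gate)        # 0 or 1, as in the code above

c_prev = np.ones(4)                    # a cell state of all ones
print(hard_gate * c_prev)              # [0. 0. 1. 1.]
print(soft_gate * c_prev)              # partial retention instead of all-or-nothing
```

A soft gate is differentiable, which matters once the gate weights are trained by backpropagation; the hard gate has zero gradient almost everywhere.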

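How RNN and LSTM differ from traditional methods

They have more parameters
They are suited to settings with large amounts of data
Their output features can be passed to a classifier such as LR or SVM to complete a downstream task (sentiment classification, document classification)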
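
As a minimal sketch of the last point, the code below trains a logistic regression classifier with plain NumPy on fixed-length sequence features. The feature values and labels are made-up stand-ins for vectors that an RNN or LSTM would produce, and the function name `train_logistic_regression` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(features @ w + b)            # predicted probabilities
        grad = p - labels                        # gradient of log loss w.r.t. logits
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

# Made-up sequence features (one row per sequence) and binary labels
features = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])
labels = np.array([1.0, 1.0, 0.0, 0.0])

w, b = train_logistic_regression(features, labels)
pred = (sigmoid(features @ w + b) > 0.5).astype(int)
print(pred)                                      # [1 1 0 0]
```

In a real pipeline the rows of `features` would be the final output vector of the sequence model for each document or clip.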
