1. A First Look at RNN Neural Networks

赵孝正

Last modified 2024-08-13 22:09:30


First published 2023-02-11 21:55:47

Article link: https://blog.csdn.net/weixin_46713695/article/details/128960963


1. Neural Networks and Future Intelligence

2. Reviewing Data Dimensions and Neural Networks

[Image]
A recurrent neural network (RNN) is mainly used to process sequential data; the order in which the words appear matters.

How does a recurrent neural network retain memory?

[Image]
The current sample has only 3 features: $x_1$, $x_2$, and $x_3$.

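In the figure above, at the first step the neuron reads only one word, $x_1$. Because this is the first step, there is no earlier result, so the initial state $h_0$ can be taken to be 0, and $h_1$ is then computed.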

Next, in the middle diagram, the neuron reads the second word $x_2$ and combines it with $h_1$ to compute $h_2$.

At the third step, $\text{output} = fn(w_3x_3+u_3h_2)$. The formula suggests that the closer a feature is to $x_3$, the more it influences the output at step 3.
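To see why nearer features tend to weigh more, here is a small worked expansion. It assumes, purely for illustration, an identity activation, weights $w$ and $u$ shared across steps, and $h_0 = 0$; none of these simplifications are taken from the figure.

```latex
\begin{aligned}
h_1 &= w x_1 \\
h_2 &= w x_2 + u h_1 = w x_2 + u w x_1 \\
h_3 &= w x_3 + u h_2 = w x_3 + u w x_2 + u^2 w x_1
\end{aligned}
```

With $|u| < 1$, for example $u = 0.5$, the coefficients of $x_3$, $x_2$, $x_1$ are $w$, $0.5w$, $0.25w$, so the contribution shrinks with distance. With $|u| > 1$ the opposite would happen, so the claim depends on the size of the recurrent weight.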

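Once all features of the sample have been traversed, this neuron is finished and passes its output $X^{'}$, which is the $Y$ mentioned earlier, to the next layer. As the figure above shows, the final state $h_3$ is influenced not only by the current feature $x_3$ but also by the first feature $x_1$ and the second feature $x_2$. This is what we call the memory function.
[Image]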

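The animation is as follows:
[Image]
The figure above shows a single RNN neuron that reads the sequential data of one sample step by step over time, looping until all features of the sample have been processed, and then passes its result to the next RNN layer.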
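The loop described above can be sketched in plain Python. This is only an illustration: the function name `run_neuron`, the weights `w=0.8` and `u=0.5`, the tanh activation, and the feature values are all assumptions chosen for the demo, not values from the figures.

```python
import math

def run_neuron(xs, w=0.8, u=0.5, h0=0.0):
    """Feed the features of one sample to a single recurrent neuron, one per step."""
    h = h0
    history = []
    for x in xs:
        # The new state mixes the current feature with the previous state.
        h = math.tanh(w * x + u * h)
        history.append(h)
    return history

# Two samples that differ only in their first feature (made-up values).
states_a = run_neuron([1.0, 0.5, -0.5])
states_b = run_neuron([-1.0, 0.5, -0.5])

# The final states differ, so the last step still carries information about x_1.
print(states_a[-1], states_b[-1])
```

Because each state is computed from the previous one, changing only the first feature still changes the final state; that dependence is the memory effect.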

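3. Converting Text to Word Vectors

How is traditional text converted into vectors?

If we want a neural network to handle natural language, vectorization is the first step in building that network.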

from tensorflow.keras.preprocessing.text import Tokenizer  # maps each word to an integer index
# Note: this Tokenizer is deprecated in recent TensorFlow releases; the code here targets TF 2.x.

words = ['LaoWang has a Wechat account.', 'He is not a nice person.', 'Be careful.']  # every word in these sentences is mapped to an integer
tokenizer = Tokenizer(num_words=15)  # the three sentences contain fewer than 15 distinct words (an estimate)
# num_words caps the vocabulary: only the (num_words - 1) most frequent words are kept when texts are
# converted, so with, say, 20 distinct words the rarer ones would be ignored.
tokenizer.fit_on_texts(words)
word_index = tokenizer.word_index
print(word_index, len(word_index))
# {'a': 1, 'laowang': 2, 'has': 3, 'wechat': 4, 'account': 5, 'he': 6, 'is': 7, 'not': 8, 'nice': 9, 'person': 10, 'be': 11, 'careful': 12} 12
# Convert the texts to integer sequences
sequences = tokenizer.texts_to_sequences(words)
print(sequences)
# [[2, 3, 1, 4, 5], [6, 7, 8, 1, 9, 10], [11, 12]]
# Convert the texts to a matrix: one row per sentence, 1 where a word index occurs (column 0 is unused)
one_hot_matrix = tokenizer.texts_to_matrix(words, mode='binary')

# Vectorization is the first step in building a neural network
print(tokenizer.word_index.keys())
# dict_keys(['a', 'laowang', 'has', 'wechat', 'account', 'he', 'is', 'not', 'nice', 'person', 'be', 'careful'])
print(one_hot_matrix, one_hot_matrix.shape)
# [[0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
# [0. 1. 0. 0. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0.]
# [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0.]] (3, 15)

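one_hot_matrix does not preserve word order or the contextual relationships between words. Natural language processing depends heavily on context, so without it the results suffer badly.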
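A minimal sketch of the lost-order problem, written in plain Python rather than with the Keras Tokenizer. The helper `binary_bag` and the toy vocabulary are made up for this illustration.

```python
def binary_bag(sentence, vocab):
    """Return a 0/1 vector marking which vocabulary words occur in the sentence."""
    words = set(sentence.lower().split())
    return [1 if w in words else 0 for w in vocab]

vocab = ["dog", "bites", "man"]  # made-up toy vocabulary

v1 = binary_bag("dog bites man", vocab)
v2 = binary_bag("man bites dog", vocab)

# Both sentences get the same vector although they mean different things.
print(v1, v2, v1 == v2)
```

The two sentences have opposite meanings, yet a binary matrix like the one above cannot tell them apart.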

This leads to the following problems:
[Image]

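4. Fun with Word Embeddings

Because of the problems with one-hot encoding, we first apply word embedding and only then feed the data into the neural network for training.

Word embedding maps a high-dimensional, sparse vector into a low-dimensional, dense vector space. It can also be understood as a form of compression.

During compression, the relationships between words are preserved as far as possible.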

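The figure below shows 6 movies. One-Hot encoding, as before, produces a result like this, which we then reduce to 2 dimensions.
[Image]
After the reduction to 2 dimensions, and going by our understanding of the definitions, Operation Red Sea (《红海行动》) leans patriotic, The Shawshank Redemption (《肖申克救赎》) leans inspirational, and The Climbers (《攀登者》) is both. If the data can be turned into the form shown in the figure below, the dimensionality drops and the relationships between the movies also become visible. This is word embedding.
[Image]
However, the model does not know what 【inspirational】 and 【patriotic】 mean; these two labels are made up to aid understanding. If we tell the model to compress all these words into a 2-dimensional space, it keeps training until it finds suitable dimensions on its own. The idea resembles clustering in machine learning, and that analogy can help.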

After word embedding, we get the following result:
[Image]
In the figure above, the larger the decimal value, the more pronounced that feature is.

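The first sentence has 8 words in total and the second has 9. Add the two counts and subtract twice the number of words they share (2*8) to get a measure of how similar they are: 8 + 9 - 2*8 = 1.

The smaller the value, the closer the two sentences are.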
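The counting rule above can be sketched as a small function. The name `sentence_distance` and both example inputs are made up; treating each sentence as a set of distinct words is also an assumption, since the source figure is not available.

```python
def sentence_distance(words_a, words_b):
    """len(A) + len(B) - 2 * shared, counting distinct words only."""
    set_a, set_b = set(words_a), set(words_b)
    shared = len(set_a & set_b)
    return len(set_a) + len(set_b) - 2 * shared

# Placeholder words reproducing the counts in the text: 8 words, 9 words, 8 shared.
first = [f"w{i}" for i in range(8)]
second = first + ["extra"]
print(sentence_distance(first, second))  # 8 + 9 - 2 * 8 = 1

# A made-up pair of real sentences: 5 words, 6 words, 4 shared.
a = "i like watching war movies".split()
b = "i like watching inspiring movies too".split()
print(sentence_distance(a, b))  # 5 + 6 - 2 * 4 = 3
```

Identical sentences score 0, and the score grows with every word that appears in only one of the two.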

In the figure below, after 【王力宏】 is typed in, the keyword matching is based on n-grams.
[Image]
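For readers unfamiliar with the term, an n-gram is a window of n consecutive tokens. The sketch below only shows how such windows are extracted; the helper name `ngrams` is made up, and how a search box actually uses them for matching is not covered by the source.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# One of the example sentences used earlier in this post, lowercased.
tokens = "he is not a nice person".split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)
```

A sentence of 6 tokens yields 5 bigrams and 4 trigrams; asking for windows longer than the sentence returns an empty list.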

Embedding implementation in Keras

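import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding  # word embedding layer

model = Sequential()
# output_dim=2 would correspond to the movie example above
model.add(Embedding(input_dim=7, output_dim=3))  # vocabulary size, embedding dimension
model.compile()  # all arguments left at their defaults
# The more two sentences have in common, the more similar they are; the parts that differ are then likely synonyms or near-synonyms.
# Two sentences of 7 tokens each. Note that Embedding expects integer token indices
# (0 <= index < input_dim), not one-hot vectors, so the 0s and 1s below are looked up as token ids 0 and 1.
x = np.array([[0, 1, 0, 1, 1, 0, 0],
              [1, 1, 1, 1, 1, 1, 1]])
print('input shape data: \n', x, x.shape)
result = model.predict(x)  # no training: x is simply looked up in the randomly initialized embedding matrix
print('Embedding:', result, 'shape:', result.shape)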

input shape data: 
 [[0 1 0 1 1 0 0]
 [1 1 1 1 1 1 1]] (2, 7)
1/1 [==============================] - 1s 1s/step

Embedding result:
Embedding: [[[ 0.03419163  0.01381184 -0.03241118]
  [-0.0072153  -0.00486064  0.04702257]
  [ 0.03419163  0.01381184 -0.03241118]
  [-0.0072153  -0.00486064  0.04702257]
  [-0.0072153  -0.00486064  0.04702257]
  [ 0.03419163  0.01381184 -0.03241118]
  [ 0.03419163  0.01381184 -0.03241118]]

 [[-0.0072153  -0.00486064  0.04702257]
  [-0.0072153  -0.00486064  0.04702257]
  [-0.0072153  -0.00486064  0.04702257]
  [-0.0072153  -0.00486064  0.04702257]
  [-0.0072153  -0.00486064  0.04702257]
  [-0.0072153  -0.00486064  0.04702257]
  [-0.0072153  -0.00486064  0.04702257]]] shape: (2, 7, 3)
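The output contains only two distinct rows because every 0 and every 1 in `x` is looked up as a token id. The NumPy sketch below mimics that lookup and shows it equals multiplying a one-hot encoding by the embedding matrix. The random `table` is a stand-in for the layer's weights, not the actual Keras values, and its value range is an assumption chosen to resemble the numbers above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the weights of Embedding(input_dim=7, output_dim=3).
table = rng.uniform(-0.05, 0.05, size=(7, 3))

x = np.array([[0, 1, 0, 1, 1, 0, 0],
              [1, 1, 1, 1, 1, 1, 1]])

# Embedding lookup: each integer selects one row of the table.
looked_up = table[x]          # shape (2, 7, 3)

# Equivalent view: one-hot encode each index, then multiply by the table.
one_hot = np.eye(7)[x]        # shape (2, 7, 7)
via_matmul = one_hot @ table  # shape (2, 7, 3)

print(looked_up.shape, np.allclose(looked_up, via_matmul))
```

Rows 2 to 6 of `table` are never used here, because `x` contains only the ids 0 and 1.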

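5. How RNNs Work

An RNN places requirements on the timing and sequence of the data it receives; the order is very important.

So at each step an RNN neuron does not, as a deep (fully connected) network does, multiply all of $\pmb{x}$ by the weights $\pmb{w}$ and sum everything at once. Instead it processes one $\pmb{x_i}$ at a time. The bias is omitted in the figure below.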
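[Image]
Besides the current feature (e.g. $\pmb{x_1}$) and its own weight (e.g. $\pmb{w_1}$), the result of the previous step (e.g. $\pmb{h_0}$) is also brought into the computation. In other words, the current computation is affected by the previous result; an activation function is applied afterwards.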

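As time goes on, only after every feature $x_i$ of the current sample $\pmb{x}$ has passed through the neuron in turn is this sample fully processed.
[Image]
Placing several of the neurons from the animation above side by side forms one layer of the neural network.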
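A NumPy sketch of what several neurons side by side amounts to: the scalar weights become matrices, and all units are updated together at each step. The function name `rnn_layer_forward`, the sizes, the tanh activation, and the random weights are assumptions made for this illustration.

```python
import numpy as np

def rnn_layer_forward(xs, W, U, b, h0):
    """Run a layer of recurrent units over a sequence; xs has shape (steps, input_dim)."""
    h = h0
    for x_t in xs:
        # Every unit sees the current input and the previous state of all units.
        h = np.tanh(x_t @ W + h @ U + b)
    return h

steps, input_dim, units = 3, 1, 4          # 3 features read one per step, 4 neurons in the layer
rng = np.random.default_rng(0)
xs = rng.normal(size=(steps, input_dim))   # made-up sample
W = rng.normal(size=(input_dim, units))    # input-to-hidden weights
U = rng.normal(size=(units, units))        # hidden-to-hidden weights
b = np.zeros(units)

h_last = rnn_layer_forward(xs, W, U, b, h0=np.zeros(units))
print(h_last.shape)  # (4,)
```

The final state has one value per neuron, and this vector is what the layer would pass on to the next layer.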

In this part, we first declare a neural network and implement it in code.

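import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import SimpleRNNCell  # a single computation step at one point in time

# Input at time step t (randint(2, 3) can only return 2)
xt = tf.Variable(np.random.randint(2, 3, size=[1, 1]), dtype=tf.float32)
print(xt)
# https://www.cnblogs.com/Renyi-Fan/p/13722276.html

# units: positive integer, dimensionality of the output space, i.e. the number of hidden units.
# activation: activation function; the default is tanh.
# use_bias: Boolean, whether to use a bias vector.
# kernel_initializer: initializer for the input-to-hidden weights; default
# 'glorot_uniform'
# recurrent_initializer: initializer for the hidden-to-hidden weights; default
# 'orthogonal'
# bias_initializer: initializer for the bias vector.
cell = SimpleRNNCell(units=1, activation=None, use_bias=True, kernel_initializer='ones', recurrent_initializer='ones',
                     bias_initializer=tf.keras.initializers.Constant(value=3))
cell.build(input_shape=[None, 1])
print('variables', cell.variables)
print('config:', cell.get_config())

# tanh saturates at -1 and 1
print(tf.nn.tanh(tf.constant([-float("inf"), 6, float("inf")])))

# Step at time t
ht_1 = tf.ones([1, 1])  # previous state h_{t-1}
# out = activation(xt * wt + ht_1 * ut + bias) = 2 * 1 + 1 * 1 + 3 = 6
out, ht = cell(xt, ht_1)  # for SimpleRNNCell (unlike LSTM) the output and the new state hold the same value
print(out, ht[0])
# The state was passed as a bare tensor, so ht[0] slices its first row into a new tensor:
# the values match but the ids differ. Pass [ht_1] instead to get the state back as a list.
print(id(out), id(ht[0]))

# Step at time t+1
# A real RNN layer reuses the same cell (same weights) at every step; a second cell with
# different weights is used here only so the arithmetic is easy to tell apart.
cell2 = SimpleRNNCell(units=1, activation=None, use_bias=True, kernel_initializer='ones',
                      recurrent_initializer=tf.keras.initializers.Constant(value=3), bias_initializer='ones')
xt2 = tf.Variable(np.random.randint(3, 4, size=[1, 1]), dtype=tf.float32)
# out2 = activation(xt2 * wt + ht * ut + bias) = 3 * 1 + 6 * 3 + 1 = 22
out2, ht2 = cell2(xt2, ht)  # pass the full (1, 1) state; ht[0] has shape (1,), which does not fit the (1, 1) recurrent kernel
print(out2, ht2[0])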


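Cell: a single computation step at one point in time. For example, in the figure below, at time step 2, once the formula $h_2=fn(w_2x_2+u_2h_1)$ has been evaluated, that whole computation is one Cell.
[Image]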

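A single computation on the time axis is called a Cell. It is a very low-level concept: we have to manage the input, weights, bias, and output of every step ourselves, which makes it very helpful for understanding how an RNN works underneath.

Because it is so low-level, it also makes the timeline of the whole RNN process easy to follow.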
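The arithmetic of a single Cell can be checked by hand with a plain-Python sketch. The helper `simple_rnn_step` is made up for illustration and is not the Keras implementation; the numbers match the initializers used in the SimpleRNNCell code in this section (weights of 1 and bias 3 at time t; recurrent weight 3 and bias 1 at time t+1; no activation).

```python
def simple_rnn_step(x_t, h_prev, w, u, b, activation=lambda z: z):
    """One cell step: activation(x_t * w + h_prev * u + b)."""
    return activation(x_t * w + h_prev * u + b)

# Time t: input 2, previous state 1, kernel 1, recurrent kernel 1, bias 3.
h_t = simple_rnn_step(x_t=2.0, h_prev=1.0, w=1.0, u=1.0, b=3.0)

# Time t+1: input 3, previous state h_t, kernel 1, recurrent kernel 3, bias 1.
h_t1 = simple_rnn_step(x_t=3.0, h_prev=h_t, w=1.0, u=3.0, b=1.0)

print(h_t, h_t1)  # 6.0 22.0
```

The first value agrees with the 6 printed by the TensorFlow code at time t. The second shows how the state from time t feeds into the step at t+1.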

print(tf.Variable(np.random.randint(2, 3, size=[1, 1]), dtype=tf.float32))
<tf.Variable 'Variable:0' shape=(1, 1) dtype=float32, numpy=array([[2.]], dtype=float32)>

print(tf.Variable(np.random.randint(1, 2, size=[1, 1]), dtype=tf.float32))  # the result always has the 2-D shape (1, 1), and randint(1, 2) can only return 1
<tf.Variable 'Variable:0' shape=(1, 1) dtype=float32, numpy=array([[1.]], dtype=float32)>

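First declare the feature value. A neural network only accepts vectors, so even a single number is wrapped in a tensor of shape (1, 1):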

xt:  <tf.Variable 'Variable:0' shape=(1, 1) dtype=float32, numpy=array([[2.]], dtype=float32)>
variables:  [<tf.Variable 'kernel:0' shape=(1, 1) dtype=float32, numpy=array([[1.]], dtype=float32)>, <tf.Variable 'recurrent_kernel:0' shape=(1, 1) dtype=float32, numpy=array([[1.]], dtype=float32)>, <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=array([3.], dtype=float32)>]
config:  {'name': 'simple_rnn_cell', 'trainable': True, 'dtype': 'float32', 'units': 1, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'class_name': 'Ones', 'config': {}}, 'recurrent_initializer': {'class_name': 'Ones', 'config': {}}, 'bias_initializer': {'class_name': 'Constant', 'config': {'value': 3}}, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0}
tf.Tensor([-1.          0.99998784  1.        ], shape=(3,), dtype=float32)
================================================================
tf.Tensor([[6.]], shape=(1, 1), dtype=float32) :  tf.Tensor([6.], shape=(1,), dtype=float32)
2210932906288 :  2210932906992