Walkthrough of named entity recognition code based on LSTM/BLSTM/CNNBLSTM (Part 3)

Article link: https://blog.csdn.net/xiaopihaierletian/article/details/108605720

The program consists of:

main.py: the main program

model.py: the neural network model definitions

pretreatment.py: the data preprocessing program

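2. model.py

This file defines one parent class and three subclasses, one for each network: LSTM, BLSTM, and CNNBLSTM.

Each part is described in turn.

The parent class was covered in the previous post; this post walks through the code of the three networks.

2) LSTM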
 
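import tensorflow as tf  # TensorFlow 1.x; tf.contrib does not exist in 2.x

class LSTM_NER(neural_tagger):  # neural_tagger is the parent class from model.py
    def __str__(self):
        return 'LSTM-CRF NER'

    def build(self):
        with tf.name_scope('weights'):
            # Output projection: hidden state -> one score per tag class
            self.W = tf.get_variable(shape=[self.hidden_dim,self.nb_classes],initializer=tf.truncated_normal_initializer(stddev=0.01),name='weights')
            # A single forward (unidirectional) LSTM cell
            self.lstm_fw = tf.contrib.rnn.LSTMCell(self.hidden_dim)
        with tf.name_scope('biases'):
            self.b= tf.Variable(tf.zeros([self.nb_classes],name='bias'))
        return

    def inference(self,X,X_len,reuse=None):
        # Look up the embedding of every feature ID, then apply dropout
        word_vectors = tf.nn.embedding_lookup(self.emb_matrix,X)
        word_vectors = tf.nn.dropout(word_vectors,keep_prob=self.keep_prob)
        # Concatenate the template features of each time step into one vector
        word_vectors = tf.reshape(word_vectors,[-1,self.time_steps,self.templates*self.emb_dim])
        with tf.variable_scope('label_inference',reuse=reuse):
            # Run the LSTM; X_len holds the true (unpadded) length of each sentence
            outputs,_ = tf.nn.dynamic_rnn(
                self.lstm_fw,
                word_vectors,
                dtype=tf.float32,
                sequence_length=X_len
            )
            # Flatten batch and time so one matmul scores every step
            outputs = tf.reshape(outputs,[-1,self.hidden_dim])
        with tf.name_scope('linear_transform'):
            scores = tf.matmul(outputs,self.W)+self.b
            # Restore the (batch, time_steps, nb_classes) layout
            scores = tf.reshape(scores,[-1,self.time_steps,self.nb_classes])
        return scores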

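build() first defines the weight matrix and the bias; the matrix size depends on the number of hidden units (hidden_dim) and the number of output classes (nb_classes). inference() then wires up the network and reshapes the tensors along the way.

(1) class LSTM_NER(neural_tagger): the subclass inherits from the parent class.

(2) self.lstm_fw = tf.contrib.rnn.LSTMCell(self.hidden_dim): self.hidden_dim is an int giving the number of units in the LSTM cell, i.e. the number of hidden-layer nodes. It is a hyperparameter you set yourself.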

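tf.contrib.rnn.LSTMCell(num_units, use_peepholes=False, cell_clip=None, initializer=None, num_proj=None, proj_clip=None, num_unit_shards=None, num_proj_shards=None, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None)

Meaning of each parameter:

num_units: int, the number of units in the LSTM cell

use_peepholes: bool, set to True to enable diagonal/peephole connections

cell_clip: (optional) float; if provided, the cell state is clipped to this value before the cell output activation

initializer: (optional) the initializer used for the weight and projection matrices

num_proj: (optional) int, the output dimensionality of the projection matrix. If None, no projection is performed

proj_clip: (optional) float. If num_proj > 0 and proj_clip is provided, the projected values are clipped elementwise to [-proj_clip, proj_clip]

num_unit_shards: deprecated, to be removed by January 2017. Use a variable_scope partitioner instead

num_proj_shards: deprecated, to be removed by January 2017. Use a variable_scope partitioner instead

forget_bias: the forget gate bias is initialized to 1 by default, to reduce the amount of forgetting at the start of training. It must be set to 0.0 manually when restoring from a checkpoint trained with CudnnLSTM

state_is_tuple: if True, accepted and returned states are 2-tuples of c_state and m_state. If False, the two are concatenated along the column axis. The latter behavior will soon be deprecated

activation: the activation function of the inner states. Default: tanh

reuse: (optional) a Python boolean describing whether to reuse variables in an existing scope. If it is not True and the existing scope already contains the given variables, an error is raised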

（https://blog.csdn.net/qq_32458499/article/details/78874625）
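
To make the roles of forget_bias, cell_clip and activation more concrete, here is a plain-Python sketch of one step of a single-unit LSTM without peepholes or projection. It is only an illustration, not TensorFlow's implementation: the function name lstm_step and the toy weights are made up for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w, forget_bias=1.0, cell_clip=None):
    """One step of a single-unit LSTM (no peepholes, no projection).

    `w` maps each gate name to (w_x, w_h, bias). These toy weights are
    hypothetical and exist only for the illustration.
    """
    pre = {name: wx * x + wh * h_prev + b for name, (wx, wh, b) in w.items()}
    i = sigmoid(pre["i"])                   # input gate
    f = sigmoid(pre["f"] + forget_bias)     # forget gate, shifted by forget_bias
    o = sigmoid(pre["o"])                   # output gate
    g = math.tanh(pre["g"])                 # candidate value (activation = tanh)
    c = f * c_prev + i * g                  # new cell state
    if cell_clip is not None:               # clip the cell state before the output
        c = max(-cell_clip, min(cell_clip, c))
    h = o * math.tanh(c)                    # new hidden state
    return h, c


weights = {name: (0.5, 0.5, 0.0) for name in ("i", "f", "o", "g")}
h1, c1 = lstm_step(1.0, 0.0, 2.0, weights, forget_bias=1.0)
h0, c0 = lstm_step(1.0, 0.0, 2.0, weights, forget_bias=0.0)
h2, c2 = lstm_step(1.0, 0.0, 2.0, weights, forget_bias=1.0, cell_clip=1.0)
print(c0, c1, c2)
```

With forget_bias=1.0 the forget gate starts out more open, so more of the previous cell state (2.0 here) survives the step than with forget_bias=0.0; with cell_clip=1.0 the new cell state is capped at 1.0.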

(3) tf.nn.embedding_lookup(self.emb_matrix,X)

Each ID in X (one per character) is replaced by the vector stored in the row of self.emb_matrix with that index.
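
The lookup can be pictured with ordinary nested lists. The sketch below is a plain-Python stand-in, not the TensorFlow op; the tiny embedding table and the IDs are invented for the illustration.

```python
def embedding_lookup(emb_matrix, ids):
    """Replace every integer ID in a (possibly nested) list by its embedding row."""
    if isinstance(ids, int):
        return emb_matrix[ids]
    return [embedding_lookup(emb_matrix, item) for item in ids]


# Hypothetical vocabulary of 4 entries with 3-dimensional vectors.
emb_matrix = [
    [0.0, 0.0, 0.0],   # ID 0
    [0.1, 0.2, 0.3],   # ID 1
    [0.4, 0.5, 0.6],   # ID 2
    [0.7, 0.8, 0.9],   # ID 3
]

# One sentence, 2 time steps, 2 template features per step: shape (1, 2, 2).
X = [[[1, 3], [2, 0]]]
word_vectors = embedding_lookup(emb_matrix, X)   # shape (1, 2, 2, 3)
print(word_vectors[0][0][1])                     # the row for ID 3
```

The result has the shape of X with one extra trailing dimension of size emb_dim, which is why a (?, 175, 5) input becomes (?, 175, 5, 100) in the model.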

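(4) word_vectors = tf.nn.dropout(word_vectors,keep_prob=self.keep_prob)

tf.nn.dropout is the TensorFlow function used to prevent or reduce overfitting. It is typically applied to fully connected layers; here it is applied to the embedded input.

Dropout randomly discards a subset of neurons on each training step. A neuron's activation is switched off with some probability p: for that step it takes no part in the network's computation and its weights are not updated. The weights are kept, however (they are just not updated for now), because the neuron may be active again for the next sample.

tf.nn.dropout(x, keep_prob, noise_shape=None, seed=None,name=None)

x: the input tensor

keep_prob: the probability that each element is kept. It is usually declared as a placeholder, keep_prob = tf.placeholder(tf.float32), and TensorFlow is given the concrete value when the graph is run

name: a name for the operation

Output: a tensor with the same shape as x

In other words, TensorFlow's dropout sets some elements of the input tensor to 0 and scales the remaining elements to 1/keep_prob times their original value, so the expected sum is unchanged.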
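
The zero-or-rescale behavior is easy to check in plain Python. This is a minimal sketch of the idea on a flat list, not TensorFlow's implementation; the helper name dropout and the fixed seed are choices made for the example.

```python
import random


def dropout(x, keep_prob, seed=None):
    """Keep each element with probability keep_prob and scale it by
    1 / keep_prob; every other element becomes 0."""
    rng = random.Random(seed)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in x]


x = [1.0] * 10000
y = dropout(x, keep_prob=0.8, seed=0)

kept = [v for v in y if v != 0.0]
mean_after = sum(y) / len(y)
print(len(kept) / len(y))   # close to 0.8
print(mean_after)           # close to 1.0, the mean of the input
```

Each surviving element equals 1/0.8 = 1.25, and because about 80% survive, the mean of the output stays close to the mean of the input. At evaluation time keep_prob is normally set to 1.0, which leaves the input unchanged.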

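(5) word_vectors = tf.reshape(word_vectors,[-1,self.time_steps,self.templates*self.emb_dim])

Reshapes word_vectors to [-1,self.time_steps,self.templates*self.emb_dim]. The last two dimensions are fixed; the first is given as -1, so TensorFlow infers it from the total number of elements.

For example, if a.shape is [2,4,8,16], then tf.reshape(a,[-1,4,4,16]) yields a tensor of shape [4,4,4,16]: the product of the dimensions is unchanged, 2*4*8*16 = 4*4*4*16.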
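
The arithmetic behind the -1 can be written out in a few lines. The helper below, resolve_shape, is a hypothetical name used only for this illustration; it mirrors the rule that both shapes must hold the same number of elements.

```python
from functools import reduce
from operator import mul


def resolve_shape(old_shape, new_shape):
    """Fill in the single -1 entry of new_shape so that both shapes
    contain the same number of elements."""
    total = reduce(mul, old_shape, 1)
    known = reduce(mul, (d for d in new_shape if d != -1), 1)
    if new_shape.count(-1) != 1 or total % known != 0:
        raise ValueError("cannot infer the missing dimension")
    return [total // known if d == -1 else d for d in new_shape]


print(resolve_shape([2, 4, 8, 16], [-1, 4, 4, 16]))      # [4, 4, 4, 16]
print(resolve_shape([32, 175, 5, 100], [-1, 175, 500]))  # [32, 175, 500]
```

The second call uses the shapes from this model with an assumed batch size of 32: merging the (5, 100) template embeddings into 500 values leaves the batch dimension untouched.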

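(6) tf.nn.dynamic_rnn()

A small example illustrates how TensorFlow's dynamic_rnn works. Suppose the RNN input is [2,20,128], where 2 is batch_size, 20 is the maximum text length, and 128 is embedding_size. There are two examples; assume the second text has a length of only 13, and the remaining 7 positions are filled by zero-padding. dynamic_rnn returns two values: outputs and last_states. With a cell of 128 units, outputs is [2,20,128], the hidden output at every time step, and last_states is a (c,h) tuple whose elements are both [batch,128]. Note that the last dimension of these tensors is the number of cell units, which in this example happens to equal embedding_size.

Nothing unusual so far, but dynamic_rnn takes a sequence_length argument that gives the length of each example. In the example above, setting sequence_length to [20,13] says that the first example has 20 valid steps and the second has 13. When this argument is passed, TensorFlow skips the padding after step 13 of the second example: its last_states is the state at step 13, carried through unchanged to step 20, and its outputs beyond step 13 are set to zero.

tf.nn.dynamic_rnn is more convenient and saves more computation than the other RNN functions, so it is the recommended first choice.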

（https://blog.csdn.net/u010223750/article/details/71079036）
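
The effect of sequence_length can be imitated with a toy recurrence. The sketch below is not dynamic_rnn itself: the "cell" is just a running sum, and the function name dynamic_rnn_sketch is made up. It only demonstrates the two rules described above, namely zeroed outputs past the true length and a final state frozen at the last valid step.

```python
def dynamic_rnn_sketch(inputs, sequence_length):
    """Toy stand-in for a recurrent layer whose state is a running sum.

    inputs: a batch of scalar sequences padded to the same length.
    Returns (outputs, last_states).
    """
    outputs, last_states = [], []
    for seq, length in zip(inputs, sequence_length):
        state, out = 0.0, []
        for t, x in enumerate(seq):
            if t < length:
                state = state + x     # hypothetical cell update
                out.append(state)
            else:
                out.append(0.0)       # padded step: zero output, state untouched
        outputs.append(out)
        last_states.append(state)
    return outputs, last_states


# The padding in the second row is deliberately non-zero (9.0)
# to show that it never reaches the state.
batch = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 9.0, 9.0]]
outputs, last_states = dynamic_rnn_sketch(batch, sequence_length=[4, 2])
print(outputs)       # [[1.0, 3.0, 6.0, 10.0], [5.0, 11.0, 0.0, 0.0]]
print(last_states)   # [10.0, 11.0]
```

The second sequence stops contributing after its second step: its outputs at steps 3 and 4 are zero and its final state is the one reached at step 2.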

The tensor shapes change as follows inside inference() (? stands for batch_size):

    def inference(self, X, X_len, reuse=None):
        # self.emb_matrix.shape: (4835, 100)   X.shape: (?, 175, 5)
        word_vectors = tf.nn.embedding_lookup(self.emb_matrix, X)
        # word_vectors.shape: (?, 175, 5, 100)
        word_vectors = tf.nn.dropout(word_vectors, keep_prob=self.keep_prob)
        # word_vectors.shape: (?, 175, 5, 100)
        # Flatten the trailing (5, 100): the five 100-dim vectors are laid end to end into one 500-dim vector
        word_vectors = tf.reshape(word_vectors, [-1, self.time_steps, self.templates * self.emb_dim])  # emb_dim: embedding dimension
        # word_vectors.shape: (?, 175, 500)
        with tf.variable_scope('label_inference', reuse=reuse):
            outputs, _ = tf.nn.dynamic_rnn(
                self.lstm_fw,
                word_vectors,
                dtype=tf.float32,
                sequence_length=X_len
            )
            # outputs.shape: (?, 175, 50)
            outputs = tf.reshape(outputs, [-1, self.hidden_dim])
        # outputs.shape: (? * 175, 50); TensorFlow prints this as (?, 50), but the first dimension is now batch_size * time_steps
        with tf.name_scope('linear_transform'):
            scores = tf.matmul(outputs, self.W) + self.b
            # scores.shape: (? * 175, 8), printed as (?, 8)
            scores = tf.reshape(scores, [-1, self.time_steps, self.nb_classes])
        # scores.shape: (?, 175, 8)
        return scores
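
The last three steps (flatten, matrix product plus bias, reshape back) can be traced on tiny nested lists. This is a plain-Python sketch with invented numbers and a hypothetical helper name, tag_scores; it only shows why the intermediate tensor has batch_size * time_steps rows.

```python
def tag_scores(outputs, W, b):
    """outputs: [batch][time][hidden], W: [hidden][classes], b: [classes]."""
    batch, time_steps = len(outputs), len(outputs[0])
    # Flatten to (batch * time_steps, hidden) so one product covers every step.
    flat = [step for sentence in outputs for step in sentence]
    scores = [
        [sum(h * W[k][c] for k, h in enumerate(row)) + b[c] for c in range(len(b))]
        for row in flat
    ]
    # Restore the (batch, time_steps, classes) layout.
    return [scores[i * time_steps:(i + 1) * time_steps] for i in range(batch)]


# batch=2, time_steps=3, hidden=2, classes=4 (toy sizes, not the model's 175/50/8)
outputs = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
           [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]]]
W = [[1.0, 2.0, 3.0, 4.0],
     [10.0, 20.0, 30.0, 40.0]]
b = [0.5, 0.5, 0.5, 0.5]

scores = tag_scores(outputs, W, b)
print(scores[0][2])   # [11.5, 22.5, 33.5, 44.5]
print(scores[1][1])   # [0.5, 0.5, 0.5, 0.5]
```

Every time step of every sentence gets its own vector of class scores. A step whose hidden output is all zeros (such as a padded step after dynamic_rnn) receives only the bias.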