Getting to the Bottom of Recurrent Neural Networks

东方隐侠-千里

Article link: https://blog.csdn.net/qq_37865996/article/details/88022439

1. A Study of Recurrent Neural Networks

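The recurrent neural network (RNN) is one of the best-known deep learning architectures. Its defining feature is that the network remembers earlier information and uses it when computing the current output. The nodes of the hidden layer are no longer unconnected but linked across time steps, so the input to the hidden layer includes both the output of the input layer and the hidden layer's own output from the previous time step.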

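What is the point of doing this? The algorithms we studied earlier, such as convolutional neural networks, rest on the assumption that elements are independent of one another and that inputs and outputs are independent as well. That is often far from reality, and it makes everyday tasks such as inferring meaning from context very hard for a computer.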

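The basic RNN formula is $s_t = f(U x_t + W s_{t-1})$, where $x_t$ is the input at time t, $s_{t-1}$ is the hidden state of the previous step, and f is the activation function, which acts as a filter.

For prediction a further weight matrix V is applied, so the output at time t is $o_t = \mathrm{softmax}(V s_t)$.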

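In an RNN, s_t serves as the hidden state and captures information from earlier time steps. Because it passes through an activation function, s_t cannot be a complete record of everything that came before, which makes the network behave more and more like a person. As in a convolutional neural network, parameters are shared: every cell uses the same set (U, V, W), which greatly reduces the number of parameters and therefore the computation. In many cases o_t does not exist at every step, for example when only the output after the last step is needed.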
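
To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass, using the usual formulation s_t = f(U x_t + W s_{t-1}) and o_t = softmax(V s_t). The choice of tanh for f, the function name `rnn_forward`, and all the sizes are assumptions made for illustration, not something taken from the book or from TensorFlow.

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0=None):
    """Run a vanilla RNN over one sequence.

    xs: (T, input_dim) inputs, one row per time step.
    U: (hidden_dim, input_dim), W: (hidden_dim, hidden_dim),
    V: (output_dim, hidden_dim). The same U, W, V are reused at every step.
    Returns the hidden states (T, hidden_dim) and outputs (T, output_dim).
    """
    s = np.zeros(W.shape[0]) if s0 is None else s0
    states, outputs = [], []
    for x in xs:
        # s_t = f(U x_t + W s_{t-1}); f = tanh is an assumption here
        s = np.tanh(U @ x + W @ s)
        # o_t = softmax(V s_t), computed in a numerically stable way
        logits = V @ s
        e = np.exp(logits - logits.max())
        states.append(s)
        outputs.append(e / e.sum())
    return np.stack(states), np.stack(outputs)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
xs = rng.normal(size=(T, input_dim))

states, outputs = rnn_forward(xs, U, W, V)
print(states.shape, outputs.shape)
```

Note that the loop body never changes: parameter sharing simply means the same three matrices are applied at every time step.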

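One improvement comes from the observation that some things only become clear in hindsight: we often cannot infer tomorrow from today, yet what happened yesterday can be inferred from today's state. This leads to the bidirectional RNN, which reads the sequence in both directions.

The hidden states of the two directions are concatenated, so the space required is twice that of the unidirectional RNN.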
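
The concatenation can be sketched in a few lines of NumPy. This is only an illustration under assumed names (`run_rnn`, `bidirectional_states`) and a tanh activation: one RNN reads the sequence left to right, a second one with its own weights reads it right to left, and the two hidden states for each position are joined.

```python
import numpy as np

def run_rnn(xs, U, W):
    """Hidden state at every step of a tanh RNN, shape (T, hidden_dim)."""
    s = np.zeros(W.shape[0])
    states = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)
        states.append(s)
    return np.stack(states)

def bidirectional_states(xs, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states per time step."""
    forward = run_rnn(xs, *fwd_params)
    # Run on the reversed sequence, then flip back so row t matches step t
    backward = run_rnn(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 6, 3, 4

def make_params():
    # Each direction gets its own (U, W)
    return (rng.normal(scale=0.5, size=(hidden_dim, input_dim)),
            rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)))

fwd_params, bwd_params = make_params(), make_params()
xs = rng.normal(size=(T, input_dim))
h = bidirectional_states(xs, fwd_params, bwd_params)
print(h.shape)  # (6, 8)
```

The second dimension is 2 * hidden_dim, which is where the doubled space requirement comes from.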

To capture even more information, the deep bidirectional RNN was introduced next, which increases the number of hidden layers.

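When studying RNNs you will run into the concept of LSTM. The LSTM (long short-term memory) network is a variant of the RNN. Because of vanishing gradients a plain RNN only has short-term memory; an LSTM uses carefully designed gates to combine short-term and long-term memory, and to some extent solves the vanishing gradient problem. A good place to study this part is https://www.jianshu.com/p/9dc9f41f0b29. The key to the LSTM is the cell state, the horizontal line that runs across the top of the diagram. The cell state works like a conveyor belt: state is passed from cell to cell along this line, which is what lets memory persist over long spans. Input and output are controlled by gate structures that remove or add information, and the proportion allowed through is a value between 0 and 1.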
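
As a rough illustration of the gates and the "conveyor belt", here is one step of a standard LSTM cell in NumPy. The function name `lstm_step`, the stacked weight layout, and the gate ordering are assumptions of this sketch and do not describe TensorFlow's internal implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
    """One LSTM step.

    Wx: (4*H, D), Wh: (4*H, H), b: (4*H,). The four blocks are, in order:
    forget gate, input gate, output gate, candidate values.
    """
    H = h_prev.shape[0]
    z = Wx @ x + Wh @ h_prev + b
    f = sigmoid(z[0:H])          # how much of the old cell state to keep (0..1)
    i = sigmoid(z[H:2 * H])      # how much new information to write (0..1)
    o = sigmoid(z[2 * H:3 * H])  # how much of the cell state to expose (0..1)
    g = np.tanh(z[3 * H:4 * H])  # candidate values to write
    c = f * c_prev + i * g       # the conveyor belt: only scaled and added to
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4
Wx = rng.normal(scale=0.1, size=(4 * H, D))
Wh = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
b[0:H] = 20.0        # forget gate close to 1: keep the old cell state
b[H:2 * H] = -20.0   # input gate close to 0: write almost nothing new

c0 = rng.normal(size=H)
h, c = np.zeros(H), c0.copy()
for _ in range(50):
    h, c = lstm_step(rng.normal(size=D), h, c, Wx, Wh, b)
print(np.max(np.abs(c - c0)))  # tiny: the cell state survived 50 steps
```

With the forget gate held open and the input gate held shut, the cell state passes through 50 steps almost untouched, which is the long-term memory a plain RNN struggles to keep.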

2. Implementing RNNs

In TensorFlow (used here through TFLearn), the LSTM is packaged as a component; to use it you only need to specify the number of units.

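Take the structure and implementation of a unidirectional RNN as an example. We set the sequence length to 100, so each input is a 100×1 tensor. After the Embedding layer every position becomes a 128-dimensional vector, and the result is fed to a recurrent component with 128 units. The LSTM has 128 units, 128 input nodes, and 128 hidden nodes. Because we frame the task as binary classification, the output layer has 2 nodes and produces a tensor of dimension 2.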
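
The shapes described above can be traced with plain NumPy. This is a sketch, not TFLearn code: `pad_post` and `one_hot` are hypothetical helpers that only approximate what `pad_sequences` and `to_categorical` do (padding and truncating at the end is assumed), and the review ids are made up.

```python
import numpy as np

def pad_post(sequences, maxlen, value=0):
    """Truncate or right-pad every sequence to exactly maxlen ids."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        trimmed = list(seq)[:maxlen]
        out[row, :len(trimmed)] = trimmed
    return out

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]

# Three made-up reviews as lists of word ids: short, too long, very short
reviews = [[5, 12, 7], list(range(1, 151)), [42]]
labels = [1, 0, 1]

x = pad_post(reviews, maxlen=100)   # (3, 100): every review has length 100
y = one_hot(labels, 2)              # (3, 2): binary labels as 2-d vectors

# Embedding is a table lookup: one 128-d row per word id (vocabulary of 10000)
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(10000, 128)).astype(np.float32)
embedded = embedding_table[x]       # (3, 100, 128)

print(x.shape, y.shape, embedded.shape)
```

The recurrent layer then consumes the (batch, 100, 128) tensor and returns one 128-dimensional state per review, which the final fully connected layer maps to 2 outputs.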

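import tflearn
from tflearn.data_utils import to_categorical, pad_sequences
from tflearn.datasets import imdb

# IMDB Dataset loading
train, test, _ = imdb.load_data(path='imdb.pkl', n_words=10000,
                                valid_portion=0.1)
trainX, trainY = train
testX, testY = test

def lstm(trainX, trainY,testX, testY):
    # Data preprocessing
    # Sequence padding: convert the training and test sets to sequences of
    # length 100, padding with 0 where a sequence is too short
    trainX = pad_sequences(trainX, maxlen=100, value=0.)
    testX = pad_sequences(testX, maxlen=100, value=0.)
    # Converting labels to binary vectors: the binary labels become 2-d vectors
    trainY = to_categorical(trainY, nb_classes=2)
    testY = to_categorical(testY, nb_classes=2)

    # Network building: the number of sequences is not fixed,
    # but every sequence has a fixed length of 100
    net = tflearn.input_data([None, 100])
    # Embedding layer: output dimension 128; input_dim is the largest value
    # any input element can take (the vocabulary size)
    net = tflearn.embedding(net, input_dim=10000, output_dim=128)
    # LSTM layer with 128 units; dropout helps avoid overfitting
    net = tflearn.lstm(net, 128, dropout=0.8)
    # Fully connected layer; the output layer has 2 nodes
    net = tflearn.fully_connected(net, 2, activation='softmax')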
    net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
                             loss='categorical_crossentropy')

    # Training
    model = tflearn.DNN(net, tensorboard_verbose=0)
    model.fit(trainX, trainY, validation_set=(testX, testY), show_metric=True,
              batch_size=32,run_id="rnn-lstm")

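The bidirectional RNN is also packaged as a component, and again you only need to specify the number of units. Take a binary classification problem on sequences of length 200 as an example. We set the length to 200, so each input is a 200×1 tensor. After the Embedding layer every position becomes a 128-dimensional vector, and the result is fed to two recurrent components with 128 units each, one per direction. The bidirectional RNN has 128 units, 128 input nodes, and 128 hidden nodes. Because the task is binary classification, the output layer has 2 nodes and produces a tensor of dimension 2.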

def bi_lstm(trainX, trainY,testX, testY):
    trainX = pad_sequences(trainX, maxlen=200, value=0.)
    testX = pad_sequences(testX, maxlen=200, value=0.)
    # Converting labels to binary vectors
    trainY = to_categorical(trainY, nb_classes=2)
    testY = to_categorical(testY, nb_classes=2)

    # Network building
    net = tflearn.input_data(shape=[None, 200])
    net = tflearn.embedding(net, input_dim=20000, output_dim=128)
    net = tflearn.bidirectional_rnn(net, BasicLSTMCell(128), BasicLSTMCell(128))
    net = tflearn.dropout(net, 0.5)
    net = tflearn.fully_connected(net, 2, activation='softmax')
    net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')

    # Training
    model = tflearn.DNN(net, clip_gradients=0., tensorboard_verbose=2)
    model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=64,run_id="rnn-bilstm")

The author's complete code looks like this:

from __future__ import division, print_function, absolute_import

import os
import pickle

from six.moves import urllib

import tflearn
from tflearn.data_utils import (to_categorical, pad_sequences,
                                textfile_to_semi_redundant_sequences,
                                random_sequence_from_textfile)
from tflearn.datasets import imdb
from tflearn.layers.recurrent import BasicLSTMCell

# IMDB Dataset loading
train, test, _ = imdb.load_data(path='imdb.pkl', n_words=10000,
                                valid_portion=0.1)
trainX, trainY = train
testX, testY = test

def lstm(trainX, trainY,testX, testY):
    # Data preprocessing
    # Sequence padding
    trainX = pad_sequences(trainX, maxlen=100, value=0.)
    testX = pad_sequences(testX, maxlen=100, value=0.)
    # Converting labels to binary vectors
    trainY = to_categorical(trainY, nb_classes=2)
    testY = to_categorical(testY, nb_classes=2)

    # Network building
    net = tflearn.input_data([None, 100])
    net = tflearn.embedding(net, input_dim=10000, output_dim=128)
    net = tflearn.lstm(net, 128, dropout=0.8)
    net = tflearn.fully_connected(net, 2, activation='softmax')
    net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
                             loss='categorical_crossentropy')

    # Training
    model = tflearn.DNN(net, tensorboard_verbose=0)
    model.fit(trainX, trainY, validation_set=(testX, testY), show_metric=True,
              batch_size=32,run_id="rnn-lstm")

def bi_lstm(trainX, trainY,testX, testY):
    trainX = pad_sequences(trainX, maxlen=200, value=0.)
    testX = pad_sequences(testX, maxlen=200, value=0.)
    # Converting labels to binary vectors
    trainY = to_categorical(trainY, nb_classes=2)
    testY = to_categorical(testY, nb_classes=2)

    # Network building
    net = tflearn.input_data(shape=[None, 200])
    net = tflearn.embedding(net, input_dim=20000, output_dim=128)
    net = tflearn.bidirectional_rnn(net, BasicLSTMCell(128), BasicLSTMCell(128))
    net = tflearn.dropout(net, 0.5)
    net = tflearn.fully_connected(net, 2, activation='softmax')
    net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')

    # Training
    model = tflearn.DNN(net, clip_gradients=0., tensorboard_verbose=2)
    model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=64,run_id="rnn-bilstm")

def shakespeare():


    path = "shakespeare_input.txt"
    #path = "shakespeare_input-100.txt"
    char_idx_file = 'char_idx.pickle'

    if not os.path.isfile(path):
        urllib.request.urlretrieve(
            "https://raw.githubusercontent.com/tflearn/tflearn.github.io/master/resources/shakespeare_input.txt", path)

    maxlen = 25

    char_idx = None
    if os.path.isfile(char_idx_file):
        print('Loading previous char_idx')
        # Reuse the character-to-index mapping saved by an earlier run
        with open(char_idx_file, 'rb') as f:
            char_idx = pickle.load(f)

    X, Y, char_idx = \
        textfile_to_semi_redundant_sequences(path, seq_maxlen=maxlen, redun_step=3,
                                             pre_defined_char_idx=char_idx)

    # Save the mapping so that later runs can load it
    with open(char_idx_file, 'wb') as f:
        pickle.dump(char_idx, f)

    g = tflearn.input_data([None, maxlen, len(char_idx)])
    g = tflearn.lstm(g, 512, return_seq=True)
    g = tflearn.dropout(g, 0.5)
    g = tflearn.lstm(g, 512, return_seq=True)
    g = tflearn.dropout(g, 0.5)
    g = tflearn.lstm(g, 512)
    g = tflearn.dropout(g, 0.5)
    g = tflearn.fully_connected(g, len(char_idx), activation='softmax')
    g = tflearn.regression(g, optimizer='adam', loss='categorical_crossentropy',
                           learning_rate=0.001)

    m = tflearn.SequenceGenerator(g, dictionary=char_idx,
                                  seq_maxlen=maxlen,
                                  clip_gradients=5.0,
                                  checkpoint_path='model_shakespeare')

    for i in range(50):
        seed = random_sequence_from_textfile(path, maxlen)
        m.fit(X, Y, validation_set=0.1, batch_size=128,
              n_epoch=1, run_id='shakespeare')
        print("-- TESTING...")
        print("-- Test with temperature of 1.0 --")
        print(m.generate(600, temperature=1.0, seq_seed=seed))
        #print(m.generate(10, temperature=1.0, seq_seed=seed))
        print("-- Test with temperature of 0.5 --")
        print(m.generate(600, temperature=0.5, seq_seed=seed))

#lstm(trainX, trainY,testX, testY)
#bi_lstm(trainX, trainY,testX, testY)
shakespeare()

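The part added at the end is a concrete worked example: shakespeare() trains a character-level LSTM on Shakespeare's text and then generates new text from it.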
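
The `temperature` argument passed to `m.generate` in the listing controls how adventurous the sampling is. A common way to apply a temperature to a predicted distribution is sketched below; this shows the general technique under assumed names (`apply_temperature`, `sample_index`) and is not a copy of TFLearn's internals.

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability vector: low temperature sharpens, high flattens."""
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

def sample_index(probs, temperature, rng):
    """Draw the index of the next character from the rescaled distribution."""
    return int(rng.choice(len(probs), p=apply_temperature(probs, temperature)))

probs = np.array([0.5, 0.3, 0.2])        # made-up next-character probabilities
same = apply_temperature(probs, 1.0)     # temperature 1.0 leaves them unchanged
cool = apply_temperature(probs, 0.5)     # temperature 0.5 favours the top choice

rng = np.random.default_rng(0)
idx = sample_index(probs, 0.5, rng)
print(same.round(3), cool.round(3), idx)
```

This is why the text generated at temperature 0.5 tends to be more conservative and repetitive than the text generated at 1.0.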

Honestly, without a GPU the run time is really long, so I will not post the results here.

I recommend the book 《Web安全之深度学习实战》 (Web Security: Deep Learning in Practice). Keep going!