First Encounter ❤ RNN | [Learning by Doing]

Article link: https://blog.csdn.net/weixin_43982238/article/details/94646802

Notes

Code reproduction notes
Reference book: Hands-On Machine Learning with Scikit-Learn & TensorFlow
Chapter 14: Recurrent Neural Networks
From basics to practice

Prerequisites ▶

Setting up a deep learning environment
TensorFlow basics: First Meeting ❤ TensorFlow | [Love at First Sight]

Preparation

Warning: very long code ahead.
Import the required packages

import tensorflow as tf
import numpy as np
import os
import matplotlib.pyplot as plt

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'        # suppress the AVX2 warning

Define common helper functions

# to make output stable across runs
def reset_graph(seed=22):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)


# to plot pretty figures
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
# where to save the figures
PROJECT_ROOT_DIR = "."
CHARTER_ID = "rnn"


def save_fig(fig_id, tight_layout=True):
    path = os.path.join(PROJECT_ROOT_DIR, "images", CHARTER_ID)
    if not os.path.exists(path):
        os.makedirs(path)
    fig_path = os.path.join(path, fig_id + ".png")
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(fig_path, format='png')     # no dpi argument, so the figure displays normally in the IDE
    # plt.savefig(fig_path, format='png', dpi=300)

Basic RNNs in TensorFlow

Manual RNN
First, let’s implement a very simple RNN model, without using any of TensorFlow’s RNN operations, to better understand what goes on under the hood. We will create an RNN composed of a layer of five recurrent neurons, using the tanh activation function. We will assume that the RNN runs over only two time steps, taking input vectors of size 3 at each time step.

reset_graph()
n_inputs = 3
n_neurons = 5

X0 = tf.placeholder(tf.float32, [None, n_inputs])
X1 = tf.placeholder(tf.float32, [None, n_inputs])

Wx = tf.Variable(tf.random_normal(shape=[n_inputs, n_neurons], dtype=tf.float32))
Wy = tf.Variable(tf.random_normal(shape=[n_neurons, n_neurons], dtype=tf.float32))
b = tf.Variable(tf.zeros([1, n_neurons]), dtype=tf.float32)

Y0 = tf.tanh(tf.matmul(X0, Wx) + b)
Y1 = tf.tanh(tf.matmul(Y0, Wy) + tf.matmul(X1, Wx) + b)

init = tf.global_variables_initializer()
# Mini-batch:       instance 0, instance 1,instance 2, instance 3
X0_batch = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]])   # t = 0
X1_batch = np.array([[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]])   # t = 1

with tf.Session() as sess:
    init.run()
    Y0_val, Y1_val = sess.run([Y0, Y1], feed_dict={X0: X0_batch, X1: X1_batch})

print(Y0_val)   # output at t = 0
""" 
 [[ 0.99550587  0.9996419  -0.9968993   0.16764468 -0.70020366]     # instance 0
 [ 0.99999976  1.         -1.         -0.9948845  -0.99979913]      # instance 1
 [ 1.          1.         -1.         -0.99999076 -0.99999976]      # instance 2
 [ 0.99999785  0.06634381 -1.         -1.         -0.79417264]]     # instance 3
"""
print(Y1_val)   # output at t = 1
""" 
 [[ 1.          1.         -1.         -1.         -1.        ]     # instance 0
 [-0.7565598  -0.8187268  -0.94461614 -0.42730775 -0.19264291]      # instance 1
 [ 0.99998546  0.99999976 -1.         -0.99999803 -0.9999975 ]      # instance 2
 [ 0.7929777   0.9608644  -0.9999962  -0.9912115  -0.9984168 ]]     # instance 3
"""

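As a cross-check of the two-step recurrence above, here is a minimal NumPy sketch of the same computation. It is not the TensorFlow graph itself: the weights are drawn from NumPy's own random generator (the seed 22 is an arbitrary choice), so the numbers will differ from the outputs listed above.

```python
import numpy as np

rng = np.random.default_rng(22)  # arbitrary seed, only for reproducibility of this sketch
n_inputs, n_neurons = 3, 5

# Randomly initialised weights; they will not match the TensorFlow values above.
Wx = rng.standard_normal((n_inputs, n_neurons))   # input -> hidden
Wy = rng.standard_normal((n_neurons, n_neurons))  # previous output -> hidden
b = np.zeros((1, n_neurons))

X0_batch = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]], dtype=float)  # t = 0
X1_batch = np.array([[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]], dtype=float)  # t = 1

Y0 = np.tanh(X0_batch @ Wx + b)             # no previous output at t = 0
Y1 = np.tanh(Y0 @ Wy + X1_batch @ Wx + b)   # the t = 0 output feeds back in at t = 1

print(Y0.shape, Y1.shape)  # (4, 5) (4, 5)
```

Each row is one instance of the mini-batch, and each column is one of the five recurrent neurons.
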
Using static_rnn()
That wasn’t too hard, but of course if you want to be able to run an RNN over 100 time steps, the graph is going to be pretty big. Now let’s look at how to create the same model using TensorFlow’s RNN operations.
The static_rnn() function creates an unrolled RNN network by chaining cells. The static_rnn() function returns two objects.
The first is a Python list containing the output tensors for each time step.
The second is a tensor containing the final states of the network.

reset_graph()
n_inputs = 3
n_neurons = 5

X0 = tf.placeholder(tf.float32, [None, n_inputs])
X1 = tf.placeholder(tf.float32, [None, n_inputs])

basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
output_seqs, states = tf.nn.static_rnn(basic_cell, [X0, X1], dtype=tf.float32)
Y0, Y1 = output_seqs
init = tf.global_variables_initializer()
# Mini-batch:       instance 0, instance 1,instance 2, instance 3
X0_batch = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]])   # t = 0
X1_batch = np.array([[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]])   # t = 1

with tf.Session() as sess:
    init.run()
    Y0_val, Y1_val = sess.run([Y0, Y1], feed_dict={X0: X0_batch, X1: X1_batch})

print(Y0_val)   # output at t = 0
print(Y1_val)   # output at t = 1
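
To make the two return values concrete, the following NumPy sketch unrolls a basic tanh cell over a Python list of inputs. The function name static_rnn_sketch and its weight arguments are hypothetical and only mimic the shape of the API; this is not how TensorFlow implements static_rnn().

```python
import numpy as np

def static_rnn_sketch(inputs, Wx, Wh, b):
    """Unroll a basic tanh RNN over a list of [batch, n_inputs] arrays."""
    state = np.zeros((inputs[0].shape[0], Wh.shape[0]))  # initial state: all zeros
    outputs = []
    for X_t in inputs:                                   # one chained "cell" per time step
        state = np.tanh(X_t @ Wx + state @ Wh + b)
        outputs.append(state)
    return outputs, state                                # list of outputs, final state

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
Wx = rng.standard_normal((3, 5))
Wh = rng.standard_normal((5, 5))
b = np.zeros(5)

X0_batch = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]], dtype=float)
X1_batch = np.array([[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]], dtype=float)

output_seqs, final_state = static_rnn_sketch([X0_batch, X1_batch], Wx, Wh, b)
print(len(output_seqs))                              # 2
print(np.array_equal(output_seqs[-1], final_state))  # True
```

For a basic cell the final state is simply the output of the last time step, which is why the last line prints True.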

Packing sequences
Static Unrolling Through Time
If there were 50 time steps, it would not be very convenient to have to define 50 input placeholders and 50 output tensors. Moreover, at execution time you would have to feed each of the 50 placeholders and manipulate the 50 outputs. Let’s simplify this.
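The following code builds the same RNN again, but this time it takes a single input placeholder of shape [None, n_steps, n_inputs], where the first dimension is the mini-batch size. It then transposes and unstacks this tensor into a Python list of n_steps tensors of shape [None, n_inputs], one per time step.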

n_steps = 2
n_inputs = 3
n_neurons = 5
reset_graph()

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))

basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
output_seqs, states = tf.nn.static_rnn(basic_cell, X_seqs, dtype=tf.float32)
outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])

init = tf.global_variables_initializer()
# Mini-batch
X_batch = np.array([
        # t = 0      t = 1
        [[0, 1, 2], [9, 8, 7]],     # instance 0
        [[3, 4, 5], [0, 0, 0]],     # instance 1
        [[6, 7, 8], [6, 5, 4]],     # instance 2
        [[9, 0, 1], [3, 2, 1]],     # instance 3
    ])

with tf.Session() as sess:
    init.run()
    outputs_val = outputs.eval(feed_dict={X: X_batch})

print(outputs_val)
"""
[[[-0.11162948  0.33325985 -0.9007175  -0.80817497  0.88085717]     # t=0, instance 0
  [-0.99862707  0.16578908 -0.9948006  -0.9999983   0.99999946]]    # t=1

 [[-0.8062513   0.47295004 -0.99115485 -0.9989003   0.9993104 ]     # t=0, instance 1
  [-0.48152515  0.05868915 -0.8531729  -0.5823526   0.8742988 ]]    # t=1

 [[-0.9716136   0.5923225  -0.99924463 -0.9999942   0.9999963 ]     # t=0, instance 2
  [-0.9808671   0.01522995 -0.9809986  -0.99990326  0.9999424 ]]    # t=1

 [[-0.9999604   0.9249761   0.99988776 -0.99278164  0.85507596]     # t=0, instance 3
  [-0.6323666   0.06465403 -0.78828096 -0.9617816   0.96681607]]]   # t=1
"""

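The transpose/unstack step above is only a change of layout. The NumPy sketch below mirrors it (np.transpose for tf.transpose, list(...) for tf.unstack, np.stack for tf.stack) to show what each time-step slice contains; it is an illustration, not part of the TensorFlow graph.

```python
import numpy as np

X_batch = np.array([
    [[0, 1, 2], [9, 8, 7]],
    [[3, 4, 5], [0, 0, 0]],
    [[6, 7, 8], [6, 5, 4]],
    [[9, 0, 1], [3, 2, 1]],
])                                                   # shape (batch, n_steps, n_inputs) = (4, 2, 3)

X_time_major = np.transpose(X_batch, (1, 0, 2))      # shape (n_steps, batch, n_inputs) = (2, 4, 3)
X_seqs = list(X_time_major)                          # one (4, 3) array per time step

print(X_seqs[0])   # the t = 0 inputs of all four instances

# Going back: stack the per-step arrays, then swap the first two axes again.
restored = np.transpose(np.stack(X_seqs), (1, 0, 2))
print(np.array_equal(restored, X_batch))  # True
```
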
Using dynamic_rnn()
However, the above approach still builds a graph containing one cell per time step. If there were 50 time steps, the graph would look pretty ugly.
With such a large graph, you may even get out-of-memory (OOM) errors during backpropagation (especially with the limited memory of GPU cards), since it must store all tensor values during the forward pass so it can use them to compute gradients during the reverse pass.
The dynamic_rnn() function uses a while_loop() operation to run over the cell the appropriate number of times.
There is no need to stack, unstack, or transpose.

n_steps = 2
n_inputs = 3
n_neurons = 5
reset_graph()

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])

basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
init = tf.global_variables_initializer()
# Mini-batch
X_batch = np.array([
        # t = 0      t = 1
        [[0, 1, 2], [9, 8, 7]],     # instance 0
        [[3, 4, 5], [0, 0, 0]],     # instance 1
        [[6, 7, 8], [6, 5, 4]],     # instance 2
        [[9, 0, 1], [3, 2, 1]],     # instance 3
    ])

with tf.Session() as sess:
    init.run()
    outputs_val = outputs.eval(feed_dict={X: X_batch})

print(outputs_val)
"""
[[[-0.306496   -0.67795455  0.63360935 -0.22624303  0.69378805]
  [-0.28959382  0.79179853  0.99999917  0.9009478   0.9999996 ]]

 [[-0.61062753 -0.5367684   0.9982954   0.38485184  0.99893695]
  [ 0.71705747 -0.5571848  -0.23296002 -0.5707488  -0.2665677 ]]

 [[-0.8016345  -0.35739028  0.9999936   0.7785631   0.9999969 ]
  [ 0.4053654   0.61451226  0.99982405  0.772726    0.99983084]]

 [[-0.9797662   0.9999785   0.9997745   0.98968136  0.99647856]
  [ 0.6971239   0.16660431  0.94809496  0.69628793  0.9422178 ]]]
"""

Using dynamic_rnn()
Handling Variable Length Input Sequences
So far we have used only fixed-size input sequences (all exactly two steps long). What if the input sequences have variable lengths (e.g., like sentences)?
In this case you should set the sequence_length parameter when calling the dynamic_rnn() (or static_rnn()) function; it must be a 1D tensor indicating the length of the input sequence for each instance.

n_steps = 2
n_inputs = 3
n_neurons = 5
reset_graph()

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)

seq_length = tf.placeholder(tf.int32, [None])
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32,
                                    sequence_length=seq_length)
init = tf.global_variables_initializer()
# Mini-batch
X_batch = np.array([
        # step 0      step 1
        [[0, 1, 2], [9, 8, 7]],     # instance 0
        [[3, 4, 5], [0, 0, 0]],     # instance 1 (padded with zero vectors)
        [[6, 7, 8], [6, 5, 4]],     # instance 2
        [[9, 0, 1], [3, 2, 1]],     # instance 3
    ])
seq_length_batch = np.array([2, 1, 2, 2])

with tf.Session() as sess:
    init.run()
    outputs_val, states_val = sess.run([outputs, states], feed_dict={X: X_batch, seq_length: seq_length_batch})

print(outputs_val)
"""
[[[ 0.68722075 -0.509747    0.70100015  0.89299554  0.15244864]
  [ 0.9999998  -0.9999988  -0.9972714   0.99999833  0.9986311 ]]        # final state

 [[ 0.9982136  -0.9972284   0.35704085  0.999562    0.8413011 ]         # final state
  [ 0.          0.          0.          0.          0.        ]]        # zero vector

 [[ 0.9999914  -0.99998826 -0.12167802  0.99999833  0.9800005 ]
  [ 0.9999898  -0.99922127 -0.9930421   0.9968625   0.99406075]]        # final state

 [[ 0.9998404  -0.9999913  -0.9925783   0.8807097  -0.57542574]
  [ 0.98173416 -0.96920955 -0.9693667   0.70200425  0.72634375]]]       # final state
"""
print(states_val)       # final state of each instance, taken at its last valid time step
"""
[[ 0.9999998  -0.9999988  -0.9972714   0.99999833  0.9986311 ]          # t = 1
 [ 0.9982136  -0.9972284   0.35704085  0.999562    0.8413011 ]          # t = 0 !!!
 [ 0.9999898  -0.99922127 -0.9930421   0.9968625   0.99406075]          # t = 1
 [ 0.98173416 -0.96920955 -0.9693667   0.70200425  0.72634375]]         # t = 1
"""

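The behaviour seen in the outputs above (zero vectors after the end of a sequence, and a final state taken from the last valid step) can be reproduced with a small NumPy sketch. The function dynamic_rnn_sketch is a hypothetical stand-in written for this note, and its random weights will not match the TensorFlow values.

```python
import numpy as np

def dynamic_rnn_sketch(X, seq_length, Wx, Wh, b):
    """Basic tanh RNN over X of shape [batch, n_steps, n_inputs] with per-instance lengths."""
    batch_size, n_steps, _ = X.shape
    n_neurons = Wh.shape[0]
    state = np.zeros((batch_size, n_neurons))
    outputs = np.zeros((batch_size, n_steps, n_neurons))
    for t in range(n_steps):
        new_state = np.tanh(X[:, t] @ Wx + state @ Wh + b)
        active = (t < seq_length)[:, None]                # True while a sequence is still running
        state = np.where(active, new_state, state)        # finished sequences keep their last state
        outputs[:, t] = np.where(active, new_state, 0.0)  # and output zero vectors
    return outputs, state

rng = np.random.default_rng(22)  # arbitrary seed for this sketch
Wx = rng.standard_normal((3, 5))
Wh = rng.standard_normal((5, 5))
b = np.zeros(5)

X_batch = np.array([
    [[0, 1, 2], [9, 8, 7]],
    [[3, 4, 5], [0, 0, 0]],     # second step is padding
    [[6, 7, 8], [6, 5, 4]],
    [[9, 0, 1], [3, 2, 1]],
], dtype=float)
seq_length_batch = np.array([2, 1, 2, 2])

outputs_val, states_val = dynamic_rnn_sketch(X_batch, seq_length_batch, Wx, Wh, b)
print(outputs_val[1, 1])                                 # all zeros: past the end of the sequence
print(np.array_equal(states_val[1], outputs_val[1, 0]))  # True: final state comes from t = 0
print(np.array_equal(states_val[0], outputs_val[0, 1]))  # True: final state comes from t = 1
```
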
Hands-On Practice [1] | [Classification]

Training a Sequence Classifier
Okay, now you know how to build an RNN network (or more precisely an RNN network unrolled through time). But how do you train it?
Let’s train an RNN to classify MNIST images. We will treat each image as a sequence of 28 rows of 28 pixels each (since each MNIST image is 28 × 28 pixels). We will use cells of 150 recurrent neurons, plus a fully connected layer containing 10 neurons (one per class) connected to the output of the last time step, followed by a softmax layer.

reset_graph()
n_steps = 28
n_inputs = 28
n_neurons = 150
n_outputs = 10

learning_rate = 0.001

X = tf.placeholder(tf.float32, shape=[None, n_steps, n_inputs])
y = tf.placeholder(tf.int32, shape=[None])

basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)

logits = tf.layers.dense(states, n_outputs)
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                          logits=logits)
loss = tf.reduce_mean(xentropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
correct = tf.nn.in_top_k(logits, y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]


def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch


X_test = X_test.reshape((-1, n_steps, n_inputs))

n_epochs = 100
batch_size = 150

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            X_batch = X_batch.reshape((-1, n_steps, n_inputs))
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
        print("Epoch ==>", epoch, "Last batch accuracy:", acc_batch, "Test accuracy:", acc_test)

"""
...
Epoch ==> 95 Last batch accuracy: 0.9866667 Test accuracy: 0.9774
Epoch ==> 96 Last batch accuracy: 0.99333334 Test accuracy: 0.977
Epoch ==> 97 Last batch accuracy: 0.99333334 Test accuracy: 0.978
Epoch ==> 98 Last batch accuracy: 0.9866667 Test accuracy: 0.9808
Epoch ==> 99 Last batch accuracy: 0.9866667 Test accuracy: 0.9779
"""

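For readers who want to see what the loss and accuracy nodes compute, here is a NumPy sketch of a sparse softmax cross-entropy and a top-1 accuracy on made-up logits. The helper name sparse_softmax_xentropy and the example values are assumptions for illustration; the accuracy line matches in_top_k(..., 1) only when there are no ties among the largest logits.

```python
import numpy as np

def sparse_softmax_xentropy(logits, labels):
    """Per-instance cross-entropy between softmax(logits) and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]     # pick the log-probability of the true class

# Made-up logits for 3 instances and 3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0],
                   [1.0, 1.5, 0.0]])
labels = np.array([0, 2, 0])

loss = sparse_softmax_xentropy(logits, labels).mean()
accuracy = (logits.argmax(axis=1) == labels).mean()       # third instance is misclassified

print(loss)
print(accuracy)
```

Two of the three predictions match their labels, so the accuracy here is 2/3.
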
Training a Sequence Classifier
with Multi-layer RNN
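This version uses a multi-layer RNN with fewer neurons per layer (100 instead of 150) and far fewer epochs (10 instead of 100), while reaching comparable or slightly better test accuracy (about 98% in the runs shown).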

reset_graph()
n_steps = 28
n_inputs = 28
n_outputs = 10

learning_rate = 0.001

X = tf.placeholder(tf.float32, shape=[None, n_steps, n_inputs])
y = tf.placeholder(tf.int32, shape=[None])

# added
n_neurons = 100
n_layers = 3

layers = [tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons,
                                      activation=tf.nn.relu)
          for layer in range(n_layers)]
multi_layer_cell = tf.nn.rnn_cell.MultiRNNCell(layers)
outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)

states_concat = tf.concat(axis=1, values=states)
logits = tf.layers.dense(states_concat, n_outputs)
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                          logits=logits)
loss = tf.reduce_mean(xentropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
correct = tf.nn.in_top_k(logits, y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]


def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch


X_test = X_test.reshape((-1, n_steps, n_inputs))

n_epochs = 10
batch_size = 150

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            X_batch = X_batch.reshape((-1, n_steps, n_inputs))
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
        print("Epoch ==>", epoch, "Last batch accuracy:", acc_batch, "Test accuracy:", acc_test)

"""
...
Epoch ==> 5 Last batch accuracy: 0.98 Test accuracy: 0.9708
Epoch ==> 6 Last batch accuracy: 0.97333336 Test accuracy: 0.9766
Epoch ==> 7 Last batch accuracy: 0.9866667 Test accuracy: 0.9816
Epoch ==> 8 Last batch accuracy: 0.98 Test accuracy: 0.9729
Epoch ==> 9 Last batch accuracy: 0.9866667 Test accuracy: 0.9806
"""

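The stacking and the states_concat step can be pictured with the NumPy sketch below: each layer reads the output of the layer beneath it at the same time step, and the final states of all layers are concatenated along the feature axis. The weights and their 0.1 scale are arbitrary assumptions for this sketch, which is not TensorFlow's MultiRNNCell implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
n_inputs, n_neurons, n_layers, n_steps, batch_size = 28, 100, 3, 28, 4

# One (Wx, Wh, b) triple per layer; layer 0 reads the input, deeper layers read the layer below.
params = []
for layer in range(n_layers):
    in_size = n_inputs if layer == 0 else n_neurons
    params.append((rng.standard_normal((in_size, n_neurons)) * 0.1,
                   rng.standard_normal((n_neurons, n_neurons)) * 0.1,
                   np.zeros(n_neurons)))

X = rng.random((batch_size, n_steps, n_inputs))           # stand-in for a batch of images
states = [np.zeros((batch_size, n_neurons)) for _ in range(n_layers)]

for t in range(n_steps):
    layer_input = X[:, t]                                 # one 28-pixel row per time step
    for layer, (Wx, Wh, b) in enumerate(params):
        states[layer] = np.maximum(0.0, layer_input @ Wx + states[layer] @ Wh + b)  # ReLU
        layer_input = states[layer]                       # feeds the next layer up

states_concat = np.concatenate(states, axis=1)
print(states_concat.shape)  # (4, 300)
```

With 3 layers of 100 neurons, the dense output layer therefore sees 300 features per instance.
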
Hands-On Practice [2] | [Prediction]

Training to Predict Time Series
Using an OutputProjectionWrapper
Now let’s take a look at how to handle time series, such as stock prices, air temperature, brain wave patterns, and so on.
In this section we will train an RNN to predict the next value in a generated time series.
Each training instance is a randomly selected sequence of 20 consecutive values from the time series, and the target sequence is the same as the input sequence, except it is shifted by one time step into the future.
At each time step we now have an output vector of size 100. But what we actually want is a single output value at each time step. The simplest solution is to wrap the cell in an OutputProjectionWrapper.
The OutputProjectionWrapper adds a fully connected layer of linear neurons (i.e., without any activation function) on top of each output (but it does not affect the cell state).

t_min, t_max = 0, 30
resolution = 0.1


def time_series(t):
    return t * np.sin(t) / 3 + 2 * np.sin(t*5)


def next_batch(batch_size, n_steps):
    t0 = np.random.rand(batch_size, 1) * (t_max - t_min - n_steps * resolution)
    Ts = t0 + np.arange(0., n_steps + 1) * resolution
    ys = time_series(Ts)
    return ys[:, :-1].reshape(-1, n_steps, 1), ys[:, 1:].reshape(-1, n_steps, 1)


t = np.linspace(t_min, t_max, int((t_max - t_min) / resolution))

n_steps = 20
t_instance = np.linspace(12.2, 12.2 + resolution * (n_steps + 1), n_steps + 1)

plt.figure(figsize=(11, 4))
plt.subplot(121)
plt.title("A time series (generated)", fontsize=14)
plt.plot(t, time_series(t), label=r"$t . \sin(t) / 3 + 2 . \sin(5t)$")
plt.plot(t_instance[:-1], time_series(t_instance[:-1]), "b-", linewidth=3, label="A training instance")
plt.legend(loc="lower left", fontsize=14)
plt.axis([0, 30, -17, 13])
plt.xlabel("Time")
plt.ylabel("Value")

plt.subplot(122)
plt.title("A training instance", fontsize=14)
plt.plot(t_instance[:-1], time_series(t_instance[:-1]), "bo", markersize=10, label="instance")
plt.plot(t_instance[1:], time_series(t_instance[1:]), "c*", markersize=8, label="target")
plt.legend(loc="upper left")
plt.xlabel("Time")
save_fig("time_series_plot")
plt.show()

X_batch, y_batch = next_batch(1, n_steps)
print(np.c_[X_batch[0], y_batch[0]])

reset_graph()
n_steps = 20
n_inputs = 1
n_neurons = 100
n_outputs = 1

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu),
    output_size=n_outputs)
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

learning_rate = 0.001

loss = tf.reduce_mean(tf.square(outputs - y))   # MSE
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

saver = tf.train.Saver()
n_iterations = 1500
batch_size = 50

with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch, y_batch = next_batch(batch_size, n_steps)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print("iteration ==>", iteration, "\tMSE", mse)

    path = os.path.join(PROJECT_ROOT_DIR, "model_saved", CHARTER_ID)
    if not os.path.exists(path):
        os.makedirs(path)
    model_id = "my_time_series_model"
    model_path = os.path.join(path, model_id)
    saver.save(sess, model_path)

with tf.Session() as sess:
    saver.restore(sess, model_path)

    X_new = time_series(np.array(t_instance[:-1].reshape(-1, n_steps, n_inputs)))
    y_pred = sess.run(outputs, feed_dict={X: X_new})

print(y_pred)
plt.title("Testing the model", fontsize=14)
plt.plot(t_instance[:-1], time_series(t_instance[:-1]), "bo", markersize=10, label="instance")
plt.plot(t_instance[1:], time_series(t_instance[1:]), "c*", markersize=8, label="target")
plt.plot(t_instance[1:], y_pred[0, :, 0], "r.", markersize=10, label="prediction")
plt.legend(loc="upper left")
plt.xlabel("Time")

save_fig("time_series_pred_plot")
plt.show()
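
The "shifted by one time step" relationship between inputs and targets can be verified directly on a batch produced by next_batch(). The sketch below repeats the two helper functions from this section so that it runs on its own; the seed and the batch size of 3 are arbitrary choices.

```python
import numpy as np

t_min, t_max, resolution = 0, 30, 0.1

def time_series(t):
    return t * np.sin(t) / 3 + 2 * np.sin(t * 5)

def next_batch(batch_size, n_steps):
    # Random start times, then n_steps + 1 consecutive samples per instance.
    t0 = np.random.rand(batch_size, 1) * (t_max - t_min - n_steps * resolution)
    Ts = t0 + np.arange(0., n_steps + 1) * resolution
    ys = time_series(Ts)
    # Inputs drop the last sample, targets drop the first one.
    return ys[:, :-1].reshape(-1, n_steps, 1), ys[:, 1:].reshape(-1, n_steps, 1)

np.random.seed(22)
X_batch, y_batch = next_batch(batch_size=3, n_steps=20)

print(X_batch.shape, y_batch.shape)                      # (3, 20, 1) (3, 20, 1)
print(np.array_equal(X_batch[:, 1:], y_batch[:, :-1]))   # True: target at step k is input at step k + 1
```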

Training to Predict Time Series
Without Using an OutputProjectionWrapper
Although using an OutputProjectionWrapper is the simplest solution to reduce the dimensionality of the RNN’s output sequences down to just one value per time step (per instance), it is not the most efficient.
There is a trickier but more efficient solution: you can reshape the RNN outputs from [batch_size, n_steps, n_neurons] to [batch_size * n_steps, n_neurons], then apply a single fully connected layer with the appropriate output size (in our case just 1), which will result in an output tensor of shape [batch_size * n_steps, n_outputs], and then reshape this tensor to [batch_size, n_steps, n_outputs].

t_min, t_max = 0, 30
resolution = 0.1


def time_series(t):
    return t * np.sin(t) / 3 + 2 * np.sin(t*5)


def next_batch(batch_size, n_steps):
    t0 = np.random.rand(batch_size, 1) * (t_max - t_min - n_steps * resolution)
    Ts = t0 + np.arange(0., n_steps + 1) * resolution
    ys = time_series(Ts)
    return ys[:, :-1].reshape(-1, n_steps, 1), ys[:, 1:].reshape(-1, n_steps, 1)


t = np.linspace(t_min, t_max, int((t_max - t_min) / resolution))

n_steps = 20
t_instance = np.linspace(12.2, 12.2 + resolution * (n_steps + 1), n_steps + 1)

reset_graph()
n_inputs = 1
n_neurons = 100
n_outputs = 1

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu)
rnn_outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

learning_rate = 0.001

stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])
stacked_outputs = tf.layers.dense(stacked_rnn_outputs, n_outputs)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

loss = tf.reduce_mean(tf.square(outputs - y))       # MSE
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_iterations = 1500
batch_size = 50


with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch, y_batch = next_batch(batch_size, n_steps)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print("iteration ==>", iteration, "\tMSE", mse)

    path = os.path.join(PROJECT_ROOT_DIR, "model_saved", CHARTER_ID)
    if not os.path.exists(path):
        os.makedirs(path)
    model_id = "my_time_series_model_using_stack_trick"
    model_path = os.path.join(path, model_id)
    saver.save(sess, model_path)

with tf.Session() as sess:
    saver.restore(sess, model_path)

    X_new = time_series(np.array(t_instance[:-1].reshape(-1, n_steps, n_inputs)))
    y_pred = sess.run(outputs, feed_dict={X: X_new})

print(np.c_[time_series(np.array(t_instance[1:].reshape(-1, n_steps, n_inputs))), y_pred])  # target and pred

plt.title("Testing the model", fontsize=14)
plt.plot(t_instance[:-1], time_series(t_instance[:-1]), "bo", markersize=10, label="instance")
plt.plot(t_instance[1:], time_series(t_instance[1:]), "c*", markersize=8, label="target")
plt.plot(t_instance[1:], y_pred[0, :, 0], "r.", markersize=10, label="prediction")
plt.legend(loc="upper left")
plt.xlabel("Time")

save_fig("time_series_pred_plot_using_stack_trick")
plt.show()
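
The reshape trick gives the same numbers as projecting every time step separately, because the same dense weights are applied to each step. The NumPy sketch below checks this on random data; the arrays stand in for rnn_outputs and for the dense layer's weights and are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
batch_size, n_steps, n_neurons, n_outputs = 50, 20, 100, 1

rnn_outputs = rng.standard_normal((batch_size, n_steps, n_neurons))  # stand-in for the RNN outputs
W = rng.standard_normal((n_neurons, n_outputs))                      # shared projection weights
b = np.zeros(n_outputs)

# Variant 1: project each time step separately (one small matmul per step).
per_step = np.stack([rnn_outputs[:, t] @ W + b for t in range(n_steps)], axis=1)

# Variant 2: the reshape trick (one large matmul over all steps at once).
stacked_rnn_outputs = rnn_outputs.reshape(-1, n_neurons)             # (batch * n_steps, n_neurons)
stacked_outputs = stacked_rnn_outputs @ W + b                        # (batch * n_steps, n_outputs)
outputs = stacked_outputs.reshape(-1, n_steps, n_outputs)            # (batch, n_steps, n_outputs)

print(outputs.shape)                   # (50, 20, 1)
print(np.allclose(per_step, outputs))  # True
```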

Hands-On Practice [3] | [Applying Dropout]

Applying Dropout
If you build a very deep RNN, it may end up overfitting the training set. To prevent that, a common technique is to apply dropout.
You can simply add a dropout layer before or after the RNN as usual, but if you also want to apply dropout between the RNN layers, you need to use a DropoutWrapper.
Note: the input_keep_prob parameter can be a placeholder, making it possible to set it to any value you want during training, and to 1.0 during testing (effectively turning dropout off).

t_min, t_max = 0, 30
resolution = 0.1


def time_series(t):
    return t * np.sin(t) / 3 + 2 * np.sin(t*5)


def next_batch(batch_size, n_steps):
    t0 = np.random.rand(batch_size, 1) * (t_max - t_min - n_steps * resolution)
    Ts = t0 + np.arange(0., n_steps + 1) * resolution
    ys = time_series(Ts)
    return ys[:, :-1].reshape(-1, n_steps, 1), ys[:, 1:].reshape(-1, n_steps, 1)


reset_graph()
n_inputs = 1
n_neurons = 100
n_layers = 3
n_steps = 20
n_outputs = 1

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

keep_prob = tf.placeholder_with_default(1.0, shape=())
cells = [tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
         for layer in range(n_layers)]
cells_drop = [tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=keep_prob)
              for cell in cells]
multi_layer_cell = tf.nn.rnn_cell.MultiRNNCell(cells_drop)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)

learning_rate = 0.01

stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])
stacked_outputs = tf.layers.dense(stacked_rnn_outputs, n_outputs)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_iterations = 1500
batch_size = 50
train_keep_prob = 0.5

with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch, y_batch = next_batch(batch_size, n_steps)
        _, mse = sess.run([training_op, loss],
                          feed_dict={X: X_batch, y: y_batch,
                                     keep_prob: train_keep_prob})
        if iteration % 100 == 0:
            print(iteration, "Training MSE:", mse)

    path = os.path.join(PROJECT_ROOT_DIR, "model_saved", CHARTER_ID)
    if not os.path.exists(path):
        os.makedirs(path)
    model_id = "my_dropout_time_series_model"
    model_path = os.path.join(path, model_id)
    saver.save(sess, model_path)

t_instance = np.linspace(12.2, 12.2 + resolution * (n_steps + 1), n_steps + 1)

with tf.Session() as sess:
    path = os.path.join(PROJECT_ROOT_DIR, "model_saved", CHARTER_ID)
    model_id = "my_dropout_time_series_model"
    model_path = os.path.join(path, model_id)
    saver.restore(sess, model_path)

    X_new = time_series(np.array(t_instance[:-1].reshape(-1, n_steps, n_inputs)))
    y_pred = sess.run(outputs, feed_dict={X: X_new})

plt.title("Testing the model", fontsize=14)
plt.plot(t_instance[:-1], time_series(t_instance[:-1]), "bo", markersize=10, label="instance")
plt.plot(t_instance[1:], time_series(t_instance[1:]), "c*", markersize=8, label="target")
plt.plot(t_instance[1:], y_pred[0, :, 0], "r.", markersize=10, label="prediction")
plt.legend(loc="upper left")
plt.xlabel("Time")
plt.show()
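
What a keep probability means in practice can be shown with a few lines of NumPy. This is a sketch of the usual "inverted dropout" scheme (kept values are scaled up by 1 / keep_prob), written as a hypothetical helper for this note rather than as the DropoutWrapper implementation.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Zero each element with probability 1 - keep_prob and rescale the rest."""
    if keep_prob >= 1.0:
        return x                                  # dropout switched off, e.g. at test time
    mask = rng.random(x.shape) < keep_prob        # True for the elements that are kept
    return x * mask / keep_prob                   # rescale so the expected value is unchanged

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
x = np.ones((2, 6))

train_out = dropout(x, keep_prob=0.5, rng=rng)    # a mix of 0.0 and 2.0
test_out = dropout(x, keep_prob=1.0, rng=rng)     # identical to x

print(train_out)
print(np.array_equal(test_out, x))  # True
```

This is why the prediction code above can simply leave keep_prob at its default of 1.0.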

Hands-On Practice [4] | [LSTM] ⏰

Using LSTM Cell
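To train an RNN on long sequences, you will need to run it over many time steps, making the unrolled RNN a very deep network. Like any deep neural network, it may suffer from the vanishing/exploding gradients problem and take a very long time to train.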
Many of the tricks we discussed to alleviate this problem can be used for deep unrolled RNNs as well: good parameter initialization, nonsaturating activation functions (e.g., ReLU), Batch Normalization, Gradient Clipping, and faster optimizers. However, if the RNN needs to handle even moderately long sequences (e.g., 100 inputs), then training will still be very slow.
Besides the long training time, a second problem faced by long-running RNNs is the fact that the memory of the first inputs gradually fades away.
If you consider the LSTM cell as a black box, it can be used very much like a basic cell, except it will perform much better; training will converge faster and it will detect long-term dependencies in the data.

Two other variants (an LSTM cell with peephole connections, and a GRU cell):
lstm_cell = tf.nn.rnn_cell.LSTMCell(num_units=n_neurons, use_peepholes=True)
gru_cell = tf.nn.rnn_cell.GRUCell(num_units=n_neurons)

reset_graph()
n_steps = 28
n_inputs = 28
n_neurons = 150
n_outputs = 10
n_layers = 3
learning_rate = 0.001

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.int32, [None])

lstm_cell = [tf.nn.rnn_cell.BasicLSTMCell(num_units=n_neurons)
             for layer in range(n_layers)]
multi_cell = tf.nn.rnn_cell.MultiRNNCell(lstm_cell)
outputs, states = tf.nn.dynamic_rnn(multi_cell, X, dtype=tf.float32)
top_layer_h_state = states[-1][1]
logits = tf.layers.dense(top_layer_h_state, n_outputs, name="softmax")
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
loss = tf.reduce_mean(xentropy, name="loss")
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
correct = tf.nn.in_top_k(logits, y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()
print(states)
print(top_layer_h_state)

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]


def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch


n_epochs = 10
batch_size = 150
# added
X_test = X_test.reshape(-1, n_steps, n_inputs)

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            X_batch = X_batch.reshape((-1, n_steps, n_inputs))
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
        print(epoch, "Last batch accuracy:", acc_batch, "Test accuracy:", acc_test)
"""
0 Last batch accuracy: 0.96 Test accuracy: 0.9496
1 Last batch accuracy: 0.9533333 Test accuracy: 0.968
2 Last batch accuracy: 0.99333334 Test accuracy: 0.9781
3 Last batch accuracy: 0.9866667 Test accuracy: 0.9789
4 Last batch accuracy: 0.97333336 Test accuracy: 0.9824
5 Last batch accuracy: 1.0 Test accuracy: 0.9866
6 Last batch accuracy: 0.97333336 Test accuracy: 0.9835
7 Last batch accuracy: 1.0 Test accuracy: 0.9867
8 Last batch accuracy: 0.9866667 Test accuracy: 0.9878
9 Last batch accuracy: 0.99333334 Test accuracy: 0.9868
"""

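To look inside the black box a little, here is a NumPy sketch of a single LSTM step using the standard gate equations. It is an illustration under assumptions: the weight layout (one matrix for all four gates), the gate order and the random initialisation are choices made for this sketch, and details such as BasicLSTMCell's forget bias are left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step: x is [batch, n_inputs], h and c are [batch, n_neurons]."""
    z = np.concatenate([x, h], axis=1) @ W + b     # all four gates in one matmul
    i, g, f, o = np.split(z, 4, axis=1)            # input gate, candidate, forget gate, output gate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # long-term state
    h_new = sigmoid(o) * np.tanh(c_new)                # short-term state, also the cell's output
    return h_new, c_new

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
batch_size, n_inputs, n_neurons, n_steps = 4, 28, 150, 28

W = rng.standard_normal((n_inputs + n_neurons, 4 * n_neurons)) * 0.1
b = np.zeros(4 * n_neurons)
X = rng.random((batch_size, n_steps, n_inputs))    # stand-in for a batch of images

h = np.zeros((batch_size, n_neurons))
c = np.zeros((batch_size, n_neurons))
for t in range(n_steps):
    h, c = lstm_step(X[:, t], h, c, W, b)

print(h.shape, c.shape)  # (4, 150) (4, 150)
```

In the TensorFlow code above, each layer's state is a (c, h) pair, so states[-1][1] selects the h state of the top layer.
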
Hands-On Practice [5] | [Word Embeddings]

Natural Language Processing
Word Embeddings
Most of the state-of-the-art NLP applications, such as machine translation, automatic summarization, parsing, sentiment analysis, and more, are now based (at least in part) on RNNs.

from six.moves import urllib
import errno
import zipfile
from collections import Counter

WORDS_PATH = "datasets/words"
WORDS_URL = 'http://mattmahoney.net/dc/text8.zip'


def fetch_words_data(words_url=WORDS_URL, words_path=WORDS_PATH):
    os.makedirs(words_path, exist_ok=True)
    zip_path = os.path.join(words_path, "words.zip")
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(words_url, zip_path)
    with zipfile.ZipFile(zip_path) as f:
        data = f.read(f.namelist()[0])
    return data.decode("ascii").split()


words = fetch_words_data()
print(words[:5])        # ['anarchism', 'originated', 'as', 'a', 'term']
print(len(words))       # 17005207

vocabulary_size = 50000
# keep the (vocabulary_size - 1) most common words; index 0 ("UNK") stands for every other (unknown) word
vocabulary = [("UNK", None)] + Counter(words).most_common(vocabulary_size - 1)
vocabulary = np.array([word for word, _ in vocabulary])
dictionary = {word: code for code, word in enumerate(vocabulary)}
data = np.array([dictionary.get(word, 0) for word in words])

print((" ".join(words[:9]), data[:9]))
# ('anarchism originated as a term of abuse first used', array([5234, 3081,   12,    6,  195,    2, 3134,   46,   59]))
# Note: the first index below is 5241, not 5234 as in data[:9], so it decodes to a different word
print(" ".join([vocabulary[word_index] for word_index in [5241, 3081, 12, 6, 195, 2, 3134, 46, 59]]))
# cycles originated as a term of abuse first used
print((words[24], data[24]))
# ('culottes', 0)
# 'culottes' is not in the vocabulary, so it is mapped to index 0 (UNK)
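
To make the encoding step above easier to follow, here is a minimal sketch of the same idea on a tiny corpus. The names `toy_words`, `toy_vocabulary_size`, `toy_vocab`, `toy_dictionary` and `toy_data` are made up for illustration and are not part of the original notebook; only the `Counter(...).most_common(...)` pattern is taken from the code above.

```python
from collections import Counter

# Hypothetical toy corpus standing in for the text8 word list.
toy_words = "the cat sat on the mat the cat ran".split()
toy_vocabulary_size = 3  # "UNK" plus the 2 most frequent words

# Index 0 is reserved for "UNK"; the remaining slots go to the most common words.
toy_vocab = ["UNK"] + [w for w, _ in Counter(toy_words).most_common(toy_vocabulary_size - 1)]
toy_dictionary = {word: code for code, word in enumerate(toy_vocab)}

# Words outside the vocabulary fall back to index 0 via dict.get(word, 0).
toy_data = [toy_dictionary.get(word, 0) for word in toy_words]

print(toy_vocab)  # ['UNK', 'the', 'cat']
print(toy_data)   # [1, 2, 0, 0, 1, 0, 1, 2, 0]
```

Every word that did not make the cut ("sat", "on", "mat", "ran") collapses to 0, which is the same thing that happens to 'culottes' in the full dataset.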

# Generate batches
from collections import deque


def generate_batch(batch_size, num_skips, skip_window):
    global data_index
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    batch = np.ndarray(shape=[batch_size], dtype=np.int32)
    labels = np.ndarray(shape=[batch_size, 1], dtype=np.int32)
    span = 2 * skip_window + 1  # [ skip_window target skip_window ]
    buffer = deque(maxlen=span)
    for _ in range(span):
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    for i in range(batch_size // num_skips):
        target = skip_window    # target label at the center of the buffer
        targets_to_avoid = [skip_window]
        for j in range(num_skips):
            while target in targets_to_avoid:
                target = np.random.randint(0, span)
            targets_to_avoid.append(target)
            batch[i * num_skips + j] = buffer[skip_window]
            labels[i * num_skips + j, 0] = buffer[target]
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    return batch, labels


np.random.seed(22)
data_index = 0
batch, labels = generate_batch(8, 2, 1)
print((batch, [vocabulary[word] for word in batch]))
print((labels, [vocabulary[word] for word in labels[:, 0]]))
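
The (input, label) pairs that `generate_batch` produces are skip-gram pairs: each center word is paired with words inside its window. The sketch below is a simplified, deterministic variant written only for illustration. It is not the function above: `generate_batch` samples `num_skips` random context positions per center word and wraps around the data through `data_index`, while this hypothetical `skipgram_pairs` helper lists every context word and simply truncates the window at the sequence edges.

```python
def skipgram_pairs(token_ids, skip_window=1):
    """Return every (center, context) pair within skip_window positions."""
    pairs = []
    for center_pos, center in enumerate(token_ids):
        lo = max(0, center_pos - skip_window)
        hi = min(len(token_ids), center_pos + skip_window + 1)
        for context_pos in range(lo, hi):
            if context_pos != center_pos:  # a word is never its own context
                pairs.append((center, token_ids[context_pos]))
    return pairs


# Hypothetical word ids, chosen only for readability.
toy_ids = [10, 11, 12, 13]
pairs = skipgram_pairs(toy_ids, skip_window=1)
print(pairs)
# [(10, 11), (11, 10), (11, 12), (12, 11), (12, 13), (13, 12)]
```

With `skip_window=1` and `num_skips=2`, `generate_batch` emits both neighbors of each center word as well, only in a random order.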

# Build the model
batch_size = 128
embedding_size = 128  # Dimension of the embedding vector.
skip_window = 1       # How many words to consider left and right.
num_skips = 2         # How many times to reuse an input to generate a label.

# We pick a random validation set to sample nearest neighbors. Here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent.
valid_size = 16     # Random set of words to evaluate similarity on.
valid_window = 100  # Only pick dev samples in the head of the distribution.
valid_examples = np.random.choice(valid_window, valid_size, replace=False)
num_sampled = 64    # Number of negative examples to sample.

learning_rate = 0.01

reset_graph()

# Input Data
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

# Look up embeddings for inputs.
init_embeds = tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0)
embeddings = tf.Variable(init_embeds)

train_inputs = tf.placeholder(tf.int32, shape=[None])       # from ids...
embed = tf.nn.embedding_lookup(embeddings, train_inputs)    # ...to embeddings
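
Conceptually, for a single embedding matrix like the one used here, the lookup above selects one row per word id. The following pure-Python sketch shows that idea only; `toy_table`, `lookup` and `toy_batch` are hypothetical names, and the real op works on tensors inside the graph.

```python
# Hypothetical 4-word vocabulary with 3-dimensional embeddings.
toy_table = [
    [0.1, 0.2, 0.3],  # id 0
    [0.4, 0.5, 0.6],  # id 1
    [0.7, 0.8, 0.9],  # id 2
    [1.0, 1.1, 1.2],  # id 3
]


def lookup(table, ids):
    """Return one embedding row per id, in the order the ids are given."""
    return [table[i] for i in ids]


toy_batch = lookup(toy_table, [2, 0, 2])
print(toy_batch)
# [[0.7, 0.8, 0.9], [0.1, 0.2, 0.3], [0.7, 0.8, 0.9]]
```

A batch of `batch_size` ids therefore becomes a `[batch_size, embedding_size]` matrix, and repeated ids simply return the same row more than once.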

# Construct the variables for the NCE loss
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / np.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

# Compute the average NCE loss for the batch.
# tf.nn.nce_loss automatically draws a new sample of the negative labels each
# time we evaluate the loss.
loss = tf.reduce_mean(
    tf.nn.nce_loss(nce_weights, nce_biases, train_labels, embed,
                   num_sampled, vocabulary_size))

# Construct the Adam optimizer
optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)

# Compute the cosine similarity between minibatch examples and all embeddings.
norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), axis=1, keepdims=True))
normalized_embeddings = embeddings / norm
valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
similarity = tf.matmul(valid_embeddings, normalized_embeddings, transpose_b=True)

# Add variable initializer.
init = tf.global_variables_initializer()

# Train the model
num_steps = 10001

with tf.Session() as session:
    init.run()
    average_loss = 0
    for step in range(num_steps):
        print("\rIteration: {}".format(step), end="\t")
        batch_inputs, batch_labels = generate_batch(batch_size, num_skips, skip_window)
        feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}

        # We perform one update step by evaluating the training op (including it
        # in the list of returned values for session.run()).
        _, loss_val = session.run([training_op, loss], feed_dict=feed_dict)
        average_loss += loss_val

        if step % 2000 == 0:
            if step > 0:
                average_loss /= 2000
            # The average loss is an estimate of the loss over the last 2000 batches.
            print("Average loss at step ", step, ": ", average_loss)
            average_loss = 0

        # Note that this is expensive (~20% slowdown if computed every 500 steps)
        if step % 10000 == 0:
            sim = similarity.eval()
            for i in range(valid_size):
                valid_word = vocabulary[valid_examples[i]]
                top_k = 8  # number of nearest neighbors
                nearest = (-sim[i, :]).argsort()[1:top_k + 1]
                log_str = "Nearest to %s:" % valid_word
                for k in range(top_k):
                    close_word = vocabulary[nearest[k]]
                    log_str = "%s %s," % (log_str, close_word)
                print(log_str)

    final_embeddings = normalized_embeddings.eval()

np.save("./my_final_embeddings.npy", final_embeddings)
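
The nearest-neighbor report in the training loop relies on cosine similarity: once every embedding is scaled to unit length, a dot product between two of them is their cosine similarity. Below is a minimal pure-Python sketch of that computation, assuming a made-up `toy_embeddings` list; `normalize` and `nearest_neighbors` are hypothetical helpers, not part of the original code. Unlike the `argsort()[1:top_k + 1]` slice above, which assumes the query word itself ranks first, this sketch filters the query index out explicitly.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def nearest_neighbors(embeddings, query_index, top_k):
    """Indices of the top_k embeddings most cosine-similar to the query."""
    unit = [normalize(v) for v in embeddings]
    query = unit[query_index]
    # Dot products of unit vectors are cosine similarities.
    sims = [sum(a * b for a, b in zip(query, v)) for v in unit]
    order = sorted(range(len(sims)), key=lambda i: -sims[i])
    return [i for i in order if i != query_index][:top_k]


# Hypothetical 2-D embeddings for four words.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
neighbors = nearest_neighbors(toy_embeddings, query_index=0, top_k=2)
print(neighbors)  # [1, 2]
```

Word 1 points in almost the same direction as word 0, so it comes first; word 3 points the opposite way and ranks last.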

# Plot the embeddings


def plot_with_labels(low_dim_embs, labels):
    assert low_dim_embs.shape[0] >= len(labels), "More labels than embeddings"
    plt.figure(figsize=(18, 18))    # in inches
    for i, label in enumerate(labels):
        x, y = low_dim_embs[i, :]
        plt.scatter(x, y)
        plt.annotate(label,
                     xy=(x, y),
                     xytext=(5, 2),
                     textcoords='offset points',
                     ha='right',
                     va='bottom')
        plt.draw()
    save_fig("word embedding visualization")
    plt.show()


from sklearn.manifold import TSNE

final_embeddings = np.load("./my_final_embeddings.npy")
# Note: recent scikit-learn releases rename the n_iter argument to max_iter
tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
plot_only = 500
low_dim_embs = tsne.fit_transform(final_embeddings[:plot_only, :])
labels = [vocabulary[i] for i in range(plot_only)]
plot_with_labels(low_dim_embs, labels)

The End

Congratulations!