15_2_Sequential data 10 values Using RNNs_1D CNNs_GRU_wavenet_get_file_glob_readlines_colab_sketch outlines quickdraw_chorales_window

15_Processing Sequences Using RNNs_naive_linear_CNNs predicting 10 values of sequential data_scalar_plt.sca_labelpad_curve_Layer Normal_TimeDistributed_LSTM_GRU
https://blog.csdn.net/Linli522362242/article/details/114941730

Using 1D convolutional layers to process sequences

     In Chapter 14 (https://blog.csdn.net/Linli522362242/article/details/108302266), we saw that a 2D convolutional layer works by sliding several fairly small kernels (or filters) across an image, producing multiple 2D feature maps (one per kernel). Similarly, a 1D convolutional layer slides several kernels across a sequence, producing a 1D feature map per kernel. Each kernel will learn to detect a single very short sequential pattern (no longer than the kernel size).

  • If you use 10 kernels, then the layer’s output will be composed of 10 1-dimensional sequences (all of the same length), or equivalently you can view this output as a single 10-dimensional sequence. This means that you can build a neural network composed of a mix of recurrent layers and 1D convolutional layers (or even 1D pooling layers).
  • If you use a 1D convolutional layer with a stride of 1 and "same" padding, then the output sequence will have the same length as the input sequence.
  • But if you use "valid" padding or a stride greater than 1, then the output sequence will be shorter than the input sequence, so make sure you adjust the targets accordingly.
          For example, the following model is the same as earlier, except it starts with a 1D convolutional layer that downsamples the input sequence by a factor of 2, using a stride of 2. The kernel size (kernel_size=4) is larger than the stride, so all inputs will be used to compute the layer’s output, and therefore the model can learn to preserve the useful information, dropping only the unimportant details. With "valid" padding ($p = 0$), the output length is $L_{out} = \lfloor (L_{in} + 2p - k)/s \rfloor + 1 = \lfloor (50 - 4)/2 \rfloor + 1 = 24$ time steps. By shortening the sequences, the convolutional layer may help the GRU layers detect longer patterns. Note that we must also crop off the first three time steps in the targets: since the kernel’s size is 4, the first output of the convolutional layer is based on the input time steps 0 to 3, so its target is Y[:, 3] (the values at time steps 4 to 13), i.e., the targets start at index kernel_size − 1 = 3. We must also downsample the targets by a factor of 2 to match the stride, which gives Y_train[:, 3::2]:
    import tensorflow as tf
    from tensorflow import keras
    import numpy as np
    
    np.random.seed(42)
    tf.random.set_seed(42)
    
    model = keras.models.Sequential([
        # 1D convolutional layer slides several kernels across a sequence, producing a 1D
        # feature map per kernel. Each kernel will learn to detect a single very short
        # sequential pattern (no longer than the kernel size)
        keras.layers.Conv1D( filters=20, kernel_size=4, strides=2, padding="valid",
                             input_shape=[None, 1]), # univariate series of any length: [time steps, 1 feature]
        keras.layers.GRU( 20, return_sequences=True ),
        keras.layers.GRU( 20, return_sequences=True ), # return_sequences=True==> at each and every time step
        keras.layers.TimeDistributed( keras.layers.Dense(10) ) # forecast the next 10 values
    ])
    
    # Y_valid.shape : (2000, 50 time steps, 10)
    def last_time_step_mse( Y_true, Y_pred):  # ":" represents all instances, "-1" is last time step
        return keras.metrics.mean_squared_error( Y_true[:, -1], Y_pred[:, -1] )
    
    model.compile( loss="mse", optimizer="adam", metrics=[last_time_step_mse] )
    history = model.fit( X_train, Y_train[:, 3::2], epochs=20,   # targets start at kernel_size-1=3, keep every 2nd step (strides=2)
                         validation_data = (X_valid, Y_valid[:, 3::2])
                       )
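As a sanity check on the target cropping above, the following plain NumPy sketch (independent of Keras) computes the output length of a strided 1D convolution and lists which input window each output step sees. The helper names `conv1d_output_length` and `window_bounds` are hypothetical; it assumes, as in the previous post, that Y[:, t] holds the 10 values following time step t.

```python
import numpy as np

def conv1d_output_length(n_steps, kernel_size, strides, padding="valid"):
    """Number of output time steps of a 1D convolution (no dilation)."""
    if padding == "valid":
        return (n_steps - kernel_size) // strides + 1
    if padding == "same":
        return -(-n_steps // strides)   # ceil(n_steps / strides)
    raise ValueError(f"unsupported padding: {padding}")

def window_bounds(n_steps, kernel_size, strides):
    """(first, last) input index seen by each output step of a 'valid' convolution."""
    n_out = conv1d_output_length(n_steps, kernel_size, strides)
    starts = np.arange(n_out) * strides
    return np.stack([starts, starts + kernel_size - 1], axis=1)

bounds = window_bounds(50, kernel_size=4, strides=2)
print(conv1d_output_length(50, 4, 2))   # 24
print(bounds[:3].tolist())              # [[0, 3], [2, 5], [4, 7]]

# Output step i sees inputs up to bounds[i, 1], so its target is Y[:, bounds[i, 1]]:
# indices 3, 5, ..., 49, exactly the time steps kept by Y_train[:, 3::2]
print(np.array_equal(bounds[:, 1], np.arange(50)[3::2]))   # True
```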


    If you train and evaluate this model, you will find that it is the best model so far. The convolutional layer really helps. In fact, it is actually possible to use only 1D convolutional layers and drop the recurrent layers entirely!

    plot_learning_curves(history.history["loss"], history.history["val_loss"])
    
    plt.show()

    np.random.seed(43)
    
    series = generate_time_series(1, 50+10)
    X_new, Y_new = series[:, :50, :], series[:, 50:, :]
    Y_pred = model.predict( X_new )[:, -1][..., np.newaxis] # the 10 values forecast at the last time step
    Y_pred                               # shape: [batch size, 10 time steps, 1 feature]

    plot_multiple_forecasts(X_new, Y_new, Y_pred)
    
    plt.show()

WaveNet

     In a 2016 paper (Aaron van den Oord et al., “WaveNet: A Generative Model for Raw Audio,” arXiv preprint arXiv:1609.03499 (2016)), Aaron van den Oord and other DeepMind researchers introduced an architecture called WaveNet. They stacked 1D convolutional layers, doubling the dilation rate (how spread apart each neuron’s inputs are) at every layer: the first convolutional layer gets a glimpse of just two time steps at a time, while the next one sees four time steps (its receptive field is four time steps long), the next one sees eight time steps, and so on (see Figure 15-11). This way, the lower layers learn short-term patterns, while the higher layers learn long-term patterns. Thanks to the doubling dilation rate, the network can process extremely long sequences very efficiently.
Figure 15-11. WaveNet architecture
https://deepmind.com/blog/article/wavenet-generative-model-raw-audio
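The receptive-field growth described above can be checked with a little arithmetic: in a stack of causal convolutions with kernel size k, a layer with dilation rate d extends the receptive field by (k − 1) × d time steps, so the total is 1 + Σ (k − 1) × dᵢ. The helper `receptive_field` below is a hypothetical name used only for this sketch.

```python
def receptive_field(dilation_rates, kernel_size=2):
    """Receptive field (in time steps) of a stack of dilated causal 1D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilation_rates)

# Layer-by-layer growth with doubling dilation rates 1, 2, 4, ..., 512
rates = [2**i for i in range(10)]
growth = [receptive_field(rates[:n]) for n in range(1, len(rates) + 1)]
print(growth)   # [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

# Three stacked blocks of 10 layers, as in the paper
print(receptive_field(rates * 3))           # 3070
# The simplified model in this post: dilation rates (1, 2, 4, 8) * 2
print(receptive_field((1, 2, 4, 8) * 2))    # 31
```

A single 10-layer block therefore covers 1,024 time steps, which is the "kernel of size 1,024" equivalence mentioned in the next paragraph.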
     In the WaveNet paper, the authors actually stacked 10 convolutional layers with dilation rates of 1, 2, 4, 8, …, 256, 512, then they stacked another group of 10 identical layers (also with dilation rates 1, 2, 4, 8, …, 256, 512), then again another identical group of 10 layers. They justified this architecture by pointing out that a single stack of 10 convolutional layers with these dilation rates will act like a super-efficient convolutional layer with a kernel of size 1,024 (except way faster, more powerful, and using significantly fewer parameters), which is why they stacked 3 such blocks. They also left-padded the input sequences with a number of zeros equal to the dilation rate before every layer, to preserve the same sequence length throughout the network. Here is how to implement a simplified WaveNet to tackle the same sequences as earlier (The complete WaveNet uses a few more tricks, such as skip connections like in a ResNet, and Gated Activation Units similar to those found in a GRU cell. Please see the notebook for more details.):

X_valid.shape, Y_valid.shape

 

np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential()
model.add( keras.layers.InputLayer( input_shape=[None, 1]) )
for rate in (1,2,4,8)*2: # (1, 2, 4, 8, 1, 2, 4, 8)
    # padding="causal"
    # this ensures that the convolutional layer does not peek into the future when making predictions 
    # (it is equivalent to padding the inputs with the right amount of zeros on the left and using "valid" padding).
    model.add(keras.layers.Conv1D( filters=20, kernel_size=2, padding="causal", 
                                   activation="relu", dilation_rate=rate
                                 )
             )
model.add( keras.layers.Conv1D(filters=10, kernel_size=1) ) #default padding="valid",strides=1,
model.compile( loss="mse", optimizer="adam", metrics=[last_time_step_mse] )
history = model.fit( X_train, Y_train, epochs=20,
                     validation_data=(X_valid, Y_valid)
                   )    

     This Sequential model starts with an explicit input layer (this is simpler than trying to set input_shape only on the first layer), then continues with a 1D convolutional layer using "causal" padding: this ensures that the convolutional layer does not peek into the future when making predictions (it is equivalent to padding the inputs with the right amount of zeros on the left and using "valid" padding). We then add similar layers using growing dilation rates: 1, 2, 4, 8, and again 1, 2, 4, 8. Finally, we add the output layer: a convolutional layer with 10 filters of size 1 and without any activation function. Thanks to the causal padding, every convolutional layer outputs a sequence of the same length as the input sequences, so the targets we use during training can be the full sequences: no need to crop them or downsample them.
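To see why "causal" padding keeps the sequence length and never looks ahead, here is a single-filter NumPy sketch that implements it exactly as described: left-pad with (kernel_size − 1) × dilation_rate zeros, then apply a "valid" convolution. Like Keras Conv1D it computes a cross-correlation (no kernel flip). `causal_conv1d` is a hypothetical helper for illustration, not a Keras function.

```python
import numpy as np

def causal_conv1d(x, w, dilation_rate=1):
    """Single-channel dilated causal 1D convolution (cross-correlation).

    output[t] = sum_j w[j] * x[t - (len(w) - 1 - j) * dilation_rate],
    with x treated as 0 before time step 0.
    """
    k = len(w)
    pad = (k - 1) * dilation_rate
    x_padded = np.concatenate([np.zeros(pad), x])   # zeros on the LEFT only
    # 'valid' convolution over the padded sequence: one output per original step
    return np.array([
        sum(w[j] * x_padded[t + j * dilation_rate] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(1.0, 9.0)        # [1, 2, ..., 8]
w = np.array([1.0, 10.0])      # kernel_size = 2
y = causal_conv1d(x, w, dilation_rate=2)
print(y.tolist())   # [10.0, 20.0, 31.0, 42.0, 53.0, 64.0, 75.0, 86.0]

# Changing future inputs does not change earlier outputs
x_future_changed = x.copy()
x_future_changed[5:] = -100.0
print(np.allclose(causal_conv1d(x_future_changed, w, 2)[:5], y[:5]))   # True
```

Each output combines x[t] with x[t − 2] (dilation rate 2), and the output has the same length as the input.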

plot_learning_curves(history.history["loss"], history.history["val_loss"])

plt.show()

np.random.seed(43)

series = generate_time_series(1, 50+10)
X_new, Y_new = series[:, :50, :], series[:, 50:, :]
Y_pred = model.predict( X_new )[:, -1][..., np.newaxis] # the 10 values forecast at the last time step
Y_pred                               # shape: [batch size, 10 time steps, 1 feature]

plot_multiple_forecasts(X_new, Y_new, Y_pred)

plt.show()

     The last two models (the 1D convolutional + GRU model and WaveNet) offer the best performance so far in forecasting our time series! In the WaveNet paper, the authors achieved state-of-the-art performance on various audio tasks (hence the name of the architecture), including text-to-speech tasks, producing incredibly realistic voices across several languages. They also used the model to generate music, one audio sample at a time. This feat is all the more impressive when you realize that a single second of audio can contain tens of thousands of time steps; even LSTMs and GRUs cannot handle such long sequences.

     Here is the original WaveNet defined in the paper: it uses Gated Activation Units instead of ReLU and parametrized skip connections, plus it pads with zeros on the left to avoid getting shorter and shorter sequences:

class GatedActivationUnit( keras.layers.Layer ):
    def __init__( self, activation="tanh", **kwargs):
        super().__init__(**kwargs)
        self.activation = keras.activations.get(activation)
    def call( self, inputs ):
        n_filters = inputs.shape[-1]//2 # e.g. inputs.shape: (None, None, 64) ==> n_filters = 32
        linear_output = self.activation( inputs[..., :n_filters] )  # tanh half:    (None, None, 32)
        gate = keras.activations.sigmoid( inputs[..., n_filters:] ) # sigmoid gate: (None, None, 32)
        # the activation is already applied above, so it must not be applied again here
        return linear_output*gate # element-wise multiplication
def wavenet_residual_block( inputs, n_filters, dilation_rate ):
    z = keras.layers.Conv1D( 2*n_filters, kernel_size=2, padding="causal",
                             dilation_rate=dilation_rate)(inputs)
    z = GatedActivationUnit()(z)
    z = keras.layers.Conv1D( n_filters, kernel_size=1 )(z)
    # keras.layers.Add() takes a list of tensors that all have the same shape
    # and returns their element-wise sum (also of the same shape).
    # Returns the residual output (fed to the next block) and the skip output.
    return keras.layers.Add()([z, inputs]), z
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

n_layers_per_block = 3 # 10 in the paper
n_blocks =1 # 3 in the paper
n_filters = 32 # 128 in the paper
n_outputs = 10 # 256 in the paper

inputs = keras.layers.Input( shape=[None,1] )
z = keras.layers.Conv1D( n_filters, kernel_size=2, padding="causal" )(inputs)
skip_to_last = []     
for dilation_rate in [ 2**i for i in range( n_layers_per_block) ]*n_blocks:# [1,2,4]
    z, skip = wavenet_residual_block(z, n_filters, dilation_rate)
    skip_to_last.append(skip)
    
z = keras.activations.relu( keras.layers.Add()(skip_to_last) )
z = keras.layers.Conv1D( n_filters, kernel_size=1, activation="relu" )(z)
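# The paper ends with a softmax over 256 quantized audio amplitudes; here the targets are
# real-valued and the loss is MSE, so the output layer is linear (no activation) instead
Y_pred = keras.layers.Conv1D( n_outputs, kernel_size=1 )(z)

model = keras.models.Model( inputs=[inputs], outputs=[Y_pred] )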
model.compile( loss="mse", optimizer="adam", metrics=[last_time_step_mse] )
history = model.fit( X_train, Y_train, epochs=2,
                     validation_data=(X_valid, Y_valid)
                   )
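The gating in GatedActivationUnit follows the paper's formula z = tanh(W_f ∗ x) ⊙ σ(W_g ∗ x): the preceding convolution outputs 2 × n_filters channels, the first half goes through tanh, the second half through a sigmoid gate, and the two halves are multiplied element-wise. The NumPy sketch below shows only that step (the convolutions are omitted); `gated_activation` is a hypothetical name.

```python
import numpy as np

def gated_activation(z):
    """Split the last axis into two halves: tanh(first half) * sigmoid(second half)."""
    n_filters = z.shape[-1] // 2
    filt = np.tanh(z[..., :n_filters])                 # "filter" half
    gate = 1.0 / (1.0 + np.exp(-z[..., n_filters:]))   # "gate" half, in (0, 1)
    return filt * gate

rng = np.random.default_rng(42)
z = rng.normal(size=(4, 50, 64))    # [batch, time steps, 2 * n_filters]
out = gated_activation(z)
print(out.shape)                    # (4, 50, 32)
print(np.abs(out).max() < 1)        # True: |tanh| < 1 and the gate is below 1
```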

In this chapter we explored the fundamentals of RNNs and used them to process sequences (namely, time series). In the process we also looked at other ways to process sequences, including CNNs. In the next chapter we will use RNNs for Natural Language Processing, and we will learn more about RNNs (bidirectional RNNs, stateful vs stateless RNNs, Encoder–Decoders, and Attention-augmented Encoder-Decoders). We will also look at the Transformer, an Attention-only architecture.

Exercises

1. Can you think of a few applications for a sequence-to-sequence RNN? What about a sequence-to-vector RNN, and a vector-to-sequence RNN?

     Here are a few RNN applications: https://blog.csdn.net/Linli522362242/article/details/113846940

  • For a sequence-to-sequence RNN (many-to-many, synchronized or delayed): predicting the weather (or any other time series), machine translation (using an Encoder–Decoder architecture), video captioning, speech to text, music generation (or other sequence generation), identifying the chords of a song
  • For a sequence-to-vector RNN (many-to-one): classifying music samples by music genre, analyzing the sentiment of a book review, predicting what word an aphasic patient is thinking of based on readings from brain implants, predicting the probability that a user will want to watch a movie based on their watch history (this is one of many possible implementations of collaborative filtering for a recommender system)
  • For a vector-to-sequence RNN (one-to-many): image captioning, creating a music playlist based on an embedding of the current artist, generating a melody based on a set of parameters, locating pedestrians in a picture (e.g., a video frame from a self-driving car’s camera)

2. How many dimensions must the inputs of an RNN layer have? What does each dimension represent? What about its outputs?

     An RNN layer must have three-dimensional inputs:

  • the first dimension is the batch dimension (its size is the batch size),
  • the second dimension represents the time (its size is the number of time steps), and
  • the third dimension holds the inputs at each time step (its size is the number of input features per time step).

     For example, if you want to process a batch containing 5 time series of 10 time steps each, with 2 values per time step (e.g., the temperature and the wind speed), the shape will be [5, 10, 2]. The outputs are also three-dimensional, with the same first two dimensions, but the last dimension is equal to the number of neurons. For example, if an RNN layer with 32 neurons processes the batch we just discussed, the output will have a shape of [5, 10, 32].
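The shape bookkeeping can be checked with a bare NumPy forward pass of a simple RNN layer, h_t = tanh(x_t W_x + h_(t−1) W_h + b). This is a sketch with random weights, not Keras's implementation; `simple_rnn_forward` is a hypothetical name, and its return_sequences flag only mimics the Keras argument of the same name (all h_t versus the last one).

```python
import numpy as np

def simple_rnn_forward(X, W_x, W_h, b, return_sequences=True):
    """X: [batch, time steps, features] -> [batch, time steps, units] or [batch, units]."""
    batch_size, n_steps, _ = X.shape
    h = np.zeros((batch_size, W_h.shape[0]))   # initial state
    outputs = []
    for t in range(n_steps):                   # strictly sequential over time
        h = np.tanh(X[:, t] @ W_x + h @ W_h + b)
        outputs.append(h)
    return np.stack(outputs, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
n_features, n_units = 2, 32
X = rng.normal(size=(5, 10, n_features))   # 5 series, 10 time steps, 2 values per step
W_x = rng.normal(scale=0.1, size=(n_features, n_units))
W_h = rng.normal(scale=0.1, size=(n_units, n_units))
b = np.zeros(n_units)

seq = simple_rnn_forward(X, W_x, W_h, b)
last = simple_rnn_forward(X, W_x, W_h, b, return_sequences=False)
print(seq.shape)    # (5, 10, 32)
print(last.shape)   # (5, 32)
```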

3. If you want to build a deep sequence-to-sequence RNN, which RNN layers should have return_sequences=True? What about a sequence-to-vector RNN?

     To build a deep sequence-to-sequence RNN using Keras, you must set return_sequences=True for all RNN layers.
     To build a sequence-to-vector RNN, you must set return_sequences=True for all RNN layers except for the top RNN layer, which must have return_sequences=False (or do not set this argument at all, since False is the default).

4. Suppose you have a daily univariate time series, and you want to forecast the next seven days. Which RNN architecture should you use?

     If you have a daily univariate time series, and you want to forecast the next seven days, the simplest RNN architecture you can use is a stack of RNN layers (all with return_sequences=True except for the top RNN layer, which uses return_sequences=False), using seven neurons in the output RNN layer (https://blog.csdn.net/Linli522362242/article/details/114941730). Here the output layer is a Dense layer with seven units instead of a SimpleRNN(7) layer:

# The RNN predicts all 7 next values (targets of shape [batch size, 7]) at once, only at the very last time step
model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True, input_shape=[None,1]),
    keras.layers.SimpleRNN(20), # many-to-one: sequence-to-vector
    keras.layers.Dense(7)       # instead of keras.layers.SimpleRNN(7)
])

You can then train this model using random windows from the time series (e.g., sequences of 30 consecutive days as the inputs, and a vector containing the values of the next 7 days as the target). This is a sequence-to-vector RNN.
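The code below synthesizes separate 37-step series with generate_time_series(); with a single real daily series, the random 30-day input windows and 7-day targets could be drawn as in this sketch. The helper `sample_windows` and the sine-based "daily" series are hypothetical stand-ins, not part of the original exercise.

```python
import numpy as np

def sample_windows(series, n_windows, n_steps=30, n_ahead=7, seed=42):
    """Draw random (input, target) windows from a 1D daily series.

    Returns X: [n_windows, n_steps, 1] and Y: [n_windows, n_ahead].
    """
    rng = np.random.default_rng(seed)
    # highest valid start leaves room for n_steps inputs plus n_ahead targets
    starts = rng.integers(0, len(series) - n_steps - n_ahead + 1, size=n_windows)
    X = np.stack([series[s:s + n_steps] for s in starts])[..., np.newaxis]
    Y = np.stack([series[s + n_steps:s + n_steps + n_ahead] for s in starts])
    return X, Y

days = np.arange(1000)
daily = np.sin(2 * np.pi * days / 365) + 0.1 * np.sin(2 * np.pi * days / 7)
X, Y = sample_windows(daily, n_windows=64)
print(X.shape, Y.shape)   # (64, 30, 1) (64, 7)
```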

# Build the training, validation and test sets (generate_time_series() is defined in the previous post)
np.random.seed(42)

n_steps = 30
series = generate_time_series(10000, n_steps+7) # n_steps=30
                        # (7000, 30, 1)          # (7000, 7)
X_train, Y_train = series[:7000, :n_steps], series[:7000, -7:, 0]
X_valid, Y_valid = series[7000:9000, :n_steps], series[7000:9000, -7:, 0]
X_test, Y_test = series[9000:, :n_steps], series[9000:, -7:, 0]

model.compile( loss="mse", optimizer="adam" )
history = model.fit( X_train, Y_train, epochs=20,
                     validation_data=(X_valid, Y_valid) )

# Create a new instance for prediction
np.random.seed(43)

series = generate_time_series(1, 30+7) # 1 instance with 30+7 time steps
X_new, Y_new = series[:, :30, :], series[:, -7:, :]
Y_pred = model.predict(X_new)[..., np.newaxis] # predict, then add an axis: [batch_size, 7 steps, features=1]
Y_pred

     Alternatively, you could set return_sequences=True for all RNN layers to create a sequence-to-sequence RNN. You can train this model using random windows from the time series, with sequences of the same length as the inputs as the targets. Each target sequence should have seven values per time step (e.g., for time step t, the target should be a vector containing the values at time steps t + 1 to t + 7).
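The target layout just described (at time step t, the values at t + 1 to t + 7) can also be built without a loop over the step offsets, using NumPy's sliding_window_view (available in NumPy 1.20 and later). This is an alternative to the step_ahead loop in the code below, checked here on a toy series whose values equal their time index; `make_seq2seq_targets` is a hypothetical helper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_seq2seq_targets(series, n_steps, n_ahead):
    """series: [batch, n_steps + n_ahead, 1] -> Y: [batch, n_steps, n_ahead],
    where Y[:, t, j] = series[:, t + 1 + j, 0]."""
    values = series[:, 1:, 0]                               # targets start at t + 1
    windows = sliding_window_view(values, n_ahead, axis=1)  # [batch, n_steps, n_ahead]
    return windows[:, :n_steps].copy()                      # copy: the view is read-only

toy = np.arange(37.0).reshape(1, 37, 1)   # value == time step index
Y = make_seq2seq_targets(toy, n_steps=30, n_ahead=7)
print(Y.shape)              # (1, 30, 7)
print(Y[0, 0].tolist())     # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(Y[0, -1].tolist())    # [30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0]
```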
Forecasting the next 7 values at each and every time step:

np.random.seed(42)
tf.random.set_seed(42)
 
model = keras.models.Sequential([
    keras.layers.SimpleRNN( 20, return_sequences=True, input_shape=[None, 1] ),
    keras.layers.SimpleRNN( 20, return_sequences=True ),      # at each and every time step
    keras.layers.TimeDistributed( keras.layers.Dense(7) ) # forecast the next 7 values
])
 
def last_time_step_mse( Y_true, Y_pred):  # ":" represents all instances, "-1" is last time step
    return keras.metrics.mean_squared_error( Y_true[:, -1], Y_pred[:, -1] )
 
model.compile( loss="mse", optimizer=keras.optimizers.Adam( learning_rate=0.01 ),
               metrics = [last_time_step_mse] )
# Build the training, validation and test sets

np.random.seed(42)
 
n_steps = 30
series = generate_time_series(10000, n_steps+7) # 10000x37x1
X_train = series[:7000, :n_steps] # 7000x30x1
X_valid = series[7000:9000, :n_steps]
X_test = series[9000:, :n_steps]
 
Y = np.empty( (10000, n_steps, 7) ) # 10000x30x7
for step_ahead in range(1, 7+1):    # Y[:, t, step_ahead-1] = value at time step t+step_ahead
    # column 0 <- steps 1~30, column 1 <- steps 2~31, ..., column 6 <- steps 7~36
    Y[..., step_ahead-1] = series[..., step_ahead:step_ahead+n_steps, 0] # 30 values per column
 
Y_train = Y[:7000]
Y_valid = Y[7000:9000]
Y_test = Y[9000:]

# X_train.shape, Y_train.shape
# ( (7000, 30, 1), (7000, 30, 7) )
history = model.fit( X_train, Y_train, epochs=20,
                     validation_data=(X_valid, Y_valid) )

# Create a new instance for prediction
np.random.seed(43)
 
series = generate_time_series(1, 30+7) # create 1 instance with 30+7 time steps
X_new, Y_new = series[:, :30, :], series[:, 30:, :] # first 30 time steps as inputs,
                                                    # last 7 time steps as the actual values
# ":" selects all instances, "-1" the last time step; model.predict(X_new)[:, -1] is 2D,
# so add a last axis to get [batch_size, steps, features=1]
Y_pred = model.predict(X_new)[:, -1][..., np.newaxis]

5. What are the main difficulties when training RNNs? How can you handle them?

     The two main difficulties when training RNNs are unstable gradients (exploding or vanishing) and a very limited short-term memory. These problems both get worse when dealing with long sequences.

  • To alleviate the unstable gradients problem, you can use a smaller learning rate, use a saturating activation function such as the hyperbolic tangent (which is the default), and possibly use gradient clipping (clipping the gradients during backpropagation so that they never exceed some threshold, which mitigates the exploding gradients problem), Layer Normalization, or dropout at each time step.
  • To tackle the limited short-term memory problem, you can use LSTM or GRU layers (this also helps with the unstable gradients problem).
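In Keras, gradient clipping is set on the optimizer through its clipvalue and clipnorm arguments (the SketchRNN model later in this post uses clipnorm=1.). The NumPy sketch below contrasts the two rules on a single gradient tensor; `clip_grad_value` and `clip_grad_norm` are hypothetical helpers that mirror the idea, not TensorFlow functions.

```python
import numpy as np

def clip_grad_value(grad, threshold):
    """Clip each component to [-threshold, threshold] (can change the direction)."""
    return np.clip(grad, -threshold, threshold)

def clip_grad_norm(grad, max_norm):
    """Rescale the whole tensor if its L2 norm exceeds max_norm (keeps the direction)."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

g = np.array([0.9, 100.0])                     # an "exploding" gradient
print(clip_grad_value(g, 1.0).tolist())        # [0.9, 1.0]
g_norm = clip_grad_norm(g, 1.0)
print(round(float(np.linalg.norm(g_norm)), 6)) # 1.0
```

Clipping by value turns [0.9, 100] into [0.9, 1.0], which points in a very different direction, whereas clipping by norm only shrinks the vector.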

6. Can you sketch the LSTM cell’s architecture?

     An LSTM cell’s architecture looks complicated, but it’s actually not too hard if you understand the underlying logic. The cell has a short-term state vector and a long-term state vector. At each time step, the inputs and the previous short-term state are fed to a simple RNN cell and to three gates:

  1. the forget gate decides what to remove from the long-term state,
  2. the input gate decides which part of the output of the simple RNN cell should be added to the long-term state, and
  3. the output gate decides which part of the long-term state should be output at this time step (after going through the tanh activation function). The new short-term state is equal to the output of the cell. See Figure 15-9.
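The data flow just described can be written as one NumPy time step with the standard LSTM equations: i, f, o are sigmoid gates, g = tanh(...) is the "simple RNN cell" output, c_t = f ⊙ c_(t−1) + i ⊙ g, and h_t = o ⊙ tanh(c_t). The packed weight layout (blocks ordered i, f, g, o) is an arbitrary choice for this sketch, and `lstm_step` is a hypothetical name.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One LSTM time step.

    x_t: [batch, n_inputs]; h_prev, c_prev: [batch, n_units]
    W_x: [n_inputs, 4*n_units]; W_h: [n_units, 4*n_units]; b: [4*n_units]
    """
    z = x_t @ W_x + h_prev @ W_h + b
    i, f, g, o = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # simple RNN cell output
    c_t = f * c_prev + i * g                      # long-term state: forget some, add some
    h_t = o * np.tanh(c_t)                        # short-term state = cell output
    return h_t, c_t

rng = np.random.default_rng(1)
n_inputs, n_units, batch = 3, 16, 4
W_x = rng.normal(scale=0.1, size=(n_inputs, 4 * n_units))
W_h = rng.normal(scale=0.1, size=(n_units, 4 * n_units))
b = np.zeros(4 * n_units)
h = c = np.zeros((batch, n_units))
for t in range(5):
    h, c = lstm_step(rng.normal(size=(batch, n_inputs)), h, c, W_x, W_h, b)
print(h.shape, c.shape)   # (4, 16) (4, 16)
```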

7. Why would you want to use 1D convolutional layers in an RNN?

     An RNN layer is fundamentally sequential: in order to compute the outputs at time step t, it has to first compute the outputs at all earlier time steps. This makes it impossible to parallelize across time steps. On the other hand, a 1D convolutional layer lends itself well to parallelization since it does not hold a state between time steps. In other words, it has no memory: the output at any time step can be computed based only on a small window of values from the inputs without having to know all the past values. Moreover, since a 1D convolutional layer is not recurrent, it suffers less from unstable gradients. One or more 1D convolutional layers can be useful in an RNN to efficiently preprocess the inputs, for example to reduce their temporal resolution (downsampling) and thereby help the RNN layers detect long-term patterns. In fact, it is possible to use only convolutional layers, for example by building a WaveNet architecture (in which the lower layers learn short-term patterns, while the higher layers learn long-term patterns).
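The contrast can be made concrete in NumPy: a 1D convolution over a whole sequence is a single vectorized product over all sliding windows (each output step computed independently), while a recurrent update needs a loop because each step depends on the previous one. `conv1d_valid` is a hypothetical single-channel helper built on sliding_window_view (NumPy 1.20 or later); the moving-average kernel is only an example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d_valid(x, w, strides=1):
    """Single-channel 'valid' 1D convolution: all windows at once, no loop over time."""
    windows = sliding_window_view(x, len(w))[::strides]   # [n_out, kernel_size]
    return windows @ w

x = np.arange(10.0)
w = np.full(4, 0.25)                          # moving average, kernel_size=4
y = conv1d_valid(x, w, strides=2)             # stride 2 also halves the resolution
print(y.tolist())                             # [1.5, 3.5, 5.5, 7.5]

# A recurrent update, by contrast, must run step by step: h_t depends on h_(t-1)
h = 0.0
for x_t in x:
    h = np.tanh(0.5 * x_t + 0.5 * h)
```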

8. Which neural network architecture could you use to classify videos?

     To classify videos based on their visual content, one possible architecture could be to take (say) one frame per second, then run every frame through the same convolutional neural network (e.g., a pretrained Xception model, possibly frozen if your dataset is not large), feed the sequence of outputs from the CNN to a sequence-to-vector RNN, and finally run its output through a softmax layer, giving you all the class probabilities. For training you would use cross entropy as the cost function. If you wanted to use the audio for classification as well, you could use a stack of strided 1D convolutional layers to reduce the temporal resolution from thousands of audio frames per second to just one per second (to match the number of images per second), and concatenate the output sequence to the inputs of the sequence-to-vector RNN (along the last dimension).

9. Train a classification model for the SketchRNN dataset, available in TensorFlow Datasets.

     At the time of writing, the dataset was not yet available in TFDS (the pull request was still a work in progress). Luckily, the data is conveniently available as TFRecords, so let's download it (it might take a while, as it's about 1 GB, with 3,450,000 training sketches and 345,000 test sketches):

In Colab, first mount Google Drive (File ==> Mount Drive), then pass cache_subdir="/content/drive/MyDrive/Colab Notebooks/data/quickdraw" so that the files are saved to Drive:

from tensorflow import keras

DOWNLOAD_ROOT = "http://download.tensorflow.org/data/"
FILENAME = "quickdraw_tutorial_dataset_v1.tar.gz"
filepath = keras.utils.get_file( FILENAME,
                                 DOWNLOAD_ROOT + FILENAME,
                                 # cache_subdir
                                 # Subdirectory under the Keras cache dir where the file is saved. 
                                 # If an absolute path /path/to/folder is specified the file will be saved at that location
                                 cache_subdir="/content/drive/MyDrive/Colab Notebooks/data/quickdraw", # cache_subdir="datasets/quickdraw",
                                 extract = True
                               )

 

(When running locally on Windows with cache_subdir="datasets/quickdraw", the files are saved to C:\Users\LlQ\.keras\datasets\quickdraw.)

from pathlib import Path

quickdraw_dir = Path(filepath).parent
quickdraw_dir

train_files = sorted([
    str(path) for path in quickdraw_dir.glob("training.tfrecord-*")
])
eval_files = sorted([
    str(path) for path in quickdraw_dir.glob("eval.tfrecord-*")
])

train_files

eval_files


The eval.tfrecord.classes and training.tfrecord.classes files list the class names, one per line:

with open( quickdraw_dir / "eval.tfrecord.classes" ) as test_classes_file:
    test_classes = test_classes_file.readlines()

with open( quickdraw_dir / "training.tfrecord.classes" ) as train_classes_file:
    train_classes = train_classes_file.readlines()
    
test_classes[:5]

assert train_classes == test_classes
class_names = [ name.strip().lower() for name in train_classes ]

sorted(class_names)[:5]

# https://blog.csdn.net/Linli522362242/article/details/107704824
def parse(data_batch):
    feature_descriptions = {
        "ink": tf.io.VarLenFeature( dtype=tf.float32 ),           # tensorflow.python.framework.sparse_tensor.SparseTensor
        "shape": tf.io.FixedLenFeature([2], dtype=tf.int64),      # len(shape_list)=2
        "class_index": tf.io.FixedLenFeature([1], dtype=tf.int64) # len(class_index_list)=1    
    }
    
    examples = tf.io.parse_example( data_batch, feature_descriptions )
    
    flat_sketches = tf.sparse.to_dense( examples["ink"] ) # indices:values ==> 2D dense tensor
    sketches = tf.reshape( flat_sketches, 
                           shape=[tf.size( data_batch ), -1, 3]
                         ) # batch_size, n_rows=-1, n_columns=3
    
    lengths = examples["shape"][:,0] #height=var, width==3
    labels = examples["class_index"][:,0]
    return sketches, lengths, labels
def quickdraw_dataset( filepaths, batch_size=32, shuffle_buffer_size=None,
                       n_parse_threads=5, n_read_threads=5, cache=False ):
    dataset = tf.data.TFRecordDataset( filepaths,
                                       num_parallel_reads=n_read_threads )
    # The first time the dataset is iterated over, its elements will be cached either in the specified file or in memory.
    # Subsequent iterations will use the cached data.
    if cache:
        dataset = dataset.cache()
    if shuffle_buffer_size:
        dataset = dataset.shuffle( shuffle_buffer_size )
    
    dataset = dataset.batch(batch_size)
    dataset = dataset.map( parse, num_parallel_calls=n_parse_threads )
    return dataset.prefetch(1)
train_set = quickdraw_dataset( train_files, shuffle_buffer_size=10000 )
valid_set = quickdraw_dataset( eval_files[:5] )
test_set = quickdraw_dataset( eval_files[5:] )

for sketches, lengths, labels in train_set.take(1):
    print("sketches =", sketches)
    print("lengths =", lengths)
    print("labels =", labels)
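What parse() does with the "ink" feature can be mimicked in NumPy: each sketch is stored as a flat list of [Δx, Δy, stroke-end] triples, tf.sparse.to_dense zero-pads the batch to the longest sketch, and the reshape to [batch, -1, 3] recovers one row per point. The padded rows are all zeros, which is why the lengths are needed later. The toy values are made up and `pad_and_reshape` is hypothetical.

```python
import numpy as np

def pad_and_reshape(flat_inks):
    """flat_inks: list of 1D arrays of length 3*n_points -> ([batch, max_points, 3], lengths)."""
    max_len = max(len(ink) for ink in flat_inks)
    dense = np.zeros((len(flat_inks), max_len), dtype=np.float32)  # like tf.sparse.to_dense
    for row, ink in enumerate(flat_inks):
        dense[row, :len(ink)] = ink
    sketches = dense.reshape(len(flat_inks), -1, 3)               # like tf.reshape(..., [batch, -1, 3])
    lengths = np.array([len(ink) // 3 for ink in flat_inks])      # like examples["shape"][:, 0]
    return sketches, lengths

inks = [np.array([0.1, 0.0, 0.0,   0.0, 0.2, 1.0]),                  # 2 points
        np.array([0.0, 0.1, 0.0,   0.1, 0.1, 0.0,   0.2, 0.0, 1.0])]  # 3 points
sketches, lengths = pad_and_reshape(inks)
print(sketches.shape, lengths.tolist())   # (2, 3, 3) [2, 3]
print(sketches[0, 2].tolist())            # [0.0, 0.0, 0.0]  (padding row)
```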

import matplotlib.pyplot as plt
import numpy as np

def draw_sketch( sketch, label=None ):
    origin = np.array([ [0., 0., 0.] ])
    sketch = np.r_[origin, 
                   sketch]
    # Each row is [Δx, Δy, p]: {Δx, Δy} is the pen movement in the sketch plane,
    # and p = 1 marks the last point of a stroke (the pen is lifted afterwards).
    # For example: sketch.shape = (length, 3)
    # np.argwhere returns the indices of the True elements, e.g.
    # np.argwhere(np.array([0., 1., 0., 1.])==1.)[:,0] ==> array([1, 3])
    stroke_end_indices = np.argwhere( sketch[:, -1]==1. )[:,0] # index of the last point of each stroke
    
    coordinates = np.cumsum( sketch[:,:2], axis=0 ) # cumulative sum of the pen movements ==> absolute positions
    
    # numpy.split(ary, indices_or_sections, axis=0) splits an array into sub-arrays
    # (views of ary) at the given indices; +1 so that each stroke keeps its last point
    strokes = np.split( coordinates, stroke_end_indices+1 )
    
    title = class_names[label.numpy()] if label is not None else "Try to guess"
    plt.title( title )
    
    
    plt.plot( coordinates[:, 0], -coordinates[:, 1], "y:")
    for stroke in strokes:
        plt.plot( stroke[:, 0], -stroke[:, 1], "b.-") 
    plt.axis("off")
    
def draw_sketches( sketches, lengths, labels ):
    n_sketches = len( sketches ) # n_instances 
    n_cols = 4
    n_rows = (n_sketches-1)//n_cols+1
    plt.figure( figsize=(n_cols*3, n_rows*3.5) )
    for index, sketch, length, label in zip( range(n_sketches), sketches, lengths, labels ):
        plt.subplot( n_rows, n_cols, index+1 )
        draw_sketch( sketch[:length], label ) # length: total number of time steps
    plt.show()
    
    
for sketches, lengths, labels in train_set.take(1):
    draw_sketches( sketches, lengths, labels ) # batch_size=32 
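The core of draw_sketch (turning pen offsets into absolute coordinates and cutting them into strokes) can be checked on a tiny hand-made sketch without plotting. The split indices are the stroke-end indices plus one, so the last point of each stroke stays with that stroke. `sketch_to_strokes` and the toy sketch are hypothetical.

```python
import numpy as np

def sketch_to_strokes(sketch):
    """sketch: [n_points, 3] rows of [dx, dy, stroke_end] -> list of [k, 2] coordinate arrays."""
    sketch = np.r_[np.zeros((1, 3)), sketch]         # start at the origin
    coordinates = np.cumsum(sketch[:, :2], axis=0)   # offsets -> absolute positions
    stroke_end_indices = np.argwhere(sketch[:, -1] == 1.)[:, 0]
    strokes = np.split(coordinates, stroke_end_indices + 1)
    return [s for s in strokes if len(s) > 0]        # drop the empty piece after the last stroke

toy = np.array([[1., 0., 0.],    # move right
                [0., 1., 1.],    # move up, end of stroke 1
                [1., 0., 0.],    # pen lifted: start of stroke 2
                [1., 0., 1.]])   # end of stroke 2
strokes = sketch_to_strokes(toy)
print([s.tolist() for s in strokes])
# [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]], [[2.0, 1.0], [3.0, 1.0]]]
```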

lengths = np.concatenate([ lengths for _, lengths, _ in train_set.take(1000) ])

plt.figure( figsize=(9,6) )
plt.hist( lengths, bins=150, density=True )
plt.axis([0, 200, 0, 0.03])
plt.xlabel("length (time steps)")
plt.ylabel("density")
plt.show()

def crop_long_sketches( dataset, max_length=100 ):
    # dataset : <TakeDataset shapes: ((None, None, 3), (None,), (None,)),...
    # inks, lengths, labels : (None, None, 3), (None,), (None,)
    # return inks[:, :max_length], labels
    return dataset.map( lambda inks, lengths, labels: (inks[:, :max_length],
                                                       labels)
                      )

cropped_train_set = crop_long_sketches( train_set )
cropped_valid_set = crop_long_sketches( valid_set )
cropped_test_set = crop_long_sketches( test_set )
model = keras.models.Sequential([
    keras.layers.Conv1D( 32, kernel_size=5, strides=2, activation="relu" ),
    keras.layers.BatchNormalization(),
    
    keras.layers.Conv1D( 64, kernel_size=5, strides=2, activation="relu"),
    keras.layers.BatchNormalization(),
    
    keras.layers.Conv1D( 128, kernel_size=3, strides=2, activation="relu"),
    keras.layers.BatchNormalization(),
    
    keras.layers.LSTM(128, return_sequences=True),
    keras.layers.LSTM(128, return_sequences=False),
    
    keras.layers.Dense( len(class_names), activation="softmax")
])

optimizer = keras.optimizers.SGD( learning_rate=1e-2, clipnorm=1. )
model.compile( loss="sparse_categorical_crossentropy", # "sparse": the labels are class indices, not one-hot vectors
               optimizer=optimizer,
               # tf.keras.metrics.SparseTopKCategoricalAccuracy(
                                       #    k=5, name='sparse_top_k_categorical_accuracy', dtype=None
               # )
               metrics=["accuracy", "sparse_top_k_categorical_accuracy"]
             )
history = model.fit( cropped_train_set, epochs=2, 
                     validation_data=cropped_valid_set)

top_k_categorical_accuracy (from https://zhuanlan.zhihu.com/p/95293440)

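top_k_categorical_accuracy adds top_k to categorical_accuracy. categorical_accuracy counts a sample as correct only if the predicted score of its true class is the highest among all classes, whereas top_k_categorical_accuracy only requires the score of the true class to rank among the top k. For example, take 4 samples with y_true = [[0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 0]] and y_pred = [[0.3, 0.6, 0.1], [0.5, 0.4, 0.1], [0.3, 0.6, 0.1], [0.9, 0, 0.1]]. Using the definition above, categorical_accuracy = 50%, but what is top_k_categorical_accuracy? That depends on k. If k ≥ 3, it is trivially 100%, since there are only 3 classes. If k < 3, it has to be computed; for k = 2, top_k_categorical_accuracy = 75%. The computation goes as follows: 1) convert y_true from one-hot to class indices, y_true_new = [2, 1, 1, 0]; 2) compute the top-k labels of y_pred, which for k = 2 gives y_pred_new = [[0, 1], [0, 1], [0, 1], [0, 2]]; 3) count a sample as correct if its true label is among its top-k predicted labels. For the 4 samples above, 2 is not in [0, 1], 1 is in [0, 1], 1 is in [0, 1], and 0 is in [0, 2], so 3 out of 4 samples are correct and top_k_categorical_accuracy = 75% for k = 2. Note that Keras uses k = 5 by default when computing top_k_categorical_accuracy.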
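The worked example above can be reproduced in NumPy. This sketch takes integer labels (the "sparse" form expected by sparse_top_k_categorical_accuracy) and counts a sample as correct when its true class is among the k highest-scoring classes; ties are not handled specially. `sparse_top_k_accuracy` is a hypothetical name.

```python
import numpy as np

def sparse_top_k_accuracy(y_true, y_pred, k=5):
    """Fraction of samples whose true class index is among the top-k predicted classes."""
    top_k = np.argsort(-y_pred, axis=1)[:, :k]        # indices of the k largest scores
    return np.mean([label in row for label, row in zip(y_true, top_k)])

y_true_onehot = np.array([[0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 0]])
y_true = y_true_onehot.argmax(axis=1)                  # step 1: [2, 1, 1, 0]
y_pred = np.array([[0.3, 0.6, 0.1], [0.5, 0.4, 0.1],
                   [0.3, 0.6, 0.1], [0.9, 0.0, 0.1]])
for k in (1, 2, 3):
    print(k, sparse_top_k_accuracy(y_true, y_pred, k))  # 0.5, 0.75 and 1.0 for k = 1, 2, 3
```

With k = 1 this reduces to the ordinary categorical_accuracy (50%).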

np.mean( keras.metrics.sparse_top_k_categorical_accuracy(y_test, y_probas) )
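The 3-step top-k procedure described above translates directly into a few lines of NumPy. This is a minimal sketch for checking the worked example, not the Keras implementation; `top_k_accuracy` is a hypothetical helper, and ties between scores are not handled specially:

```python
import numpy as np

def top_k_accuracy(y_true_onehot, y_pred, k=5):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    y_true = np.argmax(y_true_onehot, axis=1)            # step 1: one-hot -> class indices
    top_k = np.argsort(y_pred, axis=1)[:, ::-1][:, :k]   # step 2: indices of the k largest scores
    hits = [t in row for t, row in zip(y_true, top_k)]   # step 3: is the true label in the top k?
    return float(np.mean(hits))

y_true = np.array([[0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 0]])
y_pred = np.array([[0.3, 0.6, 0.1], [0.5, 0.4, 0.1], [0.3, 0.6, 0.1], [0.9, 0.0, 0.1]])

print(top_k_accuracy(y_true, y_pred, k=1))  # 0.5, same as categorical_accuracy
print(top_k_accuracy(y_true, y_pred, k=2))  # 0.75
print(top_k_accuracy(y_true, y_pred, k=3))  # 1.0
```

With k = 1 the metric reduces to plain categorical accuracy, which is why the first line matches the 50% figure.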

n_new = 10

# for sketches, lengths, labels in train_set.take(1):
#     draw_sketches( sketches, lengths, labels ) # batch_size=32 

Y_probas = model.predict( sketches )
# tf.nn.top_k
# Finds values and indices of the k largest entries for the last dimension.
top_k = tf.nn.top_k( Y_probas, k=5 ) # default sorted=True 

# def draw_sketches( sketches, lengths, labels ):
#     n_sketches = len( sketches ) # n_instances 
#     n_cols = 4
#     n_rows = (n_sketches-1)//n_cols+1
#     plt.figure( figsize=(n_cols*3, n_rows*3.5) )
#     for index, sketch, length, label in zip( range(n_sketches), sketches, lengths, labels ):
#         plt.subplot( n_rows, n_cols, index+1 )
#         draw_sketch( sketch[:length], label ) # length: total number of time steps
#     plt.show()    

for index in range( n_new ):
  plt.figure( figsize=(3,3.5) )
  draw_sketch( sketches[index] )
  plt.show()

  print("Sketch #{} top-5 predictions:".format(index+1))
  for k in range(5):
    class_name = class_names[ top_k.indices[index,k] ]
    proba = 100*top_k.values[index, k]
    print( "  {}. {} {:.3f}%".format(k + 1, class_name, proba) )

  print( "Answer: {}".format( class_names[labels[index].numpy()] ) )

model.save('/content/drive/MyDrive/Colab Notebooks/models/my_sketchrnn')




10. Download the Bach chorales (https://homl.info/bach) dataset and unzip it. It is composed of 382 chorales composed by Johann Sebastian Bach. Each chorale is 100 to 640 time steps long, and each time step contains 4 integers, where each integer corresponds to a note's index on a piano (except for the value 0, which means that no note is played). Train a model (recurrent, convolutional, or both) that can predict the next time step (four notes), given a sequence of time steps from a chorale. Then use this model to generate Bach-like music, one note at a time: you can do this by giving the model the start of a chorale and asking it to predict the next time step, then appending these time steps to the input sequence and asking the model for the next note, and so on. Also make sure to check out Google's Coconet model, which was used for a nice Google doodle about Bach.
# cache_subdir = "/content/drive/MyDrive/Colab Notebooks/data/jsb_chorales",

from tensorflow import keras

DOWNLOAD_ROOT = "https://github.com/ageron/handson-ml2/raw/master/datasets/jsb_chorales/"
FILENAME = "jsb_chorales.tgz"
filepath = keras.utils.get_file( FILENAME,
                                 DOWNLOAD_ROOT+FILENAME,
                                 cache_subdir = "/content/drive/MyDrive/Colab Notebooks/data/jsb_chorales",
                                 extract = True
                                )

from pathlib import Path

jsb_chorales_dir = Path( filepath ).parent
train_files = sorted( jsb_chorales_dir.glob("train/chorale_*.csv") )
valid_files = sorted( jsb_chorales_dir.glob("valid/chorale_*.csv") )
test_files = sorted( jsb_chorales_dir.glob("test/chorale_*.csv") )
import pandas as pd

def load_chorales( filepaths ):
  return [ pd.read_csv(filepath).values.tolist() \
           for filepath in filepaths
         ]

train_chorales = load_chorales( train_files )
valid_chorales = load_chorales( valid_files )
test_chorales = load_chorales( test_files )

train_chorales[0][:5] # each CSV file is loaded as one chorale (a list of chords) in train_chorales

Each time step contains 4 integers, where each integer corresponds to a note's index on a piano (except for the value 0, which means that no note is played).

len( train_chorales[0] )

Each chorale is 100 to 640 time steps long.

notes = set()

for dataset_chorales in (train_chorales, valid_chorales, test_chorales):
  for chorale in dataset_chorales:  # one chorale = list of chords (time steps)
    for chord in chorale:           # one chord = 4 note indices
      notes |= set( chord )

print(notes)


Notes range from 36 (C2 = C on octave 2) to 81 (A5 = A on octave 5), plus 0 for silence:

n_notes = len(notes)
min_note = min( notes-{0} ) # min(set notes without 0)
max_note = max(notes)

assert min_note == 36
assert max_note == 81

     Let's write a few functions to listen to these chorales (you don't need to understand the details here, and in fact there are certainly simpler ways to do this, for example using MIDI players, but I just wanted to have a bit of fun writing a synthesizer):

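The general pitch-to-frequency conversion formula (https://thesoundofnumbers.com/wp-content/uploads/2014/11/pitch_intervals_freq.pdf) is $f = 440 \cdot 2^{(n-69)/12}$ Hz, where $n$ is the note number and A4 (note 69) is 440 Hz; this is what notes_to_frequencies() computes.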

from IPython.display import Audio, display

def notes_to_frequencies( notes ):
    # Frequency doubles when you go up one octave; there are 12 semi-tones
    # per octave; Note A on octave 4 is 440 Hz, and it is note number 69.
    return 2**( (np.array(notes)-69)/12 )*440
def frequencies_to_samples(frequencies, tempo, sample_rate):
  note_duration = 60/tempo # duration of one beat in seconds (tempo is in beats per minute)
  # To reduce click sound at every beat, we round the frequencies to try to
  # get the samples close to zero at the end of each note.
  frequencies = np.round( note_duration*frequencies )/note_duration
  n_samples = int( note_duration*sample_rate )
  time = np.linspace(0, note_duration, n_samples)
  sine_waves = np.sin( 2 * np.pi * frequencies.reshape(-1,1)*time )
  # Removing all notes with frequencies ≤ 9 Hz (includes note 0 = silence)
  sine_waves *= (frequencies >9.).reshape(-1,1)
  return sine_waves.reshape(-1)

def chords_to_samples( chords, tempo=160, sample_rate = 44100 ):
  freqs = notes_to_frequencies( chords )
  freqs = np.r_[ freqs, 
                 freqs[-1:] ] # make last note a bit longer

  merged = np.mean( [ frequencies_to_samples( melody, tempo, sample_rate )
                      for melody in freqs.T  # one column per voice (melody)
                    ],
                    axis=0
                  )
  n_fade_out_samples = sample_rate * 60 //tempo # fade out last note
  fade_out = np.linspace(1., 0., n_fade_out_samples)**2
  merged[-n_fade_out_samples:] *= fade_out # apply the fade-out in place
  return merged

def play_chords( chords, 
                 tempo=160, # beats per minute
                 amplitude=0.1, # volume scaling factor
                 sample_rate = 44100, # samples per second
                 filepath=None
                ): 
  samples = amplitude * chords_to_samples( chords, tempo, sample_rate )
  if filepath:
    from scipy.io import wavfile
    samples = (2**15*samples).astype(np.int16)
    wavfile.write( filepath, sample_rate, samples )
    return display( Audio(filepath) )
  else:
    return display( Audio(samples, rate=sample_rate) )
import numpy as np

for index in range(3):
  play_chords( train_chorales[index] )
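As a quick check of the pitch-to-frequency formula used by notes_to_frequencies(), and of how many audio samples each chord gets at the default tempo, here is a small NumPy sketch. `note_to_frequency` is a hypothetical single-note helper written for illustration; it applies the same formula as the original function:

```python
import numpy as np

def note_to_frequency(note):
    # Same formula as notes_to_frequencies(): note 69 (A4) is 440 Hz,
    # and each semitone multiplies the frequency by 2**(1/12).
    return 440.0 * 2 ** ((np.asarray(note) - 69) / 12)

tempo, sample_rate = 160, 44100                # defaults of play_chords()
note_duration = 60 / tempo                     # seconds per beat (one chord)
n_samples = int(note_duration * sample_rate)   # samples generated per chord

print(note_to_frequency([69, 81, 57]))         # one octave up doubles, one down halves
print(round(float(note_to_frequency(36)), 2))  # lowest note in the dataset
print(note_duration, n_samples)
```

Going up 12 notes doubles the frequency (440 Hz to 880 Hz), which is why the formula divides the note offset by 12.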


     In order to be able to generate new chorales, we want to train a model that can predict the next chord given all the previous chords. If we naively try to predict the next chord in one shot (each chorale is 100 to 640 time steps, i.e., chords, long), predicting all 4 notes at once (each time step contains 4 integers, each corresponding to a note's index on a piano), we run the risk of getting notes that don't go very well together (believe me, I tried). It's much better and simpler to predict one note at a time. So we will need to preprocess every chorale, turning each chord into an arpeggio (i.e., a sequence of notes rather than notes played simultaneously). So each chorale will be a long sequence of notes (rather than chords), and we can just train a model that can predict the next note given all the previous notes. We will use a sequence-to-sequence approach, where we feed a window to the neural net, and it tries to predict that same window shifted one time step into the future.

We will also shift the values so that they range from 0 to 46, where 0 represents silence, and values 1 to 46 represent notes 36 (C2) to 81 (A5).

And we will train the model on windows of 128 notes (i.e., 32 chords).

Since the dataset fits in memory, we could preprocess the chorales in RAM using any Python code we like, but I will demonstrate here how to do all the preprocessing using tf.data (there will be more details about creating windows using tf.data in the next chapter).
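The shift-and-flatten preprocessing described above can be mirrored in plain NumPy to check the mapping: silence stays 0, note 36 becomes 1, note 81 becomes 46, and each chord's 4 notes are laid out one after another. This is only a sketch; `preprocess_np` and the two sample chords are illustrative assumptions, not taken from the dataset:

```python
import numpy as np

MIN_NOTE = 36  # lowest non-silent note in the dataset (min_note in the text)

def preprocess_np(window):
    """Shift notes to 1..46 (keeping 0 for silence), then flatten chords into an arpeggio."""
    window = np.asarray(window)
    shifted = np.where(window == 0, window, window - MIN_NOTE + 1)
    return shifted.reshape(-1)  # [n_chords, 4] -> [n_chords * 4]

chords = [[74, 70, 65, 58],
          [81, 0, 36, 60]]
print(preprocess_np(chords))  # [39 35 30 23 46  0  1 25]
```

The tf.data version applies exactly this logic per window with tf.where and tf.reshape.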

predict one note at a time

def create_target( batch ):
  X = batch[:, :-1]
  Y = batch[:, 1:] # predict next note in each arpegio, at each step
  return X, Y

def preprocess( window ):                    
  # We will also shift the values so that they range from 0 to 46, 
  # where 0 represents silence, 
  # and values 1 to 46 represent notes 36 (C2) to 81 (A5).  # min_note == 36
  window = tf.where( window==0, window, window-min_note+1 ) # shift values 
  return tf.reshape( window, [-1] ) # convert to arpegio

# we will train the model on windows of 128 notes (i.e., 32 chords, 32*4=128).
def bach_dataset( chorales, batch_size=32, shuffle_buffer_size=None,
                   window_size = 32,
                   window_shift = 16, 
                   cache=True ):
  def batch_window( window ):
    # Notice that we call batch(window_size+1) on each window: since all windows 
    # have exactly that length, we will get a single tensor for each of them.
    return window.batch( window_size+1 )
  
  def to_windows( chorale ): # each window is represented as a dataset
    dataset = tf.data.Dataset.from_tensor_slices( chorale )
    # The window() method creates a dataset that contains windows, 
    # each of which is also represented as a dataset. It’s a nested dataset, 
    # similar to a list of lists.
    dataset = dataset.window( window_size+1,
                              window_shift,
                              drop_remainder=True )
          # drop_remainder=True ensures that all windows are
          # exactly window_size+1 chords long
    # we cannot use a nested dataset directly for training, 
    # as our model will expect tensors as input, not datasets.      
    # So, we must call the flat_map() method: it converts a nested dataset into 
    # a flat dataset (one that does not contain datasets).
    # # If you flatten the nested dataset {{1, 2}, {3, 4, 5, 6}},
    # # you get back the flat dataset {1, 2, 3, 4, 5, 6}.
    # if you pass the function lambda ds: ds.batch(2) to flat_map(), 
    # then it will transform the nested dataset {{1, 2}, {3, 4, 5,6}} into the 
    # flat dataset {[1, 2], [3, 4], [5, 6]}: it’s a dataset of tensors of size 2        
    return dataset.flat_map( batch_window )
  
  chorales = tf.ragged.constant( chorales, ragged_rank=1 )
  dataset = tf.data.Dataset.from_tensor_slices( chorales )
  dataset = dataset.flat_map( to_windows ).map(preprocess)

  if cache:
    dataset = dataset.cache()
  if shuffle_buffer_size:
    dataset = dataset.shuffle( shuffle_buffer_size )
  dataset = dataset.batch( batch_size )
  dataset = dataset.map(create_target)
  return dataset.prefetch(1)
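To see what bach_dataset() produces, here is a NumPy-only sketch of the windowing and of create_target(). `windows_np` and `create_target_np` are hypothetical helpers, and the chorale is random data rather than a real one (the note-shifting step is skipped). Each window holds window_size + 1 = 33 chords, i.e., 132 notes once flattened; the extra chord is what lets the inputs and the one-step-shifted targets each contain 131 notes:

```python
import numpy as np

def windows_np(chorale, window_size=32, window_shift=16):
    """Mimic Dataset.window(window_size + 1, window_shift, drop_remainder=True)."""
    n = window_size + 1
    return [np.asarray(chorale[start:start + n])
            for start in range(0, len(chorale) - n + 1, window_shift)]

def create_target_np(batch):
    # Targets are the inputs shifted one note into the future.
    return batch[:, :-1], batch[:, 1:]

rng = np.random.default_rng(0)
chorale = rng.integers(36, 82, size=(100, 4))          # fake 100-chord chorale
wins = windows_np(chorale)                             # windows start at 0, 16, 32, 48, 64
batch = np.stack([w.reshape(-1) for w in wins])        # flatten each window's chords to notes
X, Y = create_target_np(batch)
print(len(wins), batch.shape, X.shape, Y.shape)
```

The window starting at 80 would need chords up to index 112, so drop_remainder discards it, leaving 5 windows.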

Now let's create the training set, the validation set and the test set:

import tensorflow as tf

train_set = bach_dataset( train_chorales, shuffle_buffer_size=1000 )
valid_set = bach_dataset( valid_chorales )
test_set = bach_dataset( test_chorales )

Now let's create the model:

  • We could feed the note values directly to the model, as floats, but this would probably not give good results. Indeed, the relationships between notes are not that simple: for example, if you replace a C3 with a C4, the melody will still sound fine, even though these notes are 12 semi-tones apart (i.e., one octave).
    Conversely, if you replace a C3 with a C#3, it's very likely that the chord will sound horrible, despite these notes being just next to each other.
    So we will use an Embedding layer to convert each note to a small vector representation (see Chapter 16 for more details on embeddings). We will use 5-dimensional embeddings, so the output of this first layer will have a shape of [batch_size, window_size, 5].
  • We will then feed this data to a small WaveNet-like neural network, composed of a stack of 4 Conv1D layers with doubling dilation rates. We will intersperse these layers with BatchNormalization layers for faster, better convergence.
  • Then one LSTM layer to try to capture long-term patterns.
  • And finally a Dense layer to produce the final note probabilities. It will predict one probability for each window in the batch, for each time step, and for each possible note (including silence). So the output shape will be [batch_size, window_size, 47].
    n_embedding_dims = 5
    # n_notes = len(notes) #==47
    
    model = keras.models.Sequential([
      keras.layers.Embedding( input_dim=n_notes, output_dim=n_embedding_dims,
                              input_shape=[None] ),
    
      keras.layers.Conv1D( 32, kernel_size=2, padding="causal", activation="relu" ),
      keras.layers.BatchNormalization(), # for faster better convergence
    
      keras.layers.Conv1D( 48, kernel_size=2, padding="causal", activation="relu",
                           dilation_rate=2 ),
      keras.layers.BatchNormalization(),
    
      keras.layers.Conv1D( 64, kernel_size=2, padding="causal", activation="relu",
                           dilation_rate=4 ),
      keras.layers.BatchNormalization(),
    
      keras.layers.Conv1D( 96, kernel_size=2, padding="causal", activation="relu", 
                           dilation_rate=8 ),
      keras.layers.BatchNormalization(),
    
      # LSTM layer to try to capture long-term patterns
      keras.layers.LSTM( 256, return_sequences=True),
      keras.layers.Dense( n_notes, activation="softmax" )                     
    ])
    
    model.summary()
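With kernel_size=2 and dilation rates 1, 2, 4 and 8 (the first Conv1D uses the default dilation_rate=1), each output of the convolutional stack can only see a limited number of past notes. The sketch below computes that receptive field with the standard formula for stacked stride-1 causal convolutions, 1 + Σ (kernel_size − 1) × dilation_rate. `receptive_field` is a hypothetical helper; the Embedding and BatchNormalization layers do not change the result:

```python
def receptive_field(kernel_size, dilation_rates):
    """Number of input steps seen by one output of stacked stride-1 causal Conv1D layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilation_rates)

rf_notes = receptive_field(kernel_size=2, dilation_rates=[1, 2, 4, 8])
print(rf_notes, rf_notes / 4)  # 16 notes, i.e. 4 chords
```

Sixteen notes is only 4 chords, which is consistent with adding the LSTM layer on top to try to capture longer-term patterns.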

Now we're ready to compile and train the model!

optimizer = keras.optimizers.Nadam( learning_rate=1e-3 ) # `lr` is a deprecated alias
model.compile( loss="sparse_categorical_crossentropy", optimizer=optimizer,
               metrics = ["accuracy"] )
model.fit( train_set, epochs=20, validation_data=valid_set )

model.save("/content/drive/MyDrive/Colab Notebooks/models/my_BachChorales_model.h5")

model.evaluate(test_set)

Note: There's no real need for a test set in this exercise, since we will perform the final evaluation by just listening to the music produced by the model. So if you want, you can add the test set to the train set, and train the model again, hopefully getting a slightly better model.

Now let's write a function that will generate a new chorale. We will give it a few seed chords, it will convert them to arpeggios (the format expected by the model), and use the model to predict the next note, then the next, and so on. In the end, it will group the notes 4 by 4 to create chords again, and return the resulting chorale.

Warning: model.predict_classes(X) is deprecated. It is replaced with np.argmax(model.predict(X), axis=-1).

def generate_chorale( model, seed_chords, length ):
  arpegio = preprocess( tf.constant(seed_chords, dtype=tf.int64) )
  arpegio = tf.reshape( arpegio, [1, -1] )

  for chord in range(  length ):
    for note in range(4): # each chord = 4 notes
      # next_note = model.predict_classes(arpegio)[:1, -1:]
      next_note = np.argmax( model.predict(arpegio), axis=-1)[:1, -1:]
      arpegio = tf.concat( [arpegio, next_note], axis=1 )
  arpegio = tf.where( arpegio==0, arpegio, arpegio+min_note-1 )
  return tf.reshape( arpegio, shape=[-1,4] )
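The `np.argmax(model.predict(arpegio), axis=-1)[:1, -1:]` line in generate_chorale() plays the role of the deprecated predict_classes call. A toy sketch with made-up probabilities of shape [1, time_steps, n_classes] (values and shapes chosen only for illustration) shows what that indexing returns:

```python
import numpy as np

# Fake model output: 1 sequence, 3 time steps, 4 classes (illustrative values).
probas = np.array([[[0.1, 0.6, 0.2, 0.1],
                    [0.7, 0.1, 0.1, 0.1],
                    [0.2, 0.2, 0.5, 0.1]]])
classes = np.argmax(probas, axis=-1)  # most likely class at every time step, shape [1, 3]
next_note = classes[:1, -1:]          # last time step only, kept 2D: shape [1, 1]
print(classes, next_note)
```

Keeping the [1, 1] shape is what allows `tf.concat([arpegio, next_note], axis=1)` to append the predicted note to the sequence.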

To test this function, we need some seed chords. Let's use the first 8 chords of one of the test chorales (it's actually just 2 different chords, each played 4 times):

seed_chords = test_chorales[2][:8]
play_chords(seed_chords, amplitude=0.2)

seed_chords


Now we are ready to generate our first chorale! Let's ask the function to generate 56 more chords, for a total of 64 chords, i.e., 16 bars (assuming 4 chords per bar, i.e., a 4/4 signature):

new_chorale = generate_chorale(model, seed_chords, 56)
play_chords(new_chorale)

new_chorale


     This approach has one major flaw: it is often too conservative. Indeed, the model will not take any risk: it will always choose the note with the highest score, and since repeating the previous note generally sounds good enough, it's the least risky option, so the algorithm will tend to make notes last longer and longer. Pretty boring. Plus, if you run the model multiple times, it will always generate the same melody.

     So let's spice things up a bit! Instead of always picking the note with the highest score, we will pick the next note randomly, according to the predicted probabilities. For example, if the model predicts a C3 with 75% probability, and a G3 with a 25% probability, then we will pick one of these two notes randomly, with these probabilities. We will also add a temperature parameter that will control how "hot" (i.e., daring) we want the system to feel. A high temperature will bring the predicted probabilities closer together, reducing the probability of the likely notes and increasing the probability of the unlikely ones.
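Applying a temperature amounts to dividing the log-probabilities by T and renormalizing, i.e., sampling from softmax(log(p) / T). Here is a NumPy sketch of that idea; `apply_temperature` is a hypothetical helper, and `np.random.Generator.choice` stands in for `tf.random.categorical`, which takes the unnormalized log(p) / T directly:

```python
import numpy as np

def apply_temperature(probas, temperature):
    """Rescale a probability vector: softmax(log(p) / T)."""
    logits = np.log(probas) / temperature
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()

p = np.array([0.75, 0.25])  # e.g. C3 vs. G3, as in the text
for t in (0.5, 1.0, 2.0):
    print(t, apply_temperature(p, t).round(3))

rng = np.random.default_rng(42)
sample = rng.choice(len(p), p=apply_temperature(p, 1.5))  # draw one note index
```

T < 1 sharpens the distribution toward the most likely note, T > 1 flattens it, and T = 1 leaves the model's probabilities unchanged.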

def generate_chorale_v2( model, seed_chords, length, temperature=1 ):
  arpegio = preprocess( tf.constant(seed_chords, dtype=tf.int64) )
  arpegio = tf.reshape( arpegio, [1, -1] )

  for chord in range(  length ):
    for note in range(4): 
      next_note_probas = model.predict(arpegio)[0, -1:] # next-note probabilities, shape [1, n_notes]
      rescaled_logits = tf.math.log( next_note_probas ) / temperature # apply the temperature
      next_note = tf.random.categorical( rescaled_logits, num_samples=1 )
      arpegio = tf.concat( [arpegio, next_note], axis=1 )
  arpegio = tf.where( arpegio==0, arpegio, arpegio+min_note-1 )
  return tf.reshape( arpegio, shape=[-1,4] )

Let's generate 3 chorales using this new function: one cold, one medium, and one hot (feel free to experiment with other seeds, lengths and temperatures). The code saves each chorale to a separate file. You can run these cells over and over again until you generate a masterpiece!

new_chorale_v2_cold = generate_chorale_v2(model, seed_chords, 56, temperature=0.8)
play_chords(new_chorale_v2_cold, filepath="bach_cold.wav")

new_chorale_v2_medium = generate_chorale_v2(model, seed_chords, 56, temperature=1.0)
play_chords(new_chorale_v2_medium, filepath="bach_medium.wav")

new_chorale_v2_hot = generate_chorale_v2(model, seed_chords, 56, temperature=1.5)
play_chords(new_chorale_v2_hot, filepath="bach_hot.wav")

Lastly, you can try a fun social experiment: send your friends a few of your favorite generated chorales, plus the real chorale, and ask them to guess which one is the real one!

play_chords(test_chorales[2][:64], filepath="bach_test_4.wav")
