Andrew Ng Deep Learning, Course 5 Week 1 Programming Assignment 3: Jazz Improvisation with LSTM

Packages

Run the following cell to load all the packages you'll need. This may take a few minutes!

In [1]:
import IPython
import sys
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from music21 import *
from grammar import *
from qa import *
from preprocess import * 
from music_utils import *
from data_utils import *
from outputs import *
from test_utils import *

from tensorflow.keras.layers import Dense, Activation, Dropout, Input, LSTM, Reshape, Lambda, RepeatVector
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

1 - Problem Statement

You would like to create a jazz music piece specially for a friend's birthday. However, you don't know how to play any instruments, or how to compose music. Fortunately, you know deep learning and will solve this problem using an LSTM network!

You will train a network to generate novel jazz solos in a style representative of a body of performed work. 😎🎷

1.1 - Dataset

To get started, you'll train your algorithm on a corpus of Jazz music. Run the cell below to listen to a snippet of the audio from the training set:

In [2]:
IPython.display.Audio('./data/30s_seq.wav')
Out[2]: (audio playback widget)

The preprocessing of the musical data has been taken care of already, which for this notebook means it's been rendered in terms of musical "values."

What are musical "values"? (optional)

You can informally think of each "value" as a note, which comprises a pitch and duration. For example, if you press down a specific piano key for 0.5 seconds, then you have just played a note. In music theory, a "value" is actually more complicated than this -- specifically, it also captures the information needed to play multiple notes at the same time. For example, when playing a music piece, you might press down two piano keys at the same time (playing multiple notes at the same time generates what's called a "chord"). But you don't need to worry about the details of music theory for this assignment.

Music as a sequence of values

  • For the purposes of this assignment, all you need to know is that you'll obtain a dataset of values, and will use an RNN model to generate sequences of values.
  • Your music generation system will use 90 unique values.

Run the following code to load the raw music data and preprocess it into values. This might take a few minutes!

In [3]:
X, Y, n_values, indices_values, chords = load_music_utils('data/original_metheny.mid')
print('number of training examples:', X.shape[0])
print('Tx (length of sequence):', X.shape[1])
print('total # of unique values:', n_values)
print('shape of X:', X.shape)
print('Shape of Y:', Y.shape)
print('Number of chords', len(chords))
number of training examples: 60
Tx (length of sequence): 30
total # of unique values: 90
shape of X: (60, 30, 90)
Shape of Y: (30, 60, 90)
Number of chords 19

You have just loaded the following:

  • X: This is an (m, $T_x$, 90) dimensional array.

    • You have m training examples, each of which is a snippet of $T_x =30$ musical values.
    • At each time step, the input is one of 90 different possible values, represented as a one-hot vector.
      • For example, X[i,t,:] is a one-hot vector representing the value of the i-th example at time t.
  • Y: a $(T_y, m, 90)$ dimensional array

    • This is essentially the same as X, but shifted one step to the left (to the past); see the optional sanity check after this list.
    • Notice that the data in Y is reordered to be dimension $(T_y, m, 90)$, where $T_y = T_x$. This format makes it more convenient to feed into the LSTM later.
    • Similar to the dinosaur assignment, you're using the previous values to predict the next value.
      • So your sequence model will try to predict $y^{\langle t \rangle}$ given $x^{\langle 1\rangle}, \ldots, x^{\langle t \rangle}$.
  • n_values: The number of unique values in this dataset. This should be 90.

  • indices_values: python dictionary mapping integers 0 through 89 to musical values.

  • chords: the chords used in the input MIDI file.
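
If you'd like to verify the shift relationship between X and Y, here is an optional sanity check (a sketch, assuming the one-step shift described above; not part of the assignment):

    X_swapped = np.swapaxes(X, 0, 1)              # (Tx, m, 90): the same layout as Y
    print(np.array_equal(Y[:-1], X_swapped[1:]))  # expected True: y<t> = x<t+1>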

1.2 - Model Overview

Here is the architecture of the model you'll use. It's similar to the Dinosaurus model, except that you'll implement it in Keras.

Figure 1: Basic LSTM model

  • $X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, \cdots, x^{\langle T_x \rangle})$ is a window of size $T_x$ scanned over the musical corpus.
  • Each $x^{\langle t \rangle}$ is an index corresponding to a value.
  • $\hat{y}^{\langle t \rangle}$ is the prediction for the next value.
  • You'll be training the model on random snippets of 30 values taken from a much longer piece of music.
    • Thus, you won't bother to set the first input $x^{\langle 1 \rangle} = \vec{0}$, since most of these snippets of audio start somewhere in the middle of a piece of music.
    • You're setting each of the snippets to have the same length $T_x = 30$ to make vectorization easier.

Overview of Sections 2 and 3

In Section 2, you're going to train a model that predicts the next note in a style similar to the jazz music it's trained on. The training is contained in the weights and biases of the model.

Then, in Section 3, you're going to use those weights and biases in a new model that predicts a series of notes, using the previous note to predict the next one.

  • The weights and biases are transferred to the new model using the global shared layers (LSTM_cell, densor, and reshaper) described below.

2 - Building the Model

Now, you'll build and train a model that will learn musical patterns.

  • The model takes input X of shape $(m, T_x, 90)$ and labels Y of shape $(T_y, m, 90)$.
  • You'll use an LSTM with hidden states that have $n_{a} = 64$ dimensions.
In [4]:
# number of dimensions for the hidden state of each LSTM cell.
n_a = 64 

Sequence generation uses a for-loop

  • If you're building an RNN where, at test time, the entire input sequence $x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, \ldots, x^{\langle T_x \rangle}$ is given in advance, then Keras has simple built-in functions to build the model.
  • However, for sequence generation, at test time you won't know all the values of $x^{\langle t\rangle}$ in advance.
  • Instead, you'll generate them one at a time using $x^{\langle t\rangle} = y^{\langle t-1 \rangle}$.
    • The input at time "t" is the prediction at the previous time step "t-1".
  • So you'll need to implement your own for-loop to iterate over the time steps; a toy illustration of this feedback loop follows this list.
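
Here is a toy numpy illustration of that feedback loop (a hypothetical stand-in for the real model, just to show the mechanics of $x^{\langle t\rangle} = y^{\langle t-1 \rangle}$):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((90, 90))    # stand-in "weights" (hypothetical)

    def step(x):
        """Stand-in for one LSTM step followed by a softmax over the 90 values."""
        z = W @ x
        p = np.exp(z - z.max())
        p /= p.sum()
        return np.eye(90)[p.argmax()]    # one-hot of the most likely next value

    x = np.eye(90)[0]                    # arbitrary one-hot starting value
    for t in range(20):                  # generate 20 values
        y = step(x)                      # prediction y<t>
        x = y                            # feed it back: x<t+1> = y<t>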

Shareable weights

  • The function djmodel() will call the LSTM layer $T_x$ times using a for-loop.
  • It is important that all $T_x$ copies have the same weights.
    • The $T_x$ steps should have shared weights that aren't re-initialized.
  • Referencing a globally defined shared layer will utilize the same layer-object instance at each time step.
  • The key steps for implementing layers with shareable weights in Keras are:
  1. Define the layer objects (you'll use global variables for this).
  2. Call these objects when propagating the input (a minimal sketch follows this list).
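
For instance, here is a minimal sketch of weight sharing with a toy Dense layer (toy sizes, unrelated to the assignment): the layer object is defined once, and every call reuses the same weights.

    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Model

    shared_dense = Dense(4)              # 1. define the layer object once (one set of weights)
    x1 = Input(shape=(8,))
    x2 = Input(shape=(8,))
    h1 = shared_dense(x1)                # 2. call it when propagating the input...
    h2 = shared_dense(x2)                # ...the second call reuses the same kernel and bias
    model = Model(inputs=[x1, x2], outputs=[h1, h2])
    print(len(model.trainable_weights))  # 2: one shared kernel + one shared bias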

3 types of layers

  • The layer objects you need for global variables have been defined.
    • Just run the next cell to create them!
  • Please read the Keras documentation and understand these layers:
    • Reshape(): Reshapes an output to a certain shape.
    • LSTM(): Long Short-Term Memory layer
    • Dense(): A regular fully-connected neural network layer.
In [5]:
n_values = 90 # number of music values
reshaper = Reshape((1, n_values))                  # Used in Step 2.B of djmodel(), below
LSTM_cell = LSTM(n_a, return_state = True)         # Used in Step 2.C
densor = Dense(n_values, activation='softmax')     # Used in Step 2.D
  • reshaper, LSTM_cell, and densor are globally defined layer objects that you'll use to implement djmodel().
  • In order to propagate a Keras tensor object X through one of these layers, use layer_object().
    • For one input, use layer_object(X)
    • For more than one input, put the inputs in a list: layer_object([X1,X2]). A short example of both conventions follows.
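
For example, using the layers defined above (variable names here are illustrative, not the graded solution):

    x_slice = Input(shape=(n_values,))   # one value per example, shape (m, 90)
    a_prev = Input(shape=(n_a,))
    c_prev = Input(shape=(n_a,))
    x = reshaper(x_slice)                # one input:  reshaper(X) -> (m, 1, 90)
    a, _, c = LSTM_cell(inputs=x, initial_state=[a_prev, c_prev])  # states passed as a list
    y = densor(a)                        # (m, 90): softmax over the 90 values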

Exercise 1 - djmodel

Implement djmodel().

Inputs (given)

  • The Input() layer is used for defining the input X as well as the initial hidden state a0 and cell state c0 (the state inputs are sketched below).
  • The shape parameter takes a tuple that does not include the batch dimension (m).
    • For example,
      X = Input(shape=(Tx, n_values)) # X has 3 dimensions and not 2: (m, Tx, n_values)
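
      The initial states are defined the same way. As a sketch consistent with the description above (a0 and c0 as named in the text; the batch dimension is again implicit):

      a0 = Input(shape=(n_a,), name='a0')   # initial hidden state, shape (m, n_a)
      c0 = Input(shape=(n_a,), name='c0')   # initial cell state, shape (m, n_a)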
      

Step 1: Outputs

  • Create an empty list "outputs" to save the outputs of the LSTM Cell at every time step.

Step 2: Loop through time steps

  • Loop for $t \in 1, \ldots, T_x$:

2A. Select the 't' time-step vector from X.

  • X has the shape (m, Tx, n_values).
  • The shape of the 't' selection should be (n_values,).
  • Recall that if you were implementing this in numpy instead of Keras, you would extract a slice from a 3D numpy array like this (the Keras equivalent is sketched just below):
    var1 = array1[:,1,:]
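
    Keras symbolic tensors support the same slicing syntax, so inside the loop the selection can be written directly (a sketch; t is the loop variable from Step 2):

    x = X[:, t, :]    # time step t for every example: shape (m, n_values)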
    

2B. Reshape x to be (1, n_values).

  • Use the reshaper() layer. This is a function that takes the previous layer as its input argument.

2C. Run x through one step of LSTM_cell.

  • Initialize the LSTM_cell with the previous step's hidden state $a$ and cell state $c$.
  • Use the following formatting:
    next_hidden_state, _, next_cell_state = LSTM_cell(inputs=input_x, initial_state=[previous_hidden_state, previous_cell_state])
    
    • Choose appropriate variables for inputs, hidden state and cell state.

2D. Dense layer

  • Propagate the LSTM's hidden state through a dense+softmax layer using densor.

2E. Append output

  • Append the output to the list of "outputs".

Step 3: After the loop, create the model

  • Use the Keras Model object to create a model. There are two ways to instantiate the Model object. One is by subclassing, which you won't use here. Instead, you'll use the highly flexible Functional API, which you may remember from an earlier assignment in this course! With the Functional API, you'll start from your Input, then specify the model's forward pass with chained layer calls, and finally create the model from inputs and outputs.

  • Specify the inputs and output like so:

    model = Model(inputs=[input_x, initial_hidden_state, initial_cell_state], outputs=the_outputs)
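
Putting Steps 1-3 together, one possible implementation looks like this (a sketch consistent with the steps above, not necessarily identical to the graded solution):

    def djmodel(Tx, LSTM_cell, densor, reshaper):
        n_values = densor.units                  # 90 possible music values
        n_a = LSTM_cell.units                    # 64 hidden dimensions

        # Inputs (given): the sequence X plus the initial hidden and cell states
        X = Input(shape=(Tx, n_values))
        a0 = Input(shape=(n_a,), name='a0')
        c0 = Input(shape=(n_a,), name='c0')
        a, c = a0, c0

        outputs = []                             # Step 1: collect one output per time step
        for t in range(Tx):                      # Step 2: loop over the time steps
            x = X[:, t, :]                       # 2A: select time step t, shape (m, n_values)
            x = reshaper(x)                      # 2B: reshape to (m, 1, n_values)
            a, _, c = LSTM_cell(inputs=x, initial_state=[a, c])  # 2C: one LSTM step
            out = densor(a)                      # 2D: dense + softmax over the 90 values
            outputs.append(out)                  # 2E: save the prediction

        return Model(inputs=[X, a0, c0], outputs=outputs)       # Step 3

    model = djmodel(Tx=30, LSTM_cell=LSTM_cell, densor=densor, reshaper=reshaper)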