Improvise a Jazz Solo with an LSTM Network

在这个编程作业中,你将实现一个利用LSTM网络生成音乐的模型,特别是生成爵士乐独奏。首先了解数据集和模型概述,然后构建模型并训练,最后生成音乐。你将学习如何应用LSTM进行音乐生成,并使用深度学习创作自己的爵士音乐。
摘要由CSDN通过智能技术生成

Improvise a Jazz Solo with an LSTM Network

Welcome to your final programming assignment of this week! In this notebook, you will implement a model that uses an LSTM to generate music. You will even be able to listen to your own music at the end of the assignment.

You will learn to:
- Apply an LSTM to music generation.
- Generate your own jazz music with deep learning.

Please run the following cell to load all the packages required in this assignment. This may take a few minutes.

from __future__ import print_function
import IPython
import sys
from music21 import *
import numpy as np
from grammar import *
from qa import *
from preprocess import * 
from music_utils import *
from data_utils import *
from keras.models import load_model, Model
from keras.layers import Dense, Activation, Dropout, Input, LSTM, Reshape, Lambda, RepeatVector
from keras.initializers import glorot_uniform
from keras.utils import to_categorical
from keras.optimizers import Adam
from keras import backend as K
Using TensorFlow backend.

1 - Problem statement

You would like to create a jazz music piece specially for a friend’s birthday. However, you don’t know any instruments or music composition. Fortunately, you know deep learning and will solve this problem using an LSTM netwok.

You will train a network to generate novel jazz solos in a style representative of a body of performed work.

1.1 - Dataset

You will train your algorithm on a corpus of Jazz music. Run the cell below to listen to a snippet of the audio from the training set:

We have taken care of the preprocessing of the musical data to render it in terms of musical “values.” You can informally think of each “value” as a note, which comprises a pitch and a duration. For example, if you press down a specific piano key for 0.5 seconds, then you have just played a note. In music theory, a “value” is actually more complicated than this–specifically, it also captures the information needed to play multiple notes at the same time. For example, when playing a music piece, you might press down two piano keys at the same time (playng multiple notes at the same time generates what’s called a “chord”). But we don’t need to worry about the details of music theory for this assignment. For the purpose of this assignment, all you need to know is that we will obtain a dataset of values, and will learn an RNN model to generate sequences of values.

Our music generation system will use 78 unique values. Run the following code to load the raw music data and preprocess it into values. This might take a few minutes.

X, Y, n_values, indices_values = load_music_utils()
print('shape of X:', X.shape)
print('number of training examples:', X.shape[0])
print('Tx (length of sequence):', X.shape[1])
print('total # of unique values:', n_values)
print('Shape of Y:', Y.shape)
shape of X: (60, 30, 78)
number of training examples: 60
Tx (length of sequence): 30
total # of unique values: 78
Shape of Y: (30, 60, 78)

You have just loaded the following:

  • X: This is an (m, Tx T x , 78) dimensional array. We have m training examples, each of which is a snippet of Tx=30 T x = 30 musical values. At each time step, the input is one of 78 different possible values, represented as a one-hot vector. Thus for example, X[i,t,:] is a one-hot vector representating the value of the i-th example at time t.

  • Y: This is essentially the same as X, but shifted one step to the left (to the past). Similar to the dinosaurus assignment, we’re interested in the network using the previous values to predict the next value, so our sequence model will try to predict yt y ⟨ t ⟩ given x1,,xt x ⟨ 1 ⟩ , … , x ⟨ t ⟩ . However, the data in Y is reordered to be dimension (Ty,m,78) ( T y , m , 78 ) , where Ty=Tx T y = T x . This format makes it more convenient to feed to the LSTM later.

  • n_values: The number of unique values in this dataset. This should be 78.

  • indices_values: python dictionary mapping from 0-77 to musical values.

1.2 - Overview of our model

Here is the architecture of the model we will use. This is similar to the Dinosaurus model you had used in the previous notebook, except that in you will be implementing it in Keras. The architecture is as follows:

We will be training the model on random snippets of 30 values taken from a much longer piece of music. Thus, we won’t bother to set the first input x1=0⃗  x ⟨ 1 ⟩ = 0 → , which we had done previously to denote the start of a dinosaur name, since now most of these snippets of audio start somewhere in the middle of a piece of music. We are setting each of the snippts to have the same length Tx=30 T x = 30 to make vectorization easier.

2 - Building the model

In this part you will build and train a model that will learn musical patterns. To do so, you will need to build a model that takes in X of shape (m,Tx,78) ( m , T x , 78 ) and Y of shape (Ty,m,78) ( T y , m , 78 ) . We will use an LSTM with 64 dimensional hidden states. Lets set n_a = 64.

n_a = 64 

Here’s how you can create a Keras model with multiple inputs and outputs. If you’re building an RNN where even at test time entire input sequence x1,x2,,xTx x ⟨ 1 ⟩ , x ⟨ 2 ⟩ , … , x ⟨ T x ⟩ were given in advance, for example if the inputs were words and the output was a label, then Keras has simple built-in functions to build the model. However, for sequence generation, at test time we don’t know all the values of xt x ⟨ t ⟩ in advance; instead we generate them one at a time using xt=yt1 x ⟨ t ⟩ = y ⟨ t − 1 ⟩ . So the code will be a bit more complicated, and you’ll need to implement your own for-loop to iterate over the different time steps.

The function djmodel() will call the LSTM layer Tx T x times using a for-loop, and it is important that all Tx T x copies have the same weights. I.e., it should not re-initiaiize the weights every time—the Tx T x steps should have shared weights. The key steps for implementing layers with shareable weights in Keras are:
1. Define the layer objects (we will use global variables for this).
2. Call these objects when propagating the input.

We have defined the layers objects you need as global variables. Please run the next cell to create them. Please check the Keras documentation to make sure you understand what these layers are: Reshape(), LSTM(), Dense().

reshapor = Reshape((1, 78))                        # Used in Step 2.B of djmodel(), below
LSTM_cell = LSTM(n_a, return_state = True)         #
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值