RNN-RBM
The RNN-RBM is an energy-based model for density estimation of temporal sequences, where the feature vector v^{(t)} at time step t may be high-dimensional. It allows us to describe multimodal conditional distributions of v^{(t)} | \mathcal{A}^{(t)}, where \mathcal{A}^{(t)} \equiv \{v^{(\tau)} \mid \tau < t\} denotes the sequence history at time t, via a series of conditional RBMs (one per time step) whose parameters b_v^{(t)}, b_h^{(t)} depend on the output of a deterministic RNN with hidden units u^{(t)}:
b_v^{(t)} = b_v + W_{uv} u^{(t-1)}    (1)

b_h^{(t)} = b_h + W_{uh} u^{(t-1)}    (2)
and the single-layer RNN recurrence relation is defined by:
u^{(t)} = \tanh(b_u + W_{uu} u^{(t-1)} + W_{vu} v^{(t)})    (3)
The resulting model is unrolled in time as in the following figure:
The overall probability distribution of a sequence of length T is given by the product of the per-time-step conditionals:

P(\{v^{(t)}\}) = \prod_{t=1}^{T} P(v^{(t)} \mid \mathcal{A}^{(t)})    (4)

where each factor on the right-hand side is the marginal probability of the t-th RBM. Equivalently, the log-likelihood decomposes into a sum over time steps, which is what allows each conditional RBM to receive its own CD gradient during training.
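To make equations (1)-(3) concrete, here is a minimal numpy sketch of the deterministic recurrence; all sizes, parameter scales and the random toy data are hypothetical choices of this sketch, not part of the tutorial code:

import numpy

# Toy sizes and data, chosen arbitrarily for illustration.
n_visible, n_hidden, n_hidden_recurrent, n_steps = 88, 150, 100, 5
rng = numpy.random.RandomState(0)
v_seq = (rng.rand(n_steps, n_visible) < 0.5).astype(float)  # toy binary frames

bv, bh = numpy.zeros(n_visible), numpy.zeros(n_hidden)
bu = numpy.zeros(n_hidden_recurrent)
Wuv = 1e-4 * rng.randn(n_hidden_recurrent, n_visible)
Wuh = 1e-4 * rng.randn(n_hidden_recurrent, n_hidden)
Wvu = 1e-4 * rng.randn(n_visible, n_hidden_recurrent)
Wuu = 1e-4 * rng.randn(n_hidden_recurrent, n_hidden_recurrent)

u = numpy.zeros(n_hidden_recurrent)  # u^(0)
bv_t, bh_t = [], []
for v in v_seq:
    bv_t.append(bv + u.dot(Wuv))                  # equation (1)
    bh_t.append(bh + u.dot(Wuh))                  # equation (2)
    u = numpy.tanh(bu + v.dot(Wvu) + u.dot(Wuu))  # equation (3)
# Each (bv_t[t], bh_t[t]) pair parameterizes the conditional RBM for v^(t).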
Implementation
We construct two Theano functions: one to train the RNN-RBM, and one to generate sample sequences from it.
For training, given v^{(t)}, the RNN hidden states u^{(t)} and the associated RBM parameters b_v^{(t)}, b_h^{(t)} are deterministic and can be computed directly for each training sequence. A stochastic gradient descent (SGD) update on the parameters can then be estimated via contrastive divergence (CD) on the individual time steps of a sequence, in the same way that individual training examples are treated in a mini-batch for a regular RBM.
Sequence generation is similar, except that the v^{(t)} must be sampled sequentially at each time step with a separate (non-batch) Gibbs chain, before being passed down to the recurrence and the sequence history.
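Continuing the toy numpy sketch above, the generation procedure described here can be written as follows; the inner loop is our own plain-numpy stand-in for the k-step Gibbs chain that build_rbm constructs symbolically below (chain length 25 matches the generation setting used later):

def sigmoid(x):
    return 1.0 / (1.0 + numpy.exp(-x))

W = 0.01 * rng.randn(n_visible, n_hidden)  # toy RBM weights

u = numpy.zeros(n_hidden_recurrent)
generated = []
for t in range(10):                        # generate 10 frames
    bv_t = bv + u.dot(Wuv)                 # biases from the current history
    bh_t = bh + u.dot(Wuh)
    v = numpy.zeros(n_visible)             # k-step Gibbs chain from zeros
    for _ in range(25):
        h = (rng.rand(n_hidden) < sigmoid(v.dot(W) + bh_t)).astype(float)
        v = (rng.rand(n_visible) < sigmoid(h.dot(W.T) + bv_t)).astype(float)
    generated.append(v)                    # sampled frame v^(t)
    u = numpy.tanh(bu + v.dot(Wvu) + u.dot(Wuu))  # pass it down the history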
The RBM layer
The build_rbm function shown below builds a Gibbs chain from an input mini-batch (a binary matrix) via the CD approximation. Note that it also supports a single frame (a binary vector) in the non-batch case.
# Shared preamble for all the code in this section; midiread/midiwrite come
# from the modified Python MIDI package that the tutorial relies on.
import sys

import numpy
try:
    import pylab
except ImportError:
    print("pylab isn't available; plotting in generate() will fail.")

from midi.utils import midiread, midiwrite
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

numpy.random.seed(0xbeef)
rng = RandomStreams(seed=numpy.random.randint(1 << 30))


def build_rbm(v, W, bv, bh, k):
    '''Construct a k-step Gibbs chain starting at v for an RBM.

    v : Theano vector or matrix
        If a matrix, multiple chains will be run in parallel (batch).
    W : Theano matrix
        Weight matrix of the RBM.
    bv : Theano vector
        Visible bias vector of the RBM.
    bh : Theano vector
        Hidden bias vector of the RBM.
    k : scalar or Theano scalar
        Length of the Gibbs chain.

    Return a (v_sample, cost, monitor, updates) tuple:

    v_sample : Theano vector or matrix with the same shape as `v`
        Corresponds to the generated sample(s).
    cost : Theano scalar
        Expression whose gradient with respect to W, bv, bh is the CD-k
        approximation to the log-likelihood of `v` (training example) under
        the RBM. The cost is averaged in the batch case.
    monitor: Theano scalar
        Pseudo log-likelihood (also averaged in the batch case).
    updates: dictionary of Theano variable -> Theano variable
        The `updates` object returned by scan.'''

    def gibbs_step(v):
        mean_h = T.nnet.sigmoid(T.dot(v, W) + bh)
        h = rng.binomial(size=mean_h.shape, n=1, p=mean_h,
                         dtype=theano.config.floatX)
        mean_v = T.nnet.sigmoid(T.dot(h, W.T) + bv)
        v = rng.binomial(size=mean_v.shape, n=1, p=mean_v,
                         dtype=theano.config.floatX)
        return mean_v, v

    chain, updates = theano.scan(lambda v: gibbs_step(v)[1], outputs_info=[v],
                                 n_steps=k)
    v_sample = chain[-1]

    mean_v = gibbs_step(v_sample)[0]
    monitor = T.xlogx.xlogy0(v, mean_v) + T.xlogx.xlogy0(1 - v, 1 - mean_v)
    monitor = monitor.sum() / v.shape[0]

    def free_energy(v):
        return -(v * bv).sum() - T.log(1 + T.exp(T.dot(v, W) + bh)).sum()
    cost = (free_energy(v) - free_energy(v_sample)) / v.shape[0]

    return v_sample, cost, monitor, updates
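Here, cost is the difference of free energies F(v) - F(v_sample) averaged over the batch, with F(v) = -b_v^\top v - \sum_j \log(1 + e^{(v^\top W + b_h)_j}); differentiating it while holding v_sample constant yields the CD-k gradient estimate. As a quick smoke test, the function can be compiled on its own; the variable names below are our own, with arbitrary toy shapes:

# Hypothetical smoke test for build_rbm (names v_in, W0, chain_fn are ours).
v_in = T.matrix()
W0 = theano.shared(numpy.random.normal(
    0, 0.01, (88, 150)).astype(theano.config.floatX))
bv0 = theano.shared(numpy.zeros(88, dtype=theano.config.floatX))
bh0 = theano.shared(numpy.zeros(150, dtype=theano.config.floatX))
v_sample, cost, monitor, updates = build_rbm(v_in, W0, bv0, bh0, k=15)
chain_fn = theano.function([v_in], [v_sample, monitor], updates=updates)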
The RNN layer
The build_rnnrbm function defines the RNN recurrence relation used to obtain the RBM parameters. The recurrence function is flexible enough to serve both in the training scenario, where v^{(t)} is given and the "batch" RBM is constructed at the end on the whole sequence at once, and in the generation scenario, where v^{(t)} is sampled separately at each time step using the Gibbs chain defined above.
# Helper initializers used below; these are part of the original tutorial
# script but were missing from this excerpt.
def shared_normal(num_rows, num_cols, scale=1):
    '''Initialize a matrix shared variable with normally distributed
    elements.'''
    return theano.shared(numpy.random.normal(
        scale=scale, size=(num_rows, num_cols)).astype(theano.config.floatX))


def shared_zeros(*shape):
    '''Initialize a vector shared variable with zero elements.'''
    return theano.shared(numpy.zeros(shape, dtype=theano.config.floatX))


def build_rnnrbm(n_visible, n_hidden, n_hidden_recurrent):
    '''Construct a symbolic RNN-RBM and initialize parameters.

    n_visible : integer
        Number of visible units.
    n_hidden : integer
        Number of hidden units of the conditional RBMs.
    n_hidden_recurrent : integer
        Number of hidden units of the RNN.

    Return a (v, v_sample, cost, monitor, params, updates_train, v_t,
    updates_generate) tuple:

    v : Theano matrix
        Symbolic variable holding an input sequence (used during training)
    v_sample : Theano matrix
        Symbolic variable holding the negative particles for CD log-likelihood
        gradient estimation (used during training)
    cost : Theano scalar
        Expression whose gradient (considering v_sample constant) corresponds
        to the LL gradient of the RNN-RBM (used during training)
    monitor : Theano scalar
        Frame-level pseudo-likelihood (useful for monitoring during training)
    params : tuple of Theano shared variables
        The parameters of the model to be optimized during training.
    updates_train : dictionary of Theano variable -> Theano variable
        Update object that should be passed to theano.function when compiling
        the training function.
    v_t : Theano matrix
        Symbolic variable holding a generated sequence (used during sampling)
    updates_generate : dictionary of Theano variable -> Theano variable
        Update object that should be passed to theano.function when compiling
        the generation function.'''

    W = shared_normal(n_visible, n_hidden, 0.01)
    bv = shared_zeros(n_visible)
    bh = shared_zeros(n_hidden)
    Wuh = shared_normal(n_hidden_recurrent, n_hidden, 0.0001)
    Wuv = shared_normal(n_hidden_recurrent, n_visible, 0.0001)
    Wvu = shared_normal(n_visible, n_hidden_recurrent, 0.0001)
    Wuu = shared_normal(n_hidden_recurrent, n_hidden_recurrent, 0.0001)
    bu = shared_zeros(n_hidden_recurrent)

    params = W, bv, bh, Wuh, Wuv, Wvu, Wuu, bu  # learned parameters as shared
                                                # variables

    v = T.matrix()  # a training sequence
    u0 = T.zeros((n_hidden_recurrent,))  # initial value for the RNN hidden
                                         # units

    # If `v_t` is given, deterministic recurrence to compute the variable
    # biases bv_t, bh_t at each time step. If `v_t` is None, same recurrence
    # but with a separate Gibbs chain at each time step to sample (generate)
    # from the RNN-RBM. The resulting sample v_t is returned in order to be
    # passed down to the sequence history.
    def recurrence(v_t, u_tm1):
        bv_t = bv + T.dot(u_tm1, Wuv)
        bh_t = bh + T.dot(u_tm1, Wuh)
        generate = v_t is None
        if generate:
            v_t, _, _, updates = build_rbm(T.zeros((n_visible,)), W, bv_t,
                                           bh_t, k=25)
        u_t = T.tanh(bu + T.dot(v_t, Wvu) + T.dot(u_tm1, Wuu))
        return ([v_t, u_t], updates) if generate else [u_t, bv_t, bh_t]

    # For training, the deterministic recurrence is used to compute all the
    # {bv_t, bh_t, 1 <= t <= T} given v. Conditional RBMs can then be trained
    # in batches using those parameters.
    (u_t, bv_t, bh_t), updates_train = theano.scan(
        lambda v_t, u_tm1, *_: recurrence(v_t, u_tm1),
        sequences=v, outputs_info=[u0, None, None], non_sequences=params)
    v_sample, cost, monitor, updates_rbm = build_rbm(v, W, bv_t[:], bh_t[:],
                                                     k=15)
    updates_train.update(updates_rbm)

    # symbolic loop for sequence generation
    (v_t, u_t), updates_generate = theano.scan(
        lambda u_tm1, *_: recurrence(None, u_tm1),
        outputs_info=[None, u0], non_sequences=params, n_steps=200)

    return (v, v_sample, cost, monitor, params, updates_train, v_t,
            updates_generate)
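Note that updates_train carries the random-stream updates of the CD-15 chain, and updates_generate those of the 200 per-time-step CD-25 Gibbs chains; forgetting to pass them to theano.function would freeze the sampler. A minimal sketch of compiling just the generation function, with our own variable names:

# Hypothetical standalone use of build_rnnrbm (generate_fn is our name).
(v, v_sample, cost, monitor, params,
 updates_train, v_t, updates_generate) = build_rnnrbm(88, 150, 100)
generate_fn = theano.function([], v_t, updates=updates_generate)
piano_roll = generate_fn()  # array of shape (200, 88): n_steps x n_visible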
Putting it all together
We now have all the elements needed to train the network.
class RnnRbm:
    '''Simple class to train an RNN-RBM from MIDI files and to generate sample
    sequences.'''

    def __init__(
        self,
        n_hidden=150,
        n_hidden_recurrent=100,
        lr=0.001,
        r=(21, 109),
        dt=0.3
    ):
        '''Constructs and compiles Theano functions for training and sequence
        generation.

        n_hidden : integer
            Number of hidden units of the conditional RBMs.
        n_hidden_recurrent : integer
            Number of hidden units of the RNN.
        lr : float
            Learning rate
        r : (integer, integer) tuple
            Specifies the pitch range of the piano-roll in MIDI note numbers,
            including r[0] but not r[1], such that r[1]-r[0] is the number of
            visible units of the RBM at a given time step. The default (21,
            109) corresponds to the full range of piano (88 notes).
        dt : float
            Sampling period when converting the MIDI files into piano-rolls,
            or equivalently the time difference between consecutive time
            steps.'''
        self.r = r
        self.dt = dt
        (v, v_sample, cost, monitor, params, updates_train, v_t,
         updates_generate) = build_rnnrbm(
             r[1] - r[0],
             n_hidden,
             n_hidden_recurrent
         )

        gradient = T.grad(cost, params, consider_constant=[v_sample])
        updates_train.update(
            ((p, p - lr * g) for p, g in zip(params, gradient))
        )
        self.train_function = theano.function(
            [v],
            monitor,
            updates=updates_train
        )
        self.generate_function = theano.function(
            [],
            v_t,
            updates=updates_generate
        )

    def train(self, files, batch_size=100, num_epochs=200):
        '''Train the RNN-RBM via stochastic gradient descent (SGD) using MIDI
        files converted to piano-rolls.

        files : list of strings
            List of MIDI files that will be loaded as piano-rolls for
            training.
        batch_size : integer
            Training sequences will be split into subsequences of at most this
            size before applying the SGD updates.
        num_epochs : integer
            Number of epochs (pass over the training set) performed. The user
            can safely interrupt training with Ctrl+C at any time.'''

        assert len(files) > 0, 'Training set is empty!' \
                               ' (did you download the data files?)'
        dataset = [midiread(f, self.r,
                            self.dt).piano_roll.astype(theano.config.floatX)
                   for f in files]

        try:
            for epoch in range(num_epochs):
                numpy.random.shuffle(dataset)
                costs = []

                for s, sequence in enumerate(dataset):
                    for i in range(0, len(sequence), batch_size):
                        cost = self.train_function(sequence[i:i + batch_size])
                        costs.append(cost)

                print('Epoch %i/%i' % (epoch + 1, num_epochs))
                print(numpy.mean(costs))
                sys.stdout.flush()

        except KeyboardInterrupt:
            print('Interrupted by user.')

    def generate(self, filename, show=True):
        '''Generate a sample sequence, plot the resulting piano-roll and save
        it as a MIDI file.

        filename : string
            A MIDI file will be created at this location.
        show : boolean
            If True, a piano-roll of the generated sequence will be shown.'''

        piano_roll = self.generate_function()
        midiwrite(filename, piano_roll, self.r, self.dt)
        if show:
            extent = (0, self.dt * len(piano_roll)) + self.r
            pylab.figure()
            pylab.imshow(piano_roll.T, origin='lower', aspect='auto',
                         interpolation='nearest', cmap=pylab.cm.gray_r,
                         extent=extent)
            pylab.xlabel('time (s)')
            pylab.ylabel('MIDI note number')
            pylab.title('generated piano-roll')
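A minimal driver sketch; the data path below is an assumption of this sketch (it presumes the Nottingham MIDI files have been downloaded under data/Nottingham/train):

# Hypothetical driver: train on the Nottingham MIDI files, then sample.
import glob
import os

model = RnnRbm()
files = glob.glob(os.path.join('data', 'Nottingham', 'train', '*.mid'))
model.train(files, batch_size=100, num_epochs=200)
model.generate('sample1.mid')  # writes a MIDI file and shows the piano-roll
pylab.show()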
Results
We ran the code for 200 epochs on the Nottingham database; training took approximately 24 hours. The output was the following:
Epoch 1/200 -15.0308940028
Epoch 2/200 -10.4892606673
Epoch 3/200 -10.2394696138
Epoch 4/200 -10.1431669994
Epoch 5/200 -9.7005382843
Epoch 6/200 -8.5985647524
Epoch 7/200 -8.35115428534
Epoch 8/200 -8.26453580552
Epoch 9/200 -8.21208991542
Epoch 10/200 -8.16847274143

... truncated for brevity ...

Epoch 190/200 -4.74799179994
Epoch 191/200 -4.73488515216
Epoch 192/200 -4.7326138489
Epoch 193/200 -4.73841636884
Epoch 194/200 -4.70255511452
Epoch 195/200 -4.71872634914
Epoch 196/200 -4.7276415885
Epoch 197/200 -4.73497644728
Epoch 198/200 -inf
Epoch 199/200 -4.75554987143
Epoch 200/200 -4.72591935412
How to improve this code
* Preprocessing: transposing the sequences into a common tonality (e.g. C major/minor) and normalizing the tempo in beats (quarter notes) per minute can have the most effect on the generative quality of the model.
* Pretraining: initialize the RBM parameters with independent RBMs trained on fully shuffled frames (i.e. n_hidden_recurrent = 0), and initialize the RNN parameters with SGD or, preferably, Hessian-free optimization.
* Optimization techniques: gradient clipping (see the sketch after this list), Nesterov momentum, and the use of NADE for conditional density estimation.
* Hyperparameter search: learning rate, learning rate schedule, batch size, number of hidden units, momentum coefficient, momentum schedule, Gibbs chain length k and early stopping.
* Learning the initial condition u^{(0)} as a model parameter.
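As an illustration of the gradient clipping idea above, here is a hedged sketch of global-norm clipping that could replace the plain SGD update in RnnRbm.__init__; the threshold max_norm is a hypothetical value, not a tuned recommendation:

# Hypothetical global-norm gradient clipping, reusing `gradient`, `params`,
# `lr` and `updates_train` from RnnRbm.__init__; max_norm is a toy value.
max_norm = 5.0
grad_norm = T.sqrt(sum((g ** 2).sum() for g in gradient))
scale = T.minimum(1.0, max_norm / (grad_norm + 1e-7))
updates_train.update(
    ((p, p - lr * scale * g) for p, g in zip(params, gradient))
)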