Andrew Ng deeplearning.ai, Lesson 5 Week 1
RNN
An RNN takes the order of a sequence into account, so it can handle a more flexible range of problems; it is deep learning with a time dimension added.
At each step it combines the previous activation a<t-1> with the current input x<t>.
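Concretely, the lecture's per-step update is:
a<t> = tanh(Waa · a<t-1> + Wax · x<t> + ba)
y_hat<t> = softmax(Wya · a<t> + by)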
RNN backpropagation:
GRU
To mitigate vanishing gradients (exploding gradients are usually handled separately by gradient clipping), the GRU was introduced; it is effectively a simplified version of the LSTM.
The GRU also lets the network capture much longer-range dependencies.
The left part of the figure above is the simplified gated unit, which highlights the role of the memory cell. Γu is the update gate: it decides whether the memory should be overwritten by the new candidate value. For example, if Γu = 0 then c<t> = c<t-1>, so the current candidate is ignored and the old memory is kept.
The figure above is the full GRU, which adds the handwritten Γr. You can read Γr as relevance: how relevant the previous c<t-1> is when computing the next candidate value.
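For reference, the full GRU equations from the lecture, in the same notation:
c~<t> = tanh(Wc[Γr * c<t-1>, x<t>] + bc)
Γu = sigmoid(Wu[c<t-1>, x<t>] + bu)
Γr = sigmoid(Wr[c<t-1>, x<t>] + br)
c<t> = Γu * c~<t> + (1 - Γu) * c<t-1>
a<t> = c<t>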
LSTM
Differences and connections
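In equations, the LSTM from the lecture uses three gates (update, forget, output) and keeps a<t> separate from c<t>:
c~<t> = tanh(Wc[a<t-1>, x<t>] + bc)
Γu = sigmoid(Wu[a<t-1>, x<t>] + bu)
Γf = sigmoid(Wf[a<t-1>, x<t>] + bf)
Γo = sigmoid(Wo[a<t-1>, x<t>] + bo)
c<t> = Γu * c~<t> + Γf * c<t-1>
a<t> = Γo * tanh(c<t>)
Compared with the GRU, the forget gate Γf replaces (1 - Γu), and the output gate Γo controls how much of the cell state shows up in the activation.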
The diagrams make it easier to understand.
Backpropagation (optional)
Building a Recurrent Neural Network - Step by Step
The way the code blocks are structured is quite clever:
def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba -- Bias numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """

    # Initialize "caches" which will contain the list of all caches
    caches = []

    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    ### START CODE HERE ###

    # initialize "a" and "y" with zeros (≈2 lines)
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))

    # Initialize a_next (≈1 line)
    a_next = a0  # clever: a_next starts out as the initial hidden state

    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache (≈1 line)
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:, :, t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y_pred[:, :, t] = yt_pred
        # Append "cache" to "caches" (≈1 line)
        caches.append(cache)

    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    caches = (caches, x)

    return a, y_pred, caches
The code keeps updating a_next and passes it into the next loop iteration.
On entering an iteration, a_next holds the previous activation: it is passed into the cell's forward function in the role of a_prev, which then computes the real a_next (the activation this cell outputs).
That a_next then acts as a_prev for the next cell in the following iteration.
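For reference, the single-step helper called inside that loop looks roughly like this (a minimal sketch following the lecture formulas; the softmax helper is assumed to come from the assignment's utils, and the cache layout follows the same convention):

def rnn_cell_forward(xt, a_prev, parameters):
    # Unpack parameters (same names as in rnn_forward above)
    Wax, Waa, Wya = parameters["Wax"], parameters["Waa"], parameters["Wya"]
    ba, by = parameters["ba"], parameters["by"]
    # One RNN step: combine the previous activation with the current input
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # Prediction for this time step
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    # Cache the values needed for the backward pass
    cache = (a_next, a_prev, xt, parameters)
    return a_next, yt_pred, cache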
I haven't worked through the backward pass very thoroughly yet; leaving that as a TODO.
Python issue: invalid syntax
I assumed the mistake was on the reported line, but the real cause was on the line above.
print(id(gradient)
gradient=1
File "<ipython-input-27-ffe271b31b6d>", line 10
gradient=1
^
SyntaxError: invalid syntax
When this kind of error appears, it is usually because something is missing on the previous line (typically a closing parenthesis).
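Here the fix is simply to close the parenthesis on the line above:

print(id(gradient))
gradient = 1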
Dinosaurus Island
Variable assignment: reference or value?
a = np.arange(10)
b=np.zeros(10)
np.clip(a, 3, 6, out=b)
print(b)
a=2
b=a
print(id(a))
print(id(b))
b=3
print(a)
print(id(b))
[3. 3. 3. 3. 4. 5. 6. 6. 6. 6.]
1737253056
1737253056
2
1737253088
After b = a, a and b have the same id: b = a binds the name b to the same object a refers to.
But after b = 3, b's id changes, because b is rebound to a new object, while a keeps its old value.
So a = b just makes two names for one object. If you actually want a copy: for a Python list a = b[:] works, but for a NumPy array slicing only returns a view of the same data, so use a = b.copy() instead.
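A small demo of the difference (plain Python list vs NumPy array; nothing assumed beyond NumPy itself):

import numpy as np

lst = [1, 2, 3]
lst_copy = lst[:]        # slicing a list creates a new list
lst_copy[0] = 99
print(lst)               # [1, 2, 3]  -> unchanged

arr = np.array([1, 2, 3])
view = arr[:]            # slicing an ndarray returns a view on the same buffer
view[0] = 99
print(arr)               # [99  2  3] -> changed!

real_copy = arr.copy()   # an independent copy
real_copy[1] = -1
print(arr)               # [99  2  3] -> unchanged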
I started thinking about this because of the for loop in the assignment's gradient-clipping function:
def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.

    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

    Returns:
    gradients -- a dictionary with the clipped gradients.
    '''
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']

    ### START CODE HERE ###
    # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    print(id(dWax))
    print(id(dWaa))
    print(id(dWya))
    print(id(db))
    print(id(dby))
    for gradient in [dWax, dWaa, dWya, db, dby]:
        print("*" * 10)
        print(id(gradient))
        print(gradient)
        np.clip(gradient, -maxValue, maxValue, out=gradient)
        print(id(gradient))
    ### END CODE HERE ###

    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}

    return gradients
1807415035824
1807415034784
1807414297264
1807414297504
1807414297904
**********
1807415035824
[[ 17.88628473 4.36509851 0.96497468]
[-18.63492703 -2.77388203 -3.54758979]
[ -0.82741481 -6.27000677 -0.43818169]
[ -4.7721803 -13.13864753 8.8462238 ]
[ 8.81318042 17.09573064 0.50033642]]
1807415035824
**********
1807415034784
[[ -4.04677415 -5.45359948 -15.46477316 9.82367434 -11.0106763 ]
[-11.85046527 -2.05649899 14.86148355 2.36716267 -10.2378514 ]
[ -7.129932 6.25244966 -1.60513363 -7.6883635 -2.30030722]
[ 7.45056266 19.76110783 -12.44123329 -6.26416911 -8.03766095]
[-24.19083173 -9.23792022 -10.23875761 11.23977959 -1.31914233]]
1807415034784
**********
1807414297264
[[-16.23285446 6.46675452 -3.56270759 -17.43141037 -5.96649642]
[ -5.8859438 -8.73882298 0.29713815 -22.48257768 -2.67761865]]
1807414297264
**********
1807414297504
[[10.13183442]
[ 8.52797841]
[11.081875 ]
[11.19390655]
[14.87543132]]
1807414297504
**********
1807414297904
[[-11.18300684]
[ 8.45833407]]
1807414297904
gradients["dWaa"][1][2] = 10.0
gradients["dWax"][3][1] = -10.0
gradients["dWya"][1][2] = 0.2971381536101662
gradients["db"][4] = [10.]
gradients["dby"][1] = [8.45833407]
When np.clip(gradient, -maxValue, maxValue, out=gradient) changes the loop variable gradient inside this function, dWax, dWaa, dWya, db and dby change along with it.
The forum's explanation:
https://www.coursera.org/learn/nlp-sequence-models/programming/1dYg0/dinosaur-island-character-level-language-modeling/discussions/threads/AoiiXWJOEeiEphLB3FeC3g
The explanation: gradient and dWax are both just references (labels) pointing at the same underlying ndarray, so changing the data through one name changes it for every name. (I didn't fully get this at first, but the key point is that the ids printed above are identical because the loop variable is bound to the very same array object, and out=gradient makes np.clip write the clipped values into that array's buffer in place rather than creating a new array.)
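The difference between in-place modification and rebinding can be seen with a quick standalone check:

import numpy as np

dWax = np.array([17.9, -18.6, 8.8])
g = dWax                        # g and dWax are two names for the same array
np.clip(g, -10, 10, out=g)      # writes the result into the shared buffer in place
print(dWax)                     # [ 10.  -10.    8.8] -> dWax is clipped too

dWax2 = np.array([17.9, -18.6, 8.8])
g2 = np.clip(dWax2, -10, 10)    # returns a NEW array and rebinds g2 to it
print(dWax2)                    # [ 17.9 -18.6   8.8] -> the original is untouched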
numpy.random.choice
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.choice.html
numpy.random.choice(a, size=None, replace=True, p=None)
Generates a random sample from a given 1-D array
Parameters:
a : 1-D array-like or int
The elements to sample from; if a is an int, the sample is drawn as if a were np.arange(a).
size : int or tuple of ints, optional
Output shape, i.e. how many samples to draw.
replace : boolean, optional
Whether sampling is done with replacement (can be ignored here).
p : 1-D array-like, optional
The probability associated with each entry in a; its length must match a.
Returns:
samples : single item or ndarray
The generated random samples, a single value or an array drawn according to the given distribution.
In short: it generates discrete values from the given range according to the specified probability distribution.
Example:
>>> np.random.choice(5, 3)
array([0, 3, 4])
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0])
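In the Dinosaurus assignment, this is what draws the index of the next character from the softmax output y of shape (vocab_size, 1); shown here as a one-line sketch using the assignment's variable names:

idx = np.random.choice(np.arange(vocab_size), p=y.ravel())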
numpy.clip
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.clip.html
numpy.clip(a, a_min, a_max, out=None)
Purpose: limit an array's values to lower and upper bounds.
Values above a_max are set to a_max, and values below a_min are set to a_min.
out : ndarray, optional
The array to store the result in; its shape must be able to hold the output (it may be the input array itself, which clips in place).
Returns:
clipped_array : ndarray
Example:
>>> a = np.arange(10)
>>> np.clip(a, 1, 8)
array([1, 1, 2, 3, 4, 5, 6, 7, 8, 8])
np.ravel()
Flattens a multi-dimensional array into one dimension.
numpy.ndarray.ravel
ndarray.ravel([order])
Return a flattened array.
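Quick example:
>>> np.arange(6).reshape(2, 3).ravel()
array([0, 1, 2, 3, 4, 5])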
shuffle()
The shuffle() method randomly reorders all elements of a sequence in place.
It requires import random.
https://www.runoob.com/python/func-number-shuffle.html
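Minimal usage:

import random

nums = [1, 2, 3, 4, 5]
random.shuffle(nums)   # shuffles the list in place and returns None
print(nums)            # e.g. [3, 1, 5, 2, 4]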
Python lists
https://www.runoob.com/python/python-lists.html
Quick review: an empty list is just a = [].
Appending is a.append('x').
Huge pitfall: the model got no credit when submitted on Coursera
I read lots of explanations and workarounds in the discussion forum and none of them helped, even though the code and results were correct.
It finally turned out that I needed to switch to the v3 version of the notebook; I had been on an old version. The exact same code then passed.
A lot of time wasted; from now on I'll do the assignments on the newest notebook version.
Understanding
This assignment deepened my understanding of RNNs, especially the language-model part; I had never understood why, during sampling, the previous output has to be fed in as the next input.
After the assignment it makes sense. Honestly I think naming this function "sample" is a bit misleading: on first reading it suggests sampling in the signal-processing sense, when really sample is a generation/test routine. It sets the initial input to a zero vector, then repeatedly draws from the softmax output layer (changing the random seed each time to avoid repeats) to keep generating new dinosaur names.
Also:
x = np.zeros((vocab_size,1))
# Step 1': Initialize a_prev as zeros (≈1 line)
a_prev = np.zeros((n_a,1))
Note that the input x has dimension vocab_size (the length of the one-hot vocabulary), while a_prev has dimension n_a (the hidden-state size); these are different things.
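Putting this together, the sampling loop looks roughly like the sketch below. This is my re-sketch of the idea, not the graded code; the softmax helper, the exact parameter and bias names (Waa, Wax, Wya, ba, by) and the char_to_ix mapping are assumptions based on the earlier code.

def sample_sketch(parameters, char_to_ix, seed=0):
    # Re-sketch of the sampling idea (not the assignment's graded sample()).
    Waa, Wax, Wya = parameters["Waa"], parameters["Wax"], parameters["Wya"]
    ba, by = parameters["ba"], parameters["by"]        # bias names are an assumption
    vocab_size = by.shape[0]
    n_a = Waa.shape[0]

    x = np.zeros((vocab_size, 1))                      # input starts as a zero vector
    a_prev = np.zeros((n_a, 1))
    indices = []
    idx = -1
    counter = 0
    newline_idx = char_to_ix['\n']                     # stop when a newline is sampled

    while idx != newline_idx and counter < 50:
        a_prev = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + ba)   # one forward step
        y = softmax(np.dot(Wya, a_prev) + by)                         # distribution over characters
        np.random.seed(counter + seed)                                # change the seed each step to avoid repeats
        idx = np.random.choice(np.arange(vocab_size), p=y.ravel())    # draw the next character index
        indices.append(idx)
        x = np.zeros((vocab_size, 1))
        x[idx] = 1                                                    # previous output becomes the next input
        counter += 1
    return indices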
Improvise a Jazz Solo with an LSTM Network
Generating jazz with an LSTM. My result differs from the final expected output in the assignment template, but it still received full credit. Also, I think the tune I generated sounds pretty bad.
Approach
keras functional API
I'm not very fluent with it and often have to search the official documentation.
For example, you can instantiate a layer under a name and then directly specify its input.
Some Keras background: there are two ways to build a model, the Keras functional API and the Keras Sequential model.
This assignment uses the Keras functional API; a short introduction:
https://keras.io/getting-started/functional-api-guide/
The official getting-started example explains it well.
To build this kind of architecture, the code is as follows:
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
# Headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# Note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
# This embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# A LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
auxiliary_input = Input(shape=(5,), name='aux_input')
x = keras.layers.concatenate([lstm_out, auxiliary_input])
# We stack a deep densely-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# And finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
loss_weights=[1., 0.2])
model.fit([headline_data, additional_data], [labels, labels],
epochs=50, batch_size=32)
The compile and fit parts of the model can also be written like this (which I find clearer):
model.compile(optimizer='rmsprop',
loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
loss_weights={'main_output': 1., 'aux_output': 0.2})
# And trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
{'main_output': labels, 'aux_output': labels},
epochs=50, batch_size=32)
Keras: instantiating layers
The part that confused me earlier was instantiation. Usually you can specify a layer's input directly as you instantiate it:
from keras.layers import Input, Dense
from keras.models import Model
# This returns a tensor
inputs = Input(shape=(784,))
# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(data, labels) # starts training
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
The trailing parentheses take the input tensor: the previous layer's output becomes the next layer's input.
But the assignment does this:
reshapor = Reshape((1, 78)) # Used in Step 2.B of djmodel(), below
LSTM_cell = LSTM(n_a, return_state = True) # Used in Step 2.C
densor = Dense(n_values, activation='softmax') # Used in Step 2.D
At the time I didn't understand this or what to do next; the forum cleared it up. The code above instantiates layers with a fixed configuration: LSTM_cell is an instance of LSTM(...) with that specific configuration, and to actually use it you still have to give it an input. Just like the earlier code block, you simply call the instance on an input tensor.
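Inside djmodel() the loop then reuses these instances, roughly like the sketch below (a sketch of the idea, not the graded solution; x is the current time-step tensor and a, c are the LSTM hidden and cell states):

x = reshapor(x)                               # reshape this step's input to (1, 78)
a, _, c = LSTM_cell(x, initial_state=[a, c])  # call the SAME LSTM_cell instance at every time step
out = densor(a)                               # the shared softmax Dense layer gives this step's output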
Example from the official docs:
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
shared_lstm = LSTM(64)  # instantiation!!!!
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# tweet_a and tweet_b both go through this same layer, so they share its weights
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)
Note that when instantiating the Model here, inputs must match the earlier Input layers one-to-one. For example, the input layers above:
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
Output layer: predictions = Dense(1, activation='sigmoid')(merged_vector)
Then the model is instantiated as: model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
Here inputs and outputs correspond one-to-one with the layers defined above.
Likewise, when calling model.fit, list the input data and the labels in the order matching those tensors:
model.fit([data_a, data_b], labels, epochs=10)
[data_a, data_b] corresponds to inputs=[tweet_a, tweet_b]
labels corresponds to outputs=predictions
shared_lstm = LSTM(64) is the instantiation.
encoded_a = shared_lstm(tweet_a): here you don't configure the layer again; the configuration was all done when the instance was created, and the trailing parentheses take the input tensor.
But note: once a layer instance is connected to multiple inputs (multiple nodes), attributes such as .output and the shapes become ambiguous, and you must use the node-aware getters instead:
a = Input(shape=(280, 256))
b = Input(shape=(280, 256))
lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)
lstm.output
>> AttributeError: Layer lstm_1 has multiple inbound nodes,
hence the notion of "layer output" is ill-defined.
Use `get_output_at(node_index)` instead.
Instead, use:
assert lstm.get_output_at(0) == encoded_a
assert lstm.get_output_at(1) == encoded_b
and likewise for shapes (in the docs example, conv is a convolutional layer that was applied to inputs of two different sizes):
assert conv.get_input_shape_at(0) == (None, 32, 32, 3)
assert conv.get_input_shape_at(1) == (None, 64, 64, 3)
Pattern: instance_name.get_xxx_at(node_index)
to_categorical()
https://keras.io/utils/#to_categorical
keras.utils.to_categorical(y, num_classes=None, dtype='float32')
It converts integer class values into one-hot encodings: y is an array of integer labels, num_classes is the one-hot length, and the return value is a binary matrix, i.e. the one-hot encoding.
> labels
array([0, 2, 1, 2, 0])
# `to_categorical` converts this into a matrix with as many
# columns as there are classes. The number of rows
# stays the same.
> to_categorical(labels)
array([[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 1., 0., 0.]], dtype=float32)
Keras: slicing data with a custom Lambda layer
Written by a senior from BYR; very clear:
https://blog.csdn.net/lujiandong1/article/details/54936185
import numpy as np
from keras.layers import Input, Lambda, Reshape, concatenate
from keras.models import Model
from keras.utils import plot_model

def slice(x, index):
    return x[:, :, index]

a = Input(shape=(4, 2))
x1 = Lambda(slice, output_shape=(4, 1), arguments={'index': 0})(a)
x2 = Lambda(slice, output_shape=(4, 1), arguments={'index': 1})(a)
x1 = Reshape((4, 1, 1))(x1)
x2 = Reshape((4, 1, 1))(x2)
output = concatenate([x1, x2])
model = Model(a, output)

x_test = np.array([[[1, 2], [2, 3], [3, 4], [4, 5]]])
print(model.predict(x_test))
plot_model(model, to_file='lambda.png', show_shapes=True)
This achieves the following: the (None, 4, 2) input is split into two (None, 4, 1) tensors that are then processed separately by the layers below, and arguments is used to pass the extra parameter to the slice function.
The assignment's code is:
for t in range(Tx):
    # Step 2.A: select the "t"th time step vector from X.
    x = Lambda(lambda x: X[:, t, :])(X)
For lambda in Python, see Liao Xuefeng's tutorial:
https://www.liaoxuefeng.com/wiki/1016959663602400/1017451447842528
A lambda is essentially a one-line anonymous function; Python uses it very flexibly.
>>> f = lambda x: x * x
>>> f
<function <lambda> at 0x101c6ef28>
>>> f(5)
25
The assignment code above effectively does one thing: inside the loop, it extracts the slice of X for each time step t.