Andrew Ng deeplearning.ai, Lesson 5 Week 1
RNN
An RNN takes the order of a sequence into account, so it can handle a more flexible range of problems; it is deep learning with a time dimension added.
At each step it combines the previous activation a<t-1> with the current input x<t>.
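Concretely, the lecture's per-step update is:
a<t> = tanh(Waa · a<t-1> + Wax · x<t> + ba)
y_hat<t> = softmax(Wya · a<t> + by)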
RNN backpropagation:
GRU
To mitigate vanishing gradients (exploding gradients are usually handled separately by gradient clipping), the GRU was introduced; it is effectively a simplified version of the LSTM.
The GRU also lets the network capture much longer-range dependencies.
The left part of the figure above is the simplified gated unit, which highlights the role of the memory cell. Γu is the update gate: it decides whether the memory should be overwritten by the new candidate value. For example, if Γu = 0 then c<t> = c<t-1>, so the current candidate is ignored and the old memory is kept.
The figure above is the full GRU, which adds the handwritten Γr. You can read Γr as relevance: how relevant the previous c<t-1> is when computing the next candidate value.
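For reference, the full GRU equations from the lecture, in the same notation:
c~<t> = tanh(Wc[Γr * c<t-1>, x<t>] + bc)
Γu = sigmoid(Wu[c<t-1>, x<t>] + bu)
Γr = sigmoid(Wr[c<t-1>, x<t>] + br)
c<t> = Γu * c~<t> + (1 - Γu) * c<t-1>
a<t> = c<t>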
LSTM
Differences and connections
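In equations, the LSTM from the lecture uses three gates (update, forget, output) and keeps a<t> separate from c<t>:
c~<t> = tanh(Wc[a<t-1>, x<t>] + bc)
Γu = sigmoid(Wu[a<t-1>, x<t>] + bu)
Γf = sigmoid(Wf[a<t-1>, x<t>] + bf)
Γo = sigmoid(Wo[a<t-1>, x<t>] + bo)
c<t> = Γu * c~<t> + Γf * c<t-1>
a<t> = Γo * tanh(c<t>)
Compared with the GRU, the forget gate Γf replaces (1 - Γu), and the output gate Γo controls how much of the cell state shows up in the activation.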
The diagrams make it easier to understand.
Backpropagation (optional)
Building a Recurrent Neural Network - Step by Step
The way the code blocks are structured is quite clever:
def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (3).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        ba -- Bias numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """

    # Initialize "caches" which will contain the list of all caches
    caches = []

    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    ### START CODE HERE ###

    # initialize "a" and "y" with zeros (≈2 lines)
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))

    # Initialize a_next (≈1 line)
    a_next = a0  # clever: a_next starts out as the initial hidden state

    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, compute the prediction, get the cache (≈1 line)
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a (≈1 line)
        a[:, :, t] = a_next
        # Save the value of the prediction in y (≈1 line)
        y_pred[:, :, t] = yt_pred
        # Append "cache" to "caches" (≈1 line)
        caches.append(cache)

    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    caches = (caches, x)

    return a, y_pred, caches
The code keeps updating a_next and passes it into the next loop iteration.
On entering an iteration, a_next holds the previous activation: it is passed into the cell's forward function in the role of a_prev, which then computes the real a_next (the activation this cell outputs).
That a_next then acts as a_prev for the next cell in the following iteration.
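For reference, the single-step helper called inside that loop looks roughly like this (a minimal sketch following the lecture formulas; the softmax helper is assumed to come from the assignment's utils, and the cache layout follows the same convention):

def rnn_cell_forward(xt, a_prev, parameters):
    # Unpack parameters (same names as in rnn_forward above)
    Wax, Waa, Wya = parameters["Wax"], parameters["Waa"], parameters["Wya"]
    ba, by = parameters["ba"], parameters["by"]
    # One RNN step: combine the previous activation with the current input
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # Prediction for this time step
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    # Cache the values needed for the backward pass
    cache = (a_next, a_prev, xt, parameters)
    return a_next, yt_pred, cache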
I haven't worked through the backward pass very thoroughly yet; leaving that as a TODO.
Python issue: invalid syntax
I assumed the mistake was on the reported line, but the real cause was on the line above.
print(id(gradient)
gradient=1
File "<ipython-input-27-ffe271b31b6d>", line 10
gradient=1
^
SyntaxError: invalid syntax
When this kind of error appears, it is usually because something is missing on the previous line (typically a closing parenthesis).
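Here the fix is simply to close the parenthesis on the line above:

print(id(gradient))
gradient = 1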
Dinosaurus Island
Variable assignment: reference or value?
a = np.arange(10)
b=np.zeros(10)
np.clip(a, 3, 6, out=b)
print(b)
a=2
b=a
print(id(a))
print(id(b))
b=3
print(a)
print(id(b))
[3. 3. 3. 3. 4. 5. 6. 6. 6. 6.]
1737253056
1737253056
2
1737253088
After b = a, a and b have the same id: b = a binds the name b to the same object a refers to.
But after b = 3, b's id changes, because b is rebound to a new object, while a keeps its old value.
So a = b just makes two names for one object. If you actually want a copy: for a Python list a = b[:] works, but for a NumPy array slicing only returns a view of the same data, so use a = b.copy() instead.
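A small demo of the difference (plain Python list vs NumPy array; nothing assumed beyond NumPy itself):

import numpy as np

lst = [1, 2, 3]
lst_copy = lst[:]        # slicing a list creates a new list
lst_copy[0] = 99
print(lst)               # [1, 2, 3]  -> unchanged

arr = np.array([1, 2, 3])
view = arr[:]            # slicing an ndarray returns a view on the same buffer
view[0] = 99
print(arr)               # [99  2  3] -> changed!

real_copy = arr.copy()   # an independent copy
real_copy[1] = -1
print(arr)               # [99  2  3] -> unchanged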
I started thinking about this because of the for loop in the assignment's gradient-clipping function:
def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.

    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

    Returns:
    gradients -- a dictionary with the clipped gradients.
    '''
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']

    ### START CODE HERE ###
    # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    print(id(dWax))
    print(id(dWaa))
    print(id(dWya))
    print(id(db))
    print(id(dby))
    for gradient in [dWax, dWaa, dWya, db, dby]:
        print("*" * 10)
        print(id(gradient))
        print(gradient)
        np.clip(gradient, -maxValue, maxValue, out=gradient)
        print(id(gradient))
    ### END CODE HERE ###

    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}

    return gradients
1807415035824
1807415034784
1807414297264
1807414297504
1807414297904
**********
1807415035824
[[ 17.88628473 4.36509851 0.96497468]
[-18.63492703 -2.77388203 -3.54758979]
[ -0.82741481 -6.27000677 -0.43818169]
[ -4.7721803 -13.13864753 8.8462238 ]
[ 8.81318042 17.09573064 0.50033642]]
1807415035824
**********
1807415034784
[[ -4.04677415 -5.45359948 -15.46477316 9.82367434 -11.0106763 ]
[-11.85046527 -2.05649899 14.86148355 2.36716267 -10.2378514 ]
[ -7.129932 6.25244966 -1.60513363 -7.6883635 -2.30030722]
[ 7.45056266 19.76110783 -12.44123329 -6.26416911 -8.03766095]
[-24.19083173 -9.23792022 -10.23875761 11.23977959 -1.31914233]]
1807415034784
**********
1807414297264
[[-16.23285446 6.46675452 -3.56270759 -17.43141037 -5.96649642]
[ -5.8859438 -8.73882298 0.29713815 -22.48257768 -2.67761865]]
1807414297264
**********
1807414297504
[[10.13183442]
[ 8.52797841]
[11.081875 ]
[11.19390655]
[14.87543132]]
1807414297504
**********
1807414297904
[[-11.18300684]
[ 8.45833407]]
1807414297904
gradients["dWaa"][1][2] = 10.0
gradients["dWax"][3][1] = -10.0
gradients["dWya"][1][2] = 0.2971381536101662
gradients["db"][4] = [10.]
gradients["dby"][1] = [8.45833407]
When np.clip(gradient, -maxValue, maxValue, out=gradient) changes the loop variable gradient inside this function, dWax, dWaa, dWya, db and dby change along with it.
The forum's explanation:
https://www.coursera.org/learn/nlp-sequence-models/programming/1dYg0/dinosaur-island-character-level-language-modeling/discussions/threads/AoiiXWJOEeiEphLB3FeC3g
The explanation: gradient and dWax are both just references (labels) pointing at the same underlying ndarray, so changing the data through one name changes it for every name. (I didn't fully get this at first, but the key point is that the ids printed above are identical because the loop variable is bound to the very same array object, and out=gradient makes np.clip write the clipped values into that array's buffer in place rather than creating a new array.)
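The difference between in-place modification and rebinding can be seen with a quick standalone check:

import numpy as np

dWax = np.array([17.9, -18.6, 8.8])
g = dWax                        # g and dWax are two names for the same array
np.clip(g, -10, 10, out=g)      # writes the result into the shared buffer in place
print(dWax)                     # [ 10.  -10.    8.8] -> dWax is clipped too

dWax2 = np.array([17.9, -18.6, 8.8])
g2 = np.clip(dWax2, -10, 10)    # returns a NEW array and rebinds g2 to it
print(dWax2)                    # [ 17.9 -18.6   8.8] -> the original is untouched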
numpy.random.choice
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.choice.html
numpy.random.choice(a, size=None, replace=True, p=None)
Generates a random sample from a given 1-D array
Parameters:
a : 1-D array-like or int
The elements to sample from; if a is an int, the sample is drawn as if a were np.arange(a).
size : int or tuple of ints, optional
Output shape, i.e. how many samples to draw.
replace : boolean, optional
Whether sampling is done with replacement (can be ignored here).
p : 1-D array-like, optional
The probability associated with each entry in a; its length must match a.
Returns:
samples : single item or ndarray
The generated random samples, a single value or an array drawn according to the given distribution.
In short: it generates discrete values from the given range according to the specified probability distribution.
Example:
>>> np.random.choice(5, 3)
array([0, 3, 4])
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0])
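In the Dinosaurus assignment, this is what draws the index of the next character from the softmax output y of shape (vocab_size, 1); shown here as a one-line sketch using the assignment's variable names:

idx = np.random.choice(np.arange(vocab_size), p=y.ravel())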
numpy.clip
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.clip.html
numpy.clip(a, a_min, a_max, out=None)
Purpose: limit an array's values to lower and upper bounds.
Values above a_max are set to a_max, and values below a_min are set to a_min.
out : ndarray, optional
The array to store the result in; its shape must be able to hold the output (it may be the input array itself, which clips in place).
Returns:
clipped_array : ndarray
Example:
>>> a = np.arange(10)
>>> np.clip(a, 1, 8)
array([1, 1, 2, 3, 4, 5, 6, 7, 8, 8])
np.ravel()
Flattens a multi-dimensional array into one dimension.
numpy.ndarray.ravel
ndarray.ravel([order])
Return a flattened array.
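Quick example:
>>> np.arange(6).reshape(2, 3).ravel()
array([0, 1, 2, 3, 4, 5])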
shuffle()
The shuffle() method randomly reorders all elements of a sequence in place.
It requires import random.
https://www.runoob.com/python/func-number-shuffle.html
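Minimal usage:

import random

nums = [1, 2, 3, 4, 5]
random.shuffle(nums)   # shuffles the list in place and returns None
print(nums)            # e.g. [3, 1, 5, 2, 4]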
Python lists
https://www.runoob.com/python/python-lists.html
Quick review: an empty list is just a = [].
Appending is a.append('x').
Huge pitfall: the model got no credit when submitted on Coursera
I read lots of explanations and workarounds in the discussion forum and none of them helped, even though the code and results were correct.
It finally turned out that I needed to switch to the v3 version of the notebook; I had been on an old version. The exact same code then passed.
A lot of time wasted; from now on I'll do the assignments on the newest notebook version.
Understanding
This assignment deepened my understanding of RNNs, especially the language-model part; I had never understood why, during sampling, the previous output has to be fed in as the next input.
After the assignment it makes sense. Honestly I think naming this function "sample" is a bit misleading: on first reading it suggests sampling in the signal-processing sense, when really sample is a generation/test routine. It sets the initial input to a zero vector, then repeatedly draws from the softmax output layer (changing the random seed each time to avoid repeats) to keep generating new dinosaur names.
Also:
x = np.zeros((vocab_size,1))
# Step 1': Initialize a_prev as zeros (≈1 line)
a_prev = np.zeros((n_a,1))
Note that the input x has dimension vocab_size (the length of the one-hot vocabulary), while a_prev has dimension n_a (the hidden-state size); these are different things.
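Putting this together, the sampling loop looks roughly like the sketch below. This is my re-sketch of the idea, not the graded code; the softmax helper, the exact parameter and bias names (Waa, Wax, Wya, ba, by) and the char_to_ix mapping are assumptions based on the earlier code.

def sample_sketch(parameters, char_to_ix, seed=0):
    # Re-sketch of the sampling idea (not the assignment's graded sample()).
    Waa, Wax, Wya = parameters["Waa"], parameters["Wax"], parameters["Wya"]
    ba, by = parameters["ba"], parameters["by"]        # bias names are an assumption
    vocab_size = by.shape[0]
    n_a = Waa.shape[0]

    x = np.zeros((vocab_size, 1))                      # input starts as a zero vector
    a_prev = np.zeros((n_a, 1))
    indices = []
    idx = -1
    counter = 0
    newline_idx = char_to_ix['\n']                     # stop when a newline is sampled

    while idx != newline_idx and counter < 50:
        a_prev = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + ba)   # one forward step
        y = softmax(np.dot(Wya, a_prev) + by)                         # distribution over characters
        np.random.seed(counter + seed)                                # change the seed each step to avoid repeats
        idx = np.random.choice(np.arange(vocab_size), p=y.ravel())    # draw the next character index
        indices.append(idx)
        x = np.zeros((vocab_size, 1))
        x[idx] = 1                                                    # previous output becomes the next input
        counter += 1
    return indices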
Improvise a Jazz Solo with an LSTM Network
Generating jazz with an LSTM. My result differs from the final expected output in the assignment template, but it still received full credit. Also, I think the tune I generated sounds pretty bad.
Approach
keras functional API
I'm not very fluent with it and often have to search the official documentation.
For example, you can instantiate a layer under a name and then directly specify its input.
Some Keras background: there are two ways to build a model, the Keras functional API and the Keras Sequential model.
This assignment uses the Keras functional API; a short introduction:
https://keras.io/getting-started/functional-api-guide/
The official getting-started example explains it well.
To build this kind of architecture, the code is as follows:
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
# Headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# Note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
# This embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# A LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
auxiliary_input = Input(shape=(5,), name='aux_input')
x = keras.layers.concatenate([lstm_out, auxiliary_input])
# We stack a deep densely-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# And finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
loss_weights=[1., 0.2])
model.fit([headline_data, additional_data], [labels, labels],
epochs=50, batch_size=32)
The compile and fit parts of the model can also be written like this (which I find clearer):
model.compile(optimizer='rmsprop',
loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
loss_weights={'main_output': 1., 'aux_output': 0.2})
# And trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
{'main_output': labels, 'aux_output': labels},
epochs=50, batch_size=32)
Keras: instantiating layers
The part that confused me earlier was instantiation. Usually you can specify a layer's input directly as you instantiate it:
from keras.layers import Input, Dense
from keras.models import Model
# This returns a tensor
inputs = Input(shape=(784,))
# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(data, labels) # starts training
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
The trailing parentheses take the input tensor: the previous layer's output becomes the next layer's input.
But the assignment does this:
reshapor = Reshape((1, 78)) # Used in Step 2.B of djmodel(), below
LSTM_cell = LSTM(n_a, return_state = True) # Used in Step 2.C
densor = Dense(n_values, activation='softmax') # Used in Step 2.D
At the time I didn't understand this or what to do next; the forum cleared it up. The code above instantiates layers with a fixed configuration: LSTM_cell is an instance of LSTM(...) with that specific configuration, and to actually use it you still have to give it an input. Just like the earlier code block, you simply call the instance on an input tensor.
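Inside djmodel() the loop then reuses these instances, roughly like the sketch below (a sketch of the idea, not the graded solution; x is the current time-step tensor and a, c are the LSTM hidden and cell states):

x = reshapor(x)                               # reshape this step's input to (1, 78)
a, _, c = LSTM_cell(x, initial_state=[a, c])  # call the SAME LSTM_cell instance at every time step
out = densor(a)                               # the shared softmax Dense layer gives this step's output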
Example from the official docs:
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
shared_lstm = LSTM(64)  # instantiation!!!!
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# tweet_a and tweet_b both go through this same layer, so they share its weights
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)
Note that when instantiating the Model here, inputs must match the earlier Input layers one-to-one. For example, the input layers above:
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))
Output layer: predictions = Dense(1, activation='sigmoid')(merged_vector)
Then the model is instantiated as: model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
Here inputs and outputs correspond one-to-one with the layers defined above.
Likewise, when calling model.fit, list the input data and the labels in the order matching those tensors:
model.fit([data_a, data_b], labels, epochs=10)
[data_a, data_b] corresponds to inputs=[tweet_a, tweet_b]
labels corresponds to outputs=predictions
shared_lstm = LSTM(64) is the instantiation.
encoded_a = shared_lstm(tweet_a): here you don't configure the layer again; the configuration was all done when the instance was created, and the trailing parentheses take the input tensor.
But note: once a layer instance is connected to multiple inputs (multiple nodes), attributes such as .output and the shapes become ambiguous, and you must use the node-aware getters instead:
a = Input(shape=(280, 256))
b = Input(shape=(280, 256))
lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)
lstm.output
>> AttributeError: Layer lstm_1 has multiple inbound nodes,
hence the notion of "layer output" is ill-defined.
Use `get_output_at(node_index)` instead.
Instead, use:
assert lstm.get_output_at(0) == encoded_a
assert lstm.get_output_at(1) == encoded_b
and likewise for shapes (in the docs example, conv is a convolutional layer that was applied to inputs of two different sizes):
assert conv.get_input_shape_at(0) == (None, 32, 32, 3)
assert conv.get_input_shape_at(1) == (None, 64, 64, 3)
Pattern: instance_name.get_xxx_at(node_index)
to_categorical()
https://keras.io/utils/#to_categorical
keras.utils.to_categorical(y, num_classes=None, dtype='float32')
It converts integer class values into one-hot encodings: y is an array of integer labels, num_classes is the one-hot length, and the return value is a binary matrix, i.e. the one-hot encoding.
> labels
array([0, 2, 1, 2, 0])
# `to_categorical` converts this into a matrix with as many
# columns as there are classes. The number of rows
# stays the same.
> to_categorical(labels)
array([[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 1., 0., 0.]], dtype=float32)
Keras: slicing data with a custom Lambda layer
Written by a senior from BYR; very clear:
https://blog.csdn.net/lujiandong1/article/details/54936185
import numpy as np
from keras.layers import Input, Lambda, Reshape, concatenate
from keras.models import Model
from keras.utils import plot_model

def slice(x, index):
    return x[:, :, index]

a = Input(shape=(4, 2))
x1 = Lambda(slice, output_shape=(4, 1), arguments={'index': 0})(a)
x2 = Lambda(slice, output_shape=(4, 1), arguments={'index': 1})(a)
x1 = Reshape((4, 1, 1))(x1)
x2 = Reshape((4, 1, 1))(x2)
output = concatenate([x1, x2])
model = Model(a, output)

x_test = np.array([[[1, 2], [2, 3], [3, 4], [4, 5]]])
print(model.predict(x_test))
plot_model(model, to_file='lambda.png', show_shapes=True)
This achieves the following: the (None, 4, 2) input is split into two (None, 4, 1) tensors that are then processed separately by the layers below, and arguments is used to pass the extra parameter to the slice function.
The assignment's code is:
for t in range(Tx):
    # Step 2.A: select the "t"th time step vector from X.
    x = Lambda(lambda x: X[:, t, :])(X)
For lambda in Python, see Liao Xuefeng's tutorial:
https://www.liaoxuefeng.com/wiki/1016959663602400/1017451447842528
A lambda is essentially a one-line anonymous function; Python uses it very flexibly.
>>> f = lambda x: x * x
>>> f
<function <lambda> at 0x101c6ef28>
>>> f(5)
25
The assignment code above effectively does one thing: inside the loop, it extracts the slice of X for each time step t.