1. A note on transfer learning: in practice, when building an image-recognition application, hardly anyone initializes the weights randomly, and it is rarely feasible to collect enough data to train a model from scratch. Instead, we start from a convolutional network that has already been trained on a large dataset. The early convolutional layers extract low-level features, while later layers extract progressively higher-level ones. As long as the new task is similar to the original one, fine-tuning the pretrained network with a small amount of data is enough to transfer it to the new task (see the fine-tuning sketch after this list).
2. Recurrent neural networks (RNNs) are inspired by biological memory. Enhanced variants of the RNN include LSTM and GRU, which have longer-lasting memory.
3. In an ordinary feed-forward network, data flows in one direction only, whereas an RNN passes information in a loop: the input x goes through the hidden layer to produce y, and the hidden layer's output h is fed back as part of the next step's input, over and over.
4. The essence of an RNN cell is h_t = tanh(W_x * x_t + W_h * h_{t-1} + b): the new hidden state is a tanh of the current input combined with the previous hidden state (see the hand-written step below).
5. An LSTM uses gates to control which information (memory) is kept or discarded, which mitigates the vanishing-gradient problem (see the gate sketch below).
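As a minimal sketch of the fine-tuning idea in point 1 (assuming torchvision is available; the ResNet-18 backbone and the 10-class head are hypothetical examples, not part of the demo below):

import torch
from torch import nn
from torchvision import models

# Load a network pretrained on a large dataset (ImageNet here)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# Freeze the pretrained feature extractor: the early layers already
# capture low-level features that transfer across similar tasks
for param in model.parameters():
    param.requires_grad = False
# Replace the final classifier with a fresh head for the new task
model.fc = nn.Linear(model.fc.in_features, 10)
# Fine-tune only the new head's parameters on the small dataset
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)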
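To make points 3 and 4 concrete, here is a single RNN step written out by hand (a sketch; the sizes are arbitrary examples):

import torch

x_t = torch.randn(1, 3)      # current input (batch=1, input_size=3)
h_prev = torch.zeros(1, 5)   # previous hidden state (hidden_size=5)
W_x = torch.randn(3, 5)      # input-to-hidden weights
W_h = torch.randn(5, 5)      # hidden-to-hidden weights
b = torch.zeros(5)
# One recurrent step: mix the current input with the previous hidden
# state, then squash with tanh
h_t = torch.tanh(x_t @ W_x + h_prev @ W_h + b)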
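And a hand-written sketch of the gating in point 5 (illustrative names; PyTorch's nn.LSTM does this internally):

import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # Compute all four gates at once, then split
    gates = x_t @ W + h_prev @ U + b
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # gates lie in (0, 1)
    g = torch.tanh(g)            # candidate memory
    c_t = f * c_prev + i * g     # forget gate discards old memory, input gate writes new
    h_t = o * torch.tanh(c_t)    # output gate controls what is exposed
    return h_t, c_t

Because the forget and input gates are learned sigmoids, the network decides at every step how much old memory to keep, which is what keeps gradients from vanishing over long sequences.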
Below we implement an LSTM (an RNN variant) that learns to predict the value of cos from sin:
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
# 100 evenly spaced points over one period [0, 2*pi]
steps = np.linspace(0, np.pi*2, 100, dtype=np.float32)
input_x = np.sin(steps)
target_y = np.cos(steps)
plt.plot(steps, input_x, 'b-', label='input:sin')
plt.plot(steps, target_y, 'r-', label='target:cos')
plt.legend(loc='best')
plt.show()
# Define an LSTM model
class LSTM(nn.Module):
    def __init__(self, INPUT_SIZE):
        super(LSTM, self).__init__()
        self.lstm = nn.LSTM(
            input_size=INPUT_SIZE,
            hidden_size=20,
            # batch_first=True means the first dimension of input/output is batch_size
            batch_first=True
        )
        self.out = nn.Linear(20, 1)

    # hidden state h_state and cell state c_state are passed in explicitly
    def forward(self, x, h_state, c_state):
        # r_out: (batch, seq_len, hidden_size); project each time step to a scalar
        r_out, (h_state, c_state) = self.lstm(x, (h_state, c_state))
        outputs = self.out(r_out[0, :]).unsqueeze(0)
        return outputs, h_state, c_state

    def InitHidden(self):
        # shape: (num_layers, batch, hidden_size)
        h_state = torch.randn(1, 1, 20)
        c_state = torch.randn(1, 1, 20)
        return h_state, c_state
lstm = LSTM(INPUT_SIZE=1)
optimizer = torch.optim.Adam(lstm.parameters(), lr=0.001)
loss_func = nn.MSELoss()
h_state,c_state = lstm.InitHidden()
plt.figure(1, figsize=(12,5))
plt.ion()
for step in range(600):
    # Slide a window of length pi along the time axis each step
    start, end = step * np.pi, (step + 1) * np.pi
    steps = np.linspace(start, end, 100, dtype=np.float32)
    x_np = np.sin(steps)
    y_np = np.cos(steps)
    # Add batch and feature dimensions: (100,) -> (1, 100, 1)
    x = torch.from_numpy(x_np).unsqueeze(0).unsqueeze(-1)
    y = torch.from_numpy(y_np).unsqueeze(0).unsqueeze(-1)
    prediction, h_state, c_state = lstm(x, h_state, c_state)
    # Detach the states so gradients do not flow back across windows
    # (truncated backpropagation through time)
    h_state = h_state.detach()
    c_state = c_state.detach()
    loss = loss_func(prediction, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Plot the target (red) and the prediction (blue) for this window
    plt.plot(steps, y_np.flatten(), 'r-')
    plt.plot(steps, prediction.data.numpy().flatten(), 'b-')
    plt.draw()
    plt.pause(0.05)
plt.ioff()
plt.show()