MATLAB code for RNN regression: RNN and LSTM in MATLAB

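This post walks through an LSTM (long short-term memory network) implementation step by step: weight initialization, training-data generation, and the forward and backward passes. It uses a simple binary addition problem to show how an LSTM processes sequence data and makes predictions. It also covers the weight-update rules used during error backpropagation, error monitoring during training, and the comparison of predicted and true outputs.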
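For reference, the forward pass that the script's comments label as equations (1)-(7) can be written compactly as below. The symbols are notation chosen here for readability, not the author's: $x_t$ is the 1 × 2 input row vector at step $t$, $h_t$ is the hidden state (the comments write it as both `H(t)` and `S(t)`), $c_t$ is the cell state, $\sigma$ is the logistic sigmoid, $\odot$ is the element-wise product, and $W_{\text{out}}$ stands for `out_para`. As in the script, no bias terms are used.

$$
\begin{aligned}
i_t &= \sigma\left(x_t U_i + h_{t-1} W_i\right) && (1)\\
f_t &= \sigma\left(x_t U_f + h_{t-1} W_f\right) && (2)\\
o_t &= \sigma\left(x_t U_o + h_{t-1} W_o\right) && (3)\\
g_t &= \tanh\left(x_t U_g + h_{t-1} W_g\right) && (4)\\
c_t &= c_{t-1} \odot f_t + g_t \odot i_t && (5)\\
h_t &= \tanh(c_t) \odot o_t && (6)\\
\hat{y}_t &= \sigma\left(h_t W_{\text{out}}\right) && (7)
\end{aligned}
$$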

% implementation of LSTM

clc

clear

close all

%% training dataset generation
binary_dim     = 8;
largest_number = 2^binary_dim - 1;

% one entry per integer 0..largest_number, so largest_number + 1 cells
binary = cell(largest_number + 1, 1);
for i = 1:largest_number + 1
    binary{i}     = dec2bin(i-1, binary_dim);   % fixed-width binary string
    int2binary{i} = binary{i};
end

%% input variables

alpha  = 0.1;

input_dim  = 2;

hidden_dim = 32;

output_dim = 1;

%% initialize neural network weights
% in_gate     = sigmoid(X(t) * U_i + H(t-1) * W_i)    ------- (1)

U_i = 2 * rand(input_dim, hidden_dim) - 1;

W_i = 2 * rand(hidden_dim, hidden_dim) - 1;

U_i_update = zeros(size(U_i));

W_i_update = zeros(size(W_i));

% forget_gate = sigmoid(X(t) * U_f + H(t-1) * W_f)    ------- (2)

U_f = 2 * rand(input_dim, hidden_dim) - 1;

W_f = 2 * rand(hidden_dim, hidden_dim) - 1;

U_f_update = zeros(size(U_f));

W_f_update = zeros(size(W_f));

% out_gate    = sigmoid(X(t) * U_o + H(t-1) * W_o)    ------- (3)

U_o = 2 * rand(input_dim, hidden_dim) - 1;

W_o = 2 * rand(hidden_dim, hidden_dim) - 1;

U_o_update = zeros(size(U_o));

W_o_update = zeros(size(W_o));

% g_gate      = tanh(X(t) * U_g + H(t-1) * W_g)       ------- (4)

U_g = 2 * rand(input_dim, hidden_dim) - 1;

W_g = 2 * rand(hidden_dim, hidden_dim) - 1;

U_g_update = zeros(size(U_g));

W_g_update = zeros(size(W_g));

out_para = 2 * rand(hidden_dim, output_dim) - 1;

out_para_update = zeros(size(out_para));

% C(t) = C(t-1) .* forget_gate + g_gate .* in_gate    ------- (5)
% S(t) = tanh(C(t)) .* out_gate                       ------- (6)
% Out  = sigmoid(S(t) * out_para)                     ------- (7)
% Note: equations (1)-(6) are the core of the LSTM forward pass, and
% equation (7) maps the hidden layer to the predicted output, i.e., the
% output layer. (Softmax is sometimes used for equation (7) instead.)

%% train
iter = 99999; % training iterations

for j = 1:iter

    % generate a simple addition problem (a + b = c)
    a_int = randi(round(largest_number/2));   % int version
    a     = int2binary{a_int+1};              % binary encoding

    b_int = randi(floor(largest_number/2));   % int version
    b     = int2binary{b_int+1};              % binary encoding

    % true answer
    c_int = a_int + b_int;                    % int version
    c     = int2binary{c_int+1};              % binary encoding

    % where we'll store our best guess (binary encoded)
    d = zeros(size(c));
    if length(d) < 8
        pause;
    end

    % total error
    overallError = 0;

    % difference in output layer, i.e., (target - out)
    output_deltas = [];

    % values of hidden layer, i.e., S(t)
    hidden_layer_values = [];
    cell_gate_values    = [];

    % initialize S(0) as a zero-vector
    hidden_layer_values = [hidden_layer_values; zeros(1, hidden_dim)];
    cell_gate_values    = [cell_gate_values; zeros(1, hidden_dim)];

    % initialize memory gate
    % hidden layer
    H = [];
    H = [H; zeros(1, hidden_dim)];
    % cell gate
    C = [];
    C = [C; zeros(1, hidden_dim)];
    % in gate
    I = [];
    % forget gate
    F = [];
    % out gate
    O = [];
    % g gate
    G = [];

    % start to process a sequence, i.e., a forward pass
    % Note: the output of an LSTM cell is the hidden layer, and you need to
    % transfer it to the predicted output
    for position = 0:binary_dim-1
        % X ------> input, size: 1 x input_dim
        % (bits are read from the least significant end first)
        X = [a(binary_dim - position)-'0' b(binary_dim - position)-'0'];

        % y ------> label, size: 1 x output_dim
        y = [c(binary_dim - position)-'0']';

        % use equations (1)-(7) in a forward pass. here we do not use bias
        in_gate     = sigmoid(X * U_i + H(end, :) * W_i);            % equation (1)
        forget_gate = sigmoid(X * U_f + H(end, :) * W_f);            % equation (2)
        out_gate    = sigmoid(X * U_o + H(end, :) * W_o);            % equation (3)
        g_gate      = tan_h(X * U_g + H(end, :) * W_g);              % equation (4)
        C_t         = C(end, :) .* forget_gate + g_gate .* in_gate;  % equation (5)
        H_t         = tan_h(C_t) .* out_gate;                        % equation (6)

        % store these memory gates
        I = [I; in_gate];
        F = [F; forget_gate];
        O = [O; out_gate];
        G = [G; g_gate];
        C = [C; C_t];
        H = [H; H_t];

        % compute predicted output
        pred_out = sigmoid(H_t * out_para);                          % equation (7)

        % compute error in output layer
        output_error = y - pred_out;

        % compute difference in output layer using derivative
        % output_diff = output_error * sigmoid_output_to_derivative(pred_out);
        output_deltas = [output_deltas; output_error];

        % compute total error
        % note that if the size of pred_out or target is 1 x n or m x n,
        % you should use another approach to compute the error. here the
        % dimension of pred_out is 1 x 1
        overallError = overallError + abs(output_error(1));

        % decode estimate so we can print it out
        d(binary_dim - position) = round(pred_out);
    end

    % from the last LSTM cell, you need an initial hidden layer difference
    % (future_H_diff is referenced only by the commented-out formula in the
    % backward pass)
    future_H_diff = zeros(1, hidden_dim);

    % start back-propagation, i.e., a backward pass
    % the goal is to compute differences and use them to update weights
    % start from the last LSTM cell
    for position = 0:binary_dim-1

        X = [a(position+1)-'0' b(position+1)-'0'];

        % hidden layer
        H_t   = H(end-position, :);        % H(t)
        % previous hidden layer
        H_t_1 = H(end-position-1, :);      % H(t-1)

        C_t   = C(end-position, :);        % C(t)
        C_t_1 = C(end-position-1, :);      % C(t-1)

        O_t = O(end-position, :);
        F_t = F(end-position, :);
        G_t = G(end-position, :);
        I_t = I(end-position, :);

        % output layer difference
        output_diff = output_deltas(end-position, :);

        % hidden layer difference
        % note that here we consider one hidden layer is input to both the
        % output layer and the next LSTM cell. Thus its difference also comes
        % from two sources. In some other methods, only one source is taken
        % into consideration.
        % use the equation: delta(l) = (delta(l+1) * W(l+1)) .* f'(z) to
        % compute the difference in previous layers. for the proof, see
        % http://neuralnetworksanddeeplearning.com/chap2.html
        % H_t_diff = (future_H_diff * (W_i' + W_o' + W_f' + W_g') + output_diff * out_para') ...
        %            .* sigmoid_output_to_derivative(H_t);
        % H_t_diff = output_diff * (out_para') .* sigmoid_output_to_derivative(H_t);
        % (the active line below uses only the output-layer source)
        H_t_diff = output_diff * (out_para') .* sigmoid_output_to_derivative(H_t);

        % out_para_diff = output_diff * (H_t) * sigmoid_output_to_derivative(out_para);
        out_para_diff = (H_t') * output_diff;

        % out_gate difference
        O_t_diff = H_t_diff .* tan_h(C_t) .* sigmoid_output_to_derivative(O_t);

        % C_t difference
        C_t_diff = H_t_diff .* O_t .* tan_h_output_to_derivative(C_t);

        % % C(t-1) difference
        % C_t_1_diff = C_t_diff .* F_t;

        % forget_gate difference
        F_t_diff = C_t_diff .* C_t_1 .* sigmoid_output_to_derivative(F_t);

        % in_gate difference
        I_t_diff = C_t_diff .* G_t .* sigmoid_output_to_derivative(I_t);

        % g_gate difference
        G_t_diff = C_t_diff .* I_t .* tan_h_output_to_derivative(G_t);

        % differences of U_i and W_i
        U_i_diff = X' * I_t_diff .* sigmoid_output_to_derivative(U_i);
        W_i_diff = (H_t_1)' * I_t_diff .* sigmoid_output_to_derivative(W_i);

        % differences of U_o and W_o
        U_o_diff = X' * O_t_diff .* sigmoid_output_to_derivative(U_o);
        W_o_diff = (H_t_1)' * O_t_diff .* sigmoid_output_to_derivative(W_o);

        % differences of U_f and W_f
        U_f_diff = X' * F_t_diff .* sigmoid_output_to_derivative(U_f);
        W_f_diff = (H_t_1)' * F_t_diff .* sigmoid_output_to_derivative(W_f);

        % differences of U_g and W_g
        U_g_diff = X' * G_t_diff .* tan_h_output_to_derivative(U_g);
        W_g_diff = (H_t_1)' * G_t_diff .* tan_h_output_to_derivative(W_g);

        % update
        U_i_update = U_i_update + U_i_diff;
        W_i_update = W_i_update + W_i_diff;
        U_o_update = U_o_update + U_o_diff;
        W_o_update = W_o_update + W_o_diff;
        U_f_update = U_f_update + U_f_diff;
        W_f_update = W_f_update + W_f_diff;
        U_g_update = U_g_update + U_g_diff;
        W_g_update = W_g_update + W_g_diff;
        out_para_update = out_para_update + out_para_diff;
    end

    U_i = U_i + U_i_update * alpha;
    W_i = W_i + W_i_update * alpha;
    U_o = U_o + U_o_update * alpha;
    W_o = W_o + W_o_update * alpha;
    U_f = U_f + U_f_update * alpha;
    W_f = W_f + W_f_update * alpha;
    U_g = U_g + U_g_update * alpha;
    W_g = W_g + W_g_update * alpha;
    out_para = out_para + out_para_update * alpha;

    U_i_update = U_i_update * 0;
    W_i_update = W_i_update * 0;
    U_o_update = U_o_update * 0;
    W_o_update = W_o_update * 0;
    U_f_update = U_f_update * 0;
    W_f_update = W_f_update * 0;
    U_g_update = U_g_update * 0;
    W_g_update = W_g_update * 0;
    out_para_update = out_para_update * 0;

    if (mod(j,1000) == 0)
        err = sprintf('Error:%s\n', num2str(overallError)); fprintf(err);
        d = bin2dec(num2str(d));
        pred = sprintf('Pred:%s\n', dec2bin(d,8)); fprintf(pred);
        Tru = sprintf('True:%s\n', num2str(c)); fprintf(Tru);
        out = 0;
        sep = sprintf('-------------\n'); fprintf(sep);
    end
end
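
The script calls four helper functions (`sigmoid`, `tan_h`, `sigmoid_output_to_derivative`, `tan_h_output_to_derivative`) that the listing never defines. Below is a minimal sketch of what they might look like. It assumes, going by their names only, that the two `*_output_to_derivative` helpers receive an activation's output rather than its input; these definitions are a guess, not the author's code. Each function could live in its own `.m` file, or at the end of the script file in MATLAB R2016b or later.

```matlab
function y = sigmoid(x)
% Logistic function, applied element-wise.
y = 1 ./ (1 + exp(-x));
end

function y = tan_h(x)
% Hyperbolic tangent, applied element-wise (thin wrapper around tanh).
y = tanh(x);
end

function d = sigmoid_output_to_derivative(out)
% Sigmoid derivative written in terms of its output: s'(z) = s(z) .* (1 - s(z)).
d = out .* (1 - out);
end

function d = tan_h_output_to_derivative(out)
% tanh derivative written in terms of its output: tanh'(z) = 1 - tanh(z).^2.
d = 1 - out .^ 2;
end
```

Under these definitions, `sigmoid(0)` is 0.5 and `sigmoid_output_to_derivative(0.5)` is 0.25. Note that the script passes `C_t`, which is a value before the `tan_h` activation, to `tan_h_output_to_derivative`; with the definition sketched here that evaluates `1 - C_t.^2` rather than `1 - tan_h(C_t).^2`, so the call may deserve a second look.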

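### Answer 1:
LSTM (long short-term memory) is a variant of the recurrent neural network (RNN) that handles long-term dependencies effectively by introducing "gate" structures. In MATLAB, an LSTM network can be built with Deep Learning Toolbox.

First, specify the network's hyperparameters, such as the input dimension, hidden-layer dimension, and output dimension. Then create the LSTM layer with the `lstmLayer` function, using those hyperparameters to customize the network structure.

Next, define the rest of the model. As a sequence network, the structure is defined by adding and connecting layers. For example, `fullyConnectedLayer` creates a fully connected layer and `softmaxLayer` creates a normalization layer.

Once the structure is defined, train the LSTM model with `trainNetwork`. The function takes the training data, the network structure, and the training options as input; validation data, if used, is supplied through the training options. Optional training settings include the optimization algorithm, the learning rate, and the maximum number of epochs. Adjusting these settings iteratively helps find a good model configuration.

After training, use `classify` or `predict` to classify or predict on new input data. These functions pass the input to the trained model and return the corresponding output.

In summary, MATLAB makes it straightforward to implement an LSTM network for classification or prediction tasks. Tuning the hyperparameters and training settings can improve accuracy and generalization. MATLAB also provides visualization tools for analyzing network performance, interpreting model behavior, and refining the network structure.

### Answer 2:
LSTM (long short-term memory) is a variant of the recurrent neural network (RNN) used for prediction and classification on sequence data. MATLAB provides tools and functions for implementing LSTM networks. Below is a simple example of LSTM RNN code in MATLAB:

```matlab
% load data
data = load('data.mat');
X = data.X;
y = data.y;

% data preprocessing
[num_samples, input_size] = size(X);
[~, num_labels] = size(y);   % keep input_size from X; only the label count comes from y

% network settings
hidden_size = 100;
num_layers = 2;
learning_rate = 0.01;
num_epochs = 100;

% initialize weights
parameters = initialize_parameters(input_size, hidden_size, num_labels, num_layers);

% train the model
for epoch = 1:num_epochs
    % forward pass to compute the output
    [cache, a] = lstm_forward(X, parameters);

    % compute the loss
    loss = compute_loss(a, y);

    % backward pass and weight update
    grads = lstm_backward(X, y, cache, parameters);
    parameters = update_parameters(parameters, grads, learning_rate);

    % print the loss for each epoch
    fprintf('Epoch %d, Loss: %f\n', epoch, loss);
end

% predict on new data
new_data = load('new_data.mat');
X_new = new_data.X_new;

% forward pass to compute the output
[~, a_new] = lstm_forward(X_new, parameters);

% convert the output to probabilities
prediction = softmax(a_new);

% print the prediction
fprintf('Prediction: %f\n', prediction);
```

The code above outlines the training and prediction flow of a simple LSTM RNN model. `initialize_parameters` initializes the weights, `lstm_forward` runs the forward pass, `compute_loss` computes the loss, `lstm_backward` runs the backward pass to obtain gradients, `update_parameters` updates the weights from the gradients and the learning rate, and `softmax` converts the output to probabilities. The answer does not define `initialize_parameters`, `lstm_forward`, `compute_loss`, `lstm_backward`, or `update_parameters`; they would have to be implemented separately.

Each training epoch computes the loss and adjusts the weights accordingly. At prediction time, new data goes through a forward pass to produce the result.

Note that this code is only an example; real applications may need to adapt it to the specific problem and data.

### Answer 3: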
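LSTM (long short-term memory) is an important variant of the recurrent neural network (RNN) for processing and predicting time-series data. It uses memory cells and gating structures to address the vanishing-gradient and exploding-gradient problems of conventional RNNs. MATLAB is a programming language and environment widely used for scientific computing and data analysis. Below is a simple code template in MATLAB:

```matlab
% import data
data = % input data, size (time steps, feature dimension)

% network settings
hiddenSize = % number of hidden-layer neurons
inputSize = % input dimension
outputSize = % output dimension

% initialize the network
lstm = patternnet(hiddenSize);

% training settings
lstm.trainParam.lr = % learning rate
lstm.trainParam.epochs = % number of iterations

% split into training and test sets
[trainInd, valInd, testInd] = dividerand(size(data,2), % training ratio, % validation ratio, % test ratio);

% train the network
lstm = train(lstm, data(:, trainInd), data(:, trainInd));

% test the network
predictions = lstm(data(:, testInd));

% error between predictions and actual values
error = predictions - data(:, testInd);

% display error and accuracy statistics
mse = mean(error.^2);
accuracy = 1 - mse/var(data(:, testInd));
disp(['Mean squared error: ', num2str(mse)]);
disp(['Accuracy: ', num2str(accuracy)]);
```

The code above is only a basic template; real applications need to adapt the data processing, network structure, and training settings to the case at hand. Two caveats apply. `patternnet` creates a feedforward pattern-recognition network rather than an LSTM, so the template illustrates the general train-and-test workflow only. The values after each `=` are left as placeholders in the answer and must be filled in before the code will run. With that in mind, it can serve as a starting point for exploring LSTM RNNs in MATLAB.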
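
The Deep Learning Toolbox workflow described in Answer 1 (`lstmLayer`, `fullyConnectedLayer`, `softmaxLayer`, `trainNetwork`, `classify`) can be sketched on the same 8-bit addition task used by the hand-written script. This is an illustrative sketch under stated assumptions, not the method of any of the answers: it assumes Deep Learning Toolbox is installed, and the dataset size, hidden size, epoch count, and learning rate are arbitrary choices made here.

```matlab
% Sketch only: requires Deep Learning Toolbox.
binaryDim  = 8;
numSamples = 2000;                      % illustrative dataset size (assumption)

XTrain = cell(numSamples, 1);
YTrain = cell(numSamples, 1);
for n = 1:numSamples
    aInt = randi(127);
    bInt = randi(127);                  % aInt + bInt <= 254, so the sum fits in 8 bits
    a = dec2bin(aInt, binaryDim) - '0';
    b = dec2bin(bInt, binaryDim) - '0';
    c = dec2bin(aInt + bInt, binaryDim) - '0';
    % Feed the least significant bit first, as the hand-written script does.
    XTrain{n} = [fliplr(a); fliplr(b)];            % 2 x 8 (features x time steps)
    YTrain{n} = categorical(fliplr(c), [0 1]);     % 1 x 8 label sequence
end

% Sequence-to-sequence classifier: one 0/1 label per time step.
layers = [
    sequenceInputLayer(2)
    lstmLayer(32, 'OutputMode', 'sequence')
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'MaxEpochs', 30, ...                % illustrative value (assumption)
    'InitialLearnRate', 0.01, ...       % illustrative value (assumption)
    'Verbose', false);

net = trainNetwork(XTrain, YTrain, layers, options);

% Predict the bits of 25 + 17 (returned least significant bit first).
xTest = [fliplr(dec2bin(25, binaryDim) - '0'); fliplr(dec2bin(17, binaryDim) - '0')];
YPred = classify(net, {xTest});         % cell array holding one 1 x 8 categorical sequence
```

Compared with the hand-written script, the toolbox handles the gate equations, backpropagation through time, and the weight updates internally, so only the data layout (features by time steps) and the layer stack need to be specified.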