Revisiting Fully Connected Networks in Deep Learning

I recently redid a project from an old course: building a k-layer fully connected network, which was also a chance to review some basic deep learning concepts. The language used is MATLAB. The lab instructions are in the linked document, and the dataset is CIFAR-10. All of the code can be found on my GitHub.

Contents

1. Loading the data

2. Initializing the network parameters

3. Implementation without batch normalization

3.1 Computing the cost function

3.2 Computing the accuracy

3.3 Computing the gradients

3.4 Training -- minibatch gradient descent with momentum

4. Implementation with batch normalization

4.1 Computing the cost function

4.2 Computing the accuracy

4.3 Computing the gradients

4.4 Training -- minibatch gradient descent with momentum


1. Loading the data

While debugging you can read in only part of the data, which saves time; later, to push for higher accuracy, you can read in all of it. For example:

clc;
clear;
addpath Datasets/cifar-10-matlab/cifar-10-batches-mat/;

% Read in the data & initialize the parameters

%Use part of the data
[Xtrain,Ytrain,ytrain] = LoadBatch('data_batch_1.mat'); % training data
% Xmean = mean(Xtrain,2);
% Xtrain = Xtrain - Xmean;
[Xvalid,Yvalid,yvalid] = LoadBatch('data_batch_2.mat'); % validation data
[Xtest,Ytest,ytest] = LoadBatch('test_batch.mat'); % test data
% Xvalid = Xvalid - Xmean;
% Xtest = Xtest - Xmean;

%%% Use all data
% [Xtrain1,Ytrain1,ytrain1] = LoadBatch('data_batch_1.mat'); % training data part1
% [Xtrain2,Ytrain2,ytrain2] = LoadBatch('data_batch_2.mat'); % training data part2
% [Xtrain3,Ytrain3,ytrain3] = LoadBatch('data_batch_3.mat'); % training data part3
% [Xtrain4,Ytrain4,ytrain4] = LoadBatch('data_batch_4.mat'); % training data part4
% [X5,Y5,y5] = LoadBatch('data_batch_5.mat'); % training data part5
% 
% Xtrain5=X5(:,1:size(X5,2)-1000);
% Xtrain=[Xtrain1,Xtrain2,Xtrain3,Xtrain4,Xtrain5];
% Ytrain5=Y5(:,1:size(Y5,2)-1000);
% Ytrain=[Ytrain1,Ytrain2,Ytrain3,Ytrain4,Ytrain5];
% ytrain5=y5(:,1:size(X5,2)-1000);
% ytrain=[ytrain1,ytrain2,ytrain3,ytrain4,ytrain5];
% 
% Xvalid=X5(:,(size(X5,2)-999):size(X5,2));
% Yvalid=Y5(:,(size(Y5,2)-999):size(Y5,2));
% yvalid=y5(:,(size(y5,2)-999):size(y5,2));
% 
% [Xtest,Ytest,ytest] = LoadBatch('test_batch.mat'); % test data
%%% Use all data end

mean_X = mean(Xtrain, 2);
Xtrain = Xtrain - repmat(mean_X, [1, size(Xtrain, 2)]);
Xvalid = Xvalid - repmat(mean_X, [1, size(Xvalid, 2)]);
Xtest = Xtest - repmat(mean_X, [1, size(Xtest, 2)]);

% Use a small amount of data during the gradient check, because the gradient check is very time-consuming
d=100; % reduced input dimension
n=100; % number of samples
Xtrain=Xtrain(1:d,1:n);
ytrain=ytrain(1:n);
Ytrain=Ytrain(:,1:n);
Xvalid=Xvalid(1:d,1:n);
yvalid=yvalid(1:n);
Yvalid=Yvalid(:,1:n);
Xtest=Xtest(1:d,1:n);
ytest=ytest(1:n);
Ytest=Ytest(:,1:n);

Here LoadBatch is the following function; it normalizes X and one-hot encodes y:

function [X, Y, y] = LoadBatch(filename)
 A = load(filename);
 X = double(A.data)/double(255); % normalize pixel values to [0, 1]
 %X is of type "double"
 y = A.labels;
 [a,~] = size(y);
 K = 10;
 Y = zeros(a,K);
 for i = 1:a
 Y(i,y(i)+1) = 1;  % y after one-hot encoding
 end
 X = X';
 Y = Y';
 y = y';
end
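
As a quick sanity check (assuming the standard CIFAR-10 batch files with 10000 images of 32x32x3 = 3072 pixels each), the arrays returned by LoadBatch should have the following shapes:

[X, Y, y] = LoadBatch('data_batch_1.mat');
disp(size(X)); % 3072 x 10000, pixel values scaled to [0, 1]
disp(size(Y)); % 10 x 10000, one-hot encoded labels
disp(size(y)); % 1 x 10000, integer labels 0-9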

2. Initializing the network parameters

First set the number of layers (3 in this example) and the number of neurons in each hidden layer (m1 = 50 and m2 = 30 here), then initialize W and b:

k=3; %number of layers
m={50,30}; % number of hidden units in each hidden layer; this cell should have k-1 entries
[W,b]=initialize(Xtrain,k,m); 

W and b are initialized with a custom initialize function: initializing them all to zero, or naively at random, causes many problems, so He initialization is used. For an introduction to it, see this article:

聊一聊深度学习的weight initialization (an article on weight initialization in deep learning, in Chinese)

The function is defined as follows:

function [W,b]=initialize(X,k_layer,m)
K = 10; % size of the output layer
d = size(X,1); % input dimension; 3072 for the full CIFAR-10 images
rng(400);
% sigma=0.01;
n=[d,m,K];
W=cell(1,k_layer); % no. of layers of the network
b=cell(1,k_layer);
for i=1:k_layer
    sum_nodes = sum(cell2mat(m)) + K; % total number of nodes in the hidden and output layers, here 50+30+10
    sigma = sqrt(2/sum_nodes); % He-style initialization of the weights (the textbook He rule uses sqrt(2/fan_in) per layer; a single sigma from the total node count is used here)
    W{i} = sigma*randn([n{i+1} n{i}]); 
    % In our case, the size of W1 is m1*d,
    % the size of W2 is m2*m1, the size of W3 is K*m2
    b{i} = sigma*randn([n{i+1} 1]);
    % In our case, the size of b1 is m1*1,
    % the size of b2 is m2*1, the size of b3 is K*1
end

% %%%%%%%%%%%%%%%%%%%%%%  Alternative: without He initialization
% mean = 0;
% sigma = 0.1;
% 
% for i=1:k_layer-1
% W{i} = mean + sigma*randn(m{i},d);
% d = m{i};
% b{i} = zeros(m{i},1);
% end
% W{k_layer} = mean + sigma*randn(K,m{end});
% b{k_layer} = zeros(K,1);
% b{k_layer} = mean + sigma*rand(K,1);
% %%%%%%%%%%%%%%%%%%%%%%%


end
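
For reference, the textbook He rule uses a separate standard deviation for each layer, sqrt(2/n_in), where n_in is the fan-in of that layer, rather than a single sigma computed from the total node count as in my initialize function above. A minimal standalone sketch (the layer sizes here just mirror the example above and are not part of the original code):

n = [3072, 50, 30, 10];          % layer sizes: input d, hidden m1, m2, output K
W = cell(1, numel(n)-1);
b = cell(1, numel(n)-1);
for l = 1:numel(n)-1
    W{l} = sqrt(2/n(l)) * randn(n(l+1), n(l)); % std depends on the fan-in of layer l
    b{l} = zeros(n(l+1), 1);                   % biases can simply start at zero
end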

To compare the effect of batch normalization, the rest of the implementation is split into two cases: without and with batch norm.

3. Implementation without batch normalization

3.1 Computing the cost function

The cost function is a weighted sum of the cross-entropy loss on the labelled training data and an L2 regularization term on the weight matrices.

Its definition is:

J(W, b) = \frac{1}{|D|} \sum_{(x, y) \in D} l_{cross}(x, y, W, b) + \lambda \sum_{l=1}^{k} \| W_l \|^2, \qquad l_{cross}(x, y, W, b) = -\log(p_y)

where p_y is the softmax probability the network assigns to the correct class y, and the second sum runs over all weight matrices.

The code is as follows:

%cost function
lambda = 0; % when lambda = 0, there is no regularization
J = ComputeCost(Xtrain, Ytrain, W, b, lambda);

where ComputeCost is a function I defined.
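The original listing is cut off at this point, so what follows is only a minimal sketch of ComputeCost, assuming ReLU hidden layers and a softmax output (average cross-entropy loss plus L2 regularization, as described above):

function J = ComputeCost(X, Y, W, b, lambda)
% Forward pass through the k-layer network (linear + ReLU, softmax output),
% then the average cross-entropy loss plus L2 regularization on all W{i}.
k = numel(W);
n = size(X, 2);
h = X;
for i = 1:k-1
    h = max(0, W{i}*h + repmat(b{i}, 1, n));         % hidden layers with ReLU activation
end
s = W{k}*h + repmat(b{k}, 1, n);                     % class scores, K x n
P = exp(s) ./ repmat(sum(exp(s), 1), size(s,1), 1);  % softmax probabilities
loss = -sum(log(sum(Y .* P, 1))) / n;                % average cross-entropy over the batch
reg = 0;
for i = 1:k
    reg = reg + sum(sum(W{i}.^2));                   % L2 penalty on the weight matrices
end
J = loss + lambda*reg;
end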