Machine Learning Week 4 & 5 and ex3 & ex4


1. Neural Networks (NN)

Performing linear/logistic regression on a complex data set with many features is very unwieldy.
With 100 features, including all quadratic terms produces about 5050 new features.
For a 50*50 pixel image, n = 2500, and the quadratic terms alone give roughly n^2/2 = 3,125,000 new features.
Neural networks are therefore applied when there are many features.

1.1 Model Representation I

[x0 x1 x2] -> [a1 a2 a3 a4] -> h(x)
The intermediate nodes form the hidden layer.
ai(j) = 'activation' of unit i in layer j
Theta(j) = matrix of weights controlling the function mapping from layer j to layer j+1
The activation expressions are written out below.
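Written out for the small network sketched above (inputs x0, x1, x2 and four hidden units), the activations take the standard form from the lectures:

$a_1^{(2)} = g(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2)$
$a_2^{(2)} = g(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2)$
(and similarly for $a_3^{(2)}$ and $a_4^{(2)}$)
$h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)} + \Theta_{14}^{(2)} a_4^{(2)})$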

1.2 Model Representation II

z1(2) = Theta10(1) x0 + Theta11(1) x1 + Theta12(1) x2
ai(j) = g( zi(j) )
x = [x0 x1 … xn]'
z(j) = [z1(j) z2(j) … zn(j)]' = Theta(j-1) * a(j-1)
a(j) = g( z(j) )
h(x) = a(j+1) = g( z(j+1) ), where layer j+1 is the output layer
This is the vectorized representation of forward propagation; a minimal Octave sketch follows.
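A minimal Octave sketch of one forward-propagation pass for a single example x (the names Theta1, Theta2, and sigmoid follow the exercise code; the layer sizes are assumptions):

a1 = [1; x];             % add the bias unit to the input, (n+1) x 1
z2 = Theta1 * a1;        % Theta1: s2 x (n+1)
a2 = [1; sigmoid(z2)];   % hidden-layer activations, plus bias unit
z3 = Theta2 * a2;        % Theta2: K x (s2+1)
h  = sigmoid(z3);        % h(x): K x 1, one value per output unit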

1.3 multiclass classification

For K classes, the network has K output units, so the hypothesis becomes a vector with components h(x)1, h(x)2, …, h(x)K, one per class.

1.4 cost function of NN

L = total number of layers
sl = number of units (not counting the bias unit) in layer l
K = number of output units/classes
The full regularized cost function is long to write out; it is reproduced below for reference.
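$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\left( (h_\Theta(x^{(i)}))_k \right) + (1 - y_k^{(i)}) \log\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \Theta_{j,i}^{(l)} \right)^2$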

1.5 backpropagation algorithm

  1. Gradient computation: forward propagation vs. backpropagation
  2. The backpropagation algorithm (key equations summarized below)
  3. Backpropagation intuition (worked example in the lectures)
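In the standard lecture notation, with g'(z) the sigmoid gradient, the algorithm computes:

$\delta^{(L)} = a^{(L)} - y$
$\delta^{(l)} = \left( (\Theta^{(l)})^T \delta^{(l+1)} \right) \circ g'(z^{(l)})$ for l = L-1, …, 2, where $\circ$ is the element-wise product
$\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$, accumulated over the m training examples
$D^{(l)} = \frac{1}{m} \Delta^{(l)} + \frac{\lambda}{m} \Theta^{(l)}$ for the non-bias columns, and $D^{(l)} = \frac{1}{m} \Delta^{(l)}$ for the bias column,

where $D^{(l)}$ is the partial derivative of the cost with respect to $\Theta^{(l)}$.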

1.6 implementation note: unrolling

#s1 = 10, s2=10, s3=10, s4=1
#Theta1: 10*11, Theta2: 10*11, Theta3: 1*11
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
Theta1 = reshape(thetaVec(1:110), 10, 11);
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231), 1, 11);
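A minimal sketch of how the unrolled vector is typically used with an advanced optimizer (costFunction here is a hypothetical wrapper that reshapes its argument back into Theta1..Theta3, computes the cost, and returns the gradient unrolled in the same order):

options = optimset('GradObj', 'on', 'MaxIter', 100);
[optTheta, cost] = fminunc(@(t) costFunction(t), thetaVec, options);
% inside costFunction: reshape t back into Theta1, Theta2, Theta3 as above,
% compute J and the gradients D1, D2, D3, then return grad = [D1(:); D2(:); D3(:)];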

1.7 Gradient Checking

Gradient checking is used to verify that backpropagation works as intended:

epsilon = 1e-4;
for i = 1:n,
    thetaPlus = theta;
    thetaPlus(i) += epsilon;
    thetaMinus = theta;
    thetaMinus(i) -= epsilon;
    gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2*epsilon);
end;
#then compare gradApprox with deltaVector (the unrolled gradient from backpropagation), as shown below
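A simple numerical check (deltaVector is assumed to have the same shape as gradApprox):

diff = norm(gradApprox(:) - deltaVector(:)) / norm(gradApprox(:) + deltaVector(:));
% with a correct backprop implementation this relative difference should be
% very small (on the order of 1e-9)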

1.8 Random Initialization

Initializing all Theta weights to 0 does not work for a neural network: every hidden unit then computes the same function, and backpropagation keeps updating them identically, so the symmetry is never broken. Random initialization breaks this symmetry.

#s1 = 10, s2 = 10, s3 = 10, s4 = 1
#Theta1: 10*11, Theta2: 10*11, Theta3: 1*11
Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
thetaVec = [Theta1(:); Theta2(:); Theta3(:)]; % unroll as in section 1.6
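One effective way to choose INIT_EPSILON (suggested in the ex4 handout) is to base it on the sizes of the layers that each weight matrix connects:

INIT_EPSILON = sqrt(6) / sqrt(L_in + L_out);
% L_in and L_out are the numbers of units in the layers adjacent to Theta(l)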

1.9 Putting it together

First pick a network architecture; choose the layout of your neural network, including how many hidden units in each layer and how many layers total.
Number of input units = dimension of features x(i)
Number of output units = number of classes
Number of hidden units per layer = usually the more the better (more units means more computation)
If there is more than 1 hidden layer, it is recommended to use the same number of units in every hidden layer.
Training a Neural Network

  1. Randomly initialize the weights
  2. Implement forward propagation to get h(x)
  3. Implement the cost function
  4. Implement backpropagation to compute partial derivatives
  5. Use gradient checking to confirm that your backpropagation works. Then disable gradient checking.
  6. Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta.

In practice, forward and back propagation are performed in a loop over the training examples:

for i = 1:m,
    % perform forward propagation and backpropagation on example (x(i), y(i))
    % to get the activations a(l) and the delta terms d(l) for l = 2, ..., L
end;

2 ex3

2.1 lrCostFunction

temp = theta;
temp(1) = 0;
Z = X * theta ; %here theta consists of theta0, theta1, ..., thetan (n+1 in total), and X has shape m * (n+1)
H = sigmoid(Z);
J = ( -y'*log(H) - (1-y)'*log(1-H)) /m + temp' * temp * lambda / m /2 ; %use temp instead of theta here, because theta0 is not regularized
grad = ( X'*(H-y) + temp * lambda )/m ; %again use temp to exclude theta0 from regularization


2.2 oneVsAll

initial_theta = zeros(n+1, 1);
for c=1:num_labels,
        options = optimset('GradObj', 'on', 'MaxIter', 50);
        [theta]=...
                fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                initial_theta, options);
        all_theta(c, :) = theta';
end;


For some unknown reason, my Octave got stuck at this step. I reinstalled Octave, but that did not help. The environment was Ubuntu 20.04 on VirtualBox. I then copied the files to a CentOS 7 VM; plotting is not supported there(?), but training could continue, so the code itself is fine.
In the end I downloaded ex3-octave.zip again and unzipped it to ex3, and it finally worked!

2.3 predictOneVsAll

[p_value, p] = max(X * all_theta',[], 2); % pick the class with the largest score; sigmoid is monotonic, so applying it would not change the argmax


2.4 predict (ex3_nn)

X = [ones(m,1) X];      % add the bias unit x0 = 1
z2 = X * Theta1';       % pre-activations of the hidden layer
a2 = sigmoid(z2);

a2 = [ones(m,1) a2];    % add the bias unit a0 = 1
z3 = a2 * Theta2';      % pre-activations of the output layer
a3 = sigmoid(z3);

[p_value, p] = max( a3, [], 2);   % predicted class = index of the largest output


2.5 submit

(screenshot of the submission results)

3 ex4

After reinstalling VirtualBox, Ubuntu 20, and Octave, the perplexing problem finally disappeared. I exported the appliance as ubt20.ova.

3.1 Feedforward function with regularization

X = [ones(m,1) X]; 
% add x0=1 in the input layer, X: m*(input_layer_size + 1)
Z2 = X * Theta1'; 
% compute Z in the hidden layer
% Theta1: (hidden_layer_size)*(input_layer_size+1), Z2: m*(hidden_layer_size)
A2 = sigmoid(Z2); % compute A2

A2 = [ones(m,1) A2]; % add a20=1 in the hidden layer, A2: m*(hidden_layer_size + 1)
Z3 = A2 * Theta2'; %compute Z in the output layer
% Theta2: (output_layer_size)*(hidden_layer_size + 1), Z3: m*(output_layer_size)
% output_layer_size==num_labels
H = sigmoid(Z3);

% convert the label vector y into a one-hot matrix Y (m * num_labels)
Y = zeros(m, num_labels);
for i=1:m
    Y(i, y(i)) = 1;
end
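% A vectorized alternative to the loop above, using Octave's automatic
% broadcasting (assumes y holds labels 1..num_labels):
%   Y = double(y == 1:num_labels);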
%

J = sum( sum( -Y .* log(H) - (1-Y) .* log(1-H)) ) /m;

Theta1_temp = Theta1;
Theta1_temp(:,1) = 0; % zero out the bias column, since theta0 is not regularized
Theta2_temp = Theta2;
Theta2_temp(:,1) = 0; % zero out the bias column
Theta1_sum = Theta1_temp .* Theta1_temp;
Theta2_sum = Theta2_temp .* Theta2_temp;
Theta_sum = (sum(Theta1_sum(:)) + sum(Theta2_sum(:))) * lambda / m / 2;
J = J + Theta_sum;

3.2 Backpropagation
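
The loop below calls sigmoidGradient, which ex4 also asks you to implement; a minimal version (the derivative of the provided sigmoid function, applied element-wise) would be:

function g = sigmoidGradient(z)
    % element-wise gradient of the sigmoid evaluated at z
    g = sigmoid(z) .* (1 - sigmoid(z));
end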

for t = 1:m,
    a1 = X(t, :); % a1 -> 1*(N1+1), since x0 has already been added in the feedforward part
  %  a1 = [1; a1]; % a1-> 1*(N1+1)

    z2 = a1 * Theta1'; % z2-> 1*N2
    a2 = sigmoid(z2); % a2-> 1*N2
    a2 = [1, a2]; % a2->1*(N2+1)
    z3 = a2 * Theta2'; % z3->1*N3
    a3 = sigmoid(z3); % a3->1*N3
    %step1

    delta3 = a3 - Y(t, :); % delta3->1*N3
    %step2

    delta2 = delta3 * Theta2; % delta2->1*(N2+1)
    delta2 = delta2(2:end); %delta2->1*N2
    %step3
    
    delta2 = delta2 .* sigmoidGradient(z2); 

    %step4

    Theta1_grad += delta2' * a1 ; % N2*(N1+1)
    Theta2_grad += delta3' * a2 ; % N3*(N2+1)

end

Theta1_grad = Theta1_grad / m + (lambda/m) * Theta1_temp;
% exclude j=0 by reusing Theta1_temp from the feedforward part (bias column zeroed)
Theta2_grad = Theta2_grad / m + (lambda/m) * Theta2_temp;
% exclude j=0 by reusing Theta2_temp from the feedforward part (bias column zeroed)
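
The ex4 starter code (nnCostFunction.m) then unrolls these gradients into a single vector before returning them, following the same unrolling convention as in section 1.6:

grad = [Theta1_grad(:); Theta2_grad(:)];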

3.3 submit

(screenshot of the submission results)
