Machine Learning week4&5 and ex3&4

最新推荐文章于 2022-01-20 22:30:00 发布

Walker0319

最新推荐文章于 2022-01-20 22:30:00 发布

阅读量55

点赞数

分类专栏：笔记 ML 文章标签：机器学习

本文链接：https://blog.csdn.net/weixin_57901967/article/details/118704006

版权

笔记同时被 2 个专栏收录

33 篇文章 0 订阅

订阅专栏

9 篇文章 0 订阅

订阅专栏

1. Nueral Networks(NN)

Performing LinReg with a complex set of data with many features is very unwieldly.
100 features quadratic, 5050 resulting new features
50*50 image, n=2500, n^2/2=3125000 new features
so neural networks is applied with many features

1.1 Model Representation I

[x0x1x2] -> [a1a2a3a4 ] -> h(x)
intermediate or hidden layer nodes
ai(j) = ‘activation’ of unit i in layer j
expression of activation

1.2 Model Representation II

zi(j) = theta10 x0 + theta21 x1 + theta22 x2
ai(j) = g( zi(j) )
x = [x0 x1 … xn]’
z(j) = [z1(j) z2(j) … zn(j)]’ = Theta(j-1) * a(j-1)
a(j) = g( z(j) )
h(x) = a (j+1) = g( z(j+1) )
vector representation

1.3 multiclass classification

change the last h(x) to h(x)1, h(x2)2, …

1.4 cost function of NN

L = total number of layers
si =number of units (not counting bias unit) in layer i
K = number of ouput units/classes
too complicated to write

1.5 backpropagation algorithm

Gradient computation forward propagation vs. backpropagation
backpropagation algorithm
backpropagation intuition

1.6 implementation note: unrolling

#s1 = 10, s2=10, s3=10, s4=1
#Theta1 10*11, Theta1 10*11, Theta3 1*11
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
Theta1 = reshape(thetaVec(1:110), 10, 11);
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231), 1, 11);

1.7 Gradient Checking

in order to assure backprop. works as intended

epsilon = 1e-4;
for i = 1:n,
    thetaPlus = theta;
    thetaPlus(i) += epsilon;
    thetaMinus = theta;
    thetaMinus(i) -= epsilon;
    gradApprox(i) = (J(theatPlus) - J(thetaMinus)) / (2*epsilon)
end;
#then compare gradApprox = deltaVector

1.8 Random Initialization

Initializing all theta weights to 0 does not work with NN, because backprop. will get the same value.

#s1 = 10, s2=10, s3=10, s4=1
#Theta1 10*11, Theta1 10*11, Theta3 1*11
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;

1.9 Putting it together

First pick a network architecture; choose the layout of your neural network, including how many hidden units in each layer and how many layers total.
Number of input units = dimension of features x(i)
Number of output units = number of classes
Number of hidden units per layer = usually more the better
If more than 1 hidden layer, then the same number of units in every hidden layer.
Training a Neural Network

Randomly initialize the weights
Implement forward propagation to get h(x)
Implement the cost function
Implement backpropagation to compute partial derivatives
Use gradient checking to confirm that your backpropagation works. Then disable gradient checking.
Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta.

perform forward and back propagation, loop on every training example

for i = 1:m;
    Perform forward/backward prop. on every example (x(i), y(i))
    Get activations a(1) and delta terms d(1) for l = 2,..., L

2 ex3

2.1 lrCostFunction

temp = theta;
temp(1) = 0;
Z = X * theta ; %here theta consist of theta0, theta1,..., thetan, total number n+1, and X shanp m * (n+1)
H = sigmoid(Z);
J = ( -y'*log(H) - (1-y)'*log(1-H)) /m + temp' * temp * lambda / m /2 ; %use temp not theta here, because theta0 does not regularize
grad = ( X'*(H-y) + temp * lambda )/m ; %again use temp to exclude theta0

在这里插入图片描述

2.2 oneVsAll

initial_theta = zeros(n+1, 1);
for c=1:num_labels,
        options = optimset('GradObj', 'on', 'MaxIter', 50);
        [theta]=...
                fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                initial_theta, options);
        all_theta(c, :) = theta';
end;

在这里插入图片描述

For some unknown reasons, my octave is stucked at this step. I reinstalled octave, but it doesn’t work. The enviroment is ubuntu20.04 on Virtualbox. Then I copied the files to my centos7 on another VM, the plot is not supported(?), but the training can go on, so that the code is OK.
At last, I downloaded ex3-octave.zip again, and unzip to ex3, it worked finally!

2.3 predictOneVsAll

[p_value, p] = max(X * all_theta',[], 2);

在这里插入图片描述

2.4 predict (ex3_nn)

X = [ones(m,1) X];
z2 = X * Theta1';
a2 = sigmoid(z2);

a2 = [ones(m,1) a2];
z3 = a2 * Theta2';
a3 = sigmoid(z3);

[p_value, p] = max( a3, [], 2);

在这里插入图片描述

2.5 submit

在这里插入图片描述

3 ex4

After reinstall virtualbox ubuntu20 and Octave, the perplexing problem disappeared finally. Export Appliance as ubt20.ova.

3.1 Feddforward function with regularization

X = [ones(m,1) X]; 
% add x0=1 in the input layer, X: m*(input_layer_size + 1)
Z2 = X * Theta1'; 
% compute Z in the hidden layer
% Theta1: (hidden_layer_size)*(input_layer_size+1), Z2: m*(hidden_layer_size + 1)
A2 = sigmoid(Z2); % compute A2

A2 = [ones(m,1) A2]; % add a20=1 in the hidden layer, A2: m*(hidden_layer_size + 1)
Z3 = A2 * Theta2'; %compute Z in the output layer
% Theta2: (output_layer_size)*(hidden_layer_size + 1), Z3: m*(output_layer_size)
% output_layer_size==num_labels
H = sigmoid(Z3);

% change y to Y
Y = zeros(m, num_labels);
for i=1:m
    Y(i, y(i)) = 1;
end
%

J = sum( sum( -Y .* log(H) - (1-Y) .* log(1-H)) ) /m;

Theta1_temp = Theta1;
Theta1_temp(:,1) = 0; //exclude theta0
Theta2_temp = Theta2;
Theta2_temp(:,1) = 0; //exclude theta0
Theta1_sum = Theta1_temp .* Theta1_temp;
Theta2_sum = Theta2_temp .* Theta2_temp;
Theta_sum = (sum(Theta1_sum(:)) + sum(Theta2_sum(:))) * lambda / m /2 
J = J + Theta_sum;

3.2 Backpropagation

for t = 1:m,
    a1 = X(t, :); % a1-> 1*(N1+1), since x0 has been added in the feeforward part
  %  a1 = [1; a1]; % a1-> 1*(N1+1)

    z2 = a1 * Theta1'; % z2-> 1*N2
    a2 = sigmoid(z2); % a2-> 1*N2
    a2 = [1, a2]; % a2->1*(N2+1)
    z3 = a2 * Theta2'; % z3->1*N3
    a3 = sigmoid(z3); % a3->1*N3
    %step1

    delta3 = a3 - Y(t, :); % delta3->1*N3
    %step2

    delta2 = delta3 * Theta2; % delta2->1*(N2+1)
    delta2 = delta2(2:end); %delta2->1*N2
    %setp3
    
    delta2 = delta2 .* sigmoidGradient(z2); 

    %step4

    Theta1_grad += delta2' * a1 ; % N2*(N1+1)
    Theta2_grad += delta3' * a2 ; % N3*(N2+1)

end

Theta1_grad = Theta1_grad / m + (lambda/m) * Theta1_temp; 
% exclude j=0, use the parameters in the feeforward part
Theta2_grad = Theta2_grad / m + (lambda/m) * Theta2_temp;
% exclude j=0, use the parameters in the feeforward part

3.3 submit

在这里插入图片描述

Walker0319

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Machine Learning week4&5 and ex3&4

1. Nueral Networks(NN)Performing LinReg with a complex set of data with many features is very unwieldly.100 features quadratic, 5050 resulting new features50*50 image, n=2500, n^2/2=3125000 new featuresso neural networks is applied with many features1
复制链接

扫一扫