Preface
Having finished the assignment: I'm an idiot.
I. Progress
Week 4 (56%)
II. Main Content
1. Non-Linear Hypothesis
Why introduce a Non-Linear Hypothesis?
Because in tasks like image recognition, there are simply too many X's — samples — features (whatever you want to call them). The 3,000,000 features took me a while to work out. If an image is 50*50 pixels, then one image has 2500 pixels (grayscale only; for an RGB image multiply by 3), so one image yields 2500 features. If I then introduce new features consisting of the quadratic terms over all the x's in the image, i.e. the pairwise products, there are about 2500*2500/2 ≈ 3,000,000 combinations. We divide by 2 because x1*x2 and x2*x1 are the same thing. What I still don't quite get: why would I set up such a strange product feature in the first place?
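A quick back-of-the-envelope check of that count (a throwaway MATLAB snippet; n = 2500 is the 50*50 grayscale case above):
n = 2500;                          % pixels in a 50*50 grayscale image
numQuadratic = n * (n + 1) / 2;    % products x_i * x_j with i <= j, roughly n^2 / 2
fprintf('quadratic feature count: %d\n', numQuadratic);   % 3126250, i.e. ~3 million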
2. Neural Networks
The basic structure is: input, processing, output.
This simple neural network simulates the process of multiple inputs, processing, and output.
This figure is the typical one; it lays out directly how the Input Layer, Hidden Layer, and Output Layer relate.
The bias unit at the top plays the same role as the extra column of 1's we added to X earlier.
These two figures feel like the core of the coding part, but they are a bit hard to memorize.
It's enough to carefully tell apart the elements in the middle:
x_1, x_2, x_3, ...: the input layer.
a_1^(2), a_2^(2): the 1st and 2nd units of the second layer.
Θ: I call this "big Θ", because every θ so far has been a vector, while here Θ becomes a matrix. There is one Θ per layer transition, so an L-layer network has L-1 of them (you could even think of big Θ as a three-dimensional object). Each Θ^(i) is an M*N matrix, where M is the number of units in the next layer and N is the number of units in the previous layer plus 1. Θ^(i)_ab picks out, within the i-th Θ matrix, the weight from unit b of the earlier layer (including the bias unit) to unit a of the later layer (excluding the bias unit); written as in the figure, that is row a, column b.
z: still θᵀx, just with a few sub/superscripts to watch. z^(i) denotes the vector of θᵀx values feeding layer i. Concretely,
z_1^(2) = Θ^(1)_10 x_0 + Θ^(1)_11 x_1 + Θ^(1)_12 x_2 + Θ^(1)_13 x_3 is, for the first unit of the second layer, the value obtained by pairing all the previous layer's unit values x with their θ's. Passing it through the sigmoid function g then gives the actual value of each next-layer unit: a_1^(2) = g(z_1^(2)).
The equations that follow just work through a few easily-confused cases.
To summarize, for a neural network with a single Hidden Layer, the equations roughly look like this:
Apart from the boundary layers, the middle is basically: z^(j) = Θ^(j-1) a^(j-1), a^(j) = g(z^(j)) (adding the bias unit a_0^(j) = 1 at each layer).
The last layer is: h_Θ(x) = a^(3) = g(Θ^(2) a^(2)).
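To make these equations concrete, here is a minimal MATLAB sketch of forward propagation for a single example through one hidden layer (the sizes — 3 inputs, 3 hidden units, 1 output — are my own assumption to match the figure, and the weights are random placeholders):
g = @(z) 1 ./ (1 + exp(-z));       % sigmoid activation
x = [0.5; 0.1; 0.9];               % one example (x_1, x_2, x_3)
Theta1 = rand(3, 4);               % layer 1 -> 2: (units in layer 2) x (units in layer 1 + 1)
Theta2 = rand(1, 4);               % layer 2 -> 3: 1 x (units in layer 2 + 1)
a1 = [1; x];                       % add the bias unit x_0 = 1
z2 = Theta1 * a1;                  % z^(2) = Theta^(1) a^(1)
a2 = [1; g(z2)];                   % a^(2) = g(z^(2)), with bias a_0^(2) = 1
z3 = Theta2 * a2;                  % z^(3) = Theta^(2) a^(2)
h = g(z3);                         % h_Theta(x) = a^(3)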
My overall take is that this extends the Linear Hypothesis in two directions: first, going from multiple inputs to multiple outputs; second, repeating that input-to-output step on top of the previous outputs. That is why θ grows from a one-dimensional vector into what I think of as a three-dimensional stack of matrices.
But is this a simplification or a complication? I don't really understand yet; I'll have to keep digging :(
3. Neural Networks and Logic Gates
This part ties neural networks together with the logic gates I'd learned about before, which is quite neat.
The idea is actually simple: for a neural network with only an Input Layer and an Output Layer, our big Θ is just a one-dimensional vector (1*3). By choosing the parameters inside Θ, any input whose entries are all 0 or 1 produces the output of a logic gate. In the figure's example, the network acts as an AND gate.
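As a sketch of that figure (the weights -30, 20, 20 are the ones the lecture uses for its AND example; the rest is just glue code):
theta = [-30 20 20];                            % AND-gate weights from the lecture
for x1 = 0:1
    for x2 = 0:1
        h = 1 / (1 + exp(-(theta * [1; x1; x2])));  % sigmoid(Theta * [bias; inputs])
        fprintf('%d AND %d -> %.4f\n', x1, x2, h);  % ~1 only when both inputs are 1
    end
end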
4. The Assignment
I'm an idiot.
Honestly, I underestimated the difficulty of this assignment, or rather overestimated my own reading comprehension. The pdf was a real slog; at times I had no idea what it was getting at. And the assignment still relies on the earlier multi-class classification approach (I'd assumed it would be all neural networks).
Without further ado, backing up the code first:
function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
% J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
% efficiently vectorized. For example, consider the computation
%
% sigmoid(X * theta)
%
% Each row of the resulting matrix will contain the value of the
% prediction for that example. You can make use of this to vectorize
% the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
% there're many possible vectorized solutions, but one solution
% looks like:
% grad = (unregularized gradient for logistic regression)
% temp = theta;
% temp(1) = 0; % because we don't add anything for j = 0
% grad = grad + YOUR_CODE_HERE (using the temp variable)
%
hx = sigmoid(X * theta);                                % h_theta(x) for every example
grad = (1/m) * X' * (hx - y);                           % unregularized gradient
J = (1/m) * (-y' * log(hx) - (1 - y)' * log(1 - hx));   % unregularized cost
J = J + lambda / (2*m) * sum(theta(2:end).^2);          % add regularization (skip the bias term)
thetaback = theta;
thetaback(1) = 0;                                       % the bias term is not regularized
grad = grad + (lambda/m) * thetaback;                   % regularized gradient
% =============================================================
grad = grad(:);
end
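A quick sanity check one could run on the function above (the test values here are illustrative assumptions of mine, not taken from the assignment pdf):
theta_t = [-2; -1; 1; 2];
X_t = [ones(5,1) reshape(1:15, 5, 3) / 10];     % 5 examples: bias column plus 3 features
y_t = ([1; 0; 1; 0; 1] >= 0.5);                 % logical 0/1 labels
[J_t, grad_t] = lrCostFunction(theta_t, X_t, y_t, 3)   % inspect cost and gradient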
function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
% [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
% logistic regression classifiers and returns each of these classifiers
% in a matrix all_theta, where the i-th row of all_theta corresponds
% to the classifier for label i
% Some useful variables
m = size(X, 1);
n = size(X, 2);
% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);
% Add ones to the X data matrix
X = [ones(m, 1) X];
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
% logistic regression classifiers with regularization
% parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
% whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
% function. It is okay to use a for-loop (for c = 1:num_labels) to
% loop over the different classes.
%
% fmincg works similarly to fminunc, but is more efficient when we
% are dealing with large number of parameters.
%
% Example Code for fmincg:
%
% % Set Initial theta
% initial_theta = zeros(n + 1, 1);
%
% % Set options for fminunc
% options = optimset('GradObj', 'on', 'MaxIter', 50);
%
% % Run fmincg to obtain the optimal theta
% % This function will return theta and the cost
% [theta] = ...
% fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
% initial_theta, options);
%
for c = 1:num_labels
    initial_theta = zeros(n + 1, 1);
    options = optimset('GradObj', 'on', 'MaxIter', 50);
    % Train a regularized classifier for class c, using (y == c) as binary labels
    all_theta(c,:) = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                            initial_theta, options);
end
% =========================================================================
end
function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels
%are in the range 1..K, where K = size(all_theta, 1).
% p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
% for each example in the matrix X. Note that X contains the examples in
% rows. all_theta is a matrix where the i-th row is a trained logistic
% regression theta vector for the i-th class. You should set p to a vector
% of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
% for 4 examples)
m = size(X, 1);
num_labels = size(all_theta, 1);
% You need to return the following variables correctly
p = zeros(size(X, 1), 1);
% Add ones to the X data matrix
X = [ones(m, 1) X];
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned logistic regression parameters (one-vs-all).
% You should set p to a vector of predictions (from 1 to
% num_labels).
%
% Hint: This code can be done all vectorized using the max function.
% In particular, the max function can also return the index of the
% max element, for more information see 'help max'. If your examples
% are in rows, then, you can use max(A, [], 2) to obtain the max
% for each row.
%
% No loop is needed here: score every example against every class at once
% and take the index of the per-row max as the predicted label.
[~, p] = max(X * all_theta', [], 2);
% =========================================================================
end
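Putting oneVsAll and predictOneVsAll together, a hedged usage sketch (this assumes the course's ex3data1.mat with X and y is on the path, and uses the lambda = 0.1 the exercise suggests):
load('ex3data1.mat');                           % provides X (examples in rows) and y
all_theta = oneVsAll(X, y, 10, 0.1);
pred = predictOneVsAll(all_theta, X);
fprintf('one-vs-all training accuracy: %f\n', mean(double(pred == y)) * 100);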
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
% trained weights of a neural network (Theta1, Theta2)
% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);
% You need to return the following variables correctly
p = zeros(size(X, 1), 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned neural network. You should set p to a
% vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
% function can also return the index of the max element, for more
% information see 'help max'. If your examples are in rows, then, you
% can use max(A, [], 2) to obtain the max for each row.
%
a1 = [ones(m,1) X];               % add the bias column to the inputs
for i = 1:m
    z2 = Theta1 * a1(i,:)';       % pre-activation of the hidden layer
    a2 = sigmoid(z2);
    z3 = Theta2 * [1; a2];        % prepend the bias unit, then the output layer
    [~, id] = max(sigmoid(z3));   % most probable class for this example
    p(i) = id;
end
% =========================================================================
end
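And the same kind of check for the neural-network path (assuming the course's pretrained weights in ex3weights.mat):
load('ex3data1.mat');                           % X, y as above
load('ex3weights.mat');                         % provides Theta1 and Theta2
pred = predict(Theta1, Theta2, X);
fprintf('neural network training accuracy: %f\n', mean(double(pred == y)) * 100);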
Summary
No summary.